
Urgent Help: improve Supermicro Workstation or new multicore?

Old   July 6, 2016, 08:18
Default Urgent Help: improve Supermicro Workstation or new multicore?
  #1
New Member
 
Join Date: Apr 2013
Posts: 26
Rep Power: 12
pippo2013 is on a distinguished road
Hi everybody,

I am a newbie and I need help deciding how to spend around 1000-1500 euros by the end of the week (the remainder of a research fund): either to improve one of the multicore machines I currently use, or to buy a new multicore desktop (adding at most another 500-1000 euros).

Any help is much appreciated!

I currently use OpenFOAM for single-phase LES simulations with pimpleFoam, on meshes of at most 10 million cells.

The first option would be to improve one of the two multicore machines available at the moment:
P24: Supermicro Workstation 4022G-6F with 2 x AMD Opteron 6238 (12-core, 2.8 GHz), 128 GB RAM (DDR3 1333 MHz)
P16: Supermicro Workstation 4022G-6F with 2 x AMD Opteron 6328 (8-core, 3.2 GHz), 96 GB RAM (DDR3 1333 MHz)

P24, despite having more cores (albeit at a slightly lower clock), is much slower than P16 on the same simulation, even when I increase the number of processes. And the L3 cache should be better on P24.
I do not understand whether the reason is just the clock speed (which, from what I have read, it should not be) or the amount of RAM.

Would it be useful to increase the memory? I did not buy or configure these workstations, but shouldn't memory be populated equally across adjacent memory banks? That is not the case for P16.

Since their maximum memory bandwidth is 51 GB/s, how much RAM should they have? Are there other bottlenecks in their configuration?
I have uploaded the full specs in two separate files.

Please help me understand whether it is worthwhile to invest some money in improving their performance, or whether it is better to put this money (plus another 500-1000 euros) towards a multicore desktop.

Which components would you suggest for this second option, given my needs and a maximum of 5 million cells? Maybe an i7 5820 with X99?
Which other components?

Thanks in advance!!!
Attached Files
File Type: txt P24_info.txt (33.0 KB, 8 views)
File Type: txt P16_info.txt (31.1 KB, 6 views)

Old   July 7, 2016, 19:10
Default
  #2
Senior Member
 
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,162
Rep Power: 23
evcelica is on a distinguished road
According to the motherboard manual on Page 27:
ftp://ftp.supermicro.com/CDR-APLUS2_...DGi(6)(-F).pdf

It looks like P24 is configured correctly, while P16 has one processor running in dual-channel instead of quad-channel mode. That unbalances the memory and should slow P16 down considerably, so the fact that P16 is faster is quite odd.
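To put rough numbers on the dual-channel versus quad-channel difference, here is a minimal sketch of the theoretical peak bandwidth per socket. It assumes standard 64-bit (8-byte) DDR3 channels running at the 1333 MT/s quoted in the first post; sustained bandwidth in a real solver run will be lower, and the function name is only illustrative.

```python
# Theoretical peak DDR3 bandwidth per CPU socket.
# Assumptions: 64-bit (8-byte) channels, modules running at their
# rated transfer rate. Real sustained bandwidth is lower than this.

def peak_bandwidth_gb_s(transfers_mt_s, channels, bus_bytes=8):
    """Return peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return transfers_mt_s * 1e6 * bus_bytes * channels / 1e9

quad = peak_bandwidth_gb_s(1333, channels=4)
dual = peak_bandwidth_gb_s(1333, channels=2)
print(f"quad channel: {quad:.1f} GB/s")
print(f"dual channel: {dual:.1f} GB/s")
```

A socket left in dual-channel mode therefore has half the theoretical peak of a fully populated one, which matters most for memory-bound solvers.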

Are the CPUs being throttled due to heat or some other reason, or are they running at their maximum frequency?
Have you run other benchmarks such as LINPACK (IntelBurnTest)?
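One quick way to check clock speeds on Linux is to read the `cpu MHz` lines from `/proc/cpuinfo` while a simulation is running. The sketch below is only an illustration: the helper names and the 90% threshold are assumptions, and idle cores normally clock down because of frequency scaling, so the check is only meaningful under load.

```python
import re
from typing import List

def parse_cpu_mhz(cpuinfo_text: str) -> List[float]:
    """Extract per-core clock speeds (MHz) from /proc/cpuinfo-style text."""
    pattern = r"^cpu MHz\s*:\s*([\d.]+)"
    return [float(m) for m in re.findall(pattern, cpuinfo_text, flags=re.MULTILINE)]

def slow_cores(mhz: List[float], nominal_mhz: float, tolerance: float = 0.9) -> List[int]:
    """Indices of cores running below tolerance * nominal_mhz (assumed threshold)."""
    return [i for i, f in enumerate(mhz) if f < tolerance * nominal_mhz]

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            speeds = parse_cpu_mhz(fh.read())
    except OSError:
        speeds = []  # not on Linux, or the file is unreadable
    # 3200 MHz is the P16 clock quoted in the first post.
    print("cores below 90% of nominal:", slow_cores(speeds, nominal_mhz=3200.0))
```

Run it while all cores are busy; cores that stay well below the nominal clock under load would point to throttling.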

Old   July 8, 2016, 07:57
Default
  #3
Member
 
Mohammed Gowhar
Join Date: Feb 2014
Posts: 48
Rep Power: 12
mdgowhar is on a distinguished road
Quote:
P24: Supermicro Workstation 4022G-6F with 2 x AMD Opteron 6238 (12-core, 2.8 GHz), 128 GB RAM (DDR3 1333 MHz)
P16: Supermicro Workstation 4022G-6F with 2 x AMD Opteron 6328 (8-core, 3.2 GHz), 96 GB RAM (DDR3 1333 MHz)
I have already done a benchmark study in which I found that, on AMD processors, the more L3 cache you have, the faster the run.

In your case:
P24:
AMD Opteron 6238 (12-core) has L3 = 16 MB.
L3 cache per core: 1.33 MB, which is low.
P16:
AMD Opteron 6328 (8-core) has L3 = 16 MB.
L3 cache per core: 2 MB.
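The per-core figures above are simply total L3 divided by core count. A minimal sketch of that arithmetic, using only the numbers quoted in this post (the dictionary layout and function name are illustrative):

```python
# L3 cache per core = total L3 (MB) / number of cores.
# Figures are the ones quoted in this thread, not re-checked against datasheets.

def l3_per_core_mb(l3_total_mb, cores):
    return l3_total_mb / cores

cpus = {
    "Opteron 6238 (P24)": (16, 12),  # (total L3 in MB, cores)
    "Opteron 6328 (P16)": (16, 8),
}

for name, (l3_mb, cores) in cpus.items():
    print(f"{name}: {l3_per_core_mb(l3_mb, cores):.2f} MB L3 per core")
```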

I also noticed that P16 has only 2 x 16 GB installed in the Memory-1 slots. Fill two more Memory-1 slots to get maximum memory bandwidth.

As a result, your P16 will be 1.5 times faster than P24 once the memory is populated properly.

I have an AMD Opteron 6308 processor (L3 cache: 4 MB per core), which is 2 times faster than P16. On an OpenFOAM benchmark case with one million cells, it ran 5.2 times faster than an i7-4790K.

Old   July 8, 2016, 11:23
Default
  #4
New Member
 
Join Date: Apr 2013
Posts: 26
Rep Power: 12
pippo2013 is on a distinguished road
Thank you so much for this clear explanation... I couldn't figure out the reason why it was that slow!!

Old   July 8, 2016, 12:35
Default
  #5
Senior Member
 
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,162
Rep Power: 23
evcelica is on a distinguished road
Is this some tiny benchmark you are running? Cache size shouldn't increase performance linearly on large problems with heavy RAM usage.

Old   July 11, 2016, 01:21
Default
  #6
Member
 
Mohammed Gowhar
Join Date: Feb 2014
Posts: 48
Rep Power: 12
mdgowhar is on a distinguished road
Sorry, I was away for the weekend.
Set up a case in OpenFOAM and upload it here. I will run it on the i7-4790K and the AMD 6308 and send you the log files.
Let's benchmark it.
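For comparing log files from different machines, a small script can pull out the final `ExecutionTime = ... s` line that OpenFOAM solvers print at each time step. This is only a sketch: the function name is made up for illustration, and the sample text in the comment is not from a real run.

```python
import re

# OpenFOAM solvers print lines such as:
#   ExecutionTime = 12.34 s  ClockTime = 13 s
_TIME_RE = re.compile(r"ExecutionTime\s*=\s*([\d.eE+-]+)\s*s")

def last_execution_time(log_text):
    """Return the last reported ExecutionTime in seconds, or None if absent."""
    matches = _TIME_RE.findall(log_text)
    return float(matches[-1]) if matches else None

def time_ratio(log_a, log_b):
    """How many times longer run A took than run B (None if either is missing)."""
    t_a, t_b = last_execution_time(log_a), last_execution_time(log_b)
    if t_a is None or t_b is None or t_b == 0:
        return None
    return t_a / t_b
```

Comparing runs this way is only fair if both machines run the same case to the same end time with the same solver settings.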

Old   July 14, 2016, 07:54
Default
  #7
New Member
 
Join Date: Apr 2013
Posts: 26
Rep Power: 12
pippo2013 is on a distinguished road
Thank you so much to both of you! I apologize for the late reply, but I was away last week.

I have added the RAM to P16 and obtained much better results! Thank you, mdgowhar!

Evcelica is right! My cases were indeed very small: around 1.5 million cells, unsteady. When I compared more memory-intensive simulations, the larger number of cores in P24 does speed up the simulation, as it should, regardless of cache and clock frequency. And it scales correctly with the number of processors!
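Scaling of this kind is usually quantified as speedup and parallel efficiency relative to a reference run. A minimal sketch follows; the wall-clock times in it are made-up placeholders, not measurements from P16 or P24, and the function names are illustrative.

```python
def speedup(t_ref, t_n):
    """Speedup of a run taking t_n seconds relative to a reference taking t_ref."""
    return t_ref / t_n

def parallel_efficiency(t_ref, n_ref, t_n, n):
    """Efficiency relative to the reference run: 1.0 means ideal scaling."""
    return (t_ref * n_ref) / (t_n * n)

# Hypothetical wall-clock times in seconds, keyed by number of processes.
timings = {8: 1000.0, 16: 520.0, 24: 370.0}
n_ref = min(timings)
t_ref = timings[n_ref]

for n, t in sorted(timings.items()):
    s = speedup(t_ref, t)
    e = parallel_efficiency(t_ref, n_ref, t, n)
    print(f"{n:2d} processes: speedup {s:.2f}, efficiency {e:.2f}")
```

Efficiency that stays close to 1.0 as processes are added is what "scales correctly" means here; a sharp drop usually points to a memory-bandwidth or communication bottleneck.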

Old   July 15, 2016, 02:22
Default
  #8
Member
 
Mohammed Gowhar
Join Date: Feb 2014
Posts: 48
Rep Power: 12
mdgowhar is on a distinguished road
Hi pippo2013!

It's good to see the results.
1. How many processors did you run your simulation on, on P16 and on P24?
2. Which decomposePar method are you using (scotch, simple or hierarchical)?
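For reference, the decomposition method is chosen in `system/decomposeParDict`. The sketch below generates a minimal dictionary for each of the three methods mentioned; the Python helper itself is hypothetical, the `delta` and `order` values are the usual tutorial defaults rather than tuned settings, and the exact keywords should be checked against your OpenFOAM version.

```python
from math import prod

def decompose_par_dict(n_subdomains, method="scotch", n=None):
    """Return the text of a minimal system/decomposeParDict (illustrative helper)."""
    if method not in ("scotch", "simple", "hierarchical"):
        raise ValueError(f"unsupported method: {method}")
    lines = [
        "FoamFile",
        "{",
        "    version     2.0;",
        "    format      ascii;",
        "    class       dictionary;",
        "    object      decomposeParDict;",
        "}",
        "",
        f"numberOfSubdomains {n_subdomains};",
        "",
        f"method          {method};",
    ]
    if method != "scotch":
        # simple and hierarchical need an explicit split per direction.
        if n is None or prod(n) != n_subdomains:
            raise ValueError("n must be (nx, ny, nz) with nx*ny*nz == n_subdomains")
        lines += [
            "",
            f"{method}Coeffs",
            "{",
            f"    n           ({n[0]} {n[1]} {n[2]});",
            "    delta       0.001;",
        ]
        if method == "hierarchical":
            lines.append("    order       xyz;")
        lines.append("}")
    return "\n".join(lines) + "\n"

print(decompose_par_dict(16, "scotch"))
```

With scotch no per-direction split is needed, which is why it is often the simplest choice when changing the process count between runs.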
