
4 cpu motherboard for CFD

Old   November 6, 2011, 04:54
Default 4 cpu motherboard for CFD
  #1
New Member
 
Join Date: Oct 2011
Posts: 5
Rep Power: 5
petrile_83 is on a distinguished road
Has anyone used 4-CPU motherboards for CFD? With a 4-CPU motherboard and four Opteron 6174 processors you can build a compact 48-core machine. For example, this motherboard:

http://www.supermicro.com/Aplus/moth...x0/H8QGi-F.cfm

Which setup would be faster?

Hardware 1:
a cluster of 2 computers
each with a 2-CPU motherboard and two Opteron 6174 processors

Hardware 2:
1 computer
four Opteron 6174 processors
a 4-CPU motherboard

Old   November 7, 2011, 16:06
Default
  #2
Senior Member
 
sail's Avatar
 
Vieri Abolaffio
Join Date: Jul 2010
Location: Always on the move.
Posts: 308
Rep Power: 7
sail is on a distinguished road
It should be setup number 2. Communication within a single machine is much faster.

Old   November 7, 2011, 18:02
Default
  #3
Member
 
Robert
Join Date: Jun 2010
Posts: 86
Rep Power: 8
RobertB is on a distinguished road
Aren't the 2-processor-per-board and 4-processor-per-board parts different part numbers? For Opterons I believe they are now the 2000 and 8000 series. Typically the 4-processor-per-board models are significantly more expensive.

I would think that, with a decent interconnect, the two-board solution is probably faster and cheaper. You will probably get more memory bandwidth per core with the two-board solution.

Old   November 8, 2011, 05:05
Default
  #4
Senior Member
 
Markus Rehm
Join Date: Mar 2009
Location: Erlangen (Germany)
Posts: 176
Rep Power: 8
markusrehm is on a distinguished road
Hi,

The 4-processor solution with the 12-core Opterons (6100 series, aka Magny-Cours) is very popular in the HPC community at the moment because it offers a very good price/performance ratio. The soon-to-be-available 6200 series (aka Interlagos) should fit into the same board. The memory bandwidth is also quite good. For a benchmark you might read here:

http://www.anandtech.com/show/3894/s...clash-dellr815

If you can wait: Interlagos prices should be even more competitive, but the first benchmarks of the already available desktop FX series indicate that you need some compiler tuning to get full performance:

http://www.phoronix.com/scan.php?pag...ompilers&num=1

Regards, Markus.

Old   November 9, 2011, 10:47
Default
  #5
Senior Member
 
Join Date: Mar 2009
Location: Austin, TX
Posts: 134
Rep Power: 9
kyle is on a distinguished road
CFD performance with unstructured grids on AMD's multi-socket boards is extremely poor. This article from AnandTech tries to investigate why. I am assuming that Interlagos won't fix this entirely.

The best price/performance for CFD available now is, far and away, Intel's desktop chips. Four i5 2400 machines, which you can build for as little as $300 each, would blow your two choices out of the water. With just four machines you can get away with a plain Gigabit Ethernet network.

Or you could wait a week and get the new Intel Sandy Bridge E chips, which have six cores and an absolutely ridiculous amount of memory bandwidth. The machines would cost a little more than ones using the current Sandy Bridge chips, but the performance should be significantly higher as well. It would definitely be way cheaper, and way faster, than buying server-class hardware from AMD.

Old   November 10, 2011, 04:49
Default
  #6
Senior Member
 
Markus Rehm
Join Date: Mar 2009
Location: Erlangen (Germany)
Posts: 176
Rep Power: 8
markusrehm is on a distinguished road
I doubt that Euler3d results are representative of general CFD performance. On the system discussed in the thread "How do people even make use of super computers for CFD?" the speedup was almost linear.

Also, Gigabit Ethernet interconnects are not a good choice if you want top performance.

From my point of view you are better off with Intel chips at the moment if the licensing model of your CFD code is per core. If this doesn't matter, Opterons are often the better alternative. But as we saw before, this is not generally valid, so it is best to run benchmarks of your own code before buying.

Regards, Markus.

Old   November 10, 2011, 11:25
Default
  #7
Senior Member
 
Join Date: Mar 2009
Location: Austin, TX
Posts: 134
Rep Power: 9
kyle is on a distinguished road
Gigabit Ethernet is good enough for very small clusters. I had a four-node cluster with Gigabit Ethernet that scaled from one to four nodes at 90% efficiency. InfiniBand would take that up to what, 93%? For the money I could just buy another node and get ~20% speedup instead of ~3%.
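
To make the arithmetic behind that trade-off explicit, here is a minimal Python sketch. It assumes the usual definition of parallel efficiency (speedup divided by node count) and, as a simplification not stated in the post, that a fifth node would run at the same 90% efficiency. The 93% figure is the post's own guess for InfiniBand, not a measurement.

```python
def speedup(nodes: int, efficiency: float) -> float:
    """Speedup over a single node, given parallel efficiency = speedup / nodes."""
    return nodes * efficiency


def relative_gain(new: float, old: float) -> float:
    """Fractional improvement of `new` over `old` (0.25 means 25% faster)."""
    return new / old - 1.0


# Four nodes on Gigabit Ethernet at the reported 90% efficiency.
baseline = speedup(4, 0.90)

# Same four nodes with a faster interconnect, using the post's guessed 93%.
faster_interconnect = speedup(4, 0.93)

# One extra node instead. ASSUMPTION: efficiency stays at 90% with five nodes.
extra_node = speedup(5, 0.90)

interconnect_gain = relative_gain(faster_interconnect, baseline)
extra_node_gain = relative_gain(extra_node, baseline)

print(f"faster interconnect: {interconnect_gain:.1%} more throughput")
print(f"fifth node:          {extra_node_gain:.1%} more throughput")
```

With constant efficiency the fifth node comes out at 25%, so the ~20% quoted above is consistent with efficiency dropping somewhat as the cluster grows; the interconnect upgrade lands at roughly 3% either way.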

AMD just is not competitive right now. With traditional CFD on unstructured grids, performance is dominated by memory bandwidth, memory latency and caching, all areas where Intel has a significant advantage. Clock speed doesn't really matter: I overclocked my machines from 3.4 GHz to 4.0 GHz and only saw a tiny speedup.

Regardless of per-core licensing issues, if you have a fixed amount of money to spend then buying Intel systems will give you the fastest cluster.

All of this only holds true for traditional CFD on unstructured meshes. If you are using structured meshes or a lattice Boltzmann code like Exa, then AMD likely DOES make sense.

Old   November 11, 2011, 16:46
Default
  #8
Senior Member
 
Join Date: Oct 2009
Location: Germany
Posts: 637
Rep Power: 12
abdul099 is on a distinguished road
Another point to consider is energy consumption. My privately owned AMD CPU is slower than the Xeon in my workstation and needs more energy. This is no issue as long as it doesn't run for a long time, but when it's up 24/7 and under full load, it makes a huge difference. In Germany, it makes a difference of 50 bucks on the electricity bill per node in just a year. And on top of that, the AMD would need to run at least 20% longer to get the same results.
It's a shame, as I don't like the total market control and pricing policy of Intel - but at least at the moment, AMD can't compete with the power and efficiency of Intel CPUs.
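
As a rough sanity check on what that yearly figure implies, the following Python sketch converts a cost difference into a continuous power difference. The electricity price is a purely hypothetical placeholder (the post does not state one), so the resulting wattage is illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # a node that is up 24/7 for one year


def annual_cost(power_watts: float, price_per_kwh: float) -> float:
    """Yearly electricity cost of a constant load of `power_watts`."""
    return power_watts / 1000.0 * HOURS_PER_YEAR * price_per_kwh


def implied_power_difference(cost_difference: float, price_per_kwh: float) -> float:
    """Constant extra load in watts that produces `cost_difference` per year."""
    return cost_difference / (HOURS_PER_YEAR * price_per_kwh) * 1000.0


# ASSUMPTION: hypothetical price per kWh, in the same currency as the "50 bucks".
PRICE_PER_KWH = 0.25

# The 50 per node per year is the figure quoted in the post.
delta_watts = implied_power_difference(50.0, PRICE_PER_KWH)

print(f"implied extra draw per node: {delta_watts:.1f} W")
```

Because the slower CPU also has to run longer for the same result, the energy per finished job differs by more than the raw power difference suggests; that part needs absolute power figures, which the post does not give.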

Old   December 18, 2011, 13:10
Default
  #9
New Member
 
Ulrich Siller
Join Date: Dec 2011
Location: Germany
Posts: 2
Rep Power: 0
USiller is on a distinguished road
I recently had the chance to run a small benchmark comparing a two-socket Xeon X5675 system (24 cores counting Hyper-Threading, 3.06 GHz) with the new AMD Opteron 6274 (32 cores, 2.1 GHz). I ran the DLR turbomachinery solver TRACE on a multi-block mesh of an axial compressor stage. The OS was openSUSE 12.1 in both cases, with Open MPI used for parallelization.

The results at a glance

machine     jobs   cores per job   timesteps/minute (over all jobs)
XeonX5675   3      4               30.57
XeonX5675   3      8               33.93
XeonX5675   4      6               34.09

Opt6274     4      4               26.79
Opt6274     4      8               37.57


The main conclusions (from my perspective)

- Hyper-Threading on the Xeon is only effective in case of imperfect load balancing, at least for this number-crunching-intensive code.
- Sharing one FPU between two cores on the Opteron system is the better deal for CFD: the test with 4*8 cores is about 40% faster than the one with 4*4 cores (one FPU per process).

- Opteron is the better deal, especially for a four-socket system with InfiniBand interconnect, resulting in much lower hardware costs.
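
The percentages behind these conclusions can be recomputed from the table. The Python sketch below is only an illustration; it assumes the core count in each row is per job (which matches the "4*8 cores" wording), so that total cores in use is jobs times cores per job.

```python
# Rows transcribed from the benchmark table:
# (machine, concurrent jobs, cores per job, timesteps/minute over all jobs)
RESULTS = [
    ("XeonX5675", 3, 4, 30.57),
    ("XeonX5675", 3, 8, 33.93),
    ("XeonX5675", 4, 6, 34.09),
    ("Opt6274", 4, 4, 26.79),
    ("Opt6274", 4, 8, 37.57),
]


def rate(machine: str, jobs: int, cores_per_job: int) -> float:
    """Look up the measured timesteps/minute for one configuration."""
    for m, j, c, r in RESULTS:
        if (m, j, c) == (machine, jobs, cores_per_job):
            return r
    raise KeyError((machine, jobs, cores_per_job))


def gain(new_rate: float, old_rate: float) -> float:
    """Fractional throughput gain of one configuration over another."""
    return new_rate / old_rate - 1.0


# Xeon: 3 jobs x 8 cores (Hyper-Threading in use) vs 3 jobs x 4 cores.
xeon_ht_gain = gain(rate("XeonX5675", 3, 8), rate("XeonX5675", 3, 4))

# Opteron: 4 jobs x 8 cores (shared FPUs) vs 4 jobs x 4 cores (one FPU per process).
opteron_gain = gain(rate("Opt6274", 4, 8), rate("Opt6274", 4, 4))

for m, j, c, r in RESULTS:
    total = j * c  # ASSUMPTION: core count in the table is per job
    print(f"{m:10s} {j} x {c} = {total:2d} cores: {r:6.2f} steps/min, "
          f"{r / total:.2f} per core")

print(f"Xeon gain from Hyper-Threading:  {xeon_ht_gain:.1%}")
print(f"Opteron gain from all 32 cores:  {opteron_gain:.1%}")
```

The Opteron figure reproduces the roughly 40% stated above, while doubling the process count on the Xeon gains only about 11%.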

Old   April 11, 2012, 05:14
Default
  #10
New Member
 
Join Date: Mar 2009
Posts: 5
Rep Power: 8
gskillas is on a distinguished road
Dear Mr. Siller

Just to make sure I understand your benchmark correctly: you ran three or four separate jobs at the same time, which together used all cores available on the system.

Could it be that if you use all cores for one job (and make sure that no processor switches happen, emptying the INT/CMD/FPU pipelines) the results may look different? (And yes, I agree, HT is not relevant for CFD).

I am asking because I have to make the decision between Opteron 62XX and E5-26YY, and there are different aspects to consider. From the benchmarks at

http://www.amd.com/de/products/serve...t-servers.aspx

ROMS and WRFv3 are interesting for CFD applications, while from

http://investors.ansys.com/releaseDe...leaseID=662929

it seems to me that the 6174 processor can only win in certain rather artificial situations. If anything, you need to consider the 6276 as the direct E5-26YY competitor.

Best regards,

George Skillas


Old   April 16, 2012, 08:08
Smile
  #11
New Member
 
Ulrich Siller
Join Date: Dec 2011
Location: Germany
Posts: 2
Rep Power: 0
USiller is on a distinguished road
Hi Mr. Skillas,

You are right: I started the same computation n times on the machine and measured the time to finish a specific number of timesteps. While for the Interlagos and the Xeon without HT all runs finished at nearly the same time, the HT-on case had very different running times (differing by up to 10%).

My little benchmark is far from answering even the most important questions in the matrix of factors relevant to parallel computing.

We used the following strategy to answer the question:
- We have no core-based licensing issue with our CFD solver - that simplifies a lot.
- Comparing the hardware costs of a Xeon-based 2-socket server and an Interlagos 4-socket server (both with IB interconnect), we came up with approx. half the hardware costs per core for the AMD system - the lower clock speed of the AMD is already accounted for.

Last week we received our HPC cluster from Delta Computer GmbH (Hamburg) and we are now looking forward to testing again in-house.

Best regards,
Ulrich Siller

Old   April 16, 2012, 16:49
Default
  #12
Senior Member
 
Charles
Join Date: Apr 2009
Posts: 179
Rep Power: 9
CapSizer is on a distinguished road
Ulrich, it would be great if you could keep us informed about what you find. I am particularly interested in seeing how well your application scales on a node compared to how well it scales across nodes. There seems to be quite a lot of uncertainty about whether it is really better to run with many cores on a motherboard (call it pure shared memory), or if it is faster to have more nodes, but not so many cores per motherboard.
