32 Cores Mini-Cluster, 2 Nodes vs. 4 Nodes performance

July 7, 2016, 04:21   #1
Courageous (New Member)
Join Date: Jun 2014
Posts: 3
Hello everyone.

I've been working for some time on defining a 32-core mini-cluster configuration, mainly to run CFX.
Most of the components have been chosen, but one very important question remains about the number of compute nodes: what is the difference in performance between a 2-node configuration with 2*8 cores per node and a 4-node configuration with 2*4 cores per node?
The CPUs would be Xeon E5-2600 v4: the E5-2667 v4 for the 2-node configuration and the E5-2637 v4 (4 cores per CPU) for the 4-node configuration.

I remember reading a thread where Glenn Horrocks said that the cfp2006 benchmark is a good indicator of final CFX/Fluent performance, so I made a comparison based on cfp2006 benchmark results (see attachment).

The conclusion is that the 4-node configuration would be 70% faster than the 2-node configuration, which seems unbelievable.
The base frequency of the 4-core CPU is only 10% higher, but its memory bandwidth per core is also 50% higher. Is that enough to explain such a difference?
I assumed perfect scalability because we'll be using InfiniBand, but even if inter-node scalability is only 90%, the gap would still be huge.
The only drawback of the 4-node configuration is the total CPU power dissipation, which is double that of the 2-node configuration.
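
The scalability remark above can be checked with a few lines of Python. This is only a rough sketch: it multiplies the 70% ideal estimate by an assumed parallel efficiency for the 4-node setup and takes the 2-node setup as the 100% reference. The function name `adjusted_speedup` and the list of efficiencies are illustrative assumptions, not values from the attachment.

```python
def adjusted_speedup(ideal_speedup: float,
                     efficiency_4node: float,
                     efficiency_2node: float = 1.0) -> float:
    """Scale an ideal speedup by the parallel efficiency of each setup."""
    return ideal_speedup * efficiency_4node / efficiency_2node


# 1.70 = "70% faster" from the cfp2006-based comparison in this post.
IDEAL_SPEEDUP = 1.70

# Hypothetical inter-node efficiencies for the 4-node setup (assumptions).
for eff in (1.00, 0.95, 0.90, 0.80):
    s = adjusted_speedup(IDEAL_SPEEDUP, eff)
    print(f"4-node efficiency {eff:.2f}: {s:.2f}x relative to 2 nodes")
```

With 90% efficiency the estimate drops from 1.70x to about 1.53x, so the gap stays large under this simple model. Real scaling also depends on the case size and partitioning, which this sketch ignores.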

Does anyone have any advice or comments on this?
Did I miss something in my logic?

Thanks for reading.
Aurelien
Attached Images
File Type: jpg 32 Cores Xeon E5-2667 v4 vs. E5-2637 v4.JPG (85.4 KB, 26 views)

July 7, 2016, 07:47   #2
flotus1 (Alex, Senior Member)
Join Date: Jun 2012
Location: Germany
Posts: 1,232
If you can afford it, my advice would be the 4-node setup.
The reason it is faster is, as always, memory bandwidth. Four dual-socket nodes deliver twice the memory bandwidth of two nodes, which translates into a huge performance increase in CFX.
In addition, the Xeon E5-2637 v4 is the processor with the most L3 cache per core, which also helps.
If money is not an issue at all, you might even consider installing the E5-2667 v4 in all 4 nodes. If you use only 8 cores per node, it has even more L3 cache per (used) core and about the same clock speed. Keep in mind that the base frequency is not a good indicator of the actual clock speed under load: these processors usually run at higher frequencies even when all cores are in use. In fact, the all-core turbo frequency is 3.6 GHz for the E5-2637 v4 and 3.5 GHz for the E5-2667 v4.
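
To make the per-core argument concrete, here is a small Python sketch comparing the three setups discussed. The hardware figures are assumptions, not numbers stated in this thread: 4 memory channels of DDR4-2400 per socket (theoretical peak, not measured bandwidth), 15 MB of L3 for the E5-2637 v4 and 25 MB for the E5-2667 v4. Please check them against the vendor data sheets. The `Config` class is purely illustrative.

```python
from dataclasses import dataclass

# ASSUMPTION: 4 channels x 2400 MT/s x 8 bytes = theoretical peak per socket.
ASSUMED_BW_PER_SOCKET_GBS = 4 * 2400e6 * 8 / 1e9  # about 76.8 GB/s


@dataclass
class Config:
    name: str
    nodes: int
    used_cores_per_socket: int
    l3_mb_per_socket: float  # ASSUMPTION: taken from public spec sheets
    sockets_per_node: int = 2
    bw_per_socket_gbs: float = ASSUMED_BW_PER_SOCKET_GBS

    @property
    def sockets(self) -> int:
        return self.nodes * self.sockets_per_node

    @property
    def cores(self) -> int:
        return self.sockets * self.used_cores_per_socket

    @property
    def bw_per_core_gbs(self) -> float:
        # Aggregate peak bandwidth divided by the number of cores in use.
        return self.sockets * self.bw_per_socket_gbs / self.cores

    @property
    def l3_per_core_mb(self) -> float:
        return self.l3_mb_per_socket / self.used_cores_per_socket


configs = [
    Config("2 nodes, E5-2667 v4, 8 cores/socket", 2, 8, 25.0),
    Config("4 nodes, E5-2637 v4, 4 cores/socket", 4, 4, 15.0),
    Config("4 nodes, E5-2667 v4, 4 of 8 cores used", 4, 4, 25.0),
]

for c in configs:
    print(f"{c.name}: {c.cores} cores, "
          f"{c.bw_per_core_gbs:.1f} GB/s per core, "
          f"{c.l3_per_core_mb:.2f} MB L3 per core")
```

Under these assumptions all three setups use 32 cores, the 4-node setups have twice the peak memory bandwidth per core, and running the E5-2667 v4 at half occupancy gives the most L3 cache per used core.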
__________________
Please do not send me CFD-related questions via PM

July 7, 2016, 10:39   #3
Courageous (New Member)
Join Date: Jun 2014
Posts: 3
Thanks Alex.

Sadly, money is an issue, so the 4-node setup with the E5-2637 v4 will be fine.
And the E5-2637 v4 4-node setup is only 10% more expensive than the E5-2667 v4 2-node setup, so it is a worthwhile investment.

Do you really believe it will be 70% faster, or should I expect something between 20% and 50% for some reason? (Even 20% would still be great.)

Best Regards

July 7, 2016, 11:37   #4
flotus1 (Alex, Senior Member)
Join Date: Jun 2012
Location: Germany
Posts: 1,232
It is hard to say exactly how much faster it will be, since this also depends on the type of cases you run. On average, though, the performance difference will definitely be more than just 20%. Somewhere around 50% should be a conservative estimate.
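
As a rough illustration of the price/performance trade-off discussed in this thread, the Python sketch below divides each speedup estimate (20%, 50%, 70%) by the roughly 10% price premium of the 4-node setup. The helper `perf_per_cost` is a hypothetical name. The model ignores power, licensing and interconnect costs, so treat it as back-of-the-envelope arithmetic only.

```python
def perf_per_cost(speedup: float, cost_ratio: float) -> float:
    """Performance per unit cost, relative to the 2-node baseline (= 1.0)."""
    return speedup / cost_ratio


COST_RATIO = 1.10  # 4-node setup is about 10% more expensive (from the thread)

# Speedup estimates mentioned in the thread: pessimistic, conservative, ideal.
for speedup in (1.20, 1.50, 1.70):
    value = perf_per_cost(speedup, COST_RATIO)
    print(f"speedup {speedup:.2f}x -> {value:.2f}x performance per unit cost")
```

Even the pessimistic 20% case comes out above 1.0, meaning more performance per unit cost than the 2-node baseline under these assumptions.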

All times are GMT -4.