32 Cores Mini-Cluster, 2 Nodes vs. 4 Nodes performance

July 7, 2016, 05:21
#1
New Member
 
Aurélien
Join Date: Jun 2014
Posts: 3
Rep Power: 11
Courageous is on a distinguished road
Hello everyone.

· I've been working for some time on defining a 32-core mini-cluster configuration, mainly to run CFX.
· Most of the components have been chosen, but one very important question remains about the number of compute nodes: what is the performance difference between a 2-node configuration with 2×8 cores per node and a 4-node configuration with 2×4 cores per node?
· The CPUs would be Xeon E5-2600 v4: the E5-2667 v4 for the 2-node configuration and the E5-2637 v4 (4 cores per CPU) for the 4-node configuration.

· I remember reading a thread where Glenn Horrocks said that the cfp2006 benchmark is a good indicator of final CFX/Fluent performance, so I ran a comparison based on cfp2006 benchmark results (see attachment).

· The conclusion is that the 4-node configuration would be 70% faster than the 2-node configuration, which seems unbelievable.
· The base frequency is only 10% higher for the 4-core CPU, but memory bandwidth per core is also 50% higher for the 4-core CPU. Is that enough to explain such a difference?
· I assumed scalability is perfect because we'll be using InfiniBand, but even if inter-node scalability is 90%, the gap will still be huge.
· The only drawback of the 4-node configuration is the total CPU power dissipation, which is double that of the 2-node configuration.
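
The effect of imperfect inter-node scaling on the estimate above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the original post: the 1.70 ratio is the cfp2006-based estimate quoted above, and applying the efficiency penalty only to the 4-node configuration is a deliberately pessimistic assumption. The function name `net_speedup` is hypothetical.

```python
def net_speedup(ideal_ratio: float,
                efficiency_4node: float,
                efficiency_2node: float = 1.0) -> float:
    """Speed ratio (4-node / 2-node) after applying a parallel-efficiency
    factor to each configuration."""
    return ideal_ratio * efficiency_4node / efficiency_2node


# Assumption: cfp2006-based estimate from the post (4 nodes vs. 2 nodes).
ideal = 1.70

# Pessimistic case: only the 4-node cluster loses efficiency.
for eff in (1.0, 0.9, 0.8):
    ratio = net_speedup(ideal, eff)
    print(f"4-node efficiency {eff:.0%}: {(ratio - 1) * 100:.0f}% faster")
```

With 90% efficiency on the 4-node side only, the ratio is 1.70 × 0.9 = 1.53, so the estimated advantage would drop from 70% to roughly 53% under these assumptions.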

Does anyone have any advice or comments on this?
Did I miss something in my reasoning?

Thanks for reading.
Aurelien
Attached Images
File Type: jpg 32 Cores Xeon E5-2667 v4 vs. E5-2637 v4.JPG (85.4 KB, 52 views)

July 7, 2016, 08:47
#2
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,396
Rep Power: 46
flotus1 has a spectacular aura about
If you can afford it, my advice would be a 4-node setup.
The reason why it is faster is, as always, memory bandwidth. Four dual-socket nodes deliver twice the memory bandwidth of two nodes. This will translate into a huge performance increase in CFX.
In addition, the Xeon E5-2637 v4 is the processor with the most L3 cache per core, which also helps.
If money is not an issue at all, you might even consider installing the E5-2667 v4 in all 4 nodes. If you use only 8 cores per node, it has even more L3 cache per (used) core and about the same clock speed. Keep in mind that the base frequency is not a good indicator of the actual clock speed under load. These processors usually run at higher frequencies even when all cores are in use. In fact, the all-core turbo frequency is 3.6 GHz for the E5-2637 v4 and 3.5 GHz for the E5-2667 v4.
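
The memory-bandwidth and cache argument can be made concrete with a rough calculation. The sketch below is an illustration under stated assumptions, not a measurement: it assumes 4 DDR4-2400 memory channels per socket, 8 bytes per transfer, and L3 sizes of 25 MB (E5-2667 v4) and 15 MB (E5-2637 v4), all of which should be checked against Intel's specification pages. The `Config` class and its field names are hypothetical.

```python
from dataclasses import dataclass

# Assumed platform figures (verify against the vendor's spec sheets).
CHANNELS_PER_SOCKET = 4
TRANSFERS_PER_SEC = 2400e6   # DDR4-2400
BYTES_PER_TRANSFER = 8       # 64-bit memory channel


@dataclass
class Config:
    name: str
    nodes: int
    sockets_per_node: int
    cores_per_socket: int
    l3_mb_per_socket: float

    @property
    def sockets(self) -> int:
        return self.nodes * self.sockets_per_node

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def peak_bandwidth_gbs(self) -> float:
        """Theoretical peak memory bandwidth of the whole cluster in GB/s."""
        per_socket = (CHANNELS_PER_SOCKET * TRANSFERS_PER_SEC
                      * BYTES_PER_TRANSFER / 1e9)
        return per_socket * self.sockets

    @property
    def l3_mb_per_core(self) -> float:
        return self.l3_mb_per_socket / self.cores_per_socket


configs = [
    Config("2 nodes, E5-2667 v4", nodes=2, sockets_per_node=2,
           cores_per_socket=8, l3_mb_per_socket=25.0),
    Config("4 nodes, E5-2637 v4", nodes=4, sockets_per_node=2,
           cores_per_socket=4, l3_mb_per_socket=15.0),
]

for c in configs:
    print(f"{c.name}: {c.cores} cores, "
          f"{c.peak_bandwidth_gbs:.1f} GB/s total, "
          f"{c.peak_bandwidth_gbs / c.cores:.1f} GB/s per core, "
          f"{c.l3_mb_per_core:.3f} MB L3 per core")
```

Under these assumptions both clusters have 32 cores, but the 4-node option has twice the theoretical aggregate bandwidth because it has twice as many sockets. Sustained bandwidth in practice is lower than the theoretical peak, so these figures are only useful for comparing the two options.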

July 7, 2016, 11:39
#3
New Member
 
Aurélien
Thanks Alex.

Sadly, money is an issue, so the 4-node setup with the E5-2637 v4 will be fine.
And the E5-2637 4-node setup is only 10% more expensive than the E5-2667 2-node setup, so it is a worthwhile investment.

Do you really believe it will be 70% faster, or should I expect something between 20% and 50% for some reason? (Even 20% would still be great.)
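
One way to frame this question is performance per unit cost. The following sketch is only illustrative: it takes the 10% price premium and the 20%, 50% and 70% speedups mentioned in this post as inputs, and the function name `perf_per_cost` is hypothetical.

```python
def perf_per_cost(speedup: float, cost_ratio: float) -> float:
    """Relative performance per unit cost of the 4-node setup
    (the 2-node setup is the baseline at 1.0)."""
    return speedup / cost_ratio


# Assumption: the 4-node setup costs about 10% more (from the post).
COST_RATIO = 1.10

for gain in (0.20, 0.50, 0.70):
    value = perf_per_cost(1.0 + gain, COST_RATIO)
    print(f"{gain:.0%} faster -> {value:.2f}x performance per unit cost")
```

Even at the low end (20% faster), the ratio stays above 1, so the 4-node setup would still be the better value under these inputs.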

Best Regards

July 7, 2016, 12:37
#4
Super Moderator
 
Alex
It is hard to say exactly how much faster it will be; this also depends on the type of cases you run. But on average the difference in performance will definitely be more than just 20%. Somewhere around 50% should be a conservative estimate.

All times are GMT -4.