CFD Online Discussion Forums

CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   Hardware (https://www.cfd-online.com/Forums/hardware/)
-   -   32 Cores Mini-Cluster, 2 Nodes vs. 4 Nodes performance (https://www.cfd-online.com/Forums/hardware/174200-32-cores-mini-cluster-2-nodes-vs-4-nodes-performance.html)

Courageous July 7, 2016 04:21

32 Cores Mini-Cluster, 2 Nodes vs. 4 Nodes performance
 
1 Attachment(s)
Hello everyone.

· I’ve been working for some time on defining a 32-core mini-cluster configuration, mainly to run CFX.
· Most of the components have been chosen, but one very important question remains about the number of compute nodes: what is the difference in performance between a 2-node configuration with 2*8 cores per node and a 4-node configuration with 2*4 cores per node?
· The CPUs would be Xeon E5-2600 v4: the E5-2667 v4 for the 2-node configuration and the E5-2637 v4 for the 4-node configuration.

· I remember reading a thread where Glenn Horrocks said that the cfp2006 benchmark is a good indicator of final CFX/Fluent performance, so I made a comparison based on cfp2006 benchmark results (see attachment).

· The conclusion is that the 4-node configuration would be 70% faster than the 2-node configuration, which seems unbelievable.
· The base frequency of the 4-core CPU is only 10% higher, but its memory bandwidth per core is also 50% higher. Is that enough to explain such a difference?
· I assumed scalability is perfect because we’ll be using InfiniBand, but even if inter-node scalability is 90%, the gap will still be huge.
· The only drawback of the 4-node configuration is the total CPU power dissipation, which is double that of the 2-node configuration.
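
To make the scalability remark concrete, here is a rough sketch of how the 70% figure shrinks under imperfect inter-node scaling. It assumes a simple model that is not from the attachment: a flat efficiency factor is applied only to the 4-node cluster, which is the pessimistic case for it. The function name `effective_speedup` is just illustrative.

```python
def effective_speedup(ideal_speedup, internode_efficiency):
    """Scale an ideal speedup by a flat efficiency factor.

    Assumption: the penalty hits only the larger (4-node) cluster,
    so this is a pessimistic estimate for that configuration.
    """
    return ideal_speedup * internode_efficiency


ideal = 1.70  # 4-node vs. 2-node estimate from the cfp2006 comparison

for eff in (1.0, 0.9, 0.8):
    gain = (effective_speedup(ideal, eff) - 1.0) * 100.0
    print(f"efficiency {eff:.0%}: about {gain:.0f}% faster")
```

Under this model, 90% efficiency still leaves the 4-node configuration roughly 53% ahead.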

Does anyone have any advice or comments on this?
Did I miss something in my logic?

Thanks for reading.
Aurelien

flotus1 July 7, 2016 07:47

If you can afford it, my advice would be a 4-node setup.
The reason why it is faster is, as always, memory bandwidth. Four dual-socket nodes deliver twice the memory bandwidth compared to two nodes. This will translate into a huge performance increase in CFX.
In addition to that, the Xeon E5-2637 v4 is the processor with the highest amount of L3-cache per core which also helps.
If money is not an issue at all, you might even consider installing the E5-2667 v4 on all 4 nodes. If you use only 8 cores per node, it has even more L3-Cache per (used) core and about the same clock speed. Keep in mind that the base frequency is not a good indicator for the actual clock speed under load. These processors usually run with higher frequencies even when all cores are used. In fact, the turbo frequency for all cores is 3.6GHz for the E5-2637v4 and 3.5GHz for the E5-2667v4.
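
For reference, the bandwidth and cache argument can be put into numbers with a short sketch. The figures are assumptions taken from Intel's published specifications rather than from this thread: both CPUs have 4 DDR4-2400 memory channels per socket (76.8 GB/s theoretical peak), with 15 MB of L3 on the E5-2637 v4 and 25 MB on the E5-2667 v4. Real sustained bandwidth will be lower than the theoretical peak, and `cluster_summary` is just an illustrative helper.

```python
# Assumed spec: 4 channels * 2400 MT/s * 8 bytes per transfer = 76.8 GB/s per socket
PEAK_BW_PER_SOCKET_GBS = 4 * 2400e6 * 8 / 1e9


def cluster_summary(nodes, sockets_per_node, cores_used_per_socket, l3_mb_per_socket):
    """Return total cores, theoretical peak bandwidth, and per-core shares."""
    sockets = nodes * sockets_per_node
    return {
        "cores": sockets * cores_used_per_socket,
        "total_bw_gbs": sockets * PEAK_BW_PER_SOCKET_GBS,
        "bw_per_core_gbs": PEAK_BW_PER_SOCKET_GBS / cores_used_per_socket,
        "l3_mb_per_core": l3_mb_per_socket / cores_used_per_socket,
    }


configs = {
    "2 nodes, E5-2667 v4, 8 cores/CPU": cluster_summary(2, 2, 8, 25.0),
    "4 nodes, E5-2637 v4, 4 cores/CPU": cluster_summary(4, 2, 4, 15.0),
    "4 nodes, E5-2667 v4, 4 of 8 cores/CPU": cluster_summary(4, 2, 4, 25.0),
}

for name, s in configs.items():
    print(f"{name}: {s['cores']} cores, "
          f"{s['total_bw_gbs']:.1f} GB/s total, "
          f"{s['bw_per_core_gbs']:.1f} GB/s per core, "
          f"{s['l3_mb_per_core']:.3f} MB L3 per core")
```

All three configurations use 32 cores, but the 4-node ones have twice the aggregate theoretical memory bandwidth of the 2-node one.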

Courageous July 7, 2016 10:39

Thanks Alex.

Sadly money is an issue, so the 4-node setup with E5-2637 v4 will be fine :).
And the E5-2637 4-node setup is only 10% more expensive than the E5-2667 2-node setup, so it is a worthwhile investment.

Do you really believe it will be 70% faster, or should I expect something between 20% and 50% for some reason? (Even 20% would still be great.)

Best Regards

flotus1 July 7, 2016 11:37

It is hard to say exactly how much faster it will be; this also depends on the type of cases you run. But on average the difference in performance will definitely be more than just 20%. Somewhere around 50% should be a conservative estimate.


All times are GMT -4.