rja October 1, 2013 15:37

New Dell M420 Cluster
We have just added a 24-node cluster using Dell M420 servers. All the servers have 2 Xeon E5-2440 processors, 48GB RAM and an SSD. They also have QDR Infiniband NICs and are interconnected through 2 FDR Infiniband switches. Does anyone out there have experience with a similar setup? We have been running some STAR-CCM+ benchmarks on this new cluster and are seeing the performance drop off quickly when running on 10 or more cores/node. I'm wondering whether this is just a limitation of these servers or whether we have something configured incorrectly. Any help/suggestions will be greatly appreciated.

evcelica October 2, 2013 15:06

You are most likely hitting memory bandwidth bottlenecks when going over 4 cores/processor or 8 cores/node.

rlc113 October 16, 2013 15:11

Are you saying you get close to linear scaling when running up to 9 cores per node (total of 216 cores), and that it only starts to lose performance beyond that?

I am seeing nonlinear behavior on a GigE interconnect when I add a 4th node (6-core i7) to my current setup. I am trying to determine whether it is the network latency or the memory bandwidth.
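One way to put numbers on "nonlinear" is to tabulate speedup and parallel efficiency from measured times per iteration. This is only a minimal sketch: the function name `scaling_table` and the timings in `example` are hypothetical placeholders, not results from either cluster.

```python
def scaling_table(timings):
    """Return (cores, speedup, efficiency) rows from measured timings.

    timings: dict mapping core count -> wall-clock seconds per iteration.
    Speedup is measured relative to the smallest core count in the dict.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        ideal = cores / base_cores          # perfect linear scaling
        rows.append((cores, speedup, speedup / ideal))
    return rows


# Hypothetical timings (seconds per iteration); replace with your own measurements.
example = {6: 120.0, 12: 62.0, 18: 43.0, 24: 41.0}

for cores, speedup, efficiency in scaling_table(example):
    print(f"{cores:3d} cores  speedup {speedup:5.2f}  efficiency {efficiency:6.1%}")
```

Running the same sweep two ways may help separate the two suspects: vary the cores used inside a single node (efficiency lost there points toward memory bandwidth), then fix the cores per node and vary the node count (efficiency lost only there points toward the interconnect).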

TMG November 16, 2013 12:27

You can't expect to scale CFD runs to many nodes using GigE. It has neither the bandwidth nor the latency performance to work well. We've seen it die at 3 nodes, but it depends on what's in the nodes. There are some new low-latency 10GigE interconnects that will get you quite a bit farther, but ultimately you would need to go to Infiniband to scale to very large numbers of nodes. The cost of the low-latency 10GigE hardware is comparable to Infiniband anyway, so I don't quite see the advantage.
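To illustrate why both latency and bandwidth matter, here is a rough sketch of the usual latency-plus-transfer model, time = latency + size / bandwidth. The interconnect figures below are order-of-magnitude assumptions for illustration only, not measurements, and `message_time` is a hypothetical helper name.

```python
# Assumed, order-of-magnitude parameters (NOT measured values).
INTERCONNECTS = {
    "GigE (assumed)": {"latency_s": 50e-6, "bandwidth_Bps": 0.125e9},
    "Infiniband (assumed)": {"latency_s": 2e-6, "bandwidth_Bps": 4e9},
}


def message_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to send one message: fixed latency plus transfer time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Compare a small, a medium and a large message on each assumed interconnect.
for size in (1_024, 65_536, 1_048_576):
    for name, params in INTERCONNECTS.items():
        t = message_time(size, params["latency_s"], params["bandwidth_Bps"])
        print(f"{size:>9d} B  {name:<22s} {t * 1e6:10.1f} us")
```

In this model small messages are dominated by the latency term and large ones by the bandwidth term, so an interconnect that is weak on both loses at every message size.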

With respect to using 10 cores/cpu: it won't work either. In that case you are hitting a memory bandwidth limitation. The latest Intel memory architectures have just about enough bandwidth to keep 8 cores busy. You can't keep adding cores without adding more memory bandwidth, and there is no way to increase memory bandwidth in the current generation of Intel CPUs.
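The saturation argument can be sketched with a very simple model: if the solver is fully memory-bound and each core needs a fixed share of bandwidth, speedup stops growing once the cores together demand more than the socket can deliver. The numbers below are hypothetical, chosen only so that saturation lands at 8 cores as described above. They are not specifications of any particular CPU.

```python
def bandwidth_limited_speedup(cores, per_core_demand_gbs, socket_bandwidth_gbs):
    """Speedup over one core for a fully memory-bound code (simplified model).

    Scaling is linear until the combined demand reaches the socket's
    memory bandwidth, then flat.
    """
    saturating_cores = socket_bandwidth_gbs / per_core_demand_gbs
    return min(cores, saturating_cores)


# Hypothetical values: 5 GB/s needed per core, 40 GB/s available per socket.
PER_CORE_DEMAND_GBS = 5.0
SOCKET_BANDWIDTH_GBS = 40.0

for cores in (2, 4, 6, 8, 10, 12):
    s = bandwidth_limited_speedup(cores, PER_CORE_DEMAND_GBS, SOCKET_BANDWIDTH_GBS)
    print(f"{cores:2d} cores -> speedup {s:4.1f}")
```

Real codes degrade more gradually than this hard cutoff, but the plateau is the point: past saturation, extra cores only wait on memory.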

