Would you suggest Infiniband?
Hello,
We currently have a Cisco cluster made up of 4 nodes. Each node has 2x Intel E5-2630 v3 CPUs, 8 cores each at 2.4GHz, and 64GB of RAM (8x8GB). The nodes are in a Cisco rack which I believe has a 2x10GB/s connection between the nodes. One of the nodes is also used as the headnode. I am looking to upgrade the cluster and have around £20,000 to spend, and I'd like to get the best performance for the money. The cluster is used mostly for a mix of Star-CCM+ and Converge CFD in-cylinder simulations with a detailed chemistry solver. I was thinking of purchasing 4 more nodes with 2x E5-2630 v4 CPUs (10 cores at 2.4GHz) in each node, giving an additional 80 cores, along with 8x8GB of RAM per node. I believe that the additional nodes can be installed in the same Cisco rack and use the 2x10GB/s connection. Do you think that my plan is a sensible way of proceeding? Would we benefit from spending some of the money on an Infiniband connection between the nodes? We are limited to purchasing Cisco hardware. Thanks for your time.
CFD hardware
Hello,
For the price range you have in mind and the tech data you have mentioned, I would recommend getting in touch with Totalsim, www.totalsim.co.uk. They are based in Brackley. My company bought a similar cluster configuration from them, which has worked for two years without a single glitch. Best regards, Borian
Jonny,
You always want to remove bottlenecks where they exist, and in my experience network latency and bandwidth are a common bottleneck. If you want to test just how much, run a job in STAR-CCM+.

Run that job on 16 cores, specifically filling one node completely. I suggest a single-phase simulation with relatively simple physics, as these scale well, and if you keep it relatively light (say 2 million cells) you're not pushing the RAM too hard either.

Then rerun that job on 16 cores again, but this time with 8 cores on one node and 8 cores on another. If you can, fully reserve both nodes so no other jobs run on them. (You could submit a job with 32 cores, i.e. both nodes full, but specify a machine file for STAR-CCM+ so it only runs on 8 cores on each node.)

Time each run. STAR-CCM+ now includes reports that give the simulation or iteration elapsed time, but ideally you'd write a simple Java macro that:
- gets system time
- runs simulation
- gets system time
- computes run time from (end - start) time.

The macro approach reduces overhead and the prospect of additional bandwidth being used across the network.

Do this test a few times in each configuration. As the only difference is running over the network versus running 'locally'*, this should give you an idea of how much of a bottleneck the network is. I'd be surprised if it's none at all. Bear in mind that more complex cases with combustion, where there will be lots of inter-CPU communication, will exacerbate this bottleneck significantly.

In my opinion, if you can get Infiniband, then do so. But as with most things, this is always going to be case/system dependent!

*This treats two CPUs in one node as equivalent to one local run. That's not really the case, but I'm trying to keep this test simple!
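Once you have a few wall-clock times for each configuration, the comparison could be sketched roughly like this in Python. This is only an illustration: the function name `network_overhead` is made up here, and the timings in the lists are placeholder numbers, not measurements from any real cluster. Replace them with your own run times in seconds.

```python
from statistics import mean, stdev


def network_overhead(single_node_s, split_node_s):
    """Compare repeated run times (seconds) of the same 16-core job.

    single_node_s: runs with all 16 cores on one node
    split_node_s:  runs with 8 cores on each of two nodes
    Returns (mean_single, mean_split, fractional_slowdown).
    """
    if not single_node_s or not split_node_s:
        raise ValueError("need at least one timing per configuration")
    t_single = mean(single_node_s)
    t_split = mean(split_node_s)
    # Fractional slowdown attributed to going across the network
    return t_single, t_split, (t_split - t_single) / t_single


# Placeholder timings for illustration only (hypothetical, not measured)
single = [100.0, 102.0, 98.0]
split = [120.0, 118.0, 122.0]

t_single, t_split, overhead = network_overhead(single, split)
print(f"one node:  {t_single:.1f} s (spread {stdev(single):.1f} s)")
print(f"two nodes: {t_split:.1f} s (spread {stdev(split):.1f} s)")
print(f"slowdown across the network: {overhead:.0%}")
```

If the slowdown is no bigger than the run-to-run spread, the test is not telling you much and it is worth repeating with more runs.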