#1
Member
Join Date: Jul 2011
Posts: 53
Rep Power: 14
Hi,
I would very much appreciate a second set of eyes on my pick of hardware for a 32-core compute setup for Fluent/CFX. My company is exclusively supplied by Dell, and I cannot "build my own," so to speak.

My potential build consists of 2 nodes (Dell Precision 7920), each with:
- 2 x Intel Xeon Gold 6134 - 3.2 GHz - 8C
- 12 x 8 GB DDR4 2666 MHz (ECC) - 96 GB total

So that's 32 cores spread over 4 sockets across 2 separate nodes, with six DIMMs per socket. This appears to hit the sweet spot for a powerful yet affordable setup.

Will I be severely bottlenecked if I connect the two nodes using 10 GbE? Would going to a faster interconnect be advisable?

Thanks
#2
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,344
Rep Power: 45
No objections on the build itself.
Just two comments:
1) Make sure to get dual-rank DIMMs.
2) Consider a quad-socket setup. The Xeon Gold 6134 you picked is actually a good choice for quad-socket thanks to its 3 UPI links. This way you avoid any hassle with node interconnects and maybe even save a few bucks. If you do not need 2 dedicated workstations for whatever reason, I highly recommend quad-socket instead.

If you need a node interconnect, you can always try 10 Gbit Ethernet first and see whether it affects your scaling: run a 16-core job on one machine, then run the same job spread across both machines with 8 cores each, and compare the results (a rough launch sketch follows below). If you want a faster interconnect for only two nodes, all you need are two additional InfiniBand cards, no switch required. I have seen benchmarks where 10 Gbit Ethernet did not impose a serious bottleneck for a small number of nodes, but results may vary. Then again, UPI links (quad-socket) are faster than InfiniBand.
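To make that comparison concrete, here is a minimal sketch of how the two runs could be scripted, assuming Fluent's usual batch options (`-t` for process count, `-cnf=` for a hosts file, `-g` for no GUI, `-i` for a journal file); the hostnames and journal file are placeholders, so adapt them to your installation:

```python
import subprocess
import time

# Placeholder journal file and hostnames -- replace with your own.
JOURNAL = "benchmark.jou"   # reads the case and runs a fixed number of iterations
CASES = {
    "single_node_16_cores": ["node1"] * 16,
    "two_nodes_8_plus_8":   ["node1"] * 8 + ["node2"] * 8,
}

for name, hosts in CASES.items():
    # One line per MPI process in the hosts file.
    hostfile = f"hosts_{name}.txt"
    with open(hostfile, "w") as f:
        f.write("\n".join(hosts) + "\n")

    # Launch Fluent in batch mode; "3ddp" = 3D double precision.
    cmd = ["fluent", "3ddp", f"-t{len(hosts)}", f"-cnf={hostfile}", "-g", "-i", JOURNAL]
    start = time.time()
    subprocess.run(cmd, check=True)
    print(f"{name}: {time.time() - start:.1f} s wall clock")
```

Total wall clock is a crude metric; comparing the solver's own per-iteration timings from the transcript is more informative, but the relative difference between the two layouts is what matters here.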
#3
Member
Join Date: Jul 2011
Posts: 53
Rep Power: 14
I've looked into a quad-socket setup from Dell, but it actually works out to approximately twice the price (!) of two dual-socket machines. So I could get four dual-socket machines for the price of one quad-socket machine. I think the reason for this is that the dual-socket machine falls within Dell's Precision line, whereas the quad-socket sits in the more expensive PowerEdge line.

As for the interconnect, our server room is already set up with 10 Gbit Ethernet, and we've got spare cards and switches available. So I will probably do as you say and start off with that and see how it performs.

When it comes to single- vs dual-rank memory, this I am unsure about! Dell's online configurator doesn't mention rank, it just states "96 GB (12 x 8 GB) 2666 MHz DDR4 RDIMM ECC". I will contact our Dell sales rep and ask.

Thanks again!
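If the sales rep can't tell you, one way to check the rank of the installed DIMMs after delivery (on a Linux node, with root access) is to read the SMBIOS memory tables via `dmidecode`; a minimal sketch, assuming the BIOS actually populates the "Rank" field:

```python
import re
import subprocess

# Parse "Memory Device" entries from the SMBIOS tables (requires root).
# A populated DIMM should report "Rank: 2" for dual-rank, "Rank: 1" for single-rank.
out = subprocess.run(["dmidecode", "-t", "memory"],
                     capture_output=True, text=True, check=True).stdout

for device in out.split("Memory Device")[1:]:
    size = re.search(r"^\s*Size:\s*(.+)$", device, re.M)
    rank = re.search(r"^\s*Rank:\s*(.+)$", device, re.M)
    slot = re.search(r"^\s*Locator:\s*(.+)$", device, re.M)
    if size and "No Module" not in size.group(1):
        print(f"{slot.group(1) if slot else '?'}: "
              f"{size.group(1)}, rank = {rank.group(1) if rank else 'not reported'}")
```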
#4
Member
Join Date: Jul 2011
Posts: 53
Rep Power: 14
I have 2 x HPC Packs, so I can run on 32 simultaneous cores. We will not be purchasing additional HPC Packs in the foreseeable future.
Will I see a performance hit in terms of network and I/O activity if I'm running CFD calculations on *all* my available physical cores? Does, for example, an InfiniBand/10 GbE connection require dedicated CPU resources in order to function well while running a simulation?

I'm considering going for dual 12-core CPUs (Intel Xeon Gold 6136 or 6146) instead of dual 8-core Xeon Golds, which would mean I have spare compute power even when fully utilizing my 32-core HPC license. So I would be running on 32 out of the 48 available cores, leaving cores free for network and I/O duties. Are there specific downsides to this? Granted, the 6146 is rather expensive, but so be it. I could also consider the Xeon Gold 6144 - 8C 3.5 GHz, but then I wouldn't have any spare compute power again.

These are my options, I think:
- 2 x Intel Xeon Gold 6134 - 8C 3.2 GHz (Turbo to 3.7 GHz when running on 8 cores), 8 MiB L2 cache
- 2 x Intel Xeon Gold 6136 - 12C 3.0 GHz (Turbo to 3.6 GHz when running on 8 cores), 12 MiB L2 cache
- 2 x Intel Xeon Gold 6144 - 8C 3.5 GHz (Turbo to 4.1 GHz when running on 8 cores), 8 MiB L2 cache
- 2 x Intel Xeon Gold 6146 - 12C 3.2 GHz (Turbo to 4.0 GHz when running on 8 cores), 12 MiB L2 cache

Ignoring the price differences, and assuming my memory configuration is optimal and the same for all CPUs, how would I get the best performance out of my 32-core HPC license?

Last edited by SLC; November 27, 2017 at 14:26.
#5
Senior Member
Join Date: Mar 2009
Location: Austin, TX
Posts: 160
Rep Power: 17
You might see a slight performance hit using every core, but not one large enough that spending an additional $4000 would make sense.
If you've got more money to spend, why not get 3 machines with the 6-core Xeon Gold 6128? That would be significantly faster than either of your suggestions, because you'd have 50% more memory bandwidth and a little more cache, and you'd still get the small bump from having unused cores. I wouldn't even bother with Ethernet; second-hand FDR InfiniBand equipment costs next to nothing and configuration is trivial.

Edit - Also, you shouldn't be worried about how much L2 cache you are getting. L2 cache is private to each core, and it's the same per core for every model. No matter what, if you're running on 32 cores, you're utilizing 32 MB of L2 cache; the rest sits idle. Additional L3 cache would be a benefit, however, since it is shared.
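For a rough sense of where the "50% more memory bandwidth" comes from, here is a back-of-the-envelope sketch; the per-socket figure is simply 6 channels x 2666 MT/s x 8 bytes (~128 GB/s peak) and ignores real-world efficiency, so treat the numbers as relative rather than absolute:

```python
# Peak DDR4-2666 bandwidth per Skylake-SP socket, and per solver rank for a
# fixed 32-core HPC licence spread over different numbers of sockets.
CHANNELS = 6            # memory channels per Skylake-SP socket
RATE = 2666e6           # DDR4-2666 transfers per second
WIDTH = 8               # bytes per transfer (64-bit channel)

per_socket = CHANNELS * RATE * WIDTH / 1e9   # ~128 GB/s

configs = {
    "2 nodes, 4 sockets (8C or 12C CPUs), 32 ranks": 4,
    "3 nodes, 6 sockets (6C Xeon Gold 6128), 32 ranks": 6,
}

for name, sockets in configs.items():
    total = sockets * per_socket
    print(f"{name}: {total:.0f} GB/s total, {total / 32:.1f} GB/s per rank")
```

Six sockets instead of four means roughly 768 GB/s vs. 512 GB/s of peak bandwidth feeding the same 32 ranks, i.e. the 50% figure above.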
#6
Member
Join Date: Jul 2011
Posts: 53
Rep Power: 14
Yeah, I'm going for FDR InfiniBand. I'll have to buy it new directly from Dell, but they gave me a decent quote on a pair of Mellanox ConnectX-3 VPI FDR InfiniBand cards.

I didn't realise that about L2 and L3 cache - thanks for the pointer.

The delta in terms of $$ for me in going from the Xeon Gold 6134 to the other options:
- 4 x Intel Xeon Gold 6134: --------
- 4 x Intel Xeon Gold 6136: + $1000
- 4 x Intel Xeon Gold 6144: + $3200
- 4 x Intel Xeon Gold 6146: + $5000

Changing the setup into a three-node build with the Xeon Gold 6128 is prohibitively expensive, unfortunately. It would be an extra ~$15,000 USD overall, even after considering the lower price point of the CPUs themselves. Having to get 12 sticks of RAM for each node makes it pricey! I'd also have to get an InfiniBand switch (I think?).

So perhaps the question is: should I go for the 6134 (8C) or the 6136 (12C)?
#7
Member
Join Date: Jul 2011
Posts: 53
Rep Power: 14
Is there any point in going for a 12C CPU in order to ensure there is spare processing power when running a simulation on 8 of those cores, compared to an 8C CPU where all 8 cores will be running at 100% load? The base clock of the 8C CPU is 3.2 GHz, whereas for the 12C CPU it is 3.0 GHz.
#8
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,344
Rep Power: 45
Yeah, base clocks... nobody except Intel knows what turbo frequencies these CPUs will actually run at in non-AVX or mildly-AVX workloads. An educated guess would be that the 12-core CPU with only 8 cores loaded will run at similar or even higher clock speeds than the 8-core CPU, thanks to the higher TDP (150 W vs. 130 W).
https://www.microway.com/knowledge-c...e-family-cpus/

I don't think having idle CPU cores to handle communication will be much of a benefit. But then again, the price difference is "only" $1000, which should be small compared to the total system cost, and you get the CPUs with potentially higher clock speeds.

Intel's Skylake-SP lineup seems to be missing a comparable 10-core CPU, which is kind of weird given that it consists of at least 58 different CPUs.
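Since the spec sheets don't pin this down, the actual clocks are easy enough to measure once the machine is on the bench; a minimal sketch for Linux that samples the per-core frequencies from `/proc/cpuinfo` while the solver is loaded (turbostat gives more detail if you have it, and the sampling interval here is arbitrary):

```python
import time

def core_clocks():
    """Current 'cpu MHz' reading for each logical core from /proc/cpuinfo."""
    clocks = []
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("cpu MHz"):
                clocks.append(float(line.split(":")[1]))
    return clocks

# Sample for about a minute while the CFD job is running and report the spread.
for _ in range(12):
    clocks = core_clocks()
    print(f"min {min(clocks):.0f} MHz / mean {sum(clocks) / len(clocks):.0f} MHz / "
          f"max {max(clocks):.0f} MHz")
    time.sleep(5)
```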
#9
Member
Join Date: Jul 2011
Posts: 53
Rep Power: 14
Think I've landed on the Xeon Gold 6144.
#10
Senior Member
Micael
Join Date: Mar 2009
Location: Canada
Posts: 156
Rep Power: 17
Would be great if you could post a few benchmarks once you have your system running.