
HPC Cluster Configuration Question

September 7, 2018, 10:04   #1
chuckb
New Member
Join Date: Sep 2018
Posts: 1
We’re debating the hardware selection for a ~128-core CFD cluster running ANSYS Fluent. Due to Fluent’s licensing scheme we are core-count limited, so the focus is on per-thread performance. Another quirk of the program is that a single GPU is equivalent to a single CPU core from the licensing perspective, i.e. 120 CPU cores + 8 Tesla V100s would be the same license draw as 128 CPU cores. Our CFD problems range from combustion modeling (~6 million cells, 20 gas species, ~27 total equations) to thermal mixing models (~5 million cells, ~9 equations) to isothermal flow (higher cell counts but only 6 equations). Looking through benchmarks, they unfortunately seem heavily focused on comparing one CPU against another, but for us a 32-core CPU is not equivalent to a 16-core CPU because it would take twice as many licenses.
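For reference, here is roughly how we have been tallying the license math for a few candidate layouts (a quick illustrative sketch in Python with placeholder node counts; the one-license-per-GPU rule is our reading of the scheme described above):

Code:
# Rough license/node arithmetic for a few candidate cluster layouts.
# Assumptions (ours, not vendor-verified): each CPU core and each GPU
# draws one Fluent HPC license; dual-socket nodes with 8-core CPUs.

LICENSE_LIMIT = 128  # total HPC licenses we can draw

def layout(cores_per_cpu, cpus_per_node, gpus_per_node, nodes):
    cpu_cores = cores_per_cpu * cpus_per_node * nodes
    gpus = gpus_per_node * nodes
    licenses = cpu_cores + gpus  # 1 core = 1 license, 1 GPU = 1 license
    return cpu_cores, gpus, licenses

candidates = {
    "8 nodes, 2x 8-core, no GPUs": layout(8, 2, 0, 8),
    "7 nodes, 2x 8-core, 2 GPUs each": layout(8, 2, 2, 7),
    "6 nodes, 2x 8-core, 4 GPUs each": layout(8, 2, 4, 6),
}

for name, (cores, gpus, lic) in candidates.items():
    status = "OK" if lic <= LICENSE_LIMIT else "over the limit"
    print(f"{name}: {cores} cores + {gpus} GPUs = {lic} licenses ({status})")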

In summary our questions are:
  1. Excluding Xeon Platinums, which seem outside our price range, would a Xeon Gold 6144 (3.5 GHz, 8 cores) be the best per-core chip to get? Looking at availability, we may need to consider the 6134 (3.2 GHz, 8 cores) instead.
  2. Assuming we get a 6144 or 6134, what would be a good ratio of CPUs to GPUs considering the type of CFD jobs we run? It’s very difficult to find benchmarks that compare different ratios of those two products. The HPC vendors are suggesting cluster nodes with 2 CPUs per GPU, so with a 6144 or 6134 that would be 16 CPU cores per GPU. However, they don’t have any real benchmarking to support that ratio. There are cost limitations here too; we couldn’t fund 128 GPUs.
  3. Is AVX-512 a significant speed-up, i.e. should we focus on products with AVX-512 (which a Xeon Gold 6144 or 6134 would have)?
  4. Is there any benchmarking comparing the K40 to the P100 to the V100? Even at a 2:1 CPU-to-GPU ratio the GPUs quickly become the dominant cost of the entire cluster, and it’s not clear whether the V100 is worth the premium. Also, I’ve read about issues with RAM size; do we need to pick the maximum RAM option on the cards to prevent that from becoming a bottleneck?
  5. Is there any kind of benchmarking on Intel Omni-Path fabric? The vendors are generally not recommending Omni-Path for a cluster this small.
Thanks for any advice; once we get this built we’d be happy to post some benchmarks.

September 8, 2018, 08:26   #2
flotus1 (Alex)
Super Moderator
Join Date: Jun 2012
Location: Germany
Posts: 3,400
I might not be qualified to answer all of your questions, especially about GPUs. But I can share my opinion on some of them. You should definitely get in touch with Ansys and your hardware vendor before buying so you can hold their feet to the fire in case performance is not optimal.


Quote:
Excluding Xeon Platinums, which seem outside our price range, would a Xeon Gold 6144 (3.5 GHz, 8 cores) be the best per-core chip to get? Looking at availability, we may need to consider the 6134 (3.2 GHz, 8 cores) instead.
Xeon Platinum is only an option if you need more than 4 CPUs in a shared-memory system, so Xeon Gold 61xx is the way to go here since you probably want 2 CPUs per node.
The Gold 6144 is a compelling option thanks to its high all-core turbo (4.1 GHz) and high cache per core.
The Gold 6146 might also be an option; it only costs around $300 more. Since it has 12 cores, you can either use it to scale down the number of nodes or run simulations on 8 of its 12 cores to get slightly better performance per core (see the numbers below).
The 6134 could be used to save some money on CPUs, but given the total cost of the cluster and the licenses, this is probably not a top priority.
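To put a rough number on why fewer active cores per socket helps with these bandwidth-limited solvers, here is a back-of-the-envelope sketch (assuming 6 channels of DDR4-2666 per socket and ignoring cache effects):

Code:
# Rough per-core memory bandwidth, the usual limiter for Fluent.
# Assumption: Skylake-SP Gold 61xx with 6 channels of DDR4-2666 per socket.
channels = 6
bytes_per_transfer = 8      # 64-bit wide DDR4 channel
transfers_per_s = 2666e6    # DDR4-2666

socket_bw = channels * bytes_per_transfer * transfers_per_s / 1e9  # ~128 GB/s

for active_cores in (8, 12, 16):
    print(f"{active_cores} active cores/socket: "
          f"~{socket_bw / active_cores:.0f} GB/s per core")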

Quote:
Assuming we get a 6144 or 6134, what would be a good ratio of CPUs to GPUs considering the type of CFD jobs we run? It’s very difficult to find benchmarks that compare different ratios of those two products. The HPC vendors are suggesting cluster nodes with 2 CPUs per GPU, so with a 6144 or 6134 that would be 16 CPU cores per GPU. However, they don’t have any real benchmarking to support that ratio. There are cost limitations here too; we couldn’t fund 128 GPUs.
We had some benchmarks posted here recently with up to 4 GPUs in a dual-socket node: GPU acceleration in Ansys Fluent. They show no scaling at all for tiny cases, but considerable scaling with 4 GPUs for larger cases.
What you should definitely check first is whether your particular simulations benefit from GPU acceleration at all. Either try it with hardware you already have or contact Ansys.
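If you have access to a machine with one or more GPUs, a quick way to check is to time the same case at a few GPU counts. A minimal sketch of how I would script that, assuming 16 solver processes, batch mode, a journal file bench.jou that loads the case, runs a fixed number of iterations and exits, and the -gpgpu launcher option (verify the exact syntax against the documentation of your Fluent version):

Code:
# Time the same case with 0/1/2/4 GPUs per node and report the speedup.
# Assumptions: 16 solver processes (-t16), no GUI (-g), journal "bench.jou",
# and the -gpgpu=<n> option; check these against your Fluent version.
import subprocess, time

def run(gpus):
    cmd = ["fluent", "3ddp", "-t16", "-g", "-i", "bench.jou"]
    if gpus:
        # note: the process count should be evenly divisible by the GPU count
        cmd.insert(3, f"-gpgpu={gpus}")
    t0 = time.time()
    subprocess.run(cmd, check=True)
    return time.time() - t0

baseline = run(0)
for gpus in (1, 2, 4):
    t = run(gpus)
    print(f"{gpus} GPU(s): {t:.0f} s, {baseline / t:.2f}x vs CPU-only")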

Quote:
Is AVX-512 a significant speed-up, i.e. should we focus on products with AVX-512 (which a Xeon Gold 6144 or 6134 would have)?
AVX-512 is not particularly beneficial for these bandwidth-limited calculations, but since all relevant Skylake-SP CPUs have it anyway, you don't really need to worry about it.
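For a back-of-the-envelope feel for why, compare the socket's peak floating-point rate with and without 512-bit SIMD against its memory bandwidth (nominal clocks, ignoring the lower AVX-512 turbo frequencies):

Code:
# Back-of-the-envelope machine balance for one Gold 6144 socket.
# Assumptions: 8 cores at the 3.5 GHz nominal clock (real AVX-512 clocks
# are lower), 2 FMA units per core, 6 channels of DDR4-2666.
cores, clock = 8, 3.5e9
mem_bw = 6 * 8 * 2666e6 / 1e9                    # ~128 GB/s per socket

peak = {
    "AVX-512": cores * clock * 2 * 8 * 2 / 1e9,  # 2 FMA units x 8 DP lanes x 2 flops/FMA
    "AVX2":    cores * clock * 2 * 4 * 2 / 1e9,  # 4 DP lanes per 256-bit FMA
}
for name, gflops in peak.items():
    print(f"{name}: ~{gflops:.0f} GFLOP/s, "
          f"machine balance ~{gflops / mem_bw:.1f} FLOP/byte")
# Unstructured CFD kernels typically need well under 1 FLOP per byte of
# data traffic, so the memory bus is the limit either way.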

Quote:
Is there any benchmarking comparing the K40 to the P100 to the V100? Even at a 2:1 CPU-to-GPU ratio the GPUs quickly become the dominant cost of the entire cluster, and it’s not clear whether the V100 is worth the premium. Also, I’ve read about issues with RAM size; do we need to pick the maximum RAM option on the cards to prevent that from becoming a bottleneck?
Concerning the VRAM size: if your model doesn't fit into VRAM, you cannot run the case with GPU acceleration at all. Not a bottleneck, simply a no-no.
The K40 is seriously outdated; I don't think anyone still sells these in a new cluster.
Other than that, I have not come across any benchmark comparing the P100 to the V100 for Ansys Fluent GPU acceleration. Sorry.
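If you want a rough sanity check before buying, something like the sketch below is what I would start with. The bytes-per-cell figure is purely a placeholder that you would have to calibrate against one of your own cases (it varies with solver, precision and models), and Ansys can give you firmer numbers:

Code:
# Very rough VRAM sanity check. BYTES_PER_CELL is a PLACEHOLDER assumption,
# not a Fluent specification - calibrate it by running one of your own
# cases and reading the reported GPU memory use.
BYTES_PER_CELL = 4000  # assumed

def check(cells, vram_gb_per_card, gpus_per_node):
    # The AMG data can be spread over the GPUs within a node.
    need_gb = cells * BYTES_PER_CELL / 1e9
    return need_gb, need_gb <= vram_gb_per_card * gpus_per_node

for cells in (5e6, 6e6):
    for vram, n in ((16, 1), (16, 2), (32, 2)):
        need, ok = check(cells, vram, n)
        print(f"{cells/1e6:.0f}M cells on {n}x {vram} GB: need ~{need:.0f} GB "
              f"-> {'fits' if ok else 'does not fit'}")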

Quote:
Is there any kind of benchmarking on Intel omni-path fabric? The vendors are generally not recommending omni-path fabric for a cluster this small.
Apart from Intel's own benchmarking, InfiniBand seems to perform better in all the benchmarks I have seen, so I would choose IB for the node interconnect. There is a reason why it is the de facto standard for HPC clusters.

Last edited by flotus1; September 9, 2018 at 05:51.
