HPC Cluster Configuration Question

September 7, 2018, 10:04
  #1
chuckb
New Member
Join Date: Sep 2018
Posts: 1
We’re debating the hardware selection for a ~128-core CFD cluster running ANSYS Fluent. Because of Fluent’s licensing scheme we are core-count limited, so the focus is on per-thread performance. Another quirk of the program is that a single GPU draws the same license as a single CPU core, i.e. 120 CPU cores + 8 Tesla V100s would be the same license draw as 128 CPU cores. Our CFD problems range from combustion modeling (~6 million cells, 20 gas species, ~27 total equations) to thermal mixing models (~5 million cells, ~9 equations) to isothermal flow (higher cell counts but only 6 equations). Unfortunately, the benchmarks we can find focus heavily on comparing CPU against CPU, but for us a 32-core CPU is not simply better than a 16-core CPU, because it consumes twice as many licenses.
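
For reference, here is the back-of-the-envelope license accounting we have been doing, as a small Python sketch. The one-unit-per-core and one-unit-per-GPU rule is as described above; the dual-socket 6144 node layouts and the 128-unit cap are just assumptions for illustration, not vendor quotes.

Code:
# Sketch of Fluent HPC license accounting under the scheme described above:
# one license unit per CPU core and one per GPU (assumed, not an official quote).
LICENSE_CAP = 128  # total license units we can afford (assumption)

def license_draw(cpu_cores, gpus):
    """One unit per CPU core plus one unit per GPU."""
    return cpu_cores + gpus

# Hypothetical node layouts: dual-socket Xeon Gold 6144 (2 x 8 cores)
# with 0, 1 or 2 GPUs per node.
node_layouts = {
    "2 x 6144, no GPU":   {"cores": 16, "gpus": 0},
    "2 x 6144, 1 x V100": {"cores": 16, "gpus": 1},
    "2 x 6144, 2 x V100": {"cores": 16, "gpus": 2},
}

for name, node in node_layouts.items():
    per_node = license_draw(node["cores"], node["gpus"])
    nodes = LICENSE_CAP // per_node  # whole nodes that fit under the cap
    print(f"{name}: {per_node} units/node -> {nodes} nodes, "
          f"{nodes * node['cores']} cores + {nodes * node['gpus']} GPUs, "
          f"{nodes * per_node}/{LICENSE_CAP} units used")

The point is simply that every GPU we add displaces a CPU core’s worth of license, which is why the CPU-to-GPU ratio matters so much to us.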

In summary, our questions are:
  1. Excluding Xeon Platinums, which are outside our price range, would a Xeon Gold 6144 (3.5 GHz, 8 cores) be the best per-core chip to get? Given availability, we may need to consider the 6134 (3.2 GHz, 8 cores) instead.
  2. Assuming we get the 6144 or 6134, what would be a good ratio of CPUs to GPUs for the type of CFD jobs we run? It is very difficult to find benchmarks comparing different ratios of those two products. The HPC vendors are suggesting cluster nodes with 2 CPUs per GPU, so with the 6144 or 6134 that would be 16 CPU cores per GPU. However, they don’t have any real benchmarking to support that ratio. There are cost limitations here too; we couldn’t fund 128 GPUs.
  3. Is AVX-512 a significant speed-up, i.e. should we focus on products with AVX-512 (which a Xeon Gold 6144 or 6134 would have)?
  4. Is there any benchmarking comparing the K40, P100, and V100? Even at a 2-CPU-to-1-GPU ratio the GPUs quickly become the dominant cost of the entire cluster, and it’s not clear whether the V100 is worth the premium. Also, I have read about issues with GPU memory size; do we need the maximum-memory versions of the cards to keep that from becoming a bottleneck?
  5. Is there any benchmarking of Intel Omni-Path fabric? The vendors are generally not recommending Omni-Path for a cluster this small.
Thanks for any advice; once we get this built we’d be happy to post some benchmark results.

 




