|
Help with building a WS or mini server (256 physical cores or more)
|
October 14, 2022, 03:08 |
Help with building a WS or mini server (256 physical cores or more)
|
#1 | |
Member
Nguyen Trong Hiep
Join Date: Aug 2018
Posts: 48
Rep Power: 7 |
Hi all,
I want to get a workstation or mini server to run OpenFOAM on 256-512 physical cores. I don't have much experience running in parallel on a cluster (I tested it over 1 Gbps Ethernet and it was 10 times slower). Therefore, I would like to build a workstation-like configuration and run it like a single computer. However, the latest CPUs have only 64 cores per socket and at most 2 sockets per motherboard. Is there any solution to this problem? If not, a mini server would be fine, but I would need support setting up the server. My budget is $200k. I found a configuration on the internet with 4 nodes in 1 rack, each node with 2 CPUs. Do I need to set up a cluster with an InfiniBand connection between the nodes?
October 14, 2022, 05:12 |
|
#2 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46 |
The only reasonable way to run this many threads is with a cluster. Even if there were CPUs available with 128 or 256 cores, you would not get what you expect. Scaling at such high core counts within a single CPU is pretty flat. E.g. going from 32 cores to 64 cores, you might expect to double the performance, but all you get is maybe a 10-15% increase. Shared CPU resources, especially memory bandwidth, create a bottleneck.
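To make the flattening concrete, here is a toy model of intra-socket scaling for a memory-bandwidth-bound solver. The saturation point and residual slope are made-up numbers for illustration, not measurements: speedup is linear until the cores collectively saturate the socket's memory bandwidth, after which extra cores add very little.

```python
def toy_speedup(cores, saturation_cores=24, residual_slope=0.05):
    """Toy model of intra-socket scaling for a bandwidth-bound code.

    Linear speedup until `saturation_cores` saturate the memory
    controllers, then only a small residual gain per extra core (from
    caches, latency hiding, etc.). Both parameters are illustrative
    assumptions, not measured values.
    """
    extra = max(0, cores - saturation_cores)
    return min(cores, saturation_cores) + residual_slope * extra

# Doubling the core count buys only a single-digit percent gain in this
# model, nowhere near the +100% you might naively expect.
gain = toy_speedup(64) / toy_speedup(32) - 1
print(f"Doubling 32 -> 64 cores buys only about {gain:.0%}")
```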
See also: General recommendations for CFD hardware [WIP]. What you need is a cluster made from nodes with 2x 32-core CPUs (maybe 48-core CPUs if you want to increase density at the cost of lower maximum performance), plus a shared file system that all nodes can access. For 512 cores total, that's 8 compute nodes. 10 Gigabit Ethernet can work with this relatively low number of nodes, but I would rather opt for a proper HPC interconnect like InfiniBand here. There is just no way around this that would make sense. The vendor you buy these computers from will be happy to assist you in choosing the right adapters and a switch for the node interconnect. Setting that up to run OpenFOAM in parallel requires some research, but it's not rocket science. There are guides online, and you can certainly ask here if you run into problems. Or you can make some room in the $200k budget to hire someone to walk you through the setup.
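For a sense of what the parallel launch eventually looks like on such a cluster, here is a sketch that generates an Open MPI style hostfile for 8 nodes with 64 slots each and prints a typical OpenFOAM launch line. Hostnames and the solver name are placeholders; the exact flags depend on your MPI distribution.

```python
# Sketch: build a hostfile for an 8-node cluster (64 cores per node) and
# print the usual OpenFOAM parallel launch command. Hostnames are
# placeholders for whatever your nodes are actually called.
nodes = [f"node{i:02d}" for i in range(1, 9)]
cores_per_node = 64

with open("hostfile", "w") as f:
    for host in nodes:
        f.write(f"{host} slots={cores_per_node}\n")

total_ranks = len(nodes) * cores_per_node
# The case must first be decomposed into the same number of subdomains
# (numberOfSubdomains in system/decomposeParDict, then run decomposePar).
print(f"mpirun -np {total_ranks} --hostfile hostfile simpleFoam -parallel")
```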
|
October 17, 2022, 03:33 |
|
#3 | |
Member
Nguyen Trong Hiep
Join Date: Aug 2018
Posts: 48
Rep Power: 7 |
Before hiring an HPC consultant, I would like a preliminary configuration, mainly in terms of CPU and memory (I prefer a workstation because it is simple and easy to use). About the memory bandwidth bottleneck: we are currently using an HPC system with E5-2696 v4 CPUs (76.8 GB/s maximum memory bandwidth and 0.7 TFLOPS FP32 across 20 cores). That means we get about 110 GB/s of bandwidth per TFLOPS. With the Epyc 7763 (208 GB/s bandwidth and 5 TFLOPS), we get 41.6 GB/s per TFLOPS. Is this the cause? And what is the optimal amount of memory bandwidth per TFLOPS for CFD? I have another question: will 10 Gigabit Ethernet (1.25 GB/s transfer rate) be the bottleneck when transferring data between nodes?
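The bandwidth-per-TFLOPS arithmetic above checks out; using the spec numbers quoted in the post:

```python
# (peak memory bandwidth in GB/s, peak FP32 TFLOPS), as quoted in the post
specs = {
    "Xeon E5-2696 v4": (76.8, 0.7),
    "Epyc 7763": (208.0, 5.0),
}
for name, (bw, tflops) in specs.items():
    print(f"{name}: {bw / tflops:.1f} GB/s per TFLOPS")
# The Epyc has ~2.7x the bandwidth but ~7x the FLOPS, so the
# bandwidth-to-compute ratio drops from ~110 to ~42 GB/s per TFLOPS.
```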
October 17, 2022, 05:25 |
|
#4 | ||||
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46 |
But again, it is just not feasible at the moment, even if we had CPUs with this many cores. The next generation of Epyc CPUs will have effectively twice the memory bandwidth, allowing twice as many cores to be used somewhat efficiently. But that's still nowhere near 512 cores in a shared-memory system.
These are 32-core CPUs, so 64 cores per node, which makes 8 of these nodes for 512 cores.
Theoretical FLOPS numbers can be tricky, especially when they include AVX instructions. These can only be leveraged by highly optimized codes that happen to be a good fit for vectorization. Real-world CFD codes will never get close to those numbers, even if we remove the memory bottleneck. A much easier rule of thumb: 2-4 cores per memory channel. The Epyc 7543 sits right at the upper limit with 4 cores per memory channel, which is acceptable for open-source codes like OpenFOAM.
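The rule of thumb translated into numbers, assuming 8 DDR4 memory channels per socket for this Epyc generation:

```python
def cores_per_channel(cores, channels=8):
    # Milan-generation Epyc sockets have 8 DDR4 memory channels
    return cores / channels

for model, cores in {"Epyc 7543": 32, "Epyc 7763": 64}.items():
    ratio = cores_per_channel(cores)
    ok = 2 <= ratio <= 4  # the 2-4 cores-per-channel rule of thumb
    print(f"{model}: {ratio:.0f} cores/channel, within rule of thumb: {ok}")
# The 32-core part sits right at the limit; the 64-core part is past it,
# which is why the extra cores buy so little for bandwidth-bound CFD.
```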
What we talked about so far is "intra-node scaling", i.e. how much speedup you get from increasing the number of threads in a single node, and why there is a limit here. The node interconnect determines "inter-node scaling", i.e. how much speedup you get by going from 1 to 2, 4, 8... compute nodes. Ethernet works OK for a low number of nodes that are relatively slow. The more nodes you connect, and the faster they are, the better the interconnect should be. If it's too slow, the same thing happens as within a node when running out of memory bandwidth: you get less-than-ideal inter-node scaling. Imagine you buy 8 of these nodes, but only get maximum performance equivalent to 6 of them, because the node interconnect is too slow. Not good. With 8 of these fairly high-performance nodes, InfiniBand is the way to go. Node interconnects aren't just about sequential transfer rates. Latency is also important, and this is where InfiniBand is way ahead of Ethernet.
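The "buy 8 nodes, get 6 nodes' worth of performance" situation, expressed as parallel efficiency:

```python
def parallel_efficiency(speedup, nodes):
    """Fraction of ideal scaling achieved when `nodes` nodes run
    `speedup` times faster than a single node."""
    return speedup / nodes

# 8 nodes delivering the throughput of 6: 75% efficiency, i.e. a quarter
# of the hardware budget is eaten by a too-slow interconnect.
print(f"{parallel_efficiency(6, 8):.0%}")
```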
October 18, 2022, 05:05 |
|
#5 |
Member
Nguyen Trong Hiep
Join Date: Aug 2018
Posts: 48
Rep Power: 7 |
Many thanks for the information.
It helped me understand a lot. Recently, the new Fluent version can run natively on the GPU (no data transfer between CPU and GPU) for unstructured grids. Is it better to build a configuration that includes GPUs rather than just CPUs, and will OpenFOAM do the same as Fluent? I found RapidCFD, which runs on the GPU similarly to Fluent, but it hasn't been updated in 3 years (I haven't tried it yet).
|
October 18, 2022, 07:10 |
|
#6 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46 |
You will have to ask about the plans for GPU acceleration in the OpenFOAM part of the forum. I assume you read the section on GPU acceleration in the thread I linked earlier: General recommendations for CFD hardware [WIP]
My opinion on GPU acceleration hasn't changed much. Unless you are absolutely sure that what you need OpenFOAM to do benefits enough from GPU acceleration NOW, don't invest in GPUs.
|
October 18, 2022, 07:43 |
|
#7 |
Member
Nguyen Trong Hiep
Join Date: Aug 2018
Posts: 48
Rep Power: 7 |
It's not GPU acceleration. The solver runs natively on the GPU, with no data transfer between CPU and GPU (that transfer is the bottleneck of GPU acceleration).
That is the new 2022R2 release with a limited set of solvers, but I cannot find any paper about native GPU solvers (except LBM solvers).
|
October 18, 2022, 08:02 |
|
#8 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46 |
Terminology isn't the problem here.
It's that codes running on GPUs don't have feature parity with CPU codes. If the current GPU implementation has everything you need, and it actually runs faster: great. But that's something you will have to verify yourself. I get that you want to avoid a cluster because it all seems complicated. But if you want to avoid distributed memory by going multi-GPU with OpenFOAM, you are jumping out of the frying pan into the fire. Last edited by flotus1; October 19, 2022 at 03:52.
|