CFD Online Discussion Forums


crpvn December 28, 2019 03:58

Performance problems on AMD Epyc cluster
 
Dear All!

At my workplace we have a new AMD-based cluster for running OpenFOAM 19.06 on steady-state incompressible turbulent simulations, with meshes of more than 40 million cells:
- 2xAMD Epyc 7702 (2x 64 cores);
- ram 256 GB DDR4;
- hard disk RAID 5
- CentOS 7.7

We are now having problems when many cores are used simultaneously. As a benchmark, I ran a single-core simpleFoam case with an airfoil mesh (500'000 tetra cells) on several cores at the same time.

With 4 cores in use, the test takes about 1200 s; with 128 cores in use, about 4 hours. We also noticed that single-core performance varies widely from core to core.
The spread in run times between cores grows as more cores are used.

In your opinion, what could cause such differences in single-core performance?

We also ran a simpleFoam case with a mesh of about 15 million cells for 50 iterations. The test takes 660 s on 16 cores and 600 s on 32 cores.
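
To put the two timings above in perspective, here is a minimal sketch of the usual speedup and parallel-efficiency arithmetic. The function names are hypothetical; only the 660 s and 600 s figures and the core counts come from the post.

```python
def speedup(t_ref: float, t_new: float) -> float:
    """Speedup of the new run relative to the reference run."""
    return t_ref / t_new


def parallel_efficiency(t_ref: float, n_ref: int, t_new: float, n_new: int) -> float:
    """Measured speedup divided by the ideal speedup (n_new / n_ref)."""
    return speedup(t_ref, t_new) / (n_new / n_ref)


# Timings quoted in the post: 15 million cell case, 50 iterations.
s = speedup(660.0, 600.0)
e = parallel_efficiency(660.0, 16, 600.0, 32)

print(f"speedup 16 -> 32 cores: {s:.2f}x")  # 1.10x
print(f"parallel efficiency:    {e:.0%}")   # 55%
```

Doubling the core count gave roughly a 1.1x speedup, i.e. about 55% of ideal scaling over that step.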

We also ran the same tests on an Intel cluster with 2x Intel Xeon Gold (28 cores in total).
In the first test, run times were very similar across all cores used.
The second case (15 million tetra cells) takes about 400 s on 28 cores.

For now we are disappointed, because we had read about excellent multi-core performance on AMD Epyc.

Does anyone have experience with OpenFOAM scalability and performance on AMD Epyc 7002?

Thank you very much!

flotus1 December 28, 2019 07:23

So far, there is one Epyc Rome result in the benchmark thread. It took first place as far as dual-socket systems are concerned.
https://www.cfd-online.com/Forums/ha...tml#post747857

So in theory, such a system can be fast in OpenFOAM. In practice, performance can depend on a lot of factors. A few things you should check:
- Use test cases that are large enough. 500k cells is definitely too small for 128 cores.
- Disable SMT in the BIOS.
- Make sure the CPU clock speed is in the proper range when the system is under load, e.g. using turbostat.
- Check the memory configuration. You need 16 DIMMs of DDR4-3200, populated in the correct DIMM slots.
- Check how the system distributes the threads across the cores, e.g. using htop.
- You can also try a newer operating system. CentOS 8 finally switched to a 4.x kernel version, which might be better for bleeding-edge hardware like yours.
- And last but not least: adjust expectations. I would not expect much scaling beyond 64 cores, due to memory bandwidth limitations.
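
A rough back-of-the-envelope sketch of the memory bandwidth point, assuming all 16 DIMMs are DDR4-3200 and every one of the 8 channels per socket is populated. These are theoretical peak figures; sustained bandwidth in a real solver run will be lower, and the function name is hypothetical.

```python
CHANNELS_PER_SOCKET = 8    # Epyc 7002 (Rome): 8 DDR4 memory channels per socket
TRANSFERS_PER_S = 3200e6   # DDR4-3200: 3200 mega-transfers per second
BYTES_PER_TRANSFER = 8     # 64-bit wide channel
SOCKETS = 2


def peak_bandwidth_gb_s(sockets: int = SOCKETS) -> float:
    """Theoretical peak memory bandwidth of the whole node in GB/s."""
    return sockets * CHANNELS_PER_SOCKET * TRANSFERS_PER_S * BYTES_PER_TRANSFER / 1e9


total = peak_bandwidth_gb_s()
print(f"node peak: {total:.1f} GB/s")
for cores in (16, 32, 64, 128):
    # Bandwidth share if all active cores stream from memory at once.
    print(f"{cores:4d} cores: {total / cores:6.1f} GB/s per core")
```

The per-core share halves with every doubling of the core count, which is why a bandwidth-bound solver tends to stop scaling well before all 128 cores are busy.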

Edit: also, "hard disk RAID5"... do your timing checks include meshing and I/O times, or do you only look at solver times?

crpvn December 30, 2019 07:14

Thank you for your answer! I'll check those.

I used 500k cells because I ran that case on a single core, n times simultaneously.
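
For reference, a minimal sketch of how n simultaneous single-core runs could be pinned to separate logical CPUs with `taskset`. This is an assumption about the setup, not necessarily how the runs above were launched; the case directory names and the helper function are hypothetical. The sketch only builds and prints the command lines, it does not start any solver.

```python
def pinned_commands(case_dirs, first_core=0):
    """Build one 'taskset -c <cpu> simpleFoam -case <dir>' command per case."""
    commands = []
    for offset, case in enumerate(case_dirs):
        cpu = first_core + offset  # logical CPU id, as shown by htop or lscpu
        commands.append(["taskset", "-c", str(cpu), "simpleFoam", "-case", case])
    return commands


# Hypothetical copies of the 500k-cell airfoil case, one per run.
cases = [f"airfoil_{i:03d}" for i in range(4)]
for cmd in pinned_commands(cases):
    print(" ".join(cmd))
```

With SMT enabled, two logical CPU ids can belong to the same physical core, so it is worth checking the topology before choosing which ids to pin to.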

My timing checks include only solver times.

crestang February 17, 2020 08:50

Disabled SMT in the BIOS and everything is ok now!
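
One way to confirm from the running system that SMT is really off is to look at the "Thread(s) per core" line of `lscpu`. The following is a small sketch under that assumption; the function names are hypothetical and the sample text is illustrative, not output captured from the cluster in this thread.

```python
import subprocess


def threads_per_core(lscpu_text: str) -> int:
    """Extract the 'Thread(s) per core' value from lscpu output."""
    for line in lscpu_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Thread(s) per core":
            return int(value.strip())
    raise ValueError("no 'Thread(s) per core' line found")


def read_lscpu() -> str:
    """Run lscpu on the local machine and return its text output."""
    result = subprocess.run(["lscpu"], capture_output=True, text=True, check=True)
    return result.stdout


# Illustrative sample only; on a real node use read_lscpu() instead.
SAMPLE = """\
Architecture:          x86_64
CPU(s):                128
Thread(s) per core:    1
Core(s) per socket:    64
Socket(s):             2
"""

smt_on = threads_per_core(SAMPLE) > 1
print("SMT appears to be", "enabled" if smt_on else "disabled")
```

A value of 1 means each physical core exposes a single hardware thread, which is what you want after disabling SMT in the BIOS.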

