CFD Online Discussion Forums

Hardware forum thread: https://www.cfd-online.com/Forums/hardware/213328-2990wx-falls-far-behind-dual-way-e5-2696v4-system-cfd-simulations.html

bravebear December 21, 2018 21:31

2990WX falls far behind a dual E5-2696 v4 system in CFD simulations
 
The AMD platform is:
CPU: AMD Ryzen Threadripper 2990WX (32 cores / 64 threads)
RAM: 64GB of DDR4-2400 (16GB*4)
SSD: Intel 960P 512GB (NVMe)
GPU: GTX 1080Ti

The Intel platform is:
CPU: 2x Intel Xeon E5-2696 v4 (22 cores / 44 threads per CPU)
RAM: 128GB of DDR4-2400 registered ECC (16GB*8)
SSD: Samsung 860 EVO (SATA)
GPU: GTX 1060Ti

Both platforms run Windows 10 Pro.

The 2990WX has 4 dies. Two of them (die 0 and die 2) connect to the RAM directly, and the other two access the RAM through die 0 and die 2 respectively. The PCWorld review indicated that the per-core bandwidth of the 2990WX is only 2GB/s when all cores are used, so noticeable memory-access delays would be expected in that situation. The core-to-core bandwidth when using 2 dies (16 cores) is 5GB/s.

In my case, I used 16 cores (32 threads) on both platforms to solve CFD cases of 3M and 5M elements in SimVascular, and the Intel system was almost 3 times faster than the AMD one. I also ran the simulations under Ubuntu 18.04 on the AMD platform, and it still fell far behind the Intel one. :eek:

What would help improve the performance of the 2990WX platform? Would faster RAM (DDR4-2666 or 3600) give a noticeable performance boost? Or should I populate all 8 DIMM slots?

Suggestions are appreciated!

-----------------
Link to PCWorld's review:
https://www.pcworld.com/article/3298...rformance.html

RobertB December 22, 2018 09:14

The Intel system has at least twice the bandwidth of the AMD system, plus all of its CPUs are directly linked to memory.
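As a rough check of the 2x figure: theoretical peak DRAM bandwidth is channels × transfer rate (MT/s) × 8 bytes per 64-bit transfer. The Python sketch below assumes each system runs one DIMM per channel at DDR4-2400 (4 channels on the 2990WX, 4 per socket on the dual E5-2696 v4) and ignores sustained efficiency, which in practice is well below peak.

```python
def peak_bandwidth_gbs(channels: int, mt_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    Each DDR4 channel moves 8 bytes per transfer, so MT/s x 8 gives MB/s.
    """
    return channels * mt_per_s * bytes_per_transfer / 1000


# Assumption: one DIMM per channel, all running at DDR4-2400.
amd_2990wx = peak_bandwidth_gbs(channels=4, mt_per_s=2400)          # 4x16GB, 1 socket
dual_e5_2696v4 = peak_bandwidth_gbs(channels=2 * 4, mt_per_s=2400)  # 8x16GB, 2 sockets

print(f"2990WX:        {amd_2990wx:.1f} GB/s")
print(f"2x E5-2696 v4: {dual_e5_2696v4:.1f} GB/s")
print(f"ratio:         {dual_e5_2696v4 / amd_2990wx:.1f}x")
```

This prints 76.8 GB/s versus 153.6 GB/s, a 2.0x ratio before even accounting for the two 2990WX dies that have no local memory controller.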

It is unclear why you would expect a sub-$2000 processor to be competitive with two $4000 processors.

For CFD you should be using an Epyc processor, as it has a much higher-bandwidth memory system and a better internal architecture to use that system. Threadrippers are for high-CPU/low-memory-bandwidth tasks, and CFD isn't one of those.

In terms of what you can do:

Turn off hyperthreading, as it is rarely helpful in CFD applications.

Set core affinity for the processes so the code actually uses the cores attached to the memory and the threads use the cache more efficiently.

On the Intel system, if the BIOS has a cache mode setting, you can try changing that. IIRC, AnandTech found that 'local' was 20% better on OpenFOAM.

flotus1 December 22, 2018 10:16

First things first: even when configured correctly the TR2990WX will be much slower than the dual-Intel system in parallel CFD.

What you need to change to get better results:
  • disable SMT
  • disable the 2 dies that have no direct memory path
  • -OR- pin your CFD code to the 16 cores on dies with a memory controller
  • tweak memory speed and timings.
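On Linux (the first post also tried Ubuntu 18.04), the "pin to the dies with a memory controller" option can be scripted. A minimal Python sketch, assuming the kernel exposes the 2990WX dies as NUMA nodes under /sys/devices/system/node and that the two memory-less dies report a MemTotal of 0 kB. `pin_current_process` restricts the calling process with `os.sched_setaffinity`; solver processes started from it afterwards inherit that mask. The node layout in the demo is hypothetical (SMT disabled, 8 cores per die). From the shell, `numactl --cpunodebind` together with `--membind` serves the same purpose. None of this applies to Windows.

```python
import os
from pathlib import Path

NODE_ROOT = Path("/sys/devices/system/node")


def parse_cpulist(text: str) -> list[int]:
    """Parse a Linux cpulist string such as '0-7,16-23' into CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_numa_nodes(root: Path = NODE_ROOT) -> dict[int, tuple[str, int]]:
    """Map NUMA node id -> (cpulist, MemTotal in kB), read from Linux sysfs."""
    nodes: dict[int, tuple[str, int]] = {}
    for node_dir in sorted(root.glob("node[0-9]*")):
        node_id = int(node_dir.name[len("node"):])
        cpulist = (node_dir / "cpulist").read_text()
        mem_kb = 0
        for line in (node_dir / "meminfo").read_text().splitlines():
            # Lines look like: "Node 0 MemTotal:       32768000 kB"
            if "MemTotal:" in line:
                mem_kb = int(line.split()[-2])
        nodes[node_id] = (cpulist, mem_kb)
    return nodes


def cpus_near_memory(nodes: dict[int, tuple[str, int]]) -> list[int]:
    """CPU ids on NUMA nodes that have their own memory (MemTotal > 0)."""
    selected: list[int] = []
    for cpulist, mem_kb in nodes.values():
        if mem_kb > 0:
            selected.extend(parse_cpulist(cpulist))
    return sorted(selected)


def pin_current_process() -> list[int]:
    """Restrict this process (and children started later) to memory-local CPUs."""
    cpus = cpus_near_memory(read_numa_nodes())
    os.sched_setaffinity(0, cpus)  # Linux only; the mask is inherited across fork/exec
    return cpus


# Hypothetical 2990WX layout with SMT off: dies 0 and 2 own all the memory.
example = {
    0: ("0-7", 32 * 1024 * 1024),
    1: ("8-15", 0),
    2: ("16-23", 32 * 1024 * 1024),
    3: ("24-31", 0),
}
print(cpus_near_memory(example))
```

The demo only prints the 16 CPU ids (0-7 and 16-23) that would be selected; calling `pin_current_process()` from a small launcher script before starting the solver is what actually applies the restriction.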

It has been mentioned quite a few times on this forum: the TR2990WX is a pretty sub-optimal CPU, especially for CFD and similar workloads. So I kind of hope that you bought it mainly for a different kind of application.

Quote:

Would faster RAM (DDR4-2666 or 3600) give a noticeable performance boost? Or should I populate all 8 DIMM slots?

Faster memory definitely helps. Filling all 8 slots, on the other hand, will only decrease the maximum memory frequency you can reach and thus limit performance.
If this machine is mainly for CFD, here is what I would recommend: sell the CPU and RAM and get a TR2950X instead, with some really fast RAM certified to run with TR CPUs: 4x16GB DDR4-3200 at the very least. Or, if 5M cells is a typical problem size for you, 4x8GB might be enough.
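One way to see why a 16-core TR2950X with faster RAM can suit CFD better is to divide theoretical peak bandwidth by the number of cores sharing it. A sketch, assuming quad-channel operation with one DIMM per channel and that the memory actually runs at its rated speed; real sustained bandwidth is lower, and latency is ignored entirely.

```python
def per_core_bandwidth_gbs(channels: int, mt_per_s: int, cores: int) -> float:
    """Theoretical peak bandwidth per core in GB/s (8 bytes per transfer per channel)."""
    total_gbs = channels * mt_per_s * 8 / 1000
    return total_gbs / cores


# (label, memory channels, DDR4 speed in MT/s, cores running the solver)
configs = [
    ("2990WX, DDR4-2400, 32 cores", 4, 2400, 32),
    ("2990WX, DDR4-2400, 16 cores", 4, 2400, 16),
    ("2950X,  DDR4-3200, 16 cores", 4, 3200, 16),
]

for label, channels, speed, cores in configs:
    print(f"{label}: {per_core_bandwidth_gbs(channels, speed, cores):.1f} GB/s per core")
```

This gives 2.4, 4.8 and 6.4 GB/s per core. The 32-core figure is in the same range as the roughly 2GB/s per core from the PCWorld review cited in the first post.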

bravebear December 23, 2018 10:11

Quote:

Originally Posted by RobertB (Post 719895)

Hi, Robert

Thanks for the suggestions! The AMD platform was initially built for machine learning tasks. I only occasionally use it for CFD analysis, and I didn't realize the role of memory bandwidth until I read the posts in this forum.

I traced core utilization in Windows 10, and the latest update seems to have optimized task distribution: when using 16 cores, the cores on the dies with a direct RAM link were the ones being used. However, as you mentioned, performance is still very poor compared with the dual-Xeon system.

bravebear December 23, 2018 10:26

Quote:

Originally Posted by flotus1 (Post 719902)

Hi, Alex

I turned off SMT and ran some simple benchmarks in SimVascular. The sweet spot of the 2990WX for a 5M-element model is 16 cores; adding more cores doesn't help. The simulation of this model takes around 10GB of RAM. With 16 cores, it took around 450s to finish on the AMD machine, while the Intel one took only 180s. Using 22 cores on the Intel platform, it took only around 100s.
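The numbers above can be put side by side with a little arithmetic. A sketch that uses only the reported timings and the ~10GB footprint, and assumes (hypothetically) that memory use scales linearly with element count; the `reserve_gb` headroom for the OS and other processes is an illustrative guess, not a measured value.

```python
# Wall-clock times reported above for the 5M-element SimVascular model (seconds).
amd_16_cores = 450.0
intel_16_cores = 180.0
intel_22_cores = 100.0

print(f"Intel vs AMD, 16 cores each: {amd_16_cores / intel_16_cores:.1f}x faster")
print(f"Intel, 16 -> 22 cores: {intel_16_cores / intel_22_cores:.2f}x speedup "
      f"for {22 / 16:.3f}x the cores")

# ~10 GB for 5M elements; assumed to scale linearly with mesh size.
GB_PER_MILLION_ELEMENTS = 10.0 / 5.0


def max_elements_millions(ram_gb: float, reserve_gb: float = 4.0) -> float:
    """Rough largest mesh (millions of elements) that fits, leaving reserve_gb free.

    reserve_gb is a hypothetical allowance for the OS and other processes.
    """
    return (ram_gb - reserve_gb) / GB_PER_MILLION_ELEMENTS


for ram in (32, 64):
    print(f"{ram} GB RAM -> about {max_elements_millions(ram):.0f}M elements")
```

At 16 cores the measured gap is 2.5x, and the 22-core run speeds up by 1.8x for only 1.375x the cores. By the same rough estimate, 32GB (4x8GB) leaves room for meshes well above 5M elements.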

As the 2990WX platform is mainly for machine learning tasks, I'm considering a dual Epyc 7301 system. The OpenFOAM benchmarks of this CPU are very impressive. A dual Intel system is far too expensive: my current dual 2696 v4 machine cost me more than 6000 USD, even though the CPUs were second-hand.

Thank you very much!

Sixkillers March 8, 2019 23:30

On Windows, this might be worth trying: https://youtu.be/M2LOMTpCtLA


All times are GMT -4.