
bennn October 25, 2017 05:22

AMD Ryzen Threadripper 1920X vs. Intel Core i7 7820X
 
Hi all,

After all the talk about these two new CPU families, I had the opportunity to build two new workstations, one with each.

AMD Ryzen Threadripper 1920X, 3.5 GHz, 12 cores, 24 threads, 658.25€ in France
http://www.amd.com/fr/products/cpu/a...adripper-1920x

Intel Core i7 7820X, 3.6 GHz, 8 cores, 16 threads, 541.58€ in France
https://www.intel.fr/content/www/fr/.../i7-7820x.html

The motherboard for AMD is 38 euros more expensive, the cooling is 30 euros more expensive, and the power supply needs to be bigger, so 15 euros more expensive. Let's therefore assume an overall cost of 741.25€ for AMD.

Both CPUs were tested with hyperthreading enabled.

They have the exact same memory fitted:
Corsair Vengeance LPX DDR4 kit, 32 GB (4x 8 GB), 3200 MHz, CL16
The memory was more than enough for all cases tested.

And the exact same drives. No overclocking was used.

The results with OpenFOAM are:

Motorbike, simpleFoam (OF v5.0), on 6 cores:

AMD : ExecutionTime = 153.31 s ClockTime = 155 s
Intel : ExecutionTime = 148.6 s ClockTime = 155 s
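
(For reproducibility: this is the standard motorBike tutorial. One way to run it on 6 cores, a sketch assuming the OF v5 tutorial layout; the Allrun script handles meshing and decomposition:)

Code:

$ cp -r $FOAM_TUTORIALS/incompressible/simpleFoam/motorBike .
$ cd motorBike
$ # edit system/decomposeParDict: numberOfSubdomains 6;
$ ./Allrun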

DTCHull, interDyMFoam (OF v1706), on 8 cores:
AMD : ExecutionTime = 56577.9 s ClockTime = 56665 s
Intel : ExecutionTime = 52854.7 s ClockTime = 52888 s

If you compute a "euros × time / core" index (price × DTCHull ClockTime / physical cores), you get:

AMD : 3500244
Intel : 3580385

So it is very close, but AMD is still a good choice.
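
(For reference, the index is price in euros × DTCHull ClockTime in seconds / physical cores; a quick check with bc:)

Code:

$ echo "741.25 * 56665 / 12" | bc -l   # AMD   -> 3500244.27
$ echo "541.58 * 52888 / 8" | bc -l    # Intel -> 3580385.38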

I'd like to add that AMD temperature sensing was messy, with lm-sensors not reading it at first. Once I managed to see the temperatures during the runs, Intel reached 70 °C while AMD stayed at only 50 °C.
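
(One way to poll the temperatures during a run; a sketch assuming the kernel's k10temp driver already supports the chip, which for early Threadripper required a recent kernel:)

Code:

$ sudo modprobe k10temp   # AMD CPU temperature driver
$ watch -n 1 sensors      # refresh the lm-sensors readout every second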

flotus1 October 25, 2017 05:41

Thanks for sharing your results.
However, I am not quite convinced by your metric. So far, the Intel chip (let alone the full platform) costs less and is faster. I would be more interested in a comparison running on the maximum number of physical cores available.
Which exact memory are you using? Did both cases fit in the memory?

bennn October 25, 2017 08:14

Well, my understanding is that, thinking in non-hyperthreaded terms, AMD can do one and a half DTCHull cases in 56000 s, while Intel can do one in 52000 s. Compared to the price paid, I think AMD is at least as efficient.

Oh, and by the way, the motherboard is now 38 euros more expensive for AMD. I should indeed add that.

I've updated my initial post with answers regarding the DIMMs.

I'm open to any feedback or tests that you think make sense.

flotus1 October 25, 2017 08:29

Quote:

Originally Posted by bennn (Post 669135)
AMD can do one and a half DTCHull cases in 56000 s, while Intel can do one in 52000 s. Compared to the price paid, I think AMD is at least as efficient.

:confused::confused::confused:
Because it still has 4 cores left idling? That seems like quite a daring extrapolation. Go ahead and try it, you might be surprised. CFD performance usually does not scale linearly with the number of cores. That's why I would be more interested in a comparison with the full number of physical cores: 12 for AMD, 8 for Intel.

bennn October 25, 2017 08:35

You understand, though, that I can't just increase the number of parallel domains for one chip only, otherwise the results are biased, right?

Is it OK for you if I concurrently launch 2 instances of the same motorbike case on 8 cores on Intel, and 3 on AMD?

flotus1 October 25, 2017 08:45

Biased in which sense? Higher communication overhead due to a larger number of smaller domains? That is exactly why I always prefer a smaller number of faster cores over a larger number of slower cores.
Running several cases concurrently, the results will also be "biased" due to a lack of total memory bandwidth. Plus you need 50% more memory in total if you want to run 50% more cases simultaneously, which increases the hardware cost.
When I need a result, I am interested in how fast my computer can provide it. Avoiding biases caused by parallel efficiencies <100% is usually the least of my worries and sounds more like cherry-picking to me.

lac October 26, 2017 10:52

I'm also interested in some results for these chips with some specific settings:
1. Hyperthreading turned off.
2. All cores are used on both cpus, but for only one job/CPU.
3. Run the parallel threads with affinity set (mpirun -np (number of cores) -bind-to hwthread); see the sketch at the end of this post.

As I have read on this forum many times, and experienced myself, hyperthreading is useless for CFD most of the time.
I think that all cores should be used if possible. Of course it will be biased in some way, but you won't buy hardware with 12 cores to have 4 of them idling.
Lastly, affinity will most likely help the AMD CPU, since due to its architecture it acts like multiple CPUs (considering the higher-latency communication between the different CCXes).
Also, I don't know if the different available instruction sets (AVX2 vs. AVX-512) influence the results, but it's possible that they do.
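
For point 3, something like this is what I have in mind (a sketch in Open MPI syntax; --report-bindings is only there to verify the pinning):

Code:

$ # e.g. 12 ranks on the 1920X with HT off (points 1 and 2 above)
$ mpirun -np 12 -bind-to hwthread --report-bindings simpleFoam -parallel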

bennn October 26, 2017 11:57

Hi all, latest tests :

motorBike on all cores:
AMD : 113s
Intel : 135s

And this is counter-intuitive to me, but using --bind-to hwthread actually makes the computation take twice as long on AMD and 1.5× as long on Intel. Using --bind-to none solves the issue, and is the way to go for several single-threaded jobs.
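
For the concurrent jobs, something like this is what I have in mind (a sketch; run1 and run2 are hypothetical, already meshed and decomposed copies of the case):

Code:

$ # two independent 4-rank jobs sharing the machine, no pinning
$ (cd run1 && mpirun -np 4 --bind-to none simpleFoam -parallel > log.simpleFoam 2>&1) &
$ (cd run2 && mpirun -np 4 --bind-to none simpleFoam -parallel > log.simpleFoam 2>&1) &
$ wait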

RobertB October 27, 2017 06:25

Perhaps a stupid question, but since you appear to have hyperthreading on, did you core-lock to only the physical cores?

If it is half as fast, it looks like you might have locked to both the physical and hyperthreaded cores and left half the physical cores unused.

IIRC (and I may not) you need to lock to every other core: 0, 2, 4, and so on.

We always found core locking worked better on the Xeons, admittedly on dual-processor systems, where a thread being pushed to another core would cause a major loss in cache efficiency.
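
One way to check the logical-to-physical mapping on Linux, and to pin one rank per physical core instead of per hardware thread (a sketch, Open MPI flags):

Code:

$ lscpu -e=CPU,CORE,SOCKET   # HT siblings share the same CORE id
$ mpirun -np 8 --bind-to core --map-by core --report-bindings simpleFoam -parallel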

JBeilke October 30, 2017 08:17

Hi Benoit,

we ran the motorBike case on a Xeon E5-1650 v3 (6-core processor) on 6 cores with hyperthreading turned off, and got:

ExecutionTime = 167.03 s ClockTime = 169 s

How does this compare to your machines, with HT disabled?

Thanks
Jörn

lac October 30, 2017 10:44

Quote:

Originally Posted by RobertB (Post 669393)
Perhaps a stupid question, but since you appear to have hyperthreading on, did you core-lock to only the physical cores?

You can try to run it with -bind-to core if HT was turned on. That would explain why you had this slowdown.
On my WS the results (ClockTime, motorBike case, OF v5):
73s (with -bind-to hwthread)
110s (without it)
The machine is:
Dual Xeon E5-2673 v3 (all-core turbo 2.7 GHz, 12 cores/CPU)
8x 8GB single-rank DIMMs
HT off

bennn November 2, 2017 03:28

OK, so the results with HT off are exactly the same. With HT on, running with 8 or 16 processes on the Intel chip and 12 or 24 on the AMD chip all give the same results as well.

No improvement with any bind-to setting for now.

Testing multiple single-CPU jobs in the next few days.

Simbelmynë November 24, 2017 10:51

Quote:

Originally Posted by lac (Post 669740)
You can try to run it with -bind-to core if HT was turned on. That would explain why you had this slowdown.
On my WS the results (ClockTime, motorBike case, OF v5):
73s (with -bind-to hwthread)
110s (without it)
The machine is:
Dual Xeon E5-2673 v3 (all-core turbo 2.7 GHz, 12 cores/CPU)
8x 8GB single-rank DIMMs
HT off

Just curious: when you make comparisons using different decompositions of the motorbike case, how do you know that you are decomposing the domain similarly? Or is this just an indication of the performance of -bind-to hwthread?

lac November 27, 2017 07:33

I used the same default hierarchical decomposition (with n = (6 4 1)) and the same number of domains. So yes, it shows the 'performance' of process binding.
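
For reference, the relevant entries in system/decomposeParDict look roughly like this (a sketch for the 24-domain runs; delta and order are the usual defaults):

Code:

numberOfSubdomains 24;

method          hierarchical;

hierarchicalCoeffs
{
    n           (6 4 1);   // 6 x 4 x 1 = 24 domains
    delta       0.001;
    order       xyz;
}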

Simbelmynë November 27, 2017 08:25

So do you time just the simpleFoam execution, or everything in the Allrun script?

Using 14 threads on a 7940X (HT enabled), with decomposition (7 2 1), I have done some benchmarks.

Assuming you time only simpleFoam, then:

Code:

$ time mpirun -np 14 -bind-to none simpleFoam -parallel
Gives a real time of 117s.

Code:

$ time mpirun -np 14 -bind-to hwthread simpleFoam -parallel
Yields a real time of 150s.

A simple
Code:

$ time ./Allrun
Results in 157s of real time (this is without -bind-to hwthread).

lac November 27, 2017 08:36

If you use -bind-to hwthread with HT turned on, I guess processes will be bound to the 'HT' cores as well as the 'real' ones. So it may be better to use -bind-to core. I only timed the simpleFoam execution, btw.

The_Sle January 23, 2018 22:07

Hi, and thanks for this and other similar conversations. Buying kit can be a pain without some information beforehand, and this forum eases that pain quite significantly :D

I'd like to add the overclocking capabilities of Skylake-X to this conversation. I recently purchased a 7820X and am running OpenFOAM with it quite successfully. My chip (and pretty much all of them) will run 4.5 GHz on all cores on air cooling with ease. This is of course true (with some limitations) of the i9 chips as well, and the results improve beyond their AMD counterparts.

With 32 GB of 3200 MHz memory, I can run the simpleFoam part of the motorBike tutorial in 121 seconds on 8 threads, which in my mind makes Skylake-X look like better value than Threadripper, for OF use at least, especially considering the disparity in motherboard and cooling costs.

Cheers

JBeilke January 24, 2018 01:01

Thanks for sharing the results. We usually used 6 cores for this benchmark, so it is easier to compare the results.

It would be interesting to see some results from the Epyc for this benchmark.

Simbelmynë January 24, 2018 03:37

Thank you for sharing the OC results. Was that with Allrun or just the solver?

The_Sle January 24, 2018 14:57

On 6 cores it runs in 134 seconds.

Both results are for just the solver, with

Code:

time mpirun -bind-to none -np 6 simpleFoam -parallel

