CFD Online (www.cfd-online.com), Hardware forum

AMD Ryzen Threadripper 1920X vs. Intel Core i7 7820X

#1 · October 25, 2017, 05:22 · benoit paillard (bennn)
Hi all,

After all the talk about these two new CPU families, I had the opportunity to build two new workstations, one with each.

AMD Ryzen Threadripper 1920X, 3.5 GHz, 12 cores, 24 threads, 658.25€ in France
http://www.amd.com/fr/products/cpu/a...adripper-1920x

Intel Core i7 7820X, 3.6 GHz, 8 cores, 16 threads, 541.58€ in France
https://www.intel.fr/content/www/fr/.../i7-7820x.html

The AMD motherboard is 38 € more expensive, the cooling 30 € more, and the larger power supply 15 € more, so let's take the effective cost of the AMD option as 658.25 + 38 + 30 + 15 = 741.25 €.

Both CPUs were tested with hyperthreading enabled.

Both machines have exactly the same memory fitted:
Corsair Vengeance LPX PC memory, DDR4, 32 GB kit (4 × 8 GB), 3200 MHz, CL16
The memory was more than enough for all cases tested.

They also use identical drives, and no overclocking was applied.

The OpenFOAM results are:

motorBike case, simpleFoam (OpenFOAM v5.0), on 6 cores:

AMD : ExecutionTime = 153.31 s ClockTime = 155 s
Intel : ExecutionTime = 148.6 s ClockTime = 155 s

DTCHull case, interDyMFoam (OpenFOAM v1706), on 8 cores:
AMD : ExecutionTime = 56577.9 s ClockTime = 56665 s
Intel : ExecutionTime = 52854.7 s ClockTime = 52888 s
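These figures come from the `ExecutionTime = ... s  ClockTime = ... s` line that OpenFOAM solvers print after each time step; the last one in the log is the total. For repeatable comparisons it can help to extract it from the solver log instead of by hand. A minimal Python sketch, assuming the log line format shown above; the function names and the `log.simpleFoam` file name are illustrative assumptions, not part of OpenFOAM:

```python
import re
from pathlib import Path

# Matches the timing line OpenFOAM solvers print, e.g.
# "ExecutionTime = 153.31 s  ClockTime = 155 s"
TIMING = re.compile(
    r"ExecutionTime\s*=\s*([0-9.eE+-]+)\s*s\s+ClockTime\s*=\s*([0-9.eE+-]+)\s*s"
)


def last_timing(text: str) -> tuple[float, float]:
    """Return (ExecutionTime, ClockTime) from the last timing line in a log."""
    matches = TIMING.findall(text)
    if not matches:
        raise ValueError("no ExecutionTime/ClockTime line found")
    execution, clock = matches[-1]
    return float(execution), float(clock)


def last_timing_from_file(path: str) -> tuple[float, float]:
    """Hypothetical wrapper for a solver log file such as 'log.simpleFoam'."""
    return last_timing(Path(path).read_text(errors="replace"))


if __name__ == "__main__":
    # Illustrative excerpt: an earlier (made-up) step, then the AMD line above.
    sample = (
        "ExecutionTime = 1.5 s  ClockTime = 2 s\n"
        "ExecutionTime = 153.31 s  ClockTime = 155 s\n"
    )
    print(last_timing(sample))  # (153.31, 155.0)
```

Taking the last match rather than the first matters because every time step prints a cumulative timing line.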

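If you compute a "euros × time / core" index for the DTCHull case (ClockTime, 741.25 € for the AMD platform vs. 541.58 € for Intel, divided by all 12 vs. 8 physical cores; lower is better), you get: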

AMD : 3500244
Intel : 3580385

So it is very close, but AMD still comes out slightly ahead and remains a good choice.
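The index can be reproduced from the numbers above. A minimal Python sketch, assuming (as the posted values imply) that it uses the DTCHull ClockTime, the platform-adjusted AMD cost of 741.25 € against the 541.58 € Intel CPU price, and each CPU's total physical core count (12 and 8) even though both runs used 8 cores; `Platform` is just an illustrative container:

```python
from dataclasses import dataclass


@dataclass
class Platform:
    name: str
    price_eur: float      # CPU price plus any extra platform cost
    clock_time_s: float   # DTCHull ClockTime from the post
    physical_cores: int   # total physical cores of the CPU (not cores used)

    def index(self) -> float:
        """'euros * time / core': lower is better."""
        return self.price_eur * self.clock_time_s / self.physical_cores


# AMD CPU price plus the quoted motherboard, cooling and PSU surcharges.
amd_price = 658.25 + 38 + 30 + 15  # 741.25

platforms = [
    Platform("AMD Ryzen Threadripper 1920X", amd_price, 56665, 12),
    Platform("Intel Core i7 7820X", 541.58, 52888, 8),
]

for p in platforms:
    print(f"{p.name}: {p.index():,.0f}")
# AMD Ryzen Threadripper 1920X: 3,500,244
# Intel Core i7 7820X: 3,580,385
```

Dividing by total rather than used cores is what credits AMD for its 4 idle cores, which is the point flotus1 questions in the replies.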

I should add that temperature monitoring on the AMD machine was messy, as lm-sensors did not read it. Once I managed to see the temperatures during the runs, the Intel CPU reached 70 °C while the AMD one stayed at only 50 °C.

Last edited by bennn; October 25, 2017 at 08:28. Reason: Changed dimm and temperature info ; added platform cost

#2 · October 25, 2017, 05:41 · Alex (flotus1)
Thanks for sharing your results.
However, I am not quite convinced by your metric. So far, the Intel chip (let alone the platform) costs less and is faster. I would be more interested in a comparison running on the maximum number of physical cores available.
Which exact memory are you using? Did both cases fit in the memory?
__________________
Please do not send me CFD-related questions via PM

#3 · October 25, 2017, 08:14 · benoit paillard (bennn)
Well, my understanding is that, counting only physical cores (no hyperthreading), AMD can do one and a half DTCHull cases in 56,000 s, since it has 12 cores against the 8 used per case, while Intel can do one in 52,000 s. Relative to the price paid, I think AMD is at least as efficient.

Oh, and by the way, the AMD motherboard is currently 38 € more expensive. I should add that indeed.

I've updated my initial post with answers regarding the DIMMs.

I'm open to any feedback or tests that you think make sense.

#4 · October 25, 2017, 08:29 · Alex (flotus1)
Quote (originally posted by bennn):
> AMD can do one and half DTC hull case in 56000 sec, while INTEL can do one of those in 52000s. Compared to the price paid, I think AMD is at least as efficient.

Because it still has 4 cores left idling? That seems like quite a daring extrapolation. Go ahead and try it, you might be surprised. CFD performance usually does not scale linearly with the number of cores. That's why I would be more interested in a comparison using all physical cores: 12 for AMD, 8 for Intel.

#5 · October 25, 2017, 08:35 · benoit paillard (bennn)
You do understand, though, that I can't increase the number of parallel domains for just one chip, otherwise the results are biased, right?

Would it be OK with you if I run 2 copies of the same motorBike case concurrently on 8 cores on Intel, and 3 on AMD?

#6 · October 25, 2017, 08:45 · Alex (flotus1)
Biased in which sense? Higher communication overhead due to a larger number of smaller domains? That is exactly why I always prefer a smaller number of faster cores over a larger number of slower cores.
Running several cases concurrently, the results will also be "biased" by the limited total memory bandwidth. Plus, you need 50% more memory in total to run 50% more cases simultaneously, which increases the hardware cost.
When I need a result, I am interested in how fast my computer can provide it. Avoiding biases caused by parallel efficiencies <100% is usually the least of my worries and sounds more like cherry-picking to me.

#7 · October 26, 2017, 10:52 · lac
I'm also interested in some results for these chips with some specific settings:
1. Hyperthreading turned off.
2. All cores are used on both CPUs, but for only one job per CPU.
3. Run the parallel processes with affinity set (mpirun -np (number of cores) -bind-to hwthread).
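To compare binding policies like these on one machine, a small driver can time the same solver run under each `--bind-to` setting. A minimal Python sketch, assuming Open MPI's `mpirun` (which accepts `--bind-to none|core|hwthread` and `--report-bindings`) and one already meshed and decomposed copy of the case per policy, because rerunning in the same directory may not start from the same initial state; the directory layout and helper names are hypothetical:

```python
import shlex
import subprocess
import time

# Binding policies accepted by Open MPI's --bind-to option (a subset).
POLICIES = ("none", "core", "hwthread")


def mpirun_command(nprocs: int, policy: str, solver: str = "simpleFoam") -> list[str]:
    """Build an Open MPI command line for one parallel solver run."""
    if policy not in POLICIES:
        raise ValueError(f"unknown binding policy: {policy}")
    return [
        "mpirun", "-np", str(nprocs),
        "--bind-to", policy,
        "--report-bindings",  # Open MPI reports where each rank was bound
        solver, "-parallel",
    ]


def time_run(cmd: list[str], case_dir: str, log_name: str) -> float:
    """Run cmd inside case_dir, write its output to a log, return wall time in s."""
    start = time.perf_counter()
    with open(f"{case_dir}/{log_name}", "w") as log:
        subprocess.run(cmd, cwd=case_dir, stdout=log,
                       stderr=subprocess.STDOUT, check=True)
    return time.perf_counter() - start


def sweep(case_prefix: str, nprocs: int) -> dict[str, float]:
    """Time one run per policy.

    Hypothetical layout: <case_prefix>_none/, <case_prefix>_core/, ...
    each already meshed and decomposed, so every run starts identically.
    """
    results = {}
    for policy in POLICIES:
        cmd = mpirun_command(nprocs, policy)
        print("running:", shlex.join(cmd))
        results[policy] = time_run(cmd, f"{case_prefix}_{policy}", "log.simpleFoam")
    return results
```

Calling `sweep("motorBike", 8)` would print each command and return the wall-clock seconds per policy; the `--report-bindings` output in each log shows whether ranks landed on distinct physical cores.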

As I have read on this forum many times, and experienced myself, hyperthreading is useless for CFD most of the time.
I think that all cores should be used if possible. Of course it will be biased in some way, but you won't buy hardware with 12 cores to have 4 idling.
Lastly, affinity will most likely help the AMD CPU, because its architecture makes it act like multiple CPUs (given the higher-latency communication between the different CCXes).
Also, I don't know whether the different instruction sets available (AVX2 vs. AVX-512) influence the results, but it's possible.

#8 · October 26, 2017, 11:57 · benoit paillard (bennn)
Hi all, latest tests:

motorBike on all CPUs:
AMD : 113s
Intel : 135s
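Going from 6 cores to all cores gives a rough parallel efficiency for each chip. A minimal Python sketch, assuming the 113 s and 135 s figures are ClockTimes comparable to the 6-core motorBike runs in the first post (155 s on both) and that "all CPUs" means the 12 and 8 physical cores:

```python
# 6-core ClockTimes from the first post, and the "all CPUs" timings above,
# assumed to be comparable ClockTimes on 12 (AMD) and 8 (Intel) physical cores.
SIX_CORE_S = {"AMD 1920X": 155.0, "Intel 7820X": 155.0}
ALL_CORES = {"AMD 1920X": (113.0, 12), "Intel 7820X": (135.0, 8)}


def scaling(t_ref: float, n_ref: int, t: float, n: int) -> tuple[float, float]:
    """Speedup over the reference run, and efficiency relative to linear scaling."""
    speedup = t_ref / t
    efficiency = speedup / (n / n_ref)
    return speedup, efficiency


for cpu, (seconds, cores) in ALL_CORES.items():
    s, e = scaling(SIX_CORE_S[cpu], 6, seconds, cores)
    print(f"{cpu}: {cores} cores, {s:.2f}x faster than 6 cores, efficiency {e:.0%}")
# AMD 1920X: 12 cores, 1.37x faster than 6 cores, efficiency 69%
# Intel 7820X: 8 cores, 1.15x faster than 6 cores, efficiency 86%
```

Under those assumptions neither chip scales linearly, and AMD's extra cores buy noticeably less per core than Intel's, which fits flotus1's warning about extrapolating from idle cores.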

Now, this is counter-intuitive to me: using --bind-to hwthread actually makes the computation take twice as long on AMD and 1.5 times as long on Intel. Using --bind-to none solves the issue, and is the way to go for several single-threaded jobs.

#9 · October 27, 2017, 06:25 · Robert (RobertB)
Perhaps a stupid question, but since you appear to have hyperthreading on, did you lock the processes to the physical cores only?

If it is half as fast, it looks like you might have locked to both the physical and the hyperthreaded core of each pair, leaving half the cores unused.

IIRC (and I may not), you need to lock to every other core: 0, 2, ...

We always found core locking worked better on the Xeons, admittedly on dual-processor systems, where a thread being pushed to the other core would cause a major loss in cache efficiency.
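Which logical CPU numbers share a physical core depends on how the kernel enumerates hyperthread siblings: on some systems siblings are adjacent (0-1, 2-3), which is the "every other core" case, while on others they are offset by the core count (0 and 8 on an 8-core chip). A minimal Python sketch, assuming Linux's `/sys/devices/system/cpu/cpu*/topology/thread_siblings_list` files, that keeps one logical CPU per physical core; the function names are hypothetical:

```python
from pathlib import Path


def parse_cpu_list(text: str) -> list[int]:
    """Parse a Linux CPU list such as '0,8' or '0-1,4' into integers."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def one_thread_per_core(siblings: dict[int, str]) -> list[int]:
    """Keep the lowest-numbered logical CPU from each set of hyperthread siblings."""
    return sorted({min(parse_cpu_list(s)) for s in siblings.values()})


def read_siblings(root: str = "/sys/devices/system/cpu") -> dict[int, str]:
    """Read thread_siblings_list for every logical CPU (Linux only)."""
    result = {}
    for topo in Path(root).glob("cpu[0-9]*/topology/thread_siblings_list"):
        cpu = int(topo.parent.parent.name[3:])  # "cpu12" -> 12
        result[cpu] = topo.read_text()
    return result


if __name__ == "__main__":
    # Comma-separated list, e.g. usable with "taskset -c <list> ...".
    print(",".join(map(str, one_thread_per_core(read_siblings()))))
```

With Open MPI, `--bind-to core` should give similar per-rank pinning without working out the numbering by hand.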

#10 · October 30, 2017, 09:17 · Joern Beilke (JBeilke)
Hi Benoit,

we ran the motorBike case on a Xeon E5-1650 v3 (6-core processor) on 6 cores, with hyperthreading turned off, and got:

ExecutionTime = 167.03 s ClockTime = 169 s

How does this compare to your machines, with HT disabled?

Thanks
Jörn

#11 · October 30, 2017, 11:44 · lac
Quote (originally posted by RobertB):
> Perhaps a stupid question but since you appear to have hyperthreading on did you core lock to only the physical cores?
You can try running it with -bind-to core if HT was turned on; that would explain the slowdown you saw.
On my workstation, the results are (ClockTime, motorBike case, OF v5):
73 s (with -bind-to hwthread)
110 s (without it)
The machine is:
Dual Xeon E5-2673 v3 (all-core turbo 2.7 GHz, 12core/cpu)
8x8GB single rank dimms
HT off

#12 · November 2, 2017, 04:28 · benoit paillard (bennn)
OK, so the results with HT off are exactly the same. With HT on, running on 8 or 16 cores on the Intel chip, and on 12 or 24 cores on the AMD chip, all give the same results as well.

No improvement with any bind-to setting for now.

I will test multiple single-CPU jobs in the next few days.

#13 · November 24, 2017, 11:51 · Simbelmynë
Quote (originally posted by lac):
> You can try to run it with -bind-to core if HT was turned on. It would explain why you had this slow down.
> On my WS the results (Clocktime, Motorbike case, OFv5):
> 73s (with -bind-to hwthread)
> 110s (without it)
> The machine is:
> Dual Xeon E5-2673 v3 (all-core turbo 2.7 GHz, 12core/cpu)
> 8x8GB single rank dimms
> HT off
Just curious: when you make comparisons using different decompositions of the motorBike case, how do you know that you are decomposing the domain similarly? Or is this just an indication of the performance of -bind-to hwthread?

#14 · November 27, 2017, 08:33 · lac
I used the same default hierarchical decomposition (n = (6 4 1)) with the same number of domains. So yes, it shows the 'performance' of process binding.
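Fixing the decomposition explicitly keeps comparisons like this one on identical subdomains; for the hierarchical method, `numberOfSubdomains` must equal nx·ny·nz. A minimal Python sketch that renders the relevant `system/decomposeParDict` entries for a given n, assuming the `hierarchicalCoeffs` keys (`n`, `delta`, `order`) used by OpenFOAM tutorials of this era; the output still needs the usual FoamFile header, and `hierarchical_entries` is a hypothetical helper:

```python
from math import prod


def hierarchical_entries(n: tuple[int, int, int], delta: float = 0.001,
                         order: str = "xyz") -> str:
    """Render decomposeParDict entries for a hierarchical split n = (nx ny nz)."""
    if len(n) != 3 or any(k < 1 for k in n):
        raise ValueError("n must hold three positive integers")
    nx, ny, nz = n
    return (
        f"numberOfSubdomains {prod(n)};\n"
        "method          hierarchical;\n"
        "\n"
        "hierarchicalCoeffs\n"
        "{\n"
        f"    n           ({nx} {ny} {nz});\n"
        f"    delta       {delta};\n"
        f"    order       {order};\n"
        "}\n"
    )


print(hierarchical_entries((6, 4, 1)))  # lac's 24-domain split
print(hierarchical_entries((7, 2, 1)))  # the 14-domain split used in post #15
```

Deriving `numberOfSubdomains` from n, rather than typing both, avoids the mismatch that decomposePar would otherwise reject.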

#15 · November 27, 2017, 09:25 · Simbelmynë
So do you time only the simpleFoam execution, or everything in the Allrun script?

I have done some benchmarks using 14 threads on a 7940X (HT enabled), with decomposition (7 2 1).

Assuming you time simpleFoam only:

Code:
$ time mpirun -np 14 -bind-to none simpleFoam -parallel
Gives a real time of 117s.

Code:
$ time mpirun -np 14 -bind-to hwthread simpleFoam -parallel
Yields a real time of 150s.

A simple
Code:
$ time ./Allrun
Results in 157 s of real time (this is without -bind-to hwthread).

#16 · November 27, 2017, 09:36 · lac
If you use -bind-to hwthread with HT turned on, I guess processes will be bound to both the 'real' and the 'HT' cores. So it may be better to use -bind-to core. By the way, I only timed the simpleFoam execution.

#17 · January 23, 2018, 23:07 · The_Sle
Hi, and thanks for this and other similar conversations. Buying kit can be a pain without some information beforehand, and this forum eases that pain quite significantly.

I'd like to add the overclocking capabilities of Skylake-X to this conversation. I recently purchased a 7820X and am running OpenFOAM on it quite successfully. My chip (and pretty much all of them) will run at 4.5 GHz on all cores with air cooling, with ease. This is of course also true (with some limitations) for the i9 chips, and the results then surpass their AMD counterparts.

With 32 GB of 3200 MHz memory, I can run the simpleFoam part of the motorBike tutorial in 121 seconds on 8 threads, which to my mind makes Skylake better value than Threadripper, at least for OF use, considering the difference in motherboard and cooling costs.

Cheers

#18 · January 24, 2018, 02:01 · Joern Beilke (JBeilke)
Thanks for sharing the results. We usually used 6 cores for this benchmark, which makes the results easier to compare.

It would be interesting to see some Epyc results for this benchmark.

#19 · January 24, 2018, 04:37 · Simbelmynë
Thank you for sharing the OC results. Was it with Allrun or with just the solver?

#20 · January 24, 2018, 15:57 · The_Sle
On 6 cores it runs in 134 seconds.

Both results are for just the solver, with

Code:
time mpirun -bind-to none -np 6 simpleFoam -parallel

All times are GMT -4.