AMD's Ryzen threadripper

Old   January 5, 2018, 00:57
Smile AMD's Ryzen threadripper
  #1
Senior Member
 
ashokac7's Avatar
 
Ashok Chaudhari
Join Date: Aug 2016
Location: Pune, India
Posts: 260
Rep Power: 10
ashokac7 is on a distinguished road
How is AMD's Ryzen Threadripper for a workstation? It has 16 cores and 32 threads with a 3.4 GHz base clock. It is priced here at 90k INR (1420 USD). A similar configuration from Intel costs twice as much. I want to run some low-level simulations on multiple cores and my budget is limited. Any suggestions?

Old   January 5, 2018, 06:15
Default
  #2
Senior Member
 
Join Date: Aug 2014
Location: Germany
Posts: 292
Rep Power: 13
BlnPhoenix is on a distinguished road
Quote:
Originally Posted by ashokac7 View Post
How is AMD's Ryzen Threadripper for a workstation? It has 16 cores and 32 threads with a 3.4 GHz base clock. It is priced here at 90k INR (1420 USD). A similar configuration from Intel costs twice as much. I want to run some low-level simulations on multiple cores and my budget is limited. Any suggestions?

I have the Threadripper in my private workstation and I like it. I did not compare it against Intel, but for me it does a good job. Also, IMHO, you can have a relatively cheap and quiet system even with air cooling (e.g. Noctua).

For me, I would be surprised if Intel did a significantly better job for the same money spent, so I honestly do not regret building my system with it.
ashokac7 likes this.

Old   January 5, 2018, 08:25
Default
  #3
Senior Member
 
Simbelmynė's Avatar
 
Join Date: May 2012
Posts: 546
Rep Power: 15
Simbelmynė is on a distinguished road
I have some comparisons in this thread. Check the later parts of that thread for the accurate benchmarks, though.

In short: for the LBM simulations, the 1950X outperformed the 7940X by 30% at a similar price.
BlnPhoenix and ashokac7 like this.

Old   January 10, 2018, 03:36
Smile
  #4
Senior Member
 
ashokac7's Avatar
 
Ashok Chaudhari
Join Date: Aug 2016
Location: Pune, India
Posts: 260
Rep Power: 10
ashokac7 is on a distinguished road
Thank You !!!!

Old   September 30, 2020, 15:16
Default 3990x ?
  #5
Member
 
Patti Michelle Sheaffer
Join Date: Sep 2018
Posts: 55
Rep Power: 7
pattim is on a distinguished road
Is there any new information on this? I need at least 44 cores for a complex (chemistry/multiphysics) CFD model. So, naturally, the 3990X seems to be the only Ryzen suitable on a single mobo.

Old   September 30, 2020, 15:54
Default
  #6
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura about
Which software are you using? Why does it need at least 44 cores? Such a specific number sounds pretty weird.

Old   September 30, 2020, 16:36
Default
  #7
Member
 
Patti Michelle Sheaffer
Join Date: Sep 2018
Posts: 55
Rep Power: 7
pattim is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
Which software are you using? Why does it need at least 44 cores? Such a specific number sounds pretty weird.

Hi - thanks. The number (44) comes from the way the domain is hard-coded to be decomposed on this particular weather model. (Still, really large cell-count CFD models which I've done can use many more processors.)

An important question for a CPU is how well the memory<->core data transport (and also the core<->core data transport) works under MPICHx. I am worried that Ryzen may not have enough data/memory lanes to really support all the cores in a big simulation. I've only seen decent ray-tracing demos - I haven't found good, stressing CFD demos on the latest high-core-count Ryzens. I had CPU-bottleneck problems of this kind with 638x Opterons.

Old   September 30, 2020, 17:00
Default
  #8
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura about
If parallelization is implemented via MPI + domain decomposition, it is very likely that the code handles NUMA rather well. In which case two Epyc CPUs would be a much better choice compared to a single TR3990x. And they would be even cheaper if you go with 24-core models like the Epyc 7352. I think you are on the right track, suspecting that only 4 memory channels would severely limit performance of a Threadripper CPU with this many cores. It has been demonstrated numerous times with every CFD and FEA benchmark I have seen so far.
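To make the core-count arithmetic concrete, here is a minimal sketch of how 44 MPI ranks could be spread over a dual-socket machine. It assumes two 24-core CPUs (as with the Epyc 7352) and one rank per physical core. The function name `distribute_ranks` is made up for illustration; real rank placement is decided by the MPI launcher's mapping and binding options, not by code like this.

```python
def distribute_ranks(n_ranks, sockets, cores_per_socket):
    """Spread n_ranks as evenly as possible over the sockets, one rank per core.

    Returns a list with the number of ranks placed on each socket.
    (Hypothetical helper, only meant to illustrate the arithmetic.)
    """
    if n_ranks > sockets * cores_per_socket:
        raise ValueError("more ranks than physical cores")
    base, extra = divmod(n_ranks, sockets)
    # The first `extra` sockets take one additional rank each.
    return [base + (1 if s < extra else 0) for s in range(sockets)]


if __name__ == "__main__":
    # ASSUMPTION: 2 sockets x 24 cores, 44 ranks as in the weather model above.
    per_socket = distribute_ranks(44, sockets=2, cores_per_socket=24)
    print(per_socket)  # -> [22, 22]
```

With this split each socket hosts 22 ranks, and each group of 22 has its own set of memory channels instead of all 44 ranks sharing one.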
pattim likes this.

Old   September 30, 2020, 17:37
Default
  #9
Member
 
Patti Michelle Sheaffer
Join Date: Sep 2018
Posts: 55
Rep Power: 7
pattim is on a distinguished road
Thanks Flotus - that's been my experience also. But in that case, the motherboard also matters. I was unfamiliar with Epyc - it looks like they make 2-CPU systems. https://en.wikipedia.org/wiki/Epyc Maybe Supermicro has a workstation mobo for that... yes, it looks like they're the primary supplier there.... at least for 2P.
In my experience with AMD, clock speed also matters a lot, especially turbo. Base RAM speed also matters a lot.
Yes to MPI+DD for parallelization.

Quote:
Originally Posted by flotus1 View Post
If parallelization is implemented via MPI + domain decomposition, it is very likely that the code handles NUMA rather well. In which case two Epyc CPUs would be a much better choice compared to a single TR3990x. And they would be even cheaper if you go with 24-core models like the Epyc 7352. I think you are on the right track, suspecting that only 4 memory channels would severely limit performance of a Threadripper CPU with this many cores. It has been demonstrated numerous times with every CFD and FEA benchmark I have seen so far.

Last edited by pattim; September 30, 2020 at 19:11.

Old   September 30, 2020, 19:31
Default
  #10
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura about
For a DIY solution, you have two motherboards to choose from:
Supermicro H11DSi(-NT) R2.0
Gigabyte MZ72-HB0
Both will do for CPUs with 24 or 32 cores, but the latter is a bit on the expensive side at almost 900€.

And trust me on this: lower clock speed, lower memory frequency and higher latency on Epyc do not make the comparison any more favorable for Threadripper CPUs. They are just the wrong tool for the job.

Old   September 30, 2020, 20:46
Default
  #11
Member
 
Patti Michelle Sheaffer
Join Date: Sep 2018
Posts: 55
Rep Power: 7
pattim is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
For a DIY solution, you have two motherboards to choose from:
Supermicro H11DSi(-NT) R2.0
Gigabyte MZ72-HB0
Both will do for CPUs with 24 or 32 cores, but the latter is a bit on the expensive side at almost 900€.

And trust me on this: lower clock speed, lower memory frequency and higher latency on Epyc do not make the comparison any more favorable for Threadripper CPUs. They are just the wrong tool for the job.
I didn't understand you on which was better for MPI+DD.

My understanding is that memory channel count (but also topology, including how cores access memory... "hops") with large #'s of cores is most important when everything else is similar. Cores don't solve a single decomposed domain section in one go - so I think it doesn't matter about core <-> core communication speed; things are mostly memory-access limited in these cases. Also, more than one thread per core is typically not used since each core is basically running flat-out in that type of simulation.

I have to admit that I'm not really a hardware person. During the Opteron days, it was noted that Xeons had much faster prefetch algorithms than Opterons, so they always won out on MPI+DD simulations. That's the only real data point I have and it's pretty outdated.

Old   October 1, 2020, 05:09
Default
  #12
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura about
The current "Zen2" architecture of AMD Epyc and Ryzen CPUs has nothing in common with the "Bulldozer" and "Piledriver" architecture from the Opteron days. Both in terms of the architecture itself, and more importantly, the performance characteristics.
There are many data points for popular CFD solvers scattered throughout this forum. To name a few:
Xeon Gold Cascade Lake vs Epyc Rome - CFX & Fluent - Benchmarks (Windows Server 2019)
AMD Epyc CFD benchmarks with Ansys Fluent
And of course: OpenFOAM benchmarks on various hardware

Quote:
I didn't understand you on which was better for MPI+DD.
It has less to do with how parallelism is implemented, and more with how low the computational intensity of most CFD solvers is.
Threadripper CPUs only have 4 memory channels, which is why parallel scaling in CFD benchmarks usually stops somewhere between 8 and 12 cores. The memory subsystem cannot get the data to the cores fast enough to keep them all busy, because the amount of computation per byte of data moved is rather low (i.e. low computational intensity).
The situation is much improved with Epyc CPUs, because they have 8 memory channels operating at nearly the same transfer speeds. The ability to add a second CPU, which doubles the number of memory channels, increases the gap even more, to the point where two Epyc CPUs will solve a CFD model more than 3 times faster than any Threadripper CPU.
The point about MPI + domain decomposition is that it just lends itself better to systems with more than one NUMA node, compared to e.g. parallelism through poorly optimized OpenMP, which can be severely limited by the increased latency if frequent access to data within a different NUMA node is necessary.
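As a rough back-of-the-envelope version of the argument above, the sketch below models speedup as capped by memory bandwidth. The DDR4-3200 channel peak (3200 MT/s x 8 bytes = 25.6 GB/s) is the theoretical figure; the per-core bandwidth demand of 10 GB/s is purely an assumption picked for illustration, not a measurement, and the model ignores caches, latency and the gap between theoretical and sustained bandwidth.

```python
# Theoretical peak of one DDR4-3200 channel: 3200 MT/s * 8 bytes per transfer.
DDR4_3200_CHANNEL_GBS = 3200e6 * 8 / 1e9  # 25.6 GB/s

# ASSUMPTION: illustrative bandwidth one busy solver core would like to have.
# This number is not from the thread; change it to match your own code.
ASSUMED_PER_CORE_DEMAND_GBS = 10.0


def peak_bandwidth_gbs(channels_per_socket, sockets=1):
    """Theoretical aggregate memory bandwidth of the whole machine in GB/s."""
    return DDR4_3200_CHANNEL_GBS * channels_per_socket * sockets


def modeled_speedup(cores, bandwidth_gbs,
                    per_core_demand_gbs=ASSUMED_PER_CORE_DEMAND_GBS):
    """Speedup over one core: linear until memory bandwidth is exhausted."""
    return min(float(cores), bandwidth_gbs / per_core_demand_gbs)


if __name__ == "__main__":
    # (cores, memory channels per socket, sockets)
    systems = {
        "1x Threadripper 3990X": (64, 4, 1),
        "1x Epyc 7352": (24, 8, 1),
        "2x Epyc 7352": (48, 8, 2),
    }
    for name, (cores, channels, sockets) in systems.items():
        bw = peak_bandwidth_gbs(channels, sockets)
        print(f"{name}: {bw:.1f} GB/s peak, "
              f"modeled speedup {modeled_speedup(cores, bw):.1f}")
```

Under these assumptions the 4-channel system stops scaling at roughly 10 cores no matter how many it has, while doubling the channel count (and then the socket count) moves that ceiling up accordingly.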

To summarize: Threadripper bad, Epyc good, 2xEpyc even better
pattim likes this.

Old   October 1, 2020, 19:52
Default
  #13
Member
 
Patti Michelle Sheaffer
Join Date: Sep 2018
Posts: 55
Rep Power: 7
pattim is on a distinguished road
Thanks for the clear explanation - I was trying to survey 3 "new" (to me) AMD processors; tricky at best - especially trying to remember to watch for any "gotchas" in processors and mobos.

Currently, I have a few 3.3GHz Xeon 5680 hexacore 2P 1U R610 servers hooked together with an IB (ConnectX-2) switch (gfortran/openmpi, compiled with -O3). It seems fast, but for some reason it rails out at 3 boxes hooked together (32 out of 36 CPUs). Even if I add another box, 32 CPUs total is the fastest; past that it actually slows a bit, although the software itself is known to rail at 88-90 CPUs on "big iron" systems. The IB is tested to be nowhere near saturation during normal running of the software, and every server box I add to increase CPU count adds its own memory channel set, so it seems very odd that domain decomposition rails out like that. I'm obviously overlooking something.
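One way to pin down where scaling stops is to tabulate speedup and parallel efficiency from wall-clock times at several core counts. The sketch below is a generic helper, not part of the weather model; the function names are made up, and the timings in the usage example are placeholders invented to show the shape of the output, not measurements from my cluster.

```python
def scaling_table(wall_times):
    """Compute speedup and parallel efficiency from wall-clock timings.

    wall_times maps core count -> wall-clock seconds for the same case.
    Both values are relative to the run with the fewest cores.
    Returns a list of (cores, speedup, efficiency) sorted by core count.
    """
    base_cores = min(wall_times)
    base_time = wall_times[base_cores]
    rows = []
    for cores in sorted(wall_times):
        speedup = base_time / wall_times[cores]
        # Efficiency 1.0 means perfect scaling from the baseline run.
        efficiency = speedup * base_cores / cores
        rows.append((cores, speedup, efficiency))
    return rows


def fastest_core_count(wall_times):
    """Core count with the lowest wall-clock time."""
    return min(wall_times, key=wall_times.get)


if __name__ == "__main__":
    # PLACEHOLDER timings in seconds, invented for illustration only.
    timings = {12: 100.0, 24: 55.0, 32: 45.0, 36: 47.0}
    for cores, speedup, efficiency in scaling_table(timings):
        print(f"{cores:3d} cores: speedup {speedup:.2f}, "
              f"efficiency {efficiency:.2f}")
    print("fastest at", fastest_core_count(timings), "cores")
```

A sharp efficiency drop between two neighbouring core counts narrows down whether the limit shows up within a box or only when another box is added.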

I guess I was looking to see if there's a single-box, high-quality alternative, but even in 2020 they're still super-pricey with no way to test-before-buy.

Thanks,
Patricia

Old   October 2, 2020, 02:54
Default
  #14
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura about
I thought your simulation always has 44 domains. How do you run it on less than 44 cores? Overprovisioning?

Anyway, there are ways to run benchmarks of your code on newer hardware. You could briefly rent an instance with these CPUs from some cloud computing service.
Or if it's easy to get running and non-confidential, maybe you can find someone here who can give it a try with their own hardware.

Old   October 2, 2020, 16:00
Default
  #15
Member
 
Patti Michelle Sheaffer
Join Date: Sep 2018
Posts: 55
Rep Power: 7
pattim is on a distinguished road
Quote:
Originally Posted by flotus1 View Post
I thought your simulation always has 44 domains. How do you run it on less than 44 cores? Overprovisioning?

Anyway, there are ways to run benchmarks of your code on newer hardware. You could briefly rent an instance with these CPUs from some cloud computing service.
Or if it's easy to get running and non-confidential, maybe you can find someone here who can give it a try with their own hardware.

I've been googling around for some cloud system that claims to have Epyc 2P systems. No joy. I guess cloud doesn't really work that way.
