www.cfd-online.com

Maximum number of useful cores?



October 2, 2021, 08:31
Maximum number of useful cores?
  #1
New Member
 
Join Date: Feb 2020
Posts: 9
Rep Power: 3
EagerToLearn is on a distinguished road
Hello,

I wanted to get into topology optimization of heat sinks (e.g. as shown here: https://www.youtube.com/watch?v=wXZWTVnBRLU). With a new system probably coming soon, I was wondering what hardware would be best.

As shown in the presentation (e.g. around min 9:00), they used a supercomputer with 1200 cores. What I am wondering: does a dual 64-core (EPYC Rome/Milan) setup make sense for this application, or will memory become a bottleneck and prevent full use of all 128 cores?

Regards
Mike

October 2, 2021, 09:12
  #2
Super Moderator
 
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 2,866
Rep Power: 40
flotus1 has a spectacular aura about
Hard to make any assumptions based on the information presented in the video.
But in general, for any decently optimized CFD solver, it is not worth going beyond 32 cores per CPU on Epyc Rome/Milan. At that point, excess money is better spent on more nodes than more cores per node.
If you already know which software you will be using, you can just run a few scaling tests on hardware you already have. Post the results here if you need help making sense of them.
EagerToLearn likes this.

October 2, 2021, 12:33
  #3
New Member
 
Join Date: Feb 2020
Posts: 9
Rep Power: 3
EagerToLearn is on a distinguished road
Hi Alex,

thank you. Currently we only have single-CPU 32-core workstations. There are clusters here, but due to some internal "difficulties" they are not accessible to me/us.
While we do have an IT department, how difficult is building a multi-node system (I assume it means connecting multiple workstations via a network, with one workstation acting as the task distributor)? Thinking about it, this would be a good solution: during the week people/students could use the workstations, and longer calculations could run over the weekend.

The software will probably be ANSYS or COMSOL.

October 2, 2021, 14:13
  #4
Senior Member
 
Join Date: Jun 2011
Posts: 110
Rep Power: 12
CFDfan is on a distinguished road
The guy in the video spent a lot of computational time optimizing the shapes, and the presentation itself was very nice. I have doubts, however, about the practicality of the optimized heat sinks: they will be expensive to manufacture (compared with cheap, extruded ones), and yet the temperature improvements are too small to be taken seriously. Maybe it will fly in the LED world, because every degree Celsius less extends an LED's life, but not in general electronics.
Regarding the hardware: 32 cores will do the job, especially running over the weekends. CFD codes don't benefit tremendously from having thousands of cores, because at some point the overhead of managing the parallel processes becomes the bottleneck. Commercial CFD code developers usually have a graph showing how processing speed improves with the number of parallel cores; you could ask for it and make an educated decision on how many cores would make the desired difference in turnaround time. Besides the increased hardware cost there is also the rising cost of licensing, so you have to play with these three parameters to figure out what is best for you.
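The diminishing returns described above follow Amdahl's law: if a fraction s of the work is serial, the speedup on n cores is at most 1/(s + (1-s)/n). A small sketch; the 2% serial fraction is an assumed illustration value, not a measurement for any particular solver:

```python
def amdahl_speedup(n_cores, serial_fraction):
    """Upper bound on speedup when a fixed fraction of the work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# Even a mere 2% serial/communication share caps 1024 cores at roughly 48x
for n in (8, 32, 128, 1024):
    print(f"{n:4d} cores: speedup <= {amdahl_speedup(n, 0.02):5.1f}")
```

In practice communication overhead grows with core count, so real scaling curves fall off even faster than this bound suggests.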
EagerToLearn likes this.

October 2, 2021, 17:03
  #5
Super Moderator
 
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 2,866
Rep Power: 40
flotus1 has a spectacular aura about
Quote:
Originally Posted by EagerToLearn View Post
thank you. Currently we only have single-CPU 32-core workstations. There are clusters here, but due to some internal "difficulties" they are not accessible to me/us.
A single workstation would be enough to answer your initial question.
If I understood correctly, it wasn't about "how many cores can be used in total", but whether you get any benefit from more than 4 cores per memory channel.

Quote:
Originally Posted by EagerToLearn View Post
While we do have an IT department, how difficult is building a multi node system (I assume it is connecting multiple workstations via a network an one workstation acting as the task distributor)?
It's not magic, especially for small clusters with a few thousand cores maximum. A proper IT department should have no trouble setting that up. Of course, support and maintenance also cost time and money.
But there are some considerations before buying your own cluster.
1) Licensing constraints. Working with Ansys or Comsol, you need enough licenses to run on this many cores, and they can be expensive.
2) Utilization. Having your own cluster is only viable if you can keep it fed with jobs pretty much 24/7. Otherwise, buying on-demand computing time on commercial (or academic?) HPC clusters is the easier and cheaper solution.

A note on parallelization, especially for topology optimization: there are two ways to do it.
1) Sequential execution of geometry iterations. This means one job runs in parallel across all compute resources; upon completion, the next geometry iteration can start based on these results. Running this across multiple compute nodes requires proper interconnects like Infiniband.
2) Parallel execution of several cases. The cases can run independently for the most part, e.g. one on each compute node. This is much easier to parallelize, since communication/synchronization only needs to happen at the end of a batch of runs, and it can be done with pretty much any node interconnect. It is particularly handy if the individual jobs are too small to scale well on many cores.
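The second approach can be sketched as a small batch launcher. The solver command below is a placeholder assumption, not tied to any particular package; threads suffice because each worker would merely wait on an external solver process:

```python
from concurrent.futures import ThreadPoolExecutor

def run_case(case):
    # Placeholder for launching one solver job on one node, e.g.:
    # subprocess.run(["mpirun", "-np", "32", "solver", case], check=True)
    return f"{case}: finished"

cases = [f"geometry_{i:03d}" for i in range(8)]

# At most two cases in flight at once, i.e. one per hypothetical compute node
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_case, cases))

print(results[0], "...", results[-1])
```

Real job schedulers (SLURM, PBS) do this with far more robustness, but the principle is the same: keep every node busy with one independent case at a time.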

What you can pick depends on the software, and may also have implications on the number and type of licenses you need for commercial solvers.
CFDfan and EagerToLearn like this.

October 6, 2021, 14:44
  #6
New Member
 
Join Date: Feb 2020
Posts: 9
Rep Power: 3
EagerToLearn is on a distinguished road
Thank you very much

So based on your answers, I think the best course of action will be to create a model and compare the times on a 32- and a 64-core workstation.

October 6, 2021, 16:29
  #7
Super Moderator
 
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 2,866
Rep Power: 40
flotus1 has a spectacular aura about
It would be easier to interpret the results if the tests are run on a single machine; that eliminates a few variables. Ideally, a strong scaling test is run with 1, 2, 4, 8... threads, up to the number of physical cores.
From my previous experience with recommending scaling tests, here are some pointers (just ignore them if this is old news to you): eliminate setup and post-processing times from the results, since we are only interested in solver time. We don't need a full run; the average time for a few iterations is enough. It doesn't have to be a fancy workstation with many cores, and it doesn't have to be a huge test case that runs for several hours. It just has to be representative of what you actually want to run.
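Speedup and parallel efficiency relative to the single-thread run are the usual metrics for evaluating such a test. A minimal sketch; the timings here are made-up illustration values, not measurements:

```python
# Hypothetical average solver times per iteration (seconds), for illustration
times = {1: 100.0, 2: 52.0, 4: 27.0, 8: 15.0, 16: 9.5, 32: 7.8}

def scaling_table(times):
    """Speedup and parallel efficiency relative to the 1-thread run."""
    t1 = times[1]
    return [(n, t, t1 / t, t1 / t / n) for n, t in sorted(times.items())]

for n, t, speedup, eff in scaling_table(times):
    print(f"{n:3d} threads: {t:6.1f} s  speedup {speedup:5.2f}  efficiency {eff:6.1%}")
```

A common rule of thumb is to stop adding cores once efficiency drops below roughly 70-80%, though where exactly to draw the line depends on license cost and how urgent the results are.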

October 7, 2021, 04:57
  #8
New Member
 
Join Date: Feb 2020
Posts: 9
Rep Power: 3
EagerToLearn is on a distinguished road
Thank you for the tips, now I get it/remember: use the software to control how many cores are being used.

So the first step will be creating a model that is representative of what I want to do.

October 7, 2021, 14:27
  #9
Senior Member
 
Erik
Join Date: Feb 2011
Location: Earth (Land portion)
Posts: 1,078
Rep Power: 20
evcelica is on a distinguished road
Double precision will also affect the scaling, as it uses more memory bandwidth.
Personally I run everything in double precision by default, and I start to see non-linear scaling much sooner than with single precision.
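Both effects are easy to demonstrate: a double occupies twice the bytes (so twice the memory traffic per value), and single precision loses small updates to large values much sooner. A quick illustration, assuming NumPy is available:

```python
import numpy as np

# Memory traffic: 4 bytes vs 8 bytes per value
a32 = np.zeros(1_000_000, dtype=np.float32)
a64 = np.zeros(1_000_000, dtype=np.float64)
print(a32.nbytes, a64.nbytes)  # 4000000 8000000

# Round-off: a unit update to 1e8 vanishes in single precision
# (float32 carries ~7 significant digits, float64 ~16)
print(np.float32(1.0e8) + np.float32(1.0) == np.float32(1.0e8))  # True
print(np.float64(1.0e8) + np.float64(1.0) == np.float64(1.0e8))  # False
```

Since CFD solvers are usually memory-bandwidth bound, halving the bytes per value is also why single precision often runs noticeably faster and scales further.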

October 7, 2021, 14:52
  #10
Senior Member
 
Join Date: Jun 2011
Posts: 110
Rep Power: 12
CFDfan is on a distinguished road
Quote:
Originally Posted by evcelica View Post
Double precision will also affect the scaling, as it uses more memory bandwidth.
Personally I run everything in double precision by default, and I start to see non-linear scaling much sooner than with single precision.

I have never used single precision and wonder:

Does a single precision case run faster (than double)?
Does convergence suffer with single precision?

October 12, 2021, 15:11
  #11
New Member
 
Chrowale
Join Date: Sep 2021
Location: Bandung, Indonesia
Posts: 16
Rep Power: 2
mluckyw is on a distinguished road
Quote:
Originally Posted by CFDfan View Post
I have never used single precision and wonder:

Does a single precision case run faster (than double)?
Does the convergence suffer with single precision?
Personally, in my case, which is just a simple flow over a wing:
single precision runs faster than double precision; I then usually scale up the solver part so that it uses more RAM than it otherwise would.
In some cases double precision is required, so I think it depends on the case. For me it is worth trying the case with single precision first; it can save you time later on.

In my case the precision only affected the overall domain imbalance: single precision made it a tad more erratic, but it stayed below 0.001%, so it was fine. RMS residuals reached 10^-6 with a maximum of 10^-4 in both the single and double precision runs.
CFDfan likes this.

October 26, 2021, 16:09
  #12
New Member
 
Join Date: Aug 2021
Posts: 2
Rep Power: 0
Jose Mourinho is on a distinguished road
Hi there. Which would be better: 2x 16-core EPYC 7282 or 2x 24-core EPYC 7451, provided the price per processor is the same?

October 27, 2021, 04:25
  #13
New Member
 
Erik Andresen
Join Date: Feb 2016
Location: Denmark
Posts: 15
Rep Power: 7
ErikAdr is on a distinguished road
Quote:
Originally Posted by Jose Mourinho View Post
Hi there. Which would be better: 2x 16-core EPYC 7282 or 2x 24-core EPYC 7451, provided the price per processor is the same?

Stay away from the EPYC 7282! It has very weak memory bandwidth of 85 GB/s, where most EPYCs from that generation reach 205 GB/s. The 7282 is really bad for CFD. Take a look at https://www.spec.org/cpu2017/results/ where systems are benchmarked with real codes. The now old EPYC 7451 is much better than the 7282. I would look for an EPYC 7313 from the latest generation, but it is difficult to find.
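For a quick sanity check of a machine's memory bandwidth, a crude single-threaded copy test gives a lower bound; this is only a rough sketch, not a substitute for the STREAM benchmark or the SPEC results linked above:

```python
import time
import numpy as np

def copy_bandwidth_gbs(n_mib=256, repeats=5):
    """Rough memory bandwidth estimate from timing large array copies.
    Counts read + write traffic; single-threaded, so it understates what
    all memory channels together can deliver."""
    a = np.ones(n_mib * 1024 * 1024 // 8, dtype=np.float64)
    b = np.empty_like(a)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.copyto(b, a)
        best = min(best, time.perf_counter() - t0)
    return 2 * a.nbytes / best / 1e9  # GB/s, read + write

print(f"~{copy_bandwidth_gbs():.1f} GB/s (single thread)")
```

One thread typically cannot saturate all memory channels, so a proper multi-threaded STREAM run reports considerably higher numbers; the point is only to catch configurations (like the 7282's cut-down memory interface) that fall far short of expectations.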
Jose Mourinho likes this.

October 27, 2021, 12:35
  #14
New Member
 
Join Date: Aug 2021
Posts: 2
Rep Power: 0
Jose Mourinho is on a distinguished road
Quote:
Originally Posted by ErikAdr View Post
Stay away from the EPYC 7282! It has very weak memory bandwidth of 85 GB/s, where most EPYCs from that generation reach 205 GB/s. The 7282 is really bad for CFD. Take a look at https://www.spec.org/cpu2017/results/ where systems are benchmarked with real codes. The now old EPYC 7451 is much better than the 7282. I would look for an EPYC 7313 from the latest generation, but it is difficult to find.
Thank you very much, Erik!
