Maximum number of useful cores?
Hello,
I wanted to get into topology optimization of heat sinks (like the one shown here: https://www.youtube.com/watch?v=wXZWTVnBRLU). With a new system probably coming soon, I was wondering what hardware would be best to use. As shown in the presentation (e.g. at min 9:00), they used a supercomputer with 1200 cores. Now what I am wondering: does a dual 64-core (EPYC Rome/Milan) setup make sense for this application, or will memory cause a bottleneck and therefore not make full use of all 128 cores? Regards Mike |
Hard to make any assumptions based on the information presented in the video.
But in general, for any decently optimized CFD solver, it is not worth going beyond 32 cores per CPU on EPYC Rome/Milan. At that point, excess money is better spent on more nodes rather than more cores per node. If you already know which software you will be using, you can just run a few scaling tests on hardware you already have. Post the results here if you need help making sense of them. |
Hi Alex,
thank you. Currently we only have single-CPU 32-core workstations. There are clusters here, but due to some internal "difficulties" they are not accessible to me/us. While we do have an IT department, how difficult is building a multi-node system (I assume it means connecting multiple workstations via a network, with one workstation acting as the task distributor)? Thinking about it, this would be a good solution: during the week people/students could use the workstations, and longer calculations could run over the weekend. The software will probably be ANSYS or COMSOL. |
The guy in the video spent a lot of computational time optimizing the shapes, and the presentation itself was very nice. I have doubts, however, about the practicality of the optimized heat sinks: they will be expensive to manufacture (compared with cheap extruded ones), and yet the temperature improvements are too insignificant to be taken seriously. Maybe in the LED world it will fly, because every degree Celsius down extends an LED's life, but not in general electronics.
Regarding the hardware: 32 cores will do the job, even working over the weekends. CFD codes don't benefit tremendously from having thousands of cores, because the overhead of managing the parallel processes eventually becomes the bottleneck. The commercial CFD code developers usually have a graph showing the improvement in processing speed versus the number of parallel cores. You could inquire about it and make an educated decision on how many parallel cores would make the desired time difference. Besides the increased hardware cost there is also the question of the rising cost of licensing, so you have to play with these three parameters to figure out what is best for you. |
If I understood correctly, it wasn't about "how many cores can be used in total", but whether you get any benefit from more than 4 cores per memory channel.
But there are some considerations before buying your own cluster.
1) Licensing constraints. Working with ANSYS and COMSOL, you need enough licenses to run on this many cores. They can be expensive.
2) Utilization. Having your own cluster is only viable if you can keep it fed with jobs pretty much 24/7. Otherwise, buying on-demand computing time on commercial (or academic?) HPC clusters is the easier and cheaper solution.
A note on parallelization, especially for topology optimization: there are two ways to do it.
1) Sequential execution of geometry iterations. This means one job runs parallel across all compute resources. Upon completion, the next geometry iteration can start based on these results. Running this across multiple compute nodes requires proper interconnects like InfiniBand.
2) Parallel execution of several cases. The cases can run independently for the most part, e.g. one on each compute node. This is much easier to parallelize, since communication/synchronization only needs to happen at the end of a batch of runs, and it can be done with pretty much any node interconnect. It is particularly handy if the individual jobs are too small to get good scaling on many cores.
Which one you can pick depends on the software, and it may also have implications for the number and type of licenses you need for commercial solvers. |
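The second approach (independent cases run in parallel, synchronizing only at the end of a batch) can be sketched in a few lines of plain Python. `run_case` here is a made-up stand-in that returns a dummy objective value so the script is self-contained; in practice it would launch your solver on one node:

```python
from concurrent.futures import ProcessPoolExecutor

def run_case(case_id: int) -> tuple[int, float]:
    """Stand-in for launching one independent solver job, e.g. via
    subprocess.run([...]) with your solver's command line.
    Here it just returns a dummy 'objective value' for the case."""
    return case_id, 1.0 / (case_id + 1)

if __name__ == "__main__":
    cases = range(8)  # one geometry variant per entry
    # Run up to 4 cases at once, e.g. one per compute node.
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = dict(pool.map(run_case, cases))
    # Synchronize only at the end of the batch, then pick the best variant.
    best = min(results, key=results.get)
    print(f"best case: {best}")
```

Because each case is independent until the final `min`, no fast interconnect is needed between the workers, which is exactly why this mode tolerates cheap networking.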
Thank you very much :)
So based on your answers I think the best course of action will be to create a model and compare the times on a 32 and a 64 core workstation. |
It would be easier to interpret the results if the tests are run on a single machine. Eliminates a few variables. Ideally, a strong scaling test is run with 1, 2, 4, 8... threads, up to the number of physical cores.
From my previous experience recommending scaling tests, here are some pointers. Just ignore them if this is old news to you: eliminate setup and post-processing times from the results; we are only interested in solver time. And we don't need a full run; the average time for a few iterations is enough. It doesn't have to be a fancy workstation with many cores, and it doesn't have to be a huge test case that runs for several hours. It just has to be representative of what you actually want to run. |
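A strong-scaling sweep like the one described above can be driven by a small script. This is only a sketch: the solver launch line is a placeholder (the real flags depend on your solver, so check its documentation), and here a dummy no-op command is used so the script runs as-is:

```python
import subprocess
import sys
import time

def solver_cmd(threads: int) -> list[str]:
    """Placeholder launch line. Replace with your real solver invocation,
    passing the thread count via whatever flag your solver uses."""
    return [sys.executable, "-c", "pass"]  # dummy command so this runs as-is

def time_solver(threads: int, repeats: int = 3) -> float:
    """Average wall-clock time over a few short runs at a given thread count."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(solver_cmd(threads), check=True)
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

if __name__ == "__main__":
    baseline = time_solver(1)
    for threads in (1, 2, 4, 8):  # extend up to your physical core count
        t = time_solver(threads)
        print(f"{threads:2d} threads: {t:6.3f} s, speedup {baseline / t:4.2f}x")
```

Plot speedup versus thread count: the point where the curve flattens tells you how many cores per machine are actually worth paying for.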
Thank you for the tips, now I get it / remember: use the software itself to control how many cores are being used.
So the first step will be creating a model that is representative of what I want to do. |
Double precision will also affect the scaling, as it uses more memory bandwidth.
Personally I run everything as double precision by default, and start to see non-linear scaling much sooner than single precision. |
I have never used single precision and wonder: does a single precision case run faster (than double)? Does the convergence suffer with single precision? |
Single precision will run faster than double precision; usually I then do scaling on the solver part, which makes it use more RAM than it should. In some cases double precision is required, so I think it depends on the case. For me it's worth trying the case with single precision; it can save you time in the future. In my case the precision only affected the overall domain imbalance: single precision makes it a tad bit more chaotic, but still below 0.001%, so it's fine. RMS residuals met 10^-6 with a max of 10^-4 on both single and double precision runs. |
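To see why precision choice ties into the memory bandwidth discussion above, here is a stdlib-only illustration (the cell count is arbitrary): a double precision field simply moves twice the bytes through the memory system per solver sweep, at the price of fewer significant digits.

```python
import struct
from array import array

# The same field of one million cell values stored in single ("f", 4 bytes)
# vs double ("d", 8 bytes) precision.
n_cells = 1_000_000
single = array("f", bytes(4 * n_cells))
double = array("d", bytes(8 * n_cells))
print(single.itemsize * len(single))  # 4000000 bytes
print(double.itemsize * len(double))  # 8000000 bytes

# The trade-off: roughly 7 significant decimal digits instead of about 16.
x32 = struct.unpack("f", struct.pack("f", 0.1))[0]
print(x32)  # 0.10000000149011612
```

On a bandwidth-limited CPU, halving the bytes per value is why single precision cases often keep scaling to higher core counts before the memory system saturates.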
Hi there. Which would be better: 2x 16-core EPYC 7282 or 2x 24-core EPYC 7451, provided that the price per processor is the same? |
Stay away from the EPYC 7282! It has very weak memory bandwidth of 85 GB/s, where most EPYCs from that generation have a bandwidth of 205 GB/s. The 7282 is really bad for CFD. Take a look at https://www.spec.org/cpu2017/results/ where systems are benchmarked with real codes. The now-old EPYC 7451 is much better than the 7282. I would look for an EPYC 7313 from the latest generation, but it is difficult to find. |
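A rough way to rank candidates for a bandwidth-bound solver is bandwidth per core. The arithmetic below uses only the figures quoted above (85 GB/s for the 7282, 205 GB/s for typical 8-memory-channel EPYCs of that generation); treat it as illustrative back-of-envelope math, not a benchmark:

```python
# Bandwidth figures as quoted in the post above; core counts per CPU.
candidates = {
    "EPYC 7282, 16 cores": (85.0, 16),
    "typical 8-channel EPYC, 24 cores": (205.0, 24),
    "typical 8-channel EPYC, 32 cores": (205.0, 32),
}
for name, (bw_gbs, cores) in candidates.items():
    # For a bandwidth-bound CFD solver this ratio matters more than core count.
    print(f"{name}: {bw_gbs / cores:.2f} GB/s per core")
```

Note how the 16-core 7282 ends up with less bandwidth per core than a full-bandwidth part with twice the cores, which is why its core count alone is misleading.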