
Best PC recommendation for special CFD simulation with a short time step

#1 | Habib-CFD (Member) | November 29, 2019, 01:04
Hi guys, I use the Flow 3D software for a special CFD simulation that includes both heat and mass transfer. Because of the nature of the problem, I have to use a very short time step to achieve convergence. The total cell count is about 100k. Currently, I use an AMD 2970WX with 4x8 GB of 3200 MHz RAM, running CentOS 7.7. This PC shows very good multi-core performance on the default examples. In cases with a larger time step, 20 cores are twice as fast as 8 cores, but for my customized model the best performance is reached at just 8 cores. I have tried many things, for example disabling SMT, setting memory interleaving to Channel, configuring NUMA, and many other adjustments, without any real benefit. I also tested another PC (Intel 6580K) and found the same multi-processing problem.

I would be grateful for suggestions on replacement hardware for this particular case (up to $2000), or on ways to improve the performance of my current PC. The first link below describes floating-point performance as the most important parameter in CFD.

https://www.flow3d.com/hardware-sele...w-3d-products/
https://en.wikichip.org/wiki/flops

According to the second link, the 2970WX executes AVX2 & FMA on 128-bit units, while Intel products execute AVX2 & FMA on 256-bit units or AVX-512 & FMA on 512-bit units, which gives them more FLOPs per cycle. I have tried to find out what the benefits of AVX-512 actually are, although some sources claim it can reduce efficiency:

https://software.intel.com/en-us/for...g/topic/815069
https://lemire.me/blog/2018/04/19/by...st-experiment/
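For reference, the FLOPs-per-cycle figures compared above follow from vector width and the number of FMA units: each FMA performs a multiply and an add on every 64-bit lane. The sketch below assumes two FMA units per core in every case (true for Zen+ and client Skylake; some Xeon SKUs have only one 512-bit FMA unit) and ignores the lower clocks some Intel CPUs use under heavy AVX-512 load. The 24 cores at a 3.0 GHz base clock for the 2970WX are an illustrative input.

```python
def dp_flops_per_cycle(vector_bits: int, fma_units: int) -> int:
    """Peak double-precision FLOPs per cycle per core.

    One FMA counts as 2 FLOPs (multiply + add) on each 64-bit lane.
    """
    lanes = vector_bits // 64
    return fma_units * lanes * 2


def peak_gflops(cores: int, ghz: float, vector_bits: int, fma_units: int) -> float:
    """Theoretical peak GFLOP/s; assumes a constant clock under vector load."""
    return cores * ghz * dp_flops_per_cycle(vector_bits, fma_units)


# Assumed configurations, two FMA units per core in each case.
print(dp_flops_per_cycle(128, 2))  # Zen+ style: 2x 128-bit FMA
print(dp_flops_per_cycle(256, 2))  # AVX2: 2x 256-bit FMA
print(dp_flops_per_cycle(512, 2))  # AVX-512: 2x 512-bit FMA
print(peak_gflops(24, 3.0, 128, 2))  # 2970WX-like, base clock
```

A high peak number only matters if the memory system can feed the cores, which is the point raised in the replies below.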

As I understand it, poor multi-core efficiency means either that the problem is not balanced correctly across all cores, or that the time spent on communication is significant compared with the computing time. I believe my model is limited by inter-core communication time. By the way, I am not really sure about the newer Intel architectures. My license covers up to 32 cores, and the time step must be limited to 5e-06 s because of the 0.25 mm cubic mesh size.
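To put the 5e-06 s and 0.25 mm figures in perspective, here is a small sketch that assumes the limit is a convective Courant (CFL) condition, u * dt / dx <= C. That is an assumption: with heat and mass transfer, a diffusive limit could be the binding one instead. It also counts how many steps one simulated second requires, since total wall time grows in proportion to that count.

```python
def max_velocity_for_cfl(dx_m: float, dt_s: float, courant: float = 1.0) -> float:
    """Largest velocity that keeps u * dt / dx <= courant."""
    return courant * dx_m / dt_s


def steps_needed(sim_time_s: float, dt_s: float) -> int:
    """Number of fixed-size time steps needed to cover sim_time_s."""
    return round(sim_time_s / dt_s)


dx = 0.25e-3  # 0.25 mm cubic cells, from the post
dt = 5e-6     # 5e-06 s time step, from the post

print(max_velocity_for_cfl(dx, dt))  # roughly 50 m/s at a Courant number of 1
print(steps_needed(1.0, dt))         # steps per simulated second at 5e-06 s
print(steps_needed(1.0, 1e-6))       # five times as many at 1e-06 s
```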

#2 | Alex (flotus1), Super Moderator, Germany | November 29, 2019, 04:50
There is A LOT to unravel here. I might get back to it over the course of the weekend.
For now, I cannot help but notice that the first link you quoted is riddled with questionable claims about CFD performance, and that it contradicts itself.
To quote from that source:
Quote:
because CFD solver performance is entirely dependent on the floating-point performance of the CPU
This is plain wrong. Memory performance plays an equally important role, if not a more important one.
Weirdly enough, they follow up by contradicting themselves:
Quote:
Skylake and newer considerations

We have determined that the way RAM is physically populated on the board is extremely important for performance on Skylake and newer architectures. For a CPU that supports six memory channels, 6 or 12 DIMMs should be populated identically. For four channel CPUs, 4 or 8 DIMMs should be populated identically.
An unbalanced configuration, where the memory channels or DIMM size/speed are mismatched, reduces performance significantly.
Which is a complicated way of saying: memory performance is crucial for CFD solver performance.

So in short: you are not looking for the highest theoretical floating point performance in a CPU, but a balance between FP performance and memory subsystem performance.
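The balance argument can be made concrete with the roofline model, a standard performance model that is not mentioned in the thread: attainable throughput is the lower of peak FLOP/s and memory bandwidth times the kernel's arithmetic intensity (FLOPs per byte moved from DRAM). The peak (576 GFLOP/s) and bandwidth (102.4 GB/s) below are theoretical values assumed for a 2970WX with 4 channels of DDR4-3200, and the 0.25 FLOP/byte intensity is a hypothetical value for a low-intensity stencil or sparse kernel.

```python
def machine_balance(peak_gflops: float, bandwidth_gbs: float) -> float:
    """FLOPs per byte a kernel needs before it stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * intensity)


peak = 576.0  # assumed: 24 cores x 3.0 GHz x 8 DP FLOPs/cycle
bw = 102.4    # assumed: 4 channels x 8 bytes x 3200 MT/s, theoretical

print(machine_balance(peak, bw))          # about 5.6 FLOP/byte
print(attainable_gflops(peak, bw, 0.25))  # about 25.6 GFLOP/s, memory-bound
```

Under these assumptions, a kernel at 0.25 FLOP/byte reaches only a small fraction of peak, so wider vector units alone would not help it much; more or faster memory channels would.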

#3 | Alex (flotus1), Super Moderator | December 1, 2019, 04:59
All right, here is my take on this issue.

First things first: with a low cell count of 200k, your case just might not scale very well on many cores, no matter what you do. Time step size should have very little impact on the whole situation.
But here is what you can try with your 2970WX:
  • Disable SMT
  • Deactivate the cores on chiplets without direct memory access. Running CFD codes on them will slow down your computations. This will leave you with 12 usable cores. Either deactivate the cores in the BIOS (no idea if it lets you do that), or pin your simulation to the other cores using taskset or similar tools.
  • Use NUMA mode, and set the memory interleaving option to Channel
  • Make sure memory is populated correctly and runs at the intended frequency. The motherboard manual should have information about which DIMM slots to use for 4 DIMMs.
  • Optimize memory timings further, using the Ryzen DRAM Calculator.
  • Last but not least: make sure there are no other bottlenecks, for example I/O from frequent writes to a slow disk. In that case, increase the output interval.
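A minimal sketch of the pinning suggestion above, assuming Linux and a BIOS that exposes one NUMA node per die (NUMA mode with Channel interleaving). It reads the standard sysfs files `/sys/devices/system/node/node*/meminfo` and `cpulist`, keeps the CPUs on nodes that report a non-zero `MemTotal`, and prints a `taskset -c` prefix. With Die or Socket interleaving there is only one node, so the script will simply list every CPU.

```python
import glob
import os
import re
from typing import List


def parse_cpulist(text: str) -> List[int]:
    """Expand a Linux cpulist string such as '0-5,12-17' into CPU ids."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpus_with_local_memory(node_dir: str = "/sys/devices/system/node") -> List[int]:
    """CPUs that belong to NUMA nodes reporting non-zero MemTotal."""
    cpus: List[int] = []
    for node in glob.glob(os.path.join(node_dir, "node[0-9]*")):
        with open(os.path.join(node, "meminfo")) as f:
            match = re.search(r"MemTotal:\s+(\d+)\s+kB", f.read())
        if match and int(match.group(1)) > 0:
            with open(os.path.join(node, "cpulist")) as f:
                cpus.extend(parse_cpulist(f.read()))
    return sorted(cpus)


if __name__ == "__main__":
    local = cpus_with_local_memory()
    if local:
        # Prepend this to the usual solver command line.
        print("taskset -c " + ",".join(map(str, local)))
    else:
        print("No NUMA node information found under /sys.")
```

Child processes inherit the affinity mask set by taskset, so prefixing the solver launch command should be enough. With SMT enabled, the printed list also includes the sibling threads.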

As for buying a different PC just for this application: while the TR 2970WX might not be ideal for this task, it will be very difficult to get a significantly better configuration in the $2000 price range, at least once all of the issues above have been addressed.

#4 | Habib-CFD (Member) | December 1, 2019, 07:20
Quote:
Originally posted by flotus1
All right, here is my take on this issue.

First things first: with a low cell count of 200k, your case just might not scale very well on many cores, no matter what you do. Time step size should have very little impact on the whole situation.
As for buying a different PC just for this application: while the TR 2970WX might not be ideal for this task, it will be very difficult to get a significantly better configuration in the $2000 price range, at least once all of the issues above have been addressed.

First, thank you for your helpful reply. I should mention that all my CFD experience is limited to the Flow 3D software and a Threadripper-based system, so some of my claims may not hold in other cases.


I agree with your comment on multi-core performance with 200k cells, but the effect of the time step is very clear: below a 1e-6 s time step, the solving time on 4 cores is similar to that on 20 cores. I am a little confused by this result and very interested in finding the bottleneck.
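One way to reason about this disagreement is a deliberately simple model, an assumption rather than a description of how Flow 3D works: each time step costs a perfectly parallel amount of work plus a fixed per-step overhead (synchronization, data exchange) that does not shrink with more cores. In this model the number of steps cancels out of the speedup, which matches the expectation that time step size alone should not matter; the speedup only drops if the work per step becomes small relative to the overhead. All numbers below are hypothetical.

```python
def step_time(work_s: float, overhead_s: float, cores: int) -> float:
    """Wall time of one step: divisible work plus a fixed per-step overhead."""
    return work_s / cores + overhead_s


def speedup(n_steps: int, work_s: float, overhead_s: float, n_from: int, n_to: int) -> float:
    """Ratio of total run times going from n_from to n_to cores."""
    t_from = n_steps * step_time(work_s, overhead_s, n_from)
    t_to = n_steps * step_time(work_s, overhead_s, n_to)
    return t_from / t_to


# Hypothetical: 1 ms of single-core work per step.
print(speedup(200_000, 1e-3, 1e-5, 4, 20))    # about 4.3 with a small overhead
print(speedup(1_000_000, 1e-3, 1e-5, 4, 20))  # same value: step count cancels
print(speedup(200_000, 1e-3, 1e-4, 4, 20))    # about 2.3 with a larger overhead
```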

I have already tried these adjustments. In my case, disabling SMT gave no significant improvement (less than 5 percent). Raising the RAM frequency from 2166 MHz (the motherboard default) to 3200 MHz (the RAM's rated speed, with the best timings) gave about a 10 percent improvement. All four slots are populated correctly in quad-channel mode. In addition, I use a fast M.2 SSD (970 EVO). Setting memory interleaving to Die mode gave more benefit than Channel mode (5 percent in total). The auto-configuration of the ASRock X399 Taichi seems to work very well.
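For comparison with the measured 10 percent, theoretical DRAM bandwidth scales linearly with the transfer rate. The sketch assumes 4 channels of 64-bit (8-byte) DDR4 and treats the quoted MHz figures as transfer rates in MT/s; sustained bandwidth in practice is lower than these numbers.

```python
def dram_bandwidth_gbs(channels: int, mt_per_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (one 64-bit channel = 8 bytes per transfer)."""
    return channels * bus_bytes * mt_per_s / 1000.0


for speed in (2166, 3200):
    print(speed, dram_bandwidth_gbs(4, speed))
```

By this estimate the change raises theoretical bandwidth by roughly 48 percent, while the measured gain was about 10 percent, which suggests that memory bandwidth is not the only limit in this case.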

Maybe it is better to focus on the simulation setup itself and look for a way to increase the time step or to define the problem better.
Thanks again.

#5 | Alex (flotus1), Super Moderator | December 1, 2019, 08:48
If time step size really has such a high impact in your test case, it will be necessary to find out what is going on here. This is not normal.

Apart from that, you still seem to be missing the most important optimization I mentioned. Run the code only on cores that have direct memory access. Without this potentially huge bottleneck out of the way, judging the impact of other factors is rather pointless.

#6 | Habib-CFD (Member) | December 1, 2019, 10:01
Oh, I forgot to explain my setup of the cores with direct RAM access. Linux provides options for disabling cores directly, so I tested different sets, e.g. disabling dies 1 and 3, with and without SMT. As I mentioned, because the 2970WX has plenty of threads, the effect somehow vanished. I have heard that this bottleneck has a significant effect on some benchmarks, such as 7-Zip compression with more than 8 cores, but in Flow 3D the situation looks different.



Thank you.


Tags: amd, time step reduced


All times are GMT -4.