Garbage FloTHERM performance from 32-core system
#1
New Member
Join Date: Nov 2018
Location: USA
Posts: 9
Rep Power: 8
2x Epyc 7301, 16x16GB DDR4-2666
I've got 32 cores at 2.7 GHz and 16 memory channels. Synthetic memory benchmarks show total read/write bandwidth in excess of 310 GB/s. The system has 8 NUMA nodes and runs Windows Server 2016. Mentor tech support indicates they've done internal testing showing good scaling on up to 32 cores. SMT is off and I've enabled memory interleaving at the channel level. I haven't tried die or socket interleaving yet, but it seems to me those would be worse options.

I see two problems.

First, while solving and watching CPU utilization, there are long stretches where FloTHERM uses only a single thread when initializing or finalizing a radiation factor calculation or the main thermal/airflow solver. Is that expected behavior? For a lot of models, those periods take longer than the parallel solvers do!

The much larger problem is that the parallel solvers are painfully slow on the Epyc system too. Parallel solve time is equivalent to or slower than the generic 4-6 core, 2-memory-channel desktops and laptops that we have. Compared to a 12-core, 8-memory-channel Haswell system we have, total solve times for our benchmark model are 2.5 times SLOWER, even though we have 1.9x the parallel compute resources and 2.3x the memory bandwidth. Single-threaded performance is about 75% of the Haswell system. In terms of CPU time, the Haswell system spends 1845 minutes in the parallel solver while the Epyc system spends 15007 minutes. So on a per-thread basis, the Haswell system is running about 8 times faster in the parallel solver than the Epyc system.

Something is egregiously wrong... Any ideas?
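As a quick sanity check on those numbers (assuming both machines keep all of their cores busy for the whole parallel-solver phase):

```
CPU-time ratio (Epyc / Haswell)   : 15007 / 1845 ≈ 8.1   (per-thread slowdown)
thread-count ratio                : 32 / 12      ≈ 2.7
implied parallel wall-time ratio  : 8.1 / 2.7    ≈ 3.0x slower on the Epyc box
```

That ~3x in the parallel phases, averaged with serial phases that are only ~1.3x slower (single-thread at 75% of Haswell), is consistent with the 2.5x slower total solve time we measure.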
#2
New Member
Join Date: Nov 2018
Location: USA
Posts: 9
Rep Power: 8
Adding "set KMP_AFFINITY=noverbose,nowarnings,compact" in flotherm.bin gives about a 40% improvement, but we're still far from where we should be.
I strongly suspect this is because our Epyc system has 8 NUMA nodes. On our 2x E5-2643 v3 server, going from SMP to NUMA mode results in a ~15% performance penalty. Compared to that 2x E5-2643 v3 machine, we're now about 1.5x slower in FloTHERM. But OpenFOAM benchmarks done by others on this site, on similar systems of their own, are about 2x faster, right in line with the hardware advantage (see the "OpenFOAM benchmarks on various hardware" thread).
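In case anyone wants to experiment with the same thing, this is roughly what I'm setting from a command prompt before launching the solver. The variable names and values are the Intel OpenMP runtime's standard ones; the launch line itself is just a placeholder for however FloTHERM is started on your install:

```
rem Sketch only -- the solver launch below is a placeholder, adjust to your install.
rem KMP_AFFINITY controls how the Intel OpenMP runtime pins threads;
rem "verbose" makes it print the thread-to-core map it actually built.
set OMP_NUM_THREADS=32
set KMP_AFFINITY=verbose,granularity=fine,compact
rem Try "scatter" instead of "compact" to spread threads across the NUMA nodes:
rem set KMP_AFFINITY=verbose,granularity=fine,scatter
rem <launch the FloTHERM solver here>
```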
#3
New Member
DB
Join Date: Jan 2019
Posts: 28
Rep Power: 8
What revision of FloTHERM are you using, and which type, Classic or XT? The mesher and geometry-transfer functions seem to have sped up significantly with newer versions. I also noticed some time ago that the mesher was only using 1 core, hence showing less than 10% of total utilization. It seems to have improved lately. I haven't experienced this with the solver.
What is the size of your model (number of cells)?
#4
New Member
Join Date: Nov 2018
Location: USA
Posts: 9
Rep Power: 8
FloTHERM Classic 12.2, 1.6M cells.
But we see the same behavior on larger models too. Feeding the KMP_AFFINITY setting the "verbose" option pops up a cmd window when the solver is launched, and that window shows that the OpenMP library isn't getting our system topology correct. Mentor support has referred me to their R&D department for further investigation. For some light reading on optimizing OpenMP library calls for the Epyc architecture: http://www.prace-ri.eu/IMG/pdf/Best-...-Guide-AMD.pdf
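For anyone wanting to cross-check this themselves, you can compare what Windows reports against what the OpenMP runtime prints. The sketch below assumes Sysinternals Coreinfo has been downloaded, and that the runtime shipped with FloTHERM is new enough to honor the standard OMP_* variables (an assumption on my part; the PRACE guide discusses them as the portable alternative to KMP_AFFINITY):

```
rem Sketch only -- cross-checking the NUMA topology on the Epyc box.
rem 1) What Windows sees: Coreinfo (a free Sysinternals tool) dumps sockets,
rem    cores, caches, and NUMA nodes when run with no arguments.
coreinfo
rem 2) What the OpenMP runtime sees: the standard OpenMP 4.0 variables, plus
rem    OMP_DISPLAY_ENV to make the runtime print the settings it picked up.
set OMP_PLACES=cores
set OMP_PROC_BIND=spread
set OMP_DISPLAY_ENV=true
rem <launch the FloTHERM solver here and compare its output to Coreinfo's>
```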
#5
New Member
DB
Join Date: Jan 2019
Posts: 28
Rep Power: 8
Wow! Something is seriously wrong. I've solved a 24M-cell model of a 2U24 server in that amount of time, 18 hours actually, in Classic V9. That was on a 4-core Dell with 16 GB of memory from 2010. On a 10-year-old Compaq Presario laptop with 4 GB of memory, a 1.6M-cell model would solve in about 30 minutes to an hour. The laptop has an AMD processor, which seems to run faster than Intel's on some apps.
I've noticed that some virus protection programs slow the program down significantly, to 10% or so of normal speed (Win10 system). I think the virus scanner is checking all the temp files the mesher and solver write and interfering. Try turning virus protection off during meshing or solving. DB
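If turning protection off completely isn't an option, a narrower alternative on Win10 is to exclude just the folder the mesher/solver writes to from Windows Defender's real-time scanning. A minimal sketch, assuming Defender is the scanner in use; the path is a placeholder for your own project/scratch location, and the command needs an elevated prompt:

```
rem Sketch only -- the folder path is a placeholder for your project/scratch directory.
rem Add-MpPreference is the built-in Windows Defender cmdlet for adding exclusions.
powershell -Command "Add-MpPreference -ExclusionPath 'D:\FloTHERM_Projects'"
```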
#6
New Member
Join Date: Nov 2018
Location: USA
Posts: 9
Rep Power: 8
EPYC 7002 (Rome) processors have moved to a single NUMA node per socket.
Has anybody run FloTHERM on either EPYC 7002 or Threadripper 39X0X series processors? I'm trying to gauge whether upgrading to the new series of processors, with a simpler NUMA topology, would fix my performance issues.
#7
New Member
Felix
Join Date: Jun 2020
Posts: 1
Rep Power: 0
Quote:
#8
New Member
Join Date: Nov 2018
Location: USA
Posts: 9
Rep Power: 8
Quote:
I considered trying to upgrade the system to 2nd gen EPYC, but get this: on the early revision of the Supermicro motherboard I have, the BIOS ROM is too small (32 instead of 64 MB, IIRC) to be able to drop in the 2nd gen processors. Supermicro had to do a dot rev on the motherboard to upgrade the ROM when 2nd gen EPYC came out... So I can't upgrade processors without ripping up the mobo and starting over. Oh well, we've found lots of other uses for the server (Ansys, Keyshot, etc.). I really think 2x 7352 would be killer in FloTHERM for the price.
#9
New Member
Join Date: Nov 2018
Location: USA
Posts: 9
Rep Power: 8
Bump: has anybody tested FloTHERM performance with Zen 2 Threadripper or Epyc?
i.e. Threadripper 3xxxX, Epyc 7xx2. Looks like Zen 3 is just around the corner, I'm assuming also with one NUMA node per socket, which I hypothesize would solve the issue. Not to mention massively improved single-thread performance (radiation, starting the solver...).
#10
New Member
DB
Join Date: Jan 2019
Posts: 28
Rep Power: 8
A lot of the discussion is over my head, but I've seen that the new OSes with real-time disk encryption (BitLocker, Win10) slow things down a lot. Might see if an exception can be applied.
In FloEFD you can set the radiation solver to only run every x iterations, say every 5th instead of every one; that was recommended by Mentor support. There isn't a setting in the FloTHERM app for this, but you guys who know DOS commands might be able to figure it out.
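To check whether the encryption point applies, BitLocker status on the drive the solver writes to can be queried with the built-in manage-bde tool (sketch; the drive letter is a placeholder, and the command needs an elevated prompt):

```
rem Sketch only -- replace D: with whichever volume holds the project files.
rem manage-bde is the built-in BitLocker command-line tool.
manage-bde -status D:
```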
#11
New Member
DB
Join Date: Jan 2019
Posts: 28
Rep Power: 8
The easiest way to get FloTHERM to run faster is a CPU with a higher clock speed, something in the 3.x GHz range: 3.3/2.7 is 22% faster.