Garbage FloTHERM performance from 32-core system

December 19, 2018, 14:29
Garbage FloTHERM performance from 32-core system
  #1
New Member
 
Zachary Wilhoit
Join Date: Nov 2018
Posts: 5
2x Epyc 7301, 16x16GB DDR4-2666

I've got 32 cores at 2.7 GHz and 16 memory channels. Synthetic memory benchmarks show total read/write bandwidth in excess of 310 GB/s. The system has 8 NUMA nodes and runs Windows Server 2016.

Mentor tech support indicates they've done internal testing showing good scaling on up to 32 cores.

SMT is off and memory interleaving is set at the channel level. I haven't tried die or socket interleaving yet, but it seems to me like those would be worse options.
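
For what it's worth, here's roughly how I've been double-checking what Windows actually reports for the topology before blaming the solver. Just a sketch, assuming the free Sysinternals Coreinfo tool is unzipped somewhere and run from an elevated cmd prompt:

rem sockets, physical cores, and NUMA nodes as Windows sees them
coreinfo -s -c -n
rem processor groups (with SMT off and only 32 logical CPUs, everything should normally land in one group)
coreinfo -g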

I see two problems:

First, while solving and watching CPU utilization, there are long stretches of time where FloTHERM uses only a single thread while initializing or finalizing a radiation factor calculation or the main thermal/airflow solver. Is that expected behavior? For a lot of models, those single-threaded periods take longer than the parallel solvers do!

But the much larger problem is that the parallel solvers are painfully slow on the Epyc system too. Parallel solve time is equal to or slower than that of the generic 4-6 core, 2-memory-channel desktops and laptops we have.

Compared to a 12-core, 8-memory-channel Haswell system we have, total solve times for our benchmark model are 2.5 times SLOWER, even though the Epyc box has 1.9x the parallel compute resources and 2.3x the memory bandwidth. Single-threaded performance is about 75% of the Haswell system's.

In terms of CPU time, the Haswell system spends 1845 minutes in the parallel solver while the Epyc system spends 15007 minutes. So on a per-thread basis, the Haswell system is getting through the parallel solver roughly 8 times faster than the Epyc system (15007 / 1845 ≈ 8.1). Something is egregiously wrong... Any ideas?

December 21, 2018, 18:07
  #2
New Member
 
Zachary Wilhoit
Join Date: Nov 2018
Posts: 5
Adding “set KMP_AFFINITY=noverbose,nowarnings,compact” in flotherm.bin gives about a 40% improvement, but we're still far from where we should be.
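
In case anyone wants to experiment, these are the variants of that line I've been trying. Treat it as a sketch: the keywords are standard Intel OpenMP affinity settings, but whether compact or scatter placement is actually better on this box is a guess on my part.

rem current setting: pack threads onto consecutive cores, suppress runtime output
set KMP_AFFINITY=noverbose,nowarnings,compact
rem same placement, but with the runtime's topology and binding report switched on
set KMP_AFFINITY=verbose,granularity=core,compact
rem alternative: spread threads across the dies/sockets instead of packing them
set KMP_AFFINITY=verbose,granularity=core,scatter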

I strongly suspect this is because our Epyc system has 8 NUMA nodes.

On our 2x E5-2643 v3 server, going from SMP to NUMA modes results in a ~15% performance penalty.

Compared to our 2x E5-2643 v3 machine, we're now only about 1.5x slower in FloTHERM. But OpenFOAM benchmarks run by others on this site on similar Epyc systems come out about 2x faster than that class of machine, right in line with the hardware advantage.

OpenFOAM benchmarks on various hardware

January 7, 2019, 23:28
FloTHERM performance - which version
  #3
New Member
 
DB
Join Date: Jan 2019
Posts: 8
What version of FloTHERM are you using, and which type, Classic or XT? The mesher and geometry transfer functions seem to have sped up significantly in newer versions. I also noticed some time ago that the mesher was only using one core, hence showing less than 10% of total CPU utilization; it seems to have improved lately. I haven't experienced this with the solver.
What is the size of your model (number of cells)?

January 8, 2019, 15:31
  #4
New Member
 
Zachary Wilhoit
Join Date: Nov 2018
Posts: 5
FloTHERM classic, 12.2, 1.6M cells.

But we see the same behavior on larger models too.

Adding "verbose" to KMP_AFFINITY pops up a cmd window when the solver is launched. That window shows that the OpenMP library isn't detecting our system topology correctly.
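
For anyone else chasing this, what I'm comparing in that window is the topology summary the runtime prints (packages x cores per package x threads per core, plus the per-thread binding lines) against what the machine really has. One workaround I intend to try is bypassing the detection and pinning threads explicitly. This is only a sketch; the 0-31 list assumes SMT is off and the OS processors are numbered straight through, so adjust it for your machine:

rem pin one thread per core explicitly instead of trusting topology detection
rem (assumes SMT off and OS procs 0-31)
set KMP_AFFINITY=verbose,granularity=core,proclist=[0-31],explicit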

Mentor support has referred me to their R&D department for further investigation.

For some light reading on optimizing OpenMP library calls for the Epyc architecture.... http://www.prace-ri.eu/IMG/pdf/Best-...-Guide-AMD.pdf

January 10, 2019, 15:09
1.6M cells, 1845 minutes
  #5
New Member
 
DB
Join Date: Jan 2019
Posts: 8
Wow, something is seriously wrong. I've solved a 24M-cell model of a 2U24 server in about that amount of time, actually 18 hours, in Classic V9, on a 4-core Dell from 2010 with 16 GB of memory. On a 10-year-old Compaq Presario laptop with 4 GB of memory, a 1.6M-cell model would solve in about 30 minutes to an hour. The laptop has an AMD processor, which seems to run faster than Intel's on some apps.
I've noticed that some virus protection programs slow the program down significantly, to 10% or so of normal speed (Win10 system). I think the virus scanner is checking all the temp files the mesher or solver writes and interfering. Try turning virus protection off during meshing or solving.
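
Rather than switching protection off entirely, another option is to exclude the FloTHERM project/scratch folder from real-time scanning. Rough sketch for Windows Defender from an elevated prompt; the path below is only a placeholder, so point it at wherever your project and temp files actually live:

rem exclude the FloTHERM working folder from Defender real-time scanning
rem (placeholder path; substitute your actual project / scratch directory)
powershell -Command "Add-MpPreference -ExclusionPath 'C:\FloTHERM_Projects'"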
DB
