Slow calculation time on CFD server

#1 | June 18, 2021, 10:05 | killian153 (Killian), New Member
Hello,

We currently have a dual-CPU server with:

- CentOS 7
- 2x AMD EPYC 7352 (24 cores each)
- 256 GB of ECC RAM
- 1x RTX 3090

We run our calculations remotely through MobaXterm and launch Workbench with the command "workbench&".

The issue:

We noticed that calculations seemed very slow when two different Fluent processes were running (even with only half of the cores in use).

So I ran a benchmark on the same case (at the same time) between my computer and the server, with the same conditions: PBNS simulation of a nozzle in 2D axisymmetric - 12 cores

In the time my computer completed 3076 iterations, the server completed only 1699. BUT when I check the CPU info, here's what I get:


My computer:

---------------------------------------------------------------------------------------
         | CPU                                     | System Mem (GB)
Hostname | Sock x Core x HT  Clock (MHz)  Load (%) | Total     Available
---------------------------------------------------------------------------------------
#######  | 1 x 8 x 2         3792         2.44111  | 31.8535   14.192
---------------------------------------------------------------------------------------
Total    | 16                -            -        | 31.8535   14.192
---------------------------------------------------------------------------------------

---------------------------------------------
      | CPU Time Usage (Seconds)
ID    | User      Kernel    Elapsed
---------------------------------------------
host  | 187.141   29.9219   6098.92
n0    | 1754.94   31.6563   6097.94
n1    | 1999.02   49        6097.94
n2    | 2042.19   44.1875   6097.93
n3    | 2069.17   45.0938   6097.92
n4    | 2055.78   51.4844   6097.91
n5    | 2071.81   47.2656   6097.91
n6    | 2085.63   37.7813   6097.9
n7    | 2081.78   41.1875   6097.89
n8    | 2015.33   45.3594   6097.88
n9    | 2049.89   46.7031   6097.87
n10   | 2048.98   52.3125   6097.85
n11   | 2055.55   58.5313   6097.84
---------------------------------------------
Total | 24517.2   580.484   -
---------------------------------------------

Model Timers (Host)
Flow Model Time: 31.281 sec (CPU), count 3076
Other Models Time: 0.344 sec (CPU)
Total Time: 31.625 sec (CPU)

Model Timers
Flow Model Time: 999.346 sec (WALL), 1000.984 sec (CPU), count 3076
Turbulence Model Time: 278.553 sec (WALL), 276.125 sec (CPU), count 3076
Temperature Model Time: 211.580 sec (WALL), 211.203 sec (CPU), count 3076
Other Models Time: 0.533 sec (WALL)
Total Time: 1490.012 sec (WALL)

Performance Timer for 3076 iterations on 12 compute nodes
Average wall-clock time per iteration: 0.489 sec
Global reductions per iteration: 69 ops
Global reductions time per iteration: 0.000 sec (0.0%)
Message count per iteration: 28044 messages
Data transfer per iteration: 16.082 MB
LE solves per iteration: 4 solves
LE wall-clock time per iteration: 0.239 sec (49.0%)
LE global solves per iteration: 4 solves
LE global wall-clock time per iteration: 0.001 sec (0.3%)
LE global matrix maximum size: 589
AMG cycles per iteration: 7.876 cycles
Relaxation sweeps per iteration: 974 sweeps
Relaxation exchanges per iteration: 0 exchanges
LE early protections (stall) per iteration: 0.007 times
LE early protections (divergence) per iteration: 0.000 times
Total SVARS touched: 364

Total wall-clock time: 1503.849 sec

The server:


---------------------------------------------------------------------------------------
         | CPU                                     | System Mem (GB)
Hostname | Sock x Core x HT  Clock (MHz)  Load (%) | Total     Available
---------------------------------------------------------------------------------------
##       | 2 x 24 x 1        2300         21.6     | 251.845   125.986
---------------------------------------------------------------------------------------
Total    | 48                -            -        | 251.845   125.986
---------------------------------------------------------------------------------------

---------------------------------------------
      | CPU Time Usage (Seconds)
ID    | User      Kernel    Elapsed
---------------------------------------------
host  | 75        87        -
n0    | 389       88        -
n1    | 2737      243       -
n2    | 2741      237       -
n3    | 2746      244       -
n4    | 2756      235       -
n5    | 2746      224       -
n6    | 2757      209       -
n7    | 2747      216       -
n8    | 2740      207       -
n9    | 2752      221       -
n10   | 2749      211       -
n11   | 2750      205       -
---------------------------------------------
Total | 30685     2627      -
---------------------------------------------

Model Timers (Host)
Flow Model Time: 3.001 sec (CPU), count 1699
Other Models Time: 0.178 sec (CPU)
Total Time: 3.179 sec (CPU)

Model Timers
Flow Model Time: 190.640 sec (WALL), 191.337 sec (CPU), count 1699
Turbulence Model Time: 59.533 sec (WALL), 59.675 sec (CPU), count 1699
Temperature Model Time: 51.040 sec (WALL), 51.256 sec (CPU), count 1699
Other Models Time: 0.283 sec (WALL)
Total Time: 301.496 sec (WALL)

Performance Timer for 1699 iterations on 12 compute nodes
Average wall-clock time per iteration: 0.182 sec
Global reductions per iteration: 68 ops
Global reductions time per iteration: 0.000 sec (0.0%)
Message count per iteration: 25853 messages
Data transfer per iteration: 14.436 MB
LE solves per iteration: 4 solves
LE wall-clock time per iteration: 0.081 sec (44.7%)
LE global solves per iteration: 4 solves
LE global wall-clock time per iteration: 0.001 sec (0.7%)
LE global matrix maximum size: 614
AMG cycles per iteration: 7.741 cycles
Relaxation sweeps per iteration: 961 sweeps
Relaxation exchanges per iteration: 0 exchanges
LE early protections (stall) per iteration: 0.007 times
LE early protections (divergence) per iteration: 0.000 times
Total SVARS touched: 364

Total wall-clock time: 308.849 sec



---------------------------------------

So the reported time per iteration is 0.182 s on the server and 0.489 s on my computer, and yet the server completed about 45% fewer iterations than my computer in the same physical time.

I just don't understand.

Thanks!


For information:

We also get a warning message each time we launch Fluent (maybe it's related):

Warning:
Direct rendering unavailable, hardware acceleration will be disabled.
In the absence of hardware-accelerated drivers, the performance of all graphics operations will be severely affected. Make sure you have a supported graphics card, latest graphics driver, and a supported remote visualization tool with direct server-side rendering enabled. If you feel your system meets these requirements, try forcing the accelerated driver by using the command line flag (-driver <name>) or setting the HOOPS_PICTURE environment variable. Refer to the documentation for more details.


We never solved this problem. The GPU is recognized by Fluent and all drivers are installed, but maybe the problem comes from MobaXterm?

#2 | June 18, 2021, 18:39 | wkernkamp (Will Kernkamp), Member
Don't understand your timing assessment
The wall clock for the server is 309 sec and your local computer 1504 sec. Correcting for number of iterations, we would get 309*3076/1699 is 559 sec if the server had performed the same 3076 iterations. So the server is performing better than your home computer. What am I missing?
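
As a rough check of that normalization, here is a small sketch. The function name is hypothetical; the inputs are the "Total wall-clock time" and iteration counts from the two timer reports in the first post, and the scaling assumes the cost per iteration stays constant over the run.

```python
def normalized_wall_clock(wall_s, iterations, target_iterations):
    """Scale a measured wall-clock time to a different iteration count,
    assuming the cost per iteration is constant."""
    return wall_s * target_iterations / iterations

# Figures taken from the timer reports in post #1.
server_wall_s, server_iters = 308.849, 1699
local_wall_s, local_iters = 1503.849, 3076

# Time the server would need for the same 3076 iterations.
server_equiv_s = normalized_wall_clock(server_wall_s, server_iters, local_iters)
speedup = local_wall_s / server_equiv_s

print(f"server, scaled to {local_iters} iterations: {server_equiv_s:.0f} s")
print(f"local machine: {local_wall_s:.0f} s")
print(f"apparent speedup of the server: {speedup:.2f}x")
```

The apparent speedup agrees with the ratio of the reported per-iteration times (0.489 s against 0.182 s), so the two timer reports are at least consistent with each other.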

Last edited by wkernkamp; June 18, 2021 at 18:42. Reason: Filled in correct times and number of iterations.

#3 | June 18, 2021, 20:01 | killian153 (Killian), New Member
Quote (originally posted by wkernkamp):
The wall clock for the server is 309 sec and your local computer 1504 sec. Correcting for number of iterations, we would get 309*3076/1699 is 559 sec if the server had performed the same 3076 iterations. So the server is performing better than your home computer. What am I missing?
In "real time" (by which I mean physical, wall-clock time), with both runs launched at the same moment, the calculation completed 3076 iterations on my computer and only 1699 on the server. That's what I don't understand: the timers say the server is faster, but in reality it is roughly 50% slower. I can prove it because I launched both simulations at the same time, and when I stopped them my computer was at 3076 iterations while the server was still at 1699.
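
To put a number on the gap, here is a back-of-the-envelope sketch. It rests on an assumption that is not measured anywhere in this thread: that the physical duration of both runs is roughly the 1503.849 s total wall-clock time reported on my computer. The variable names are only illustrative.

```python
# ASSUMPTION: both runs lasted about as long as the local run's reported
# total wall-clock time. The real stopwatch time was not recorded.
physical_duration_s = 1503.849

# Time the server's performance timer attributes to its 1699 iterations.
server_solver_s = 308.849

# Time on the server not accounted for by the timed iterations.
unaccounted_s = physical_duration_s - server_solver_s
unaccounted_fraction = unaccounted_s / physical_duration_s

print(f"server time outside timed iterations: {unaccounted_s:.0f} s")
print(f"share of the run: {unaccounted_fraction:.0%}")
```

Under that assumption, most of the server's run would have been spent outside the iterations that the performance timer measures, which would explain why the per-iteration figure looks good while the iteration count lags.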

#4 | June 19, 2021, 18:03 | wkernkamp (Will Kernkamp), Member
Got it
It looks like the server performs fine while it is actually iterating, but most of the available time is consumed by something else. I assume nothing else is running on the machine, so that leaves the rendering warning and a possible wait on the remote interface as the only remaining candidates. Have you tried running Fluent from the command line, without the GUI:


fluent <version> -g -t<nprocs> -gpgpu=<ngpgpus> -i journalfile > outputfile
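
As a sketch of how the placeholders might be filled in for the 12-core 2D case above: the helper below only assembles and prints the command line and does not run anything. The helper name, the "2ddp" version string, and the journal and output file names are assumptions for illustration, so substitute your own.

```python
import shlex

def fluent_batch_args(version, nprocs, journal, ngpgpus=0):
    """Build the argument list for a GUI-less Fluent run (hypothetical helper).

    Uses only the flags from the command above:
    -g (no GUI), -t<n> (processes), -gpgpu=<n> (optional), -i <journal>.
    """
    args = ["fluent", version, "-g", f"-t{nprocs}"]
    if ngpgpus > 0:
        args.append(f"-gpgpu={ngpgpus}")
    args += ["-i", journal]
    return args

# Assumed values: double-precision 2D solver, 12 processes, one GPU.
args = fluent_batch_args("2ddp", 12, "nozzle.jou", ngpgpus=1)
command = shlex.join(args) + " > nozzle.out"
print(command)
```

Running the same journal this way on both machines would take the graphics window and the MobaXterm display out of the comparison.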

#5 | July 27, 2021, 08:28 | Baum, New Member
This is interesting, because I have only seen this behaviour on our PCs when I loaded them to 100%: the same PC finished a task faster with, e.g., 28/32 cores running than with 32/32, even though the Fluent process reported that iterations were solved more quickly in the second case. I always put this down to CPU overhead needed for other (smaller) tasks, like plotting graphs, writing report files, etc. However, if you have the same problem with 12/24 cores running, I guess that assumption was wrong. Have you tried contacting your Fluent representative about it?
