Tyrmida
October 10, 2023 14:49
Performance Issue - flmpi stops CX process busy
Good evening all,
I am hoping someone can point me in the right direction for figuring out why a certain model is performing slowly.
While solving (local single machine or over msmpi/ms job scheduler) the fl_mpi23* process works for a few seconds, but then stops until the next iteration.
In this "wait" period, the cx23* process is madly busy doing I don't know what - is there any way I can figure out what it is in fact trying to do?
Things we have tried:
- Running locally only and also through the MS job scheduler (msmpi)
- Running with and without the GUI
- Limiting CPU cores per node to different amounts, from 4 to 40
- Limiting the number of nodes to different amounts, from 1 to 12
- Disabling DPM in the model
All other benchmarks that we have tried are performing at 100%. Our system is 12 x nodes of dual 44-core Intel Xeon 6152 CPUs with 6 memory channels per CPU. Storage is SSD. Network is 100 Gb IB.
The "pausing" shows up in the performance timer: an iteration supposedly takes about 4 seconds, yet 30 iterations take 1276 seconds.
There are no errors in the Fluent output indicating that something is wrong. Disk queues are empty. The "average wall-clock time per iteration" seems to scale with the number of nodes/cores given to the job, but the waiting remains.
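To put rough numbers on that gap, here is a minimal sketch using the figures copied from the performance timer output in this post. The variable names are only illustrative, and treating the difference as "time outside the solver timer" is my own reading of the output, not something Fluent reports.

```python
# Values copied from the performance timer output in this post.
iterations = 30
avg_wall_per_iter = 4.032    # "Average wall-clock time per iteration" (s)
timer_total = 120.947        # "Total wall-clock time" (s)
simulation_total = 1276.0    # "Simulation wall-clock time for 30 iterations" (s)

# Time the timer accounts for, rebuilt from the per-iteration average.
accounted = iterations * avg_wall_per_iter

# Assumption: whatever the timer does not report is the "waiting" time.
unaccounted = simulation_total - timer_total
unaccounted_per_iter = unaccounted / iterations
unaccounted_fraction = unaccounted / simulation_total

print(f"accounted by timer:       {accounted:.2f} s")
print(f"unaccounted in total:     {unaccounted:.2f} s")
print(f"unaccounted per iteration: {unaccounted_per_iter:.1f} s")
print(f"share of the whole run:   {unaccounted_fraction:.1%}")
```

Under that assumption, roughly 38.5 seconds per iteration (about 90% of the run) fall outside what the timer reports.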
If anyone can maybe suggest how we can figure out the root cause I'd really appreciate it.
Code:
Performance timer output:
Performance Timer for 30 iterations on 240 compute nodes
Average wall-clock time per iteration: 4.032 sec
Global reductions per iteration: 443 ops
Global reductions time per iteration: 0.000 sec (0.0%)
Message count per iteration: 760147 messages
Data transfer per iteration: 3612.734 MB
LE solves per iteration: 5 solves
LE wall-clock time per iteration: 0.607 sec (15.1%)
LE global solves per iteration: 2 solves
LE global wall-clock time per iteration: 0.025 sec (0.6%)
LE global matrix maximum size: 355
AMG cycles per iteration: 6.000 cycles
Relaxation sweeps per iteration: 436 sweeps
Relaxation exchanges per iteration: 0 exchanges
LE early protections (stall) per iteration: 0.000 times
LE early protections (divergence) per iteration: 0.000 times
Total SVARS touched: 398
DPM updates per iteration: 0.5000 updates
DPM wall-clock time per iteration: 1.334 sec (33.1%)
Time-step updates per iteration: 0.50 updates
Time-step wall-clock time per iteration: 2.448 sec (60.7%)
Total wall-clock time: 120.947 sec
Total dpm solve time: 40.029 sec
Total dpm i/o time: 0.000 sec
Simulation wall-clock time for 30 iterations 1276 sec
Thank you very much in advance