CFD Online Discussion Forums


Tyrmida October 10, 2023 14:49

Performance Issue - flmpi stops CX process busy
 
Good evening all,

I am hoping someone can point me in the right direction for figuring out why a particular model is solving slowly.

While solving (either locally on a single machine or through msmpi and the MS job scheduler), the fl_mpi23* processes work for a few seconds and then stop until the next iteration.

During this "wait" period the cx23* process is extremely busy doing something I cannot identify. Is there any way to find out what it is actually trying to do?

Things we have tried:
  1. Running locally only and also through the MS job scheduler (msmpi)
  2. Running with and without the GUI
  3. Limiting the CPU cores per node to various values from 4 to 40
  4. Limiting the number of nodes to various values from 1 to 12
  5. Disabling DPM in the model

All other benchmarks that we have tried perform at 100%. Our system is 12 nodes, each with dual 44-core Intel Xeon 6152 CPUs and 6 memory channels per CPU. Storage is SSD. The network is 100 Gb IB.

The "pausing" shows up in the performance timer: it reports about 4 seconds per iteration, yet 30 iterations take 1276 seconds of wall-clock time, which is roughly 42.5 seconds per iteration.
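To put a number on that gap, here is a rough sketch of the arithmetic, using only the figures from the performance timer output further down in this post. The function name `unaccounted_time` is just something made up for illustration, and treating the difference as "time not covered by the solver timer" is an assumption about how the timer works, not something confirmed.

```python
# Figures copied from the performance timer output in this post.
iterations = 30
timer_total_s = 120.947            # "Total wall-clock time"
simulation_wall_clock_s = 1276.0   # "Simulation wall-clock time for 30 iterations"


def unaccounted_time(sim_wall_s, timer_s, n_iter):
    """Return (total gap, gap per iteration, gap as a fraction of wall-clock).

    Assumption: the gap is simply simulation wall-clock minus the time
    the performance timer attributes to the solver.
    """
    gap = sim_wall_s - timer_s
    return gap, gap / n_iter, gap / sim_wall_s


gap_s, gap_per_iter_s, gap_fraction = unaccounted_time(
    simulation_wall_clock_s, timer_total_s, iterations
)
observed_per_iter_s = simulation_wall_clock_s / iterations

print(f"Observed time per iteration:   {observed_per_iter_s:.1f} s")
print(f"Unaccounted time in total:     {gap_s:.1f} s")
print(f"Unaccounted time per iteration: {gap_per_iter_s:.1f} s")
print(f"Share of wall-clock unaccounted: {gap_fraction:.1%}")
```

By this estimate roughly 90% of the elapsed time falls outside what the timer reports, which matches the long idle stretches seen on the fl_mpi23* processes.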

The Fluent output shows no errors or any other sign that something is wrong. Disk queues are empty. The "average wall-clock time per iteration" does seem to scale with the number of nodes/cores given to the job, but the waiting remains.

If anyone can suggest how we might track down the root cause, I'd really appreciate it.

Code:

Performance timer output:

Performance Timer for 30 iterations on 240 compute nodes
  Average wall-clock time per iteration:                4.032 sec
  Global reductions per iteration:                        443 ops
  Global reductions time per iteration:                0.000 sec (0.0%)
  Message count per iteration:                        760147 messages
  Data transfer per iteration:                      3612.734 MB
  LE solves per iteration:                                  5 solves
  LE wall-clock time per iteration:                    0.607 sec (15.1%)
  LE global solves per iteration:                          2 solves
  LE global wall-clock time per iteration:              0.025 sec (0.6%)
  LE global matrix maximum size:                          355
  AMG cycles per iteration:                            6.000 cycles
  Relaxation sweeps per iteration:                        436 sweeps
  Relaxation exchanges per iteration:                      0 exchanges
  LE early protections (stall) per iteration:          0.000 times
  LE early protections (divergence) per iteration:      0.000 times
  Total SVARS touched:                                    398
  DPM updates per iteration:                          0.5000 updates
  DPM wall-clock time per iteration:                    1.334 sec (33.1%)
  Time-step updates per iteration:                      0.50 updates
  Time-step wall-clock time per iteration:              2.448 sec (60.7%)

  Total wall-clock time:                              120.947 sec
  Total dpm solve time:                                40.029 sec
  Total dpm i/o time:                                  0.000 sec


Simulation wall-clock time for 30 iterations            1276 sec

Thank you very much in advance

