CFD Online Discussion Forums


merijn December 14, 2018 08:58

SU2 example job on multiple nodes creates incorrect result
 
Hi,
I'm an HPC admin trying to get SU2 to run under Slurm on an OPA fabric.
The jobs appear to complete successfully, but they produce no valid results.

To test this, I ran the https://su2code.github.io/tutorials/Inviscid_ONERAM6/
example.

If I run it manually on one node like so:
SU2_CFD inv_ONERAM6.cfg
it produces a 29M restart_flow.dat, and I can open the resulting surface_flow.vtk file in ParaView to visualize it.

If I run the same simulation in parallel on two nodes with 24 cores each, using
parallel_computation.py -n $SLURM_NTASKS -f inv_ONERAM6.cfg
the log output is clean and reports success, but restart_flow.dat is smaller at 15M (half the size of the manual one-node run), and the .vtk file has no points to show.
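
The "no points" symptom can also be checked without opening ParaView, by reading the point count from the file header. This is only a minimal sketch: it assumes surface_flow.vtk is a legacy-format VTK file, which declares its points on a line of the form `POINTS <count> <type>`. The helper name vtk_point_count is made up for illustration and is not part of SU2.

```python
def vtk_point_count(path):
    """Return the point count declared in a legacy VTK file, or None if absent."""
    # errors="replace" keeps the scan from failing on any non-text payload.
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            fields = line.split()
            # Legacy VTK declares its points as: POINTS <count> <data type>
            if len(fields) >= 2 and fields[0].upper() == "POINTS":
                return int(fields[1])
    # No POINTS declaration was found anywhere in the file.
    return None
```

Calling vtk_point_count("surface_flow.vtk") on the one-node and the two-node output would show whether the parallel file declares zero points or has no POINTS line at all.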

On 8 nodes, restart_flow.dat shrinks to 3.6 MB.
On 16 nodes, restart_flow.dat shrinks to 1.8 MB.
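
As a quick arithmetic check, the sizes reported above can be compared with the one-node size divided by the node count. This is only a sketch using the figures from this post; the function name share_per_node is made up for illustration.

```python
SERIAL_SIZE_MB = 29.0  # one-node restart_flow.dat size reported in the post

# Multi-node restart_flow.dat sizes reported in the post, keyed by node count.
OBSERVED_MB = {2: 15.0, 8: 3.6, 16: 1.8}


def share_per_node(serial_mb, nodes):
    """Size of one node's share if the data were split evenly across nodes."""
    return serial_mb / nodes


for nodes, observed in sorted(OBSERVED_MB.items()):
    expected = share_per_node(SERIAL_SIZE_MB, nodes)
    print(f"{nodes:2d} nodes: observed {observed:4.1f} MB, 29/{nodes} = {expected:.2f} MB")
```

Each observed size is close to 29 MB divided by the number of nodes (roughly 14.5, 3.6 and 1.8 MB). That would fit a file holding only one node's share of the solution, although it does not prove that this is the cause.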

I do see the processes running on the allocated nodes.

My deduction is that the problem is split across multiple compute nodes, and restart_flow.dat then contains only the partial solution from one compute node, so the solution .vtk file is invalid. Speculating, perhaps each node simply overwrites restart_flow.dat.

I compiled SU2 with ./configure --prefix=/shared/Modules/SU2/6.1.0 --enable-mpi CFLAGS="-g -O3"
against Open MPI 4.0.0.

How can I debug this, and how do I get this test job to run correctly on multiple nodes?

merijn

