CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   OpenFOAM Running, Solving & CFD (https://www.cfd-online.com/Forums/openfoam-solving/)
-   -   problem with interfoam on multiple nodes (https://www.cfd-online.com/Forums/openfoam-solving/173774-problem-interfoam-multiple-nodes.html)

gkara June 27, 2016 11:00

problem with interfoam on multiple nodes
 
Hi,

I am facing trouble with running interFoam on a HPC cluster and would really appreciate some help.

My code runs in parallel without problems when it is restricted to a single node. However, when I submit the job on multiple nodes, the simulation crashes after an arbitrary number of time steps. The output of a typical case run in parallel on 60 cores (3 nodes) is given below:

Quote:

Courant Number mean: 1.06878e-05 max: 0.000182367
Interface Courant Number mean: 4.91428e-07 max: 0.000158003
deltaT = 6.62472e-07
Time = 3.914837612e-06

PIMPLE: iteration 1
MULES: Solving for alpha.water
Phase-1 volume fraction = 0.0855268 Min(alpha.water) = 0 Max(alpha.water) = 1
MULES: Solving for alpha.water
Phase-1 volume fraction = 0.0855268 Min(alpha.water) = 0 Max(alpha.water) = 1
MULES: Solving for alpha.water
Phase-1 volume fraction = 0.0855268 Min(alpha.water) = 0 Max(alpha.water) = 1
GAMG: Solving for p_rgh, Initial residual = 0.00011391, Final residual = 1.10544e-06, No Iterations 6
time step continuity errors : sum local = 1.68697e-11, global = 1.55809e-13, cumulative = 3.4914e-11
GAMG: Solving for p_rgh, Initial residual = 7.62066e-05, Final residual = 7.97734e-07, No Iterations 5
time step continuity errors : sum local = 1.21538e-11, global = 8.31624e-15, cumulative = 3.49224e-11
GAMG: Solving for p_rgh, Initial residual = 3.03775e-05, Final residual = 8.72768e-09, No Iterations 10
time step continuity errors : sum local = 1.32912e-13, global = 1.30875e-17, cumulative = 3.49224e-11
ExecutionTime = 76.1 s ClockTime = 79 s

Courant Number mean: 1.29647e-05 max: 0.000228018
Interface Courant Number mean: 5.9077e-07 max: 0.000191552
deltaT = 7.94966e-07
Time = 4.709803933e-06

PIMPLE: iteration 1
MULES: Solving for alpha.water
Phase-1 volume fraction = 0.0855268 Min(alpha.water) = 0 Max(alpha.water) = 1
MULES: Solving for alpha.water
Phase-1 volume fraction = 0.0855268 Min(alpha.water) = 0 Max(alpha.water) = 1
MULES: Solving for alpha.water
Phase-1 volume fraction = 0.0855268 Min(alpha.water) = 0 Max(alpha.water) = 1
GAMG: Solving for p_rgh, Initial residual = 9.46689e-05, Final residual = 9.36175e-07, No Iterations 6
time step continuity errors : sum local = 2.03431e-11, global = 1.79796e-13, cumulative = 3.51022e-11
GAMG: Solving for p_rgh, Initial residual = 6.33264e-05, Final residual = 7.16328e-07, No Iterations 5

The simulation does not appear to diverge; nevertheless it crashes, and a typical error message is given below:

Quote:

srun: error: node196: task 37: Floating point exception
srun: Terminating job step 109895.0
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: *** STEP 109895.0 CANCELLED AT 2016-06-23T15:57:08 *** on node019
srun: error: node196: tasks 20-36,38-39: Killed
srun: error: node245: tasks 40-59: Killed
srun: error: node019: tasks 0-19: Killed
In an attempt to resolve this issue I have tried many different things: changing the solution method (e.g. PCG instead of GAMG), using different decomposition methods (scotch, metis, simple, hierarchical), and experimenting with different schemes, but in all cases the problem persists.
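
For reference, the kind of changes I tried look roughly like the following (a sketch with typical values, not my exact dictionaries):

Code:

// system/fvSolution -- pressure solver switched from GAMG to PCG
p_rgh
{
    solver          PCG;
    preconditioner  DIC;
    tolerance       1e-07;
    relTol          0.05;
}

// system/decomposeParDict -- one of the decomposition methods tried
numberOfSubdomains  60;

method              scotch;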

To check whether this problem affects only my own case, I also ran the damBreakWithObstacle tutorial, which is similar to my case in terms of solution method, etc. For this test I used the interFoam solver (instead of the interDyMFoam used by default in the tutorial) with a mesh of (165 165 165) cells and ran it on 60 cores (3 nodes), which corresponds to approximately 75,000 cells per processor. Exactly the same problem arises in this case as well!
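
In case it helps, the tutorial modifications amount to the following (a sketch assuming a single-block layout; if the tutorial's blockMeshDict actually splits the domain into several blocks, the resolution triple is distributed over them). For the cell count: 165^3 = 4,492,125 cells, and 4,492,125 / 60 ≈ 74,869, i.e. roughly 75,000 cells per processor.

Code:

// system/controlDict -- solver swapped from interDyMFoam to interFoam
// (picked up by the tutorial's Allrun script via getApplication)
application     interFoam;

// constant/polyMesh/blockMeshDict -- refined block resolution
blocks
(
    hex (0 1 2 3 4 5 6 7) (165 165 165) simpleGrading (1 1 1)
);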

Finally, I should also mention that I use OpenFOAM 2.4.0, compiled against the Intel MPI 5.0.3 library.

I would greatly appreciate any ideas on what might cause this problem and how to resolve it.

Best regards,

George

gkara July 6, 2016 09:17

Any suggestions?

