CFD Online Discussion Forums


novakm October 18, 2013 04:13

OpenFOAM on cluster (ubuntu 12.04 LTS x64)
 
Dear FOAMers,

It has been a while since I started using OpenFOAM on a cluster.
As always, there are problems with OpenFOAM that magically appear and disappear. This thread shall serve as a summary of such problems, and hopefully of their solutions, as far as they are connected to cluster computing.

Best Regards

Martin

novakm October 18, 2013 06:45

Dear all,

It did not take long for the first error to appear.

I am observing that the choice of the number of CPUs is not completely free.
Sometimes, for a certain number of processors, the following error appears.

Code:

r17:~/OpenFOAM/novakm-2.2.x/myCases/pE/pE_2p2_300>mpirun -np 8 myColdEngineFoam -parallel
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  2.2.x                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build  : 2.2.x-54a7f72f9b3e
Exec   : myColdEngineFoam -parallel
Date   : Oct 18 2013
Time   : 10:54:40
Host   : "r17"
PID    : 20138
Case   : /srv/groot/novakm/OpenFOAM/novakm-2.2.x/myCases/pE/pE_2p2_300
nProcs : 8
Slaves :
7
(
"r17.20139"
"r17.20140"
"r17.20141"
"r17.20142"
"r17.20143"
"r17.20144"
"r17.20145"
)

Pstream initialized with:
    floatTransfer      : 0
    nProcsSimpleSum    : 0
    commsType          : nonBlocking
    polling iterations : 0
sigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster
allowSystemOperations : Disallowing user-supplied system call operations

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create engine time

Create mesh for time = 300

Selecting dynamicFvMesh dynamicTopoFvMesh
Selecting metric Knupp
Selecting motion solver: mesquiteMotionSolver
Selecting quality metric: InverseMeanRatio
Selecting objective function: LPtoP
Selecting optimization algorithm: FeasibleNewton
Outer termination criterion (tcOuter) was not found. Using default values.
Reading thermophysical properties

Selecting thermodynamics package
{
    type            hePsiThermo;
    mixture        pureMixture;
    transport      polynomial;
    thermo          janaf;
    equationOfState perfectGas;
    specie          specie;
    energy          sensibleEnthalpy;
}


Reading field U

Reading/calculating face flux field phi

Creating turbulence model

Selecting turbulence model type RASModel
Selecting RAS turbulence model kEpsilon
kEpsilonCoeffs
{
    Cmu            0.09;
    C1              1.44;
    C2              1.92;
    C3              -0.33;
    sigmak          1;
    sigmaEps        1.3;
    Prt            1;
}

Creating field dpdt

Creating field kinetic energy K

Creating fintite volume options from fvOptions

Selecting finite volume options model type temperatureLimitsConstraint
    Source: source1
    - applying source for all time
    - selecting all cells
    - selected 1560932 cell(s) with volume 0.000839220537

Courant Number mean: 0.0557076891 max: 2.43731091 velocity magnitude: 42.8669571
Total cylinder mass: 0.000636838361

PIMPLE: max iterations = 5
    field p        : relTol -1, tolerance 0.001
    field U        : relTol -1, tolerance 0.001
    field h        : relTol -1, tolerance 0.001


Starting time loop

Courant Number mean: 0.0557076891 max: 2.43731091 velocity magnitude: 42.8669571
Crank angle = 300.1 CA-deg
[r17:20140] *** An error occurred in MPI_Recv
[r17:20140] *** on communicator MPI_COMM_WORLD
[r17:20140] *** MPI_ERR_TRUNCATE: message truncated
[r17:20140] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 20140 on
node r17 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------


Does anybody know where the problem might possibly be?

Best Regards

Martin

wyldckat October 19, 2013 05:39

Hi Martin,

Given that you've been having issues getting the MPI toolbox working with OpenFOAM on your cluster (http://www.openfoam.org/mantisbt/view.php?id=1052), I suggest that you first do some communication tests.

From my blog post Notes about running OpenFOAM in parallel:
Quote:

On how to test if MPI is working: post #4 of "openfoam 1.6 on debian etch", and/or post #19 of "OpenFOAM updates" - Note: As of OpenFOAM 2.0.0, the application "parallelTest" is now called "Test-parallel".
This way you can first confirm whether the problem is related to bad MPI usage.
Next would be to check the memory limits, and only after that whether the problem is related to the solver+libraries being used.
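
For reference, a quick communications test could look something like this (a minimal sketch, assuming the case has already been decomposed for 8 processors and that the Test-parallel utility has been compiled from applications/test/parallel in the OpenFOAM source tree):

Code:

# run the MPI communications test on the decomposed case;
# this only passes messages between the ranks, no solving is done
mpirun -np 8 Test-parallel -parallel

# while at it, also check the per-process memory limits
ulimit -a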

Best regards,
Bruno

novakm October 19, 2013 06:42

Quote:

Originally Posted by wyldckat (Post 457770)
Hi Martin,

Given that you've been having issues getting the MPI toolbox working with OpenFOAM on your cluster (http://www.openfoam.org/mantisbt/view.php?id=1052), I suggest that you first do some communication tests.

From my blog post Notes about running OpenFOAM in parallel:
This way you can first confirm whether the problem is related to bad MPI usage.
Next would be to check the memory limits, and only after that whether the problem is related to the solver+libraries being used.

Best regards,
Bruno

Hi Bruno,

Great, I'll look into it.

The problem is that the case runs fine on 1-7 CPUs of an 8-CPU node; however, sometimes when I want to run on all 8 CPUs of the same node, the above crash appears.

The installation of OF 2.2.x is the default one (Open MPI 1.6.3 from ThirdParty is used). The bug report http://www.openfoam.org/mantisbt/view.php?id=1052 was connected with my effort to use the system default MPI (1.6.5), in order to test whether the problem is hidden in the MPI library.
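
For completeness, this is roughly how one can check which MPI the sourced OpenFOAM environment actually resolves to (a minimal sketch; the exact values depend on how the environment was set up):

Code:

# which MPI does the sourced OpenFOAM environment point to?
echo $WM_MPLIB $FOAM_MPI
which mpirun mpicc
mpirun --version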

Best regards

Martin

wyldckat October 19, 2013 07:21

Hi Martin,

Quote:

Originally Posted by novakm (Post 457778)
The problem is that the case runs fine on 1-7 CPUs of an 8-CPU node; however, sometimes when I want to run on all 8 CPUs of the same node, the above crash appears.

Mmm... two suspects are revealed from that:
  1. Decomposition: you should check what the meshes look like in the processor folders, because when some patches are split between multiple processors, it can lead to trouble. One example is cyclic patches, although those should work fine in parallel on OpenFOAM 2.2... (see the decomposeParDict sketch after this list)
    In addition, how the mesh is deformed can also be influenced by the decomposition... in other words, excessive deformation near the processor boundaries could probably lead to problems in resolving the flow across the processor patches.
  2. The machine itself: Is it "8 CPUs" as in actual cores on one or two physical CPUs? Or is it actually 8 threads?
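
On the decomposition side, a minimal sketch of a system/decomposeParDict that keeps given patches on a single processor could look like this (the usual FoamFile header is omitted, and the patch name "cyclicSides" is only a placeholder; whether preservePatches is needed depends on the actual case):

Code:

// system/decomposeParDict -- minimal sketch, not from the actual case
numberOfSubdomains 8;

method          scotch;   // graph-based decomposition, no manual splitting

// keep all faces of the listed patches on one processor, so that a
// cyclic patch is not split across processor boundaries
preservePatches (cyclicSides);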
Best regards,
Bruno

novakm October 19, 2013 07:38

Quote:

Originally Posted by wyldckat (Post 457782)
  2. The machine itself: Is it "8 CPUs" as in actual cores on one or two physical CPUs? Or is it actually 8 threads?

The machine architecture is based on Core i7 technology. There are 8 physical cores which use some type of hyper-threading, which leads them to appear as 16 virtual CPUs in the info given by Linux. However, the node has 8 real cores, and therefore I ask for 8 CPUs.
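
For reference, the actual core count can be checked like this (the numbers below are illustrative only, not from the actual node):

Code:

# distinguish physical cores from hyper-threaded (virtual) CPUs
lscpu | grep -E 'Socket|Core|Thread|^CPU\(s\)'
# CPU(s):               16
# Thread(s) per core:   2
# Core(s) per socket:   4
# Socket(s):            2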

M

