SigFpe when running ANY application in parallel

April 23, 2015, 04:55
[SOLVED] SigFpe when running ANY application in parallel
Hi everybody.

I have a very simple case: a box-shaped volume created as a single block in blockMesh. The mesh is made of cubic cells, without any non-orthogonality. Everything runs fine until I try to run the solver or any other application using mpirun. Then loading the polyMesh fails and raises a sigFpe error.

As a simple demonstration of the error, I run checkMesh (single core), then decomposePar, then checkMesh again on the decomposed case.
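The three steps can be sketched as a small shell script to be run from the case directory. The commands are the ones quoted in this thread; the `have_openfoam` and `run_step` helpers are illustrative names, not part of OpenFOAM, and each step is only executed when the OpenFOAM utilities are found on the PATH.

```shell
#!/bin/sh
# Sketch of the reproduction steps, to be run from the case directory.
# have_openfoam and run_step are illustrative helpers, not OpenFOAM tools.

# Succeeds only if the two utilities used below are on the PATH.
have_openfoam() {
    command -v checkMesh >/dev/null 2>&1 &&
        command -v decomposePar >/dev/null 2>&1
}

# Print the command, then run it only when OpenFOAM is available.
run_step() {
    printf '+ %s\n' "$*"
    if have_openfoam; then
        "$@"
    fi
}

run_step checkMesh                          # serial check of the full mesh
run_step decomposePar                       # split the case per system/decomposeParDict
run_step mpirun -np 10 checkMesh -parallel  # parallel check: the step that crashes
```

Without OpenFOAM in the environment the script only echoes the three commands, which makes it safe to dry-run on a login node.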

When I run checkMesh, this is the output:
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  2.2.1                                 |
|   \\  /    A nd           | Web:                      |
|    \\/     M anipulation  |                                                 |
Build  : 2.2.1-57f3c3617a2d
Exec   : checkMesh
Date   : Apr 23 2015
Time   : 10:06:40
Host   : "node166"
PID    : 30400
Case   : /gpfs/scratch/userexternal/lamerio0/Rete/M3
nProcs : 1
sigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster
allowSystemOperations : Disallowing user-supplied system call operations

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create polyMesh for time = 0

Time = 0

Mesh stats
    points:           6031169
    faces:            17826816
    internal faces:   17562624
    cells:            5898240
    faces per cell:   6
    boundary patches: 7
    point zones:      0
    face zones:       0
    cell zones:       0

Overall number of cells of each type:
    hexahedra:     5898240
    prisms:        0
    wedges:        0
    pyramids:      0
    tet wedges:    0
    tetrahedra:    0
    polyhedra:     0

Checking topology...
    Boundary definition OK.
    Cell to face addressing OK.
    Point usage OK.
    Upper triangular ordering OK.
    Face vertices OK.
    Number of regions: 1 (OK).

Checking patch topology for multiply connected surfaces...
    Patch               Faces    Points   Surface topology
    cyclic_bottom       61440    62177    ok (non-closed singly connected)
    cyclic_top          61440    62177    ok (non-closed singly connected)
    cyclic_left         61440    62177    ok (non-closed singly connected)
    cyclic_right        61440    62177    ok (non-closed singly connected)
    in                  5040     5328     ok (non-closed singly connected)
    out                 9216     9409     ok (non-closed singly connected)
    net                 4176     4649     ok (non-closed singly connected)

Checking geometry...
    Overall domain bounding box (0 -0.04 -0.04) (0.8 0.04 0.04)
    Mesh (non-empty, non-wedge) directions (1 1 1)
    Mesh (non-empty) directions (1 1 1)
    Boundary openness (1.7218198e-16 3.407543e-16 -4.6831815e-17) OK.
    Max cell openness = 3.5101045e-16 OK.
    Max aspect ratio = 1.5 OK.
    Minimum face area = 6.9444444e-07. Maximum face area = 1.0416667e-06.  Face area magnitudes OK.
    Min volume = 8.6805556e-10. Max volume = 8.6805556e-10.  Total volume = 0.00512.  Cell volumes OK.
    Mesh non-orthogonality Max: 0 average: 0
    Non-orthogonality check OK.
    Face pyramids OK.
    Max skewness = 2.8799967e-06 OK.
    Coupled point location match (average 1.232301e-17) OK.

Mesh OK.

Sounds great!

Now I run decomposePar with this decomposeParDict:
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

numberOfSubdomains 10;
method scotch;

distributed false;

It returns:
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  2.2.1                                 |
|   \\  /    A nd           | Web:                      |
|    \\/     M anipulation  |                                                 |
Build  : 2.2.1-57f3c3617a2d
Exec   : decomposePar
Date   : Apr 23 2015
Time   : 10:00:12
Host   : "node166"
PID    : 23719
Case   : /gpfs/scratch/userexternal/lamerio0/Rete/M3
nProcs : 1
sigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster
allowSystemOperations : Disallowing user-supplied system call operations

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Decomposing mesh region0

Create mesh

Calculating distribution of cells
Selecting decompositionMethod scotch

Finished decomposition in 17.57 s

Calculating original mesh data

Distributing cells to processors

Distributing faces to processors

Distributing points to processors

Constructing processor meshes

Processor 0
    Number of cells = 591664
    Number of faces shared with processor 1 = 12561
    Number of faces shared with processor 1 = 15
    Number of faces shared with processor 1 = 24
    Number of faces shared with processor 3 = 12095
    Number of faces shared with processor 3 = 13
    Number of faces shared with processor 3 = 4
    Number of faces shared with processor 3 = 8
    Number of processor patches = 7
    Number of processor faces = 24720
    Number of boundary faces = 24004

Processor 1
    Number of cells = 589712
    Number of faces shared with processor 0 = 12561
    Number of faces shared with processor 0 = 15
    Number of faces shared with processor 0 = 24
    Number of faces shared with processor 9 = 12199
    Number of faces shared with processor 9 = 2
    Number of faces shared with processor 9 = 2
    Number of faces shared with processor 9 = 10
    Number of faces shared with processor 9 = 3
    Number of processor patches = 8
    Number of processor faces = 24816
    Number of boundary faces = 25086

Processor 2
    Number of cells = 590427
    Number of faces shared with processor 4 = 11695
    Number of faces shared with processor 4 = 4
    Number of faces shared with processor 4 = 6
    Number of faces shared with processor 4 = 5
    Number of processor patches = 4
    Number of processor faces = 11710
    Number of boundary faces = 33342

Processor 3
    Number of cells = 591917
    Number of faces shared with processor 0 = 12095
    Number of faces shared with processor 0 = 4
    Number of faces shared with processor 0 = 13
    Number of faces shared with processor 0 = 8
    Number of faces shared with processor 4 = 11647
    Number of faces shared with processor 4 = 6
    Number of faces shared with processor 4 = 2
    Number of faces shared with processor 4 = 1
    Number of faces shared with processor 4 = 14
    Number of processor patches = 9
    Number of processor faces = 23790
    Number of boundary faces = 24428

Processor 4
    Number of cells = 589664
    Number of faces shared with processor 2 = 11695
    Number of faces shared with processor 2 = 4
    Number of faces shared with processor 2 = 5
    Number of faces shared with processor 2 = 6
    Number of faces shared with processor 3 = 11647
    Number of faces shared with processor 3 = 2
    Number of faces shared with processor 3 = 6
    Number of faces shared with processor 3 = 14
    Number of faces shared with processor 3 = 1
    Number of processor patches = 9
    Number of processor faces = 23380
    Number of boundary faces = 24842

Processor 5
    Number of cells = 590639
    Number of faces shared with processor 6 = 12086
    Number of faces shared with processor 6 = 1
    Number of faces shared with processor 6 = 7
    Number of faces shared with processor 6 = 1
    Number of faces shared with processor 6 = 13
    Number of processor patches = 5
    Number of processor faces = 12108
    Number of boundary faces = 33578

Processor 6
    Number of cells = 589995
    Number of faces shared with processor 5 = 12086
    Number of faces shared with processor 5 = 7
    Number of faces shared with processor 5 = 1
    Number of faces shared with processor 5 = 13
    Number of faces shared with processor 5 = 1
    Number of faces shared with processor 7 = 11834
    Number of faces shared with processor 7 = 10
    Number of faces shared with processor 7 = 1
    Number of faces shared with processor 7 = 19
    Number of processor patches = 9
    Number of processor faces = 23972
    Number of boundary faces = 24414

Processor 7
    Number of cells = 589221
    Number of faces shared with processor 6 = 11834
    Number of faces shared with processor 6 = 1
    Number of faces shared with processor 6 = 10
    Number of faces shared with processor 6 = 19
    Number of faces shared with processor 8 = 11479
    Number of faces shared with processor 8 = 12
    Number of faces shared with processor 8 = 5
    Number of faces shared with processor 8 = 8
    Number of processor patches = 8
    Number of processor faces = 23368
    Number of boundary faces = 25196

Processor 8
    Number of cells = 588220
    Number of faces shared with processor 7 = 11479
    Number of faces shared with processor 7 = 5
    Number of faces shared with processor 7 = 12
    Number of faces shared with processor 7 = 8
    Number of faces shared with processor 9 = 11772
    Number of faces shared with processor 9 = 20
    Number of faces shared with processor 9 = 10
    Number of faces shared with processor 9 = 2
    Number of processor patches = 8
    Number of processor faces = 23308
    Number of boundary faces = 23794

Processor 9
    Number of cells = 586781
    Number of faces shared with processor 1 = 12199
    Number of faces shared with processor 1 = 2
    Number of faces shared with processor 1 = 2
    Number of faces shared with processor 1 = 3
    Number of faces shared with processor 1 = 10
    Number of faces shared with processor 8 = 11772
    Number of faces shared with processor 8 = 20
    Number of faces shared with processor 8 = 2
    Number of faces shared with processor 8 = 10
    Number of processor patches = 9
    Number of processor faces = 24020
    Number of boundary faces = 25052

Number of processor faces = 107596
Max number of cells = 591917 (0.35485162% above average 589824)
Max number of processor patches = 9 (18.421053% above average 7.6)
Max number of faces between processors = 24816 (15.320272% above average 21519.2)

Time = 0

Processor 0: field transfer
Processor 1: field transfer
Processor 2: field transfer
Processor 3: field transfer
Processor 4: field transfer
Processor 5: field transfer
Processor 6: field transfer
Processor 7: field transfer
Processor 8: field transfer
Processor 9: field transfer
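The balance summary near the end of the decomposePar output (maximum versus average cells and processor faces) can be recomputed from the per-processor lines as a sanity check. The following is a minimal sketch; the function name `balance_stats` and the log file name in the usage comment are assumptions, and the parsing relies on the per-processor lines being indented as they are in the output above.

```shell
#!/bin/sh
# Recompute the load-balance summary from decomposePar output on stdin.
# balance_stats is an illustrative name; only indented per-processor
# lines are counted, so the unindented global summary line is ignored.
balance_stats() {
    awk '
        /^ +Number of cells = / {
            n++; cells += $NF
            if ($NF + 0 > maxCells) maxCells = $NF + 0
        }
        /^ +Number of processor faces = / {
            faces += $NF
            if ($NF + 0 > maxFaces) maxFaces = $NF + 0
        }
        END {
            if (n == 0) exit 1
            avgCells = cells / n
            avgFaces = faces / n
            printf "cells: max %d, average %g, %.4f%% above average\n",
                maxCells, avgCells, 100 * (maxCells - avgCells) / avgCells
            printf "processor faces: max %d, average %g, %.4f%% above average\n",
                maxFaces, avgFaces, 100 * (maxFaces - avgFaces) / avgFaces
        }
    '
}

# Usage (log.decomposePar is a hypothetical file holding the output above):
#   balance_stats < log.decomposePar
```

For the decomposition shown here the averages are 5898240 / 10 = 589824 cells and 215192 / 10 = 21519.2 processor faces, so the script should agree with the roughly 0.35% and 15.3% imbalance that decomposePar reports. The cell balance is good; the crash is therefore not a load-balancing problem.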

But if I run any application with mpirun on the decomposed case, I get a sigFpe error right after the "Create polyMesh for time = 0" phase.
E.g. if I run "mpirun -np 10 checkMesh -parallel" I receive:

[error moved to the next post due to the character limit]

I tried changing the OpenFOAM version from 2.3.0 to 2.2.1 and also switched to a different cluster; nothing worked.

I also changed the number of processors from 10 to 9 and to 11, but that did not work either.

How can I solve the problem?

April 23, 2015, 04:56
The sigFpe error returned by checkMesh
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  2.2.1                                 |
|   \\  /    A nd           | Web:                      |
|    \\/     M anipulation  |                                                 |
Build  : 2.2.1-57f3c3617a2d
Exec   : checkMesh -parallel
Date   : Apr 23 2015
Time   : 10:05:51
Host   : "node166"
PID    : 29821
Case   : /gpfs/scratch/userexternal/lamerio0/Rete/M3
nProcs : 10
Slaves :

Pstream initialized with:
    floatTransfer      : 0
    nProcsSimpleSum    : 0
    commsType          : nonBlocking
    polling iterations : 0
sigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster
allowSystemOperations : Disallowing user-supplied system call operations

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create polyMesh for time = 0

checkMesh:29827 terminated with signal 11 at PC=7fdfce7f54da SP=7fff71ac1150.  Backtrace:
[2] #0  [7] Foam::error::printStack(Foam::Ostream&)#0  [8] #0  [9] Foam::error::printStack(Foam::Ostream&)#0  [5] #0  [4] Foam::error::printStack(Foam::Ostream&)#0  Foam::error::printStack(Foam::Ostream&)Foam::error::printStack(Foam::Ostream&)Foam::error::printStack(Foam::Ostream&)[3] #0  Foam::error::printStack(Foam::Ostream&)[0] #0  Foam::error::printStack(Foam::Ostream&)[1] #0  Foam::error::printStack(Foam::Ostream&) at ??:?
[3] #1  Foam::sigFpe::sigHandler(int) at ??:?
 at ??:?
[9] #1   at ??:?
[7] #1  Foam::sigSegv::sigHandler(int)[0] #1  Foam::sigSegv::sigHandler(int) at ??:?
Foam::sigSegv::sigHandler(int)[4] #1   at ??:?
[2] #1  Foam::sigSegv::sigHandler(int)Foam::sigSegv::sigHandler(int) at ??:?
[1] #1  Foam::sigSegv::sigHandler(int) at ??:?
[5] # at ??:?
[8] #1  1  Foam::sigSegv::sigHandler(int)Foam::sigSegv::sigHandler(int) at ??:?
[3] #2   at ??:?
[7] #2   at ??:?
[9] #2   at ??:?
[2] #2   at ??:?
[5] #2   in "/lib64/"
[3] #3  Foam::processorPolyPatch::updateMesh(Foam::PstreamBuffers&) at ??:?
[1] #2   at ??:?
[0] #2   at ??:?
[8] #2   at ??:?
[4] #2   in "/lib64/"
[7] #3  Foam::processorPolyPatch::updateMesh(Foam::PstreamBuffers&) in "/lib64/"
[9] #3  Foam::processorPolyPatch::updateMesh(Foam::PstreamBuffers&) in "/lib64/"
[2] #3  Foam::processorPolyPatch::updateMesh(Foam::PstreamBuffers&) in "/lib64/"
[5] #3  Foam::processorPolyPatch::updateMesh(Foam::PstreamBuffers&) in "/lib64/"
[1] #3  Foam::processorPolyPatch::updateMesh(Foam::PstreamBuffers&) at ??:?
[3] #4  Foam::polyBoundaryMesh::updateMesh() in "/lib64/"
[4] #3  Foam::processorPolyPatch::updateMesh(Foam::PstreamBuffers&) in "/lib64/"
[8] #3   in "/lib64/"
Foam::processorPolyPatch::updateMesh(Foam::PstreamBuffers&)[0] #3  Foam::processorPolyPatch::updateMesh(Foam::PstreamBuffers&) at ??:?
[7] #4  Foam::polyBoundaryMesh::updateMesh() at ??:?
[9] #4  Foam::polyBoundaryMesh::updateMesh() at ??:?
[2] #4  Foam::polyBoundaryMesh::updateMesh() at ??:?
[5] #4  Foam::polyBoundaryMesh::updateMesh() at ??:?
[3] #5  Foam::polyMesh::polyMesh(Foam::IOobject const&) at ??:?
[1] #4  Foam::polyBoundaryMesh::updateMesh() at ??:?
[4] #4  Foam::polyBoundaryMesh::updateMesh() at ??:?
[0] #4  Foam::polyBoundaryMesh::updateMesh() at ??:?
[8] #4  Foam::polyBoundaryMesh::updateMesh() at ??:?
[7] #5  Foam::polyMesh::polyMesh(Foam::IOobject const&) at ??:?
[9] #5  Foam::polyMesh::polyMesh(Foam::IOobject const&) at ??:?
[2] #5  Foam::polyMesh::polyMesh(Foam::IOobject const&) at ??:?
[3] #6   at ??:?
[1] #5  Foam::polyMesh::polyMesh(Foam::IOobject const&) at ??:?
[5] #5  Foam::polyMesh::polyMesh(Foam::IOobject const&) at ??:?
[4] #5  Foam::polyMesh::polyMesh(Foam::IOobject const&)
 at ??:?
[7] #6   at ??:?
[9] #6   at ??:?
[8] #5  Foam::polyMesh::polyMesh(Foam::IOobject const&) at ??:?
[0] #5  Foam::polyMesh::polyMesh(Foam::IOobject const&)[3]  at ??:?
[3] #7  __libc_start_main at ??:?
[5] #6   at ??:?
[1] #6   at ??:?
[2] #6

 at ??:?
[4] #6
 at ??:?
[8] #6   in "/lib64/"
[3] #8
 at ??:?
[0] #6
[7]  at ??:?
[7] #7  __libc_start_main[9]  at ??:?
[9] #7  __libc_start_main

[5]  at ??:?
[5] #7  __libc_start_main[1]  at ??:?
[1] #7  __libc_start_main
 in "/lib64/"
[7] #8
 in "/lib64/"
[9] #8  [4]  at ??:?
[4] #7  __libc_start_main[2]  at ??:?
[2] #7  __libc_start_main[3]  at ??:?
[node166:29824] *** Process received signal ***
[node166:29824] Signal: Floating point exception (8)
[node166:29824] Signal code:  (-6)
[node166:29824] Failing at address: 0x6ab300007480
 in "/lib64/"
[5] #8
[node166:29824] [ 0] /lib64/[0x7fcaeb103640]
[node166:29824] [ 1] /lib64/[0x7fcaeb1035c9]
[node166:29824] [ 2] /lib64/[0x7fcaeb103640]
[node166:29824] [ 3] /cineca/prod/applications/openfoam/2.2.1/openmpi--1.8.4--gnu--4.9.2/OpenFOAM-2.2.1/platforms/linux64GccDPOpt/lib/[0x7fcaec2e0344]
[node166:29824] [ 4] /cineca/prod/applications/openfoam/2.2.1/openmpi--1.8.4--gnu--4.9.2/OpenFOAM-2.2.1/platforms/linux64GccDPOpt/lib/[0x7fcaec2e4d11]
[node166:29824] [ 5] [8]  at ??:?
[8] #7  __libc_start_main/cineca/prod/applications/openfoam/2.2.1/openmpi--1.8.4--gnu--4.9.2/OpenFOAM-2.2.1/platforms/linux64GccDPOpt/lib/[0x7fcaec33575a]
[node166:29824] [ 6] checkMesh[0x409c78]
[node166:29824] [ 7] /lib64/[0x7fcaeb0efaf5]
[node166:29824] [ 8] checkMesh[0x40aab9]
[node166:29824] *** End of error message ***

 in "/lib64/"
[1] #8  [0]  at ??:?
[0] #7  __libc_start_main[7]  at ??:?

checkMesh:29828 terminated with signal 11 at PC=7f109fbf75c9 SP=7fff80095e78.  Backtrace:

[9]  at ??:?

checkMesh:29830 terminated with signal 11 at PC=7f178dac35c9 SP=7fff45671cf8.  Backtrace:
 in "/lib64/[0x7f178dac3640]
[4] #8  /cineca/prod/applications/openfoam/2.2.1/openmpi--1.8.4--gnu--4.9.2/OpenFOAM-2.2.1/platforms/linux64GccDPOpt/lib/[0x7f178eca0359]
 in "/lib64/"
[8] #8   in "/lib64/"
[2] #8   in "/lib64/"
[0] #8  [5]  at ??:?

checkMesh:29826 terminated with signal 11 at PC=7f2772bdd5c9 SP=7fff33a875b8.  Backtrace:
[1]  at ??:?

checkMesh:29822 terminated with signal 11 at PC=7f727af225c9 SP=7fffcedd8f78.  Backtrace:

[8]  at ??:?

checkMesh:29829 terminated with signal 11 at PC=7f2a8b8e35c9 SP=7fffe927b2b8.  Backtrace:
[4]  at ??:?

checkMesh:29825 terminated with signal 11 at PC=7f77c1eaa5c9 SP=7fff6da6d078.  Backtrace:
[2]  at ??:?

checkMesh:29823 terminated with signal 11 at PC=7f17651a65c9 SP=7fff7d9894b8.  Backtrace:
mpirun noticed that process rank 3 with PID 29824 on node node166 exited on signal 8 (Floating point exception).
April 23, 2015, 07:27
Solved using the fix from wyldckat found here

Originally Posted by wyldckat
Hi guilha,

Mmm... I originally thought you were using OpenFOAM 2.1... but if it's OpenFOAM 2.0.1, then my guess is that you're having problems with decomposing cyclic patches. Actually, I vaguely remember that the issues with decomposing cyclic patches were only fully fixed in OpenFOAM 2.2.

Try using the "preservePatches" entry in "decomposeParDict". In the file "applications/utilities/parallelProcessing/decomposePar/decomposeParDict" you should find this example:
//- Keep owner and neighbour on same processor for faces in patches:
// (makes sense only for cyclic patches)
//preservePatches (cyclic_half0 cyclic_half1);
Uncomment the last line and use the names of your cyclic patches.

Best regards,
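Applied to the case in this thread, the suggestion would presumably mean listing the four cyclic patches reported by checkMesh (cyclic_bottom, cyclic_top, cyclic_left, cyclic_right). The sketch below writes such a dictionary to a file of your choice; the function name `write_decompose_dict` is illustrative, and the exact dictionary the original poster ended up with is not shown in the thread, so treat the contents as an assumption built from the posted decomposeParDict plus the `preservePatches` entry.

```shell
#!/bin/sh
# Write a decomposeParDict that keeps both sides of each cyclic pair on
# one processor. write_decompose_dict is an illustrative helper name;
# the patch names are the cyclic patches listed by checkMesh above.
write_decompose_dict() {
    cat > "$1" <<'EOF'
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}

numberOfSubdomains 10;
method scotch;

// Keep owner and neighbour on same processor for faces in patches
// (makes sense only for cyclic patches)
preservePatches (cyclic_bottom cyclic_top cyclic_left cyclic_right);

distributed false;
EOF
}

# Usage, from the case directory (this overwrites the existing file):
#   write_decompose_dict system/decomposeParDict
```

After changing the dictionary, the old processor* directories need to be removed (or decomposePar run with its -force option) before decomposing again.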
April 23, 2015, 14:53
Bruno Santos
For future reference, this was reported here: - and they've answered back that this has already been fixed in 2.3.1 and 2.3.x.
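On versions without that fix, one way to see whether a decomposition has split cyclic patches across processors is to look for patches of type `processorCyclic` in each processor's boundary file. This is a rough sketch under the assumption that split cyclics show up with that type in `processor*/constant/polyMesh/boundary`, as they do in the OpenFOAM 2.x releases discussed here; the function name `count_split_cyclics` is illustrative.

```shell
#!/bin/sh
# Count processorCyclic patch entries per processor directory.
# count_split_cyclics is an illustrative helper; $1 is the case directory.
count_split_cyclics() {
    for b in "$1"/processor*/constant/polyMesh/boundary; do
        [ -f "$b" ] || continue
        # grep -c prints 0 but returns non-zero when nothing matches.
        n=$(grep -c 'processorCyclic;' "$b" || true)
        echo "$b: $n"
    done
}

# Usage, from the case directory:
#   count_split_cyclics .
```

Non-zero counts indicate that some cyclic faces ended up on a processor boundary, which is consistent with the repeated "faces shared with processor N" lines in the decomposePar output. With `preservePatches` set for all cyclic patches, the counts would be expected to drop to zero.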
