Segmentation fault with mpi

Post #1 | Johannes Voß (J.H.59) | December 4, 2018, 09:23
Hi there,

I have an annoying problem with MPI. Every time I run OpenFOAM in parallel with MPI, I get a segmentation fault after a random amount of time. This can be 7 minutes, 1 hour or even 1 day. If I run the same case in serial, everything is fine.
I'm using my own version of "sonicLiquidFoam" so that I can use waveTransmissive as a boundary condition, but the same error occurred with other solvers such as rhoPimpleFoam. My geometry is a long rectangle with a sphere in the middle. The error is:


Code:
[19] #0  Foam::error::printStack(Foam::Ostream&) at ??:?
[19] #1  Foam::sigSegv::sigHandler(int) at ??:?
[19] #2  ? in "/lib64/libc.so.6"
[19] #3  ? at btl_vader_component.c:?
[19] #4  opal_progress in "/home/j/j_voss/anaconda3/envs/foam/lib/./libopen-pal.so.40"
[19] #5  ompi_request_default_wait_all in "/home/j/j_voss/anaconda3/envs/foam/lib/libmpi.so.40"
[19] #6  PMPI_Waitall in "/home/j/j_voss/anaconda3/envs/foam/lib/libmpi.so.40"
[19] #7  Foam::UPstream::waitRequests(int) at ??:?
[19] #8  Foam::GeometricField<double, Foam::fvPatchField, Foam::volMesh>::Boundary::evaluate() at ??:?
[19] #9  Foam::tmp<Foam::GeometricField<double, Foam::fvPatchField, Foam::volMesh> > Foam::fvc::surfaceSum<double>(Foam::GeometricField<double, Foam::fvsPatchField, Foam::surfaceMesh> const&) at ??:?
[19] #10  ? at ??:?
[19] #11  ? at ??:?
[19] #12  __libc_start_main in "/lib64/libc.so.6"
[19] #13  ? at ??:?
[r08n11:323145] *** Process received signal ***
[r08n11:323145] Signal: Segmentation fault (11)
[r08n11:323145] Signal code:  (-6)
[r08n11:323145] Failing at address: 0x9979800004ee49
[r08n11:323145] [ 0] /lib64/libc.so.6(+0x362f0)[0x2b4c959052f0]
[r08n11:323145] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b4c95905277]
[r08n11:323145] [ 2] /lib64/libc.so.6(+0x362f0)[0x2b4c959052f0]
[r08n11:323145] [ 3] /home/j/j_voss/anaconda3/envs/foam/lib/openmpi/mca_btl_vader.so(+0x41ec)[0x2b4ca902f1ec]
[r08n11:323145] [ 4] /home/j/j_voss/anaconda3/envs/foam/lib/./libopen-pal.so.40(opal_progress+0x2c)[0x2b4c9b0af32c]
[r08n11:323145] [ 5] /home/j/j_voss/anaconda3/envs/foam/lib/libmpi.so.40(ompi_request_default_wait_all+0xed)[0x2b4c98cf8e5d]
[r08n11:323145] [ 6] /home/j/j_voss/anaconda3/envs/foam/lib/libmpi.so.40(PMPI_Waitall+0x9f)[0x2b4c98d3497f]
[r08n11:323145] [ 7] /home/j/j_voss/foam/OpenFOAM-v1712/platforms/linux64GccDPInt32Opt/lib/openmpi-system/libPstream.so(_ZN4Foam8UPstream12waitRequestsEi+0x85)[0x2b4c95ca2a75]
[r08n11:323145] [ 8] TestSonicLiquidFoam(_ZN4Foam14GeometricFieldIdNS_12fvPatchFieldENS_7volMeshEE8Boundary8evaluateEv+0x1ba)[0x439a5a]
[r08n11:323145] [ 9] TestSonicLiquidFoam(_ZN4Foam3fvc10surfaceSumIdEENS_3tmpINS_14GeometricFieldIT_NS_12fvPatchFieldENS_7volMeshEEEEERKNS3_IS4_NS_13fvsPatchFieldENS_11surfaceMeshEEE+0x2cf)[0x475eaf]
[r08n11:323145] [10] TestSonicLiquidFoam[0x475fb7]
[r08n11:323145] [11] TestSonicLiquidFoam[0x427a7d]
[r08n11:323145] [12] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b4c958f1445]
[r08n11:323145] [13] TestSonicLiquidFoam[0x42a83a]
[r08n11:323145] *** End of error message ***
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 19 with PID 0 on node r08n11 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
I checked some posts in this forum, such as
segmentation fault--parrallel problem?
but was not able to find a solution. I tried different decomposition methods (such as scotch and hierarchical), but both end in this kind of error.
Also, the place where the error occurs is sometimes different.
For example, the same error, but with

Code:
[132] #9   Foam::fv::gaussGrad<double>::gradf(Foam::GeometricField<double,  Foam::fvsPatchField, Foam::surfaceMesh> const&, Foam::word  const&) at ??:?
i.e. in gaussGrad.
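For reference, switching between the decomposition methods mentioned above (scotch and hierarchical) only changes the `method` entry in `system/decomposeParDict`. The sketch below writes a minimal dictionary; the helper name `write_decompose_dict` is hypothetical, and the `hierarchicalCoeffs` values (slicing the domain along x, which seems natural for a long rectangle) are assumptions to adapt to the actual mesh.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal system/decomposeParDict.
#   $1 = method (e.g. scotch or hierarchical)
#   $2 = numberOfSubdomains
#   $3 = case directory (default: current directory)
# The hierarchical split (n x 1 x 1, i.e. slices along x) is an assumption;
# scotch ignores hierarchicalCoeffs entirely.
write_decompose_dict() {
    local method="$1" n="$2" case_dir="${3:-.}"
    mkdir -p "$case_dir/system"
    cat > "$case_dir/system/decomposeParDict" <<EOF
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}

numberOfSubdomains $n;

method          $method;

hierarchicalCoeffs
{
    n           ($n 1 1);
    delta       0.001;
    order       xyz;
}
EOF
}
```

For example, `write_decompose_dict hierarchical 24` followed by `decomposePar -force` (which removes existing processor* directories first) re-decomposes the case with the other method.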


I also tried the new OpenFOAM version v1806, but the error stays the same. I installed my version on a cluster both via Anaconda and from source, as described in
https://www.openfoam.com/download/install-source.php
I'm thankful for any kind of help.


Best regards
Johannes

Post #2 | Johannes Voß (J.H.59) | December 12, 2018, 06:47
It seems that something went wrong with the installation.
The chosen MPI was SYSTEMOPENMPI, set by "WM_MPLIB=SYSTEMOPENMPI" in the etc/bashrc file, which uses the system-installed Open MPI.
When I change this to "WM_MPLIB=OPENMPI", change the line under OPENMPI in etc/config.sh/mpi to "export FOAM_MPI=openmpi-1.10.4", and install this Open MPI version in ThirdParty, everything works fine.
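The two edits described above can be scripted roughly as follows. This is a sketch assuming GNU sed and that both files keep their stock layout (`export WM_MPLIB=...` at the start of a line in etc/bashrc, and an `OPENMPI)` case label in etc/config.sh/mpi); the function name is hypothetical. After editing, etc/bashrc presumably has to be re-sourced, and the ThirdParty Open MPI plus the MPI-dependent parts of OpenFOAM (such as libPstream) rebuilt.

```shell
#!/bin/sh
# Hypothetical helper: switch an OpenFOAM installation from the system Open MPI
# to a ThirdParty Open MPI build. Assumes GNU sed (-i) and the stock file layout.
#   $1 = OpenFOAM directory (e.g. $HOME/foam/OpenFOAM-v1712)
#   $2 = ThirdParty Open MPI name (default: openmpi-1.10.4)
use_thirdparty_openmpi() {
    local foam_dir="$1" mpi_version="${2:-openmpi-1.10.4}"

    # export WM_MPLIB=SYSTEMOPENMPI  ->  export WM_MPLIB=OPENMPI
    sed -i 's/^\(export WM_MPLIB=\)SYSTEMOPENMPI/\1OPENMPI/' "$foam_dir/etc/bashrc"

    # Only inside the "OPENMPI)" case block: pin FOAM_MPI to the chosen version.
    sed -i "/^OPENMPI)/,/;;/ s/^\( *export FOAM_MPI=\)openmpi-[0-9.]*/\1$mpi_version/" \
        "$foam_dir/etc/config.sh/mpi"
}
```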

Post #3 | Cyrille Bonamy (cyss38) | December 14, 2018, 15:47
You may have a conflict with your conda installation...
The error log contains paths into your conda installation.

SYSTEMOPENMPI works well for me, but I do not combine conda and OpenFOAM (/home/j/j_voss/anaconda3/.....libmpi.so.40) ;-)
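One quick way to spot this kind of conflict before a run is to look for conda directories on PATH and LD_LIBRARY_PATH after sourcing etc/bashrc. The following is a heuristic sketch (it simply matches the substring `conda`; the function name is hypothetical), not a complete check of how the dynamic linker resolves libraries.

```shell
#!/bin/sh
# Hypothetical pre-flight check: warn about conda/anaconda directories on PATH
# or LD_LIBRARY_PATH, since their MPI libraries (e.g. libmpi.so.40) can shadow
# the Open MPI that OpenFOAM was built against. Matching "conda" is a heuristic.
# Returns 1 if anything suspicious is found, 0 otherwise.
check_no_conda_on_paths() {
    local dir status=0 old_ifs="$IFS"
    IFS=:                              # split the path lists on colons
    for dir in $PATH $LD_LIBRARY_PATH; do
        case "$dir" in
            *conda*)
                echo "warning: conda directory on search path: $dir" >&2
                status=1
                ;;
        esac
    done
    IFS="$old_ifs"
    return "$status"
}
```

Running `check_no_conda_on_paths && command -v mpirun` shows both whether conda is involved and which mpirun will actually be used; if it warns, deactivating the conda environment (`conda deactivate`) before sourcing OpenFOAM is one option.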

Post #4 | Johannes Voß (J.H.59) | December 17, 2018, 12:09
You are right, that seems to be the reason. I think SYSTEMOPENMPI picked up an MPI version installed with conda that does not work correctly (at least not with OpenFOAM).

Post #5 | Lukas Fischer (lukasf) | February 21, 2019, 03:31
Hi Johannes,

I am having a similar issue, but I am not using a conda installation.

Which Open MPI version defined in SYSTEMOPENMPI caused your problem?

Last edited by lukasf; February 21, 2019 at 03:32. Reason: wrong thread

Post #6 | Johannes Voß (J.H.59) | February 21, 2019, 05:07
Hi Lukas,

Since the error message shows that conda's Open MPI library is being called, I think it should be Open MPI 3.1.0.
But even if you don't use conda to install OpenFOAM, you should be able to choose the Open MPI version during installation, so you could perhaps also choose openmpi-1.10.4.
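To see which Open MPI version is actually picked up, `ompi_info` prints a line of the form `Open MPI: 3.1.0` and `mpirun --version` prints `mpirun (Open MPI) 3.1.0`. The hypothetical helper below extracts the version number from either output; OpenFOAM's own choice is visible in the `WM_MPLIB` and `FOAM_MPI` environment variables once etc/bashrc is sourced.

```shell
#!/bin/sh
# Hypothetical helper: read `ompi_info` or `mpirun --version` output on stdin
# and print the Open MPI version number (first match only).
#   ompi_info prints e.g.        "                Open MPI: 3.1.0"
#   mpirun --version prints e.g. "mpirun (Open MPI) 3.1.0"
ompi_version_from_text() {
    sed -n -E 's/.*Open MPI[:)] *([0-9]+(\.[0-9]+)*).*/\1/p' | head -n 1
}

# Example (needs Open MPI on PATH, so not run here):
#   ompi_info | ompi_version_from_text
#   mpirun --version 2>&1 | ompi_version_from_text
#   echo "WM_MPLIB=$WM_MPLIB FOAM_MPI=$FOAM_MPI"   # OpenFOAM's configured MPI
```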

Post #7 | Lukas Fischer (lukasf) | February 21, 2019, 06:07
I am not sure whether it is OK to use OpenFOAM 4.1 (OF4.1) with an Open MPI version other than the one used to compile it.

This is my issue (similar to yours):

I am using OF4.1 on a cluster. I wanted to compile it with the newest Open MPI version, 4.0.0. This is not possible ("ptscotch.H" cannot be found when I try to compile the ThirdParty folder). I switched to version 3.1.3, which allows me to compile OF4.1, so now I have an OpenFOAM version compiled with 3.1.3.

When I run simulations in parallel, they crash (floating point exception). Those simulations ran in the past on a different cluster with Open MPI 1.6.5. Unfortunately, it is not possible to use this version on the new cluster.

To bypass this problem, I sourced Open MPI 4.0.0. So now I am using OF4.1 (compiled with Open MPI 3.1.3) with Open MPI 4.0.0 sourced at run time.

The old simulations now run in parallel without a problem.
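Whether the build-time and run-time Open MPI actually differ can be checked with `ldd`, which shows the libmpi that OpenFOAM's Pstream library resolves to under the current environment. The sketch below assumes OpenFOAM's usual `$FOAM_LIBBIN/$FOAM_MPI/libPstream.so` location (compare the libPstream.so path in the stack trace of the first post) and standard `ldd` output; the helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print where libmpi is
# resolved, or "not found". Relevant lines look like
#   libmpi.so.40 => /path/to/lib/libmpi.so.40 (0x...)
libmpi_from_ldd() {
    awk '$1 ~ /^libmpi\.so/ && $2 == "=>" { print ($3 == "not" ? "not found" : $3); exit }'
}

# Example (inside a sourced OpenFOAM environment, not run here):
#   ldd "$FOAM_LIBBIN/$FOAM_MPI/libPstream.so" | libmpi_from_ldd
#   command -v mpirun   # ideally from the same Open MPI installation
```

If the printed libmpi and the mpirun on PATH come from different Open MPI installations, the run mixes versions.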


I then tried to run a different case, and the segmentation fault appears.

I improved the mesh to reduce the chance that it is the reason for the issue. checkMesh does not fail (I know that the mesh can still be an issue, though).

I used scotch to decompose my case. I will now try axial decomposition and see whether it crashes at the same time.

Post #8 | Lukas Fischer (lukasf) | February 26, 2019, 08:54
I was able to compile OF4.1 with Open MPI 1.10.7, and I can also run it with that version. The segmentation fault still arises.

I tried the same case in parallel with different numbers of processors. The case crashes with a higher number of processors (e.g. 240 or 280) after some time (the time differs). With a lower number of processors (e.g. 168) it runs without crashing and reaches a later end time, but this is too slow for me.
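Repeating the same case with different processor counts, as described above, can be scripted. A sketch, assuming the scotch method (so only `numberOfSubdomains` has to change) and GNU sed; both function names are hypothetical, the solver name is passed in, and the counts are the ones from this post.

```shell
#!/bin/sh
# Hypothetical helper: set numberOfSubdomains in a decomposeParDict.
#   $1 = new subdomain count, $2 = dictionary (default: system/decomposeParDict)
set_subdomains() {
    local n="$1" dict="${2:-system/decomposeParDict}"
    sed -i -E "s/^(numberOfSubdomains[[:space:]]+)[0-9]+;/\1$n;/" "$dict"
}

# Hypothetical driver (not called here): re-decompose and run once per count,
# writing one log per run. $1 = solver name, e.g. pimpleFoam.
scan_processor_counts() {
    local solver="$1" n
    for n in 168 240 280; do
        set_subdomains "$n"
        decomposePar -force > "log.decomposePar.$n" 2>&1 || return 1
        mpirun -np "$n" "$solver" -parallel > "log.$solver.$n" 2>&1
    done
}
```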


Right now I think it has nothing to do with the MPI versions with which I compiled and sourced OpenFOAM.

I would still be interested in any opinion.

Last edited by lukasf; March 1, 2019 at 04:02. Reason: updated my post

Post #9 | Shujaut H. Bader (backscatter) | October 26, 2019, 06:01
Hi Lukas,

Did you find a solution to this issue? I am facing a similar problem with my modified pimpleFoam for scalar transport. My simulations also crash abruptly, and when I restart them, they run fine again until the next abrupt MPI exit.
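Restarting after such a crash usually only requires telling the solver to continue from the newest written time instead of the start time, via `startFrom latestTime;` in system/controlDict; the existing processor* directories can be reused without running decomposePar again. A sketch with hypothetical helper names, assuming GNU sed and sort and an already decomposed case:

```shell
#!/bin/sh
# Hypothetical helpers for restarting a decomposed case after a crash.
# Assumes GNU sort (-g handles times such as 1e-05) and GNU sed (-i).

# Print the newest time directory written by processor0.
latest_processor_time() {
    local case_dir="${1:-.}"
    ls -1 "$case_dir/processor0" 2>/dev/null \
        | grep -E '^[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$' \
        | sort -g | tail -n 1
}

# Switch system/controlDict to "startFrom latestTime;" so the solver continues
# from the newest time directory. If the crash happened while fields were being
# written, check that this directory is complete before restarting.
restart_from_latest() {
    local case_dir="${1:-.}"
    sed -i -E 's/^(startFrom[[:space:]]+)[A-Za-z]+;/\1latestTime;/' \
        "$case_dir/system/controlDict"
}

# Example (not run here; N = numberOfSubdomains, <solver> = your solver):
#   latest_processor_time .
#   restart_from_latest . && mpirun -np N <solver> -parallel
```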

Post #10 | Lukas Fischer (lukasf) | October 27, 2019, 11:50
Hi,

I am able to run pimpleFoam without problems with Open MPI 1.6.5. I compiled OpenFOAM 4.1 while Open MPI 1.6.5 was sourced. When running OF 4.1, make sure to use Open MPI 1.6.5 as well.

Content of my sourced bashrc:

Code:
export PATH="/opt/openmpi/1.6.5/gcc/bin/:${PATH}"

#source openfoam
source /home/lukas/openFoam41_openMPI1.6.5/OpenFOAM-4.1/etc/bashrc

Content of my cluster runscript:

Code:
# load MPI
export PATH="/opt/openmpi/1.6.5/gcc/bin/:${PATH}"
export LD_LIBRARY_PATH="/opt/openmpi/1.6.5/gcc/lib/:${LD_LIBRARY_PATH}"

# source openfoam 4.1 for centos7
. /home/lukas/openFoam41_openMPI1.6.5

All times are GMT -4.