CFD Online Discussion Forums
OpenFOAM Bugs: SnappyHexMesh in parallel openmpi
(http://www.cfd-online.com/Forums/openfoam-bugs/62372-snappyhexmesh-parallel-openmpi.html)

wikstrom October 17, 2008 04:44

Description:
Lately I have several times been running into the following problem. It is repeatable with the same case on two different hardware architectures and with both the icc and gcc compilers:

During shell refinement iteration (>1) an MPI error occurs:

[dagobah:01576] *** An error occurred in MPI_Bsend
[dagobah:01576] *** on communicator MPI_COMM_WORLD
[dagobah:01576] *** MPI_ERR_BUFFER: invalid buffer pointer
[dagobah:01576] *** MPI_ERRORS_ARE_FATAL (goodbye)

The problem does not occur when using e.g. 2 processes.

Solver/Application:
snappyHexMesh


Testcase:
snappyHexMesh-coarse.tgz (attached)

Platform:
Linux i686 and x86-64; the latter tested with gcc and icc.

Version:
1.5.x (2008-10-10)

Notes:

To run:

blockMesh
decomposePar
foamJob -p -s snappyHexMesh
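
For anyone reproducing this: foamJob -p -s should be roughly equivalent to the following (a sketch; -p picks up numberOfSubdomains from system/decomposeParDict and -s echoes the output to the screen as well as the log file):

mpirun -np <numberOfSubdomains> snappyHexMesh -parallel | tee log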


Cheers, Niklas

mattijs October 17, 2008 11:40

Have you checked for 'if (user() == "Niklas")' ;-)

Your problem should be fixed in the 17-10 version onwards. Fixed some unsynchronised communications.

wikstrom October 19, 2008 14:51

;-) You're great, Mattijs. Thanks, and I'm pulling already. Testing tomorrow morning.

Really promising tool by the way, snappy! Awed. And it got me into learning Blender more, as well, which is fun.

/Niklas

wikstrom October 20, 2008 07:35

Sorry Mattijs,

it did not do it. Same situation as before, it seems. Does it work with user() == "Mattijs"? ;-)

mattijs October 20, 2008 14:20

It worked for me on 17/10 when I pushed those changes in. Didn't use foamJob but that is about the only difference.

niklas October 20, 2008 16:46

Didn't work for me either.

I see there's a new version of OpenMPI out, 1.2.8. I'm still using 1.2.6. Think that could be it?

Will try 1.2.8 tomorrow and see if it matters.

N

niklas October 21, 2008 03:46

nope, same problem...

mattijs October 21, 2008 04:12

Hadn't expected the MPI version to matter. Where does it go wrong? Can you post a log, or run it in separate windows (e.g. using mpirunDebug) and get a traceback? What are your Pstream settings (non-blocking)?
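
mpirunDebug is invoked much like mpirun (a sketch, assuming 4 subdomains; the script should then offer choices for how to launch each rank, e.g. under gdb in separate xterms - check its header comments for the exact options):

mpirunDebug -np 4 snappyHexMesh -parallel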

niklas October 21, 2008 06:17

Don't know what my Pstream settings are.

Where do I check/change that?

N

wikstrom October 21, 2008 06:23

etc/controlDict

Mine is nonBlocking, but I earlier tried with blocking. No difference.
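
For reference, the switch sits in the OptimisationSwitches section of $WM_PROJECT_DIR/etc/controlDict. A sketch of the relevant entry (names from a 1.5.x install; check your own copy):

OptimisationSwitches
{
    // valid values: blocking, scheduled, nonBlocking
    commsType       nonBlocking;
}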

niklas October 21, 2008 06:36

Never mind, it's nonBlocking.
Here's a log from mpirunDebug.

I'm trying different settings now.
Shell refinement iteration 2
----------------------------

Marked for refinement due to refinement shells : 338416 cells.
Determined cells to refine in = 0.74 s
Selected for internal refinement : 339745 cells (out of 736740)
Edge intersection testing:
Number of edges : 9527741
Number of edges to retest : 8366794
Number of intersected edges : 63397
Refined mesh in = 20.68 s
After refinement shell refinement iteration 2 : cells:3114955 faces:9495395 points:3267682
Cells per refinement level:
0 19318
1 8320
2 19344
3 2848317
4 155933
5 63723

Program received signal SIGHUP, Hangup.
[Switching to Thread 182941472640 (LWP 15999)]
0x0000002a98140c92 in opal_progress ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/ThirdParty/openmpi-1.2.8/platforms/linux64GccDPOpt/lib/libopen-pal.so.0
#0 0x0000002a98140c92 in opal_progress ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/ThirdParty/openmpi-1.2.8/platforms/linux64GccDPOpt/lib/libopen-pal.so.0
#1 0x0000002a9a57f0f5 in mca_pml_ob1_probe ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/ThirdParty/openmpi-1.2.8/platforms/linux64GccDPOpt/lib/openmpi/mca_pml_ob1.so
#2 0x0000002a97e9cd86 in MPI_Probe ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/ThirdParty/openmpi-1.2.8/platforms/linux64GccDPOpt/lib/libmpi.so.0
#3 0x0000002a979babd1 in Foam::IPstream::IPstream ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/openmpi-1.2.8/libPstream.so
#4 0x0000002a963a512f in Foam::fvMeshDistribute::receiveMesh ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libdynamicMesh.so
#5 0x0000002a963a7a9b in Foam::fvMeshDistribute::distribute ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libdynamicMesh.so
#6 0x0000002a95762e65 in Foam::meshRefinement::refineAndBalance ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libautoMesh.so
#7 0x0000002a95702e7b in Foam::autoRefineDriver::shellRefine ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libautoMesh.so
#8 0x0000002a9570394f in Foam::autoRefineDriver::doRefine ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libautoMesh.so
#9 0x0000000000406357 in main ()
(gdb) Hangup detected on fd 0
Error detected on fd 0

wikstrom October 21, 2008 06:41

scheduled, blocking and nonBlocking all give exactly the same result.

/N

FYI:

I have problems running mpirunDebug. Same issues I've had earlier with OF shell scripts run under /bin/sh: in Ubuntu, /bin/sh -> /bin/dash. (If I change the shebang to /bin/bash it usually works, though.) Does SuSE link /bin/sh to bash?

I can fix this locally. Just wanted you to know.

/Niklas

olesen October 21, 2008 07:43

Hi Niklas,

yes, on SuSE /bin/sh -> bash, but I thought most of the OpenFOAM scripts were POSIX-compliant (except for wmakeScheduler, which uses bash).

Could you be hitting this?
https://bugs.launchpad.net/debian/+s...89/+viewstatus

wikstrom October 21, 2008 10:06

No, that one seems to be solved. One example problem is in mpirunDebug, line 130. Specifically, this does not work:

#!/bin/dash
nProcs=4

for ((proc=0; proc<$nProcs; proc++))
do
    echo $proc
done

but results in "Syntax error: Bad for loop variable".
With /bin/bash it works, though.
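
For comparison, a POSIX-compliant rewrite of that loop runs fine under both dash and bash (a sketch; dash lacks the C-style for loop, but while and $(( )) arithmetic are POSIX):

#!/bin/sh
nProcs=4

proc=0
while [ "$proc" -lt "$nProcs" ]
do
    echo $proc
    proc=$((proc + 1))
done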

mattijs October 21, 2008 12:39

What is MPI_BUFFER_SIZE set to? I am running with 200000000 or even bigger. It transfers whole sections of the mesh across, so it might run into problems with a small buffer. Had hoped MPI would give a nicer error message, though :-(

You cannot run it with fulldebug - there is an ordering problem in constructing the patches with the new mesh.

niklas October 22, 2008 03:17

Let me buy you a beer :-)

Setting MPI_BUFFER_SIZE to 2000000000 (that's 9 zeros) solved it for me
(plus changing my username to Bob the superior builder ;))

For those who need to look like I did: it's in etc/settings.(c)sh.
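
Concretely, the edit looks something like this (a sketch; as I understand it, Pstream attaches a buffer of this many bytes for buffered MPI sends, which is why a too-small value surfaces as the MPI_Bsend error above). Re-source the OpenFOAM environment afterwards:

# in etc/settings.sh (sh/bash users)
export MPI_BUFFER_SIZE=2000000000

# in etc/settings.csh (csh/tcsh users)
setenv MPI_BUFFER_SIZE 2000000000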

lhcamilo November 26, 2008 05:06

Hi there,

I have been trying to run snappyHexMesh in parallel with no success at all. I am trying to run a case called "simplecase" using the following command from one directory above the case directory (I am using a quad core 32 bit machine):

"mpirun -np 4 snappyHexMesh -case simplecase -parallel "

then I get:

Points so far:8(137802 272906 280719 303252 303253 310382 325627 325731)
#0 Foam::error::printStack(Foam::Ostream&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
#1 Foam::error::abort() in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
#2 Foam::hexRef8::setRefinement(Foam::List<int> const&, Foam::polyTopoChange&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libdynamicMesh.so"
#3 Foam::meshRefinement::refine(Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#4 Foam::meshRefinement::refineAndBalance(Foam::string const&, Foam::decompositionMethod&, Foam::fvMeshDistribute&, Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#5 Foam::autoRefineDriver::surfaceOnlyRefine(Foam::refinementParameters const&, int) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#6 Foam::autoRefineDriver::doRefine(Foam::dictionary const&, Foam::refinementParameters const&, bool) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#7 main in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
#8 __libc_start_main in "/lib/libc.so.6"
#9 Foam::regIOobject::writeObject(Foam::IOstream::streamFormat, Foam::IOstream::versionNumber, Foam::IOstream::compressionType) const in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
[3]
[3]
[3] From function hexRef8::setRefinement(const labelList&, polyTopoChange&)
[3] in file polyTopoChange/polyTopoChange/hexRef8.C at line 3349.
[3]
FOAM parallel run aborting
[3]
[openfoam01:05499] MPI_ABORT invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode 1
mpirun noticed that job rank 0 with PID 5496 on node openfoam01 exited on signal 15 (Terminated).
2 additional processes aborted (not shown)


Alternatively, if I cd to the case directory and use the command

foamJob -p -s snappyHexMesh

I get this slightly bigger error message:

#1 Foam::error::abort() in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
#2 Foam::hexRef8::setRefinement(Foam::List<int> const&, Foam::polyTopoChange&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libdynamicMesh.so"
#3 Foam::meshRefinement::refine(Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#4 Foam::meshRefinement::refineAndBalance(Foam::string const&, Foam::decompositionMethod&, Foam::fvMeshDistribute&, Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#5 Foam::autoRefineDriver::surfaceOnlyRefine(Foam::refinementParameters const&, int) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#6 Foam::autoRefineDriver::doRefine(Foam::dictionary const&, Foam::refinementParameters const&, bool) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#7 main in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
#8 __libc_start_main in "/lib/libc.so.6"
#9 Foam::regIOobject::writeObject(Foam::IOstream::streamFormat, Foam::IOstream::versionNumber, Foam::IOstream::compressionType) const in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
[1]
[1]
[1] From function hexRef8::setRefinement(const labelList&, polyTopoChange&)
[1] in file polyTopoChange/polyTopoChange/hexRef8.C at line 3349.
[1]
FOAM parallel run aborting
[1]
[openfoam01:04820] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1
[3]
[3]
[3] cell 217431 of level 0 uses more than 8 points of equal or lower level
Points so far:8(137802 272906 280719 303252 303253 310382 325627 325731)
#0 Foam::error::printStack(Foam::Ostream&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
#1 Foam::error::abort() in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
#2 Foam::hexRef8::setRefinement(Foam::List<int> const&, Foam::polyTopoChange&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libdynamicMesh.so"
#3 Foam::meshRefinement::refine(Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#4 Foam::meshRefinement::refineAndBalance(Foam::string const&, Foam::decompositionMethod&, Foam::fvMeshDistribute&, Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#5 Foam::autoRefineDriver::surfaceOnlyRefine(Foam::refinementParameters const&, int) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#6 Foam::autoRefineDriver::doRefine(Foam::dictionary const&, Foam::refinementParameters const&, bool) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#7 main in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
#8 __libc_start_main in "/lib/libc.so.6"
#9 Foam::regIOobject::writeObject(Foam::IOstream::streamFormat, Foam::IOstream::versionNumber, Foam::IOstream::compressionType) const in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
[3]
[3]
[3] From function hexRef8::setRefinement(const labelList&, polyTopoChange&)
[3] in file polyTopoChange/polyTopoChange/hexRef8.C at line 3349.
[3]
FOAM parallel run aborting
[3]
[openfoam01:04825] MPI_ABORT invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode 1




I have tried following the advice above and increasing the MPI_BUFFER_SIZE, but that did not help.

Please help.

Thanks in advance,


leo

mattijs November 26, 2008 05:52

Did you try running the 1.5.x git repository? There are various fixes in there relating to snappyHexMesh.
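
If you installed from the tarball rather than git, something along these lines should fetch and build the 1.5.x line (a sketch; check the OpenFOAM site for the current repository address):

git clone http://repo.or.cz/r/OpenFOAM-1.5.x.git OpenFOAM-1.5.x
cd OpenFOAM-1.5.x
./Allwmake

If you already have a clone, a git pull followed by a rebuild is enough.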

lhcamilo November 26, 2008 06:55

I did not; I will try that right away, although I downloaded OF-1.5 onto this machine only 3 weeks ago.

In any case, I tried running the case with 2 processors, and surprisingly enough it went a lot further without breaking, until I got this message:

Setting up information for layer truncation ...
mpirun noticed that job rank 1 with PID 7153 on node openfoam01 exited on signal 15 (Terminated).
1 process killed (possibly by Open MPI)
[leo@openfoam01 simplecase]$ exited on signal 15 (Terminated)


I am still a little bit in the dark about why this keeps breaking, but I will try the git repo anyway.

thanks for the suggestion though

regards

leo

