Home > Forums > OpenFOAM Bugs

SnappyHexMesh in parallel openmpi


October 17, 2008, 04:44   #1
Niklas Wikstrom (wikstrom), Member
Description:
Lately I have run into the following problem several times. It is repeatable with the same case on two different hardware architectures, and with both the icc and gcc compilers:

During a shell refinement iteration (>1), an MPI error occurs:

[dagobah:01576] *** An error occurred in MPI_Bsend
[dagobah:01576] *** on communicator MPI_COMM_WORLD
[dagobah:01576] *** MPI_ERR_BUFFER: invalid buffer pointer
[dagobah:01576] *** MPI_ERRORS_ARE_FATAL (goodbye)

The problem does not occur when using e.g. 2 processes.

Solver/Application:
snappyHexMesh


Testcase:
snappyHexMesh-coarse.tgz

Platform:
linux i686 and X86-64, The latter tested with gcc and icc

Version:
1.5.x (2008-10-10)

Notes:

To run:

blockMesh
decomposePar
foamJob -p -s snappyHexMesh
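For context, decomposePar reads system/decomposeParDict from the case. The attached testcase's dictionary is not reproduced in the thread; a minimal sketch for a 4-process run might look as follows (the method and coefficients are illustrative assumptions, not the actual case settings):

```
// system/decomposeParDict -- illustrative sketch only
numberOfSubdomains 4;

method          simple;

simpleCoeffs
{
    n           (2 2 1);   // split 2 x 2 x 1 across processors
    delta       0.001;
}
```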


Cheers Niklas

October 17, 2008, 11:40   #2
Mattijs Janssens (mattijs), Super Moderator
Have you checked for 'if (user() == "Niklas")' ;-)

Your problem should be fixed with the 17-10 version onwards; I fixed some unsynchronised communications.

October 19, 2008, 14:51   #3
Niklas Wikstrom (wikstrom), Member
;-) You're great, Mattijs. Thanks, and I'm pulling already. Testing tomorrow morning.

Really promising tool, by the way, snappy! Awed. And it got me into learning Blender more as well, which is fun.

/Niklas

October 20, 2008, 07:35   #4
Niklas Wikstrom (wikstrom), Member
Sorry Mattijs,

that did not do it. It seems to be the same situation as before. Does it work with user() == "Mattijs"? ;-)

October 20, 2008, 14:20   #5
Mattijs Janssens (mattijs), Super Moderator
It worked for me on 17/10 when I pushed those changes in. I didn't use foamJob, but that is about the only difference.

October 20, 2008, 16:46   #6
Niklas Nordin (niklas), Super Moderator
It didn't work for me either.

I see there's a new version of OpenMPI out, 1.2.8; I'm still using 1.2.6. Do you think that could be it?

I will try 1.2.8 tomorrow and see if it matters.

N

October 21, 2008, 03:46   #7
Niklas Nordin (niklas), Super Moderator
Nope, same problem...

October 21, 2008, 04:12   #8
Mattijs Janssens (mattijs), Super Moderator
I hadn't expected the MPI version to matter. Where does it go wrong? Can you post a log, or run it in separate windows (e.g. using mpirunDebug) and get a traceback? What are your Pstream settings (non-blocking?)

October 21, 2008, 06:17   #9
Niklas Nordin (niklas), Super Moderator
I don't know what my Pstream settings are.

Where do I check/change that?

N

October 21, 2008, 06:23   #10
Niklas Wikstrom (wikstrom), Member
etc/controlDict

Mine is nonBlocking, but I tried blocking earlier. No difference.
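For anyone else looking: the setting lives in the OptimisationSwitches section of the global etc/controlDict. A sketch of the relevant entry (the actual file contains many other switches not shown here):

```
OptimisationSwitches
{
    // commsType: blocking, nonBlocking or scheduled
    commsType       nonBlocking;
}
```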

October 21, 2008, 06:36   #11
Niklas Nordin (niklas), Super Moderator
Never mind, it's nonBlocking. Here's a log from mpirunDebug; I'm trying different settings now.

Shell refinement iteration 2
----------------------------

Marked for refinement due to refinement shells : 338416 cells.
Determined cells to refine in = 0.74 s
Selected for internal refinement : 339745 cells (out of 736740)
Edge intersection testing:
Number of edges : 9527741
Number of edges to retest : 8366794
Number of intersected edges : 63397
Refined mesh in = 20.68 s
After refinement shell refinement iteration 2 : cells:3114955 faces:9495395 points:3267682
Cells per refinement level:
0 19318
1 8320
2 19344
3 2848317
4 155933
5 63723

Program received signal SIGHUP, Hangup.
[Switching to Thread 182941472640 (LWP 15999)]
0x0000002a98140c92 in opal_progress ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/ThirdParty/openmpi-1.2.8/platforms/linux64GccDPOpt/lib/libopen-pal.so.0
#0 0x0000002a98140c92 in opal_progress ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/ThirdParty/openmpi-1.2.8/platforms/linux64GccDPOpt/lib/libopen-pal.so.0
#1 0x0000002a9a57f0f5 in mca_pml_ob1_probe ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/ThirdParty/openmpi-1.2.8/platforms/linux64GccDPOpt/lib/openmpi/mca_pml_ob1.so
#2 0x0000002a97e9cd86 in MPI_Probe ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/ThirdParty/openmpi-1.2.8/platforms/linux64GccDPOpt/lib/libmpi.so.0
#3 0x0000002a979babd1 in Foam::IPstream::IPstream ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/openmpi-1.2.8/libPstream.so
#4 0x0000002a963a512f in Foam::fvMeshDistribute::receiveMesh ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libdynamicMesh.so
#5 0x0000002a963a7a9b in Foam::fvMeshDistribute::distribute ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libdynamicMesh.so
#6 0x0000002a95762e65 in Foam::meshRefinement::refineAndBalance ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libautoMesh.so
#7 0x0000002a95702e7b in Foam::autoRefineDriver::shellRefine ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libautoMesh.so
#8 0x0000002a9570394f in Foam::autoRefineDriver::doRefine ()
from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libautoMesh.so
#9 0x0000000000406357 in main ()
(gdb) Hangup detected on fd 0
Error detected on fd 0

October 21, 2008, 06:41   #12
Niklas Wikstrom (wikstrom), Member
scheduled, blocking and nonBlocking all give exactly the same result.

/N

FYI:

I have problems running mpirunDebug; the same issues I've had earlier with OF shell scripts that run under /bin/sh: on Ubuntu, /bin/sh -> /bin/dash. (If I change the shebang to /bin/bash it usually works, though.) Does SuSE link /bin/sh to bash?

I can fix this locally. Just wanted you to know.

/Niklas

October 21, 2008, 07:43   #13
Mark Olesen (olesen), Senior Member
Hi Niklas,

yes, on SuSE /bin/sh -> bash, but I thought most of the OpenFOAM scripts were POSIX-compliant (except for wmakeScheduler, which uses bash).

Could you be hitting this?
https://bugs.launchpad.net/debian/+s...89/+viewstatus

October 21, 2008, 10:06   #14
Niklas Wikstrom (wikstrom), Member
No, that one seems to be solved. One example problem is in mpirunDebug, line 130. Specifically, this does not work:

#!/bin/dash
nProcs=4

for ((proc=0; proc<$nProcs; proc++))
do
echo $proc
done

but results in "Syntax error: Bad for loop variable". With /bin/bash it works, though.
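For what it's worth, the C-style for loop is a bashism; a POSIX-portable rewrite that dash also accepts could look like this (a sketch of the pattern, not the actual mpirunDebug patch):

```shell
#!/bin/sh
# POSIX counting loop: dash lacks bash's for ((...)) syntax, but
# arithmetic expansion $((...)) and while are in POSIX sh.
nProcs=4
proc=0
while [ "$proc" -lt "$nProcs" ]
do
    echo "$proc"
    proc=$((proc + 1))
done
```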

October 21, 2008, 12:39   #15
Mattijs Janssens (mattijs), Super Moderator
What is MPI_BUFFER_SIZE set to? I am running with 200000000 or even bigger. It transfers whole sections of the mesh across, so it might run into problems with a small buffer. I had hoped MPI would give a nice error message, though :-(

You cannot run with fulldebug on it; there is an ordering problem in constructing the patches with the new mesh.

October 22, 2008, 03:17   #16
Niklas Nordin (niklas), Super Moderator
Let me buy you a beer.

Setting MPI_BUFFER_SIZE to 2000000000 (that's 9 zeros) solved it for me (plus changing my username to Bob the superior builder ;)).

For those who need to look, like I did: it's in etc/settings.(c)sh.
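In the sh/bash version the change amounts to something like the following (where exactly the line sits inside etc/settings.sh may differ between versions; the value is the one reported to work above):

```shell
# etc/settings.sh -- raise the buffered-send buffer for the large
# mesh transfers; settings.csh uses 'setenv MPI_BUFFER_SIZE ...' instead.
export MPI_BUFFER_SIZE=2000000000
```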

November 26, 2008, 04:06   #17
Leonardo Honfi Camilo (lhcamilo), Member
Hi there,

I have been trying to run snappyHexMesh in parallel with no success at all. I am trying to run a case called "simplecase" using the following command from one directory above the case directory (I am using a quad-core 32-bit machine):

"mpirun -np 4 snappyHexMesh -case simplecase -parallel "

then I get:

Points so far:8(137802 272906 280719 303252 303253 310382 325627 325731)#0 Foam::error::printStack(Foam::Ostream&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
#1 Foam::error::abort() in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
#2 Foam::hexRef8::setRefinement(Foam::List<int> const&, Foam::polyTopoChange&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libdynamicMesh.so"
#3 Foam::meshRefinement::refine(Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#4 Foam::meshRefinement::refineAndBalance(Foam::string const&, Foam::decompositionMethod&, Foam::fvMeshDistribute&, Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#5 Foam::autoRefineDriver::surfaceOnlyRefine(Foam::refinementParameters const&, int) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#6 Foam::autoRefineDriver::doRefine(Foam::dictionary const&, Foam::refinementParameters const&, bool) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#7 main in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
#8 __libc_start_main in "/lib/libc.so.6"
#9 Foam::regIOobject::writeObject(Foam::IOstream::streamFormat, Foam::IOstream::versionNumber, Foam::IOstream::compressionType) const in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
[3]
[3]
[3] From function hexRef8::setRefinement(const labelList&, polyTopoChange&)
[3] in file polyTopoChange/polyTopoChange/hexRef8.C at line 3349.
[3]
FOAM parallel run aborting
[3]
[openfoam01:05499] MPI_ABORT invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode 1
mpirun noticed that job rank 0 with PID 5496 on node openfoam01 exited on signal 15 (Terminated).
2 additional processes aborted (not shown)


Alternatively, if I cd to the case directory and use the command

foamJob -p -s snappyHexMesh

I get this slightly longer error message:

#1 Foam::error::abort() in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
#2 Foam::hexRef8::setRefinement(Foam::List<int> const&, Foam::polyTopoChange&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libdynamicMesh.so"
#3 Foam::meshRefinement::refine(Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#4 Foam::meshRefinement::refineAndBalance(Foam::string const&, Foam::decompositionMethod&, Foam::fvMeshDistribute&, Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#5 Foam::autoRefineDriver::surfaceOnlyRefine(Foam::refinementParameters const&, int) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#6 Foam::autoRefineDriver::doRefine(Foam::dictionary const&, Foam::refinementParameters const&, bool) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#7 main in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
#8 __libc_start_main in "/lib/libc.so.6"
#9 Foam::regIOobject::writeObject(Foam::IOstream::streamFormat, Foam::IOstream::versionNumber, Foam::IOstream::compressionType) const in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
[1]
[1]
[1] From function hexRef8::setRefinement(const labelList&, polyTopoChange&)
[1] in file polyTopoChange/polyTopoChange/hexRef8.C at line 3349.
[1]
FOAM parallel run aborting
[1]
[openfoam01:04820] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1
[3]
[3]
[3] cell 217431 of level 0 uses more than 8 points of equal or lower level
Points so far:8(137802 272906 280719 303252 303253 310382 325627 325731)#0 Foam::error::printStack(Foam::Ostream&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
#1 Foam::error::abort() in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
#2 Foam::hexRef8::setRefinement(Foam::List<int> const&, Foam::polyTopoChange&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libdynamicMesh.so"
#3 Foam::meshRefinement::refine(Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#4 Foam::meshRefinement::refineAndBalance(Foam::string const&, Foam::decompositionMethod&, Foam::fvMeshDistribute&, Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#5 Foam::autoRefineDriver::surfaceOnlyRefine(Foam::refinementParameters const&, int) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#6 Foam::autoRefineDriver::doRefine(Foam::dictionary const&, Foam::refinementParameters const&, bool) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
#7 main in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
#8 __libc_start_main in "/lib/libc.so.6"
#9 Foam::regIOobject::writeObject(Foam::IOstream::streamFormat, Foam::IOstream::versionNumber, Foam::IOstream::compressionType) const in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
[3]
[3]
[3] From function hexRef8::setRefinement(const labelList&, polyTopoChange&)
[3] in file polyTopoChange/polyTopoChange/hexRef8.C at line 3349.
[3]
FOAM parallel run aborting
[3]
[openfoam01:04825] MPI_ABORT invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode 1




I have tried following the advice above and increased MPI_BUFFER_SIZE, but that did not help.

Please help!

Thanks in advance,


leo

November 26, 2008, 04:52   #18
Mattijs Janssens (mattijs), Super Moderator
Did you try running the 1.5.x git repository? There are various fixes in there relating to snappyHexMesh.

November 26, 2008, 05:55   #19
Leonardo Honfi Camilo (lhcamilo), Member
I did not; I will try that right away, although I downloaded OF-1.5 onto this machine only 3 weeks ago.

In any case, I tried running the case with 2 processors, and surprisingly enough it went a lot further without breaking, until I got this message:

Setting up information for layer truncation ...
mpirun noticed that job rank 1 with PID 7153 on node openfoam01 exited on signal 15 (Terminated).
1 process killed (possibly by Open MPI)
[leo@openfoam01 simplecase]$ exited on signal 15 (Terminated)


I am still a little in the dark about why this keeps breaking, but I will try the git repository anyway.

Thanks for the suggestion, though.

regards

leo
