Description:
Lately I have run into the following problem several times. It is repeatable with the same case on two different hardware architectures and with both the icc and gcc compilers. During a shell refinement iteration (>1), an MPI error occurs:
    [dagobah:01576] *** An error occurred in MPI_Bsend
    [dagobah:01576] *** on communicator MPI_COMM_WORLD
    [dagobah:01576] *** MPI_ERR_BUFFER: invalid buffer pointer
    [dagobah:01576] *** MPI_ERRORS_ARE_FATAL (goodbye)
The problem does not occur when using e.g. 2 processes.
Solver/Application: (name of application. If one of your own track down where the problem is inside OpenFOAM)
Testcase: snappyHexMesh-coarse.tgz
Platform: linux i686 and x86-64, the latter tested with gcc and icc
Version: 1.5.x (2008-10-10)
Notes: To run:
    blockMesh
    decomposePar
    foamJob -p -s snappyHexMesh
Cheers
Niklas |
Have you checked for 'if (user() == "Niklas")' ;-)
Your case should work with the version from 17-10 onwards. Fixed some unsynchronised communications. |
;-) you're great Mattijs. Thanks, and I'm pulling already. Testing tomorrow morning.
Really promising tool by the way, snappy! Awed. And it got me into learning Blender more, as well, which is fun. /Niklas |
Sorry Mattijs,
did not do it. Same situation as before, it seems. Does it work with user()== "Mattijs" ? ;-) |
It worked for me on 17/10 when I pushed those changes in. Didn't use foamJob but that is about the only difference.
|
Didn't work for me either.
I see there's a new version of OpenMPI out, 1.2.8; I'm still using 1.2.6. Think that could be it? Will try 1.2.8 tomorrow and see if it matters. N |
nope, same problem...
|
Hadn't expected mpi version to matter. Where does it go wrong? Can you post log or run it in separate windows (e.g. using mpirunDebug) and get a traceback? What are your Pstream settings (non-blocking?)
|
Don't know what my Pstream settings are.
Where do I check/change that? N |
etc/controlDict
Mine is nonBlocking, but I earlier tried with blocking. No difference. |
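For anyone else wondering where to look, here is a minimal sketch that lists the communication-type lines in the global controlDict. It assumes the OpenFOAM environment has been sourced so that `WM_PROJECT_DIR` is set and that the file sits at `$WM_PROJECT_DIR/etc/controlDict`. The fallback path and the helper name `show_comms_lines` are only illustrative. It greps for the three values mentioned in this thread rather than for a particular keyword.

```shell
#!/bin/sh
# Illustrative helper (not part of OpenFOAM): print every line of a
# controlDict-style file that mentions one of the three communication
# types discussed in this thread, prefixed with its line number.
show_comms_lines() {
    grep -n -E 'scheduled|blocking|nonBlocking' "$1"
}

# Assumed location of the global controlDict; adjust to your installation.
dict="${WM_PROJECT_DIR:-$HOME/OpenFOAM/OpenFOAM-1.5}/etc/controlDict"

if [ -r "$dict" ]; then
    show_comms_lines "$dict"
else
    echo "cannot read $dict" >&2
fi
```

Edit the value in that file and rerun the case to try a different setting.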
nvm,
it's nonBlocking. Here's a log from mpirunDebug. I'm trying different settings now.
    Shell refinement iteration 2
    ----------------------------
    Marked for refinement due to refinement shells : 338416 cells.
    Determined cells to refine in = 0.74 s
    Selected for internal refinement : 339745 cells (out of 736740)
    Edge intersection testing:
        Number of edges : 9527741
        Number of edges to retest : 8366794
        Number of intersected edges : 63397
    Refined mesh in = 20.68 s
    After refinement shell refinement iteration 2 : cells:3114955 faces:9495395 points:3267682
    Cells per refinement level:
        0 19318
        1 8320
        2 19344
        3 2848317
        4 155933
        5 63723
    Program received signal SIGHUP, Hangup.
    [Switching to Thread 182941472640 (LWP 15999)]
    0x0000002a98140c92 in opal_progress () from /afs/scania.se/home/z/sssnos/OpenFOAM/ThirdParty/openmpi-1.2.8/platforms/linux64GccDPOpt/lib/libopen-pal.so.0
    #0 0x0000002a98140c92 in opal_progress () from /afs/scania.se/home/z/sssnos/OpenFOAM/ThirdParty/openmpi-1.2.8/platforms/linux64GccDPOpt/lib/libopen-pal.so.0
    #1 0x0000002a9a57f0f5 in mca_pml_ob1_probe () from /afs/scania.se/home/z/sssnos/OpenFOAM/ThirdParty/openmpi-1.2.8/platforms/linux64GccDPOpt/lib/openmpi/mca_pml_ob1.so
    #2 0x0000002a97e9cd86 in MPI_Probe () from /afs/scania.se/home/z/sssnos/OpenFOAM/ThirdParty/openmpi-1.2.8/platforms/linux64GccDPOpt/lib/libmpi.so.0
    #3 0x0000002a979babd1 in Foam::IPstream::IPstream () from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/openmpi-1.2.8/libPstream.so
    #4 0x0000002a963a512f in Foam::fvMeshDistribute::receiveMesh () from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libdynamicMesh.so
    #5 0x0000002a963a7a9b in Foam::fvMeshDistribute::distribute () from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libdynamicMesh.so
    #6 0x0000002a95762e65 in Foam::meshRefinement::refineAndBalance () from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libautoMesh.so
    #7 0x0000002a95702e7b in Foam::autoRefineDriver::shellRefine () from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libautoMesh.so
    #8 0x0000002a9570394f in Foam::autoRefineDriver::doRefine () from /afs/scania.se/home/z/sssnos/OpenFOAM/OpenFOAM-1.5.2/lib/linux64GccDPOpt/libautoMesh.so
    #9 0x0000000000406357 in main ()
    (gdb) Hangup detected on fd 0
    Error detected on fd 0 |
scheduled, blocking or nonBlocking gives exactly the same result.
/N
FYI: I have problems running mpirunDebug. These are the same issues I've had earlier with OF shell scripts that run under /bin/sh: in Ubuntu, /bin/sh -> /bin/dash. (If I change the shebang to /bin/bash it usually works, though.) Does SuSE link /bin/sh to bash? I can fix this locally. Just wanted you to know. /Niklas |
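A quick way to see which shell /bin/sh actually resolves to on a given machine is sketched below. It assumes GNU coreutils `readlink` (for the `-f` option), and the function name `sh_target` is only illustrative.

```shell
#!/bin/sh
# Print the file that /bin/sh finally resolves to
# (for example bash on SuSE or dash on Ubuntu, as discussed in this thread).
sh_target() {
    # readlink -f follows the whole symlink chain to the final target.
    readlink -f /bin/sh
}

echo "/bin/sh -> $(sh_target)"
```

If the target is dash, scripts with a `#!/bin/sh` shebang cannot rely on bash-only syntax.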
Hi Niklas,
yes on SuSE /bin/sh -> bash, but I thought most of the OpenFOAM scripts were POSIX-compliant (except for wmakeScheduler, which uses bash). Could you be hitting this? https://bugs.launchpad.net/debian/+s...89/+viewstatus |
No, that one seems to be solved. One example problem is in mpirunDebug, line 130. Specifically, this does not work:
    #!/bin/dash
    nProcs=4
    for ((proc=0; proc<$nProcs; proc++))
    do
        echo $proc
    done
but results in "Syntax error: Bad for loop variable". With /bin/bash it works, though. |
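The C-style `for ((...))` loop is a bash extension, which is why dash rejects it. A minimal sketch of a POSIX-compatible equivalent follows; it is not the actual fix applied to mpirunDebug, and the function name `count_procs` is only illustrative.

```shell
#!/bin/sh
# POSIX-compatible replacement for: for ((proc=0; proc<$nProcs; proc++))
# Prints the process indices 0 .. nProcs-1, one per line.
count_procs() {
    nProcs=$1
    proc=0
    while [ "$proc" -lt "$nProcs" ]
    do
        echo "$proc"
        # $(( )) arithmetic expansion is POSIX and works in dash as well as bash.
        proc=$((proc + 1))
    done
}

count_procs 4
```

This should behave the same under /bin/dash and /bin/bash, so the shebang can stay as `#!/bin/sh`.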
What is MPI_BUFFER_SIZE set to? I am running with 200000000 or even bigger. It transfers whole sections of the mesh across, so it might run into problems with a small buffer. Had hoped MPI would give a nice error message though :-(
You cannot run with fulldebug on it: there is an ordering problem in constructing the patches with the new mesh. |
Let me buy you a beer.
Setting MPI_BUFFER_SIZE to 2000000000 (that's 9 zeros) solved it for me (plus changing my username to Bob the superior builder ;)). For those who need to look for it like I did, it's in etc/settings.(c)sh |
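For those who want to try the larger buffer without editing etc/settings.sh first, here is a minimal sketch that overrides the variable for the current sh/bash session only. The range check is my own addition: it assumes the value ends up as a C `int` size (as in `MPI_Buffer_attach`), so it should stay below 2147483647.

```shell
#!/bin/sh
# Show what the current shell sees before changing anything.
echo "MPI_BUFFER_SIZE before: ${MPI_BUFFER_SIZE:-unset}"

# Value reported in this thread as curing the MPI_Bsend / MPI_ERR_BUFFER error.
MPI_BUFFER_SIZE=2000000000
export MPI_BUFFER_SIZE

# Assumption: the size is passed on as a signed 32-bit int, so warn if it
# would not fit (2^31 - 1 = 2147483647).
if [ "$MPI_BUFFER_SIZE" -le 2147483647 ]; then
    echo "MPI_BUFFER_SIZE now: $MPI_BUFFER_SIZE"
else
    echo "MPI_BUFFER_SIZE=$MPI_BUFFER_SIZE may overflow a 32-bit int" >&2
fi
```

csh users would use `setenv MPI_BUFFER_SIZE 2000000000` instead. To make the change permanent, edit the value in etc/settings.(c)sh as described above.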
Hi there,
I have been trying to run snappyHexMesh in parallel with no success at all. I am trying to run a case called "simplecase" using the following command from one directory above the case directory (I am using a quad-core 32-bit machine):
    mpirun -np 4 snappyHexMesh -case simplecase -parallel
Then I get:
    Points so far:8(137802 272906 280719 303252 303253 310382 325627 325731)
    #0 Foam::error::printStack(Foam::Ostream&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
    #1 Foam::error::abort() in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
    #2 Foam::hexRef8::setRefinement(Foam::List<int> const&, Foam::polyTopoChange&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libdynamicMesh.so"
    #3 Foam::meshRefinement::refine(Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
    #4 Foam::meshRefinement::refineAndBalance(Foam::string const&, Foam::decompositionMethod&, Foam::fvMeshDistribute&, Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
    #5 Foam::autoRefineDriver::surfaceOnlyRefine(Foam::refinementParameters const&, int) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
    #6 Foam::autoRefineDriver::doRefine(Foam::dictionary const&, Foam::refinementParameters const&, bool) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
    #7 main in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
    #8 __libc_start_main in "/lib/libc.so.6"
    #9 Foam::regIOobject::writeObject(Foam::IOstream::streamFormat, Foam::IOstream::versionNumber, Foam::IOstream::compressionType) const in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
    [3]
    [3]
    [3] From function hexRef8::setRefinement(const labelList&, polyTopoChange&)
    [3] in file polyTopoChange/polyTopoChange/hexRef8.C at line 3349.
    [3] FOAM parallel run aborting
    [3]
    [openfoam01:05499] MPI_ABORT invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode 1
    mpirun noticed that job rank 0 with PID 5496 on node openfoam01 exited on signal 15 (Terminated).
    2 additional processes aborted (not shown)
Alternatively, if I cd to the case directory and use the command
    foamJob -p -s snappyHexMesh
I get this slightly bigger error message:
    #1 Foam::error::abort() in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
    #2 Foam::hexRef8::setRefinement(Foam::List<int> const&, Foam::polyTopoChange&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libdynamicMesh.so"
    #3 Foam::meshRefinement::refine(Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
    #4 Foam::meshRefinement::refineAndBalance(Foam::string const&, Foam::decompositionMethod&, Foam::fvMeshDistribute&, Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
    #5 Foam::autoRefineDriver::surfaceOnlyRefine(Foam::refinementParameters const&, int) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
    #6 Foam::autoRefineDriver::doRefine(Foam::dictionary const&, Foam::refinementParameters const&, bool) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
    #7 main in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
    #8 __libc_start_main in "/lib/libc.so.6"
    #9 Foam::regIOobject::writeObject(Foam::IOstream::streamFormat, Foam::IOstream::versionNumber, Foam::IOstream::compressionType) const in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
    [1]
    [1]
    [1] From function hexRef8::setRefinement(const labelList&, polyTopoChange&)
    [1] in file polyTopoChange/polyTopoChange/hexRef8.C at line 3349.
    [1] FOAM parallel run aborting
    [1]
    [openfoam01:04820] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1
    [3]
    [3]
    [3] cell 217431 of level 0 uses more than 8 points of equal or lower level
    Points so far:8(137802 272906 280719 303252 303253 310382 325627 325731)
    #0 Foam::error::printStack(Foam::Ostream&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
    #1 Foam::error::abort() in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libOpenFOAM.so"
    #2 Foam::hexRef8::setRefinement(Foam::List<int> const&, Foam::polyTopoChange&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libdynamicMesh.so"
    #3 Foam::meshRefinement::refine(Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
    #4 Foam::meshRefinement::refineAndBalance(Foam::string const&, Foam::decompositionMethod&, Foam::fvMeshDistribute&, Foam::List<int> const&) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
    #5 Foam::autoRefineDriver::surfaceOnlyRefine(Foam::refinementParameters const&, int) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
    #6 Foam::autoRefineDriver::doRefine(Foam::dictionary const&, Foam::refinementParameters const&, bool) in "/home/leo/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt/libautoMesh.so"
    #7 main in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
    #8 __libc_start_main in "/lib/libc.so.6"
    #9 Foam::regIOobject::writeObject(Foam::IOstream::streamFormat, Foam::IOstream::versionNumber, Foam::IOstream::compressionType) const in "/home/leo/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/snappyHexMesh"
    [3]
    [3]
    [3] From function hexRef8::setRefinement(const labelList&, polyTopoChange&)
    [3] in file polyTopoChange/polyTopoChange/hexRef8.C at line 3349.
    [3] FOAM parallel run aborting
    [3]
    [openfoam01:04825] MPI_ABORT invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode 1
I have tried following the advice above and increasing the MPI_BUFFER_SIZE, but that did not help.
Please help.
Thanks in advance,
leo |
Did you try running the 1.5.x git repository? There are various fixes in there relating to snappyHexMesh.
|
I did not; I will try that right away, although I downloaded OF-1.5 onto this machine 3 weeks ago.
In any case, I tried running the case with 2 processors and, surprisingly enough, it went a lot further without breaking until I got this message:
    Setting up information for layer truncation ...
    mpirun noticed that job rank 1 with PID 7153 on node openfoam01 exited on signal 15 (Terminated).
    1 process killed (possibly by Open MPI)
    [leo@openfoam01 simplecase]$ exited on signal 15 (Terminated)
I am still a little bit in the dark about why this keeps breaking, but I will try the git repo anyway. Thanks for the suggestion though.
Regards,
leo |