r2d2 September 27, 2007 08:28

Hi All,
I am having a problem running a case of mine in parallel whilst the serial version is all fine (so far).
The mesh is imported from Fluent (with the new fluent3DMeshToFoam utility) and has an internal wall. As I said, this doesn't seem to bother the serial run much, but after decomposing, the parallel run invariably ends with an MPI error message like:

Create mesh for time = 0

[oct11:07921] *** An error occurred in MPI_Recv
[oct11:07921] *** on communicator MPI_COMM_WORLD
[oct11:07921] *** MPI_ERR_TRUNCATE: message truncated
[oct11:07921] *** MPI_ERRORS_ARE_FATAL (goodbye)
[1] --> FOAM FATAL IO ERROR : Expected a ')' or a '}' while reading List, found on line 0 an error
[1] file: IOstream at line 0.
[1] From function Istream::readEndList(const char*)
[1] in file db/IOstreams/IOstreams/Istream.C at line 159.
FOAM parallel run exiting
?? in "/lib/"
[2] #3 ?? at pml_ob1_recvfrag.c:0
[2] #4 mca_btl_sm_component_progress in "/home/radu/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/lib/openmpi/"
[2] #5 mca_bml_r2_progress in "/home/radu/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/lib/openmpi/"
[2] #6 opal_progress in "/home/radu/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/lib/"
[2] #7 mca_pml_ob1_probe in "/home/radu/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/lib/openmpi/"
[2] #8 MPI_Probe in "/home/radu/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/lib/"
[2] #9 Foam::IPstream::IPstream(int, int, Foam::IOstream::streamFormat, Foam::IOstream::versionNumber) in "/home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/openmpi-1.2.3/libPstream.so"
[2] #10 Foam::globalPoints::receivePatchPoints(Foam::HashSet<int, Foam::Hash<int> >&) in "/home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/"
[2] #11 Foam::globalPoints::globalPoints(Foam::polyMesh const&) in "/home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/"
[2] #12 Foam::globalMeshData::updateMesh() in "/home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/"
[2] #13 Foam::globalMeshData::globalMeshData(Foam::polyMesh const&) in "/home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/"
[2] #14 Foam::polyMesh::globalData() const in "/home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/"
[2] #15 Foam::polyMesh::polyMesh(Foam::IOobject const&) in "/home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/"
[2] #16 Foam::fvMesh::fvMesh(Foam::IOobject const&) in "/home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/"
[2] #17 main in "/home/radu/OpenFOAM/OpenFOAM-1.4.1/applications/bin/linux64GccDPOpt/icoFoam"
[2] #18 __libc_start_main in "/lib/"
[2] #19 Foam::regIOobject::readIfModified() in "/home/radu/OpenFOAM/OpenFOAM-1.4.1/applications/bin/linux64GccDPOpt/icoFoam"
[oct11:07971] *** Process received signal ***
[oct11:07971] Signal: Segmentation fault (11)
[oct11:07971] Signal code: (-6)
[oct11:07971] Failing at address: 0x47300001f23
[oct11:07971] [ 0] /lib/ [0x2aaaac61c110]
[oct11:07971] [ 1] /lib/ [0x2aaaac61c07b]
[oct11:07971] [ 2] /lib/ [0x2aaaac61c110]
[oct11:07971] [ 3] /home/radu/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/lib/openmpi/ [0x2aaab26b8c17]
[oct11:07971] [ 4] /home/radu/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/lib/openmpi/ [0x2aaab2cd07cb]
[oct11:07971] [ 5] /home/radu/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/lib/openmpi/ [0x2aaab28c426a]
[oct11:07971] [ 6] /home/radu/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/lib/ [0x2aaaad93495a]
[oct11:07971] [ 7] /home/radu/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/lib/openmpi/ [0x2aaab26b61a5]
[oct11:07971] [ 8] /home/radu/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/lib/ [0x2aaaad28fda6]
[oct11:07971] [ 9] /home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/openmpi-1.2.3/libPstream.so(_ZN4Foam8IPstreamC1EiiNS_8IOstream12streamFormatENS1_13versionNumberE+0xee) [0x2aaaac82f24e]
[oct11:07971] [10] /home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/ globalPoints18receivePatchPointsERNS_7HashSetIiNS_4HashIiEEEE+0x22c) [0x2aaaababc50c]
[oct11:07971] [11] /home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/ globalPointsC1ERKNS_8polyMeshE+0x24f) [0x2aaaababccaf]
[oct11:07971] [12] /home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/ globalMeshData10updateMeshEv+0x110) [0x2aaaabaae890]
[oct11:07971] [13] /home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/ globalMeshDataC1ERKNS_8polyMeshE+0xe4) [0x2aaaabaaff64]
[oct11:07971] [14] /home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/ polyMesh10globalDataEv+0x55) [0x2aaaabad07f5]
[oct11:07971] [15] /home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/ olyMeshC2ERKNS_8IOobjectE+0x1c02) [0x2aaaabad6f12]
[oct11:07971] [16] /home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/ am6fvMeshC1ERKNS_8IOobjectE+0x19) [0x2aaaaae3cae9]
[oct11:07971] [17] /home/radu/OpenFOAM/OpenFOAM-1.4.1/applications/bin/linux64GccDPOpt/icoFoam [0x412e07]
[oct11:07971] [18] /lib/ [0x2aaaac6094ca]
[oct11:07971] [19] /home/radu/OpenFOAM/OpenFOAM-1.4.1/applications/bin/linux64GccDPOpt/icoFoam(_ZN4Foam11regIOobject14readIfModifiedEv+0x1a9) [0x412979]
[oct11:07971] *** End of error message ***
mpirun noticed that job rank 0 with PID 7969 on node oct11 exited on signal 15 (Terminated).
3 additional processes aborted (not shown)

checkMesh reports:

Mesh stats
points: 4079936
edges: 12145862
faces: 12129187
internal faces: 11788349
cells: 3986256
boundary patches: 4
point zones: 0
face zones: 0
cell zones: 3

Number of cells of each type:
hexahedra: 3986256
prisms: 0
wedges: 0
pyramids: 0
tet wedges: 0
tetrahedra: 0
polyhedra: 0

Checking topology...
Boundary definition OK.
Point usage OK.
Upper triangular ordering OK.
Topological cell zip-up check OK.
Face vertices OK.
Number of identical duplicate faces (baffle faces): 77004
Face-face connectivity OK.
Number of regions: 1 (OK).

Checking patch topology for multiply connected surfaces ...
Patch Faces Points Surface
pared 182407 182695 ok (not multiply connected)
inflow_top_lid 1836 1963 ok (not multiply connected)
outflow_top_lid 2587 2750 ok (not multiply connected)
pared_interior 154008 77742 multiply connected surface (shared edge)
<<Writing 77718 conflicting points to set nonManifoldPoints

Checking geometry...
Domain bounding box: (-0.04 -0.04 -1.42109e-17) (0.04 0.04 0.08)
Boundary openness (-2.89631e-16 -8.26677e-16 -8.0705e-16) OK.
Max cell openness = 8.55581e-16 OK.
Max aspect ratio = 323.357 OK.
Minumum face area = 8.39926e-10. Maximum face area = 8.16213e-06. Face area magnitudes OK.
Min volume = 6.33172e-14. Max volume = 1.06746e-08. Total volume = 0.000402107. Cell volumes OK.
Mesh non-orthogonality Max: 32.6604 average: 5.2825
Non-orthogonality check OK.
Face pyramids OK.
Max skewness = 0.594768 OK.
Min/max edge length = 2.04497e-05 0.00509539 OK.
All angles in faces OK.
Face flatness (1 = flat, 0 = butterfly) : average = 1 min = 0.999999
All face flatness OK.

Mesh OK.

Is the multiply connected surface (the internal wall) causing the trouble, or should I look elsewhere? I ask because I also did the import with the old fluentMeshToFoam and followed the procedure Bernhard described for meshes with internal walls. Having two patches instead of one did get rid of the "multiply connected surface" label, but the parallel run failed again.
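In the checkMesh output above, pared_interior has 154008 faces, exactly twice the 77004 duplicate (baffle) faces, which is consistent with both sides of the internal wall ending up in a single patch. The sketch below illustrates why such a patch gets flagged, under the assumption that "multiply connected surface (shared edge)" means that some edge is used by more than two faces of the patch. The helper non_manifold_edges and the toy faces are hypothetical, not OpenFOAM code.

```python
from collections import defaultdict


def non_manifold_edges(faces):
    """Return edges (sorted point-label pairs) used by more than two faces.

    faces: list of faces, each an ordered list of point labels.
    Assumption: this mirrors the 'shared edge' criterion, not checkMesh itself.
    """
    edge_faces = defaultdict(list)
    for face_id, face in enumerate(faces):
        for i, a in enumerate(face):
            b = face[(i + 1) % len(face)]  # next point, wrapping around
            edge_faces[(min(a, b), max(a, b))].append(face_id)
    return sorted(e for e, users in edge_faces.items() if len(users) > 2)


# Hypothetical internal wall made of two quads that share edge (1, 4).
one_side = [[0, 1, 4, 3], [1, 2, 5, 4]]
# The other side of the baffle: same points, opposite orientation.
other_side = [list(reversed(f)) for f in one_side]

print(non_manifold_edges(one_side))               # -> []
print(non_manifold_edges(one_side + other_side))  # -> [(1, 4)]
```

With one side per patch every interior edge is shared by exactly two faces; putting both sides in one patch makes each interior edge shared by four. That matches the label disappearing after the split into two patches, but by itself it says nothing about the MPI failure.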

Sorry for the long post...

fra76 September 27, 2007 09:18

How did you decompose the mesh?
Are you using the same OF version for decomposing and running?
Try checkMesh in parallel, but I guess it returns the same error...

r2d2 September 27, 2007 10:06

I'm running decomposePar with the "simple" method, since the 3D mesh is just a 2D mesh replicated a number of times in the third direction.
And yes, I used the same OF 1.4.1 for decomposing and running, and had previously run ./Allwmake in ~/OpenFOAM/OpenFOAM-1.4.1/applications/utilities/parallelProcessing, just in case.
However, in the meantime I removed the part of the mesh on one side of that internal wall patch, so it became a normal boundary patch of type wall, and redid the whole process (importing the mesh etc.)... and IT WORKED, both in serial and in parallel!
So my guess is that the multiply connected surface, or the two zero-thickness faces created by Bernhard's procedure, make the difference at some stage of the parallel run.
Has anyone encountered the same problem, or am I just being sloppy somewhere along the way?
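For readers unfamiliar with the "simple" decomposition mentioned in this post: as I understand it, simple cuts the domain along coordinate directions into a given number of pieces per direction, each with roughly the same number of cells. The sketch below is a simplified illustration of that idea, not the OpenFOAM source; the function name simple_decompose is hypothetical, and it ignores the rotation (delta) parameter.

```python
def simple_decompose(centres, n):
    """Assign each cell to a processor in the spirit of the "simple" method.

    centres: list of (x, y, z) cell-centre tuples.
    n: (nx, ny, nz), number of processors in each direction.
    Along each axis the cells are sorted by that coordinate and cut into
    n[axis] groups of (nearly) equal size; the three group indices are
    combined into one processor number. No rotation (delta) is applied.
    """
    n_cells = len(centres)
    proc = [0] * n_cells
    stride = 1
    for axis, n_axis in enumerate(n):
        order = sorted(range(n_cells), key=lambda i, a=axis: centres[i][a])
        for rank, cell in enumerate(order):
            proc[cell] += stride * (rank * n_axis // n_cells)
        stride *= n_axis
    return proc


# Toy "2D mesh replicated in the third direction": 4 cells in x, 2 layers in z.
centres = [(x + 0.5, 0.5, z + 0.5) for z in range(2) for x in range(4)]
print(simple_decompose(centres, (2, 1, 2)))  # -> [0, 0, 1, 1, 2, 2, 3, 3]
```

metis, by contrast, partitions the cell-connectivity graph, so its cuts do not follow coordinate planes; that is one concrete difference between the two runs discussed later in the thread.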

mattijs September 27, 2007 12:42

Can you check that both sides of all processor patches have the same number of points? This is a requirement for valid meshes.

Just run checkMesh on all the domains and compare the patch statistics for procBoundary_xxToyy vs. procBoundary_yyToxx.
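The comparison suggested above can be scripted once the checkMesh output of each processor directory has been saved as text. A minimal sketch, assuming the patch-table line format shown in the serial checkMesh output earlier ("name faces points status") and processor patches named procBoundary<A>to<B>; both the naming and the helper names are assumptions to adjust to the actual logs, and the example log lines are made up.

```python
import re

# Assumed format: "<patch> <faces> <points> <status ...>",
# with processor patches named procBoundary<A>to<B> (hypothetical).
PROC_PATCH = re.compile(r"^\s*procBoundary(\d+)to(\d+)\s+(\d+)\s+(\d+)\b")


def processor_patch_stats(log_text):
    """Map (from_proc, to_proc) -> (faces, points) for one checkMesh log."""
    stats = {}
    for line in log_text.splitlines():
        m = PROC_PATCH.match(line)
        if m:
            a, b, faces, points = map(int, m.groups())
            stats[(a, b)] = (faces, points)
    return stats


def mismatched_patches(logs):
    """Compare procBoundary AtoB with BtoA across all per-domain logs.

    Returns (a, b, stats_ab, stats_ba) for every processor pair whose
    face/point counts differ or where one side is missing (None).
    """
    all_stats = {}
    for text in logs:
        all_stats.update(processor_patch_stats(text))
    pairs = {tuple(sorted(key)) for key in all_stats}
    problems = []
    for a, b in sorted(pairs):
        ab, ba = all_stats.get((a, b)), all_stats.get((b, a))
        if ab != ba:
            problems.append((a, b, ab, ba))
    return problems


# Made-up example logs for two domains.
log0 = """Patch Faces Points Surface
    pared 100 120 ok (not multiply connected)
    procBoundary0to1 500 540 ok (not multiply connected)
"""
log1 = """Patch Faces Points Surface
    procBoundary1to0 500 538 ok (not multiply connected)
"""
print(mismatched_patches([log0, log1]))  # -> [(0, 1, (500, 540), (500, 538))]
```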

r2d2 September 28, 2007 04:14

Hi Mattijs,
I did that check and forgot to mention it in the post. And yes, the stats show the same number of points and faces on either side of each processor boundary.
Furthermore, the patch with the multiply connected faces lies inside one of the processor domains, some cells away from the processor boundary.
LAST MINUTE: I changed the decomposition method from simple to metis, with the same weight on all processors, and to my surprise it works: it runs with no MPI failure.
So I guess I will stick with this to get some results. In the meantime I will try to understand what went wrong before (truth is, I don't think I will, because as far as I can tell nothing is wrong).
Cheers anyway,

mattijs September 28, 2007 05:42

Can you post the case or send it to me? (m.janssens)

r2d2 September 28, 2007 06:43

How can I upload a case? I've never done that... the mesh file from Gambit is large (~800 MB)...

r2d2 September 28, 2007 06:52

I'll try to upload just the 0 and constant directories... system can be like the one in, e.g., icoFoam/cavity, and the case runs with icoFoam...

r2d2 September 28, 2007 07:02

Well... I now know how to do it, but of course it complains about the size... and it fails...

mattijs September 28, 2007 09:43

There's a 50K limit on this forum.

Do you have a smaller case that shows the problem?

Or cut out the bits that give problems perhaps?
- set your startTime to latestTime
- for all domains pick up the cells using any point on the boundary:

setSet . processorXXX
faceSet f0 new boundaryToFace
pointSet p0 new faceToPoint f0 all
cellSet c0 new pointToCell p0 any

- subset the c0 part of the mesh:
subsetMesh <root> <case> c0

- pack up the subsetted meshes (there will be new time directories with a polyMesh inside)
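The three set operations in the list above amount to a simple selection: take all boundary faces (assuming, as I understand boundaryToFace, that on a processor mesh this includes the processor patch faces), collect all their points, then keep every cell that uses at least one of those points. The toy below mirrors that logic in plain Python on a hypothetical 3x3 quad grid, with boundary edges standing in for faces; it is an illustration, not setSet itself, and both function names are made up.

```python
def cells_touching_boundary(faces, boundary_faces, cell_points):
    """Plain-Python analogue of the three setSet commands above.

    faces: list of faces, each a list of point labels.
    boundary_faces: labels of boundary faces (boundaryToFace).
    cell_points: for each cell, the point labels it uses.
    """
    f0 = set(boundary_faces)                   # faceSet f0 new boundaryToFace
    p0 = {p for f in f0 for p in faces[f]}     # pointSet p0 new faceToPoint f0 all
    # cellSet c0 new pointToCell p0 any: keep cells using any point in p0
    return [c for c, pts in enumerate(cell_points) if p0.intersection(pts)]


def grid_example(n=3):
    """Hypothetical 2D n x n quad grid; boundary edges play the role of faces."""
    def pid(i, j):
        return j * (n + 1) + i

    cell_points = [[pid(i, j), pid(i + 1, j), pid(i + 1, j + 1), pid(i, j + 1)]
                   for j in range(n) for i in range(n)]
    faces = []
    for k in range(n):
        faces += [[pid(k, 0), pid(k + 1, 0)], [pid(k, n), pid(k + 1, n)],
                  [pid(0, k), pid(0, k + 1)], [pid(n, k), pid(n, k + 1)]]
    return faces, list(range(len(faces))), cell_points


faces, boundary, cell_points = grid_example()
print(cells_touching_boundary(faces, boundary, cell_points))
# -> [0, 1, 2, 3, 5, 6, 7, 8]: only the centre cell (4) is dropped
```

On a real processor mesh the same idea keeps a thin layer of cells along every boundary, including the processor patches, which is why the subsetted mesh should be much smaller than the full domain.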

r2d2 September 28, 2007 10:03

I did what you said, but the tgz of one of those polyMesh directories still weighs about 9 MB. I will make a smaller case and check.
Thank you for your effort; I will let you know as soon as I get something. Probably Monday...

r2d2 October 1, 2007 04:42

Well, well... I made a smaller case, but everything was fine with it, so no luck in catching the fault. However, the big case (the one with the metis decomposition) failed at the point where data had to be written, with errors like:

[oct11:08555] *** Process received signal ***
[oct11:08555] Signal: Bus error (7)
[oct11:08555] Signal code: (2)
[oct11:08555] Failing at address: 0x2aaaab04be10
[oct11:08555] [ 0] /lib/ [0x2aaaac98b110]
[oct11:08555] [ 1] /home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/ oam20coupledFvsPatchFieldIdE5writeERNS_7OstreamE+0 ) [0x2aaaab04be10]
[oct11:08555] [ 2] /home/radu/OpenFOAM/radu-1.4.1/applications/bin/linux64GccDPOpt/porosoSteadyFoam(_ZNK4Foam14GeometricFieldIdNS_13fvsPatchFieldENS_11surfaceMeshEE22GeometricBoundaryField10writeEntryERKNS_4wordERNS_7OstreamE+0x12b) [0x42573b]
[oct11:08555] [ 3] /home/radu/OpenFOAM/radu-1.4.1/applications/bin/linux64GccDPOpt/porosoSteadyFoam(_ZN4FoamlsIdNS_13fvsPatchFieldENS_11surfaceMeshEEERNS_7OstreamES4_RKNS_14GeometricFieldIT_T0_T1_EE+0x1d4) [0x43a344]
[oct11:08555] [ 4] /home/radu/OpenFOAM/radu-1.4.1/applications/bin/linux64GccDPOpt/porosoSteadyFoam(_ZNK4Foam14GeometricFieldIdNS_13fvsPatchFieldENS_11surfaceMeshEE9writeDataERNS_7OstreamE+0xf) [0x43a3ef]
[oct11:08555] [ 5] /home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/ 1regIOobject11writeObjectENS_8IOstream12streamFormatENS1_13versionNumberENS1_15compressionTypeE+0x263) [0x2aaaabd69e03]
[oct11:08555] [ 6] /home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/ 4objectRegistry11writeObjectENS_8IOstream12streamFormatENS1_13versionNumberENS1_15compressionTypeE+0x93) [0x2aaaabd6dc63]
[oct11:08555] [ 7] /home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/ 4objectRegistry11writeObjectENS_8IOstream12streamFormatENS1_13versionNumberENS1_15compressionTypeE+0x93) [0x2aaaabd6dc63]
[oct11:08555] [ 8] /home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/ Time11writeObjectENS_8IOstream12streamFormatENS1_13versionNumberENS1_15compressionTypeE+0x3ab) [0x2aaaabd7ffdb]
[oct11:08555] [ 9] /home/radu/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/ 1regIOobject5writeEv+0x4f) [0x2aaaabd69b7f]
[oct11:08555] [10] /home/radu/OpenFOAM/radu-1.4.1/applications/bin/linux64GccDPOpt/porosoSteadyFoam [0x416b0e]
[oct11:08555] [11] /lib/ [0x2aaaac9784ca]
[oct11:08555] [12] /home/radu/OpenFOAM/radu-1.4.1/applications/bin/linux64GccDPOpt/porosoSteadyFoam(__gxx_personality_v0+0xda) [0x412a4a]

Does this ring a bell for anyone?
So... I guess something's wrong in the cluster setup, right? I'll have to contact the admin.

mattijs October 2, 2007 13:05

Something still seems to be wrong with your processor patches...

Try running with the following environment variables set:

# Initialise blocks of memory to NaN
# Abort instead of exit
# Exit if NaN encountered

and possibly under valgrind.

r2d2 October 5, 2007 04:21

Mattijs,
I haven't made any progress on the problem so far because I've had so many admin tasks to do, but I will, and I'll let you know.

r2d2 October 26, 2007 06:39

Hi all,
Finally I gave up looking for the problem in decomposePar and friends. And, surprisingly enough, it now seems to work fine. That was after I removed some exports from my bashrc... probably something was interfering with my OF install.
That's that, then.

schmidt_d July 31, 2008 15:04

I believe that increasing the environment variable:
may help.

