CFD Online Discussion Forums

Thread: snappyHexMesh - memory corruption (OF-v. 2.2.x) (https://www.cfd-online.com/Forums/openfoam-bugs/120324-snappyhexmesh-memory-corruption-v-2-2-x.html)

pct July 4, 2013 09:57

snappyHexMesh - memory corruption (OF-v. 2.2.x) [SOLVED]
 
Good morning everybody,

I'm not sure whether this is actually a bug or just an unfortunate side effect of wrong settings or a bad mesh, but since it's not a regular OpenFOAM error but a memory corruption, I'll post it here.

I'm currently doing my diploma thesis at university, which includes working with OpenFOAM (v. 2.2.x).
From one of my predecessors on the project I have a basic case for mesh generation, including several *.stl files for the geometry, which represents a turbine cascade. The problem occurs while processing outlet.stl and looks like this:

Code:

*** glibc detected *** snappyHexMesh: malloc(): memory corruption: 0x0000000006051b50 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x76518)[0x7f6bf117e518]
/lib64/libc.so.6(+0x794cf)[0x7f6bf11814cf]
/lib64/libc.so.6(__libc_malloc+0x77)[0x7f6bf11835a7]
/usr/lib64/libstdc++.so.6(_Znwm+0x1d)[0x7f6bf175259d]
/usr/lib64/libstdc++.so.6(_Znam+0x9)[0x7f6bf17526b9]
/usr/local/share/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64IccDPOpt/lib/libautoMesh.so(_ZN4Foam9syncTools13syncPointListINS_6VectorIdEENS_13minMagSqrEqOpIS3_EENS_13mapDistribute9transformEEEvRKNS_8polyMeshERKNS_4ListIiEERNSB_IT_EERKT0_RKSF_RKT1_+0x154)[0x7f6bf29f06b4]
/usr/local/share/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64IccDPOpt/lib/libautoMesh.so(_ZNK4Foam14autoSnapDriver25calcNearestSurfaceFeatureERKNS_14snapParametersEiddRKNS_5FieldIdEERKNS4_INS_6VectorIdEEEERNS_14motionSmootherE+0xc47)[0x7f6bf2a0afb7]
/usr/local/share/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64IccDPOpt/lib/libautoMesh.so(_ZN4Foam14autoSnapDriver6doSnapERKNS_10dictionaryES3_dRKNS_14snapParametersE+0xcba)[0x7f6bf29e2bca]
snappyHexMesh[0x40d237]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x7f6bf1126c16]
snappyHexMesh[0x40b459]

The memory map that follows is about 2 pages long, but I can post it if anyone actually wants to see it.
Since I have no access to valgrind and next to zero experience with gdb, I simply started digging: I searched the source for the info strings that were printed last to see where the error actually occurred, and traced it to:

/src/mesh/autoMesh/autoHexMesh/autoHexMeshDriver/autoSnapDriverFeature.C

The last output given came from lines 2853 to 2888 (the code block commented with "//Count").

By inserting my own Info outputs I narrowed it down to syncTools::syncPointList(...).

Of course, inserting those Info outputs changed the memory map, and from a certain output-string length onwards the whole error collapsed to:

Code:

snappyHexMesh: malloc.c:4625: _int_malloc: Assertion `(unsigned long)(size) >= (unsigned long)(nb)' failed.
Aborted
Exitcode 134

I'm running plain snappyHexMesh in serial, with no multicore/parallel run on a decomposed case or anything like that.

Since something like this can always be the consequence of corrupted base data, here is the surfaceCheck output for said *.stl:

Code:

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Reading surface from "constant/triSurface/outlet.stl" ...

Statistics:
Triangles    : 18
Vertices    : 16
Bounding Box : (328.133 -10 561.695) (328.133 60 961.695)

Region  Size
------  ----
GEOM    18


Surface has no illegal triangles.

Triangle quality (equilateral=1, collapsed=0):
    0 .. 0.05  : 0
    0.05 .. 0.1  : 0
    0.1 .. 0.15  : 0
    0.15 .. 0.2  : 0.222222
    0.2 .. 0.25  : 0
    0.25 .. 0.3  : 0.555556
    0.3 .. 0.35  : 0.222222
    0.35 .. 0.4  : 0
    0.4 .. 0.45  : 0
    0.45 .. 0.5  : 0
    0.5 .. 0.55  : 0
    0.55 .. 0.6  : 0
    0.6 .. 0.65  : 0
    0.65 .. 0.7  : 0
    0.7 .. 0.75  : 0
    0.75 .. 0.8  : 0
    0.8 .. 0.85  : 0
    0.85 .. 0.9  : 0
    0.9 .. 0.95  : 0
    0.95 .. 1  : 0

    min 0.198651 for triangle 2
    max 0.340691 for triangle 4

Edges:
    min 21 for edge 1 points (328.133 39 841.695)(328.133 60 841.695)
    max 162.432 for edge 9 points (328.133 11 681.695)(328.133 39 841.695)

Checking for points less than 1e-6 of bounding box ((0 70 400) meter) apart.
Found 0 nearby points.

Surface is not closed since not all edges connected to two faces:
    connected to one face : 12
    connected to >2 faces : 0
Conflicting face labels:12
Dumping conflicting face labels to "problemFaces"
Paste this into the input for surfaceSubset

Number of unconnected parts : 1

Number of zones (connected area with consistent normal) : 1


End

and checkMesh gives:

Code:

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create polyMesh for time = 0

Time = 0

Mesh stats
    points:          828072
    faces:            2384496
    internal faces:  2286781
    cells:            778550
    faces per cell:  5.99997
    boundary patches: 11
    point zones:      0
    face zones:      0
    cell zones:      0

Overall number of cells of each type:
    hexahedra:    778527
    prisms:        23
    wedges:        0
    pyramids:      0
    tet wedges:    0
    tetrahedra:    0
    polyhedra:    0

Checking topology...
    Boundary definition OK.
    Cell to face addressing OK.
    Point usage OK.
    Upper triangular ordering OK.
    Face vertices OK.
    Number of regions: 1 (OK).

Checking patch topology for multiply connected surfaces...
    Patch              Faces    Points  Surface topology                 
    top                33850    34503    ok (non-closed singly connected) 
    endwall            33850    34503    ok (non-closed singly connected) 
    inlet              2162    2280    ok (non-closed singly connected) 
    defaultFaces        9315    9792    ok (non-closed singly connected) 
    cut.stl_GEOM        92      120      ok (non-closed singly connected) 
    cyclic1            6486    6792    ok (non-closed singly connected) 
    cyclic2            6486    6792    ok (non-closed singly connected) 
    cyclic0            2599    2736    ok (non-closed singly connected) 
    cyclic3            2599    2736    ok (non-closed singly connected) 
    cyclic4            138      168      ok (non-closed singly connected) 
    cyclic5            138      168      ok (non-closed singly connected) 

Checking geometry...
    Overall domain bounding box (-225 0 -1.421085e-14) (390.985 50 906.612)
    Mesh (non-empty, non-wedge) directions (1 1 1)
    Mesh (non-empty) directions (1 1 1)
    Boundary openness (-1.250668e-15 -8.838649e-14 8.18473e-17) OK.
    Max cell openness = 3.526904e-16 OK.
    Max aspect ratio = 7.587198 OK.
    Minimum face area = 0.3513021. Maximum face area = 38.50867.  Face area magnitudes OK.
    Min volume = 0.7637002. Max volume = 35.89391.  Total volume = 5739637.  Cell volumes OK.
    Mesh non-orthogonality Max: 43.7972 average: 7.422812
    Non-orthogonality check OK.
    Face pyramids OK.
    Max skewness = 0.8312743 OK.
    Coupled point location match (average 1.031287e-13) OK.

Mesh OK.

End

So, can anyone help me or give me a tip on where to look next? Is this actually a bug, should I try different refinement/precision parameters in the snappyHexMeshDict, or is this something different altogether?

If more data on the geometry or one of the dict files is needed, just say so and I'll post it.

Any input is appreciated. Thanks in advance.

Alex

wyldckat July 7, 2013 11:04

Greetings Alex and welcome to the forum!

Quote:

Originally Posted by pct (Post 437804)
Code:

*** glibc detected *** snappyHexMesh: malloc(): memory corruption: 0x0000000006051b50 ***
[backtrace snipped; identical to the one in the first post]


:eek: This doesn't look good. There are a few possibilities here:
  • Icc was used to build OpenFOAM. Which exact version was used? I ask because the only supported versions are:
    Quote:

    Originally Posted by http://www.openfoam.org/download/git.php
    Intel ICC: 12.1.0 and 13.1.0

  • Perhaps there isn't enough RAM?
  • Perhaps there is a hardware failure?
  • Have you tried using Gcc for compiling OpenFOAM?

Quote:

Originally Posted by pct (Post 437804)
Code:

snappyHexMesh: malloc.c:4625: _int_malloc: Assertion `(unsigned long)(size) >= (unsigned long)(nb)' failed.
Aborted
Exitcode 134


It would be helpful to know the values on each side of this inequality, namely "(unsigned long)(size)" and "(unsigned long)(nb)".

Quote:

Originally Posted by pct (Post 437804)
Code:

Mesh stats
    points:          828072
    faces:            2384496
    internal faces:  2286781
    cells:            778550
    faces per cell:  5.99997
    boundary patches: 11
    point zones:      0
    face zones:      0
    cell zones:      0


"5.99997"? This looks a bit... er, a lot suspicious! And how did you manage to get a checkMesh on the resulting mesh, if snappyHexMesh crashed?

Was Icc used with the "fast math" option turned on?

Best regards,
Bruno

pct July 8, 2013 03:36

Hi Wyldckat,

thanks for the response.

Icc is version 12.0.4.191, build 20110427.
Do you think that might be reason enough for this error?


RAM and hardware failure: I tested it on three different machines with 8, 24 and 64 GB RAM and got the exact same error every time, so I guess it's neither.


Since we use a shared installation at the institute, we're not supposed to compile the whole package ourselves. However, if I try to compile the autoMesh module with wmake and CC or CXX=/usr/bin/gcc-4.3, I get a bunch of build errors about "undefined reference to" ...


I'll see what I can do about that inequality expression; that may give another clue.
But as I said, it only occurs if I shift the whole memory map by inserting a pretty long output string somewhere before the failing expression in the source code.


The final geometry is "cut into shape" by repeatedly running snappyHexMesh with different *.stl files. Also, snappyHexMesh writes out the already cut mesh before starting the morphing phase. checkMesh reports "Mesh OK" for both meshes.
If I look at the already cut mesh in paraFoam, it shows a bad case of staircasing: the mesh is cut along the right plane, but the cells are not yet snapped to fit it. That's what the morphing should do, but that's where it crashes.


I don't know whether "fast math" was used, but I can ask. Do you think there was too much algebraic optimization for performance reasons?

best regards
Alex

wyldckat July 10, 2013 16:15

Hi Alex,

Gcc 4.3 is no longer supported in OpenFOAM 2.2. The minimum is Gcc 4.5.

As for ICC, yes that could be a big problem. The supported versions of ICC are recommended, simply because ICC is sometimes slower to catch up to GCC, in terms of supported features, where some of those features are necessary for OpenFOAM to be built properly.

There is another possibility: if OpenFOAM was built on a machine with a more recent Linux Distribution and then shared among all other machines, that could lead to some incompatibilities between the base C/C++/Intel libraries.
Yet another possibility is that the machines you're using are AMD based and the OpenFOAM build you're using might have been built on an Intel machine, leading to this crazy issue. To check this, run:
Code:

lscpu

# or

cat /proc/cpuinfo

You might also want to contact the person responsible for building that OpenFOAM version you're using.

Last but not least, if you share a small test case that reproduces the crash, it would help determine whether this is only a problem on your side or a specific problem with OpenFOAM.

edit: from my experience, snappyHexMesh is very sensitive to math precision. As for fast math option itself, check this: http://www.cfd-online.com/Forums/blo...-amd-cpus.html - sorry, it's a long read, but you should also check the comments, where fast math is mentioned ;)

Best regards,
Bruno
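
The CPU check suggested in the post above can also be scripted when several machines need to be compared. This is only a minimal sketch: it assumes an x86 Linux machine whose /proc/cpuinfo contains "vendor_id" lines, and the helper name cpu_vendors is made up for this illustration.

```python
def cpu_vendors(cpuinfo_text):
    """Return the distinct vendor_id values found in /proc/cpuinfo-style text."""
    vendors = set()
    for line in cpuinfo_text.splitlines():
        # Lines look like "vendor_id<TAB>: GenuineIntel"
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            vendors.add(value.strip())
    return vendors


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print(sorted(cpu_vendors(handle.read())))
    except OSError:
        print("no readable /proc/cpuinfo on this system")
```

Running it on the build machine and on the machines that execute snappyHexMesh shows at a glance whether the vendor strings differ.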

pct July 10, 2013 16:42

Hey,

I run all my simulations on the provided servers/workstations, which are all equipped with Intel processors. The Gcc versions on them are 4.3 and 4.6, but the update to 4.6 happened only recently.
ICC (which was used for compiling) is still 12.0.4 on these machines.
I forwarded the information about the recommended update to 12.1 / 13.1.
Once that is done, I'll tell you whether it helped.

About a test case: I'll try, but the problem seems limited to the one special case I have, because all the tutorials and other test cases I had worked fine so far.
The 5.9 faces per cell (which indeed sounds strange) don't seem to affect the other snappyHexMesh steps in the overall build script for the final geometry.
I'll have a second look at the *.stl file in question, but since the whole case isn't exactly what you would call "properly documented", although it's relatively complex, this is more trial and error than it should be.

I'm not sure I can share this case, since it represents part of a test bench used here at the department. If I'm allowed to, I'll upload it.

edit: thanks for the information on snappyHexMesh and fast math. I didn't know there were such big differences in performance between gcc and icc; I guess that's why we use the icc compiler.
But regarding ffast-math: does the "not guaranteed (numerical) precision" affect every calculation, or does it only come into play if I use extreme write-precision or tolerance values, for instance 8 to 10 decimal places or more? Or does it promote things like numerical diffusion and rounding errors in general?

Best regards
Alex

wyldckat July 13, 2013 14:46

Hi Alex,

I had to go look for at least one example: http://forums.anandtech.com/showthread.php?t=1796118

Mmm... from this, I would say that at first it shouldn't affect much. But I've had a few situations where an error of something like 1e-5 was more than enough for sHM to complain about not being able to snap to the geometry, simply because "a >= b" wasn't giving the right result.
Although solver-wise, the small errors could accumulate over the millions of calculations that solvers usually do.
Then there are the trigonometrical functions, where faster versions can have rather higher errors, given the accumulation of inherent mathematical operations.

I would suggest that you test the tutorials in OpenFOAM that use sHM to ascertain whether any of them crash or give any strange messages, because back when I got that strange error, it was with one of the tutorials.

Best regards,
Bruno
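
The remark above about "a >= b" giving the wrong result can be illustrated without any compiler flags. Python has no fast-math mode, so the sketch below reorders a sum by hand, which is one of the transformations a reassociating optimizer is allowed to make. The values and the helper ge_with_tolerance are illustrative assumptions, not taken from snappyHexMesh.

```python
import math

a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # evaluation order as written in the source
reassociated = a + (b + c)    # same sum, regrouped as an optimizer might

# The two results differ in the last bit, so exact comparisons disagree.
print(left_to_right == reassociated)   # False
print(left_to_right > 0.6)             # True
print(reassociated > 0.6)              # False


def ge_with_tolerance(x, y, rel_tol=1e-12):
    """Treat x as >= y when it is larger or equal within a relative tolerance."""
    return x >= y or math.isclose(x, y, rel_tol=rel_tol)


# A tolerant comparison gives the same answer for both evaluation orders.
print(ge_with_tolerance(0.6, left_to_right), ge_with_tolerance(0.6, reassociated))
```

Any branch that depends on an exact floating-point comparison can therefore flip when the evaluation order changes, even though each individual result is accurate to the last bit.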

pct July 15, 2013 03:47

sHM-tutorials
 
Hi Bruno,

I just ran all OF tutorials that have a sHMDict; none of them crashed with a memory corruption or any other sort of error.

Some of them also had relatively odd numbers for "faces per cell", like 6.095 or 6.1512, but always > 6.

In my case it's 5.9 from the beginning (after blockMesh). But the first sHM operation runs smoothly. After this there are some topoSet operations, then comes the second sHM operation, which crashes.

So I guess this problem is _very_ specific to my case, or at least to some of the files related to it. I'll keep experimenting; maybe something turns up.

I'm also still waiting for the update to icc 12.1/13.1 and the rebuild, but the person responsible is out of the office as far as I know.

Also thanks for the link on ffast-math. I guess if you don't need the "extreme" math precision in your case, you might as well take the performance increase that comes with it.
Do you use ffast-math for your builds, or is this a no-go on principle for you?

Best regards
Alex

pct July 16, 2013 03:47

solved..
 
It's solved. The *.stl used for this operation had
a) some negative coordinates and was
b) apparently a little bit too small to cut the provided mesh along all edges.

So, either before the point sync or somewhere during it, it compared points/values that were not compatible with each other and crashed.

So it was corrupted base data after all. -.-

Regardless, thanks for your help; it turned up some other things I'll have a look at.

best regards
Alex
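
A quick sanity check for the "surface too small" situation is to compare the bounding box reported by surfaceCheck with the one reported by checkMesh. The sketch below uses the two boxes posted earlier in this thread. The function name overhang is made up for this illustration, and a bounding-box comparison is only a coarse indicator: it cannot confirm on its own that this was the cause of the crash.

```python
def overhang(surface_min, surface_max, mesh_min, mesh_max):
    """Per-axis (low, high) margins by which the surface box exceeds the mesh box.

    A negative margin means the surface stops short of the mesh on that side.
    """
    return [
        (m_lo - s_lo, s_hi - m_hi)
        for s_lo, s_hi, m_lo, m_hi in zip(surface_min, surface_max, mesh_min, mesh_max)
    ]


# Bounding boxes as posted: surfaceCheck on outlet.stl, checkMesh on the mesh.
stl_min, stl_max = (328.133, -10.0, 561.695), (328.133, 60.0, 961.695)
mesh_min, mesh_max = (-225.0, 0.0, -1.421085e-14), (390.985, 50.0, 906.612)

for axis, (low, high) in zip("xyz", overhang(stl_min, stl_max, mesh_min, mesh_max)):
    flat = stl_min["xyz".index(axis)] == stl_max["xyz".index(axis)]
    note = " (flat direction of the surface, margins not meaningful)" if flat else ""
    print(f"{axis}: low margin {low:.3f}, high margin {high:.3f}{note}")
```

For a planar cutting surface only the two in-plane directions matter. A negative margin there is not necessarily an error, since the surface may be meant to cover only part of the domain, but it is a cheap hint about where to look in a viewer.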

wyldckat July 16, 2013 17:30

Hi Alex,

I'm glad you've found the problem!

In reply to your question:
Quote:

Originally Posted by pct (Post 439735)
Do you use ffast-math for your builds, or is this a no-go on principle for you?

I usually don't use the "fast-math" option. If anything, I use the "Single Precision" builds, because that way I know for certain that the precision will be considerably lower, instead of having to guess whether I'm running into issues caused by IEEE floating-point rules not being respected :D

Best regards,
Bruno
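
To give a feeling for how much precision a single-precision build gives up, the sketch below rounds a few double-precision values to IEEE single precision with the standard struct module. It assumes that such a build stores its scalars as 32-bit floats; the sample coordinates are taken from the bounding boxes earlier in the thread, and to_float32 is a name chosen for this illustration.

```python
import struct


def to_float32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


for value in (0.1, 328.133, 906.612):
    single = to_float32(value)
    rel_err = abs(single - value) / abs(value)
    print(f"{value!r} -> {single!r} (relative error {rel_err:.2e})")
```

The relative error stays below 2**-24, roughly 6e-8, which corresponds to about 7 significant decimal digits instead of about 16. The loss is predictable, which is the point made in the post above.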

wyldckat July 21, 2013 13:57

Hi Alex,

I have to apologize for something, namely about this:
Quote:

Code:

faces per cell:  5.99997

I usually deal with meshes that have cube-like cells, which usually gives me nicely rounded values for "faces per cell", such as "6.0".
But apparently it is normal in OpenFOAM to get non-rounded values like these when some cells are not hexahedra.

Best regards,
Bruno
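
The 5.99997 can be reproduced from the checkMesh output posted at the start of the thread. The sketch below assumes that "faces per cell" is computed as (faces + internal faces) / cells, since every internal face belongs to two cells; that formula is an assumption here, but it matches the posted numbers exactly.

```python
def faces_per_cell(n_faces, n_internal_faces, n_cells):
    """Average number of faces per cell; internal faces count once per neighbour."""
    return (n_faces + n_internal_faces) / n_cells


# Values from the "Mesh stats" block of the first post.
n_faces, n_internal, n_cells = 2384496, 2286781, 778550
# Values from "Overall number of cells of each type".
n_hex, n_prism = 778527, 23

print(f"{faces_per_cell(n_faces, n_internal, n_cells):.6g}")  # 5.99997

# Cross-check: 6 faces per hexahedron and 5 per prism give the same total.
print(6 * n_hex + 5 * n_prism == n_faces + n_internal)  # True
```

So the 23 prisms alone account for the value being slightly below 6, and the tutorial meshes with values above 6 presumably contain polyhedral cells with more than six faces.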


All times are GMT -4.