 skabilan October 12, 2007 11:31

Hi,

I get the following error message after successfully running for nearly 6 hours on 4 processors. Can someone help me with this?

Courant Number mean: 5.28423e-06 max: 0.634632
deltaT = 2.15949e-24
Time = 0.00317049

DILUPBiCG: Solving for Ux, Initial residual = 0.0807868, Final residual = 0.000402516, No Iterations 2
DILUPBiCG: Solving for Uy, Initial residual = 0.061261, Final residual = 8.34029e-06, No Iterations 3
DILUPBiCG: Solving for Uz, Initial residual = 0.0724885, Final residual = 2.75137e-06, No Iterations 3
GAMG: Solving for p, Initial residual = 0.798164, Final residual = 0.0379458, No Iterations 2
time step continuity errors : sum local = 4.71854e-07, global = -5.93706e-09, cumulative = -9.63915e-06
[0] #0 Foam::error::printStack(Foam::Ostream&)[1] #0 Foam::error::printStack(Foam::Ostream&) in "/files0/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libOpenFOAM.so"
[0] #1 Foam::sigFpe::sigFpeHandler(int) in "/files0/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libOpenFOAM.so"
[1] #1 Foam::sigFpe::sigFpeHandler(int) in "/files0/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libOpenFOAM.so"
[0] #2 in "/files0/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libOpenFOAM.so"
[1] #2 __restore_rt__restore_rt in "/lib64/tls/libc.so.6"
[0] #3 void Foam::processorLduInterface::compressedSend<Foam::Vector<double> >(Foam::UList<Foam::Vector<double> > const&, bool) const in "/lib64/tls/libc.so.6"
[1] #3 void Foam::processorLduInterface::compressedSend<Foam::Vector<double> >(Foam::UList<Foam::Vector<double> > const&, bool) const in "/files0/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libfiniteVolume.so"
[0] #4 Foam::processorFvPatchField<Foam::Vector<double> >::initEvaluate(bool) in "/files0/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libfiniteVolume.so"
[1] #4 Foam::processorFvPatchField<Foam::Vector<double> >::initEvaluate(bool) in "/files0/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libfiniteVolume.so"
[0] #5 Foam::GeometricField<Foam::Vector<double>, Foam::fvPatchField, Foam::volMesh>::GeometricBoundaryField::evaluate() in "/files0/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libfiniteVolume.so"
[1] #5 Foam::GeometricField<Foam::Vector<double>, Foam::fvPatchField, Foam::volMesh>::GeometricBoundaryField::evaluate() in "/files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64GccDPOpt/icoFoamVarDt"
[0] #8 Foam::tmp<Foam::GeometricField<Foam::outerProduct<Foam::Vector<double>, double>::type, Foam::fvPatchField, Foam::volMesh> > Foam::fvc::grad<double>(Foam::GeometricField<double, Foam::fvPatchField, Foam::volMesh> const&, Foam::word const&) in "/files0/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libfiniteVolume.so"
[1] #8 Foam::tmp<Foam::GeometricField<Foam::outerProduct<Foam::Vector<double>, double>::type, Foam::fvPatchField, Foam::volMesh> > Foam::fvc::grad<double>(Foam::GeometricField<double, Foam::fvPatchField, Foam::volMesh> const&, Foam::word const&) in "/files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64GccDPOpt/icoFoamVarDt"
[0] #9 Foam::tmp<Foam::GeometricField<Foam::outerProduct<Foam::Vector<double>, double>::type, Foam::fvPatchField, Foam::volMesh> > Foam::fvc::grad<double>(Foam::GeometricField<double, Foam::fvPatchField, Foam::volMesh> const&) in "/files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64GccDPOpt/icoFoamVarDt"
[1] #9 Foam::tmp<Foam::GeometricField<Foam::outerProduct<Foam::Vector<double>, double>::type, Foam::fvPatchField, Foam::volMesh> > Foam::fvc::grad<double>(Foam::GeometricField<double, Foam::fvPatchField, Foam::volMesh> const&) in "/files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64GccDPOpt/icoFoamVarDt"
[0] #10 in "/files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64GccDPOpt/icoFoamVarDt"
[1] #10 mainmain in in "/files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64GccDPOpt/icoFoamVarDt"
[0] #11 __libc_start_main"/files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64GccDPOpt/icoFoamVarDt"
[1] #11 __libc_start_main in "/lib64/tls/libc.so.6"
[1] #12 __gxx_personality_v0 in "/lib64/tls/libc.so.6"
[0] #12 __gxx_personality_v0 in "/files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64GccDPOpt/icoFoamVarDt"
[bigbox:04806] *** Process received signal ***
[bigbox:04806] Signal: Floating point exception (8)
[bigbox:04806] Signal code: (-6)
[bigbox:04806] [ 0] /lib64/tls/libc.so.6 [0x34dc12e2b0]
[bigbox:04806] [ 1] /lib64/tls/libc.so.6(gsignal+0x3d) [0x34dc12e21d]
[bigbox:04806] [ 2] /lib64/tls/libc.so.6 [0x34dc12e2b0]
[bigbox:04806] [ 3] /files0/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libfiniteVolume.so(_ZNK4Foam21processorLduInterface14compressedSendINS_6VectorIdEEEEvRKNS_5UListIT_EEb+0x92) [0x2a958c0072]
[bigbox:04806] [ 4] /files0/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libfiniteVolume.so(_ZN4Foam21processorFvPatchFieldINS_6VectorIdEEE12initEvaluateEb+0x65) [0x2a958c3055]
[bigbox:04806] [ 5] /files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64GccDPOpt/icoFoamVarDt(_ZN4Foam14GeometricFieldINS_6VectorIdEENS_12fvPatchFieldENS_7volMeshEE22GeometricBoundaryField8evaluateEv+0x89) [0x4207d9]
[bigbox:04806] [ 8] /files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64GccDPOpt/icoFoamVarDt(_ZN4Foam3fvc4gradIdEENS_3tmpINS_14GeometricFieldINS_12outerProductINS_6VectorIdEET_E4typeENS_12fvPatchFieldENS_7volMeshEEEEERKNS3_IS7_SA_SB_EERKNS_4wordE+0x68) [0x42eb38]
[bigbox:04806] [ 9] /files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64GccDPOpt/icoFoamVarDt(_ZN4Foam3fvc4gradIdEENS_3tmpINS_14GeometricFieldINS_12outerProductINS_6VectorIdEET_E4typeENS_12fvPatchFieldENS_7volMeshEEEEERKNS3_IS7_SA_SB_EE+0x1d3) [0x436a23]
[bigbox:04806] [10] /files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64Gcc in "/files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64GccDPOpt/icoFoamVarDt"
[bigbox:04805] *** Process received signal ***
[bigbox:04805] Signal: Floating point exception (8)
[bigbox:04805] Signal code: (-6)
[bigbox:04805] [ 0] /lib64/tls/libc.so.6 [0x34dc12e2b0]
[bigbox:04805] [ 1] /lib64/tls/libc.so.6(gsignal+0x3d) [0x34dc12e21d]
[bigbox:04805] [ 2] /lib64/tls/libc.so.6 [0x34dc12e2b0]
[bigbox:04805] [ 3] /files0/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libfiniteVolume.so(_ZNK4Foam21processorLduInterface14compressedSendINS_6VectorIdEEEEvRKNS_5UListIT_EEb+0x92) [0x2a958c0072]
[bigbox:04805] [ 4] /files0/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libfiniteVolume.so(_ZN4Foam21processorFvPatchFieldINS_6VectorIdEEE12initEvaluateEb+0x65) [0x2a958c3055]
[bigbox:04805] [ 5] /files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64GccDPOpt/icoFoamVarDt(_ZN4Foam14GeometricFieldINS_6VectorIdEENS_12fvPatchFieldENS_7volMeshEE22GeometricBoundaryField8evaluateEv+0x89) [0x4207d9]
[bigbox:04805] [ 8] /files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64GccDPOpt/icoFoamVarDt(_ZN4Foam3fvc4gradIdEENS_3tmpINS_14GeometricFieldINS_12outerProductINS_6VectorIdEET_E4typeENS_12fvPatchFieldENS_7volMeshEEEEERKNS3_IS7_SA_SB_EERKNS_4wordE+0x68) [0x42eb38]
[bigbox:04805] [ 9] /files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64GccDPOpt/icoFoamVarDt(_ZN4Foam3fvc4gradIdEENS_3tmpINS_14GeometricFieldINS_12outerProductINS_6VectorIdEET_E4typeENS_12fvPatchFieldENS_7volMeshEEEEERKNS3_IS7_SA_SB_EE+0x1d3) [0x436a23]
[bigbox:04805] [10] /files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux6DPOpt/icoFoamVarDt [0x4161fc]
[bigbox:04806] [11] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x34dc11c3fb]
[bigbox:04806] [12] /files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64GccDPOpt/icoFoamVarDt(__gxx_personality_v0+0xfa) [0x412aba]
[bigbox:04806] *** End of error message ***
4GccDPOpt/icoFoamVarDt [0x4161fc]
[bigbox:04805] [11] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x34dc11c3fb]
[bigbox:04805] [12] /files0/OpenFOAM/skabilan-1.4.1/applications/bin/linux64GccDPOpt/icoFoamVarDt(__gxx_personality_v0+0xfa) [0x412aba]
[bigbox:04805] *** End of error message ***
mpirun noticed that job rank 0 with PID 4805 on node bigbox exited on signal 15 (Terminated).
3 additional processes aborted (not shown)

 skabilan October 12, 2007 11:52

Hi,

An update: I tried the simulation again and the process died at the same time step (Time = 0.00317049). Does it have anything to do with the deltaT of order 1e-24?

Thanks

 mattijs October 12, 2007 13:02

You have a factor of about 1E5 difference between the mean and max Courant numbers.

Maybe you have a huge variation in cell size combined with a nice uniform flow, but more likely your velocity has gone bad.

 skabilan October 12, 2007 13:08

Mattijs,

Thanks for the quick response. Any suggestions for handling this type of anisotropic mesh?

Regards,
Senthil

 skabilan October 12, 2007 13:15

Hi,

I have added the output of checkMesh.

/*---------------------------------------------------------------------------*\
| ========= | |
| \ / F ield | OpenFOAM: The Open Source CFD Toolbox |
| \ / O peration | Version: 1.4.1 |
| \ / A nd | Web: http://www.openfoam.org |
| \/ M anipulation | |
\*---------------------------------------------------------------------------*/

Exec : checkMesh . phantom_icofoam_trans
Date : Oct 11 2007
Time : 11:18:30
Host : bigbox
PID : 27691
Root : /files0/skabilan/uw_workdir/openfoam/rat_urt_phantom
Case : phantom_icofoam_trans
Nprocs : 1
Create time

Create polyMesh for time = constant

Time = constant

Mesh stats
points: 804332
edges: 5388519
faces: 8970303
internal faces: 8574189
cells: 4386123
boundary patches: 3
point zones: 0
face zones: 0
cell zones: 0

Number of cells of each type:
hexahedra: 0
prisms: 0
wedges: 0
pyramids: 0
tet wedges: 0
tetrahedra: 4386123
polyhedra: 0

Checking topology...
Boundary definition OK.
Point usage OK.
Upper triangular ordering OK.
Topological cell zip-up check OK.
Face vertices OK.
Face-face connectivity OK.
Number of regions: 1 (OK).

Checking patch topology for multiply connected surfaces ...
Patch Faces Points Surface
inlet 1559 803 ok (not multiply connected)
out2 811 429 ok (not multiply connected)
w1 393744 196901 ok (not multiply connected)

Checking geometry...
Domain bounding box: (-0.00340639 -0.00303818 -0.002) (0.00325593 0.00827911 0.0635)
Boundary openness (-1.63723e-16 2.07309e-17 -6.91347e-17) OK.
Max cell openness = 8.35141e-16 OK.
Max aspect ratio = 39.9471 OK.
Minumum face area = 2.08231e-11. Maximum face area = 9.21688e-07. Face area magnitudes OK.
Min volume = 6.22034e-17. Max volume = 1.67119e-10. Total volume = 5.89343e-07. Cell volumes OK.
Mesh non-orthogonality Max: 85.0028 average: 34.3687
*Number of severely non-orthogonal faces: 59690.
Non-orthogonality check OK.
<<Writing 59690 non-orthogonal faces to set nonOrthoFaces
Face pyramids OK.
Max skewness = 2.64509 OK.
Min/max edge length = 6.66684e-06 0.00175888 OK.
All angles in faces OK.
All face flatness OK.

Mesh OK.

End

 mbeaudoin October 12, 2007 22:44

Hello Senthil,

We currently have an OpenFOAM parallel simulation under "investigation" that generates the same kind of error messages.

What I know so far is that in our case, the error is triggered by a floating point overflow exception in the method void Foam::processorLduInterface::compressedSend().

If you look at the code for this method (http://openfoam-extend.svn.sourceforge.net/viewvc/openfoam-extend/trunk/Core/OpenFOAM-1.4.1-dev/src/OpenFOAM/matrices/lduMatrix/lduAddressing/lduInterface/processorLduInterfaceTemplates.C?view=markup), lines 87 to 112, you will see that, in an attempt to "cut down" on the time spent sending double-precision field values to neighboring processor patches, single-precision delta values are computed and transmitted instead.

The single precision delta values are computed relative to the "last" entry in the vector containing the double precision field values (slast in the code).

Now, if your double-precision solution values at the processor patch are getting very, very large, the computation of the delta values will eventually produce values that are too large to be represented by single-precision variables (the largest finite float is about 3.4e38). In that case, the assignment to fArray[i] on line 101 will immediately generate a floating-point overflow exception, and you will get the nice stack trace you are seeing with your simulation.

Now, you can see at line 87 of processorLduInterfaceTemplates.C that this computation of single-precision delta values only happens if OpenFOAM is compiled in double precision and if the optimization switch called floatTransfer in the main .OpenFOAM/controlDict is activated (set to 1). With OpenFOAM 1.4.1, this "optimization" switch is on by default.

Just for fun, try disabling this option by setting floatTransfer to 0 and start your parallel run again. Your solution might still blow up numerically, but it won't get interrupted by this little piece of code. And you will be sure that your double-precision computations are not affected by any loss of precision coming from the transmission of the processor patch field values.

On my side, I still need to evaluate why the solutions are blowing up after a while with our simulation.

I also need to evaluate whether this kind of "optimization" in processorLduInterfaceTemplates.C is really useful given our cluster interconnect technology (10 Gbps Infiniband).

Given the Infiniband fabric performance, it is far from obvious that I need to spend time computing single-precision delta values instead of simply sending the full double-precision processor field values in the first place.

I will keep you posted on my findings.

Martin

 skabilan October 12, 2007 23:47

Martin,

Thanks for the explanation. I'll look forward to your findings. Let me know if there is anything that can be done on my side.

Regards,
Senthil

