shang
January 18, 2018 13:23
Foam::error::printStack(Foam::Ostream&) with pimpleFoam of OF1612 on Cluster
1 Attachment(s)
Hello everybody,
I have been testing a wavy-channel case with a self-installed (no sudo rights) OpenFOAM v1612+ on my university's cluster. The calculation had been running well but terminated unexpectedly at one time step (Time = 15.4823) with the following error message:
Code:
[44] #0 Foam::error::printStack(Foam::Ostream&) addr2line failed
[44] #1 Foam::sigFpe::sigHandler(int) addr2line failed
[44] #2 ? addr2line failed
[44] #3 Foam::FaceCellWave<Foam::wallPointYPlus, int>::updateFace(int, Foam::wallPointYPlus const&, double, Foam::wallPointYPlus&) addr2line failed
[44] #4 Foam::FaceCellWave<Foam::wallPointYPlus, int>::mergeFaceInfo(Foam::polyPatch const&, int, Foam::List<int> const&, Foam::List<Foam::wallPointYPlus> const&) addr2line failed
[44] #5 Foam::FaceCellWave<Foam::wallPointYPlus, int>::handleProcPatches() addr2line failed
[44] #6 Foam::FaceCellWave<Foam::wallPointYPlus, int>::cellToFace() addr2line failed
[44] #7 Foam::FaceCellWave<Foam::wallPointYPlus, int>::FaceCellWave(Foam::polyMesh const&, Foam::List<int> const&, Foam::List<Foam::wallPointYPlus> const&, Foam::UList<Foam::wallPointYPlus>&, Foam::UList<Foam::wallPointYPlus>&, int, int&) addr2line failed
[44] #8 Foam::patchDataWave<Foam::wallPointYPlus>::correct() addr2line failed
[44] #9 Foam::patchDataWave<Foam::wallPointYPlus>::patchDataWave(Foam::polyMesh const&, Foam::HashSet<int, Foam::Hash<int> > const&, Foam::UPtrList<Foam::Field<double> > const&, bool) addr2line failed
[44] #10 Foam::wallDistData<Foam::wallPointYPlus>::correct() addr2line failed
[44] #11 Foam::wallDistData<Foam::wallPointYPlus>::wallDistData(Foam::fvMesh const&, Foam::GeometricField<double, Foam::fvPatchField, Foam::volMesh>&, bool) addr2line failed
[44] #12 Foam::LESModels::vanDriestDelta::calcDelta() addr2line failed
[44] #13 Foam::LESModels::Smagorinsky<Foam::IncompressibleTurbulenceModel<Foam::transportModel> >::correct() addr2line failed
[44] #14 ?
[44] #15 __libc_start_main addr2line failed
[44] #16 ?
On our cluster, the error message is printed separately from the log file; below is the corresponding log output, including the last two time steps before the error:
Code:
Time = 15.4822
PIMPLE: iteration 1
DILUPBiCG: Solving for Ux, Initial residual = 0.000556487, Final residual = 2.12262e-06, No Iterations 1
DILUPBiCG: Solving for Uy, Initial residual = 0.00596978, Final residual = 9.37667e-07, No Iterations 2
DILUPBiCG: Solving for Uz, Initial residual = 0.00639191, Final residual = 9.79846e-07, No Iterations 2
Pressure gradient source: uncorrected Ubar = 0.3, pressure gradient = 0.173112
DICPCG: Solving for p, Initial residual = 0.0201665, Final residual = 0.000198046, No Iterations 22
time step continuity errors : sum local = 4.22849e-09, global = 6.70099e-15, cumulative = -7.63734e-11
Pressure gradient source: uncorrected Ubar = 0.3, pressure gradient = 0.173323
DICPCG: Solving for p, Initial residual = 0.00155507, Final residual = 9.92083e-07, No Iterations 219
time step continuity errors : sum local = 2.28596e-11, global = 6.70113e-15, cumulative = -7.63667e-11
Pressure gradient source: uncorrected Ubar = 0.3, pressure gradient = 0.173316
ExecutionTime = 35366.4 s ClockTime = 35563 s
fieldAverage fieldAverage1 write:
Calculating averages
Courant Number mean: 0.0786736 max: 0.345867
Time = 15.4823
PIMPLE: iteration 1
DILUPBiCG: Solving for Ux, Initial residual = 0.000556473, Final residual = 2.12284e-06, No Iterations 1
DILUPBiCG: Solving for Uy, Initial residual = 0.00596983, Final residual = 9.38263e-07, No Iterations 2
DILUPBiCG: Solving for Uz, Initial residual = 0.0063918, Final residual = 9.82205e-07, No Iterations 2
Pressure gradient source: uncorrected Ubar = 0.3, pressure gradient = 0.173062
DICPCG: Solving for p, Initial residual = 0.02018, Final residual = 0.000195317, No Iterations 23
time step continuity errors : sum local = 4.17008e-09, global = 6.77215e-15, cumulative = -7.63599e-11
Pressure gradient source: uncorrected Ubar = 0.3, pressure gradient = 0.173276
DICPCG: Solving for p, Initial residual = 0.00155713, Final residual = 9.79816e-07, No Iterations 192
time step continuity errors : sum local = 2.27268e-11, global = 6.77206e-15, cumulative = -7.63531e-11
Pressure gradient source: uncorrected Ubar = 0.3, pressure gradient = 0.173274
--------------------------------------------------------------------------
A process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.
The process that invoked fork was:
Local host: [[60763,1],44] (PID 115663)
If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[node087:01306] *** Process received signal ***
[node087:01307] *** Process received signal ***
[node087:01309] *** Process received signal ***
[node087:01311] *** Process received signal ***
[node087:01312] *** Process received signal ***
[node087:01313] *** Process received signal ***
[node087:01314] *** Process received signal ***
[node087:01315] *** Process received signal ***
[node087:01316] *** Process received signal ***
[node087:01317] *** Process received signal ***
[node087:01319] *** Process received signal ***
[node087:01320] *** Process received signal ***
[node087:115663] *** Process received signal ***
[node087:115663] Signal: Floating point exception (8)
[node087:115663] Signal code: (-6)
[node087:115663] Failing at address: 0xbfec0001c3cf
[node087:115663] [ 0] /lib64/libc.so.6[0x3e13632660]
[node087:115663] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x3e136325e5]
[node087:115663] [ 2] /lib64/libc.so.6[0x3e13632660]
[node087:115663] [ 3] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam12FaceCellWaveINS_14wallPointYPlusEiE10updateFaceEiRKS1_dRS1_+0x6f)[0x2aaaaab7201f]
[node087:115663] [ 4] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam12FaceCellWaveINS_14wallPointYPlusEiE13mergeFaceInfoERKNS_9polyPatchEiRKNS_4ListIiEERKNS6_IS1_EE+0xe9)[0x2aaaaab722a9]
[node087:115663] [ 5] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam12FaceCellWaveINS_14wallPointYPlusEiE17handleProcPatchesEv+0x457)[0x2aaaaab79957]
[node087:115663] [ 6] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam12FaceCellWaveINS_14wallPointYPlusEiE10cellToFaceEv+0x568)[0x2aaaaab7a1d8]
[node087:115663] [ 7] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam12FaceCellWaveINS_14wallPointYPlusEiEC2ERKNS_8polyMeshERKNS_4ListIiEERKNS6_IS1_EERNS_5UListIS1_EESF_iRi+0x40a)[0x2aaaaab7a66a]
[node087:115663] [ 8] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam13patchDataWaveINS_14wallPointYPlusEE7correctEv+0xc8)[0x2aaaaab7aae8]
[node087:115663] [ 9] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam13patchDataWaveINS_14wallPointYPlusEEC2ERKNS_8polyMeshERKNS_7HashSetIiNS_4HashIiEEEERKNS_8UPtrListINS_5FieldIdEEEEb+0x154)[0x2aaaaab7af94]
[node087:115663] [10] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam12wallDistDataINS_14wallPointYPlusEE7correctEv+0x153)[0x2aaaaab7b1b3]
[node087:115663] [11] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam12wallDistDataINS_14wallPointYPlusEEC2ERKNS_6fvMeshERNS_14GeometricFieldIdNS_12fvPatchFieldENS_7volMeshEEEb+0x1a4)[0x2aaaaab7b6d4]
[node087:115663] [12] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam9LESModels14vanDriestDelta9calcDeltaEv+0x539)[0x2aaaaab6d889]
[node087:115663] [13] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libincompressibleTurbulenceModels.so(_ZN4Foam9LESModels11SmagorinskyINS_29IncompressibleTurbulenceModelINS_14transportModelEEEE7correctEv+0x1b)[0x2aaaab0d7bbb]
[node087:115663] [14] pimpleFoam[0x426834]
[node087:115663] [15] /lib64/libc.so.6(__libc_start_main+0xfd)[0x3e1361ed1d]
[node087:115663] [16] pimpleFoam[0x42739d]
[node087:115663] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 44 with PID 115663 on node node087 exited on signal 8 (Floating point exception).
--------------------------------------------------------------------------
The case originally started from Time = 11 and finished at Time = 15, with averaging taken from Time = 12. Since the averaged data at Time = 15 had not settled yet, I re-submitted the job on the cluster to run from Time = 15 to 17. The first unexpected termination happened at Time = 15.1755. Unfortunately, its error message was deleted by mistake, so only the log output is attached. I then restarted from the data written at Time = 15.17; the run made it past Time = 15.1755 but terminated unexpectedly again at Time = 15.4823. The error message and log output above are for this second termination.
You may have noticed that the output in the attached log (for the first termination at Time = 15.1755) differs from the log output above (for the second termination at Time = 15.4823). This suggests the two terminations might have different causes, but I can't tell without the error message of the first one, shame...
I have done some research in the forum and have some initial thoughts about the error message:
I don't think the problem was caused by bad boundary conditions or a division by zero in the fields, as the case had been running for quite a long time and the continuity errors at the last time step before the termination were at a fairly low level.
The trace seems to point to a file called "FaceCellWave.C". However, when I open that file and go to the updateFace method, I can't really find anything that could cause an FPE (maybe because I am not familiar with the code :(). The method reads:
Code:
template<class Type, class TrackingData>
bool Foam::FaceCellWave<Type, TrackingData>::updateFace
(
    const label facei,
    const Type& neighbourInfo,
    const scalar tol,
    Type& faceInfo
)
{
    // Update info for facei, at position pt, with information from
    // same face.
    // Updates:
    // - changedFace_, changedFaces_,
    // - statistics: nEvals_, nUnvisitedFaces_

    nEvals_++;

    bool wasValid = faceInfo.valid(td_);

    bool propagate =
        faceInfo.updateFace
        (
            mesh_,
            facei,
            neighbourInfo,
            tol,
            td_
        );

    if (propagate)
    {
        if (!changedFace_[facei])
        {
            changedFace_[facei] = true;
            changedFaces_.append(facei);
        }
    }

    if (!wasValid && faceInfo.valid(td_))
    {
        --nUnvisitedFaces_;
    }

    return propagate;
}
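After staring at updateFace for a while, I realised it only does bookkeeping: the actual arithmetic is delegated to faceInfo.updateFace, which for this instantiation should end up in wallPointYPlus::update (in wallPointYPlusI.H). Below is my paraphrase of the part that looks FPE-prone; it is from my reading of the v1612+ source and shortened for clarity, so please check it against the actual file before trusting it:
Code:
// Paraphrase (not verbatim) of wallPointYPlus::update(), the
// routine reached from FaceCellWave::updateFace() above, reduced
// to the arithmetic that could raise a floating point exception.
template<class TrackingData>
inline bool Foam::wallPointYPlus::update
(
    const point& pt,
    const wallPointYPlus& w2,
    const scalar tol,
    TrackingData& td
)
{
    // Squared distance from this face to the candidate wall point
    scalar dist2 = magSqr(pt - w2.origin());

    // ... tolerance checks omitted here; their diff/distSqr()
    // division is guarded by a (distSqr() > SMALL) test ...

    // w2.data() carries the ystar (~ nu/uTau) value that
    // vanDriestDelta::calcDelta() evaluated at the nearest wall
    // face. This division appears to be unguarded: if ystar were
    // exactly zero on a single face, it would raise precisely the
    // SIGFPE shown in the trace.
    scalar yPlus = Foam::sqrt(dist2)/w2.data();

    if (yPlus < yPlusCutOff)    // only propagate if yPlus < 100
    {
        distSqr() = dist2;      // accept the nearer wall point
        origin() = w2.origin();
        data() = w2.data();
        return true;
    }

    return false;
}
If that reading is correct, the FPE would point at a bad (zero) ystar value on one wall face, i.e. at the wall quantities used by the van Driest damping, rather than at FaceCellWave itself, but I may well be missing something.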
Yesterday I re-submitted the job starting from the data written at Time = 15.48, and since then no error has occurred; the run has now reached Time = 16.5. So I have no clue what exactly the problem was or how it caused the termination. Can anybody give me some hints?
BTW, the controlDict, fvSchemes and fvSolution files are included in another thread of mine: https://www.cfd-online.com/Forums/op...tml#post678480
Many thanks,
Yeru