CFD Online Discussion Forums

CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   OpenFOAM Running, Solving & CFD (https://www.cfd-online.com/Forums/openfoam-solving/)
-   -   Foam::error::printStack(Foam::Ostream&) with pimpleFoam of OF1612 on Cluster (https://www.cfd-online.com/Forums/openfoam-solving/197847-foam-error-printstack-foam-ostream-pimplefoam-of1612-cluster.html)

shang January 18, 2018 13:23

Foam::error::printStack(Foam::Ostream&) with pimpleFoam of OF1612 on Cluster
 
1 Attachment(s)
Hello everybody,

I have been testing a wavy-channel case with a self-installed (no sudo rights) OpenFOAM v1612+ on my university's cluster. The calculation had been running well but terminated unexpectedly at one time step (Time = 15.4823) with the following error message:

Code:

[44] #0  Foam::error::printStack(Foam::Ostream&) addr2line failed
[44] #1  Foam::sigFpe::sigHandler(int) addr2line failed
[44] #2  ? addr2line failed
[44] #3  Foam::FaceCellWave<Foam::wallPointYPlus,  int>::updateFace(int, Foam::wallPointYPlus const&, double,  Foam::wallPointYPlus&) addr2line failed
[44] #4  Foam::FaceCellWave<Foam::wallPointYPlus,  int>::mergeFaceInfo(Foam::polyPatch const&, int,  Foam::List<int> const&, Foam::List<Foam::wallPointYPlus>  const&) addr2line failed
[44] #5  Foam::FaceCellWave<Foam::wallPointYPlus, int>::handleProcPatches() addr2line failed
[44] #6  Foam::FaceCellWave<Foam::wallPointYPlus, int>::cellToFace() addr2line failed
[44] #7  Foam::FaceCellWave<Foam::wallPointYPlus,  int>::FaceCellWave(Foam::polyMesh const&, Foam::List<int>  const&, Foam::List<Foam::wallPointYPlus> const&,  Foam::UList<Foam::wallPointYPlus>&,  Foam::UList<Foam::wallPointYPlus>&, int, int&) addr2line  failed
[44] #8  Foam::patchDataWave<Foam::wallPointYPlus>::correct() addr2line failed
[44] #9  Foam::patchDataWave<Foam::wallPointYPlus>::patchDataWave(Foam::polyMesh  const&, Foam::HashSet<int, Foam::Hash<int> >  const&, Foam::UPtrList<Foam::Field<double> > const&,  bool) addr2line failed
[44] #10  Foam::wallDistData<Foam::wallPointYPlus>::correct() addr2line failed
[44] #11  Foam::wallDistData<Foam::wallPointYPlus>::wallDistData(Foam::fvMesh  const&, Foam::GeometricField<double, Foam::fvPatchField,  Foam::volMesh>&, bool) addr2line failed
[44] #12  Foam::LESModels::vanDriestDelta::calcDelta() addr2line failed
[44] #13  Foam::LESModels::Smagorinsky<Foam::IncompressibleTurbulenceModel<Foam::transportModel>  >::correct() addr2line failed
[44] #14  ?
[44] #15  __libc_start_main addr2line failed
[44] #16  ?

On our cluster, the error message is printed separately from the log file; below is the log output, including the last two time steps before the error message:
Code:

Time = 15.4822

PIMPLE: iteration 1
DILUPBiCG:  Solving for Ux, Initial residual = 0.000556487, Final residual = 2.12262e-06, No Iterations 1
DILUPBiCG:  Solving for Uy, Initial residual = 0.00596978, Final residual = 9.37667e-07, No Iterations 2
DILUPBiCG:  Solving for Uz, Initial residual = 0.00639191, Final residual = 9.79846e-07, No Iterations 2
Pressure gradient source: uncorrected Ubar = 0.3, pressure gradient = 0.173112
DICPCG:  Solving for p, Initial residual = 0.0201665, Final residual = 0.000198046, No Iterations 22
time step continuity errors : sum local = 4.22849e-09, global = 6.70099e-15, cumulative = -7.63734e-11
Pressure gradient source: uncorrected Ubar = 0.3, pressure gradient = 0.173323
DICPCG:  Solving for p, Initial residual = 0.00155507, Final residual = 9.92083e-07, No Iterations 219
time step continuity errors : sum local = 2.28596e-11, global = 6.70113e-15, cumulative = -7.63667e-11
Pressure gradient source: uncorrected Ubar = 0.3, pressure gradient = 0.173316
ExecutionTime = 35366.4 s  ClockTime = 35563 s

fieldAverage fieldAverage1 write:
    Calculating averages

Courant Number mean: 0.0786736 max: 0.345867
Time = 15.4823

PIMPLE: iteration 1
DILUPBiCG:  Solving for Ux, Initial residual = 0.000556473, Final residual = 2.12284e-06, No Iterations 1
DILUPBiCG:  Solving for Uy, Initial residual = 0.00596983, Final residual = 9.38263e-07, No Iterations 2
DILUPBiCG:  Solving for Uz, Initial residual = 0.0063918, Final residual = 9.82205e-07, No Iterations 2
Pressure gradient source: uncorrected Ubar = 0.3, pressure gradient = 0.173062
DICPCG:  Solving for p, Initial residual = 0.02018, Final residual = 0.000195317, No Iterations 23
time step continuity errors : sum local = 4.17008e-09, global = 6.77215e-15, cumulative = -7.63599e-11
Pressure gradient source: uncorrected Ubar = 0.3, pressure gradient = 0.173276
DICPCG:  Solving for p, Initial residual = 0.00155713, Final residual = 9.79816e-07, No Iterations 192
time step continuity errors : sum local = 2.27268e-11, global = 6.77206e-15, cumulative = -7.63531e-11
Pressure gradient source: uncorrected Ubar = 0.3, pressure gradient = 0.173274

--------------------------------------------------------------------------
A process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          [[60763,1],44] (PID 115663)

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[node087:01306] *** Process received signal ***
[node087:01307] *** Process received signal ***
[node087:01309] *** Process received signal ***
[node087:01311] *** Process received signal ***
[node087:01312] *** Process received signal ***
[node087:01313] *** Process received signal ***
[node087:01314] *** Process received signal ***
[node087:01315] *** Process received signal ***
[node087:01316] *** Process received signal ***
[node087:01317] *** Process received signal ***
[node087:01319] *** Process received signal ***
[node087:01320] *** Process received signal ***
[node087:115663] *** Process received signal ***
[node087:115663] Signal: Floating point exception (8)
[node087:115663] Signal code:  (-6)
[node087:115663] Failing at address: 0xbfec0001c3cf
[node087:115663] [ 0] /lib64/libc.so.6[0x3e13632660]
[node087:115663] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x3e136325e5]
[node087:115663] [ 2] /lib64/libc.so.6[0x3e13632660]
[node087:115663] [ 3] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam12FaceCellWaveINS_14wallPointYPlusEiE10updateFaceEiRKS1_dRS1_+0x6f)[0x2aaaaab7201f]
[node087:115663] [ 4] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam12FaceCellWaveINS_14wallPointYPlusEiE13mergeFaceInfoERKNS_9polyPatchEiRKNS_4ListIiEERKNS6_IS1_EE+0xe9)[0x2aaaaab722a9]
[node087:115663] [ 5] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam12FaceCellWaveINS_14wallPointYPlusEiE17handleProcPatchesEv+0x457)[0x2aaaaab79957]
[node087:115663] [ 6] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam12FaceCellWaveINS_14wallPointYPlusEiE10cellToFaceEv+0x568)[0x2aaaaab7a1d8]
[node087:115663] [ 7] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam12FaceCellWaveINS_14wallPointYPlusEiEC2ERKNS_8polyMeshERKNS_4ListIiEERKNS6_IS1_EERNS_5UListIS1_EESF_iRi+0x40a)[0x2aaaaab7a66a]
[node087:115663] [ 8] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam13patchDataWaveINS_14wallPointYPlusEE7correctEv+0xc8)[0x2aaaaab7aae8]
[node087:115663] [ 9] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam13patchDataWaveINS_14wallPointYPlusEEC2ERKNS_8polyMeshERKNS_7HashSetIiNS_4HashIiEEEERKNS_8UPtrListINS_5FieldIdEEEEb+0x154)[0x2aaaaab7af94]
[node087:115663] [10] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam12wallDistDataINS_14wallPointYPlusEE7correctEv+0x153)[0x2aaaaab7b1b3]
[node087:115663] [11] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam12wallDistDataINS_14wallPointYPlusEEC2ERKNS_6fvMeshERNS_14GeometricFieldIdNS_12fvPatchFieldENS_7volMeshEEEb+0x1a4)[0x2aaaaab7b6d4]
[node087:115663] [12] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libturbulenceModels.so(_ZN4Foam9LESModels14vanDriestDelta9calcDeltaEv+0x539)[0x2aaaaab6d889]
[node087:115663] [13] /mnt/nfs2/engdes/ys92/OpenFOAM/OpenFOAM-v1612+/platforms/linux64GccDPInt32Opt/lib/libincompressibleTurbulenceModels.so(_ZN4Foam9LESModels11SmagorinskyINS_29IncompressibleTurbulenceModelINS_14transportModelEEEE7correctEv+0x1b)[0x2aaaab0d7bbb]
[node087:115663] [14] pimpleFoam[0x426834]
[node087:115663] [15] /lib64/libc.so.6(__libc_start_main+0xfd)[0x3e1361ed1d]
[node087:115663] [16] pimpleFoam[0x42739d]
[node087:115663] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 44 with PID 115663 on node node087 exited on signal 8 (Floating point exception).
--------------------------------------------------------------------------

The case actually started from Time = 11 and finished at Time = 15, with averaging taken from Time = 12. But the averaged data at Time = 15 had not settled yet, so I re-submitted the job on the cluster to run from Time = 15 to 17. The first unexpected termination happened at Time = 15.1755. Unfortunately, its error message was deleted by mistake, so only the log-file output is attached. I then restarted from the data written at Time = 15.17; the run passed Time = 15.1755 successfully but terminated unexpectedly again at Time = 15.4823. The error message and log output above are for Time = 15.4823.

You may have noticed that the output in the attached log (for the first termination at T = 15.1755) differs from the log output above (for the second termination at T = 15.4823). This suggests the two terminations might have different causes, but I can't tell for sure without the error message from the first one, shame...

I have done some research in the forum and have some initial thoughts about the error message:
I don't think the problem was caused by bad boundary conditions or a division by zero, as the case had been running for quite a long time and the continuity errors at the last time step before the termination were at a fairly low level.
The stack trace seems to point to a file called "FaceCellWave.C". However, when I open that file and go to the updateFace method, I can't really find anything that could cause an FPE (maybe because I am not familiar with the code :(). The method reads:

Code:

template<class Type, class TrackingData>
bool Foam::FaceCellWave<Type, TrackingData>::updateFace
(
    const label facei,
    const Type& neighbourInfo,
    const scalar tol,
    Type& faceInfo
)
{
    // Update info for facei, at position pt, with information from
    // same face.
    // Updates:
    //      - changedFace_, changedFaces_,
    //      - statistics: nEvals_, nUnvisitedFaces_

    nEvals_++;

    bool wasValid = faceInfo.valid(td_);

    bool propagate =
        faceInfo.updateFace
        (
            mesh_,
            facei,
            neighbourInfo,
            tol,
            td_
        );

    if (propagate)
    {
        if (!changedFace_[facei])
        {
            changedFace_[facei] = true;
            changedFaces_.append(facei);
        }
    }

    if (!wasValid && faceInfo.valid(td_))
    {
        --nUnvisitedFaces_;
    }

    return propagate;
}

Yesterday I re-submitted the job from the data written at Time = 15.48 to continue the simulation; since then no error has occurred and it has now reached Time = 16.5. So I have no clue what exactly the problem was or how it caused the termination. Can anybody give me some hints?

BTW, the controlDict, fvSchemes and fvSolution files are included in another thread of mine: https://www.cfd-online.com/Forums/op...tml#post678480

Many thanks,
Yeru

shang January 22, 2018 06:54

Does anyone have any idea about the above error message?

piu58 January 22, 2018 07:01

It would be easier for us to answer if you described your case here, perhaps with some sketches. It is all contained in your files, of course, but we would have to analyse them first.

Please explain your b.c. too.

shang January 22, 2018 10:58

1 Attachment(s)
Hey Uwe,

The case is a fairly standard wavy channel test case with the geometry attached.

It has a periodic inlet and outlet; the wavy walls and the side wall are no-slip. On the walls, U and nut are 0 and p is zero-gradient. I think the BCs are the same as in the channel395 tutorial case.

Thanks,
Yeru

piu58 January 22, 2018 11:58

How do you set the periodic b.c. for pressure and velocity?

shang January 22, 2018 12:06

I just followed what channel395 does.

For p:

Code:

boundaryField
{
    WAVYTOPWALL
    {
        type            zeroGradient;
    }
    WAVYBOTTOMWALL
    {
        type            zeroGradient;
    }
    SIDEWALL
    {
        type            zeroGradient;
    }
    inlet
    {
        type            cyclic;
    }
    outlet
    {
        type            cyclic;
    }
}

For U:

Code:

boundaryField
{
    WAVYTOPWALL
    {
        type            fixedValue;
        value           uniform (0 0 0);
    }
    WAVYBOTTOMWALL
    {
        type            fixedValue;
        value           uniform (0 0 0);
    }
    SIDEWALL
    {
        type            fixedValue;
        value           uniform (0 0 0);
    }
    inlet
    {
        type            cyclic;
    }
    outlet
    {
        type            cyclic;
    }
}
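As an aside for readers reproducing this setup: the cyclic entries in the field files only work if the patches are also declared as a cyclic pair in constant/polyMesh/boundary. A sketch of what those entries typically look like in v1612+ (patch names follow the files above; the face counts and start faces are mesh-dependent placeholders):

```
    inlet
    {
        type            cyclic;
        neighbourPatch  outlet;
        nFaces          ...;    // mesh-dependent
        startFace       ...;    // mesh-dependent
    }
    outlet
    {
        type            cyclic;
        neighbourPatch  inlet;
        nFaces          ...;
        startFace       ...;
    }
```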


piu58 January 23, 2018 08:19

With overall cyclic b.c., the behaviour of the channel flow is determined not by the boundary conditions but by the initial conditions. These have to be physically correct. It may happen, for instance, that the pressure difference does not match the flow, or that the velocity field "generates" or "loses" volume. Any of these could lead to a crash.

shang January 24, 2018 08:30

Hi Uwe,

I am sorry, but I don't really understand your point about the cyclic BC. Do you mean the cyclic BC acts only as an initial condition rather than as a BC for the complete duration of the simulation?

Also, if the pressure difference didn't fit the flow, I would expect the residuals of p and U to become large, but my case doesn't look like that: as shown in the log output above, the residuals of p and U are very small.

Do you know of anything else that could cause a termination like the one I encountered?

Regards,
Yeru

