CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   OpenFOAM Running, Solving & CFD (https://www.cfd-online.com/Forums/openfoam-solving/)
-   -   reactingParcelFoam 2D crash in parallel, works fine in serial (https://www.cfd-online.com/Forums/openfoam-solving/153555-reactingparcelfoam-2d-crash-parallel-works-fine-serial.html)

FerdiFuchs May 28, 2015 12:15

reactingParcelFoam 2D crash in parallel, works fine in serial
 
Hi everyone,

I'm solving a "simple" 2D channel flow of air with a water spray (similar to $FOAM_TUT/lagrangian/reactingParcelFoam/verticalChannel), just reduced to 2D.
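(By "2D" I mean the usual OpenFOAM setup: the mesh is one cell thick in the third direction and the front and back patches are of type empty, roughly like this in the blockMeshDict boundary section; the patch name is just an example:)
Code:

frontAndBack
{
    type            empty;
}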

When I try to run this case in parallel, the solver crashes at the first injection time step with the following error message:
Code:

Solving 2-D cloud reactingCloud1

--> Cloud: reactingCloud1 injector: model1
Added 91 new parcels

[$HOSTNAME:31049] *** An error occurred in MPI_Recv
[$HOSTNAME:31049] *** reported by process [139954540642305,1]
[$HOSTNAME:31049] *** on communicator MPI_COMM_WORLD
[$HOSTNAME:31049] *** MPI_ERR_TRUNCATE: message truncated
[$HOSTNAME:31049] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[$HOSTNAME:31049] ***    and potentially your MPI job)
[$HOSTNAME:31035] 2 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[$HOSTNAME:31035] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

In serial, it runs fine without any error. If I change the number of processes ($WM_NCOMPPROCS), sometimes the solver gets stuck instead of crashing; htop then shows a lot of red CPU usage (kernel threads).

I found something on the net: someone had the same error here and solved it by disabling functionObjects and cloudFunctions. Not in my case...
The decomposition method is also irrelevant; I checked both simple and scotch.
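(For anyone who wants to reproduce the decomposition part: a minimal system/decomposeParDict sketch — the number of subdomains and the simpleCoeffs below are just example values, not necessarily what I used:)
Code:

numberOfSubdomains 4;

method          scotch;   // also tested: simple

simpleCoeffs
{
    n               (2 2 1);
    delta           0.001;
}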

Maybe this thread would be better placed in OpenFOAM Bugs? If someone can confirm this, I will also open an issue in the OF-2.3.x bug tracker.
Tomorrow I'll check it in OF-2.4.x and in FE-3.1.

If somebody knows what to do, any help is appreciated. This case is somewhat urgent for me.

Thank you very much!

oswald June 11, 2015 06:31

I'm having a similar problem with a lagrangian tracking solver in parallel, based on icoUncoupledKinematicParcelFoam. It works at first, but after some time it crashes with the same error message as in your case.

Code:

[ran:7367] *** An error occurred in MPI_Waitall
[ran:7367] *** on communicator MPI_COMM_WORLD
[ran:7367] *** MPI_ERR_TRUNCATE: message truncated
[ran:7367] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 7367 on
node ran exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

And, as in your case, sometimes it just gets stuck somewhere without crashing. I tried to narrow down the step where it hangs, and it seems that KinematicCloud::evolve() in KinematicCloud.C gets stuck while getting the tracking data:

Code:

template<class CloudType>
void Foam::KinematicCloud<CloudType>::evolve()
{
    // debug output added to narrow down where the parallel run hangs
    Info << "start kinematicCloud.evolve" << endl;

    if (solution_.canEvolve())
    {
        Info << "solution can evolve, getting track data" << endl;

        // construct the tracking data for this cloud; this is the last step
        // reached before the run gets stuck
        typename parcelType::template
            TrackingData<KinematicCloud<CloudType> > td(*this);

        Info << "start solving" << endl;
        solve(td);
    }
}

When the solver is stuck, my program's last output is "solution can evolve, getting track data", so the hang seems to happen right there.

When changing the commsType from nonBlocking to blocking in $WM_PROJECT_DIR/etc/controlDict, the error is:
Code:

[0]
[0]
[0] --> FOAM FATAL IO ERROR:
[0] error in IOstream "IOstream" for operation operator>>(Istream&, List<T>&) : reading first token
[0]
[0] file: IOstream at line 0.
[0]
[0]    From function IOstream::fatalCheck(const char*) const
[0]    in file db/IOstreams/IOstreams/IOstream.C at line 114.
[0]
FOAM parallel run exiting
[0]


clockworker August 4, 2015 10:32

reproduced
 
Hi there!

I ran into the same error message in a case similar to
$FOAM_TUT/lagrangian/reactingParcelFoam/verticalChannel/

On the tutorial case I was able to reproduce the described behavior
with the following commands:

Code:

#!/bin/sh
cd ${0%/*} || exit 1    # run from this directory

# Source tutorial run functions
. $WM_PROJECT_DIR/bin/tools/RunFunctions

# create mesh
runApplication blockMesh

cp -r 0.org 0

# initialise with potentialFoam solution
runApplication potentialFoam

rm -f 0/phi

# run the solver
runApplication pyFoamDecompose.py . 4
runApplication pyFoamPlotRunner.py mpirun -np 4 reactingParcelFoam -parallel

# ----------------------------------------------------------------- end-of-file
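(The PyFoam wrappers should not matter here; without PyFoam, the standard utilities ought to behave the same. A sketch, assuming system/decomposeParDict is set up for 4 subdomains:)
Code:

runApplication decomposePar
runParallel reactingParcelFoam 4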

The calculation hangs at

Code:

...
Courant Number mean: 1.705107874 max: 4.895575368
deltaT = 0.0004761904762
Time = 0.0109524

Solving 3-D cloud reactingCloud1

with htop showing CPU usage of ~ 100 % on all cores.

If I deactivate the dispersion model
Code:

dispersionModel none;//stochasticDispersionRAS;
I can reproduce the error message in the OP:

Code:

--> Cloud: reactingCloud1 injector: model1
[$Hostname:15844] *** An error occurred in MPI_Recv
[$Hostname:15844] *** on communicator MPI_COMM_WORLD
[$Hostname:15844] *** MPI_ERR_TRUNCATE: message truncated
[$Hostname:15844] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort

I use Ubuntu 14.04.3 LTS with openfoam240. Can anyone else confirm this behaviour or even provide a solution?
Thank you very much for your time.

clockworker August 7, 2015 03:03

Hi there,
I think I stumbled upon a solution.
I changed the reactingCloud1Properties from

Code:

massTotal      8;
duration        10000;

to

Code:

massTotal      0.0008;
duration        1;

and the calculation continued without the error messages.
Hope this helps someone.

FerdiFuchs August 10, 2015 09:18

Hmm, this does not really help.

What you changed is the time frame of the injection and the mass that is injected during this time.
The injection starts at SOI and runs for the defined time frame.

If you change these values, you will definitely get results you don't want to have. ;)

Greets,
Ferdi

clockworker August 10, 2015 18:19

3rd try
 
Hi Ferdi,

I was under the impression that you can maintain a constant mass flow rate if you change massTotal proportionally to the duration, according to this
HTML Code:

http://www.dhcae-tools.com/images/dhcaeLTSThermoParcelSolver.pdf
as long as duration is longer than endTime. I stand corrected if this is not the case.
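(For what it's worth, the two settings do give the same nominal rate: massTotal/duration = 8/10000 = 0.0008/1 = 8e-4, presumably kg/s, so the intended injection rate itself should be unchanged.)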
Nonetheless, I was no longer able to reproduce the described behavior at home on 2 cores: the error messages appear no matter what I do with massTotal or duration.

What I tried now was changing the injectionModel from patchInjection to coneNozzleInjection like this:

Code:

injectionModels
{
    model1
    {
        type            coneNozzleInjection;
        SOI             0.01;
        massTotal       8;
        parcelBasisType mass;
        injectionMethod disc;
        flowType        constantVelocity;
        UMag            40;
        outerDiameter   6.5e-3;
        innerDiameter   0;
        duration        10000;
        position        ( 12.5e-3 -230e-3 0 );
        direction       ( 1 0 0 );
        parcelsPerSecond 1e5;
        flowRateProfile constant 1;
        Cd              constant 0.9;
        thetaInner      constant 0.0;
        thetaOuter      constant 1.0;

        sizeDistribution
        {
            type        general;
            generalDistribution
            {
                distribution
                (
                    (10e-06      0.0025)
                    (15e-06      0.0528)
                    (20e-06      0.2795)
                    (25e-06      1.0918)
                    (30e-06      2.3988)
                    (35e-06      4.4227)
                    (40e-06      6.3888)
                    (45e-06      8.6721)
                    (50e-06      10.3153)
                    (55e-06      11.6259)
                    (60e-06      12.0030)
                    (65e-06      10.4175)
                    (70e-06      10.8427)
                    (75e-06      8.0016)
                    (80e-06      6.1333)
                    (85e-06      3.8827)
                    (90e-06      3.4688)
                );
            }
        }
    }
}

And now the error messages disappear. :) I don't know whether coneNozzleInjection is applicable in 2D, but perhaps it provides a workaround; manualInjection might be an alternative as well. I still have to try this on my case at work, which is also 2D.
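(In case somebody wants to try the manualInjection route, this is roughly what such an entry looks like in the tutorials; the positions file name and the size distribution below are only placeholders I have not tested for this case:)
Code:

model1
{
    type            manualInjection;
    massTotal       8;
    parcelBasisType mass;
    SOI             0.01;
    positionsFile   "reactingCloud1Positions";  // list of injection points
    U0              ( 0 0 0 );                  // initial parcel velocity
    sizeDistribution
    {
        type        fixedValue;
        fixedValueDistribution
        {
            value   50e-06;
        }
    }
}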
Thanks Ferdi for taking the time.
Greetings
clockworker

