CFD Online Discussion Forums
-   OpenFOAM
-   -   Open MPI-fork() error (http://www.cfd-online.com/Forums/openfoam/103931-open-mpi-fork-error.html)

zxj160 June 29, 2012 06:38

Open MPI-fork() error
 
Hi,

I ran a parallel case from 0 s to 20.8 s, but at 20.8 s there is the following error. Does anyone know how to solve it?

Time = 20.8
Courant Number mean: 0.208474 max: 0.316474
DILUPBiCG: Solving for Ux, Initial residual = 0.000518377, Final residual = 1.4007e-06, No Iterations 1
DILUPBiCG: Solving for Uy, Initial residual = 0.0138311, Final residual = 4.40116e-06, No Iterations 2
DILUPBiCG: Solving for Uz, Initial residual = 0.0111841, Final residual = 3.21597e-06, No Iterations 2
DILUPBiCG: Solving for C, Initial residual = 0.000130826, Final residual = 5.54248e-08, No Iterations 1
[1] #0 Foam::error::printStack(Foam::Ostream&)
[5] #0 Foam::error::printStack(Foam::Ostream&)
[3] #0 Foam::error::printStack(Foam::Ostream&)
[7] #0 Foam::error::printStack(Foam::Ostream&)
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.
The process that invoked fork was:
Local host: u2n126 (PID 19527)
MPI_COMM_WORLD rank: 1
If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.

wyldckat June 30, 2012 07:37

Greetings zxj160,

Not much information to go on to diagnose the issue.

According to the error message, it looks like you're trying to launch another application from within the solver during parallel execution.

I've searched online for the last message line and picked up on this:
Quote:

Originally Posted by http://webstokes.ist.ucf.edu/forum/viewtopic.php?f=10&t=101#p247

Change the mpirun line from:
Code:

mpirun -machinefile $PBS_NODEFILE -np $NP $EXECUTABLE
to:
Code:

mpirun --mca mpi_warn_on_fork 0 -machinefile $PBS_NODEFILE -np $NP $EXECUTABLE
Please let me know if this works for you.
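If I remember correctly, the same MCA parameter can also be set through an environment variable before launching (this is a generic Open MPI mechanism, not anything OpenFOAM-specific), for example:
Code:

export OMPI_MCA_mpi_warn_on_fork=0
mpirun -machinefile $PBS_NODEFILE -np $NP $EXECUTABLE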

Best regards,
Bruno

zxj160 June 30, 2012 12:42

Quote:

Originally Posted by wyldckat (Post 368993)
Greetings zxj160,

Not much information to go on to diagnose the issue.

According to the error message, it looks like you're trying to launch another application from within the solver during parallel execution.

I've searched online for the last message line and picked up on this:


Best regards,
Bruno

Hi, many thanks for your reply. It may depend on the values I set for some constant scalars. I changed the value and now it can run for a longer time.

By the way, I find that cyclic inlet and outlet patches cannot accept any other type of BC (e.g. zeroGradient); only cyclic is allowed for all the variables. My velocity at the inlet and outlet is cyclic, but I want to set the passive scalar to fixedValue (0) at the inlet and zeroGradient at the outlet. Do you have any idea about this problem? Many thanks.

wyldckat June 30, 2012 17:49

Mmm... I vaguely remember that OpenFOAM calls an external application when it wants to do a printStack (i.e., when it tries to do a controlled crash and show how it got to where it crashed)... so that's why it wants to fork()...

Have you tried one of the "directMapped*" BCs instead of "cyclic"?

zxj160 July 1, 2012 06:13

Quote:

Originally Posted by wyldckat (Post 369044)
Mmm... I vaguely remember that OpenFOAM calls an external application when it wants to do a printStack (i.e., when it tries to do a controlled crash and show how it got to where it crashed)... so that's why it wants to fork()...

Have you tried one of the "directMapped*" BCs instead of "cyclic"?

I have not tried the 'directMapped' BCs, but I have heard about them. However, I define a cyclic BC for the inlet and outlet in blockMeshDict, and I do not know whether cyclic patches accept 'directMapped' BCs. If I used 'directMapped' BCs, would the inlet and outlet still be cyclic, or would they be whatever I mapped?

I know of a BC derived from cyclic, fan, which is used like this:

ad
{
    type       fan;
    patchType  cyclic;
    f          List<scalar> 2(10.0 -1.0);
    value      uniform 0;
}

I want to set:

inlet
{
    type       fixedValue;
    patchType  cyclic;
    value      uniform 0;
}

But I remember someone said that a cyclic patch can only accept cyclic and BCs derived from it (e.g. the fan BC). I do not know whether my idea for the inlet is correct or not.

zxj160 July 1, 2012 06:14

Quote:

Originally Posted by zxj160 (Post 369073)
I have not tried the 'directMapped' BCs, but I have heard about them. However, I define a cyclic BC for the inlet and outlet in blockMeshDict, and I do not know whether cyclic patches accept 'directMapped' BCs. If I used 'directMapped' BCs, would the inlet and outlet still be cyclic, or would they be whatever I mapped?

I know of a BC derived from cyclic, fan, which is used like this:

ad
{
    type       fan;
    patchType  cyclic;
    f          List<scalar> 2(10.0 -1.0);
    value      uniform 0;
}

I want to set:

inlet
{
    type       fixedValue;
    patchType  cyclic;
    value      uniform 0;
}

But I remember someone said that a cyclic patch can only accept cyclic and BCs derived from it (e.g. the fan BC). I do not know whether my idea for the inlet is correct or not.

I also want to set:

outlet
{
    type       zeroGradient;
    patchType  cyclic;
}

wyldckat July 1, 2012 15:27

Hi zxj160,

Unfortunately I don't know.
All I know is that the cyclic boundary condition is conceptually similar to the symmetry boundary condition, in the sense that it does everything on its own. directMapped samples the result from one end and places it at the other, and the particular sampling location has to be defined in the "polyMesh/boundary" file.

And from the example you gave of the cyclic fan, it looks like you'll have to code your own BC derived from the cyclic one. This is in case the directMapped one, or the ones derived from it, don't do what you want!
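If the directMapped route does do what you need, then a rough sketch of what the passive scalar field could look like (assuming the inlet and outlet were redefined in "blockMeshDict" as a directMapped pair instead of cyclic, as in the pitzDailyDirectMapped tutorial; this is only an untested illustration) would be:
Code:

// 0/C boundaryField -- hypothetical sketch, not a tested case
boundaryField
{
    inlet
    {
        type            fixedValue;
        value           uniform 0;
    }
    outlet
    {
        type            zeroGradient;
    }
}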

Best regards,
Bruno

zxj160 July 4, 2012 10:39

Quote:

Originally Posted by wyldckat (Post 369148)
Hi zxj160,

Unfortunately I don't know.
All I know is that the cyclic boundary condition is conceptually similar to the symmetry boundary condition, in the sense that it does everything on its own. directMapped samples the result from one end and places it at the other, and the particular sampling location has to be defined in the "polyMesh/boundary" file.

And from the example you gave of the cyclic fan, it looks like you'll have to code your own BC derived from the cyclic one. This is in case the directMapped one, or the ones derived from it, don't do what you want!

Best regards,
Bruno

Dear Bruno,

I am trying to use directMapped, but I am new to it and do not know how to use it. Could you explain the meaning of each keyword? The following comes from the pisoFoam/pitzDailyDirectMapped case in the tutorials.

blockMeshDict
inlet
{
    type          directMappedPatch;
    offset        ( 0.0495 0 0 );
    sampleRegion  region0;
    sampleMode    nearestCell;
    samplePatch   none;
}

0/U
boundaryField
{
    inlet
    {
        type                 directMapped;
        value                uniform (10 0 0);
        interpolationScheme  cell;
        setAverage           true;
        average              (10 0 0);
    }
}

The distance between my inlet and outlet is 30 m.

Many thanks,
Jian

wyldckat July 4, 2012 18:23

Hi Jian,

Most of the parameters here are self-explanatory. When in doubt about the other options for most of those parameters: use the "banana" trick, i.e. put a nonsense word like banana as the value and OpenFOAM will abort and list the valid options.

As for "offset", it's sort-of simple: it indicates the relative position of the other patch to look at for cell data.
  1. Imagine the layer of cells next to the outlet patch.
  2. Visualize the location of the plane that intersects the centres of the cells in that layer, or at least a plane that goes through all of the relevant cells...
  3. The offset will be the relative location between the inlet patch (the origin of this reference frame) and that plane near the outlet.
Conceptually, it would make a lot more sense to simply state the name of the other patch and get information directly from it. Unfortunately, AFAIK OpenFOAM's infrastructure doesn't give that much freedom, so it requires this trick of referencing the cells near the other patch.
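For your 30 m case, and only as a rough sketch (I'm assuming the flow direction is along x and that the inlet sits 30 m upstream of the outlet; the numbers are placeholders you'll have to adjust to your actual mesh), the patch definition could look something like this:
Code:

// hypothetical sketch -- adjust the direction and subtract roughly half
// the width of the cell layer next to the outlet
inlet
{
    type            directMappedPatch;
    offset          (29.95 0 0);
    sampleRegion    region0;
    sampleMode      nearestCell;
    samplePatch     none;
}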

Best regards,
Bruno

Bernhard January 17, 2013 03:47

I am experiencing the same kind of error with OpenFOAM 2.1.1 while using the mapped boundary condition. Do any of you know if there are updates on this MPI-fork() issue?

Is it safe to run with mpi_warn_on_fork switched off?

wyldckat January 17, 2013 10:08

Greetings Bernhard,

I suppose it's somewhat safe to turn off that warning, even if it's just for one test run.

But the problem might be larger, since if it's triggering that warning, that's very likely because your case is crashing. As I wrote above, the printStack method launches an application that helps diagnose where the crash occurred.

As for the direct mapped problem, it would be good to know if you're able to reproduce the same error with a simple case or a modified tutorial case!

By the way, knowing which MPI version you're using could also help! And if you're using GridEngine, do not use Open-MPI 1.5.3 that comes with OpenFOAM. Either downgrade to Open-MPI 1.4.x or upgrade to 1.6.x.

Best regards,
Bruno

Bernhard January 17, 2013 17:27

I am quite convinced that it is not my case that is causing the error, and I will try to construct a minimal case that reproduces it. If I replace mapped with fixedValue there are no issues, by the way.

I am using PBS and OpenMPI 1.4.4. Are there any known issues with this set-up?

Bernhard January 18, 2013 18:11

OK, so I re-set up my case. I did not do anything special or different from the earlier situation, but for some reason I no longer encounter this issue. Wyldckat, I think I have to agree with you on the printStack, but I am still a bit puzzled...

wyldckat January 18, 2013 18:50

Hi Bernhard,

I can't remember any problems with PBS and Open-MPI 1.4.4, so my guess is that the problem is related to a crash.

As for things now working as intended, there are a few possible scenarios:
  1. The previous "constant/polyMesh/boundary" file might have been somewhat damaged. Re-doing the case set-up might have cleared up things.
  2. Minor mesh changes or domain decomposition may affect how things are working. In particular, you might be triggering a bug related to "directMapped" that only occurs when decompositions occur in a certain way.
  3. Cluster occupancy or hardware stability might affect how OpenFOAM is operating.
Either way, without a test case, it's very hard to diagnose the problem :(

Best regards,
Bruno

Bernhard January 19, 2013 15:39

Hi Bruno.

1. I did not change the boundary file for the new setup.
2. I tried quite a few decompositions, so I can rule that out.
3. On the cluster I've used here, I have the nodes available to myself, the system is maintained by a bunch of professionals, and I have never had any hardware stability issues.

Now that I am rechecking things, I see that in the failed case I have both a .gz and an uncompressed file for some of the files. I don't know which were read, but probably not the intended ones for some variables. I have been looking for the error in the wrong spot, as this has to be it. OpenFOAM should read the files according to the settings in controlDict, but I think it does not do so.

wyldckat January 19, 2013 15:50

Hi Bernhard,

Quote:

Originally Posted by Bernhard (Post 402882)
OpenFOAM should read the files according to the settings in controlDict, but I think it does not do so.

Honestly, I don't know what OpenFOAM's reading priority is, but I do know that it has to ignore "controlDict" when reading (for compressed vs. uncompressed only), because you might want to switch between compressed and uncompressed between time steps, so the previous state still has to be readable!

But that's one interesting detail that I'll try to keep in mind: always check if there are duplicate files inside the "constant" and time folders!!
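For example, a quick check from the case directory could look something like this (just a sketch, adapt as needed):
Code:

# list every file that exists both compressed and uncompressed
find . -name "*.gz" | while read f; do
    [ -e "${f%.gz}" ] && echo "duplicate: ${f%.gz}"
done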

Best regards,
Bruno

smraniaki December 18, 2014 16:58

I've had the same problem. This is one of those problems that drive me crazy! Once I could, somewhat inexplicably, solve the problem by modifying the fvSolution dictionary (changing a smoother). Another time I could get rid of it by changing my decomposition scheme. I still don't know how and why it happens, but apparently it comes from ghost cells that are not identifiable by MPI.

Good luck
Smran

linch February 16, 2016 09:13

Is there any update to this issue?

wyldckat February 21, 2016 14:03

Quote:

Originally Posted by linch (Post 585427)
Is there any update to this issue?

Quick answer: No one ever managed to give me more details on how to reproduce this error, therefore I didn't manage to find a solution for this :(.

If you can provide me with more details, then it'll be easier to diagnose and solve the problem.
If not, then try the information provided here: Notes about running OpenFOAM in parallel - especially this information:
Quote:

Is the output from mpirun (Open-MPI) only coming out at the end of the run? Check this post: mpirun openfoam output is buffered, only output at the end post #9

linch February 22, 2016 05:04

Thanks a lot, Bruno.

I regularly received the error using OF 2.1.x on our local computational cluster. But since it was moved to a new OS this weekend, I'll first have to test whether the problem still persists.

