CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   OpenFOAM Programming & Development (https://www.cfd-online.com/Forums/openfoam-programming-development/)
-   -   Problems running a customized solver in parallel (https://www.cfd-online.com/Forums/openfoam-programming-development/138111-problems-running-customized-solver-parallel.html)

RaghavendraRohith June 23, 2014 03:17

Problems running a customized solver in parallel
 
Hi All

I have been trying to run a newly developed solver in parallel, and it always fails with the same message, which seems to mean that one of the processes breaks down. I have checked my MPI installation, which is fine and also runs some of the standard tutorials in parallel without problems. The decomposeParDict looks fine too, and the geometry is very simple (just a square), although I have also tried different decomposition methods for splitting it up. Can somebody clarify where these errors actually come from?
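For reference, a minimal decomposeParDict for a 4-way "simple" decomposition of a box-like domain looks roughly like this (the numbers below are only illustrative, not necessarily my exact settings):

Code:

numberOfSubdomains 4;

method          simple;

simpleCoeffs
{
    n           (2 2 1);   // 2 x 2 x 1 split of the domain across 4 processors
    delta       0.001;
}

distributed     no;
roots           ();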


Note: I am trying to run it on a normal desktop with 4 parallel processes.

Thanks in Advance
Rohith


Code:


Courant Number mean: 0 max: 0
Interface Courant Number mean: 0 max: 0
Time = 0.01

MULES: Solving for alpha1
MULES: Solving for alpha1
Liquid phase volume fraction = 0.7475  Min(alpha1) = 0  Min(alpha2) = 0
MULES: Solving for alpha1
MULES: Solving for alpha1
Liquid phase volume fraction = 0.7475  Min(alpha1) = 0  Min(alpha2) = 0
diagonal:  Solving for rho, Initial residual = 0, Final residual = 0, No Iterations 0
diagonal:  Solving for rhoCp, Initial residual = 0, Final residual = 0, No Iterations 0
diagonal:  Solving for rhoHs, Initial residual = 0, Final residual = 0, No Iterations 0
GAMG:  Solving for T, Initial residual = 1, Final residual = 0.0007007297, No Iterations 1
Correcting alpha3, mean residual = 2.9046116e-09, max residual = 0.0010659991
GAMG:  Solving for T, Initial residual = 1.3759125e-05, Final residual = 3.8207349e-08, No Iterations 2
Correcting alpha3, mean residual = 2.233146e-09, max residual = 0.00081956968
[rohith-ESPRIMO-P700:10520] *** An error occurred in MPI_Recv
[rohith-ESPRIMO-P700:10520] *** on communicator MPI_COMM_WORLD
[rohith-ESPRIMO-P700:10520] *** MPI_ERR_TRUNCATE: message truncated
[rohith-ESPRIMO-P700:10520] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 10520 on
node rohith-ESPRIMO-P700 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[rohith-ESPRIMO-P700:10517] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[rohith-ESPRIMO-P700:10517] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages


wyldckat June 28, 2014 14:24

Greetings Rohith,

I've moved your post from the other thread: http://www.cfd-online.com/Forums/ope...ntroldict.html - because it wasn't a similar problem :(

The error you're getting means that something went wrong on the receiving end in one of the processes, because the message was truncated. That usually points to either not enough memory being available for the data transfer to be performed safely, or possibly an error in the network connection.

Without more information about the customizations you've made, it's almost impossible to diagnose the problem.
All I can say is that at work we had a similar problem some time ago, and the cause was that we weren't using "const &" variables to keep a local copy of scalar fields; instead, we always called the method that calculated and returned the whole field, which was... well... bad programming, since it had to recalculate the whole field for the entire mesh just to give us one result for a single cell :rolleyes:
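Roughly speaking, the difference was something like this (the names "model" and "kappaField()" are made up purely for illustration):

Code:

// Wasteful: kappaField() recomputes and returns the whole field,
// and here it is called once per cell
forAll(mesh.C(), cellI)
{
    const scalar k = model.kappaField()[cellI];
    // ... use k ...
}

// Better: evaluate the field once, keep a local const copy
// (or a "const &" to a field that already exists), and index into it
const volScalarField kappa(model.kappaField());
forAll(mesh.C(), cellI)
{
    const scalar k = kappa[cellI];
    // ... use k ...
}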

Suggestions:
  • Divide-and-conquer: Break down the code into smaller parts and comment out some of those parts of code, to attempt to isolate the problem.
  • Follow the same coding guidelines as OpenFOAM's source code. If you do not do so, you'll just be asking to get into trouble ;)...

Best regards,
Bruno

Thamali April 25, 2018 05:55

Same problem in a new solver based on simpleIbFoam
 
Hi,
I have a similar problem in a solver developed using "simpleIBFoam" in foam-extend-4.0.

This happens only when the following term is added to UEqn.H:

Code:

-fvc::div(mu*dev2GradUTranspose)

where dev2GradUTranspose is:

Code:

volTensorField dev2GradUTranspose = dev2(fvc::grad(U)().T());

and UEqn.H is:

Code:

tmp<fvVectorMatrix> UEqn
(
    fvm::div(phi, U)
  - fvm::laplacian(mu, U)
  - fvc::div(mu*dev2GradUTranspose)
);

UEqn().relax();
solve(UEqn() == -fvc::grad(p));

The error is:

Code:

[b-cn0105:506766] *** An error occurred in MPI_Recv
[b-cn0105:506766] *** reported by process [784990209,0]
[b-cn0105:506766] *** on communicator MPI_COMM_WORLD
[b-cn0105:506766] *** MPI_ERR_TRUNCATE: message truncated
[b-cn0105:506766] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[b-cn0105:506766] ***    and potentially your MPI job)

I can send the case and solver if necessary.
I've been stuck on this for a long time.
Please see whether someone can help!

Thanks in advance.
Thamali

wyldckat April 30, 2018 17:44

Quick answer: My guess is that you should not use "dev2" independently of the whole equation. In other words, "dev2GradUTranspose" should not be used like that. You should instead code it directly, like this:
Code:

tmp<fvVectorMatrix> UEqn
(
    fvm::div(phi, U)
  - fvm::laplacian(mu, U)
  - fvc::div(mu*dev2(fvc::grad(U)().T()))
);

... wait... no, that's not it... this transposition of "grad()" sounds like trouble... where did you find this piece of source code?
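For comparison, the stock incompressible turbulence models in OpenFOAM/foam-extend usually keep this whole effective-stress contribution inside the model's divDevReff() method, roughly along these lines (a sketch from memory, written with nuEff() rather than your mu, so treat it only as a reference for the structure, not as a drop-in replacement):

Code:

// Sketch: how the stock incompressible models typically assemble the
// diffusive + deviatoric stress term in one place
tmp<fvVectorMatrix> divDevReff(volVectorField& U) const
{
    return
    (
      - fvm::laplacian(nuEff(), U)
      - fvc::div(nuEff()*dev(fvc::grad(U)().T()))
    );
}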

