
Problems running a customized solver in parallel

June 23, 2014, 03:17   #1
Problems running a customized solver in parallel

Rohith (RaghavendraRohith)
Member
Join Date: Oct 2012
Location: Germany
Posts: 57
Hi All,

I have been trying to run a newly developed solver in parallel, and it always aborts with the same message, which seems to mean that the evolution in one of the processes breaks down. I have checked my MPI installation, which is fine and runs some of the standard tutorials in parallel without problems. The decomposeParDict also looks fine; the geometry is a simple square, so it is not complex, and I have tried different decomposition methods for splitting it up. Can somebody clarify where these errors actually come from?

Note: I am trying to run on a normal desktop (4 processes).

Thanks in advance,
Rohith


Code:
 
Courant Number mean: 0 max: 0
Interface Courant Number mean: 0 max: 0
Time = 0.01

MULES: Solving for alpha1
MULES: Solving for alpha1
Liquid phase volume fraction = 0.7475  Min(alpha1) = 0  Min(alpha2) = 0
MULES: Solving for alpha1
MULES: Solving for alpha1
Liquid phase volume fraction = 0.7475  Min(alpha1) = 0  Min(alpha2) = 0
diagonal:  Solving for rho, Initial residual = 0, Final residual = 0, No Iterations 0
diagonal:  Solving for rhoCp, Initial residual = 0, Final residual = 0, No Iterations 0
diagonal:  Solving for rhoHs, Initial residual = 0, Final residual = 0, No Iterations 0
GAMG:  Solving for T, Initial residual = 1, Final residual = 0.0007007297, No Iterations 1
Correcting alpha3, mean residual = 2.9046116e-09, max residual = 0.0010659991
GAMG:  Solving for T, Initial residual = 1.3759125e-05, Final residual = 3.8207349e-08, No Iterations 2
Correcting alpha3, mean residual = 2.233146e-09, max residual = 0.00081956968
[rohith-ESPRIMO-P700:10520] *** An error occurred in MPI_Recv
[rohith-ESPRIMO-P700:10520] *** on communicator MPI_COMM_WORLD
[rohith-ESPRIMO-P700:10520] *** MPI_ERR_TRUNCATE: message truncated
[rohith-ESPRIMO-P700:10520] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 10520 on
node rohith-ESPRIMO-P700 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[rohith-ESPRIMO-P700:10517] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[rohith-ESPRIMO-P700:10517] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
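For reference, the decomposition setup is essentially just a plain 4-way split of the square domain, along the lines of the sketch below (this is only a sketch, not the exact file, and I have also tried other decomposition methods):
Code:
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}

numberOfSubdomains 4;

method          simple;

simpleCoeffs
{
    n           (2 2 1);
    delta       0.001;
}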

June 28, 2014, 14:24   #2

Bruno Santos (wyldckat)
Retired Super Moderator
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,975
Blog Entries: 45
Greetings Rohith,

I've moved your post from the other thread (http://www.cfd-online.com/Forums/ope...ntroldict.html), because it wasn't a similar problem.

The problem you're getting is that something went wrong on the receiving end in one of the processes, because the message was truncated. That usually points either to there not being enough memory available for the data transfer to be performed safely, or possibly to an error in the network connection.

Without more information about the customizations you've made, it's almost impossible to diagnose the problem. All I can say is that at work we had a similar problem some time ago, and the cause was that we weren't keeping local "const &" copies of scalar fields; instead, we always called the method that calculated and returned the whole field, which was... well... bad programming, since it had to recalculate the whole field over the whole mesh just to give us one result for a single cell.
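To illustrate the pattern (the names here are made up for the example, they are not from our actual code or from any particular OpenFOAM class):
Code:
// Sketch only: "transportModel.muEff()" stands in for any member function
// that assembles and returns a whole volScalarField through a tmp<> each
// time it is called.

// Bad: the whole field is rebuilt for every single cell, just to read one value.
forAll(mesh.C(), cellI)
{
    const scalar muI = transportModel.muEff()()[cellI];
    // ... use muI ...
}

// Better: assemble the field once, keep the tmp alive, and read it through a
// const reference to the local copy.
const tmp<volScalarField> tMuEff(transportModel.muEff());
const volScalarField& muEff = tMuEff();

forAll(muEff, cellI)
{
    const scalar muI = muEff[cellI];
    // ... use muI ...
}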

Suggestions:
  • Divide and conquer: break the code into smaller parts and comment out some of them, to try to isolate the problem.
  • Follow the same coding guidelines as OpenFOAM's source code. If you do not, you'll just be asking for trouble...

Best regards,
Bruno

April 25, 2018, 05:55   #3
Same problem in a new solver based on simpleIbFoam

Thamali
Member
Join Date: Jul 2013
Posts: 67
Hi,
I have a similar problem in a solver developed using "simpleIBFoam" in foam-extend-4.0.

This happens only when the following term is added to UEqn.H:

Code:
- fvc::div(mu*dev2GradUTranspose)
where dev2GradUTranspose is:
Code:
volTensorField dev2GradUTranspose = dev2(fvc::grad(U)().T());
and UEqn.H is:

Code:
tmp<fvVectorMatrix> UEqn
(
    fvm::div(phi, U)
  - fvm::laplacian(mu, U)
  - fvc::div(mu*dev2GradUTranspose)
);

UEqn().relax();

solve(UEqn() == -fvc::grad(p));
The error is:

Code:
[b-cn0105:506766] *** An error occurred in MPI_Recv
[b-cn0105:506766] *** reported by process [784990209,0]
[b-cn0105:506766] *** on communicator MPI_COMM_WORLD
[b-cn0105:506766] *** MPI_ERR_TRUNCATE: message truncated
[b-cn0105:506766] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[b-cn0105:506766] ***    and potentially your MPI job)
I can send the case and solver if necessary.
I have been stuck on this for a long time.
Please see whether someone can help!

Thanks in advance.
Thamali

April 30, 2018, 17:44   #4

Bruno Santos (wyldckat)
Retired Super Moderator
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,975
Blog Entries: 45
Quick answer: My guess is that you should not use "dev2" independently from the whole equation. In other words, "dev2GradUTranspose" should not be used like that. You should instead code it directly like this:
Code:
tmp<fvVectorMatrix> UEqn
(
    fvm::div(phi, U)
  - fvm::laplacian(mu, U)
  - fvc::div(mu*dev2(fvc::grad(U)().T()))
);
... wait... no, that's not it... this transposition of "grad()" sounds like trouble... where did you find this piece of source code?
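Side note, from memory, so please double-check against the foam-extend-4.0 sources: dev and dev2 differ only in how much of the trace they remove, and that is usually where this term goes wrong:
Code:
// For any tensor T, with I the identity tensor:
//   dev(T)  = T - (1.0/3.0)*tr(T)*I
//   dev2(T) = T - (2.0/3.0)*tr(T)*I
//
// So mu*dev2(fvc::grad(U)().T()) is the transposed-gradient part of the
// viscous stress including the -(2/3)*mu*div(U)*I bulk term, i.e. the
// compressible form; the incompressible divDevReff() in the stock
// turbulence models uses the dev(...) variant instead.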
