CFD Online Discussion Forums

CFD Online Discussion Forums (http://www.cfd-online.com/Forums/)
-   OpenFOAM Running, Solving & CFD (http://www.cfd-online.com/Forums/openfoam-solving/)
-   -   Finite area method (fac::div) fails in parallel (http://www.cfd-online.com/Forums/openfoam-solving/108454-finite-area-method-fac-div-fails-parallel.html)

cuba October 24, 2012 08:01

Finite area method (fac::div) fails in parallel
 
Hi everyone,
I am a new foam user, working on a modified version of pimpleDyMFoam with a k-omega model in 1.6-ext.
The problem is that the code works fine in serial runs but stops working in parallel when it reaches line 3:

Code:

1- var0.boundaryField()[PAtchID] = U.boundaryField()[PAtchID];
2- var1.internalField() = vsm.mapToSurface(var0.boundaryField());
3- areaScalarField var2 = - fac::div(var1);

The lines above are only executed when a physical condition holds. If the condition is false, the program skips them and then also runs fine in parallel.

The error message in parallel (obtained with
export FOAM_ABORT=1
mpirun --mca orte_base_help_aggregate 0 -d -np 4 pimpleDyMFoam -parallel > log) is given below:


Code:

[n-62-24-13:13854] *** An error occurred in MPI_Recv
[n-62-24-13:13854] *** on communicator MPI_COMM_WORLD
[n-62-24-13:13854] *** MPI_ERR_TRUNCATE: message truncated
[n-62-24-13:13854] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[n-62-24-13:13854] sess_dir_finalize: proc session dir not empty - leaving
[n-62-24-13:13853] sess_dir_finalize: proc session dir not empty - leaving
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 13854 on
node n-62-24-13 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[n-62-24-13:13853] sess_dir_finalize: job session dir not empty - leaving
[n-62-24-13:13853] sess_dir_finalize: proc session dir not empty - leaving
orterun: exiting with status 15


In the meantime, I have tried different div schemes (in the faSchemes file), changing the line

Code:

default                Gauss linear;
to other Gauss interpolations, but I could not get rid of the error.

What might be the problem? How can I debug further and find the error?
Any comments or advice?
Thanks in advance

kmooney October 25, 2012 11:25

Have you tried compiling and running in debug? I've had pretty good luck with mpirunDebug when it comes to parallel debugging.

I use the finite area library in parallel but unfortunately I do not use that operator.

cuba October 25, 2012 11:32

Thanks for the reply
I thought that adding "-d" to the mpirun command would be enough to debug, but apparently it did not help at all.
Should I first compile the program so that it can be debugged in parallel? If so, how do I do that?

kmooney October 25, 2012 11:46

Quote:

Originally Posted by cuba (Post 388513)
Thanks for the reply
I thought that adding "-d" to the mpirun command would be enough to debug, but apparently it did not help at all.
Should I first compile the program so that it can be debugged in parallel? If so, how do I do that?


Yep, recompile with the debug compiler option set. Set this option by running this in the shell:
Code:

export WM_COMPILE_OPTION=Debug
then run wclean and wmake to recompile. As you recompile, you should see a flag in the compiler output that looks like -DFULLDEBUG or something similar.
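Put together, the sequence is roughly the following (a sketch; the re-sourcing step is from memory, so check it against your install):

```shell
# Rebuild an OpenFOAM 1.6-ext solver with debug symbols (sketch).
# Run from the solver's source directory.
export WM_COMPILE_OPTION=Debug

# You may also need to re-source the OpenFOAM environment so that
# WM_OPTIONS picks up the Debug suffix (otherwise wmake can keep
# writing to the old ...DPOpt directory):
# . $WM_PROJECT_DIR/etc/bashrc

wclean    # drop the objects built with the old (Opt) settings
wmake     # rebuild; look for -DFULLDEBUG in the g++ lines
```

If the binary path still ends in Opt after this, the environment most likely did not pick up the new option.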

You might need to install mpirunDebug from your linux software repository. I don't think it comes with the standard MPI package.

cuba October 25, 2012 11:58

Thanks for the replies, I will be working on that

cuba October 29, 2012 10:09

I have recompiled my code after first entering "export WM_COMPILE_OPTION=Debug" in the terminal. It gave me the message below.

--------------------------
g++ -m64 -Dlinux64 -DWM_DP -Wall -Wextra -Wno-unused-parameter -Wold-style-cast -Wnon-virtual-dtor -O0 -fdefault-inline -ggdb3 -DFULLDEBUG -DNoRepository -ftemplate-depth-40 -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/dynamicMesh/dynamicFvMesh/lnInclude -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/dynamicMesh/dynamicMesh/lnInclude -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/meshTools/lnInclude -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/turbulenceModels -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/transportModels -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/transportModels/incompressible/singlePhaseTransportModel -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/finiteArea/lnInclude -DFACE_DECOMP -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/tetDecompositionFiniteElement/lnInclude -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/tetDecompositionMotionSolver/lnInclude -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/finiteVolume/lnInclude -IlnInclude -I. -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/OpenFOAM/lnInclude -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/OSspecific/POSIX/lnInclude -fPIC -Xlinker --add-needed Make/linux64GccDPOpt/pimpleDyMFoam.o -L/appl/OpenFOAM/OpenFOAM-1.6-ext/lib/linux64GccDPOpt \
-ldynamicFvMesh -ltopoChangerFvMesh -ldynamicMesh -lmeshTools -lincompressibleTransportModels -lincompressibleTurbulenceModel -lincompressibleRASModels -lincompressibleLESModels -lfiniteArea -lfiniteVolume -llduSolvers -lOpenFOAM -liberty -ldl -ggdb3 -DFULLDEBUG -lm -o /zhome/83/d/74221/OpenFOAM/cuba-1.6-ext/applications/bin/linux64GccDPOpt/pimpleDyMFoam
-------------------------------

Then I decomposed my domain and ran

mpirunDebug -np 4 pimpleDyMFoam -parallel

and then I selected

Choose running method: 0)normal 1)gdb+xterm 2)gdb 3)log 4)log+xterm 5)xterm+valgrind 6)nemiver: 1
Run all processes local or distributed? 1)local 2)remote: 2


It produced gdbCommands, mpirun.schema, processor0.sh, processor1.sh, processor2.sh and processor3.sh files.

How can I start the runs for each processor?

If I just run processor0.sh in the terminal, I get the following in the processor0.log file and in the terminal.


-----------------------
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /zhome/83/d/74221/OpenFOAM/cuba-1.6-ext/applications/bin/linux64GccDPOpt/pimpleDyMFoam...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Detaching after fork from child process 22079.
[New Thread 0x7fffee345700 (LWP 22083)]
[Thread 0x7fffee345700 (LWP 22083) exited]

--> FOAM FATAL ERROR:
bool Pstream::init(int& argc, char**& argv) : attempt to run parallel on 1 processor

From function Pstream::init(int& argc, char**& argv)
in file Pstream.C at line 74.

FOAM aborting

Program received signal SIGABRT, Aborted.
0x00000030f8232885 in raise () from /lib64/libc.so.6
#0 0x00000030f8232885 in raise () from /lib64/libc.so.6
#1 0x00000030f8234065 in abort () from /lib64/libc.so.6
#2 0x00007ffff448d28b in Foam::error::abort() () from /appl/OpenFOAM/OpenFOAM-1.6-ext/lib/linux64GccDPOpt/libOpenFOAM.so
#3 0x00007ffff3b781da in Foam::Pstream::init(int&, char**&) () from /appl/OpenFOAM/OpenFOAM-1.6-ext/lib/linux64GccDPOpt/openmpi-system/libPstream.so
#4 0x00007ffff449a655 in Foam::argList::argList(int&, char**&, bool, bool) () from /appl/OpenFOAM/OpenFOAM-1.6-ext/lib/linux64GccDPOpt/libOpenFOAM.so
#5 0x00000000004252b3 in main ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64 libgcc-4.4.5-6.el6.x86_64 libibverbs-1.1.4-2.el6.x86_64 librdmacm-1.0.10-2.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
(gdb)
--------------------------------



Could anyone give me more information on how to use mpirunDebug?

cuba November 1, 2012 04:40

Hi everyone,

I have not made progress on the mpirun debugging yet, but I have another question.

How can I synchronize the processors before this line is evaluated?

Code:

3- areaScalarField var2 = - fac::div(var1);

cuba November 8, 2012 11:14

Does anyone know how to make the processors wait for each other before evaluating a piece of code like the one above?

wyldckat November 10, 2012 05:36

Greetings Cuba,

In reply to your last post and PM you sent me:
  • An old thread on this subject: http://www.cfd-online.com/Forums/ope...-openfoam.html
  • Simpler code can be found in "applications/test/parallel/Test-parallel.C", more specifically the part that starts at:
    Code:

    Perr<< "\nStarting transfers\n" << endl;
    In that example, the slaves can be stalled with the block of code that starts with:
    Code:

    Perr<< "slave receiving from master "
    While the master can hold them back until the "for" loop that has this line is executed:
    Code:

    Perr << "master sending to slave " << slave << endl;
    The only problem with this is that the master will be the last one back on the job, since it has to first communicate with all other slaves, telling them to get back to work :rolleyes:...
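Put together, a minimal hand-rolled "hold the slaves back" helper in that style would look something like this (a sketch based on Test-parallel.C; I haven't compiled this exact snippet, so treat the details as assumptions):

```cpp
// Sketch: master/slave synchronization with blocking Pstream transfers,
// modelled on applications/test/parallel/Test-parallel.C (1.6-ext).
#include "Pstream.H"
#include "OPstream.H"
#include "IPstream.H"

using namespace Foam;

void waitForMaster()
{
    if (!Pstream::parRun())
    {
        return;   // nothing to do in a serial run
    }

    if (Pstream::master())
    {
        // The master releases each slave in turn, so it is the last
        // process to move on past this point.
        for
        (
            int slave = Pstream::firstSlave();
            slave <= Pstream::lastSlave();
            slave++
        )
        {
            OPstream toSlave(Pstream::blocking, slave);
            toSlave << slave;   // the payload itself is irrelevant
        }
    }
    else
    {
        // Each slave blocks here until the master gets around to it.
        IPstream fromMaster(Pstream::blocking, Pstream::masterNo());
        label dummy;
        fromMaster >> dummy;
    }
}
```

Note that this only makes the slaves wait for the master, not the other way around; for a full barrier you would need a second phase in the opposite direction (or simply a global reduce, which is itself synchronizing).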
Best regards,
Bruno

cuba November 15, 2012 08:03

Thanks for the reply wyldckat
It helped me get a better understanding of the Pstream commands.

I finally solved (or at least worked around) my problem.
In brief, the problem was that one of the subdomains was entering the routine while the others were not, because the condition for entering the routine did not hold for them.
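In sketch form, the workaround amounts to making all subdomains take the same branch before the communication happens (simplified, not my exact code; the variable names are made up):

```cpp
// Unsafe: fac::div exchanges data across processor boundaries, so if the
// condition holds on only some subdomains, those ranks block inside the
// communication while the other ranks never participate:
//
//     if (localCondition)
//     {
//         areaScalarField var2 = -fac::div(var1);
//     }

// Workaround: agree on the branch globally before taking it.
bool enterRoutine = localCondition;    // may differ per subdomain
reduce(enterRoutine, orOp<bool>());    // true if true on any rank
if (enterRoutine)                      // now identical on all ranks
{
    areaScalarField var2 = -fac::div(var1);
}
```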

In the meantime,
to find the maximum value of a variable defined on a patch over all the subdomains, I have tried both the gMax(var) and the max(reduce(var, maxOp<scalarField>())) commands. But the value found by gMax was not the maximum and was actually smaller than the one found by the reduce command. Has anyone noticed such a thing before?

Thanks again for the replies

wyldckat November 20, 2012 08:03

Hi Cuba,

Quote:

Originally Posted by cuba (Post 392250)
In the meantime,
to find the maximum value of a variable defined on a patch over all the subdomains, I have tried both the gMax(var) and the max(reduce(var, maxOp<scalarField>())) commands. But the value found by gMax was not the maximum and was actually smaller than the one found by the reduce command. Has anyone noticed such a thing before?

I haven't had the time to check on this yet... it sort of looks like a bug, but without a test case that replicates the issue, it's hard for me to check anything myself.
By the way, which of the two reported maximum values was actually correct?
It's also possible that one of them was picking up an outdated value!
For example, one of them might pick up a value that was communicated between processes at the beginning of the iterations, while the maximum was only calculated after those iterations; since the value sits on a processor boundary, it would no longer be up to date...
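For reference, as far as I remember gMax is essentially the following idiom (a sketch from memory, so the 1.6-ext source may differ in the details):

```cpp
// Roughly what gMax(var) does for a scalarField (sketch, from memory):
scalar localMax = max(var);          // beware: on a rank where the patch
                                     // is empty this is not meaningful
reduce(localMax, maxOp<scalar>());   // global maximum over all ranks

// By contrast, reduce(var, maxOp<scalarField>()) takes an element-wise
// maximum of the whole field across ranks, which assumes the field has
// the same size on every processor; on a decomposed patch it usually
// does not, so comparing the two results needs care.
```

So an empty (or differently sized) patch on one of the subdomains could explain a discrepancy between the two, independently of any bug.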

Best regards,
Bruno

