Finite area method (fac::div) fails in parallel
I am a new foam user and working on a modified version of pimpleDyMFoam utilized with a k-omega model in 1.6-ext.
The problem is the code works fine for serial runs but stops working in parallel when it comes to line-3:
The error message in parallel (where
mpirun --mca orte_base_help_aggregate 0 -d -np 4 pimpleDyMFoam -parallel > log ) is given below:
In the meantime, I have tried different div schemes (in faSchemes file) changing the line
What might be the problem? How can I debug more and find the error?
Any comments and advices ?
Thanks in advance
Have you tried compiling and running in debug? I've had pretty good luck with mpirunDebug when it comes to parallel debugging.
I use the finite area library in parallel but unfortunately I do not use that operator.
Thanks for the reply
I thought that adding "-d" in the mpirun would be enough to debug, but apparently did help at all.
Should I compile the program first in a way that it can be debugged in parallel ? and how can I do that?
Yep, recompile with the debug compiler option set. Set this option by running this in the shell:
You might need to install mpirunDebug from your linux software repository. I don't think it comes with the standard MPI package.
Thanks for the replies, I will be working on that
I have re-compiled my code entering the "export WM_COMPILE_OPTION=Debug" to the terminal first. It gave me the below given message.
g++ -m64 -Dlinux64 -DWM_DP -Wall -Wextra -Wno-unused-parameter -Wold-style-cast -Wnon-virtual-dtor -O0 -fdefault-inline -ggdb3 -DFULLDEBUG -DNoRepository -ftemplate-depth-40 -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/dynamicMesh/dynamicFvMesh/lnInclude -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/dynamicMesh/dynamicMesh/lnInclude -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/meshTools/lnInclude -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/turbulenceModels -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/transportModels -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/transportModels/incompressible/singlePhaseTransportModel -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/finiteArea/lnInclude -DFACE_DECOMP -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/tetDecompositionFiniteElement/lnInclude -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/tetDecompositionMotionSolver/lnInclude -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/finiteVolume/lnInclude -IlnInclude -I. -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/OpenFOAM/lnInclude -I/appl/OpenFOAM/OpenFOAM-1.6-ext/src/OSspecific/POSIX/lnInclude -fPIC -Xlinker --add-needed Make/linux64GccDPOpt/pimpleDyMFoam.o -L/appl/OpenFOAM/OpenFOAM-1.6-ext/lib/linux64GccDPOpt \
-ldynamicFvMesh -ltopoChangerFvMesh -ldynamicMesh -lmeshTools -lincompressibleTransportModels -lincompressibleTurbulenceModel -lincompressibleRASModels -lincompressibleLESModels -lfiniteArea -lfiniteVolume -llduSolvers -lOpenFOAM -liberty -ldl -ggdb3 -DFULLDEBUG -lm -o /zhome/83/d/74221/OpenFOAM/cuba-1.6-ext/applications/bin/linux64GccDPOpt/pimpleDyMFoam
later I decomposed my domain and entered
mpirunDebug -np 4 pimpleDyMFoam -parallel
later I have selected
Choose running method: 0)normal 1)gdb+xterm 2)gdb 3)log 4)log+xterm 5)xterm+valgrind 6)nemiver: 1
Run all processes local or distributed? 1)local 2)remote: 2
It produced gdbCommands, mpirun.schema, processor0.sh, processor1.sh, processor2.sh and processor3.sh files.
How can I start the runs per each processor?
If I just type processor0.sh to the terminal, I get the following in the processor0.log file and terminal.
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
Reading symbols from /zhome/83/d/74221/OpenFOAM/cuba-1.6-ext/applications/bin/linux64GccDPOpt/pimpleDyMFoam...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Detaching after fork from child process 22079.
[New Thread 0x7fffee345700 (LWP 22083)]
[Thread 0x7fffee345700 (LWP 22083) exited]
--> FOAM FATAL ERROR:
bool Pstream::init(int& argc, char**& argv) : attempt to run parallel on 1 processor
From function Pstream::init(int& argc, char**& argv)
in file Pstream.C at line 74.
Program received signal SIGABRT, Aborted.
0x00000030f8232885 in raise () from /lib64/libc.so.6
#0 0x00000030f8232885 in raise () from /lib64/libc.so.6
#1 0x00000030f8234065 in abort () from /lib64/libc.so.6
#2 0x00007ffff448d28b in Foam::error::abort() () from /appl/OpenFOAM/OpenFOAM-1.6-ext/lib/linux64GccDPOpt/libOpenFOAM.so
#3 0x00007ffff3b781da in Foam::Pstream::init(int&, char**&) () from /appl/OpenFOAM/OpenFOAM-1.6-ext/lib/linux64GccDPOpt/openmpi-system/libPstream.so
#4 0x00007ffff449a655 in Foam::argList::argList(int&, char**&, bool, bool) () from /appl/OpenFOAM/OpenFOAM-1.6-ext/lib/linux64GccDPOpt/libOpenFOAM.so
#5 0x00000000004252b3 in main ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64 libgcc-4.4.5-6.el6.x86_64 libibverbs-1.1.4-2.el6.x86_64 librdmacm-1.0.10-2.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
Could anyone give me more information on how to use mpirunDebug?
I could not make a progress on mpirun debugging yet but I have another question.
How can I make the processors synchronized before evaluating the line ?
Anyone knows how to make the processors wait for each other before evaluating some piece of code as above ?
In reply to your last post and PM you sent me:
Thanks for the reply wyldckat
It did help me to have a better understanding about Pstream commands.
I finally solved (at least I got around it) my problem.
My problem was briefly ... one of subdomains was entering the routine while others were not as the condition defined to enter the routine was not true for them.
In the mean time,
to find the maximum value of a variable defined on a patch over all the subdomains, I have tried both gMax(var) and max(reduce(var, maxOp<scalarField>)) commands. But the one found by the gMax was not the maximum and was actually smaller than the one found by the reduce command. Anyone noticed such a thing before?
Thanks again for the replies
By the way, the maximum reported value, which one was actually correct?
Because it's also possible that one of them was actually picking up an outdated value!
For example, one of them might be picking up a value that was communicated between processes at the beginning of the iterations, but the maximum value was calculated only after those iterations, and since it was located between processes, the value is no longer up-to-date...
|All times are GMT -4. The time now is 22:54.|