CFD Online Discussion Forums

CFD Online Discussion Forums (http://www.cfd-online.com/Forums/)
-   OpenFOAM Installation (http://www.cfd-online.com/Forums/openfoam-installation/)
-   -   OpenFOAM solvers not able to run in parallel (http://www.cfd-online.com/Forums/openfoam-installation/126730-openfoam-solvers-not-able-run-parallel.html)

raagh77 November 25, 2013 06:31

OpenFOAM solvers not able to run in parallel
 
To all OpenFOAM users,
We successfully installed OpenFOAM 1.7.1 on our SGI cluster running RHEL 6.2 and also tested one solver, simpleFoam, on 80 processors.
Now we just gave twoPhaseEulerFoam a trial run, but it ended up with the errors described below.
Code:

[31] [15] #0 #0 Foam::error::printStack(Foam::Ostream&) Foam::error::printStack(Foam::Ostream&)
[30] #0 Foam::error::printStack(Foam::Ostream&)
--------------------------------------------------------------------------
An  MPI process has executed an operation involving a call to the
"fork()" system  call to create a child process.  Open MPI is currently
operating in a  condition that could result in memory corruption or
other system errors; your  MPI job may hang, crash, or produce silent
data corruption.  The use of  fork() (or system() or other calls that
create child processes) is strongly  discouraged.

The process that invoked fork was:

  Local  host:          compute-0-8.local (PID 32076)
  MPI_COMM_WORLD rank:  31

If you are *absolutely sure* that your application will  successfully
and correctly survive a call to fork(), you may disable this  warning
by setting the mpi_warn_on_fork MCA parameter to  0.
--------------------------------------------------------------------------
 addr2line  failed
[30] #1  Foam::sigFpe::sigFpeHandler(int) addr2line failed
[15] #1  Foam::sigFpe::sigFpeHandler(int) addr2line failed
[30] #2
[30]  addr2line  failed
[30] #3  Foam::DILUPreconditioner::calcReciprocalD(Foam::Field<double>&, Foam::lduMatrix const&) addr2line failed
[30] #4  Foam::DILUPreconditioner::DILUPreconditioner(Foam::lduMatrix::solver const&, Foam::dictionary const&) addr2line failed
[15] #2
 addr2line  failed
[30] #5  Foam::lduMatrix::preconditioner::addasymMatrixConstructorToTable<Foam::DILUPreconditioner>::New(Foam::lduMatrix::solver const&, Foam::dictionary const&) addr2line failed
[30] #6  Foam::lduMatrix::preconditioner::New(Foam::lduMatrix::solver const&, Foam::dictionary const&) addr2line failed
[30] #7  Foam::PBiCG::solve(Foam::Field<double>&, Foam::Field<double>  const&, unsigned char) const[15]  addr2line failed
[15] #3  Foam::DILUPreconditioner::calcReciprocalD(Foam::Field<double>&, Foam::lduMatrix const&) addr2line failed
[30] #8  Foam::fvMatrix<double>::solve(Foam::dictionary const&) addr2line  failed
[30] #9  addr2line failed
[15] #4  Foam::DILUPreconditioner::DILUPreconditioner(Foam::lduMatrix::solver const&, Foam::dictionary const&)
[30]
[30] #10  __libc_start_main addr2line  failed
[15] #5  Foam::lduMatrix::preconditioner::addasymMatrixConstructorToTable<Foam::DILUPreconditioner>::New(Foam::lduMatrix::solver  const&, Foam::dictionary const&) addr2line failed
[30]  #11
[30]
[compute-0-8:32075] *** Process received signal  ***
[compute-0-8:32075] Signal: Floating point exception  (8)
[compute-0-8:32075] Signal code:  (-6)
[compute-0-8:32075] Failing at  address: 0x1f400007d4b
 addr2line failed
[15] #6  Foam::lduMatrix::preconditioner::New(Foam::lduMatrix::solver const&,  Foam::dictionary const&)[compute-0-8:32075] [ 0] /lib64/libc.so.6()  [0x3c81a32900]
[compute-0-8:32075] [ 1] /lib64/libc.so.6(gsignal+0x35)  [0x3c81a32885]
[compute-0-8:32075] [ 2] /lib64/libc.so.6()  [0x3c81a32900]
[compute-0-8:32075] [ 3]  /apps2/OpenFOAM/OpenFOAM-1.7.1/lib/linux64GccDPOpt/libOpenFOAM.so(_ZN4Foam18DILUPreconditioner15calcReciprocalDERNS_5FieldIdEERKNS_9lduMatrixE+0x137)  [0x2b68fd599877]
[compute-0-8:32075] [ 4]  /apps2/OpenFOAM/OpenFOAM-1.7.1/lib/linux64GccDPOpt/libOpenFOAM.so(_ZN4Foam18DILUPreconditionerC2ERKNS_9lduMatrix6solverERKNS_10dictionaryE+0x159)  [0x2b68fd599a39]
[compute-0-8:32075] [ 5]  /apps2/OpenFOAM/OpenFOAM-1.7.1/lib/linux64GccDPOpt/libOpenFOAM.so(_ZN4Foam9lduMatrix14preconditioner31addasymMatrixConstructorToTableINS_18DILUPreconditionerEE3NewERKNS0_6solverERKNS_10dictionaryE+0x3c)  [0x2b68fd599f3c]
[compute-0-8:32075] [ 6]  /apps2/OpenFOAM/OpenFOAM-1.7.1/lib/linux64GccDPOpt/libOpenFOAM.so(_ZN4Foam9lduMatrix14preconditioner3NewERKNS0_6solverERKNS_10dictionaryE+0x2da)  [0x2b68fd58a0ea]
[compute-0-8:32075] [ 7]  /apps2/OpenFOAM/OpenFOAM-1.7.1/lib/linux64GccDPOpt/libOpenFOAM.so(_ZNK4Foam5PBiCG5solveERNS_5FieldIdEERKS2_h+0x6da)  [0x2b68fd58f20a]
[compute-0-8:32075] [ 8]  /apps2/OpenFOAM/OpenFOAM-1.7.1/lib/linux64GccDPOpt/libfiniteVolume.so(_ZN4Foam8fvMatrixIdE5solveERKNS_10dictionaryE+0x147)  [0x2b68fbeba477]
[compute-0-8:32075] [ 9] twoPhaseEulerFoam()  [0x43b9ef]
[compute-0-8:32075] [10] /lib64/libc.so.6(__libc_start_main+0xfd)  [0x3c81a1ecdd]
[compute-0-8:32075] [11] twoPhaseEulerFoam()  [0x42bda9]
[compute-0-8:32075] *** End of error message ***
 addr2line  failed
[15] #7  Foam::PBiCG::solve(Foam::Field<double>&,  Foam::Field<double> const&, unsigned char) const addr2line  failed
[15] #8  Foam::fvMatrix<double>::solve(Foam::dictionary  const&)--------------------------------------------------------------------------
mpirun  noticed that process rank 30 with PID 32075 on node compute-0-8.local exited on  signal 8 (Floating point  exception).
--------------------------------------------------------------------------
[compute-0-0.local:05537]  2 more processes have sent help message help-mpi-runtime.txt /  mpi_init:warn-fork
[compute-0-0.local:05537] Set MCA parameter  "orte_base_help_aggregate" to 0 to see all help / error  messages
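As an aside, the long warning at the top of the log appears to come from OpenFOAM's stack-trace printing, which calls fork()/system() to run addr2line. It can be silenced as the message itself suggests, though this only hides the warning and does not fix the crash. A sketch (solver name and process count taken from this run):

```shell
# Sketch only: silences the fork() warning and shows all help messages;
# it does not address the floating point exception itself.
mpirun --mca mpi_warn_on_fork 0 \
       --mca orte_base_help_aggregate 0 \
       -np 80 twoPhaseEulerFoam -parallel
```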



The same case was working fine earlier on SuSE 10.2.

Now, after installing RHEL 6.2 [with the ROCKS cluster management tool], we are facing this issue.
Is it something related to the ROCKS cluster management, or something else in the OpenFOAM installation?

Your suggestions/replies would be of great help.

Regards
Raghu

wyldckat November 25, 2013 16:42

Greetings Raghu,

A few questions:
  1. Which MPI toolbox is being used in the new cluster installation?
  2. Was OpenFOAM 1.7.1 built with that MPI toolbox?
  3. Which GCC/G++ version was used for building OpenFOAM 1.7.1?
Best regards,
Bruno

raagh77 November 26, 2013 02:54

Thanks for the quick reply
 
Dear Bruno,

Please find the details below:

1. Which MPI toolbox is being used in the new cluster installation?

openmpi-1.6.3

2. Was OpenFOAM 1.7.1 built with that MPI toolbox?

Yes, openmpi-1.6.3 was used at compile time.

3. Which GCC/G++ version was used for building OpenFOAM 1.7.1?
4.7.3

Regards
Raghu

wyldckat November 26, 2013 06:50

Hi Raghu,

Did you make any modifications to the source code or build options in order to be able to build OpenFOAM 1.7.1 with GCC 4.7.3?
Because AFAIK, the most recent GCC that OpenFOAM 1.7.1 was compatible with is GCC 4.5.x: http://openfoamwiki.net/index.php/In...tion_.28GCC.29

Best regards,
Bruno

raagh77 November 26, 2013 23:57

Hi Bruno,

Previously we tried with GCC 4.3 and also with 4.5, but the problem wasn't solved.

Surprisingly, some OF solvers like simpleFoam work fine in parallel, but some solvers don't. :confused:



Regards
Raghu

wyldckat November 27, 2013 18:05

Hi Raghu,

Are you 100% certain that the build process used the correct gcc and g++ applications for each version you mentioned?
I ask this because the "rules" files are hard-coded to use the names "gcc" and "g++", which means that defining the following in "etc/bashrc":
Code:

WM_CC=gcc-4.5
WM_CXX=g++-4.5

is not enough!
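One workaround (a sketch, not tested here; it assumes the versioned compilers are installed under the names gcc-4.5/g++-4.5) is to put symlinks named exactly "gcc" and "g++" at the front of the PATH before rebuilding:

```shell
# Sketch: make plain "gcc"/"g++" resolve to the 4.5 binaries, because the
# wmake rules files call the compilers by those exact names.
# "gcc-4.5"/"g++-4.5" are assumed binary names; adjust to your system.
mkdir -p "$HOME/gcc45-bin"
ln -sf "$(command -v gcc-4.5)" "$HOME/gcc45-bin/gcc"
ln -sf "$(command -v g++-4.5)" "$HOME/gcc45-bin/g++"
export PATH="$HOME/gcc45-bin:$PATH"
gcc --version    # verify this reports 4.5.x before running Allwmake
```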




Beyond this, there is always the possibility that there is a numerical instability in the case you are using:
  • In SuSE 10.2, it worked well because it was able to stay in the stable numerical region.
  • In RHEL 6.2, some mathematical operations might be optimized differently, leading the solver to fall into the unstable numerical region.
For example:
  • In SuSE 10.2, it stayed at values like these:
    Code:

    0.000432413
    0.000123412
    0.000531561

  • But in RHEL 6.2, it does something like this:
    Code:

    0.000421312
    -0.000003258743
    crash



I've done a quick search in Google with this search expression:
Code:

site:www.openfoam.org/mantisbt twoPhaseEulerFoam
And found a few bug reports that might be relevant to the problem you're witnessing.
In other words, it might be worth your while to test building OpenFOAM 1.7.x, which is the latest (and last) bug-fixed version of OpenFOAM 1.7. With any luck, that crash will no longer occur with the latest 1.7.x.
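For reference, a sketch of fetching and building 1.7.x (the repository address below is from memory and may have moved; check the official OpenFOAM pages for the current location):

```shell
# Sketch, assuming git access; the URL and site-specific paths are assumptions.
git clone https://github.com/OpenCFD/OpenFOAM-1.7.x.git
cd OpenFOAM-1.7.x
source etc/bashrc    # after adapting etc/bashrc to your installation paths
./Allwmake
```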

Best regards,
Bruno


All times are GMT -4.