CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
Forum: OpenFOAM Installation (https://www.cfd-online.com/Forums/openfoam-installation/)
Thread: [OpenFOAM.org] Non-root installation of OpenFOAM 2.4.x, parallel issue (https://www.cfd-online.com/Forums/openfoam-installation/215318-non-root-installation-openfoam-2-4-x-parallel-issue.html)

syavash February 28, 2019 16:42

Non-root installation of OpenFOAM 2.4.x, parallel issue
 
Hi Foamers,

I have been trying to install OF 2.4.x on a CentOS 7 cluster. It works fine in serial, but when I try to submit a parallel job I get the following error:

Code:

Could not retrieve MPI tag from  /home/.../.../.../OpenFOAM/OpenFOAM-2.4.x/platforms/linux64GccDPOpt/bin/pimpleFoam
To be clear, I intend to use the system gcc as the compiler. I use the following script to submit the job:

Code:

#!/bin/bash

#SBATCH -N 4
#SBATCH -t 00:10:00
#SBATCH -J ...
#SBATCH -A ...


export FOAM_INST_DIR=/home/.../.../.../OpenFOAM

. $FOAM_INST_DIR/OpenFOAM-2.4.x/etc/bashrc

#--nranks is used when less than the number of available cores is used

mpprun  --nranks=128 `which pimpleFoam` -parallel >& results

To get access to OpenMPI, I load a gcc module, and OpenMPI is loaded as a result:

Code:

The buildenv-gcc module makes available:
 - Compilers: gcc, gfortran, etc.
 - MPI library with mpi-wrapped compilers: OpenMPI with mpicc, mpifort, etc.
 - Numerical libraries: OpenBLAS, LAPACK, ScaLAPACK, FFTW

I guess that OpenFOAM cannot find the system MPI on the cluster, so it complains accordingly. I hope someone has had a similar experience with this issue and can help me out.

Regards,
Syavash

wyldckat February 28, 2019 17:20

Quick answer: Given that OpenFOAM's "etc/bashrc" file is being used in the job script, you also need to load the Open-MPI module in the job script, before sourcing/activating OpenFOAM's shell environment.
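
For illustration only, here is a minimal sketch of a job script along those lines (the module name is just a placeholder for whatever Open-MPI or compiler-bundle module the cluster provides; paths are abbreviated as in the original post):

Code:

#!/bin/bash
#SBATCH -N 4
#SBATCH -t 00:10:00

# Load the cluster's Open-MPI (or compiler bundle) module first, so that
# mpirun and the MPI libraries are available before OpenFOAM is activated.
module load <openmpi-or-buildenv-module>

export FOAM_INST_DIR=/home/.../.../.../OpenFOAM

# Source/activate OpenFOAM's shell environment only after the module is loaded.
. $FOAM_INST_DIR/OpenFOAM-2.4.x/etc/bashrc

mpprun --nranks=128 `which pimpleFoam` -parallel >& results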

syavash March 1, 2019 05:36

Quote:

Originally Posted by wyldckat (Post 726447)
Quick answer: Given that OpenFOAM's "etc/bashrc" file is being used in the job script, you also need to load the Open-MPI module in the job script, before sourcing/activating OpenFOAM's shell environment.

Dear Bruno,

Thanks for your reply. I tried to include the following line in the job script:

Code:

export FOAM_MPI=/software/sse/easybuild/prefix/modules/all/Compiler/GCC/6.4.0-2.28/OpenMPI
The above line was added prior to
Code:

. $FOAM_INST_DIR/OpenFOAM-2.4.x/etc/bashrc
However, I still get the same error after submitting the job. My question is: am I loading Open-MPI correctly? Could you help me figure it out?

Regards,
Syavash

fertinaz March 1, 2019 11:57

Hello Ehsan


How exactly did you install OF: using the openmpi module on the cluster, or the openmpi shipped with OF?


If you want to stick to the cluster's OMPI (which is usually the right way), then make sure you made the required modifications in OF's bashrc file. To check whether everything was built correctly, you can also have a look at the $FOAM_LIBBIN contents. I'd also suggest the ldd tool to check whether dependencies can be resolved.
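
As a rough sketch of those checks, run after sourcing OpenFOAM's etc/bashrc (the library names are only what one would typically expect here, not a guaranteed listing):

Code:

# Was the MPI-dependent layer built against the system Open-MPI?
ls $FOAM_LIBBIN/openmpi-system

# Can the parallel communication library resolve its MPI dependency?
ldd $FOAM_LIBBIN/openmpi-system/libPstream.so | grep -i mpi

# Are there any unresolved shared libraries for the solver itself?
ldd $(which pimpleFoam) | grep "not found"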

syavash March 1, 2019 14:44

Quote:

Originally Posted by fertinaz (Post 726511)
Hello Ehsan


How exactly did you install OF: using the openmpi module on the cluster, or the openmpi shipped with OF?


If you want to stick to the cluster's OMPI (which is usually the right way), then make sure you made the required modifications in OF's bashrc file. To check whether everything was built correctly, you can also have a look at the $FOAM_LIBBIN contents. I'd also suggest the ldd tool to check whether dependencies can be resolved.

Dear Fatih,

Thanks for your attention. Honestly, I am not so sure; however, I intend to use the system default gcc/open-mpi modules. I followed the steps described in the following link:

HTML Code:

https://openfoamwiki.net/index.php/Installation/Linux/OpenFOAM-2.4.x/CentOS_SL_RHEL#CentOS_7.1
I skipped the first three steps. I also replaced the command in step 8 with the one below:

Code:

module load buildenv-gcc/2018a-eb
The above command outputs:

Code:

You have loaded an gcc buildenv module
***************************************************
The buildenv-gcc module makes available:
 - Compilers: gcc, gfortran, etc.
 - MPI library with mpi-wrapped compilers: OpenMPI with mpicc, mpifort, etc.
 - Numerical libraries: OpenBLAS, LAPACK, ScaLAPACK, FFTW

Please let me know if you need further information to diagnose the problem. I am confused now!

Regards,
Syavash

Edit: when I enter the command below
Code:

cd $FOAM_LIBBIN
I end up in the following directory:

Code:

/home/.../.../.../OpenFOAM/OpenFOAM-2.4.x/platforms/linux64GccDPOpt/lib

fertinaz March 1, 2019 15:31

All right, I think the problem is that the module you loaded doesn't actually load the OpenMPI environment. It seems to be a compiler module that gives access to the actual compilers (not just runtime libs) and makes available the other modules built with that specific gcc version.

To check whether you have ompi in your environment, you can run something like "which mpirun". Since I don't know how your cluster is configured, you might want to check the MPI compilers as well with "which mpicc". If those binaries are not located anywhere, then you need to load the specific openmpi module and restart the compilation.

If you do have them in your environment, and you are sure the build was completed using the correct openmpi modules, then your job script could be wrong as well.

So in general:
== Before initiating the installation, load the correct gcc and openmpi modules. The rest is optional (boost, metis, etc.). After loading them, make sure they are loaded properly; you can use commands like which, module show, etc.
== Check that the OF path is correctly defined in the etc/bashrc file (e.g. $HOME/OpenFOAM)
== Make sure "export WM_MPLIB=SYSTEMOPENMPI" is set in OF/etc/bashrc when you use the ompi module
== After the installation, check that the MPI directories (e.g. openmpi-system) exist under $FOAM_LIBBIN. Just being able to cd there doesn't mean much.
== To run solvers, meshers, etc., load the same modules that were required for the build
== Don't forget to source OF/etc/bashrc in your scripts
== Don't rely on an alias for it, since multi-node jobs need to source it by default

Good luck

// Fatih
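
For reference, the checklist above might translate into shell commands roughly like the following (the module name is just the one mentioned in this thread and will differ between clusters):

Code:

# Before the build: load and verify the toolchain
module load buildenv-gcc/2018a-eb   # or the cluster's separate gcc + openmpi modules
module list
which gcc
which mpirun
which mpicc

# In OpenFOAM-2.4.x/etc/bashrc, the system MPI should be selected:
#   export WM_MPLIB=SYSTEMOPENMPI

# After the build: check that the MPI-dependent libraries were actually produced
ls $FOAM_LIBBIN/openmpi-system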

syavash March 2, 2019 15:43

Quote:

Originally Posted by fertinaz (Post 726524)
All right, I think the problem is that the module you loaded doesn't actually load the OpenMPI environment. It seems to be a compiler module that gives access to the actual compilers (not just runtime libs) and makes available the other modules built with that specific gcc version.

To check whether you have ompi in your environment, you can run something like "which mpirun". Since I don't know how your cluster is configured, you might want to check the MPI compilers as well with "which mpicc". If those binaries are not located anywhere, then you need to load the specific openmpi module and restart the compilation.

If you do have them in your environment, and you are sure the build was completed using the correct openmpi modules, then your job script could be wrong as well.

So in general:
== Before initiating the installation, load the correct gcc and openmpi modules. The rest is optional (boost, metis, etc.). After loading them, make sure they are loaded properly; you can use commands like which, module show, etc.
== Check that the OF path is correctly defined in the etc/bashrc file (e.g. $HOME/OpenFOAM)
== Make sure "export WM_MPLIB=SYSTEMOPENMPI" is set in OF/etc/bashrc when you use the ompi module
== After the installation, check that the MPI directories (e.g. openmpi-system) exist under $FOAM_LIBBIN. Just being able to cd there doesn't mean much.
== To run solvers, meshers, etc., load the same modules that were required for the build
== Don't forget to source OF/etc/bashrc in your scripts
== Don't rely on an alias for it, since multi-node jobs need to source it by default

Good luck

// Fatih

Dear Fatih,

Thanks for the detailed explanation. Well, I ran "which mpirun" and "which mpicc" and got the following outputs:

Code:

$which mpirun
/software/sse/easybuild/prefix/software/OpenMPI/1.10.3-GCC-5.4.0-2.26/bin/mpirun

Code:

$which mpicc
/software/sse/manual/mpprun/4.0/nsc-wrappers/mpicc

WM_MPLIB=SYSTEMOPENMPI is OK in etc/bashrc.

Also, inside $FOAM_LIBBIN I have the "openmpi-system" directory, which contains the following files:

Code:

$ls
libPstream.so  libptscotchDecomp.so

I am not sure what else might be of importance. Please let me know if you (Bruno as well :)) have any other ideas on how to resolve this issue.

Regards,
Syavash

wyldckat March 2, 2019 16:28

Quick answer @syavash: Why can't you add this line:
Code:

module load buildenv-gcc/2018a-eb
before this one in the job script:
Code:

. $FOAM_INST_DIR/OpenFOAM-2.4.x/etc/bashrc
Does the case not launch with that?

The other hypothesis is to check if this module will in fact load other modules. If you start a new terminal and run:
Code:

module load buildenv-gcc/2018a-eb
module list

what does it give you?

syavash March 3, 2019 03:12

Dear Bruno,

Thanks for your reply.

Quote:

Why can't you add this line:
Code:

module load buildenv-gcc/2018a-eb
before this one in the job script:
Code:

. $FOAM_INST_DIR/OpenFOAM-2.4.x/etc/bashrc
Does the case not launch with that?
I had added the following alias for OF, which I used to call before submitting the job:

Code:

alias OF24x='export FOAM_INST_DIR=/home/.../.../.../OpenFOAM; module load buildenv-gcc/2018b-eb; source /home/.../.../.../OpenFOAM/OpenFOAM-2.4.x/etc/bashrc WM_NCOMPPROCS=4 WM_MPLIB=SYSTEMOPENMPI'
So I had excluded the buildenv-gcc/2018b-eb module from the job script. Nevertheless, adding the line
Code:

module load buildenv-gcc/2018b-eb
before the line
Code:

. $FOAM_INST_DIR/OpenFOAM-2.4.x/etc/bashrc
did not change anything, and the error is still there.

Quote:

The other hypothesis is to check if this module will in fact load other modules. If you start a new terminal and run:
Code:

module load buildenv-gcc/2018a-eb
module list

what does it give you?
Here is the output of
Code:

module load buildenv-gcc/2018a-eb

Code:

You have loaded an gcc buildenv module
***************************************************
The buildenv-gcc module makes available:
 - Compilers: gcc, gfortran, etc.
 - MPI library with mpi-wrapped compilers: OpenMPI with mpicc, mpifort, etc.
 - Numerical libraries: OpenBLAS, LAPACK, ScaLAPACK, FFTW

It also makes a set of dependency library modules available via
the regular module command. Just do:
  module avail
to see what is available.

and here is the output of
Code:

module list

Code:

Currently Loaded Modules:
  1) mpprun/4.0
  2) nsc/.1.0                            (H,S)
  3) EasyBuild/3.5.3-nsc17d8ce4
  4) nsc-eb-scripts/1.0
  5) buildtool-easybuild/3.5.3-nsc17d8ce4
  6) GCCcore/6.4.0
  7) binutils/.2.28                      (H)
  8) GCC/6.4.0-2.28
  9) numactl/.2.0.11                      (H)
 10) hwloc/.1.11.8                        (H)
 11) OpenMPI/.2.1.2                      (H)
 12) OpenBLAS/.0.2.20                    (H)
 13) FFTW/.3.3.7                          (H)
 14) ScaLAPACK/.2.0.2-OpenBLAS-0.2.20    (H)
 15) foss/2018a
 16) buildenv-gcc/2018a-eb

  Where:
  S:  Module is Sticky, requires --force to unload or purge
  H:            Hidden Module

Any idea? :confused:

Regards,
Syavash

wyldckat March 3, 2019 09:14

Quick answer: If using "module load" doesn't work within a job, then it's because it's not able to properly load it... or because mpprun needs to be used properly.

If the cluster/supercomputer you are using has a support page, you should consult it and tell us about it if you cannot understand it!
I did a quick search online for mpprun and found this page: https://www.nsc.liu.se/support/tutorials/mpprun/

Apparently the error message:
Code:

Could not retrieve MPI tag from .../pimpleFoam
is because the application itself does not have the tag needed for mpprun to assess which modules to load.

So if you run in the login node this command:
Code:

dumptag $(which pimpleFoam)
you should get a similar message stating that the tag was not found.

Apparently you must use the "-Nmpi" additional compilation argument, as documented here: https://www.nsc.liu.se/software/buildenv/ and here: https://www.nsc.liu.se/software/mpi-libraries/

Therefore, if my estimates are correct:
  1. Edit the file "OpenFOAM-2.4.x/wmake/rules/linux64Gcc/c++Opt".
  2. Add the entry "-Nmpi" to "c++OPT", e.g.:
    Code:

    c++OPT      = -O3 -Nmpi
  3. And then you have to rebuild OpenFOAM 2.4.x entirely, because this extra build option has to be added to all object files during compilation...
  4. Then again, you could try rebuilding only pimpleFoam and hope that it's enough:
    Code:

    wclean $FOAM_SOLVERS/incompressible/pimpleFoam
    wmake $FOAM_SOLVERS/incompressible/pimpleFoam

  5. Then check if it's tagged properly:
    Code:

    dumptag $(which pimpleFoam)
    which should tell you several details, including a line similar to this one:
    Code:

    Built with MPI:        openmpi 1_6_2_build1
Beyond this, using "module load" should not be necessary in the job script.
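
For completeness, a rough sketch of what the full rebuild could look like once "-Nmpi" has been added to the wmake rules (module name and paths as used earlier in this thread; a full Allwmake of 2.4.x can take several hours):

Code:

# Load the same modules that were used for the original build
module load buildenv-gcc/2018a-eb
export FOAM_INST_DIR=/home/.../.../.../OpenFOAM
. $FOAM_INST_DIR/OpenFOAM-2.4.x/etc/bashrc

# Rebuild so that the -Nmpi flag ends up in every object file
cd $WM_PROJECT_DIR
./Allwmake > log.Allwmake 2>&1

# Check that the binary is now tagged for mpprun
dumptag $(which pimpleFoam)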

syavash March 3, 2019 16:24

Quote:

If the cluster/supercomputer you are using has a support page, you should consult it and tell us about it if you cannot understand it!
Dear Bruno, I see your point; however, I should say that I never came across the support page! I only searched for the error message on Google, and found little help. I know it might look weird, but I was confused and missed the support web page; I am sorry about that.

I should also thank you again, as your thorough explanation got the job running in parallel. As you instructed, I did a fresh installation, this time including the -Nmpi flag.

Regards,
Syavash

wyldckat March 3, 2019 18:21

Quick answer: I'm very glad that it worked!

Mmm... it is possible that Google was biased towards me, given my search profile, even though I wasn't logged in...

Then again, when I searched for "Could not retrieve MPI tag from" without the quotes, it didn't give me anything, but with the quotes it gave me just one answer: http://www.tfd.chalmers.se/~hani/wik.../_Installation - namely one of Håkan Nilsson's wiki pages. But I didn't understand the context very well either, since his solution was simply to use Intel MPI... I had to research a bit more and happened to look for "mpprun", and then started to read through things a bit more carefully.

But I was also expecting that the cluster/supercomputer you are using would have an instructions page... and I'm glad that NSC has a fairly complete instructions page that helped us out!

syavash March 4, 2019 07:15

Quote:

Originally Posted by wyldckat (Post 726644)
Quick answer: I'm very glad that it worked!

Mmm... it is possible that Google was biased towards me, given my search profile, even though I wasn't logged in...

Then again, when I searched for "Could not retrieve MPI tag from" without the quotes, it didn't give me anything, but with the quotes it gave me just one answer: http://www.tfd.chalmers.se/~hani/wik.../_Installation - namely one of Håkan Nilsson's wiki pages. But I didn't understand the context very well either, since his solution was simply to use Intel MPI... I had to research a bit more and happened to look for "mpprun", and then started to read through things a bit more carefully.

But I was also expecting that the cluster/supercomputer you are using would have an instructions page... and I'm glad that NSC has a fairly complete instructions page that helped us out!

Dear Bruno,

You are right about Håkan's page, as I had installed OF 2.3.1 based on his guide and used Intel as the compiler. However, the Intel compiler gave me some issues when I wanted to use mapFields or when trying to extract lines of data with sampleDict. I had previously installed OF 2.3.1 on another cluster with Gcc (though an older version) and did not have those issues, so I decided to install OF 2.4.x with Gcc this time, hoping that it would not complain about mapFields anymore.
Also, I examined the NSC page. As you indicated, they provide a fairly helpful support page, so I will check it out before bringing up issues here! :)

Regards,
Syavash

