CFD Online Discussion Forums

what is wrong with the mpirun parameter -mca ? (http://www.cfd-online.com/Forums/openfoam/72660-what-wrong-mpirun-parameter-mca.html)

donno February 12, 2010 23:25

what is wrong with the mpirun parameter -mca ?
 
Hi all,

I am new to running OF1.6 in parallel. The problem I have is this: mpirun works on a cluster (SUSE 10.2 with InfiniBand (IB) and PBS installed) with the following script:

## this works in parallel
#PBS -N mycase
## Submit to specified nodes:
##PBS -S /bin/bash
#PBS -l nodes=1:ppn=16
#PBS -j oe
#PBS -l walltime=00:10:00
##PBS -q debug

cat $PBS_NODEFILE
##cat $PBS_O_WORKDIR
##cd $PBS_O_WORKDIR

NP=$(wc -l < "$PBS_NODEFILE")

mpirun -np $NP -machinefile $PBS_NODEFILE interFoam -parallel

but it fails when I change just one line of the script to:

mpirun -np $NP -machinefile $PBS_NODEFILE --mca btl self,openib interFoam -parallel

The output is as follows:

--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: node11
Framework: btl
Component: openib
--------------------------------------------------------------------------
[node11:06179] mca: base: components_open: component pml / csum open function failed
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: node11
Framework: btl
Component: openib
--------------------------------------------------------------------------
[node11:06179] mca: base: components_open: component pml / ob1 open function failed
--------------------------------------------------------------------------
No available pml components were found!

This means that there are no components of this type installed on your
system or all the components reported that they could not be used.

This is a fatal error; your MPI process is likely to abort. Check the
output of the "ompi_info" command and ensure that components of this
type are available on your system. You may also wish to check the
value of the "component_path" MCA parameter and ensure that it has at
least one directory that contains valid MCA components.
--------------------------------------------------------------------------
[node11:06179] PML ob1 cannot be selected
--------------------------------------------------------------------------
mpirun has exited due to process rank 3 with PID 6179 on
node node11 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

What is wrong with the -mca parameter?

olesen February 15, 2010 04:46

Quote:

Originally Posted by donno (Post 245911)
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: node11
Framework: btl
Component: openib
--------------------------------------------------------------------------
[node11:06179] mca: base: components_open: component pml / csum open function failed
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Does "ompi_info" report that the components in question are available?
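A minimal sketch of that check as a POSIX shell function. The name missing_btls is hypothetical; it assumes ompi_info prints one "MCA btl: <name> (...)" line per installed component, as in the listings quoted in this thread. It takes the same comma-separated list you would pass to --mca btl and prints any component that ompi_info does not report.

```shell
# Hypothetical helper: print each requested BTL component that is
# missing from the ompi_info output read on stdin.
# $1 is a comma-separated list, e.g. "self,openib" (as for --mca btl).
missing_btls() {
    listing=$(cat)
    old_ifs=$IFS
    IFS=','
    for comp in $1; do
        # ompi_info lists components as "MCA btl: tcp (MCA v2.0, ...)"
        if ! printf '%s\n' "$listing" | grep -Eq "MCA btl: ${comp}[[:space:]]"; then
            printf '%s\n' "$comp"
        fi
    done
    IFS=$old_ifs
}

# Usage on the cluster (prints nothing if every component is present):
# ompi_info | missing_btls self,openib
```

With a build whose only BTLs are self, sm and tcp, this would print openib, which matches the component named in the error message.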

donno February 15, 2010 09:37

ompi_info:

Package: Open MPI henry@dm Distribution
Open MPI: 1.3.3
Open MPI SVN revision: r21666
Open MPI release date: Jul 14, 2009
Open RTE: 1.3.3
Open RTE SVN revision: r21666
Open RTE release date: Jul 14, 2009
OPAL: 1.3.3
OPAL SVN revision: r21666
OPAL release date: Jul 14, 2009
Ident string: 1.3.3

Some related components:

MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.3)
MCA pml: csum (MCA v2.0, API v2.0, Component v1.3.3)
MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3.3)
MCA pml: v (MCA v2.0, API v2.0, Component v1.3.3)
MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.3)
MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3.3)
MCA btl: self (MCA v2.0, API v2.0, Component v1.3.3)
MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.3)
MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3.3)

Is the problem with openib?

bastil February 15, 2010 10:31

I think you need to recompile Open MPI with openib support. Take a look at the Allwmake script in ThirdParty.

Regards BastiL

olesen February 15, 2010 10:34

Quote:

Originally Posted by donno (Post 246065)
ompi_info:

Package: Open MPI henry@dm
...
Is the problem with openib?

The openib component is evidently not enabled in the default release. If you examine the Allwmake file in ThirdParty, you'll see something like this:

Code:

        # Infiniband support
        # if [ -d /usr/local/ofed -a -d /usr/local/ofed/lib64 ]
        # then
        #    mpiWith="$mpiWith --with-openib=/usr/local/ofed"
        #    mpiWith="$mpiWith --with-openib-libdir=/usr/local/ofed/lib64"
        # fi

Adjust it to suit your configuration and recompile Open MPI. This presumes you have the corresponding InfiniBand headers and libraries installed.
While you are at it, you might also consider getting a more recent version (openmpi-1.4.1); there have been various bug fixes since the 1.3.3 release.
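The quoted template only adds the InfiniBand options when both the OFED directory and its lib64 subdirectory exist. A minimal sketch of the same logic as a reusable function, assuming OFED is installed under /usr/local/ofed as in the template (adjust for your system). The function name openib_flags is hypothetical and not part of Allwmake, and the mpiWith variable is presumably appended to the configure options later in that script.

```shell
# Hypothetical helper mirroring the commented-out Allwmake block:
# print the InfiniBand options for Open MPI's configure, but only
# if the OFED directory and its lib64 subdirectory both exist.
openib_flags() {
    ofed_dir=${1:-/usr/local/ofed}   # assumed default, as in the template
    if [ -d "$ofed_dir" ] && [ -d "$ofed_dir/lib64" ]; then
        printf '%s\n' "--with-openib=$ofed_dir --with-openib-libdir=$ofed_dir/lib64"
    fi
}

# In Allwmake this could stand in for the commented block (sketch):
# mpiWith="$mpiWith $(openib_flags /usr/local/ofed)"
```

If the directories are missing, the function prints nothing, so mpiWith is left without InfiniBand options, the same fallback behaviour as the original if/then block.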

bjr March 24, 2010 11:34

I believe I'm having virtually the same problem as the OP, donno: an openSUSE 11.2 cluster, vanilla OF-1.6, trying to use --mca btl openib,self.

http://www.cfd-online.com/Forums/ope...not-quite.html

Did you ever find out what your Allwmake file should look like to get this working? Are you using OFED as I am?

bjr March 24, 2010 17:00

Think I just got it figured out...

host2:~ # which ompi_info
/root/OpenFOAM/ThirdParty-1.6/openmpi-1.3.3/platforms/linux64GccDPOpt/bin/ompi_info
host2:~ # ompi_info | grep openib
MCA btl: openib (MCA v2.0, API v2.0, Component v1.3.3)

Turns out it was in the Allwmake file.

The relevant lines were changed to (for my configuration):

# These lines enable Infiniband support (original OFED template):
#   --with-openib=/usr/local/ofed
#   --with-openib-libdir=/usr/local/ofed/lib64
# --with-openib normally takes the OFED install prefix; this path works here
# presumably because the verbs headers are already in the default search path.
./configure \
--prefix="$MPI_ARCH_PATH" \
--disable-mpirun-prefix-by-default \
--disable-orterun-prefix-by-default \
--enable-shared --disable-static \
--disable-mpi-f77 --disable-mpi-f90 --disable-mpi-cxx \
--disable-mpi-profile \
--with-openib=/usr/include/infiniband
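The `which ompi_info` step in bjr's post matters because a different Open MPI installation earlier on PATH could report a different component list. A minimal sketch of that check as a function; the name ompi_info_from is hypothetical, and it assumes MPI_ARCH_PATH holds the prefix passed to configure, as in the lines above.

```shell
# Hypothetical helper: succeed only if the first ompi_info found on
# PATH lives under the given prefix (e.g. "$MPI_ARCH_PATH").
ompi_info_from() {
    found=$(command -v ompi_info) || return 1   # not on PATH at all
    case "$found" in
        "$1"/*) return 0 ;;   # comes from the expected installation
        *) return 1 ;;        # some other Open MPI is being picked up
    esac
}

# Usage (sketch):
# ompi_info_from "$MPI_ARCH_PATH" && ompi_info | grep openib
```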

