CFD Online Discussion Forums

MPI error (https://www.cfd-online.com/Forums/openfoam-solving/59326-mpi-error.html)

msrinath80 September 3, 2007 17:15

Has anyone seen this[1] kind of error message before? I've linked OpenFOAM 1.4.1 with MPICH-compatibility libraries provided by the HP-MPI suite. I first tried linking directly with hpmpi.so and Pstream compiled fine. However, after that whenever I tried compiling any solver, I got the very same error messages that Frank Bos reported a while ago[2]. I believe those errors are due to C++ bindings being enabled when building Pstream. However, I don't know how to disable them when using HP-MPI. As a result I had to switch to MPICH-compatibility libraries provided by HP-MPI which allow me to build both Pstream and my solver without problems. I need to use HP-MPI as the cluster is configured for Voltaire Infiniband switched-fabric interconnect with Hewlett Packard's XC software stack.

Now, in MPICH-compatibility mode, ldd `which icoFoam_1` gives:

[madhavan@matrix ~]$ ldd `which icoFoam_1`
libfiniteVolume.so => /home/users/madhavan/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libfiniteVolume.so (0x0000002a95557000)
libOpenFOAM.so => /home/users/madhavan/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libOpenFOAM.so (0x0000002a96158000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003b8f600000)
libstdc++.so.6 => /home/users/madhavan/OpenFOAM/linux64/gcc-4.2.1/lib64/libstdc++.so.6 (0x0000002a96604000)
libm.so.6 => /lib64/tls/libm.so.6 (0x0000003b8f100000)
libgcc_s.so.1 => /home/users/madhavan/OpenFOAM/linux64/gcc-4.2.1/lib64/libgcc_s.so.1 (0x0000002a96829000)
libc.so.6 => /lib64/tls/libc.so.6 (0x0000003b8f300000)
libPstream.so => /home/users/madhavan/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/hpmpi/libPstream.so (0x0000002a96937000)
libtriSurface.so => /home/users/madhavan/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libtriSurface.so (0x0000002a96a3f000)
libmeshTools.so => /home/users/madhavan/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libmeshTools.so (0x0000002a96bbf000)
libz.so => /home/users/madhavan/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libz.so (0x0000002a96e2d000)
/lib64/ld-linux-x86-64.so.2 (0x0000003b8ef00000)
libmpich.so => /opt/hpmpi/MPICH1.2/lib/linux_amd64/libmpich.so (0x0000002a96f42000)
librt.so.1 => /lib64/tls/librt.so.1 (0x0000003b94000000)
liblagrangian.so => /home/users/madhavan/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/liblagrangian.so (0x0000002a9706e000)
libhpmpi.so => /opt/hpmpi/MPICH1.2/lib/linux_amd64/libhpmpi.so (0x0000002a97170000)
libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003b8fa00000)
[madhavan@matrix ~]$

Interestingly, a case with only around 1-2 million cells runs perfectly, but a 6 or 9 million cell case fails. That suggests a 32-bit versus 64-bit build issue. However, the ldd output above lists only libraries from 64-bit locations (/lib64, linux_amd64, linux64GccDPOpt), which seems to contradict that.
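One way to test the 32-bit versus 64-bit suspicion directly, rather than inferring it from the paths ldd prints, is to read the ELF class byte of the solver and of each library. The following is only a minimal sketch, assuming a Linux system where od is available; the function name elf_class is made up for illustration and is not part of OpenFOAM or HP-MPI.

```shell
# Print the ELF class of a file: ELF32, ELF64, or "not ELF".
# An ELF header starts with the bytes 7f 'E' 'L' 'F'; the next byte
# (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects.
# elf_class is a hypothetical helper name used only in this sketch.
elf_class() {
    # Dump the first 5 bytes as hex and strip the whitespace od adds.
    header=$(od -An -tx1 -N5 "$1" | tr -d ' \n')
    case "$header" in
        7f454c4601) echo "ELF32" ;;
        7f454c4602) echo "ELF64" ;;
        *)          echo "not ELF" ;;
    esac
}
```

It could be called as elf_class "$(which icoFoam_1)" and then on each path reported by ldd. This checks the binary format only, so a clean result narrows the search without explaining the failure.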

I would appreciate it if someone could shed some light on this issue. Thanks!


[1] Error message from MPI:

[12]
[12]
[12] --> FOAM FATAL IO ERROR : IOstream::check(const char* operation) : error in IOstream "IOstream" for operation operator>>(Istream&, List<T>&) : reading first token
[12]
[12] file: IOstream at line 0.
[12]
[12] From function IOstream::fatalCheck(const char* operation) const
[12] in file db/IOstreams/IOstreams/IOcheck.C at line 73.
[12]
FOAM parallel run exiting
[12]
[11]
[11]
[11] --> FOAM FATAL IO ERROR : IOstream::check(const char* operation) : error in IOstream "IOstream" for operation operator>>(Istream&, List<T>&) : reading first token
[11]
[11] file: IOstream at line 0.
[11]
[11] From function IOstream::fatalCheck(const char* operation) const
[11] in file db/IOstreams/IOstreams/IOcheck.C at line 73.
[11]
FOAM parallel run exiting
[11]
[10]
[10]
[10] --> FOAM FATAL IO ERROR : IOstream::check(const char* operation) : error in IOstream "IOstream" for operation operator>>(Istream&, List<T>&) : reading first token
[10]
[10] file: IOstream at line 0.
[10]
[10] From function IOstream::fatalCheck(const char* operation) const
[10] in file db/IOstreams/IOstreams/IOcheck.C at line 73.
[10]
FOAM parallel run exiting
[10]
MPI Application rank 12 exited before MPI_Finalize() with status 1
MPI Application rank 11 exited before MPI_Finalize() with status 1


[2] http://www.cfd-online.com/OpenFOAM_D...es/1/2968.html

mattijs September 4, 2007 13:51

For LAM and OpenMPI I just had a look through mpi.h to see how to exclude the C++ bindings. Maybe HP-MPI has a similarly convenient switch.

Alternatively, additionally link in the MPI library that provides the C++ functions.

msrinath80 September 4, 2007 14:35

Hi Mattijs,

I looked into mpi.h and found no such convenient switch. However, I located the additional library that provides the C++ functions (-lmpiCC). When I add this to my mplibHPMPI rule, the build proceeds fine except for this error message near the end:

/usr/bin/ld: /opt/hpmpi/lib/linux_amd64/libmpiCC.a(intercepts.o): relocation R_X86_64_32S against `MPI::Comm::key_ref_map' can not be used when making a shared object; recompile with -fPIC
/opt/hpmpi/lib/linux_amd64/libmpiCC.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [/home/users/madhavan/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/hpmpi/libPstream.so] Error 1

How should I proceed? Thanks for your help!

msrinath80 September 4, 2007 15:31

Mattijs, Thanks very much for your inspiration which made me fool around a lot with HP-MPI. I think I have finally solved the problem. Here is the detailed solution in the hopes that it will prove useful for others in a similar predicament.

First, I poked into mpi.h as Mattijs suggested to find out whether there is an easy switch I can pass through PFLAGS in ~/OpenFOAM/OpenFOAM-1.4.1/wmake/rules/linux64Gcc/mplibHPMPI, which, by the way, is the file you create when you build Pstream against an MPI implementation already installed on the cluster.

Note: I also added the following lines to ~/OpenFOAM/OpenFOAM-1.4.1/.bashrc:

elif [ .$WM_MPLIB = .HPMPI ]; then
    export HPMPI_ARCH_PATH=/opt/hpmpi
    AddLib $HPMPI_ARCH_PATH/lib/linux_amd64/
    AddPath $HPMPI_ARCH_PATH/bin
    export FOAM_MPI_LIBBIN=$FOAM_LIBBIN/hpmpi


I also added export WM_MPLIB=HPMPI to ~/OpenFOAM/OpenFOAM-1.4.1/.OpenFOAM-1.4.1/bashrc


Finally, update the Allwmake script in ~/OpenFOAM/OpenFOAM-1.4.1/src/Pstream to include "$WM_MPLIB" = "HPMPI"

Currently for HP-MPI the mplibHPMPI file reads:

PFLAGS = -DHPMP_BUILD_CXXBINDING
PINC = -I/opt/hpmpi/include
PLIBS = -L/opt/hpmpi/lib/linux_amd64 -lhpmpio -lhpmpi -ldl -lmpiCC

As one can see, I have added the -DHPMP_BUILD_CXXBINDING switch to PFLAGS, as I found that doing so enables C++ bindings support within HP-MPI. In addition, I added -lmpiCC to link in the library that provides the C++ MPI bindings.

When I tried to build Pstream, it failed with the relocation error mentioned earlier in this thread. The cause is linking a static library (libmpiCC.a, which was not compiled with -fPIC) into a shared object. The obvious fix would be to link against a libmpiCC.so from the HP-MPI installation, but I could not find one. So I googled and came up with an alternative proposed by HP[1], which let me rebuild libmpiCC.a using my current g++ (supplied with OpenFOAM). However, the library was still static. So I googled again for how to create shared libraries and found this link[2]. Now all I had to do was follow the recipe:

g++ -fPIC -c intercepts.cc -I/opt/hpmpi/include -DHPMP_BUILD_CXXBINDING
g++ -fPIC -c mpicxx.cc -I/opt/hpmpi/include -DHPMP_BUILD_CXXBINDING
g++ -shared -Wl,-soname,libmpiCC.so -o libmpiCC.so.1.0.1 intercepts.o mpicxx.o -lc

Finally, I symlinked libmpiCC.so.1.0.1 to ~/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/hpmpi/libmpiCC.so
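The symlink matters because both the -lmpiCC link option and the soname given in the recipe refer to the unversioned name libmpiCC.so, while the file produced is libmpiCC.so.1.0.1. The step can be sketched as a small helper; this is an illustration only, link_shared_lib is a hypothetical name, and it assumes the versioned library is passed as an absolute path so the link does not dangle.

```shell
# Create the unversioned "linker name" symlink for a versioned shared library,
# e.g. <target_dir>/libmpiCC.so -> /abs/path/libmpiCC.so.1.0.1.
# link_shared_lib is a hypothetical helper name used only in this sketch.
link_shared_lib() {
    versioned=$1     # absolute path to the versioned library file
    target_dir=$2    # directory searched at link time and at run time
    base=$(basename "$versioned")
    case "$base" in
        *.so.*) ;;   # looks like name.so.<version>, carry on
        *) echo "expected a versioned .so file: $base" >&2; return 1 ;;
    esac
    # Strip the version suffix: libmpiCC.so.1.0.1 -> libmpiCC.so
    linker_name="${base%%.so.*}.so"
    mkdir -p "$target_dir" &&
    ln -sf "$versioned" "$target_dir/$linker_name" &&
    echo "$target_dir/$linker_name"
}
```

For the case in this post, the first argument would be the full path to the freshly built libmpiCC.so.1.0.1 and the second the hpmpi directory under lib/linux64GccDPOpt.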

Now, my mplibHPMPI file reads:
PFLAGS = -DHPMP_BUILD_CXXBINDING
PINC = -I/opt/hpmpi/include
PLIBS = -L /home/users/madhavan/OpenFOAM/OpenFOAM-1.4.1/src/mpiCCsrc -L/opt/hpmpi/lib/linux_amd64 -lhpmpio -lhpmpi -ldl -lmpiCC

And after rebuilding libPstream.so followed by icoFoam_1 (my customized icoFoam solver), ldd `which icoFoam_1` gives:

[madhavan@matrix icoFoam]$ ldd `which icoFoam_1`
libfiniteVolume.so => /home/users/madhavan/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libfiniteVolume.so (0x0000002a95557000)
libOpenFOAM.so => /home/users/madhavan/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libOpenFOAM.so (0x0000002a96158000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003b8f600000)
libstdc++.so.6 => /home/users/madhavan/OpenFOAM/linux64/gcc-4.2.1/lib64/libstdc++.so.6 (0x0000002a96604000)
libm.so.6 => /lib64/tls/libm.so.6 (0x0000003b8f100000)
libgcc_s.so.1 => /home/users/madhavan/OpenFOAM/linux64/gcc-4.2.1/lib64/libgcc_s.so.1 (0x0000002a96829000)
libc.so.6 => /lib64/tls/libc.so.6 (0x0000003b8f300000)
libPstream.so => /home/users/madhavan/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/hpmpi/libPstream.so (0x0000002a96937000)
libtriSurface.so => /home/users/madhavan/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libtriSurface.so (0x0000002a96a4f000)
libmeshTools.so => /home/users/madhavan/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libmeshTools.so (0x0000002a96bcf000)
libz.so => /home/users/madhavan/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/libz.so (0x0000002a96e3d000)
/lib64/ld-linux-x86-64.so.2 (0x0000003b8ef00000)
libmpio.so.1 => /opt/hpmpi/lib/linux_amd64/libmpio.so.1 (0x0000002a96f52000)
libmpi.so.1 => /opt/hpmpi/lib/linux_amd64/libmpi.so.1 (0x0000002a9708d000)
libmpiCC.so => /home/users/madhavan/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/hpmpi/libmpiCC.so (0x0000002a972c8000)
liblagrangian.so => /home/users/madhavan/OpenFOAM/OpenFOAM-1.4.1/lib/linux64GccDPOpt/liblagrangian.so (0x0000002a973e4000)
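To compare the two ldd listings quickly, it can help to pull out only the MPI-related entries and flag any the loader cannot resolve. A minimal sketch, assuming the usual ldd line layout ("name => path (address)" or "name => not found"); the function name mpi_libs is hypothetical.

```shell
# List the MPI-related libraries from ldd-style output on standard input,
# and flag any the loader could not resolve.
# mpi_libs is a hypothetical helper name used only in this sketch.
mpi_libs() {
    awk '
        # Match on the library name only (libmpi, libmpich, libhpmpi, libmpiCC, ...),
        # so libPstream.so is skipped even though its path contains "hpmpi".
        $1 ~ /mpi/ && $2 == "=>" {
            if ($3 == "not") print $1 ": NOT FOUND"
            else             print $1 ": " $3
        }
    '
}
```

Used as ldd "$(which icoFoam_1)" | mpi_libs, this should single out libmpich.so and libhpmpi.so for the MPICH-compatibility build and libmpio.so.1, libmpi.so.1 and libmpiCC.so for the native HP-MPI build.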

References:
[1] http://docs.hp.com/en/B6060-96024/ch03s02.html
[2] http://tldp.org/HOWTO/Program-Librar...libraries.html

msrinath80 September 4, 2007 15:38

Addendum: One might wish to add the -m64 flag to the g++ command line just to be safe.

msrinath80 September 4, 2007 18:50

Alright, I give up! Even after successfully building the application with HP-MPI support, I get the same error message when running a 6-million-cell case. I'm reverting to OpenMPI 1.2.3 for good. If there is one thing I've learnt through this ordeal, it is that proprietary software is "EVIL" by design.

eugene October 30, 2007 08:55

Follow these instructions to get HPMPI working:

http://openfoamwiki.net/index.php/HowTo_Pstream

Thanks to Henry and Mattijs for the work-around.

msrinath80 October 30, 2007 22:19

Thanks a lot Eugene for the info. Thanks of course to Henry and Mattijs as well. It certainly works. But I will need to check if I can run cases with 4-6 million cells without issues.

eugene October 31, 2007 10:05

Yes, please let me know. My connection to the machine I was about to do the tests on has gone down so I have no way of confirming that the fix solves the 6M cell problem as well.

msrinath80 November 9, 2007 22:17

Apologies for the late response Eugene. HPMPI works very nicely for large cases as well using the instructions you pointed to earlier. Thanks Henry and Mattijs!

ali84 August 5, 2009 16:07

Still there is a problem
 
As you may know, in OpenFOAM-1.5-dev and OpenFOAM-1.6 the file mplibHPMPI has been added to the wmake/rules/$WM_ARCH directory to support HP-MPI, and it includes the instructions that eugene linked. I have compiled Pstream using those settings (mplibHPMPI), but I still get the same error that msrinath80 reported, for a mesh of 3 million grid points (or more) on more than 4 CPUs (1 node).


All times are GMT -4.