CFD Online Discussion Forums

CFD Online Discussion Forums (http://www.cfd-online.com/Forums/)
-   OpenFOAM (http://www.cfd-online.com/Forums/openfoam/)
-   -   Almost have my cluster running openfoam, but not quite... (http://www.cfd-online.com/Forums/openfoam/74061-almost-have-my-cluster-running-openfoam-but-not-quite.html)

bjr March 23, 2010 17:14

Almost have my cluster running openfoam, but not quite...
 
I present two "successful cases" to describe where I am. Given these, does anybody spot the error in Case 3?

Case 1: "Hello world-ish" on a 6-node by 2-cpu cluster:

~/.bashrc is not yet including /root/OpenFOAM/OpenFOAM-1.6/etc/bashrc

host1:~ # /usr/lib64/mpi/gcc/openmpi/bin/mpirun --mca btl openib,self -machinefile list.txt -np 12 test/comm_size_with_id.out
Process 1 on host2 out of 12
Process 7 on host2 out of 12
Process 2 on host3 out of 12
Process 4 on host5 out of 12
Process 8 on host3 out of 12
Process 10 on host5 out of 12
Process 3 on host4 out of 12
Process 9 on host4 out of 12
Process 5 on host6 out of 12
Process 11 on host6 out of 12
Process 6 on host1 out of 12
Process 0 on host1 out of 12

Case 2: As close as I can get to OpenFOAM working in parallel:

~/.bashrc including OpenFOAM specific environment variables as set by /root/OpenFOAM/OpenFOAM-1.6/etc/bashrc

host1:~ # which mpirun
/root/OpenFOAM/ThirdParty-1.6/openmpi-1.3.3/platforms/linux64GccDPOpt/bin/mpirun

host1:~ # mpirun -np 12 simpleFoam -parallel -case inletProfile/

/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.6                                   |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build : 1.6-f802ff2d6c5a
Exec : simpleFoam -parallel -case inletProfile/
Date : Mar 23 2010
Time : 12:43:31
Host : host1
PID : 25231
Case : ./inletProfile
nProcs : 12
Slaves :
11
(
host1.25232
host1.25233
host1.25234
host1.25235
host1.25236
host1.25237
host1.25238
host1.25239
host1.25240
host1.25241
host1.25242
)

...........
...........

It goes on to run correctly, but with all 12 processes on a single machine.

Case 3: Case 2, but including "-machinefile list.txt":

host1:~ # mpirun -np 12 -machinefile list.txt simpleFoam -parallel -case inletProfile/

zsh:1: command not found: orted
A daemon (pid 25297) died unexpectedly with status 127 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
zsh:1: command not found: orted
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
192.168.1.102 - daemon did not report back when launched
192.168.1.103 - daemon did not report back when launched
192.168.1.104 - daemon did not report back when launched
192.168.1.105 - daemon did not report back when launched
192.168.1.106 - daemon did not report back when launched
zsh:1: command not found: orted
host1:~ # zsh:1: command not found: orted
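A quick way to see the root cause locally (a sketch, not from the thread): compare the PATH an interactive shell has with the PATH a bare non-interactive shell starts with. mpirun launches its orted helper on the remote nodes through a non-interactive shell, and if that shell never sources the file that adds Open MPI to PATH (zsh, as here, never reads ~/.bashrc), "command not found: orted" is the result.

```shell
# Sketch: interactive vs non-interactive PATH.
interactive_path="$PATH"
# env -i strips the inherited environment, roughly mimicking the bare
# shell a remote launch starts with before any rc file is sourced
noninteractive_path="$(env -i bash -c 'echo $PATH')"
echo "interactive:     $interactive_path"
echo "non-interactive: $noninteractive_path"
```

If the Open MPI bin directory appears only in the first line, remote launches will fail exactly this way.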

bjr March 23, 2010 17:33

Getting closer, per...

http://www.open-mpi.org/faq/?categor...mpilers-static

So I can run interactively, but not in a non-interactive login, per...

host1:~ # ssh host5 $HOME/sum_serial.out
/root/sum_serial.out: error while loading shared libraries: libmpi_cxx.so.0: cannot open shared object file: No such file or directory
host1:~ #

bjr March 23, 2010 17:41

The solution is to change the shell back to bash and copy the .bashrc to every machine so that it's sourced when you log in. You can change the shell by running
chsh -s /bin/bash root
chsh -s /bin/bash admin
and then copy .bashrc to /root on every node (assuming you run mpirun as root).
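The copy step above can be scripted over the node list (a dry-run sketch; the hostnames mirror the 6-node cluster in this thread, and in practice they would come from list.txt — replace the echo with the actual scp to perform the copy):

```shell
# Dry run: print the per-node copy command instead of executing scp
nodes="host1 host2 host3 host4 host5 host6"
for node in $nodes; do
  echo "scp /root/.bashrc root@${node}:/root/.bashrc"
done
```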

bjr March 23, 2010 19:05

This works...
mpirun -np 12 -machinefile list.txt simpleFoam -parallel -case inletProfile/ > log.simpleFoam.Parallelopenib &

But this (with the '--mca btl, openib,self) doesn't...
mpirun --mca btl openib,self -np 12 -machinefile list.txt simpleFoam -parallel -case inletProfileMonday/ > log.simpleFoam.Parallelopenib &

Do you know of a way to tell if my simulation is using infiniband for sure? I was thinking of pulling some ethernet cables and seeing what happens as a brute force approach. It's configured in such a way that I can picture it being able to us either one.

That error message is this a bunch of times...

"""
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: host3
Framework: btl
Component: openib
--------------------------------------------------------------------------
[host3:08394] mca: base: components_open: component pml / csum open function failed
--------------------------------------------------------------------------
"""

bjr March 23, 2010 19:06

Turns out that it was using the ethernet side without these flags.

bjr March 24, 2010 11:35

Looks like we have the same problem...
 
http://www.cfd-online.com/Forums/ope...tml#post251459

jsquyres March 24, 2010 11:37

Quote:

Originally Posted by bjr (Post 251333)
Case 1: "Hello world-ish" on a 6-node by 2-cpu cluster:

~/.bashrc is not yet including /root/OpenFOAM/OpenFOAM-1.6/etc/bashrc

host1:~ # /usr/lib64/mpi/gcc/openmpi/bin/mpirun --mca btl openib,self -machinefile list.txt -np 12 test/comm_size_with_id.out

Note that there is a subtle difference between

/path/to/mpirun ...

and

mpirun ...

In the former case, Open MPI will add itself to PATH and LD_LIBRARY_PATH on all the remote nodes (regardless of what you do in your .bashrc). In the latter case, it will not (meaning: you probably should have added Open MPI to your PATH / LD_LIBRARY_PATH in your .bashrc).

Note that the "/path/to/mpirun ..." form is a shortcut for the --prefix command line option to mpirun. See Open MPI's mpirun(1) man page for details.
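As a concrete sketch using the install path reported by "which mpirun" earlier in this thread, the two equivalent forms look like this (printed as a dry run; nothing is executed here):

```shell
# The ThirdParty path is the one `which mpirun` reported earlier in the thread
prefix=/root/OpenFOAM/ThirdParty-1.6/openmpi-1.3.3/platforms/linux64GccDPOpt
# Full-path form: Open MPI forwards its PATH/LD_LIBRARY_PATH to remote nodes
echo "$prefix/bin/mpirun -np 12 -machinefile list.txt simpleFoam -parallel -case inletProfile/"
# Equivalent explicit form using --prefix
echo "mpirun --prefix $prefix -np 12 -machinefile list.txt simpleFoam -parallel -case inletProfile/"
```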

Quote:

Originally Posted by bjr (Post 251333)
Case 2: As close as I can get to OpenFOAM working in parallel:

~/.bashrc including OpenFOAM specific environment variables as set by /root/OpenFOAM/OpenFOAM-1.6/etc/bashrc

host1:~ # mpirun -np 12 simpleFoam -parallel -case inletProfile/

You didn't specify a -machinefile here, and I'm assuming you're not using a scheduler. So Open MPI probably had no other choice other than to assume that you wanted all 12 processes on the same host (i.e., it had no other list of hosts to draw from other than "localhost").

Quote:

Originally Posted by bjr (Post 251333)
Case 3: Case 2, but including "-machinefile list.txt":

host1:~ # mpirun -np 12 -machinefile list.txt simpleFoam -parallel -case inletProfile/

zsh:1: command not found: orted

I'm *guessing* that you had PATH / LD_LIBRARY_PATH set properly to find Open MPI and its executables+libraries in the shell that you were launching from on host1. But your .bashrc didn't set these things for the other hosts listed in the list.txt file. Hence, when Open MPI tried to launch on the remote nodes, it couldn't find its helper executable "orted", and things went downhill from there.

If you had used the /path/to/mpirun... form, this case probably would have worked.

Or you could propagate your .bashrc out to all nodes so that all nodes can find Open MPI's executables+libraries, and that should work, too (which, by a later post, I think you did -- but I wanted to explain just so that you knew *why* it worked).

Make sense?

jsquyres March 24, 2010 11:48

Quote:

Originally Posted by bjr (Post 251343)
But this (with the '--mca btl, openib,self) doesn't...
mpirun --mca btl openib,self -np 12 -machinefile list.txt simpleFoam -parallel -case inletProfileMonday/ > log.simpleFoam.Parallelopenib &

Do you know of a way to tell for sure whether my simulation is using InfiniBand?

With the above command line, you are telling Open MPI to use *only* the "openib" and "self" BTL plugins. BTL = Byte Transfer Layer -- Open MPI's lowest layer point-to-point transport for these kinds of networks. So the syntax means this:
  • --mca: the next two parameters will set an MCA parameter value
  • btl: the MCA parameter name. The "btl" parameter specifies exactly which plugins to use (or not use).
  • openib,self: The "btl" parameter takes a comma-delimited list of plugins to use. The openib BTL plugin is what you use for OpenFabrics networks (e.g., InfiniBand or iWARP). The self plugin is for process-local loopback (i.e., if an MPI process sends to itself).
In this case, you're only allowing Open MPI to use IB for communications. So if it can't reach a peer MPI process via IB, it'll error out.

Hence, Open MPI doesn't (yet) give a positive ACK that you're using IB -- it gives a negative ACK if it can't. Enough people have asked for a positive ACK that we're likely to add it in the v1.5 series sometime.
BTW, you probably actually want to use "--mca btl openib,sm,self". This allows Open MPI to use shared memory for on-node communication (which can be faster than forcing it to loop back through your IB adapters). I don't know enough about OpenFOAM to say whether this will provide an overall performance boost or not.
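Putting that together with the command from earlier in the thread, the suggested invocation would be (shown as a dry run; nothing is executed here):

```shell
# openib = InfiniBand/iWARP, sm = shared memory for on-node peers,
# self = process-local loopback
btls="openib,sm,self"
echo mpirun --mca btl "$btls" -np 12 -machinefile list.txt \
     simpleFoam -parallel -case inletProfile/
```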
Quote:

Originally Posted by bjr (Post 251343)
I was thinking of pulling some ethernet cables and seeing what happens as a brute force approach. It's configured in such a way that I can picture it being able to use either one.

Don't do that. :-)

Open MPI uses TCP for setup and teardown, even if you're using OpenFabrics transports (IB or iWARP) for MPI communications.


Quote:

Originally Posted by bjr (Post 251343)
"""
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: host3
Framework: btl
Component: openib
--------------------------------------------------------------------------
[host3:08394] mca: base: components_open: component pml / csum open function failed
--------------------------------------------------------------------------
"""

Hmm. That's darn weird, actually.

Did you happen to install one version of Open MPI and then install a different version over it? You *may* have Open MPI plugins from different versions in the same installation tree that don't play nicely with each other.

If this is the case, try fully uninstalling Open MPI and then manually inspecting the $prefix/lib/openmpi dir to ensure that there are no plugins left over from a prior Open MPI installation. Then install Open MPI again. Let me know if that works.

You can also try:

rm /open/mpi/installation/tree/lib/openmpi/mca_pml_csum.*

(You probably won't use the CSUM PML, so it's safe to either remove it or move it to a different location where Open MPI won't find it.) I'm not optimistic that that will fix it, but it could (if csum is from a prior Open MPI install).
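A quick way to see what plugins an installation actually contains (a sketch: the default prefix below is a placeholder — substitute your actual --prefix, e.g. the $MPI_ARCH_PATH used by OpenFOAM's ThirdParty build):

```shell
# List any MCA plugins under the Open MPI prefix; leftovers from an older
# install would show up here alongside the current version's plugins
prefix="${MPI_ARCH_PATH:-/opt/openmpi}"
plugins=$(ls "$prefix/lib/openmpi" 2>/dev/null | grep mca_ || true)
if [ -n "$plugins" ]; then
  echo "plugins found:"; echo "$plugins"
else
  echo "no plugins found under $prefix/lib/openmpi"
fi
```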

bjr March 24, 2010 11:50

Quote:

Originally Posted by jsquyres (Post 251462)
Note that there is a subtle difference between

/path/to/mpirun ...

and

mpirun ...

In the former case, Open MPI will add itself to PATH and LD_LIBRARY_PATH on all the remote nodes (regardless of what you do in your .bashrc). In the latter case, it will not (meaning: you probably should have added Open MPI to your PATH / LD_LIBRARY_PATH in your .bashrc).

I think sourcing /root/OpenFOAM/OpenFOAM-1.6/etc/bashrc on all nodes, as I did, accomplishes this, which is why I missed the difference.

Quote:

Originally Posted by jsquyres (Post 251462)

You didn't specify a -machinefile here, and I'm assuming you're not using a scheduler. So Open MPI probably had no other choice other than to assume that you wanted all 12 processes on the same host (i.e., it had no other list of hosts to draw from other than "localhost").

I was trying to emphasize my breaking point, but it turns out the default zsh shell was preventing my distributed .bashrc's from being seen.

Quote:

Originally Posted by jsquyres (Post 251462)

I'm *guessing* that you had PATH / LD_LIBRARY_PATH set properly to find Open MPI and its executables+libraries in the shell that you were launching from on host1. But your .bashrc didn't set these things for the other hosts listed in the list.txt file. Hence, when Open MPI tried to launch on the remote nodes, it couldn't find its helper executable "orted", and things went downhill from there.

If you had used the /path/to/mpirun... form, this case probably would have worked.

Or you could propagate your .bashrc out to all nodes so that all nodes can find Open MPI's executables+libraries, and that should work, too (which, by a later post, I think you did -- but I wanted to explain just so that you knew *why* it worked).

Make sense?

Yes. Not noticing that the other nodes were starting with zsh prevented me from being where I thought I already was.

Thanks for the great clarifications on my untidy explanations.

bjr March 24, 2010 11:56

Quote:

Originally Posted by jsquyres (Post 251468)

You probably actually want to use "--mca btl openib,sm,self". This allows Open MPI to use shared memory for on-node communication (which can be faster than forcing it to loop back through your IB adapters). I don't know enough about OpenFOAM to say whether this will provide an overall performance boost or not.

Good to know and worth a try.

Quote:

Originally Posted by jsquyres (Post 251468)
Hmm. That's darn weird, actually.

Did you happen to install one version of Open MPI and then install a different version over it? You *may* have Open MPI plugins from different versions in the same installation tree that don't play nicely with each other.

I did... the version I'm using for my hello world in my earlier posts was there before this OpenFOAM version.

Quote:

Originally Posted by jsquyres (Post 251468)

If this is the case, try fully uninstalling Open MPI and then manually inspecting the $prefix/lib/openmpi dir to ensure that there are no plugins left over from a prior Open MPI installation. Then install Open MPI again. Let me know if that works.

You can also try:

rm /open/mpi/installation/tree/lib/openmpi/mca_pml_csum.*

(You probably won't use the CSUM PML, so it's safe to either remove it or move it to a different location where Open MPI won't find it.) I'm not optimistic that that will fix it, but it could (if csum is from a prior Open MPI install).

I'm unclear about this last point... I don't really understand what those files are, or whether I should do that in lieu of, before, or after attempting to install Open MPI again?

Thanks again.

jsquyres March 24, 2010 13:12

Quote:

Originally Posted by bjr (Post 251472)
I'm unclear about this last point... I don't really understand what those files are, or whether I should do that in lieu of, before, or after attempting to install Open MPI again?

The easiest would be if you installed Open MPI into its own prefix that isn't shared with anything else. Perhaps:

Code:

./configure --prefix=/opt/openmpi ...
make -j 4 install

Then you can just "rm -rf /opt/openmpi" without fear of accidentally removing anything else (I do this all the time when developing Open MPI). Then you can install a new copy of Open MPI into /opt/openmpi and be sure that it's 100% clean.

The $prefix/lib/openmpi/* files (or, more specifically, $libdir/openmpi/*) that I was referring to were Open MPI's plugins. Each plugin has at least 1 file (sometimes 2, depending on your installation) in $libdir/openmpi. For example, $libdir/openmpi/mca_btl_openib.so. That's the BTL openib plugin.

The CSUM PML plugin is the Point-to-point messaging layer plugin named CSUM. The PML is the layer right behind MPI_SEND and friends. Specifically, MPI_SEND calls the back-end PML send function to actually effect the send (and so on). Think of the PML as the engine that drives all the MPI messaging semantics (communicator and tag matching, etc.).

CSUM is a PML that does all the normal sending and receiving, but also does checksums on the data to ensure data integrity. This is a Good Thing, but it definitely imposes a performance overhead. Most transports provide their own data reliability checking (e.g., TCP), so CSUM typically isn't worth it. But some transports can have problems with end-to-end reliability -- that's why we developed CSUM.

Normally, you should probably use the "ob1" PML. OB1 and CSUM are identical except that CSUM does the checksumming. Specifically, both OB1 and CSUM use BTL plugins underneath the covers to effect point-to-point transmission and reception. Hence, both CSUM and OB1 can use the openib BTL.

That being said, to be totally clear, while you can use multiple BTL plugins in a single run (e.g., openib, sm, and self), you can only use ONE PML at a time. So you'll use CSUM *or* OB1 -- not both. So my point before was that if CSUM was somehow mucked up on your system, you could remove the CSUM .so plugin file and then it wouldn't ever be used. But I wasn't confident that that would fix your problem.
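If you want to rule the CSUM plugin out without deleting any files, Open MPI also lets you pin the PML on the command line (a dry-run sketch; nothing is executed here):

```shell
# Select the plain OB1 PML explicitly; only one PML is ever active per run,
# so a mucked-up CSUM plugin would never be considered
pml=ob1
echo mpirun --mca pml "$pml" --mca btl openib,sm,self -np 12 \
     -machinefile list.txt simpleFoam -parallel -case inletProfile/
```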

bjr March 24, 2010 17:01

Think I just got it figured out...

host2:~ # which ompi_info
/root/OpenFOAM/ThirdParty-1.6/openmpi-1.3.3/platforms/linux64GccDPOpt/bin/ompi_info
host2:~ # ompi_info | grep openib
MCA btl: openib (MCA v2.0, API v2.0, Component v1.3.3)

Turns out it was in the Allwmake file.

The relevant lines being changed to (for my configuration)...

./configure \
--prefix=$MPI_ARCH_PATH \
--disable-mpirun-prefix-by-default \
--disable-orterun-prefix-by-default \
--enable-shared --disable-static \
--disable-mpi-f77 --disable-mpi-f90 --disable-mpi-cxx \
--disable-mpi-profile
# These lines enable Infiniband support
#--with-openib=/usr/local/ofed \
#--with-openib-libdir=/usr/local/ofed/lib64
--with-openib=/usr/include/infiniband

jsquyres March 24, 2010 17:10

Quote:

Originally Posted by bjr (Post 251514)
Turns out it was in the Allwmake file.

I'm unfamiliar with Allwmake.

Quote:

Originally Posted by bjr (Post 251514)
./configure \
--prefix=$MPI_ARCH_PATH \
--disable-mpirun-prefix-by-default \
--disable-orterun-prefix-by-default \
--enable-shared --disable-static \
--disable-mpi-f77 --disable-mpi-f90 --disable-mpi-cxx \
--disable-mpi-profile
# These lines enable Infiniband support
#--with-openib=/usr/local/ofed \
#--with-openib-libdir=/usr/local/ofed/lib64
--with-openib=/usr/include/infiniband

FWIW, the --disable-mpirun-prefix-by-default and --disable-orterun-prefix-by-default aren't necessary; they're both redundant (i.e., one is a synonym for the other) and disabling that option is the default.

Additionally, --enable-shared and --disable-static is also the default.

The --with-openib line doesn't look quite right, but it probably squeaks by the tests we have in configure. Meaning: if you have OFED installed with the default /usr prefix, then Open MPI should be able to find OFED's headers and libraries with no extra help (because they're in the compiler's and linker's default search paths). So you should be able to do just --with-openib (i.e., not list any dir).

But hey, if it works... :-)

bjr March 24, 2010 17:13

Thanks Jeff... this Allwmake is the script that sets up all of our OpenFOAM "3rd party" tools. So the points you make are relevant community-wide and I wonder if I shouldn't try to make those changes and get them checked into the OpenFOAM SVN repository... as the best way to communicate this.

jsquyres March 24, 2010 17:23

Quote:

Originally Posted by bjr (Post 251514)
./configure \
--prefix=$MPI_ARCH_PATH \
--disable-mpirun-prefix-by-default \
--disable-orterun-prefix-by-default \
--enable-shared --disable-static \
--disable-mpi-f77 --disable-mpi-f90 --disable-mpi-cxx \
--disable-mpi-profile
# These lines enable Infiniband support
#--with-openib=/usr/local/ofed \
#--with-openib-libdir=/usr/local/ofed/lib64
--with-openib=/usr/include/infiniband

Oh -- I didn't see the problem until after I posted.

There's no \ after the --disable-mpi-profile line, so it probably ignored your --with-openib line. But then again, if OFED is installed in compiler/linker default locations, the --with-openib option is not strictly necessary because OMPI will find that stuff by default (and therefore build support for it).

Specifically, we treat --with-<foo> options in OMPI's configure thusly:

http://www.open-mpi.org/faq/?categor...#default-build

Hope that helps...
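For reference, a corrected version of that fragment might look like the following — a sketch for this configuration only: the trailing backslash is restored after --disable-mpi-profile, the redundant default options are dropped, and --with-openib is left bare since OFED's headers and libraries are in the compiler's default search paths:

```shell
./configure \
    --prefix=$MPI_ARCH_PATH \
    --enable-shared --disable-static \
    --disable-mpi-f77 --disable-mpi-f90 --disable-mpi-cxx \
    --disable-mpi-profile \
    --with-openib
```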

bjr March 24, 2010 17:25

I've actually fixed that, but you make a good point... I think simply recompiling is what solved it, not the path reference... because it did work as shown above.

bjr July 19, 2010 20:01

OpenFOAM working on our cluster!
 
I did get this working, and would be happy to try to address similar problems with anybody in the future.

