CFD Online Discussion Forums

CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   OpenFOAM Running, Solving & CFD (https://www.cfd-online.com/Forums/openfoam-solving/)
-   -   Problem with openmpi (https://www.cfd-online.com/Forums/openfoam-solving/59029-problem-openmpi.html)

cricke November 7, 2007 08:42

Ok, I have configured .ssh with an 'authorized_keys' file and I can now ssh into the server node without a password, though not in the reverse direction for some reason. When I ssh into the server, all the OpenFOAM env variables are set automatically via .bashrc, so that seems to work properly. Still, mpirun never starts after executing

$OPENMPI_ARCH_PATH/bin/mpirun --hostfile machines -np 4 simpleFoam $HOME VAFAB_multi -parallel

Nothing happens.

The /bin/true exists but

bash: bin/true: can not find file

Still I believe some environment files are missing...

/C
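A quick way to verify the passwordless login in both directions is a probe that cannot hang at a password prompt. This is a generic sketch, not from the posts above: the hostnames are placeholders, and it only assumes a standard OpenSSH client (the public key must sit in `~/.ssh/authorized_keys` on the target machine).

```shell
#!/bin/sh
# Probe whether passwordless ssh works: BatchMode=yes makes ssh fail
# immediately instead of asking for a password, so a broken key setup
# shows up as a clean FAIL rather than a hang.
check_login() {
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null; then
        echo "ok: $1"
    else
        echo "FAIL: $1"
    fi
}
check_login server.example   # run on the client, to test client -> server
check_login client.example   # run on the server, to test server -> client
```

Both directions matter here, because mpirun launches daemons on the remote nodes and those daemons connect back.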

olesen November 7, 2007 09:38

Quote:

The /bin/true exists but

bash: bin/true: can not find file
When I issue 'bin/true' I also get a similar message

-bash: bin/true: No such file or directory

With '/bin/true' however, it works fine.

Quote:

Still, the mpirun never starts after executing

$OPENMPI_ARCH_PATH/bin/mpirun --hostfile machines -np 4 simpleFoam $HOME VAFAB_multi -parallel
Okay, but have you made sure that all the subcomponents are really working?

Check that it works on the same machine:
$OPENMPI_ARCH_PATH/bin/mpirun -np 4 /bin/hostname

Add '--debug-daemons' and see what you find.
It might be time to find someone closer to your location (eg sysadmin) and have them take a look.
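The checks above can be lined up as an escalation ladder. This is a sketch of that idea, not a verified run; 'machines' is the hostfile from the command above, and steps 1-3 are left as comments because they need a working MPI installation:

```shell
#!/bin/sh
# Work upward from the smallest piece that can fail; each step must
# succeed before the next one is worth trying.
if command -v mpirun >/dev/null 2>&1; then
    echo "step 0 ok: mpirun found at $(command -v mpirun)"
else
    echo "step 0 failed: mpirun is not in PATH for this shell"
fi
# step 1: local-only parallel launch, no solver, no network:
#   mpirun -np 4 /bin/hostname
# step 2: the same launch across the nodes in the hostfile:
#   mpirun --hostfile machines -np 4 /bin/hostname
# step 3: if step 2 hangs silently, make the daemons talk:
#   mpirun --hostfile machines -np 4 --debug-daemons /bin/hostname
```

Only once /bin/hostname runs on every node is it worth launching the actual solver.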

cricke November 7, 2007 10:52

Sysadmins within Linux only exist in heaven; WINDOWS has bewitched them all... but thanks for all your support!

/Christofer

mer March 11, 2008 11:24

Hi!
I have a parallel execution problem with OpenMPI. I run an interFoam case on two PCs under Fedora 7 and I get some errors during execution. Both PCs have the case; I searched the OpenMPI forum, but the recommendations don't work for my case.
Here are the details:

- For the eth0 in PC1
/sbin/ifconfig
eth0 Link encap:Ethernet HWaddr 00:1D:92:09:A9:BE
inet addr:192.168.0.2 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::21d:92ff:fe09:a9be/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:378 errors:0 dropped:0 overruns:0 frame:0
TX packets:92 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:63038 (61.5 KiB) TX bytes:18064 (17.6 KiB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:14355 errors:0 dropped:0 overruns:0 frame:0
TX packets:14355 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:90409124 (86.2 MiB) TX bytes:90409124 (86.2 MiB)

virbr0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:40 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:8700 (8.4 KiB)

- For the eth0 in PC2
/sbin/ifconfig
eth0 Link encap:Ethernet HWaddr 00:00:21:0B:C6:2B
inet addr:192.168.0.3 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::200:21ff:fe0b:c62b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:113 errors:0 dropped:0 overruns:0 frame:0
TX packets:105 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:17546 (17.1 KiB) TX bytes:19208 (18.7 KiB)
Interrupt:5 Base address:0x4000

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:19960 errors:0 dropped:0 overruns:0 frame:0
TX packets:19960 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:142422400 (135.8 MiB) TX bytes:142422400 (135.8 MiB)

virbr0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:35 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:8390 (8.1 KiB)

- For the execution:

(1)
[mer@merrouche2 ~]$ mpirun --mca pls_rsh_agent "ssh : rsh" --hostfile /home/mer/machinefile -np 2 interFoam /home/mer/OpenFOAM/mer-1.4.1/run/tutorials/interFoam case_2 -parallel

MPI Pstream initialized with:
floatTransfer : 1
nProcsSimpleSum : 0
scheduledTransfer : 0

/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.4.1                                 |
|   \\  /    A nd           | Web:      http://www.openfoam.org               |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/

Exec : interFoam /home/mer/OpenFOAM/mer-1.4.1/run/tutorials/interFoam case_2 -parallel
[0] Date : Mar 11 2008
[0] Time : 15:17:23
[0] Host : merrouche2
[0] PID : 4576
[1] Date : Mar 11 2008
[0] Root : /home/mer/OpenFOAM/mer-1.4.1/run/tutorials/interFoam
[0] Case : case_2
[0] Nprocs : 2
[0] Slaves :
[0] 1
[0] (
[0] merrouche3.3216
[0] )
[0]
Create time

[1] Time : 15:20:19
[1] Host : merrouche3
[1] PID : 3216
[1] Root : /home/mer/OpenFOAM/mer-1.4.1/run/tutorials/interFoam
[1] Case : case_2
[1] Nprocs : 2
[merrouche2][0,1,0][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111
[merrouche3][0,1,1][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111




(2)
[mer@merrouche2 ~]$ mpirun --mca pls_rsh_agent "rsh : ssh" --hostfile /home/mer/machinefile -np 2 interFoam /home/mer/OpenFOAM/mer-1.4.1/run/tutorials/interFoam case_2 -parallel
merrouche3: Connection refused
[merrouche2:05257] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[merrouche2:05257] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
[merrouche2:05257] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[merrouche2:05257] ERROR: A daemon on node merrouche3 failed to start as expected.
[merrouche2:05257] ERROR: There may be more information available from
[merrouche2:05257] ERROR: the remote shell (see above).
[merrouche2:05257] ERROR: The daemon exited unexpectedly with status 1.
[merrouche2:05257] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[merrouche2:05257] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.


Please help me

mike_jaworski March 11, 2008 13:06

Merrouche,
Do you have any firewalls running on either machine? My understanding of OpenMPI is that it doesn't use a fixed range of ports, so it isn't possible to configure a firewall to let it through; or if there is a way, no one seems to know how. I'd suggest testing this by disabling your firewalls and trying mpirun again.

Good Luck,
Mike J.
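For what it's worth, errno=111 in the log above is ECONNREFUSED: the packets reached the other machine, but nothing accepted the connection on that port, and a firewall that rejects (rather than silently drops) traffic produces the same errno. The identical failure can be provoked locally by dialing a port with no listener; port 1 below is purely an example:

```shell
# Reproduce errno 111 (ECONNREFUSED) with bash's /dev/tcp redirection:
# nothing listens on 127.0.0.1:1, so the connect() is refused.
bash -c 'exec 3<>/dev/tcp/127.0.0.1/1' 2>&1 | grep -m1 -o 'Connection refused'
```

On a Fedora system of that era the quick firewall test would be, as far as I recall, `/sbin/service iptables stop` as root on both machines, re-running mpirun, then `service iptables start` to restore it.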

mer March 11, 2008 14:51

Hi Mike!
Thanks for your quick response.
The firewalls are disabled and I can ssh to the machines. Also, my case works on each machine alone using mpirun.
Some more information:
- If I type ompi_info, the MCA pls_rsh_agent is rsh, and I can't rsh to the machines; for this reason there is the second case: "merrouche3: Connection refused".

- But I can ssh to the machines, so I ask whether my expression --mca pls_rsh_agent "ssh : rsh" is correct and forces OpenMPI to use ssh.

Thanks
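As far as I can tell from the Open MPI 1.2 documentation, `"ssh : rsh"` is valid syntax and is in fact the built-in default for pls_rsh_agent, meaning "use ssh if found, otherwise fall back to rsh"; so if ompi_info reports plain rsh, something has overridden that default. One hedged option is to pin the agent to ssh once in Open MPI's per-user parameter file instead of on every command line:

```shell
# Persist the launcher choice in Open MPI's per-user MCA parameter file
# (~/.openmpi/mca-params.conf is the documented per-user location).
mkdir -p ~/.openmpi
grep -q '^pls_rsh_agent' ~/.openmpi/mca-params.conf 2>/dev/null \
    || echo 'pls_rsh_agent = ssh' >> ~/.openmpi/mca-params.conf
```

After this, `ompi_info --param pls rsh` should report ssh as the agent for every run.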

mer March 15, 2008 07:30

Hi!
Is there anyone who can help me resolve my problem? Until now I can't run my case with mpirun under OF 1.4.1; the errors are the same as mentioned in my previous posts.
I tried the hints posted in the forum without any success. I created a new user and installed a new OF 1.4.1 version, but the problem remains.
Is there any relation between LAM/MPI and OpenMPI? I ask this question because I installed a lam-7.1.3 version on my PCs before installing OF 1.4.1.

I really NEED your help.

Thanks

DSpreitz April 20, 2009 15:55

OpenMPI on thin client
 
3 Attachment(s)
Hi forum,
I have a problem with OpenMPI and OF 1.5. I have read lots of posts in this forum and sorted out a lot of problems along the way (thanks for all the helpful advice already posted here), but now I am really stuck.

As a little background, I'll give you the big picture of my intentions:
Over night and at the weekend I want to use the CAD workstations in the office to run OF in parallel. Since I can't install anything locally on these workstations, I have to use them as thin clients and boot them over the GBit network (PXE, DHCP, atftp, NFS, ...). On the server that supplies the boot image (derived from pelicanHPC) to the clients, I run Ubuntu and installed OF 1.5 with Jure's script.
At this point I can successfully boot one client over the network and then mount the server's home directory (where OF resides) using NFS. When I connect to the client using ssh and the passwordless login, I can run icoFoam and the cavity case on the client. I can run the same case on the server too.
To check the openMPI installation, I ran the 'hello world' example that Mark suggested in my home directory (which is also mounted on the client) on the server by typing:
wget http://icl.cs.utk.edu/open-mpi/paper...p-2006/hello.c
mpicc hello.c -o hello
mpirun -np 4 ./hello


which resulted in:
Hello, World. I am 0 of 4
Hello, World. I am 1 of 4
Hello, World. I am 2 of 4
Hello, World. I am 3 of 4


To check if I can run the hello world example on server and client I tried:
mpirun -np 4 -host 10.11.12.1 -host 10.11.12.2 ./hello
and got the same result as posted above. I verified, that the example actually ran on both machines by specifying the -d option in the above command.

So far so good. OF seems to work, networking between the machines is working, mpi is running.
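For reference, the 'machines' hostfile used below can be as simple as the following; the contents are an assumption based on the IPs from the hello-world test, and the optional 'slots' entry caps how many ranks land on each node:

```text
# machines - one line per node
10.11.12.1 slots=1
10.11.12.2 slots=1
```

With -np 2 and one slot per node, Open MPI should place exactly one rank on each machine.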

However, if I go and try:
mpirun -np 2 -hostfile machines icoFoam /home/user/OpenFOAM/user-1.5/run/tutorials/icoFoam/cavity -parallel > log_mpi_icoFOAM 2>&1
I get:
cat log_mpi_icoFOAM
--------------------------------------------------------------------------
Failed to find the following executable:

Host: debian
Executable: icoFoam

Cannot continue.
--------------------------------------------------------------------------

mpirun noticed that job rank 0 with PID 8624 on node 10.11.12.1 exited on signal 15 (Terminated).

From the debug output of the above command (see attachment) I conclude that all my environment variables are set correctly and also exported over mpi.
Yet, parallel execution on the client fails, although I can run the same case on the client directly. Very strange to me.

Did anybody experience a similar problem or can at least give me a hint or an idea where to start digging for a solution? Any help would be much appreciated.

P.S.:
I don't know if it has anything to do with my problem, but I attached the output of ldd -v icoFoam, which I ran on the client and on the server. The libraries are different, but as I said before, icoFoam runs on the client if I launch it directly through ssh.

DSpreitz April 20, 2009 16:45

Ok, I have to reply to myself:
My .bashrc file contained:
[ -z "$PS1" ] && return
which means it did not load any OF related environment variables at non-interactive (openMPI & SSH) login.

By replacing this line with:
if [ -z "$PS1" ]; then
source /home/user/OpenFOAM/OpenFOAM-1.5/etc/bashrc
fi
I changed that behaviour. Now the necessary OF environment variables are loaded at non-interactive login and my mpi runs are executed correctly.

Maybe this helps somebody else.

Dominic
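An alternative layout, sketched here with a throwaway file and a marker variable instead of the real OpenFOAM bashrc, is to keep the interactive-only guard but do the environment setup above it, so both interactive and non-interactive shells get it:

```shell
# A .bashrc layout that serves both shell types: the OpenFOAM setup
# comes BEFORE the interactive-only guard. FOAM_ENV_LOADED stands in
# for the real 'source .../etc/bashrc' line.
cat > /tmp/bashrc_demo <<'EOF'
# source /home/user/OpenFOAM/OpenFOAM-1.5/etc/bashrc   # (real setup here)
export FOAM_ENV_LOADED=yes
[ -z "$PS1" ] && return      # interactive-only section follows
EOF
# A non-interactive shell (no PS1 set) still picks up the variable:
env -u PS1 bash -c '. /tmp/bashrc_demo; echo "FOAM_ENV_LOADED=$FOAM_ENV_LOADED"'
# prints: FOAM_ENV_LOADED=yes
```

The `return` then only skips the interactive-only part (prompt, aliases), not the environment setup that mpirun's remote shells need.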

holzmichel May 14, 2009 02:39

Hello,

I am trying to run OF-1.5-dev in parallel without any success. I changed my .bashrc as DSpreitz describes, I installed OF on all PCs, I set up NFS, and so on ...

When I try to run icoFoam, for example, the following message appears in my terminal:

michel@Linux-K:~/OpenFOAM/michel-1.5-dev/run/tutorials/icoFoam/cavity$ /home/michel/OpenFOAM/OpenFOAM-1.5-dev/applications/bin/linuxGccDPOpt/icoFoam: error while loading shared libraries: libfiniteVolume.so: cannot open shared object file: No such file or directory

This error comes every time I use the command:

mpirun --hostfile <machines> -np 2 /home/michel/OpenFOAM/OpenFOAM-1.5-dev/applications/bin/linuxGccDPOpt/icoFoam $FOAM_RUN/tutorials/icoFoam cavity -parallel > log &

but when I use this command:
mpirun --hostfile <machines> -np 2 icoFoam $FOAM_RUN/tutorials/icoFoam cavity -parallel > log &
the following error appears:
mpirun --hostfile /home/michel/machines -np 2 icoFoam /home/michel/OpenFOAM/michel-1.5-dev/run/tutorials/icoFoam cavity -parallel > log
--------------------------------------------------------------------------
Failed to find the following executable:

Host: ubuntu
Executable: interFoam

Cannot continue.
--------------------------------------------------------------------------
mpirun noticed that job rank 0 with PID 14312 on node 192.168.1.82 exited on signal 15 (Terminated)

That's really strange, because as I said OF is on the other PC too. However, I tried this command with the full path of mpirun:

/home/michel/OpenFOAM/ThirdParty/openmpi-1.2.6/platforms/linuxGccDPOpt/bin/mpirun --hostfile ... and so on. No error appears, but my PC does nothing.

Does anybody have suggestions for me? I need the parallel mode because computing my cases takes more and more time.

Thanks

Best regards

Michel

holzmichel May 14, 2009 03:57

I tried the following command:

/home/michel/OpenFOAM/OpenFOAM-1.5-dev/bin/foamExec -v 1.5-dev /home/michel/OpenFOAM/ThirdParty/openmpi-1.2.6/platforms/linuxGccDPOpt/bin/mpirun --hostfile /home/michel/machines -np 2 icoFoam /home/michel/OpenFOAM/michel-1.5-dev/run/tutorials/icoFoam cavity -parallel > log

with foamExec, and this error appears:

[: 106: ==: unexpected operator
[: 106: ==: unexpected operator
[: 153: ==: unexpected operator
[: 257: ==: unexpected operator
[: 259: ==: unexpected operator
[: 56: ==: unexpected operator
[: 74: ==: unexpected operator

and nothing happens.
Which file causes this error, and what can I do to solve it?

Thank you again

Michel

DSpreitz May 14, 2009 13:46

Michel,

I don't know if it really makes a big difference, but the mpirun command that I use contains a -case option.
Here is the command:
mpirun --hostfile ~/machines_pil -np 12 /home/user/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/icoFoam -case /home/user/OpenFOAM/user-1.5/run/tutorials/icoFoam/cavity -parallel
everything in one line, obviously.

Dominic

holzmichel May 15, 2009 03:02

Thanks Dominic for your reply,

I tried it with -case, but the same error appears:
/home/michel/OpenFOAM/OpenFOAM-1.5-dev/applications/bin/linuxGccDPOpt/icoFoam: error while loading shared libraries: libfiniteVolume.so: cannot open shared object file: No such file or directory
I think OpenMPI does not know the paths to all the libraries. I tried to export them in the .bashrc, but it didn't work. Maybe I am doing something wrong when adding the paths, but I don't know.

Thanks

Michel
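When a binary runs fine launched by hand but fails under mpirun with "cannot open shared object file", the usual cause is that LD_LIBRARY_PATH is missing in the environment of the remotely spawned process. ldd shows exactly which libraries the loader cannot resolve; the sketch below demonstrates the idea on /bin/sh, with the real remote check left as a comment (host and solver path are examples from the posts above):

```shell
# ldd reports unresolved libraries as "not found"; /bin/sh resolves
# cleanly on a normal system, so this prints the fallback message.
ldd /bin/sh | grep 'not found' || echo 'all libraries resolved'
# The same check on the other node, through a NON-interactive shell:
# ssh 192.168.1.82 'ldd /home/michel/OpenFOAM/OpenFOAM-1.5-dev/applications/bin/linuxGccDPOpt/icoFoam | grep "not found"'
```

Another hedged option worth knowing: Open MPI's mpirun can forward an environment variable to the remote processes with `-x LD_LIBRARY_PATH`, which sidesteps the .bashrc question entirely.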

holzmichel May 15, 2009 09:29

I checked my $PATH and $LD_LIBRARY_PATH on every machine. Both look like this:

$PATH
/home/michel/OpenFOAM/ThirdParty/ParaView3.3-cvs/platforms/linuxGcc/bin:/home/michel/OpenFOAM/ThirdParty/cmake-2.4.6/platforms/linux/bin:/home/michel/OpenFOAM/ThirdParty/openmpi-1.2.6/platforms/linuxGccDPOpt/bin:/home/michel/OpenFOAM/ThirdParty/gcc-4.3.1/platforms/linux/bin:/home/michel/OpenFOAM/michel-1.5-dev/applications/bin/linuxGccDPOpt:/home/michel/OpenFOAM/OpenFOAM-1.5-dev/applications/bin/linuxGccDPOpt:/home/michel/OpenFOAM/OpenFOAM-1.5-dev/wmake:/home/michel/OpenFOAM/OpenFOAM-1.5-dev/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games

$LD_LIBRARY_PATH
/home/michel/OpenFOAM/ThirdParty/ParaView3.3-cvs/platforms/linuxGcc/bin:/home/michel/OpenFOAM/OpenFOAM-1.5-dev/lib/linuxGccDPOpt/openmpi-1.2.6:/home/michel/OpenFOAM/ThirdParty/openmpi-1.2.6/platforms/linuxGccDPOpt/lib:/home/michel/OpenFOAM/ThirdParty/gcc-4.3.1/platforms/linux/lib:/home/michel/OpenFOAM/michel-1.5-dev/lib/linuxGccDPOpt:/home/michel/OpenFOAM/OpenFOAM-1.5-dev/lib/linuxGccDPOpt

I think that everything is OK, but mpirun is still not working.
Are the PATH and the LD_LIBRARY_PATH OK, or do I have to add something to the .bashrc?

regards

Michel

holzmichel May 20, 2009 06:01

I am still not able to run in parallel :(
Does nobody have any ideas for me?

Michel

olesen May 20, 2009 08:36

Quote:

Originally Posted by holzmichel (Post 216033)
I tried the following command:

/home/michel/OpenFOAM/OpenFOAM-1.5-dev/bin/foamExec -v 1.5-dev /home/michel/OpenFOAM/ThirdParty/openmpi-1.2.6/platforms/linuxGccDPOpt/bin/mpirun --hostfile /home/michel/machines -np 2 icoFoam /home/michel/OpenFOAM/michel-1.5-dev/run/tutorials/icoFoam cavity -parallel > log

Okay, but did you also try what is suggested in the foamExec description?
With foamExec being called by mpirun, instead of vice-versa:
Code:

#    Can also be used for parallel runs e.g.
#    mpirun -np <nProcs> \
#        foamExec -v <foamVersion> <foamCommand> ... -parallel
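Applied to the commands from the earlier posts, that ordering would look like the following; this is an untested sketch with the hostfile, version, and case path taken from Michel's posts:

```shell
mpirun --hostfile /home/michel/machines -np 2 \
    foamExec -v 1.5-dev \
    icoFoam /home/michel/OpenFOAM/michel-1.5-dev/run/tutorials/icoFoam cavity -parallel
```

The idea is that foamExec sets up the matching OpenFOAM environment on each node before starting the solver, so the remote non-interactive shells no longer need the variables pre-set.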


olesen May 20, 2009 08:42

Quote:

Originally Posted by holzmichel (Post 216273)
I checked my $PATH and $LD_LIBRARY_PATH on every machine. both looks like this
...
I think that everything is ok but mpirun is still not working.
Is the PATH and the LD_LIBRARY_PATH ok or do I have to add something in the bashrc?

Are you certain that the PATH and LD_LIBRARY_PATH are set for non-interactive shells? You can try, for example, the following (the single-quotes are needed to avoid shell expansion):

Code:

ssh $HOST 'echo $PATH; echo; echo $LD_LIBRARY_PATH'
Are you using something like ksh that doesn't have a resource file for non-interactive shells? (Although I believe this problem was fixed long ago in openmpi.)

holzmichel May 26, 2009 03:57

Hello Mark,

your way solved my problem.
Thank you for your help.

Best regards

tomislav_maric August 1, 2009 15:49

Hello everyone, I'm glad I found this thread, since I have the same problem holzmichel ran into. Searching the net, I found instructions saying that I should comment out the if command that returns from bash when it runs in non-interactive mode.

Actually, I'm running OpenFOAM from a SLAX live DVD and I'm trying to figure out how to use this live DVD for simulations on a LAN.

The following code suggested by olesen

Code:

ssh $HOST 'echo $PATH; echo; echo $LD_LIBRARY_PATH'
showed me that LD_LIBRARY_PATH and PATH are not set at all when bash runs in non-interactive mode. How exactly can I solve this?

Thank you in advance,
Tomislav
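One common fix, sketched here with placeholder paths (use the real values printed by `echo $PATH; echo $LD_LIBRARY_PATH` in an interactive shell on the live system), is to append the exports to ~/.bashrc above any interactive-only 'return' guard. On most Linux systems bash started by sshd/rshd still reads ~/.bashrc even non-interactively, which is why the .bashrc edits discussed earlier in this thread take effect for mpirun's remote launches:

```shell
# Append OpenFOAM path exports to ~/.bashrc so that non-interactive
# (ssh/mpirun) shells also see them. The paths below are placeholders,
# not the real SLAX layout.
cat >> ~/.bashrc <<'EOF'
export PATH="$PATH:/opt/OpenFOAM/OpenFOAM-1.5/bin"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/OpenFOAM/OpenFOAM-1.5/lib/linuxGccDPOpt"
EOF
```

Afterwards, `ssh $HOST 'echo $PATH; echo $LD_LIBRARY_PATH'` from olesen's post should show the paths.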

tomislav_maric August 2, 2009 16:06

Quote:

Originally Posted by olesen (Post 216716)
Okay, but did you also try what is suggested in the foamExec description?
With foamExec being called by mpirun, instead of vice-versa:
Code:

#    Can also be used for parallel runs e.g.
#    mpirun -np <nProcs> \
#        foamExec -v <foamVersion> <foamCommand> ... -parallel


I'm having the same problem, an error loading shared libraries (libfiniteVolume.so...), so I have tried the command above, but mpirun complains that I'm running multiple commands with an unspecified -np number.

My command goes like this:

/path/name/of/mpirun -np 2 -H mario foamExec -v 1.5-dev interFoam -parallel

and it works fine on my dual core laptop.

What am I doing wrong?

