CFD Online Discussion Forums

Problem with openmpi (OpenFOAM Running, Solving & CFD)

mighelone October 31, 2007 12:46

Hello to everybody!

I'm testing the new release of OpenFOAM (1.4.1) with the openmpi libraries.

When I start a job with mpirun, I get this error:

michele@enercluster:~$ mpirun -hostfile machines -np 3 a.out
bash: orted: command not found
[enercluster:06382] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[enercluster:06382] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
[enercluster:06382] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[enercluster:06382] ERROR: A daemon on node 192.168.0.2 failed to start as expected.
[enercluster:06382] ERROR: There may be more information available from
[enercluster:06382] ERROR: the remote shell (see above).
[enercluster:06382] ERROR: The daemon exited unexpectedly with status 127.
[enercluster:06382] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[enercluster:06382] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.

If I run the process only on the local machine, it works!
I suppose it is a PATH problem on the remote machines; in fact, executing this command:

michele@enercluster:~$ ssh node2 printenv |grep PATH
PATH=/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games

This means that when I run a command over ssh, openmpi is not correctly configured.

But if I log in to the remote node interactively using ssh:

# ssh node2
# echo $PATH
/opt/Fluent.Inc/bin:/home/michele/OpenFOAM/linux64/paraview-2.4.4/bin:/home/michele/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/bin:/home/michele/OpenFOAM/OpenFOAM-1.4.1/src/mico-2.3.12/platforms/linux64GccDPOpt/bin:/home/michele/OpenFOAM/linux64/j2sdk1.4.2_05/bin:/home/michele/OpenFOAM/linux64/gcc-4.2.1/bin:/home/michele/OpenFOAM/michele-1.4.1/applications/bin/linux64GccDPOpt:/home/michele/OpenFOAM/OpenFOAM-1.4.1/applications/bin/linux64GccDPOpt:/home/michele/OpenFOAM/OpenFOAM-1.4.1/wmake:/home/michele/OpenFOAM/OpenFOAM-1.4.1/bin:/usr/local/bin:/usr/bin:/bin:/usr/games

So it is a problem with the PATH environment variable in non-interactive ssh sessions.
The OpenFOAM path is defined in .bashrc.
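A quick way to test this hypothesis is to check whether the Open MPI `bin` directory appears in the PATH that a non-interactive ssh command sees. The helper below is a minimal POSIX-sh sketch; `path_contains` is a hypothetical name, and `OPENMPI_ARCH_PATH` is the variable OpenFOAM sets for its bundled Open MPI (mentioned later in this thread).

```shell
# Return success if directory $1 is one of the entries in the
# colon-separated list $2 (for example a PATH value).
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (needs ssh access to node2; the single quotes make node2 expand $PATH):
#   remote_path=$(ssh node2 'echo $PATH')
#   path_contains "$OPENMPI_ARCH_PATH/bin" "$remote_path" ||
#       echo "Open MPI bin dir is missing from node2's non-interactive PATH"
```

If the check fails for the non-interactive PATH but succeeds after an interactive login, that matches the `orted: command not found` symptom above.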

Any ideas?

Thank you
Michele

gtg627e October 31, 2007 22:48

Michele,

check out this post:

http://www.cfd-online.com/OpenFOAM_D...es/1/5473.html

I ran into a similar problem a while back.

Alessandro

mighelone November 1, 2007 07:11

Hi Alessandro,

Thank you for your answer. I've already read your post, but I think my problem is different.

OpenFOAM is correctly installed on every node (I tried running the case in serial on several nodes without problems).

I guess the problem is related to the non-interactive ssh login when I execute mpirun.
In fact, during the non-interactive ssh login my .bashrc settings file is not read, so my PATH doesn't include the paths for mpi and OpenFOAM.

I also guess it is a problem with the openssh package distributed by Debian, because on other distros .bashrc is read correctly (I tested using "ssh HOST printenv").


Michele

olesen November 2, 2007 02:10

Hi Michele,

There are a few problems that you might be experiencing.

Open-MPI needs to find its own binary path in order to boot orted. OpenFOAM sets the variable 'OPENMPI_ARCH_PATH' to point to the openmpi installation. You can use that to help openmpi find itself. Calling mpirun in either of these ways should get orte to boot:

a) $OPENMPI_ARCH_PATH/bin/mpirun ...
b) mpirun -prefix $OPENMPI_ARCH_PATH ...

If you try 'mpirun -help' and get a message that it can't find anything, try adding the following:

$ export OPAL_PREFIX=$OPENMPI_ARCH_PATH
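Before relying on `-prefix`, it may help to confirm that the prefix really contains the launcher and the daemon. This is a minimal sketch, assuming an Open MPI layout with `mpirun` and `orted` under `<prefix>/bin`; `check_ompi_prefix` is a hypothetical name. It only checks the local machine, so the same path must also exist on every remote node (for example via a shared NFS home).

```shell
# Check that Open MPI prefix $1 provides both mpirun and orted.
# Prints the first problem found on stderr and returns non-zero.
check_ompi_prefix() {
    prefix=$1
    if [ -z "$prefix" ]; then
        echo "OPENMPI_ARCH_PATH is not set" >&2
        return 1
    fi
    for tool in mpirun orted; do
        if [ ! -x "$prefix/bin/$tool" ]; then
            echo "missing or not executable: $prefix/bin/$tool" >&2
            return 1
        fi
    done
    return 0
}

# Usage (commented out so the sketch has no side effects):
#   check_ompi_prefix "$OPENMPI_ARCH_PATH" &&
#       "$OPENMPI_ARCH_PATH/bin/mpirun" --prefix "$OPENMPI_ARCH_PATH" \
#           --hostfile machines -np 3 ./a.out
```

With `--prefix`, mpirun sets PATH and LD_LIBRARY_PATH on the remote nodes to `<prefix>/bin` and `<prefix>/lib` before starting the daemon, which is why it can help even when the remote non-interactive PATH lacks Open MPI.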


Now that openmpi can boot, the remaining problem is getting the OpenFOAM environment set on the remote nodes. The nicest solution (thanks Henry) is to use foamExec to wrap the call to your application.

Attached is a modified version of foamExec in which the version is optional.

mighelone November 2, 2007 04:06

Hi Mark,

I'm trying to run a very simple parallel application (hello world!) with openmpi. If I run without the -prefix option:

mpirun --hostfile machines -np 4 ./a.out

I get the same error as before:
bash: orted: command not found
mpirun: killing job...

If I run with the option -prefix:

mpirun -prefix $OPENMPI_ARCH_PATH --hostfile machines -np 4 ./a.out

I don't get any output from my application.

Furthermore, I tried installing openmpi from my distro, in the canonical paths (/usr/bin and /usr/lib), and the same program runs without problems.

Now I will try foamExec with the native openmpi libs!

Thank you Michele

olesen November 2, 2007 04:38

It might be that orte is working but takes a long time to boot. Check whether orte is running on any of the remote nodes. I find the following alias quite useful:

alias psf='/bin/ps -e f -o user,pid,ppid,pgrp,command'


Increasing the mpirun verbosity and/or the debug-daemons might help figure out what is happening (see mpirun -help), but I haven't done this sort of thing for a long time.

Also try with an absolute path to your a.out, in case the working directory is somehow getting lost across the nodes.

mighelone November 2, 2007 06:29

No luck!

If I log in to the remote machine, I don't see any processes related to MPI.

Even when giving the absolute path of a.out, I don't get any output!

I've found that, for a non-interactive login, it is possible to set the environment variables in the file /etc/environment or in $HOME/.ssh/environment.

The problem is that OpenFOAM has too many variables to set, and I haven't found any way to execute the OpenFOAM script during the interactive login.
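For reference, a hedged sketch of the `~/.ssh/environment` route. sshd reads that file only when `PermitUserEnvironment yes` is set in `sshd_config` (the OpenSSH default is `no`), and it accepts plain `NAME=value` lines without any expansion. Rather than dumping everything, one option is to write only the variables OpenFOAM and Open MPI need; the variable-name pattern and the function name `write_ssh_environment` are assumptions for illustration.

```shell
# Write only the variables needed by OpenFOAM and Open MPI to file $1,
# in the NAME=value format that sshd reads from ~/.ssh/environment.
# The name pattern is an assumption; extend it for your own setup.
write_ssh_environment() {
    env | grep -E '^(PATH|LD_LIBRARY_PATH|WM_[A-Za-z0-9_]*|FOAM_[A-Za-z0-9_]*|OPENMPI_[A-Za-z0-9_]*)=' > "$1"
}

# Usage, from a shell where the OpenFOAM bashrc has already been sourced:
#   write_ssh_environment "$HOME/.ssh/environment"
# sshd only honours this file if sshd_config contains:
#   PermitUserEnvironment yes
```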

olesen November 2, 2007 07:16

Quote:

The problem is that OpenFOAM has too many variables to set, and I haven't found any way to execute the OpenFOAM script during the interactive login.
The problem is setting *any* variables, not how many variables there are.

This is what foamExec helps you do. It provides a simple wrapper that sources the requisite bashrc before executing whatever command you give it.

In fact, I only set the FOAM environment as required and don't put it in my ~/.profile or ~/.bashrc.

Thus if I ssh to a remote host and check the OpenFOAM variables:

E.g.,
$ ssh HOST printenv | grep WM_

shows nothing

but if I use foamExec:

$ ssh HOST /path/to/foamExec printenv | grep WM_

then I see about 18 OpenFOAM variables.
Does this not work for you?
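The wrapper idea can be sketched in a few lines of POSIX sh. This is NOT the real foamExec (which also locates the bashrc from a version number); `foam_exec_sketch` is a hypothetical name, and the environment file is passed explicitly. It sources the file in a subshell and then `exec`s the command, so the caller's own environment stays clean.

```shell
# Hypothetical foamExec-like wrapper: load an environment file, then run
# the command with that environment.
# Usage: foam_exec_sketch RCFILE COMMAND [ARGS...]
foam_exec_sketch() {
    if [ $# -lt 2 ]; then
        echo "usage: foam_exec_sketch RCFILE COMMAND [ARGS...]" >&2
        return 1
    fi
    rcfile=$1
    shift
    if [ ! -r "$rcfile" ]; then
        echo "cannot read environment file: $rcfile" >&2
        return 1
    fi
    (
        . "$rcfile"     # e.g. sets WM_*, FOAM_*, PATH, LD_LIBRARY_PATH
        exec "$@"       # replace the subshell with the requested command
    )
}
```

Saved as a standalone script, the same two steps (source, then `exec "$@"`) allow calls such as `ssh HOST /path/to/script printenv`, the pattern shown above. Since the real OpenFOAM bashrc uses bash syntax, such a script would need to run under bash.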

mighelone November 2, 2007 07:54

Hi Mark!

I haven't tried foamExec yet, but I solved the problem this way:

# printenv > .ssh/environment

and copied this file to all remote nodes.

This way, all my environment variables are defined in the file .ssh/environment, which is read by ssh during non-interactive login.
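To copy such a file to every node without typing host names by hand, the names can be pulled out of the mpirun hostfile. A minimal sketch, assuming the common `hostname [slots=N]` hostfile format with `#` comments; `hosts_from_hostfile` is a hypothetical helper, and the copy loop is left commented out because it needs passwordless scp.

```shell
# Print each host named in hostfile $1 once, in order of first appearance,
# ignoring comments, blank lines and options such as slots=N.
hosts_from_hostfile() {
    sed -e 's/#.*//' "$1" | awk 'NF > 0 && !seen[$1]++ { print $1 }'
}

# Usage (needs passwordless ssh/scp to every node):
#   for h in $(hosts_from_hostfile machines); do
#       scp "$HOME/.ssh/environment" "$h:.ssh/environment"
#   done
```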

Anyway, foamExec is very interesting for keeping a clean system without all the variables used by OpenFOAM.

I suppose that with foamExec I still have to set the openmpi-related variables in .ssh/environment, like:

OPENMPI_HOME=/home/michele/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3
OPENMPI_VERSION=1.2.3


since without these variables mpirun is not able to find the orted command on the remote machine.

Thank you
Michele

mighelone November 2, 2007 09:52

Hi Mark!

I'm trying to use foamExec after removing all references to OpenFOAM from .bashrc and .ssh/environment (if I understood correctly, foamExec is able to set all the FOAM variables before running any application), but I get the following error:

Error : bashrc file could not be found for OpenFOAM-1.4.1

I suppose this means the script cannot find the OpenFOAM bashrc file.

How can I solve this problem?

Michele

olesen November 2, 2007 10:07

Quote:

Error : bashrc file could not be found for OpenFOAM-1.4.1
I suppose this means the script cannot find the OpenFOAM bashrc file.
I assume
1) that you are using the foamExec that I posted
2) that you copied foamExec to $WM_PROJECT_DIR/bin/
3) that $WM_PROJECT_DIR/.OpenFOAM-$WM_PROJECT_VERSION/ exists and that $WM_PROJECT_DIR/.OpenFOAM-$WM_PROJECT_VERSION/bashrc exists
4) that you are using the fully qualified path to foamExec -OR- you are using the -v option.

Check what foamExec is doing:

$ /bin/sh -x /path/to/foamExec /bin/true
OR
$ /bin/sh foamExec -v 1.4.1 /bin/true
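The checklist above can be turned into a small script. A sketch under the assumptions listed there (foamExec in `$WM_PROJECT_DIR/bin`, the per-version `.OpenFOAM-<version>/bashrc`); `check_foamexec_setup` is a hypothetical name, and it takes the directory and version as arguments so that it also works in a shell where the OpenFOAM environment is not loaded.

```shell
# Check the foamExec prerequisites for project dir $1 and version $2.
# Prints one line per problem on stderr; returns non-zero if any check fails.
check_foamexec_setup() {
    dir=$1
    ver=$2
    result=0
    if [ ! -x "$dir/bin/foamExec" ]; then
        echo "not executable (or missing): $dir/bin/foamExec" >&2
        result=1
    fi
    if [ ! -d "$dir/.OpenFOAM-$ver" ]; then
        echo "missing directory: $dir/.OpenFOAM-$ver" >&2
        result=1
    fi
    if [ ! -r "$dir/.OpenFOAM-$ver/bashrc" ]; then
        echo "missing or unreadable: $dir/.OpenFOAM-$ver/bashrc" >&2
        result=1
    fi
    return $result
}

# Usage, e.g.:
#   check_foamexec_setup "$WM_PROJECT_DIR" "$WM_PROJECT_VERSION"
#   check_foamexec_setup /home/michele/OpenFOAM/OpenFOAM-1.4.1 1.4.1
```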

mighelone November 3, 2007 06:42

Hi Mark!

1) I'm executing your script
2) the script is not in the $WM_PROJECT_DIR/bin/ but in another place
3) These files and directories exist
4) I'm using the fully qualified path, but in the wrong place

I'm not at work now, so I can't try your advice; I will try again on Monday!

Thank you again for your help

Michele

cricke November 6, 2007 05:11

Hi Mark and Michele!
I am following your thread but get the following message when I try to execute this command:

Command:

foamExec -v OpenFOAM-1.4.1 $OPENMPI_ARCH_PATH/bin/mpirun --hostfile machines -np 4 simpleFOAM $HOME VAFAB_multi -paralell

Message:

foamExec: access denied

1) I'm executing your script
2) the script is in the $WM_PROJECT_DIR/bin/
3) These files and directories exist
4) I'm using the -v option

When using '$OPENMPI_ARCH_PATH/bin/mpirun ...' as you suggested, mpirun and orte are started on the host machine but not on the slave.

What am I doing wrong?

Regards

Christofer Ivarsson

olesen November 6, 2007 05:21

Quote:

Message:
foamExec: access denied
What about the next point?
5) Is foamExec readable and executable by the users (eg, chmod 0755).

cricke November 6, 2007 06:37

Hi and thanks for your instant reply!


The file is perfectly readable and saved as an application file. This is what happens when I try to execute foamExec on its own:

a403518@ENEPCST75:~$ foamExec
bash:home/a403518/OpenFOAM/OpenFOAM-1.4.1/bin/foamExec: Access denied

Obviously it's all about environment variables, right? I get the same result even with no extra environment variables added to .bashrc.

Should I add some kind of path to .bashrc that points to foamExec?

/Chris

olesen November 6, 2007 06:56

Hi Christofer,

With "Access Denied", I still suspect some sort of file permissions problem. Check that the rest of your OpenFOAM installation is readable
(e.g., chmod -v -R a+rX ...). Note: use an uppercase 'X', not a lowercase 'x', in the chmod command.
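To see what the uppercase `X` does, the sketch below applies `chmod -R a+rX` to a scratch directory: directories become enterable by everyone, while plain data files only gain read permission (a lowercase `x` would make every file executable as well). Since `+` only adds bits, `a+rX` never removes access. The function names are hypothetical.

```shell
# Print the 10-character mode string (e.g. drwxr-xr-x) of one path.
show_mode() {
    ls -ld "$1" | cut -c1-10
}

# Apply 'chmod -R a+rX' to a scratch directory tree and show the result.
demo_chmod_X() {
    d=$(mktemp -d)
    mkdir "$d/sub"
    : > "$d/sub/data.txt"           # an ordinary, non-executable file
    chmod 0700 "$d/sub"             # start: only the owner can enter the dir
    chmod 0600 "$d/sub/data.txt"    # start: only the owner can read the file
    chmod -R a+rX "$d/sub"          # X: execute only for dirs (or already-executable files)
    show_mode "$d/sub"
    show_mode "$d/sub/data.txt"
    rm -rf "$d"
}

# Running demo_chmod_X prints:
#   drwxr-xr-x
#   -rw-r--r--
```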

Recheck what foamExec is doing:
$ /bin/sh -x /path/to/foamExec /bin/true
OR
$ /bin/sh foamExec -v 1.4.1 /bin/true


When I issue foamExec without any arguments, I get the message "no application specified" and then the usage as per the -help option.

cricke November 6, 2007 08:12

OK, sorry for being a nut head, but do you want me to type those commands in the terminal? That is what I did, and here are the results:

a403518@ENEPCST75:~$ /bin/sh -x /patch/to/foamExec /bin/true
/bin/sh: Can't open /patch/to/foamExec
a403518@ENEPCST75:~$ bin/sh foamExec -v 1.4.1 /bin/true
bash: bin/sh: File does not exist

I have unrestricted access to my OpenFOAM directory, and OpenFOAM runs smoothly. Do you still think it's a permission problem? When I ran the chmod command, it suddenly changed the permissions of my OpenFOAM directory to locked, so I had to change them back again.

/Chris

olesen November 6, 2007 09:04

There was a typo there:

$ /bin/sh -x /the/path/to/foam/bin/on/your/machine/foamExec

I don't know the path to foam on your machine.

cricke November 6, 2007 09:23

Thanks for being patient with me. This is what happens.

Rechecking what foamExec is doing:

a403518@ENEPCST75:~$ /bin/sh foamExec -v 1.4.1 $HOME/OpenFOAM/OpenFOAM-1.4.1/bin/true
Executing: /home/a403518/OpenFOAM/OpenFOAM-1.4.1/.bashrc
Executing: /home/a403518/OpenFOAM/OpenFOAM-1.4.1/.OpenFOAM-1.4.1/apps/ensightFoam/bashrc
Executing: /home/a403518/OpenFOAM/OpenFOAM-1.4.1/.OpenFOAM-1.4.1/apps/paraview/bashrc
exec: 141: /home/a403518/OpenFOAM/OpenFOAM-1.4.1/bin/true: not found

OR

403518@ENEPCST75:~$ /bin/sh -x $HOME/OpenFOAM/OpenFOAM-1.4.1/ bin/true
a403518@ENEPCST75:~$

The latter seems to work, since it doesn't complain in the command window. However, the first command seems to find all env-variables except 'true'. I should mention that I have already added the first three:
.../.bashrc
.../paraview/bashrc
.../ensightFOAM/bashrc

to my .bashrc so that they are set automatically when I open a terminal window.

So what's next?
I get a connection to my slave node and enter the password. Then nothing else happens in the command window. mpirun and orted start but are listed as sleeping processes. The solver never starts.

/Chris

olesen November 7, 2007 02:28

Hi Christofer,

I really thought that absolutely every Unix machine had '/bin/true', but if not you can always test with a command such as 'hostname' which normally lives under '/bin' but if your system is really strange it might be under '/usr/bin' or I don't know where else.

You *do*, however, need to set up ssh to connect without passwords; otherwise you can forget about the rest. There must be something about this in the open-mpi FAQ. If not, check the mpich FAQ.
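A quick non-interactive test for passwordless login, sketched with standard OpenSSH options: `BatchMode=yes` makes ssh fail instead of prompting, so a host that still wants a password shows up as an error rather than a hang, and `ConnectTimeout` bounds unreachable hosts. The function name is hypothetical, and the key-setup commands in the comments are one common approach, not the only one.

```shell
# Succeeds only if ssh can run a command on host $1 without any prompt.
check_passwordless_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# One common way to set up keys, run once on the node that calls mpirun:
#   ssh-keygen -t rsa      # empty passphrase, or use ssh-agent
#   ssh-copy-id node2      # appends the public key to node2:~/.ssh/authorized_keys
# Then:
#   check_passwordless_ssh node2 && echo "node2: passwordless ssh OK"
```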

/mark

cricke November 7, 2007 08:42

OK, I have configured ~/.ssh with an 'authorized_keys' file, and I can now ssh into the server node without a password, though not in the reverse direction for some reason. When I ssh into the server, all the OpenFOAM env-variables are set automatically via .bashrc, so that seems to work properly. Still, mpirun never starts after executing

$OPENMPI_ARCH_PATH/bin/mpirun --hostfile machines -np 4 simpleFOAM $HOME VAFAB_multi -paralell

nothing happens.

The /bin/true exists but

bash: bin/true: can not find file

Still, I believe some environment files are missing...

/C

olesen November 7, 2007 09:38

Quote:

The /bin/true exists but

bash: bin/true: can not find file
When I issue 'bin/true', I also get a similar message:

-bash: bin/true: No such file or directory

With '/bin/true' however, it works fine.

Quote:

Still, the mpirun never starts after executing

$OPENMPI_ARCH_PATH/bin/mpirun --hostfile machines -np 4 simpleFOAM $HOME VAFAB_multi -paralell
Okay, but have you made sure that all the subcomponents are really working?

Check that it works on the same machine:
$OPENMPI_ARCH_PATH/bin/mpirun -np 4 /bin/hostname

Add '--debug-daemons' and see what you find.
It might be time to find someone closer to your location (e.g., a sysadmin) and have them take a look.
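Once the local test works, the same idea extends to the hostfile: run a trivial program such as `/bin/hostname` across the nodes and count which hosts actually answered. A sketch with a hypothetical function name that summarises that output:

```shell
# Count how many lines (ranks) came from each host, given one host name
# per line on stdin, e.g. the output of 'mpirun ... /bin/hostname'.
ranks_per_host() {
    awk 'NF { n[$1]++ } END { for (h in n) print h, n[h] }' | sort
}

# Usage:
#   $OPENMPI_ARCH_PATH/bin/mpirun --hostfile machines -np 4 /bin/hostname | ranks_per_host
```

If a remote node never appears in the summary, the problem lies in launching the daemon on that node rather than in the solver.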

cricke November 7, 2007 10:52

Linux sysadmins only exist in heaven; Windows has bewitched them all... but thanks for all your support!

/Christofer

mer March 11, 2008 11:24

Hi!
I have a problem running in parallel with openmpi. I run an interFoam case on two PCs under Fedora 7 and get some errors during execution. Both PCs have the case, and I searched the openmpi forum, but the recommendations don't work for my case.
Here are the details:

- For the eth0 in PC1
/sbin/ifconfig

eth0 Link encap:Ethernet HWaddr 00:1D:92:09:A9:BE
inet addr:192.168.0.2 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::21d:92ff:fe09:a9be/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:378 errors:0 dropped:0 overruns:0 frame:0
TX packets:92 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:63038 (61.5 KiB) TX bytes:18064 (17.6 KiB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:14355 errors:0 dropped:0 overruns:0 frame:0
TX packets:14355 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:90409124 (86.2 MiB) TX bytes:90409124 (86.2 MiB)

virbr0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:40 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:8700 (8.4 KiB)

- For the eth0 in PC2
/sbin/ifconfig
eth0 Link encap:Ethernet HWaddr 00:00:21:0B:C6:2B
inet addr:192.168.0.3 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::200:21ff:fe0b:c62b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:113 errors:0 dropped:0 overruns:0 frame:0
TX packets:105 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:17546 (17.1 KiB) TX bytes:19208 (18.7 KiB)
Interrupt:5 Base address:0x4000

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:19960 errors:0 dropped:0 overruns:0 frame:0
TX packets:19960 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:142422400 (135.8 MiB) TX bytes:142422400 (135.8 MiB)

virbr0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:35 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:8390 (8.1 KiB)

- For the execution:

(1)
[mer@merrouche2 ~]$ mpirun --mca pls_rsh_agent "ssh : rsh" --hostfile /home/mer/machinefile -np 2 interFoam /home/mer/OpenFOAM/mer-1.4.1/run/tutorials/interFoam case_2 -parallel

MPI Pstream initialized with:
floatTransfer : 1
nProcsSimpleSum : 0
scheduledTransfer : 0

/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.4.1                                 |
|   \\  /    A nd           | Web:      http://www.openfoam.org               |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/

Exec : interFoam /home/mer/OpenFOAM/mer-1.4.1/run/tutorials/interFoam case_2 -parallel
[0] Date : Mar 11 2008
[0] Time : 15:17:23
[0] Host : merrouche2
[0] PID : 4576
[1] Date : Mar 11 2008
[0] Root : /home/mer/OpenFOAM/mer-1.4.1/run/tutorials/interFoam
[0] Case : case_2
[0] Nprocs : 2
[0] Slaves :
[0] 1
[0] (
[0] merrouche3.3216
[0] )
[0]
Create time

[1] Time : 15:20:19
[1] Host : merrouche3
[1] PID : 3216
[1] Root : /home/mer/OpenFOAM/mer-1.4.1/run/tutorials/interFoam
[1] Case : case_2
[1] Nprocs : 2
[merrouche2][0,1,0][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111
[merrouche3][0,1,1][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111




(2)
[mer@merrouche2 ~]$ mpirun --mca pls_rsh_agent "rsh : ssh" --hostfile /home/mer/machinefile -np 2 interFoam /home/mer/OpenFOAM/mer-1.4.1/run/tutorials/interFoam case_2 -parallel

merrouche3: Connection refused
[merrouche2:05257] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[merrouche2:05257] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
[merrouche2:05257] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[merrouche2:05257] ERROR: A daemon on node merrouche3 failed to start as expected.
[merrouche2:05257] ERROR: There may be more information available from
[merrouche2:05257] ERROR: the remote shell (see above).
[merrouche2:05257] ERROR: The daemon exited unexpectedly with status 1.
[merrouche2:05257] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[merrouche2:05257] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.


Please help me

mike_jaworski March 11, 2008 13:06

Merrouche,
Do you have any firewalls running on either machine? My understanding of openMPI is that it doesn't use a specific range of ports, so it's not possible to configure a firewall for it. Or if it is possible, no one seems to know how. I'd suggest testing this by disabling your firewalls and trying mpirun again.

Good Luck,
Mike J.

mer March 11, 2008 14:51

Hi Mike!
Thanks for your quick response.
The firewalls are disabled and I can ssh to the machines. Also, my case works on each machine alone using mpirun.
Some more information:
- if I type ompi_info, the MCA parameter pls_rsh_agent is rsh, and I can't rsh to the machines; that is why the second case gives "merrouche3: Connection refused"

- but I can ssh to the machines, so I'm asking whether my expression --mca pls_rsh_agent "ssh : rsh" is correct and forces openmpi to use ssh.

Thanks
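For reference, MCA parameters can also be set outside the command line, which makes it easier to be sure which launcher agent is in effect. A minimal sketch, assuming Open MPI 1.2.x, where the parameter is called `pls_rsh_agent` (I believe later releases renamed it); it writes the per-user file `$HOME/.openmpi/mca-params.conf`, which takes `name = value` lines. The function name is hypothetical.

```shell
# Make ssh the default launcher agent for this user via the per-user
# MCA parameter file. Assumes Open MPI 1.2.x naming (pls_rsh_agent).
set_rsh_agent_ssh() {
    conf="$HOME/.openmpi/mca-params.conf"
    mkdir -p "$HOME/.openmpi"
    # add the setting only once, and never override an existing choice
    if ! grep -q '^[[:space:]]*pls_rsh_agent' "$conf" 2>/dev/null; then
        echo "pls_rsh_agent = ssh" >> "$conf"
    fi
}

# Alternatives / checks:
#   export OMPI_MCA_pls_rsh_agent=ssh        # same setting, current shell only
#   ompi_info --param pls rsh | grep agent   # shows the value Open MPI picks up
```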

mer March 15, 2008 07:30

Hi!
Is there anyone who can help me resolve my problem? So far I can't run my case with mpirun under OF-1.4.1. The errors are the same as mentioned in my previous posts.
I tried the hints posted in the forum without any success. I created a new user and installed a new OF-1.4.1, but the problem remains.
Is there any relation between LAM/MPI and OpenMPI? I ask because I installed lam-7.1.3 on my PCs before installing OF-1.4.1.

I really NEED your help.

THANKS

DSpreitz April 20, 2009 15:55

OpenMPI on thin client
 
3 Attachment(s)
Hi forum,
I have a problem with OpenMPI and OF 1.5. I have read lots of posts in this forum and sorted out a lot of problems along the way (thanks to all the helpful advice already posted here), but now I am really stuck.

As a little background, I'll give you the big picture of my intentions:
Overnight and at weekends I want to use the CAD workstations in the office to run OF in parallel. Since I can't install anything locally on these workstations, I have to use them as thin clients and boot them over the gigabit network (pxe, dhcp, atftp, nfs, ...). On the server that supplies the boot image (derived from pelicanHPC) to the clients, I run Ubuntu and installed OF 1.5 with Jure's script.
At this point I can successfully boot one client over the network and then mount the server's home directory, where OF resides, using NFS. When I connect to the client using ssh with passwordless login, I can run icoFoam and the cavity case on the client. I can run the same case on the server too.
To check the openMPI installation, I ran the 'hello world' example that Mark suggested in my home directory (which is also mounted on the client) on the server by typing:
wget http://icl.cs.utk.edu/open-mpi/paper...p-2006/hello.c
mpicc hello.c -o hello
mpirun -np 4 ./hello


which resulted in:
Hello, World. I am 0 of 4
Hello, World. I am 1 of 4
Hello, World. I am 2 of 4
Hello, World. I am 3 of 4


To check if I can run the hello world example on server and client I tried:
mpirun -np 4 -host 10.11.12.1 -host 10.11.12.2 ./hello
and got the same result as posted above. I verified, that the example actually ran on both machines by specifying the -d option in the above command.

So far so good. OF seems to work, networking between the machines is working, mpi is running.

However, if I go and try:
mpirun -np 2 -hostfile machines icoFoam /home/user/OpenFOAM/user-1.5/run/tutorials/icoFoam/cavity -parallel > log_mpi_icoFOAM 2>&1
I get:
cat log_mpi_icoFOAM
--------------------------------------------------------------------------
Failed to find the following executable:

Host: debian
Executable: icoFoam

Cannot continue.
--------------------------------------------------------------------------

mpirun noticed that job rank 0 with PID 8624 on node 10.11.12.1 exited on signal 15 (Terminated).

From the debug output of the above command (see attachment) I conclude, that all my environment variables are set correctly and also exported over mpi.
Yet, parallel execution on the client fails, although I can run the same case on the client directly. Very strange to me.

Did anybody experience a similar problem or can at least give me a hint or an idea where to start digging for a solution? Any help would be much appreciated.

P.S.:
I don't know if it has anything to do with my problem, but I attached the output of ldd -v icoFoam, which I ran on both the client and the server. The libraries are different, but as I said before, icoFoam seems to run on the client if I launch it directly through ssh.

DSpreitz April 20, 2009 16:45

Ok, I have to reply to myself:
My .bashrc file contained:
[ -z "$PS1" ] && return
which means it did not load any OF related environment variables at non-interactive (openMPI & SSH) login.

By replacing this line with:
if [ -z "$PS1" ]; then
source /home/user/OpenFOAM/OpenFOAM-1.5/etc/bashrc
fi
I changed that behaviour. Now the necessary OF environment variables are loaded at non-interactive login and my mpi runs are executed correctly.
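Background on why this works: bash reads `~/.bashrc` even for the non-interactive commands that sshd starts, so an early `return` in that file hides the OpenFOAM settings from mpirun. An equivalent arrangement, sketched under the assumption that the rest of `~/.bashrc` stays as the distribution ships it, is to source the OpenFOAM bashrc above the interactive-only guard, so that both kinds of shell load it exactly once. The path is the one from the post above; `FOAM_BASHRC` is a variable name chosen for illustration.

```shell
# ~/.bashrc (excerpt): load OpenFOAM for every bash shell, including the
# non-interactive ones that ssh and mpirun start. Adjust the path below
# to your own installation.
FOAM_BASHRC=/home/user/OpenFOAM/OpenFOAM-1.5/etc/bashrc
if [ -f "$FOAM_BASHRC" ]; then
    . "$FOAM_BASHRC"
fi

# The distribution's interactive-only guard stays BELOW this point, i.e.
#   [ -z "$PS1" ] && return
# followed by the settings that only make sense for interactive shells.
```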

Maybe this helps somebody else.

Dominic

holzmichel May 14, 2009 02:39

Hello,

I am trying to run OF-1.5-dev in parallel without any success. I changed my bashrc as DSpreitz describes, installed OF on all PCs, installed NFS, and so on...

When I try to run, for example, icoFoam, the following message appears in my terminal:

michel@Linux-K:~/OpenFOAM/michel-1.5-dev/run/tutorials/icoFoam/cavity$ /home/michel/OpenFOAM/OpenFOAM-1.5-dev/applications/bin/linuxGccDPOpt/icoFoam: error while loading shared libraries: libfiniteVolume.so: cannot open shared object file: No such file or directory

This error appears every time I use the command:

mpirun --hostfile <machines> -np 2 /home/michel/OpenFOAM/OpenFOAM-1.5-dev/applications/bin/linuxGccDPOpt/icoFoam $FOAM_RUN/tutorials/icoFoam cavity -parallel > log &

But when I use this command:
mpirun --hostfile <machines> -np 2 icoFoam $FOAM_RUN/tutorials/icoFoam cavity -parallel > log &
the following error appears:
mpirun --hostfile /home/michel/machines -np 2 icoFoam /home/michel/OpenFOAM/michel-1.5-dev/run/tutorials/icoFoam cavity -parallel > log
--------------------------------------------------------------------------
Failed to find the following executable:

Host: ubuntu
Executable: interFoam

Cannot continue.
--------------------------------------------------------------------------
mpirun noticed that job rank 0 with PID 14312 on node 192.168.1.82 exited on signal 15 (Terminated)

That's really strange because, as I said, OF is on the other PC too. I also tried this command with the full path of mpirun:

/home/michel/OpenFOAM/ThirdParty/openmpi-1.2.6/platforms/linuxGccDPOpt/bin/mpirun --hostfile ... and so on, and no error appears, but my PC does nothing???

Does anybody have some suggestions for me? I need parallel mode because my cases take more and more time to compute.

Thanks

Best regards

Michel

holzmichel May 14, 2009 03:57

well
 
I tried the following command:

/home/michel/OpenFOAM/OpenFOAM-1.5-dev/bin/foamExec -v 1.5-dev /home/michel/OpenFOAM/ThirdParty/openmpi-1.2.6/platforms/linuxGccDPOpt/bin/mpirun --hostfile /home/michel/machines -np 2 icoFoam /home/michel/OpenFOAM/michel-1.5-dev/run/tutorials/icoFoam cavity -parallel > log

with foamExec, and this error appears:

[: 106: ==: unexpected operator
[: 106: ==: unexpected operator
[: 153: ==: unexpected operator
[: 257: ==: unexpected operator
[: 259: ==: unexpected operator
[: 56: ==: unexpected operator
[: 74: ==: unexpected operator

and nothing happens.
Which file causes this error, and what can I do to solve it?
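Messages of the form `[: 106: ==: unexpected operator` look like dash (the `/bin/sh` on Debian and Ubuntu) rejecting the bash-only `==` inside `[ ... ]`; POSIX `test` only defines `=`. That would fit here if foamExec runs under `/bin/sh` and sources bash-syntax OpenFOAM files. A sketch to locate such lines; `find_double_equals` is a hypothetical helper, and the paths in the usage comment are examples.

```shell
# Print "line:text" for every line of file $1 that uses '==' inside a
# '[ ... ]' test. dash (and POSIX test) only accept '=', so these lines
# fail when the file is run or sourced by /bin/sh.
find_double_equals() {
    grep -n '\[ .*==' "$1"
}

# Portable form of such a comparison:
#   if [ "$WM_PROJECT_VERSION" = "1.5-dev" ]; then ...; fi
#
# Usage (example paths; adjust to your installation):
#   find_double_equals /home/michel/OpenFOAM/OpenFOAM-1.5-dev/bin/foamExec
#   find_double_equals /home/michel/OpenFOAM/OpenFOAM-1.5-dev/etc/bashrc
```

If that is the cause, running the wrapper with bash explicitly (`bash /path/to/foamExec ...`) or changing `==` to `=` should avoid the error.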

Thank you again

Michel

DSpreitz May 14, 2009 13:46

Michel,

I don't know if it makes a big difference, but the mpirun command that I use contains a -case option.
Here is the command:
mpirun --hostfile ~/machines_pil -np 12 /home/user/OpenFOAM/OpenFOAM-1.5/applications/bin/linuxGccDPOpt/icoFoam -case /home/user/OpenFOAM/user-1.5/run/tutorials/icoFoam/cavity -parallel
Everything on one line, obviously.

Dominic

holzmichel May 15, 2009 03:02

Thanks Dominic for your reply,

I tried it with -case, but the same error appears:
/home/michel/OpenFOAM/OpenFOAM-1.5-dev/applications/bin/linuxGccDPOpt/icoFoam: error while loading shared libraries: libfiniteVolume.so: cannot open shared object file: No such file or directory
I think openmpi does not know the paths to all the libraries. I tried to export them in the bashrc, but it didn't work. Maybe I did something wrong when adding the paths, but I don't know.
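One way to see which libraries the loader cannot find, through the same non-interactive path that mpirun uses, is to run `ldd` over ssh and keep only its `not found` lines. `missing_libs` is a hypothetical helper; the icoFoam path in the usage comment is the one from this post.

```shell
# Read `ldd` output on stdin and print the libraries marked "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage: run ldd through a non-interactive ssh session, as mpirun does
# (single quotes so nothing is expanded by the local shell):
#   ssh HOST 'ldd /home/michel/OpenFOAM/OpenFOAM-1.5-dev/applications/bin/linuxGccDPOpt/icoFoam' | missing_libs
```

Open MPI's `mpirun -x LD_LIBRARY_PATH ...` can also forward the launching shell's value to the remote processes, which may serve as a stopgap while the remote shell startup files are being fixed.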

Thanks

Michel

holzmichel May 15, 2009 09:29

I checked my $PATH and $LD_LIBRARY_PATH on every machine. Both look like this:

$PATH
/home/michel/OpenFOAM/ThirdParty/ParaView3.3-cvs/platforms/linuxGcc/bin:/home/michel/OpenFOAM/ThirdParty/cmake-2.4.6/platforms/linux/bin:/home/michel/OpenFOAM/ThirdParty/openmpi-1.2.6/platforms/linuxGccDPOpt/bin:/home/michel/OpenFOAM/ThirdParty/gcc-4.3.1/platforms/linux/bin:/home/michel/OpenFOAM/michel-1.5-dev/applications/bin/linuxGccDPOpt:/home/michel/OpenFOAM/OpenFOAM-1.5-dev/applications/bin/linuxGccDPOpt:/home/michel/OpenFOAM/OpenFOAM-1.5-dev/wmake:/home/michel/OpenFOAM/OpenFOAM-1.5-dev/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games

$LD_LIBRARY_PATH
/home/michel/OpenFOAM/ThirdParty/ParaView3.3-cvs/platforms/linuxGcc/bin:/home/michel/OpenFOAM/OpenFOAM-1.5-dev/lib/linuxGccDPOpt/openmpi-1.2.6:/home/michel/OpenFOAM/ThirdParty/openmpi-1.2.6/platforms/linuxGccDPOpt/lib:/home/michel/OpenFOAM/ThirdParty/gcc-4.3.1/platforms/linux/lib:/home/michel/OpenFOAM/michel-1.5-dev/lib/linuxGccDPOpt:/home/michel/OpenFOAM/OpenFOAM-1.5-dev/lib/linuxGccDPOpt

I think everything is OK, but mpirun is still not working.
Are PATH and LD_LIBRARY_PATH OK, or do I have to add something to the bashrc?

regards

Michel

holzmichel May 20, 2009 06:01

I am still not able to run in parallel :(
Does nobody have any ideas for me?

Michel

olesen May 20, 2009 08:36

Quote:

Originally Posted by holzmichel (Post 216033)
I tried the following command:

/home/michel/OpenFOAM/OpenFOAM-1.5-dev/bin/foamExec -v 1.5-dev /home/michel/OpenFOAM/ThirdParty/openmpi-1.2.6/platforms/linuxGccDPOpt/bin/mpirun --hostfile /home/michel/machines -np 2 icoFoam /home/michel/OpenFOAM/michel-1.5-dev/run/tutorials/icoFoam cavity -parallel > log

Okay, but did you also try what is suggested in the foamExec description?
With foamExec being called by mpirun, instead of vice-versa:
Code:

#    Can also be used for parallel runs e.g.
#    mpirun -np <nProcs> \
#        foamExec -v <foamVersion> <foamCommand> ... -parallel


olesen May 20, 2009 08:42

Quote:

Originally Posted by holzmichel (Post 216273)
I checked my $PATH and $LD_LIBRARY_PATH on every machine. Both look like this:
...
I think everything is OK, but mpirun is still not working.
Are PATH and LD_LIBRARY_PATH OK, or do I have to add something to the bashrc?

Are you certain that the PATH and LD_LIBRARY_PATH are set for non-interactive shells? You can try, for example, the following (the single-quotes are needed to avoid shell expansion):

Code:

ssh $HOST 'echo $PATH; echo; echo $LD_LIBRARY_PATH'
Are you using something like ksh, which doesn't have a resource file for non-interactive shells? (Although I believe this problem was fixed long ago in openmpi.)
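The quoting point is easy to demonstrate locally. In the sketch below, `sh -c` stands in for `ssh $HOST` (an assumption made so it runs on one machine), and `VAR` is a hypothetical variable that has different values in the calling shell and the inner shell.

```shell
# 'sh -c' plays the role of 'ssh $HOST': a second shell whose own
# environment has VAR=remote, while the calling shell has VAR=local.
demo_quoting() {
    VAR=local
    # Single quotes: $VAR reaches the inner shell unexpanded and is expanded there.
    VAR=remote sh -c 'echo "single: $VAR"'
    # Double quotes: the calling shell expands $VAR before sh ever runs.
    VAR=remote sh -c "echo \"double: $VAR\""
}

# Running demo_quoting prints:
#   single: remote
#   double: local
```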

holzmichel May 26, 2009 03:57

Hello Mark,

Your suggestion solved my problem.
Thank you for your help.

Best regards

tomislav_maric August 1, 2009 15:49

Hello everyone, I'm glad I found this thread, since I have the same problem holzmichel ran into. Searching the net, I found instructions saying that I should comment out the if command that returns early from .bashrc when bash runs in non-interactive mode.

Actually, I'm running OpenFOAM from a SLAX live DVD, and I'm trying to figure out how to use this live DVD for simulations on a LAN.

The following code suggested by olesen

Code:

ssh $HOST 'echo $PATH; echo; echo $LD_LIBRARY_PATH'
showed me that LD_LIBRARY_PATH and PATH are not set at all when bash runs in non-interactive mode. How exactly can I solve this?

Thank you in advance,
Tomislav

tomislav_maric August 2, 2009 16:06

Quote:

Originally Posted by olesen (Post 216716)
Okay, but did you also try what is suggested in the foamExec description?
With foamExec being called by mpirun, instead of vice-versa:
Code:

#    Can also be used for parallel runs e.g.
#    mpirun -np <nProcs> \
#        foamExec -v <foamVersion> <foamCommand> ... -parallel


I'm having the same problem: error loading shared libraries (libFiniteVolume.so..), so I tried the command above, but mpirun complains that I'm running multiple commands with an unspecified -np number.

My command looks like this:

/path/name/of/mpirun -np 2 -H mario foamExex -v 1.5-dev interFoam -parallel

and it works fine on my dual core laptop.

What am I doing wrong?

