CFD Online Discussion Forums

Parallel Computing on more than one machine (http://www.cfd-online.com/Forums/openfoam-solving/58229-parallel-computing-more-than-one-machine.html)

harly December 8, 2008 18:31

Hi,

I want to use multiple computers to speed up my calculations, but I ran into some trouble. Here is my story:

I have the following setup:

1 Server with the home directories

n clients with 2 Cores each

The problem is that when I log on to client No. 1 and start a calculation with:

mpirun --hostfile machines -np nx2 turbFoam -parallel >log

I get the following error:

bash: orted: command not found

I figured the problem could be that the library for orted (MPI) lies in the ThirdParty folder and would therefore be accessed by both computers at the same time on the server.

To solve that problem I created a new account and copied the OpenFOAM installation.

Now what I wanted to do was:

- being logged on to client No.1
- start calculation on client 1 with current user and on client 2 as the new user

But I could not find a way to tell mpirun to log on to client 2 with a separate username.

Is parallel computing across several machines possible at all with the setup I have? I only have full write access to my home folder, and the home directories are mounted via NFS.

Maybe some of you have tried that already.

Thank you
-harly
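
For reference, the hostfile passed to mpirun in the post above could be sketched roughly as follows. This is only an illustration: it assumes Open MPI, two clients named lab15 and lab13 (the host names that come up later in this thread), and two slots per client to match the two cores each. The command is printed rather than executed.

```shell
#!/bin/sh
# Sketch of an Open MPI hostfile for two dual-core clients.
# Assumptions: host names lab15/lab13 (taken from later in the thread)
# and "slots=2" to match the 2 cores per client.
cat > machines <<'EOF'
lab15 slots=2
lab13 slots=2
EOF

# Total process count = sum of all slots entries (n clients x 2 cores).
NP=$(awk -F'slots=' '{ total += $2 } END { print total }' machines)
echo "$NP"

# Print the resulting command for inspection instead of running it.
echo "mpirun --hostfile machines -np $NP turbFoam -parallel > log"
```

The value given to -np also has to match numberOfSubdomains in the case's decomposeParDict, since the case must be decomposed into that many parts before the parallel run.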

gschaider December 9, 2008 08:46

Hi Harly!

I think the problem (in your original setup; I don't understand the part with the two users) might have been that the other machine did not know where to find orted. If both of your machines are set up identically, a generous application of mpirun's -x option might help. The options that I use, for instance, are (possibly not all of them are necessary):

-x PATH -x LD_LIBRARY_PATH -x WM_PROJECT_DIR -x FOAM_MPI_LIBBIN

Bernhard
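
Put together with the original command, the -x suggestion above might look like the sketch below. The variable list is the one quoted in the post; the process count of 4 and the hostfile name "machines" are assumptions for a setup with two dual-core clients. The script only builds and prints the command.

```shell
#!/bin/sh
# Environment variables to forward with Open MPI's -x option;
# the list is the one quoted in the post above.
FORWARD_VARS="PATH LD_LIBRARY_PATH WM_PROJECT_DIR FOAM_MPI_LIBBIN"

# Build the "-x NAME" arguments, one per variable.
X_ARGS=""
for var in $FORWARD_VARS; do
    X_ARGS="$X_ARGS -x $var"
done

# Assumptions: 4 processes (two dual-core clients), hostfile named "machines".
NP=4
CMD="mpirun --hostfile machines -np $NP$X_ARGS turbFoam -parallel"

# Print the command for inspection; append "> log" when running it for real.
echo "$CMD"
```

As far as I understand it, -x forwards the variables to the solver processes that are started on the remote nodes. It probably does not affect how the remote shell locates the orted daemon itself, so a missing orted on the remote PATH would need a separate fix.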

floooo December 9, 2008 11:45

Hi,

In my company the home folder is shared on all computers, and the MPICH method works.
I ran a job on 3x8 cores, and it works.

But I don't know exactly how the home folder is mounted on the machines.

Florian

harly December 9, 2008 15:17

Hi,

The trick with -x didn't help. Here is the complete error; maybe you can find something in there:

bash: orted: command not found
[lab15:18747] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[lab15:18747] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166
[lab15:18747] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[lab15:18747] ERROR: A daemon on node lab13 failed to start as expected.
[lab15:18747] ERROR: There may be more information available from
[lab15:18747] ERROR: the remote shell (see above).
[lab15:18747] ERROR: The daemon exited unexpectedly with status 127.
[lab15:18747] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[lab15:18747] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1198
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------

thanks a lot

btw: lab15 is my client 1 (the one I am logged on to and start the calculation from) and lab13 is client 2

gschaider December 9, 2008 15:41

Hi!

I have never had that problem, so I'm not sure: is it possible that mpirun can't start the remote processes (either via rsh or ssh)? Try from lab15:

rsh lab13 which orted

or

ssh lab13 which orted

One of them should find the correct file. Otherwise you'd have to configure rsh or ssh (better) to allow remote execution.

That said, the "bash: orted: command not found" message hints at "I can connect, but I can't find orted on the remote machine". Make sure it is in the same place on both machines (symbolic links are your friends).

Bernhard
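
The check suggested above can be wrapped in a small helper. The sketch below is an assumption-laden illustration: check_cmd is a hypothetical function name, and the OpenFOAM path in the comments is only a guess at a typical install location. The helper runs "command -v" through whatever launcher it is given, so it can be tried locally first and then pointed at ssh.

```shell
#!/bin/sh
# check_cmd PROG LAUNCHER...: ask a non-interactive shell, started through
# LAUNCHER, whether PROG is on its PATH. (check_cmd is a hypothetical helper.)
check_cmd() {
    prog=$1
    shift
    if "$@" "command -v $prog" >/dev/null 2>&1; then
        echo "$prog found"
    else
        echo "$prog NOT found"
    fi
}

# Local dry run through a non-interactive shell:
check_cmd orted sh -c

# Real use from lab15 (not executed here):
#   check_cmd orted ssh lab13
#
# If orted is not found remotely, one common cause is that the OpenFOAM
# environment is only set up for interactive logins. A possible fix is to
# source it near the top of ~/.bashrc, before any "return if not
# interactive" test. The path below is an assumed install location:
#   . $HOME/OpenFOAM/OpenFOAM-1.5/etc/bashrc
```

Because the home directories are shared over NFS, a change to ~/.bashrc would be seen by every client at once, which may be convenient here.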

