December 8, 2008, 19:31 |
#1 |
Member
Daniel Harlacher
Join Date: Mar 2009
Location: Davis, CA, United States
Posts: 60
Rep Power: 17 |
Hi,
I want to use multiple computers to speed up my calculations, but I ran into some trouble. Here is my story.

I have the following setup:
- 1 server with the home directories
- n clients with 2 cores each

The problem is that when I log on to client no. 1 and start a calculation with

    mpirun --hostfile machines -np nx2 turbFoam -parallel >log

I get the following error:

    bash: orted: command not found

I figured the problem could be that the library for orted (MPI) lies in the ThirdParty folder and would therefore be accessed by both computers at the same time on the server. To solve that, I created a new account and copied the OpenFOAM installation. What I wanted to do then was:
- be logged on to client no. 1
- start the calculation on client 1 as the current user and on client 2 as the new user

But I could not find a way to tell mpirun to log on to client 2 with a separate username.

Is computing on multiple machines possible at all with the setup I have? I only have full write access to my home folder, and the home directories are mounted via NFS. Maybe some of you have tried that already.

Thank you
-harly |
|
December 9, 2008, 09:46 |
#2 |
Assistant Moderator
Bernhard Gschaider
Join Date: Mar 2009
Posts: 4,225
Rep Power: 51 |
Hi Harly!
I think that the problem (in your original setup; I don't understand the stuff with the two users) might have been that the other machine did not know where to find orted. If both of your machines are set up identically, a generous application of the -x option of mpirun might help. The options that I use, for instance, are (it is possible that not all of them are necessary):

    -x PATH -x LD_LIBRARY_PATH -x WM_PROJECT_DIR -x FOAM_MPI_LIBBIN

Bernhard
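
For illustration, here is a minimal sketch of how those -x exports could be combined with the mpirun call from the first post. The hostfile name `machines` and the solver `turbFoam` come from this thread; the process count of 4 (two clients with 2 cores each) and the helper name `build_mpirun_cmd` are assumptions made up for the example. The function only prints the command line, so it can be inspected before anything is launched.

```shell
#!/bin/sh
# Sketch only: assemble the mpirun command line with the -x exports
# mentioned above. HOSTFILE and SOLVER are taken from the thread;
# NPROCS=4 is an assumed value (2 clients x 2 cores).
HOSTFILE=machines
NPROCS=4
SOLVER=turbFoam

# Hypothetical helper: prints the command instead of running it.
build_mpirun_cmd() {
    cmd="mpirun --hostfile $HOSTFILE -np $NPROCS"
    # Each "-x VAR" asks mpirun to pass VAR from the launching shell
    # on to the processes it starts.
    for var in PATH LD_LIBRARY_PATH WM_PROJECT_DIR FOAM_MPI_LIBBIN; do
        cmd="$cmd -x $var"
    done
    echo "$cmd $SOLVER -parallel"
}

build_mpirun_cmd
```

Once the printed line looks right, it can be pasted into the shell with the output redirected to a log file as in the original command.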
__________________
Note: I don't use "Friend"-feature on this forum out of principle. Ah. And by the way: I'm not on Facebook either. So don't be offended if I don't accept your invitation/friend request |
|
December 9, 2008, 12:45 |
#3 |
Member
florian
Join Date: Mar 2009
Location: Mannheim - Vincennes - Valenciennes, Germany - France
Posts: 34
Rep Power: 17 |
Hi,
In my company the home folder is shared on all computers, and the MPICH method works. I ran a job on 3x8 cores, and it works. But I don't know exactly how the home folder is mounted on the machines.

Florian |
|
December 9, 2008, 16:17 |
#4 |
Member
Daniel Harlacher
Join Date: Mar 2009
Location: Davis, CA, United States
Posts: 60
Rep Power: 17 |
Hi,
the trick with -x didn't help. Here is the complete error; maybe you can find something in there:

    bash: orted: command not found
    [lab15:18747] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
    [lab15:18747] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166
    [lab15:18747] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
    [lab15:18747] ERROR: A daemon on node lab13 failed to start as expected.
    [lab15:18747] ERROR: There may be more information available from
    [lab15:18747] ERROR: the remote shell (see above).
    [lab15:18747] ERROR: The daemon exited unexpectedly with status 127.
    [lab15:18747] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
    [lab15:18747] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1198
    --------------------------------------------------------------------------
    mpirun was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
    --------------------------------------------------------------------------

Thanks a lot.

By the way: lab15 is my client 1 (the one I am logged on to and start the calculation from) and lab13 is client 2. |
|
December 9, 2008, 16:41 |
#5 |
Assistant Moderator
Bernhard Gschaider
Join Date: Mar 2009
Posts: 4,225
Rep Power: 51 |
Hi!
Never had that problem, so I'm not sure: is it possible that mpirun can't start the remote processes (either via rsh or ssh)? Try from lab15

    rsh lab13 which orted

or

    ssh lab13 which orted

One of them should find the correct file. Otherwise you'd have to configure rsh or (better) ssh to allow remote execution. Although the "bash: orted: command not found" hints at "I can connect, but on the remote machine I can't find orted". Make sure it is in the same place on both machines (symbolic links are your friends).

Bernhard
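
The check above can be wrapped in a small function, sketched here under a few assumptions. The host name `lab13` is from this thread; the function name `check_orted` and the `REMOTE_SHELL` variable are made up for the example, and `command -v` is used in place of `which` because its exit status is more predictable in a non-interactive bash session. The status 127 in the log above is the shell's usual "command not found" exit code, which is consistent with what this check looks for.

```shell
#!/bin/sh
# Sketch only: ask a remote host whether orted is on the PATH that a
# non-interactive login sees (the same kind of session mpirun gets).
# REMOTE_SHELL is a hypothetical knob; it defaults to ssh and can be
# set to rsh.
check_orted() {
    host=$1
    if path=$(${REMOTE_SHELL:-ssh} "$host" 'command -v orted' 2>/dev/null) \
        && [ -n "$path" ]; then
        echo "$host: orted found at $path"
    else
        echo "$host: orted not found in the non-interactive PATH"
        return 1
    fi
}
```

Calling `check_orted lab13` from lab15 should print the location of orted if the remote environment is set up; a non-zero return suggests that the OpenFOAM/MPI environment is not loaded for non-interactive logins on that host.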