CFD Online Discussion Forums

CFD Online Discussion Forums (http://www.cfd-online.com/Forums/)
-   OpenFOAM Installation (http://www.cfd-online.com/Forums/openfoam-installation/)
-   -   Please help cannot start lamboot (http://www.cfd-online.com/Forums/openfoam-installation/57540-please-help-cannot-start-lamboot.html)

hsieh March 15, 2005 23:47

Hi,

I am trying to get parallel computing going and am running into a problem. I would appreciate it if anyone here could help me.

1. I have NFS running. Process1 is mounted to process0.
2. I have passwordless ssh working. I can type:
ssh -v phsieh@192.168.254.43
and log in to the remote computer without entering a password.

But I cannot get lamboot -v ... to start (the machines file contains 2 nodes).

Here is the error message:
------------------------
[phsieh@brian3 interFoam]$ lamboot -v /home/phsieh/OpenFOAM/phsieh-1.1/run/tutorials/interFoam/damBreakFine/system/machines

LAM 7.1.1 - Indiana University

n-1<4730> ssi:boot:base:linear: booting n0 (brian3.hsieh.com)
n-1<4730> ssi:boot:base:linear: booting n1 (kevin3.hsieh.com)
ERROR: LAM/MPI unexpectedly received the following on stderr:
connect to address 192.168.254.32: Connection refused
connect to address 192.168.254.32: Connection refused
trying normal rsh (/usr/bin/rsh)
kevin3.hsieh.com: Connection refused
-----------------------------------------------------------------------------
LAM failed to execute a process on the remote node "kevin3.hsieh.com".
LAM was not trying to invoke any LAM-specific commands yet -- we were
simply trying to determine what shell was being used on the remote
host.

LAM tried to use the remote agent command "rsh"
to invoke "echo $SHELL" on the remote node.

*** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
*** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
*** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
*** MAILING LIST.

This usually indicates an authentication problem with the remote
agent, some other configuration type of error in your .cshrc or
.profile file, or you were unable to executable a command on the
remote node for some other reason. The following is a list of items
that you should check on the remote node:

- You have an account and can login to the remote machine
- Incorrect permissions on your home directory (should
probably be 0755)
- Incorrect permissions on your $HOME/.rhosts file (if you are
using rsh -- they should probably be 0644)
- You have an entry in the remote $HOME/.rhosts file (if you
are using rsh) for the machine and username that you are
running from
- Your .cshrc/.profile must not print anything out to the
standard error
- Your .cshrc/.profile should set a correct TERM type
- Your .cshrc/.profile should set the SHELL environment
variable to your default shell

Try invoking the following command at the unix command line:

rsh kevin3.hsieh.com -n 'echo $SHELL'

You will need to configure your local setup such that you will *not*
be prompted for a password to invoke this command on the remote node.
No output should be printed from the remote node before the output of
the command is displayed.

When you can get this command to execute successfully by hand, LAM
will probably be able to function properly.
-----------------------------------------------------------------------------
n-1<4730> ssi:boot:base:linear: Failed to boot n1 (kevin3.hsieh.com)
n-1<4730> ssi:boot:base:linear: aborted!
n-1<4735> ssi:boot:base:linear: booting n0 (brian3.hsieh.com)
n-1<4735> ssi:boot:base:linear: booting n1 (kevin3.hsieh.com)
ERROR: LAM/MPI unexpectedly received the following on stderr:
connect to address 192.168.254.32: Connection refused
connect to address 192.168.254.32: Connection refused
trying normal rsh (/usr/bin/rsh)
kevin3.hsieh.com: Connection refused
-----------------------------------------------------------------------------
LAM failed to execute a process on the remote node "kevin3.hsieh.com".
LAM was not trying to invoke any LAM-specific commands yet -- we were
simply trying to determine what shell was being used on the remote
host.

LAM tried to use the remote agent command "rsh"
to invoke "echo $SHELL" on the remote node.

*** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
*** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
*** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
*** MAILING LIST.

This usually indicates an authentication problem with the remote
agent, some other configuration type of error in your .cshrc or
.profile file, or you were unable to executable a command on the
remote node for some other reason. The following is a list of items
that you should check on the remote node:

- You have an account and can login to the remote machine
- Incorrect permissions on your home directory (should
probably be 0755)
- Incorrect permissions on your $HOME/.rhosts file (if you are
using rsh -- they should probably be 0644)
- You have an entry in the remote $HOME/.rhosts file (if you
are using rsh) for the machine and username that you are
running from
- Your .cshrc/.profile must not print anything out to the
standard error
- Your .cshrc/.profile should set a correct TERM type
- Your .cshrc/.profile should set the SHELL environment
variable to your default shell

Try invoking the following command at the unix command line:

rsh kevin3.hsieh.com -n 'echo $SHELL'

You will need to configure your local setup such that you will *not*
be prompted for a password to invoke this command on the remote node.
No output should be printed from the remote node before the output of
the command is displayed.

When you can get this command to execute successfully by hand, LAM
will probably be able to function properly.
-----------------------------------------------------------------------------
n-1<4735> ssi:boot:base:linear: Failed to boot n1 (kevin3.hsieh.com)
n-1<4735> ssi:boot:base:linear: aborted!
lamboot did NOT complete successfully
[phsieh@brian3 interFoam]$

pei

niklas March 16, 2005 04:14

Hi,

Have you logged in to kevin3 with just ssh prior to starting lamboot?

The first time you log on to a machine with ssh you have to answer a yes/no question, and lamboot cannot handle this.

You must therefore log on by hand to each of the machines you want to use for the parallel run before starting.
N
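
A minimal sketch of that manual step, assuming the machines file lists one host per line with `#` comments (the usual LAM boot schema layout). The `MACHINES` default and the `list_hosts` helper are names made up for this illustration. The script only prints the ssh commands, because the yes/no host-key question has to be answered interactively:

```shell
#!/bin/sh
# list_hosts FILE: print the first field of every non-empty, non-comment line.
list_hosts() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# Hypothetical default location of the machines file; adjust to your case.
MACHINES=${MACHINES:-system/machines}

if [ -f "$MACHINES" ]; then
    for host in $(list_hosts "$MACHINES"); do
        # Run each printed command once by hand and answer "yes" if asked.
        echo "ssh $host 'echo \$SHELL'"
    done
fi
```

Once every printed command returns a shell name without asking anything, lamboot should no longer trip over the host-key prompt.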

seang March 16, 2005 04:22

Another alternative is to copy all the relevant host information into the .ssh directory.

seang March 16, 2005 04:24

Ah ... sorry, a rather incomplete sentence there. I mean: make sure all the different host IDs are present in the relevant file in your .ssh directory.
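
One way this could be done, sketched under the assumption that the file in question is OpenSSH's `~/.ssh/known_hosts` and that `ssh-keyscan` is installed. The `add_host_key` helper and the `KEYSCAN` variable are hypothetical names for this example. Keys fetched this way are trusted blindly, so it only makes sense on a network you control:

```shell
#!/bin/sh
# Command used to fetch a host's public key (assumption: OpenSSH's ssh-keyscan).
KEYSCAN=${KEYSCAN:-ssh-keyscan}

# add_host_key HOST [FILE]: append HOST's key to FILE unless HOST is already listed.
# Note: hashed known_hosts entries are not recognised by this simple check.
add_host_key() {
    host=$1
    known=${2:-$HOME/.ssh/known_hosts}
    if [ -f "$known" ] && awk -v h="$host" '
            { n = split($1, names, ","); for (i = 1; i <= n; i++) if (names[i] == h) found = 1 }
            END { exit !found }' "$known"; then
        echo "$host already present in $known"
    else
        $KEYSCAN "$host" >> "$known"
    fi
}

# Usage, with the node name from the first post:
#   add_host_key kevin3.hsieh.com
```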

gjesing March 16, 2005 04:24

Hi,

LAM uses rsh by default, but that is probably not installed or running on your system. Instead, switch to ssh (which is also more secure) by setting the environment variable LAMRSH to "ssh -x". As I read your post, you already have ssh working, so you just need to tell LAM to use it.

/Rasmus
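
For a Bourne-type shell (sh, bash) the setting described above might look like the sketch below; the csh equivalent and the lamboot invocation are shown as comments only, and the machines path is shortened for illustration:

```shell
# Tell LAM to use ssh instead of rsh as its remote agent.
LAMRSH="ssh -x"
export LAMRSH

# csh/tcsh users would write instead:
#   setenv LAMRSH "ssh -x"

# Then boot as before, for example:
#   lamboot -v system/machines
```

Putting the export into the shell start-up file saves retyping it in every new session.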

lakeat May 22, 2007 21:41

Sorry, but my remote computer (30 CPUs) doesn't support ssh, so what can I do? I have to rsh to it (let me call it B) from my computer (I call it A).
I have spent a whole day trying to set up passwordless rsh, for example by editing the .rhosts file or /etc/hosts.****, but failed.
My question is: do I need to install OpenFOAM on the remote computer?
Thanks in advance.

Daniel

gschaider May 23, 2007 14:58

Yep. For rsh the binaries (in your case OpenFOAM) have to be available on B (rsh just starts a shell on B and tries to execute your command there; no data or programs get sent during the process). If B uses the same /home as A, then this should be relatively easy (provided they are both of the same architecture) because you already have a /home/daniel/OpenFOAM where these programs reside. The only problem is that the OpenFOAM/OpenFOAM-1.4/.OpenFOAM-1.4/bashrc may not get sourced on B, and therefore the OF applications (and the .so files) will not be found (but it's been a long time since I worked with rsh, so I'm only 93% sure).

Just try
rsh B interFoam
and report what error message you get.
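
If the bashrc really is not sourced on B, one possible workaround is to source it explicitly in the remote command. This is only a sketch: it assumes a Bourne-compatible login shell on B and the installation path mentioned above, and the `remote_cmd` helper is a made-up name:

```shell
#!/bin/sh
# Path from the post above, left unexpanded so that $HOME is resolved on B.
OF_BASHRC='$HOME/OpenFOAM/OpenFOAM-1.4/.OpenFOAM-1.4/bashrc'

# remote_cmd APP: print a command line that sources the OpenFOAM
# environment before running APP.
remote_cmd() {
    printf '. %s && %s\n' "$OF_BASHRC" "$1"
}

# Usage (B is the remote host from the post):
#   rsh B "$(remote_cmd 'which interFoam')"
```

If `which interFoam` then prints a path, the environment is the missing piece; if it still reports nothing, the binaries themselves are not reachable on B.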

lakeat May 23, 2007 22:41

Thank you very much, Bernhard.
Now, when I try rsh B interFoam, I get:

connect to address 202.***.***.*** port ***: Connection refused
Trying krb4 rsh...
connect to address 202.***.***.*** port ***: Connection refused
Trying normal rsh (/usr/bin/rsh)
Login incorrect.


I'm a newbie to the Linux world, and I guess I probably made a mistake when setting up passwordless rsh: I should have modified hosts, hosts.allow, hosts.deny, hosts.equiv, and .rhosts on computer B, not on A. Am I right?

I use Fedora Core 6, and my remote computer runs IRIX. InstallationTest told me OpenFOAM can only be installed on Linux, which is why I ask: do I have to install OpenFOAM on both computers? Also, it's not easy for me to get root access on B.

Thanks again.

Daniel

gschaider May 24, 2007 14:44

As above: if the machines share a /home then setting ~/.rhosts should be sufficient.
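
A minimal sketch of such an entry, assuming the common "hostname username" line format and the 0644 permissions that the LAM error message earlier in the thread suggests. The `write_rhosts` helper is a hypothetical name, and A and daniel stand for the local machine and user name used in this thread:

```shell
#!/bin/sh
# write_rhosts FILE HOST USER: allow USER coming from HOST, then fix permissions.
write_rhosts() {
    printf '%s %s\n' "$2" "$3" >> "$1"
    chmod 644 "$1"
}

# Usage, run where B will see the file (B itself, or the shared /home):
#   write_rhosts "$HOME/.rhosts" A daniel
```

Some rsh daemons are stricter about ownership and permissions than this, so the man page on B is worth checking if the login is still refused.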

One guess is this: some of the passwordless logins ONLY work if the .bashrc (or the equivalent file for whatever shell you use) doesn't print any output to stdout. If your B does such a thing, you'll have to remove these commands or distinguish between interactive (can have output) and non-interactive (==rsh) logins. But that is only a guess.
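
The interactive/non-interactive distinction can be sketched as below for a Bourne-type start-up file. It relies on the special parameter `$-`, which contains the letter `i` only in interactive shells; the `is_interactive` helper and the greeting are placeholders for whatever output the real file produces:

```shell
# Fragment for a start-up file such as ~/.bashrc on the remote machine.

# is_interactive FLAGS: succeed if the shell option string contains "i".
is_interactive() {
    case $1 in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

if is_interactive "$-"; then
    # Anything that prints belongs inside this branch, so rsh/ssh
    # command execution stays silent.
    echo "Welcome to $(hostname)"
fi
```

csh users would need the equivalent test in .cshrc (checking whether `$prompt` is set), which is not shown here.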


All times are GMT -4.