CFD Online Discussion Forums

-   -   Parallel runs across a network broken with OpenMPI -- SSH issue--no bashrc? (https://www.cfd-online.com/Forums/openfoam-solving/62627-parallel-runs-across-network-broken-openmpi-ssh-issue-no-bashrc.html)

carcass March 15, 2009 20:51

Parallel runs across a network broken with OpenMPI -- SSH issue--no bashrc?
 
Hi,

Glad to see the new forums are up and running. I have a question for the group, and I wouldn't be surprised if this were a common problem (at least I'm hoping so, for my sake).

I have run a parallel case decomposed for two processors on a single machine successfully. I set up a second computer running an identical OS (Ubuntu 8.10) with the same update level, etc. as my primary box. Both computers have identical paths to the OpenFOAM installation as well as the case files.

I decomposed the case for four processors, two per machine, and executed the project per the instructions in the documentation. At this point I began to have problems with the remote machine not being able to find the icoFoam executable (nor any OpenFOAM executable, for that matter).

I tested running some simple system commands using a nearly identical mpirun command and succeeded running them on all four processors.

e.g.: mpirun --hostfile ./machines -np 4 uptime returns:

carcass@Esker:~/user-files/Works/scrubber-icoFoam$ time mpirun --hostfile ./machines -np 4 uptime
20:54:46 up 1 day, 4:24, 2 users, load average: 0.00, 0.00, 0.00
20:54:46 up 1 day, 4:24, 2 users, load average: 0.00, 0.00, 0.00
20:54:40 up 23:13, 5 users, load average: 0.40, 0.40, 0.39
20:54:40 up 23:13, 5 users, load average: 0.40, 0.40, 0.39

real 0m0.480s
user 0m0.048s
sys 0m0.056s


The problem seems to stem from SSH not executing any setup files (.bashrc, .profile, .bash_profile, /etc/profile, /etc/bash.bashrc, etc ad nauseam) when running a non-interactive, non-login shell.

I can duplicate the problem by executing:

mpirun --hostfile ./machines -np 4 which icoFoam

carcass@Esker:~/user-files/Works/scrubber-icoFoam$ mpirun --hostfile ./machines -np 4 which icoFoam
/home/carcass/OpenFOAM/OpenFOAM-1.5/applications/bin/linux64GccDPOpt/icoFoam
/home/carcass/OpenFOAM/OpenFOAM-1.5/applications/bin/linux64GccDPOpt/icoFoam

(only two processors return)
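
One way to narrow this down, sketched under assumptions: run the same lookup through a non-interactive ssh command on every host in the hostfile, which is roughly how mpirun reaches the remote machine. The names check_hosts and REMOTE_RUN are hypothetical helpers, not part of OpenFOAM or Open MPI, and the hostfile is assumed to list one host per line, optionally followed by fields such as slots=2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report, per host, whether an executable is on the PATH
# of a non-interactive remote shell.
# REMOTE_RUN is an assumed override hook (defaults to ssh) so the loop can be
# exercised without a network.
check_hosts() {
    local hostfile=$1 exe=$2 host _rest
    while read -r host _rest || [ -n "$host" ]; do
        # Skip blank lines and comment lines in the hostfile.
        case $host in ''|'#'*) continue ;; esac
        # </dev/null keeps ssh from swallowing the rest of the hostfile.
        if "${REMOTE_RUN:-ssh}" "$host" "command -v $exe" </dev/null >/dev/null 2>&1; then
            echo "$host: $exe found"
        else
            echo "$host: $exe NOT found"
        fi
    done < "$hostfile"
}
```

Called as `check_hosts ./machines icoFoam`, a host reported as NOT found would be one where the OpenFOAM bashrc is not taking effect for non-interactive sessions.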

The problem appears to be that I cannot get the OpenFOAM-1.5/etc/bashrc file to be evaluated when ssh connects to the remote machine. The problem recurs no matter which machine is local and which is remote (the two machines are configured identically).

I have done a bunch of searching around on the internet and have found many references to OpenSSH being broken as far as bash goes because the new OpenSSH uses pipes, not sockets, to connect, and hence will not be recognized by bash as needing execution of the environment setup files. Some have suggested rebuilding bash, but I'm really not at all interested in doing that.

Has anyone in the OpenFOAM community experienced a similar problem? I'd greatly appreciate some assistance. As I mentioned, I can successfully run on two processors on a local machine, but I'm really interested in getting networked machines into the mix as well.

Thanks in advance!

kieranwood85 March 16, 2009 08:02

Hi,

I had a very similar problem and although my Linux knowledge is limited I solved it like this.

The .bashrc file in /home/<user>/ has a line in it like this,

[ -z "$PS1" ] && return

This tells bash to stop reading the rest of .bashrc when the shell is non-interactive (no prompt, so $PS1 is empty). Since the remote login made by OMPI is non-interactive, the line that loads the OpenFOAM bashrc is never reached, and so no executables can be found.

To solve it, add the line

. $HOME/OpenFOAM/OpenFOAM-1.5/etc/bashrc

to the bottom of .bashrc as instructed, and comment out the line above by putting a hash in front of it, so it reads

# [ -z "$PS1" ] && return

I was using Xubuntu 7.10, so I guess the problem is similar.
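
The effect of that guard can be reproduced locally without ssh. This is a minimal sketch, assuming bash is installed: it writes a throwaway rc file and sources it from a non-interactive bash, where PS1 is empty. The function guard_demo and the variable DEMO_ENV are made up for illustration; DEMO_ENV stands in for whatever the OpenFOAM bashrc would set.

```shell
#!/usr/bin/env bash
# Hypothetical demo: source a throwaway rc file from a non-interactive bash
# and print whether the line after the guard was reached.
# $1 = "guard" keeps the PS1 test active; anything else comments it out.
guard_demo() {
    local rc result
    rc=$(mktemp)
    if [ "$1" = guard ]; then
        echo '[ -z "$PS1" ] && return' > "$rc"
    else
        echo '# [ -z "$PS1" ] && return' > "$rc"
    fi
    # Stand-in for ". $HOME/OpenFOAM/OpenFOAM-1.5/etc/bashrc"
    echo 'DEMO_ENV=loaded' >> "$rc"
    # env -u PS1 makes sure no exported PS1 leaks into the child shell.
    result=$(env -u PS1 bash -c '. "$1"; echo "${DEMO_ENV:-skipped}"' _ "$rc")
    rm -f "$rc"
    echo "$result"
}
```

`guard_demo guard` prints skipped and `guard_demo off` prints loaded, which mirrors what happens to the OpenFOAM line at the bottom of .bashrc before and after the guard is commented out.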

Kieran

jsquyres March 16, 2009 08:12

Have a look at http://www.open-mpi.org/faq/?categor...g-ompi-to-path -- it talks about adding Open MPI to your PATH (etc.), but the same principles apply. Additionally, it talks about what files are sourced by what shells, etc.

Hope this helps.

carcass March 16, 2009 18:53

Thanks for the answers. The workaround removing the reference to $PS1 worked for me.

Happily running parallel across a network,
carcass

kieranwood85 March 16, 2009 21:25

Glad it helped. However, I am still having problems running in parallel across a network.

I have to run OF in a Linux virtual machine on top of a Windows host using VMware, since I am not an admin on the computers. Performance on the individual machines is good and OMPI works with the two local processors. So far so good.

But I cannot get bridged, unique IP addresses for the virtual machines, so all communication must go over a single port routed by NAT (which VMware sets up automatically). The IP is shared for outgoing communication, but any incoming communication is stopped by the host, since it does not know the traffic is meant for the guest. The computer admins have set up a single port-forward rule, so all communication needs to be routed over this port.

I have successfully ssh'd in and out of the remote machines over this port. I also set up OpenVPN and then ssh'd over many different ports using my newly created virtual private network IPs, so there is connectivity.

My problems start when trying OF over OMPI over the VPN. Simple programs like "uptime" work across the VPN (using my private IPs), indicating that mpirun can start processes on the remote nodes and that the nodes can see each other.

However, when using OF I get a "blt_complete_connect" error of some sort after the OF header is displayed. The strange thing is that all of the processors on the remotes sit at 100% until I Ctrl+C the program, so it appears that the post-MPI_Init communication is failing; i.e., OF finds the nodes, connects, finds the executables (thanks to the fix above), prints the header, but then cannot send any of the data needed for the solution.

Does anyone know of a reason OMPI would not work over a VPN? I wondered whether there might be some non-IP traffic, but I tested that with an IP traffic monitor on my home network, where I could get the system working with bridged networking (no VPN required), and it is all IP traffic.
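
One thing that might be worth trying, offered only as a sketch: Open MPI has an MCA parameter, btl_tcp_if_include, that restricts its TCP transport to a named network interface. The wrapper below is hypothetical, and "tun0" in the usage is an assumed name for the OpenVPN interface (check with ifconfig). Whether this resolves the hang over the VPN is not confirmed anywhere in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print (or, with RUN=1, execute) an mpirun command that
# restricts Open MPI's TCP transport to one network interface.
# $1 is the interface name; the remaining arguments are passed to mpirun as-is.
mpirun_on_iface() {
    local iface=$1
    shift
    # Rebuild the argument list with the MCA setting placed first.
    set -- mpirun --mca btl_tcp_if_include "$iface" "$@"
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}
```

For example, `mpirun_on_iface tun0 --hostfile ./machines -np 4 icoFoam -parallel` prints the full command for inspection, and prefixing it with `RUN=1` executes it.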

The point of all this is to allow parallel OF clusters to be set up on pre-existing Windows networks with very little change to the network or computer setup, thus keeping admins happy and giving me my own little network cluster.

Thanks in advance for any help.

Kieran

luca_g June 3, 2009 11:46

Dear Kieran,

I'm also trying something similar, just using Sun VirtualBox instead of VMware, and I'm having the same problem. So far I've been using Cygwin and the OF 1.4 port to Cygwin, but clearly a solution using virtualisation is required, since the Cygwin port of OF is no longer being updated. Furthermore, I've checked that running under a virtual machine is approx. 20% faster than under Cygwin.

I've found indications somewhere that Open MPI v1.2.x, which is used in OF, does not allow connections between different subnets, and that this limitation has been removed in v1.3. However, I have not tried installing Open MPI v1.3 so far.

Did you find any solution since your last post?

Thanks,

Luca

shamantic June 18, 2009 17:15

I have created ~/.ssh/rc with the content:
. $HOME/OpenFOAM/OpenFOAM-1.5/etc/bashrc

This helped also.

tomislav_maric August 2, 2009 13:54

hello everyone,

I'm having the same problem with OpenFOAM SLAX live DVD.

it seems that ssh is not calling .bashrc.

I didn't have any line to comment in the .bashrc file for the non-interactive run mode, since there was no such file to begin with. I have added

source /OpenFOAM/OpenFOAM-1.5-dev/etc/bashrc

in the ~/.bashrc file, but nothing happens.

when I run this command trying to execute interFoam on a damBreak over the LAN:

/full/pathname/mpirun -H mario -np 2 `which interFoam` -parallel

I get errors telling me that the mpirun daemon rolled over dead because it couldn't find libWhatever.so - this points to the same problem: unset environment variables.

I've also tried adding an rc file to .ssh but nothing seems to work. Any ideas?

Of course, I've successfully set up ssh to log on without passwords and I can use scp and everything else without a problem.

shamantic December 9, 2022 10:07

My default Ubuntu .bashrc starts with code that terminates sourcing in non-interactive mode.
I then source the required environment just before that:


# ~/.bashrc: executed by bash(1) for non-login shells.
# see /usr/share/doc/bash/examples/startup-files (in the package bash-doc)
# for examples


source ~/OpenFoam/OpenFOAM-v2206/etc/bashrc

# If not running interactively, don't do anything
case $- in
*i*) ;;
*) return;;
esac
.....
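
To check an existing ~/.bashrc for this ordering, something like the following could be used. It is only a sketch: source_before_guard is a hypothetical helper, and it assumes the environment line mentions etc/bashrc and that the guard is one of the two forms quoted in this thread (the $PS1 test or the case $- in block).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the first uncommented line that sources an
# OpenFOAM etc/bashrc comes before the first non-interactive guard.
source_before_guard() {
    local file=$1 src guard
    # Line number of the first uncommented "source ..." or ". ..." of etc/bashrc.
    src=$(grep -nE '^[[:space:]]*(source|\.)[[:space:]].*etc/bashrc' "$file" | head -n 1 | cut -d: -f1)
    # Line number of the first uncommented guard, in either of the two forms.
    guard=$(grep -nE '^[[:space:]]*(case[[:space:]]+\$-[[:space:]]+in|\[[[:space:]]+-z[[:space:]]+"\$PS1")' "$file" | head -n 1 | cut -d: -f1)
    # Fail if the environment is never sourced at all.
    [ -n "$src" ] || return 1
    # Succeed if there is no guard, or if the source line comes first.
    [ -z "$guard" ] || [ "$src" -lt "$guard" ]
}
```

A call such as `source_before_guard ~/.bashrc && echo "order looks fine"` only inspects the file; it does not modify it.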


And sorry about that hint on ~/.ssh/rc - this works with ssh but not with mpirun.


All times are GMT -4.