Parallel runs across a network broken with OpenMPI -- SSH issue--no bashrc?

#1, March 15, 2009, 20:51
db (carcass), New Member
Join Date: Mar 2009
Posts: 3
Hi,

Glad to see the new forums are up and running. I have a question for the group, and I wouldn't be surprised if this were a common problem (at least I'm hoping so, for my sake).

I have run a parallel case decomposed for two processors on a single machine successfully. I set up a second computer running an identical OS (Ubuntu 8.10) with the same update level, etc. as my primary box. Both computers have identical paths to the OpenFOAM installation as well as the case files.

I decomposed the case for four processors, two per machine, and executed the project per the instructions in the documentation. At this point I began to have problems with the remote machine not being able to find the icoFoam executable (nor any OpenFOAM executable, for that matter).

I tested running some simple system commands using a nearly identical mpirun command and succeeded running them on all four processors.

e.g.: mpirun --hostfile ./machines -np 4 uptime returns:

carcass@Esker:~/user-files/Works/scrubber-icoFoam$ time mpirun --hostfile ./machines -np 4 uptime
20:54:46 up 1 day, 4:24, 2 users, load average: 0.00, 0.00, 0.00
20:54:46 up 1 day, 4:24, 2 users, load average: 0.00, 0.00, 0.00
20:54:40 up 23:13, 5 users, load average: 0.40, 0.40, 0.39
20:54:40 up 23:13, 5 users, load average: 0.40, 0.40, 0.39

real 0m0.480s
user 0m0.048s
sys 0m0.056s


The problem seems to stem from SSH not executing any setup files (.bashrc, .profile, .bash_profile, /etc/profile, /etc/bash.bashrc, etc ad nauseam) when running a non-interactive, non-login shell.

I can duplicate the problem by executing:

mpirun --hostfile ./machines -np 4 which icoFoam

carcass@Esker:~/user-files/Works/scrubber-icoFoam$ mpirun --hostfile ./machines -np 4 which icoFoam
/home/carcass/OpenFOAM/OpenFOAM-1.5/applications/bin/linux64GccDPOpt/icoFoam
/home/carcass/OpenFOAM/OpenFOAM-1.5/applications/bin/linux64GccDPOpt/icoFoam

(only two processors return)

The problem appears to be that I cannot get the OpenFOAM-1.5/etc/bashrc file to be evaluated when ssh connects to the remote machine. The problem recurs no matter which machine is local and which is remote (the configuration is exactly equivalent on the two machines).
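To see which hosts are missing the executable, the `which icoFoam` check above can also be run host by host over plain ssh. This is only a minimal sketch: it assumes the hostfile follows the usual Open MPI layout (one host name per line, optionally followed by `slots=N`, with `#` comments) and that passwordless ssh works. The function names `list_hosts` and `check_hosts` are made up for illustration.

```shell
#!/usr/bin/env bash
# Print the unique host names from an Open MPI style hostfile:
# the first word of every line that is neither blank nor a '#' comment.
list_hosts() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1" | sort -u
}

# For each host, ask a non-interactive ssh shell whether it can find a command.
# Hypothetical helper; relies on passwordless ssh being set up already.
check_hosts() {
    local hostfile=$1 cmd=$2 host
    for host in $(list_hosts "$hostfile"); do
        # -n stops ssh from reading stdin; BatchMode=yes avoids password prompts
        if ssh -n -o BatchMode=yes "$host" "command -v $cmd" >/dev/null 2>&1; then
            echo "$host: $cmd found"
        else
            echo "$host: $cmd NOT found in the non-interactive PATH"
        fi
    done
}
```

Calling `check_hosts ./machines icoFoam` would then print one line per host, which should make it obvious whether the remote PATH is missing the OpenFOAM directories.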

I have done a bunch of searching around on the internet and have found many references to OpenSSH being broken as far as bash goes because the new OpenSSH uses pipes, not sockets, to connect, and hence will not be recognized by bash as needing execution of the environment setup files. Some have suggested rebuilding bash, but I'm really not at all interested in doing that.

Has anyone in the OpenFOAM community experienced a similar problem? I'd greatly appreciate some assistance. As I mentioned, I can successfully run on two processors on a local machine, but I'm really interested in getting networked machines into the mix as well.

Thanks in advance!

#2, March 16, 2009, 08:02
Kieran Wood (kieranwood85), New Member
Join Date: Mar 2009
Posts: 7
Hi,

I had a very similar problem and although my Linux knowledge is limited I solved it like this.

The .bashrc file in /home/<user>/ has a line in it like this,

[ -z "$PS1" ] && return

This tells bash to stop reading the rest of .bashrc when the shell is non-interactive (no prompt, so $PS1 is empty). Since the remote login made by OMPI is non-interactive, the OpenFOAM bashrc script is never loaded, and so no executables can be found.

To solve it, just add the line

. $HOME/OpenFOAM/OpenFOAM-1.5/etc/bashrc

to the bottom of .bashrc as instructed, and comment out the line above so that it reads with a hash in front:

# [ -z "$PS1" ] && return

I was using Xubuntu 7.10, so I guess the problem is similar.
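The effect of the guard line can be reproduced locally, without ssh. The following is a small sketch that uses a throwaway stand-in file rather than the real ~/.bashrc. `FOAM_DEMO` and `probe_guard` are hypothetical names used only here; OpenFOAM does not set such a variable.

```shell
#!/usr/bin/env bash
# Stand-in for ~/.bashrc: the guard comes first, the "OpenFOAM setup" after it.
# FOAM_DEMO is a hypothetical marker variable, not something OpenFOAM defines.
rc_file=$(mktemp)
cat > "$rc_file" <<'EOF'
[ -z "$PS1" ] && return
export FOAM_DEMO=loaded
EOF

# Source the stand-in file in a subshell with PS1 unset (like a non-interactive
# ssh command) or set (like a terminal session), then report whether the line
# after the guard was reached.
probe_guard() {
    (
        if [ "$1" = "interactive" ]; then PS1='$ '; else unset PS1; fi
        unset FOAM_DEMO
        . "$rc_file"
        echo "${FOAM_DEMO:-not loaded}"
    )
}

echo "non-interactive: $(probe_guard noninteractive)"
echo "interactive:     $(probe_guard interactive)"
```

With the guard in place, the non-interactive case reports "not loaded" and the interactive case reports "loaded". This matches the symptom in the first post: anything placed below the guard is never seen by the shell that mpirun starts over ssh.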

Kieran

#3, March 16, 2009, 08:12
Jeff Squyres (jsquyres), New Member
Join Date: Mar 2009
Posts: 6
Have a look at http://www.open-mpi.org/faq/?categor...g-ompi-to-path -- it talks about adding Open MPI to your PATH (etc.), but the same principles apply. Additionally, it talks about what files are sourced by what shells, etc.

Hope this helps.

#4, March 16, 2009, 18:53
db (carcass), New Member
Join Date: Mar 2009
Posts: 3
Thanks for the answers. The workaround removing the reference to $PS1 worked for me.

Happily running parallel across a network,
carcass

#5, March 16, 2009, 21:25
Kieran Wood (kieranwood85), New Member
Join Date: Mar 2009
Posts: 7
Glad it helped. However, I am still having problems running in parallel across a network.

I have to run OF in a Linux virtual machine on top of a Windows host using VMware, since I am not an admin on the computers. The performance on individual machines is good and OMPI works with the two local processors. So far so good.

But I cannot get bridged, unique IP addresses for the virtual machines, so all communication must go over a single port routed by NAT (which is set up automatically by VMware). The IP is shared for outgoing communication, but any incoming communication is stopped by the host, since it does not know the traffic was meant for the guest. The computer admins have set up a single port-forward rule, so all communications need to be routed over this port.

I have successfully ssh'd in and out of the remote machines over this port. I have also set up OpenVPN and then ssh'd over many different ports using my newly created virtual private network IPs, so there is connectivity.

My problems start when trying OF over OMPI over the VPN. Simple programs like "uptime" work across the VPN (using my private IPs), indicating that MPI_Init is completing successfully and the nodes can see each other.

However, when using OF I get a "blt_complete_connect" error of some sort after the OF header is displayed. The strange thing is that all of the processors on the remotes sit at 100% until I Ctrl+C the program, so it appears that the post-MPI_Init communication is failing, i.e. OF finds the nodes, connects, finds the executables (thanks to the fix above), prints the header, but then cannot send any of the data needed for the solution.

Does anyone know of a reason OMPI would not work over a VPN? I thought there might be some non-IP traffic, but I tested that with an IP traffic monitor on my home network, where I could get the system working with bridged networking (no VPN required), and it is all IP traffic.
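One thing that may be worth trying in a setup like this is telling Open MPI which network interface to use for its TCP traffic, through the `btl_tcp_if_include` MCA parameter. The sketch below only builds and prints such a command line; it does not run it. The interface name `tun0` is an assumption (a common name for an OpenVPN device), `mpirun_on_interface` is a made-up helper name, and nothing in this thread confirms that this resolves the hang described above.

```shell
#!/usr/bin/env bash
# Build (but do not run) an mpirun command line that limits Open MPI's TCP
# transport to one network interface via the btl_tcp_if_include MCA parameter.
# Arguments: interface name, hostfile, process count, then the program and its args.
mpirun_on_interface() {
    local iface=$1 hostfile=$2 np=$3
    shift 3
    echo mpirun --mca btl_tcp_if_include "$iface" \
         --hostfile "$hostfile" -np "$np" "$@"
}

# tun0 is an assumed interface name; check the real one with `ip addr` or `ifconfig`.
mpirun_on_interface tun0 ./machines 4 icoFoam -parallel
```

Printing the command first makes it easy to inspect and adjust before pasting it into a terminal on the machine that launches the job.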

The point of all this is to allow parallel OF clusters to be set up on pre-existing Windows networks with very little change to the network or computer setup, thus keeping admins happy and giving me my own little network cluster.

Thanks in advance for any help.

Kieran

#6, June 3, 2009, 10:46
Luca Gasparini (luca_g), Member
Join Date: Mar 2009
Location: Italy
Posts: 37
Dear Kieran,

I'm also trying something similar, just using Sun VirtualBox instead of VMware, and I'm having the same problem. So far I've been using Cygwin and the OF 1.4 port to Cygwin, but clearly a solution using virtualisation is required, since the OF port to Cygwin is not being updated any more. Furthermore, I've checked that running under a virtual machine is faster than under Cygwin by approx. 20%.

I've found indications somewhere that Open MPI v1.2.x, which is used in OF, does not allow connections between different subnets, and that this limitation has been removed in v1.3. However, I've not tried to install Open MPI v1.3 so far.

Did you find any solution since your last post?

Thanks,

Luca

#7, June 18, 2009, 16:15
Klaus Rädecke (shamantic), New Member
Join Date: Jun 2009
Location: Rüsselsheim, Germany
Posts: 4
I have created ~/.ssh/rc with the content:
. $HOME/OpenFOAM/OpenFOAM-1.5/etc/bashrc

This helped also.

#8, August 2, 2009, 12:54
Tomislav Maric (tomislav_maric), Senior Member
Join Date: Mar 2009
Location: Darmstadt, Germany
Posts: 259
Blog Entries: 5
Hello everyone,

I'm having the same problem with the OpenFOAM SLAX live DVD.

It seems that ssh is not calling .bashrc.

I didn't have any line to comment out in the .bashrc file for the non-interactive run mode, since there was no such file to begin with. I have added

source /OpenFOAM/OpenFOAM-1.5-dev/etc/bashrc

to the ~/.bashrc file, but nothing happens.

When I run this command, trying to execute interFoam on the damBreak case over the LAN:

/full/pathname/mpirun -H mario -np 2 `which interFoam` -parallel

I get errors telling me that the mpirun daemon rolled over dead because it couldn't find libWhatever.so. This points to the same problem: unset environment variables.

I've also tried adding an rc file to .ssh, but nothing seems to work. Any ideas?

Of course, I've successfully set up ssh to log on without passwords, and I can use scp and everything else without a problem.
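When the remote shell's startup files cannot be relied on, one possible workaround is to have the launched command set up its own environment. The sketch below assumes a wrapper that sources the OpenFOAM bashrc and then runs the solver. `with_foam_env` and the `FOAM_BASHRC` override variable are hypothetical names, not part of OpenFOAM or Open MPI, and the default path is simply the one quoted earlier in this post.

```shell
#!/usr/bin/env bash
# with_foam_env: hypothetical wrapper, not part of OpenFOAM or Open MPI.
# Sources an environment file, then runs the given command with that environment.
# FOAM_BASHRC (also hypothetical) overrides the default path quoted in this post.
with_foam_env() {
    local rc=${FOAM_BASHRC:-/OpenFOAM/OpenFOAM-1.5-dev/etc/bashrc}
    if [ ! -r "$rc" ]; then
        echo "with_foam_env: cannot read $rc" >&2
        return 1
    fi
    (
        . "$rc"      # sets PATH, LD_LIBRARY_PATH, etc. in this subshell only
        exec "$@"    # replace the subshell with the real command
    )
}
```

To use it with mpirun, the function would have to live in a script that exists at the same path on every node and ends with `with_foam_env "$@"`. That script is then passed to mpirun in place of the solver, followed by `interFoam -parallel`. Note that this only helps the solver processes; if the Open MPI daemon itself cannot find its libraries on the remote node, its environment has to be fixed separately.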

All times are GMT -4.