Parallel runs across a network broken with OpenMPI -- SSH issue--no bashrc? |
March 15, 2009, 19:51 |
|
#1 |
New Member
db
Join Date: Mar 2009
Posts: 3
Rep Power: 17 |
Hi,
Glad to see the new forums are up and running. I have a question for the group, and I wouldn't be surprised if this were a common problem (at least I'm hoping so, for my sake).

I have successfully run a parallel case decomposed for two processors on a single machine. I then set up a second computer running an identical OS (Ubuntu 8.10) at the same update level as my primary box. Both computers have identical paths to the OpenFOAM installation and to the case files. I decomposed the case for four processors, two per machine, and launched the run per the instructions in the documentation. At that point the remote machine could not find the icoFoam executable (nor any other OpenFOAM executable, for that matter).

Simple system commands run through a nearly identical mpirun command succeed on all four processors, e.g.:

    carcass@Esker:~/user-files/Works/scrubber-icoFoam$ time mpirun --hostfile ./machines -np 4 uptime
    20:54:46 up 1 day, 4:24, 2 users, load average: 0.00, 0.00, 0.00
    20:54:46 up 1 day, 4:24, 2 users, load average: 0.00, 0.00, 0.00
    20:54:40 up 23:13, 5 users, load average: 0.40, 0.40, 0.39
    20:54:40 up 23:13, 5 users, load average: 0.40, 0.40, 0.39
    real 0m0.480s
    user 0m0.048s
    sys 0m0.056s

The problem seems to stem from SSH not executing any setup files (.bashrc, .profile, .bash_profile, /etc/profile, /etc/bash.bashrc, etc. ad nauseam) when running a non-interactive, non-login shell. I can duplicate the problem by executing:

    carcass@Esker:~/user-files/Works/scrubber-icoFoam$ mpirun --hostfile ./machines -np 4 which icoFoam
    /home/carcass/OpenFOAM/OpenFOAM-1.5/applications/bin/linux64GccDPOpt/icoFoam
    /home/carcass/OpenFOAM/OpenFOAM-1.5/applications/bin/linux64GccDPOpt/icoFoam

(only two processors return)

In other words, I cannot get the OpenFOAM-1.5/etc/bashrc file to be evaluated when ssh connects to the remote machine. The problem recurs no matter which machine is the local one and which is the remote one (the configuration is exactly equivalent on the two machines).

I have done a bunch of searching around on the internet and have found many references to OpenSSH being broken as far as bash goes: the claim is that the new OpenSSH uses pipes, not sockets, to connect, so bash does not recognize the session as one that needs the environment setup files executed. Some have suggested rebuilding bash, but I'm really not at all interested in doing that.

Has anyone in the OpenFOAM community experienced a similar problem? I'd greatly appreciate some assistance. As I mentioned, I can successfully run on two processors on a local machine, but I'm really interested in getting networked machines into the mix as well. Thanks in advance! |
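The `which icoFoam` test above can be made a little more informative with a small script that labels each answer with its host and prints the inherited PATH when the lookup fails. This is only a sketch: `check_cmd` is a hypothetical helper name, not part of OpenFOAM or Open MPI, and the script is assumed to sit at the same path on every machine.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenFOAM or Open MPI): report, per host,
# whether a command resolves on the PATH this process inherited.
check_cmd() {
    host=$(uname -n)
    if resolved=$(command -v "$1" 2>/dev/null); then
        echo "$host: $1 -> $resolved"
    else
        echo "$host: $1 NOT FOUND (PATH=$PATH)"
    fi
    # Always succeed, so a launcher does not abort the job when one rank
    # fails the lookup and every rank gets to print its line.
    return 0
}

# Default to icoFoam, the solver from the post; pass another name to override.
check_cmd "${1:-icoFoam}"
```

Launched with the same mpirun options as the `uptime` test (for example as `sh ./check_cmd.sh icoFoam`, with the file name being an assumption), each rank prints one line, so it is immediately visible which host is missing the OpenFOAM directories from its PATH.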
|
March 16, 2009, 07:02 |
|
#2 |
New Member
Kieran Wood
Join Date: Mar 2009
Posts: 7
Rep Power: 17 |
Hi,
I had a very similar problem, and although my Linux knowledge is limited, I solved it like this. The .bashrc file in /home/<user>/ has a line in it like this:

    [ -z "$PS1" ] && return

This tells bash to stop reading the rest of .bashrc if the shell is non-interactive (no keyboard). Since the remote login by OMPI is non-interactive, the OpenFOAM bashrc script is never loaded, so no executables can be found. To solve it, add the line

    . $HOME/OpenFOAM/OpenFOAM-1.5/etc/bashrc

to the bottom of .bashrc as instructed, and comment out the guard line above by putting a hash in front of it:

    # [ -z "$PS1" ] && return

I was using Xubuntu 7.10, so I guess the problem is similar. Kieran |
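The effect of that guard line can be reproduced without touching a real ~/.bashrc. The sketch below writes two throwaway files into a temporary directory, one with the guard active and one with it commented out, and sources each with PS1 unset to mimic a non-interactive shell. The variable FOAM_DEMO and the file names are made up for the demonstration and stand in for the OpenFOAM environment.

```shell
#!/bin/sh
# Demonstration only: throwaway files stand in for ~/.bashrc.
tmp=$(mktemp -d)

# Variant 1: guard active, environment setup below it.
cat > "$tmp/guard_active" <<'EOF'
[ -z "$PS1" ] && return
FOAM_DEMO=loaded
EOF

# Variant 2: guard commented out, as suggested in the post.
cat > "$tmp/guard_commented" <<'EOF'
# [ -z "$PS1" ] && return
FOAM_DEMO=loaded
EOF

# Source a file in a subshell with PS1 unset (mimicking a non-interactive
# shell) and print whether the setup line was reached.
probe() {
    ( unset PS1 FOAM_DEMO; . "$1"; echo "${FOAM_DEMO:-missing}" )
}

guard_active=$(probe "$tmp/guard_active")
guard_commented=$(probe "$tmp/guard_commented")
echo "guard active:    $guard_active"
echo "guard commented: $guard_commented"
rm -rf "$tmp"
```

With the guard active the setup line is never reached and the probe reports `missing`; with the guard commented out it reports `loaded`.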
|
March 16, 2009, 07:12 |
|
#3 |
New Member
Jeff Squyres
Join Date: Mar 2009
Posts: 6
Rep Power: 17 |
Have a look at http://www.open-mpi.org/faq/?categor...g-ompi-to-path -- it talks about adding Open MPI to your PATH (etc.), but the same principles apply. Additionally, it talks about what files are sourced by what shells, etc.
Hope this helps. |
|
March 16, 2009, 17:53 |
|
#4 |
New Member
db
Join Date: Mar 2009
Posts: 3
Rep Power: 17 |
Thanks for the answers. The workaround removing the reference to $PS1 worked for me.
Happily running parallel across a network, carcass |
|
March 16, 2009, 20:25 |
|
#5 |
New Member
Kieran Wood
Join Date: Mar 2009
Posts: 7
Rep Power: 17 |
Glad it helped. However, I am still having problems running in parallel across a network.

I have to run OF in a Linux virtual machine on top of a Windows host using VMware, since I am not an admin on the computers. The performance on individual machines is good, and OMPI works with the two local processors. So far so good. But I cannot get unique bridged IP addresses for the virtual machines, so all communication must go over a single port routed by NAT (which is set up automatically by VMware). The IP is shared for outgoing communication, but any incoming communication is stopped by the host, since it does not know the traffic is meant for the guest. The computer admins have set up a single port-forwarding rule, so all communication needs to be routed over this port.

I have successfully ssh'd in and out of the remote machines over this port. I have also set up OpenVPN and then ssh'd over many different ports using my newly created virtual private network IPs, so there is connectivity. My problems start when trying OF over OMPI over the VPN. Simple programs like "uptime" work across the VPN (using my private IPs), indicating that MPI_Init completes successfully and the nodes can see each other. However, when using OF I get a "blt_complete_connect" error of some sort after the OF header is displayed. The strange thing is that all of the processors on the remotes sit at 100% until I Ctrl+C the program, so it appears that the communication after MPI_Init is failing. That is, OF finds the nodes, connects, finds the executables (thanks to the fix above), and prints the header, but then cannot send any of the data needed for the solution.

Does anyone know of a reason OMPI would not work over a VPN? I wondered whether there might be some non-IP traffic, but I tested that with an IP traffic monitor on my home network, where I could get the system working with bridged networking (no VPN required), and it is all IP traffic.

The point of all this is to allow parallel OF clusters to be set up on pre-existing Windows networks with very little change to the network or computer setup, thus keeping admins happy and giving me my own little network cluster. Thanks in advance for any help. Kieran |
|
June 3, 2009, 10:46 |
|
#6 |
Member
Luca Gasparini
Join Date: Mar 2009
Location: Italy
Posts: 37
Rep Power: 17 |
Dear Kieran,
I'm also trying something similar, just using Sun VirtualBox instead of VMware, and I'm having the same problem. So far I've been using Cygwin and the OF 1.4 port to Cygwin, but clearly a solution using virtualisation is required, since the OF port to Cygwin is no longer being updated. Furthermore, I've checked that running under a virtual machine is faster than running under Cygwin by approx. 20%.

I've found indications somewhere that Open MPI v1.2.x, which is used in OF, does not allow connections between different subnets, and that this limitation has been removed in v1.3. However, I've not tried to install Open MPI v1.3 so far. Have you found any solution since your last post? Thanks, Luca |
|
June 18, 2009, 16:15 |
|
#7 |
New Member
Klaus Rädecke
Join Date: Jun 2009
Location: Rüsselsheim, Germany
Posts: 9
Rep Power: 16 |
I have created ~/.ssh/rc with the content:

    . $HOME/OpenFOAM/OpenFOAM-1.5/etc/bashrc

This also helped. |
|
August 2, 2009, 12:54 |
|
#8 |
Senior Member
Tomislav Maric
Join Date: Mar 2009
Location: Darmstadt, Germany
Posts: 284
Blog Entries: 5
Rep Power: 21 |
Hello everyone,
I'm having the same problem with the OpenFOAM SLAX live DVD. It seems that ssh is not calling .bashrc. I didn't have any line to comment out in the .bashrc file for the non-interactive run mode, since there was no such file to begin with. I have added

    source /OpenFOAM/OpenFOAM-1.5-dev/etc/bashrc

to the ~/.bashrc file, but nothing happens. When I run this command to execute interFoam on the damBreak case over the LAN:

    /full/pathname/mpirun -H mario -np 2 `which interFoam` -parallel

I get errors telling me that the mpirun daemon rolled over dead because it couldn't find libWhatever.so. This points to the same problem: unset environment variables. I've also tried adding an rc file to .ssh, but nothing seems to work. Any ideas? Of course, I've successfully set up ssh to log on without passwords, and I can use scp and everything else without a problem. |
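When the error is a missing shared library rather than a missing executable, it can help to ask the dynamic loader directly which dependencies it cannot resolve in the environment a remote command inherits. The following is a rough sketch, assuming `ldd` is available on the node; `missing_libs` is a hypothetical helper name, and `sh` is used as the example binary only so that the script runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper: list shared libraries the dynamic loader cannot
# resolve for a binary, given the environment this process inherited.
missing_libs() {
    if ! command -v ldd >/dev/null 2>&1; then
        echo "ldd not available on $(uname -n)" >&2
        return 2
    fi
    # ldd marks unresolved dependencies with "not found".
    ldd "$1" 2>/dev/null | grep 'not found'
    return 0
}

# Show the library search path this process inherited, then check a binary.
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"
missing_libs "$(command -v sh)"
```

Running the same check on the remote host through a non-interactive ssh command, with the full path of the solver in place of `sh`, would show whether the library directories set up by the OpenFOAM bashrc are reaching that shell.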
|
December 9, 2022, 09:07 |
|
#9 |
New Member
Klaus Rädecke
Join Date: Jun 2009
Location: Rüsselsheim, Germany
Posts: 9
Rep Power: 16 |
My default Ubuntu .bashrc starts with code that stops the file from being sourced any further in non-interactive mode.
I now source the required environment just before that code:

    # ~/.bashrc: executed by bash(1) for non-login shells.
    # see /usr/share/doc/bash/examples/startup-files (in the package bash-doc)
    # for examples

    source ~/OpenFoam/OpenFOAM-v2206/etc/bashrc

    # If not running interactively, don't do anything
    case $- in
        *i*) ;;
          *) return;;
    esac
    .....

And sorry about that hint on ~/.ssh/rc: it works with ssh but not with mpirun.

Last edited by shamantic; December 9, 2022 at 11:57. Reason: clarifying a previous post |
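The ordering described in this post can be checked in isolation. In the sketch below, a throwaway file stands in for the OpenFOAM etc/bashrc and another for ~/.bashrc, with the environment sourced above the `case $-` guard. The names FOAM_DEMO and AFTER_GUARD are invented for the demonstration.

```shell
#!/bin/sh
# Demonstration only: throwaway files stand in for the real startup files.
tmp=$(mktemp -d)

# Stand-in for the OpenFOAM etc/bashrc; it only sets a marker variable.
cat > "$tmp/foam_env" <<'EOF'
FOAM_DEMO=loaded
EOF

# Stand-in for ~/.bashrc: environment first, interactivity guard second.
# $tmp is expanded when the file is sourced, not when it is written.
cat > "$tmp/rc" <<'EOF'
. "$tmp/foam_env"
case $- in
    *i*) ;;
      *) return;;
esac
AFTER_GUARD=reached
EOF

# Run as a script, $- has no "i", so the guard returns early; the marker
# set above the guard is still present afterwards.
result=$( unset FOAM_DEMO AFTER_GUARD; . "$tmp/rc"; echo "${FOAM_DEMO:-missing} ${AFTER_GUARD:-skipped}" )
echo "$result"
rm -rf "$tmp"
```

When this runs as a script (non-interactively), the first word of the output is `loaded`, showing that anything placed above the guard survives the early return, while lines below the guard are skipped.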
|