Running MPI on 2 laptops

December 12, 2013, 15:02
Running MPI on 2 laptops
#1
New Member
Ahsan Javed (ahsan_smme.nust@yahoo.com)
Join Date: Oct 2013
Location: Pakistan
Posts: 8
I am trying to run the airfoil 2D case on two laptops. I made my own blockMesh with 1.1 million cells. Both laptops have 4 processors, the same OpenFOAM version (2.2.1) and the same Ubuntu version (12.04 LTS). I ran decomposePar on both laptops for the same case, airFoil2D.
The problem appears when I give this command in the terminal:
mpirun --hostfile hosts -np 8 simpleFoam -parallel
The hosts file contains the names and number of processors of the slave. It is located inside the airFoil2D folder.
Host file:
ahsan@ahsan-Inspiron-5521
Dell-4050
cpu=4

I get this error:

ahsan@ahsan-Inspiron-5521:~/airFoil2D$ mpirun --hostfile hosts -np 8 simpleFoam -parallel
ssh: Could not resolve hostname Dell-4050: Name or service not known
--------------------------------------------------------------------------
A daemon (pid 12100) died unexpectedly with status 255 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished

Dell-4050 is my slave. I read posts about running MPI on clusters but could not understand how to set the library path, as I'm not much of a programmer and I am new to Ubuntu. This is my final year project; if anyone can help me out I'll be more than thankful.

December 15, 2013, 15:18
#2
Super Moderator
Bruno Santos (wyldckat)
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 9,531
Blog Entries: 36
Greetings Ahsan,

Two reference posts that can come in handy:
So, basically what happens in your case is two things:
  1. The two laptops are not aware of the host names of each other, therefore are not able to find their corresponding IP address.
  2. Your "machines" file is incorrect.


To solve the problem, you must first find the IP address for each laptop. You can use this command:
Code:
ifconfig
And then look for the IP address at "inet addr:" related to "eth0", "eth1", "wlan0" or "wlan1".
Make a note of those IP addresses and try pinging the other machine, using its IP address, to check if it can find it. For example:
  • If "machine1" has the IP "11.12.13.14";
  • and if "machine2" has the IP "11.12.13.132";
Then:
  • From "machine2", run:
    Code:
    ping 11.12.13.14
  • From "machine1", run:
    Code:
    ping 11.12.13.132
They should be able to see each other.

Now, once the IPs are properly determined and tested, edit the file "/etc/hosts" as super-user on both machines and add these two lines at the end of the file:
Code:
11.12.13.14  machine1
11.12.13.132  machine2
Now, the other problem is the "machines" file you have got. It should be something like this:
Code:
machine1 slots=4 max-slots=4
machine2 slots=4 max-slots=4
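A quick way to confirm that the two files agree is to check that every name in the "machines" file also appears in "/etc/hosts". The following is only a minimal sketch: the function name check_hostfile is made up for illustration, and it only looks at the hosts table file, not at DNS.

```shell
# Check that each machine name in an MPI hostfile is listed in a hosts table.
# $1 = hostfile used with mpirun, $2 = hosts table (defaults to /etc/hosts).
# check_hostfile is a hypothetical helper name, not part of Open MPI.
check_hostfile() {
    hostfile=$1
    hostsdb=${2:-/etc/hosts}
    status=0
    # The first field of each non-empty, non-comment line is the machine name.
    while read -r name _rest || [ -n "$name" ]; do
        case "$name" in ''|'#'*) continue ;; esac
        # Look for the name among the aliases (fields 2..NF) of the hosts table.
        if awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hostsdb"; then
            echo "OK       $name"
        else
            echo "MISSING  $name"
            status=1
        fi
    done < "$hostfile"
    return "$status"
}

# Example, run from inside the case folder: check_hostfile ./hosts
```

Any name reported as MISSING would lead to the "Could not resolve hostname" error when mpirun tries to reach that machine.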
Best regards,
Bruno
__________________
___
I'll be at OFW11 in Portugal

December 16, 2013, 09:05
#3
Ahsan Javed
Thank you cat, I have successfully pinged both laptops and made all the changes you mentioned.

Code:
192.168.1.2 ahsan@ahsan-Inspiron-5521 
192.168.1.1 umer-HP-ProBook-4540s
Pinged:

Code:
ahsan@ahsan-Inspiron-5521:~/airFoil2D$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_req=41 ttl=64 time=0.520 ms
64 bytes from 192.168.1.1: icmp_req=42 ttl=64 time=0.362 ms
My hosts file says:

Code:
ahsan@ahsan-Inspiron-5521 slots=4 max-slots=4
umer-HP-ProBook-4540s slots=4 max-slots=4
But when I run:
Code:
ahsan@ahsan-Inspiron-5521:~/airFoil2D$ mpirun --hostfile hosts -np 8 simpleFoam -parallel

ssh: connect to host umer-HP-ProBook-4540s port 22: Connection refused
--------------------------------------------------------------------------
A daemon (pid 6863) died unexpectedly with status 255 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished

Connection refused. :/
Also, when I ssh from ahsan I get: connection timed out
While when I ssh from umer I get: Permission denied

Last edited by wyldckat; December 30, 2013 at 09:50. Reason: Added [CODE][/CODE]

December 30, 2013, 09:49
#4
Bruno Santos
Hi Ahsan,

Sorry, I was not able to answer any sooner.
In summary, the problems in question are probably:
  1. Do not use the @ symbol in the name of a machine. The one you are seeing in your command line is only implying that your user name is logged in at your machine; it does not mean that the whole text is the name of the machine.
    In your case, the actual name of your machine is only "ahsan-Inspiron-5521", so you should adjust the files accordingly.
  2. The error message is indicating that no SSH connection was possible over the port 22. You need to configure SSH for it to work on both machines: https://help.ubuntu.com/community/SSH
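To illustrate the first point, the machine name is the part of the prompt between the "@" and the ":". A minimal sketch with plain shell parameter expansion follows; host_from_prompt is a made-up helper name, and running the standard "hostname" command on each laptop prints the same name directly.

```shell
# Extract the machine name from a prompt of the form user@host:dir$
# host_from_prompt is a hypothetical helper name used only for illustration.
host_from_prompt() {
    prompt=$1
    host=${prompt#*@}     # drop everything up to and including the first "@"
    host=${host%%:*}      # drop everything from the first ":" onwards
    printf '%s\n' "$host"
}

host_from_prompt 'ahsan@ahsan-Inspiron-5521:~/airFoil2D$'   # prints ahsan-Inspiron-5521
```

That bare name, without the user part, is what belongs in both "/etc/hosts" and the hosts file given to mpirun.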
Best regards,
Bruno

January 1, 2014, 02:50
Help needed!
#5
Ahsan Javed
I have successfully set up SSH between both laptops, but now it gives this error:

Cannot find executable s

Is it necessary for the two laptops to have the same version of Ubuntu, the same OpenFOAM version and the same architecture (32 or 64 bit) to run a case across a network?
In our case one laptop is 32 bit and the other 64 bit; also, one laptop has OpenFOAM version 2.2.1 and the other has 2.1.1. I think the problem lies in the architecture and the Ubuntu version. Please help me out on this.

January 1, 2014, 08:20
#6
Bruno Santos
Hi Ahsan,

Don't expect OpenFOAM to do universal parallel processing.
It doesn't matter what Ubuntu versions you have on each machine. But it does matter that you use the exact same architecture and version of OpenFOAM, and that it is installed in the same path on each machine.

If one laptop is using a 32bit version of Ubuntu, then you have to install the same version of OpenFOAM and the same 32bit build on both machines! This is because 32bit is the common denominator between both machines, since 64bit machines can also run 32bit applications.
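One way to compare the machines is to print a one-line summary on each and check that the lines are identical. This is a rough sketch: foam_fingerprint is a made-up name, and it assumes the OpenFOAM environment has already been sourced in the shell, since that is what sets the WM_PROJECT_VERSION, WM_ARCH_OPTION and WM_PROJECT_DIR variables.

```shell
# Print one line describing the OpenFOAM build visible in the current shell.
# foam_fingerprint is a hypothetical helper name. The WM_* variables are set
# by the OpenFOAM environment script; "unset" is printed when they are missing.
foam_fingerprint() {
    printf '%s %s %s %s\n' \
        "$(uname -m)" \
        "${WM_PROJECT_VERSION:-unset}" \
        "${WM_ARCH_OPTION:-unset}" \
        "${WM_PROJECT_DIR:-unset}"
}

foam_fingerprint
```

If the version, the 32/64 option or the installation path differ between machines, the parallel run should not be expected to start.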

First choose which version of OpenFOAM you want to use and let me know, so that I can give you directions on what to do.

Best regards,
Bruno

February 25, 2014, 04:20
cluster problem
#7
Ahsan Javed
Hey wyldckat, I really appreciate your help. I have successfully run an airfoil3d case on two Core i3 machines. Now I have moved to 3 machines. What I did was install the same Ubuntu (12.04 LTS) and OpenFOAM (2.2.1) versions on all machines, and create an account with the same username and password on all machines. All machines have the same architecture. My hosts file looks like this:

ahsan cpu=4
alvi cpu=4
aftab cpu=4

In /etc/hosts I added these lines as you instructed:

192.168.1.2 ahsan
192.168.1.3 alvi
192.168.1.10 aftab

All machines ping and ssh successfully.
I start mpirun by typing:

mpirun -np 12 --hostfile ./hosts /opt/openfoam221/bin/foamExec simpleFoam -parallel

Now the problem is that when I run my case on a single machine using its 4 cores, it runs faster and reaches 200 time-steps in 5 min 44 s, while when I run it across 3 machines using 12 cores, it takes 10 min 33 s. I used a hub to connect all the PCs together. The decomposition method is "scotch". I decomposed my domain into 12 subdomains by running "decomposePar" on all 3 machines. I don't know where the problem lies. Have I given enough info? Please tell me and I'll provide more.

Last edited by ahsan_smme.nust@yahoo.com; February 25, 2014 at 17:03. Reason: Problem definition updated

March 2, 2014, 11:12
#8
Bruno Santos
Hi Ahsan,

Well, throwing more CPUs at a problem doesn't mean that it will solve the problem faster. You need to take into account several details:
  1. If all CPUs are equally powerful.
  2. If the memory speed on all machines is the same.
  3. If the connection between machines is powerful enough.
In addition, using Hyper-Threading isn't very good when using OpenFOAM. Therefore, if your machines are using i3 CPUs, then you should only use 2 cores on each one.
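To see how many physical cores a machine really has, the output of lscpu can be reduced to unique core/socket pairs. This is a minimal sketch; count_physical_cores is a made-up name, and it assumes the lscpu on the machine supports the --parse=CORE,SOCKET column list.

```shell
# Count physical cores from "lscpu --parse=CORE,SOCKET" output read on stdin.
# Hyper-Threading siblings share the same core,socket pair, so the number of
# unique pairs is the number of physical cores.
# count_physical_cores is a hypothetical helper name.
count_physical_cores() {
    grep -v '^#' | sort -u | wc -l | tr -d ' '
}

# On a real machine: lscpu --parse=CORE,SOCKET | count_physical_cores
```

On a 2-core CPU with Hyper-Threading this should report 2 even though 4 processors are listed, and the hostfile line for that machine would then use slots=2 max-slots=2.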


If you can describe:
  1. The CPU model on each machine (specific model, e.g. i3-3115C);
  2. The RAM speed on each machine, e.g. DDR3 1333 MHz;
  3. The times that the ping command gives you, when checking the connection with the other machines;
  4. The Ethernet connection, namely if it's 100 Mbps or 1 Gbps;
  5. The number of cells in the mesh;
I can estimate what can be improved or if it will even work at all.

Best regards,
Bruno

March 4, 2014, 04:09
Specs
#9
Ahsan Javed
intel (R) Core (TM) i3-2120 CPU @ 3.30GHz 3.30GHz
Installed Memory 8 Gb
64 bit operating system,x64 based processor
ethernet connection 100Mbps
RAM DDR3 1333MHz

No. of cells in mesh 1.1 million
mesh density (150 150 10)

check the attachment for ping times.
Attached image: Capture.PNG (45.8 KB)

March 4, 2014, 06:05
#10
Bruno Santos
Hi Ahsan,

Quote:
Originally Posted by ahsan_smme.nust@yahoo.com
intel (R) Core (TM) i3-2120 CPU @ 3.30GHz 3.30GHz
It has a Passmark index of 3871. Pretty powerful CPU! That is about 45% of an i7-2600, which is pretty good considering that it's 2 cores on the i3 vs 4 cores on the i7!

Quote:
Originally Posted by ahsan_smme.nust@yahoo.com
Installed Memory 8 Gb
OK, no problem here.

Quote:
Originally Posted by ahsan_smme.nust@yahoo.com
64 bit operating system,x64 based processor
Good.

Quote:
Originally Posted by ahsan_smme.nust@yahoo.com
ethernet connection 100Mbps
Not good. Really not good.
Any chance you can change to a Gigabit network, namely 1000 Mbps = 1 Gbps? This is the main reason why you're getting such bad timing results.

For a sense of perspective: the CPUs would have to be running at 400 to 800 MHz per core for you to notice improvements from using multiple machines. At this point that doesn't make much sense, because each CPU runs at 3300 MHz.

Quote:
Originally Posted by ahsan_smme.nust@yahoo.com
RAM DDR3 1333MHz
Good enough!

Quote:
Originally Posted by ahsan_smme.nust@yahoo.com
No. of cells in mesh 1.1 million
mesh density (150 150 10)
150*150*10 = 225000
This only equates to 225 thousand cells, not 1.1 million.

Quote:
Originally Posted by ahsan_smme.nust@yahoo.com
check the attachment for ping times.
0.700 ms is an indicator of pretty bad latency issues. You need ping times of at most 0.300 ms, and ideally closer to 0.100 ms, to achieve relatively good simulation times.
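For comparing ping times between machines, the average can be read from the summary line that the Linux ping command prints at the end. This is just a sketch; avg_rtt is a made-up name and "machine2" in the comment is a placeholder.

```shell
# Print the average round-trip time in ms from the ping summary line,
# which has the form: rtt min/avg/max/mdev = a/b/c/d ms
# avg_rtt is a hypothetical helper name.
avg_rtt() {
    # Splitting on "/" puts the avg value in the fifth field.
    awk -F'/' '/^rtt/ { print $5 }'
}

# On a real machine: ping -c 20 machine2 | avg_rtt
```

The printed value can then be compared against the 0.300 ms to 0.100 ms range mentioned above.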


If you cannot change the Ethernet connection to Gigabit, then you will have to resort to another kind of parallel performance: you can run 3 independent simulations at the same time, one on each machine, to test different settings.

If you can change the Ethernet connection to Gigabit and use 2 cores per machine, then I expect that you'll see something like this:
  • 1 machine: 5 min
  • 2 machines: 3 min 30 s
  • 3 machines: 2 min 10 s
This also depends on the number of cells per processor, which should be above 50000 to 100000.
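As a rough check of that rule of thumb, the cells per process can be worked out for the 225000-cell mesh with plain integer division. The helper name cells_per_process is made up for this sketch.

```shell
# Integer number of cells per process: $1 = total cells, $2 = process count.
# cells_per_process is a hypothetical helper name.
cells_per_process() {
    echo $(( $1 / $2 ))
}

cells_per_process 225000 12   # 3 machines x 4 processes: 18750
cells_per_process 225000 6    # 3 machines x 2 processes: 37500
cells_per_process 225000 2    # 1 machine  x 2 processes: 112500
```

With this mesh, both 3-machine layouts fall below 50000 cells per process, so a finer mesh would be needed before the extra machines can pay off.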

Best regards,
Bruno

April 2, 2014, 22:23
#11
New Member
liyu (ahliyu)
Join Date: Jan 2014
Location: Beijing
Posts: 10
hi, ahsan~
Should the case folder on the different laptops be located in the same path?
And where will the results be stored, on the host or on the slave?

thx~

April 3, 2014, 06:30
Path
#12
Ahsan Javed
The case folder should have the same path on all PCs. If PC1 is the host, then with 4 processes per machine the results for processor0 to processor3 will be stored on PC1, those for processor4 to processor7 on PC2, and so on.
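The mapping from processor folder to machine can be sketched with integer division. This assumes that the ranks fill each machine's slots in hostfile order before moving to the next machine, and that every machine has the same number of slots; host_index_for_rank is a made-up name.

```shell
# Zero-based index of the hostfile line that runs a given MPI rank.
# $1 = rank (the N in processorN), $2 = slots per machine.
# Assumes ranks fill one machine's slots before moving to the next machine.
# host_index_for_rank is a hypothetical helper name.
host_index_for_rank() {
    echo $(( $1 / $2 ))
}

for rank in 0 1 2 3 4 5 6 7; do
    echo "processor$rank -> machine $(host_index_for_rank "$rank" 4)"
done
```

With 4 slots per machine this lists processor0 to processor3 on machine 0 (the host) and processor4 to processor7 on machine 1.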

April 3, 2014, 06:46
#13
liyu
Thank you for your very fast reply!
Another question~
is the password-less ssh necessary? i have not understood the detailed procedure even after reading this instruction on http://homepages.inf.ed.ac.uk/imurra...dless_ssh.html.
i am a freshman to openfoam as well as linux ~~

Besides, could you show me the " roots" part in decomposeParDict file?
i am not sure what the " node" means, one node means one PC ?

Last edited by ahliyu; April 3, 2014 at 09:42.

April 3, 2014, 12:43
Ssh
#14
Ahsan Javed
I tried passwordless SSH but it didn't work for me. As far as I know, OpenFOAM may not initiate MPI without password-based SSH. I'll give you a key shortcut that removes all the fuss in making a cluster: create an account with the same username and password on all PCs, give a different IP to each PC and run SSH. Pfff! The cluster is ready. About decomposePar, I'll answer that in a while.

Last edited by ahsan_smme.nust@yahoo.com; April 3, 2014 at 12:52. Reason: To Elaborate

April 3, 2014, 23:27
help~
#15
liyu
Quote:
Originally Posted by wyldckat
Greetings Hisham,

Indeed there seems to be some strange detail that is escaping here...

OK, let's try to debug this in parts:
  1. Is the user name the same on both machines? If not, you might want to try using the multiple roots definition in "decomposeParDict". I wrote a blog post about it some time ago: Running OpenFOAM in parallel with different locations for each process
  2. Let's skip the need for password. Follow these instructions for a passwordless access for your own user between machines: http://homepages.inf.ed.ac.uk/imurra...dless_ssh.html
    It's only passwordless if you don't use a password for your key. If yours is a closed internal network and unlikely that an internal access attack is done, then you won't need a password for this private/public key pair.
  3. Specify the number of cpu's for the local machine as well, just in case.
I can't think of any other hypothesis for now.

Best regards,
Bruno
hi,bruno~
i have learnt a lot from your post, so really thank you for your work!
i have several questions about parallel running on two PCs(actually, two workstations).

1. you mentioned the username. So if two machines have the same username, is it still necessary for me to edit the "roots" in decomposeParDict ?
besides, as far as i am concerned, one node means one PC ? i have 2 workstations and each of them has 8 cores. so the " n" in " roots n" should be 1 or 15 ?

2. i still fail to understand how to get a password-less SSH even after reading the instructions on http://homepages.inf.ed.ac.uk/imurra...dless_ssh.html.
i am a freshman to linux~
is it possible to avoid the password-less SSH setting?

thank you again~

April 5, 2014, 20:26
#16
Bruno Santos
Greetings to all!

@ahliyu:
Quote:
Originally Posted by ahliyu
i have not understood the detailed procedure even after reading this instruction on http://homepages.inf.ed.ac.uk/imurra...dless_ssh.html.
i am a freshman to openfoam as well as linux ~~
I suggest that you read the following pages:

Quote:
Originally Posted by ahliyu
Besides, could you show me the " roots" part in decomposeParDict file?
It's explained here: Running OpenFOAM in parallel with different locations for each process


Quote:
Originally Posted by ahliyu
i am not sure what the " node" means, one node means one PC ?
Yes. Usually the nodes are referred to as:
  • master node - the main machine, responsible for keeping track of the jobs being executed in the cluster and usually also responsible for running part of the parallel case.
  • slave nodes or worker nodes - as the name says, these nodes usually only do the heavy computational work, all together.
-------------------
edit: I moved the post above from the other quoted thread, to keep the discussion in the same thread here.

Quote:
Originally Posted by ahliyu
1. you mentioned the username. So if two machines have the same username, is it still necessary for me to edit the "roots" in decomposeParDict ?
The "roots" entry is only a feature that allows each processor folder to be placed in a different location on each machine. If you do not define the "roots", the exact same path should be used for the main case folder, and it should somehow be shared among all machines.

Quote:
Originally Posted by ahliyu
besides, as far as i am concerned, one node means one PC ? i have 2 workstations and each of them has 8 cores. so the " n" in " roots n" should be 1 or 15 ?
Already answered above.

Quote:
Originally Posted by ahliyu
2. i still fail to understand how to get a password-less SSH even after reading the instructions on http://homepages.inf.ed.ac.uk/imurra...dless_ssh.html.
i am a freshman to linux~
is it possible to avoid the password-less SSH setting?
Also already answered above.
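On the password-less SSH question, the usual key-based setup can be sketched in three commands. This is only an outline under assumptions: setup_passwordless_ssh is a made-up function name, the key file location is the OpenSSH default for RSA keys, and the user and machine names in the example call are placeholders.

```shell
# Set up key-based login from this machine to one remote machine.
# $1 = user name on the remote machine, $2 = remote machine name.
# setup_passwordless_ssh is a hypothetical helper name.
setup_passwordless_ssh() {
    # Create a key pair only if none exists yet. The empty passphrase (-N "")
    # is what makes the later logins password-less.
    [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    # Copy the public key to the remote account; this asks for the remote
    # password one last time.
    ssh-copy-id "$1@$2"
    # Should now print the remote machine name without asking for a password.
    ssh "$1@$2" hostname
}

# Example call with placeholder names: setup_passwordless_ssh ahsan machine2
```

The same call would be repeated from the master node for every other machine listed in the hostfile.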

Best regards,
Bruno

Last edited by wyldckat; April 5, 2014 at 20:32. Reason: see "edit:"
