CFD Online Discussion Forums

ahsan_smme.nust@yahoo.com December 12, 2013 14:02

Running MPI on 2 laptops
 
I am trying to run a 2D airfoil case on two laptops. I made my own blockMesh with 1.1 million cells. Both laptops have 4 processors and run the same OpenFOAM version (2.2.1) and the same Ubuntu version (12.04 LTS). I ran decomposePar on both laptops for the same case, airfoil2D.
The problem appears when I give the command
mpirun --hostfile hosts -np 8 simpleFoam -parallel
in the terminal. The hosts file contains the names and number of processors of the slave. It is located inside the airfoil2D folder.
Hosts file:
ahsan@ahsan-Inspiron-5521
Dell-4050
cpu=4

I get this error:

ahsan@ahsan-Inspiron-5521:~/airFoil2D$ mpirun --hostfile hosts -np 8 simpleFoam -parallel
ssh: Could not resolve hostname Dell-4050: Name or service not known
--------------------------------------------------------------------------
A daemon (pid 12100) died unexpectedly with status 255 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished

Dell 4050 is my slave. I read posts about running MPI on clusters but couldn't understand how to set the library path, as I'm not much of a programmer and I'm new to Ubuntu. This is my final year project; if anyone can help me out I'll be more than thankful. :confused:

wyldckat December 15, 2013 14:18

Greetings Ahsan,

Two reference posts that can come in handy:
So, basically what happens in your case is two things:
  1. The two laptops are not aware of the host names of each other, therefore are not able to find their corresponding IP address.
  2. Your "machines" file is incorrect.


To solve the problem, you must first find the IP address for each laptop. You can use this command:
Code:

ifconfig
And then look for the IP address at "inet addr:" related to "eth0", "eth1" or "wlan0" or "wlan1".
Make a note of those IP addresses and try pinging the other machine, using its IP address, to check if it can find it. For example:
  • If "machine1" has the IP "11.12.13.14";
  • and if "machine2" has the IP "11.12.13.132";
Then:
  • From "machine2", run:
    Code:

    ping 11.12.13.14
  • From "machine1", run:
    Code:

    ping 11.12.13.132
They should be able to see each other.
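
As a rough sketch of the lookup step above, the addresses can be pulled out of the ifconfig output with a small filter. This assumes the older "inet addr:" layout that Ubuntu 12.04 prints; the function name list_inet_addrs is made up for this example.

```shell
# Print "interface address" pairs from ifconfig output read on stdin.
# Assumes the older net-tools layout with "inet addr:" (as on Ubuntu 12.04);
# newer releases print "inet 192.168.1.2" instead and need a different pattern.
list_inet_addrs() {
    awk '
        /^[a-z]/ { iface = $1 }            # an unindented line starts a new interface
        /inet addr:/ {
            split($2, parts, ":")          # $2 looks like "addr:192.168.1.2"
            if (iface != "lo") print iface, parts[2]
        }
    '
}

# Usage on a real machine:
#   ifconfig | list_inet_addrs
```

The loopback interface "lo" is skipped on purpose, since 127.0.0.1 is of no use for reaching the other laptop.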

Now, once the IPs are properly determined and tested, edit the file "/etc/hosts" as super-user on both machines and add these two lines at the end of the file:
Code:

11.12.13.14  machine1
11.12.13.132  machine2

Now, the other problem is the "machines" file you have got. It should be something like this:
Code:

machine1 slots=4 max-slots=4
machine2 slots=4 max-slots=4
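
Since "/etc/hosts" and the "machines" file list the same names, one way to keep them consistent is to generate the second from the first. The following is only a sketch using the placeholder names and IPs from above; make_hostfile is a hypothetical helper, not something shipped with Open MPI or OpenFOAM.

```shell
# Turn "IP name" pairs (the same lines added to /etc/hosts) into hostfile lines.
# $1 is the number of slots to advertise per machine.
make_hostfile() {
    slots=$1
    while read -r ip name; do
        if [ -n "$name" ]; then
            printf '%s slots=%s max-slots=%s\n' "$name" "$slots" "$slots"
        fi
    done
}

nodes='11.12.13.14 machine1
11.12.13.132 machine2'

# Prints the two "machineN slots=4 max-slots=4" lines shown above.
printf '%s\n' "$nodes" | make_hostfile 4

# To append the same pairs to /etc/hosts (needs super-user rights):
#   printf '%s\n' "$nodes" | sudo tee -a /etc/hosts
```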

Best regards,
Bruno

ahsan_smme.nust@yahoo.com December 16, 2013 08:05

Thank you cat, I have successfully pinged both laptops and made all the changes you mentioned.

Code:

192.168.1.2 ahsan@ahsan-Inspiron-5521
192.168.1.1 umer-HP-ProBook-4540s

Pinged:

Code:

ahsan@ahsan-Inspiron-5521:~/airFoil2D$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_req=41 ttl=64 time=0.520 ms
64 bytes from 192.168.1.1: icmp_req=42 ttl=64 time=0.362 ms

My hosts file says:

Code:

ahsan@ahsan-Inspiron-5521 slots=4 max-slots=4
umer-HP-ProBook-4540s slots=4 max-slots=4

But when i run:
Code:

ahsan@ahsan-Inspiron-5521:~/airFoil2D$ mpirun --hostfile hosts -np 8 simpleFoam -parallel

ssh: connect to host umer-HP-ProBook-4540s port 22: Connection refused
--------------------------------------------------------------------------
A daemon (pid 6863) died unexpectedly with status 255 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished

Connection refused. :/
Also, when I ssh from ahsan: connection timed out.
When I ssh from umer: permission denied.

wyldckat December 30, 2013 08:49

Hi Ahsan,

Sorry, I was not able to answer any sooner.
In summary, the problems in question are probably:
  1. Do not use the @ symbol in the name of a machine. The one you are seeing in your command line is only implying that your user name is logged in at your machine; it does not mean that the whole text is the name of the machine.
    In your case, the actual name of your machine is only "ahsan-Inspiron-5521", so you should adjust the files accordingly.
  2. The error message indicates that no SSH connection was possible over port 22. You need to configure SSH for it to work on both machines: https://help.ubuntu.com/community/SSH
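
A small sketch of the first point: the machine name is whatever follows the "@" in the prompt, which is also what the hostname command prints. The helper name host_only is invented for this example.

```shell
# Return only the machine name from a prompt-style "user@host" string.
host_only() {
    printf '%s\n' "${1#*@}"    # strip everything up to and including the first "@"
}

host_only "ahsan@ahsan-Inspiron-5521"   # prints ahsan-Inspiron-5521
host_only "umer-HP-ProBook-4540s"       # unchanged, there is no "@" to strip

# On each machine, "hostname" prints the actual name to use in the files.
# Once SSH is configured, this should print the remote name:
#   ssh umer-HP-ProBook-4540s hostname
```
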
Best regards,
Bruno

ahsan_smme.nust@yahoo.com January 1, 2014 01:50

Help needed!
 
I have successfully set up SSH between both laptops, but now it gives this error:

Cannot find executable s

Is it necessary for the 2 laptops to have the same version of Ubuntu, the same OpenFOAM version and the same architecture (32 or 64 bit) to run a case across a network?
In our case one laptop is 32-bit and the other is 64-bit; also, one laptop has OpenFOAM version 2.2.1 and the other has 2.1.1. I think the problem lies in the architecture and Ubuntu version. Please help me out on this.

wyldckat January 1, 2014 07:20

Hi Ahsan,

Don't expect OpenFOAM to do universal parallel processing ;)
It doesn't matter what Ubuntu versions you have on each machine. But it does matter that you use the exact same architecture and version of OpenFOAM, installed in the same path on every machine.

If one laptop is using a 32bit version of Ubuntu, then you have to install the same version of OpenFOAM and the same 32bit build on both machines! This is because 32bit is the common denominator between both machines: 64bit machines can also run 32bit applications.
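
One way to compare the machines is to print the relevant details on each one and check that the lines match. This is a sketch that assumes the OpenFOAM environment has been sourced in the shell, so that WM_OPTIONS, WM_PROJECT_VERSION and WM_PROJECT_DIR are set; foam_signature is a made-up name.

```shell
# Print kernel architecture, OpenFOAM build options, version and install path.
# "unset" is printed for any variable that is missing, which usually means
# the OpenFOAM environment was not sourced in this shell.
foam_signature() {
    printf '%s %s %s %s\n' \
        "$(uname -m)" \
        "${WM_OPTIONS:-unset}" \
        "${WM_PROJECT_VERSION:-unset}" \
        "${WM_PROJECT_DIR:-unset}"
}

foam_signature
```

Run it on every machine; if any of the four fields differ, the parallel run is unlikely to start.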

First choose which version of OpenFOAM you want to use and let me know, so that I can give you directions on what to do.

Best regards,
Bruno

ahsan_smme.nust@yahoo.com February 25, 2014 03:20

cluster problem
 
Hey wyldckat, I really appreciate your help. I have successfully run an airfoil3d case on two Core i3 machines. :) Now I have moved to 3 machines. What I did was install the same Ubuntu (12.04 LTS) and OpenFOAM (2.2.1) versions on all machines and create an account with the same username and password on each of them. All machines have the same architecture. My hosts file looks like this:

ahsan cpu=4
alvi cpu=4
aftab cpu=4

In /etc/hosts I added these lines as you instructed:

192.168.1.2 ahsan
192.168.1.3 alvi
192.168.1.10 aftab

All machines ping and ssh successfully.
I start mpirun by typing:

mpirun -np 12 --hostfile ./hosts /opt/openfoam221/bin/foamExec simpleFoam -parallel

Now the problem is that when I run my case on a single machine using its 4 cores it runs faster, reaching 200 time steps in 5 min 44 s, whereas on 3 machines using 12 cores it takes 10 min 33 s. I used a hub to connect all the PCs together. The decomposition method is "scotch". I decomposed my domain into 12 subdomains by typing "decomposePar" on all 3 machines. I don't know where the problem lies :(. Have I given enough info? Please tell me and I'll provide more.
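
For reference, the two timings quoted above can be turned into a speedup factor (single-machine time divided by cluster time). This is just a sketch of the arithmetic; the helper names are invented.

```shell
# Convert "minutes seconds" to seconds.
to_seconds() { echo $(( $1 * 60 + $2 )); }

# Speedup = time on the reference setup / time on the new setup.
speedup() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'; }

t1=$(to_seconds 5 44)     # 344 s on one machine, 4 processes
t3=$(to_seconds 10 33)    # 633 s on three machines, 12 processes
speedup "$t1" "$t3"       # prints 0.54
```

A value below 1 means the 3-machine run is slower than the single machine.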

wyldckat March 2, 2014 10:12

Hi Ahsan,

Well, throwing more CPUs at a problem doesn't mean that it will solve the problem faster. You need to take into account several details:
  1. If all CPUs are equally powerful.
  2. If the memory speed on all machines is the same.
  3. If the connection between machines is powerful enough.
In addition, Hyper-Threading does not help much when using OpenFOAM. Therefore, if your machines are using i3 CPUs, which have 2 physical cores, you should only use 2 processes on each one.
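
To check how many physical cores a machine has, as opposed to Hyper-Threading threads, the lscpu output can be reduced to a single number. This is a sketch that assumes lscpu prints "Core(s) per socket" and "Socket(s)" lines; the function name physical_cores is hypothetical.

```shell
# Read lscpu output on stdin and print cores-per-socket times sockets.
physical_cores() {
    awk -F: '
        /^Core\(s\) per socket/ { cores = $2 }
        /ocket\(s\):/           { sockets = $2 }
        END { print cores * sockets }
    '
}

# Usage on a real machine:
#   lscpu | physical_cores
# The result is the value to put after "slots=" in the hostfile, for example:
#   ahsan slots=2 max-slots=2
```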


If you can describe:
  1. The CPU model on each machine (specific model, e.g. i3-3115C);
  2. The RAM speed on each machine, e.g. DDR3 1333 MHz;
  3. The times that the ping command gives you, when checking the connection with the other machines;
  4. The Ethernet connection, namely if it's 100 Mbps or 1 Gbps;
  5. The number of cells in the mesh;
I can estimate what can be improved or if it will even work at all.

Best regards,
Bruno

ahsan_smme.nust@yahoo.com March 4, 2014 03:09

Specs
 
1 Attachment(s)
intel (R) Core (TM) i3-2120 CPU @ 3.30GHz 3.30GHz
Installed Memory 8 Gb
64 bit operating system,x64 based processor
ethernet connection 100Mbps
RAM DDR3 1333MHz

No. of cells in mesh 1.1 million
mesh density (150 150 10)

check the attachment for ping times.

wyldckat March 4, 2014 05:05

Hi Ahsan,

Quote:

Originally Posted by ahsan_smme.nust@yahoo.com (Post 477866)
intel (R) Core (TM) i3-2120 CPU @ 3.30GHz 3.30GHz

It has a Passmark index of 3871. Pretty powerful CPU! It's about 45% of an i7-2600. Which is pretty good, considering that it's 2 cores on the i3 vs 4 cores on the i7!

Quote:

Originally Posted by ahsan_smme.nust@yahoo.com (Post 477866)
Installed Memory 8 Gb

OK, no problem here.
Quote:

Originally Posted by ahsan_smme.nust@yahoo.com (Post 477866)
64 bit operating system,x64 based processor

Good.
Quote:

Originally Posted by ahsan_smme.nust@yahoo.com (Post 477866)
ethernet connection 100Mbps

Not good. Really not good :(
Any chance you can change to a Gigabit network, namely 1000Mbps = 1Gbps? Because this is the main reason why you're getting such bad timing results :(

For a sense of perspective: the CPUs would have to be running at 400 to 800 MHz per core, for you to notice improvements in using multiple machines. Which at this point, doesn't make much sense, because each CPU runs at 3300MHz.

Quote:

Originally Posted by ahsan_smme.nust@yahoo.com (Post 477866)
RAM DDR3 1333MHz

Good enough!

Quote:

Originally Posted by ahsan_smme.nust@yahoo.com (Post 477866)
No. of cells in mesh 1.1 million
mesh density (150 150 10)

150*150*10 = 225000
:confused: this only equates to 225 thousand cells.

Quote:

Originally Posted by ahsan_smme.nust@yahoo.com (Post 477866)
check the attachment for ping times.

0.700 ms is an indicator of pretty bad latency issues :( You need something between 0.300 and 0.100 ms to achieve relatively good simulation times.
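
A quick way to compare a connection against those figures is to average the "time=" values that ping prints. The sketch below assumes the usual Linux ping output format; mean_rtt_ms and is_low_latency are made-up helper names, and the 0.300 ms default is simply the upper figure mentioned above.

```shell
# Average the round-trip times (in ms) found in ping output read on stdin.
mean_rtt_ms() {
    awk -F'time=' '
        NF > 1 { split($2, t, " "); sum += t[1]; n++ }
        END    { if (n) printf "%.3f\n", sum / n }
    '
}

# Succeeds when the given latency is at or below the limit (default 0.300 ms).
is_low_latency() {
    awk -v rtt="$1" -v limit="${2:-0.300}" 'BEGIN { exit !(rtt <= limit) }'
}

# Usage on a real machine:
#   rtt=$(ping -c 10 192.168.1.1 | mean_rtt_ms)
#   if is_low_latency "$rtt"; then echo "latency OK: $rtt ms"; else echo "latency too high: $rtt ms"; fi
```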


If you cannot change the Ethernet connection to Gigabit, then you will have to resort to another kind of parallel performance: you can run 3 independent simulations at the same time, one on each machine, to test different settings.

If you can change the Ethernet connection to Gigabit and use 2 cores per machine, then I expect that you'll see something like this:
  • 1 machine: 5 min
  • 2 machines: 3 min 30 s
  • 3 machines: 2 min 10 s
This also depends on the number of cells per processor, which should be at least 50000 to 100000.
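
The cells-per-processor figure is a simple division, sketched here for the numbers mentioned in this thread (cells_per_process is an invented helper name):

```shell
# Integer number of cells each process gets for a given mesh size.
cells_per_process() { echo $(( $1 / $2 )); }

echo $(( 150 * 150 * 10 ))        # 225000 cells for a (150 150 10) block
cells_per_process 1100000 12      # 91666 with 1.1 million cells on 12 processes
cells_per_process 225000 12       # 18750 with 225 thousand cells on 12 processes
cells_per_process 225000 6        # 37500 with 225 thousand cells on 3 machines x 2 cores
```

If the mesh really has 225 thousand cells, even 6 processes would be below the 50000 cells per processor mark.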

Best regards,
Bruno

ahliyu April 2, 2014 22:23

Hi Ahsan,
Should the case folder on the different laptops be located in the same path?
And where will the results be stored, on the host or on the slave?

Thanks!

ahsan_smme.nust@yahoo.com April 3, 2014 06:30

Path
 
The case folder should have the same path on all PCs. If Pc1 is the host, then with 4 processes per PC the results for processor0 to processor3 will be stored on Pc1, those for processor4 to processor7 on Pc2, and so on. :)
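
The mapping can be sketched as an integer division, assuming mpirun fills all slots of the first machine in the hostfile before moving to the next one (the usual default, but an assumption here) and that every machine has the same number of slots. host_for_rank is a made-up name.

```shell
# Index (0-based, in hostfile order) of the machine that holds processorN.
# $1 = processor number, $2 = slots per machine.
host_for_rank() { echo $(( $1 / $2 )); }

for rank in 0 3 4 7; do
    echo "processor$rank -> machine $(host_for_rank "$rank" 4)"
done
```

With 4 slots per machine this places processor0 to processor3 on machine 0 and processor4 to processor7 on machine 1.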

ahliyu April 3, 2014 06:46

Thank you for your very fast reply!
Another question:
Is password-less SSH necessary? I have not understood the detailed procedure even after reading the instructions on http://homepages.inf.ed.ac.uk/imurra...dless_ssh.html.
I am new to OpenFOAM as well as Linux.

Besides, could you show me the "roots" part of the decomposeParDict file?
I am not sure what "node" means. Does one node mean one PC?

ahsan_smme.nust@yahoo.com April 3, 2014 12:43

Ssh
 
I tried passwordless SSH but it didn't work for me. As far as I know, OpenFOAM may not initiate MPI without a password for SSH. I'll give you a shortcut that removes all the fuss of making a cluster: create the same username and password on all PCs, give each PC a different IP address and run SSH. Pfff! The cluster is ready. About decomposePar, I'll answer that in a while.

ahliyu April 3, 2014 23:27

help~
 
Quote:

Originally Posted by wyldckat (Post 365693)
Greetings Hisham,

Indeed there seems to be some strange detail that is escaping here...

OK, let's try to debug this in parts:
  1. Is the user name the same on both machines? If not, you might want to try using the multiple roots definition in "decomposeParDict". I wrote a blog post about it some time ago: Running OpenFOAM in parallel with different locations for each process
  2. Let's skip the need for password. Follow these instructions for a passwordless access for your own user between machines: http://homepages.inf.ed.ac.uk/imurra...dless_ssh.html
    It's only passwordless if you don't use a password for your key. If yours is a closed internal network and an internal access attack is unlikely, then you won't need a password for this private/public key pair.
  3. Specify the number of CPUs for the local machine as well, just in case.
I can't think of any other hypothesis for now.
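
For item 2, the usual OpenSSH route is to create a key pair without a passphrase and copy the public key to each of the other machines. The following is only a sketch under that assumption; the function name is invented, and the host names in the usage comment are the ones from the hosts file posted earlier in this thread.

```shell
# Create a key pair (if missing) and copy the public key to each given node.
# Nothing runs until the function is called with one or more host names.
setup_passwordless_ssh() {
    key="$HOME/.ssh/id_rsa"
    mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
    if [ ! -f "$key" ]; then
        ssh-keygen -t rsa -N "" -f "$key"    # empty passphrase, so no prompt later
    fi
    for node in "$@"; do
        ssh-copy-id -i "$key.pub" "$node"    # asks for the account password one last time
    done
}

# Example call on the master machine:
#   setup_passwordless_ssh alvi aftab
# Afterwards "ssh alvi hostname" should print the name without asking for a password.
```

This relies on the same user name existing on every machine, as described in item 1.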

Best regards,
Bruno

Hi Bruno,
I have learnt a lot from your post, so thank you very much for your work!
I have several questions about running in parallel on two PCs (actually, two workstations).

1. You mentioned the username. If the two machines have the same username, is it still necessary for me to edit the "roots" in decomposeParDict?
Besides, as far as I understand, one node means one PC? I have 2 workstations and each of them has 8 cores, so should the "n" in "roots n" be 1 or 15?

2. I still fail to understand how to get password-less SSH even after reading the instructions on http://homepages.inf.ed.ac.uk/imurra...dless_ssh.html.
I am new to Linux.
Is it possible to avoid the password-less SSH setup?

Thank you again!

wyldckat April 5, 2014 20:26

Greetings to all!

@ahliyu:
Quote:

Originally Posted by ahliyu (Post 483650)
I have not understood the detailed procedure even after reading the instructions on http://homepages.inf.ed.ac.uk/imurra...dless_ssh.html.
I am new to OpenFOAM as well as Linux.

I suggest that you read the following pages:

Quote:

Originally Posted by ahliyu (Post 483650)
Besides, could you show me the "roots" part of the decomposeParDict file?

It's explained here: Running OpenFOAM in parallel with different locations for each process


Quote:

Originally Posted by ahliyu (Post 483650)
I am not sure what "node" means. Does one node mean one PC?

Yes. The nodes are usually referred to as:
  • master node - the main machine, responsible for keeping track of the jobs being executed in the cluster and usually also responsible for running part of the parallel case.
  • slave nodes or worker nodes - as the name says, these nodes usually only do the heavy computational work, all together.
-------------------
edit: I moved the post above from the other quoted thread, to keep the discussion in the same thread here.

Quote:

Originally Posted by ahliyu (Post 483790)
1. You mentioned the username. If the two machines have the same username, is it still necessary for me to edit the "roots" in decomposeParDict?

The "roots" are only a feature that allows each processor folder to be placed in a different location on each machine. If you do not define the "roots", the exact same path should be used for the main case folder on every machine, and the folder should somehow be shared among all machines.
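
As a hedged illustration of what such a "roots" block could look like, the sketch below prints one for a given number of processes. It assumes, as I understand it for OpenFOAM 2.x, that the list holds one entry per slave process (the number of processes minus one) and that "distributed yes;" must accompany it. The path is purely hypothetical and print_roots is an invented helper, so check the result against the documentation for your version.

```shell
# Print a "distributed"/"roots" fragment for decomposeParDict.
# $1 = total number of processes, $2 = root path used for every slave process.
print_roots() {
    nprocs=$1
    root=$2
    echo "distributed yes;"
    echo "roots"
    echo "$(( nprocs - 1 ))"
    echo "("
    i=1
    while [ "$i" -lt "$nprocs" ]; do
        echo "    \"$root\""
        i=$(( i + 1 ))
    done
    echo ");"
}

# Example: 16 processes, with the same (hypothetical) path for all 15 slave processes.
print_roots 16 "/home/user/OpenFOAM/user-2.2.1/run"
```

In practice each entry can differ, which is the whole point of the feature.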

Quote:

Originally Posted by ahliyu (Post 483790)
Besides, as far as I understand, one node means one PC? I have 2 workstations and each of them has 8 cores, so should the "n" in "roots n" be 1 or 15?

Already answered above.

Quote:

Originally Posted by ahliyu (Post 483790)
2. I still fail to understand how to get password-less SSH even after reading the instructions on http://homepages.inf.ed.ac.uk/imurra...dless_ssh.html.
I am new to Linux.
Is it possible to avoid the password-less SSH setup?

Also already answered above.

Best regards,
Bruno


All times are GMT -4.