CFD Online Discussion Forums


ghost82 April 27, 2014 05:15

[TUTORIAL] Run fluent on distributed memory with 2 windows 7 64 bit machines
 
4 Attachment(s)
Hi all,
this is a short tutorial on getting the smallest possible cluster, 2 machines, up and running with fluent.
The setup below is only a demo to show how 2 windows machines can be connected. In more detail, this tutorial shows how to connect 2 machines running windows 7 64 bit directly, without a switch, to run jobs on distributed memory.

My demo setup is:

MACHINE 1 (this machine will be the host + 16 nodes)
Homemade workstation
Windows 7 64 bit Professional
2x Intel Xeon E5-2687W
64 GB ECC DDR3 RAM at 1600 MHz
250 GB SSD
2 TB hard drive for data storage
Wifi adapter Edimax Ralink EW-7318usg

MACHINE 2 (this machine will be 2 nodes)
Macbook pro late 2008 (laptop)
Bootcamp installed with windows 7 64 bit Ultimate
1x 2.5 GHz Intel Core 2 Duo
4 GB non-ECC DDR2 RAM at 667 MHz
40 GB hard drive
Internal wifi N adapter


Since this is only a demo setup and I don't have a crossover cable to connect the machines through gigabit ethernet, I connected the 2 machines by wifi at 54 Mbps (very slow). In a real setup, the 2 machines should be connected at least with a cat. 5e crossover cable (Gbit ethernet), or better with infiniband.

First of all I need to create a lan to connect the 2 machines.
Edimax provides an access point software, so machine 1 will be the access point.
Machine 2 is the wifi client which will connect to machine 1.

Let's assign static ips and subnet masks to the machines.

On machine 1 (which will be the gateway):
Control panel->network and internet->network and sharing center->change adapter settings (on the left)
Right click on the network adapter (in my case the Edimax Ralink EW-7318usg) and click properties: highlight "internet protocol version 4 (tcp/ipv4)" and click properties
Check "use the following ip address" and "use the following DNS server addresses"
Ip address: 192.168.1.1 (I chose the standard home network 192.168.xxx.xxx)
Subnet mask: 255.255.255.0
Default gateway: 192.168.1.1

Click ok and exit

On machine 2:
Control panel->network and internet->network and sharing center->change adapter settings (on the left)
Right click on the network adapter (in my case the internal wifi adapter) and click properties: highlight "internet protocol version 4 (tcp/ipv4)" and click properties
Check "use the following ip address" and "use the following DNS server addresses"
Ip address: 192.168.1.2
Subnet mask: 255.255.255.0
Default gateway: 192.168.1.1

Click ok and exit
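
Before moving on, it can help to double check that the two static addresses really sit on the same network for the chosen subnet mask. The snippet below is only a minimal sketch, assuming Python 3 is available on one of the machines; the function name same_subnet is made up for illustration and is not part of fluent or windows.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, netmask: str) -> bool:
    """Return True if both addresses belong to the same network for this mask."""
    # ip_interface accepts "address/netmask" and exposes the enclosing network.
    net_a = ipaddress.ip_interface(f"{ip_a}/{netmask}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{netmask}").network
    return net_a == net_b


# Addresses and mask taken from the tutorial above.
print(same_subnet("192.168.1.1", "192.168.1.2", "255.255.255.0"))
```

If this prints False for your own addresses, the two machines will not reach each other directly and the ping test further down is expected to fail.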

Disable firewalls and UAC on both machine 1 and machine 2 to prevent communication errors (you can re-enable them later once everything is working)

Test the ping command
To check that the machines can see each other, open the command prompt (cmd) on each machine.
On machine 1 type: ping 192.168.1.2
and press enter
On machine 2 type: ping 192.168.1.1
and press enter

4 packets will be sent from machine 1 to machine 2 and from machine 2 to machine 1; check that all packets reach their destination.
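
If you prefer to script the ping test, here is a rough sketch in Python. It assumes an English-language windows, where the summary line looks like "Packets: Sent = 4, Received = 4, Lost = 0 (0% loss)"; on a localized system (mine is Italian) the pattern has to be adapted. The function names are hypothetical helpers, not part of any tool used in this tutorial.

```python
import re
import subprocess

# Assumption: English-language Windows ping summary line.
SUMMARY = re.compile(r"Sent = (\d+), Received = (\d+), Lost = (\d+)")


def parse_ping_summary(output: str):
    """Return (sent, received, lost) from ping output, or None if not found."""
    match = SUMMARY.search(output)
    if match is None:
        return None
    sent, received, lost = (int(group) for group in match.groups())
    return sent, received, lost


def all_packets_arrived(output: str) -> bool:
    """True only if every packet came back and no reply was 'unreachable'."""
    summary = parse_ping_summary(output)
    if summary is None:
        return False
    sent, received, lost = summary
    # Windows may count "Destination host unreachable" replies as received,
    # so that text is checked separately.
    unreachable = "unreachable" in output.lower()
    return sent > 0 and received == sent and lost == 0 and not unreachable


def ping(host: str, count: int = 4) -> str:
    """Run the Windows ping command (-n sets the number of packets)."""
    result = subprocess.run(
        ["ping", "-n", str(count), host], capture_output=True, text=True
    )
    return result.stdout
```

On machine 1 you would call all_packets_arrived(ping("192.168.1.2")), and on machine 2 the same with 192.168.1.1.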

VERY IMPORTANT
The 2 machines must have fluent installed in the same directory;
The 2 machines must have intel mpi installed in the same directory;
The 2 machines must have the same username and password to log in to windows (so you must assign the same password to both usernames; you cannot have a blank password).


Test usernames/password

On both machines 1 and 2 share a directory, for example the C:\ directory
Go to Start->Computer
Right click on C:\ then click on properties
Click on sharing tab->advanced sharing
Check Share this folder, click apply and click on permissions
Highlight Everyone in users and groups and assign full control (all checks under "Allow")
Click apply, ok, ok

From machine 1 go to Start->Computer and click on network on the left to see machine 2
Double click on it and access the shared folder C:\ on machine 2
You will be prompted for a username and a password
Type the windows username and password and see if you can access the shared folder

From machine 2, do the same

Usernames and passwords must be the same!

On machine 1 create a new .txt file with notepad and list the machines of the network in it, one line per core. The format of the file is:

Code:

ipmachine1
ipmachine1
ipmachine1
ipmachine1
ipmachine2
ipmachine2

This example is for a 4 core machine 1 + a 2 core machine 2; a blank line must be present under the last line.
This is my hostnames.txt file (16 cores on machine 1 and 2 cores on machine 2):

Code:

192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.2
192.168.1.2

Save the hostnames.txt file on machine 1, on your desktop.
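
Typing the same address 16 times by hand is error prone, so the file can also be generated. This is only a sketch, assuming Python 3; build_hosts_text and write_hosts_file are hypothetical names. I am also assuming that "a blank line under the last line" simply means the file ends with a line break after the last address.

```python
def build_hosts_text(cores_per_host: dict) -> str:
    """Repeat each address once per core, in the order given."""
    lines = []
    for host, cores in cores_per_host.items():
        lines.extend([host] * cores)
    # Trailing newline: assumed to satisfy the "blank line at the end" rule.
    return "\n".join(lines) + "\n"


def write_hosts_file(path: str, cores_per_host: dict) -> None:
    """Write the host list with Windows (CRLF) line endings."""
    with open(path, "w", newline="\r\n") as handle:
        handle.write(build_hosts_text(cores_per_host))


# My demo setup: 16 cores on machine 1, 2 cores on machine 2.
demo = {"192.168.1.1": 16, "192.168.1.2": 2}
print(build_hosts_text(demo), end="")
```

Calling write_hosts_file("hostnames.txt", demo) should produce the same content as the file shown above.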

Start fluent on machine 1
Choose the working directory on machine 1 (cas and dat files are on machine 1)
Processing options: check parallel and type the number of processes, in my case 18
Click on the parallel settings tab: under interconnects choose default, under MPI types choose Intel (do not leave default)
Check Distributed memory on a cluster and check File containing machine names: point to hostnames.txt on the desktop of machine 1
Click ok to start fluent
Skip the warning and info messages
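
In my setup the number of processes typed in the launcher (18) is the same as the number of lines in hostnames.txt. I assume the two should always agree, so a quick sanity check before starting fluent could look like the sketch below (Python 3, hypothetical function names, nothing here talks to fluent itself).

```python
from collections import Counter


def count_processes(hosts_text: str) -> Counter:
    """Count how many entries each machine has, ignoring blank lines."""
    entries = [line.strip() for line in hosts_text.splitlines() if line.strip()]
    return Counter(entries)


def matches_requested(hosts_text: str, requested: int) -> bool:
    """True if the total number of entries equals the requested process count."""
    return sum(count_processes(hosts_text).values()) == requested


# Same content as my hostnames.txt: 16 + 2 entries and a trailing line break.
hosts = "192.168.1.1\n" * 16 + "192.168.1.2\n" * 2
print(dict(count_processes(hosts)))
print(matches_requested(hosts, 18))
```

To check a real file, read it with open("hostnames.txt").read() and pass the text to matches_requested together with the number you plan to type in the launcher.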

I'm attaching some pictures: 'alicegate' is the macbook pro (2 cores); 'Workstation' is the workstation with 16 cores.
The other pictures show: on the macbook pro 100% cpu usage and network addresses; on the workstation 100% cpu usage and fluent running



I hope it's clear enough for all.

Daniele

PS: the only thing I cannot understand is why I get access denied ('Accesso negato') when I stop the simulation and save the cas and dat files on C:\ on machine 1.
As you can see in the picture, access denied appears in the command window, but the files are written anyway (this happens whether the C:\ folder is shared or not).

mariachi May 6, 2014 09:47

It works...
 
I was able to set up a similar 2 machine cluster connected across a network, following your method, and FLUENT runs just fine. Now I'm going to add 2 more machines to this little cluster :D

Thanks for this tutorial Daniele!

aarratia May 12, 2014 17:23

I know it worked, but are the simulations calculated faster?
 
Hi, I am thinking of using parallel solving on 3 PCs running win 7 x64 (24 processors in total) with ansys 13. I plan to connect them via WIFI. My router is a Netgear WN2500RP, up to 300 mbps according to the specs. Will this router be capable of transferring data as fast as I need it to? When running on a single workstation my file size increases at a rate of approx. 350 Mb/hr. What do you guys think: should I go through all the steps or just stick with one workstation? Does it actually help under my current conditions? Another thing: is the internet available while a simulation is running?

Regards,
AD

ghost82 May 12, 2014 18:29

Quote:

Originally Posted by aarratia (Post 491346)
Hi, I am thinking of using parallel solving on 3 PCs running win 7 x64 (24 processors in total) with ansys 13. I plan to connect them via WIFI. My router is a Netgear WN2500RP, up to 300 mbps according to the specs. Will this router be capable of transferring data as fast as I need it to? When running on a single workstation my file size increases at a rate of approx. 350 Mb/hr. What do you guys think: should I go through all the steps or just stick with one workstation? Does it actually help under my current conditions? Another thing: is the internet available while a simulation is running?

Regards,
AD

I don't think it will be fast enough..
..however, if you already have the 3 pcs and all the hardware needed to connect them without spending money, you can try it yourself and report your results.

Quote:

in a real setup, the 2 machines should be connected at least with a cross cable cat. 5E (Gbit ethernet), or better with infiniband
In your case you need straight cables rather than cross cables, since you need a switch (gigabit lan), or infiniband cables + switch (more expensive, especially the switch).

Quote:

Another thing is when running simulation is the internet available?
If you use a router and not a switch, yes internet will be available.

Daniele

omid8 May 20, 2014 06:31

unable to connect to 192.168.1.2
 
hi
I want to use fluent for parallel processing on 2 PCs.
I have read the instructions, which were very helpful.

but I am receiving this error from fluent
"unable to connect to 192.168.1.2"
even though I can connect to this computer over the network, and I also checked the connection between the two computers with ping in cmd.
could you help me?
by the way, should the username be the same on both computers? I mean, should PC1 and PC2 have the same username?
thanks

ghost82 May 20, 2014 06:55

Hi, yes, please read carefully:

Quote:

The 2 machines must have the same username and password to log in to windows (so you must assign the same password to both usernames; you cannot have a blank password).


Test usernames/password

On both machines 1 and 2 share a directory, for example the C:\ directory
Go to Start->Computer
Right click on C:\ then click on properties
Click on sharing tab->advanced sharing
Check Share this folder, click apply and click on permissions
Highlight Everyone in users and groups and assign full control (all checks under "Allow")
Click apply, ok, ok

From machine 1 go to Start->Computer and click on network on the left to see machine 2
Double click on it and access the shared folder C:\ on machine 2
You will be prompted for a username and a password
Type the windows username and password and see if you can access the shared folder

From machine 2, do the same

Usernames and passwords must be the same!

quiqui May 29, 2014 16:14

very helpful, I am going to try it.

xh110120 January 13, 2015 20:34

multi-machine parallel
 
Hi,

I have been working on multi-machine parallel runs with fluent 14.5 these days, and there is always a problem. The related setup work has been done, but after opening fluent on the host computer, it always gets stuck at this step: "checking the status of SMPD for INTEL MPI on the local machine...smpd runing on tan-PC". I also checked the task manager separately on the host computer and on the node computer: the corresponding cores are working, and the smpd process also appears in the task manager. I don't know what the problem is. I've tried every method I could think of, but none of them work. Could you please help me find another way to figure it out? Thank you very much for your kind help!


FahadAmin January 28, 2015 07:29

Fahad Amin
I have two machines with the same login name and password, but when I try to access a folder from one machine it asks for a username and password. I enter the same name and password that is assigned on both my machines, but I always get a bad password error.

ghost82 January 30, 2015 07:34

UPDATE:
Yesterday I upgraded fluent to version 16.0.
Unfortunately, after installing intel mpi and following my tutorial (http://www.cfd-online.com/Forums/flu...band-win7.html), fluent asked for user/password in its console, when started in parallel distributed memory.
The solution was to open a command prompt (cmd) as admin and type wmpiregister.
A new window pops up: in this window just enter the windows 7 user and password and click register.
I did this on both machines.
Fluent is now able to connect without asking for user/password.

Daniele

amirmasoud_akhyani June 28, 2015 10:04

fluent 6.3.26
 
hi
I have fluent 6.3.26 64 bit on my pc.
could you help me run fluent with 2 machines on windows?
I read your tutorial (Run fluent on distributed memory with 2 windows 7 64 bit machines) but I can't use it for fluent 6.3.26

amirmasoud_akhyani August 25, 2015 13:30

before running
 
could you tell me which software should be installed before running fluent on distributed memory with 2 windows machines?

ettore August 25, 2015 18:47

More than 32 Nodes
 
Hello Daniele,
Have you tried to spawn a computation over more than 32 nodes?
I have a 4x CPU machine with 64 cores but the computation won't spawn on more than 32 nodes. It will only use 2 CPUs no matter what I choose in the computation setup (32 - 64 cores). I get no limitation regarding cores in the console. I am using Fluent v16.

Thank you first of all for this tutorial, it helped me set up a "cluster farm" out of 4 computers that worked flawlessly!
Trying to extend the performance I came across the 4x CPU mainboard solution (to avoid the latency and communication lag), but as described above I came across an unexpected limitation.

Thank you in advance for a reply.

ghost82 September 11, 2015 13:10

Quote:

Originally Posted by amirmasoud_akhyani (Post 561022)
could you tell me which software should be installed before running fluent on distributed memory with 2 windows machines?

Nothing special.. read the tutorials, it's all explained there. You must install fluent and mpi, infiniband drivers (only if your interconnect is infiniband) and visual studio (only if you need to compile udfs).

ghost82 September 11, 2015 13:12

Quote:

Originally Posted by ettore (Post 561052)
Hello Daniele,
Have you tried to spawn a computation over more than 32 nodes?
I have a 4x CPU machine with 64 cores but the computation won't spawn on more than 32 nodes. It will only use 2 CPUs no matter what I choose in the computation setup (32 - 64 cores). I get no limitation regarding cores in the console. I am using Fluent v16.

Thank you first of all for this tutorial, it helped me set up a "cluster farm" out of 4 computers that worked flawlessly!
Trying to extend the performance I came across the 4x CPU mainboard solution (to avoid the latency and communication lag), but as described above I came across an unexpected limitation.

Thank you in advance for a reply.

No I have 2 workstations with a total of 32 cores, but there should be no problems in connecting more cores. Can you attach a picture of what is wrong?
What is your OS?

ettore September 12, 2015 12:50

Hi,
Thank you for the reply.
I have an OS that uses all CPUs (all 64 threads are shown in task manager).
I have taken out 2 CPUs and therefore I cannot post any print screens, but I can explain the "problem".
In fluent I can set any number of cores without any problem. In the console I get the info that the simulation is spawned on all the set nodes (CPUs), but when I look in task manager only 50% of them are used (CPU1 and CPU2). I read some ANSYS documentation on scaling and I have seen that beyond 32 cores on a machine they don't get any benefit.

ghost82 September 12, 2015 13:32

Do you have hyperthreading active?

ettore September 12, 2015 13:59

Hi,
I have a 4x Opteron machine and AMD doesn't use HTT. Every thread is a core.
I have decided to mount all the CPUs back, disable 8 cores on every CPU (the mainboard allows it) and try to run again on 32 cores (4 CPUs x 8 cores = 32 cores in total). In theory I should gain 8 memory channels.
On top of that I can overclock the CPUs :D (they are already at 3.1 GHz instead of the default 2.6).
I will leave feedback here on the outcome. It's gonna take a while because I am waiting for some coolers.

amirmasoud_akhyani September 19, 2015 15:54

smpd version mismatch
 
hi
I have a new problem.
after I run fluent in parallel mode, this error is always shown at this step: "checking the status of SMPD for INTEL MPI on the local machine...aborting: unable to connect to workstation, smpd version mismatch.
aborting: unableto connect to 192.168.1.1,smpd version mismatch."
what should I do to solve this?

LuckyTran September 19, 2015 23:36

Quote:

Originally Posted by amirmasoud_akhyani (Post 564769)
hi
I have a new problem.
after I run fluent in parallel mode, this error is always shown at this step: "checking the status of SMPD for INTEL MPI on the local machine...aborting: unable to connect to workstation, smpd version mismatch.
aborting: unableto connect to 192.168.1.1,smpd version mismatch."
what should I do to solve this?

Are you running a distributed system with more than 1 machine, or is this an error you receive in parallel mode on a single machine?

This error occurs when a different version of Intel MPI is installed on each machine. Another possible cause is an incorrectly installed MPI. Are you running windows 8 or 10? Check that your version of the MPI is supported by your OS. You might be able to find some answers on the Intel Developer Zone.


All times are GMT -4.