CFD Online (www.cfd-online.com)

Edge on a Linux cluster

September 13, 2011, 03:39
#1
Filip Wallberg (filipwa)
Dear Edge users.

I am planning to use Edge to solve a problem with about 100,000,000 cells, and I don't think my average desktop computer will be able to cope with this. Instead, I am planning to put together about 10 desktop computers with 4 cores each, giving me a cluster with 40 cores. The only problem is that I cannot find any information on how to get Edge to run on a cluster.

At the moment I am running Edge on Ubuntu with parallel computing utilizing all 4 cores of the computer without any problems.

Do any of you have experience of running Edge on a Linux cluster? What kind of cluster would you use? I have heard about people using the Linux Rocks Cluster with other codes, would this work with Edge as well? As far as I know Edge uses MPI, which is supported by Rocks. Or is there some simpler way of building a cluster in Ubuntu on which I can run Edge?

Please advise. Thanks!

September 15, 2011, 01:57
#2
Filip Wallberg (filipwa)
After some struggle I managed to set up an MPI cluster using the mpd process manager. However, I cannot get Edge to run on it.

For testing purposes I am currently using only two computers: 1 server and 1 node (the server has 2 processors and the node has 4).

To boot the cluster I used the following command: mpdboot --verbose --ncpus=2 -n 2

This gave the following output:

running mpdallexit on AB-SATS
LAUNCHED mpd on AB-SATS via
RUNNING: mpd on AB-SATS
LAUNCHED mpd on 192.168.12.173 via AB-SATS
RUNNING: mpd on 192.168.12.173

mpdtrace confirms that the cluster is up and running. I also tried running some of the test applications to confirm that it is actually working, so far so good.

As a next step I prepared the input and mesh files on the server and then cloned them onto the node.

I then tried to run a multi-process calculation with Edge using the following command: edge_mpi_run test.ainp 6

The error message I get from edge is the following:

Initialisation started

Give input-file (.ainp) ?test.ainp
Reading: "test.ainp"
done

ERROR IN EDGE, IN ROUTINE "MIMD_SETUP" !!!
--- ERROR FROM MIMD_SETUP, NPARTC=/NPART ---


ERROR IN EDGE, IN ROUTINE "MIMD_SETUP" !!!
--- NPARTC= 1 ---


ERROR IN EDGE, IN ROUTINE "MIMD_SETUP" !!!
--- NPART = 6 ---

DATE - 110915
TIME - 12:45:48
--- EXITING EDGE FROM SUBROUTINE "MIMD_SETUP" ---
**************************************************************************
* *
* Starting Edge *
* Edge 5.0.0 www.foi.se/edge *
* *
* Build time Tue Mar 23 22:30:01 CET 2010 *
* Built by enp *
* Build system Linux-x86_64 *
* Build host mohawk *
* Build FC mpif90 *
* *
**************************************************************************
Date - 110915
Time - 12:45:48

Initialisation started

Give input-file (.ainp) ?At line 84 of file /extra3/enp/src/edge/5.0.0/solver/basic/callok_m.f90 (unit = 5, file = 'stdin')
Fortran runtime error: End of file


I also tried to run the same job locally after first shutting down the cluster, but ended up with the same error message.

Does anyone have any suggestions on what I am doing wrong? Do I need to use another process manager?
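
One possible reading of the output above, offered as an assumption rather than a confirmed diagnosis: `NPARTC= 1` against `NPART = 6` looks as if each solver process saw an MPI world of size 1 while the case was partitioned for 6. That pattern typically appears when a binary linked against one MPI implementation is started by the launcher of another, so that every process comes up as an independent single-rank job. A minimal sketch for checking which launcher is first on the PATH and which MPI libraries the solver links to; the binary name `edge_mpi_run.x` and the `libmpi` name pattern are assumptions:

```sh
#!/bin/sh
# List the MPI shared libraries a binary is linked against.
# Prints nothing if ldd is unavailable, the file is missing, or nothing matches.
mpi_libs_of() {
  ldd "$1" 2>/dev/null | grep -i 'libmpi' || true
}

# Report the first mpirun on PATH next to the MPI libraries of the solver.
report_mpi() {
  launcher=$(command -v mpirun || true)
  echo "launcher: ${launcher:-not found}"
  echo "solver MPI libraries:"
  mpi_libs_of "$1"
}

# ASSUMPTION: the solver binary is named edge_mpi_run.x; adjust the path as needed.
report_mpi "$(command -v edge_mpi_run.x || echo ./edge_mpi_run.x)"
```

If the launcher belongs to one MPI installation and the listed libraries to another, starting the solver with the matching `mpirun` is the first thing to try.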

September 20, 2011, 23:55
#3
Filip Wallberg (filipwa)
I am happy to report that I managed to get the cluster running using OpenMPI, and it seems to be working without any problems.

However, when I increase the number of active cores past 12, the calculation time no longer decreases. In the resource manager on each machine I can see all CPU cores working at 100 % and data being sent/received at about 6 Mb/s. All computers have gigabit LAN adapters and I am using a gigabit switch as well, so I believe the LAN is not the issue?

Does someone with more experience have any suggestions? Is there some setting I have overlooked?
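
To see where scaling flattens, it can help to turn wall-clock times into speed-up and parallel efficiency. A minimal sketch; the timings below are hypothetical placeholders, not measurements from this thread, and the first row is taken as the baseline:

```sh
#!/bin/sh
# Read "cores seconds" pairs on stdin; print cores, speed-up and efficiency
# relative to the first row (the baseline run).
efficiency_table() {
  awk 'NR == 1 { base_n = $1; base_t = $2 }
       {
         speedup = base_t / $2            # how much faster than the baseline
         eff = speedup * base_n / $1      # fraction of ideal scaling achieved
         printf "%d %.2f %.0f%%\n", $1, speedup, eff * 100
       }'
}

# HYPOTHETICAL timings for the same number of iterations on 4, 8, 12 and 16 cores.
printf '%s\n' "4 1000" "8 520" "12 360" "16 355" | efficiency_table
```

With these placeholder numbers the last row reads `16 2.82 70%`. A sharp drop in efficiency while CPU load stays at 100 % and network throughput stays far below the link rate would point towards latency or partition imbalance rather than bandwidth, though that remains a guess without measurements.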

January 12, 2012, 04:35
#4
Alvin (airsupply)
Congratulations!

I used to run Edge on one machine with 8 cores and achieved nearly linear speed-up.

However, I also ran into your problem of how to get the cluster running. Could you please describe the general steps for running Edge on Linux clusters? (I am using MPICH2.)

January 15, 2012, 12:06
#5
Filip Wallberg (filipwa)
I installed the OpenMPI packages openmpi-bin and openmpi-common.

I created a new user on all computers, which I called 'cluster', and gave it the same password on all computers.

After that I installed ssh and set it up so that the user cluster can log on to any other computer in the cluster without having to enter a password.
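
A minimal sketch of the passwordless login step, assuming OpenSSH with `ssh-keygen` and `ssh-copy-id` available. The host names `node1` and `node2` are placeholders, and by default the script only prints the commands it would run:

```sh
#!/bin/sh
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '+'
    for arg in "$@"; do
      case $arg in
        '') printf " ''" ;;            # make an empty argument visible
        *)  printf ' %s' "$arg" ;;
      esac
    done
    printf '\n'
  else
    "$@"
  fi
}

# Create a key with an empty passphrase and copy its public half to every host.
setup_passwordless_ssh() {  # usage: setup_passwordless_ssh USER HOST...
  user=$1; shift
  run ssh-keygen -t ed25519 -N "" -f "${HOME:-/home/$user}/.ssh/id_ed25519"
  for host in "$@"; do
    run ssh-copy-id "$user@$host"
  done
}

# ASSUMPTION: user "cluster" as in the post; node1 and node2 are placeholder hosts.
setup_passwordless_ssh cluster node1 node2
```

Running it with `DRY_RUN=0` as the `cluster` user on the server executes the commands; `ssh-copy-id` asks for the password once per host, after which `ssh cluster@node1` should log in without a prompt.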

After that I edited the /etc/hosts file, on all the computers, so that it includes the host name and ip address of all computers in the cluster.
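
A small check for this step, as a sketch: it reports which cluster names are still missing from a hosts-format file (one `IP-address name [alias...]` entry per line). The names `server`, `node1` and `node2` are placeholders:

```sh
#!/bin/sh
# Print every NAME that does not appear as a host name or alias in HOSTS_FILE.
missing_hosts() {  # usage: missing_hosts HOSTS_FILE NAME...
  hosts_file=$1; shift
  for name in "$@"; do
    if ! awk -v n="$name" '
           $1 ~ /^#/ { next }                                   # skip comment lines
           { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }  # names follow the IP
           END { exit(found ? 0 : 1) }' "$hosts_file" 2>/dev/null; then
      echo "$name"
    fi
  done
}

# Placeholder names; run this on each machine in the cluster.
missing_hosts /etc/hosts server node1 node2
```

No output means every listed name has an entry on that machine.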

After that you need a working directory with the same path on all computers, e.g. /home/cluster/edge/. On the server this folder must contain all files created by the preprocessor. On the nodes it must contain the following files: .ainp, .aboc and all the partition files .bedg_p1, .bedg_p2, ..., .bedg_pn.

On the server you have to create a hostfile for OpenMPI so that it knows how many cores each node has available. The file should be located in your working directory. I simply called it mpi.hosts. The file should be structured in the following way:

server slots=1
node1 slots=4
node2 slots=4
etc
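
The process count passed to `mpirun` with `-n` can be read off the hostfile by summing its `slots=` entries. A minimal sketch, which as a simplification counts lines without a `slots=` entry as zero:

```sh
#!/bin/sh
# Sum the slots= values of a hostfile (file arguments, or stdin if none given).
total_slots() {
  awk '$1 ~ /^#/ { next }    # skip comment lines
       { for (i = 2; i <= NF; i++) if ($i ~ /^slots=/) n += substr($i, 7) }
       END { print n + 0 }' "$@"
}

# The three hosts from the example above.
printf '%s\n' "server slots=1" "node1 slots=4" "node2 slots=4" | total_slots   # prints 9
```

For the three hosts listed this gives 1 + 4 + 4 = 9 processes.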

If everything is correct you should now be able to run Edge on all the computers in your mpi.hosts file. To start the calculation, open a terminal window, cd to the working directory and run the following command:

mpirun.openmpi -n 9 --hostfile mpi.hosts edge_mpi_run.x

When asked, give the name of your .ainp file and it should start running.
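
Since the solver earlier in this thread stopped when the number of processes and the number of partitions differed, a pre-flight check before launching may save a failed start. A sketch under two assumptions: the partition files are named `<case>.bedg_p1` ... `<case>.bedg_pN`, and the case is called `test` (after `test.ainp`). It only prints the launch command and does not start `mpirun` itself:

```sh
#!/bin/sh
# Count the partition files of a case in a directory.
# ASSUMPTION: they are named <case>.bedg_p1 ... <case>.bedg_pN.
count_partitions() {  # usage: count_partitions DIR CASE
  n=0
  for f in "$1/$2".bedg_p*; do
    if [ -e "$f" ]; then n=$((n + 1)); fi
  done
  echo "$n"
}

# Print the launch command only if the partition count matches the process count.
launch_cmd() {  # usage: launch_cmd NPROCS HOSTFILE DIR CASE
  found=$(count_partitions "$3" "$4")
  if [ "$found" -ne "$1" ]; then
    echo "expected $1 partition files for $4, found $found" >&2
    return 1
  fi
  echo "mpirun.openmpi -n $1 --hostfile $2 edge_mpi_run.x"
}

# Example with the values from the post; the case name "test" is a placeholder.
launch_cmd 9 mpi.hosts . test || true
```
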

January 17, 2012, 03:34
#6
Alvin (airsupply)
I really appreciate your detailed description. Thank you! I got it working on 8*2 cores; CPU utilization was above 98% on all of them, and I saw a good speed-up.

February 24, 2012, 15:21
#7
Adam Jirasek (jka)
Have you figured out how to run on MPI?


Quote:
Originally Posted by filipwa
After some struggle I managed to setup an MPI-cluster using the mpd process manager. However, I cannot get edge to run on it. [...]