CFD Online Discussion Forums


filipwa September 13, 2011 03:39

Edge on a Linux cluster
 
Dear Edge users.

I am planning to use Edge to solve a problem with about 100,000,000 cells, and I don't think my average desktop computer will be able to cope with that. Instead, I am planning to put together about 10 desktop computers with 4 cores each, giving me a cluster with 40 processors. The only problem is that I cannot find any information on how to get Edge to run on a cluster.

At the moment I am running Edge on Ubuntu in parallel, using all 4 cores of the computer, without any problems.

Do any of you have experience running Edge on a Linux cluster? What kind of cluster would you use? I have heard of people using the Linux Rocks Cluster with other codes; would this work with Edge as well? As far as I know Edge uses MPI, which is supported by Rocks. Or is there a simpler way to build a cluster on Ubuntu that can run Edge?

Please advise. Thanks! :)

filipwa September 15, 2011 01:57

After some struggle, I managed to set up an MPI cluster using the mpd process manager. However, I cannot get Edge to run on it.

For testing purposes I am currently using only two computers, one server and one node (the server has 2 processors and the node has 4).

To boot the cluster I used the following command: mpdboot --verbose --ncpus=2 -n 2

This gave the following output:

running mpdallexit on AB-SATS
LAUNCHED mpd on AB-SATS via
RUNNING: mpd on AB-SATS
LAUNCHED mpd on 192.168.12.173 via AB-SATS
RUNNING: mpd on 192.168.12.173

mpdtrace confirms that the cluster is up and running. I also ran some of the test applications to confirm that it actually works; so far, so good.

As a next step, I prepared the input and mesh files on the server and then cloned them onto the node.

I then tried to run a multi-process calculation with Edge using the following command: edge_mpi_run test.ainp 6

The error message I get from edge is the following:

Initialisation started

Give input-file (.ainp) ?test.ainp
Reading: "test.ainp"
done

ERROR IN EDGE, IN ROUTINE "MIMD_SETUP" !!!
--- ERROR FROM MIMD_SETUP, NPARTC=/NPART ---


ERROR IN EDGE, IN ROUTINE "MIMD_SETUP" !!!
--- NPARTC= 1 ---


ERROR IN EDGE, IN ROUTINE "MIMD_SETUP" !!!
--- NPART = 6 ---

DATE - 110915
TIME - 12:45:48
--- EXITING EDGE FROM SUBROUTINE "MIMD_SETUP" ---
************************************************** ************************
* *
* Starting Edge *
* Edge 5.0.0 www.foi.se/edge *
* *
* Build time Tue Mar 23 22:30:01 CET 2010 *
* Built by enp *
* Build system Linux-x86_64 *
* Build host mohawk *
* Build FC mpif90 *
* *
************************************************** ************************
Date - 110915
Time - 12:45:48

Initialisation started

Give input-file (.ainp) ?At line 84 of file /extra3/enp/src/edge/5.0.0/solver/basic/callok_m.f90 (unit = 5, file = 'stdin')
Fortran runtime error: End of file


I also tried running the same job locally, after first shutting down the cluster, but ended up with the same error message.

Does anyone have any suggestions as to what I am doing wrong? Do I need to use another process manager?
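A hedged reading of the MIMD_SETUP messages: NPART = 6 matches the process count given on the command line, while NPARTC = 1 looks like the number of partitions found in the preprocessed files, so the mesh may simply not have been partitioned for 6 processes. If that interpretation holds, a quick check before launching is to count the partition files. The Python sketch below assumes the <stem>.bedg_p1, <stem>.bedg_p2, ... naming described later in this thread; the script and function names are hypothetical.

```python
import re
import sys
from pathlib import Path

# Assumed naming: one file per partition, <stem>.bedg_p1, <stem>.bedg_p2, ...
PARTITION_RE = re.compile(r"^(?P<stem>.+)\.bedg_p(?P<idx>\d+)$")


def count_partitions(workdir, stem):
    """Count distinct partition files for case `stem` in `workdir`."""
    indices = set()
    for path in Path(workdir).iterdir():
        match = PARTITION_RE.match(path.name)
        if match and match.group("stem") == stem:
            indices.add(int(match.group("idx")))
    return len(indices)


def check_process_count(workdir, stem, nprocs):
    """Compare the partition count on disk with the number of MPI processes."""
    nparts = count_partitions(workdir, stem)
    if nparts == nprocs:
        return f"OK: {nparts} partition files for {nprocs} processes"
    return (f"MISMATCH: {nparts} partition files on disk but {nprocs} "
            f"processes requested; re-partition or change the process count")


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage (hypothetical script name): python check_parts.py /home/cluster/edge test 6
    print(check_process_count(sys.argv[1], sys.argv[2], int(sys.argv[3])))
```

For the failing run above, a count other than 6 would point back at the preprocessing step rather than at the process manager.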

filipwa September 20, 2011 23:55

I am happy to report that I managed to get the cluster running using OpenMPI, and it seems to work without any problems.

However, when I increase the number of active cores beyond 12, the calculation time no longer decreases. In the resource manager on each machine I can see all CPU cores working at 100 %, and each machine sending/receiving data at about 6 Mb/s. All computers have gigabit LAN adapters and I am using a gigabit switch as well, so I assume the LAN is not the bottleneck?

Does anyone with more experience have any suggestions? Is there some setting I have overlooked?
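One way to pin down where scaling stops is to time the same case at several core counts and tabulate the relative speedup S(p) = T(p0)/T(p) and efficiency E(p) = S(p)·p0/p against the smallest run p0. The Python sketch below does that and also expresses the observed traffic as a fraction of a gigabit link, assuming "6 Mb/s" means megabits per second. The timings in it are placeholders, not data from this thread. Low average bandwidth does not by itself clear the network: with small partitions, many short messages can be limited by Ethernet latency rather than bandwidth.

```python
def scaling_table(timings):
    """Return (cores, speedup, efficiency) rows relative to the smallest run.

    `timings` maps core count -> wall-clock seconds for the same case.
    """
    base_p = min(timings)
    base_t = timings[base_p]
    rows = []
    for p in sorted(timings):
        speedup = base_t / timings[p]  # measured speedup relative to base_p cores
        ideal = p / base_p             # speedup that perfect scaling would give
        rows.append((p, speedup, speedup / ideal))
    return rows


def link_utilization(observed_mbit_s, link_mbit_s=1000.0):
    """Fraction of link capacity in use; both values in megabits per second."""
    return observed_mbit_s / link_mbit_s


if __name__ == "__main__":
    # Placeholder timings in seconds; replace with your own measurements.
    example = {4: 1000.0, 8: 520.0, 12: 370.0, 16: 365.0}
    for p, s, e in scaling_table(example):
        print(f"{p:3d} cores  speedup {s:5.2f}  efficiency {e:6.1%}")
    print(f"link utilization at 6 Mb/s: {link_utilization(6.0):.1%}")
```

Efficiency dropping sharply past 12 cores while every CPU still shows 100 % would be consistent with processes spinning while they wait for messages, since MPI implementations such as Open MPI typically busy-poll by default, so CPU load alone does not show useful work.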

airsupply January 12, 2012 04:35

Quote:

Originally Posted by filipwa (Post 324965)

Congratulations!

I used to run Edge on one machine with 8 cores and achieved nearly linear speedup.

However, I have run into the same problem of getting the cluster running. Could you please describe the general steps for running Edge on a Linux cluster? (I am using MPICH2.)

filipwa January 15, 2012 12:06

Quote:

Originally Posted by airsupply (Post 338957)

I installed the OpenMPI packages openmpi-bin and openmpi-common.

I created a new user on all computers, which I called 'cluster', and gave it the same password on every computer.

Next, I installed ssh and set it up so that the user cluster can log on to any other computer in the cluster without entering a password.

Then I edited the /etc/hosts file on all the computers so that it includes the host name and IP address of every computer in the cluster.
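Both prerequisites, host names resolving through /etc/hosts and password-free ssh, can be checked from the server before involving MPI at all. The Python sketch below uses ssh's BatchMode=yes option, which makes ssh fail instead of prompting for a password; the function names are hypothetical, and the host names you pass in should match your own /etc/hosts entries.

```python
import socket
import subprocess


def resolves(host):
    """True if `host` resolves to an IP address (e.g. via /etc/hosts)."""
    try:
        socket.gethostbyname(host)
        return True
    except OSError:
        return False


def passwordless_ssh(host, timeout=10):
    """True if `ssh host true` succeeds without asking for a password."""
    # BatchMode=yes: fail instead of prompting; ConnectTimeout limits hangs.
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # ssh not installed, or the host did not answer in time
    return result.returncode == 0


def check_hosts(hosts):
    """Map each host name to (resolves, passwordless_ssh)."""
    report = {}
    for host in hosts:
        ok_dns = resolves(host)
        report[host] = (ok_dns, ok_dns and passwordless_ssh(host))
    return report
```

Run it as the cluster user, e.g. `print(check_hosts(["node1", "node2"]))` with your own (here hypothetical) node names; any `False` points at the /etc/hosts or ssh step.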

Next, you need a working directory with the same path on all computers, e.g. /home/cluster/edge/. On the server this folder must contain all files created by the preprocessor. On the nodes it must contain the following files: .ainp, .aboc and all .bedg_p1, 2 ... n files.
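The rule for which files the nodes need can be scripted so that only those files are copied. The sketch below filters a directory listing down to the .ainp, .aboc and .bedg_pN files and builds an rsync command for each node. The thread does not say how the files were copied, so rsync is an assumption here, as are the node names; the printed commands are meant to be run from inside the working directory on the server.

```python
import re
import shlex
from pathlib import Path

# Files the nodes need, per the thread: *.ainp, *.aboc and *.bedg_p<N>
NODE_FILE_RE = re.compile(r"\.(ainp|aboc|bedg_p\d+)$")


def node_files(names):
    """Return, sorted, the file names that must also exist on every node."""
    return sorted(n for n in names if NODE_FILE_RE.search(n))


def rsync_command(files, host, workdir):
    """Build an rsync call that copies `files` to the same path on `host`."""
    return ["rsync", "-av", *files, f"{host}:{workdir.rstrip('/')}/"]


if __name__ == "__main__":
    workdir = "/home/cluster/edge"  # same path on every machine
    if Path(workdir).is_dir():
        files = node_files(p.name for p in Path(workdir).iterdir())
        if files:
            for host in ["node1", "node2"]:  # hypothetical node names
                print(shlex.join(rsync_command(files, host, workdir)))
```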

On the server you have to create a hostfile for OpenMPI so that it knows how many cores each node has available. The file should be located in your working directory; I simply called it mpi.hosts. The file should be structured as follows:

server slots=1
node1 slots=4
node2 slots=4
etc
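The slot counts in the hostfile also fix the process count passed to mpirun: for the example above, 1 + 4 + 4 = 9, which matches -n 9 in the command below. A small Python sketch that renders the hostfile from a table of machines and prints the matching launch line; the function names are hypothetical and the machine names are the placeholders from the example.

```python
def hostfile_lines(slots):
    """Render Open MPI hostfile lines, one 'host slots=N' entry per machine."""
    return [f"{host} slots={n}" for host, n in slots.items()]


def total_slots(slots):
    """Number of MPI processes the hostfile can place: the value for -n."""
    return sum(slots.values())


if __name__ == "__main__":
    cluster = {"server": 1, "node1": 4, "node2": 4}  # placeholders from the example
    print("\n".join(hostfile_lines(cluster)))
    print(f"mpirun.openmpi -n {total_slots(cluster)} --hostfile mpi.hosts edge_mpi_run.x")
```

Judging by the MIMD_SETUP error earlier in the thread, this total presumably also has to equal the number of partitions the preprocessor produced.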

If everything is correct, you should now be able to run Edge on all the computers in your mpi.hosts file. To start the calculation, open a terminal window, cd to the working directory and run the following command:

mpirun.openmpi -n 9 --hostfile mpi.hosts edge_mpi_run.x

When asked, give the name of your .ainp file and it should start running.
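Instead of typing the file name at the prompt, it can be fed on standard input. Open MPI's mpirun forwards its stdin to rank 0 by default, so wrapping the command above with Python's subprocess module, as sketched below, should answer the prompt automatically. This assumes, as the prompt suggests, that Edge reads the .ainp name from stdin on rank 0; the function names and test.ainp are placeholders.

```python
import subprocess


def edge_command(nprocs, hostfile="mpi.hosts", binary="edge_mpi_run.x"):
    """The mpirun command from the post, with the process count as a parameter."""
    return ["mpirun.openmpi", "-n", str(nprocs), "--hostfile", hostfile, binary]


def run_edge(ainp, nprocs, workdir="."):
    """Launch Edge and answer the '.ainp' prompt by writing the name to stdin."""
    return subprocess.run(edge_command(nprocs), input=ainp + "\n",
                          text=True, cwd=workdir, check=False)
```

For the example cluster this would be `run_edge("test.ainp", 9, "/home/cluster/edge")`, run on the server.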

airsupply January 17, 2012 03:34

Quote:

Originally Posted by filipwa (Post 339403)

I really appreciate your detailed description. Thank you! I got it working on 8×2 cores; CPU utilization was above 98% on all of them, and I saw a good speedup.

jka February 24, 2012 15:21

Have you figured out how to run Edge with MPI?


Quote:

Originally Posted by filipwa (Post 324183)


