CFD Online (www.cfd-online.com)
Forums > Software User Forums > SU2

How to run SU2 on a cluster of computers (How to specify nodes?)

May 8, 2013, 15:03   #1
Amit (aero_amit), Member | Join Date: May 2013 | Posts: 85
Dear SU2 Developers,

Details of running SU2 on a multicore machine are given in the manual. If I want to run it on a cluster of workstations (each having multiple cores), what is the command, and how do I specify the nodes (node names, number of nodes, etc.)?

GPU computing is becoming very popular (cost-effective, fast computing). Is there any plan to release a GPU version of SU2 in the near future?

Thanks
May 9, 2013, 13:26   #2
Santiago Padron, New Member | Join Date: May 2013 | Posts: 17
Hi,

In order to run SU2 in parallel, you first need to compile the code with parallel (MPI) support; then you can use the Python script in the following manner:
$ parallel_computation.py -f your_config_file.cfg -p 4
Here -f gives the config file and -p the number of processors.

Note that different clusters have their own protocols for submitting jobs, so you will have to look into that with your cluster administrator.
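On a PBS-style cluster, for example, that command is typically wrapped in a batch script submitted with qsub. A minimal sketch, assuming PBS and a 2-node by 8-core allocation (the job name, resource request, and config-file name below are placeholders, not SU2 defaults):

```shell
# Write a minimal PBS batch script for an SU2 run (sketch only; the
# resource request and file names are assumptions to adapt to your cluster).
cat > run_su2.pbs <<'EOF'
#!/bin/sh
#PBS -N su2_run
#PBS -l nodes=2:ppn=8
#PBS -l walltime=02:00:00
cd "$PBS_O_WORKDIR"   # PBS starts jobs in $HOME; return to the submit directory
parallel_computation.py -f your_config_file.cfg -p 16
EOF

# Submit the job with: qsub run_su2.pbs
```

The -p value should match the total core count requested from the scheduler (here 2 x 8 = 16).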

Santiago
May 9, 2013, 13:27   #3
Sean R. Copeland (copeland), Member | Join Date: Jan 2013 | Posts: 40
Hi Amit,

It sounds like you are already able to run parallel cases on a multi-core workstation using the parallel run tools included in the SU2 distribution. If that is not the case, the manual provides details on installing with MPI and executing parallel cases using the parallel_computation.py Python script.

The syntax for running SU2 on clusters depends on the job-submission environment installed on the cluster. There are several standards (SLURM, PBS, etc.), and depending on your cluster's particular environment, you may be able to use the existing Python tools directly or with small modifications. If you can provide more details, I may be able to help further, but most computing groups have introductory documentation on how to submit jobs to their cluster. I recommend seeking that information out; the path forward should then become clearer.

GPU computing has the potential to be very powerful. The development team has discussed this, but, at the current time, we don't have anyone working on it. If members of the community are interested, we encourage folks to take on the challenge!


-Sean
May 30, 2013, 05:38   #4
Amit (aero_amit), Member | Join Date: May 2013 | Posts: 85
Thanks Santiago/Sean,

With small modifications to the Python script, I am able to run the problem on a distributed cluster.

May 30, 2013, 11:12   #5
Abhii, New Member | Join Date: Feb 2013 | Posts: 12
Hi Aero_Amit,

Could you please specify what modifications you made in order to submit your job? I am facing similar problems when trying to run a parallel computation on a cluster.
June 1, 2013, 02:26   #6
Amit (aero_amit), Member | Join Date: May 2013 | Posts: 85
Hi Abhii,

As the developers said, it depends on the job-submission environment.
For me, adding a hostfile to the mpirun command worked.
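A minimal sketch of that approach, assuming an Open MPI-style mpirun (the node names and slot counts are placeholders for your own machines):

```shell
# Create a hostfile listing the machines that should take part in the run
# (node01/node02 and slots=8 are placeholders for your cluster).
cat > hosts <<'EOF'
node01 slots=8
node02 slots=8
EOF

# The parallel run is then launched through mpirun with that hostfile, e.g.:
# mpirun -hostfile hosts -np 16 SU2_CFD your_config_file.cfg
```

With MPICH-style launchers the option is typically -f or -machinefile rather than -hostfile; check the mpirun documentation for your MPI.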
March 31, 2014, 21:16   #7
Carlos Alexandre Tomigawa Aguni (CrashLaker), Member | Join Date: Mar 2014 | Posts: 40
Hello.

I've just added the host file, but now all the nodes are running the same job instead of sharing the work.

Anyone?
April 25, 2014, 01:20   #8
何建东 (hejiandong), New Member | Join Date: Jun 2013 | Posts: 22
Quote:
Originally Posted by aero_amit View Post
Hi Abhii,

As developers already said it depends on job-submittal environment.
For me adding hostfile with mpirun worked.
Hi aero_amit,

Can you tell me which script the hostfile goes in? I can't find mpirun in parallel_computation.py or shape_optimization.py.

For Fluent, we can use -cnf to specify the nodes.
April 25, 2014, 03:25   #9
何建东 (hejiandong), New Member | Join Date: Jun 2013 | Posts: 22
Quote:
Originally Posted by copeland View Post
The syntax for running SU2 on clusters depends on the job-submission environment installed on the cluster. There are several standards (SLURM, PBS, etc.) ...
I have the same problem; the environment on our cluster is PBS.

For Fluent, we can use fluent -tx -ssh -cnf="hostfile" to specify the nodes to use.

However, for SU2, how do we solve this?

Thanks a lot!
April 25, 2014, 09:14   #10
Carlos Alexandre Tomigawa Aguni (CrashLaker), Member | Join Date: Mar 2014 | Posts: 40
Quote:
Originally Posted by hejiandong View Post
I have the same problem; the environment on our cluster is PBS. ... However, for SU2, how do we solve this?
You could either manually edit the mpirun call in $SU2_RUN/SU2/run/interface.py, or modify parallel_computation.py to accept an extra input (the longer way).

If you choose the first option, you can reuse the hostfile PBS creates for you. (I've never used PBS myself, but such systems typically generate a nodefile with a job-specific name, so you could just do something like cp $PBS_NODEFILE hosts.) Alternatively, extend parallel_computation.py to receive the hostfile name.
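A runnable sketch of the cp idea; outside a real PBS job the $PBS_NODEFILE variable does not exist, so it is faked here with a temporary file:

```shell
# Inside a real PBS job, PBS_NODEFILE points at an auto-generated list of
# the nodes assigned to the job. Simulate it so the snippet runs anywhere:
PBS_NODEFILE=$(mktemp)
printf 'node01\nnode02\n' > "$PBS_NODEFILE"

# Copy it to the fixed name that the (modified) SU2 script expects:
cp "$PBS_NODEFILE" hosts
```

Inside an actual job script you would drop the first two lines and use the scheduler-provided $PBS_NODEFILE directly.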
April 26, 2014, 02:45   #11
何建东 (hejiandong), New Member | Join Date: Jun 2013 | Posts: 22
Quote:
Originally Posted by CrashLaker View Post
You could either manually edit the mpirun call in $SU2_RUN/SU2/run/interface.py, or modify parallel_computation.py to accept an extra input ...
Thanks, that helps a lot!

I have edited the interface.py file so that it calls mpirun -hostfile myhostfile -np ...,
and parallel_computation.py now works on the nodes specified in my hostfile.

However, shape_optimization.py does not work. Can anyone give me some suggestions?
April 27, 2014, 22:18   #12
Carlos Alexandre Tomigawa Aguni (CrashLaker), Member | Join Date: Mar 2014 | Posts: 40
Quote:
Originally Posted by hejiandong View Post
... however, shape_optimization.py does not work. Can anyone give me some suggestions?
I think we need more experts to hop in here.
From what I've seen looking through shape_optimization.py, it calls "from scipy.optimize import fmin_slsqp", which I'm afraid isn't parallelized yet.

http://docs.scipy.org/doc/scipy-0.13...min_slsqp.html
April 28, 2014, 01:55   #13
何建东 (hejiandong), New Member | Join Date: Jun 2013 | Posts: 22
Quote:
Originally Posted by CrashLaker View Post
... shape_optimization.py calls "from scipy.optimize import fmin_slsqp", which I'm afraid isn't parallelized yet.
Thanks a lot! I have solved this problem by specifying the full path to the hostfile in the mpirun call rather than just its name.
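The likely underlying reason: the optimization script changes the working directory between design iterations, so a relative hostfile name stops resolving, while an absolute path always works. A small demonstration (the directory names here are made up):

```shell
# Put a hostfile in a run directory, then move into a subdirectory,
# as the optimization script does between design steps:
mkdir -p /tmp/su2_demo/DESIGNS
printf 'node01\n' > /tmp/su2_demo/hosts
cd /tmp/su2_demo/DESIGNS

# A bare file name no longer resolves from here, but the absolute path does:
ls hosts 2>/dev/null || echo "relative name: not found"
ls /tmp/su2_demo/hosts
```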
May 20, 2014, 02:34   #14
I have the same problem
nilesh, New Member | Join Date: Mar 2014 | Location: Kanpur / Mumbai, India | Posts: 27
Quote:
Originally Posted by CrashLaker View Post
I've just added the host file but now all the nodes are doing the same job ...
All my nodes are running the same job, as you mentioned. I do not understand anything about the host file; please help me with it.
I have an i7 machine with 4 cores (hyper-threaded to 8).
Thanks.
May 20, 2014, 08:30   #15
Carlos Alexandre Tomigawa Aguni (CrashLaker), Member | Join Date: Mar 2014 | Posts: 40
Quote:
Originally Posted by nilesh View Post
All my nodes are doing the same job as you mentioned. I do not understand anything about host file. ...
It seems like there's a problem with your MPI installation or with the way you're setting the LD_LIBRARY_PATH environment variable. For example, when using MPICH2 you have to add its lib directory to LD_LIBRARY_PATH.

A hostfile is only needed when you want to run on more than one computer. Since you have only one machine, using -np alone will be fine.

Are you using version 3.0 or 3.1?
First, I recommend you recheck your MPI installation.
May 20, 2014, 09:20   #16
nilesh, New Member | Join Date: Mar 2014 | Location: Kanpur / Mumbai, India | Posts: 27
Quote:
Originally Posted by CrashLaker View Post
Seems like there's a problem with your MPI or the way you're setting the LD_LIBRARY_PATH env. ...
I am using version 3.1. Where and how do I add this LD_LIBRARY_PATH?
May 20, 2014, 09:37   #17
Carlos Alexandre Tomigawa Aguni (CrashLaker), Member | Join Date: Mar 2014 | Posts: 40
Quote:
Originally Posted by nilesh View Post
I am using version 3.1. Where and how to add this LD_LIBRARY_PATH ?
You can add it to your .bashrc file to make it permanent, or set it in your current session only.

Permanent: open .bashrc and add a line such as
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/pathtompich2/lib

Or set it in your terminal for the current session:
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/pathtompich2/lib

I recommend the latter so that it doesn't interfere with other applications.
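For the session-only variant, note that it is export that makes the variable visible to child processes such as mpirun; no root access is needed. A sketch (the MPICH2 path below is a placeholder for your actual install prefix):

```shell
# Append the MPICH2 library directory for the current shell session only
# (/opt/mpich2/lib is a placeholder path):
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/opt/mpich2/lib"

# Sanity check: see which mpirun the shell will actually run
command -v mpirun || echo "mpirun not on PATH"
```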
May 21, 2014, 09:32   #18
nilesh, New Member | Join Date: Mar 2014 | Location: Kanpur / Mumbai, India | Posts: 27
Quote:
Originally Posted by CrashLaker View Post
You can add this to your .bashrc file to make it permanent or you can add this to your current session. ...
I tried your suggestions, but the problem still persists. I also tried reinstalling MPICH2. SU2_DDC is unable to divide the mesh, and even when a partitioned mesh created on another machine is provided, SU2_CFD runs the whole problem on every node instead of splitting it.
Mysteriously, I tried installing it on another machine, and it works on a CentOS machine with the same configure procedure.
May 21, 2014, 11:09   #19
Carlos Alexandre Tomigawa Aguni (CrashLaker), Member | Join Date: Mar 2014 | Posts: 40
Quote:
Originally Posted by nilesh View Post
I tried your suggestions but the problem still persists. ...
Did you check that the mpirun you're running (which mpirun) is the MPICH2 one?
May 22, 2014, 05:21   #20
Solved!
nilesh, New Member | Join Date: Mar 2014 | Location: Kanpur / Mumbai, India | Posts: 27
Quote:
Originally Posted by CrashLaker View Post
Did you check if the mpirun (which mpirun) you're using is mpich2?
Thanks a lot, Mr. Carlos, I highly appreciate your help in this matter.
The problem probably has to do with the way Ubuntu installs MPI from the package manager: all MPI implementations somehow end up in the same directory when installed automatically, and then it becomes really difficult to ensure which one is actually being run.

The solution:
I removed all MPIs and then installed MPICH (required for other purposes) from the package manager. I then manually downloaded and built MPICH2 from source into another directory, and finally added "export PATH=/usr/lib/mpich2/bin:$PATH" to my .bashrc. It's finally up and running!

Is there a way to mark this thread as solved so that it's easier to find for other users facing a similar problem?