CFD Online Discussion Forums


aero_amit May 8, 2013 15:03

How to run SU2 on cluster of computers (How to specify nodes ?)
 
Dear SU2 Developers,

Details of running SU2 on a multicore machine are given in the manual. If I want to run it on a cluster of workstations (each having multiple cores), what are the command details (how to specify node numbers, etc.)?

GPU computing is becoming very popular (cost-effective, fast computing). Is there any plan to release a GPU version of SU2 in the near future?

Thanks

Santiago Padron May 9, 2013 13:26

Hi,

In order to run SU2 in parallel, you first need to compile the code with parallel (MPI) support, and then you can use the Python script as follows:
$ parallel_computation.py -f your_config_file.cfg -p 4
(-f for the configuration file, -p for the number of processors.)

You should note that different clusters have their own protocols for submitting jobs, so you will have to look into that with your cluster administrator.

Santiago

copeland May 9, 2013 13:27

Hi Harry,

It sounds like you are able to run parallel cases on a multi-core workstation using the parallel run tools included in the SU2 distribution. If this is not the case, then the manual provides details on installing with MPI and executing parallel cases using the parallel_computation.py python script.

The syntax for running SU2 on clusters depends on the job-submission environment installed on the cluster. There are several standards (SLURM, PBS, etc.), and depending on the unique environment of your cluster, you may be able to use the existing Python tools directly or make small modifications to get them to work. If you can provide me with some more details, I may be able to help more, but most computing groups have some introductory documentation on how to submit jobs to the cluster. I recommend seeking this information out, and I think the path forward will become clearer.
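
For example, on a SLURM system a minimal submission script might look something like this (the directive values, module name, and core count below are placeholders rather than details from this thread; check your site's documentation):

Code:

#!/bin/bash
#SBATCH --job-name=su2_case
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16

# load your site's MPI module (the name varies by cluster)
module load mpi

# run SU2 through the parallel Python wrapper
# (the processor flag is -p in older SU2 releases, -n in newer ones)
parallel_computation.py -f your_config_file.cfg -p 32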

GPU computing has the potential to be very powerful. The development team has discussed this, but, at the current time, we don't have anyone working on it. If members of the community are interested, we encourage folks to take on the challenge!


-Sean

aero_amit May 30, 2013 05:38

Thanks Santiago/Cope,

With small modifications to the Python script, I am able to run the problem on a distributed cluster.

:)

Abhii May 30, 2013 11:12

Hi aero_amit,

Could you please specify what modifications you made in order to submit your job? I am facing similar problems when trying to run a parallel computation on a cluster.

aero_amit June 1, 2013 02:26

Hi Abhii,

As the developers already said, it depends on the job-submission environment.
For me, adding a hostfile to the mpirun command worked.
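
For example (a rough sketch; the hostnames, core count, and config name are placeholders), the hostfile is just a plain text file listing one machine per line, and its name is passed to mpirun:

Code:

# contents of a file named "hosts" (machine names are examples)
node01
node02

# the mpirun call with the hostfile option added
mpirun --hostfile hosts -np 8 SU2_CFD your_config_file.cfg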

CrashLaker March 31, 2014 21:16

Hello.

I've just added the host file but now all the nodes are doing the same job :(

Anyone?

hejiandong April 25, 2014 01:20

Quote:

Originally Posted by aero_amit (Post 431307)
Hi Abhii,

As the developers already said, it depends on the job-submission environment.
For me, adding a hostfile to the mpirun command worked.

Hi aero_amit,

Can you tell me which script to add the hostfile in? I can't find mpirun in parallel_computation.py or shape_optimization.py.

For Fluent, we can use -cnf to specify nodes.

hejiandong April 25, 2014 03:25

Quote:

Originally Posted by copeland (Post 426398)
Hi Harry,

It sounds like you are able to run parallel cases on a multi-core workstation using the parallel run tools included in the SU2 distribution. If this is not the case, then the manual provides details on installing with MPI and executing parallel cases using the parallel_computation.py python script.

The syntax for running SU2 on clusters depends on the job-submission environment installed on the cluster. There are several standards (SLURM, PBS, etc.), and depending on the unique environment of your cluster, you may be able to use the existing Python tools directly or make small modifications to get them to work. If you can provide me with some more details, I may be able to help more, but most computing groups have some introductory documentation on how to submit jobs to the cluster. I recommend seeking this information out, and I think the path forward will become clearer.

GPU computing has the potential to be very powerful. The development team has discussed this, but, at the current time, we don't have anyone working on it. If members of the community are interested, we encourage folks to take on the challenge!


-Sean

I have the same problem; the environment on our cluster is PBS.

For Fluent, we can use fluent -tx -ssh -cnf="hostfile" to specify the nodes to use.

However, for SU2, how can I solve this problem?

Thanks a lot!

CrashLaker April 25, 2014 09:14

Quote:

Originally Posted by hejiandong (Post 488111)
I have the same problem; the environment on our cluster is PBS.

For Fluent, we can use fluent -tx -ssh -cnf="hostfile" to specify the nodes to use.

However, for SU2, how can I solve this problem?

Thanks a lot!

You could either manually edit the mpirun call in $SU2_RUN/SU2/run/interface.py, or modify parallel_computation.py to accept another input (the longer way).

If you choose the first, you should use the hostfile PBS creates for you. (I've never used PBS, but some schedulers create a nodefile with a run-specific name, so you could just do something like cp $PBS_NODEFILE hosts.) Or you can extend parallel_computation.py to receive the hostfile name.
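
Put together in a PBS job script, that might look like this (untested sketch; it assumes the standard $PBS_NODEFILE variable and that interface.py has been edited to pass --hostfile hosts to mpirun):

Code:

#!/bin/bash
#PBS -l nodes=2:ppn=8

# PBS starts the job in your home directory; go back to the submit directory
cd $PBS_O_WORKDIR

# copy the scheduler's node list to the fixed name the edited script expects
cp $PBS_NODEFILE hosts

parallel_computation.py -f your_config_file.cfg -p 16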

hejiandong April 26, 2014 02:45

Quote:

Originally Posted by CrashLaker (Post 488182)
You could either manually edit the mpirun call in $SU2_RUN/SU2/run/interface.py, or modify parallel_computation.py to accept another input (the longer way).

If you choose the first, you should use the hostfile PBS creates for you. (I've never used PBS, but some schedulers create a nodefile with a run-specific name, so you could just do something like cp $PBS_NODEFILE hosts.) Or you can extend parallel_computation.py to receive the hostfile name.

Thanks, it helps me a lot!

I have edited the interface.py file with mpirun -hostfile myhostfile -np ..., and now parallel_computation.py works on the nodes specified in my hostfile.
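
The edited launch command amounts to something like this (untested sketch; the hostfile and config names are examples):

Code:

# mpirun invocation after the edit, with the hostfile option added
mpirun -hostfile myhostfile -np 4 SU2_CFD your_config_file.cfg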

However, shape_optimization.py does not work. Can anyone give me some suggestions?

CrashLaker April 27, 2014 22:18

Quote:

Originally Posted by hejiandong (Post 488295)
Thanks, it helps me a lot!

I have edited the interface.py file with mpirun -hostfile myhostfile -np ..., and now parallel_computation.py works on the nodes specified in my hostfile.

However, shape_optimization.py does not work. Can anyone give me some suggestions?

I think we need more experts to hop in here.
From what I've seen searching through shape_optimization.py, it calls "from scipy.optimize import fmin_slsqp", which I'm afraid isn't parallelized yet.

http://docs.scipy.org/doc/scipy-0.13...min_slsqp.html

hejiandong April 28, 2014 01:55

Quote:

Originally Posted by CrashLaker (Post 488555)
I think we need more experts to hop in here.
From what I've seen searching through shape_optimization.py, it calls "from scipy.optimize import fmin_slsqp", which I'm afraid isn't parallelized yet.

http://docs.scipy.org/doc/scipy-0.13...min_slsqp.html

Thanks a lot! I have solved this problem by specifying the full path of the hostfile in mpirun rather than just the file name.
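
That is, giving mpirun the full path (the path below is only an example):

Code:

mpirun -hostfile /home/user/myhostfile -np 8 SU2_CFD your_config_file.cfg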

nilesh May 20, 2014 02:34

I have the same problem
 
Quote:

Originally Posted by CrashLaker (Post 483106)
Hello.

I've just added the host file but now all the nodes are doing the same job :(

Anyone?

All my nodes are doing the same job, as you mentioned. I do not understand anything about the hostfile; please help me with it.
I have an i7 machine with 4 cores (hyper-threaded to 8).
Thanks.

CrashLaker May 20, 2014 08:30

Quote:

Originally Posted by nilesh (Post 492987)
All my nodes are doing the same job, as you mentioned. I do not understand anything about the hostfile; please help me with it.
I have an i7 machine with 4 cores (hyper-threaded to 8).
Thanks.

It seems like there's a problem with your MPI or the way you're setting the LD_LIBRARY_PATH environment variable.
For example, when using MPICH2 you have to add its lib directory to LD_LIBRARY_PATH.

A hostfile is only needed when you want to run on more than one computer. Since you have only one, using -np alone will be just fine.

Are you using version 3.0 or 3.1?
First, I recommend you recheck your MPI installation.

nilesh May 20, 2014 09:20

Quote:

Originally Posted by CrashLaker (Post 493095)
It seems like there's a problem with your MPI or the way you're setting the LD_LIBRARY_PATH environment variable.
For example, when using MPICH2 you have to add its lib directory to LD_LIBRARY_PATH.

A hostfile is only needed when you want to run on more than one computer. Since you have only one, using -np alone will be just fine.

Are you using version 3.0 or 3.1?
First, I recommend you recheck your MPI installation.

I am using version 3.1. Where and how do I add this LD_LIBRARY_PATH?

CrashLaker May 20, 2014 09:37

Quote:

Originally Posted by nilesh (Post 493108)
I am using version 3.1. Where and how do I add this LD_LIBRARY_PATH?

You can add this to your .bashrc file to make it permanent, or set it just for your current session.

Open .bashrc, find "export LD_LIBRARY_PATH", and add a new line below it: "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/pathtompich2/lib"

Or run this in your terminal (export is needed so that child processes such as mpirun see it):
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/pathtompich2/lib

I recommend the latter so that it doesn't interfere with other applications.

nilesh May 21, 2014 09:32

Quote:

Originally Posted by CrashLaker (Post 493110)
You can add this to your .bashrc file to make it permanent, or set it just for your current session.

Open .bashrc, find "export LD_LIBRARY_PATH", and add a new line below it: "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/pathtompich2/lib"

Or run this in your terminal (export is needed so that child processes such as mpirun see it):
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/pathtompich2/lib

I recommend the latter so that it doesn't interfere with other applications.

I tried your suggestions but the problem still persists. I also tried reinstalling MPICH2. SU2_DDC is unable to divide the mesh, and even if a divided mesh created on another machine is provided, SU2_CFD runs the whole problem on each node.
Mysteriously, when I tried installing it on another machine running CentOS, it worked with the same configure procedure.

CrashLaker May 21, 2014 11:09

Quote:

Originally Posted by nilesh (Post 493343)
I tried your suggestions but the problem still persists. I also tried reinstalling MPICH2. SU2_DDC is unable to divide the mesh, and even if a divided mesh created on another machine is provided, SU2_CFD runs the whole problem on each node.
Mysteriously, when I tried installing it on another machine running CentOS, it worked with the same configure procedure.

Did you check whether the mpirun you're using (which mpirun) is MPICH2's?
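
A quick way to check (standard commands; the last line assumes a Linux build):

Code:

which mpirun                          # which launcher is first on PATH
mpirun --version                      # both MPICH and Open MPI report their identity here
ldd $(which SU2_CFD) | grep -i mpi    # which MPI library SU2_CFD actually links against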

nilesh May 22, 2014 05:21

Solved!!!
 
Quote:

Originally Posted by CrashLaker (Post 493385)
Did you check whether the mpirun you're using (which mpirun) is MPICH2's?

Thanks a lot, Mr. Carlos, I highly appreciate your help in this matter.
The problem probably has to do with the way Ubuntu installs MPI from the package manager: all MPIs somehow get installed into the same directory if done automatically, and then it becomes really difficult to ensure which one is being run.

The solution:
I removed all MPIs and then installed mpich (required for other purposes) from the package manager. Then I manually downloaded and installed MPICH2 from source into another directory, and finally added "export PATH=/usr/lib/mpich2/bin:$PATH" to my .bashrc. It's finally up and running! :) :) :)
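
In other words, the source-built MPICH2 has to come first on the search paths (the lib path below is a guess matching my bin path; adjust it to your install):

Code:

# put the source-built MPICH2 first so its mpirun shadows the packaged ones
export PATH=/usr/lib/mpich2/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib/mpich2/lib:$LD_LIBRARY_PATH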

Is there a way to mark this post so that it is easier to find for other users facing a similar problem?

CrashLaker May 22, 2014 13:16

Quote:

Originally Posted by nilesh (Post 493534)
Is there a way to mark this post so that it is easier to find for other users facing a similar problem?

haha I'm glad that you've managed to solve it! :)

I also received a lot of help when I first started with SU2. There are a lot of experts here, so don't forget to ask (and answer) anytime, hehe.

About your question, I don't know what could be done, since I'm a newbie just like you, haha. But it couldn't be helped, considering that you didn't know the exact problem.

GL! :)
Carlos.

aero_amit March 10, 2016 03:56

Hello Guys,

Actually, you can run it on a cluster by using:

mpirun -np 30 --hostfile hostfile SU2_CFD filename.cfg

(without using the Python script)

Here the number of cores is 30 (it can be any number).

In the hostfile (in the local directory), you should list the nodes, for example:

node201
node202
node203
... and so on

This works for me...
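
If your MPI supports it, you can also give each node an explicit core count in the hostfile (Open MPI syntax; the counts are just examples):

Code:

node201 slots=16
node202 slots=16
node203 slots=16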

ernestyalumni May 15, 2016 23:00

SU2 + GPU in parallel
 
Quote:

Originally Posted by copeland (Post 426398)

GPU computing has the potential to be very powerful. The development team has discussed this, but, at the current time, we don't have anyone working on it. If members of the community are interested, we encourage folks to take on the challenge!

-Sean

Sean Copeland, I was curious: this post is almost 3 years old, and searching here and on Google for "SU2 GPU parallel computing", I only came across this thread. What's the status of SU2 on GPU parallel computing, and are there any plans? Let us know; I myself am reading up on parallel computing, with combustion CFD on parallel GPU threads in mind.

Samirs July 18, 2018 06:08

Quote:

Originally Posted by aero_amit (Post 588963)
Hello Guys,

Actually, you can run it on a cluster by using:

mpirun -np 30 --hostfile hostfile SU2_CFD filename.cfg

(without using the Python script)

Here the number of cores is 30 (it can be any number).

In the hostfile (in the local directory), you should list the nodes, for example:

node201
node202
node203
... and so on

This works for me...

Are you sure that each process is not doing redundant work?

I tried it by first exporting

export SU2_MPI_COMMAND="mpirun -np %i %s"

and then running parallel_computation.py -f turb_ONERA_M6.cfg -n 10

It works well, but my only concern is that I'm not getting good scalability.

Please correct me if this is wrong.
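
For multi-node runs, a hostfile can be folded into the same template (untested sketch; the hostfile name is a placeholder):

Code:

# %i is replaced by the rank count, %s by the SU2 command to launch
export SU2_MPI_COMMAND="mpirun --hostfile hosts -np %i %s"
parallel_computation.py -f turb_ONERA_M6.cfg -n 10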

aero_amit August 4, 2018 03:26

Scalability... I have not benchmarked anything, but it seems to scale reasonably, I feel. Can you share the details, i.e., number of cells, cores used, memory, etc.?

Samirs August 7, 2018 01:36

Quote:

Originally Posted by aero_amit (Post 701503)
Scalability... I have not benchmarked anything, but it seems to scale reasonably, I feel. Can you share the details, i.e., number of cells, cores used, memory, etc.?

Thanks Amit :)

I'm running the test case from https://github.com/su2code/su2code.g...bulent_ONERAM6

like "parallel_computation.py -f turb_ONERAM6.cfg -n 4"

I'm running it on Intel Broadwell nodes, each with 128 GB RAM.

Thanks again for your help

aero_amit September 18, 2018 10:31

Quote:

Originally Posted by Samirs (Post 701722)
Thanks Amit :)

I'm running the test case from https://github.com/su2code/su2code.g...bulent_ONERAM6

like "parallel_computation.py -f turb_ONERAM6.cfg -n 4"

I'm running it on Intel Broadwell nodes, each with 128 GB RAM.

Thanks again for your help

Hi samirs,

Sorry for the late reply. Actually, if you have a smaller mesh, it will not scale beyond a few cores. I could not try the above problem personally, but I have run experimental trials on a couple of cases with mesh sizes in the tens of millions, and they scaled well up to 512 cores or so.

Samirs September 28, 2018 01:25

Quote:

Originally Posted by aero_amit (Post 706713)
Hi samirs,

Sorry for the late reply. Actually, if you have a smaller mesh, it will not scale beyond a few cores. I could not try the above problem personally, but I have run experimental trials on a couple of cases with mesh sizes in the tens of millions, and they scaled well up to 512 cores or so.

Thanks Amit,

Regarding what you mentioned about mesh size, do you know of any tool to generate larger meshes that we can use with SU2?

hlk September 29, 2018 12:47

There are a number of different mesh generation options.

Pointwise is a commercial tool that outputs directly into the SU2 format, and there are several options (including some free ones) that output to the CGNS format, which SU2 can read via the MESH_FORMAT config option.
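
For example, the relevant config lines would look something like this (the file name is a placeholder; option names as in the SU2 configuration template):

Code:

% read a CGNS mesh instead of the native .su2 format
MESH_FORMAT= CGNS
MESH_FILENAME= your_mesh.cgns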



Quote:

Originally Posted by Samirs (Post 708008)
Thanks Amit,

Regarding what you mentioned about mesh size, do you know of any tool to generate larger meshes that we can use with SU2?


kont87 February 24, 2020 13:44

Hello, this is an old topic, but I have a similar issue.

When I execute the run with the mpiexec command, it works flawlessly. However, when I switch to parallel_computation.py and, let's say, run the solver on 4 cores, it seems each core does the same computation (4 identical solutions).

I have Windows 8.1, MS-MPI, and Python 3.8 (with all packages) set up, and the paths are linked.

To sum up, I can simulate by using:
mpiexec -n 4 SU2_CFD default.cfg (gives 1 solution, ~4x speed up)

but not by this one:
python parallel_computation.py -f default.cfg -n 4 (gives 4 identical solutions, no speed up)

Any help is appreciated.

