
Problem running SU2 using PBS


Old   February 17, 2015, 16:06
Default Problem running SU2 using PBS
  #1
New Member
 
Join Date: Dec 2014
Posts: 6
Rep Power: 11
jzhen is on a distinguished road
I'm trying to run a case in parallel on a cluster using my qsub script:

#PBS -N SU2_inlet_test
#PBS -q default
#PBS -S /bin/bash
#PBS -l nodes=compute-0-0:ppn=16

# Location of Case
CASEDIR=$PBS_O_WORKDIR

# Scratch directory
SCRATCH=/tmp/jzhen/SU2_inlet/completed

#make directory in scratch
mkdir -p ${SCRATCH}

# Change directories to Location of Case
cd ${CASEDIR}

# Copy files to the scratch directory
cp * ${SCRATCH}

# Change directories to scratch directory (must exist)
cd ${SCRATCH}

# Clear out any files from previous runs
rm -fr *.dat *.csv *.png

pwd

# MPIRun of SU2_CFD
mpirun -n 8 SU2_CFD inlet.cfg &>z.out


# Copy files back to original directory

cp -r ${SCRATCH} ${CASEDIR}


When I execute this, I get the error message:

mpirun noticed that process rank 0 with PID 16581 on node compute-0-0.local exited on signal 9 (Killed).


Has anyone experienced this?
Attached Files
File Type: txt SU2out.txt (43.1 KB, 14 views)

Old   February 24, 2015, 20:05
Default
  #2
Senior Member
 
Zach Davis
Join Date: Jan 2010
Location: Los Angeles, CA
Posts: 101
Rep Power: 16
RcktMan77 is on a distinguished road
Have you partitioned the mesh already? Typically I would invoke SU2 through its parallel_computation.py wrapper Python script. To do so in a cluster environment with a PBS/Torque scheduler, you may need to modify $SU2_HOME/SU2_PY/SU2/run/interface.py. On line 58 of that file you can change the mpirun command that gets executed when the parallel_computation.py wrapper is invoked. For PBS/Torque with OpenMPI, you could modify this line to read:

mpi_Command = 'mpirun --hostfile ${PBS_NODEFILE} -np %i %s'

With OpenMPI, though, the library should automatically get a list of hosts from the scheduler, so this example isn't especially instructive. You can, however, adapt the line to suit whichever MPI library was used to compile SU2 on your cluster. With MPICH, as another example, you would modify the line to read:

mpi_Command = 'mpirun -f ${PBS_NODEFILE} -n %i %s'
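
For reference, a minimal PBS submission script that drives the wrapper could look something like the sketch below; the queue name, install path, config file name, and core count are placeholders that you would adapt to your own cluster.

Code:
#PBS -N SU2_parallel_run
#PBS -q default
#PBS -l nodes=1:ppn=16
#PBS -j oe

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR

# SU2 environment (adjust the install location for your system)
export SU2_RUN=/path/to/SU2/bin
export SU2_HOME=/path/to/SU2
export PATH=$PATH:$SU2_RUN
export PYTHONPATH=$PYTHONPATH:$SU2_RUN

# Launch the Python wrapper; -n should match the cores requested above
parallel_computation.py -f your_config_file.cfg -n 16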

Old   March 3, 2015, 15:36
Default
  #3
Senior Member
 
Charles
Join Date: Apr 2009
Posts: 185
Rep Power: 18
CapSizer is on a distinguished road
Is there a simple series of command-line actions that one can reliably take instead of the parallel Python script? The problem with the Python script is that it does a bunch of work before actually getting around to launching the mpirun command. In a cluster environment you don't want to do any of that on the login node, so one would typically make sure that the decomposition and merging steps are also done on a compute node. Basically, what I'm after are the commands for:

decompose

mpirun solver

merge

sort out the output stuff
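
Or, in job-script terms, roughly the sketch below (the core count and file names are just for illustration):

Code:
cd $PBS_O_WORKDIR

# 1. decompose the mesh (SU2_PRT according to the online guide, but see below)

# 2. run the flow solver in parallel
mpirun -np 16 SU2_CFD my_case.cfg

# 3. merge the partitioned solution and sort out the output files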

Most of this is easy, but the odd one out seems to be the decomposition command. The guide on the website says that SU2_PRT is used for this ... but I can't find a binary for that in the current version. Any ideas?

Old   March 3, 2015, 16:13
Default
  #4
New Member
 
Join Date: Dec 2014
Posts: 6
Rep Power: 11
jzhen is on a distinguished road
RcktMan77: I tried your suggestion of using the Python script with the modification mpi_Command = 'mpirun -f ${PBS_NODEFILE} -n %i %s'. I launched an interactive PBS job using qsub -I -l nodes=1:ppn=8 and did a cat $PBS_NODEFILE to make sure I had the nodes, and I got a similar error.

I wasn't sure whether this problem was specific to my *.cfg and mesh files or to my PBS setup, so I ran the same PBS process on the ONERA tutorial case, and that worked! This leads me to believe there is some issue with my mesh (maybe it's a memory problem?) or my *.cfg file, and not with my PBS setup itself. I remade the mesh in Pointwise as an axisymmetric one and was able to get that to work.

CapSizer: I didn't run the decomposition command (SU2_PRT) because I've come to understand that the partitioning step is built into SU2_CFD. That's why SU2_PRT is not included in the version of SU2 I have installed (v3.2.8). The command line you can use is just: mpirun -n NP SU2_CFD your_config_file.cfg, where NP is the number of processors.
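
Inside a PBS job script, that would just look something like the sketch below (the core count is arbitrary, and inlet.cfg is my config file from above):

Code:
cd $PBS_O_WORKDIR

# SU2_CFD partitions the mesh internally when launched under MPI,
# so no separate decomposition step is needed in v3.2.8
mpirun -n 16 SU2_CFD inlet.cfg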

Old   March 3, 2015, 16:29
Default
  #5
Senior Member
 
Zach Davis
Join Date: Jan 2010
Location: Los Angeles, CA
Posts: 101
Rep Power: 16
RcktMan77 is on a distinguished road
The commands included in your scheduler's submission script run only on the compute nodes reserved for the job. Thus, anything that is launched via the parallel_computation.py Python script will also run only on those reserved nodes. You shouldn't need to worry that some of the components run by the Python script will end up on the node from which the job was submitted.
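
If you want to convince yourself of that, a quick check like the one below (just standard shell commands, nothing SU2-specific) placed at the top of the submission script will report which node the script is actually executing on, along with the nodes allocated to the job:

Code:
# Runs on the first compute node of the allocation, not on the login node
echo "Job script running on: $(hostname)"
echo "Nodes allocated to this job:"
cat $PBS_NODEFILE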

That being said, I'm not sure what they did with the SU2_PRT C++ module as I don't have it in my installation either.

Old   March 3, 2015, 16:32
Default
  #6
Senior Member
 
Zach Davis
Join Date: Jan 2010
Location: Los Angeles, CA
Posts: 101
Rep Power: 16
RcktMan77 is on a distinguished road
Quote:
Originally Posted by jzhen
CapSizer: I didn't run the decomposition command (SU2_PRT) because I've come to understand that the partitioning step is built into SU2_CFD. That's why SU2_PRT is not included in the version of SU2 I have installed (v3.2.8). The command line you can use is just: mpirun -n NP SU2_CFD your_config_file.cfg, where NP is the number of processors.
Thanks for clarifying that, jzhen. CapSizer's question left me wondering how partitioning was being handled in 3.2.8; I hadn't even noticed that SU2_PRT had gone missing.

Old   March 4, 2015, 12:33
Default
  #7
Senior Member
 
Charles
Join Date: Apr 2009
Posts: 185
Rep Power: 18
CapSizer is on a distinguished road
Got it figured out OK; an example PBS script is up at http://wiki.chpc.ac.za/howto:su2. Thanks for the advice!

Old   March 18, 2015, 16:30
Default
  #8
New Member
 
Join Date: Dec 2014
Posts: 6
Rep Power: 11
jzhen is on a distinguished road
I'm running into problems again running my case in parallel with PBS. I'm using the parallel_computation.py script and get the following error message:

---------------------- Read Grid File Information -----------------------
Three dimensional problem.
8341318 interior elements.
8124838 tetrahedra.
179400 hexahedra.
37080 pyramids.
1645805 points.
8124838 tetrahedra.
179400 hexahedra.
37080 pyramids.
1645805 points.
8124838 tetrahedra.
179400 hexahedra.
37080 pyramids.
1645805 points.
8124838 tetrahedra.
179400 hexahedra.
37080 pyramids.
1645805 points.
5 surface markers.
16424 boundary elements in index 0 (Marker = box).
5 surface markers.
16424 boundary elements in index 0 (Marker = box).
5 surface markers.
16424 boundary elements in index 0 (Marker = box).
3092 boundary elements in index 1 (Marker = inflow).
3092 boundary elements in index 1 (Marker = inflow).
3092 boundary elements in index 1 (Marker = inflow).
3994 boundary elements in index 2 (Marker = outflow).
3994 boundary elements in index 2 (Marker = outflow).
3994 boundary elements in index 2 (Marker = outflow).
254470 boundary elements in index 3 (Marker = symmetry).
254470 boundary elements in index 3 (Marker = symmetry).
254470 boundary elements in index 3 (Marker = symmetry).
5 surface markers.
16424 boundary elements in index 0 (Marker = box).
3092 boundary elements in index 1 (Marker = inflow).
3994 boundary elements in index 2 (Marker = outflow).
254470 boundary elements in index 3 (Marker = symmetry).
74060 boundary elements in index 4 (Marker = walls).
74060 boundary elements in index 4 (Marker = walls).
74060 boundary elements in index 4 (Marker = walls).
74060 boundary elements in index 4 (Marker = walls).
Traceback (most recent call last):
File "/home/jzhen/SU2/SU2_Install/bin/parallel_computation.py", line 110, in <module>
main()
File "/home/jzhen/SU2/SU2_Install/bin/parallel_computation.py", line 61, in main
options.compute )
File "/home/jzhen/SU2/SU2_Install/bin/parallel_computation.py", line 88, in parallel_computation
info = SU2.run.CFD(config)
File "/home/jzhen/SU2/SU2_Install/bin/SU2/run/interface.py", line 88, in CFD
run_command( the_Command )
File "/home/jzhen/SU2/SU2_Install/bin/SU2/run/interface.py", line 276, in run_command
raise exception , message
RuntimeError: Path = /tmp/jzhen/scratch/WD001/,
Command = mpirun -n 4 /home/jzhen/SU2/SU2_Install/bin/SU2_CFD config_CFD.cfg
SU2 process returned error '137'


I don't know what this RuntimeError means. Any thoughts?

Old   March 18, 2015, 17:53
Default
  #9
Senior Member
 
Zach Davis
Join Date: Jan 2010
Location: Los Angeles, CA
Posts: 101
Rep Power: 16
RcktMan77 is on a distinguished road
@jzhen Which MPI library did you compile SU2 with?

Old   March 26, 2015, 13:29
Default
  #10
New Member
 
Join Date: Dec 2014
Posts: 6
Rep Power: 11
jzhen is on a distinguished road
I'm using OpenMPI v 1.6.2

Old   March 26, 2015, 13:42
Default
  #11
Senior Member
 
Zach Davis
Join Date: Jan 2010
Location: Los Angeles, CA
Posts: 101
Rep Power: 16
RcktMan77 is on a distinguished road
The command you edited in the interface.py file isn't exactly consistent with OpenMPI. You will want to ensure that your OpenMPI library was installed with Torque/PBS scheduler support, which is an option you can pass in during the configure step. Otherwise, you need to pass a machinefile to mpirun. Additionally, with OpenMPI you specify the number of MPI ranks using the -np flag, not -n as you have done; the -n flag would be appropriate for MPICH, another MPI library. Thus, to be safe, you could edit your interface.py file as follows:


mpi_Command = 'mpirun --hostfile ${PBS_NODEFILE} -np %i %s'

If you're certain that your OpenMPI library was built with Torque/PBS scheduler support, then the --hostfile flag and its ${PBS_NODEFILE} argument shouldn't be required. I hope this helps!
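
One way to check (assuming the ompi_info utility from the same OpenMPI installation is on your PATH) is to look for the Torque/TM components; if your administrators built OpenMPI themselves, the relevant configure option is --with-tm, pointed at the Torque install location. A rough sketch:

Code:
# Look for "plm: tm" and "ras: tm" entries; if they are listed, OpenMPI was
# built with Torque support and will pick up the allocated hosts from the
# scheduler automatically
ompi_info | grep tm

# Example configure invocation when building OpenMPI (paths are placeholders)
# ./configure --with-tm=/opt/torque --prefix=/opt/openmpi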

Old   July 25, 2019, 13:10
Post Issues in the same vein
  #12
New Member
 
Nadeem Kever
Join Date: Jul 2019
Posts: 4
Rep Power: 6
NKever is on a distinguished road
Edit: Asked in a different post Parallelizing SU2 in PBS


Hi,

I am trying to run the shape_optimization.py script using all of my available nodes in PBS, and I don't think I am doing it right.

Here is my script:

Code:
#PBS -q devel
##PBS -q long
##PBS -q normal
#PBS -W group_list=a1489
#PBS -lselect=50:ncpus=28:model=bro,walltime=02:00:00
##PBS -lselect=80:ncpus=28:model=bro,walltime=08:00:00
##PBS -lselect=80:ncpus=28:model=bro,walltime=50:00:00
#PBS -N test_su2
#
# Request that regular output and terminal output go to the same file
#PBS -j oe
#PBS -m n

# go to where the qsub was performed.
cd $PBS_O_WORKDIR

# Running Program
export SU2_RUN="/home6/nkever/su2/install/SU2/bin"
export SU2_HOME="/home6/nkever/su2/src/SU2"
export PATH=$PATH:$SU2_RUN
export PYTHONPATH=$PYTHONPATH:$SU2_RUN

module load mpi-sgi/mpt
module load python/2.7.15
module load comp-intel/2018.0.128

./shape_optimization.py -g CONTINUOUS_ADJOINT -o SLSQP -f inv_NACA0012_basic.cfg -n 400

echo "finished mpirun"

So I think the issue may be with -n; my interpretation is that the number of partitions is what tells SU2 how many nodes to use. How do I use all of my nodes?

Last edited by NKever; July 29, 2019 at 12:17.
