Running PBS job in parallel on HPC cluster

August 2, 2019, 01:06   #1
Running PBS job in parallel on HPC cluster
silviliril (Member, L S, Join Date: Apr 2016, Posts: 63)
Hi,

I have a PBS script available for a different application. How can I adapt it to run my case on OpenFOAM 2.2.0 with 8 cores? The available gcc version is 4.4.7 on Red Hat.

Manually, I run something like this:
1. of220
2. mpirun -np 8 phaseChangeHeatFoam -parallel >> log.solver (the case is already decomposed)

The PBS script:
Code:
#!/bin/bash
#PBS -N HELLO_OUT
#PBS -q small
#PBS -l nodes=2:ppn=4
#PBS -V
#PBS -j oe

# load the Intel cluster tools environment
source /SOFT/ics_2013.1.046/icsxe/2013.1.046/ictvars.sh intel64

# run from the directory the job was submitted from
cd $PBS_O_WORKDIR

# record the allocated nodes and launch the MPI program on 8 cores
cat $PBS_NODEFILE > pbsnodes
mpirun -machinefile $PBS_NODEFILE -np 8 ./MPIHello | tee MPIHello.out

August 2, 2019, 02:29   #2
Gerry Kan (Senior Member, Join Date: May 2016, Posts: 347)
Dear Sabrina:

Aside from adding the -parallel switch to your mpirun line, I think your script is good to go.

Another thing to check: depending on your cluster, you might need to call a specific version of mpirun provided with the cluster infrastructure (e.g., for InfiniBand). You will need to check with your cluster admin on this, though.
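
For illustration, here is a minimal sketch of how that script could be adapted for your solver. It assumes OpenFOAM 2.2.0 is installed under $HOME/OpenFOAM/OpenFOAM-2.2.0 and its environment is sourced from etc/bashrc (the of220 alias may not exist in a non-interactive batch shell); the case directory is a placeholder you would replace with your own:
Code:
#!/bin/bash
#PBS -N phaseChange
#PBS -q small
#PBS -l nodes=2:ppn=4
#PBS -V
#PBS -j oe

# load whichever MPI environment matches your OpenFOAM build
# (ask your cluster admin which module or script provides it)

# load the OpenFOAM 2.2.0 environment
source $HOME/OpenFOAM/OpenFOAM-2.2.0/etc/bashrc

# move to the already decomposed case directory (placeholder path)
cd /path/to/your/case

# run the solver on the 8 allocated cores
mpirun -machinefile $PBS_NODEFILE -np 8 phaseChangeHeatFoam -parallel | tee log.solver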

Good luck, Gerry.

August 2, 2019, 02:50   #3
Error while running PBS script
silviliril (Member, L S, Join Date: Apr 2016, Posts: 63)
Hi Gerry Kan,

I am still confused about how to modify the above-mentioned script for my OpenFOAM case. How do I add the command for OpenMPI? I have written the script shown below, but it is not working.

Code:
#!/bin/bash
#PBS -N HELLO_OUT
#PBS -q small
#PBS -l nodes=2:ppn=4
#PBS -V
#PBS -j oe

cd $PBS_O_WORKDIR
cat $PBS_NODEFILE > pbsnodes

#open case file folder
cd /UHOME/lsilvish/Coarse  


#Load OpenFOAM's environment
module load openmpi-x86_64
source $HOME/OpenFOAM/OpenFOAM-2.2.0/etc/bashrc WM_NCOMPPROCS=4 foamCompiler=system WM_COMPILER=Gcc45 WM_MPLIB=SYSTEMOPENMPI
mpirun -np 8 phaseChangeHeatFoam -parallel > Output.log
I got this error:
Code:
ModuleCmd_Load.c(204):ERROR:105: Unable to locate a modulefile for 'openmpi-x86_64'
/UHOME/lsilvish/OpenFOAM/OpenFOAM-2.2.0/etc/config/settings.sh: line 405: mpicc: command not found
/var/spool/PBS/mom_priv/jobs/7994.hn.SC: line 19: mpirun: command not found
module avail shows the following:
Quote:
----------------- /usr/share/Modules/modulefiles -----------------
dot  module-cvs  module-info  modules  null  use.own

----------------------- /etc/modulefiles --------------------------
compat-openmpi-psm-x86_64  openmpi-x86_64


August 2, 2019, 04:59   #4
Gerry Kan (Senior Member, Join Date: May 2016, Posts: 347)
Hi Sabrina:

Reading your error message, it looks like you are trying to use an MPI library that is not installed on the cluster. You need to talk to your cluster admin to locate the correct mpirun and its environment.

I suppose the easiest way is to obtain an existing PBS script that already works properly from another user and modify it.

One more thing: you should not have to redirect your output to a file. The queue manager will do that for you; the output is stored in a file named after your job name and its job number in the queue.
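
For example, with the directives used in the script above (job name HELLO_OUT and #PBS -j oe merging stdout and stderr), PBS would typically collect everything into a single file named after the job, so the solver line needs no redirection:
Code:
# no ">" or "tee" needed; PBS captures stdout and stderr for you
mpirun -np 8 phaseChangeHeatFoam -parallel

# after the job finishes, the combined output appears as, e.g.,
#   HELLO_OUT.o7994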

Gerry.

August 2, 2019, 05:02   #5
Gerry Kan (Senior Member, Join Date: May 2016, Posts: 347)
I also see that you are loading the MPI module. Use "module av" to check whether the module "openmpi-x86_64" really exists. It is very likely that there is a version number associated with it (e.g., openmpi-x86_64/4.0.x).

If so, change the module load line to reflect this.
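
As a sketch (the version number below is only an illustration, not something read off your cluster):
Code:
# list every module the system knows about
module avail

# if the listing shows a versioned entry, load it explicitly, e.g.
module load openmpi-x86_64/4.0.1

# confirm the MPI wrappers are now on the PATH
which mpirun mpicc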

August 4, 2019, 06:18   #6
wyldckat (Retired Super Moderator, Bruno Santos, Lisbon, Portugal, Join Date: Mar 2009, Posts: 10,975)
Quote:
Originally Posted by silviliril View Post
Moreover, I have to run my case with a job scheduler script on a Red Hat server. Currently, I am unable to run the solver with the PBS job scheduler script due to an OpenMPI/gcc version error, which I have asked about in a separate thread:
Running PBS job in parallel on HPC cluster
Quick answer: As Gerry Kan indicated, you should add a "module avail" line within the job script, e.g.:
Code:
#Load OpenFOAM's environment

module avail

module load openmpi-x86_64
source $HOME/OpenFOAM/OpenFOAM-2.2.0/etc/bashrc WM_NCOMPPROCS=4 foamCompiler=system WM_COMPILER=Gcc45 WM_MPLIB=SYSTEMOPENMPI

August 6, 2019, 00:58   #7
Still the same error
silviliril (Member, L S, Join Date: Apr 2016, Posts: 63)
Quote:
Originally Posted by wyldckat View Post
Quick answer: As Gerry Kan indicated, you should add a "module avail" line within the job script, e.g.:
Code:
#Load OpenFOAM's environment

module avail

module load openmpi-x86_64
source $HOME/OpenFOAM/OpenFOAM-2.2.0/etc/bashrc WM_NCOMPPROCS=4 foamCompiler=system WM_COMPILER=Gcc45 WM_MPLIB=SYSTEMOPENMPI
Dear Bruno Santos,

I have applied the solution you mentioned in the above post and rewritten the script as below:
Code:
#!/bin/bash
#PBS -N HELLO_OUT
#PBS -q small
#PBS -l nodes=2:ppn=4
#PBS -V
#PBS -j oe

cd $PBS_O_WORKDIR
cat $PBS_NODEFILE > pbsnodes

#open case file folder
cd /UHOME/lsilvish/Coarse  


#Load OpenFOAM's environment
module avail
module load openmpi-x86_64
source $HOME/OpenFOAM/OpenFOAM-2.2.0/etc/bashrc WM_NCOMPPROCS=4 foamCompiler=system WM_COMPILER=Gcc45 WM_MPLIB=SYSTEMOPENMPI
mpirun -np 8 phaseChangeHeatFoam -parallel > Output.log
which still returns pretty much the same error:

Code:
------------------------ /usr/share/Modules/modulefiles ------------------------
dot         module-cvs  module-info modules     null        use.own
ModuleCmd_Load.c(204):ERROR:105: Unable to locate a modulefile for 'openmpi-x86_64'
/UHOME/lsilvish/OpenFOAM/OpenFOAM-2.2.0/etc/config/settings.sh: line 405: mpicc: command not found
/var/spool/PBS/mom_priv/jobs/7999.hn.SC: line 19: mpirun: command not found
Moreover, the above script calls for WM_COMPILER=Gcc45, but the versions I have available are:
Quote:
GCC 4.4.7
G++ 4.4.7
Could the GCC version be one of the possible reasons for the error?

Where am I going wrong?


August 6, 2019, 03:36   #8
Gerry Kan (Senior Member, Join Date: May 2016, Posts: 347)
Dear lsilvish:

Before we proceed, just one question: does your cluster allow you to run small MPI jobs locally? If so, that should let you sort out your module issues.

First off, "module av" shows only a very limited number of modules:
"dot module-cvs module-info modules null use.own"

Locate the meta-module that contains the gcc and MPI, and load it in your .bashrc before doing anything else. Once this is done, try to run a test job with your settings and make sure there are no errors.
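
As a rough sketch, assuming such modules eventually turn up under some name (the module names below are placeholders, not taken from your cluster):
Code:
# in ~/.bashrc (replace with whatever "module avail" actually lists)
module load gcc/4.4.7
module load openmpi-x86_64

# quick local sanity check, outside the queue
which mpirun mpicc
mpirun -np 2 hostname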

Again, this is something your cluster admin, or another, more experienced user of the cluster, would be in a much better position to help you with. We can only guess what it could be from your description.

Sincerely, Gerry.

P.S. - Did you compile OpenFOAM locally in your user account, or did you use a distribution made available to you on the cluster? The GCC version should be a non-issue as long as the proper C libraries for your OpenFOAM build are provided by the cluster (and it sounds like they should be).

August 8, 2019, 03:00   #9
silviliril (Member, L S, Join Date: Apr 2016, Posts: 63)
Quote:
Originally Posted by Gerry Kan View Post
Dear lsilvish:

Before we proceed, just one question: does your cluster allow you to run small MPI jobs locally? If so, that should let you sort out your module issues.

First off, "module av" shows only a very limited number of modules:
"dot module-cvs module-info modules null use.own"

Locate the meta-module that contains the gcc and MPI, and load it in your .bashrc before doing anything else. Once this is done, try to run a test job with your settings and make sure there are no errors.

Again, this is something your cluster admin, or another, more experienced user of the cluster, would be in a much better position to help you with. We can only guess what it could be from your description.

Sincerely, Gerry.

P.S. - Did you compile OpenFOAM locally in your user account, or did you use a distribution made available to you on the cluster? The GCC version should be a non-issue as long as the proper C libraries for your OpenFOAM build are provided by the cluster (and it sounds like they should be).
Thank you for replying.
I contacted my HPC admin and he edited the "bash_profile" file by adding the following line:
Code:
export PATH=$PATH:/usr/lib64/openmpi/bin
Now we are able to get rid of the first line of the error (i.e., "Unable to locate a modulefile for 'openmpi-x86_64'"), but the other two lines of the error still persist:
Quote:
/UHOME/lsilvish/OpenFOAM/OpenFOAM-2.2.0/etc/config/settings.sh: line 405: mpicc: command not found
/var/spool/PBS/mom_priv/jobs/8006.hn.SC: line 19: mpirun: command not found
What is still missing? My admin does not know beyond this.

August 8, 2019, 04:38   #10
Gerry Kan (Senior Member, Join Date: May 2016, Posts: 347)
Hi lsilvish:

What the cluster admin did was point you to the location of the OpenMPI executables. The fact that mpicc and mpirun are still not found suggests that either the path is incorrect or the PATH environment variable is not being updated properly.

To verify this, look at the contents of the PATH variable in the console. This can be done quickly with the following command:

Code:
echo ${PATH}
and look for "/usr/lib64/openmpi/bin" in the output. Given what your cluster admin did, it should be at the end.

If it is there, check whether mpirun and mpicc are also found. This can be done with the following commands:

Code:
which mpirun
which mpicc
Another thing: the dot in front of "bash_profile" is important, so your file should be named ".bash_profile". Please check that this is the case.

However, I prefer to keep all my environment settings in .bashrc, so I don't know for certain how .bash_profile is sourced when you send your job to the cluster.
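
If the batch shell on the compute nodes turns out not to read .bash_profile, one workaround (a sketch only, not verified on your cluster) is to extend the PATH inside the job script itself, right before sourcing the OpenFOAM environment:
Code:
# make the system OpenMPI wrappers visible to this batch shell
export PATH=$PATH:/usr/lib64/openmpi/bin

source $HOME/OpenFOAM/OpenFOAM-2.2.0/etc/bashrc WM_NCOMPPROCS=4 foamCompiler=system WM_COMPILER=Gcc45 WM_MPLIB=SYSTEMOPENMPI
mpirun -np 8 phaseChangeHeatFoam -parallel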

Gerry.

August 8, 2019, 04:53   #11
silviliril (Member, L S, Join Date: Apr 2016, Posts: 63)
Quote:
Originally Posted by Gerry Kan View Post
Hi lsilvish:

What the cluster admin did was point you to the location of the OpenMPI executables. The fact that mpicc and mpirun are still not found suggests that either the path is incorrect or the PATH environment variable is not being updated properly.

To verify this, look at the contents of the PATH variable in the console. This can be done quickly with the following command:

Code:
echo ${PATH}
and look for "/usr/lib64/openmpi/bin" in the output. Given what your cluster admin did, it should be at the end.

If it is there, check whether mpirun and mpicc are also found. This can be done with the following commands:

Code:
which mpirun
which mpicc
Another thing: the dot in front of "bash_profile" is important, so your file should be named ".bash_profile". Please check that this is the case.

However, I prefer to keep all my environment settings in .bashrc, so I don't know for certain how .bash_profile is sourced when you send your job to the cluster.

Gerry.
Dear Gerry Kan, thanks for replying.

The "echo ${PATH}" command gives the following output:
Code:
/UHOME/lsilvish/OpenFOAM/ThirdParty-2.2.0/platforms/linux64Gcc45/gperftools-svn/bin:/UHOME/lsilvish/OpenFOAM/ThirdParty-2.2.0/platforms/linux64Gcc45/paraview-3.12.0/bin:/UHOME/lsilvish/OpenFOAM/lsilvish-2.2.0/platforms/linux64Gcc45DPOpt/bin:/UHOME/lsilvish/OpenFOAM/site/2.2.0/platforms/linux64Gcc45DPOpt/bin:/UHOME/lsilvish/OpenFOAM/OpenFOAM-2.2.0/platforms/linux64Gcc45DPOpt/bin:/UHOME/lsilvish/OpenFOAM/OpenFOAM-2.2.0/bin:/UHOME/lsilvish/OpenFOAM/OpenFOAM-2.2.0/wmake:/opt/pbs/default/sbin:/opt/pbs/default/bin:/usr/local/java/default/bin:/opt/cmu/bin:/opt/cmu:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/UHOME/lsilvish/bin:/usr/lib64/openmpi/bin
The "which mpirun", "which mpicc" commands have given me following output:
Quote:
/usr/lib64/openmpi/bin/mpirun
/usr/lib64/openmpi/bin/mpicc
.bash_profile and .bashrc files are attached.
Attached Files: bash_profile.zip


August 9, 2019, 11:50   #12
Gerry Kan (Senior Member, Join Date: May 2016, Posts: 347)
Hi silviliril:

I don't see anything that jumps out in either the .bash_profile or the .bashrc file. I would try the following two things:

1) Write a dummy "hello world" shell script and submit it to the cluster (see the sketch after this list). Given the settings you provided, this should go through. If not, there is still something wrong with your submission script, and you should talk to the cluster admin to fix it. Now that you have a simple test program, they should be more willing to give you a hand.

2) Try to start a small (4-process) OpenFOAM MPI run locally (i.e., without qsub, just run mpirun -np X yyFoam -parallel). Any simple tutorial would do. If there is anything wrong with your OpenFOAM build, this run should also fail.
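
Something along these lines would do as the dummy test, as a sketch (queue name and node request copied from the scripts above; adjust as needed):
Code:
#!/bin/bash
#PBS -N MPI_TEST
#PBS -q small
#PBS -l nodes=2:ppn=4
#PBS -V
#PBS -j oe

cd $PBS_O_WORKDIR

# no solver involved: just prove that mpirun is found and all nodes respond
which mpirun
mpirun -machinefile $PBS_NODEFILE -np 8 hostname
If the node names show up in the MPI_TEST.o<jobid> file, the submission side is fine and the problem is narrowed down to the OpenFOAM/MPI environment.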

The other question is: how did you set up OpenFOAM on your system? I am asking because I noticed that the GCC version on your system differs from the one indicated by WM_COMPILER. My suspicion is that you simply copied OpenFOAM from another system to the cluster. Even then, it should work, provided the glibc specific to your OpenFOAM build is there.


Tags
openfoam 2.2.0, pbs script, running in parallel

