CFD Online Discussion Forums

Script to Run Parallel Jobs in Rocks Cluster (https://www.cfd-online.com/Forums/openfoam-solving/58101-script-run-parallel-jobs-rocks-cluster.html)

asaha January 10, 2009 12:19

Hello All,

Is there a simple script available that can be used to run OpenFOAM 1.5 parallel jobs on a CentOS PE-Rocks cluster with SGE?

I am running serial jobs in the cluster using

qsub -cwd -S /bin/bash ser_script.sh

ser_script.sh

#!/bin/bash
. $HOME/OpenFOAM/OpenFOAM-1.5/etc/bashrc
interFoam

and for parallel jobs I used

qsub -cwd -S /bin/bash par_script.sh

par_script.sh

#!/bin/bash
. $HOME/OpenFOAM/OpenFOAM-1.5/etc/bashrc
mpirun -np 2 interFoam -parallel

It seems my parallel script is not working well, as the clock time for the parallel job (32797 s) is very high compared with that of the serial job (1385 s).

To check, I ran the above serial (1588 s) and parallel (697 s) cases directly on the master node and observed that the case runs well in parallel mode.

I would like to know what I need to put in the parallel script so that the case gives a good parallel speed up when submitted through qsub.

Please advise.


Thanks and regards,

a a saha

velan January 11, 2009 00:22

Parallel submission requires the hostnames of the machines the job will run on. Usually in PBS (Rocks),

mpirun -hostfile $PBS_NODEFILE -np $(wc -l < $PBS_NODEFILE) interFoam -parallel > output.log

I have never tried SGE, but you can do the same by supplying the machine names yourself, like

mpirun -hostfile machinefile -np 2 interFoam -parallel > output.log

Here machinefile should contain the hostnames of the compute nodes (compute-node-01, compute-node-02, ...).
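
As a rough sketch of the machinefile approach, the process count can be derived from the file instead of being hard-coded. This assumes an Open MPI style hostfile with optional `slots=N` entries; the helper name `count_slots` is made up for illustration.

```shell
#!/bin/bash
# count_slots FILE: sum the slots listed in an Open MPI style hostfile.
# A host line without "slots=N" counts as one slot.
# Blank lines and lines starting with "#" are skipped.
count_slots() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }
        {
            n = 1
            for (i = 2; i <= NF; i++)
                if ($i ~ /^slots=[0-9]+$/) { split($i, kv, "="); n = kv[2] }
            total += n
        }
        END { print total + 0 }
    ' "$1"
}

# Possible use inside a job script (the file name "machines" is an assumption):
# mpirun -hostfile machines -np "$(count_slots machines)" interFoam -parallel > output.log
```

Keeping the count tied to the hostfile avoids starting more processes than the listed nodes provide.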

asaha January 11, 2009 02:08

Hello Velan,

Thanks for your post. Does the script for PBS (Rocks) optimise the allocation of processors for parallel jobs?

For SGE, I think the indicated script will not allocate processors optimally, as it will simply take the first two machines listed in the machinefile and start processing.

Please correct me if I am missing something here.

Regards,

a a saha.

villet January 11, 2009 06:02

Hi Saha,

First of all, are you maintaining the cluster yourself? If not, you should ask your system administrator how to run parallel jobs on SGE. They can tell you how SGE is configured: what is supported and what is not.

I presume you use OpenMPI, which is the default MPI in OpenFOAM. In that case you need to request a parallel environment in the script (on our cluster I ask for the PE named "openmpi" and 8 slots):

#$ -pe openmpi 8
#$ -R y

The OpenMPI-specific way to call OpenFOAM is:

mpirun -np $NSLOTS interFoam -parallel

In your script SGE allocated only one slot, but OpenFOAM ran with two processes. That must have slowed the system.

Hope this helps,
Ville
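
Putting the pieces of the post above together, a complete SGE job script might look roughly like the sketch below. The PE name "openmpi" and the 8 slots come from Ville's cluster and will differ elsewhere. The helper `build_mpirun_cmd` and the `JOB_ID` guard are illustrative additions, not part of anyone's actual script.

```shell
#!/bin/bash
#$ -cwd
#$ -S /bin/bash
#$ -pe openmpi 8
#$ -R y

# Build the mpirun command line from the slot count granted by SGE.
# NSLOTS is set by SGE inside a parallel-environment job; the fallback
# to 1 only keeps the function usable when it is called by hand.
build_mpirun_cmd() {
    local solver="$1"
    echo "mpirun -np ${NSLOTS:-1} ${solver} -parallel"
}

# Only launch the solver when running under SGE (JOB_ID is set by the scheduler).
if [ -n "${JOB_ID:-}" ]; then
    . "$HOME/OpenFOAM/OpenFOAM-1.5/etc/bashrc"
    $(build_mpirun_cmd interFoam) > output.log 2>&1
fi
```

Because the process count is read from `$NSLOTS`, the number of MPI processes always matches what the scheduler allocated, which is the mismatch identified in the original script.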

asaha January 11, 2009 12:45

Hello Ville,

Thanks for your advice. I checked the cluster documentation, which says to use the following for parallel jobs on n processors:

$ qsub -cwd -S /bin/bash -pe mpi n your_script.sh

However, when I execute the following

$ qsub -cwd -S /bin/bash -pe openmpi 2 -R y par_script_new.sh

I get the following error

Unable to run job: job rejected: the requested parallel environment "openmpi" does not exist.

I think the cluster does not support an openmpi parallel environment.

Please suggest a possible workaround.

Regards,

a a saha

velan January 11, 2009 13:01

Hello Saha,

Can you please tell me where your machinefile comes from? Please send me its contents.

Meanwhile, can you tell me whether OpenFOAM is installed or mounted on all compute nodes?

If possible, send me the script and I will try to check it.

asaha January 11, 2009 13:56

Hello Velan,

OpenFOAM is installed in my home directory.
I have two scenarios in which I can still run my parallel cases.

(1) Scenario 1 (no qsub)

I have no problem executing a parallel run with the following command:

mpirun -np 2 -hostfile machines interFoam -parallel

where I give the list of compute nodes in the machines file as below

compute-0-5 slots=1
compute-0-7 slots=1

As you can see, I do not use qsub for the parallel execution here. Clock time = 1305 s.

(2) Scenario 2 (with qsub)

I use the following command

qsub -cwd -S /bin/bash par_script_new.sh

par_script_new.sh

#!/bin/bash
. $HOME/OpenFOAM/OpenFOAM-1.5/etc/bashrc
mpirun -np 2 -hostfile machines interFoam -parallel

The contents of the machines file are the same as in Scenario 1.

In this case qsub allots 1 slot for my job, and the parallel run is executed on two compute nodes as per the machines file. Clock time = 1271 s.


I do not think either of the above methods is an elegant way of executing parallel jobs on a cluster.

Is it possible to use the MPI parallel environment with OpenFOAM-1.5 instead of OpenMPI?
I am using the binary version here.

Regards,

a a saha.

velan January 11, 2009 14:52

Hello Saha,

I tried OpenFOAM with MPICH, but I never succeeded on my cluster; I kept getting errors. So I moved back to the default OpenMPI.

I am sorry that I couldn't help you resolve this issue. Maybe somebody else can help you solve it.

villet January 11, 2009 16:16

Saha:

In your case I would contact the system administrator and ask which MPIs are supported and how you should run OpenFOAM. You shouldn't just bypass the job queue system. The parallel environment "openmpi" is just a name on my cluster, and configurations can vary.

I have used MPICH and OpenMPI in SGE, but there are many other options. If the system is compatible with OpenMPI, you may need to make a small modification. SGE support was dropped from the default compilation of OpenMPI some time ago. Adding the option "--with-sge" in "ThirdParty/Allwmake" (at the end of the OpenMPI configure section) enables SGE support. Then you should run "./Allwmake" in the ThirdParty directory and make sure the compilation is successful.

Ville

asaha January 12, 2009 13:00

Hello Ville,

I could not contact the system administrator today. It seems to me that openmpi, mpich, lam, mpich-hpl and openmpi-hpl are available on the cluster.

Is the following error due to the lack of SGE support in the OpenMPI shipped with OpenFOAM?
------------------------------------------------
Unable to run job: job rejected: the requested parallel environment "openmpi" does not exist.
------------------------------------------------

Please advise whether I must recompile via "ThirdParty/Allwmake" with SGE support enabled so that I don't get the error.

Otherwise I will try OpenFOAM-1.4.1-dev instead, which uses LAM as its default parallel environment.

Hello Velan:

Thanks for your help.


Regards,

a a saha.

villet January 12, 2009 15:33

The error occurred simply because no parallel environment called "openmpi" is defined in your SGE system. You should use "-pe mpi n" on the command line instead (as in your example).

I don't know how the actual configurations of parallel environments differ between MPIs. I'm just a user :) Usually there is a PE for every MPI.
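
One way to avoid the "does not exist" rejection is to check which parallel environments are defined before submitting. The sketch below assumes the standard SGE command `qconf -spl`, which lists PE names one per line; the helper name `pe_in_list` is hypothetical.

```shell
#!/bin/bash
# pe_in_list NAME: succeed if NAME appears as a whole line on standard input.
# Intended to be fed with the output of "qconf -spl".
pe_in_list() {
    grep -qxF -- "$1"
}

# Possible use on a submit host:
#   qconf -spl | pe_in_list mpi && echo "PE 'mpi' is defined"
#   qconf -sp mpi    # show the configuration of that PE
```

The match is on the whole line, so a PE called "openmpi" is not mistaken for one called "mpi".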

asaha January 16, 2009 00:45

Hello Ville,

I just ran a check to find out whether SGE support is available in the OpenMPI versions that come with Rocks and with OpenFOAM-1.5.

Executing the following in the OpenFOAM environment shows that SGE support exists:

ompi_info | grep gridengine
MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.6)
MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.6)

SGE support, however, is not present in the OpenMPI installed with Rocks. This is probably why qsub is not accepting my parallel jobs.

So I installed OpenFOAM-1.4.1-dev, and with the following script I could successfully execute parallel jobs on the cluster.

qsub -cwd -S /bin/bash -pe mpi 4 par_script_new_4_cpu.sh

par_script_new_4_cpu.sh:

#!/bin/bash
date

. $HOME/OpenFOAM/OpenFOAM-1.4.1-dev/.OpenFOAM-1.4.1-dev/bashrc
sleep 180
export LAMRSH=ssh
lamboot -d $FOAM_RUN/parallel_test_4_cpu/system/machines
mpirun -np 4 interFoam $FOAM_RUN parallel_test_4_cpu -parallel

I use a sleep time of 180 s, during which I write the nodes allocated by SGE into the machines file.

Thanks again for all the help.


Regards,

a a saha.
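
The manual step in the post above (editing the machines file during the 180 s sleep) could probably be automated. The sketch below assumes that SGE exposes the allocation in `$PE_HOSTFILE` with lines of the form "host slots queue processor-range", and that the LAM boot schema accepts lines of the form "host cpu=N". The helper name `pe_hostfile_to_lam` is made up for illustration.

```shell
#!/bin/bash
# pe_hostfile_to_lam FILE: convert an SGE PE hostfile
# ("host slots queue processor-range" per line) into LAM boot-schema
# lines of the form "host cpu=N". Lines with fewer than two fields are ignored.
pe_hostfile_to_lam() {
    awk 'NF >= 2 { print $1 " cpu=" $2 }' "$1"
}

# Inside the job script this could replace the sleep-and-edit step:
# pe_hostfile_to_lam "$PE_HOSTFILE" > "$FOAM_RUN/parallel_test_4_cpu/system/machines"
# lamboot -d "$FOAM_RUN/parallel_test_4_cpu/system/machines"
```

Generating the file from the scheduler's own allocation means the job runs on the nodes SGE actually reserved, without a fixed waiting period.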

phuchuynh July 4, 2012 22:51

Run OpenFOAM on cluster CentOS
 
Hi everyone, I am using a CentOS cluster. I have run OF (OpenFOAM) in serial on a PC. Now I would like to run it on the cluster, but I don't know how to write a script to run OF. Can anyone help me?
Thanks
phuchuynh


All times are GMT -4.