CFD Online Discussion Forums

Running parallel job using qsub on sun grid engine (http://www.cfd-online.com/Forums/openfoam-solving/59160-running-parallel-job-using-qsub-sun-grid-engine.html)

nishant_hull February 6, 2008 16:47

Hi all,
I need some help with an Open MPI problem.
I am trying to run a program on the AMD64 cluster at my university's computing facility.
My case runs fine using the command:
mpirun -machinefile machine -np 4 case root etc

where the file passed to -machinefile is written by hand. Now I am trying to run the case on the cluster using the qsub command with automatically allocated machines (not necessarily including the master node). For this I wrote the qsub script below. It uses the mpich parallel environment and prints the hostfile/machinefile.

#!/bin/sh
#$ -N MPICH_JOB
#$ -cwd
# Join stdout and stderr
#$ -j y
# pe request for MPICH. Set your number of processors here.
# Make sure you use the "mpich" parallel environment.
#$ -pe mpich 4
#
# Run job through bash shell
#$ -S /bin/bash
#
# The following is for reporting only. It is not really needed
# to run the job. It will show up in your output file.
echo "Got $NSLOTS processors."
echo "Machines:"
# add here code to map regular hostnames into ATM hostnames
#echo $TMPDIR/machines
cat $PE_HOSTFILE
mpirun -machinefile machine -np 4 case root etc

From this script I get the host file in this format:
comp03.dcs.hull.ac.uk 1 parallel.q@comp03.dcs.hull.ac.uk <null>
comp29.dcs.hull.ac.uk 1 parallel.q@comp29.dcs.hull.ac.uk <null>
comp11.dcs.hull.ac.uk 1 parallel.q@comp11.dcs.hull.ac.uk <null>
comp09.dcs.hull.ac.uk 1 parallel.q@comp09.dcs.hull.ac.uk <null>

But my Open MPI implementation needs it in this form:
comp00.dcs.hull.ac.uk slots=2 max-slots=2
comp03.dcs.hull.ac.uk slots=2 max-slots=2
comp04.dcs.hull.ac.uk slots=2 max-slots=2
comp05.dcs.hull.ac.uk slots=2 max-slots=2
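
The two listings above differ only in layout, so the conversion can be done inside the job script. The following is a minimal sketch, not a tested cluster recipe: `pe_to_ompi_hostfile` is a hypothetical helper name, and it assumes every line of `$PE_HOSTFILE` has the host name in column 1 and the granted slot count in column 2, as in the listing above. The slot numbers it writes are whatever SGE granted (1 per host here), not the hand-written `slots=2`.

```shell
#!/bin/sh
# Sketch: rewrite an SGE PE host file in Open MPI hostfile syntax.
# pe_to_ompi_hostfile is a hypothetical helper, not part of SGE or Open MPI.
pe_to_ompi_hostfile() {
    # Input line:  <host> <slots> <queue> <processor range>
    # Output line: <host> slots=<slots> max-slots=<slots>
    awk 'NF >= 2 { printf "%s slots=%s max-slots=%s\n", $1, $2, $2 }' "$1"
}

# Demo on the four lines quoted in the post; inside a real job the
# argument would be "$PE_HOSTFILE" instead of this sample file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
comp03.dcs.hull.ac.uk 1 parallel.q@comp03.dcs.hull.ac.uk <null>
comp29.dcs.hull.ac.uk 1 parallel.q@comp29.dcs.hull.ac.uk <null>
comp11.dcs.hull.ac.uk 1 parallel.q@comp11.dcs.hull.ac.uk <null>
comp09.dcs.hull.ac.uk 1 parallel.q@comp09.dcs.hull.ac.uk <null>
EOF
pe_to_ompi_hostfile "$sample"
rm -f "$sample"
```

In the job script this would replace the hand-written file, for example `pe_to_ompi_hostfile "$PE_HOSTFILE" > machine` followed by the existing `mpirun -machinefile machine -np $NSLOTS ...` line.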


Can you please suggest something? If there is any material I should read, please let me know. Any kind of help is appreciated.

I would also like to ask the experts: is this possible with the current code?

Looking forward to your help in this regard.

With warm regards,

Nishant Singh

mighelone February 7, 2008 04:33

Nishant,

To run parallel OpenFOAM jobs under qsub (Torque version) I use the following script:

#!/bin/bash
#PBS -N damBreakFine
#PBS -l nodes=4
CASE=damBreakFine
SOLVER=interFoam

CURDIR=$HOME/OpenFOAM/michele-1.4.1/run/tutorials/interFoam
cd $CURDIR
mpirun --machinefile $PBS_NODEFILE $SOLVER $CURDIR $CASE -parallel

The variable $PBS_NODEFILE holds the path of the file that lists the nodes allocated to the run.

In general, when you submit with qsub you do not know in advance which nodes will be used, so you cannot write the machine file a priori.
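
Under Sun Grid Engine the counterpart of $PBS_NODEFILE is $PE_HOSTFILE, the file printed by `cat $PE_HOSTFILE` in the first script of this thread. The layouts differ: a PBS node file holds one host name per line, repeated once per processor, while the SGE file holds host, slot count, queue and processor range on each line. A sketch of the expansion follows; `pe_to_nodefile` is a hypothetical helper name, and the slot count of 2 in the sample data is made up for illustration.

```shell
#!/bin/sh
# Sketch: expand an SGE PE host file into a PBS-style node list,
# one host name per line, repeated once per granted slot.
# pe_to_nodefile is a hypothetical helper, not part of SGE or Torque.
pe_to_nodefile() {
    awk 'NF >= 2 { for (i = 0; i < $2; i++) print $1 }' "$1"
}

# Demo with illustrative data (the slot count of 2 is an assumption).
sample=$(mktemp)
printf '%s\n' \
    'comp03.dcs.hull.ac.uk 2 parallel.q@comp03.dcs.hull.ac.uk <null>' \
    'comp29.dcs.hull.ac.uk 1 parallel.q@comp29.dcs.hull.ac.uk <null>' > "$sample"
pe_to_nodefile "$sample"
rm -f "$sample"
```

Some SGE parallel environments already write such a list to $TMPDIR/machines, as the commented-out line in the first script hints; whether yours does depends on how the parallel environment was configured.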

Michele

nishant_hull February 7, 2008 08:02

Thanks Michele,

Unfortunately my cluster does not support PBS, as you can see from my script. Can you suggest something that could replace $PBS_NODEFILE in my case? Or is there any way to make the cluster accept a PBS script?

Nishant

mighelone February 7, 2008 09:59

Sorry Nishant,

I didn't notice that you are using Grid Engine as the resource manager rather than Torque.
I'm sorry, but I don't have any experience with qsub on Grid Engine.

Michele

olesen February 7, 2008 10:14

Nishant,

What was so wrong with the old thread ( http://www.cfd-online.com/cgi-bin/OpenFOAM_Discus/show.cgi?1/6504 ) that warranted starting a completely new thread for this discussion?

IMO it gave fairly reasonable information and was not exactly out-of-date.

nishant_hull February 7, 2008 15:52

Hi Mark

Thanks for the reply. In fact, I went through that thread as well, but I could not understand the code in it at first reading. I would appreciate it if you could briefly explain how to run parallel OpenFOAM cases on an SGE cluster using the qsub command. I can see some pieces of code there, but I cannot figure out how to apply them to my case.

Here is what I understand of it. First, I do not really get what this piece of code is doing:

PeHostfile2MachineFile()
{
cat $1 | while read line; do
# echo $line
host=`echo $line|cut -f1 -d" "|cut -f1 -d"."`
nslots=`echo $line|cut -f2 -d" "`
i=1
# while [ $i -le $nslots ]; do
# # add here code to map regular hostnames into ATM hostnames
echo $host cpu=$nslots
# i=`expr $i + 1`
# done
done
}
touch OFmachines
PeHostfile2MachineFile $1 | cat >> OFmachines
mhost=`echo $2|cut -f1 -d"."`
echo $mhost >> mhost
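
To make the snippet above concrete: with its commented-out lines ignored, the function reads the SGE host file, keeps the short host name (everything before the first dot) and the slot count, and prints them as `host cpu=N`, which is the line format LAM's `lamboot` expects in its boot schema. The sketch below restates that logic with shell parameter expansion instead of `cut` and runs it on two lines from the first post. The surrounding `$1`/`$2` handling suggests the snippet was meant to be called with the host file and the master host as arguments, but that is an inference, not something the thread states.

```shell
#!/bin/sh
# Sketch: the same transformation as PeHostfile2MachineFile above,
# written with parameter expansion and quoted variables.
PeHostfile2MachineFile() {
    while read -r fqdn nslots _rest; do
        host=${fqdn%%.*}            # comp03.dcs.hull.ac.uk -> comp03
        echo "$host cpu=$nslots"    # LAM boot-schema line: <host> cpu=<n>
    done < "$1"
}

# Demo on two of the host-file lines quoted earlier in the thread.
sample=$(mktemp)
printf '%s\n' \
    'comp03.dcs.hull.ac.uk 1 parallel.q@comp03.dcs.hull.ac.uk <null>' \
    'comp29.dcs.hull.ac.uk 1 parallel.q@comp29.dcs.hull.ac.uk <null>' > "$sample"
PeHostfile2MachineFile "$sample"
rm -f "$sample"
```

For this input the function prints `comp03 cpu=1` and `comp29 cpu=1`. Since this format is specific to LAM, an Open MPI run would need the `slots=` form instead.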

Again, I do not understand why qFoam-Snippet is required or where to use it, because I am really just looking for a qsub run script. Sorry if I sound very naive.
I understand a bit of the code below, which says:

#!/bin/bash
echo Enter a casename:
read casename
echo "Enter definition WDir:"
read Wdir
#echo Enter Solver :
#read Solver
echo "Number of processors:"
read cpunumb
#
if [ $cpunumb = "1" ]; then
touch Foam-$casename.sh
chmod +x Foam-$casename.sh
echo '#!/bin/bash' >> Foam-$casename.sh
echo '### SGE ###' >> Foam-$casename.sh
echo '#$ -S /bin/sh -j y -cwd' >> Foam-$casename.sh
echo 'read masthost <mhost'>> Foam-$casename.sh
echo 'ssh $masthost "cd $PWD;'SteadyCompFoam' '$Wdir' '$casename' "' >> Foam-$casename.sh
echo 'rm -f OFmachines' >> Foam-$casename.sh
echo 'rm -f mhost' >> Foam-$casename.sh
echo 'rm -f 'Foam-$casename.sh' ' >> Foam-$casename.sh
qsub -pe OFnet $cpunumb -masterq tom02.q,tom03.q,tom04.q,tom05.q,tom06.q,tom22.q,tom23.q,tom24.q,tom25.q Foam-$casename.sh
else
touch Foam-$casename.sh
chmod +x Foam-$casename.sh
echo '#!/bin/bash' >> Foam-$casename.sh
echo '### SGE ###' >> Foam-$casename.sh
echo '#$ -S /bin/sh -j y -cwd' >> Foam-$casename.sh
echo 'read masthost <mhost'>> Foam-$casename.sh
echo 'ssh $masthost "export LAMRSH=ssh;cd $PWD;lamboot -v -s OFmachines"' >> Foam-$casename.sh
echo 'ssh $masthost "cd $PWD;mpirun -np '$cpunumb' 'SteadyCompFoam' '$Wdir' '$casename' -parallel" ' >> Foam-$casename.sh
echo 'ssh $masthost "cd $PWD;lamhalt -d"' >> Foam-$casename.sh
echo 'rm -f OFmachines' >> Foam-$casename.sh
echo 'rm -f mhost' >> Foam-$casename.sh
echo 'rm -f 'Foam-$casename.sh' ' >> Foam-$casename.sh
qsub -pe OFnet $cpunumb -masterq tom02.q,tom03.q,tom04.q,tom05.q,tom06.q,tom22.q,tom23.q,tom24.q,tom25.q Foam-$casename.sh
fi

But I don't see how it can help in my case. What does OFnet mean? Also, it is written for a LAM implementation and I am using Open MPI.

Please suggest how I can proceed here.
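
On the OFnet question: in `qsub -pe OFnet $cpunumb`, OFnet is the name of a parallel environment defined by the administrator of that poster's cluster, and it plays the same role as `mpich` in `#$ -pe mpich 4` in the first script of this thread. The following sketch generates a job script without the LAM steps. It rests on several assumptions: `make_job_script` is a hypothetical helper, the PE name, solver, root path and case name are illustrative arguments, and the solver is started as `solver root case -parallel` as in the Torque script quoted earlier. `$PE_HOSTFILE`, `$NSLOTS` and `$JOB_ID` are variables SGE sets inside a running job.

```shell
#!/bin/sh
# Sketch: print an SGE job script that runs an OpenFOAM solver with
# Open MPI. make_job_script and its arguments are hypothetical.
make_job_script() {
    pe=$1; nslots=$2; solver=$3; root=$4; casename=$5
    # Escaped \$ survives into the job script; plain $ is filled in now.
    cat <<EOF
#!/bin/sh
#\$ -N $casename
#\$ -cwd
#\$ -j y
#\$ -S /bin/bash
#\$ -pe $pe $nslots
# Translate the SGE host list into Open MPI hostfile syntax.
awk '{ print \$1 " slots=" \$2 " max-slots=" \$2 }' "\$PE_HOSTFILE" > machines.\$JOB_ID
mpirun -machinefile machines.\$JOB_ID -np \$NSLOTS $solver $root $casename -parallel
rm -f machines.\$JOB_ID
EOF
}

# Demo: print the script for a 4-slot run (the path is a placeholder).
make_job_script mpich 4 interFoam /path/to/run damBreakFine
```

A possible use is `make_job_script mpich 4 interFoam /path/to/run damBreakFine > job.sh` followed by `qsub job.sh`. The names of the parallel environments configured on a cluster can be listed with `qconf -spl`. If Open MPI was built with Grid Engine support, `mpirun` may be able to pick up the allocation without a machine file at all; that depends on the local installation.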

Nishant


All times are GMT -4.