Sun Grid Engine
(CFD Online forum: OpenFOAM Running, Solving & CFD)

#1 | January 28, 2008, 09:19 | Gavin Tabor (Senior Member)
Dear All,

I'm starting to use OpenFOAM on a new machine. Does anyone have any experience with using OpenFOAM with Sun Grid Engine? Comments on this would be useful; submission scripts would be _really_ useful.

Gavin

#2 | January 28, 2008, 09:40 | Mark Olesen (Senior Member)
Getting OpenFOAM working with OpenMPI and GridEngine is okay (much, much better than trying to get LAM working).

1. Check that the OPAL_PREFIX is properly set by your Foam installation.

2. If the OpenFOAM settings are not sourced within your bashrc/cshrc, or you are using sh/ksh, the job script itself should source them.

I've attached a script snippet qFoam-snippet that should help get you going.
The snippet CANNOT be used as is. I'd rather not send the entire script, since there are a number of interdependencies with our site-specific scripting and it would likely be too confusing anyhow.

Since I have it set up to run in the cwd, there is no need to pass the root/case information to the script, but you do need to tell it which application should run.

You will not only need site-specific changes, you will also notice funny-looking "%{STUFF}" constructs throughout. These placeholders are replaced with the appropriate environment variables to create the final job script. There are also some odd bits with an "etc/" directory. This is simply a link to the appropriate OpenFOAM-VERSION/.OpenFOAM-VERSION directory.
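
For reference, a minimal job script along the lines described above might look like the sketch below. This is only an illustration, not the qFoam script itself: the installation path, the parallel environment name "openmpi", the solver and the case name are all placeholder assumptions that must match your site.

#!/bin/bash
# Sketch of an SGE job script for OpenFOAM with a tightly integrated OpenMPI.
#$ -S /bin/bash
#$ -cwd            # run in the directory the job was submitted from
#$ -j y            # merge stdout and stderr
#$ -N foamJob

# The batch shell is not interactive, so source the OpenFOAM settings here
# (the path is an assumption; use your own installation).
source $HOME/OpenFOAM/OpenFOAM-1.4.1/.OpenFOAM-1.4.1/bashrc

# With tight OpenMPI/GridEngine integration, mpirun obtains the host list and
# slot count from GridEngine, so no -np or -machinefile is needed.
mpirun simpleFoam . caseName -parallel

It would be submitted with something like "qsub -pe openmpi 4 foamJob.sh"; the PE name and slot count depend on how your queues are configured.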

#3 | January 28, 2008, 10:00 | Luca M. (Member)
Hi Gavin,

I use the SGE job scheduler with OpenFOAM on our cluster. I wrote these rules:



PeHostfile2MachineFile()
{
    # Convert the SGE PE hostfile ($1) into OpenFOAM machine entries,
    # one "host cpu=<slots>" line per node.
    cat $1 | while read line; do
        # echo $line
        host=`echo $line | cut -f1 -d" " | cut -f1 -d"."`
        nslots=`echo $line | cut -f2 -d" "`
        i=1
        # while [ $i -le $nslots ]; do
        #     # add here code to map regular hostnames into ATM hostnames
        echo $host cpu=$nslots
        #     i=`expr $i + 1`
        # done
    done
}

touch OFmachines
PeHostfile2MachineFile $1 | cat >> OFmachines

# Record the master host (first field of $2, without the domain part)
mhost=`echo $2 | cut -f1 -d"."`
echo $mhost >> mhost
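
Rules like these are normally wired into a GridEngine parallel environment as its start script. A hypothetical PE definition is sketched below; the slot count, script path and boolean values are assumptions (they are typical of a loose, ssh-based integration such as this LAM setup), and only the PE name OFnet and the $pe_hostfile/$host arguments follow from the scripts in this post.

# Hypothetical PE definition, e.g. created with "qconf -ap OFnet" and
# inspected with "qconf -sp OFnet":
pe_name            OFnet
slots              64
user_lists         NONE
xuser_lists        NONE
start_proc_args    /opt/sge/scripts/startFoam.sh $pe_hostfile $host
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     FALSE
job_is_first_task  TRUE
# A tight OpenMPI integration would typically flip control_slaves to TRUE
# and job_is_first_task to FALSE.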




together with this batch script, which creates and submits the SGE job script:




#!/bin/bash
# Interactive helper: asks for the case details, writes an SGE job script
# (Foam-<casename>.sh) and submits it with qsub.
echo "Enter a casename:"
read casename
echo "Enter definition WDir:"
read Wdir
#echo "Enter Solver:"
#read Solver
echo "Number of processors:"
read cpunumb
#
if [ $cpunumb = "1" ]; then
    touch Foam-$casename.sh
    chmod +x Foam-$casename.sh
    echo '#!/bin/bash' >> Foam-$casename.sh
    echo '### SGE ###' >> Foam-$casename.sh
    echo '#$ -S /bin/sh -j y -cwd' >> Foam-$casename.sh
    echo 'read masthost < mhost' >> Foam-$casename.sh
    echo 'ssh $masthost "cd $PWD; 'SteadyCompFoam' '$Wdir' '$casename'"' >> Foam-$casename.sh
    echo 'rm -f OFmachines' >> Foam-$casename.sh
    echo 'rm -f mhost' >> Foam-$casename.sh
    echo 'rm -f 'Foam-$casename.sh >> Foam-$casename.sh
    qsub -pe OFnet $cpunumb -masterq tom02.q,tom03.q,tom04.q,tom05.q,tom06.q,tom22.q,tom23.q,tom24.q,tom25.q Foam-$casename.sh
else
    touch Foam-$casename.sh
    chmod +x Foam-$casename.sh
    echo '#!/bin/bash' >> Foam-$casename.sh
    echo '### SGE ###' >> Foam-$casename.sh
    echo '#$ -S /bin/sh -j y -cwd' >> Foam-$casename.sh
    echo 'read masthost < mhost' >> Foam-$casename.sh
    echo 'ssh $masthost "export LAMRSH=ssh; cd $PWD; lamboot -v -s OFmachines"' >> Foam-$casename.sh
    echo 'ssh $masthost "cd $PWD; mpirun -np '$cpunumb' 'SteadyCompFoam' '$Wdir' '$casename' -parallel"' >> Foam-$casename.sh
    echo 'ssh $masthost "cd $PWD; lamhalt -d"' >> Foam-$casename.sh
    echo 'rm -f OFmachines' >> Foam-$casename.sh
    echo 'rm -f mhost' >> Foam-$casename.sh
    echo 'rm -f 'Foam-$casename.sh >> Foam-$casename.sh
    qsub -pe OFnet $cpunumb -masterq tom02.q,tom03.q,tom04.q,tom05.q,tom06.q,tom22.q,tom23.q,tom24.q,tom25.q Foam-$casename.sh
fi


This works with the LAM MPI libraries. You can submit the job, but at the moment you have to stop your calculation via the controlDict rather than via the qmon interface.

We can start from this to develop a better one.

Luca

#4 | January 30, 2008, 05:30 | Gavin Tabor (Senior Member)
Dear Luca, Mark,

Thanks for your scripts - I can kind of make sense of them!! I've managed to get things running now for single-processor jobs using a simplified version of what you suggest.

For the parallel running case: am I right that $nslots is a variable giving the number of processors allocated for the parallel run? How is this set in SGE?

Gavin

#5 | January 30, 2008, 07:39 | Mark Olesen (Senior Member)
The $NSLOTS variable (all uppercase) is set by GridEngine. The qsub manpage is the best starting point for finding out more about which environment variables are used.

Based on personal experience, I would really try to avoid LAM with GridEngine and use OpenMPI instead.

BTW: killing the job via qdel (or qmon) works fine (it doesn't leave around any half-dead processes), but obviously won't have OpenFOAM write results before exiting.

Using the '-notify' option for qsub would give you a chance to trap the signals. But apart from some OpenMPI issues in the past, it is not certain that a particular OpenFOAM solver could finish its iteration *and* write the results before the true kill signal gets sent. Increasing the notify period before pulling the plug may not be the correct answer either.

For the moment, I've modified a few solvers to recognize the presence of an 'ABORT' file and, if it exists, write the results and quit. This is usually quite a bit easier than modifying the controlDict.
I think there is another solution, but I still need to think about it a bit.

/mark
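
As an illustration of the -notify idea (a sketch only, not Mark's actual setup: it assumes a solver that has been locally modified to watch for an ABORT file, as described above), the job script could trap the warning signal that GridEngine sends ahead of the kill:

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -j y
#$ -notify        # qdel then sends SIGUSR2 before the real kill signal

source $HOME/OpenFOAM/OpenFOAM-1.4.1/.OpenFOAM-1.4.1/bashrc

# On the warning signal, drop an ABORT file where the (locally modified)
# solver looks for it, so it can write its fields and exit cleanly.
trap 'touch ABORT' USR2

# Run the solver in the background so the shell can handle the trap while the
# solver is still running; if the first wait is interrupted by the signal,
# wait again for the solver to finish writing.
mpirun simpleFoam . caseName -parallel &
pid=$!
wait $pid || wait $pid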

#6 | February 7, 2008, 12:32 | nicolas (Member)
hello,

I am also trying to use qsub to run in parallel using 4 cpus.

My command is:

qsub -q queue_name.q sge_script.sh

in sge_script.sh:

source /net/c3m/opt/OpenFOAM/OpenFOAM-1.4.1/.OpenFOAM-1.4.1/bashrc
source /net/c3m/opt/OpenFOAM/OpenFOAM-1.4.1/.bashrc
mpirun -np 4 simpleFoam .. case -parallel


For some reason this does not work.

I get this output:
error: executing task of job 25585 failed:
[c4n26:16009] ERROR: A daemon on node c4n26 failed to start as expected.
[c4n26:16009] ERROR: There may be more information available from
[c4n26:16009] ERROR: the 'qstat -t' command on the Grid Engine tasks.
[c4n26:16009] ERROR: If the problem persists, please restart the
[c4n26:16009] ERROR: Grid Engine PE job
[c4n26:16009] ERROR: The daemon exited unexpectedly with status 1.

This occurs only if I use mpirun; if I use the same command in serial (simpleFoam .. case) it works OK.

Also, if I ssh into the node and start the script there, it runs OK.

Nicolas
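
A generic way to narrow down this kind of "works over ssh, fails under SGE" problem is to submit a throwaway job that just dumps the environment the batch shell really sees, then compare it with an interactive session on the same node. The sketch below is only an illustration; the queue name is a placeholder.

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -j y
# Print what the batch environment actually provides.
hostname
which mpirun
which simpleFoam
echo "PATH=$PATH"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"

Submitted with, for example: qsub -q queue_name.q checkEnv.sh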

#7 | February 8, 2008, 03:10 | Mark Olesen (Senior Member)
Nishant,

I am re-directing your thread ( http://www.cfd-online.com/cgi-bin/Op...how.cgi?1/6598 ) to here, since this is where the relevant information is.

If you read the thread, you'll notice that my response (with the qFoam snippet) addressed running with OpenMPI, whereas the information from Luca was for LAM. If you are not using LAM, then you don't need any of that stuff and don't need to worry about it.

The qFoam snippet is a template run script. The '%{STUFF}' placeholders must be replaced with the relevant information before it can be submitted with the usual qsub -pe NAME slots.

How exactly you wish to use the template to create your job script is left to you. Some people might want an interactive solution (like Luca showed); others might want to wrap it with Perl, Python or Ruby. We generally use Perl to create the final shell script and feed it to qsub via stdin.

From your original question, you mentioned using something like "mpirun -machinefile machine -np 4 case root etc". Why do you want to generate a machine file and specify the number of processes yourself? That is exactly what the OpenMPI and GridEngine integration takes care of, and you are ignoring it.

As you can also see from the qFoam snippet, there is no need to use -machinefile or -np when using OpenMPI and GridEngine. All the bits are already done for you. Have you already consulted your site support people?
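
In other words, with tight OpenMPI/GridEngine integration the parallel part of the job script can be as short as the sketch below; the PE name, solver and case are placeholders, and the slot count appears only on the qsub command line.

# Inside the job script: no -np, no -machinefile; mpirun asks GridEngine.
mpirun solverName . caseName -parallel

# Submission (PE name and slot count are site-specific):
#   qsub -pe openmpi 4 jobScript.sh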

#8 | February 9, 2008, 07:40 | Nishant (Senior Member)
Thanks Mark.
I hope this will help. I will update you soon.

Nishant

#9 | February 9, 2008, 11:42 | Nishant (Senior Member)
Hi Mark,

I edited my qfoam-snippet.sh file as below and ran it. The output reports errors at the line containing __DATA__ and at lines 25-26.
Please see my file and suggest the required edits.

17 rootName=interFoam
18 caseName=$PWD/dam-dumy
19 jobName=$caseName
20 # avoid generic names
21 case "$jobName" in
22 foam | OpenFOAM )
23 jobName=$(dirname $PWD)
24 jobName=$(basename $jobName)
25 ;;
26 ecas
27
28 # ----------------------------------------
29 # OpenFOAM (re)initialization
30 #
31 unset FOAM_SILENT
32 FOAM_INST_DIR=$HOME/$WM_PROJECT
33 FOAM_ETC=$WM_PROJECT_DIR/.OpenFOAM-1.4.1
34
35 # source based on parallel environment
36 for i in $FOAM_ETC/bashrc-$PE $FOAM_ETC/bashrc

qfoam-snippet.txt

#10 | February 9, 2008, 14:25 | Nishant (Senior Member)
Sorry, but the error posted in my last message comes when I try running on a single processor, using qsub qfoam-snippet.sh.
When I try running it on 4 processors, using qsub -pe qfoam-snippet.sh 4 with the modified attached script, the job doesn't get submitted on the cluster.
Please see the script.

qfoam-snippet.txt

#11 | February 11, 2008, 03:07 | Mark Olesen (Senior Member)
The __DATA__ is a remnant from the original Perl wrapper and should be deleted.
The "ecas" is a typo from a last minute edit and should obviously be "esac".
The snippet was meant to give an idea of what to do, not to provide a finished solution.

#12 | February 11, 2008, 15:24 | Nishant (Senior Member)
Now the error has changed to:

(EE) /.OpenFOAM-1.4.1/bashrc cannot be found

Actually I sourced my script from .bashrc. How can I make it run on the cluster now?

Nishant

#13 | February 14, 2008, 04:15 | nicolas (Member)
Any ideas on my problem described above?

Thanks,

Nicolas

#14 | February 15, 2008, 13:27 | nicolas (Member)
Hello,

I finally fixed my problem.

'mpirun' should be replaced by 'mpirun -prefix $OPENMPI_ARCH_PATH'

Nicolas
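
For completeness, a corrected job script along those lines might look like the sketch below; it assumes, as the post above suggests, that $OPENMPI_ARCH_PATH is defined by the sourced OpenFOAM settings.

source /net/c3m/opt/OpenFOAM/OpenFOAM-1.4.1/.OpenFOAM-1.4.1/bashrc
source /net/c3m/opt/OpenFOAM/OpenFOAM-1.4.1/.bashrc
# --prefix tells the remote orted daemons where the Open MPI installation lives.
mpirun --prefix $OPENMPI_ARCH_PATH -np 4 simpleFoam .. case -parallel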

#15 | February 20, 2008, 12:15 | Nishant (Senior Member)
Hi,

I am using the mpich parallel environment ($PE mpich) for running the parallel damBreak problem on 4 processors. However I am getting this error:
Got 4 processors.
Machines:

mpirun --prefix ~/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/ -np 4 -machinefile /tmp/802.1.parallel.q/machines interFoam . dam-dumy -parallel
[comp20:32553] mca: base: component_find: unable to open paffinity linux: file not found (ignored)
[comp20:32553] mca: base: component_find: unable to open ns proxy: file not found (ignored)
[comp20:32553] mca: base: component_find: unable to open ns replica: file not found (ignored)
[comp20:32553] mca: base: component_find: unable to open errmgr hnp: file not found (ignored)
[comp20:32553] mca: base: component_find: unable to open errmgr orted: file not found (ignored)
[comp20:32553] mca: base: component_find: unable to open errmgr proxy: file not found (ignored)
[comp20:32553] mca: base: component_find: unable to open rml oob: file not found (ignored)
[comp20:32553] mca: base: component_find: unable to open gpr null: file not found (ignored)
[comp20:32553] mca: base: component_find: unable to open gpr proxy: file not found (ignored)
[comp20:32553] mca: base: component_find: unable to open gpr replica: file not found (ignored)
[comp20:32553] mca: base: component_find: unable to open sds env: file not found (ignored)
[comp20:32553] mca: base: component_find: unable to open sds pipe: file not found (ignored)
[comp20:32553] mca: base: component_find: unable to open sds seed: file not found (ignored)
[comp20:32553] mca: base: component_find: unable to open sds singleton: file not found (ignored)
[comp20:32553] mca: base: component_find: unable to open sds slurm: file not found (ignored)
[comp20:32553] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 214
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

orte_sds_base_select failed
--> Returned value -13 instead of ORTE_SUCCESS

--------------------------------------------------------------------------
[comp20:32553] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
[comp20:32553] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
--------------------------------------------------------------------------
Open RTE was unable to initialize properly. The error occured while
attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.

can anybody help?

#16 | February 21, 2008, 07:52 | Nishant (Senior Member)
After providing the path to the relevant OpenFOAM library files, I am now getting this error:

Got 4 processors.
Machines:
[comp30:06445] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 214
--------------------------------------------------------------------------
Sorry! You were supposed to get help about:
orte_init:startup:internal-failure
from the file:
help-orte-runtime
But I couldn't find any file matching that name. Sorry!
--------------------------------------------------------------------------
[comp30:06445] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
[comp30:06445] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
--------------------------------------------------------------------------
Sorry! You were supposed to get help about:
orterun:init-failure
from the file:
help-orterun.txt
But I couldn't find any file matching that name. Sorry!
--------------------------------------------------------------------------


can anybody comment on it?

nishant

#17 | February 21, 2008, 08:02 | Mark Olesen (Senior Member)
If OPAL_PREFIX is set, the file should be found.
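
For example, it could be exported in the job script as in the sketch below; the path shown is the one used elsewhere in this thread and is an assumption for any other installation.

# OPAL_PREFIX tells Open MPI where its installation (help files, MCA
# components, binaries) lives, so a relocated install can find its files.
export OPAL_PREFIX=$HOME/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt
export PATH=$OPAL_PREFIX/bin:$PATH
export LD_LIBRARY_PATH=$OPAL_PREFIX/lib:$LD_LIBRARY_PATH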

#18 | February 21, 2008, 09:08 | Nishant (Senior Member)
My script file contains these lines:

#!/bin/sh
#
# Your job name
#$ -N OMPI_Dumy
#
# Use current working directory
#$ -cwd
#
# Join stdout and stderr
#$ -j y
#
# pe request for MPICH. Set your number of processors here.
# Make sure you use the "mpich" parallel environment.
#$ -pe mpich 4
#
# Run job through bash shell
#$ -S /bin/bash
#
# The following is for reporting only. It is not really needed
# to run the job. It will show up in your output file.
echo "Got $NSLOTS processors."
echo "Machines:"
#
# These exports needed for OpenMPI, when using the command line
# these are set with the modules command, but with SGE scripts
# we assume nothing!
export PATH=~/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/bin:$PATH
export LD_LIBRARY_PATH=~/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/lib:$PATH

# Use full pathname to make sure we are using the right mpirun
# Need to use prefix for nodes
~/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/bin/mpirun --prefix ~/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/ -np $NSLOTS -machinefile ~/.mpich/mpich_hosts.$JOB_ID interFoam . dam-dumy -parallel


Did I set the OPAL_PREFIX right?

Nishant

#19 | February 21, 2008, 09:12 | Nishant (Senior Member)
I think I am doing something wrong with the path of the machinefile/hostfile, aren't I? The current path is the MPICH hostfile path. Should it be OpenFOAM's Open MPI path? Can you tell me something about that?

Nishant

#20 | February 25, 2008, 08:59 | Nishant (Senior Member)
While trying to debug the problem on my SGE cluster, I used the ompi_info command to investigate.

The error registered by ompi_info is:

ompi_info: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory

Can anybody tell me what's going wrong now?

Nishant
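
That message means the compute node cannot resolve the C++ runtime library that ompi_info was linked against. A generic check (not specific to this cluster) is to run ldd on the binary on the failing node and compare LD_LIBRARY_PATH there with an interactive session:

# On the compute node (e.g. via qrsh or ssh):
ldd ~/OpenFOAM/OpenFOAM-1.4.1/src/openmpi-1.2.3/platforms/linux64GccDPOpt/bin/ompi_info | grep 'not found'
echo $LD_LIBRARY_PATH
# libstdc++.so.6 normally comes from the system gcc (or a gcc installed
# alongside OpenFOAM); its directory must be on LD_LIBRARY_PATH for batch
# jobs too, not just for interactive shells.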