Running via SGE and mpich-mx

#1 | BastiL (Senior Member) | August 3, 2009, 05:47
Dear all,

I have built OF1.5-dev with support for mpich-mx (Myrinet cluster). Submitting a job via SGE fails without any error message: the job simply does not start. I use this script to submit:
Code:
#!/bin/sh

#$ -N OF_0013_mpich_mx
#$ -S /bin/sh
#$ -cwd
#$ -j y
#$ -pe mpi 32

export WM_MPLIB=MPICH-MX
source /opt/OpenFOAM/OpenFOAM-1.5-dev/etc/bashrc
echo $MPI_HOME
which simpleFoam
mpirun -machinefile $TMPDIR/machines -np $NSLOTS simpleFoam  -parallel </dev/null

The MPICH benchmarks run with this script without problems, but simpleFoam does not. Any ideas?

#2 | Bernhard Gschaider (Assistant Moderator) | August 3, 2009, 11:01
I'm not sure whether your machines automatically know the PATH and where to find the libraries. Try adding this to the mpirun call:
-x PATH -x LD_LIBRARY_PATH

In my setup I also pass WM_PROJECT_DIR, FOAM_MPI_LIBBIN and MPI_ARCH_PATH, but I'm not 100% sure whether these are required (as long as it works, I see no harm in adding them).

If that doesn't work for you, have a look at the *.oXXX and *.poXXX files that SGE gives you and share the error message with us.

Bernhard
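
One way to make a silent failure like this visible is to record the environment and the exit status of mpirun in the job output. The following is only a minimal sketch, assuming a POSIX shell; `run_logged` is a hypothetical helper name and is not part of SGE, MPICH-MX or OpenFOAM:

```shell
#!/bin/sh
# run_logged: hypothetical helper. Runs the given command and reports its
# exit status on stderr, so the status shows up in the SGE output files.
run_logged() {
    echo "running: $*" >&2
    "$@"
    status=$?
    echo "exit status: $status" >&2
    return "$status"
}

# Print the variables that the solver and its libraries are looked up with.
echo "PATH=$PATH"
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"
```

In the submit script, the mpirun line would then simply be prefixed with `run_logged`. With `#$ -j y`, as used in the script in this thread, stderr is merged into the job's standard output file, so the two status lines should appear next to the output of `which simpleFoam`.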

#3 | BastiL (Senior Member) | August 4, 2009, 05:27
Thanks Bernhard for your help. I am still struggling.

Quote:
Originally Posted by gschaider
If that doesn't work for you, have a look at the *.oXXX and *.poXXX files that SGE gives you and share the error message with us.

Here is my submit script:

Code:
#!/bin/sh
#$ -N OF_0013_mpich_mx
#$ -S /bin/sh
#$ -cwd
#$ -j y
#$ -pe mpi 32

export WM_MPLIB=MPICH-MX
source /opt/OpenFOAM/OpenFOAM-1.5-dev/etc/bashrc
echo $MPI_HOME
which simpleFoam
mpirun -machinefile $TMPDIR/machines -np $NSLOTS `which simpleFoam` /run/brblo OF_0013_mpich_mx -parallel -x PATH -x LD_LIBRARY_PATH < /dev/null
This is the *.oXXX file:
Code:
/opt/OpenFOAM/ThirdParty/mpich-mx-1.2.7..5
/opt/OpenFOAM/OpenFOAM-1.5-dev/applications/bin/linux64GccDPOpt/simpleFoam
And this is the *.poXXX file:
Code:
/opt/sge/default/spool/node006/active_jobs/5519.1/pe_hostfile
node006
node006
node006
node006
node030
node030
node030
node030
node003
node003
node003
node003
node026
node026
node026
node026
node027
node027
node027
node027
node004
node004
node004
node004
node025
node025
node025
node025
node020
node020
node020
node020
rm: cannot remove `/tmp/5519.1.all.q/rsh': No such file or directory

The job does not seem to start at all.

#4 | Bernhard Gschaider (Assistant Moderator) | August 4, 2009, 13:56
Quote:
Originally Posted by bastil View Post
Thanks Bernhard for your help. I am still struggling.

<snipped>

The job does not seem to start at all.

"Does not seem to start at all": what does that mean? Is the script hanging in the queue, or does simpleFoam/mpirun not run?

No idea what could be the cause.

Just some tips for further testing:
- Use qlogin to test it interactively. Be aware that there are subtle differences between the environment you get with qlogin and the one you get with qsub.
- Use fewer nodes and try to find out whether the problem also occurs when all processes are on one node (if not, then you might have an rsh/ssh problem).
- Do the MPI demo programs run with the same script?

Bernhard
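
To check whether a test run really stays on a single node, the machinefile can be summarised per host. A minimal sketch, assuming the machinefile has one host name per line, repeated once per slot, as in the *.poXXX listing in this thread; `slots_per_host` is a hypothetical helper name:

```shell
#!/bin/sh
# slots_per_host: hypothetical helper. Reads a machinefile (one host name per
# line, repeated once per slot) and prints "host count" pairs sorted by host.
# Empty lines are ignored.
slots_per_host() {
    awk 'NF { count[$1]++ } END { for (h in count) print h, count[h] }' "$1" | sort
}
```

Inside a job this could be called as `slots_per_host $TMPDIR/machines`. For the host list posted above it would report eight hosts with four slots each, so a single-node test needs a much smaller slot request.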

#5 | BastiL (Senior Member) | August 4, 2009, 15:33
Quote:
Originally Posted by gschaider
"Does not seem to start at all": what does that mean? Is the script hanging in the queue, or does simpleFoam/mpirun not run?

No, it leaves the queue immediately. It produces the o and po files but no further output, and it does not seem to do any real work.

Quote:
- MPI-demo programs run with the same script?
Yes, without any problem.

#6 | Bernhard Gschaider (Assistant Moderator) | August 5, 2009, 08:01
Quote:
Originally Posted by bastil View Post
Yes, without any problem.
And what happens when you run your script (with simpleFoam) in a qlogin shell (you'll have to build the machine file by hand)?
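
Building the machine file by hand can be scripted from a short list of hosts and slot counts. A minimal sketch, assuming input lines of the form "host slots ..." (the first two columns of an SGE pe_hostfile, which is an assumption about this installation); `expand_hosts` and the file names in the usage note are hypothetical:

```shell
#!/bin/sh
# expand_hosts: hypothetical helper. Reads lines of the form "host slots ..."
# and prints each host name once per slot, which is the layout of the
# machinefile listing posted earlier in this thread.
expand_hosts() {
    awk 'NF >= 2 { for (i = 0; i < $2; i++) print $1 }' "$1"
}
```

For example, `expand_hosts hosts.txt > machines` would write a file that can be passed to `mpirun -machinefile machines`, with `-np` set to the number of lines in it.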
