CFD Online Discussion Forums


bastil August 3, 2009 05:47

Running via SGE and mpich-mx
 
Dear all,

I have built OF 1.5-dev with support for mpich-mx (Myrinet cluster). Submitting a job via SGE fails without any error message: the job simply does not start. I use this script to submit:
Code:

#!/bin/sh

#$ -N OF_0013_mpich_mx
#$ -S /bin/sh
#$ -cwd
#$ -j y
#$ -pe mpi 32

export WM_MPLIB=MPICH-MX
source /opt/OpenFOAM/OpenFOAM-1.5-dev/etc/bashrc
echo $MPI_HOME
which simpleFoam
mpirun -machinefile $TMPDIR/machines -np $NSLOTS simpleFoam  -parallel </dev/null


The MPICH benchmarks run without problems with this script, but simpleFoam does not. Any ideas?

gschaider August 3, 2009 11:01

Quote:

Originally Posted by bastil (Post 225048)
<snipped>

I'm not sure whether your compute nodes automatically know the PATH and where to find the libraries. Try adding this to the mpirun command:
-x PATH -x LD_LIBRARY_PATH

In my setup I also pass WM_PROJECT_DIR, FOAM_MPI_LIBBIN and MPI_ARCH_PATH, but I'm not 100% sure whether these are required (as long as it works, I see no harm in adding them).

If that doesn't work for you, have a look at the *.oXXX and *.poXXX files that SGE gives you and share the error messages from them with us.

Bernhard
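
One way to sidestep the question of which variables reach the remote processes is to let every process load its own environment through a small wrapper, rather than relying on mpirun to forward it. What follows is only a sketch under assumptions: write_foam_wrapper and the wrapper file name are made up for illustration, and the bashrc path in the usage comment is the one from the submit script. Note also that -x is the Open MPI spelling for exporting a variable; whether the MPICH-MX mpirun accepts it is worth checking in its own help output.

```shell
#!/bin/sh
# Sketch: generate a wrapper that loads the OpenFOAM environment on whichever
# node it is started, then replaces itself with the real command.
# write_foam_wrapper is a hypothetical helper, not part of OpenFOAM.
write_foam_wrapper() {
    wrapper=$1      # where to write the wrapper (should be on a shared filesystem)
    foam_rc=$2      # environment file to load, e.g. .../etc/bashrc
    cat > "$wrapper" <<EOF
#!/bin/bash
# Generated wrapper: load the environment locally, then run the command.
export WM_MPLIB=MPICH-MX
# Sourced inside a function called without arguments, so the environment
# file does not see the solver's command-line arguments.
load_env() { . "$foam_rc"; }
load_env
exec "\$@"
EOF
    chmod +x "$wrapper"
}

# Possible use inside the job script (not executed here):
#   write_foam_wrapper "$PWD/foamWrap.sh" /opt/OpenFOAM/OpenFOAM-1.5-dev/etc/bashrc
#   mpirun -machinefile $TMPDIR/machines -np $NSLOTS "$PWD/foamWrap.sh" simpleFoam -parallel
```

The design choice here is that nothing depends on launcher-specific export options: each rank sources the same file the submit script sources, so PATH and LD_LIBRARY_PATH should match on every node as long as the installation path is visible there.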

bastil August 4, 2009 05:27

Thanks Bernhard for your help. I am still struggling.

Quote:

Originally Posted by gschaider (Post 225087)
If that doesn't work for you, have a look at the *.oXXX and *.poXXX files that SGE gives you and share the error messages from them with us.

Here is my submit script:

Code:

#!/bin/sh
#$ -N OF_0013_mpich_mx
#$ -S /bin/sh
#$ -cwd
#$ -j y
#$ -pe mpi 32

export WM_MPLIB=MPICH-MX
source /opt/OpenFOAM/OpenFOAM-1.5-dev/etc/bashrc
echo $MPI_HOME
which simpleFoam
mpirun -machinefile $TMPDIR/machines -np $NSLOTS `which simpleFoam` /run/brblo OF_0013_mpich_mx -parallel -x PATH -x LD_LIBRARY_PATH < /dev/null

This is the *.oXXX file:
Code:

/opt/OpenFOAM/ThirdParty/mpich-mx-1.2.7..5
/opt/OpenFOAM/OpenFOAM-1.5-dev/applications/bin/linux64GccDPOpt/simpleFoam

And this is the *.poXXX file:
Code:

/opt/sge/default/spool/node006/active_jobs/5519.1/pe_hostfile
node006
node006
node006
node006
node030
node030
node030
node030
node003
node003
node003
node003
node026
node026
node026
node026
node027
node027
node027
node027
node004
node004
node004
node004
node025
node025
node025
node025
node020
node020
node020
node020
rm: cannot remove `/tmp/5519.1.all.q/rsh': No such file or directory


The job does not even seem to start at all.
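
One thing that stands out in the mpirun line of this submit script is where the options sit. Launchers of the mpirun family generally treat everything after the executable name as arguments to the program, so -x PATH -x LD_LIBRARY_PATH placed after -parallel would probably be handed to simpleFoam instead of mpirun. Below is a sketch of a dry-run helper that keeps the two groups apart; build_launch_line and LAUNCH_OPTS are invented names, and the helper only prints the command line.

```shell
#!/bin/sh
# Sketch: assemble the launch line so that launcher options come before the
# executable and solver options after it. Prints the command, does not run it.
# build_launch_line and LAUNCH_OPTS are hypothetical names.
build_launch_line() {
    machinefile=$1
    nprocs=$2
    solver=$3
    shift 3
    # LAUNCH_OPTS holds extra options for mpirun itself. It is left unquoted
    # on purpose so that several words become several options.
    echo mpirun -machinefile "$machinefile" -np "$nprocs" ${LAUNCH_OPTS:-} "$solver" "$@"
}

# Example (prints one line, runs nothing):
#   LAUNCH_OPTS="-x PATH -x LD_LIBRARY_PATH"
#   build_launch_line "$TMPDIR/machines" "$NSLOTS" simpleFoam -parallel
```

The positional /run/brblo OF_0013_mpich_mx pair also looks like the older root/case calling convention. If this build expects -case instead (an assumption to verify with the solver's -help output), that would be a second thing to adjust.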

gschaider August 4, 2009 13:56

Quote:

Originally Posted by bastil (Post 225187)
Thanks Bernhard for your help. I am still struggling.

<snipped>

The Job does not even seem to start at all.

"Does not seem to start at all": what does that mean? Is the script stuck in the queue, or does simpleFoam/mpirun not run?

No idea what could be the cause.

Just some tips for further testing:
- Use qlogin to test it interactively. Be aware that there are subtle differences between the environment you get with qlogin and the one you get with qsub.
- Use fewer nodes and check whether the problem also occurs when all processes run on one node (if not, you might have an rsh/ssh problem).
- Do the MPI demo programs run with the same script?

Bernhard
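
For a job that ends without any message, two cheap checks can go into the job script before digging further: record the exit status of the launch, and look for unresolved shared libraries in the solver binary. This is a sketch only; run_and_report and missing_libs are made-up helper names, and missing_libs merely filters ldd-style text on stdin for the "not found" marker.

```shell
#!/bin/sh
# Sketch of two diagnostics for a job that ends without any message.
# Both function names are hypothetical.

# Run a command, then print its exit status so it lands in the job output.
run_and_report() {
    "$@"
    status=$?
    echo "command '$1' finished with exit status $status"
    return $status
}

# Read ldd-style output on stdin and print the names of the libraries
# that the dynamic loader could not resolve.
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Possible use in the job script (not executed here):
#   ldd "$(which simpleFoam)" | missing_libs
#   run_and_report mpirun -machinefile $TMPDIR/machines -np $NSLOTS simpleFoam -parallel
```

Running the ldd check on a compute node instead of the head node is the more telling variant, since that is where the environment may differ.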

bastil August 4, 2009 15:33

Quote:

Originally Posted by gschaider (Post 225246)
"Does not seem to start at all": what does that mean? Is the script stuck in the queue, or does simpleFoam/mpirun not run?

No, it leaves the queue immediately, producing the o and po files but no further output. It does not seem to do any real work...

Quote:

- MPI-demo programs run with the same script?

Yes, without any problem.

gschaider August 5, 2009 08:01

Quote:

Originally Posted by bastil (Post 225259)
Yes, without any problem.

And what happens if you run your script (with simpleFoam) in a qlogin shell (you'll have to build the machine file by hand)?
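
Building the machine file by hand for such a test can be scripted. Here is a sketch, assuming the usual SGE hostfile layout of one line per host, with the host name in the first column and the slot count in the second. hostfile_to_machines is a made-up name, and the one-line-per-slot output mirrors the node list in the *.poXXX output earlier in the thread.

```shell
#!/bin/sh
# Sketch: expand an SGE-style hostfile ("host slots ..." per line) into a
# machine file with one line per slot.
# hostfile_to_machines is a hypothetical helper name.
hostfile_to_machines() {
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$1"
}

# Possible use in a qlogin shell (not executed here); the two-node host list
# is written by hand for a small test:
#   printf 'node006 4\nnode030 4\n' > myhosts
#   hostfile_to_machines myhosts > machines
#   mpirun -machinefile machines -np 8 simpleFoam -parallel
```

Starting with a single host line and then adding a second one separates a problem inside one node from an rsh/ssh problem between nodes.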


All times are GMT -4.