All processes end up on the same node when submitting a parallel job through SGE
#1
New Member
Marko Niinimaki
Join Date: Nov 2012
Posts: 3
Rep Power: 2
Dear all,

has anyone seen this kind of problem? Background: OpenFOAM-2.1.x compiled with "Allwmake", Grid Engine GE 6.2u3, Scientific Linux SL release 5.5, on a cluster of 224 cores in 20-something nodes.

Run by hand, the following distributes the task nicely over many nodes:
mpirun -np 64 --machinefile machines simpleFoam -parallel
Slaves : 63 ( "node015.374" "node016.2178" ..

But submitting the same task through SGE leads to a situation where _all_ the processes are on a single node:
mpirun -np $NSLOTS --machinefile $TMPDIR/machines simpleFoam -parallel

The "machines" file generated by SGE lists several different nodes (node015, node016, ..), but simpleFoam always starts all of its processes on a single node. Is there something I should check? mpirun is from the ThirdParty package.
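A quick first check is whether the machinefile SGE wrote really lists several hosts, and how many lines (slots) each host got. The sketch below assumes the common SGE layout of one hostname per line, repeated once per slot; `summarize_machinefile` and `count_hosts` are hypothetical helper names, not part of SGE or Open MPI.

```bash
#!/bin/bash
# Hypothetical helpers for checking an SGE-generated machinefile.
# Assumption: one hostname per line, repeated once per slot. Blank lines and
# lines starting with '#' are skipped; extra fields after the hostname are ignored.

# Print "<count> <host>": how many lines mention each host
# (with one line per slot, this is the slot count).
summarize_machinefile() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1" | sort | uniq -c
}

# Print the number of distinct hosts in the machinefile.
count_hosts() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1" | sort -u | wc -l
}
```

Inside the job script, `summarize_machinefile "$TMPDIR/machines"` shows what SGE allocated, while `mpirun -np $NSLOTS --machinefile $TMPDIR/machines hostname | sort | uniq -c` shows where processes were actually started. If the first lists many nodes and the second only one, the allocation is fine and the problem lies in how mpirun launches the processes.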
#2
New Member
Add this line to your script:
Code:
unset SGE_ROOT
Alex
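A likely rationale for this suggestion (an interpretation, not stated in the post): an Open MPI built with Grid Engine support detects the SGE job from environment variables such as `SGE_ROOT` and then starts its remote daemons through SGE itself instead of taking the plain machinefile/ssh route. If you would rather not unset the variable for the whole job script, the hypothetical wrapper below removes it only for the single command it runs, using `env -u` from GNU coreutils.

```bash
#!/bin/bash
# Hypothetical wrapper: run one command with SGE_ROOT removed from its
# environment, leaving the rest of the job script's environment untouched.
# Relies on "env -u", which GNU coreutils provides.
run_without_sge() {
    env -u SGE_ROOT "$@"
}

# Example use inside a job script (needs a real SGE job to do anything useful):
# run_without_sge mpirun -np "$NSLOTS" -machinefile "$TMPDIR/machines" simpleFoam -parallel
```

The effect on mpirun is the same as `unset SGE_ROOT`; the wrapper only limits its scope to that one command.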
#3
New Member
Marko Niinimaki
Join Date: Nov 2012
Posts: 3
Rep Power: 2
Hi,

thanks for the reply. I am not sure what else I should change in the script. Here it is:
mpirun -np $NSLOTS -machinefile $TMPDIR/machines /opt/OpenFOAM/OpenFOAM-2.1.x/platforms/linux64GccDPOpt/bin/simpleFoam -parallel
This runs everything on just one node.

With the suggested line added:
unset SGE_ROOT
mpirun -np $NSLOTS -machinefile $TMPDIR/machines /opt/OpenFOAM/OpenFOAM-2.1.x/platforms/linux64GccDPOpt/bin/simpleFoam -parallel
it fails with the following error message:
ssh: Unsupported option - -x
#4
New Member
The error may depend on your Open MPI installation: which version are you using? Can you post your launch script?
Alex
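To answer the version question, `mpirun --version` reports the Open MPI release. The hypothetical function below extracts the version number from that output; it assumes a line of the form `mpirun (Open MPI) 1.5.3`, the usual format for Open MPI 1.x, so adjust the pattern if your build prints something different.

```bash
#!/bin/bash
# Hypothetical helper: read "mpirun --version" output on stdin and print the
# Open MPI version number. Assumes a line like "mpirun (Open MPI) 1.5.3";
# prints nothing if no such line is found.
openmpi_version() {
    sed -n 's/.*(Open MPI) \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Usage on the cluster:
# mpirun --version 2>&1 | openmpi_version
```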
#5
Senior Member
Niels Nielsen
Join Date: Mar 2009
Location: Denmark
Posts: 383
Rep Power: 10
Hi
If Open MPI was built using --with-sge, then you don't need "-machinefile $TMPDIR/machines". On our cluster, unset SGE_ROOT puts the job on one node even though SGE reserves several nodes. Here is how I start an OF job on an SGE cluster: "qsub runScript", with runScript containing the lines below.
Code:
#!/bin/bash
#
#$ -cwd
#$ -o ./log.out
#$ -e ./log.err
#$ -pe orte 24
#
#$ -q all.q
#$ -S /bin/bash
# unset SGE_ROOT
echo Got $NSLOTS processors.
source /share/apps/OpenFOAM/OpenFOAM-2.1.x/etc/bashrc
mpi=$(command -v mpirun)
solver=$(command -v pimpleDyMFoam)
echo "$mpi"
echo "$solver"
# Quit if either mpirun or the solver is missing from PATH
if [ -z "$mpi" ] || [ -z "$solver" ]
then
echo ">> mpirun or pimpleDyMFoam was not found, quitting!"
exit 1
else
echo ">> mpirun and solver found, will continue"
"$mpi" -np "$NSLOTS" -x LD_LIBRARY_PATH -x PATH -x WM_PROJECT_DIR -x WM_PROJECT_INST_DIR -x WM_OPTIONS -x FOAM_LIBBIN -x FOAM_APPBIN -x FOAM_USER_APPBIN -x MPI_BUFFER_SIZE "$solver" -parallel > log
fi
Linnemann
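Whether the Open MPI in use was built with Grid Engine support can usually be checked with `ompi_info`, which lists a `gridengine` component (for example under `MCA ras`) when that support is compiled in. The hypothetical helper below just scans `ompi_info` output for such a line; treat the exact line format it matches as an assumption and compare it with your own `ompi_info` output.

```bash
#!/bin/bash
# Hypothetical helper: read "ompi_info" output on stdin and succeed (exit 0)
# if a gridengine MCA component is listed, i.e. Open MPI was most likely
# built with SGE support. The matched line format is an assumption.
has_gridengine_support() {
    grep -Eq 'MCA [a-z]+: gridengine'
}

# Usage on the cluster:
# if ompi_info | has_gridengine_support; then echo "SGE support found"; fi
```

If nothing matches, the "-machinefile"-free script above cannot rely on SGE integration, which would be consistent with all processes landing on one node.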
#6
New Member
Marko Niinimaki
Join Date: Nov 2012
Posts: 3
Rep Power: 2
Many thanks!
Unfortunately, the script you posted behaves the same way as before: all processes run on one node. I need to set "machinefile" in the script; otherwise I get "ssh unsupported option -x" in stderr. I compiled OpenFOAM 2.1.1 with just "./Allwmake". Is there a trick to force "--with-sge"?
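`--with-sge` is an option of Open MPI's own `configure` step, not of OpenFOAM's `Allwmake`, so SGE support has to be compiled into the Open MPI that mpirun comes from (here the ThirdParty one); how that option is passed through the ThirdParty build scripts is not covered here. Independently of that, the allocation SGE grants is described by the file named in `$PE_HOSTFILE` (lines of the form `host slots queue processor-range`). The hypothetical helper below turns it into an Open MPI hostfile using the `host slots=N` syntax, which makes the per-node slot counts explicit to mpirun. This is a sketch under those format assumptions and does not by itself fix the ssh error above.

```bash
#!/bin/bash
# Hypothetical helper: convert SGE's $PE_HOSTFILE into an Open MPI hostfile.
# Assumed input line format: "host slots queue processor-range".
# Output line format: "host slots=N".
pe_hostfile_to_openmpi() {
    awk 'NF >= 2 { print $1 " slots=" $2 }' "$1"
}

# Usage inside a job script:
# pe_hostfile_to_openmpi "$PE_HOSTFILE" > "$TMPDIR/ompi_hosts"
# mpirun -np "$NSLOTS" --hostfile "$TMPDIR/ompi_hosts" simpleFoam -parallel
```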