All processes end up on the same node when submitting a parallel job through SGE
Dear all,
Has anyone seen this kind of problem?
Background: OpenFOAM-2.1.x, compiled with "Allwmake"; Grid Engine GE 6.2u3; Scientific Linux SL release 5.5; a cluster of 224 cores in 20-something nodes.
The following distributes a task nicely across many nodes:
Code:
mpirun -np 64 --machinefile machines simpleFoam -parallel
The solver header then reports:
Slaves : 63 ( "node015.374" "node016.2178" ..
But submitting the same task through SGE leads to a situation where _all_ the processes are on a single node:
Code:
mpirun -np $NSLOTS --machinefile $TMPDIR/machines simpleFoam -parallel
The nodes in the "machines" file generated by SGE are diverse (node015, node016, ..), but simpleFoam always starts the processes on a single node. Is there something I should check? mpirun is from the ThirdParty package.
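One way to narrow this down is to compare the hosts SGE granted with the hosts where processes actually start. The sketch below is only an illustration: count_hosts is a hypothetical helper (not part of SGE, Open MPI or OpenFOAM) that tallies host names read from standard input, using the first field of each line.

```shell
#!/bin/bash
# count_hosts: hypothetical helper, reads lines on stdin and prints
# "<count> <host>" for every distinct first field, sorted by host name.
# Blank lines and lines starting with '#' are skipped.
count_hosts() {
    awk 'NF && $1 !~ /^#/ { n[$1]++ }
         END { for (h in n) print n[h], h }' | sort -k2
}
```

Inside the job script, `count_hosts < $TMPDIR/machines` shows what SGE allocated, and `mpirun -np $NSLOTS --machinefile $TMPDIR/machines hostname | count_hosts` shows where processes really start. If the second list contains a single host while the first does not, mpirun is presumably not honouring the machinefile.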
Add this line to your script:
Code:
unset SGE_ROOT
Alex
Hi,
Thanks for the reply. I am not sure what else I should change in the script. Here is what we have:
Code:
mpirun -np $NSLOTS -machinefile $TMPDIR/machines /opt/OpenFOAM/OpenFOAM-2.1.x/platforms/linux64GccDPOpt/bin/simpleFoam -parallel
This runs everything on just one node. With SGE_ROOT unset first,
Code:
unset SGE_ROOT
mpirun -np $NSLOTS -machinefile $TMPDIR/machines /opt/OpenFOAM/OpenFOAM-2.1.x/platforms/linux64GccDPOpt/bin/simpleFoam -parallel
it fails with the following error message:
ssh: Unsupported option - -x
The error may depend on Open MPI: which version are you using? Can you post your launch script?
Alex
Hi
If Open MPI was built using --with-sge, then you don't need "-machinefile $TMPDIR/machines". On our cluster, "unset SGE_ROOT" puts the job on one node even though it is reserving the nodes. Here is how I start an OF job on an SGE cluster: "qsub runScript", with runScript containing the lines below.
Code:
#!/bin/bash
Many thanks!
Unfortunately, the script that you posted behaves the same way as before: all processes on one node. I need to set "machinefile" in the script; otherwise I get "ssh unsupported option -x" in stderr. I compiled OpenFOAM 2.1.1 with just "./Allwmake". Is there a trick to force "--with-sge"?
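Before rebuilding anything, it may help to check whether the mpirun in use already has Grid Engine support. Open MPI's ompi_info tool lists the compiled-in components, and a build configured with SGE support shows a "gridengine" entry. The helper name has_gridengine below is made up for illustration; it only scans text given on standard input.

```shell
#!/bin/bash
# has_gridengine: hypothetical helper, reads `ompi_info` output on stdin
# and succeeds (exit status 0) if a gridengine component is listed.
has_gridengine() {
    grep -q 'gridengine'
}
```

A possible use, with the OpenFOAM environment sourced so that the ThirdParty ompi_info comes first in PATH: `ompi_info | has_gridengine && echo "SGE support present"`. If nothing is reported, Open MPI would need to be reconfigured with --with-sge; where that option goes in the ThirdParty build is not shown in this thread.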