CFD Online Discussion Forums

OpenFOAM Running, Solving & CFD: Sun Grid Engine
https://www.cfd-online.com/Forums/openfoam-solving/59044-sun-grid-engine.html

nishant_hull March 3, 2008 16:48

I was finally able to run my case in parallel. There was a problem with the gcc installation. Now it's working fine.

Thank you.

Nishant

nishant_hull March 12, 2008 10:17

I thought my parallel case was running, but it actually was not, even though I can see the job in the queue. The error file says:
ERROR: A daemon on node comp26 failed to start as expected.
As I mentioned earlier, my "mpirun -hostfile machine <root> <case> -parallel" command runs fine directly on the cluster: it works when run on the master node (for us, kittyhawk.dcs.hull.ac.uk), but it fails on any other node (e.g. comp01/02/10/11). I also tried a hello-world MPI program; it also fails when submitted with qsub but runs fine directly on the master node kittyhawk. My gcc compiler is also unable to compile a program on any node except the master node kittyhawk, even though the nodes use the right gcc (that is, the OpenFOAM version of gcc).

Again, I am using the cluster's version of MPICH as the PE (#$ -pe mpich 4), which is installed at /usr/.....
The default PE here is "score", which we run using the mpisub command.
Do I need to use a local version of MPICH in order to run in parallel using qsub? Or is it possible to run OpenFOAM programs using score?
Can anybody suggest something?

Regards,
Nishant

4xF April 17, 2009 11:21

Running OpenFOAM in parallel with SGE
 
Hi,

To run OpenFOAM in parallel with SGE, you need to make sure that the following requirements are satisfied:

1) Use Open MPI version 1.2.0 or later, since earlier versions do not work with SGE.

2) Define a parallel environment, for instance "orte", with the following definition (shown here for 8 slots, i.e. 8 cores in parallel):
pe_name orte
slots 8
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $round_robin
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min

3) Submit your job (for example, a run on 8 cores) with:
qsub RUN.sh
where RUN.sh contains:
#!/bin/sh
#$ -V
### number of processors and parallel environment
#$ -pe orte 8
### Job name
#$ -N "mypartest"
### Start from current working directory
#$ -cwd
### Generate the hostfile
HOSTFILE=system/hostfile
awk '{print $1" cpu=1"}' ${PE_HOSTFILE} > ${PWD}/${HOSTFILE}
### Run application
SOLVER=icoFoam
${MPI_ARCH_PATH}/bin/mpirun -np ${NSLOTS} --hostfile ${PWD}/${HOSTFILE} ${FOAM_APPBIN}/${SOLVER} -parallel
exit $?
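Step 2 does not say how the PE gets into SGE. A minimal sketch, assuming you have SGE admin rights and a queue called all.q (both assumptions): the definition is written to a file (the path /tmp/orte.pe is hypothetical), loaded with "qconf -Ap", and then added to the queue's pe_list with "qconf -aattr". Depending on the SGE version, qconf may expect extra fields such as accounting_summary; "qconf -sp <existing_pe>" shows the full field set your installation uses.

```shell
#!/bin/sh
# Sketch: register the "orte" PE from step 2 and attach it to a queue.
# Assumptions: SGE admin rights, queue name "all.q", file path /tmp/orte.pe.
PE_FILE=${PE_FILE:-/tmp/orte.pe}
QUEUE=${QUEUE:-all.q}

write_pe_definition() {
    # Quoted heredoc, so "$round_robin" is written literally.
    cat > "$1" <<'EOF'
pe_name            orte
slots              8
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
EOF
}

write_pe_definition "$PE_FILE"

# Only talk to SGE if its admin tool is available on this host.
if command -v qconf >/dev/null 2>&1; then
    qconf -Ap "$PE_FILE"                      # add the PE from the file
    qconf -aattr queue pe_list orte "$QUEUE"  # allow jobs in $QUEUE to use it
fi
```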

You will also find further information at:
http://www.open-mpi.org/faq/?categor...run-scheduling

Alternatively, you can try to compile MPICH from source. I've been able to run v1.2.7p1 without any dramas. This is quite straightforward if you take a look at the Allwmake scripts in $WM_THIRD_PARTY.

Hope this helps...

carmir January 18, 2010 08:10

Problem with openFoam and SGE
 
Hello to all,

I'm trying to run OpenFOAM on a Sun cluster with SGE. When running the job in parallel on a single node, everything works. But when I try to run the same job across different nodes, I get the following error message:

epsilon2.o31752:
Code:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
Host key verification failed.
--------------------------------------------------------------------------
A daemon (pid 21783) died unexpectedly with status 255 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished

To submit the job, I'm using qsub with the following script:

------------------------------------------------------------------------------
#!/bin/tcsh
# This is a simple example of a SGE batch script
#$-o /nfs/home/cardenas/Documents/OpenFOAM/Cases/Platte/Laenge120mm/Pulsierend/epsilon2 -j y
#$-N epsilon2
#$-pe batch_64_2 2
#$-S /bin/tcsh
touch $HOME/.ssh/known_hosts
cd /nfs/home/cardenas/Documents/OpenFOAM/Cases/Platte/Laenge120mm/Pulsierend/epsilon2
touch -a ./*.*
touch -a ./system/*
source /nfs/home/cardenas/OpenFOAM/OpenFOAM-1.6.x/etc/cshrc
cat $PE_HOSTFILE |awk '{ print $1 " cpu=" $2}' > $HOME/mpi/machines.LINUX.$JOB_ID
sleep 10;
mpirun --hostfile $HOME/mpi/machines.LINUX.$JOB_ID -np 2 icoFoam -parallel >log

-----------------------------------------------------------------------------------------------

It seems that something with the host keys is not working properly, but since I'm not experienced with SGE, I would appreciate any suggestions and hints. Thank you very much.

Alejandro

olesen January 19, 2010 03:10

Quote:

Originally Posted by carmir (Post 242935)
Hello to all,

I'm trying to run openFoam on a SGE Sun Cluster. When running the job on parallel in a single node, everything works.

There are a myriad of things that could be going wrong.
The very first thing is to determine whether GridEngine support has been compiled into your Open MPI.

Use the command "ompi_info" to list all the backends and grep for gridengine. If it's not there, recompile Open MPI with the --with-sge configure option (see the third-party Allwmake).
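A minimal sketch of that check as a shell snippet. The has_gridengine helper name is hypothetical; it only greps the component listing printed by ompi_info, and the exact wording of those lines varies between Open MPI versions.

```shell
#!/bin/sh
# Sketch: does this Open MPI build include GridEngine support?
# ompi_info prints one line per MCA component; SGE-enabled builds
# list a "gridengine" component.
has_gridengine() {
    # Reads ompi_info output on stdin; succeeds if a gridengine line is present.
    grep -qi 'gridengine'
}

if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_gridengine; then
        echo "Open MPI: GridEngine support found"
    else
        echo "Open MPI: no GridEngine support; rebuild with --with-sge" >&2
    fi
fi
```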


Quote:

Originally Posted by carmir (Post 242935)
touch $HOME/.ssh/known_hosts

^^^ what is this? Touching a file into existence doesn't make the hosts known!
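If the aim of that line was to avoid "Host key verification failed", the keys themselves have to be recorded. A sketch, assuming sshd runs on every node and the OpenSSH tools ssh-keyscan and ssh-keygen are installed; pe_hosts is a hypothetical helper that takes the host names from the first column of $PE_HOSTFILE. Note that ssh-keyscan trusts whatever key a node presents at that moment, so this is only reasonable on a trusted cluster network.

```shell
#!/bin/sh
# Sketch: record host keys for the nodes SGE assigned to this job,
# instead of merely creating an empty known_hosts file.

# Print the unique host names (first column) of an SGE PE_HOSTFILE.
pe_hosts() {
    awk '{ print $1 }' "$1" | sort -u
}

if [ -n "${PE_HOSTFILE:-}" ] && command -v ssh-keyscan >/dev/null 2>&1; then
    mkdir -p "$HOME/.ssh"
    pe_hosts "$PE_HOSTFILE" | while read -r host; do
        # Skip hosts that are already known, so repeated jobs add no duplicates.
        if ! ssh-keygen -F "$host" </dev/null >/dev/null 2>&1; then
            ssh-keyscan -H "$host" </dev/null >> "$HOME/.ssh/known_hosts" 2>/dev/null
        fi
    done
fi
```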

Quote:

Originally Posted by carmir (Post 242935)
cat $PE_HOSTFILE |awk '{ print $1 " cpu=" $2}' > $HOME/mpi/machines.LINUX.$JOB_ID
...
mpirun --hostfile $HOME/mpi/machines.LINUX.$JOB_ID -np 2 icoFoam -parallel >log

If you have GridEngine and your Open MPI is configured to use it, you should not be using --hostfile or -np. GridEngine already knows how many slots you have (which would be $NSLOTS in your script), and it knows the host names too. It should also take care of inheriting the environment.
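A sketch of the leaner job script this implies, assuming Open MPI was built with GridEngine support and a PE named "orte" as defined earlier in the thread. The MPIRUN variable and the run_solver helper are hypothetical additions so the launch command can be inspected without a cluster.

```shell
#!/bin/sh
#$ -V
#$ -pe orte 2
#$ -cwd
#$ -S /bin/sh
# Sketch: with an SGE-aware Open MPI, mpirun takes the slot count and
# host list from GridEngine, so neither --hostfile nor -np is passed.
MPIRUN=${MPIRUN:-mpirun}

run_solver() {
    "$MPIRUN" "$1" -parallel
}

# Only launch when actually running inside an SGE parallel job.
if [ -n "${PE_HOSTFILE:-}" ]; then
    run_solver icoFoam > log 2>&1
fi
```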

Whether the final backend uses rsh, ssh, or the GridEngine builtin transport depends on what you have configured as 'rsh_command' and 'rsh_daemon' in GridEngine.

BTW: your example uses the C shell. Make sure that the queue is configured with the corresponding shell_start_mode. By default this will be 'posix_compliant' (ie, use /bin/sh) and not 'unix_behavior' (ie, use #! to determine the shell/program).
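Both settings can be read back with qconf. A sketch, assuming read access to the SGE configuration and a queue called all.q (substitute your own); pick_keys is a hypothetical filter that keeps only the requested "key value" lines.

```shell
#!/bin/sh
# Sketch: inspect the GridEngine settings discussed above.
QUEUE=${QUEUE:-all.q}   # assumed queue name

# Keep only lines whose first field is one of the given keys (reads stdin).
pick_keys() {
    pattern=$(echo "$*" | tr ' ' '|')
    grep -E "^($pattern)[[:space:]]"
}

if command -v qconf >/dev/null 2>&1; then
    qconf -sconf | pick_keys rsh_command rsh_daemon        # global configuration
    qconf -sq "$QUEUE" | pick_keys shell_start_mode shell  # queue configuration
fi
```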

schteff October 27, 2011 10:03

Pending but not running
 
Hi,

I also tried to run OpenFOAM in parallel with SGE.
I use the following script to submit the job:
Code:

#!/bin/csh
#$ -V
###set queue
#$ -q normal
### number of processors and parallel environment
#$ -pe OpenFOAM 4

#$ -S /bin/csh

### Job name
#$ -N "mypartest"
### Start from current working directory
#$ -cwd
 
source ./soft/OpenFOAM/OpenFOAM-2.0.0/etc/cshrc

### Run application

mpirun -np ${NSLOTS} pisoFoam -parallel

I get the following error:


Code:

xhost: Command not found.
: Command not found.
: Command not found.
: Command not found.
: Command not found.
/soft/OpenFOAM/OpenFOAM-2.0.0/etc/cshrc
: No such file or directory.

I don't know why the grid engine can't find the command.

Does anybody have an idea why it doesn't work? Or are there any settings I have to modify?

I'm thankful for any help.

Stefan


rreis May 4, 2012 11:56

Quote:

Originally Posted by 4xF (Post 213293)
...
### Generate the hostfile
HOSTFILE=system/hostfile
awk '{print $1" cpu=1"}' ${PE_HOSTFILE} > ${PWD}/${HOSTFILE}
...

If you change the submit script to have

Code:

HOSTFILE=system/hostfile
awk '{print $1" cpu="$2}' ${PE_HOSTFILE} > ${PWD}/${HOSTFILE}

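it becomes more general, since each host is then listed with the number of slots SGE actually granted on it (the second column of $PE_HOSTFILE) instead of always one. Nice hack, thanks!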
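To see what the cpu=$2 variant produces, here is a sketch with a made-up PE_HOSTFILE (host and queue names are invented). SGE writes one line per host: host name, number of slots granted there, queue, and processor range. With cpu=1 every host is advertised with a single slot regardless of its allocation; with cpu=$2 the hostfile matches what SGE granted. pe_to_hostfile is a hypothetical helper name.

```shell
#!/bin/sh
# Convert an SGE $PE_HOSTFILE into an Open MPI hostfile.
# Column 1 = host name, column 2 = slots granted on that host.
pe_to_hostfile() {
    awk '{ print $1 " cpu=" $2 }' "$1"
}

# Demo with an invented two-host allocation (6 slots in total).
sample=$(mktemp)
cat > "$sample" <<'EOF'
comp01 4 all.q@comp01 UNDEFINED
comp02 2 all.q@comp02 UNDEFINED
EOF

pe_to_hostfile "$sample"
# comp01 cpu=4
# comp02 cpu=2

rm -f "$sample"
```

Inside a real job the call would be pe_to_hostfile "$PE_HOSTFILE" > "$PWD/system/hostfile".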

rreis May 4, 2012 11:57

Quote:

Originally Posted by schteff (Post 329732)
...
source ./soft/OpenFOAM/OpenFOAM-2.0.0/etc/cshrc
...


What is the full path to the OpenFOAM directory? Maybe just

/soft/OpenFOAM/OpenFOAM-2.0.0/etc/cshrc

without the leading "."?
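A relative path like ./soft/... is resolved against the directory the job starts in, which with "#$ -cwd" is the submission directory, not /. A small sketch of a pre-flight check for a job script; resolve and require_file are hypothetical helpers, and the cshrc path is the one from the post.

```shell
#!/bin/sh
# Show where a path points from the job's current working directory.
resolve() {
    case $1 in
        /*) printf '%s\n' "$1" ;;            # absolute: used as-is
        *)  printf '%s/%s\n' "$PWD" "$1" ;;  # relative: prefixed with $PWD
    esac
}

# Fail early, with a readable message, if a file cannot be read.
require_file() {
    if [ ! -r "$1" ]; then
        echo "cannot read $(resolve "$1")" >&2
        return 1
    fi
}

# Example: require_file /soft/OpenFOAM/OpenFOAM-2.0.0/etc/cshrc || exit 1
```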

tikulju August 22, 2012 09:27

solution to "Host key verification failed"
 
Hi!
If you are having problems with host keys, adding the line
Code:

StrictHostKeyChecking no
at the end of
Code:

/etc/ssh/ssh_config
file on the compute nodes and restarting the ssh daemon should fix it. Of course, Open MPI has to know that you're using ssh for communication instead of rsh. This can be done by specifying
Code:

export OMPI_MCA_orte_rsh_agent=ssh
in the run script.
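The ssh_config change can be scripted so that running it more than once does not duplicate the line. A sketch; the helper name is hypothetical, and for /etc/ssh/ssh_config it has to run as root on every compute node. Disabling StrictHostKeyChecking means ssh no longer refuses unknown or changed host keys, so it is a trade-off meant for a trusted cluster network.

```shell
#!/bin/sh
# Sketch: append "StrictHostKeyChecking no" to an ssh client config
# only if an identical line is not already present.
add_no_strict_checking() {
    grep -qx 'StrictHostKeyChecking no' "$1" 2>/dev/null ||
        echo 'StrictHostKeyChecking no' >> "$1"
}

# Usage on each compute node (as root):
#   add_no_strict_checking /etc/ssh/ssh_config
# and in the job script:
#   export OMPI_MCA_orte_rsh_agent=ssh
```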

