CFD Online Discussion Forums

ibnkureshi March 10, 2010 15:39

Fluent jobs through pbs
 
Hi all,

I am a system administrator at a UK university, and we received a request to provide an HPC resource for Fluent.

After installing Fluent on our cluster, which runs CentOS 5.4 and has 16 compute nodes with 4 processors each (nodes=16:ppn=4) plus a head node, we were able to start Fluent via the terminal and submit parallel jobs through the shell with a journal file and the -g switch. The University has 45 licenses for Fluent 6.3.26 and 30 licenses for an older version, 6.0/2?? (not sure which). These licenses reside on a Windows server running FLEXlm, which also serves floating licenses for many other software packages.

When I submit a job through the PBS job scheduler, however, the simulation does not run because of a license problem, even though Fluent seems to be looking in the right place for the license.

I have posted this on a PBS/TORQUE forum as well, but since it concerns Fluent I thought users here might be better placed to help.

I would appreciate any help regarding this. Below are the PBS submission script, the journal file, the PBS output file and the PBS error file respectively.

______________________________
PBS Submission Script
______________________________
#!/bin/bash
#
# Example PBS script to run a job on the myrinet-3 cluster.
# The lines beginning #PBS set various queuing parameters.
#PBS -m e
# o -N Job Name
#PBS -N fluent
#PBS -M sengik@hud.ac.uk
#
# o -l resource lists that control where job goes
# here we ask for 3 nodes.
#PBS -l nodes=3
#
# o Where to write output
# asd PBS -e stderr
# asd PBS -o stdout
#
# o Export all my environment variables to the job
#PBS -V
#
fluent 2d -g -ssh -t3 -i /home/sengik/Desktop/test/input.in

______________________________
Journal File
______________________________
file/read-case /home/sengik/Desktop/test/2dcar_10.cas
solve/initialize/initialize-flow
file/write-data /home/sengik/Desktop/test/2dcar_10.dat
exit
yes
(As you can see, this is just a simple case of load, initialise, save and exit.)

______________________________
PBS Output File
______________________________

/usr/Fluent.Inc/fluent6.3.26/bin/fluent -r6.3.26 2d -g -ssh -t3 -i /home/sengik/Desktop/test/input.in
Loading "/usr/Fluent.Inc/fluent6.3.26/lib/fluent.dmp.114-32"
Done.
/usr/Fluent.Inc/fluent6.3.26/bin/fluent -r6.3.26 2d -pethernet -host -alnx86 -t3 -mpi=hp -path/usr/Fluent.Inc -ssh -cx node16.testbed-CLS:56711:56126

Server node is down or not responding
See the system adminstrator about starting the server, or
make sure the you're referring to the right host (see LM_LICENSE_FILE)
Feature: fluent
Hostname: mech1
License path: 7241@mech1:/usr/Fluent.Inc/license/lnx86/../license.dat
FLEXlm error: -96,7. System Error: 11 "Resource temporarily unavailable"
For further information, refer to the FLEXlm End User Manual,
available at "www.macrovision.com".
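
The "License path" line shows where Fluent looked for a license: the server entry 7241@mech1, followed by a local license.dat. As a small sketch, the port@host entries can be pulled out of such a line so that each one can then be tested from the node that actually ran the job. The function name license_servers is made up for illustration.

```shell
# Sketch: list the port@host entries in a FLEXlm "License path:" line.
# license_servers is a hypothetical helper name, not a FLEXlm tool.
license_servers() {
    # Drop the "License path:" label, split on ':' and keep port@host items.
    sed 's/^License path:[[:space:]]*//' | tr ':' '\n' | grep -E '^[0-9]+@[^@/]+$'
}

echo 'License path: 7241@mech1:/usr/Fluent.Inc/license/lnx86/../license.dat' | license_servers
# prints: 7241@mech1
```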

______________________________
PBS Error File
______________________________
/usr/Fluent.Inc/fluent6.3.26/bin/fluent: line 2397: glxinfo: command not found
/usr/Fluent.Inc/fluent6.3.26/cortex/lnx86/cortex.3.7.3 -f fluent -g -i /home/sengik/Desktop/test/input.in (fluent "2d -pethernet -host -alnx86 -r6.3.26 -t3 -mpi=hp -path/usr/Fluent.Inc -ssh")
Starting /usr/Fluent.Inc/fluent6.3.26/lnx86/2d_host/fluent.6.3.26 host -cx node16.testbed-CLS:56711:56126 "(list (rpsetvar (QUOTE parallel/function) "fluent 2d -node -alnx86 -r6.3.26 -t3 -pethernet -mpi=hp -ssh") (rpsetvar (QUOTE parallel/rhost) "") (rpsetvar (QUOTE parallel/ruser) "") (rpsetvar (QUOTE parallel/nprocs_string) "3") (rpsetvar (QUOTE parallel/auto-spawn?) #t) (rpsetvar (QUOTE parallel/trace-level) 0) (rpsetvar (QUOTE parallel/remote-shell) 1) (rpsetvar (QUOTE parallel/path) "/usr/Fluent.Inc") (rpsetvar (QUOTE parallel/hostsfile) "") )"

Welcome to Fluent 6.3.26

Copyright 2006 Fluent Inc.
All Rights Reserved

Loading "/usr/Fluent.Inc/fluent6.3.26/lib/flprim.dmp.1119-32"
Done.

Unexpected license problem; exiting.

______________________________________

Running the same command directly from the shell, with a host file, works perfectly fine:
fluent 2d -g -ssh -t3 -cnf=<hostfile> -i /home/sengik/Desktop/test/input.in

EDIT: PBS allocates the nodes correctly and is working fine. The simulation just ends when the license error occurs.


Thanks in advance for the help

ibnkureshi March 10, 2010 16:03

Cleaned Up the Script
 
I just realised that the script might be misleading, as it is a sample one I usually use to experiment. I have cleaned it up:

#!/bin/bash
#PBS -S /bin/bash
#PBS -m e
#PBS -M sengik@hud.ac.uk
#PBS -N fluent
#PBS -l nodes=3
#
#PBS -e stderr
#PBS -o stdout
#
#PBS -V
#
fluent 2d -g -ssh -t3 -i /home/sengik/Desktop/test/input.in
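
One difference between this script and the interactive run that works is that the interactive command passes a host list with -cnf=<hostfile>. A minimal sketch of handing the scheduler's allocation to Fluent is given below. It assumes that PBS/TORQUE sets PBS_NODEFILE inside the job (one line per allocated processor slot); build_fluent_cmd is a hypothetical helper name, and this change on its own would not necessarily cure a license error.

```shell
#!/bin/bash
# Sketch only: build the Fluent command line from the PBS allocation.
# build_fluent_cmd is a hypothetical helper, not part of Fluent or PBS.
build_fluent_cmd() {
    local nodefile="$1" journal="$2"
    local nprocs
    # The node file has one line per allocated processor slot.
    nprocs=$(wc -l < "$nodefile" | tr -d ' ')
    echo "fluent 2d -g -ssh -t${nprocs} -cnf=${nodefile} -i ${journal}"
}

# Inside a PBS job the scheduler sets PBS_NODEFILE; elsewhere nothing runs.
if [ -n "${PBS_NODEFILE:-}" ]; then
    cmd=$(build_fluent_cmd "$PBS_NODEFILE" /home/sengik/Desktop/test/input.in)
    echo "Running: $cmd"
    # Unquoted on purpose so the string splits into arguments
    # (safe here because the paths contain no spaces).
    $cmd
fi
```

With `#PBS -l nodes=3` the node file would hold three lines, so the generated command carries -t3 and points -cnf at the hosts PBS actually allocated.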

ibnkureshi March 11, 2010 06:37

I was just wondering: would we need fluent-par licenses to submit a job like this through PBS?

When I submit from the command line and specify multiple processors, it just takes as many licenses as it needs from the pool. But I was reading on another forum that, when run through PBS, Fluent looks for fluent-par licenses. Is this really the case? Can't I just specify which pool it should take licenses from?
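
Which features the license server offers, and how many are free, can be checked with FLEXlm's `lmutil lmstat`, assuming lmutil is installed somewhere on the cluster. The sketch below parses the "Users of <feature>" summary lines that lmstat prints. The helper name free_licenses is hypothetical, the sample line in the comment is illustrative rather than output from this site, and the snippet does not settle whether a given Fluent version draws on fluent-par.

```shell
# Sketch: report free licenses per feature from `lmutil lmstat -a` output.
# free_licenses is a hypothetical helper; it reads lmstat text on stdin.
free_licenses() {
    awk '/^Users of / {
        feature = $3; sub(/:$/, "", feature)
        # Summary lines look like:
        # Users of fluent:  (Total of 45 licenses issued;  Total of 3 licenses in use)
        issued = 0; used = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "issued;") issued = $(i - 2)
            if ($i == "use)")    used   = $(i - 3)
        }
        print feature, issued - used
    }'
}

# On the cluster (7241@mech1 is the server shown in the error output):
#   lmutil lmstat -a -c 7241@mech1 | free_licenses
```

Running this both interactively and from inside a PBS job would show whether the two environments see the same server and the same feature list.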

Shamoon Jamshed June 8, 2011 11:17


Dear Mr. Kureshi,

If you are accessing the license server on a remote Windows machine, you must specify its IP address in the /etc/hosts file of the Linux node. I haven't tried this myself, but one of my friends did.

Also, before that, I recommend pinging the remote machine (ping <IP of remote PC>, e.g. ping 192.168.1.1) to check that your Linux PC can see it on the network. Then, as I said, add a line like the following to the /etc/hosts file:
192.168.1.1 <license server name> <domain name>
192.168.1.1 node00 abc.ac.uk abc.ac.uk

Then try to run a serial solver first, without a job scheduler, to see whether it picks up the license.

If that is OK, go for parallel without the job scheduler, and finally parallel with the job scheduler. I hope it will work.
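
The staged checks described above could be scripted roughly as follows, to be run on the node that actually starts Fluent. The host name mech1 and port 7241 are taken from the error output earlier in the thread; the function names are hypothetical, and the port test assumes bash's /dev/tcp redirection and the coreutils timeout command are available.

```shell
#!/bin/bash
# Sketch of the staged checks; helper names are hypothetical.

# True if the hosts file (default /etc/hosts) has an entry naming the host.
hosts_has_entry() {
    local name="$1" file="${2:-/etc/hosts}"
    # Strip comments, then look for the name as a whole field after the IP.
    awk -v n="$name" '{ sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"
}

# True if a TCP connection to host:port succeeds within 3 seconds.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

check_license_server() {
    local host="$1" port="$2"
    hosts_has_entry "$host" || echo "note: $host not in /etc/hosts (DNS may still resolve it)"
    ping -c 1 "$host" >/dev/null 2>&1 || echo "warning: $host does not answer ping"
    if port_open "$host" "$port"; then
        echo "ok: $port@$host reachable from $(hostname)"
    else
        echo "FAIL: cannot reach $port@$host from $(hostname)"
        return 1
    fi
}

# Example, using the server from the error output:
#   check_license_server mech1 7241
```

Note that a FLEXlm vendor daemon listens on its own port in addition to the lmgrd port, so a successful test of the lmgrd port is necessary but not sufficient.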

ibnkureshi June 9, 2011 12:02

Hi Shamoon,

This was sorted a while ago. We have a DNS server that translates the license server name. What I had not done was enable IP forwarding on the head node of the cluster. The job scheduler makes one of the compute nodes the simulation controller, and that node is the one that checks out the license. So when I ran Fluent directly it worked, but through the script it did not.

It was just a matter of enabling iptables and setting the appropriate rules.
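
For readers with a similar layout, the kind of head-node configuration described here might look like the sketch below. The thread does not give the actual rules, so everything in it is an assumption: eth0 as the head node's campus-facing interface, eth1 as the interface on the cluster's private network, and NAT (masquerading) as the forwarding method. The function only prints the commands, so they can be reviewed before being run as root.

```shell
#!/bin/bash
# Sketch: print commands that let compute nodes reach outside hosts
# (such as a license server) through the head node.
# Interface names are assumptions; adjust them to the actual cluster.
print_forwarding_rules() {
    local ext_if="${1:-eth0}"   # assumed: faces the campus network
    local int_if="${2:-eth1}"   # assumed: faces the compute nodes
    cat <<EOF
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -o ${ext_if} -j MASQUERADE
iptables -A FORWARD -i ${int_if} -o ${ext_if} -j ACCEPT
iptables -A FORWARD -i ${ext_if} -o ${int_if} -m state --state ESTABLISHED,RELATED -j ACCEPT
EOF
}

print_forwarding_rules eth0 eth1
```

The compute nodes would also need a route, typically a default gateway, pointing at the head node's private address for this to take effect.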

Thanks for your input though. If you would like any help, do let me know.

Regards


Ibad Kureshi
Lecturer: Department of Engineering and Technology
Administrator: High Performance Computing - Resource Centre
Postgraduate Researcher: School of Computing and Engineering

Canal Side East 2/13
University of Huddersfield
Queensgate
Huddersfield HD13DH
t: 01484 422288 ext 1855

Shamoon Jamshed June 9, 2011 13:43


Thanks, Mr. Kureshi.

I assume that you are from Pakistan. :) Same as me. Well, in my case I am facing a problem with Fluent 6.3. I have installed it on multiple nodes (workstations), and when I run in parallel, the head node reports "connection to license server lost" after a few thousand iterations. I don't know why this happens. I have a license installed on each node because sometimes I have to use them individually. I have ssh properly configured on each node, with a password-less environment set up. Note that I do not use a job scheduler.


All times are GMT -4.