CFD Online Discussion Forums


ivanbuz July 22, 2009 17:33

Problem running parallel Fluent on a Linux cluster
The case runs fine if I request several processors on the SAME node, but if the processors are on different nodes, I get a "Connection refused" error.

I searched online and saw that some people have had a similar problem, but I cannot find a solution to this specific one. The output from Fluent and the submission script are attached below.

Thanks in advance!

/opt/hpc/Fluent.Inc/fluent6.3.26/bin/fluent -r6.3.26 -pib -cnf=/var/spool/PBS/aux//3666504.cmgr01 -g 2ddp -t6 -i test2.jou
/opt/hpc/Fluent.Inc/fluent6.3.26/cortex/lnamd64/cortex.3.7.3 -f fluent -g -i test2.jou (fluent "2ddp -pib -host -r6.3.26 -t6 -mpi=hp -cnf=/var/spool/PBS/aux//3666504.cmgr01 -path/opt/hpc/Fluent.Inc")
Loading "/opt/hpc/Fluent.Inc/fluent6.3.26/lib/fluent.dmp.114-64"
/opt/hpc/Fluent.Inc/fluent6.3.26/bin/fluent -r6.3.26 2ddp -pib -host -t6 -mpi=hp -cnf=/var/spool/PBS/aux//3666504.cmgr01 -path/opt/hpc/Fluent.Inc -cx scw-029.i:59263:37434
Starting /opt/hpc/Fluent.Inc/fluent6.3.26/lnamd64/2ddp_host/fluent.6.3.26 host -cx scw-029.i:59263:37434 "(list (rpsetvar (QUOTE parallel/function) "fluent 2ddp -node -r6.3.26 -t6 -pib -mpi=hp -cnf=/var/spool/PBS/aux//3666504.cmgr01 ") (rpsetvar (QUOTE parallel/rhost) "") (rpsetvar (QUOTE parallel/ruser) "") (rpsetvar (QUOTE parallel/nprocs_string) "6") (rpsetvar (QUOTE parallel/auto-spawn?) #t) (rpsetvar (QUOTE parallel/trace-level) 0) (rpsetvar (QUOTE parallel/remote-shell) 0) (rpsetvar (QUOTE parallel/path) "/opt/hpc/Fluent.Inc") (rpsetvar (QUOTE parallel/hostsfile) "/var/spool/PBS/aux//3666504.cmgr01") )"
Welcome to Fluent 6.3.26
Copyright 2006 Fluent Inc.
All Rights Reserved
Loading "/opt/hpc/Fluent.Inc/fluent6.3.26/lib/flprim.dmp.1119-64"

Host spawning Node 0 on machine "scw-029" (unix).
/opt/hpc/Fluent.Inc/fluent6.3.26/bin/fluent -r6.3.26 2ddp -node -t6 -pib -mpi=hp -cnf=/var/spool/PBS/aux//3666504.cmgr01 -mport
Starting /opt/hpc/Fluent.Inc/fluent6.3.26/multiport/mpi/lnamd64/hp/bin/mpirun -prot -vapi -e MPI_HASIC_VAPI=1 -e MPI_USE_MALLOPT_SBRK_PROTECTION=1 -e MPI_USE_MALLOPT_AVOID_MMAP=1 -f /tmp/fluent-appfile.32087
Connection refused
mpirun: Warning one more more remote shell commands exited with non-zero status, which may indicate a remote access problem.

#PBS -j oe
#PBS -l nodes=2:ppn=3
#PBS -q main
#PBS -l walltime=00:10:00
#Set variables for script
# What version of the solver to use
#How many CPUs - note that you'll still need to update the #PBS -l nodes line
#Which input journal file to give Fluent?
#Where do we want to put the output?

# Run Fluent with:
# -pib use Infiniband parallel
# -cnf=$PBS_NODEFILE get the list of machines PBS is running on from the server
# -t$CPUCOUNT use $CPUCOUNT CPUs total
# -g no graphics, batch mode
# -i read the file in $INPUT
# > $OUTPUT 2>&1 Redirect program output to a file in your home directory.

-mAx- July 23, 2009 02:32

Are you able to connect to your nodes with rsh or ssh?

ivanbuz July 23, 2009 04:15

Yes, I have access to all nodes using SSH.

-mAx- July 23, 2009 04:44

Passwords aren't required?

ivanbuz July 23, 2009 15:07

I log into the cluster before submitting the job, so there should be no problem with passwords.

ivanbuz July 23, 2009 15:10

If anyone from Fluent Inc. sees this thread, could you take a look?

-mAx- July 24, 2009 02:19

I am not a Linux / Fluent-parallel expert, but you said that you are using SSH.
SSH asks for a password unless key-based login has been set up.
If you try a command on a node, what is the result?
> ssh ip-address ls
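A minimal sketch of a stricter version of this test, assuming bash and that it is run from inside the PBS job, where `$PBS_NODEFILE` lists the allocated hosts (typically once per processor, so 3 times per node with ppn=3). `-o BatchMode=yes` makes ssh fail instead of prompting for a password, which is closer to how mpirun starts remote processes without a terminal. The helper names `unique_hosts` and `check_nodes` are hypothetical, not part of Fluent or PBS.

```shell
#!/bin/bash
# Hypothetical helper: print each distinct host in a PBS node file once.
unique_hosts() {
    sort -u "$1"
}

# Hypothetical helper: try a non-interactive ssh to every host.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_nodes() {
    local host status=0
    for host in $(unique_hosts "$1"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "ok:   $host"
        else
            echo "FAIL: $host"
            status=1
        fi
    done
    return "$status"
}

# Only run the check when inside a PBS job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    check_nodes "$PBS_NODEFILE"
fi
```

If any host prints FAIL, passwordless ssh between the compute nodes is not working, and a multi-node launch that relies on ssh would likely fail as well.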

ivanbuz July 24, 2009 16:10

Hi mAx, there is no problem. The output is similar to the following:

bushivan@scw-097:~>ssh ls

-mAx- July 24, 2009 17:35

Is there an administrator for your cluster, or did you set it up yourself?

ivanbuz July 24, 2009 18:43

The cluster is operated by the HPC center; I just submit my jobs there. I actually talked to the HPC staff, and they said something like the MPI used on the cluster is not compatible with Fluent, but I am not sure they are right. So I am posting my problem here to see if anyone has encountered the same problem.

-mAx- July 25, 2009 03:15

Then they are the right people to solve your problem.
But it would be a shame not to be able to use Fluent's parallel capabilities on this cluster.

ibnkureshi March 10, 2010 16:13

I know this is late, but you have to add '-ssh' to the fluent command line in the submission file. That forces Fluent to use ssh rather than rsh, which it uses by default.
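A sketch of a submission script with that change, assuming bash and reusing the PBS directives from the first post plus the Fluent path, solver version (`2ddp`), CPU count and journal file from the command line Fluent printed there. The output file name and the `cd "$PBS_O_WORKDIR"` step are assumptions; the argument order simply mirrors the logged command with `-ssh` added.

```shell
#!/bin/bash
#PBS -j oe
#PBS -l nodes=2:ppn=3
#PBS -q main
#PBS -l walltime=00:10:00

FLUENT=/opt/hpc/Fluent.Inc/fluent6.3.26/bin/fluent
CPUCOUNT=6                      # must match nodes x ppn above
INPUT=test2.jou
OUTPUT=$HOME/fluent-test2.log   # hypothetical output file

# Same options as in the original post, plus -ssh so Fluent starts the
# remote processes with ssh instead of its default rsh.
ARGS=(-pib -ssh "-cnf=${PBS_NODEFILE:-}" -g 2ddp "-t$CPUCOUNT" -i "$INPUT")

# Only launch when running inside a PBS job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1   # assumes the journal is in the submit dir
    "$FLUENT" "${ARGS[@]}" > "$OUTPUT" 2>&1
fi
```

Keeping the options in a bash array keeps each argument intact even if a path contains spaces.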
