Problem running parallel Fluent on a Linux cluster
The case runs fine if I request several processors on the SAME node, but if the processors are on different nodes, I get a "Connection refused" error.
I searched online and found that some people have had similar problems, but I could not find a solution to this specific one. The output from Fluent and the submission script are attached below. Thanks in advance!

OUTPUT FROM FLUENT
-----------------------------------------
/opt/hpc/Fluent.Inc/fluent6.3.26/bin/fluent -r6.3.26 -pib -cnf=/var/spool/PBS/aux//3666504.cmgr01 -g 2ddp -t6 -i test2.jou
/opt/hpc/Fluent.Inc/fluent6.3.26/cortex/lnamd64/cortex.3.7.3 -f fluent -g -i test2.jou (fluent "2ddp -pib -host -r6.3.26 -t6 -mpi=hp -cnf=/var/spool/PBS/aux//3666504.cmgr01 -path/opt/hpc/Fluent.Inc")
Loading "/opt/hpc/Fluent.Inc/fluent6.3.26/lib/fluent.dmp.114-64"
Done.
/opt/hpc/Fluent.Inc/fluent6.3.26/bin/fluent -r6.3.26 2ddp -pib -host -t6 -mpi=hp -cnf=/var/spool/PBS/aux//3666504.cmgr01 -path/opt/hpc/Fluent.Inc -cx scw-029.i:59263:37434
Starting /opt/hpc/Fluent.Inc/fluent6.3.26/lnamd64/2ddp_host/fluent.6.3.26 host -cx scw-029.i:59263:37434 "(list (rpsetvar (QUOTE parallel/function) "fluent 2ddp -node -r6.3.26 -t6 -pib -mpi=hp -cnf=/var/spool/PBS/aux//3666504.cmgr01 ") (rpsetvar (QUOTE parallel/rhost) "") (rpsetvar (QUOTE parallel/ruser) "") (rpsetvar (QUOTE parallel/nprocs_string) "6") (rpsetvar (QUOTE parallel/auto-spawn?) #t) (rpsetvar (QUOTE parallel/trace-level) 0) (rpsetvar (QUOTE parallel/remote-shell) 0) (rpsetvar (QUOTE parallel/path) "/opt/hpc/Fluent.Inc") (rpsetvar (QUOTE parallel/hostsfile) "/var/spool/PBS/aux//3666504.cmgr01") )"

Welcome to Fluent 6.3.26
Copyright 2006 Fluent Inc.
All Rights Reserved
Loading "/opt/hpc/Fluent.Inc/fluent6.3.26/lib/flprim.dmp.1119-64"
Done.
Host spawning Node 0 on machine "scw-029" (unix).
/opt/hpc/Fluent.Inc/fluent6.3.26/bin/fluent -r6.3.26 2ddp -node -t6 -pib -mpi=hp -cnf=/var/spool/PBS/aux//3666504.cmgr01 -mport 192.168.2.129:192.168.2.129:34193:0
Starting /opt/hpc/Fluent.Inc/fluent6.3.26/multiport/mpi/lnamd64/hp/bin/mpirun -prot -vapi -e MPI_HASIC_VAPI=1 -e MPI_USE_MALLOPT_SBRK_PROTECTION=1 -e MPI_USE_MALLOPT_AVOID_MMAP=1 -f /tmp/fluent-appfile.32087
192.168.2.135: Connection refused
mpirun: Warning one more more remote shell commands exited with non-zero status, which may indicate a remote access problem.

SUBMISSION SCRIPT
-----------------------------------------
#!/bin/sh
#PBS -j oe
#PBS -l nodes=2:ppn=3
#PBS -q main
#PBS -l walltime=00:10:00

cd ${PBS_O_WORKDIR}
cat ${PBS_NODEFILE}

#Set variables for script
# What version of the solver to use
FLUENTSOLVER=2ddp
#HOW MANY CPUS- note that you'll still need to update the $PBS -l nodes line
CPUCOUNT=6
#Which input journal file to use to give fluent?
#INPUT=${PBS_O_WORKDIR}/${PBS_JOBNAME}
INPUT=test2.jou
#Where do we want to put output at?
OUTPUT=${PBS_O_WORKDIR}/${PBS_JOBID}.out

# Run Fluent with:
# -pib use Infiniband parallel
# -cnf=$PBS_NODEFILE get the list of machines PBS is running on from the server
# -t$CPUCOUNT use $CPUCOUNT CPUs total
# -g no graphics, batch mode
# -i read the file in $INPUT
# > $OUTPUT 2>&1 Redirect program output to a file in the submission directory ($PBS_O_WORKDIR).
fluent $FLUENTSOLVER -t$CPUCOUNT -pib -cnf=$PBS_NODEFILE -g -i $INPUT > $OUTPUT 2>&1
|
Are you able to connect to your nodes with rsh or ssh?
|
Yes, I have access to all nodes using SSH.
|
Passwords aren't required?
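One way to answer that from inside a job (a sketch, not something posted in this thread): mpirun starts the remote node processes non-interactively, so a password or host-key prompt that you would answer by hand in a terminal makes the launch fail. The log above shows mpirun being started on the compute node scw-029, so the check is most meaningful when run from a compute node rather than the login node. The script below tries `ssh -o BatchMode=yes`, which fails instead of prompting, against every unique host in `$PBS_NODEFILE`; the function name `check_ssh_nodes` is hypothetical.

```shell
#!/bin/sh
# check_ssh_nodes FILE: for each unique host listed in FILE, try a
# non-interactive ssh login. BatchMode=yes makes ssh fail instead of
# prompting for a password or passphrase, which is the situation a
# launcher such as mpirun would run into.
check_ssh_nodes() {
  for node in $(sort -u "$1"); do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true >/dev/null 2>&1; then
      echo "$node: ok"
    else
      echo "$node: FAILED"
    fi
  done
}

# Inside a PBS job, check every node that was allocated to the job.
if [ -n "$PBS_NODEFILE" ]; then
  check_ssh_nodes "$PBS_NODEFILE"
fi
```

Run it from the job script or from an interactive job (`qsub -I`) so that `$PBS_NODEFILE` lists the job's own nodes; a FAILED line marks a node that cannot be reached without interaction.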
|
I log into the cluster before submitting the job, so there should be no problem with passwords.
|
If anyone from Fluent Inc. sees this thread, could you take a look?
|
I am not a Linux or parallel-Fluent expert, but you said that you are using SSH.
SSH asks for a password unless key-based (passwordless) authentication has been set up. If you run a command on a node, what is the result?
> ssh ip-address ls
|
Hi mAx, there is no problem. The output is similar to the following:
bushivan@scw-097:~>ssh 192.166.2.195 ls
airfoil airfoil_ins_eig default_id.dbs ...
|
Is there an administrator for your cluster, or did you set it up yourself?
|
The cluster is operated by the HPC center; I just submit my jobs there. I talked to the HPC staff, and they said something to the effect that the MPI used on the cluster is not compatible with Fluent, but I am not sure they are right. So I am posting my problem here to see if anyone has run into the same problem.
|
Then they are the right people to solve your problem.
But it would be a pity not to be able to use Fluent's parallel capabilities on this cluster.
|
I know this is late, but you have to add '-ssh' to the fluent command line in the submission file. That forces Fluent to use ssh rather than rsh, which it uses by default.
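Applied to the submission script from the question, the change might look like the sketch below. It keeps the original PBS directives and variables and adds `-ssh` to the launch line (with a leading dash on `-cnf=`, which the posted launch line was missing). Wrapping the command in a function (`run_fluent`, a hypothetical name) and launching only when `$PBS_JOBID` is set are illustrative additions, not part of the original script.

```shell
#!/bin/sh
#PBS -j oe
#PBS -l nodes=2:ppn=3
#PBS -q main
#PBS -l walltime=00:10:00

# Solver version, total CPU count (keep equal to nodes x ppn above), journal file.
FLUENTSOLVER=2ddp
CPUCOUNT=6
INPUT=test2.jou

# Launch Fluent in batch mode on the nodes PBS assigned:
#   -pib                InfiniBand interconnect
#   -ssh                start the remote node processes with ssh instead of rsh
#   -cnf=$PBS_NODEFILE  host list written by PBS for this job
#   -g                  no graphics
#   -i $INPUT           read commands from the journal file
run_fluent() {
  fluent "$FLUENTSOLVER" -t"$CPUCOUNT" -pib -ssh -cnf="$PBS_NODEFILE" -g -i "$INPUT"
}

# Only launch when running inside a PBS job (PBS sets PBS_JOBID).
if [ -n "$PBS_JOBID" ]; then
  cd "$PBS_O_WORKDIR" || exit 1
  run_fluent > "$PBS_O_WORKDIR/$PBS_JOBID.out" 2>&1
fi
```

Keeping the launch in one function makes it easy to check the exact argument list (for example by temporarily replacing `fluent` with `echo`) before spending queue time on it.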
|
Running a job on HPC
Hi everyone,
I have access to the supercomputer facility at my university. I also have a node on which I have to run my simulation, but the problem is how to set up my case using commands. I want to read my mesh or case file from the Linux command line, but I am running into problems. Please help me in this regard.
|
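A minimal sketch of the batch approach used in the submission script earlier in this thread (a journal file passed with `-i`, and `-g` for no graphics): put the commands you would otherwise type into the Fluent console into a journal file, then start Fluent with it. The file names, the iteration count and the `write_journal` helper are hypothetical, and the text-command paths (`/file/read-case`, `/solve/initialize/initialize-flow`, `/solve/iterate`, `/file/write-case-data`, `/exit`) should be checked against the text-command menu of your Fluent version.

```shell
#!/bin/sh
# write_journal CASE OUT ITERS JOU: write a Fluent journal that reads CASE,
# initializes the flow field, runs ITERS iterations, saves case and data
# as OUT, and exits. All names here are placeholders.
write_journal() {
  cat > "$4" <<EOF
/file/read-case $1
/solve/initialize/initialize-flow
/solve/iterate $3
/file/write-case-data $2
/exit yes
EOF
}

write_journal mycase.cas mycase_result 500 run.jou

# Batch run: 2D double precision, no GUI (-g), commands from the journal (-i).
if command -v fluent >/dev/null 2>&1; then
  fluent 2ddp -g -i run.jou > run.log 2>&1
else
  echo "fluent is not in PATH; set up the Fluent environment first" >&2
fi
```

On a cluster with a scheduler, the `fluent` line would normally go inside a job script like the one in the first post rather than being run directly on the login node.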
Hi, install MobaXterm, connect to your cluster, and you will be able to compile.
|