October 17, 2015, 12:45
Error while running cfx in parallel configuration
H. Omar
Hi everyone,
I am trying to run CFX (v16.0) in parallel using the Slurm workload manager, with the following runCFX.sh script:
Quote:
#!/bin/bash
# write one short hostname per Slurm task to a per-job host file
hostfile=/tmp/hosts.$SLURM_JOB_ID
srun hostname -s > "$hostfile"
# if SLURM_NPROCS is unset, fall back to nodes x tasks-per-node
if [ -z "$SLURM_NPROCS" ]; then
    if [ -z "$SLURM_NTASKS_PER_NODE" ]; then
        SLURM_NTASKS_PER_NODE=1
    fi
    SLURM_NPROCS=$((SLURM_JOB_NUM_NODES * SLURM_NTASKS_PER_NODE))
fi
# use ssh instead of rsh
export CFX5RSH=ssh
# format the host list for cfx (newlines replaced by commas)
cfxHosts=$(tr '\n' ',' < "$hostfile")
# run the partitioner and solver
/usr/ansys_inc/v160/CFX/bin/cfx5solve -par -par-dist "$cfxHosts" \
    -def ./AADL2.def -part "$SLURM_NPROCS" \
    -start-method "Platform MPI Distributed Parallel"
# cleanup
rm -f "$hostfile"
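Note that the `tr` line above leaves a trailing comma and repeats each hostname once per task. As far as I know, `-par-dist` also accepts a compact `host*N` form (N partitions on that host), so the host file could be collapsed first. A minimal sketch, assuming that syntax; `make_cfx_hostlist` is a hypothetical helper name, not part of CFX or Slurm:

```shell
#!/bin/bash
# Hypothetical helper (not part of CFX or Slurm): turn a file with one hostname
# per line (one line per Slurm task, as written by `srun hostname -s`) into a
# comma-separated "host1*N1,host2*N2" list with no trailing comma.
make_cfx_hostlist() {
    local hostfile="$1"
    # sort groups identical hostnames, uniq -c counts the tasks per host,
    # awk prints "host*count", and paste joins the lines with commas
    sort "$hostfile" | uniq -c | awk '{print $2 "*" $1}' | paste -sd, -
}

# usage inside the job script (assumes the host file written by srun above):
# cfxHosts=$(make_cfx_hostlist "/tmp/hosts.$SLURM_JOB_ID")
```

With a host file containing `node01`, `node02`, `node01` (one per line), the helper produces `node01*2,node02*1`.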
I submitted the job with the following command:
Quote:
sbatch -n 10 -N 2 -p mypartition -t 10 ./runCFX.sh
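For reference, the same submission options can also be written as `#SBATCH` directives at the top of runCFX.sh, directly after the shebang line, so the job can be submitted with a plain `sbatch ./runCFX.sh`. A minimal sketch of that header, reusing the values from the command above (a bare `-t 10` means a 10-minute limit); the rest of the script stays unchanged:

```shell
#!/bin/bash
# 10 tasks in total, spread over 2 nodes
#SBATCH -n 10
#SBATCH -N 2
# partition name and a 10-minute wall-clock limit, as on the command line
#SBATCH -p mypartition
#SBATCH -t 10
```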
I got the following error in the Slurm output file:
Quote:
<IBM Platform MPI>: : warning, dlopen of libhwloc.so failed (null)/lib/linux_amd64/libhwloc.so: cannot open shared object file: No such file or directory
An error has occurred in cfx5solve:
The ANSYS CFX partitioner was interrupted by signal SEGV (11)
Can anyone help me solve this issue? Thank you.