CFD Online Discussion Forums - Parallel Fluent Error in Batch Mode

Parallel Fluent Error in Batch Mode

I am getting an error when trying to run a Fluent job in parallel via a PBS batch system on a Unix cluster.

The case will load in interactive mode, but when I try to launch it in batch mode something goes wrong and it gives an error before loading the grid.

Here is the error from the fluent output file:

Multicore processors detected. Processor affinity set!

Reading "Case1FullStack.cas"...

MPI Application rank 0 killed before MPI_Finalize() with signal 9
node 999999 retrying on zero socket read.....
node 999999 retrying on zero socket read.....
999999 (../../src/mpsystem.c@1123): mpt_read: failed: errno = 11
999999: mpt_read: error: read failed trying to read 4 bytes: Resource temporarily unavailable

Here is the batch file:

    #PBS -N parallel_fluent
    #PBS -l walltime=1:00:00
    #PBS -l nodes=1:ppn=4
    #PBS -l software=fluent:fluentpar+4
    #PBS -j oe
    #PBS -m ae
    #PBS -S /bin/csh
    set echo on
    hostname
    module load fluent
    cd $PBS_O_WORKDIR
    cat $PBS_NODEFILE | sort > pnodes
    set ncpus=`cat pnodes | wc -l`
    fluent 3ddp -t$ncpus -pinfiniband.ofed -cnf=pnodes -g < Case1Fullgrdck.input

And the input file:

    file/read-case Case1FullStack.cas
    grid/check
    solve/initialize/initialize-flow
    file/write-data Case1FullStack.dat
    exit
    yes

Any input on what I am doing wrong would be greatly appreciated.

Thanks,
Justin

Re: Parallel Fluent Error in Batch Mode

Never mind, I figured it out. The case was using too much RAM for a single compute node, so I switched to one core per node across multiple nodes.
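The change described above might look something like this in the batch file. This is only a sketch: the node count of 4 is an assumption chosen to keep the same four processes as the original script, and whether ppn=1 really gives each process a whole node's memory depends on how the cluster schedules jobs.

```csh
# Before: four processes sharing the memory of one node
# #PBS -l nodes=1:ppn=4

# After (assumed values): one process on each of four nodes
#PBS -l nodes=4:ppn=1
```

The rest of the script can stay as it is, since $PBS_NODEFILE should then list four hosts and ncpus still comes out as 4.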

Quote:

Originally Posted by Justin

Nevermind. I figured it out. It was using too much RAM for a single compute node, so I switched to one core per node on multiple nodes.

Hi Justin, could you elaborate on how you solved this problem? I just ran into the same thing.
The funny thing is that the simulation ran for 12 time steps before hitting this error.

Thanks!

Hi guys, did you figure it out? I am having the same problem, and I guess it is also related to memory.
MPI Application rank 4 killed before MPI_Finalize() with signal 9
Node 12: Process 22312: Received signal SIGTERM.
Node 13: Process 22313: Received signal SIGTERM.
Node 14: Process 22314: Received signal SIGTERM.
Node 8: Process 22308: Received signal SIGTERM.
Node 5: Process 22305: Received signal SIGTERM.
Node 2: Process 22302: Received signal SIGTERM.
Node 11: Process 22311: Received signal SIGTERM.
===============Message from the Cortex Process================================
Fatal error in one of the compute processes.