CFD Online Discussion Forums - Issues with mpirun in HPC


Hi all,

I've been trying to run optimization software that launches several OpenFOAM cases on an HPC cluster, but I'm having a hard time making it work. Maybe someone can help me with this...

I have two scenarios. In both, 4 OpenFOAM cases should run at the same time, using 4 processors each:

1- 16 processors allocated on only 1 node: in OpenFOAM I run all the commands in parallel in the form "mpirun -np 4 --bind-to none simpleFoam -parallel < /dev/null > log.simpleFoam 2>&1". The result is that all the simulations run on the same processors, so they take a very long time to finish. On my notebook I could use this command with no issues.

2- 16 processors allocated on 4 nodes (4 processors per node): I use the same kind of command, but now all the simulations run on the same node, which also takes a long time to finish.
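For scenario 1 (all 16 processors on one node), one possible approach is to give each of the four mpirun calls its own disjoint block of logical CPUs instead of using "--bind-to none". The sketch below assumes that Open MPI's `--cpu-set` option is available in version 1.10 and that the job owns logical CPUs 0-15 on that node (with SLURM cgroups the numbering may differ, so check `mpirun --help` and the node layout first). `cpu_range` and the case counter `$i` are hypothetical names, not part of OpenFOAM or Dakota.

```bash
#!/bin/bash
# Hypothetical helper: print the block of logical CPUs reserved for one case,
# e.g. case 0 -> "0-3", case 1 -> "4-7" when each case uses 4 MPI processes.
# $1 = 0-based case index, $2 = MPI processes per case.
cpu_range() {
    local first last
    first=$(( $1 * $2 ))
    last=$(( first + $2 - 1 ))
    echo "${first}-${last}"
}

# Possible use in the Allrun of case number $i (assumes --cpu-set is honoured
# inside the SLURM allocation; not verified on the poster's cluster):
#   mpirun -np 4 --cpu-set "$(cpu_range "$i" 4)" --bind-to core --report-bindings \
#       simpleFoam -parallel < /dev/null > log.simpleFoam 2>&1
```

The `--report-bindings` option makes mpirun print where each rank was bound, so overlapping cases would show up directly in the log.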

Does anyone know how to solve either scenario? I'm using SLURM to submit the job, along with OF4.1 (OpenFOAM 4.1) and Dakota for the optimization.

If you need any additional information, just let me know!

Thanks! =)

Quick answer/question:

From your description, the problem is that you are using the same job for running the 4 cases.
If you were to provide an example SLURM job file that you are using, it would make it a lot easier to try and suggest how to correct this issue.
Knowing which MPI implementation and version is being used would also make it a bit easier to suggest more specific options...
I've got a fairly large blog post with notes on the overall topic: Notes about running OpenFOAM in parallel
- From there, see post #9 on this thread: https://www.cfd-online.com/Forums/ha...tml#post356954

Hi Bruno,

Thank you for your reply!

I had actually already checked the links you sent me. I found the bind options there, but not a solution to my problem.

From your questions:
2- This is the job file. It is very simple because the real work happens when I call Dakota: everything, including the generation of the cases to run, is driven by the Dakota input file:

Code:

#!/bin/bash -v
#SBATCH --partition=SP2
#SBATCH --ntasks=16              # number of tasks / MPI processes
#SBATCH --ntasks-per-node=4      # number of tasks per node (test)
#SBATCH --cpus-per-task=1        # number of OpenMP threads per process
#SBATCH -J OpenFOAM
#SBATCH --time=15:00:00          # if you don't specify it, the default is 8 hours; the limit is 480
#SBATCH --mem-per-cpu=2048       # 2048 MB (2 GB) of RAM per CPU; maximum of 480000 across all CPUs

# OpenMP settings:
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OMP_PLACES=threads
export OMP_PROC_BIND=spread

echo $SLURM_JOB_ID               # ID of the job allocation
echo $SLURM_SUBMIT_DIR           # directory from which the job was submitted
echo $SLURM_JOB_NODELIST         # File containing allocated hostnames
echo $SLURM_NTASKS               # Total number of cores for job

# run the application:
export PATH=/apps/gnu:$PATH
cd $HOME/opt_test/bump/18_WTGs
./initial_sol_run
dakota -i dakota_of.in
exit

3- I'm using Open MPI 1.10.7.

What I find strange is that when I run the same setup on my notebook (with fewer processors), it works fine: all the OpenFOAM cases are distributed across the processors.

The only solution I can think of is to use 4 nodes and specify which node each OpenFOAM case uses during the optimization, but that will be somewhat difficult to do. This HPC is shared and has 128 nodes, so I will only know which nodes are allocated once the job starts; I cannot set this up before submitting the job.
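Assigning one node per case only needs the node list that becomes available when the job starts. A minimal sketch, assuming the allocated hostnames have already been written one per line to a file (called `nodes.txt` here, a hypothetical name) and that the cases running at the same time are numbered consecutively from 0; `node_for_case` is a hypothetical helper:

```bash
#!/bin/bash
# Hypothetical helper: print the node assigned to a given case.
# $1 = file with one hostname per line, $2 = 0-based case index.
# Blank lines are skipped. The index wraps around when there are more cases
# than nodes, so case 4 reuses the first node, case 5 the second, and so on.
node_for_case() {
    awk -v idx="$2" 'NF { nodes[n++] = $1 } END { if (n) print nodes[idx % n] }' "$1"
}
```

For example, `node_for_case nodes.txt 2` would print the third allocated node. The wrap-around only keeps concurrent runs apart if no more than one case per node is running at any moment.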

Let me know if you need any additional information!

Best Regards,
Luis

Quick answers:

From your job script, the variable "SLURM_JOB_NODELIST" holds the path to the file that contains the node names, so you can use it to extract the names of the machines and adjust the call to mpirun for each case.
- However, the SLURM documentation states that it provides the list of nodes directly, not the path to a file.
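Because SLURM_JOB_NODELIST holds the node list in SLURM's compressed form (for example `node[01-04]`) rather than a file path, it normally has to be expanded first; `scontrol show hostnames` does that, printing one hostname per line. A minimal job-script sketch, assuming the expanded list should be saved to a file named `nodes.txt` for the case-generation scripts to read; both that file name and `write_nodefile` are hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: read hostnames (one per line) on stdin, skip blank
# lines and duplicates while keeping the original order, and write the
# result to the file given as $1.
write_nodefile() {
    awk 'NF && !seen[$1]++ { print $1 }' > "$1"
}

# In the SLURM job script, before calling dakota (needs SLURM's scontrol):
#   scontrol show hostnames "$SLURM_JOB_NODELIST" | write_nodefile nodes.txt
```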
I'm not familiar with Dakota, so I don't know how you are telling it how to run cases in parallel.
If Open-MPI is being used, the selection of nodes (machines) can be done with "--host" as explained here: https://www.open-mpi.org/faq/?catego...mpirun-options
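In Open MPI, repeating a host name in `--host` gives that host one slot per occurrence, so one way to keep a 4-process case on a single node is to list that node four times. A minimal sketch; `host_arg` is a hypothetical helper and `node01` a placeholder hostname. Under SLURM, the nodes named this way presumably still have to belong to the job's allocation.

```bash
#!/bin/bash
# Hypothetical helper: build an Open MPI --host value that repeats one node
# name np times, e.g. "node01,node01,node01,node01" for np=4.
# $1 = node name, $2 = number of MPI processes for the case.
host_arg() {
    local list i
    list=$1
    i=1
    while [ "$i" -lt "$2" ]; do
        list="$list,$1"
        i=$(( i + 1 ))
    done
    echo "$list"
}

# Possible use in a case's Allrun (node name taken from the job's node list):
#   mpirun -np 4 --host "$(host_arg node01 4)" simpleFoam -parallel \
#       < /dev/null > log.simpleFoam 2>&1
```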
My guess is that SLURM has overridden how mpirun works and might disregard the option "--bind-to none" for security reasons, e.g. to avoid having someone use the cores from another run on a node.

Hi Bruno,

From your answers:

1- While I was writing this post, I was also writing code to extract the node names, and that part works. In my case, I'm saving the output of SLURM_JOB_NODELIST to a file, which gives me the names of all nodes allocated to my job;

2- Dakota basically generates the input for my CFD cases. I have several scripts that read this input file and create the cases I need, including writing the Allrun scripts where I use the MPI commands to run OpenFOAM;

3- I was trying this option, but only by giving it a file with the names of all possible hosts, which didn't work... I guess I have to provide the specific host for each case, right?

4- That makes a lot of sense to me, since a node can be shared with other users.
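On point 3: a hostfile listing every allocated node is shared by all the mpirun calls, and each call places its ranks independently starting from the first node, which would explain why all the cases ended up on one node in scenario 2. A per-case hostfile that names only the node assigned to that case avoids this. A minimal sketch, assuming Open MPI's hostfile syntax `<hostname> slots=<N>`; `write_case_hostfile` and the file name `hostfile` inside the case directory are hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: write an Open MPI hostfile that contains only the node
# assigned to one case, with as many slots as that case has MPI processes.
# $1 = case directory, $2 = node name, $3 = MPI processes per case.
write_case_hostfile() {
    printf '%s slots=%d\n' "$2" "$3" > "$1/hostfile"
}

# Possible use when generating a case's Allrun:
#   write_case_hostfile "$case_dir" "$node" 4
#   mpirun -np 4 --hostfile hostfile simpleFoam -parallel < /dev/null > log.simpleFoam 2>&1
```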

I'll keep working on a way to provide specific hosts to each CFD case. It will be a lot of work, but I don't see any other way out.

Anyway, if you have any other advice, I'm open to it!!

Thank you for taking your Sunday time to help me!! =)

@Bruno,

The HPC administrators changed some configuration settings. Now I'm using the option --bind-to socket, and I was able to confirm that each process runs on a different CPU.
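One way to check that each process runs on a different CPU is to read the `psr` column of `ps` (the processor a process last ran on) on the compute node while the cases are running; adding Open MPI's `--report-bindings` option to the mpirun call is another. A minimal sketch of the `ps` check, assuming procps `ps` on the nodes; `all_distinct_cpus` is a hypothetical helper. Since `psr` is a snapshot, the check is only meaningful once the ranks are bound.

```bash
#!/bin/bash
# Hypothetical helper: read one CPU number per line on stdin and succeed
# (exit status 0) only if no CPU number appears more than once.
all_distinct_cpus() {
    # seen[] counts each CPU number; any repeat sets dup=1 and fails the check.
    awk 'NF { if (seen[$1]++) dup = 1 } END { exit dup }'
}

# On a compute node while the runs are active (procps ps assumed):
#   ps -C simpleFoam -o psr= | all_distinct_cpus && echo "all ranks on distinct CPUs"
```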

The speed is not as high as I was expecting, possibly because of the clock speed of the HPC CPUs, but that's something I cannot change.

Thank you for the help!

[Moderator note: This post was split half-way, where the first part was moved to here: https://www.cfd-online.com/Forums/op...ller-code.html ]