
Issues with mpirun in HPC

March 24, 2019, 14:34   #1
Issues with mpirun in HPC

Luis Eduardo (lebc)
Member | Join Date: Jan 2011 | Posts: 85
Hi all,

I've been trying to run optimization software that launches several OpenFOAM cases on an HPC, but I'm having a hard time making it work; maybe someone can help me with this...

I have two scenarios, both should run 4 OpenFOAM cases using 4 processors each:

1- 16 processors allocated on only 1 node: in OpenFOAM I'm running all the parallel commands in the format "mpirun -np 4 --bind-to none simpleFoam -parallel < /dev/null > log.simpleFoam 2>&1" (see the sketch after this list for the launch structure I mean). The result is that the simulations all end up on the same processors, which takes a very long time to finish. On my notebook I could use this command with no issues.

2- 16 processors allocated across 4 nodes (4 processors per node): I use the same kind of command, but now all the simulations run on the same node, which also takes a long time to finish.
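
Just to make the structure clear, the launch pattern generated by my scripts looks roughly like this (a simplified sketch, not my actual Allrun files; the case names are placeholders):
Code:
#!/bin/bash
# Simplified sketch: start the 4 decomposed cases side by side,
# each with its own 4-rank mpirun, then wait for all of them to finish.
for case_dir in case1 case2 case3 case4; do
    (
        cd "$case_dir" || exit 1
        mpirun -np 4 --bind-to none simpleFoam -parallel \
            < /dev/null > log.simpleFoam 2>&1
    ) &
done
wait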

Does anyone know how to solve either scenario? I'm using SLURM to submit the job, along with OF 4.1 and Dakota for the optimization.

In case you need any additional information just let me know!

Thanks! =)

March 24, 2019, 14:50   #2

Bruno Santos (wyldckat)
Retired Super Moderator | Join Date: Mar 2009 | Location: Lisbon, Portugal | Posts: 10,975 | Blog Entries: 45
Quick answer/question:
  1. From your description, the problem is that you are using the same job for running the 4 cases.
  2. If you were to provide an example SLURM job file that you are using, it would make it a lot easier to try and suggest how to correct this issue.
  3. Knowing which MPI toolbox and version is being used would also make it a bit easier to suggest more specific options (a quick way to check is sketched after this list)...
  4. I've got a fairly large blog post with notes on the overall topic: Notes about running OpenFOAM in parallel
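
For item 3, the MPI flavour and version can usually be checked with something like this, in the same environment the job uses (assuming Open-MPI; adjust if another MPI is in use):
Code:
mpirun --version          # reports the MPI flavour and version
ompi_info | head -n 5     # extra details, if it is Open-MPI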

March 24, 2019, 15:48   #3

Luis Eduardo (lebc)
Member | Join Date: Jan 2011 | Posts: 85
Hi Bruno,

Thank you for your reply!

Actually, I had already checked the links you sent; I found the bind options there, but I couldn't find a solution to my problem.

From your questions:
2- This is the job file (very simple; the action actually happens when I call Dakota, since everything is in the Dakota input, including the generation of the cases to run):
Code:
#!/bin/bash  -v
#SBATCH --partition=SP2
#SBATCH --ntasks=16             # number of tasks / mpi processes
#SBATCH --ntasks-per-node=4             # number of tasks / node (test)
#SBATCH --cpus-per-task=1       # Number OpenMP Threads per process
#SBATCH -J OpenFOAM
#SBATCH --time=15:00:00         # If not specified, the default is 8 hours. The limit is 480
#SBATCH --mem-per-cpu=2048     # memory per CPU in MB (2048 MB = 2 GB). Maximum of 480000 MB across all CPUs


#OpenMP settings:
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OMP_PLACES=threads
export OMP_PROC_BIND=spread

echo $SLURM_JOB_ID              #ID of job allocation
echo $SLURM_SUBMIT_DIR          #Directory where the job was submitted
echo $SLURM_JOB_NODELIST        #File containing allocated hostnames
echo $SLURM_NTASKS              #Total number of cores for job

#run the application:

export PATH=/apps/gnu:$PATH

cd $HOME/opt_test/bump/18_WTGs

./initial_sol_run

dakota -i dakota_of.in

exit
3- I'm using Open MPI 1.10.7.

What I found weird is that when I run the same setup (using fewer processors) on my notebook, it works fine: all OpenFOAM cases are distributed across the processors.

The only solution I could think of is to use 4 nodes and specify the node to be used for each OpenFOAM case during the optimization process, but that will be a little difficult to do. This HPC is shared and has 128 nodes, so I will only know which nodes are allocated when the job starts; I cannot set this up before submitting the job.

Let me know if you need any additional information!

Best Regards,
Luis

March 24, 2019, 16:51   #4

Bruno Santos (wyldckat)
Retired Super Moderator | Join Date: Mar 2009 | Location: Lisbon, Portugal | Posts: 10,975 | Blog Entries: 45
Quick answers:
  1. From your job script, the variable "SLURM_JOB_NODELIST" has the path to the file that contains the node names, so you should be able to extract the names of the machines and adjust the call to mpirun for each case (see the sketch after this list).
    • That said, the SLURM documentation states that it provides the list of nodes directly, not the path to a file.
  2. I'm not familiar with Dakota, so I don't know how you are telling it how to run cases in parallel.
  3. If Open-MPI is being used, the selection of nodes (machines) can be done with "--host" as explained here: https://www.open-mpi.org/faq/?catego...mpirun-options
  4. My guess is that SLURM has overridden how mpirun works and might disregard the option "--bind-to none" for security reasons, e.g. to avoid having someone use the cores from another run on a node.
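
For example, something along these lines inside the job script might work (an untested sketch; paths and case handling are placeholders to adapt to your setup):
Code:
# Expand SLURM's compressed node list into one hostname per line
scontrol show hostnames "$SLURM_JOB_NODELIST" > nodes.txt

# Pin a single 4-rank case to the first allocated node. With Open-MPI 1.10
# each entry in --host provides one slot, so the host is repeated 4 times
# (alternatively, use a hostfile with "slots=4").
NODE1=$(head -n 1 nodes.txt)
mpirun -np 4 --host "$NODE1,$NODE1,$NODE1,$NODE1" simpleFoam -parallel \
    < /dev/null > log.simpleFoam 2>&1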

March 24, 2019, 17:41   #5

Luis Eduardo (lebc)
Member | Join Date: Jan 2011 | Posts: 85
Hi Bruno,

From your answers:

1- While I was writing this post I was also writing a script to extract the node names, and that part I was able to do. In my case, I'm saving the output of SLURM_JOB_NODELIST to a file, which gives me the names of all nodes allocated to my job;

2- Dakota basically generates the input for my CFD cases; I have several scripts that read this input file and create the cases I need, including writing Allrun scripts, where I use the mpirun commands to run OpenFOAM;

3- I was trying this option, but only passing a file with the names of all possible hosts, which didn't work... I guess I have to provide the specific host, right?

4- Makes a lot of sense to me, since a node can be shared with other users.

I'll keep working on a way to provide specific hosts to each CFD case; it will be a lot of work, but I don't see another way out (a rough sketch of what I'm trying is below).
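
In case it is useful, the direction I'm trying is roughly the following (a rough, untested sketch; file and case names are placeholders):
Code:
# nodes.txt is written at job start from SLURM_JOB_NODELIST (one hostname per line)
mapfile -t NODES < nodes.txt

# Hand out one allocated node per generated case, round-robin
i=0
for case_dir in case_*; do
    HOST=${NODES[$((i % ${#NODES[@]}))]}
    (
        cd "$case_dir" || exit 1
        mpirun -np 4 --host "$HOST,$HOST,$HOST,$HOST" simpleFoam -parallel \
            < /dev/null > log.simpleFoam 2>&1
    ) &
    i=$((i + 1))
done
wait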

Anyway, if you have any other advice, I'm open to it!!

Thank you for taking your Sunday time to help me!! =)

April 22, 2019, 17:20   #6

Luis Eduardo (lebc)
Member | Join Date: Jan 2011 | Posts: 85
@Bruno,

The administrators of the HPC changed some configurations; now I'm using the option --bind-to socket, and I was able to check that each process is running on a different CPU.
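
For reference, the call is essentially the same as before with only the binding option changed (a sketch of the form, not the exact line):
Code:
# Let MPI bind each rank at socket granularity instead of disabling binding
mpirun -np 4 --bind-to socket simpleFoam -parallel < /dev/null > log.simpleFoam 2>&1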

The speed-up is not as high as I was expecting, maybe because of the clock speed of the HPC CPUs, but that's something I cannot change.

Thank you for the help!

[Moderator note: This post was split half-way, where the first part was moved to here: mpi killer code ]

Last edited by wyldckat; April 22, 2019 at 17:23. Reason: see "Moderator note:"
