Issues with mpirun in HPC
Hi all,
I've been trying to run an optimization tool that runs several OpenFOAM cases on an HPC, but I'm having a hard time making it work. Maybe someone can help me with this... I have two scenarios, and in both I should run 4 OpenFOAM cases using 4 processors each:

1- 16 processors allocated on a single node: in OpenFOAM I run all the parallel commands in the form "mpirun -np 4 --bind-to none simpleFoam -parallel < /dev/null > log.simpleFoam 2>&1". The result is that the simulations all run on the same processors, so they take a very long time to finish. On my notebook I could use this command with no issues.
2- 16 processors allocated across 4 nodes (4 processors per node): I use the same kind of command, but now all the simulations run on the same node, which also takes a long time to finish.

Does anyone know how to solve either of these scenarios? I'm using SLURM to submit the job, along with OF4.1 and Dakota for the optimization. If you need any additional information, just let me know! Thanks! =)
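For scenario 1, one possible workaround is to give each case its own block of cores instead of letting all four mpirun calls share the whole node. Below is a minimal sketch, assuming the cores counted up from FIRST_CORE really belong to this job (on a shared node this can be checked with "grep Cpus_allowed_list /proc/self/status"), that "taskset" from util-linux is installed on the compute node, and that Open MPI leaves the inherited CPU mask alone when started with "--bind-to none". The case names, DRY_RUN and the function names are hypothetical; by default it only prints the commands.

```shell
#!/bin/sh
# Sketch: run several OpenFOAM cases at the same time on ONE node, each case
# pinned to its own block of cores so the cases stop sharing the same CPUs.
# Assumptions: the cores counted up from FIRST_CORE belong to this job,
# taskset (util-linux) is installed, and Open MPI keeps the inherited CPU
# mask when started with --bind-to none. Case names and DRY_RUN are hypothetical.
set -eu

NP_PER_CASE=4   # MPI ranks (and cores) per case
FIRST_CORE=0    # first core of this job's allocation (assumption)

# core_range FIRST COUNT -> prints "FIRST-LAST", e.g. "4 4" -> "4-7"
core_range() {
    echo "$1-$(( $1 + $2 - 1 ))"
}

# run_case CASE_DIR FIRST_CORE NP
# With DRY_RUN=1 (the default) it only prints the command it would run.
run_case() {
    cpus=$(core_range "$2" "$3")
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "cd $1 && taskset -c $cpus mpirun -np $3 --bind-to none simpleFoam -parallel"
        return 0
    fi
    # Subshell so the cd does not leak; taskset restricts mpirun and every
    # rank it starts to the cores in $cpus.
    ( cd "$1" &&
      taskset -c "$cpus" mpirun -np "$3" --bind-to none simpleFoam -parallel \
          < /dev/null > log.simpleFoam 2>&1 )
}

# run_cases CASE_DIR... : hand each case the next free block of cores,
# start the cases in the background and wait for all of them.
run_cases() {
    first=$FIRST_CORE
    for case_dir in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            run_case "$case_dir" "$first" "$NP_PER_CASE"
        else
            run_case "$case_dir" "$first" "$NP_PER_CASE" &
        fi
        first=$(( first + NP_PER_CASE ))
    done
    wait
}

# Dry run for four hypothetical cases: prints cores 0-3, 4-7, 8-11, 12-15.
run_cases case1 case2 case3 case4
```

With DRY_RUN=0, run_cases actually launches the cases in the background and waits for all of them; each case still writes its own log.simpleFoam, which is where a failed case would show up.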
Quick answer/question:
Hi Bruno,
Thank you for your reply! I had actually already checked the links you sent me; I found the bind options there, but not a solution to my problem. As for your questions:
2- This is the job file (very simple; the real work happens when I call Dakota, and everything, including generating the cases to run, is defined in the Dakota input):
Code:
#!/bin/bash -v

What I found weird is that when I run the same case (using fewer processors) on my notebook, it works fine: all the OpenFOAM cases are distributed across the processors. The only solution I can think of is to use 4 nodes and specify which node each OpenFOAM case uses during the optimization, but that will be a little difficult to do. This HPC is shared and has 128 nodes, so I will only know which nodes are allocated once the job starts; I cannot set this up before submitting the job.
Let me know if you need any additional information!
Best Regards,
Luis
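Since the nodes are only known once the job starts, the case-to-node mapping can be computed inside the job at run time instead of before submission. A minimal sketch, assuming the cases are numbered from 0 and spread round-robin over the nodes; "scontrol show hostnames" expands the compressed SLURM_JOB_NODELIST into one hostname per line. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: choose, at run time, which allocated node each OpenFOAM case uses,
# so the node names never have to be known at submission time.
# Assumptions: cases are numbered 0, 1, 2, ... and are spread round-robin
# over the nodes; the function names are hypothetical.
set -eu

# allocated_nodes: print this job's node names, one per line.
# SLURM_JOB_NODELIST is in compressed form (e.g. "node[012-015]");
# "scontrol show hostnames" expands it.
allocated_nodes() {
    scontrol show hostnames "$SLURM_JOB_NODELIST"
}

# node_for_case INDEX: read node names on stdin and print the node that
# case number INDEX should run on (prints nothing if stdin is empty).
node_for_case() {
    awk -v i="$1" '{ node[NR - 1] = $0 } END { if (NR > 0) print node[i % NR] }'
}
```

Inside the job, node=$(allocated_nodes | node_for_case 2) would give the node for the third case; with 4 nodes and 4 cases every case ends up on a different node.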
Quick answers:
Hi Bruno,
From your answers:
1- While I was writing this post I was also writing a script to extract the node names, and I got that part working. In my case, I'm saving the output of SLURM_JOB_NODELIST to a file, which gives me the names of all the nodes allocated to my job;
2- Dakota basically generates the input for my CFD cases. I have several scripts that read this input file and create the cases I need, including writing the Allrun scripts, where I use the mpi commands to run OpenFOAM;
3- I was trying this option, but only by giving it a file with the names of all possible hosts, which didn't work... I guess I have to provide the specific host, right?
4- Makes a lot of sense to me, since a node can be shared with other users.
I'll keep working on a way to provide specific hosts to each CFD case. This will be a lot of work, but I don't see any other way out. Anyway, if you have any other advice, I'm open to it!! Thank you for taking your Sunday time to help me!! =)
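On point 3: if the hostfile lists every allocated node, Open MPI's default mapping most likely fills the first node's slots before moving on, so every 4-rank case lands on the first node, which would match scenario 2. A per-case hostfile naming a single node avoids that. A minimal sketch, assuming Open MPI's "<host> slots=<n>" hostfile syntax; the file name "hostfile" inside each case directory and the function name are hypothetical.

```shell
#!/bin/sh
# Sketch: one Open MPI hostfile per case, naming only the node that case
# should run on, so mpirun cannot fall back to the first allocated node.
# Assumptions: Open MPI's "<host> slots=<n>" hostfile syntax; the file name
# "hostfile" and the function name are hypothetical.
set -eu

# prepare_case CASE_DIR NODE NP
# Writes CASE_DIR/hostfile and prints the mpirun line that the case's
# Allrun script could run from inside CASE_DIR.
prepare_case() {
    printf '%s slots=%s\n' "$2" "$3" > "$1/hostfile"
    echo "mpirun -np $3 --hostfile hostfile simpleFoam -parallel"
}
```

Combined with the node names expanded from SLURM_JOB_NODELIST, the scripts that write each Allrun could call prepare_case once per case and use the printed mpirun line in place of the current one.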
@Bruno,
The HPC administrators changed some configurations, and now, using the option --bind-to socket, I was able to check that each process is running on a different CPU. The speed is not as high as I was expecting, maybe because of the clock speed of the HPC CPUs, but that's something I cannot change. Thank you for the help!
[Moderator note: This post was split half-way, where the first part was moved to here: https://www.cfd-online.com/Forums/op...ller-code.html ]
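To check the placement, "mpirun --report-bindings" prints each rank's binding when it starts, and while the cases run, "ps -o pid,psr,comm -C simpleFoam" shows the core each solver process last ran on. A small sketch that counts the distinct cores in that ps output, assuming the solver processes are named simpleFoam; PSR is only a snapshot, so it is worth repeating a few times. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: count how many different cores the running solver processes use.
# Assumptions: the processes are named simpleFoam and procps "ps" is
# available; the function name is hypothetical.
set -eu

# distinct_cpus: read "ps -o pid,psr,comm" output (header line first) on
# stdin and print the number of distinct cores in the PSR column.
distinct_cpus() {
    awk 'NR > 1 && !seen[$2]++ { n++ } END { print n + 0 }'
}

# Example, on the compute node while the cases are running:
#   mpirun -np 4 --bind-to socket --report-bindings simpleFoam -parallel ...
#   ps -o pid,psr,comm -C simpleFoam | distinct_cpus
```

With 4 cases of 4 ranks each spread properly, the count should approach 16; a much lower number means several ranks are still sharing cores.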