CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   SU2 (https://www.cfd-online.com/Forums/su2/)
    -   SU2 code scaling poorly on multiple nodes (https://www.cfd-online.com/Forums/su2/204239-su2-code-scaling-poorly-multiple-nodes.html)

Samirs July 18, 2018 05:22

SU2 code scaling poorly on multiple nodes
 
Hi All,

I have successfully compiled the parallel version of SU2 on our HPC cluster, which has Intel Broadwell nodes. I modified parallel_computation.py to build the mpirun command for running SU2_CFD in parallel. On a single node I see linear scaling with the number of MPI processes, but when I run the same script in batch mode through SLURM on multiple nodes, performance degrades.

I tried a simulation of the turbulent ONERA_M6 test case.

Thanks in advance for your suggestions and help in this regard.

Attached is the SLURM script I use to submit the job.
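
(For readers without access to the attachment: a generic multi-node submission might look like the sketch below. This is not the poster's script; the node and task counts, module name, and config file name are placeholders for illustration.)

    #!/bin/bash
    # Generic sketch of a multi-node SLURM job for SU2 (placeholders only,
    # not the poster's attachment).
    #SBATCH --job-name=su2_oneram6
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=28   # set to the cores per node on your cluster
    #SBATCH --time=02:00:00

    module load intel-mpi          # or whichever MPI module your cluster provides

    # Let SLURM's launcher distribute the MPI ranks across the allocated nodes.
    srun SU2_CFD turb_ONERAM6.cfg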

hlk August 25, 2018 19:15

You may want to refer to SU2_PY/SU2/run/interface.py to see how the MPI command is called from the Python scripts, to make sure that this works with your cluster. You can also set SU2_MPI_COMMAND in your config file to use a customized launch command without needing to modify the Python scripts.
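
For example, a minimal config excerpt might look like the following (a sketch; the %i/%s placeholders are an assumption based on the default command string in interface.py, so verify the value your SU2 version expects):

    % Hypothetical excerpt from an SU2 configuration (.cfg) file.
    % "%i" is assumed to expand to the number of ranks and "%s" to the
    % SU2_CFD invocation; check against your SU2 version.
    SU2_MPI_COMMAND= srun -n %i %s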



Sometimes multiple nodes, each with several processors, scale worse than the same number of processors within a single node, because information now has to travel between nodes rather than staying within one node; what sometimes surprises people is that even the length of the cable connecting the nodes matters. However, on most modern clusters the difference shouldn't be so extreme as to prevent you from benefiting from multiple nodes. If it is extreme, you can try running other parallel programs that require communication between processes (see the sketch below), or contact your system administrators about the expected difference between inter-node and intra-node communication and about tips for compiling in a way that is optimized for the specific cluster architecture.
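
For example, a minimal MPI ping-pong test (a sketch assuming mpi4py and NumPy are installed on the cluster; the script name and message size are arbitrary) can compare intra-node and inter-node communication. Run it once with both ranks on the same node and once with one rank on each of two nodes, then compare the reported times:

    # pingpong.py: measure average round-trip time between two MPI ranks.
    # Launch with exactly two ranks, e.g.  srun -n 2 python pingpong.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    n_iters = 1000
    buf = np.zeros(1024, dtype=np.float64)  # 8 KiB message

    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(n_iters):
        if rank == 0:
            comm.Send(buf, dest=1, tag=0)    # rank 0 sends, then waits
            comm.Recv(buf, source=1, tag=1)
        elif rank == 1:
            comm.Recv(buf, source=0, tag=0)  # rank 1 echoes the message back
            comm.Send(buf, dest=0, tag=1)
    elapsed = MPI.Wtime() - t0

    if rank == 0:
        print("average round trip: %.1f us" % (elapsed / n_iters * 1e6))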



