Tutorial: Running STARCCM on Ubuntu with SLURM and OpenMPI over Infiniband

July 11, 2020, 05:35
Erik Lönroth
Join Date: Jul 2020
Location: Sweden
This is a tutorial on running a reference StarCCM+ job on Ubuntu 18.04 using the snap version of SLURM with OpenMPI 4.0.4 over InfiniBand.

You could use this to perform scaling studies, track down issues, optimize performance, or use it as you like. Much of this will work on other operating systems too.

This is the workbench used:

* Hardware: 2 hosts, each with 2x20 cores and 187 GB RAM.
* Infiniband: Mellanox MT28908 Family [ConnectX-6]
* OS: Linux 4.15.0-109-generic (x86_64), Ubuntu 18.04.4
* SLURM 20.04 (https://snapcraft.io/slurm)
* OpenMPI: 4.0.4 (ucx, openib)
* StarCCM+: STAR-CCM+14.06.012
* A reference model that is small enough for your computers and large enough to run over 2 nodes.

Let's get started.

Modify ulimits on all nodes
This is done by editing /etc/security/limits.d/30-slurm.conf
Code:
* soft nofile  65000
* hard nofile  65000
* soft memlock unlimited
* hard memlock unlimited
* soft stack unlimited
* hard stack unlimited
Modify slurm systemd unit startup files
Code:
$ sudo systemctl edit snap.slurm.slurmctld.service
Code:
[Service]
LimitNOFILE=131072
LimitMEMLOCK=infinity
LimitSTACK=infinity
* Restart slurm on all nodes.

* Make sure the login nodes have the correct ulimits after a login.

* Validate that all worker nodes also have the correct ulimit values when using slurm. For example:

Code:
$ srun -N 1 bash -c 'ulimit -a'
The ulimit settings must be consistent on all nodes or things will go sideways. Remember that slurm propagates ulimits from the submitting node, so make sure they are consistent there too.
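
To make that consistency check less manual, here is a minimal sketch that compares the limits reported by each node. The helper name limits_consistent is made up for this example and is not part of slurm. It expects one line per node on stdin in the form "host nofile memlock stack".

```shell
# Sketch only: succeed if every node reports the same three ulimit values.
# Input on stdin, one line per node: "<host> <nofile> <memlock> <stack>".
# limits_consistent is a hypothetical helper name, not a slurm command.
limits_consistent() {
    awk '{ seen[$2 " " $3 " " $4] = 1 }
         END { n = 0; for (k in seen) n++; exit (n == 1 ? 0 : 1) }'
}

# Possible usage on the cluster (2 nodes, one task per node):
#   srun -N 2 --ntasks-per-node=1 bash -c \
#       'echo "$(hostname) $(ulimit -n) $(ulimit -l) $(ulimit -s)"' \
#     | limits_consistent && echo "ulimits match" || echo "ulimits differ"
```

The function returns a non-zero status if the nodes disagree, and also if it receives no input at all.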

Compile OpenMPI 4.0.4
At the time of writing, this is the latest version. This is my configure line, but you can probably compile it differently for your needs.

Code:
$ ./configure --without-cm --with-ib --prefix=/opt/openmpi-4.0.4
Validate that openmpi can see the correct mca ucx
Code:
/opt/openmpi-4.0.4/bin/ompi_info  | grep -E 'btl|ucx'
MCA btl: openib (MCA v2.1.0, API v3.1.0, Component v4.0.4)
MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.0.4)
MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.0.4)
MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.0.4)
MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v4.0.4)
MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v4.0.4)
MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.4)
What we are looking for here is:
* MCA btl: openib (MCA v2.1.0, API v3.1.0, Component v4.0.4)
* MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.4)

The rest are not important at this point. If you know better, please let me know. You can see in the job script later where these components are referenced.
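
If you want to script this check, for example at the top of a job script, something like the following could work. It is only a sketch: has_mca_component is a name I made up, and it just greps the ompi_info output shown above for a framework/component pair.

```shell
# Sketch only: read `ompi_info` output on stdin and succeed if the given
# framework/component pair is listed, e.g. "btl openib" or "pml ucx".
# has_mca_component is a hypothetical helper, not an OpenMPI tool.
has_mca_component() {
    local framework="$1" component="$2"
    grep -Eq "MCA[[:space:]]+${framework}:[[:space:]]+${component}[[:space:]]"
}

# Possible usage:
#   /opt/openmpi-4.0.4/bin/ompi_info | has_mca_component btl openib \
#       || echo "openib btl is missing"
#   /opt/openmpi-4.0.4/bin/ompi_info | has_mca_component pml ucx \
#       || echo "ucx pml is missing"
```

Note that the pattern anchors on "MCA btl:", so the "MCA fbtl:" line is not mistaken for a btl component.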

Validate that ucx_info sees your Infiniband device and ib_verbs transports
In my case, I have a Mellanox device (list it with: ibv_devices), so I should see it with ucx_info:

Code:
$ ucx_info -d | grep -1 mlx5_0
#
# Memory domain: mlx5_0
# Component: ib
--
# Transport: rc_verbs
# Device: mlx5_0:1
#
--
# Transport: rc_mlx5
# Device: mlx5_0:1
#
--
# Transport: dc_mlx5
# Device: mlx5_0:1
#
--
# Transport: ud_verbs
# Device: mlx5_0:1
#
--
# Transport: ud_mlx5
# Device: mlx5_0:1
#
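
The grep output above is a bit noisy. As a sketch, the transports for one device can be pulled out with a small awk filter. ucx_transports_for is a hypothetical helper name, and it assumes the "Transport:" line always comes before the matching "Device:" line, as in the output above.

```shell
# Sketch only: read `ucx_info -d` output on stdin and print the transports
# that are listed for the given device (e.g. mlx5_0).
# ucx_transports_for is a hypothetical helper, not part of UCX.
ucx_transports_for() {
    local device="$1"
    awk -v dev="$device" '
        # Remember the most recent transport name.
        $2 == "Transport:" { transport = $3 }
        # Print it when the following device line belongs to our device.
        $2 == "Device:" && index($3, dev ":") == 1 { print transport }
    '
}

# Possible usage:
#   ucx_info -d | ucx_transports_for mlx5_0
```

On my workbench this should list rc_verbs, rc_mlx5, dc_mlx5, ud_verbs and ud_mlx5, matching the output above.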

Modify the STARCCM+ installation
My version of StarCCM+ uses an old ucx and calls /usr/bin/ucx_info. At some point during startup, it fails because it is not able to find libibcm.so.1 when using our custom OpenMPI. Perhaps there is a way to force StarCCM+ to look for ucx_info on the system, but I have not found one.

To have StarCCM+ ignore its own ucx, simply remove the ucx from the installation tree and replace it with an empty directory.

Code:
rm -rf /opt/STAR-CCM+14.06.012/ucx/1.5.0-cda-001/linux-x86_64*
mkdir -p /opt/STAR-CCM+14.06.012/ucx/1.5.0-cda-001/linux-x86_64-2.17/gnu7.1/lib
This is not needed on operating systems such as CentOS 6 and CentOS 7, because they use the deprecated libibcm.so.1.

Time to write the job-script
Code:
#!/bin/bash
#SBATCH -J starccmref
#SBATCH -N 2
#SBATCH -n 80

set -o xtrace
set -e

# StarCCM+
export PATH=$PATH:/opt/STAR-CCM+14.06.012/star/bin

# OpenMPI
export OPENMPI_DIR=/opt/openmpi-4.0.4
export PATH=${OPENMPI_DIR}/bin:$PATH
export LD_LIBRARY_PATH=${OPENMPI_DIR}/lib

# Report on the versions for logs
which ompi_info
which mpirun
ompi_info | grep btl
ompi_info | grep ucx

# Kill any leftovers from previous runs
kill_starccm+

# Export the license server so that starccm+ sees it in its environment
export CDLMD_LICENSE_FILE="27012@license.server.com"
SIM_FILE=SteadyFlowBackwardFacingStep_final.sim
STAR_CLASS_PATH="/software/Java/poi-3.7-FINAL"

NODE_FILE="nodefile"

# Assemble a nodelist using this python lib
hostListbin=/software/hostlist/python-hostlist-1.18/hostlist

$hostListbin --append=: --append-slurm-tasks=$SLURM_TASKS_PER_NODE -e $SLURM_JOB_NODELIST >  $NODE_FILE

# Start
starccm+ -machinefile ${NODE_FILE} \
         -power \
         -batch ./starccmSim.java \
         -np $SLURM_NTASKS \
         -ldlibpath $LD_LIBRARY_PATH \
         -classpath $STAR_CLASS_PATH \
         -fabricverbose \
         -mpi openmpi \
         -mpiflags "--mca pml ucx --mca btl openib --mca pml_base_verbose 10 --mca mtl_base_verbose 10"  \
         "./${SIM_FILE}"

# Kill off any rogue processes
kill_starccm+
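
If you do not have python-hostlist available, the node file can probably be assembled with scontrol and awk instead. This is only a sketch under two assumptions: that the hostlist command above writes one "host:tasks" line per node, and that SLURM_TASKS_PER_NODE uses the compact slurm form such as "40(x2)" or "2(x2),1". make_machinefile is a name I made up.

```shell
# Sketch only: read host names on stdin (one per line) and print
# "<host>:<tasks>" lines, expanding a SLURM_TASKS_PER_NODE style spec
# such as "40(x2)" or "2(x2),1".
# make_machinefile is a hypothetical helper, not a slurm command.
make_machinefile() {
    awk -v spec="$1" '
        BEGIN {
            n = split(spec, items, ",")
            for (i = 1; i <= n; i++) {
                count = items[i]; reps = 1
                # "40(x2)" means: 40 tasks on each of 2 consecutive nodes.
                if (match(items[i], /\(x[0-9]+\)$/)) {
                    count = substr(items[i], 1, RSTART - 1)
                    reps = substr(items[i], RSTART + 2, RLENGTH - 3) + 0
                }
                for (r = 0; r < reps; r++) counts[++total] = count
            }
        }
        # Pair the n-th host with the n-th task count.
        NF { print $1 ":" counts[++line] }
    '
}

# Possible usage inside the job script:
#   scontrol show hostnames "$SLURM_JOB_NODELIST" \
#     | make_machinefile "$SLURM_TASKS_PER_NODE" > "$NODE_FILE"
```

With two nodes and the spec "40(x2)", this prints one line per node with 40 tasks each, which adds up to the 80 tasks requested with -n 80.
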
Submit to slurm
Code:
$ sbatch -p debug -n 80 ./starccmubuntu.sh
You will likely need to modify the job script above. For example, set the task count to a multiple of the per-node core count so that your nodes are filled up. In my case, -n 80 equals the total core count of the 2 nodes in my workbench. You might need a different value.

You can watch your Infiniband counters to see that a significant amount of traffic is sent over the wire, which indicates that you have succeeded.

Code:
watch -d cat /sys/class/infiniband/mlx5_0/ports/1/counters/port_rcv_packets
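
If you would rather log a number than watch the screen, a sketch like the one below samples the same counter twice and prints the difference. counter_delta is a hypothetical helper name, and the 5 second default interval is arbitrary.

```shell
# Sketch only: print how much a counter file increased over an interval.
# counter_delta is a hypothetical helper; the interval defaults to 5 seconds.
counter_delta() {
    local file="$1" interval="${2:-5}" before after
    before=$(cat "$file") || return 1
    sleep "$interval"
    after=$(cat "$file") || return 1
    echo $((after - before))
}

# Possible usage while the job is running:
#   counter_delta /sys/class/infiniband/mlx5_0/ports/1/counters/port_rcv_packets 10
```

A value that stays at zero while the job is running would suggest that the traffic is not going over this port.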
I have presented at Ubuntu Masters on the setup I use to work with my systems, which allows me to do things like this easily. Here is a link to that material: https://youtu.be/SGrRqCuiT90

I hope you can make use of this, and also that StarCCM+ will soon support Ubuntu straight out of the box.

Last edited by erik_lonroth; July 11, 2020 at 17:06.
Tags
infiniband, openmpi, slurm, starccm, ubuntu





All times are GMT -4.