
shape_optimization.py - Inconsistent MPI Errors on HPC Nodes

July 4, 2023, 11:44
shape_optimization.py - Inconsistent MPI Errors on HPC Nodes
  #1
New Member

mardar
Join Date: Dec 2019
Posts: 17
Hi everyone,

I'm currently trying to use shape_optimization.py for the 3D inviscid ONERA M6 tutorial with the discrete adjoint. On my local machine the optimization runs smoothly without any issues, but when I run it on the nodes of an HPC cluster I encounter occasional errors. The failure seems to occur randomly, sometimes during the DEFORM step and other times during the ADJOINT or DIRECT steps. I'm using SU2 7.4.0 and Open MPI 4.1.4.

This intermittent error seems to be related to MPI, and I'm looking for insight into possible reasons for this behavior on the HPC nodes. Has anyone else experienced a similar issue, or does anyone have an idea what could be causing it?

Thanks in advance for your help!

My job file:

#!/bin/bash
#$ -S /bin/bash   # run the job under bash
#$ -V             # export the submission environment to the job
#$ -cwd           # start in the current working directory
#$ -j y           # merge stderr into stdout
# Tell SU2's Python wrappers which mpirun to use and with which MCA options.
export SU2_MPI_COMMAND="/mypath/apps/ompi414/bin/mpirun --mca mtl ^ofi --mca btl_openib_allow_ib 1 --mca btl vader,self,openib -n %d %s"
/mypath/apps/anaconda3/bin/python /mypath/apps/su740/bin/shape_optimization.py -n 30 -g DISCRETE_ADJOINT -f inv_ONERAM6_adv.cfg
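
As a sanity check, here is a minimal sketch of how one could run a single SU2 binary directly under the same mpirun flags, outside the optimization loop, to see whether the segfault reproduces (paths are the ones from my job file above; running SU2_CFD standalone on the tutorial config is just a test idea, not part of the optimization script):

#!/bin/bash
# Isolation test (sketch): run one SU2 binary with the identical Open MPI
# MCA settings used by shape_optimization.py. Paths match my install above.
MPIRUN=/mypath/apps/ompi414/bin/mpirun
SU2_BIN=/mypath/apps/su740/bin

# Primal solver on the tutorial config, same flags, same rank count.
$MPIRUN --mca mtl ^ofi --mca btl_openib_allow_ib 1 \
        --mca btl vader,self,openib -n 30 \
        $SU2_BIN/SU2_CFD inv_ONERAM6_adv.cfg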


ERROR:

File "/mypath/apps/su740/bin/SU2/run/interface.py", line 208, in SOL
run_command( the_Command )
File "/mypath/apps/su740/bin/SU2/run/interface.py", line 271, in run_command
raise exception(message)
RuntimeError: Path = /mypath/opt_try/try0/DESIGNS/DSN_001/ADJOINT_DRAG/,
Command = /mypath/apps/ompi414/bin/mpirun --mca mtl ^ofi --mca btl_openib_allow_ib 1 --mca btl vader,self,openib -n 30 /mypath/apps/su740/bin/SU2_SOL config_SOL.cfg
SU2 process returned error '139'
[compute-5-2:24097] *** Process received signal ***
[compute-5-2:24097] Signal: Segmentation fault (11)
[compute-5-2:24097] Signal code: Address not mapped (1)
[compute-5-2:24097] Failing at address: 0x2b6466853770
[compute-5-2:24118] *** Process received signal ***
[compute-5-2:24118] Signal: Segmentation fault (11)
[compute-5-2:24118] Signal code: Address not mapped (1)
[compute-5-2:24118] Failing at address: 0x2ad7a4e50770
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 13 with PID 24118 on node compute-5-2 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
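
In case it helps with diagnosing this, these are the kinds of checks I can run on a compute node to rule out an MPI library mismatch (binary names are from my SU2 7.4.0 install; the mismatch idea is only a guess on my part):

#!/bin/bash
# Sketch of mismatch checks (assumption: a library mismatch is the cause).
# Confirm which MPI libraries the SU2 binaries are actually linked against...
ldd /mypath/apps/su740/bin/SU2_SOL | grep -i mpi
ldd /mypath/apps/su740/bin/SU2_CFD_AD | grep -i mpi
# ...and which Open MPI the job actually picks up on the node.
/mypath/apps/ompi414/bin/mpirun --version
which mpirun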

Tags: su2



