CFD Online Discussion Forums


Lazlo February 23, 2020 09:12

NACA0012 optimization fails on parallel run
 
Hi everybody,


I am running SU2 7.0.1 on Linux (Fedora 31) on a single server with a multicore AMD CPU.

All tutorials run perfectly except the shape design ones. I encounter two problems with Inviscid_2D_Unconstrained_NACA0012:

1 - The CONTINUOUS_ADJOINT run fails when reading the surface sensitivity file (described in another thread: https://www.cfd-online.com/Forums/su...est-cases.html), whereas DISCRETE_ADJOINT works in a single process, but...

2 - SU2_CFD crashes with a segmentation fault when the same case is run in parallel. It occurs in DSN_002/DIRECT with error 139 (DSN_001 is fine). log_Direct.out ends at the call to ParMETIS. The error printed to the terminal is:

Code:

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 0 on node server01 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
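
A side note on the "error 139" mentioned above: shells conventionally report a process killed by signal N as exit status 128 + N, so 139 corresponds to signal 11, which matches the segmentation fault that mpirun reports. A minimal Python sketch of that decoding follows; the helper name describe_exit_status is made up for illustration and is not part of SU2.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper, not part of SU2)."""
    if status > 128:
        # By shell convention, 128 + N means "terminated by signal N".
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name} (signal {signum})"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(139))
```

On Linux, signal 11 is SIGSEGV, so the status alone says that a process crashed, but not where.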

It is a pity, because I am really interested in this design capability.


Lazlo

jtrin February 24, 2020 10:04

Hi Lazlo,

In my experience shape design can be a memory gobbler. If the first design iteration is running fine and you're experiencing a segfault, my guess is that you may be running out of memory.

Have you tried running "top" or something similar on your compute nodes?
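
If watching "top" by hand is inconvenient, the same check can be scripted. The following is only a rough sketch, assuming a Linux system with /proc mounted: it sums the resident memory (the VmRSS field) of all processes named SU2_CFD. The function names are illustrative and not part of SU2.

```python
import os
import re


def parse_vmrss_kib(status_text):
    """Return VmRSS in kiB from the text of /proc/<pid>/status, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def parse_name(status_text):
    """Return the process name from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^Name:\s+(.+)$", status_text, re.MULTILINE)
    return match.group(1).strip() if match else None


def total_rss_kib(process_name="SU2_CFD", proc_root="/proc"):
    """Sum VmRSS over all processes with the given name (0 if /proc is missing)."""
    if not os.path.isdir(proc_root):
        return 0
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are process directories
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                text = handle.read()
        except OSError:
            continue  # the process ended or is not readable
        if parse_name(text) == process_name:
            total += parse_vmrss_kib(text) or 0
    return total


if __name__ == "__main__":
    print(f"SU2_CFD resident memory: {total_rss_kib() / 1024:.1f} MiB")
```

Calling total_rss_kib() repeatedly while the DSN_002 direct run is active would show whether memory grows before the crash.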

Lazlo February 24, 2020 13:11

Thanks jtrin,
Memory usage stays very low with the NACA0012 test case (2 GB). The crash only occurs in parallel mode; serial mode is fine.
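
Since the crash shows up only in parallel, one possible next step is to run the solver by hand in the failing directory with different rank counts and see where it starts to fail. The sketch below assumes the solver path and config file name that the optimization script reports for this case (/home/lazlo/bin/SU2_CFD and config_CFD.cfg, run from DSN_002/DIRECT). The helpers build_command and sweep are hypothetical and not part of SU2.

```python
import subprocess


def build_command(n_ranks, solver="/home/lazlo/bin/SU2_CFD", config="config_CFD.cfg"):
    """Build the mpirun command line for a given number of ranks."""
    return ["mpirun", "-n", str(n_ranks), solver, config]


def sweep(rank_counts, run_dir, runner=subprocess.run):
    """Run the solver once per rank count and collect the exit statuses.

    `runner` is injectable so the logic can be tried without launching MPI.
    """
    results = {}
    for n_ranks in rank_counts:
        completed = runner(build_command(n_ranks), cwd=run_dir)
        results[n_ranks] = completed.returncode
    return results


# Example call (not executed here, since it launches the solver):
#   print(sweep([1, 2, 4, 8], "DESIGNS/DSN_002/DIRECT"))
```

A status of 0 for small rank counts and 139 for larger ones would point at the partitioning of this particular deformed mesh rather than at the case setup in general.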

talbring February 26, 2020 07:01

Can you post the complete Python stack trace?

Lazlo February 26, 2020 14:01

Thank you for your interest.
Here is the output:
Code:

Traceback (most recent call last):
  File "/home/lazlo/bin/shape_optimization.py", line 176, in <module>
    main()
  File "/home/lazlo/bin/shape_optimization.py", line 108, in main
    options.nzones      )
  File "/home/lazlo/bin/shape_optimization.py", line 152, in shape_optimization
    SU2.opt.SLSQP(project,x0,xb,its,accu)
  File "/home/lazlo/bin/SU2/opt/scipy_tools.py", line 133, in scipy_slsqp
    epsilon        = eps            )
  File "/usr/lib64/python3.7/site-packages/scipy/optimize/slsqp.py", line 208, in fmin_slsqp
    constraints=cons, **opts)
  File "/usr/lib64/python3.7/site-packages/scipy/optimize/slsqp.py", line 399, in _minimize_slsqp
    fx = func(x)
  File "/usr/lib64/python3.7/site-packages/scipy/optimize/optimize.py", line 300, in function_wrapper
    return function(*(wrapper_args + args))
  File "/home/lazlo/bin/SU2/opt/scipy_tools.py", line 383, in obj_f
    obj_list = project.obj_f(x)
  File "/home/lazlo/bin/SU2/opt/project.py", line 233, in obj_f
    return self._eval(konfig, func,dvs)
  File "/home/lazlo/bin/SU2/opt/project.py", line 202, in _eval
    vals = design._eval(func,*args)
  File "/home/lazlo/bin/SU2/eval/design.py", line 147, in _eval
    vals = eval_func(*inputs)
  File "/home/lazlo/bin/SU2/eval/design.py", line 244, in obj_f
    func += su2func(this_obj,config,state) * sign * scale * global_factor
  File "/home/lazlo/bin/SU2/eval/functions.py", line 92, in function
    aerodynamics( config, state )
  File "/home/lazlo/bin/SU2/eval/functions.py", line 255, in aerodynamics
    info = su2run.direct(config)
  File "/home/lazlo/bin/SU2/run/direct.py", line 77, in direct
    SU2_CFD(konfig)
  File "/home/lazlo/bin/SU2/run/interface.py", line 112, in CFD
    run_command( the_Command )
  File "/home/lazlo/bin/SU2/run/interface.py", line 292, in run_command
    raise exception(message)
RuntimeError: Path = /media/data/lazlo/Logiciels/git/SU2/Tutorials/Inviscid_2D_Unconstrained_NACA0012 DA/DESIGNS/DSN_002/DIRECT/,
Command = mpirun -n 8 /home/lazlo/bin/SU2_CFD config_CFD.cfg
SU2 process returned error '139'
[server01:6556 :0:6556] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
[server01:6557 :0:6557] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
[server01:6558 :0:6558] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
[server01:6559 :0:6559] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
[server01:6561 :0:6561] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
[server01:6563 :0:6563] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x514)
==== backtrace ====
[server01:6555 :0:6555] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
    0  /lib64/libucs.so.0(+0x1b25f) [0x7fb6ced4025f]
    1  /lib64/libucs.so.0(+0x1b42a) [0x7fb6ced4042a]
    2  /home/lazlo/bin/SU2_CFD() [0xb3d2e0]
    3  /home/lazlo/bin/SU2_CFD() [0xb3ecb9]
    4  /home/lazlo/bin/SU2_CFD() [0x8050f4]
    5  /home/lazlo/bin/SU2_CFD() [0x806447]
    6  /home/lazlo/bin/SU2_CFD() [0x806bf8]
    7  /home/lazlo/bin/SU2_CFD() [0x80cb1f]
    8  /home/lazlo/bin/SU2_CFD() [0x45a8e0]
    9  /lib64/libc.so.6(__libc_start_main+0xf3) [0x7fb6d4e8e1a3]
  10  /home/lazlo/bin/SU2_CFD() [0x4687be]
===================
    0  /lib64/libucs.so.0(+0x1b25f) [0x7f79a413225f]
    1  /lib64/libucs.so.0(+0x1b42a) [0x7f79a413242a]
    2  /home/lazlo/bin/SU2_CFD() [0xb3d96b]
    3  /home/lazlo/bin/SU2_CFD() [0xb3ecb9]
    4  /home/lazlo/bin/SU2_CFD() [0x8050f4]
    5  /home/lazlo/bin/SU2_CFD() [0x806447]
    6  /home/lazlo/bin/SU2_CFD() [0x806bf8]
    7  /home/lazlo/bin/SU2_CFD() [0x80cb1f]
    8  /home/lazlo/bin/SU2_CFD() [0x45a8e0]
    9  /lib64/libc.so.6(__libc_start_main+0xf3) [0x7f79a52801a3]
  10  /home/lazlo/bin/SU2_CFD() [0x4687be]
===================
    0  /lib64/libucs.so.0(+0x1b25f) [0x7f0cdc54a25f]
    1  /lib64/libucs.so.0(+0x1b42a) [0x7f0cdc54a42a]
    2  /home/lazlo/bin/SU2_CFD() [0xb3d96b]
    3  /home/lazlo/bin/SU2_CFD() [0xb3ecb9]
    4  /home/lazlo/bin/SU2_CFD() [0x8050f4]
    5  /home/lazlo/bin/SU2_CFD() [0x806447]
    6  /home/lazlo/bin/SU2_CFD() [0x806bf8]
    7  /home/lazlo/bin/SU2_CFD() [0x80cb1f]
    8  /home/lazlo/bin/SU2_CFD() [0x45a8e0]
    9  /lib64/libc.so.6(__libc_start_main+0xf3) [0x7f0cde6991a3]
  10  /home/lazlo/bin/SU2_CFD() [0x4687be]
===================
    0  /lib64/libucs.so.0(+0x1b25f) [0x7f2bb8f4125f]
    1  /lib64/libucs.so.0(+0x1b42a) [0x7f2bb8f4142a]
    2  /home/lazlo/bin/SU2_CFD() [0xb3d96b]
    3  /home/lazlo/bin/SU2_CFD() [0xb3ecb9]
    4  /home/lazlo/bin/SU2_CFD() [0x8050f4]
    5  /home/lazlo/bin/SU2_CFD() [0x806447]
    6  /home/lazlo/bin/SU2_CFD() [0x806bf8]
    7  /home/lazlo/bin/SU2_CFD() [0x80cb1f]
    8  /home/lazlo/bin/SU2_CFD() [0x45a8e0]
    0  /lib64/libucs.so.0(+0x1b25f) [0x7fe60d20b25f]
    1  /lib64/libucs.so.0(+0x1b42a) [0x7fe60d20b42a]
    2  /home/lazlo/bin/SU2_CFD() [0xb3d96b]
    3  /home/lazlo/bin/SU2_CFD() [0xb3ecb9]
    4  /home/lazlo/bin/SU2_CFD() [0x8050f4]
    5  /home/lazlo/bin/SU2_CFD() [0x806447]
    6  /home/lazlo/bin/SU2_CFD() [0x806bf8]
    7  /home/lazlo/bin/SU2_CFD() [0x80cb1f]
    8  /home/lazlo/bin/SU2_CFD() [0x45a8e0]
    9  /lib64/libc.so.6(__libc_start_main+0xf3) [0x7fe60f35a1a3]
  10  /home/lazlo/bin/SU2_CFD() [0x4687be]
===================
    0  /lib64/libucs.so.0(+0x1b25f) [0x7f32782c925f]
    1  /lib64/libucs.so.0(+0x1b42a) [0x7f32782c942a]
    2  /home/lazlo/bin/SU2_CFD() [0xb3d96b]
    3  /home/lazlo/bin/SU2_CFD() [0xb3ecb9]
    4  /home/lazlo/bin/SU2_CFD() [0x8050f4]
    5  /home/lazlo/bin/SU2_CFD() [0x806447]
    6  /home/lazlo/bin/SU2_CFD() [0x806bf8]
    7  /home/lazlo/bin/SU2_CFD() [0x80cb1f]
    8  /home/lazlo/bin/SU2_CFD() [0x45a8e0]
    9  /lib64/libc.so.6(__libc_start_main+0xf3) [0x7f327a4181a3]
  10  /home/lazlo/bin/SU2_CFD() [0x4687be]
===================
    0  /lib64/libucs.so.0(+0x1b25f) [0x7f6ff004825f]
    1  /lib64/libucs.so.0(+0x1b42a) [0x7f6ff004842a]
    2  /lib64/libc.so.6(cfree+0x20) [0x7f6ff11fb7b0]
    3  /home/lazlo/bin/SU2_CFD() [0x5f5882]
    4  /home/lazlo/bin/SU2_CFD() [0xb3d366]
    5  /home/lazlo/bin/SU2_CFD() [0xb3ecb9]
    6  /home/lazlo/bin/SU2_CFD() [0x8050f4]
    7  /home/lazlo/bin/SU2_CFD() [0x806447]
    8  /home/lazlo/bin/SU2_CFD() [0x806bf8]
    9  /home/lazlo/bin/SU2_CFD() [0x80cb1f]
  10  /home/lazlo/bin/SU2_CFD() [0x45a8e0]
  11  /lib64/libc.so.6(__libc_start_main+0xf3) [0x7f6ff11961a3]
  12  /home/lazlo/bin/SU2_CFD() [0x4687be]
===================
    9  /lib64/libc.so.6(__libc_start_main+0xf3) [0x7f2bbb0901a3]
  10  /home/lazlo/bin/SU2_CFD() [0x4687be]
===================
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 6 with PID 0 on node server01 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
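
The SU2_CFD frames in the backtrace above are raw addresses without function names. If the binary was built with debug symbols (an assumption; a release build may only yield "??"), they could be translated with the binutils tool addr2line. The sketch below only extracts the unique SU2_CFD addresses from such a log and assembles the command. It also assumes the executable is not position-independent, which the low addresses suggest, so the addresses can be passed as they are. The helper names are made up for illustration.

```python
import re

# Matches lines such as "    2  /home/lazlo/bin/SU2_CFD() [0xb3d2e0]".
FRAME_RE = re.compile(r"^\s*\d+\s+(\S+?)\(.*?\)\s+\[(0x[0-9a-fA-F]+)\]")


def su2_frame_addresses(log_text, binary_name="SU2_CFD"):
    """Return the unique addresses of frames inside the solver binary, in order."""
    addresses = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if not match:
            continue
        path, address = match.groups()
        if path.endswith(binary_name) and address not in addresses:
            addresses.append(address)
    return addresses


def addr2line_command(binary_path, addresses):
    """Assemble an addr2line call: -e binary, -f function names, -C demangle."""
    return ["addr2line", "-e", binary_path, "-f", "-C", *addresses]


if __name__ == "__main__":
    sample = (
        "    0  /lib64/libucs.so.0(+0x1b25f) [0x7fb6ced4025f]\n"
        "    2  /home/lazlo/bin/SU2_CFD() [0xb3d2e0]\n"
        "    3  /home/lazlo/bin/SU2_CFD() [0xb3ecb9]\n"
    )
    print(" ".join(addr2line_command("/home/lazlo/bin/SU2_CFD",
                                     su2_frame_addresses(sample))))
```

Because the output of the eight ranks is interleaved in the log, deduplicating the addresses keeps the resulting command short: most ranks share the same call chain.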



All times are GMT -4.