CFD Online Discussion Forums


ygd February 19, 2017 10:24

Segmentation fault in SU2 V5.0
 
Hi,

I have installed SU2 V5.0 and am running a shape optimization case for the RAE2822 airfoil (RANS simulation) in fixed CL mode, which means I specify a target CL instead of an angle of attack. The case was run in parallel on 16 processors, but it failed with a segmentation fault. It seems that it could not write the solution files (flow.dat and surface_flow.dat) after the CFD simulation in the first DESIGN step.

The error output file is as follows:
Code:

Traceback (most recent call last):
  File "/home/gy2m14/Solver/SU2-V5.0/bin/shape_optimization.py", line 169, in <module>
    main()
  File "/home/gy2m14/Solver/SU2-V5.0/bin/shape_optimization.py", line 104, in main
    options.quiet        )
  File "/home/gy2m14/Solver/SU2-V5.0/bin/shape_optimization.py", line 145, in shape_optimization
    SU2.opt.SLSQP(project,x0,xb,its,accu)
  File "/home/gy2m14/Solver/SU2-V5.0/bin/SU2/opt/scipy_tools.py", line 140, in scipy_slsqp
    epsilon        = eps            )
  File "/local/software/python/2.7.5/lib/python2.7/site-packages/scipy/optimize/slsqp.py", line 206, in fmin_slsqp
    constraints=cons, **opts)
  File "/local/software/python/2.7.5/lib/python2.7/site-packages/scipy/optimize/slsqp.py", line 308, in _minimize_slsqp
    mieq = sum(map(len, [atleast_1d(c['fun'](x, *c['args'])) for c in cons['ineq']]))
  File "/home/gy2m14/Solver/SU2-V5.0/bin/SU2/opt/scipy_tools.py", line 464, in con_cieq
    cons = project.con_cieq(x)
  File "/home/gy2m14/Solver/SU2-V5.0/bin/SU2/opt/project.py", line 235, in con_cieq
    return self._eval(konfig, func,dvs)
  File "/home/gy2m14/Solver/SU2-V5.0/bin/SU2/opt/project.py", line 184, in _eval
    vals = design._eval(func,*args)
  File "/home/gy2m14/Solver/SU2-V5.0/bin/SU2/eval/design.py", line 144, in _eval
    vals = eval_func(*inputs)
  File "/home/gy2m14/Solver/SU2-V5.0/bin/SU2/eval/design.py", line 457, in con_cieq
    func = su2func(this_con,config,state)
  File "/home/gy2m14/Solver/SU2-V5.0/bin/SU2/eval/functions.py", line 93, in function
    aerodynamics( config, state )
  File "/home/gy2m14/Solver/SU2-V5.0/bin/SU2/eval/functions.py", line 241, in aerodynamics
    info = su2run.direct(config)
  File "/home/gy2m14/Solver/SU2-V5.0/bin/SU2/run/direct.py", line 83, in direct
    SU2_CFD(konfig)
  File "/home/gy2m14/Solver/SU2-V5.0/bin/SU2/run/interface.py", line 117, in CFD
    run_command( the_Command )
  File "/home/gy2m14/Solver/SU2-V5.0/bin/SU2/run/interface.py", line 297, in run_command
    raise exception , message
RuntimeError: Path = /home/gy2m14/hicks_henne_19/DESIGNS/DSN_001/DIRECT/,
Command = mpirun -n 16 /home/gy2m14/Solver/SU2-V5.0/bin/SU2_CFD config_CFD.cfg
SU2 process returned error '139'
[green0158:13039] *** Process received signal ***
[green0158:13039] Signal: Segmentation fault (11)
[green0158:13039] Signal code: Address not mapped (1)
[green0158:13039] Failing at address: 0x2237e38
[green0158:13039] [ 0] /lib64/libpthread.so.0(+0xf7e0) [0x7f9f6d4ea7e0]
[green0158:13039] [ 1] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x20d) [0x7f9f6e7d439d]
[green0158:13039] [ 2] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/libmpi.so.1(opal_memory_ptmalloc2_int_memalign+0xbf) [0x7f9f6e7d51ff]
[green0158:13039] [ 3] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/libmpi.so.1(opal_memory_ptmalloc2_memalign+0xb3) [0x7f9f6e7d5e13]
[green0158:13039] [ 4] /usr/lib64/libibverbs.so.1(+0xc652) [0x7f9f69c70652]
[green0158:13039] [ 5] /usr/lib64/libmlx4-rdmav2.so(+0x63c8) [0x7f9f674dc3c8]
[green0158:13039] [ 6] /usr/lib64/libmlx4-rdmav2.so(+0x1af20) [0x7f9f674f0f20]
[green0158:13039] [ 7] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/openmpi/mca_btl_openib.so(mca_btl_openib_finalize+0x111) [0x7f9f697ee1c1]
[green0158:13039] [ 8] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/libmpi.so.1(mca_btl_base_close+0x83) [0x7f9f6e762b53]
[green0158:13039] [ 9] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/openmpi/mca_pml_ob1.so(+0x51d9) [0x7f9f6a4901d9]
[green0158:13039] [10] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/libmpi.so.1(mca_base_components_close+0x72) [0x7f9f6e7da432]
[green0158:13039] [11] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/libmpi.so.1(mca_pml_base_close+0xc8) [0x7f9f6e771ca8]
[green0158:13039] [12] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/libmpi.so.1(ompi_mpi_finalize+0x2c2) [0x7f9f6e72fbe2]
[green0158:13039] [13] /home/gy2m14/Solver/SU2-V5.0/bin/SU2_CFD(main+0x1f9) [0x4a49c9]
[green0158:13039] [14] /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f9f6d165d5d]
[green0158:13039] [15] /home/gy2m14/Solver/SU2-V5.0/bin/SU2_CFD() [0x4a17d1]
[green0158:13039] *** End of error message ***
[green0158:13037] *** Process received signal ***
[green0158:13037] Signal: Segmentation fault (11)
[green0158:13037] Signal code: Address not mapped (1)
[green0158:13037] Failing at address: 0x2439158
[green0158:13037] [ 0] /lib64/libpthread.so.0(+0xf7e0) [0x7f93cdfd97e0]
[green0158:13037] [ 1] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x20d) [0x7f93cf2c339d]
[green0158:13037] [ 2] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/libmpi.so.1(opal_memory_ptmalloc2_int_memalign+0xbf) [0x7f93cf2c41ff]
[green0158:13037] [ 3] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/libmpi.so.1(opal_memory_ptmalloc2_memalign+0xb3) [0x7f93cf2c4e13]
[green0158:13037] [ 4] /usr/lib64/libibverbs.so.1(+0xc652) [0x7f93ca75f652]
[green0158:13037] [ 5] /usr/lib64/libmlx4-rdmav2.so(+0x63c8) [0x7f93c7fcb3c8]
[green0158:13037] [ 6] /usr/lib64/libmlx4-rdmav2.so(+0x1af20) [0x7f93c7fdff20]
[green0158:13037] [ 7] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/openmpi/mca_btl_openib.so(mca_btl_openib_finalize+0x111) [0x7f93ca2dd1c1]
[green0158:13037] [ 8] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/libmpi.so.1(mca_btl_base_close+0x83) [0x7f93cf251b53]
[green0158:13037] [ 9] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/openmpi/mca_pml_ob1.so(+0x51d9) [0x7f93caf7f1d9]
[green0158:13037] [10] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/libmpi.so.1(mca_base_components_close+0x72) [0x7f93cf2c9432]
[green0158:13037] [11] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/libmpi.so.1(mca_pml_base_close+0xc8) [0x7f93cf260ca8]
[green0158:13037] [12] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/libmpi.so.1(ompi_mpi_finalize+0x2c2) [0x7f93cf21ebe2]
[green0158:13037] [13] /home/gy2m14/Solver/SU2-V5.0/bin/SU2_CFD(main+0x1f9) [0x4a49c9]
[green0158:13037] [14] /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f93cdc54d5d]
[green0158:13037] [15] /home/gy2m14/Solver/SU2-V5.0/bin/SU2_CFD() [0x4a17d1]
[green0158:13037] *** End of error message ***
[green0158:13027] [[4708,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 145
[green0158:13027] [[4708,0],0] attempted to send to [[4708,1],8]: tag 20
[green0158:13027] [[4708,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 145
[green0158:13027] [[4708,0],0] attempted to send to [[4708,1],6]: tag 20
[green0158:13027] [[4708,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 145
[green0158:13027] [[4708,0],0] attempted to send to [[4708,1],5]: tag 20
[green0158:13027] [[4708,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 145
[green0158:13027] [[4708,0],0] attempted to send to [[4708,1],3]: tag 20
[green0158:13027] [[4708,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 145
[green0158:13027] [[4708,0],0] attempted to send to [[4708,1],7]: tag 20
[green0158:13027] [[4708,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 145
[green0158:13027] [[4708,0],0] attempted to send to [[4708,1],1]: tag 20
[green0158:13027] [[4708,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 145
[green0158:13027] [[4708,0],0] attempted to send to [[4708,1],12]: tag 20
[green0158:13027] [[4708,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 145
[green0158:13027] [[4708,0],0] attempted to send to [[4708,1],14]: tag 20
[green0158:13027] [[4708,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 145
[green0158:13027] [[4708,0],0] attempted to send to [[4708,1],10]: tag 20
[green0158:13027] [[4708,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 145
[green0158:13027] [[4708,0],0] attempted to send to [[4708,1],13]: tag 20
--------------------------------------------------------------------------
mpirun noticed that process rank 9 with PID 13037 on node green0158 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
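
A note on the return code in the trace above: '139' is consistent with the common shell convention of reporting 128 plus the signal number, since 139 - 128 = 11, which matches the "Segmentation fault (11)" lines. The helper below is a minimal sketch of that decoding; the name `describe_exit_status` is hypothetical and is not part of SU2, and the 128 + N convention is an assumption about how the status was reported.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (assumes the 128 + N convention)."""
    if status == 0:
        return "success"
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The number does not correspond to a signal on this platform.
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited with error code {status}"


print(describe_exit_status(139))  # on Linux: terminated by signal 11 (SIGSEGV)
```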

The tail of the log_Direct.out file in the folder ../DESIGNS/DSN_001/DIRECT is shown below:
Code:

29996  0.036916    -4.717973    -4.201405      0.823338      0.021130
29997  0.036916    -4.717988    -4.201409      0.823338      0.021130
29998  0.036916    -4.718003    -4.201414      0.823338      0.021130
29999  0.036916    -4.718018    -4.201419      0.823337      0.021130

-------------------------- File Output Summary --------------------------
Writing comma-separated values (CSV) surface files.
Merging coordinates in the Master node.
Merging solution in the Master node.
Writing SU2 native restart file.
Writing the forces breakdown file.
-------------------------------------------------------------------------

History file, closed.

------------------------- Solver Postprocessing -------------------------
Deleted CNumerics container.
Deleted CIntegration container.
Deleted CSolver container.
Deleted CIteration container.
Deleted CInterpolator container.
Deleted CTransfer container.
Deleted CGeometry container.
Deleted CFreeFormDefBox class.
Deleted CSurfaceMovement class.
Deleted CVolumetricMovement class.
Deleted CConfig container.
Deleted COutput class.
-------------------------------------------------------------------------

Completed in 1107.844763 seconds on 16 cores.

------------------------- Exit Success (SU2_CFD) ------------------------

Normally, it should go on to write flow.dat and surface_flow.dat after the CFD calculation. However, it apparently stopped here, which terminated the optimization.

I have also attached the configuration file. Note that I changed the file format in order to upload it.
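
The two outputs above can be read together: the log reaches "Exit Success (SU2_CFD)", yet the process still returned 139, and the backtrace places the fault beneath ompi_mpi_finalize, called from main of SU2_CFD. That suggests the crash happens during shutdown rather than during the iterations. The sketch below shows one way a wrapper script could tell these situations apart; `classify_run` and its return labels are hypothetical and do not exist in the SU2 Python tools.

```python
def classify_run(return_code: int, log_text: str) -> str:
    """Classify a solver run from its return code and log contents.

    Assumption: the solver prints an "Exit Success" banner once its own
    work is complete, as in the log_Direct.out tail quoted in this thread.
    """
    finished = "Exit Success" in log_text
    if return_code == 0:
        return "ok" if finished else "ok-but-no-success-banner"
    if finished:
        # The solver reported success, so the failure came afterwards.
        return "crashed-during-teardown"
    return "crashed-during-run"


log_tail = """Completed in 1107.844763 seconds on 16 cores.

------------------------- Exit Success (SU2_CFD) ------------------------
"""
print(classify_run(139, log_tail))  # crashed-during-teardown
```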

Furthermore, I noticed several points after running further tests:
1) If the fixed CL mode is applied to an inviscid optimisation case (e.g. Euler flow for the NACA0012 airfoil), it runs successfully.
2) If the fixed AoA mode is applied to a viscous optimisation case (e.g. N-S flow for the RAE2822 airfoil), it also runs successfully.
3) If the fixed CL mode is applied to a viscous CFD simulation (e.g. RANS simulation of the RAE2822 airfoil), it can run successfully, but it sometimes fails with the segmentation fault.

I am wondering whether there are code errors in SU2 V5.0. Could anyone help me with the issue described above?
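
The observations above, together with the original failing case, can be laid out as a small table to see which settings the failing runs share. This is only an illustrative sketch: the dictionary keys and the helper `common_settings` are made up for this example, and the intermittent failure in point 3 is recorded simply as a failure.

```python
# Each row is one test reported in this post; "fails" is True when a
# segmentation fault was observed at least some of the time.
cases = [
    {"mode": "fixed CL", "flow": "viscous", "task": "optimisation", "fails": True},
    {"mode": "fixed CL", "flow": "inviscid", "task": "optimisation", "fails": False},
    {"mode": "fixed AoA", "flow": "viscous", "task": "optimisation", "fails": False},
    {"mode": "fixed CL", "flow": "viscous", "task": "CFD only", "fails": True},
]


def common_settings(rows):
    """Return the settings that have the same value in every given row."""
    keys = ("mode", "flow", "task")
    return {k: rows[0][k] for k in keys if all(r[k] == rows[0][k] for r in rows)}


failing = [c for c in cases if c["fails"]]
print(common_settings(failing))  # {'mode': 'fixed CL', 'flow': 'viscous'}
```

Both failing rows combine fixed CL mode with viscous flow, while each passing row changes one of those two settings.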

Many Thanks,
Yang

asthelen February 28, 2017 17:39

Maybe this doesn't help at all, but this sounds similar to an issue I'm having.

I noticed that using "parallel_computation.py -f <config file> -n <Nproc>" will produce the flow and surface flow data files, as will "SU2_CFD <config file>". But when I try to use "mpirun -np <Nproc> SU2_CFD <config file>", it runs normally but the flow and surface flow files are not produced.

Maybe there's a different command you could use to run in parallel? For example, if you're using mpirun, maybe you should instead use something like "shape_optimization.py -f <config file> -n <Nproc>"?
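
When comparing run commands like these, it can help to check the run directory for the expected output files after each attempt. The function below is a minimal sketch of such a check; `missing_outputs` is a hypothetical helper, and the file names flow.dat and surface_flow.dat are taken from the first post and may differ for other output formats.

```python
from pathlib import Path


def missing_outputs(run_dir, expected=("flow.dat", "surface_flow.dat")):
    """Return the expected output files that are absent from run_dir."""
    base = Path(run_dir)
    return [name for name in expected if not (base / name).is_file()]
```

For example, calling `missing_outputs("DESIGNS/DSN_001/DIRECT")` after a run would return an empty list if both files were written.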

ygd March 1, 2017 04:38

Hi Andrew,

Thanks for your reply. Actually, I have been using the commands "parallel_computation.py -f <config file> -n <Nproc>" and "SU2_CFD <config file>" to run the code either in parallel or in serial. It seems that the error is not coming from the commands.

At the moment, I have narrowed the problem down to viscous CFD simulations using fixed CL mode. Both serial and parallel runs fail with the segmentation fault.

I traced through the code and found that, at the tail of SU2_CFD.cpp, the function "MPI_Finalize()" could not be executed by the master processor.

Code:

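#ifdef HAVE_MPI
  /*--- Detach and release the buffer used for buffered MPI sends. ---*/
  MPI_Buffer_detach(&buffptr, &buffsize);
  free(buffptr);
  /*--- Shut down MPI; the backtrace in the first post fails inside this call. ---*/
  MPI_Finalize();
#endif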

It looks like an MPI problem, but it only happens for viscous cases.

Does anyone else have any experience with this? Many thanks.
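
The backtrace in the first post supports this reading: the frames run from main in SU2_CFD into ompi_mpi_finalize and end inside the allocator (opal_memory_ptmalloc2_int_malloc). A fault inside the allocator is often a symptom of earlier heap corruption rather than of the failing call itself, though the trace alone cannot confirm that here. The sketch below pulls the module and symbol out of such frame lines; the regular expression and `parse_frames` are assumptions written for the format shown in this thread, not an Open MPI tool.

```python
import re

# Matches frames such as
# "[host:pid] [12] /path/libmpi.so.1(ompi_mpi_finalize+0x2c2) [0x7f...]"
FRAME_RE = re.compile(r"\[\s*(\d+)\]\s+(\S+?)\(([^)]*)\)\s+\[0x[0-9a-fA-F]+\]")


def parse_frames(trace: str):
    """Return (frame index, module file name, symbol or None) per frame line."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match:
            index, module, inner = match.groups()
            # "symbol+0xoffset" -> "symbol"; "+0xoffset" or "" -> None
            symbol = inner.split("+")[0] or None
            frames.append((int(index), module.rsplit("/", 1)[-1], symbol))
    return frames


# Three frames copied from the backtrace in the first post.
SAMPLE = """\
[green0158:13039] [ 1] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x20d) [0x7f9f6e7d439d]
[green0158:13039] [12] /local/software/openmpi/1.6.4/gcc-ofed-2.0/lib/libmpi.so.1(ompi_mpi_finalize+0x2c2) [0x7f9f6e72fbe2]
[green0158:13039] [13] /home/gy2m14/Solver/SU2-V5.0/bin/SU2_CFD(main+0x1f9) [0x4a49c9]
"""

for frame in parse_frames(SAMPLE):
    print(frame)
```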

Yang




All times are GMT -4.