CFD Online Discussion Forums > SU2 (https://www.cfd-online.com/Forums/su2/)
Segfault writing restart files (https://www.cfd-online.com/Forums/su2/255550-segfault-writing-restart-files.html)

davistib April 16, 2024 10:30

Segfault writing restart files
 
I am running a case that otherwise seems to run fine. However, when writing restart files it segfaults, but only if a restart file already exists on disk.

So if my OUTPUT_WRT_FREQ is less than my ITER, it segfaults on the second write, or at the end of a successful run when the final solution is written to disk. Likewise, if I try to restart a solution, it segfaults when writing the new solution, since the old file is already there.

I'm running version 8.0.1. Note that the first write of the restart file works fine. Any ideas?
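
For reference, the relevant output settings look roughly like this (values are placeholders; the point is just that the write frequency is smaller than ITER, so restart_flow.dat gets written more than once per run):

Code:

% Illustrative excerpt only, not the actual case file
ITER= 1000
% write a restart every 100 iterations -> several writes per run
OUTPUT_WRT_FREQ= 100
OUTPUT_FILES= (RESTART)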

Here is a partial trace:

Code:

+-----------------------------------------------------------------------+
|        File Writing Summary      |              Filename            |
+-----------------------------------------------------------------------+
|SU2 binary restart                |restart_flow.dat                  |
[<> :0:6905] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x30)

Code:

==== backtrace (tid:  7130) ====
 0 0x000000000004eb50 killpg()  ???:0
 1 0x0000000000049f61 opal_obj_run_destructors()  /build-result/src/hpcx-v2.16-gcc-mlnx_ofed-redhat8-cuda12-gdrcopy2-nccl2.18-x86_64/ompi-db10576f403e833fdf7cd0d938e66b8393b20680/
 2 0x0000000000049f61 ompi_file_close()  /build-result/src/hpcx-v2.16-gcc-mlnx_ofed-redhat8-cuda12-gdrcopy2-nccl2.18-x86_64/ompi-db10576f403e833fdf7cd0d938e66b8393b20680/ompi/file
 3 0x000000000006efe6 PMPI_File_close()  /build-result/src/hpcx-v2.16-gcc-mlnx_ofed-redhat8-cuda12-gdrcopy2-nccl2.18-x86_64/ompi-db10576f403e833fdf7cd0d938e66b8393b20680/ompi/mpi/
 4 0x00000000005e8a58 CFileWriter::OpenMPIFile()  ???:0
 5 0x00000000005f8a5e CSU2BinaryFileWriter::WriteData()  ???:0
 6 0x00000000005c3391 COutput::WriteToFile()  ???:0
 7 0x00000000005c4d36 COutput::SetResultFiles()  ???:0
 8 0x000000000052de76 CFluidIteration::Solve()  ???:0
 9 0x0000000000505fd0 CSinglezoneDriver::StartSolver()  ???:0
10 0x000000000047b8d8 main()  ???:0
11 0x000000000003ad85 __libc_start_main()  ???:0
12 0x00000000004a1b1e _start()  ???:0

Code:

=================================
[:07130] *** Process received signal ***
[:07130] Signal: Segmentation fault (11)
[:07130] Signal code:  (-6)
[:07130] Failing at address: 0x4e2100001bda
[:07130] [ 0] /usr/lib64/libc.so.6(+0x4eb50)[0x14c9a1b0cb50]
[:07130] [ 1] /opt/hpcx-v2.16-gcc-mlnx_ofed-redhat8-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/lib/libmpi.so.40(ompi_file_close+0x11)[0x14c9a27fbf61]
[:07130] [ 2] /opt/hpcx-v2.16-gcc-mlnx_ofed-redhat8-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/lib/libmpi.so.40(PMPI_File_close+0x16)[0x14c9a2820fe6]
[:07130] [ 3] /opt/su2/su2-8.0.1/bin/SU2_CFD[0x5e8a58]
[:07130] [ 4] /opt/su2/su2-8.0.1/bin/SU2_CFD[0x5f8a5e]
[:07130] [ 5] /opt/su2/su2-8.0.1/bin/SU2_CFD[0x5c3391]
[:07130] [ 6] /opt/su2/su2-8.0.1/bin/SU2_CFD[0x5c4d36]
[:07130] [ 7] /opt/su2/su2-8.0.1/bin/SU2_CFD[0x52de76]
[:07130] [ 8] /opt/su2/su2-8.0.1/bin/SU2_CFD[0x505fd0]
[:07130] [ 9] /opt/su2/su2-8.0.1/bin/SU2_CFD[0x47b8d8]
[:07130] [10] /usr/lib64/libc.so.6(__libc_start_main+0xe5)[0x14c9a1af8d85]
[:07130] [11] /opt/su2/su2-8.0.1/bin/SU2_CFD[0x4a1b1e]
[:07130] *** End of error message ***
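
In case it helps to narrow this down, here is a small standalone MPI-IO sketch (my own guess based on the frames above, not SU2's actual code) of a pattern that would only crash inside MPI_File_close when the target file already exists: an exclusive-create open fails on the existing file, and a close is then attempted on the handle left over from that failed open. The file name, flags, and error handling below are assumptions for illustration only.

Code:

// Standalone sketch (not SU2 source) of the failure mode suggested by the
// backtrace: CFileWriter::OpenMPIFile() -> MPI_File_close() -> SIGSEGV.
#include <mpi.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  const char *name = "restart_flow.dat";  // same default name as in the summary table
  MPI_File fh;

  // Exclusive create fails when restart_flow.dat is already on disk.
  int ierr = MPI_File_open(MPI_COMM_WORLD, name,
                           MPI_MODE_WRONLY | MPI_MODE_CREATE | MPI_MODE_EXCL,
                           MPI_INFO_NULL, &fh);

  if (ierr != MPI_SUCCESS) {
    // Suspected problem: closing the handle from a *failed* open. Some MPI
    // builds tolerate this, but in others it can dereference an invalid
    // object (signal 11).
    // MPI_File_close(&fh);  // <-- the dangerous call

    // Safer handling: remove the stale file and retry the open instead of
    // touching the invalid handle.
    MPI_File_delete(name, MPI_INFO_NULL);
    ierr = MPI_File_open(MPI_COMM_WORLD, name,
                         MPI_MODE_WRONLY | MPI_MODE_CREATE | MPI_MODE_EXCL,
                         MPI_INFO_NULL, &fh);
  }

  if (ierr == MPI_SUCCESS) MPI_File_close(&fh);
  MPI_Finalize();
  return 0;
}

Of course this is only a guess at the pattern; the actual code path in CFileWriter::OpenMPIFile() may differ.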


bigfootedrockmidget April 16, 2024 13:31

Does this only happen when using MPI?

