davistib
April 16, 2024 10:30
Segfault writing restart files
I am running a case that otherwise runs fine. However, writing a restart file segfaults whenever a restart file already exists on disk.
So if my OUTPUT_WRT_FREQ is less than my ITER, it segfaults on the second write, or at the end of a successful run when the final solution is written to disk. Likewise, if I restart a solution, it segfaults as soon as the new solution is written.
I'm running version 8.0.1. Note that the first write of the restart file works fine. Any ideas?
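As a temporary workaround, I delete any stale restart file before launching the solver, since the crash only seems to happen when the output file already exists. A minimal sketch (the case name and process count are just placeholders for my setup):

```shell
# Remove any leftover restart file from a previous run so the first
# (and only) write creates a fresh file instead of overwriting one.
rm -f restart_flow.dat

# Then launch the solver as usual, e.g.:
#   mpirun -n 8 SU2_CFD my_case.cfg
```

Of course this doesn't help mid-run, since the second periodic write still overwrites the first one, so it only avoids the crash for runs short enough to write once.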
Here is a partial trace:
Code:
+-----------------------------------------------------------------------+
| File Writing Summary | Filename |
+-----------------------------------------------------------------------+
|SU2 binary restart |restart_flow.dat |
[<> :0:6905] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x30)
Code:
==== backtrace (tid: 7130) ====
0 0x000000000004eb50 killpg() ???:0
1 0x0000000000049f61 opal_obj_run_destructors() /build-result/src/hpcx-v2.16-gcc-mlnx_ofed-redhat8-cuda12-gdrcopy2-nccl2.18-x86_64/ompi-db10576f403e833fdf7cd0d938e66b8393b20680/
2 0x0000000000049f61 ompi_file_close() /build-result/src/hpcx-v2.16-gcc-mlnx_ofed-redhat8-cuda12-gdrcopy2-nccl2.18-x86_64/ompi-db10576f403e833fdf7cd0d938e66b8393b20680/ompi/file
3 0x000000000006efe6 PMPI_File_close() /build-result/src/hpcx-v2.16-gcc-mlnx_ofed-redhat8-cuda12-gdrcopy2-nccl2.18-x86_64/ompi-db10576f403e833fdf7cd0d938e66b8393b20680/ompi/mpi/
4 0x00000000005e8a58 CFileWriter::OpenMPIFile() ???:0
5 0x00000000005f8a5e CSU2BinaryFileWriter::WriteData() ???:0
6 0x00000000005c3391 COutput::WriteToFile() ???:0
7 0x00000000005c4d36 COutput::SetResultFiles() ???:0
8 0x000000000052de76 CFluidIteration::Solve() ???:0
9 0x0000000000505fd0 CSinglezoneDriver::StartSolver() ???:0
10 0x000000000047b8d8 main() ???:0
11 0x000000000003ad85 __libc_start_main() ???:0
12 0x00000000004a1b1e _start() ???:0
Code:
=================================
[:07130] *** Process received signal ***
[:07130] Signal: Segmentation fault (11)
[:07130] Signal code: (-6)
[:07130] Failing at address: 0x4e2100001bda
[:07130] [ 0] /usr/lib64/libc.so.6(+0x4eb50)[0x14c9a1b0cb50]
[:07130] [ 1] /opt/hpcx-v2.16-gcc-mlnx_ofed-redhat8-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/lib/libmpi.so.40(ompi_file_close+0x11)[0x14c9a27fbf61]
[:07130] [ 2] /opt/hpcx-v2.16-gcc-mlnx_ofed-redhat8-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/lib/libmpi.so.40(PMPI_File_close+0x16)[0x14c9a2820fe6]
[:07130] [ 3] /opt/su2/su2-8.0.1/bin/SU2_CFD[0x5e8a58]
[:07130] [ 4] /opt/su2/su2-8.0.1/bin/SU2_CFD[0x5f8a5e]
[:07130] [ 5] /opt/su2/su2-8.0.1/bin/SU2_CFD[0x5c3391]
[:07130] [ 6] /opt/su2/su2-8.0.1/bin/SU2_CFD[0x5c4d36]
[:07130] [ 7] /opt/su2/su2-8.0.1/bin/SU2_CFD[0x52de76]
[:07130] [ 8] /opt/su2/su2-8.0.1/bin/SU2_CFD[0x505fd0]
[:07130] [ 9] /opt/su2/su2-8.0.1/bin/SU2_CFD[0x47b8d8]
[:07130] [10] /usr/lib64/libc.so.6(__libc_start_main+0xe5)[0x14c9a1af8d85]
[:07130] [11] /opt/su2/su2-8.0.1/bin/SU2_CFD[0x4a1b1e]
[:07130] *** End of error message ***