CFD Online Discussion Forums


xgarnaud November 12, 2015 12:59

Segfault with a periodic domain parallel
 
Dear all.

I am trying to perform a computation on a periodic mesh (a 10-degree sector of a rotor). The mesh was generated with Ansys Meshing and exported to CGNS; I then run
SU2_MSH convert_mesh.cfg
to generate an SU2 mesh file with the ghost nodes.
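
For anyone who wants to reproduce the setup without downloading the files, a conversion config for this kind of case would look roughly like the sketch below (the marker names, file names, and rotation axis here are only illustrative, not the actual values from my case):

% Input mesh exported from Ansys Meshing
MESH_FILENAME= rotor_sector.cgns
MESH_FORMAT= CGNS
% Output SU2 mesh with the periodic ghost nodes added by SU2_MSH
MESH_OUT_FILENAME= rotor_sector_periodic.su2
% Periodic pair: ( marker, donor marker, rotation center x,y,z,
%   rotation angles about x,y,z in degrees, translation x,y,z )
% 10-degree sector, assumed here to be periodic about the z-axis
MARKER_PERIODIC= ( periodic_1, periodic_2, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0 )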

If I run
SU2_CFD comp_o1.cfg
it runs fine, but as soon as I run it in parallel I get a segfault.
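
To be precise about the commands (assuming an MPI-enabled build; the partition count is just an example):

SU2_CFD comp_o1.cfg                 # serial: runs fine
mpiexec -n 4 SU2_CFD comp_o1.cfg    # parallel: segfaults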

Valgrind reports its first invalid write when ParMETIS is called, and gdb gives a backtrace in CPhysicalGeometry::SetSendReceive.
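
For reference, the debugging commands were along these lines (only a sketch; the exact options are not important):

mpiexec -n 2 valgrind --track-origins=yes SU2_CFD comp_o1.cfg
mpiexec -n 2 xterm -e gdb --args SU2_CFD comp_o1.cfg    # opens one gdb/xterm per rank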

The files are available here:
https://drive.google.com/folderview?...GM&usp=sharing

Do you know what is going wrong?

Best regards,

Xavier

xgarnaud December 3, 2015 08:11

I am sorry, the mesh file was missing. I have added it, as well as a smaller 2D example (folder LS89) for which the problem also appears.

The 2D case runs fine on 1 and 2 processes, but a segfault occurs with 4 processes. I am not sure it is related, but SU2_MSH reports mismatches between the periodic nodes of the order of 1e-8 or smaller for all nodes, while mpiexec -n 2 SU2_CFD LS89.cfg reports much larger values, for example:
Bad match for point 9439. Nearest donor distance: 1.9809954119e-02.
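
As far as I understand, that distance is what SU2 computes when it applies the periodic transformation to a node on one periodic boundary and searches for the nearest node on the donor boundary; for a conforming periodic mesh it should be at round-off level. A quick standalone check of that idea (not SU2 code; the pitch and coordinates below are made up for illustration):

import numpy as np

def nearest_donor_distance(points, donors, translation):
    # Apply the periodic translation to each node and return the distance
    # to the closest donor node (should be ~1e-8 or less on a matching mesh).
    shifted = points + translation
    d = np.linalg.norm(shifted[:, None, :] - donors[None, :, :], axis=2)
    return d.min(axis=1)

# Made-up 2D cascade example with an assumed pitch of 0.0575 in y
pitch = np.array([0.0, 0.0575])
lower = np.array([[0.0, 0.0], [0.1, 0.0010]])
upper = np.array([[0.0, 0.0575], [0.1, 0.0580]])
print(nearest_donor_distance(lower, upper, pitch))
# the first node matches exactly; the second is off by 5e-4 and would be
# reported as a "bad match"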

Best regards

Jiba December 3, 2015 11:58

Dear Xavier,
I tested comp_o1.cfg using the original CGNS grid, and both in serial and in parallel (up to 8 procs) there is no problem at all.

When running SU2_MSH I get the same error as yours:
"Bad match for point 638439. Nearest donor distance: 3.7528684773e-06"
when checking Periodic1 and Periodic2.

Comparing the two grids in SU2 format (the one from SU2_MSH and the one that SU2_CFD writes out when given the CGNS grid as input), they are identical.
If I run the RO37.su2 grid on a single processor it is OK, but at the end of the run I get this message:
"The surface element (5, 6096) doesn't have an associated volume element."
So even though it runs, I don't know whether the solution is correctly computed.

On two procs, instead, the code stops when calling the MPI communication. The same happens, as expected, if I load mesh_out.su2.

Unfortunately, I haven't found where the problem is. For now, my suggestion is to read the CGNS grid directly in SU2_CFD and run without loading the SU2 format.
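
In practice that just means pointing SU2_CFD at the CGNS file in the config, e.g. (the file name here is only an example):

MESH_FILENAME= RO37.cgns
MESH_FORMAT= CGNS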

xgarnaud December 16, 2015 07:44

Dear Jiba,

As far as I understand, it is necessary to use SU2_MSH to convert the mesh to the SU2 format in order to add the ghost nodes / ghost cells; otherwise periodicity is not enforced.
In the LS89 test case, for example, the computation on the CGNS mesh crashes (and if I perform only a few iterations, the solution is not periodic).
Is it possible to run a periodic computation directly from the CGNS mesh, without any conversion?
Thank you very much for your help.
Best regards

Xavier

jywang June 16, 2016 04:51

Hi Xavier,

I am now running Rotor 37 with SU2, and I am facing the same problem with a segmentation fault when running in parallel. Do you have any idea how to solve it?

Thanks!

xgarnaud June 29, 2016 07:30

No, I didn't figure out a way to solve this problem. Sorry!

