CFD Online Discussion Forums

Running CFD parallel. There is no geometry file! (https://www.cfd-online.com/Forums/su2/133024-running-cfd-parallel-there-no-geometry-file.html)

CrashLaker April 9, 2014 07:35

Running CFD parallel. DDC isn't working.
 
Hello guys! I tried to run the ONERA M6 case in parallel using parallel_computation.py and got the error "There is no geometry file (GetnZone))!"

What might be causing this?

Code:

the command: mpirun -np 8 -machinefile hosts /scratch/ramos/su2mpi/bin/SU2_CFD config_CFD.cfg
the location: /scratch/ramos/su2tests/oneram6v1
cstr=mesh_ONERAM6_inv_1.su2
There is no geometry file (GetnZone))!
cstr=mesh_ONERAM6_inv_1.su2
There is no geometry file (GetnZone))!
cstr=mesh_ONERAM6_inv_1.su2
There is no geometry file (GetnZone))!
cstr=mesh_ONERAM6_inv_1.su2
There is no geometry file (GetnZone))!
cstr=mesh_ONERAM6_inv_1.su2
There is no geometry file (GetnZone))!
cstr=mesh_ONERAM6_inv_1.su2
There is no geometry file (GetnZone))!
cstr=mesh_ONERAM6_inv_1.su2
There is no geometry file (GetnZone))!
cstr=mesh_ONERAM6_inv_1.su2
There is no geometry file (GetnZone))!
Wed Apr  9 08:04:56 BRT 2014

Files inside /scratch/ramos/su2tests/oneram6v1/:
Code:

config_CFD.cfg
errput
errput2
hosts
inv_ONERAM6.cfg
inv_ONERAM6_JST.cfg
jobp2.sh
mesh_ONERAM6_inv.su2
openmpi_exemplo_job_PBS.sh

Thanks in advance!

hlk April 9, 2014 16:44

This error indicates that the program was not able to find the mesh file. From the error message, it is looking for "mesh_ONERAM6_inv_1.su2", so most likely that mesh file does not exist in the working directory.

Based on the files in your scratch directory, it looks like the mesh is never decomposed. The parallel computation looks for the mesh file name with an "_n" suffix because it expects that the mesh has already been split. The decomposition is performed automatically if you use the parallel_computation.py script, and you can also run SU2_DDC manually if you prefer. Since it looks like neither of these is being executed, the mesh is not divided.

I suggest using parallel_computation.py. For more details, please see http://adl-public.stanford.edu/docs/...ED/Running+SU2
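
As a quick check before launching the solver, the short Python sketch below lists which partitioned mesh files are missing from a working directory. It is only an illustration: the `<stem>_<n>.su2` pattern is inferred from the `mesh_ONERAM6_inv_1.su2` name in the log above, numbering the partitions from 1 to the number of ranks is an assumption, and the helper names `expected_partition_files` and `missing_partitions` are hypothetical, not part of SU2.

```python
from __future__ import annotations

from pathlib import Path


def expected_partition_files(mesh_name: str, n_ranks: int) -> list[str]:
    """Build the partition file names, assuming '<stem>_<n>.su2' with n = 1..n_ranks."""
    path = Path(mesh_name)
    return [f"{path.stem}_{rank}{path.suffix}" for rank in range(1, n_ranks + 1)]


def missing_partitions(work_dir: str, mesh_name: str, n_ranks: int) -> list[str]:
    """Return the expected partition files that are not present in work_dir."""
    root = Path(work_dir)
    return [
        name
        for name in expected_partition_files(mesh_name, n_ranks)
        if not (root / name).is_file()
    ]


if __name__ == "__main__":
    # With only mesh_ONERAM6_inv.su2 in the current directory, every name is reported.
    for name in missing_partitions(".", "mesh_ONERAM6_inv.su2", 8):
        print("missing:", name)
```

If every expected file is reported as missing, the decomposition step most likely never ran.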

CrashLaker April 9, 2014 16:53

Hello hlk! Thanks for your reply!

I realized that DDC wasn't creating the new meshes. After correcting some other minor mistakes, I'm now facing a new problem: DDC assigns all of the original mesh points to the first domain only.

Code:

---------------------- Read grid file information -----------------------
Three dimensional problem.
582752 interior elements.
582752 tetrahedra.
108396 points, and 0 ghost points.

------------------------ Divide the numerical grid ----------------------
Domain 1: 108396 points (0 ghost points). Comm buff: 21.98MB of 50.00MB.
Domain 2: 0 points (0 ghost points). Comm buff: 0.00MB of 50.00MB.

I then received this error:
Code:

Command = mpirun -np 4 -machinefile hosts /scratch/ramos/su2mpi/bin/SU2_DDC config_DDC.cfg
SU2 process returned error '139'
[puma52:28738] *** Process received signal ***
[puma52:28738] Signal: Segmentation fault (11)
[puma52:28738] Signal code: Address not mapped (1)
[puma52:28738] Failing at address: 0x1f231f00
[puma52:28738] [ 0] /lib64/libpthread.so.0 [0x307be0e7c0]
[puma52:28738] [ 1] /scratch/ramos/su2mpi/bin/SU2_DDC(_ZN15CDomainGeometryC1EP9CGeometryP7CConfig+0xb35) [0x50df95]
[puma52:28738] [ 2] /scratch/ramos/su2mpi/bin/SU2_DDC(main+0x2d4) [0x44ce14]
[puma52:28738] [ 3] /lib64/libc.so.6(__libc_start_main+0xf4) [0x307b21d994]
[puma52:28738] [ 4] /scratch/ramos/su2mpi/bin/SU2_DDC(_ZNSt8ios_base4InitD1Ev+0x39) [0x44ca89]
[puma52:28738] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 28738 on node puma52 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Any solution?
Thanks!

hlk April 9, 2014 17:02

I'm not certain (maybe someone else on the forum can jump in if they recognize this error), but as a first step I would suggest double-checking that the code was correctly configured for parallel execution.

For more details on configuring with parallel tools: http://adl-public.stanford.edu/docs/...on+from+Source

CrashLaker April 9, 2014 19:54

Hello hlk. I don't think it's a configuration problem, since I have recompiled it dozens of times. Let's recap what we know so far.

After some googling, I think error 139 is the exit status reported for a segmentation fault (128 + signal 11).
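
For reference, shells conventionally report a process killed by signal N with exit status 128 + N, which matches the "Signal: Segmentation fault (11)" line in the log. A minimal Python sketch of that decoding follows; `describe_exit_status` is a hypothetical helper name, and treating every status above 128 as a signal is an assumption based on that convention.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Decode an exit status using the common '128 + signal number' convention."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The number does not correspond to a signal known on this platform.
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited with status {status}"


if __name__ == "__main__":
    print(describe_exit_status(139))
```

So the 139 only says that a process crashed with a segmentation fault; it does not say why.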

The other thing is that DDC assigns all of the grid points to the first domain (the first MPI rank) and leaves nothing for the others, as you can see in this log.

Code:

Domain 1: 108396 points (0 ghost points). Comm buff: 21.98MB of 50.00MB.
Domain 2: 0 points (0 ghost points). Comm buff: 0.00MB of 50.00MB.

And yes, I'm sure that the whole mesh has 108396 points.
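
To spot this kind of lopsided split automatically, the "Domain N: X points" lines can be parsed from the DDC output. The Python sketch below assumes the log format shown above; the regular expression and the helper names `points_per_domain` and `empty_domains` are illustrative and not part of SU2.

```python
from __future__ import annotations

import re

# Matches lines such as "Domain 2: 0 points (0 ghost points). Comm buff: ..."
DOMAIN_LINE = re.compile(r"Domain\s+(\d+):\s+(\d+)\s+points")


def points_per_domain(log_text: str) -> dict[int, int]:
    """Map each domain number to the point count reported in the log."""
    return {int(dom): int(pts) for dom, pts in DOMAIN_LINE.findall(log_text)}


def empty_domains(log_text: str) -> list[int]:
    """Return the domains that received no points at all."""
    return [dom for dom, pts in points_per_domain(log_text).items() if pts == 0]


if __name__ == "__main__":
    sample = (
        "Domain 1: 108396 points (0 ghost points). Comm buff: 21.98MB of 50.00MB.\n"
        "Domain 2: 0 points (0 ghost points). Comm buff: 0.00MB of 50.00MB.\n"
    )
    print(points_per_domain(sample))
    print(empty_domains(sample))
```

Any domain listed as empty means the partitioner did not actually divide the mesh.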

Can you help me?

copeland April 9, 2014 21:16

Hi CrashLaker,

I believe Heather's correct. The error is clearly in the partitioning of the mesh, and I suspect it has something to do with the configuration of SU2 and/or its link to Metis.

Please be absolutely sure you've run the SU2 configure script pointing to your MPI compiler and with Metis enabled. Also, please be sure to run 'make clean' so that no old binaries with out-of-date configuration settings are left hanging around.


-Sean

CrashLaker April 10, 2014 03:08

Hello Copeland. Thanks for replying.

Is there anything else I should try? I have already recompiled it many times.

