CFD Online Discussion Forums


Akash C August 30, 2013 04:29

Grid partitioning error.
 
Hi,

I compiled the latest version of the code from GitHub and tried to run the DPW tutorial in parallel. While the grid is being partitioned, the following error appears.

Code:

------------------------ Divide the numerical grid ----------------------
Finished partitioning using METIS 4.0.3. (73515 edge cuts).             
Domain 1: 796503 points (18642 ghost points). Comm buff: 72.35MB of 50.00MB.
Traceback (most recent call last):                                         
  File "parallel_computation.py", line 109, in <module>                   
    main()                                                                 
  File "parallel_computation.py", line 54, in main                         
    options.divide_grid  )                                                 
  File "parallel_computation.py", line 77, in parallel_computation         
    info = SU2.run.decompose(config)                                       
  File "/home.local/akash.chaudhari/Analysis/ver2.0.6/SU2/run/decompose.py", line 66, in decompose
    SU2_DDC(konfig)                                                                             
  File "/home.local/akash.chaudhari/Analysis/ver2.0.6/SU2/run/interface.py", line 68, in DDC     
    run_command( the_Command )                                                                   
  File "/home.local/akash.chaudhari/Analysis/ver2.0.6/SU2/run/interface.py", line 272, in run_command
    raise Exception , message                                                                       
Exception: Path = /home.local/akash.chaudhari/Analysis/ver2.0.6/,                                   
Command = mpirun -np 2 /home.local/akash.chaudhari/SU_CFD/SU2v2.0.6/SU2/SU2_PY/bin/SU2_DDC config_DDC.cfg
SU2 process returned error '1'                                                                         
Fatal error in MPI_Bsend: Invalid buffer pointer, error stack:                                         
MPI_Bsend(182).......: MPI_Bsend(buf=0x7f065aeb1010, count=6161488, MPI_UNSIGNED_LONG, dest=0, tag=7, MPI_COMM_WORLD) failed
MPIR_Bsend_isend(305): Insufficient space in Bsend buffer; requested 49291904; total buffer size is 52430000
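
As a side check on the last two lines of the error stack, the sketch below redoes the arithmetic using only the numbers printed in the log. It suggests that the failing message (about 47 MiB) is smaller than the attached buffer (about 50 MiB) on its own, so the shortfall is presumably caused by space still held by earlier buffered sends plus per-message overhead. That reading is an assumption based on how MPI_Bsend behaves in general, not something confirmed for SU2_DDC. The variable names are illustrative.

```python
# Figures copied from the MPI error stack in the log above.
count = 6161488          # elements in the failing MPI_Bsend (MPI_UNSIGNED_LONG)
requested = 49291904     # bytes requested, per the MPIR_Bsend_isend line
total_buffer = 52430000  # total Bsend buffer size, per the same line

# Bytes per element implied by the log; 8 means a 64-bit unsigned long.
bytes_per_element = requested // count
assert bytes_per_element * count == requested

MIB = 1024 * 1024
print(f"bytes per element: {bytes_per_element}")
print(f"this message     : {requested / MIB:.2f} MiB")
print(f"whole buffer     : {total_buffer / MIB:.2f} MiB")
print(f"headroom         : {(total_buffer - requested) / MIB:.2f} MiB")
```

With only about 3 MiB of headroom beyond this single message, any other send still pending in the buffer would be enough to trigger the "Insufficient space" error.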

Does this mean I have to move to a bigger machine, or possibly a cluster, to run this analysis because of a memory shortage? This mesh has a cell count of 1.5 million, and I have run meshes with up to 1.2 million cells on my current machine with older versions of the code.
Kindly help me solve this problem.

Thanks,
Akash

austin.m September 6, 2013 12:43

I have also encountered this error when trying to partition a very large grid.

"Comm buff: 130.80MB of 50.00MB"

The machine I am running on has 96GB of RAM, so I doubt it is a machine memory issue.
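
A small sketch for pulling the two numbers out of a "Comm buff" line, for anyone who wants to scan a partitioning log for domains that exceed the buffer. It assumes the line reads "needed of available", which is an interpretation based on the 50.00MB figure matching the default buffer size in this thread; the function name `comm_buffer_usage` is hypothetical and not part of SU2.

```python
import re

# Matches e.g. "Comm buff: 72.35MB of 50.00MB." as printed in the logs above.
COMM_BUFF = re.compile(r"Comm buff:\s*([\d.]+)MB of ([\d.]+)MB")

def comm_buffer_usage(line):
    """Return (needed_mb, available_mb) from a log line, or None if absent."""
    match = COMM_BUFF.search(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

log_lines = [
    "Domain 1: 796503 points (18642 ghost points). Comm buff: 72.35MB of 50.00MB.",
    "Comm buff: 130.80MB of 50.00MB",
]

for line in log_lines:
    needed, available = comm_buffer_usage(line)
    status = "exceeds buffer" if needed > available else "ok"
    print(f"needed {needed:.2f} MB, available {available:.2f} MB: {status}")
```

Both lines quoted in this thread report a need larger than the 50 MB available, which points at the buffer setting rather than at the machine's RAM.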

austin.m September 6, 2013 14:37

Line 117 of ../Common/include/option_structure.hpp has

const unsigned int MAX_MPI_BUFFER = 52428800; /*!< \brief Buffer size for parallel simulations (50MB). */

I increased this value to 150MB, and the partitioning process now gets a little farther (to domain 3 of 8) but still errors out:

[CFD-LINUX:22185] *** An error occurred in MPI_Bsend
[CFD-LINUX:22185] *** on communicator MPI_COMM_WORLD
[CFD-LINUX:22185] *** MPI_ERR_BUFFER: invalid buffer pointer
[CFD-LINUX:22185] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

austin.m September 6, 2013 15:23

After googling "MPI_ERR_BUFFER: invalid buffer pointer", it seems the buffer was still not big enough. So I raised MAX_MPI_BUFFER to 1.5 GB, and now my 22M-element grid partitions successfully.
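
For anyone editing the constant by hand, here is a minimal sketch that converts the sizes mentioned in this thread to bytes and checks them against two integer limits. It assumes a 32-bit `unsigned int` (the declared type of MAX_MPI_BUFFER) and that the value ends up as the `int` size argument of MPI_Buffer_attach; the second point is an assumption about how SU2 uses the constant. The helper names are hypothetical.

```python
MIB = 1024 * 1024
UINT_MAX = 2**32 - 1  # assumes a 32-bit unsigned int, the type of MAX_MPI_BUFFER
INT_MAX = 2**31 - 1   # assumes a 32-bit int; MPI_Buffer_attach takes an int size

def buffer_bytes(mebibytes):
    """Return the size in bytes for a buffer given in MiB."""
    return int(mebibytes * MIB)

def fits(size_bytes):
    """True if the size is representable under both limits above."""
    return size_bytes <= UINT_MAX and size_bytes <= INT_MAX

sizes = [("50 MB", 50), ("150 MB", 150), ("1.5 GB", 1536), ("2 GB", 2048)]
for label, mib in sizes:
    size = buffer_bytes(mib)
    print(f"{label:>7}: {size:>11} bytes, fits: {fits(size)}")
```

Under these assumptions 1.5 GB (1610612736 bytes) still fits, but 2 GB would not, so very large grids may need a different approach than simply raising the constant further.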

SU2 developers: what should this buffer be set to? The maximum grid size we expect? For the record, this 22M-element grid is 1.8 GB on disk in *.su2 format.

Akash C September 7, 2013 00:40

Thanks for your replies Austin. I will try this out on my machine.


All times are GMT -4.