Error while calling MPI_Barrier
April 10, 2014, 11:04 | #1
Carlos Alexandre Tomigawa Aguni (CrashLaker), Member

Hello guys!

I've managed to run successfully on 1 node using up to 2 processes:
Quote:
parallel_computation.py -f inv_ONERAM6.cfg -p 2
the command: mpirun -np 2 /opt/su2v2/bin/SU2_DDC config_DDC.cfg
But when I try to scale it to more than 1 node, it fails with an MPI_Barrier error. Can you help?

Quote:
parallel_computation.py -f inv_ONERAM6.cfg -p 12
the command: mpirun -np 12 -machinefile hosts /opt/su2v2/bin/SU2_DDC config_DDC.cfg
Error:
Code:
---------------------- Read grid file information -----------------------
Three dimensional problem.
582752 interior elements. 
Traceback (most recent call last):
  File "/opt/su2v2/bin/parallel_computation.py", line 113, in <module>
    main()
  File "/opt/su2v2/bin/parallel_computation.py", line 58, in main
    options.divide_grid  )
  File "/opt/su2v2/bin/parallel_computation.py", line 81, in parallel_computation
    info = SU2.run.decompose(config)
  File "/opt/su2v2/bin/SU2/run/decompose.py", line 66, in decompose
    SU2_DDC(konfig)
  File "/opt/su2v2/bin/SU2/run/interface.py", line 73, in DDC
    run_command( the_Command )
  File "/opt/su2v2/bin/SU2/run/interface.py", line 279, in run_command
    raise Exception , message
Exception: Path = /root/oneram6v2/,
Command = mpirun -np 12 -machinefile hosts /opt/su2v2/bin/SU2_DDC config_DDC.cfg
SU2 process returned error '1'
Fatal error in PMPI_Barrier: A process has failed, error stack:
PMPI_Barrier(428)...............: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(335)..........: Failure during collective
MPIR_Barrier_impl(328)..........: 
MPIR_Barrier(292)...............: 
MPIR_Barrier_intra(149).........: 
barrier_smp_intra(94)...........: 
MPIR_Barrier_impl(335)..........: Failure during collective
MPIR_Barrier_impl(328)..........: 
MPIR_Barrier(292)...............: 
MPIR_Barrier_intra(169).........: 
MPIDI_CH3U_Recvq_FDU_or_AEP(630): Communication error with rank 0
barrier_smp_intra(109)..........: 
MPIR_Bcast_impl(1458)...........: 
MPIR_Bcast(1482)................: 
MPIR_Bcast_intra(1291)..........: 
MPIR_Bcast_binomial(309)........: Failure during collective
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(428).......: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(335)..: Failure during collective
MPIR_Barrier_impl(328)..: 
MPIR_Barrier(292).......: 
MPIR_Barrier_intra(149).: 
barrier_smp_intra(109)..: 
MPIR_Bcast_impl(1458)...: 
MPIR_Bcast(1482)........: 
MPIR_Bcast_intra(1291)..: 
MPIR_Bcast_binomial(309): Failure during collective
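
For reference, the hosts file passed via -machinefile is just a plain-text list of node names, one per line, with an optional slot count (host:N for MPICH-style launchers, host slots=N for Open MPI). The node names below are placeholders, not the actual cluster hosts. Before involving SU2 at all, a trivial multi-node run is a quick way to check that MPI can launch and communicate across the nodes:
Code:
# hosts (placeholder node names)
node01 slots=6
node02 slots=6

# every rank should print a hostname; if this hangs or aborts,
# the problem is in the MPI/cluster setup rather than in SU2
mpirun -np 12 -machinefile hosts hostname
If that works, the failure is more likely in the SU2 build or in the decomposition itself.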

April 10, 2014, 15:55 | #2
Thomas D. Economon (economon), Super Moderator

Hi,

Can you please share the compiler/MPI type and versions that you are using?

Also, can you try calling SU2_DDC as a standalone module with something like mpirun -np 8 SU2_DDC inv_ONERAM6.cfg, to verify whether the problem is on the Python side or on the C++ side?
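
A sketch of that check, using the paths from the log above (the single-node run is there to see whether the failure only appears once a second node is involved):
Code:
cd /root/oneram6v2
# bypass the Python wrapper: single node first
mpirun -np 2 /opt/su2v2/bin/SU2_DDC inv_ONERAM6.cfg
# then across nodes with the same machinefile
mpirun -np 12 -machinefile hosts /opt/su2v2/bin/SU2_DDC inv_ONERAM6.cfg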

T

April 10, 2014, 16:02 | #3
Carlos Alexandre Tomigawa Aguni (CrashLaker), Member

Hello Economon!

I'm using METIS 5.0.2 along with OpenMPI 1.4.

I get the same error when calling mpirun directly, so the problem is on the C++ side:
Quote:
------------------------ Divide the numerical grid ----------------------
Domain 1: 108396 points (0 ghost points). Comm buff: 21.98MB of 50.00MB.
[puma4:01831] *** Process received signal ***
[puma4:01831] Signal: Segmentation fault (11)
[puma4:01831] Signal code: Address not mapped (1)
[puma4:01831] Failing at address: 0x2ddb8b20
Domain 2: 0 points (0 ghost points). Comm buff: 0.00MB of 50.00MB.
[puma4:01831] [ 0] /lib64/libpthread.so.0 [0x345400eca0]
[puma4:01831] [ 1] /scratch/ramos/su2mpi/bin/SU2_DDC(_ZN15CDomainGeometryC1EP9CGeometryP7CConfig+0xb35) [0x50df95]
[puma4:01831] [ 2] /scratch/ramos/su2mpi/bin/SU2_DDC(main+0x2d4) [0x44ce14]
[puma4:01831] [ 3] /lib64/libc.so.6(__libc_start_main+0xf4) [0x345341d9c4]
[puma4:01831] [ 4] /scratch/ramos/su2mpi/bin/SU2_DDC(_ZNSt8ios_base4InitD1Ev+0x39) [0x44ca89]
[puma4:01831] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 1831 on node puma4 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
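
The backtrace above points into the CDomainGeometry constructor, and the log also shows Domain 2 ending up with 0 points, i.e. an empty partition, which may be related. One way to narrow the crash down further is to let the failing rank dump core and read the backtrace (line numbers only appear if the code was built with debugging symbols). The core-file name below is a placeholder; the actual pattern depends on the system's core_pattern, and the ulimit has to be in effect on the node where the rank crashes:
Code:
# allow core dumps, then reproduce the crash
ulimit -c unlimited
mpirun -np 12 -machinefile hosts /opt/su2v2/bin/SU2_DDC config_DDC.cfg

# inspect the core from the crashed rank (file name is system-dependent)
gdb -batch -ex bt /scratch/ramos/su2mpi/bin/SU2_DDC core.1831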

April 10, 2014, 17:17 | #4
Thomas D. Economon (economon), Super Moderator

Hmmm.. I have mostly been working with OpenMPI 1.6. Do you have the ability to upgrade your OpenMPI to a newer version and try again?
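
When switching MPI versions it is easy to end up compiling against one install and launching with another, which can produce exactly this kind of failure. A quick consistency check, with illustrative paths (the binary path is the one from the first post):
Code:
# which wrappers/launcher are actually first on the PATH
which mpicxx mpirun
mpirun --version

# confirm the SU2 executables resolve to the intended MPI libraries
ldd /opt/su2v2/bin/SU2_DDC | grep -i mpi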

April 10, 2014, 21:32 | #5
Carlos Alexandre Tomigawa Aguni (CrashLaker), Member

Hello Economon.

Same thing with OpenMPI 1.6:

Quote:
Command = /scratch/programas/intel/ompi-1.6-intel/bin/mpirun -np 4 -machinefile hosts /scratch/ramos/su2v4/bin/SU2_DDC config_DDC.cfg
SU2 process returned error '139'
[puma43:16153] *** Process received signal ***
[puma43:16153] Signal: Segmentation fault (11)
[puma43:16153] Signal code: Address not mapped (1)
[puma43:16153] Failing at address: 0x194158a0
[puma43:16153] [ 0] /lib64/libpthread.so.0 [0x32dce0eca0]
[puma43:16153] [ 1] /scratch/ramos/su2v4/bin/SU2_DDC(_ZN15CDomainGeometryC1EP9CGeometryP7CConfig+0xb35) [0x53b5c5]
[puma43:16153] [ 2] /scratch/ramos/su2v4/bin/SU2_DDC(main+0x2d4) [0x4798b4]
[puma43:16153] [ 3] /lib64/libc.so.6(__libc_start_main+0xf4) [0x32dc21d9c4]
[puma43:16153] [ 4] /scratch/ramos/su2v4/bin/SU2_DDC(_ZNSt8ios_base4InitD1Ev+0x41) [0x479529]
[puma43:16153] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 16153 on node puma43 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

April 10, 2014, 21:36 | #6
Thomas D. Economon (economon), Super Moderator

Hmm.. can you share the options you passed to configure? And does this happen for every test case that you run in parallel?

April 10, 2014, 21:38 | #7
Carlos Alexandre Tomigawa Aguni (CrashLaker), Member

This is my configure line:
Quote:
./configure --prefix="/scratch/ramos/su2v4" --with-Metis-lib="/scratch/ramos/metis5.0.2/lib" --with-Metis-include="/scratch/ramos/metis5.0.2/include" --with-Metis-version=5 --with-MPI="/scratch/programas/intel/ompi1.6-intel/bin/mpicxx"
This is actually only the 2nd tutorial I've run, and the first one that covers running in parallel.
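
For completeness, a rebuild against the Open MPI 1.6 install would look roughly like the sketch below, with that install's bin directory first on the PATH so the same wrappers are used both to configure and to run. The paths are taken from the posts above, and the make targets assume the standard autotools build that SU2 used at this version:
Code:
export PATH=/scratch/programas/intel/ompi-1.6-intel/bin:$PATH
./configure --prefix=/scratch/ramos/su2v4 \
    --with-Metis-lib=/scratch/ramos/metis5.0.2/lib \
    --with-Metis-include=/scratch/ramos/metis5.0.2/include \
    --with-Metis-version=5 \
    --with-MPI=/scratch/programas/intel/ompi-1.6-intel/bin/mpicxx
make clean && make && make install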
