CFD Online: Home > Forums > SU2

Problem with parallel computation (case inviscid onera M6)


Old   December 2, 2013, 17:44
Default Problem with parallel computation (case inviscid onera M6)
  #1
Combas (Member, joined Sep 2013)
Hello,

I have spent hours trying to run the tutorial case "inviscid ONERA M6" in parallel, but it does not work. My problem is that whatever the number of cores used, the time per iteration stays roughly the same. When I request a computation on 4 cores, all four cores do the same thing: all 4 are busy, and each one uses as much memory as a serial run (so it seems the entire mesh is sent to every core).

I work with Ubuntu 12.04 and SU2 2.0.8.
I tried Open MPI 1.5 and MPICH2 (with the same result), and for METIS I tried versions 4.0.3 and 5.1.0 (the latter does not seem compatible with SU2, since I got an error during the SU2 compilation).

I used the Python command "parallel_computation.py -f inv_ONERAM6.cfg -p 4".
I also tried running SU2_DDC after adding the parameters "NUMBER_PART = 4" and "VISUALIZE_PART = YES" to the config file, but it does nothing (and the parameter "NUMBER_PART" is not recognized).

I have attached two files:
- what SU2_CFD prints when launched on 4 cores with "parallel_computation.py"
- what SU2_DDC prints when launched with "NUMBER_PART = 4"

It looks more like a mesh-partitioning problem than a parallel-computation problem, but I'm stuck and I don't know what to do...
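One symptom like this (every rank runs the whole mesh) can come from launching with an mpirun that does not match the MPI library SU2 was compiled against. A small Python sketch to check which launcher is actually on the PATH, using only standard commands (adjust the command name for your install):

```python
import os
import shutil
import subprocess

def mpi_info(cmd="mpirun"):
    """Return (resolved path, version banner) for an MPI launcher, or None if absent."""
    path = shutil.which(cmd)
    if path is None:
        return None
    # Follow symlinks: Open MPI and MPICH2 both install a binary named mpirun.
    real = os.path.realpath(path)
    try:
        out = subprocess.run([real, "--version"], capture_output=True,
                             text=True, timeout=10)
        lines = (out.stdout or out.stderr).splitlines()
        banner = lines[0] if lines else "unknown"
    except Exception:
        banner = "unknown"
    return real, banner

info = mpi_info()
if info is None:
    print("mpirun not found on PATH")
else:
    print("mpirun resolves to %s (%s)" % info)
```

If the banner names a different MPI than the one used at compile time, that mismatch would explain N identical serial runs instead of one partitioned run.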

If anyone has an idea, I am interested!
Thank you in advance
Laurent
Attached Files
File Type: txt SU2_CFD_display.txt (43.6 KB, 20 views)
File Type: txt SU2_DDC_display.txt (2.6 KB, 4 views)

Old   December 2, 2013, 19:17
  #2
burchio_cfd (Leonardo Burchini, New Member, joined Oct 2013)
Hello Laurent,
I have experienced compatibility problems with Open MPI and MPICH2 installed on the same machine. As for speedup and scalability with various core counts, this is my experience running inviscid ONERA M6:
- with 2 cores, convergence takes 5 min 25 s;
- with 4 cores, 3 min 20 s;
- with 6 cores, 3 min 4 s.
As you'll notice, the speedup between 4 and 6 cores is poor, due, I think, to memory bandwidth limits. My machine's configuration is:
- 1 Intel Core i7-3930K (6 cores);
- 32 GB DDR3 RAM, quad channel, 1333 MHz.
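For reference, those timings work out to the following speedups and parallel efficiencies (a quick sketch, taking the 2-core run as the baseline since no serial time was given):

```python
# Timings from the post above, in seconds.
times = {2: 5 * 60 + 25, 4: 3 * 60 + 20, 6: 3 * 60 + 4}

base_cores = 2
base_time = times[base_cores]
for cores in sorted(times):
    t = times[cores]
    speedup = base_time / t
    # Efficiency relative to the 2-core baseline, not to a serial run.
    efficiency = speedup * base_cores / cores
    print(f"{cores} cores: {t:3d} s, speedup x{speedup:.2f}, efficiency {efficiency:.0%}")
```

This makes the memory-bandwidth explanation plausible: going from 4 to 6 cores buys only about a 9% reduction in wall time.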

Leonardo

Old   December 3, 2013, 06:00
  #3
Combas
Thank you, Leonardo, for this information.
For the moment I have no speedup at all... Even though I only have 4 cores (Intel Xeon W3540), it would be nice to go 3 or 4 times faster, since I have some quite long computations to run.

Has anyone else had the same problem?

Laurent

Old   January 22, 2014, 04:47
  #4
vmajor (Victor Major, New Member, joined Jan 2012)
I am also having problems running the ONERA M6 optimization in parallel.

When I execute

Code:
python ../SU2_PY/shape_optimization.py -f inv_ONERAM6.cfg -p 4
the run terminates at the Solver Preprocessing step, as shown below. What could be happening? This happens with every -p value I tried (2, 4, 6, 22).

It runs well, but slowly, on a single thread. It would be great if I could get it to use the available computing resources.

Code:
------------------------- Solver Preprocessing --------------------------
Area projection in the z-plane = 0.758602.
Traceback (most recent call last):
  File "../SU2_PY/shape_optimization.py", line 124, in <module>
    main()
  File "../SU2_PY/shape_optimization.py", line 69, in main
    options.step         )
  File "../SU2_PY/shape_optimization.py", line 107, in shape_optimization
    SU2.opt.SLSQP(project,x0,xb,its)
  File "/media/1tb/SU2/SU2_PY/SU2/opt/scipy_tools.py", line 102, in scipy_slsqp
    epsilon        = 1.0e-06         )
  File "/usr/lib/python2.7/dist-packages/scipy/optimize/slsqp.py", line 236, in fmin_slsqp
    mieq = len(f_ieqcons(x))
  File "/usr/lib/python2.7/dist-packages/scipy/optimize/optimize.py", line 176, in function_wrapper
    return function(x, *args)
  File "/media/1tb/SU2/SU2_PY/SU2/opt/scipy_tools.py", line 187, in con_cieq
    cons = project.con_cieq(x)
  File "/media/1tb/SU2/SU2_PY/SU2/opt/project.py", line 223, in con_cieq
    return self._eval(konfig, func,dvs)
  File "/media/1tb/SU2/SU2_PY/SU2/opt/project.py", line 172, in _eval
    vals = design._eval(func,*args)
  File "/media/1tb/SU2/SU2_PY/SU2/eval/design.py", line 132, in _eval
    vals = eval_func(*inputs)
  File "/media/1tb/SU2/SU2_PY/SU2/eval/design.py", line 422, in con_cieq
    func = su2func(this_con,config,state)
  File "/media/1tb/SU2/SU2_PY/SU2/eval/functions.py", line 75, in function
    aerodynamics( config, state )
  File "/media/1tb/SU2/SU2_PY/SU2/eval/functions.py", line 200, in aerodynamics
    info = su2run.direct(config)
  File "/media/1tb/SU2/SU2_PY/SU2/run/direct.py", line 75, in direct
    SU2_CFD(konfig)
  File "/media/1tb/SU2/SU2_PY/SU2/run/interface.py", line 93, in CFD
    run_command( the_Command )
  File "/media/1tb/SU2/SU2_PY/SU2/run/interface.py", line 279, in run_command
    raise Exception , message
Exception: Path = /media/1tb/SU2/onera6/DESIGNS/DSN_001/DIRECT/,
Command = mpirun -np 4 /usr/local/bin/SU2_CFD config_CFD.cfg
SU2 process returned error '134'
CSysVector::CSysVector(unsigned int,unsigned int,double): invalid input: numBlk, numVar = 0,5
CSysVector::CSysVector(unsigned int,unsigned int,double): invalid input: numBlk, numVar = 0,5
CSysVector::CSysVector(unsigned int,unsigned int,double): invalid input: numBlk, numVar = 0,5
terminate called after throwing an instance of 'int'
terminate called after throwing an instance of 'int'
terminate called after throwing an instance of 'int'
[ system:14029] *** Process received signal ***
[ system:14029] Signal: Aborted (6)
[ system:14029] Signal code:  (-6)
[ system:14030] *** Process received signal ***
[ system:14031] *** Process received signal ***
[ system:14030] Signal: Aborted (6)
[ system:14030] Signal code:  (-6)
[ system:14031] Signal: Aborted (6)
[ system:14031] Signal code:  (-6)
[ system:14029] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f7495f2acb0]
[ system:14029] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f7495b91425]
[ system:14029] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f7495b94b8b]
[ system:14030] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f13a80aecb0]
[ system:14030] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f13a7d15425]
[ system:14030] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f13a7d18b8b]
[ system:14030] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x11d) [0x7f13a888569d]
[ system:14030] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5846) [0x7f13a8883846]
[ system:14030] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5873) [0x7f13a8883873]
[ system:14030] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb596e) [0x7f13a888396e]
[ system:14030] [ 7] /usr/local/bin/SU2_CFD() [0x848d97]
[ system:14030] [ 8] /usr/local/bin/SU2_CFD(_ZN12CEulerSolverC1EP9CGeometryP7CConfigt+0x895) [0x64d055]
[ system:14030] [ 9] /usr/local/bin/SU2_CFD(_Z20Solver_PreprocessingPPP7CSolverPP9CGeometryP7CConfigt+0x1e9) [0x45a299]
[ system:14030] [10] /usr/local/bin/SU2_CFD(main+0x638) [0x6d7db8]
[ system:14030] [11] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f13a7d0076d]
[ system:14030] [12] /usr/local/bin/SU2_CFD() [0x459049]
[ system:14030] *** End of error message ***
[ system:14031] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7fa4e9ecccb0]
[ system:14031] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7fa4e9b33425]
[ system:14031] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7fa4e9b36b8b]
[ system:14031] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x11d) [0x7fa4ea6a369d]
[ system:14031] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5846) [0x7fa4ea6a1846]
[ system:14031] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5873) [0x7fa4ea6a1873]
[ system:14031] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb596e) [0x7fa4ea6a196e]
[ system:14031] [ 7] /usr/local/bin/SU2_CFD() [0x848d97]
[ system:14031] [ 8] /usr/local/bin/SU2_CFD(_ZN12CEulerSolverC1EP9CGeometryP7CConfigt+0x895) [0x64d055]
[ system:14031] [ 9] /usr/local/bin/SU2_CFD(_Z20Solver_PreprocessingPPP7CSolverPP9CGeometryP7CConfigt+0x1e9) [0x45a299]
[ system:14031] [10] /usr/local/bin/SU2_CFD(main+0x638) [0x6d7db8]
[ system:14031] [11] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fa4e9b1e76d]
[ system:14031] [12] /usr/local/bin/SU2_CFD() [0x459049]
[ system:14031] *** End of error message ***
[ system:14029] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x11d) [0x7f749670169d]
[ system:14029] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5846) [0x7f74966ff846]
[ system:14029] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5873) [0x7f74966ff873]
[ system:14029] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb596e) [0x7f74966ff96e]
[ system:14029] [ 7] /usr/local/bin/SU2_CFD() [0x848d97]
[ system:14029] [ 8] /usr/local/bin/SU2_CFD(_ZN12CEulerSolverC1EP9CGeometryP7CConfigt+0x895) [0x64d055]
[ system:14029] [ 9] /usr/local/bin/SU2_CFD(_Z20Solver_PreprocessingPPP7CSolverPP9CGeometryP7CConfigt+0x1e9) [0x45a299]
[ system:14029] [10] /usr/local/bin/SU2_CFD(main+0x638) [0x6d7db8]
[ system:14029] [11] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f7495b7c76d]
[ system:14029] [12] /usr/local/bin/SU2_CFD() [0x459049]
[ system:14029] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 14030 on node  system exited on signal 6 (Aborted).
--------------------------------------------------------------------------

Old   January 22, 2014, 05:18
  #5
Combas
I don't know if you have the same problem I had, but you can try what I did to solve it. You can read my last message on this subject: Parallel processing: Each iteration carried out 4 times
If your case works in serial but not in parallel, the problem is probably with MPI.

Good luck!
Laurent

Old   January 22, 2014, 05:44
  #6
vmajor
Thanks for your reply Laurent,

I removed and reinstalled the Open MPI components, and reproduced the exact error that I posted earlier.

I thought of upgrading to Open MPI 1.5.4, but HDF5 depends on Open MPI 1.4.3.

I should also mention that SU2 is the only application I use that fails to run under the existing Open MPI environment. So it is not that Open MPI is broken; something specific to SU2 makes it fail.

Old   January 22, 2014, 05:59
  #7
Combas
Indeed, SU2 seems to have some problems with Open MPI, which is why you should use MPICH2. You may not have to uninstall Open MPI to use MPICH2 with SU2: you probably just have to change the mpirun or mpiexec links so that they point to the MPICH2 binaries. For me, this change worked.
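A sketch of what "repointing the links" can look like, assuming MPICH2's launcher is installed at a path like /usr/bin/mpirun.mpich2 (the paths are hypothetical and vary by distribution; on Debian/Ubuntu, `update-alternatives --config mpirun` does the same job more safely):

```python
import os

def repoint(link, target):
    """Make `link` a symlink to `target`, keeping a backup of any existing file."""
    if os.path.lexists(link):
        # Keep the old launcher around in case you need to switch back.
        os.rename(link, link + ".openmpi.bak")
    os.symlink(target, link)

# Hypothetical paths -- check where your distribution installs each launcher.
# repoint("/usr/local/bin/mpirun", "/usr/bin/mpirun.mpich2")
```

After repointing, run `mpirun --version` to confirm the banner now reports MPICH2 rather than Open MPI.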

Laurent

Old   January 22, 2014, 06:05
  #8
vmajor
I can try that. Do I also need to recompile SU2 for MPICH?

And yes, at least HDF5 is incompatible and needs to be removed in order to install MPICH. Fun.

Perhaps the Open MPI difficulties are worthy of a bug report?

EDIT: I made the bug report here: https://github.com/su2code/SU2/issues/23

Old   January 22, 2014, 06:45
  #9
Combas
I think you have to recompile SU2 if you compiled it with Open MPI... (check which mpicxx is referenced in the options of the configure step)
I did not have to recompile SU2, since it was already compiled with MPICH2. The problem was just the mpirun file.
Laurent

Old   January 29, 2014, 05:16
  #10
vmajor
I installed Open MPI 1.6.5 and rebuilt SU2... and got a different error. I guess you can call it progress.

I cannot (will not) make any additional changes, as our other code that relies on Open MPI runs fine. I am still hoping that someone from the SU2 team will notice this thread or my bug report and at least comment on the issue.

There has been no acknowledgement of the bug report, and the forum is still sparse, so I cannot find much specific information relating to the errors I am experiencing.



Code:
found: mesh_ONERAM6_inv.su2
New Project: ./
Warning, removing old designs... now
New Design: DESIGNS/DSN_001
./DESIGNS/DSN_001
./DESIGNS/DSN_001
Evaluate Inequality Constraints
  Lift... the command: mpirun -np 2 /media/1tb/SU2/SU2_DDC config_DDC.cfg
the location: /media/1tb/SU2/onera6/DESIGNS/DSN_001/DECOMP
Traceback (most recent call last):
  File "/usr/local/bin/shape_optimization.py", line 124, in <module>
    main()
  File "/usr/local/bin/shape_optimization.py", line 69, in main
    options.step         )
  File "/usr/local/bin/shape_optimization.py", line 107, in shape_optimization
    SU2.opt.SLSQP(project,x0,xb,its)
  File "/usr/local/bin/SU2/opt/scipy_tools.py", line 102, in scipy_slsqp
    epsilon        = 1.0e-06         )
  File "/usr/lib/python2.7/dist-packages/scipy/optimize/slsqp.py", line 236, in fmin_slsqp
    mieq = len(f_ieqcons(x))
  File "/usr/lib/python2.7/dist-packages/scipy/optimize/optimize.py", line 176, in function_wrapper
    return function(x, *args)
  File "/usr/local/bin/SU2/opt/scipy_tools.py", line 187, in con_cieq
    cons = project.con_cieq(x)
  File "/usr/local/bin/SU2/opt/project.py", line 223, in con_cieq
    return self._eval(konfig, func,dvs)
  File "/usr/local/bin/SU2/opt/project.py", line 172, in _eval
    vals = design._eval(func,*args)
  File "/usr/local/bin/SU2/eval/design.py", line 132, in _eval
    vals = eval_func(*inputs)
  File "/usr/local/bin/SU2/eval/design.py", line 422, in con_cieq
    func = su2func(this_con,config,state)
  File "/usr/local/bin/SU2/eval/functions.py", line 75, in function
    aerodynamics( config, state )
  File "/usr/local/bin/SU2/eval/functions.py", line 148, in aerodynamics
    info = update_mesh(config,state)
  File "/usr/local/bin/SU2/eval/functions.py", line 390, in update_mesh
    info = su2run.decompose(config)
  File "/usr/local/bin/SU2/run/decompose.py", line 66, in decompose
    SU2_DDC(konfig)
  File "/usr/local/bin/SU2/run/interface.py", line 73, in DDC
    run_command( the_Command )
  File "/usr/local/bin/SU2/run/interface.py", line 279, in run_command
    raise Exception , message
Exception: Path = /media/1tb/SU2/onera6/DESIGNS/DSN_001/DECOMP/,
Command = mpirun -np 2 /media/1tb/SU2/SU2_DDC config_DDC.cfg
SU2 process returned error '1'
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------

Old   January 30, 2014, 01:59
  #11
fpalacios (Francisco Palacios, Super Moderator, Long Beach, CA, joined Jan 2013)
Quote:
Originally Posted by vmajor View Post
I can try that. Do I also need to recompile SU2 for mpich?

And yes at least HDF5 is incompatible and needs to be removed in order to install mpich. Fun.

Perhaps the openmpi difficulties are worthy of a bug report?

EDIT: I made the bug report here: https://github.com/su2code/SU2/issues/23
Thanks,

We'll take care of this issue.

Best,
Francisco

Old   January 30, 2014, 02:20
  #12
vmajor
Great, thank you for replying.

I am really keen to start using SU2 for our low-speed aerodynamic optimization problems. I am hoping to switch my forum queries to that topic soon :-)



