Problem with parallel computation (case inviscid onera M6) |
December 2, 2013, 17:44 |
|
#1 |
Member
Join Date: Sep 2013
Posts: 43
Rep Power: 13 |
Hello,
I have spent hours trying to run the tutorial case "inviscid ONERA M6" in parallel, but it does not work. Whatever the number of cores used, the time per iteration remains roughly the same. In fact, when I request a computation on 4 cores, all the cores do the same thing: the 4 cores are busy, and the memory used by each core is the same as in a serial computation (so it seems that the entire mesh is sent to each core).
I work with Ubuntu 12.04 and SU2 2.0.8. I tried Open MPI 1.5 and MPICH2 (both gave the same result), and for METIS I tried versions 4.0.3 and 5.1.0 (the latter does not seem compatible with SU2, since I got an error during the SU2 compilation).
I used the Python command: "parallel_computation.py -f inv_ONERAM6.cfg -p 4"
I also tried running SU2_DDC after adding the parameters "NUMBER_PART = 4" and "VISUALIZE_PART= YES" to the config file, but it does nothing (and the parameter "NUMBER_PART" is not recognized).
I have attached 2 files:
- the output of SU2_CFD when it is launched on 4 cores with "parallel_computation.py"
- the output of SU2_DDC when it is launched with "NUMBER_PART = 4"
It looks more like a mesh partitioning problem than a parallel computation problem, but I'm stuck and I don't know what to do... If anyone has an idea, I am interested!
Thank you in advance,
Laurent |
|
December 2, 2013, 19:17 |
|
#2 |
New Member
Leonardo Burchini
Join Date: Oct 2013
Posts: 5
Rep Power: 13 |
Hello Laurent,
I've experienced compatibility problems when Open MPI and MPICH2 are installed on the same machine.
As for speedup and scalability with different numbers of cores, this is my experience running the inviscid ONERA M6 case:
- with 2 cores, the time to convergence is 5 min 25 sec;
- with 4 cores, 3 min 20 sec;
- with 6 cores, 3 min 4 sec.
As you'll notice, there is poor speedup between 4 and 6 cores, due, I think, to memory bandwidth limits.
My machine's configuration is:
- 1 Intel Core i7-3930K, 6 cores;
- 32 GB DDR3 RAM, quad channel, 1333 MHz.
Leonardo |
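To put the timings above in perspective, here is a minimal sketch that turns them into speedup and parallel efficiency. No serial time was reported, so the 2-core run is taken as the baseline; that choice, and the function name `relative_speedup`, are assumptions made for illustration only.

```python
# Wall-clock times to convergence reported in the post above, in seconds.
times = {2: 5 * 60 + 25, 4: 3 * 60 + 20, 6: 3 * 60 + 4}


def relative_speedup(times, base_cores):
    """Return {cores: (speedup, efficiency)} relative to the base_cores run.

    speedup    = t_base / t_cores
    efficiency = speedup / (cores / base_cores), i.e. the fraction of ideal scaling
    """
    base = times[base_cores]
    result = {}
    for cores, seconds in sorted(times.items()):
        speedup = base / seconds
        ideal = cores / base_cores
        result[cores] = (speedup, speedup / ideal)
    return result


scaling = relative_speedup(times, base_cores=2)
for cores, (speedup, efficiency) in scaling.items():
    print(f"{cores} cores: {speedup:.2f}x vs 2 cores, efficiency {efficiency:.0%}")
```

Relative to 2 cores, the 4-core run is roughly 1.6 times faster and the 6-core run roughly 1.8 times faster, which matches the observation that little is gained beyond 4 cores on this machine.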
|
December 3, 2013, 06:00 |
|
#3 |
Member
Join Date: Sep 2013
Posts: 43
Rep Power: 13 |
Thank you Leonardo for these pieces of information.
For the moment I have no speedup at all... Even though I have only 4 cores (Intel Xeon W3540), it would be nice to run 3 or 4 times faster, since I have quite long computations to do. Has anyone else had the same problem?
Laurent |
|
January 22, 2014, 04:47 |
|
#4 |
New Member
Victor Major
Join Date: Jan 2012
Posts: 10
Rep Power: 14 |
I am also having problems running the onera6 optimization in parallel.
When I execute
Code:
python ../SU2_PY/shape_optimization.py -f inv_ONERAM6.cfg -p 4
it runs well, but slowly, on a single thread. It would be great if I could get it to use the available computing resources.
Code:
------------------------- Solver Preprocessing --------------------------
Area projection in the z-plane = 0.758602.
Traceback (most recent call last):
  File "../SU2_PY/shape_optimization.py", line 124, in <module>
    main()
  File "../SU2_PY/shape_optimization.py", line 69, in main
    options.step )
  File "../SU2_PY/shape_optimization.py", line 107, in shape_optimization
    SU2.opt.SLSQP(project,x0,xb,its)
  File "/media/1tb/SU2/SU2_PY/SU2/opt/scipy_tools.py", line 102, in scipy_slsqp
    epsilon = 1.0e-06 )
  File "/usr/lib/python2.7/dist-packages/scipy/optimize/slsqp.py", line 236, in fmin_slsqp
    mieq = len(f_ieqcons(x))
  File "/usr/lib/python2.7/dist-packages/scipy/optimize/optimize.py", line 176, in function_wrapper
    return function(x, *args)
  File "/media/1tb/SU2/SU2_PY/SU2/opt/scipy_tools.py", line 187, in con_cieq
    cons = project.con_cieq(x)
  File "/media/1tb/SU2/SU2_PY/SU2/opt/project.py", line 223, in con_cieq
    return self._eval(konfig, func,dvs)
  File "/media/1tb/SU2/SU2_PY/SU2/opt/project.py", line 172, in _eval
    vals = design._eval(func,*args)
  File "/media/1tb/SU2/SU2_PY/SU2/eval/design.py", line 132, in _eval
    vals = eval_func(*inputs)
  File "/media/1tb/SU2/SU2_PY/SU2/eval/design.py", line 422, in con_cieq
    func = su2func(this_con,config,state)
  File "/media/1tb/SU2/SU2_PY/SU2/eval/functions.py", line 75, in function
    aerodynamics( config, state )
  File "/media/1tb/SU2/SU2_PY/SU2/eval/functions.py", line 200, in aerodynamics
    info = su2run.direct(config)
  File "/media/1tb/SU2/SU2_PY/SU2/run/direct.py", line 75, in direct
    SU2_CFD(konfig)
  File "/media/1tb/SU2/SU2_PY/SU2/run/interface.py", line 93, in CFD
    run_command( the_Command )
  File "/media/1tb/SU2/SU2_PY/SU2/run/interface.py", line 279, in run_command
    raise Exception , message
Exception: Path = /media/1tb/SU2/onera6/DESIGNS/DSN_001/DIRECT/,
Command = mpirun -np 4 /usr/local/bin/SU2_CFD config_CFD.cfg
SU2 process returned error '134'
CSysVector::CSysVector(unsigned int,unsigned int,double): invalid input: numBlk, numVar =
0CSysVector::CSysVector(unsigned int,unsigned int,double): CSysVector::CSysVector(unsigned int,unsigned int,double): invalid input: numBlk, numVar = 0,5 ,5 invalid input: numBlk, numVar = 0terminate called after throwing an instance of 'terminate called after throwing an instance of 'int' ,5
[ system:14029] *** Process received signal ***
int[ system:14029] Signal: Aborted (6)
[ system:14029] Signal code: (-6)
terminate called after throwing an instance of 'int'
'
[ system:14030] *** Process received signal ***
[ system:14031] *** Process received signal ***
[ system:14030] Signal: Aborted (6)
[ system:14030] Signal code: (-6)
[ system:14031] Signal: Aborted (6)
[ system:14031] Signal code: (-6)
[ system:14029] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f7495f2acb0]
[ system:14029] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f7495b91425]
[ system:14029] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f7495b94b8b]
[ system:14030] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f13a80aecb0]
[ system:14030] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f13a7d15425]
[ system:14030] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f13a7d18b8b]
[ system:14030] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x11d) [0x7f13a888569d]
[ system:14030] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5846) [0x7f13a8883846]
[ system:14030] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5873) [0x7f13a8883873]
[ system:14030] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb596e) [0x7f13a888396e]
[ system:14030] [ 7] /usr/local/bin/SU2_CFD() [0x848d97]
[ system:14030] [ 8] /usr/local/bin/SU2_CFD(_ZN12CEulerSolverC1EP9CGeometryP7CConfigt+0x895) [0x64d055]
[ system:14030] [ 9] /usr/local/bin/SU2_CFD(_Z20Solver_PreprocessingPPP7CSolverPP9CGeometryP7CConfigt+0x1e9) [0x45a299]
[ system:14030] [10] /usr/local/bin/SU2_CFD(main+0x638) [0x6d7db8]
[ system:14030] [11] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f13a7d0076d]
[ system:14030] [12] /usr/local/bin/SU2_CFD() [0x459049]
[ system:14030] *** End of error message ***
[ system:14031] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7fa4e9ecccb0]
[ system:14031] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7fa4e9b33425]
[ system:14031] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7fa4e9b36b8b]
[ system:14031] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x11d) [0x7fa4ea6a369d]
[ system:14031] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5846) [0x7fa4ea6a1846]
[ system:14031] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5873) [0x7fa4ea6a1873]
[ system:14031] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb596e) [0x7fa4ea6a196e]
[ system:14031] [ 7] /usr/local/bin/SU2_CFD() [0x848d97]
[ system:14031] [ 8] /usr/local/bin/SU2_CFD(_ZN12CEulerSolverC1EP9CGeometryP7CConfigt+0x895) [0x64d055]
[ system:14031] [ 9] /usr/local/bin/SU2_CFD(_Z20Solver_PreprocessingPPP7CSolverPP9CGeometryP7CConfigt+0x1e9) [0x45a299]
[ system:14031] [10] /usr/local/bin/SU2_CFD(main+0x638) [0x6d7db8]
[ system:14031] [11] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fa4e9b1e76d]
[ system:14031] [12] /usr/local/bin/SU2_CFD() [0x459049]
[ system:14031] *** End of error message ***
[ system:14029] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x11d) [0x7f749670169d]
[ system:14029] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5846) [0x7f74966ff846]
[ system:14029] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5873) [0x7f74966ff873]
[ system:14029] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb596e) [0x7f74966ff96e]
[ system:14029] [ 7] /usr/local/bin/SU2_CFD() [0x848d97]
[ system:14029] [ 8] /usr/local/bin/SU2_CFD(_ZN12CEulerSolverC1EP9CGeometryP7CConfigt+0x895) [0x64d055]
[ system:14029] [ 9] /usr/local/bin/SU2_CFD(_Z20Solver_PreprocessingPPP7CSolverPP9CGeometryP7CConfigt+0x1e9) [0x45a299]
[ system:14029] [10] /usr/local/bin/SU2_CFD(main+0x638) [0x6d7db8]
[ system:14029] [11] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f7495b7c76d]
[ system:14029] [12] /usr/local/bin/SU2_CFD() [0x459049]
[ system:14029] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 14030 on node system exited on signal 6 (Aborted).
--------------------------------------------------------------------------
|
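A note on the "returned error '134'" in the output above: by the usual shell convention, an exit status above 128 means the process was killed by signal number (status - 128), and 134 - 128 = 6 agrees with the "Signal: Aborted (6)" lines in the trace. The small sketch below decodes such a status; the helper name `describe_exit_status` is hypothetical and not part of SU2.

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status.

    Values above 128 conventionally mean "terminated by signal (status - 128)".
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited with code {status}"


print(describe_exit_status(134))  # the status reported for SU2_CFD above
print(describe_exit_status(1))    # an ordinary non-zero exit
```

So the Python wrapper is only reporting that SU2_CFD aborted; the cause is the uncaught exception thrown from the CSysVector constructor when numBlk is 0.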
|
January 22, 2014, 05:18 |
|
#5 |
Member
Join Date: Sep 2013
Posts: 43
Rep Power: 13 |
I don't know if you have the same problem as I had, but you can try what I did to solve it. You can read my last message on this subject: http://www.cfd-online.com/Forums/su2...t-4-times.html
If your case works in serial but not in parallel, the problem is probably the MPI installation. Good luck!
Laurent |
|
January 22, 2014, 05:44 |
|
#6 |
New Member
Victor Major
Join Date: Jan 2012
Posts: 10
Rep Power: 14 |
Thanks for your reply Laurent,
I removed and re-installed the Open MPI components, and reproduced the exact error that I posted earlier. I thought of upgrading to Open MPI 1.5.4, but HDF5 depends on Open MPI 1.4.3. I should also mention that SU2 is the only application I use that fails to run under the existing Open MPI environment. So it is not that Open MPI is broken; rather, something specific to SU2 makes it fail. |
|
January 22, 2014, 05:59 |
|
#7 |
Member
Join Date: Sep 2013
Posts: 43
Rep Power: 13 |
Indeed, SU2 seems to have some problems with Open MPI, which is why you should use MPICH2. You may not need to uninstall Open MPI to use MPICH2 with SU2. You probably just have to change the mpirun or mpiexec links so that they point to the MPICH2 ones. For me, this change worked.
Laurent |
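One way to see what the mpirun and mpiexec links currently point to is sketched below. It resolves each launcher found on PATH through its symlinks and guesses the MPI family from the `--version` output. The string matching is a heuristic based on typical banners (Open MPI names itself, MPICH's Hydra launcher prints "HYDRA build details"), so treat the result as a hint rather than a guarantee; the function names are hypothetical.

```python
import os
import shutil
import subprocess


def classify_mpi(version_text):
    """Guess the MPI family from a launcher's --version output (heuristic)."""
    text = version_text.lower()
    if "open mpi" in text or "openrte" in text:
        return "openmpi"
    if "hydra" in text or "mpich" in text:
        return "mpich"
    return "unknown"


def inspect_launcher(name="mpirun"):
    """Return (resolved_path, family) for a launcher on PATH."""
    path = shutil.which(name)
    if path is None:
        return None, "not found"
    try:
        out = subprocess.run([path, "--version"], capture_output=True,
                             text=True, timeout=10)
        family = classify_mpi(out.stdout + out.stderr)
    except (OSError, subprocess.SubprocessError):
        family = "unknown"
    # realpath follows the chain of symlinks to the actual executable.
    return os.path.realpath(path), family


for tool in ("mpirun", "mpiexec"):
    print(tool, *inspect_launcher(tool))
```

If the launcher belongs to a different MPI family than the one SU2 was compiled with, every process typically starts as an independent serial run, which would match the symptoms in the first post (same memory per core, no speedup).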
|
January 22, 2014, 06:05 |
|
#8 |
New Member
Victor Major
Join Date: Jan 2012
Posts: 10
Rep Power: 14 |
I can try that. Do I also need to recompile SU2 for MPICH?
And yes, at least HDF5 is incompatible and needs to be removed in order to install MPICH. Fun. Perhaps the Open MPI difficulties are worthy of a bug report? EDIT: I made the bug report here: https://github.com/su2code/SU2/issues/23 |
|
January 22, 2014, 06:45 |
|
#9 |
Member
Join Date: Sep 2013
Posts: 43
Rep Power: 13 |
I think that you have to recompile SU2 if you compiled it with Open MPI... (check which mpicxx is referenced in the options of the configuration file).
I did not have to recompile SU2 since it was already compiled with MPICH2. The problem was just the mpirun file.
Laurent |
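To check which MPI an already built SU2 binary was linked against, one option is to look at its shared-library dependencies with `ldd`. The sketch below filters `ldd` output for MPI-looking library names. The prefixes are assumptions based on common library names (libmpi, libmpich, libopen-rte and so on), the sample text is illustrative rather than real output from this thread, and a statically linked MPI would not show up at all.

```python
import subprocess

# Assumed prefixes of MPI-related shared libraries; "libmpi" also covers libmpich.
MPI_PREFIXES = ("libmpi", "libmpl", "libopen-rte", "libopen-pal")


def mpi_libraries(ldd_output):
    """Return the MPI-looking library names listed in ldd output."""
    names = []
    for line in ldd_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith(MPI_PREFIXES):
            names.append(fields[0])
    return names


def linked_mpi(binary):
    """Run ldd on a binary and return its MPI-looking dependencies."""
    result = subprocess.run(["ldd", binary], capture_output=True, text=True)
    return mpi_libraries(result.stdout)


# Illustrative sample only, not output captured from the thread.
sample = (
    "\tlibmpi_cxx.so.1 => /usr/lib/libmpi_cxx.so.1 (0x0000000000000000)\n"
    "\tlibmpi.so.1 => /usr/lib/libmpi.so.1 (0x0000000000000000)\n"
    "\tlibstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x0000000000000000)\n"
)
print(mpi_libraries(sample))
```

On a real installation, something like `linked_mpi("/usr/local/bin/SU2_CFD")` (the path that appears in the trace from post #4) would list the libraries to compare against the launcher in use.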
|
January 29, 2014, 05:16 |
|
#10 |
New Member
Victor Major
Join Date: Jan 2012
Posts: 10
Rep Power: 14 |
I installed Open MPI 1.6.5 and rebuilt SU2... and got a different error. I guess you can call it progress.
I cannot (will not) make any additional changes, as our other code that relies on Open MPI runs fine. I am still hoping that someone from the SU2 team will notice this or my bug report and at least comment on the issue. There has been no acknowledgement of the bug report, and the forum is still sparse, so I cannot find much specific information relating to the errors that I am experiencing.
Code:
found: mesh_ONERAM6_inv.su2
New Project: ./
Warning, removing old designs... now
New Design: DESIGNS/DSN_001
./DESIGNS/DSN_001
./DESIGNS/DSN_001
Evaluate Inequality Constraints
Lift...
the command: mpirun -np 2 /media/1tb/SU2/SU2_DDC config_DDC.cfg
the location: /media/1tb/SU2/onera6/DESIGNS/DSN_001/DECOMP
Traceback (most recent call last):
  File "/usr/local/bin/shape_optimization.py", line 124, in <module>
    main()
  File "/usr/local/bin/shape_optimization.py", line 69, in main
    options.step )
  File "/usr/local/bin/shape_optimization.py", line 107, in shape_optimization
    SU2.opt.SLSQP(project,x0,xb,its)
  File "/usr/local/bin/SU2/opt/scipy_tools.py", line 102, in scipy_slsqp
    epsilon = 1.0e-06 )
  File "/usr/lib/python2.7/dist-packages/scipy/optimize/slsqp.py", line 236, in fmin_slsqp
    mieq = len(f_ieqcons(x))
  File "/usr/lib/python2.7/dist-packages/scipy/optimize/optimize.py", line 176, in function_wrapper
    return function(x, *args)
  File "/usr/local/bin/SU2/opt/scipy_tools.py", line 187, in con_cieq
    cons = project.con_cieq(x)
  File "/usr/local/bin/SU2/opt/project.py", line 223, in con_cieq
    return self._eval(konfig, func,dvs)
  File "/usr/local/bin/SU2/opt/project.py", line 172, in _eval
    vals = design._eval(func,*args)
  File "/usr/local/bin/SU2/eval/design.py", line 132, in _eval
    vals = eval_func(*inputs)
  File "/usr/local/bin/SU2/eval/design.py", line 422, in con_cieq
    func = su2func(this_con,config,state)
  File "/usr/local/bin/SU2/eval/functions.py", line 75, in function
    aerodynamics( config, state )
  File "/usr/local/bin/SU2/eval/functions.py", line 148, in aerodynamics
    info = update_mesh(config,state)
  File "/usr/local/bin/SU2/eval/functions.py", line 390, in update_mesh
    info = su2run.decompose(config)
  File "/usr/local/bin/SU2/run/decompose.py", line 66, in decompose
    SU2_DDC(konfig)
  File "/usr/local/bin/SU2/run/interface.py", line 73, in DDC
    run_command( the_Command )
  File "/usr/local/bin/SU2/run/interface.py", line 279, in run_command
    raise Exception , message
Exception: Path = /media/1tb/SU2/onera6/DESIGNS/DSN_001/DECOMP/,
Command = mpirun -np 2 /media/1tb/SU2/SU2_DDC config_DDC.cfg
SU2 process returned error '1'
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process that caused that situation.
--------------------------------------------------------------------------
|
|
January 30, 2014, 01:59 |
|
#11 | |
Super Moderator
Francisco Palacios
Join Date: Jan 2013
Location: Long Beach, CA
Posts: 404
Rep Power: 15 |
We'll take care of this issue.
Best,
Francisco |
|
January 30, 2014, 02:20 |
|
#12 |
New Member
Victor Major
Join Date: Jan 2012
Posts: 10
Rep Power: 14 |
Great, thank you for replying.
I am really keen to start using SU2 for our low speed aerodynamic optimization problems. I am hoping to switch my forum queries to that topic soon :-) |
|