CFD Online Discussion Forums


Combas December 2, 2013 17:44

Problem with parallel computation (case inviscid onera M6)
 
2 Attachment(s)
Hello,

I spent hours trying to run the tutorial case "inviscid ONERA M6" in parallel, but it does not work. Whatever the number of cores used, the time per iteration remains roughly the same. When I request a computation on 4 cores, all 4 cores are busy, but each one seems to do the same work: the memory used by each core is the same as in a serial computation (so it seems that the entire mesh is sent to each core).

I work with Ubuntu 12.04 and SU2 2.0.8.
I tried Open MPI 1.5 and MPICH2 (both gave the same result), and for METIS I tried versions 4.0.3 and 5.1.0 (the latter does not seem compatible with SU2, since I got an error during the SU2 compilation).

I used the Python command: "parallel_computation.py -f inv_ONERAM6.cfg -p 4"
I also tried running SU2_DDC after adding the parameters "NUMBER_PART = 4" and "VISUALIZE_PART= YES" to the config file, but it does nothing (and the parameter "NUMBER_PART" is not recognized).

I have attached 2 files:
- the output of SU2_CFD when it is launched on 4 cores with "parallel_computation.py"
- the output of SU2_DDC when it is launched with "NUMBER_PART = 4"

It looks more like a mesh partitioning problem than a parallel computation problem, but I'm stuck and I don't know what to do...
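
One way to check whether the launcher is really handing out distinct ranks (rather than starting 4 independent serial runs) is to launch a tiny script with the same mpirun and print what each process sees. The sketch below is only a diagnostic idea, not part of SU2. It assumes that Open MPI exports OMPI_COMM_WORLD_RANK / OMPI_COMM_WORLD_SIZE and that MPICH's Hydra launcher exports PMI_RANK / PMI_SIZE; these variable names may differ between versions. The file name show_rank.py is hypothetical.

```python
import os


def mpi_rank_info(env=None):
    """Guess (launcher, rank, size) from environment variables set by the launcher.

    The variable names are assumptions about Open MPI and MPICH (Hydra);
    if none are present, the process looks like a plain serial run.
    """
    env = os.environ if env is None else env
    if "OMPI_COMM_WORLD_RANK" in env:
        return ("Open MPI",
                int(env["OMPI_COMM_WORLD_RANK"]),
                int(env.get("OMPI_COMM_WORLD_SIZE", "1")))
    if "PMI_RANK" in env:
        return ("MPICH (Hydra/PMI)",
                int(env["PMI_RANK"]),
                int(env.get("PMI_SIZE", "1")))
    return ("none detected", 0, 1)


if __name__ == "__main__":
    print("launcher=%s rank=%d size=%d" % mpi_rank_info())
```

Saved as show_rank.py and started with "mpirun -np 4 python show_rank.py", a healthy setup should print four lines with ranks 0 to 3. The launcher name it reports can then be compared with the MPI library that SU2 was compiled against.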

If anyone has an idea, I am interested!
Thank you in advance
Laurent

burchio_cfd December 2, 2013 19:17

Hello Laurent,
I've experienced compatibility problems with openmpi and mpich2 installed on the same machine. As for the speedup with different numbers of cores, this is my experience running the inviscid ONERA M6 case:
-with 2 cores the time for the convergence is 5 min and 25 sec;
-with 4 cores 3 min and 20 sec;
-with 6 cores 3 min and 4 sec.
As you'll notice, there is little speedup between 4 and 6 cores, which I think is due to memory bandwidth limits. My machine's configuration is:
-1 Intel Core i7 3930K, 6 cores;
-32 GB RAM DDR3 quad channel 1333 MHz.
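
To put numbers on these timings, here is a small sketch that computes speedup and parallel efficiency relative to the 2-core run (no serial time is given above, so the 2-core run is used as the baseline). The function name relative_speedup is just for illustration.

```python
def relative_speedup(times_s, base_cores):
    """Return (cores, speedup, efficiency) rows relative to the base_cores run."""
    base_time = times_s[base_cores]
    rows = []
    for cores in sorted(times_s):
        speedup = float(base_time) / times_s[cores]
        ideal = float(cores) / base_cores  # perfect scaling from the baseline
        rows.append((cores, speedup, speedup / ideal))
    return rows


# Wall-clock times to convergence reported above, converted to seconds.
times = {2: 5 * 60 + 25, 4: 3 * 60 + 20, 6: 3 * 60 + 4}

for cores, speedup, efficiency in relative_speedup(times, base_cores=2):
    print("%d cores: %.2fx vs 2 cores, efficiency %.0f%%"
          % (cores, speedup, 100 * efficiency))
```

Going from 2 to 4 cores gives about 1.6x, while going from 2 to 6 cores gives only about 1.8x instead of the ideal 3x, which is consistent with the run being limited by something other than core count.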

Leonardo

Combas December 3, 2013 06:00

Thank you, Leonardo, for this information.
For the moment I have no speedup at all... Even though I have only 4 cores (Intel Xeon W3540), it would be nice to run 3 or 4 times faster, since I have quite long computations to do.

Has anyone else had the same problem?

Laurent

vmajor January 22, 2014 04:47

I am also having problems running the onera6 optimization in parallel.

When I execute

Code:

python ../SU2_PY/shape_optimization.py -f inv_ONERAM6.cfg -p 4
The run terminates at the Solver Preprocessing step, as shown below. What could be happening? This happens with every -p value that I tried (2, 4, 6, 22).

It runs well, but slowly, on a single thread. It would be great if I could get it to use the available computing resources.

Code:

------------------------- Solver Preprocessing --------------------------
Area projection in the z-plane = 0.758602.
Traceback (most recent call last):
  File "../SU2_PY/shape_optimization.py", line 124, in <module>
    main()
  File "../SU2_PY/shape_optimization.py", line 69, in main
    options.step        )
  File "../SU2_PY/shape_optimization.py", line 107, in shape_optimization
    SU2.opt.SLSQP(project,x0,xb,its)
  File "/media/1tb/SU2/SU2_PY/SU2/opt/scipy_tools.py", line 102, in scipy_slsqp
    epsilon        = 1.0e-06        )
  File "/usr/lib/python2.7/dist-packages/scipy/optimize/slsqp.py", line 236, in fmin_slsqp
    mieq = len(f_ieqcons(x))
  File "/usr/lib/python2.7/dist-packages/scipy/optimize/optimize.py", line 176, in function_wrapper
    return function(x, *args)
  File "/media/1tb/SU2/SU2_PY/SU2/opt/scipy_tools.py", line 187, in con_cieq
    cons = project.con_cieq(x)
  File "/media/1tb/SU2/SU2_PY/SU2/opt/project.py", line 223, in con_cieq
    return self._eval(konfig, func,dvs)
  File "/media/1tb/SU2/SU2_PY/SU2/opt/project.py", line 172, in _eval
    vals = design._eval(func,*args)
  File "/media/1tb/SU2/SU2_PY/SU2/eval/design.py", line 132, in _eval
    vals = eval_func(*inputs)
  File "/media/1tb/SU2/SU2_PY/SU2/eval/design.py", line 422, in con_cieq
    func = su2func(this_con,config,state)
  File "/media/1tb/SU2/SU2_PY/SU2/eval/functions.py", line 75, in function
    aerodynamics( config, state )
  File "/media/1tb/SU2/SU2_PY/SU2/eval/functions.py", line 200, in aerodynamics
    info = su2run.direct(config)
  File "/media/1tb/SU2/SU2_PY/SU2/run/direct.py", line 75, in direct
    SU2_CFD(konfig)
  File "/media/1tb/SU2/SU2_PY/SU2/run/interface.py", line 93, in CFD
    run_command( the_Command )
  File "/media/1tb/SU2/SU2_PY/SU2/run/interface.py", line 279, in run_command
    raise Exception , message
Exception: Path = /media/1tb/SU2/onera6/DESIGNS/DSN_001/DIRECT/,
Command = mpirun -np 4 /usr/local/bin/SU2_CFD config_CFD.cfg
SU2 process returned error '134'
CSysVector::CSysVector(unsigned int,unsigned int,double): invalid input: numBlk, numVar = 0CSysVector::CSysVector(unsigned int,unsigned int,double): CSysVector::CSysVector(unsigned int,unsigned int,double): invalid input: numBlk, numVar = 0,5
,5
invalid input: numBlk, numVar = 0terminate called after throwing an instance of 'terminate called after throwing an instance of 'int'
,5
[ system:14029] *** Process received signal ***
int[ system:14029] Signal: Aborted (6)
[ system:14029] Signal code:  (-6)
terminate called after throwing an instance of 'int'
'
[ system:14030] *** Process received signal ***
[ system:14031] *** Process received signal ***
[ system:14030] Signal: Aborted (6)
[ system:14030] Signal code:  (-6)
[ system:14031] Signal: Aborted (6)
[ system:14031] Signal code:  (-6)
[ system:14029] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f7495f2acb0]
[ system:14029] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f7495b91425]
[ system:14029] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f7495b94b8b]
[ system:14030] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f13a80aecb0]
[ system:14030] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f13a7d15425]
[ system:14030] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f13a7d18b8b]
[ system:14030] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x11d) [0x7f13a888569d]
[ system:14030] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5846) [0x7f13a8883846]
[ system:14030] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5873) [0x7f13a8883873]
[ system:14030] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb596e) [0x7f13a888396e]
[ system:14030] [ 7] /usr/local/bin/SU2_CFD() [0x848d97]
[ system:14030] [ 8] /usr/local/bin/SU2_CFD(_ZN12CEulerSolverC1EP9CGeometryP7CConfigt+0x895) [0x64d055]
[ system:14030] [ 9] /usr/local/bin/SU2_CFD(_Z20Solver_PreprocessingPPP7CSolverPP9CGeometryP7CConfigt+0x1e9) [0x45a299]
[ system:14030] [10] /usr/local/bin/SU2_CFD(main+0x638) [0x6d7db8]
[ system:14030] [11] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f13a7d0076d]
[ system:14030] [12] /usr/local/bin/SU2_CFD() [0x459049]
[ system:14030] *** End of error message ***
[ system:14031] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7fa4e9ecccb0]
[ system:14031] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7fa4e9b33425]
[ system:14031] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7fa4e9b36b8b]
[ system:14031] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x11d) [0x7fa4ea6a369d]
[ system:14031] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5846) [0x7fa4ea6a1846]
[ system:14031] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5873) [0x7fa4ea6a1873]
[ system:14031] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb596e) [0x7fa4ea6a196e]
[ system:14031] [ 7] /usr/local/bin/SU2_CFD() [0x848d97]
[ system:14031] [ 8] /usr/local/bin/SU2_CFD(_ZN12CEulerSolverC1EP9CGeometryP7CConfigt+0x895) [0x64d055]
[ system:14031] [ 9] /usr/local/bin/SU2_CFD(_Z20Solver_PreprocessingPPP7CSolverPP9CGeometryP7CConfigt+0x1e9) [0x45a299]
[ system:14031] [10] /usr/local/bin/SU2_CFD(main+0x638) [0x6d7db8]
[ system:14031] [11] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fa4e9b1e76d]
[ system:14031] [12] /usr/local/bin/SU2_CFD() [0x459049]
[ system:14031] *** End of error message ***
[ system:14029] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x11d) [0x7f749670169d]
[ system:14029] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5846) [0x7f74966ff846]
[ system:14029] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb5873) [0x7f74966ff873]
[ system:14029] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb596e) [0x7f74966ff96e]
[ system:14029] [ 7] /usr/local/bin/SU2_CFD() [0x848d97]
[ system:14029] [ 8] /usr/local/bin/SU2_CFD(_ZN12CEulerSolverC1EP9CGeometryP7CConfigt+0x895) [0x64d055]
[ system:14029] [ 9] /usr/local/bin/SU2_CFD(_Z20Solver_PreprocessingPPP7CSolverPP9CGeometryP7CConfigt+0x1e9) [0x45a299]
[ system:14029] [10] /usr/local/bin/SU2_CFD(main+0x638) [0x6d7db8]
[ system:14029] [11] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f7495b7c76d]
[ system:14029] [12] /usr/local/bin/SU2_CFD() [0x459049]
[ system:14029] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 14030 on node  system exited on signal 6 (Aborted).
--------------------------------------------------------------------------
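
Because the three MPI processes write to the terminal at the same time, their messages are interleaved in the output above. A rough way to make such a log readable is to regroup the lines by the "[ host:PID]" prefix that Open MPI puts in front of each line. This is a minimal sketch, assuming that prefix format; lines without a prefix (such as the "terminate called" fragments) are simply skipped.

```python
import re
from collections import OrderedDict

# Matches the "[ host:PID]" prefix, even when stray text precedes it on the line.
PREFIX = re.compile(r"\[\s*([^\]\s:]+):(\d+)\]\s*(.*)")


def group_by_pid(log_text):
    """Return an ordered mapping PID -> list of messages, in order of appearance."""
    groups = OrderedDict()
    for line in log_text.splitlines():
        match = PREFIX.search(line)
        if match:
            pid = int(match.group(2))
            groups.setdefault(pid, []).append(match.group(3))
    return groups


def print_grouped(log_text):
    """Print each process's messages as one contiguous block."""
    for pid, messages in group_by_pid(log_text).items():
        print("=== PID %d ===" % pid)
        for message in messages:
            print(message)
```

Once regrouped, all three processes show the same stack: the abort happens in the frame _ZN12CEulerSolverC1EP9CGeometryP7CConfigt, called from Solver_Preprocessing. Passing that name through c++filt gives CEulerSolver::CEulerSolver(CGeometry*, CConfig*, unsigned short), which fits the "invalid input: numBlk, numVar = 0,5" message printed just before the abort.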


Combas January 22, 2014 05:18

I don't know whether you have the same problem that I had, but you can try what I did to solve it. See my last message on this subject: http://www.cfd-online.com/Forums/su2...t-4-times.html
If your case works in serial but not in parallel, the problem is probably MPI.

Good luck!
Laurent

vmajor January 22, 2014 05:44

Thanks for your reply Laurent,

I removed and re-installed the openmpi components, and reproduced the exact error that I posted earlier.

I thought of upgrading to openmpi 1.5.4, but HDF5 depends on openmpi 1.4.3.

I should also mention that SU2 is the only application I use that fails to run under the existing openmpi environment. So it is not that openmpi is broken; rather, something specific to SU2 makes it fail.

Combas January 22, 2014 05:59

Indeed, SU2 seems to have some problems with openmpi, which is why you should use mpich2. You may not have to uninstall openmpi to use mpich2 with SU2. You probably just have to change the mpirun or mpiexec links so that they point to the mpich2 ones. For me, this change worked.
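
To see which MPI implementation the mpirun on the PATH actually belongs to, one option is to follow its symbolic links to the real executable. The sketch below does that and then guesses the implementation from the resolved path. The substrings it looks for ("orterun", "openmpi", "hydra", "mpich") are heuristic assumptions based on common install layouts, so an "unknown" result only means the path gave no hint.

```python
import os
import shutil


def classify_launcher(resolved_path):
    """Guess the MPI implementation from a launcher's resolved path (heuristic)."""
    path = resolved_path.lower()
    if "orterun" in path or "openmpi" in path:
        return "Open MPI"
    if "hydra" in path or "mpich" in path:
        return "MPICH"
    return "unknown"


def find_launcher(command="mpirun"):
    """Locate command on the PATH, resolve symlinks, and classify the result."""
    found = shutil.which(command)
    if found is None:
        return None, "not found"
    resolved = os.path.realpath(found)
    return resolved, classify_launcher(resolved)


if __name__ == "__main__":
    for name in ("mpirun", "mpiexec"):
        print(name, "->", find_launcher(name))
```

If the result does not match the MPI library SU2 was compiled with, the links can be changed as described above.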

Laurent

vmajor January 22, 2014 06:05

I can try that. Do I also need to recompile SU2 for mpich?

And yes at least HDF5 is incompatible and needs to be removed in order to install mpich. Fun.

Perhaps the openmpi difficulties are worthy of a bug report?

EDIT: I made the bug report here: https://github.com/su2code/SU2/issues/23

Combas January 22, 2014 06:45

I think you have to recompile SU2 if you compiled it with openmpi... (check which mpicxx is referenced in the options passed to the configure script).
I did not have to recompile SU2, since it was already compiled with mpich2. The problem was just the mpirun file.
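
A quick way to find out which MPI an existing SU2 binary was built against, without looking at the build options, is to inspect its shared-library dependencies with ldd. The sketch below classifies ldd output by library name. The names it checks (libopen-rte and libopen-pal for Open MPI, libmpich for MPICH) are assumptions about typical builds from this period; a statically linked binary will show nothing useful.

```python
import subprocess


def classify_mpi_libs(ldd_output):
    """Guess the MPI implementation from the library names in ldd output."""
    libs = [line.split()[0] for line in ldd_output.splitlines() if line.strip()]
    if any(lib.startswith(("libopen-rte", "libopen-pal")) for lib in libs):
        return "Open MPI"
    if any(lib.startswith("libmpich") for lib in libs):
        return "MPICH"
    if any(lib.startswith("libmpi") for lib in libs):
        return "MPI (implementation unclear)"
    return "no MPI library found"


def check_binary(path):
    """Run ldd on the given executable and classify the result."""
    output = subprocess.check_output(["ldd", path], universal_newlines=True)
    return classify_mpi_libs(output)
```

For example, check_binary("/usr/local/bin/SU2_CFD") could be compared with the implementation that mpirun belongs to; if the two differ, either the links or the build need to change.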
Laurent

vmajor January 29, 2014 05:16

I installed OpenMPI 1.6.5 and rebuilt SU2... and got a different error. I guess you can call it progress.

I cannot (will not) make any additional changes, as our other code that relies on OpenMPI runs fine. I am still hoping that someone from the SU2 team will notice this or my bug report and at least comment on the issue.

There has been no acknowledgement of the bug report, and the forum is still sparse, so I cannot find much specific information that relates to the errors I am experiencing.



Code:

found: mesh_ONERAM6_inv.su2
New Project: ./
Warning, removing old designs... now
New Design: DESIGNS/DSN_001
./DESIGNS/DSN_001
./DESIGNS/DSN_001
Evaluate Inequality Constraints
  Lift... the command: mpirun -np 2 /media/1tb/SU2/SU2_DDC config_DDC.cfg
the location: /media/1tb/SU2/onera6/DESIGNS/DSN_001/DECOMP
Traceback (most recent call last):
  File "/usr/local/bin/shape_optimization.py", line 124, in <module>
    main()
  File "/usr/local/bin/shape_optimization.py", line 69, in main
    options.step        )
  File "/usr/local/bin/shape_optimization.py", line 107, in shape_optimization
    SU2.opt.SLSQP(project,x0,xb,its)
  File "/usr/local/bin/SU2/opt/scipy_tools.py", line 102, in scipy_slsqp
    epsilon        = 1.0e-06        )
  File "/usr/lib/python2.7/dist-packages/scipy/optimize/slsqp.py", line 236, in fmin_slsqp
    mieq = len(f_ieqcons(x))
  File "/usr/lib/python2.7/dist-packages/scipy/optimize/optimize.py", line 176, in function_wrapper
    return function(x, *args)
  File "/usr/local/bin/SU2/opt/scipy_tools.py", line 187, in con_cieq
    cons = project.con_cieq(x)
  File "/usr/local/bin/SU2/opt/project.py", line 223, in con_cieq
    return self._eval(konfig, func,dvs)
  File "/usr/local/bin/SU2/opt/project.py", line 172, in _eval
    vals = design._eval(func,*args)
  File "/usr/local/bin/SU2/eval/design.py", line 132, in _eval
    vals = eval_func(*inputs)
  File "/usr/local/bin/SU2/eval/design.py", line 422, in con_cieq
    func = su2func(this_con,config,state)
  File "/usr/local/bin/SU2/eval/functions.py", line 75, in function
    aerodynamics( config, state )
  File "/usr/local/bin/SU2/eval/functions.py", line 148, in aerodynamics
    info = update_mesh(config,state)
  File "/usr/local/bin/SU2/eval/functions.py", line 390, in update_mesh
    info = su2run.decompose(config)
  File "/usr/local/bin/SU2/run/decompose.py", line 66, in decompose
    SU2_DDC(konfig)
  File "/usr/local/bin/SU2/run/interface.py", line 73, in DDC
    run_command( the_Command )
  File "/usr/local/bin/SU2/run/interface.py", line 279, in run_command
    raise Exception , message
Exception: Path = /media/1tb/SU2/onera6/DESIGNS/DSN_001/DECOMP/,
Command = mpirun -np 2 /media/1tb/SU2/SU2_DDC config_DDC.cfg
SU2 process returned error '1'
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
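
The Python wrapper reports only the exit code, so the actual SU2_DDC output is hidden. One way to see it is to re-run the failing command by hand in the directory named in the exception. The sketch below pulls the working directory and the command out of a message with the "Path = ..., Command = ..." layout shown above; the helper name parse_failure is hypothetical and not part of the SU2 Python scripts.

```python
import re
import shlex
import subprocess

# Matches the two-line "Path = <dir>,\nCommand = <cmd>" layout of the exception.
FAILURE = re.compile(r"Path = (?P<path>.+?),\s*\n\s*Command = (?P<command>.+)")


def parse_failure(message):
    """Return (working_directory, argv) from the exception text, or None."""
    match = FAILURE.search(message)
    if match is None:
        return None
    return match.group("path"), shlex.split(match.group("command"))


def rerun(message):
    """Re-run the failed command in its original directory, output to the terminal."""
    parsed = parse_failure(message)
    if parsed is None:
        raise ValueError("no Path/Command pair found in message")
    path, argv = parsed
    return subprocess.call(argv, cwd=path)
```

Note that the command here points at /media/1tb/SU2/SU2_DDC, whereas the earlier failing run used /usr/local/bin/SU2_CFD, so it may be worth checking that both paths refer to the freshly rebuilt binaries.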


fpalacios January 30, 2014 01:59

Quote:

Originally Posted by vmajor (Post 471197)
I can try that. Do I also need to recompile SU2 for mpich?

And yes at least HDF5 is incompatible and needs to be removed in order to install mpich. Fun.

Perhaps the openmpi difficulties are worthy of a bug report?

EDIT: I made the bug report here: https://github.com/su2code/SU2/issues/23

Thanks,

we'll take care of this issue.

Best,
Francisco

vmajor January 30, 2014 02:20

Great, thank you for replying.

I am really keen to start using SU2 for our low speed aerodynamic optimization problems. I am hoping to switch my forum queries to that topic soon :-)

