Parallelism structure
Hi,
I was wondering whether someone could help me out a bit by explaining how the parallel computation in SU2 is structured. As far as I can see, you split the program into multiple subproblems (how? Just by choosing a subset of the dataset? Are there any dependency issues that need to be taken into account?), run a full SU2_CFD application on each individual subset, and then stitch together a solution at the end. Are there any dependencies while running the application on different cores, or are these computations completely independent?

Another question: the iterations of the application are directly dependent on each other, and each iteration works on the full input data, correct? So by splitting the input data into smaller chunks, each iteration needs to do less work. I assume there is no way of removing the dependencies between iterations?

Are there any specific test cases that are especially good for running in parallel? The ones I've tried have seen no speed-up when run in parallel (actually, all of the ones I tried were slower).

I am looking to investigate whether it would be feasible to create a GPU version, but I am struggling a little with figuring out how best to achieve good performance. Note: I am not very knowledgeable in CFD; I am looking for good use-cases for GPU computation. Thanks for any help
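For what it's worth, here is a minimal sketch of the general pattern most mesh-partitioned solvers use (this is *not* SU2's actual code, and the function names `split_with_halos` / `smooth_step` are made up for illustration): each process owns a chunk of cells plus one layer of "halo" (ghost) cells copied from its neighbours, and after every iteration the halos must be refreshed. That halo exchange is the only dependency between the otherwise independent subproblems.

```python
def split_with_halos(cells, nparts):
    """Partition a 1-D list of cell values; each part also carries one
    ghost value from each neighbouring partition (None at the domain
    boundaries). Stand-in for a real mesh partitioner."""
    n = len(cells)
    size = n // nparts
    parts = []
    for p in range(nparts):
        lo = p * size
        hi = (p + 1) * size if p < nparts - 1 else n
        left = cells[lo - 1] if lo > 0 else None
        right = cells[hi] if hi < n else None
        parts.append((left, cells[lo:hi], right))
    return parts

def smooth_step(part):
    """One Jacobi-style averaging update on the owned cells, using the
    halo values at the partition edges (stand-in for a CFD iteration)."""
    left, own, right = part
    padded = [own[0] if left is None else left] + own + \
             [own[-1] if right is None else right]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(own) + 1)]

# Two "ranks" each update their own chunk; stitching the results back
# together reproduces the serial answer because the halos carried the
# neighbour data each chunk needed.
cells = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
parts = split_with_halos(cells, 2)
stitched = [v for p in parts for v in smooth_step(p)]
```

In a real MPI code the halo values are communicated between ranks every iteration rather than copied from a shared list, which is why very small test cases can be dominated by communication overhead.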
Hi Jacob,
I can't answer your question about the structure of the parallel implementation or the dependencies; that is better answered by the developers. However, regarding your last question, I did measure a speedup for the compressible external flow over the ONERA M6 wing using the Euler equations and the native grid provided in the tutorial. Please see my results in the attached picture (Attachment 32721). I used my Core i5 laptop, not a computational server, so I only have 2 physical cores. Hope it helps you find an appropriate test case. Bests, PDP
Hi PDP,
Thanks for your answer, much appreciated. For me, the TestCases/euler/oneram6 case executes in serial in ~90 s, but with two cores it is slightly slower, ~100 s (full execution including the Python script: 1m43.838s for 1 core and 1m56.726s for 2 cores). Using the full 8 threads of my i7-2600 makes it even slower. Interestingly, when I run configure it claims that I have no MPI support, so maybe something goes wrong in the compilation process? I do have MPI installed, and mpirun, mpicc, mpic++ etc. run just fine. It does use two full cores and definitely shows output from two running processes, but maybe something has gone wrong? Code:
Source code location: /home/jacob/SU2
Code:
$ type mpirun
Edit: Ah, it helps to configure with --with-MPI=mpic++! Now running on 8 cores takes ~40 s compared to ~90 s on 1 core :)
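As a quick sanity check on those numbers: 90 s down to 40 s on 8 cores is a 2.25x speedup. Inverting Amdahl's law gives a rough estimate of the parallelizable fraction of the run (the formula below is standard Amdahl; it ignores MPI overhead, which is exactly why tiny test cases can come out slower in parallel):

```python
def speedup(t_serial, t_parallel):
    """Observed speedup from wall-clock times."""
    return t_serial / t_parallel

def amdahl_parallel_fraction(s, n_cores):
    """Solve Amdahl's law S = 1 / ((1 - f) + f/N) for the
    parallelizable fraction f, given observed speedup S on N cores."""
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n_cores)

s = speedup(90.0, 40.0)             # 2.25x on 8 cores
f = amdahl_parallel_fraction(s, 8)  # roughly 0.63
```

So only about 63% of this particular run behaves as parallel work, which suggests a larger case (or fewer cores per problem size) would scale better.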
I am happy that you figured out the issue. You might also have a look here; at the end I wrote out the configuration flags, with details, for several versions.
http://www.cfd-online.com/Forums/blo...-part-1-3.html
Good luck,