April 20, 2011, 09:43 |
Worth parallelizing ?
|
#1 |
New Member
Join Date: Jul 2010
Posts: 17
Rep Power: 15 |
Hi,
I ran pisoFoam both in parallel and on a single processor, and the result is rather disappointing. The test used a dual-core processor and a 500 000 cell mesh. Running in parallel saves only 13% of the execution time. I know there is overhead due to data transfer, but is there a way to assess whether running in parallel is worth it (from experience or from a theoretical point of view)? Thanks |
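On the theoretical side of this question, the usual first step is to turn the measured time saving into a speedup and a parallel efficiency. The sketch below does that for the 13% figure quoted above and also inverts Amdahl's law to get an apparent parallel fraction. The function names are made up for this illustration, and Amdahl's law is only a rough model here: it ignores communication cost and shared memory bandwidth, so the fraction it returns describes the measurement and not the solver itself.

```python
def speedup_from_time_saving(saving):
    """Speedup implied by a fractional reduction in wall-clock time."""
    return 1.0 / (1.0 - saving)


def parallel_efficiency(speedup, n_procs):
    """Speedup divided by the number of processes (1.0 would be ideal)."""
    return speedup / n_procs


def amdahl_parallel_fraction(speedup, n_procs):
    """Invert Amdahl's law S = 1 / ((1 - p) + p / n) for p.

    Assumption: the whole shortfall is attributed to a serial fraction,
    which is a simplification for a memory-bound CFD run.
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_procs)


# Figures from the post: 13% less execution time on 2 cores.
s = speedup_from_time_saving(0.13)
e = parallel_efficiency(s, 2)
p = amdahl_parallel_fraction(s, 2)

print(f"speedup    = {s:.2f}")
print(f"efficiency = {e:.2f}")
print(f"apparent parallel fraction = {p:.2f}")
```

A speedup of about 1.15 on two cores corresponds to an efficiency of roughly 0.57, which gives a number to compare against other hardware or other decompositions.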
|
April 20, 2011, 10:11 |
|
#3 | |
New Member
Join Date: Jul 2010
Posts: 17
Rep Power: 15 |
Quote:
It gives 5000 shared cells (about 1% of the mesh size). I also tried the simple decomposition method, but that gave twice as many shared cells. |
||
April 21, 2011, 03:30 |
|
#4 |
Senior Member
Alberto Passalacqua
Join Date: Mar 2009
Location: Ames, Iowa, United States
Posts: 1,912
Rep Power: 36 |
I get very good results on Gigabit clusters, and less good but still acceptable results on large SMP systems.
How many cells do you have per processor?
|
April 21, 2011, 04:13 |
|
#5 |
New Member
Join Date: Jul 2010
Posts: 17
Rep Power: 15 |
I ran the test with a 500 000 cell mesh on a dual-core processor, so I have 250 000 cells per core and 5000 cells shared between the two cores.
|
|
April 26, 2011, 05:53 |
|
#6 |
Assistant Moderator
Bernhard Gschaider
Join Date: Mar 2009
Posts: 4,225
Rep Power: 51 |
Just a thought: on a dual-core processor the two cores have to share the memory bandwidth. If one process already uses the whole bandwidth (old Intel dual-cores, for example), then you will see hardly any speedup on two cores.
|
|
April 26, 2011, 18:49 |
|
#7 | |
New Member
Join Date: Jul 2010
Posts: 17
Rep Power: 15 |
Quote:
Beyond my initial question, I assume that there is an optimum number of processors for each case, depending on mesh size, number of cells shared between processors, processor frequency, memory bandwidth... |
||
April 27, 2011, 05:16 |
|
#8 | |
Assistant Moderator
Bernhard Gschaider
Join Date: Mar 2009
Posts: 4,225
Rep Power: 51 |
Quote:
Bernhard |
||
April 28, 2011, 02:53 |
|
#9 |
Super Moderator
Niklas Nordin
Join Date: Mar 2009
Location: Stockholm, Sweden
Posts: 693
Rep Power: 29 |
I have conducted some tests on our new quad-core cluster with infiniband.
The case I ran had 1.0M cells and the solver was rhoPimpleFoam.
Code:
#procs  runtime (s)  speedup  efficiency  rel speedup  k cells/core  k cells/cpu
     1        38019        -           1            -          1000         1000
     2        24625      1.5        0.75          1.5           500         1000
     4        15951      2.4         0.6          1.5           265         1000
     8        12090      3.1        0.39          1.3           132          500
    16         5554      6.9        0.43          2.2            65          250
    32         2335     16.3        0.51          2.4            33          125
    64          877     43.4        0.68          2.7            17         62.5
   128          455     83.6        0.65          1.9            ~8         31.3
[...] of stuff over the same internal connectivity. Basically, this problem is too big to run on a low number of cores. If you choose your problem optimally, you should be able to produce good scaling numbers even on 1-8 cores.
As you can also see from the relative speedup (how much speedup you get compared to the previous, lower number of cores, i.e. from 2 to 4, 4 to 8, 8 to 16, etc.), you get superlinear effects once you reduce the size of the problem on the CPUs. Once the problem size gets down to ~50-70k cells/cpu, it starts to scale very well. My rule of thumb is to keep the problem size at around 50k cells/cpu.
I will do a similar comparison later using only 1 to 2 cores per CPU and see what the numbers are. I'm guessing the speedup will be a lot better, so the numbers will look better, but I will be utilizing my hardware very poorly. Quad-cores are not the best hardware for CFD.
Last edited by niklas; April 28, 2011 at 04:36. |
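The derived columns in the table above can be recomputed from the runtimes alone, which is a handy check when building a similar table for another case. This is a minimal sketch, assuming speedup is measured against the 1-process run and relative speedup against the previous row, as the post describes; `scaling_rows` is a name invented for this example.

```python
# Wall-clock runtimes in seconds, keyed by number of processes (from the table above).
runtimes = {1: 38019, 2: 24625, 4: 15951, 8: 12090,
            16: 5554, 32: 2335, 64: 877, 128: 455}


def scaling_rows(runtimes):
    """Return (n, runtime, speedup, efficiency, relative speedup) per process count.

    speedup          = serial runtime / runtime on n processes
    efficiency       = speedup / n
    relative speedup = runtime of the previous (smaller) run / this runtime
    """
    t_serial = runtimes[1]
    rows = []
    previous = None
    for n in sorted(runtimes):
        t = runtimes[n]
        speedup = t_serial / t
        efficiency = speedup / n
        rel = previous / t if previous is not None else None
        rows.append((n, t, speedup, efficiency, rel))
        previous = t
    return rows


rows = scaling_rows(runtimes)
print(f"{'procs':>6} {'time':>8} {'speedup':>8} {'eff':>6} {'rel':>6}")
for n, t, s, e, r in rows:
    rel = "-" if r is None else f"{r:.1f}"
    print(f"{n:>6} {t:>8} {s:>8.1f} {e:>6.2f} {rel:>6}")
```

A relative speedup above 2 when the process count doubles is the superlinear effect mentioned in the post.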
|
April 28, 2011, 10:17 |
|
#10 | |
New Member
Join Date: Jul 2010
Posts: 17
Rep Power: 15 |
Quote:
- Two cases simultaneously = 9926 s (average of 9965 s and 9887 s)
- Same case alone = 4695 s
No comment! Thanks Niklas for your post, it's quite interesting. |
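The two timings above can be turned into a slowdown factor and a throughput comparison. The sketch below assumes that both simultaneous runs were the same case on the two cores of one machine, which the surviving text suggests but does not state outright; the variable names are illustrative only.

```python
# Timings in seconds, as quoted in the post.
concurrent_times = [9965.0, 9887.0]  # two copies of the case run at the same time
alone_time = 4695.0                  # the same case run on its own

# Average runtime of one case when it shares the machine with the other.
mean_concurrent = sum(concurrent_times) / len(concurrent_times)

# How much slower each case gets when a second one runs alongside it.
slowdown = mean_concurrent / alone_time

# Time to finish both cases: one after the other vs. side by side.
sequential_total = len(concurrent_times) * alone_time
concurrent_total = max(concurrent_times)

print(f"mean concurrent runtime: {mean_concurrent:.0f} s")
print(f"slowdown per case:       {slowdown:.2f}x")
print(f"both cases back to back: {sequential_total:.0f} s")
print(f"both cases side by side: {concurrent_total:.0f} s")
```

With these numbers each case slows down by a factor slightly above 2, so running the two side by side takes longer than running them back to back. That is consistent with the memory-bandwidth explanation given earlier in the thread.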
||