Worth parallelizing?
Hi,
I have tried running pisoFoam both in parallel and on a single processor, and the result is quite disappointing. I carried out the test with a dual-core processor on a 500 000 cell mesh, and I only save 13% of the execution time when running in parallel. I know there is overhead due to data transfer, but is there a means of assessing whether running in parallel is worth doing (from experience or from a theoretical point of view)? Thanks |
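To put a number on the figure quoted above: a 13% saving in execution time on two cores can be turned into a speedup and a parallel efficiency. This is only a minimal sketch of that arithmetic; the function names are illustrative and do not come from the thread or from any solver.

```python
def speedup_from_time_saving(saving):
    """Speedup T_serial / T_parallel when the parallel run saves
    a fraction `saving` (between 0 and 1) of the serial run time."""
    return 1.0 / (1.0 - saving)


def parallel_efficiency(speedup, n_procs):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / n_procs


# Values from the post: 13% time saved on a dual-core processor.
s = speedup_from_time_saving(0.13)
e = parallel_efficiency(s, 2)
print(f"speedup = {s:.2f}, efficiency = {e:.0%}")
```

An efficiency well below 100% on only two cores suggests that something other than the number of shared cells is limiting the run.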
How is your mesh decomposed? You need to minimise the number of faces in the processor boundaries.
|
It gives 5000 shared cells (so about 1% of the mesh size). I have also tried the simple decomposition method, but I had twice as many shared cells. |
I have very good results on Gigabit clusters, less good on large SMP systems, but still acceptable.
How many cells do you have per processor? |
I ran the test with a 500 000 cell mesh on a dual-core processor, so I have 250 000 cells per core and 5000 shared cells between the two cores.
|
Beyond my initial question, I assume that there is an optimum number of processors for each case, depending on mesh size, number of cells shared between processors, processor frequency, memory bandwidth... |
Quote:
Bernhard |
I have conducted some tests on our new quad-core cluster with InfiniBand.
The case I ran had 1.0M cells and the solver was rhoPimpleFoam. Code:
#procs  runtime (s)  speedup  efficiency  rel speedup  k cells/core  k cells/cpu
... of stuff over the same internal connectivity. This problem is basically too big to run on a low number of cores. If you choose your problem optimally, you should be able to produce good scaling numbers even on 1-8 cores.
As you can also see from the relative speedup (how much speedup you get compared to the previous, lower number of cores, i.e. from 2 to 4, 4 to 8, 8 to 16, etc.), you get superlinear effects once you reduce the size of the problem on each CPU. Once the problem size gets down to ~50-70k cells/cpu, it starts to scale very well. My rule of thumb is to keep the problem size at around 50k cells/cpu.
I will do a similar comparison later using only 1 to 2 cores/cpu and see what the numbers are. I'm guessing the speedup will be a lot better, so the numbers will look better, but I will be utilizing my hardware very badly. Quad-cores are not the best hardware for CFD. |
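The columns named in the post (speedup, efficiency, relative speedup, cells per core) can be computed from measured run times along these lines. This is a rough sketch only: the run times below are made-up placeholders, not the measurements from the post, and "k cells/cpu" is computed under the assumption that one CPU means one quad-core socket.

```python
import math


def scaling_table(runtimes, total_cells, cores_per_cpu=4):
    """runtimes: {n_procs: runtime in seconds}; must contain the 1-proc run.
    cores_per_cpu=4 is an assumption (quad-core sockets)."""
    rows = []
    prev_n = None
    for n in sorted(runtimes):
        t = runtimes[n]
        speedup = runtimes[1] / t
        # Relative speedup: gain over the previous, lower core count.
        rel = runtimes[prev_n] / t if prev_n is not None else None
        cpus = math.ceil(n / cores_per_cpu)
        rows.append({
            "procs": n,
            "runtime": t,
            "speedup": speedup,
            "efficiency": speedup / n,
            "rel_speedup": rel,
            "k_cells_per_core": total_cells / n / 1000,
            "k_cells_per_cpu": total_cells / cpus / 1000,
        })
        prev_n = n
    return rows


def units_for_target(total_cells, target_cells_per_unit):
    """Number of cores (or CPUs) needed to hit a target cell count per unit."""
    return math.ceil(total_cells / target_cells_per_unit)


# HYPOTHETICAL run times, for illustration only.
example = {1: 1000.0, 2: 520.0, 4: 300.0, 8: 140.0}
rows = scaling_table(example, total_cells=1_000_000)
for r in rows:
    print(r)

# Rule of thumb from the post: about 50k cells per unit for a 1.0M cell case.
print(units_for_target(1_000_000, 50_000))
```

A relative speedup above 2 when the core count doubles is what the post calls a superlinear effect.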
- Two cases simultaneously = 9926 s (average of 9965 s and 9887 s)
- Same case alone = 4695 s
No comment! Thanks Niklas for your post, it's quite interesting. |
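The timings above can be compared directly. The sketch below assumes the two simultaneous runs shared the same dual-core machine, one run per core, which the post does not state explicitly.

```python
# Timings quoted in the post (seconds).
t_alone = 4695.0
t_pair = [9965.0, 9887.0]

mean_pair = sum(t_pair) / len(t_pair)

# How much slower each case runs when a second case runs alongside it.
slowdown = mean_pair / t_alone

# Wall time to finish both cases:
sequential = 2 * t_alone   # one after the other, one core in use at a time
concurrent = max(t_pair)   # both at once, done when the slower one ends

print(f"mean simultaneous run time = {mean_pair:.0f} s")
print(f"slowdown per case = {slowdown:.2f}x")
print(f"two cases back to back = {sequential:.0f} s, at once = {concurrent:.0f} s")
```

Each case takes slightly more than twice as long when run alongside another, so using the second core brings no gain in throughput here. That would be consistent with a shared resource such as memory bandwidth being the bottleneck, though the thread does not confirm the cause.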