Worth parallelizing ?

#1 | alf12 | April 20, 2011, 09:43
Hi,

I have tried running pisoFoam both in parallel and on a single processor, and the result is quite disappointing. I ran the test on a dual-core processor with a 500,000-cell mesh.

Running in parallel only saves 13% of the execution time.

I know there is overhead due to data transfer, but is there a way to assess whether running in parallel is worthwhile (from experience or from a theoretical point of view)?

Thanks
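
For reference, the 13% saving in execution time can be restated as a speedup and a parallel efficiency. The following is a minimal Python sketch, not anything from OpenFOAM: the function names are made up for illustration, and the Amdahl-style "effective parallel fraction" is only descriptive, since it lumps communication overhead and memory contention together with genuinely serial work.

```python
def speedup_and_efficiency(time_saved_fraction, n_procs):
    """Turn a fractional wall-clock saving into (speedup, parallel efficiency)."""
    if not 0.0 <= time_saved_fraction < 1.0:
        raise ValueError("time_saved_fraction must be in [0, 1)")
    # Parallel runtime is (1 - saving) times the serial runtime.
    speedup = 1.0 / (1.0 - time_saved_fraction)
    efficiency = speedup / n_procs
    return speedup, efficiency


def effective_parallel_fraction(speedup, n_procs):
    """Invert Amdahl's law, S = 1 / ((1 - p) + p / n), for p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_procs)


# Figures from the post above: 13% saved on 2 cores.
speedup, efficiency = speedup_and_efficiency(0.13, 2)
p = effective_parallel_fraction(speedup, 2)
print(f"speedup    = {speedup:.2f}")
print(f"efficiency = {efficiency:.2f}")
print(f"effective parallel fraction = {p:.2f}")
```

On two cores this corresponds to a speedup of about 1.15 and an efficiency of about 0.57.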

#2 | l_r_mcglashan (Laurence R. McGlashan) | April 20, 2011, 09:51
How is your mesh decomposed? You need to minimise the number of faces on the processor boundaries.

#3 | alf12 | April 20, 2011, 10:11
Quote:
How is your mesh decomposed?
I ran decomposePar with the scotch method.
It gives 5,000 shared cells (about 1% of the mesh).
I also tried the simple decomposition method, but that gave twice as many shared cells.

#4 | alberto (Alberto Passalacqua) | April 21, 2011, 03:30
I get very good results on Gigabit clusters, and less good but still acceptable results on large SMP systems.

How many cells do you have per processor?

#5 | alf12 | April 21, 2011, 04:13
I ran the test with a 500,000-cell mesh on a dual-core processor, so I have 250,000 cells per core and 5,000 shared cells between the two cores.

#6 | gschaider (Bernhard Gschaider) | April 26, 2011, 05:53
Quote:
Originally posted by alf12:
I ran the test with a 500,000-cell mesh on a dual-core processor, so I have 250,000 cells per core and 5,000 shared cells between the two cores.
Just a thought: on a dual-core processor the two cores have to share the memory bandwidth. If one process already uses the whole bandwidth (as on old Intel dual-cores, for example), then you will see hardly any speedup on two cores.

#7 | alf12 | April 26, 2011, 18:49
Quote:
Originally posted by gschaider:
Just a thought: on a dual-core processor the two cores have to share the memory bandwidth. If one process already uses the whole bandwidth (as on old Intel dual-cores, for example), then you will see hardly any speedup on two cores.
I checked, and the computer I used for the test has a single memory channel, which is likely why running in parallel is not very efficient on my configuration.

Beyond my initial question, I assume that there is an optimum number of processors for each case, depending on mesh size, number of cells shared between processors, processor frequency, memory bandwidth...

#8 | gschaider (Bernhard Gschaider) | April 27, 2011, 05:16
Quote:
Originally posted by alf12:
I checked, and the computer I used for the test has a single memory channel, which is likely why running in parallel is not very efficient on my configuration.
One way to know for sure is to run two serial cases simultaneously. In an ideal world each would take the same time as when run on its own. In real life they take longer: that is the impact of sharing the memory bandwidth and other parts of the processor.

Bernhard

#9 | niklas (Niklas Nordin) | April 28, 2011, 02:53
I have run some tests on our new quad-core cluster with InfiniBand.
The case had 1.0M cells and the solver was rhoPimpleFoam.
Code:
#procs  runtime (s)  speedup  efficiency  rel speedup  k cells/core  k cells/cpu
     1        38019        -           1            -          1000         1000
     2        24625      1.5        0.75          1.5           500         1000
     4        15951      2.4         0.6          1.5           265         1000
     8        12090      3.1        0.39          1.3           132          500
    16         5554      6.9        0.43          2.2            65          250
    32         2335     16.3        0.51          2.4            33          125
    64          877     43.4        0.68          2.7            17         62.5
   128          455     83.6        0.65          1.9            ~8         31.3
As you can see, it scales very badly up to 8 processes, because you are just shoveling a lot of data over the same internal interconnect.
Basically, this problem is too big to run on a small number of cores. If you choose your problem size optimally, you should be able to get good scaling numbers even on 1-8 cores.
As you can also see from the relative speedup (the speedup compared with the previous, lower number of cores, i.e. from 2 to 4, 4 to 8, 8 to 16, etc.), you get superlinear effects once you reduce the size of the problem on each CPU.
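
The derived columns of the table can be recomputed from the measured runtimes alone. This is a minimal Python sketch: the helper name `scaling_table` is made up for illustration, "relative speedup" is taken to mean the runtime ratio between consecutive rows as described above, and a step is flagged as superlinear when that ratio exceeds the ratio of process counts.

```python
# Runtimes (s) copied from the table: process count -> wall-clock time.
RUNTIMES = {1: 38019, 2: 24625, 4: 15951, 8: 12090,
            16: 5554, 32: 2335, 64: 877, 128: 455}


def scaling_table(runtimes):
    """Return rows of (procs, runtime, speedup, efficiency, rel speedup, superlinear)."""
    procs = sorted(runtimes)
    base_n, base_t = procs[0], runtimes[procs[0]]
    rows = []
    prev_n = prev_t = None
    for n in procs:
        t = runtimes[n]
        speedup = base_t / t                  # relative to the smallest run
        efficiency = speedup / (n / base_n)   # speedup per added process
        rel = None if prev_t is None else prev_t / t
        # Superlinear step: runtime drops by more than the process count grew.
        superlinear = rel is not None and rel > n / prev_n
        rows.append((n, t, speedup, efficiency, rel, superlinear))
        prev_n, prev_t = n, t
    return rows


rows = scaling_table(RUNTIMES)
for n, t, s, e, rel, superlinear in rows:
    rel_text = "-" if rel is None else f"{rel:.2f}"
    flag = "  superlinear step" if superlinear else ""
    print(f"{n:>4} {t:>6} {s:>6.2f} {e:>5.2f} {rel_text:>5}{flag}")

superlinear_procs = [row[0] for row in rows if row[5]]
```

The recomputed values agree with the posted columns up to small rounding differences, and the steps to 16, 32 and 64 processes are the ones flagged as superlinear.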
Once the problem size gets down to ~50-70k cells/cpu, it starts to scale very well.
So my rule of thumb is to keep the problem size at around 50k cells/cpu.
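
As a rough illustration of this rule of thumb, the sketch below turns a mesh size into a range of CPU counts. It is only a sketch under assumptions: the function name and default bounds are illustrative, "cpu" is read as one quad-core chip (the table's k cells/cpu column stays at 1000 for 1 to 4 processes, which suggests this), and the 50-70k band comes from one case on one cluster, so it may not carry over to other hardware or solvers.

```python
def suggested_cpu_range(n_cells, min_cells_per_cpu=50_000, max_cells_per_cpu=70_000):
    """Return (fewest, most) CPUs that keep the load inside the cells/CPU band."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    # Ceiling division: fewest CPUs so that no CPU exceeds max_cells_per_cpu.
    fewest = max(1, -(-n_cells // max_cells_per_cpu))
    # Floor division: most CPUs before the load drops below min_cells_per_cpu.
    most = max(fewest, n_cells // min_cells_per_cpu)
    return fewest, most


# The 1.0M-cell case from this post and the 500,000-cell case from the first post.
range_1m = suggested_cpu_range(1_000_000)
range_500k = suggested_cpu_range(500_000)
print("1.0M cells:", range_1m)
print("500k cells:", range_500k)
```

For the 1.0M-cell case this gives 15 to 20 CPUs; the 64-process run in the table (62.5k cells/cpu, i.e. 16 quad-core CPUs) falls inside that range.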

I will do a similar comparison later using only 1 to 2 cores per CPU and see what the numbers are.
I'm guessing the speedup will be a lot better, so the numbers will look better, but I will be utilizing
my hardware very poorly.

Quad-cores are not the best hardware for CFD.

Last edited by niklas; April 28, 2011 at 04:36.

#10 | alf12 | April 28, 2011, 10:17
Quote:
Originally posted by gschaider:
One way to know for sure is to run two serial cases simultaneously.
I ran the test Bernhard suggested. Here are the results:
- Two cases simultaneously = 9926 s (average of 9965 s and 9887 s)
- Same case alone = 4695 s

No comment!

Thanks Niklas for your post, it's quite interesting
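
To spell out what these two timings imply, here is a minimal Python sketch of the arithmetic. The function name is made up for illustration; the only inputs are the runtimes quoted above.

```python
def contention_report(t_alone, t_simultaneous, n_jobs=2):
    """Compare n_jobs serial runs started together against running them one by one."""
    # How much longer each job takes when it shares the machine.
    slowdown = t_simultaneous / t_alone
    # Time to finish all jobs back to back on one core, divided by the
    # time to finish them all at once. Values below 1 mean sharing loses.
    throughput_gain = (n_jobs * t_alone) / t_simultaneous
    return slowdown, throughput_gain


# Timings from the post above (averaged simultaneous runtime, in seconds).
slowdown, throughput_gain = contention_report(t_alone=4695, t_simultaneous=9926)
print(f"slowdown per job = {slowdown:.2f}x")
print(f"throughput gain  = {throughput_gain:.2f}")
```

Each job takes a little over twice as long when the two run together, so on this machine running both at once is no faster than running them one after the other, which is consistent with a saturated memory channel.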
