
Large test case for running OpenFoam in parallel

Old   August 16, 2007, 14:44
  #1
fhy (Huiyu Feng), New Member
Hi,

I am testing the parallel features of OpenFOAM. The test case I used is icoFoam/cavity. However, I did not observe any speedup. The execution times are:

sequential: 0.27 s
4 cpus:     0.63 s
8 cpus:     0.70 s
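These timings are actually a slowdown rather than a speedup; a quick calculation from the numbers above makes that explicit (a minimal sketch, using only the wall-clock times reported in this post):

```python
# Speedup and parallel efficiency from the timings above:
# speedup = T_sequential / T_parallel, efficiency = speedup / n_cpus
t_seq = 0.27
timings = {4: 0.63, 8: 0.70}
for n, t in sorted(timings.items()):
    speedup = t_seq / t
    efficiency = speedup / n
    print(f"{n} cpus: speedup {speedup:.2f}, efficiency {efficiency:.1%}")
```

Both speedups come out below 1, i.e. the parallel runs are slower than the sequential one, which is consistent with the case being far too small to amortize communication overhead.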

It might be because the case is too small.

I am wondering whether there are any larger cases I can try. I see there are quite a few cases in the tutorials directory. Can someone suggest one that is large enough to test the parallel features?

Or are there any publicly available OpenFOAM test cases for benchmarking purposes?

Thanks,
Huiyu

Old   August 16, 2007, 16:03
  #2
msrinath80 (Srinath Madhavan, a.k.a pUl|), Senior Member
Just change the node density in constant/polyMesh/blockMeshDict. Increase the number of control volumes!

Old   August 17, 2007, 03:18
  #3
jens_klostermann (Jens Klostermann), Senior Member
Hi

We did some benchmarks based on the pyFoamBench cases from the wiki (thanks to Bernhard); see http://openfoamwiki.net/index.php/Be...ks_standard_v1

With OF-1.3, depending on the case, we saw speedup on up to 128 cores.

Jens

Old   August 17, 2007, 14:29
  #4
fhy (Huiyu Feng), New Member
Hi Srinath and Jens,

Thanks for your reply and suggestions!

I increased the node density from (20 20 1) to (100 100 1) and decreased deltaT to 0.001 to satisfy the CFL condition (for the icoFoam/cavity case). With the longer runtime, I did observe speedup. However, parallel efficiency dropped below 50% with 8+ processors.
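The deltaT choice can be checked with a back-of-the-envelope CFL estimate. This sketch assumes the standard cavity tutorial values (lid velocity 1 m/s, cavity side 0.1 m); if your case differs, substitute your own numbers:

```python
# Rough CFL check for the refined cavity mesh.
# Assumed values from the standard icoFoam/cavity tutorial:
U = 1.0            # lid velocity, m/s (assumption)
L = 0.1            # cavity side length, m (assumption)
n = 100            # cells per side after refinement
dx = L / n
dt_max = dx / U    # largest deltaT keeping Co = U*dt/dx <= 1
print(dt_max)      # 0.001 s, matching the deltaT chosen above
```

Refining the mesh by 5x in each direction forces a 5x smaller time step, so the total work grows much faster than the cell count alone.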

I checked the benchmark wiki page. Most parallel results only go up to 4 CPUs.

Jens, can you point me to some cases that scale to a large number of processors? I am looking for test cases that can scale to 64 CPUs.

I understand that scaling depends on the application as well as the system. That's why I am looking for cases that have already shown good scaling on other systems: I want to make sure the poor scaling on my local system is not due to the application.

Thanks,
Huiyu

Old   August 17, 2007, 14:50
  #5
paka, Senior Member
In my case it is the rasInterFoam solver. The cluster I'm using allows me to use 8 nodes, each having 8 processors.

Frankly speaking, the more processors I use, the better the efficiency I obtain. For example, on my Mac G5 with a single processor, the computation takes a couple of days. On the cluster, 4 nodes bring it down to 1.5 h, and 8 nodes to a bit more than 10 minutes.

Try that case.

Krystian

Old   August 17, 2007, 17:13
  #6
connclark, Guest
When I was in school, a grad student was doing cluster performance experiments on VHDL chip simulations. He found that going from 1 to 2 to 4 nodes, total compute time was reduced. He attributed it to more cache hits due to the smaller per-node data sets.

Old   August 20, 2007, 18:05
  #7
fhy (Huiyu Feng), New Member
Hi Krystian,

What test case did you run with the rasInterFoam solver? Is it the default damBreak case from the tutorials? Did you change anything in controlDict or blockMeshDict?

Did you use the damBreak case for interFoam from standardBench_v1.cfg?

Thanks,
Huiyu

Old   August 20, 2007, 18:22
  #8
fhy (Huiyu Feng), New Member
Hi Jens and Bernhard,

I downloaded PyFoam-0.4.0, but did not find benchFoam.py under examples/. Where can I find it? Thanks.

I did find standardBench_v1.cfg under examples/data. Based on it, I modified interFoam/damBreak. However, the sequential run only takes about 180 s to finish, which is significantly less than the baseline (1605.82 s) in standardBench_v1.cfg. Something must be wrong here; I just want to make sure I am running the correct benchmark.

The following is what I did, based on the configuration file. Please let me know which steps are wrong.

step 1: Modify blockMeshDict, blocks section

blocks
(
hex (0 1 5 4 12 13 17 16) (46 16 1) simpleGrading (1 1 1)
hex (2 3 7 6 14 15 19 18) (38 16 1) simpleGrading (1 1 1)
hex (4 5 9 8 16 17 21 20) (46 84 1) simpleGrading (1 1 1)
hex (5 6 10 9 17 18 22 21) (8 84 1) simpleGrading (1 1 1)
hex (6 7 11 10 18 19 23 22) (38 84 1) simpleGrading (1 1 1)
);

step 2. modify controlDict

endTime 0.5;

deltaT 0.0005;

writeControl adjustableRunTime;

writeInterval 0.1;

step 3: generate the mesh
blockMesh . damBreak

step 4: set the gamma field
setFields . damBreak

step 5: run it
interFoam . damBreak

I am running OpenFOAM-1.4 on an AMD Opteron(tm) Processor 285, 2.6 GHz, 8 GB RAM, SLES 10.
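As a sanity check on step 1, the total cell count implied by those blocks can be tallied with a short script (this is just arithmetic on the resolutions listed in the blockMeshDict snippet above, not a pyFoam utility):

```python
# Sum up the cells defined by the blocks section in step 1.
# Each entry is the (nx ny nz) resolution of one hex block.
blocks = [(46, 16, 1), (38, 16, 1), (46, 84, 1), (8, 84, 1), (38, 84, 1)]
total = sum(nx * ny * nz for nx, ny, nz in blocks)
print(total)  # 9072 cells in the refined damBreak mesh
```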

Thanks,
Huiyu

Old   August 21, 2007, 00:59
  #9
jens_klostermann (Jens Klostermann), Senior Member
Hi Huiyu,

1. benchFoam.py is now pyFoamBench.py
2. Suggested cases: oodles pitzDaily and interFoam damBreak should have good efficiency

Jens

Old   August 21, 2007, 14:54
  #10
fhy (Huiyu Feng), New Member
Hi Jens,

Thanks for your suggestion.

On the benchmark wiki page, I saw you submitted a result on an Opteron 250 with OpenFOAM 1.2 standard. I am wondering whether you have tried the benchmark with OpenFOAM 1.4.

My problem is that I ran OpenFOAM 1.4, compiled with gcc 4.1.0, on a SLES 10 machine with an AMD Opteron(tm) Processor 285, 2.6 GHz, and 8 GB RAM. The sequential interFoam/damBreak (as in the benchmark v1 configuration) finishes in around 180 s. Your submission is 588.91 s on an Opteron 250, 2.4 GHz, 4 GB RAM system. The large runtime difference cannot be explained by the difference in hardware alone.

So I am wondering whether the different OpenFOAM versions contribute to the difference. However, it is still hard to believe that accounts for all the rest of it. Is there a way to verify the result of the benchmark?

Huiyu

Old   August 21, 2007, 16:57
  #11
gschaider (Bernhard Gschaider), Assistant Moderator
Hi Huiyu!

As far as I know, interFoam was rewritten in a major way from 1.3 to 1.4 (a completely new algorithm). Probably this is the cause of the big difference.

Bernhard

Old   August 21, 2007, 18:00
  #12
fhy (Huiyu Feng), New Member
Hi Bernhard,

Thanks for the info!

I wonder how many solvers were rewritten between 1.2 and 1.4.

I just ran pitzDaily with oodles using the benchmark_v1 configuration and got 151.9 s (wall clock), while Jens's submission on a dual-Opteron 252 with OF 1.2 is 232.47 s.

Has anyone run the benchmark suite using OF 1.4?

The reason I am so concerned about the runtime is that if it is too short, it won't be a good case for parallel runs. Although I can modify the case to make it run longer, there would then be no data on the benchmark wiki page to compare against.

Huiyu

Old   August 22, 2007, 02:42
  #13
jens_klostermann (Jens Klostermann), Senior Member
Hi Huiyu,

I just started benchmark_v1 again. For the sequential interFoam/damBreak (as in the benchmark v1 configuration) I got 168.6 s on the same machine I mentioned on the wiki. So this is quite a speedup!

I will publish some more results later this week, in the wiki.

If there is interest in the community in benchmarking, I am willing to share my experience. Maybe we should collaborate and form some kind of benchmark group? I think pyFoamBench is a good starting point.

Jens

Old   August 22, 2007, 13:48
  #14
fhy (Huiyu Feng), New Member
Hi Jens,

Thanks a lot for testing!

I think it is a great idea to form a benchmark group! I am very interested in benchmarking the parallel features of OpenFOAM. Reading through some posts in this forum, I found that people are interested in benchmarking for different reasons, such as InfiniBand vs. GigE comparisons, procurement references, and so on.

I appreciate the benchmark wiki page and pyFoamBench, and will post benchmarking results when I finish some tests.

The current benchmark suite is a good start, but given the significant speedup from version updates, it may now be too small for parallel runs.

Huiyu

Old   August 27, 2009, 03:18
  #15
lakeat (Daniel WEI, 老魏), Senior Member
Hi, I hope I am not too late to join the discussion.
It seems there has been no follow-up on this topic since OpenFOAM-1.4.
Why?
__________________
~
Daniel WEI
-------------
NatHaz Modeling Laboratory
Department of Civil & Environmental Engineering & Earth Sciences
University of Notre Dame, USA
Email || My Personal CFD Blog

Old   August 27, 2009, 04:45
  #16
lakeat (Daniel WEI, 老魏), Senior Member
My personal computer:
Case 9 (please correct me if I am wrong),
http://openfoamwiki.net/index.php/Be...ks_standard_v1
which means I have modified the case according to the standardBench_v2.cfg file.

Version: OpenFOAM-1.6
Case: tutorials/incompressible/pisoFoam/pitzDaily
Application: pisoFoam
MPI: openmpi
I used the precompiled one from the official release.

SUSE LINUX
Release 11.2
Gnome 2.26.2

Memory 3.9G
Processor 0: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
Processor 1: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
Processor 2: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
Processor 3: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz

That is 4 cores.

Code:
NP   Time(s)   Speedup   Speedup|baseline(880s)
1    105       1         8.381
2    55        1.909     16
4    34        3.088     25.882
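For reference, both speedup columns in the table above follow directly from the raw times (taking the 880 s baseline quoted in the column header as given):

```python
# Recompute both speedup columns of the table above.
t1 = 105.0        # single-core wall time, s
baseline = 880.0  # wiki baseline time from the column header, s
for n, t in [(1, 105.0), (2, 55.0), (4, 34.0)]:
    print(n, round(t1 / t, 3), round(baseline / t, 3))
```

The second column is speedup relative to this machine's own single-core run; the third is relative to the wiki baseline, which is why it exceeds the core count.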

Any ideas?

Old   August 27, 2009, 04:57
  #17
madad2005, Senior Member
That seems normal to me.

You'll never get a 4-times speedup using 4 cores, especially with quad-cores. All four cores are on one processor (or two dual-cores stuck together, as the Kentsfield quad-cores are), and that single processor accesses its RAM via a single front-side bus (FSB). The FSB carries data between the CPU cache and the RAM itself.

Now, if four separate cores are trying to access RAM, you are going to be limited by your FSB speed (I'm not sure what the baseline value is). If you had a dual-CPU motherboard with two dual-cores and the same core-to-RAM ratio, you would see improved performance, as the FSB bottleneck would be relieved.

I have the same CPU at home, and one way to improve these numbers (without spending any money) is overclocking, though this can be hazardous if not done properly. You would then be able to increase the FSB speed (while keeping the same CPU speed), and you should see an increase in performance.

Old   August 27, 2009, 06:09
  #18
lakeat (Daniel WEI, 老魏), Senior Member
No, that's my only CPU; you are not going to persuade me to overclock. I will be burnt out if it's burnt out.

Quote:
You'll never get 4 times speedup by using 4 cores,
Google the following paper and you'll see the superlinear behavior. And I want that!

Super-linear speed-up of a parallel multigrid Navier-Stokes solver on Flosolver

Old   August 30, 2009, 22:06
  #19
lakeat (Daniel WEI, 老魏), Senior Member
Has anyone experienced very good scalability using OpenFOAM, like superlinear speedup?

What is the best report of OpenFOAM scalability so far?

According to Amdahl's law, the maximum speedup is restricted by the fraction of the code that cannot be parallelized, so I would eagerly like to know: what is this fraction for OpenFOAM? Thank you!
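To give a feel for how sharply the serial fraction bites, Amdahl's law S(n) = 1 / (s + (1 - s)/n) can be evaluated for an illustrative serial fraction. The 1% value below is hypothetical, not a measured OpenFOAM number:

```python
def amdahl_speedup(s, n):
    """Amdahl's law: speedup on n processors with serial fraction s."""
    return 1.0 / (s + (1.0 - s) / n)

# Even a 1% serial fraction (illustrative, not measured) caps the
# speedup at 1/s = 100, no matter how many processors are added.
for n in (8, 64, 512):
    print(n, round(amdahl_speedup(0.01, n), 1))
```

As Carsten notes in the reply below this kind of analysis, a domain-decomposed solver's serial fraction shrinks with problem size, so the practical limits come from communication and memory bandwidth rather than a fixed serial fraction.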

Old   September 22, 2009, 04:13
  #20
carsten (Carsten Thorenz), Member
Hi lakeat!

Speedup is not very easy to define. It is not only a property of the program code (as you assume) but also of the machine you're using and of the test case.

Amdahl's law is not really applicable here. OpenFOAM is based on domain partitioning and should not suffer from Amdahl's law (or only to a negligible extent).

For single-core clusters, scaling is mainly limited by the speed of the network between the nodes (latency being the culprit). The smaller the partitioned domains, the worse the impact.

For multicore systems, scaling is additionally hindered by the transfer of data from the cores to main memory. Multiple cores on a single CPU all have to share a bus to the memory, and this can hurt execution speed badly.

Furthermore, the size of the problem is relevant. Small meshes only run well on small numbers of CPUs, bigger meshes on larger numbers. There is usually a "sweet spot" (cells/core) where a code performs best. This depends on the machine you're using (interconnects, cache sizes, cores per CPU, bus system, ...).

For our machine (an HP Xeon cluster with Gigabit Ethernet), the code performs best at about 50000 cells/core. Taking 4 cores as the reference (speedup 4), we see a superlinear speedup of 42 on 32 cores for a test case with 1.6 million cells. This is mainly due to cache effects, i.e. a lot of the data fits into the CPU caches at this core count. For even larger numbers of cores, the speedup is poorer (56 for 64 cores, 55!!! for 128 cores). This is due to the poor interconnects.

Hope this helps.

Bye,

Carsten Thorenz
