
Lies Damn Lies and Benchmarking

January 16, 2006, 11:31
#1
Assistant Moderator
Bernhard Gschaider
We're in the process of buying new hardware for a small cluster in the next months. Evaluating the hardware by looking at published results is a bit difficult because benchmarks tend to fall into three categories:

- SPECmarks (which are OK but IMHO not 100% applicable for CFD-computations)
- Framerates in Quake 3/Doom 3 (which are interesting, but I don't think my boss would approve if I took this as the basis for a decision)
- other benchmarks, which tend to be integer-heavy and to use little memory

So to compare hardware we get for tests I wrote a Python-Script that runs various tutorial-cases from the OpenFOAM-distribution and compares the execution time with that of a reference machine. It then computes an average speedup to that machine.
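A minimal sketch of the comparison described above, not the actual benchFoam code: given wall-clock times per case on the reference machine and on the machine under test, it computes a per-case speedup and the mean over all cases. The function name `average_speedup`, the case names, and the timings are made up for illustration, and the use of an arithmetic mean is an assumption.

```python
def average_speedup(reference_times, measured_times):
    """Return (per-case speedups, mean speedup) relative to the reference machine.

    Both arguments map a case name to a wall-clock time in seconds.
    A speedup above 1 means the tested machine was faster than the reference.
    """
    speedups = {
        case: reference_times[case] / measured_times[case]
        for case in reference_times
        if case in measured_times  # skip cases that did not run on the tested machine
    }
    if not speedups:
        raise ValueError("no common cases to compare")
    mean = sum(speedups.values()) / len(speedups)
    return speedups, mean


# Hypothetical timings in seconds, not real measurements.
reference = {"icoFoam/cavity": 10.0, "simpleFoam/pitzDaily": 200.0}
measured = {"icoFoam/cavity": 5.0, "simpleFoam/pitzDaily": 160.0}

per_case, mean = average_speedup(reference, measured)
print(per_case)
print(mean)
```

Keeping the per-case values next to the single number makes it easier to see which kind of case a machine is good or bad at.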

The script can also be used to run the cases in parallel.

I KNOW that trying to get a single number to gauge the performance of a computer system is a sign of extreme simple-mindedness, but I'm trying it anyway (and of course it's not the only number I'm using).

The script is discussed in more detail at

http://openfoamwiki.net/index.php/Contrib_benchFoam

and is part of

http://openfoamwiki.net/index.php/Contrib_PyFoam

The script is quite stable (at least at my site). The problem is the benchmark suite: half of the cases fail in parallel execution (problems during decomposePar, problems with the boundary conditions; it seems that some of these cases were never run in parallel). I'm planning to have a stable version by the end of the month.

Any comments on the approach/the script/the benchmark suite would be greatly appreciated (even "You got it all wrong").

(One thing that especially interests me is: what is faster, a) one DualCore-Opteron or b) two equivalent SingleCore-CPUs on one board? The last time I looked, the price for these two configurations was almost the same.)
__________________
Note: I don't use "Friend"-feature on this forum out of principle. Ah. And by the way: I'm not on Facebook either. So don't be offended if I don't accept your invitation/friend request

January 16, 2006, 12:11
#2
Senior Member
Eugene de Villiers
Can't comment on the reliability of the benchmarks, but dual core vs. single core depends entirely on the interconnect you plan to use.

A friend of mine ran a set of benchmarks (using STAR, admittedly) on three AMD 3800 X2 machines connected by cheap gigabit Ethernet. Using only two machines he got near 90% efficiency. Adding the third machine, however, dropped him down to around 60%. From this and other experiences I would say: unless you can afford Myrinet or an equivalent interconnect, stick with single-core CPUs. A gigabit backbone just doesn't have the capacity or low enough latency to carry two compute units per NIC. Even if you channel-bond two or more NICs per box, you will still have latency issues.
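To make the efficiency figures concrete, here is a small sketch of the usual definition, efficiency = speedup / number of workers. The timings are hypothetical and were picked only so that they reproduce the quoted percentages, and counting efficiency per machine (rather than per core) is an assumption.

```python
def parallel_efficiency(serial_time, parallel_time, n_workers):
    """Parallel efficiency: (serial_time / parallel_time) / n_workers."""
    if n_workers < 1 or parallel_time <= 0:
        raise ValueError("need at least one worker and a positive run time")
    return serial_time / (parallel_time * n_workers)


# Hypothetical run times in seconds, not measurements from the benchmark above.
serial = 1800.0
eff_two = parallel_efficiency(serial, 1000.0, 2)
eff_three = parallel_efficiency(serial, 1000.0, 3)
print(eff_two, eff_three)
```

Taken at face value, 90% on two machines and 60% on three both correspond to a speedup of about 1.8, so in this reading the third machine added essentially nothing.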

January 16, 2006, 13:56
#3
Senior Member
Mattijs Janssens
Can you report any problems with decomposePar?

January 16, 2006, 16:15
#4
Assistant Moderator
Bernhard Gschaider
It's not a problem with decomposePar per se: for instance in the dieselFoam/aachenBomb case there are two files (ft, fu) that don't have sufficient boundary conditions according to decomposePar:

--> FOAM FATAL IO ERROR : keyword walls is undefined in dictionary "/.automount/werner/Werner/bgschaid/bgschaid-foamStuff/Benchmark/dieselFoam_aachenBomb_standard.gcds07.cdratfd.unileoben.ac.at.case.runDir/0/ft::boundaryField"

(I'm fully aware that lagrangian particles usually do not parallelize very well, but that was the reason why I included that case)

Similar things happen with the other cases that fail (except for dnsFoam/boxTurb16: "FOAM FATAL ERROR : calculated number of cells is incorrect" when running dnsFoam).

I'll let you know if I find a real problem with decomposePar (and not a problem that has to do with model set-up)

January 16, 2006, 16:50
#5
New Member
Markus Hitter
Eugene wrote:
> Cant comment on the reliability of the benchmarks, but
> dual core vs single core depends entirely on the
> interconnect you plan to use.

Aren't you confusing a computer with a dual core processor with a cluster with two nodes here? Dual core is always better than single core, same processor frequency assumed.


Markus

January 16, 2006, 17:49
#6
Assistant Moderator
Bernhard Gschaider
I think what Eugene meant was: "if there are two CPUs on a board (in whatever form) as soon as you need a third CPU for your task you'll see that it would have been wiser to invest in good networking instead of fancy SMP-hardware"

@"dual core always better": if there's only one CPU you're right, but compared to a Dual-CPU-SingleCore-Board I'm not 100% sure, because, if I interpret the Processor diagrams I've seen correctly, on a DualCore the two cores have to share the same MemoryBus which could be a bottleneck. But nobody can tell me for sure whether this has an impact. That's why I want to benchmark.

January 17, 2006, 08:15
#7
Senior Member
Eugene de Villiers
The AMD HyperTransport memory bus is good enough that dual-core CPUs only take about a 10% hit in performance when running a 2-processor job.

The comment about the number of CPUs per NIC stands, though. It all depends on the number of FOAM processes that have to share the same communications interface. Basically, 2 cores/CPUs/processors per comms interface can produce a bottleneck, because the NIC has to handle twice the volume of interprocessor communication compared to a single process.

January 17, 2006, 08:32
#8
Senior Member
Michael Prinkey
Channel bonding gigabit Ethernet is a waste of time. Performance is not doubled and latency actually becomes worse. Since many (most?) dual Opteron motherboards include dual gigabit interfaces onboard, a useful approach is to buy a bigger network switch and connect both NICs on each node to it. This is key: you need to assign different IP addresses to each interface and basically make the single node look like two nodes by assigning it two host names.

So each nodeX would have two host names, nodeXa and nodeXb. When you launch your parallel runs on dual-CPU, dual-core Opteron nodes, you would use each hostname twice:

node1a
node1a
node1b
node1b
node2a
node2a....

This will give each pair of processors one independent network interface to use as its own and avoids network contention issues. Latency for this setup is the same as for a dual-CPU single-core with one NIC. This same approach could be used for single-core dual-CPU nodes or for dual-core dual-CPU configurations with four NICs.
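The host list above can be generated rather than typed by hand. The following is a sketch under the naming scheme used in this post (nodeXa, nodeXb, ...); the function name `host_list` and its parameters are made up, and the exact host-file syntax your MPI implementation expects is not covered here.

```python
def host_list(n_nodes, nics_per_node=2, procs_per_nic=2):
    """Build one host-file line per process.

    Each node gets one hostname per NIC (suffix a, b, ...), and every
    hostname is repeated once for each process that should use that NIC.
    """
    lines = []
    for node in range(1, n_nodes + 1):
        for nic in range(nics_per_node):
            hostname = f"node{node}{chr(ord('a') + nic)}"
            lines.extend([hostname] * procs_per_nic)
    return lines


# Two dual-CPU, dual-core nodes with two NICs each: four processes per node.
print("\n".join(host_list(2)))
```

For single-core dual-CPU nodes the same call with `procs_per_nic=1` lists each hostname once.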

While this is a possible solution, in my mind, the cost of a dual-core, dual-CPU Opteron node is at the breaking point for investing in higher-end networking. Specifically, our new dual-dual cluster uses Infiniband. The cost of each node itself was on the order of $4k. The networking added roughly $1k per node over plain gigabit. I think that is a reasonable investment for significantly higher bandwidth and lower latency.

January 17, 2006, 09:03
#9
Senior Member
Eugene de Villiers
This is very interesting. So will traffic across 2 or more NICs on a single machine be balanced automatically, or is it managed by LAM/MPI?

I have two 8-way Opteron boxes here that I would really like to improve the interconnect for. If, as you say, I can just stick in more NICs and cables, that would be awesome. For some reason I had never considered this a possibility, and fiddling around with channel bonding got me nowhere.

January 17, 2006, 09:24
#10
Senior Member
Michael Prinkey
There is no balancing to do. The IP addresses/hostnames are the identifier that MPI/PVM uses to identify processes. By doing as I outlined, you will be giving different pairs of parallel processes different IP addresses. Let's just talk about one PE per NIC for clarity for now. In MPI talk, it might be like this:

# Node 2
PE0 192.168.1.3
PE1 192.168.1.4

# Node 3
PE2 192.168.1.5
PE3 192.168.1.6

If the nodes have two CPUs each, then when we launch this job each PE will have its own IP address and hence its own network interface to use. Traffic moving between PE0 and PE1 will not be sent to the switch. The IP stack will bounce it right back, just as it would if the processes shared the same IP address, so no performance is lost. With dual cores it is the same: two PEs will share an IP address and its corresponding network interface, and your hosts file will list each hostname twice.

On your 8-way boxes, you have a few ways to go. You can buy a few cheap Intel e1000 cards and populate as many PCI slots as you can. Or you can spend a bit more and get the Intel dual- or quad-interface cards. I would also recommend that you at least check on Infiniband prices. You can "end-to-end" them so you would only need to buy two cards and a cable. That shouldn't be much more than $1200-$1500.

BTW, make sure that you add this line to the modules.conf file if you are running the e1000 cards under Linux:

options e1000 InterruptThrottleRate=80000,80000

Add an "80000" for each e1000 you have. The line above is for two interfaces. This greatly reduces network latency and gave about another 150 Mbps in bandwidth. With this tuning, I got latency numbers in the 25 µs range on our Xeon cluster. That is down from about 160 µs using the default settings.
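Since the option needs one value per interface, the line can be built from the interface count. This is only a string-formatting sketch; the function name `e1000_options` is made up, and where the line belongs (modules.conf or a newer equivalent) depends on your distribution.

```python
def e1000_options(n_interfaces, rate=80000):
    """Return the module options line with one throttle rate per e1000 interface."""
    if n_interfaces < 1:
        raise ValueError("need at least one interface")
    values = ",".join([str(rate)] * n_interfaces)
    return f"options e1000 InterruptThrottleRate={values}"


# Two interfaces, as in the line quoted above.
print(e1000_options(2))
```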

January 17, 2006, 10:00
#11
Senior Member
Eugene de Villiers
Now why didn't I think of that? Thanks for the info.

I will see about getting a few PCI-X multi-channel cards as soon as I can get these monsters stable.

I know this is not really the forum for this, but I have to ask since my patience is wearing thin: has anyone managed to get any of the Opteron 8-way systems stable under load for protracted periods (a week or more)?

January 17, 2006, 10:21
#12
Assistant Moderator
Bernhard Gschaider
Hello Mattijs!

I didn't find any problems with decomposePar. The only two cases in the suite that I didn't get to run are

- dieselFoam/aachenBomb: the same problem as the one described by thomas in
http://www.cfd-online.com/OpenFOAM_D.../126/1634.html

- dnsFoam/boxTurb16: dnsFoam says

<snip>
--> FOAM FATAL ERROR : calculated number of cells is incorrect

From function Kmesh::Kmesh(const fvMesh& mesh)
in file Kmesh/Kmesh.C at line 87.
</snip>

no matter how I decompose the grid (simple/metis). My stupid question: does dnsFoam run in parallel?

January 17, 2006, 10:30
#13
Senior Member
Hrvoje Jasak
Nope. For the forcing it uses fast Fourier transforms on a regular uniform mesh (KMesh), and that does not parallelise. If you throw away the forcing, the solver will run in parallel.

Sorry,

Hrv
__________________
Hrvoje Jasak
Providing commercial FOAM/OpenFOAM and CFD Consulting: http://wikki.co.uk

January 17, 2006, 10:32
#14
Super Moderator
Niklas Nordin
decomposePar needs proper boundary conditions like any other code, but ft is not used anymore by dieselFoam, so that file can simply be removed.
Since decomposePar tries to decompose every file it finds in the directory it will obviously not work if the boundary conditions are wrong.

Has anyone tried correcting the boundary conditions, or simply removing ft, and then running decomposePar for the aachenBomb case?

N

January 17, 2006, 10:56
#15
Assistant Moderator
Bernhard Gschaider
@dnsFoam: I've marked it as non-parallel in the Benchmark-suite.

@dieselFoam: my script does that (it removes ft and fu) and then the grid gets decomposed correctly. But as soon as dieselFoam runs in parallel I get the error described in the other thread.
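For anyone who wants to try the same workaround by hand, here is a minimal sketch of the removal step. It is not the code from the benchmark script: the function name `remove_unused_fields` and its defaults are assumptions, and it should be run on a copy of the case, before decomposePar.

```python
import tempfile
from pathlib import Path


def remove_unused_fields(case_dir, names=("ft", "fu"), time_dir="0"):
    """Delete the named field files from <case_dir>/<time_dir>.

    Returns the names that were actually removed, so that missing
    files are silently skipped instead of raising an error.
    """
    removed = []
    for name in names:
        field = Path(case_dir) / time_dir / name
        if field.is_file():
            field.unlink()
            removed.append(name)
    return removed


# Demonstration on a throwaway directory that only mimics a case layout.
with tempfile.TemporaryDirectory() as case:
    zero = Path(case) / "0"
    zero.mkdir()
    for name in ("ft", "fu", "U"):
        (zero / name).write_text("placeholder\n")
    print(remove_unused_fields(case))
    print(sorted(p.name for p in zero.iterdir()))
```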

January 23, 2006, 03:49
#16
Member
Duderino
Hi Bernhard,

the benchmark script is what I was looking for, since there are almost no CFD benchmarks available. I am also interested in a dual-core vs. two-CPU comparison, especially AMD Athlon 64 X2 4800+ vs. 2 x AMD Opteron 248 2.20GHz vs. AMD Opteron 265 2x 1.80GHz, which are all about the same in price.

The problem is that I can't get your PyFoam-0.2.2 script to run. If I do: python setup.py install
I get: error: invalid Python installation: unable to open /usr/lib/python2.4/config/Makefile (No such file or directory). Any idea?

Python seems to be installed since /usr/lib/python2.4/ does exist but not the config directory.

Thanks for your help!

Jens

P.S. I never used python before!

January 23, 2006, 07:03
#17
Assistant Moderator
Bernhard Gschaider
Hi Duderino!

Which Linux-distribution are you using (I assume it's Linux)? (Python2.4 is only included in the most recent distributions)

Anyway: your python2.4-installation seems to be broken. To find out how badly broken it is just type 'python' on the command line. You then get an "interactive python shell". If you don't the installation is very badly broken.
If you're lucky there is an older version of python still installed (call 'python2.3' or 'python23', 2.2 won't work with my scripts). Try that.

Feel free to contact me by EMail (if we find a solution we can post it to this forum but I don't think it is necessary to bother people with the intermediate steps)

January 24, 2006, 11:34
#18
Assistant Moderator
Bernhard Gschaider
Hello all!

First, concerning Jens' (Duderino's) problem: it seems that Ubuntu Linux installs the files needed for a successful setup.py run only together with the Python development package (try 'apt-get install python2.4-dev' or something similar).

The benchmark script and suite are now sufficiently stable to be thought of as 'beta quality'.

Some preliminary results can be found at
http://openfoamwiki.net/index.php/Benchmarks_standard_v1

The parallel results are not too good (some would even say they're bad), but this had to be expected with cases in the suite that only use 11MB of memory (King Amdahl says hello). But I think some of the results are quite interesting (good speedup for Opteron SMP compared to Xeon SMP (with multithreading; that's not so good)).
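For reference, Amdahl's law bounds the speedup on n processors by 1 / ((1 - p) + p / n), where p is the fraction of the run time that parallelises. The sketch below just evaluates that formula; the value p = 0.8 is an arbitrary illustration, not something measured on the benchmark cases.

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Upper bound on speedup from Amdahl's law (communication cost ignored)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie between 0 and 1")
    if n_procs < 1:
        raise ValueError("need at least one processor")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


# Illustrative only: a case in which 80% of the run time parallelises.
for n in (1, 2, 4, 8):
    print(n, round(amdahl_speedup(0.8, n), 2))
```

The formula leaves out communication overhead, which hits very small cases on top of the serial fraction, so real results can be worse than this bound.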

Feel free to add your results.

And of course: I'm still open to suggestions concerning the benchmark-suite.

January 31, 2006, 10:31
#19
Member
Duderino
Hello all

I am looking for some volunteers to help me compare some machines. You just need to use Bernhard's Python script collection, which you can get from the links in the first message of this thread.

I would really like to see some benchmark results for Opteron 250 and above systems. So if somebody happens to have such a system, please run the benchmark and publish the results on the wiki. This will definitely help me (and others) with choosing a new system.

Best regards

February 4, 2006, 08:34
#20
Senior Member
Håkan Nilsson
Hi,

We are also planning to purchase a new Linux cluster. It has basically already been decided that it will be an AMD Opteron Dual Node, Dual Core, 2.2GHz. I will start doing some benchmarking next week on a Dual Node Dual Core Opteron 280, 2.4GHz for up to 16 CPUs/cores. I will benchmark both with a Gigabit network and with Infiniband. Later on (in a week or so) I will have the opportunity to also try out a similar system, but with InfiniPath and up to 32 CPUs/cores.

I will try to use your Python script, but I will also run a test with a 1M cell test case in simpleFoam (a water turbine draft tube; anyone who would like the case can contact me to get it). As you have already mentioned, the test cases in the Python script are most likely way too small to say anything about real applications.

Does anyone have any suggestions on special settings I should use for domain decomposition (I plan to use automatic metis), or on any specific OpenFOAM settings that could influence the benchmarking?

Håkan.