CFD Online (www.cfd-online.com) > Forums > OpenFOAM

Lies Damn Lies and Benchmarking


#21, February 6, 2006, 13:38
Bernhard Gschaider (Assistant Moderator)
Hi Håkan!

Now I can tell you about the hidden agenda I had when starting this thread: the things you're talking about (Gigabit vs. InfiniBand) are exactly what interests me, and the more hard data is available, the better.

About the size of the test cases: I was thinking about splitting the cases into sizes (the way the Fluent people do it with their benchmarks). And I'm still open to suggestions about which cases from the tutorials would fit the purpose better. (BTW: I'm planning to extend the script to make it possible to use it with cases that are not in $FOAM_TUTORIALS; that should be ready in a few days.)

@decomposition: my approach when putting the benchmark suite together was "let the computer do all the work" (-> metis), but I got some errors when applying this strategy (I think some physical boundary conditions don't like to be split by a processor boundary). On the performance merits of the decomposition strategies I can't comment. Sorry.

Bernhard
__________________
Note: I don't use "Friend"-feature on this forum out of principle. Ah. And by the way: I'm not on Facebook either. So don't be offended if I don't accept your invitation/friend request

#22, February 7, 2006, 05:12
Jens Klostermann (Senior Member)
Hi Håkan!

We are also planning to buy a new cluster, and there will be an opportunity to benchmark. Since there is still a lack of large cases, I would like to take up your offer of running the 1M-cell test case in simpleFoam (a water turbine draft tube). If you compare Gigabit Ethernet vs. InfiniBand, will you have a chance to try the suggestion by Michael Prinkey (every PE gets its own IP and its own NIC)?

Waiting for results

Jens

#23, February 7, 2006, 05:39
Håkan Nilsson (Senior Member, Gothenburg, Sweden)
Hi Jens,

I am planning to do the tests tomorrow together with our Linux cluster provider (Gridcore). I guess that the suggestion by Michael requires some extra hardware, so it might be difficult to convince Gridcore to make this effort but I will discuss it with them.

As soon as I have the results I will post them here. If you would like to have the complete set-up of the test case after Wednesday, send me an e-mail. You can find my e-mail address by clicking on my name in the forum.

Håkan.

#24, February 7, 2006, 08:26
Håkan Nilsson (Senior Member, Gothenburg, Sweden)
I have made some preliminary tests on my own dual AMD to find out which settings I should use for the benchmarking.

Can anyone tell me if the parallelization in OpenFOAM is made so that a parallel run should give exactly (more or less) the same convergence as a sequential run?

I have tried using the AMG solver for the pressure, and I get the same clockTime for both 1 and 2 CPUs, i.e. zero speedup. The reason for this could be that there is communication at each grid level, which slows down the computations. I'm not yet sure that the problem has its origin in the AMG solver, but that is my educated guess.

Does anyone have any idea which solvers I should use to get good parallel speedup? I know that choosing a solver for good parallel speedup might not be good for convergence, since the more advanced solvers converge much better, but I would like to test it, since zero speedup is not very good.

Håkan.

#25, February 7, 2006, 08:48
Bernhard Gschaider (Assistant Moderator)
Håkan, your question might be answered by the posting "New Releases", which appeared on "Announcements" 5 minutes after your posting.

Quote: "- rewriting the AMG solver has improved performance in parallel"

#26, February 7, 2006, 09:43
Håkan Nilsson (Senior Member, Gothenburg, Sweden)
I saw the new release. However, I have now tested ICCG with the same unsatisfactory result, which indicates that it wasn't the AMG solver. I must be doing something else wrong.

About the Python script - do I have to be root to install it?

Håkan.

#27, February 7, 2006, 10:30
Bernhard Gschaider (Assistant Moderator)
About root: A very good question. Never thought of that because in my small world I have the root-PW.

The easy way to install the script is being root. Then the stuff gets installed to a place where Python automagically finds it.

But of course it's short-sighted of me to assume that everyone has a root password (or is allowed to sudo).

So the way to do it as non-root is:

1. create a directory for the Python libs (for instance /home/me/PythonLibs)
2. call the installation script with 'python setup.py install --prefix=/home/me/PythonLibs'
3. set the environment variable PYTHONPATH to /home/me/PythonLibs/lib/python2.3/site-packages (the second portion of the path may vary depending on your Python installation)
4. check the installation by just typing python on the command line. In the Python shell type 'import PyFoam'. If you don't get an error message, all is well
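The check in step 4 can also be scripted. This sketch assumes a reasonably modern Python (3.4+, i.e. newer than the 2.3-era installations in this thread); `module_available` is a made-up helper name, and whether PyFoam is actually found depends on PYTHONPATH being set as in step 3:

```python
import importlib.util

def module_available(name):
    """True if 'import <name>' would succeed from the current sys.path."""
    return importlib.util.find_spec(name) is not None

# After setting PYTHONPATH as in step 3, this should print True:
print(module_available("PyFoam"))
```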

I'll change the Wiki-page accordingly

About the release: I just couldn't resist. Because of the 5-minute gap between your question and the release announcement I thought: "These guys are really fast with their fixes" :-)

#28, February 7, 2006, 11:48
Håkan Nilsson (Senior Member, Gothenburg, Sweden)
BIG WARNING!

I would like to post a BIG WARNING not to forget to make sure that the number of 1's for the processorWeights in metisCoeffs in decomposeParDict corresponds to the numberOfSubdomains specified in the same file. The problem I had in the previous discussion was that I had specified numberOfSubdomains 2, but had ten 1's left over from a previous computation. OpenFOAM interpreted this as processor0 having processorWeight 1 and processor1 having a weight corresponding to the sum of the remaining 1's.
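For reference, the relevant decomposeParDict fragment might look like this for two subdomains (a hypothetical minimal example; the point is that processorWeights must contain exactly numberOfSubdomains entries):

```
numberOfSubdomains 2;

method metis;

metisCoeffs
{
    // one weight per subdomain; a stale, longer list left over from an
    // earlier run is silently misinterpreted, as described above
    processorWeights
    (
        1
        1
    );
}
```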

I first realized what the problem was when I ran the case for 2 CPUs on two different dual machines: the CPU usage was much lower on one of them. This was not the case when running both processes on the same dual machine. Of course I could have looked at the numbers from decomposePar, but in this case I didn't.

It would be nice if I only had to specify the number of processes once in decomposeParDict, as long as I am running on a homogeneous cluster (which most people usually are).

Now a preliminary parallel speedup is:
Running on Dual Intel(R) Xeon(TM) CPU 2.40GHz, 100Mbps Ethernet network, 500MB RAM/CPU, 0.5MB cache/CPU.
1 CPU: speedup 1 (normalized)
2 CPUs on one dual node: 1.2 (!!!???)
2 CPUs on two dual nodes: 2.0 (great!)
4 CPUs on two dual nodes: 2.2
4 CPUs on four dual nodes: 3
I did not check the influence on the convergence.
I used the ICCG solver for the pressure.
The comparison is based on clock time for four iterations (normalization factor 494s)
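One way to read these numbers is as parallel efficiency (speedup divided by the number of CPUs); a quick sketch using the figures above:

```python
def efficiency(speedup, n_cpus):
    """Parallel efficiency: 1.0 means ideal scaling."""
    return speedup / n_cpus

# (speedup, CPUs) as reported above
runs = {
    "2 CPUs, one dual node":   (1.2, 2),
    "2 CPUs, two dual nodes":  (2.0, 2),
    "4 CPUs, two dual nodes":  (2.2, 4),
    "4 CPUs, four dual nodes": (3.0, 4),
}
for name, (s, n) in runs.items():
    print(f"{name}: efficiency {efficiency(s, n):.2f}")
```

The intra-node runs come out around 0.55-0.6, while the one-process-per-node runs are 0.75-1.0, which is what makes the result look so odd.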

Can someone tell me why it is better to run over this slow network than to keep the processes on the same node as much as possible? I guess that is what was discussed earlier in this thread? I'm surprised (and scared) by the effect when running on two CPUs. I will discuss it with Gridcore tomorrow.

I will get back with the 'real' investigation soon.

Håkan.

#29, February 7, 2006, 12:07
Håkan Nilsson (Senior Member, Gothenburg, Sweden)
Hi Bernhard,

I think I did as you said with the Python script, except that the default config file was named defaultBench.cfg instead of default.cfg.

When running ./benchFoam.py defaultBench.cfg I got the following error message:

Traceback (most recent call last):
File "./benchFoam.py", line 7, in ?
from PyFoam.Execution.BasicRunner import BasicRunner
ImportError: No module named PyFoam.Execution.BasicRunner

I have no idea what this means.

Håkan.

#30, February 7, 2006, 13:18
Bernhard Gschaider (Assistant Moderator)
Hi Håkan!

@the benchmarks: Our Xeon SMP machines scaled poorly compared with the Opteron machine, but that poorly? One wild guess would be Hyperthreading. Is it enabled? If yes: get rid of it (I've never done tests, but I've heard it can hurt performance on SMP machines). BTW: the speedup you get (1.2) is approximately what you should get for a single Xeon running two processes with Hyperthreading. If HT can be ruled out, I would start blaming the motherboard, then the kernel.
But I'm not a hardware expert, so all of these are guesses.

@python script: The error message means that it is trying to import a submodule of PyFoam and can't find it.
Please check the following:
1. the python you get when typing python in the shell ('which python') is the same as the one expected by the script (/usr/bin/python); this should only matter if you installed PyFoam as root
2. PYTHONPATH is set to the right directory (in the directory PYTHONPATH points to there should be a folder PyFoam containing several folders, one of them called Execution)

Should both of these tests be OK, do the following:
- on the shell type 'python'; the Python interactive shell appears (it can be left with Ctrl-D)
- in the Python shell type 'import sys' and then 'print sys.path': a list of directory names should appear, one of them the directory you set with PYTHONPATH (assuming you installed as non-root)
- try the offending line (from PyFoam.Execution.BasicRunner import BasicRunner) in the shell (it should raise the same error the script gives you)

If things still aren't working, feel free to contact me via e-mail (we could do it here too, but I think there are already approx. 3 different discussions going on in this thread (all of them interesting), so I think we'll sort out that problem separately and I'll post the gathered knowledge at the appropriate place (Wiki or script or here)).

#31, February 7, 2006, 14:16
Mattijs Janssens (Super Moderator)
decomposition: we find (on simplish cases) that hierarchical or simple can give results as good as or better than Metis.

For trivial cases (e.g. lid-driven cavity) Metis produces a funny decomposition. Run with the -cellDist argument to have decomposePar dump the decomposition.
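For anyone wanting to try those alternatives, the corresponding decomposeParDict entries look roughly like this (the coefficient values are illustrative, not recommendations):

```
method simple;    // or: hierarchical

simpleCoeffs
{
    n       (2 2 1);  // splits in x, y, z; the product must equal numberOfSubdomains
    delta   0.001;
}

hierarchicalCoeffs
{
    n       (2 2 1);
    delta   0.001;
    order   xyz;      // the order in which the directions are split
}
```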

#32, February 7, 2006, 14:19
Mattijs Janssens (Super Moderator)
> Can anyone tell me if the parallelization in OpenFOAM is made so that a parallel run should give exactly (more or less) the same convergence as a sequential run?

Not exactly. (IC) preconditioning is not parallelizable, so there will be slight differences. The AMG solver uses ICCG at the coarsest level, so it will also give slightly different results.
The only solver that parallelizes perfectly is the diagonally preconditioned CG (DCG). Unfortunately it is generally much, much worse than ICCG.

#33, February 7, 2006, 14:31
Jens Klostermann (Senior Member)
> Can anyone tell me if the parallelization in OpenFOAM is made so that a parallel run should give exactly (more or less) the same convergence as a sequential run?

@ slight differences:
I ran an interFoam case which diverged in parallel (2 processes). When I restarted the same case on a single CPU from the last time dump of the parallel run, it ran right past the timestep where it had diverged in parallel.

#34, February 8, 2006, 03:36
Håkan Nilsson (Senior Member, Gothenburg, Sweden)
Hi Mattijs,

My question about whether OpenFOAM is made so that a parallel run gives exactly (more or less) the same convergence as a sequential run was really about at what level the parallelization is done. If every single operation is parallelized, I guess it should be possible to get the same convergence. Consider a parallel run where each cell belongs to a separate CPU: for every operation that needs information from a neighbour, you send that information with MPI instead of, as in the sequential case, getting it by pointing at it. The same information is used in both cases, and the parallel convergence would be the same. However, this is not a good way to parallelize the code, since it would run very slowly, so you choose another level of parallelization where a certain amount of operations is done on each CPU before exchanging information with the neighbours.

Take the AMG solver as an example: I could choose to do all AMG levels on each CPU before exchanging information; to exchange information at all AMG levels; to exchange information at each sweep of the solver (in this case the ICCG solver); or to exchange information exactly at the time I need it (as described above).
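The "exchange exactly when needed" granularity can be illustrated with a toy 1-D Jacobi sweep (plain Python/NumPy, nothing to do with the actual OpenFOAM implementation): because Jacobi only reads values from the previous sweep, one ghost-cell exchange per sweep reproduces the sequential result exactly, which is the sense in which a parallel run can converge identically.

```python
import numpy as np

def jacobi_sequential(u0, sweeps):
    """Plain 1-D Jacobi smoothing of Laplace's equation, ends held fixed."""
    u = u0.astype(float).copy()
    for _ in range(sweeps):
        un = u.copy()
        un[1:-1] = 0.5 * (u[:-2] + u[2:])
        u = un
    return u

def jacobi_two_ranks(u0, sweeps):
    """Same smoother on two subdomains with one ghost-cell exchange per
    sweep (emulating an MPI halo swap).  Because Jacobi only reads old
    values, the result is bitwise identical to the sequential run."""
    u0 = u0.astype(float)
    m = len(u0) // 2
    a = u0[:m + 1].copy()   # "rank 0": cells 0..m-1 plus one ghost cell
    b = u0[m - 1:].copy()   # "rank 1": cells m..N-1 plus one ghost cell
    for _ in range(sweeps):
        a[-1] = b[1]        # "receive" neighbour's first owned cell
        b[0] = a[-2]        # "receive" neighbour's last owned cell
        an, bn = a.copy(), b.copy()
        an[1:-1] = 0.5 * (a[:-2] + a[2:])
        bn[1:-1] = 0.5 * (b[:-2] + b[2:])
        a, b = an, bn
    return np.concatenate([a[:-1], b[1:]])
```

Exchanging less often than every sweep would change the iterates (and hence the convergence history), which is exactly the trade-off described above.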

Of course - I can find the answer in the source code :-) ... In the future.

Håkan.

#35, February 8, 2006, 05:32
Mattijs Janssens (Super Moderator)
Everything is fully/exactly parallelized except for the IC (incomplete Cholesky) preconditioning. This means that ICCG and AMG will behave slightly differently in parallel. Only the diagonally preconditioned CG solver (DCG) should behave exactly identically.

#36, March 31, 2006, 06:30
Bernhard Gschaider (Assistant Moderator)
I redid the benchmark suite (http://openfoamwiki.net/index.php/Benchmarks_standard_v1) with version 1.3 on three machines. For the Intel machines it's more than 10% faster on average (with some solvers even 30%). For the AMD machine the performance increase doesn't seem to be that dramatic (but I've got to recheck these results). The only solver in the suite that seems to be consistently slower than in version 1.2 is Xoodles.

But these results are only preliminary. Very interesting would be results comparing LAM/MPI with OpenMPI, but I think I've got to adapt the scripts for that.

#37, November 7, 2007, 12:30
Paul Mauk (Member)
Hi,
Does anybody know which clusters are recommended for running OpenFOAM?
