CFD Online Discussion Forums - OpenFOAM
Lies Damn Lies and Benchmarking (https://www.cfd-online.com/Forums/openfoam/60893-lies-damn-lies-benchmarking.html)

gschaider February 6, 2006 12:38

Hi Håkan!

Now I can tell you about the hidden agenda I had in starting this thread: the things you're talking about (Gigabit vs. InfiniBand) are exactly what interest me, and the more hard data there is available, the better.

About the size of the test cases: I was thinking about splitting the cases into size classes (the way the Fluent people do it with their benchmarks). And I'm still open to suggestions as to which cases from the tutorials would fit the purpose better (BTW: I'm planning to extend the script to make it possible to use it with cases that are not in $FOAM_TUTORIALS; should have that in a few days).

@decomposition: my approach when putting the benchmark suite together was "let the computer do all the work" (-> metis), but I got some errors when applying this strategy (I think some physical boundary conditions don't like being split by a processor boundary). About the performance merits of the decomposition strategies I can't comment. Sorry.

Bernhard

jens_klostermann February 7, 2006 04:12

Hi Håkan!

We are also planning to buy a new cluster, and there will be an opportunity to benchmark. Since there is still a lack of large cases, I would like to take up your offer of running the 1M-cell test case in simpleFoam (a water turbine draft tube). If you compare Gigabit Ethernet vs. InfiniBand, will you have a chance to try the suggestion by Michael Prinkey (every PE gets its own IP and also its own NIC)?

Waiting for results

Jens

hani February 7, 2006 04:39

Hi Jens,

I am planning to do the tests tomorrow together with our Linux cluster provider (Gridcore). I guess that the suggestion by Michael requires some extra hardware, so it might be difficult to convince Gridcore to make this effort but I will discuss it with them.

As soon as I have the results I will post them here. If you would like to have the complete set-up of the test case after Wednesday, send me an e-mail. Find my e-mail address by clicking on my name in the forum.

Håkan.

hani February 7, 2006 07:26

I have made some preliminary tests on my own dual AMD to find out which settings I should use for the benchmarking.

Can anyone tell me if the parallelization in OpenFOAM is made so that a parallel run should give exactly (more or less) the same convergence as a sequential run?

I have tried using the AMG solver for the pressure, and I get the same clockTime for both 1 and 2 CPUs, i.e. zero speedup. The reason for this could be that there is communication at each grid level, which slows down the computation. I'm not yet sure that the problem has its origin in the AMG solver, but that is my educated guess.

Does anyone have any idea which solvers I should use to get good parallel speedup? I know that choosing a solver for good parallel speedup might not be good for convergence, since the more advanced solvers have much better convergence, but I would like to test it, since zero speedup is not very good.

Håkan.

gschaider February 7, 2006 07:48

Håkan, your question might be answered by the posting "New Releases" which appeared in "Announcements" 5 minutes after your posting.

Quote: "- rewriting the AMG solver has improved performance in parallel"

hani February 7, 2006 08:43

I saw the new release. However, I have now tested ICCG with the same unsatisfactory result, which indicates that it wasn't the AMG solver. I must be doing something else wrong.

About the Python script - do I have to be root to install it?

Håkan.

gschaider February 7, 2006 09:30

About root: A very good question. Never thought of that because in my small world I have the root-PW.

The easy way to install the script is as root. Then the stuff gets installed to a place where Python automagically finds it.

But of course it's short-sighted of me to assume that everyone has the root password (or is allowed to sudo).

So the way to do it as a non-root user would be:

1. create a directory for the Python libs (for instance /home/me/PythonLibs)
2. call the installation script with 'python setup.py install --prefix=/home/me/PythonLibs'
3. set the environment variable PYTHONPATH to /home/me/PythonLibs/lib/python2.3/site-packages (the second portion of the path may vary depending on your Python installation)
4. check the installation by just typing python on the command line. In the Python shell type 'import PyFoam'. If you don't get an error message, all is well
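
Put together, a minimal shell sketch of those four steps (bash syntax; the paths and the python2.3 portion are just the examples from above, and the name of the unpacked PyFoam source directory is assumed, so adjust everything to your installation):

    mkdir -p /home/me/PythonLibs
    cd PyFoam                                   # wherever you unpacked the PyFoam sources
    python setup.py install --prefix=/home/me/PythonLibs
    export PYTHONPATH=/home/me/PythonLibs/lib/python2.3/site-packages
    python -c 'import PyFoam'                   # no error message means the module is found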

I'll change the Wiki-page accordingly

About the release: I just couldn't resist. Because of the 5-minute gap between your question and the release announcement I thought: "These guys are really fast with their fixes" :-)

hani February 7, 2006 10:48

BIG WARNING!

I would like to post a BIG WARNING not to forget to make sure that the number of 1's in the processorWeights list of metisCoeffs in decomposeParDict corresponds to the numberOfSubdomains specified in the same file. The problem I had in the previous discussion was that I had specified numberOfSubdomains 2, but I had ten 1's left over from a previous computation. OpenFOAM interpreted this as processor0 having processorWeight 1 and processor1 having a weight corresponding to the sum of the remaining 1's.

I only realized what the problem was when I ran the case for 2 CPUs on two different dual machines; the CPU usage was then much lower on one of them. This was not the case when running both processes on the same dual machine. Of course I could have looked at the numbers when doing the decomposePar, but in this case I didn't.

It would be nice if I only had to specify the number of processes once in decomposeParDict, as long as I am running on a homogeneous cluster (which most people usually are).
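
For illustration, a minimal decomposeParDict fragment where the two entries agree (just a sketch for a 4-way metis decomposition; the exact dictionary layout may differ slightly between versions):

    numberOfSubdomains 4;

    method metis;

    metisCoeffs
    {
        // exactly one weight per subdomain; a longer leftover list gets
        // lumped together, as described in the warning above
        processorWeights
        (
            1
            1
            1
            1
        );
    }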

Now a preliminary parallel speedup is:
Running on Dual Intel(R) Xeon(TM) CPU 2.40GHz, 100Mbps Ethernet network, 500MB RAM/CPU, 0.5MB cache/CPU.
1 CPU: speedup 1 (normalized)
2 CPUs on one dual node: 1.2 (!!!???)
2 CPUs on two dual nodes: 2.0 (great!)
4 CPUs on two dual nodes: 2.2
4 CPUs on four dual nodes: 3
I did not check the influence on the convergence.
I used the ICCG solver for the pressure.
The comparison is based on clock time for four iterations (normalization factor 494 s).
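
As a side note, a small shell sketch of how such a speedup number can be extracted from two solver logs (log.serial and log.parallel are hypothetical file names; the ClockTime line is the one the solver prints after each iteration):

    t1=$(grep -o 'ClockTime = [0-9.]*' log.serial   | tail -1 | awk '{print $3}')
    tN=$(grep -o 'ClockTime = [0-9.]*' log.parallel | tail -1 | awk '{print $3}')
    echo "speedup: $(echo "$t1 / $tN" | bc -l)"    # e.g. 494 / 247 would give 2.0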

Can someone tell me why it is better to run over this slow network than to stay as much as possible on the same nodes? I guess that is what was discussed earlier in this thread? I'm surprised (and scared) by the effect when running two CPUs on one node. I will discuss it with Gridcore tomorrow.

I will get back with the 'real' investigation soon.

Håkan.

hani February 7, 2006 11:07

Hi Bernhard,

I think I did as you said with the Python script, except that the default config file was named defaultBench.cfg instead of default.cfg

When running ./benchFoam.py defaultBench.cfg I got the following error message:

Traceback (most recent call last):
File "./benchFoam.py", line 7, in ?
from PyFoam.Execution.BasicRunner import BasicRunner
ImportError: No module named PyFoam.Execution.BasicRunner

I have no idea what this means.

Håkan.

gschaider February 7, 2006 12:18

Hi Håkan!

@the benchmarks: Our Xeon SMP machines scaled shitty compared with the Opteron machine, but that shitty? One wild guess would be hyperthreading. Is it enabled? If yes: get rid of it (I've never done tests, but I've heard that it can impact performance on SMP machines). BTW: the speedup you get (1.2) is approximately what you should get for a single Xeon with two processes running with hyperthreading. If HT can be ruled out, I would start blaming the motherboard, then the kernel.
But I'm not a hardware expert, so all of these are guesses.
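
One quick way to see whether hyperthreading is presenting extra logical CPUs (a rough sketch; the exact /proc/cpuinfo fields depend on your kernel version, and disabling HT is usually done in the BIOS):

    # count logical processors and compare with the physical layout
    grep -E 'processor|physical id|siblings|cpu cores' /proc/cpuinfo
    # if 'siblings' is larger than 'cpu cores' (or each physical id shows up
    # more often than the core count suggests), hyperthreading is enabled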

@python script: The error message means that Python is trying to import a submodule of PyFoam and can't find it.
Please check the following:
1. the python you get when typing python in the shell ('which python') is the same as the one expected by the script (/usr/bin/python); this should only matter if you installed PyFoam as root
2. PYTHONPATH is set to the right directory (in the directory PYTHONPATH points to there should be a folder PyFoam, which in turn contains several folders, one of them called Execution)

Should both of these tests be OK, do the following (a condensed sketch is shown below):
- in the shell type 'python'; the Python interactive shell appears (it can be left with Ctrl-D)
- in the Python shell type 'import sys' and then 'print sys.path': a list of directory names should appear, one of them the directory you set with PYTHONPATH (assuming you installed as non-root)
- try the offending line (from PyFoam.Execution.BasicRunner import BasicRunner) in the Python shell (it should raise the same error the script gives you)
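
A condensed sketch of that interactive check (assuming the non-root install and the example path from above):

    $ python
    >>> import sys
    >>> print sys.path    # should contain /home/me/PythonLibs/lib/python2.3/site-packages
    >>> from PyFoam.Execution.BasicRunner import BasicRunner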

If things still aren't working, feel free to contact me via e-mail (we could do it here too, but I think there are already about three different discussions going on in this thread (all of them interesting), so I think we'll sort out that problem separately and I'll distribute the gathered knowledge at the appropriate place (wiki, script, or here)).

mattijs February 7, 2006 13:16

decomposition: we find (on simple-ish cases) that hierarchical or simple can give results as good as or better than Metis.

For trivial cases (e.g. lid-driven cavity) Metis produces a funny decomposition. Run with the -cellDist argument to have decomposePar dump the decomposition.
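
For example, a sketch of switching decomposeParDict to one of the simpler methods and then visualising the result (the coefficients are placeholders for a 4-processor split; adjust them to your mesh):

    method simple;       // or hierarchical

    simpleCoeffs
    {
        n       (2 2 1); // 2 x 2 x 1 split over 4 processors
        delta   0.001;
    }

followed by

    decomposePar -cellDist

which, as Mattijs says, dumps the per-cell processor assignment so the decomposition can be inspected in the post-processor.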

mattijs February 7, 2006 13:19

> Can anyone tell me if the parallelization in OpenFOAM is made so that a parallel run should give exactly (more or less) the same convergence as a sequential run?

Not exactly. (IC) Preconditioning is non-parallelizable, so there will be slight differences. The AMG solver uses ICCG at the coarsest level, so it will also give slightly different results.
The only solver that parallelizes perfectly is the diagonally preconditioned CG (DCG). Unfortunately, it is generally much, much worse than ICCG.
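
In fvSolution terms this is just the choice of pressure solver (a sketch in the old one-line format; the tolerances are placeholders, and whether DCG is the exact keyword your version accepts should be checked - it is simply the name Mattijs uses above):

    solvers
    {
        // reproduces the serial convergence exactly in parallel, but usually needs many more iterations
        p    DCG  1e-06 0;

        // better convergence, slightly different behaviour in parallel because of the IC preconditioning
        // p  ICCG 1e-06 0;
    }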

jens_klostermann February 7, 2006 13:31

> Can anyone tell me if the parallelization in OpenFOAM is made so that a parallel run should give exactly (more or less) the same convergence as a sequential run?

@ slight differences:
I ran an interFoam case which diverged in parallel (2 processes). When I restarted the same case on a single CPU from the last time dump of the parallel run, it ran right past the timestep where it had diverged in parallel.

hani February 8, 2006 02:36

Hi Mattijs,

My question about whether OpenFOAM is made so that a parallel run should give exactly (more or less) the same convergence as a sequential run was really about the level at which the parallelization is done. If every single operation is parallelized, I guess it should be possible to get the same convergence? Consider a parallel run where each cell belongs to a separate CPU. For every operation that needs information from a neighbour, you have to send the information with MPI instead of, as in the sequential case, getting it by pointing at the information. The same information would be used in both cases, and the parallel convergence would be the same. However, this is not a good way to parallelize the code, since it would run very slowly, so you choose another level of parallelization where a certain number of operations are done on each CPU before exchanging information with your neighbours.

Take the AMG solver as an example: I could choose to do all AMG levels on each CPU before exchanging the information. I could choose to exchange information at all AMG levels. I could choose to exchange information at each sweep of the solver (in this case the ICCG solver). I could choose to exchange information exactly at the time I need it (as described above).

Of course - I can find the answer in the source code :-) ... In the future.

Håkan.

mattijs February 8, 2006 04:32

Everything is fully/exactly parallelized except for the IC (incomplete Cholesky) preconditioning. This means that ICCG and AMG will behave slightly differently in parallel. Only the diagonally preconditioned CG solver (DCG) should behave exactly identically.

gschaider March 31, 2006 06:30

I redid the benchmark suite (http://openfoamwiki.net/index.php/Benchmarks_standard_v1) with version 1.3 on three machines. For the Intel machines it's more than 10% faster (on average; with some solvers even 30%). For the AMD machine the performance increase doesn't seem to be that dramatic (but I've got to recheck these results). The only solver in the suite that seems to be consistently slower than in version 1.2 is Xoodles.

But these results are only preliminary. Very interesting would be results comparing LAM/MPI with Open MPI, but I think I've got to adapt the scripts for that.

plmauk November 7, 2007 11:30

Hi,
Does anybody know which clusters are recommended for running OpenFOAM?

