CFD Online Discussion Forums

AMG Solver in parallel

fra76 December 5, 2006 06:15

Hi all!
I've tried to use the AMG solver in a parallel simpleFoam case (single precision), but it seems to be really slow, with a low CPU load during pressure solving and "501" iterations for each SIMPLE iteration.
I used "AMG 1.e-6 0 100".
I switched to ICCG and everything seems ok, now.

Furthermore, is it possible to use a fast network instead of the ethernet? If yes, should I recompile lamport MPI or what?


hjasak December 5, 2006 06:29

It probably means you've messed up your discretisation or boundary conditions. The solver is "slow" because it is doing the maximum number of iterations without converging, meaning that something in your mesh or discretisation setup is bothering it.

Since I wrote the solver, I wouldn't mind trying it out (if the case is public).


hjasak December 5, 2006 06:49

Sorry, a few more questions:
- can you plot the residual history for the solver? (Go to ~/.OpenFOAM-1.3/controlDict and set the debug switch for lduMatrix to 2. This will give you a residual for every iteration.) It may be useful to show this for ICCG as well
- how big is this case?
- you are running single precision and converging to 1e-6. The round-off error at single precision will be around 1e-7 (times the number of equations for the residual). Is that your problem?
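The round-off argument in the last question can be checked with a short sketch. It uses only the Python standard library and emulates single precision by round-tripping values through `struct`; the helper names (`to_float32`, `float32_epsilon`, `roundoff_floor`) are made up for illustration and are not part of OpenFOAM. The "epsilon times number of equations" estimate is the rule of thumb quoted in the post, not a rigorous bound, and the system size used below is a hypothetical value.

```python
import struct
import sys


def to_float32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def float32_epsilon() -> float:
    """Smallest power of two eps for which 1 + eps still differs from 1 in single precision."""
    eps = 1.0
    while to_float32(1.0 + eps / 2.0) != 1.0:
        eps /= 2.0
    return eps


def roundoff_floor(eps: float, n_equations: int) -> float:
    """Rule of thumb from the post: round-off level times the number of equations."""
    return eps * n_equations


EPS_SINGLE = float32_epsilon()        # about 1.2e-7
EPS_DOUBLE = sys.float_info.epsilon   # about 2.2e-16

if __name__ == "__main__":
    tolerance = 1e-6  # solver tolerance from the first post
    print(f"single epsilon: {EPS_SINGLE:.3e}")
    print(f"double epsilon: {EPS_DOUBLE:.3e}")
    print(f"tolerance / single epsilon: {tolerance / EPS_SINGLE:.1f}")
    # Hypothetical system size, chosen only to show how the estimate scales.
    print(f"floor for 1e6 equations: {roundoff_floor(EPS_SINGLE, 1_000_000):.3e}")
```

A tolerance of 1e-6 sits less than one order of magnitude above single-precision epsilon, which is why the solver may never reach it.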


fra76 December 5, 2006 09:05

Hi Hrvoje!
Thanks for your reply!
First, I tried increasing the tolerance of the solver (up to 1e-5). It seems to be better.
First iteration: 501 + 17 (1 non-orthogonal corrector)
Second: 501 + 15
Third: 21 + 11
Fourth: 24 + 15
It went back to 501 during the 6th and 7th iterations, but I'm letting it run.

I'll run some tests, single and double precision, with debug activated, and I'll let you know!

The size of the mesh, by the way, is a few million cells, on 16 processes.

fra76 December 5, 2006 09:10

P.S. What is strange, in terms of performance, is this (after 10 iterations):
ExecutionTime = 398.07 s ClockTime = 1321 s

hjasak December 5, 2006 09:13

I am just writing a paper on very fast solvers, containing some considerable new work. :-)

Incidentally, do you have particularly bad communications on your parallel machine? BTW, I would still like to see a residual graph if possible.



fra76 December 6, 2006 05:47

The inefficiency seems to be related to the network interface.
In fact, running the same simulation on 4 processors (all on the same computational node, so that the network is not used at all), the difference between ExecutionTime and ClockTime is almost zero.
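The comparison between ExecutionTime and ClockTime can be made quantitative with a small sketch. It parses a timing line in the format quoted in this thread and returns the ratio of the two values; a ratio well below 1 suggests the processes spend much of the wall-clock time waiting (here, presumably on the network). The function name `cpu_fraction` is an illustrative choice, not an OpenFOAM utility.

```python
import re

# Matches lines such as: "ExecutionTime = 398.07 s ClockTime = 1321 s"
TIMING_RE = re.compile(
    r"ExecutionTime\s*=\s*([0-9.]+)\s*s\s+ClockTime\s*=\s*([0-9.]+)\s*s"
)


def cpu_fraction(log_line: str) -> float:
    """Return ExecutionTime / ClockTime for one timing line of the solver log."""
    match = TIMING_RE.search(log_line)
    if match is None:
        raise ValueError(f"no timing information in: {log_line!r}")
    execution_time = float(match.group(1))
    clock_time = float(match.group(2))
    return execution_time / clock_time


if __name__ == "__main__":
    # Timing lines quoted in this thread.
    for line in (
        "ExecutionTime = 398.07 s ClockTime = 1321 s",
        "ExecutionTime = 186.14 s ClockTime = 190 s",
    ):
        print(f"{line}  ->  CPU fraction {cpu_fraction(line):.2f}")
```

For the 16-process run over ethernet the ratio is roughly 0.3, while for the later log it is close to 1.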

BTW, the solver seems to be much more robust in double precision than in single. I started from the solution provided by potentialFoam, with these settings in fvSolution:

p AMG 1e-09 0 100;
U BICCG 1e-09 0.1;
k BICCG 1e-09 0.1;
epsilon BICCG 1e-09 0.1;
R BICCG 1e-09 0.1;
nuTilda BICCG 1e-09 0.1;

nNonOrthogonalCorrectors 1;
pRefCell 0;
pRefValue 0;

And I got:
Selecting incompressible transport model Newtonian
Selecting turbulence model realizableKE

Starting time loop

Time = 1

BICCG: Solving for Ux, Initial residual = 0.20082417, Final residual = 0.0090950718, No Iterations 1
BICCG: Solving for Uy, Initial residual = 0.2947727, Final residual = 0.010673377, No Iterations 1
BICCG: Solving for Uz, Initial residual = 0.29221236, Final residual = 0.011575026, No Iterations 1
AMG: Solving for p, Initial residual = 1, Final residual = 9.012947e-10, No Iterations 44
AMG: Solving for p, Initial residual = 0.28980045, Final residual = 6.1152146e-10, No Iterations 35
time step continuity errors : sum local = 1.13472e-08, global = -4.5350984e-10, cumulative = -4.5350984e-10
BICCG: Solving for epsilon, Initial residual = 0.00020144714, Final residual = 1.2768634e-06, No Iterations 1
BICCG: Solving for k, Initial residual = 0.99999999, Final residual = 0.008439019, No Iterations 1
ExecutionTime = 186.14 s ClockTime = 190 s

Time = 2

BICCG: Solving for Ux, Initial residual = 0.17720949, Final residual = 0.013158181, No Iterations 1
BICCG: Solving for Uy, Initial residual = 0.25730573, Final residual = 0.011324291, No Iterations 1
BICCG: Solving for Uz, Initial residual = 0.090200218, Final residual = 0.0075883116, No Iterations 1
AMG: Solving for p, Initial residual = 0.51558964, Final residual = 4.655188e-10, No Iterations 44
AMG: Solving for p, Initial residual = 0.17652435, Final residual = 9.4423534e-10, No Iterations 33
time step continuity errors : sum local = 1.455791e-08, global = -7.7483077e-10, cumulative = -1.2283406e-09
BICCG: Solving for epsilon, Initial residual = 0.00014428334, Final residual = 8.4038709e-07, No Iterations 1
BICCG: Solving for k, Initial residual = 0.029816968, Final residual = 0.00022590526, No Iterations 1
ExecutionTime = 348.5 s ClockTime = 352 s
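To compare runs, the per-solve summary lines in a log like the one above can be collected with a short parser. This is a minimal sketch that assumes the line format exactly as quoted here; the names `SolveRecord` and `parse_log` are hypothetical. It does not handle the per-iteration residuals produced by the lduMatrix debug switch, since that output format is not shown in the thread.

```python
import re
from typing import List, NamedTuple


class SolveRecord(NamedTuple):
    solver: str
    field: str
    initial: float
    final: float
    iterations: int


SOLVE_RE = re.compile(
    r"(\w+):\s+Solving for (\w+), Initial residual = ([0-9.eE+-]+), "
    r"Final residual = ([0-9.eE+-]+), No Iterations (\d+)"
)


def parse_log(text: str) -> List[SolveRecord]:
    """Extract one record per 'Solving for ...' line found in the log text."""
    return [
        SolveRecord(m.group(1), m.group(2), float(m.group(3)),
                    float(m.group(4)), int(m.group(5)))
        for m in SOLVE_RE.finditer(text)
    ]


SAMPLE = """\
BICCG: Solving for Ux, Initial residual = 0.20082417, Final residual = 0.0090950718, No Iterations 1
AMG: Solving for p, Initial residual = 1, Final residual = 9.012947e-10, No Iterations 44
AMG: Solving for p, Initial residual = 0.28980045, Final residual = 6.1152146e-10, No Iterations 35
"""

if __name__ == "__main__":
    for record in parse_log(SAMPLE):
        if record.field == "p":
            print(record.solver, record.initial, record.final, record.iterations)
```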

I'll generate the residuals for the single precision case as soon as possible.

BTW, is there a way to use the fast network interconnect I have, instead of the standard ethernet, so that I can speed up the parallel AMG solver?

hjasak December 6, 2006 06:37

This is good news. Incidentally, round-off error pollution will be a problem in your case with single precision. What should be done is to keep x and residual in double precision; the rest of the software can be kept single precision. Since you've got the full source, you should be able to do this on your own.
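The idea of keeping the accumulated quantities in double precision while the stored data stays in single precision can be illustrated with a small experiment. This is only a sketch of the principle, not OpenFOAM code and not the modification described in the post: single precision is emulated through `struct`, and the data set and helper names (`to_float32`, `accumulate`) are made up for the example.

```python
import math
import struct
from typing import Sequence


def to_float32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(values: Sequence[float], single_accumulator: bool) -> float:
    """Sum the values, optionally rounding the running total to single precision."""
    total = 0.0
    for value in values:
        total += value
        if single_accumulator:
            total = to_float32(total)
    return total


# The data itself is stored in single precision in both cases.
VALUES = [to_float32(1e-3 * (1 + i % 7)) for i in range(50_000)]
REFERENCE = math.fsum(VALUES)  # correctly rounded sum of the stored values

ERR_SINGLE = abs(accumulate(VALUES, single_accumulator=True) - REFERENCE)
ERR_DOUBLE = abs(accumulate(VALUES, single_accumulator=False) - REFERENCE)

if __name__ == "__main__":
    print(f"error with single-precision accumulator: {ERR_SINGLE:.3e}")
    print(f"error with double-precision accumulator: {ERR_DOUBLE:.3e}")
```

Only the running total changes precision between the two cases, which mirrors the suggestion that most of the software can stay in single precision.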

For my personal pleasure, I would always run in double precision and not worry about round-off.

By the way, you are running SIMPLE and converging the pressure equation to 1e-10 every time, which is a massive waste of time. You can get away with converging the pressure equation to a relative tolerance of 0.05 or even 0.1, and you will save 80% in CPU time. Definitely worth playing with. Also, there's no point in keeping the solver tolerance at 1e-9 - 1e-6 will almost certainly do:

p AMG 1e-06 0.05 100;
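A rough estimate of what this change buys can be sketched as follows, assuming that the first number after the solver name is an absolute tolerance and the second a relative tolerance (consistent with how 0.05 is used above), and that the solver stops as soon as either is met. The meaning of the third value (100) is not discussed in the thread and is left out. The estimate further assumes a constant residual reduction per iteration, taken from the first pressure solve in the log (residual 1 to 9.012947e-10 in 44 iterations), which real AMG convergence will only approximate. All function names are illustrative.

```python
def has_converged(initial: float, residual: float,
                  tolerance: float, rel_tol: float) -> bool:
    """Stop when the residual meets the absolute or the relative tolerance."""
    return residual < tolerance or residual < rel_tol * initial


def average_reduction_rate(initial: float, final: float, iterations: int) -> float:
    """Geometric-mean residual reduction factor per iteration."""
    return (final / initial) ** (1.0 / iterations)


def iterations_needed(initial: float, tolerance: float, rel_tol: float,
                      rate: float, max_iterations: int = 1000) -> int:
    """Count iterations until convergence, assuming a constant reduction rate."""
    residual = initial
    for n in range(max_iterations + 1):
        if has_converged(initial, residual, tolerance, rel_tol):
            return n
        residual *= rate
    return max_iterations


# Reduction rate inferred from the first pressure solve quoted in the thread.
RATE = average_reduction_rate(1.0, 9.012947e-10, 44)

if __name__ == "__main__":
    tight = iterations_needed(1.0, tolerance=1e-9, rel_tol=0.0, rate=RATE)
    loose = iterations_needed(1.0, tolerance=1e-6, rel_tol=0.05, rate=RATE)
    print(f"p AMG 1e-09 0 100;    -> about {tight} iterations")
    print(f"p AMG 1e-06 0.05 100; -> about {loose} iterations")
```

Under these assumptions the looser setting needs only a small fraction of the pressure iterations, which is in line with the saving suggested in the post.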

Please keep me posted - it would be nice to hear you say all is well with the solver for future generations to see. :-)



olwi December 13, 2006 09:31

Hi,

Reading Hrvoje's comment that he has a paper brewing with new solver algorithms, I got very curious... I look forward to reading the full paper in due time, I'm sure it will be good stuff.

Just a few general questions, concerning speed and solver efficiency.
1. If I do a profiling of a standard high-level solver, like turbFoam, how much of the time would be spent in the linear solvers? Is the setting up of the matrices (the code in the high-level solvers) a big part?
2. Is there potential for making any of the linear solvers more efficient, or for implementing solvers other than those in the code today? (I suspect that Hrvoje's new exciting stuff is probably not simply a new linear solver, but more about the solution algorithm as a whole)
3. Is there a big overhead due to the data structures, and the polyhedral capabilities?

Best regards,

hjasak December 13, 2006 09:46

Heya,

1) You should be spending 50-80% of total execution time in the solvers. When you do a profile, the linear solvers should be first by a long way, followed by the velocity gradient and then other minor operations. The first four items on the list should bring you over 90% of total time - tells you a lot about the code and algorithm.

2) I just did it. The new solver is just that - a solver: it's only that it's three times faster than anything I've ever seen before. No cheating, no being selective in the test cases, no being economical with the truth or similar. The paper has been submitted to a conference, we'll see what will come out of it. If you wish to test it and are serious about using it, drop me a line.

3) Polyhedral mesh handling actually reduces execution time - it is the most beautiful (read: efficient) way of dealing with an FVM mesh. As for the rest, have a look at run-time and memory consumption comparisons vs other CFD software and "make your own judgement".
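Putting points 1) and 2) together gives a back-of-the-envelope bound on the overall gain: if only the linear-solver share of the run time gets faster, the total speedup follows Amdahl's law. The sketch below is an illustration of that arithmetic under the figures quoted above (50-80% of time in the solvers, a solver three times faster); it is not a measurement, and the function name is made up.

```python
def overall_speedup(solver_fraction: float, solver_speedup: float) -> float:
    """Amdahl's law: only the solver share of the run time is accelerated."""
    if not 0.0 <= solver_fraction <= 1.0:
        raise ValueError("solver_fraction must lie between 0 and 1")
    if solver_speedup <= 0.0:
        raise ValueError("solver_speedup must be positive")
    return 1.0 / ((1.0 - solver_fraction) + solver_fraction / solver_speedup)


if __name__ == "__main__":
    for fraction in (0.5, 0.8):
        gain = overall_speedup(fraction, 3.0)
        print(f"{fraction:.0%} of time in solvers, 3x faster solver: "
              f"{gain:.2f}x overall")
```

With these inputs the whole run would be roughly 1.5 to 2.1 times faster, so the larger the solver share, the more a faster linear solver pays off.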



lr103476 January 18, 2007 10:57

Hi prof. Jasak,

Today I quickly read your papers (from your site) about extrapolated / preconditioned iterative solvers and I am impressed. Will those new solver variants also be present in the new OpenFOAM 1.4?
If so, when do you plan to release the new version? Since I am doing very computationally intensive simulations (3D moving meshes), a big gain in calculation time could be obtained.

Looking forward to it...

Regards, Frank

hjasak January 18, 2007 15:21

Yeah, it does look pretty cool, doesn't it? :-) I wasn't expecting such a great improvement in performance, but it looks like surprises do indeed exist.

The work will be presented at the 15th Annual Conference of the CFD Society of Canada, Toronto, Ontario, Canada, May 27-31, 2007. Why don't you come over to Zagreb to the OpenFOAM Workshop in June 2007 and we can talk about it - it is just after the Toronto meeting. In any case, it would be really nice to see some of your work because you've been quite busy over the last year and there will be a session dedicated to fluid-structure interaction.


lr103476 January 18, 2007 18:48

Yes, it looks really cool and promising. I need to study the theories in more detail to really understand what's happening.

I was already considering a visit to the next OpenFOAM workshop. Whether I come is more of a time issue. It depends on my simulations, and I hope to have some respectable results by then. Your new solver techniques may improve the speed of my simulations :-)

Could you also please comment on my problems with parallel computations with dynamicBodyFvMesh using more than 2 processors, in the other thread?

Regards, Frank

fra76 February 22, 2007 03:38

Hi prof. Jasak,
Just to keep you updated, I was able to recompile the communication library so that I can run OpenFOAM on the extremely fast interconnect we have.
Everything is now astonishingly quick, and I've measured superlinear speedup on a 3.7 million cell mesh, double precision, up to 64 processors (efficiency=2.19, for the sake of precision)!
I'm still playing around with single and double precision. Single is faster, but it seems that you need more iterations to converge. However, it's hard to say, as the solver tolerances have to be different.
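For reference, parallel efficiency is usually defined relative to a baseline run, and a value above 1 means superlinear speedup. The sketch below shows that definition; the post does not say which baseline was used, so the single-processor baseline in the conversion is an assumption, and the timings in the usage example are invented purely to exercise the function.

```python
def parallel_efficiency(t_ref: float, n_ref: int, t_n: float, n: int) -> float:
    """Efficiency of a run on n processors relative to a baseline on n_ref processors."""
    if min(t_ref, t_n) <= 0.0 or min(n_ref, n) <= 0:
        raise ValueError("times and processor counts must be positive")
    return (t_ref * n_ref) / (t_n * n)


def speedup_from_efficiency(efficiency: float, n: int, n_ref: int = 1) -> float:
    """Speedup over the baseline run implied by a given efficiency."""
    return efficiency * n / n_ref


if __name__ == "__main__":
    # Efficiency quoted in the post; the baseline of 1 processor is an assumption.
    print(f"implied speedup on 64 processors: {speedup_from_efficiency(2.19, 64):.1f}")
    # Invented timings: 1000 s on 4 processors, 50 s on 64 processors.
    print(f"example efficiency: {parallel_efficiency(1000.0, 4, 50.0, 64):.2f}")
```

Efficiencies above 1 are commonly attributed to cache effects, since each processor works on a smaller part of the mesh, although the thread does not investigate the cause.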

What about the new solvers you mentioned? Will you distribute them with OF 1.4?

Thanks a lot for all the suggestions!
