CFD Online Discussion Forums - AMG Solver in parallel

Hi all!
I've tried to use the AMG solver in a parallel simpleFoam case at single precision, but it is really very slow, with low CPU load during the pressure solve and 501 solver iterations in every SIMPLE iteration.
I used "AMG 1.e-6 0 100".
I switched to ICCG and everything seems OK now.

Furthermore, is it possible to use a fast network instead of Ethernet? If yes, should I recompile lamport MPI, or what?

Francesco

It probably means you've messed up your discretisation or boundary conditions. The solver is "slow" because it is running to the maximum number of iterations without converging, meaning that something in your mesh or discretisation setup is bothering it.

Since I wrote the solver, I wouldn't mind trying it out (if the case is public).

Hrv

Sorry, a few more questions:
- can you plot the residual history for the solver? (Go to ~/.OpenFOAM-1.3/controlDict and set the debug switch for lduMatrix to 2. This will give you a residual for every iteration.) It may be useful to show this for ICCG as well
- how big is this case?
- you are running single precision and converging to 1e-6. The round-off error at single precision will be around 1e-7 (times the number of equations for the residual). Is that your problem?
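
To make the round-off point concrete, here is a minimal sketch in plain Python (not OpenFOAM code) of why a tolerance of 1e-6 sits close to the single-precision floor. The helper to_float32 is a name made up for this illustration; it only rounds a double to the nearest single-precision value.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Machine epsilon: the gap between 1.0 and the next representable number.
eps32 = 2.0 ** -23  # single precision, roughly 1.19e-7
eps64 = 2.0 ** -52  # double precision, roughly 2.22e-16

# A change smaller than half of eps32 disappears when stored in single precision,
# while a change of exactly eps32 survives.
lost = to_float32(1.0 + eps32 / 4) == 1.0
kept = to_float32(1.0 + eps32) > 1.0

print(f"eps32 = {eps32:.3e}, eps64 = {eps64:.3e}")
print("1 + eps32/4 rounds back to 1.0 in single precision:", lost)
print("1 + eps32 is distinguishable from 1.0 in single precision:", kept)
```

With per-value errors of this size accumulating over the equations in the residual sum, asking a single-precision solver for 1e-6 leaves very little headroom.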

Hrv

Hi Hrvoje!
Thanks for your reply!
First, I tried loosening the solver tolerance (up to 1e-5). It seems to be better. Pressure solver iterations per SIMPLE iteration (1 non-orthogonal corrector, so two pressure solves each):
First: 501 + 17
Second: 501 + 15
Third: 21 + 11
Fourth: 24 + 15
...
It went back to 501 during the 6th and 7th iterations, but I'm letting it run.

I'll run some tests, single and double precision, with debug activated, and I'll let you know!

The mesh, by the way, has a few million cells, on 16 processes.

P.S.
What is strange, in terms of performance, is this (after 10 iterations):
ExecutionTime = 398.07 s ClockTime = 1321 s

I am just writing a paper on very fast solvers, containing some considerable new work. :-)

Incidentally, do you have particularly bad communications on your parallel machine? BTW, I would still like to see a residual graph if possible.

Thanks,

Hrv

The inefficiency seems to be related to the network interface.
In fact, running the same simulation on 4 processors (all on the same computational node, so the network is not used at all), the difference between ExecutionTime and ClockTime is almost zero.
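
For reference, the gap between the two timers can be put into numbers. This is only a small arithmetic sketch in Python using the ExecutionTime and ClockTime values quoted earlier in the thread for the 16-process run; reading the difference as time spent waiting (for example on communication) is an interpretation, not something the log states.

```python
def cpu_fraction(execution_time_s: float, clock_time_s: float) -> float:
    """Fraction of wall-clock time during which the process was actually computing."""
    return execution_time_s / clock_time_s


# Values quoted in the thread for the 16-process run after 10 iterations.
execution_time_s = 398.07
clock_time_s = 1321.0

frac = cpu_fraction(execution_time_s, clock_time_s)
wait_s = clock_time_s - execution_time_s

print(f"CPU busy fraction: {frac:.1%}")
print(f"Wall-clock time not spent computing: {wait_s:.2f} s")
```

A fraction near 1 is what the single-node run shows; a fraction around 0.3 means most of the wall-clock time went somewhere other than computation.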

BTW, the solver seems to be much more robust in double precision than in single. I started from the solution provided by potentialFoam, with these settings in fvSolution:

solvers
{
    p AMG 1e-09 0 100;
    U BICCG 1e-09 0.1;
    k BICCG 1e-09 0.1;
    epsilon BICCG 1e-09 0.1;
    R BICCG 1e-09 0.1;
    nuTilda BICCG 1e-09 0.1;
}

SIMPLE
{
    nNonOrthogonalCorrectors 1;
    pRefCell 0;
    pRefValue 0;
}

And I got:
Selecting incompressible transport model Newtonian
Selecting turbulence model realizableKE

Starting time loop

Time = 1

BICCG: Solving for Ux, Initial residual = 0.20082417, Final residual = 0.0090950718, No Iterations 1
BICCG: Solving for Uy, Initial residual = 0.2947727, Final residual = 0.010673377, No Iterations 1
BICCG: Solving for Uz, Initial residual = 0.29221236, Final residual = 0.011575026, No Iterations 1
AMG: Solving for p, Initial residual = 1, Final residual = 9.012947e-10, No Iterations 44
AMG: Solving for p, Initial residual = 0.28980045, Final residual = 6.1152146e-10, No Iterations 35
time step continuity errors : sum local = 1.13472e-08, global = -4.5350984e-10, cumulative = -4.5350984e-10
BICCG: Solving for epsilon, Initial residual = 0.00020144714, Final residual = 1.2768634e-06, No Iterations 1
BICCG: Solving for k, Initial residual = 0.99999999, Final residual = 0.008439019, No Iterations 1
ExecutionTime = 186.14 s ClockTime = 190 s

Time = 2

BICCG: Solving for Ux, Initial residual = 0.17720949, Final residual = 0.013158181, No Iterations 1
BICCG: Solving for Uy, Initial residual = 0.25730573, Final residual = 0.011324291, No Iterations 1
BICCG: Solving for Uz, Initial residual = 0.090200218, Final residual = 0.0075883116, No Iterations 1
AMG: Solving for p, Initial residual = 0.51558964, Final residual = 4.655188e-10, No Iterations 44
AMG: Solving for p, Initial residual = 0.17652435, Final residual = 9.4423534e-10, No Iterations 33
time step continuity errors : sum local = 1.455791e-08, global = -7.7483077e-10, cumulative = -1.2283406e-09
BICCG: Solving for epsilon, Initial residual = 0.00014428334, Final residual = 8.4038709e-07, No Iterations 1
BICCG: Solving for k, Initial residual = 0.029816968, Final residual = 0.00022590526, No Iterations 1
ExecutionTime = 348.5 s ClockTime = 352 s

I'll generate the residuals for the single precision case as soon as possible.
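
In case it helps with producing the residual history, here is a rough Python sketch that pulls the solver name, field, residuals and iteration count out of log lines in the format shown above. The regular expression is written against those lines only, so it is an assumption that other versions or the lduMatrix debug output look the same; parse_log and SAMPLE are names made up for this example.

```python
import re

# Matches lines such as:
# AMG: Solving for p, Initial residual = 1, Final residual = 9.012947e-10, No Iterations 44
LINE = re.compile(
    r"^(?P<solver>\w+): Solving for (?P<field>\w+), "
    r"Initial residual = (?P<initial>[-+.\deE]+), "
    r"Final residual = (?P<final>[-+.\deE]+), "
    r"No Iterations (?P<iters>\d+)",
    re.MULTILINE,
)


def parse_log(text: str) -> list[dict]:
    """Return one record per solver line found in the log text."""
    records = []
    for m in LINE.finditer(text):
        records.append(
            {
                "solver": m["solver"],
                "field": m["field"],
                "initial": float(m["initial"]),
                "final": float(m["final"]),
                "iterations": int(m["iters"]),
            }
        )
    return records


# Three lines copied from the log above.
SAMPLE = """\
BICCG: Solving for Ux, Initial residual = 0.20082417, Final residual = 0.0090950718, No Iterations 1
AMG: Solving for p, Initial residual = 1, Final residual = 9.012947e-10, No Iterations 44
AMG: Solving for p, Initial residual = 0.28980045, Final residual = 6.1152146e-10, No Iterations 35
"""

records = parse_log(SAMPLE)
p_iterations = [r["iterations"] for r in records if r["field"] == "p"]
print("pressure solver iterations:", p_iterations)
```

The list of records can then be written to a file or passed to any plotting tool to get the residual graph per field.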

BTW, is there a way to use the fast network interconnect I have, instead of standard Ethernet, so that I can speed up the parallel AMG solver?

This is good news. Incidentally, round-off error pollution will be a problem in your case with single precision. What should be done is to keep x and residual in double precision; the rest of the software can be kept single precision. Since you've got the full source, you should be able to do this on your own.

For my personal pleasure, I would always run in double precision and not worry about round-off.

By the way, you are running SIMPLE and converging the pressure equation to 1e-10 every time, which is a massive waste of time. You can get away with converging the pressure equation to a relative tolerance of 0.05 or even 0.1 and you will save 80% in CPU time. Definitely worth playing with. Also, there's no point in keeping the absolute solver tolerance at 1e-9; 1e-6 will almost certainly do:

p AMG 1e-06 0.05 100;
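
A rough way to see where the saving comes from, as a Python sketch rather than actual solver code. It assumes the solver stops as soon as the residual drops below the absolute tolerance or below the relative tolerance times the initial residual, and it models the solver as reducing the residual by a fixed factor per iteration. The factor 0.6 and the initial residual 0.5 are made-up illustration values, not measurements from this case.

```python
def converged(residual: float, initial: float, tolerance: float, rel_tol: float) -> bool:
    """Assumed stopping test: absolute tolerance OR relative reduction reached."""
    return residual < tolerance or (rel_tol > 0 and residual < rel_tol * initial)


def iterations_needed(initial: float, tolerance: float, rel_tol: float,
                      reduction: float = 0.6, max_iter: int = 500) -> int:
    """Count iterations of an idealised solver that multiplies the residual
    by `reduction` (hypothetical value) on every iteration."""
    residual = initial
    for n in range(max_iter + 1):
        if converged(residual, initial, tolerance, rel_tol):
            return n
        residual *= reduction
    return max_iter


# Tight setting (absolute 1e-9, no relative tolerance) versus the suggested
# relaxed setting (absolute 1e-6, relative 0.05), same starting residual.
tight = iterations_needed(initial=0.5, tolerance=1e-9, rel_tol=0.0)
relaxed = iterations_needed(initial=0.5, tolerance=1e-6, rel_tol=0.05)

print("iterations with 1e-9 and no relative tolerance:", tight)
print("iterations with 1e-6 and relative tolerance 0.05:", relaxed)
```

The real ratio depends on how fast the solver actually converges, but the relative tolerance is what lets each SIMPLE iteration stop early, since the pressure equation is solved again in the next outer iteration anyway.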

Please keep me posted - it would be nice to hear you say all is well with the solver for future generations to see. :-)

Enjoy,

Hrv

Hi,

Reading Hrvoje's comment that he has a paper brewing with new solver algorithms, I get very curious... I look forward to reading the full paper in due time; I'm sure it will be good stuff. :-)

Just a few general questions, concerning speed and solver efficiency.
1. If I profile a standard high-level solver, like turbFoam, how much of the time is spent in the linear solvers? Is setting up the matrices (the code in the high-level solvers) a big part?
2. Is there potential for making any of the linear solvers more efficient, or for implementing solvers other than those in the code today? (I suspect that Hrvoje's new exciting stuff is probably not simply a new linear solver, but more about the solution algorithm as a whole.)
3. Is there a big overhead due to the data structures and the polyhedral capabilities?

Best regards,
Ola

Heya,

1) You should be spending 50-80% of total execution time in the solvers. When you do a profile, the linear solvers should be first by a long way, followed by the velocity gradient and then other minor operations. The first four items on the list should bring you over 90% of total time, which tells you a lot about the code and algorithm.

2) I just did it. The new solver is just that, a solver: it's only that it's three times faster than anything I've ever seen before. No cheating, no being selective in the test cases, no being economical with the truth or similar. The paper has been submitted to a conference; we'll see what will come out of it. If you wish to test it and are serious about using it, drop me a line.

3) Polyhedral mesh handling actually reduces execution time - it is the most beautiful (read: efficient) way of dealing with an FVM mesh. As for the rest, have a look at run-time and memory consumption comparisons vs other CFD software and "make your own judgement".

Enjoy,

Hrv

Hi prof. Jasak,

Today I quickly read your papers (from your site) about extrapolated / preconditioned iterative solvers and I am impressed. Will those new solver variants also be present in the new OpenFOAM 1.4?
If so, when do you plan to release the new version? Since I am doing very computationally intensive simulations (3D moving meshes), the gain in calculation time could be large.

Looking forward to it...

Regards, Frank

Yeah, it does look pretty cool, doesn't it? :-) I wasn't expecting such a great improvement in performance, but it looks like surprises do indeed exist.

The work will be presented at the 15th Annual Conference of the CFD Society of Canada, Toronto, Ontario, Canada, May 27-31, 2007. Why don't you come over to Zagreb for the OpenFOAM Workshop in June 2007 and we can talk about it? It is just after the Toronto meeting. In any case, it would be really nice to see some of your work because you've been quite busy over the last year and there will be a session dedicated to fluid-structure interaction.

Hrv

Yes, it looks really cool and promising. I need to study the theories in more detail to really understand what's happening.

I was already considering a visit to the next OpenFOAM workshop. Whether I come is mostly a matter of time. It depends on my simulations, and I hope to have some respectable results by then. Your new solver techniques may improve the speed of my simulations :-)

Could you also please comment on my problems with parallel computations with dynamicBodyFvMesh using more than 2 processors, in the other thread?

Regards, Frank

Hi prof. Jasak,
Just to keep you updated, I was able to recompile the communication library so that I can run OpenFOAM on the extremely fast interconnect we have.
Everything is now astonishingly quick, and I've measured super-linear speedup on a 3.7 million cell mesh, double precision, up to 64 processors (efficiency = 2.19, to be precise)!
I'm still playing around with single and double precision. Single is faster, but it seems that you need more iterations to converge. However, it's hard to say, as the solver tolerances have to be different.
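
For anyone reading the efficiency figure later, this is the usual definition written out as a small Python sketch. The thread does not say which run was the baseline, so treating it as a single-processor run is an assumption, and the timings in the second example are purely hypothetical.

```python
def speedup(t_ref_s: float, t_n_s: float) -> float:
    """How many times faster the N-processor run is than the reference run."""
    return t_ref_s / t_n_s


def efficiency(t_ref_s: float, n_ref: int, t_n_s: float, n: int) -> float:
    """Speedup divided by the increase in processor count; 1.0 is ideal linear scaling."""
    return speedup(t_ref_s, t_n_s) / (n / n_ref)


# Quoted in the thread: efficiency 2.19 on 64 processors.
# ASSUMPTION: baseline is 1 processor, so speedup = efficiency * 64.
implied_speedup = 2.19 * 64
print(f"implied speedup on 64 processors: {implied_speedup:.2f}")

# Hypothetical timings, for illustration only: 1000 s on 1 processor, 10 s on 64.
example = efficiency(t_ref_s=1000.0, n_ref=1, t_n_s=10.0, n=64)
print(f"example efficiency: {example:.4f}")
```

An efficiency above 1 means each processor does its share faster than the reference run did, which is often attributed to the smaller per-processor problem fitting better in cache.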

What about the new solvers you mentioned? Will you distribute them with OF 1.4?

Thanks a lot for all the suggestions!
Francesco