
mchurchf August 30, 2012 12:32

Radically Different GAMG Pressure Solve Iterations with Varying Processor Count
 
I am performing a scaling study of OpenFOAM, using channel flow DNS as the test case. I am finding that PCG scales well down to roughly 10K-20K cells/core. GAMG also scales well, though not down to as few cells per core as PCG, but it is much faster than PCG.

However, there is some anomalous behavior with GAMG that I am trying to understand. The best example is as follows: I ran a case with 315M cells for roughly 2000 time steps, on 1024, 2048, and 4096 cores, using the default scotch decomposition. Everything is exactly the same in all three cases except the number of cores used. The 2048-core case requires roughly twice as many final pressure solve iterations to reach the same tolerance as the 1024- and 4096-core cases. In general I have seen that, for a fixed problem size, the number of final pressure solve iterations grows slightly as the core count increases, but this 2048-core case is an outlier. Does anyone have any idea why this may have occurred?
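For reference, the decomposition setup is just the default scotch method; the relevant part of decomposeParDict looks roughly like this (a sketch, with only numberOfSubdomains changed between the three runs):

numberOfSubdomains  2048;    // 1024 / 2048 / 4096 in the three runs
method              scotch;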

My pressure solver settings are as follows, and I used OF-2.1.0:


p
{
    solver                  GAMG;
    tolerance               1e-5;
    relTol                  0.05;
    smoother                DIC;
    nPreSweeps              0;
    nPostSweeps             2;
    nFinestSweeps           2;
    cacheAgglomeration      true;
    nCellsInCoarsestLevel   100;
    agglomerator            faceAreaPair;
    mergeLevels             1;
}

pFinal
{
    solver                  GAMG;
    tolerance               1e-6;
    relTol                  0.0;
    smoother                DIC;
    nPreSweeps              0;
    nPostSweeps             2;
    nFinestSweeps           2;
    cacheAgglomeration      true;
    nCellsInCoarsestLevel   100;
    agglomerator            faceAreaPair;
    mergeLevels             1;
}
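(For context: with the PISO loop, the looser p settings above are used for the intermediate pressure correctors and the tight pFinal settings only for the last corrector of each time step. The corresponding controls in fvSolution look something like this; the corrector counts here are only illustrative, not necessarily what I am running:)

PISO
{
    nCorrectors                 2;
    nNonOrthogonalCorrectors    0;
}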


Thank you,

Matt Churchfield

wyldckat August 31, 2012 06:23

Hi Matt,

Looks like you've hit a corner case due to the number of divisions. I know I've seen some explanations on this subject... OK, two I've found:
As for your question: possibly the tolerance values are affecting how the errors add up across the subdomains, which leads to this specific problem. For example, imagine that the cumulative errors between matrix operations are well distributed with 1024 and 4096 processors, but with 2048 they are not; that would lead to the perfect error storm.


Another possibility is the number of cells available per processor: 315M cells / 2048 processors ~= 154k cells, which is well above 50k cells, so I guess there are enough cells to go around. Of course, you should confirm that scotch isn't unbalancing the distribution, for example by leaving only ~40k cells on one processor and spreading the remaining ~110k across the others.
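If you want to rule scotch out, one quick cross-check on a box-like channel mesh would be to redo the decomposition with a structured method, which gives an essentially uniform split by construction, and see whether the iteration counts change. Something along these lines in decomposeParDict (just a sketch; the 16 x 16 x 8 split is only an example and has to multiply to the number of processors and suit your mesh dimensions):

numberOfSubdomains  2048;
method              hierarchical;
hierarchicalCoeffs
{
    n               (16 16 8);   // splits per direction, product = numberOfSubdomains
    delta           0.001;
    order           xyz;
}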


By the way, another detail I found some time ago that might help: http://www.cfd-online.com/Forums/ope...tml#post367979 post #8 - it's possible to do multi-level decomposition!
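From what I remember, the entry in decomposeParDict goes roughly like this (a sketch only; the 32 x 64 = 2048 split is just an example, and the exact keywords may differ between versions):

numberOfSubdomains  2048;
method              multiLevel;
multiLevelCoeffs
{
    level0                        // e.g. one subdomain group per node
    {
        numberOfSubdomains  32;
        method              scotch;
    }
    level1                        // then decompose within each group
    {
        numberOfSubdomains  64;
        method              scotch;
    }
}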

Best regards,
Bruno

