
OpenFOAM Parallel Numerical Linear Algebra Post 

September 3, 2018, 07:50 
OpenFOAM Parallel Numerical Linear Algebra Post

#1 
Senior Member

Dear all,
I posted notes on the parallel numerical linear algebra in OpenFOAM at https://www.linkedin.com/pulse/openf...e/?published=t Ideas on how to develop these notes further would be valuable. Thanks. Domenico. 

October 13, 2022, 15:13 

#2 
Senior Member
Klaus
Join Date: Mar 2009
Posts: 227
Rep Power: 20 
Two topics come to my mind:
1: How to do linear algebra operations for alternative preconditioners with the OpenFOAM LDU matrix structure, e.g. how to compute something like (I - L*D^-1) or L^T, maybe even with a scaled matrix / linear system for further improved preconditioning.
2: How to extend PCG or LGMRES to mixed precision fp64/fp32 with fp64 error correction, by adding an inner fp32 loop or an embedded solver. I suggest LGMRES here because it converges more smoothly than GMRES (similar to BiCGStab). 
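To make the fp64/fp32 idea concrete, here is a minimal defect-correction sketch of my own (plain NumPy, not OpenFOAM code; the inner fp32 direct solve merely stands in for an fp32 PCG/LGMRES sweep):

```python
import numpy as np

def mixed_precision_solve(A, b, tol=1e-12, max_outer=50):
    """Defect-correction (iterative refinement) sketch: the inner
    solve runs in float32, while the residual and the solution
    update are accumulated in float64."""
    A32 = A.astype(np.float32)
    x = np.zeros_like(b)                       # fp64 accumulator
    for _ in range(max_outer):
        r = b - A @ x                          # fp64 residual
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        # inner fp32 "solver" -- a direct solve standing in for
        # an inner fp32 Krylov loop
        d = np.linalg.solve(A32, r.astype(np.float32))
        x += d.astype(np.float64)              # fp64 error correction
    return x

# small SPD test system
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 20))
A = M @ M.T + 20.0 * np.eye(20)
b = rng.standard_normal(20)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(b - A @ x))
```

Each outer iteration reduces the error by roughly cond(A) times fp32 machine epsilon, so for a well-conditioned system a handful of fp32 inner solves recovers full fp64 accuracy.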

October 14, 2022, 04:58 

#3 
Senior Member

Very interesting.
I do understand that the LDU format is convenient in the face-based structure of OpenFOAM. I wonder, however, whether it makes sense to switch to a more versatile matrix format for the linear algebra. Possibly using https://petsc.org/release/docs/manualpages/DMPlex/ or similar would allow retaining face-based addressing for the discretization while switching to another format for faster linear algebra. 

October 14, 2022, 07:20 

#4 
Senior Member
Klaus
Join Date: Mar 2009
Posts: 227
Rep Power: 20 
I don't think it makes sense to switch to a "more versatile matrix format for linear algebra", for several reasons:
1: There are numerous matrix storage formats, including COO, CSR, CSR5, ELL, SELL, HYB etc. Which one gives optimal performance depends on the linear algebra operation and the hardware: matrix-matrix operations are faster in one format while matrix-vector operations are faster in another, and the optimal formats can differ depending on the hardware, at least when GPUs are used. On top of that, it can be useful to store sections of the matrix separately. PETSc, for example, splits a matrix into blocks of rows and stores each block's diagonal and off-diagonal entries separately in CSR format, with optimized algorithms that also leverage a range of MPI functionality for performance. The LDU format stores the matrix parts L, D, U separately, in a COO-like format. Be aware that the coefficient matrix is incomplete, as some elements, the "boundary contributions", are stored/handled separately.
2: Maybe more importantly, linear algebra operations are often based on L, D or U, or on transformations of one of them, so storing the complete matrix only to extract L, D or U later, as needed for a particular operation, brings no benefit, I think. A team of Asian researchers working on an OpenMP version of OpenFOAM ended up storing the L, D, U structure with each part in its own CSR format for best performance.
3: OpenFOAM is fast! But there is room for improvement, particularly in preconditioners and mixed precision, both on the CPU and in combination with GPUs. Simple CPU-to-GPU offloading, or moving to an external linear algebra library, yields little benefit in my experience.
The challenges I experience lie in the complexity that comes with the inherent optimizations in OpenFOAM: algorithms leverage "cell" indexing and/or "face" indexing and/or LDU addressing, for which I have never been able to find comprehensive documentation, together with the boundary contributions that have to be considered and the decomposition of the linear system for parallel computation. All of this together would enable me to make the extensions I have in mind. A hands-on, example-based tutorial explaining how to implement linear algebra operations and matrix (section) transformations and operations would help me a lot. 
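For illustration, here is how I read the LDU face addressing, as a NumPy sketch of my own with made-up data for a 1D chain of 4 cells (boundary contributions omitted; the lowerAddr/upperAddr naming follows lduMatrix, but the data is hypothetical):

```python
import numpy as np

# Hypothetical 1D mesh with 4 cells and 3 internal faces.
# For face f, lower_addr[f] is the owner cell and upper_addr[f]
# the neighbour cell (owner index < neighbour index).
lower_addr = np.array([0, 1, 2])
upper_addr = np.array([1, 2, 3])

diag  = np.array([2.0, 2.0, 2.0, 2.0])   # D: one entry per cell
lower = np.array([-1.0, -1.0, -1.0])     # L: one entry per face
upper = np.array([-1.0, -1.0, -1.0])     # U: one entry per face

def ldu_matvec(x):
    """y = (L + D + U) x using face addressing, in the spirit of
    lduMatrix::Amul (without the boundary contributions)."""
    y = diag * x                          # cell loop: D*x
    for f in range(len(lower_addr)):      # face loop: off-diagonals
        o, n = lower_addr[f], upper_addr[f]
        y[o] += upper[f] * x[n]           # U entry in the owner row
        y[n] += lower[f] * x[o]           # L entry in the neighbour row
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])
print(ldu_matvec(x))   # matches the equivalent dense tridiagonal product
```

The point is that one face index f addresses one entry of L and one of U simultaneously, which is exactly what makes extracting or transforming L, D, U individually cheap, but makes general fill-in awkward.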

October 14, 2022, 14:27 

#5 
Senior Member

I agree.
A tutorial case (e.g. extending https://github.com/UnnamedMoose/Basi...mmingTutorials) with linear algebra aspects would be valuable. I suggest starting from laplacianFoam in sequential mode, using the information in Moukalled-Mangani-Darwish and other sources. The scope should be to clarify:
1. [loop over internal faces]: how the matrix and right-hand side vector are assembled by a loop over internal faces (the laplacianSchemes assume no non-orthogonal corrections are required);
2. [loop over boundary faces]: how Dirichlet and Neumann boundary conditions are treated, and how the matrix is stored in LDU format (U = L^T, and D contains the negative row sums);
3. [Krylov solve]: how the linear system is solved using unpreconditioned Krylov methods by calling the member function solve() of the lduMatrix class, involving BLAS-1 (vector and vector-vector) and BLAS-2 (matrix-vector) routines;
4. [preconditioning]: how the convergence of the Krylov subspace method can be accelerated by a preconditioner (one call to the preconditioner at each Krylov subspace iteration).
Your input on how to further develop my notes, or on taking an alternative route, would be valuable here.
On the matrix format, though, I disagree. The LDU matrix format does not allow for any fill-in; ILU preconditioning allowing some form of fill-in, or Galerkin coarsening (in GAMG), is thus hard to accomplish. OpenFOAM furthermore does not allow the transparent profiling of linear solver performance that e.g. PETSc allows (using -log_summary). 
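The four steps above could be prototyped outside OpenFOAM first. A minimal sketch of my own (1D Laplacian on a uniform mesh with zero-Dirichlet ends, plain NumPy, not actual laplacianFoam code):

```python
import numpy as np

n, dx = 8, 1.0             # cells, uniform spacing
diag = np.zeros(n); lower = np.zeros(n - 1); upper = np.zeros(n - 1)
b = np.ones(n)             # unit source term

# 1. internal-face loop: face f couples cells f and f+1
for f in range(n - 1):
    g = 1.0 / dx           # face coefficient (area/distance = 1/dx in 1D)
    upper[f] = -g; lower[f] = -g
    diag[f] += g; diag[f + 1] += g

# 2. boundary faces: Dirichlet phi=0 at both ends; the boundary face
# sits at distance dx/2 from the cell centre, hence the factor 2/dx
diag[0] += 2.0 / dx; diag[-1] += 2.0 / dx

def amul(x):               # 3. the BLAS-2 kernel the Krylov solver needs
    y = diag * x
    y[:-1] += upper * x[1:]
    y[1:]  += lower * x[:-1]
    return y

# unpreconditioned CG, i.e. what solve() does for a symmetric
# lduMatrix minus the preconditioner call of step 4
x = np.zeros(n); r = b - amul(x); p = r.copy()
rs = r @ r
for _ in range(2 * n):
    Ap = amul(p); alpha = rs / (p @ Ap)
    x += alpha * p; r -= alpha * Ap
    rs_new = r @ r
    if np.sqrt(rs_new) < 1e-12:
        break
    p = r + (rs_new / rs) * p; rs = rs_new

print(np.linalg.norm(b - amul(x)))
```

Step 4 would then replace `p = r + beta*p` by a preconditioned direction, one preconditioner application per iteration.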

October 14, 2022, 15:38 

#6 
Senior Member
Klaus
Join Date: Mar 2009
Posts: 227
Rep Power: 20 
If these limitations exist, it makes sense.
I wouldn't only consider PETSc. I think it would make sense to also look at Trilinos and Hypre, even though Hypre preconditioners can be accessed via PETSc, too. Many OpenFOAM-Trilinos specifics are covered in the thesis by Bob Dröge titled "A software interface for fully implicit flow simulations on block-structured grids", which can be found online, in particular in chapter 5. Another very strong but less known contender is GaspiLS (http://gaspils.de/); see the GaspiLS-PETSc-Hypre scalability comparisons on the main webpage. 

October 17, 2022, 05:47 

#7 
Senior Member

Dear Klaus,
1/ Valuable information on the implementation of boundary conditions in OpenFOAM is in Section 18.2 of the Moukalled-Mangani-Darwish book; 2/ I do see some information on Trilinos and OpenFOAM in the master's thesis of Dröge, but information on how to couple the two remains limited. I have previous experience using PETSc; I will continue to expand the notes and get back here. Kind wishes, Domenico. 

October 17, 2022, 09:56 

#8 
Senior Member
Klaus
Join Date: Mar 2009
Posts: 227
Rep Power: 20 
Dear Domenico,
there's already the PETSc4FOAM extension. When scaling is the objective, GaspiLS (http://gaspils.de/) is probably the better choice. BR, Klaus 

October 17, 2022, 10:27 

#9 
Senior Member

Dear Klaus,
Thank you for getting in touch. I am aware of PETSc4FOAM. I am trying to better understand why the speedup that PETSc4FOAM achieves remains limited, and how to use more advanced features of PETSc such as DMPlex (to avoid the conversion to CSR after discretization) or FieldSplit (for Schur complement preconditioning). Can you please elaborate on GaspiLS (or Ginkgo) versus the GPU access that PETSc (or Trilinos) provides? Thank you. Kind wishes. Domenico. 

October 17, 2022, 11:22 

#10 
Senior Member
Klaus
Join Date: Mar 2009
Posts: 227
Rep Power: 20 
GaspiLS uses GPI-2 (http://www.gpisite.com/), not MPI, for parallel operations, which follows a more non-blocking communication concept. I can't say whether GPUs are supported yet.
Ginkgo is a library designed for GPU computing, and there's an experimental OpenFOAM extension supporting both Nvidia and AMD GPUs. PETSc provides GPU offloading via CUDA to Nvidia GPUs and via OpenCL to AMD GPUs. Why should it be a lot faster? I have heard comments that it should be, but no reasons. Trilinos offers GPU support (see: https://trilinos.github.io/mpi_x.html), but I have been struggling with the Trilinos documentation in general and have not been able to implement it. It apparently supports Nvidia, Intel and AMD hardware. Maybe RapidCFD is the way to go when it comes to GPU computing. 


