How the lduMatrix is distributed in MPI parallelism?

#1 | jczhang (Junchao Zhang), New Member | April 25, 2025, 14:36
Hello,

I want to understand the MPI parallel distribution of an lduMatrix. I have read the post at https://www.linkedin.com/pulse/openf...menico-lahaye/, especially "3/1/15: Parallel Partitioned Matrices", but I still could not get it.


I want to build a MATMPIAIJ matrix in PETSc with MatSetValue(A, i, j, value, mode), where i, j are global indices and mode can be ADD_VALUES or INSERT_VALUES. From my understanding of OpenFOAM, I only need INSERT_VALUES, since each coefficient is computed exactly once by one processor, so there is no need to accumulate within a processor or across processors. Is that correct?

In particular, I am confused by the lduInterface code. I know the repo https://develop.openfoam.com/modules/external-solver, but the code somehow disappeared; fortunately, I have a clone. It seems the code beginning at https://gitlab.com/petsc/petsc4foam/...ype=heads#L632 handles lduInterface, but I don't understand it. What is communicated by `initInternalFieldTransfer`, and why does one need to take `-bCoeffs[i]`? Is bCoeffs[] the result of the communication?

Could someone shed some light on this?


Thank you!

-Junchao

#2 | klausb (Klaus), Senior Member | April 27, 2025, 05:02
I think you are missing a point. OpenFOAM and PETSc use very different matrix storage concepts, and their algorithms are optimized for those concepts. OpenFOAM uses an LDU decomposition with a COO-like storage format (plus some additional addressing information stored elsewhere), and its algorithms are optimized for that decomposition of the matrix. PETSc, on the other hand, assigns a block of matrix rows to each process/processor and splits it into two parts: first the diagonal block, and second the off-diagonal entries. Both parts are stored separately in CSR format, and each process works with these two parts.
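To make the PETSc side concrete, the row-block layout is set up roughly like this (a minimal C++ sketch with made-up sizes and per-row counts; it is not code from petsc4foam):

Code:
#include <petscmat.h>

int main(int argc, char** argv)
{
    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

    // Assumed local problem size, e.g. the number of cells on this rank.
    const PetscInt nLocalCells = 1000;

    Mat A;
    PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
    PetscCall(MatSetSizes(A, nLocalCells, nLocalCells,
                          PETSC_DETERMINE, PETSC_DETERMINE));
    PetscCall(MatSetType(A, MATMPIAIJ));

    // The two parts are preallocated separately: roughly 7 entries per row
    // coupling to cells owned by this rank (diagonal block) and roughly 2
    // entries per row coupling to cells owned by other ranks (off-diagonal).
    PetscCall(MatMPIAIJSetPreallocation(A, 7, NULL, 2, NULL));

    // Each rank owns a contiguous block of global rows.
    PetscInt rStart, rEnd;
    PetscCall(MatGetOwnershipRange(A, &rStart, &rEnd));
    PetscCall(PetscPrintf(PETSC_COMM_SELF,
                          "this rank owns global rows [%d, %d)\n",
                          (int)rStart, (int)rEnd));

    PetscCall(MatDestroy(&A));
    PetscCall(PetscFinalize());
    return 0;
}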


You are right to look at external-solver / petsc4foam, which handles the conversion efficiently.

#3 | jczhang (Junchao Zhang), New Member | April 28, 2025, 10:16
Hi, Klaus,

I understand the diagonal (A) / off-diagonal (B) block storage in PETSc. For an M x N matrix with global indices i, j, the function MatSetValue(mat, i, j, value, mode) either inserts or adds the value into entry A_{ij}. The entry can land in the on-processor A or B part, or in an off-processor (remote) part. The operation and any MPI communication are handled internally by PETSc without the user's intervention, so the concept is quite clear. One can call MatSetValue(mat, i, j, value, ADD_VALUES) with the same (i, j) multiple times (possibly from multiple processors) to sum up partial values.
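Just to be concrete about what I mean by summing partial values, here is a tiny standalone C++ sketch of my own (nothing OpenFOAM specific): every rank adds 1.0 into the same global entry (0,0), and after assembly A(0,0) equals the number of ranks.

Code:
#include <petscmat.h>

int main(int argc, char** argv)
{
    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

    // 4x4 global matrix, rows distributed by PETSc across the ranks.
    Mat A;
    PetscCall(MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, 4, 4,
                           1, NULL, 1, NULL, &A));

    // Every rank contributes to the same global entry; PETSc ships the
    // off-process contributions to the owning rank and sums them during
    // assembly.
    PetscCall(MatSetValue(A, 0, 0, 1.0, ADD_VALUES));

    PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

    PetscCall(MatView(A, PETSC_VIEWER_STDOUT_WORLD));
    PetscCall(MatDestroy(&A));
    PetscCall(PetscFinalize());
    return 0;
}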


What I don't understand is how I should call MatSetValue() to convert an lduMatrix to a global CSR matrix in PETSc. In particular, what role does lduInterface play? Will a matrix entry receive multiple contributions (i.e., do I need ADD_VALUES, or simply INSERT_VALUES)?


Thank you!

-Junchao

#4 | klausb (Klaus), Senior Member | April 29, 2025, 19:38
You have to use the OpenFOAM globalIndex functionality, which will give you the global indices that you then feed into PETSc.

You should look into the petsc4foam code, where you'll probably find all the details of the proper LDU-to-PETSc CSR conversion using the globalIndex functionality.
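A rough C++ sketch of the globalIndex part, just to show the idea (untested; petsc4foam does this more carefully):

Code:
#include "lduMatrix.H"
#include "globalIndex.H"
#include "labelList.H"

// Untested sketch: build the local-to-global cell numbering on one rank.
// globalIndex gathers the local sizes over MPI, so rank r starts at the sum
// of the cell counts of ranks 0..r-1.
Foam::labelList globalCellNumbers(const Foam::lduMatrix& matrix)
{
    const Foam::label nCells = matrix.mesh().lduAddr().size();
    const Foam::globalIndex globalNumbering(nCells);

    Foam::labelList globalCells(nCells);
    forAll(globalCells, celli)
    {
        // local cell index -> global row/column index for the PETSc matrix
        globalCells[celli] = globalNumbering.toGlobal(celli);
    }
    return globalCells;
}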


I think when you mention the interfaces, you mean the processor interfaces. They hold contributions that are often overlooked, and you have to flip the sign (+/-) because the coefficient belongs to the other side of the processor interface. I hope that's clear enough; it has been years since I worked with it.

#5 | jczhang (Junchao Zhang), New Member | April 30, 2025, 10:13
Hi, Klaus,
Yes, I meant the processor interfaces. I have read the petsc4foam code, but I could not understand it, probably because I don't have a math background, so I might need your further explanation.

I read the book "The Finite Volume Method in Computational Fluid Dynamics" by F. Moukalled et al. I like it very much, but unfortunately it doesn't discuss MPI.

I borrowed a figure from the book to illustrate my question. The original figure shows the matrix assembly on one processor; I added a red line to it. Say that along the red line the mesh is partitioned between two processors, P3 and P5, and look again at element/cell 3. In my understanding, to compute a_{35}, P3 needs to get the coordinates and the global index of cell 5 from P5. Where does the "contributions and +/-" business come from? What did you mean by contributions? Thank you!



#6 | klausb (Klaus), Senior Member | April 30, 2025, 14:35
Regarding parallelism, search online for the presentation "Handling Parallelisation in OpenFOAM".

Here's the matrix conversion code from an amgcl (AMG solver library) implementation, which shows the interface handling needed:


Code:
void Foam::myAMGCLSOLVER::crs
(
    const scalarField& source,
    std::vector<double>& val,
    std::vector<int>& col,
    std::vector<double>& rhs
) const
{
    const lduAddressing& lduAddr = matrix_.mesh().lduAddr();
    const lduInterfacePtrsList interfaces(matrix_.mesh().interfaces());
    const labelUList& uPtr = lduAddr.upperAddr();
    const labelUList& lPtr = lduAddr.lowerAddr();
    const label n = lduAddr.size();
    const scalarField& diag = matrix_.diag();
    const scalarField& upperPtr = matrix_.upper();

    const scalarField& lowerPtr =
    (
        matrix_.hasLower()
      ? matrix_.lower()
      : matrix_.upper()
    );

    // Values of nonzero entries
    val.resize(ptr_[ptr_.size()-1]);
    // Column numbers of nonzero entries
    col.resize(val.size());

    std::vector<int> offsets(ptr_);
    // Store diagonal first (not needed per se but easier when adding
    // boundary contributions)
    for (label celli = 0; celli < n; celli++)
    {
        int& index = offsets[celli];
        col[index] = globalNumbering_.toGlobal(celli);
        val[index++] = diag[celli];
    }
    // Store connections to lower numbered cells
    forAll(uPtr, facei)
    {
        int& index = offsets[uPtr[facei]];
        col[index] = globalNumbering_.toGlobal(lPtr[facei]);
        val[index++] = lowerPtr[facei];
    }

    // Store connections to higher numbered cells
    forAll(lPtr, facei)
    {
        int& index = offsets[lPtr[facei]];
        col[index] = globalNumbering_.toGlobal(uPtr[facei]);
        val[index++] = upperPtr[facei];
    }

    // Store connections to neighbouring processors
    {
        labelList globalCells(n);
        forAll(globalCells, celli)
        {
            globalCells[celli] = globalNumbering_.toGlobal(celli);
        }

        // Initialise transfer of global cells
        forAll(interfaces, patchi)
        {
            if (interfaces.set(patchi))
            {
                interfaces[patchi].initInternalFieldTransfer
                (
                    Pstream::commsTypes::nonBlocking,
                    globalCells
                );
            }
        }

        if (Pstream::parRun())
        {
            Pstream::waitRequests();
        }

        forAll(interfaces, patchi)
        {
            if (interfaces.set(patchi))
            {
                labelField nbrCells
                (
                    interfaces[patchi].internalFieldTransfer
                    (
                        Pstream::commsTypes::nonBlocking,
                        globalCells
                    )
                );

                const labelUList& fc = lduAddr.patchAddr(patchi);
                const scalarField& bCoeffs = interfaceBouCoeffs_[patchi];
                forAll(fc, i)
                {
                    label celli = fc[i];
                    int& index = offsets[celli];
                    col[index] = nbrCells[i];

                    // Note: opposite sign since we're using this side's
                    //       coefficients (from discretising our face; not the
                    //       neighbouring, reversed face)
                    val[index++] = -bCoeffs[i];
                }
            }
        }
    }

    // import source (rhs)
    rhs.resize(n);

    std::copy(source.begin(), source.end(), rhs.begin());
}
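If you wanted to push those rows into a PETSc matrix instead of amgcl, the loop would look roughly like the following. This is an untested C++ sketch of my own, not code from petsc4foam: it assumes a MATMPIAIJ matrix A has already been created with lduAddr.size() local rows, and that the ptr/col/val arrays and the globalIndex are the ones built above.

Code:
#include <petscmat.h>
#include <vector>
#include "globalIndex.H"

// Untested sketch: hand rows in the CSR-like layout built by crs() above to
// a PETSc matrix. Every row is a local cell and every coefficient is written
// exactly once, so INSERT_VALUES is sufficient; columns that live on a
// neighbouring rank simply end up in this rank's off-diagonal block.
PetscErrorCode copyToPetsc
(
    Mat A,
    const Foam::globalIndex& globalNumbering,
    const std::vector<int>& ptr,        // row offsets, size n+1
    const std::vector<int>& col,        // global column indices
    const std::vector<double>& val,     // matrix coefficients
    const Foam::label n                 // number of local cells (rows)
)
{
    for (Foam::label celli = 0; celli < n; celli++)
    {
        const PetscInt globalRow = globalNumbering.toGlobal(celli);

        // Copy into PETSc types in case PetscInt/PetscScalar differ from
        // int/double in your build.
        std::vector<PetscInt> cols(col.begin() + ptr[celli],
                                   col.begin() + ptr[celli + 1]);
        std::vector<PetscScalar> vals(val.begin() + ptr[celli],
                                      val.begin() + ptr[celli + 1]);

        PetscCall(MatSetValues
        (
            A,
            1, &globalRow,
            static_cast<PetscInt>(cols.size()), cols.data(),
            vals.data(),
            INSERT_VALUES
        ));
    }

    PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
    return 0;
}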

#7 | jczhang (Junchao Zhang), New Member | April 30, 2025, 17:34
Hello, Klaus,
The information is very useful. I need to digest it.

Thank you very much!

-Junchao

#8 | dlahaye (Domenico Lahaye), Senior Member | May 1, 2025, 03:36
Does the attached figure help?
[Attached image: ldu_matrix.png]

#9 | jczhang (Junchao Zhang), New Member | May 1, 2025, 12:25
Hi, dlahaye
I don't understand the phrase "interface from P3 to P5", i.e., the upper-right red box. From the wording, it seems to me that the data of the "interface from P3 to P5" is owned/calculated by P3. Then, will P5 contribute to the two dots (entries) in that red box? If not, there should be no communication at all. So why, in the code posted by Klaus, are there the two calls interfaces[patchi].initInternalFieldTransfer and interfaces[patchi].internalFieldTransfer?

Thanks!
-Junchao

#10 | dlahaye (Domenico Lahaye), Senior Member | May 1, 2025, 13:25
Quote:
Originally Posted by jczhang
Then, will P5 contribute to the two dots (entries) in that red box?
-Junchao
Yes, true, correct. Communication between P3 and P5 is required to compute the two off-diagonal blocks. This is independent of who "owns" the off-diagonal blocks.

Communication is required to compute the interfaces (or off-diagonal matrix blocks).

Additional communication is required to compute the matrix-vector product in solving the linear system by iterative solution methods.

Does this help?

The matter is explained in e.g. Chapter 11 of https://www-users.cse.umn.edu/~saad/...Book_2ndEd.pdf

or in this book: https://www.ii.uib.no/~petter/ddbook.html.
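If it helps to see the matrix-vector product point in PETSc terms, here is a tiny C++ sketch of my own (nothing OpenFOAM specific): a 1D chain matrix whose couplings across the ownership boundaries land in the off-diagonal blocks, so MatMult has to gather halo values of x from the neighbouring ranks before it can multiply.

Code:
#include <petscmat.h>

int main(int argc, char** argv)
{
    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

    PetscInt rStart, rEnd, M;
    Mat A;
    Vec x, y;

    // 1D "chain": row i couples to i-1 and i+1, so with several ranks the
    // couplings across ownership boundaries end up in the off-diagonal block.
    PetscCall(MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, 8, 8,
                           3, NULL, 2, NULL, &A));
    PetscCall(MatGetOwnershipRange(A, &rStart, &rEnd));
    PetscCall(MatGetSize(A, &M, NULL));
    for (PetscInt i = rStart; i < rEnd; i++)
    {
        PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
        if (i > 0)     PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
        if (i < M - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
    }
    PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

    PetscCall(MatCreateVecs(A, &x, &y));
    PetscCall(VecSet(x, 1.0));
    // PETSc gathers the halo values of x from neighbouring ranks internally
    // before multiplying with the off-diagonal block.
    PetscCall(MatMult(A, x, y));
    PetscCall(VecView(y, PETSC_VIEWER_STDOUT_WORLD));

    PetscCall(VecDestroy(&x));
    PetscCall(VecDestroy(&y));
    PetscCall(MatDestroy(&A));
    PetscCall(PetscFinalize());
    return 0;
}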

#11 | jczhang (Junchao Zhang), New Member | May 2, 2025, 00:20
Thank you! Let me read the code and document.

#12 | dlahaye (Domenico Lahaye), Senior Member | May 2, 2025, 02:05
Please get in touch with follow-up questions.

Tags: ldumatrix, mpi, petsc
