How is the lduMatrix distributed in MPI parallelism?
#1
New Member
Junchao Zhang
Join Date: Jan 2025
Location: U.S.
Posts: 15
Rep Power: 2
Hello,
I want to understand the MPI-parallel distribution of an lduMatrix. I've read the post at https://www.linkedin.com/pulse/openf...menico-lahaye/ , especially "3/1/15: Parallel Partitioned Matrices", but I still could not follow it.

I want to build a MATMPIAIJ matrix in PETSc with its MatSetValue(A, i, j, value, mode), where i, j are global indices and mode can be ADD_VALUES or INSERT_VALUES. From my understanding of OpenFOAM, I only need INSERT_VALUES, since each coefficient is computed exactly once by one processor, so nothing needs to be accumulated within a processor or across processors. Is that correct?

In particular, I am confused by the lduInterface code. I know of the repo https://develop.openfoam.com/modules/external-solver but the code seems to have disappeared; fortunately, I have a clone. It seems the code beginning at https://gitlab.com/petsc/petsc4foam/...ype=heads#L632 handles lduInterface, but I don't understand it. What is communicated by `initInternalFieldTransfer`, and why does one need to take `-bCoeffs[i]`? Is bCoeffs[] the result of the communication? Could someone shed some light on this?

Thank you! -Junchao
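P.S. For concreteness, this is roughly the assembly pattern I have in mind on each rank. It is only a sketch: the preallocation numbers and the names nLocalCells, globalRow, globalCol and coeff are placeholders, not taken from petsc4foam.
Code:
#include <petscmat.h>

// Sketch: create a parallel AIJ matrix and set entries by global (row, col) indices.
Mat A;
MatCreate(PETSC_COMM_WORLD, &A);
MatSetSizes(A, nLocalCells, nLocalCells, PETSC_DETERMINE, PETSC_DETERMINE);
MatSetType(A, MATMPIAIJ);
MatMPIAIJSetPreallocation(A, 7, NULL, 3, NULL);   // placeholder nonzero counts per row

// One call per coefficient; INSERT_VALUES if every (row, col) pair is written only once.
MatSetValue(A, globalRow, globalCol, coeff, INSERT_VALUES);

MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);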
#2
Senior Member
Klaus
Join Date: Mar 2009
Posts: 293
Rep Power: 23
I think you are missing a point. OpenFOAM and PETSc use very different matrix storage concepts, and their algorithms are optimized for those concepts. OpenFOAM uses an LDU decomposition stored in a COO-like format (plus some additional addressing information stored elsewhere), and its algorithms are optimized for this decomposition of the matrix. PETSc, on the other hand, assigns a block of matrix rows to each process/processor, split into two parts: the diagonal block part and the off-diagonal elements part. Both parts are stored separately in CSR format, and each process works with these two parts.
You are right to look at external-solver / petsc4foam, which handle the conversion efficiently.
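Roughly, the OpenFOAM side looks like this (only a sketch of the internal coefficients, ignoring boundary/interface contributions; `matrix` stands for an lduMatrix):
Code:
// Sketch: the lduMatrix stores the diagonal plus one upper and one lower
// coefficient per internal face; the face-to-cell addressing acts like COO.
const lduAddressing& addr = matrix.lduAddr();
const labelUList& lowerAddr = addr.lowerAddr();   // owner cell of each internal face
const labelUList& upperAddr = addr.upperAddr();   // neighbour cell of each internal face

forAll(upperAddr, facei)
{
    const label l = lowerAddr[facei];
    const label u = upperAddr[facei];
    // Two off-diagonal entries per internal face:
    //   a(u, l) = matrix.lower()[facei]
    //   a(l, u) = matrix.upper()[facei]
}
// plus a(i, i) = matrix.diag()[i] for every cell i.
//
// PETSc instead stores each rank's rows as two CSR blocks: a "diagonal" block
// (columns owned by this rank) and an "off-diagonal" block (columns owned by
// other ranks).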
#3
New Member
Junchao Zhang
Join Date: Jan 2025
Location: U.S.
Posts: 15
Rep Power: 2
Hi Klaus,
I understand the diagonal (A) / off-diagonal (B) storage in PETSc. For an M x N matrix with global indices i, j, the function MatSetValue(mat, i, j, value, mode) either inserts or adds the value to entry A_{ij}. The entry can land in the on-processor A or B, or in an off-processor (remote) A or B. The operation and any MPI communication are handled internally by PETSc without user intervention, so that concept is quite clear. One can call MatSetValue(mat, i, j, value, ADD_VALUES) with the same (i, j) multiple times (possibly from multiple processors) to sum up partial values (see the small sketch at the end of this post).

What I don't understand is how I should call MatSetValue() to convert an lduMatrix to a global CSR matrix in PETSc. In particular, what role does lduInterface play? Will a matrix entry receive multiple contributions (i.e., do I need ADD_VALUES, or simply INSERT_VALUES)?

Thank you! -Junchao
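P.S. Here is the kind of accumulation I mean with ADD_VALUES (a toy sketch, not OpenFOAM or petsc4foam code; A is an MPIAIJ matrix being assembled and rank is the MPI rank):
Code:
// Two ranks each add a partial value into the same global entry (3, 5).
// After assembly, A(3,5) holds the sum of both contributions.
if (rank == 0) MatSetValue(A, 3, 5, 0.25, ADD_VALUES);
if (rank == 1) MatSetValue(A, 3, 5, 0.75, ADD_VALUES);
MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);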
#4
Senior Member
Klaus
Join Date: Mar 2009
Posts: 293
Rep Power: 23
You have to use the OpenFOAM globalIndex functionality, which will give you the global indices that you then feed into PETSc.
You should look into the petsc4foam code, where you'll probably find all the details of the proper LDU to PETSc CSR conversion using the globalIndex functionality. I think when you mention the interfaces, you mean the processor interfaces, which hold some contributions that are often overlooked and for which you have to flip the sign (+/-), since they come from the other side of the processor interface. I hope that's clear enough; it has been years since I worked with it.
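Roughly, globalIndex is used like this (just a sketch of the idea; see petsc4foam for the real implementation):
Code:
#include "globalIndex.H"

// Build a global cell numbering from the local cell count; each rank owns a
// contiguous block of global indices.
const Foam::globalIndex globalCells(mesh.nCells());

// Global index of a local cell, i.e. the global row number you hand to PETSc.
const Foam::label gRow = globalCells.toGlobal(celli);

// globalCells.localStart() and globalCells.localSize() give this rank's row
// range, which corresponds to PETSc's local row block.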
#5
New Member
Junchao Zhang
Join Date: Jan 2025
Location: U.S.
Posts: 15
Rep Power: 2
Hi Klaus,
Yes, I meant the processor interfaces. I have read the petsc4foam code but could not understand it, probably because I don't have a math background, so I may need some further explanation from you. I read the book "The Finite Volume Method in Computational Fluid Dynamics" by F. Moukalled et al. I like it very much, but unfortunately it doesn't cover MPI. I have borrowed a figure from the book to illustrate my question. The original figure shows the matrix assembly on a single processor; I added a red line to it. Let's say that along the red line the mesh is partitioned between two processors, P3 and P5. Look again at element/cell 3. In my understanding, to compute a_{35}, P3 needs to get the coordinates and global index of cell 5 from P5. Where does the "contributions and +/-" business come from? What did you mean by contributions?

Thank you!

[Attached figure: the matrix-assembly example from Moukalled et al., with a red line added to mark the partition boundary between P3 and P5.]
#6
Senior Member
Klaus
Join Date: Mar 2009
Posts: 293
Rep Power: 23
Regarding parallelism, search online for the presentation "Handling Parallelisation in OpenFOAM".
Here's the matrix-conversion code from an amgcl (AMG solver library) implementation, which shows the interface handling needed:
Code:
void Foam::myAMGCLSOLVER::crs
(
    const scalarField& source,
    std::vector<double>& val,
    std::vector<int>& col,
    std::vector<double>& rhs
) const
{
    const lduAddressing& lduAddr = matrix_.mesh().lduAddr();
    const lduInterfacePtrsList interfaces(matrix_.mesh().interfaces());
    const labelUList& uPtr = lduAddr.upperAddr();
    const labelUList& lPtr = lduAddr.lowerAddr();
    const label n = lduAddr.size();

    const scalarField& diag = matrix_.diag();
    const scalarField& upperPtr = matrix_.upper();
    const scalarField& lowerPtr =
    (
        matrix_.hasLower()
      ? matrix_.lower()
      : matrix_.upper()
    );

    // Values of nonzero entries
    val.resize(ptr_[ptr_.size()-1]);
    // Column numbers of nonzero entries
    col.resize(val.size());

    std::vector<int> offsets(ptr_);

    // Store diagonal first (not needed per se but easier when adding
    // boundary contributions)
    for (label celli = 0; celli < n; celli++)
    {
        int& index = offsets[celli];
        col[index] = globalNumbering_.toGlobal(celli);
        val[index++] = diag[celli];
    }

    // Store connections to lower numbered cells
    forAll(uPtr, facei)
    {
        int& index = offsets[uPtr[facei]];
        col[index] = globalNumbering_.toGlobal(lPtr[facei]);
        val[index++] = lowerPtr[facei];
    }

    // Store connections to higher numbered cells
    forAll(lPtr, facei)
    {
        int& index = offsets[lPtr[facei]];
        col[index] = globalNumbering_.toGlobal(uPtr[facei]);
        val[index++] = upperPtr[facei];
    }

    // Store connections to neighbouring processors
    {
        labelList globalCells(n);
        forAll(globalCells, celli)
        {
            globalCells[celli] = globalNumbering_.toGlobal(celli);
        }

        // Initialise transfer of global cells
        forAll(interfaces, patchi)
        {
            if (interfaces.set(patchi))
            {
                interfaces[patchi].initInternalFieldTransfer
                (
                    Pstream::commsTypes::nonBlocking,
                    globalCells
                );
            }
        }

        if (Pstream::parRun())
        {
            Pstream::waitRequests();
        }

        forAll(interfaces, patchi)
        {
            if (interfaces.set(patchi))
            {
                labelField nbrCells
                (
                    interfaces[patchi].internalFieldTransfer
                    (
                        Pstream::commsTypes::nonBlocking,
                        globalCells
                    )
                );

                const labelUList& fc = lduAddr.patchAddr(patchi);
                const scalarField& bCoeffs = interfaceBouCoeffs_[patchi];

                forAll(fc, i)
                {
                    label celli = fc[i];
                    int& index = offsets[celli];

                    col[index] = nbrCells[i];
                    // Note: opposite sign since we're using this sides'
                    // coefficients (from discretising our face; not the
                    // neighbouring, reversed face)
                    val[index++] = -bCoeffs[i];
                }
            }
        }
    }

    // import source (rhs)
    rhs.resize(n);
    std::copy(source.begin(), source.end(), rhs.begin());
}
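If you translate that traversal into PETSc calls, it would look roughly like the following (my sketch, not the actual petsc4foam code; names are as in the routine above, and A is the already-created MATMPIAIJ). Every (row, col) pair is written exactly once, so INSERT_VALUES is sufficient, and the processor-interface entry takes the column index received from the neighbour and the value -bCoeffs[i] from this side:
Code:
// Diagonal entries
for (label celli = 0; celli < n; celli++)
{
    const PetscInt r = globalNumbering_.toGlobal(celli);
    MatSetValue(A, r, r, diag[celli], INSERT_VALUES);
}

// Internal faces: one lower and one upper coefficient per face
forAll(uPtr, facei)
{
    const PetscInt r = globalNumbering_.toGlobal(uPtr[facei]);
    const PetscInt c = globalNumbering_.toGlobal(lPtr[facei]);
    MatSetValue(A, r, c, lowerPtr[facei], INSERT_VALUES);
    MatSetValue(A, c, r, upperPtr[facei], INSERT_VALUES);
}

// Inside the per-patch interface loop above, once nbrCells has been received:
forAll(fc, i)
{
    const PetscInt r = globalNumbering_.toGlobal(fc[i]);
    MatSetValue(A, r, nbrCells[i], -bCoeffs[i], INSERT_VALUES);
}

MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);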
#7
New Member
Junchao Zhang
Join Date: Jan 2025
Location: U.S.
Posts: 15
Rep Power: 2
Hello, Klaus,
The information is very useful. I need to digest it. Thank you very much! -Junchao
#9
New Member
Junchao Zhang
Join Date: Jan 2025
Location: U.S.
Posts: 15
Rep Power: 2
Hi dlahaye,
I don't understand the phrase "interface from P3 to P5", i.e., the upper-right red box. From the wording, it seems to me that the data of the "interface from P3 to P5" is owned/calculated by P3. Then, will P5 contribute to the two dots (entries) in that red box? If not, there should be no communication at all. And then why, in the code posted by Klaus, are there the two transfer calls (interfaces[patchi].initInternalFieldTransfer and internalFieldTransfer)?

Thanks! -Junchao
#10
Senior Member
dlahaye
Communication is required to compute the interfaces (or off-diagonal matrix blocks). Additional communication is required to compute the matrix-vector product when solving the linear system with iterative solution methods. Does this help? The matter is explained in, e.g., Chapter 11 of https://www-users.cse.umn.edu/~saad/...Book_2ndEd.pdf or in this book: https://www.ii.uib.no/~petter/ddbook.html
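Once the matrix has been assembled as a MATMPIAIJ, that second kind of communication (for the matrix-vector product) is handled inside PETSc. Roughly (a sketch):
Code:
// After assembly, y = A*x automatically scatters the required off-process
// entries of x; the off-diagonal block is applied to the received ghost values.
Vec x, y;
MatCreateVecs(A, &x, &y);   // vectors with the same parallel layout as A
// ... fill x ...
MatMult(A, x, y);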
#11
New Member
Junchao Zhang
Join Date: Jan 2025
Location: U.S.
Posts: 15
Rep Power: 2
Thank you! Let me read the code and document.
Tags |
ldumatrix, mpi, petsc |