# Is there a way to order nodes in a mesh to efficiently compute sparse matrices?

 Register Blogs Members List Search Today's Posts Mark Forums Read

 December 6, 2017, 10:16 Is there a way to order nodes in a mesh to efficiently compute sparse matrices? #1 Senior Member   Hector Redal Join Date: Aug 2010 Location: Madrid, Spain Posts: 217 Rep Power: 11 Hi, I don't know if I am going to explain my problem clear enough or my question makes no sense. Suppose there is a mesh created or generated by hand or by any software. The identifers provided to the nodes of the this mesh have been assigned randomly / arbitrarily. The CFD software takes this mesh as input and creates some sparse matrices. Obviously, the order assigned to the nodes affects the performance cost of the calculation / solution of the problem. If I manage to constructed a narrow banded (almost diagonal) matrix, the performance cost when calculating the inverse of the matrix (for example) will definitelly be smaller than any other matrix (constructed for the same mesh). Are there any algorithms I can check / read to implement some kind of preprocessing step so as to order the nodes of the mesh when building the sparse matrix of the CFD software? Are there any paper or reading I can take a look at? Thanks in advance for your kind support. Best regards, Hector. Quasar_89 likes this.

 December 6, 2017, 10:41 #2 Senior Member   Filippo Maria Denaro Join Date: Jul 2010 Posts: 4,269 Rep Power: 46 HectorRedal likes this.

 December 6, 2017, 10:42 #3 Senior Member     Paolo Lampitella Join Date: Mar 2009 Location: Italy Posts: 895 Blog Entries: 17 Rep Power: 24 There is really a lot of material on this. Basically, any book on solving linear equation systems has some section on the proper ordering of the variables. The reverse Cuthill–McKee algorithm (RCM) is one such orderings based only on the graph of the matrix (i.e., the connectivity of the mesh, not its coordinates). Space Filling Curves are another mean to order "stuff" in space and only require local coordinates of the objects to sort. Thus they are suitable for matrices coming from PDEs discretized in space. HectorRedal likes this.

December 6, 2017, 10:58
#4
Senior Member

Michael Prinkey
Join Date: Mar 2009
Location: Pittsburgh PA
Posts: 363
Rep Power: 18
Quote:
 Originally Posted by sbaffini There is really a lot of material on this. Basically, any book on solving linear equation systems has some section on the proper ordering of the variables. The reverse Cuthill–McKee algorithm (RCM) is one such orderings based only on the graph of the matrix (i.e., the connectivity of the mesh, not its coordinates). Space Filling Curves are another mean to order "stuff" in space and only require local coordinates of the objects to sort. Thus they are suitable for matrices coming from PDEs discretized in space.
I've found space filling curves to be every bit as good, if not better than RCM. RCM gives the lowest bandwidth result which is exactly what you want for avoiding fill-in for a banded algorithm. But SFC seem to do a better job of reducing "average" bandwidth allowing that cache misses/parallel communication must occur, it seems to reduce the average span of the matrix at the expense of more severe outliers. But if an outlier is not in cache, it doesn't matter much where it is. Of course, this is just my experience. I don't have any solid theoretical/experimental results to point to.

I have looked a using SFC to create blocks of a characteristic (say L2) cache size and then maybe RCM on each of those blocks.

 December 6, 2017, 12:09 #5 Senior Member     Paolo Lampitella Join Date: Mar 2009 Location: Italy Posts: 895 Blog Entries: 17 Rep Power: 24 If I remember correctly, RCM might have some advantage for LU factorizations or the sort of. However, in addition to what Michael has stated, SFC orderings have also the advantage of producing an ordering also suitable for straightforward parallel partitioning and more (there is a paper by aftosmis on this).

 December 6, 2017, 17:35 #6 Senior Member   Hector Redal Join Date: Aug 2010 Location: Madrid, Spain Posts: 217 Rep Power: 11 Hi, Thank you all for your help and support! I have been digging out a bit in the internet using the information you have provided to me (Cuthill–McKee algorithm and Space Filling Curves), and I have found that all these methods are suitable for SparseLU solvers. I have just one doubt, can these methods be used with BiCGSTAB solvers? Maybe my question makes no sense, so I beg your pardon if this is the case. Best regards.

December 6, 2017, 21:14
#7
Senior Member

Michael Prinkey
Join Date: Mar 2009
Location: Pittsburgh PA
Posts: 363
Rep Power: 18
Quote:
 Originally Posted by HectorRedal Hi, Thank you all for your help and support! I have been digging out a bit in the internet using the information you have provided to me (Cuthill–McKee algorithm and Space Filling Curves), and I have found that all these methods are suitable for SparseLU solvers. I have just one doubt, can these methods be used with BiCGSTAB solvers? Maybe my question makes no sense, so I beg your pardon if this is the case. Best regards.
Matrix ordering WILL NOT change the efficacy of BiCGStab or any other Krylov method in terms of increasing or decreasing the number of iterations required. The same Krylov spaces will be generated (subject to the same reordering of the solution and subspace vectors). The only potential changes *could* be due to round-off errors from the change in order of operations--but neither RCM or SFC address this issue in any fashion.

For (Super)-LU schemes or a wide-band Thomas algorithms that generate direct (aka exact) rather than iterative solutions to the linear system, there is a big benefit for RCM and the resulting reduced matrix bandwidth. The scaling of the solution of, say, a Thomas algorithm is O(N*B^2) where N is the number of unknowns and B is the bandwidth of the matrix. If you can reorder and reduce B by some significant factor, you reduce the runtime of the exact Thomas solution of the reordered matrix by that factor squared. Fill-in storage is significantly reduced as well.

For iterative schemes, the value of matrix re-ordering is ALL cache related. Matrices with smaller bandwidths tend to have more cache reuse and fewer pipeline stalls due to cache misses. Better cache usages tends to make for faster runtimes and more efficient solution schemes. But these efficiency improvements are NOT algorithmic...they are artifacts of the memory hierarchy in modern CPU and memory subsystems. And some CPU, cache, and memory configurations may see more benefit from re-ordering than others. This is deep in the weeds of high-performance coding, but these are where the distinctions lie.

December 7, 2017, 02:28
#8
Senior Member

Hector Redal
Join Date: Aug 2010
Posts: 217
Rep Power: 11
Quote:
 Originally Posted by mprinkey Matrix ordering WILL NOT change the efficacy of BiCGStab or any other Krylov method in terms of increasing or decreasing the number of iterations required. The same Krylov spaces will be generated (subject to the same reordering of the solution and subspace vectors). The only potential changes *could* be due to round-off errors from the change in order of operations--but neither RCM or SFC address this issue in any fashion. For (Super)-LU schemes or a wide-band Thomas algorithms that generate direct (aka exact) rather than iterative solutions to the linear system, there is a big benefit for RCM and the resulting reduced matrix bandwidth. The scaling of the solution of, say, a Thomas algorithm is O(N*B^2) where N is the number of unknowns and B is the bandwidth of the matrix. If you can reorder and reduce B by some significant factor, you reduce the runtime of the exact Thomas solution of the reordered matrix by that factor squared. Fill-in storage is significantly reduced as well. For iterative schemes, the value of matrix re-ordering is ALL cache related. Matrices with smaller bandwidths tend to have more cache reuse and fewer pipeline stalls due to cache misses. Better cache usages tends to make for faster runtimes and more efficient solution schemes. But these efficiency improvements are NOT algorithmic...they are artifacts of the memory hierarchy in modern CPU and memory subsystems. And some CPU, cache, and memory configurations may see more benefit from re-ordering than others. This is deep in the weeds of high-performance coding, but these are where the distinctions lie.
Thanks a lot for your clear and cristal explanation.
I have perfectly grasped the main point of the subject under discussion.

Just more question: Which algorithms is quicker / faster?
Using a wide-band Thomas with RCM algorithm or a Krylov method? (If they are comparable)

Can they be compared? Or are we talking about things that are not comparable?

December 7, 2017, 03:49
#9
Senior Member

Filippo Maria Denaro
Join Date: Jul 2010
Posts: 4,269
Rep Power: 46
Quote:
 Originally Posted by HectorRedal Thanks a lot for your clear and cristal explanation. I have perfectly grasped the main point of the subject under discussion. Just more question: Which algorithms is quicker / faster? Using a wide-band Thomas with RCM algorithm or a Krylov method? (If they are comparable) Can they be compared? Or are we talking about things that are not comparable?

The issue is about what do you mean for quicker/faster. Depending on the method, CPU time and number of iterations can produce strange and conflicting results

December 7, 2017, 06:57
#10
Senior Member

Michael Prinkey
Join Date: Mar 2009
Location: Pittsburgh PA
Posts: 363
Rep Power: 18
Quote:
 Originally Posted by FMDenaro The issue is about what do you mean for quicker/faster. Depending on the method, CPU time and number of iterations can produce strange and conflicting results
Indeed. 8)

In terms of practical use in CFD applications, Krylov solvers (CG for symmetric positive definite matrices, BiCGStab and GMRES for non-symmetric matrices) are state of the art. Other alternatives are pure multigrid iterative schemes. The best solvers for large systems use multigrid iteration as a precondition for the Krylov space solver and reap the synergistic benefit of both approaches.

Direct/Exact solvers like LU/Thomas algorithms are not regularly used in CFD for the solution of 2D/3D problems. They are only used in reduced-dimension approximations like Alternate Direction Implicit methods or for line relaxation in smoother/preconditioner schemes...but those applications usually have a bandwidth of 3 for 2nd-order schemes or proportionally larger for higher order schemes. Again, these are solving only 1D sub-sets of the larger 2D/3D problem and are used in an iterative fashion or used in conjunction with a dimensional reduction scheme, say by using FFT techniques in the other spatial dimensions. The only other CFD application I can think of for banded direct solvers is in the evaluation of the compact finite differences.

Outside of CFD, RCM with direct solver methods may be used to do more than solve the linear system. For example, a reduced bandwidth matrix will be easier to extract eigenvalues/vectors which is useful, say, for identifying vibrational frequency and mode shapes in elastic structures.

 December 7, 2017, 07:07 #11 Senior Member   Filippo Maria Denaro Join Date: Jul 2010 Posts: 4,269 Rep Power: 46 My experience is somehow depressing me...For standard 3D second order elliptic operators any try to get the best CPU time performance out of my optimized SOR (parallelized with black&white assumption) failed. That happened at least for number of unknowns

December 7, 2017, 08:05
#12
Senior Member

Michael Prinkey
Join Date: Mar 2009
Location: Pittsburgh PA
Posts: 363
Rep Power: 18
Quote:
 Originally Posted by FMDenaro My experience is somehow depressing me...For standard 3D second order elliptic operators any try to get the best CPU time performance out of my optimized SOR (parallelized with black&white assumption) failed. That happened at least for number of unknowns
If you have hard-coded the matrix coefficients (-1,-1,-1,6,-1,-1,-1) and you use red/black coloring, that gives loops that are very easy for a compiler to optimize using unrolling, loop blocking, and many other tricks. Also, you only have two vectors being accessed...the RHS and the solution, so it is highly likely that your problem fits mostly in cache. I ran into a similar problem with a code I worked on many years ago. The code was ancient and not well designed. It used SOR, so I tried to replace it with a canned ICCG solver and it ran (slightly) slower, just because their problem sizes were so small, SOR tucked into L2 cache and ICCG didn't. These are important issues, IMO, especially because of the push to larger core counts and GPU architectures. What algorithm *does* get the fastest time to solution as we end up with way more cores available than we can possibly use efficiently with older algorithms.

Having said that, compilers should be able to optimize Geometric Multigrid loops in much the same way as your Red-Black SOR loops. I'd be surprised if a GMG solver using RB GS couldn't best your SOR solver even on small cases, just because the convergence rate is higher, the storage is only log(N) larger, and loops exhibit the same structure. Of course, I am often wrong about such things.

December 8, 2017, 04:17
#13
Senior Member

Hector Redal
Join Date: Aug 2010
Posts: 217
Rep Power: 11
Quote:
 Originally Posted by mprinkey Indeed. 8) In terms of practical use in CFD applications, Krylov solvers (CG for symmetric positive definite matrices, BiCGStab and GMRES for non-symmetric matrices) are state of the art. Other alternatives are pure multigrid iterative schemes. The best solvers for large systems use multigrid iteration as a precondition for the Krylov space solver and reap the synergistic benefit of both approaches. Direct/Exact solvers like LU/Thomas algorithms are not regularly used in CFD for the solution of 2D/3D problems. They are only used in reduced-dimension approximations like Alternate Direction Implicit methods or for line relaxation in smoother/preconditioner schemes...but those applications usually have a bandwidth of 3 for 2nd-order schemes or proportionally larger for higher order schemes. Again, these are solving only 1D sub-sets of the larger 2D/3D problem and are used in an iterative fashion or used in conjunction with a dimensional reduction scheme, say by using FFT techniques in the other spatial dimensions. The only other CFD application I can think of for banded direct solvers is in the evaluation of the compact finite differences. Outside of CFD, RCM with direct solver methods may be used to do more than solve the linear system. For example, a reduced bandwidth matrix will be easier to extract eigenvalues/vectors which is useful, say, for identifying vibrational frequency and mode shapes in elastic structures.
Thanks for the information.
Right now the algorithm I have implemented is using a BiCGStab for solving the linear systems used by my application. As pre-conditioner I am using IncompleteLUT.
I am not using any kind of multigrid in my application, so I understand my approach is fair enough.
I would like to comment also that I am using non-uniform meshes, based on TRIA elements that discretize the domain / problem.

 Thread Tools Display Modes Linear Mode

 Posting Rules You may not post new threads You may not post replies You may not post attachments You may not edit your posts BB code is On Smilies are On [IMG] code is On HTML code is OffTrackbacks are On Pingbacks are On Refbacks are On Forum Rules

 Similar Threads Thread Thread Starter Forum Replies Last Post thezack Siemens 7 October 12, 2016 11:14 quarkz Main CFD Forum 30 December 26, 2011 07:12 sc298 OpenFOAM Native Meshers: snappyHexMesh and Others 2 March 27, 2011 21:11 Remy Main CFD Forum 1 December 22, 2008 04:49 Joe CFX 2 March 26, 2007 18:10

All times are GMT -4. The time now is 23:36.