Home > Forums > Main CFD Forum

Is there a way to order nodes in a mesh to efficiently compute sparse matrices?

#1 | December 6, 2017, 11:16
Hector Redal (Senior Member, Madrid, Spain)
Hi,

I don't know if I am going to explain my problem clearly enough, or whether my question even makes sense.

Suppose there is a mesh, created by hand or generated by some software.
The identifiers assigned to the nodes of this mesh are random / arbitrary.

The CFD software takes this mesh as input and builds some sparse matrices.
Obviously, the ordering assigned to the nodes affects the performance cost of the calculation / solution of the problem.
If I manage to construct a narrow-banded (almost diagonal) matrix, the performance cost when calculating the inverse of the matrix (for example) will definitely be smaller than for any other matrix constructed from the same mesh.

Are there any algorithms I can check / read about, so as to implement a preprocessing step that orders the nodes of the mesh before the CFD software builds the sparse matrix?
Are there any papers or other reading material I could take a look at?

Thanks in advance for your kind support.

Best regards,
Hector.

#2 | December 6, 2017, 11:41
Filippo Maria Denaro (Senior Member)
https://it.mathworks.com/help/matlab...-matrices.html

#3 | December 6, 2017, 11:42
Paolo Lampitella (sbaffini, Senior Member, Italy)
There is really a lot of material on this. Basically, any book on solving linear equation systems has some section on the proper ordering of the variables.

The reverse Cuthill–McKee algorithm (RCM) is one such ordering, based only on the graph of the matrix (i.e., the connectivity of the mesh, not its coordinates).

Space-filling curves (SFC) are another means of ordering "stuff" in space, and they only require the local coordinates of the objects to sort. Thus they are also suitable for matrices coming from PDEs discretized in space.
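As a concrete illustration of the RCM idea (my own sketch, not code from this thread; it assumes SciPy, and uses a randomly shuffled 2D Poisson matrix as a stand-in for a real mesh-connectivity matrix):

```python
# Hedged sketch: RCM reordering of a sparse matrix with SciPy.
# The shuffled 5-point Poisson matrix below mimics a mesh whose nodes
# were numbered arbitrarily by the mesh generator.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    """Maximum |i - j| over the nonzero entries of A."""
    coo = A.tocoo()
    return int(np.abs(coo.row - coo.col).max())

n = 8
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], (n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsr()  # 2D Laplacian

rng = np.random.default_rng(0)
p = rng.permutation(n * n)          # mimic arbitrary node numbering
A_shuffled = A[p][:, p]

perm = reverse_cuthill_mckee(A_shuffled, symmetric_mode=True)
A_rcm = A_shuffled[perm][:, perm]
print("bandwidth:", bandwidth(A_shuffled), "->", bandwidth(A_rcm))
```

The permutation returned by `reverse_cuthill_mckee` is exactly the node renumbering the original question asks about: apply it to the mesh node identifiers before assembling the matrix.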

#4 | December 6, 2017, 11:58
Michael Prinkey (Senior Member, Pittsburgh PA)
Quote: Originally Posted by sbaffini
I've found space-filling curves to be every bit as good as RCM, if not better. RCM gives the lowest-bandwidth result, which is exactly what you want for avoiding fill-in in a banded algorithm. But SFCs seem to do a better job of reducing the "average" bandwidth: accepting that cache misses/parallel communication must occur, they seem to reduce the average span of the matrix at the expense of more severe outliers. And if an outlier is not in cache, it doesn't matter much where it is. Of course, this is just my experience; I don't have any solid theoretical/experimental results to point to.

I have looked at using SFCs to create blocks of a characteristic (say, L2) cache size and then maybe applying RCM within each of those blocks.
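As a hedged sketch of the SFC idea (my own illustration; `morton_key`, the quantization step, and the random cell centers are assumptions, not anything from this thread), a Z-order (Morton) curve orders cells by interleaving the bits of their quantized coordinates:

```python
# Hedged sketch: ordering 2D cell centers along a Z-order (Morton) curve.
# Cells that are close in space tend to get nearby indices, which is the
# cache-locality property discussed above.
import numpy as np

def interleave_bits(v, bits=16):
    """Spread the low `bits` bits of v onto the even bit positions."""
    r = 0
    for i in range(bits):
        r |= ((v >> i) & 1) << (2 * i)
    return r

def morton_key(ix, iy, bits=16):
    """Z-order key for non-negative integer cell coordinates (ix, iy)."""
    return interleave_bits(ix, bits) | (interleave_bits(iy, bits) << 1)

# Quantize arbitrary cell-center coordinates onto a 2^bits integer grid,
# then sort the cells by their Morton keys.
rng = np.random.default_rng(1)
centers = rng.random((10, 2))        # hypothetical cell centers in [0, 1)^2
bits = 10
q = np.minimum((centers * (1 << bits)).astype(int), (1 << bits) - 1)
order = sorted(range(len(centers)),
               key=lambda c: morton_key(int(q[c, 0]), int(q[c, 1]), bits))
```

A Hilbert curve gives somewhat better locality than Z-order, at the cost of a more involved key computation; the sorting step is identical.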

#5 | December 6, 2017, 13:09
Paolo Lampitella (sbaffini, Senior Member, Italy)
If I remember correctly, RCM might have some advantage for LU factorizations and the like. However, in addition to what Michael has stated, SFC orderings have the advantage of also being suitable for straightforward parallel partitioning, and more (there is a paper by Aftosmis on this).

#6 | December 6, 2017, 18:35
Hector Redal (Senior Member, Madrid, Spain)
Hi,

Thank you all for your help and support!

I have been digging around a bit on the internet using the information you provided (the Cuthill–McKee algorithm and space-filling curves), and I have found that all these methods are suitable for SparseLU solvers.
I have just one doubt: can these methods also be used with BiCGSTAB solvers?
Maybe my question makes no sense; I beg your pardon if that is the case.

Best regards.

#7 | December 6, 2017, 22:14
Michael Prinkey (Senior Member, Pittsburgh PA)
Quote: Originally Posted by HectorRedal
Matrix ordering WILL NOT change the efficacy of BiCGStab or any other Krylov method in terms of increasing or decreasing the number of iterations required. The same Krylov spaces will be generated (subject to the same reordering of the solution and subspace vectors). The only potential changes *could* be due to round-off errors from the change in the order of operations, but neither RCM nor SFC addresses this issue in any fashion.

For (Super)LU schemes or wide-band Thomas algorithms, which generate direct (aka exact) rather than iterative solutions to the linear system, there is a big benefit from RCM and the resulting reduced matrix bandwidth. The solution cost of, say, a banded Thomas algorithm scales as O(N*B^2), where N is the number of unknowns and B is the bandwidth of the matrix. If you can reorder and reduce B by some significant factor, you reduce the runtime of the exact solution of the reordered matrix by that factor squared. Fill-in storage is significantly reduced as well.

For iterative schemes, the value of matrix reordering is ALL cache related. Matrices with smaller bandwidths tend to have more cache reuse and fewer pipeline stalls due to cache misses. Better cache usage tends to make for faster runtimes and more efficient solution schemes. But these efficiency improvements are NOT algorithmic...they are artifacts of the memory hierarchy in modern CPU and memory subsystems. And some CPU, cache, and memory configurations may see more benefit from reordering than others. This is deep in the weeds of high-performance coding, but that is where the distinctions lie.
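The ordering-invariance claim can be checked numerically (my sketch, not from the thread; it assumes SciPy's `bicgstab` as a stand-in for any BiCGStab implementation and a simple tridiagonal test matrix):

```python
# Hedged sketch: a symmetric permutation P A P^T with permuted RHS P b
# gives back the permuted solution; the Krylov iteration itself is
# unchanged apart from round-off, as described above.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import bicgstab

n = 200
A = sp.diags([-1.0, 2.1, -1.0], [-1, 0, 1], (n, n), format="csr")
b = np.ones(n)

rng = np.random.default_rng(2)
p = rng.permutation(n)
A_p = A[p][:, p]                 # P A P^T: same system, reordered unknowns
b_p = b[p]                       # P b

x, info = bicgstab(A, b)         # info == 0 means converged
x_p, info_p = bicgstab(A_p, b_p)
# x_p agrees with x[p] to within the solver tolerance.
```

The two solves differ only by a relabeling of the unknowns, so any iteration-count difference comes from round-off, not from the ordering itself.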

#8 | December 7, 2017, 03:28
Hector Redal (Senior Member, Madrid, Spain)
Quote: Originally Posted by mprinkey
Thanks a lot for your crystal-clear explanation.
I have grasped the main point of the subject under discussion.

Just one more question: which algorithm is quicker / faster,
a wide-band Thomas algorithm with RCM, or a Krylov method?

Can they be compared at all, or are we talking about things that are not comparable?

#9 | December 7, 2017, 04:49
Filippo Maria Denaro (Senior Member)
Quote: Originally Posted by HectorRedal

The issue is what you mean by quicker/faster. Depending on the method, CPU time and number of iterations can produce strange and conflicting results.

#10 | December 7, 2017, 07:57
Michael Prinkey (Senior Member, Pittsburgh PA)
Quote: Originally Posted by FMDenaro
Indeed. 8)

In terms of practical use in CFD applications, Krylov solvers (CG for symmetric positive definite matrices, BiCGStab and GMRES for non-symmetric matrices) are state of the art. Other alternatives are pure multigrid iterative schemes. The best solvers for large systems use multigrid iteration as a preconditioner for the Krylov-space solver and reap the synergistic benefits of both approaches.

Direct/exact solvers like LU/Thomas algorithms are not regularly used in CFD for the solution of 2D/3D problems. They are only used in reduced-dimension approximations like Alternating Direction Implicit (ADI) methods, or for line relaxation in smoother/preconditioner schemes...but those applications usually have a bandwidth of 3 for 2nd-order schemes, or proportionally larger for higher-order schemes. Again, these solve only 1D subsets of the larger 2D/3D problem and are used in an iterative fashion, or in conjunction with a dimensional-reduction scheme, say by using FFT techniques in the other spatial dimensions. The only other CFD application I can think of for banded direct solvers is the evaluation of compact finite differences.

Outside of CFD, RCM with direct solver methods may be used for more than solving the linear system. For example, a reduced-bandwidth matrix makes it easier to extract eigenvalues/eigenvectors, which is useful, say, for identifying vibrational frequencies and mode shapes in elastic structures.
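To make the O(N*B^2) scaling mentioned above concrete, here is a minimal sketch (my own illustration, not code from this thread) of the B = 1 case, the classic Thomas algorithm, whose cost is O(N):

```python
# Hedged sketch: the Thomas algorithm, i.e. a banded direct solve with
# bandwidth B = 1, so the O(N * B^2) cost discussed above reduces to O(N).
import numpy as np

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are unused."""
    n = len(diag)
    c = np.asarray(upper, dtype=float).copy()
    d = np.asarray(diag, dtype=float).copy()
    b = np.asarray(rhs, dtype=float).copy()
    for i in range(1, n):                 # forward elimination
        m = lower[i] / d[i - 1]
        d[i] -= m * c[i - 1]
        b[i] -= m * b[i - 1]
    x = np.empty(n)
    x[-1] = b[-1] / d[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        x[i] = (b[i] - c[i] * x[i + 1]) / d[i]
    return x

# 1D Poisson-like test system (the line-relaxation case discussed above).
n = 50
x = thomas(np.full(n, -1.0), np.full(n, 2.0), np.full(n, -1.0), np.ones(n))
```

For bandwidth B > 1, the same elimination runs over B sub-diagonals per row, which is where the B^2 factor comes from.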

#11 | December 7, 2017, 08:07
Filippo Maria Denaro (Senior Member)
My experience is somewhat depressing... For standard 3D second-order elliptic operators, every attempt to beat the CPU-time performance of my optimized SOR (parallelized with black & white, i.e. red-black, coloring) failed. That happened at least for numbers of unknowns < O(10^7).
Yes, I got improvements in terms of numbers of iterations, but as my final goal was to have the fastest solver, I always ended up using the SOR method...

#12 | December 7, 2017, 09:05
Michael Prinkey (Senior Member, Pittsburgh PA)
Quote: Originally Posted by FMDenaro
If you have hard-coded the matrix coefficients (-1,-1,-1,6,-1,-1,-1) and you use red/black coloring, that gives loops that are very easy for a compiler to optimize using unrolling, loop blocking, and many other tricks. Also, you only have two vectors being accessed...the RHS and the solution...so it is highly likely that your problem fits mostly in cache. I ran into a similar problem with a code I worked on many years ago. The code was ancient and not well designed. It used SOR, so I tried to replace it with a canned ICCG solver, and it ran (slightly) slower, just because their problem sizes were so small that SOR tucked into L2 cache and ICCG didn't. These are important issues, IMO, especially because of the push to larger core counts and GPU architectures: which algorithm *does* get the fastest time to solution as we end up with far more cores available than we can possibly use efficiently with older algorithms?

Having said that, compilers should be able to optimize geometric multigrid loops in much the same way as your red-black SOR loops. I'd be surprised if a GMG solver using red-black Gauss-Seidel smoothing couldn't beat your SOR solver even on small cases, just because the convergence rate is higher, the storage is only log(N) larger, and the loops exhibit the same structure. Of course, I am often wrong about such things.
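The red-black structure being discussed can be sketched as follows (my own illustration with hard-coded 5-point Poisson coefficients; the function name and grid setup are assumptions). Each half-sweep updates only one color, so it is order-independent and trivially parallel:

```python
# Hedged sketch: red-black SOR for -Laplace(u) = f on a square grid
# with u = 0 on the boundary and hard-coded 5-point coefficients.
import numpy as np

def redblack_sor(f, h, omega=1.8, iters=2000):
    """Red and black points are updated in separate half-sweeps, so each
    half-sweep only reads the other color and can run fully in parallel."""
    u = np.zeros_like(f)
    n, m = f.shape
    ii, jj = np.meshgrid(np.arange(n), np.arange(m), indexing="ij")
    interior = (ii > 0) & (ii < n - 1) & (jj > 0) & (jj < m - 1)
    for _ in range(iters):
        for color in (0, 1):              # red half-sweep, then black
            mask = interior & ((ii + jj) % 2 == color)
            gs = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0)
                         + np.roll(u, 1, 1) + np.roll(u, -1, 1)
                         + h * h * f)
            u[mask] = (1 - omega) * u[mask] + omega * gs[mask]
    return u

n = 17
h = 1.0 / (n - 1)
u = redblack_sor(np.ones((n, n)), h)
```

Only two arrays (`u` and `f`) are streamed through memory, which is the cache-friendliness point made above.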

#13 | December 8, 2017, 05:17
Hector Redal (Senior Member, Madrid, Spain)
Quote: Originally Posted by mprinkey
Thanks for the information.
Right now, the algorithm I have implemented uses BiCGStab for solving the linear systems in my application, with IncompleteLUT as the preconditioner.
I am not using any kind of multigrid in my application, so I understand my approach is reasonable.
I would also like to mention that I am using non-uniform meshes, based on TRIA (triangular) elements, to discretize the domain / problem.
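For reference, that same BiCGStab-plus-incomplete-LU combination can be sketched with SciPy (an assumption on my part: SciPy's `spilu` plays the role of Eigen's IncompleteLUT here, and the tridiagonal matrix is just a hypothetical nonsymmetric stand-in for an assembled FE matrix):

```python
# Hedged sketch: ILU-preconditioned BiCGStab, analogous to the
# BiCGSTAB + IncompleteLUT combination mentioned above.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, bicgstab, spilu

n = 100
# Hypothetical nonsymmetric (convection-diffusion-like) test matrix.
A = sp.diags([-1.2, 2.5, -0.8], [-1, 0, 1], (n, n), format="csc")
b = np.ones(n)

ilu = spilu(A, drop_tol=1e-4, fill_factor=10)   # incomplete LU factors
M = LinearOperator((n, n), matvec=ilu.solve)    # apply ILU as preconditioner
x, info = bicgstab(A, b, M=M)                   # info == 0 on convergence
```

Note that `spilu` wants CSC storage; the `drop_tol` / `fill_factor` knobs play roughly the same role as IncompleteLUT's drop tolerance and fill parameters.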
