|
[Sponsors] |
Is there a way to order nodes in a mesh to efficiently compute sparse matrices? |
|
LinkBack | Thread Tools | Search this Thread | Display Modes |
December 6, 2017, 10:16 |
Is there a way to order nodes in a mesh to efficiently compute sparse matrices?
|
#1 |
Senior Member
Hector Redal
Join Date: Aug 2010
Location: Madrid, Spain
Posts: 243
Rep Power: 16 |
Hi,
I don't know if I am going to explain my problem clear enough or my question makes no sense. Suppose there is a mesh created or generated by hand or by any software. The identifers provided to the nodes of the this mesh have been assigned randomly / arbitrarily. The CFD software takes this mesh as input and creates some sparse matrices. Obviously, the order assigned to the nodes affects the performance cost of the calculation / solution of the problem. If I manage to constructed a narrow banded (almost diagonal) matrix, the performance cost when calculating the inverse of the matrix (for example) will definitelly be smaller than any other matrix (constructed for the same mesh). Are there any algorithms I can check / read to implement some kind of preprocessing step so as to order the nodes of the mesh when building the sparse matrix of the CFD software? Are there any paper or reading I can take a look at? Thanks in advance for your kind support. Best regards, Hector. |
|
December 6, 2017, 10:41 |
|
#2 |
Senior Member
Filippo Maria Denaro
Join Date: Jul 2010
Posts: 6,768
Rep Power: 71 |
||
December 6, 2017, 10:42 |
|
#3 |
Senior Member
|
There is really a lot of material on this. Basically, any book on solving linear equation systems has some section on the proper ordering of the variables.
The reverse Cuthill–McKee algorithm (RCM) is one such orderings based only on the graph of the matrix (i.e., the connectivity of the mesh, not its coordinates). Space Filling Curves are another mean to order "stuff" in space and only require local coordinates of the objects to sort. Thus they are suitable for matrices coming from PDEs discretized in space. |
|
December 6, 2017, 10:58 |
|
#4 | |
Senior Member
Michael Prinkey
Join Date: Mar 2009
Location: Pittsburgh PA
Posts: 363
Rep Power: 25 |
Quote:
I have looked a using SFC to create blocks of a characteristic (say L2) cache size and then maybe RCM on each of those blocks. |
||
December 6, 2017, 12:09 |
|
#5 |
Senior Member
|
If I remember correctly, RCM might have some advantage for LU factorizations or the sort of. However, in addition to what Michael has stated, SFC orderings have also the advantage of producing an ordering also suitable for straightforward parallel partitioning and more (there is a paper by aftosmis on this).
|
|
December 6, 2017, 17:35 |
|
#6 |
Senior Member
Hector Redal
Join Date: Aug 2010
Location: Madrid, Spain
Posts: 243
Rep Power: 16 |
Hi,
Thank you all for your help and support! I have been digging out a bit in the internet using the information you have provided to me (Cuthill–McKee algorithm and Space Filling Curves), and I have found that all these methods are suitable for SparseLU solvers. I have just one doubt, can these methods be used with BiCGSTAB solvers? Maybe my question makes no sense, so I beg your pardon if this is the case. Best regards. |
|
December 6, 2017, 21:14 |
|
#7 | |
Senior Member
Michael Prinkey
Join Date: Mar 2009
Location: Pittsburgh PA
Posts: 363
Rep Power: 25 |
Quote:
For (Super)-LU schemes or a wide-band Thomas algorithms that generate direct (aka exact) rather than iterative solutions to the linear system, there is a big benefit for RCM and the resulting reduced matrix bandwidth. The scaling of the solution of, say, a Thomas algorithm is O(N*B^2) where N is the number of unknowns and B is the bandwidth of the matrix. If you can reorder and reduce B by some significant factor, you reduce the runtime of the exact Thomas solution of the reordered matrix by that factor squared. Fill-in storage is significantly reduced as well. For iterative schemes, the value of matrix re-ordering is ALL cache related. Matrices with smaller bandwidths tend to have more cache reuse and fewer pipeline stalls due to cache misses. Better cache usages tends to make for faster runtimes and more efficient solution schemes. But these efficiency improvements are NOT algorithmic...they are artifacts of the memory hierarchy in modern CPU and memory subsystems. And some CPU, cache, and memory configurations may see more benefit from re-ordering than others. This is deep in the weeds of high-performance coding, but these are where the distinctions lie. |
||
December 7, 2017, 02:28 |
|
#8 | |
Senior Member
Hector Redal
Join Date: Aug 2010
Location: Madrid, Spain
Posts: 243
Rep Power: 16 |
Quote:
I have perfectly grasped the main point of the subject under discussion. Just more question: Which algorithms is quicker / faster? Using a wide-band Thomas with RCM algorithm or a Krylov method? (If they are comparable) Can they be compared? Or are we talking about things that are not comparable? |
||
December 7, 2017, 03:49 |
|
#9 | |
Senior Member
Filippo Maria Denaro
Join Date: Jul 2010
Posts: 6,768
Rep Power: 71 |
Quote:
The issue is about what do you mean for quicker/faster. Depending on the method, CPU time and number of iterations can produce strange and conflicting results |
||
December 7, 2017, 06:57 |
|
#10 | |
Senior Member
Michael Prinkey
Join Date: Mar 2009
Location: Pittsburgh PA
Posts: 363
Rep Power: 25 |
Quote:
In terms of practical use in CFD applications, Krylov solvers (CG for symmetric positive definite matrices, BiCGStab and GMRES for non-symmetric matrices) are state of the art. Other alternatives are pure multigrid iterative schemes. The best solvers for large systems use multigrid iteration as a precondition for the Krylov space solver and reap the synergistic benefit of both approaches. Direct/Exact solvers like LU/Thomas algorithms are not regularly used in CFD for the solution of 2D/3D problems. They are only used in reduced-dimension approximations like Alternate Direction Implicit methods or for line relaxation in smoother/preconditioner schemes...but those applications usually have a bandwidth of 3 for 2nd-order schemes or proportionally larger for higher order schemes. Again, these are solving only 1D sub-sets of the larger 2D/3D problem and are used in an iterative fashion or used in conjunction with a dimensional reduction scheme, say by using FFT techniques in the other spatial dimensions. The only other CFD application I can think of for banded direct solvers is in the evaluation of the compact finite differences. Outside of CFD, RCM with direct solver methods may be used to do more than solve the linear system. For example, a reduced bandwidth matrix will be easier to extract eigenvalues/vectors which is useful, say, for identifying vibrational frequency and mode shapes in elastic structures. |
||
December 7, 2017, 07:07 |
|
#11 |
Senior Member
Filippo Maria Denaro
Join Date: Jul 2010
Posts: 6,768
Rep Power: 71 |
My experience is somehow depressing me...For standard 3D second order elliptic operators any try to get the best CPU time performance out of my optimized SOR (parallelized with black&white assumption) failed. That happened at least for number of unknowns <O(10^7).
Yes, I get improvements in terms of numbers of iterations but as my final goal was to have the fastest solver, I always used the SOR method... |
|
December 7, 2017, 08:05 |
|
#12 | |
Senior Member
Michael Prinkey
Join Date: Mar 2009
Location: Pittsburgh PA
Posts: 363
Rep Power: 25 |
Quote:
Having said that, compilers should be able to optimize Geometric Multigrid loops in much the same way as your Red-Black SOR loops. I'd be surprised if a GMG solver using RB GS couldn't best your SOR solver even on small cases, just because the convergence rate is higher, the storage is only log(N) larger, and loops exhibit the same structure. Of course, I am often wrong about such things. |
||
December 8, 2017, 04:17 |
|
#13 | |
Senior Member
Hector Redal
Join Date: Aug 2010
Location: Madrid, Spain
Posts: 243
Rep Power: 16 |
Quote:
Right now the algorithm I have implemented is using a BiCGStab for solving the linear systems used by my application. As pre-conditioner I am using IncompleteLUT. I am not using any kind of multigrid in my application, so I understand my approach is fair enough. I would like to comment also that I am using non-uniform meshes, based on TRIA elements that discretize the domain / problem. |
||
|
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Star CCM Overset Mesh Error (Rotating Turbine) | thezack | Siemens | 7 | October 12, 2016 11:14 |
2nd order boundary conditions for 2nd order discretization? | quarkz | Main CFD Forum | 30 | December 26, 2011 07:12 |
[snappyHexMesh] snappyHexMesh won't work - zeros everywhere! | sc298 | OpenFOAM Meshing & Mesh Conversion | 2 | March 27, 2011 21:11 |
2d irregular grid | Remy | Main CFD Forum | 1 | December 22, 2008 04:49 |
Icemcfd 11: Loss of mesh from surface mesh option? | Joe | CFX | 2 | March 26, 2007 18:10 |