
solver-dev : which variables to store in memory and which to compute on the fly?


Old   November 22, 2020, 10:27
Default solver-dev : which variables to store in memory and which to compute on the fly?
  #1
Senior Member
 
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8
aerosayan is on a distinguished road
Loading data from memory can become costly due to cache misses.


In some cases it makes sense to calculate some variables on the fly and reduce memory pressure.


In your personal experience, which variables do you pre-compute and store in memory and which ones do you compute on the fly?


I'm currently pre-computing the surface normals, cell volumes, and face areas, and storing them in memory. Each array contains NCELLS double/single precision values. However, this seems like a seriously bad idea, since I could've saved space for NCELLS*3 more cells.


What's the right choice?

Old   November 22, 2020, 12:16
Default
  #2
Senior Member
 
Paolo Lampitella
Join Date: Mar 2009
Location: Italy
Posts: 2,152
Blog Entries: 29
Rep Power: 39
sbaffini will become famous soon enoughsbaffini will become famous soon enough
A typical approach is to store the face normals with magnitude, so that you don't store the face area separately.

Besides this, I second the store-less-and-compute-more approach. Still, there are not really that many places where you can use it.

It really depends on the effort involved in computing a variable and on how much you reuse it. Areas and volumes are typically stored. Mass flow through faces is typically stored as well.
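
As a minimal sketch of the "normal with magnitude" idea for a 2D grid, the snippet below stores one scaled normal per edge and recovers the edge length only on demand. Python is used purely for brevity, the function names are hypothetical, and the counter-clockwise node ordering is an assumption, not something stated in this thread.

```python
import math


def scaled_edge_normal(x0, y0, x1, y1):
    """Normal of the edge (x0, y0) -> (x1, y1), scaled by the edge length.

    Rotating the edge vector (dx, dy) by -90 degrees gives (dy, -dx).
    Its magnitude equals the edge length, i.e. the 2D "face area", and it
    points outward for a cell whose nodes are ordered counter-clockwise
    (an assumed convention).
    """
    dx, dy = x1 - x0, y1 - y0
    return dy, -dx


def volume_flux(nx, ny, u, v):
    """Flux of the velocity (u, v) through the face: the area is already in n."""
    return u * nx + v * ny


nx, ny = scaled_edge_normal(0.0, 0.0, 3.0, 4.0)
area = math.hypot(nx, ny)  # one sqrt away, computed only if really needed
flux = volume_flux(nx, ny, 1.0, 0.0)
```

Most flux evaluations only need the product of the normal and the area, so the separate area array can usually be dropped.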

Old   November 22, 2020, 12:58
  #3
Sayan Bhattacharjee
Quote:
Originally Posted by sbaffini View Post
A typical approach is to store the face normals with magnitude, so that you don't store the face area separately. .... Mass flow through faces is typically stored as well.

Thanks for the help.



How do you generally store the flux values?



To preserve continuity, the flux going through a face will be positive in one cell and negative in the neighbor cell. I haven't thought too much about it, but that is definitely going to cause lots of branch mispredictions when figuring out whether the flux will be positive or negative. So, multiplying with the unit surface normal (where surface normal is from the cell with high index to cell with low index) would actually improve performance.


So, wouldn't it be better to store the face normals simply as +ve or -ve unit vectors and not with their magnitude?



Also, I was planning to store all of my flux values in three 1D arrays (for three faces of each cell) of length NCELLS. However the gather and scatter operations to write/read into the three separate arrays would absolutely kill performance.


I think I can get away with only saving the residual as SUM(face_flux * face_area)

Old   November 22, 2020, 13:47
  #4
Paolo Lampitella
First of all, let me clarify that I was referring to an unstructured code. The fact that you mention 3 faces (directions?) makes me think you are actually dealing with a structured code. In this case, I have little knowledge of the common approaches to improve performance. Surely, there might be some trick related to areas and volumes, which could be simpler to compute.

Besides this, the mass flux is stored with its own sign, which is the one of the face normal... this just flows smoothly in code, nothing to worry about usually.

Also, you need the mass flow for the convection scheme of other scalars, and there you typically preload variables from both sides of a face and feed the convection scheme subroutine. I wouldn't rely on performance gains from branch prediction of upwind schemes. Also, as I mentioned in a previous post, I feel you are over-optimizing while you should rely on profiling of the whole code to guide you.

Old   November 22, 2020, 15:01
  #5
Sayan Bhattacharjee
Quote:
Originally Posted by sbaffini View Post
First of all, let me clarify that I was referring to an unstructured code. The fact that you mention 3 faces (directions?) makes me think you are actually dealing with a structured code. In this case, I have little knowledge of the common approaches to improve performance. Surely, there might be some trick related to areas and volumes, which could be simpler to compute.

Besides this, the mass flux is stored with its own sign, which is the one of the face normal... this just flows smoothly in code, nothing to worry about usually.

Also, you need the mass flow for the convection scheme of other scalars, and there you typically preload variables from both sides of a face and feed the convection scheme subroutine. I wouldn't rely on performance gains from branch prediction of upwind schemes. Also, as I mentioned in a previous post, I feel you are over-optimizing while you should rely on profiling of the whole code to guide you.

I'm currently working on an unstructured grid (hence the 3 faces of a triangular cell).


Also, I personally prefer not to leave optimization until the end, since most mistakes are made in the initial stages of development. For example: Fortran, despite being the language for high-performance numerical computation, still doesn't have any way to enforce memory alignment (AFAIK). We have to call a C function to allocate and align the memory for us.

Old   November 22, 2020, 16:09
  #6
Paolo Lampitella
Then I don't think I get your reasoning on the face normal. You still need 3 components (in 3D), and the face area would only be a sqrt away instead of being stored. Also, you typically need n together with the area, so you shouldn't actually need to compute the face area often, if at all.

Note that the mass flux (the only one I suggest storing, because you reuse it a lot for other equations) is then already of the correct sign and includes the area from n.

But don't store it for cells!!! It belongs to faces, otherwise you end up storing it twice
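
A minimal sketch of what face-based storage could look like: each flux is stored once per face with the sign of the face normal, then added to one cell and subtracted from the other. The owner/neighbour naming and the use of -1 for a boundary face are assumptions made for this illustration, not details given in the thread.

```python
def accumulate_residual(n_cells, face_owner, face_neighbour, face_flux):
    """Sum signed face fluxes into per-cell residuals.

    face_flux[f] is positive when it leaves face_owner[f]. The neighbour
    gets the same value with the opposite sign, so the sign of the flux
    never has to be tested and each flux is stored only once.
    A neighbour index of -1 marks a boundary face (assumed convention).
    """
    residual = [0.0] * n_cells
    for f, flux in enumerate(face_flux):
        residual[face_owner[f]] += flux
        nb = face_neighbour[f]
        if nb >= 0:
            residual[nb] -= flux
    return residual


# Three cells in a row (0 | 1 | 2) connected by two interior faces.
owner = [0, 1]
neighbour = [1, 2]
flux = [2.0, -0.5]
res = accumulate_residual(3, owner, neighbour, flux)
```

With only interior faces the residuals sum to zero, which is the discrete conservation property that per-face storage gives for free.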

Old   November 22, 2020, 16:13
  #7
Paolo Lampitella
At this point it is also important what kind of solver you are developing. Pressure based or density based? Explicit or implicit? Which algorithm?

Because some of them have their peculiarities

Old   November 22, 2020, 17:19
  #8
Sayan Bhattacharjee
The solver would be a steady state 2D Compressible Euler solver based on Cell Centred Explicit FVM discretization and first order accurate Van Leer flux splitting for solution of the Riemann problem. M-stage update procedure is used along with local CFL condition based time stepping. No turbulence models are implemented.


I was initially planning to implement an Implicit solver, but I figured that I should really re-write the explicit solver to use OpenMP parallelization and actually make something that's scalable and performs well for the cases where explicit solvers are a must.



Quote:
Originally Posted by sbaffini View Post
But don't store it for cells!!! It belongs to faces, otherwise you end up storing it twice

Makes sense. I wasn't thinking correctly about memory consumption as the number of cells scales up.

Old   November 22, 2020, 17:48
  #9
Paolo Lampitella
Then probably the mass flux is not really needed yet.

But, as an example of what I said about overkill optimizations, how do you store your variables (p, u, v, t I guess)? It may make sense to, say, store them in a single nvar x ncells array instead of nvar distinct vectors of length ncells.
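
A minimal sketch of the single-array layout, using Python's standard `array` module as a stand-in for a contiguous Fortran or C array. The flat index `cell * NVAR + var` mimics a column-major nvar x ncells array, so all variables of one cell sit next to each other in memory. The helper names and the choice of four variables are assumptions for illustration only.

```python
from array import array

NVAR = 4  # e.g. p, u, v, t (an assumed variable set)


def make_interleaved(n_cells):
    """One flat array of doubles; the NVAR values of a cell are adjacent."""
    return array("d", [0.0] * (NVAR * n_cells))


def cell_state(q, cell):
    """Return the NVAR values of one cell as a single contiguous slice."""
    start = cell * NVAR
    return q[start:start + NVAR]


def set_cell_state(q, cell, values):
    """Overwrite the NVAR values of one cell in place."""
    start = cell * NVAR
    q[start:start + NVAR] = array("d", values)


q = make_interleaved(3)
set_cell_state(q, 1, [1.2, 0.3, 0.0, 2.5])
state = list(cell_state(q, 1))
```

A flux routine that needs every variable of a cell then touches one contiguous chunk of memory instead of one location in each of NVAR separate arrays. Whether this actually pays off is something to confirm by profiling.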

Old   November 22, 2020, 18:21
Default
  #10
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura aboutflotus1 has a spectacular aura about
Do I smell premature performance optimization based on guesses rather than profiling?

On a more serious note:
Quote:
since I could've saved space for NCELLS*3
Does memory consumption really matter for a 2D code? RAM is cheap, your time isn't.

Old   November 23, 2020, 05:39
  #11
Sayan Bhattacharjee
Quote:
Originally Posted by flotus1 View Post
On a more serious note:

Does memory consumption really matter for a 2D code? RAM is cheap, your time isn't.

"...and I took that personally" -- Michael Jordan


I politely disagree. Actually, it does matter. The solver only works for triangular grids. In order to get a good solution, the number of cells needs to be cranked up. I intentionally work with only 4GB of RAM to force myself to write better code.



The idea of "RAM is cheap" is something that inherently hurts everyone using the software. RAM is cheap when we're running the code on our own machines, but not so cheap when we decide to run it on a paid cloud server, or when we're assigned limited system resources by our university system admin.

Old   November 23, 2020, 06:01
  #12
Paolo Lampitella
Let me tell you, it's not that you are wrong, in any of your statements or approach, just that it is like putting the cart before the horse.

Notably, there are things, even trivial, that a profiler can't catch. But once the obvious ones are taken into account (column/row major, proper algorithm choice, no bad scaling allocations), it can actually give you a lot of insight on things you wouldn't even know.

It is good to think carefully about how you write things, but it is wrong to become attached to any of your pieces of code. My current URANS code for unstructured grids is around 25k lines of code, but the commit history says that I changed around 350k lines of code. If code is alive, it will constantly change, and you will have to adapt to it.

Obviously, I don't know the state of your code but, if you don't have a working code yet, with all the planned features in place, you are, in my opinion, doing this wrong. If you do, I will post a nude if any of this is your major bottleneck

For example, you could have spent time on using MPI, which is way simpler than these micro-optimizations

Old   November 23, 2020, 06:24
  #13
Alex
Quote:
"...and I took that personally" -- Michael Jordan
My intention really was not to offend you with anything I wrote.
Instead, it was an attempt to make you take a step back and consider the bigger picture. It is far too easy to get lost in all those micro-level performance details. We have all been there at some point.
I could not agree more with sbaffini: with the most obvious performance bottlenecks out of the way, it is time to profile your code. Assumptions about performance impacts at this level will be wrong, no matter the level of expertise. This will of course force you to write a somewhat functional version of your code first and then make changes where necessary, instead of getting bogged down by all the "what if I did this instead" decisions before you approach a working solver.

About the memory usage though: if you want to restrict it as much as possible as an exercise, that's entirely up to you.
What I was trying to get across: with a 2D unstructured tria solver, there is room for 10+ million cells in 4GB of RAM. Even if you treat RAM as a cheap resource.
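
For a rough sense of the numbers, here is a back-of-the-envelope estimate in Python. The list of stored arrays is purely an assumption for this sketch (it is not the actual layout of any code in this thread), and the ratios faces ~ 1.5 x cells and nodes ~ 0.5 x cells are the usual approximations for a large triangular mesh.

```python
def estimate_bytes(n_cells, bytes_per_real=8, bytes_per_int=4):
    """Rough memory estimate for a 2D triangular, cell-centred explicit solver.

    Assumed storage (hypothetical):
      per cell: 4 conserved variables, 4 residuals, 4 stage copies,
                1 volume, 1 local time step, 3 node indices
      per face: 2 scaled-normal components, 2 cell indices
      per node: 2 coordinates
    """
    n_faces = 3 * n_cells // 2  # each interior edge is shared by two triangles
    n_nodes = n_cells // 2      # roughly two triangles per node
    cell_reals = 4 + 4 + 4 + 1 + 1
    reals = n_cells * cell_reals + n_faces * 2 + n_nodes * 2
    ints = n_cells * 3 + n_faces * 2
    return reals * bytes_per_real + ints * bytes_per_int


total = estimate_bytes(10_000_000)
gib = total / 2**30  # about 1.56 GiB in double precision
```

Under these assumptions, 10 million cells in double precision need well under 4GB, even with geometry stored rather than recomputed.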

Old   November 23, 2020, 06:31
  #14
Sayan Bhattacharjee
Quote:
Originally Posted by flotus1 View Post
My intention really was not to offend you with anything I wrote.

Not offended at all my good sir. It's a meme.


https://youtu.be/m38XhQSf1oU


Appreciate all of your help.

Old   November 23, 2020, 08:22
  #15
Sayan Bhattacharjee
Quote:
Originally Posted by sbaffini View Post
I will post a nude if any of this is your major bottleneck

LOL hilarious

Old   November 23, 2020, 08:42
Default
  #16
Senior Member
 
Arjun
Join Date: Mar 2009
Location: Nurenberg, Germany
Posts: 1,273
Rep Power: 34
arjun will become famous soon enougharjun will become famous soon enough
Quote:
Originally Posted by aerosayan View Post
The idea of "RAM is cheap" is something that inherently hurts everyone using the software.



Why drag everyone into this and speak on their behalf?
For example, I have been writing code for the last 20 years and have written Navier-Stokes solvers many times. I am not in the category of people who are hurt by the "RAM is cheap" mantra.

In fact, I am of the opinion that one should store something if it saves the cost of recalculating it.
Your optimized code is only as good as my unoptimized code when I do not recompute things and just store most of them. Things like sqrt, pow, exp etc. take time to calculate.

I never bother with optimization beyond what the -O3 flag can do. For most people who write serious code, maintenance of the code is the most important issue. Low-level hand optimization, say at the assembly level, only leads to an unmanageable nightmare.

My 2 cents.

Old   November 23, 2020, 18:24
  #17
Sayan Bhattacharjee
Quote:
Originally Posted by flotus1 View Post
What I was trying to get across: with a 2D unstructured tria solver, there is room for 10+ million cells in 4GB of RAM. Even if you treat RAM as a cheap resource.

You're correct.


Tags
solver development

