Home > Forums > OpenFOAM Bugs

Feature proposal: Application level profiling


April 28, 2009, 15:24   #1
Bernhard Gschaider (Assistant Moderator)
First: yes, I know how to compile a profiled version of OF, but the amount of information it produces can be overwhelming when all I want to answer is a simple question like "the pressure equation seems to eat most of the CPU time, but how much exactly?" and I don't care about every detail.

Second: I know this is not a bug report, but I don't know which forum would be more appropriate.

Attached is an archive with an example implementation. It can be used either as a separate library (which requires modifying the solver) or compiled into libOpenFOAM.so; the latter adds some data structures for profiling. The profiling information is written automatically to a file uniform/profilingInfo at every time step for which data is written to disk. It records how much time was spent in a section, how often the section was called, and how much time was spent in the 'child' sections. Sections have to be marked with the addProfile macro (closed by an optional endProfile). The enclosed patch does this automatically for the linear solvers. So when running an unmodified interFoam on the damBreak case, the data looks like this after being processed with the enclosed Python script (the raw data looks more foamish):
Code:
                              | parent  (total ) -  self  |    calls      total      self 
------------------------------------------------------------------------------------------
"application::main"           |  100.0% ( 67.5%) -  67.5% |        1     69.57s     46.93s
|-"lduMatrix::solver_pcorr"   |    0.0% (  0.0%) - 100.0% |        1   0.00055s   0.00055s
|-"MULES::explicitSolve"      |    8.5% (  8.5%) - 100.0% |     2092     5.946s     5.946s
|-"lduMatrix::solver_pd"      |   23.2% ( 23.2%) - 100.0% |     3138     16.12s     16.12s
|-"writing"                   |    0.8% (  0.8%) - 100.0% |       20    0.5808s    0.5808s

(please excuse the misalignment; the board doesn't seem to think a fixed-width font is appropriate for Code)
The "raw" data is in the columns on the right; on the left are the percentages calculated from it. In this example there is only one "parent" and four children.

A slightly modified interFoam (see other patch) would generate this profile
Code:
                                  | parent  (total ) -  self  |    calls      total      self 
----------------------------------------------------------------------------------------------
"application::main"               |  100.0% (  0.0%) -   0.0% |        1     69.82s    0.0021s
|-"init"                          |    0.1% (  0.1%) -  93.1% |        1   0.06371s   0.05934s
  |-"correctPhi"                  |    6.9% (  0.0%) -  84.9% |        1  0.004364s  0.003703s
    |-"lduMatrix::solver_pcorr"   |   15.1% (  0.0%) - 100.0% |        1  0.000661s  0.000661s
|-"timeLoop"                      |   99.9% (  3.9%) -   4.0% |     1046     69.75s     2.757s
  |-"gammaEqnSubCycle"            |   25.0% (  5.7%) -  22.9% |     1046     17.46s     3.996s
    |-"gammaEqn"                  |   77.1% ( 10.7%) -  55.4% |     2092     13.46s     7.461s
      |-"MULES::explicitSolve"    |   44.6% (  8.6%) - 100.0% |     2092     6.002s     6.002s
  |-"UEqn"                        |    9.4% (  9.4%) - 100.0% |     1046     6.572s     6.572s
  |-"pEqn"                        |   60.9% (  0.3%) -   0.5% |     3138     42.46s    0.2084s
    |-"pEqn - pre"                |   34.7% ( 21.1%) - 100.0% |     3138     14.72s     14.72s
    |-"pEqn - Corrector"          |   47.3% (  5.7%) -  19.9% |     3138     20.08s     3.993s
      |-"lduMatrix::solver_pd"    |   80.1% ( 23.0%) - 100.0% |     3138     16.08s     16.08s
    |-"pEqn - post"               |   17.6% ( 10.7%) - 100.0% |     3138     7.456s     7.456s
  |-"writing"                     |    0.7% (  0.7%) - 100.0% |       20    0.5095s    0.5095s


To install the profiling support into OF, use the addProfilingToRelease.sh script enclosed in the tar and recompile the installation (this works for 1.5(.x); 1.4.1 needs some trivial changes, if anyone needs that).
To use it "just" as a library, compile it with "wmake libso" and have a look at the enclosed test application. But I think the nicer way to use it is as an integrated part of OF, because then the linear solvers are profiled automatically.
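As a rough sketch, the two routes above would look something like this (the script and archive names come from this thread; the recompilation step is the usual one for your OpenFOAM tree and may differ):

```shell
# Route 1: integrate into libOpenFOAM.so and recompile the installation
tar xzf applicationLevelProfile.r10393.tar.gz
./addProfilingToRelease.sh        # patches the OpenFOAM-1.5(.x) sources
# ...then recompile OpenFOAM as usual for your installation

# Route 2: build as a standalone library (the solver must then be
# modified by hand to call the profiling macros; see the test application)
wmake libso
```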

Some remarks:
  • I just found out that some source files of interFoam are shared by other VOF solvers. Applying the patch therefore breaks the compilation of those solvers unless they are modified as well
  • addProfile/endProfile should only be applied to sections that are not entered too often (inside a "forAll(U,cellI)" loop, for instance, is a BAAAD idea), but as long as a section is not entered too often per time step there should be no noticeable impact on performance
  • There are some ugly places in the code. Most of them come from the effort to correctly write timing information for sections that are still "on the stack"
  • It is on purpose that no synchronization is done for parallel runs. The data is written per processor and might vary between processors (thus reflecting an imbalance in certain sections)
In conclusion, I'd like to say it would be cool if this (or something similar and probably better) were included in future versions of OF, as it would help to quantify the effect of measures taken to improve performance (whether in the program or in parameter files).

Bernhard

Attachment 200

April 28, 2009, 15:30   #2
Bernhard Gschaider
OK. The forum seems to hate my attachments. I'll retry.
applicationLevelProfile.r10393.tar.gz

April 29, 2009, 04:36   #3
henry (Senior Member)
I use the callgrind tool in valgrind and view the output with kcachegrind. This has proved a very effective way of doing detailed profiling on the optimised code without having to recompile with profiling options.
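The workflow Henry describes looks roughly like this; the solver name is just an example, and the exact output filename depends on the process ID:

```shell
# Run the (optimised, unmodified) solver under callgrind
valgrind --tool=callgrind interFoam

# callgrind writes callgrind.out.<pid> into the working directory;
# open it in kcachegrind to browse the call tree interactively
kcachegrind callgrind.out.12345
```

Note that running under callgrind slows the solver down considerably, so it is usually applied to a few time steps rather than a full production run.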

Henry

April 29, 2009, 13:07   #4
Bernhard Gschaider
Hi Henry!

Quote:
Originally Posted by henry
I use the callgrind tool in valgrind and view the output with kcachegrind. This has proved a very effective way of doing detailed profiling on the optimised code without having to recompile with profiling options.
Thanks for the tip. Until now I was only using valgrind for memory leaks, but the information kcachegrind gives is quite nice.

Nevertheless, I think this extension has its applications, especially because
  • the data is automatically there (no need to rerun the case, which is especially impractical for large cases)
  • detailed profiling of parallel runs is quite hard
  • the data is interpretable by non-programmers (so they can see, for instance, how much switching from a CG solver to AMG improves the performance of a particular part of the code - or not)
Of course one relies on the programmer to instrument the code, but then a "normal" user can see where the time was lost and has quantitative information. They can go to the programmer and say "It is nice that the stuff you inserted works in the demo case, but for this particular case it eats most of the time in the section you called 'bigOexpNalgorithm'", which is better than "It's kinda slow".
So I see it not so much as a programmer's tool, more as a service for the user.

Bernhard

April 29, 2009, 13:43   #5
henry
  • >the data is automatically there (no need to rerun the case, which is especially impractical for large cases)

    valgrind can be run in parallel, so it can be used for large cases - in fact for any case that can be run in the normal manner.
  • >detailed profiling for parallel runs is quite hard

    I have found that this is easy using valgrind.
  • >the data is interpretable for non-programmers (so that they can see for instance how much a move from a CG-solver to AMG improves the performance of a particular part of the code - or not)

    kcachegrind presents all the information in a very accessible manner.
> Of course one relies on the programmer to instrument the code,

Valgrind does not need any changes to the code, or even recompilation.

> but then a "normal" user can see where the time was lost and has quantitative information

valgrind provides all of that information.
