CFD Online Discussion Forums

CFD Online Discussion Forums (http://www.cfd-online.com/Forums/)
-   OpenFOAM Bugs (http://www.cfd-online.com/Forums/openfoam-bugs/)
-   -   Feature proposal: Application level profiling (http://www.cfd-online.com/Forums/openfoam-bugs/64081-feature-proposal-application-level-profiling.html)

gschaider April 28, 2009 15:24

Feature proposal: Application level profiling
 
First: yes, I know how to compile a profiled version of OF, but sometimes I find the amount of information there a bit overwhelming when I only want to answer a simple question like "The pressure equation seems to eat most of the CPU time, but how much exactly?" and don't want to know every detail.

Second: I know this is not a bug, but I don't know which forum would be appropriate for this.

Attached is an archive with an example implementation. It can be used as a separate library (which requires modifications of the solver) or compiled into libOpenFOAM.so. The latter adds some data-structures for profiling. The profiling information is automatically written to a file uniform/profilingInfo at every time-step for which data is written to disk. It records how much time was spent in a section, how often the section was called, and how much time was spent in its 'child' sections. Sections have to be marked with the addProfile macro (closed by an optional endProfile). The enclosed patch does this automatically for the linear solvers. So when running an unmodified interFoam on the damBreak case, the data looks like this after being processed with the enclosed Python script (the raw data looks more foamish):
Code:

                             | parent  (total ) -  self  |    calls      total       self
-----------------------------------------------------------------------------------------
"application::main"          |  100.0% ( 67.5%) -  67.5% |        1     69.57s     46.93s
|-"lduMatrix::solver_pcorr"  |    0.0% (  0.0%) - 100.0% |        1   0.00055s   0.00055s
|-"MULES::explicitSolve"     |    8.5% (  8.5%) - 100.0% |     2092     5.946s     5.946s
|-"lduMatrix::solver_pd"     |   23.2% ( 23.2%) - 100.0% |     3138     16.12s     16.12s
|-"writing"                  |    0.8% (  0.8%) - 100.0% |       20    0.5808s    0.5808s


(please excuse the misalignment, the board doesn't seem to think that fixed-length-font is appropriate for Code)
The "raw" data is on the right; on the left are the percentages calculated from it. In this example there is only one "parent" and four children.

A slightly modified interFoam (see the other patch) would generate this profile:
Code:

                                 | parent  (total ) -  self  |    calls      total       self
---------------------------------------------------------------------------------------------
"application::main"              |  100.0% (  0.0%) -   0.0% |        1     69.82s    0.0021s
|-"init"                         |    0.1% (  0.1%) -  93.1% |        1   0.06371s   0.05934s
  |-"correctPhi"                 |    6.9% (  0.0%) -  84.9% |        1  0.004364s  0.003703s
    |-"lduMatrix::solver_pcorr"  |   15.1% (  0.0%) - 100.0% |        1  0.000661s  0.000661s
|-"timeLoop"                     |   99.9% (  3.9%) -   4.0% |     1046     69.75s     2.757s
  |-"gammaEqnSubCycle"           |   25.0% (  5.7%) -  22.9% |     1046     17.46s     3.996s
    |-"gammaEqn"                 |   77.1% ( 10.7%) -  55.4% |     2092     13.46s     7.461s
      |-"MULES::explicitSolve"   |   44.6% (  8.6%) - 100.0% |     2092     6.002s     6.002s
  |-"UEqn"                       |    9.4% (  9.4%) - 100.0% |     1046     6.572s     6.572s
  |-"pEqn"                       |   60.9% (  0.3%) -   0.5% |     3138     42.46s    0.2084s
    |-"pEqn - pre"               |   34.7% ( 21.1%) - 100.0% |     3138     14.72s     14.72s
    |-"pEqn - Corrector"         |   47.3% (  5.7%) -  19.9% |     3138     20.08s     3.993s
      |-"lduMatrix::solver_pd"   |   80.1% ( 23.0%) - 100.0% |     3138     16.08s     16.08s
    |-"pEqn - post"              |   17.6% ( 10.7%) - 100.0% |     3138     7.456s     7.456s
  |-"writing"                    |    0.7% (  0.7%) - 100.0% |       20    0.5095s    0.5095s



To install the profiling support into OF, use the addProfilingToRelease.sh script enclosed in the tar and recompile the installation (this works for 1.5 and 1.5.x; 1.4.1 would need some trivial changes, if anyone needs it).
To use it "just" as a library, compile it with "wmake libso" and have a look at the enclosed test application. But I think the cooler way to use it is as an integrated part of OF, because then the linear solvers are profiled automatically.

Some remarks:
  • I just found out that some source files of interFoam are shared by other VOF-solvers. Applying the patch therefore breaks the compilation of those solvers if they're not modified as well
  • addProfile/endProfile should only be applied to sections that are not entered too often (putting them inside a "forAll(U,cellI)" loop, for instance, is a BAAAD idea), but as long as a section is not entered too often per time-step, there should be no noticeable impact on performance
  • There are some ugly places in the code. Most of them come from the effort to correctly write timing information for sections that are still "on the stack"
  • It is on purpose that no synchronization is done for parallel runs. The data is written per processor and may vary between processors (thus reflecting an imbalance in certain sections)
In conclusion, it would be cool if this (or something similar and probably better) were included in future versions of OF, as it would help to quantify the effect of measures taken to improve performance (whether in the program or in parameter files).

Bernhard

Attachment 200

gschaider April 28, 2009 15:30

OK. The forum seems to hate my attachments. I'll retry.
Attachment 201

henry April 29, 2009 04:36

I use the callgrind tool in valgrind and view the output with kcachegrind; this has proved a very effective way of doing detailed profiling on the optimised code without the need to recompile with profiling options.

Henry

gschaider April 29, 2009 13:07

Hi Henry!

Quote:

Originally Posted by henry (Post 214580)
I use the callgrind tool in valgrind and view the output with kcachegrind, this has proved a very effective way of doing detailed profiling on the optimised code without the need to recompile with profiling options.

Thanks for that tip. I was only using valgrind for memory leaks until now, but the info that kcachegrind gives is quite nice.

Nevertheless I think that this extension has its applications, especially because
  • the data is automatically there (no need to rerun the case, which is especially impractical for large cases)
  • detailed profiling for parallel runs is quite hard
  • the data is interpretable for non-programmers (so that they can see for instance how much a move from a CG-solver to AMG improves the performance of a particular part of the code - or not)
Of course one relies on the programmer to instrument the code, but then a "normal" user can see where the time was lost and has quantitative information (and can go to the programmer with "It is nice that the stuff you inserted works in the demo case, but for this particular case it eats most of the time in the section you called 'bigOexpNalgorithm'", which is better than "It's kinda slow").
So I see it not so much as a programmer's tool, but more as a service for the user.

Bernhard

henry April 29, 2009 13:43

> the data is automatically there (no need to rerun the case, which is especially impractical for large cases)

valgrind can be run in parallel and so can be used for large cases, in fact any case that can be run in the normal manner.

> detailed profiling for parallel runs is quite hard

I have found that this is easy using valgrind.

> the data is interpretable for non-programmers (so that they can see for instance how much a move from a CG-solver to AMG improves the performance of a particular part of the code - or not)

kcachegrind presents all the information in a very accessible manner.

> Of course one relies on the programmer to instrument the code,

valgrind does not need any changes to the code or even recompilation.

> but then a "normal" user can see where the time was lost and has quantitative information

valgrind provides all the information.

