How do I calculate FLOPs (floating-point operations) in Fortran code? Is there any standard subroutine or function? pratap
FLOPS, or floating-point operations per second, is a characteristic of your computer and not of your code.
If I compile my code with no optimization and measure the flop rate for a typical problem, I get a speed of n flops.
If I recompile the same code with full optimization, then measure the flop rate on the same problem, I get a speed of 2n or 3n [I'm not the best programmer : ) ].
I ran both of these on the same computer. Perhaps the flop rate can depend on several different things other than the computer?
You can estimate the FLOPS by first counting the floating-point operations per cycle (time step or iteration). That is generally
(number of nodes) x (multiplies + adds + divides + subtracts per node) + (same calculation for application of boundary conditions).
Multiply by cycles/sec (you can time this) to get FLOPS.
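As a rough sketch of that estimate, here is the arithmetic in Python. The function names and every number in the example (node counts, operations per node, timing) are made up for illustration; you would substitute counts from your own solver and a measured run time.

```python
def flops_per_cycle(n_nodes, ops_per_node, n_boundary_nodes, ops_per_boundary_node):
    """Floating-point operations in one cycle (time step or iteration).

    ops_per_node is the hand-counted total of multiplies, adds, divides and
    subtracts per interior node; the boundary term is counted the same way.
    """
    return n_nodes * ops_per_node + n_boundary_nodes * ops_per_boundary_node


def flop_rate(ops_per_cycle, n_cycles, elapsed_seconds):
    """Operations per cycle times cycles per second."""
    cycles_per_second = n_cycles / elapsed_seconds
    return ops_per_cycle * cycles_per_second


if __name__ == "__main__":
    # Hypothetical problem: 10000 nodes at 50 operations each,
    # 400 boundary nodes at 20 operations each, 100 cycles timed at 2.0 s.
    per_cycle = flops_per_cycle(10000, 50, 400, 20)
    rate = flop_rate(per_cycle, 100, 2.0)
    print(f"{per_cycle} operations per cycle, {rate / 1e6:.1f} Mflop/s")
```

With these made-up inputs the count is 508000 operations per cycle and the rate is 25.4 Mflop/s. The result is only as good as the hand count of operations per node.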
The Los Alamos folks suggest a simpler alternative, the 'cpu time per cell per cycle', or "grind." It's a lot easier to calculate than estimating all of the arithmetic operations.
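A minimal sketch of the grind calculation, taking it to be CPU time divided by (cells x cycles). The helper names and the `step` callback are hypothetical stand-ins for one cycle of your solver; `time.process_time` is used because it reports CPU time rather than wall-clock time.

```python
import time


def grind_time(cpu_seconds, n_cells, n_cycles):
    """CPU time per cell per cycle, in seconds."""
    return cpu_seconds / (n_cells * n_cycles)


def measure_grind(step, n_cells, n_cycles):
    """Run step() n_cycles times and return the measured grind time.

    step is a placeholder for one full cycle of the solver.
    """
    start = time.process_time()
    for _ in range(n_cycles):
        step()
    cpu_seconds = time.process_time() - start
    return grind_time(cpu_seconds, n_cells, n_cycles)


if __name__ == "__main__":
    # Hypothetical run: 2.0 s of CPU time for 1000 cells over 100 cycles.
    print(f"grind = {grind_time(2.0, 1000, 100):.1e} s per cell per cycle")
```

No operation counting is needed: only the CPU time, the cell count and the cycle count.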
Probably the computer science folks have made all of this precise and mathematical?
Assuming that the previous posters have misunderstood your question...
There is no standard way in FORTRAN or C to automatically count floating-point operations. My guess is that you're trying to optimise your implementation by minimising the number of floating-point operations carried out. If this is the case, some programming environments provide really detailed profiling tools. I got this from profiling a simple floating-point loop program on my Compaq system here. Note the line that says the program executed 201 flops.
% cc -g program.c
% pixie a.out
% a.out.pixie
% prof -pixstats a.out:
1088 (1.037) cycles (2.176e-06s @ 500.0MHz)
1049 (1.000) instructions
419 (0.399) interlock cycles due to basic block boundary
10 (0.010) nops
391 (0.373) alu (including logicals, shifts)
157 (0.150) logicals (including ldah and lda)
0 (0.000) shifts
0 (0.000) prefetches
185 (0.176) loads
15 (0.014) stores
200 (0.191) loads+stores
34 (0.032) load followed by load
200 (0.191) data bus use
186 (0.177) sp+gp load/stores
201 (0.192) flops (92.4 mflop/s @ 500.0MHz)
135 (0.129) conditional branches
100 (0.095) branch to branch
100 (0.095) branch to branch taken
263 (0.251) basic blocks
6 (0.006) calls
0 (0.000) skip
<plus loads more output not included>
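The flop figures in that listing can be cross-checked by hand from the cycle count and clock speed it reports. Here is a small Python sketch of the arithmetic, using only numbers from the output above; the function names are my own, and reading the parenthesised column as "count divided by instructions" is an inference from the numbers, not something the tool states here.

```python
def elapsed_seconds(cycles, clock_hz):
    """Run time implied by a cycle count at a given clock frequency."""
    return cycles / clock_hz


def mflop_rate(flops, cycles, clock_hz):
    """Floating-point operations per second, in millions."""
    return flops / elapsed_seconds(cycles, clock_hz) / 1.0e6


if __name__ == "__main__":
    cycles = 1088        # from the "cycles" line
    instructions = 1049  # from the "instructions" line
    flops = 201          # from the "flops" line
    clock_hz = 500.0e6   # 500.0MHz

    print(f"elapsed    = {elapsed_seconds(cycles, clock_hz):.3e} s")
    print(f"rate       = {mflop_rate(flops, cycles, clock_hz):.1f} mflop/s")
    print(f"flops/inst = {flops / instructions:.3f}")
```

This gives 2.176e-06 s, 92.4 mflop/s and 0.192, matching the listing.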
In order to know the MegaFLOPS (and other events like loads, stores and so on...) you have to instrument your code.
Take a look at: http://www.fz-juelich.de/zam/PCL/
where you can find a quite "portable" performance counter library.
Best regards, Giorgio