# Introduction to turbulence/Statistical analysis/Probability

## Probability

### The histogram and probability density function

The frequency of occurrence of a given amplitude (or value) from a finite number of realizations of a random variable can be displayed by dividing the range of possible values of the random variable into a number of slots (or windows). Since all possible values are covered, each realization fits into only one window. For every realization a count is entered into the appropriate window. When all the realizations have been considered, the number of counts in each window is divided by the total number of realizations. The result is called the histogram (or frequency of occurrence diagram). From the definition it follows immediately that the sum of the values of all the windows is exactly one.

The shape of a histogram depends on the statistical distribution of the random variable, but it also depends on the total number of realizations, $N$, and the size of the slots, $\Delta c$. The histogram can be represented symbolically by the function $H_{x}(c,\Delta c,N)$ where $c\leq x < c + \Delta c$, $\Delta c$ is the slot width, and $N$ is the number of realizations of the random variable. Thus the histogram shows the relative frequency of occurrence of a given value range in a given ensemble. Figure 2.3 illustrates a typical histogram. If the size of the sample is increased so that the number of realizations in each window increases, the diagram will become less erratic and will be more representative of the actual probability of occurrence of the amplitudes of the signal itself, as long as the window size is sufficiently small.
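The construction described above can be sketched in a few lines of Python. This is only an illustration: the Gaussian test signal, the range $[-5, 5]$, the 50 slots, and the function name `histogram` are all assumptions made for the example, and realizations outside the chosen range are simply counted in the end slots so that every realization lands in exactly one window.

```python
import random

def histogram(samples, c_min, c_max, n_slots):
    """Return (H, dc): the relative frequency in each slot [c, c + dc) and the slot width."""
    dc = (c_max - c_min) / n_slots
    counts = [0] * n_slots
    for x in samples:
        k = int((x - c_min) / dc)
        # Assumption for this sketch: out-of-range values go into the end slots,
        # so that each realization is counted exactly once.
        k = min(max(k, 0), n_slots - 1)
        counts[k] += 1
    n = len(samples)
    return [count / n for count in counts], dc

random.seed(0)  # fixed seed so the example is repeatable
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]
H, dc = histogram(samples, -5.0, 5.0, 50)

# The values of all the windows sum to one (up to round-off).
print(sum(H))
```

Rerunning with fewer realizations, or with many more slots, should make the erratic behaviour of a sparsely filled histogram visible.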

If the number of realizations, $N$, increases without bound as the window size, $\Delta c$ , goes to zero, the histogram divided by the window size goes to a limiting curve called the probability density function, $B_{x} \left( c \right)$. That is,

 $B_{x} \left( c \right) \equiv \lim_{N \rightarrow \infty,\ \Delta c \rightarrow 0} H_{x} \left( c , \Delta c , N \right) / \Delta c$ (2)

Note that as the window width goes to zero, so does the number of realizations which fall into it, $N H$. Thus it is only when this number (or relative number) is divided by the slot width that a meaningful limit is achieved.
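The limiting process can be illustrated numerically. The sketch below assumes a unit Gaussian random variable, whose probability density at $c = 0$ is known to be $1/\sqrt{2\pi}$, and evaluates the histogram divided by the slot width for a single slot centred on zero. The particular pairs of $N$ and $\Delta c$, and the name `pdf_estimate`, are arbitrary choices for the example.

```python
import math
import random

def pdf_estimate(samples, c, dc):
    """Histogram value for the slot [c, c + dc), divided by the slot width dc."""
    hits = sum(1 for x in samples if c <= x < c + dc)
    return hits / len(samples) / dc

random.seed(1)  # fixed seed so the example is repeatable
exact = 1.0 / math.sqrt(2.0 * math.pi)  # unit Gaussian pdf at c = 0

# Increase N while shrinking the slot width.
for n, dc in [(1_000, 0.5), (10_000, 0.2), (100_000, 0.1)]:
    samples = [random.gauss(0.0, 1.0) for _ in range(n)]
    estimate = pdf_estimate(samples, -dc / 2.0, dc)  # slot centred on c = 0
    print(n, dc, estimate, exact)
```

The number of realizations must grow faster than the slot width shrinks; otherwise the count in the slot becomes too small for the ratio to settle.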

The probability density function (or pdf) has the following properties:

• Property 1:
 $B_{x} \left( c \right) \geq 0$ (2)

always.

• Property 2:
 $Prob \left\{c < x < c + dc \right\} = B_{x} \left(c \right) dc$ (2)

where $Prob \left\{ \right\}$ is read "the probability that".

• Property 3:
 $Prob \left\{ x < c \right\} = \int ^{c}_{- \infty }B_{x} \left(c^{'} \right) dc^{'}$ (2)

• Property 4:
 $\int ^{\infty}_{- \infty }B_{x} \left(x \right) dx = 1$ (2)

The condition imposed by property (1) simply states that negative probabilities are impossible, while property (4) assures that the probability is unity that a realization takes on some value. Property (2) gives the probability of finding the realization in an interval around a certain value, while property (3) provides the probability that the realization is less than a prescribed value. Note the necessity, in equations 2.14 and 2.15, of distinguishing between the integration variable and the value $c$ at which the probability is evaluated; in property (3) the integration variable is therefore written $c^{'}$.

Since $B_{x} \left( c \right) dc$ gives the probability of the random variable $x$ assuming a value between $c$ and $c + dc$, any moment of the distribution can be computed by integrating the appropriate power of $x$, weighted by the probability density, over all possible values. Thus the $n$-th moment is given by:

 $\left\langle x^{n} \right\rangle = \int^{\infty}_{- \infty} c^{n} B_{x} \left(c \right) dc$ (2)

If the probability density is given, the moments of all orders can be determined. For example, the variance can be determined by:

 $var \left\{ x \right\} = \left\langle \left( x - X \right)^2 \right\rangle = \int^{\infty}_{- \infty} \left(c - X \right)^2 B_{x} \left(c \right) dc$ (2)
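As a rough numerical check of these integrals, the sketch below assumes a Gaussian probability density with mean 1 and standard deviation 2 (so the variance should come out close to 4) and evaluates the moments with a simple midpoint rule. The integration limits of ten standard deviations either side of the mean, and the helper names `integrate` and `moment`, are choices made only for this example.

```python
import math

MEAN, SIGMA = 1.0, 2.0               # assumed parameters for the illustration
A, B = MEAN - 10.0 * SIGMA, MEAN + 10.0 * SIGMA  # stand-in for (-inf, inf)

def gaussian_pdf(c):
    """Gaussian probability density with the assumed mean and standard deviation."""
    return math.exp(-0.5 * ((c - MEAN) / SIGMA) ** 2) / (SIGMA * math.sqrt(2.0 * math.pi))

def integrate(f, a, b, steps=20_000):
    """Midpoint-rule approximation to the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def moment(pdf, n):
    """n-th moment: integral of c**n times the pdf."""
    return integrate(lambda c: c ** n * pdf(c), A, B)

total = moment(gaussian_pdf, 0)      # zeroth moment: the pdf integrates to one
X = moment(gaussian_pdf, 1)          # first moment: the mean
variance = integrate(lambda c: (c - X) ** 2 * gaussian_pdf(c), A, B)
print(total, X, variance)
```

The same `integrate` helper could be applied to any other density that decays quickly enough for the truncated limits to be harmless.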

The central moments give information about the shape of the probability density function, and vice versa. Figure 2.4 shows three distributions which have the same mean and standard deviation, but are clearly quite different. Beneath them are shown random functions of time, which might have generated them. Distribution (b) has a higher value of the fourth central moment than does distribution (a). This can be easily seen from the definition

 $\left\langle \left( x - X \right)^{4} \right\rangle = \int^{\infty}_{- \infty} \left(c - X \right)^4 B_{x} \left(c \right) dc$ (2)

since the fourth power emphasizes the fact that distribution (b) has more weight in the tails than does distribution (a).

It is also easy to see that because of the symmetry of the pdfs in (a) and (b) all the odd central moments will be zero. Distributions (c) and (d), on the other hand, have non-zero values for the odd moments, because of their asymmetry. For example,

 $\left\langle \left( x - X \right)^{3} \right\rangle = \int^{\infty}_{- \infty} \left(c - X \right)^3 B_{x} \left(c \right) dc$ (2)

is equal to zero if $B_{x}$ is symmetric about the mean, i.e. if it is an even function of $c - X$.
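The role of the third and fourth central moments can be illustrated with three standard densities that all have unit variance. These are stand-ins chosen for the example and are not the distributions of Figure 2.4: a Gaussian, a Laplace (double exponential) density with heavier tails, and a one-sided exponential density. Analytically, the fourth central moments of the first two are 3 and 6, and the third central moment of the exponential is 2; the midpoint-rule integration below should reproduce these values approximately.

```python
import math

def gaussian(c):
    """Symmetric, zero mean, unit variance."""
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)

def laplace(c):
    """Symmetric, zero mean, unit variance, with more weight in the tails."""
    b = 1.0 / math.sqrt(2.0)
    return math.exp(-abs(c) / b) / (2.0 * b)

def exponential(c):
    """Asymmetric: mean 1, unit variance, zero for negative c."""
    return math.exp(-c) if c >= 0.0 else 0.0

def central_moment(pdf, n, a=-40.0, b=40.0, steps=80_000):
    """Midpoint-rule estimate of the n-th central moment of pdf over [a, b]."""
    h = (b - a) / steps
    grid = [a + (i + 0.5) * h for i in range(steps)]
    mean = h * sum(c * pdf(c) for c in grid)
    return h * sum((c - mean) ** n * pdf(c) for c in grid)

third = {f.__name__: central_moment(f, 3) for f in (gaussian, laplace, exponential)}
fourth = {f.__name__: central_moment(f, 4) for f in (gaussian, laplace, exponential)}
print(third)
print(fourth)
```

The two symmetric densities give a third central moment that vanishes to within round-off, while the fourth moment separates them clearly even though their means and variances agree.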

### The probability distribution

Sometimes it is convenient to work with the probability distribution instead of with the probability density function. The probability distribution is defined as the probability that the random variable has a value less than or equal to a given value. Thus from equation 2.15, the probability distribution is given by

 $F_{x} \left( c \right) = Prob \left\{ x < c \right\} = \int^{c}_{-\infty} B_{x} \left( c^{'} \right) d c^{'}$ (2)

Note that we had to introduce the integration variable, $c^{'}$, since $c$ occurred in the limits.

Equation 2.21 can be inverted by differentiating with respect to $c$ to obtain

 $B_{x} \left( c \right) = \frac{dF_{x}}{dc}$ (2)
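The relation between the probability distribution and the probability density can be checked numerically in both directions. The sketch below assumes a unit Gaussian density, builds $F_{x}$ as a running trapezoidal integral of $B_{x}$ starting from $c = -8$ (used here as a stand-in for $-\infty$), and then recovers $B_{x}$ at $c = 1$ by a central difference. The grid spacing and the evaluation point are arbitrary choices for the example.

```python
import math

def pdf(c):
    """Unit Gaussian probability density (assumed for this illustration)."""
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)

dc = 0.001
grid = [-8.0 + i * dc for i in range(16_001)]  # c from -8 to 8

# F(c): running integral of the pdf from the lower end of the grid (trapezoidal rule).
F = [0.0]
for left, right in zip(grid, grid[1:]):
    F.append(F[-1] + 0.5 * (pdf(left) + pdf(right)) * dc)

k = 9_000                                      # grid index of c = 1.0
exact_F = 0.5 * (1.0 + math.erf(grid[k] / math.sqrt(2.0)))

# Invert by differentiating F with respect to c (central difference).
recovered_pdf = (F[k + 1] - F[k - 1]) / (2.0 * dc)

print(F[k], exact_F)
print(recovered_pdf, pdf(grid[k]))
print(F[-1])                                   # approaches one at the upper end
```

Note that $F_{x}$ increases monotonically from zero to one, which follows directly from properties (1) and (4) of the probability density.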