# Introduction to turbulence/Statistical analysis/Estimation from a finite number of realizations


## Latest revision as of 16:42, 31 August 2007


## Estimators for averaged quantities

Since there can never be an infinite number of realizations from which ensemble averages (and probability densities) can be computed, it is essential to ask: How many realizations are enough? The answer to this question must be sought by looking at the statistical properties of estimators based on a finite number of realizations. There are two questions which must be answered. The first one is:

• Is the expected value (or mean value) of the estimator equal to the true ensemble mean? Or in other words, is the estimator unbiased?

The second question is

• Does the difference between the estimator and the true mean decrease as the number of realizations increases? Or in other words, does the estimator converge in a statistical sense (or converge in probability)? Figure 2.9 illustrates the problems which can arise.

## Bias and convergence of estimators

A procedure for answering these questions will be illustrated by considering a simple estimator for the mean, the arithmetic mean considered above, $X_{N}$. For $N$ independent realizations $x_{n}, n=1,2,...,N$ where $N$ is finite, $X_{N}$ is given by:

$X_{N}=\frac{1}{N}\sum^{N}_{n=1} x_{n}$
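In code, the arithmetic-mean estimator is simply the sample average of the realizations. A minimal Python sketch (the Gaussian ensemble, its parameters, and the function name are illustrative assumptions, not part of the text):

```python
import random

def arithmetic_mean(samples):
    """Arithmetic-mean estimator X_N for a finite set of N realizations."""
    return sum(samples) / len(samples)

# Hypothetical ensemble: 100 independent realizations of a random variable
# with true mean X = 10.0 and standard deviation sigma_x = 1.2 (assumed values).
random.seed(0)
x = [random.gauss(10.0, 1.2) for _ in range(100)]
X_N = arithmetic_mean(x)  # a single (random) realization of the estimator
```

Each run over a different ensemble would give a slightly different $X_{N}$, which is exactly the point of the analysis that follows.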

Figure 2.9 not uploaded yet

Now, as we observed in our simple coin-flipping experiment, since the $x_{n}$ are random, so must be the value of the estimator $X_{N}$. For the estimator to be unbiased, the mean value of $X_{N}$ must be the true ensemble mean, $X$, i.e.

$\left\langle X_{N} \right\rangle = X$

It is easy to see that since the operations of averaging and adding commute,

$\begin{matrix} \left\langle X_{N} \right\rangle & = & \left\langle \frac{1}{N} \sum^{N}_{n=1} x_{n} \right\rangle \\ & = & \frac{1}{N} \sum^{N}_{n=1} \left\langle x_{n} \right\rangle \\ & = & \frac{1}{N} NX = X \\ \end{matrix}$

(Note that the expected value of each $x_{n}$ is just $X$ since the $x_{n}$ are assumed identically distributed). Thus $X_{N}$ is, in fact, an unbiased estimator for the mean.
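This unbiasedness can be checked numerically: averaging the estimator $X_{N}$ over many independent ensembles should recover $X$ even when $N$ itself is small. A sketch under assumed Gaussian statistics (all numerical values are illustrative):

```python
import random

random.seed(1)
X_true, sigma, N = 10.0, 1.2, 8   # assumed true mean, std dev, ensemble size

# Average the estimator X_N over many independent ensembles of size N.
trials = 20000
mean_of_estimator = sum(
    sum(random.gauss(X_true, sigma) for _ in range(N)) / N
    for _ in range(trials)
) / trials
# mean_of_estimator is close to X_true even though each X_N scatters widely,
# illustrating <X_N> = X for finite N.
```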

The question of convergence of the estimator can be addressed by defining the square of variability of the estimator, say $\epsilon^{2}_{X_{N}}$, to be:

$\epsilon^{2}_{X_{N}}\equiv \frac{var \left\{ X_{N} \right\} }{X^{2}} = \frac{\left\langle \left( X_{N}- X \right)^{2} \right\rangle }{X^{2}}$

Now we want to examine what happens to $\epsilon_{X_{N}}$ as the number of realizations increases. For the estimator to converge it is clear that $\epsilon_{X_{N}}$ should decrease as the number of samples increases. Obviously, we need to examine the variance of $X_{N}$ first. It is given by:

$\begin{matrix} var \left\{ X_{N} \right\} & = & \left\langle \left( X_{N} - X \right)^{2} \right\rangle \\ & = & \left\langle \left[ \frac{1}{N} \sum^{N}_{n=1} \left( x_{n} - X \right) \right]^{2} \right\rangle \\ \end{matrix}$

since $\left\langle X_{N} \right\rangle = X$ from the equation for $\langle X_{N} \rangle$ above. Using the fact that operations of averaging and summation commute, the squared summation can be expanded as follows:

$\begin{matrix} \left\langle \left[ \frac{1}{N} \sum^{N}_{n=1} \left( x_{n} - X \right) \right]^{2} \right\rangle & = & \frac{1}{N^{2}} \sum^{N}_{n=1} \sum^{N}_{m=1} \left\langle \left( x_{n} - X \right) \left( x_{m} - X \right) \right\rangle \\ & = & \frac{1}{N^{2}}\sum^{N}_{n=1}\left\langle \left( x_{n} - X \right)^{2} \right\rangle \\ & = & \frac{1}{N} var \left\{ x \right\} \\ \end{matrix}$

where the next to last step follows from the fact that the $x_{n}$ are assumed to be statistically independent samples (and hence uncorrelated), and the last step from the definition of the variance. It follows immediately by substitution into the equation for $\epsilon^{2}_{X_{N}}$ above that the square of the variability of the estimator, $X_{N}$, is given by:

$\begin{matrix} \epsilon^{2}_{X_{N}}& =& \frac{1}{N}\frac{var\left\{x\right\}}{X^{2}} \\ & = & \frac{1}{N} \left[ \frac{\sigma_{x}}{X} \right]^{2} \\ \end{matrix}$
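The $1/N$ behaviour can be verified by Monte Carlo: estimate $var\left\{X_{N}\right\}/X^{2}$ empirically for two ensemble sizes and compare. A sketch under assumed Gaussian statistics (the distribution and all numerical values are illustrative):

```python
import random
import statistics

random.seed(2)
X_true, sigma = 10.0, 1.2   # assumed true mean and standard deviation

def variability_sq(N, trials=5000):
    """Monte-Carlo estimate of eps^2 = var{X_N} / X^2 for ensembles of size N."""
    estimates = [sum(random.gauss(X_true, sigma) for _ in range(N)) / N
                 for _ in range(trials)]
    return statistics.pvariance(estimates) / X_true**2

eps2_10 = variability_sq(10)
eps2_40 = variability_sq(40)
# Theory: eps^2 = (1/N)(sigma/X)^2 = 0.0144/N, so quadrupling N
# should reduce eps^2 by roughly a factor of four.
```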

Thus the variability of the estimator depends inversely on the number of independent realizations, $N$, and linearly on the relative fluctuation level of the random variable itself, $\sigma_{x}/ X$. Obviously, if the relative fluctuation level is zero (because the quantity being measured is constant and there are no measurement errors), then a single measurement will suffice. On the other hand, as soon as there is any fluctuation in $x$ itself, the greater the fluctuation (relative to the mean of $x$, $\left\langle x \right\rangle = X$), the more independent samples it will take to achieve a specified accuracy.

Example: In a given ensemble the relative fluctuation level is 12% (i.e. $\sigma_{x}/ X = 0.12$). What is the fewest number of independent samples that must be acquired to measure the mean value to within 1%?

Answer: Using the equation for $\epsilon^{2}_{X_{N}}$ above, and taking $\epsilon_{X_{N}}=0.01$, it follows that:

$\left(0.01 \right)^{2} = \frac{1}{N}\left(0.12 \right)^{2}$

or $N \geq 144$.
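The arithmetic can be reproduced directly by solving the variability equation for $N$ (variable names are illustrative):

```python
# Solving eps^2 = (1/N)(sigma_x/X)^2 for N gives N = ((sigma_x/X) / eps)^2.
rel_fluct = 0.12   # sigma_x / X, from the example
eps = 0.01         # 1% target variability of the estimator
# The ratio is exactly 12 here, so rounding to the nearest integer is safe.
N_min = round((rel_fluct / eps) ** 2)   # smallest acceptable sample count
```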

## Credits

This text was based on "Lectures in Turbulence for the 21st Century" by Professor William K. George, Professor of Turbulence, Chalmers University of Technology, Gothenburg, Sweden.