
Estimating confidence intervals for average results from LES



Old   February 2, 2017, 12:32
Default Estimating confidence intervals for average results from LES
  #1
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Say I performed an LES and extracted some transient quantity from it, e.g. the data in this figure:
[Attached figure: example.png — the sampled time series]

This might be an average mass flow rate through a plane or the force on a specific boundary; it doesn't matter which.
Now calculating the average value of this quantity is straightforward. But how large is the uncertainty of this average value? Let's leave aside issues like initial transients or any uncertainties in the simulation modeling and focus on statistics. If I had the computational resources to perform an infinite number of time steps, the average value \mu_{\infty} of this time series would have zero statistical error. But since in this case I only have 5000 time steps in total, my estimate of the mean value \mu_{5000} is obviously not infinitely accurate.

Now what I want is to estimate a confidence interval (95%, 99%, whatever) such that I can say the infinite mean value \mu_{\infty} lies within this distance of my estimated mean value \mu_{5000}.
Or to put it differently: I want to report my simulation result as \mu = 0.234 \pm 0.056 with 95% certainty.

Edit:
Let me put my question differently:
When performing the same simulation N times with slightly different initial conditions and measuring a time series after the initial transient: I get N different time series with different mean values. Assuming that my sampling time is long enough (>> the largest time scale in the flow) these mean values will be normally distributed.
Now what I want is the standard deviation of this normal distribution. Estimating this would be straightforward if I had all N simulations, but I can only afford one of them. There must be a clever trick to estimate the standard deviation from only one sample.


I am not quite sure which is the correct approach here.
From what I recall from my "statistics for engineers" lecture, I came up with the following approach:
1) Divide the time-series into sub-series of smaller length, e.g. 500 time steps each.
2) Calculate the mean values of these sub-series.
3) Omit every second sub-series to make sure the mean values are uncorrelated.
4) Calculate the standard deviation of the remaining sub-series mean values: s_{sub}.
5) Estimate the standard deviation of the time-series mean value as: s_{5000}=\frac{s_{sub}}{\sqrt{n}}.
Here n is the number of remaining sub-series.
6) Multiply by the appropriate value of Student's t-distribution to obtain the confidence interval.
AFAIK, this procedure is based on the assumption that the mean values of the sub-series are uncorrelated (see 3)) and normally distributed. Both properties could be checked additionally.
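In code, the procedure above would look roughly like the following minimal Python sketch (the function name, the AR(1)-type test signal and the 95% level are just illustrative choices, not part of my actual data):

Code:
import numpy as np
from scipy import stats

def batch_means_ci(x, block_len=500, confidence=0.95):
    # Steps 1)-2): split the series into sub-series and take their means
    n_blocks = len(x) // block_len
    block_means = x[:n_blocks * block_len].reshape(n_blocks, block_len).mean(axis=1)
    # Step 3): keep only every second sub-series mean
    block_means = block_means[::2]
    n = len(block_means)
    # Steps 4)-5): s_sub and s_5000 = s_sub / sqrt(n)
    s_sub = block_means.std(ddof=1)
    s_mean = s_sub / np.sqrt(n)
    # Step 6): half-width of the confidence interval from Student's t
    t = stats.t.ppf(0.5 * (1.0 + confidence), df=n - 1)
    return x.mean(), t * s_mean

# Illustration on a synthetic correlated signal fluctuating around 0.234:
rng = np.random.default_rng(0)
x = np.empty(5000)
x[0] = 0.234
for i in range(1, 5000):
    x[i] = 0.234 + 0.95 * (x[i - 1] - 0.234) + 0.01 * rng.standard_normal()
mu, half = batch_means_ci(x)
print(f"mu = {mu:.4f} +- {half:.4f} (95% confidence)")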

Is this a valid approach or is there a better one?

Last edited by flotus1; February 3, 2017 at 04:19. Reason: better title

Old   February 2, 2017, 12:54
Default
  #2
Senior Member
 
Filippo Maria Denaro
Join Date: Jul 2010
Posts: 6,768
Well, what you are asking for has nothing to do with LES specifically... the same issue would arise for URANS as well as for DNS.

Usually, we do a statistical ensemble average using several fields over a certain period of time, for example no fewer than 30 samples in a time T that must be judged from the characteristic turnover time. That makes the statistics meaningful. Obviously, it does not mean that such a statistically averaged field is also constant in time.
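As a purely illustrative back-of-the-envelope version of that rule of thumb (the length and velocity scales below are assumed numbers, not taken from any specific case):

Code:
# Assumed integral scales of the resolved flow
L_int = 0.1      # integral length scale [m]
u_rms = 0.5      # velocity scale [m/s]
t_turnover = L_int / u_rms        # eddy turnover time = 0.2 s

n_samples = 30                    # "no less than 30 samples"
dt_sample = 2.0 * t_turnover      # assumed spacing so samples are roughly independent
T = n_samples * dt_sample         # required averaging window = 12 s
print(T)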

Old   February 2, 2017, 13:03
Default
  #3
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
I am aware that this issue is not unique to LES and I have some knowledge about how to post-process DNS, LES and URANS in general.
My question is specifically about the quantitative statistical uncertainty for the average flow properties obtained from this kind of simulation.

Old   February 2, 2017, 13:10
Default
  #4
Senior Member
 
Filippo Maria Denaro
Join Date: Jul 2010
Posts: 6,768
Quote:
Originally Posted by flotus1
I am aware that this issue is not unique to LES and I have some knowledge about how to post-process DNS, LES and URANS in general.
My question is specifically about the quantitative statistical uncertainty for the average flow properties obtained from this kind of simulation.

OK, I wrote that because the title of your post mentioned LES while it is a more general question... Concerning LES/DNS, we focus on spatial correlations and spectra. Usually, we perform spatial averaging along the homogeneous directions, and a supplementary time (ensemble) averaging is performed to make the statistics more meaningful.
To tell the truth, I am not aware of published papers that show a quantitative analysis of the error between finite-period and asymptotic (T -> infinity) averaging... that should be more in the field of signal analysis.

Old   February 2, 2017, 14:05
Default
  #5
Senior Member
 
Filippo Maria Denaro
Join Date: Jul 2010
Posts: 6,768
I remembered some comments reported in this report, sec. 3.1.3
http://torroja.dmt.upm.es/turbdata/a...ARD-AR-345.pdf

Old   February 3, 2017, 03:58
Default
  #6
Senior Member
 
Paolo Lampitella
Join Date: Mar 2009
Location: Italy
Posts: 2,151
Blog Entries: 29
Dear Alex,

I can't add any specific information on the matter. Just my 2 cents on how I do it myself in general, without actually considering any quantitative aspect.

Consider any turbulent case eventually producing a statistically steady state, like yours. Now, you mention taking 1 subset of 500 contiguous samples out of every 2. However, those 500 samples will not be independent. Actually, if you advance in time with an accurate scheme, each sample in your grid will be strongly correlated with the one at the previous time step.

What I do instead is pick just 1 value every n, where n is a function of the flow and the selected time step. You can choose n by first reaching the steady state, then collecting some contiguous samples as you did, and finally computing an autocorrelation in time (in your case at least). That will give you the minimum n needed to achieve independence between samples. Then just restart the run, but now take only 1 sample every n, with n just determined.
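A minimal sketch of how such an n could be estimated from an already collected, statistically steady stretch of the signal (the function name and the 1/e threshold are arbitrary choices of mine; the first zero crossing of the autocorrelation is an equally common criterion):

Code:
import numpy as np

def decorrelation_lag(x, threshold=np.exp(-1.0)):
    # Smallest lag at which the normalized autocorrelation drops below `threshold`;
    # samples taken this many steps apart are treated as approximately independent.
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.dot(x, x) / len(x)
    for lag in range(1, len(x)):
        rho = np.dot(x[:-lag], x[lag:]) / ((len(x) - lag) * var)
        if rho < threshold:
            return lag
    return len(x)  # the record is too short: it never decorrelates

# Usage: n = decorrelation_lag(preliminary_samples); then keep only every n-th sample.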

For what concerns when to stop, once you have the previous procedure in place, you can also monitor the running average over the samples taken as described above (here n just counts the samples in the running average and has nothing to do with the n above):

\bar{x}_n = \frac{n-1}{n}\,\bar{x}_{n-1} + \frac{x_n}{n}

Thus, by monitoring \bar{x}_n, you can see when it settles within your confidence interval, say within +-y% of a certain value. You will not have a quantitative measure of the certainty that the final average value will lie in that interval, but typically such a visual inspection is enough that you don't need one anymore.

This, obviously, does not necessarily require fewer samples.
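A tiny sketch of such a stopping check (the tolerance and window length are placeholders, to be chosen per case):

Code:
def converged(running_means, tol=0.01, window=50):
    # running_means: list of the running average after each retained sample.
    # Stop when the running mean has stayed within +-tol (relative) of its
    # current value for the last `window` retained samples.
    if len(running_means) < window:
        return False
    current = running_means[-1]
    return all(abs(m - current) <= tol * abs(current) for m in running_means[-window:])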

In this case, however, you are also accounting for any LES-dependent correlation between contiguous samples. That is, for your LES, the time over which samples decorrelate is a function of several modeling/numerical aspects and, in principle, differs from the DNS value for the same flow. With this approach you take such specificities into account to some extent (in contrast to just taking 500 contiguous samples, no matter what).

Old   February 3, 2017, 04:17
Default
  #7
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Thanks for the link to the report, it is a good read.
Making use of a spatially homogeneous direction is not possible since there is none in the 3D geometries I am currently investigating.

Let me put my question differently:
When performing the same simulation N times with different initial conditions and measuring a time series after the initial transient: I get N different time series with different mean values. Assuming that my sampling time is long enough (>> the largest time scale in the flow) these mean values will be normally distributed.
Now what I want is the standard deviation of this normal distribution. Estimating this would be straightforward if I had all N simulations, but I can only afford one of them. There must be a clever trick to estimate the standard deviation from only one sample.
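For what it is worth, the standard single-record estimate for exactly this quantity from stationary time-series theory (an assumption here, since it is not quoted anywhere in this thread) is \sigma^2_{\bar{x}} \approx \frac{2\,T_{int}}{T}\,\sigma^2_x, where \sigma^2_x is the variance of the fluctuations, T the averaging time and T_{int}=\int_0^{\infty}\rho(\tau)\,d\tau the integral time scale of the autocorrelation \rho(\tau). It assumes T \gg T_{int}; equivalently, the record contains roughly N_{eff}=\frac{T}{2\,T_{int}} independent samples.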

Old   February 3, 2017, 05:50
Default
  #8
Senior Member
 
Paolo Lampitella
Join Date: Mar 2009
Location: Italy
Posts: 2,151
Blog Entries: 29
Quote:
Originally Posted by flotus1
When performing the same simulation N times with different initial conditions and measuring a time series after the initial transient: I get N different time series with different mean values. Assuming that my sampling time is long enough (>> the largest time scale in the flow) these mean values will be normally distributed.
I'm just speculating here (not an expert), but will they be, when the samples all come from a single realization?

I feel like this might be another LES-related point. Turbulence statistics are clearly not Gaussian. But the lack of Gaussianity, to the best of my knowledge, is mostly related to the smallest scales and the flow type. Imagine sampling near the wall of an LES-simulated flow. Do you expect those samples to follow any Gaussian behaviour? Or any other PDF not dependent on the numerics/modeling?

That's why ensuring independence among all the single samples seems the minimum requirement to me (still, I repeat, not an expert here, just for the sake of discussion).

Consider also that the whole matter has also to do with the ergodic hypothesis, e.g. http://www3.imperial.ac.uk/portal/pl.../1/9607696.PDF

P.S. I understand your original question; you are just looking for a formula which, probably, is in any statistics textbook (I don't have one at hand at the moment, otherwise I would have looked it up for you). But I also want to open the discussion to LES-related aspects which might be relevant.

Old   February 3, 2017, 06:11
Default
  #9
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
I completely agree with your point that turbulence statistics are not Gaussian.
But what I took from my statistics lecture is that if you sum a sufficient number of samples from an arbitrary distribution, these sums will be Gaussian -> central limit theorem. And since calculating the mean involves a summation over all sampled values, I expect the mean values to be Gaussian.
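A quick numerical check of that argument (purely illustrative, with an exponential distribution standing in for the non-Gaussian signal):

Code:
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
raw = rng.exponential(scale=1.0, size=(2000, 500))  # strongly skewed samples
block_means = raw.mean(axis=1)                      # means of blocks of 500
print(stats.skew(raw.ravel()), stats.kurtosis(raw.ravel()))   # roughly 2 and 6
print(stats.skew(block_means), stats.kurtosis(block_means))   # both close to 0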

Last edited by flotus1; February 3, 2017 at 07:23.

Old   February 3, 2017, 08:21
Default
  #10
Senior Member
 
Filippo Maria Denaro
Join Date: Jul 2010
Posts: 6,768
Just from a very practical point of view, considering that your problem has no homogeneous directions, I think you can use a single simulation that, after the numerical transient has ended and an energy equilibrium is reached, allows you to sample the fields. In other words, you use your LES to obtain a RANS-like solution by performing an ensemble average of the fields that approximates the time average. You keep sampling until a steady averaged field is obtained. Obviously, no higher-order statistics can be obtained from such an averaged field, only zeroth-order statistics.
However, using such a steady field, you can compute the fluctuations (in the sense of the LES residual with respect to the RANS solution) for each instantaneous field simply by subtraction. Statistics at each time can then be obtained from these. The time autocorrelation can be used to compute the separation time, which gives an idea of how many independent periods you have that could mimic a series of experiments.

Old   February 3, 2017, 08:24
Default
  #11
Senior Member
 
Paolo Lampitella
Join Date: Mar 2009
Location: Italy
Posts: 2,151
Blog Entries: 29
Have you checked these pages?

https://en.wikipedia.org/wiki/Confidence_interval
https://en.wikipedia.org/wiki/Normal..._of_parameters
