
Monte Carlo Oversights

Summary:

Treating statistical estimates of parameters (Xbar and s) as though they were the true parameter values (mu and sigma) causes Monte Carlo simulations to underestimate the probabilities of rare events, sometimes by large factors.

Background:

Underlying nearly all Monte Carlo simulations is a simple (and correct) idea:   Samples from a known probability density can be generated in large numbers to observe the behavior of infrequent events. Of course most engineering simulations are significantly more involved than repeated samples from a single known distribution. These simulations sample from many different known densities and compute some characteristic of interest (e.g. failure). The failures are then tallied and their frequency estimated as their number divided by the total number of samples created, p = #failures/#samples.  There is nothing wrong with this thinking.
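
For readers who want to see the tally in code, here is a minimal sketch, assuming NumPy is available; the load and strength densities, their parameters, and the names used are purely hypothetical, chosen only to illustrate the pattern p = #failures/#samples.

    # Hypothetical illustration of the basic Monte Carlo tally:
    # sample from known densities, evaluate a failure criterion,
    # and estimate its frequency as #failures/#samples.
    import numpy as np

    rng = np.random.default_rng(1)
    n_samples = 1_000_000

    # Known densities with invented parameters (illustration only):
    load = rng.normal(loc=50.0, scale=5.0, size=n_samples)      # applied load
    strength = rng.normal(loc=75.0, scale=6.0, size=n_samples)  # component strength

    failures = np.count_nonzero(load > strength)  # characteristic of interest
    p = failures / n_samples                      # p = #failures/#samples
    print(f"estimated failure probability: {p:.6f}")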

But in all real situations the parameters of the probability densities are NOT known; however, statistical parameter estimates are usually available.  For example, if the density is assumed to be normal, its mean, mu, can be estimated by Xbar = (Σx)/n.  But mu does NOT equal Xbar, although the Central Limit Theorem insists that it is probably close.
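
A quick numeric sketch of this point, assuming NumPy: even when sampling from a normal density whose true parameters are exactly mu = 0 and sigma = 1, the estimates computed from a small sample will not reproduce those values.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(loc=0.0, scale=1.0, size=10)  # n = 10 draws from N(0, 1)

    xbar = x.mean()      # estimates mu, but does not equal it
    s = x.std(ddof=1)    # estimates sigma, but does not equal it
    print(f"Xbar = {xbar:+.3f}, s = {s:.3f}")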

Here is where most of my engineering colleagues would say "close enough for engineering work" and conclude, erroneously, that mu equals Xbar.   These same colleagues would never be tempted to say that some small probability, say, 1/1000, is equal to zero because it is "close enough for engineering work."  Why is it close enough in the one situation but not in the other?

The answer is that in the second case it is obvious that such an assumption would lead to a gross underestimation of potential risk, while in the first case the underestimation is less obvious, although just as real.

Two Simulations:

To illustrate this numerically we suggest two simulations.  These are not presented here, nor are their results.  For the distinctions to be meaningful, kind reader, you must create and execute both simulations (they're E-Z!) and draw conclusions for yourself.

First, we assume that you have a means of creating a Monte Carlo simulator; if you didn't, it is unlikely you would have been sufficiently interested to read this far.  So the steps to be taken in the two simulations will be outlined.  The actual implementation will depend on individual choice.

Simulation 1:

This is the simplest form of MC simulation, and it is an exemplar for most engineering MC simulations.

  1. Define the universe.  In this simplest of simulations we will sample from a standard normal density with known parameters, mu = 0 and sigma = 1.

  2. Draw n = 10,000 samples from this universe.

  3. Estimate the empirical probability that x < -3, by counting the values of x that are less than minus 3, and dividing by 10,000, i.e. p = #(x < -3)/#samples.

Of course we already know the theoretical answer because we know we are sampling from a standard normal density, p(x < -3) = 0.001349898.  Your simulation will likely have observed about 13 values of x less than -3.   It won't be exactly 13 because this is a random sample.   But if you were to create a much larger sample your empirical probability would approach the true value.  You already know all this, since this is the way you already perform MC simulations.   The reason for going to all the trouble is to provide a comparison for Simulation 2. 
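
A minimal sketch of Simulation 1, assuming NumPy; the function name simulate_known is a convenience for this sketch, not part of any library.

    import numpy as np

    def simulate_known(n_samples=10_000, threshold=-3.0, seed=0):
        # Sample from the known universe N(0, 1) and tally x < threshold.
        rng = np.random.default_rng(seed)
        x = rng.normal(loc=0.0, scale=1.0, size=n_samples)
        return np.count_nonzero(x < threshold) / n_samples

    print(simulate_known())  # roughly 0.0013, i.e. about 13 of 10,000 samples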

Simulation 2:

This simulation considers the more realistic situation where the true values for mu and sigma are NOT known.  WAIT!  Of course you know them; you are building the simulator!  But you can override your simulated omniscience and create a more realistic situation by using one simulation nested within another. 

The outer simulation loop mimics your having conducted a laboratory experiment to determine the mean and variance of the probability density to be used in the inner simulation.   The inner simulation is just like Simulation 1, except that it uses parameter estimates provided by the outer simulation.   In this way you are simulating having to rely on parameter estimates, since the true values are unknown and must be estimated by laboratory or other experiments.

Outer Loop:

  1. Again, define the universe, this time providing estimates of the unknown parameters, mu and sigma.  We will use Xbar and s as the estimators of mu and sigma, respectively.

    To do this draw a sample of size, say n = 10, from the known standard normal, and compute Xbar and s using the standard estimates (*).   This represents your having to use estimates for mu and sigma, just as you must do in your real simulations.

Inner Loop:

  1. Now conduct Simulation 1, but using the parameter estimates rather than the true values used in Simulation 1.  That is, draw n = 10,000 samples from this universe, i.e. N(Xbar, s) rather than N(0, 1).

  2. Keep a running total of the number of instances that x < -3.

Outer Loop (continued):

  1. Repeat the Outer Loop (and of course its nested Inner Loop) some large number of times (say 1,000), using new estimates (Xbar and s) for mu and sigma for each generation of 10,000 Simulation 1-type samples.

  2. Now estimate the empirical probability that x < -3 by counting the values of x that are less than -3 and dividing by the total number of samples, in our example 1,000 x 10,000 = 10^7, i.e. p = #(x < -3)/#samples.
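
Here is a minimal sketch of the nested simulation, again assuming NumPy; simulate_nested is a hypothetical helper name chosen for this sketch.

    import numpy as np

    def simulate_nested(n_outer=1_000, n_lab=10, n_inner=10_000,
                        threshold=-3.0, seed=0):
        # Outer loop: mimic a lab experiment of size n_lab to estimate mu and sigma.
        # Inner loop: draw n_inner samples from N(Xbar, s) and tally x < threshold.
        rng = np.random.default_rng(seed)
        failures = 0
        for _ in range(n_outer):
            lab = rng.normal(loc=0.0, scale=1.0, size=n_lab)  # the "experiment"
            xbar, s = lab.mean(), lab.std(ddof=1)             # parameter estimates
            x = rng.normal(loc=xbar, scale=s, size=n_inner)   # Simulation-1 step
            failures += np.count_nonzero(x < threshold)
        return failures / (n_outer * n_inner)

    print(simulate_nested())  # compare with the known-parameter value 0.001349898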

Compare this result with p = 0.001349898.  Startling, isn't it?  You observed more than 5 times as many "failures" as you expected.

Try running the inner loop with 100,000 samples rather than the current 10,000.  What happens to the empirical probability that x < -3?   Right!  Nothing!   Running the inner loop longer and longer has little influence because the real cause of the variability is the uncertainty in the sampling population's parameters, the very quantities you usually assume you know with certainty when in reality they had to be estimated from outside data.

Do the simulations again, but this time record the number of times that x < -4.   This time you observe nearly 50 times as many "failures" as you expected.   If your criterion were p = 1x10^-6, Simulation 1 will show about 1 in a million values of x < -4.753424, while Simulation 2 will show more than 500 times that.  That ISN'T close enough for engineering work.
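
If you built something like the simulate_nested sketch above, these two experiments amount to nothing more than changing its arguments; the calls below are illustrative only.

    print(simulate_nested(n_inner=100_000))      # longer inner loop: little change
    print(simulate_nested(threshold=-4.0))       # tally x < -4 instead of x < -3
    print(simulate_nested(threshold=-4.753424))  # criterion corresponding to p = 1e-6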

A Feud of Historic Dimensions:

Confusing a parameter value with its estimate has an interesting history, and was the cause of a monumental quarrel between two statistical giants, Karl Pearson and R. A. Fisher.

Karl Pearson (1857-1936) was the leading statistician of his day. He founded the prestigious statistical journal Biometrika (still a major publication), the Pearson correlation coefficient is named after him, he developed a system of probability density curves also named after him, and he invented the Chi-square test of statistical significance.

R. A. Fisher (1890-1962) wrote the first book on Design of Experiments, revolutionized statistics with the concept of likelihood and estimating parameters by maximizing their likelihood, and told Pearson that he misunderstood his own Chi-square test, and was therefore calculating probability of failure incorrectly by using the sample means as though they were the population means.

The resulting acrimonious and vitriolic row lasted years and was finally resolved in 1926, in Fisher's favor, by, ironically, Egon Pearson, Karl Pearson's son, who published the results of 12,000 2x2 tables observed under random sampling.  If Karl Pearson had been correct, the observed average value of Chi-square would have been 3; Fisher said it would be 1, as it was. (ref.: Joan Fisher Box, R. A. Fisher: The Life of a Scientist, Wiley, 1978)  As a result, the elder Pearson's erroneous calculations would accept as having a 5% probability of occurrence something with a true probability of 0.5%, an error of 10x, effectively increasing his Type II error rate (failing to reject a false null hypothesis) by 10x.

Summary:

There are several lessons here:

  1. You are in good company - even great minds can miss the distinction between a parameter value and its estimate.

  2. Problems caused by confusing sample statistics with population parameters are not restricted to the Normal density.

  3. Consequences of this confusion will result in anticonservative underestimation of "low probability" events, i.e. the real probability will be higher than you think it is.  (It will also result in anticonservative overestimation of "high probability" events because the unaccounted-for uncertainty forces the true probabilities toward p = 0.5.)

Let's Review:

s is not sigma and Xbar is not mu.

The sample statistic does not equal the population parameter (but it will be close to it).

... and if your simulator is not accounting for these facts, then your probability estimates are WRONG.

All is not hopeless, of course; statisticians have been studying the phenomenon uncovered by the Pearson-Fisher feud for 80 years.  Not surprisingly, there is no "silver bullet" quick-fix, one-size-fits-all solution, but we do have 8 decades of statistical theory and practice in dealing with it successfully.


(*) Notes:

The standard estimators:  Xbar = (x1 + x2 + ... + xn)/n,  and  s = sqrt( Σ(xi - Xbar)^2 / (n - 1) ).