Underlying nearly all Monte Carlo simulations is a simple (and correct) idea:
Samples from a known probability density can be produced algorithmically in
numbers large enough to observe the behavior of infrequent events. Of course most
engineering simulations are significantly more involved than repeated samples
from a single known distribution. These simulations sample from many different
known densities and compute some characteristic of interest (e.g. failure).
The number of failures is then tallied and their frequency estimated as that
count divided by the total number of samples created, p = #failures/#samples.
There is nothing wrong with this thinking.
But in all real situations the parameters of the probability densities
are NOT known; however, statistical parameter estimates are usually
available. For example, if the density is assumed to be normal, its mean,
mu, can be estimated by Xbar = Σx/n. But Xbar is not mu, although the
Central Limit Theorem insists that it is probably close.
Here is where most of my engineering colleagues would say "close enough
for engineering work" and conclude, erroneously, that Xbar = mu.
These same colleagues would never be tempted to say that some small
probability, say, 1/1000, is equal to zero because it is "close
enough for engineering work." Why is it close enough in the one situation
but not in the other?
The answer is that in the second case it is obvious that such an
assumption would lead to a gross underestimation of potential risk, while in
the first case the underestimation is less obvious, although just as real.
To illustrate this numerically we suggest two
simulations. Their results are not presented here: for the
distinctions to be meaningful, kind reader, you must create and
execute both simulations (they're E-Z!) and draw the conclusions for yourself.
First, we assume that you have a means of creating a Monte Carlo
simulator; if you didn't, it is unlikely you would have been sufficiently
interested to read this far. So the steps to be taken in the two
simulations will be outlined; the actual implementation will depend on
your own computing environment.
Simulation 1:
This is the simplest form of MC simulation, and is an exemplar for most engineering MC simulations.
Define the universe. In this simplest of simulations we will sample from a
standard normal density with known parameters, mu = 0 and sigma = 1.
Draw n = 10,000 samples from this universe.
Estimate the empirical probability
that x < -3, by counting the values of x that are less than minus 3, and
dividing by 10,000, i.e.
p = #(x < -3)/#samples.
Of course we already know the theoretical
answer because we know we are sampling from a standard normal density, p(x < -3) =
0.001349898. Your simulation will likely have observed about 13 values
of x less than -3. It won't be exactly 13 because this is a random
sample. But if you were to create a much larger sample, your empirical
probability would approach the true value. You already know all this;
it is the way you perform MC simulations now. The reason for
going to all this trouble is to provide a comparison for Simulation 2.
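If you want a concrete starting point, here is a minimal sketch of Simulation 1 in Python with NumPy. The language, the library, and the unseeded generator are my choices, not part of the recipe:

    import numpy as np

    rng = np.random.default_rng()    # unseeded, so each run differs slightly

    n = 10_000
    x = rng.standard_normal(n)       # draw from the known universe, N(0, 1)
    p_hat = np.count_nonzero(x < -3) / n    # p = #(x < -3)/#samples
    print(f"empirical p = {p_hat:.6f}   (theoretical p = 0.001349898)")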
Simulation 2:
This simulation considers the more realistic situation where the true
parameters, mu and sigma, are NOT known. WAIT! Of course you know them; you
are building the simulator! But you can override your simulated omniscience
and create a more realistic situation by using one simulation nested within another.
The outer simulation loop mimics your having conducted a laboratory
experiment to determine the mean and variance of the probability density to
be used in the inner simulation. The inner simulation is just like
Simulation 1, except that it uses parameter estimates provided by the outer
simulator. In this way you are simulating having to rely on parameter
estimates since the true values are unknown and must be estimated by
laboratory or other experiments.
Outer Loop:
Again, define the universe, this time
providing estimates of the unknown parameters, mu and sigma. We will
use Xbar and s as the estimators of mu and sigma.
To do this draw a sample of size, say, n = 10, from
the known standard normal, and compute Xbar and s using the standard
estimators, Xbar = Σx/n and s² = Σ(x - Xbar)²/(n - 1). This represents your
having to use estimates for mu and sigma, just as you must do in
your real simulations.
Inner Loop:
Now conduct Simulation 1, but using
the parameter estimates rather than their true values as in
Simulation 1. That is, draw n = 10,000 samples from this universe,
N(Xbar, s), rather than N(0,1).
Keep a running total of the number of
instances where x < -3.
Outer Loop (continued):
Repeat the Outer Loop (and of
course its nested Inner Loop)
some large number of times (say 1,000), using new estimates (Xbar, s)
for mu and sigma for
each generation of 10,000 Simulation 1-type samples.
Now estimate the empirical probability
that x < -3, by counting the values of x that are less than -3, and
dividing by the total number of samples, in our example 1,000 x 10,000 = 10,000,000:
p = #(x < -3)/#samples.
Compare this result with p = 0.001349898.
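For those who would like to check their own implementation, here is a sketch of the nested scheme, again in Python with NumPy; the function name simulation2 and its default arguments are mine, chosen to match the numbers above:

    import numpy as np

    rng = np.random.default_rng()

    def simulation2(n_outer=1_000, n_cal=10, n_inner=10_000, threshold=-3.0):
        # Outer Loop mimics a laboratory experiment that estimates mu and
        # sigma; the Inner Loop is Simulation 1 run with those estimates.
        failures = 0
        for _ in range(n_outer):
            cal = rng.standard_normal(n_cal)   # "laboratory" sample from N(0, 1)
            xbar = cal.mean()                  # estimate of mu
            s = cal.std(ddof=1)                # estimate of sigma (n - 1 divisor)
            x = rng.normal(xbar, s, n_inner)   # sample N(Xbar, s), not N(0, 1)
            failures += np.count_nonzero(x < threshold)
        return failures / (n_outer * n_inner)

    print(f"empirical p = {simulation2():.6f}   vs. theoretical 0.001349898")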
Startling, isn't it? You observed more than 5 times as many "failures"
as you expected.
Try running the inner loop with 100,000
samples, rather than the current 10,000. What happens to the empirical
probability that x < -3? Right! Nothing! Running the inner
loop longer and longer has little influence because the real cause of the
variability is the uncertainty in the sampling population's parameters, the
very entities you usually assume you know with certainty but really had
to estimate from outside data.
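(In the simulation2 sketch above this is a one-argument change, simulation2(n_inner=100_000): the answer will still hover well above 0.00135, because n_cal, not n_inner, governs the parameter uncertainty.)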
Do the simulations again, but this time
record the number of times that x < -4. Now you observe
nearly 50 times as many "failures" as you expected. If your
criterion were p = 1x10^-6, Simulation 1 will show about 1 per million x <
-4.753424, while Simulation 2 will show more than 500 times that.
That ISN'T close enough for engineering work.
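(With the sketch above this is simulation2(threshold=-4.0); compare its output with the theoretical tail probability p(x < -4) = 3.17x10^-5 to see the roughly 50-fold inflation for yourself.)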
A Feud of Historic Dimensions:
Confusing a parameter value with its
estimate has an interesting history, and was the cause of a monumental
quarrel between two statistical giants, Karl Pearson and R. A. Fisher.
Karl Pearson (1857-1936) was the leading statistician of his
day. He founded the prestigious statistical journal Biometrika (still
a major publication), the Pearson correlation coefficient is named after
him, he developed a system of probability density curves also named after
him, and he invented the Chi-square test of statistical significance.
R. A. Fisher (1890-1962) wrote the first book on Design of
Experiments, revolutionized statistics with the concept of likelihood and
estimating parameters by maximizing their likelihood, and told Pearson that
he misunderstood his own Chi-square test and was therefore calculating
probabilities of failure incorrectly, by using the sample means as though
they were the population means.
The resulting acrimonious and vitriolic row lasted years and
was finally resolved in 1926 in Fisher's favor by, ironically, Egon Pearson,
Karl Pearson's son, who published the results of 12,000 2x2 tables observed
under random sampling. If Karl Pearson had been
correct, the observed average value for Chi-square would have been 3. Fisher said it
would be 1, as it was (ref.: Joan Fisher Box,
R. A. Fisher: The Life of a Scientist, Wiley, 1978). As a
result, the elder Pearson's calculations would erroneously accept
as having a 5% probability of occurrence something with a true probability
of 0.5%, an error of 10x, effectively increasing his Type II
error rate (failing to reject a false null hypothesis) by 10x.
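You can even replay a small version of Egon Pearson's experiment. The sketch below is mine, not his exact protocol: it generates 2x2 tables under independent random sampling and averages Pearson's Chi-square statistic, with expected counts taken, as in practice, from each table's own margins. The average lands near Fisher's 1, not Karl Pearson's 3:

    import numpy as np

    rng = np.random.default_rng()

    def mean_chisq(n_tables=12_000, n=100, p_row=0.5, p_col=0.5):
        # Cross-classify n items by two independent binary attributes and
        # average the Chi-square statistic over many such 2x2 tables.
        total, used = 0.0, 0
        for _ in range(n_tables):
            rows = rng.random(n) < p_row       # attribute A
            cols = rng.random(n) < p_col       # attribute B, independent of A
            table = np.array([[np.sum(rows & cols),  np.sum(rows & ~cols)],
                              [np.sum(~rows & cols), np.sum(~rows & ~cols)]],
                             dtype=float)
            r, c = table.sum(axis=1), table.sum(axis=0)
            if r.min() == 0 or c.min() == 0:
                continue                       # degenerate table; skip it
            expected = np.outer(r, c) / n      # expected counts from the margins
            total += ((table - expected) ** 2 / expected).sum()
            used += 1
        return total / used

    print(f"average Chi-square = {mean_chisq():.2f}  (Fisher: 1; K. Pearson: 3)")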
There are several lessons here:
You are in good company - even great minds can miss the
distinction between a parameter value and its estimate.
Problems caused by confusing sample
statistics with population parameters are not restricted to the Normal
density.
Consequences of this confusion include anticonservative underestimates
of "low probability" events, i.e. the real probability will be higher
than you think it is. (It will also produce anticonservative
overestimates of "high probability" events, because the unaccounted-for
uncertainty forces the true probabilities toward p = 0.5.)
s is not sigma and Xbar is not mu. The sample statistic does not
equal the population parameter (but it will be close to it).
... and if your simulator is not
accounting for these facts, then your probability estimates are WRONG.
All is not hopeless, of course, and statisticians have been studying the
phenomenon uncovered by the Pearson-Fisher feud for 80 years. Not
surprisingly, there is no "silver bullet" quick-fix, one-size-fits-all
solution, but we do have 8 decades of statistical theory and practice in
successfully dealing with it.