SE_003399

Convergence in Distribution

Engineers are familiar with mathematical convergence - that the partial sums of a series approach some limit as the number of terms increases.

We are less familiar with the analogous statistical concept of "convergence in distribution," where the limit isn't a single value; rather, the distribution of the sequence of random variables itself approaches some specific limiting distribution. The best-known example is the central limit theorem. Further examples are illustrated here, with the dotted arrows indicating asymptotic relationships.
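The central limit theorem can be watched in action with a short simulation. This is a sketch under illustrative assumptions (Uniform(0,1) draws; the sample size n and replication count reps are arbitrary choices): the standardized sample means approach N(0,1) even though the underlying draws are far from normal.

```python
# CLT sketch: standardized means of Uniform(0,1) draws approach N(0, 1).
import random
import statistics

random.seed(1)

n = 1000       # draws per sample mean
reps = 2000    # number of sample means

# Uniform(0,1) has mean 0.5 and variance 1/12.
mu, sigma = 0.5, (1.0 / 12.0) ** 0.5

z = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    # sqrt(n) * (xbar - mu) / sigma converges in distribution to N(0, 1).
    z.append((n ** 0.5) * (xbar - mu) / sigma)

# The standardized means should have mean near 0 and spread near 1.
print(round(statistics.mean(z), 2), round(statistics.stdev(z), 2))
```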

"Convergence in probability" is not quite the same as convergence in distribution. Convergence in probability says that the random variable converges to a known value (or to another random variable): the probability that the difference (r.v. − value, or r.v. − other r.v.) exceeds any fixed amount goes to zero. Convergence in distribution only says that the variables behave the same way statistically; they need not take the same values. Clearly, if X has a normal density, X ~ N(0,1), and Y, too, has a normal density, Y ~ N(0,1), then the difference between a random draw from X and a random draw from Y is not zero, X − Y ≠ 0, even though X and Y have identical distributions.
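A quick numerical illustration of that last point (sample size chosen arbitrarily): two independent draws from the same N(0,1) density are equal in distribution but essentially never equal in value, and their difference X − Y has variance 1 + 1 = 2.

```python
# Identically distributed is not the same as identical: X - Y is itself
# random, with standard deviation sqrt(2) ≈ 1.414, not degenerate at 0.
import random
import statistics

random.seed(2)

diffs = [random.gauss(0, 1) - random.gauss(0, 1) for _ in range(20000)]

# The differences are (essentially) never exactly zero...
nonzero = sum(1 for d in diffs if d != 0.0)
# ...and their spread matches N(0, 2).
print(nonzero, round(statistics.stdev(diffs), 2))
```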

Still other examples of convergence in distribution are the extreme value distributions.
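As a sketch of extreme-value convergence (distribution and sample sizes are illustrative assumptions): the maximum of n independent Exp(1) draws, shifted by log(n), converges in distribution to the standard Gumbel distribution, whose mean is Euler's constant γ ≈ 0.577.

```python
# Extreme-value sketch: max of n Exp(1) draws minus log(n) converges in
# distribution to the standard Gumbel (mean = Euler's gamma ≈ 0.577).
import math
import random
import statistics

random.seed(3)

n = 1000     # draws per maximum
reps = 3000  # number of maxima

m = [max(random.expovariate(1.0) for _ in range(n)) - math.log(n)
     for _ in range(reps)]

print(round(statistics.mean(m), 2))  # near Euler's gamma, 0.577
```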

So what? In practical applications, simple, direct-sampling* Monte Carlo simulation may not be up to the task of producing draws from the target joint density, even when that joint density is correctly specified. (Sadly, many engineering MC simulations rely on an inadequate correlation coefficient, or, worse, ignore dependencies among variables entirely.)
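To see why ignoring dependence is dangerous, consider a toy example (the bivariate-normal "loads," threshold, and correlation values are all illustrative assumptions, not the author's example): the probability that the sum of two standard-normal variables exceeds a threshold grows sharply with their correlation, so simulating them as independent badly understates the tail risk.

```python
# Sketch of why dependence matters in Monte Carlo: P(X + Y > threshold)
# for bivariate-normal X, Y depends strongly on their correlation rho.
import random

random.seed(4)

def exceedance(rho, threshold=3.0, trials=200_000):
    """Estimate P(X + Y > threshold), corr(X, Y) = rho, both N(0, 1)."""
    hits = 0
    for _ in range(trials):
        x = random.gauss(0, 1)
        # Conditional construction: Y = rho*X + sqrt(1 - rho^2)*Z
        # yields corr(X, Y) = rho with Y ~ N(0, 1).
        y = rho * x + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1)
        if x + y > threshold:
            hits += 1
    return hits / trials

p_indep, p_corr = exceedance(0.0), exceedance(0.8)
print(p_indep, p_corr)  # positive correlation fattens the tail
```

With rho = 0 the sum is N(0, 2) and the exceedance probability is about 0.017; with rho = 0.8 the variance rises to 3.6 and the probability more than triples.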

Recent advances** in computational statistics take advantage of convergence in distribution to simulate the often complicated joint density by drawing samples from the joint probability density itself (!), even when it cannot be sampled directly. These are iterative, rather than direct, sampling methods. It can be shown that, under suitable conditions, the sequence of samples ultimately becomes ergodic***, with elements of the sequence converging in distribution to the target, thus representing samples from the desired joint probability density.
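A minimal sketch of one such iterative method, the Metropolis algorithm (the target density, proposal width, chain length, and burn-in are all illustrative choices): the chain needs the target only up to a normalizing constant, and its later elements converge in distribution to the target.

```python
# Minimal Metropolis sketch: an iterative sampler whose chain elements
# converge in distribution to a target known only up to a constant.
import math
import random

random.seed(5)

def unnormalized_target(x):
    # Proportional to N(2, 1); pretend we cannot normalize or invert it.
    return math.exp(-0.5 * (x - 2.0) ** 2)

x = 0.0          # arbitrary starting point
chain = []
for _ in range(20000):
    proposal = x + random.gauss(0, 1.0)  # symmetric random-walk proposal
    # Accept with probability min(1, target ratio); the unknown
    # normalizing constant cancels in the ratio.
    if random.random() < unnormalized_target(proposal) / unnormalized_target(x):
        x = proposal
    chain.append(x)

samples = chain[2000:]  # discard early, non-converged draws (burn-in)
print(round(sum(samples) / len(samples), 1))  # should settle near 2
```

Early elements of the chain reflect the arbitrary start, which is why they are discarded; it is the limiting, ergodic behavior that delivers draws from the target.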

Because they do not have to sample everywhere in the probability space, only where the probability mass actually concentrates, these methods are far less fettered by the problem of large dimensions (the Curse of Dimensionality).