
Sums of Random Variables

 


Nomenclature:

Upper-case letters, X, Y, are random variables; lower-case letters, x, y, are specific realizations of them. Upper-case F is a cumulative distribution function (cdf) and lower-case f is a probability density function (pdf).

 

Sometimes you need to know the distribution of some combination of things: the sum of two incomes, for example, or the difference between demand and capacity. If fX(x) is the distribution (probability density function, pdf) of one item, and fY(y) is the distribution of another, what is the distribution of their sum, Z = X + Y?

As a simple example, consider X and Y each having a uniform distribution on the interval (0, 1). The distribution of their sum is triangular on (0, 2).

Why? To begin, consider the problem qualitatively. The minimum possible value of Z = X + Y is zero, when x = 0 and y = 0, and the maximum possible value is two, when x = 1 and y = 1. Thus the sum is defined only on the interval (0, 2), since the probability of z < 0 or z > 2 is zero; that is,
P(Z < 0) = 0 and P(Z > 2) = 0.

Further, it seems intuitive(1) that the most probable value would be near z = 1, the midpoint of the interval, for several reasons: the summands are iid (independent, identically distributed), and the sum is a linear operation that doesn't distort symmetry. So we would intuit(2) that the probability density of Z = X + Y should start at zero at z = 0, rise to a maximum at mid-interval, z = 1, and then drop symmetrically back to zero at the end of the interval, z = 2. We might expect the distribution of Z = X + Y to look like this:

[Figure: the anticipated pdf of Z = X + Y, a triangle on (0, 2) rising from zero at z = 0 to a peak at z = 1 and returning to zero at z = 2.]
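Before the formal proof, a quick empirical check. The following Monte Carlo sketch (in Python, assuming numpy is available; the seed and sample size are arbitrary choices) compares a histogram of simulated sums with the conjectured triangle:

import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
z = rng.uniform(0, 1, n) + rng.uniform(0, 1, n)      # realizations of Z = X + Y

# Empirical density on (0, 2) versus the conjectured triangular pdf
counts, edges = np.histogram(z, bins=20, range=(0, 2), density=True)
mids = (edges[:-1] + edges[1:]) / 2
triangle = np.where(mids <= 1.0, mids, 2.0 - mids)   # f(z) = z, then 2 - z
print(np.abs(counts - triangle).max())               # small, typically under 0.01

The histogram hugs the triangle, but a million random draws are suggestive, not a proof.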

Enough of visceral pseudo-calculus. How do you prove that this result is correct?

Proof:
FZ(z), the cdf of Z, is the probability that the sum, Z, is less than or equal to some value z. The probability density that we're looking for is fZ(z) = d[FZ(z)]/dz, by the relationship between a cdf and a pdf.

 

1.  FZ(z) = P(Z ≤ z) = P(X + Y ≤ z)
    by definition.

2.  FZ(z) = ∫_{−∞}^{∞} P(X + Y ≤ z | X = x) fX(x) dx
    by conditioning on X, using the definition of conditional probability.

3.  FZ(z) = ∫_{−∞}^{∞} P(Y ≤ z − x) fX(x) dx
    letting X = x and invoking the independence of X and Y.

4.  Now, P(Y ≤ z − x) = FY(z − x)
    by the definition of FY,

5.  so that FZ(z) = ∫_{−∞}^{∞} FY(z − x) fX(x) dx
    by substitution of 4 into 3.

6.  Also, fZ(z) = d[FZ(z)]/dz
    by the relationship between a pdf and its cdf,

7.  so fZ(z) = d/dz ∫_{−∞}^{∞} FY(z − x) fX(x) dx
    by substituting 5 into 6, and

8.  fZ(z) = ∫_{−∞}^{∞} d/dz[FY(z − x)] fX(x) dx
    by Leibniz's rule for differentiating an integral.

9.  Since d/dz[FY(z − x)] = fY(z − x)
    by the relationship between a pdf and its cdf (note that y = z − x implies dy = dz),

10. we have, finally, fZ(z) = ∫_{−∞}^{∞} fY(z − x) fX(x) dx
    by substituting 9 into 8. This establishes the general result; now apply it to the problem at hand:

11. fX(x) = 1 for 0 ≤ x ≤ 1, and

12. fY(y) = 1 for 0 ≤ y ≤ 1,
    by the definition of a uniform distribution on (0, 1), so that

13. the integrand in 10 is 1 where both 0 ≤ x ≤ 1 and 0 ≤ z − x ≤ 1, and zero elsewhere,
    from 10, 11 and 12 above. Breaking the integral into two parts depending on z:

14. fZ(z) = ∫_0^z dx = z, if 0 ≤ z ≤ 1, and

15. fZ(z) = ∫_{z−1}^{1} dx = 2 − z, if 1 ≤ z ≤ 2,

which are seen to be the equations describing the triangular distribution on (0, 2) shown in the figure above. Q.E.D.
Note that 10, above, is called the convolution of the functions fX(x) and fY(y). This result is general and is true for any two independent continuous densities.
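If you'd rather let a computer do the calculus, here is a minimal symbolic sketch (in Python, assuming the sympy library is available) that re-evaluates the two integrals in steps 14 and 15:

import sympy as sp

x, z = sp.symbols('x z')

# 0 <= z <= 1: the integrand is 1 for x between 0 and z (step 14)
print(sp.integrate(1, (x, 0, z)))        # prints: z

# 1 <= z <= 2: the integrand is 1 for x between z - 1 and 1 (step 15)
print(sp.integrate(1, (x, z - 1, 1)))    # prints: 2 - z

Both pieces agree with the triangular density derived above.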

So what?  Consider our original problem when fX(x) and fY(y) are both uniform on (0, 1) and Z = X + Y is their sum.
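Step 10 also lends itself to numerical evaluation, which is the practical payoff when the densities are less convenient than these. Here is a minimal sketch (in Python, assuming numpy is available; the grid spacing is an arbitrary choice) that approximates the convolution integral with a Riemann sum:

import numpy as np

dx = 0.001
x = np.arange(0.0, 1.0, dx)      # grid over the support of fX and fY
fX = np.ones_like(x)             # fX(x) = 1 on (0, 1)
fY = np.ones_like(x)             # fY(y) = 1 on (0, 1)

# Discrete version of step 10: fZ(z) = ∫ fY(z - x) fX(x) dx
fZ = np.convolve(fX, fY) * dx    # supported on (0, 2)

for zi in (0.25, 1.0, 1.5):
    print(zi, round(fZ[int(round(zi / dx))], 3))   # ~0.25, 1.0, 0.5: the triangle

The same few lines work for any pair of independent densities sampled on a common grid.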

 

Footnotes:

  1. Statistical intuition can sometimes be misleading. See Joseph P. Romano and Andrew F. Siegel (1986), Counterexamples in Probability and Statistics, Wadsworth & Brooks/Cole Statistics/Probability Series.

  2. Is intuit a real word?



 
