# Joint, Marginal, and Conditional Distributions

We engineers often ignore the distinctions between joint, marginal, and conditional probabilities - to our detriment.

*Figure 1 - How the joint, marginal, and conditional distributions are related.*

**Conditional probability**: *f(x | y; θ)*, where *f* is the probability density of *x* by itself, given a specific value of the variable *y* and the distribution parameters, *θ*. (See Figure 1.) If *x* and *y* represent events *A* and *B*, then *P(A|B) = n*_{AB}*/n*_{B}, where *n*_{AB} is the number of times *both* *A* and *B* occur, and *n*_{B} is the number of times *B* occurs. Equivalently, *P(A|B) = P(AB)/P(B)*, since

*P(AB) = n*_{AB}*/N* and *P(B) = n*_{B}*/N*, so that *P(AB)/P(B) = (n*_{AB}*/N)/(n*_{B}*/N) = n*_{AB}*/n*_{B}.
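As a minimal sketch, the counting definition above can be checked numerically. The trial data here are hypothetical; each trial records whether events *A* and *B* occurred.

```python
# Conditional probability from counts: P(A|B) = n_AB / n_B.
# Hypothetical data: each trial records (A occurred, B occurred).
trials = [
    (True, True), (True, False), (False, True),
    (True, True), (False, False), (False, True),
]

N = len(trials)
n_B = sum(1 for a, b in trials if b)         # times B occurs
n_AB = sum(1 for a, b in trials if a and b)  # times both A and B occur

p_A_given_B = n_AB / n_B

# The same value via P(AB)/P(B), since the factor 1/N cancels:
p_AB = n_AB / N
p_B = n_B / N
assert abs(p_A_given_B - p_AB / p_B) < 1e-12

print(p_A_given_B)  # 2/4 = 0.5
```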

**Joint probability** is the probability of two or more things happening together: *f(x, y; θ)*, where *f* is the probability density of *x* and *y* together as a pair, given the distribution parameters, *θ*. Often these events are *not* independent, and sadly this is often ignored. Furthermore, the correlation coefficient by itself does *not* adequately describe these interrelationships.

Consider first the idea of a probability *density* or **distribution**: *f(x; θ)*, where *f* is the probability density of *x*, given the distribution parameters, *θ*. For a normal distribution,

*f(x; μ, σ) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²))*,

where *μ* is the mean, and *σ* is the standard deviation. This is sometimes called a **pdf**, **probability density function**. The integral of a pdf, the area under the curve (corresponding to the probability) between specified values of *x*, is a **cdf**, **cumulative distribution function**, *F(x) = ∫*_{−∞}^{x} *f(t) dt*. For discrete *f*, *F* is the corresponding summation.
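A minimal sketch of the normal pdf and its cdf, using only the standard library (the cdf is expressed through the error function; `scipy.stats.norm` provides the same thing, vectorized):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density at x, given mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the pdf from -infinity to x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(normal_pdf(0.0))   # 1/sqrt(2*pi) ~ 0.3989
print(normal_cdf(0.0))   # 0.5, by symmetry about the mean
print(normal_cdf(1.96) - normal_cdf(-1.96))  # ~ 0.95, area between +/- 1.96 sigma
```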

A joint probability density of two or more variables is called a
**multivariate distribution**. It is often summarized by a
vector of parameters, which may or may not be sufficient to characterize
the distribution completely. For example, the multivariate normal is summarized
(sufficiently) by a mean vector and covariance matrix.

**Marginal probability**: *f(x; θ) = ∫ f(x, y; θ) dy*, where *f* is the probability density of *x*, for all possible values of *y*, given the distribution parameters, *θ*. The marginal probability is determined from the joint distribution of *x* and *y* by integrating over all values of *y*, called *"integrating out"* the variable *y*. In applications of Bayes's Theorem, *y* is often a matrix of possible parameter values. The figure illustrates joint, marginal, and conditional probability relationships.
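For discrete variables, "integrating out" becomes summing out. A minimal sketch, with a hypothetical joint table over two small discrete variables:

```python
# Marginal probability by "summing out" y from a discrete joint table.
# Hypothetical joint distribution P(x, y); entries sum to 1.
joint = {
    ("x1", "y1"): 0.10, ("x1", "y2"): 0.20,
    ("x2", "y1"): 0.30, ("x2", "y2"): 0.40,
}

marginal_x = {}
for (x, y), p in joint.items():
    # Accumulate over all values of y for each value of x.
    marginal_x[x] = marginal_x.get(x, 0.0) + p

print(marginal_x)  # approximately {'x1': 0.3, 'x2': 0.7}
```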

Note that in general the conditional probability of **A** given **B** is *not* the same as that of **B** given **A**. The probability of both **A** and **B** together is **P(AB)**, and if both **P(A)** and **P(B)** are non-zero this leads to a statement of **Bayes's Theorem**:

*P(A|B) = P(B|A) × P(A) / P(B)*, and

*P(B|A) = P(A|B) × P(B) / P(A)*.
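A minimal numerical sketch of Bayes's Theorem. The numbers are hypothetical: a diagnostic test where *A* is "condition present" and *B* is "test positive".

```python
# Bayes's Theorem: P(A|B) = P(B|A) * P(A) / P(B).
# Hypothetical inputs for a diagnostic-test scenario.
p_A = 0.01             # prior: 1% prevalence of the condition
p_B_given_A = 0.95     # P(positive | condition): sensitivity
p_B_given_notA = 0.05  # P(positive | no condition): false-positive rate

# Total probability of a positive test, P(B):
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 3))  # ~ 0.161: far from P(B|A) = 0.95
```

The result illustrates the point above: *P(A|B)* and *P(B|A)* can differ dramatically.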

Conditional probability is also the basis
for **statistical dependence** and **statistical independence**.

**Independence**: Two variables, **A** and **B**, are independent if their
conditional probability is equal to their unconditional probability. In
other words, *A* and *B* are independent if, and only if,
*P(A|B) = P(A)* and
*P(B|A) = P(B)*. In engineering terms,
*A* and *B* are independent if knowing
something about one tells nothing about the other. This is the origin of
the familiar, but often misused, formula *P(AB) = P(A)
× P(B)*, which is
true only when *A* and *B* are independent.
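As a minimal sketch with hypothetical probabilities, the product rule can be used as a numerical test of independence:

```python
# A and B are independent iff P(AB) = P(A) * P(B).
# Hypothetical probabilities for two events.
p_A, p_B = 0.4, 0.5

p_AB_indep = 0.20  # equals 0.4 * 0.5: consistent with independence
p_AB_dep = 0.35    # does not: A and B are dependent

def independent(p_ab, p_a, p_b, tol=1e-12):
    """Check the product rule within a floating-point tolerance."""
    return abs(p_ab - p_a * p_b) < tol

print(independent(p_AB_indep, p_A, p_B))  # True
print(independent(p_AB_dep, p_A, p_B))    # False
```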

**Conditional independence**: **A** and **B** are conditionally independent, given **C**, if

*Prob(A=a, B=b | C=c) = Prob(A=a | C=c) × Prob(B=b | C=c)* whenever *Prob(C=c) > 0*.

So the joint probability of **ABC**, when **A** and **B** are conditionally
independent, given **C**, is then

*Prob(ABC) = Prob(C) × Prob(A | C) × Prob(B | C)*.

A directed graph illustrating this conditional independence is **A** <- **C** -> **B**.
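The factorization above can be sketched for binary variables. The conditional tables here are hypothetical; the check is that the eight joint probabilities sum to 1.

```python
# Joint probability under conditional independence of A and B given C:
# P(A, B, C) = P(C) * P(A|C) * P(B|C), matching the graph A <- C -> B.
# Hypothetical tables, indexed by the value of C (0 or 1).
p_C = {0: 0.3, 1: 0.7}
p_A_given_C = {0: 0.9, 1: 0.2}  # P(A=1 | C=c)
p_B_given_C = {0: 0.6, 1: 0.5}  # P(B=1 | C=c)

def joint(a, b, c):
    """P(A=a, B=b, C=c) via the conditional-independence factorization."""
    pa = p_A_given_C[c] if a else 1 - p_A_given_C[c]
    pb = p_B_given_C[c] if b else 1 - p_B_given_C[c]
    return p_C[c] * pa * pb

# Sanity check: all eight outcomes sum to 1.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(round(total, 12))  # 1.0
```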