# Joint, Marginal, and Conditional Distributions

We engineers often ignore the distinctions between joint, marginal, and conditional probabilities, to our detriment.

Figure 1: How the joint, marginal, and conditional distributions are related.

Conditional probability: f(x | y, θ) is the probability of x by itself, given a specific value of the variable y and the distribution parameters θ.  (See Figure 1.)  If x and y represent events A and B, then P(A|B) = nAB/nB, where nAB is the number of times both A and B occur, and nB is the number of times B occurs. Equivalently, P(A|B) = P(AB)/P(B), since P(AB) = nAB/N and P(B) = nB/N, so that

P(A|B) = (nAB/N)/(nB/N) = nAB/nB.
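The counting argument above can be checked directly. This is a minimal sketch with hypothetical counts (N, nAB, nB are made-up numbers for illustration):

```python
# Hypothetical event counts: N trials, n_B times B occurred,
# n_AB times A and B occurred together.
N = 1000
n_B = 400
n_AB = 100

# Direct count definition: P(A|B) = n_AB / n_B
p_A_given_B = n_AB / n_B

# Equivalent ratio of probabilities: P(A|B) = P(AB) / P(B)
p_AB = n_AB / N
p_B = n_B / N

# The two routes agree (up to floating-point rounding).
assert abs(p_A_given_B - p_AB / p_B) < 1e-12
print(p_A_given_B)  # 0.25
```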

Joint probability is the probability of two or more things happening together: f(x, y | θ) is the probability of x and y together as a pair, given the distribution parameters θ. Often these events are not independent, and sadly this is often ignored.  Furthermore, the correlation coefficient by itself does NOT adequately describe these interrelationships.
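The point about the correlation coefficient can be illustrated with a deliberately extreme (hypothetical) case: a variable that is a deterministic function of another, yet has near-zero correlation with it.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = x ** 2  # y is completely determined by x (total dependence)

# Sample correlation coefficient between x and y.
r = np.corrcoef(x, y)[0, 1]

# The symmetry of x makes the linear correlation vanish, even though
# knowing x tells you y exactly.
print(r)  # close to 0
```

This is why a near-zero correlation must never be read as evidence of independence.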

Consider first the idea of a probability density or distribution: f(x | θ) is the probability density of x, given the distribution parameters θ.  For a normal distribution,

f(x | μ, σ) = exp(−(x − μ)² / (2σ²)) / (σ √(2π)),

where μ is the mean and σ is the standard deviation.  This is sometimes called a pdf, probability density function.  The integral of a pdf, the area under the curve (corresponding to the probability) between specified values of x, is a cdf, cumulative distribution function, F(x). For a discrete f, F is the corresponding summation.
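A minimal sketch of the normal pdf and cdf, using only the standard library (the cdf is expressed through the error function):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x | mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))"""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x): integral of the pdf from -infinity to x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(normal_pdf(0.0))  # 0.3989... (peak of the standard normal)
print(normal_cdf(0.0))  # 0.5 (half the area lies below the mean)
```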

A joint probability density of two or more variables is called a multivariate distribution. It is often summarized by a vector of parameters, which may or may not be sufficient to characterize the distribution completely. For example, the multivariate normal is summarized (sufficiently) by a mean vector and covariance matrix.

Marginal probability: f(x | θ) is the probability density of x over all possible values of y, given the distribution parameters θ.  The marginal probability is determined from the joint distribution of x and y by integrating over all values of y, called "integrating out" the variable y:

f(x | θ) = ∫ f(x, y | θ) dy.

In applications of Bayes's Theorem, y is often a matrix of possible parameter values.  The figure illustrates joint, marginal, and conditional probability relationships.
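In the discrete case, "integrating out" is just summing out. A minimal sketch with a hypothetical 3×2 joint probability table (the entries are made up, chosen only to sum to 1):

```python
import numpy as np

# Hypothetical discrete joint distribution f(x, y):
# rows index x (3 values), columns index y (2 values); entries sum to 1.
joint = np.array([[0.10, 0.20],
                  [0.15, 0.25],
                  [0.05, 0.25]])

# "Integrate out" y (a sum in the discrete case) to get the marginal of x,
# and vice versa for the marginal of y.
marginal_x = joint.sum(axis=1)  # [0.30, 0.40, 0.30]
marginal_y = joint.sum(axis=0)  # [0.30, 0.70]

# Each marginal is itself a proper distribution.
assert abs(marginal_x.sum() - 1.0) < 1e-12
assert abs(marginal_y.sum() - 1.0) < 1e-12
```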

Note that, in general, the conditional probability of A given B is not the same as that of B given A.  The probability of both A and B together is P(AB), and P(AB) = P(A|B) x P(B) = P(B|A) x P(A); if both P(A) and P(B) are non-zero, this leads to a statement of Bayes's Theorem:

P(A|B) = P(B|A) x P(A) / P(B)  and

P(B|A) = P(A|B) x P(B) / P(A)
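Both forms can be verified numerically. This sketch uses hypothetical probabilities (the values of P(A), P(B), and P(AB) are made up, chosen so that A and B are dependent):

```python
# Hypothetical probabilities for two overlapping, dependent events.
p_A, p_B, p_AB = 0.30, 0.40, 0.10

p_A_given_B = p_AB / p_B  # 0.25
p_B_given_A = p_AB / p_A  # 0.333...

# The two conditionals differ, as the text warns.
assert p_A_given_B != p_B_given_A

# Bayes's Theorem recovers each conditional from the other.
assert abs(p_A_given_B - p_B_given_A * p_A / p_B) < 1e-12
assert abs(p_B_given_A - p_A_given_B * p_B / p_A) < 1e-12
```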

Conditional probability is also the basis for statistical dependence and statistical independence.

Independence: Two variables, A and B, are independent if their conditional probability is equal to their unconditional probability.  In other words, A and B are independent if, and only if, P(A|B)=P(A), and P(B|A)=P(B).  In engineering terms, A and B are independent if knowing something about one tells nothing about the other.  This is the origin of the familiar, but often misused, formula P(AB) = P(A) x P(B), which is true only when A and B are independent.
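The definition can be checked exactly on a small sample space. A sketch using two fair dice (events chosen here for illustration), with exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

A = {o for o in outcomes if o[0] == 6}      # first die shows 6
B = {o for o in outcomes if o[1] % 2 == 0}  # second die is even

def prob(event):
    return Fraction(len(event), len(outcomes))

# A and B are independent: the product rule holds exactly...
assert prob(A & B) == prob(A) * prob(B)  # 1/12 == 1/6 * 1/2

# ...and conditioning on B tells us nothing about A.
assert prob(A & B) / prob(B) == prob(A)  # P(A|B) = P(A)
```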

Conditional independence: A and B are conditionally independent, given C, if

Prob(A=a, B=b | C=c) = Prob(A=a | C=c) x Prob(B=b | C=c) whenever Prob(C=c) > 0.

So the joint probability of A, B, and C, when A and B are conditionally independent given C, is

Prob(A, B, C) = Prob(C) x Prob(A | C) x Prob(B | C).

A directed graph illustrating this conditional independence is A <- C -> B.
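The A <- C -> B structure can be built explicitly. In this sketch all conditional probability tables are hypothetical numbers chosen for illustration; it confirms that A and B are independent given C, yet dependent marginally:

```python
from itertools import product

# Hypothetical A <- C -> B structure: C is a fair coin, and given C,
# A and B are each binary with C-dependent probabilities.
p_C = {0: 0.5, 1: 0.5}
p_A_given_C = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
p_B_given_C = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}

# Joint distribution factorizes as Prob(C) * Prob(A|C) * Prob(B|C).
joint = {(a, b, c): p_C[c] * p_A_given_C[c][a] * p_B_given_C[c][b]
         for a, b, c in product((0, 1), repeat=3)}

# Conditional independence: P(A, B | C) = P(A|C) * P(B|C) for every c.
for a, b, c in product((0, 1), repeat=3):
    p_ab_given_c = joint[(a, b, c)] / p_C[c]
    assert abs(p_ab_given_c - p_A_given_C[c][a] * p_B_given_C[c][b]) < 1e-12

# Marginally, however, A and B are NOT independent: summing out C
# leaves P(A=1, B=1) well away from P(A=1) * P(B=1).
p_A1 = sum(v for (a, b, c), v in joint.items() if a == 1)
p_B1 = sum(v for (a, b, c), v in joint.items() if b == 1)
p_A1B1 = sum(v for (a, b, c), v in joint.items() if a == 1 and b == 1)
assert abs(p_A1B1 - p_A1 * p_B1) > 0.01
```

Conditional independence given C thus does not imply plain independence; the common cause C induces a marginal dependence between A and B.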