The differences between *probability* and *statistics* are not even nuanced: they are apples and oranges.

Engineers know that *stress* and *strain* are not synonymous, even though
the popular press uses the terms interchangeably.

(*Stress* is force per unit area.
*Strain* is elongation per unit of original length. One can
be viewed as causing the other, and in many instances *stress* =
proportionality constant × *strain*.)
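The same relationships in symbols (this notation is added here and is not the original's): $F$ is the applied force, $A$ the area it acts on, $\Delta L$ the elongation of a member with original length $L_0$, and $E$ (Young's modulus) the proportionality constant in the linear-elastic range.

$$
\sigma = \frac{F}{A}, \qquad
\varepsilon = \frac{\Delta L}{L_0}, \qquad
\sigma = E\,\varepsilon \quad \text{(Hooke's law)}
$$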

*Probability* and *Statistics* are not the same
either. They are related, but much more circuitously than Hooke's
Law (above) relates stress to strain.

*Probability* can be viewed **either** as the long-run frequency of occurrence **or** as a measure of the plausibility of an event given incomplete knowledge, but not both.

*Statistics* are *functions* of the observations (data) that often have useful and even surprising properties.
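To make the long-run-frequency reading of probability concrete, here is a minimal Python sketch that simulates tosses of a coin (the coin, its P(heads) = 0.5, and the function name `relative_frequency` are illustrative assumptions, not from the original) and reports the fraction of heads. The fraction wanders for a handful of tosses and settles near 0.5 as the number of tosses grows.

```python
import random

def relative_frequency(n_tosses: int, p_heads: float = 0.5, seed: int = 1) -> float:
    """Fraction of heads in n_tosses simulated tosses of a coin with P(heads) = p_heads."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    heads = sum(rng.random() < p_heads for _ in range(n_tosses))
    return heads / n_tosses

# Relative frequency for increasing numbers of tosses.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```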

The sample mean, $\bar{x}$, is a statistic; the population mean,
$\mu$, is not.
That is because a statistic is observable, being computed from the
observations, while a population parameter, being a philosophical
abstraction, is not observable and thus must be estimated. Statistics, like
$\bar{x}$, are often used to
**estimate** population parameters like $\mu$.
The fidelity of the estimate depends on the number of
observations used in computing the statistic. Notice that the estimate
changes slightly every time you take a sample, whereas the population
parameter doesn't.
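A small simulation illustrates both points: the sample mean changes from sample to sample while the population mean stays fixed, and larger samples pin it down more tightly. This is a sketch under assumed values: the population is taken to be normal with hypothetical parameters `MU = 10` and `SIGMA = 2` (known here only because we simulate), and `sample_means` is a hypothetical helper.

```python
import random
import statistics

MU, SIGMA = 10.0, 2.0  # hypothetical population parameters; in practice these are unknown

def sample_means(n: int, n_samples: int = 2000, seed: int = 7) -> list[float]:
    """Draw n_samples independent samples of size n and return the sample mean of each."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n)) for _ in range(n_samples)]

for n in (5, 100):
    means = sample_means(n)
    # Each sample gives a different sample mean; their spread shrinks roughly like SIGMA / sqrt(n).
    print(f"n={n:3d}  first three sample means: {[round(m, 2) for m in means[:3]]}  "
          f"spread: {statistics.stdev(means):.3f}  (SIGMA/sqrt(n) = {SIGMA / n ** 0.5:.3f})")
```

The population mean `MU` never changes; only the statistic computed from each sample does.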

The population parameters are required to compute probabilities from a probability density function,
*pdf* (or probability mass function, *pmf*, if **X**
is a discrete random variable).
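Once the parameters are in hand, the density does the rest. A minimal sketch using only Python's standard library: `statistics.NormalDist` supplies a normal *pdf* indexed by its mean and standard deviation, and a hand-written binomial *pmf* stands in for the discrete case. The parameter values and the helper `binomial_pmf` are illustrative assumptions.

```python
import math
from statistics import NormalDist

# Continuous X: a normal density indexed by its parameters (values assumed for illustration).
normal = NormalDist(mu=10.0, sigma=2.0)
p_below_7 = normal.cdf(7.0)        # P(X < 7): area under the pdf to the left of 7
density_at_10 = normal.pdf(10.0)   # a pdf value is a density, not a probability

# Discrete X: a binomial pmf indexed by n trials and success probability p (values assumed).
def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

p_exactly_3 = binomial_pmf(3, n=10, p=0.2)

print(f"P(X < 7)   = {p_below_7:.4f}")
print(f"pdf at 10  = {density_at_10:.4f}")
print(f"P(X = 3)   = {p_exactly_3:.4f}")
```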

So (finally) we see the relationship between *probability* and *statistics*:

From the *observations* we compute *statistics* that we use to
*estimate population parameters*, which index the probability density, from which
we can compute the *probability* of a future observation.
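The whole chain fits in one short function. This sketch assumes the density is normal (exactly the assumption the caveat below warns about) and simulates data from a hypothetical normal population so the estimated probability can be compared with the true one; `estimate_probability_below` is a hypothetical name.

```python
import random
import statistics
from statistics import NormalDist

def estimate_probability_below(observations: list[float], threshold: float) -> float:
    """observations -> statistics -> estimated parameters -> probability, assuming a normal density."""
    x_bar = statistics.fmean(observations)   # statistic used to estimate the population mean
    s = statistics.stdev(observations)       # statistic used to estimate the population sd (n - 1 divisor)
    fitted = NormalDist(mu=x_bar, sigma=s)   # density indexed by the estimated parameters
    return fitted.cdf(threshold)             # probability a future observation falls below threshold

# Simulated data from a hypothetical population whose parameters we pretend not to know.
true_population = NormalDist(mu=10.0, sigma=2.0)
rng = random.Random(42)
data = [rng.gauss(true_population.mean, true_population.stdev) for _ in range(1000)]

print(f"estimated P(X < 7) = {estimate_probability_below(data, 7.0):.4f}")
print(f"true      P(X < 7) = {true_population.cdf(7.0):.4f}")
```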

(With convoluted thought processes like this, is it any wonder that *statistics* is not everyone's favorite subject?)

**Caveat:**

Notice that estimating the population parameters is only half the battle. The density from which the observations were taken must also be known. For example, given these observations, what is the probability of a new observation being less than zero?

X: 0.10, 0.16, 0.23, 0.32, 0.43, 0.62, 1.0

If you estimate the mean and standard deviation in the
usual way, and if you assume that the observations are from a normal
density, you would compute that the probability is p=0.1 that a new
observation would be less than zero. (If you were paying attention to
the very small sample size and used the *t* density,
rather than the normal, you would have p=0.12.)
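Both figures can be reproduced from the seven observations. This sketch uses only the standard library; because it has no Student's *t* CDF, the hypothetical helpers `t_pdf` and `t_cdf` integrate the *t* density numerically with Simpson's rule, with *n* − 1 = 6 degrees of freedom.

```python
import math
import statistics
from statistics import NormalDist

x = [0.10, 0.16, 0.23, 0.32, 0.43, 0.62, 1.0]
n = len(x)
x_bar = statistics.fmean(x)
s = statistics.stdev(x)            # sample standard deviation, n - 1 divisor
z = (0.0 - x_bar) / s              # standardized distance of zero from the sample mean

def t_pdf(t: float, df: int) -> float:
    """Student's t density with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t) = 0.5 + integral of the pdf from 0 to t, by Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

p_normal = NormalDist().cdf(z)     # assumes the observations come from a normal density
p_t = t_cdf(z, df=n - 1)           # uses the t density instead

print(f"x_bar = {x_bar:.4f}, s = {s:.4f}, z = {z:.4f}")
print(f"normal: p = {p_normal:.3f}")
print(f"t(6):   p = {p_t:.3f}")
```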

But these observations are not from a normal density;
rather, they are log-normal, something that a *quantile-quantile* plot would
have suggested.
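A quantile-quantile plot is graphical, but its message can be sketched numerically: sort the data, pair each value with a standard-normal quantile at plotting position (i − 0.5)/n (one common convention, assumed here), and compute the correlation of the pairs. A nearly straight plot gives a correlation very close to 1. The helper `normal_qq_correlation` is hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def normal_qq_correlation(values: list[float]) -> float:
    """Pearson correlation between the sorted data and standard-normal quantiles at (i - 0.5)/n."""
    ordered = sorted(values)
    n = len(ordered)
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mq = statistics.fmean(ordered), statistics.fmean(quantiles)
    sxy = sum((a - mx) * (b - mq) for a, b in zip(ordered, quantiles))
    sxx = sum((a - mx) ** 2 for a in ordered)
    syy = sum((b - mq) ** 2 for b in quantiles)
    return sxy / math.sqrt(sxx * syy)

x = [0.10, 0.16, 0.23, 0.32, 0.43, 0.62, 1.0]
print(f"raw data:    r = {normal_qq_correlation(x):.4f}")
print(f"log of data: r = {normal_qq_correlation([math.log(v) for v in x]):.4f}")
```

The logarithms of the observations line up almost perfectly against the normal quantiles, while the raw values bend away from a straight line, the pattern expected of log-normal data.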

Thus the probability of a future observation being less than zero is p=0, because the log-normal density is defined only for $X > 0$: if $\log X$ is normally distributed, then $X = e^{\log X}$ is always positive.
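Fitting the log-normal instead makes the zero explicit. In this sketch the log-normal parameters are taken to be the mean and standard deviation of log(X), estimated from the same seven observations; `lognormal_cdf` is a hypothetical helper that returns zero for any value at or below zero, where the logarithm is undefined.

```python
import math
import statistics
from statistics import NormalDist

x = [0.10, 0.16, 0.23, 0.32, 0.43, 0.62, 1.0]
logs = [math.log(v) for v in x]
# Log-normal parameters: mean and standard deviation of log(X), estimated from the data.
log_dist = NormalDist(mu=statistics.fmean(logs), sigma=statistics.stdev(logs))

def lognormal_cdf(value: float) -> float:
    """P(X <= value) under the fitted log-normal; zero for value <= 0."""
    if value <= 0:
        return 0.0
    return log_dist.cdf(math.log(value))

print(lognormal_cdf(0.0))    # → 0.0
print(lognormal_cdf(0.05))   # small, but positive
```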

**Summary:** In statistics, as with engineering, pay attention to the fine print.