How hit/miss models work
Estimating POD (Probability of Detection) as a function of size is less
straightforward for binary (hit/miss) data than for data having a
continuous response (â). Grouping data into size bins, and estimating
POD as the fraction found in each bin, is inefficient and further
suffers from an unwelcome trade-off between resolution in size (narrower size
intervals) and resolution in POD (narrower intervals hold fewer observations).
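
The trade-off can be seen in a small simulation. The sketch below is
illustrative only: the logistic "true" POD curve, its parameters (mu, sigma),
the size range of 1 to 5, and the helper names true_pod and binned_pod are
all assumptions, not part of the original analysis. It bins one random sample
of 60 hit/miss observations first coarsely, then finely.

```python
import math
import random

def true_pod(a, mu=3.0, sigma=0.5):
    """Hypothetical 'true' POD curve, logistic in size (illustrative values)."""
    return 1.0 / (1.0 + math.exp(-(a - mu) / sigma))

def binned_pod(sizes, hits, n_bins, lo, hi):
    """Estimate POD in each equal-width size bin as (hits in bin) / (count in bin)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    found = [0] * n_bins
    for a, y in zip(sizes, hits):
        k = min(int((a - lo) / width), n_bins - 1)  # clamp a == hi into last bin
        counts[k] += 1
        found[k] += y
    # (bin midpoint, observations in bin, POD estimate or None if bin is empty)
    return [(lo + (k + 0.5) * width, counts[k],
             found[k] / counts[k] if counts[k] else None)
            for k in range(n_bins)]

rng = random.Random(1)
sizes = [rng.uniform(1.0, 5.0) for _ in range(60)]
hits = [1 if rng.random() < true_pod(a) else 0 for a in sizes]

for n_bins in (3, 12):
    print(f"{n_bins} bins:")
    for mid, n, pod in binned_pod(sizes, hits, n_bins, 1.0, 5.0):
        est = "n/a" if pod is None else f"{pod:.2f}"
        print(f"  size ~ {mid:.2f}  n = {n:2d}  POD = {est}")
```

With few bins each estimate rests on more observations but says little about
where POD changes. With many bins the count per bin shrinks, so each fraction
can take only a few distinct values, and some bins may be empty.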
The most effective method for describing binary data is to posit some
continuous function, bounded by 0 < y < 1, and then estimate the model
parameters by maximum likelihood.
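
As a minimal sketch of that idea, assume the continuous function is the
logistic curve POD(a) = 1 / (1 + exp(-(b0 + b1*a))). The choice of the logit
link, the use of size rather than log(size), the function names, and the ten
made-up observations are all assumptions for illustration. The parameters are
found by Newton's method on the binomial log-likelihood.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function; maps any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_hit_miss(sizes, hits, iterations=25):
    """Maximum likelihood fit of POD(a) = sigmoid(b0 + b1*a) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for a, y in zip(sizes, hits):
            p = sigmoid(b0 + b1 * a)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * a
            h00 += w
            h01 += w * a
            h11 += w * a * a
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up hit/miss data: 1 = hit, 0 = miss. Hits and misses overlap in size,
# so a finite maximum likelihood estimate exists.
sizes = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
hits = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_hit_miss(sizes, hits)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, a50 = {-b0 / b1:.3f}")
```

Plain Newton iteration is used here only to keep the sketch self-contained.
It has no safeguard against data in which hits and misses separate perfectly
by size, where the likelihood has no finite maximum.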
The figure shows POD on the right and the linearizing function Z on the
left, with a random sample of 60 hit/miss observations plotted against size.
The figure sequentially presents ten random samples
of n=60 hit/miss observations and illustrates the magnitude of variability due only to chance.
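
How such samples might be generated can be sketched as follows. Each
observation is a Bernoulli draw whose success probability is the "true" POD
at that size. The logistic form of the truth, its parameters, the evenly
spaced sizes on 1 to 5, and the seed are assumptions chosen for illustration.

```python
import math
import random

def true_pod(a, mu=3.0, sigma=0.7):
    """Hypothetical 'true' POD curve (illustrative parameter values)."""
    return 1.0 / (1.0 + math.exp(-(a - mu) / sigma))

def simulate_hit_miss(sizes, rng):
    """One inspection result per size: 1 (hit) with probability POD(a), else 0 (miss)."""
    return [1 if rng.random() < true_pod(a) else 0 for a in sizes]

rng = random.Random(2024)
sizes = [1.0 + 4.0 * i / 59 for i in range(60)]  # 60 sizes evenly spaced on [1, 5]

# Ten independent samples drawn from the same truth at the same sizes.
samples = [simulate_hit_miss(sizes, rng) for _ in range(10)]
for k, hits in enumerate(samples, start=1):
    print(f"sample {k:2d}: {sum(hits)} hits out of {len(hits)}")
```

The truth and the sizes are identical across the ten samples, so any
difference between them is due only to chance.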
Reality is random:
The solid black line is defined as truth. In reality
the truth would be unknown and is to be inferred from the behavior of the
data. The solid black "data" points are observations of "hits" or
"misses," ones or zeros, for an inspection with only a binary outcome.
The "data" are generated and a generalized linear
model is used to produce the most likely function to have given rise to those
observations. That's the blue line. Also shown are the
confidence bounds, from which a90/95 can be taken directly (unlike
the confidence bounds on an â vs. a censored regression). Sometimes the blue line (the model) is very close to
the "truth." But sometimes it isn't, as can be seen from another random sample.
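
Reading a size off a fitted curve amounts to inverting the model. The sketch
below assumes a logit link, POD(a) = 1 / (1 + exp(-(b0 + b1*a))), and uses
hypothetical coefficient values. The function name size_at_pod is also an
assumption. It gives the point estimates a50 and a90 only.

```python
import math

def size_at_pod(p, b0, b1):
    """Invert POD(a) = 1 / (1 + exp(-(b0 + b1*a))) for the size a where POD = p."""
    return (math.log(p / (1.0 - p)) - b0) / b1

# Hypothetical fitted coefficients, for illustration only.
b0, b1 = -6.0, 2.0

a50 = size_at_pod(0.50, b0, b1)
a90 = size_at_pod(0.90, b0, b1)
print(f"a50 = {a50:.3f}, a90 = {a90:.3f}")
```

a90/95 is not computed here. It is the size at which the lower 95% confidence
bound on POD, rather than the fitted curve itself, reaches 0.90, so it
requires the confidence bound as well as the fit.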
In reality we only get to see ONE collection of data, and
from that we must estimate the most likely model for the unseen and unknown and
unknowable "truth," and produce a confidence bound that
includes the true a90 at least 95 times in every 100 similar
experiments (were we to run the other 99 experiments, which we cannot).