Why GLM?

One quantitative description of inspection capability is the cracksize, a, that can be detected with at least 90% probability, established with 95% confidence. This is equivalent to finding a such that the 95% lower confidence bound on Pr(detect | a) is 0.9. The NDE community calls this cracksize a90/95. Unfortunately this conveys little about the relationship between size and detectability, since infinitely many POD vs. cracksize curves can share the same a90/95, depending on the combination of capability (the mean cracksize having 90% POD) and experimental uncertainty. One method used to establish a90/95 is based on "randomly" selecting 29 specimens, all with the same cracksize, and observing 29 successes in 29 inspections. (No provision is made for fewer than 29 successes other than a re-test.) Although the maximum likelihood estimate of the underlying POD would be 1, the conventional interpretation is POD = 0.9 with 95% confidence, based on a simple binomial calculation.
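The binomial arithmetic behind "29 of 29" fits on one line: if the true POD were exactly 0.9, the probability of 29 hits in 29 independent inspections would be

$$\Pr(29 \text{ hits} \mid \mathrm{POD} = 0.9) = 0.9^{29} \approx 0.047 < 0.05$$

so 29 consecutive successes are just sufficient to support POD = 0.9 as a 95% lower confidence bound. (The exact bound solves p^29 = 0.05, giving p ≈ 0.902.)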

The "29 of 29"method is based on untenable statistical underpinnings, yet enjoys widespread acceptance in the NDE community largely because it is easy to implement. Better methods have been suggested (Annis, et.al., 1989) based on inspections of cracks with different sizes and a GLM modeling procedure. Wide acceptance of these methods has been slow, owing to their requirement for specialized software. GLM procedures are available in sophisticated statistics software packages but these are expensive and largely inaccessible to nonstatisticians. The method described here removes this impediment.

Link Functions:

To begin, define the probability of detection, p_i = POD(a_i), as a function linked to the ith cracksize, a_i. Common link functions for binary data, which map (-∞ < x < ∞) into (0 < p < 1), include the probit or inverse Normal function; the logit, logistic, or log-odds function; and the complementary log-log function, often called Weibull by engineers. These are:

probit:    f(x) = g(p) = Φ⁻¹(p)

logit:    f(x) = g(p) = ln{ p/(1-p) }

complementary log-log:    f(x) = g(p) = ln{ -ln(1-p) }

where f(x) is any polynomial sum, linear in the parameters, and Φ(•) is the standard normal cdf. Notice that when g(p) = p, the problem reduces to an ordinary linear model, p = f(x). Since f(x) = g(p), then p = g⁻¹( f(x) ). We will refer to g⁻¹(•) as the link and, using the probability of crack detection example, write it as POD(a).
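For readers who prefer code to algebra, here is a minimal sketch of the three links in Python with NumPy and SciPy (rather than a spreadsheet); the function names are illustrative, not part of any standard API:

```python
import numpy as np
from scipy.stats import norm

# Three common link functions g(p) for binary data; names are illustrative.
def probit(p):
    return norm.ppf(p)               # inverse Normal cdf

def logit(p):
    return np.log(p / (1.0 - p))     # log-odds

def cloglog(p):
    return np.log(-np.log(1.0 - p))  # complementary log-log
```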

probit link:    POD(a) = Φ( {log(a) - L} / S )

(Aside: perhaps "normit" would be more accurate than "probit," because a standard normit is centered at zero, whereas a true probit is centered at 5. Both refer to a Normal cdf.)

logit link:    POD(a) = exp{ L0 + S0 log(a) } / [ 1 + exp{ L0 + S0 log(a) } ]

complementary log-log link:    POD(a) = 1 - exp{ -exp{ L0 + S0 log(a) } }

where L and S are the model location and scale parameters; the intercept-slope form is equivalent, with L0 = -L/S and S0 = 1/S. (Note that Φ(•) is NOT a distribution of cracksizes, even though it has the same mathematical form.) A comparison of the properties of these transforms can be found in McCullagh and Nelder (1989).
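The inverse links are equally short in code. A minimal sketch in the same location (L) and scale (S) parameterization; these are the generic formulas above, not the author's spreadsheet cell formulas:

```python
import numpy as np
from scipy.stats import norm

# Inverse links POD(a) in location (L) / scale (S) form.
def pod_probit(a, L, S):
    return norm.cdf((np.log(a) - L) / S)

def pod_logit(a, L, S):
    z = (np.log(a) - L) / S
    return 1.0 / (1.0 + np.exp(-z))      # equals exp(z)/(1+exp(z)), more stable

def pod_cloglog(a, L, S):
    z = (np.log(a) - L) / S
    return 1.0 - np.exp(-np.exp(z))
```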

The Likelihood Function:

For a given link function, the likelihood of L and S, based on the binary result, y_i, of inspecting crack a_i, is

$$\ell_i(L, S \mid a_i, y_i) = p_i^{y_i}\,(1 - p_i)^{1 - y_i} \qquad \text{(equation 1)}$$

which reduces to p_i when y_i is 1 (a hit) and to (1 - p_i) when y_i is 0 (a miss). This is the key relationship on which the spreadsheet implementation is built.
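On the log scale (anticipating equation 2 below), equation 1 becomes a single line of code. A sketch:

```python
import numpy as np

# Log of equation 1: one inspection's contribution to the log-likelihood.
def log_lik_i(p_i, y_i):
    return y_i * np.log(p_i) + (1 - y_i) * np.log(1 - p_i)
```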

A textbook development might proceed to describe the aggregate likelihood for inspecting N independent cracks of different sizes as

$$L = \prod_{i=1}^{n} p_i \prod_{j=1}^{N-n} (1 - p_j)$$

where n is the number of hits (ones) and N - n is the number of misses (zeros); p_i is the POD given by the model for the ith hit and (1 - p_j) is the corresponding probability for the jth miss. The observation would then be made that this repeated product would prove computationally onerous, a difficulty greatly simplified by taking the logarithm, thus transforming the product into a sum.

$$\ln(L) = \sum_{i=1}^{n} \ln(p_i) + \sum_{j=1}^{N-n} \ln(1 - p_j) \qquad \text{(equation 2)}$$

Finally, the model parameters would be estimated so as to maximize this likelihood: differentiate equation 2 with respect to the model parameters (which enter through the link), set the derivatives to zero, and solve the resulting equations simultaneously. Fortunately, the PC spreadsheet streamlines this tedious arithmetic: simply compute the logarithms of the individual likelihoods (equation 1), sum them, and let the built-in SOLVER algorithm find the maximizing parameters.
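To make the procedure concrete, here is a minimal Python sketch of the same calculation, with scipy.optimize.minimize standing in for the spreadsheet's SOLVER. The probit link is assumed, and the cracksizes and hit/miss results are invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical inspection data: cracksizes and hit (1) / miss (0) results.
a = np.array([0.02, 0.04, 0.06, 0.08, 0.10, 0.15, 0.20, 0.30])
y = np.array([0,    0,    1,    0,    1,    1,    1,    1])

def neg_log_lik(params):
    L, log_S = params
    S = np.exp(log_S)                      # keep the scale positive
    p = norm.cdf((np.log(a) - L) / S)      # probit inverse link, POD(a)
    p = np.clip(p, 1e-12, 1 - 1e-12)       # keep the logarithms finite
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))  # -(equation 2)

# Maximize the log-likelihood by minimizing its negative, as SOLVER would.
fit = minimize(neg_log_lik, x0=[np.log(0.08), 0.0], method="Nelder-Mead")
L_hat, S_hat = fit.x[0], np.exp(fit.x[1])

a50 = np.exp(L_hat)                          # cracksize with 50% estimated POD
a90 = np.exp(L_hat + S_hat * norm.ppf(0.9))  # cracksize with 90% estimated POD
print(f"L = {L_hat:.3f}, S = {S_hat:.3f}, a50 = {a50:.3f}, a90 = {a90:.3f}")
```

Note that a90 here is only the maximum likelihood point estimate; the a90/95 limit would further require a lower confidence bound on the fitted POD curve.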
