This lecture deals with the probit model, a binary classification model in which the conditional probability of one of the two possible realizations of the output variable is equal to a linear combination of the inputs, transformed by the cumulative distribution function of the standard normal distribution.
Assume that a sample of data $(y_i, x_i)$, for $i = 1, \ldots, N$, is observed, where:
$y_i$ is an output variable that can take only two values, either $0$ or $1$ (it is a Bernoulli random variable);
$x_i$ is a $1 \times K$ vector of inputs.
The conditional probability that the output $y_i$ is equal to $1$, given the inputs $x_i$, is assumed to be
$$P(y_i = 1 \mid x_i) = F(x_i \beta)$$
where $F$ is the cumulative distribution function of the standard normal distribution and $\beta$ is a $K \times 1$ vector of coefficients.
Moreover, if $y_i$ is not equal to $1$, then it is equal to $0$ (no other values are possible), and the probabilities of the two values need to sum up to $1$, so that
$$P(y_i = 0 \mid x_i) = 1 - F(x_i \beta).$$
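As a minimal sketch of the model specification above, the following Python snippet evaluates the two conditional probabilities for one observation, using the standard normal distribution function from the standard library. The helper name `probit_prob` and the numerical values of the inputs and coefficients are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def probit_prob(x, beta):
    """Return P(y = 1 | x) = F(x beta) for a single observation."""
    index = sum(xj * bj for xj, bj in zip(x, beta))  # linear combination x beta
    return NormalDist().cdf(index)                   # standard normal cdf F


# Hypothetical inputs: a constant and one regressor.
x = [1.0, 0.5]
beta = [-0.2, 0.4]  # illustrative coefficients, not estimates

p1 = probit_prob(x, beta)  # probability that y = 1
p0 = 1.0 - p1              # probability that y = 0
print(f"P(y=1|x) = {p1:.4f}, P(y=0|x) = {p0:.4f}")
```

With these values the linear combination is zero, so both probabilities are one half; larger values of the linear combination push the probability of the output being 1 towards one.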
The interpretation of the probit model is very similar to that of the logit model. You are advised to read the comments about the interpretation of the latter in the lecture entitled Logistic classification model.
Like the logit model, the probit model can be written as a latent variable model.
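Define a latent variable
$$z_i = x_i \beta + \varepsilon_i \qquad (1)$$
where $\varepsilon_i$ is a random error term having a standard normal distribution. The output $y_i$ is linked to the latent variable by the following relationship:
$$y_i = \begin{cases} 1 & \text{if } z_i \geq 0 \\ 0 & \text{if } z_i < 0 \end{cases} \qquad (2)$$
We have that
$$P(y_i = 1 \mid x_i) = P(z_i \geq 0 \mid x_i) = P(x_i \beta + \varepsilon_i \geq 0 \mid x_i) = P(-\varepsilon_i \leq x_i \beta \mid x_i) = F(x_i \beta)$$
where the last equality holds because the standard normal distribution is symmetric around zero, so that $-\varepsilon_i$ has the same distribution as $\varepsilon_i$. Therefore, the latent variable model specified by (1) and (2) assigns to the output $y_i$, given the inputs $x_i$, the same conditional distribution assigned by the probit model.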
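The equivalence between the latent variable formulation and the probit probabilities can be checked by simulation. The sketch below draws the output from the latent variable model many times for a fixed input vector and compares the frequency of ones with the standard normal distribution function evaluated at the linear combination of the inputs. The function name `simulate_output`, the seed, and all numerical values are assumptions made for illustration.

```python
import random
from statistics import NormalDist


def simulate_output(x, beta, rng):
    """Draw y from the latent variable model: y = 1 if x beta + eps >= 0."""
    index = sum(xj * bj for xj, bj in zip(x, beta))
    z = index + rng.gauss(0.0, 1.0)  # latent variable with standard normal error
    return 1 if z >= 0 else 0


rng = random.Random(0)  # fixed seed so the run is reproducible
x = [1.0, 2.0]          # hypothetical inputs
beta = [0.3, -0.5]      # hypothetical coefficients
n_draws = 100_000

# Share of simulated outputs equal to 1.
freq = sum(simulate_output(x, beta, rng) for _ in range(n_draws)) / n_draws
# Probability implied by the probit model for the same inputs.
prob = NormalDist().cdf(sum(xj * bj for xj, bj in zip(x, beta)))
print(f"simulated frequency: {freq:.3f}, F(x beta): {prob:.3f}")
```

The two numbers should agree up to simulation noise, which shrinks as the number of draws grows.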
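The vector of coefficients $\beta$ can be estimated by maximum likelihood (ML).
We assume that the observations $(y_i, x_i)$ in the sample are independently and identically distributed (IID) and that the $N \times K$ matrix of inputs defined by
$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix}$$
has full rank.
In a separate lecture (ML estimation of the probit model), we demonstrate that the ML estimator $\hat{\beta}$ can be found (if it exists) with the following iterative procedure.
Starting from an initial guess of the solution $\hat{\beta}_0$ (e.g., $\hat{\beta}_0 = 0$), we generate a sequence of guesses
$$\hat{\beta}_t = \hat{\beta}_{t-1} + (X^\top W_{t-1} X)^{-1} X^\top \lambda_{t-1}$$
where $W_{t-1}$ is an $N \times N$ diagonal matrix and $\lambda_{t-1}$ is an $N \times 1$ vector. They are calculated as follows:
compute $q_i = 2 y_i - 1$ for $i = 1, \ldots, N$;
denote by $f$ the probability density function of the standard normal distribution, and compute the $N$ entries of the vector $\lambda_{t-1}$:
$$\lambda_{i,t-1} = \frac{q_i f(q_i x_i \hat{\beta}_{t-1})}{F(q_i x_i \hat{\beta}_{t-1})};$$
compute the diagonal matrix $W_{t-1}$, whose $i$-th diagonal entry is
$$[W_{t-1}]_{ii} = \lambda_{i,t-1} (x_i \hat{\beta}_{t-1} + \lambda_{i,t-1}).$$
The iterative procedure stops when numerical convergence is achieved, that is, when the difference between two successive guesses $\hat{\beta}_{t-1}$ and $\hat{\beta}_t$ is so small that we can ignore it.
If $T$ is the last step of the iterative procedure, then the maximum likelihood estimator is
$$\hat{\beta} = \hat{\beta}_T$$
and its asymptotic covariance matrix is estimated by
$$\hat{V} = N (X^\top W X)^{-1}$$
where $W = W_T$.
As a consequence, the distribution of $\hat{\beta}$ can be approximated by a normal distribution with mean equal to the true parameter and covariance matrix $(X^\top W X)^{-1}$.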
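The iterative procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: it assumes the Newton-Raphson update in which each guess is increased by $(X^\top W X)^{-1} X^\top \lambda$, with $\lambda_i = q_i f(q_i x_i \beta) / F(q_i x_i \beta)$, $q_i = 2 y_i - 1$, and diagonal entries of $W$ equal to $\lambda_i (x_i \beta + \lambda_i)$. All function names, the tolerance, the seed, and the simulated data are hypothetical. Only the standard library is used, so a small linear solver is included; the ratio $f/F$ is computed naively and is not numerically robust for extreme values of $x_i \beta$.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf is F, pdf is f


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def lambda_and_w(X, y, beta):
    """Return the entries of lambda and the diagonal of W at a given guess."""
    lam, w = [], []
    for xi, yi in zip(X, y):
        q = 2 * yi - 1                                # q_i = 2 y_i - 1
        index = sum(a * b for a, b in zip(xi, beta))  # x_i beta
        l = q * STD_NORMAL.pdf(q * index) / STD_NORMAL.cdf(q * index)
        lam.append(l)
        w.append(l * (index + l))                     # i-th diagonal entry of W
    return lam, w


def xt_w_x(X, w):
    """Return the K x K matrix X' W X for a diagonal W stored as a list."""
    K = len(X[0])
    return [[sum(wi * xi[j] * xi[k] for xi, wi in zip(X, w))
             for k in range(K)] for j in range(K)]


def fit_probit(X, y, tol=1e-8, max_iter=50):
    """Iterate the update until the step is negligible; return (beta, covariance)."""
    K = len(X[0])
    beta = [0.0] * K                                  # initial guess: zero vector
    for _ in range(max_iter):
        lam, w = lambda_and_w(X, y, beta)
        xt_lam = [sum(l * xi[j] for xi, l in zip(X, lam)) for j in range(K)]
        step = solve(xt_w_x(X, w), xt_lam)            # (X'WX)^{-1} X' lambda
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:           # numerical convergence
            break
    _, w = lambda_and_w(X, y, beta)                   # W evaluated at the last guess
    A = xt_w_x(X, w)
    identity = [[float(i == j) for j in range(K)] for i in range(K)]
    cols = [solve(A, e) for e in identity]            # columns of (X'WX)^{-1}
    cov = [[cols[j][i] for j in range(K)] for i in range(K)]
    return beta, cov


# Simulated data with hypothetical true coefficients (0.5, -1.0).
rng = random.Random(42)
true_beta = [0.5, -1.0]
X = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(2000)]
y = [1 if sum(a * b for a, b in zip(xi, true_beta)) + rng.gauss(0.0, 1.0) >= 0 else 0
     for xi in X]

beta_hat, cov_hat = fit_probit(X, y)
std_errors = [cov_hat[k][k] ** 0.5 for k in range(len(beta_hat))]
print("estimates:", beta_hat)
print("standard errors:", std_errors)
```

The returned covariance matrix is the approximate covariance of the estimator, so the square roots of its diagonal entries can be read as standard errors. In practice one would likely rely on a linear algebra library and on a more stable evaluation of the ratio between density and distribution function.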
When we estimate the coefficients of a probit classification model by maximum likelihood (see previous section), we can carry out hypothesis tests based on maximum likelihood procedures (e.g., Wald, Likelihood Ratio, Lagrange Multiplier) to test a null hypothesis about the coefficients.
Furthermore, we can set up a z test to test a restriction on a single coefficient:
$$H_0: \beta_k = c$$
where $\beta_k$ is the $k$-th entry of the vector of coefficients $\beta$ and $c \in \mathbb{R}$.
The test statistic is
$$z = \frac{\hat{\beta}_k - c}{\sqrt{[(X^\top W X)^{-1}]_{kk}}}$$
where $\hat{\beta}_k$ is the $k$-th entry of $\hat{\beta}$ and $[(X^\top W X)^{-1}]_{kk}$ is the $k$-th entry on the diagonal of the matrix $(X^\top W X)^{-1}$ obtained at the last step of the ML procedure.
Since $\hat{\beta}$ is asymptotically normal and $\hat{V} = N (X^\top W X)^{-1}$ is a consistent estimator of the asymptotic covariance matrix of $\hat{\beta}$, under the null hypothesis $z$ converges in distribution to a standard normal distribution (the proof is identical to the proof we have provided for the asymptotic normality of the z statistic in the lecture on the logit model).
By approximating the distribution of $z$ with its asymptotic one (a standard normal), we can derive critical values (depending on the desired size) and carry out the test.
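The z test can be sketched as follows. The snippet assumes a two-sided alternative (the text does not say which alternative is used) and takes as given an estimated coefficient and the corresponding diagonal entry of the approximate covariance matrix of the estimator. The function name `z_test` and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test(beta_hat_k, var_k, c=0.0, size=0.05):
    """Two-sided z test of the null hypothesis that the k-th coefficient equals c.

    var_k is the k-th diagonal entry of the approximate covariance
    matrix of the estimator (assumed to be already computed).
    """
    z = (beta_hat_k - c) / sqrt(var_k)
    critical = NormalDist().inv_cdf(1.0 - size / 2.0)  # standard normal quantile
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, critical, p_value, abs(z) > critical


# Hypothetical numbers, chosen only for illustration.
z, critical, p_value, reject = z_test(beta_hat_k=0.84, var_k=0.09, c=0.0)
print(f"z = {z:.2f}, critical value = {critical:.2f}, "
      f"p-value = {p_value:.4f}, reject = {reject}")
```

The null hypothesis is rejected when the absolute value of the statistic exceeds the critical value, which is equivalent to the p-value being smaller than the chosen size.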
Please cite as:
Taboga, Marco (2021). "Probit classification model (or probit regression)", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/probit-classification-model.