The asymptotic covariance matrix of a maximum likelihood estimator (MLE) is an unknown quantity that we need to approximate when we want to build confidence intervals around the point estimates obtained with the maximum likelihood method.
It is not to be confused with the maximum likelihood estimate of the covariance matrix of a distribution, which is covered in a separate lecture.
This lecture presents three popular estimators that can be used to approximate the asymptotic covariance matrix of the MLE:
the outer product of gradients (OPG) estimator;
the Hessian estimator;
the sandwich estimator.
Let $x_1, \ldots, x_n$ be the realizations of the first $n$ terms of an IID sequence $\{X_n\}$.
Suppose that a generic term of the sequence has probability density (or mass) function $f(x;\theta_0)$, where $\theta_0$ is an unknown vector of parameters.
The maximum likelihood estimator of $\theta_0$ is
$$\widehat{\theta}_n = \operatorname*{arg\,max}_{\theta}\, \sum_{i=1}^{n} \ln f(x_i;\theta)$$
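As a concrete numerical illustration (an addition to the lecture, not part of the original text), the following Python sketch computes the maximum likelihood estimate for an exponential model, where $\ln f(x;\lambda) = \ln\lambda - \lambda x$; the true rate $\lambda_0 = 2$, the sample size, and all variable names are assumptions made for the example:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative setup (assumed, not from the lecture): an IID exponential
# sample with true rate lambda_0 = 2, so f(x; lambda) = lambda * exp(-lambda x).
rng = np.random.default_rng(0)
lambda_0 = 2.0
x = rng.exponential(scale=1.0 / lambda_0, size=1000)

# Negative log-likelihood of the sample: -(n ln(lambda) - lambda sum(x_i)).
def neg_log_lik(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

# Numerical maximization of the log-likelihood ...
res = minimize_scalar(neg_log_lik, bounds=(1e-6, 100.0), method="bounded")

# ... agrees with the closed-form MLE of the exponential rate, 1 / mean(x).
print(res.x, 1.0 / x.mean())
```

The same exponential example is reused in the sketches below.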
As proved in the lecture on maximum likelihood, under certain technical assumptions the distribution of $\widehat{\theta}_n$ is asymptotically normal.
In particular, the distribution of $\widehat{\theta}_n$ can be approximated by a multivariate normal distribution with mean $\theta_0$ and covariance matrix
$$\frac{1}{n} V = \frac{1}{n} \left[ E\left( \nabla_\theta \ln f(X;\theta_0)\, \nabla_\theta \ln f(X;\theta_0)^{\top} \right) \right]^{-1}$$
where:
$\ln f(X;\theta_0)$ is the log-likelihood of a single observation from the sample, evaluated at the true parameter $\theta_0$;
the gradient $\nabla_\theta \ln f(X;\theta_0)$ is the vector of first derivatives of the log-likelihood;
$V$ is the so-called asymptotic covariance matrix.
Note that we divide by $n$ because the asymptotic covariance matrix is the covariance matrix of $\sqrt{n}\left(\widehat{\theta}_n - \theta_0\right)$, while we are interested in the covariance of $\widehat{\theta}_n$.
Under the technical assumptions mentioned previously, the information equality holds:
$$E\left[ \nabla_\theta \ln f(X;\theta_0)\, \nabla_\theta \ln f(X;\theta_0)^{\top} \right] = -E\left[ \nabla_{\theta\theta} \ln f(X;\theta_0) \right]$$
where the Hessian matrix $\nabla_{\theta\theta} \ln f(X;\theta_0)$ is the matrix of second-order partial derivatives of the log-likelihood function.
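As a quick sanity check of the information equality (an added example using the exponential model from the sketch above), take $f(x;\lambda) = \lambda e^{-\lambda x}$, so that $\ln f(x;\lambda) = \ln\lambda - \lambda x$ and
$$\nabla_\lambda \ln f(X;\lambda_0) = \frac{1}{\lambda_0} - X, \qquad \nabla_{\lambda\lambda} \ln f(X;\lambda_0) = -\frac{1}{\lambda_0^2}$$
Since $E[X] = 1/\lambda_0$ and $\operatorname{Var}[X] = 1/\lambda_0^2$, both sides of the information equality equal $1/\lambda_0^2$:
$$E\left[ \left( \nabla_\lambda \ln f(X;\lambda_0) \right)^2 \right] = \operatorname{Var}[X] = \frac{1}{\lambda_0^2} = -E\left[ \nabla_{\lambda\lambda} \ln f(X;\lambda_0) \right]$$
so that in this model the asymptotic covariance matrix is $V = \lambda_0^2$.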
The first estimator of the asymptotic covariance matrix is called the outer product of gradients (OPG) estimator. It is computed as
$$\widehat{V}_{OPG} = \left[ \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ln f(x_i;\widehat{\theta}_n)\, \nabla_\theta \ln f(x_i;\widehat{\theta}_n)^{\top} \right]^{-1}$$
It takes its name from the fact that the gradient is a column vector, its transpose is a row vector, and the product of a column vector and a row vector is called an outer product.
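To make the formula concrete, here is a minimal Python sketch of the OPG estimator (an added illustration under the assumed exponential model from above, whose per-observation score at $\lambda$ is $1/\lambda - x_i$):

```python
import numpy as np

# Illustrative exponential sample (assumed example; true rate lambda_0 = 2).
rng = np.random.default_rng(0)
lambda_0 = 2.0
x = rng.exponential(scale=1.0 / lambda_0, size=1000)

lam_hat = 1.0 / x.mean()      # closed-form MLE of the exponential rate

# Per-observation scores evaluated at the MLE: d/dlambda ln f(x_i; lam_hat).
scores = 1.0 / lam_hat - x

# OPG estimator: inverse of the average outer product of the scores.
# With a one-dimensional parameter the outer product is just a square.
V_opg = 1.0 / np.mean(scores ** 2)

print(V_opg, lambda_0 ** 2)   # compare with the true V = lambda_0**2 = 4
```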
Provided some regularity conditions are satisfied, the OPG estimator $\widehat{V}_{OPG}$ is a consistent estimator of $V$; that is, it converges in probability to $V$.
We provide only a sketch of the proof and refer the reader to Newey and McFadden (1994) for a more rigorous exposition. Provided some regularity conditions are satisfied (see the source just cited), we have the following equality between probability limits:
$$\operatorname*{plim}_{n\rightarrow\infty}\, \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ln f(x_i;\widehat{\theta}_n)\, \nabla_\theta \ln f(x_i;\widehat{\theta}_n)^{\top} = \operatorname*{plim}_{n\rightarrow\infty}\, \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ln f(x_i;\theta_0)\, \nabla_\theta \ln f(x_i;\theta_0)^{\top}$$
where $\widehat{\theta}_n$ has been replaced by $\theta_0$ because, being a consistent estimator, it converges in probability to $\theta_0$.
Because the sample is IID, by the Law of Large Numbers we have that
$$\operatorname*{plim}_{n\rightarrow\infty}\, \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ln f(x_i;\theta_0)\, \nabla_\theta \ln f(x_i;\theta_0)^{\top} = E\left[ \nabla_\theta \ln f(X;\theta_0)\, \nabla_\theta \ln f(X;\theta_0)^{\top} \right]$$
Now, the formula for the covariance matrix (see the lecture entitled Covariance matrix) yields
$$E\left[ \nabla_\theta \ln f(X;\theta_0)\, \nabla_\theta \ln f(X;\theta_0)^{\top} \right] = \operatorname{Var}\left[ \nabla_\theta \ln f(X;\theta_0) \right] + E\left[ \nabla_\theta \ln f(X;\theta_0) \right] E\left[ \nabla_\theta \ln f(X;\theta_0) \right]^{\top}$$
But the expected value of the gradient evaluated at $\theta_0$ is $0$, so that
$$E\left[ \nabla_\theta \ln f(X;\theta_0)\, \nabla_\theta \ln f(X;\theta_0)^{\top} \right] = \operatorname{Var}\left[ \nabla_\theta \ln f(X;\theta_0) \right] = V^{-1}$$
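For instance, in the illustrative exponential model used above (an added check, not part of the original text), $E\left[ \nabla_\lambda \ln f(X;\lambda_0) \right] = E\left[ 1/\lambda_0 - X \right] = 1/\lambda_0 - 1/\lambda_0 = 0$, consistent with this step.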
Thus,
$$\operatorname*{plim}_{n\rightarrow\infty}\, \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ln f(x_i;\widehat{\theta}_n)\, \nabla_\theta \ln f(x_i;\widehat{\theta}_n)^{\top} = V^{-1}$$
Because matrix inversion is continuous, by the Continuous Mapping theorem we have
$$\operatorname*{plim}_{n\rightarrow\infty}\, \widehat{V}_{OPG} = \left[ \operatorname*{plim}_{n\rightarrow\infty}\, \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ln f(x_i;\widehat{\theta}_n)\, \nabla_\theta \ln f(x_i;\widehat{\theta}_n)^{\top} \right]^{-1} = V$$
which is exactly the result we needed to prove.
The second estimator of the asymptotic covariance matrix is called the Hessian estimator. It is computed as
$$\widehat{V}_{H} = \left[ -\frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta\theta} \ln f(x_i;\widehat{\theta}_n) \right]^{-1}$$
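A minimal Python sketch of the Hessian estimator follows (again an added illustration under the assumed exponential model, where the second derivative of the log-density is $-1/\lambda^2$ for every observation):

```python
import numpy as np

# Same illustrative exponential sample as in the OPG sketch (assumed example).
rng = np.random.default_rng(0)
lambda_0 = 2.0
x = rng.exponential(scale=1.0 / lambda_0, size=1000)

lam_hat = 1.0 / x.mean()

# Per-observation Hessians at the MLE: d^2/dlambda^2 ln f(x_i; lam_hat),
# which is the constant -1 / lam_hat**2 in the exponential model.
hessians = np.full_like(x, -1.0 / lam_hat ** 2)

# Hessian estimator: inverse of minus the average Hessian.
V_hessian = 1.0 / (-np.mean(hessians))

print(V_hessian, lambda_0 ** 2)   # compare with the true V = lambda_0**2 = 4
```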
Under some regularity conditions, the Hessian estimator $\widehat{V}_{H}$ is also a consistent estimator of $V$.
Again, we do not provide an entirely rigorous proof (for which see Newey and McFadden 1994); we only sketch the main steps. First of all, under some regularity conditions, we have that
$$\operatorname*{plim}_{n\rightarrow\infty} \left( -\frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta\theta} \ln f(x_i;\widehat{\theta}_n) \right) = \operatorname*{plim}_{n\rightarrow\infty} \left( -\frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta\theta} \ln f(x_i;\theta_0) \right)$$
where $\widehat{\theta}_n$ has been replaced by $\theta_0$ because, being a consistent estimator, it converges in probability to $\theta_0$.
Now, since the sample is IID, by the Law of Large Numbers we have that
$$\operatorname*{plim}_{n\rightarrow\infty} \left( -\frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta\theta} \ln f(x_i;\theta_0) \right) = -E\left[ \nabla_{\theta\theta} \ln f(X;\theta_0) \right]$$
By the information equality, we have
$$-E\left[ \nabla_{\theta\theta} \ln f(X;\theta_0) \right] = E\left[ \nabla_\theta \ln f(X;\theta_0)\, \nabla_\theta \ln f(X;\theta_0)^{\top} \right] = V^{-1}$$
Therefore,
$$\operatorname*{plim}_{n\rightarrow\infty} \left( -\frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta\theta} \ln f(x_i;\widehat{\theta}_n) \right) = V^{-1}$$
Because matrix inversion is continuous, by the Continuous Mapping theorem we have
$$\operatorname*{plim}_{n\rightarrow\infty}\, \widehat{V}_{H} = V$$
which is what we needed to prove.
The third estimator of the asymptotic covariance matrix is called the sandwich estimator. It is computed as
$$\widehat{V}_{S} = \widehat{V}_{H}\, \widehat{V}_{OPG}^{-1}\, \widehat{V}_{H}$$
where $\widehat{V}_{OPG}$ is the OPG estimator and $\widehat{V}_{H}$ is the Hessian estimator.
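Combining the two previous sketches, the sandwich estimator can be computed as follows (still the assumed exponential example; with a scalar parameter the matrix products reduce to ordinary products):

```python
import numpy as np

# Same illustrative exponential sample as before (assumed example).
rng = np.random.default_rng(0)
lambda_0 = 2.0
x = rng.exponential(scale=1.0 / lambda_0, size=1000)

lam_hat = 1.0 / x.mean()

V_opg = 1.0 / np.mean((1.0 / lam_hat - x) ** 2)   # OPG estimator
V_hessian = lam_hat ** 2                          # Hessian estimator

# Sandwich estimator: Hessian estimator times the inverse of the OPG
# estimator times the Hessian estimator.
V_sandwich = V_hessian * (1.0 / V_opg) * V_hessian

print(V_sandwich, lambda_0 ** 2)
```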
The sandwich estimator $\widehat{V}_{S}$ is also a consistent estimator of $V$. This is again a consequence of the Continuous Mapping theorem:
$$\operatorname*{plim}_{n\rightarrow\infty}\, \widehat{V}_{S} = \left( \operatorname*{plim}_{n\rightarrow\infty}\, \widehat{V}_{H} \right) \left( \operatorname*{plim}_{n\rightarrow\infty}\, \widehat{V}_{OPG} \right)^{-1} \left( \operatorname*{plim}_{n\rightarrow\infty}\, \widehat{V}_{H} \right) = V V^{-1} V = V$$
where the last equality follows from the consistency of the OPG and Hessian estimators.
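As a final numerical check of consistency (an added sketch; the sample sizes and seed are arbitrary choices), all three estimates approach the true value $V = \lambda_0^2 = 4$ in the assumed exponential model as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(42)
lambda_0 = 2.0                      # true rate; true V = lambda_0**2 = 4

for n in (100, 10_000, 1_000_000):
    x = rng.exponential(scale=1.0 / lambda_0, size=n)
    lam_hat = 1.0 / x.mean()
    v_opg = 1.0 / np.mean((1.0 / lam_hat - x) ** 2)
    v_hess = lam_hat ** 2
    v_sand = v_hess * (1.0 / v_opg) * v_hess
    # All three columns should get closer to 4 as n increases.
    print(n, round(v_opg, 3), round(v_hess, 3), round(v_sand, 3))
```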
Newey, W. K. and D. McFadden (1994) "Large sample estimation and hypothesis testing", Chapter 36 in Handbook of Econometrics, Volume 4, Elsevier.