The asymptotic covariance matrix of a maximum likelihood estimator (MLE) is an unknown quantity that we need to approximate when we want to build confidence intervals around the point estimates obtained with the maximum likelihood method.
It is not to be confused with the maximum likelihood estimate of the covariance matrix of a distribution, which is covered in a separate lecture.
This lecture presents three popular estimators that can be used to approximate the asymptotic covariance matrix of the MLE:
the outer product of gradients (OPG) estimator;
the Hessian estimator;
the sandwich estimator.
Let $x_1, \ldots, x_n$ be the realizations of the first $n$ terms of an IID sequence $\{X_n\}$.
Suppose that a generic term of the sequence has probability density (or mass) function $f(x;\theta_0)$, where $\theta_0$ is an unknown vector of parameters.
The maximum likelihood estimator of $\theta_0$ is
$$\widehat{\theta}_n = \operatorname*{arg\,max}_{\theta}\, \sum_{i=1}^{n} \ln f(x_i;\theta)$$
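As a concrete numerical illustration (an addition to the lecture, not part of the original text), the following Python sketch computes the maximum likelihood estimate for an exponential model, where $\ln f(x;\lambda) = \ln\lambda - \lambda x$; the true rate $\lambda_0 = 2$, the sample size, and all variable names are assumptions made for the example:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative setup (assumed, not from the lecture): an IID exponential
# sample with true rate lambda_0 = 2, so f(x; lambda) = lambda * exp(-lambda x).
rng = np.random.default_rng(0)
lambda_0 = 2.0
x = rng.exponential(scale=1.0 / lambda_0, size=1000)

# Negative log-likelihood of the sample: -(n ln(lambda) - lambda sum(x_i)).
def neg_log_lik(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

# Numerical maximization of the log-likelihood ...
res = minimize_scalar(neg_log_lik, bounds=(1e-6, 100.0), method="bounded")

# ... agrees with the closed-form MLE of the exponential rate, 1 / mean(x).
print(res.x, 1.0 / x.mean())
```

The same exponential example is reused in the sketches below.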
As proved in the lecture on maximum likelihood, under certain technical assumptions the distribution of $\widehat{\theta}_n$ is asymptotically normal.
In particular, the distribution of $\widehat{\theta}_n$ can be approximated by a multivariate normal distribution with mean $\theta_0$ and covariance matrix
$$\frac{1}{n} V = \frac{1}{n} \left[ E\left( \nabla_\theta \ln f(X;\theta_0)\, \nabla_\theta \ln f(X;\theta_0)^{\top} \right) \right]^{-1}$$
where:
$\ln f(X;\theta_0)$ is the log-likelihood of a single observation from the sample, evaluated at the true parameter $\theta_0$;
the gradient $\nabla_\theta \ln f(X;\theta_0)$ is the vector of first derivatives of the log-likelihood;
$V$ is the so-called asymptotic covariance matrix.
Note that we divide by $n$ because the asymptotic covariance matrix is the covariance matrix of $\sqrt{n}\left(\widehat{\theta}_n - \theta_0\right)$, while we are interested in the covariance of $\widehat{\theta}_n$.
Under the technical assumptions mentioned previously, the information equality holds:
$$E\left[ \nabla_\theta \ln f(X;\theta_0)\, \nabla_\theta \ln f(X;\theta_0)^{\top} \right] = -E\left[ \nabla_{\theta\theta} \ln f(X;\theta_0) \right]$$
where the Hessian matrix $\nabla_{\theta\theta} \ln f(X;\theta_0)$ is the matrix of second-order partial derivatives of the log-likelihood function.
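As a quick sanity check of the information equality (an added example using the exponential model from the sketch above), take $f(x;\lambda) = \lambda e^{-\lambda x}$, so that $\ln f(x;\lambda) = \ln\lambda - \lambda x$ and
$$\nabla_\lambda \ln f(X;\lambda_0) = \frac{1}{\lambda_0} - X, \qquad \nabla_{\lambda\lambda} \ln f(X;\lambda_0) = -\frac{1}{\lambda_0^2}$$
Since $E[X] = 1/\lambda_0$ and $\operatorname{Var}[X] = 1/\lambda_0^2$, both sides of the information equality equal $1/\lambda_0^2$:
$$E\left[ \left( \nabla_\lambda \ln f(X;\lambda_0) \right)^2 \right] = \operatorname{Var}[X] = \frac{1}{\lambda_0^2} = -E\left[ \nabla_{\lambda\lambda} \ln f(X;\lambda_0) \right]$$
so that in this model the asymptotic covariance matrix is $V = \lambda_0^2$.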
The first estimator of the asymptotic covariance matrix is called the outer product of gradients (OPG) estimator. It is computed as
$$\widehat{V}_{OPG} = \left[ \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ln f(x_i;\widehat{\theta}_n)\, \nabla_\theta \ln f(x_i;\widehat{\theta}_n)^{\top} \right]^{-1}$$
It takes its name from the fact that the gradient is a column vector, its transpose is a row vector, and the product of a column vector and a row vector is called an outer product.
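To make the formula concrete, here is a minimal Python sketch of the OPG estimator (an added illustration under the assumed exponential model from above, whose per-observation score at $\lambda$ is $1/\lambda - x_i$):

```python
import numpy as np

# Illustrative exponential sample (assumed example; true rate lambda_0 = 2).
rng = np.random.default_rng(0)
lambda_0 = 2.0
x = rng.exponential(scale=1.0 / lambda_0, size=1000)

lam_hat = 1.0 / x.mean()      # closed-form MLE of the exponential rate

# Per-observation scores evaluated at the MLE: d/dlambda ln f(x_i; lam_hat).
scores = 1.0 / lam_hat - x

# OPG estimator: inverse of the average outer product of the scores.
# With a one-dimensional parameter the outer product is just a square.
V_opg = 1.0 / np.mean(scores ** 2)

print(V_opg, lambda_0 ** 2)   # compare with the true V = lambda_0**2 = 4
```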
Provided some regularity conditions are satisfied, the OPG estimator $\widehat{V}_{OPG}$ is a consistent estimator of $V$; that is, it converges in probability to $V$.
We provide only a sketch of the proof and refer the reader to Newey and McFadden (1994) for a more rigorous exposition. Provided some regularity conditions are satisfied (see the source just cited), we have the following equality between probability limits:
$$\operatorname*{plim}_{n\rightarrow\infty}\, \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ln f(x_i;\widehat{\theta}_n)\, \nabla_\theta \ln f(x_i;\widehat{\theta}_n)^{\top} = \operatorname*{plim}_{n\rightarrow\infty}\, \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ln f(x_i;\theta_0)\, \nabla_\theta \ln f(x_i;\theta_0)^{\top}$$
where $\widehat{\theta}_n$ has been replaced by $\theta_0$ because, being a consistent estimator, it converges in probability to $\theta_0$.
Because the sample is IID, by the Law of Large Numbers we have that
$$\operatorname*{plim}_{n\rightarrow\infty}\, \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ln f(x_i;\theta_0)\, \nabla_\theta \ln f(x_i;\theta_0)^{\top} = E\left[ \nabla_\theta \ln f(X;\theta_0)\, \nabla_\theta \ln f(X;\theta_0)^{\top} \right]$$
Now, the formula for the covariance matrix (see the lecture entitled Covariance matrix) yields
$$E\left[ \nabla_\theta \ln f(X;\theta_0)\, \nabla_\theta \ln f(X;\theta_0)^{\top} \right] = \operatorname{Var}\left[ \nabla_\theta \ln f(X;\theta_0) \right] + E\left[ \nabla_\theta \ln f(X;\theta_0) \right] E\left[ \nabla_\theta \ln f(X;\theta_0) \right]^{\top}$$
But the expected value of the gradient evaluated at $\theta_0$ is $0$, so that
$$E\left[ \nabla_\theta \ln f(X;\theta_0)\, \nabla_\theta \ln f(X;\theta_0)^{\top} \right] = \operatorname{Var}\left[ \nabla_\theta \ln f(X;\theta_0) \right] = V^{-1}$$
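For instance, in the illustrative exponential model used above (an added check, not part of the original text), $E\left[ \nabla_\lambda \ln f(X;\lambda_0) \right] = E\left[ 1/\lambda_0 - X \right] = 1/\lambda_0 - 1/\lambda_0 = 0$, consistent with this step.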
Thus,
$$\operatorname*{plim}_{n\rightarrow\infty}\, \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ln f(x_i;\widehat{\theta}_n)\, \nabla_\theta \ln f(x_i;\widehat{\theta}_n)^{\top} = V^{-1}$$
Because matrix inversion is continuous, by the Continuous Mapping theorem we have
$$\operatorname*{plim}_{n\rightarrow\infty}\, \widehat{V}_{OPG} = \left[ \operatorname*{plim}_{n\rightarrow\infty}\, \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ln f(x_i;\widehat{\theta}_n)\, \nabla_\theta \ln f(x_i;\widehat{\theta}_n)^{\top} \right]^{-1} = V$$
which is exactly the result we needed to prove.
The second estimator of the asymptotic covariance matrix is called the Hessian estimator. It is computed as
$$\widehat{V}_{H} = \left[ -\frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta\theta} \ln f(x_i;\widehat{\theta}_n) \right]^{-1}$$
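A minimal Python sketch of the Hessian estimator follows (again an added illustration under the assumed exponential model, where the second derivative of the log-density is $-1/\lambda^2$ for every observation):

```python
import numpy as np

# Same illustrative exponential sample as in the OPG sketch (assumed example).
rng = np.random.default_rng(0)
lambda_0 = 2.0
x = rng.exponential(scale=1.0 / lambda_0, size=1000)

lam_hat = 1.0 / x.mean()

# Per-observation Hessians at the MLE: d^2/dlambda^2 ln f(x_i; lam_hat),
# which is the constant -1 / lam_hat**2 in the exponential model.
hessians = np.full_like(x, -1.0 / lam_hat ** 2)

# Hessian estimator: inverse of minus the average Hessian.
V_hessian = 1.0 / (-np.mean(hessians))

print(V_hessian, lambda_0 ** 2)   # compare with the true V = lambda_0**2 = 4
```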
Under some regularity conditions, the Hessian estimator $\widehat{V}_{H}$ is also a consistent estimator of $V$.
Again, we do not provide an entirely rigorous proof (for which see Newey and McFadden 1994); we only sketch the main steps. First of all, under some regularity conditions, we have that
$$\operatorname*{plim}_{n\rightarrow\infty} \left( -\frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta\theta} \ln f(x_i;\widehat{\theta}_n) \right) = \operatorname*{plim}_{n\rightarrow\infty} \left( -\frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta\theta} \ln f(x_i;\theta_0) \right)$$
where $\widehat{\theta}_n$ has been replaced by $\theta_0$ because, being a consistent estimator, it converges in probability to $\theta_0$.
Now, since the sample is IID, by the Law of Large Numbers we have that
$$\operatorname*{plim}_{n\rightarrow\infty} \left( -\frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta\theta} \ln f(x_i;\theta_0) \right) = -E\left[ \nabla_{\theta\theta} \ln f(X;\theta_0) \right]$$
By the information equality, we have
$$-E\left[ \nabla_{\theta\theta} \ln f(X;\theta_0) \right] = E\left[ \nabla_\theta \ln f(X;\theta_0)\, \nabla_\theta \ln f(X;\theta_0)^{\top} \right] = V^{-1}$$
Therefore,
$$\operatorname*{plim}_{n\rightarrow\infty} \left( -\frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta\theta} \ln f(x_i;\widehat{\theta}_n) \right) = V^{-1}$$
Because matrix inversion is continuous, by the Continuous Mapping theorem we have
$$\operatorname*{plim}_{n\rightarrow\infty}\, \widehat{V}_{H} = V$$
which is what we needed to prove.
The third estimator of the asymptotic covariance matrix is called the sandwich estimator. It is computed as
$$\widehat{V}_{S} = \widehat{V}_{H}\, \widehat{V}_{OPG}^{-1}\, \widehat{V}_{H}$$
where $\widehat{V}_{OPG}$ is the OPG estimator and $\widehat{V}_{H}$ is the Hessian estimator.
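Combining the two previous sketches, the sandwich estimator can be computed as follows (still the assumed exponential example; with a scalar parameter the matrix products reduce to ordinary products):

```python
import numpy as np

# Same illustrative exponential sample as before (assumed example).
rng = np.random.default_rng(0)
lambda_0 = 2.0
x = rng.exponential(scale=1.0 / lambda_0, size=1000)

lam_hat = 1.0 / x.mean()

V_opg = 1.0 / np.mean((1.0 / lam_hat - x) ** 2)   # OPG estimator
V_hessian = lam_hat ** 2                          # Hessian estimator

# Sandwich estimator: Hessian estimator times the inverse of the OPG
# estimator times the Hessian estimator.
V_sandwich = V_hessian * (1.0 / V_opg) * V_hessian

print(V_sandwich, lambda_0 ** 2)
```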
The sandwich estimator $\widehat{V}_{S}$ is also a consistent estimator of $V$. This is again a consequence of the Continuous Mapping theorem:
$$\operatorname*{plim}_{n\rightarrow\infty}\, \widehat{V}_{S} = \left( \operatorname*{plim}_{n\rightarrow\infty}\, \widehat{V}_{H} \right) \left( \operatorname*{plim}_{n\rightarrow\infty}\, \widehat{V}_{OPG} \right)^{-1} \left( \operatorname*{plim}_{n\rightarrow\infty}\, \widehat{V}_{H} \right) = V V^{-1} V = V$$
where the last equality follows from the consistency of the OPG and Hessian estimators.
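As a final numerical check of consistency (an added sketch; the sample sizes and seed are arbitrary choices), all three estimates approach the true value $V = \lambda_0^2 = 4$ in the assumed exponential model as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(42)
lambda_0 = 2.0                      # true rate; true V = lambda_0**2 = 4

for n in (100, 10_000, 1_000_000):
    x = rng.exponential(scale=1.0 / lambda_0, size=n)
    lam_hat = 1.0 / x.mean()
    v_opg = 1.0 / np.mean((1.0 / lam_hat - x) ** 2)
    v_hess = lam_hat ** 2
    v_sand = v_hess * (1.0 / v_opg) * v_hess
    # All three columns should get closer to 4 as n increases.
    print(n, round(v_opg, 3), round(v_hess, 3), round(v_sand, 3))
```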
Newey, W. K. and D. McFadden (1994) "Large sample estimation and hypothesis testing", Chapter 36 in Handbook of Econometrics, Volume 4, Elsevier.