
Information matrix

by Marco Taboga, PhD

The information matrix (also called Fisher information matrix) is the matrix of second cross-moments of the score vector. The latter is the vector of first partial derivatives of the log-likelihood function with respect to the entries of the parameter vector.


Definition

To define the information matrix, we need the following objects:

- a sample $x$, whose distribution depends on a $K \times 1$ parameter vector $\theta$;

- the likelihood function $L(\theta; x)$;

- the log-likelihood function $l(\theta; x) = \ln L(\theta; x)$;

- the score vector $s(\theta; x) = \nabla_\theta \, l(\theta; x)$, that is, the $K \times 1$ vector of first partial derivatives of the log-likelihood with respect to the entries of $\theta$.

The information matrix $I(\theta)$ is the $K \times K$ matrix of second cross-moments of the score:$$I(\theta) = \mathrm{E}_\theta\!\left[ s(\theta; x) \, s(\theta; x)^\top \right]$$

The notation $\mathrm{E}_\theta$ indicates that the expected value is taken with respect to the probability distribution of $x$ associated with the parameter $\theta$.

The expected value

We take an expected value because the sample x is random.

For example, if the sample $x$ has a continuous distribution, then the likelihood function is$$L(\theta; x) = f(x; \theta)$$where $f(x; \theta)$ is the probability density function of $x$, parametrized by $\theta$.

Then, the information matrix is$$I(\theta) = \int s(\theta; x) \, s(\theta; x)^\top \, f(x; \theta) \, dx$$
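To make the definition concrete, here is a minimal Monte Carlo sketch in Python (assuming NumPy is available; the exponential model is our own illustrative choice, not part of this entry). For a single observation from an exponential distribution with rate $\lambda$, the score is $s(\lambda; x) = 1/\lambda - x$ and the information has the known closed form $I(\lambda) = 1/\lambda^2$, so the simulated second moment of the score can be checked against it.

```python
import numpy as np

# Hypothetical illustration (not from this entry): estimate the Fisher
# information of an exponential model f(x; lam) = lam * exp(-lam * x)
# by Monte Carlo. The score of one observation is 1/lam - x, and the
# closed form I(lam) = 1/lam^2 lets us check the simulation.

rng = np.random.default_rng(0)
lam = 2.0
x = rng.exponential(scale=1.0 / lam, size=1_000_000)

score = 1.0 / lam - x
print(score.mean())         # ~0: the expected score is zero
print((score ** 2).mean())  # ~0.25 = 1/lam^2, the information I(lam)
```

The near-zero sample mean of the score previews the regularity result discussed next.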

The information matrix is the covariance matrix of the score

Under mild regularity conditions, the expected value of the score is equal to zero:$$\mathrm{E}_\theta\!\left[ s(\theta; x) \right] = 0$$

As a consequence,$$\mathrm{Var}_\theta\!\left[ s(\theta; x) \right] = \mathrm{E}_\theta\!\left[ s(\theta; x) \, s(\theta; x)^\top \right] - \mathrm{E}_\theta\!\left[ s(\theta; x) \right] \mathrm{E}_\theta\!\left[ s(\theta; x) \right]^\top = I(\theta)$$that is, the information matrix is the covariance matrix of the score.

Information equality

Under mild regularity conditions, it can be proved that$$I(\theta) = -\mathrm{E}_\theta\!\left[ H(\theta; x) \right]$$where $H(\theta; x)$ is the matrix of second-order cross-partial derivatives (the so-called Hessian matrix) of the log-likelihood.
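As a numerical illustration of this equality, the sketch below (our own hypothetical choice of a Poisson model, again assuming NumPy) compares $\mathrm{E}[s^2]$ with $-\mathrm{E}[H]$; both should approximate $I(\lambda) = 1/\lambda$.

```python
import numpy as np

# Hypothetical check of the information equality for a Poisson(lam) model:
# log f(x; lam) = x*log(lam) - lam - log(x!), so the score is x/lam - 1
# and the second derivative (scalar Hessian) is -x/lam**2.
# Both E[score^2] and -E[Hessian] should approximate I(lam) = 1/lam.

rng = np.random.default_rng(1)
lam = 4.0
x = rng.poisson(lam, size=1_000_000)

score = x / lam - 1.0
hessian = -x / lam ** 2

print((score ** 2).mean())  # ~0.25 = 1/lam
print(-hessian.mean())      # ~0.25 as well, matching the equality
```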

This equality is called the information equality.

Example: information matrix of the normal distribution

As an example, consider a sample $x = (x_1, \ldots, x_n)$ made up of the realizations of $n$ IID normal random variables with parameters $\mu$ and $\sigma^2$ (mean and variance).

In this case, the information matrix is$$I(\mu, \sigma^2) = \begin{bmatrix} \dfrac{n}{\sigma^2} & 0 \\ 0 & \dfrac{n}{2\sigma^4} \end{bmatrix}$$

Proof

The log-likelihood function is$$l(\mu, \sigma^2; x) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$$as proved in the lecture on maximum likelihood estimation of the parameters of the normal distribution. The score $s$ is a $2 \times 1$ vector whose entries are the partial derivatives of the log-likelihood with respect to $\mu$ and $\sigma^2$:$$s(\mu, \sigma^2; x) = \begin{bmatrix} \dfrac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) \\ -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 \end{bmatrix}$$The information matrix is$$I(\mu, \sigma^2) = \mathrm{E}\!\left[ s \, s^\top \right]$$We have$$\mathrm{E}\!\left[ \left( \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) \right)^{2} \right] = \frac{1}{\sigma^4} \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{E}\!\left[ (x_i - \mu)(x_j - \mu) \right] \overset{A}{=} \frac{1}{\sigma^4} \sum_{i=1}^{n} \mathrm{E}\!\left[ (x_i - \mu)^2 \right] \overset{B}{=} \frac{n}{\sigma^2}$$where: in step A we have used the fact that $\mathrm{E}[(x_i - \mu)(x_j - \mu)] = 0$ for $i \neq j$ because the variables in the sample are independent and have mean equal to $\mu$; in step B we have used the fact that $\mathrm{E}[(x_i - \mu)^2] = \sigma^2$.

Moreover,$$\begin{aligned} \mathrm{E}\!\left[ \left( -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 \right)^{2} \right] &= \frac{n^2}{4\sigma^4} - \frac{n}{2\sigma^6} \sum_{i=1}^{n} \mathrm{E}\!\left[ (x_i - \mu)^2 \right] + \frac{1}{4\sigma^8} \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{E}\!\left[ (x_i - \mu)^2 (x_j - \mu)^2 \right] \\ &\overset{A}{=} \frac{n^2}{4\sigma^4} - \frac{n^2}{2\sigma^4} + \frac{1}{4\sigma^8} \left( n(n-1)\sigma^4 + \sum_{i=1}^{n} \mathrm{E}\!\left[ (x_i - \mu)^4 \right] \right) \\ &\overset{B}{=} \frac{n^2}{4\sigma^4} - \frac{n^2}{2\sigma^4} + \frac{(n^2 + 2n)\sigma^4}{4\sigma^8} = \frac{n}{2\sigma^4} \end{aligned}$$where: in steps A and B we have used the independence of the observations in the sample (in particular, $\mathrm{E}[(x_i - \mu)^2 (x_j - \mu)^2] = \sigma^4$ for $i \neq j$), and in step B we have used the fact that the fourth central moment of the normal distribution is equal to $3\sigma^4$.

Finally,$$\mathrm{E}\!\left[ \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) \left( -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{j=1}^{n} (x_j - \mu)^2 \right) \right] \overset{A}{=} \frac{1}{2\sigma^6} \sum_{i=1}^{n} \mathrm{E}\!\left[ (x_i - \mu)^3 \right] \overset{B}{=} 0$$where: in step A we have used the facts that $\mathrm{E}[x_i - \mu] = 0$ and that $\mathrm{E}[(x_i - \mu)(x_j - \mu)^2] = 0$ for $i \neq j$ because the variables in the sample are independent; in step B we have used the fact that the third central moment of the normal distribution is equal to zero.

Putting the three cross-moments together yields the diagonal matrix given above.
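The derivation above can be double-checked by simulation. The following sketch (assuming NumPy) draws many samples of size $n$, evaluates the two entries of the score at the true parameters, and compares the simulated matrix of second cross-moments with $\mathrm{diag}(n/\sigma^2,\, n/(2\sigma^4))$.

```python
import numpy as np

# Monte Carlo check of the information matrix derived above for n IID
# N(mu, sigma^2) observations: E[s s'] should be approximately
# diag(n / sigma^2, n / (2 * sigma^4)).

rng = np.random.default_rng(2)
mu, sigma2, n, reps = 1.0, 2.0, 10, 200_000

samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
dev = samples - mu

# The two score entries, evaluated at the true (mu, sigma^2).
s_mu = dev.sum(axis=1) / sigma2
s_s2 = -n / (2 * sigma2) + (dev ** 2).sum(axis=1) / (2 * sigma2 ** 2)

scores = np.stack([s_mu, s_s2], axis=1)
print(scores.T @ scores / reps)           # simulated E[s s']
print(n / sigma2, n / (2 * sigma2 ** 2))  # exact entries: 5.0 and 1.25
```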

Covariance matrix of the maximum likelihood estimator

When the sample $x$ is made up of IID observations, as in the previous example, the covariance matrix of the maximum likelihood estimator of $\theta$ is approximately equal to the inverse of the information matrix.

Denote the maximum likelihood estimator of $\theta$ by $\widehat{\theta}$. Then,$$\mathrm{Var}\!\left[ \widehat{\theta} \right] \approx I(\theta)^{-1}$$

Proof

Denote by $x_1, \ldots, x_n$ the $n$ IID observations. The log-likelihood of the sample is$$l(\theta; x) = \sum_{i=1}^{n} l(\theta; x_i)$$where $l(\theta; x_i)$ is the log-likelihood of the $i$-th observation. Under some technical conditions, we have proved that $\sqrt{n}\,(\widehat{\theta} - \theta)$ converges in distribution to a normal distribution with zero mean and covariance matrix equal to$$V = \mathrm{Var}\!\left[ \nabla_\theta \, l(\theta; x_i) \right]^{-1}$$This implies that$$\mathrm{Var}\!\left[ \widehat{\theta} \right] \approx \frac{1}{n} \mathrm{Var}\!\left[ \nabla_\theta \, l(\theta; x_i) \right]^{-1} \overset{A}{=} \left[ \sum_{i=1}^{n} \mathrm{Var}\!\left[ \nabla_\theta \, l(\theta; x_i) \right] \right]^{-1} \overset{B}{=} \mathrm{Var}\!\left[ \sum_{i=1}^{n} \nabla_\theta \, l(\theta; x_i) \right]^{-1} \overset{C}{=} \mathrm{Var}\!\left[ \nabla_\theta \, l(\theta; x) \right]^{-1} \overset{D}{=} I(\theta)^{-1}$$where: in step A we use the fact that the observations are identically distributed; in step B we can bring the summation inside the variance operator because the observations are independent; in step C we exploit the linearity of the gradient; in step D we use the fact that the information matrix is equal to the covariance matrix of the score.

Note that, in general, this approximation holds only if the observations in the sample are independent and identically distributed.
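A simulation along these lines (assuming NumPy; the sample size and number of replications are arbitrary choices) illustrates the approximation for the normal example: the empirical covariance of the maximum likelihood estimators across many samples should be close to the inverse of the information matrix, $\mathrm{diag}(\sigma^2/n,\, 2\sigma^4/n)$.

```python
import numpy as np

# Sketch: compare the covariance of the maximum likelihood estimators of
# (mu, sigma^2) across many simulated normal samples with the inverse of
# the information matrix, diag(sigma^2 / n, 2 * sigma^4 / n).

rng = np.random.default_rng(3)
mu, sigma2, n, reps = 1.0, 2.0, 200, 50_000

samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
mu_hat = samples.mean(axis=1)
# The MLE of the variance divides by n, not by n - 1.
s2_hat = ((samples - mu_hat[:, None]) ** 2).mean(axis=1)

est = np.stack([mu_hat, s2_hat], axis=1)
print(np.cov(est.T))                               # empirical covariance
print(np.diag([sigma2 / n, 2 * sigma2 ** 2 / n]))  # [[0.01, 0], [0, 0.04]]
```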

More details

More details about the Fisher information matrix, including proofs of the information equality and of the fact that the expected value of the score is equal to zero, can be found in the lecture on Maximum likelihood.


How to cite

Please cite as:

Taboga, Marco (2021). "Information matrix", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/glossary/information-matrix.
