The information matrix (also called the Fisher information matrix) is the matrix of second cross-moments of the score vector. The latter is the vector of first partial derivatives of the log-likelihood function with respect to the entries of the parameter vector.
To define the information matrix, we need the following objects:
a sample $\xi$;
a parameter vector $\theta$ that characterizes the distribution of $\xi$;
the likelihood function $L(\theta;\xi)$;
the log-likelihood function $l(\theta;\xi)=\ln L(\theta;\xi)$;
the score vector $\nabla_{\theta}l(\theta;\xi)$, that is, the vector of first derivatives of $l(\theta;\xi)$ with respect to the entries of $\theta$.
The information matrix is the matrix of second cross-moments of the score:
$$I(\theta)=\mathrm{E}_{\theta}\!\left[\nabla_{\theta}l(\theta;\xi)\,\nabla_{\theta}l(\theta;\xi)^{\top}\right]$$
The notation $\mathrm{E}_{\theta}$ indicates that the expected value is taken with respect to the probability distribution of $\xi$ associated with the parameter $\theta$.
We take an expected value because the sample is random.
For example, if the sample $\xi$ has a continuous distribution, then the likelihood function is
$$L(\theta;\xi)=f(\xi;\theta)$$
where $f(\xi;\theta)$ is the probability density function of $\xi$, parametrized by $\theta$.
Then, the information matrix is
$$I(\theta)=\mathrm{E}_{\theta}\!\left[\nabla_{\theta}\ln f(\xi;\theta)\,\nabla_{\theta}\ln f(\xi;\theta)^{\top}\right]$$
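As a concrete illustration (added here, not part of the original entry), take a single observation $x$ from an exponential distribution with rate $\lambda$; the information about $\lambda$ follows directly from the definition:
$$f(x;\lambda)=\lambda e^{-\lambda x},\qquad l(\lambda;x)=\ln\lambda-\lambda x,\qquad \frac{\partial l}{\partial\lambda}=\frac{1}{\lambda}-x,$$
$$I(\lambda)=\mathrm{E}_{\lambda}\!\left[\left(\frac{1}{\lambda}-x\right)^{2}\right]=\mathrm{Var}[x]=\frac{1}{\lambda^{2}},$$
where the second equality holds because $\mathrm{E}_{\lambda}[x]=1/\lambda$, so that the score has zero mean.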
Under mild regularity conditions, the expected value of the score is equal to zero:
$$\mathrm{E}_{\theta}\!\left[\nabla_{\theta}l(\theta;\xi)\right]=0$$
As a consequence,
$$I(\theta)=\mathrm{Var}_{\theta}\!\left[\nabla_{\theta}l(\theta;\xi)\right]$$
that is, the information matrix is the covariance matrix of the score.
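These two facts can be checked numerically. The following sketch (an illustration added here, not taken from the original entry) simulates many samples of $n$ IID exponential observations with rate $\lambda$, for which the score of the sample is $n/\lambda-\sum_{i}x_{i}$ and the information matrix reduces to the scalar $n/\lambda^{2}$:

```python
# Monte Carlo check: for n IID exponential observations with rate lam,
# the score of the sample is n/lam - sum_i x_i. Its mean should be close
# to zero and its variance close to the information I(lam) = n / lam**2.
import numpy as np

rng = np.random.default_rng(0)
n, lam, reps = 50, 2.0, 200_000

samples = rng.exponential(scale=1.0 / lam, size=(reps, n))
scores = n / lam - samples.sum(axis=1)   # score of each simulated sample

print("mean of score       :", scores.mean())   # approximately 0
print("variance of score   :", scores.var())    # approximately n / lam**2
print("information n/lam^2 :", n / lam**2)
```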
Under mild regularity conditions, it can be proved that
$$I(\theta)=-\mathrm{E}_{\theta}\!\left[H(\theta;\xi)\right]$$
where $H(\theta;\xi)$ is the matrix of second-order cross-partial derivatives (the so-called Hessian matrix) of the log-likelihood.
This equality is called the information equality.
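The information equality can also be verified by simulation. In the sketch below (an added illustration, not from the original entry) the model is a sample of $n$ IID Poisson observations with mean $\lambda$: the score is $\sum_{i}x_{i}/\lambda-n$, the second derivative of the log-likelihood is $-\sum_{i}x_{i}/\lambda^{2}$, and both the second moment of the score and minus the expected Hessian should approximate $n/\lambda$:

```python
# Check of the information equality for n IID Poisson(lam) observations:
# the log-likelihood is sum_i [x_i * ln(lam) - lam - ln(x_i!)], so the score
# is sum_i x_i / lam - n and the second derivative is -sum_i x_i / lam**2.
import numpy as np

rng = np.random.default_rng(1)
n, lam, reps = 30, 4.0, 200_000

x = rng.poisson(lam=lam, size=(reps, n))
score = x.sum(axis=1) / lam - n
hessian = -x.sum(axis=1) / lam**2

print("E[score^2]    :", np.mean(score**2))   # approximately n / lam
print("-E[hessian]   :", -np.mean(hessian))   # approximately n / lam
print("exact n / lam :", n / lam)
```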
As an example, consider a sample $\xi$ made up of the realizations $x_{1},\dots,x_{n}$ of $n$ IID normal random variables with parameters $\mu$ and $\sigma^{2}$ (mean and variance).
In this case, the information matrix is
$$I(\mu,\sigma^{2})=\begin{bmatrix} \frac{n}{\sigma^{2}} & 0 \\ 0 & \frac{n}{2\sigma^{4}} \end{bmatrix}$$
The log-likelihood function is
$$l(\mu,\sigma^{2};\xi)=-\frac{n}{2}\ln(2\pi)-\frac{n}{2}\ln(\sigma^{2})-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\mu)^{2}$$
as proved in the lecture on maximum likelihood estimation of the parameters of the normal distribution.

The score is a vector whose entries are the partial derivatives of the log-likelihood with respect to $\mu$ and $\sigma^{2}$:
$$\frac{\partial l}{\partial\mu}=\frac{1}{\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\mu),\qquad \frac{\partial l}{\partial\sigma^{2}}=-\frac{n}{2\sigma^{2}}+\frac{1}{2\sigma^{4}}\sum_{i=1}^{n}(x_{i}-\mu)^{2}$$

The information matrix is
$$I(\mu,\sigma^{2})=\begin{bmatrix} \mathrm{E}\!\left[\left(\frac{\partial l}{\partial\mu}\right)^{2}\right] & \mathrm{E}\!\left[\frac{\partial l}{\partial\mu}\frac{\partial l}{\partial\sigma^{2}}\right] \\ \mathrm{E}\!\left[\frac{\partial l}{\partial\mu}\frac{\partial l}{\partial\sigma^{2}}\right] & \mathrm{E}\!\left[\left(\frac{\partial l}{\partial\sigma^{2}}\right)^{2}\right] \end{bmatrix}$$

We have
$$\mathrm{E}\!\left[\left(\frac{\partial l}{\partial\mu}\right)^{2}\right]=\frac{1}{\sigma^{4}}\,\mathrm{E}\!\left[\left(\sum_{i=1}^{n}(x_{i}-\mu)\right)^{2}\right]=\frac{1}{\sigma^{4}}\sum_{i=1}^{n}\mathrm{E}\!\left[(x_{i}-\mu)^{2}\right]=\frac{n\sigma^{2}}{\sigma^{4}}=\frac{n}{\sigma^{2}}$$
where: in the second step we have used the fact that $\mathrm{E}[(x_{i}-\mu)(x_{j}-\mu)]=0$ for $i\neq j$ because the variables in the sample are independent and have mean equal to $\mu$; in the third step we have used the fact that $\mathrm{E}[(x_{i}-\mu)^{2}]=\sigma^{2}$.

Moreover,
$$\mathrm{E}\!\left[\left(\frac{\partial l}{\partial\sigma^{2}}\right)^{2}\right]=\mathrm{Var}\!\left[\frac{\partial l}{\partial\sigma^{2}}\right]=\frac{1}{4\sigma^{8}}\,\mathrm{Var}\!\left[\sum_{i=1}^{n}(x_{i}-\mu)^{2}\right]=\frac{1}{4\sigma^{8}}\sum_{i=1}^{n}\mathrm{Var}\!\left[(x_{i}-\mu)^{2}\right]=\frac{n\left(3\sigma^{4}-\sigma^{4}\right)}{4\sigma^{8}}=\frac{n}{2\sigma^{4}}$$
where: in the first step we have used the fact that the score has zero mean; in the third step we have used the independence of the observations in the sample; in the fourth step we have used the fact that the fourth central moment of the normal distribution is equal to $3\sigma^{4}$, so that $\mathrm{Var}[(x_{i}-\mu)^{2}]=3\sigma^{4}-\sigma^{4}$.

Finally,
$$\mathrm{E}\!\left[\frac{\partial l}{\partial\mu}\,\frac{\partial l}{\partial\sigma^{2}}\right]=\frac{1}{2\sigma^{6}}\,\mathrm{E}\!\left[\sum_{i=1}^{n}\sum_{j=1}^{n}(x_{i}-\mu)(x_{j}-\mu)^{2}\right]=\frac{1}{2\sigma^{6}}\sum_{i=1}^{n}\mathrm{E}\!\left[(x_{i}-\mu)^{3}\right]=0$$
where: in the first step we have used the fact that $\mathrm{E}[x_{i}-\mu]=0$, so that the term involving $-\frac{n}{2\sigma^{2}}$ has zero expectation; in the second step we have used the fact that $\mathrm{E}[(x_{i}-\mu)(x_{j}-\mu)^{2}]=\mathrm{E}[x_{i}-\mu]\,\mathrm{E}[(x_{j}-\mu)^{2}]=0$ for $i\neq j$ because the variables in the sample are independent; in the last step we have used the fact that the third central moment of the normal distribution is equal to zero.
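The analytical result can be cross-checked by simulation. The sketch below (added here as an illustration, not taken from the original entry) averages the outer products of the score over many simulated normal samples and compares the result with $\mathrm{diag}(n/\sigma^{2},\,n/(2\sigma^{4}))$:

```python
# Approximate the information matrix of n IID N(mu, sigma^2) observations by
# averaging the outer products of the score over many simulated samples.
import numpy as np

rng = np.random.default_rng(2)
n, mu, sigma2, reps = 20, 1.0, 2.0, 100_000

x = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=(reps, n))

# entries of the score with respect to (mu, sigma^2), one row per sample
d_mu = (x - mu).sum(axis=1) / sigma2
d_s2 = -n / (2 * sigma2) + ((x - mu) ** 2).sum(axis=1) / (2 * sigma2**2)
score = np.column_stack([d_mu, d_s2])

info_mc = score.T @ score / reps                       # Monte Carlo estimate
info_exact = np.diag([n / sigma2, n / (2 * sigma2**2)])

print(np.round(info_mc, 3))
print(info_exact)
```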
When the sample is made up of IID observations, as in the previous example, the covariance matrix of the maximum likelihood estimator of $\theta$ is approximately equal to the inverse of the information matrix.
Denote the maximum likelihood estimator of $\theta$ by $\widehat{\theta}$. Then,
$$\mathrm{Var}\!\left[\widehat{\theta}\,\right]\approx I(\theta)^{-1}$$
Denote by $\xi_{1},\dots,\xi_{n}$ the IID observations. The log-likelihood of the sample is
$$l(\theta;\xi)=\sum_{i=1}^{n}l(\theta;\xi_{i})$$
where $l(\theta;\xi_{i})$ is the log-likelihood of the $i$-th observation. Under some technical conditions, we have proved that $\sqrt{n}\left(\widehat{\theta}-\theta\right)$ converges in distribution to a normal distribution with zero mean and covariance matrix equal to
$$V=\left(\mathrm{Var}_{\theta}\!\left[\nabla_{\theta}l(\theta;\xi_{1})\right]\right)^{-1}$$
This implies that
$$\mathrm{Var}\!\left[\widehat{\theta}\,\right]\approx\frac{1}{n}V=\left(n\,\mathrm{Var}_{\theta}\!\left[\nabla_{\theta}l(\theta;\xi_{1})\right]\right)^{-1}=\left(\sum_{i=1}^{n}\mathrm{Var}_{\theta}\!\left[\nabla_{\theta}l(\theta;\xi_{i})\right]\right)^{-1}=\left(\mathrm{Var}_{\theta}\!\left[\sum_{i=1}^{n}\nabla_{\theta}l(\theta;\xi_{i})\right]\right)^{-1}=\left(\mathrm{Var}_{\theta}\!\left[\nabla_{\theta}l(\theta;\xi)\right]\right)^{-1}=I(\theta)^{-1}$$
where: in the second step we use the fact that the observations are identically distributed; in the third step we can bring the summation inside the variance operator because the observations are independent; in the fourth step we exploit the linearity of the gradient; in the last step we use the fact that the information matrix is equal to the covariance matrix of the score.
Note that, in general, this approximation is valid only if the observations in the sample are independently and identically distributed.
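As a final illustration (added here, not part of the original entry), the approximation can be checked for the normal example: over many replications, the sampling covariance of the maximum likelihood estimators of $(\mu,\sigma^{2})$ should be close to $I(\mu,\sigma^{2})^{-1}=\mathrm{diag}(\sigma^{2}/n,\,2\sigma^{4}/n)$:

```python
# Compare the simulated covariance of the MLEs of (mu, sigma^2) with the
# inverse information matrix diag(sigma^2 / n, 2 * sigma^4 / n).
import numpy as np

rng = np.random.default_rng(3)
n, mu, sigma2, reps = 200, 1.0, 2.0, 50_000

x = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=(reps, n))
mu_hat = x.mean(axis=1)                              # MLE of mu
s2_hat = ((x - mu_hat[:, None]) ** 2).mean(axis=1)   # MLE of sigma^2

cov_mc = np.cov(np.column_stack([mu_hat, s2_hat]), rowvar=False)
cov_asy = np.diag([sigma2 / n, 2 * sigma2**2 / n])

print(np.round(cov_mc, 5))
print(cov_asy)
```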
More details about the Fisher information matrix, including proofs of the information equality and of the fact that the expected value of the score is equal to zero, can be found in the lecture on Maximum likelihood.
Please cite as:
Taboga, Marco (2021). "Information matrix", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/glossary/information-matrix.