The information matrix (also called Fisher information matrix) is the matrix of second cross-moments of the score vector. The latter is the vector of first partial derivatives of the log-likelihood function with respect to its parameters.
To define the information matrix, we need the following objects:
- a sample $\xi$;
- a parameter vector $\theta$ that characterizes the distribution of $\xi$;
- the likelihood function $L(\theta;\xi)$;
- the log-likelihood function $l(\theta;\xi)=\ln L(\theta;\xi)$;
- the score vector $\nabla_\theta\, l(\theta;\xi)$, that is, the vector of first partial derivatives of $l(\theta;\xi)$ with respect to the entries of $\theta$.
The information matrix $I(\theta)$ is the matrix of second cross-moments of the score:
$$I(\theta) = \mathrm{E}_\theta\!\left[\nabla_\theta\, l(\theta;\xi)\,\nabla_\theta\, l(\theta;\xi)^{\top}\right].$$
The notation $\mathrm{E}_\theta$ indicates that the expected value is taken with respect to the probability distribution of $\xi$ associated with the parameter $\theta$. We take an expected value because the sample $\xi$ is random.
For example, if the sample $\xi$ has a continuous distribution, then the likelihood function is
$$L(\theta;\xi) = f(\xi;\theta),$$
where $f(\xi;\theta)$ is the probability density function of $\xi$, parametrized by $\theta$. Then, the information matrix is
$$I(\theta) = \int \nabla_\theta \ln f(x;\theta)\, \nabla_\theta \ln f(x;\theta)^{\top}\, f(x;\theta)\, \mathrm{d}x.$$
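To make the definition concrete, here is a minimal Monte Carlo sketch (an illustration added to this entry, not part of the original derivations) that approximates the expectation defining $I(\theta)$ for an assumed model: a sample of $n$ IID exponential observations with rate $\lambda$, whose information is known to be $n/\lambda^{2}$. The model, parameter values, and variable names are illustrative choices.

```python
import numpy as np

# Minimal Monte Carlo sketch (assumed model: n IID Exponential(rate) observations).
# The score of the sample is the derivative of the log-likelihood with respect to the rate:
#   l(rate; x_1..x_n) = n*log(rate) - rate*sum(x_i)  =>  score = n/rate - sum(x_i).
# The information (here a scalar) is E[score^2]; its exact value is n/rate^2.

rng = np.random.default_rng(0)
rate, n, n_sim = 2.0, 10, 200_000

samples = rng.exponential(scale=1.0 / rate, size=(n_sim, n))
scores = n / rate - samples.sum(axis=1)

print("Monte Carlo E[score^2]:", np.mean(scores**2))   # ~ 2.5
print("Exact information n/rate^2:", n / rate**2)      # = 2.5
```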
Under mild regularity conditions, the expected value of the score is equal to zero:
$$\mathrm{E}_\theta\!\left[\nabla_\theta\, l(\theta;\xi)\right] = 0.$$
As a consequence,
$$I(\theta) = \mathrm{Var}_\theta\!\left[\nabla_\theta\, l(\theta;\xi)\right],$$
that is, the information matrix is the covariance matrix of the score.
Under mild regularity conditions, it can also be proved that
$$I(\theta) = -\mathrm{E}_\theta\!\left[\nabla^{2}_{\theta}\, l(\theta;\xi)\right],$$
where $\nabla^{2}_{\theta}\, l(\theta;\xi)$ is the matrix of second-order cross-partial derivatives (the so-called Hessian matrix) of the log-likelihood. This equality is called the information equality.
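Both properties can be checked symbolically in simple models. The sketch below (an added illustration; the exponential model and symbol names are assumptions of this example, not taken from the text) verifies, for a single exponential observation, that the score has zero mean and that the negative expected Hessian equals the expected squared score.

```python
import sympy as sp

# Symbolic sketch of the zero-mean score and the information equality for one
# Exponential(lam) observation (an assumed example).
x, lam = sp.symbols('x lam', positive=True)
pdf = lam * sp.exp(-lam * x)           # density of Exponential(lam)
loglik = sp.log(pdf)

score = sp.diff(loglik, lam)           # first derivative of the log-likelihood
hessian = sp.diff(loglik, lam, 2)      # second derivative

# Expectations are integrals against the density over (0, oo).
E_score = sp.integrate(score * pdf, (x, 0, sp.oo))
E_score_sq = sp.integrate(score**2 * pdf, (x, 0, sp.oo))
E_neg_hess = -sp.integrate(hessian * pdf, (x, 0, sp.oo))

print(sp.simplify(E_score))      # 0  (the score has zero mean)
print(sp.simplify(E_score_sq))   # 1/lam**2
print(sp.simplify(E_neg_hess))   # 1/lam**2  (information equality)
```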
As an example, consider a sample $\xi = (x_1, \ldots, x_n)$ made up of the realizations of $n$ IID normal random variables with parameters $\mu$ and $\sigma^{2}$ (mean and variance). In this case, the information matrix is
$$I(\mu,\sigma^{2}) = \begin{bmatrix} \dfrac{n}{\sigma^{2}} & 0 \\[1ex] 0 & \dfrac{n}{2\sigma^{4}} \end{bmatrix}.$$
The log-likelihood function is
$$l(\mu,\sigma^{2};\xi) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^{2}) - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i-\mu)^{2},$$
as proved in the lecture on maximum likelihood estimation of the parameters of the normal distribution. The score is a $2\times 1$ vector whose entries are the partial derivatives of the log-likelihood with respect to $\mu$ and $\sigma^{2}$:
$$\begin{bmatrix} \dfrac{\partial l}{\partial \mu} \\[2ex] \dfrac{\partial l}{\partial \sigma^{2}} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{\sigma^{2}}\displaystyle\sum_{i=1}^{n}(x_i-\mu) \\[2ex] -\dfrac{n}{2\sigma^{2}} + \dfrac{1}{2\sigma^{4}}\displaystyle\sum_{i=1}^{n}(x_i-\mu)^{2} \end{bmatrix}.$$
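As a quick sanity check (an illustration added here; the sample, parameter values, and function names are arbitrary), the following sketch compares the two analytic score entries above with a central-difference numerical gradient of the normal log-likelihood.

```python
import numpy as np

# Sketch: compare the analytic score of the normal log-likelihood with a
# numerical (central-difference) gradient at arbitrary parameter values.

def loglik(mu, s2, x):
    n = x.size
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(s2) \
           - 0.5 * np.sum((x - mu) ** 2) / s2

def score(mu, s2, x):
    d_mu = np.sum(x - mu) / s2
    d_s2 = -x.size / (2 * s2) + np.sum((x - mu) ** 2) / (2 * s2 ** 2)
    return np.array([d_mu, d_s2])

rng = np.random.default_rng(1)
x = rng.normal(loc=1.0, scale=2.0, size=50)
mu, s2, h = 0.8, 3.5, 1e-6

numeric = np.array([
    (loglik(mu + h, s2, x) - loglik(mu - h, s2, x)) / (2 * h),
    (loglik(mu, s2 + h, x) - loglik(mu, s2 - h, x)) / (2 * h),
])
print(score(mu, s2, x))  # analytic score
print(numeric)           # should agree with the analytic score to several decimals
```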
The information matrix $I(\mu,\sigma^{2})$ is therefore
$$I(\mu,\sigma^{2}) = \mathrm{E}\!\left[\begin{matrix} \left(\dfrac{\partial l}{\partial \mu}\right)^{2} & \dfrac{\partial l}{\partial \mu}\,\dfrac{\partial l}{\partial \sigma^{2}} \\[2ex] \dfrac{\partial l}{\partial \mu}\,\dfrac{\partial l}{\partial \sigma^{2}} & \left(\dfrac{\partial l}{\partial \sigma^{2}}\right)^{2} \end{matrix}\right].$$
We have
$$\mathrm{E}\!\left[\left(\frac{\partial l}{\partial \mu}\right)^{2}\right] = \frac{1}{\sigma^{4}}\,\mathrm{E}\!\left[\left(\sum_{i=1}^{n}(x_i-\mu)\right)^{2}\right] \overset{(A)}{=} \frac{1}{\sigma^{4}}\sum_{i=1}^{n}\mathrm{E}\!\left[(x_i-\mu)^{2}\right] \overset{(B)}{=} \frac{n\sigma^{2}}{\sigma^{4}} = \frac{n}{\sigma^{2}},$$
where: in step $(A)$ we have used the fact that $\mathrm{E}[(x_i-\mu)(x_j-\mu)] = 0$ for $i \neq j$, because the variables in the sample are independent and have mean equal to $\mu$; in step $(B)$ we have used the fact that $\mathrm{E}[(x_i-\mu)^{2}] = \sigma^{2}$.
Moreover,
$$\begin{aligned}
\mathrm{E}\!\left[\left(\frac{\partial l}{\partial \sigma^{2}}\right)^{2}\right]
&= \mathrm{E}\!\left[\left(-\frac{n}{2\sigma^{2}} + \frac{1}{2\sigma^{4}}\sum_{i=1}^{n}(x_i-\mu)^{2}\right)^{2}\right] \\
&= \frac{n^{2}}{4\sigma^{4}} - \frac{n}{2\sigma^{6}}\,\mathrm{E}\!\left[\sum_{i=1}^{n}(x_i-\mu)^{2}\right] + \frac{1}{4\sigma^{8}}\,\mathrm{E}\!\left[\left(\sum_{i=1}^{n}(x_i-\mu)^{2}\right)^{2}\right] \\
&\overset{(C)}{=} \frac{n^{2}}{4\sigma^{4}} - \frac{n}{2\sigma^{6}}\sum_{i=1}^{n}\mathrm{E}\!\left[(x_i-\mu)^{2}\right] + \frac{1}{4\sigma^{8}}\left(\sum_{i\neq j}\mathrm{E}\!\left[(x_i-\mu)^{2}\right]\mathrm{E}\!\left[(x_j-\mu)^{2}\right] + \sum_{i=1}^{n}\mathrm{E}\!\left[(x_i-\mu)^{4}\right]\right) \\
&\overset{(D)}{=} \frac{n^{2}}{4\sigma^{4}} - \frac{n^{2}}{2\sigma^{4}} + \frac{1}{4\sigma^{8}}\left(n(n-1)\sigma^{4} + \sum_{i=1}^{n}\mathrm{E}\!\left[(x_i-\mu)^{4}\right]\right) \\
&\overset{(E)}{=} \frac{n^{2}}{4\sigma^{4}} - \frac{n^{2}}{2\sigma^{4}} + \frac{n(n-1)\sigma^{4} + 3n\sigma^{4}}{4\sigma^{8}} = \frac{n}{2\sigma^{4}},
\end{aligned}$$
where: in step $(C)$ we have used the independence of the observations in the sample (the expected values of the cross products factor into products of expected values); in step $(D)$ we have used the fact that $\mathrm{E}[(x_i-\mu)^{2}] = \sigma^{2}$; in step $(E)$ we have used the fact that the fourth central moment of the normal distribution is equal to $3\sigma^{4}$.
Finally,
$$\begin{aligned}
\mathrm{E}\!\left[\frac{\partial l}{\partial \mu}\,\frac{\partial l}{\partial \sigma^{2}}\right]
&= \mathrm{E}\!\left[\frac{1}{\sigma^{2}}\sum_{i=1}^{n}(x_i-\mu)\left(-\frac{n}{2\sigma^{2}} + \frac{1}{2\sigma^{4}}\sum_{j=1}^{n}(x_j-\mu)^{2}\right)\right] \\
&\overset{(F)}{=} \frac{1}{2\sigma^{6}}\sum_{i=1}^{n}\mathrm{E}\!\left[(x_i-\mu)^{3}\right] \\
&\overset{(G)}{=} 0,
\end{aligned}$$
where: in step $(F)$ we have used the facts that $\mathrm{E}[x_i-\mu]=0$ and that $\mathrm{E}\!\left[(x_i-\mu)(x_j-\mu)^{2}\right]=0$ for $i\neq j$, because the variables in the sample are independent; in step $(G)$ we have used the fact that the third central moment of the normal distribution is equal to zero.
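The two diagonal entries and the zero off-diagonal entry can also be checked numerically. The sketch below (an added illustration; the parameter values are arbitrary) estimates the expected outer product of the score by simulation and compares it with the closed form derived above.

```python
import numpy as np

# Monte Carlo sketch: estimate the information matrix of n IID N(mu, s2) observations
# as E[score score'] and compare it with the closed form diag(n/s2, n/(2*s2**2)).

rng = np.random.default_rng(2)
mu, s2, n, n_sim = 1.0, 2.0, 5, 200_000

x = rng.normal(loc=mu, scale=np.sqrt(s2), size=(n_sim, n))
d_mu = np.sum(x - mu, axis=1) / s2
d_s2 = -n / (2 * s2) + np.sum((x - mu) ** 2, axis=1) / (2 * s2 ** 2)
scores = np.column_stack([d_mu, d_s2])          # one score vector per simulated sample

info_mc = scores.T @ scores / n_sim             # Monte Carlo estimate of E[score score']
info_exact = np.diag([n / s2, n / (2 * s2 ** 2)])

print(np.round(info_mc, 3))    # approximately [[2.5, 0], [0, 0.625]]
print(info_exact)              # [[2.5, 0], [0, 0.625]]
```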
When the sample $\xi$ is made up of IID observations, as in the previous example, the covariance matrix of the maximum likelihood estimator of $\theta$ is approximately equal to the inverse of the information matrix. Denote the maximum likelihood estimator of $\theta$ by $\widehat{\theta}_n$ and the true value of the parameter by $\theta_0$. Then,
$$\mathrm{Var}\!\left[\widehat{\theta}_n\right] \approx I(\theta_0)^{-1}.$$
Denote by $x_1, \ldots, x_n$ the IID observations. The log-likelihood of the sample is
$$l(\theta;\xi) = \sum_{i=1}^{n} l(\theta;x_i),$$
where $l(\theta;x_i)$ is the log-likelihood of the $i$-th observation. Under some technical conditions, we have proved that $\sqrt{n}\left(\widehat{\theta}_n - \theta_0\right)$ converges in distribution to a normal distribution with zero mean and covariance matrix equal to the inverse of the covariance matrix of the score of a single observation,
$$V = \left(\mathrm{Var}_{\theta_0}\!\left[\nabla_\theta\, l(\theta_0;x_i)\right]\right)^{-1}.$$
This implies that
$$\begin{aligned}
\mathrm{Var}\!\left[\widehat{\theta}_n\right] &\approx \frac{1}{n}\,V
= \left(n\,\mathrm{Var}_{\theta_0}\!\left[\nabla_\theta\, l(\theta_0;x_i)\right]\right)^{-1} \\
&\overset{(A)}{=} \left(\sum_{i=1}^{n}\mathrm{Var}_{\theta_0}\!\left[\nabla_\theta\, l(\theta_0;x_i)\right]\right)^{-1} \\
&\overset{(B)}{=} \left(\mathrm{Var}_{\theta_0}\!\left[\sum_{i=1}^{n}\nabla_\theta\, l(\theta_0;x_i)\right]\right)^{-1} \\
&\overset{(C)}{=} \left(\mathrm{Var}_{\theta_0}\!\left[\nabla_\theta\, l(\theta_0;\xi)\right]\right)^{-1} \\
&\overset{(D)}{=} I(\theta_0)^{-1},
\end{aligned}$$
where: in step $(A)$ we use the fact that the observations are identically distributed; in step $(B)$ we can bring the summation inside the variance operator because the observations are independent; in step $(C)$ we exploit the linearity of the gradient; in step $(D)$ we use the fact that the information matrix is equal to the covariance matrix of the score.
Note that in general, this is true only if the observations in the sample are independently and identically distributed.
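As an illustration of this approximation (added here, not part of the original text), the following sketch simulates many normal samples, computes the maximum likelihood estimates of $\mu$ and $\sigma^{2}$ (the sample mean and the uncorrected sample variance), and compares their empirical covariance matrix with the inverse of the information matrix derived above. The parameter values and replication counts are arbitrary choices.

```python
import numpy as np

# Sketch: empirical covariance of the normal MLE vs. the inverse information matrix.
# The ML estimators of (mu, s2) are the sample mean and the variance with divisor n.

rng = np.random.default_rng(3)
mu, s2, n, n_rep = 1.0, 2.0, 200, 50_000

x = rng.normal(loc=mu, scale=np.sqrt(s2), size=(n_rep, n))
mu_hat = x.mean(axis=1)
s2_hat = x.var(axis=1)                      # divides by n, i.e. the MLE of the variance

estimates = np.column_stack([mu_hat, s2_hat])
emp_cov = np.cov(estimates, rowvar=False)   # empirical covariance of the MLE

inv_info = np.diag([s2 / n, 2 * s2 ** 2 / n])   # inverse of diag(n/s2, n/(2*s2**2))

print(np.round(emp_cov, 4))   # approximately [[0.01, 0], [0, 0.04]]
print(inv_info)               # [[0.01, 0], [0, 0.04]]
```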
More details about the Fisher information matrix, including proofs of the information equality and of the fact that the expected value of the score is equal to zero, can be found in the lecture on Maximum likelihood.