In this lecture we show how to derive the maximum likelihood estimators of the two parameters of a multivariate normal distribution: the mean vector and the covariance matrix.
To understand the derivation, you need to be familiar with the concept of the trace of a matrix.
Suppose we observe the first $n$ terms of an IID sequence $\{x_j\}$ of $K$-dimensional multivariate normal random vectors. The joint probability density function of the $j$-th term of the sequence is
$$f(x_j) = (2\pi)^{-K/2} \, \det(\Sigma)^{-1/2} \exp\!\left( -\frac{1}{2} (x_j - \mu)^\top \Sigma^{-1} (x_j - \mu) \right)$$
where: $\mu$ is the $K \times 1$ mean vector; $\Sigma$ is the $K \times K$ covariance matrix. The covariance matrix $\Sigma$ is assumed to be positive definite, so that its determinant $\det(\Sigma)$ is strictly positive.
We use the sample $\xi = (x_1, \ldots, x_n)$, that is, the realizations of the first $n$ random vectors in the sequence, to estimate the two unknown parameters $\mu$ and $\Sigma$.
The likelihood function is
$$L(\mu, \Sigma; \xi) = \prod_{j=1}^{n} f(x_j).$$
Since the terms in the sequence are independent, their joint density is equal to the product of their marginal densities. As a consequence, the likelihood function can be written as
$$L(\mu, \Sigma; \xi) = (2\pi)^{-nK/2} \, \det(\Sigma)^{-n/2} \exp\!\left( -\frac{1}{2} \sum_{j=1}^{n} (x_j - \mu)^\top \Sigma^{-1} (x_j - \mu) \right).$$
The log-likelihood function is obtained by taking the natural logarithm of the likelihood function:
$$\ell(\mu, \Sigma; \xi) = \ln L(\mu, \Sigma; \xi) = -\frac{nK}{2} \ln(2\pi) - \frac{n}{2} \ln\det(\Sigma) - \frac{1}{2} \sum_{j=1}^{n} (x_j - \mu)^\top \Sigma^{-1} (x_j - \mu).$$
Note that the log-likelihood function is well-defined only if $\det(\Sigma)$ is strictly positive. This reflects the assumption made above that the true parameter $\Sigma$ is positive definite, which implies that the search for a maximum likelihood estimator of $\Sigma$ is restricted to the space of positive definite matrices.
For convenience, we can also define the log-likelihood in terms of the precision matrix $V = \Sigma^{-1}$:
$$\ell(\mu, V; \xi) = -\frac{nK}{2} \ln(2\pi) + \frac{n}{2} \ln\det(V) - \frac{1}{2} \sum_{j=1}^{n} (x_j - \mu)^\top V (x_j - \mu)$$
where we have used the property of the determinant
$$\det(V) = \det(\Sigma^{-1}) = \frac{1}{\det(\Sigma)}.$$
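As a quick numerical sanity check (not part of the original derivation), the log-likelihood can be evaluated with NumPy and compared against SciPy's multivariate normal density; the function below is a minimal sketch, with array shapes and test data chosen purely for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(x, mu, Sigma):
    """Evaluate ell(mu, Sigma; xi) for an (n, K) sample x."""
    n, K = x.shape
    diff = x - mu                                   # rows are x_j - mu
    quad = np.einsum('ij,jk,ik->', diff, np.linalg.inv(Sigma), diff)
    _, logdet = np.linalg.slogdet(Sigma)            # stable ln(det(Sigma))
    return -0.5 * n * K * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * quad

# Check against SciPy on illustrative data:
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
mu, Sigma = np.zeros(3), np.eye(3)
assert np.isclose(log_likelihood(x, mu, Sigma),
                  multivariate_normal(mu, Sigma).logpdf(x).sum())
```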
Before deriving the maximum likelihood estimators, we need to state some facts about matrices, their trace and their derivatives:
- if $a$ is a scalar, then it is equal to its trace: $a = \operatorname{tr}(a)$;
- if two matrices $A$ and $B$ are such that the products $AB$ and $BA$ are both well defined, then $\operatorname{tr}(AB) = \operatorname{tr}(BA)$;
- the trace is a linear operator: if $A$ and $B$ are two matrices and $a$ and $b$ are two scalars, then $\operatorname{tr}(aA + bB) = a \operatorname{tr}(A) + b \operatorname{tr}(B)$;
- the gradient of the trace of the product of two matrices $A$ and $B$ with respect to $A$ is $\nabla_A \operatorname{tr}(AB) = B^\top$;
- the gradient of the natural logarithm of the determinant of an invertible matrix $A$ is $\nabla_A \ln\det(A) = (A^\top)^{-1}$;
- if $x$ is a $K \times 1$ vector and $A$ is a $K \times K$ symmetric matrix, then $\nabla_x \, x^\top A x = 2Ax$.
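These facts are easy to verify numerically. The snippet below is a small sketch (using NumPy, with randomly generated matrices of our own choosing) that checks the cyclic property, linearity, and the log-determinant gradient by finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))

# tr(AB) = tr(BA)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# Linearity: tr(aA + bB) = a tr(A) + b tr(B)
a, b = 2.0, -3.0
assert np.isclose(np.trace(a * A + b * B), a * np.trace(A) + b * np.trace(B))

# Gradient of ln det(A) is (A^T)^{-1}: check the (0, 0) entry by finite differences
A = A @ A.T + 4 * np.eye(4)      # make A positive definite so ln det(A) is defined
eps = 1e-6
dA = np.zeros((4, 4)); dA[0, 0] = eps
num_grad = (np.log(np.linalg.det(A + dA)) - np.log(np.linalg.det(A))) / eps
assert np.isclose(num_grad, np.linalg.inv(A.T)[0, 0], atol=1e-4)
```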
The maximum likelihood estimators of the mean and of the covariance matrix are
$$\hat{\mu} = \frac{1}{n} \sum_{j=1}^{n} x_j, \qquad \hat{\Sigma} = \frac{1}{n} \sum_{j=1}^{n} (x_j - \hat{\mu})(x_j - \hat{\mu})^\top.$$
We need to solve the following maximization problem:
$$\max_{\mu, V} \, \ell(\mu, V; \xi).$$
The first order conditions for a maximum are
$$\nabla_\mu \, \ell(\mu, V; \xi) = 0, \qquad \nabla_V \, \ell(\mu, V; \xi) = 0.$$
By the facts stated above, the quadratic form in the log-likelihood can be rewritten as a trace, $(x_j - \mu)^\top V (x_j - \mu) = \operatorname{tr}\!\left( V (x_j - \mu)(x_j - \mu)^\top \right)$, which makes both gradients straightforward to compute. The gradient of the log-likelihood with respect to the mean vector is
$$\nabla_\mu \, \ell(\mu, V; \xi) = \sum_{j=1}^{n} V (x_j - \mu) = V \sum_{j=1}^{n} (x_j - \mu)$$
which, since $V$ is positive definite and hence invertible, is equal to zero only if
$$\sum_{j=1}^{n} (x_j - \mu) = 0.$$
Therefore, the first of the two first-order conditions implies
$$\hat{\mu} = \frac{1}{n} \sum_{j=1}^{n} x_j.$$
The gradient of the log-likelihood with respect to the precision matrix is
$$\nabla_V \, \ell(\mu, V; \xi) = \frac{n}{2} (V^\top)^{-1} - \frac{1}{2} \sum_{j=1}^{n} \left( (x_j - \mu)(x_j - \mu)^\top \right)^\top.$$
By transposing the whole expression and setting it equal to zero, we get
$$\frac{n}{2} V^{-1} - \frac{1}{2} \sum_{j=1}^{n} (x_j - \mu)(x_j - \mu)^\top = 0.$$
Thus, the system of first order conditions is solved by
$$\hat{\mu} = \frac{1}{n} \sum_{j=1}^{n} x_j, \qquad \hat{\Sigma} = \hat{V}^{-1} = \frac{1}{n} \sum_{j=1}^{n} (x_j - \hat{\mu})(x_j - \hat{\mu})^\top.$$
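In code, the two estimators are one line each; the sketch below (NumPy, with illustrative data of our own) computes them and confirms that the covariance estimator matches np.cov with the biased $1/n$ normalization:

```python
import numpy as np

def mvn_mle(x):
    """Return (mu_hat, Sigma_hat) for an (n, K) sample x."""
    n, _ = x.shape
    mu_hat = x.mean(axis=0)            # (1/n) sum_j x_j
    diff = x - mu_hat
    Sigma_hat = diff.T @ diff / n      # (1/n) sum_j (x_j - mu_hat)(x_j - mu_hat)^T
    return mu_hat, Sigma_hat

rng = np.random.default_rng(2)
x = rng.multivariate_normal([0.0, 1.0], [[2.0, 0.5], [0.5, 1.0]], size=1000)
mu_hat, Sigma_hat = mvn_mle(x)

# np.cov with bias=True uses the same 1/n normalization as the MLE
assert np.allclose(Sigma_hat, np.cov(x, rowvar=False, bias=True))
```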
We are now going to give a formula for the information matrix of the multivariate normal distribution, which will be used to derive the asymptotic covariance matrix of the maximum likelihood estimators.
Denote by $\theta$ the $K(K+1) \times 1$ column vector of all parameters:
$$\theta = \begin{bmatrix} \mu \\ \operatorname{vec}(\Sigma) \end{bmatrix}$$
where $\operatorname{vec}(\Sigma)$ converts the matrix $\Sigma$ into a $K^2 \times 1$ column vector whose entries are taken from the first column of $\Sigma$, then from the second, and so on. The log-likelihood of one observation from the sample can be written as
$$\ell(\theta; x_j) = -\frac{K}{2} \ln(2\pi) - \frac{1}{2} \ln\det(\Sigma) - \frac{1}{2} (x_j - \mu)^\top \Sigma^{-1} (x_j - \mu).$$
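In NumPy, the $\operatorname{vec}$ operator used in the definition of $\theta$ corresponds to column-major flattening; a tiny sketch:

```python
import numpy as np

Sigma = np.array([[1.0, 3.0],
                  [3.0, 2.0]])
print(Sigma.flatten(order='F'))   # vec(Sigma): first column, then the second -> [1. 3. 3. 2.]
```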
The information matrix is the covariance matrix of the score of one observation:
$$I(\theta) = \operatorname{E}\!\left[ \nabla_\theta \, \ell(\theta; x_j) \, \nabla_\theta \, \ell(\theta; x_j)^\top \right].$$
Define the $K \times 1$ vector
$$\frac{\partial \mu}{\partial \theta_k}.$$
Thus:
- if $\theta_k$ is an element of $\mu$, say the $i$-th, then the $i$-th entry of the vector $\partial \mu / \partial \theta_k$ is equal to $1$ and all the other entries are equal to $0$;
- if $\theta_k$ is not an element of $\mu$, then all the entries of the vector $\partial \mu / \partial \theta_k$ are equal to $0$.
Define the $K \times K$ matrix
$$\frac{\partial \Sigma}{\partial \theta_k}.$$
Note that:
- if $\theta_k$ is an element of $\Sigma$, say the $(i, l)$-th, then the $(i, l)$-th entry of the matrix $\partial \Sigma / \partial \theta_k$ is equal to $1$ and all the other entries are equal to $0$;
- if $\theta_k$ is not an element of $\Sigma$, then all the entries of the matrix $\partial \Sigma / \partial \theta_k$ are equal to $0$.
It can be proved (see, e.g., Pistone and Malagò 2015) that the $(k, l)$-th element of the information matrix $I(\theta)$ is
$$I_{kl}(\theta) = \frac{\partial \mu^\top}{\partial \theta_k} \, \Sigma^{-1} \, \frac{\partial \mu}{\partial \theta_l} + \frac{1}{2} \operatorname{tr}\!\left( \Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_k} \, \Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_l} \right).$$
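Translating this formula directly into code (a sketch with NumPy; the dimension and parameter values are illustrative assumptions) also makes the block structure of $I(\theta)$ visible: the mean block equals $\Sigma^{-1}$ and the cross blocks between $\mu$ and $\operatorname{vec}(\Sigma)$ vanish:

```python
import numpy as np

def information_matrix(mu, Sigma):
    """Build I(theta) elementwise, with theta = (mu, vec(Sigma))."""
    K = mu.shape[0]
    P = K + K * K
    Sinv = np.linalg.inv(Sigma)

    def dmu(k):                      # partial mu / partial theta_k
        v = np.zeros(K)
        if k < K:
            v[k] = 1.0
        return v

    def dSigma(k):                   # partial Sigma / partial theta_k
        M = np.zeros((K, K))
        if k >= K:
            m = k - K
            M[m % K, m // K] = 1.0   # column-major (vec) ordering
        return M

    I = np.empty((P, P))
    for k in range(P):
        for l in range(P):
            I[k, l] = (dmu(k) @ Sinv @ dmu(l)
                       + 0.5 * np.trace(Sinv @ dSigma(k) @ Sinv @ dSigma(l)))
    return I

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
I = information_matrix(np.zeros(2), Sigma)
assert np.allclose(I[:2, :2], np.linalg.inv(Sigma))  # mean block is Sigma^{-1}
assert np.allclose(I[:2, 2:], 0.0)                   # cross blocks vanish
```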
The vector
$$\hat{\theta}_n = \begin{bmatrix} \hat{\mu} \\ \operatorname{vec}(\hat{\Sigma}) \end{bmatrix}$$
is asymptotically normal with asymptotic mean equal to the true parameter vector $\theta_0$ and asymptotic covariance matrix equal to
$$V = I(\theta_0)^{-1}.$$
In more formal terms, $\sqrt{n} \left( \hat{\theta}_n - \theta_0 \right)$ converges in distribution to a multivariate normal distribution with zero mean and covariance matrix $I(\theta_0)^{-1}$. In other words, the distribution of the vector $\hat{\theta}_n$ can be approximated by a multivariate normal distribution with mean $\theta_0$ and covariance matrix
$$\frac{1}{n} I(\theta_0)^{-1}.$$
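A quick Monte Carlo sketch of this approximation (with our own illustrative parameters), restricted to the mean block: the corresponding block of $I(\theta_0)$ is $\Sigma_0^{-1}$, so the covariance of $\hat{\mu}$ should be approximately $\Sigma_0 / n$:

```python
import numpy as np

rng = np.random.default_rng(3)
mu0 = np.array([0.0, 1.0])
Sigma0 = np.array([[2.0, 0.5], [0.5, 1.0]])
n, reps = 500, 2000

# Repeatedly draw samples of size n and record the MLE of the mean
mu_hats = np.array([rng.multivariate_normal(mu0, Sigma0, size=n).mean(axis=0)
                    for _ in range(reps)])

# Rescaled by n, the empirical covariance of mu_hat should be close to Sigma0
print(np.cov(mu_hats, rowvar=False) * n)
print(Sigma0)
```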
Pistone, G. and Malagò, L. (2015) "Information Geometry of the Gaussian Distribution in View of Stochastic Optimization", Proceedings of the 2015 ACM Conference on Foundations of Genetic Algorithms XIII, 150-162.