This lecture shows how to perform maximum likelihood estimation of the parameters of a linear regression model whose error terms are normally distributed conditional on the regressors.
In order to fully understand the material presented here, it might be useful to revise the introductions to maximum likelihood estimation (MLE) and to the Normal Linear Regression Model.
The objective is to estimate the parameters of the linear regression model
$$y_i = x_i\beta + \varepsilon_i$$
where $y_i$ is the dependent variable, $x_i$ is a $1\times K$ vector of regressors, $\beta$ is the $K\times 1$ vector of regression coefficients to be estimated, and $\varepsilon_i$ is an unobservable error term.
The sample is made up of $N$ IID observations $(y_i, x_i)$.
The regression equations can be written in matrix form as
$$y = X\beta + \varepsilon$$
where the $N\times 1$ vector of observations of the dependent variable is denoted by $y$, the $N\times K$ matrix of regressors is denoted by $X$, and the $N\times 1$ vector of error terms is denoted by $\varepsilon$.
We assume that the vector of errors $\varepsilon$ has a multivariate normal distribution conditional on $X$, with mean equal to $0$ and covariance matrix equal to
$$\operatorname{Var}[\varepsilon \mid X] = \sigma^2 I$$
where $I$ is the $N\times N$ identity matrix and $\sigma^2$ is the second parameter to be estimated.
Furthermore, it is assumed that the matrix of regressors $X$ has full rank.
The assumption that the covariance matrix of $\varepsilon$ is diagonal implies that the entries of $\varepsilon$ are mutually independent (i.e., $\varepsilon_i$ is independent of $\varepsilon_j$ for $i \neq j$). Moreover, they all have a normal distribution with mean $0$ and variance $\sigma^2$.
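To make these assumptions concrete, here is a minimal Python sketch (using NumPy) that simulates one sample from the model. The sample size $N = 200$, the number of regressors $K = 3$, the coefficient vector, and the error variance are arbitrary illustrative choices, not values from the lecture.

```python
import numpy as np

# Simulate one sample from the normal linear regression model.
rng = np.random.default_rng(0)
N, K = 200, 3

X = rng.normal(size=(N, K))                      # N x K matrix of regressors
beta = np.array([1.0, -2.0, 0.5])                # true regression coefficients
sigma2 = 4.0                                     # true error variance

eps = rng.normal(0.0, np.sqrt(sigma2), size=N)   # IID N(0, sigma^2) errors
y = X @ beta + eps                               # y = X beta + eps
```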
By the properties of linear transformations of normal random variables, the dependent variable $y_i$ is conditionally normal, with mean $x_i\beta$ and variance $\sigma^2$. Therefore, its conditional probability density function is
$$f(y_i \mid x_i; \beta, \sigma^2) = \left(2\pi\sigma^2\right)^{-1/2}\exp\left(-\frac{(y_i - x_i\beta)^2}{2\sigma^2}\right)$$
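As a quick consistency check on the simulated example, the density of a single observation can be computed both from the formula above and with SciPy's normal distribution (the index `i = 0` is an arbitrary choice):

```python
from scipy.stats import norm

# Conditional density of one observation, written two equivalent ways.
i = 0
mean_i = X[i] @ beta
manual = (2 * np.pi * sigma2) ** -0.5 * np.exp(-((y[i] - mean_i) ** 2) / (2 * sigma2))
library = norm.pdf(y[i], loc=mean_i, scale=np.sqrt(sigma2))
assert np.isclose(manual, library)
```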
The likelihood function of a single observation is equal to its conditional density:
$$L(\beta, \sigma^2; y_i, x_i) = \left(2\pi\sigma^2\right)^{-1/2}\exp\left(-\frac{(y_i - x_i\beta)^2}{2\sigma^2}\right)$$
Since the observations from the sample are independent, the likelihood of the sample is equal to the product of the likelihoods of the single observations:
$$L(\beta, \sigma^2; y, X) = \prod_{i=1}^{N}\left(2\pi\sigma^2\right)^{-1/2}\exp\left(-\frac{(y_i - x_i\beta)^2}{2\sigma^2}\right) = \left(2\pi\sigma^2\right)^{-N/2}\exp\left(-\frac{1}{2\sigma^2}(y - X\beta)^\top(y - X\beta)\right)$$
The log-likelihood function is obtained by taking the natural logarithm of the likelihood function:
$$\ell(\beta, \sigma^2; y, X) = \ln L(\beta, \sigma^2; y, X) = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)^\top(y - X\beta)$$
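This expression translates directly into code. The sketch below (the function name `log_likelihood` is our own) also cross-checks it against the sum of per-observation normal log-densities:

```python
def log_likelihood(beta, sigma2, y, X):
    """Log-likelihood of the normal linear regression model."""
    N = len(y)
    resid = y - X @ beta
    return (- N / 2 * np.log(2 * np.pi)
            - N / 2 * np.log(sigma2)
            - resid @ resid / (2 * sigma2))

# Same value as summing the log-density of each observation.
ll = log_likelihood(beta, sigma2, y, X)
assert np.isclose(ll, norm.logpdf(y, loc=X @ beta, scale=np.sqrt(sigma2)).sum())
```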
The maximum likelihood estimators of the regression coefficients and of the variance of the error terms are
$$\widehat{\beta} = (X^\top X)^{-1}X^\top y, \qquad \widehat{\sigma}^2 = \frac{1}{N}\left(y - X\widehat{\beta}\right)^\top\left(y - X\widehat{\beta}\right)$$
The estimators solve the following maximization problem:
$$\left(\widehat{\beta}, \widehat{\sigma}^2\right) = \operatorname*{arg\,max}_{\beta,\,\sigma^2}\;\ell(\beta, \sigma^2; y, X)$$
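Before deriving the closed-form solution, one can also solve this maximization problem numerically. A minimal sketch using SciPy's BFGS optimizer, parametrizing the variance through its logarithm so that it stays positive (the starting point and method are arbitrary choices):

```python
from scipy.optimize import minimize

# Maximize the log-likelihood numerically over (beta, log(sigma2)).
def neg_ll(theta):
    b, log_s2 = theta[:K], theta[K]
    return -log_likelihood(b, np.exp(log_s2), y, X)

res = minimize(neg_ll, x0=np.zeros(K + 1), method="BFGS")
beta_hat_num, sigma2_hat_num = res.x[:K], np.exp(res.x[K])
```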
The first-order conditions for a maximum are
$$\nabla_{\beta}\,\ell(\beta, \sigma^2; y, X) = 0, \qquad \frac{\partial}{\partial\sigma^2}\,\ell(\beta, \sigma^2; y, X) = 0$$
where $\nabla_{\beta}$ indicates the gradient calculated with respect to $\beta$, that is, the vector of the partial derivatives of the log-likelihood with respect to the entries of $\beta$.
The gradient is
$$\nabla_{\beta}\,\ell(\beta, \sigma^2; y, X) = \frac{1}{\sigma^2}\left(X^\top y - X^\top X\beta\right)$$
which is equal to zero only if
$$X^\top y = X^\top X\beta$$
Therefore, the first of the two equations is satisfied if
$$\widehat{\beta} = (X^\top X)^{-1}X^\top y$$
where we have used the assumption that $X$ has full rank and, as a consequence, $X^\top X$ is invertible. The partial derivative of the log-likelihood with respect to the variance is
$$\frac{\partial}{\partial\sigma^2}\,\ell(\beta, \sigma^2; y, X) = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}(y - X\beta)^\top(y - X\beta)$$
which, if we assume $\sigma^2 \neq 0$, is equal to zero only if
$$\sigma^2 = \frac{1}{N}(y - X\beta)^\top(y - X\beta)$$
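These derivatives are easy to verify numerically. The sketch below compares them with forward finite differences at an arbitrary evaluation point (`beta0` and `s2` are arbitrary, not estimates; tolerances are loose because the numerical gradient is only an approximation):

```python
from scipy.optimize import approx_fprime

# Check the analytic gradient with respect to beta.
beta0, s2 = np.zeros(K), 1.5
grad_beta = (X.T @ y - X.T @ X @ beta0) / s2
num_grad = approx_fprime(beta0, lambda b: log_likelihood(b, s2, y, X), 1e-6)
assert np.allclose(grad_beta, num_grad, rtol=1e-3)

# Check the analytic derivative with respect to sigma^2.
resid = y - X @ beta0
d_s2 = -N / (2 * s2) + resid @ resid / (2 * s2 ** 2)
num_d = approx_fprime(np.array([s2]), lambda s: log_likelihood(beta0, s[0], y, X), 1e-7)[0]
assert np.isclose(d_s2, num_d, rtol=1e-3)
```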
Thus, the system of first-order conditions is solved by
$$\widehat{\beta} = (X^\top X)^{-1}X^\top y, \qquad \widehat{\sigma}^2 = \frac{1}{N}\left(y - X\widehat{\beta}\right)^\top\left(y - X\widehat{\beta}\right)$$
Note that $\widehat{\beta}$ does not depend on $\widehat{\sigma}^2$, so that this is an explicit solution.
Thus, the maximum likelihood estimators are:

- for the regression coefficients, the usual OLS estimator;
- for the variance of the error terms, the unadjusted sample variance of the residuals $\widehat{\varepsilon}_i = y_i - x_i\widehat{\beta}$ (see the sketch below).
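Continuing the simulated example, a minimal sketch of the closed-form estimates, compared against the numerical maximizer obtained earlier:

```python
# Closed-form ML estimates: OLS coefficients and the unadjusted sample
# variance of the residuals (division by N rather than N - K).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / N

# They agree with the numerical maximizer found earlier.
assert np.allclose(beta_hat, beta_hat_num, atol=1e-3)
assert np.isclose(sigma2_hat, sigma2_hat_num, rtol=1e-3)
```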
The vector of parameter estimators
$$\widehat{\theta} = \begin{bmatrix}\widehat{\beta} \\ \widehat{\sigma}^2\end{bmatrix}$$
is asymptotically normal with asymptotic mean equal to
$$\begin{bmatrix}\beta \\ \sigma^2\end{bmatrix}$$
and asymptotic covariance matrix equal to
$$V = \begin{bmatrix}\sigma^2\left(\operatorname{E}\left[x_i^\top x_i\right]\right)^{-1} & 0 \\ 0 & 2\sigma^4\end{bmatrix}$$
The derivation is based on the score vector, that is, the gradient of the log-likelihood of a single observation
$$\ell(\beta, \sigma^2; y_i, x_i) = -\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln(\sigma^2) - \frac{(y_i - x_i\beta)^2}{2\sigma^2}$$
The first $K$ entries of the score vector are
$$\nabla_{\beta}\,\ell(\beta, \sigma^2; y_i, x_i) = \frac{1}{\sigma^2}\,x_i^\top(y_i - x_i\beta)$$
The $(K+1)$-th entry of the score vector is
$$\frac{\partial}{\partial\sigma^2}\,\ell(\beta, \sigma^2; y_i, x_i) = -\frac{1}{2\sigma^2} + \frac{(y_i - x_i\beta)^2}{2\sigma^4}$$
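Since the score has zero expectation at the true parameter values (a fact used in the information-equality argument below), the sample average of the per-observation scores should be close to zero in the simulation. A quick check:

```python
# Average the per-observation scores over the simulated sample.
resid = y - X @ beta
score_beta = X * resid[:, None] / sigma2          # N x K, one score per row
score_s2 = -1 / (2 * sigma2) + resid ** 2 / (2 * sigma2 ** 2)
print(score_beta.mean(axis=0), score_s2.mean())   # both approximately zero
```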
The Hessian, that is, the matrix of second derivatives of the log-likelihood of a single observation, can be written as a block matrix
$$H = \begin{bmatrix}\nabla_{\beta\beta}\,\ell & \nabla_{\beta\sigma^2}\,\ell \\ \left(\nabla_{\beta\sigma^2}\,\ell\right)^\top & \dfrac{\partial^2 \ell}{\partial(\sigma^2)^2}\end{bmatrix}$$
Let us compute the blocks:
$$\nabla_{\beta\beta}\,\ell = -\frac{1}{\sigma^2}\,x_i^\top x_i$$
and
$$\nabla_{\beta\sigma^2}\,\ell = -\frac{1}{\sigma^4}\,x_i^\top(y_i - x_i\beta)$$
Finally,
$$\frac{\partial^2 \ell}{\partial(\sigma^2)^2} = \frac{1}{2\sigma^4} - \frac{(y_i - x_i\beta)^2}{\sigma^6}$$
Therefore, the Hessian is
$$H = \begin{bmatrix}-\dfrac{1}{\sigma^2}\,x_i^\top x_i & -\dfrac{1}{\sigma^4}\,x_i^\top(y_i - x_i\beta) \\ -\dfrac{1}{\sigma^4}\,(y_i - x_i\beta)\,x_i & \dfrac{1}{2\sigma^4} - \dfrac{(y_i - x_i\beta)^2}{\sigma^6}\end{bmatrix}$$
By the information equality, we have that
$$V = \left(-\operatorname{E}[H]\right)^{-1}$$
But
$$\operatorname{E}\left[y_i - x_i\beta \mid x_i\right] = 0, \qquad \operatorname{E}\left[(y_i - x_i\beta)^2 \mid x_i\right] = \sigma^2$$
and, by the Law of Iterated Expectations,
$$\operatorname{E}\left[x_i^\top(y_i - x_i\beta)\right] = 0, \qquad \operatorname{E}\left[(y_i - x_i\beta)^2\right] = \sigma^2$$
Thus,
$$-\operatorname{E}[H] = \begin{bmatrix}\dfrac{1}{\sigma^2}\,\operatorname{E}\left[x_i^\top x_i\right] & 0 \\ 0 & \dfrac{1}{2\sigma^4}\end{bmatrix}$$
As a consequence, the asymptotic covariance matrix is
$$V = \begin{bmatrix}\sigma^2\left(\operatorname{E}\left[x_i^\top x_i\right]\right)^{-1} & 0 \\ 0 & 2\sigma^4\end{bmatrix}$$
This means that the probability distribution of the vector of parameter estimates
$$\begin{bmatrix}\widehat{\beta} \\ \widehat{\sigma}^2\end{bmatrix}$$
can be approximated by a multivariate normal distribution with mean
$$\begin{bmatrix}\beta \\ \sigma^2\end{bmatrix}$$
and covariance matrix
$$\frac{1}{N}\begin{bmatrix}\sigma^2\left(\operatorname{E}\left[x_i^\top x_i\right]\right)^{-1} & 0 \\ 0 & 2\sigma^4\end{bmatrix}$$
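In practice the expectations and parameters above are unknown, so the covariance matrix is estimated with sample analogues. A minimal sketch of the standard plug-in choice, replacing $\operatorname{E}[x_i^\top x_i]$ by $X^\top X / N$ and $\sigma^2$ by $\widehat{\sigma}^2$:

```python
# Plug-in estimate of the asymptotic covariance and standard errors.
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)   # = (1/N) sigma2 (X'X / N)^{-1}
se_beta = np.sqrt(np.diag(cov_beta))             # standard errors of beta_hat
se_sigma2 = np.sqrt(2 * sigma2_hat ** 2 / N)     # standard error of sigma2_hat
print(se_beta, se_sigma2)
```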
StatLect has several pages on maximum likelihood estimation. Learn how to derive the estimators of the parameters of the following distributions and models.
Distribution or model | Type | Solution |
---|---|---|
Exponential distribution | Univariate distribution | Analytical |
Normal distribution | Univariate distribution | Analytical |
Poisson distribution | Univariate distribution | Analytical |
T distribution | Univariate distribution | Numerical |
Multivariate normal distribution | Multivariate distribution | Analytical |
Logistic classification model | Classification model | Numerical |
Probit classification model | Classification model | Numerical |
Please cite as:
Taboga, Marco (2021). "Linear regression - Maximum Likelihood Estimation", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/linear-regression-maximum-likelihood.