A linear regression model is a conditional model in which the output variable is a linear function of the input variables and of an unobservable error term that adds noise to the relationship between inputs and outputs.
This lecture introduces the main mathematical assumptions, the matrix notation and the terminology used in linear regression models.
We assume that the statistician observes a sample of realizations $(y_i, x_i)$ for $i = 1, \ldots, N$, where:

- $y_i$ is a scalar output variable, also called dependent variable or regressand;
- $x_i$ is a $1 \times K$ vector of input variables, also called independent variables or regressors;
- $N$ is the sample size.

Inputs and outputs are assumed to have a linear relationship:
$$y_i = x_i \beta + \varepsilon_i,$$
where:

- $\beta$ is a $K \times 1$ vector of constants, called regression coefficients;
- $\varepsilon_i$ is an unobservable error term which encompasses the sources of variability in $y_i$ that are not included in the vector of inputs $x_i$; for example, $\varepsilon_i$ could include measurement errors and input variables that are not observed by the statistician.

The linear relationship is assumed to hold for each $i = 1, \ldots, N$, with the same $\beta$.
Let us consider an example. Suppose that we have a sample of individuals for which weight, height and age are observed. We want to set up a linear regression model to predict weight based on height and age. Then, we could postulate that
$$y_i = \beta_1 + \beta_2 a_i + \beta_3 h_i + \varepsilon_i,$$
where:

- $y_i$, $a_i$ and $h_i$ denote the weight, age and height of the $i$-th individual in the sample, respectively;
- $\beta_1$, $\beta_2$ and $\beta_3$ are regression coefficients;
- $\varepsilon_i$ is an error term.

The regression equation can be written in vector notation as
$$y_i = x_i \beta + \varepsilon_i$$
by defining
$$x_i = \begin{bmatrix} 1 & a_i & h_i \end{bmatrix}
\qquad \text{and} \qquad
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix},$$
where $x_i$ is a $1 \times 3$ vector and $\beta$ is a $3 \times 1$ vector.
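To make the vector notation concrete, here is a minimal numerical sketch for a single observation; the coefficient values, the individual's data and the error term are made up purely for illustration:

```python
import numpy as np

# Hypothetical coefficient values, chosen only for illustration
beta = np.array([[10.0],   # intercept
                 [0.2],    # coefficient on age
                 [0.35]])  # coefficient on height

# Regressor vector for one individual: constant, age (years), height (cm)
x_i = np.array([[1.0, 30.0, 175.0]])   # 1 x 3 row vector

eps_i = 1.5                            # unobservable error term (made up)

# Linear relationship y_i = x_i beta + eps_i
y_i = (x_i @ beta).item() + eps_i
print(y_i)   # the individual's weight implied by the model
```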
Denote by $y$ the $N \times 1$ vector of outputs
$$y = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix},$$
by $X$ the $N \times K$ matrix of inputs
$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix},$$
and by $\varepsilon$ the $N \times 1$ vector of error terms
$$\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_N \end{bmatrix}.$$
Then, the linear relationship can be expressed in matrix form as
$$y = X \beta + \varepsilon.$$
The matrix $X$ is called the design matrix. The vector of regressors $x_i$ usually contains a constant variable equal to $1$. Without loss of generality, we can assume that the constant is the first entry of $x_i$. Therefore, the first column of the design matrix $X$ is a column of $1$s. The regression coefficient corresponding to the constant variable is called the intercept.
Example
Suppose that the number of regressors is $K$ and that the first regressor is a constant equal to $1$. Then, we have that
$$y_i = \beta_1 + \beta_2 x_{i,2} + \ldots + \beta_K x_{i,K} + \varepsilon_i.$$
The coefficient $\beta_1$ is the intercept of the regression.
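As an illustration of how a design matrix with a leading column of ones might be assembled from raw regressors, here is a short sketch; the age and height values are made up for the example:

```python
import numpy as np

# Made-up raw regressors for N = 5 individuals: age (years) and height (cm)
age    = np.array([23.0, 35.0, 41.0, 29.0, 52.0])
height = np.array([170.0, 182.0, 165.0, 175.0, 168.0])
N = age.shape[0]

# Design matrix X (N x K): the first column is the constant,
# so the corresponding coefficient beta_1 is the intercept
X = np.column_stack([np.ones(N), age, height])

print(X.shape)   # (5, 3)
print(X[:, 0])   # column of 1s
```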
When an intercept is included in the regression, we can assume without loss of generality that the expected value of the error term is equal to $0$. Consider, for instance, the previous example. If we had
$$\mathbb{E}[\varepsilon_i] = \mu \neq 0,$$
then we could write
$$y_i = (\beta_1 + \mu) + \beta_2 x_{i,2} + \ldots + \beta_K x_{i,K} + (\varepsilon_i - \mu).$$
We could then define a new regression equation
$$y_i = \gamma_1 + \beta_2 x_{i,2} + \ldots + \beta_K x_{i,K} + u_i,$$
where
$$\gamma_1 = \beta_1 + \mu \qquad \text{and} \qquad u_i = \varepsilon_i - \mu.$$
The expected value of the new error would be zero because
$$\mathbb{E}[u_i] = \mathbb{E}[\varepsilon_i] - \mu = \mu - \mu = 0.$$
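A small numerical check of this reparametrization (with a single non-constant regressor and made-up coefficient values) shows that absorbing the error mean into the intercept leaves the outputs unchanged while the new error has mean approximately zero:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 100_000
x2 = rng.uniform(20.0, 60.0, size=N)     # a single non-constant regressor (made up)
beta1, beta2, mu = 5.0, 0.3, 2.0         # illustrative coefficients and error mean

eps = mu + rng.normal(0.0, 1.0, size=N)  # errors with non-zero mean mu

# Original parametrization
y = beta1 + beta2 * x2 + eps

# Reparametrization: the intercept absorbs mu, the new error has zero mean
gamma1 = beta1 + mu
u = eps - mu
y_alt = gamma1 + beta2 * x2 + u

print(np.allclose(y, y_alt))   # True: the two representations are identical
print(u.mean())                # approximately 0
```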
Usually, the vector of regression coefficients $\beta$ is unknown and needs to be estimated. The most commonly used estimator of $\beta$ is the Ordinary Least Squares (OLS) estimator. The OLS estimator is not only computationally convenient, but it enjoys good statistical properties under different sets of mathematical assumptions on the joint distribution of the regressors and the error terms.
The following is a formal definition of the OLS estimator.
Definition
An estimator $\widehat{\beta}$ is an OLS estimator of $\beta$ if and only if $\widehat{\beta}$ satisfies
$$\widehat{\beta} \in \arg\min_{b \in \mathbb{R}^K} \sum_{i=1}^N (y_i - x_i b)^2.$$
The OLS estimator is the vector of estimated regression coefficients that minimizes the sum of the squared distances between the predicted values $x_i \widehat{\beta}$ and the observed values $y_i$. In other words, the OLS estimator makes the predicted values as close as possible to the actual output values.

A residual
$$e_i = y_i - x_i \widehat{\beta}$$
is the difference between the observed output $y_i$ and its predicted value $x_i \widehat{\beta}$. Thus, the OLS estimator is the estimator that minimizes the sum of squared residuals.
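As an illustration of the definition, the sum of squared residuals can be minimized numerically. The following is a minimal sketch on synthetic data (the true coefficients, sample size and noise level are arbitrary choices); it mirrors the minimization above rather than showing how OLS is computed in practice:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)

# Synthetic data: N observations, K = 3 regressors (constant, age, height)
N = 200
X = np.column_stack([np.ones(N),
                     rng.uniform(20.0, 60.0, N),      # age
                     rng.uniform(150.0, 200.0, N)])   # height
beta_true = np.array([10.0, 0.2, 0.35])               # made-up true coefficients
y = X @ beta_true + rng.normal(0.0, 1.0, N)

def ssr(b):
    """Sum of squared residuals for a candidate coefficient vector b."""
    residuals = y - X @ b
    return residuals @ residuals

def ssr_gradient(b):
    """Gradient of the sum of squared residuals with respect to b."""
    return -2.0 * X.T @ (y - X @ b)

# Minimize the OLS criterion numerically
result = minimize(ssr, x0=np.zeros(X.shape[1]), jac=ssr_gradient)
print(result.x)   # numerical OLS estimate, close to beta_true
```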
If the design matrix has full rank, the OLS minimization problem has a solution that is both unique and explicit.
Proposition
If the design matrix $X$ has full rank, then the OLS estimator is
$$\widehat{\beta} = (X^\top X)^{-1} X^\top y.$$

Proof
First of all, observe that the sum of squared residuals, henceforth indicated by $SSR(b)$, can be written in matrix form as follows:
$$SSR(b) = \sum_{i=1}^N (y_i - x_i b)^2 = (y - Xb)^\top (y - Xb).$$
The first order condition for a minimum is that the gradient of $SSR(b)$ with respect to $b$ should be equal to zero:
$$\frac{\partial SSR(b)}{\partial b} = -2 X^\top (y - Xb) = 0,$$
that is,
$$-2 X^\top y + 2 X^\top X b = 0,$$
or
$$X^\top X b = X^\top y.$$
Now, if $X$ has full rank (i.e., rank equal to $K$), then the matrix $X^\top X$ is invertible. As a consequence, the first order condition is satisfied by
$$\widehat{\beta} = (X^\top X)^{-1} X^\top y.$$
We now need to check that this is indeed a global minimum. Note that the Hessian matrix, that is, the matrix of second derivatives of $SSR(b)$, is
$$\frac{\partial^2 SSR(b)}{\partial b \, \partial b^\top} = 2 X^\top X.$$
But $2 X^\top X$ is a positive definite matrix because, for any $b \neq 0$, we have
$$b^\top X^\top X b = (Xb)^\top (Xb) = \lVert Xb \rVert^2 > 0,$$
where the last inequality follows from the fact that $X$ has full rank (and, as a consequence, $Xb$ cannot be equal to $0$ for any $b \neq 0$).
Thus, $SSR(b)$ is strictly convex in $b$, which implies that $\widehat{\beta} = (X^\top X)^{-1} X^\top y$ is indeed a global minimum.
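Continuing the synthetic example from the previous snippet, the closed-form expression can be evaluated directly; comparing it against a standard least-squares routine is a quick sanity check (the data are again made up):

```python
import numpy as np

rng = np.random.default_rng(42)

# Same kind of synthetic data as in the previous snippet (made-up values)
N = 200
X = np.column_stack([np.ones(N),
                     rng.uniform(20.0, 60.0, N),
                     rng.uniform(150.0, 200.0, N)])
beta_true = np.array([10.0, 0.2, 0.35])
y = X @ beta_true + rng.normal(0.0, 1.0, N)

# Closed-form OLS estimator (X'X)^{-1} X'y, computed via a linear solve
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Sanity check against numpy's least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))   # True
```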
The linearity assumption
$$y_i = x_i \beta + \varepsilon_i$$
is not per se sufficient to determine the mathematical properties of the OLS estimator of $\beta$ (or of any other estimator). In order to be able to establish any property (e.g., unbiasedness, consistency and asymptotic normality), we need to make further assumptions about the joint distribution of the regressors $X$ and the error terms $\varepsilon$.
These further assumptions, together with the linearity assumption, form a linear regression model.
The next section provides an example.
A popular linear regression model is the so-called Normal Linear Regression Model (NLRM). In the NLRM it is assumed that:

- the vector of errors $\varepsilon$ has a multivariate normal distribution conditional on the design matrix $X$;
- the covariance matrix of $\varepsilon$ is diagonal and all the diagonal entries are equal (in other words, the entries of $\varepsilon$ are mutually independent and have constant variance).
Under these hypotheses, the OLS estimator has a multivariate normal distribution. Furthermore, the distributions of several test statistics can be derived analytically.
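To illustrate the normality of the OLS estimator under these hypotheses, here is a small Monte Carlo sketch; the sample size, number of replications, coefficients and error variance are arbitrary choices made for the illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

N, R = 100, 5_000                          # sample size and number of replications
X = np.column_stack([np.ones(N),
                     rng.uniform(0.0, 10.0, N)])   # fixed design matrix
beta = np.array([1.0, 2.0])                # made-up true coefficients
sigma = 1.5                                # constant standard deviation of the errors

estimates = np.empty((R, 2))
for r in range(R):
    eps = rng.normal(0.0, sigma, N)        # i.i.d. normal errors (NLRM assumptions)
    y = X @ beta + eps
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

# Under the NLRM, the OLS estimator is N(beta, sigma^2 (X'X)^{-1}) conditional on X
theoretical_cov = sigma ** 2 * np.linalg.inv(X.T @ X)
print(estimates.mean(axis=0))   # close to beta
print(np.cov(estimates.T))      # close to theoretical_cov
```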
More details about the NLRM can be found in the lecture on the Normal Linear Regression Model.
The NLRM has several appealing properties, but its assumptions are unrealistic in many practical cases of interest.
For this reason, we often prefer to make weaker assumptions, under which it is possible to prove that the OLS estimators are consistent and asymptotically normal.
These assumptions are discussed in the lecture on the properties of the OLS estimator.