This lecture introduces conditional probability models, a class of statistical models in which sample data are divided into input and output data and the relation between the two kinds of data is studied by modelling the conditional probability distribution of the outputs given the inputs. This is in contrast to unconditional models (sometimes also called generative models), in which the data are studied by modelling the joint distribution of inputs and outputs.
Before introducing conditional models, let us review the main elements of a statistical model (see the lecture entitled Statistical inference):
- there is a sample $\xi$, which can be regarded as a realization of a random vector $\Xi$ (for example, $\xi$ could be a vector collecting the realizations of some independent random variables);
- the joint distribution function of the sample, denoted by $F_\Xi(\xi)$, is not known exactly;
- the sample $\xi$ is used to infer some characteristics of $F_\Xi(\xi)$;
- a model for $F_\Xi(\xi)$ is used to make inferences, where a model is simply a set of joint distribution functions to which $F_\Xi(\xi)$ is assumed to belong.
In a conditional model, the sample $\xi$ is partitioned into inputs and outputs:
$$\xi = (y, x)$$
where $y$ denotes the vector of outputs and $x$ the vector of inputs. The object of interest is the conditional distribution function of the outputs given the inputs
$$F(y \mid x)$$
and specifying a conditional model means specifying a set of conditional distribution functions to which $F(y \mid x)$ is assumed to belong.
In other words, in a conditional model the problem of model specification is simplified by narrowing the statistician's focus to the conditional distribution of the outputs and ignoring the distribution of the inputs. This can be seen, for example, in the case in which both inputs and outputs are continuous random variables. In such a case, specifying an unconditional model is equivalent to specifying a joint probability density function $f(y, x)$ for the inputs and the outputs. But a joint density can be written as the product of a marginal and a conditional density:
$$f(y, x) = f(x) \, f(y \mid x)$$
So, in an unconditional model we explicitly or implicitly specify both the marginal probability density function $f(x)$ and the conditional probability density function $f(y \mid x)$. On the other hand, in a conditional model we specify only the conditional $f(y \mid x)$ and leave the marginal $f(x)$ unspecified.
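To make the factorization concrete, here is a minimal numerical check in Python. It uses a bivariate normal as the joint distribution of one input and one output; the distribution, its parameter values and the evaluation point are all assumptions made only for this illustration:

```python
# A minimal numerical check of the factorization f(y, x) = f(x) * f(y | x),
# using a bivariate normal as an illustrative joint distribution.
import numpy as np
from scipy.stats import multivariate_normal, norm

mu_x, mu_y = 1.0, -0.5           # means of the input and the output
sd_x, sd_y, rho = 2.0, 1.5, 0.6  # standard deviations and correlation

cov = np.array([[sd_x**2, rho * sd_x * sd_y],
                [rho * sd_x * sd_y, sd_y**2]])

x, y = 0.3, 0.8  # an arbitrary evaluation point

# Joint density f(y, x)
joint = multivariate_normal(mean=[mu_x, mu_y], cov=cov).pdf([x, y])

# Marginal density f(x)
marginal = norm(loc=mu_x, scale=sd_x).pdf(x)

# Conditional density f(y | x): for a bivariate normal this is again normal
cond_mean = mu_y + rho * (sd_y / sd_x) * (x - mu_x)
cond_sd = sd_y * np.sqrt(1.0 - rho**2)
conditional = norm(loc=cond_mean, scale=cond_sd).pdf(y)

print(joint, marginal * conditional)  # the two numbers coincide
```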
This section presents some of the terminology that is often used when dealing with conditional models.
The following distinction is often made, especially in the field of machine learning:
- if the output is a continuous random variable, then a conditional model is called a regression model;
- if the output is a discrete random variable, taking finitely many values (typically few), then a conditional model is called a classification model.
The input variables are often called:
- predictors
- independent variables
- features
- explanatory variables
- regressors (in the context of regression models)
The output variables are often called:
- predictands
- dependent variables
- target variables
- response variables
- regressands (in the context of regression models)
The following subsections introduce some examples of conditional models.
The linear regression model is probably the oldest, best understood and most widely used conditional model. In the linear regression model, the response variable $y_i$ is assumed to be a linear function of the inputs $x_i$:
$$y_i = x_i \beta + \varepsilon_i$$
where $i$ is any observation from the sample, $y_i$ is a scalar output, $x_i$ is a $1 \times K$ vector of inputs, $\beta$ is a $K \times 1$ vector of constants (called regression coefficients) and $\varepsilon_i$ is an unobservable random variable that adds noise to the linear relationship between inputs and outputs.

A linear regression model is specified by making assumptions about the error term $\varepsilon_i$. For example, $\varepsilon_i$ is often assumed to have a normal distribution with zero mean and to be independent of $x_i$. In such a case, we have that, conditional on the inputs $x_i$, the output $y_i$ has a normal distribution with mean $x_i \beta$. As a consequence, the conditional density of $y_i$ is
$$f(y_i \mid x_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - x_i \beta)^2}{2\sigma^2}\right)$$
where $\sigma^2$ is the variance of $\varepsilon_i$.
The parameters $\beta$ and $\sigma^2$ are usually unknown and need to be estimated. So, we have a different conditional distribution for each of the values of $\beta$ and $\sigma^2$ that are deemed plausible by the statistician before observing the sample. The set of all these conditional distributions (associated with the different parameter values) constitutes the conditional model for $y_i$.
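As a concrete illustration, the following Python sketch evaluates the conditional density above for hypothetical values of $\beta$ and $\sigma^2$; the parameter values and the helper name `conditional_density` are choices made for this example, not part of the lecture:

```python
# A sketch (not the lecture's code) of the linear regression conditional
# model: for assumed values of beta and sigma^2, each input vector x_i
# induces a normal conditional density for the output y_i.
import numpy as np

rng = np.random.default_rng(0)

K = 3
beta = np.array([0.5, -1.0, 2.0])  # hypothetical regression coefficients
sigma2 = 0.25                      # hypothetical variance of the error term

def conditional_density(y_i, x_i, beta, sigma2):
    """Evaluate f(y_i | x_i), a normal density with mean x_i @ beta."""
    mean = x_i @ beta
    return np.exp(-(y_i - mean) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Simulate one observation from the model: y_i = x_i beta + epsilon_i
x_i = rng.normal(size=K)
y_i = x_i @ beta + rng.normal(scale=np.sqrt(sigma2))

print(conditional_density(y_i, x_i, beta, sigma2))
```

Evaluating this density at the observed sample for different candidate parameter values is the starting point of likelihood-based estimation of the model.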
To learn more about linear regression you can read:
- the introductory lecture on linear regression models;
- the lecture on the linear regression model with normal errors.
In the logistic classification model, the response variable $y_i$ is a Bernoulli random variable: it can take only two values, either $0$ or $1$. It is assumed that the conditional probability mass function of $y_i$ is a non-linear function of the inputs $x_i$:
$$p(y_i \mid x_i) = \left[S(x_i \beta)\right]^{y_i} \left[1 - S(x_i \beta)\right]^{1 - y_i}$$
where $x_i$ is a $1 \times K$ vector of inputs, $\beta$ is a $K \times 1$ vector of constants and $S$ is the logistic function defined by
$$S(t) = \frac{1}{1 + \exp(-t)}$$
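The following Python sketch expresses the same conditional pmf in code, for a hypothetical coefficient vector; the values and helper names are illustrative only:

```python
# A sketch (not the lecture's code) of the logistic classification model:
# the conditional pmf of a binary output y_i given inputs x_i.
import numpy as np

def logistic(t):
    """Logistic function S(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def conditional_pmf(y_i, x_i, beta):
    """p(y_i | x_i) = S(x_i beta)^y_i * (1 - S(x_i beta))^(1 - y_i)."""
    p_one = logistic(x_i @ beta)
    return p_one if y_i == 1 else 1.0 - p_one

beta = np.array([0.8, -0.4])  # hypothetical coefficient vector
x_i = np.array([1.0, 2.5])    # one input vector (e.g. intercept and a feature)

# The two probabilities sum to one, as a pmf must
print(conditional_pmf(1, x_i, beta), conditional_pmf(0, x_i, beta))
```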
To know more, you can read the lecture about the logistic model.
Please cite as:
Taboga, Marco (2021). "Conditional models", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/conditional-models.