Point estimation is a type of statistical inference that consists of producing a single guess, or approximation, of an unknown parameter.
In this lecture we introduce the theoretical framework that underlies all point estimation problems.
At the end of the lecture, we provide links to detailed examples of point estimation, in which we show how to apply the theory.
The main elements of a point estimation problem are those found in any statistical inference problem:
we have a sample that has been drawn from a probability distribution whose characteristics are at least partly unknown;
the sample $\xi$ is regarded as the realization of a random vector $\Xi$;
the joint distribution function of $\Xi$, denoted by $F_{\Xi}(\xi)$, is assumed to belong to a set of distribution functions $\Phi$, called the statistical model.
When the model $\Phi$ is put into correspondence with a set $\Theta$ of real vectors, we have a parametric model.
The set $\Theta$ is called the parameter space and its elements are called parameters.
Denote by $\theta_{0}$ the parameter that is associated with the data-generating distribution $F_{\Xi}$, and assume that $\theta_{0}$ is unique. The vector $\theta_{0}$ is called the true parameter.
Point estimation is the act of choosing a vector $\widehat{\theta}$ that approximates $\theta_{0}$. The approximation $\widehat{\theta}$ is called an estimate (or point estimate) of $\theta_{0}$.
When the estimate $\widehat{\theta}$ is produced using a predefined rule (a function) that associates a parameter estimate $\widehat{\theta}$ to each $\xi$ in the support of $\Xi$, we can write
$$\widehat{\theta} = \widehat{\theta}(\xi).$$
The function $\widehat{\theta}(\cdot)$ is called an estimator.
Often, the symbol $\widehat{\theta}$ is used to denote both the estimate and the estimator. The meaning is usually clear from the context.
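As a concrete illustration, the following is a minimal Python sketch, assuming a normal model with unknown mean (the function name sample_mean_estimator is hypothetical): the predefined rule is the estimator, while the number it returns for a particular sample is the estimate.

```python
import numpy as np

# A predefined rule (a function) that maps each possible sample xi to an estimate.
def sample_mean_estimator(xi):
    return np.mean(xi)

# Illustration: one sample of size 100 drawn from a N(2, 1) distribution (true mean 2).
rng = np.random.default_rng(0)
xi = rng.normal(loc=2.0, scale=1.0, size=100)   # the observed sample
theta_hat = sample_mean_estimator(xi)           # the point estimate produced by the rule
print(theta_hat)                                # a number close to 2
```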
According to the decision-theoretic terminology introduced previously, making an estimate $\widehat{\theta}$ is an act, which produces consequences. Among these consequences, the most relevant one is the estimation error
$$e = \widehat{\theta} - \theta_{0}.$$
The statistician's goal is to commit the smallest possible estimation error.
The preference for small errors can be formalized with a loss function $L(\widehat{\theta}, \theta_{0})$ that quantifies the loss incurred by estimating $\theta_{0}$ with $\widehat{\theta}$.
Examples of loss functions are:

- the absolute error
$$L(\widehat{\theta}, \theta_{0}) = \lVert \widehat{\theta} - \theta_{0} \rVert,$$
where $\lVert \cdot \rVert$ is the Euclidean norm (it coincides with the absolute value when $\theta_{0}$ is a scalar);

- the squared error
$$L(\widehat{\theta}, \theta_{0}) = \lVert \widehat{\theta} - \theta_{0} \rVert^{2}.$$
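A minimal numerical sketch of these two loss functions, using an illustrative true parameter and estimate (the Euclidean norm is computed with NumPy's np.linalg.norm):

```python
import numpy as np

theta_0 = np.array([1.0, 2.0])     # illustrative "true" parameter (vector case)
theta_hat = np.array([1.2, 1.7])   # illustrative estimate

absolute_error = np.linalg.norm(theta_hat - theta_0)       # ||theta_hat - theta_0||
squared_error = np.linalg.norm(theta_hat - theta_0) ** 2   # ||theta_hat - theta_0||^2
print(absolute_error, squared_error)
```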
When the estimate $\widehat{\theta}$ is obtained from an estimator, it is a function of the random vector $\Xi$ and the loss $L(\widehat{\theta}(\Xi), \theta_{0})$ is a random variable.
The expected value of the loss,
$$R\left(\widehat{\theta}(\cdot)\right) = \mathbb{E}\left[ L\left(\widehat{\theta}(\Xi), \theta_{0}\right) \right],$$
is called the statistical risk (or, simply, the risk) of the estimator $\widehat{\theta}(\cdot)$.
The expected value in the definition of risk is computed with respect to the true distribution function $F_{\Xi}$. Therefore, we can compute the risk only if we know the true parameter $\theta_{0}$ and the true distribution function $F_{\Xi}$.
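When $\theta_{0}$ and $F_{\Xi}$ are known, as in a simulation study, the risk can be approximated by Monte Carlo: draw many samples from the true distribution, apply the estimator to each sample, and average the losses. A sketch under these assumptions, for the sample mean of a normal distribution with unit variance and squared error loss:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_0, n, n_sim = 2.0, 50, 100_000   # true mean, sample size, number of simulated samples

# Draw many samples from the true distribution, apply the estimator, average the losses.
losses = np.empty(n_sim)
for s in range(n_sim):
    xi = rng.normal(loc=theta_0, scale=1.0, size=n)   # sample from the true N(theta_0, 1)
    theta_hat = xi.mean()                             # the estimator (sample mean)
    losses[s] = (theta_hat - theta_0) ** 2            # squared error loss
print(losses.mean())   # close to the exact risk, Var(X)/n = 1/50 = 0.02
```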
When $\theta_{0}$ and $F_{\Xi}$ are unknown, the risk needs to be estimated.
For example, we can approximate the risk with the quantity
$$\widehat{R} = \widehat{\mathbb{E}}\left[ L\left(\widehat{\theta}(\Xi), \widehat{\theta}\right) \right],$$
where:

- we pretend that the estimate $\widehat{\theta}$ is the true parameter;

- we denote the estimator of $\theta_{0}$ by $\widehat{\theta}(\Xi)$;

- we compute the expected value $\widehat{\mathbb{E}}$ with respect to the estimated distribution function $\widehat{F}_{\Xi}$.
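One way to carry out this approximation is a plug-in simulation in the spirit of a parametric bootstrap (an assumption of this sketch, not a method prescribed above): resample from the estimated distribution and average the losses measured against the estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Observed sample; the true parameter is unknown to the statistician.
xi_obs = rng.normal(loc=2.0, scale=1.0, size=50)
theta_hat = xi_obs.mean()          # the estimate, treated as if it were the true parameter

# Expected loss computed under the estimated distribution (normal with mean theta_hat).
n_sim = 100_000
losses = np.empty(n_sim)
for s in range(n_sim):
    xi_star = rng.normal(loc=theta_hat, scale=1.0, size=xi_obs.size)  # sample from the estimated model
    losses[s] = (xi_star.mean() - theta_hat) ** 2                     # loss measured against theta_hat
print(losses.mean())               # estimated risk of the sample mean under squared error
```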
Even if the risk is unknown, the notion of risk is often used to derive theoretical properties of estimators.
Point estimation is always guided, at least ideally, by the principle of risk minimization, that is, by the search for estimators that minimize the risk.
Depending on the specific loss function we use, the statistical risk of an estimator can take different names:

- when the absolute error is used as a loss function, then the risk
$$\mathbb{E}\left[ \lVert \widehat{\theta}(\Xi) - \theta_{0} \rVert \right]$$
is called the Mean Absolute Error (MAE) of the estimator;

- when the squared error is used as a loss function, then the risk
$$\mathbb{E}\left[ \lVert \widehat{\theta}(\Xi) - \theta_{0} \rVert^{2} \right]$$
is called the Mean Squared Error (MSE). The square root of the mean squared error is called the root mean squared error (RMSE).
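A sketch of how MAE, MSE, and RMSE can be approximated by simulation for the sample mean, under the same illustrative normal model used above (parameter values are assumptions of the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_0, n, n_sim = 2.0, 50, 100_000

# n_sim samples of size n from N(theta_0, 1); one sample-mean estimate per sample.
estimates = rng.normal(loc=theta_0, scale=1.0, size=(n_sim, n)).mean(axis=1)

mae = np.abs(estimates - theta_0).mean()     # risk under absolute error loss
mse = ((estimates - theta_0) ** 2).mean()    # risk under squared error loss
rmse = np.sqrt(mse)                          # root mean squared error
print(mae, mse, rmse)
```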
In this section we discuss other criteria that are commonly used to evaluate estimators.
If an estimator produces parameter estimates that are on average correct, then it is said to be unbiased.
The following is a formal definition.
Definition Let $\theta_{0}$ be the true parameter. An estimator $\widehat{\theta}(\Xi)$ is an unbiased estimator of $\theta_{0}$ if and only if
$$\mathbb{E}\left[ \widehat{\theta}(\Xi) \right] = \theta_{0}.$$
If an estimator is not unbiased, then it is called a biased estimator.
If an estimator is unbiased, then the estimation error is on average zero:
$$\mathbb{E}\left[ \widehat{\theta}(\Xi) - \theta_{0} \right] = \mathbb{E}\left[ \widehat{\theta}(\Xi) \right] - \theta_{0} = 0.$$
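A quick empirical check of this definition, under an illustrative normal model: the sample mean is unbiased for the mean, while the variance estimator that divides by the sample size (a standard example of a biased estimator, used here only for contrast) has a downward bias.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_0, sigma, n, n_sim = 2.0, 3.0, 10, 200_000

samples = rng.normal(loc=theta_0, scale=sigma, size=(n_sim, n))

# Sample mean: the average estimation error is close to zero (unbiased).
mean_estimates = samples.mean(axis=1)
print(mean_estimates.mean() - theta_0)

# Variance estimator that divides by n (a classic biased estimator):
# on average it falls short of sigma^2 by roughly sigma^2 / n.
var_estimates = samples.var(axis=1, ddof=0)
print(var_estimates.mean() - sigma**2)
```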
If an estimator produces parameter estimates that converge to the true value when the sample size increases, then it is said to be consistent.
The following is a formal definition.
Definition Let $\{\xi_{n}\}$ be a sequence of samples such that all the distribution functions $F_{\Xi_{n}}(\xi_{n})$ are put into correspondence with the same parameter $\theta_{0}$. A sequence of estimators $\{\widehat{\theta}_{n}\}$ is said to be consistent (or weakly consistent) if and only if
$$\widehat{\theta}_{n} \overset{P}{\longrightarrow} \theta_{0},$$
where $\overset{P}{\longrightarrow}$ indicates convergence in probability. The sequence of estimators is said to be strongly consistent if and only if
$$\widehat{\theta}_{n} \overset{\text{a.s.}}{\longrightarrow} \theta_{0},$$
where $\overset{\text{a.s.}}{\longrightarrow}$ indicates almost sure convergence. A sequence of estimators which is not consistent is called inconsistent.
When the sequence of estimators is obtained using the same predefined rule for every sample $\xi_{n}$, we often say, with a slight abuse of language, "consistent estimator" instead of saying "consistent sequence of estimators". In such cases, what we mean is that the predefined rule produces a consistent sequence of estimators.
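A simulation sketch of weak consistency, applying the same rule (the sample mean) to samples of increasing size drawn from an illustrative normal distribution: the fraction of estimates far from the true parameter shrinks toward zero as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_0, n_sim = 2.0, 1_000

for n in (10, 100, 1_000, 10_000):
    estimates = rng.normal(loc=theta_0, scale=1.0, size=(n_sim, n)).mean(axis=1)
    # Fraction of estimates farther than 0.1 from theta_0: shrinks toward 0 as n grows.
    print(n, np.mean(np.abs(estimates - theta_0) > 0.1))
```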
You can find detailed examples of point estimation, in which we show how to apply the theory, in separate lectures.
The methods used to find point estimators are called estimation methods; they are also described in separate lectures.
There is another kind of estimation, called set estimation or interval estimation.
While in point estimation we produce a single estimate meant to approximate the true parameter, in set estimation we produce a whole set of estimates meant to include the true parameter with high probability.