Interval estimation (or set estimation) is a kind of statistical inference in which we search for an interval of values that contains the true parameter with high probability. Such an interval is called a confidence interval.
The mathematical framework of interval estimation is the same as that of other statistical inference methods:
a sample is employed to make statements about the probability distribution from which the sample has been generated;
the sample $\xi$ can be regarded as a realization of a random vector $\Xi$, whose joint distribution function, denoted by $F_{\Xi}(\xi)$, is unknown;
$F_{\Xi}(\xi)$ is assumed to belong to a set of distribution functions $\Phi$, called the statistical model.
Furthermore, the model is parametrized:
in a parametric model, the set $\Phi$ is put into correspondence with a set $\Theta$ of $p$-dimensional real vectors;
$\Theta$ is called the parameter space and its elements are called parameters;
there is a true parameter $\theta_0$, which is associated with the unknown data-generating distribution $F_{\Xi}(\xi)$; $\theta_0$ is assumed to be unique (this is not strictly necessary, but we do it to simplify the exposition).
In set estimation, the aim is to choose a subset $T \subseteq \Theta$ in such a way that $T$ has a high probability of containing the unknown parameter $\theta_0$.
The chosen subset $T$ is called a set estimate of $\theta_0$ or a confidence set for $\theta_0$.
A special terminology is used when:
the parameter space $\Theta$ is a subset of the set of real numbers $\mathbb{R}$;
the subset $T$ is chosen among the intervals of $\mathbb{R}$ (e.g., intervals of the kind $[a, b]$).
In this case, we speak about:
interval estimation (instead of set estimation);
interval estimate (instead of set estimate);
confidence interval (instead of confidence set).
When the set estimate $T$ is produced using a predefined rule (a function) that associates a set estimate $T(\xi)$ to each $\xi$ in the support of $\Xi$, we can write
$$T = T(\xi).$$
The function $T(\xi)$ is called a set estimator (interval estimator).
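To make the idea of an estimator as a rule concrete, here is a minimal Python sketch. It assumes, purely for illustration, a sample of independent normal draws with known standard deviation; the function name `normal_mean_interval`, the data and the value of `sigma` are hypothetical and do not come from the lecture.

```python
from math import sqrt
from statistics import NormalDist, mean


def normal_mean_interval(sample, sigma, level=0.95):
    """Hypothetical interval estimator: maps an observed sample to an interval.

    Assumes i.i.d. normal observations with known standard deviation `sigma`.
    """
    # Standard normal quantile that leaves (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / sqrt(len(sample))
    centre = mean(sample)
    return (centre - half_width, centre + half_width)


# Applying the rule to one observed sample gives one interval estimate.
xi = [4.9, 5.3, 5.1, 4.7, 5.0]
low, high = normal_mean_interval(xi, sigma=0.2)
print(low, high)
```

The function plays the role of the estimator, while the pair `(low, high)` obtained from one particular sample is the corresponding estimate.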
The probability that the interval estimator $T(\Xi)$ contains the true parameter is called coverage probability.
In formal terms, the coverage probability of an interval estimator is defined as
$$C(T, \theta_0) = P_{\theta_0}(\theta_0 \in T(\Xi)),$$
where the notation $P_{\theta_0}$ is used to indicate that the probability is calculated using the distribution function $F_{\Xi}(\xi; \theta_0)$ associated to the true parameter $\theta_0$.
In this definition, the random quantity is the interval $T(\Xi)$, while the parameter $\theta_0$ is fixed.
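The fact that the interval is random while the parameter is fixed can be illustrated by simulation. The sketch below is only an illustration under assumed conditions (normal data with known standard deviation, an arbitrarily chosen true mean); the function `coverage_estimate` and all numerical values are hypothetical. It draws many samples, recomputes the interval each time, and counts how often the fixed true parameter falls inside.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage_estimate(theta0, sigma, n, level, n_rep, seed=0):
    """Monte Carlo estimate of the coverage probability of a normal-mean interval.

    Assumes i.i.d. normal data with true mean `theta0` and known `sigma`.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(n_rep):
        # A new sample gives a new (random) interval; theta0 never changes.
        sample = [rng.gauss(theta0, sigma) for _ in range(n)]
        centre = sum(sample) / n
        if centre - half_width <= theta0 <= centre + half_width:
            hits += 1
    return hits / n_rep


estimated = coverage_estimate(theta0=2.0, sigma=1.0, n=20, level=0.95, n_rep=5000)
print(f"estimated coverage probability: {estimated:.3f}")
```

Under these assumptions the estimated frequency should be close to the nominal level, up to simulation error.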
The coverage probability is usually chosen by the statistician.
Intuitively, before observing the data the statistician makes a statement:
$$\theta_0 \in \Theta,$$
where $\Theta$ is the parameter space, containing all the parameters that are deemed plausible.
The statistician believes the statement to be true, but the statement is not very informative because $\Theta$ is a very large set.
After observing the data, she makes a more informative statement:
$$\theta_0 \in T.$$
This statement is more informative because $T$ is smaller than $\Theta$, but it has a positive probability of being wrong (which is the complement to $1$ of the coverage probability).
In controlling this probability, the statistician faces a trade-off: if she decreases the probability of being wrong, then her statements become less informative; on the contrary, if she increases the probability of being wrong, then her statements become more informative.
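The trade-off can be seen numerically in a simple assumed setting: an interval for the mean of normal data with known standard deviation, whose length depends only on the chosen coverage probability. The helper `interval_length` and the values of `sigma` and `n` are hypothetical, chosen only for this sketch.

```python
from math import sqrt
from statistics import NormalDist


def interval_length(level, sigma=1.0, n=25):
    """Length of a normal-mean interval with known sigma (illustrative assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * sigma / sqrt(n)


levels = [0.80, 0.90, 0.95, 0.99]
lengths = [interval_length(level) for level in levels]
for level, length in zip(levels, lengths):
    # A smaller probability of being wrong goes with a longer interval.
    print(f"probability of being wrong {1 - level:.2f} -> length {length:.3f}")
```

In this setting, lowering the probability of being wrong always lengthens the interval, which makes the resulting statement less informative.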
In practice, the coverage probability is seldom known because it depends on the unknown parameter $\theta_0$ (although in some important cases it is the same for all parameters belonging to the parameter space).
When the coverage probability is not known, it is customary to compute the confidence coefficient $c(T)$, which is defined as
$$c(T) = \inf_{\theta \in \Theta} C(T, \theta).$$
In other words, the confidence coefficient $c(T)$ is equal to the smallest possible coverage probability.
The confidence coefficient is also often called the level of confidence.
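As an illustration of a coverage probability that does depend on the parameter, the sketch below uses an example that is not in the lecture: the textbook normal-approximation (Wald) interval for a success probability based on a binomial count. It computes the exact coverage probability at each point of a finite grid and takes the minimum. The grid from 0.05 to 0.95 is an assumed stand-in for the parameter space, so the minimum only approximates the infimum in the definition; all function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(successes, n, level=0.95):
    """Normal-approximation interval for a success probability (illustrative choice)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - half_width, p_hat + half_width)


def coverage(p, n, level=0.95):
    """Exact coverage probability at p: sum the binomial probabilities of
    all outcomes whose interval contains p."""
    total = 0.0
    for k in range(n + 1):
        low, high = wald_interval(k, n, level)
        if low <= p <= high:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


n = 30
# Assumed parameter space: a finite grid of values between 0.05 and 0.95.
grid = [0.05 + 0.01 * i for i in range(91)]
coverages = [coverage(p, n) for p in grid]
confidence_coefficient = min(coverages)
print(f"smallest coverage on the grid: {confidence_coefficient:.3f}")
print(f"largest coverage on the grid:  {max(coverages):.3f}")
```

In this example the coverage probability varies with the parameter and falls below the nominal 95% for some values, which is why the smallest value, rather than the nominal one, is reported as the confidence coefficient.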
We already mentioned that there is a trade-off in the construction and choice of a set estimator.
On the one hand, we want our set estimator $T(\Xi)$ to have a high coverage probability, that is, we want the set $T(\Xi)$ to include the true parameter with a high probability.
On the other hand, we want the size of $T$ to be as small as possible, so as to make our interval estimate more precise.
What do we mean by size of $T$?
If the parameter space $\Theta$ is unidimensional and $T$ is an interval estimate, then the size of $T$ is its length.
If the space $\Theta$ is multidimensional, then the size of $T$ is its volume.
The size of a confidence set is also called measure of a confidence set. For those who have a grasp of measure theory, the name stems from the fact that Lebesgue measure is the generalization of volume in multidimensional spaces.
If we denote by $\lambda(T)$ the size of a confidence set, then we can also define the expected size of a set estimator $T(\Xi)$:
$$E_{\theta_0}[\lambda(T(\Xi))],$$
where the notation $E_{\theta_0}$ is used to indicate that the expected value is calculated using the true distribution function $F_{\Xi}(\xi; \theta_0)$.
Like the coverage probability, the expected size of a set estimator also depends on the unknown parameter $\theta_0$.
Hence, unless it is a constant function of $\theta_0$, one needs to somehow estimate it or to take the infimum over all possible values of the parameter, as we did above for coverage probabilities.
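The dependence of the expected size on the parameter can be checked in a small example that is not part of the lecture: the normal-approximation (Wald) interval for a success probability based on a binomial count, whose length changes with the observed number of successes. The sketch computes the expected length exactly, by weighting each possible length by its binomial probability. The function names and the chosen parameter values are hypothetical, and the length is that of the interval on the real line, without truncation to $[0, 1]$.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_length(successes, n, level=0.95):
    """Length of the normal-approximation interval for a success probability."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    return 2 * z * sqrt(p_hat * (1 - p_hat) / n)


def expected_length(p, n, level=0.95):
    """Expected interval length when the true success probability is p."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * wald_length(k, n, level)
        for k in range(n + 1)
    )


n = 30
expected = {p: expected_length(p, n) for p in (0.1, 0.3, 0.5)}
for p, value in expected.items():
    print(f"true parameter {p:.1f} -> expected length {value:.3f}")
```

Here the expected length is not a constant function of the parameter, so a single summary number requires either an estimate of the parameter or an optimization over the parameter space.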
Although size is probably the simplest criterion to evaluate and select interval estimators, there are several other criteria. We do not discuss them here, but we refer the reader to the very nice exposition in Berger and Casella (2002).
Berger, R. L. and G. Casella (2002) "Statistical inference", Duxbury Advanced Series.
Please cite as:
Taboga, Marco (2021). "Interval estimation", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/set-estimation.