Interval estimation

by Marco Taboga, PhD

Interval estimation (or set estimation) is a kind of statistical inference in which we search for an interval of values that contains the true parameter with high probability. Such an interval is called a confidence interval.

The mathematical framework

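The mathematical framework of interval estimation is the same as that of other statistical inference methods: a sample $\xi $ is used to make statements about the probability distribution that generated it, and $\xi $ is regarded as the realization of a random vector $\Xi $ whose unknown distribution function is assumed to belong to a set of distribution functions, called the statistical model.

Furthermore, the model is parametrized: each distribution function in the model is associated with a parameter $\theta $ belonging to a parameter space $\Theta $, and the parameter associated with the distribution that actually generated the sample, denoted by $\theta _{0}$, is called the true parameter.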

Confidence set

In set estimation, the aim is to choose a subset $T\subseteq \Theta $ in such a way that $T$ has a high probability of containing the unknown parameter $\theta _{0}$.

The chosen subset $T$ is called a set estimate of $\theta _{0}$ or a confidence set for $\theta _{0}$.

Confidence interval

A special terminology is used in the case in which:

  1. the parameter space $\Theta $ is a subset of the set of real numbers $\mathbb{R}$;

  2. the subset $T$ is chosen among the intervals of $\mathbb{R}$ (e.g. intervals of the kind $\left[ a,b\right] $).

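In this case, we speak of interval estimation (instead of set estimation), of an interval estimate (instead of a set estimate) and of a confidence interval (instead of a confidence set).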

Interval estimator

When the set estimate $T$ is produced using a predefined rule (a function) that associates a set estimate $T$ to each $\xi $ in the support of $\Xi $, we can write $T=T\left( \xi \right) $.

The function $T\left( \xi \right) $ is called a set estimator (interval estimator).
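To make the idea of a rule that maps each sample to an interval concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the data are normal with a known standard deviation and that the parameter of interest is the mean; the function name `mean_interval` and the sample values are hypothetical and are not taken from the lecture.

```python
import math
from statistics import NormalDist, fmean


def mean_interval(sample, sigma, level=0.95):
    """Interval estimator: maps a sample xi to an interval (lower, upper).

    Illustrative assumption: the observations are normal with known
    standard deviation `sigma`, and the parameter is the mean.
    """
    n = len(sample)
    # Standard normal quantile that leaves (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    center = fmean(sample)
    half_width = z * sigma / math.sqrt(n)
    return (center - half_width, center + half_width)


# The estimator is the function; the estimate is its value on one sample.
xi = [4.9, 5.3, 5.1, 4.7, 5.0]  # hypothetical observed sample
estimate = mean_interval(xi, sigma=0.2)
print(estimate)
```

The distinction the sketch tries to bring out is that `mean_interval` is the estimator (a function of the sample), while `estimate` is the interval estimate obtained from one particular sample.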

Coverage probability

The probability that the interval estimator $T$ contains the true parameter is called coverage probability.

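In formal terms, the coverage probability of an interval estimator is defined as $C\left( T,\theta _{0}\right) =\mathrm{P}_{\theta _{0}}\left( \theta _{0}\in T\left( \Xi \right) \right) $, where the notation $\mathrm{P}_{\theta _{0}}$ is used to indicate that the probability is calculated using the distribution function $F_{\Xi }\left( \xi ;\theta _{0}\right) $ associated to the true parameter $\theta _{0}$.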

In this definition, the random quantity is the interval $T\left( \Xi \right) $, while the parameter $\theta _{0}$ is fixed.
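Because the interval is random and the parameter is fixed, a coverage probability can be approximated by simulation: draw many samples from the distribution associated with a chosen true parameter, compute the interval each time, and count how often it contains that parameter. The sketch below does this under the illustrative assumption of normal data with known standard deviation; the function names, sample size and number of replications are hypothetical choices.

```python
import math
import random
from statistics import NormalDist, fmean


def mean_interval(sample, sigma, level=0.95):
    """Interval for the mean, assuming normal data with known sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    center = fmean(sample)
    half_width = z * sigma / math.sqrt(len(sample))
    return (center - half_width, center + half_width)


def simulated_coverage(theta0, sigma, n, level=0.95, n_rep=20000, seed=0):
    """Monte Carlo approximation of the coverage probability at theta0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        # A new sample means a new random interval; theta0 stays fixed.
        sample = [rng.gauss(theta0, sigma) for _ in range(n)]
        lower, upper = mean_interval(sample, sigma, level)
        if lower <= theta0 <= upper:
            hits += 1
    return hits / n_rep


coverage = simulated_coverage(theta0=2.0, sigma=1.0, n=10)
print(coverage)
```

Under these assumptions the simulated frequency should be close to the nominal level of 0.95, up to Monte Carlo error.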

How the coverage probability is chosen

The coverage probability is usually chosen by the statistician.

Intuitively, before observing the data the statistician makes a statement: $\theta _{0}\in \Theta $, where $\Theta $ is the parameter space, containing all the parameters that are deemed plausible.

The statistician believes the statement to be true, but the statement is not very informative because $\Theta $ is a very large set.

After observing the data, she makes a more informative statement: $\theta _{0}\in T$.

This statement is more informative because $T$ is smaller than $\Theta $, but it has a positive probability of being wrong (equal to 1 minus the coverage probability).

In controlling this probability, the statistician faces a trade-off: if she decreases the probability of being wrong, then her statements become less informative; on the contrary, if she increases the probability of being wrong, then her statements become more informative.
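The trade-off can be seen numerically. The sketch below assumes, for illustration only, an interval for the mean of normal data with known standard deviation, whose width is a deterministic function of the chosen level; the values of `sigma`, `n` and the list of levels are hypothetical.

```python
import math
from statistics import NormalDist


def interval_width(level, sigma=1.0, n=25):
    """Width of the known-sigma interval for a normal mean at a given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return 2.0 * z * sigma / math.sqrt(n)


levels = [0.80, 0.90, 0.95, 0.99]
widths = [interval_width(level) for level in levels]

for level, width in zip(levels, widths):
    # A smaller probability of being wrong comes with a wider interval.
    print(f"level={level:.2f}  prob. wrong={1 - level:.2f}  width={width:.3f}")
```

In this setting, lowering the probability of being wrong always widens the interval, which is the sense in which the statement becomes less informative.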

Level of confidence

In practice, the coverage probability is seldom known because it depends on the unknown parameter $\theta _{0}$ (although in some important cases it is equal for all parameters belonging to the parameter space).

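When the coverage probability is not known, it is customary to compute the confidence coefficient $c\left( T\right) $, which is defined as $c\left( T\right) =\inf_{\theta \in \Theta }\mathrm{P}_{\theta }\left( \theta \in T\left( \Xi \right) \right) $.

In other words, the confidence coefficient $c\left( T\right) $ is the greatest lower bound of the coverage probability over the parameter space (informally, the smallest possible coverage probability).

The confidence coefficient is also often called the level of confidence.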
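As an illustration of a coverage probability that does depend on the parameter, the sketch below computes the exact coverage of the textbook normal-approximation (Wald) interval for a Bernoulli success probability and then takes the minimum over a grid of parameter values. The choice of this interval, the sample size and the grid are assumptions made for the example; the minimum over a finite grid is only an upper bound on the infimum over the whole parameter space.

```python
import math
from statistics import NormalDist


def wald_interval(k, n, level=0.95):
    """Normal-approximation interval for a proportion, given k successes in n."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    p_hat = k / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return (p_hat - half_width, p_hat + half_width)


def exact_coverage(p, n, level=0.95):
    """Coverage probability at p: sum of binomial probabilities of the
    outcomes k whose interval contains p."""
    total = 0.0
    for k in range(n + 1):
        lower, upper = wald_interval(k, n, level)
        if lower <= p <= upper:
            total += math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
    return total


n = 30
grid = [i / 1000 for i in range(1, 1000)]  # grid inside the open interval (0, 1)
coverages = [exact_coverage(p, n) for p in grid]

# Grid approximation of the confidence coefficient (infimum of the coverage).
confidence_coefficient_on_grid = min(coverages)
print(confidence_coefficient_on_grid)
```

In this example the coverage varies with the parameter and, near the boundary of the parameter space, falls far below the nominal 95%, so the confidence coefficient is much lower than the level used to build the interval.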

Size of the confidence set

We already mentioned that there is a trade-off in the construction and choice of a set estimator.

On the one hand, we want our set estimator $T$ to have a high coverage probability, that is, we want the set $T$ to include the true parameter with a high probability.

On the other hand, we want the size of $T$ to be as small as possible, so as to make our interval estimate more precise.

What do we mean by size of $T$?

If the parameter space $\Theta $ is unidimensional and $T$ is an interval estimate, then the size of $T$ is its length.

If the space $\Theta $ is multidimensional, then the size of $T$ is its volume.

The size of a confidence set is also called measure of a confidence set. For those who have a grasp of measure theory, the name stems from the fact that Lebesgue measure is the generalization of volume in multidimensional spaces.

Expected size of the confidence set

If we denote by $\lambda \left( T\right) $ the size of a confidence set, then we can also define the expected size of a set estimator $T$ as $\mathrm{E}_{\theta _{0}}\left[ \lambda \left( T\left( \Xi \right) \right) \right] $, where the notation $\mathrm{E}_{\theta _{0}}$ is used to indicate that the expected value is calculated using the true distribution function $F_{\Xi }\left( \xi ;\theta _{0}\right) $.

Like the coverage probability, the expected size of a set estimator also depends on the unknown parameter $\theta _{0}$.

Hence, unless it is a constant function of $\theta _{0}$, one needs to somehow estimate it or to take a worst-case bound over all possible values of the parameter (for the expected size, the supremum), much as we took the infimum above for coverage probabilities.
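The dependence of the expected size on the unknown parameter can be illustrated by simulation. The sketch below assumes normal data with unknown mean and standard deviation and uses a large-sample interval built from the sample standard deviation and a normal quantile (an approximation chosen to keep the example short, not an exact interval); all names and numerical settings are hypothetical.

```python
import math
import random
from statistics import NormalDist, stdev


def interval_length(sample, level=0.95):
    """Length of the approximate interval mean +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return 2.0 * z * stdev(sample) / math.sqrt(len(sample))


def expected_length(mu, sigma, n, n_rep=5000, seed=0):
    """Monte Carlo approximation of the expected interval length."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        total += interval_length(sample)
    return total / n_rep


# Same estimator, two different true parameters (mu, sigma).
length_small_sigma = expected_length(mu=0.0, sigma=1.0, n=20)
length_large_sigma = expected_length(mu=0.0, sigma=3.0, n=20)
print(length_small_sigma, length_large_sigma)
```

Here the expected length grows in proportion to the true standard deviation, so it cannot be stated without either estimating that parameter or bounding it.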

Other criteria to evaluate set estimators

Although size is probably the simplest criterion to evaluate and select interval estimators, there are several other criteria. We do not discuss them here, but we refer the reader to the very nice exposition in Berger and Casella (2002).

Examples

Examples of interval estimation problems can be found in the following lectures:

  1. Confidence intervals for the mean;

  2. Confidence intervals for the variance.

References

Berger, R. L. and G. Casella (2002) "Statistical inference", Duxbury Advanced Series.

How to cite

Please cite as:

Taboga, Marco (2021). "Interval estimation", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/set-estimation.
