Interval estimation (or set estimation) is a kind of statistical inference in which we search for an interval of values that contains the true parameter with high probability. Such an interval is called a confidence interval.
The mathematical framework of interval estimation is the same as that of other statistical inference methods:
a sample is employed to make statements about the probability distribution from which the sample has been generated;
the sample $\xi$ can be regarded as a realization of a random vector $\Xi$, whose joint distribution function, denoted by $F_\Xi(\xi)$, is unknown;
$F_\Xi(\xi)$ is assumed to belong to a set of distribution functions $\Phi$, called a statistical model.
Furthermore, the model is parametrized:
in a parametric model, the set $\Phi$ is put into correspondence with a set $\Theta \subseteq \mathbb{R}^p$ of $p$-dimensional real vectors;
$\Theta$ is called the parameter space and its elements are called parameters;
there is a true parameter $\theta_0$, which is associated with the unknown data-generating distribution $F_\Xi(\xi)$;
$\theta_0$ is assumed to be unique (this is not strictly necessary, but we do it to simplify the exposition).
In set estimation, the aim is to choose a subset $T \subseteq \Theta$ in such a way that $T$ has a high probability of containing the unknown parameter $\theta_0$.
The chosen subset $T$ is called a set estimate of $\theta_0$ or a confidence set for $\theta_0$.
A special terminology is used in the case in which:
the parameter space $\Theta$ is a subset of the set of real numbers $\mathbb{R}$;
the subset $T$ is chosen among the intervals of $\mathbb{R}$ (e.g. intervals of the kind $[a, b]$).
In this case, we speak about:
interval estimation (instead of set estimation);
interval estimate (instead of set estimate);
confidence interval (instead of confidence set).
When the set estimate $T$ is produced using a predefined rule (a function) that associates a set estimate $T$ to each $\xi$ in the support of $\Xi$, we can write
$$T = T(\xi).$$
The function $T(\xi)$ is called a set estimator (interval estimator).
The probability that the interval estimator contains the true parameter is called coverage probability.
In formal terms, the coverage probability of an interval estimator $T(\Xi)$ is defined as
$$C(T, \theta_0) = P_{\theta_0}\left(\theta_0 \in T(\Xi)\right),$$
where the notation $P_{\theta_0}$ is used to indicate that the probability is calculated using the distribution function associated with the true parameter $\theta_0$.
In this definition, the random quantity is the interval $T(\Xi)$, while the parameter $\theta_0$ is fixed.
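The definition can be checked numerically. The following is a minimal sketch, not part of the original lecture, that estimates the coverage probability by simulation for one assumed setting: a normal sample with known standard deviation and the usual interval centred on the sample mean. The function names, the sample size, the number of replications and the value of the true parameter are all illustrative assumptions. Note that the parameter stays fixed across replications, while the interval changes with each simulated sample.

```python
import random
from statistics import NormalDist, fmean


def normal_mean_interval(sample, sigma, level=0.95):
    """Interval estimate for the mean of a normal sample with known sigma."""
    # Standard normal quantile that leaves (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / len(sample) ** 0.5
    centre = fmean(sample)
    return centre - half_width, centre + half_width


def simulated_coverage(theta0, sigma=1.0, n=20, level=0.95, reps=20000, seed=0):
    """Fraction of simulated samples whose interval contains theta0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # A new realization of the sample: the interval is the random object.
        sample = [rng.gauss(theta0, sigma) for _ in range(n)]
        lower, upper = normal_mean_interval(sample, sigma, level)
        # theta0 is held fixed; we only record whether the interval covers it.
        hits += lower <= theta0 <= upper
    return hits / reps


coverage = simulated_coverage(theta0=2.0)
print(f"estimated coverage: {coverage:.3f}")
```

In this particular setting the estimated coverage should be close to the nominal level whatever value is chosen for the true mean, which is one of the cases in which the coverage probability does not depend on the parameter.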
The coverage probability is usually chosen by the statistician.
Intuitively, before observing the data the statistician makes a statement:
$$\theta_0 \in \Theta,$$
where $\Theta$ is the parameter space, containing all the parameters that are deemed plausible.
The statistician believes the statement to be true, but the statement is not very informative because $\Theta$ is a very large set.
After observing the data, she makes a more informative statement:
$$\theta_0 \in T.$$
This statement is more informative because $T$ is smaller than $\Theta$, but it has a positive probability of being wrong (which is the complement to $1$ of the coverage probability).
In controlling this probability, the statistician faces a trade-off: if she decreases the probability of being wrong, then her statements become less informative; on the contrary, if she increases the probability of being wrong, then her statements become more informative.
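The trade-off can be made concrete with a small sketch. It assumes, purely for illustration, the textbook interval for the mean of a normal sample with known standard deviation, whose length is twice the normal quantile times the standard error; the function name and the default values of `sigma` and `n` are hypothetical choices.

```python
from statistics import NormalDist


def interval_length(level, sigma=1.0, n=25):
    """Length of the known-variance normal-mean interval at a given coverage."""
    # Quantile leaving (1 - level) / 2 probability in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * sigma / n ** 0.5


levels = [0.80, 0.90, 0.95, 0.99]
lengths = [interval_length(level) for level in levels]
for level, length in zip(levels, lengths):
    print(f"coverage {level:.2f} -> length {length:.3f}")
```

As the required coverage grows, the interval gets longer: a smaller probability of being wrong is paid for with a less informative statement.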
In practice, the coverage probability is seldom known because it depends on the unknown parameter (although in some important cases it is equal for all parameters belonging to the parameter space).
When the coverage probability is not known, it is customary to compute the confidence coefficient $c(T)$, which is defined as
$$c(T) = \inf_{\theta \in \Theta} C(T, \theta).$$
In other words, the confidence coefficient is equal to the smallest possible coverage probability.
The confidence coefficient is also often called level of confidence.
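To see why taking the infimum matters, consider a setting in which the coverage probability does change with the parameter. The sketch below is an illustration under stated assumptions, not an example from the original lecture: it uses the Wald interval for a Bernoulli success probability and computes its exact coverage by summing binomial probabilities. The sample size and the parameter grid are arbitrary choices, and the minimum over a finite grid is only an upper bound on the true infimum over the whole parameter space.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(successes, n, level=0.95):
    """Wald interval for a Bernoulli success probability."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def exact_coverage(p, n, level=0.95):
    """Coverage probability at p: total binomial mass of outcomes that cover p."""
    total = 0.0
    for k in range(n + 1):
        lower, upper = wald_interval(k, n, level)
        if lower <= p <= upper:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


n = 30
grid = [i / 100 for i in range(1, 100)]  # illustrative grid over (0, 1)
coverages = [exact_coverage(p, n) for p in grid]

# Smallest coverage found on the grid; an approximation from above of the
# infimum that defines the confidence coefficient.
confidence_coefficient_on_grid = min(coverages)
print(f"smallest coverage on the grid: {confidence_coefficient_on_grid:.3f}")
```

In this example the coverage at parameters close to the boundary is far below the nominal 95%, so the confidence coefficient is much smaller than the coverage attained at most other parameter values.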
We already mentioned that there is a trade-off in the construction and choice of a set estimator.
On the one hand, we want our set estimator to have a high coverage probability, that is, we want the set $T$ to include the true parameter with a high probability.
On the other hand, we want the size of $T$ to be as small as possible, so as to make our interval estimate more precise.
What do we mean by size of $T$?
If the parameter space is unidimensional and $T$ is an interval estimate, then the size of $T$ is its length.
If the space is multidimensional, then the size of $T$ is its volume.
The size of a confidence set is also called measure of a confidence set. For those who have a grasp of measure theory, the name stems from the fact that Lebesgue measure is the generalization of volume in multidimensional spaces.
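If we denote by $\lambda(T)$ the size of a confidence set $T$, then we can also define the expected size of a set estimator $T(\Xi)$:
$$E_{\theta_0}\left[\lambda(T(\Xi))\right],$$
where the notation $E_{\theta_0}$ is used to indicate that the expected value is calculated using the true distribution function, that is, the one associated with $\theta_0$.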
Like the coverage probability, the expected size of a set estimator depends on the unknown parameter $\theta_0$.
Hence, unless it is a constant function of $\theta_0$, one needs to either estimate it or take a worst-case value over all possible values of the parameter (a supremum in this case, since a larger size is worse), in the same spirit as the infimum taken above for coverage probabilities.
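The dependence of the expected size on the parameter can be illustrated with a short computation. The sketch below assumes, for illustration only, the Wald interval for a Bernoulli success probability and computes its expected length exactly by averaging the length over the binomial distribution of the number of successes. The function names, the sample size and the three parameter values are hypothetical, and the length is that of the untruncated interval, which may extend outside the unit interval.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_length(successes, n, level=0.95):
    """Length of the (untruncated) Wald interval for a Bernoulli probability."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    return 2 * z * sqrt(p_hat * (1 - p_hat) / n)


def expected_length(p, n, level=0.95):
    """Expected interval length when the true success probability is p."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * wald_length(k, n, level)
        for k in range(n + 1)
    )


n = 30
expected = {p: expected_length(p, n) for p in (0.1, 0.3, 0.5)}
for p, length in expected.items():
    print(f"true parameter {p:.1f} -> expected length {length:.3f}")
```

Here the expected length is largest when the true probability is one half and shrinks as it moves towards the boundary, so no single number summarizes the size of this estimator unless the parameter is known or a worst case is taken.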
Although size is probably the simplest criterion to evaluate and select interval estimators, there are several other criteria. We do not discuss them here, but we refer the reader to the very nice exposition in Berger and Casella (2002).
Berger, R. L. and G. Casella (2002) "Statistical inference", Duxbury Advanced Series.
Please cite as:
Taboga, Marco (2021). "Interval estimation", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/set-estimation.