Hypothesis testing is a method of making statistical inferences in which:
we formulate a hypothesis, called the null hypothesis;
we use some data to decide whether to reject or not to reject the hypothesis.
This lecture provides a rigorous introduction to the mathematics of hypothesis tests, with links to other pages where the individual steps of a test of hypothesis are studied in more detail.
Remember that a statistical inference is a statement about the probability distribution from which a sample has been drawn.
In mathematical terms, the sample $\xi$ can be regarded as a realization of a random vector $\Xi$, whose unknown joint distribution function $F$ is assumed to belong to a set of distribution functions $\Phi$, called the statistical model.
Example
We observe the realizations $x_1, \ldots, x_n$ of $n$ independently and identically distributed (IID) random variables $X_1, \ldots, X_n$ having a normal distribution. The sample $\xi = (x_1, \ldots, x_n)$ can be regarded as a realization of a random vector $\Xi = (X_1, \ldots, X_n)$ whose entries are all independent of each other. The statistical model is a set of distribution functions satisfying certain conditions:
$$\Phi = \left\{ F : F(x_1, \ldots, x_n) = \prod_{i=1}^{n} F_{\mu,\sigma^2}(x_i) \text{ for some } \mu \in \mathbb{R},\ \sigma^2 \in \mathbb{R}_{++} \right\},$$
where $F_{\mu,\sigma^2}$ denotes the distribution function of a normal random variable with mean $\mu$ and variance $\sigma^2$.
We will continue this example in the following sections.
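As an illustration, the draw of such a sample can be simulated. The sketch below is not part of the lecture: the mean, the variance, and the sample size (which are unknown to the statistician in a real application) are fixed here at hypothetical values.

```python
import random

# Simulate one realization xi of the random vector Xi: n IID draws from a
# normal distribution. The true mean and variance are unknown in practice;
# here they are set to hypothetical values for the simulation.
rng = random.Random(42)
mu_true, sigma_true, n = 0.0, 1.0, 10
sample = [rng.gauss(mu_true, sigma_true) for _ in range(n)]  # the sample xi
```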
In hypothesis testing we make a statement about a model restriction involving a subset $\Phi_R$ of the original model $\Phi$.
The statement we make is chosen between two possible statements:
reject the restriction $F \in \Phi_R$;
do not reject the restriction $F \in \Phi_R$.
Roughly speaking, we start from a large set $\Phi$ of distributions that might possibly have generated the sample $\xi$, and we would like to restrict our attention to a smaller set $\Phi_R \subset \Phi$.
In a test of hypothesis, we use the sample $\xi$ to decide whether or not to indeed restrict our attention to the smaller set $\Phi_R$.
Example
In the case of our normal sample, we might want to test the restriction that the mean of the distribution is equal to zero. The restriction would be:
$$\Phi_R = \left\{ F \in \Phi : \mu = 0 \right\}.$$
Remember that in a parametric model the set of distribution functions $\Phi$ is put into correspondence with a set $\Theta$ of $p$-dimensional real vectors called the parameter space.
The elements of $\Theta$ are called parameters and the true parameter is denoted by $\theta_0$.
The true parameter is the parameter associated with the unknown distribution function $F$ from which the sample $\xi$ was actually drawn. For simplicity, $\theta_0$ is assumed to be unique.
In parametric hypothesis testing we have a restriction $\Theta_R \subseteq \Theta$ on the parameter space and we choose one of the following two statements about the restriction:
reject the restriction $\theta_0 \in \Theta_R$;
do not reject the restriction $\theta_0 \in \Theta_R$.
For concreteness, we will focus on parametric hypothesis testing in this lecture, but most of the things we will say apply with straightforward modifications to hypothesis testing in general.
Example
In the above example, a normal distribution is completely described by its mean $\mu$ and variance $\sigma^2$.
Thus, each distribution in the set $\Phi$ is put into correspondence with a parameter vector $\theta = (\mu, \sigma^2)$.
In this case the parameter space is
$$\Theta = \mathbb{R} \times \mathbb{R}_{++},$$
where $\mathbb{R}_{++}$ is the set of strictly positive real numbers.
The restriction to be tested is that the mean of the distribution be equal to zero. Therefore, the parametric restriction is
$$\Theta_R = \left\{ (\mu, \sigma^2) \in \Theta : \mu = 0 \right\}.$$
The hypothesis that the restriction is true is called the null hypothesis and it is usually denoted by $H_0$:
$$H_0 : \theta_0 \in \Theta_R.$$
Understanding how to formulate a null hypothesis is a fundamental step in hypothesis testing. We suggest reading a thorough discussion of null hypotheses here.
Example
In our example, the null hypothesis is
$$H_0 : \mu = 0.$$
The restriction $\theta_0 \in \Theta_R^{c}$ (where $\Theta_R^{c}$ is the complement of $\Theta_R$) is often called the alternative hypothesis and it is denoted by $H_1$:
$$H_1 : \theta_0 \in \Theta_R^{c}.$$
Statisticians sometimes take into consideration as an alternative hypothesis a set smaller than $\Theta_R^{c}$.
In these cases, the null hypothesis and the alternative hypothesis do not cover all the possibilities contemplated by the parameter space $\Theta$.
For some authors, "rejecting the null hypothesis $H_0$" and "accepting the alternative hypothesis $H_1$" are synonyms. For other authors, however, "rejecting the null hypothesis $H_0$" does not necessarily imply "accepting the alternative hypothesis $H_1$".
Although this is mostly a matter of language, it is possible to envision situations in which, after rejecting $H_0$, a second test of hypothesis is performed whereby $H_1$ becomes the new null hypothesis and it is rejected (this may happen, for example, if the model is mis-specified).
In these situations, if "rejecting the null hypothesis $H_0$" and "accepting the alternative hypothesis $H_1$" are treated as synonyms, then some confusion arises, because the first test leads to "accept $H_1$" and the second test leads to "reject $H_1$".
Example
In our example, the alternative hypothesis could be
$$H_1 : \mu \neq 0.$$
When we decide whether to reject a restriction or not to reject it, we can incur two types of errors:
reject the restriction $\theta_0 \in \Theta_R$ when the restriction is true; this is called an error of the first kind or a Type I error;
do not reject the restriction $\theta_0 \in \Theta_R$ when the restriction is false; this is called an error of the second kind or a Type II error.
Example
In our example, if we reject the restriction $\mu = 0$ when it is true, we commit a Type I error.
Remember that the sample $\xi$ is regarded as a realization of a random vector $\Xi$ having support $R_{\Xi}$.
A test of hypothesis is usually carried out by explicitly or implicitly subdividing the support $R_{\Xi}$ into two disjoint subsets.
One of the two subsets, denoted by $C_{\Xi}$, is called the critical region (or rejection region) and it is the set of all values of $\xi$ for which the null hypothesis is rejected:
$$\xi \in C_{\Xi} \implies \text{reject } H_0.$$
The other subset is the complement of the critical region:
$$C_{\Xi}^{c} = R_{\Xi} \setminus C_{\Xi},$$
and it is, of course, such that
$$\xi \in C_{\Xi}^{c} \implies \text{do not reject } H_0.$$
This mathematical formulation is made more concrete in the next section.
The critical region is often implicitly defined in terms of a test statistic and a critical region for the test statistic.
A test statistic is a random variable $S$ whose realization $s(\xi)$ is a function of the sample $\xi$.
A critical region for $S$ is a subset $C_S$ of the set of real numbers and the test is performed based on the test statistic, as follows:
$$s(\xi) \in C_S \implies \text{reject } H_0, \qquad s(\xi) \notin C_S \implies \text{do not reject } H_0.$$
If the complement of the critical region $C_S^{c}$ is an interval, then its extremes are called the critical values of the test. See this glossary entry for more details about critical values.
Example
In our example, where we are testing that the mean of the normal distribution is zero, we could use a test statistic called the z-statistic. If you want to read the details, go to the lecture on hypothesis tests about the mean.
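As a sketch of how such a test could be implemented (assuming, for simplicity, that the standard deviation of the distribution is known), the function below computes a z-statistic and compares it with the critical values of a two-sided test. The function name and the default significance level are our own choices, not part of the lecture.

```python
import math
from statistics import NormalDist

def z_test(sample, sigma, alpha=0.05):
    """Two-sided z-test of H0: mu = 0 when the standard deviation sigma
    is known. Rejects H0 when the z-statistic falls in the critical
    region |z| > z_crit, where z_crit is the (1 - alpha/2) quantile of
    the standard normal distribution."""
    n = len(sample)
    sample_mean = sum(sample) / n
    z = math.sqrt(n) * sample_mean / sigma        # the z-statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    return z, abs(z) > z_crit                     # (statistic, reject H0?)

# Usage: a sample whose mean is far from zero leads to rejection of H0.
z, reject = z_test([2.1, 1.8, 2.4, 2.0, 1.9], sigma=1.0)
```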
The power function of a test of hypothesis is the function that associates the probability of rejecting $H_0$ to each parameter $\theta \in \Theta$.
Denote the critical region by $C_{\Xi}$.
The power function $\pi : \Theta \to [0,1]$ is defined as follows:
$$\pi(\theta) = P_{\theta}(\Xi \in C_{\Xi}),$$
where the notation $P_{\theta}$ is used to indicate the fact that the probability is calculated using the distribution function $F_{\theta}$ associated with the parameter $\theta$.
When $\theta \in \Theta_R$, the power function $\pi(\theta)$ gives us the probability of committing a Type I error, that is, the probability of rejecting the null hypothesis when the null hypothesis is true.
The maximum probability of committing a Type I error is, therefore,
$$\sup_{\theta \in \Theta_R} \pi(\theta).$$
This maximum probability is called the size of the test.
The size of the test is also called by some authors the level of significance of the test. However, according to other authors, who assign a slightly different meaning to the term, the level of significance of a test is an upper bound on the size of the test.
In mathematical terms, the level of significance is a constant $\alpha$ that, to the statistician's knowledge, satisfies
$$\sup_{\theta \in \Theta_R} \pi(\theta) \leq \alpha.$$
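These definitions can be illustrated numerically. The sketch below estimates the power function of a two-sided z-test of $H_0 : \mu = 0$ by Monte Carlo simulation; under the null hypothesis the rejection rate approximates the size of the test. The sample size, number of simulations, and the assumption of a known unit variance are hypothetical choices of ours, not part of the lecture.

```python
import math
import random
from statistics import NormalDist

def rejection_rate(mu, sigma=1.0, n=25, alpha=0.05, n_sim=20000, seed=0):
    """Monte Carlo estimate of the power function pi(mu) of the two-sided
    z-test of H0: mu = 0 with known sigma: the fraction of simulated
    samples whose z-statistic falls in the critical region."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sim):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = math.sqrt(n) * (sum(sample) / n) / sigma
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sim

# Under H0 (mu = 0) the rejection rate estimates the size of the test,
# which is approximately alpha; away from H0 it rises towards 1.
size_estimate = rejection_rate(mu=0.0)
power_at_one = rejection_rate(mu=1.0)
```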
Tests of hypothesis are most commonly evaluated based on their size and power.
An ideal test should have:
size equal to $0$ (i.e., the probability of rejecting the null hypothesis when the null hypothesis is true should be $0$);
power equal to $1$ when $\theta_0 \notin \Theta_R$ (i.e., the probability of rejecting the null hypothesis when the null hypothesis is false should be $1$).
Of course, such an ideal test is never found in practice: the best we can hope for is a test with a very small size and a very high probability of rejecting a false hypothesis. Nevertheless, this ideal is routinely used to choose among different tests.
For example:
if we choose between two tests having the same size, we will always utilize the test that has the higher power when $\theta_0 \notin \Theta_R$;
if we choose between two tests that have the same power when $\theta_0 \notin \Theta_R$, we will always utilize the test that has the smaller size.
Several other criteria, beyond power and size, are used to evaluate tests of hypothesis. We do not discuss them here, but we refer the reader to the very nice exposition in Berger and Casella (2002).
Examples of how the mathematics of hypothesis testing works can be found in the following lectures:
Hypothesis tests about the mean (examples of tests of hypothesis about the mean of an unknown distribution);
Hypothesis tests about the variance (examples of tests of hypothesis about the variance of an unknown distribution).
Berger, R. L. and G. Casella (2002) "Statistical inference", Duxbury Advanced Series.
Please cite as:
Taboga, Marco (2021). "Hypothesis testing", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/hypothesis-testing.