Central Limit Theorems (CLT) state conditions that are sufficient to guarantee the convergence of the standardized sample mean to a normal distribution as the sample size increases.
As Central Limit Theorems concern the sample mean, we first define it precisely.
Let $\{X_n\}$ be a sequence of random variables. We will denote by $\bar{X}_n$ the sample mean of the first $n$ terms of the sequence:
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$
When the sample size $n$ increases, we add more observations $X_i$ to the sample mean.
Note that the sample mean, being a sum of random variables, is itself a random variable.
The Central Limit Theorem tells us what happens to the distribution of the sample mean when we increase the sample size.
Remember that if the conditions of a Law of Large Numbers apply, the sample mean converges in probability to the expected value of the observations, that is,
$$\bar{X}_n \xrightarrow{P} \mu$$
In a Central Limit Theorem, we first standardize the sample mean, that is, we subtract from it its expected value and we divide it by its standard deviation. Then, we analyze the behavior of its distribution as the sample size gets large. What happens is that the standardized sample mean converges in distribution to a normal distribution:
$$\frac{\bar{X}_n - E[\bar{X}_n]}{\sqrt{Var[\bar{X}_n]}} \xrightarrow{d} Z$$
where $Z$ is a standard normal random variable.
In the important case in which the variables $X_n$ are independently and identically distributed (IID) with mean $\mu$ and variance $\sigma^2$, the formula above becomes
$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{d} Z$$
because
$$E[\bar{X}_n] = \mu$$
and
$$Var[\bar{X}_n] = \frac{\sigma^2}{n}$$
Several students are confused by the fact that the sample mean converges to a constant in the Law of Large Numbers, while it converges to a normal distribution in the Central Limit Theorem. This seems a contradiction: a normal distribution is not a constant!
The formula for the IID case may help to eliminate this kind of doubt: in the Law of Large Numbers, the variance of the sample mean, $Var[\bar{X}_n] = \sigma^2/n$, converges to zero, while in the Central Limit Theorem the deviation of the sample mean from its expected value is multiplied by $\sqrt{n}$, so that the variance of $\sqrt{n}(\bar{X}_n - \mu)$ stays constant at $\sigma^2$.
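A short simulation may make this distinction concrete. Below is a minimal sketch (assuming NumPy is available; the Exponential(1) distribution, whose mean and variance are both 1, and the sample sizes are arbitrary choices): as $n$ grows, the variance of $\bar{X}_n$ shrinks toward zero, while the variance of $\sqrt{n}(\bar{X}_n - \mu)$ hovers around $\sigma^2 = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 1.0                 # mean of Exponential(1); its variance is also 1
reps = 10_000            # Monte Carlo replications per sample size

for n in (10, 100, 1000):
    # each row is one sample of size n; row means are draws of the sample mean
    means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    print(f"n={n:5d}  Var[mean]={means.var():.5f}"
          f"  Var[sqrt(n)*(mean-mu)]={(np.sqrt(n) * (means - mu)).var():.5f}")
```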
In practice, the CLT is used as follows:

1. we observe a sample consisting of $n$ observations $x_1$, $x_2$, ..., $x_n$;

2. if $n$ is large enough, then a standard normal distribution is a good approximation of the distribution of the standardized sample mean;

3. therefore, we pretend that
$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \sim N(0, 1)$$
where $N(\mu, \sigma^2)$ indicates the normal distribution with mean $\mu$ and variance $\sigma^2$;

4. as a consequence, the distribution of the sample mean is
$$\bar{X}_n \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$
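The following sketch walks through this recipe by simulation (a minimal illustration, assuming NumPy; the Uniform(0, 1) distribution, with $\mu = 1/2$ and $\sigma^2 = 1/12$, and the sample size $n = 200$ are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 200, 50_000
mu = 0.5                        # mean of Uniform(0, 1)
sigma = np.sqrt(1 / 12)         # standard deviation of Uniform(0, 1)

# draw many samples of size n and standardize each sample mean
means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (means - mu) / sigma

# the standardized sample mean should behave like a standard normal variable
print("mean (should be near 0):    ", z.mean())
print("variance (should be near 1):", z.var())
print("P(Z <= 1.96) (near 0.975):  ", np.mean(z <= 1.96))
```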
There are several Central Limit Theorems. We report some examples below.
The best known Central Limit Theorem is probably the Lindeberg-Lévy CLT:
Proposition (Lindeberg-Lévy CLT) Let $\{X_n\}$ be an IID sequence of random variables such that:
$$E[X_n] = \mu < \infty \qquad \text{and} \qquad Var[X_n] = \sigma^2 < \infty$$
where $\sigma^2 > 0$. Then, a Central Limit Theorem applies to the sample mean $\bar{X}_n$:
$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{d} Z$$
where $Z$ is a standard normal random variable and $\xrightarrow{d}$ denotes convergence in distribution.
We will just sketch a proof. For a detailed and rigorous proof see, for example, Resnick (1999) and Williams (1991). First of all, denote by $\{Z_n\}$ the sequence whose generic term is
$$Z_n = \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{X_i - \mu}{\sigma}$$
The characteristic function of $Z_n$ is
$$\varphi_{Z_n}(t) = E\!\left[e^{itZ_n}\right] = \prod_{i=1}^{n} E\!\left[\exp\!\left(i\,\frac{t}{\sqrt{n}}\,\frac{X_i - \mu}{\sigma}\right)\right] = \left[\varphi\!\left(\frac{t}{\sqrt{n}}\right)\right]^{n}$$
where we have used the independence of the terms of the sequence, and $\varphi$ denotes the characteristic function of the standardized variable $(X_i - \mu)/\sigma$, which has mean $0$ and variance $1$. Now take a second order Taylor series expansion of $\varphi(t)$ around the point $t = 0$:
$$\varphi(t) = \varphi(0) + \varphi'(0)\,t + \frac{1}{2}\varphi''(0)\,t^2 + o(t^2) = 1 - \frac{1}{2}t^2 + o(t^2)$$
where $o(t^2)$ is an infinitesimal of higher order than $t^2$, that is, a quantity that converges to $0$ faster than $t^2$ does. Therefore,
$$\varphi\!\left(\frac{t}{\sqrt{n}}\right) = 1 - \frac{t^2}{2n} + o\!\left(\frac{t^2}{n}\right)$$
So, we have that
$$\lim_{n\to\infty}\varphi_{Z_n}(t) = \lim_{n\to\infty}\left[1 - \frac{t^2}{2n} + o\!\left(\frac{t^2}{n}\right)\right]^{n} = e^{-t^2/2}$$
where $e^{-t^2/2}$ is the characteristic function of a standard normal random variable (see the lecture entitled Normal distribution). A theorem, called the Lévy continuity theorem, which we do not cover in these lectures, states that if a sequence of random variables $\{Z_n\}$ is such that their characteristic functions $\varphi_{Z_n}(t)$ converge pointwise to the characteristic function $\varphi_Z(t)$ of a random variable $Z$, then the sequence $\{Z_n\}$ converges in distribution to $Z$. Therefore, in our case the sequence $\{Z_n\}$ converges in distribution to a standard normal distribution.
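The key limit in this proof is easy to verify numerically. The sketch below (assuming only the Python standard library; the standardized Exponential(1) distribution, whose characteristic function is $e^{-it}/(1 - it)$, and the point $t = 1.5$ are arbitrary choices) computes $[\varphi(t/\sqrt{n})]^n$ for increasing $n$ and compares it with $e^{-t^2/2}$.

```python
import cmath
import math

# characteristic function of the standardized Exponential(1) variable X - 1
def phi(t):
    return cmath.exp(-1j * t) / (1 - 1j * t)

t = 1.5
for n in (10, 100, 10_000):
    # characteristic function of Z_n, namely [phi(t / sqrt(n))]^n
    val = phi(t / math.sqrt(n)) ** n
    print(f"n={n:6d}  phi_Zn(t) = {val.real:.5f} {val.imag:+.5f}i")

print(f"limit exp(-t^2/2) = {math.exp(-t * t / 2):.5f}")
```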
So, roughly speaking, under the stated assumptions, the distribution of the sample mean $\bar{X}_n$ can be approximated by a normal distribution with mean $\mu$ and variance $\sigma^2/n$ (provided $n$ is large enough).
Also note that the conditions for the validity of the Lindeberg-Lévy Central Limit Theorem resemble the conditions for the validity of Kolmogorov's Strong Law of Large Numbers. The only difference is the additional requirement that
$$Var[X_n] = \sigma^2 < \infty$$
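As a rough numerical check of the approximation (a sketch, assuming NumPy; the Poisson(2) distribution, whose mean and variance both equal 2, and the threshold 2.1 are arbitrary choices), one can compare the simulated frequency of an event involving the sample mean with the probability assigned by the approximating normal distribution.

```python
import math
import numpy as np

rng = np.random.default_rng(7)
n, reps = 500, 20_000
lam = 2.0                       # Poisson(2): mean = variance = 2

# simulated distribution of the sample mean of n Poisson draws
means = rng.poisson(lam, size=(reps, n)).mean(axis=1)

# CLT approximation: sample mean ~ N(lam, lam / n)
x = 2.1
z = (x - lam) / math.sqrt(lam / n)
normal_cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))

print("simulated P(sample mean <= 2.1):", np.mean(means <= x))
print("normal approximation:           ", normal_cdf)
```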
In the Lindeberg-Lévy CLT (see above), the sequence $\{X_n\}$ is required to be an IID sequence. The assumption of independence can be weakened as follows.
Proposition (CLT for correlated sequences) Let $\{X_n\}$ be a stationary and mixing sequence of random variables satisfying a CLT technical condition (see the remarks below) and such that
$$E[X_n] = \mu < \infty \qquad \text{and} \qquad V = \lim_{n\to\infty} n\,Var[\bar{X}_n] < \infty$$
where $V > 0$. Then, a Central Limit Theorem applies to the sample mean $\bar{X}_n$:
$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sqrt{V}} \xrightarrow{d} Z$$
where $Z$ is a standard normal random variable and $\xrightarrow{d}$ indicates convergence in distribution.
Several different technical conditions (beyond those explicitly stated in the above proposition) are imposed in the literature in order to derive Central Limit Theorems for correlated sequences. These conditions are usually very mild and differ from author to author. We do not mention these technical conditions here and just refer to them as CLT technical conditions.
For a proof, see for example Durrett (2010) and White (2001).
So, roughly speaking, under the stated assumptions, the distribution of the sample mean $\bar{X}_n$ can be approximated by a normal distribution with mean $\mu$ and variance $V/n$ (provided $n$ is large enough).
Also note that the conditions for the validity of the Central Limit Theorem for correlated sequences resemble the conditions for the validity of the ergodic theorem. The main differences (beyond some technical conditions that are not explicitly stated in the above proposition) are the additional requirements that
$$Var[X_n] < \infty \qquad \text{and} \qquad V = \lim_{n\to\infty} n\,Var[\bar{X}_n] < \infty$$
and the fact that ergodicity is replaced by the stronger condition of mixing.
Finally, let us mention that the variance $V$ in the above proposition, which is defined as
$$V = \lim_{n\to\infty} n\,Var[\bar{X}_n] = Var[X_n] + 2\sum_{j=1}^{\infty} Cov[X_n, X_{n-j}]$$
is called the long-run variance of $\bar{X}_n$.
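As a concrete illustration (a minimal sketch, assuming NumPy; the AR(1) process with coefficient $\rho = 0.5$ and standard normal shocks is an arbitrary example, for which the long-run variance has the closed form $V = 1/(1-\rho)^2$; the zero initial condition makes the simulated paths only approximately stationary, which is negligible at this length), one can check that $n\,Var[\bar{X}_n]$ estimated by simulation is close to the analytic $V$ rather than to $Var[X_n]$.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n, reps = 0.5, 1_000, 10_000

# simulate a stationary, mixing AR(1): X_t = rho * X_(t-1) + eps_t, eps_t ~ N(0,1)
eps = rng.standard_normal((reps, n))
x = np.zeros((reps, n))
for t in range(1, n):
    x[:, t] = rho * x[:, t - 1] + eps[:, t]

# n * Var[sample mean] should approach the long-run variance V = 1 / (1 - rho)^2
means = x.mean(axis=1)
print("n * Var[sample mean]:", n * means.var())
print("long-run variance V :", 1 / (1 - rho) ** 2)
```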
The results illustrated above for sequences of random variables extend in a straightforward manner to sequences of random vectors. For example, the multivariate version of the Lindeberg-Lévy CLT is as follows.
Proposition (Multivariate Lindeberg-Lévy CLT) Let $\{X_n\}$ be an IID sequence of $K \times 1$ random vectors such that
$$E[X_n] = \mu \qquad \text{and} \qquad Var[X_n] = \Sigma$$
where $\Sigma$ is an invertible covariance matrix. Let $\bar{X}_n$ be the vector of sample means. Then,
$$\sqrt{n}\,\Sigma^{-1/2}\left(\bar{X}_n - \mu\right) \xrightarrow{d} Z$$
where $Z$ is a standard multivariate normal random vector and $\xrightarrow{d}$ denotes convergence in distribution.
For a proof see, for example, Basu (2004), DasGupta (2008) and McCabe and Tremayne (1993).
In a similar manner, the CLT for correlated sequences generalizes to random vectors ($V$ becomes a matrix, called the long-run covariance matrix).
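The multivariate statement can also be checked by simulation. The sketch below (assuming NumPy; the two-dimensional example with components $E \sim$ Exponential(1) and $E + W$, $W \sim N(0, 1)$, is an arbitrary choice, and a Cholesky factor is used as the square root of $\Sigma$) verifies that $\sqrt{n}\,\Sigma^{-1/2}(\bar{X}_n - \mu)$ has covariance close to the identity matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 500, 10_000

# IID 2-dimensional vectors: X = (E, E + W) with E ~ Exp(1), W ~ N(0, 1)
e = rng.exponential(1.0, size=(reps, n))
w = rng.standard_normal((reps, n))
x = np.stack([e, e + w], axis=-1)           # shape (reps, n, 2)

mu = np.array([1.0, 1.0])                   # E[E] = 1 and E[E + W] = 1
sigma = np.array([[1.0, 1.0],               # Var[E] = 1, Cov[E, E + W] = 1
                  [1.0, 2.0]])              # Var[E + W] = 2

# standardize the vector of sample means with an inverse square root of Sigma
root_inv = np.linalg.inv(np.linalg.cholesky(sigma))
z = np.sqrt(n) * (x.mean(axis=1) - mu) @ root_inv.T

print("covariance of standardized sample means (should be near the identity):")
print(np.cov(z.T))
```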
Below you can find some exercises with explained solutions.
Exercise 1. Let $\{X_n\}$ be a sequence of independent Bernoulli random variables with parameter $p$, i.e. a generic term $X_n$ of the sequence has support
$$R_{X_n} = \{0, 1\}$$
and probability mass function
$$p_{X_n}(x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \\ 0 & \text{otherwise} \end{cases}$$
Use a Central Limit Theorem to derive an approximate distribution for the mean of the first $n$ terms of the sequence.
Solution. The sequence $\{X_n\}$ is an IID sequence. The mean of a generic term of the sequence is
$$E[X_n] = 1 \cdot p + 0 \cdot (1 - p) = p$$
The variance of a generic term of the sequence can be derived thanks to the usual formula for computing the variance ($Var[X] = E[X^2] - (E[X])^2$):
$$E[X_n^2] = 1^2 \cdot p + 0^2 \cdot (1 - p) = p$$
$$Var[X_n] = E[X_n^2] - (E[X_n])^2 = p - p^2 = p(1 - p)$$
Therefore, the sequence $\{X_n\}$ satisfies the conditions of the Lindeberg-Lévy Central Limit Theorem (IID, finite mean, finite variance). The mean of the first $n$ terms of the sequence is
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Using the Central Limit Theorem to approximate its distribution, we obtain
$$\sqrt{n}\,\frac{\bar{X}_n - p}{\sqrt{p(1 - p)}} \sim N(0, 1)$$
or
$$\bar{X}_n \sim N\!\left(p, \frac{p(1 - p)}{n}\right)$$
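A quick numerical check of this result (a minimal sketch, assuming NumPy; the values $p = 0.3$ and $n = 100$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
p, n, reps = 0.3, 100, 50_000

# sample means of n Bernoulli(p) draws, replicated many times
means = rng.binomial(1, p, size=(reps, n)).mean(axis=1)

# CLT approximation: sample mean ~ N(p, p * (1 - p) / n)
print("simulated mean:", means.mean(), " vs p         =", p)
print("simulated var :", means.var(), " vs p(1-p)/n =", p * (1 - p) / n)
```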
Exercise 2. Let $\{X_n\}$ be a sequence of independent Bernoulli random variables with parameter $p$, as in the previous exercise. Let $\{Y_n\}$ be another sequence of random variables such that
$$Y_n = X_n + X_{n-1}$$
Suppose $\{Y_n\}$ satisfies the conditions of a Central Limit Theorem for correlated sequences. Derive an approximate distribution for the mean of the first $n$ terms of the sequence $\{Y_n\}$.
Solution. The sequence $\{X_n\}$ is an IID sequence. The mean of a generic term of the sequence $\{Y_n\}$ is
$$E[Y_n] = E[X_n] + E[X_{n-1}] = 2p$$
The variance of a generic term of the sequence is
$$Var[Y_n] = Var[X_n] + Var[X_{n-1}] = 2p(1 - p)$$
where we have used the fact that $X_n$ and $X_{n-1}$ are independent. The covariance between two successive terms of the sequence is
$$Cov[Y_n, Y_{n-1}] = Cov[X_n + X_{n-1},\, X_{n-1} + X_{n-2}] = Var[X_{n-1}] = p(1 - p)$$
The covariance between two terms that are not adjacent ($Y_n$ and $Y_{n-j}$, with $j > 1$) is
$$Cov[Y_n, Y_{n-j}] = 0$$
because they have no terms of the sequence $\{X_n\}$ in common. The long-run variance is
$$V = Var[Y_n] + 2\sum_{j=1}^{\infty} Cov[Y_n, Y_{n-j}] = 2p(1 - p) + 2p(1 - p) = 4p(1 - p)$$
The mean of the first $n$ terms of the sequence is
$$\bar{Y}_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$$
Using the Central Limit Theorem for correlated sequences to approximate its distribution, we obtain
$$\sqrt{n}\,\frac{\bar{Y}_n - 2p}{\sqrt{4p(1 - p)}} \sim N(0, 1)$$
or
$$\bar{Y}_n \sim N\!\left(2p, \frac{4p(1 - p)}{n}\right)$$
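This long-run variance can be verified by simulation (a sketch, assuming NumPy and the definition $Y_n = X_n + X_{n-1}$ used above; $p = 0.3$ and $n = 1000$ are arbitrary choices): $n\,Var[\bar{Y}_n]$ should be close to $4p(1-p)$.

```python
import numpy as np

rng = np.random.default_rng(9)
p, n, reps = 0.3, 1_000, 10_000

# draw n + 1 Bernoulli(p) variables per path so that Y_1 = X_1 + X_0 is defined
x = rng.binomial(1, p, size=(reps, n + 1))
y = x[:, 1:] + x[:, :-1]                    # Y_i = X_i + X_(i-1)

# n * Var[sample mean of Y] should be close to the long-run variance 4p(1 - p)
means = y.mean(axis=1)
print("n * Var[sample mean]:", n * means.var())
print("4p(1 - p)           :", 4 * p * (1 - p))
```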
Exercise 3. Let $X$ be a binomial random variable with parameters $n$ and $p$ (you need to read the lecture entitled Binomial distribution in order to be able to solve this exercise). By using the Central Limit Theorem, show that a normal random variable with mean $np$ and variance $np(1 - p)$ can be used as an approximation of $X$.
Solution. A binomial random variable $X$ with parameters $n$ and $p$ can be written as
$$X = \sum_{i=1}^{n} Y_i$$
where $Y_1$, ..., $Y_n$ are mutually independent Bernoulli random variables with parameter $p$. Thus,
$$X = n\,\bar{Y}_n$$
where $\bar{Y}_n$ is the sample mean of $Y_1$, ..., $Y_n$. In the first exercise, we have shown that the distribution of $\bar{Y}_n$ can be approximated by a normal distribution:
$$\bar{Y}_n \sim N\!\left(p, \frac{p(1 - p)}{n}\right)$$
Therefore, the distribution of $X = n\,\bar{Y}_n$ can be approximated by
$$X \sim N\!\left(np,\; n^2\,\frac{p(1 - p)}{n}\right) = N\!\left(np,\; np(1 - p)\right)$$
Thus, $X$ can be approximated by a normal distribution with mean $np$ and variance $np(1 - p)$.
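The accuracy of this approximation is easy to inspect numerically. The sketch below (assuming only the Python standard library; the values $n = 50$ and $p = 0.3$ are arbitrary, and it applies the standard continuity correction, a refinement not discussed in this lecture) compares the exact binomial CDF with its normal approximation.

```python
import math

n, p = 50, 0.3
mu, var = n * p, n * p * (1 - p)

def normal_cdf(x, mean, variance):
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * variance)))

def binomial_cdf(k, n, p):
    # exact CDF obtained by summing the binomial probability mass function
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

for k in (10, 15, 20):
    exact = binomial_cdf(k, n, p)
    # continuity correction: P(X <= k) is matched to P(normal <= k + 0.5)
    approx = normal_cdf(k + 0.5, mu, var)
    print(f"P(X <= {k}):  exact = {exact:.4f}   normal approx = {approx:.4f}")
```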
Basu, A. K. (2004) Measure theory and probability, PHI Learning PVT.
DasGupta, A. (2008) Asymptotic theory of statistics and probability, Springer.
Durrett, R. (2010) Probability: theory and examples, Cambridge University Press.
McCabe, B. and A. Tremayne (1993) Elements of modern asymptotic theory with statistical applications, Manchester University Press.
Resnick, S. I. (1999) A probability path, Birkhauser.
White, H. (2001) Asymptotic theory for econometricians, Academic Press.
Williams, D. (1991) Probability with martingales, Cambridge University Press.