Autocorrelation is the coefficient of linear correlation between two terms of a sequence of random variables.
Autocorrelation is also called serial correlation.
The following is a formal definition.
Definition
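Let $\{X_t\}$ be a sequence of random variables. The autocorrelation coefficient between two terms of the sequence $X_t$ and $X_{t-j}$ is
$$\rho(X_t, X_{t-j}) = \frac{\mathrm{Cov}[X_t, X_{t-j}]}{\sqrt{\mathrm{Var}[X_t]\,\mathrm{Var}[X_{t-j}]}}$$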
In other words, the autocorrelation coefficient is just the coefficient of linear correlation between two random variables belonging to the same sequence.
Note that the covariance $\mathrm{Cov}[X_t, X_{t-j}]$ is called autocovariance.
Remember that a sequence of random variables is said to be covariance stationary (or weakly stationary) if and only if:
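all the terms of the sequence have the same mean: $\mathrm{E}[X_t] = \mu$ for all $t$;
the covariance between any two terms of the sequence depends only on how far apart they are located from each other, and not on where they are located in the sequence: $\mathrm{Cov}[X_t, X_{t-j}] = \gamma_j$ for all $t$ and $j$.
The second of these two properties implies that all the random variables in the sequence have the same variance: $\mathrm{Var}[X_t] = \gamma_0$ for all $t$, because $\mathrm{Var}[X_t] = \mathrm{Cov}[X_t, X_t]$.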
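When a sequence is covariance stationary, the autocorrelation coefficient between two terms of the sequence $X_t$ and $X_{t-j}$ depends only on $j$:
$$\rho(X_t, X_{t-j}) = \frac{\mathrm{Cov}[X_t, X_{t-j}]}{\sqrt{\mathrm{Var}[X_t]\,\mathrm{Var}[X_{t-j}]}} = \frac{\gamma_j}{\sqrt{\gamma_0\,\gamma_0}} = \frac{\gamma_j}{\gamma_0}$$
We denote it by $\rho_j$:
$$\rho_j = \frac{\gamma_j}{\gamma_0}$$
and we call it autocorrelation at lag $j$ (the distance $j$ between two terms of the sequence is called lag).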
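When we observe the first $T$ realizations of a sequence $\{X_t\}$, we can compute the sample autocorrelation at lag $j$:
$$\hat{\rho}_j = \frac{\frac{1}{T}\sum_{t=j+1}^{T}(X_t - \bar{X})(X_{t-j} - \bar{X})}{\frac{1}{T}\sum_{t=1}^{T}(X_t - \bar{X})^2}$$
where $\bar{X} = \frac{1}{T}\sum_{t=1}^{T} X_t$ is the sample mean.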
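If $\{X_t\}$ is covariance stationary (and satisfies mild additional conditions, such as ergodicity), then the numerator of $\hat{\rho}_j$ is a consistent estimator of $\gamma_j$ and the denominator is a consistent estimator of $\gamma_0$.
As a consequence, $\hat{\rho}_j$ is a consistent estimator of the autocorrelation at lag $j$.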
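The sample autocorrelation can be computed in a few lines. The following is a minimal sketch in plain Python, not a reference implementation: the function name `sample_autocorrelation` and the toy data are illustrative, and both the lag-`j` sum and the variance are assumed to be normalized by the sample size (other conventions divide the numerator by the number of available pairs instead).

```python
def sample_autocorrelation(x, j):
    """Sample autocorrelation at lag j of the observations in x.

    Assumption: numerator and denominator are both divided by T = len(x).
    """
    T = len(x)
    if not 0 <= j < T:
        raise ValueError("lag must satisfy 0 <= j < len(x)")
    x_bar = sum(x) / T  # sample mean
    # Numerator: sample autocovariance at lag j.
    num = sum((x[t] - x_bar) * (x[t - j] - x_bar) for t in range(j, T)) / T
    # Denominator: sample variance (sample autocovariance at lag 0).
    den = sum((v - x_bar) ** 2 for v in x) / T
    return num / den


# Toy data (made up for illustration): a pattern that repeats every 4 steps.
x = [2.0, 4.0, 6.0, 4.0, 2.0, 4.0, 6.0, 4.0]
for lag in range(5):
    print(lag, sample_autocorrelation(x, lag))
```

With this normalization the value at lag 0 is always 1, and a constant sample (zero variance) raises a `ZeroDivisionError`, which a production version would probably want to handle explicitly.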
The autocorrelation function (ACF) is the function that maps lags to autocorrelations, that is, $\rho_j$ is considered as a function of $j$ (see the examples below).
When the mapping is from lags to sample autocorrelations $\hat{\rho}_j$, then we call it sample ACF.
An ACF plot is a bar chart (or a line chart) that plots the autocorrelation function:
lags are on the x-axis;
the autocorrelations corresponding to the lags are on the y-axis.
Let's look at some examples of ACF and ACF plots.
Suppose that $\{X_t\}$ is a covariance stationary sequence such that
$$X_t = \phi X_{t-1} + \varepsilon_t$$
where $\phi$ is a constant and $\{\varepsilon_t\}$ is an IID sequence of standard normal random variables (zero mean and unit variance).
Such a sequence is called an autoregressive process of order 1, or AR(1) process (the order is the maximum lag of the sequence on the right-hand side of the equation).
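Note that
$$X_t = \phi X_{t-1} + \varepsilon_t = \phi(\phi X_{t-2} + \varepsilon_{t-1}) + \varepsilon_t = \phi^2 X_{t-2} + \phi\,\varepsilon_{t-1} + \varepsilon_t = \ldots = \phi^j X_{t-j} + \sum_{k=0}^{j-1}\phi^k \varepsilon_{t-k}$$
where we have performed recursive substitutions of $X_{t-k}$ with $\phi X_{t-k-1} + \varepsilon_{t-k}$.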
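By using this expression for $X_t$, we can easily derive the autocovariance at lag $j$:
$$\begin{aligned}
\gamma_j &= \mathrm{Cov}[X_t, X_{t-j}] \\
&= \mathrm{Cov}\Big[\phi^j X_{t-j} + \sum_{k=0}^{j-1}\phi^k \varepsilon_{t-k},\; X_{t-j}\Big] \\
&\overset{A}{=} \mathrm{Cov}\big[\phi^j X_{t-j},\, X_{t-j}\big] + \mathrm{Cov}\Big[\sum_{k=0}^{j-1}\phi^k \varepsilon_{t-k},\; X_{t-j}\Big] \\
&\overset{B}{=} \phi^j\,\mathrm{Cov}[X_{t-j}, X_{t-j}] + \sum_{k=0}^{j-1}\phi^k\,\mathrm{Cov}[\varepsilon_{t-k}, X_{t-j}] \\
&\overset{C}{=} \phi^j\,\mathrm{Var}[X_{t-j}] \\
&= \phi^j \gamma_0
\end{aligned}$$
where: in steps $A$ and $B$ we have used the bilinearity of the covariance operator and in step $C$ we have used the facts that 1) the covariance of a random variable with itself is equal to its variance; 2) the covariance between $\varepsilon_{t-k}$ and $X_{t-j}$ is zero for any $k < j$ because $X_{t-j}$ depends only on $\varepsilon_{t-s}$ for $s \geq j$ and the sequence $\{\varepsilon_t\}$ is IID.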
Thus, the autocorrelation at lag $j$ is
$$\rho_j = \frac{\gamma_j}{\gamma_0} = \phi^j$$
The following ACF plots show the autocorrelation function for different values of $\phi$.
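For the AR(1) process above, writing `phi` for the constant that multiplies the lagged term, the autocorrelation at lag j works out to `phi` raised to the power j, so the ACF can be tabulated directly. The sketch below is only illustrative: the function name `ar1_acf` and the four values of `phi` are assumptions chosen for the example and are not necessarily the values used in the plots.

```python
def ar1_acf(phi, max_lag):
    """Theoretical autocorrelations at lags 0..max_lag of an AR(1) process.

    Assumption: |phi| < 1, the usual condition under which an AR(1)
    process driven by past shocks is covariance stationary.
    """
    if not -1 < phi < 1:
        raise ValueError("this sketch assumes |phi| < 1")
    return [phi ** j for j in range(max_lag + 1)]


# Hypothetical coefficients, chosen only to show the typical shapes.
for phi in (0.9, 0.5, -0.5, -0.9):
    print(phi, [round(r, 3) for r in ar1_acf(phi, 5)])
```

A positive coefficient gives autocorrelations that decay geometrically towards zero, while a negative one gives autocorrelations that alternate in sign as they decay.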
In this example, we show what a sample ACF looks like.
We generate, via Monte Carlo simulations, 200 realizations for each of the four AR(1) processes whose ACFs have been plotted above. The realizations are plotted below.
We then compute their sample ACFs, which are plotted below.
These are the sample versions of the ACFs shown in Example 1. As the sample autocorrelations are noisy estimates of the true autocorrelations, these ACFs do not coincide with those shown in Example 1.
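An experiment of this kind can be sketched in plain Python. This is not the code used to produce the plots: the function names, the coefficient 0.8, the seed, and the burn-in length are all assumptions made for illustration, and the sample ACF divides both the numerator and the denominator by the sample size.

```python
import random


def simulate_ar1(phi, T, seed=None, burn_in=200):
    """Simulate T realizations of X_t = phi * X_{t-1} + eps_t, eps_t ~ N(0, 1).

    The first burn_in draws are discarded so that the arbitrary starting
    value 0.0 has little influence on the retained path (an assumption of
    this sketch, not something prescribed by the text).
    """
    rng = random.Random(seed)
    x = 0.0
    path = []
    for t in range(burn_in + T):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            path.append(x)
    return path


def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag (both sums divided by T)."""
    T = len(x)
    x_bar = sum(x) / T
    dev = [v - x_bar for v in x]
    gamma0 = sum(d * d for d in dev) / T  # sample variance
    return [
        sum(dev[t] * dev[t - j] for t in range(j, T)) / T / gamma0
        for j in range(max_lag + 1)
    ]


phi = 0.8  # hypothetical coefficient
x = simulate_ar1(phi, T=200, seed=0)
# Columns: lag, sample autocorrelation, theoretical value phi ** lag.
for j, r in enumerate(sample_acf(x, 5)):
    print(j, round(r, 3), round(phi ** j, 3))
```

With only 200 observations the two columns generally differ noticeably; increasing `T` should bring the sample values closer to the theoretical ones, in line with the consistency of the estimator.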
Please cite as:
Taboga, Marco (2021). "Autocorrelation", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/autocorrelation.