The Expectation-Maximization (EM) algorithm is an iterative algorithm that can be used to search for the maximum likelihood estimators of model parameters when the model includes some unobservable variables (also called latent variables).
In a latent-variable model, there are two vectors:

the observed data $x$;

the vector of unobserved variables $z$.

We denote their joint probability by $p(x, z; \theta)$, where $\theta$ is a vector of parameters to be estimated.
The joint distribution of $x$ and $z$ is usually obtained by separately specifying the conditional distribution of the observed variables, $p(x \mid z; \theta)$, and the marginal distribution of the latent variables, $p(z; \theta)$. As a consequence, the joint distribution is
$$p(x, z; \theta) = p(x \mid z; \theta) \, p(z; \theta).$$
In the context of latent-variable models, the joint distribution is often called the complete likelihood.
Given the model specification above, if $z$ is a discrete random vector, we can derive the marginal distribution of $x$ as
$$p(x; \theta) = \sum_{z \in \mathcal{Z}} p(x \mid z; \theta) \, p(z; \theta),$$
where the sum is over the set $\mathcal{Z}$ of all the values that $z$ can possibly take (its support).
We can also derive the conditional distribution of $z$ given $x$ as
$$p(z \mid x; \theta) = \frac{p(x \mid z; \theta) \, p(z; \theta)}{p(x; \theta)}.$$
If $z$ is continuous, the sums above are replaced by integrals.
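To make these formulas concrete, the following Python sketch (not part of the original lecture; the model and all parameter values are illustrative assumptions) computes the marginal and the conditional distribution for a two-component Poisson mixture, where the latent $z \in \{0, 1\}$ selects the component.

```python
import numpy as np
from scipy.stats import poisson

# Illustrative two-component Poisson mixture:
# p(z; theta) = weights[z], p(x | z; theta) = Poisson(x; rates[z]).
weights = np.array([0.3, 0.7])   # marginal distribution of the latent z
rates = np.array([2.0, 10.0])    # conditional distribution of x given z

x_obs = 4  # an observed data point

# Marginal: p(x; theta) = sum over z of p(x | z; theta) p(z; theta)
joint = poisson.pmf(x_obs, rates) * weights   # p(x, z; theta) for z = 0, 1
marginal = joint.sum()

# Conditional: p(z | x; theta) = p(x, z; theta) / p(x; theta)
posterior = joint / marginal

print(marginal)   # p(x = 4; theta)
print(posterior)  # p(z = 0 | x = 4; theta), p(z = 1 | x = 4; theta)
```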
The maximum likelihood estimator (MLE) of the parameter $\theta$ is the vector $\widehat{\theta}$ that solves the maximization problem
$$\widehat{\theta} = \operatorname*{argmax}_{\theta} \log p(x; \theta).$$
This problem does not usually have an analytical solution. As a consequence, we need to solve it with an iterative procedure that starts from an initial guess $\theta_0$ of the solution and produces a sequence $\theta_1, \theta_2, \ldots$ that hopefully converges to the solution.
We say hopefully because we are often unable to derive theoretical guarantees about the numerical convergence of likelihood maximization algorithms, especially for latent-variable models.
The EM algorithm is one of the iterative procedures that can be used to search for a solution when we are dealing with a latent-variable model specified as above.
Starting from an initial guess $\theta_0$, the $j$-th iteration of the EM algorithm consists of the following steps:

1. use the parameter value $\theta_{j-1}$ found in the previous iteration to compute the conditional probabilities $p(z \mid x; \theta_{j-1})$ for each $z \in \mathcal{Z}$;

2. use the conditional probabilities derived in step 1 to compute the expected value of the complete log-likelihood:
$$Q(\theta; \theta_{j-1}) = \sum_{z \in \mathcal{Z}} p(z \mid x; \theta_{j-1}) \log p(x, z; \theta);$$

3. solve the maximization problem
$$\theta_j = \operatorname*{argmax}_{\theta} Q(\theta; \theta_{j-1});$$

4. if the parameter update is smaller than a pre-specified threshold $\varepsilon$, that is, if
$$\|\theta_j - \theta_{j-1}\| < \varepsilon,$$
stop the algorithm, else return to step 1.
Steps 1 and 2 are collectively called the Expectation step, while step 3 is called the Maximization step. Hence the name of the algorithm (Expectation-Maximization).
Step 4 is a stopping criterion: we stop the algorithm when there are no significant changes in the parameter.
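To show how the four steps fit together, here is a minimal Python sketch of the EM loop for the two-component Poisson mixture introduced in the earlier example (the function name, the vectorized layout, and the default threshold are assumptions made for illustration, not part of the original lecture). Steps 1 and 2 appear as the computation of the posterior matrix, step 3 as the closed-form updates, and step 4 as the norm test.

```python
import numpy as np
from scipy.stats import poisson

def em_poisson_mixture(x, weights, rates, eps=1e-6, max_iter=500):
    """EM for a two-component Poisson mixture (illustrative sketch).

    x       : 1-D array of observed counts
    weights : initial guess for the mixing weights p(z; theta)
    rates   : initial guess for the Poisson rates of p(x | z; theta)
    """
    theta = np.concatenate([weights, rates])
    for _ in range(max_iter):
        # E step (steps 1-2): posterior p(z | x_i; theta_{j-1}) for each i
        joint = poisson.pmf(x[:, None], rates[None, :]) * weights  # (n, 2)
        resp = joint / joint.sum(axis=1, keepdims=True)

        # M step (step 3): closed-form maximizer of Q(theta; theta_{j-1})
        weights = resp.mean(axis=0)
        rates = resp.T @ x / resp.sum(axis=0)

        # Step 4: stop when the parameter update is small
        theta_new = np.concatenate([weights, rates])
        if np.linalg.norm(theta_new - theta) < eps:
            break
        theta = theta_new
    return weights, rates
```

Because the observations are assumed i.i.d. in this sketch, the complete log-likelihood is a sum over data points, which is why the E step reduces to computing one posterior per observation.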
In general, the algorithm is not guaranteed to converge to a global maximum of the likelihood.
However, it has the important property that the likelihood is guaranteed to increase at each iteration:
$$\log p(x; \theta_j) \geq \log p(x; \theta_{j-1})$$
for every $j$.
Proof. The conditional probability formula
$$p(z \mid x; \theta) = \frac{p(x, z; \theta)}{p(x; \theta)}$$
implies that
$$\log p(x; \theta) = \log p(x, z; \theta) - \log p(z \mid x; \theta).$$
If we multiply both sides by $p(z \mid x; \theta_{j-1})$, we obtain
$$p(z \mid x; \theta_{j-1}) \log p(x; \theta) = p(z \mid x; \theta_{j-1}) \log p(x, z; \theta) - p(z \mid x; \theta_{j-1}) \log p(z \mid x; \theta).$$
Then, we can sum over the support of $z$:
$$\sum_{z \in \mathcal{Z}} p(z \mid x; \theta_{j-1}) \log p(x; \theta) = \sum_{z \in \mathcal{Z}} p(z \mid x; \theta_{j-1}) \log p(x, z; \theta) - \sum_{z \in \mathcal{Z}} p(z \mid x; \theta_{j-1}) \log p(z \mid x; \theta).$$
Since the log-likelihood $\log p(x; \theta)$ does not depend on $z$ and
$$\sum_{z \in \mathcal{Z}} p(z \mid x; \theta_{j-1}) = 1,$$
we have
$$\sum_{z \in \mathcal{Z}} p(z \mid x; \theta_{j-1}) \log p(x; \theta) = \log p(x; \theta).$$
Moreover,
$$\sum_{z \in \mathcal{Z}} p(z \mid x; \theta_{j-1}) \log p(x, z; \theta) = Q(\theta; \theta_{j-1}).$$
Thus,
$$\log p(x; \theta) = Q(\theta; \theta_{j-1}) - \sum_{z \in \mathcal{Z}} p(z \mid x; \theta_{j-1}) \log p(z \mid x; \theta). \tag{1}$$
Setting $\theta = \theta_{j-1}$, we obtain
$$\log p(x; \theta_{j-1}) = Q(\theta_{j-1}; \theta_{j-1}) - \sum_{z \in \mathcal{Z}} p(z \mid x; \theta_{j-1}) \log p(z \mid x; \theta_{j-1}). \tag{2}$$
Subtracting (2) from (1), we get
$$\log p(x; \theta) - \log p(x; \theta_{j-1}) = Q(\theta; \theta_{j-1}) - Q(\theta_{j-1}; \theta_{j-1}) + \sum_{z \in \mathcal{Z}} p(z \mid x; \theta_{j-1}) \log \frac{p(z \mid x; \theta_{j-1})}{p(z \mid x; \theta)}.$$
By Jensen's inequality, we have
$$\sum_{z \in \mathcal{Z}} p(z \mid x; \theta_{j-1}) \log \frac{p(z \mid x; \theta_{j-1})}{p(z \mid x; \theta)} \geq -\log \left( \sum_{z \in \mathcal{Z}} p(z \mid x; \theta_{j-1}) \frac{p(z \mid x; \theta)}{p(z \mid x; \theta_{j-1})} \right) = -\log \left( \sum_{z \in \mathcal{Z}} p(z \mid x; \theta) \right) = -\log 1 = 0.$$
Therefore,
$$\log p(x; \theta) - \log p(x; \theta_{j-1}) \geq Q(\theta; \theta_{j-1}) - Q(\theta_{j-1}; \theta_{j-1}),$$
which, for $\theta = \theta_j$, becomes
$$\log p(x; \theta_j) - \log p(x; \theta_{j-1}) \geq Q(\theta_j; \theta_{j-1}) - Q(\theta_{j-1}; \theta_{j-1}).$$
But
$$Q(\theta_j; \theta_{j-1}) \geq Q(\theta_{j-1}; \theta_{j-1})$$
because $\theta_j$ maximizes $Q(\theta; \theta_{j-1})$ by construction. Therefore,
$$\log p(x; \theta_j) - \log p(x; \theta_{j-1}) \geq 0,$$
or
$$\log p(x; \theta_j) \geq \log p(x; \theta_{j-1}).$$
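The monotonic increase can also be checked numerically. Continuing the earlier Poisson-mixture sketch (the data-generating values below are arbitrary assumptions for illustration), the following snippet records $\log p(x; \theta_{j-1})$ at each iteration and verifies that the sequence never decreases:

```python
rng = np.random.default_rng(42)
# Simulate data from an assumed two-component Poisson mixture.
z = rng.random(500) < 0.3
x = np.where(z, rng.poisson(2.0, 500), rng.poisson(10.0, 500))

weights, rates = np.array([0.5, 0.5]), np.array([1.0, 5.0])
lls = []
for _ in range(50):
    joint = poisson.pmf(x[:, None], rates[None, :]) * weights
    lls.append(np.log(joint.sum(axis=1)).sum())        # log p(x; theta_{j-1})
    resp = joint / joint.sum(axis=1, keepdims=True)    # E step
    weights = resp.mean(axis=0)                        # M step
    rates = resp.T @ x / resp.sum(axis=0)

# The log-likelihood sequence is non-decreasing (up to float tolerance).
assert all(b >= a - 1e-9 for a, b in zip(lls, lls[1:]))
```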
The EM algorithm is particularly advantageous when the maximization problem in the Maximization step has a closed-form solution, that is, when the maximizer of $Q(\theta; \theta_{j-1})$ can be derived analytically.
This happens, for example, when the latent-variable model is a mixture of multivariate normal distributions.
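For instance, in a mixture of $K$ multivariate normal distributions with i.i.d. observations $x_1, \ldots, x_n$, mixing weights $\pi_k$, means $\mu_k$ and covariance matrices $\Sigma_k$, writing $r_{ik} = p(z_i = k \mid x_i; \theta_{j-1})$ for the conditional probabilities computed in the Expectation step, the maximizer of $Q(\theta; \theta_{j-1})$ takes the well-known closed form (stated here for illustration rather than derived):
$$\pi_k = \frac{1}{n} \sum_{i=1}^{n} r_{ik}, \qquad \mu_k = \frac{\sum_{i=1}^{n} r_{ik} \, x_i}{\sum_{i=1}^{n} r_{ik}}, \qquad \Sigma_k = \frac{\sum_{i=1}^{n} r_{ik} \, (x_i - \mu_k)(x_i - \mu_k)^\top}{\sum_{i=1}^{n} r_{ik}}.$$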
As we said, there is no guarantee that the EM algorithm converges to a global maximum of the likelihood.
If we suspect that the likelihood may have multiple local maxima, we should use the multiple-starts approach. In other words, we should run the EM algorithm several times with different starting values $\theta_0$.
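One simple way to implement multiple starts, reusing the hypothetical em_poisson_mixture function and the simulated data x from the earlier sketches (the number of starts and the sampling ranges are arbitrary choices), is to draw random initializations and keep the run that attains the highest final log-likelihood:

```python
rng = np.random.default_rng(0)
best = None
for _ in range(20):                         # 20 starts, an arbitrary choice
    w0 = rng.dirichlet([1.0, 1.0])          # random initial mixing weights
    r0 = rng.uniform(0.5, 20.0, size=2)     # random initial Poisson rates
    w, r = em_poisson_mixture(x, w0, r0)
    # log-likelihood of the final parameter estimate
    ll = np.log(poisson.pmf(x[:, None], r[None, :]) @ w).sum()
    if best is None or ll > best[0]:
        best = (ll, w, r)
```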
The EM algorithm is often used to estimate the parameters of a mixture of normal distributions (also called a Gaussian mixture). See the related lecture for details.
Please cite as:
Taboga, Marco (2021). "EM algorithm", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/EM-algorithm.