A hierarchical Bayesian model is a model in which the prior distribution of some of the model parameters depends on other parameters, which are also assigned a prior.
Given the observed data $x$, in a hierarchical Bayesian model, the likelihood depends on two parameter vectors $\theta$ and $\phi$,
$$p(x \mid \theta, \phi),$$
and the prior
$$p(\theta, \phi) = p(\theta \mid \phi)\, p(\phi)$$
is specified by separately specifying the conditional distribution $p(\theta \mid \phi)$ and the distribution $p(\phi)$.
In the literature it is often required that the likelihood does not depend on $\phi$, that is,
$$p(x \mid \theta, \phi) = p(x \mid \theta). \tag{1}$$
In this special case, the parameter $\phi$ is called a hyper-parameter and the prior $p(\phi)$ is called a hyper-prior.
We use a broader definition of hierarchical model, one that does not necessarily include assumption (1), because it allows for a unified treatment of several interesting models.
The following examples illustrate two popular models that fall within our definition.
Suppose the sample $x = \left[ x_1 \ \ldots \ x_n \right]$ is a vector of draws $x_i$ from $n$ normal distributions having different unknown means $\theta_i$ and a known common variance $\sigma^2$:
$$x_i \sim N(\theta_i, \sigma^2), \quad i = 1, \ldots, n.$$
Denote by $\theta$ the vector of means:
$$\theta = \left[ \theta_1 \ \ldots \ \theta_n \right].$$
Conditional on $\theta$, the observations are assumed to be independent. As a consequence, the likelihood of the whole sample, conditional on $\theta$, can be written as
$$p(x \mid \theta) = \prod_{i=1}^{n} \left( 2\pi\sigma^2 \right)^{-1/2} \exp\left( -\frac{(x_i - \theta_i)^2}{2\sigma^2} \right).$$
Now, assume the means $\theta_i$ are a sample of IID draws from a normal distribution with unknown mean $\mu$ and known variance $\tau^2$, so that
$$p(\theta \mid \mu) = \prod_{i=1}^{n} \left( 2\pi\tau^2 \right)^{-1/2} \exp\left( -\frac{(\theta_i - \mu)^2}{2\tau^2} \right).$$
Finally, we assign a normal prior (with known mean $\mu_0$ and variance $\tau_0^2$) to the hyper-parameter $\mu$:
$$\mu \sim N(\mu_0, \tau_0^2).$$
The model just described is a hierarchical model. With the notation used in the definition, we have $\phi = \mu$, and the added assumption that
$$p(x \mid \theta, \mu) = p(x \mid \theta),$$
that is, assumption (1) holds.
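To make the two-level structure concrete, here is a minimal Python sketch (an addition to this text, not part of the original lecture) that simulates data from this model top-down; the numerical values chosen for $\mu_0$, $\tau_0$, $\tau$, $\sigma$, and $n$ are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values for the known quantities (illustrative assumptions).
mu0, tau0 = 0.0, 2.0   # mean and standard deviation of the hyper-prior on mu
tau = 1.0              # known standard deviation of the prior on each theta_i
sigma = 0.5            # known common standard deviation of the observations
n = 10                 # sample size

# Simulate the hierarchy top-down: mu -> theta -> x.
mu = rng.normal(mu0, tau0)            # hyper-parameter: mu ~ N(mu0, tau0^2)
theta = rng.normal(mu, tau, size=n)   # means: theta_i ~ N(mu, tau^2), IID
x = rng.normal(theta, sigma)          # observations: x_i ~ N(theta_i, sigma^2)
```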
Suppose that the sample $x = \left[ x_1 \ \ldots \ x_n \right]$ is a vector of IID draws from a normal distribution having unknown mean $\theta$ and unknown variance $\sigma^2$. The likelihood of the whole sample, conditional on $\theta$ and $\sigma^2$, is
$$p(x \mid \theta, \sigma^2) = \prod_{i=1}^{n} \left( 2\pi\sigma^2 \right)^{-1/2} \exp\left( -\frac{(x_i - \theta)^2}{2\sigma^2} \right).$$
Now, assume that the mean $\theta$ is itself normal with known mean $\mu_0$ and variance $\sigma^2 / \nu$, where $\nu$ is a known parameter:
$$\theta \mid \sigma^2 \sim N\left( \mu_0, \frac{\sigma^2}{\nu} \right).$$
Finally, we assign an inverse-Gamma prior to the parameter $\sigma^2$ (i.e., a Gamma distribution to the precision $1/\sigma^2$):
$$\frac{1}{\sigma^2} \sim \text{Gamma}(a, b),$$
where $a$ and $b$ are the two parameters of the Gamma distribution.
This is a very popular model, known as the normal - inverse-Gamma model.
It fits the above definition of a hierarchical model with $\phi = \sigma^2$. Note that here the likelihood depends on $\phi$, so assumption (1) does not hold; this is precisely the kind of model covered by our broader definition.
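Again as an illustration (not part of the original lecture), the following Python sketch draws from this normal - inverse-Gamma prior and then from the likelihood, assuming the Gamma$(a, b)$ distribution is parameterized by shape $a$ and rate $b$; all numerical values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical hyper-parameter values (illustrative assumptions).
a, b = 3.0, 2.0     # shape and rate of the Gamma prior on the precision 1/sigma^2
mu0, nu = 0.0, 5.0  # known mean and scaling parameter of the prior on theta

# Draw from the normal - inverse-Gamma prior, top-down.
precision = rng.gamma(shape=a, scale=1.0 / b)    # 1/sigma^2 ~ Gamma(a, b); numpy uses scale = 1/rate
sigma2 = 1.0 / precision                         # hence sigma^2 is inverse-Gamma
theta = rng.normal(mu0, np.sqrt(sigma2 / nu))    # theta | sigma^2 ~ N(mu0, sigma^2 / nu)
x = rng.normal(theta, np.sqrt(sigma2), size=20)  # IID sample given (theta, sigma^2)
```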
The computation of the posterior distribution is usually performed in steps: first $\phi$ is taken as given, and a conditional distribution for $\theta$ is derived; then a posterior for $\phi$ is computed. The steps are as follows.

1. Conditional on $\phi$ (i.e., by keeping it fixed), compute:

the prior predictive distribution of $x$:
$$p(x \mid \phi) = \int p(x \mid \theta, \phi)\, p(\theta \mid \phi)\, d\theta;$$

the posterior distribution of $\theta$:
$$p(\theta \mid x, \phi) = \frac{p(x \mid \theta, \phi)\, p(\theta \mid \phi)}{p(x \mid \phi)}.$$

2. By using $p(x \mid \phi)$ from step 1, compute:

the prior predictive distribution of $x$:
$$p(x) = \int p(x \mid \phi)\, p(\phi)\, d\phi;$$

the posterior marginal distribution of $\phi$:
$$p(\phi \mid x) = \frac{p(x \mid \phi)\, p(\phi)}{p(x)}.$$

3. Compute the posterior joint distribution of $\theta$ and $\phi$:
$$p(\theta, \phi \mid x) = p(\theta \mid x, \phi)\, p(\phi \mid x).$$

4. Compute the posterior marginal distribution of $\theta$:
$$p(\theta \mid x) = \int p(\theta, \phi \mid x)\, d\phi.$$
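The following Python sketch (an addition, not from the lecture) carries out the four steps numerically on a grid for a deliberately simple toy model, $x \mid \theta \sim N(\theta, 1)$, $\theta \mid \phi \sim N(\phi, 1)$, $\phi \sim N(0, 1)$, with a single scalar observation; the model and all numerical choices are assumptions made purely for illustration.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

x_obs = 1.5                                   # hypothetical observed data point
theta = np.linspace(-8, 8, 801)               # grid for theta
phi = np.linspace(-8, 8, 801)                 # grid for phi
T, P = np.meshgrid(theta, phi, indexing="ij")

lik = stats.norm.pdf(x_obs, loc=T, scale=1.0)        # p(x | theta, phi) (here free of phi)
prior_theta = stats.norm.pdf(T, loc=P, scale=1.0)    # p(theta | phi)
prior_phi = stats.norm.pdf(phi, loc=0.0, scale=1.0)  # p(phi)

# Step 1: integrate theta out to get p(x | phi); apply Bayes' rule for p(theta | x, phi).
px_given_phi = trapezoid(lik * prior_theta, theta, axis=0)
post_theta_given_phi = lik * prior_theta / px_given_phi

# Step 2: integrate phi out to get p(x); apply Bayes' rule for p(phi | x).
px = trapezoid(px_given_phi * prior_phi, phi)
post_phi = px_given_phi * prior_phi / px

# Step 3: posterior joint distribution p(theta, phi | x).
post_joint = post_theta_given_phi * post_phi

# Step 4: posterior marginal distribution p(theta | x).
post_theta = trapezoid(post_joint, phi, axis=1)
```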
When we are not able to carry out the integrations required to derive the predictive distributions, or when we cannot compute posteriors with Bayes' rule, we can use other computational methods (e.g., the factorization method illustrated in the lecture on Bayesian inference). In these cases, the steps of the above procedure remain valid: we first derive posterior and predictive distributions given $\phi$, by using whatever method is available to us; then, we use the conditional distributions thus derived to compute the posterior of $\phi$.
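One widely used computational alternative, offered here only as a sketch and not described in this lecture, is Gibbs sampling, which alternates draws from the full conditionals $p(\theta \mid x, \phi)$ and $p(\phi \mid x, \theta)$. For the model of the first example both full conditionals are normal, so the Python sampler below (with illustrative names and defaults) is exact at each step.

```python
import numpy as np

rng = np.random.default_rng(2)

def gibbs(x, sigma, tau, mu0, tau0, n_iter=5000):
    """Gibbs sampler for x_i ~ N(theta_i, sigma^2), theta_i ~ N(mu, tau^2),
    mu ~ N(mu0, tau0^2); returns draws of mu and theta."""
    n = len(x)
    mu = mu0                       # initialize the hyper-parameter at its prior mean
    mus, thetas = [], []
    for _ in range(n_iter):
        # theta_i | x, mu: precision-weighted combination of x_i and mu.
        prec = 1.0 / sigma**2 + 1.0 / tau**2
        mean = (x / sigma**2 + mu / tau**2) / prec
        theta = rng.normal(mean, np.sqrt(1.0 / prec))
        # mu | theta: normal posterior with precision 1/tau0^2 + n/tau^2.
        prec_mu = 1.0 / tau0**2 + n / tau**2
        mean_mu = (mu0 / tau0**2 + theta.sum() / tau**2) / prec_mu
        mu = rng.normal(mean_mu, np.sqrt(1.0 / prec_mu))
        mus.append(mu)
        thetas.append(theta)
    return np.array(mus), np.array(thetas)

# Hypothetical usage:
# mus, thetas = gibbs(x=np.array([1.2, 0.7, 1.9]), sigma=0.5, tau=1.0, mu0=0.0, tau0=2.0)
```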
In the definition above, there were only two levels: a parameter $\theta$ and a hyper-parameter $\phi$. The definition can be generalized to more than two levels. For example, we could have a third parameter $\gamma$, the likelihood
$$p(x \mid \theta, \phi, \gamma)$$
and the prior
$$p(\theta, \phi, \gamma) = p(\theta \mid \phi, \gamma)\, p(\phi \mid \gamma)\, p(\gamma),$$
which is specified by separately specifying the conditional distributions $p(\theta \mid \phi, \gamma)$ and $p(\phi \mid \gamma)$, and the distribution $p(\gamma)$.
With more than two levels, the computation strategy is similar to that illustrated in the previous section. First, we take all parameters but one as given, and we derive the prior predictive distribution of $x$, conditional on the parameters that have been kept fixed. Then, we use the predictive distribution thus obtained as a likelihood to obtain another prior predictive distribution for $x$, conditional on a smaller number of parameters than in the previous step. And so on.
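Spelling out this strategy for the three-level model above, the chain of prior predictive distributions is
$$p(x \mid \phi, \gamma) = \int p(x \mid \theta, \phi, \gamma)\, p(\theta \mid \phi, \gamma)\, d\theta,$$
$$p(x \mid \gamma) = \int p(x \mid \phi, \gamma)\, p(\phi \mid \gamma)\, d\phi,$$
$$p(x) = \int p(x \mid \gamma)\, p(\gamma)\, d\gamma,$$
and the posteriors $p(\theta \mid x, \phi, \gamma)$, $p(\phi \mid x, \gamma)$ and $p(\gamma \mid x)$ then follow from Bayes' rule at each level, exactly as in the two-level procedure.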
Please cite as:
Taboga, Marco (2021). "Hierarchical Bayesian models", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/Hierarchical-Bayesian-models.