The Dirichlet distribution is a multivariate continuous probability distribution often used to model the uncertainty about a vector of unknown probabilities.
It is a multivariate generalization of the Beta distribution.
Denote by $p$ the probability of an event. If $p$ is unknown, we can treat it as a random variable, and assign a Beta distribution to $p$. If $p = (p_1, \dots, p_K)$ is a vector of unknown probabilities of mutually exclusive events, we can treat $p$ as a random vector and assign a Dirichlet distribution to it.
The Dirichlet distribution is characterized as follows.
Definition
Let $X$ be a $K \times 1$ continuous random vector. Let its support be
$$R_X = \left\{ x \in \mathbb{R}^K : x_i \geq 0 \text{ for } i = 1, \dots, K \text{ and } \sum_{i=1}^{K} x_i \leq 1 \right\}.$$
Let $\alpha_1, \dots, \alpha_{K+1}$ be strictly positive constants. We say that $X$ has a Dirichlet distribution with parameters $\alpha_1, \dots, \alpha_{K+1}$ if and only if its joint probability density function is
$$f_X(x_1, \dots, x_K) = c \left[ \prod_{i=1}^{K} x_i^{\alpha_i - 1} \right] \left( 1 - \sum_{i=1}^{K} x_i \right)^{\alpha_{K+1} - 1}, \quad x \in R_X$$
(and $f_X = 0$ outside $R_X$), where the normalizing constant is
$$c = \frac{\Gamma\left( \sum_{i=1}^{K+1} \alpha_i \right)}{\prod_{i=1}^{K+1} \Gamma(\alpha_i)}$$
and $\Gamma(\cdot)$ is the Gamma function.
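As a quick numerical sketch of the definition (the helper `dirichlet_pdf` and the parameter values are ours, chosen purely for illustration), the density and its normalizing constant can be evaluated directly:

```python
import numpy as np
from math import gamma

# Dirichlet density over the K free coordinates x_1, ..., x_K,
# with parameters alpha_1, ..., alpha_{K+1}, as in the definition above.
def dirichlet_pdf(x, alpha):
    x = np.asarray(x, dtype=float)          # length K
    alpha = np.asarray(alpha, dtype=float)  # length K + 1
    # normalizing constant c = Gamma(sum alpha) / prod Gamma(alpha_i)
    c = gamma(alpha.sum()) / np.prod([gamma(a) for a in alpha])
    x_last = 1.0 - x.sum()                  # the implied (K+1)-th probability
    return c * np.prod(x ** (alpha[:-1] - 1)) * x_last ** (alpha[-1] - 1)

# K = 2, parameters (2, 3, 4): c = Gamma(9)/(Gamma(2)Gamma(3)Gamma(4)) = 3360,
# so the density at (0.2, 0.3) is 3360 * 0.2 * 0.09 * 0.125 = 7.56.
print(dirichlet_pdf([0.2, 0.3], [2.0, 3.0, 4.0]))
```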
In the above definition, the entries of the vector $x$ are probabilities whose sum is less than or equal to 1:
$$\sum_{i=1}^{K} x_i \leq 1.$$
If we want to have a vector of probabilities exactly summing up to 1, we can define an additional probability
$$x_{K+1} = 1 - \sum_{i=1}^{K} x_i \quad (1)$$
so that
$$\sum_{i=1}^{K+1} x_i = 1. \quad (2)$$
However, there is no way to rigorously define a probability density for the vector $(x_1, \dots, x_{K+1})$ because the constraint in equation (2) implies that the probability density should be zero everywhere on $\mathbb{R}^{K+1}$ except on a subset whose Lebesgue measure is equal to zero, and on the latter set the probability density should be infinite (something involving a Dirac delta function).
Therefore, the right way to deal with $K+1$ events whose probabilities sum up to 1 is to:
1) assign a Dirichlet density, as defined above, to the probabilities of the first $K$ events ($x_1, \dots, x_K$);
2) define the probability $x_{K+1}$ of the $(K+1)$-th event as in equation (1).
We notice that several sources (including the Wikipedia page on the Dirichlet distribution) are not entirely clear about this point.
How do we come up with the above formula for the density of the Dirichlet distribution?
The next proposition provides some insights.
Proposition
Let $Z_1, \dots, Z_{K+1}$ be $K+1$ independent Gamma random variables having means $\alpha_1, \dots, \alpha_{K+1}$ and degrees-of-freedom parameters $2\alpha_1, \dots, 2\alpha_{K+1}$. Define
$$Z = \sum_{j=1}^{K+1} Z_j \quad \text{and} \quad X_i = \frac{Z_i}{Z} \quad (i = 1, \dots, K).$$
Then, the random vector $X = (X_1, \dots, X_K)$ has a Dirichlet distribution with parameters $\alpha_1, \dots, \alpha_{K+1}$.
A Gamma random variable is supported on the set of positive real numbers. Moreover,
$$X_i = \frac{Z_i}{Z} > 0 \quad (i = 1, \dots, K)$$
and
$$\sum_{i=1}^{K} X_i = \frac{Z - Z_{K+1}}{Z} < 1.$$
Therefore, the support of $X$ coincides with that of a Dirichlet random vector. The probability density of a Gamma random variable $Z_i$ with mean parameter $\alpha_i$ and degrees-of-freedom parameter $2\alpha_i$ is
$$f_{Z_i}(z_i) = \frac{1}{\Gamma(\alpha_i)} z_i^{\alpha_i - 1} e^{-z_i}, \quad z_i > 0.$$
Since the variables $Z_1, \dots, Z_{K+1}$ are independent, their joint probability density is
$$f_Z(z_1, \dots, z_{K+1}) = \prod_{i=1}^{K+1} \frac{1}{\Gamma(\alpha_i)} z_i^{\alpha_i - 1} e^{-z_i}.$$
Consider the one-to-one transformation
$$(x_1, \dots, x_K, z) = g(z_1, \dots, z_{K+1}), \quad \text{where } x_i = \frac{z_i}{z_1 + \dots + z_{K+1}} \ (i = 1, \dots, K) \text{ and } z = \sum_{j=1}^{K+1} z_j,$$
whose inverse is
$$z_i = z x_i \ (i = 1, \dots, K), \quad z_{K+1} = z \left( 1 - \sum_{i=1}^{K} x_i \right).$$
The Jacobian matrix of $g^{-1}$ is
$$J = \begin{bmatrix} z & 0 & \cdots & 0 & x_1 \\ 0 & z & \cdots & 0 & x_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & z & x_K \\ -z & -z & \cdots & -z & 1 - \sum_{i=1}^{K} x_i \end{bmatrix}.$$
The determinant of the Jacobian is
$$\det(J) = z^K$$
because: 1) the determinant does not change if we add the first $K$ rows to the $(K+1)$-th row, which turns the last row into $(0, \dots, 0, 1)$; 2) the determinant of a triangular matrix is equal to the product of its diagonal entries. The formula for the joint probability density of a one-to-one transformation gives us (on the support of $(X, Z)$):
$$f_{X,Z}(x_1, \dots, x_K, z) = f_Z\left( g^{-1}(x_1, \dots, x_K, z) \right) \left| \det(J) \right| = \left[ \prod_{i=1}^{K} \frac{(z x_i)^{\alpha_i - 1}}{\Gamma(\alpha_i)} \right] \frac{\left[ z \left( 1 - \sum_{i=1}^{K} x_i \right) \right]^{\alpha_{K+1} - 1}}{\Gamma(\alpha_{K+1})} e^{-z} z^K = \frac{1}{\prod_{i=1}^{K+1} \Gamma(\alpha_i)} \left[ \prod_{i=1}^{K} x_i^{\alpha_i - 1} \right] \left( 1 - \sum_{i=1}^{K} x_i \right)^{\alpha_{K+1} - 1} z^{\bar{\alpha} - 1} e^{-z},$$
where $\bar{\alpha} = \sum_{i=1}^{K+1} \alpha_i$.
By integrating out $z$, we obtain
$$f_X(x_1, \dots, x_K) = \int_0^\infty f_{X,Z}(x_1, \dots, x_K, z) \, dz = \frac{1}{\prod_{i=1}^{K+1} \Gamma(\alpha_i)} \left[ \prod_{i=1}^{K} x_i^{\alpha_i - 1} \right] \left( 1 - \sum_{i=1}^{K} x_i \right)^{\alpha_{K+1} - 1} \int_0^\infty z^{\bar{\alpha} - 1} e^{-z} \, dz \overset{(A)}{=} \frac{\Gamma(\bar{\alpha})}{\prod_{i=1}^{K+1} \Gamma(\alpha_i)} \left[ \prod_{i=1}^{K} x_i^{\alpha_i - 1} \right] \left( 1 - \sum_{i=1}^{K} x_i \right)^{\alpha_{K+1} - 1},$$
where in step $(A)$ we have used the definition of the Gamma function. The latter expression is the density of the Dirichlet distribution with parameters $\alpha_1, \dots, \alpha_{K+1}$.
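The construction above also yields a simple sampler. A minimal sketch with NumPy (the parameter values are arbitrary): each $Z_i$ is drawn as a Gamma variable with shape $\alpha_i$ and unit scale, which matches the density used in the proof, and the ratios form the Dirichlet vector.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 4.0])  # alpha_1, ..., alpha_{K+1}, so K = 2

# Draw K+1 independent Gamma variables (shape alpha_i, scale 1) per sample
# and divide each by the row total, as in the proposition.
z = rng.gamma(shape=alpha, size=(100_000, alpha.size))
x = z / z.sum(axis=1, keepdims=True)

# Sanity check: each sample mean is close to alpha_i / sum(alpha),
# the expected value of a Dirichlet vector derived further below.
print(x.mean(axis=0))  # roughly [2/9, 3/9, 4/9]
```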
The Beta distribution is a special case of the Dirichlet distribution.
If we set the dimension $K = 1$ in the definition above, the support becomes
$$R_X = \{ x \in \mathbb{R} : 0 \leq x \leq 1 \}$$
and the probability density function becomes
$$f_X(x) = \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)} x^{\alpha_1 - 1} (1 - x)^{\alpha_2 - 1}, \quad x \in R_X.$$
By using the definition of the Beta function
$$B(\alpha_1, \alpha_2) = \frac{\Gamma(\alpha_1)\Gamma(\alpha_2)}{\Gamma(\alpha_1 + \alpha_2)},$$
we can re-write the density as
$$f_X(x) = \frac{1}{B(\alpha_1, \alpha_2)} x^{\alpha_1 - 1} (1 - x)^{\alpha_2 - 1}, \quad x \in R_X.$$
But this is the density of a Beta random variable with parameters $\alpha_1$ and $\alpha_2$.
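This special case can be checked by simulation; a sketch (with arbitrary parameter values) comparing the lone Dirichlet coordinate against NumPy's Beta sampler:

```python
import numpy as np

rng = np.random.default_rng(1)
a1, a2 = 2.0, 5.0

# With K = 1, the single Dirichlet coordinate should behave like Beta(a1, a2).
x_dir = rng.dirichlet([a1, a2], size=200_000)[:, 0]
x_beta = rng.beta(a1, a2, size=200_000)

print(x_dir.mean(), x_beta.mean())  # both close to a1 / (a1 + a2) = 2/7
print(x_dir.var(), x_beta.var())    # both close to the Beta variance
```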
The following proposition is often used to prove interesting results about the Dirichlet distribution.
Proposition
Let $X = (X_1, \dots, X_K)$ be a Dirichlet random vector with parameters $\alpha_1, \dots, \alpha_{K+1}$. Let $M$ be any integer such that $1 \leq M < K$. Then, the marginal distribution of the subvector $(X_1, \dots, X_M)$ is a Dirichlet distribution with parameters $\alpha_1, \dots, \alpha_M, \alpha_{M+1} + \dots + \alpha_{K+1}$.
First of all, notice that if the proposition holds for $M = K - 1$, then we can use it recursively to show that it holds for all the other possible values of $M$. So, we assume $M = K - 1$. In order to derive the marginal distribution, we need to integrate $x_K$ out of the joint density of $X = (X_1, \dots, X_K)$:
$$f(x_1, \dots, x_{K-1}) = \int_{-\infty}^{\infty} f_X(x_1, \dots, x_K) \, dx_K = \int_{-\infty}^{\infty} c \left[ \prod_{i=1}^{K} x_i^{\alpha_i - 1} \right] \left( 1 - \sum_{i=1}^{K} x_i \right)^{\alpha_{K+1} - 1} 1_{\{ x_i \geq 0, \ i = 1, \dots, K \}} \, 1_{\left\{ \sum_{i=1}^{K} x_i \leq 1 \right\}} \, dx_K,$$
where
$$c = \frac{\Gamma\left( \sum_{i=1}^{K+1} \alpha_i \right)}{\prod_{i=1}^{K+1} \Gamma(\alpha_i)}$$
and we have used indicator functions to specify the support of $X$; for example, $1_{\left\{ \sum_{i=1}^{K} x_i \leq 1 \right\}}$ is equal to 1 if $\sum_{i=1}^{K} x_i \leq 1$ and to 0 otherwise. We can re-write the marginal density as
$$f(x_1, \dots, x_{K-1}) = c \left[ \prod_{i=1}^{K-1} x_i^{\alpha_i - 1} \right] 1_{\{ x_i \geq 0, \ i = 1, \dots, K-1 \}} \, 1_{\left\{ \sum_{i=1}^{K-1} x_i \leq 1 \right\}} \int_0^{1 - \sum_{i=1}^{K-1} x_i} x_K^{\alpha_K - 1} \left( 1 - \sum_{i=1}^{K-1} x_i - x_K \right)^{\alpha_{K+1} - 1} dx_K.$$
After defining
$$s = 1 - \sum_{i=1}^{K-1} x_i,$$
we can solve the integral as follows:
$$\int_0^s x_K^{\alpha_K - 1} (s - x_K)^{\alpha_{K+1} - 1} \, dx_K \overset{(A)}{=} \int_0^1 (st)^{\alpha_K - 1} (s - st)^{\alpha_{K+1} - 1} s \, dt = s^{\alpha_K + \alpha_{K+1} - 1} \int_0^1 t^{\alpha_K - 1} (1 - t)^{\alpha_{K+1} - 1} \, dt \overset{(B)}{=} s^{\alpha_K + \alpha_{K+1} - 1} B(\alpha_K, \alpha_{K+1}) \overset{(C)}{=} s^{\alpha_K + \alpha_{K+1} - 1} \frac{\Gamma(\alpha_K)\Gamma(\alpha_{K+1})}{\Gamma(\alpha_K + \alpha_{K+1})},$$
where: in step $(A)$ we made the change of variable $t = x_K / s$; in step $(B)$ we used the integral representation of the Beta function; in step $(C)$ we used the relation between the Beta and Gamma functions. Thus, we have
$$f(x_1, \dots, x_{K-1}) = \frac{\Gamma\left( \sum_{i=1}^{K+1} \alpha_i \right)}{\Gamma(\alpha_K + \alpha_{K+1}) \prod_{i=1}^{K-1} \Gamma(\alpha_i)} \left[ \prod_{i=1}^{K-1} x_i^{\alpha_i - 1} \right] \left( 1 - \sum_{i=1}^{K-1} x_i \right)^{\alpha_K + \alpha_{K+1} - 1}$$
on the support, which is the density of a $(K-1)$-dimensional Dirichlet distribution with parameters $\alpha_1, \dots, \alpha_{K-1}, \alpha_K + \alpha_{K+1}$.
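The proposition lends itself to a Monte Carlo sketch (the parameter values are arbitrary): samples of $(X_1, X_2)$ taken from the full vector with parameters $(2, 3, 4, 5)$ and samples from the claimed marginal with parameters $(2, 3, 4 + 5)$ should have matching moments.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Full vector: K = 3, parameters (2, 3, 4, 5); keep the subvector (X1, X2).
full = rng.dirichlet([2.0, 3.0, 4.0, 5.0], size=n)[:, :2]
# Claimed marginal: Dirichlet with parameters (2, 3, 4 + 5).
marg = rng.dirichlet([2.0, 3.0, 9.0], size=n)[:, :2]

# First and second moments should agree up to Monte Carlo error.
print(full.mean(axis=0), marg.mean(axis=0))
print((full**2).mean(axis=0), (marg**2).mean(axis=0))
```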
A corollary of the previous two propositions follows.
Proposition
Let $X = (X_1, \dots, X_K)$ be a Dirichlet random vector with parameters $\alpha_1, \dots, \alpha_{K+1}$. Then, the marginal distribution of the $i$-th entry of $X$ is a Beta distribution with parameters $\alpha_i$ and $\bar{\alpha} - \alpha_i$, where
$$\bar{\alpha} = \sum_{j=1}^{K+1} \alpha_j.$$
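A quick simulation sketch of the corollary (arbitrary parameter values): the first entry of a Dirichlet vector should match draws from the claimed Beta marginal.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = np.array([2.0, 3.0, 4.0])
a_bar = alpha.sum()  # 9

# First entry of X vs. the claimed Beta(alpha_1, a_bar - alpha_1) marginal.
x1 = rng.dirichlet(alpha, size=200_000)[:, 0]
b = rng.beta(alpha[0], a_bar - alpha[0], size=200_000)

print(x1.mean(), b.mean())  # both close to 2/9
print(x1.var(), b.var())    # both close to the Beta variance
```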
The expected value of a Dirichlet random vector $X$ is
$$\mathrm{E}[X_i] = \frac{\alpha_i}{\bar{\alpha}} \quad (i = 1, \dots, K), \quad \text{where } \bar{\alpha} = \sum_{j=1}^{K+1} \alpha_j.$$
We know that the marginal distribution of each entry of $X$ is a Beta distribution. Therefore, we can use, for each entry, the formula for the expected value of a Beta random variable:
$$\mathrm{E}[X_i] = \frac{\alpha_i}{\alpha_i + (\bar{\alpha} - \alpha_i)} = \frac{\alpha_i}{\bar{\alpha}}.$$
The cross-moments of a Dirichlet random vector are
$$\mathrm{E}\left[ \prod_{i=1}^{K} X_i^{n_i} \right] = \frac{\Gamma(\bar{\alpha})}{\Gamma\left( \bar{\alpha} + \sum_{i=1}^{K} n_i \right)} \prod_{i=1}^{K} \frac{\Gamma(\alpha_i + n_i)}{\Gamma(\alpha_i)},$$
where $n_1, \dots, n_K$ are non-negative integers and $\bar{\alpha} = \sum_{j=1}^{K+1} \alpha_j$. The formula is derived as follows:
$$\mathrm{E}\left[ \prod_{i=1}^{K} X_i^{n_i} \right] = \int_{R_X} \left[ \prod_{i=1}^{K} x_i^{n_i} \right] \frac{\Gamma(\bar{\alpha})}{\prod_{i=1}^{K+1} \Gamma(\alpha_i)} \left[ \prod_{i=1}^{K} x_i^{\alpha_i - 1} \right] \left( 1 - \sum_{i=1}^{K} x_i \right)^{\alpha_{K+1} - 1} dx_1 \cdots dx_K = \frac{\Gamma(\bar{\alpha})}{\Gamma\left( \bar{\alpha} + \sum_{i=1}^{K} n_i \right)} \prod_{i=1}^{K} \frac{\Gamma(\alpha_i + n_i)}{\Gamma(\alpha_i)} \int_{R_X} \frac{\Gamma\left( \bar{\alpha} + \sum_{i=1}^{K} n_i \right)}{\Gamma(\alpha_{K+1}) \prod_{i=1}^{K} \Gamma(\alpha_i + n_i)} \left[ \prod_{i=1}^{K} x_i^{\alpha_i + n_i - 1} \right] \left( 1 - \sum_{i=1}^{K} x_i \right)^{\alpha_{K+1} - 1} dx_1 \cdots dx_K = \frac{\Gamma(\bar{\alpha})}{\Gamma\left( \bar{\alpha} + \sum_{i=1}^{K} n_i \right)} \prod_{i=1}^{K} \frac{\Gamma(\alpha_i + n_i)}{\Gamma(\alpha_i)}.$$
In the last step we have used the fact that the expression inside the integral is the joint probability density of a Dirichlet distribution with parameters $\alpha_1 + n_1, \dots, \alpha_K + n_K, \alpha_{K+1}$, and therefore integrates to 1.
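The cross-moment formula is easy to implement and check by simulation; a sketch (the helper `cross_moment` and the parameter values are ours):

```python
import numpy as np
from math import gamma

def cross_moment(alpha, n):
    # E[prod_{i=1}^K X_i^{n_i}] for Dirichlet parameters alpha_1..alpha_{K+1};
    # n holds the K exponents (alpha has one more entry than n).
    a_bar = sum(alpha)
    value = gamma(a_bar) / gamma(a_bar + sum(n))
    for i, n_i in enumerate(n):
        value *= gamma(alpha[i] + n_i) / gamma(alpha[i])
    return value

alpha = [2.0, 3.0, 4.0]
# E[X1^2 X2] = Gamma(9)/Gamma(12) * Gamma(4)/Gamma(2) * Gamma(4)/Gamma(3)
#            = (1/990) * 6 * 3 = 18/990
print(cross_moment(alpha, [2, 1]))

rng = np.random.default_rng(4)
x = rng.dirichlet(alpha, size=300_000)   # rows on the simplex
print((x[:, 0] ** 2 * x[:, 1]).mean())   # Monte Carlo check
```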
The entries of the covariance matrix of a Dirichlet random vector $X$ are
$$\mathrm{Var}[X_i] = \frac{\alpha_i (\bar{\alpha} - \alpha_i)}{\bar{\alpha}^2 (\bar{\alpha} + 1)} \quad \text{and} \quad \mathrm{Cov}[X_i, X_j] = -\frac{\alpha_i \alpha_j}{\bar{\alpha}^2 (\bar{\alpha} + 1)} \quad (i \neq j),$$
where
$$\bar{\alpha} = \sum_{k=1}^{K+1} \alpha_k.$$
We can use the covariance formula
$$\mathrm{Cov}[X_i, X_j] = \mathrm{E}[X_i X_j] - \mathrm{E}[X_i] \mathrm{E}[X_j]$$
and its special case
$$\mathrm{Var}[X_i] = \mathrm{E}[X_i^2] - \left( \mathrm{E}[X_i] \right)^2,$$
together with the formulae for the expected value and the cross-moments derived previously. When $i \neq j$, we have
$$\mathrm{E}[X_i X_j] = \frac{\Gamma(\bar{\alpha})}{\Gamma(\bar{\alpha} + 2)} \frac{\Gamma(\alpha_i + 1)}{\Gamma(\alpha_i)} \frac{\Gamma(\alpha_j + 1)}{\Gamma(\alpha_j)} = \frac{\alpha_i \alpha_j}{\bar{\alpha} (\bar{\alpha} + 1)},$$
where we have used the property of the Gamma function
$$\Gamma(z + 1) = z \, \Gamma(z)$$
and the definition of $\bar{\alpha}$ given above. Therefore, for $i \neq j$, we have
$$\mathrm{Cov}[X_i, X_j] = \frac{\alpha_i \alpha_j}{\bar{\alpha} (\bar{\alpha} + 1)} - \frac{\alpha_i}{\bar{\alpha}} \frac{\alpha_j}{\bar{\alpha}} = \frac{\alpha_i \alpha_j \bar{\alpha} - \alpha_i \alpha_j (\bar{\alpha} + 1)}{\bar{\alpha}^2 (\bar{\alpha} + 1)} = -\frac{\alpha_i \alpha_j}{\bar{\alpha}^2 (\bar{\alpha} + 1)}.$$
When $i = j$, we have
$$\mathrm{E}[X_i^2] = \frac{\Gamma(\bar{\alpha})}{\Gamma(\bar{\alpha} + 2)} \frac{\Gamma(\alpha_i + 2)}{\Gamma(\alpha_i)} = \frac{\alpha_i (\alpha_i + 1)}{\bar{\alpha} (\bar{\alpha} + 1)}$$
and
$$\mathrm{Var}[X_i] = \frac{\alpha_i (\alpha_i + 1)}{\bar{\alpha} (\bar{\alpha} + 1)} - \frac{\alpha_i^2}{\bar{\alpha}^2} = \frac{\alpha_i (\alpha_i + 1) \bar{\alpha} - \alpha_i^2 (\bar{\alpha} + 1)}{\bar{\alpha}^2 (\bar{\alpha} + 1)} = \frac{\alpha_i (\bar{\alpha} - \alpha_i)}{\bar{\alpha}^2 (\bar{\alpha} + 1)}.$$
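The variance and covariance formulas can be checked against a sample covariance matrix; a sketch with arbitrary parameters:

```python
import numpy as np

alpha = np.array([2.0, 3.0, 4.0])
a_bar = alpha.sum()  # 9

# Theoretical values from the formulas above (for the free coordinates X1, X2).
var_theory = alpha[:2] * (a_bar - alpha[:2]) / (a_bar**2 * (a_bar + 1))
cov12_theory = -alpha[0] * alpha[1] / (a_bar**2 * (a_bar + 1))

rng = np.random.default_rng(5)
x = rng.dirichlet(alpha, size=300_000)[:, :2]
sample = np.cov(x, rowvar=False)  # 2 x 2 sample covariance matrix

print(var_theory, sample.diagonal())  # variances
print(cov12_theory, sample[0, 1])     # covariance
```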
Please cite as:
Taboga, Marco (2021). "Dirichlet distribution", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/probability-distributions/Dirichlet-distribution.