Simulation Methods: Metropolis-Hastings Algorithm
Motivation
Throughout this series on Simulation Methods, we have focused on applying these methods to numerical integration.
Given an integral of the form: \begin{equation} I = \int_{A} h(x)f(x)dx \tag{1} \label{eq:1} \end{equation}
In the basic Monte Carlo method, we assumed that we could sample from \(f\) directly and use the law of large numbers to approximate the integral by a sample mean. When sampling from \(f\) is not easy, Importance Sampling can be used instead by introducing another distribution \(q\) that is easy to sample from.
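To make this concrete, here is a minimal sketch (not code from this series; the names and the choice of \(h\), \(f\), and \(q\) are mine for illustration) that estimates \(\mathbb{E}_f[h(X)]\) for \(h(x) = x^2\) and \(f = N(0, 1)\), first by basic Monte Carlo and then by Importance Sampling with a wider normal proposal \(q = N(0, 2^2)\); both estimates should be close to the true value 1:
import numpy as np

rng = np.random.default_rng(0)
n = 100_000


def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def h(x):
    return x**2


# Basic Monte Carlo: sample from f = N(0, 1) directly and average h(x_i)
x = rng.normal(loc=0.0, scale=1.0, size=n)
mc_estimate = np.mean(h(x))

# Importance Sampling: sample from q = N(0, 2^2) and reweight by f(y)/q(y)
y = rng.normal(loc=0.0, scale=2.0, size=n)
weights = normal_pdf(y, 0.0, 1.0) / normal_pdf(y, 0.0, 2.0)
is_estimate = np.mean(h(y) * weights)

print(mc_estimate, is_estimate)  # both should be close to 1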
This post is about the popular Metropolis-Hastings algorithm, a method for generating samples from a distribution \(f\) that is hard to sample from directly.
For demonstration purposes we use the logistic distribution with mean 0 and scale 1, even though we could sample directly from it:
\begin{equation} f(x) = \dfrac{\exp(-x)}{(1+\exp(-x))^2} \end{equation}
The Algorithm
Loosely speaking, the Metropolis-Hastings algorithm generates a sequence of values (a sample) \(\{x_0, x_1, …, x_n\}\) whose distribution tends to \(f\) as \(n \rightarrow \infty\). We will dive into the theory behind the algorithm in a later post with more formal definitions.
Just like in Importance Sampling, the Metropolis-Hastings algorithm makes use of a distribution of our choosing, which we call \(q\), from which we can easily sample. This distribution is used to propose candidate sample values and is therefore called the proposal density.
The algorithm is as follows:
Choose \(x_0\) arbitrarily. To generate the next value \(x_{t+1}\) for \(t \in \{0, 1, 2, \dots\}\), we do the following:
Generate a proposal or candidate for the next sample value \(y\) by sampling from the proposal density: \(y \sim q(y|x_t)\)
Evaluate the acceptance probability \(\alpha\): \begin{equation} \alpha(x_t, y) = \min\biggl\{1, \dfrac{f(y) \cdot q(x_t|y)}{f(x_t) \cdot q(y|x_t)}\biggr\} \end{equation}
Accept or reject: \begin{equation} x_{t+1} = \begin{cases} y, & \text{with probability } \alpha \\ x_t, & \text{with probability } 1-\alpha \end{cases} \end{equation}
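Before the concrete example below, here is one way these steps could be written as a reusable Python function (a sketch with names of my choosing, not code from this series); it takes the target density \(f\), a way to draw from and evaluate the proposal density \(q\), and a starting value:
import numpy as np


def metropolis_hastings(f, q_sample, q_pdf, x0, n, rng=None):
    """Sketch of a generic Metropolis-Hastings sampler.

    f        -- target density (known at least up to a normalizing constant)
    q_sample -- q_sample(x) draws a candidate y ~ q(.|x)
    q_pdf    -- q_pdf(y, x) evaluates the proposal density q(y|x)
    x0       -- arbitrary starting value
    n        -- number of sample values to generate
    """
    rng = rng or np.random.default_rng()
    xs = np.empty(n)
    xs[0] = x0
    for t in range(n - 1):
        y = q_sample(xs[t])  # step 1: propose a candidate
        # step 2: acceptance probability alpha(x_t, y)
        alpha = min(1.0, f(y) * q_pdf(xs[t], y) / (f(xs[t]) * q_pdf(y, xs[t])))
        # step 3: accept with probability alpha, otherwise keep x_t
        xs[t + 1] = y if rng.uniform() < alpha else xs[t]
    return xs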
A typical proposal density \(q(y|x)\) is the normal density \(N(x,\sigma^2)\) for some \(\sigma > 0\). In this case the proposal density is symmetric, i.e., \(q(y|x) = q(x|y)\):
\begin{equation} q(y|x) = \dfrac{1}{\sigma \sqrt{2\pi}} \cdot \exp\biggl(-\dfrac{1}{2} \biggl(\dfrac{y-x}{\sigma}\biggr)^2\biggr) = \dfrac{1}{\sigma \sqrt{2\pi}} \cdot \exp\biggl(-\dfrac{1}{2} \biggl(\dfrac{x-y}{\sigma}\biggr)^2\biggr) = q(x|y) \end{equation}
This means that: \begin{equation} \alpha = \min\biggl\{1, \dfrac{f(y)}{f(x_t)}\biggr\} \end{equation}
Example: let's use the algorithm to generate 10,000 samples from the logistic distribution with mean 0 and scale 1.
import numpy as np
import seaborn as sns

sns.set_theme(style="white")


# Target density: logistic distribution with mean 0 and scale 1
def logistic(x):
    return np.exp(-x) / np.square(1 + np.exp(-x))
We will make use of the symmetric proposal density \(q(y|x) = N(x,\sigma^2)\) with \(\sigma = 1\):
n = 10000
samples = np.zeros(n)
samples[0] = 0  # arbitrary starting value x_0

for i in range(n - 1):
    # Propose a candidate y ~ q(y|x_i) = N(x_i, 1)
    y = np.random.normal(loc=samples[i], scale=1)
    # Acceptance probability; the proposal is symmetric, so the q terms cancel
    alpha = min(1, logistic(y) / logistic(samples[i]))
    # Accept the candidate with probability alpha, otherwise keep x_i
    u = np.random.uniform()
    if u < alpha:
        samples[i + 1] = y
    else:
        samples[i + 1] = samples[i]
sns.displot(
    data=samples, stat="density", color="chocolate", kde=True, kde_kws={"bw_adjust": 2}
).set(title="Generated samples density estimate", xlabel="x")
The figure above shows the density estimate of the generated sample, which closely matches the target logistic density.
Now, because the generated sample values follow the distribution \(f\), i.e., \(x_i \sim f\), we can use them for numerical integration. Given the integral from Equation \eqref{eq:1}, we can approximate it by the sample mean of a large sample, as the law of large numbers suggests:
\begin{equation} \dfrac{1}{N} \sum_{i=1}^N h(x_i) \rightarrow \mathbb{E}_{f}(h(x)) = I \end{equation}
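As a quick sanity check (not part of the original post), we can apply this to the samples generated above with \(h(x) = x^2\); since the logistic distribution with mean 0 and scale 1 has variance \(\pi^2/3 \approx 3.29\), the sample mean of \(h(x_i)\) should land near that value:
# Approximate I = E_f[h(X)] with h(x) = x^2 using the MH samples from above;
# the true value is the variance of the logistic(0, 1) distribution, pi^2 / 3.
estimate = np.mean(samples**2)
print(estimate, np.pi**2 / 3)  # the estimate should be close to ~3.29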
In the next few posts, we will discuss and show why this algorithm works :)