Simulation Methods: Metropolis-Hastings Algorithm
Motivation
Throughout this series on Simulation Methods, we have focused on applying these methods to numerical integration.
Given an integral of the form: \begin{equation} I = \int_{A} h(x)f(x)dx \tag{1} \label{eq:1} \end{equation}
In the basic Monte Carlo method, we assumed that we could sample from \(f\) directly and use the law of large numbers to approximate the integral by a sample mean. When sampling from \(f\) is not easy, Importance Sampling can be used instead by introducing another distribution \(q\) that is easy to sample from.
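To make this concrete, here is a minimal sketch (not code from this series; the names and the choice of \(h\), \(f\), and \(q\) are mine for illustration) that estimates \(\mathbb{E}_f[h(X)]\) for \(h(x) = x^2\) and \(f = N(0, 1)\), first by basic Monte Carlo and then by Importance Sampling with a wider normal proposal \(q = N(0, 2^2)\); both estimates should be close to the true value 1:
import numpy as np

rng = np.random.default_rng(0)
n = 100_000


def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def h(x):
    return x**2


# Basic Monte Carlo: sample from f = N(0, 1) directly and average h(x_i)
x = rng.normal(loc=0.0, scale=1.0, size=n)
mc_estimate = np.mean(h(x))

# Importance Sampling: sample from q = N(0, 2^2) and reweight by f(y)/q(y)
y = rng.normal(loc=0.0, scale=2.0, size=n)
weights = normal_pdf(y, 0.0, 1.0) / normal_pdf(y, 0.0, 2.0)
is_estimate = np.mean(h(y) * weights)

print(mc_estimate, is_estimate)  # both should be close to 1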
This post is about the popular Metropolis-Hastings algorithm, a method for generating samples from a distribution \(f\) that is hard to sample from directly.
For demonstration purposes we use the logistic distribution with mean 0 and scale 1, even though we could sample directly from it:
\begin{equation} f(x) = \dfrac{\exp(-x)}{(1+\exp(-x))^2} \end{equation}
The Algorithm
Loosely speaking, the Metropolis-Hastings algorithm generates a sequence of values (a sample) \(\{x_0, x_1, …, x_n\}\) whose distribution tends to \(f\) as \(n \rightarrow \infty\). We will dive into the theory behind the algorithm in a later post with more formal definitions.
Just like in Importance Sampling, the Metropolis-Hastings algorithm makes use of a distribution of our choosing, which we call \(q\), from which we can easily sample. This distribution is used to propose candidate sample values and is therefore called the proposal density.
The algorithm is as follows:
Choose \(x_0\) arbitrarily. To generate the next value \(x_{t+1}\) for \(t \in \{0, 1, 2, \dots\}\), we do the following:
Generate a proposal or candidate for the next sample value \(y\) by sampling from the proposal density: \(y \sim q(y|x_t)\)
Evaluate the acceptance probability \(\alpha\): \begin{equation} \alpha(x_t, y) = \min\biggl\{1, \dfrac{f(y) \cdot q(x_t|y)}{f(x_t) \cdot q(y|x_t)}\biggr\} \end{equation}
Accept or reject: \begin{equation} x_{t+1} = \begin{cases} y, & \text{with probability } \alpha \\ x_t, & \text{with probability } 1-\alpha \end{cases} \end{equation}
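Before the concrete example below, here is one way these steps could be written as a reusable Python function (a sketch with names of my choosing, not code from this series); it takes the target density \(f\), a way to draw from and evaluate the proposal density \(q\), and a starting value:
import numpy as np


def metropolis_hastings(f, q_sample, q_pdf, x0, n, rng=None):
    """Sketch of a generic Metropolis-Hastings sampler.

    f        -- target density (known at least up to a normalizing constant)
    q_sample -- q_sample(x) draws a candidate y ~ q(.|x)
    q_pdf    -- q_pdf(y, x) evaluates the proposal density q(y|x)
    x0       -- arbitrary starting value
    n        -- number of sample values to generate
    """
    rng = rng or np.random.default_rng()
    xs = np.empty(n)
    xs[0] = x0
    for t in range(n - 1):
        y = q_sample(xs[t])  # step 1: propose a candidate
        # step 2: acceptance probability alpha(x_t, y)
        alpha = min(1.0, f(y) * q_pdf(xs[t], y) / (f(xs[t]) * q_pdf(y, xs[t])))
        # step 3: accept with probability alpha, otherwise keep x_t
        xs[t + 1] = y if rng.uniform() < alpha else xs[t]
    return xs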
A typical proposal density \(q(y|x)\) is the normal density \(N(x,\sigma^2)\) for some \(\sigma > 0\). In this case the proposal density is symmetric, i.e., \(q(y|x) = q(x|y)\):
\begin{equation} q(y|x) = \dfrac{1}{\sigma \sqrt{2\pi}} \cdot \exp\biggl(-\dfrac{1}{2} \biggl(\dfrac{y-x}{\sigma}\biggr)^2\biggr) = \dfrac{1}{\sigma \sqrt{2\pi}} \cdot \exp\biggl(-\dfrac{1}{2} \biggl(\dfrac{x-y}{\sigma}\biggr)^2\biggr) = q(x|y) \end{equation}
This means that: \begin{equation} \alpha = \min\biggl\{1, \dfrac{f(y)}{f(x_t)}\biggr\} \end{equation}
Example: let's use the algorithm to generate 10,000 samples from the logistic distribution with mean 0 and scale 1.
import numpy as np
import seaborn as sns

sns.set_theme(style="white")


# Target density: logistic distribution with mean 0 and scale 1
def logistic(x):
    return np.exp(-x) / np.square(1 + np.exp(-x))
We will make use of the symmetric proposal density \(q(y|x) = N(x,\sigma^2)\) with \(\sigma = 1\):
n = 10000
samples = np.zeros(n)
samples[0] = 0  # arbitrary starting value x_0

for i in range(n - 1):
    # Propose a candidate y ~ q(y|x_i) = N(x_i, 1)
    y = np.random.normal(loc=samples[i], scale=1)
    # Acceptance probability; the proposal is symmetric, so the q terms cancel
    alpha = min(1, logistic(y) / logistic(samples[i]))
    # Accept the candidate with probability alpha, otherwise keep x_i
    u = np.random.uniform()
    if u < alpha:
        samples[i + 1] = y
    else:
        samples[i + 1] = samples[i]
sns.displot(
    data=samples, stat="density", color="chocolate", kde=True, kde_kws={"bw_adjust": 2}
).set(title="Generated samples density estimate", xlabel="x")
The figure above shows the density estimate of the generated sample, which closely matches the target logistic density.
Now, because the generated sample values follow the distribution \(f\), i.e., \(x_i \sim f\), we can use them for numerical integration. Given the integral from Equation \eqref{eq:1}, we can approximate it by the sample mean of a large sample, as the law of large numbers suggests:
\begin{equation} \dfrac{1}{N} \sum_{i=1}^N h(x_i) \rightarrow \mathbb{E}_{f}(h(x)) = I \end{equation}
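As a quick sanity check (not part of the original post), we can apply this to the samples generated above with \(h(x) = x^2\); since the logistic distribution with mean 0 and scale 1 has variance \(\pi^2/3 \approx 3.29\), the sample mean of \(h(x_i)\) should land near that value:
# Approximate I = E_f[h(X)] with h(x) = x^2 using the MH samples from above;
# the true value is the variance of the logistic(0, 1) distribution, pi^2 / 3.
estimate = np.mean(samples**2)
print(estimate, np.pi**2 / 3)  # the estimate should be close to ~3.29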
In the next few posts, we will discuss and show why this algorithm works :)