Markov chain: Stationary Distribution
In this article, we discuss the existence and uniqueness of the stationary distribution and investigate the limiting behaviour.
Stationary Distribution
Definition: Let \(\{X_{t}| \ t \in \mathbb{N}\}\) be a Markov chain with state space \(\mathcal{S}\) and transition matrix \(\mathbf{P}\). The vector \(\vec{\pi}\) is called a stationary (or equilibrium) distribution of the chain if \(\vec{\pi}^{\top}\) has entries \((\pi_{j} | \ j \in \mathcal{S})\) such that:
- \(\pi_{j} \geq 0\) for all \(j\), and \(\sum_{j} \pi_{j} = 1\)
- \(\vec{\pi}^{\top} = \vec{\pi}^{\top}\mathbf{P}\), i.e. \(\pi_{j} = \sum_{i}\pi_{i}p_{ij}\)
We introduced the stationary distribution in part 1 of this series, where we stated that if a chain reaches equilibrium, it stays at equilibrium forever. That is, the distribution of \(X_{t}\) no longer changes once the chain is stationary, or at equilibrium. This translates mathematically to:
\begin{align} P(X_0 = i) = \mu_0(i) = \pi_i \implies P(X_t = i) &= \mu_{t}(i) \\ &= (\vec{\mu}_{0}\mathbf{P}^{t})_{i} \\ &= (\vec{\pi}\mathbf{P}^{t})_{i} \\ &= \pi_i \tag{1} \label{eq:1} \end{align}
for all \(t \in \mathbb{N}^{+}\). The last step follows from the definition above and the Chapman-Kolmogorov equations.
A vector \(\vec{x}\) is called a measure if \(\vec{x} \neq \vec{0}\) and \(x_{i} \geq 0\) for all \(i\), and a probability measure if additionally \(\sum_{i}x_{i} = 1\).
Furthermore, \(\vec{x}\) is called a stationary measure if it satisfies the equation \(\vec{x}^{\top} = \vec{x}^{\top}\mathbf{P}\), for a transition matrix \(\mathbf{P}\). Therefore, \(\vec{\pi}\) is a stationary probability measure.
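As a concrete check of this definition, the following sketch (using a small hypothetical three-state transition matrix chosen purely for illustration) computes \(\vec{\pi}\) by solving the linear system \(\vec{\pi}^{\top} = \vec{\pi}^{\top}\mathbf{P}\) together with \(\sum_{j}\pi_{j} = 1\), then verifies that the distribution is invariant under the chain, as in Equation \eqref{eq:1}.

```python
import numpy as np

# Hypothetical 3-state transition matrix (each row sums to 1).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

# Solve pi^T = pi^T P together with sum(pi) = 1:
# stack (P^T - I) with a row of ones and solve in the least-squares sense.
A = np.vstack([P.T - np.eye(3), np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(pi, pi @ P)      # stationarity: pi^T = pi^T P
assert np.isclose(pi.sum(), 1.0)    # probability measure
# Equation (1): starting from pi, the distribution is unchanged at any later t.
assert np.allclose(pi, pi @ np.linalg.matrix_power(P, 10))
print(pi)  # approximately (0.25, 0.5, 0.25)
```

For this particular matrix the stationary distribution can also be found by hand via detailed balance, which gives \((0.25, 0.5, 0.25)\).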
Existence
Under which conditions does the stationary distribution exist? That is what we are going to find out.
Note: We will limit ourselves to an irreducible Markov chain.
Let's introduce some terms:
Let the chain start at state \(k\).
\(T_{k}\) denotes the recurrence time, i.e., the time of the first return from state \(k\) back to state \(k\).
\begin{equation} T_{k} = \min\{t \in \mathbb{N}^{+} | \ X_t = k\} \tag{2} \label{eq:2} \end{equation}
\(N_{i}\) denotes the number of visits to a state \(i\) before the first return to the starting state \(k\).
\begin{equation} N_{i} = \sum_{t=0}^{\infty}\mathbb{1}_{\{X_t = i, \ T_{k} > t\}} \tag{3} \label{eq:3} \end{equation}
The recurrence time to state \(k\) is effectively the sum of the times spent in all other states before returning to state \(k\). Therefore,
\begin{equation} T_{k} = \sum_{i \in \mathcal{S}} N_{i} \tag{4} \label{eq:4} \end{equation}
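Equation \eqref{eq:4} holds path by path, so it can be checked by simulating a single excursion. In the sketch below (the three-state chain is a hypothetical example), we run the chain from state \(k\) until the first return and confirm that the visit counts \(N_i\) over \(t = 0, \dots, T_k - 1\) sum to \(T_k\).

```python
import random

random.seed(0)

# Hypothetical 3-state chain used for illustration.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

def excursion(k):
    """Run one excursion from state k until the first return to k.
    Returns (T_k, visit counts N_i over times t = 0, ..., T_k - 1)."""
    counts = [0, 0, 0]
    state, t = k, 0
    while True:
        counts[state] += 1          # the chain is in `state` at time t
        state = random.choices(range(3), weights=P[state])[0]
        t += 1
        if state == k:
            return t, counts

T_k, N = excursion(k=0)
assert T_k == sum(N)   # Equation (4): T_k = sum over i of N_i
assert N[0] == 1       # the start state is visited only at t = 0
```

Both assertions hold on every run, not just on average: \(T_k = \sum_i N_i\) is a deterministic fact about each sample path.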
\(\rho_{i}(k)\) denotes the expected number of visits to state \(i\) before the first return to state \(k\).
\begin{align} \rho_{i}(k) &= \mathbb{E}(N_{i}|\ X_{0}=k) \\ &= \mathbb{E}\biggl(\sum_{t=0}^{\infty}\mathbb{1}_{\{X_{t} = i, \ T_{k} > t\}} \ | \ X_{0} = k \biggr) \\ &= \sum_{t=0}^{\infty}\mathbb{E}(\mathbb{1}_{\{X_{t} = i, \ T_{k} > t\}} \ | \ X_{0} = k) \\ &= \sum_{t=0}^{\infty}P(X_{t} = i, T_{k} > t \ | \ X_{0} = k) \tag{5} \label{eq:5} \end{align}
Because \(X_{0} = k = X_{T_{k}}\), Equation \eqref{eq:5} can be written in the following equivalent forms:
\begin{align} \rho_{i}(k) &= \sum_{t=0}^{\infty}P(X_{t} = i, T_{k} > t \ | \ X_{0} = k) = \sum_{t=0}^{T_{k} - 1}P(X_{t} = i \ | \ X_{0} = k) \\ &= \sum_{t=1}^{\infty}P(X_{t} = i, T_{k} \geq t \ | \ X_{0} = k) = \sum_{t=1}^{T_{k}}P(X_{t} = i \ | \ X_{0} = k) \tag{6} \label{eq:6} \end{align}
We shall assume that \(k\) is recurrent, so that the chain returns to state \(k\) with probability \(1\). We could then write the expected recurrence time \(m_{k}\) in terms of \(\rho_{i}(k)\):
\begin{align} m_{k} &= \mathbb{E}(T_{k} \ | \ X_{0} = k) \\ &= \mathbb{E}\biggl(\sum_{i \in \mathcal{S}} N_{i} \ | \ X_{0} = k\biggr) \\ &= \sum_{i \in \mathcal{S}}(\mathbb{E}(N_{i} \ | \ X_{0} = k)) \\ &= \sum_{i \in \mathcal{S}} \rho_{i}(k) \tag{7} \label{eq:7} \end{align}
Let \(\vec{\rho}(k) = (\rho_{i}(k) | \ i \in \mathcal{S})\) denote the vector with components indicating the expected number of visits to each state before returning to state \(k\).
Lemma: If \(k\) above is recurrent then,
- \(\rho_{k}(k) = 1\)
- \(\vec{\rho}^{\top}(k) = \vec{\rho}^{\top}(k)\mathbf{P}\), i.e. \(\vec{\rho}(k)\) is a stationary measure.
- \( 0 < \rho_{i}(k) < \infty\)
Proof:
- We show that \(\rho_{k}(k) = 1\).
\begin{align} \rho_{k}(k) = \sum_{t=0}^{\infty}P(X_{t} = k, T_{k} > t \ | \ X_{0} = k) &= P(X_{0} = k \ | \ X_{0} = k) + \sum_{t=1}^{T_{k} - 1} P(X_{t} = k \ | \ X_{0} = k) \\ &= 1 + 0 \quad (X_{t} = k \text{ only at } t = 0)\\ &= 1 \tag*{$\blacksquare$} \end{align}
- We show that \(\vec{\rho}\) is stationary.
Let \(i \in \mathcal{S}\) be an arbitrary state,
\begin{align} \rho_{i}(k) &= \sum_{t=1}^{\infty}P(X_{t} = i, \ T_{k} \geq t \ | \ X_{0} = k) \\ &= \sum_{t=1}^{\infty}\sum_{j \in \mathcal{S}}P(X_{t} = i, \ X_{t-1}=j, \ T_{k} \geq t \ | \ X_{0} = k) \\ &= \sum_{j \in \mathcal{S}}\sum_{t=1}^{\infty}P(X_{t} = i, \ X_{t-1}=j, \ T_{k} \geq t \ | \ X_{0} = k) \\ &= \sum_{j \in \mathcal{S}}\sum_{t=1}^{\infty}P(X_{t}=i \ | \ X_{t-1}=j, \ T_{k} \geq t)P(X_{t-1} = j, \ T_{k} > t \ | \ X_{0} = k) \\ &= \sum_{j \in \mathcal{S}}\sum_{t=1}^{\infty}p_{ji}P(X_{t-1} = j\ , \ T_{k} \geq t \ | \ X_{0} = k) \quad \text{(Markov property)}\\ &= \sum_{j \in \mathcal{S}}p_{ji}\sum_{t=1}^{\infty}P(X_{t-1} = j,\ T_{k} \geq t \ | \ X_{0} = k) \\ &= \sum_{j \in \mathcal{S}}p_{ji}\sum_{t=0}^{\infty}P(X_{t} = j,\ T_{k} \geq t + 1 \ | \ X_{0} = k) \\ &= \sum_{j \in \mathcal{S}}p_{ji}\sum_{t=0}^{T_{k} - 1}P(X_{t} = j \ | \ X_{0} = k) \quad (T_{k} \geq t + 1 \implies T_{k} -1 \geq t)\\ &= \sum_{j \in \mathcal{S}}p_{ji}\rho_{j}(k) \\ &\implies \vec{\rho}^{\top}(k) = \vec{\rho}^{\top}(k)\mathbf{P} \tag*{$\blacksquare$} \end{align}
This result implies that \(\rho_{i}(k) = \sum\limits_{j \in \mathcal{S}}p_{ji}(n)\rho_{j}(k)\) for all \(n \in \mathbb{N}\).
- First we show that \( 0 < \rho_{i}(k)\) for an arbitrary state \(i\)
We assumed an irreducible Markov chain; therefore there exist \(m, n \in \mathbb{N}^{+}\) such that \(p_{ik}(m) > 0\) and \(p_{ki}(n) > 0\).
\begin{align} \rho_{i}(k) = \sum_{j \in \mathcal{S}} p_{ji}(n) \rho_{j}(k) \geq p_{ki}(n) \rho_{k}(k) = p_{ki}(n) > 0 \end{align}
Now we show that \(\rho_{i}(k) < \infty\) \begin{align} &\rho_{k}(k) = 1 = \sum_{j\in \mathcal{S}}p_{jk}(m)\rho_{j}(k) \geq p_{ik}(m)\rho_{i}(k) \\ &\implies \rho_{i}(k) \leq \frac{1}{p_{ik}(m)} < \infty \end{align}
Therefore, \begin{equation} 0 < \rho_{i}(k) < \infty \tag*{$\blacksquare$} \end{equation}
This means that \begin{align} \vec{\pi} &= \dfrac{\vec{\rho}(k)}{\sum\limits_{i \in \mathcal{S}}\rho_{i}(k)} = \dfrac{\vec{\rho}(k)}{m_{k}} \\ \pi_{i} &= \dfrac{\rho_{i}(k)}{m_{k}} \tag{8} \label{eq:8} \end{align} is a stationary probability measure if and only if \(k\) is positive recurrent.
So we have found a candidate stationary probability measure. The only condition for its existence is the presence of a positive recurrent state \(k\) in the state space of the Markov chain.
It was shown in the previous post that for an irreducible chain with a countable state space, if at least one state is recurrent, then all states are recurrent. Therefore, for the stationary distribution to exist, all states must be recurrent, with at least one positive recurrent state \(k\).
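Equations \eqref{eq:7} and \eqref{eq:8} can be illustrated by Monte Carlo. The sketch below (again with a hypothetical three-state chain, whose exact stationary distribution works out to \((0.25, 0.5, 0.25)\)) averages visit counts over many excursions from \(k = 0\) to estimate \(\vec{\rho}(k)\) and \(m_k\), then normalizes by \(m_k\):

```python
import random

random.seed(1)

# Hypothetical 3-state chain; its exact stationary distribution is (0.25, 0.5, 0.25).
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

def excursion(k):
    """Visit counts over t = 0, ..., T_k - 1 for one excursion from k."""
    counts = [0.0, 0.0, 0.0]
    state = k
    while True:
        counts[state] += 1
        state = random.choices(range(3), weights=P[state])[0]
        if state == k:
            return counts

runs = 50_000
rho = [0.0, 0.0, 0.0]            # Monte Carlo estimate of rho_i(k) for k = 0
for _ in range(runs):
    N = excursion(0)
    for i in range(3):
        rho[i] += N[i] / runs

m_k = sum(rho)                   # Equation (7): m_k = sum_i rho_i(k)
pi_hat = [r / m_k for r in rho]  # Equation (8): pi_i = rho_i(k) / m_k
for est, exact in zip(pi_hat, [0.25, 0.5, 0.25]):
    assert abs(est - exact) < 0.02
```

Note that \(\rho_k(k)\) is estimated as exactly \(1\), matching the lemma, and \(m_k \approx 4 = 1/\pi_k\), which anticipates the uniqueness result below.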
Uniqueness
We want to show here that the stationary probability distribution, \(\pi_{i}\), is uniquely determined by \(\dfrac{1}{m_{i}}\).
\begin{equation} \pi_{i} = \dfrac{1}{m_{i}} \tag{9} \label{eq:9} \end{equation}
For this purpose we can assume a stationary probability measure \(\vec{\pi}\) exists and that \(X_{0}\) has distribution \(\vec{\pi}\), i.e., \(P(X_{0} = i) = \pi_{i}\).
Proof:
\begin{align} m_{i} &= \mathbb{E}(T_{i} | X_0 = i) \\ \pi_{i}m_{i} &= P(X_{0} = i)\mathbb{E}(T_{i} | X_0 = i) \\ &= P(X_{0} = i)\sum_{l=1}^{\infty}lP(T_{i} = l | X_{0} = i) \\ &= P(X_{0} = i)\sum_{l=1}^{\infty}(\sum_{t=1}^{l}(1))P(T_{i} = l | X_{0} = i) \\ &= P(X_{0} = i)\sum_{t=1}^{\infty}\sum_{l=t}^{\infty}P(T_{i} = l | X_{0} = i) \\ &= P(X_{0} = i)\sum_{t=1}^{\infty}P(T_{i} \geq t | X_{0} = i) \\ &= \sum_{t=1}^{\infty}P(X_{0} = i)P(T_{i} \geq t | X_{0} = i) \\ &= \sum_{t=1}^{\infty}P(X_{0}=i, T_{i} \geq t) \\ &= P(X_{0}=i, \underbrace{T_{i} \geq 1}_{\text{True}}) + \sum_{t=2}^{\infty}P(X_{0}=i, T_{i} \geq t) \\ &= P(X_{0}=i) + \sum_{t=2}^{\infty}P(X_{0}=i, T_{i} \geq t) \\ &= P(X_{0}=i) + \sum_{t=2}^{\infty}P(X_{0}=i, X_{1} \neq i, X_{2} \neq i, \cdots, X_{t-1} \neq i) \\ \end{align}
To continue, we have to find \(P(X_{0} = i, X_{1} \neq i, X_{2} \neq i, \cdots, X_{t-1} \neq i)\). Let the event \(A=\{X_{1} \neq i, X_{2} \neq i, \cdots, X_{t-1} \neq i\}\) and \(B = \{X_{0} = i\}\)
\begin{equation} P(A \cap B) = P(A) - P(A \cap \overline{B}) \end{equation}
Therefore, \begin{align} P(X_{0} = i, X_{m} \neq i \text{ for } 1 \leq m \leq t- 1) &= P(X_{m} \neq i \text{ for } 1 \leq m \leq t- 1) - P(X_{m} \neq i \text{ for } 0 \leq m \leq t- 1) \\ &= P(X_{m} \neq i \text{ for } 0 \leq m \leq t - 2) - P(X_{m} \neq i \text{ for } 0 \leq m \leq t- 1) \end{align}
The last simplification is due to the fact that the chain is assumed to be homogeneous and stationary.
\begin{align} \pi_{i}m_{i} &= P(X_{0} = i) + \sum_{t=2}^{\infty}\biggl(P(X_{m} \neq i \text{ for } 0 \leq m \leq t- 2) - P(X_{m} \neq i \text{ for } 0 \leq m \leq t- 1)\biggr) \\ \end{align}
Let \(a_{t} = P(X_{m} \neq i \text{ for } 0 \leq m \leq t)\), \begin{align} \pi_{i}m_{i} &= P(X_{0} = i) + \sum_{t=2}^{\infty}(a_{t-2} - a_{t-1}) \\ &= P(X_{0} = i) + a_{0} - a_{1} + a_{1} - a_{2} + a_{2} - \cdots - \lim_{t \to \infty} a_{t} \\ &= P(X_{0} = i) + a_{0} - \lim_{t \to \infty} a_{t} \\ &= P(X_{0} = i) + P(X_{0} \neq i) - \lim_{t \to \infty} a_{t} \\ &= 1 - \lim_{t \to \infty} a_{t} \\ \end{align}
Because the chain is irreducible and the state space is countable, all states are recurrent. Therefore, if the chain starts at any arbitrary state, it must eventually visit state \(i\). Therefore \(\lim\limits_{t \to \infty} a_{t} = 0\).
Thus \(\pi_{i}m_{i} = 1\), and \(\pi_{i}\) is unique since the mean recurrence time \(m_{i}\) is obviously unique. \begin{align} \pi_{i} &= \dfrac{1}{m_{i}} \tag*{$\blacksquare$} \end{align}
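The identity \(\pi_i = 1/m_i\) can also be verified exactly, without simulation, for a small chain: compute \(\vec{\pi}\) from the stationarity equations, and each \(m_i\) by first-step analysis, solving \(h_j = 1 + \sum_{l \neq i} p_{jl} h_l\) for the mean hitting times of state \(i\). The three-state matrix below is an illustrative assumption:

```python
import numpy as np

# Hypothetical 3-state chain used for illustration.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
n = P.shape[0]

# Stationary distribution: solve pi^T = pi^T P together with sum(pi) = 1.
A = np.vstack([P.T - np.eye(n), np.ones(n)])
pi, *_ = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)

def mean_recurrence_time(i):
    """Exact m_i by first-step analysis: with h_j the mean hitting time of
    state i from state j (h_j = 1 + sum_{l != i} p_{jl} h_l for j != i),
    we have m_i = 1 + sum_{j != i} p_{ij} h_j."""
    others = [j for j in range(n) if j != i]
    Q = P[np.ix_(others, others)]                      # chain restricted to S \ {i}
    h = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return 1.0 + P[i, others] @ h

for i in range(n):
    assert np.isclose(pi[i] * mean_recurrence_time(i), 1.0)  # Equation (9)
```

For this chain, \(m_0 = m_2 = 4\) and \(m_1 = 2\), matching \(\vec{\pi} = (0.25, 0.5, 0.25)\).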
Since we’ve now established that \(\pi_{i}\) is unique, we have from Equation \eqref{eq:8} above that,
\begin{equation} \rho_{i}(k) = \dfrac{m_{k}}{m_{i}} \end{equation}
Furthermore Equation \eqref{eq:9} implies that all states are positive recurrent.
Proof:
Suppose state \(j\) is not positive recurrent, so \(m_{j} = \infty\); then from Equation \eqref{eq:9}, \(\pi_{j} = 0\). But,
\begin{align} \pi_{j} &= \sum_{i\in\mathcal{S}} \pi_{i}p_{ij}(t) \\ 0 &= \sum_{i\in\mathcal{S}} \pi_{i}p_{ij}(t) \end{align}
for all positive integers \(t\). But since the chain is irreducible, for each \(i\) there is some \(t\) for which \(p_{ij}(t) > 0\). This implies that \(\pi_{i} = 0\) for all \(i \in \mathcal{S}\), which is a contradiction since \(\sum\limits_{i} \pi_{i} = 1\). Therefore \(\pi_{j} > 0\) for all states \(j\), which means that \(m_{j} < \infty\) for all \(j \in \mathcal{S}\). Therefore, for an irreducible Markov chain, a stationary distribution exists if and only if all states are positive recurrent.
$$\tag*{$\blacksquare$}$$