The previous post introduced the Markov chain and the limiting and stationary distributions. This post will go through some of the fundamental notions associated with Markov chains.

Recurrence and Transience 

A state, \(i \in \mathcal{S}\), is called recurrent (or persistent) if starting from state \(i\) the probability that the chain ever returns to state \(i\) is 1. Mathematically, a state is recurrent if

\begin{equation} P(\exists \ t \in \mathbb{N}^{+}: X_t = i | X_{0} = i) = 1 \tag{1} \label{eq:1} \end{equation}

Let

\begin{equation} f_{ij}(t) = P(X_1 \neq j, X_2 \neq j, \dots, X_{t-1} \neq j, X_{t} = j | X_0 = i) \tag{2} \label{eq:2} \end{equation} denote the probability that starting from state \(i\) we first hit state \(j \in \mathcal{S}\) at exactly time step \(t\).

The probability that the chain ever hits state \(j\), while starting in state \(i\), \(P(\exists \ t \in \mathbb{N}^{+}: X_t = j | X_{0} = i)\), can be written in terms of \eqref{eq:2}

\begin{equation} f_{ij} = P(\exists \ t \in \mathbb{N}^{+}: X_t = j | X_{0} = i) = \sum_{t=1}^{\infty} f_{ij}(t) \tag{3} \label{eq:3} \end{equation}

Therefore by definition, state \(i\) is recurrent iff \(f_{ii} = 1\).

If a state is not recurrent, i.e. if \(f_{ii} < 1\), it is called transient.
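
For a finite chain, \(f_{ij}(t)\) and \(f_{ij}\) are easy to compute via the first-step decomposition \(f_{ij}(t) = \sum_{k \neq j} p_{ik}\, f_{kj}(t-1)\) with \(f_{ij}(1) = p_{ij}\) (a standard recursion, stated here without proof). Below is a minimal sketch, assuming a made-up two-state transition matrix:

```python
import numpy as np

# Hypothetical two-state transition matrix (rows sum to 1).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

def first_passage(P, i, j, T):
    """Return [f_ij(1), ..., f_ij(T)] using the first-step recursion
    f_ij(t) = sum_{k != j} p_ik * f_kj(t-1), with f_ij(1) = p_ij."""
    n = len(P)
    f = np.zeros((T + 1, n))   # f[t, k] holds f_kj(t)
    f[1, :] = P[:, j]          # first hit at t = 1: f_kj(1) = p_kj
    for t in range(2, T + 1):
        for k in range(n):
            f[t, k] = sum(P[k, l] * f[t - 1, l] for l in range(n) if l != j)
    return f[1:, i]

fs = first_passage(P, i=0, j=0, T=400)
print(fs.sum())   # approximates f_00; close to 1 here, so state 0 is recurrent
```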

Assuming the chain starts at state \(i\), the recurrence time \(T_{i}\) is the time of the first return of the chain to state \(i\). \begin{equation} T_{i} = \min\{t \in \mathbb{N}^{+} | X_t = i\}, \quad \in \{1, 2, \dots\} \tag{4} \label{eq:4} \end{equation}

By convention \(T_{i} = \infty\) if the chain never returns to state \(i\).

We can define the recurrence of a state \(i\) in terms of \(T_{i}\): \(i\) is recurrent iff \(P(T_{i} < \infty) = f_{ii} = 1 \). Otherwise it is transient.

Note that \(P(T_{i} = t) = f_{ii}(t)\). Using this we can find the expected recurrence time

\begin{equation} m_{i} = \mathbb{E}(T_{i}) = \sum_{t=1}^{\infty}tf_{ii}(t), \quad \in (0, \infty] \tag{5} \label{eq:5} \end{equation}

For a recurrent state \(i\), if \(m_{i} < \infty\), state \(i\) is said to be positive recurrent. If however \(m_{i} = \infty\), then \(i\) is called null recurrent.

For a transient state \(i\), \(m_{i} = \infty\). This is because for a transient state, \(P(T_{i} = \infty) = 1 - f_{ii} > 0\), so the expectation \(\mathbb{E}(T_{i})\) is infinite.
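
These quantities are easy to estimate by simulation. Here is a minimal Monte Carlo sketch, assuming a made-up irreducible two-state chain (so every state is positive recurrent); for this particular chain the exact mean recurrence time is \(m_{0} = 7/3\), which can be computed from its stationary distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical irreducible two-state chain; every state is positive
# recurrent, so f_ii = 1 and m_i is finite.
P = np.array([[0.2, 0.8],
              [0.6, 0.4]])

def sample_return_time(P, i, max_steps=10_000):
    """Simulate from state i until the first return to i; return np.inf
    if no return occurs within max_steps (a proxy for 'never returns')."""
    state = i
    for t in range(1, max_steps + 1):
        state = rng.choice(len(P), p=P[state])
        if state == i:
            return t
    return np.inf

times = np.array([sample_return_time(P, i=0) for _ in range(20_000)])
print("estimated f_00:", np.isfinite(times).mean())         # ~1.0
print("estimated m_0 :", times[np.isfinite(times)].mean())  # ~7/3
```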

Lemma 1: The probability of being in a null recurrent state vanishes in the limit, i.e. if \(j\) is null recurrent,

\begin{equation} \lim_{n \to \infty} p_{ij}(n) = 0, \quad \forall i \in \mathcal{S} \tag{6} \label{eq:6} \end{equation}

Reachability and Communication 

State \(j\) is said to be reachable from state \(i\), written \(i \to j\), if \(p_{ij}(t) > 0\) for some \(t \in \mathbb{N}_{0}\). Equivalently, for \(i \neq j\), \(i \to j\) iff \(f_{ij} > 0\).

States \(i\) and \(j\) are said to communicate if \(i \to j\) and \(j \to i\), written \(i \leftrightarrow j\).

Theorem 1: The communication relation (\(\leftrightarrow\)) is an equivalence relation:

  1. \(i \leftrightarrow i\)
  2. If \(i \leftrightarrow j\), then \(j \leftrightarrow i\)
  3. If \(i \leftrightarrow j\) and \(j \leftrightarrow k\), then \(i \leftrightarrow k\)

From this equivalence relation it follows that the state space can be partitioned into disjoint equivalence classes called communication classes, \begin{equation} \mathcal{S} = C_{1} \cup C_{2} \cup C_{3} \cup \cdots \tag{7} \label{eq:7} \end{equation} such that two states are in the same class if and only if they communicate with each other.

Example: Consider the example chain below. The state space of this chain consists of two disjoint classes \(\{1, 2\}\) and \(\{3\}\), because even though state \(2\) reaches state \(3\), \(3\) does not reach states \(1\) and \(2\).

The chain is called irreducible if all states communicate with each other, i.e. if the state space consists of a single class.

A set of states \(C\) is closed if, once the chain hits/enters that set of states, it can never leave, i.e. if \(p_{ij} = 0\) for all \(i \in C\), \(j \notin C\). Note that communication classes need not be closed: in the example above, \(\{1, 2\}\) is a communication class, yet the chain can leave it through the transition from \(2\) to \(3\). (Theorem 4 below shows that the recurrent classes are closed.)
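
Communication classes can be computed mechanically from the transition matrix. Below is a small sketch, assuming a hypothetical transition matrix consistent with the example above (the actual numbers behind the figure are not shown): mutual reachability is obtained via a transitive closure, and each class is then checked for closedness.

```python
import numpy as np

# Hypothetical transition matrix consistent with the example above:
# states 0 and 1 communicate, and state 1 can leak into the
# absorbing state 2 (the example's states 1, 2, 3, 0-indexed).
P = np.array([[0.5, 0.5, 0.0],
              [0.4, 0.4, 0.2],
              [0.0, 0.0, 1.0]])

def communication_classes(P):
    """Partition the states of a finite chain into communication classes."""
    n = len(P)
    reach = (P > 0) | np.eye(n, dtype=bool)        # i -> j in at most 1 step
    for _ in range(n):                             # transitive closure
        reach = reach | ((reach.astype(int) @ reach.astype(int)) > 0)
    comm = reach & reach.T                         # i <-> j
    classes, seen = [], set()
    for i in range(n):
        if i not in seen:
            cls = {j for j in range(n) if comm[i, j]}
            classes.append(cls)
            seen |= cls
    return classes

for cls in communication_classes(P):
    outside = [j for j in range(len(P)) if j not in cls]
    closed = all(P[i, j] == 0 for i in cls for j in outside)
    print(sorted(cls), "closed" if closed else "not closed")
# [0, 1] not closed   (the class {1, 2} of the example)
# [2] closed          (the class {3})
```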

Theorem 2:

  1. State \(j\) is recurrent iff \(\sum\limits_{t=0}^{\infty}p_{jj}(t) = \infty\). In this case \(\sum\limits_{t=0}^{\infty}p_{ij}(t) = \infty\) for all \(i\) such that \(f_{ij} > 0\).
  2. State \(j\) is transient iff \(\sum\limits_{t=0}^{\infty}p_{jj}(t) < \infty\). In this case \(\sum\limits_{t=0}^{\infty}p_{ij}(t) < \infty\) for all \(i\).

It follows directly from the second statement of Theorem 2 that if \(j\) is transient, then \(\lim\limits_{t\to\infty}p_{ij}(t) = 0\) for all \(i\), since \(\sum\limits_{t=0}^{\infty}p_{ij}(t)\) is finite.
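
Theorem 2 can be checked numerically on a finite chain: the partial sums \(\sum_{t \leq T} p_{jj}(t)\) converge for a transient state and grow without bound for a recurrent one. A sketch using the hypothetical 3-state matrix from the earlier example:

```python
import numpy as np

# Hypothetical 3-state matrix from the earlier sketch: states 0 and 1
# are transient (mass leaks into the absorbing state 2), state 2 is
# recurrent.
P = np.array([[0.5, 0.5, 0.0],
              [0.4, 0.4, 0.2],
              [0.0, 0.0, 1.0]])

for T in (10, 100, 1000):
    Pt = np.eye(3)              # P^0
    totals = np.zeros(3)
    for _ in range(T + 1):
        totals += np.diag(Pt)   # accumulate p_jj(t) for t = 0..T
        Pt = Pt @ P
    print(T, np.round(totals, 3))
# The sums for states 0 and 1 stabilize as T grows (transient: finite),
# while the sum for state 2 grows without bound (recurrent: divergent).
```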

Theorem 3:

  1. If \(i \leftrightarrow j\) and \(i\) is recurrent then \(j\) is recurrent.
  2. If \(i \leftrightarrow j\) and \(i\) is transient then \(j\) is transient.
  3. A Markov chain with a finite state space \(\mathcal{S}\) must have at least one recurrent state.
  4. The states of a finite, irreducible Markov chain are all recurrent.

Theorem 4 (Decomposition Theorem): The state space \(\mathcal{S}\) can be written as the disjoint union

\begin{equation} \mathcal{S} = T \cup C_{1} \cup C_{2} \cup C_{3} \cup \cdots \tag{8} \label{eq:8} \end{equation}

where \(T\) is the set of transient states, and the \(C_{(\cdot)}\)’s are irreducible closed sets of recurrent states.
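
For a finite chain the decomposition can be computed directly, using the fact that a communication class of a finite chain is recurrent exactly when it is closed (closedness alone is not sufficient for recurrence in infinite chains). A minimal sketch, reusing `P` and `communication_classes()` from the earlier example:

```python
# Reusing P and communication_classes() from the earlier sketch.
transient, recurrent = [], []
for cls in communication_classes(P):
    outside = [j for j in range(len(P)) if j not in cls]
    if all(P[i, j] == 0 for i in cls for j in outside):
        recurrent.append(sorted(cls))  # closed class: recurrent (finite case)
    else:
        transient.extend(sorted(cls))  # class that leaks out: transient
print("T =", transient, "; recurrent classes:", recurrent)
# T = [0, 1] ; recurrent classes: [[2]]
```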

Period of a State 

The period \(d(i)\) is defined to be \begin{equation} d(i) = \gcd\{t: p_{ii}(t) > 0\} \tag{9} \label{eq:9} \end{equation}

Intuitively, any return from state \(i\) back to \(i\) can only occur in a number of steps that is a multiple of the period of \(i\).

If \(d(i) = 1\), then \(i\) is said to be aperiodic, otherwise it is periodic.
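
Computationally, \(d(i)\) can be obtained as the gcd of the return times up to some horizon \(T\); the gcd over a finite horizon is in general a multiple of \(d(i)\) and stabilizes at \(d(i)\) once \(T\) is large enough. A sketch under that assumption:

```python
import math
import numpy as np

def period(P, i, T=200):
    """gcd of {t <= T : p_ii(t) > 0}; stabilizes at d(i) once T is
    large enough to have seen sufficiently many return times."""
    d = 0
    Pt = np.eye(len(P))
    for t in range(1, T + 1):
        Pt = Pt @ P
        if Pt[i, i] > 1e-12:   # tolerance guards against float noise
            d = math.gcd(d, t)
    return d

# A deterministic 3-cycle: every state has period 3.
C = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
print(period(C, 0))   # 3
```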

Lemma 2: If \(i \leftrightarrow j\), then \(i\) and \(j\) have the same period.

Ergodicity 

A state is called ergodic if it is positive recurrent (i.e. recurrent and non-null) and aperiodic. A chain is ergodic if all its states are ergodic.

For an ergodic chain, the limiting distribution converges to the stationary distribution. We shall discuss this in a future post.

Appendix (Proofs) 

Lemma 1

See (Grimmett and Stirzaker 2020, Section 6.4 Proof of (20), p. 261)

Theorem 1

Statements 1 and 2 follow from the definition of communication.

To prove statement 3: if \(i\) reaches \(j\) and \(j\) reaches \(k\), then there exist natural numbers (possibly \(0\)) \(m\) and \(n\) for which \(p_{ij}(m) > 0\) and \(p_{jk}(n) > 0\). Using the Chapman-Kolmogorov equations, we show that \(i\) reaches \(k\) in \(m + n\) steps.

\begin{equation*} p_{ik}(m+n) = \sum_{l=0}^{\infty}p_{il}(m)p_{lk}(n) \geq p_{ij}(m)p_{jk}(n) > 0 \end{equation*}

In a similar way we can show that \(k \to i\). Therefore \(i \leftrightarrow k\). $$\tag*{$\blacksquare$}$$

Theorem 2

  1. We prove that state \(j\) is recurrent iff \(\sum\limits_{t=0}^{\infty}p_{jj}(t) = \infty\).

The \(t\)-step transition probability \(p_{ij}(t)\) can be written in terms of \(f_{ij}(\cdot)\), giving the following useful equation.

\begin{equation} p_{ij}(t) = \sum_{k=1}^{t}p_{jj}(t-k)f_{ij}(k), \quad t\in \mathbb{N}^{+} \tag{10} \label{eq:10} \end{equation}

Let \(A_{t}\) be the event that at time \(t\), the chain is in state \(j\), i.e. \(A_t = \{X_t = j\}\).

Let \(B_k\) be the event that the chain first visits state \(j\) (after time 0) at time \(k\), i.e. \(B_k = \{X_l \neq j \text{ for } 1 \leq l < k, X_k = j\}\) for \(k \in \{1, 2, \dots, t\}\). Observe that the \(B_k\)’s are pairwise disjoint, i.e. \(B_r \cap B_s = \emptyset\) for all \(r \neq s\). \begin{align} p_{ij}(t) & = P(A_{t} | X_{0} = i) \\ & = P(\bigcup_{k=1}^{t} A_t \cap B_k | X_0 = i) \end{align} The union goes up to \(t\) because \(A_{t} \cap B_{k} = \emptyset\) for \(k > t\).
Since the \(B_k\)’s are disjoint, so are the \(A_t \cap B_k\)’s \begin{align} p_{ij}(t) & = \sum_{k=1}^{t} P(A_t \cap B_k | X_0 = i) \\ & = \sum_{k=1}^{t} P(A_t | B_k, X_0 = i) P(B_k | X_0 = i) \\ & = \sum_{k=1}^{t} P(A_t | X_k = j, X_{k-1} \neq j,\dots, X_1 \neq j, X_0 = i) P(B_k | X_0 = i) \\ & = \sum_{k=1}^{t} P(A_t | X_k = j) P(B_k | X_0 = i) \quad \text{ (Markov property)}\\ & = \sum_{k=1}^{t} p_{jj}(t-k) f_{ij}(k) \tag*{$\blacksquare$} \end{align}
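
Equation \eqref{eq:10} is easy to sanity-check numerically: compute \(p_{ij}(t)\) by matrix powers and \(f_{ij}(k)\) by the same first-step recursion used earlier, then compare both sides. A sketch with a made-up two-state matrix:

```python
import numpy as np

# Made-up two-state matrix for the check.
P = np.array([[0.2, 0.8],
              [0.6, 0.4]])
i, j, T = 0, 1, 30

# p_jj(t) for t = 0..T via matrix powers.
powers = [np.linalg.matrix_power(P, t) for t in range(T + 1)]
p_jj = [powers[t][j, j] for t in range(T + 1)]

# f[t, k] = f_kj(t) via f_kj(t) = sum_{l != j} p_kl * f_lj(t-1).
f = np.zeros((T + 1, 2))
f[1, :] = P[:, j]
for t in range(2, T + 1):
    for k in range(2):
        f[t, k] = sum(P[k, l] * f[t - 1, l] for l in range(2) if l != j)

# Compare both sides of p_ij(t) = sum_{k=1}^{t} p_jj(t-k) f_ij(k).
for t in (1, 5, 20):
    lhs = powers[t][i, j]
    rhs = sum(p_jj[t - k] * f[k, i] for k in range(1, t + 1))
    print(t, np.isclose(lhs, rhs))   # True for each t
```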

We define two generating functions for the sequences \(p_{ij}(\cdot)\) and \(f_{ij}(\cdot)\)

\begin{equation} P_{ij}(s) = \sum_{t=0}^{\infty}s^{t}p_{ij}(t), \quad F_{ij}(s) = \sum_{t=0}^{\infty}s^{t}f_{ij}(t) \end{equation}

These power series are guaranteed to converge for \(|s| < 1\), since their coefficients are bounded by \(1\).

If we multiply both sides of Equation \eqref{eq:10} by \(s^t\) and sum over \(t \in \mathbb{N}^{+}\) \begin{align} s^t p_{ij}(t) &= s^t\sum_{k=1}^{t} p_{jj}(t-k) f_{ij}(k) \\ \sum_{t=1}^{\infty} s^{t} p_{ij}(t) &= \sum_{t=1}^{\infty} s^t\sum_{k=1}^{t} p_{jj}(t-k) f_{ij}(k) \\ \sum_{t=0}^{\infty} s^{t} p_{ij}(t) - \delta_{ij} & = \sum_{t=1}^{\infty} \sum_{k=1}^{t} s^t p_{jj}(t-k) f_{ij}(k) \\ & = \sum_{k=1}^{\infty} \sum_{t=k}^{\infty} s^t p_{jj}(t-k) f_{ij}(k) \quad &(\text{r.h.s swap sums}) \\ & = \sum_{k=1}^{\infty} \sum_{t=k}^{\infty} s^{t-k} p_{jj}(t-k) s^{k} f_{ij}(k) \\ & = \sum_{k=1}^{\infty} s^{k} f_{ij}(k) \sum_{t=k}^{\infty} s^{t-k} p_{jj}(t-k) \\ & = \sum_{k=0}^{\infty} s^{k} f_{ij}(k) \sum_{r=0}^{\infty} s^{r} p_{jj}(r) \quad &(f_{ij}(0) = 0) \\ P_{ij}(s) - \delta_{ij} & = F_{ij}(s)P_{jj}(s) \\ P_{ij}(s) & = \delta_{ij} + F_{ij}(s)P_{jj}(s) \tag{11} \label{eq:11} \end{align}

Now we can prove statement 1: \(j\) is recurrent iff \(\sum\limits_{t=0}^{\infty}p_{jj}(t) = \infty \)

From Equation \eqref{eq:11} we have \begin{align} P_{jj}(s) &= 1 + F_{jj}(s)P_{jj}(s) \\ \implies P_{jj}(s) &= \dfrac{1}{1-F_{jj}(s)} \\ \lim_{s \to 1^{-}}P_{jj}(s) &= \sum_{t=0}^{\infty}p_{jj}(t) \\ &= \lim_{s \to 1^{-}} \biggl(\dfrac{1}{1-F_{jj}(s)}\biggr), & \biggl(\lim_{s \to 1^{-}} F_{jj}(s) = \sum_{t=0}^{\infty} f_{jj}(t)\biggr) \end{align}

Note that the limit from below (\(s \to 1^{-}\)) is used because the generating functions are only guaranteed to converge for \(|s| < 1\); since the coefficients are non-negative, \(\lim_{s \to 1^{-}} P_{jj}(s) = \sum_{t=0}^{\infty}p_{jj}(t)\) by Abel's theorem.

From Equation \eqref{eq:3} we know that \(f_{jj} = \sum\limits_{t=1}^{\infty} f_{jj}(t)\), and we also know that if \(j\) is recurrent, \(f_{jj} = 1\).

Therefore, \begin{equation} \lim_{s \to 1^{-}}P_{jj}(s) = \sum_{t=0}^{\infty}p_{jj}(t) = \frac{1}{1-1} = \infty \tag*{$\blacksquare$} \end{equation}
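
The identity \(P_{jj}(s) = 1/(1 - F_{jj}(s))\) used above can also be checked numerically by truncating both series. A short sketch, reusing the variables `p_jj`, `f`, `j`, and `T` defined in the earlier sketch (so that `f[t, j]` equals \(f_{jj}(t)\)):

```python
# Reusing p_jj, f, j, and T from the previous sketch, so that
# f[t, j] = f_jj(t). With s well below 1 the truncation error at T
# is negligible.
s = 0.5
P_gen = sum(s**t * p_jj[t] for t in range(T + 1))     # P_jj(s), truncated
F_gen = sum(s**t * f[t, j] for t in range(1, T + 1))  # F_jj(s), truncated
print(P_gen, 1.0 / (1.0 - F_gen))   # approximately equal
```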

The second part of statement 1 says: if \(j\) is recurrent, then \(\sum\limits_{t=0}^{\infty}p_{ij}(t) = \infty\) for all \(i\) such that \(f_{ij} > 0\).

We just showed the case \(i = j\). For all other cases where \(i \neq j\) such that \(f_{ij} > 0\), we have (using equation \eqref{eq:11})

\begin{align} P_{ij}(s) &= F_{ij}(s)P_{jj}(s) \\ \lim_{s\to 1^{-}}P_{ij}(s) &= \sum_{t=0}^{\infty}p_{ij}(t) \\ \sum_{t=0}^{\infty}p_{ij}(t) &= \lim_{s\to 1^{-}} \biggl(F_{ij}(s)P_{jj}(s)\biggr) \\ \sum_{t=0}^{\infty}p_{ij}(t) &= \sum_{t=0}^{\infty}f_{ij}(t) \sum_{t=0}^{\infty}p_{jj}(t) \\ \sum_{t=0}^{\infty}p_{ij}(t) &= \underbrace{f_{ij}}_{> 0} \underbrace{\sum_{t=0}^{\infty}p_{jj}(t)}_{\infty} \\ \implies \sum_{t=0}^{\infty}p_{ij}(t) &= \infty \tag*{$\blacksquare$} \end{align}

  2. We prove that state \(j\) is transient iff \(\sum\limits_{t=0}^{\infty}p_{jj}(t) < \infty\). In this case \(\sum\limits_{t=0}^{\infty}p_{ij}(t) < \infty\) for all \(i\).

For the first part, from Equation \eqref{eq:11},

\begin{align} P_{jj}(s) &= \dfrac{1}{1-F_{jj}(s)} \\ \sum_{t=0}^{\infty}p_{jj}(t) &= \lim_{s \to 1^{-}}\biggl(\dfrac{1}{1-F_{jj}(s)}\biggr) \end{align}

because \(j\) is transient,

\begin{equation} \lim_{s \to 1^{-}} F_{jj}(s) = \sum_{t=0}^{\infty}f_{jj}(t) = f_{jj} < 1 \end{equation}

Therefore,

\begin{equation} \sum_{t=0}^{\infty}p_{jj}(t) < \infty \tag*{$\blacksquare$} \end{equation}

Similarly, for all \(i \neq j\)

\begin{align} \sum_{t=0}^{\infty}p_{ij}(t) &= \lim_{s\to 1^{-}} \biggl(F_{ij}(s)P_{jj}(s)\biggr) \\ \sum_{t=0}^{\infty}p_{ij}(t) &= \sum_{t=0}^{\infty}f_{ij}(t) \sum_{t=0}^{\infty}p_{jj}(t) \\ \sum_{t=0}^{\infty}p_{ij}(t) &= f_{ij} \underbrace{\sum_{t=0}^{\infty}p_{jj}(t)}_{< \infty} \\ \implies \sum_{t=0}^{\infty}p_{ij}(t) &< \infty \tag*{$\blacksquare$} \end{align}

Theorem 3

  1. Suppose \(i\) is recurrent and \(i \leftrightarrow j\). From the definition of communication, there exist \(n, m \in \mathbb{N}_{0}\) for which \(p_{ij}(n)>0\) and \(p_{ji}(m)>0\).

By the Chapman-Kolmogorov equations, \(\mathbf{P}(n + r + m) = \mathbf{P}(m)\mathbf{P}(r)\mathbf{P}(n)\), we have

\begin{align} p_{jj}(m+r+n) & = \sum_{k}p_{jk}(m)(\mathbf{P}(r)\mathbf{P}(n))_{kj} \\ & = \sum_{k}p_{jk}(m) \sum_{l}p_{kl}(r)p_{lj}(n) \\ & = \sum_{k} \sum_{l} p_{jk}(m) p_{kl}(r)p_{lj}(n) \\ & \geq p_{ji}(m)p_{ii}(r)p_{ij}(n) \\ & \equiv \alpha p_{ii}(r) \end{align}

The inequality comes about by keeping only the summand with \(k = l = i\), where \(\alpha = p_{ji}(m)p_{ij}(n) \in (0, 1]\).

Now summing over \(r\), we obtain

\begin{align} \sum_{r=0}^{\infty} p_{jj}(m+r+n) & \geq \sum_{r=0}^{\infty}\alpha p_{ii}(r) \\ \sum_{r=m+n}^{\infty} p_{jj}(r) & \geq \alpha \sum_{r=0}^{\infty}p_{ii}(r) \\ \sum_{r=0}^{\infty} p_{jj}(r) & \geq \sum_{r=m+n}^{\infty} p_{jj}(r) \\ & \geq \alpha \underbrace{\sum_{r=0}^{\infty}p_{ii}(r)}_{\infty, \text{ i is recurrent}} \\ \implies \sum_{r=0}^{\infty} p_{jj}(r) & = \infty \text{ (thus j is recurrent)} \tag*{$\blacksquare$} \end{align}

  2. Can be proven in a similar manner.

  3. Assume all states are transient; then by Theorem 2, \(\lim\limits_{t\to\infty} p_{ij}(t) = 0\) for all states \(i, j \in \mathcal{S}\). We know that, \begin{equation} 1 = \sum_{j \ \in \ \mathcal{S}}p_{ij}(t) \end{equation} If we take the limit as \(t \to \infty\) \begin{equation} 1 = \lim_{t\to\infty}\sum_{j \ \in \ \mathcal{S}}p_{ij}(t) \end{equation} Because the state space is finite, the sum has finitely many terms, so the limit can be taken inside the summation \begin{equation} 1 = \sum_{j \ \in \ \mathcal{S}} \underbrace{\lim_{t\to\infty} p_{ij}(t)}_{0} = 0 \end{equation}

This is a contradiction: all states cannot be transient. Therefore at least one state is recurrent. $$\tag*{$\blacksquare$}$$

  4. Let \(i\) be a state in the state space \(\mathcal{S}\) of a finite, irreducible Markov chain. Because a Markov chain with a finite state space must have at least one recurrent state (statement 3), we may take \(i\) to be recurrent. By definition, if the chain is irreducible then \(i \leftrightarrow j \) for all \(j \in \mathcal{S}\). From statement 1 of this theorem all \(j\)’s have to be recurrent too. $$\tag*{$\blacksquare$}$$

Theorem 4

Communication classes are irreducible by construction, since all states within a class communicate with one another. By statements 1 and 2 of Theorem 3, recurrence and transience are class properties, so each communication class consists entirely of recurrent states or entirely of transient states. Collect the transient states into \(T\), and let \(C_1, C_2, \dots\) be the recurrent classes from Equation \eqref{eq:7}. It only remains to show that each recurrent class \(C_{r}\) is closed.

Assume \(C_{r}\) is not closed, i.e., there exists \(i \in C_{r}\) (\(i\) is recurrent) and \(j \notin C_{r}\) such that \(p_{ij} > 0\). This means that \(i \to j\) but \(j \not\to i\), since otherwise \(i \leftrightarrow j\) and \(j\) would belong to \(C_{r}\).

Let \(A\) be the event \(\{\exists \ t \in \mathbb{N}^{+}: X_t = i\}\) that the chain ever returns to \(i\), and let \(B\) be the event \(\{X_1 = j\}\). Since \(j \not\to i\), on the event \(B\) the chain never returns to \(i\), so \(B\) is (up to a null set) contained in the complement \(\overline{A} = \{\forall \ t \in \mathbb{N}^{+}: X_t \neq i\}\), the event of never returning to state \(i\). This implies,

\begin{align} P(\overline{A} | X_0 = i) &\geq P(B | X_0 = i) \\ P(\forall \ t \geq 1, \ X_t \neq i | X_0 = i) &\geq P(X_1 = j | X_0 = i) \\ &= p_{ij} > 0 \end{align}

Therefore the probability of starting at \(i\) and never returning is positive, which would mean that \(i\) is not recurrent, a contradiction. Therefore \(C_{r}\) must be closed. $$\tag*{$\blacksquare$}$$

Lemma 2

Let \(d_{i}\) be the period of state \(i\) and \(d_{j}\) be that of state \(j\). For \(d_{i}=d_{j}\) to be true, we have to show that \(d_{i} \ | \ d_{j}\) and \(d_{j} \ | \ d_{i}\) (read \(d_{i}\) divides \(d_{j}\) and \(d_{j}\) divides \(d_{i}\)).

If \(i \leftrightarrow j\), then \(\exists \ m, n \in \mathbb{N}_{0}\) such that \(p_{ij}(n) > 0\) and \(p_{ji}(m) > 0\).

Using the Chapman-Kolmogorov equations as in the proof for Theorem 3.1 we get,

\begin{equation} p_{jj}(m + r +n ) \geq p_{ji}(m)p_{ii}(r)p_{ij}(n) \end{equation}

If \(p_{ii}(r) > 0\), then \(p_{jj}(m+r+n) > 0\). Taking \(r = 0\) (for which \(p_{ii}(0) = 1 > 0\)) gives \(d_{j} \ | \ m + n\), and for every \(r \geq 1\) with \(p_{ii}(r) > 0\) we also get \(d_{j} \ | \ m + r + n\), hence \(d_{j} \ | \ r\). Therefore \(d_{j}\) divides every element of \(\{r : p_{ii}(r)>0\}\), which implies \(d_{j} \ | \ d_{i}\). By interchanging \(i\) and \(j\) in the inequality above we arrive at \(d_{i} \ | \ d_{j}\). Thus \(d_{i} = d_{j}\). $$\tag*{$\blacksquare$}$$

References 

Grimmett, Geoffrey, and David Stirzaker. 2020. Probability and Random Processes. Oxford; New York: Oxford University Press.

Wasserman, Larry. 2004. All of Statistics: A Concise Course in Statistical Inference. Springer Texts in Statistics. New York: Springer. https://doi.org/10.1007/978-0-387-21736-9.