# Markov state models (MSMs)

Markov state models (MSMs) are a class of models for the long-timescale dynamics of molecular systems. They describe the dynamics of a system as a series of memoryless, probabilistic jumps between a set of states. In practice, the model consists of (1) a set of conformational states and (2) a matrix of transition probabilities between each pair of states.

In MSMBuilder, you can use MarkovStateModel to build MSMs from “labeled” trajectories – that is, sequences of integers that are the result of clustering.

## Algorithms

• MarkovStateModel([lag_time, n_timescales, ...]): Reversible Markov state model.
• BayesianMarkovStateModel([lag_time, ...]): Bayesian reversible Markov state model.

## Maximum Likelihood and Bayesian Estimation

There are two steps in constructing an MSM:

1. Count the number of observed transitions between states. That is, construct $$\mathbf{C}$$ such that $$C_{ij}$$ is the number of observed transitions from state $$i$$ at time $$t$$ to state $$j$$ at time $$t+\tau$$, summed over all times $$t$$.

2. Estimate the transition probability matrix, $$\mathbf{T}$$, whose elements are

$T_{ij} = P( s_{t+\tau} = j | s_t = i)$

where $$S = (s_t)$$ is a trajectory in state-index space of length $$N$$, and $$s_t \in \{1, \ldots, k\}$$ is the state index of the trajectory at time $$t$$.
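The two steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not MSMBuilder's implementation: the helper names `count_transitions` and `row_normalize` and the toy trajectory are hypothetical, states are 0-indexed rather than 1-indexed, and the simple row normalization shown here does not enforce reversibility.

```python
def count_transitions(sequences, n_states, lag_time=1):
    """Count transitions i -> j between frames lag_time apart (sliding window)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        for t in range(len(seq) - lag_time):
            counts[seq[t]][seq[t + lag_time]] += 1
    return counts


def row_normalize(counts):
    """Divide each row by its sum; rows with no counts are left as zeros."""
    transmat = []
    for row in counts:
        total = sum(row)
        transmat.append([c / total if total else 0.0 for c in row])
    return transmat


# Hypothetical labeled trajectory over three states (0, 1, 2).
sequences = [[0, 0, 1, 1, 2, 2, 1, 0, 0, 1]]
C = count_transitions(sequences, n_states=3, lag_time=1)
T = row_normalize(C)

for row in T:
    print(["%.3f" % p for p in row])
```

Each row of `T` sums to one, so row `i` can be read as the estimated distribution over the next state given that the system is currently in state `i`.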

The probability that a given transition probability matrix would generate some observed trajectory (the likelihood) is

$\mathcal{L}(\mathbf{T}) = P(S | \mathbf{T}) = \prod_{t=0}^{N-\tau-1} T_{s_t, s_{t+\tau}} = \prod_{i,j}^{k} T_{ij}^{C_{ij}}.$
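The equality of the two products in the likelihood can be checked numerically. The sketch below works in log space and compares the sum over the trajectory with the sum over the count matrix. The function names, the toy trajectory, and the matrix `T` are hypothetical; `C` is the lag-1 count matrix of that trajectory, written out by hand, with 0-indexed states.

```python
import math


def log_likelihood_from_path(seq, transmat, lag_time=1):
    """Sum log T[s_t][s_{t+lag}] over every pair of frames lag_time apart."""
    return sum(
        math.log(transmat[seq[t]][seq[t + lag_time]])
        for t in range(len(seq) - lag_time)
    )


def log_likelihood_from_counts(counts, transmat):
    """Sum C_ij * log T_ij, skipping transitions that were never observed."""
    return sum(
        c * math.log(p)
        for count_row, prob_row in zip(counts, transmat)
        for c, p in zip(count_row, prob_row)
        if c > 0
    )


# Hypothetical trajectory and its lag-1 count matrix.
seq = [0, 0, 1, 1, 2, 2, 1, 0, 0, 1]
C = [[2, 2, 0], [1, 1, 1], [0, 1, 1]]
# An arbitrary row-stochastic matrix to evaluate.
T = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.4, 0.5]]

path_ll = log_likelihood_from_path(seq, T)
count_ll = log_likelihood_from_counts(C, T)
print(math.isclose(path_ll, count_ll))
```

Because the likelihood depends on the trajectory only through $$\mathbf{C}$$, the count matrix is all an estimator needs to keep.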

Assuming a prior distribution on $$\mathbf{T}$$ of the form $$P(\mathbf{T}) \propto \prod_{i,j}^{k} T_{ij}^{B_{ij}}$$, we then have a posterior distribution

$P(\mathbf{T} | S) \propto \prod_{i,j}^{k} T_{ij}^{B_{ij} + C_{ij}}.$
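If the only constraint on $$\mathbf{T}$$ is that each row sums to one, this posterior factorizes into an independent Dirichlet distribution for each row, with parameters $$B_{ij} + C_{ij} + 1$$. The sketch below draws from it using only the standard library. It is an illustration under that assumption, not the sampler used by BayesianMarkovStateModel: it ignores reversibility, the function name `sample_transition_matrix` is hypothetical, and the prior is simplified to a single scalar `prior` applied to every element (which must be greater than -1).

```python
import random


def sample_transition_matrix(counts, prior=0.0, rng=random):
    """Draw one row-stochastic matrix with row i ~ Dirichlet(C_i + prior + 1)."""
    sample = []
    for row in counts:
        # A Dirichlet draw is a vector of independent Gamma draws, normalized.
        gammas = [rng.gammavariate(c + prior + 1.0, 1.0) for c in row]
        total = sum(gammas)
        sample.append([g / total for g in gammas])
    return sample


rng = random.Random(0)  # fixed seed so the sketch is repeatable
C = [[2, 2, 0], [1, 1, 1], [0, 1, 1]]  # hypothetical count matrix
samples = [sample_transition_matrix(C, prior=0.0, rng=rng) for _ in range(2000)]

# Posterior mean and spread of a single element, T_01.
values = [s[0][1] for s in samples]
mean_T01 = sum(values) / len(values)
std_T01 = (sum((v - mean_T01) ** 2 for v in values) / len(values)) ** 0.5
print("T_01 = %.3f +/- %.3f" % (mean_T01, std_T01))
```

The spread across draws gives a rough sense of how much the observed counts constrain each element; the same approach applies to any function of the sampled matrices.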

MSMBuilder implements two MSM estimators.

• MarkovStateModel performs maximum likelihood estimation. It estimates a single transition matrix, $$\mathbf{T}$$, that maximizes $$\mathcal{L}(\mathbf{T})$$.
• BayesianMarkovStateModel uses Metropolis Markov chain Monte Carlo to (approximately) draw a sample of transition matrices from the posterior distribution $$P(\mathbf{T} | S)$$. This sampler is described in Metzner et al. The sample can be used to estimate the sampling uncertainty in functions of the transition matrix (e.g. relaxation timescales).

Note

The uncertainty in the transition matrix (and in functions of the transition matrix) that can be estimated from BayesianMarkovStateModel does not account for all sources of error. In particular, the discretization induced by clustering produces a negative bias on the eigenvalues of the transition matrix: even in the limit of infinite sampling, they underestimate the eigenvalues of the propagator / transfer operator. See section 3D (Quantifying the discretization error) of Prinz et al. for more discussion of the discretization error.