msmbuilder.msm.BayesianMarkovStateModel

class msmbuilder.msm.BayesianMarkovStateModel(lag_time=1, n_samples=100, n_steps=0, n_chains=None, n_timescales=None, reversible=True, ergodic_cutoff='on', prior_counts=0, sliding_window=True, random_state=None, sampler='metzner', verbose=False)

Bayesian reversible Markov state model.

Variant of MarkovStateModel which estimates a distribution over transition matrices, instead of a single transition matrix, using Metropolis Markov chain Monte Carlo. This distribution gives information about the statistical uncertainty in the transition matrix (and in functions of the transition matrix), and is stored in all_transmats_.

Parameters:

lag_time : int: The lag time of the model
n_samples : int, default=100: Total number of transition matrices to sample from the posterior
n_steps : int, default=n_states: Number of MCMC steps to take between sampled transition matrices. By default, we use n_steps=n_states_**2.
n_chains : int, default=n_procs: Number of independent Markov chains to simulate. The requested number of transition matrix samples will be generated from n_chains independent MCMC chains.
n_timescales : int, optional: The number of dynamical timescales to calculate when diagonalizing the transition matrix.
reversible : bool, default=True: Enforce reversibility during transition matrix sampling
ergodic_cutoff : int, default=1: Only the maximal strongly ergodic subgraph of the data is used to build an MSM. Ergodicity is determined by ensuring that each state is accessible from each other state via one or more paths involving edges with a number of observed directed counts greater than or equal to ergodic_cutoff. Note that by setting ergodic_cutoff to 0, this trimming is effectively turned off.
prior_counts : float, optional: Add a number of “pseudo counts” to each entry in the counts matrix. When prior_counts == 0 (default), the assigned transition probability between two states with no observed transitions will be zero, whereas when prior_counts > 0, even these unobserved transitions will be given nonzero probability.
sliding_window : bool, optional: Count transitions using a window of length lag_time, which is slid along the sequences 1 unit at a time, yielding transitions which contain more data but cannot be assumed to be statistically independent. Otherwise, the sequences are simply subsampled at an interval of lag_time.
random_state : int or RandomState instance or None (default): Pseudo Random Number generator seed control. If None, use the numpy.random singleton.
sampler : {‘metzner’, ‘metzner_py’}: The sampler implementation to use. ‘metzner’ is the sampler from Ref. [1] implemented in C, ‘metzner_py’ is a pure-python reference implementation.
verbose : bool: Enable verbose printout
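
The effect of sliding_window and prior_counts on the counts matrix can be illustrated with a small pure-Python sketch. The helper `count_transitions` below is hypothetical and is not part of msmbuilder; it only mirrors the behaviour described in the parameter list, under the assumption that the sequence already uses integer state indices.

```python
def count_transitions(sequence, n_states, lag_time=1,
                      sliding_window=True, prior_counts=0.0):
    """Count lagged transitions in one sequence of integer state indices.

    Hypothetical helper for illustration; not msmbuilder's implementation.
    """
    if sliding_window:
        # Every frame t contributes the pair (x[t], x[t + lag_time]).
        pairs = zip(sequence[:-lag_time], sequence[lag_time:])
    else:
        # Subsample at an interval of lag_time, then count adjacent pairs.
        strided = sequence[::lag_time]
        pairs = zip(strided[:-1], strided[1:])

    # Start every entry at prior_counts (the "pseudo counts").
    counts = [[float(prior_counts)] * n_states for _ in range(n_states)]
    for i, j in pairs:
        counts[i][j] += 1.0
    return counts


seq = [0, 0, 1, 1, 0, 1, 1, 1, 0]
sliding = count_transitions(seq, n_states=2, lag_time=2, sliding_window=True)
strided = count_transitions(seq, n_states=2, lag_time=2, sliding_window=False)
smoothed = count_transitions(seq, n_states=2, lag_time=2, prior_counts=0.5)
```

The sliding window yields more counted transitions from the same sequence than subsampling does, but those transitions overlap in time and so are not statistically independent.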

Notes

Markov chain Monte Carlo can be computationally expensive. To get good (converged) results and acceptable performance, you’ll likely need to play around with the n_samples, n_steps and n_chains parameters. n_samples gives the total number of transition matrices sampled from the posterior. These samples are generated from n_chains different independent MCMC chains, at an interval of n_steps. The total number of iterations of MCMC performed during fit() is n_samples * n_steps. Increasing n_chains therefore does not alter the total number of iterations – instead it controls whether those iterations occur as part of one long chain or multiple shorter chains (which are run in parallel for sampler=='metzner').
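
The bookkeeping in this note can be written out as a short calculation. The function `mcmc_budget` is a hypothetical name, and the even split of samples across chains is an assumption made for illustration; the source states only that the total is n_samples * n_steps and that it does not depend on n_chains.

```python
def mcmc_budget(n_samples, n_steps, n_chains):
    """Return (total iterations, samples per chain, iterations per chain)."""
    total_iterations = n_samples * n_steps

    # Assumption: samples are divided as evenly as possible between chains.
    base, extra = divmod(n_samples, n_chains)
    samples_per_chain = [base + (1 if c < extra else 0) for c in range(n_chains)]
    iterations_per_chain = [s * n_steps for s in samples_per_chain]
    return total_iterations, samples_per_chain, iterations_per_chain


# One long chain versus four shorter chains: the total work is the same.
total_1, samples_1, iters_1 = mcmc_budget(n_samples=100, n_steps=400, n_chains=1)
total_4, samples_4, iters_4 = mcmc_budget(n_samples=100, n_steps=400, n_chains=4)
```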

References

[1]	P. Metzner, F. Noe and C. Schutte, “Estimating the sampling error: Distribution of transition matrices and functions of transition matrices for given trajectory data.” Phys. Rev. E 80 021106 (2009)

Attributes:

n_states_ : int: The number of states in the model
mapping_ : dict: Mapping between “input” labels and internal state indices used by the counts and transition matrix for this Markov state model. Input states need not necessarily be integers in (0, …, n_states_ - 1), for example. The semantics of mapping_[i] = j is that state i from the “input space” is represented by the index j in this MSM.
countsmat_ : array_like, shape = (n_states_, n_states_): Number of transition counts between states. countsmat_[i, j] is counted during fit(). The indices i and j are the “internal” indices described above. No correction for reversibility is made to this matrix.
all_transmats_ : array_like, shape = (n_samples, n_states_, n_states_): Samples from the posterior ensemble of transition matrices.
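
As a sketch of how the sampled ensemble might be summarized, the snippet below computes the posterior mean and standard deviation of a single transition probability. The `samples` list is made-up stand-in data with the documented shape (n_samples, n_states_, n_states_); it is not output from a fitted model, and `entry_summary` is a hypothetical helper.

```python
from statistics import mean, stdev

# Stand-in for the posterior samples: three 2x2 row-stochastic matrices.
samples = [
    [[0.90, 0.10], [0.20, 0.80]],
    [[0.88, 0.12], [0.25, 0.75]],
    [[0.93, 0.07], [0.18, 0.82]],
]


def entry_summary(samples, i, j):
    """Mean and sample standard deviation of T[i][j] over the ensemble."""
    values = [T[i][j] for T in samples]
    return mean(values), stdev(values)


p01_mean, p01_std = entry_summary(samples, 0, 1)
```

The same pattern applies to any function of the transition matrix: evaluate it on each sample, then summarize the resulting values.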

__init__(lag_time=1, n_samples=100, n_steps=0, n_chains=None, n_timescales=None, reversible=True, ergodic_cutoff='on', prior_counts=0, sliding_window=True, random_state=None, sampler='metzner', verbose=False): Initialize self. See help(type(self)) for accurate signature.

Methods

`__init__`([lag_time, n_samples, n_steps, …])	Initialize self.
`fit`(sequences[, y])
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_params`([deep])	Get parameters for this estimator.
`inverse_transform`(sequences)	Transform a list of sequences from internal indexing into labels
`partial_transform`(sequence[, mode])	Transform a sequence to internal indexing
`set_params`(**params)	Set the parameters of this estimator.
`summarize`()	Return some diagnostic summary statistics about this Markov model
`transform`(sequences[, mode])	Transform a list of sequences to internal indexing

Attributes

`all_eigenvalues_`	Eigenvalues of the transition matrices.
`all_left_eigenvectors_`	Left eigenvectors, \(\Phi\), of each transition matrix in the ensemble
`all_populations_`
`all_right_eigenvectors_`	Right eigenvectors, \(\Psi\), of each transition matrix in the ensemble
`all_timescales_`	Implied relaxation timescales of each sample in the ensemble

all_eigenvalues_

Eigenvalues of the transition matrices.

Returns:	eigs : array-like, shape = (n_samples, n_timescales+1) The eigenvalues of each transition matrix in the ensemble

all_left_eigenvectors_

Left eigenvectors, \(\Phi\), of each transition matrix in the ensemble

Each transition matrix’s left eigenvectors are normalized such that:

lv[:, 0] is the equilibrium populations and is normalized such that sum(lv[:, 0]) == 1

The eigenvectors satisfy sum(lv[:, i] * lv[:, i] / model.populations_) == 1. In math notation, this is \(\langle \phi_i, \phi_i \rangle_{\mu^{-1}} = 1\)

Returns:	lv : array-like, shape=(n_samples, n_states, n_timescales+1) The columns of lv, `lv[:, i]`, are the left eigenvectors of `transmat_`.

all_right_eigenvectors_

Right eigenvectors, \(\Psi\), of each transition matrix in the ensemble

Each transition matrix’s right eigenvectors are normalized such that:

Weighted by the stationary distribution, the right eigenvectors are normalized to 1. That is,

sum(rv[:, i] * rv[:, i] * self.populations_) == 1,

or \(\langle \psi_i, \psi_i \rangle_{\mu} = 1\)

Returns:	rv : array-like, shape=(n_samples, n_states, n_timescales+1) The columns of rv, `rv[:, i]`, are the right eigenvectors of `transmat_`.
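
The two normalization conventions can be checked by hand on a two-state reversible chain, where the eigenvectors are known in closed form. This is an illustrative sketch only: the probabilities `a` and `b` are hypothetical, the eigenvectors are stored as rows of plain lists rather than as columns of an array, and nothing here calls msmbuilder.

```python
import math

a, b = 0.1, 0.2  # hypothetical probabilities for 0 -> 1 and 1 -> 0
T = [[1 - a, a], [b, 1 - b]]

# Stationary distribution of the two-state chain.
pi = [b / (a + b), a / (a + b)]

# Right eigenvectors (eigenvalues 1 and 1 - a - b), scaled so that
# sum(psi_i * psi_i * pi) == 1.
scale = math.sqrt(a * b)
psi = [[1.0, 1.0], [a / scale, -b / scale]]

# For a reversible chain the left eigenvectors are phi_i = pi * psi_i,
# which gives sum(phi_i * phi_i / pi) == 1 and phi_0 == pi.
phi = [[p * x for p, x in zip(pi, vec)] for vec in psi]


def weighted_norm(vec, weights):
    """Return sum(v * v * w) over the components."""
    return sum(v * v * w for v, w in zip(vec, weights))


right_norms = [weighted_norm(v, pi) for v in psi]
left_norms = [weighted_norm(v, [1.0 / p for p in pi]) for v in phi]
```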

all_timescales_

Implied relaxation timescales of each sample in the ensemble

Returns:	timescales : array-like, shape = (n_samples, n_timescales) The longest implied relaxation timescales of each sample in the ensemble of transition matrices, expressed in units of the time step between indices in the source data supplied to `fit()`.
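
For reference, the standard relation between an eigenvalue \(\lambda_i\) of a transition matrix estimated at lag time \(\tau\) and its implied timescale is \(t_i = -\tau / \ln \lambda_i\). The sketch below applies it to one hypothetical set of eigenvalues; `implied_timescales` is an illustrative helper, not a library function, and it assumes the eigenvalues are sorted in descending order with the stationary eigenvalue first.

```python
import math


def implied_timescales(eigenvalues, lag_time):
    """Convert eigenvalues (descending, first one equal to 1) to timescales."""
    # The stationary eigenvalue 1.0 has no finite timescale, so skip it.
    return [-lag_time / math.log(lam) for lam in eigenvalues[1:]]


# Hypothetical eigenvalues of one sampled transition matrix.
timescales = implied_timescales([1.0, 0.9, 0.5], lag_time=10)
```

Eigenvalues closer to 1 correspond to slower processes, so the first entries of the result are the longest timescales.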

References

[1]	Prinz, Jan-Hendrik, et al. “Markov models of molecular kinetics: Generation and validation.” J. Chem. Phys. 134.17 (2011): 174105.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:	X : numpy array of shape [n_samples, n_features] Training set.
	y : numpy array of shape [n_samples] Target values.
Returns:	X_new : numpy array of shape [n_samples, n_features_new] Transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters:	deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:	params : mapping of string to any Parameter names mapped to their values.

inverse_transform(sequences)

Transform a list of sequences from internal indexing into labels

Parameters:	sequences : list List of sequences, each of which is one-dimensional array of integers in `0, ..., n_states_ - 1`.
Returns:	sequences : list List of sequences, each of which is one-dimensional array of labels.
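
Conceptually, this is the inverse of the label-to-index mapping stored in mapping_. A minimal sketch follows, using a hypothetical mapping and a hypothetical helper `inverse_transform_sketch`; it does not reproduce the library's implementation.

```python
# Hypothetical mapping_ : input label -> internal index.
mapping = {"folded": 0, "intermediate": 1, "unfolded": 2}

# Invert it so that internal indices map back to input labels.
inverse = {index: label for label, index in mapping.items()}


def inverse_transform_sketch(sequences, inverse):
    """Map each sequence of internal indices back to its labels."""
    return [[inverse[i] for i in seq] for seq in sequences]


labels = inverse_transform_sketch([[0, 1, 1, 2], [2, 0]], inverse)
```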

partial_transform(sequence, mode='clip')

Transform a sequence to internal indexing

Recall that sequence can be arbitrary labels, whereas transmat_ and countsmat_ are indexed with integers between 0 and n_states - 1. This method maps a sequence from its labels onto this internal indexing.

Parameters:

sequence : array-like

A 1D iterable of state labels. Labels can be integers, strings, or other orderable objects.

mode : {‘clip’, ‘fill’}

Method by which to treat labels in sequence which do not have a corresponding index. This can be due, for example, to the ergodic trimming step.

clip: Unmapped labels are removed during transform. If they occur at the beginning or end of a sequence, the resulting transformed sequence will be shortened. If they occur in the middle of a sequence, that sequence will be broken into two (or more) sequences. (Default)
fill: Unmapped labels will be replaced with NaN, to signal missing data. [The use of NaN to signal missing data is not fantastic, but it’s consistent with current behavior of the pandas library.]

Returns:

mapped_sequence : list or ndarray: If mode is “fill”, return an ndarray in internal indexing. If mode is “clip”, return a list of ndarrays each in internal indexing.
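
The difference between the two modes is easiest to see on a short sequence containing an unmapped label. The function below is a hypothetical pure-Python sketch of the documented behaviour; it returns plain lists instead of ndarrays and is not the library's code.

```python
import math


def partial_transform_sketch(sequence, mapping, mode="clip"):
    """Map labels to internal indices, handling unmapped labels by mode."""
    if mode == "fill":
        # Keep the original length; unmapped labels become NaN.
        return [float(mapping[x]) if x in mapping else math.nan
                for x in sequence]
    if mode == "clip":
        # Drop unmapped labels and split the sequence wherever one occurs.
        pieces, current = [], []
        for x in sequence:
            if x in mapping:
                current.append(mapping[x])
            elif current:
                pieces.append(current)
                current = []
        if current:
            pieces.append(current)
        return pieces
    raise ValueError("mode must be 'clip' or 'fill'")


mapping = {"A": 0, "B": 1}  # hypothetical: label "C" was trimmed away
seq = ["C", "A", "B", "C", "B", "A", "C"]
clipped = partial_transform_sketch(seq, mapping, mode="clip")
filled = partial_transform_sketch(seq, mapping, mode="fill")
```

With "clip" the single input sequence becomes two shorter sequences, while "fill" preserves the length and marks the gaps.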

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:	self

summarize(): Return some diagnostic summary statistics about this Markov model

transform(sequences, mode='clip')

Transform a list of sequences to internal indexing

Recall that sequences can be arbitrary labels, whereas transmat_ and countsmat_ are indexed with integers between 0 and n_states - 1. This method maps a set of sequences from the labels onto this internal indexing.

Parameters:

sequences : list of array-like

List of sequences, or a single sequence. Each sequence should be a 1D iterable of state labels. Labels can be integers, strings, or other orderable objects.

mode : {‘clip’, ‘fill’}

Method by which to treat labels in sequences which do not have a corresponding index. This can be due, for example, to the ergodic trimming step.

clip: Unmapped labels are removed during transform. If they occur at the beginning or end of a sequence, the resulting transformed sequence will be shortened. If they occur in the middle of a sequence, that sequence will be broken into two (or more) sequences. (Default)
fill: Unmapped labels will be replaced with NaN, to signal missing data. [The use of NaN to signal missing data is not fantastic, but it’s consistent with current behavior of the pandas library.]

Returns:

mapped_sequences : list: List of sequences in internal indexing