Reversible Markov State Model
This model fits a first-order Markov model to a dataset of integer-valued timeseries. The key estimated attribute, transmat_, is a matrix containing the estimated probability of transitioning between pairs of states in the duration specified by lag_time.
Unless otherwise specified, the model is constrained to be reversible (satisfy detailed balance), which is appropriate for equilibrium chemical systems.
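As a rough illustration of what such an estimate involves, the sketch below counts transitions at a given lag with a sliding window, symmetrizes the counts, and row-normalizes them. Symmetrizing the counts is one simple way to enforce detailed balance; it is shown here as an assumption about what a 'transpose'-style estimate looks like, not as this class's exact procedure. The helper names count_transitions and reversible_transmat are hypothetical, and the input is assumed to already use labels 0, ..., n_states - 1.

```python
import numpy as np


def count_transitions(sequences, n_states, lag_time=1):
    """Count i -> j transitions separated by lag_time steps (sliding window)."""
    counts = np.zeros((n_states, n_states))
    for seq in sequences:
        seq = np.asarray(seq)
        for i, j in zip(seq[:-lag_time], seq[lag_time:]):
            counts[i, j] += 1
    return counts


def reversible_transmat(counts):
    """Symmetrize the counts, then row-normalize (illustrative only)."""
    sym = 0.5 * (counts + counts.T)
    row_sums = sym.sum(axis=1)
    transmat = sym / row_sums[:, None]
    # For symmetrized counts the stationary distribution is the
    # normalized vector of row sums.
    populations = row_sums / row_sums.sum()
    return transmat, populations


sequences = [[0, 0, 1, 1, 2, 2, 1, 0], [2, 1, 1, 0, 0, 1, 2]]
counts = count_transitions(sequences, n_states=3, lag_time=1)
transmat, populations = reversible_transmat(counts)

# Detailed balance: populations[i] * transmat[i, j] is symmetric in i and j.
flux = populations[:, None] * transmat
```

Each row of transmat sums to 1, and flux equals its own transpose, which is the detailed balance condition mentioned above.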
Parameters: | lag_time : int
n_timescales : int, optional
reversible_type : {‘mle’, ‘transpose’, None}
ergodic_cutoff : int, default=1
prior_counts : float, optional
sliding_window : bool, optional
verbose : bool
|
---|
References
[R31] | Prinz, Jan-Hendrik, et al. “Markov models of molecular kinetics: Generation and validation.” J. Chem. Phys. 134.17 (2011): 174105. |
[R32] | Pande, V. S., K. A. Beauchamp, and G. R. Bowman. “Everything you wanted to know about Markov State Models but were afraid to ask” Methods 52.1 (2010): 99-105. |
Attributes
n_states_ | (int) The number of states in the model |
mapping_ | (dict) Mapping between “input” labels and internal state indices used by the counts and transition matrix for this Markov state model. Input states need not necessarily be integers in (0, ..., n_states_ - 1), for example. The semantics of mapping_[i] = j is that state i from the “input space” is represented by the index j in this MSM. |
countsmat_ | (array_like, shape = (n_states_, n_states_)) Number of transition counts between states. countsmat_[i, j] is counted during fit(). The indices i and j are the “internal” indices described above. No correction for reversibility is made to this matrix. |
transmat_ | (array_like, shape = (n_states_, n_states_)) Maximum likelihood estimate of the reversible transition matrix. The indices i and j are the “internal” indices described above. |
populations_ | (array, shape = (n_states_,)) The equilibrium population (stationary eigenvector) of transmat_ |
Methods
draw_samples(sequences, n_samples[, ...]) | Sample conformations from each state. |
eigtransform(sequences[, right, mode]) | Transform a list of sequences by projecting the sequences onto the first n_timescales dynamical eigenvectors. |
fit(sequences[, y]) | Estimate model parameters. |
fit_transform(X[, y]) | Fit to data, then transform it. |
get_params([deep]) | Get parameters for this estimator. |
inverse_transform(sequences) | Transform a list of sequences from internal indexing into labels |
sample([state, n_steps, random_state]) | Generate a random sequence of states by propagating the model |
score(sequences[, y]) | Score the model on new data using the generalized matrix Rayleigh quotient |
score_ll(sequences) | log of the likelihood of sequences with respect to the model |
set_params(**params) | Set the parameters of this estimator. |
summarize() | Return some diagnostic summary statistics about this Markov model |
transform(sequences[, mode]) | Transform a list of sequences to internal indexing |
Estimate model parameters.
Parameters: | sequences : list of array-like
|
---|---|
Returns: | self |
Notes
None and NaN are recognized immediately as invalid labels. Therefore, transition counts from or to a sequence item which is NaN or None will not be counted. The mapping_ attribute will not include the NaN or None.
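The note above can be sketched in plain Python. This is a minimal illustration, assuming that labels are assigned internal indices in sorted order and ignoring any ergodic trimming; the helpers is_invalid, build_mapping and count_pairs are hypothetical names, not part of this class.

```python
import math


def is_invalid(label):
    """True for None and for float NaN, the two labels treated as invalid."""
    return label is None or (isinstance(label, float) and math.isnan(label))


def build_mapping(sequences):
    """Map each valid input label to an internal index (sorted order assumed)."""
    labels = sorted({x for seq in sequences for x in seq if not is_invalid(x)})
    return {label: index for index, label in enumerate(labels)}


def count_pairs(sequences, mapping, lag_time=1):
    """Count transitions, skipping any pair that touches an invalid label."""
    n = len(mapping)
    counts = [[0] * n for _ in range(n)]
    for seq in sequences:
        for a, b in zip(seq[:-lag_time], seq[lag_time:]):
            if is_invalid(a) or is_invalid(b):
                continue
            counts[mapping[a]][mapping[b]] += 1
    return counts


sequences = [[10, 10, None, 30, 30, 10], [30, float("nan"), 10, 10]]
mapping = build_mapping(sequences)
counts = count_pairs(sequences, mapping)
```

Here mapping contains only the labels 10 and 30, and the four pairs that involve None or NaN contribute nothing to counts.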
Transform a list of sequences by projecting the sequences onto the first n_timescales dynamical eigenvectors.
Parameters: | sequences : list of array-like
right : bool
mode : {‘clip’, ‘fill’}
|
---|---|
Returns: | transformed : list of 2d arrays
|
References
[R33] | Prinz, Jan-Hendrik, et al. “Markov models of molecular kinetics: Generation and validation.” J. Chem. Phys. 134.17 (2011): 174105. |
Generate a random sequence of states by propagating the model
Parameters: | state : {None, ndarray, label}
n_steps : int
random_state : int or RandomState instance or None (default)
|
---|---|
Returns: | sequence : array of length n_steps
|
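Propagating the model amounts to repeatedly drawing the next state from the row of the transition matrix that belongs to the current state. The sketch below shows this with NumPy; sample_chain is a hypothetical helper, and counting the starting state as the first of the n_steps entries is an assumption made here for illustration.

```python
import numpy as np


def sample_chain(transmat, start, n_steps, random_state=None):
    """Draw a state sequence of length n_steps from a row-stochastic matrix."""
    rng = np.random.RandomState(random_state)
    n_states = transmat.shape[0]
    sequence = np.empty(n_steps, dtype=int)
    sequence[0] = start
    for t in range(1, n_steps):
        # The next state is drawn from the row of the current state.
        sequence[t] = rng.choice(n_states, p=transmat[sequence[t - 1]])
    return sequence


transmat = np.array([[0.9, 0.1, 0.0],
                     [0.1, 0.8, 0.1],
                     [0.0, 0.2, 0.8]])
sequence = sample_chain(transmat, start=0, n_steps=10, random_state=0)
```

Passing the same integer random_state reproduces the same sequence, which is convenient for tests.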
log of the likelihood of sequences with respect to the model
Parameters: | sequences : list of array-like
|
---|---|
Returns: | loglikelihood : float
|
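For a first-order Markov model, the log-likelihood of a sequence is the sum of the logs of the transition probabilities along it. A minimal sketch follows, assuming the sequences already use internal indices and that transitions are taken lag_time steps apart; log_likelihood is a hypothetical helper and may differ in detail from what score_ll computes.

```python
import math


def log_likelihood(sequences, transmat, lag_time=1):
    """Sum log T[i][j] over all observed transitions i -> j."""
    total = 0.0
    for seq in sequences:
        for i, j in zip(seq[:-lag_time], seq[lag_time:]):
            # math.log raises ValueError if the model assigns probability 0.
            total += math.log(transmat[i][j])
    return total


transmat = [[0.9, 0.1], [0.2, 0.8]]
loglikelihood = log_likelihood([[0, 0, 1, 1, 0]], transmat)
```

The example sequence contains the transitions 0→0, 0→1, 1→1 and 1→0, so the result equals log(0.9 × 0.1 × 0.8 × 0.2).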
Return some diagnostic summary statistics about this Markov model
Training score of the model, computed as the generalized matrix Rayleigh quotient, the sum of the first n_components eigenvalues.
Score the model on new data using the generalized matrix Rayleigh quotient
Parameters: | sequences : list of array-like
|
---|---|
Returns: | gmrq : float
|
References
[R34] | McGibbon, R. T. and V. S. Pande, “Variational cross-validation of slow dynamical modes in molecular kinetics” http://arxiv.org/abs/1407.8083 (2014) |
Implied relaxation timescales of the model.
The relaxation of any initial distribution towards equilibrium is given, according to this model, by a sum of terms, each corresponding to the relaxation along a specific direction (eigenvector) in state space, which decay exponentially in time. See equation 19 of [R35].
Returns: | timescales : array-like, shape = (n_timescales,)
|
---|
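Each exponentially decaying term corresponds to an eigenvalue λ_i of the transition matrix, and the standard relation between the two is t_i = -lag_time / ln(λ_i), in the same time units as lag_time. The sketch below applies that relation with NumPy; implied_timescales is a hypothetical helper, and it assumes a reversible matrix so that the eigenvalues are real.

```python
import numpy as np


def implied_timescales(transmat, lag_time, n_timescales):
    """Timescales from the largest non-unit eigenvalues of transmat."""
    eigvals = np.linalg.eigvals(transmat).real
    eigvals = np.sort(eigvals)[::-1]  # descending; eigvals[0] is 1
    # Skip the stationary eigenvalue. Eigenvalues <= 0 would yield NaN
    # or negative values and carry no meaningful timescale.
    return -lag_time / np.log(eigvals[1:n_timescales + 1])


transmat = np.array([[0.9, 0.1],
                     [0.2, 0.8]])
timescales = implied_timescales(transmat, lag_time=10, n_timescales=1)
```

This matrix has eigenvalues 1 and 0.7, so the single timescale is -10 / ln(0.7), roughly 28 time units.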
References
[R35] | Prinz, Jan-Hendrik, et al. “Markov models of molecular kinetics: Generation and validation.” J. Chem. Phys. 134.17 (2011): 174105. |
Eigenvalues of the transition matrix.
Left eigenvectors, \(\Phi\), of the transition matrix.
The left eigenvectors are normalized such that:
- lv[:, 0] is the equilibrium populations and is normalized such that sum(lv[:, 0]) == 1
- The eigenvectors satisfy sum(lv[:, i] * lv[:, i] / model.populations_) == 1. In math notation, this is \(<\phi_i, \phi_i>_{\mu^{-1}} = 1\)
Returns: | lv : array-like, shape=(n_states, n_timescales+1)
|
---|
Right eigenvectors, \(\Psi\), of the transition matrix.
The right eigenvectors are normalized to 1 when weighted by the stationary distribution. That is,
sum(rv[:, i] * rv[:, i] * self.populations_) == 1,
or \(<\psi_i, \psi_i>_{\mu} = 1\)
Returns: | rv : array-like, shape=(n_states, n_timescales+1)
|
---|
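Both eigenvector normalizations can be checked numerically. The sketch below uses a common construction for reversible matrices: it symmetrizes the transition matrix with the square root of the populations, diagonalizes the result, and rescales the orthonormal eigenvectors. This is an illustrative route and not necessarily how this class computes its eigenvectors; normalized_eigenvectors is a hypothetical helper.

```python
import numpy as np


def normalized_eigenvectors(transmat, populations):
    """Left/right eigenvectors of a reversible matrix, normalized as above."""
    sqrt_mu = np.sqrt(populations)
    # For a reversible matrix, D^(1/2) T D^(-1/2) is symmetric.
    sym = sqrt_mu[:, None] * transmat / sqrt_mu[None, :]
    sym = 0.5 * (sym + sym.T)  # remove round-off asymmetry
    eigvals, v = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]
    eigvals, v = eigvals[order], v[:, order]
    if v[:, 0].sum() < 0:  # make the stationary vector positive
        v[:, 0] *= -1
    lv = sqrt_mu[:, None] * v  # left eigenvectors as columns
    rv = v / sqrt_mu[:, None]  # right eigenvectors as columns
    return eigvals, lv, rv


transmat = np.array([[0.9, 0.1],
                     [0.2, 0.8]])
populations = np.array([2.0, 1.0]) / 3.0
eigvals, lv, rv = normalized_eigenvectors(transmat, populations)

left_norms = (lv ** 2 / populations[:, None]).sum(axis=0)
right_norms = (rv ** 2 * populations[:, None]).sum(axis=0)
```

With this construction lv[:, 0] reproduces the populations, and left_norms and right_norms are all 1 up to floating-point error.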
Sample conformations from each state.
Parameters: | sequences : list
n_samples : int
|
---|---|
Returns: | selected_pairs_by_state : np.array, dtype=int, shape=(n_states, n_samples, 2)
|
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: | X : numpy array of shape [n_samples, n_features]
y : numpy array of shape [n_samples]
|
---|---|
Returns: | X_new : numpy array of shape [n_samples, n_features_new]
|
Get parameters for this estimator.
Parameters: | deep: boolean, optional
|
---|---|
Returns: | params : mapping of string to any
|
Transform a list of sequences from internal indexing into labels
Parameters: | sequences : list
|
---|---|
Returns: | sequences : list
|
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns: | self |
---|
Transform a list of sequences to internal indexing
Recall that sequences can be arbitrary labels, whereas transmat_ and countsmat_ are indexed with integers between 0 and n_states - 1. This method maps a set of sequences from the labels onto this internal indexing.
Parameters: | sequences : list of array-like
mode : {‘clip’, ‘fill’}
|
---|---|
Returns: | mapped_sequences : list
|
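The mapping from labels to internal indices can be sketched with a plain dictionary. The two helpers below are hypothetical, and the semantics they give to the modes are an assumption, since the parameter descriptions are not reproduced above: 'fill' is taken to replace unmapped labels with NaN, and 'clip' to drop them, splitting the sequence at each gap.

```python
import math


def transform_fill(sequence, mapping):
    """Replace each label by its index; unmapped labels become NaN (assumed)."""
    return [float(mapping[x]) if x in mapping else float("nan")
            for x in sequence]


def transform_clip(sequence, mapping):
    """Drop unmapped labels, splitting the sequence at each gap (assumed)."""
    pieces, current = [], []
    for x in sequence:
        if x in mapping:
            current.append(mapping[x])
        elif current:
            pieces.append(current)
            current = []
    if current:
        pieces.append(current)
    return pieces


mapping = {10: 0, 30: 1}          # input label -> internal index
sequence = [10, 30, 99, 30, 10]   # 99 has no internal index

filled = transform_fill(sequence, mapping)
clipped = transform_clip(sequence, mapping)
```

Under these assumptions one input sequence can yield several output sequences in 'clip' mode, because transitions across a removed label should not be counted.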