msmbuilder.msm.ContinuousTimeMSM¶
-
class
msmbuilder.msm.
ContinuousTimeMSM
(lag_time=1, n_timescales=None, ergodic_cutoff=1, sliding_window=True, verbose=False, guess='log')¶ Reversible first order master equation model
This model fits a continuous-time Markov model (master equation) from discrete-time integer labeled timeseries. The key estimated attribute,
ratemat_
, is a matrix containing the estimated first order rate constants between the states. See [1] for details.Parameters: - lag_time : int
The lag time used to count the number of state to state transition events.
- n_timescales : int, optional
Number of implied timescales to calculate.
- ergodic_cutoff : int, default=1
Only the maximal strongly ergodic subgraph of the data is used to build an MSM. Ergodicity is determined by ensuring that each state is accessible from each other state via one or more paths involving edges with a number of observed directed counts greater than or equal to
ergodic_cutoff
. Not that by settingergodic_cutoff
to 0, this trimming is effectively turned off.- sliding_window : bool, default=True
Count transitions using a window of length
lag_time
, which is slid along the sequences 1 unit at a time, yielding transitions which contain more data but cannot be assumed to be statistically independent. Otherwise, the sequences are simply subsampled at an interval oflag_time
.- verbose : bool, default=False
Verbosity level
- guess : {‘log’, ‘pseudo’, array}, default=’log’
Method for determining the initial guess rate matrix, as input to the optimizer.
- ‘log’:
Initialize from matrix log of the MLE transition matrix
- ‘pseudo’:
Initialize from the pseudo-generator, using the 1st order taylor expansion of the matrix exponential.
Otherwise, supply your own (n_states_ x n_states_) numpy array as a guess rate matrix.
See also
MarkovStateModel
- discrete-time analog
References
[1] R. T. McGibbon and V. S. Pande, Efficient maximum likelihood parameterization of continuous-time Markov processes.” J. Chem. Phys. 143, 034109 (2015) http://dx.doi.org/10.1063/1.4926516 Attributes: - n_states_ : int
The number of states
- ratemat_ : np.ndarray, shape=(n_states_, n_state_)
The estimated state-to-state transition rates.
- transmat_ : np.ndarray, shape=(n_states_, n_state_)
The estimated state-to-state transition probabilities over an interval of 1 time unit.
- timescales_ : array of shape=(n_timescales,)
Estimated relaxation timescales of the model.
- populations_ : np.ndarray, shape=(n_states_,)
Estimated stationary probability distribution over the states.
- countsmat_ : array_like, shape = (n_states_, n_states_)
Number of transition counts between states, at a time delay of
lag_time
countsmat_[i, j] is counted during fit().- optimizer_state_ : object
Contains information about the optimization termination.
- mapping_ : dict
Mapping between “input” labels and internal state indices used by the counts and transition matrix for this Markov state model. Input states need not necessarily be integers in (0, …, n_states_ - 1), for example. The semantics of
mapping_[i] = j
is that statei
from the “input space” is represented by the indexj
in this MSM.- theta_ : array of shape n*(n+1)/2 or shorter
Optimized set of parameters for the model.
- information_ : np.ndarray, shape=(len(theta_), len(theta_))
Approximate inverse of the hessian of the model log-likelihood evaluated at
theta_
.- eigenvalues_ : array of shape=(n_timescales+1)
Largest eigenvalues of the rate matrix.
- left_eigenvectors_ : array of shape=(n_timescales+1)
Dominant left eigenvectors of the rate matrix.
- right_eigenvectors_ : array of shape=(n_timescales+1)
Dominant right eigenvectors of the rate matrix,
Methods
draw_samples
(sequences, n_samples[, …])Sample conformations for a sequences of states. fit_transform
(X[, y])Fit to data, then transform it. get_params
([deep])Get parameters for this estimator. inverse_transform
(sequences)Transform a list of sequences from internal indexing into labels partial_transform
(sequence[, mode])Transform a sequence to internal indexing sample_discrete
([state, n_steps, random_state])Generate a random sequence of states by propagating the model using discrete time steps given by the model lagtime. score
(sequences[, y])Score the model on new data using the generalized matrix Rayleigh quotient set_params
(**params)Set the parameters of this estimator. summarize
()Return some diagnostic summary statistics about this Markov model transform
(sequences[, mode])Transform a list of sequences to internal indexing uncertainty_K
()Estimate of the element-wise asymptotic standard deviation in the rate matrix uncertainty_eigenvalues
()Estimate of the element-wise asymptotic standard deviation in the model eigenvalues uncertainty_pi
()Estimate of the element-wise asymptotic standard deviation in the stationary distribution. uncertainty_timescales
()Estimate of the element-wise asymptotic standard deviation in the model relaxation timescales. fit -
__init__
(lag_time=1, n_timescales=None, ergodic_cutoff=1, sliding_window=True, verbose=False, guess='log')¶ Initialize self. See help(type(self)) for accurate signature.
Methods
__init__
([lag_time, n_timescales, …])Initialize self. draw_samples
(sequences, n_samples[, …])Sample conformations for a sequences of states. fit
(sequences[, y])fit_transform
(X[, y])Fit to data, then transform it. get_params
([deep])Get parameters for this estimator. inverse_transform
(sequences)Transform a list of sequences from internal indexing into labels partial_transform
(sequence[, mode])Transform a sequence to internal indexing sample_discrete
([state, n_steps, random_state])Generate a random sequence of states by propagating the model using discrete time steps given by the model lagtime. score
(sequences[, y])Score the model on new data using the generalized matrix Rayleigh quotient set_params
(**params)Set the parameters of this estimator. summarize
()Return some diagnostic summary statistics about this Markov model transform
(sequences[, mode])Transform a list of sequences to internal indexing uncertainty_K
()Estimate of the element-wise asymptotic standard deviation in the rate matrix uncertainty_eigenvalues
()Estimate of the element-wise asymptotic standard deviation in the model eigenvalues uncertainty_pi
()Estimate of the element-wise asymptotic standard deviation in the stationary distribution. uncertainty_timescales
()Estimate of the element-wise asymptotic standard deviation in the model relaxation timescales. Attributes
score_
Training score of the model, computed as the generalized matrix, Rayleigh quotient, the sum of the first n_components eigenvalues -
draw_samples
(sequences, n_samples, random_state=None)¶ Sample conformations for a sequences of states.
Parameters: - sequences : list or list of lists
A sequence or list of sequences, in which each element corresponds to a state label.
- n_samples : int
How many samples to return for any given state.
Returns: - selected_pairs_by_state : np.array, dtype=int,
shape=(n_states, n_samples, 2) selected_pairs_by_state[state] gives an array of randomly selected (trj, frame) pairs from the specified state.
See also
utils.map_drawn_samples
- Extract conformations from MD trajectories by
index.
-
fit_transform
(X, y=None, **fit_params)¶ Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: - X : numpy array of shape [n_samples, n_features]
Training set.
- y : numpy array of shape [n_samples]
Target values.
Returns: - X_new : numpy array of shape [n_samples, n_features_new]
Transformed array.
-
get_params
(deep=True)¶ Get parameters for this estimator.
Parameters: - deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: - params : mapping of string to any
Parameter names mapped to their values.
-
inverse_transform
(sequences)¶ Transform a list of sequences from internal indexing into labels
Parameters: - sequences : list
List of sequences, each of which is one-dimensional array of integers in
0, ..., n_states_ - 1
.
Returns: - sequences : list
List of sequences, each of which is one-dimensional array of labels.
-
partial_transform
(sequence, mode='clip')¶ Transform a sequence to internal indexing
Recall that sequence can be arbitrary labels, whereas
transmat_
andcountsmat_
are indexed with integers between 0 andn_states - 1
. This methods maps a set of sequences from the labels onto this internal indexing.Parameters: - sequence : array-like
A 1D iterable of state labels. Labels can be integers, strings, or other orderable objects.
- mode : {‘clip’, ‘fill’}
Method by which to treat labels in sequence which do not have a corresponding index. This can be due, for example, to the ergodic trimming step.
clip
Unmapped labels are removed during transform. If they occur at the beginning or end of a sequence, the resulting transformed sequence will be shorted. If they occur in the middle of a sequence, that sequence will be broken into two (or more) sequences. (Default)
fill
Unmapped labels will be replaced with NaN, to signal missing data. [The use of NaN to signal missing data is not fantastic, but it’s consistent with current behavior of the
pandas
library.]
Returns: - mapped_sequence : list or ndarray
If mode is “fill”, return an ndarray in internal indexing. If mode is “clip”, return a list of ndarrays each in internal indexing.
-
sample_discrete
(state=None, n_steps=100, random_state=None)¶ Generate a random sequence of states by propagating the model using discrete time steps given by the model lagtime.
Parameters: - state : {None, ndarray, label}
Specify the starting state for the chain.
None
Choose the initial state by randomly drawing from the model’s stationary distribution.
array-like
If
state
is a 1D array with length equal ton_states_
, then it is is interpreted as an initial multinomial distribution from which to draw the chain’s initial state. Note that the indexing semantics of this array must match the _internal_ indexing of this model.- otherwise
Otherwise,
state
is interpreted as a particular deterministic state label from which to begin the trajectory.
- n_steps : int
Lengths of the resulting trajectory
- random_state : int or RandomState instance or None (default)
Pseudo Random Number generator seed control. If None, use the numpy.random singleton.
Returns: - sequence : array of length n_steps
A randomly sampled label sequence
-
score
(sequences, y=None)¶ Score the model on new data using the generalized matrix Rayleigh quotient
Parameters: - sequences : list of array-like
List of sequences, or a single sequence. Each sequence should be a 1D iterable of state labels. Labels can be integers, strings, or other orderable objects.
Returns: - gmrq : float
Generalized matrix Rayleigh quotient. This number indicates how well the top
n_timescales+1
eigenvectors of this model perform as slowly decorrelating collective variables for the new data insequences
.
References
[1] McGibbon, R. T. and V. S. Pande, “Variational cross-validation of slow dynamical modes in molecular kinetics” J. Chem. Phys. 142, 124105 (2015)
-
score_
¶ Training score of the model, computed as the generalized matrix, Rayleigh quotient, the sum of the first n_components eigenvalues
-
set_params
(**params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it’s possible to update each component of a nested object.Returns: - self
-
summarize
()¶ Return some diagnostic summary statistics about this Markov model
-
transform
(sequences, mode='clip')¶ Transform a list of sequences to internal indexing
Recall that sequences can be arbitrary labels, whereas
transmat_
andcountsmat_
are indexed with integers between 0 andn_states - 1
. This methods maps a set of sequences from the labels onto this internal indexing.Parameters: - sequences : list of array-like
List of sequences, or a single sequence. Each sequence should be a 1D iterable of state labels. Labels can be integers, strings, or other orderable objects.
- mode : {‘clip’, ‘fill’}
Method by which to treat labels in sequences which do not have a corresponding index. This can be due, for example, to the ergodic trimming step.
clip
Unmapped labels are removed during transform. If they occur at the beginning or end of a sequence, the resulting transformed sequence will be shorted. If they occur in the middle of a sequence, that sequence will be broken into two (or more) sequences. (Default)
fill
Unmapped labels will be replaced with NaN, to signal missing data. [The use of NaN to signal missing data is not fantastic, but it’s consistent with current behavior of the
pandas
library.]
Returns: - mapped_sequences : list
List of sequences in internal indexing
-
uncertainty_K
()¶ Estimate of the element-wise asymptotic standard deviation in the rate matrix
-
uncertainty_eigenvalues
()¶ Estimate of the element-wise asymptotic standard deviation in the model eigenvalues
-
uncertainty_pi
()¶ Estimate of the element-wise asymptotic standard deviation in the stationary distribution.
-
uncertainty_timescales
()¶ Estimate of the element-wise asymptotic standard deviation in the model relaxation timescales.