msmbuilder.msm.ContinuousTimeMSM

class msmbuilder.msm.ContinuousTimeMSM(lag_time=1, n_timescales=None, ergodic_cutoff=1, sliding_window=True, verbose=False)

Reversible first order master equation model

This model fits a reversible continuous-time Markov model for labeled sequence data.

Warning

This model is currently (as of December 2, 2014) experimental, and may undergo significant changes or bugfixes in upcoming releases.

Parameters:

lag_time : int

The lag time used to count the number of state-to-state transition events.

n_timescales : int, optional

Number of implied timescales to calculate.

ergodic_cutoff : int, default=1

Only the maximal strongly ergodic subgraph of the data is used to build an MSM. Ergodicity is determined by ensuring that each state is accessible from each other state via one or more paths involving edges with a number of observed directed counts greater than or equal to ergodic_cutoff. Note that setting ergodic_cutoff to 0 effectively turns this trimming off.

sliding_window : bool, default=True

Count transitions using a window of length lag_time, which is slid along the sequences 1 unit at a time, yielding transitions which contain more data but cannot be assumed to be statistically independent. Otherwise, the sequences are simply subsampled at an interval of lag_time.

verbose : bool, default=False

Verbosity level
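
The two counting schemes selected by sliding_window can be illustrated with a small standalone sketch. The helper count_transitions below is hypothetical and not part of msmbuilder; the library's own counting routine may differ in details such as normalization or the handling of missing data.

```python
from collections import Counter


def count_transitions(sequence, lag_time=1, sliding_window=True):
    """Count (from_state, to_state) pairs separated by lag_time frames.

    Hypothetical helper for illustration only; not the msmbuilder API.
    """
    if sliding_window:
        # Slide a window of length lag_time along the sequence one frame
        # at a time: every frame i is paired with frame i + lag_time.
        pairs = ((sequence[i], sequence[i + lag_time])
                 for i in range(len(sequence) - lag_time))
    else:
        # Subsample the sequence every lag_time frames, then count
        # transitions between consecutive subsampled frames.
        sub = sequence[::lag_time]
        pairs = zip(sub, sub[1:])
    return Counter(pairs)


seq = [0, 0, 1, 1, 0, 1, 1, 0]
sliding = count_transitions(seq, lag_time=2, sliding_window=True)
subsampled = count_transitions(seq, lag_time=2, sliding_window=False)
print(sum(sliding.values()), sum(subsampled.values()))
```

With this eight-frame sequence and a lag time of 2, the sliding window yields six overlapping transitions while subsampling yields three non-overlapping ones, which is the trade-off between data volume and statistical independence described for sliding_window.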

See also

MarkovStateModel
discrete-time analog

Attributes

n_states_ (int) The number of states
ratemat_ (np.ndarray, shape=(n_states_, n_states_)) The estimated state-to-state transition rates.
transmat_ (np.ndarray, shape=(n_states_, n_states_)) The estimated state-to-state transition probabilities over an interval of 1 time unit.
timescales_ (array of shape=(n_timescales,)) Estimated relaxation timescales of the model.
populations_ (np.ndarray, shape=(n_states_,)) Estimated stationary probability distribution over the states.
countsmat_ (array_like, shape = (n_states_, n_states_)) Number of transition counts between states at a time delay of lag_time. countsmat_[i, j] is counted during fit().
optimizer_state_ (object) Contains information about the optimization termination.
mapping_ (dict) Mapping between “input” labels and internal state indices used by the counts and transition matrix for this Markov state model. Input states need not necessarily be integers in (0, ..., n_states_ - 1), for example. The semantics of mapping_[i] = j is that state i from the “input space” is represented by the index j in this MSM.
theta_ (array of shape n*(n+1)/2 or shorter) Optimized set of parameters for the model.
information_ (np.ndarray, shape=(len(theta_), len(theta_))) Approximate inverse of the hessian of the model log-likelihood evaluated at theta_.
eigenvalues_ (array of shape=(n_timescales+1)) Largest eigenvalues of the rate matrix.
left_eigenvectors_ (array of shape=(n_timescales+1)) Dominant left eigenvectors of the rate matrix.
right_eigenvectors_ (array of shape=(n_timescales+1)) Dominant right eigenvectors of the rate matrix.
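
How a rate matrix relates to transition probabilities, stationary populations, and relaxation timescales can be sketched in plain Python. The example assumes the standard continuous-time Markov relation T(t) = exp(K t) and uses a made-up two-state rate matrix; the names expm_taylor, a, and b are illustrative and are not part of msmbuilder, and this is not necessarily how the library computes transmat_ internally.

```python
import math


def mat_mul(A, B):
    """Multiply two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_taylor(K, t=1.0, terms=40):
    """Approximate exp(K * t) with a truncated Taylor series.

    Adequate for small, well-scaled matrices only; a library routine such
    as scipy.linalg.expm is the better choice in practice.
    """
    n = len(K)
    Kt = [[K[i][j] * t for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        # term now holds (K t)^m / m!
        term = [[x / m for x in row] for row in mat_mul(term, Kt)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


# Made-up two-state rate matrix; each row sums to zero.
a, b = 0.3, 0.1
K = [[-a, a],
     [b, -b]]

T = expm_taylor(K, t=1.0)          # transition probabilities over 1 time unit
pi = [b / (a + b), a / (a + b)]    # stationary distribution: pi K = 0
timescale = 1.0 / (a + b)          # -1 / (nonzero eigenvalue of K)

print(T, pi, timescale)
```

For a two-state model the nonzero eigenvalue of K is -(a + b), so the single relaxation timescale is 1 / (a + b). The stationary distribution satisfies detailed balance, pi[0] * K[0][1] == pi[1] * K[1][0], which is what "reversible" refers to.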

Methods

fit(sequences[, y])
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
inverse_transform(sequences) Transform a list of sequences from internal indexing into labels
score(sequences[, y]) Score the model on new data using the generalized matrix Rayleigh quotient
set_params(**params) Set the parameters of this estimator.
summarize()
transform(sequences[, mode]) Transform a list of sequences to internal indexing
uncertainty_K() Estimate of the element-wise asymptotic standard deviation in the rate matrix
uncertainty_eigenvalues() Estimate of the element-wise asymptotic standard deviation in the model eigenvalues
uncertainty_pi() Estimate of the element-wise asymptotic standard deviation in the stationary distribution.
uncertainty_timescales() Estimate of the element-wise asymptotic standard deviation in the model relaxation timescales.
uncertainty_K()

Estimate of the element-wise asymptotic standard deviation in the rate matrix

uncertainty_pi()

Estimate of the element-wise asymptotic standard deviation in the stationary distribution.

uncertainty_eigenvalues()

Estimate of the element-wise asymptotic standard deviation in the model eigenvalues

uncertainty_timescales()

Estimate of the element-wise asymptotic standard deviation in the model relaxation timescales.

score_

Training score of the model, computed as the generalized matrix Rayleigh quotient, the sum of the first n_components eigenvalues.

score(sequences, y=None)

Score the model on new data using the generalized matrix Rayleigh quotient

Parameters:

sequences : list of array-like

List of sequences, or a single sequence. Each sequence should be a 1D iterable of state labels. Labels can be integers, strings, or other orderable objects.

Returns:

gmrq : float

Generalized matrix Rayleigh quotient. This number indicates how well the top n_timescales+1 eigenvectors of this model perform as slowly decorrelating collective variables for the new data in sequences.

References

[R29] McGibbon, R. T. and V. S. Pande, “Variational cross-validation of slow dynamical modes in molecular kinetics” http://arxiv.org/abs/1407.8083 (2014)

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:

X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.

inverse_transform(sequences)

Transform a list of sequences from internal indexing into labels

Parameters:

sequences : list

List of sequences, each of which is a one-dimensional array of integers in 0, ..., n_states_ - 1.

Returns:

sequences : list

List of sequences, each of which is a one-dimensional array of labels.
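
The label recovery described here amounts to inverting the mapping_ dictionary. The sketch below uses a hypothetical helper, inverse_transform_sketch, and a made-up mapping; it illustrates the semantics only and is not the library's implementation.

```python
def inverse_transform_sketch(sequences, mapping):
    """Map internal indices back to input labels.

    mapping plays the role of mapping_: {input_label: internal_index}.
    Hypothetical helper for illustration only.
    """
    # Invert the label -> index mapping to obtain index -> label.
    inverse = {index: label for label, index in mapping.items()}
    return [[inverse[i] for i in seq] for seq in sequences]


# Made-up mapping: label "c" is absent, as if removed by ergodic trimming.
mapping = {"a": 0, "b": 1, "d": 2}
print(inverse_transform_sketch([[0, 1, 1], [2, 0]], mapping))
```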

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:

self

transform(sequences, mode='clip')

Transform a list of sequences to internal indexing

Recall that sequences can be arbitrary labels, whereas transmat_ and countsmat_ are indexed with integers between 0 and n_states_ - 1. This method maps a set of sequences from the labels onto this internal indexing.

Parameters:

sequences : list of array-like

List of sequences, or a single sequence. Each sequence should be a 1D iterable of state labels. Labels can be integers, strings, or other orderable objects.

mode : {‘clip’, ‘fill’}

Method by which to treat labels in sequences which do not have a corresponding index. This can be due, for example, to the ergodic trimming step.

clip

Unmapped labels are removed during transform. If they occur at the beginning or end of a sequence, the resulting transformed sequence will be shortened. If they occur in the middle of a sequence, that sequence will be broken into two (or more) sequences. (Default)

fill

Unmapped labels will be replaced with NaN, to signal missing data. [The use of NaN to signal missing data is not fantastic, but it’s consistent with current behavior of the pandas library.]

Returns:

mapped_sequences : list

List of sequences in internal indexing
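
The difference between the two modes can be sketched in plain Python. The function transform_sketch and the example mapping are hypothetical and only mirror the behavior described above; the library itself may return arrays rather than lists.

```python
import math


def transform_sketch(sequences, mapping, mode="clip"):
    """Map label sequences to internal indices.

    mapping plays the role of mapping_: {input_label: internal_index}.
    Hypothetical helper for illustration only.
    """
    out = []
    for seq in sequences:
        if mode == "fill":
            # Unmapped labels become NaN; the sequence length is unchanged.
            out.append([mapping.get(s, float("nan")) for s in seq])
        elif mode == "clip":
            # Unmapped labels are dropped, splitting the sequence wherever
            # they occur in the middle.
            chunk = []
            for s in seq:
                if s in mapping:
                    chunk.append(mapping[s])
                elif chunk:
                    out.append(chunk)
                    chunk = []
            if chunk:
                out.append(chunk)
        else:
            raise ValueError("mode must be 'clip' or 'fill'")
    return out


# Made-up mapping: label "c" has no internal index.
mapping = {"a": 0, "b": 1, "d": 2}
seq = ["c", "a", "b", "c", "d", "d", "c"]
clipped = transform_sketch([seq], mapping, mode="clip")
filled = transform_sketch([seq], mapping, mode="fill")
print(clipped)
print(filled)
```

In 'clip' mode the single input sequence becomes two shorter sequences, because the unmapped label in the middle breaks it apart; in 'fill' mode the output keeps the original length with NaN at each unmapped position.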
