msmbuilder.decomposition.tICA

class msmbuilder.decomposition.tICA(n_components=None, lag_time=1, gamma=0.05, weighted_transform=False)

Time-structure Independent Component Analysis (tICA)

Linear dimensionality reduction using an eigendecomposition of the time-lag correlation matrix and the covariance matrix of the data. Only the vectors that decorrelate slowest are kept, and the data are projected onto them to obtain a lower-dimensional representation.
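The procedure described above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not msmbuilder's implementation: the function name `tica_sketch` is hypothetical, it handles a single trajectory, it symmetrizes the estimators in one particular way, and it solves the generalized eigenproblem by Cholesky whitening.

```python
import numpy as np


def tica_sketch(X, lag_time=1, n_components=1, gamma=0.05):
    """Illustrative single-trajectory tICA (assumes lag_time >= 1)."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                      # center each feature
    n_features = X.shape[1]
    X0, Xt = X[:-lag_time], X[lag_time:]        # pairs (X[t], X[t + lag_time])
    n_pairs = X0.shape[0]

    # Covariance and symmetrized time-lagged correlation matrices.
    cov = (X0.T @ X0 + Xt.T @ Xt) / (2.0 * n_pairs)
    lagged = (X0.T @ Xt + Xt.T @ X0) / (2.0 * n_pairs)

    # Regularize so that the covariance matrix is positive definite.
    cov_reg = cov + (gamma / n_features) * np.trace(cov) * np.eye(n_features)

    # Solve lagged @ v = lambda * cov_reg @ v by whitening with a Cholesky factor.
    chol_inv = np.linalg.inv(np.linalg.cholesky(cov_reg))
    eigenvalues, whitened_vecs = np.linalg.eigh(chol_inv @ lagged @ chol_inv.T)

    order = np.argsort(eigenvalues)[::-1]       # slowest modes first
    eigenvalues = eigenvalues[order]
    components = (chol_inv.T @ whitened_vecs[:, order]).T[:n_components]
    return eigenvalues, components, X @ components.T


# Toy data: feature 0 is a slow AR(1) process, feature 1 is white noise.
rng = np.random.default_rng(0)
slow = np.zeros(5000)
for t in range(1, slow.size):
    slow[t] = 0.99 * slow[t - 1] + rng.normal()
X = np.column_stack([slow, rng.normal(size=slow.size)])

eigenvalues, components, projected = tica_sketch(X, lag_time=1, n_components=1)
```

In this toy setup the leading component should track the slowly decorrelating feature, which is the behaviour the description above refers to.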

Parameters:

n_components : int, None

Number of components to keep.

lag_time : int

Delay time forward or backward in the input data. The time-lagged correlations are computed between data points X[t] and X[t+lag_time].

gamma : nonnegative float, default=0.05

Regularization strength. Positive gamma entails incrementing the sample covariance matrix by a constant times the identity, to ensure that it is positive definite. The exact form of the regularized sample covariance matrix is \(covariance + (gamma / n_features) * Tr(covariance) * Identity\)
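As a rough illustration of the regularization formula above, a NumPy sketch might look like the following. The function name `regularize_covariance` is hypothetical and not part of msmbuilder; only the formula itself comes from the parameter description.

```python
import numpy as np


def regularize_covariance(covariance, gamma=0.05):
    """Return covariance + (gamma / n_features) * Tr(covariance) * Identity."""
    covariance = np.asarray(covariance, dtype=float)
    n_features = covariance.shape[0]
    shrinkage = (gamma / n_features) * np.trace(covariance)
    return covariance + shrinkage * np.eye(n_features)


# A rank-deficient sample covariance matrix (one eigenvalue is zero).
cov = np.array([[1.0, 1.0],
                [1.0, 1.0]])
cov_reg = regularize_covariance(cov, gamma=0.05)
```

Because the added term scales with the trace, the strength of the regularization follows the overall variance of the data rather than being an absolute constant.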

weighted_transform : bool, default=False

If True, weight the projections by the implied timescales, giving a quantity that has units [Time].

Notes

This method was introduced originally in [R20], and has been applied to the analysis of molecular dynamics data in [R17], [R18], and [R19]. In [R17] and [R18], tICA was used as a dimensionality reduction technique before fitting other kinetic models.

Attributes

components_ : array-like, shape (n_components, n_features)

Components with maximum autocorrelation.

offset_correlation_ : array-like, shape (n_features, n_features)

Symmetric time-lagged correlation matrix, \(C=E[(x_t)^T x_{t+lag}]\).

eigenvalues_ : array-like, shape (n_features,)

Eigenvalues of the tICA generalized eigenproblem, in decreasing order.

eigenvectors_ : array-like, shape (n_components, n_features)

Eigenvectors of the tICA generalized eigenproblem. The vectors give a set of “directions” through configuration space along which the system relaxes towards equilibrium. Each eigenvector is associated with a characteristic timescale \(-\frac{lag\_time}{\ln \lambda_i}\), where \(\lambda_i\) is the corresponding eigenvalue. See [2] for more information.

means_ : array, shape (n_features,)

The mean of the data along each feature.

n_observations_ : int

Total number of data points fit by the model. Note that the model is “reset” by calling fit() with new sequences, whereas partial_fit() updates the fit with new data, and is suitable for online learning.

n_sequences_ : int

Total number of sequences fit by the model. Note that the model is “reset” by calling fit() with new sequences, whereas partial_fit() updates the fit with new data, and is suitable for online learning.

timescales_ : array-like, shape (n_features,)

The implied timescales of the tICA model, given by -lag_time / log(eigenvalues).
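The relation between eigenvalues and implied timescales given in the attribute descriptions above can be written out as a short NumPy sketch. The helper name `implied_timescales` is hypothetical, and returning NaN for eigenvalues outside (0, 1) is an assumption made here for illustration; the source does not say how such eigenvalues are handled.

```python
import numpy as np


def implied_timescales(eigenvalues, lag_time=1):
    """Compute -lag_time / ln(lambda_i) for each eigenvalue in (0, 1)."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    timescales = np.full(eigenvalues.shape, np.nan)
    valid = (eigenvalues > 0) & (eigenvalues < 1)   # log is only meaningful here
    timescales[valid] = -lag_time / np.log(eigenvalues[valid])
    return timescales


# An eigenvalue of exp(-0.1) at lag_time=10 corresponds to a timescale of 100.
example = implied_timescales([np.exp(-0.1), 0.5, -0.2], lag_time=10)
```

The timescales carry the same units as lag_time, so an eigenvalue close to 1 corresponds to a slowly relaxing mode.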

Methods

`fit`(sequences[, y])	Fit the model with a collection of sequences.
`fit_transform`(sequences[, y])	Fit the model with X and apply the dimensionality reduction on X.
`get_params`([deep])	Get parameters for this estimator.
`partial_fit`(X)	Fit the model with X.
`partial_transform`(features)	Apply the dimensionality reduction on X.
`score`(sequences[, y])	Score the model on new data using the generalized matrix Rayleigh quotient
`set_params`(**params)	Set the parameters of this estimator.
`summarize`()	Some summary information.
`transform`(sequences)	Apply the dimensionality reduction on X.

score_: Training score of the model, computed as the generalized matrix Rayleigh quotient, i.e. the sum of the first n_components eigenvalues.

fit(sequences, y=None)

Fit the model with a collection of sequences.

This method is not online. Any state accumulated from previous calls to fit() or partial_fit() will be cleared. For online learning, use partial_fit.

Parameters:

sequences: list of array-like, each of shape (n_samples_i, n_features)

Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features.

y : None

Ignored

Returns:

self : object

Returns the instance itself.

partial_fit(X)

Fit the model with X.

This method is suitable for online learning. The state of the model will be updated with the new data X.

Parameters:

X: array-like, shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

Returns:

self : object

Returns the instance itself.
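The difference between fit() (resets accumulated state) and partial_fit() (updates it) can be illustrated with a small accumulator of running sums. This is a sketch under stated assumptions, not msmbuilder's internals: the class `LaggedMomentAccumulator` and its attributes are hypothetical, and mean subtraction is omitted for brevity.

```python
import numpy as np


class LaggedMomentAccumulator:
    """Hypothetical helper that keeps running sums of time-lagged products."""

    def __init__(self, lag_time=1):
        self.lag_time = lag_time
        self.reset()

    def reset(self):
        # Discard everything accumulated so far.
        self.n_pairs = 0
        self.n_sequences = 0
        self.sum_lagged = None

    def partial_fit(self, X):
        # Update the running sums with one more sequence (online learning).
        X = np.asarray(X, dtype=float)
        x0, xt = X[:-self.lag_time], X[self.lag_time:]
        outer = x0.T @ xt
        self.sum_lagged = outer if self.sum_lagged is None else self.sum_lagged + outer
        self.n_pairs += x0.shape[0]
        self.n_sequences += 1
        return self

    def fit(self, sequences):
        # Not online: clear previous state, then accumulate every sequence.
        self.reset()
        for X in sequences:
            self.partial_fit(X)
        return self


a = np.arange(12.0).reshape(6, 2)
b = np.ones((4, 2))
batch = LaggedMomentAccumulator(lag_time=2).fit([a, b])
online = LaggedMomentAccumulator(lag_time=2).partial_fit(a).partial_fit(b)
```

Accumulating per sequence means that time-lagged pairs never span the boundary between two trajectories, which is one reason to pass a list of sequences rather than a single concatenated array.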

transform(sequences)

Apply the dimensionality reduction on X.

Parameters:

sequences: list of array-like, each of shape (n_samples_i, n_features)

Data to transform, where n_samples_i is the number of samples in sequence i and n_features is the number of features.

Returns:

sequence_new : list of array-like, each of shape (n_samples_i, n_components)

partial_transform(features)

Apply the dimensionality reduction on X.

Parameters:

features: array-like, shape (n_samples, n_features)

Data to transform, where n_samples is the number of samples and n_features is the number of features. This function acts on a single featurized trajectory.

Returns:

sequence_new : array-like, shape (n_samples, n_components)

TICA-projected features

Notes

This function acts on a single featurized trajectory.
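A minimal sketch of what projecting a single featurized trajectory could look like is given below. It assumes that the projection is a centered dot product with the components and that weighted_transform multiplies each projected column by its implied timescale; the source does not spell out either detail, and the function names are hypothetical.

```python
import numpy as np


def project_trajectory(features, means, components, timescales=None):
    """Project one (n_samples, n_features) array onto the given components."""
    features = np.asarray(features, dtype=float)
    components = np.asarray(components, dtype=float)
    projected = (features - means) @ components.T   # (n_samples, n_components)
    if timescales is not None:
        # Assumed form of the weighted transform: scale column i by timescale i.
        projected = projected * np.asarray(timescales)[: components.shape[0]]
    return projected


def project_all(sequences, means, components, timescales=None):
    """Apply project_trajectory to every sequence in a list."""
    return [project_trajectory(X, means, components, timescales) for X in sequences]


feats = np.array([[1.0, 5.0], [2.0, 6.0], [3.0, 7.0]])
means = np.array([1.0, 1.0])
comps = np.array([[1.0, 0.0]])
plain = project_trajectory(feats, means, comps)
weighted = project_trajectory(feats, means, comps, timescales=[10.0, 3.0])
```

Here `project_all` plays the role of a list-level transform, while `project_trajectory` corresponds to the single-trajectory case described in this section.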

fit_transform(sequences, y=None)

Fit the model with X and apply the dimensionality reduction on X.

This method is not online. Any state accumulated from previous calls to fit() or partial_fit() will be cleared. For online learning, use partial_fit.

Parameters:

sequences: list of array-like, each of shape (n_samples_i, n_features)

Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features.

y : None

Ignored

Returns:

sequence_new : list of array-like, each of shape (n_samples_i, n_components)

score(sequences, y=None)

Score the model on new data using the generalized matrix Rayleigh quotient

Parameters:

sequences : list of array-like, each of shape (n_samples_i, n_features)

Test data: a list of sequences in feature space. Each sequence is a 2D array; sequences may differ in length but must share the same number of features.

Returns:

gmrq : float

Generalized matrix Rayleigh quotient. This number indicates how well the top n_components eigenvectors of this model perform as slowly decorrelating collective variables for the new data in sequences.
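For intuition, the generalized matrix Rayleigh quotient can be sketched as a trace over matrices estimated from the test data. The function name `gmrq_sketch` and its arguments are hypothetical; the sketch assumes the caller has already computed a time-lagged correlation matrix and a covariance matrix from the new sequences, and it is not taken from msmbuilder's source.

```python
import numpy as np


def gmrq_sketch(components, lagged_test, cov_test):
    """Return trace((V S V^T)^-1 (V C V^T)) for row-wise components V."""
    V = np.asarray(components, dtype=float)      # (n_components, n_features)
    numerator = V @ lagged_test @ V.T            # V C V^T
    denominator = V @ cov_test @ V.T             # V S V^T
    # Solve a linear system instead of forming the inverse explicitly.
    return float(np.trace(np.linalg.solve(denominator, numerator)))


# With identity covariance and a diagonal lagged matrix, the first two
# coordinate axes are the leading eigenvectors, so the score is 0.9 + 0.5.
cov_test = np.eye(3)
lagged_test = np.diag([0.9, 0.5, 0.1])
components = np.eye(3)[:2]
score = gmrq_sketch(components, lagged_test, cov_test)
```

When the components are exact eigenvectors of the test matrices, the quotient reduces to the sum of the corresponding eigenvalues, which matches the description of the training score. The value does not change if the components are rescaled.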

References

[R21]

McGibbon, R. T. and V. S. Pande, “Variational cross-validation of slow dynamical modes in molecular kinetics” http://arxiv.org/abs/1407.8083 (2014)

summarize(): Some summary information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:	self
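The <component>__<parameter> naming convention can be illustrated with a small pure-Python sketch that splits such a name at the first double underscore. The helper `split_nested_param` and the step name "tica" are hypothetical and only show how a nested name is structured, not how set_params is implemented.

```python
def split_nested_param(name):
    """Split 'component__parameter' into (component, parameter)."""
    component, delimiter, parameter = name.partition("__")
    if delimiter:
        return component, parameter
    # No double underscore: the parameter belongs to the estimator itself.
    return None, name


nested = split_nested_param("tica__lag_time")
simple = split_nested_param("gamma")
```

Single underscores inside a parameter name, as in lag_time, are left untouched because only the double underscore acts as a separator.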