msmbuilder.decomposition.tICA

class msmbuilder.decomposition.tICA(n_components=None, lag_time=1, shrinkage=None, kinetic_mapping=False)
Time-structure Independent Component Analysis (tICA)
Linear dimensionality reduction using an eigendecomposition of the time-lag correlation matrix and the covariance matrix of the data, keeping only the vectors that decorrelate slowest to project the data into a lower-dimensional space.
Parameters: n_components : int, None
Number of components to keep.
lag_time : int
Delay time forward or backward in the input data. The time-lagged correlation is computed between data points X[t] and X[t+lag_time].
shrinkage : float, default=None
The covariance shrinkage intensity (range 0-1). If shrinkage is not specified (the default), it is estimated using an analytic formula (the Rao-Blackwellized Ledoit-Wolf estimator) introduced in [5].
kinetic_mapping : bool, default=False
If True, weight the projections by the tICA eigenvalues, yielding kinetic distances as described in [6].
Notes
This method was introduced originally in [R17], and has been applied to the analysis of molecular dynamics data in [R14], [R15], and [R16]. In [R14] and [R15], tICA was used as a dimensionality reduction technique before fitting other kinetic models.
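The eigendecomposition described above can be sketched in plain NumPy. This is an illustrative sketch only, not msmbuilder's implementation: the function name `tica_sketch` is hypothetical, shrinkage is omitted, and the covariance is assumed to be well conditioned so that whitening is safe.

```python
import numpy as np


def tica_sketch(sequences, lag_time=1, n_components=1):
    """Illustrative tICA on a list of (n_samples_i, n_features) arrays."""
    sequences = [np.asarray(s, dtype=float) for s in sequences]
    mean = np.concatenate(sequences).mean(axis=0)
    n_features = mean.shape[0]

    cov = np.zeros((n_features, n_features))
    lagged = np.zeros((n_features, n_features))
    n_frames = 0
    n_pairs = 0
    for seq in sequences:
        x = seq - mean
        cov += x.T @ x
        n_frames += len(x)
        # Pair frame t with frame t + lag_time inside the same sequence.
        a, b = x[:-lag_time], x[lag_time:]
        lagged += a.T @ b
        n_pairs += len(a)
    cov /= n_frames
    lagged /= n_pairs
    lagged = 0.5 * (lagged + lagged.T)  # symmetrize the lagged correlation

    # Solve C v = lambda S v by whitening with the covariance S.
    s_vals, s_vecs = np.linalg.eigh(cov)
    whiten = s_vecs / np.sqrt(s_vals)  # scales column j by 1 / sqrt(s_j)
    evals, evecs = np.linalg.eigh(whiten.T @ lagged @ whiten)
    order = np.argsort(evals)[::-1][:n_components]
    # Rows of the returned array are the slowest-decorrelating directions.
    return evals[order], (whiten @ evecs[:, order]).T


# Synthetic data: one slowly decorrelating feature and one white-noise feature.
rng = np.random.default_rng(0)
noise = rng.normal(size=20000)
slow = np.zeros(20000)
for t in range(1, 20000):
    slow[t] = 0.9 * slow[t - 1] + noise[t]
fast = rng.normal(size=20000)
X = np.column_stack([slow, fast])

eigenvalues, components = tica_sketch([X[:12000], X[12000:]], lag_time=1)
print(eigenvalues, components)
```

On this synthetic input the leading eigenvalue should land near the lag-1 autocorrelation of the slow feature, and the leading component should point mostly along that feature.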
Attributes
components_ : array-like, shape (n_components, n_features)
Components with maximum autocorrelation.
offset_correlation_ : array-like, shape (n_features, n_features)
Symmetric time-lagged correlation matrix, \(C=E[(x_t)^T x_{t+lag}]\).
eigenvalues_ : array-like, shape (n_features,)
Eigenvalues of the tICA generalized eigenproblem, in decreasing order.
eigenvectors_ : array-like, shape (n_components, n_features)
Eigenvectors of the tICA generalized eigenproblem. The vectors give a set of “directions” through configuration space along which the system relaxes towards equilibrium. Each eigenvector is associated with a characteristic timescale \(-\frac{\mathrm{lag\_time}}{\ln \lambda_i}\), where \(\lambda_i\) is the corresponding eigenvalue. See [2] for more information.
means_ : array, shape (n_features,)
The mean of the data along each feature.
n_observations_ : int
Total number of data points fit by the model. Note that the model is “reset” by calling fit() with new sequences, whereas partial_fit() updates the fit with new data, and is suitable for online learning.
n_sequences_ : int
Total number of sequences fit by the model. Note that the model is “reset” by calling fit() with new sequences, whereas partial_fit() updates the fit with new data, and is suitable for online learning.
timescales_ : array-like, shape (n_features,)
The implied timescales of the tICA model, given by -offset / log(eigenvalues).
Methods
fit(sequences[, y]) : Fit the model with a collection of sequences.
fit_transform(sequences[, y]) : Fit the model with X and apply the dimensionality reduction on X.
get_params([deep]) : Get parameters for this estimator.
partial_fit(X) : Fit the model with X.
partial_transform(features) : Apply the dimensionality reduction on X.
score(sequences[, y]) : Score the model on new data using the generalized matrix Rayleigh quotient.
set_params(**params) : Set the parameters of this estimator.
summarize() : Some summary information.
transform(sequences) : Apply the dimensionality reduction on X.
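As a worked example of the relation between a tICA eigenvalue and its implied timescale, the sketch below uses only the standard library. The helper name `implied_timescale` is hypothetical, and the sign convention is the one that gives positive timescales for eigenvalues strictly between 0 and 1.

```python
import math


def implied_timescale(eigenvalue, lag_time=1):
    """Return -lag_time / ln(eigenvalue), in the same units as lag_time."""
    if not 0.0 < eigenvalue < 1.0:
        # Outside (0, 1) the logarithm is undefined, zero, or positive,
        # so no finite positive timescale can be assigned.
        raise ValueError("eigenvalue must lie strictly between 0 and 1")
    return -lag_time / math.log(eigenvalue)


# An eigenvalue of exp(-0.1) at a lag of 10 steps implies 100 steps.
example = implied_timescale(math.exp(-0.1), lag_time=10)
print(example)
```

Eigenvalues close to 1 correspond to long timescales, which is why the slowest modes are the ones kept.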
__init__(n_components=None, lag_time=1, shrinkage=None, kinetic_mapping=False)
Attributes
components_
covariance_
eigenvalues_
eigenvectors_
means_
offset_correlation_
score_ : Training score of the model, computed as the generalized matrix Rayleigh quotient.
timescales_

fit(sequences, y=None)
Fit the model with a collection of sequences.
This method is not online. Any state accumulated from previous calls to fit() or partial_fit() will be cleared. For online learning, use partial_fit.
Parameters: sequences : list of array-like, each of shape (n_samples_i, n_features)
Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features.
y : None
Ignored
Returns: self : object
Returns the instance itself.
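The list-of-sequences input can be illustrated with a small NumPy sketch. It assumes that time-lagged pairs are formed within each sequence and never across the boundary between two sequences; the helper `lagged_pairs` is hypothetical and not part of msmbuilder.

```python
import numpy as np


def lagged_pairs(sequences, lag_time):
    """Collect (X[t], X[t + lag_time]) pairs separately for each sequence."""
    firsts, seconds = [], []
    for seq in sequences:
        seq = np.asarray(seq, dtype=float)
        if len(seq) <= lag_time:
            continue  # too short to contribute any pair at this lag
        firsts.append(seq[:-lag_time])
        seconds.append(seq[lag_time:])
    return np.concatenate(firsts), np.concatenate(seconds)


# Two sequences of different lengths that share the same two features.
sequences = [
    np.arange(10.0).reshape(5, 2),
    np.arange(100.0, 106.0).reshape(3, 2),
]
x_t, x_lag = lagged_pairs(sequences, lag_time=2)
print(x_t.shape, x_lag.shape)
```

With a lag of 2, the 5-frame sequence contributes three pairs and the 3-frame sequence contributes one, so no pair mixes frames from different sequences.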

fit_transform(sequences, y=None)
Fit the model with X and apply the dimensionality reduction on X.
This method is not online. Any state accumulated from previous calls to fit() or partial_fit() will be cleared. For online learning, use partial_fit.
Parameters: sequences : list of array-like, each of shape (n_samples_i, n_features)
Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features.
y : None
Ignored
Returns: sequence_new : list of array-like, each of shape (n_samples_i, n_components)

get_params(deep=True)
Get parameters for this estimator.
Parameters: deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params : mapping of string to any
Parameter names mapped to their values.

partial_fit(X)
Fit the model with X.
This method is suitable for online learning. The state of the model will be updated with the new data X.
Parameters: X : array-like, shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.
Returns: self : object
Returns the instance itself.
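The idea behind an online update can be sketched with running sums: each chunk of data adds to accumulated totals, and the mean and covariance are recovered from those totals at any time. The class `RunningMoments` below is a hypothetical illustration of this pattern, not msmbuilder's internal bookkeeping.

```python
import numpy as np


class RunningMoments:
    """Hypothetical accumulator for the mean and (biased) covariance."""

    def __init__(self, n_features):
        self.n = 0
        self.sum_x = np.zeros(n_features)
        self.sum_xxT = np.zeros((n_features, n_features))

    def partial_fit(self, X):
        # Update the running totals with one more chunk of frames.
        X = np.asarray(X, dtype=float)
        self.n += len(X)
        self.sum_x += X.sum(axis=0)
        self.sum_xxT += X.T @ X
        return self

    @property
    def mean(self):
        return self.sum_x / self.n

    @property
    def covariance(self):
        m = self.mean
        return self.sum_xxT / self.n - np.outer(m, m)


rng = np.random.default_rng(1)
data = rng.normal(size=(100, 3))

acc = RunningMoments(n_features=3)
for chunk in (data[:30], data[30:70], data[70:]):
    acc.partial_fit(chunk)

batch_cov = np.cov(data.T, bias=True)
print(np.allclose(acc.covariance, batch_cov))
```

Feeding the data in chunks gives the same statistics as a single batch pass, which is the property that makes an incremental fit possible.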

partial_transform(features)
Apply the dimensionality reduction on X.
Parameters: features : array-like, shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.
Returns: sequence_new : array-like, shape (n_samples, n_components)
tICA-projected features
Notes
This function acts on a single featurized trajectory.
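Conceptually, the projection of a single trajectory is a mean-centering followed by a matrix product with the retained components. The sketch below assumes that form, plus an optional per-component scaling by the eigenvalues to mimic the kinetic_mapping option; the function `project` is hypothetical and the exact internals of the library may differ.

```python
import numpy as np


def project(features, means, components, eigenvalues=None):
    """Project (n_samples, n_features) data onto (n_components, n_features)."""
    centered = np.asarray(features, dtype=float) - means
    projected = centered @ components.T
    if eigenvalues is not None:
        # Assumed kinetic-mapping step: scale each coordinate by its eigenvalue.
        projected = projected * eigenvalues[: components.shape[0]]
    return projected


features = np.array([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
means = np.array([2.0, 2.0, 2.0])
components = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

plain = project(features, means, components)
scaled = project(features, means, components, eigenvalues=np.array([0.5, 0.25]))
print(plain)
print(scaled)
```

The output has one row per input frame and one column per retained component, matching the (n_samples, n_components) shape stated above.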

score(sequences, y=None)
Score the model on new data using the generalized matrix Rayleigh quotient.
Parameters: sequences : list of array, each of shape (n_samples_i, n_features)
Test data. A list of sequences in a feature space, each of which is a 2D array of possibly different lengths, but the same number of features.
Returns: gmrq : float
Generalized matrix Rayleigh quotient. This number indicates how well the top n_timescales+1 eigenvectors of this tICA model perform as slowly decorrelating collective variables for the new data in sequences.
References
[R20] McGibbon, R. T. and V. S. Pande, “Variational cross-validation of slow dynamical modes in molecular kinetics,” J. Chem. Phys. 142, 124105 (2015).
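A generalized matrix Rayleigh quotient can be written as the trace of \((V^T S V)^{-1} (V^T C V)\), where \(V\) holds the model's eigenvectors and \(S\) and \(C\) are the covariance and time-lagged correlation matrices estimated from the test data. The NumPy sketch below evaluates that expression; the function `gmrq` and the column-wise layout of `V` are assumptions made for illustration.

```python
import numpy as np


def gmrq(eigenvectors, covariance, offset_correlation):
    """Trace of (V^T S V)^{-1} (V^T C V), with one eigenvector per column of V."""
    V = np.asarray(eigenvectors, dtype=float)
    numerator = V.T @ offset_correlation @ V
    denominator = V.T @ covariance @ V
    # solve(D, N) computes D^{-1} N without forming the inverse explicitly.
    return float(np.trace(np.linalg.solve(denominator, numerator)))


# Toy check: with S = I and diagonal C, the top-2 score is the sum of the
# two largest diagonal entries of C.
S = np.eye(3)
C = np.diag([0.9, 0.5, 0.1])
V = np.eye(3)[:, :2]
score_value = gmrq(V, S, C)
print(score_value)
```

When the matrices come from the training data itself, this quantity reduces to the sum of the leading eigenvalues, which is the training score described for score_.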

score_
Training score of the model, computed as the generalized matrix Rayleigh quotient, the sum of the first n_components eigenvalues.

set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns: self
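The `<component>__<parameter>` naming scheme can be illustrated with a small pure-Python sketch that separates an estimator's own parameters from those addressed to a nested component. The helper `split_nested_params` and the step name `tica` are hypothetical; this only demonstrates the naming convention, not the library's own parsing code.

```python
def split_nested_params(params):
    """Split {'a': 1, 'step__b': 2} into own and per-component parameters."""
    own, nested = {}, {}
    for key, value in params.items():
        # partition splits at the first double underscore only.
        component, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"lag_time": 5, "tica__n_components": 4, "tica__lag_time": 10}
)
print(own)
print(nested)
```

Here `lag_time` is treated as a parameter of the outer object, while the two `tica__` entries are routed to the component named `tica`.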

summarize()
Some summary information.

transform(sequences)
Apply the dimensionality reduction on X.
Parameters: sequences : list of array-like, each of shape (n_samples_i, n_features)
Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features.
Returns: sequence_new : list of array-like, each of shape (n_samples_i, n_components)