msmbuilder.decomposition.tICA
-
class msmbuilder.decomposition.tICA(n_components=None, lag_time=1, shrinkage=None, weighted_transform=False, kinetic_mapping=False)
Time-structure Independent Component Analysis (tICA)
Linear dimensionality reduction using an eigendecomposition of the time-lag correlation matrix and the covariance matrix of the data. Only the vectors that decorrelate slowest are kept, and the data are projected onto them to obtain a lower-dimensional space.
Parameters: n_components : int, None
Number of components to keep.
lag_time : int
Delay time forward or backward in the input data. The time-lagged correlations are computed between data points X[t] and X[t+lag_time].
shrinkage : float, default=None
The covariance shrinkage intensity (range 0-1). If shrinkage is not specified (the default), it is estimated using an analytic formula (the Rao-Blackwellized Ledoit-Wolf estimator) introduced in [5].
weighted_transform : bool, default=False
Deprecated. Please use kinetic_mapping.
kinetic_mapping : bool, default=False
If True, weight the projections by the tICA eigenvalues, yielding kinetic distances as described in [6].
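As a rough illustration of what the shrinkage and kinetic_mapping parameters control, the sketch below blends a covariance matrix with a scaled identity and scales projected coordinates by eigenvalues. The helper names `shrink_covariance` and `kinetic_map` are hypothetical, the numbers are made up, and the scaled-identity shrinkage target is an assumption about the usual Ledoit-Wolf form rather than a statement about msmbuilder's internals.

```python
import numpy as np

def shrink_covariance(cov, shrinkage):
    """Blend cov with mu * I; shrinkage=0 keeps cov, shrinkage=1 gives mu * I."""
    # mu is the average variance, so the trace is preserved by the blend.
    mu = np.trace(cov) / cov.shape[0]
    return (1.0 - shrinkage) * cov + shrinkage * mu * np.eye(cov.shape[0])

def kinetic_map(projection, eigenvalues):
    """Scale each projected coordinate (column) by its eigenvalue."""
    return projection * eigenvalues[: projection.shape[1]]

# Made-up inputs, for illustration only.
cov = np.array([[4.0, 1.0], [1.0, 2.0]])
shrunk = shrink_covariance(cov, 0.5)

projection = np.array([[1.0, 1.0], [2.0, -1.0]])
mapped = kinetic_map(projection, np.array([0.9, 0.5]))
```

With the scaling applied, slowly decorrelating coordinates (eigenvalues near 1) contribute more to Euclidean distances than fast ones.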
Notes
This method was introduced originally in [R16], and has been applied to the analysis of molecular dynamics data in [R13], [R14], and [R15]. In [R13] and [R14], tICA was used as a dimensionality reduction technique before fitting other kinetic models.
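The decomposition described for this class can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not msmbuilder's implementation: the function name `tica_sketch` is hypothetical, no shrinkage is applied, and the generalized eigenproblem is solved by Cholesky whitening of the covariance matrix.

```python
import numpy as np

def tica_sketch(sequences, lag_time=1, n_components=2):
    """Illustrative tICA: solve C v = lambda * Sigma v for the slowest directions."""
    # Pool all frames to estimate the per-feature mean.
    mean = np.concatenate(sequences).mean(axis=0)
    n_features = mean.shape[0]
    cov = np.zeros((n_features, n_features))
    corr = np.zeros((n_features, n_features))
    n_pairs = 0
    for seq in sequences:
        x = np.asarray(seq, dtype=float) - mean
        head, tail = x[:-lag_time], x[lag_time:]
        # Symmetrized time-lagged correlation and the matching covariance.
        corr += head.T @ tail + tail.T @ head
        cov += head.T @ head + tail.T @ tail
        n_pairs += len(head)
    corr /= 2.0 * n_pairs
    cov /= 2.0 * n_pairs
    # Whitening with the Cholesky factor turns the generalized problem
    # into an ordinary symmetric eigenproblem.
    inv = np.linalg.inv(np.linalg.cholesky(cov))
    evals, evecs = np.linalg.eigh(inv @ corr @ inv.T)
    order = np.argsort(evals)[::-1][:n_components]
    components = (inv.T @ evecs[:, order]).T  # shape (n_components, n_features)
    return evals[order], components, mean

# Synthetic trajectory: one fast (white noise) and one slow (AR(1)) feature.
rng = np.random.default_rng(0)
slow = np.zeros(5000)
for t in range(1, 5000):
    slow[t] = 0.99 * slow[t - 1] + rng.normal()
fast = rng.normal(size=5000)
traj = np.column_stack([fast, slow])

evals, comps, mean = tica_sketch([traj], lag_time=1, n_components=1)
projected = (traj - mean) @ comps.T
```

In this toy example the single retained component should track the slow feature, since that is the coordinate whose autocorrelation decays most slowly.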
Attributes
components_ : array-like, shape (n_components, n_features)
Components with maximum autocorrelation.
offset_correlation_ : array-like, shape (n_features, n_features)
Symmetric time-lagged correlation matrix, \(C=E[(x_t)^T x_{t+lag}]\).
eigenvalues_ : array-like, shape (n_features,)
Eigenvalues of the tICA generalized eigenproblem, in decreasing order.
eigenvectors_ : array-like, shape (n_components, n_features)
Eigenvectors of the tICA generalized eigenproblem. The vectors give a set of “directions” through configuration space along which the system relaxes towards equilibrium. Each eigenvector is associated with a characteristic timescale \(-\frac{\mathrm{lag\_time}}{\ln \lambda_i}\), where \(\lambda_i\) is the corresponding eigenvalue. See [2] for more information.
means_ : array, shape (n_features,)
The mean of the data along each feature.
n_observations_ : int
Total number of data points fit by the model. Note that the model is “reset” by calling fit() with new sequences, whereas partial_fit() updates the fit with new data, and is suitable for online learning.
n_sequences_ : int
Total number of sequences fit by the model. Note that the model is “reset” by calling fit() with new sequences, whereas partial_fit() updates the fit with new data, and is suitable for online learning.
timescales_ : array-like, shape (n_features,)
The implied timescales of the tICA model, given by -lag_time / log(eigenvalues).
Methods
fit(sequences[, y]) Fit the model with a collection of sequences.
fit_transform(sequences[, y]) Fit the model with the sequences and apply the dimensionality reduction to them.
get_params([deep]) Get parameters for this estimator.
partial_fit(X) Fit the model with X.
partial_transform(features) Apply the dimensionality reduction to a single featurized trajectory.
score(sequences[, y]) Score the model on new data using the generalized matrix Rayleigh quotient.
set_params(**params) Set the parameters of this estimator.
summarize() Some summary information.
transform(sequences) Apply the dimensionality reduction to the sequences.
-
__init__(n_components=None, lag_time=1, shrinkage=None, weighted_transform=False, kinetic_mapping=False)
Attributes
components_
covariance_
eigenvalues_
eigenvectors_
means_
offset_correlation_
score_ Training score of the model, computed as the generalized matrix Rayleigh quotient.
timescales_
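The relationship between eigenvalues and implied timescales, \(-\mathrm{lag\_time} / \ln \lambda_i\), can be worked through with a few made-up eigenvalues. The helper name `implied_timescales` is hypothetical, and returning NaN for eigenvalues outside (0, 1) is a choice made for this sketch, not documented library behavior.

```python
import math

def implied_timescales(eigenvalues, lag_time):
    """Return -lag_time / ln(lambda_i) for each eigenvalue in (0, 1), else NaN."""
    timescales = []
    for lam in eigenvalues:
        if 0.0 < lam < 1.0:
            timescales.append(-lag_time / math.log(lam))
        else:
            # The logarithm is undefined or non-negative here, so no
            # finite positive timescale exists.
            timescales.append(float("nan"))
    return timescales

# Made-up eigenvalues; the timescales come out in the same units as lag_time.
timescales = implied_timescales([0.9, 0.5, -0.1], lag_time=10)
```

Eigenvalues closer to 1 map to longer timescales: here 0.9 gives roughly 95 frames and 0.5 roughly 14 frames at a lag time of 10 frames.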
-
fit(sequences, y=None)
Fit the model with a collection of sequences.
This method is not online. Any state accumulated from previous calls to fit() or partial_fit() will be cleared. For online learning, use partial_fit.
Parameters: sequences : list of array-like, each of shape (n_samples_i, n_features)
Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features.
y : None
Ignored
Returns: self : object
Returns the instance itself.
-
fit_transform(sequences, y=None)
Fit the model with the sequences and apply the dimensionality reduction to them.
This method is not online. Any state accumulated from previous calls to fit() or partial_fit() will be cleared. For online learning, use partial_fit.
Parameters: sequences : list of array-like, each of shape (n_samples_i, n_features)
Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features.
y : None
Ignored
Returns: sequence_new : list of array-like, each of shape (n_samples_i, n_components)
-
get_params(deep=True)
Get parameters for this estimator.
Parameters: deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params : mapping of string to any
Parameter names mapped to their values.
-
partial_fit(X)
Fit the model with X.
This method is suitable for online learning. The state of the model will be updated with the new data X.
Parameters: X : array-like, shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.
Returns: self : object
Returns the instance itself.
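The idea behind online fitting can be illustrated with a toy accumulator that keeps running sums and reproduces the batch covariance after seeing the data in chunks. The class `RunningMoments` is hypothetical and covers only the mean and instantaneous covariance; it is a conceptual sketch, not a description of how msmbuilder stores its state.

```python
import numpy as np

class RunningMoments:
    """Accumulate the sums needed for a mean and covariance, chunk by chunk."""

    def __init__(self, n_features):
        self.n = 0
        self.sum_x = np.zeros(n_features)
        self.sum_xxt = np.zeros((n_features, n_features))

    def partial_fit(self, X):
        # Update the running sums; earlier chunks are never revisited.
        X = np.asarray(X, dtype=float)
        self.n += X.shape[0]
        self.sum_x += X.sum(axis=0)
        self.sum_xxt += X.T @ X
        return self

    def covariance(self):
        # Population covariance: E[x x^T] - mean mean^T.
        mean = self.sum_x / self.n
        return self.sum_xxt / self.n - np.outer(mean, mean)

rng = np.random.default_rng(1)
chunks = [rng.normal(size=(50, 3)) for _ in range(4)]

running = RunningMoments(3)
for chunk in chunks:
    running.partial_fit(chunk)

# Reference value computed from all data at once.
batch_cov = np.cov(np.concatenate(chunks).T, bias=True)
```

Because only sums are stored, the result after four chunks matches the batch estimate, which is what makes this style of update suitable for data that does not fit in memory.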
-
partial_transform(features)
Apply the dimensionality reduction to a single featurized trajectory.
Parameters: features : array-like, shape (n_samples, n_features)
Data to transform, where n_samples is the number of samples and n_features is the number of features.
Returns: sequence_new : array-like, shape (n_samples, n_components)
tICA-projected features
Notes
This function acts on a single featurized trajectory.
-
score(sequences, y=None)
Score the model on new data using the generalized matrix Rayleigh quotient.
Parameters: sequences : list of array, each of shape (n_samples_i, n_features)
Test data. A list of sequences in a feature space, each of which is a 2D array of possibly different length, but with the same number of features.
Returns: gmrq : float
Generalized matrix Rayleigh quotient. This number indicates how well the top n_timescales+1 eigenvectors of this tICA model perform as slowly decorrelating collective variables for the new data in sequences.
References
[R19] McGibbon, R. T. and V. S. Pande, “Variational cross-validation of slow dynamical modes in molecular kinetics” J. Chem. Phys. 142, 124105 (2015)
-
score_
Training score of the model, computed as the generalized matrix Rayleigh quotient, the sum of the first n_components eigenvalues.
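To see why the training score equals a sum of eigenvalues, the sketch below evaluates the generalized matrix Rayleigh quotient, taken here as \(\mathrm{tr}[(V^T C V)(V^T \Sigma V)^{-1}]\), on the same matrices used to obtain the eigenvectors. The function name `gmrq` and the 2x2 matrices are made up for illustration; scoring new data would use correlation and covariance matrices estimated from that data instead.

```python
import numpy as np

def gmrq(components, corr, cov):
    """trace[(V^T C V)(V^T S V)^-1], with V = components.T."""
    V = np.asarray(components).T
    numer = V.T @ corr @ V
    denom = V.T @ cov @ V
    return float(np.trace(numer @ np.linalg.inv(denom)))

# Made-up symmetric matrices standing in for the fitted statistics.
cov = np.array([[2.0, 0.5], [0.5, 1.0]])
corr = np.array([[1.0, 0.2], [0.2, 0.3]])

# Solve corr v = lambda * cov v by whitening with the Cholesky factor.
inv = np.linalg.inv(np.linalg.cholesky(cov))
evals, w = np.linalg.eigh(inv @ corr @ inv.T)
evals, w = evals[::-1], w[:, ::-1]          # decreasing order
components = (inv.T @ w).T                  # rows are eigenvectors

train_score = gmrq(components[:1], corr, cov)   # keep one component
full_score = gmrq(components, corr, cov)        # keep both components
```

On the training matrices the quotient reduces to the sum of the retained eigenvalues; on held-out data it is generally lower, which is what makes it usable for cross-validation.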
-
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns: self
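The <component>__<parameter> naming can be illustrated with a few lines of plain Python that split such a name at the first double underscore. The helper `split_nested_param` and the step name "tica" are hypothetical and serve only to show the convention; this is not the estimator's own code.

```python
def split_nested_param(name):
    """Split 'component__parameter' into (component, parameter).

    Names without a double underscore belong to the estimator itself,
    which is signalled here by a component of None.
    """
    component, sep, parameter = name.partition("__")
    if not sep:
        return None, name
    return component, parameter

# "tica" is a made-up step name for a nested object such as a pipeline.
routed = [split_nested_param(n) for n in ("lag_time", "tica__lag_time")]
```

Here the plain name addresses the estimator directly, while the prefixed name addresses the lag_time parameter of the nested component called "tica".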
-
summarize()
Some summary information.
-
transform(sequences)
Apply the dimensionality reduction to the sequences.
Parameters: sequences : list of array-like, each of shape (n_samples_i, n_features)
Data to transform, where n_samples_i is the number of samples in sequence i and n_features is the number of features.
Returns: sequence_new : list of array-like, each of shape (n_samples_i, n_components)
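The shapes involved in transform and partial_transform can be sketched with NumPy, assuming the projection amounts to subtracting the feature means and multiplying by the components. That assumption, the function names ending in `_sketch`, and the example arrays are all illustrative and not taken from the library source.

```python
import numpy as np

def partial_transform_sketch(features, means, components):
    """Project one trajectory: subtract the means, then apply the components."""
    return (np.asarray(features, dtype=float) - means) @ components.T

def transform_sketch(sequences, means, components):
    """Apply the single-trajectory projection to every sequence in the list."""
    return [partial_transform_sketch(seq, means, components) for seq in sequences]

# Made-up fitted quantities: n_features=3, n_components=2.
means = np.array([1.0, 2.0, 3.0])
components = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 1.0]])

# Two trajectories of different length but the same number of features.
sequences = [np.ones((4, 3)), np.zeros((7, 3))]
projected = transform_sketch(sequences, means, components)
```

Each input of shape (n_samples_i, n_features) yields an output of shape (n_samples_i, n_components), so trajectory lengths are preserved while the feature dimension shrinks.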