msmbuilder.decomposition.tICA
-
class msmbuilder.decomposition.tICA(n_components=None, lag_time=1, shrinkage=None, kinetic_mapping=False, commute_mapping=False)
Time-structure Independent Component Analysis (tICA)
Linear dimensionality reduction using an eigendecomposition of the time-lagged correlation matrix and the covariance matrix of the data, keeping only the vectors that decorrelate slowest in order to project the data into a lower-dimensional space.
Parameters: - n_components : int, None
Number of components to keep.
- lag_time : int
Delay time forward or backward in the input data. The time-lagged correlation is computed between data points X[t] and X[t+lag_time].
- shrinkage : float, default=None
The covariance shrinkage intensity (range 0-1). If shrinkage is not specified (the default) it is estimated using an analytic formula (the Rao-Blackwellized Ledoit-Wolf estimator) introduced in [5].
- kinetic_mapping : bool, default=False
If True, weight the projections by the tICA eigenvalues, yielding kinetic distances as described in [6].
- commute_mapping : bool, default=False
If True, scale the projections by sqrt(t_i/2), where t_i is the timescale associated with the i-th component, yielding commute distances as described in [7].
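The effect of kinetic_mapping and commute_mapping can be pictured as a per-component rescaling of the projected coordinates. The following is a minimal NumPy sketch under stated assumptions, not msmbuilder's own code: scale_projection is a hypothetical helper, t_i is assumed to be the implied timescale -lag_time / ln(lambda_i), and the eigenvalues are assumed to lie strictly between 0 and 1. Any regularization the library may apply to the timescales is not reproduced here.

```python
import numpy as np


def scale_projection(projected, eigenvalues, lag_time=1, mapping=None):
    """Rescale tICA-projected coordinates column by column (hypothetical helper).

    projected   : array, shape (n_samples, n_components)
    eigenvalues : array, shape (n_components,), assumed to lie in (0, 1)
    mapping     : None, "kinetic", or "commute"
    """
    projected = np.asarray(projected, dtype=float)
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    if mapping is None:
        return projected
    if mapping == "kinetic":
        # Weight each component by its eigenvalue.
        return projected * eigenvalues
    if mapping == "commute":
        # Assumption: t_i is the implied timescale of component i.
        timescales = -lag_time / np.log(eigenvalues)
        return projected * np.sqrt(timescales / 2.0)
    raise ValueError("mapping must be None, 'kinetic', or 'commute'")


projected = np.ones((3, 2))
eigenvalues = np.array([0.9, 0.5])
kinetic = scale_projection(projected, eigenvalues, mapping="kinetic")
commute = scale_projection(projected, eigenvalues, lag_time=1, mapping="commute")
```

In both cases slow components (eigenvalues close to 1) are stretched relative to fast ones, so Euclidean distances in the scaled space emphasize slow motions.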
Notes
This method was introduced originally in [R429ad44fa1ac-4], and has been applied to the analysis of molecular dynamics data in [R429ad44fa1ac-1], [R429ad44fa1ac-2], and [R429ad44fa1ac-3]. In [R429ad44fa1ac-1] and [R429ad44fa1ac-2], tICA was used as a dimensionality reduction technique before fitting other kinetic models.
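As an illustration of the eigendecomposition described above, here is a minimal NumPy sketch of the tICA generalized eigenproblem C v = lambda S v. It is a toy version written for this page, not the library's implementation: tica_sketch is a hypothetical name, the estimator is symmetrized over (x_t, x_{t+lag}) pairs, and the optional shrinkage is assumed to pull the covariance toward a scaled identity with a fixed, user-supplied intensity (the analytic intensity estimate is not reproduced).

```python
import numpy as np


def tica_sketch(sequences, lag_time=1, n_components=2, shrinkage=0.0):
    """Toy tICA: return the top eigenvalues, eigenvectors (as columns), and the mean."""
    # Pool the time-lagged pairs (x_t, x_{t+lag}) from every sequence.
    x0 = np.vstack([np.asarray(s, dtype=float)[:-lag_time] for s in sequences])
    x1 = np.vstack([np.asarray(s, dtype=float)[lag_time:] for s in sequences])
    mean = 0.5 * (x0.mean(axis=0) + x1.mean(axis=0))
    x0, x1 = x0 - mean, x1 - mean
    n_pairs = x0.shape[0]

    # Symmetrized time-lagged correlation matrix C and covariance matrix S.
    C = (x0.T @ x1 + x1.T @ x0) / (2.0 * n_pairs)
    S = (x0.T @ x0 + x1.T @ x1) / (2.0 * n_pairs)

    # Assumed shrinkage target: identity scaled by the mean variance.
    n_features = S.shape[0]
    S = (1.0 - shrinkage) * S + shrinkage * (np.trace(S) / n_features) * np.eye(n_features)

    # Whiten with S = L L^T, then solve an ordinary symmetric eigenproblem.
    L_inv = np.linalg.inv(np.linalg.cholesky(S))
    M = L_inv @ C @ L_inv.T
    evals, w = np.linalg.eigh(0.5 * (M + M.T))
    order = np.argsort(evals)[::-1]          # decreasing eigenvalue order
    vecs = L_inv.T @ w[:, order]             # columns solve C v = lambda S v
    return evals[order][:n_components], vecs[:, :n_components], mean


# Toy data: one slowly decorrelating feature and one white-noise feature.
rng = np.random.default_rng(0)
n_steps = 4000
slow = np.zeros(n_steps)
for t in range(1, n_steps):
    slow[t] = 0.98 * slow[t - 1] + rng.normal()
fast = rng.normal(size=n_steps)
traj = np.column_stack([slow, fast])

evals, vecs, mean = tica_sketch([traj[:2000], traj[2000:]], lag_time=1)
slowest = (traj - mean) @ vecs[:, 0]         # projection onto the slowest component
```

On this toy data the leading component should track the slow feature almost exactly, which is the behavior the description refers to as keeping the vectors that decorrelate slowest.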
Attributes: - components_ : array-like, shape (n_components, n_features)
Components with maximum autocorrelation.
- offset_correlation_ : array-like, shape (n_features, n_features)
Symmetric time-lagged correlation matrix, \(C=E[(x_t)^T x_{t+lag}]\).
- eigenvalues_ : array-like, shape (n_features,)
Eigenvalues of the tICA generalized eigenproblem, in decreasing order.
- eigenvectors_ : array-like, shape (n_components, n_features)
Eigenvectors of the tICA generalized eigenproblem. The vectors give a set of “directions” through configuration space along which the system relaxes towards equilibrium. Each eigenvector is associated with a characteristic timescale \(-\frac{\mathrm{lag\_time}}{\ln \lambda_i}\), where \(\lambda_i\) is the corresponding eigenvalue. See [2] for more information.
- means_ : array, shape (n_features,)
The mean of the data along each feature
- n_observations_ : int
Total number of data points fit by the model. Note that the model is “reset” by calling fit() with new sequences, whereas partial_fit() updates the fit with new data, and is suitable for online learning.
- n_sequences_ : int
Total number of sequences fit by the model. Note that the model is “reset” by calling fit() with new sequences, whereas partial_fit() updates the fit with new data, and is suitable for online learning.
- timescales_ : array-like, shape (n_features,)
The implied timescales of the tICA model, given by -lag_time / log(eigenvalues_).
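As a worked example of the relation between eigenvalues and implied timescales, the short sketch below applies the formula to made-up numbers. The values of lag_time and eigenvalues are purely illustrative, and the result is in the same units as lag_time (frames).

```python
import numpy as np

lag_time = 10                                  # illustrative lag, in frames
eigenvalues = np.array([0.95, 0.80, 0.50])     # illustrative, in decreasing order

# Implied timescale of each component: t_i = -lag_time / ln(lambda_i).
timescales = -lag_time / np.log(eigenvalues)

# Inverting the relation recovers the eigenvalues: lambda_i = exp(-lag_time / t_i).
recovered = np.exp(-lag_time / timescales)
```

Eigenvalues closer to 1 map to longer timescales, so the timescales come out in decreasing order when the eigenvalues do. The formula only makes sense for eigenvalues strictly between 0 and 1.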
Methods
- fit(sequences[, y]): Fit the model with a collection of sequences.
- fit_transform(sequences[, y]): Fit the model with X and apply the dimensionality reduction on X.
- get_params([deep]): Get parameters for this estimator.
- partial_fit(X): Fit the model with X.
- partial_transform(features): Apply the dimensionality reduction on X.
- score(sequences[, y]): Score the model on new data using the generalized matrix Rayleigh quotient.
- set_params(**params): Set the parameters of this estimator.
- summarize(): Some summary information.
- transform(sequences): Apply the dimensionality reduction on X.
-
__init__(n_components=None, lag_time=1, shrinkage=None, kinetic_mapping=False, commute_mapping=False)
Initialize self. See help(type(self)) for accurate signature.
Attributes
components_
covariance_
eigenvalues_
eigenvectors_
means_
offset_correlation_
score_
Training score of the model, computed as the generalized matrix Rayleigh quotient, the sum of the first n_components eigenvalues.
timescales_
-
fit(sequences, y=None)
Fit the model with a collection of sequences.
This method is not online. Any state accumulated from previous calls to fit() or partial_fit() will be cleared. For online learning, use partial_fit().
Parameters: - sequences : list of array-like, each of shape (n_samples_i, n_features)
Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features.
- y : None
Ignored
Returns: - self : object
Returns the instance itself.
-
fit_transform(sequences, y=None)
Fit the model with X and apply the dimensionality reduction on X.
This method is not online. Any state accumulated from previous calls to fit() or partial_fit() will be cleared. For online learning, use partial_fit().
Parameters: - sequences : list of array-like, each of shape (n_samples_i, n_features)
Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features.
- y : None
Ignored
Returns: - sequence_new : list of array-like, each of shape (n_samples_i, n_components)
-
get_params(deep=True)
Get parameters for this estimator.
Parameters: - deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: - params : mapping of string to any
Parameter names mapped to their values.
-
partial_fit(X)
Fit the model with X.
This method is suitable for online learning. The state of the model will be updated with the new data X.
Parameters: - X : array-like, shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.
Returns: - self : object
Returns the instance itself.
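The difference between fit() and partial_fit() can be sketched with a small accumulator of sufficient statistics. This is an illustration only: RunningLaggedStats and its attribute names are hypothetical and are not taken from msmbuilder. The point it shows is that running sums can be updated one sequence at a time and give the same matrices as a single batch pass, while fit() starts from a cleared state.

```python
import numpy as np


class RunningLaggedStats:
    """Hypothetical online accumulator for time-lagged second moments."""

    def __init__(self, n_features, lag_time=1):
        self.n_features = n_features
        self.lag_time = lag_time
        self._reset()

    def _reset(self):
        self.n_pairs = 0
        self.sum_x = np.zeros(self.n_features)                      # sum of x_t + x_{t+lag}
        self.sum_xx = np.zeros((self.n_features, self.n_features))  # x0^T x0 + x1^T x1
        self.sum_xy = np.zeros((self.n_features, self.n_features))  # x0^T x1

    def partial_fit(self, X):
        """Update the running sums with one more sequence (online)."""
        X = np.asarray(X, dtype=float)
        x0, x1 = X[:-self.lag_time], X[self.lag_time:]
        self.n_pairs += len(x0)
        self.sum_x += x0.sum(axis=0) + x1.sum(axis=0)
        self.sum_xx += x0.T @ x0 + x1.T @ x1
        self.sum_xy += x0.T @ x1
        return self

    def fit(self, sequences):
        """Clear any accumulated state, then process all sequences (not online)."""
        self._reset()
        for X in sequences:
            self.partial_fit(X)
        return self

    @property
    def mean(self):
        return self.sum_x / (2.0 * self.n_pairs)

    @property
    def covariance(self):
        return self.sum_xx / (2.0 * self.n_pairs) - np.outer(self.mean, self.mean)

    @property
    def offset_correlation(self):
        sym = 0.5 * (self.sum_xy + self.sum_xy.T)
        return sym / self.n_pairs - np.outer(self.mean, self.mean)


rng = np.random.default_rng(1)
a, b = rng.normal(size=(50, 3)), rng.normal(size=(30, 3))

online = RunningLaggedStats(3).partial_fit(a).partial_fit(b)
batch = RunningLaggedStats(3).fit([a, b])
refit_pairs = RunningLaggedStats(3).fit([a]).fit([b]).n_pairs  # second fit() discards a
```

Here online and batch hold identical statistics, whereas the second fit() call leaves only the pairs contributed by b.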
-
partial_transform(features)
Apply the dimensionality reduction on X.
Parameters: - features : array-like, shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.
Returns: - sequence_new : array-like, shape (n_samples, n_components)
tICA-projected features.
Notes
This function acts on a single featurized trajectory.
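The shapes involved in projecting a single trajectory can be sketched as follows. This is a minimal illustration that assumes the projection is a centering step followed by a matrix product with the components; project is a hypothetical helper, and means and components stand in for fitted attributes with shapes (n_features,) and (n_components, n_features).

```python
import numpy as np


def project(features, means, components):
    """Center one trajectory and project it onto the given components.

    features   : array, shape (n_samples, n_features)
    means      : array, shape (n_features,)
    components : array, shape (n_components, n_features)
    returns    : array, shape (n_samples, n_components)
    """
    centered = np.asarray(features, dtype=float) - np.asarray(means, dtype=float)
    return centered @ np.asarray(components, dtype=float).T


rng = np.random.default_rng(2)
trajectory = rng.normal(size=(5, 3))
means = trajectory.mean(axis=0)
components = np.eye(3)[:2]                 # stand-in: keep the first two features

single = project(trajectory, means, components)

# Applying the same projection to a list of trajectories, one at a time.
many = [project(X, means, components) for X in [trajectory, trajectory[:4]]]
```

The list comprehension mirrors the relationship between acting on one featurized trajectory and acting on a list of sequences.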
-
score(sequences, y=None)
Score the model on new data using the generalized matrix Rayleigh quotient.
Parameters: - sequences : list of array, each of shape (n_samples_i, n_features)
Test data. A list of sequences in a feature space, each of which is a 2D array of possibly different length but with the same number of features.
Returns: - gmrq : float
Generalized matrix Rayleigh quotient. This number indicates how well the top n_timescales+1 eigenvectors of this tICA model perform as slowly decorrelating collective variables for the new data in sequences.
References
[1] McGibbon, R. T. and V. S. Pande, “Variational cross-validation of slow dynamical modes in molecular kinetics” J. Chem. Phys. 142, 124105 (2015)
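The generalized matrix Rayleigh quotient of [1] can be sketched in a few lines of NumPy. The function below is an illustration under assumptions, not the library's code: gmrq_sketch is a hypothetical name, V holds the model eigenvectors as columns with shape (n_features, k), and C_test and S_test are the time-lagged correlation and covariance matrices estimated from the test data.

```python
import numpy as np


def gmrq_sketch(V, C_test, S_test):
    """Return trace((V^T S V)^{-1} (V^T C V)) for eigenvectors stored as columns of V."""
    numerator = V.T @ C_test @ V
    denominator = V.T @ S_test @ V
    return float(np.trace(np.linalg.solve(denominator, numerator)))


# Toy check: with S = I and a diagonal C, the top-2 score is the sum of the
# two largest diagonal entries.
C = np.diag([0.9, 0.5, 0.1])
S = np.eye(3)
V = np.eye(3)[:, :2]

score = gmrq_sketch(V, C, S)
rescaled = gmrq_sketch(2.0 * V, C, S)      # the quotient ignores the scale of V
```

When the matrices come from the training data itself, the quotient reduces to the sum of the leading eigenvalues, which matches the description of score_.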
-
score_
Training score of the model, computed as the generalized matrix Rayleigh quotient, the sum of the first n_components eigenvalues.
-
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns: - self
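The <component>__<parameter> naming convention can be illustrated without any library. The sketch below uses plain dictionaries in place of real estimators; apply_nested_params and the component name "tica" are hypothetical and exist only to show how a double underscore separates the component from its parameter.

```python
def apply_nested_params(components, **params):
    """Route each '<component>__<parameter>' keyword to the matching component.

    components : dict mapping a component name to a dict of its parameters
    """
    for key, value in params.items():
        name, sep, parameter = key.partition("__")
        if not sep:
            raise ValueError(f"{key!r} is not of the form <component>__<parameter>")
        components[name][parameter] = value
    return components


# A stand-in for a pipeline with one step named "tica".
pipeline_params = {"tica": {"lag_time": 1, "n_components": None}}
apply_nested_params(pipeline_params, tica__lag_time=10, tica__n_components=4)
```

After the call, only the parameters of the "tica" entry have changed, which is the behavior the double-underscore form is meant to give for nested objects.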
-
summarize()
Some summary information.
-
transform(sequences)
Apply the dimensionality reduction on X.
Parameters: - sequences : list of array-like, each of shape (n_samples_i, n_features)
Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features.
Returns: - sequence_new : list of array-like, each of shape (n_samples_i, n_components)