Sparse time-structure Independent Component Analysis (tICA).
Linear dimensionality reduction that finds sparse linear combinations of the input features that decorrelate most slowly. These combinations can be used for feature selection and/or dimensionality reduction.
This model requires the additional python package cvxpy, which can be installed from PyPI.
Warning
This model is currently experimental, and may undergo significant changes or bug fixes in upcoming releases.
Parameters: | n_components : int
lag_time : int
gamma : nonnegative float, default=0.05
rho : positive float
weighted_transform : bool, default=False
epsilon : positive float, default=1e-6
tolerance : positive float
maxiter : int
max_nc : int
greedy : bool, default=True
verbose : bool, default=False
|
---|
See also
References
[R12] | McGibbon, R. T. and V. S. Pande. “Identification of sparse, slow reaction coordinates from molecular dynamics simulations.” In preparation. |
[R13] | Sriperumbudur, B. K., D. A. Torres, and G. R. Lanckriet. “A majorization-minimization approach to the sparse generalized eigenvalue problem.” Machine learning 85.1-2 (2011): 3-39. |
[R14] | Mackey, L. “Deflation Methods for Sparse PCA.” NIPS. Vol. 21. 2008. |
Attributes
components_ | (array-like, shape (n_components, n_features)) Components with maximum autocorrelation. |
offset_correlation_ | (array-like, shape (n_features, n_features)) Symmetric time-lagged correlation matrix, C=E[(x_t)^T x_{t+lag}]. |
eigenvalues_ | (array-like, shape (n_features,)) Pseudo-eigenvalues of the tICA generalized eigenproblem, in decreasing order. |
eigenvectors_ | (array-like, shape (n_components, n_features)) Sparse pseudo-eigenvectors of the tICA generalized eigenproblem. The vectors give a set of “directions” through configuration space along which the system relaxes towards equilibrium. |
means_ | (array, shape (n_features,)) The mean of the data along each feature. |
n_observations_ | (int) Total number of data points fit by the model. Note that the model is “reset” by calling fit() with new sequences, whereas partial_fit() updates the fit with new data, and is suitable for online learning. |
n_sequences_ | (int) Total number of sequences fit by the model. Note that the model is “reset” by calling fit() with new sequences, whereas partial_fit() updates the fit with new data, and is suitable for online learning. |
timescales_ | (array-like, shape (n_components,)) The implied timescales of the tICA model, given by -lag_time / log(eigenvalues_). |
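The timescale formula in the attribute table can be illustrated with a minimal sketch. It assumes that the offset in the formula is the lag_time parameter and that the eigenvalues lie strictly between 0 and 1. The helper name implied_timescales is hypothetical and is not part of the library.

```python
import math


def implied_timescales(eigenvalues, lag_time):
    """Return -lag_time / log(eigenvalue) for each eigenvalue.

    Hypothetical helper for illustration only. Eigenvalues outside the
    open interval (0, 1) have no finite positive timescale and give NaN.
    """
    timescales = []
    for lam in eigenvalues:
        if 0.0 < lam < 1.0:
            timescales.append(-lag_time / math.log(lam))
        else:
            timescales.append(float("nan"))
    return timescales


# Eigenvalues in decreasing order give timescales in decreasing order.
example = implied_timescales([0.9, 0.5, 0.1], lag_time=10)
```

Eigenvalues closer to 1 correspond to more slowly decorrelating components, so they map to longer timescales, in the same units as lag_time.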
Methods
fit(sequences[, y]) | Fit the model with a collection of sequences. |
fit_transform(sequences[, y]) | Fit the model with X and apply the dimensionality reduction on X. |
get_params([deep]) | Get parameters for this estimator. |
partial_fit(X) | Fit the model with X. |
partial_transform(features) | Apply the dimensionality reduction on X. |
score(sequences[, y]) | Score the model on new data using the generalized matrix Rayleigh quotient. |
set_params(**params) | Set the parameters of this estimator. |
summarize() | Some summary information. |
transform(sequences) | Apply the dimensionality reduction on X. |
Some summary information.
Fit the model with a collection of sequences.
This method is not online. Any state accumulated from previous calls to fit() or partial_fit() will be cleared. For online learning, use partial_fit.
Parameters: | sequences: list of array-like, each of shape (n_samples_i, n_features)
y : None
|
---|---|
Returns: | self : object
|
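As a rough illustration of the quantity accumulated during fitting, the sketch below estimates a symmetric time-lagged correlation matrix of the kind described for offset_correlation_. It is a plain-Python sketch under two assumptions: the input features are already mean-free, and symmetrization is done by averaging each lagged product with its transpose. The function name lagged_correlation is hypothetical, and the library's own estimator may differ in detail.

```python
def lagged_correlation(sequences, lag_time):
    """Estimate a symmetrized E[x_t^T x_{t+lag}] from a list of sequences.

    Hypothetical helper: each sequence is a list of frames, each frame a
    list of n_features floats that are assumed to be mean-free already.
    """
    n_features = len(sequences[0][0])
    total = [[0.0] * n_features for _ in range(n_features)]
    n_pairs = 0
    for seq in sequences:
        # Pairs never span two sequences, so each one is handled separately.
        for t in range(len(seq) - lag_time):
            x0, x1 = seq[t], seq[t + lag_time]
            for i in range(n_features):
                for j in range(n_features):
                    # Average the lagged product with its transpose.
                    total[i][j] += 0.5 * (x0[i] * x1[j] + x1[i] * x0[j])
            n_pairs += 1
    return [[value / n_pairs for value in row] for row in total]


# One short sequence that alternates between two features.
toy_sequences = [[[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]]
C = lagged_correlation(toy_sequences, lag_time=1)
```

In practice the same computation would normally be vectorized with an array library; the explicit loops are only meant to make each term visible.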
Fit the model with X and apply the dimensionality reduction on X.
This method is not online. Any state accumulated from previous calls to fit() or partial_fit() will be cleared. For online learning, use partial_fit.
Parameters: | sequences: list of array-like, each of shape (n_samples_i, n_features)
y : None
|
---|---|
Returns: | sequence_new : list of array-like, each of shape (n_samples_i, n_components) |
Get parameters for this estimator.
Parameters: | deep: boolean, optional
|
---|---|
Returns: | params : mapping of string to any
|
Fit the model with X.
This method is suitable for online learning. The state of the model will be updated with the new data X.
Parameters: | X: array-like, shape (n_samples, n_features)
|
---|---|
Returns: | self : object
|
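The difference between fit() and partial_fit() can be illustrated with a toy stand-in that tracks only counts and feature means. The class ToyOnlineMeans is hypothetical and does not reproduce the tICA computation; it only mirrors the reset-versus-accumulate behaviour described for n_observations_ and n_sequences_.

```python
class ToyOnlineMeans:
    """Hypothetical stand-in that tracks counts and per-feature means."""

    def __init__(self):
        self._reset()

    def _reset(self):
        self.n_sequences_ = 0
        self.n_observations_ = 0
        self._feature_sums = None

    def partial_fit(self, X):
        # Online update: existing state is kept and extended with X.
        if self._feature_sums is None:
            self._feature_sums = [0.0] * len(X[0])
        for frame in X:
            for i, value in enumerate(frame):
                self._feature_sums[i] += value
        self.n_sequences_ += 1
        self.n_observations_ += len(X)
        return self

    def fit(self, sequences):
        # Not online: clear any previous state, then consume every sequence.
        self._reset()
        for X in sequences:
            self.partial_fit(X)
        return self

    @property
    def means_(self):
        return [s / self.n_observations_ for s in self._feature_sums]


seq_a = [[0.0, 2.0], [2.0, 4.0]]
seq_b = [[4.0, 6.0]]

model = ToyOnlineMeans().fit([seq_a])
model.partial_fit(seq_b)  # state now covers seq_a and seq_b
counts_after_partial = (model.n_sequences_, model.n_observations_)
means_after_partial = model.means_

model.fit([seq_b])  # previous state is discarded first
counts_after_refit = (model.n_sequences_, model.n_observations_)
```

After the partial_fit call the counts cover both sequences, while the second fit call leaves only the statistics of seq_b.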
Apply the dimensionality reduction on X.
Parameters: | features: array-like, shape (n_samples, n_features)
|
---|---|
Returns: | sequence_new : array-like, shape (n_samples, n_components)
|
Notes
This function acts on a single featurized trajectory.
Score the model on new data using the generalized matrix Rayleigh quotient.
Parameters: | sequences : list of array-like
|
---|---|
Returns: | gmrq : float
|
References
[R15] | McGibbon, R. T. and V. S. Pande, “Variational cross-validation of slow dynamical modes in molecular kinetics” J. Chem. Phys. 142, 124105 (2015) |
Training score of the model, computed as the generalized matrix Rayleigh quotient: the sum of the first n_components eigenvalues.
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns: | self |
---|
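The <component>__<parameter> naming convention can be sketched as follows. This is an illustrative helper, not the estimator's actual implementation; the function name split_nested_params and the step name "tica" are hypothetical.

```python
def split_nested_params(params):
    """Separate plain parameters from <component>__<parameter> ones.

    Hypothetical helper: returns (own, nested), where nested maps each
    component name to the parameters addressed to it.
    """
    own, nested = {}, {}
    for key, value in params.items():
        component, delimiter, sub_key = key.partition("__")
        if delimiter:
            # Key contained "__": route the value to the named component.
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"verbose": True, "tica__lag_time": 5, "tica__gamma": 0.1}
)
```

Here own holds the parameters meant for the outer object, and nested groups the remaining ones by the component they target.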
Apply the dimensionality reduction on X.
Parameters: | sequences: list of array-like, each of shape (n_samples_i, n_features)
|
---|---|
Returns: | sequence_new : list of array-like, each of shape (n_samples_i, n_components) |
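The input and output shapes of transform can be illustrated with a plain-Python sketch. It assumes that the reduction amounts to subtracting the feature means and projecting each frame onto the rows of components_; any additional scaling, such as that implied by weighted_transform, is left out. The function name project_sequences is hypothetical.

```python
def project_sequences(sequences, means, components):
    """Center each frame and project it onto every component.

    Hypothetical helper: sequences is a list of (n_samples_i, n_features)
    nested lists, means has length n_features, and components has shape
    (n_components, n_features). Returns a list of
    (n_samples_i, n_components) nested lists.
    """
    projected = []
    for seq in sequences:
        new_seq = []
        for frame in seq:
            centered = [x - m for x, m in zip(frame, means)]
            new_seq.append(
                [sum(c * w for c, w in zip(centered, comp)) for comp in components]
            )
        projected.append(new_seq)
    return projected


# One sparse component that uses only the first of two features.
toy_means = [1.0, 1.0]
toy_components = [[1.0, 0.0]]
toy_input = [[[2.0, 5.0], [3.0, 7.0]], [[1.0, 0.0]]]
toy_output = project_sequences(toy_input, toy_means, toy_components)
```

Each output sequence keeps the length of its input sequence, and because the component is sparse, the second feature has no effect on the result.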