msmbuilder.cluster.KCenters
-
class msmbuilder.cluster.KCenters(n_clusters=8, metric='euclidean', random_state=None)
K-Centers clustering
Cluster a vector or Trajectory dataset using a simple heuristic to minimize the maximum distance from any data point to its assigned cluster center.
The runtime of this algorithm is O(kN), where k is the number of clusters and N is the size of the dataset, making it one of the least expensive clustering algorithms available.
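A heuristic of this kind is commonly realized as a farthest-first traversal, which is presumably what the Gonzalez reference listed for this class describes. The following is a minimal pure-Python sketch of that idea, not MSMBuilder's own implementation. The function name `k_centers_sketch`, the `euclidean` helper, and the choice of the first point as the initial center are assumptions made for illustration. Each of the k rounds makes a single pass over the N points, which is where the O(kN) cost comes from.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_centers_sketch(points, n_clusters, first=0, dist=euclidean):
    """Farthest-first traversal (illustrative only, not the library code).

    Returns (center_indices, labels, distances), where labels[i] is the
    index of the center assigned to points[i] and distances[i] is the
    distance from points[i] to that center.
    """
    n = len(points)
    if not 0 < n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and len(points)")

    center_indices = [first]
    labels = [0] * n
    # Distance from every point to its closest center found so far.
    distances = [dist(p, points[first]) for p in points]

    for k in range(1, n_clusters):
        # The next center is the point farthest from all current centers.
        new = max(range(n), key=distances.__getitem__)
        center_indices.append(new)
        # One O(N) pass: reassign points that are closer to the new center.
        for i, p in enumerate(points):
            d = dist(p, points[new])
            if d < distances[i]:
                distances[i] = d
                labels[i] = k
    return center_indices, labels, distances


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 8.0)]
centers, labels, distances = k_centers_sketch(points, n_clusters=3)
print(centers, labels, max(distances))
```

In this sketch the largest entry of `distances` is the quantity the heuristic tries to keep small: the maximum distance from any point to its assigned center.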
Parameters: - n_clusters : int, optional, default: 8
The number of clusters to form as well as the number of centroids to generate.
- metric : {“euclidean”, “sqeuclidean”, “cityblock”, “chebyshev”, “canberra”, “braycurtis”, “hamming”, “jaccard”, “rmsd”}
The distance metric to use. metric = “rmsd” requires that the sequences passed to fit() be `md.Trajectory` objects; other distance metrics require `np.ndarray` objects.
- random_state : integer or numpy.RandomState, optional
The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
References
[1] Gonzalez, Teofilo F. “Clustering to minimize the maximum intercluster distance.” Theor. Comput. Sci. 38 (1985): 293-306.
[2] Beauchamp, Kyle A., et al. “MSMBuilder2: modeling conformational dynamics on the picosecond to millisecond scale.” J. Chem. Theory Comput. 7.10 (2011): 3412-3419.
Attributes: - `cluster_centers_` : array, [n_clusters, n_features]
Coordinates of cluster centers
- `labels_` : list of arrays, each of shape [sequence_length, ]
labels_[i] is an array of the labels of each point in sequence i. The label of each point is an integer in [0, n_clusters).
- `distances_` : list of arrays, each of shape [sequence_length, ]
distances_[i] is an array of the distances from each point in sequence i to the cluster center it is assigned to.
Methods
fit(sequences[, y]): Fit the kcenters clustering on the data
fit_predict(sequences[, y]): Performs clustering on sequences and returns cluster labels.
fit_transform(sequences[, y]): Alias for fit_predict
get_params([deep]): Get parameters for this estimator.
partial_predict(X[, y]): Predict the closest cluster each sample in X belongs to.
partial_transform(X): Alias for partial_predict
predict(sequences[, y]): Predict the closest cluster each sample in each sequence in sequences belongs to.
set_params(**params): Set the parameters of this estimator.
summarize(): Return some diagnostic summary statistics about this Markov model
transform(sequences): Alias for predict
-
__init__(n_clusters=8, metric='euclidean', random_state=None)
Initialize self. See help(type(self)) for accurate signature.
-
fit(sequences, y=None)
Fit the kcenters clustering on the data
Parameters: - sequences : list of array-like, each of shape [sequence_length, n_features]
A list of multivariate timeseries, or `md.Trajectory` objects. Each sequence may have a different length, but they all must have the same number of features, or the same number of atoms if they are `md.Trajectory` objects.
Returns: - self
-
fit_predict(sequences, y=None)
Performs clustering on sequences and returns cluster labels.
Parameters: - sequences : list of array-like, each of shape [sequence_length, n_features]
A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.
Returns: - Y : list of ndarray, each of shape [sequence_length, ]
Cluster labels
-
fit_transform(sequences, y=None)
Alias for fit_predict
-
get_params(deep=True)
Get parameters for this estimator.
Parameters: - deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: - params : mapping of string to any
Parameter names mapped to their values.
-
partial_predict(X, y=None)
Predict the closest cluster each sample in X belongs to.
In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.
Parameters: - X : array-like shape=(n_samples, n_features)
A single timeseries.
Returns: - Y : array, shape=(n_samples,)
Index of the cluster that each sample belongs to
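The code book lookup described for this method can be illustrated with a short pure-Python sketch. This is not the library's code: the function name `nearest_code` and the sample data are hypothetical, and Euclidean distance is assumed, whereas the estimator uses whichever metric it was constructed with.

```python
import math


def nearest_code(X, code_book):
    """Return, for each sample in X, the index of the closest code."""
    labels = []
    for sample in X:
        # Distance from this sample to every code (cluster center).
        dists = [math.dist(sample, code) for code in code_book]
        # Index of the smallest distance; ties go to the lowest index.
        labels.append(dists.index(min(dists)))
    return labels


code_book = [(0.0, 0.0), (10.0, 0.0)]       # stands in for cluster_centers_
X = [(1.0, 0.5), (9.0, -1.0), (4.0, 0.0)]   # a single timeseries
labels = nearest_code(X, code_book)
print(labels)
```

The result has one integer per sample, matching the documented return shape of (n_samples,).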
-
partial_transform(X)
Alias for partial_predict
-
predict(sequences, y=None)
Predict the closest cluster each sample in each sequence in sequences belongs to.
In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.
Parameters: - sequences : list of array-like, each of shape [sequence_length, n_features]
A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.
Returns: - Y : list of arrays, each of shape [sequence_length,]
Index of the closest center each sample belongs to.
-
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns: - self
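To make the `<component>__<parameter>` naming concrete, here is a small sketch of how such a name can be split into its two parts. The helper `split_param` and the component names `cluster` and `pipe` are hypothetical, chosen only for illustration; the sketch shows the naming convention, not the estimator's internals.

```python
def split_param(name):
    """Split 'component__parameter' into ('component', 'parameter').

    A name without '__' refers to the estimator itself, so the component
    part is returned as None. Only the first '__' is used as a separator,
    leaving any deeper nesting in the parameter part.
    """
    component, sep, parameter = name.partition("__")
    if not sep:
        return None, name
    return component, parameter


print(split_param("n_clusters"))             # plain parameter
print(split_param("cluster__n_clusters"))    # parameter of a nested component
print(split_param("pipe__cluster__metric"))  # two levels of nesting
```

A single underscore, as in `n_clusters`, is part of the parameter name; only a double underscore separates a component from its parameter.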
-
summarize()
Return some diagnostic summary statistics about this Markov model
-
transform(sequences)
Alias for predict