msmbuilder.cluster.KCenters
class msmbuilder.cluster.KCenters(n_clusters=8, metric='euclidean', random_state=None)
- K-Centers clustering
- Cluster a vector or Trajectory dataset using a simple heuristic to minimize the maximum distance from any data point to its assigned cluster center.
- The runtime of this algorithm is O(kN), where k is the number of clusters and N is the size of the dataset, making it one of the least expensive clustering algorithms available.
- Parameters:
- n_clusters : int, optional, default: 8
- The number of clusters to form as well as the number of centroids to generate. 
- metric : {“euclidean”, “sqeuclidean”, “cityblock”, “chebyshev”, “canberra”, “braycurtis”, “hamming”, “jaccard”, “rmsd”}
- The distance metric to use. metric = “rmsd” requires that the sequences passed to fit() be `md.Trajectory` objects; the other distance metrics require `np.ndarray`s.
- random_state : integer or numpy.random.RandomState, optional
- The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator. 
- References
- [1] Gonzalez, Teofilo F. “Clustering to minimize the maximum intercluster distance.” Theor. Comput. Sci. 38 (1985): 293-306.
- [2] Beauchamp, Kyle A., et al. “MSMBuilder2: modeling conformational dynamics on the picosecond to millisecond scale.” J. Chem. Theory Comput. 7.10 (2011): 3412-3419.
- Attributes:
- `cluster_centers_` : array, [n_clusters, n_features]
- Coordinates of cluster centers 
- `labels_` : list of arrays, each of shape [sequence_length, ]
- labels_[i] is an array of the labels of each point in sequence i. The label of each point is an integer in [0, n_clusters). 
- `distances_` : list of arrays, each of shape [sequence_length, ]
- distances_[i] is an array of the distances from each point in sequence i to the cluster center it is assigned to.
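The heuristic described above can be illustrated with a greedy farthest-first selection in the spirit of Gonzalez [1]. This is a minimal sketch in plain Python, not the library's implementation: the function name `kcenters_sketch`, the `seed` argument, and the use of Euclidean distance on a single flat list of points are assumptions made for illustration.

```python
import math
import random


def kcenters_sketch(points, n_clusters, seed=0):
    """Greedy farthest-first selection of centers (illustrative only)."""
    rng = random.Random(seed)
    # Start from a randomly chosen data point.
    first = rng.randrange(len(points))
    center_ids = [first]
    labels = [0] * len(points)
    distances = [math.dist(p, points[first]) for p in points]

    for k in range(1, n_clusters):
        # The next center is the point farthest from its current center.
        new_id = max(range(len(points)), key=distances.__getitem__)
        center_ids.append(new_id)
        # One pass over the data per new center gives O(kN) distance evaluations.
        for i, p in enumerate(points):
            d = math.dist(p, points[new_id])
            if d < distances[i]:
                distances[i] = d
                labels[i] = k

    centers = [points[i] for i in center_ids]
    return centers, labels, distances


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
centers, labels, distances = kcenters_sketch(points, n_clusters=3)
```

The three return values loosely mirror `cluster_centers_`, `labels_` and `distances_` for a single sequence. Every center in this sketch is itself a data point, and the largest entry of `distances` is the quantity the heuristic tries to keep small.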
- Methods
- fit(sequences[, y]) - Fit the K-Centers clustering on the data.
- fit_predict(sequences[, y]) - Perform clustering on sequences and return cluster labels.
- fit_transform(sequences[, y]) - Alias for fit_predict.
- get_params([deep]) - Get parameters for this estimator.
- partial_predict(X[, y]) - Predict the closest cluster each sample in X belongs to.
- partial_transform(X) - Alias for partial_predict.
- predict(sequences[, y]) - Predict the closest cluster each sample in each sequence in sequences belongs to.
- set_params(**params) - Set the parameters of this estimator.
- summarize() - Return some diagnostic summary statistics about this model.
- transform(sequences) - Alias for predict.
__init__(n_clusters=8, metric='euclidean', random_state=None)
- Initialize self. See help(type(self)) for accurate signature. 
fit(sequences, y=None)
- Fit the K-Centers clustering on the data.
- Parameters:
- sequences : list of array-like, each of shape [sequence_length, n_features]
- A list of multivariate timeseries, or a list of `md.Trajectory` objects. Each sequence may have a different length, but they must all have the same number of features, or the same number of atoms if they are `md.Trajectory`s.
- Returns:
- self
 
 - 
fit_predict(sequences, y=None)
- Performs clustering on sequences and returns cluster labels.
- Parameters:
- sequences : list of array-like, each of shape [sequence_length, n_features]
- A list of multivariate timeseries. Each sequence may have a different length, but they must all have the same number of features.
- Returns:
- Y : list of ndarray, each of shape [sequence_length, ]
- Cluster labels
 
 - 
fit_transform(sequences, y=None)
- Alias for fit_predict 
 - 
get_params(deep=True)
- Get parameters for this estimator.
- Parameters:
- deep : boolean, optional
- If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : mapping of string to any
- Parameter names mapped to their values.
 
 - 
partial_predict(X, y=None)
- Predict the closest cluster each sample in X belongs to.
- In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.
- Parameters:
- X : array-like, shape=(n_samples, n_features)
- A single timeseries.
- Returns:
- Y : array, shape=(n_samples,)
- Index of the cluster that each sample belongs to
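The code-book view can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the library's code: `nearest_code` is a hypothetical helper, `code_book` stands in for `cluster_centers_`, and Euclidean distance is assumed.

```python
import math


def nearest_code(sample, code_book):
    """Return the index of the code closest to `sample` (Euclidean)."""
    return min(range(len(code_book)),
               key=lambda j: math.dist(sample, code_book[j]))


# Stand-in for cluster_centers_ (hypothetical values).
code_book = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]

# A single timeseries, as partial_predict expects.
X = [(0.2, -0.1), (9.0, 1.0), (4.0, 6.0)]
Y = [nearest_code(x, code_book) for x in X]

# A list of timeseries, as predict expects: one label array per sequence.
sequences = [X, X[:2]]
all_labels = [[nearest_code(x, code_book) for x in seq] for seq in sequences]
```

Here `Y` holds one cluster index per sample, and `all_labels` holds one such list per sequence, which mirrors the relationship between partial_predict and predict.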
 
 - 
partial_transform(X)
- Alias for partial_predict 
 - 
predict(sequences, y=None)
- Predict the closest cluster each sample in each sequence in sequences belongs to.
- In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.
- Parameters:
- sequences : list of array-like, each of shape [sequence_length, n_features]
- A list of multivariate timeseries. Each sequence may have a different length, but they must all have the same number of features.
- Returns:
- Y : list of arrays, each of shape [sequence_length, ]
- Index of the closest center each sample belongs to.
 
 - 
set_params(**params)
- Set the parameters of this estimator.
- The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.
- Returns:
- self
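The <component>__<parameter> naming can be illustrated by splitting each key on its first double underscore. This is a sketch of the convention only, not the estimator's own code: `split_nested_params` is a hypothetical helper, and the component name `cluster` is made up for the example.

```python
def split_nested_params(params):
    """Group 'component__parameter' keys by component name."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first "__" only; delim is empty if there is none.
        name, delim, sub_name = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_name] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"n_clusters": 20, "cluster__metric": "rmsd"}
)
```

In this sketch `own` collects the parameters that belong to the object itself, while `nested` maps each component name to the parameters that would be forwarded to it.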
 
 - 
summarize()
- Return some diagnostic summary statistics about this model
 - 
transform(sequences)
- Alias for predict