msmbuilder.cluster.MiniBatchKMeans

class msmbuilder.cluster.MiniBatchKMeans(n_clusters=8, init='k-means++', max_iter=100, batch_size=100, verbose=0, compute_labels=True, random_state=None, tol=0.0, max_no_improvement=10, init_size=None, n_init=3, reassignment_ratio=0.01)
Mini-Batch K-Means clustering. Read more in the User Guide.

Parameters:
- n_clusters : int, optional, default: 8
  The number of clusters to form as well as the number of centroids to generate.
- max_iter : int, optional, default: 100
  Maximum number of iterations over the complete dataset before stopping, independently of any early stopping heuristic.
- max_no_improvement : int, default: 10
  Control early stopping based on the number of consecutive mini batches that do not yield an improvement on the smoothed inertia. To disable convergence detection based on inertia, set max_no_improvement to None.
- tol : float, default: 0.0
  Control early stopping based on the relative center changes, as measured by a smoothed, variance-normalized mean of the squared center position changes. This early stopping heuristic is closer to the one used for the batch variant of the algorithm but induces a slight computational and memory overhead over the inertia heuristic. To disable convergence detection based on normalized center change, set tol to 0.0 (default).
- batch_size : int, optional, default: 100
  Size of the mini batches.
- init_size : int, optional, default: 3 * batch_size
  Number of samples to randomly sample for speeding up the initialization (sometimes at the expense of accuracy): the algorithm is initialized by running a batch KMeans on a random subset of the data. This needs to be larger than n_clusters.
- init : {'k-means++', 'random' or an ndarray}, default: 'k-means++'
  Method for initialization:
  - 'k-means++': selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See the Notes section in k_init for more details.
  - 'random': choose k observations (rows) at random from the data for the initial centroids.
  - If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.
- n_init : int, default: 3
  Number of random initializations that are tried. In contrast to KMeans, the algorithm is only run once, using the best of the n_init initializations as measured by inertia.
- compute_labels : boolean, default: True
  Compute label assignment and inertia for the complete dataset once the minibatch optimization has converged in fit.
- random_state : integer or numpy.random.RandomState, optional
  The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
- reassignment_ratio : float, default: 0.01
  Control the fraction of the maximum number of counts for a center to be reassigned. A higher value means that low count centers are more easily reassigned, which means that the model will take longer to converge, but should converge to a better clustering.
- verbose : boolean, optional
  Verbosity mode.

See also:
- KMeans: The classic implementation of the clustering method, based on Lloyd's algorithm. It consumes the whole set of input data at each iteration.
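
The max_no_improvement rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code msmbuilder runs: the function name batches_until_stop is hypothetical, and the exponential smoothing factor alpha is an assumption, since the text only says that the inertia is "smoothed".

```python
def batches_until_stop(batch_inertias, max_no_improvement=10, alpha=0.5):
    """Return the 1-based index of the mini batch at which training would
    stop early, or None if the stopping rule never triggers.

    alpha is an assumed smoothing factor for an exponentially weighted
    average of the per-batch inertia.
    """
    smoothed = None   # running smoothed inertia
    best = None       # lowest smoothed inertia seen so far
    stale = 0         # consecutive batches without improvement
    for i, inertia in enumerate(batch_inertias, start=1):
        if smoothed is None:
            smoothed = inertia
        else:
            smoothed = (1 - alpha) * smoothed + alpha * inertia
        if best is None or smoothed < best:
            best = smoothed
            stale = 0
        else:
            stale += 1
        # max_no_improvement=None disables this stopping rule entirely
        if max_no_improvement is not None and stale >= max_no_improvement:
            return i
    return None


inertias = [10.0, 8.0, 9.0, 9.0, 9.0]
stop_at = batches_until_stop(inertias, max_no_improvement=2)
never = batches_until_stop(inertias, max_no_improvement=None)
```

With these made-up inertias, the smoothed value stops improving after the second batch, so a limit of 2 stalled batches ends training at the fourth batch, while None lets every batch run.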

Notes:
See http://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf

Attributes:
- cluster_centers_ : array, [n_clusters, n_features]
  Coordinates of cluster centers.
- labels_ : list of arrays, each of shape [sequence_length, ]
  The label of each point is an integer in [0, n_clusters).
- inertia_ : float
  The value of the inertia criterion associated with the chosen partition (if compute_labels is set to True). The inertia is defined as the sum of squared distances of samples to their nearest cluster center.

Methods:
- fit(sequences[, y]): Fit the clustering on the data.
- fit_predict(sequences[, y]): Perform clustering on sequences and return cluster labels.
- fit_transform(sequences[, y]): Alias for fit_predict.
- get_params([deep]): Get parameters for this estimator.
- partial_fit(X[, y]): Update the k-means estimate on a single mini-batch X.
- partial_predict(X[, y]): Predict the closest cluster each sample in X belongs to.
- partial_transform(X): Alias for partial_predict.
- predict(sequences[, y]): Predict the closest cluster each sample in each sequence in sequences belongs to.
- score(X[, y]): Negative of the K-means objective evaluated on X.
- set_params(**params): Set the parameters of this estimator.
- summarize(): Return some diagnostic summary statistics about this model.
- transform(sequences): Alias for predict.

__init__(n_clusters=8, init='k-means++', max_iter=100, batch_size=100, verbose=0, compute_labels=True, random_state=None, tol=0.0, max_no_improvement=10, init_size=None, n_init=3, reassignment_ratio=0.01)

fit(sequences, y=None)
Fit the clustering on the data.

Parameters:
- sequences : list of array-like, each of shape [sequence_length, n_features]
  A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.

Returns:
- self

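The relationship between the list-of-sequences input and the per-sequence labels_ output can be pictured with the sketch below. It assumes that frames from all sequences are pooled for clustering and that the flat labels are then cut back to the original sequence lengths. The helper names pool_sequences and split_labels are hypothetical, plain lists stand in for arrays, and the flat labels are a hard-coded stand-in for a clusterer's output.

```python
def pool_sequences(sequences):
    """Concatenate all sequences into one list of frames and record lengths."""
    lengths = [len(seq) for seq in sequences]
    pooled = [frame for seq in sequences for frame in seq]
    return pooled, lengths


def split_labels(flat_labels, lengths):
    """Cut a flat label list back into one label list per input sequence."""
    per_sequence, start = [], 0
    for n in lengths:
        per_sequence.append(flat_labels[start:start + n])
        start += n
    return per_sequence


# Two timeseries of different length, both with n_features = 2.
sequences = [
    [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1]],
    [[5.2, 4.9], [0.1, 0.1]],
]
pooled, lengths = pool_sequences(sequences)

# Stand-in for the labels a clusterer would assign to the 5 pooled frames.
flat_labels = [0, 0, 1, 1, 0]
labels = split_labels(flat_labels, lengths)
```

Each entry of labels has the same length as the corresponding input sequence, which matches the documented shape of labels_.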
fit_predict(sequences, y=None)
Perform clustering on sequences and return cluster labels.

Parameters:
- sequences : list of array-like, each of shape [sequence_length, n_features]
  A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.

Returns:
- Y : list of ndarray, each of shape [sequence_length, ]
  Cluster labels.

fit_transform(sequences, y=None)
Alias for fit_predict.

get_params(deep=True)
Get parameters for this estimator.

Parameters:
- deep : boolean, optional
  If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
- params : mapping of string to any
  Parameter names mapped to their values.

partial_fit(X, y=None)
Update the k-means estimate on a single mini-batch X.

Parameters:
- X : array-like, shape = [n_samples, n_features]
  Coordinates of the data points to cluster.

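As a rough picture of what a single mini-batch update does, the sketch below follows the per-center learning-rate update from the Sculley paper linked in the Notes: each sample pulls its nearest center toward itself with a step of 1 / (number of samples that center has absorbed). It is not msmbuilder's implementation. The function names are hypothetical, plain lists stand in for arrays, and refinements such as center reassignment are left out.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers, x):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda k: sq_dist(centers[k], x))


def minibatch_update(centers, counts, batch):
    """Update centers and per-center counts in place from one mini batch."""
    # Assign every sample first, using the centers as they were before the update.
    assignments = [nearest(centers, x) for x in batch]
    for x, k in zip(batch, assignments):
        counts[k] += 1
        eta = 1.0 / counts[k]  # per-center learning rate shrinks as counts grow
        centers[k] = [(1 - eta) * c + eta * xi for c, xi in zip(centers[k], x)]
    return centers, counts


centers = [[0.0, 0.0], [10.0, 10.0]]
counts = [1, 1]
batch = [[2.0, 0.0], [10.0, 12.0]]
centers, counts = minibatch_update(centers, counts, batch)
```

Because the step size is the inverse of the running count, each center ends up as the running mean of the samples assigned to it so far.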
partial_predict(X, y=None)
Predict the closest cluster each sample in X belongs to.

In the vector quantization literature, cluster_centers_ is called the code book, and each value returned by partial_predict is the index of the closest code in the code book.

Parameters:
- X : array-like, shape = (n_samples, n_features)
  A single timeseries.

Returns:
- Y : array, shape = (n_samples,)
  Index of the cluster that each sample belongs to.

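The code book lookup described above amounts to a nearest-center search. A minimal sketch, assuming squared Euclidean distance and using plain lists in place of arrays; the function name quantize and the example values are hypothetical:

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(code_book, X):
    """Return, for each sample in X, the index of its closest code."""
    return [
        min(range(len(code_book)), key=lambda k: sq_dist(code_book[k], x))
        for x in X
    ]


# code_book plays the role of cluster_centers_ (3 clusters, 2 features).
code_book = [[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]]

# A single timeseries with 3 frames.
X = [[0.2, -0.1], [4.8, 5.3], [0.1, 4.0]]
indices = quantize(code_book, X)
```

Every returned index lies in [0, n_clusters), matching the label range documented for labels_.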
partial_transform(X)
Alias for partial_predict.

predict(sequences, y=None)
Predict the closest cluster each sample in each sequence in sequences belongs to.

In the vector quantization literature, cluster_centers_ is called the code book, and each value returned by predict is the index of the closest code in the code book.

Parameters:
- sequences : list of array-like, each of shape [sequence_length, n_features]
  A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.

Returns:
- Y : list of arrays, each of shape [sequence_length, ]
  Index of the closest center each sample belongs to.

score(X, y=None)
Negative of the K-means objective evaluated on X.

Parameters:
- X : {array-like, sparse matrix}, shape = [n_samples, n_features]
  New data.

Returns:
- score : float
  Negative of the K-means objective evaluated on X (higher is better).

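To make the sign convention concrete, the sketch below computes the K-means objective (the inertia, i.e. the sum of squared distances from each sample to its nearest center) and negates it. It assumes squared Euclidean distance. The function names and data are hypothetical, and plain lists stand in for arrays.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inertia(centers, X):
    """Sum of squared distances from each sample to its nearest center."""
    return sum(min(sq_dist(c, x) for c in centers) for x in X)


def score_sketch(centers, X):
    """Negated inertia, so that a larger score means a tighter fit."""
    return -inertia(centers, X)


centers = [[0.0, 0.0], [4.0, 0.0]]
X = [[1.0, 0.0], [4.0, 2.0], [0.0, -1.0]]

objective = inertia(centers, X)    # 1 + 4 + 1
score = score_sketch(centers, X)
```

Negating the objective lets model-selection tools that maximize a score be applied without special handling.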
set_params(**params)
Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.

Returns:
- self

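The <component>__<parameter> naming can be illustrated with a toy class. ToyEstimator below is hypothetical and only mimics the routing of double-underscore keys to a nested object; it is not the estimator base class used by msmbuilder.

```python
class ToyEstimator:
    """Hypothetical stand-in that stores its parameters in a dict."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        for key, value in params.items():
            # Split "component__parameter" at the first double underscore.
            name, sep, rest = key.partition("__")
            if sep:
                # Delegate the remainder of the key to the nested component.
                self.params[name].set_params(**{rest: value})
            else:
                self.params[name] = value
        return self  # returning self allows chained calls


clusterer = ToyEstimator(n_clusters=8, batch_size=100)
pipeline = ToyEstimator(cluster=clusterer, verbose=0)

result = pipeline.set_params(cluster__n_clusters=50, verbose=1)
```

Here cluster__n_clusters reaches the nested clusterer, while verbose is set on the outer object.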
summarize()
Return some diagnostic summary statistics about this model.

transform(sequences)
Alias for predict.