K-Means clustering
| Parameters: | n_clusters : int, optional, default: 8 |
|---|---|
| | max_iter : int, default: 300 |
| | n_init : int, default: 10 |
| | init : {'k-means++', 'random', or an ndarray} |
| | precompute_distances : boolean, default: True |
| | tol : float, default: 1e-4 |
| | n_jobs : int, default: 1 |
| | random_state : int or numpy.RandomState, optional |
Notes
The k-means problem is solved using Lloyd’s algorithm.
The average complexity is O(k n T), where k is the number of clusters, n is the number of samples, and T is the number of iterations.
The worst-case complexity is O(n^(k+2/p)) with n = n_samples, p = n_features (D. Arthur and S. Vassilvitskii, 'How slow is the k-means method?' SoCG 2006).
In practice, the k-means algorithm is very fast (one of the fastest clustering algorithms available), but it can converge to a local minimum. That is why it can be useful to restart it several times, as controlled by n_init.
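As a concrete illustration of the notes above, here is a minimal NumPy sketch of Lloyd's algorithm, alternating assignment and update steps until the centers move less than tol. This is an illustration only, not this library's implementation; the function name and the 'random'-style initialization are assumptions.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=300, tol=1e-4, seed=0):
    """Minimal Lloyd's algorithm sketch (illustration, not the library code)."""
    rng = np.random.default_rng(seed)
    # 'random'-style init: pick k distinct samples as the starting centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # assignment step: label each sample with its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: move each center to the mean of its assigned samples
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    # inertia: sum of squared distances of samples to their closest center
    inertia = ((X - centers[labels]) ** 2).sum()
    return centers, labels, inertia
```

Because a single run can stop in a local minimum, the library's n_init parameter repeats this procedure from different initializations and keeps the run with the lowest inertia.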
Attributes
| cluster_centers_ | (array, [n_clusters, n_features]) Coordinates of the cluster centers. |
|---|---|
| labels_ | (list of arrays, each of shape [sequence_length, ]) The label of each point, an integer in [0, n_clusters). |
| inertia_ | (float) Sum of squared distances of samples to their closest cluster center. |
Methods
| fit(sequences[, y]) | Fit the clustering on the data. |
|---|---|
| fit_predict(sequences[, y]) | Perform clustering on the data and return cluster labels. |
| fit_transform(sequences[, y]) | Alias for fit_predict. |
| get_params([deep]) | Get parameters for this estimator. |
| partial_predict(X[, y]) | Predict the closest cluster each sample in X belongs to. |
| partial_transform(X) | Alias for partial_predict. |
| predict(sequences[, y]) | Predict the closest cluster each sample in each sequence in sequences belongs to. |
| score(X) | Opposite of the value of X on the K-means objective. |
| set_params(**params) | Set the parameters of this estimator. |
| summarize() | Return diagnostic summary statistics about this model. |
| transform(sequences) | Alias for predict. |
fit(sequences[, y])
Fit the clustering on the data.
| Parameters: | sequences : list of array-like, each of shape [sequence_length, n_features] |
|---|---|
| Returns: | self |
fit_predict(sequences[, y])
Perform clustering on the data and return cluster labels.
| Parameters: | sequences : list of array-like, each of shape [sequence_length, n_features] |
|---|---|
| Returns: | Y : list of ndarray, each of shape [sequence_length, ]. Cluster label of each sample. |
fit_transform(sequences[, y])
Alias for fit_predict.
get_params([deep])
Get parameters for this estimator.
| Parameters: | deep : boolean, optional. If True, return the parameters for this estimator and for contained subobjects that are estimators. |
|---|---|
| Returns: | params : mapping of string to any. Parameter names mapped to their values. |
partial_predict(X[, y])
Predict the closest cluster each sample in X belongs to.
In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.
| Parameters: | X : array-like, shape=(n_samples, n_features) |
|---|---|
| Returns: | Y : array, shape=(n_samples,). Index of the closest cluster center for each sample. |
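The code-book lookup described above amounts to a nearest-centroid search, which can be sketched directly with NumPy. Here cluster_centers_ is a made-up code book standing in for the fitted attribute, not one produced by this estimator:

```python
import numpy as np

# hypothetical fitted code book: 3 cluster centers in a 2-D feature space
cluster_centers_ = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])

def nearest_code(X, centers):
    """Return the index of the closest code-book entry for each row of X."""
    # pairwise Euclidean distances, shape (n_samples, n_clusters)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

X = np.array([[0.1, -0.2], [4.8, 5.1], [9.9, 0.3]])
labels = nearest_code(X, cluster_centers_)  # one code index per sample
```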
partial_transform(X)
Alias for partial_predict.
predict(sequences[, y])
Predict the closest cluster each sample in each sequence in sequences belongs to.
In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.
| Parameters: | sequences : list of array-like, each of shape [sequence_length, n_features] |
|---|---|
| Returns: | Y : list of arrays, each of shape [sequence_length, ]. Index of the closest cluster center for each sample. |
score(X)
Opposite of the value of X on the K-means objective.
| Parameters: | X : {array-like, sparse matrix}, shape = [n_samples, n_features] |
|---|---|
| Returns: | score : float |
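The relationship between score and the objective can be made concrete: the K-means objective on X is the inertia (sum of squared distances to the closest center), and score returns its opposite, so higher is better. A small NumPy sketch, with made-up centers and data for illustration:

```python
import numpy as np

# hypothetical fitted centers and new data (illustration only)
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
X = np.array([[1.0, 0.0], [9.0, 10.0]])

# K-means objective (inertia): sum of squared distances to the closest center
d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
inertia = d2.min(axis=1).sum()
score = -inertia  # score() is the objective's opposite: higher is better
```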
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
| Returns: | self |
|---|---|
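The <component>__<parameter> routing convention can be illustrated with a toy nested object. These classes are made up for illustration and are not part of the library:

```python
class ToyEstimator:
    """Minimal estimator with get_params/set_params (illustration only)."""
    def __init__(self, n_clusters=8, tol=1e-4):
        self.n_clusters = n_clusters
        self.tol = tol

    def get_params(self, deep=True):
        return {"n_clusters": self.n_clusters, "tol": self.tol}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

class ToyPipeline:
    """Nested object that routes <component>__<parameter> keys."""
    def __init__(self, cluster):
        self.cluster = cluster

    def set_params(self, **params):
        for key, value in params.items():
            # split "cluster__n_clusters" into component and parameter name
            component, _, name = key.partition("__")
            # delegate to the named component's own set_params
            getattr(self, component).set_params(**{name: value})
        return self

pipe = ToyPipeline(cluster=ToyEstimator())
pipe.set_params(cluster__n_clusters=4)  # updates the nested estimator
```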
summarize()
Return diagnostic summary statistics about this model.
transform(sequences)
Alias for predict.