Mean shift clustering using a flat kernel.
Mean shift clustering aims to discover “blobs” in a smooth density of samples. It is a centroid-based algorithm, which works by updating candidates for centroids to be the mean of the points within a given region. These candidates are then filtered in a post-processing stage to eliminate near-duplicates, forming the final set of centroids.
Seeding is performed using a binning technique for scalability.
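The procedure described above can be illustrated with a short NumPy sketch. It is not MSMBuilder's or scikit-learn's implementation: the function `flat_mean_shift` and its `max_iter` and `tol` arguments are hypothetical, every sample is used as a seed (no binning), and neighbours are found by brute-force distance computation rather than a Ball Tree.

```python
import numpy as np


def flat_mean_shift(X, bandwidth, max_iter=300, tol=1e-3):
    """Minimal flat-kernel mean shift sketch (hypothetical helper).

    Every sample is used as a seed; each seed is repeatedly moved to the
    mean of the samples within `bandwidth` of it until it stops moving.
    """
    X = np.asarray(X, dtype=float)
    modes = []
    for seed in X:
        center = seed.copy()
        for _ in range(max_iter):
            # Flat kernel: every point inside the window gets equal weight.
            in_window = np.linalg.norm(X - center, axis=1) < bandwidth
            new_center = X[in_window].mean(axis=0)
            shift = np.linalg.norm(new_center - center)
            center = new_center
            if shift < tol * bandwidth:
                break
        # Remember how many samples supported this mode.
        modes.append((center, int(in_window.sum())))

    # Post-processing: visit modes from most to least populated and drop
    # near-duplicates lying within one bandwidth of an accepted centroid.
    modes.sort(key=lambda m: m[1], reverse=True)
    centers = []
    for center, _ in modes:
        if all(np.linalg.norm(center - c) >= bandwidth for c in centers):
            centers.append(center)
    centers = np.array(centers)

    # Label every sample with the index of its nearest surviving centroid.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    return centers, labels
```

With two well-separated blobs and a bandwidth larger than either blob's spread, every seed should collapse onto one of two modes, and the duplicate-removal pass keeps one centroid per blob.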
| Parameters: | bandwidth : float, optional |
|---|---|
| | seeds : array, shape=[n_samples, n_features], optional |
| | bin_seeding : boolean, optional |
| | min_bin_freq : int, optional |
| | cluster_all : boolean, default True |
Notes
Scalability:
Because this implementation uses a flat kernel and a Ball Tree to look up members of each kernel, the complexity tends towards O(T*n*log(n)) in lower dimensions, with n the number of samples and T the number of points. In higher dimensions the complexity tends towards O(T*n^2).
Scalability can be boosted by using fewer seeds, for example by using a higher value of min_bin_freq in the get_bin_seeds function.
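Binned seeding, and the way a larger min_bin_freq shrinks the seed set, can be sketched as follows. `bin_seeds` is a hypothetical helper modeled loosely on the idea behind get_bin_seeds (snap samples to a grid, keep well-populated grid nodes as seeds); it does not reproduce that function's exact behaviour.

```python
from collections import Counter

import numpy as np


def bin_seeds(X, bin_size, min_bin_freq=1):
    """Grid-based seeding sketch (hypothetical helper).

    Each sample is snapped to the nearest node of a grid with spacing
    `bin_size`; nodes holding at least `min_bin_freq` samples become seeds.
    """
    X = np.asarray(X, dtype=float)
    # Count how many samples fall onto each grid node.
    counts = Counter(tuple(np.round(x / bin_size)) for x in X)
    seeds = [node for node, freq in counts.items() if freq >= min_bin_freq]
    # Scale grid indices back to data coordinates; keep a 2-D shape even if empty.
    return np.array(seeds, dtype=float).reshape(-1, X.shape[1]) * bin_size
```

Raising min_bin_freq discards sparsely populated bins, so fewer seeds enter the mean shift iterations.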
Note that the estimate_bandwidth function is much less scalable than the mean shift algorithm and will be the bottleneck if it is used.
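To see why bandwidth estimation can dominate the run time, consider a naive nearest-neighbour heuristic that averages each sample's distance to its k-th nearest neighbour. The sketch below (`naive_bandwidth`, with an assumed `quantile` argument) illustrates the cost only and is not the estimate_bandwidth implementation; it builds the full n × n distance matrix, so time and memory grow quadratically with the number of samples.

```python
import numpy as np


def naive_bandwidth(X, quantile=0.3):
    """Nearest-neighbour bandwidth heuristic (hypothetical sketch).

    Averages, over all samples, the distance to the k-th nearest sample,
    with k = quantile * n_samples and each sample counted as its own
    first neighbour.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    k = max(1, int(n * quantile))
    # Full n x n matrix of pairwise Euclidean distances: the quadratic step.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # After sorting each row, column 0 is the zero self-distance.
    kth = np.sort(dists, axis=1)[:, k - 1]
    return float(kth.mean())
```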
References
Dorin Comaniciu and Peter Meer, “Mean Shift: A robust approach toward feature space analysis”. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002. pp. 603-619.
Attributes
| Attribute | Description |
|---|---|
| cluster_centers_ | (array, [n_clusters, n_features]) Coordinates of cluster centers. |
| labels_ | (list of arrays, each of shape [sequence_length, ]) The label of each point is an integer in [0, n_clusters). |
Methods
| Method | Description |
|---|---|
| __init__([bandwidth, seeds, bin_seeding, ...]) | |
| fit(sequences[, y]) | Fit the clustering on the data. |
| fit_predict(sequences[, y]) | Performs clustering on sequences and returns cluster labels. |
| fit_transform(sequences[, y]) | Alias for fit_predict. |
| get_params([deep]) | Get parameters for this estimator. |
| partial_predict(X[, y]) | Predict the closest cluster each sample in X belongs to. |
| partial_transform(X) | Alias for partial_predict. |
| predict(sequences[, y]) | Predict the closest cluster each sample in each sequence in sequences belongs to. |
| set_params(**params) | Set the parameters of this estimator. |
| summarize() | Return some diagnostic summary statistics about this clustering model. |
| transform(sequences) | Alias for predict. |
fit(sequences[, y])
Fit the clustering on the data.
| Parameters: | sequences : list of array-like, each of shape [sequence_length, n_features] |
|---|---|
| Returns: | self |
fit_predict(sequences[, y])
Performs clustering on sequences and returns cluster labels.
| Parameters: | sequences : list of array-like, each of shape [sequence_length, n_features] |
|---|---|
| Returns: | Y : list of ndarray, each of shape [sequence_length, ] |
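Because fit_predict accepts a list of sequences and returns one label array per sequence, a wrapper has to cluster the pooled samples and then cut the labels back at the sequence boundaries. A minimal sketch, assuming the clusterer sees np.concatenate(sequences) in the original order; `fit_predict_sequences` and the `cluster_fn` callable are hypothetical stand-ins, not MSMBuilder's internals.

```python
import numpy as np


def fit_predict_sequences(sequences, cluster_fn):
    """Cluster several sequences jointly; return one label array per sequence.

    `cluster_fn` is a hypothetical stand-in for any routine that maps an
    (n_samples, n_features) array to n_samples integer labels.
    """
    lengths = [len(seq) for seq in sequences]
    # Pool all samples so the clustering sees every frame at once.
    flat_labels = np.asarray(cluster_fn(np.concatenate(sequences)))
    # Split points are the cumulative lengths, excluding the final total.
    return np.split(flat_labels, np.cumsum(lengths)[:-1])
```

For example, with a threshold rule as the stand-in clusterer, a 3-frame and a 2-frame sequence come back as label arrays of length 3 and 2.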
fit_transform(sequences[, y])
Alias for fit_predict.
get_params([deep])
Get parameters for this estimator.
| Parameters: | deep : boolean, optional |
|---|---|
| Returns: | params : mapping of string to any |
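With deep set, parameters of nested estimators are reported as well, under keys of the form <component>__<parameter>. A toy sketch of that behaviour: the `Component` class and the standalone `get_params` function are hypothetical and store parameters in a plain `params` dict, whereas scikit-learn discovers parameters from each estimator's `__init__` signature.

```python
class Component:
    """Hypothetical estimator stand-in that keeps its parameters in a dict."""

    def __init__(self, **params):
        self.params = params


def get_params(obj, deep=True):
    """Collect parameters, flattening nested objects when deep=True."""
    out = {}
    for name, value in obj.params.items():
        out[name] = value
        # Nested objects contribute '<component>__<parameter>' entries.
        if deep and hasattr(value, "params"):
            for sub_name, sub_value in get_params(value, deep=True).items():
                out[f"{name}__{sub_name}"] = sub_value
    return out
```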
partial_predict(X[, y])
Predict the closest cluster each sample in X belongs to.
In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.
| Parameters: | X : array-like, shape=(n_samples, n_features) |
|---|---|
| Returns: | Y : array, shape=(n_samples,) |
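The code-book lookup described above amounts to an arg-min over distances. A sketch assuming Euclidean distance; `nearest_center` is a hypothetical name, and its `cluster_centers` argument stands in for the fitted cluster_centers_.

```python
import numpy as np


def nearest_center(X, cluster_centers):
    """Return, for each sample, the index of the closest code-book entry."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(cluster_centers, dtype=float)
    # (n_samples, n_clusters) matrix of squared Euclidean distances.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

Applying the same rule to each array in a list of sequences gives the per-sequence output described for predict.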
partial_transform(X)
Alias for partial_predict.
predict(sequences[, y])
Predict the closest cluster each sample in each sequence in sequences belongs to.
In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.
| Parameters: | sequences : list of array-like, each of shape [sequence_length, n_features] |
|---|---|
| Returns: | Y : list of arrays, each of shape [sequence_length,] |
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
| Returns: | self |
|---|---|
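Routing <component>__<parameter> keys to nested objects can be sketched in a few lines. The `Component` class and the standalone `set_params` function are hypothetical and keep parameters in a plain `params` dict; returning the object mirrors the "Returns: self" contract.

```python
class Component:
    """Hypothetical estimator stand-in that keeps its parameters in a dict."""

    def __init__(self, **params):
        self.params = params


def set_params(obj, **params):
    """Update obj; keys containing '__' are forwarded to nested objects."""
    nested = {}
    for key, value in params.items():
        component, sep, sub_key = key.partition("__")
        if sep:
            # Group nested updates by the component they target.
            nested.setdefault(component, {})[sub_key] = value
        elif key in obj.params:
            obj.params[key] = value
        else:
            raise ValueError(f"Invalid parameter {key!r}")
    for component, sub_params in nested.items():
        # Recurse so deeper keys such as 'a__b__c' are handled too.
        set_params(obj.params[component], **sub_params)
    return obj
```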
summarize()
Return some diagnostic summary statistics about this clustering model.
transform(sequences)
Alias for predict.