msmbuilder.cluster.SpectralClustering

class msmbuilder.cluster.SpectralClustering(n_clusters=8, eigen_solver=None, random_state=None, n_init=10, gamma=1.0, affinity='rbf', n_neighbors=10, eigen_tol=0.0, assign_labels='kmeans', degree=3, coef0=1, kernel_params=None)
Apply clustering to a projection onto the normalized Laplacian.

In practice, spectral clustering is very useful when the structure of the individual clusters is highly non-convex or, more generally, when a measure of the center and spread of a cluster is not a suitable description of the complete cluster, for instance when the clusters are nested circles in the 2D plane.

If affinity is the adjacency matrix of a graph, this method can be used to find normalized graph cuts.

When calling fit, an affinity matrix is constructed using either a kernel function, such as the Gaussian (aka RBF) kernel of the Euclidean distance d(X, X):

    np.exp(-gamma * d(X, X) ** 2)

or a k-nearest neighbors connectivity matrix.

Alternatively, using precomputed, a user-provided affinity matrix can be used.

Read more in the User Guide.

Parameters:

n_clusters : integer, optional
    The dimension of the projection subspace.
affinity : string, array-like or callable, default ‘rbf’
    If a string, this may be one of ‘nearest_neighbors’, ‘precomputed’, ‘rbf’ or one of the kernels supported by sklearn.metrics.pairwise_kernels.
    Only kernels that produce similarity scores (non-negative values that increase with similarity) should be used. This property is not checked by the clustering algorithm.
gamma : float
    Scaling factor of the RBF, polynomial, exponential chi^2 and sigmoid affinity kernels. Ignored for affinity='nearest_neighbors'.
degree : float, default=3
    Degree of the polynomial kernel. Ignored by other kernels.
coef0 : float, default=1
    Zero coefficient for polynomial and sigmoid kernels. Ignored by other kernels.
n_neighbors : integer
    Number of neighbors to use when constructing the affinity matrix using the nearest neighbors method. Ignored for affinity='rbf'.
eigen_solver : {None, ‘arpack’, ‘lobpcg’, or ‘amg’}
    The eigenvalue decomposition strategy to use. AMG requires pyamg to be installed. It can be faster on very large, sparse problems, but may also lead to instabilities.
random_state : int seed, RandomState instance, or None (default)
    A pseudo-random number generator used to initialize the lobpcg eigenvector decomposition when eigen_solver == ‘amg’, and for the K-Means initialization.
n_init : int, optional, default: 10
    Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.
eigen_tol : float, optional, default: 0.0
    Stopping criterion for the eigendecomposition of the Laplacian matrix when using the arpack eigen_solver.
assign_labels : {‘kmeans’, ‘discretize’}, default: ‘kmeans’
    The strategy used to assign labels in the embedding space. There are two ways to assign labels after the Laplacian embedding. k-means can be applied and is a popular choice, but it can be sensitive to initialization. Discretization is another approach, which is less sensitive to random initialization.
kernel_params : dictionary of string to any, optional
    Parameters (keyword arguments) and values for a kernel passed as a callable object. Ignored by other kernels.

Notes

If you have a dissimilarity matrix, such as a distance matrix, for which 0 means identical elements and high values mean very dissimilar elements, it can be transformed into a similarity matrix that is well suited for the algorithm by applying the Gaussian (RBF, heat) kernel:

    np.exp(- X ** 2 / (2. * delta ** 2))

Another alternative is to take a symmetric version of the k-nearest neighbors connectivity matrix of the points.

If the pyamg package is installed, it is used: this greatly speeds up computation.

References

- Normalized cuts and image segmentation, 2000. Jianbo Shi, Jitendra Malik. http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.160.2324
- A Tutorial on Spectral Clustering, 2007. Ulrike von Luxburg. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.165.9323
- Multiclass spectral clustering, 2003. Stella X. Yu, Jianbo Shi. http://www1.icsi.berkeley.edu/~stellayu/publication/doc/2003kwayICCV.pdf
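The kernel transform described in the Notes can be tried out directly with NumPy. The following is a minimal sketch, not part of msmbuilder: the helper name `distances_to_affinity` and the three sample points are illustrative assumptions, and `delta` plays the role of the kernel width from the formula above.

```python
import numpy as np


def distances_to_affinity(dist, delta=1.0):
    """Map a distance matrix to similarities with the Gaussian (RBF, heat) kernel.

    Hypothetical helper for illustration; it is not an msmbuilder function.
    """
    dist = np.asarray(dist, dtype=float)
    return np.exp(-dist ** 2 / (2.0 * delta ** 2))


# Three illustrative points on a line at x = 0, 1 and 3.
points = np.array([[0.0], [1.0], [3.0]])

# Pairwise Euclidean distances via broadcasting: shape (3, 3).
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# Distance 0 maps to similarity 1; larger distances decay toward 0.
affinity = distances_to_affinity(dist, delta=1.0)
```

Comparing this expression with `np.exp(-gamma * d(X, X) ** 2)` shows that the two forms coincide when `gamma = 1 / (2 * delta ** 2)`. A matrix built this way is the kind of user-provided input that the `precomputed` affinity option is described as accepting.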

Attributes

affinity_matrix_ : array-like, shape (n_samples, n_samples)
    Affinity matrix used for clustering. Available only after calling fit.
labels_ : list of arrays, each of shape [sequence_length, ]
    The label of each point is an integer in [0, n_clusters).

Methods

fit(sequences[, y])            Fit the clustering on the data.
fit_predict(sequences[, y])    Perform clustering on sequences and return cluster labels.
fit_transform(sequences[, y])  Alias for fit_predict.
get_params([deep])             Get parameters for this estimator.
partial_predict(X[, y])        Predict the closest cluster each sample in X belongs to.
partial_transform(X)           Alias for partial_predict.
predict(sequences[, y])        Predict the closest cluster each sample in each sequence in sequences belongs to.
set_params(**params)           Set the parameters of this estimator.
summarize()                    Return some diagnostic summary statistics about this model.
transform(sequences)           Alias for predict.

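To make the phrase "projection onto the normalized Laplacian" concrete, here is a small sketch of how an affinity matrix such as `affinity_matrix_` could be turned into a low-dimensional embedding. It is an illustration under stated assumptions rather than the code path msmbuilder or scikit-learn actually runs: the function name `spectral_embedding_sketch` is hypothetical, a dense eigendecomposition is used instead of arpack, lobpcg or amg, and every sample is assumed to have non-zero degree.

```python
import numpy as np


def spectral_embedding_sketch(affinity, n_components):
    """Embed samples with eigenvectors of the symmetric normalized Laplacian.

    Hypothetical helper; assumes a symmetric affinity with non-zero row sums.
    """
    affinity = np.asarray(affinity, dtype=float)
    degree = affinity.sum(axis=1)
    inv_sqrt_deg = 1.0 / np.sqrt(degree)
    # L_sym = I - D^{-1/2} A D^{-1/2}
    normalized = inv_sqrt_deg[:, None] * affinity * inv_sqrt_deg[None, :]
    laplacian = np.eye(len(affinity)) - normalized
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    # Keep the eigenvectors belonging to the smallest eigenvalues.
    return eigenvalues[:n_components], eigenvectors[:, :n_components]


# Illustrative affinity: two groups of two samples with no cross-group affinity.
affinity = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
eigenvalues, embedding = spectral_embedding_sketch(affinity, n_components=2)
```

In this toy case, samples in the same group get identical rows in `embedding`, while rows from different groups differ. That separation is what lets a simple label assignment step such as k-means succeed in the embedded space.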
__init__(n_clusters=8, eigen_solver=None, random_state=None, n_init=10, gamma=1.0, affinity='rbf', n_neighbors=10, eigen_tol=0.0, assign_labels='kmeans', degree=3, coef0=1, kernel_params=None)
fit(sequences, y=None)
    Fit the clustering on the data.

    Parameters:
        sequences : list of array-like, each of shape [sequence_length, n_features]
            A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.

    Returns:
        self

fit_predict(sequences, y=None)
    Perform clustering on sequences and return cluster labels.

    Parameters:
        sequences : list of array-like, each of shape [sequence_length, n_features]
            A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.

    Returns:
        Y : list of ndarray, each of shape [sequence_length, ]
            Cluster labels, one array per input sequence.
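The list-of-sequences input and list-of-arrays output can be pictured as a concatenate-then-split round trip. The sketch below assumes that labels are computed on the stacked samples and then cut back to the original sequence lengths; whether msmbuilder does exactly this internally is an assumption here. The helper `split_labels` and the stand-in label values are hypothetical.

```python
import numpy as np


def split_labels(flat_labels, sequences):
    """Cut a flat label array back into one array per input sequence.

    Hypothetical helper for illustration; not an msmbuilder function.
    """
    lengths = [len(seq) for seq in sequences]
    # Split points are the cumulative lengths, excluding the final total.
    boundaries = np.cumsum(lengths)[:-1]
    return np.split(np.asarray(flat_labels), boundaries)


# Two timeseries with different lengths but the same number of features.
sequences = [np.zeros((3, 2)), np.ones((5, 2))]

# Stack all samples into one (8, 2) array, as a clusterer would see them.
stacked = np.concatenate(sequences)

# Stand-in labels; a real clusterer would compute these from `stacked`.
flat_labels = np.array([0, 0, 0, 1, 1, 1, 1, 1])

labels = split_labels(flat_labels, sequences)
```

Each entry of `labels` then lines up sample by sample with the corresponding entry of `sequences`, which matches the documented shape of `Y`.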

fit_transform(sequences, y=None)
    Alias for fit_predict.

get_params(deep=True)
    Get parameters for this estimator.

    Parameters:
        deep : boolean, optional
            If True, will return the parameters for this estimator and contained subobjects that are estimators.

    Returns:
        params : mapping of string to any
            Parameter names mapped to their values.

partial_predict(X, y=None)
    Predict the closest cluster each sample in X belongs to.

    In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.

    Parameters:
        X : array-like, shape=(n_samples, n_features)
            A single timeseries.

    Returns:
        Y : array, shape=(n_samples,)
            Index of the cluster that each sample belongs to.
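The code-book language above describes a nearest-center lookup. The following sketch shows that lookup in isolation with NumPy. The function `nearest_code`, the two-entry code book and the sample points are all hypothetical; nothing here claims to reproduce how this class assigns labels.

```python
import numpy as np


def nearest_code(X, code_book):
    """Return, for each sample, the index of the closest code book entry.

    Hypothetical helper illustrating vector quantization.
    """
    X = np.asarray(X, dtype=float)
    code_book = np.asarray(code_book, dtype=float)
    # Squared Euclidean distance from every sample to every code:
    # shape (n_samples, n_codes).
    sq_dist = ((X[:, None, :] - code_book[None, :, :]) ** 2).sum(axis=-1)
    return sq_dist.argmin(axis=1)


# Hypothetical code book with two centers in 2D.
code_book = np.array([[0.0, 0.0], [10.0, 10.0]])

# A single timeseries of three samples.
X = np.array([[1.0, -1.0], [9.0, 8.0], [0.5, 0.2]])

labels = nearest_code(X, code_book)
```

The returned array has shape `(n_samples,)`, the same shape documented for `Y`.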

partial_transform(X)
    Alias for partial_predict.

predict(sequences, y=None)
    Predict the closest cluster each sample in each sequence in sequences belongs to.

    In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.

    Parameters:
        sequences : list of array-like, each of shape [sequence_length, n_features]
            A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.

    Returns:
        Y : list of arrays, each of shape [sequence_length, ]
            Index of the closest center each sample belongs to.

set_params(**params)
    Set the parameters of this estimator.

    The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.

    Returns:
        self
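The `<component>__<parameter>` naming can be illustrated without any estimator at all. The sketch below only shows how such names separate into a component and a parameter. The helpers `split_param_name` and `group_params` and the component name `cluster` are hypothetical, and the real `set_params` does more than this (for example, validating names).

```python
def split_param_name(name):
    """Split 'component__parameter' into (component, parameter).

    Names without the double-underscore delimiter belong to the
    estimator itself and are returned as (None, name).
    """
    component, sep, parameter = name.partition("__")
    if not sep:
        return None, name
    return component, parameter


def group_params(params):
    """Separate an estimator's own parameters from nested-component ones."""
    own, nested = {}, {}
    for name, value in params.items():
        component, parameter = split_param_name(name)
        if component is None:
            own[name] = value
        else:
            nested.setdefault(component, {})[parameter] = value
    return own, nested


# 'cluster' is a hypothetical component name, e.g. a pipeline step.
own, nested = group_params(
    {"n_clusters": 4, "cluster__gamma": 0.5, "cluster__n_init": 20}
)
```

Here `own` holds the plain parameter and `nested` maps the component name to the parameters that would be forwarded to it.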

summarize()
    Return some diagnostic summary statistics about this model.

transform(sequences)
    Alias for predict.