msmbuilder.cluster.MiniBatchKMeans

class msmbuilder.cluster.MiniBatchKMeans(n_clusters=8, init='k-means++', max_iter=100, batch_size=100, verbose=0, compute_labels=True, random_state=None, tol=0.0, max_no_improvement=10, init_size=None, n_init=3, reassignment_ratio=0.01)

Mini-Batch K-Means clustering

See also

KMeans: The classic implementation of the clustering method based on Lloyd’s algorithm. It consumes the whole set of input data at each iteration.

Notes

See D. Sculley, "Web-Scale K-Means Clustering" (WWW 2010): http://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf
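The mini-batch update rule from the paper cited above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the msmbuilder or scikit-learn implementation: `minibatch_kmeans` is a hypothetical helper, its default initialization is uniform random sampling rather than k-means++, and the early-stopping and reassignment controls (`tol`, `max_no_improvement`, `reassignment_ratio`) are omitted. Each center moves toward the samples assigned to it with a per-center learning rate of 1/(samples assigned so far), so it tracks the running mean of everything it has been assigned.

```python
import numpy as np


def minibatch_kmeans(X, n_clusters, batch_size=100, max_iter=100, init=None, seed=0):
    """Illustrative sketch of Sculley-style mini-batch k-means (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    if init is None:
        # Assumption: plain random initialization from k distinct samples
        # (the estimator documented here defaults to init='k-means++').
        centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    else:
        centers = np.array(init, dtype=float)
    # Per-center count of samples assigned so far; it sets the learning rate.
    counts = np.zeros(len(centers))
    b = min(batch_size, len(X))
    for _ in range(max_iter):
        # Draw a random mini-batch of b distinct samples.
        batch = X[rng.choice(len(X), size=b, replace=False)]
        # First cache the nearest center for every sample in the batch.
        d2 = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        nearest = d2.argmin(axis=1)
        # Then move each assigned center toward its sample with a
        # per-center learning rate eta = 1 / (samples assigned so far).
        for x, c in zip(batch, nearest):
            counts[c] += 1
            eta = 1.0 / counts[c]
            centers[c] = (1.0 - eta) * centers[c] + eta * x
    return centers
```

Because each iteration only touches `batch_size` samples, the cost per iteration does not grow with the dataset size, which is the trade-off against KMeans described under "See also".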

Attributes

cluster_centers_	(array, [n_clusters, n_features]) Coordinates of cluster centers
labels_	(list of arrays, each of shape [sequence_length, ]) The label of each point is an integer in [0, n_clusters).
inertia_	(float) The value of the inertia criterion associated with the chosen partition (if compute_labels is set to True). The inertia is defined as the sum of squared distances of samples to their nearest cluster center.
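To make the inertia definition concrete, here is a small NumPy sketch that computes it for a single array of samples and a given set of centers. `inertia` is a hypothetical helper written for illustration; it assumes squared Euclidean distance, which is the quantity k-means minimizes.

```python
import numpy as np


def inertia(X, centers):
    """Sum of squared distances from each sample to its nearest center."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise squared Euclidean distances, shape (n_samples, n_clusters).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    # Keep only the distance to the nearest center, then sum over samples.
    return float(d2.min(axis=1).sum())
```

For new data, `score` reports the negative of this quantity, so a higher score means a tighter fit.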

Methods

`fit`(sequences[, y])	Fit the clustering on the data
`fit_predict`(sequences[, y])	Perform clustering on sequences and return cluster labels.
`fit_transform`(sequences[, y])	Alias for fit_predict
`get_params`([deep])	Get parameters for this estimator.
`partial_fit`(X[, y])	Update the k-means estimate on a single mini-batch X.
`partial_predict`(X[, y])	Predict the closest cluster each sample in X belongs to.
`partial_transform`(X)	Alias for partial_predict
`predict`(sequences[, y])	Predict the closest cluster for each sample in each sequence in sequences.
`score`(X[, y])	Negative of the K-means objective (the inertia) evaluated on X.
`set_params`(**params)	Set the parameters of this estimator.
`summarize`()	Return some diagnostic summary statistics about this model
`transform`(sequences)	Alias for predict

__init__(n_clusters=8, init='k-means++', max_iter=100, batch_size=100, verbose=0, compute_labels=True, random_state=None, tol=0.0, max_no_improvement=10, init_size=None, n_init=3, reassignment_ratio=0.01)

fit(sequences, y=None)

Fit the clustering on the data.

Parameters:

sequences : list of array-like, each of shape [sequence_length, n_features]

A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.

Returns:

self
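Because the estimator accepts a list of sequences rather than one array, it helps to see how per-sample results map back onto sequences. The sketch below assumes the sequences are stacked with `np.concatenate` and per-sample labels are split back by sequence length; `split_like` is a hypothetical helper, and the "labels" here are a stand-in, not real cluster assignments.

```python
import numpy as np


def split_like(flat, sequences):
    """Split a flat per-sample array back into per-sequence pieces."""
    lengths = [len(s) for s in sequences]
    # np.split takes boundary indices: the cumulative lengths minus the total.
    return np.split(np.asarray(flat), np.cumsum(lengths)[:-1])


# Three sequences of different lengths, all with n_features = 2.
sequences = [np.zeros((3, 2)), np.ones((5, 2)), np.full((2, 2), 5.0)]
# Stack into one (n_samples, n_features) array for a single-array clusterer.
X = np.concatenate(sequences)
# Stand-in for per-sample cluster labels (here: the sum of each row).
flat_labels = X.sum(axis=1).astype(int)
labels = split_like(flat_labels, sequences)
```

The resulting list-of-arrays layout has the same shape as `labels_` and the return values of `fit_predict` and `predict`.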

fit_predict(sequences, y=None)

Perform clustering on sequences and return cluster labels.

Parameters:

sequences : list of array-like, each of shape [sequence_length, n_features]

A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.

Returns:

Y : list of ndarray, each of shape [sequence_length, ]

Cluster labels

fit_transform(sequences, y=None)

Alias for fit_predict.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.

partial_fit(X, y=None)

Update the k-means estimate on a single mini-batch X.

Parameters:

X : array-like, shape = [n_samples, n_features]

Coordinates of the data points to cluster.

partial_predict(X, y=None)

Predict the closest cluster each sample in X belongs to.

In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.

Parameters:

X : array-like, shape=(n_samples, n_features)

A single timeseries.

Returns:

Y : array, shape=(n_samples,)

Index of the cluster that each sample belongs to
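The code book view described above can be illustrated with a short NumPy sketch: each sample is mapped to the index of its nearest center, and indexing the code book with those indices gives the quantized (reconstructed) samples. `quantize` is a hypothetical helper; it assumes Euclidean distance and a code book shaped like `cluster_centers_`, i.e. `[n_clusters, n_features]`.

```python
import numpy as np


def quantize(X, code_book):
    """Return (indices, reconstruction) for vector quantization of X."""
    X = np.asarray(X, dtype=float)
    code_book = np.asarray(code_book, dtype=float)
    # Squared distance from every sample to every code, shape (n_samples, n_codes).
    d2 = ((X[:, None, :] - code_book[None, :, :]) ** 2).sum(axis=-1)
    # Index of the closest code (cluster center) for each sample.
    indices = d2.argmin(axis=1)
    # Replacing each sample by its code vector gives the quantized data.
    return indices, code_book[indices]
```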

partial_transform(X)

Alias for partial_predict.

predict(sequences, y=None)

Predict the closest cluster for each sample in each sequence in sequences.

In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.

Parameters:

sequences : list of array-like, each of shape [sequence_length, n_features]

A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.

Returns:

Y : list of arrays, each of shape [sequence_length,]

Index of the closest center each sample belongs to.

score(X, y=None)

Negative of the K-means objective (the inertia) evaluated on X.

Parameters:

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

New data.

Returns:

score : float

Negative of the K-means objective (the inertia) evaluated on X.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:

self
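The `<component>__<parameter>` naming can be illustrated with a hypothetical helper that groups keys by component. This is only a sketch of the convention, not the routing code inside scikit-learn's `BaseEstimator.set_params`; the component name `cluster` used in the example is an assumption. Splitting on the first `__` only leaves any deeper nesting (`a__b__c`) for the next level to resolve.

```python
def route_params(params):
    """Group '<component>__<parameter>' keys by component (illustrative sketch).

    Keys without '__' are treated as parameters of the outer estimator and
    are grouped under the empty string.
    """
    routed = {}
    for key, value in params.items():
        # Split on the first '__' only; the remainder may be nested further.
        component, sep, name = key.partition("__")
        if not sep:
            # Plain parameter of the outer object.
            component, name = "", key
        routed.setdefault(component, {})[name] = value
    return routed
```

With a scikit-learn `Pipeline` whose clustering step is named `cluster`, the corresponding call would be `pipeline.set_params(cluster__n_clusters=50)`.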

summarize()

Return some diagnostic summary statistics about this model.

transform(sequences)

Alias for predict.