msmbuilder.cluster.Ward

class msmbuilder.cluster.Ward(n_clusters=2, memory=Memory(cachedir=None), connectivity=None, copy=None, n_components=None, compute_full_tree='auto', pooling_func=<function mean>)

Ward hierarchical clustering: constructs a tree and cuts it.

Recursively merges the pair of clusters that minimally increases within-cluster variance.
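The merge rule above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper names (`centroid`, `merge_cost`, `ward`) are hypothetical, and the sketch assumes the standard Ward merge cost, the increase in within-cluster sum of squares, n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2, evaluated by brute force over all cluster pairs.

```python
from itertools import combinations


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def merge_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward(points, n_clusters):
    """Greedy Ward agglomeration; returns one integer label per input point."""
    # Start with every point in its own cluster (stored as lists of indices).
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Pick the pair whose merge increases the variance the least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(
                [points[k] for k in clusters[ij[0]]],
                [points[k] for k in clusters[ij[1]]],
            ),
        )
        # i < j, so deleting clusters[j] does not shift clusters[i].
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for k in members:
            labels[k] = label
    return labels


# Two well-separated groups of hypothetical 2-D points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
labels = ward(points, n_clusters=2)
```

The first three points end up sharing one label and the last two share the other. The brute-force pair search is only meant to make the criterion explicit; it scales poorly with the number of samples.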

Parameters:

n_clusters : int or ndarray

The number of clusters to find.

connectivity : sparse matrix (optional)

Connectivity matrix. Defines for each sample the neighboring samples following a given structure of the data. Default is None, i.e., the hierarchical clustering algorithm is unstructured.

memory : Instance of joblib.Memory or string (optional)

Used to cache the output of the computation of the tree. By default, no caching is done. If a string is given, it is the path to the caching directory.

n_components : int (optional)

The number of connected components in the graph defined by the connectivity matrix. If not set, it is estimated.

compute_full_tree : bool or ‘auto’ (optional)

Controls whether the full tree is computed. If it is not, construction of the tree stops early at n_clusters, which reduces computation time when the number of clusters is not small compared to the number of samples. This option is useful only when specifying a connectivity matrix. Note also that when varying the number of clusters and using caching, it may be advantageous to compute the full tree.

See also

AgglomerativeClustering: agglomerative hierarchical clustering

Attributes

labels_	(list of arrays, each of shape [sequence_length, ]) The label of each point is an integer in [0, n_clusters).
n_leaves_	(int) Number of leaves in the hierarchical tree.
n_components_	(int) The estimated number of connected components in the graph.
children_	(array-like, shape = [n_nodes, 2]) The children of each non-leaf node. Values less than n_samples refer to leaves of the tree. A greater value i indicates a node with children children_[i - n_samples].
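The indexing convention for children_ can be made concrete with a short sketch. The `leaves_under` helper and the example tree are hypothetical; only the rule "values less than n_samples are leaves, larger values i refer to children_[i - n_samples]" comes from the description above.

```python
def leaves_under(node, children, n_samples):
    """Return the sorted leaf (sample) indices found below `node`."""
    if node < n_samples:
        return [node]  # values below n_samples are leaves
    # A non-leaf node i has its two children stored at row i - n_samples.
    left, right = children[node - n_samples]
    return sorted(
        leaves_under(left, children, n_samples)
        + leaves_under(right, children, n_samples)
    )


# Hypothetical tree over 4 samples:
# leaves 0 and 1 merge into node 4, leaves 2 and 3 into node 5,
# and nodes 4 and 5 merge into node 6 (the root).
children = [(0, 1), (2, 3), (4, 5)]
n_samples = 4

print(leaves_under(4, children, n_samples))  # [0, 1]
print(leaves_under(6, children, n_samples))  # [0, 1, 2, 3]
```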

Methods

`fit`(sequences[, y])	Fit the clustering on the data
`fit_predict`(sequences[, y])	Performs clustering on sequences and returns cluster labels.
`fit_transform`(sequences[, y])	Alias for fit_predict
`get_params`([deep])	Get parameters for this estimator.
`partial_predict`(X[, y])	Predict the closest cluster each sample in X belongs to.
`partial_transform`(X)	Alias for partial_predict
`predict`(sequences[, y])	Predict the closest cluster each sample in each sequence in sequences belongs to.
`set_params`(**params)	Set the parameters of this estimator.
`summarize`()	Return some diagnostic summary statistics about this Markov model
`transform`(sequences)	Alias for predict

__init__(n_clusters=2, memory=Memory(cachedir=None), connectivity=None, copy=None, n_components=None, compute_full_tree='auto', pooling_func=<function mean>)

Attributes

linkage

fit(sequences, y=None)

Fit the clustering on the data.

Parameters:

sequences : list of array-like, each of shape [sequence_length, n_features]

A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.

Returns:

self

fit_predict(sequences, y=None)

Performs clustering on sequences and returns cluster labels.

Parameters:

sequences : list of array-like, each of shape [sequence_length, n_features]

A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.

Returns:

Y : list of ndarray, each of shape [sequence_length, ]

Cluster labels
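The input and output shapes can be illustrated with a small sketch. It assumes, without confirmation from this page, that a multi-sequence clusterer can be thought of as concatenating all frames, labeling them, and cutting the labels back into one array per sequence. The `concat` and `split` helpers are hypothetical, and a trivial threshold rule stands in for the actual clustering step.

```python
def concat(sequences):
    """Flatten a list of sequences into one list of frames, plus lengths."""
    lengths = [len(seq) for seq in sequences]
    frames = [frame for seq in sequences for frame in seq]
    return frames, lengths


def split(flat_labels, lengths):
    """Cut a flat list of labels back into one list per sequence."""
    out, start = [], 0
    for n in lengths:
        out.append(flat_labels[start:start + n])
        start += n
    return out


# Two sequences of different length, both with n_features = 2.
sequences = [
    [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)],  # sequence_length = 3
    [(5.1, 5.0), (0.0, 0.1)],              # sequence_length = 2
]
frames, lengths = concat(sequences)

# Stand-in for a clusterer: label 1 if the first feature exceeds 2.5.
flat_labels = [int(frame[0] > 2.5) for frame in frames]

labels = split(flat_labels, lengths)
print(labels)  # [[0, 0, 1], [1, 0]]
```

Each returned label list has the same length as the corresponding input sequence, matching the shape stated for Y.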

fit_transform(sequences, y=None)

Alias for fit_predict.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.

partial_predict(X, y=None)

Predict the closest cluster each sample in X belongs to.

In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.

Parameters:

X : array-like, shape=(n_samples, n_features)

A single timeseries.

Returns:

Y : array, shape=(n_samples,)

Index of the cluster that each sample belongs to
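The code-book lookup described above amounts to a nearest-center search. The sketch below shows that idea only; it does not call the library, and the `closest_code` helper and the `code_book` values are hypothetical. Squared Euclidean distance is assumed as the metric.

```python
def closest_code(x, code_book):
    """Index of the code-book entry nearest to sample x (squared Euclidean)."""
    return min(
        range(len(code_book)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, code_book[i])),
    )


# Hypothetical code book with two centers, and a single timeseries X
# of shape (n_samples=3, n_features=2).
code_book = [(0.0, 0.0), (5.0, 5.0)]
X = [(0.2, -0.1), (4.8, 5.3), (1.0, 1.0)]

Y = [closest_code(x, code_book) for x in X]
print(Y)  # [0, 1, 0]
```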

partial_transform(X)

Alias for partial_predict.

predict(sequences, y=None)

Predict the closest cluster each sample in each sequence in sequences belongs to.

In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.

Parameters:

sequences : list of array-like, each of shape [sequence_length, n_features]

A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.

Returns:

Y : list of arrays, each of shape [sequence_length,]

Index of the closest center each sample belongs to.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns:	self
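The <component>__<parameter> naming convention can be sketched as a simple key-routing step. This illustrates the convention only and is not the library's implementation; the `route_params` helper and the component name `cluster` are hypothetical.

```python
def route_params(params):
    """Separate plain parameters from '<component>__<parameter>' ones."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore only.
        component, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = route_params({"n_clusters": 5, "cluster__n_clusters": 3})
print(own)     # {'n_clusters': 5}
print(nested)  # {'cluster': {'n_clusters': 3}}
```

A plain key such as `n_clusters` applies to the estimator itself, while a prefixed key is forwarded to the named component.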

summarize()

Return some diagnostic summary statistics about this Markov model.

transform(sequences)

Alias for predict.