msmbuilder.cluster.GMM

class msmbuilder.cluster.GMM(n_components=1, covariance_type='diag', random_state=None, thresh=None, tol=0.001, min_covar=0.001, n_iter=100, n_init=1, params='wmc', init_params='wmc', verbose=0)

Gaussian Mixture Model

Representation of a Gaussian mixture model probability distribution. This class allows for easy evaluation of, sampling from, and maximum-likelihood estimation of the parameters of a GMM distribution.

Initializes parameters such that every mixture component has zero mean and identity covariance.

Read more in the User Guide.

Parameters:
    n_components : int, optional
        Number of mixture components. Defaults to 1.
    covariance_type : string, optional
        String describing the type of covariance parameters to use. Must be one of 'spherical', 'tied', 'diag', 'full'. Defaults to 'diag'.
    random_state : RandomState or an int seed (None by default)
        A random number generator instance.
    min_covar : float, optional
        Floor on the diagonal of the covariance matrix to prevent overfitting. Defaults to 1e-3.
    tol : float, optional
        Convergence threshold. EM iterations will stop when the average gain in log-likelihood is below this threshold. Defaults to 1e-3.
    n_iter : int, optional
        Number of EM iterations to perform. Defaults to 100.
    n_init : int, optional
        Number of initializations to perform. The best result is kept. Defaults to 1.
    params : string, optional
        Controls which parameters are updated in the training process. Can contain any combination of 'w' for weights, 'm' for means, and 'c' for covars. Defaults to 'wmc'.
    init_params : string, optional
        Controls which parameters are updated in the initialization process. Can contain any combination of 'w' for weights, 'm' for means, and 'c' for covars. Defaults to 'wmc'.
    verbose : int, default: 0
        Enable verbose output. If 1, it always prints the current initialization and iteration step. If greater than 1, it additionally prints the change and time needed for each step.

See also:
    DPGMM
        Infinite Gaussian mixture model, using the Dirichlet process, fit with a variational algorithm.
    VBGMM
        Finite Gaussian mixture model fit with a variational algorithm, better for situations where there might be too little data to get a good estimate of the covariance matrix.

Examples

>>> import numpy as np
>>> from sklearn import mixture
>>> np.random.seed(1)
>>> g = mixture.GMM(n_components=2)
>>> # Generate random observations with two modes centered on 0
>>> # and 10 to use for training.
>>> obs = np.concatenate((np.random.randn(100, 1),
...                       10 + np.random.randn(300, 1)))
>>> g.fit(obs)
GMM(covariance_type='diag', init_params='wmc', min_covar=0.001, n_components=2, n_init=1, n_iter=100, params='wmc', random_state=None, thresh=None, tol=0.001, verbose=0)
>>> np.round(g.weights_, 2)
array([ 0.75, 0.25])
>>> np.round(g.means_, 2)
array([[ 10.05], [ 0.06]])
>>> np.round(g.covars_, 2)
array([[[ 1.02]], [[ 0.96]]])
>>> g.predict([[0], [2], [9], [10]])
array([1, 1, 0, 0]...)
>>> np.round(g.score([[0], [2], [9], [10]]), 2)
array([-2.19, -4.58, -1.75, -1.21])
>>> # Refit the model on new data (initial parameters remain the
>>> # same), this time with an even split between the two modes.
>>> g.fit(20 * [[0]] + 20 * [[10]])
GMM(covariance_type='diag', init_params='wmc', min_covar=0.001, n_components=2, n_init=1, n_iter=100, params='wmc', random_state=None, thresh=None, tol=0.001, verbose=0)
>>> np.round(g.weights_, 2)
array([ 0.5, 0.5])

Attributes:
    weights_ : array, shape (n_components,)
        This attribute stores the mixing weights for each mixture component.
    means_ : array, shape (n_components, n_features)
        Mean parameters for each mixture component.
    covars_ : array
        Covariance parameters for each mixture component. The shape depends on covariance_type:
            (n_components, n_features) if 'spherical',
            (n_features, n_features) if 'tied',
            (n_components, n_features) if 'diag',
            (n_components, n_features, n_features) if 'full'
    converged_ : bool
        True when convergence was reached in fit(), False otherwise.
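
The dependence of the covars_ shape on covariance_type, as listed under Attributes, can be written down as a small lookup. This is only a sketch of the shapes stated in the text; the helper name covars_shape is hypothetical and is not part of msmbuilder.

```python
def covars_shape(covariance_type, n_components, n_features):
    """Return the covars_ shape documented for each covariance_type.

    Hypothetical helper: it only mirrors the shapes listed in the
    Attributes section and does not call the library.
    """
    shapes = {
        "spherical": (n_components, n_features),
        "tied": (n_features, n_features),
        "diag": (n_components, n_features),
        "full": (n_components, n_features, n_features),
    }
    if covariance_type not in shapes:
        raise ValueError(
            "covariance_type must be one of 'spherical', 'tied', 'diag', 'full'"
        )
    return shapes[covariance_type]


# A 'tied' model shares a single matrix across all components, so its
# shape does not depend on n_components.
tied_shape = covars_shape("tied", n_components=4, n_features=3)
full_shape = covars_shape("full", n_components=4, n_features=3)
```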

Methods

aic(X)
    Akaike information criterion for the current model fit
bic(X)
    Bayesian information criterion for the current model fit
fit(sequences[, y])
    Fit the clustering on the data
fit_predict(sequences[, y])
    Performs clustering on X and returns cluster labels.
fit_transform(sequences[, y])
    Alias for fit_predict
get_params([deep])
    Get parameters for this estimator.
partial_predict(X[, y])
    Predict the closest cluster each sample in X belongs to.
partial_transform(X)
    Alias for partial_predict
predict(sequences[, y])
    Predict the closest cluster each sample in each sequence in sequences belongs to.
predict_proba(X)
    Predict posterior probability of data under each Gaussian in the model.
sample([n_samples, random_state])
    Generate random samples from the model.
score(X[, y])
    Compute the log probability under the model.
score_samples(X)
    Return the per-sample likelihood of the data under the model.
set_params(**params)
    Set the parameters of this estimator.
summarize()
    Return some diagnostic summary statistics about this Markov model
transform(sequences)
    Alias for predict

__init__(n_components=1, covariance_type='diag', random_state=None, thresh=None, tol=0.001, min_covar=0.001, n_iter=100, n_init=1, params='wmc', init_params='wmc', verbose=0)

aic(X)

Akaike information criterion for the current model fit and the proposed data.

Parameters:
    X : array of shape (n_samples, n_dimensions)

Returns:
    aic : float (the lower the better)

bic(X)

Bayesian information criterion for the current model fit and the proposed data.

Parameters:
    X : array of shape (n_samples, n_dimensions)

Returns:
    bic : float (the lower the better)
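
For orientation, the two criteria can be sketched from their textbook definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where L is the likelihood of the data, k the number of free parameters and n the number of samples. The parameter count below assumes a diagonal-covariance mixture, and the function names are hypothetical; this does not reproduce the library's internals.

```python
import math


def n_free_parameters_diag(n_components, n_features):
    """Free parameters of a diagonal-covariance mixture (assumed layout)."""
    covariance_params = n_components * n_features
    mean_params = n_components * n_features
    weight_params = n_components - 1  # weights sum to one
    return covariance_params + mean_params + weight_params


def aic(total_log_likelihood, n_parameters):
    """Akaike information criterion; lower is better."""
    return -2.0 * total_log_likelihood + 2.0 * n_parameters


def bic(total_log_likelihood, n_parameters, n_samples):
    """Bayesian information criterion; lower is better."""
    return -2.0 * total_log_likelihood + n_parameters * math.log(n_samples)


# Illustrative numbers only: a two-component, one-feature model whose
# summed log-likelihood over 100 samples is -120.
k = n_free_parameters_diag(n_components=2, n_features=1)
aic_value = aic(-120.0, k)
bic_value = bic(-120.0, k, n_samples=100)
```

Because BIC multiplies the parameter count by ln n rather than 2, it penalizes extra components more heavily than AIC once n exceeds about 7 samples.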

fit(sequences, y=None)

Fit the clustering on the data.

Parameters:
    sequences : list of array-like, each of shape [sequence_length, n_features]
        A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.

Returns:
    self

fit_predict(sequences, y=None)

Performs clustering on X and returns cluster labels.

Parameters:
    sequences : list of array-like, each of shape [sequence_length, n_features]
        A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.

Returns:
    Y : list of ndarray, each of shape [sequence_length, ]
        Cluster labels
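
The sequences argument is a list of timeseries that may differ in length but must agree on n_features, and the result is one label array per input sequence. A minimal sketch of that bookkeeping, using plain lists, is shown here. The helpers concat_sequences and split_like are hypothetical, the threshold rule merely stands in for a fitted model, and none of this is claimed to be how msmbuilder implements it.

```python
def concat_sequences(sequences):
    """Pool all frames row-wise, checking that every frame has the same width."""
    n_features = len(sequences[0][0])
    frames = []
    for sequence in sequences:
        for frame in sequence:
            if len(frame) != n_features:
                raise ValueError("all sequences must have the same n_features")
            frames.append(frame)
    return frames


def split_like(flat_labels, sequences):
    """Cut a flat label list back into one list per original sequence."""
    labels, start = [], 0
    for sequence in sequences:
        stop = start + len(sequence)
        labels.append(flat_labels[start:stop])
        start = stop
    return labels


# Two sequences of different length, each frame having one feature.
sequences = [
    [[0.1], [0.3], [9.8]],
    [[10.2], [0.0]],
]
frames = concat_sequences(sequences)

# Stand-in for a fitted clusterer: label by which side of 5.0 a frame lies on.
flat_labels = [0 if frame[0] < 5.0 else 1 for frame in frames]
labels = split_like(flat_labels, sequences)
```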

fit_transform(sequences, y=None)

Alias for fit_predict.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
    deep : boolean, optional
        If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
    params : mapping of string to any
        Parameter names mapped to their values.

partial_predict(X, y=None)

Predict the closest cluster each sample in X belongs to.

In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.

Parameters:
    X : array-like, shape=(n_samples, n_features)
        A single timeseries.

Returns:
    Y : array, shape=(n_samples,)
        Index of the cluster that each sample belongs to

partial_transform(X)

Alias for partial_predict.

predict(sequences, y=None)

Predict the closest cluster each sample in each sequence in sequences belongs to.

In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.

Parameters:
    sequences : list of array-like, each of shape [sequence_length, n_features]
        A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.

Returns:
    Y : list of arrays, each of shape [sequence_length,]
        Index of the closest center each sample belongs to.

predict_proba(X)

Predict posterior probability of data under each Gaussian in the model.

Parameters:
    X : array-like, shape = [n_samples, n_features]

Returns:
    responsibilities : array-like, shape = (n_samples, n_components)
        Returns the probability of the sample for each Gaussian (state) in the model.

sample(n_samples=1, random_state=None)

Generate random samples from the model.

Parameters:
    n_samples : int, optional
        Number of samples to generate. Defaults to 1.

Returns:
    X : array_like, shape (n_samples, n_features)
        List of samples
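
Conceptually, drawing from a Gaussian mixture is a two-step process: pick a component according to the mixing weights, then draw from that component's Gaussian. The sketch below illustrates this for a one-dimensional mixture using only the standard library. The function sample_mixture and its seed argument are hypothetical stand-ins, not the library's sample method.

```python
import random


def sample_mixture(n_samples, weights, means, variances, seed=None):
    """Draw n_samples scalars from a 1-D Gaussian mixture (illustrative only)."""
    rng = random.Random(seed)
    component_ids = range(len(weights))
    samples = []
    for _ in range(n_samples):
        # Step 1: choose a component with probability equal to its weight.
        k = rng.choices(component_ids, weights=weights)[0]
        # Step 2: draw from that component; gauss() takes a standard deviation.
        samples.append(rng.gauss(means[k], variances[k] ** 0.5))
    return samples


# Mixture with modes near 0 and 10, weighted 1:3.
draws = sample_mixture(
    200, weights=[0.25, 0.75], means=[0.0, 10.0], variances=[1.0, 1.0], seed=0
)
```

Passing the same seed reproduces the same draws, which plays the role that random_state plays in the signature above.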

score(X, y=None)

Compute the log probability under the model.

Parameters:
    X : array_like, shape (n_samples, n_features)
        List of n_features-dimensional data points. Each row corresponds to a single data point.

Returns:
    logprob : array_like, shape (n_samples,)
        Log probabilities of each data point in X

score_samples(X)

Return the per-sample likelihood of the data under the model.

Compute the log probability of X under the model and return the posterior distribution (responsibilities) of each mixture component for each element of X.

Parameters:
    X : array_like, shape (n_samples, n_features)
        List of n_features-dimensional data points. Each row corresponds to a single data point.

Returns:
    logprob : array_like, shape (n_samples,)
        Log probabilities of each data point in X.
    responsibilities : array_like, shape (n_samples, n_components)
        Posterior probabilities of each mixture component for each observation
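
To make the two return values concrete, here is a minimal sketch for a single scalar observation under a one-dimensional mixture. It computes the log probability with the log-sum-exp trick and derives the responsibilities from the same quantities. The function names are hypothetical and the code is a from-scratch illustration, not the library's implementation.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def score_sample(x, weights, means, variances):
    """Return (logprob, responsibilities) for one scalar observation x."""
    # log(weight_k * N(x | mean_k, var_k)) for every component k.
    log_joint = [
        math.log(w) + gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Log-sum-exp: subtract the largest term before exponentiating
    # so that very negative log densities do not underflow to zero.
    peak = max(log_joint)
    logprob = peak + math.log(sum(math.exp(lj - peak) for lj in log_joint))
    # Posterior probability of each component given x.
    responsibilities = [math.exp(lj - logprob) for lj in log_joint]
    return logprob, responsibilities


weights, means, variances = [0.25, 0.75], [0.0, 10.0], [1.0, 1.0]
logprob, resp = score_sample(0.0, weights, means, variances)
```

At x = 0 nearly all of the posterior mass falls on the component centered at 0, even though its mixing weight is the smaller of the two.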

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns:
    self
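
The <component>__<parameter> naming convention can be illustrated by splitting each key at its first double underscore. The sketch below shows only the naming scheme; route_params is a hypothetical helper and the component name "cluster" is invented for the example.

```python
def route_params(params):
    """Separate plain parameters from <component>__<parameter> ones."""
    own, nested = {}, {}
    for key, value in params.items():
        # partition() splits at the first "__" only, so deeper nesting
        # such as "a__b__c" is passed down to component "a" as "b__c".
        name, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = route_params({"n_components": 3, "cluster__tol": 1e-4})
```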

summarize()

Return some diagnostic summary statistics about this Markov model.

transform(sequences)

Alias for predict.