Gaussian Mixture Model
Representation of a Gaussian mixture model probability distribution. This class allows for easy evaluation of, sampling from, and maximum-likelihood estimation of the parameters of a GMM distribution.
Initializes parameters such that every mixture component has zero mean and identity covariance.
Parameters: | n_components : int, optional
covariance_type : string, optional
random_state : RandomState or an int seed (None by default)
min_covar : float, optional
thresh : float, optional
n_iter : int, optional
n_init : int, optional
params : string, optional
init_params : string, optional
|
---|
Examples
>>> import numpy as np
>>> from sklearn import mixture
>>> np.random.seed(1)
>>> g = mixture.GMM(n_components=2)
>>> # Generate random observations with two modes centered on 0
>>> # and 10 to use for training.
>>> obs = np.concatenate((np.random.randn(100, 1),
... 10 + np.random.randn(300, 1)))
>>> g.fit(obs)
GMM(covariance_type='diag', init_params='wmc', min_covar=0.001,
n_components=2, n_init=1, n_iter=100, params='wmc',
random_state=None, thresh=0.01)
>>> np.round(g.weights_, 2)
array([ 0.75, 0.25])
>>> np.round(g.means_, 2)
array([[ 10.05],
[ 0.06]])
>>> np.round(g.covars_, 2)
array([[[ 1.02]],
[[ 0.96]]])
>>> g.predict([[0], [2], [9], [10]])
array([1, 1, 0, 0]...)
>>> np.round(g.score([[0], [2], [9], [10]]), 2)
array([-2.19, -4.58, -1.75, -1.21])
>>> # Refit the model on new data (initial parameters remain the
>>> # same), this time with an even split between the two modes.
>>> g.fit(20 * [[0]] + 20 * [[10]])
GMM(covariance_type='diag', init_params='wmc', min_covar=0.001,
n_components=2, n_init=1, n_iter=100, params='wmc',
random_state=None, thresh=0.01)
>>> np.round(g.weights_, 2)
array([ 0.5, 0.5])
Attributes
weights_ | (array, shape (n_components,)) This attribute stores the mixing weights for each mixture component. |
means_ | (array, shape (n_components, n_features)) Mean parameters for each mixture component. |
covars_ | (array) Covariance parameters for each mixture component. The shape depends on covariance_type: (n_components, n_features) if ‘spherical’; (n_features, n_features) if ‘tied’; (n_components, n_features) if ‘diag’; (n_components, n_features, n_features) if ‘full’. |
converged_ | (bool) True when convergence was reached in fit(), False otherwise. |
Methods
aic(X) | Akaike information criterion for the current model fit |
bic(X) | Bayesian information criterion for the current model fit |
eval(*args, **kwargs) | DEPRECATED: GMM.eval was renamed to GMM.score_samples in 0.14 and will be removed in 0.16. |
fit(sequences[, y]) | Fit the clustering on the data |
fit_predict(sequences[, y]) | Performs clustering on sequences and returns cluster labels. |
fit_transform(sequences[, y]) | Alias for fit_predict |
get_params([deep]) | Get parameters for this estimator. |
partial_predict(X[, y]) | Predict the closest cluster each sample in X belongs to. |
partial_transform(X) | Alias for partial_predict |
predict(sequences[, y]) | Predict the closest cluster each sample in each sequence in sequences belongs to. |
predict_proba(X) | Predict posterior probability of data under each Gaussian in the model. |
sample([n_samples, random_state]) | Generate random samples from the model. |
score(X) | Compute the log probability under the model. |
score_samples(X) | Return the per-sample log-likelihood of the data under the model. |
set_params(**params) | Set the parameters of this estimator. |
summarize() | Return some diagnostic summary statistics about this model |
transform(sequences) | Alias for predict |
Akaike information criterion for the current model fit and the proposed data
Parameters: | X : array of shape (n_samples, n_dimensions) |
---|---|
Returns: | aic: float (the lower the better) |
Bayesian information criterion for the current model fit and the proposed data
Parameters: | X : array of shape (n_samples, n_dimensions) |
---|---|
Returns: | bic: float (the lower the better) |
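Both criteria trade goodness of fit against model size. The sketch below uses the textbook definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of free parameters, n the number of samples, and ln L the total log-likelihood. How this class counts k is not stated on this page, so the function names, the parameter count, and the numbers below are illustrative assumptions, not the library's implementation.

```python
import math


def aic(total_log_likelihood, n_free_params):
    """Akaike information criterion, 2k - 2 ln(L). Lower is better."""
    return 2.0 * n_free_params - 2.0 * total_log_likelihood


def bic(total_log_likelihood, n_free_params, n_samples):
    """Bayesian information criterion, k ln(n) - 2 ln(L). Lower is better."""
    return n_free_params * math.log(n_samples) - 2.0 * total_log_likelihood


# Hypothetical numbers for a 2-component, 1-D mixture:
# 2 means + 2 variances + 1 free mixing weight = 5 free parameters.
log_l = -850.0  # made-up total log-likelihood of the data
k = 5
n = 400

print("AIC:", aic(log_l, k))
print("BIC:", bic(log_l, k, n))
```

Because ln n exceeds 2 once n is larger than about 7, BIC penalizes extra components more heavily than AIC on all but the smallest data sets.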
DEPRECATED: GMM.eval was renamed to GMM.score_samples in 0.14 and will be removed in 0.16.
Fit the clustering on the data
Parameters: | sequences : list of array-like, each of shape [sequence_length, n_features]
|
---|---|
Returns: | self |
Performs clustering on sequences and returns cluster labels.
Parameters: | sequences : list of array-like, each of shape [sequence_length, n_features]
|
---|---|
Returns: | Y : list of ndarray, each of shape [sequence_length, ]
|
Alias for fit_predict
Get parameters for this estimator.
Parameters: | deep: boolean, optional
|
---|---|
Returns: | params : mapping of string to any
|
Predict the closest cluster each sample in X belongs to.
In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.
Parameters: | X : array-like shape=(n_samples, n_features)
|
---|---|
Returns: | Y : array, shape=(n_samples,)
|
Alias for partial_predict
Predict the closest cluster each sample in each sequence in sequences belongs to.
In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.
Parameters: | sequences : list of array-like, each of shape [sequence_length, n_features]
|
---|---|
Returns: | Y : list of arrays, each of shape [sequence_length,]
|
Predict posterior probability of data under each Gaussian in the model.
Parameters: | X : array-like, shape = [n_samples, n_features] |
---|---|
Returns: | responsibilities : array-like, shape = (n_samples, n_components)
|
Generate random samples from the model.
Parameters: | n_samples : int, optional
|
---|---|
Returns: | X : array_like, shape (n_samples, n_features)
|
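Sampling from a Gaussian mixture is usually done in two steps: pick a component according to the mixing weights, then draw from that component's Gaussian. The following is a minimal one-dimensional sketch of that idea using only the standard library. The function name `sample_mixture` and its arguments are hypothetical and are not part of this class.

```python
import random


def sample_mixture(weights, means, variances, n_samples=1, seed=None):
    """Draw n_samples values from a 1-D Gaussian mixture."""
    rng = random.Random(seed)
    component_ids = range(len(weights))
    samples = []
    for _ in range(n_samples):
        # Step 1: choose a component with probability equal to its weight.
        k = rng.choices(component_ids, weights=weights)[0]
        # Step 2: draw from that component (gauss expects a standard deviation).
        samples.append(rng.gauss(means[k], variances[k] ** 0.5))
    return samples


draws = sample_mixture([0.25, 0.75], [0.0, 10.0], [1.0, 1.0],
                       n_samples=2000, seed=0)
share_near_10 = sum(1 for x in draws if x > 5.0) / len(draws)
print(len(draws), round(share_near_10, 2))
```

With enough samples, the share of draws near each mean should approach the corresponding mixing weight.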
Compute the log probability under the model.
Parameters: | X : array_like, shape (n_samples, n_features)
|
---|---|
Returns: | logprob : array_like, shape (n_samples,)
|
Return the per-sample log-likelihood of the data under the model.
Compute the log probability of X under the model and return the posterior distribution (responsibilities) of each mixture component for each element of X.
Parameters: | X: array_like, shape (n_samples, n_features)
|
---|---|
Returns: | logprob : array_like, shape (n_samples,)
responsibilities : array_like, shape (n_samples, n_components)
|
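The two return values are closely related: the per-sample log probability is the log of the weighted sum of component densities, and each responsibility is one component's share of that sum. Below is a minimal one-dimensional sketch using the log-sum-exp trick for numerical stability. The helper name `score_samples_1d` is hypothetical, and the parameters are the rounded values printed in the example above, so the results only approximate what the fitted model would return.

```python
import math


def score_samples_1d(xs, weights, means, variances):
    """Return (logprob, responsibilities) for a 1-D Gaussian mixture."""
    logprob, responsibilities = [], []
    for x in xs:
        # log(weight_k * N(x | mean_k, variance_k)) for every component k
        joint = [
            math.log(w) - 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
            for w, m, v in zip(weights, means, variances)
        ]
        # log-sum-exp: log of the total density without underflow
        top = max(joint)
        total = top + math.log(sum(math.exp(j - top) for j in joint))
        logprob.append(total)
        # posterior probability of each component given x
        responsibilities.append([math.exp(j - total) for j in joint])
    return logprob, responsibilities


logprob, resp = score_samples_1d(
    [0, 2, 9, 10],
    weights=[0.75, 0.25], means=[10.05, 0.06], variances=[1.02, 0.96],
)
# A hard assignment picks the component with the largest responsibility.
labels = [row.index(max(row)) for row in resp]
print(labels)
```

Each row of responsibilities sums to 1, and taking the index of the largest entry per row yields a hard cluster assignment.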
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.
Returns: | self |
---|
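To illustrate the `<component>__<parameter>` convention, here is a toy stand-in that routes double-underscore keys to a nested object. `ToyEstimator` is a hypothetical class written for this sketch. It mimics the naming convention only and is not the library's implementation.

```python
class ToyEstimator:
    """Hypothetical estimator that stores its parameters in a dict."""

    def __init__(self, **params):
        self._params = dict(params)

    def get_params(self):
        return dict(self._params)

    def set_params(self, **params):
        for key, value in params.items():
            # Split "component__parameter" at the first double underscore.
            name, sep, sub_key = key.partition("__")
            if sep:
                # Nested key: delegate to the named sub-estimator.
                self._params[name].set_params(**{sub_key: value})
            else:
                # Plain key: update this estimator directly.
                self._params[name] = value
        return self  # returning self allows call chaining


inner = ToyEstimator(n_components=2)
outer = ToyEstimator(model=inner, verbose=False)
outer.set_params(model__n_components=3, verbose=True)
print(inner.get_params(), outer.get_params()["verbose"])
```

Returning `self` from `set_params` matches the documented return value and lets calls be chained.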
Return some diagnostic summary statistics about this model
Alias for predict