msmbuilder.preprocessing.MultiLabelBinarizer

class msmbuilder.preprocessing.MultiLabelBinarizer(classes=None, sparse_output=False)

Transform between iterable of iterables and a multilabel format

Although a list of sets or tuples is a very intuitive format for multilabel data, it is unwieldy to process. This transformer converts between this intuitive format and the supported multilabel format: a (samples x classes) binary matrix indicating the presence of a class label.

Parameters:

classes : array-like of shape [n_classes] (optional)

Indicates an ordering for the class labels

sparse_output : boolean (default: False)

Set to True if the output binary array should be in CSR sparse format.
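
The conversion described above, and the effect of the classes parameter on column order, can be illustrated with a minimal plain-Python sketch. The helper binarize is hypothetical and is not the library's implementation; it only mirrors the documented behaviour that columns follow classes when given, and otherwise the sorted set of labels found in the data.

```python
def binarize(label_sets, classes=None):
    """Hypothetical helper: return (classes, matrix) for an iterable of label collections."""
    label_sets = [set(labels) for labels in label_sets]
    if classes is None:
        # No ordering given: use the sorted set of labels seen in the data.
        classes = sorted(set().union(*label_sets))
    else:
        # Ordering given: columns follow it exactly.
        classes = list(classes)
    # One row per sample, one column per class; 1 marks that the label is present.
    matrix = [[1 if c in labels else 0 for c in classes] for labels in label_sets]
    return classes, matrix


classes, matrix = binarize([(1, 2), (3,)])
print(classes)  # [1, 2, 3]
print(matrix)   # [[1, 1, 0], [0, 0, 1]]

genres = [{"sci-fi", "thriller"}, {"comedy"}]
_, default_order = binarize(genres)
_, custom_order = binarize(genres, classes=["thriller", "sci-fi", "comedy"])
print(default_order)  # [[0, 1, 1], [1, 0, 0]]
print(custom_order)   # [[1, 1, 0], [0, 0, 1]]
```

The same label sets give different matrices under the two orderings, which is why classes_ should be kept alongside any stored indicator matrix.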

See also

sklearn.preprocessing.OneHotEncoder
encode categorical integer features using a one-hot aka one-of-K scheme.

Examples

>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> mlb = MultiLabelBinarizer()
>>> mlb.fit_transform([(1, 2), (3,)])
array([[1, 1, 0],
       [0, 0, 1]])
>>> mlb.classes_
array([1, 2, 3])
>>> mlb.fit_transform([set(['sci-fi', 'thriller']), set(['comedy'])])
array([[0, 1, 1],
       [1, 0, 0]])
>>> list(mlb.classes_)
['comedy', 'sci-fi', 'thriller']

Attributes

classes_ (array of labels) A copy of the classes parameter where provided, or otherwise, the sorted set of classes found when fitting.

Methods

fit(sequences[, y]) Fit the preprocessing to the sequences.
fit_transform(sequences[, y]) Fit the model and apply preprocessing
get_params([deep]) Get parameters for this estimator.
inverse_transform(yt) Transform the given indicator matrix into label sets
partial_fit(sequence[, y]) Fit the preprocessing to a single sequence.
partial_transform(sequence) Apply preprocessing to a single sequence
set_params(**params) Set the parameters of this estimator.
summarize() Return some diagnostic summary statistics about this model
transform(sequences) Apply preprocessing to sequences
__init__(classes=None, sparse_output=False)

fit(sequences, y=None)

Fit the preprocessing to the sequences.

Parameters:

sequences : list of array-like, each of shape [sequence_length, n_features]

A list of multivariate timeseries. Each sequence may have a different length, but they all must have the same number of features.

y : None

Ignored

Returns:

self

fit_transform(sequences, y=None)

Fit the model and apply preprocessing

Parameters:

sequences : list of array-like, each of shape (n_samples_i, n_features)

Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features.

y : None

Ignored

Returns:

sequence_new : list of array-like, each of shape (n_samples_i, n_components)

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.

inverse_transform(yt)

Transform the given indicator matrix into label sets

Parameters:

yt : array or sparse matrix of shape (n_samples, n_classes)

A matrix containing only 1s and 0s.

Returns:

y : list of tuples

The set of labels for each sample such that y[i] consists of classes_[j] for each yt[i, j] == 1.
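
The rule stated above can be written out as a short plain-Python sketch. The function name labels_from_indicator is hypothetical and stands in for the documented behaviour only; it is not the library's code.

```python
def labels_from_indicator(yt, classes):
    """Hypothetical helper: y[i] holds classes[j] for every j where yt[i][j] == 1."""
    return [
        tuple(label for label, flag in zip(classes, row) if flag == 1)
        for row in yt
    ]


numeric = labels_from_indicator([[1, 1, 0], [0, 0, 1]], [1, 2, 3])
print(numeric)  # [(1, 2), (3,)]

genres = labels_from_indicator(
    [[0, 1, 1], [1, 0, 0]], ["comedy", "sci-fi", "thriller"]
)
print(genres)  # [('sci-fi', 'thriller'), ('comedy',)]
```

Labels in each tuple appear in the order of classes, not in the order of the original input.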

partial_fit(sequence, y=None)

Fit the preprocessing to a single sequence.

Parameters:

sequence : array-like, [sequence_length, n_features]

A multivariate timeseries.

y : None

Ignored

Returns:

self

partial_transform(sequence)

Apply preprocessing to a single sequence

Parameters:

sequence : array-like, shape (n_samples, n_features)

A single sequence to transform

Returns:

out : array-like, shape (n_samples, n_features)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:

self

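
The <component>__<parameter> naming convention can be illustrated by splitting each key at the first double underscore. This is a plain-Python sketch of the naming scheme only; split_nested_params and the component name "binarizer" are hypothetical, and nothing here is taken from the library's source.

```python
from collections import defaultdict


def split_nested_params(params):
    """Hypothetical helper: separate top-level parameters from nested ones."""
    own = {}
    nested = defaultdict(dict)
    for key, value in params.items():
        # "binarizer__classes" -> ("binarizer", "__", "classes")
        name, delim, sub_key = key.partition("__")
        if delim:
            nested[name][sub_key] = value
        else:
            own[name] = value
    return own, dict(nested)


own, nested = split_nested_params(
    {"sparse_output": True, "binarizer__classes": [1, 2, 3]}
)
print(own)     # {'sparse_output': True}
print(nested)  # {'binarizer': {'classes': [1, 2, 3]}}
```
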
summarize()

Return some diagnostic summary statistics about this model

transform(sequences)

Apply preprocessing to sequences

Parameters:

sequences : list of array-like, each of shape (n_samples_i, n_features)

Sequence data to transform, where n_samples_i is the number of samples in sequence i and n_features is the number of features.

Returns:

sequence_new : list of array-like, each of shape (n_samples_i, n_components)
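
The list-of-sequences shape convention can be sketched in plain Python. The sketch assumes that each sequence is transformed independently, as the per-sequence partial_transform method suggests; transform_sequences and double_rows are hypothetical names, and the doubling step is only a placeholder for a real preprocessing step.

```python
def transform_sequences(sequences, transform_one):
    """Hypothetical helper: apply a per-sequence transform to every sequence."""
    return [transform_one(sequence) for sequence in sequences]


def double_rows(sequence):
    """Placeholder per-sequence step that keeps the (n_samples, n_features) shape."""
    return [[2 * value for value in row] for row in sequence]


sequences = [
    [[1, 2], [3, 4], [5, 6]],  # 3 samples, 2 features
    [[7, 8]],                  # 1 sample, 2 features
]
result = transform_sequences(sequences, double_rows)
print([len(sequence) for sequence in result])  # [3, 1]
```

The output is again a list with one entry per input sequence, and each entry keeps its own number of samples.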