msmbuilder.preprocessing.MultiLabelBinarizer

class msmbuilder.preprocessing.MultiLabelBinarizer(classes=None, sparse_output=False)

Transform between iterable of iterables and a multilabel format

Although a list of sets or tuples is a very intuitive format for multilabel data, it is unwieldy to process. This transformer converts between this intuitive format and the supported multilabel format: a (samples x classes) binary matrix indicating the presence of a class label.
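
The conversion described above can be sketched in plain Python. This is a library-free illustration of the mapping, not msmbuilder's or scikit-learn's implementation, and the helper name `binarize` is hypothetical. When no explicit class list is given, the columns follow the sorted set of labels seen, which matches how `classes_` is described under Attributes.

```python
def binarize(label_sets, classes=None):
    """Map an iterable of label iterables to a (samples x classes) 0/1 matrix."""
    label_sets = [set(labels) for labels in label_sets]
    if classes is None:
        # No explicit ordering: use the sorted set of all labels seen.
        classes = sorted(set().union(*label_sets))
    # Column index for each class label.
    column = {label: j for j, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in label_sets]
    for i, labels in enumerate(label_sets):
        for label in labels:
            # A label missing from an explicit `classes` raises KeyError here;
            # the real transformer may handle unknown labels differently.
            matrix[i][column[label]] = 1
    return list(classes), matrix


print(binarize([(1, 2), (3,)]))
# ([1, 2, 3], [[1, 1, 0], [0, 0, 1]])
print(binarize([{"comedy"}], classes=["thriller", "comedy"]))
# (['thriller', 'comedy'], [[0, 1]])
```

Passing `classes` fixes the column order regardless of the order in which labels appear in the data.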

Parameters:	classes : array-like of shape [n_classes] (optional)
	Indicates an ordering for the class labels.
	sparse_output : boolean (default: False)
	Set to True if the output binary array is desired in CSR sparse format.
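
With `sparse_output=True` the indicator matrix is returned in CSR (compressed sparse row) form. The sketch below, built around a hypothetical `to_csr` helper, shows which three arrays a CSR matrix stores for a dense 0/1 matrix; `scipy.sparse.csr_matrix((data, indices, indptr), shape=...)` accepts arrays in this layout.

```python
def to_csr(matrix):
    """Return (data, indices, indptr), the three arrays of a CSR matrix."""
    data, indices, indptr = [], [], [0]
    for row in matrix:
        for j, value in enumerate(row):
            if value:
                data.append(value)   # the stored non-zero value
                indices.append(j)    # its column index
        # indptr[i]:indptr[i + 1] delimits row i inside data/indices.
        indptr.append(len(indices))
    return data, indices, indptr


data, indices, indptr = to_csr([[1, 1, 0], [0, 0, 1]])
print(data)     # [1, 1, 1]
print(indices)  # [0, 1, 2]
print(indptr)   # [0, 2, 3]
```

Because each sample usually carries only a few of many possible labels, CSR stores roughly one integer per present label instead of one per (sample, class) cell.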

See also

sklearn.preprocessing.OneHotEncoder: encode categorical integer features using a one-hot aka one-of-K scheme.

Examples

>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> mlb = MultiLabelBinarizer()
>>> mlb.fit_transform([(1, 2), (3,)])
array([[1, 1, 0],
       [0, 0, 1]])
>>> mlb.classes_
array([1, 2, 3])

>>> mlb.fit_transform([set(['sci-fi', 'thriller']), set(['comedy'])])
array([[0, 1, 1],
       [1, 0, 0]])
>>> list(mlb.classes_)
['comedy', 'sci-fi', 'thriller']

Attributes:	classes_ : array of labels
	A copy of the classes parameter where provided, or otherwise, the sorted set of classes found when fitting.

Methods

`fit`(X[, y])	Fit Preprocessing to X.
`fit_transform`(sequences[, y])	Fit the model and apply preprocessing
`get_params`([deep])	Get parameters for this estimator.
`inverse_transform`(yt)	Transform the given indicator matrix into label sets
`partial_fit`(sequence[, y])	Fit Preprocessing to X.
`partial_transform`(sequence)	Apply preprocessing to single sequence
`set_params`(**params)	Set the parameters of this estimator.
`summarize`()	Return some diagnostic summary statistics about this estimator
`transform`(sequences)	Apply preprocessing to sequences

__init__(classes=None, sparse_output=False): Initialize self. See help(type(self)) for accurate signature.

fit(X, y=None)

Fit Preprocessing to X.

Parameters:	sequence : array-like, [sequence_length, n_features]
	A multivariate timeseries.
	y : None
	Ignored
Returns:	self

fit_transform(sequences, y=None)

Fit the model and apply preprocessing

Parameters:	sequences : list of array-like, each of shape (n_samples_i, n_features)
	Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features.
	y : None
	Ignored
Returns:	sequence_new : list of array-like, each of shape (n_samples_i, n_components)
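
The multi-sequence behaviour can be sketched as follows. This rests on an assumption not stated above: that fitting pools the samples of every sequence to learn the classes, and that the output keeps one indicator matrix per input sequence. Each sequence is modelled as a list of samples and each sample as an iterable of labels; `fit_transform_sequences` is a hypothetical name, not part of msmbuilder.

```python
def fit_transform_sequences(sequences):
    # Assumption: classes are learned from all samples of all sequences at once.
    pooled = [set(sample) for seq in sequences for sample in seq]
    classes = sorted(set().union(*pooled))
    column = {label: j for j, label in enumerate(classes)}

    def transform_one(seq):
        # One 0/1 row per sample, one column per class.
        rows = []
        for sample in seq:
            row = [0] * len(classes)
            for label in sample:
                row[column[label]] = 1
            rows.append(row)
        return rows

    # One output matrix per input sequence, in the same order.
    return classes, [transform_one(seq) for seq in sequences]


sequences = [[(1,), (1, 2)], [(3,)]]
classes, out = fit_transform_sequences(sequences)
print(classes)  # [1, 2, 3]
print(out)      # [[[1, 0, 0], [1, 1, 0]], [[0, 0, 1]]]
```

Pooling first guarantees that every output matrix has the same columns in the same order, so the per-sequence results can be compared or concatenated directly.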

get_params(deep=True)

Get parameters for this estimator.

Parameters:	deep : boolean, optional
	If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:	params : mapping of string to any
	Parameter names mapped to their values.

inverse_transform(yt)

Transform the given indicator matrix into label sets

Parameters:	yt : array or sparse matrix of shape (n_samples, n_classes)
	A matrix containing only 1s and 0s.
Returns:	y : list of tuples
	The set of labels for each sample such that y[i] consists of classes_[j] for each yt[i, j] == 1.
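
The rule "y[i] consists of classes_[j] for each yt[i, j] == 1" translates almost literally into Python. The sketch below uses a hypothetical `inverse_binarize` helper on a dense list-of-lists matrix; it is an illustration of the rule, not the library's code, and does not handle sparse input.

```python
def inverse_binarize(yt, classes):
    """Return one tuple per row holding classes[j] wherever yt[i][j] == 1."""
    return [tuple(classes[j] for j, value in enumerate(row) if value == 1)
            for row in yt]


print(inverse_binarize([[0, 1, 1], [1, 0, 0]], ["comedy", "sci-fi", "thriller"]))
# [('sci-fi', 'thriller'), ('comedy',)]
```

Labels come back in column order, so the original order of labels inside each input tuple or set is not recoverable.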

partial_fit(sequence, y=None)

Fit Preprocessing to X.

Parameters:	sequence : array-like, [sequence_length, n_features]
	A multivariate timeseries.
	y : None
	Ignored
Returns:	self

partial_transform(sequence)

Apply preprocessing to single sequence

Parameters:	sequence : array-like, shape (n_samples, n_features)
	A single sequence to transform.
Returns:	out : array-like, shape (n_samples, n_features)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
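
The `<component>__<parameter>` convention can be illustrated with a toy container. `Toy` is a hypothetical class, and this is a simplified sketch of the routing idea rather than scikit-learn's actual `set_params`, which also validates parameter names. With a real scikit-learn `Pipeline` whose step is named `binarizer`, the analogous call would be `pipeline.set_params(binarizer__sparse_output=True)`.

```python
class Toy:
    """Hypothetical estimator that stores its parameters in a dict."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        nested = {}
        for key, value in params.items():
            # "binarizer__sparse_output" -> ("binarizer", "__", "sparse_output")
            name, sep, sub_key = key.partition("__")
            if sep:
                # Collect parameters destined for a nested component.
                nested.setdefault(name, {})[sub_key] = value
            else:
                # Plain parameter of this object.
                self.params[name] = value
        # Forward each group to its component, which may itself be nested.
        for name, sub_params in nested.items():
            self.params[name].set_params(**sub_params)
        return self


pipe = Toy(binarizer=Toy(sparse_output=False), verbose=0)
pipe.set_params(verbose=1, binarizer__sparse_output=True)
print(pipe.params["binarizer"].params)  # {'sparse_output': True}
```

Splitting only at the first `__` lets deeper names such as `a__b__c` be forwarded one level at a time.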

Returns:	self

summarize(): Return some diagnostic summary statistics about this estimator

transform(sequences)

Apply preprocessing to sequences

Parameters:	sequences : list of array-like, each of shape (n_samples_i, n_features)
	Sequence data to transform, where n_samples_i is the number of samples in sequence i and n_features is the number of features.
Returns:	sequence_new : list of array-like, each of shape (n_samples_i, n_components)