msmbuilder.preprocessing.LabelBinarizer

class msmbuilder.preprocessing.LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)

Binarize labels in a one-vs-all fashion

Several regression and binary classification algorithms are available in scikit-learn. A simple way to extend these algorithms to the multi-class classification case is to use the so-called one-vs-all scheme.

At learning time, this simply consists of learning one regressor or binary classifier per class. Doing so requires converting multi-class labels to binary labels (a sample either belongs or does not belong to the class). LabelBinarizer makes this process easy with the transform method.

At prediction time, one assigns the class for which the corresponding model gave the greatest confidence. LabelBinarizer makes this easy with the inverse_transform method.
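The round trip described above can be sketched as follows. This is a minimal illustration using the scikit-learn class, as in the Examples section; the labels and the confidence scores are made-up values standing in for the outputs of three per-class models.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Learning time: turn multi-class labels into one binary column per class.
lb = LabelBinarizer()
Y = lb.fit_transform(["cat", "dog", "bird", "dog"])
# Columns follow the sorted order stored in lb.classes_: bird, cat, dog.

# Prediction time: one confidence score per class and per sample
# (hypothetical values, not produced by a real model).
scores = np.array([[0.1, 0.7, 0.2],
                   [0.2, 0.1, 0.9]])

# For multi-class targets, the highest-scoring column in each row wins.
predicted = lb.inverse_transform(scores)
print(predicted)
```

Each column of Y can serve as the target for one binary classifier or regressor.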

See also

label_binarize: function to perform the transform operation of LabelBinarizer with fixed classes.
sklearn.preprocessing.OneHotEncoder: encode categorical integer features using a one-hot aka one-of-K scheme.

Examples

>>> from sklearn import preprocessing
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit([1, 2, 6, 4, 2])
LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
>>> lb.classes_
array([1, 2, 4, 6])
>>> lb.transform([1, 6])
array([[1, 0, 0, 0],
       [0, 0, 0, 1]])

Binary targets transform to a column vector

>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit_transform(['yes', 'no', 'no', 'yes'])
array([[1],
       [0],
       [0],
       [1]])

Passing a 2D matrix for multilabel classification

>>> import numpy as np
>>> lb.fit(np.array([[0, 1, 1], [1, 0, 0]]))
LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
>>> lb.classes_
array([0, 1, 2])
>>> lb.transform([0, 1, 2, 1])
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0]])

Attributes

classes_	(array of shape [n_class]) Holds the label for each class.
y_type_	(str,) Represents the type of the target data as evaluated by utils.multiclass.type_of_target. Possible types are ‘continuous’, ‘continuous-multioutput’, ‘binary’, ‘multiclass’, ‘multiclass-multioutput’, ‘multilabel-indicator’, and ‘unknown’.
sparse_input_	(boolean,) True if the input data to transform is given as a sparse matrix, False otherwise.
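As a brief illustration, the fitted attributes can be inspected after calling fit. The sketch below uses the scikit-learn class with the same label lists as the Examples section; the variable names are chosen here for illustration.

```python
from sklearn.preprocessing import LabelBinarizer

multiclass = LabelBinarizer().fit([1, 2, 6, 4, 2])
binary = LabelBinarizer().fit(["yes", "no", "no", "yes"])

print(multiclass.classes_)   # sorted unique labels seen during fit
print(multiclass.y_type_)    # target type inferred from the training labels
print(binary.y_type_)
print(binary.sparse_input_)  # whether the labels passed to fit were sparse
```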

Methods

`fit`(X[, y])	Fit Preprocessing to X.
`fit_transform`(sequences[, y])	Fit the model and apply preprocessing
`get_params`([deep])	Get parameters for this estimator.
`inverse_transform`(Y[, threshold])	Transform binary labels back to multi-class labels
`partial_fit`(sequence[, y])	Fit Preprocessing to X.
`partial_transform`(sequence)	Apply preprocessing to single sequence
`set_params`(**params)	Set the parameters of this estimator.
`summarize`()	Return some diagnostic summary statistics about this Markov model
`transform`(sequences)	Apply preprocessing to sequences

__init__(neg_label=0, pos_label=1, sparse_output=False)
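This page does not describe the constructor arguments. In scikit-learn, neg_label and pos_label are the values used to encode negative and positive labels, and sparse_output requests a CSR sparse matrix from transform. The sketch below assumes the msmbuilder class passes these through unchanged, and uses the scikit-learn class directly.

```python
from sklearn.preprocessing import LabelBinarizer

# Encode membership as +1 and non-membership as -1 instead of 1 and 0.
signed = LabelBinarizer(neg_label=-1, pos_label=1)
Y_signed = signed.fit_transform([1, 2, 3])
print(Y_signed)

# Request a SciPy CSR sparse matrix instead of a dense array.
sparse = LabelBinarizer(sparse_output=True)
Y_sparse = sparse.fit_transform([1, 2, 3])
print(Y_sparse.toarray())
```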

fit(X, y=None)

Fit Preprocessing to X.

Parameters:

sequence : array-like, [sequence_length, n_features]

A multivariate timeseries.

y : None

Ignored

Returns:

self

fit_transform(sequences, y=None)

Fit the model and apply preprocessing

Parameters:

sequences: list of array-like, each of shape (n_samples_i, n_features)

Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features.

y : None

Ignored

Returns:

sequence_new : list of array-like, each of shape (n_samples_i, n_components)

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.
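A short sketch of what the returned mapping looks like, using the scikit-learn class. The keys are the constructor argument names, so the mapping can be used to build an equivalent, unfitted estimator.

```python
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer(neg_label=-1)

# Maps each constructor argument name to its current value.
params = lb.get_params()
print(params)

# The mapping can be passed back to the constructor.
copy_of_lb = LabelBinarizer(**params)
```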

inverse_transform(Y, threshold=None)

Transform binary labels back to multi-class labels

Parameters:

Y : numpy array or sparse matrix with shape [n_samples, n_classes]

Target values. All sparse matrices are converted to CSR before inverse transformation.

threshold : float or None

Threshold used in the binary and multi-label cases.

Use 0 when:

Y contains the output of decision_function (classifier)

Use 0.5 when:

Y contains the output of predict_proba

If None, the threshold is assumed to be half way between neg_label and pos_label.

Returns:

y : numpy array or CSR matrix of shape [n_samples]

Target values.

Notes

When the binary labels are fractional (probabilistic), inverse_transform chooses the class with the greatest value. Typically, this allows the output of a linear model’s decision_function method to be used directly as the input of inverse_transform.
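The threshold guidance above can be illustrated for the binary case. This is a sketch using the scikit-learn class; the score and probability arrays are made-up values in the style of decision_function and predict_proba outputs, not results from a real model.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Binary problem: classes_ is sorted, so 'no' is negative and 'yes' is positive.
lb = LabelBinarizer().fit(["no", "yes"])

# Signed scores in the style of decision_function: threshold at 0.
decision_scores = np.array([[-1.2], [0.3]])
from_scores = lb.inverse_transform(decision_scores, threshold=0)

# Positive-class probabilities in the style of predict_proba: threshold at 0.5.
probabilities = np.array([[0.4], [0.6]])
from_proba = lb.inverse_transform(probabilities, threshold=0.5)

print(from_scores, from_proba)
```

With the default neg_label=0 and pos_label=1, leaving threshold as None gives the same result as threshold=0.5.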

partial_fit(sequence, y=None)

Fit Preprocessing to X.

Parameters:

sequence : array-like, [sequence_length, n_features]

A multivariate timeseries.

y : None

Ignored

Returns:

self

partial_transform(sequence)

Apply preprocessing to single sequence

Parameters:

sequence: array like, shape (n_samples, n_features)

A single sequence to transform

Returns:

out : array like, shape (n_samples, n_features)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:	self
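The two forms of parameter name can be sketched as follows, using scikit-learn classes. The pipeline exists only to show the naming scheme; the step name "binarize" is arbitrary and chosen here for illustration.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer

# Simple estimator: parameters are set by name and the estimator is returned.
lb = LabelBinarizer()
returned = lb.set_params(neg_label=-1)

# Nested object: address a step's parameter as <step name>__<parameter>.
pipe = Pipeline([("binarize", LabelBinarizer())])
pipe.set_params(binarize__pos_label=2)

print(lb.neg_label, pipe.named_steps["binarize"].pos_label)
```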

summarize()

Return some diagnostic summary statistics about this Markov model

transform(sequences)

Apply preprocessing to sequences

Parameters:

sequences: list of array-like, each of shape (n_samples_i, n_features)

Sequence data to transform, where n_samples_i is the number of samples in sequence i and n_features is the number of features.

Returns:

sequence_new : list of array-like, each of shape (n_samples_i, n_components)
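The list-of-sequences convention can be emulated with the scikit-learn class. The sketch below assumes that the wrapper fits on all sequences pooled together and then transforms each sequence independently. This page does not confirm that behavior, and the label sequences are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Two label sequences of different lengths (hypothetical data).
sequences = [np.array([0, 1, 2, 1]), np.array([2, 2, 0])]

# Assumption: fit once on the pooled labels so every sequence shares the
# same class-to-column mapping.
lb = LabelBinarizer()
lb.fit(np.concatenate(sequences))

# Transform each sequence separately; the result is again a list of arrays.
sequences_new = [lb.transform(seq) for seq in sequences]
for seq_new in sequences_new:
    print(seq_new.shape)
```

Fitting on the pooled labels keeps the column order consistent even when a given sequence does not contain every class.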