
class msmbuilder.featurizer.ContactFeaturizer(contacts='all', scheme='closest-heavy', ignore_nonprotein=True, soft_min=False, soft_min_beta=20, periodic=True)

Featurizer based on residue-residue distances.

This featurizer transforms a dataset containing MD trajectories into a vector dataset by representing each frame in each of the MD trajectories by a vector of the distances between pairs of amino-acid residues.

The exact method for computing the distance between two residues is configurable with the scheme parameter.

contacts : np.ndarray or ‘all’

Array containing (0-indexed) indices of the residue pairs to compute contacts for. For example, np.array([[0, 10], [0, 11]]) computes the contact between residue 0 and residue 10 as well as the contact between residue 0 and residue 11. If no array is passed, ‘all’ contacts are calculated, meaning the result contains all contacts between residues separated by at least 3 residues.
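As a rough illustration of what ‘all’ covers, the pure-Python sketch below enumerates every residue pair whose indices differ by at least 3. The helper name all_contact_pairs is hypothetical and not part of msmbuilder; the library's own enumeration may differ in detail, for example in how non-protein residues are filtered out.

```python
from itertools import combinations

def all_contact_pairs(n_residues, min_separation=3):
    """Hypothetical helper: 0-indexed pairs (i, j) with j - i >= min_separation."""
    return [(i, j) for i, j in combinations(range(n_residues), 2)
            if j - i >= min_separation]

# For a 5-residue chain only three pairs are far enough apart.
pairs = all_contact_pairs(5)
print(pairs)  # [(0, 3), (0, 4), (1, 4)]
```

Each pair becomes one column of the feature vector, so the number of features grows roughly quadratically with chain length.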

scheme : {‘ca’, ‘closest’, ‘closest-heavy’}

Scheme used to determine the distance between two residues:

  • ‘ca’ : the distance between the alpha carbons of the two residues
  • ‘closest’ : the closest distance between any two atoms in the residues
  • ‘closest-heavy’ : the closest distance between any two non-hydrogen atoms in the residues
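The difference between the three schemes can be seen on a toy pair of residues. The sketch below is an assumption-laden illustration in plain Python: the residue_distance helper, the atom tuples, and the coordinates are all made up and do not reflect how msmbuilder stores or computes anything internally.

```python
import math

# Toy residues: each atom is (name, element, (x, y, z)), coordinates in nm.
# All values are invented for illustration.
res_a = [("CA", "C", (0.0, 0.0, 0.0)),
         ("CB", "C", (0.1, 0.0, 0.0)),
         ("HB", "H", (0.2, 0.0, 0.0))]
res_b = [("CA", "C", (1.0, 0.0, 0.0)),
         ("CB", "C", (0.8, 0.0, 0.0)),
         ("HB", "H", (0.5, 0.0, 0.0))]

def residue_distance(r1, r2, scheme="closest-heavy"):
    """Hypothetical helper mirroring the three schemes described above."""
    if scheme == "ca":
        keep = lambda atom: atom[0] == "CA"   # alpha carbons only
    elif scheme == "closest":
        keep = lambda atom: True              # every atom
    elif scheme == "closest-heavy":
        keep = lambda atom: atom[1] != "H"    # skip hydrogens
    else:
        raise ValueError("unknown scheme: %r" % scheme)
    # Minimum distance over all retained atom pairs.
    return min(math.dist(a[2], b[2])
               for a in r1 if keep(a)
               for b in r2 if keep(b))

for scheme in ("ca", "closest", "closest-heavy"):
    print(scheme, round(residue_distance(res_a, res_b, scheme), 3))
```

In this toy case the hydrogens sit closest to each other, so ‘closest’ gives the smallest value, ‘closest-heavy’ a larger one, and ‘ca’ the largest.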

ignore_nonprotein : bool

When using contacts=‘all’, don’t compute contacts between “residues” that are not protein (i.e. do not contain an alpha carbon).

soft_min : bool, default=False

If soft_min is True, a differentiable version of the scheme is used. The exact expression is

d = \frac{\beta}{\log \sum_i \exp(\beta / d_i)}

where the d_i are the individual distances considered by the scheme and beta is a user parameter which defaults to 20nm. The expression is copied from the PLUMED mindist calculator.
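To get a feel for the soft minimum, the sketch below evaluates beta / log(sum_i exp(beta / d_i)) in plain Python. The function name soft_min_distance is hypothetical, and the log-sum-exp rearrangement is a choice made here to avoid floating-point overflow; it is not a claim about how the library evaluates the expression.

```python
import math

def soft_min_distance(distances, beta=20.0):
    """Hypothetical helper: beta / log(sum_i exp(beta / d_i))."""
    scaled = [beta / d for d in distances]
    # Shift by the largest exponent (log-sum-exp) so exp() cannot overflow.
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scaled))
    return beta / log_sum

distances = [0.3, 0.5, 0.9]  # made-up atom-atom distances in nm
d_soft = soft_min_distance(distances)
print(d_soft, min(distances))
```

With a single distance the expression returns that distance unchanged, and with several it never exceeds the true minimum, which it approaches as beta grows.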

soft_min_beta : float, default=20nm

The value of beta to use for the soft_min distance option. Very large values might cause small contact distances to go to 0.

periodic : bool, default=True

If True, compute distances using periodic boundary conditions.


__init__(contacts='all', scheme='closest-heavy', ignore_nonprotein=True, soft_min=False, soft_min_beta=20, periodic=True)

Initialize self. See help(type(self)) for accurate signature.


__init__([contacts, scheme, …]) Initialize self.
describe_features(traj) Return a list of dictionaries describing the contacts features.
fit(traj_list[, y])
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
partial_transform(traj) Featurize an MD trajectory into a vector space derived from residue-residue distances
set_params(**params) Set the parameters of this estimator.
summarize() Return some diagnostic summary statistics about this Markov model
transform(traj_list[, y]) Featurize several trajectories.

describe_features(traj)

Return a list of dictionaries describing the contact features.

traj : mdtraj.Trajectory

The trajectory to describe

feature_descs : list of dict

One dictionary per feature, with the following information about the atoms participating in each contact:

  • resnames: unique names of residues
  • atominds: atom indices (the CA atoms if scheme is ‘ca’, otherwise all atom indices of the residues)
  • resseqs: unique residue sequence ids (not necessarily 0-indexed)
  • resids: unique residue ids (0-indexed)
  • featurizer: Contact
  • featuregroup: ca, heavy etc.
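
As a hedged illustration of how such a list might be used, the sketch below builds two made-up description dictionaries with the keys listed above and looks up which feature columns involve a given residue. The values (including the nesting of atominds) and the helper features_involving are invented for this example and were not produced by msmbuilder.

```python
# Made-up descriptions for two contact features. Keys follow the list above;
# the values are illustrative only.
feature_descs = [
    {"resnames": ["ALA", "GLY"], "atominds": [[4], [40]],
     "resseqs": [1, 5], "resids": [0, 4],
     "featurizer": "Contact", "featuregroup": "ca"},
    {"resnames": ["VAL", "LEU"], "atominds": [[12], [55]],
     "resseqs": [2, 6], "resids": [1, 5],
     "featurizer": "Contact", "featuregroup": "ca"},
]

def features_involving(descs, resid):
    """Hypothetical helper: column indices of features that involve `resid`."""
    return [k for k, d in enumerate(descs) if resid in d["resids"]]

print(features_involving(feature_descs, 4))  # [0]
```

Because the list is ordered like the feature columns, the returned indices can be used to slice a featurized array.
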
fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.


get_params(deep=True)

Get parameters for this estimator.

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

params : mapping of string to any

Parameter names mapped to their values.


partial_transform(traj)

Featurize an MD trajectory into a vector space derived from residue-residue distances.

traj : mdtraj.Trajectory

A molecular dynamics trajectory to featurize.

features : np.ndarray, dtype=float, shape=(n_samples, n_features)

A featurized trajectory is a 2D array of shape (length_of_trajectory x n_features) where each features[i] vector is computed by applying the featurization function to the `i`th snapshot of the input trajectory.

See also

transform : simultaneously featurize a collection of MD trajectories
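
The shape of the output can be illustrated without any MD library. In the sketch below, featurize_frames is a hypothetical stand-in for the featurization step: each frame holds one representative point per residue, and each requested residue pair contributes one column. The coordinates are made up.

```python
import math

def featurize_frames(frames, pairs):
    """Hypothetical stand-in: one row per frame, one column per residue pair."""
    return [[math.dist(frame[i], frame[j]) for i, j in pairs]
            for frame in frames]

# Two made-up frames with four residues each (coordinates in nm).
frames = [
    [(0.0, 0.0, 0.0), (0.4, 0.0, 0.0), (0.8, 0.0, 0.0), (1.2, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.4, 0.0, 0.0), (0.8, 0.0, 0.0), (0.0, 0.5, 0.0)],
]
pairs = [(0, 3), (1, 3)]

features = featurize_frames(frames, pairs)
print(len(features), len(features[0]))  # 2 2
```

Here features[i] is the vector for the i-th frame, matching the (n_samples, n_features) layout described above.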

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
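
The <component>__<parameter> naming can be pictured as a simple split on the first double underscore. The sketch below is only an illustration of that convention: split_nested_params and the step name "featurizer" are hypothetical, and the real routing inside an estimator or pipeline involves more validation than shown here.

```python
def split_nested_params(params):
    """Group 'component__parameter' keys by component.

    Keys without '__' are treated as top-level and stored under None.
    """
    grouped = {}
    for key, value in params.items():
        name, delim, sub_key = key.partition("__")
        if delim:
            grouped.setdefault(name, {})[sub_key] = value
        else:
            grouped.setdefault(None, {})[name] = value
    return grouped

# "featurizer" is a made-up name for a step in a nested object.
grouped = split_nested_params({"scheme": "ca", "featurizer__soft_min": True})
print(grouped)
```

Only the first "__" is used for the split, so deeper nesting such as a__b__c is handed to component a as b__c.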


summarize()

Return some diagnostic summary statistics about this Markov model

transform(traj_list, y=None)

Featurize several trajectories.

traj_list : list(mdtraj.Trajectory)

Trajectories to be featurized.

features : list(np.ndarray), length = len(traj_list)

The featurized trajectories. features[i] is the featurized version of traj_list[i] and has shape (n_samples_i, n_features)
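
The list-in, list-out contract can be sketched as a per-trajectory map. The toy functions below are assumptions for illustration only: a "trajectory" is just a list of frames, and the stand-in featurization emits a single made-up feature per frame. They show that trajectories of different lengths keep their own n_samples_i.

```python
def toy_partial_transform(traj):
    """Toy featurization: one single-element feature vector per frame."""
    return [[float(len(frame))] for frame in traj]

def toy_transform(traj_list):
    """Apply the per-trajectory featurization to each trajectory in turn."""
    return [toy_partial_transform(traj) for traj in traj_list]

# Two made-up trajectories with 3 and 1 frames; each frame lists 4 residues.
traj_list = [[[0, 1, 2, 3]] * 3, [[0, 1, 2, 3]]]
features = toy_transform(traj_list)
print([len(f) for f in features])  # [3, 1]
```

Because each entry keeps its own length, the result is a list of arrays rather than one stacked array.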