msmbuilder.featurizer.ContactFeaturizer

class msmbuilder.featurizer.ContactFeaturizer(contacts='all', scheme='closest-heavy', ignore_nonprotein=True, soft_min=False, soft_min_beta=20, periodic=True)

Featurizer based on residue-residue distances.

This featurizer transforms a dataset containing MD trajectories into a vector dataset by representing each frame in each of the MD trajectories by a vector of the distances between pairs of amino-acid residues.

The exact method for computing the distance between two residues is configurable with the scheme parameter.

Parameters:

contacts : np.ndarray or ‘all’

Array containing the (0-indexed) indices of the residue pairs to compute contacts for (e.g. np.array([[0, 10], [0, 11]]) computes the contact between residue 0 and residue 10 as well as the contact between residue 0 and residue 11). If no array is passed, ‘all’ contacts are calculated: the result contains all contacts between residues separated by at least 3 residues.

scheme : {‘ca’, ‘closest’, ‘closest-heavy’}

scheme to determine the distance between two residues:

‘ca’ : distance between two residues is given by the distance between their alpha carbons
‘closest’ : distance is the closest distance between any two atoms in the residues
‘closest-heavy’ : distance is the closest distance between any two non-hydrogen atoms in the residues
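
The three schemes differ only in which atoms of each residue may take part in the minimum. The sketch below illustrates this on toy data. It is not msmbuilder's implementation: the `residue_distance` helper, the atom dictionaries, and the coordinates are all hypothetical, and periodic boundary conditions are ignored.

```python
import math
from itertools import product

# One atom filter per scheme (hypothetical representation of an atom as a dict).
SELECTORS = {
    "ca": lambda atom: atom["name"] == "CA",
    "closest": lambda atom: True,
    "closest-heavy": lambda atom: atom["element"] != "H",
}


def residue_distance(res_a, res_b, scheme="closest-heavy"):
    """Smallest distance between the atoms of two residues that the scheme allows."""
    if scheme not in SELECTORS:
        raise ValueError(f"unknown scheme: {scheme!r}")
    keep = SELECTORS[scheme]
    atoms_a = [a for a in res_a if keep(a)]
    atoms_b = [b for b in res_b if keep(b)]
    return min(math.dist(a["xyz"], b["xyz"]) for a, b in product(atoms_a, atoms_b))


# Made-up residues: atom name, element, coordinates in nm.
res_a = [
    {"name": "CA", "element": "C", "xyz": (0.00, 0.0, 0.0)},
    {"name": "CB", "element": "C", "xyz": (0.15, 0.0, 0.0)},
    {"name": "HB", "element": "H", "xyz": (0.25, 0.0, 0.0)},
]
res_b = [
    {"name": "CA", "element": "C", "xyz": (1.00, 0.0, 0.0)},
    {"name": "O", "element": "O", "xyz": (0.60, 0.0, 0.0)},
    {"name": "H", "element": "H", "xyz": (0.40, 0.0, 0.0)},
]

for scheme in ("ca", "closest", "closest-heavy"):
    print(scheme, round(residue_distance(res_a, res_b, scheme), 3))
```

With these coordinates the hydrogen pair is the closest overall, so ‘closest’ gives the smallest value, ‘closest-heavy’ a larger one, and ‘ca’ the largest.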

ignore_nonprotein : bool

When using contacts=‘all’, don’t compute contacts between “residues” which are not protein (i.e. do not contain an alpha carbon).

soft_min : bool, default=False

If soft_min is true, a differentiable version of the scheme is used. The exact expression is

d = \frac{\beta}{\log \sum_i \exp\left(\frac{\beta}{d_i}\right)}

where the d_i are the individual atom-pair distances between the two residues and \beta is a user parameter that defaults to 20 nm. The expression is copied from the plumed mindist calculator: http://plumed.github.io/doc-v2.0/user-doc/html/mindist.html

soft_min_beta : float, default=20nm: The value of beta to use for the soft_min distance option. Very large values might cause small contact distances to go to 0.
periodic : bool, default=True: If True, compute distances using periodic boundary conditions.
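
To show how the soft_min expression behaves, here is a minimal pure-Python sketch of d = beta / log(sum_i exp(beta / d_i)). The function name `soft_min_distance` is hypothetical and this is not the library's code. The sketch shifts the exponents by their maximum before exponentiating, an assumption made here to avoid overflow when beta / d_i is large; the source does not say how msmbuilder evaluates the expression.

```python
import math


def soft_min_distance(distances, beta=20.0):
    """Differentiable approximation of min(distances); all distances must be > 0."""
    exponents = [beta / d for d in distances]
    shift = max(exponents)  # log-sum-exp shift keeps exp() from overflowing
    log_sum = shift + math.log(sum(math.exp(e - shift) for e in exponents))
    return beta / log_sum


atom_pair_distances = [0.3, 0.5, 1.0]  # made-up distances in nm
print(min(atom_pair_distances))
print(soft_min_distance(atom_pair_distances, beta=20.0))
print(soft_min_distance(atom_pair_distances, beta=1.0))
```

The result never exceeds the true minimum, and it approaches the minimum as beta grows relative to the distances.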

Methods

`describe_features`(traj)	Return a list of dictionaries describing the contacts features.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_params`([deep])	Get parameters for this estimator.
`partial_transform`(traj)	Featurize an MD trajectory into a vector space derived from residue-residue distances
`set_params`(**params)	Set the parameters of this estimator.
`summarize`()	Return some diagnostic summary statistics about this Markov model
`transform`(traj_list[, y])	Featurize several trajectories.

`featurize`(traj)
`fit`(traj_list[, y])

__init__(contacts='all', scheme='closest-heavy', ignore_nonprotein=True, soft_min=False, soft_min_beta=20, periodic=True)

Initialize self. See help(type(self)) for accurate signature.

Methods

`__init__`([contacts, scheme, …])	Initialize self.
`describe_features`(traj)	Return a list of dictionaries describing the contacts features.
`featurize`(traj)
`fit`(traj_list[, y])
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_params`([deep])	Get parameters for this estimator.
`partial_transform`(traj)	Featurize an MD trajectory into a vector space derived from residue-residue distances
`set_params`(**params)	Set the parameters of this estimator.
`summarize`()	Return some diagnostic summary statistics about this Markov model
`transform`(traj_list[, y])	Featurize several trajectories.

describe_features(traj)

Return a list of dictionaries describing the contacts features.

Parameters:

traj : mdtraj.Trajectory: The trajectory to describe

Returns:

feature_descs : list of dict

Dictionary describing each feature with the following information about the atoms participating in each contact:

resnames: unique names of residues

atominds: atom indices (returns CA if scheme is ca_inds, otherwise returns all atom_inds)

resseqs: unique residue sequence ids (not necessarily 0-indexed)

resids: unique residue ids (0-indexed)

featurizer: Contact

featuregroup: ca, heavy etc.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X : numpy array of shape [n_samples, n_features]: Training set.
y : numpy array of shape [n_samples]: Target values.

Returns:

X_new : numpy array of shape [n_samples, n_features_new]: Transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : boolean, optional: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any: Parameter names mapped to their values.

partial_transform(traj)

Featurize an MD trajectory into a vector space derived from residue-residue distances

Parameters:

traj : mdtraj.Trajectory: A molecular dynamics trajectory to featurize.

Returns:

features : np.ndarray, dtype=float, shape=(n_samples, n_features): A featurized trajectory is a 2D array of shape (length_of_trajectory x n_features) where each features[i] vector is computed by applying the featurization function to the `i`th snapshot of the input trajectory.
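
The number of columns, n_features, equals the number of residue pairs being featurized. The sketch below estimates that number for contacts=‘all’ without running the featurizer. It assumes that "separated by at least 3 residues" means an index difference j - i >= 3, and that every residue is protein; both are assumptions, and `all_contact_pairs` is a hypothetical helper, not part of msmbuilder.

```python
from itertools import combinations


def all_contact_pairs(n_residues, min_separation=3):
    """0-indexed residue pairs (i, j), i < j, with j - i >= min_separation."""
    return [
        (i, j)
        for i, j in combinations(range(n_residues), 2)
        if j - i >= min_separation
    ]


n_residues = 10  # made-up protein size
n_frames = 5     # made-up trajectory length
pairs = all_contact_pairs(n_residues)

# Expected shape of the array returned for one trajectory under these assumptions.
feature_shape = (n_frames, len(pairs))
print(pairs[:3], feature_shape)
```

Passing an explicit contacts array instead fixes n_features to the number of rows in that array, e.g. 2 for np.array([[0, 10], [0, 11]]).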