Fs Peptide (command line)¶

Modeling dynamics of FS Peptide¶

This example shows a typical, basic usage of the MSMBuilder command line to model dynamics of a protein system.

In [1]:

# Work in a temporary directory
import tempfile
import os
os.chdir(tempfile.mkdtemp())

In [2]:

# Since this is running from an IPython notebook,
# we prefix all our commands with "!"
# When running on the command line, omit the leading "!"
! msmb -h

/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
usage: msmb [-h] [-v] ...

Statistical models for biomolecular dynamics

optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit

commands:

AtomIndices Create index file for atoms or distance pairs.
ConvertChunkedProject
Convert an MD dataset with chunked trajectories into a standard format.
TemplateProject Set up a new MSMBuilder project
DihedralFeaturizer Featurizer based on dihedral angles.
KappaAngleFeaturizer Featurizer to extract kappa angles.
AlphaAngleFeaturizer Featurizer to extract alpha (dihedral) angles.
AtomPairsFeaturizer Featurizer based on distances between specified pairs of atoms.
RMSDFeaturizer Featurizer based on RMSD to one or more reference structures.
LandMarkRMSDFeaturizer
Kernel Landmark Featuizer based on RMSD to one or more reference structures.
SuperposeFeaturizer Featurizer based on euclidian atom distances to reference structure.
DRIDFeaturizer Featurizer based on distribution of reciprocal interatomic distances (DRID).
ContactFeaturizer Featurizer based on residue-residue distances.
AngleFeaturizer Featurizer based on angles between 3 atoms.
BinaryContactFeaturizer
Featurizer based on residue-residue contacts below a cutoff.
LogisticContactFeaturizer
Featurizer based on logistic-transformed residue-residue contacts.
GaussianSolventFeaturizer
Featurizer on weighted pairwise distance between solute and solvent.
VonMisesFeaturizer Featurizer based on dihedral angles soft-binned along the unit circle.
RawPositionsFeaturizer
Featurize an MD trajectory into a vector space with the raw cartesian coordinates.
SASAFeaturizer Featurizer based on solvent-accessible surface areas.
LigandContactFeaturizer
Featurizer based on ligand-protein distances.
BinaryLigandContactFeaturizer
Featurizer based on binary ligand-protein contacts.
LigandRMSDFeaturizer Featurizer based on RMSD to one or more reference structures.
Butterworth Smooth time-series data using a low-pass, zero-delay Butterworth filter.
DoubleEWMA Smooth time-series data using forward and backward exponentially-weighted moving average
filters.
StandardScaler Standardize features by removing the mean and scaling to unit variance.
RobustScaler Scale features using statistics that are robust to outliers.
KMeans K-Means clustering.
KCenters K-Centers clustering.
KMedoids K-Medoids clustering.
MiniBatchKMedoids Mini-Batch K-Medoids clustering.
RegularSpatial Regular spatial clustering.
LandmarkAgglomerative
Landmark-based agglomerative hierarchical clustering.
GMM Gaussian Mixture.
MeanShift Mean shift clustering using a flat kernel.
NDGrid Discretize continuous data points onto an N-dimensional grid.
SpectralClustering Apply clustering to a projection to the normalized laplacian.
AffinityPropagation Perform Affinity Propagation Clustering of data.
APM APM This Program is a python package which implements Automatic State Partitioning for Multibody
Systems (APM) algorithm to cluster trajectories and model the transitions based on a Markov
State Model (MSM).
AgglomerativeClustering
Agglomerative Clustering.
MiniBatchKMeans Mini-Batch K-Means clustering.
tICA Time-structure Independent Component Analysis (tICA).
SparseTICA Sparse time-structure Independent Component Analysis (tICA).
FastICA FastICA: a fast algorithm for Independent Component Analysis.
FactorAnalysis Factor Analysis (FA).
KernelTICA Time-structure Independent Componenent Analysis (tICA) using the kernel trick.
PCA Principal component analysis (PCA).
SparsePCA Sparse Principal Components Analysis (SparsePCA).
MiniBatchSparsePCA Mini-batch Sparse Principal Components Analysis.
KSparseTICA Sparse tICA where eigenvectors must have k or fewer nonzero values.
AlanineDipeptide Download example alanine dipeptide dataset.
FsPeptide Download example Fs-peptide dataset.
MetEnkephalin Download example Met-Enkephalin dataset.
DoubleWell Generate example double well potential dataset.
QuadWell Generate example quad-well potential dataset.
MullerPotential Generate example Muller potential dataset.
ImpliedTimescales Scan the implied timescales of MarkovStateModels with respect to lag time.
GaussianHMM Reversible Gaussian Hidden Markov Model L1-Fusion Regularization.
MarkovStateModel Reversible Markov State Model.
BayesianMarkovStateModel
Bayesian reversible Markov state model.
ContinuousTimeMSM Reversible first order master equation model.
BayesianContinuousTimeMSM
Bayesian reversible first-order Master equation model.
TransformDataset Use a pre-fit model to transform a dataset.

Get example data¶

In [3]:

! msmb FsPeptide --data_home ./
! tree

/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
Copying fs_peptide from msmb_data package to ./
Example dataset saved: ./fs_peptide
/bin/sh: 1: tree: not found

Featurization¶

The raw (x, y, z) coordinates from the simulation do not respect the translational and rotational symmetry of our problem. A Featurizer transforms cartesian coordinates into other representations. Here we use the DihedralFeaturizer to turn our data into phi and psi dihedral angles. Observe that the 264*3-dimensional space is reduced to 84 dimensions.

In [4]:

# Remember '\' is the line-continuation marker
# You can enter this command on one line
! msmb DihedralFeaturizer \
    --out featurizer.pkl  \
    --transformed diheds  \
    --top fs_peptide/fs-peptide.pdb \
    --trjs "fs_peptide/*.xtc" \
    --stride 10

/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
DihedralFeaturizer(sincos=True, types=['phi', 'psi'])
/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/mdtraj/formats/pdb/pdbfile.py:196: UserWarning: Unlikely unit cell vectors detected in PDB file likely resulting from a dummy CRYST1 record. Discarding unit cell vectors.
  warnings.warn('Unlikely unit cell vectors detected in PDB file likely '


Saving transformed dataset to 'diheds/'
To load this dataset interactive inside an IPython
shell or notebook, run

  $ ipython
  >>> from msmbuilder.dataset import dataset
  >>> ds = dataset('diheds/')

Saving "featurizer.pkl"... (<class 'msmbuilder.featurizer.featurizer.DihedralFeaturizer'>)
To load this DihedralFeaturizer object interactively inside an IPython
shell or notebook, run: 

  $ ipython
  >>> from msmbuilder.utils import load
  >>> model = load('featurizer.pkl')

Preprocessing¶

Since the range of values in our raw data can vary widely from feature to feature, we can scale values to reduce bias. Here we use the RobustScaler to center and scale our dihedral angles by their respective interquartile ranges.

In [5]:

! msmb RobustScaler \
    -i diheds \
    --transformed scaled_diheds.h5

/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
RobustScaler(copy=True, quantile_range=(25.0, 75.0), with_centering=True,
       with_scaling=True)
Fitting model...
*********
*RESULTS*
*********
NotImplemented
--------------------------------------------------------------------------------


Saving transformed dataset to 'scaled_diheds.h5'
To load this dataset interactive inside an IPython
shell or notebook, run

  $ ipython
  >>> from msmbuilder.dataset import dataset
  >>> ds = dataset('scaled_diheds.h5')

Intermediate kinetic model: tICA¶

tICA is similar to principal component analysis (see "tICA vs. PCA" example). Note that the 84-dimensional space is reduced to 4 dimensions.

In [6]:

! msmb tICA -i scaled_diheds.h5 \
    --out tica_model.pkl \
    --transformed tica_trajs.h5 \
    --n_components 4 \
    --lag_time 2

/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
tICA(commute_mapping=False, kinetic_mapping=False, lag_time=2, n_components=4,
   shrinkage=None)
Fitting model...
*********
*RESULTS*
*********
time-structure based Independent Components Analysis (tICA)
-----------------------------------------------------------
n_components        : 4
shrinkage           : 0.0011725244155440773
lag_time            : 2
kinetic_mapping     : False

Top 5 timescales :
[ 288.73718985  158.84187276  133.35210395  129.88067026]

Top 5 eigenvalues :
[ 0.99309722  0.9874878   0.98511402  0.9847192 ]

--------------------------------------------------------------------------------


Saving transformed dataset to 'tica_trajs.h5'
To load this dataset interactive inside an IPython
shell or notebook, run

  $ ipython
  >>> from msmbuilder.dataset import dataset
  >>> ds = dataset('tica_trajs.h5')

Saving "tica_model.pkl"... (<class 'msmbuilder.decomposition.tica.tICA'>)
To load this tICA object interactively inside an IPython
shell or notebook, run: 

  $ ipython
  >>> from msmbuilder.utils import load
  >>> model = load('tica_model.pkl')

tICA Histogram¶

We can histogram our data projecting along the two slowest degrees of freedom (as found by tICA). You have to do this in a python script.

In [7]:

from msmbuilder.dataset import dataset
ds = dataset('tica_trajs.h5')

%matplotlib inline
import msmexplorer as msme
import numpy as np
txx = np.concatenate(ds)
_ = msme.plot_histogram(txx)

/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/seaborn/apionly.py:6: UserWarning: As seaborn no longer sets a default style on import, the seaborn.apionly module is deprecated. It will be removed in a future version.
  warnings.warn(msg, UserWarning)
/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)

Clustering¶

Conformations need to be clustered into states (sometimes written as microstates). We cluster based on the tICA projections to group conformations that interconvert rapidly. Note that we transform our trajectories from the 4-dimensional tICA space into a 1-dimensional cluster index.

In [8]:

! msmb MiniBatchKMeans -i tica_trajs.h5 \
    --transformed labeled_trajs.h5 \
    --out clusterer.pkl \
    --n_clusters 100 \
    --random_state 42

/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
MiniBatchKMeans(batch_size=100, compute_labels=True, init='k-means++',
        init_size=None, max_iter=100, max_no_improvement=10,
        n_clusters=100, n_init=3, random_state=42, reassignment_ratio=0.01,
        tol=0.0, verbose=0)
Fitting model...
*********
*RESULTS*
*********
NotImplemented
--------------------------------------------------------------------------------


Saving transformed dataset to 'labeled_trajs.h5'
To load this dataset interactive inside an IPython
shell or notebook, run

  $ ipython
  >>> from msmbuilder.dataset import dataset
  >>> ds = dataset('labeled_trajs.h5')

Saving "clusterer.pkl"... (<class 'msmbuilder.cluster.MiniBatchKMeans'>)
To load this MiniBatchKMeans object interactively inside an IPython
shell or notebook, run: 

  $ ipython
  >>> from msmbuilder.utils import load
  >>> model = load('clusterer.pkl')

MSM¶

We can construct an MSM from the labeled trajectories

In [9]:

! msmb MarkovStateModel -i labeled_trajs.h5 \
    --out msm.pkl \
    --lag_time 2

/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/travis/miniconda3/envs/docenv/lib/python3.6/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
MarkovStateModel(ergodic_cutoff='on', lag_time=2, n_timescales=None,
         prior_counts=0, reversible_type='mle', sliding_window=True,
         verbose=True)
MSM contains 1 strongly connected component above weight=0.50. Component 0 selected, with population 100.000000%
*********
*RESULTS*
*********
Markov state model
------------------
Lag time         : 2
Reversible type  : mle
Ergodic cutoff   : on
Prior counts     : 0

Number of states : 100
Number of nonzero entries in counts matrix : 841 (8.41%)
Nonzero counts matrix entries:
    Min.   : 0.5
    1st Qu.: 0.5
    Median : 2.0
    Mean   : 16.6
    3rd Qu.: 8.0
    Max.   : 378.5

Total transition counts :
    13972.0 counts
Total transition counts / lag_time:
    6986.0 units
Timescales:
    [1828.40, 1249.05, 572.64, 311.84, 304.83, 298.62, 229.76, 204.46, 139.46, 132.14, 118.04, 110.47, 105.30, 79.05, 69.94, 66.07, 61.18, 53.25, 48.55, 46.22, 42.69, 41.32, 40.70, 37.78, 32.36, 29.14, 28.35, 24.80, 23.65, 23.04, 21.88, 21.11, 19.13, 18.44, 17.77, 16.34, 15.73, 15.40, 14.70, 13.35, 12.97, 9.94, 9.92, 9.70, 9.09, 8.79, 8.69, 8.60, 8.18, 7.49, 7.08, 7.01, 6.87, 6.72, 6.49, 6.35, 6.22, 6.17, 5.89, 5.61, 5.56, 5.50, 5.33, 5.28, 5.23, 4.91, 4.81, 4.68, 4.60, 4.28, 4.26, 4.07, 4.05, 3.98, 3.71, 3.36, 3.27, 2.96, 2.63, 2.51, 2.47, 2.41, 2.28, 2.27, 2.18, 2.08, 2.07, 1.77, 1.74, 1.32, 1.20, 1.20, 1.18, 1.00, 0.98, 0.70, 0.59, 0.55, 0.36]  units

--------------------------------------------------------------------------------
Saving "msm.pkl"... (<class 'msmbuilder.msm.msm.MarkovStateModel'>)
To load this MarkovStateModel object interactively inside an IPython
shell or notebook, run: 

  $ ipython
  >>> from msmbuilder.utils import load
  >>> model = load('msm.pkl')

Plot Free Energy Landscape¶

Subsequent plotting and analysis should be done from Python

In [10]:

from msmbuilder.utils import load
msm = load('msm.pkl')
clusterer = load('clusterer.pkl')

assignments = clusterer.partial_transform(txx)
assignments = msm.partial_transform(assignments)

from matplotlib import pyplot as plt
msme.plot_free_energy(txx, obs=(0, 1), n_samples=10000,
                      pi=msm.populations_[assignments],
                      xlabel='tIC 1', ylabel='tIC 2')
plt.scatter(clusterer.cluster_centers_[msm.state_labels_, 0],
            clusterer.cluster_centers_[msm.state_labels_, 1],
            s=1e4 * msm.populations_,       # size by population
            c=msm.left_eigenvectors_[:, 1], # color by eigenvector
            cmap="coolwarm",
            zorder=3
           ) 
plt.colorbar(label='First dynamical eigenvector')
plt.tight_layout()

(Fs-Peptide-command-line.ipynb; Fs-Peptide-command-line.eval.ipynb; Fs-Peptide-command-line.py)