HMM and MSM Timescales for Ala2¶

hmm-and-msm_evaluated

This example builds HMM and MSMs on the alanine_dipeptide dataset using varing lag times and numbers of states, and compares the relaxation timescales

In [1]:

from __future__ import print_function
import os
from matplotlib.pyplot import *
from msmbuilder.featurizer import SuperposeFeaturizer
from msmbuilder.example_datasets import AlanineDipeptide
from msmbuilder.hmm import GaussianFusionHMM
from msmbuilder.cluster import KCenters
from msmbuilder.msm import MarkovStateModel

First: load and "featurize"

Featurization refers to the process of converting the conformational snapshots from your MD trajectories into vectors in some space \(\mathbb{R}^N\) that can be manipulated and modeled by subsequent analyses. The Gaussian HMM, for instance, uses Gaussian emission distributions, so it models the trajectory as a time-dependent mixture of multivariate Gaussians.

In general, the featurization is somewhat of an art. For this example, we're using Mixtape's SuperposeFeaturizer, which superposes each snapshot onto a reference frame (trajectories[0][0] in this example), and then measure the distance from each atom to its position in the reference conformation as the 'feature'

In [2]:

print(AlanineDipeptide.description())

dataset = AlanineDipeptide().get()
trajectories = dataset.trajectories
topology = trajectories[0].topology

indices = [atom.index for atom in topology.atoms if atom.element.symbol in ['C', 'O', 'N']]
featurizer = SuperposeFeaturizer(indices, trajectories[0][0])
sequences = featurizer.transform(trajectories)

The dataset consists of ten 10ns trajectories of of alanine dipeptide,
simulated using OpenMM 6.0.1 (CUDA platform, NVIDIA GTX660) with the
AMBER99SB-ILDN force field at 300K (langevin dynamics, friction coefficient
of 91/ps, timestep of 2fs) with GBSA implicit solvent. The coordinates are
saved every 1ps. Each trajectory contains 9,999 snapshots.

The dataset, including the script used to generate the dataset
is available on figshare at

http://dx.doi.org/10.6084/m9.figshare.1026131

Now `sequences` is our featurized data.

In [3]:

lag_times = [1, 10, 20, 30, 40]
hmm_ts0 = {}
hmm_ts1 = {}
n_states = [3, 5]

for n in n_states:
    hmm_ts0[n] = []
    hmm_ts1[n] = []
    for lag_time in lag_times:
        strided_data = [s[i::lag_time] for s in sequences for i in range(lag_time)]
        hmm = GaussianFusionHMM(n_states=n, n_features=sequences[0].shape[1], n_init=1).fit(strided_data)
        timescales = hmm.timescales_ * lag_time
        hmm_ts0[n].append(timescales[0])
        hmm_ts1[n].append(timescales[1])
        print('n_states=%d\tlag_time=%d\ttimescales=%s' % (n, lag_time, timescales))
    print()

n_states=3	lag_time=1	timescales=[ 125.98809052    3.76098752]
n_states=3	lag_time=10	timescales=[ 208.22198486    5.93702745]
n_states=3	lag_time=20	timescales=[ 221.63017273    3.20434332]
n_states=3	lag_time=30	timescales=[ 227.0619812     7.76616001]
n_states=3	lag_time=40	timescales=[ 230.05653381    8.96204567]

n_states=5	lag_time=1	timescales=[ 127.13710022    3.75455594    2.01012301    0.77447289]
n_states=5	lag_time=10	timescales=[ 197.96842957    5.93466234    3.34466958    2.22226715]
n_states=5	lag_time=20	timescales=[ 222.17843628    6.30546951    3.49593163    3.40618801]
n_states=5	lag_time=30	timescales=[ 227.09893799    7.31305552    6.26924467]
n_states=5	lag_time=40	timescales=[ 230.46333313    9.10867405    8.51679039    6.68550062]

/Users/rmcgibbo/miniconda/envs/3.4.2/lib/python3.4/site-packages/msmbuilder-3.0.0-py3.4-macosx-10.5-x86_64.egg/msmbuilder/hmm/ghmm.py:390: RuntimeWarning: divide by zero encountered in log
  self._impl.transmat_ = value

In [4]:

figure(figsize=(14,3))

for i, n in enumerate(n_states):
    subplot(1,len(n_states),1+i)
    plot(lag_times, hmm_ts0[n])
    plot(lag_times, hmm_ts1[n])
    if i == 0:
        ylabel('Relaxation Timescale')
    xlabel('Lag Time')
    title('%d states' % n)

show()

In [5]:

msmts0, msmts1 = {}, {}
lag_times = [1, 10, 20, 30, 40]
n_states = [4, 8, 16, 32, 64]

for n in n_states:
    msmts0[n] = []
    msmts1[n] = []
    for lag_time in lag_times:
        assignments = KCenters(n_clusters=n).fit_predict(sequences)
        msm = MarkovStateModel(lag_time=lag_time, verbose=False).fit(assignments)
        timescales = msm.timescales_
        msmts0[n].append(timescales[0])
        msmts1[n].append(timescales[1])
        print('n_states=%d\tlag_time=%d\ttimescales=%s' % (n, lag_time, timescales[0:2]))
    print()

n_states=4	lag_time=1	timescales=[ 81.29685156   2.65953513]
n_states=4	lag_time=10	timescales=[ 173.53877649    5.53804502]
n_states=4	lag_time=20	timescales=[ 201.60073711    6.23725672]
n_states=4	lag_time=30	timescales=[ 190.49916684    6.84261968]
n_states=4	lag_time=40	timescales=[ 221.75135601    8.32955704]

n_states=8	lag_time=1	timescales=[ 74.52242143   1.80979293]
n_states=8	lag_time=10	timescales=[ 188.89343515    5.93226099]
n_states=8	lag_time=20	timescales=[ 207.74374767    6.01933781]
n_states=8	lag_time=30	timescales=[ 202.25543238    7.45125444]
n_states=8	lag_time=40	timescales=[ 211.99218114    9.42052073]

n_states=16	lag_time=1	timescales=[ 97.96595787   3.74447917]
n_states=16	lag_time=10	timescales=[ 200.87875877    6.2085573 ]
n_states=16	lag_time=20	timescales=[ 221.63539564    6.42614112]
n_states=16	lag_time=30	timescales=[ 230.76252185    7.84194881]
n_states=16	lag_time=40	timescales=[ 227.6310456    10.53777119]

n_states=32	lag_time=1	timescales=[ 120.5183706     3.74518508]
n_states=32	lag_time=10	timescales=[ 218.68101579    6.36109923]
n_states=32	lag_time=20	timescales=[ 229.78045034    6.55422503]
n_states=32	lag_time=30	timescales=[ 230.29874027    8.41109618]
n_states=32	lag_time=40	timescales=[ 234.68154656   11.04351931]

n_states=64	lag_time=1	timescales=[ 154.29561857    4.99679298]
n_states=64	lag_time=10	timescales=[ 229.17859219    6.47666357]
n_states=64	lag_time=20	timescales=[ 232.73818779    6.66093186]
n_states=64	lag_time=30	timescales=[ 235.36208971    9.33085304]
n_states=64	lag_time=40	timescales=[ 237.88383513   12.83043733]

In [6]:

figure(figsize=(14,3))

for i, n in enumerate(n_states):
    subplot(1,len(n_states),1+i)
    plot(lag_times, msmts0[n])
    plot(lag_times, msmts1[n])
    if i == 0:
        ylabel('Relaxation Timescale')
    xlabel('Lag Time')
    title('%d states' % n)

show()

(hmm-and-msm.ipynb; hmm-and-msm_evaluated.ipynb; hmm-and-msm.py)

HMM and MSM Timescales for Ala2¶

First: load and "featurize"

Now sequences is our featurized data.

Now `sequences` is our featurized data.