tICA and PCA¶

tica-example_evaluated

This example compares two methods for dimensionality reduction: tICA and PCA.

In [1]:

%matplotlib inline
from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
import simtk.openmm as mm
from msmbuilder.decomposition import tICA, PCA

loading from /home/rmcgibbo/miniconda/lib/plugins

First, let's use OpenMM to run some dynamics on the 3D potential energy function

$$E(x,y,z) = 5 \cdot (x-1)^2 \cdot (x+1)^2 + y^2 + z^2$$

From looking at this equation, we can see that along the $x$ dimension, the potential is a double-well, whereas along the $y$ and $z$ dimensions, we've just got a harmonic potential. So, we should expect that $x$ is the slow degree of freedom, whereas the system should equilibrate rapidly along $y$ and $z$.

In [2]:

def propagate(n_steps=10000):
    "Simulate some dynamics"
    system = mm.System()
    system.addParticle(1)
    force = mm.CustomExternalForce('5*(x-1)^2*(x+1)^2 + y^2 + z^2')
    force.addParticle(0, [])
    system.addForce(force)
    integrator = mm.LangevinIntegrator(500, 1, 0.02)
    context = mm.Context(system, integrator)
    context.setPositions([[0, 0, 0]])
    context.setVelocitiesToTemperature(500)
    x = np.zeros((n_steps, 3))
    for i in range(n_steps):
        x[i] = context.getState(getPositions=True).getPositions(asNumpy=True)._value
        integrator.step(1)
    return x

Okay, let's run the dynamics. The first plot below shows the $x$, $y$ and $z$ coordinate vs. time for the trajectory, and the second plot shows each of the 1D and 2D marginal distributions.

In [3]:

trajectory = propagate(10000)

ylabels = ['x', 'y', 'z']
for i in range(3):
    plt.subplot(3, 1, i+1)
    plt.plot(trajectory[:, i])
    plt.ylabel(ylabels[i])
plt.xlabel('Simulation time')
plt.show()

Note that the variance of $x$ is much lower than the variance in $y$ or $z$, despite it's bi-modal distribution.

In [4]:

# fit the two models
tica = tICA(n_components=1, lag_time=100)
pca = PCA(n_components=1)
tica.fit([trajectory])
pca.fit([trajectory])

Out[4]:

PCA(copy=True, n_components=1, whiten=False)

In [5]:

plt.subplot(1,2,1)
plt.title('1st tIC')
plt.bar([1,2,3], tica.components_[0], color='b')
plt.xticks([1.5,2.5,3.5], ['x', 'y', 'z'])
plt.subplot(1,2,2)
plt.title('1st PC')
plt.bar([1,2,3], pca.components_[0], color='r')
plt.xticks([1.5,2.5,3.5], ['x', 'y', 'z'])
plt.show()

print('1st tIC', tica.components_ / np.linalg.norm(tica.components_))
print('1st PC ', pca.components_ / np.linalg.norm(pca.components_))

1st tIC [[ 0.99806166  0.01365976  0.06071511]]
1st PC  [[-0.02702715 -0.99072435 -0.13317207]]

Note that the first tIC "finds" a projection that just resolves the $x$ coordinate, whereas PCA doesn't.

(tica-example.ipynb; tica-example_evaluated.ipynb; tica-example.py)