tICA and PCAΒΆ

tica-example_evaluated

This example compares two methods for dimensionality reduction: tICA and PCA.

In [1]:
from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
import simtk.openmm as mm
from msmbuilder.decomposition import tICA, PCA
/Users/rmcgibbo/miniconda/envs/3.4.2/lib/python3.4/site-packages/sklearn/externals/joblib/_multiprocessing_helpers.py:29: UserWarning: [Errno 28] No space left on device.  joblib will operate in serial mode
  warnings.warn('%s.  joblib will operate in serial mode' % (e,))

First, let's use OpenMM to run some dynamics on the 3D potential energy function

\[E(x,y,z) = 5 \cdot (x-1)^2 \cdot (x+1)^2 + y^2 + z^2\]

From looking at this equation, we can see that along the \(x\) dimension, the potential is a double-well, whereas along the \(y\) and \(z\) dimensions, we've just got a harmonic potential. So, we should expect that \(x\) is the slow degree of freedom, whereas the system should equilibrate rapidly along \(y\) and \(z\).

In [2]:
def propagate(n_steps=10000):
    "Simulate some dynamics"
    system = mm.System()
    system.addParticle(1)
    force = mm.CustomExternalForce('5*(x-1)^2*(x+1)^2 + y^2 + z^2')
    force.addParticle(0, [])
    system.addForce(force)
    integrator = mm.LangevinIntegrator(500, 1, 0.02)
    context = mm.Context(system, integrator)
    context.setPositions([[0, 0, 0]])
    context.setVelocitiesToTemperature(500)
    x = np.zeros((n_steps, 3))
    for i in range(n_steps):
        x[i] = context.getState(getPositions=True).getPositions(asNumpy=True)._value
        integrator.step(1)
    return x

Okay, let's run the dynamics. The first plot below shows the \(x\), \(y\) and \(z\) coordinate vs. time for the trajectory, and the second plot shows each of the 1D and 2D marginal distributions.

In [3]:
trajectory = propagate(10000)

ylabels = ['x', 'y', 'z']
for i in range(3):
    plt.subplot(3, 1, i+1)
    plt.plot(trajectory[:, i])
    plt.ylabel(ylabels[i])
plt.xlabel('Simulation time')
plt.show()

Note that the variance of \(x\) is much lower than the variance in \(y\) or \(z\), despite it's bi-modal distribution.

In [4]:
# fit the two models
tica = tICA(n_components=1, lag_time=100)
pca = PCA(n_components=1)
tica.fit([trajectory])
pca.fit([trajectory])
Out[4]:
PCA(copy=True, n_components=1, whiten=False)
In [5]:
plt.subplot(1,2,1)
plt.title('1st tIC')
plt.bar([1,2,3], tica.components_[0], color='b')
plt.xticks([1.5,2.5,3.5], ['x', 'y', 'z'])
plt.subplot(1,2,2)
plt.title('1st PC')
plt.bar([1,2,3], pca.components_[0], color='r')
plt.xticks([1.5,2.5,3.5], ['x', 'y', 'z'])
plt.show()

print('1st tIC', tica.components_ / np.linalg.norm(tica.components_))
print('1st PC ', pca.components_ / np.linalg.norm(pca.components_))
1st tIC [[ 0.99485814 -0.09569917  0.03315053]]
1st PC  [[ 0.01413359 -0.76312039  0.64610179]]

Note that the first tIC "finds" a projection that just resolves the \(x\) coordinate, whereas PCA doesn't.

(tica-example.ipynb; tica-example_evaluated.ipynb; tica-example.py)

Versions