Beauchamp, K. A.; Bowman, G. R.; Lane, T. J.; Maibaum, L.; Haque, I. S.; Pande, V. S. J. Chem. Theory Comput. 2011, 7 3412-3419
Markov state models provide a framework for understanding the fundamental states and rates in the conformational dynamics of biomolecules. We describe an improved protocol for constructing Markov state models from molecular dynamics simulations. The new protocol includes advances in clustering , data preparation , and model estimation; these improvements lead to significant increases in model accuracy , as assessed by the ability to recapitulate equilibrium and kinetic properties of reference systems. A high-performance implementation of this protocol , provided in MSMBuilder2 , is validated on dynamics ranging from picoseconds to milliseconds.
Lane, T. J.; Bowman, G. R.; Beauchamp, K.; Voelz, V. A.; Pande, V. S. J. Am. Chem. Soc. 2011, 133 18413-18419
Two strategies have been recently employed to push molecular simulation to long , biologically relevant time scales: projection-based analysis of results from specialized hardware producing a small number of ultralong trajectories and the statistical interpretation of massive parallel sampling performed with Markov state models MSMs . Here , we assess the MSM as an analysis method by constructing a Markov model from ultralong trajectories , specifically two previously reported 100 $mu$s trajectories of the FiP35 WW domain Shaw , D. E. Science 2010 , 330 , 341-346 . We find that the MSM approach yields novel insights. It discovers new statistically significant folding pathways , in which either beta-hairpin of the WW domain can form first. The rates of this process approach experimental values in a direct quantitative comparison time scales of 5.0 $mu$ s and 100 ns , within a factor of $sim$2. Finally , the hub-like topology of the MSM and identification of a holo conformation predicts how WW domains may function through a conformational selection mechanism.
Beauchamp, K. A.; McGibbon, R.; Lin, Y.; Pande, V. S. Proc. Natl. Acad. Sci. U.S.A. 2012, 109 17807-17813
Markov state models constructed from molecular dynamics simulations have recently shown success at modeling protein folding kinetics. Here we introduce two methods , flux PCCA+ FPCCA+ and sliding constraint rate estimation SCRE , that allow accurate rate models from protein folding simulations. We apply these techniques to fourteen massive simulation datasets generated by Anton and Folding @ home. Our protocol quantitatively identifies the suitability of describing each system using two-state kinetics and predicts experimentally detectable deviations from two-state behavior. An analysis of the villin headpiece and FiP35 WW domain detects multiple native substates that are consistent with experimental data. Applying the same protocol to GTT , NTL9 , and protein G suggests that some beta containing proteins can form long-lived native-like states with small register shifts. Even the simplest protein systems show folding and functional dynamics involving three or more states.
Lin, Y.; Bowman, G. R.; Beauchamp, K. A.; Pande, V. S. Biophys. J. 2012, 102 315 - 324
The aggregation of amyloid beta Aβ peptides plays an important role in the development of Alzheimer s disease. Despite extensive effort , it has been difficult to characterize the secondary and tertiary structure of the Aβ monomer , the starting point for aggregation , due to its hydrophobicity and high aggregation propensity. Here , we employ extensive molecular dynamics simulations with atomistic protein and water models to determine structural ensembles for Aβ42 , Aβ40 , and Aβ42-E22K the Italian mutant monomers in solution. Sampling of a total of >700 microseconds in all-atom detail with explicit solvent enables us to observe the effects of peptide length and a pathogenic mutation on the disordered Aβ monomer structural ensemble. Aβ42 and Aβ40 have crudely similar characteristics but reducing the peptide length from 42 to 40 residues reduces β-hairpin formation near the C-terminus. The pathogenic Italian E22K mutation induces helix formation in the region of residues 20–24. This structural alteration may increase helix-helix interactions between monomers , resulting in altered mechanism and kinetics of Aβ oligomerization.
Bowman, G. R.; Geissler, P. L. Proc. Natl. Acad. Sci. U.S.A. 2012, 109 11681-11686
Cryptic allosteric sites—transient pockets in a folded protein that are invisible to conventional experiments but can alter enzymatic activity via allosteric communication with the active site—are a promising opportunity for facilitating drug design by greatly expanding the repertoire of available drug targets. Unfortunately , identifying these sites is difficult , typically requiring resource-intensive screening of large libraries of small molecules. Here , we demonstrate that Markov state models built from extensive computer simulations totaling hundreds of microseconds of dynamics can identify prospective cryptic sites from the equilibrium fluctuations of three medically relevant proteins—β-lactamase , interleukin-2 , and RNase H—even in the absence of any ligand. As in previous studies , our methods reveal a surprising variety of conformations—including bound-like configurations—that implies a role for conformational selection in ligand binding. Moreover , our analyses lead to a number of unique insights. First , direct comparison of simulations with and without the ligand reveals that there is still an important role for an induced fit during ligand binding to cryptic sites and suggests new conformations for docking. Second , correlations between amino acid sidechains can convey allosteric signals even in the absence of substantial backbone motions. Most importantly , our extensive sampling reveals a multitude of potential cryptic sites—consisting of transient pockets coupled to the active site—even in a single protein. Based on these observations , we propose that cryptic allosteric sites may be even more ubiquitous than previously thought and that our methods should be a valuable means of guiding the search for such sites.
Bowman, G. R. J. Chem. Phys. 2012, 137 -
Markov state models MSMs –or discrete-time master equation models–are a powerful way of modeling the structure and function of molecular systems like proteins. Unfortunately , MSMs with sufficiently many states to make a quantitative connection with experiments often tens of thousands of states even for small systems are generally too complicated to understand. Here , I present a Bayesian agglomerative clustering engine BACE for coarse-graining such Markovmodels , thereby reducing their complexity and making them more comprehensible. An important feature of this algorithm is its ability to explicitly account for statistical uncertainty in model parameters that arises from finite sampling. This advance builds on a number of recent works highlighting the importance of accounting for uncertainty in the analysis of MSMs and provides significant advantages over existing methods for coarse-graining Markov state models. The closed-form expression I derive here for determining which states to merge is equivalent to the generalized Jensen-Shannon divergence , an important measure from information theory that is related to the relative entropy. Therefore , the method has an appealing information theoretic interpretation in terms of minimizing information loss. The bottom-up nature of the algorithm likely makes it particularly well suited for constructing mesoscale models. I also present an extremely efficient expression for Bayesian model comparison that can be used to identify the most meaningful levels of the hierarchy of models from BACE.
Dickson, A.; Brooks, C. L. J. Chem. Theory Comput. 2012, 8 3044-3052
The free energy landscape of a protein is a function of many interdependent degrees of freedom. For this reason , conceptual constructs e.g. , funnels have been useful to visualize these landscapes. One relatively new construct is the idea of a hub-like native state that is the final destination of many noninterconverting folding pathways. This is in contrast to the idea of a single predominant folding pathway connecting the native state to a rapidly interconverting ensemble of unfolded states. The key quantity to distinguish between these two ideas is the connectivity of the unfolded ensemble. We present a metric to determine this connectivity for a given network , which can be calculated either from continuous folding trajectories , or a Markov model. The metric determines how often a region of space is used as an intermediate on transition paths that connect two other regions of space , and we use it here to determine how often two parts of the unfolded ensemble are connected directly versus how often these transitions are mediated by the native state.
Dickson, A.; Brooks, C. L. J. Am. Chem. Soc. 2013, 135 4729-4734
It has been suggested that the native state of a protein acts as a kinetic hub that can facilitate transitions between nonnative states. Using recently developed tools to quantify mediation probabilities “hub scores” , we quantify hub-like behavior in atomic resolution trajectories for the first time. We use a data set of trajectory ensembles for 12 fast-folding proteins previously published by D. E. Shaw Research Lindorff-Larsen , K.; et al. How Fast-Folding Proteins Fold. Science2011 , 334 , 517 with an aggregate simulation time of over 8.2 ms. We visualize the free-energy landscape of each molecule using configuration space networks , and show that dynamic quantities can be qualitatively understood from visual inspection of the networks. Modularity optimization is used to provide a parameter-free means of tessellating the network into a group of communities. Using hub scores , we find that the percentage of trajectories that are mediated by the native state is 31% when averaged over all molecules , and reaches a maximum of 52% for the homeodomain and chignolin. Furthermore , for these mediated transitions , we use Markov models to determine whether the native state acts as a facilitator for the transition , or as a trap i.e. , an off-pathway detour . Although instances of facilitation are found in 4 of the 12 molecules , we conclude that the native state acts primarily as a trap , which is consistent with the idea of a funnel-like landscape.
Yao, Y.; Cui, R. Z.; Bowman, G. R.; Silva, D.; Sun, J.; Huang, X. J. Chem. Phys. 2013, 138 -
Markov state models MSMs have become a popular approach for investigating the conformational dynamics of proteins and other biomolecules. MSMs are typically built from numerous molecular dynamics simulations by dividing the sampled configurations into a large number of microstates based on geometric criteria. The resulting microstate model can then be coarse-grained into a more understandable macrostate model by lumping together rapidly mixing microstates into larger , metastable aggregates. However , finite sampling often results in the creation of many poorly sampled microstates. During coarse-graining , these states are mistakenly identified as being kinetically important because transitions to/from them appear to be slow. In this paper , we propose a formalism based on an algebraic principle for matrix approximation , i.e. , the Nyström method , to deal with such poorly sampled microstates. Our scheme builds a hierarchy of microstates from high to low populations and progressively applies spectral clustering on sets of microstates within each level of the hierarchy. It helps spectral clustering identify metastable aggregates with highly populated microstates rather than being distracted by lowly populated states. We demonstrate the ability of this algorithm to discover the major metastable states on two model systems , the alanine dipeptide and trpzip2 peptide.
Bowman, G. R.; Meng, L.; Huang, X. J. Chem. Phys. 2013, 139 -
Markov models and master equations are a powerful means of modeling dynamic processes like protein conformational changes. However , these models are often difficult to understand because of the enormous number of components and connections between them. Therefore , a variety of methods have been developed to facilitate understanding by coarse-graining these complex models. Here , we employ Bayesian model comparison to determine which of these coarse-graining methods provides the models that are most faithful to the original set of states. We find that the Bayesian agglomerative clustering engine and the hierarchical Nyström expansion graph HNEG typically provide the best performance. Surprisingly , the original Perron cluster cluster analysis PCCA method often provides the next best results , outperforming the newer PCCA+ method and the most probable paths algorithm. We also show that the differences between the models are qualitatively significant , rather than being minor shifts in the boundaries between states. The performance of the methods correlates well with the entropy of the resulting coarse-grainings , suggesting that finding states with more similar populations i.e. , avoiding low population states that may just be noise gives better results.
McGibbon, R. T.; Pande, V. S. J. Chem. Theory Comput. 2013, 9 2900-2906
Statistical modeling of long timescale dynamics with Markov state models MSMs has been shown to be an effective strategy for building quantitative and qualitative insight into protein folding processes. Existing methodologies , however , rely on geometric clustering using distance metrics such as root mean square deviation RMSD , assuming that geometric similarity provides an adequate basis for the kinetic partitioning of phase space. Here , inspired by advances in the machine learning community , we introduce a new approach for learning a distance metric explicitly constructed to model kinetic similarity. This approach enables the construction of models , especially in the regime of high anisotropy in the diffusion constant , with fewer states than was previously possible. Application of this technique to the analysis of two ultralong molecular dynamics simulations of the FiP35 WW domain identifies discrete near-native relaxation dynamics in the millisecond regime that were not resolved in previous analyses.
Levy, R. M.; Dai, W.; Deng, N.; Makarov, D. E. Protein Sci. 2013, 22 1459-1465
How long does it take to equilibrate the unfolded state of a protein? The answer to this question has important implications for our understanding of why many small proteins fold with two state kinetics. When the equilibration within the unfolded state U is much faster than the folding , the folding kinetics will be two state even if there are many folding pathways with different barriers. Yet the mean first passage times MFPTs between different regions of the unfolded state can be much longer than the folding time. This seems to imply that the equilibration within U is much slower than the folding. In this communication we resolve this paradox. We present a formula for estimating the time to equilibrate the unfolded state of a protein. We also present a formula for the MFPT to any state within U , which is proportional to the average lifetime of that state divided by the state population. This relation is valid when the equilibration within U is very fast as compared with folding as it often is for small proteins. To illustrate the concepts , we apply the formulas to estimate the time to equilibrate the unfolded state of Trp-cage and MFPTs within the unfolded state based on a Markov State Model using an ultra-long 208 microsecond trajectory of the miniprotein to parameterize the model. The time to equilibrate the unfolded state of Trp-cage is $sim$ 100 ns while the typical MFPTs within U are tens of microseconds or longer.
Chang, H.; Bacallado, S.; Pande, V. S.; Carlsson, G. E. PLoS ONE 2013, 8 e58699
The large amount of molecular dynamics simulation data produced by modern computational models brings big opportunities and challenges to researchers. Clustering algorithms play an important role in understanding biomolecular kinetics from the simulation data , especially under the Markov state model framework. However , the ruggedness of the free energy landscape in a biomolecular system makes common clustering algorithms very sensitive to perturbations of the data. Here , we introduce a data-exploratory tool which provides an overview of the clustering structure under different parameters. The proposed Multi-Persistent Clustering analysis combines insights from recent studies on the dynamics of systems with dominant metastable states with the concept of multi-dimensional persistence in computational topology. We propose to explore the clustering structure of the data based on its persistence on scale and density. The analysis provides a systematic way to discover clusters that are robust to perturbations of the data. The dominant states of the system can be chosen with confidence. For the clusters on the borderline , the user can choose to do more simulation or make a decision based on their structural characteristics. Furthermore , our multi-resolution analysis gives users information about the relative potential of the clusters and their hierarchical relationship. The effectiveness of the proposed method is illustrated in three biomolecules: alanine dipeptide , Villin headpiece , and the FiP35 WW domain.
Singh, G.; Tieleman, D. Peter J. Chem. Theory Comput. 2013, 9 1657-1666
Wimley–White pentapeptides Ac-WLXLL can be used as a model system to study lipid–protein interactions as they bind to lipid/water interfaces , like many antimicrobial peptides , and thermodynamic experimental data on their interactions with lipids are available , making them useful for both force field and method testing and development. Here we present a detailed simulation study of Wimley–White WW peptides in bulk water to investigate sampling , conformations , and differences due to the different X residue with an eye to future simulations at the lipid/water interface where sampling problems so far have hindered free energy calculations to reproduce the experimental thermodynamic data. We investigate the conformational preferences and slowest relaxation time of WW peptides in bulk water by building Markov State Models MSM from Molecular Dynamics MD simulation data. We show that clustering based on binning of backbone $phi$ , $psi$ dihedrals in combination with the community detection algorithm of Blondel et al. provides a quick way of building MSM from large data sets. Our results show that in some cases , implied times even in these small peptides range from 224 to 547 ns. The implications of these slow transitions on determining the potential of mean force profiles of peptide interactions with a lipid bilayer are discussed.
Kohlhoff, K.; Shukla, D.; Lawrenz, M.; Bowman, G.; Konerding, D.; Belov, D.; Altman, R.; Pande, V. Nat. Chem. 2014, 6 15–21
Simulations can provide tremendous insight into the atomistic details of biological mechanisms , but micro- to millisecond timescales are historically only accessible on dedicated supercomputers. We demonstrate that cloud computing is a viable alternative that brings long-timescale processes within reach of a broader community. We used Google s Exacycle cloud-computing platform to simulate two milliseconds of dynamics of a major drug target , the G-protein-coupled receptor β2AR. Markov state models aggregate independent simulations into a single statistical model that is validated by previous computational and experimental results. Moreover , our models provide an atomistic description of the activation of a G-protein-coupled receptor and reveal multiple activation pathways. Agonists and inverse agonists interact differentially with these pathways , with profound implications for drug design.
Jimenez-Cruz, C. A.; Garcia, A. E. Phys. Chem. Chem. Phys. 2014, -
Equilibrium thermodynamics of a short beta-hairpin are studied using unbiased all-atom replica exchange molecular dynamics simulations in explicit solvent. An exploratory analysis of the free energy landscape of the system is provided in terms of various structural characteristics , for both the folded and unfolded ensembles. We find that the favorable interactions between the ends introduced by the tryptophan cap , along with the flexibility of the turn region , explain the remarkable stability of the folded state. Charging of the N termini results in effective roughening of the free energy landscape and stabilization of non-native contacts. Folding-unfolding dynamics are further discussed using a set of 2413 independent molecular dynamics simulations , 2 ns to 20 ns long , at the melting temperature of the beta-hairpin. A novel method for the construction of Markov models consisting of an iterative refinement of the discretization in reduced dimensionality is presented and used to generate a detailed kinetic network of the system. The hairpin is found to fold heterogeneously on sub-microsecond timescales , with the relative position of the tryptophan side chains driving the selection of the specific pathway.