-
Nutmeg and SPICE: Models and Data for Biomolecular Machine Learning
Authors:
Peter Eastman,
Benjamin P. Pritchard,
John D. Chodera,
Thomas E. Markland
Abstract:
We describe version 2 of the SPICE dataset, a collection of quantum chemistry calculations for training machine learning potentials. It expands on the original dataset by adding much more sampling of chemical space and more data on non-covalent interactions. We train a set of potential energy functions called Nutmeg on it. They use a novel mechanism to improve performance on charged and polar molecules, injecting precomputed partial charges into the model to provide a reference for the large scale charge distribution. Evaluation of the new models shows they do an excellent job of reproducing energy differences between conformations, even on highly charged molecules or ones that are significantly larger than the molecules in the training set. They also produce stable molecular dynamics trajectories, and are fast enough to be useful for routine simulation of small molecules.
Submitted 18 June, 2024;
originally announced June 2024.
-
Enhancing Protein-Ligand Binding Affinity Predictions using Neural Network Potentials
Authors:
Francesc Sabanes Zariquiey,
Raimondas Galvelis,
Emilio Gallicchio,
John D. Chodera,
Thomas E. Markland,
Gianni de Fabritiis
Abstract:
This letter gives results on improving protein-ligand binding affinity predictions based on molecular dynamics simulations using machine learning potentials with a hybrid neural network potential and molecular mechanics methodology (NNP/MM). We compute relative binding free energies (RBFE) with the Alchemical Transfer Method (ATM) and validate its performance against established benchmarks and find significant enhancements compared to conventional MM force fields like GAFF2.
Submitted 14 February, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
OpenMM 8: Molecular Dynamics Simulation with Machine Learning Potentials
Authors:
Peter Eastman,
Raimondas Galvelis,
Raúl P. Peláez,
Charlles R. A. Abreu,
Stephen E. Farr,
Emilio Gallicchio,
Anton Gorenko,
Michael M. Henry,
Frank Hu,
Jing Huang,
Andreas Krämer,
Julien Michel,
Joshua A. Mitchell,
Vijay S. Pande,
João PGLM Rodrigues,
Jaime Rodriguez-Guerra,
Andrew C. Simmonett,
Sukrit Singh,
Jason Swails,
Philip Turner,
Yuanqing Wang,
Ivy Zhang,
John D. Chodera,
Gianni De Fabritiis,
Thomas E. Markland
Abstract:
Machine learning plays an important and growing role in molecular simulation. The newest version of the OpenMM molecular dynamics toolkit introduces new features to support the use of machine learning potentials. Arbitrary PyTorch models can be added to a simulation and used to compute forces and energy. A higher-level interface allows users to easily model their molecules of interest with general purpose, pretrained potential functions. A collection of optimized CUDA kernels and custom PyTorch operations greatly improves the speed of simulations. We demonstrate these features on simulations of cyclin-dependent kinase 8 (CDK8) and the green fluorescent protein (GFP) chromophore in water. Taken together, these features make it practical to use machine learning to improve the accuracy of simulations at only a modest increase in cost.
Submitted 29 November, 2023; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Machine-learned molecular mechanics force field for the simulation of protein-ligand systems and beyond
Authors:
Kenichiro Takaba,
Iván Pulido,
Pavan Kumar Behara,
Chapin E. Cavender,
Anika J. Friedman,
Michael M. Henry,
Hugo MacDermott Opeskin,
Christopher R. Iacovella,
Arnav M. Nagle,
Alexander Matthew Payne,
Michael R. Shirts,
David L. Mobley,
John D. Chodera,
Yuanqing Wang
Abstract:
The development of reliable and extensible molecular mechanics (MM) force fields -- fast, empirical models characterizing the potential energy surface of molecular systems -- is indispensable for biomolecular simulation and computer-aided drug design. Here, we introduce a generalized and extensible machine-learned MM force field, \texttt{espaloma-0.3}, and an end-to-end differentiable framework using graph neural networks to overcome the limitations of traditional rule-based methods. Trained in a single GPU-day to fit a large and diverse quantum chemical dataset of over 1.1M energy and force calculations, \texttt{espaloma-0.3} reproduces quantum chemical energetic properties of chemical domains highly relevant to drug discovery, including small molecules, peptides, and nucleic acids. Moreover, this force field maintains the quantum chemical energy-minimized geometries of small molecules and preserves the condensed phase properties of peptides, self-consistently parametrizing proteins and ligands to produce stable simulations leading to highly accurate predictions of binding free energies. This methodology demonstrates significant promise as a path forward for systematically building more accurate force fields that are easily extensible to new chemical domains of interest.
Submitted 8 December, 2023; v1 submitted 13 July, 2023;
originally announced July 2023.
-
EspalomaCharge: Machine learning-enabled ultra-fast partial charge assignment
Authors:
Yuanqing Wang,
Iván Pulido,
Kenichiro Takaba,
Benjamin Kaminow,
Jenke Scheen,
Lily Wang,
John D. Chodera
Abstract:
Atomic partial charges are crucial parameters in molecular dynamics (MD) simulation, dictating the electrostatic contributions to intermolecular energies, and thereby the potential energy landscape. Traditionally, the assignment of partial charges has relied on surrogates of \textit{ab initio} semiempirical quantum chemical methods such as AM1-BCC, and is expensive for large systems or large numbers of molecules. We propose a hybrid physical / graph neural network-based approximation to the widely popular AM1-BCC charge model that is orders of magnitude faster while maintaining accuracy comparable to differences in AM1-BCC implementations. Our hybrid approach couples a graph neural network to a streamlined charge equilibration approach in order to predict molecule-specific atomic electronegativity and hardness parameters, followed by analytical determination of optimal charge-equilibrated parameters that preserves total molecular charge. This hybrid approach scales linearly with the number of atoms, enabling, for the first time, the use of fully consistent charge models for small molecules and biopolymers for the construction of next-generation self-consistent biomolecular force fields. Implemented in the free and open source package \texttt{espaloma\_charge}, this approach provides drop-in replacements for both AmberTools \texttt{antechamber} and the Open Force Field Toolkit charging workflows, in addition to stand-alone charge generation interfaces. Source code is available at \url{https://github.com/choderalab/espaloma_charge}.
Submitted 16 February, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
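
The charge-equilibration step described in the EspalomaCharge abstract above can be illustrated with a small worked example. This is a minimal sketch, not the `espaloma_charge` implementation: it assumes the standard second-order charge-equilibration energy, sum_i (e_i q_i + 0.5 s_i q_i^2), where e_i is the electronegativity and s_i the hardness of atom i, and solves for the charges analytically under the constraint that they sum to the total molecular charge. The function name `equilibrate_charges` and the numeric parameters are hypothetical; in the paper's approach the per-atom parameters would come from a graph neural network rather than being typed in by hand.

```python
def equilibrate_charges(electronegativity, hardness, total_charge=0.0):
    """Analytically minimize sum_i (e_i*q_i + 0.5*s_i*q_i**2)
    subject to sum_i q_i == total_charge.

    Setting the derivative of the Lagrangian to zero gives
    e_i + s_i*q_i = lam for every atom, so q_i = (lam - e_i) / s_i,
    and the constraint fixes the multiplier lam.
    """
    if len(electronegativity) != len(hardness):
        raise ValueError("need one electronegativity and one hardness per atom")
    if any(s <= 0 for s in hardness):
        raise ValueError("hardness must be positive for a unique minimum")

    inv_s = [1.0 / s for s in hardness]
    e_over_s = [e / s for e, s in zip(electronegativity, hardness)]

    # Lagrange multiplier that enforces the total-charge constraint.
    lam = (total_charge + sum(e_over_s)) / sum(inv_s)

    return [lam * w - r for w, r in zip(inv_s, e_over_s)]


# Hypothetical parameters for a three-atom anion (arbitrary units).
charges = equilibrate_charges([0.3, -0.1, 0.2], [1.0, 2.0, 1.5], total_charge=-1.0)
print(charges, sum(charges))
```

Because the solution is closed-form, its cost grows linearly with the number of atoms, which is consistent with the scaling claimed in the abstract.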
-
Spatial Attention Kinetic Networks with E(n)-Equivariance
Authors:
Yuanqing Wang,
John D. Chodera
Abstract:
Neural networks that are equivariant to rotations, translations, reflections, and permutations on n-dimensional geometric space have shown promise in physical modeling for tasks ranging from accurately but inexpensively modeling complex potential energy surfaces to guiding the sampling of complex dynamical systems or forecasting their time evolution. Current state-of-the-art methods employ spherical harmonics to encode higher-order interactions among particles, which are computationally expensive. In this paper, we propose a simple alternative functional form that uses neurally parametrized linear combinations of edge vectors to achieve equivariance while still universally approximating node environments. Incorporating this insight, we design spatial attention kinetic networks with E(n)-equivariance, or SAKE, which are competitive in many-body system modeling tasks while being significantly faster.
Submitted 24 January, 2023; v1 submitted 21 January, 2023;
originally announced January 2023.
-
SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials
Authors:
Peter Eastman,
Pavan Kumar Behara,
David L. Dotson,
Raimondas Galvelis,
John E. Herr,
Josh T. Horton,
Yuezhi Mao,
John D. Chodera,
Benjamin P. Pritchard,
Yuanqing Wang,
Gianni De Fabritiis,
Thomas E. Markland
Abstract:
Machine learning potentials are an important tool for molecular simulation, but their development is held back by a shortage of high quality datasets to train them on. We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. It provides both forces and energies calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space. It can serve as a valuable resource for the creation of transferable, ready to use potential functions for use in molecular simulations.
Submitted 23 November, 2022; v1 submitted 21 September, 2022;
originally announced September 2022.
-
NNP/MM: Accelerating molecular dynamics simulations with machine learning potentials and molecular mechanics
Authors:
Raimondas Galvelis,
Alejandro Varela-Rial,
Stefan Doerr,
Roberto Fino,
Peter Eastman,
Thomas E. Markland,
John D. Chodera,
Gianni De Fabritiis
Abstract:
Machine learning potentials have emerged as a means to enhance the accuracy of biomolecular simulations. However, their application is constrained by the significant computational cost arising from the vast number of parameters compared to traditional molecular mechanics. To tackle this issue, we introduce an optimized implementation of the hybrid method (NNP/MM), which combines neural network potentials (NNP) and molecular mechanics (MM). This approach models a portion of the system, such as a small molecule, using NNP while employing MM for the remaining system to boost efficiency. By conducting molecular dynamics (MD) simulations on various protein-ligand complexes and metadynamics (MTD) simulations on a ligand, we showcase the capabilities of our implementation of NNP/MM. It has enabled us to increase the simulation speed by 5 times and achieve a combined sampling of one microsecond for each complex, marking the longest simulations ever reported for this class of simulation.
Submitted 28 August, 2023; v1 submitted 20 January, 2022;
originally announced January 2022.
-
Bayesian inference-driven model parameterization and model selection for 2CLJQ fluid models
Authors:
Owen C. Madin,
Simon Boothroyd,
Richard A. Messerly,
John D. Chodera,
Josh Fass,
Michael R. Shirts
Abstract:
A high level of physical detail in a molecular model improves its ability to perform high accuracy simulations, but can also significantly affect its complexity and computational cost. In some situations, it is worthwhile to add additional complexity to a model to capture properties of interest; in others, additional complexity is unnecessary and can make simulations computationally infeasible. In this work we demonstrate the use of Bayes factors for molecular model selection, using Monte Carlo sampling techniques to evaluate the evidence for different levels of complexity in the two-centered Lennard-Jones + quadrupole (2CLJQ) fluid model. Examining three levels of nested model complexity, we demonstrate that the use of variable quadrupole and bond length parameters in this model framework is justified only sometimes. We also explore the effect of the Bayesian prior distribution on the Bayes factors, as well as ways to propose meaningful prior distributions. This Bayesian Markov Chain Monte Carlo (MCMC) process is enabled by the use of analytical surrogate models that accurately approximate the physical properties of interest. This work paves the way for further atomistic model selection work via Bayesian inference and surrogate modeling.
Submitted 13 September, 2021; v1 submitted 14 May, 2021;
originally announced May 2021.
-
Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks
Authors:
David F. Hahn,
Christopher I. Bayly,
Hannah E. Bruce Macdonald,
John D. Chodera,
Vytautas Gapsys,
Antonia S. J. S. Mey,
David L. Mobley,
Laura Perez Benito,
Christina E. M. Schindler,
Gary Tresadern,
Gregory L. Warren
Abstract:
Free energy calculations are rapidly becoming indispensable in structure-enabled drug discovery programs. As new methods, force fields, and implementations are developed, assessing their expected accuracy on real-world systems (benchmarking) becomes critical to provide users with an assessment of the accuracy expected when these methods are applied within their domain of applicability, and developers with a way to assess the expected impact of new methodologies. These assessments require construction of a benchmark - a set of well-prepared, high quality systems with corresponding experimental measurements designed to ensure the resulting calculations provide a realistic assessment of expected performance when these methods are deployed within their domains of applicability. To date, the community has not yet adopted a common standardized benchmark, and existing benchmark reports suffer from a myriad of issues, including poor data quality, limited statistical power, and statistically deficient analyses, all of which can conspire to produce benchmarks that are poorly predictive of real-world performance. Here, we address these issues by presenting guidelines for (1) curating experimental data to develop meaningful benchmark sets, (2) preparing benchmark inputs according to best practices to facilitate widespread adoption, and (3) analysis of the resulting predictions to enable statistically meaningful comparisons among methods and force fields.
Submitted 12 November, 2021; v1 submitted 13 May, 2021;
originally announced May 2021.
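
The benchmark abstract above calls for analyses that allow statistically meaningful comparisons among methods. One common way to attach uncertainty to an accuracy statistic is a percentile bootstrap over the benchmark ligands; the sketch below is an illustration under that assumption, not the paper's prescribed protocol. The function names `rmse` and `bootstrap_ci`, the resample count, and the example data are all hypothetical.

```python
import math
import random


def rmse(predicted, experimental):
    """Root-mean-square error between paired predictions and measurements."""
    n = len(predicted)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / n)


def bootstrap_ci(predicted, experimental, statistic=rmse,
                 n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `statistic`.

    Ligands are resampled with replacement, keeping each prediction
    paired with its own experimental value.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(predicted)
    values = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(statistic([predicted[i] for i in idx],
                                [experimental[i] for i in idx]))
    values.sort()
    lo_index = int((1 - level) / 2 * (n_resamples - 1))
    hi_index = int((1 + level) / 2 * (n_resamples - 1))
    return values[lo_index], values[hi_index]


# Made-up binding free energies in kcal/mol, for illustration only.
calc = [-8.1, -9.4, -7.2, -10.3, -6.8, -9.0]
expt = [-8.6, -9.0, -7.9, -9.5, -7.1, -9.8]
low, high = bootstrap_ci(calc, expt)
print(f"RMSE = {rmse(calc, expt):.2f} kcal/mol, 95% CI [{low:.2f}, {high:.2f}]")
```

With only a handful of ligands the interval is typically wide, which illustrates the limited statistical power that the abstract warns about.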
-
End-to-End Differentiable Molecular Mechanics Force Field Construction
Authors:
Yuanqing Wang,
Josh Fass,
Benjamin Kaminow,
John E. Herr,
Dominic Rufa,
Ivy Zhang,
Iván Pulido,
Mike Henry,
John D. Chodera
Abstract:
Molecular mechanics (MM) potentials have long been a workhorse of computational chemistry. Leveraging accuracy and speed, these functional forms find use in a wide variety of applications in biomolecular modeling and drug discovery, from rapid virtual screening to detailed free energy calculations. Traditionally, MM potentials have relied on human-curated, inflexible, and poorly extensible discrete chemical perception rules for applying parameters to small molecules or biopolymers, making it difficult to optimize both types and parameters to fit quantum chemical or physical property data. Here, we propose an alternative approach that uses graph neural networks to perceive chemical environments, producing continuous atom embeddings from which valence and nonbonded parameters can be predicted using invariance-preserving layers. Since all stages are built from smooth neural functions, the entire process is modular and end-to-end differentiable with respect to model parameters, allowing new force fields to be easily constructed, extended, and applied to arbitrary molecules. We show that this approach is not only sufficiently expressive to reproduce legacy atom types, but that it can learn to accurately reproduce and extend existing molecular mechanics force fields. Trained with arbitrary loss functions, it can construct entirely new force fields self-consistently applicable to both biopolymers and small molecules directly from quantum chemical calculations, with higher fidelity than traditional atom or parameter typing schemes. When trained on the same quantum chemical small molecule dataset used to parameterize the openff-1.2.0 small molecule force field augmented with a peptide dataset, the resulting espaloma model shows superior accuracy vis-à-vis experiments in relative alchemical free energy calculations for a popular benchmark set.
Submitted 18 April, 2022; v1 submitted 2 October, 2020;
originally announced October 2020.
-
Best Practices for Alchemical Free Energy Calculations
Authors:
Antonia S. J. S. Mey,
Bryce Allen,
Hannah E. Bruce Macdonald,
John D. Chodera,
Maximilian Kuhn,
Julien Michel,
David L. Mobley,
Levi N. Naden,
Samarjeet Prasad,
Andrea Rizzi,
Jenke Scheen,
Michael R. Shirts,
Gary Tresadern,
Huafeng Xu
Abstract:
Alchemical free energy calculations are a useful tool for predicting free energy differences associated with the transfer of molecules from one environment to another. The hallmark of these methods is the use of "bridging" potential energy functions representing \emph{alchemical} intermediate states that cannot exist as real chemical species. The data collected from these bridging alchemical thermodynamic states allows the efficient computation of transfer free energies (or differences in transfer free energies) with orders of magnitude less simulation time than simulating the transfer process directly. While these methods are highly flexible, care must be taken in avoiding common pitfalls to ensure that computed free energy differences can be robust and reproducible for the chosen force field, and that appropriate corrections are included to permit direct comparison with experimental data. In this paper, we review current best practices for several popular application domains of alchemical free energy calculations, including relative and absolute small molecule binding free energy calculations to biomolecular targets.
Submitted 21 August, 2020; v1 submitted 7 August, 2020;
originally announced August 2020.
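
The idea of "bridging" alchemical intermediate states in the abstract above can be made concrete with the simplest classic estimator, Zwanzig exponential averaging, chained across neighboring states. This is a sketch for illustration only and is not presented as the estimator the review recommends; the function names and the toy energy differences are hypothetical, and energies are assumed to be given in units where `kT` is supplied by the caller.

```python
import math


def exp_averaging(delta_u, kT=1.0):
    """Zwanzig estimate of the free energy difference between two states.

    delta_u holds U_next(x) - U_current(x) evaluated on samples x drawn
    from the current state; the estimate is -kT * ln< exp(-delta_u / kT) >.
    """
    x = [-du / kT for du in delta_u]
    m = max(x)  # shift by the maximum to keep the exponentials stable
    log_mean = m + math.log(sum(math.exp(v - m) for v in x) / len(x))
    return -kT * log_mean


def chained_free_energy(per_window_delta_u, kT=1.0):
    """Sum the estimates over a chain of adjacent intermediate states."""
    return sum(exp_averaging(window, kT) for window in per_window_delta_u)


# Toy data: three windows linking the two end states (units of kT).
windows = [
    [0.9, 1.1, 1.0, 1.2],
    [0.4, 0.6, 0.5, 0.5],
    [-0.2, 0.1, 0.0, -0.1],
]
print(chained_free_energy(windows))
```

Splitting the transformation into several windows keeps each energy difference small, so the exponential average is dominated by typical samples rather than by rare ones.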
-
Towards Automated Benchmarking of Atomistic Forcefields: Neat Liquid Densities and Static Dielectric Constants from the ThermoML Data Archive
Authors:
Kyle A. Beauchamp,
Julie M. Behr,
Ariën S. Rustenburg,
Christopher I. Bayly,
Kenneth Kroenlein,
John D. Chodera
Abstract:
Atomistic molecular simulations are a powerful way to make quantitative predictions, but the accuracy of these predictions depends entirely on the quality of the forcefield employed. While experimental measurements of fundamental physical properties offer a straightforward approach for evaluating forcefield quality, the bulk of this information has been tied up in formats that are not machine-readable. Compiling benchmark datasets of physical properties from non-machine-readable sources requires substantial human effort and is prone to accumulation of human errors, hindering the development of reproducible benchmarks of forcefield accuracy. Here, we examine the feasibility of benchmarking atomistic forcefields against the NIST ThermoML data archive of physicochemical measurements, which aggregates thousands of experimental measurements in a portable, machine-readable, self-annotating format. As a proof of concept, we present a detailed benchmark of the generalized Amber small molecule forcefield (GAFF) using the AM1-BCC charge model against measurements (specifically bulk liquid densities and static dielectric constants at ambient pressure) automatically extracted from the archive, and discuss the extent of available data. The results of this benchmark highlight a general problem with fixed-charge forcefields in the representation of low dielectric environments such as those seen in binding cavities or biological membranes.
Submitted 31 May, 2015;
originally announced June 2015.
-
Time step rescaling recovers continuous-time dynamical properties for discrete-time Langevin integration of nonequilibrium systems
Authors:
David A. Sivak,
John D. Chodera,
Gavin E. Crooks
Abstract:
When simulating molecular systems using deterministic equations of motion (e.g., Newtonian dynamics), such equations are generally numerically integrated according to a well-developed set of algorithms that share commonly agreed-upon desirable properties. However, for stochastic equations of motion (e.g., Langevin dynamics), there is still broad disagreement over which integration algorithms are most appropriate. While multiple desiderata have been proposed throughout the literature, consensus on which criteria are important is absent, and no published integration scheme satisfies all desiderata simultaneously. Additional nontrivial complications stem from simulating systems driven out of equilibrium using existing stochastic integration schemes in conjunction with recently-developed nonequilibrium fluctuation theorems. Here, we examine a family of discrete time integration schemes for Langevin dynamics, assessing how each member satisfies a variety of desiderata that have been enumerated in prior efforts to construct suitable Langevin integrators. We show that the incorporation of a novel time step rescaling in the deterministic updates of position and velocity can correct a number of dynamical defects in these integrators. Finally, we identify a particular splitting that has essentially universally appropriate properties for the simulation of Langevin dynamics for molecular systems in equilibrium, nonequilibrium, and path sampling contexts.
Submitted 9 April, 2014; v1 submitted 16 January, 2013;
originally announced January 2013.
-
arXiv:1207.0225
physics.data-an
cond-mat.soft
cond-mat.stat-mech
math-ph
physics.bio-ph
physics.chem-ph
physics.comp-ph
Spectral rate theory for projected two-state kinetics
Authors:
Jan-Hendrik Prinz,
John D. Chodera,
Frank Noé
Abstract:
Classical rate theories often fail in cases where the observable(s) or order parameter(s) used are poor reaction coordinates or the observed signal is deteriorated by noise, such that no clear separation between reactants and products is possible. Here, we present a general spectral two-state rate theory for ergodic dynamical systems in thermal equilibrium that explicitly takes into account how the system is observed. The theory allows the systematic estimation errors made by standard rate theories to be understood and quantified. We also elucidate the connection of spectral rate theory with the popular Markov state modeling (MSM) approach for molecular simulation studies. An optimal rate estimator is formulated that gives robust and unbiased results even for poor reaction coordinates and can be applied to both computer simulations and single-molecule experiments. No definition of a dividing surface is required. Another result of the theory is a model-free definition of the reaction coordinate quality (RCQ). The RCQ can be bounded from below by the directly computable observation quality (OQ), thus providing a measure allowing the RCQ to be optimized by tuning the experimental setup. Additionally, the respective partial probability distributions can be obtained for the reactant and product states along the observed order parameter, even when these strongly overlap. The effects of both filtering (averaging) and uncorrelated noise are also examined. The approach is demonstrated on numerical examples and experimental single-molecule force probe data of the p5ab RNA hairpin and the apo-myoglobin protein at low pH, here focusing on the case of two-state kinetics.
Submitted 6 November, 2013; v1 submitted 1 July, 2012;
originally announced July 2012.
-
A robust approach to estimating rates from time-correlation functions
Authors:
John D. Chodera,
Phillip J. Elms,
William C. Swope,
Jan-Hendrik Prinz,
Susan Marqusee,
Carlos Bustamante,
Frank Noé,
Vijay S. Pande
Abstract:
While seemingly straightforward in principle, the reliable estimation of rate constants is seldom easy in practice. Numerous issues, such as the complication of poor reaction coordinates, cause obvious approaches to yield unreliable estimates. When a reliable order parameter is available, the reactive flux theory of Chandler allows the rate constant to be extracted from the plateau region of an appropriate reactive flux function. However, when applied to real data from single-molecule experiments or molecular dynamics simulations, the rate can sometimes be difficult to extract due to the numerical differentiation of a noisy empirical correlation function or difficulty in locating the plateau region at low sampling frequencies. We present a modified version of this theory which does not require numerical derivatives, allowing rate constants to be robustly estimated from the time-correlation function directly. We compare these approaches using single-molecule force spectroscopy measurements of an RNA hairpin.
Submitted 10 August, 2011;
originally announced August 2011.
-
Bayesian hidden Markov model analysis of single-molecule force spectroscopy: Characterizing kinetics under measurement uncertainty
Authors:
John D. Chodera,
Phillip Elms,
Frank Noé,
Bettina Keller,
Christian M. Kaiser,
Aaron Ewall-Wice,
Susan Marqusee,
Carlos Bustamante,
Nina Singhal Hinrichs
Abstract:
Single-molecule force spectroscopy has proven to be a powerful tool for studying the kinetic behavior of biomolecules. Through application of an external force, conformational states with small or transient populations can be stabilized, allowing them to be characterized and the statistics of individual trajectories studied to provide insight into biomolecular folding and function. Because the observed quantity (force or extension) is not necessarily an ideal reaction coordinate, individual observations cannot be uniquely associated with kinetically distinct conformations. While maximum-likelihood schemes such as hidden Markov models have solved this problem for other classes of single-molecule experiments by using temporal information to aid in the inference of a sequence of distinct conformational states, these methods do not give a clear picture of how precisely the model parameters are determined by the data due to instrument noise and finite-sample statistics, both significant problems in force spectroscopy. We solve this problem through a Bayesian extension that allows the experimental uncertainties to be directly quantified, and build in detailed balance to further reduce uncertainty through physical constraints. We illustrate the utility of this approach in characterizing the three-state kinetic behavior of an RNA hairpin in a stationary optical trap.
Submitted 5 August, 2011;
originally announced August 2011.
-
Using nonequilibrium fluctuation theorems to understand and correct errors in equilibrium and nonequilibrium discrete Langevin dynamics simulations
Authors:
David A. Sivak,
John D. Chodera,
Gavin E. Crooks
Abstract:
Common algorithms for computationally simulating Langevin dynamics must discretize the stochastic differential equations of motion. These resulting finite-time-step integrators necessarily have several practical issues in common: Microscopic reversibility is violated, the sampled stationary distribution differs from the desired equilibrium distribution, and the work accumulated in nonequilibrium simulations is not directly usable in estimators based on nonequilibrium work theorems. Here, we show that even with a time-independent Hamiltonian, finite-time-step Langevin integrators can be thought of as a driven, nonequilibrium physical process. Once an appropriate work-like quantity is defined (here called the shadow work), recently developed nonequilibrium fluctuation theorems can be used to measure or correct for the errors introduced by the use of finite time steps. In particular, we demonstrate that amending estimators based on nonequilibrium work theorems to include this shadow work removes the time-step-dependent error from estimates of free energies. We also quantify, for the first time, the magnitude of deviations between the sampled stationary distribution and the desired equilibrium distribution for equilibrium Langevin simulations of solvated systems of varying size. While these deviations can be large, they can be eliminated altogether by Metropolization or greatly diminished by small reductions in the time step. Through this connection with driven processes, further developments in nonequilibrium fluctuation theorems can provide additional analytical tools for dealing with errors in finite-time-step integrators.
Submitted 29 January, 2013; v1 submitted 14 July, 2011;
originally announced July 2011.
-
Replica exchange and expanded ensemble simulations as Gibbs sampling: Simple improvements for enhanced mixing
Authors:
John D. Chodera,
Michael R. Shirts
Abstract:
The widespread popularity of replica exchange and expanded ensemble algorithms for simulating complex molecular systems in chemistry and biophysics has generated much interest in enhancing phase space mixing of these protocols, thus improving their efficiency. Here, we demonstrate how both of these classes of algorithms can be considered a form of Gibbs sampling within a Markov chain Monte Carlo (MCMC) framework. While the update of the conformational degrees of freedom by Metropolis Monte Carlo or molecular dynamics unavoidably generates correlated samples, we show how judicious updating of the thermodynamic state indices (corresponding to thermodynamic parameters such as temperature or alchemical coupling variables) associated with these configurations can substantially increase mixing while still sampling from the desired distributions. We show how state update methods in common use lead to suboptimal mixing, and present some simple, inexpensive alternatives that can increase mixing of the overall Markov chain, reducing simulation times necessary to obtain estimates of the desired precision. These improved schemes are demonstrated for several common applications, including an alchemical expanded ensemble simulation, parallel tempering, and multidimensional replica exchange umbrella sampling.
Submitted 26 September, 2011; v1 submitted 28 May, 2011;
originally announced May 2011.
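The state-index update described in the entry above can be illustrated with a minimal sketch for an expanded ensemble. It assumes the conditional distribution of the state index k given the current configuration x is proportional to exp(g_k - u_k(x)), where u_k is the reduced potential in state k and g_k is a log weight; the function names and this particular independence-sampling update are illustrative assumptions, not code from the paper.

```python
import math
import random


def state_probabilities(reduced_potentials, log_weights):
    """Return p(k | x), proportional to exp(g_k - u_k(x)), for every state k."""
    logits = [g - u for u, g in zip(reduced_potentials, log_weights)]
    shift = max(logits)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(value - shift) for value in logits]
    total = sum(unnormalized)
    return [p / total for p in unnormalized]


def gibbs_state_update(reduced_potentials, log_weights, rng=random):
    """Draw a new state index from p(k | x), independent of the current index."""
    probs = state_probabilities(reduced_potentials, log_weights)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Because the new index is drawn from the full conditional distribution rather than proposed among neighboring states, a single update can move to any state with appreciable probability, which is the kind of mixing gain the abstract refers to.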
-
Nonequilibrium candidate Monte Carlo: A new tool for efficient equilibrium simulation
Authors:
Jerome P. Nilmeier,
Gavin E. Crooks,
David D. L. Minh,
John D. Chodera
Abstract:
Metropolis Monte Carlo simulation is a powerful tool for studying the equilibrium properties of matter. In complex condensed-phase systems, however, it is difficult to design Monte Carlo moves with high acceptance probabilities that also rapidly sample uncorrelated configurations. Here, we introduce a new class of moves based on nonequilibrium dynamics: candidate configurations are generated through a finite-time process in which a system is actively driven out of equilibrium, and accepted with criteria that preserve the equilibrium distribution. The acceptance rule is similar to the Metropolis acceptance probability, but related to the nonequilibrium work rather than the instantaneous energy difference. Our method is applicable to sampling from either a single thermodynamic state or a mixture of thermodynamic states, and allows both coordinates and thermodynamic parameters to be driven in nonequilibrium proposals. While generating finite-time switching trajectories incurs an additional cost, driving some degrees of freedom while allowing others to evolve naturally can lead to large enhancements in acceptance probabilities, greatly reducing structural correlation times. Using nonequilibrium driven processes vastly expands the repertoire of useful Monte Carlo proposals in simulations of dense solvated systems.
Submitted 20 October, 2011; v1 submitted 11 May, 2011;
originally announced May 2011.
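The acceptance rule mentioned in the entry above can be sketched as follows. This is a simplified illustration that assumes the forward and reverse protocols are selected with equal probability and that the propagation steps between perturbations preserve the instantaneous target distribution, so that the acceptance probability reduces to min(1, exp(-w)) with w the accumulated protocol work in units of kT. The function names are hypothetical.

```python
import math
import random


def protocol_work(energies_before, energies_after):
    """Sum the reduced-energy changes caused by each perturbation step.

    energies_before[t] and energies_after[t] are the reduced potentials of the
    same configuration just before and just after perturbation step t.
    """
    return sum(after - before
               for before, after in zip(energies_before, energies_after))


def ncmc_accept(work, rng=random):
    """Accept the driven proposal with probability min(1, exp(-work))."""
    if work <= 0.0:
        return True
    return rng.random() < math.exp(-work)
```

With a single instantaneous perturbation step the work equals the energy difference, and the rule reduces to the ordinary Metropolis criterion.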
-
Splitting probabilities as a test of reaction coordinate choice in single-molecule experiments
Authors:
John D. Chodera,
Vijay S. Pande
Abstract:
To explain the observed dynamics in equilibrium single-molecule measurements of biomolecules, the experimental observable is often chosen as a putative reaction coordinate along which kinetic behavior is presumed to be governed by diffusive dynamics. Here, we invoke the splitting probability as a test of the suitability of such a proposed reaction coordinate. Comparison of the observed splitting probability with that computed from the kinetic model provides a simple test to reject poor reaction coordinates. We demonstrate this test for a force spectroscopy measurement of a DNA hairpin.
Submitted 13 July, 2011; v1 submitted 3 May, 2011;
originally announced May 2011.
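The observed splitting probability referred to in the entry above can be estimated directly from a trajectory of the experimental observable. The sketch below assumes two boundaries a < b chosen by the user and a simple uniform binning of the interval between them; the function name and binning scheme are illustrative assumptions rather than the procedure used in the paper.

```python
def empirical_splitting_probability(traj, a, b, n_bins):
    """For each bin in (a, b), return the fraction of visits that reach b before a.

    Bins that are never visited (or whose visits are never resolved before the
    trajectory ends) are reported as None.
    """
    # Backward pass: record which boundary each frame reaches next.
    next_boundary = [None] * len(traj)
    upcoming = None
    for t in range(len(traj) - 1, -1, -1):
        if traj[t] <= a:
            upcoming = "A"
        elif traj[t] >= b:
            upcoming = "B"
        next_boundary[t] = upcoming

    width = (b - a) / n_bins
    visits = [0] * n_bins
    reached_b = [0] * n_bins
    for x, outcome in zip(traj, next_boundary):
        if a < x < b and outcome is not None:
            i = min(int((x - a) / width), n_bins - 1)
            visits[i] += 1
            if outcome == "B":
                reached_b[i] += 1
    return [r / v if v else None for r, v in zip(reached_b, visits)]
```

The resulting curve can then be compared against the splitting probability predicted by a kinetic model along the same coordinate; a large discrepancy is the signal for rejecting the coordinate.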
-
Estimating equilibrium ensemble averages using multiple time slices from driven nonequilibrium processes: theory and application to free energies, moments, and thermodynamic length in single-molecule pulling experiments
Authors:
David D. L. Minh,
John D. Chodera
Abstract:
Recently discovered identities in statistical mechanics have enabled the calculation of equilibrium ensemble averages from realizations of driven nonequilibrium processes, including single-molecule pulling experiments and analogous computer simulations. Challenges in collecting large data sets motivate the pursuit of efficient statistical estimators that maximize use of available information. Along these lines, Hummer and Szabo developed an estimator that combines data from multiple time slices along a driven nonequilibrium process to compute the potential of mean force. Here, we generalize their approach, pooling information from multiple time slices to estimate arbitrary equilibrium expectations. Our expression may be combined with estimators of path-ensemble averages, including existing optimal estimators that use data collected by unidirectional and bidirectional protocols. We demonstrate the estimator by calculating free energies, moments of the polymer extension, and the metric tensor for thermodynamic length in a model single-molecule pulling experiment. Compared to estimators that only use individual time slices, our multiple time-slice estimators yield substantially smoother estimates and achieve lower variance for higher-order moments.
Submitted 26 October, 2010;
originally announced October 2010.
-
Optimal estimators and asymptotic variances for nonequilibrium path-ensemble averages
Authors:
David D. L. Minh,
John D. Chodera
Abstract:
Existing optimal estimators of nonequilibrium path-ensemble averages are shown to fall within the framework of extended bridge sampling. Using this framework, we derive a general minimal-variance estimator that can combine nonequilibrium trajectory data sampled from multiple path-ensembles to estimate arbitrary functions of nonequilibrium expectations. The framework is also applied to obtaining asymptotic variance estimates, which are a useful measure of statistical uncertainty. In particular, we develop asymptotic variance estimates pertaining to Jarzynski's equality for free energies and the Hummer-Szabo expressions for the potential of mean force, calculated from uni- or bidirectional path samples. Lastly, these variance estimates are demonstrated on a model single-molecule pulling experiment. In these simulations, the asymptotic variance expression is found to accurately characterize the confidence intervals around estimators when the bias is small. Hence, it does not work well for unidirectional estimates with large bias, but for this model it largely reflects the true error in a bidirectional estimator derived by Minh and Adib.
Submitted 27 July, 2009;
originally announced July 2009.
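For context on the entry above, Jarzynski's equality relates a free energy difference to an exponential average of work values, Δf = -ln⟨exp(-w)⟩, with w in units of kT. The following minimal sketch computes the unidirectional estimate from a list of work values; it does not implement the asymptotic variance expressions of the paper, and the function name is an assumption.

```python
import math


def jarzynski_free_energy(works):
    """Estimate the reduced free energy difference as -ln(mean(exp(-w)))."""
    n = len(works)
    shift = min(works)  # factor out the dominant term to avoid overflow
    total = sum(math.exp(-(w - shift)) for w in works)
    return shift - math.log(total / n)
```

By Jensen's inequality the estimate never exceeds the mean work, and with few samples it is dominated by the smallest work values, which is the source of the bias in unidirectional estimates that the abstract mentions.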
-
Statistically optimal analysis of samples from multiple equilibrium states
Authors:
Michael R. Shirts,
John D. Chodera
Abstract:
We present a new estimator for computing free energy differences and thermodynamic expectations as well as their uncertainties from samples obtained from multiple equilibrium states via either simulation or experiment. The estimator, which we term the multistate Bennett acceptance ratio (MBAR) estimator because it reduces to the Bennett acceptance ratio when only two states are considered, has significant advantages over multiple histogram reweighting methods for combining data from multiple states. It does not require the sampled energy range to be discretized to produce histograms, eliminating bias due to energy binning and significantly reducing the time complexity of computing a solution to the estimating equations in many cases. Additionally, an estimate of the statistical uncertainty is provided for all estimated quantities. In the large sample limit, MBAR is unbiased and has the lowest variance of any known estimator for making use of equilibrium data collected from multiple states. We illustrate this method by producing a highly precise estimate of the potential of mean force for a DNA hairpin system, combining data from multiple optical tweezer measurements under constant force bias.
Submitted 17 June, 2008; v1 submitted 9 January, 2008;
originally announced January 2008.
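The estimating equations behind the MBAR entry above can be sketched with a plain self-consistent iteration. This is a minimal pure-Python illustration, assuming every state contributes at least one sample and that the reduced potential of every pooled sample has been evaluated in every state; it omits the uncertainty estimates the abstract describes and is not the reference implementation. The names `u_kn` and `n_k` are chosen here for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mbar_free_energies(u_kn, n_k, tol=1e-10, max_iter=10000):
    """Solve for reduced free energies f_k, reported relative to state 0.

    u_kn[k][n] is the reduced potential of pooled sample n evaluated in state k.
    n_k[k] is the number of samples drawn from state k (assumed > 0).
    """
    n_states = len(u_kn)
    n_samples = len(u_kn[0])
    log_n = [math.log(count) for count in n_k]
    f = [0.0] * n_states
    for _ in range(max_iter):
        # Log of the mixture denominator, sum_k N_k exp(f_k - u_k(x_n)), per sample.
        log_denom = [
            logsumexp([log_n[k] + f[k] - u_kn[k][n] for k in range(n_states)])
            for n in range(n_samples)
        ]
        f_new = [
            -logsumexp([-u_kn[i][n] - log_denom[n] for n in range(n_samples)])
            for i in range(n_states)
        ]
        f_new = [value - f_new[0] for value in f_new]  # fix the arbitrary offset
        if max(abs(x - y) for x, y in zip(f, f_new)) < tol:
            return f_new
        f = f_new
    return f
```

As a usage example, samples drawn from two harmonic states with standard deviations 1 and 2 should give a reduced free energy difference close to -ln 2, since the partition functions differ by a factor of 2. No histogram or energy binning is involved at any point, which is the advantage the abstract highlights.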