-
Learning data efficient coarse-grained molecular dynamics from forces and noise
Authors:
Aleksander E. P. Durumeric,
Yaoyi Chen,
Frank Noé,
Cecilia Clementi
Abstract:
Machine-learned coarse-grained (MLCG) molecular dynamics is a promising option for modeling biomolecules. However, MLCG models currently require large amounts of data from reference atomistic molecular dynamics or substantial computation for training. Denoising score matching -- the technology behind the widely popular diffusion models -- has simultaneously emerged as a machine-learning framework…
▽ More
Machine-learned coarse-grained (MLCG) molecular dynamics is a promising option for modeling biomolecules. However, MLCG models currently require large amounts of data from reference atomistic molecular dynamics or substantial computation for training. Denoising score matching -- the technology behind the widely popular diffusion models -- has simultaneously emerged as a machine-learning framework for creating samples from noise. Models in the first category are often trained using atomistic forces, while those in the second category extract the data distribution by reverting noise-based corruption. We unify these approaches to improve the training of MLCG force-fields, reducing data requirements by a factor of 100 while maintaining advantages typical to force-based parameterization. The methods are demonstrated on proteins Trp-Cage and NTL9 and published as open-source code.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Navigating protein landscapes with a machine-learned transferable coarse-grained model
Authors:
Nicholas E. Charron,
Felix Musil,
Andrea Guljas,
Yaoyi Chen,
Klara Bonneau,
Aldo S. Pasos-Trejo,
Jacopo Venturin,
Daria Gusew,
Iryna Zaporozhets,
Andreas Krämer,
Clark Templeton,
Atharva Kelkar,
Aleksander E. P. Durumeric,
Simon Olsson,
Adrià Pérez,
Maciej Majewski,
Brooke E. Husic,
Ankit Patel,
Gianni De Fabritiis,
Frank Noé,
Cecilia Clementi
Abstract:
The most popular and universally predictive protein simulation models employ all-atom molecular dynamics (MD), but they come at extreme computational cost. The development of a universal, computationally efficient coarse-grained (CG) model with similar prediction performance has been a long-standing challenge. By combining recent deep learning methods with a large and diverse training set of all-a…
▽ More
The most popular and universally predictive protein simulation models employ all-atom molecular dynamics (MD), but they come at extreme computational cost. The development of a universal, computationally efficient coarse-grained (CG) model with similar prediction performance has been a long-standing challenge. By combining recent deep learning methods with a large and diverse training set of all-atom protein simulations, we here develop a bottom-up CG force field with chemical transferability, which can be used for extrapolative molecular dynamics on new sequences not used during model parametrization. We demonstrate that the model successfully predicts folded structures, intermediates, metastable folded and unfolded basins, and the fluctuations of intrinsically disordered proteins while it is several orders of magnitude faster than an all-atom model. This showcases the feasibility of a universal and computationally efficient machine-learned CG model for proteins.
△ Less
Submitted 27 October, 2023;
originally announced October 2023.
-
Utilizing Machine Learning to Greatly Expand the Range and Accuracy of Bottom-Up Coarse-Grained Models Through Virtual Particles
Authors:
Patrick G. Sahrmann,
Timothy D. Loose,
Aleksander E. P. Durumeric,
Gregory A. Voth
Abstract:
Coarse-grained (CG) models parameterized using atomistic reference data, i.e., 'bottom up' CG models, have proven useful in the study of biomolecules and other soft matter. However, the construction of highly accurate, low resolution CG models of biomolecules remains challenging. We demonstrate in this work how virtual particles, CG sites with no atomistic correspondence, can be incorporated into…
▽ More
Coarse-grained (CG) models parameterized using atomistic reference data, i.e., 'bottom up' CG models, have proven useful in the study of biomolecules and other soft matter. However, the construction of highly accurate, low resolution CG models of biomolecules remains challenging. We demonstrate in this work how virtual particles, CG sites with no atomistic correspondence, can be incorporated into CG models within the context of relative entropy minimization (REM) as latent variables. The methodology presented, variational derivative relative entropy minimization (VD-REM), enables optimization of virtual particle interactions through a gradient descent algorithm aided by machine learning. We apply this methodology to the challenging case of a solvent-free CG model of a 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC) lipid bilayer and demonstrate that introduction of virtual particles captures solvent-mediated behavior and higher-order correlations which REM alone cannot capture in a more standard CG model based only on the map** of collections of atoms to the CG sites.
△ Less
Submitted 8 December, 2022;
originally announced December 2022.
-
Explaining classifiers to understand coarse-grained models
Authors:
Aleksander Evren Paetzold Durumeric,
Gregory A. Voth
Abstract:
Bottom-up coarse-grained molecular dynamics models are parameterized using complex effective Hamiltonians. These models are typically optimized to approximate high dimensional data from atomistic simulations. In contrast, human validation of these models is often limited to low dimensional statistics that do not necessarily differentiate between the CG model and said atomistic simulations. We prop…
▽ More
Bottom-up coarse-grained molecular dynamics models are parameterized using complex effective Hamiltonians. These models are typically optimized to approximate high dimensional data from atomistic simulations. In contrast, human validation of these models is often limited to low dimensional statistics that do not necessarily differentiate between the CG model and said atomistic simulations. We propose that explainable machine learning can directly convey high-dimensional error to scientists and use Shapley additive explanations do so in two coarse-grained protein models.
△ Less
Submitted 15 September, 2021;
originally announced September 2021.
-
Adversarial-Residual-Coarse-Graining: Applying machine learning theory to systematic molecular coarse-graining
Authors:
Aleksander E. P. Durumeric,
Gregory A. Voth
Abstract:
We utilize connections between molecular coarse-graining approaches and implicit generative models in machine learning to describe a new framework for systematic molecular coarse-graining (CG). Focus is placed on the formalism encompassing generative adversarial networks. The resulting method enables a variety of model parameterization strategies, some of which show similarity to previous CG metho…
▽ More
We utilize connections between molecular coarse-graining approaches and implicit generative models in machine learning to describe a new framework for systematic molecular coarse-graining (CG). Focus is placed on the formalism encompassing generative adversarial networks. The resulting method enables a variety of model parameterization strategies, some of which show similarity to previous CG methods. We demonstrate that the resulting framework can rigorously parameterize CG models containing CG sites with no prescribed connection to the reference atomistic system (termed virtual sites); however, this advantage is offset by the lack of a closed-form expression for the CG Hamiltonian at the resolution obtained after integration over the virtual CG sites. Computational examples are provided for cases in which these methods ideally return identical parameters as Relative Entropy Minimization (REM) CG but where traditional REM CG optimization equations are not applicable.
△ Less
Submitted 27 June, 2019; v1 submitted 1 April, 2019;
originally announced April 2019.