-
BricksRL: A Platform for Democratizing Robotics and Reinforcement Learning Research and Education with LEGO
Authors:
Sebastian Dittert,
Vincent Moens,
Gianni De Fabritiis
Abstract:
We present BricksRL, a platform designed to democratize access to robotics for reinforcement learning research and education. BricksRL facilitates the creation, design, and training of custom LEGO robots in the real world by interfacing them with the TorchRL library for reinforcement learning agents. The integration of TorchRL with the LEGO hubs, via Bluetooth bidirectional communication, enables…
▽ More
We present BricksRL, a platform designed to democratize access to robotics for reinforcement learning research and education. BricksRL facilitates the creation, design, and training of custom LEGO robots in the real world by interfacing them with the TorchRL library for reinforcement learning agents. The integration of TorchRL with the LEGO hubs, via Bluetooth bidirectional communication, enables state-of-the-art reinforcement learning training on GPUs for a wide variety of LEGO builds. This offers a flexible and cost-efficient approach for scaling and also provides a robust infrastructure for robot-environment-algorithm communication. We present various experiments across tasks and robot configurations, providing built plans and training results. Furthermore, we demonstrate that inexpensive LEGO robots can be trained end-to-end in the real world to achieve simple tasks, with training times typically under 120 minutes on a normal laptop. Moreover, we show how users can extend the capabilities, exemplified by the successful integration of non-LEGO sensors. By enhancing accessibility to both robotics and reinforcement learning, BricksRL establishes a strong foundation for democratized robotic learning in research and educational settings.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
ACEGEN: Reinforcement learning of generative chemical agents for drug discovery
Authors:
Albert Bou,
Morgan Thomas,
Sebastian Dittert,
Carles Navarro Ramírez,
Maciej Majewski,
Ye Wang,
Shivam Patel,
Gary Tresadern,
Mazen Ahmad,
Vincent Moens,
Woody Sherman,
Simone Sciabola,
Gianni De Fabritiis
Abstract:
In recent years, reinforcement learning (RL) has emerged as a valuable tool in drug design, offering the potential to propose and optimize molecules with desired properties. However, striking a balance between capabilities, flexibility, reliability, and efficiency remains challenging due to the complexity of advanced RL algorithms and the significant reliance on specialized code. In this work, we…
▽ More
In recent years, reinforcement learning (RL) has emerged as a valuable tool in drug design, offering the potential to propose and optimize molecules with desired properties. However, striking a balance between capabilities, flexibility, reliability, and efficiency remains challenging due to the complexity of advanced RL algorithms and the significant reliance on specialized code. In this work, we introduce ACEGEN, a comprehensive and streamlined toolkit tailored for generative drug design, built using TorchRL, a modern RL library that offers thoroughly tested reusable components. We validate ACEGEN by benchmarking against other published generative modeling algorithms and show comparable or improved performance. We also show examples of ACEGEN applied in multiple drug discovery case studies. ACEGEN is accessible at \url{https://github.com/acellera/acegen-open} and available for use under the MIT license.
△ Less
Submitted 3 June, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
On the Inclusion of Charge and Spin States in Cartesian Tensor Neural Network Potentials
Authors:
Guillem Simeon,
Antonio Mirarchi,
Raul P. Pelaez,
Raimondas Galvelis,
Gianni De Fabritiis
Abstract:
In this letter, we present an extension to TensorNet, a state-of-the-art equivariant Cartesian tensor neural network potential, allowing it to handle charged molecules and spin states without architectural changes or increased costs. By incorporating these attributes, we address input degeneracy issues, enhancing the model's predictive accuracy across diverse chemical systems. This advancement sig…
▽ More
In this letter, we present an extension to TensorNet, a state-of-the-art equivariant Cartesian tensor neural network potential, allowing it to handle charged molecules and spin states without architectural changes or increased costs. By incorporating these attributes, we address input degeneracy issues, enhancing the model's predictive accuracy across diverse chemical systems. This advancement significantly broadens TensorNet's applicability, maintaining its efficiency and accuracy.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations
Authors:
Raul P. Pelaez,
Guillem Simeon,
Raimondas Galvelis,
Antonio Mirarchi,
Peter Eastman,
Stefan Doerr,
Philipp Thölke,
Thomas E. Markland,
Gianni De Fabritiis
Abstract:
Achieving a balance between computational speed, prediction accuracy, and universal applicability in molecular simulations has been a persistent challenge. This paper presents substantial advancements in the TorchMD-Net software, a pivotal step forward in the shift from conventional force fields to neural network-based potentials. The evolution of TorchMD-Net into a more comprehensive and versatil…
▽ More
Achieving a balance between computational speed, prediction accuracy, and universal applicability in molecular simulations has been a persistent challenge. This paper presents substantial advancements in the TorchMD-Net software, a pivotal step forward in the shift from conventional force fields to neural network-based potentials. The evolution of TorchMD-Net into a more comprehensive and versatile framework is highlighted, incorporating cutting-edge architectures such as TensorNet. This transformation is achieved through a modular design approach, encouraging customized applications within the scientific community. The most notable enhancement is a significant improvement in computational efficiency, achieving a very remarkable acceleration in the computation of energy and forces for TensorNet models, with performance gains ranging from 2-fold to 10-fold over previous iterations. Other enhancements include highly optimized neighbor search algorithms that support periodic boundary conditions and the smooth integration with existing molecular dynamics frameworks. Additionally, the updated version introduces the capability to integrate physical priors, further enriching its application spectrum and utility in research. The software is available at https://github.com/torchmd/torchmd-net.
△ Less
Submitted 23 May, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
OpenMM 8: Molecular Dynamics Simulation with Machine Learning Potentials
Authors:
Peter Eastman,
Raimondas Galvelis,
Raúl P. Peláez,
Charlles R. A. Abreu,
Stephen E. Farr,
Emilio Gallicchio,
Anton Gorenko,
Michael M. Henry,
Frank Hu,
**g Huang,
Andreas Krämer,
Julien Michel,
Joshua A. Mitchell,
Vijay S. Pande,
João PGLM Rodrigues,
Jaime Rodriguez-Guerra,
Andrew C. Simmonett,
Sukrit Singh,
Jason Swails,
Philip Turner,
Yuanqing Wang,
Ivy Zhang,
John D. Chodera,
Gianni De Fabritiis,
Thomas E. Markland
Abstract:
Machine learning plays an important and growing role in molecular simulation. The newest version of the OpenMM molecular dynamics toolkit introduces new features to support the use of machine learning potentials. Arbitrary PyTorch models can be added to a simulation and used to compute forces and energy. A higher-level interface allows users to easily model their molecules of interest with general…
▽ More
Machine learning plays an important and growing role in molecular simulation. The newest version of the OpenMM molecular dynamics toolkit introduces new features to support the use of machine learning potentials. Arbitrary PyTorch models can be added to a simulation and used to compute forces and energy. A higher-level interface allows users to easily model their molecules of interest with general purpose, pretrained potential functions. A collection of optimized CUDA kernels and custom PyTorch operations greatly improves the speed of simulations. We demonstrate these features on simulations of cyclin-dependent kinase 8 (CDK8) and the green fluorescent protein (GFP) chromophore in water. Taken together, these features make it practical to use machine learning to improve the accuracy of simulations at only a modest increase in cost.
△ Less
Submitted 29 November, 2023; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Machine Learning Small Molecule Properties in Drug Discovery
Authors:
Nikolai Schapin,
Maciej Majewski,
Alejandro Varela,
Carlos Arroniz,
Gianni De Fabritiis
Abstract:
Machine learning (ML) is a promising approach for predicting small molecule properties in drug discovery. Here, we provide a comprehensive overview of various ML methods introduced for this purpose in recent years. We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity). We discuss existing popular da…
▽ More
Machine learning (ML) is a promising approach for predicting small molecule properties in drug discovery. Here, we provide a comprehensive overview of various ML methods introduced for this purpose in recent years. We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity). We discuss existing popular datasets and molecular descriptors and embeddings, such as chemical fingerprints and graph-based neural networks. We highlight also challenges of predicting and optimizing multiple properties during hit-to-lead and lead optimization stages of drug discovery and explore briefly possible multi-objective optimization techniques that can be used to balance diverse properties while optimizing lead candidates. Finally, techniques to provide an understanding of model predictions, especially for critical decision-making in drug discovery are assessed. Overall, this review provides insights into the landscape of ML models for small molecule property predictions in drug discovery. So far, there are multiple diverse approaches, but their performances are often comparable. Neural networks, while more flexible, do not always outperform simpler models. This shows that the availability of high-quality training data remains crucial for training accurate models and there is a need for standardized benchmarks, additional performance metrics, and best practices to enable richer comparisons between the different techniques and models that can shed a better light on the differences between the many techniques.
△ Less
Submitted 2 August, 2023;
originally announced August 2023.
-
Top-down machine learning of coarse-grained protein force-fields
Authors:
Carles Navarro,
Maciej Majewski,
Gianni de Fabritiis
Abstract:
Develo** accurate and efficient coarse-grained representations of proteins is crucial for understanding their folding, function, and interactions over extended timescales. Our methodology involves simulating proteins with molecular dynamics and utilizing the resulting trajectories to train a neural network potential through differentiable trajectory reweighting. Remarkably, this method requires…
▽ More
Develo** accurate and efficient coarse-grained representations of proteins is crucial for understanding their folding, function, and interactions over extended timescales. Our methodology involves simulating proteins with molecular dynamics and utilizing the resulting trajectories to train a neural network potential through differentiable trajectory reweighting. Remarkably, this method requires only the native conformation of proteins, eliminating the need for labeled data derived from extensive simulations or memory-intensive end-to-end differentiable simulations. Once trained, the model can be employed to run parallel molecular dynamics simulations and sample folding events for proteins both within and beyond the training distribution, showcasing its extrapolation capabilities. By applying Markov State Models, native-like conformations of the simulated proteins can be predicted from the coarse-grained simulations. Owing to its theoretical transferability and ability to use solely experimental static structures as training data, we anticipate that this approach will prove advantageous for develo** new protein force fields and further advancing the study of protein dynamics, folding, and interactions.
△ Less
Submitted 10 October, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
TensorNet: Cartesian Tensor Representations for Efficient Learning of Molecular Potentials
Authors:
Guillem Simeon,
Gianni de Fabritiis
Abstract:
The development of efficient machine learning models for molecular systems representation is becoming crucial in scientific research. We introduce TensorNet, an innovative O(3)-equivariant message-passing neural network architecture that leverages Cartesian tensor representations. By using Cartesian tensor atomic embeddings, feature mixing is simplified through matrix product operations. Furthermo…
▽ More
The development of efficient machine learning models for molecular systems representation is becoming crucial in scientific research. We introduce TensorNet, an innovative O(3)-equivariant message-passing neural network architecture that leverages Cartesian tensor representations. By using Cartesian tensor atomic embeddings, feature mixing is simplified through matrix product operations. Furthermore, the cost-effective decomposition of these tensors into rotation group irreducible representations allows for the separate processing of scalars, vectors, and tensors when necessary. Compared to higher-rank spherical tensor models, TensorNet demonstrates state-of-the-art performance with significantly fewer parameters. For small molecule potential energies, this can be achieved even with a single interaction layer. As a result of all these properties, the model's computational cost is substantially decreased. Moreover, the accurate prediction of vector and tensor molecular quantities on top of potential energies and forces is possible. In summary, TensorNet's framework opens up a new space for the design of state-of-the-art equivariant models.
△ Less
Submitted 30 October, 2023; v1 submitted 10 June, 2023;
originally announced June 2023.
-
TorchRL: A data-driven decision-making library for PyTorch
Authors:
Albert Bou,
Matteo Bettini,
Sebastian Dittert,
Vikash Kumar,
Shagun Sodhani,
Xiaomeng Yang,
Gianni De Fabritiis,
Vincent Moens
Abstract:
PyTorch has ascended as a premier machine learning framework, yet it lacks a native and comprehensive library for decision and control tasks suitable for large development teams dealing with complex real-world data and environments. To address this issue, we propose TorchRL, a generalistic control library for PyTorch that provides well-integrated, yet standalone components. We introduce a new and…
▽ More
PyTorch has ascended as a premier machine learning framework, yet it lacks a native and comprehensive library for decision and control tasks suitable for large development teams dealing with complex real-world data and environments. To address this issue, we propose TorchRL, a generalistic control library for PyTorch that provides well-integrated, yet standalone components. We introduce a new and flexible PyTorch primitive, the TensorDict, which facilitates streamlined algorithm development across the many branches of Reinforcement Learning (RL) and control. We provide a detailed description of the building blocks and an extensive overview of the library across domains and tasks. Finally, we experimentally demonstrate its reliability and flexibility and show comparative benchmarks to demonstrate its computational efficiency. TorchRL fosters long-term support and is publicly available on GitHub for greater reproducibility and collaboration within the research community. The code is open-sourced on GitHub.
△ Less
Submitted 27 November, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Binding-and-folding recognition of an intrinsically disordered protein using online learning molecular dynamics
Authors:
Pablo Herrera-Nieto,
Adrià Pérez,
Gianni De Fabritiis
Abstract:
Intrinsically disordered proteins participate in many biological processes by folding upon binding with other proteins. However, coupled folding and binding processes are not well understood from an atomistic point of view. One of the main questions is whether folding occurs prior to or after binding. Here we use a novel unbiased high-throughput adaptive sampling approach to reconstruct the bindin…
▽ More
Intrinsically disordered proteins participate in many biological processes by folding upon binding with other proteins. However, coupled folding and binding processes are not well understood from an atomistic point of view. One of the main questions is whether folding occurs prior to or after binding. Here we use a novel unbiased high-throughput adaptive sampling approach to reconstruct the binding and folding between the disordered transactivation domain of \mbox{c-Myb} and the KIX domain of the CREB-binding protein. The reconstructed long-term dynamical process highlights the binding of a short stretch of amino acids on \mbox{c-Myb} as a folded $α$-helix. Leucine residues, specially Leu298 to Leu302, establish initial native contacts that prime the binding and folding of the rest of the peptide, with a mixture of conformational selection on the N-terminal region with an induced fit of the C-terminal.
△ Less
Submitted 20 February, 2023;
originally announced February 2023.
-
Machine Learning Coarse-Grained Potentials of Protein Thermodynamics
Authors:
Maciej Majewski,
Adrià Pérez,
Philipp Thölke,
Stefan Doerr,
Nicholas E. Charron,
Toni Giorgino,
Brooke E. Husic,
Cecilia Clementi,
Frank Noé,
Gianni De Fabritiis
Abstract:
A generalized understanding of protein dynamics is an unsolved scientific problem, the solution of which is critical to the interpretation of the structure-function relationships that govern essential biological processes. Here, we approach this problem by constructing coarse-grained molecular potentials based on artificial neural networks and grounded in statistical mechanics. For training, we bu…
▽ More
A generalized understanding of protein dynamics is an unsolved scientific problem, the solution of which is critical to the interpretation of the structure-function relationships that govern essential biological processes. Here, we approach this problem by constructing coarse-grained molecular potentials based on artificial neural networks and grounded in statistical mechanics. For training, we build a unique dataset of unbiased all-atom molecular dynamics simulations of approximately 9 ms for twelve different proteins with multiple secondary structure arrangements. The coarse-grained models are capable of accelerating the dynamics by more than three orders of magnitude while preserving the thermodynamics of the systems. Coarse-grained simulations identify relevant structural states in the ensemble with comparable energetics to the all-atom systems. Furthermore, we show that a single coarse-grained potential can integrate all twelve proteins and can capture experimental structural features of mutated proteins. These results indicate that machine learning coarse-grained potentials could provide a feasible approach to simulate and understand protein dynamics.
△ Less
Submitted 14 December, 2022;
originally announced December 2022.
-
SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials
Authors:
Peter Eastman,
Pavan Kumar Behara,
David L. Dotson,
Raimondas Galvelis,
John E. Herr,
Josh T. Horton,
Yuezhi Mao,
John D. Chodera,
Benjamin P. Pritchard,
Yuanqing Wang,
Gianni De Fabritiis,
Thomas E. Markland
Abstract:
Machine learning potentials are an important tool for molecular simulation, but their development is held back by a shortage of high quality datasets to train them on. We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for a diverse set of small…
▽ More
Machine learning potentials are an important tool for molecular simulation, but their development is held back by a shortage of high quality datasets to train them on. We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. It provides both forces and energies calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space. It can serve as a valuable resource for the creation of transferable, ready to use potential functions for use in molecular simulations.
△ Less
Submitted 23 November, 2022; v1 submitted 21 September, 2022;
originally announced September 2022.
-
TorchMD-NET: Equivariant Transformers for Neural Network based Molecular Potentials
Authors:
Philipp Thölke,
Gianni De Fabritiis
Abstract:
The prediction of quantum mechanical properties is historically plagued by a trade-off between accuracy and speed. Machine learning potentials have previously shown great success in this domain, reaching increasingly better accuracy while maintaining computational efficiency comparable with classical force fields. In this work we propose TorchMD-NET, a novel equivariant transformer (ET) architectu…
▽ More
The prediction of quantum mechanical properties is historically plagued by a trade-off between accuracy and speed. Machine learning potentials have previously shown great success in this domain, reaching increasingly better accuracy while maintaining computational efficiency comparable with classical force fields. In this work we propose TorchMD-NET, a novel equivariant transformer (ET) architecture, outperforming state-of-the-art on MD17, ANI-1, and many QM9 targets in both accuracy and computational efficiency. Through an extensive attention weight analysis, we gain valuable insights into the black box predictor and show differences in the learned representation of conformers versus conformations sampled from molecular dynamics or normal modes. Furthermore, we highlight the importance of datasets including off-equilibrium conformations for the evaluation of molecular potentials.
△ Less
Submitted 23 April, 2022; v1 submitted 5 February, 2022;
originally announced February 2022.
-
NNP/MM: Accelerating molecular dynamics simulations with machine learning potentials and molecular mechanic
Authors:
Raimondas Galvelis,
Alejandro Varela-Rial,
Stefan Doerr,
Roberto Fino,
Peter Eastman,
Thomas E. Markland,
John D. Chodera,
Gianni De Fabritiis
Abstract:
Machine learning potentials have emerged as a means to enhance the accuracy of biomolecular simulations. However, their application is constrained by the significant computational cost arising from the vast number of parameters compared to traditional molecular mechanics. To tackle this issue, we introduce an optimized implementation of the hybrid method (NNP/MM), which combines neural network pot…
▽ More
Machine learning potentials have emerged as a means to enhance the accuracy of biomolecular simulations. However, their application is constrained by the significant computational cost arising from the vast number of parameters compared to traditional molecular mechanics. To tackle this issue, we introduce an optimized implementation of the hybrid method (NNP/MM), which combines neural network potentials (NNP) and molecular mechanics (MM). This approach models a portion of the system, such as a small molecule, using NNP while employing MM for the remaining system to boost efficiency. By conducting molecular dynamics (MD) simulations on various protein-ligand complexes and metadynamics (MTD) simulations on a ligand, we showcase the capabilities of our implementation of NNP/MM. It has enabled us to increase the simulation speed by 5 times and achieve a combined sampling of one microsecond for each complex, marking the longest simulations ever reported for this class of simulation.
△ Less
Submitted 28 August, 2023; v1 submitted 20 January, 2022;
originally announced January 2022.
-
TorchMD: A deep learning framework for molecular simulations
Authors:
Stefan Doerr,
Maciej Majewsk,
Adrià Pérez,
Andreas Krämer,
Cecilia Clementi,
Frank Noe,
Toni Giorgino,
Gianni De Fabritiis
Abstract:
Molecular dynamics simulations provide a mechanistic description of molecules by relying on empirical potentials. The quality and transferability of such potentials can be improved leveraging data-driven models derived with machine learning approaches. Here, we present TorchMD, a framework for molecular simulations with mixed classical and machine learning potentials. All of force computations inc…
▽ More
Molecular dynamics simulations provide a mechanistic description of molecules by relying on empirical potentials. The quality and transferability of such potentials can be improved leveraging data-driven models derived with machine learning approaches. Here, we present TorchMD, a framework for molecular simulations with mixed classical and machine learning potentials. All of force computations including bond, angle, dihedral, Lennard-Jones and Coulomb interactions are expressed as PyTorch arrays and operations. Moreover, TorchMD enables learning and simulating neural network potentials. We validate it using standard Amber all-atom simulations, learning an ab-initio potential, performing an end-to-end training and finally learning and simulating a coarse-grained model for protein folding. We believe that TorchMD provides a useful tool-set to support molecular simulations of machine learning potentials. Code and data are freely available at \url{github.com/torchmd}.
△ Less
Submitted 22 December, 2020;
originally announced December 2020.
-
Guided Exploration with Proximal Policy Optimization using a Single Demonstration
Authors:
Gabriele Libardi,
Gianni De Fabritiis
Abstract:
Solving sparse reward tasks through exploration is one of the major challenges in deep reinforcement learning, especially in three-dimensional, partially-observable environments. Critically, the algorithm proposed in this article uses a single human demonstration to solve hard-exploration problems. We train an agent on a combination of demonstrations and own experience to solve problems with varia…
▽ More
Solving sparse reward tasks through exploration is one of the major challenges in deep reinforcement learning, especially in three-dimensional, partially-observable environments. Critically, the algorithm proposed in this article uses a single human demonstration to solve hard-exploration problems. We train an agent on a combination of demonstrations and own experience to solve problems with variable initial conditions. We adapt this idea and integrate it with the proximal policy optimization (PPO). The agent is able to increase its performance and to tackle harder problems by replaying its own past trajectories prioritizing them based on the obtained reward and the maximum value of the trajectory. We compare different variations of this algorithm to behavioral cloning on a set of hard-exploration tasks in the Animal-AI Olympics environment. To the best of our knowledge, learning a task in a three-dimensional environment with comparable difficulty has never been considered before using only one human demonstration.
△ Less
Submitted 16 June, 2021; v1 submitted 7 July, 2020;
originally announced July 2020.
-
Integrating Distributed Architectures in Highly Modular RL Libraries
Authors:
Albert Bou,
Sebastian Dittert,
Gianni De Fabritiis
Abstract:
Advancing reinforcement learning (RL) requires tools that are flexible enough to easily prototype new methods while avoiding impractically slow experimental turnaround times. To match the first requirement, the most popular RL libraries advocate for highly modular agent composability, which facilitates experimentation and development. To solve challenging environments within reasonable time frames…
▽ More
Advancing reinforcement learning (RL) requires tools that are flexible enough to easily prototype new methods while avoiding impractically slow experimental turnaround times. To match the first requirement, the most popular RL libraries advocate for highly modular agent composability, which facilitates experimentation and development. To solve challenging environments within reasonable time frames, scaling RL to large sampling and computing resources has proved a successful strategy. However, this capability has been so far difficult to combine with modularity. In this work, we explore design choices to allow agent composability both at a local and distributed level of execution. We propose a versatile approach that allows the definition of RL agents at different scales through independent reusable components. We demonstrate experimentally that our design choices allow us to reproduce classical benchmarks, explore multiple distributed architectures, and solve novel and complex environments while giving full control to the user in the agent definition and training scheme definition. We believe this work can provide useful insights to the next generation of RL libraries.
△ Less
Submitted 12 June, 2023; v1 submitted 6 July, 2020;
originally announced July 2020.
-
Machine Learning of coarse-grained Molecular Dynamics Force Fields
Authors:
Jiang Wang,
Simon Olsson,
Christoph Wehmeyer,
Adria Perez,
Nicholas E. Charron,
Gianni de Fabritiis,
Frank Noe,
Cecilia Clementi
Abstract:
Atomistic or ab-initio molecular dynamics simulations are widely used to predict thermodynamics and kinetics and relate them to molecular structure. A common approach to go beyond the time- and length-scales accessible with such computationally expensive simulations is the definition of coarse-grained molecular models. Existing coarse-graining approaches define an effective interaction potential t…
▽ More
Atomistic or ab-initio molecular dynamics simulations are widely used to predict thermodynamics and kinetics and relate them to molecular structure. A common approach to go beyond the time- and length-scales accessible with such computationally expensive simulations is the definition of coarse-grained molecular models. Existing coarse-graining approaches define an effective interaction potential to match defined properties of high-resolution models or experimental data. In this paper, we reformulate coarse-graining as a supervised machine learning problem. We use statistical learning theory to decompose the coarse-graining error and cross-validation to select and compare the performance of different models. We introduce CGnets, a deep learning approach, that learns coarse-grained free energy functions and can be trained by a force matching scheme. CGnets maintain all physically relevant invariances and allow one to incorporate prior physics knowledge to avoid sampling of unphysical structures. We show that CGnets can capture all-atom explicit-solvent free energy surfaces with models using only a few coarse-grained beads and no solvent, while classical coarse-graining methods fail to capture crucial features of the free energy surface. Thus, CGnets are able to capture multi-body terms that emerge from the dimensionality reduction.
△ Less
Submitted 3 April, 2019; v1 submitted 4 December, 2018;
originally announced December 2018.
-
Dimensionality reduction methods for molecular simulations
Authors:
Stefan Doerr,
Igor Ariz-Extreme,
Matthew J. Harvey,
Gianni De Fabritiis
Abstract:
Molecular simulations produce very high-dimensional data-sets with millions of data points. As analysis methods are often unable to cope with so many dimensions, it is common to use dimensionality reduction and clustering methods to reach a reduced representation of the data. Yet these methods often fail to capture the most important features necessary for the construction of a Markov model. Here…
▽ More
Molecular simulations produce very high-dimensional data-sets with millions of data points. As analysis methods are often unable to cope with so many dimensions, it is common to use dimensionality reduction and clustering methods to reach a reduced representation of the data. Yet these methods often fail to capture the most important features necessary for the construction of a Markov model. Here we demonstrate the results of various dimensionality reduction methods on two simulation data-sets, one of protein folding and another of protein-ligand binding. The methods tested include a k-means clustering variant, a non-linear auto encoder, principal component analysis and tICA. The dimension-reduced data is then used to estimate the implied timescales of the slowest process by a Markov state model analysis to assess the quality of the projection. The projected dimensions learned from the data are visualized to demonstrate which conformations the various methods choose to represent the molecular process.
△ Less
Submitted 2 November, 2017; v1 submitted 29 October, 2017;
originally announced October 2017.