OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy
Authors:
Anders S. Christensen,
Sai Krishna Sirumalla,
Zhuoran Qiao,
Michael B. O'Connor,
Daniel G. A. Smith,
Feizhi Ding,
Peter J. Bygrave,
Animashree Anandkumar,
Matthew Welborn,
Frederick R. Manby,
Thomas F. Miller III
Abstract:
We present OrbNet Denali, a machine learning model for electronic structure that is designed as a drop-in replacement for ground-state density functional theory (DFT) energy calculations. The model is a message-passing neural network that uses symmetry-adapted atomic orbital features from a low-cost quantum calculation to predict the energy of a molecule. OrbNet Denali is trained on a vast dataset of 2.3 million DFT calculations on molecules and geometries. This dataset covers the most common elements in bio- and organic chemistry (H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, I) as well as charged molecules. OrbNet Denali is demonstrated on several well-established benchmark datasets, and we find that it provides accuracy that is on par with modern DFT methods while offering a speedup of up to three orders of magnitude. For the GMTKN55 benchmark set, OrbNet Denali achieves WTMAD-1 and WTMAD-2 scores of 7.19 and 9.84, on par with modern DFT functionals. For several GMTKN55 subsets, which contain chemical problems that are not present in the training set, OrbNet Denali produces a mean absolute error comparable to those of DFT methods. For the Hutchison conformers benchmark set, OrbNet Denali has a median correlation coefficient of R^2=0.90 compared to the reference DLPNO-CCSD(T) calculation, and R^2=0.97 compared to the method used to generate the training data (wB97X-D3/def2-TZVP), exceeding the performance of any other method with a similar cost. Similarly, the model reaches chemical accuracy for non-covalent interactions in the S66x10 dataset. For torsional profiles, OrbNet Denali reproduces the torsion profiles of wB97X-D3/def2-TZVP with an average MAE of 0.12 kcal/mol for the potential energy surfaces of the diverse fragments in the TorsionNet500 dataset.
Submitted 2 July, 2021; v1 submitted 1 July, 2021;
originally announced July 2021.
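The OrbNet Denali abstract above reports two kinds of error statistics: a mean absolute error (in kcal/mol) and a median R^2 against reference energies. The following is a minimal sketch of how such statistics could be computed. It assumes that R^2 is the squared Pearson correlation evaluated per molecule over its conformer energies, with the median then taken across molecules; the abstract does not spell this out. The function names and the input layout are hypothetical and are not taken from the paper or its code.

```python
from statistics import mean, median


def mean_absolute_error(predicted, reference):
    """Mean absolute deviation between two equal-length energy lists."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean([abs(p - r) for p, r in zip(predicted, reference)])


def pearson_r_squared(predicted, reference):
    """Squared Pearson correlation between predicted and reference energies."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need at least two paired values")
    mean_p, mean_r = mean(predicted), mean(reference)
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant input")
    return (cov * cov) / (var_p * var_r)


def median_r_squared(per_molecule_pairs):
    """Median of per-molecule R^2 values.

    per_molecule_pairs is a list of (predicted, reference) tuples,
    one tuple per molecule (assumed layout, for illustration only).
    """
    return median([pearson_r_squared(p, r) for p, r in per_molecule_pairs])
```

Taking the median rather than the mean keeps a few poorly correlated molecules from dominating the summary statistic.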
Training atomic neural networks using fragment-based data generated in virtual reality
Authors:
Silvia Amabilino,
Lars A. Bratholm,
Simon J. Bennie,
Michael B. O'Connor,
David R. Glowacki
Abstract:
The ability to understand and engineer molecular structures relies on having accurate descriptions of the energy as a function of atomic coordinates. Here we outline a new paradigm for deriving energy functions of hyperdimensional molecular systems, which involves generating data for low-dimensional systems in virtual reality (VR) to then efficiently train atomic neural networks (ANNs). This generates high-quality data for specific areas of interest within the hyperdimensional space that characterizes a molecule's potential energy surface (PES). We demonstrate the utility of this approach by gathering data within VR to train ANNs on chemical reactions involving fewer than 8 heavy atoms. This strategy enables us to predict the energies of much higher-dimensional systems, e.g. containing nearly 100 atoms. Training on datasets containing only 15K geometries, this approach generates mean absolute errors around 2 kcal/mol. This represents one of the first times that an ANN-PES for a large reactive radical has been generated using such a small dataset. Our results suggest VR enables the intelligent curation of high-quality data, which accelerates the learning process.
Submitted 30 May, 2020;
originally announced July 2020.
Interactive molecular dynamics in virtual reality for accurate flexible protein-ligand docking
Authors:
Helen M. Deeks,
Rebecca K. Walters,
Stephanie R. Hare,
Michael B. O'Connor,
Adrian J. Mulholland,
David R. Glowacki
Abstract:
Simulating drug binding and unbinding is a challenge, as the rugged energy landscapes that separate bound and unbound states require extensive sampling that consumes significant computational resources. Here, we describe the use of interactive molecular dynamics in virtual reality (iMD-VR) as an accurate low-cost strategy for flexible protein-ligand docking. We outline an experimental protocol which enables expert iMD-VR users to guide ligands into and out of the binding pockets of trypsin, neuraminidase, and HIV-1 protease, and recreate their respective crystallographic protein-ligand binding poses within 5-10 minutes. Following a brief training phase, our studies show that iMD-VR novices were able to generate unbinding and rebinding pathways on timescales similar to those of iMD-VR experts, with the majority able to recover binding poses within 2.15 Angstrom RMSD of the crystallographic binding pose. These results indicate that iMD-VR affords sufficient control for users to carry out the detailed atomic manipulations required to dock flexible ligands into dynamic enzyme active sites and recover crystallographic poses, offering an interesting new approach for simulating drug docking and generating binding hypotheses.
Submitted 4 February, 2020; v1 submitted 20 August, 2019;
originally announced August 2019.
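The iMD-VR abstract above judges a recovered pose by its RMSD from the crystallographic binding pose, with 2.15 Angstrom as the reported bound. The following is a minimal sketch of that comparison. It assumes both poses list the same atoms in the same order and are already expressed in a common reference frame, so no superposition step is performed; the abstract does not specify which atoms were compared or how structures were aligned. The function names and the default threshold argument are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumes identical atom ordering and a shared reference frame
    (no alignment is done here).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


def pose_recovered(pose, crystal_pose, threshold_angstrom=2.15):
    """True if the pose lies within the RMSD threshold of the crystal pose."""
    return rmsd(pose, crystal_pose) <= threshold_angstrom
```

If the two structures are not in the same frame, an optimal superposition (for example, the Kabsch algorithm) would be needed before computing the RMSD.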