-
Core-level signature of long-range charge-density-wave order and short-range excitonic correlations probed by attosecond broadband spectroscopy
Authors:
Alfred Zong,
Sheng-Chih Lin,
Shunsuke A. Sato,
Emma Berger,
Bailey R. Nebgen,
Marcus Hui,
B. Q. Lv,
Yun Cheng,
Wei Xia,
Yanfeng Guo,
Dao Xiang,
Michael W. Zuerch
Abstract:
Strongly-correlated quantum materials are characterized by a multitude of time- and energy-scales that govern quasiparticle interactions and excitations, and a precise understanding of their ground state properties necessitates access to experimental observables that are sensitive to both the establishment of long-range order and short-range correlations. Advances in attosecond core-level spectroscopies have successfully unlocked the fastest processes involving carrier dynamics in these systems, yet they are not traditionally regarded as an appropriate probe for long-range order due to their local charge sensitivity or for the low-energy physics that governs the ground-state properties of strongly-correlated materials. Here, employing a unique cryogenic attosecond broadband extreme-ultraviolet beamline, we identified clear core-level signatures of long-range charge-density-wave (CDW) formation in an excitonic insulator candidate, titanium diselenide in its 1T phase, although equilibrium measurements of the same core levels in either photoemission or absorption spectroscopy showed no observable signals of the phase transition. Leveraging intrinsic sensitivity to short-range charge excitations in core-level absorption spectra, we observed direct time-domain evidence for incoherent excitonic correlations in the normal state of the material, whose presence has been the subject of a long-standing debate in equilibrium experiments due to the existence of CDW phonon fluctuations in a similar part of the phase space. Our results demonstrated the importance of simultaneous access to long- and short-range ordering with underlying dynamical processes spanning decades of time- and energy-scales, making attosecond broadband spectroscopy an indispensable tool both for understanding the equilibrium phase diagram and for discovering novel, nonequilibrium states in correlated materials.
Submitted 30 June, 2024;
originally announced July 2024.
-
Robust Adversarial Defense by Tensor Factorization
Authors:
Manish Bhattarai,
Mehmet Cagri Kaymak,
Ryan Barron,
Ben Nebgen,
Kim Rasmussen,
Boian Alexandrov
Abstract:
As machine learning techniques become increasingly prevalent in data analysis, the threat of adversarial attacks has surged, necessitating robust defense mechanisms. Among these defenses, methods exploiting low-rank approximations for input data preprocessing and neural network (NN) parameter factorization have shown potential. Our work advances this field further by integrating the tensorization of input data with low-rank decomposition and tensorization of NN parameters to enhance adversarial defense. The proposed approach demonstrates significant defense capabilities, maintaining robust accuracy even when subjected to the strongest known auto-attacks. Evaluations against leading-edge robust performance benchmarks reveal that our results not only hold their ground against the best defensive methods available but also exceed all current defense strategies that rely on tensor factorizations. This study underscores the potential of integrating tensorization and low-rank decomposition as a robust defense against adversarial attacks in machine learning.
Submitted 3 September, 2023;
originally announced September 2023.
-
A solid-state high harmonic generation spectrometer with cryogenic cooling
Authors:
Finn Kohrell,
Bailey R. Nebgen,
Jacob A. Spies,
Richard Hollinger,
Alfred Zong,
Can Uzundal,
Christian Spielmann,
Michael Zuerch
Abstract:
Solid-state high harmonic generation spectroscopy (sHHG) is a promising technique for studying electronic structure, symmetry, and dynamics in condensed matter systems. Here, we report on the implementation of an advanced sHHG spectrometer based on a vacuum chamber and closed-cycle helium cryostat. Using an in situ temperature probe, it is demonstrated that the sample interaction region retains cryogenic temperature during the application of high-intensity femtosecond laser pulses that generate high harmonics. The presented implementation opens the door for temperature-dependent sHHG measurements down to a few kelvin, making sHHG spectroscopy a new tool for studying phases of matter that emerge at low temperatures, which is particularly interesting for highly correlated materials.
Submitted 7 September, 2023; v1 submitted 2 September, 2023;
originally announced September 2023.
-
Machine learning potentials with Iterative Boltzmann Inversion: training to experiment
Authors:
Sakib Matin,
Alice Allen,
Justin S. Smith,
Nicholas Lubbers,
Ryan B. Jadrich,
Richard A. Messerly,
Benjamin T. Nebgen,
Ying Wai Li,
Sergei Tretiak,
Kipton Barros
Abstract:
Methodologies for training machine learning potentials (MLPs) to quantum-mechanical simulation data have recently seen tremendous progress. Experimental data has a very different character than simulated data, and most MLP training procedures cannot be easily adapted to incorporate both types of data into the training process. We investigate a training procedure based on Iterative Boltzmann Inversion that produces a pair potential correction to an existing MLP, using equilibrium radial distribution function data. By applying these corrections to an MLP for pure aluminum based on Density Functional Theory, we observe that the resulting model largely addresses previous overstructuring in the melt phase. Interestingly, the corrected MLP also exhibits improved performance in predicting experimental diffusion constants, which are not included in the training procedure. The presented method does not require auto-differentiating through a molecular dynamics solver, and does not make assumptions about the MLP architecture. The results suggest a practical framework for incorporating experimental data into machine learning models to improve the accuracy of molecular dynamics simulations.
Submitted 10 July, 2023;
originally announced July 2023.
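The Iterative Boltzmann Inversion step mentioned in the abstract above can be illustrated with a minimal sketch. It uses the standard textbook IBI update, in which the pair-potential correction at each distance bin is shifted by k_B T ln(g_model / g_target); it is not taken from the paper's code. The function name `ibi_update`, the `damping` factor, and the toy radial distribution values are all hypothetical and chosen only for illustration.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def ibi_update(correction, g_model, g_target, temperature, damping=1.0, eps=1e-12):
    """Apply one textbook IBI step to a tabulated pair-potential correction.

    correction, g_model and g_target are lists over the same distance bins.
    Bins where either RDF is (numerically) zero are left unchanged, because
    the logarithm is undefined there.
    """
    kT = K_B * temperature
    updated = []
    for dv, gm, gt in zip(correction, g_model, g_target):
        if gm < eps or gt < eps:
            updated.append(dv)
        else:
            # Overstructured bins (gm > gt) receive a repulsive (positive) shift.
            updated.append(dv + damping * kT * math.log(gm / gt))
    return updated


# Toy data (hypothetical): the model RDF overshoots the target in the middle bin.
new_correction = ibi_update(
    correction=[0.0, 0.0, 0.0],
    g_model=[0.0, 1.2, 1.0],
    g_target=[0.0, 1.0, 1.0],
    temperature=1000.0,
)
print(new_correction)
```

In practice the update would be repeated, with the model RDF recomputed from a new simulation after each step, until the model and target RDFs agree within some tolerance.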
-
Machine Learning Models Capture Plasmon Dynamics in Ag Nanoparticles
Authors:
Adela Habib,
Nicholas Lubbers,
Sergei Tretiak,
Benjamin Nebgen
Abstract:
Highly energetic electron-hole pairs (hot carriers) formed from plasmon decay in metallic nanostructures promise sustainable pathways for energy-harvesting devices. However, efficient collection before thermalization remains an obstacle for realization of their full energy generating potential. Addressing this challenge requires detailed understanding of physical processes from plasmon excitation in metal to their collection in a molecule or a semiconductor, where atomistic theoretical investigation may be particularly beneficial. Unfortunately, first-principles theoretical modeling of these processes is extremely costly, limiting the analysis to systems with a few hundred atoms. Recent advances in machine learned interatomic potentials suggest that dynamics can be accelerated with surrogate models which replace the full solution of the Schrödinger equation. Here, we modify an existing neural network, Hierarchically Interacting Particle Neural Network (HIP-NN), to predict plasmon dynamics in Ag nanoparticles. We demonstrate the model's capability to accurately predict plasmon dynamics in large nanoparticles of up to 561 atoms not present in the training dataset. More importantly, with machine learning models we gain a speed-up of about 200 times as compared with the rt-TDDFT calculations when predicting important physical quantities such as dynamic dipole moments in Ag$_{55}$ and about 4000 times for extended nanoparticles that are 10 times larger. This underscores the promise of future machine learning accelerated electron/nuclear dynamics simulations for understanding fundamental properties of plasmon-driven hot carrier devices.
Submitted 7 March, 2023;
originally announced March 2023.
-
Semi-Empirical Shadow Molecular Dynamics: A PyTorch implementation
Authors:
Maksim Kulichenko,
Kipton Barros,
Nicholas Lubbers,
Nikita Fedik,
Guoqing Zhou,
Sergei Tretiak,
Benjamin Nebgen,
Anders M. N. Niklasson
Abstract:
Extended Lagrangian Born-Oppenheimer molecular dynamics (XL-BOMD) in its most recent shadow potential energy version has been implemented in the semiempirical PyTorch-based software PySeQM. The implementation includes finite electronic temperatures, canonical density matrix perturbation theory, and an adaptive Krylov Subspace Approximation for the integration of the electronic equations of motion within the XL-BOMD approach (KSA-XL-BOMD). The PyTorch implementation leverages GPU and machine learning hardware accelerators for the simulations. The new XL-BOMD formulation allows studying more challenging chemical systems with charge instabilities and low electronic energy gaps. The current public release of PySeQM continues our development of a modular architecture for large-scale simulations employing semiempirical quantum mechanical treatment. Applied to molecular dynamics simulation of 840 carbon atoms, one integration time step executes in 4 seconds on a single Nvidia RTX A6000 GPU.
Submitted 1 March, 2023;
originally announced March 2023.
-
Lightweight and Effective Tensor Sensitivity for Atomistic Neural Networks
Authors:
Michael Chigaev,
Justin S. Smith,
Steven Anaya,
Benjamin Nebgen,
Matthew Bettencourt,
Kipton Barros,
Nicholas Lubbers
Abstract:
Atomistic machine learning focuses on the creation of models which obey fundamental symmetries of atomistic configurations, such as permutation, translation, and rotation invariances. In many of these schemes, translation and rotation invariance are achieved by building on scalar invariants, e.g., distances between atom pairs. There is growing interest in molecular representations that work internally with higher rank rotational tensors, e.g., vector displacements between atoms, and tensor products thereof. Here we present a framework for extending the Hierarchically Interacting Particle Neural Network (HIP-NN) with Tensor Sensitivity information (HIP-NN-TS) from each local atomic environment. Crucially, the method employs a weight tying strategy that allows direct incorporation of many-body information while adding very few model parameters. We show that HIP-NN-TS is more accurate than HIP-NN, with negligible increase in parameter count, for several datasets and network sizes. As the dataset becomes more complex, tensor sensitivities provide greater improvements to model accuracy. In particular, HIP-NN-TS achieves a record mean absolute error of 0.927 kcal/mol for conformational energy variation on the challenging COMP6 benchmark, which includes a broad set of organic molecules. We also compare the computational performance of HIP-NN-TS to HIP-NN and other models in the literature.
Submitted 26 April, 2023; v1 submitted 6 December, 2022;
originally announced December 2022.
-
Using Machine Learning Hamiltonians To Compute Molecular Motor Barrier Heights
Authors:
Aaron Philip,
Guoqing Zhou,
Benjamin Nebgen
Abstract:
Machine Learning Inter-atomic Potentials (MLIPs) have become a common tool for computational chemists due to their combination of accuracy and speed. Yet, it is still not clear how well these tools behave at or near transition states found in complex molecules. Here we investigate the applicability of MLIPs in evaluating the transition barrier of two complex molecular motor systems: a 1st generation Feringa motor and the 9c alkene 2nd generation Feringa motor. We compared paths generated with the Hierarchically Interacting Particle Neural Network (HIP-NN), the PM3 semi-empirical quantum method (SEQM), PM3 interfaced with HIP-NN (SEQM+HIP-NN), and Density Functional Theory calculations. We found that using SEQM+HIP-NN to generate cheap, realistic pathway guesses and then refining the intermediates with DFT allowed us to cheaply find realistic reaction paths and energy barriers matching experiment, providing evidence that deep learning can be used for high precision computational tasks such as transition path sampling while also suggesting potential application to high throughput screening.
Submitted 14 November, 2022;
originally announced November 2022.
-
Distributed non-negative RESCAL with Automatic Model Selection for Exascale Data
Authors:
Manish Bhattarai,
Namita Kharat,
Erik Skau,
Benjamin Nebgen,
Hristo Djidjev,
Sanjay Rajopadhye,
James P. Smith,
Boian Alexandrov
Abstract:
With the boom in the development of computer hardware and software, social media, IoT platforms, and communications, there has been an exponential growth in the volume of data produced around the world. Among these data, relational datasets are growing in popularity as they provide unique insights regarding the evolution of communities and their interactions. Relational datasets are naturally non-negative, sparse, and extra-large. Relational data usually contain triples, (subject, relation, object), and are represented as graphs/multigraphs, called knowledge graphs, which need to be embedded into a low-dimensional dense vector space. Among various embedding models, RESCAL allows learning of relational data to extract the posterior distributions over the latent variables and to make predictions of missing relations. However, RESCAL is computationally demanding and requires a fast and distributed implementation to analyze extra-large real-world datasets. Here we introduce a distributed non-negative RESCAL algorithm for heterogeneous CPU/GPU architectures with automatic selection of the number of latent communities (model selection), called pyDRESCALk. We demonstrate the correctness of pyDRESCALk with real-world and large synthetic tensors, and its efficacy through near-linear scaling that concurs with the theoretical complexities. Finally, pyDRESCALk determines the number of latent communities in an 11-terabyte dense and 9-exabyte sparse synthetic tensor.
Submitted 18 February, 2022;
originally announced February 2022.
-
Signatures of multi-band effects in high-harmonic generation in monolayer MoS$_2$
Authors:
Lun Yue,
Richard Hollinger,
Can B. Uzundal,
Bailey Nebgen,
Ziyang Gan,
Emad Najafidehaghani,
Antony George,
Christian Spielmann,
Daniil Kartashov,
Andrey Turchanin,
Diana Y. Qiu,
Mette B. Gaarde,
Michael Zuerch
Abstract:
High-harmonic generation (HHG) in solids has been touted as a way to probe ultrafast dynamics and crystal symmetries in condensed matter systems. Here, we investigate the polarization properties of high-order harmonics generated in monolayer MoS$_2$, as a function of crystal orientation relative to the mid-infrared laser field polarization. At several different laser wavelengths we experimentally observe a prominent angular shift of the parallel-polarized odd harmonics for energies above approximately 3.5 eV, and our calculations indicate that this shift originates in subtle differences in the recombination dipole strengths involving multiple conduction bands. This observation is material specific and is in addition to the angular dependence imposed by the dynamical symmetry properties of the crystal interacting with the laser field, and may pave the way for probing the vectorial character of multi-band recombination dipoles.
Submitted 15 February, 2022; v1 submitted 24 December, 2021;
originally announced December 2021.
-
A Neural Network for Determination of Latent Dimensionality in Nonnegative Matrix Factorization
Authors:
Benjamin T. Nebgen,
Raviteja Vangara,
Miguel A. Hombrados-Herrera,
Svetlana Kuksova,
Boian S. Alexandrov
Abstract:
Non-negative Matrix Factorization (NMF) has proven to be a powerful unsupervised learning method for uncovering hidden features in complex and noisy data sets with applications in data mining, text recognition, dimension reduction, face recognition, anomaly detection, blind source separation, and many other fields. An important input for NMF is the latent dimensionality of the data, that is, the number of hidden features, K, present in the explored data set. Unfortunately, this quantity is rarely known a priori. We utilize a supervised machine learning approach in combination with a recent method for model determination, called NMFk, to determine the number of hidden features automatically. NMFk performs a set of NMF simulations on an ensemble of matrices, obtained by bootstrapping the initial data set, and determines which K produces stable groups of latent features that reconstruct the initial data set well. We then train a Multi-Layer Perceptron (MLP) classifier network to determine the correct number of latent features utilizing the statistics and characteristics of the NMF solutions, obtained from NMFk. In order to train the MLP classifier, a training set of 58,660 matrices with predetermined latent features was factorized with NMFk. The MLP classifier in conjunction with NMFk maintains a greater than 95% success rate when applied to a held out test set. Additionally, when applied to two well-known benchmark data sets, the swimmer and MIT face data, NMFk/MLP correctly recovered the established number of hidden features. Finally, we compared the accuracy of our method to the ARD, AIC and Stability-based methods.
Submitted 22 June, 2020;
originally announced June 2020.
-
Automated discovery of a robust interatomic potential for aluminum
Authors:
Justin S. Smith,
Benjamin Nebgen,
Nithin Mathew,
Jie Chen,
Nicholas Lubbers,
Leonid Burakovsky,
Sergei Tretiak,
Hai Ah Nam,
Timothy Germann,
Saryu Fensin,
Kipton Barros
Abstract:
Accuracy of molecular dynamics simulations depends crucially on the interatomic potential used to generate forces. The gold standard would be first-principles quantum mechanics (QM) calculations, but these become prohibitively expensive at large simulation scales. Machine learning (ML) based potentials aim for faithful emulation of QM at drastically reduced computational cost. The accuracy and robustness of an ML potential are primarily limited by the quality and diversity of the training dataset. Using the principles of active learning (AL), we present a highly automated approach to dataset construction. The strategy is to use the ML potential under development to sample new atomic configurations and, whenever a configuration is reached for which the ML uncertainty is sufficiently large, collect new QM data. Here, we seek to push the limits of automation, removing as much expert knowledge from the AL process as possible. All sampling is performed using MD simulations starting from an initially disordered configuration, and undergoing non-equilibrium dynamics as driven by time-varying applied temperatures. We demonstrate this approach by building an ML potential for aluminum (ANI-Al). After many AL iterations, ANI-Al teaches itself to predict properties like the radial distribution function in melt, liquid-solid coexistence curve, and crystal properties such as defect energies and barriers. To demonstrate transferability, we perform a 1.3M atom shock simulation, and show that ANI-Al predictions agree very well with DFT calculations on local atomic environments sampled from the nonequilibrium dynamics. Interestingly, the configurations appearing in shock appear to have been well sampled in the AL training dataset, in a way that we illustrate visually.
Submitted 24 August, 2020; v1 submitted 10 March, 2020;
originally announced March 2020.
-
Machine Learned Hückel Theory: Interfacing Physics and Deep Neural Networks
Authors:
Tetiana Zubatyuk,
Ben Nebgen,
Nicholas Lubbers,
Justin S. Smith,
Roman Zubatyuk,
Guoqing Zhou,
Christopher Koh,
Kipton Barros,
Olexandr Isayev,
Sergei Tretiak
Abstract:
The Hückel Hamiltonian is an incredibly simple tight-binding model famed for its ability to capture qualitative physics phenomena arising from electron interactions in molecules and materials. Part of its simplicity arises from using only two types of empirically fit physics-motivated parameters: the first describes the orbital energies on each atom and the second describes electronic interactions and bonding between atoms. By replacing these traditionally static parameters with dynamically predicted values, we vastly increase the accuracy of the extended Hückel model. The dynamic values are generated with a deep neural network, which is trained to reproduce orbital energies and densities derived from density functional theory. The resulting model retains interpretability while the deep neural network parameterization is smooth, accurate, and reproduces insightful features of the original static parameterization. Finally, we demonstrate that the Hückel model, and not the deep neural network, is responsible for capturing intricate orbital interactions in two molecular case studies. Overall, this work shows the promise of utilizing machine learning to formulate simple, accurate, and dynamically parameterized physics models.
Submitted 27 September, 2019;
originally announced September 2019.
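For orientation, the two parameter types named in the abstract above can be written out for the simplest textbook Hückel model (orthogonal basis, couplings only between bonded atoms). This form is an assumption made for illustration and may differ from the extended variant used in the paper. The symbols $\alpha_i$, $\beta_{ij}$, $\mathbf{R}$ (nuclear geometry), and $\theta$ (network weights) are notation introduced here, not taken from the source.

```latex
\[
H_{ij} =
\begin{cases}
\alpha_i & i = j \quad \text{(orbital energy on atom } i\text{)} \\
\beta_{ij} & i \neq j,\ \text{atoms } i \text{ and } j \text{ bonded} \\
0 & \text{otherwise}
\end{cases}
\qquad
H\,\mathbf{c}_k = E_k\,\mathbf{c}_k
\]
\[
\text{static: } \alpha_i,\ \beta_{ij} \text{ fixed per element or bond type}
\quad\longrightarrow\quad
\text{dynamic: } \alpha_i(\mathbf{R};\theta),\ \beta_{ij}(\mathbf{R};\theta)
\]
```

In the dynamic case the Hamiltonian keeps the same sparse structure and is still diagonalized to obtain orbital energies $E_k$; only the source of its entries changes.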
-
Transferable Molecular Charge Assignment Using Deep Neural Networks
Authors:
Ben Nebgen,
Nick Lubbers,
Justin S. Smith,
Andrew Sifain,
Andrey Lokhov,
Olexandr Isayev,
Adrian Roitberg,
Kipton Barros,
Sergei Tretiak
Abstract:
We use HIP-NN, a neural network architecture that excels at predicting molecular energies, to predict atomic charges. The charge predictions are accurate over a wide range of molecules (both small and large) and for a diverse set of charge assignment schemes. To demonstrate the power of charge prediction on non-equilibrium geometries, we use HIP-NN to generate IR spectra from dynamical trajectories on a variety of molecules. The results are in good agreement with reference IR spectra produced by traditional theoretical methods. Critically, for this application, HIP-NN charge predictions are about $10^4$ times faster than direct DFT charge calculations. Thus, ML provides a pathway to greatly increase the range of feasible simulations while retaining quantum-level accuracy. In summary, our results provide further evidence that machine learning can replicate high-level quantum calculations at a tiny fraction of the computational cost.
Submitted 12 March, 2018;
originally announced March 2018.
-
Less is more: sampling chemical space with active learning
Authors:
Justin S. Smith,
Ben Nebgen,
Nicholas Lubbers,
Olexandr Isayev,
Adrian E. Roitberg
Abstract:
The development of accurate and transferable machine learning (ML) potentials for predicting molecular energetics is a challenging task. The process of data generation to train such ML potentials is a task neither well understood nor researched in detail. In this work, we present a fully automated approach for the generation of datasets with the intent of training universal ML potentials. It is based on the concept of active learning (AL) via Query by Committee (QBC), which uses the disagreement between an ensemble of ML potentials to infer the reliability of the ensemble's prediction. QBC allows the presented AL algorithm to automatically sample regions of chemical space where the ML potential fails to accurately predict the potential energy. AL improves the overall fitness of ANAKIN-ME (ANI) deep learning potentials in rigorous test cases by mitigating human biases in deciding what new training data to use. AL also reduces the training set size to a fraction of the data required when using naive random sampling techniques. To provide validation of our AL approach we develop the COMP6 benchmark (publicly available on GitHub), which contains a diverse set of organic molecules. Through the AL process, it is shown that the AL-based potentials perform as well as the ANI-1 potential on COMP6 with only 10% of the data, and vastly outperform ANI-1 with 25% of the data. Finally, we show that our proposed AL technique develops a universal ANI potential (ANI-1x) that provides accurate energy and force predictions on the entire COMP6 benchmark. This universal ML potential achieves a level of accuracy on par with the best ML potentials for single molecules or materials, while remaining applicable to the general class of organic molecules composed of the elements CHNO.
Submitted 9 April, 2018; v1 submitted 28 January, 2018;
originally announced January 2018.
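The Query by Committee idea in the abstract above, using ensemble disagreement to decide where new reference data are needed, can be sketched as follows. This is a generic illustration, not the paper's selection criterion: the disagreement measure (population standard deviation of the committee's predictions), the threshold value, the function names, and the toy one-dimensional "potentials" are all assumptions made here.

```python
import statistics


def committee_disagreement(predictions):
    """Spread of the ensemble's predictions for a single candidate input."""
    return statistics.pstdev(predictions)


def select_for_labeling(candidates, committee, threshold):
    """Return the candidates on which the committee disagrees by more than threshold."""
    selected = []
    for x in candidates:
        predictions = [model(x) for model in committee]
        if committee_disagreement(predictions) > threshold:
            selected.append(x)
    return selected


# Toy committee (hypothetical): three models that agree near x = 0 and
# drift apart as |x| grows, mimicking extrapolation outside the training data.
committee = [
    lambda x: x ** 2,
    lambda x: x ** 2 + 0.1 * x ** 3,
    lambda x: x ** 2 - 0.1 * x ** 3,
]

candidates = [0.5, 1.0, 2.0, 3.0]
selected = select_for_labeling(candidates, committee, threshold=0.5)
print(selected)  # prints [2.0, 3.0]
```

In an active learning loop, the selected inputs would be sent for reference (for example, quantum-mechanical) calculations, added to the training set, and the committee retrained before the next round of sampling.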