Search | arXiv e-print repository

arXiv:2310.20155 [pdf]

doi 10.1021/acs.jctc.3c01203

MLatom 3: Platform for machine learning-enhanced computational chemistry simulations and workflows

Authors: Pavlo O. Dral, Fuchun Ge, Yi-Fan Hou, Peikun Zheng, Yuxinxin Chen, Mario Barbatti, Olexandr Isayev, Cheng Wang, Bao-Xin Xue, Max Pinheiro Jr, Yuming Su, Yiheng Dai, Yangtao Chen, Lina Zhang, Shuang Zhang, Arif Ullah, Quanhao Zhang, Yanchi Ou

Abstract: Machine learning (ML) is increasingly becoming a common tool in computational chemistry. At the same time, the rapid development of ML methods requires a flexible software framework for designing custom workflows. MLatom 3 is a program package designed to leverage the power of ML to enhance typical computational chemistry simulations and to create complex workflows. This open-source package offers users plenty of choice: they can run simulations with command-line options, input files, or scripts that use MLatom as a Python package, both on their own computers and on the online XACS cloud computing service at XACScloud.com. Computational chemists can calculate energies and thermochemical properties, optimize geometries, run molecular and quantum dynamics, and simulate (ro)vibrational, one-photon UV/vis absorption, and two-photon absorption spectra with ML, quantum mechanical, and combined models. Users can choose from an extensive library of methods containing pre-trained ML models and quantum mechanical approximations such as AIQM1, which approaches coupled-cluster accuracy. Developers can build their own models using various ML algorithms. The great flexibility of MLatom is largely due to the extensive use of interfaces to many state-of-the-art software packages and libraries.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.00115 [pdf, other]

Learning Over Molecular Conformer Ensembles: Datasets and Benchmarks

Authors: Yanqiao Zhu, Jeehyun Hwang, Keir Adams, Zhen Liu, Bozhao Nan, Brock Stenfors, Yuanqi Du, Jatin Chauhan, Olaf Wiest, Olexandr Isayev, Connor W. Coley, Yizhou Sun, Wei Wang

Abstract: Molecular Representation Learning (MRL) has proven impactful in numerous biochemical applications such as drug discovery and enzyme design. While Graph Neural Networks (GNNs) are effective at learning molecular representations from a 2D molecular graph or a single 3D structure, existing works often overlook the flexible nature of molecules, which continuously interconvert across conformations via chemical bond rotations and minor vibrational perturbations. To better account for molecular flexibility, some recent works formulate MRL as an ensemble learning problem, focusing on explicitly learning from a set of conformer structures. However, most of these studies have limited datasets, tasks, and models. In this work, we introduce the first MoleculAR Conformer Ensemble Learning (MARCEL) benchmark to thoroughly evaluate the potential of learning on conformer ensembles and suggest promising research directions. MARCEL includes four datasets covering diverse molecule- and reaction-level properties of chemically diverse molecules including organocatalysts and transition-metal catalysts, extending beyond the scope of common GNN benchmarks that are confined to drug-like molecules. In addition, we conduct a comprehensive empirical study, which benchmarks representative 1D, 2D, and 3D molecular representation learning models, along with two strategies that explicitly incorporate conformer ensembles into 3D MRL models. Our findings reveal that direct learning from an accessible conformer space can improve performance on a variety of tasks and models.

Submitted 29 September, 2023; originally announced October 2023.

Comments: 19 pages

arXiv:2301.07594 [pdf, other]

Structure Prediction of Epitaxial Organic Interfaces with Ogre, Demonstrated for TCNQ on TTF

Authors: Saeed Moayedpour, Imanuel Bier, Wen Wen, Derek Dardzinski, Olexandr Isayev, Noa Marom

Abstract: Highly ordered epitaxial interfaces between organic semiconductors are considered a promising avenue for enhancing the performance of organic electronic devices, including solar cells, light-emitting diodes, and transistors, thanks to their well-controlled, uniform electronic properties and high carrier mobilities. Although the phenomenon of organic epitaxy has been known for decades, computational methods for structure prediction of epitaxial organic interfaces have lagged far behind the existing methods for their inorganic counterparts. We present a method for structure prediction of epitaxial organic interfaces based on lattice matching followed by surface matching, implemented in the open-source Python package Ogre. The lattice matching step produces domain-matched interfaces, where commensurability is achieved with different integer multiples of the substrate and film unit cells. In the surface matching step, Bayesian optimization (BO) is used to find the interfacial distance and registry between the substrate and film. The BO objective function is based on dispersion-corrected deep neural network interatomic potentials, shown to be in excellent agreement with density functional theory (DFT). The application of Ogre is demonstrated for an epitaxial interface of 7,7,8,8-tetracyanoquinodimethane (TCNQ) on tetrathiafulvalene (TTF), whose electronic structure has been probed by ultraviolet photoemission spectroscopy (UPS), but whose structure had been hitherto unknown [Organic Electronics 48, 371 (2017)].
We find that TCNQ(001) on top of TTF(100) is the most stable interface configuration, closely followed by TCNQ(010) on top of TTF(100). The density of states, calculated using DFT, is in excellent agreement with UPS, including the presence of an interface charge-transfer state.
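The lattice matching step can be illustrated in one dimension. The sketch below is a simplified, hypothetical illustration, not Ogre's implementation or API: it assumes a single lattice parameter per material (`a_sub`, `a_film`) and a relative mismatch tolerance `tol`, and searches for coprime integer multiples m and n such that m*a_sub is close to n*a_film. A real epitaxial search works with 2D surface lattice vectors and angles.

```python
from math import gcd


def domain_matches(a_sub, a_film, max_mult=10, tol=0.02):
    """Find integer multiples (m, n) with m * a_sub close to n * a_film.

    Returns (m, n, strain) tuples, smallest supercell first, where strain is
    the relative length mismatch |n*a_film - m*a_sub| / (m*a_sub).
    """
    matches = []
    for m in range(1, max_mult + 1):
        for n in range(1, max_mult + 1):
            if gcd(m, n) != 1:
                continue  # (2m, 2n) repeats the (m, n) match in a larger cell
            strain = abs(n * a_film - m * a_sub) / (m * a_sub)
            if strain <= tol:
                matches.append((m, n, strain))
    # Prefer the shortest commensurate length, then the smallest strain.
    matches.sort(key=lambda t: (t[0] * a_sub, t[2]))
    return matches
```

For example, with `a_sub = 3.0` and `a_film = 4.0`, four substrate cells and three film cells span the same 12-unit length with zero strain, so `(4, 3, 0.0)` is returned first.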

Submitted 18 January, 2023; originally announced January 2023.

arXiv:2207.14276 [pdf, other]

doi 10.1039/D2SC04815A

Scalable Hybrid Deep Neural Networks/Polarizable Potentials Biomolecular Simulations including long-range effects

Authors: Théo Jaffrelot Inizan, Thomas Plé, Olivier Adjoua, Pengyu Ren, Hatice Gökcan, Olexandr Isayev, Louis Lagardère, Jean-Philip Piquemal

Abstract: Deep-HP is a scalable extension of the Tinker-HP multi-GPU molecular dynamics (MD) package enabling the use of PyTorch/TensorFlow deep neural network (DNN) models. Deep-HP increases DNN MD capabilities by orders of magnitude, offering access to ns simulations for 100k-atom biosystems while offering the possibility of coupling DNNs to any classical (FF) and many-body polarizable (PFF) force fields. It therefore allows the introduction of the ANI-2X/AMOEBA hybrid polarizable potential, designed for ligand binding studies, in which solvent-solvent and solvent-solute interactions are computed with the AMOEBA PFF while solute-solute ones are computed by the ANI-2x DNN. ANI-2X/AMOEBA explicitly includes AMOEBA's physical long-range interactions via an efficient Particle Mesh Ewald implementation while preserving ANI-2X's short-range quantum mechanical accuracy for the solute. The DNN/PFF partition can be user-defined, allowing hybrid simulations to include key biosimulation ingredients such as polarizable solvents, polarizable counter ions, etc. ANI-2X/AMOEBA is accelerated using a multiple-timestep strategy focusing on the models' contributions to low-frequency modes of nuclear forces. It primarily evaluates AMOEBA forces while including ANI-2x ones only via correction steps, resulting in an order-of-magnitude acceleration over standard Velocity Verlet integration. Simulating more than 10 μs, we compute solvation free energies of charged and uncharged ligands in 4 solvents, and absolute binding free energies of host-guest complexes from SAMPL challenges.
ANI-2X/AMOEBA average errors are within chemical accuracy, opening the path towards large-scale hybrid DNN simulations, at force-field cost, in biophysics and drug discovery.
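The correction-step idea behind the multiple-timestep scheme can be sketched with a generic reversible RESPA integrator. This is an assumption-laden illustration, not the Deep-HP code: `fast_force` stands in for the frequently evaluated force (playing the role of AMOEBA here), `slow_force` for a correction applied only at the outer step (assumed here to be something like the ANI-2x minus AMOEBA difference on the solute), and the step sizes are arbitrary.

```python
def respa_step(x, v, m, fast_force, slow_force, dt_outer, n_inner):
    """Advance (x, v) by one outer step of a reversible RESPA integrator."""
    # Half kick from the slow (correction) force at the outer time step.
    v = v + 0.5 * dt_outer * slow_force(x) / m
    # Inner velocity Verlet loop driven only by the fast (cheap) force.
    dt = dt_outer / n_inner
    f = fast_force(x)
    for _ in range(n_inner):
        v = v + 0.5 * dt * f / m
        x = x + dt * v
        f = fast_force(x)
        v = v + 0.5 * dt * f / m
    # Closing half kick from the slow force at the new position.
    v = v + 0.5 * dt_outer * slow_force(x) / m
    return x, v
```

The slow force is evaluated only at the outer steps, not at every inner step; with `slow_force` returning zero, the scheme reduces to ordinary velocity Verlet with time step `dt_outer / n_inner`.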

Submitted 30 August, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

Journal ref: Chemical Science, 2023

arXiv:2112.03235 [pdf, other]

Simulation Intelligence: Towards a New Generation of Scientific Methods

Authors: Alexander Lavin, David Krakauer, Hector Zenil, Justin Gottschlich, Tim Mattson, Johann Brehmer, Anima Anandkumar, Sanjay Choudry, Kamil Rocki, Atılım Güneş Baydin, Carina Prunkl, Brooks Paige, Olexandr Isayev, Erik Peterson, Peter L. McMahon, Jakob Macke, Kyle Cranmer, Jiaxin Zhang, Haruko Wainwright, Adi Hanuka, Manuela Veloso, Samuel Assefa, Stephan Zheng, Avi Pfeffer

Abstract: The original "Seven Motifs" set forth a roadmap of essential methods for the field of scientific computing, where a motif is an algorithmic method that captures a pattern of computation and data movement. We present the "Nine Motifs of Simulation Intelligence", a roadmap for the development and integration of the essential algorithms necessary for a merger of scientific computing, scientific simulation, and artificial intelligence. We call this merger simulation intelligence (SI), for short. We argue the motifs of simulation intelligence are interconnected and interdependent, much like the components within the layers of an operating system. Using this metaphor, we explore the nature of each layer of the simulation intelligence operating system stack (SI-stack) and the motifs therein: (1) Multi-physics and multi-scale modeling; (2) Surrogate modeling and emulation; (3) Simulation-based inference; (4) Causal modeling and inference; (5) Agent-based modeling; (6) Probabilistic programming; (7) Differentiable programming; (8) Open-ended optimization; (9) Machine programming. We believe coordinated efforts between motifs offer immense opportunity to accelerate scientific discovery, from solving inverse problems in synthetic biology and climate science, to directing nuclear energy experiments and predicting emergent behavior in socioeconomic settings.
We elaborate on each layer of the SI-stack, detailing the state-of-the-art methods, presenting examples to highlight challenges and opportunities, and advocating for specific ways to advance the motifs and the synergies from their combinations. Advancing and integrating these technologies can enable a robust and efficient hypothesis-simulation-analysis type of scientific method, which we introduce with several use-cases for human-machine teaming and automated science.

Submitted 27 November, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

arXiv:1911.11559 [pdf, other]

Impressive computational acceleration by using machine learning for 2-dimensional super-lubricant materials discovery

Authors: Marco Fronzi, Mutaz Abu Ghazaleh, Olexandr Isayev, David A. Winkler, Joe Shapter, Michael J. Ford

Abstract: The screening of novel materials is an important topic in the field of materials science. Although traditional computational modeling, especially first-principles approaches, is a very useful and accurate tool to predict the properties of novel materials, it still demands extensive and expensive state-of-the-art computational resources. Additionally, such calculations can often be extremely time-consuming. We describe a time- and resource-efficient machine learning approach to create a large dataset of structural properties of van der Waals layered structures. In particular, we focus on the interlayer energy and the elastic constant of layered materials composed of two different 2-dimensional (2D) structures, which are important for novel solid lubricant and super-lubricant materials. We show that machine learning models can recapitulate results of computationally expensive approaches (i.e., density functional theory) with high accuracy.

Submitted 29 July, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

arXiv:1909.12963 [pdf]

doi 10.1063/5.0052857

Machine Learned Hückel Theory: Interfacing Physics and Deep Neural Networks

Authors: Tetiana Zubatyuk, Ben Nebgen, Nicholas Lubbers, Justin S. Smith, Roman Zubatyuk, Guoqing Zhou, Christopher Koh, Kipton Barros, Olexandr Isayev, Sergei Tretiak

Abstract: The Hückel Hamiltonian is an incredibly simple tight-binding model famed for its ability to capture qualitative physics phenomena arising from electron interactions in molecules and materials. Part of its simplicity arises from using only two types of empirically fit physics-motivated parameters: the first describes the orbital energies on each atom and the second describes electronic interactions and bonding between atoms. By replacing these traditionally static parameters with dynamically predicted values, we vastly increase the accuracy of the extended Hückel model. The dynamic values are generated with a deep neural network, which is trained to reproduce orbital energies and densities derived from density functional theory. The resulting model retains interpretability while the deep neural network parameterization is smooth, accurate, and reproduces insightful features of the original static parameterization. Finally, we demonstrate that the Hückel model, and not the deep neural network, is responsible for capturing intricate orbital interactions in two molecular case studies. Overall, this work shows the promise of utilizing machine learning to formulate simple, accurate, and dynamically parameterized physics models.
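To make the two parameter types concrete, the following sketch builds a simple (not extended) Hückel Hamiltonian, with on-site energies α on the diagonal and couplings β for bonded pairs, and diagonalizes it with NumPy. In the paper these parameters are predicted by a deep neural network; here they are passed in directly, and the function name `huckel_levels` is hypothetical. The extended Hückel model additionally involves an overlap matrix, which this sketch omits.

```python
import numpy as np


def huckel_levels(alpha, beta, bonds):
    """Return sorted orbital energies of a simple Hückel Hamiltonian.

    alpha: on-site energy for each atom (diagonal elements).
    beta:  coupling for each bonded pair, in the same order as `bonds`.
    bonds: list of (i, j) atom-index pairs.
    """
    H = np.diag(np.asarray(alpha, dtype=float))
    for (i, j), b in zip(bonds, beta):
        H[i, j] = H[j, i] = b  # symmetric off-diagonal coupling
    return np.linalg.eigvalsh(H)  # eigenvalues in ascending order
```

For the benzene ring with α = 0 and β = -1 (in units of |β|), this returns the textbook levels -2, -1, -1, 1, 1, 2. Replacing the constant lists with per-atom and per-bond values from a model is where a dynamic parameterization would enter.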

Submitted 27 September, 2019; originally announced September 2019.

arXiv:1905.13372 [pdf, other]

MolecularRNN: Generating realistic molecular graphs with optimized properties

Authors: Mariya Popova, Mykhailo Shvets, Junier Oliva, Olexandr Isayev

Abstract: Designing new molecules with a set of predefined properties is a core problem in modern drug discovery and development. There is a growing need for de novo design methods that would address this problem. We present MolecularRNN, a graph recurrent generative model for molecular structures. Our model generates diverse, realistic molecular graphs after likelihood pretraining on a large database of molecules. We analyze our pretrained models on large-scale generated datasets of 1 million samples. Further, the model is tuned with a policy gradient algorithm, given a critic that estimates the reward for the property of interest. We show a significant distribution shift to the desired range for lipophilicity, drug-likeness, and melting point, outperforming state-of-the-art works. With the use of rejection sampling based on valency constraints, our model yields 100% validity. Moreover, we show that invalid molecules provide a rich signal to the model through the use of a structure penalty in our reinforcement learning pipeline.
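Valency-based rejection sampling can be sketched as follows. The bond-order distribution `bond_probs`, the valence table, and the node-by-node loop are illustrative assumptions standing in for MolecularRNN's learned edge-level outputs; only the rejection step mirrors what the abstract describes. A draw that would exceed an atom's maximum valence is discarded and resampled, so every generated graph satisfies the valence constraints (connectivity and other chemistry checks are not enforced here).

```python
import random

# Assumed maximum valences for neutral atoms (illustrative only).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2}


def valence_used(bonds, k):
    """Sum of bond orders already attached to atom k."""
    return sum(order for (a, b), order in bonds.items() if k in (a, b))


def sample_bond(atoms, bonds, i, j, bond_probs, rng, max_tries=100):
    """Sample a bond order for atoms (i, j), rejecting valence-violating draws."""
    orders = list(bond_probs)
    weights = [bond_probs[o] for o in orders]
    for _ in range(max_tries):
        order = rng.choices(orders, weights=weights)[0]
        if order == 0:
            return 0  # "no bond" is always valid
        if (valence_used(bonds, i) + order <= MAX_VALENCE[atoms[i]]
                and valence_used(bonds, j) + order <= MAX_VALENCE[atoms[j]]):
            return order
    return 0  # give up and leave the pair unbonded


def generate(atoms, bond_probs, seed=0):
    """Add edges node by node; a trained model would supply bond_probs per step."""
    rng = random.Random(seed)
    bonds = {}
    for i in range(1, len(atoms)):
        for j in range(i):
            order = sample_bond(atoms, bonds, i, j, bond_probs, rng)
            if order:
                bonds[(j, i)] = order
    return bonds
```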

Submitted 30 May, 2019; originally announced May 2019.

arXiv:1803.04395 [pdf]

Transferable Molecular Charge Assignment Using Deep Neural Networks

Authors: Ben Nebgen, Nick Lubbers, Justin S. Smith, Andrew Sifain, Andrey Lokhov, Olexandr Isayev, Adrian Roitberg, Kipton Barros, Sergei Tretiak

Abstract: We use HIP-NN, a neural network architecture that excels at predicting molecular energies, to predict atomic charges. The charge predictions are accurate over a wide range of molecules (both small and large) and for a diverse set of charge assignment schemes. To demonstrate the power of charge prediction on non-equilibrium geometries, we use HIP-NN to generate IR spectra from dynamical trajectories on a variety of molecules. The results are in good agreement with reference IR spectra produced by traditional theoretical methods. Critically, for this application, HIP-NN charge predictions are about $10^4$ times faster than direct DFT charge calculations. Thus, ML provides a pathway to greatly increase the range of feasible simulations while retaining quantum-level accuracy. In summary, our results provide further evidence that machine learning can replicate high-level quantum calculations at a tiny fraction of the computational cost.
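One standard route from predicted charges to an IR spectrum is to build the dipole moment $\mu(t) = \sum_i q_i(t)\, \mathbf{r}_i(t)$ along a trajectory and take the power spectrum of its time derivative, which by the Wiener-Khinchin theorem matches Fourier transforming the dipole-derivative autocorrelation function. The sketch below assumes that route, omits physical prefactors and quantum corrections, and uses hypothetical function names; in the paper's setting, the charges would come from HIP-NN rather than being supplied by hand.

```python
import numpy as np


def dipole_trajectory(charges, positions):
    """charges: (T, N) per-frame atomic charges; positions: (T, N, 3).

    Returns the dipole moment for each frame, shape (T, 3).
    """
    return np.einsum("tn,tnk->tk", charges, positions)


def ir_spectrum(dipoles, dt):
    """Unnormalized IR line shape from the power spectrum of d(mu)/dt."""
    mu_dot = np.gradient(dipoles, dt, axis=0)
    mu_dot = mu_dot - mu_dot.mean(axis=0)  # remove the zero-frequency offset
    window = np.hanning(len(mu_dot))[:, None]  # reduce spectral leakage
    spec = np.abs(np.fft.rfft(mu_dot * window, axis=0)) ** 2
    freqs = np.fft.rfftfreq(len(mu_dot), d=dt)
    return freqs, spec.sum(axis=1)  # sum over x, y, z components
```

Frequencies come out in inverse units of `dt`; converting to wavenumbers is a matter of the time unit used along the trajectory.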

Submitted 12 March, 2018; originally announced March 2018.

arXiv:1801.09319 [pdf]

doi 10.1063/1.5023802

Less is more: sampling chemical space with active learning

Authors: Justin S. Smith, Ben Nebgen, Nicholas Lubbers, Olexandr Isayev, Adrian E. Roitberg

Abstract: The development of accurate and transferable machine learning (ML) potentials for predicting molecular energetics is a challenging task. The process of generating data to train such ML potentials is neither well understood nor researched in detail. In this work, we present a fully automated approach for the generation of datasets with the intent of training universal ML potentials. It is based on the concept of active learning (AL) via Query by Committee (QBC), which uses the disagreement between an ensemble of ML potentials to infer the reliability of the ensemble's prediction. QBC allows the presented AL algorithm to automatically sample regions of chemical space where the ML potential fails to accurately predict the potential energy. AL improves the overall fitness of ANAKIN-ME (ANI) deep learning potentials in rigorous test cases by mitigating human biases in deciding what new training data to use. AL also reduces the training set size to a fraction of the data required when using naive random sampling techniques. To validate our AL approach, we develop the COMP6 benchmark (publicly available on GitHub), which contains a diverse set of organic molecules. Through the AL process, it is shown that the AL-based potentials perform as well as the ANI-1 potential on COMP6 with only 10% of the data, and vastly outperform ANI-1 with 25% of the data. Finally, we show that our proposed AL technique develops a universal ANI potential (ANI-1x) that provides accurate energy and force predictions on the entire COMP6 benchmark.
This universal ML potential achieves a level of accuracy on par with the best ML potentials for single molecules or materials, while remaining applicable to the general class of organic molecules composed of the elements C, H, N, and O.
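Query by Committee reduces to a disagreement score per candidate structure. The sketch below computes the standard deviation of the committee's energy predictions and flags structures above a threshold for new reference calculations. Normalizing by the square root of the atom count, and the threshold value, are assumptions made here for illustration; the function name `qbc_select` is hypothetical.

```python
import numpy as np


def qbc_select(committee_energies, n_atoms, threshold):
    """Indices of structures whose committee disagreement exceeds threshold.

    committee_energies: (n_models, n_structures) predicted energies.
    n_atoms: (n_structures,) atom count of each structure.
    """
    sigma = np.std(committee_energies, axis=0)  # spread across the ensemble
    rho = sigma / np.sqrt(n_atoms)  # size-normalized disagreement (assumed)
    return np.flatnonzero(rho > threshold)
```

Structures returned by `qbc_select` would be sent to the reference method, added to the training set, and the committee retrained before the next sampling round.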

Submitted 9 April, 2018; v1 submitted 28 January, 2018; originally announced January 2018.

Comments: Accepted at J. Chem. Phys.

Journal ref: J. Chem. Phys. 148, 241733 (2018)

arXiv:1712.00422 [pdf, other]

The AFLOW Fleet for Materials Discovery

Authors: Cormac Toher, Corey Oses, David Hicks, Eric Gossett, Frisco Rose, Pinku Nath, Demet Usanmaz, Denise C. Ford, Eric Perim, Camilo E. Calderon, Jose J. Plata, Yoav Lederer, Michal Jahnátek, Wahyu Setyawan, Shidong Wang, Junkai Xue, Kevin Rasch, Roman V. Chepulskii, Richard H. Taylor, Geena Gomez, Harvey Shi, Andrew R. Supka, Rabih Al Rahal Al Orabi, Priya Gopal, Frank T. Cerasoli, et al. (26 additional authors not shown)

Abstract: The traditional paradigm for materials discovery has recently been expanded to incorporate substantial data-driven research. With the intent to accelerate the development and deployment of new technologies, the AFLOW Fleet for computational materials design automates high-throughput first-principles calculations and provides tools for data verification and dissemination for a broad community of users. AFLOW incorporates different computational modules to robustly determine thermodynamic stability, electronic band structures, vibrational dispersions, thermo-mechanical properties and more. The AFLOW data repository is publicly accessible online at aflow.org, with more than 1.7 million materials entries and a panoply of queryable computed properties. Tools to programmatically search and process the data, as well as to perform online machine learning predictions, are also available.

Submitted 1 December, 2017; originally announced December 2017.

Comments: 14 pages, 8 figures

arXiv:1711.10907 [pdf]

doi 10.1126/sciadv.aap7885

Deep Reinforcement Learning for De-Novo Drug Design

Authors: Mariya Popova, Olexandr Isayev, Alexander Tropsha

Abstract: We propose a novel computational strategy for de novo design of molecules with desired properties, termed ReLeaSE (Reinforcement Learning for Structural Evolution). Based on deep and reinforcement learning approaches, ReLeaSE integrates two deep neural networks, generative and predictive, that are trained separately but employed jointly to generate novel targeted chemical libraries. ReLeaSE employs a simple representation of molecules by their SMILES strings only. Generative models are trained with a stack-augmented memory network to produce chemically feasible SMILES strings, and predictive models are derived to forecast the desired properties of the de novo generated compounds. In the first phase of the method, generative and predictive models are trained separately with a supervised learning algorithm. In the second phase, both models are trained jointly with the reinforcement learning approach to bias the generation of new chemical structures towards those with the desired physical and/or biological properties. In the proof-of-concept study, we have employed the ReLeaSE method to design chemical libraries biased toward structural complexity or toward compounds with maximal, minimal, or a specific range of physical properties such as melting point or hydrophobicity, as well as to develop novel putative inhibitors of JAK2. The approach proposed herein can find general use for generating targeted chemical libraries of novel compounds optimized for either a single desired property or multiple properties.
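The second, reinforcement learning phase can be illustrated with a REINFORCE-style policy-gradient update. This is a toy sketch under strong assumptions, not ReLeaSE itself: the stack-augmented generative network is replaced by independent per-position token probabilities over a three-letter alphabet, and the predictive network by a hypothetical `toy_reward` that scores the fraction of "N" tokens. The update raises the log-probability of sampled strings in proportion to their reward minus a running baseline.

```python
import numpy as np

VOCAB = ["C", "N", "O"]  # toy alphabet (assumption, not SMILES)


def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def toy_reward(s):
    """Hypothetical stand-in for the predictive network: fraction of 'N'."""
    return s.count("N") / len(s)


def reinforce(length=5, steps=300, batch=32, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    logits = np.zeros((length, len(VOCAB)))  # toy generator parameters
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        grad = np.zeros_like(logits)
        rewards = []
        for _ in range(batch):
            idx = [rng.choice(len(VOCAB), p=p) for p in probs]
            r = toy_reward("".join(VOCAB[k] for k in idx))
            rewards.append(r)
            # Gradient of log pi(string) w.r.t. logits: one_hot(token) - probs.
            g = -probs.copy()
            g[np.arange(length), idx] += 1.0
            grad += (r - baseline) * g
        baseline = 0.9 * baseline + 0.1 * np.mean(rewards)  # running baseline
        logits += lr * grad / batch  # gradient ascent on expected reward
    return softmax(logits)
```

After training, the per-position probability of "N" should dominate, which is the toy analogue of shifting the generated distribution toward the desired property.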

Submitted 31 May, 2018; v1 submitted 29 November, 2017; originally announced November 2017.

Journal ref: Science Advances, 2018, vol. 4, no. 7, eaap7885

arXiv:1711.10744 [pdf, other]

AFLOW-ML: A RESTful API for machine-learning predictions of materials properties

Authors: Eric Gossett, Cormac Toher, Corey Oses, Olexandr Isayev, Fleur Legrain, Frisco Rose, Eva Zurek, Jesús Carrete, Natalio Mingo, Alexander Tropsha, Stefano Curtarolo

Abstract: Machine learning approaches, enabled by the emergence of comprehensive databases of materials properties, are becoming a fruitful direction for materials analysis. As a result, a plethora of models have been constructed and trained on existing data to predict properties of new systems. These powerful methods allow researchers to target studies only at interesting materials (neglecting non-synthesizable systems and those without the desired properties), thus reducing the amount of resources spent on expensive computations and/or time-consuming experimental synthesis. However, using these predictive models is not always straightforward. Often, they require a panoply of technical expertise, creating barriers for general users. AFLOW-ML (AFLOW Machine Learning) overcomes the problem by streamlining the use of the machine learning methods developed within the AFLOW consortium. The framework provides an open RESTful API to directly access the continuously updated algorithms, which can be transparently integrated into any workflow to retrieve predictions of electronic, thermal, and mechanical properties. These types of interconnected cloud-based applications are envisioned to further accelerate the adoption of machine learning methods in materials development.

Submitted 29 November, 2017; originally announced November 2017.

Comments: 10 pages, 2 figures

arXiv:1708.04987 [pdf]

doi 10.1038/sdata.2017.193

ANI-1: A data set of 20M off-equilibrium DFT calculations for organic molecules

Authors: Justin S. Smith, Olexandr Isayev, Adrian E. Roitberg

Abstract: One of the grand challenges in modern theoretical chemistry is designing and implementing approximations that expedite ab initio methods without loss of accuracy. Machine learning (ML), in particular neural networks, is emerging as a powerful approach to constructing various forms of transferable atomistic potentials. Such potentials have been successfully applied in a variety of applications in chemistry, biology, catalysis, and solid-state physics. However, these models are heavily dependent on the quality and quantity of data used in their fitting. Fitting highly flexible ML potentials comes at a cost: a vast amount of reference data is required to properly train these models. We address this need by providing access to a large computational DFT database, which consists of 20M conformations for 57,454 small organic molecules. We believe it will become a new standard benchmark for comparison of current and future methods in the ML potential community.

Submitted 12 December, 2017; v1 submitted 16 August, 2017; originally announced August 2017.

Journal ref: Scientific Data 4, Article number: 170193 (2017)

arXiv:1610.08935 [pdf]

doi 10.1039/C6SC05720A

ANI-1: An extensible neural network potential with DFT accuracy at force field computational cost

Authors: Justin S. Smith, Olexandr Isayev, Adrian E. Roitberg

Abstract: Deep learning is revolutionizing many areas of science and technology, especially image, text and speech recognition. In this paper, we demonstrate how a deep neural network (NN) trained on quantum mechanical (QM) DFT calculations can learn an accurate and fully transferable potential for organic molecules. We introduce ANAKIN-ME (Accurate NeurAl networK engINe for Molecular Energies), or ANI for short. ANI is a new method and procedure for training neural network potentials that utilizes a highly modified version of the Behler and Parrinello symmetry functions to build single-atom atomic environment vectors as a molecular representation. We utilize ANI to build a potential called ANI-1, which was trained on a subset of the GDB databases with up to 8 heavy atoms to predict total energies for organic molecules containing four atom types: H, C, N, and O. To obtain an accelerated but physically relevant sampling of molecular potential surfaces, we also propose a Normal Mode Sampling (NMS) method for generating molecular configurations. Through a series of case studies, we show that ANI-1 is chemically accurate compared to reference DFT calculations on much larger molecular systems (up to 54 atoms) than those included in the training data set, with root mean square errors as low as 0.56 kcal/mol.
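The radial part of an atomic environment vector uses Gaussian terms $\exp(-\eta (R_{ij} - R_s)^2)$ multiplied by the cosine cutoff $f_C(R_{ij}) = 0.5\cos(\pi R_{ij}/R_C) + 0.5$ and summed over neighbors within $R_C$. The sketch below computes these terms for one atom; it ignores the per-element split and the angular terms of the full ANI descriptor, the function names are hypothetical, and any parameter values passed in are placeholders rather than the published ANI-1 settings.

```python
import numpy as np


def cutoff(r, r_c):
    """Cosine cutoff: 0.5*cos(pi*r/r_c) + 0.5 inside r_c, zero outside."""
    return np.where(r <= r_c, 0.5 * np.cos(np.pi * r / r_c) + 0.5, 0.0)


def radial_aev(coords, i, eta, shifts, r_c):
    """Radial symmetry-function terms for atom i, one per shift R_s."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    d = np.delete(d, i)  # exclude the atom itself
    fc = cutoff(d, r_c)
    return np.array([np.sum(np.exp(-eta * (d - rs) ** 2) * fc) for rs in shifts])
```

Each shift `R_s` acts as a soft radial bin, so a neighbor contributes most to the terms whose shift is close to its distance, and nothing once it lies beyond `r_c`.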

Submitted 6 February, 2017; v1 submitted 27 October, 2016; originally announced October 2016.

arXiv:1608.04782 [pdf, other]

doi 10.1038/ncomms15679

Universal Fragment Descriptors for Predicting Electronic Properties of Inorganic Crystals

Authors: Olexandr Isayev, Corey Oses, Cormac Toher, Eric Gossett, Stefano Curtarolo, Alexander Tropsha

Abstract: Historically, materials discovery has been driven by a laborious trial-and-error process. The growth of materials databases and emerging informatics approaches finally offer the opportunity to transform this practice into data- and knowledge-driven rational design. By using data from the AFLOW repository for high-throughput ab-initio calculations, we have generated Quantitative Materials Structure-Property Relationship (QMSPR) models to predict eight critical electronic and thermomechanical materials properties, such as the metal/insulator classification, band gap energy, bulk and shear moduli, Debye temperature, and heat capacity. The prediction accuracy obtained with these QMSPR models approaches training data for virtually any stoichiometric inorganic crystalline material. The success and universality of these models is attributed to the construction of new materials descriptors---referred to as the universal Property-Labeled Materials Fragments (PLMF). The representation requires only minimal structural input and affords straightforward model interpretation in terms of simple heuristic design rules that guide rational materials design. This study demonstrates the power of materials informatics to dramatically accelerate the search for new materials.

Submitted 24 March, 2017; v1 submitted 16 August, 2016; originally announced August 2016.

Comments: 14 pages, 7 figures

arXiv:1412.4096 [pdf, other]

doi 10.1021/cm503507h

Materials Cartography: Representing and Mining Material Space Using Structural and Electronic Fingerprints

Authors: Olexandr Isayev, Denis Fourches, Eugene N. Muratov, Corey Oses, Kevin Rasch, Alexander Tropsha, Stefano Curtarolo

Abstract: As the proliferation of high-throughput approaches in materials science is increasing the wealth of data in the field, the gap between accumulated-information and derived-knowledge widens. We address the issue of scientific discovery in materials databases by introducing novel analytical approaches based on structural and electronic materials fingerprints. The framework is employed to (i) query large databases of materials using similarity concepts, (ii) map the connectivity of the materials space (i.e., as a materials cartogram) for rapidly identifying regions with unique organizations/properties, and (iii) develop predictive Quantitative Materials Structure-Property Relationships (QMSPR) models for guiding materials design. In this study, we test these fingerprints by seeking target material properties. As a quantitative example, we model the critical temperatures of known superconductors. Our novel materials fingerprinting and materials cartography approaches contribute to the emerging field of materials informatics by enabling effective computational tools to analyze, visualize, model, and design new materials.

Submitted 16 December, 2014; v1 submitted 9 December, 2014; originally announced December 2014.

Comments: 13 pages and 5 figures, Chem. Mater., 2015

Showing 1–17 of 17 results for author: Isayev, O