-
Predicting polymerization reactions via transfer learning using chemical language models
Authors:
Brenda S. Ferrari,
Matteo Manica,
Ronaldo Giro,
Teodoro Laino,
Mathias B. Steiner
Abstract:
Polymers are candidate materials for a wide range of sustainability applications such as carbon capture and energy storage. However, computational polymer discovery lacks automated analysis of reaction pathways and stability assessment through retro-synthesis. Here, we report the first extension of transformer-based language models to polymerization reactions for both forward and retrosynthesis tasks. To that end, we have curated a polymerization dataset for vinyl polymers covering reactions and retrosynthesis for representative homo-polymers and co-polymers. Overall, we obtain a forward model Top-4 accuracy of 80% and a backward model Top-4 accuracy of 60%. We further analyze the model performance with representative polymerization and retro-synthesis examples and evaluate its prediction quality from a materials science perspective.
Submitted 17 October, 2023;
originally announced October 2023.
-
Language models in molecular discovery
Authors:
Nikita Janakarajan,
Tim Erdmann,
Sarath Swaminathan,
Teodoro Laino,
Jannis Born
Abstract:
The success of language models, especially transformer-based architectures, has trickled into other domains giving rise to "scientific language models" that operate on small molecules, proteins or polymers. In chemistry, language models contribute to accelerating the molecule discovery cycle as evidenced by promising recent findings in early-stage drug discovery. Here, we review the role of language models in molecular discovery, underlining their strength in de novo drug design, property prediction and reaction chemistry. We highlight valuable open-source software assets thus lowering the entry barrier to the field of scientific language modeling. Last, we sketch a vision for future molecular design that combines a chatbot interface with access to computational chemistry tools. Our contribution serves as a valuable resource for researchers, chemists, and AI enthusiasts interested in understanding how language models can and will be used to accelerate chemical discovery.
Submitted 28 September, 2023;
originally announced September 2023.
-
Mapping the Space of Chemical Reactions Using Attention-Based Neural Networks
Authors:
Philippe Schwaller,
Daniel Probst,
Alain C. Vaucher,
Vishnu H. Nair,
David Kreutter,
Teodoro Laino,
Jean-Louis Reymond
Abstract:
Organic reactions are usually assigned to classes containing reactions with similar reagents and mechanisms. Reaction classes facilitate the communication of complex concepts and efficient navigation through chemical reaction space. However, the classification process is a tedious task. It requires the identification of the corresponding reaction class template via annotation of the number of molecules in the reactions, the reaction center, and the distinction between reactants and reagents. This work shows that transformer-based models can infer reaction classes from non-annotated, simple text-based representations of chemical reactions. Our best model reaches a classification accuracy of 98.2%. We also show that the learned representations can be used as reaction fingerprints that capture fine-grained differences between reaction classes better than traditional reaction fingerprints. The insights into chemical reaction space enabled by our learned fingerprints are illustrated by an interactive reaction atlas providing visual clustering and similarity searching.
Submitted 9 December, 2020;
originally announced December 2020.
-
CP2K: An Electronic Structure and Molecular Dynamics Software Package -- Quickstep: Efficient and Accurate Electronic Structure Calculations
Authors:
Thomas D. Kühne,
Marcella Iannuzzi,
Mauro Del Ben,
Vladimir V. Rybkin,
Patrick Seewald,
Frederick Stein,
Teodoro Laino,
Rustam Z. Khaliullin,
Ole Schütt,
Florian Schiffmann,
Dorothea Golze,
Jan Wilhelm,
Sergey Chulkov,
Mohammad Hossein Bani-Hashemian,
Valéry Weber,
Urban Borstnik,
Mathieu Taillefumier,
Alice Shoshana Jakobovits,
Alfio Lazzaro,
Hans Pabst,
Tiziano Müller,
Robert Schade,
Manuel Guidon,
Samuel Andermatt,
Nico Holmberg
, et al. (14 additional authors not shown)
Abstract:
CP2K is an open source electronic structure and molecular dynamics software package to perform atomistic simulations of solid-state, liquid, molecular and biological systems. It is especially aimed at massively-parallel and linear-scaling electronic structure methods and state-of-the-art ab-initio molecular dynamics simulations. Excellent performance for electronic structure calculations is achieved using novel algorithms implemented for modern high-performance computing systems. This review revisits the main capabilities of CP2K to perform efficient and accurate electronic structure simulations. The emphasis is put on density functional theory and multiple post-Hartree-Fock methods using the Gaussian and plane wave approach and its augmented all-electron extension.
Submitted 11 March, 2020; v1 submitted 8 March, 2020;
originally announced March 2020.
-
Simulating diffusion properties of solid-state electrolytes via a neural network potential: Performance and training scheme
Authors:
Aris Marcolongo,
Tobias Binninger,
Federico Zipoli,
Teodoro Laino
Abstract:
The recently published DeePMD model (https://github.com/deepmodeling/deepmd-kit), based on a deep neural network architecture, brings the hope of solving the time-scale issue which often prevents the application of first-principles molecular dynamics to physical systems. With this contribution we assess the performance of the DeePMD potential on a real-life application and model diffusion of ions in solid-state electrolytes. We consider as test cases the well-known Li10GeP2S12, Li7La3Zr2O12 and Na3Zr2Si2PO12. We develop and test a training protocol suitable for the computation of diffusion coefficients, which is one of the key properties to be optimized for battery applications, and we find good agreement with previous computations. Our results show that the DeePMD model may be a successful component of a framework to identify novel solid-state electrolytes.
Submitted 22 October, 2019;
originally announced October 2019.
-
Comparison of computational methods for the electrochemical stability window of solid-state electrolyte materials
Authors:
Tobias Binninger,
Aris Marcolongo,
Matthieu Mottet,
Valéry Weber,
Teodoro Laino
Abstract:
Superior stability and safety are key promises attributed to all-solid-state batteries (ASSBs) containing solid-state electrolyte (SSE) compared to their conventional counterparts utilizing liquid electrolyte. To unleash the full potential of ASSBs, SSE materials that are stable when in contact with the low and high potential electrodes are required. The electrochemical stability window is conveniently used to assess the SSE-electrode interface stability. In the present work, we review the most important methods to compute the SSE stability window. Our analysis reveals that the stoichiometry stability method represents a bridge between HOMO-LUMO method and phase stability method (grand canonical phase diagram). Moreover, we provide computational implementations of these methods for SSE material screening. We compare their results for the relevant Li- and Na-SSE materials LGPS, LIPON, LLZO, LLTO, LATP, LISICON, and NASICON, and we discuss their relation to published experimental stability windows.
Submitted 25 October, 2019; v1 submitted 8 January, 2019;
originally announced January 2019.
-
Molecular Transformer - A Model for Uncertainty-Calibrated Chemical Reaction Prediction
Authors:
Philippe Schwaller,
Teodoro Laino,
Théophile Gaudin,
Peter Bolgar,
Costas Bekas,
Alpha A Lee
Abstract:
Organic synthesis is one of the key stumbling blocks in medicinal chemistry. A necessary yet unsolved step in planning synthesis is solving the forward problem: given reactants and reagents, predict the products. Similar to other work, we treat reaction prediction as a machine translation problem between SMILES strings of reactants-reagents and the products. We show that a multi-head attention Molecular Transformer model outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark dataset. Our algorithm requires no handcrafted rules, and accurately predicts subtle chemical transformations. Crucially, our model can accurately estimate its own uncertainty, with an uncertainty score that is 89% accurate in terms of classifying whether a prediction is correct. Furthermore, we show that the model is able to handle inputs without reactant-reagent split and including stereochemistry, which makes our method universally applicable.
Submitted 30 May, 2019; v1 submitted 6 November, 2018;
originally announced November 2018.
-
Mapping the Free Energy of Lithium Solvation in the Protic Ionic Liquid Ethylammonuim Nitrate: A Metadynamics Study
Authors:
Ali Kachmar,
Marcelo Carignano,
Teodoro Laino,
Marcella Iannuzzi,
Jürg Hutter
Abstract:
Understanding lithium solvation and transport in ionic liquids is important due to their possible application in electrochemical devices. Using first-principles simulations aided by a metadynamics approach we study the free-energy landscape for lithium ions at infinite dilution in ethylammonium nitrate, a protic ionic liquid. We analyze the local structure of the liquid around the lithium cation and obtain a quantitative picture in agreement with experimental findings. Our simulations show that the lowest two free energy minima correspond to conformations with the lithium ion being solvated either by three or four nitrate ions with a transition barrier between them of 0.2 eV. Other less probable conformations having different solvation patterns are also investigated.
Submitted 1 August, 2017;
originally announced August 2017.
-
The Adaptive Buffered Force QM/MM method in the CP2K and AMBER software packages
Authors:
Letif Mones,
Andrew Jones,
Andreas W. Götz,
Teodoro Laino,
Ross C. Walker,
Ben Leimkuhler,
Gábor Csányi,
Noam Bernstein
Abstract:
The implementation and validation of the adaptive buffered force QM/MM method in two popular packages, CP2K and AMBER, are presented. The implementations build on the existing QM/MM functionality in each code, extending it to allow for redefinition of the QM and MM regions during the simulation and reducing QM-MM interface errors by discarding forces near the boundary according to the buffered force-mixing approach. New adaptive thermostats, needed by force-mixing methods, are also implemented. Different variants of the method are benchmarked by simulating the structure of bulk water, water autoprotolysis in the presence of zinc and dimethyl-phosphate hydrolysis using various semiempirical Hamiltonians and density functional theory as the QM model. It is shown that with suitable parameters, based on force convergence tests, the adaptive buffered-force QM/MM scheme can provide an accurate approximation of the structure in the dynamical QM region matching the corresponding fully QM simulations, as well as reproducing the correct energetics in all cases. Adaptive unbuffered force-mixing and adaptive conventional QM/MM methods also provide reasonable results for some systems, but are more likely to suffer from instabilities and inaccuracies.
Submitted 18 September, 2014;
originally announced September 2014.