-
Mapping the Space of Chemical Reactions Using Attention-Based Neural Networks
Authors:
Philippe Schwaller,
Daniel Probst,
Alain C. Vaucher,
Vishnu H. Nair,
David Kreutter,
Teodoro Laino,
Jean-Louis Reymond
Abstract:
Organic reactions are usually assigned to classes containing reactions with similar reagents and mechanisms. Reaction classes facilitate the communication of complex concepts and efficient navigation through chemical reaction space. However, the classification process is a tedious task. It requires the identification of the corresponding reaction class template via annotation of the number of molecules in the reactions, the reaction center, and the distinction between reactants and reagents. This work shows that transformer-based models can infer reaction classes from non-annotated, simple text-based representations of chemical reactions. Our best model reaches a classification accuracy of 98.2%. We also show that the learned representations can be used as reaction fingerprints that capture fine-grained differences between reaction classes better than traditional reaction fingerprints. The insights into chemical reaction space enabled by our learned fingerprints are illustrated by an interactive reaction atlas providing visual clustering and similarity searching.
Submitted 9 December, 2020;
originally announced December 2020.
-
Training neural nets to learn reactive potential energy surfaces using interactive quantum chemistry in virtual reality
Authors:
Silvia Amabilino,
Lars A. Bratholm,
Simon J. Bennie,
Alain C. Vaucher,
Markus Reiher,
David R. Glowacki
Abstract:
Whilst the primary bottleneck to a number of computational workflows was not so long ago limited by processing power, the rise of machine learning technologies has resulted in a paradigm shift which places increasing value on issues related to data curation - i.e., data size, quality, bias, format, and coverage. Increasingly, data-related issues are equally as important as the algorithmic methods used to process and learn from the data. Here we introduce an open source GPU-accelerated neural network (NN) framework for learning reactive potential energy surfaces (PESs), and investigate the use of real-time interactive ab initio molecular dynamics in virtual reality (iMD-VR) as a new strategy for rapidly sampling geometries along reaction pathways which can be used to train NNs to learn reactive PESs. Focussing on hydrogen abstraction reactions of CN radical with isopentane, we compare the performance of NNs trained using iMD-VR data versus NNs trained using a more traditional method, namely molecular dynamics (MD) constrained to sample a predefined grid of points along hydrogen abstraction reaction coordinates. Both the NN trained using iMD-VR data and the NN trained using the constrained MD data reproduce important qualitative features of the reactive PESs, such as a low and early barrier to abstraction. Quantitatively, learning is sensitive to the training dataset. Our results show that user-sampled structures obtained with the quantum chemical iMD-VR machinery enable better sampling in the vicinity of the minimum energy path (MEP). As a result, the NN trained on the iMD-VR data does very well predicting energies in the vicinity of the MEP, but less well predicting energies for 'off-path' structures. The NN trained on the constrained MD data does better in predicting energies for 'off-path' structures, given that it included a number of such structures in its training set.
Submitted 22 January, 2019; v1 submitted 16 January, 2019;
originally announced January 2019.
-
GuacaMol: Benchmarking Models for De Novo Molecular Design
Authors:
Nathan Brown,
Marco Fiscato,
Marwin H. S. Segler,
Alain C. Vaucher
Abstract:
De novo design seeks to generate molecules with required property profiles by virtual design-make-test cycles. With the emergence of deep learning and neural generative models in many application areas, models for molecular design based on neural networks appeared recently and show promising results. However, the new models have not been profiled on consistent tasks, and comparative studies to well-established algorithms have only seldom been performed.
To standardize the assessment of both classical and neural models for de novo molecular design, we propose an evaluation framework, GuacaMol, based on a suite of standardized benchmarks. The benchmark tasks encompass measuring the fidelity with which the models reproduce the property distribution of the training sets, the ability to generate novel molecules, the exploration and exploitation of chemical space, and a variety of single and multi-objective optimization tasks. The open-source Python benchmarking code and a leaderboard can be found at https://benevolent.ai/guacamol
Submitted 26 February, 2019; v1 submitted 22 November, 2018;
originally announced November 2018.