-
Machine learning-guided computational screening of new bio-orthogonal click reactions
Authors:
Thijs Stuyver,
Connor Coley
Abstract:
Bio-orthogonal click chemistry has become an indispensable part of the biochemist's toolbox. Despite the wide variety of applications that have been developed in recent years, only a limited number of bio-orthogonal click reactions have been discovered so far, most of them based on (substituted) azides. In this work, we present a computational workflow to discover new candidate bio-orthogonal click reactions. Sampling only around 0.05% of an overall search space of over 10,000,000 dipolar cycloadditions, we develop a machine learning model able to predict DFT-computed activation and reaction energies within ~2-3 kcal/mol across the entire space. Applying this model to screen the full search space through iterative rounds of learning, we identify a broad pool of candidate reactions with rich structural diversity, which can be used as a starting point or source of inspiration for future experimental development of both azide-based and non-azide-based bio-orthogonal click reactions.
Submitted 15 December, 2022;
originally announced December 2022.
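To put the sampling figures from the abstract above in concrete terms, the short calculation below works out how many reactions a 0.05% sample of a 10,000,000-reaction space corresponds to. The abstract says "around 0.05%" and "over 10,000,000", so the result is an order-of-magnitude estimate, not a number reported by the authors; the variable names are illustrative.

```python
from fractions import Fraction

# Figures quoted in the abstract (both are approximate there).
search_space = 10_000_000               # "over 10,000,000 dipolar cycloadditions"
sampled_fraction = Fraction(5, 10_000)  # "around 0.05%" written as an exact fraction

# Reactions that would need explicit DFT labels at this sampling rate.
n_sampled = int(search_space * sampled_fraction)

# Reactions whose energies would be left to the machine learning model.
n_model_only = search_space - n_sampled

print(f"sampled with DFT: about {n_sampled:,}")
print(f"predicted by the model only: about {n_model_only:,}")
```

Under these assumptions, roughly five thousand DFT-labeled reactions stand in for a space about two thousand times larger.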
-
Reaction profiles for quantum chemistry-computed [3 + 2] cycloaddition reactions
Authors:
Thijs Stuyver,
Kjell Jorner,
Connor Coley
Abstract:
Bio-orthogonal click chemistry based on [3 + 2] dipolar cycloadditions has had a profound impact on the field of biochemistry, and significant effort has been devoted to identifying promising new candidate reactions for this purpose. To gauge whether a prospective reaction could be a suitable bio-orthogonal click reaction, information about both on- and off-target activation and reaction energies is highly valuable. Here, we use an automated workflow, based on the autodE program, to compute over 5000 reaction profiles for [3 + 2] cycloadditions involving both synthetic dipolarophiles and a set of biologically inspired structural motifs. Based on a succinct benchmarking study, the B3LYP-D3(BJ)/def2-TZVP//B3LYP-D3(BJ)/def2-SVP level of theory was selected for the DFT calculations, and standard conditions and an (aqueous) SMD model were imposed to mimic physiological conditions. We believe that these data, as well as the presented workflow for high-throughput reaction profile computation, will be useful to screen for new bio-orthogonal reactions, as well as for the development of novel machine learning models for the prediction of chemical reactivity more broadly.
Submitted 12 December, 2022; v1 submitted 12 December, 2022;
originally announced December 2022.
-
Quantum chemistry-augmented neural networks for reactivity prediction: Performance, generalizability and interpretability
Authors:
Thijs Stuyver,
Connor W. Coley
Abstract:
There is a perceived dichotomy between structure-based and descriptor-based molecular representations used for predictive chemistry tasks. Here, we study the performance, generalizability, and interpretability of the recently proposed quantum mechanics-augmented graph neural network (ml-QM-GNN) architecture as applied to the prediction of regioselectivity (classification) and of activation energies (regression). In our hybrid QM-augmented model architecture, structure-based representations are first used to predict a set of atom- and bond-level reactivity descriptors derived from density functional theory (DFT) calculations. These estimated reactivity descriptors are combined with the original structure-based representation to make the final reactivity prediction. We demonstrate that our model architecture leads to significant improvements over structure-based GNNs, not only in overall accuracy but also in generalization to unseen compounds. Even when provided training sets of only a couple of hundred labeled data points, the ml-QM-GNN outperforms other state-of-the-art model architectures that have been applied to these tasks. Further, because the predictions of our model are grounded in (but not restricted to) QM descriptors, we are able to relate predictions to the conceptual frameworks commonly used to gain qualitative insights into reactivity phenomena. This effort results in a productive synergy between theory and data science, wherein our QM-augmented models provide a data-driven confirmation of previous qualitative analyses, and these analyses in turn facilitate insights into the decision-making process occurring within ml-QM-GNNs.
Submitted 21 July, 2021;
originally announced July 2021.
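The two-stage data flow described in the abstract above (structure-based features first predict reactivity descriptors, which are then combined with the original features for the final prediction) can be illustrated with a toy sketch. This is not the ml-QM-GNN implementation: there is no message passing or training here, the weights are random, and every name, dimension, and the choice of concatenation plus sum pooling is an assumption made only to show the flow of information.

```python
import math
import random


def linear(vec, weights):
    """Apply a weight matrix (one row per input feature) to a feature vector."""
    n_out = len(weights[0])
    return [sum(v * row[j] for v, row in zip(vec, weights)) for j in range(n_out)]


def predict_descriptors(atom_feats, w_desc):
    """Stage 1 (hypothetical): estimate per-atom descriptors from structural features."""
    return [[math.tanh(x) for x in linear(atom, w_desc)] for atom in atom_feats]


def predict_target(atom_feats, w_desc, w_out):
    """Stage 2 (hypothetical): combine structure and descriptors, pool, and read out."""
    descriptors = predict_descriptors(atom_feats, w_desc)
    # Concatenate each atom's structural features with its estimated descriptors.
    combined = [atom + desc for atom, desc in zip(atom_feats, descriptors)]
    # Sum over atoms so the result does not depend on atom ordering.
    pooled = [sum(column) for column in zip(*combined)]
    return sum(p * w for p, w in zip(pooled, w_out))


# Toy inputs: random stand-ins for learned atom embeddings and model weights.
random.seed(0)
N_ATOMS, N_FEAT, N_DESC = 4, 6, 2
atom_feats = [[random.gauss(0, 1) for _ in range(N_FEAT)] for _ in range(N_ATOMS)]
w_desc = [[random.gauss(0, 1) for _ in range(N_DESC)] for _ in range(N_FEAT)]
w_out = [random.gauss(0, 1) for _ in range(N_FEAT + N_DESC)]

prediction = predict_target(atom_feats, w_desc, w_out)
print(f"toy prediction: {prediction:.3f}")
```

Because the readout sees both the structural features and the estimated descriptors, the final prediction is informed by the descriptors without being limited to them, which mirrors the "grounded in (but not restricted to)" wording of the abstract.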