EquiReact: An equivariant neural network for chemical reactions
Abstract
Equivariant neural networks have considerably improved the accuracy and data-efficiency of predictions of molecular properties. Building on this success, we introduce EquiReact, an equivariant neural network to infer properties of chemical reactions, built from three-dimensional structures of reactants and products. We illustrate its competitive performance on the prediction of activation barriers on the GDB7-22-TS, Cyclo-23-TS and Proparg-21-TS datasets with different regimes according to the inclusion of atom-mapping information. We show that, compared to state-of-the-art models for reaction property prediction, EquiReact offers: (i) a flexible model with reduced sensitivity between atom-mapping regimes, (ii) better extrapolation capabilities to unseen chemistries, (iii) low prediction errors for datasets exhibiting subtle variations in three-dimensional geometries of reactants/products, (iv) reduced sensitivity to geometry quality and (v) excellent data efficiency.
keywords:
machine learning, equivariant neural networks, activation energies, chemical reactions
Affiliations: National Center for Competence in Research – Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland; Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland. Author note: some authors contributed equally to this work.
1 Introduction
Physics-inspired representations that take as input the three-dimensional structure1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 (as well as, in some cases, electronic structure14, 15, 16, 17) of molecules and transform it into a fixed-length vector while respecting known physical laws have a rich history in molecular property prediction. Models have been developed to predict properties ranging from atomization energies,2, 8, 4, 5, 6, 7, 13 molecular forces,18, 19, 20, 7, 5 potential energy surfaces,1, 21, 3, 5, 7, 9, 10, 22 multipole moments,23 polarizabilities,4, 5, 6, 11, 12, 24, 25 dipole moments,6 HOMO and LUMO eigenvalues4, 6 as well as the HOMO–LUMO gap,6, 26, 27 and electron densities.28, 29, 30 Common desiderata31, 32, 33, 34 for high-performing representations are (i) smoothness, (ii) the encoding of the appropriate symmetries to permutations, rotations and translations,24, 35 (iii) completeness and (iv) additivity to allow for extrapolation to larger systems. Fingerprints such as the CM,2 BoB,4 SOAP,3 FCHL,6, 7 SLATM,8 MBTR,5 LODE,11, 12 NICE13 or others, being rooted in fundamental principles, are designed to be property-independent: a single representation can be constructed for a molecule to predict any (electronically-derived) target. This is analogous to the molecular Hamiltonian, which specifies the energy and all other properties of the system as a function of the atoms’ types and positions in three-dimensional space (assuming the molecules are charge neutral and singlets). These representations are typically used in combination with kernel models2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 31, 32, 33 due to their data efficiency, ability to deal with high-dimensional feature vectors, and interpretability of the similarity kernel. Early works showed that combining such representations2, 4, 6, 8, 36 with simple feed-forward neural networks instead of kernel models did not necessarily lead to better performance.37, 38
More recently, end-to-end neural networks have been proposed that learn the representation as part of the (supervised) training process,39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53 based on similar principles to the aforementioned physics-inspired representations: They take as input a three-dimensional structure, as well as in some cases charge and spin information.46, 51, 52, 53 Relevant symmetry operations are appropriately encoded into the neural network architecture. Equivariant Neural Networks (ENNs) in particular have shown unprecedented accuracy and data efficiency on benchmarks of molecular property prediction such as energies of organic molecules (QM7b-T,54, 46 QM9,55, 46, 49, 50, 56 GDB-13-T,54, 46 DrugBank-T,57, 46 conformers,58 ANI-159, 45), and energies and forces of several molecular dynamics datasets (MD17,43, 44, 45, 49, 56 proteins,48 methane combustion,60, 45 the open catalyst dataset (OC20)61, 44). Unlike the earlier invariant neural networks,39, 40, 41, 42 which by operating on distances between atoms ensure rotational and translational invariance, equivariant neural networks typically operate on relative position vectors and angular information, which is processed by rotationally-equivariant convolutional layers. The internal features are then equivariant to rotation. ENNs demonstrate a substantial improvement in accuracy even for the prediction of rotation-invariant properties such as total energies.43, 56, 44, 45, 46, 49, 50
Despite these advances for molecular property prediction, the prediction of computed reaction properties (principally, reaction barriers62, 36, 38, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74) is still in its infancy.75 Machine learning approaches range from the use of simple two-dimensional fingerprints of reaction components76, 77 to physical-organic descriptors78, 79, 70, 80, 81, 69, 82, 83, 84, 85, 86, 87, 74 derived from quantum-chemical computations to transformer models88, 89 adapted for regression90 to graph-based approaches.65, 66, 64 A recent class of reaction fingerprints are built from the three-dimensional structure of reaction components,36, 63, 68 involving invariant features for molecular components. It was recently shown38 that these representations are performant for the prediction of reaction barriers, particularly for datasets91, 63 relying on subtle changes in the geometry of reactants and/or products. As in the earlier stages of molecular property prediction,2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13 representations were combined with kernel models.
To date, few attempts have been made to extend equivariant neural networks to chemical reactions. The first is from Spiekermann et al.,66 extending the molecular network DimeNet++41, 92 to describe reactant and product molecules in a chemical reaction, to predict reaction barriers on the benchmark GDB7-20-TS set.93 The second is an equivariant diffusion model predicting the Transition State (TS) structure from reactants and products94 on the same dataset. The former representation-learning approach is closer in spirit to our dedicated reaction fingerprints.36 However, this particular model66 was not shown to improve on 2D-graph based models.65 It was recently illustrated38 that the 2D-graph based models achieve their impressive performance by exploiting atom-mapping information,95, 96, 97, 98 which is absent in the equivariant model from Spiekermann et al.66
In this work, we introduce EquiReact, a neural network to predict properties of chemical reactions (showcased here for activation energies), built from equivariant features of reactants’ and products’ geometries. Compared to previously established reaction fingerprints36, 63 as well as other neural networks for reaction prediction,65, 64, 66, 99, 100 we offer several advantages with our new model: (i) greater model flexibility depending on the ease of atom-mapping a particular dataset, (ii) better extrapolation capabilities, (iii) competitive predictive performance, (iv) reduced dependence on the quality of three-dimensional geometries and (v) improved data efficiency.
We illustrate these points by studying three datasets of reaction barriers: the GDB7-22-TS 101 (an upgraded dataset from the previously published GDB7-20-TS93), the Cyclo-23-TS,102 and the Proparg-21-TS.91, 63 As discussed in previous works,38 these datasets present varying challenges for ML models, from the dependence on chemical information101 to the distinction of subtle changes in configurations.91, 63 In all cases, we compare to the previously best-performing models:38 the 2D-graph based CGR model ChemProp 65 as well as SLATM fingerprints8, 36 built from three-dimensional information combined with kernel ridge regression (KRR) models.
2 Architecture
EquiReact is built from equivariant convolutional networks over point clouds as implemented in e3nn103 and used in Thomas et al. 47 and Corso et al. 104. Three-dimensional geometries of molecules constituting reactants and products of each reaction are separately passed through these equivariant channels, detailed in Section 2.1 for the convenience of the reader. They are then combined to eventually predict a reaction property, such as the activation energy, as detailed in Section 2.2. The overall architecture is summarized in Figure 1.
2.1 Equivariant molecular channels
A molecule is represented as a distance-based graph where nodes describe atoms and edges describe bonds. Instead of explicitly using connectivity information, the “bonds” of atom $a$ are formed with all the neighboring atoms within a cutoff radius; all the (directed) bonds in the molecule form the set $\mathcal{B}$. Initial node (atom) features encode several cheminformatic features from RDKit,105 including atomic number, chirality tag (unspecified, tetrahedral, or other, including octahedral, square planar, allene-type, etc.), number of directly-bonded neighbors, number of rings, implicit valence, formal charge, number of attached hydrogens, number of radical electrons, hybridization, aromaticity, and presence in rings of specified sizes from 3 to 7.
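The distance-based graph construction described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `build_edges`, the toy coordinates and the cutoff value are all assumptions made for the example.

```python
import math

def build_edges(coords, cutoff):
    """Return directed edges (a, b) for all atom pairs closer than `cutoff`."""
    edges = []
    for a, ra in enumerate(coords):
        for b, rb in enumerate(coords):
            # No self-loops; both (a, b) and (b, a) are kept, so edges are directed.
            if a != b and math.dist(ra, rb) < cutoff:
                edges.append((a, b))
    return edges

# Toy linear triatomic, coordinates in angstrom (illustrative values only).
coords = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (2.2, 0.0, 0.0)]
edges = build_edges(coords, cutoff=1.5)
```

With this cutoff only nearest neighbors are connected; increasing it would also link the two terminal atoms, regardless of whether a chemical bond exists between them.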
Inspired by related models,104 initial scalar edge (bond) features are projections of the interatomic distances onto Gaussians uniformly spanning a line segment with a fixed step.
[Eqs. (1) and (2): equation bodies not recoverable from the source text]
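Since the idea of projecting a distance onto evenly spaced Gaussians is easier to see in code, a minimal sketch follows. It is not the paper's definition: the segment limits (0 to `r_max`), the number of centers and, in particular, the choice of a Gaussian width equal to the spacing are assumptions made for illustration.

```python
import math

def gaussian_expansion(distance, r_max, n_centers):
    """Project a distance onto Gaussians with centers evenly spaced on [0, r_max]."""
    step = r_max / (n_centers - 1)
    centers = [k * step for k in range(n_centers)]
    # Assumption: the Gaussian width equals the spacing between centers.
    return [math.exp(-0.5 * ((distance - mu) / step) ** 2) for mu in centers]

# A distance of 1.0 expanded on six centers located at 0, 1, 2, 3, 4 and 5.
features = gaussian_expansion(distance=1.0, r_max=5.0, n_centers=6)
```

The resulting vector peaks at the center closest to the input distance and decays smoothly on either side, which gives downstream layers a smooth, fixed-length encoding of the distance.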
Tensorial edge features are projections of normalized difference vectors between atomic positions onto spherical harmonics.
[Eqs. (3) and (4): equation bodies not recoverable from the source text]
The initial node and edge features are then passed through embeddings to give $x^{(1)}$ and $e_{ab}$.
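Atomic representations are then updated by equivariant convolutional layers:
Layer 1:
\begin{align*}
w^{(1)}_{ab} &= g_{31}\left(e_{ab} \oplus x^{(1)}_a \oplus x^{(1)}_b\right) && \forall (a,b) \in \mathcal{B} \\
s^{(1)}_b \equiv s^{0e(1)}_b \oplus s^{1o(1)}_b &= \frac{1}{\mathrm{Neigh}(b)} \sum_{a:\,(a,b)\in\mathcal{B}} t_1\left(x^{(1)}_a, z_{ab}, w^{(1)}_{ab}\right) && \forall b \\
x^{0e(2)} &= x^{(1)} + s^{0e(1)} \\
x^{(2)} &= x^{0e(2)} \oplus s^{1o(1)}
\end{align*}
Layer 2:
\begin{align*}
w^{(2)}_{ab} &= g_{32}\left(e_{ab} \oplus x^{0e(2)}_a \oplus x^{0e(2)}_b\right) && \forall (a,b) \in \mathcal{B} \\
s^{(2)}_b \equiv s^{0e(2)}_b \oplus s^{1o(2)}_b \oplus s^{1e(2)}_b &= \frac{1}{\mathrm{Neigh}(b)} \sum_{a:\,(a,b)\in\mathcal{B}} t_2\left(x^{(2)}_a, z_{ab}, w^{(2)}_{ab}\right) && \forall b \\
x^{0e(3)} &= x^{0e(2)} + s^{0e(2)} \\
x^{(3)} &= x^{0e(3)} \oplus \left(s^{1o(1)} + s^{1o(2)}\right) \oplus s^{1e(2)}
\end{align*}
Layer 3:
\begin{align*}
w^{(3)}_{ab} &= g_{33}\left(e_{ab} \oplus x^{0e(3)}_a \oplus x^{0e(3)}_b\right) && \forall (a,b) \in \mathcal{B} \\
s^{(3)}_b \equiv s^{0e(3)}_b \oplus s^{1o(3)}_b \oplus s^{1e(3)}_b \oplus s^{0o(3)}_b &= \frac{1}{\mathrm{Neigh}(b)} \sum_{a:\,(a,b)\in\mathcal{B}} t_3\left(x^{(3)}_a, z_{ab}, w^{(3)}_{ab}\right) && \forall b \\
x^{\mathrm{out}} &= \left(x^{0e(3)} + s^{0e(3)}\right) \oplus s^{0o(3)},
\end{align*}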
where, for example in layer 1, $s^{(1)}_b \equiv s^{0e(1)}_b \oplus s^{1o(1)}_b$ means that the result of the function $t_1$ consists of scalars ($s^{0e(1)}_b$) and vectors ($s^{1o(1)}_b$) that can be treated separately.
Each function $t_i$ is a fully-connected weighted tensor product, as defined in e3nn.103
[Eq. (5): equation body not recoverable from the source text]
They are specified by signatures of irreducible representations (irreps) of two input tensors and one output tensor. The output tensor is a combination of weighted sums of paths (pairs of input irreps) leading to each output irrep. The irrep sequence in each layer from 1–3 is illustrated in Figure 2. To obtain the weights $w^{(i)}_{ab}$ for each convolutional layer $i$, the spherical (scalar, 0e) parts of $x_a$ and $x_b$ are concatenated with the bond features $e_{ab}$ and passed through a Multi-Layer Perceptron (MLP) $g_{3i}$.
The output of the equivariant molecular channels is the local molecular representation corresponding to atoms associated with features. Depending on the sum_mode hyperparameter, it is constructed either from the node features (node mode) or both node and edge features (both mode). In the case of , the vectors are taken to construct the molecular representation.
Inspired by the ChemProp model,65 we added an option to exclude hydrogen atoms as nodes when constructing the graph. The only information about hydrogens is then contained in the initial edge features of heavy atoms. The results shown in the main text are obtained without hydrogens, since in this regime the model performs systematically better. A comparison with the regime that uses explicit hydrogen atoms is provided in the Supplementary Information.
2.2 Combining molecules for reactions
Once atom-wise molecular representations are learned for reactant and product molecules, they must be combined to form a reaction representation.
For certain datasets, atom-mapping information is available, which links individual atoms in reactant molecules to individual atoms in product molecules according to the reaction mechanism. In this setting, the reactant and product representations are re-ordered such that the local representation vectors correspond to the same atom in reactants and products. Depending on the combine_mode hyperparameter, either a difference is taken between products’ and reactants’ atom representations, or they are summed, averaged or passed through an MLP. Thus, the local reaction representation consists of vectors reflecting how the environment of each atom changes in the reaction. We will refer to this regime as the atom-mapped variant of EquiReact.
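The atom-mapped combination step can be sketched as below. This is an illustrative sketch under stated assumptions, not the EquiReact code: the function name `combine_mapped`, the mode labels `"diff"`, `"sum"` and `"mean"`, and the toy feature values are hypothetical, and the MLP option is omitted for brevity.

```python
def combine_mapped(x_reactant, x_product, mapping, mode="diff"):
    """Combine per-atom features after aligning product atoms to reactant atoms.

    mapping[i] is the index of the product atom that corresponds to reactant atom i.
    """
    combined = []
    for i, xr in enumerate(x_reactant):
        xp = x_product[mapping[i]]  # product features of the same physical atom
        if mode == "diff":
            combined.append([p - r for r, p in zip(xr, xp)])
        elif mode == "sum":
            combined.append([p + r for r, p in zip(xr, xp)])
        elif mode == "mean":
            combined.append([(p + r) / 2 for r, p in zip(xr, xp)])
        else:
            raise ValueError(f"unknown mode: {mode}")
    return combined

x_r = [[1.0, 0.0], [0.0, 2.0]]
x_p = [[0.5, 2.5], [1.5, 0.0]]  # product atoms listed in a different order
x_rxn = combine_mapped(x_r, x_p, mapping=[1, 0], mode="diff")
```

In the difference mode, an atom whose environment is unchanged by the reaction contributes a vector close to zero, so the local reaction representation highlights the atoms involved in the transformation.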
With the reaction representation at hand, predictions are made in the so-called vector or energy modes. In the vector mode, the atomic vectors constituting the reaction representation are initially passed through an MLP to introduce non-linearity and then summed up to form a global reaction representation vector. The target (here, the activation barrier) is then learned using an MLP. This model pipeline is illustrated in Figure 3a. In the energy mode, the local reaction representations are used to learn atomic contributions to the target (Figure 3b). While performing worse in general, in some cases this mode yields the best predictions (see Section 3.1.2).
While atom-mapping provides static information analogous to a reaction mechanism to link atoms in reactants to atoms in products, it is also possible to dynamically (i.e., in a learnable fashion) exchange information between two molecular representations. For example, RXNMapper98 is a neural network that learns atom-mappings within the larger self-supervised task of predicting the randomly masked parts in a reaction sequence, using one head of a multi-head transformer architecture. Inspired by EquiBind,106 a neural network that predicts the rotation and translation of a ligand to a protein and contains a cross-attention module between ligand and receptor, EquiReact uses cross-attention between reactants and products to create a surrogate for atom-mapping. Given queries $Q$, keys $K$ and values $V$, attention is computed as
$$\alpha = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d}}\right), \qquad (6)$$
where $d$ is the dimension of the keys, and the “reordered” values are
$$\tilde{V} = \alpha V. \qquad (7)$$
We used the implementation of this scaled-dot-product attention107 in PyTorch’s108 MultiheadAttention (PyTorch version 1.12.1). The representations are re-ordered using Eq. 6 and Eq. 7 with $Q$ as the vector representation of reactants and $K$, $V$ as the vector representations of products, and vice versa (thus here $K = V$). The re-ordered representations of reactants and products are combined as for the case of atom-mapped reactions (Figures 3a and 3b). We note that other algorithms could also have been used to exchange information between reactants and products, for example in the form of message passing, or equivariant attention.109, 110
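To make the re-ordering role of cross-attention concrete, the following sketch implements single-head scaled dot-product attention in plain Python. It is a simplification under stated assumptions: unlike PyTorch's MultiheadAttention, it applies no learned projections, and the function names and toy feature values are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(Q, K, V):
    """Single-head scaled dot-product attention without learned projections."""
    d = len(Q[0])
    weights = [
        softmax([sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in K])
        for qi in Q
    ]
    # Each output row is a weighted average of the rows of V.
    reordered = [
        [sum(w * v[c] for w, v in zip(wi, V)) for c in range(len(V[0]))]
        for wi in weights
    ]
    return weights, reordered

# Reactant atoms as queries, product atoms as keys and values (K = V).
reactants = [[10.0, 0.0], [0.0, 10.0]]
products = [[0.0, 10.0], [10.0, 0.0]]  # same environments, opposite atom order
weights, products_reordered = cross_attention(reactants, products, products)
```

In this toy case each reactant atom attends almost exclusively to the product atom with matching features, so the re-ordered product rows line up with the reactant order. This is the sense in which attention can act as a soft, learnable surrogate for an atom map.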
EquiReact also provides a simple “no mapping” variant, which relies neither on atom-mapping nor on a surrogate cross-attention module. In the vector mode (Figure 3c), the atomic components of the molecular representations of reactants and products are summed up to obtain global reactant and product vectors, respectively. Then they are combined, according to the combine_mode parameter, to form a reaction vector which is used to learn the target with an MLP. In the energy mode (Figure 3d) individual atomic representations are used to learn their contributions to the quasi-molecular energies of reactants and products, which are later combined (according to the combine_mode parameter) to predict the target. In most cases, this simpler model outperforms the cross-attention variant (vide infra).
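A short sketch shows why the “no mapping” vector mode needs no atom correspondence: pooling over atoms before combining makes the reaction vector independent of atom order. The helper names and values below are hypothetical, and the difference is used as one possible combine mode.

```python
def pool(atom_features):
    """Sum per-atom feature vectors into one molecular vector."""
    return [sum(column) for column in zip(*atom_features)]

def reaction_vector(x_reactant, x_product):
    """Difference of pooled product and reactant vectors (one possible combine mode)."""
    return [p - r for r, p in zip(pool(x_reactant), pool(x_product))]

x_r = [[1.0, 0.0], [0.0, 2.0]]
x_p = [[0.5, 2.5], [1.5, 0.0]]
v1 = reaction_vector(x_r, x_p)
v2 = reaction_vector(x_r, list(reversed(x_p)))  # shuffle the product atom order
```

Both calls return the same vector, whereas an atom-wise difference would change if the product atoms were permuted without updating the map.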
3 Results and Discussion
3.1 Model performance
The performance of EquiReact on three diverse datasets (the GDB7-22-TS,101 Cyclo-23-TS 102 and Proparg-21-TS 91, 63) is illustrated in Table 1, compared to the previously best-performing baseline models:38 ChemProp,65 a graph neural network that uses atom maps to construct a condensed graph of reaction (CGR), and the SLATM8 fingerprint adapted to reactions by taking the difference between product and reactant fingerprints (SLATM),36 combined with KRR models (SLATM+KRR). The models are compared in three regimes: with high-quality atom-mapping information (“True”) derived from the transition state structure or heuristic rules,102, 93, 101, 65, 97 with atom-maps obtained using the open-source RXNMapper98 (“RXNMapper”) and without any atom-mapping information at all (“None”). As discussed in recent work,111, 38 previously developed graph-based models for reaction property prediction65, 66, 64, 99, 100 including ChemProp 65 reported prediction errors only in the “True” atom-mapping regime. The “RXNMapper” regime is important for real chemistry where the reaction mechanism is not known and atom-mapping using heuristic rules is impossible. The “None” regime is critical for all chemistry that falls outside of the realm of organic chemistry captured in the patents112 that RXNMapper98 is pre-trained on. SLATM+KRR always operates in the “None” regime.
The atom-mapping-based variant of EquiReact is used in the “True” and “RXNMapper” regimes. In the “None” regime, the cross-attention and simple “no mapping” variants of EquiReact were tested. The simple variant consistently outperformed the cross-attention one, so we include only the former and refer the reader to the Supplementary Information for their comparison. EquiReact is compared to ChemProp and SLATM+KRR baselines using both random splits to measure interpolative capabilities as well as scaffold splits to measure extrapolative capabilities. Scaffold splitting113, 114 clusters molecules based on their 2D backbones (such as Bemis–Murcko scaffolds115) and ensures that the clusters (scaffolds) belonging to the train, test, and validation sets do not overlap. This is a more challenging regime for a model than random splitting, where very similar molecules could appear in both the training and test sets.
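The principle of scaffold splitting can be sketched as follows. This is a generic illustration, not the splitting code used in this work: the scaffold strings are assumed to be precomputed (in practice they could be obtained with a cheminformatics toolkit such as RDKit), and the split fractions and the largest-group-first ordering are assumptions.

```python
from collections import defaultdict

def scaffold_split(scaffolds, frac_train=0.8, frac_val=0.1):
    """Assign whole scaffold groups to train/val/test so that no scaffold is shared."""
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    # Largest groups first: a common, but not the only, convention.
    ordered = sorted(groups.values(), key=len, reverse=True)
    n = len(scaffolds)
    train, val, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(val) + len(group) <= frac_val * n:
            val.extend(group)
        else:
            test.extend(group)
    return train, val, test

# Hypothetical scaffold SMILES for ten reactant molecules.
scaffolds = ["c1ccccc1"] * 6 + ["C1CCCC1"] * 2 + ["C1CC1", "C1CCNCC1"]
train, val, test = scaffold_split(scaffolds)
```

Because entire groups are assigned to a single subset, every test molecule has a backbone that the model never saw during training, which is what makes this split a probe of extrapolation.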
Dataset (property, units) | Atom-mapping regime | ChemProp | SLATM+KRR | EquiReact
Random splits | | | |
GDB7-22-TS (energy barrier, kcal/mol) | True | | — |
 | RXNMapper | | — |
 | None | | |
Cyclo-23-TS (activation free energy, kcal/mol) | True | | — |
 | RXNMapper | | — |
 | None | | |
Proparg-21-TS (activation energy, kcal/mol) | True | | — |
 | None | | |
Scaffold splits | | | |
GDB7-22-TS (energy barrier, kcal/mol) | True | | — |
 | RXNMapper | | — |
 | None | | |
Cyclo-23-TS (activation free energy, kcal/mol) | True | | — |
 | RXNMapper | | — |
 | None | | |
Proparg-21-TS (activation energy, kcal/mol) | True | | — |
 | None | | |
3.1.1 GDB7-22-TS dataset
This dataset is distinct from the other two in that it includes variations in the reaction class (and mechanism), thereby showing a greater dependence on the existence and quality of atom-mapping information in the models. It has already been observed38 that, for existing models (including ChemProp and SLATM) using random splits, there is a stark hierarchy in the predictions from the “True” to “RXNMapper” to “None” regimes.
In the “True” regime, EquiReact does not improve predictive capabilities over the ChemProp model with random splits. This points to the importance of the chemical diversity in this set, where knowledge of the reaction mechanism (in the form of atom maps) is sufficient information to predict the reaction barriers without information on the three-dimensional geometries of reactants and products. With scaffold splits, however, EquiReact and ChemProp have the same predictive MAEs within the standard deviations. It can be seen that a model based on geometry information naturally extrapolates better than one trained on atom maps. Bemis–Murcko scaffold115 splitting clusters molecules (here, reactants) based on ring systems. Out-of-sample test molecules may therefore appear “novel” from the point of view of the reaction graph, but will still feature distances and angles close to what the model has seen during training.
Moving to the “RXNMapper” regime, the trend observed in the relative model performance in the “True” regime is exaggerated. Both EquiReact and ChemProp agree within the standard deviation using random splits, while EquiReact outperforms ChemProp using scaffold splits.
In the “None” regime, the difference in the performance of the models is even greater. In terms of MAE, EquiReact outperforms ChemProp using both random and scaffold splits (Table 1). EquiReact’s improvement compared to SLATM+KRR is smaller. The SLATM representation also makes use of 3D coordinates of the reactants and products and is therefore more fundamentally similar to EquiReact than ChemProp is. Nevertheless, EquiReact makes use of equivariant components for molecular features (vs. the invariant features of SLATM) and learns a representation end-to-end, allowing for a more performant model.
Thanks to EquiReact exploiting the chemical information present in atom-maps (if available) and encoding the natural symmetries of molecular reaction components, the stark gap previously observed38 from the “True” to “None” regimes has been diminished for the GDB7-22-TS set, using both random and scaffold splits (Table 1). This is illustrated in Figure 4, where for the same scaffold split, outliers in the “True” plot successively move closer to the identity line.
3.1.2 Cyclo-23-TS dataset
The Cyclo-23-TS 102 dataset contains a single fixed reaction class and has been previously illustrated to show less dependence on the quality of atom-mapping than the GDB7-22-TS.38
For this set, EquiReact outperforms or matches the other models in all three regimes for both random and scaffold splits. This illustrates that a model based purely on geometry information of reactants and products, without any chemical information in the form of atom-mapping or surrogates thereof, can be highly performant for reaction property prediction.
The best model is obtained in the “None” regime, with EquiReact in the energy mode (Figure 3d). As outlined in Section 2.2, in energy mode an energy contribution is learned for reactants’ and products’ atoms separately. In the original publication,102 Stuyver et al. illustrate that the activation barriers correlate linearly with the reaction energy. Since the reaction energy is the difference between products’ and reactants’ energies, the energy mode is the best choice for a model learning the reaction energy, and in the case of this dataset, for the barrier too, due to its linear correlation with the reaction energy.
3.1.3 Proparg-21-TS dataset
The Proparg-21-TS 91, 63 is a small dataset by neural network standards (753 points) and therefore constitutes a challenge for the data efficiency of our model.
EquiReact obtains the best predictive abilities for both the “None” and “True” regimes, where the “RXNMapper” regime is not available since it cannot atom-map the reaction SMILES of this set. These results illustrate that in line with the observations for ENNs for molecules,43, 44, 45, 46, 47, 48, 49, 50 ENNs for chemical reactions can be highly data-efficient and operate in the “low-data” regime.
Geometry-only EquiReact results in lower MAEs than those of the previous state-of-the-art SLATM using both random and scaffold splits (Table 1). Since the enantioselectivity is related to the barrier through an exponential relationship, this difference is significant for the downstream enantioselectivity prediction.63 Like the Cyclo-23-TS set, this dataset consists of a fixed reaction class and the model does not benefit from being provided the “obvious” chemical information: including true atom-maps does not decrease the error.
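The sensitivity of selectivity to barrier errors can be illustrated with a short calculation. The sketch assumes a simple Boltzmann picture in which the ratio of two competing pathways is exp(ΔΔE‡/RT); the temperature and the barrier differences of 1.0 and 1.3 kcal/mol are illustrative values, not results from this work.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)

def selectivity_ratio(ddE, temperature=298.15):
    """Boltzmann ratio of two competing pathways whose barriers differ by ddE (kcal/mol)."""
    return math.exp(ddE / (R_KCAL * temperature))

def enantiomeric_excess(ddE, temperature=298.15):
    """Enantiomeric excess (between 0 and 1) implied by the barrier difference ddE."""
    ratio = selectivity_ratio(ddE, temperature)
    return (ratio - 1.0) / (ratio + 1.0)

# Illustrative barrier differences that differ by only 0.3 kcal/mol.
ee_1 = enantiomeric_excess(1.0)
ee_2 = enantiomeric_excess(1.3)
```

Under these assumptions, a shift of a few tenths of a kcal/mol in the barrier difference changes the predicted enantiomeric excess by roughly ten percentage points, which is why small reductions in MAE matter for this dataset.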
These three datasets illustrate the benefits of the flexibility of EquiReact: depending on the datasets’ particular challenges, the model exploits the available information to yield the best-performing model in almost all cases. Since the modes of the model may be specified as hyperparameters, the optimized version of EquiReact can emerge with minimal user intervention.
3.2 Model behaviour
Since the GDB7-22-TS set illustrates the most dependence on the chemical diversity captured in the models, studying EquiReact and baseline models SLATM and ChemProp in the “True” regime best captures the difference in their chemical behaviour.
Figure 5 compares the (latent) representations of EquiReact, ChemProp and SLATM using t-SNE116 maps. In the upper panel, we find that the quality of the correlation between the representations and the target property corresponds to the relative performance of the models in this regime (Table 1). The best-performing ChemProp shows a smooth transition of the target property across the plot, while EquiReact’s gradient is somewhat distorted and the map of the worst-performing SLATM does not have a clear structure. The lower panel shows the correlation of the representations with the reaction types. ChemProp, as a chemically-inspired model, shows clearer clusters by reaction type, whereas EquiReact does not, using different information (distances and angles) to correlate its latent space representation with the target property. SLATM lies somewhere in between, showing some clusters of reaction class. While SLATM is also a distance-based model, the binning structure used to create the representation8, 36 may result in better correlation with the reaction types, since the pairwise bins, for example, naturally cluster features such as C–H bond formation or breaking. Nevertheless, as discussed in Section 3.1, EquiReact exploits distance information better and thereby outperforms SLATM.
Figure 6 illustrates how EquiReact “True” performs for the most common reaction types in the GDB7-22-TS set defined by bond breaking and formation (see Section 5.3). EquiReact performs universally well across the different reaction types, with consistently low errors and relatively small standard deviations. The reactions for which the model has higher mean errors and standard deviations (C–H,C–H (blue) and H–H,C–H,C–H (green)) correspond to those involving C–H features. Since the model is trained without explicit H nodes in the graph, features associated with X–H bonds are included implicitly in the model. Since C is the most frequently occurring element in various different configurations, capturing all the C–H features is more challenging than the O–H features for example, which will be more similar to one another. The equivalent plot for the model trained with explicit H nodes is shown in the Supplementary Information, illustrating that the standard deviations reduce for the reaction types involving C–H features.
3.3 Geometry quality
To illustrate that EquiReact does not require high-quality molecular structures to be used in an out-of-sample scenario, we train and test a model using lower-quality GFN2-xTB117 (xTB) geometries to predict higher-level barriers (CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP for GDB7-22-TS, B3LYP-D3(BJ)/def2-TZVP//B3LYP-D3(BJ)/def2-SVP for Cyclo-23-TS and B97D/TZV(2p,2d) for Proparg-21-TS). The results are illustrated in Figure 7 for the three datasets with DFT and xTB geometries, and compared to the SLATM+KRR model in the same settings. EquiReact benefits from a lower sensitivity to the geometry quality compared to the pre-designed representation SLATM+KRR across the three datasets.
For the GDB7-22-TS set, there is a negligible difference in model performance moving from DFT to xTB geometries. The xTB geometries are a good proxy for the DFT geometries here, since this set consists of small, charge-neutral organic molecules, which are largely well-described by semiempirical methods. For the Cyclo-23-TS set, while the molecules are still organic, they are larger than those in the GDB7-22-TS set and there is a greater divergence between the GFN2-xTB and DFT geometries, resulting in a larger deterioration with these geometries. The Supplementary Information illustrates that models trained on lower-quality (i.e., xTB) geometries do not produce higher errors for molecules with particularly high deviation from DFT geometries. Rather, there is a consistent deterioration in the model performance when training on xTB geometries and predicting DFT barriers, if the xTB geometries are a poor proxy for the DFT ones.
The Proparg-21-TS set is the most complex of the three for GFN2-xTB, since these systems with charged organosilicon compounds differ considerably from those used to parameterize semi-empirical methods or force field methods. As described in Section 5.3, unlike for the other datasets where we generate an initial structure from SMILES using force field methods, for this set it is impossible and we instead generate xTB geometries from the DFT ones. While this is not a feasible geometry generation pipeline out-of-sample, it still demonstrates how different methods perform with high and low-quality geometries. Here, we see that EquiReact is significantly less sensitive than SLATM+KRR, and the variant trained with lower-quality geometries still offers competitive errors for the “None” model.
4 Conclusions
Despite the crowning of equivariant neural networks as best-in-class for the prediction of computed molecular properties, the equivalent has not been well-established for the prediction of reaction properties. We contribute to this domain by introducing EquiReact, an equivariant neural network constructed from the three-dimensional coordinates of reactants and products. While other graph-based models for reactions rely on atom-mapped reaction SMILES, EquiReact is flexible and can include atom-mapping if it is available. Particularly in the regime without atom-mapping information, EquiReact outperforms existing baselines99, 66, 38 for the prediction of reaction barriers on the GDB7-22-TS,101 Cyclo-23-TS 102 and Proparg-21-TS 63, 91 datasets. The latter dataset in particular illustrates both the data efficiency of our model (with a total dataset size of 753 datapoints) as well as EquiReact’s forte in describing subtle changes in geometry. EquiReact demonstrates superior extrapolation capabilities compared to the 2D-graph-based models for all datasets and in all regimes tested. It also suffers less in moving from DFT-level to GFN2-xTB-level117 geometries compared to existing methods. These points illustrate its utility in out-of-sample scenarios.
5 Methods
5.1 Datasets
We test EquiReact on three datasets of reaction barriers previously used to benchmark reaction representations.38 In all cases, optimized three-dimensional structures of reactants and products are provided, which are used to train models and make predictions. The activation barrier is not a direct function of these structures, but using the TS structure to make predictions removes the utility of the ML models vs. direct computation of the TS. Thus we use an implicit interpolation of reactants’ and products’ structures as a proxy for the TS as in previous works.36, 63, 38
The GDB7-22-TS 101 dataset consists of diverse organic reactions automatically constructed from the GDB7 dataset118, 119, 55 using the growing string method120 along with corresponding energy barriers computed at the CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP level. The dataset provides atom-mapped SMILES, with “True” maps derived from the transition state. For some reactions, one of the products’ SMILES represents a molecule different from the xyz structure. These reactions were therefore excluded from the dataset, leading to a modified GDB7-22-TS set used here.
While there are no pre-defined classes for all the reactions in the GDB7-20-TS93 or GDB7-22-TS 101 sets, Grambow et al.64 split the dataset into reactions undergoing certain bond changes: for example, the most common type was breaking of a C–H bond (C–H) and a C–C bond (C–C) in the reactants and formation of a C–H bond (C–H) in the products, giving the reaction type signature C–H,C–C,C–H. Here, we extract similar reaction types by comparing the connectivity matrices from atom-mapped reaction SMILES of reactants and products (ignoring bond orders). The most abundant reaction types in the dataset are C–H,C–C,C–H (1667 reactions), H–N,C–H (633), C–H,C–H (619), H–O,C–H,C–O (599) and H–H,C–H,C–H (517).
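The comparison of reactant and product connectivity described above can be illustrated with a minimal Python sketch. It is not the code used in this work: the function names, the input format (bonds given as pairs of atom-map numbers) and the toy reaction are assumptions made for the example. Bond orders are ignored, as in the text, and broken and formed bonds are returned as two separate lists.

```python
def bond_set(bonds):
    """Turn (map_i, map_j) pairs into a set of unordered atom pairs."""
    return {frozenset(bond) for bond in bonds}


def reaction_signature(elements, reactant_bonds, product_bonds):
    """Return (broken, formed) bond-type labels for an atom-mapped reaction.

    elements: dict mapping each atom-map number to its element symbol.
    reactant_bonds, product_bonds: iterables of (map_i, map_j) pairs.
    """
    reactant, product = bond_set(reactant_bonds), bond_set(product_bonds)

    def label(bond):
        # Sort the two element symbols so that C-H and H-C give the same label.
        first, second = sorted(elements[i] for i in bond)
        return f"{first}-{second}"

    broken = sorted(label(b) for b in reactant - product)  # only in reactants
    formed = sorted(label(b) for b in product - reactant)  # only in products
    return broken, formed


# Hypothetical toy reaction: a hydrogen (atom 3) moves from carbon 1 to oxygen 4.
elements = {1: "C", 2: "C", 3: "H", 4: "O"}
broken, formed = reaction_signature(
    elements,
    reactant_bonds=[(1, 2), (1, 3), (2, 4)],
    product_bonds=[(1, 2), (2, 4), (4, 3)],
)
print("broken:", broken, "formed:", formed)
```

Counting identical (broken, formed) pairs over a dataset would then give reaction-type populations of the kind listed above.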
The original Cyclo-23-TS 102 dataset encompasses profiles for cycloaddition reactions with activation free energies () computed at the B3LYP-D3(BJ)/def2-TZVP//B3LYP-D3(BJ)/def2-SVP level. The dataset provides atom-mapped SMILES with “True” maps for heavy atoms derived from either the transition state structure or heuristic rules. For the regime with explicit hydrogen atoms, we atom-mapped the xyz files by matching the reactants, given in two separate files, to the provided transition state structure, which closely resembles the two reactants and has the same atom order as in the products. This was done with a labeled graph matching algorithm as implemented in NetworkX.121, 122 The algorithm is unaware of chirality, double-bond stereochemistry and conformations, and may therefore produce atom-mappings that are not exactly correct. We also found that for four reactions the product SMILES and xyz files depict different species; the set was thus reduced to reactions.
The Proparg-21-TS dataset91, 63 contains 753 structures of intermediates before and after the enantioselective transition state of benzaldehyde propargylation, with activation energies () computed at the B97D/TZV(2p,2d) level. SMILES strings (“Fragment-Based” SMILES) and “True” atom-maps are not provided in the original dataset; these are taken from Ref. 38.
RXNMapper98-mapped versions of GDB7-22-TS and Cyclo-23-TS were obtained with RXNMapper 0.3.0 with the default settings. The Proparg-21-TS set cannot be mapped, because the underlying libraries cannot process its SMILES. Since RXNMapper sorts molecules in case of multiple reactants and/or products, which would complicate SMILES–xyz matching (see Section 5.2 below), we used a locally modified version that does not change the molecule order.
5.2 Matching SMILES strings to xyz geometries
EquiReact makes use of both the graph structure of a molecule (as provided in the SMILES string) and the three-dimensional structure (in the xyz). The atoms in the graph are associated with the atomic coordinates provided in the xyz file. Thanks to the way the GDB7-22-TS dataset101 was generated, the atomic coordinates can be easily matched to SMILES, which in turn allows reactants to be atom-mapped to products. However, we also tested RXNMapper-mapped SMILES, which do not respect the same constraints. Therefore, for consistency, we use the SMILES–xyz matching procedure detailed below.
We constructed molecular graphs from xyz using covalent radii and matched them to RDKit105 molecular graphs obtained from SMILES with a labeled graph matching algorithm as implemented in NetworkX.121, 122 This procedure is, however, unaware of chirality and double-bond stereochemistry, thus some of the matches might be incorrect. Still, it provides a flexible method that can be applied to any dataset consisting of SMILES strings and xyz files.
The same procedure was applied to the Cyclo-23-TS dataset in the few cases when the canonical SMILES have a different atom ordering than xyz.
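The matching procedure of this section can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the implementation used here: the work relies on RDKit and the NetworkX graph matcher, whereas the sketch substitutes a plain backtracking search, and the radii table, the bonding tolerance of 1.3 and the function names are hypothetical choices. Like the procedure above, it is unaware of chirality and double-bond stereochemistry.

```python
import math
from itertools import combinations

# Approximate covalent radii in angstrom for a few elements (illustrative values).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def graph_from_xyz(symbols, coords, tolerance=1.3):
    """Bond atoms i and j when their distance is below tolerance * (r_i + r_j)."""
    adjacency = {i: set() for i in range(len(symbols))}
    for i, j in combinations(range(len(symbols)), 2):
        cutoff = tolerance * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]])
        if math.dist(coords[i], coords[j]) < cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def match_labeled_graphs(labels_a, adj_a, labels_b, adj_b):
    """Return a dict sending nodes of graph A to nodes of graph B, or None.

    The match preserves element labels and bonded/non-bonded relations.
    """
    if sorted(labels_a) != sorted(labels_b):
        return None
    n = len(labels_a)
    mapping, used = {}, set()

    def compatible(a, b):
        if labels_a[a] != labels_b[b] or len(adj_a[a]) != len(adj_b[b]):
            return False
        # Each already-matched node must relate to a exactly as its image relates to b.
        return all((prev in adj_a[a]) == (mapping[prev] in adj_b[b]) for prev in mapping)

    def extend(a):
        if a == n:
            return True
        for b in range(n):
            if b not in used and compatible(a, b):
                mapping[a] = b
                used.add(b)
                if extend(a + 1):
                    return True
                del mapping[a]
                used.discard(b)
        return False

    return dict(mapping) if extend(0) else None


# Water: the SMILES-derived graph orders atoms O, H, H; the xyz file orders them H, O, H.
smiles_labels, smiles_adj = ["O", "H", "H"], {0: {1, 2}, 1: {0}, 2: {0}}
xyz_labels = ["H", "O", "H"]
xyz_coords = [(0.96, 0.0, 0.0), (0.0, 0.0, 0.0), (-0.24, 0.93, 0.0)]
xyz_adj = graph_from_xyz(xyz_labels, xyz_coords)
print(match_labeled_graphs(smiles_labels, smiles_adj, xyz_labels, xyz_adj))
```

The returned dictionary gives, for every atom of the SMILES graph, the index of the corresponding row in the xyz file, which is the information needed to attach coordinates to graph nodes.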
5.3 xTB geometry generation
For the GDB7-22-TS and Cyclo-23-TS datasets, the starting structures were generated from SMILES using the distance-geometry embedding implemented in RDKit105 with the srETKDGv3 settings.123 Ten conformations were produced per molecule, which were then energy-ranked with the MMFF94 implementation124 in RDKit, defaulting to UFF in case of missing parameters. The lowest energy conformer was retained. For the Proparg-21-TS set, the original B97D/TZV(2p,2d) geometries were used as a starting point, because the stereochemical and conformational diversity of this set cannot be completely encoded with SMILES; the SMILES-based procedure would therefore fail to generate appropriate initial geometries.
For all the sets, the starting structures were optimized at the GFN2-xTB semiempirical level of theory117 at the “loose” convergence level for a maximum of 1000 iterations using xTB v6.2 RC2. For reactions of the GDB7-22-TS set and reactions of the Cyclo-23-TS set, at least one of the participating molecules either could not converge to any reasonable configuration or converged to a structure not matching the SMILES. These reactions were excluded from the geometry quality tests (Sec. 3.3).
5.4 Model training
EquiReact was trained using the Adam optimizer 125 with learning rate and weight decay parameters as hyperparameters to be optimized. The learning rate was reduced by 40% after epochs of no improvement in the validation MAE, as in Ref. 106. Models were trained for max. epochs, using early stopping after epochs of no improvement. The model with the best validation score was then used to make predictions on the test set.
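The interplay between the learning-rate reduction and early stopping can be illustrated with a small stand-alone Python sketch that replays a list of validation MAEs. It is only a sketch under assumptions: the function name, the two patience values and the initial learning rate are hypothetical placeholders, not the settings used in this work. The factor of 0.6 corresponds to the 40% reduction mentioned above.

```python
def replay_training_schedule(val_maes, lr=1e-3, lr_patience=2, stop_patience=4, factor=0.6):
    """Apply reduce-on-plateau and early stopping to a sequence of validation MAEs.

    Returns the best epoch, the best MAE and a per-epoch history of (epoch, mae, lr).
    """
    best_mae, best_epoch = float("inf"), None
    since_best = 0       # epochs since the best validation MAE (early stopping)
    since_lr_change = 0  # non-improving epochs since the last reset or reduction
    history = []
    for epoch, mae in enumerate(val_maes):
        if mae < best_mae:
            # Improvement: remember this epoch (the model to keep) and reset counters.
            best_mae, best_epoch = mae, epoch
            since_best = since_lr_change = 0
        else:
            since_best += 1
            since_lr_change += 1
            if since_lr_change >= lr_patience:
                lr *= factor  # reduce the learning rate by 40%
                since_lr_change = 0
        history.append((epoch, mae, lr))
        if since_best >= stop_patience:
            break  # early stopping: later epochs are never run
    return best_epoch, best_mae, history


# Hypothetical validation curve; the last value is never reached.
best_epoch, best_mae, history = replay_training_schedule(
    [5.0, 4.0, 4.2, 4.1, 4.3, 4.4, 3.0]
)
print(best_epoch, best_mae, len(history))
```

In this toy run the learning rate is reduced twice and training stops after six epochs, with the model from epoch 1 retained for testing.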
The optimal model hyperparameters were searched within the following values: learning rate ; weight decay parameter ; node and edge features embedding size ; hidden space size ; number of edge features ; number of convolutional layers ; radial cutoff ; maximum number of atom neighbors ; dropout probability ; sum_mode [node, both]; combine_mode [mlp, diff, mean, sum]; graph_mode [energy, vector].
The hyperparameter search was performed with EquiReact (without attention or mapping), using Bayesian search as implemented in Weights & Biases.126 For the EquiReact regime, the learning rate and weight decay parameter were optimized afterward in a grid search, setting the other parameters to those from the search for EquiReact. Hydrogen atoms were included as nodes in the graphs. Sweeps were run for 128 epochs for the GDB7-22-TS and Proparg-21-TS sets, and for 256 epochs for the Cyclo-23-TS set on a single random split. The parameters resulting in the best validation error, summarized in the model-parameter table of the Supplementary Information, were used for all the other model settings.
5.5 Baseline models
The ChemProp model is based on a Condensed Graph of Reaction (CGR)65 built from atom-mapped SMILES strings of reactants and products, which is then passed through the directed message-passing neural network chemprop114 (version 1.5.0). Models are trained using the default parameters of chemprop.
Molecular SLATM vectors were generated using the qml python package127 before being combined to form the reaction version SLATM. SLATM is combined with Kernel Ridge Regression (KRR) models using the best kernel functions as in van Gerwen et al.38 The kernel width and regularization parameters were optimized on the first fold of the ten using the random splits, in line with how the hyperparameters were optimized for EquiReact.
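For readers unfamiliar with Kernel Ridge Regression, the following standard-library Python sketch shows the fit, the prediction and a grid search over kernel width and regularization on a single split. Several points are assumptions made for illustration: a Laplacian kernel is used here, whereas the kernels actually employed follow van Gerwen et al.; the one-dimensional feature vectors stand in for reaction representations; and all function names and grid values are hypothetical.

```python
import math


def laplacian_kernel(x, y, sigma):
    """k(x, y) = exp(-||x - y||_1 / sigma)."""
    return math.exp(-sum(abs(a - b) for a, b in zip(x, y)) / sigma)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def krr_fit(X, y, sigma, lam):
    """Regression coefficients alpha = (K + lam * I)^-1 y."""
    K = [[laplacian_kernel(xi, xj, sigma) + (lam if i == j else 0.0)
          for j, xj in enumerate(X)] for i, xi in enumerate(X)]
    return solve(K, y)


def krr_predict(X_train, alpha, X_new, sigma):
    """Prediction for each new point: sum_i alpha_i * k(x_new, x_i)."""
    return [sum(a * laplacian_kernel(x, xt, sigma) for a, xt in zip(alpha, X_train))
            for x in X_new]


def grid_search(X_tr, y_tr, X_val, y_val, sigmas, lams):
    """Return (validation MAE, sigma, lam) with the lowest validation MAE."""
    def val_mae(sigma, lam):
        pred = krr_predict(X_tr, krr_fit(X_tr, y_tr, sigma, lam), X_val, sigma)
        return sum(abs(p - t) for p, t in zip(pred, y_val)) / len(y_val)
    return min((val_mae(s, l), s, l) for s in sigmas for l in lams)


# Hypothetical one-dimensional data standing in for reaction representations.
X_train, y_train = [[0.0], [1.0], [2.0], [3.0]], [0.0, 1.0, 2.0, 3.0]
print(grid_search(X_train, y_train, [[1.5]], [1.5], sigmas=[1.0, 10.0], lams=[1e-8, 1e-4]))
```

The selected pair would then be reused unchanged for the remaining splits.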
Code and Data Availability
The code is available as a GitHub repository at https://github.com/lcmd-epfl/EquiReact. The versions of the datasets used, as well as any processing applied to them, can be found in the same repository.
Supplementary Information is provided in the freely available file SI.pdf, detailing the hyperparameters of the different models tested, the model performance with and without explicit hydrogen atoms, and a discussion of the model with a cross-attention surrogate for atom-mapping.
Author Information
Author contributions
P.v.G. and C.B. conceptualized the project. EquiReact and support codes were written and run by P.v.G. and K.R.B., with design suggestions from C.B. and V.R.S. Results were analyzed by P.v.G., K.R.B., C.B. and V.R.S. xTB computations were run by R.L. The original draft was written by P.v.G. and K.R.B. with reviews and edits from all authors. C.C. and A.K. provided supervision and are acknowledged for acquiring funding.
Conflict of interest
The authors have no conflicts to disclose.
P.v.G., C.B., V.R.S., R.L., A.K. and C.C. acknowledge the National Centre of Competence in Research (NCCR) “Sustainable chemical process through catalysis (Catalysis)”, grant number 180544, of the Swiss National Science Foundation (SNSF) for financial support. K.R.B. and C.C. were supported by the European Research Council (grant number 817977) and by the National Centre of Competence in Research (NCCR) “Materials’ Revolution: Computational Design and Discovery of Novel Materials (MARVEL)”, grant number 205602, of the Swiss National Science Foundation.
References
- Behler and Parrinello 2007 Behler, J.; Parrinello, M. Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces. Phys. Rev. Lett. 2007, 98, 146401
- Rupp et al. 2012 Rupp, M.; Tkatchenko, A.; Müller, K.-R.; von Lilienfeld, O. A. Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning. Phys. Rev. Lett. 2012, 108, 058301
- Bartók et al. 2013 Bartók, A. P.; Kondor, R.; Csányi, G. On representing chemical environments. Phys. Rev. B 2013, 87, 184115
- Hansen et al. 2015 Hansen, K.; Biegler, F.; Ramakrishnan, R.; Pronobis, W.; von Lilienfeld, O. A.; Müller, K.-R.; Tkatchenko, A. Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space. J. Phys. Chem. Lett. 2015, 6, 2326–2331
- Huo and Rupp 2017 Huo, H.; Rupp, M. Unified representation for machine learning of molecules and crystals. arXiv preprint 2017, arXiv:1704.06439
- Faber et al. 2018 Faber, F. A.; Christensen, A. S.; Huang, B.; von Lilienfeld, O. A. Alchemical and structural distribution based representation for universal quantum machine learning. J. Chem. Phys. 2018, 148, 241717
- Christensen et al. 2020 Christensen, A. S.; Bratholm, L. A.; Faber, F. A.; Anatole von Lilienfeld, O. FCHL revisited: Faster and more accurate quantum machine learning. J. Chem. Phys. 2020, 152, 044107
- Huang and von Lilienfeld 2020 Huang, B.; von Lilienfeld, O. A. Quantum machine learning using atom-in-molecule-based fragments selected on the fly. Nat. Chem. 2020, 12, 945–951
- Drautz 2019 Drautz, R. Atomic cluster expansion for accurate and transferable interatomic potentials. Phys. Rev. B 2019, 99, 014104
- Dusson et al. 2022 Dusson, G.; Bachmayr, M.; Csányi, G.; Drautz, R.; Etter, S.; van der Oord, C.; Ortner, C. Atomic cluster expansion: Completeness, efficiency and stability. J. Comput. Phys. 2022, 454, 110946
- Grisafi and Ceriotti 2019 Grisafi, A.; Ceriotti, M. Incorporating long-range physics in atomic-scale machine learning. J. Chem. Phys. 2019, 151, 204105
- Grisafi et al. 2021 Grisafi, A.; Nigam, J.; Ceriotti, M. Multi-scale approach for the prediction of atomic scale properties. Chem. Sci. 2021, 12, 2078–2090
- Nigam et al. 2020 Nigam, J.; Pozdnyakov, S.; Ceriotti, M. Recursive evaluation and iterative contraction of N-body equivariant features. J. Chem. Phys. 2020, 153, 121101
- Fabrizio et al. 2022 Fabrizio, A.; Briling, K. R.; Corminboeuf, C. SPAHM: the Spectrum of Approximated Hamiltonian Matrices representations. Digital Discovery 2022, 1, 286–294
- Briling et al. 2023 Briling, K. R.; Calvino Alonso, Y.; Fabrizio, A.; Corminboeuf, C. SPAM(a,b): encoding the density information from guess Hamiltonian in quantum machine learning representations. arXiv preprint 2023, arXiv:2309.02950
- Karandashev and von Lilienfeld 2022 Karandashev, K.; von Lilienfeld, O. A. An orbital-based representation for accurate quantum machine learning. J. Chem. Phys. 2022, 156, 114101
- Llenga and Gryn’ova 2023 Llenga, S.; Gryn’ova, G. Matrix of orthogonalized atomic orbital coefficients representation for radicals and ions. J. Chem. Phys. 2023, 158, 214116
- Li et al. 2015 Li, Z.; Kermode, J. R.; De Vita, A. Molecular dynamics with on-the-fly machine learning of quantum-mechanical forces. Phys. Rev. Lett. 2015, 114, 096405
- Chmiela et al. 2017 Chmiela, S.; Tkatchenko, A.; Sauceda, H. E.; Poltavsky, I.; Schütt, K. T.; Müller, K.-R. Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 2017, 3, e1603015
- Chmiela et al. 2018 Chmiela, S.; Sauceda, H. E.; Müller, K.-R.; Tkatchenko, A. Towards exact molecular dynamics simulations with machine-learned force fields. Nat. Commun. 2018, 9, 3887
- Behler 2017 Behler, J. First principles neural network potentials for reactive simulations of large molecular and condensed systems. Angew. Chem. Int. Ed. 2017, 56, 12828–12840
- Smith et al. 2018 Smith, J. S.; Nebgen, B.; Lubbers, N.; Isayev, O.; Roitberg, A. E. Less is more: Sampling chemical space with active learning. J. Chem. Phys. 2018, 148, 241733
- Bereau et al. 2015 Bereau, T.; Andrienko, D.; Von Lilienfeld, O. A. Transferable atomic multipole machine learning models for small organic molecules. J. Chem. Theory Comput. 2015, 11, 3225–3233
- Grisafi et al. 2018 Grisafi, A.; Wilkins, D. M.; Csányi, G.; Ceriotti, M. Symmetry-adapted machine learning for tensorial properties of atomistic systems. Phys. Rev. Lett. 2018, 120, 036002
- Wilkins et al. 2019 Wilkins, D. M.; Grisafi, A.; Yang, Y.; Lao, K. U.; DiStasio Jr, R. A.; Ceriotti, M. Accurate molecular polarizabilities with coupled cluster theory and machine learning. Proc. Natl. Acad. Sci. U.S.A. 2019, 116, 3401–3406
- Montavon et al. 2013 Montavon, G.; Rupp, M.; Gobre, V.; Vazquez-Mayagoitia, A.; Hansen, K.; Tkatchenko, A.; Müller, K.-R.; Von Lilienfeld, O. A. Machine learning of molecular electronic properties in chemical compound space. New J. Phys. 2013, 15, 095003
- Mazouin et al. 2022 Mazouin, B.; Schöpfer, A. A.; von Lilienfeld, O. A. Selected machine learning of HOMO–LUMO gaps with improved data-efficiency. Mater. Adv. 2022, 3, 8306–8316
- Brockherde et al. 2017 Brockherde, F.; Vogt, L.; Li, L.; Tuckerman, M. E.; Burke, K.; Müller, K.-R. Bypassing the Kohn-Sham equations with machine learning. Nat. Commun. 2017, 8, 872
- Grisafi et al. 2019 Grisafi, A.; Fabrizio, A.; Meyer, B.; Wilkins, D. M.; Corminboeuf, C.; Ceriotti, M. Transferable machine-learning model of the electron density. ACS Cent. Sci. 2019, 5, 57–64
- Fabrizio et al. 2019 Fabrizio, A.; Grisafi, A.; Meyer, B.; Ceriotti, M.; Corminboeuf, C. Electron density learning of non-covalent systems. Chem. Sci. 2019, 10, 9424–9432
- Musil et al. 2021 Musil, F.; Grisafi, A.; Bartók, A. P.; Ortner, C.; Csányi, G.; Ceriotti, M. Physics-Inspired Structural Representations for Molecules and Materials. Chem. Rev. 2021, 121, 9759–9815
- Langer et al. 2022 Langer, M. F.; Goessmann, A.; Rupp, M. Representations of molecules and materials for interpolation of quantum-mechanical simulations via machine learning. npj Comput. Mater. 2022, 8, 41
- Huang and von Lilienfeld 2021 Huang, B.; von Lilienfeld, O. A. Ab Initio Machine Learning in Chemical Compound Space. Chem. Rev. 2021, 121, 10001–10036
- Kulik et al. 2022 Kulik, H. J.; Hammerschmidt, T.; Schmidt, J.; Botti, S.; Marques, M. A. L.; Boley, M.; Scheffler, M.; Todorović, M.; Rinke, P.; Oses, C.; Smolyanyuk, A.; Curtarolo, S.; Tkatchenko, A.; Bartók, A. P.; Manzhos, S.; Ihara, M.; Carrington, T.; Behler, J.; Isayev, O.; Veit, M.; Grisafi, A.; Nigam, J.; Ceriotti, M.; Schütt, K. T.; Westermayr, J.; Gastegger, M.; Maurer, R. J.; Kalita, B.; Burke, K.; Nagai, R.; Akashi, R.; Sugino, O.; Hermann, J.; Noé, F.; Pilati, S.; Draxl, C.; Kuban, M.; Rigamonti, S.; Scheidgen, M.; Esters, M.; Hicks, D.; Toher, C.; Balachandran, P. V.; Tamblyn, I.; Whitelam, S.; Bellinger, C.; Ghiringhelli, L. M. Roadmap on Machine learning in electronic structure. Electron. Struct. 2022, 4, 023004
- Glielmo et al. 2017 Glielmo, A.; Sollich, P.; De Vita, A. Accurate interatomic force fields via machine learning with covariant kernels. Phys. Rev. B 2017, 95, 214302
- van Gerwen et al. 2022 van Gerwen, P.; Fabrizio, A.; Wodrich, M. D.; Corminboeuf, C. Physics-based representations for machine learning properties of chemical reactions. Mach. Learn.: Sci. Technol. 2022, 3, 045005
- Faber et al. 2017 Faber, F. A.; Hutchison, L.; Huang, B.; Gilmer, J.; Schoenholz, S. S.; Dahl, G. E.; Vinyals, O.; Kearnes, S.; Riley, P. F.; Von Lilienfeld, O. A. Prediction errors of molecular machine learning models lower than hybrid DFT error. J. Chem. Theory Comput. 2017, 13, 5255–5264
- van Gerwen et al. 2023 van Gerwen, P.; Briling, K. R.; Calvino Alonso, Y.; Franke, M.; Corminboeuf, C. Benchmarking machine-readable vectors of chemical reactions on computed activation barriers. ChemRxiv preprint 2023, doi:10.26434/chemrxiv-2023-0hgbc
- Schütt et al. 2017 Schütt, K.; Kindermans, P.-J.; Sauceda Felix, H. E.; Chmiela, S.; Tkatchenko, A.; Müller, K.-R. Schnet: A continuous-filter convolutional neural network for modeling quantum interactions. Adv. Neural Inf. Process. Syst. 2017, 30, 991–1001
- Unke and Meuwly 2019 Unke, O. T.; Meuwly, M. PhysNet: A neural network for predicting energies, forces, dipole moments, and partial charges. J. Chem. Theory Comput. 2019, 15, 3678–3693
- Gasteiger et al. 2020 Gasteiger, J.; Groß, J.; Günnemann, S. Directional message passing for molecular graphs. arXiv preprint 2020, arXiv:2003.03123
- Gilmer et al. 2017 Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; Vinyals, O.; Dahl, G. E. Neural message passing for quantum chemistry. International conference on machine learning. 2017; pp 1263–1272
- Batzner et al. 2022 Batzner, S.; Musaelian, A.; Sun, L.; Geiger, M.; Mailoa, J. P.; Kornbluth, M.; Molinari, N.; Smidt, T. E.; Kozinsky, B. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 2022, 13, 2453
- Gasteiger et al. 2021 Gasteiger, J.; Becker, F.; Günnemann, S. Gemnet: Universal directional graph neural networks for molecules. Adv. Neural Inf. Process. Syst. 2021, 34, 6790–6802
- Haghighatlari et al. 2022 Haghighatlari, M.; Li, J.; Guan, X.; Zhang, O.; Das, A.; Stein, C. J.; Heidar-Zadeh, F.; Liu, M.; Head-Gordon, M.; Bertels, L.; Hao, H.; Leven, I.; Head-Gordon, T. Newtonnet: A newtonian message passing network for deep learning of interatomic potentials and forces. Digital Discovery 2022, 1, 333–343
- Qiao et al. 2020 Qiao, Z.; Welborn, M.; Anandkumar, A.; Manby, F. R.; Miller, T. F. OrbNet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital features. J. Chem. Phys. 2020, 153, 124111
- Thomas et al. 2018 Thomas, N.; Smidt, T.; Kearnes, S.; Yang, L.; Li, L.; Kohlhoff, K.; Riley, P. Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv preprint 2018, arXiv:1802.08219
- Townshend et al. 2020 Townshend, R. J.; Townshend, B.; Eismann, S.; Dror, R. O. Geometric prediction: Moving beyond scalars. arXiv preprint 2020, arXiv:2006.14163
- Anderson et al. 2019 Anderson, B.; Hy, T. S.; Kondor, R. Cormorant: Covariant molecular neural networks. Adv. Neural Inf. Process. Syst. 2019, 32, 14537–14546
- Satorras et al. 2021 Satorras, V. G.; Hoogeboom, E.; Welling, M. E(n) Equivariant Graph Neural Networks. Proceedings of the 38th International Conference on Machine Learning. 2021; pp 9323–9332
- Christensen et al. 2021 Christensen, A. S.; Sirumalla, S. K.; Qiao, Z.; O’Connor, M. B.; Smith, D. G.; Ding, F.; Bygrave, P. J.; Anandkumar, A.; Welborn, M.; Manby, F. R.; Miller, T. F. OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy. J. Chem. Phys. 2021, 155, 204103
- Schütt et al. 2021 Schütt, K.; Unke, O.; Gastegger, M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. Proceedings of the 38th International Conference on Machine Learning. 2021; pp 9377–9388
- Unke et al. 2021 Unke, O. T.; Chmiela, S.; Gastegger, M.; Schütt, K. T.; Sauceda, H. E.; Müller, K.-R. SpookyNet: Learning force fields with electronic degrees of freedom and nonlocal effects. Nat. Commun. 2021, 12, 7273
- Cheng et al. 2019 Cheng, L.; Welborn, M.; Christensen, A. S.; Miller, T. F. A universal density matrix functional from molecular orbital-based machine learning: Transferability across organic molecules. J. Chem. Phys. 2019, 150, 131103
- Ramakrishnan et al. 2014 Ramakrishnan, R.; Dral, P. O.; Rupp, M.; Von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 2014, 1, 140022
- Musaelian et al. 2023 Musaelian, A.; Batzner, S.; Johansson, A.; Sun, L.; Owen, C. J.; Kornbluth, M.; Kozinsky, B. Learning local equivariant representations for large-scale atomistic dynamics. Nat. Commun. 2023, 14, 579
- Law et al. 2014 Law, V.; Knox, C.; Djoumbou, Y.; Jewison, T.; Guo, A. C.; Liu, Y.; Maciejewski, A.; Arndt, D.; Wilson, M.; Neveu, V., et al. DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 2014, 42, D1091–D1097
- Folmsbee and Hutchison 2021 Folmsbee, D.; Hutchison, G. Assessing conformer energies using electronic structure and machine learning methods. Int. J. Quantum Chem. 2021, 121, e26381
- Smith et al. 2017 Smith, J. S.; Isayev, O.; Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 2017, 8, 3192–3203
- Zeng et al. 2020 Zeng, J.; Cao, L.; Xu, M.; Zhu, T.; Zhang, J. Z. Complex reaction processes in combustion unraveled by neural network-based molecular dynamics simulation. Nat. Commun. 2020, 11, 5713
- Chanussot et al. 2021 Chanussot, L.; Das, A.; Goyal, S.; Lavril, T.; Shuaibi, M.; Riviere, M.; Tran, K.; Heras-Domingo, J.; Ho, C.; Hu, W., et al. Open catalyst 2020 (OC20) dataset and community challenges. ACS Catal. 2021, 11, 6059–6072
- Lewis-Atwell et al. 2022 Lewis-Atwell, T.; Townsend, P. A.; Grayson, M. N. Machine learning activation energies of chemical reactions. WIREs Comput. Mol. Sci. 2022, 12, e1593
- Gallarati et al. 2021 Gallarati, S.; Fabregat, R.; Laplaza, R.; Bhattacharjee, S.; Wodrich, M. D.; Corminboeuf, C. Reaction-based machine learning representations for predicting the enantioselectivity of organocatalysts. Chem. Sci. 2021, 12, 6879–6889
- Grambow et al. 2020 Grambow, C. A.; Pattanaik, L.; Green, W. H. Deep Learning of Activation Energies. J. Phys. Chem. Lett. 2020, 11, 2992–2997
- Heid and Green 2022 Heid, E.; Green, W. H. Machine learning of reaction properties via learned representations of the condensed graph of reaction. J. Chem. Inf. Model. 2022, 62, 2101–2110
- Spiekermann et al. 2022 Spiekermann, K. A.; Pattanaik, L.; Green, W. H. Fast Predictions of Reaction Barrier Heights: Toward Coupled-Cluster Accuracy. J. Phys. Chem. A 2022, 126, 3976–3986
- Zhao et al. 2023 Zhao, Q.; Anstine, D. M.; Isayev, O.; Savoie, B. M. Δ2 machine learning for reaction property prediction. Chem. Sci. 2023, 14, 13392–13401
- Heinen et al. 2021 Heinen, S.; von Rudorff, G. F.; von Lilienfeld, O. A. Toward the design of chemical reactions: Machine learning barriers of competing mechanisms in reactant space. J. Chem. Phys. 2021, 155, 064105
- Singh et al. 2019 Singh, A. R.; Rohr, B. A.; Gauthier, J. A.; Nørskov, J. K. Predicting chemical reaction barriers with a machine learning model. Catal. Lett. 2019, 149, 2347–2354
- Choi et al. 2018 Choi, S.; Kim, Y.; Kim, J. W.; Kim, Z.; Kim, W. Y. Feasibility of activation energy prediction of gas-phase reactions by machine learning. Chem. Eur. J. 2018, 24, 12354–12358
- Farrar and Grayson 2022 Farrar, E. H. E.; Grayson, M. N. Machine learning and semi-empirical calculations: a synergistic approach to rapid, accurate, and mechanism-based reaction barrier prediction. Chem. Sci. 2022, 13, 7594–7603
- Friederich et al. 2020 Friederich, P.; dos Passos Gomes, G.; Bin, R. D.; Aspuru-Guzik, A.; Balcells, D. Machine learning dihydrogen activation in the chemical space surrounding Vaska’s complex. Chem. Sci. 2020, 11, 4584–4601
- Migliaro and Cundari 2020 Migliaro, I.; Cundari, T. R. Density Functional Study of Methane Activation by Frustrated Lewis Pairs with Group 13 Trihalides and Group 15 Pentahalides and a Machine Learning Analysis of Their Barrier Heights. J. Chem. Inf. Model. 2020, 60, 4958–4966
- Lewis-Atwell et al. 2023 Lewis-Atwell, T.; Beechey, D.; Şimşek, O.; Grayson, M. N. Reformulating Reactivity Design for Data-Efficient Machine Learning. ACS Catal. 2023, 13, 13506–13515
- Schwaller et al. 2022 Schwaller, P.; Vaucher, A. C.; Laplaza, R.; Bunne, C.; Krause, A.; Corminboeuf, C.; Laino, T. Machine intelligence for chemical reaction space. WIREs Comput. Mol. Sci. 2022, 12, e1604
- Rogers and Hahn 2010 Rogers, D.; Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 2010, 50, 742–754
- Probst et al. 2022 Probst, D.; Schwaller, P.; Reymond, J.-L. Reaction Classification and Yield Prediction using the Differential Reaction Fingerprint DRFP. Digital Discovery 2022, 1, 91–97
- Ahneman et al. 2018 Ahneman, D. T.; Estrada, J. G.; Lin, S.; Dreher, S. D.; Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 2018, 360, 186–190
- Żurański et al. 2021 Żurański, A. M.; Martinez Alvarado, J. I.; Shields, B. J.; Doyle, A. G. Predicting Reaction Yields via Supervised Learning. Acc. Chem. Res. 2021, 54, 1856–1865
- Zahrt et al. 2019 Zahrt, A. F.; Henle, J. J.; Rose, B. T.; Wang, Y.; Darrow, W. T.; Denmark, S. E. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning. Science 2019, 363, eaau5631
- Jorner et al. 2021 Jorner, K.; Brinck, T.; Norrby, P.-O.; Buttar, D. Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies. Chem. Sci. 2021, 12, 1163–1175
- Reid and Sigman 2019 Reid, J. P.; Sigman, M. S. Holistic prediction of enantioselectivity in asymmetric catalysis. Nature 2019, 571, 343–348
- Gensch et al. 2022 Gensch, T.; dos Passos Gomes, G.; Friederich, P.; Peters, E.; Gaudin, T.; Pollice, R.; Jorner, K.; Nigam, A.; Lindner-D’Addario, M.; Sigman, M. S.; Aspuru-Guzik, A. A comprehensive discovery platform for organophosphorus ligands for catalysis. J. Am. Chem. Soc. 2022, 144, 1205–1217
- Santiago et al. 2018 Santiago, C. B.; Guo, J.-Y.; Sigman, M. S. Predictive and mechanistic multivariate linear regression models for reaction development. Chem. Sci. 2018, 9, 2398–2412
- Jorner 2023 Jorner, K. Putting Chemical Knowledge to Work in Machine Learning for Reactivity. Chimia 2023, 77, 22
- Gallegos et al. 2021 Gallegos, L. C.; Luchini, G.; St. John, P. C.; Kim, S.; Paton, R. S. Importance of Engineered and Learned Molecular Representations in Predicting Organic Reactivity, Selectivity, and Chemical Properties. Acc. Chem. Res. 2021, 54, 827–836
- Williams et al. 2021 Williams, W. L.; Zeng, L.; Gensch, T.; Sigman, M. S.; Doyle, A. G.; Anslyn, E. V. The Evolution of Data-Driven Modeling in Organic Chemistry. ACS Cent. Sci. 2021, 7, 1622–1637
- Devlin et al. 2018 Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint 2018, arXiv:1810.04805
- Schwaller et al. 2019 Schwaller, P.; Laino, T.; Gaudin, T.; Bolgar, P.; Hunter, C. A.; Bekas, C.; Lee, A. A. Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction. ACS Cent. Sci. 2019, 5, 1572–1583
- Schwaller et al. 2021 Schwaller, P.; Vaucher, A. C.; Laino, T.; Reymond, J.-L. Prediction of chemical reaction yields using deep learning. Mach. Learn.: Sci. Technol. 2021, 2, 015016
- Doney et al. 2016 Doney, A. C.; Rooks, B. J.; Lu, T.; Wheeler, S. E. Design of Organocatalysts for Asymmetric Propargylations through Computational Screening. ACS Catal. 2016, 6, 7948–7955
- Gasteiger et al. 2020 Gasteiger, J.; Giri, S.; Margraf, J. T.; Günnemann, S. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. arXiv preprint 2020, arXiv:2011.14115
- Grambow et al. 2020 Grambow, C.; Pattanaik, L.; Green, W. Reactants, products, and transition states of elementary chemical reactions based on quantum chemistry. Sci. Data 2020, 7, 137
- Duan et al. 2023 Duan, C.; Du, Y.; Jia, H.; Kulik, H. J. Accurate transition state generation with an object-aware equivariant elementary reaction diffusion model. arXiv preprint 2023, arXiv:2304.06174
- Chen et al. 2013 Chen, W. L.; Chen, D. Z.; Taylor, K. T. Automatic reaction mapping and reaction center detection. WIREs Comput. Mol. Sci. 2013, 3, 560–593
- Preciat Gonzalez et al. 2017 Preciat Gonzalez, G. A.; El Assal, L. R.; Noronha, A.; Thiele, I.; Haraldsdóttir, H. S.; Fleming, R. M. Comparative evaluation of atom mapping algorithms for balanced metabolic reactions: application to Recon 3D. J. Cheminform. 2017, 9, 1–15
- Jaworski et al. 2019 Jaworski, W.; Szymkuć, S.; Mikulak-Klucznik, B.; Piecuch, K.; Klucznik, T.; Kaźmierowski, M.; Rydzewski, J.; Gambin, A.; Grzybowski, B. A. Automatic mapping of atoms across both simple and complex chemical reactions. Nat. Commun. 2019, 10, 1434
- Schwaller et al. 2021 Schwaller, P.; Hoover, B.; Reymond, J.-L.; Strobelt, H.; Laino, T. Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Sci. Adv. 2021, 7, eabe4166
- Stuyver and Coley 2023 Stuyver, T.; Coley, C. W. Machine Learning-Guided Computational Screening of New Candidate Reactions with High Bioorthogonal Click Potential. Chem. Eur. J. 2023, 29, e202300387
- Stuyver and Coley 2022 Stuyver, T.; Coley, C. W. Quantum chemistry-augmented neural networks for reactivity prediction: Performance, generalizability, and explainability. J. Chem. Phys. 2022, 156, 084104
- Spiekermann et al. 2022 Spiekermann, K.; Pattanaik, L.; Green, W. H. High accuracy barrier heights, enthalpies, and rate coefficients for chemical reactions. Sci. Data 2022, 9, 417
- Stuyver et al. 2023 Stuyver, T.; Jorner, K.; Coley, C. W. Reaction profiles for quantum chemistry-computed [3+2] cycloaddition reactions. Sci. Data 2023, 10, 66
- Geiger et al. 2022 Geiger, M.; Smidt, T.; M., A.; Miller, B. K.; Boomsma, W.; Dice, B.; Lapchevskyi, K.; Weiler, M.; Tyszkiewicz, M.; Uhrin, M.; Batzner, S.; Madisetti, D.; Frellsen, J.; Jung, N.; Sanborn, S.; jkh,; Wen, M.; Rackers, J.; Rød, M.; Bailey, M. e3nn/e3nn: 2022-12-12. 2022; https://doi.org/10.5281/zenodo.7430260
- Corso et al. 2023 Corso, G.; Stärk, H.; Jing, B.; Barzilay, R.; Jaakkola, T. DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking. arXiv preprint 2023, arXiv:2210.01776
- Landrum et al. 2023 Landrum, G.; Tosco, P.; Kelley, B.; Ric,; Sriniker,; Cosgrove, D.; Gedeck,; Vianello, R.; NadineSchneider,; Kawashima, E.; N, D.; Jones, G.; Dalke, A.; Cole, B.; Swain, M.; Turk, S.; AlexanderSavelyev,; Vaucher, A.; Wójcikowski, M.; Ichiru Take,; Probst, D.; Ujihara, K.; Scalfani, V. F.; Godin, G.; Pahl, A.; Francois Berenger,; JLVarjo,; Walker, R.; Jasondbiggs,; Strets123, rdkit/rdkit: 2023_03_1 (Q1 2023) Release. 2023; https://zenodo.org/record/7880616
- Stärk et al. 2022 Stärk, H.; Ganea, O.; Pattanaik, L.; Barzilay, R.; Jaakkola, T. EquiBind: Geometric deep learning for drug binding structure prediction. International conference on machine learning. 2022; pp 20503–20521
- Vaswani et al. 2017 Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008
- Paszke et al. 2019 Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; Desmaison, A.; Kopf, A.; Yang, E.; DeVito, Z.; Raison, M.; Tejani, A.; Chilamkurthy, S.; Steiner, B.; Fang, L.; Bai, J.; Chintala, S. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037
- Liao and Smidt 2022 Liao, Y.-L.; Smidt, T. Equiformer: Equivariant graph attention transformer for 3D atomistic graphs. arXiv preprint 2022, arXiv:2206.11990
- Ganea et al. 2022 Ganea, O.-E.; Huang, X.; Bunne, C.; Bian, Y.; Barzilay, R.; Jaakkola, T.; Krause, A. Independent SE(3)-Equivariant Models for End-to-End Rigid Protein Docking. arXiv preprint 2022, arXiv:2111.07786
- van Gerwen et al. 2023 van Gerwen, P.; Wodrich, M. D.; Laplaza, R.; Corminboeuf, C. Reply to Comment on ‘Physics-based representations for machine learning properties of chemical reactions’. Mach. Learn.: Sci. Technol. 2023, 4, 048002
- Lowe 2012 Lowe, D. M. Extraction of chemical structures and reactions from the literature. Ph.D. thesis, University of Cambridge, 2012
- Wu et al. 2018 Wu, Z.; Ramsundar, B.; Feinberg, E. N.; Gomes, J.; Geniesse, C.; Pappu, A. S.; Leswing, K.; Pande, V. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 2018, 9, 513–530
- Yang et al. 2019 Yang, K.; Swanson, K.; Jin, W.; Coley, C.; Eiden, P.; Gao, H.; Guzman-Perez, A.; Hopper, T.; Kelley, B.; Mathea, M.; Palmer, A.; Settels, V.; Jaakkola, T.; Jensen, K.; Barzilay, R. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 2019, 59, 3370–3388
- Bemis and Murcko 1996 Bemis, G. W.; Murcko, M. A. The Properties of Known Drugs. 1. Molecular Frameworks. J. Med. Chem. 1996, 39, 2887–2893
- van der Maaten and Hinton 2008 van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605
- Bannwarth et al. 2019 Bannwarth, C.; Ehlert, S.; Grimme, S. GFN2-xTB—An Accurate and Broadly Parametrized Self-Consistent Tight-Binding Quantum Chemical Method with Multipole Electrostatics and Density-Dependent Dispersion Contributions. J. Chem. Theory Comput. 2019, 15, 1652–1671
- Blum and Reymond 2009 Blum, L. C.; Reymond, J.-L. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J. Am. Chem. Soc. 2009, 131, 8732–8733
- Reymond 2015 Reymond, J.-L. The chemical space project. Acc. Chem. Res. 2015, 48, 722–730
- Zimmerman 2015 Zimmerman, P. M. Single-ended transition state finding with the growing string method. J. Comput. Chem. 2015, 36, 601–611
- Cordella et al. 2001 Cordella, L. P.; Foggia, P.; Sansone, C.; Vento, M. An improved algorithm for matching large graphs. 3rd IAPR-TC15 workshop on graph-based representations in pattern recognition. 2001; pp 149–159
- Hagberg et al. 2008 Hagberg, A. A.; Schult, D. A.; Swart, P. J. Exploring Network Structure, Dynamics, and Function using NetworkX. Proceedings of the 7th Python in Science Conference. Pasadena, CA USA, 2008; pp 11–15
- Riniker and Landrum 2015 Riniker, S.; Landrum, G. A. Better informed distance geometry: using what we know to improve conformation generation. J. Chem. Inf. Model. 2015, 55, 2562–2574
- Tosco et al. 2014 Tosco, P.; Stiefl, N.; Landrum, G. Bringing the MMFF force field to the RDKit: implementation and validation. J. Cheminform. 2014, 6, 37
- Kingma and Ba 2014 Kingma, D. P.; Ba, J. Adam: A method for stochastic optimization. arXiv preprint 2014, arXiv:1412.6980
- Biewald 2020 Biewald, L. Experiment Tracking with Weights and Biases. 2020; https://www.wandb.com/, Software available from wandb.com
- Christensen et al. 2017 Christensen, A. S.; Faber, F.; Huang, B.; Bratholm, L.; Tkatchenko, A.; Müller, K.-R.; von Lilienfeld, O. A. QML: A Python Toolkit for Quantum Machine Learning. https://github.com/qmlcode/qml, 2017