EquiReact: An equivariant neural network for chemical reactions

Puck van Gerwen, Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
Ksenia R. Briling, Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
Charlotte Bunne, National Center for Competence in Research – Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
Vignesh Ram Somnath, National Center for Competence in Research – Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
Ruben Laplaza, Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
Andreas Krause, National Center for Competence in Research – Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
Clemence Corminboeuf, Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland

Abstract

Equivariant neural networks have considerably improved the accuracy and data efficiency of molecular property prediction. Building on this success, we introduce EquiReact, an equivariant neural network that infers properties of chemical reactions from the three-dimensional structures of reactants and products. We illustrate its competitive performance on the prediction of activation barriers on the GDB7-22-TS, Cyclo-23-TS and Proparg-21-TS datasets in different regimes defined by the availability of atom-mapping information. We show that, compared to state-of-the-art models for reaction property prediction, EquiReact offers: (i) a flexible model with reduced sensitivity to the atom-mapping regime, (ii) better extrapolation to unseen chemistries, (iii) low prediction errors for datasets exhibiting subtle variations in the three-dimensional geometries of reactants/products, (iv) reduced sensitivity to geometry quality and (v) excellent data efficiency.

keywords:

machine learning, equivariant neural networks, activation energies, chemical reactions

These authors contributed equally to this work.
Also affiliated with: National Center for Competence in Research – Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
These authors contributed equally to this work.
Also affiliated with: Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
Also affiliated with: Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
Also affiliated with: National Center for Competence in Research – Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
Also affiliated with: Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
Also affiliated with: National Center for Competence in Research – Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland

1 Introduction

Physics-inspired representations that take as input the three-dimensional structure^{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13} (and, in some cases, the electronic structure^{14, 15, 16, 17}) of molecules and transform it into a fixed-length vector while respecting known physical laws have a rich history in molecular property prediction. Models have been developed to predict properties ranging from atomization energies,^{2, 8, 4, 5, 6, 7, 13} molecular forces,^{18, 19, 20, 7, 5} potential energy surfaces,^{1, 21, 3, 5, 7, 9, 10, 22} multipole moments,²³ polarizabilities,^{4, 5, 6, 11, 12, 24, 25} dipole moments,⁶ HOMO and LUMO eigenvalues^{4, 6} as well as the HOMO–LUMO gap,^{6, 26, 27} to electron densities.^{28, 29, 30} Common desiderata^{31, 32, 33, 34} for high-performing representations are (i) smoothness, (ii) the encoding of the appropriate symmetries with respect to permutations, rotations and translations,^{24, 35} (iii) completeness and (iv) additivity, which allows extrapolation to larger systems. Fingerprints such as the CM,² BoB,⁴ SOAP,³ FCHL,^{6, 7} SLATM,⁸ MBTR,⁵ LODE,^{11, 12} NICE¹³ and others, being rooted in fundamental principles, are designed to be property-independent: a single representation can be constructed for a molecule to predict any (electronically derived) target. This is analogous to the molecular Hamiltonian, which specifies the energy and all other properties of the system as a function of the atom types and positions in three-dimensional space (assuming the molecules are charge-neutral singlets). These representations are typically used in combination with kernel models^{2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 31, 32, 33} due to their data efficiency, their ability to deal with high-dimensional feature vectors, and the interpretability of the similarity kernel.
Early works showed that combining such representations^{2, 4, 6, 8, 36} with simple feed-forward neural networks instead of kernel models did not necessarily lead to better performance.^{37, 38}

More recently, end-to-end neural networks have been proposed that learn the representation as part of the (supervised) training process,^{39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53} based on principles similar to those of the aforementioned physics-inspired representations: they take as input a three-dimensional structure and, in some cases, charge and spin information.^{46, 51, 52, 53} The relevant symmetry operations are encoded directly into the network architecture. Equivariant Neural Networks (ENNs) in particular have shown unprecedented accuracy and data efficiency on benchmarks of molecular property prediction such as energies of organic molecules (QM7b-T,^{54, 46} QM9,^{55, 46, 49, 50, 56} GDB-13-T,^{54, 46} DrugBank-T,^{57, 46} conformers,⁵⁸ ANI-1^{59, 45}), and energies and forces of several molecular dynamics datasets (MD17,^{43, 44, 45, 49, 56} proteins,⁴⁸ methane combustion,^{60, 45} the Open Catalyst dataset (OC20)^{61, 44}). Unlike earlier invariant neural networks,^{39, 40, 41, 42} which ensure rotational and translational invariance by operating on interatomic distances, equivariant neural networks typically operate on relative position vectors and angular information, which are processed by rotationally equivariant convolutional layers. The internal features are therefore equivariant to rotations. ENNs demonstrate a substantial improvement in accuracy even for the prediction of rotation-invariant properties such as total energies.^{43, 56, 44, 45, 46, 49, 50}

Despite these advances in molecular property prediction, the prediction of computed reaction properties (principally, reaction barriers^{62, 36, 38, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74}) is still in its infancy.⁷⁵ Machine learning approaches range from simple two-dimensional fingerprints of reaction components^{76, 77} and physical-organic descriptors^{78, 79, 70, 80, 81, 69, 82, 83, 84, 85, 86, 87, 74} derived from quantum-chemical computations, to transformer models^{88, 89} adapted for regression⁹⁰ and graph-based approaches.^{65, 66, 64} A recent class of reaction fingerprints is built from the three-dimensional structures of reaction components,^{36, 63, 68} using invariant features for the molecular components. It was recently shown³⁸ that these representations perform well for the prediction of reaction barriers, particularly for datasets^{91, 63} that rely on subtle changes in the geometry of reactants and/or products. As in the earlier stages of molecular property prediction,^{2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13} these representations were combined with kernel models.

To date, few attempts have been made to extend equivariant neural networks to chemical reactions. The first is from Spiekermann et al.,⁶⁶ who extended the molecular network DimeNet++^{41, 92} to describe the reactant and product molecules of a chemical reaction and to predict reaction barriers on the benchmark GDB7-20-TS set.⁹³ The second is an equivariant diffusion model that predicts the Transition State (TS) structure from reactants and products⁹⁴ on the same dataset. The former, a representation-learning approach, is closer in spirit to our dedicated reaction fingerprints.³⁶ However, this particular model⁶⁶ was not shown to improve on 2D-graph-based models.⁶⁵ It was recently illustrated³⁸ that 2D-graph-based models achieve their impressive performance by exploiting atom-mapping information,^{95, 96, 97, 98} which is absent from the equivariant model of Spiekermann et al.⁶⁶

In this work, we introduce EquiReact, a neural network that predicts properties of chemical reactions (showcased here for activation energies) from equivariant features of the reactants’ and products’ geometries. Compared to previously established reaction fingerprints^{36, 63} as well as other neural networks for reaction property prediction,^{65, 64, 66, 99, 100} our new model offers several advantages: (i) greater flexibility depending on how easily a particular dataset can be atom-mapped, (ii) better extrapolation capabilities, (iii) competitive predictive performance, (iv) reduced dependence on the quality of three-dimensional geometries and (v) improved data efficiency.

We illustrate these points by studying three datasets of reaction barriers: GDB7-22-TS¹⁰¹ (an upgraded version of the previously published GDB7-20-TS⁹³), Cyclo-23-TS,¹⁰² and Proparg-21-TS.^{91, 63} As discussed in previous work,³⁸ these datasets present varying challenges for ML models, from the dependence on chemical information¹⁰¹ to the distinction of subtle changes in configuration.^{91, 63} In all cases, we compare to the previously best-performing models:³⁸ the 2D-graph-based CGR model ChemProp⁶⁵ and SLATM$_{d}$ fingerprints^{8, 36} built from three-dimensional information and combined with kernel ridge regression (KRR) models.

2 Architecture

Figure 1: Architecture of EquiReact. Molecules pass through independent equivariant channels (green and orange). These are combined to yield a reaction representation (blue), which is used to predict a reaction property, such as the activation energy (red dot, far right).

EquiReact is built from $SE(3)$-equivariant convolutional networks over point clouds, as implemented in e3nn¹⁰³ and used by Thomas et al.⁴⁷ and Corso et al.¹⁰⁴ The three-dimensional geometries of the molecules constituting the reactants and products of each reaction are passed separately through these equivariant channels, which are detailed in Section 2.1 for the convenience of the reader. The resulting representations are then combined to predict a reaction property, such as the activation energy, as detailed in Section 2.2. The overall architecture is summarized in Figure 1.

2.1 Equivariant molecular channels

A molecule is represented as a distance-based graph in which nodes describe atoms and edges describe “bonds”. Instead of using explicit connectivity information, atom $a$ is “bonded” to all neighboring atoms $\operatorname{Neigh}(a)$ within the cutoff $r_{\max}$; all such (directed) bonds $\{(a,b)\}$ in the molecule form the set $\mathfrak{B}$. Initial node (atom) features $\{\mathrm{\mathbf{x}}^{(0)}_{a}\}$ encode several cheminformatic features from RDKit,¹⁰⁵ including atomic number, chirality tag (unspecified, tetrahedral, or other, including octahedral, square planar, allene-type, etc.), number of directly bonded neighbors, number of rings, implicit valence, formal charge, number of attached hydrogens, number of radical electrons, hybridization, aromaticity, and presence in rings of sizes 3 to 7.
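The cutoff-based graph can be illustrated with a minimal sketch in plain Python. It assumes positions in ångström, builds every directed pair within $r_{\max}$ by brute force (quadratic in the number of atoms, which is adequate for small molecules), and omits the RDKit node features; the function names `radius_graph` and `neighbours` are hypothetical and are not EquiReact's API.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def radius_graph(positions: Sequence[Vec3], r_max: float) -> List[Tuple[int, int]]:
    """Return every directed pair (a, b), a != b, with ||r_b - r_a|| <= r_max.

    Each pair appears in both directions, mirroring the directed bond set B;
    no chemical connectivity is used, only the distance cutoff.
    """
    edges = []
    for a, ra in enumerate(positions):
        for b, rb in enumerate(positions):
            if a != b and math.dist(ra, rb) <= r_max:
                edges.append((a, b))
    return edges


def neighbours(edges: List[Tuple[int, int]], n_atoms: int) -> Dict[int, List[int]]:
    """Neigh(b): the atoms a that send an edge (a, b) into atom b."""
    neigh: Dict[int, List[int]] = {b: [] for b in range(n_atoms)}
    for a, b in edges:
        neigh[b].append(a)
    return neigh


# Toy "molecule": three collinear atoms spaced 1.0 Å apart (illustrative only)
pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
edges = radius_graph(pos, r_max=1.5)
print(edges)                  # [(0, 1), (1, 0), (1, 2), (2, 1)]
print(neighbours(edges, 3))   # {0: [1], 1: [0, 2], 2: [1]}
```

With a 1.5 Å cutoff the two terminal atoms are not connected, so only nearest neighbours exchange messages; a larger cutoff would add the (0, 2) and (2, 0) pairs.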

Inspired by related models,¹⁰⁴ the initial scalar edge (bond) features $\{\mathrm{\mathbf{e}}^{(0)}_{ab}\}$ are projections of the interatomic distances $\|\mathrm{\mathbf{r}}_{ab}\|$ onto $n_{g}$ Gaussians uniformly spanning the interval from $0$ to $r_{\max}$ with step $\Delta\mu=r_{\max}/(n_{g}-1)$,

	$\displaystyle\mathrm{\mathbf{e}}^{(0)}_{ab}=\mathrm{\mathbf{f}}_{1}(\|\mathrm{\mathbf{r}}_{ab}\|)\qquad\forall(a,b)\in\mathfrak{B},$		(1)
	$\displaystyle\mathrm{\mathbf{f}}_{1}(r)=\left\{\exp\left(-\frac{1}{2}\left(\frac{r-n\Delta\mu}{\Delta\mu}\right)^{2}\right)\right\}_{n\in\{0,\ldots,n_{g}-1\}}.$		(2)
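Equations (1) and (2) can be checked numerically with the sketch below. The values of `r_max` and `n_g` are illustrative assumptions, not the hyperparameters used for EquiReact.

```python
import math
from typing import List


def gaussian_expansion(r: float, r_max: float, n_g: int) -> List[float]:
    """Project a distance r onto n_g Gaussians as in Eq. (2).

    The centres are n * dmu for n = 0, ..., n_g - 1 with dmu = r_max / (n_g - 1),
    and every Gaussian has a width equal to the grid step dmu.
    """
    dmu = r_max / (n_g - 1)
    return [math.exp(-0.5 * ((r - n * dmu) / dmu) ** 2) for n in range(n_g)]


# Illustrative settings only (not EquiReact's hyperparameters): r_max = 5.0, n_g = 11
feat = gaussian_expansion(r=2.5, r_max=5.0, n_g=11)
print([round(f, 3) for f in feat])
```

Because the width equals the grid spacing, a distance lying on a grid point activates that Gaussian fully and its two neighbours at $e^{-1/2}\approx 0.61$, giving a smooth, overlapping encoding of $\|\mathrm{\mathbf{r}}_{ab}\|$.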

Tensorial edge features $\{\mathrm{\mathbf{z}}_{ab}\}$ are projections of the normalized interatomic difference vectors $\mathrm{\mathbf{r}}_{ab}/\|\mathrm{\mathbf{r}}_{ab}\|$ onto spherical harmonics $Y^{\ell}_{m}$ with $0\leq\ell\leq 2$,

	$\displaystyle\mathrm{\mathbf{z}}_{ab}\equiv\mathrm{\mathbf{z}}_{ab}^{0e}\oplus\mathrm{\mathbf{z}}_{ab}^{1o}\oplus\mathrm{\mathbf{z}}_{ab}^{2e}=\mathrm{\mathbf{f}}_{2}(\mathrm{\mathbf{r}}_{ab}/\|\mathrm{\mathbf{r}}_{ab}\|)\qquad\forall(a,b)\in\mathfrak{B},$		(3)
	$\displaystyle\mathrm{\mathbf{f}}_{2}(\mathrm{\mathbf{r}})=Y^{0}_{0}(\mathrm{\mathbf{r}})\oplus\left\{Y^{1}_{m}(\mathrm{\mathbf{r}})\right\}_{|m|\leq 1}\oplus\left\{Y^{2}_{m}(\mathrm{\mathbf{r}})\right\}_{|m|\leq 2}.$		(4)
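For reference, the $\ell\leq 2$ harmonics in Eq. (4) can be written out explicitly. This sketch uses the standard orthonormal real spherical harmonics; the component ordering and normalization are assumptions and may differ from the conventions of e3nn's own implementation.

```python
import math
from typing import List, Tuple


def real_sph_harm_l2(v: Tuple[float, float, float]) -> List[float]:
    """Orthonormal real spherical harmonics for l = 0, 1, 2 of a direction.

    Returns 1 + 3 + 5 = 9 values ordered m = -l..l within each l, i.e. the
    0e, 1o and 2e blocks of z_ab in Eq. (3). The input vector is normalized first.
    """
    norm = math.sqrt(sum(c * c for c in v))
    x, y, z = (c / norm for c in v)
    c0 = 0.5 * math.sqrt(1.0 / math.pi)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    c2 = 0.5 * math.sqrt(15.0 / math.pi)
    c20 = 0.25 * math.sqrt(5.0 / math.pi)
    c22 = 0.25 * math.sqrt(15.0 / math.pi)
    return [
        c0,                                   # l = 0 (0e)
        c1 * y, c1 * z, c1 * x,               # l = 1 (1o), m = -1, 0, 1
        c2 * x * y, c2 * y * z,               # l = 2 (2e), m = -2, -1
        c20 * (3.0 * z * z - 1.0),            # l = 2, m = 0 (uses x^2 + y^2 + z^2 = 1)
        c2 * x * z, c22 * (x * x - y * y),    # l = 2, m = 1, 2
    ]


r_ab = (0.3, -1.2, 0.7)  # an arbitrary difference vector (illustrative)
z_ab = real_sph_harm_l2(r_ab)
print(len(z_ab))  # 9
```

Two properties explain the labels $0e$, $1o$ and $2e$: flipping the direction multiplies each block by $(-1)^{\ell}$, and in this orthonormal convention $\sum_m (Y^{\ell}_m)^2=(2\ell+1)/4\pi$ for every direction, so the rotation-invariant content of each block is constant.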

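The initial $\mathrm{\mathbf{x}}^{(0)}$ and $\mathrm{\mathbf{e}}^{(0)}$ are then passed through embeddings to give $\mathrm{\mathbf{x}}^{(1)}_{a}\;\forall a$ and $\mathrm{\mathbf{e}}_{ab}\;\forall(a,b)\in\mathfrak{B}$. Atomic representations $\mathrm{\mathbf{x}}^{(1)}$ are then updated by $n_{\mathrm{conv}}=3$ equivariant convolutional layers:

$$
\begin{aligned}
\mathbf{L1}:\quad & \mathbf{w}^{(1)}_{ab}=\mathbf{g}_{31}(\mathbf{e}_{ab}\oplus\mathbf{x}^{(1)}_{a}\oplus\mathbf{x}^{(1)}_{b}) && \forall(a,b)\in\mathfrak{B}\\
& \mathbf{s}^{(1)}_{b}\equiv\mathbf{s}^{0e(1)}_{b}\oplus\mathbf{s}^{1o(1)}_{b}=\frac{1}{|\operatorname{Neigh}(b)|}\sum_{a:(a,b)\in\mathfrak{B}}\mathbf{t}_{1}(\mathbf{x}^{(1)}_{a},\mathbf{z}_{ab},\mathbf{w}^{(1)}_{ab}) && \forall b\\
& \mathbf{x}^{0e(2)}=\mathbf{x}^{(1)}+\mathbf{s}^{0e(1)}\\
& \mathbf{x}^{(2)}=\mathbf{x}^{0e(2)}\oplus\mathbf{s}^{1o(1)}\\[4pt]
\mathbf{L2}:\quad & \mathbf{w}^{(2)}_{ab}=\mathbf{g}_{32}(\mathbf{e}_{ab}\oplus\mathbf{x}^{0e(2)}_{a}\oplus\mathbf{x}^{0e(2)}_{b}) && \forall(a,b)\in\mathfrak{B}\\
& \mathbf{s}^{(2)}_{b}\equiv\mathbf{s}^{0e(2)}_{b}\oplus\mathbf{s}^{1o(2)}_{b}\oplus\mathbf{s}^{1e(2)}_{b}=\frac{1}{|\operatorname{Neigh}(b)|}\sum_{a:(a,b)\in\mathfrak{B}}\mathbf{t}_{2}(\mathbf{x}^{(2)}_{a},\mathbf{z}_{ab},\mathbf{w}^{(2)}_{ab}) && \forall b\\
& \mathbf{x}^{0e(3)}=\mathbf{x}^{0e(2)}+\mathbf{s}^{0e(2)}\\
& \mathbf{x}^{(3)}=\mathbf{x}^{0e(3)}\oplus(\mathbf{s}^{1o(1)}+\mathbf{s}^{1o(2)})\oplus\mathbf{s}^{1e(2)}\\[4pt]
\mathbf{L3}:\quad & \mathbf{w}^{(3)}_{ab}=\mathbf{g}_{33}(\mathbf{e}_{ab}\oplus\mathbf{x}^{0e(3)}_{a}\oplus\mathbf{x}^{0e(3)}_{b}) && \forall(a,b)\in\mathfrak{B}\\
& \mathbf{s}^{(3)}_{b}\equiv\mathbf{s}^{0e(3)}_{b}\oplus\mathbf{s}^{1o(3)}_{b}\oplus\mathbf{s}^{1e(3)}_{b}\oplus\mathbf{s}^{0o(3)}_{b}=\frac{1}{|\operatorname{Neigh}(b)|}\sum_{a:(a,b)\in\mathfrak{B}}\mathbf{t}_{3}(\mathbf{x}^{(3)}_{a},\mathbf{z}_{ab},\mathbf{w}^{(3)}_{ab}) && \forall b\\
& \mathbf{x}^{\mathrm{out}}=(\mathbf{x}^{0e(3)}+\mathbf{s}^{0e(3)})\oplus\mathbf{s}^{0o(3)}
\end{aligned}
$$

where, for example, in $\mathrm{\mathbf{L1}}$, $\mathrm{\mathbf{s}}^{(1)}_{b}\equiv\mathrm{\mathbf{s}}^{0e(1)}_{b}\oplus\mathrm{\mathbf{s}}^{1o(1)}_{b}$ means that the result of the function $\mathrm{\mathbf{t}}_{1}$ consists of scalars ($0e$) and vectors ($1o$) that can be treated separately. Each function $\mathrm{\mathbf{t}}_{n}(\mathrm{\mathbf{x}},\mathrm{\mathbf{z}},\mathrm{\mathbf{w}})$ is a fully-connected weighted tensor product, as defined in e3nn,¹⁰³ in the form of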

\mathrm{\mathbf{t}}_{n}(\mathrm{\mathbf{x}},\mathrm{\mathbf{z}},\mathrm{\mathbf{w}})=\{\mathrm{\mathbf{t}}^{(n)}_{w}\}=\left\{\sum_{uv}w^{(n)}_{uvw}\,\mathrm{\mathbf{x}}_{u}\otimes\mathrm{\mathbf{z}}_{v}\right\}.

(5)

These tensor products are specified by the signatures of irreducible representations (irreps) of two input and one output $O(3)$ tensors. The output tensor is a combination of weighted sums over paths (pairs of input irreps) leading to each output irrep. The irrep sequence in layers 1–3 is illustrated in Figure 2. To obtain the weights $\mathrm{\mathbf{w}}^{(n)}$ for each convolutional layer $n$, the scalar ($0e$) parts of $\mathrm{\mathbf{x}}^{(n)}_{a}$ and $\mathrm{\mathbf{x}}^{(n)}_{b}$ are concatenated with the bond features $\mathrm{\mathbf{e}}_{ab}$ and passed through a Multi-Layer Perceptron (MLP).
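The selection rules that decide which paths exist can be sketched without e3nn. The snippet below ignores channel multiplicities (in practice each irrep appears in many channels, which the weights $w^{(n)}_{uvw}$ mix) and simply enumerates, for hypothetical irreps strings, the input pairs allowed by $O(3)$ angular-momentum coupling and parity. With the edge irreps $0e\oplus 1o\oplus 2e$ of Eq. (3), it shows why $1e$ can first appear in layer 2 and why the pseudoscalar $0o$ of layer 3 can only come from coupling $1e$ with $1o$.

```python
from itertools import product
from typing import Dict, List, Tuple

Irrep = Tuple[int, int]  # (l, parity): parity +1 for "e" (even), -1 for "o" (odd)


def parse(irreps: str) -> List[Irrep]:
    """Parse a space-separated string such as "0e 1o 2e" into (l, parity) pairs."""
    return [(int(s[:-1]), 1 if s[-1] == "e" else -1) for s in irreps.split()]


def fmt(ir: Irrep) -> str:
    return f"{ir[0]}{'e' if ir[1] == 1 else 'o'}"


def allowed_paths(in1: str, in2: str, out: str) -> Dict[str, List[Tuple[str, str]]]:
    """List, for each requested output irrep, the input pairs (paths) that can reach it.

    O(3) coupling rules: l1 x l2 contains every l with |l1 - l2| <= l <= l1 + l2,
    and the output parity is the product of the two input parities.
    """
    paths: Dict[str, List[Tuple[str, str]]] = {fmt(o): [] for o in parse(out)}
    for (l1, p1), (l2, p2) in product(parse(in1), parse(in2)):
        for l, p in parse(out):
            if abs(l1 - l2) <= l <= l1 + l2 and p == p1 * p2:
                paths[fmt((l, p))].append((fmt((l1, p1)), fmt((l2, p2))))
    return paths


sh = "0e 1o 2e"                                          # irreps of z_ab, Eq. (3)
layer1 = allowed_paths("0e", sh, "0e 1o")                # x^(1) with z -> s^(1)
layer2 = allowed_paths("0e 1o", sh, "0e 1o 1e")          # x^(2) with z -> s^(2)
layer3 = allowed_paths("0e 1o 1e", sh, "0e 1o 1e 0o")    # x^(3) with z -> s^(3)
print(layer1)          # {'0e': [('0e', '0e')], '1o': [('0e', '1o')]}
print(layer3["0o"])    # [('1e', '1o')]
```

In layer 1, $\mathrm{\mathbf{x}}^{(1)}$ contains only scalars, so no path to $1e$ exists; once $1o$ features are present in $\mathrm{\mathbf{x}}^{(2)}$, the path $1o\otimes 1o\to 1e$ becomes available.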

The output of the equivariant molecular channels is the local molecular representation $\mathrm{\mathbf{X}}\in\mathbb{R}^{N_{\mathrm{at}}\times D}$ corresponding to $N_{\mathrm{at}}$ atoms associated with $D$ features. Depending on the sum_mode hyperparameter, it is constructed either from the node features $\{\mathrm{\mathbf{x}}^{\mathrm{out}}_{a}\}$ (node mode) or from both node and edge features $\{\mathrm{\mathbf{x}}^{\mathrm{out}}_{a}\oplus\sum_{b:(a,b)\in\mathfrak{B}}\mathrm{\mathbf{y}}^{(0)}_{ab}\}$ (both mode). In the case of $n_{\mathrm{conv}}=2$, the vectors $\{\mathrm{\mathbf{x}}^{0e(3)}_{a}\}$ are taken to construct the molecular representation.

Inspired by the ChemProp model,⁶⁵ we added an option to exclude hydrogen atoms as nodes when constructing the graph. The only information about hydrogens is then contained in the initial edge features of heavy atoms. The results shown in the main text are obtained without hydrogens, since the model performs systematically better in this regime. A comparison with the regime that uses explicit hydrogen atoms is provided in the Supplementary Information.

2.2 Combining molecules for reactions

Once atom-wise molecular representations $\mathrm{\mathbf{X}}$ are learned for reactant and product molecules, they must be combined to form a reaction representation $\mathrm{\mathbf{X}}_{\mathrm{rxn}}$ .

For certain datasets, atom-mapping information is available, which maps individual atoms in reactant molecules to individual atoms in product molecules according to the reaction mechanism. In this setting, the representations $\mathrm{\mathbf{X}}_{\mathrm{reactant}}$ and $\mathrm{\mathbf{X}}_{\mathrm{product}}$ are reordered such that corresponding local representation vectors describe the same atom in reactants and products. Depending on the combine_mode hyperparameter, the products’ and reactants’ atomic representations are subtracted (products minus reactants), summed, averaged, or passed through an MLP. Thus, the local reaction representation $\mathrm{\mathbf{X}}_{\mathrm{rxn}}$ consists of vectors reflecting how the environment of each atom changes in the reaction. We refer to this regime as EquiReact$_{M}$.
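The reordering and element-wise combination for EquiReact$_{M}$ can be sketched as follows. Atomic representations are plain lists of floats, the atom-map numbers are assumed to form a one-to-one correspondence between reactant and product atoms, and the mode names `"diff"`, `"sum"` and `"mean"` are hypothetical labels for the combine_mode options (the MLP option is omitted).

```python
from typing import List, Sequence

Matrix = List[List[float]]  # one row of features per atom


def reorder_by_map(x_prod: Matrix, prod_map: Sequence[int], react_map: Sequence[int]) -> Matrix:
    """Reorder product rows so that row i describes the same atom as reactant row i.

    react_map[i] and prod_map[j] are the atom-map numbers of reactant atom i and
    product atom j; each number is assumed to occur exactly once on each side.
    """
    row_of = {m: j for j, m in enumerate(prod_map)}
    return [x_prod[row_of[m]] for m in react_map]


def combine(x_r: Matrix, x_p: Matrix, mode: str = "diff") -> Matrix:
    """Element-wise combination of aligned atomic vectors (the MLP option is omitted)."""
    ops = {
        "diff": lambda r, p: p - r,         # products minus reactants
        "sum": lambda r, p: r + p,
        "mean": lambda r, p: 0.5 * (r + p),
    }
    op = ops[mode]
    return [[op(r, p) for r, p in zip(row_r, row_p)] for row_r, row_p in zip(x_r, x_p)]


# Toy two-atom reaction with two features per atom; product atoms are stored in another order
x_react = [[1.0, 0.0], [0.0, 1.0]]
x_prod = [[0.5, 1.5], [2.0, 0.0]]
x_prod_aligned = reorder_by_map(x_prod, prod_map=[2, 1], react_map=[1, 2])
x_rxn = combine(x_react, x_prod_aligned, mode="diff")
print(x_rxn)  # [[1.0, 0.0], [0.5, 0.5]]
```

Without the reordering step, the difference would pair unrelated atoms, which is why this regime depends on the quality of the atom maps.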

With the reaction representation at hand, predictions are made in one of two modes, called the vector and energy modes. In the vector mode, the atomic vectors constituting the reaction representation $\mathrm{\mathbf{X}}_{\mathrm{rxn}}$ are first passed through an MLP to introduce non-linearity and then summed to form a global reaction representation vector $\mathrm{\mathbf{\bar{X}}}_{\mathrm{rxn}}$. The target (here, the activation barrier) is then learned using an MLP. This pipeline is illustrated in Figure 3a. In the energy mode, the local reaction representations are used to learn atomic contributions to the target (Figure 3b). Although it generally performs worse, in some cases this mode yields the best predictions (see Section 3.1.2).
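The two readouts differ only in where the summation over atoms happens. In the sketch below, `phi`, `head` and `atom_head` are hypothetical fixed functions standing in for the trained MLPs; both readouts are invariant to the order of the atoms because they sum over them.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def vector_mode(x_rxn: Sequence[Vector],
                phi: Callable[[Vector], Vector],
                head: Callable[[Vector], float]) -> float:
    """Vector mode (Fig. 3a): atom-wise non-linearity, sum over atoms, global head."""
    per_atom = [phi(row) for row in x_rxn]          # stands in for the atom-wise MLP
    pooled = [sum(col) for col in zip(*per_atom)]   # global reaction vector
    return head(pooled)                             # stands in for the output MLP


def energy_mode(x_rxn: Sequence[Vector],
                atom_head: Callable[[Vector], float]) -> float:
    """Energy mode (Fig. 3b): one scalar contribution per atom, summed."""
    return sum(atom_head(row) for row in x_rxn)


# Hypothetical stand-ins for trained networks (illustration only)
def phi(v: Vector) -> Vector:
    return [math.tanh(c) for c in v]


def head(v: Vector) -> float:
    return 2.0 * v[0] - v[1]


def atom_head(v: Vector) -> float:
    return v[0] * v[1]


x_rxn = [[0.5, -1.0], [1.0, 0.2], [0.0, 0.3]]
print(vector_mode(x_rxn, phi, head), energy_mode(x_rxn, atom_head))
```

In the energy mode the prediction is additive over atoms by construction, whereas the vector mode can model interactions between the pooled features before the final MLP.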

While atom-mapping provides static information, analogous to a reaction mechanism, that links atoms in reactants to atoms in products, it is also possible to exchange information between the two molecular representations dynamically (i.e., in a learnable fashion). For example, RXNMapper⁹⁸ is a neural network that learns atom-mappings within the larger self-supervised task of predicting randomly masked parts of a reaction sequence, using one head of a multi-head transformer architecture. Inspired by EquiBind,¹⁰⁶ a neural network that predicts the rotation and translation of a ligand binding to a protein and contains a cross-attention module between ligand and receptor, EquiReact$_{X}$ uses cross-attention between reactants and products to create a surrogate for atom-mapping. Given queries $\mathrm{\mathbf{Q}}\in\mathbb{R}^{N\times D}$, keys $\mathrm{\mathbf{K}}\in\mathbb{R}^{M\times D}$ and values $\mathrm{\mathbf{V}}\in\mathbb{R}^{M\times D}$, attention is computed as

\mathrm{\mathbf{A}}=\operatorname{softmax}\left(\frac{\mathrm{\mathbf{Q}}\mathrm{\mathbf{K}}^{T}}{\sqrt{D}}\right)

(6)

and the “reordered” values $\mathrm{\mathbf{Y}}$ are

\mathrm{\mathbf{Y}}=\mathrm{\mathbf{A}}\mathrm{\mathbf{V}}.

(7)

We used the implementation of this scaled dot-product attention¹⁰⁷ in PyTorch’s¹⁰⁸ MultiheadAttention (PyTorch version 1.12.1). The representations are reordered using Eqs. 6 and 7 with $\mathrm{\mathbf{Q}}$ as the vector representation of the reactants and $\mathrm{\mathbf{K}}$ and $\mathrm{\mathbf{V}}$ as the vector representations of the products, and vice versa (thus here $N=M=N_{\mathrm{at}}$). The reordered representations of reactants and products are then combined as for atom-mapped reactions (Figures 3a and 3b). We note that other algorithms could also be used to exchange information between reactants and products, for example message passing or equivariant attention.^{109, 110}
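For illustration, Eqs. 6 and 7 can be written out for a single head in plain Python. This is a sketch, not PyTorch's `MultiheadAttention`, which additionally applies learned input and output projections and splits $D$ across heads. In the toy usage, the product rows are a permutation of the reactant rows with large norms, so the attention matrix is nearly a permutation matrix and $\mathrm{\mathbf{Y}}$ recovers the reactant ordering, i.e. it acts as a soft atom map.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax; the row maximum is subtracted for numerical stability."""
    out = []
    for row in m:
        mx = max(row)
        e = [math.exp(v - mx) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def cross_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: Y = softmax(Q K^T / sqrt(D)) V."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]                          # K^T, shape D x M
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, k_t)]
    a = softmax_rows(scores)                                      # Eq. 6, shape N x M
    return matmul(a, v)                                           # Eq. 7, shape N x D


# Reactant atoms as queries; product atoms (stored in swapped order) as keys and values
q_react = [[10.0, 0.0], [0.0, 10.0]]
kv_prod = [[0.0, 10.0], [10.0, 0.0]]
y = cross_attention(q_react, kv_prod, kv_prod)
print([[round(c, 6) for c in row] for row in y])
```

With less distinct features the attention weights become softer, so each "reordered" row is a weighted mixture of product atoms rather than a single matched atom.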

EquiReact also provides a simple “no mapping” approach, called EquiReact$_{S}$, which relies neither on atom-mapping nor on a surrogate cross-attention module. In the vector mode (Figure 3c), the atomic components of the molecular representations $\mathrm{\mathbf{X}}_{\mathrm{reactant}}$ and $\mathrm{\mathbf{X}}_{\mathrm{product}}$ are summed to obtain global vectors $\mathrm{\mathbf{\bar{X}}}_{\mathrm{reactant}}$ and $\mathrm{\mathbf{\bar{X}}}_{\mathrm{product}}$, respectively. These are then combined, according to the combine_mode parameter, to form a reaction vector $\mathrm{\mathbf{\bar{X}}}_{\mathrm{rxn}}$, which is used to learn the target with an MLP. In the energy mode (Figure 3d), individual atomic representations are used to learn their contributions to the quasi-molecular energies of reactants and products, which are then combined (according to the combine_mode parameter) to predict the target. In most cases, this simpler model outperforms EquiReact$_{X}$ (vide infra).
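Because EquiReact$_{S}$ sums over atoms before combining, it needs no correspondence between reactant and product atoms. A minimal sketch of the vector-mode combination, assuming each side of the reaction is a list of molecules given as lists of atomic feature vectors; the function names and mode labels are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Molecule = Sequence[Sequence[float]]  # one feature vector per atom


def side_vector(molecules: Sequence[Molecule]) -> Vector:
    """Sum atomic vectors over every atom of every molecule on one side of the reaction."""
    atoms = [row for mol in molecules for row in mol]
    return [sum(col) for col in zip(*atoms)]


def no_map_reaction_vector(reactants: Sequence[Molecule],
                           products: Sequence[Molecule],
                           mode: str = "diff") -> Vector:
    """Combine the pooled reactant and product vectors (Fig. 3c), no atom map required."""
    r, p = side_vector(reactants), side_vector(products)
    if mode == "diff":
        return [b - a for a, b in zip(r, p)]   # products minus reactants
    if mode == "sum":
        return [a + b for a, b in zip(r, p)]
    raise ValueError(f"unknown combine mode: {mode}")


# Two reactant molecules and one product molecule (toy features)
reactants = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
products = [[[2.0, 1.0], [1.0, 2.0]]]
print(no_map_reaction_vector(reactants, products))  # [1.0, 1.0]
```

Reordering the atoms within any molecule leaves the result unchanged, which is exactly the property that lets this variant operate in the “None” regime.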

3 Results and Discussion

3.1 Model performance

The performance of EquiReact on three diverse datasets (GDB7-22-TS,¹⁰¹ Cyclo-23-TS¹⁰² and Proparg-21-TS^{91, 63}) is reported in Table 1 and compared to the previously best-performing baseline models:³⁸ ChemProp,⁶⁵ a graph neural network that uses atom maps to construct a condensed graph of reaction (CGR), and the SLATM⁸ fingerprint adapted to reactions by taking the difference between product and reactant fingerprints (SLATM$_{d}$),³⁶ combined with KRR models (SLATM$_{d}$+KRR). The models are compared in three regimes: with high-quality atom-mapping information (“True”) derived from the transition state structure or heuristic rules,^{102, 93, 101, 65, 97} with atom maps obtained using the open-source RXNMapper⁹⁸ (“RXNMapper”), and without any atom-mapping information at all (“None”). As discussed in recent work,^{111, 38} previously developed graph-based models for reaction property prediction,^{65, 66, 64, 99, 100} including ChemProp,⁶⁵ reported prediction errors only in the “True” atom-mapping regime. The “RXNMapper” regime is important for real-world applications in which the reaction mechanism is not known and atom-mapping using heuristic rules is impossible. The “None” regime is critical for all chemistry that falls outside the realm of the organic chemistry captured in the patents¹¹² on which RXNMapper⁹⁸ is pre-trained. SLATM$_{d}$+KRR always operates in the “None” regime.

The atom-mapping-based model EquiReact$_{M}$ is used in the “True” and “RXNMapper” regimes. In the “None” regime, both EquiReact$_{X}$ and EquiReact$_{S}$ were tested. Since EquiReact$_{S}$ consistently outperformed EquiReact$_{X}$, we report only EquiReact$_{S}$ and refer the reader to the Supplementary Information for their comparison. EquiReact is compared to the ChemProp and SLATM$_{d}$+KRR baselines using both random splits, to measure interpolative capabilities, and scaffold splits, to measure extrapolative capabilities. Scaffold splitting^{113, 114} clusters molecules based on their 2D backbones (such as Bemis–Murcko scaffolds¹¹⁵) and ensures that the clusters (scaffolds) in the train, test, and validation sets do not overlap. This is a more challenging regime than random splitting, where very similar molecules can appear in both the training and test sets.
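A scaffold split can be sketched as a group-wise split. The sketch assumes that a scaffold string (for example a Bemis–Murcko scaffold SMILES, which RDKit's `MurckoScaffold` module can compute) has already been obtained for each reaction, shuffles the scaffold groups with a seed, and fills train, validation and test greedily; the function name is hypothetical and the exact procedure used for Table 1 may differ.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def scaffold_split(scaffolds: Sequence[str],
                   frac: Tuple[float, float, float] = (0.9, 0.05, 0.05),
                   seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Assign whole scaffold groups to train/validation/test so that no scaffold is shared."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, s in enumerate(scaffolds):
        groups[s].append(i)
    keys = sorted(groups)                  # deterministic order before shuffling
    random.Random(seed).shuffle(keys)      # a different seed gives a different fold
    n = len(scaffolds)
    train: List[int] = []
    val: List[int] = []
    test: List[int] = []
    for k in keys:
        idx = groups[k]
        if len(train) + len(idx) <= frac[0] * n:
            train.extend(idx)
        elif len(val) + len(idx) <= frac[1] * n:
            val.extend(idx)
        else:
            test.extend(idx)               # whatever does not fit goes to the test set
    return train, val, test


# Toy example: 20 reactions sharing four scaffolds (illustrative labels)
scafs = ["A"] * 10 + ["B"] * 5 + ["C"] * 3 + ["D"] * 2
tr, va, te = scaffold_split(scafs, frac=(0.5, 0.25, 0.25), seed=1)
print(len(tr), len(va), len(te))
```

Because entire groups move together, the split fractions are only approximate; this is the price of guaranteeing that test scaffolds are never seen during training.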

Dataset (property, units)	Atom-mapping regime	ChemProp	SLATM$_{d}$+KRR	EquiReact
Random splits
GDB7-22-TS ($\Delta E^{\ddagger}$, kcal/mol)	True	$\bf 4.60\pm 0.15$	—	$5.00\pm 0.16$
	RXNMapper	$\bf 6.1\pm 0.3$	—	$6.2\pm 0.3$
	None	$8.74\pm 0.24$	$6.9\pm 0.3$	$\bf 6.3\pm 0.3$
Cyclo-23-TS ($\Delta G^{\ddagger}$, kcal/mol)	True	$2.70\pm 0.14$	—	$\bf 2.42\pm 0.11$
	RXNMapper	$2.71\pm 0.16$	—	$\bf 2.44\pm 0.17$
	None	$2.69\pm 0.16$	$2.70\pm 0.14$	$\bf 2.32\pm 0.14$
Proparg-21-TS ($\Delta E^{\ddagger}$, kcal/mol)	True	$1.58\pm 0.13$	—	$\bf 0.29\pm 0.06$
	None	$1.59\pm 0.15$	$0.42\pm 0.11$	$\bf 0.27\pm 0.04$
Scaffold splits
GDB7-22-TS ($\Delta E^{\ddagger}$, kcal/mol)	True	$5.0\pm 0.5$	—	$\bf 4.95\pm 0.19$
	RXNMapper	$6.6\pm 0.3$	—	$\bf 6.0\pm 0.3$
	None	$11.9\pm 1.0$	$6.47\pm 0.22$	$\bf 6.3\pm 0.3$
Cyclo-23-TS ($\Delta G^{\ddagger}$, kcal/mol)	True	$2.9\pm 0.3$	—	$\bf 2.73\pm 0.19$
	RXNMapper	$\bf 2.81\pm 0.20$	—	$\bf 2.82\pm 0.19$
	None	$3.2\pm 0.3$	$2.66\pm 0.15$	$\bf 2.37\pm 0.21$
Proparg-21-TS ($\Delta E^{\ddagger}$, kcal/mol)	True	$1.64\pm 0.25$	—	$\bf 0.31\pm 0.08$
	None	$1.89\pm 0.07$	$0.36\pm 0.07$	$\bf 0.27\pm 0.03$

Table 1: Performance, measured as mean absolute errors (MAEs), of EquiReact vs. the state-of-the-art baselines ChemProp and SLATM$_{d}$+KRR on the GDB7-22-TS,¹⁰¹ Cyclo-23-TS,¹⁰² and Proparg-21-TS^{91, 63} datasets. All datasets are compared in three atom-mapping regimes, “True”, “RXNMapper” and “None”, except for the Proparg-21-TS set, whose reaction SMILES RXNMapper cannot map. EquiReact$_{M}$ is used for the “True” and “RXNMapper” regimes, while in the “None” regime results for the EquiReact$_{S}$ model are reported. MAEs are averaged over 10 folds of random/scaffold 90/5/5 splits (train/test/validation). The best model for each regime, dataset and split type is highlighted in bold.
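Each table entry is a mean and standard deviation of fold-wise MAEs. A minimal sketch of that bookkeeping follows; whether the reported ± is the sample or the population standard deviation is not stated in the text, so the use of `statistics.stdev` (sample) is an assumption, and the numbers are illustrative rather than results from this work.

```python
import statistics
from typing import Sequence, Tuple


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error of one fold's test-set predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize(fold_maes: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the per-fold MAEs."""
    return statistics.mean(fold_maes), statistics.stdev(fold_maes)


# Illustrative per-fold predictions (kcal/mol), not data from Table 1
folds = [
    ([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]),
    ([4.0, 5.0], [4.0, 6.0]),
]
mean, std = summarize([mae(t, p) for t, p in folds])
print(f"{mean:.2f} ± {std:.2f}")
```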

3.1.1 GDB7-22-TS dataset

This dataset differs from the other two in that it includes variations in the reaction class (and mechanism), so that model performance depends more strongly on the existence and quality of atom-mapping information. It has already been observed³⁸ that existing models (including ChemProp and SLATM$_{d}$), evaluated on random splits, show a stark hierarchy in prediction errors from the “True” to the “RXNMapper” to the “None” regime.

In the “True” regime with random splits, EquiReact does not improve on the predictive performance of the ChemProp model. This points to the importance of chemical diversity in this set, where knowledge of the reaction mechanism (in the form of atom maps) is sufficient to predict the reaction barriers without information on the three-dimensional geometries of reactants and products. With scaffold splits, however, EquiReact and ChemProp yield the same MAEs within the standard deviations. This suggests that a model based on geometric information extrapolates better than one trained on atom maps: Bemis–Murcko scaffold¹¹⁵ splitting clusters molecules (here, reactants) based on their ring systems, so out-of-sample test molecules may appear “novel” from the point of view of the reaction graph while still featuring distances and angles close to those the model has seen during training.

Moving to the “RXNMapper” regime, the trend observed in the relative model performance in the “True” regime is amplified. EquiReact and ChemProp agree within the standard deviation using random splits, while EquiReact outperforms ChemProp using scaffold splits.

In the “None” regime, the difference in model performance is even greater. In terms of MAE, EquiReact outperforms ChemProp by more than 2 kcal/mol using random splits and by more than 5 kcal/mol using scaffold splits. EquiReact’s improvement over SLATM$_{d}$+KRR is smaller. The SLATM$_{d}$ representation also makes use of the 3D coordinates of the reactants and products and is therefore more fundamentally similar to EquiReact than ChemProp is. Nevertheless, EquiReact uses equivariant components for molecular features (vs. the invariant features of SLATM$_{d}$) and learns a representation end-to-end, which allows for a more performant model.

Because EquiReact exploits the chemical information present in atom maps (when available) and encodes the natural symmetries of the molecular reaction components, the stark gap previously observed³⁸ between the “True” and “None” regimes is diminished: for the GDB7-22-TS set, the prediction MAEs range from 5.0 to 6.3 kcal/mol using random splits and from 4.95 to 6.3 kcal/mol using scaffold splits. This is illustrated in Figure 4, where, for the same scaffold split, outliers in the “True” plot successively move closer to the $y=x$ line.

3.1.2 Cyclo-23-TS dataset

The Cyclo-23-TS¹⁰² dataset contains a single fixed reaction class and has previously been shown to depend less on the quality of atom-mapping than GDB7-22-TS.³⁸

For this set, EquiReact outperforms or matches the other models in all three regimes for both random and scaffold splits. This illustrates that a model based purely on the geometries of reactants and products, without any chemical information in the form of atom-mapping or surrogates thereof, can be highly performant for reaction property prediction.

The best model is obtained in the “None” regime, with EquiReact$_{S}$ in the energy mode (Figure 3d). As outlined in Section 2.2, in the energy mode an energy contribution is learned separately for the reactants’ and products’ atoms. In the original publication,¹⁰² Stuyver et al. showed that the activation barriers correlate linearly with the reaction energy. Since the reaction energy is the difference between the products’ and reactants’ energies, the energy mode is the natural choice for a model learning the reaction energy and, for this dataset, also for the barrier, owing to its linear correlation with the reaction energy.

3.1.3 Proparg-21-TS dataset

Proparg-21-TS^{91, 63} is a small dataset by neural network standards (753 points) and therefore challenges the data efficiency of our model.

EquiReact achieves the best predictive performance in both the “None” and “True” regimes; the “RXNMapper” regime is not available because RXNMapper cannot atom-map the reaction SMILES of this set. These results illustrate that, in line with observations for ENNs for molecules,^{43, 44, 45, 46, 47, 48, 49, 50} ENNs for chemical reactions can be highly data-efficient and operate in the “low-data” regime.

Geometry-only EquiReact$_{S}$ yields MAEs 0.15 kcal/mol / 0.09 kcal/mol lower than those of the previous state-of-the-art SLATM$_{d}$+KRR using random / scaffold splits. Since the enantioselectivity depends exponentially on the barrier difference, this improvement is significant for the downstream enantioselectivity prediction.⁶³ Like the Cyclo-23-TS set, this dataset consists of a single fixed reaction class, and the model does not benefit from being provided the “obvious” chemical information: including true atom maps does not decrease the error.
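To see why a difference of 0.15 kcal/mol matters, assume transition-state theory with equal pre-exponential factors for two competing pathways, so that the ratio of rates is $k_1/k_2=\exp(\Delta\Delta G^{\ddagger}/RT)$ and the enantiomeric excess is $(k_1-k_2)/(k_1+k_2)=\tanh(\Delta\Delta G^{\ddagger}/2RT)$. The sketch below evaluates these relations at an assumed temperature of 298.15 K; it is a generic illustration, not the selectivity model of the cited work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def selectivity_ratio(ddg: float, temperature: float = 298.15) -> float:
    """Rate ratio k1/k2 = exp(ddG / RT), with ddG = barrier 2 minus barrier 1 in kcal/mol."""
    return math.exp(ddg / (R_KCAL * temperature))


def ee_from_ddg(ddg: float, temperature: float = 298.15) -> float:
    """Enantiomeric excess (k1 - k2) / (k1 + k2), which equals tanh(ddG / 2RT)."""
    r = selectivity_ratio(ddg, temperature)
    return (r - 1.0) / (r + 1.0)


# A 0.15 kcal/mol shift in the barrier difference rescales the rate ratio at 298.15 K by:
print(round(selectivity_ratio(0.15), 2))  # 1.29
```

An error of this size therefore changes a predicted product ratio by roughly 30%, which is why sub-0.3 kcal/mol accuracy is relevant for this dataset.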

These three datasets illustrate the benefits of EquiReact’s flexibility: depending on the particular challenges of each dataset, the model exploits the available information to yield the best-performing model in almost all cases. Since the modes of the model are specified as hyperparameters, the optimal version of EquiReact can emerge from hyperparameter optimization with minimal user intervention.

3.2 Model behaviour

Since the GDB7-22-TS set shows the strongest dependence on the chemical diversity captured by the models, comparing EquiReact with the baseline models SLATM$_{d}$ and ChemProp in the “True” regime best captures the differences in their chemical behaviour.

Figure 5 compares the (latent) representations of EquiReact, ChemProp and SLATM ${}_{d}$ using t-SNE¹¹⁶ maps. In the upper panel, we find that the quality of the correlation between the representations and the target property mirrors the relative performance of the models in this regime (Table 1). The best-performing ChemProp shows a smooth transition of the target property across the map, EquiReact’s gradient is somewhat distorted, and the map of the worst-performing SLATM ${}_{d}$ has no clear structure. The lower panel shows the correlation of the representations with the reaction types. ChemProp, as a chemically inspired model, shows clear clusters by reaction type, whereas EquiReact does not, as it relies on different information (distances and angles) to correlate its latent representation with the target property. SLATM ${}_{d}$ lies in between, showing some clustering by reaction type. Although SLATM ${}_{d}$ is also a distance-based model, the binning structure used to create the representation^{8, 36} may result in a better correlation with the reaction types since, for example, the pairwise bins naturally group features such as C–H bond formation or breaking. Nevertheless, as discussed in Section 3.1, EquiReact makes better use of distance information alone and thereby outperforms SLATM ${}_{d}$ .
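Maps like those in Figure 5 can be produced, under assumptions, by projecting latent vectors with scikit-learn's `TSNE`. This sketch assumes the latent reaction vectors are already collected in an `(n_reactions, n_features)` array; random data stands in for them here, and the perplexity and seed are illustrative choices, not the settings used for Figure 5.

```python
import numpy as np
from sklearn.manifold import TSNE

def tsne_map(latent, perplexity=30.0, seed=0):
    """Project latent reaction vectors (n_reactions x n_features) onto 2D."""
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(np.asarray(latent, dtype=float))

# Random vectors stand in for real latent representations.
latent = np.random.default_rng(0).normal(size=(200, 64))
coords = tsne_map(latent)  # one 2D point per reaction
```

The 2D points can then be coloured by barrier (upper panel) or by reaction type (lower panel) with any plotting library.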

Figure 6 illustrates how EquiReact “True” performs for the most common reaction types in the GDB7-22-TS set, defined by the bonds broken and formed (see Section 5.1). EquiReact performs well across all of these reaction types, with consistently low errors and relatively small standard deviations. The reaction types with higher mean errors and standard deviations ( $+$ C–H, $-$ C–H (blue) and $+$ H–H, $-$ C–H, $-$ C–H (green)) involve C–H features. Since the model is trained without explicit H nodes in the graph, features associated with X–H bonds are included only implicitly. Because C is the most frequently occurring element and appears in many different bonding environments, capturing all C–H features is more challenging than capturing, for example, O–H features, which are more similar to one another. The equivalent plot for the model trained with explicit H nodes, shown in the Supplementary Information, illustrates that the standard deviations decrease for the reaction types involving C–H features.
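The per-type statistics behind a plot such as Figure 6 can be computed with a few lines of standard-library Python. The function name and the use of the population standard deviation are assumptions; `types` would hold reaction-type signatures such as those defined in Section 5.1.

```python
from collections import defaultdict
from statistics import mean, pstdev

def errors_by_type(types, y_true, y_pred, top_k=5):
    """Mean and standard deviation of absolute errors for the top_k most common types.

    Returns {type: (mean_abs_error, std_abs_error, count)}.
    """
    groups = defaultdict(list)
    for rxn_type, yt, yp in zip(types, y_true, y_pred):
        groups[rxn_type].append(abs(yp - yt))
    most_common = sorted(groups, key=lambda t: len(groups[t]), reverse=True)[:top_k]
    return {t: (mean(groups[t]), pstdev(groups[t]), len(groups[t])) for t in most_common}
```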

3.3 Geometry quality

To illustrate that EquiReact does not require high-quality molecular structures to be used in an out-of-sample scenario, we train and test a model using lower-quality GFN2-xTB¹¹⁷ (xTB) geometries to predict higher-level barriers (CCSD(T)-F12a/cc-pVDZ-F12// $\omega$ B97X-D3/def2-TZVP for GDB7-22-TS, B3LYP-D3(BJ)/def2-TZVP//B3LYP-D3(BJ)/def2-SVP for Cyclo-23-TS and B97D/TZV(2p,2d) for Proparg-21-TS). The results are illustrated in Figure 7 for the three datasets with DFT and xTB geometries, and compared to the SLATM ${}_{d}$ +KRR model in the same settings. EquiReact benefits from a lower sensitivity to the geometry quality compared to the pre-designed representation SLATM ${}_{d}$ +KRR across the three datasets.

For the GDB7-22-TS set, there is a negligible difference in model performance when moving from DFT to xTB geometries. The xTB geometries are a good proxy for the DFT geometries here, since this set consists of small, charge-neutral organic molecules, which are largely well described by semiempirical methods. For the Cyclo-23-TS set, the molecules are still organic but larger than those in the GDB7-22-TS set, and there is a greater divergence between the GFN2-xTB and DFT geometries, resulting in a larger deterioration with these geometries. The Supplementary Information shows that models trained on lower-quality (i.e., xTB) geometries do not produce higher errors for molecules whose geometries deviate particularly strongly from the DFT ones. Rather, the model performance deteriorates consistently when training on xTB geometries to predict higher-level barriers, if the xTB geometries are a poor proxy for the DFT ones.
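How the deviation between xTB and DFT geometries is quantified is not specified here; one common choice, assumed in this NumPy sketch, is the root-mean-square deviation after optimal superposition (Kabsch algorithm) of two geometries with identical atom ordering. The function name is hypothetical, and symmetry-equivalent atom permutations are not considered.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD (input units) between two (n_atoms, 3) geometries with identical atom
    ordering, after removing translation and applying the optimal rotation."""
    P = np.asarray(P, dtype=float) - np.mean(P, axis=0)
    Q = np.asarray(Q, dtype=float) - np.mean(Q, axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)          # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # avoid improper rotations (reflections)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # rotation mapping centred P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))
```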

The Proparg-21-TS set is the most challenging of the three for GFN2-xTB, since its charged organosilicon compounds differ considerably from the systems used to parameterize semiempirical or force-field methods. As described in Section 5.3, for the other datasets we generate initial structures from SMILES using force-field methods; this is not possible for this set, so we instead generate the xTB geometries starting from the DFT ones. While this is not a feasible geometry-generation pipeline out-of-sample, it still demonstrates how the different methods perform with high- and low-quality geometries. Here, EquiReact is significantly less sensitive than SLATM ${}_{d}$ +KRR, and the variant trained on lower-quality geometries still offers competitive errors ( $0.48\pm 0.21$ kcal/mol for the “None” model).

4 Conclusions

Despite the crowning of equivariant neural networks as best-in-class for the prediction of computed molecular properties, their counterpart has not been well established for the prediction of reaction properties. We contribute to this domain by introducing EquiReact, an equivariant neural network constructed from the three-dimensional coordinates of reactants and products. While other graph-based models for reactions rely on atom-mapped reaction SMILES, EquiReact is flexible and can include atom-mapping if it is available. Particularly in the regime without atom-mapping information, EquiReact outperforms existing baselines^{99, 66, 38} for the prediction of reaction barriers on the GDB7-22-TS,¹⁰¹ Cyclo-23-TS ¹⁰² and Proparg-21-TS ^{63, 91} datasets. The latter dataset in particular illustrates both the data efficiency of our model (with a total dataset size of only 753 datapoints) and EquiReact’s strength in describing subtle changes in geometry. EquiReact demonstrates superior extrapolation capabilities compared to the 2D-graph-based models for all datasets and in all regimes tested. It also suffers less than existing methods when moving from DFT-level to GFN2-xTB-level¹¹⁷ geometries. These points illustrate its utility in out-of-sample scenarios.

5 Methods

5.1 Datasets

We test EquiReact on three datasets of reaction barriers previously used to benchmark reaction representations.³⁸ In all cases, optimized three-dimensional structures of reactants and products are provided and are used to train the models and make predictions. The activation barrier is not a direct function of these structures; however, requiring the TS structure as input would remove the advantage of the ML model over computing the TS directly. We therefore use an implicit interpolation of the reactants’ and products’ structures as a proxy for the TS, as in previous works.^{36, 63, 38}

The GDB7-22-TS ¹⁰¹ dataset consists of close to $12,000$ diverse organic reactions automatically constructed from the GDB7 dataset^{118, 119, 55} using the growing string method,¹²⁰ along with the corresponding energy barriers ( $\Delta E^{\ddagger}$ ) computed at the CCSD(T)-F12a/cc-pVDZ-F12// $\omega$ B97X-D3/def2-TZVP level. The dataset provides atom-mapped SMILES, with “True” maps derived from the transition state. For $43$ of the $11,926$ reactions, the SMILES of one of the products represents a molecule different from the corresponding xyz structure. These reactions were therefore excluded, leading to the modified GDB7-22-TS set used here.

While there are no pre-defined classes for all the reactions in the GDB7-20-TS⁹³ or GDB7-22-TS ¹⁰¹ sets, Grambow et al.⁶⁴ split the dataset into reactions undergoing certain bond changes: for example, the most common type was breaking of a C–H bond ( $-$ C–H) and a C–C bond ( $-$ C–C) in the reactants and formation of a C–H bond ( $+$ C–H) in the products, giving the reaction type signature $+$ C–H, $-$ C–C, $-$ C–H. Here, we extract similar reaction types by comparing the connectivity matrices from atom-mapped reaction SMILES of reactants and products (ignoring bond orders). The most abundant reaction types in the dataset are $+$ C–H, $-$ C–C, $-$ C–H (1667 reactions), $+$ H–N, $-$ C–H (633), $+$ C–H, $-$ C–H (619), $+$ H–O, $-$ C–H, $-$ C–O (599) and $+$ H–H, $-$ C–H, $-$ C–H (517).
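The reaction-type extraction can be sketched in plain Python, assuming the bonds of reactants and products have already been read from the atom-mapped SMILES (for example with RDKit) as pairs of atom-map numbers, with bond orders ignored. The function name and the ASCII signature format are illustrative assumptions; formed bonds are listed first and broken bonds second, each sorted alphabetically, which reproduces the ordering used in the text.

```python
def reaction_signature(elements, reactant_bonds, product_bonds):
    """Reaction type from changes in connectivity (bond orders ignored).

    elements: {atom-map number: element symbol}
    reactant_bonds, product_bonds: iterables of (map_i, map_j) pairs
    """
    r = {frozenset(b) for b in reactant_bonds}
    p = {frozenset(b) for b in product_bonds}

    def label(bond):
        a, b = sorted(elements[i] for i in bond)
        return f"{a}-{b}"

    formed = sorted(label(b) for b in p - r)
    broken = sorted(label(b) for b in r - p)
    return ", ".join([f"+{x}" for x in formed] + [f"-{x}" for x in broken])

# Toy example: atom 1 (C) loses its bonds to O (2) and H (3); O-H forms.
sig = reaction_signature({1: "C", 2: "O", 3: "H"}, [(1, 2), (1, 3)], [(2, 3)])
```

Counting these signatures over the dataset (e.g. with `collections.Counter`) yields the most abundant types listed above.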

The original Cyclo-23-TS ¹⁰² dataset encompasses $5,269$ profiles for $[3+2]$ cycloaddition reactions with activation free energies ( $\Delta G^{\ddagger}$ ) computed at the B3LYP-D3(BJ)/def2-TZVP//B3LYP-D3(BJ)/def2-SVP level. The dataset provides atom-mapped SMILES with “True” maps for heavy atoms derived from either the transition state structure or heuristic rules. For the regime with explicit hydrogen atoms, we atom-mapped the xyz files by matching the reactants, given in two separate files, to the provided transition state structure, which closely resembles the two reactants and has the same atom order as the products. This was done with a labeled graph matching algorithm as implemented in NetworkX.^{121, 122} The algorithm is unaware of chirality, double-bond stereochemistry and conformations, and may therefore yield atom-mappings that are not exactly correct. We also found that for four reactions the product SMILES and xyz files depict different species; the set was therefore reduced to $5,265$ reactions.

The Proparg-21-TS dataset^{91, 63} contains 753 structures of intermediates before and after the enantioselective transition state of benzaldehyde propargylation, with activation energies ( $\Delta E^{\ddagger}$ ) computed at the B97D/TZV(2p,2d) level. SMILES strings (“Fragment-Based” SMILES) and “True” atom-maps are not provided in the original dataset; they are taken from Ref. 38.

RXNMapper⁹⁸-mapped versions of GDB7-22-TS and Cyclo-23-TS were obtained with RXNMapper 0.3.0 with the default settings. The Proparg-21-TS set cannot be mapped, because the underlying libraries cannot process its SMILES. Since RXNMapper sorts molecules in case of multiple reactants and/or products, which would complicate SMILES–xyz matching (see Section 5.2 below), we used a locally modified version that does not change the molecule order.

5.2 Matching SMILES strings to xyz geometries

EquiReact makes use of both the graph structure of a molecule (as provided in the SMILES string) and its three-dimensional structure (in the xyz file). The atoms in the graph are associated with the atomic coordinates provided in the xyz file. Owing to the way the GDB7-22-TS dataset¹⁰¹ was generated, its atomic coordinates can easily be matched to the SMILES, which in turn allows reactants to be atom-mapped to products. However, we also tested RXNMapper-mapped SMILES, which do not satisfy the same constraints. Therefore, for consistency, we use the SMILES–xyz matching procedure detailed below.

We construct molecular graphs from the xyz files using covalent radii and match them to RDKit¹⁰⁵ molecular graphs obtained from SMILES with a labeled graph matching algorithm as implemented in NetworkX.^{121, 122} This procedure is, however, unaware of chirality and double-bond stereochemistry, so some of the matches might be incorrect. Still, it provides a flexible method that can be applied to any dataset consisting of SMILES strings and xyz files.
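A minimal sketch of this matching step with NumPy and NetworkX follows. The bond criterion (distance below 1.2 times the sum of covalent radii) and the small radii table (values from Cordero et al., 2008, in Å) are assumptions, not the settings of the EquiReact code. In practice the reference graph would be built from an RDKit molecule, with `atom.GetIdx()` as node and `atom.GetSymbol()` as the `element` attribute; here it is built by hand.

```python
import numpy as np
import networkx as nx
from networkx.algorithms import isomorphism

# Covalent radii in Å (Cordero et al., 2008) for a few elements; extend as needed.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "F": 0.57}

def graph_from_xyz(elements, coords, scale=1.2):
    """Connect atoms closer than scale * (sum of covalent radii); scale is assumed."""
    coords = np.asarray(coords, dtype=float)
    g = nx.Graph()
    for i, el in enumerate(elements):
        g.add_node(i, element=el)
    for i in range(len(elements)):
        for j in range(i + 1, len(elements)):
            cutoff = scale * (COVALENT_RADII[elements[i]] + COVALENT_RADII[elements[j]])
            if np.linalg.norm(coords[i] - coords[j]) < cutoff:
                g.add_edge(i, j)
    return g

def match_xyz_to_graph(xyz_graph, smiles_graph):
    """Return {xyz atom index: SMILES-graph atom index}, or None if no match exists."""
    gm = isomorphism.GraphMatcher(
        xyz_graph, smiles_graph,
        node_match=isomorphism.categorical_node_match("element", None))
    return dict(gm.mapping) if gm.is_isomorphic() else None

# Water with xyz order H, O, H matched to a reference graph with order O, H, H.
xyz_g = graph_from_xyz(["H", "O", "H"], [[0.96, 0, 0], [0, 0, 0], [-0.24, 0.93, 0]])
ref = nx.Graph()
ref.add_nodes_from([(0, {"element": "O"}), (1, {"element": "H"}), (2, {"element": "H"})])
ref.add_edges_from([(0, 1), (0, 2)])
mapping = match_xyz_to_graph(xyz_g, ref)
```

The assignment of the two equivalent H atoms is arbitrary, which mirrors the limitation noted above: purely topological matching cannot distinguish atoms that differ only in stereochemistry or conformation.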

The same procedure was applied to the Cyclo-23-TS dataset in the few cases where the canonical SMILES have a different atom ordering than the xyz files.

5.3 xTB geometry generation

For the GDB7-22-TS and Cyclo-23-TS datasets, the starting structures were generated from SMILES using the distance-geometry embedding implemented in RDKit¹⁰⁵ with the srETKDGv3 settings.¹²³ Ten conformations were produced per molecule and energy-ranked with the MMFF94 implementation¹²⁴ in RDKit, defaulting to UFF in case of missing parameters; the lowest-energy conformer was retained. For the Proparg-21-TS set, the original B97D/TZV(2p,2d) geometries were used as a starting point, because the stereochemical and conformational diversity of this set cannot be completely encoded with SMILES. Consequently, the SMILES-based pipeline (embedding followed by MMFF94 ranking) cannot generate suitable initial geometries for this set.
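A minimal RDKit sketch of the starting-structure step for a molecule given as SMILES: it embeds ten conformers with srETKDGv3, scores each with MMFF94 (falling back to UFF when MMFF94 parameters are missing) and keeps the lowest-energy one. The random seed, the helper names and the use of single-point force-field energies without relaxation are assumptions; the text does not state whether the conformers were relaxed before ranking.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def _force_field(mol, conf_id, mmff_props):
    """MMFF94 force field if parameters are available, otherwise UFF."""
    if mmff_props is not None:
        return AllChem.MMFFGetMoleculeForceField(mol, mmff_props, confId=conf_id)
    return AllChem.UFFGetMoleculeForceField(mol, confId=conf_id)

def lowest_energy_conformer(smiles, n_confs=10, seed=42):
    """Embed n_confs conformers and return (mol, best conformer id, energy)."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    params = AllChem.srETKDGv3()
    params.randomSeed = seed  # assumed seed, for reproducibility
    conf_ids = list(AllChem.EmbedMultipleConfs(mol, n_confs, params))
    if not conf_ids:
        raise ValueError(f"embedding failed for {smiles}")
    mmff_props = (AllChem.MMFFGetMoleculeProperties(mol)
                  if AllChem.MMFFHasAllMoleculeParams(mol) else None)
    energies = {cid: _force_field(mol, cid, mmff_props).CalcEnergy() for cid in conf_ids}
    best = min(energies, key=energies.get)
    return mol, best, energies[best]

mol, best_id, energy = lowest_energy_conformer("CCO")
```

The retained conformer (`mol.GetConformer(best_id)`) would then be written to xyz as the input of the xTB optimization.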

For all the sets, the starting structures were optimized at the GFN2-xTB semiempirical level of theory¹¹⁷ at the “loose” convergence level for a maximum of 1000 iterations using xTB v6.2 RC2. For $969$ reactions of the GDB7-22-TS set and $491$ reactions of the Cyclo-23-TS set, at least one of the participating molecules either could not converge to any reasonable configuration or converged to a structure not matching the SMILES. These reactions were excluded from the geometry quality tests (Sec. 3.3).
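The optimization step can be scripted, for instance, by calling the `xtb` executable from Python. This sketch assumes `xtb` is on the `PATH` and uses its `--gfn`, `--opt`, `--cycles` and `--chrg` command-line options; the helper names and the explicit charge argument (relevant for the charged Proparg-21-TS species) are illustrative assumptions. For an xyz input, xtb writes the optimized structure to `xtbopt.xyz` in the working directory.

```python
import shutil
import subprocess
from pathlib import Path

def xtb_opt_command(xyz_file, charge=0, level="loose", max_cycles=1000):
    """Command line for a GFN2-xTB geometry optimization of one xyz file."""
    return ["xtb", str(xyz_file), "--gfn", "2", "--opt", level,
            "--cycles", str(max_cycles), "--chrg", str(charge)]

def run_xtb_opt(xyz_file, workdir, charge=0):
    """Run xtb in workdir and return the path of the optimized geometry."""
    if shutil.which("xtb") is None:
        raise FileNotFoundError("xtb executable not found on PATH")
    cmd = xtb_opt_command(Path(xyz_file).resolve(), charge=charge)
    subprocess.run(cmd, cwd=workdir, check=True, capture_output=True, text=True)
    return Path(workdir) / "xtbopt.xyz"
```

The optimized geometry would then still need to be checked against the SMILES, for example with the graph matching of Section 5.2, to catch structures that converged to a different species.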

5.4 Model training

EquiReact was trained using the Adam optimizer,¹²⁵ with the learning rate and weight decay treated as hyperparameters to be optimized. The learning rate was reduced by 40% after $60$ epochs without improvement in the validation MAE, as in Ref. 106. Models were trained for a maximum of $512$ epochs, with early stopping after $150$ epochs without improvement. The model with the best validation score was then used to make predictions on the test set.
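The schedule described above can be summarized in a small framework-agnostic controller. This is a plain-Python sketch, not the EquiReact training code: the class name is hypothetical, an improvement is taken to be a strict decrease of the validation MAE, and the learning rate is multiplied by 0.6 (a 40% reduction) after 60 consecutive epochs without improvement. In PyTorch, a comparable schedule is available as `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.6, patience=60)`, although its patience and threshold semantics differ slightly.

```python
class PlateauController:
    """Hypothetical helper: LR decay on plateau plus early stopping."""

    def __init__(self, lr, factor=0.6, lr_patience=60, stop_patience=150):
        self.lr = lr
        self.factor = factor
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.best = float("inf")
        self.best_epoch = -1
        self.since_best = 0   # epochs since the last improvement (early stopping)
        self.since_decay = 0  # non-improving epochs since the last LR change

    def step(self, epoch, val_mae):
        """Record one epoch's validation MAE; return True if training should stop."""
        if val_mae < self.best:
            self.best, self.best_epoch = val_mae, epoch
            self.since_best = 0
            self.since_decay = 0
        else:
            self.since_best += 1
            self.since_decay += 1
            if self.since_decay >= self.lr_patience:
                self.lr *= self.factor
                self.since_decay = 0
        return self.since_best >= self.stop_patience
```

In a training loop, `ctrl.step(epoch, val_mae)` is called once per epoch (for at most 512 epochs), the optimizer's learning rate is set to `ctrl.lr`, and the weights saved at `ctrl.best_epoch` are used for testing.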

The optimal model hyperparameters were searched within the following values: learning rate $\in[5\cdot 10^{-5}, 10^{-4}, 5\cdot 10^{-4}, 10^{-3}]$ ; weight decay parameter $\in[10^{-5}, 10^{-4}, 10^{-3}, 0]$ ; node and edge features embedding size $n_{s}\in[16,32,48,64]$ ; $\ell{=}1$ hidden space size $n_{v}\in[16,32,48,64]$ ; number of edge features $n_{g}\in[16,32,48,64]$ ; number of convolutional layers $n_{\mathrm{conv}}\in[2,3]$ ; radial cutoff $r_{\max}\in[2.5,5.0,10.0]$ ; maximum number of atom neighbors $n_{\mathrm{neigh}}\in[10,25,50]$ ; dropout probability $p_{d}\in[0.0,0.05,0.1]$ ; sum_mode $\in$ [node, both]; combine_mode $\in$ [mlp, diff, mean, sum]; graph_mode $\in$ [energy, vector].

The hyperparameter search was performed with EquiReact ${}_{S}$ (without attention or mapping) using the Bayesian search implemented in Weights & Biases.¹²⁶ For the EquiReact ${}_{X}$ regime, the learning rate and weight decay parameter were subsequently optimized in a grid search, with the other parameters set to those found for EquiReact ${}_{S}$ . Hydrogen atoms were included as nodes in the graphs. Sweeps were run on a single random split, for 128 epochs for the GDB7-22-TS and Proparg-21-TS sets and for 256 epochs for the Cyclo-23-TS set. The parameters resulting in the best validation error, summarized in the Supplementary Information, were used for all the other model settings.
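The search space above can be written as a Weights & Biases sweep configuration. In this sketch the parameter keys and the metric name `val_mae` are hypothetical (they must match the arguments and logged metric of the actual training script); the `method`, `metric` and `parameters` fields follow the W&B sweep configuration format, and such a dictionary can be passed to `wandb.sweep` to create the sweep. The size of the full grid, computed at the end, shows that exhaustive enumeration would be impractical.

```python
import math

# Hypothetical parameter keys; they must match the training script's arguments.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "val_mae", "goal": "minimize"},  # hypothetical metric name
    "parameters": {
        "lr": {"values": [5e-5, 1e-4, 5e-4, 1e-3]},
        "weight_decay": {"values": [1e-5, 1e-4, 1e-3, 0.0]},
        "n_s": {"values": [16, 32, 48, 64]},
        "n_v": {"values": [16, 32, 48, 64]},
        "n_g": {"values": [16, 32, 48, 64]},
        "n_conv": {"values": [2, 3]},
        "r_max": {"values": [2.5, 5.0, 10.0]},
        "n_neigh": {"values": [10, 25, 50]},
        "dropout": {"values": [0.0, 0.05, 0.1]},
        "sum_mode": {"values": ["node", "both"]},
        "combine_mode": {"values": ["mlp", "diff", "mean", "sum"]},
        "graph_mode": {"values": ["energy", "vector"]},
    },
}

def grid_size(config):
    """Number of combinations an exhaustive grid search would have to train."""
    return math.prod(len(p["values"]) for p in config["parameters"].values())

print(grid_size(sweep_config))  # → 884736
```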

5.5 Baseline models

The ChemProp model is based on a Condensed Graph of Reaction (CGR)⁶⁵ built from atom-mapped SMILES strings of reactants and products, which is then passed through the directed message-passing neural network chemprop¹¹⁴ (version 1.5.0). Models are trained using the default parameters of chemprop.

Molecular SLATM vectors were generated using the qml python package¹²⁷ before being combined to form the reaction version SLATM ${}_{d}$ . SLATM ${}_{d}$ is combined with Kernel Ridge Regression (KRR) models using the best kernel functions as in van Gerwen et al.³⁸ The kernel width and regularization parameters were optimized on the first fold of the ten using the random splits, in line with how the hyperparameters were optimized for EquiReact.
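The baseline can be sketched in NumPy under stated assumptions: the reaction vector is taken as the sum of the products' SLATM vectors minus the sum of the reactants' (our reading of the difference-based SLATM ${}_{d}$ ), and a Laplacian kernel is used for illustration only; the kernel actually used follows van Gerwen et al.³⁸ and is not restated here. The molecular vectors would come from the qml package; random arrays stand in for them below, and `gamma`/`lam` are placeholders for the optimized kernel width and regularization.

```python
import numpy as np

def reaction_difference(reactant_vecs, product_vecs):
    """Difference-type reaction vector: sum(products) - sum(reactants)."""
    return np.sum(product_vecs, axis=0) - np.sum(reactant_vecs, axis=0)

def laplacian_kernel(A, B, gamma):
    """K_ij = exp(-gamma * ||a_i - b_j||_1)."""
    dist = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=-1)
    return np.exp(-gamma * dist)

def krr_fit(X, y, gamma, lam):
    """Solve (K + lam * I) alpha = y for the regression coefficients alpha."""
    K = laplacian_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma):
    return laplacian_kernel(X_new, X_train, gamma) @ alpha

rng = np.random.default_rng(0)
# Hypothetical molecular vectors: two reactants and one product per reaction.
X = np.array([reaction_difference(rng.normal(size=(2, 5)), rng.normal(size=(1, 5)))
              for _ in range(20)])
y = X.sum(axis=1)  # toy target standing in for barriers
alpha = krr_fit(X, y, gamma=0.5, lam=1e-8)
pred = krr_predict(X, alpha, X, gamma=0.5)
```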

Code and Data Availability

The code is available as a GitHub repository at https://github.com/lcmd-epfl/EquiReact. The versions of the datasets used, as well as any processing applied to them, can be found in the same repository.

Supporting Information

Supplementary Information is provided in the freely available file SI.pdf, detailing the hyperparameters of the different models tested, the model performance with and without explicit hydrogen atoms, and a discussion of the model with a cross-attention surrogate for atom-mapping.

Author Information

Author contributions

P.v.G. and C.B. conceptualized the project. EquiReact and support codes were written and run by P.v.G. and K.R.B., with design suggestions from C.B. and V.R.S. Results were analyzed by P.v.G., K.R.B., C.B. and V.R.S. xTB computations were run by R.L. The original draft was written by P.v.G. and K.R.B. with reviews and edits from all authors. C.C. and A.K. provided supervision and are acknowledged for acquiring funding.

Conflict of interest

The authors have no conflicts to disclose.

Acknowledgement

P.v.G., C.B., V.R.S., R.L., A.K. and C.C. acknowledge the National Centre of Competence in Research (NCCR) “Sustainable chemical process through catalysis (Catalysis)”, grant number 180544, of the Swiss National Science Foundation (SNSF) for financial support. K.R.B. and C.C. were supported by the European Research Council (grant number 817977) and by the National Centre of Competence in Research (NCCR) “Materials’ Revolution: Computational Design and Discovery of Novel Materials (MARVEL)”, grant number 205602, of the Swiss National Science Foundation.

References

Behler and Parrinello 2007 Behler, J.; Parrinello, M. Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces. Phys. Rev. Lett. 2007, 98, 146401
Rupp et al. 2012 Rupp, M.; Tkatchenko, A.; Müller, K.-R.; von Lilienfeld, O. A. Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning. Phys. Rev. Lett. 2012, 108, 058301
Bartók et al. 2013 Bartók, A. P.; Kondor, R.; Csányi, G. On representing chemical environments. Phys. Rev. B 2013, 87, 184115
Hansen et al. 2015 Hansen, K.; Biegler, F.; Ramakrishnan, R.; Pronobis, W.; von Lilienfeld, O. A.; Müller, K.-R.; Tkatchenko, A. Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space. J. Phys. Chem. Lett. 2015, 6, 2326–2331
Huo and Rupp 2017 Huo, H.; Rupp, M. Unified representation for machine learning of molecules and crystals. arXiv preprint 2017, arXiv:1704.06439
Faber et al. 2018 Faber, F. A.; Christensen, A. S.; Huang, B.; von Lilienfeld, O. A. Alchemical and structural distribution based representation for universal quantum machine learning. J. Chem. Phys. 2018, 148, 241717
Christensen et al. 2020 Christensen, A. S.; Bratholm, L. A.; Faber, F. A.; Anatole von Lilienfeld, O. FCHL revisited: Faster and more accurate quantum machine learning. J. Chem. Phys. 2020, 152, 044107
Huang and von Lilienfeld 2020 Huang, B.; von Lilienfeld, O. A. Quantum machine learning using atom-in-molecule-based fragments selected on the fly. Nat. Chem. 2020, 12, 945–951
Drautz 2019 Drautz, R. Atomic cluster expansion for accurate and transferable interatomic potentials. Phys. Rev. B 2019, 99, 014104
Dusson et al. 2022 Dusson, G.; Bachmayr, M.; Csányi, G.; Drautz, R.; Etter, S.; van der Oord, C.; Ortner, C. Atomic cluster expansion: Completeness, efficiency and stability. J. Comput. Phys. 2022, 454, 110946
Grisafi and Ceriotti 2019 Grisafi, A.; Ceriotti, M. Incorporating long-range physics in atomic-scale machine learning. J. Chem. Phys. 2019, 151, 204105
Grisafi et al. 2021 Grisafi, A.; Nigam, J.; Ceriotti, M. Multi-scale approach for the prediction of atomic scale properties. Chem. Sci. 2021, 12, 2078–2090
Nigam et al. 2020 Nigam, J.; Pozdnyakov, S.; Ceriotti, M. Recursive evaluation and iterative contraction of $N$ -body equivariant features. J. Chem. Phys. 2020, 153, 121101
Fabrizio et al. 2022 Fabrizio, A.; Briling, K. R.; Corminboeuf, C. SPA ${}^{\mathrm{H}}$ M: the Spectrum of Approximated Hamiltonian Matrices representations. Digital Discovery 2022, 1, 286–294
Briling et al. 2023 Briling, K. R.; Calvino Alonso, Y.; Fabrizio, A.; Corminboeuf, C. SPA ${}^{\mathrm{H}}$ M(a,b): encoding the density information from guess Hamiltonian in quantum machine learning representations. arXiv preprint 2023, arXiv:2309.02950
Karandashev and von Lilienfeld 2022 Karandashev, K.; von Lilienfeld, O. A. An orbital-based representation for accurate quantum machine learning. J. Chem. Phys. 2022, 156, 114101
Llenga and Gryn’ova 2023 Llenga, S.; Gryn’ova, G. Matrix of orthogonalized atomic orbital coefficients representation for radicals and ions. J. Chem. Phys. 2023, 158, 214116
Li et al. 2015 Li, Z.; Kermode, J. R.; De Vita, A. Molecular dynamics with on-the-fly machine learning of quantum-mechanical forces. Phys. Rev. Lett. 2015, 114, 096405
Chmiela et al. 2017 Chmiela, S.; Tkatchenko, A.; Sauceda, H. E.; Poltavsky, I.; Schütt, K. T.; Müller, K.-R. Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 2017, 3, e1603015
Chmiela et al. 2018 Chmiela, S.; Sauceda, H. E.; Müller, K.-R.; Tkatchenko, A. Towards exact molecular dynamics simulations with machine-learned force fields. Nat. Commun. 2018, 9, 3887
Behler 2017 Behler, J. First principles neural network potentials for reactive simulations of large molecular and condensed systems. Angew. Chem. Int. Ed. 2017, 56, 12828–12840
Smith et al. 2018 Smith, J. S.; Nebgen, B.; Lubbers, N.; Isayev, O.; Roitberg, A. E. Less is more: Sampling chemical space with active learning. J. Chem. Phys. 2018, 148, 241733
Bereau et al. 2015 Bereau, T.; Andrienko, D.; Von Lilienfeld, O. A. Transferable atomic multipole machine learning models for small organic molecules. J. Chem. Theory Comput. 2015, 11, 3225–3233
Grisafi et al. 2018 Grisafi, A.; Wilkins, D. M.; Csányi, G.; Ceriotti, M. Symmetry-adapted machine learning for tensorial properties of atomistic systems. Phys. Rev. Lett. 2018, 120, 036002
Wilkins et al. 2019 Wilkins, D. M.; Grisafi, A.; Yang, Y.; Lao, K. U.; DiStasio Jr, R. A.; Ceriotti, M. Accurate molecular polarizabilities with coupled cluster theory and machine learning. Proc. Natl. Acad. Sci. U.S.A. 2019, 116, 3401–3406
Montavon et al. 2013 Montavon, G.; Rupp, M.; Gobre, V.; Vazquez-Mayagoitia, A.; Hansen, K.; Tkatchenko, A.; Müller, K.-R.; Von Lilienfeld, O. A. Machine learning of molecular electronic properties in chemical compound space. New J. Phys. 2013, 15, 095003
Mazouin et al. 2022 Mazouin, B.; Schöpfer, A. A.; von Lilienfeld, O. A. Selected machine learning of HOMO–LUMO gaps with improved data-efficiency. Mater. Adv. 2022, 3, 8306–8316
Brockherde et al. 2017 Brockherde, F.; Vogt, L.; Li, L.; Tuckerman, M. E.; Burke, K.; Müller, K.-R. Bypassing the Kohn-Sham equations with machine learning. Nat. Commun. 2017, 8, 872
Grisafi et al. 2019 Grisafi, A.; Fabrizio, A.; Meyer, B.; Wilkins, D. M.; Corminboeuf, C.; Ceriotti, M. Transferable machine-learning model of the electron density. ACS Cent. Sci. 2019, 5, 57–64
Fabrizio et al. 2019 Fabrizio, A.; Grisafi, A.; Meyer, B.; Ceriotti, M.; Corminboeuf, C. Electron density learning of non-covalent systems. Chem. Sci. 2019, 10, 9424–9432
Musil et al. 2021 Musil, F.; Grisafi, A.; Bartók, A. P.; Ortner, C.; Csányi, G.; Ceriotti, M. Physics-Inspired Structural Representations for Molecules and Materials. Chem. Rev. 2021, 121, 9759–9815
Langer et al. 2022 Langer, M. F.; Goessmann, A.; Rupp, M. Representations of molecules and materials for interpolation of quantum-mechanical simulations via machine learning. npj Comput. Mater. 2022, 8, 41
Huang and von Lilienfeld 2021 Huang, B.; von Lilienfeld, O. A. Ab Initio Machine Learning in Chemical Compound Space. Chem. Rev. 2021, 121, 10001–10036
Kulik et al. 2022 Kulik, H. J.; Hammerschmidt, T.; Schmidt, J.; Botti, S.; Marques, M. A. L.; Boley, M.; Scheffler, M.; Todorović, M.; Rinke, P.; Oses, C.; Smolyanyuk, A.; Curtarolo, S.; Tkatchenko, A.; Bartók, A. P.; Manzhos, S.; Ihara, M.; Carrington, T.; Behler, J.; Isayev, O.; Veit, M.; Grisafi, A.; Nigam, J.; Ceriotti, M.; Schütt, K. T.; Westermayr, J.; Gastegger, M.; Maurer, R. J.; Kalita, B.; Burke, K.; Nagai, R.; Akashi, R.; Sugino, O.; Hermann, J.; Noé, F.; Pilati, S.; Draxl, C.; Kuban, M.; Rigamonti, S.; Scheidgen, M.; Esters, M.; Hicks, D.; Toher, C.; Balachandran, P. V.; Tamblyn, I.; Whitelam, S.; Bellinger, C.; Ghiringhelli, L. M. Roadmap on Machine learning in electronic structure. Electron. Struct. 2022, 4, 023004
Glielmo et al. 2017 Glielmo, A.; Sollich, P.; De Vita, A. Accurate interatomic force fields via machine learning with covariant kernels. Phys. Rev. B 2017, 95, 214302
van Gerwen et al. 2022 van Gerwen, P.; Fabrizio, A.; Wodrich, M. D.; Corminboeuf, C. Physics-based representations for machine learning properties of chemical reactions. Mach. Learn.: Sci. Technol. 2022, 3, 045005
Faber et al. 2017 Faber, F. A.; Hutchison, L.; Huang, B.; Gilmer, J.; Schoenholz, S. S.; Dahl, G. E.; Vinyals, O.; Kearnes, S.; Riley, P. F.; Von Lilienfeld, O. A. Prediction errors of molecular machine learning models lower than hybrid DFT error. J. Chem. Theory Comput. 2017, 13, 5255–5264
van Gerwen et al. 2023 van Gerwen, P.; Briling, K. R.; Calvino Alonso, Y.; Franke, M.; Corminboeuf, C. Benchmarking machine-readable vectors of chemical reactions on computed activation barriers. ChemRxiv preprint 2023, doi:10.26434/chemrxiv-2023-0hgbc
Schütt et al. 2017 Schütt, K.; Kindermans, P.-J.; Sauceda Felix, H. E.; Chmiela, S.; Tkatchenko, A.; Müller, K.-R. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. Adv. Neural Inf. Process. Syst. 2017, 30, 991–1001
Unke and Meuwly 2019 Unke, O. T.; Meuwly, M. PhysNet: A neural network for predicting energies, forces, dipole moments, and partial charges. J. Chem. Theory Comput. 2019, 15, 3678–3693
Gasteiger et al. 2020 Gasteiger, J.; Groß, J.; Günnemann, S. Directional message passing for molecular graphs. arXiv preprint 2020, arXiv:2003.03123
Gilmer et al. 2017 Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; Vinyals, O.; Dahl, G. E. Neural message passing for quantum chemistry. International conference on machine learning. 2017; pp 1263–1272
Batzner et al. 2022 Batzner, S.; Musaelian, A.; Sun, L.; Geiger, M.; Mailoa, J. P.; Kornbluth, M.; Molinari, N.; Smidt, T. E.; Kozinsky, B. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 2022, 13, 2453
Gasteiger et al. 2021 Gasteiger, J.; Becker, F.; Günnemann, S. GemNet: Universal directional graph neural networks for molecules. Adv. Neural Inf. Process. Syst. 2021, 34, 6790–6802
Haghighatlari et al. 2022 Haghighatlari, M.; Li, J.; Guan, X.; Zhang, O.; Das, A.; Stein, C. J.; Heidar-Zadeh, F.; Liu, M.; Head-Gordon, M.; Bertels, L.; Hao, H.; Leven, I.; Head-Gordon, T. NewtonNet: A Newtonian message passing network for deep learning of interatomic potentials and forces. Digital Discovery 2022, 1, 333–343
Qiao et al. 2020 Qiao, Z.; Welborn, M.; Anandkumar, A.; Manby, F. R.; Miller, T. F. OrbNet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital features. J. Chem. Phys. 2020, 153, 124111
Thomas et al. 2018 Thomas, N.; Smidt, T.; Kearnes, S.; Yang, L.; Li, L.; Kohlhoff, K.; Riley, P. Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv preprint 2018, arXiv:1802.08219
Townshend et al. 2020 Townshend, R. J.; Townshend, B.; Eismann, S.; Dror, R. O. Geometric prediction: Moving beyond scalars. arXiv preprint 2020, arXiv:2006.14163
Anderson et al. 2019 Anderson, B.; Hy, T. S.; Kondor, R. Cormorant: Covariant molecular neural networks. Adv. Neural Inf. Process. Syst. 2019, 32, 14537–14546
Satorras et al. 2021 Satorras, V. G.; Hoogeboom, E.; Welling, M. E(n) Equivariant Graph Neural Networks. Proceedings of the 38th International Conference on Machine Learning. 2021; pp 9323–9332
Christensen et al. 2021 Christensen, A. S.; Sirumalla, S. K.; Qiao, Z.; O’Connor, M. B.; Smith, D. G.; Ding, F.; Bygrave, P. J.; Anandkumar, A.; Welborn, M.; Manby, F. R.; Miller, T. F. OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy. J. Chem. Phys. 2021, 155, 204103
Schütt et al. 2021 Schütt, K.; Unke, O.; Gastegger, M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. Proceedings of the 38th International Conference on Machine Learning. 2021; pp 9377–9388
Unke et al. 2021 Unke, O. T.; Chmiela, S.; Gastegger, M.; Schütt, K. T.; Sauceda, H. E.; Müller, K.-R. SpookyNet: Learning force fields with electronic degrees of freedom and nonlocal effects. Nat. Commun. 2021, 12, 7273
Cheng et al. 2019 Cheng, L.; Welborn, M.; Christensen, A. S.; Miller, T. F. A universal density matrix functional from molecular orbital-based machine learning: Transferability across organic molecules. J. Chem. Phys. 2019, 150, 131103
Ramakrishnan et al. 2014 Ramakrishnan, R.; Dral, P. O.; Rupp, M.; Von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 2014, 1, 140022
Musaelian et al. 2023 Musaelian, A.; Batzner, S.; Johansson, A.; Sun, L.; Owen, C. J.; Kornbluth, M.; Kozinsky, B. Learning local equivariant representations for large-scale atomistic dynamics. Nat. Commun. 2023, 14, 579
Law et al. 2014 Law, V.; Knox, C.; Djoumbou, Y.; Jewison, T.; Guo, A. C.; Liu, Y.; Maciejewski, A.; Arndt, D.; Wilson, M.; Neveu, V., et al. DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 2014, 42, D1091–D1097
Folmsbee and Hutchison 2021 Folmsbee, D.; Hutchison, G. Assessing conformer energies using electronic structure and machine learning methods. Int. J. Quantum Chem. 2021, 121, e26381
Smith et al. 2017 Smith, J. S.; Isayev, O.; Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 2017, 8, 3192–3203
Zeng et al. 2020 Zeng, J.; Cao, L.; Xu, M.; Zhu, T.; Zhang, J. Z. Complex reaction processes in combustion unraveled by neural network-based molecular dynamics simulation. Nat. Commun. 2020, 11, 5713
Chanussot et al. 2021 Chanussot, L.; Das, A.; Goyal, S.; Lavril, T.; Shuaibi, M.; Riviere, M.; Tran, K.; Heras-Domingo, J.; Ho, C.; Hu, W., et al. Open catalyst 2020 (OC20) dataset and community challenges. ACS Catal. 2021, 11, 6059–6072
Lewis-Atwell et al. 2022 Lewis-Atwell, T.; Townsend, P. A.; Grayson, M. N. Machine learning activation energies of chemical reactions. WIREs Comput. Mol. Sci. 2022, 12, e1593
Gallarati et al. 2021 Gallarati, S.; Fabregat, R.; Laplaza, R.; Bhattacharjee, S.; Wodrich, M. D.; Corminboeuf, C. Reaction-based machine learning representations for predicting the enantioselectivity of organocatalysts. Chem. Sci. 2021, 12, 6879–6889
Grambow et al. 2020 Grambow, C. A.; Pattanaik, L.; Green, W. H. Deep Learning of Activation Energies. J. Phys. Chem. Lett. 2020, 11, 2992–2997
Heid and Green 2022 Heid, E.; Green, W. H. Machine learning of reaction properties via learned representations of the condensed graph of reaction. J. Chem. Inf. Model. 2022, 62, 2101–2110
Spiekermann et al. 2022 Spiekermann, K. A.; Pattanaik, L.; Green, W. H. Fast Predictions of Reaction Barrier Heights: Toward Coupled-Cluster Accuracy. J. Phys. Chem. A 2022, 126, 3976–3986
Zhao et al. 2023 Zhao, Q.; Anstine, D. M.; Isayev, O.; Savoie, B. M. $\Delta^{2}$ machine learning for reaction property prediction. Chem. Sci. 2023, 14, 13392–13401
Heinen et al. 2021 Heinen, S.; von Rudorff, G. F.; von Lilienfeld, O. A. Toward the design of chemical reactions: Machine learning barriers of competing mechanisms in reactant space. J. Chem. Phys. 2021, 155, 064105
Singh et al. 2019 Singh, A. R.; Rohr, B. A.; Gauthier, J. A.; Nørskov, J. K. Predicting chemical reaction barriers with a machine learning model. Catal. Lett. 2019, 149, 2347–2354
Choi et al. 2018 Choi, S.; Kim, Y.; Kim, J. W.; Kim, Z.; Kim, W. Y. Feasibility of activation energy prediction of gas-phase reactions by machine learning. Chem. Eur. J. 2018, 24, 12354–12358
Farrar and Grayson 2022 Farrar, E. H. E.; Grayson, M. N. Machine learning and semi-empirical calculations: a synergistic approach to rapid, accurate, and mechanism-based reaction barrier prediction. Chem. Sci. 2022, 13, 7594–7603
Friederich et al. 2020 Friederich, P.; dos Passos Gomes, G.; Bin, R. D.; Aspuru-Guzik, A.; Balcells, D. Machine learning dihydrogen activation in the chemical space surrounding Vaska’s complex. Chem. Sci. 2020, 11, 4584–4601
Migliaro and Cundari 2020 Migliaro, I.; Cundari, T. R. Density Functional Study of Methane Activation by Frustrated Lewis Pairs with Group 13 Trihalides and Group 15 Pentahalides and a Machine Learning Analysis of Their Barrier Heights. J. Chem. Inf. Model. 2020, 60, 4958–4966
Lewis-Atwell et al. 2023 Lewis-Atwell, T.; Beechey, D.; Şimşek, O.; Grayson, M. N. Reformulating Reactivity Design for Data-Efficient Machine Learning. ACS Catal. 2023, 13, 13506–13515
Schwaller et al. 2022 Schwaller, P.; Vaucher, A. C.; Laplaza, R.; Bunne, C.; Krause, A.; Corminboeuf, C.; Laino, T. Machine intelligence for chemical reaction space. WIREs Comput. Mol. Sci. 2022, 12, e1604
Rogers and Hahn 2010 Rogers, D.; Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 2010, 50, 742–754
Probst et al. 2022 Probst, D.; Schwaller, P.; Reymond, J.-L. Reaction Classification and Yield Prediction using the Differential Reaction Fingerprint DRFP. Digital Discovery 2022, 1, 91–97
Ahneman et al. 2018 Ahneman, D. T.; Estrada, J. G.; Lin, S.; Dreher, S. D.; Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 2018, 360, 186–190
Żurański et al. 2021 Żurański, A. M.; Martinez Alvarado, J. I.; Shields, B. J.; Doyle, A. G. Predicting Reaction Yields via Supervised Learning. Acc. Chem. Res. 2021, 54, 1856–1865
Zahrt et al. 2019 Zahrt, A. F.; Henle, J. J.; Rose, B. T.; Wang, Y.; Darrow, W. T.; Denmark, S. E. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning. Science 2019, 363, eaau5631
Jorner et al. 2021 Jorner, K.; Brinck, T.; Norrby, P.-O.; Buttar, D. Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies. Chem. Sci. 2021, 12, 1163–1175
Reid and Sigman 2019 Reid, J. P.; Sigman, M. S. Holistic prediction of enantioselectivity in asymmetric catalysis. Nature 2019, 571, 343–348
Gensch et al. 2022 Gensch, T.; dos Passos Gomes, G.; Friederich, P.; Peters, E.; Gaudin, T.; Pollice, R.; Jorner, K.; Nigam, A.; Lindner-D’Addario, M.; Sigman, M. S.; Aspuru-Guzik, A. A comprehensive discovery platform for organophosphorus ligands for catalysis. J. Am. Chem. Soc. 2022, 144, 1205–1217
Santiago et al. 2018 Santiago, C. B.; Guo, J.-Y.; Sigman, M. S. Predictive and mechanistic multivariate linear regression models for reaction development. Chem. Sci. 2018, 9, 2398–2412
Jorner 2023 Jorner, K. Putting Chemical Knowledge to Work in Machine Learning for Reactivity. Chimia 2023, 77, 22
Gallegos et al. 2021 Gallegos, L. C.; Luchini, G.; St. John, P. C.; Kim, S.; Paton, R. S. Importance of Engineered and Learned Molecular Representations in Predicting Organic Reactivity, Selectivity, and Chemical Properties. Acc. Chem. Res. 2021, 54, 827–836
Williams et al. 2021 Williams, W. L.; Zeng, L.; Gensch, T.; Sigman, M. S.; Doyle, A. G.; Anslyn, E. V. The Evolution of Data-Driven Modeling in Organic Chemistry. ACS Cent. Sci. 2021, 7, 1622–1637
Devlin et al. 2018 Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint 2018, arXiv:1810.04805
Schwaller et al. 2019 Schwaller, P.; Laino, T.; Gaudin, T.; Bolgar, P.; Hunter, C. A.; Bekas, C.; Lee, A. A. Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction. ACS Cent. Sci. 2019, 5, 1572–1583
Schwaller et al. 2021 Schwaller, P.; Vaucher, A. C.; Laino, T.; Reymond, J.-L. Prediction of chemical reaction yields using deep learning. Mach. Learn.: Sci. Technol. 2021, 2, 015016
Doney et al. 2016 Doney, A. C.; Rooks, B. J.; Lu, T.; Wheeler, S. E. Design of Organocatalysts for Asymmetric Propargylations through Computational Screening. ACS Catal. 2016, 6, 7948–7955
Gasteiger et al. 2020 Gasteiger, J.; Giri, S.; Margraf, J. T.; Günnemann, S. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. arXiv preprint 2020, arXiv:2011.14115
Grambow et al. 2020 Grambow, C. A.; Pattanaik, L.; Green, W. H. Reactants, products, and transition states of elementary chemical reactions based on quantum chemistry. Sci. Data 2020, 7, 137
Duan et al. 2023 Duan, C.; Du, Y.; Jia, H.; Kulik, H. J. Accurate transition state generation with an object-aware equivariant elementary reaction diffusion model. arXiv preprint 2023, arXiv:2304.06174
Chen et al. 2013 Chen, W. L.; Chen, D. Z.; Taylor, K. T. Automatic reaction mapping and reaction center detection. WIREs Comput. Mol. Sci. 2013, 3, 560–593
Preciat Gonzalez et al. 2017 Preciat Gonzalez, G. A.; El Assal, L. R.; Noronha, A.; Thiele, I.; Haraldsdóttir, H. S.; Fleming, R. M. Comparative evaluation of atom mapping algorithms for balanced metabolic reactions: application to Recon 3D. J. Cheminform. 2017, 9, 1–15
Jaworski et al. 2019 Jaworski, W.; Szymkuć, S.; Mikulak-Klucznik, B.; Piecuch, K.; Klucznik, T.; Kaźmierowski, M.; Rydzewski, J.; Gambin, A.; Grzybowski, B. A. Automatic mapping of atoms across both simple and complex chemical reactions. Nat. Commun. 2019, 10, 1434
Schwaller et al. 2021 Schwaller, P.; Hoover, B.; Reymond, J.-L.; Strobelt, H.; Laino, T. Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Sci. Adv. 2021, 7, eabe4166
Stuyver and Coley 2023 Stuyver, T.; Coley, C. W. Machine Learning-Guided Computational Screening of New Candidate Reactions with High Bioorthogonal Click Potential. Chem. Eur. J. 2023, 29, e202300387
Stuyver and Coley 2022 Stuyver, T.; Coley, C. W. Quantum chemistry-augmented neural networks for reactivity prediction: Performance, generalizability, and explainability. J. Chem. Phys. 2022, 156, 084104
Spiekermann et al. 2022 Spiekermann, K. A.; Pattanaik, L.; Green, W. H. High accuracy barrier heights, enthalpies, and rate coefficients for chemical reactions. Sci. Data 2022, 9, 417
Stuyver et al. 2023 Stuyver, T.; Jorner, K.; Coley, C. W. Reaction profiles for quantum chemistry-computed [3+2] cycloaddition reactions. Sci. Data 2023, 10, 66
Geiger et al. 2022 Geiger, M.; Smidt, T.; M., A.; Miller, B. K.; Boomsma, W.; Dice, B.; Lapchevskyi, K.; Weiler, M.; Tyszkiewicz, M.; Uhrin, M.; Batzner, S.; Madisetti, D.; Frellsen, J.; Jung, N.; Sanborn, S.; jkh,; Wen, M.; Rackers, J.; Rød, M.; Bailey, M. e3nn/e3nn: 2022-12-12. 2022; https://doi.org/10.5281/zenodo.7430260
Corso et al. 2023 Corso, G.; Stärk, H.; Jing, B.; Barzilay, R.; Jaakkola, T. DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking. arXiv preprint 2023, arXiv:2210.01776
Landrum et al. 2023 Landrum, G.; Tosco, P.; Kelley, B.; Ric,; Sriniker,; Cosgrove, D.; Gedeck,; Vianello, R.; NadineSchneider,; Kawashima, E.; N, D.; Jones, G.; Dalke, A.; Cole, B.; Swain, M.; Turk, S.; AlexanderSavelyev,; Vaucher, A.; Wójcikowski, M.; Ichiru Take,; Probst, D.; Ujihara, K.; Scalfani, V. F.; Godin, G.; Pahl, A.; Francois Berenger,; JLVarjo,; Walker, R.; Jasondbiggs,; Strets123, rdkit/rdkit: 2023_03_1 (Q1 2023) Release. 2023; https://zenodo.org/record/7880616
Stärk et al. 2022 Stärk, H.; Ganea, O.; Pattanaik, L.; Barzilay, R.; Jaakkola, T. EquiBind: Geometric deep learning for drug binding structure prediction. International Conference on Machine Learning. 2022; pp 20503–20521
Vaswani et al. 2017 Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008
Paszke et al. 2019 Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; Desmaison, A.; Kopf, A.; Yang, E.; DeVito, Z.; Raison, M.; Tejani, A.; Chilamkurthy, S.; Steiner, B.; Fang, L.; Bai, J.; Chintala, S. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037
Liao and Smidt 2022 Liao, Y.-L.; Smidt, T. Equiformer: Equivariant graph attention transformer for 3D atomistic graphs. arXiv preprint 2022, arXiv:2206.11990
Ganea et al. 2022 Ganea, O.-E.; Huang, X.; Bunne, C.; Bian, Y.; Barzilay, R.; Jaakkola, T.; Krause, A. Independent SE(3)-Equivariant Models for End-to-End Rigid Protein Docking. arXiv preprint 2022, arXiv:2111.07786
van Gerwen et al. 2023 van Gerwen, P.; Wodrich, M. D.; Laplaza, R.; Corminboeuf, C. Reply to Comment on ‘Physics-based representations for machine learning properties of chemical reactions’. Mach. Learn.: Sci. Technol. 2023, 4, 048002
Lowe 2012 Lowe, D. M. Extraction of chemical structures and reactions from the literature. Ph.D. thesis, University of Cambridge, 2012
Wu et al. 2018 Wu, Z.; Ramsundar, B.; Feinberg, E. N.; Gomes, J.; Geniesse, C.; Pappu, A. S.; Leswing, K.; Pande, V. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 2018, 9, 513–530
Yang et al. 2019 Yang, K.; Swanson, K.; Jin, W.; Coley, C.; Eiden, P.; Gao, H.; Guzman-Perez, A.; Hopper, T.; Kelley, B.; Mathea, M.; Palmer, A.; Settels, V.; Jaakkola, T.; Jensen, K.; Barzilay, R. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 2019, 59, 3370–3388
Bemis and Murcko 1996 Bemis, G. W.; Murcko, M. A. The Properties of Known Drugs. 1. Molecular Frameworks. J. Med. Chem. 1996, 39, 2887–2893
van der Maaten and Hinton 2008 van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605
Bannwarth et al. 2019 Bannwarth, C.; Ehlert, S.; Grimme, S. GFN2-xTB—An Accurate and Broadly Parametrized Self-Consistent Tight-Binding Quantum Chemical Method with Multipole Electrostatics and Density-Dependent Dispersion Contributions. J. Chem. Theory Comput. 2019, 15, 1652–1671
Blum and Reymond 2009 Blum, L. C.; Reymond, J.-L. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J. Am. Chem. Soc. 2009, 131, 8732–8733
Reymond 2015 Reymond, J.-L. The chemical space project. Acc. Chem. Res. 2015, 48, 722–730
Zimmerman 2015 Zimmerman, P. M. Single-ended transition state finding with the growing string method. J. Comput. Chem. 2015, 36, 601–611
Cordella et al. 2001 Cordella, L. P.; Foggia, P.; Sansone, C.; Vento, M. An improved algorithm for matching large graphs. 3rd IAPR-TC15 Workshop on Graph-Based Representations in Pattern Recognition. 2001; pp 149–159
Hagberg et al. 2008 Hagberg, A. A.; Schult, D. A.; Swart, P. J. Exploring Network Structure, Dynamics, and Function using NetworkX. Proceedings of the 7th Python in Science Conference. Pasadena, CA USA, 2008; pp 11–15
Riniker and Landrum 2015 Riniker, S.; Landrum, G. A. Better informed distance geometry: using what we know to improve conformation generation. J. Chem. Inf. Model. 2015, 55, 2562–2574
Tosco et al. 2014 Tosco, P.; Stiefl, N.; Landrum, G. Bringing the MMFF force field to the RDKit: implementation and validation. J. Cheminform. 2014, 6, 37
Kingma and Ba 2014 Kingma, D. P.; Ba, J. Adam: A method for stochastic optimization. arXiv preprint 2014, arXiv:1412.6980
Biewald 2020 Biewald, L. Experiment Tracking with Weights and Biases. 2020; https://www.wandb.com/, Software available from wandb.com
Christensen et al. 2017 Christensen, A. S.; Faber, F.; Huang, B.; Bratholm, L.; Tkatchenko, A.; Müller, K.-R.; von Lilienfeld, O. A. QML: A Python Toolkit for Quantum Machine Learning. https://github.com/qmlcode/qml, 2017