License: arXiv.org perpetual non-exclusive license
arXiv:2312.08307v1 [physics.chem-ph] 13 Dec 2023

EquiReact: An equivariant neural network for chemical reactions

Puck van Gerwen Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland    Ksenia R. Briling Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland    Charlotte Bunne National Center for Competence in Research – Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland    Vignesh Ram Somnath National Center for Competence in Research – Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland    Ruben Laplaza Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland    Andreas Krause National Center for Competence in Research – Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland    Clemence Corminboeuf [email protected] Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
Abstract

Equivariant neural networks have considerably improved the accuracy and data-efficiency of predictions of molecular properties. Building on this success, we introduce EquiReact, an equivariant neural network to infer properties of chemical reactions, built from three-dimensional structures of reactants and products. We illustrate its competitive performance on the prediction of activation barriers on the GDB7-22-TS, Cyclo-23-TS and Proparg-21-TS datasets with different regimes according to the inclusion of atom-mapping information. We show that, compared to state-of-the-art models for reaction property prediction, EquiReact offers: (i) a flexible model with reduced sensitivity between atom-mapping regimes, (ii) better extrapolation capabilities to unseen chemistries, (iii) low prediction errors for datasets exhibiting subtle variations in three-dimensional geometries of reactants/products, (iv) reduced sensitivity to geometry quality and (v) excellent data efficiency.

keywords:
machine learning, equivariant neural networks, activation energies, chemical reactions
Author notes: “These authors contributed equally to this work” (indicated for some of the authors). Additional affiliations: National Center for Competence in Research – Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland; Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland.

1 Introduction

Physics-inspired representations that take as input the three-dimensional structure1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 (as well as, in some cases, the electronic structure14, 15, 16, 17) of molecules and transform it into a fixed-length vector while respecting known physical laws have a rich history in molecular property prediction. Models have been developed to predict properties ranging from atomization energies,2, 8, 4, 5, 6, 7, 13 molecular forces,18, 19, 20, 7, 5 potential energy surfaces,1, 21, 3, 5, 7, 9, 10, 22 multipole moments,23 polarizabilities,4, 5, 6, 11, 12, 24, 25 dipole moments,6 HOMO and LUMO eigenvalues4, 6 as well as the HOMO–LUMO gap,6, 26, 27 and electron densities.28, 29, 30 Common desiderata31, 32, 33, 34 for high-performing representations are (i) smoothness, (ii) the encoding of the appropriate symmetries with respect to permutations, rotations and translations,24, 35 (iii) completeness and (iv) additivity to allow for extrapolation to larger systems. Fingerprints such as the CM,2 BoB,4 SOAP,3 FCHL,6, 7 SLATM,8 MBTR,5 LODE,11, 12 NICE13 and others, being rooted in fundamental principles, are designed to be property-independent: a single representation can be constructed for a molecule to predict any (electronically-derived) target. This is analogous to the molecular Hamiltonian, which specifies the energy and all other properties of the system as a function of atom types and positions in three-dimensional space (assuming the molecules are charge-neutral singlets). These representations are typically used in combination with kernel models2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 31, 32, 33 due to their data efficiency, ability to deal with high-dimensional feature vectors, and interpretability of the similarity kernel. Early works showed that combining such representations2, 4, 6, 8, 36 with simple feed-forward neural networks instead of kernel models did not necessarily lead to better performance.37, 38

More recently, end-to-end neural networks have been proposed that learn the representation as part of the (supervised) training process,39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53 based on similar principles to the aforementioned physics-inspired representations: They take as input a three-dimensional structure, as well as in some cases charge and spin information.46, 51, 52, 53 Relevant symmetry operations are appropriately encoded into the neural network architecture. Equivariant Neural Networks (ENNs) in particular have shown unprecedented accuracy and data efficiency on benchmarks of molecular property prediction such as energies of organic molecules (QM7b-T,54, 46 QM9,55, 46, 49, 50, 56 GDB-13-T,54, 46 DrugBank-T,57, 46 conformers,58 ANI-159, 45), and energies and forces of several molecular dynamics datasets (MD17,43, 44, 45, 49, 56 proteins,48 methane combustion,60, 45 the open catalyst dataset (OC20)61, 44). Unlike the earlier invariant neural networks,39, 40, 41, 42 which by operating on distances between atoms ensure rotational and translational invariance, equivariant neural networks typically operate on relative position vectors and angular information, which is processed by rotationally-equivariant convolutional layers. The internal features are then equivariant to rotation. ENNs demonstrate a substantial improvement in accuracy even for the prediction of rotation-invariant properties such as total energies.43, 56, 44, 45, 46, 49, 50

Despite these advances for molecular property prediction, the prediction of computed reaction properties (principally, reaction barriers62, 36, 38, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74) is still in its infancy.75 Machine learning approaches range from the use of simple two-dimensional fingerprints of reaction components76, 77 to physical-organic descriptors78, 79, 70, 80, 81, 69, 82, 83, 84, 85, 86, 87, 74 derived from quantum-chemical computations to transformer models88, 89 adapted for regression90 to graph-based approaches.65, 66, 64 A recent class of reaction fingerprints is built from the three-dimensional structure of reaction components,36, 63, 68 involving invariant features for molecular components. It was recently shown38 that these representations perform well for the prediction of reaction barriers, particularly for datasets91, 63 relying on subtle changes in the geometry of reactants and/or products. As in the earlier stages of molecular property prediction,2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13 these representations were combined with kernel models.

To date, few attempts have been made to extend equivariant neural networks to chemical reactions. The first is from Spiekermann et al.,66 extending the molecular network DimeNet++41, 92 to describe reactant and product molecules in a chemical reaction, to predict reaction barriers on the benchmark GDB7-20-TS set.93 The second is an equivariant diffusion model predicting the Transition State (TS) structure from reactants and products94 on the same dataset. The former representation-learning approach is closer in spirit to our dedicated reaction fingerprints.36 However, this particular model66 was not shown to improve on 2D-graph based models.65 It was recently illustrated38 that the 2D-graph based models achieve their impressive performance by exploiting atom-mapping information,95, 96, 97, 98 which is absent in the equivariant model from Spiekermann et al.66

In this work, we introduce EquiReact, a neural network to predict properties of chemical reactions (showcased here for activation energies), built from equivariant features of reactants’ and products’ geometries. Compared to previously established reaction fingerprints36, 63 as well as other neural networks for reaction prediction,65, 64, 66, 99, 100 we offer several advantages with our new model: (i) greater model flexibility depending on the ease of atom-mapping a particular dataset, (ii) better extrapolation capabilities, (iii) competitive predictive performance, (iv) reduced dependence on the quality of three-dimensional geometries and (v) improved data efficiency.

We illustrate these points by studying three datasets of reaction barriers: the GDB7-22-TS 101 (an upgraded dataset from the previously published GDB7-20-TS93), the Cyclo-23-TS,102 and the Proparg-21-TS.91, 63 As discussed in previous works,38 these datasets present varying challenges for ML models, from the dependence on chemical information101 to the distinction of subtle changes in configurations.91, 63 In all cases, we compare to the previously best-performing models:38 the 2D-graph based CGR model ChemProp 65 as well as SLATM$_d$ fingerprints8, 36 built from three-dimensional information combined with kernel ridge regression (KRR) models.

2 Architecture

Figure 1: Architecture of EquiReact. Molecules pass through independent equivariant channels (green and orange). These are combined to yield a reaction representation (blue) which is used to predict a reaction property, such as the activation energy (red dot, far right).

EquiReact is built from $SE(3)$-equivariant convolutional networks over point clouds as implemented in e3nn103 and used in Thomas et al. 47 and Corso et al. 104. Three-dimensional geometries of molecules constituting reactants and products of each reaction are separately passed through these equivariant channels, detailed in Section 2.1 for the convenience of the reader. They are then combined to eventually predict a reaction property, such as the activation energy, as detailed in Section 2.2. The overall architecture is summarized in Figure 1.

2.1 Equivariant molecular channels

A molecule is represented as a distance-based graph where nodes describe atoms and edges describe bonds. Instead of explicitly using connectivity information, the “bonds” of atom $a$ are formed with all the neighboring $\operatorname{Neigh}(a)$ atoms within the cutoff $r_{\max}$, all the (directed) bonds $\{(a,b)\}$ in the molecule forming the set $\mathfrak{B}$. Initial node (atom) features $\{\mathbf{x}^{(0)}_a\}$ encode several cheminformatic features from RDKit,105 including atomic number, chirality tag (unspecified, tetrahedral, or other, including octahedral, square planar, allene-type, etc.), number of directly-bonded neighbors, number of rings, implicit valence, formal charge, number of attached hydrogens, number of radical electrons, hybridization, aromaticity, and presence in rings of specified sizes from 3 to 7.

Inspired by related models,104 initial scalar edge (bond) features $\{\mathbf{e}^{(0)}_{ab}\}$ are projections of the atom distances $|\mathbf{r}_{ab}|$ onto $n_g$ Gaussians uniformly spanning the line segment from $0$ to $r_{\max}$ with the step $\Delta\mu = r_{\max}/(n_g-1)$,

$$\mathbf{e}^{(0)}_{ab} = \mathbf{f}_1(|\mathbf{r}_{ab}|) \qquad \forall (a,b)\in\mathfrak{B}, \tag{1}$$
$$\mathbf{f}_1(r) = \left\{\exp\left(-\frac{1}{2}\left(\frac{r-n\Delta\mu}{\Delta\mu}\right)^{2}\right)\right\} \quad n \in 0,\ldots,n_g-1. \tag{2}$$
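
The Gaussian distance expansion of Eqs. 1 and 2 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `gaussian_edge_features` and the values `r_max=5.0` and `n_g=11` are hypothetical and are not taken from the EquiReact code or its hyperparameters.

```python
import math


def gaussian_edge_features(distance, r_max, n_g):
    """Project one interatomic distance onto n_g Gaussians (sketch of Eqs. 1-2).

    The Gaussians are centred at n * delta_mu for n = 0, ..., n_g - 1 and
    delta_mu is used both as the spacing and as the width, as in Eq. 2.
    """
    if n_g < 2:
        raise ValueError("n_g must be at least 2 so that the step is defined")
    delta_mu = r_max / (n_g - 1)
    return [
        math.exp(-0.5 * ((distance - n * delta_mu) / delta_mu) ** 2)
        for n in range(n_g)
    ]


# Illustrative values only (not the paper's hyperparameters).
features = gaussian_edge_features(distance=1.5, r_max=5.0, n_g=11)
```

With these illustrative values the centres lie 0.5 apart, so a distance of 1.5 sits exactly on the fourth centre and the corresponding feature takes its maximum value of 1.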

Tensorial edge features $\{\mathbf{z}_{ab}\}$ are projections of normalized difference vectors between atomic positions $\mathbf{r}_{ab}/|\mathbf{r}_{ab}|$ onto spherical harmonics $Y^{\ell}_{m}$ of $0\leq\ell\leq 2$,

$$\mathbf{z}_{ab} \equiv \mathbf{z}_{ab}^{0e} \oplus \mathbf{z}_{ab}^{1o} \oplus \mathbf{z}_{ab}^{2e} = \mathbf{f}_2(\mathbf{r}_{ab}/|\mathbf{r}_{ab}|) \qquad \forall (a,b)\in\mathfrak{B}, \tag{3}$$
$$\mathbf{f}_2(\mathbf{r}) = Y^{0}_{0}(\mathbf{r}) \oplus \left\{Y^{1}_{m}(\mathbf{r})\right\}_{|m|\leq 1} \oplus \left\{Y^{2}_{m}(\mathbf{r})\right\}_{|m|\leq 2}. \tag{4}$$

The initial $\mathbf{x}^{(0)}$ and $\mathbf{e}^{(0)}$ are then passed through embeddings to give $\mathbf{x}^{(1)}_a\ \forall a$ and $\mathbf{e}_{ab}\ \forall (a,b)\in\mathfrak{B}$. Atomic representations $\mathbf{x}^{(1)}$ are then updated by $n_{\mathrm{conv}}=3$ equivariant convolutional layers, labeled $\mathbf{L1}$, $\mathbf{L2}$ and $\mathbf{L3}$:

$$
\mathbf{L1}:\quad
\begin{aligned}
\mathbf{w}^{(1)}_{ab} &= \mathbf{g}_{31}\left(\mathbf{e}_{ab} \oplus \mathbf{x}^{(1)}_a \oplus \mathbf{x}^{(1)}_b\right) \qquad \forall (a,b)\in\mathfrak{B} \\
\mathbf{s}^{(1)}_b \equiv \mathbf{s}^{0e(1)}_b \oplus \mathbf{s}^{1o(1)}_b &= \frac{1}{\operatorname{Neigh}(b)} \sum_{a:\,(a,b)\in\mathfrak{B}} \mathbf{t}_1\left(\mathbf{x}^{(1)}_a, \mathbf{z}_{ab}, \mathbf{w}^{(1)}_{ab}\right) \qquad \forall b \\
\mathbf{x}^{0e(2)} &= \mathbf{x}^{(1)} + \mathbf{s}^{0e(1)} \\
\mathbf{x}^{(2)} &= \mathbf{x}^{0e(2)} \oplus \mathbf{s}^{1o(1)}
\end{aligned}
$$

$$
\mathbf{L2}:\quad
\begin{aligned}
\mathbf{w}^{(2)}_{ab} &= \mathbf{g}_{32}\left(\mathbf{e}_{ab} \oplus \mathbf{x}^{0e(2)}_a \oplus \mathbf{x}^{0e(2)}_b\right) \qquad \forall (a,b)\in\mathfrak{B} \\
\mathbf{s}^{(2)}_b \equiv \mathbf{s}^{0e(2)}_b \oplus \mathbf{s}^{1o(2)}_b \oplus \mathbf{s}^{1e(2)}_b &= \frac{1}{\operatorname{Neigh}(b)} \sum_{a:\,(a,b)\in\mathfrak{B}} \mathbf{t}_2\left(\mathbf{x}^{(2)}_a, \mathbf{z}_{ab}, \mathbf{w}^{(2)}_{ab}\right) \qquad \forall b \\
\mathbf{x}^{0e(3)} &= \mathbf{x}^{0e(2)} + \mathbf{s}^{0e(2)} \\
\mathbf{x}^{(3)} &= \mathbf{x}^{0e(3)} \oplus \left(\mathbf{s}^{1o(1)} + \mathbf{s}^{1o(2)}\right) \oplus \mathbf{s}^{1e(2)}
\end{aligned}
$$

$$
\mathbf{L3}:\quad
\begin{aligned}
\mathbf{w}^{(3)}_{ab} &= \mathbf{g}_{33}\left(\mathbf{e}_{ab} \oplus \mathbf{x}^{0e(3)}_a \oplus \mathbf{x}^{0e(3)}_b\right) \qquad \forall (a,b)\in\mathfrak{B} \\
\mathbf{s}^{(3)}_b \equiv \mathbf{s}^{0e(3)}_b \oplus \mathbf{s}^{1o(3)}_b \oplus \mathbf{s}^{1e(3)}_b \oplus \mathbf{s}^{0o(3)}_b &= \frac{1}{\operatorname{Neigh}(b)} \sum_{a:\,(a,b)\in\mathfrak{B}} \mathbf{t}_3\left(\mathbf{x}^{(3)}_a, \mathbf{z}_{ab}, \mathbf{w}^{(3)}_{ab}\right) \qquad \forall b \\
\mathbf{x}^{\mathrm{out}} &= \left(\mathbf{x}^{0e(3)} + \mathbf{s}^{0e(3)}\right) \oplus \mathbf{s}^{0o(3)},
\end{aligned}
$$

where for example in $\mathbf{L1}$, $\mathbf{s}^{(1)}_b \equiv \mathbf{s}^{0e(1)}_b \oplus \mathbf{s}^{1o(1)}_b$ means that the result of the function $\mathbf{t}_1$ consists of scalars ($0e$) and vectors ($1o$) that can be treated separately. Each function $\mathbf{t}_n(\mathbf{x},\mathbf{z},\mathbf{w})$ is a fully-connected weighted tensor product, as defined in e3nn,103 in the form of

$$\mathbf{t}_n(\mathbf{x},\mathbf{z},\mathbf{w}) = \{\mathbf{t}^{(n)}_w\} = \left\{\sum_{uv} w^{(n)}_{uvw}\, \mathbf{x}_u \otimes \mathbf{z}_v\right\}. \tag{5}$$

These tensor products are specified by signatures of irreducible representations (irreps) of two input and one output $O(3)$ tensors. The output tensor is a combination of weighted sums of paths (pairs of input irreps) leading to each output irrep. The irrep sequence in each layer from 1–3 is illustrated in Figure 2. To obtain the weights $\mathbf{w}^{(n)}$ for each convolutional layer $n$, the spherical (scalar, $0e$) parts of $\mathbf{x}^{(n)}_a$ and $\mathbf{x}^{(n)}_b$ are concatenated with the bond features $\mathbf{e}_{ab}$ and passed through a Multi-Layer Perceptron (MLP).

Figure 2: Irrep sequence in the (1), (2), (3) convolutional layers of EquiReact. Input irreps are on the left and right, output irreps are at the bottom, and paths that connect them are in the middle in green. Note that formally $2e$ is present in the right input already at the first layer, but does not contribute to the output specified by the signature of $\mathbf{t}_1$.

The output of the equivariant molecular channels is the local molecular representation $\mathbf{X}\in\mathbb{R}^{N_{\mathrm{at}}\times D}$ corresponding to $N_{\mathrm{at}}$ atoms associated with $D$ features. Depending on the sum_mode hyperparameter, it is constructed either from the node features $\{\mathbf{x}^{\mathrm{out}}_a\}$ (node mode) or both node and edge features $\{\mathbf{x}^{\mathrm{out}}_a \oplus \sum_{b:(a,b)\in\mathfrak{B}} \mathbf{y}^{(0)}_{ab}\}$ (both mode). In the case of $n_{\mathrm{conv}}=2$, the vectors $\{\mathbf{x}^{0e(3)}_a\}$ are taken to construct the molecular representation.

Inspired by the ChemProp model,65 we added an option to exclude hydrogen atoms as nodes when constructing the graph. The only information about hydrogens is then contained in the initial node features of heavy atoms. The results shown in the main text are obtained without hydrogens, since in this regime the model performs systematically better. A comparison with the regime that uses explicit hydrogen atoms is provided in the Supplementary Information.

2.2 Combining molecules for reactions

Once atom-wise molecular representations $\mathbf{X}$ are learned for reactant and product molecules, they must be combined to form a reaction representation $\mathbf{X}_{\mathrm{rxn}}$.

Figure 3: Scheme illustrating how the reactant (green) and product (orange) representations are combined to form a reaction representation (blue) and eventually predict the target property (red dot) using a multi-layer perceptron (MLP). $\sum$ refers to the summation over atom-wise environments. Oblong rectangles and squares represent vectors and scalars, respectively.

For certain datasets, atom-mapping information is available, which correlates individual atoms in reactant molecules to individual atoms in product molecules according to their reaction mechanism. In this setting, the representations $\mathbf{X}_{\mathrm{reactant}}$ and $\mathbf{X}_{\mathrm{product}}$ are re-ordered such that the local representation vectors correspond to the same atom in reactants and products. Depending on the combine_mode hyperparameter, either a difference is taken between products’ and reactants’ atom representations, or they are summed, averaged or passed through an MLP. Thus, the local reaction representation $\mathbf{X}_{\mathrm{rxn}}$ consists of vectors reflecting how the environment changes in the reaction for each atom. We refer to this regime as EquiReact$_M$.
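
As a rough illustration of the atom-mapped combination step, the sketch below re-orders product atoms to match reactant atoms and then applies a difference, sum or average. The function name `combine_mapped`, the mode strings and the `atom_map` format (one product index per reactant atom) are assumptions made for this example; the learned MLP option is omitted because it requires trained weights.

```python
def combine_mapped(x_reactant, x_product, atom_map, mode="diff"):
    """Combine per-atom representations of atom-mapped reactants and products.

    x_reactant, x_product: lists of per-atom feature vectors (lists of floats).
    atom_map[i]: index of the product atom that corresponds to reactant atom i
    (hypothetical format chosen for this sketch).
    """
    combined = []
    for i, r_vec in enumerate(x_reactant):
        p_vec = x_product[atom_map[i]]  # re-order products to match reactants
        if mode == "diff":
            combined.append([p - r for r, p in zip(r_vec, p_vec)])
        elif mode == "sum":
            combined.append([p + r for r, p in zip(r_vec, p_vec)])
        elif mode == "mean":
            combined.append([0.5 * (p + r) for r, p in zip(r_vec, p_vec)])
        else:
            raise ValueError(f"unknown mode: {mode}")
    return combined


# Two atoms with two features each; product atoms are stored in swapped order.
x_r = [[1.0, 2.0], [3.0, 4.0]]
x_p = [[5.0, 5.0], [1.5, 2.5]]
x_rxn = combine_mapped(x_r, x_p, atom_map=[1, 0], mode="diff")
```

Each row of `x_rxn` then describes how the environment of one atom changes between reactants and products.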

With the reaction representation at hand, predictions are made in the so-called vector or energy modes. In the vector mode, the atomic vectors constituting the reaction representation $\mathbf{X}_{\mathrm{rxn}}$ are initially passed through an MLP to introduce non-linearity and then summed up to form a global reaction representation vector $\bar{\mathbf{X}}_{\mathrm{rxn}}$. The target (here, the activation barrier) is then learned using an MLP. This model pipeline is illustrated in Figure 3a. In the energy mode, the local reaction representations are used to learn atomic contributions to the target (Figure 3b). While performing worse in general, in some cases this mode yields the best predictions (see Section 3.1.2).

While atom-mapping provides static information analogous to a reaction mechanism to link atoms in reactants to atoms in products, it is also possible to exchange information between two molecular representations dynamically (i.e., in a learnable fashion). For example, RXNMapper98 is a neural network that learns atom-mappings within the larger self-supervised task of predicting the randomly masked parts in a reaction sequence, using one head of a multi-head transformer architecture. Inspired by EquiBind,106 a neural network that predicts the rotation and translation of a ligand to a protein and contains a cross-attention module between ligand and receptor, EquiReact$_X$ uses cross-attention between reactants and products to create a surrogate for atom-mapping. Given queries $\mathbf{Q}\in\mathbb{R}^{N\times D}$, keys $\mathbf{K}\in\mathbb{R}^{M\times D}$ and values $\mathbf{V}\in\mathbb{R}^{M\times D}$, attention is computed as

$$\mathbf{A} = \operatorname{softmax}\left(\frac{\mathbf{Q}\mathbf{K}^{T}}{\sqrt{D}}\right) \tag{6}$$

and the “reordered” values $\mathbf{Y}$ are

$$\mathbf{Y} = \mathbf{A}\mathbf{V}. \tag{7}$$

We used the implementation of this scaled-dot-product attention107 in PyTorch’s108 MultiheadAttention (PyTorch version 1.12.1). The representations are re-ordered using Eq. 6 and Eq. 7 with $\mathbf{Q}$ as the vector representation of reactants, $\mathbf{K}$ and $\mathbf{V}$ as the vector representations of products and vice versa (thus here $N=M=N_{\mathrm{at}}$). The re-ordered representations of reactants and products are combined as for the case of atom-mapped reactions (Figures 3a and 3b). We note that other algorithms could also have been used to exchange information between reactants and products, for example in the form of message passing, or equivariant attention.109, 110
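
To make Eqs. 6 and 7 concrete, the following plain-Python sketch computes single-head scaled dot-product attention with reactant vectors as queries and product vectors as keys and values. It is not the PyTorch module used in the paper: PyTorch's MultiheadAttention additionally applies learned input and output projections, which are left out here, and the function names and toy numbers are assumptions for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    shift = max(row)
    exps = [math.exp(v - shift) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(Q, K, V):
    """Scaled dot-product attention (Eqs. 6-7) without learned projections.

    Q: N x D queries, K: M x D keys, V: M x D values (lists of lists).
    Returns the N x M attention matrix A and the N x D output Y = A V.
    """
    d = len(Q[0])
    scores = [
        [sum(q * k for q, k in zip(q_i, k_j)) / math.sqrt(d) for k_j in K]
        for q_i in Q
    ]
    A = [softmax(row) for row in scores]
    Y = [
        [sum(a * v[c] for a, v in zip(row, V)) for c in range(len(V[0]))]
        for row in A
    ]
    return A, Y


# Toy example: reactant atom 0 resembles product atom 1 and vice versa.
Q = [[10.0, 0.0], [0.0, 10.0]]   # reactant atom vectors (queries)
K = [[0.0, 10.0], [10.0, 0.0]]   # product atom vectors (keys)
V = [[1.0, 2.0], [3.0, 4.0]]     # product atom vectors (values)
A, Y = cross_attention(Q, K, V)
```

In this toy case the attention weights are nearly one-hot, so `Y` is close to the product values in swapped order, which is the sense in which attention acts as a soft surrogate for atom-mapping.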

EquiReact also provides a simple “no mapping” approach, called EquiReact$_S$, which relies neither on atom-mapping nor on a surrogate cross-attention module. In the vector mode (Figure 3c), the atomic components of molecular representations $\mathbf{X}_{\mathrm{reactant}}$ and $\mathbf{X}_{\mathrm{product}}$ are summed up to obtain global vectors $\bar{\mathbf{X}}_{\mathrm{reactant}}$ and $\bar{\mathbf{X}}_{\mathrm{product}}$, respectively. Then they are combined, according to the combine_mode parameter, to form a reaction vector $\bar{\mathbf{X}}_{\mathrm{rxn}}$ which is used to learn the target with an MLP. In the energy mode (Figure 3d) individual atomic representations are used to learn their contributions to the quasi-molecular energies of reactants and products, which are later combined (according to the combine_mode parameter) to predict the target. In most cases, this simpler model outperforms EquiReact$_X$ (vide infra).
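
The vector mode without mapping can be illustrated as follows: atomic vectors are summed per side before the reactant and product vectors are combined, so the result does not depend on atom ordering. This is a minimal sketch that assumes the difference variant of combine_mode; the helper names are hypothetical and the final MLP is omitted.

```python
def sum_over_atoms(X):
    """Sum per-atom feature vectors (rows of X) into one global vector."""
    return [sum(column) for column in zip(*X)]


def reaction_vector_no_mapping(x_reactant, x_product):
    """Global product vector minus global reactant vector (difference variant)."""
    r_bar = sum_over_atoms(x_reactant)
    p_bar = sum_over_atoms(x_product)
    return [p - r for r, p in zip(r_bar, p_bar)]


x_r = [[1.0, 2.0], [3.0, 4.0]]
x_p = [[0.5, 0.5], [4.0, 6.5]]
x_rxn_bar = reaction_vector_no_mapping(x_r, x_p)

# Re-ordering the product atoms leaves the reaction vector unchanged,
# which is why no atom-mapping is needed in this mode.
x_rxn_bar_permuted = reaction_vector_no_mapping(x_r, x_p[::-1])
```

The price of this invariance is that per-atom correspondences between reactants and products are no longer visible to the model.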

3 Results and Discussion

3.1 Model performance

The performance of EquiReact on three diverse datasets (the GDB7-22-TS,101 Cyclo-23-TS 102 and Proparg-21-TS 91, 63) is illustrated in Table 1, compared to previously best baseline models:38 ChemProp,65 a graph neural network that uses atom maps to construct a condensed graph of reaction (CGR), and the SLATM8 fingerprint adapted to reactions by taking the difference between product and reactant fingerprints (SLATM$_d$),36 combined with KRR models (SLATM$_d$+KRR). The models are compared in three regimes: with high-quality atom-mapping information (“True”) derived from the transition state structure or heuristic rules,102, 93, 101, 65, 97 with atom maps obtained using the open-source RXNMapper98 (“RXNMapper”) and without any atom-mapping information at all (“None”). As discussed in recent work,111, 38 previously developed graph-based models for reaction property prediction65, 66, 64, 99, 100 including ChemProp 65 reported prediction errors only in the “True” atom-mapping regime. The “RXNMapper” regime is important for real chemistry where the reaction mechanism is not known and atom-mapping using heuristic rules is impossible. The “None” regime is critical for all chemistry that falls outside of the realm of organic chemistry captured in the patents112 that RXNMapper98 is pre-trained on. SLATM$_d$+KRR always operates in the “None” regime.

The atom-mapping-based model EquiReact$_M$ is used in the “True” and “RXNMapper” regimes. In the “None” regime, EquiReact$_X$ and EquiReact$_S$ were tested. EquiReact$_S$ consistently outperformed EquiReact$_X$, so we include only EquiReact$_S$ and refer the reader to the Supplementary Information for their comparison. EquiReact is compared to ChemProp and SLATM$_d$+KRR baselines using both random splits to measure interpolative capabilities as well as scaffold splits to measure extrapolative capabilities. Scaffold splitting113, 114 clusters molecules based on their 2D backbones (such as Bemis–Murcko scaffolds115) and ensures that the clusters (scaffolds) belonging to the train, test, and validation sets do not overlap. This is a more challenging regime for a model than random splitting where very similar molecules could appear in both the training and test sets.
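The non-overlap property of scaffold splitting can be illustrated with a short sketch. It assumes that a scaffold key (for example a Bemis–Murcko scaffold SMILES) has already been computed for every reaction, and it uses a greedy fill that is common in cheminformatics toolkits; the function name `scaffold_split` and the greedy strategy are illustrative assumptions, not necessarily the procedure used for the results reported here.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def scaffold_split(
    scaffolds: Sequence[str], frac_train: float = 0.9, frac_test: float = 0.05
) -> Tuple[List[int], List[int], List[int]]:
    """Assign whole scaffold groups to train/test/validation index lists."""
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)

    # Place the largest scaffold groups first (a common heuristic, assumed here).
    ordered = sorted(groups.values(), key=len, reverse=True)

    n = len(scaffolds)
    train: List[int] = []
    test: List[int] = []
    val: List[int] = []
    for members in ordered:
        if len(train) + len(members) <= frac_train * n:
            train += members
        elif len(test) + len(members) <= frac_test * n:
            test += members
        else:
            val += members
    return train, test, val


# Toy example with four scaffold keys of very different sizes.
keys = ["A"] * 16 + ["B"] * 2 + ["C", "D"]
train_idx, test_idx, val_idx = scaffold_split(keys)
```

Because groups are never divided, no scaffold can appear in more than one subset, which is what makes the split a test of extrapolation.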

| Dataset (property, units) | Atom-mapping regime | ChemProp | SLATM$_d$+KRR | EquiReact |
|---|---|---|---|---|
| Random splits | | | | |
| GDB7-22-TS ($\Delta E^{\ddagger}$, kcal/mol) | True | $\mathbf{4.60 \pm 0.15}$ | | $5.00 \pm 0.16$ |
| | RXNMapper | $\mathbf{6.1 \pm 0.3}$ | | $6.2 \pm 0.3$ |
| | None | $8.74 \pm 0.24$ | $6.9 \pm 0.3$ | $\mathbf{6.3 \pm 0.3}$ |
| Cyclo-23-TS ($\Delta G^{\ddagger}$, kcal/mol) | True | $2.70 \pm 0.14$ | | $\mathbf{2.42 \pm 0.11}$ |
| | RXNMapper | $2.71 \pm 0.16$ | | $\mathbf{2.44 \pm 0.17}$ |
| | None | $2.69 \pm 0.16$ | $2.70 \pm 0.14$ | $\mathbf{2.32 \pm 0.14}$ |
| Proparg-21-TS ($\Delta E^{\ddagger}$, kcal/mol) | True | $1.58 \pm 0.13$ | | $\mathbf{0.29 \pm 0.06}$ |
| | None | $1.59 \pm 0.15$ | $0.42 \pm 0.11$ | $\mathbf{0.27 \pm 0.04}$ |
| Scaffold splits | | | | |
| GDB7-22-TS ($\Delta E^{\ddagger}$, kcal/mol) | True | $5.0 \pm 0.5$ | | $\mathbf{4.95 \pm 0.19}$ |
| | RXNMapper | $6.6 \pm 0.3$ | | $\mathbf{6.0 \pm 0.3}$ |
| | None | $11.9 \pm 1.0$ | $6.47 \pm 0.22$ | $\mathbf{6.3 \pm 0.3}$ |
| Cyclo-23-TS ($\Delta G^{\ddagger}$, kcal/mol) | True | $2.9 \pm 0.3$ | | $\mathbf{2.73 \pm 0.19}$ |
| | RXNMapper | $\mathbf{2.81 \pm 0.20}$ | | $\mathbf{2.82 \pm 0.19}$ |
| | None | $3.2 \pm 0.3$ | $2.66 \pm 0.15$ | $\mathbf{2.37 \pm 0.21}$ |
| Proparg-21-TS ($\Delta E^{\ddagger}$, kcal/mol) | True | $1.64 \pm 0.25$ | | $\mathbf{0.31 \pm 0.08}$ |
| | None | $1.89 \pm 0.07$ | $0.36 \pm 0.07$ | $\mathbf{0.27 \pm 0.03}$ |

Table 1: Performance as measured in mean absolute errors (MAEs) of predictions of EquiReact vs. state-of-the-art baselines ChemProp and SLATM$_d$ for the GDB7-22-TS,101 Cyclo-23-TS,102 and Proparg-21-TS 91, 63 datasets. All datasets are compared in three atom-mapping regimes: “True”, “RXNMapper” and “None”, except for the Proparg-21-TS set, where RXNMapper cannot map the reaction SMILES. EquiReact$_M$ is used for the “True” and “RXNMapper” regimes, while in the “None” regime results for the EquiReact$_S$ model are reported. MAEs are averaged over 10 folds of random/scaffold 90/5/5 splits (train/test/validation). The best model for each regime, dataset and split type is highlighted in bold.
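For readers reproducing entries of this kind, the following sketch shows how a per-fold MAE and its mean and spread over folds can be computed. The helper names are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error of one test fold."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def summarize_folds(fold_maes: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the per-fold MAEs."""
    return mean(fold_maes), stdev(fold_maes)


# Toy numbers only, not values from the paper.
fold_error = mae([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
average, spread = summarize_folds([1.0, 2.0, 3.0])
```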

3.1.1 GDB7-22-TS dataset

This dataset is distinct from the other two in that it includes variations in the reaction class (and mechanism), thereby showing a greater dependence on the existence and quality of atom-mapping information in the models. It has already been observed38 that for existing models (including ChemProp and SLATM$_d$) using random splits, there is a stark hierarchy in the predictions from the “True” to “RXNMapper” to “None” regimes.

In the “True” regime, EquiReact does not improve predictive capabilities over the ChemProp model with random splits. This points to the importance of the chemical diversity in this set, where knowledge of the reaction mechanism (in the form of atom maps) is sufficient information to predict the reaction barriers without information of the three-dimensional geometries of reactants and products. With scaffold splits, however, EquiReact and ChemProp have the same predictive MAEs within the standard deviations. It can be seen that a model based on geometry information naturally extrapolates better than one trained on atom maps. Bemis–Murcko scaffold115 splitting clusters molecules (here, reactants) based on ring systems. Out-of-sample test molecules may therefore appear “novel” from the point of view of the reaction graph, but will still feature distances and angles close to what the model has seen during training.

Moving to the “RXNMapper” regime, the trend observed in the relative model performance in the “True” regime is exaggerated. Both EquiReact and ChemProp agree within the standard deviation using random splits, while EquiReact outperforms ChemProp using scaffold splits.

In the “None” regime, the difference in the performance of the models is even greater. In terms of MAE, EquiReact outperforms ChemProp by more than 2 kcal/mol using random splits, and more than 5 kcal/mol using scaffold splits. EquiReact’s improvement compared to SLATM$_d$+KRR is smaller. The SLATM$_d$ representation also makes use of 3D coordinates of the reactants and products and is therefore more fundamentally similar to EquiReact than ChemProp. Nevertheless, EquiReact makes use of equivariant components for molecular features (vs. the invariant features of SLATM$_d$) and learns a representation end-to-end, allowing for a more performant model.

Thanks to EquiReact exploiting the chemical information present in atom-maps (if available) and encoding the natural symmetries of molecular reaction components, the stark gap previously observed38 from the “True” to “None” regimes has been diminished, with the prediction MAEs ranging from 5 to 6.3 kcal/mol for the GDB7-22-TS set using random splits and from 4.95 to 6.3 kcal/mol using scaffold splits. This is illustrated in Figure 4, where for the same scaffold split, outliers in the “True” plot successively move closer to the $y=x$ line.

Figure 4: Correlation plots of $\Delta E^{\ddagger}$ values predicted with EquiReact vs. true (computed) labels for the first scaffold split on the GDB7-22-TS dataset, in the three different atom-mapping regimes (“None”, “RXNMapper” and “True”). The same test reactions are highlighted in each subplot, where the reactant is illustrated on top and the product(s) on the bottom for each reaction. The reactants are used to generate the scaffold splits.

3.1.2 Cyclo-23-TS dataset

The Cyclo-23-TS 102 dataset contains a single fixed reaction class and has been previously illustrated to show less dependence on the quality of atom-mapping than the GDB7-22-TS.38

For this set, EquiReact outperforms or matches the other models in all three regimes for both random and scaffold splits. This illustrates that a model based purely on geometry information of reactants and products, without any chemical information in the form of atom-mapping or surrogates thereof, can be highly performant for reaction property prediction.

The best model is obtained in the “None” regime, with EquiReact$_S$ in the energy mode (Figure 3d). As outlined in Section 2.2, in energy mode an energy contribution is learned for reactants’ and products’ atoms separately. In the original publication,102 Stuyver et al. illustrate that the activation barriers correlate linearly with the reaction energy. Since the reaction energy is the difference between products’ and reactants’ energies, the energy mode is the best choice for a model learning the reaction energy, and in the case of this dataset, for the barrier too, due to its linear correlation with the reaction energy.
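The structure of the energy mode with a difference-type combination can be sketched as below. The per-atom contributions are plain numbers here, whereas in the model they would be learned from atomic representations; the function names are hypothetical and the sign convention (products minus reactants) is an assumption that mirrors the definition of a reaction energy.

```python
from typing import Sequence


def quasi_molecular_energy(atomic_contributions: Sequence[float]) -> float:
    """Sum of per-atom contributions for one molecule."""
    return sum(atomic_contributions)


def energy_mode_diff(
    reactants: Sequence[Sequence[float]], products: Sequence[Sequence[float]]
) -> float:
    """Quasi-energy of all products minus quasi-energy of all reactants."""
    e_reactants = sum(quasi_molecular_energy(mol) for mol in reactants)
    e_products = sum(quasi_molecular_energy(mol) for mol in products)
    return e_products - e_reactants


# Toy example: two reactant molecules forming a single product molecule.
delta = energy_mode_diff(
    reactants=[[-1.0, -2.0], [-0.5]],
    products=[[-4.0, 0.25]],
)
```

With this form the prediction is additive in atomic terms, which is a natural fit when the target scales linearly with a reaction energy.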

3.1.3 Proparg-21-TS dataset

The Proparg-21-TS 91, 63 set is a small dataset by neural network standards (753 points) and therefore constitutes a challenge for the data efficiency of our model.

EquiReact obtains the best predictive abilities for both the “None” and “True” regimes; the “RXNMapper” regime is not available since RXNMapper cannot atom-map the reaction SMILES of this set. These results illustrate that, in line with the observations for ENNs for molecules,43, 44, 45, 46, 47, 48, 49, 50 ENNs for chemical reactions can be highly data-efficient and operate in the “low-data” regime.

Geometry-only EquiReact$_S$ results in MAEs 0.15 kcal/mol / 0.09 kcal/mol lower than those of the previous state-of-the-art SLATM$_d$ using random / scaffold splits. Since the enantioselectivity is related to the barrier through an exponential relationship, this difference is significant for the downstream enantioselectivity prediction.63 Like the Cyclo-23-TS set, this dataset consists of a fixed reaction class and the model does not benefit from being provided the “obvious” chemical information: including true atom-maps does not decrease the error.
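To make the sensitivity argument concrete, the sketch below uses the standard transition-state-theory relation between the difference of two competing barriers, $\Delta\Delta E^{\ddagger}$, and the enantiomeric excess, $ee = \tanh(\Delta\Delta E^{\ddagger}/2RT)$, which follows from an enantiomeric ratio of $\exp(\Delta\Delta E^{\ddagger}/RT)$. That this is the exact relation used in Ref. 63 is an assumption, as is the temperature of 298.15 K; the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def enantiomeric_excess(dd_barrier: float, temperature: float = 298.15) -> float:
    """ee in [-1, 1] from the barrier difference (kcal/mol) of two pathways.

    Assumes kinetic control with rate ratio exp(dd_barrier / RT).
    The default temperature is an illustrative assumption.
    """
    return math.tanh(dd_barrier / (2.0 * R_KCAL * temperature))


# Effect of a 0.15 kcal/mol shift in the barrier difference on the predicted ee.
ee_reference = enantiomeric_excess(1.0)
ee_shifted = enantiomeric_excess(1.0 + 0.15)
ee_change = ee_shifted - ee_reference
```

Because of the exponential dependence, errors of a few tenths of a kcal/mol translate into changes of several percentage points in the predicted ee near moderate selectivities.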


These three datasets illustrate the benefits of the flexibility of EquiReact: depending on the datasets’ particular challenges, the model exploits the available information to yield the best-performing model in almost all cases. Since the modes of the model may be specified as hyperparameters, the optimized version of EquiReact can emerge with minimal user intervention.

3.2 Model behaviour

Since the GDB7-22-TS set illustrates the most dependence on the chemical diversity captured in the models, studying EquiReact and baseline models SLATM$_d$ and ChemProp in the “True” regime best captures the difference in their chemical behaviour.

Figure 5: t-SNE maps (perplexity $= 64$) of the latent representations of EquiReact and ChemProp models in the “True” regime and the SLATM$_d$ representation of the GDB7-22-TS dataset, colored by the target $\Delta E^{\ddagger}$ (upper panel) and reaction types (lower panel).

Figure 5 compares the (latent) representations of EquiReact, ChemProp and SLATM$_d$ using t-SNE116 maps. In the upper panel, we find that the quality of the correlation between the representations and the target property corresponds to the relative performance of the models in this regime (Table 1). The best-performing ChemProp shows a smooth transition of the target property across the plot, while EquiReact’s gradient is somewhat distorted and the map of the worst-performing SLATM$_d$ does not have a clear structure. The lower panel shows the correlation of the representations with the reaction types. ChemProp, as a chemically-inspired model, illustrates clearer clusters in the reaction type, whereas EquiReact does not, using different information (distances and angles) to correlate its latent space representation with the target property. SLATM$_d$ lies somewhere in-between, illustrating some clusters of reaction class. While SLATM$_d$ is also a distance-based model, the binning structure used to create the representation8, 36 may result in better correlation with the reaction types, since the pairwise bins for example naturally cluster features such as C–H bond formation or breaking. Nevertheless, as discussed in Section 3.1, EquiReact makes better use of distance information alone and out-performs SLATM$_d$.

Figure 6: Box plots illustrating how EquiReact “True” performs for the most common reaction types in the GDB7-22-TS set. EquiReact is constructed without explicit H nodes in the graphs (the default construction). The boxes range from the first to the third quartile of the datapoints. The whiskers limit 90% of the datapoints and the individual points illustrate outliers. The points correspond to the test set of the first random split.

Figure 6 illustrates how EquiReact “True” performs for the most common reaction types in the GDB7-22-TS set defined by bond breaking and formation (see Section 5.1). EquiReact performs universally well across the different reaction types, with consistently low errors and relatively small standard deviations. The reactions for which the model has higher mean errors and standard deviations (+C–H,−C–H (blue) and +H–H,−C–H,−C–H (green)) correspond to those involving C–H features. Since the model is trained without explicit H nodes in the graph, features associated with X–H bonds are included implicitly in the model. Since C is the most frequently occurring element in various different configurations, capturing all the C–H features is more challenging than the O–H features for example, which will be more similar to one another. The equivalent plot for the model trained with explicit H nodes is shown in the Supplementary Information, illustrating that the standard deviations reduce for the reaction types involving C–H features.

3.3 Geometry quality

To illustrate that EquiReact does not require high-quality molecular structures to be used in an out-of-sample scenario, we train and test a model using lower-quality GFN2-xTB117 (xTB) geometries to predict higher-level barriers (CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP for GDB7-22-TS, B3LYP-D3(BJ)/def2-TZVP//B3LYP-D3(BJ)/def2-SVP for Cyclo-23-TS and B97D/TZV(2p,2d) for Proparg-21-TS). The results are illustrated in Figure 7 for the three datasets with DFT and xTB geometries, and compared to the SLATM$_d$+KRR model in the same settings. EquiReact benefits from a lower sensitivity to the geometry quality compared to the pre-designed representation SLATM$_d$+KRR across the three datasets.

Figure 7: Mean Absolute Errors (MAEs) for predictions using either the provided geometries (ωB97X-D3/def2-TZVP for GDB7-22-TS, B3LYP-D3(BJ)/def2-SVP for Cyclo-23-TS, B97D/TZV(2p,2d) for Proparg-21-TS) (DFT) or lower-quality GFN2-xTB117 (xTB) geometries with no atom-mapping (“None”), “RXNMapper”98 mapping or the “True” atom-mapping. MAEs are averaged over 10 folds of random 90/5/5 splits (train/test/validation). Note that for GDB7-22-TS and Cyclo-23-TS datasets the DFT results are different from those presented in Sec. 3.1 because here they are obtained on the same subset as the xTB results (see Sec. 5.3).

For the GDB7-22-TS set, there is a negligible difference in model performance moving from DFT to xTB geometries. The xTB geometries are a good proxy for the DFT geometries here, since this set consists of small, charge-neutral organic molecules, which are largely well-described by semiempirical methods. For the Cyclo-23-TS set, while the molecules are still organic, they are larger than those in the GDB7-22-TS set and there is a greater divergence between the GFN2-xTB and DFT geometries, resulting in a larger deterioration with these geometries. A figure in the Supplementary Information illustrates that models trained on lower quality (i.e., xTB) geometries do not produce higher errors for molecules with particularly high deviation from DFT geometries. Rather, there is a consistent deterioration in the model performance when training on xTB geometries and predicting on DFT barriers, if the xTB geometries are a poor proxy for the DFT ones.

The Proparg-21-TS set is the most complex of the three for GFN2-xTB, since these systems with charged organosilicon compounds differ considerably from those used to parameterize semi-empirical methods or force field methods. As described in Section 5.3, unlike for the other datasets where we generate an initial structure from SMILES using force field methods, for this set it is impossible and we instead generate xTB geometries from the DFT ones. While this is not a feasible geometry generation pipeline out-of-sample, it still demonstrates how different methods perform with high and low-quality geometries. Here, we see that EquiReact is significantly less sensitive than SLATM$_d$+KRR and the variation trained with lower quality geometries still offers competitive errors ($0.48 \pm 0.21$ kcal/mol for the “None” model).

4 Conclusions

Despite the crowning of equivariant neural networks as best-in-class for the prediction of computed molecular properties, the equivalent has not been well-established for the prediction of reaction properties. We contribute to this domain by introducing EquiReact, an equivariant neural network constructed from the three-dimensional coordinates of reactants and products. While other graph-based models for reactions rely on atom-mapped reaction SMILES, EquiReact is flexible and can include atom-mapping if it is available. Particularly in the regime without atom-mapping information, EquiReact outperforms existing baselines99, 66, 38 for the prediction of reaction barriers on the GDB7-22-TS,101 Cyclo-23-TS 102 and Proparg-21-TS 63, 91 datasets. The latter dataset in particular illustrates both the data efficiency of our model (with a total dataset size of 753 datapoints) as well as EquiReact’s forte in describing subtle changes in geometry. EquiReact demonstrates superior extrapolation capabilities compared to the 2D-graph-based models for all datasets and in all regimes tested. It also suffers less in moving from DFT-level to GFN2-xTB-level117 geometries compared to existing methods. These points illustrate its utility in out-of-sample scenarios.

5 Methods

5.1 Datasets

We test EquiReact on three datasets of reaction barriers previously used to benchmark reaction representations.38 In all cases, optimized three-dimensional structures of reactants and products are provided, which are used to train models and make predictions. The activation barrier is not a direct function of these structures, but using the TS structure to make predictions removes the utility of the ML models vs. direct computation of the TS. Thus we use an implicit interpolation of reactants’ and products’ structures as a proxy for the TS as in previous works.36, 63, 38

The GDB7-22-TS 101 dataset consists of close to 12,000 diverse organic reactions automatically constructed from the GDB7 dataset118, 119, 55 using the growing string method120 along with corresponding energy barriers ($\Delta E^{\ddagger}$) computed at the CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP level. The dataset provides atom-mapped SMILES, with “True” maps derived from the transition state. For 43 reactions out of 11,926, the SMILES of one of the products represents a molecule different from the xyz structure. These reactions were therefore excluded from the dataset, leading to a modified GDB7-22-TS set used here.

While there are no pre-defined classes for all the reactions in the GDB7-20-TS93 or GDB7-22-TS 101 sets, Grambow et al.64 split the dataset into reactions undergoing certain bond changes: for example, the most common type was breaking of a C–H bond (−C–H) and a C–C bond (−C–C) in the reactants and formation of a C–H bond (+C–H) in the products, giving the reaction type signature +C–H,−C–C,−C–H. Here, we extract similar reaction types by comparing the connectivity matrices from atom-mapped reaction SMILES of reactants and products (ignoring bond orders). The most abundant reaction types in the dataset are +C–H,−C–C,−C–H (1667 reactions), +H–N,−C–H (633), +C–H,−C–H (619), +H–O,−C–H,−C–O (599) and +H–H,−C–H,−C–H (517).
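The extraction of a reaction type signature from connectivity changes can be sketched as follows. The sketch assumes that bonds have already been read from atom-mapped SMILES as pairs of map numbers; the function name `reaction_signature`, the input layout and the alphabetical ordering rules are illustrative assumptions chosen to reproduce the notation above (with ASCII hyphens in place of dashes).

```python
from typing import Dict, Iterable, Tuple


def reaction_signature(
    elements: Dict[int, str],
    reactant_bonds: Iterable[Tuple[int, int]],
    product_bonds: Iterable[Tuple[int, int]],
) -> str:
    """Signature such as '+C-H,-C-C,-C-H' from bonds given as map-number pairs.

    Bond orders are ignored: only the presence or absence of a bond matters.
    """
    before = {frozenset(bond) for bond in reactant_bonds}
    after = {frozenset(bond) for bond in product_bonds}

    def label(bond: frozenset) -> str:
        first, second = sorted(elements[i] for i in bond)
        return f"{first}-{second}"

    formed = sorted("+" + label(bond) for bond in after - before)
    broken = sorted("-" + label(bond) for bond in before - after)
    return ",".join(formed + broken)


# Toy example: an H atom migrates from C2 to C1 while the C1-C2 bond breaks.
atoms = {1: "C", 2: "C", 3: "H"}
signature = reaction_signature(
    atoms,
    reactant_bonds=[(1, 2), (2, 3)],
    product_bonds=[(1, 3)],
)
```

Counting identical signatures over a dataset would then give per-type totals of the kind listed above.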

The original Cyclo-23-TS 102 dataset encompasses 5,269 profiles for [3+2] cycloaddition reactions with activation free energies ($\Delta G^{\ddagger}$) computed at the B3LYP-D3(BJ)/def2-TZVP//B3LYP-D3(BJ)/def2-SVP level. The dataset provides atom-mapped SMILES with “True” maps for heavy atoms derived from either the transition state structure or heuristic rules. For the regime with explicit hydrogen atoms, we atom-mapped the xyz files by matching the reactants, given in two separate files, to the provided transition state structure, which closely resembles the two reactants and has the same atom order as in the products. This was done with a labeled graph matching algorithm as implemented in NetworkX.121, 122 The algorithm is unaware of chirality, double-bond stereochemistry or conformations, and thus may produce atom-mappings that are not exactly correct. We also found that for four reactions the product SMILES and xyz files depict different species, thus the set was reduced to 5,265 reactions.

The Proparg-21-TS dataset91, 63 contains 753 structures of intermediates before and after the enantioselective transition state of benzaldehyde propargylation, with activation energies ($\Delta E^{\ddagger}$) computed at the B97D/TZV(2p,2d) level. SMILES strings (“Fragment-Based” SMILES) and “True” atom-maps are not provided in the original dataset; these are taken from Ref. 38.

RXNMapper98-mapped versions of GDB7-22-TS and Cyclo-23-TS were obtained with RXNMapper 0.3.0 with the default settings. The Proparg-21-TS set cannot be mapped, because the underlying libraries cannot process its SMILES. Since RXNMapper sorts molecules in case of multiple reactants and/or products, which would complicate SMILES–xyz matching (see Section 5.2 below), we used a locally modified version that does not change the molecule order.

5.2 Matching SMILES strings to xyz geometries

EquiReact makes use of both the graph structure of a molecule (as provided in the SMILES string) and the three-dimensional structure (in the xyz). The atoms in the graph are associated with the atomic coordinates provided in the xyz file. Thanks to the way the GDB7-22-TS dataset101 was generated, the atomic coordinates can be easily matched to SMILES, which in turn allows reactants to be atom-mapped to products. However, we also tested RXNMapper-mapped SMILES which do not respect the same constraints. Therefore, for consistency, we use a SMILES–xyz matching procedure detailed below.

We construct molecular graphs from xyz using covalent radii and match them to RDKit105 molecular graphs obtained from SMILES with a labeled graph matching algorithm as implemented in NetworkX.121, 122 This procedure is, however, unaware of chirality and double-bond stereochemistry, thus some of the matches might be incorrect. Still, it provides a flexible method that can be applied to any dataset consisting of SMILES strings and xyz files.

The same procedure was applied to the Cyclo-23-TS dataset in the few cases when the canonical SMILES have a different atom ordering than xyz.
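The first step of this procedure, building a bond list from Cartesian coordinates with covalent radii, can be sketched as below. The radii are approximate literature values in Å, and the tolerance factor of 1.3 and the function name `bonds_from_xyz` are assumptions made for illustration; the subsequent labeled graph matching against the SMILES graph is not shown.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set, Tuple

# Approximate covalent radii in angstrom (illustrative subset of elements).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def bonds_from_xyz(
    symbols: Sequence[str],
    coords: Sequence[Sequence[float]],
    scale: float = 1.3,  # tolerance factor: an assumption, not a value from the paper
) -> Set[Tuple[int, int]]:
    """Connect atoms closer than scale * (r_i + r_j)."""
    bonds: Set[Tuple[int, int]] = set()
    for i, j in combinations(range(len(symbols)), 2):
        cutoff = scale * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]])
        if math.dist(coords[i], coords[j]) <= cutoff:
            bonds.add((i, j))
    return bonds


# Toy example: a water molecule with O-H distances of about 0.96 angstrom.
water_symbols: List[str] = ["O", "H", "H"]
water_coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
water_bonds = bonds_from_xyz(water_symbols, water_coords)
```

The resulting bond set, together with the element labels, defines the labeled graph that would be matched to the graph obtained from SMILES.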

5.3 xTB geometry generation

For the GDB7-22-TS and Cyclo-23-TS datasets, the starting structures were generated from SMILES using the distance-geometry embedding implemented in RDKit105 with the srETKDGv3 settings.123 Ten conformations were produced per molecule, which were then energy-ranked with the MMFF94 implementation124 in RDKit, defaulting to UFF in case of missing parameters. The lowest energy conformer was retained. For the Proparg-21-TS set, the original B97D/TZV(2p,2d) geometries were used as a starting point, because the stereochemical and conformational diversity of this set cannot be completely encoded with SMILES. Therefore MMFF94 will fail to generate an initial geometry from SMILES.

For all the sets, the starting structures were optimized at the GFN2-xTB semiempirical level of theory117 at the “loose” convergence level for a maximum of 1000 iterations using xTB v6.2 RC2. For 969 reactions of the GDB7-22-TS set and 491 reactions of the Cyclo-23-TS set, at least one of the participating molecules either could not converge to any reasonable configuration or converged to a structure not matching the SMILES. These reactions were excluded from the geometry quality tests (Sec. 3.3).

5.4 Model training

EquiReact was trained using the Adam optimizer 125 with learning rate and weight decay parameters as hyperparameters to be optimized. The learning rate was reduced by 40% after 60 epochs of no improvement in the validation MAE, as in Ref. 106. Models were trained for a maximum of 512 epochs, using early stopping after 150 epochs of no improvement. The model with the best validation score was then used to make predictions on the test set.
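The interplay of the learning-rate reduction and early stopping can be illustrated with a small simulation over a sequence of validation MAEs. This is a simplified sketch: the function name is hypothetical, the starting learning rate is arbitrary, and the exact counting conventions of the scheduler used in practice may differ by one epoch.

```python
from typing import Sequence, Tuple


def simulate_training(
    val_maes: Sequence[float],
    lr: float = 1e-3,          # arbitrary starting value for illustration
    factor: float = 0.6,       # "reduced by 40%"
    lr_patience: int = 60,
    stop_patience: int = 150,
    max_epochs: int = 512,
) -> Tuple[int, float, int]:
    """Return (best epoch, final learning rate, last epoch run)."""
    best_mae, best_epoch = float("inf"), -1
    bad_epochs_lr = 0     # epochs without improvement since the last LR change
    bad_epochs_stop = 0   # epochs without improvement since the best epoch
    last_epoch = -1
    for epoch, val_mae in enumerate(val_maes[:max_epochs]):
        last_epoch = epoch
        if val_mae < best_mae:
            best_mae, best_epoch = val_mae, epoch
            bad_epochs_lr = 0
            bad_epochs_stop = 0
        else:
            bad_epochs_lr += 1
            bad_epochs_stop += 1
        if bad_epochs_lr >= lr_patience:
            lr *= factor
            bad_epochs_lr = 0
        if bad_epochs_stop >= stop_patience:
            break  # early stopping
    return best_epoch, lr, last_epoch


# Toy run: the validation MAE never improves after the first epoch.
best_epoch, final_lr, last_epoch = simulate_training([1.0] + [2.0] * 200)
```

In this toy run the learning rate is reduced twice before training stops, and the model from the best epoch would be the one evaluated on the test set.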

The optimal model hyperparameters were searched within the following values: learning rate $\in [5\cdot10^{-5}, 10^{-4}, 5\cdot10^{-4}, 10^{-3}]$; weight decay parameter $\in [10^{-5}, 10^{-4}, 10^{-3}, 0]$; node and edge features embedding size $n_s \in [16, 32, 48, 64]$; $\ell{=}1$ hidden space size $n_v \in [16, 32, 48, 64]$; number of edge features $n_g \in [16, 32, 48, 64]$; number of convolutional layers $n_{\mathrm{conv}} \in [2, 3]$; radial cutoff $r_{\max} \in [2.5, 5.0, 10.0]$; maximum number of atom neighbors $n_{\mathrm{neigh}} \in [10, 25, 50]$; dropout probability $p_d \in [0.0, 0.05, 0.1]$; sum_mode $\in$ [node, both]; combine_mode $\in$ [mlp, diff, mean, sum]; graph_mode $\in$ [energy, vector].
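Written as a plain dictionary, the search space above looks as follows. Only `sum_mode`, `combine_mode` and `graph_mode` are parameter names taken from the text; the remaining keys are hypothetical labels chosen for this sketch. Counting the combinations shows why an exhaustive grid is impractical and a sampling-based search is the natural choice.

```python
import math
from typing import Dict, List

# Keys other than sum_mode, combine_mode and graph_mode are hypothetical labels.
SEARCH_SPACE: Dict[str, List] = {
    "learning_rate": [5e-5, 1e-4, 5e-4, 1e-3],
    "weight_decay": [1e-5, 1e-4, 1e-3, 0.0],
    "n_s": [16, 32, 48, 64],
    "n_v": [16, 32, 48, 64],
    "n_g": [16, 32, 48, 64],
    "n_conv": [2, 3],
    "r_max": [2.5, 5.0, 10.0],
    "n_neigh": [10, 25, 50],
    "dropout_p": [0.0, 0.05, 0.1],
    "sum_mode": ["node", "both"],
    "combine_mode": ["mlp", "diff", "mean", "sum"],
    "graph_mode": ["energy", "vector"],
}


def n_configurations(space: Dict[str, List]) -> int:
    """Number of distinct configurations in a full grid over the space."""
    return math.prod(len(values) for values in space.values())


total = n_configurations(SEARCH_SPACE)
```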

The hyperparameter search was done using EquiReact$_S$ (without attention or mapping), using Bayesian search as implemented in Weights & Biases.126 For the EquiReact$_X$ regime, the learning rate and weight decay parameter were optimized afterward in a grid search, setting the other parameters to those from the search for EquiReact$_S$. Hydrogen atoms were included as nodes in the graphs. Sweeps were run for 128 epochs for the GDB7-22-TS and Proparg-21-TS sets, and for 256 epochs for the Cyclo-23-TS set on a single random split. The parameters resulting in the best validation error, summarized in the Supplementary Information, were used for all the other model settings.

5.5 Baseline models

The ChemProp model is based on a Condensed Graph of Reaction (CGR)65 built from atom-mapped SMILES strings of reactants and products, which is then passed through the directed message-passing neural network chemprop114 (version 1.5.0). Models are trained using the default parameters of chemprop.

Molecular SLATM vectors were generated using the qml Python package127 before being combined to form the reaction version SLATM$_d$. SLATM$_d$ is combined with Kernel Ridge Regression (KRR) models using the best kernel functions as in van Gerwen et al.38 The kernel width and regularization parameters were optimized on the first of the ten random-split folds, in line with how the hyperparameters were optimized for EquiReact.
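The baseline described above can be sketched as follows. Several assumptions are made for illustration: the reaction vector is taken as the sum of product vectors minus the sum of reactant vectors, random vectors stand in for the SLATM representations (no qml calls are used), a Laplacian kernel is chosen arbitrarily rather than being the kernel selected in the cited work, and the hyperparameter grids are placeholders.

```python
import numpy as np


def reaction_difference(reactant_vecs, product_vecs):
    """Reaction vector: sum over products minus sum over reactants (assumed form)."""
    return np.sum(product_vecs, axis=0) - np.sum(reactant_vecs, axis=0)


def laplacian_kernel(A, B, sigma):
    """K_ij = exp(-||a_i - b_j||_1 / sigma)."""
    dist = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=-1)
    return np.exp(-dist / sigma)


def krr_fit(X, y, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression weights alpha."""
    K = laplacian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_test, sigma):
    """Predict targets for X_test from the fitted weights."""
    return laplacian_kernel(X_test, X_train, sigma) @ alpha


rng = np.random.default_rng(0)

# Random stand-ins for molecular vectors: 40 reactions with two reactants
# and one product each, and 8 features per molecule.
X = np.array([
    reaction_difference(rng.random((2, 8)), rng.random((1, 8)))
    for _ in range(40)
])
y = X @ rng.random(8)  # synthetic targets, for illustration only

# A single train/validation split plays the role of the first fold.
X_tr, y_tr, X_val, y_val = X[:30], y[:30], X[30:], y[30:]

# Grid search over kernel width and regularization on that split.
best = min(
    (
        np.mean(np.abs(krr_predict(X_tr, krr_fit(X_tr, y_tr, s, l), X_val, s) - y_val)),
        s,
        l,
    )
    for s in (1.0, 10.0, 100.0)
    for l in (1e-8, 1e-4, 1e-2)
)
print("validation MAE, sigma, lambda:", best)
```

In this sketch the selected pair of kernel width and regularization strength would then be reused for the remaining folds, mirroring the single-split optimization described for EquiReact.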

Code and Data Availability

The code is available as a GitHub repository at https://github.com/lcmd-epfl/EquiReact. The versions of the datasets used, as well as any processing applied to them, can be found in the same repository.

Supplementary Information

Supplementary Information is provided in the freely available file SI.pdf, detailing the hyperparameters of the different models tested, the model performance with and without explicit hydrogen atoms, and the discussion of the model with a cross-attention surrogate for atom-mapping.

Author Information

Author contributions

P.v.G. and C.B. conceptualized the project. EquiReact and support codes were written and run by P.v.G. and K.R.B., with design suggestions from C.B. and V.R.S. Results were analyzed by P.v.G., K.R.B., C.B. and V.R.S. xTB computations were run by R.L. The original draft was written by P.v.G. and K.R.B. with reviews and edits from all authors. C.C. and A.K. provided supervision and are acknowledged for acquiring funding.

Conflict of interest

The authors have no conflicts to disclose.

Acknowledgements

P.v.G., C.B., V.R.S., R.L., A.K. and C.C. acknowledge the National Centre of Competence in Research (NCCR) “Sustainable chemical process through catalysis (Catalysis)”, grant number 180544, of the Swiss National Science Foundation (SNSF) for financial support. K.R.B. and C.C. were supported by the European Research Council (grant number 817977) and by the National Centre of Competence in Research (NCCR) “Materials’ Revolution: Computational Design and Discovery of Novel Materials (MARVEL)”, grant number 205602, of the Swiss National Science Foundation.

References

  • Behler and Parrinello 2007 Behler, J.; Parrinello, M. Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces. Phys. Rev. Lett. 2007, 98, 146401
  • Rupp et al. 2012 Rupp, M.; Tkatchenko, A.; Müller, K.-R.; von Lilienfeld, O. A. Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning. Phys. Rev. Lett. 2012, 108, 058301
  • Bartók et al. 2013 Bartók, A. P.; Kondor, R.; Csányi, G. On representing chemical environments. Phys. Rev. B 2013, 87, 184115
  • Hansen et al. 2015 Hansen, K.; Biegler, F.; Ramakrishnan, R.; Pronobis, W.; von Lilienfeld, O. A.; Müller, K.-R.; Tkatchenko, A. Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space. J. Phys. Chem. Lett. 2015, 6, 2326–2331
  • Huo and Rupp 2017 Huo, H.; Rupp, M. Unified representation for machine learning of molecules and crystals. arXiv preprint 2017, arXiv:1704.06439
  • Faber et al. 2018 Faber, F. A.; Christensen, A. S.; Huang, B.; von Lilienfeld, O. A. Alchemical and structural distribution based representation for universal quantum machine learning. J. Chem. Phys. 2018, 148, 241717
  • Christensen et al. 2020 Christensen, A. S.; Bratholm, L. A.; Faber, F. A.; von Lilienfeld, O. A. FCHL revisited: Faster and more accurate quantum machine learning. J. Chem. Phys. 2020, 152, 044107
  • Huang and von Lilienfeld 2020 Huang, B.; von Lilienfeld, O. A. Quantum machine learning using atom-in-molecule-based fragments selected on the fly. Nat. Chem. 2020, 12, 945–951
  • Drautz 2019 Drautz, R. Atomic cluster expansion for accurate and transferable interatomic potentials. Phys. Rev. B 2019, 99, 014104
  • Dusson et al. 2022 Dusson, G.; Bachmayr, M.; Csányi, G.; Drautz, R.; Etter, S.; van der Oord, C.; Ortner, C. Atomic cluster expansion: Completeness, efficiency and stability. J. Comput. Phys. 2022, 454, 110946
  • Grisafi and Ceriotti 2019 Grisafi, A.; Ceriotti, M. Incorporating long-range physics in atomic-scale machine learning. J. Chem. Phys. 2019, 151, 204105
  • Grisafi et al. 2021 Grisafi, A.; Nigam, J.; Ceriotti, M. Multi-scale approach for the prediction of atomic scale properties. Chem. Sci. 2021, 12, 2078–2090
  • Nigam et al. 2020 Nigam, J.; Pozdnyakov, S.; Ceriotti, M. Recursive evaluation and iterative contraction of $N$-body equivariant features. J. Chem. Phys. 2020, 153, 121101
  • Fabrizio et al. 2022 Fabrizio, A.; Briling, K. R.; Corminboeuf, C. SPAHM: the Spectrum of Approximated Hamiltonian Matrices representations. Digital Discovery 2022, 1, 286–294
  • Briling et al. 2023 Briling, K. R.; Calvino Alonso, Y.; Fabrizio, A.; Corminboeuf, C. SPA$^{\mathrm{H}}$M(a,b): encoding the density information from guess Hamiltonian in quantum machine learning representations. arXiv preprint 2023, arXiv:2309.02950
  • Karandashev and von Lilienfeld 2022 Karandashev, K.; von Lilienfeld, O. A. An orbital-based representation for accurate quantum machine learning. J. Chem. Phys. 2022, 156, 114101
  • Llenga and Gryn’ova 2023 Llenga, S.; Gryn’ova, G. Matrix of orthogonalized atomic orbital coefficients representation for radicals and ions. J. Chem. Phys. 2023, 158, 214116
  • Li et al. 2015 Li, Z.; Kermode, J. R.; De Vita, A. Molecular dynamics with on-the-fly machine learning of quantum-mechanical forces. Phys. Rev. Lett. 2015, 114, 096405
  • Chmiela et al. 2017 Chmiela, S.; Tkatchenko, A.; Sauceda, H. E.; Poltavsky, I.; Schütt, K. T.; Müller, K.-R. Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 2017, 3, e1603015
  • Chmiela et al. 2018 Chmiela, S.; Sauceda, H. E.; Müller, K.-R.; Tkatchenko, A. Towards exact molecular dynamics simulations with machine-learned force fields. Nat. Commun. 2018, 9, 3887
  • Behler 2017 Behler, J. First principles neural network potentials for reactive simulations of large molecular and condensed systems. Angew. Chem. Int. Ed. 2017, 56, 12828–12840
  • Smith et al. 2018 Smith, J. S.; Nebgen, B.; Lubbers, N.; Isayev, O.; Roitberg, A. E. Less is more: Sampling chemical space with active learning. J. Chem. Phys. 2018, 148, 241733
  • Bereau et al. 2015 Bereau, T.; Andrienko, D.; von Lilienfeld, O. A. Transferable atomic multipole machine learning models for small organic molecules. J. Chem. Theory Comput. 2015, 11, 3225–3233
  • Grisafi et al. 2018 Grisafi, A.; Wilkins, D. M.; Csányi, G.; Ceriotti, M. Symmetry-adapted machine learning for tensorial properties of atomistic systems. Phys. Rev. Lett. 2018, 120, 036002
  • Wilkins et al. 2019 Wilkins, D. M.; Grisafi, A.; Yang, Y.; Lao, K. U.; DiStasio Jr, R. A.; Ceriotti, M. Accurate molecular polarizabilities with coupled cluster theory and machine learning. Proc. Natl. Acad. Sci. U.S.A. 2019, 116, 3401–3406
  • Montavon et al. 2013 Montavon, G.; Rupp, M.; Gobre, V.; Vazquez-Mayagoitia, A.; Hansen, K.; Tkatchenko, A.; Müller, K.-R.; von Lilienfeld, O. A. Machine learning of molecular electronic properties in chemical compound space. New J. Phys. 2013, 15, 095003
  • Mazouin et al. 2022 Mazouin, B.; Schöpfer, A. A.; von Lilienfeld, O. A. Selected machine learning of HOMO–LUMO gaps with improved data-efficiency. Mater. Adv. 2022, 3, 8306–8316
  • Brockherde et al. 2017 Brockherde, F.; Vogt, L.; Li, L.; Tuckerman, M. E.; Burke, K.; Müller, K.-R. Bypassing the Kohn-Sham equations with machine learning. Nat. Commun. 2017, 8, 872
  • Grisafi et al. 2019 Grisafi, A.; Fabrizio, A.; Meyer, B.; Wilkins, D. M.; Corminboeuf, C.; Ceriotti, M. Transferable machine-learning model of the electron density. ACS Cent. Sci. 2019, 5, 57–64
  • Fabrizio et al. 2019 Fabrizio, A.; Grisafi, A.; Meyer, B.; Ceriotti, M.; Corminboeuf, C. Electron density learning of non-covalent systems. Chem. Sci. 2019, 10, 9424–9432
  • Musil et al. 2021 Musil, F.; Grisafi, A.; Bartók, A. P.; Ortner, C.; Csányi, G.; Ceriotti, M. Physics-Inspired Structural Representations for Molecules and Materials. Chem. Rev. 2021, 121, 9759–9815
  • Langer et al. 2022 Langer, M. F.; Goessmann, A.; Rupp, M. Representations of molecules and materials for interpolation of quantum-mechanical simulations via machine learning. npj Comput. Mater. 2022, 8, 41
  • Huang and von Lilienfeld 2021 Huang, B.; von Lilienfeld, O. A. Ab Initio Machine Learning in Chemical Compound Space. Chem. Rev. 2021, 121, 10001–10036
  • Kulik et al. 2022 Kulik, H. J.; Hammerschmidt, T.; Schmidt, J.; Botti, S.; Marques, M. A. L.; Boley, M.; Scheffler, M.; Todorović, M.; Rinke, P.; Oses, C.; Smolyanyuk, A.; Curtarolo, S.; Tkatchenko, A.; Bartók, A. P.; Manzhos, S.; Ihara, M.; Carrington, T.; Behler, J.; Isayev, O.; Veit, M.; Grisafi, A.; Nigam, J.; Ceriotti, M.; Schütt, K. T.; Westermayr, J.; Gastegger, M.; Maurer, R. J.; Kalita, B.; Burke, K.; Nagai, R.; Akashi, R.; Sugino, O.; Hermann, J.; Noé, F.; Pilati, S.; Draxl, C.; Kuban, M.; Rigamonti, S.; Scheidgen, M.; Esters, M.; Hicks, D.; Toher, C.; Balachandran, P. V.; Tamblyn, I.; Whitelam, S.; Bellinger, C.; Ghiringhelli, L. M. Roadmap on Machine learning in electronic structure. Electron. Struct. 2022, 4, 023004
  • Glielmo et al. 2017 Glielmo, A.; Sollich, P.; De Vita, A. Accurate interatomic force fields via machine learning with covariant kernels. Phys. Rev. B 2017, 95, 214302
  • van Gerwen et al. 2022 van Gerwen, P.; Fabrizio, A.; Wodrich, M. D.; Corminboeuf, C. Physics-based representations for machine learning properties of chemical reactions. Mach. Learn.: Sci. Technol. 2022, 3, 045005
  • Faber et al. 2017 Faber, F. A.; Hutchison, L.; Huang, B.; Gilmer, J.; Schoenholz, S. S.; Dahl, G. E.; Vinyals, O.; Kearnes, S.; Riley, P. F.; von Lilienfeld, O. A. Prediction errors of molecular machine learning models lower than hybrid DFT error. J. Chem. Theory Comput. 2017, 13, 5255–5264
  • van Gerwen et al. 2023 van Gerwen, P.; Briling, K. R.; Calvino Alonso, Y.; Franke, M.; Corminboeuf, C. Benchmarking machine-readable vectors of chemical reactions on computed activation barriers. ChemRxiv preprint 2023, doi:10.26434/chemrxiv-2023-0hgbc
  • Schütt et al. 2017 Schütt, K.; Kindermans, P.-J.; Sauceda Felix, H. E.; Chmiela, S.; Tkatchenko, A.; Müller, K.-R. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. Adv. Neural Inf. Process. Syst. 2017, 30, 991–1001
  • Unke and Meuwly 2019 Unke, O. T.; Meuwly, M. PhysNet: A neural network for predicting energies, forces, dipole moments, and partial charges. J. Chem. Theory Comput. 2019, 15, 3678–3693
  • Gasteiger et al. 2020 Gasteiger, J.; Groß, J.; Günnemann, S. Directional message passing for molecular graphs. arXiv preprint 2020, arXiv:2003.03123
  • Gilmer et al. 2017 Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; Vinyals, O.; Dahl, G. E. Neural message passing for quantum chemistry. International conference on machine learning. 2017; pp 1263–1272
  • Batzner et al. 2022 Batzner, S.; Musaelian, A.; Sun, L.; Geiger, M.; Mailoa, J. P.; Kornbluth, M.; Molinari, N.; Smidt, T. E.; Kozinsky, B. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 2022, 13, 2453
  • Gasteiger et al. 2021 Gasteiger, J.; Becker, F.; Günnemann, S. GemNet: Universal directional graph neural networks for molecules. Adv. Neural Inf. Process. Syst. 2021, 34, 6790–6802
  • Haghighatlari et al. 2022 Haghighatlari, M.; Li, J.; Guan, X.; Zhang, O.; Das, A.; Stein, C. J.; Heidar-Zadeh, F.; Liu, M.; Head-Gordon, M.; Bertels, L.; Hao, H.; Leven, I.; Head-Gordon, T. NewtonNet: A Newtonian message passing network for deep learning of interatomic potentials and forces. Digital Discovery 2022, 1, 333–343
  • Qiao et al. 2020 Qiao, Z.; Welborn, M.; Anandkumar, A.; Manby, F. R.; Miller, T. F. OrbNet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital features. J. Chem. Phys. 2020, 153, 124111
  • Thomas et al. 2018 Thomas, N.; Smidt, T.; Kearnes, S.; Yang, L.; Li, L.; Kohlhoff, K.; Riley, P. Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv preprint 2018, arXiv:1802.08219
  • Townshend et al. 2020 Townshend, R. J.; Townshend, B.; Eismann, S.; Dror, R. O. Geometric prediction: Moving beyond scalars. arXiv preprint 2020, arXiv:2006.14163
  • Anderson et al. 2019 Anderson, B.; Hy, T. S.; Kondor, R. Cormorant: Covariant molecular neural networks. Adv. Neural Inf. Process. Syst. 2019, 32, 14537–14546
  • Satorras et al. 2021 Satorras, V. G.; Hoogeboom, E.; Welling, M. E(n) Equivariant Graph Neural Networks. Proceedings of the 38th International Conference on Machine Learning. 2021; pp 9323–9332
  • Christensen et al. 2021 Christensen, A. S.; Sirumalla, S. K.; Qiao, Z.; O’Connor, M. B.; Smith, D. G.; Ding, F.; Bygrave, P. J.; Anandkumar, A.; Welborn, M.; Manby, F. R.; Miller, T. F. OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy. J. Chem. Phys. 2021, 155, 204103
  • Schütt et al. 2021 Schütt, K.; Unke, O.; Gastegger, M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. Proceedings of the 38th International Conference on Machine Learning. 2021; pp 9377–9388
  • Unke et al. 2021 Unke, O. T.; Chmiela, S.; Gastegger, M.; Schütt, K. T.; Sauceda, H. E.; Müller, K.-R. SpookyNet: Learning force fields with electronic degrees of freedom and nonlocal effects. Nat. Commun. 2021, 12, 7273
  • Cheng et al. 2019 Cheng, L.; Welborn, M.; Christensen, A. S.; Miller, T. F. A universal density matrix functional from molecular orbital-based machine learning: Transferability across organic molecules. J. Chem. Phys. 2019, 150, 131103
  • Ramakrishnan et al. 2014 Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 2014, 1, 140022
  • Musaelian et al. 2023 Musaelian, A.; Batzner, S.; Johansson, A.; Sun, L.; Owen, C. J.; Kornbluth, M.; Kozinsky, B. Learning local equivariant representations for large-scale atomistic dynamics. Nat. Commun. 2023, 14, 579
  • Law et al. 2014 Law, V.; Knox, C.; Djoumbou, Y.; Jewison, T.; Guo, A. C.; Liu, Y.; Maciejewski, A.; Arndt, D.; Wilson, M.; Neveu, V., et al. DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 2014, 42, D1091–D1097
  • Folmsbee and Hutchison 2021 Folmsbee, D.; Hutchison, G. Assessing conformer energies using electronic structure and machine learning methods. Int. J. Quantum Chem. 2021, 121, e26381
  • Smith et al. 2017 Smith, J. S.; Isayev, O.; Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 2017, 8, 3192–3203
  • Zeng et al. 2020 Zeng, J.; Cao, L.; Xu, M.; Zhu, T.; Zhang, J. Z. Complex reaction processes in combustion unraveled by neural network-based molecular dynamics simulation. Nat. Commun. 2020, 11, 5713
  • Chanussot et al. 2021 Chanussot, L.; Das, A.; Goyal, S.; Lavril, T.; Shuaibi, M.; Riviere, M.; Tran, K.; Heras-Domingo, J.; Ho, C.; Hu, W., et al. Open catalyst 2020 (OC20) dataset and community challenges. ACS Catal. 2021, 11, 6059–6072
  • Lewis-Atwell et al. 2022 Lewis-Atwell, T.; Townsend, P. A.; Grayson, M. N. Machine learning activation energies of chemical reactions. WIREs Comput. Mol. Sci. 2022, 12, e1593
  • Gallarati et al. 2021 Gallarati, S.; Fabregat, R.; Laplaza, R.; Bhattacharjee, S.; Wodrich, M. D.; Corminboeuf, C. Reaction-based machine learning representations for predicting the enantioselectivity of organocatalysts. Chem. Sci. 2021, 12, 6879–6889
  • Grambow et al. 2020 Grambow, C. A.; Pattanaik, L.; Green, W. H. Deep Learning of Activation Energies. J. Phys. Chem. Lett. 2020, 11, 2992–2997
  • Heid and Green 2022 Heid, E.; Green, W. H. Machine learning of reaction properties via learned representations of the condensed graph of reaction. J. Chem. Inf. Model. 2022, 62, 2101–2110
  • Spiekermann et al. 2022 Spiekermann, K. A.; Pattanaik, L.; Green, W. H. Fast Predictions of Reaction Barrier Heights: Toward Coupled-Cluster Accuracy. J. Phys. Chem. A 2022, 126, 3976–3986
  • Zhao et al. 2023 Zhao, Q.; Anstine, D. M.; Isayev, O.; Savoie, B. M. $\Delta^{2}$ machine learning for reaction property prediction. Chem. Sci. 2023, 14, 13392–13401
  • Heinen et al. 2021 Heinen, S.; von Rudorff, G. F.; von Lilienfeld, O. A. Toward the design of chemical reactions: Machine learning barriers of competing mechanisms in reactant space. J. Chem. Phys. 2021, 155, 064105
  • Singh et al. 2019 Singh, A. R.; Rohr, B. A.; Gauthier, J. A.; Nørskov, J. K. Predicting chemical reaction barriers with a machine learning model. Catal. Lett. 2019, 149, 2347–2354
  • Choi et al. 2018 Choi, S.; Kim, Y.; Kim, J. W.; Kim, Z.; Kim, W. Y. Feasibility of activation energy prediction of gas-phase reactions by machine learning. Chem. Eur. J. 2018, 24, 12354–12358
  • Farrar and Grayson 2022 Farrar, E. H. E.; Grayson, M. N. Machine learning and semi-empirical calculations: a synergistic approach to rapid, accurate, and mechanism-based reaction barrier prediction. Chem. Sci. 2022, 13, 7594–7603
  • Friederich et al. 2020 Friederich, P.; dos Passos Gomes, G.; Bin, R. D.; Aspuru-Guzik, A.; Balcells, D. Machine learning dihydrogen activation in the chemical space surrounding Vaska’s complex. Chem. Sci. 2020, 11, 4584–4601
  • Migliaro and Cundari 2020 Migliaro, I.; Cundari, T. R. Density Functional Study of Methane Activation by Frustrated Lewis Pairs with Group 13 Trihalides and Group 15 Pentahalides and a Machine Learning Analysis of Their Barrier Heights. J. Chem. Inf. Model. 2020, 60, 4958–4966
  • Lewis-Atwell et al. 2023 Lewis-Atwell, T.; Beechey, D.; Şimşek, O.; Grayson, M. N. Reformulating Reactivity Design for Data-Efficient Machine Learning. ACS Catal. 2023, 13, 13506–13515
  • Schwaller et al. 2022 Schwaller, P.; Vaucher, A. C.; Laplaza, R.; Bunne, C.; Krause, A.; Corminboeuf, C.; Laino, T. Machine intelligence for chemical reaction space. WIREs Comput. Mol. Sci. 2022, 12, e1604
  • Rogers and Hahn 2010 Rogers, D.; Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 2010, 50, 742–754
  • Probst et al. 2022 Probst, D.; Schwaller, P.; Reymond, J.-L. Reaction Classification and Yield Prediction using the Differential Reaction Fingerprint DRFP. Digital Discovery 2022, 1, 91–97
  • Ahneman et al. 2018 Ahneman, D. T.; Estrada, J. G.; Lin, S.; Dreher, S. D.; Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 2018, 360, 186–190
  • Żurański et al. 2021 Żurański, A. M.; Martinez Alvarado, J. I.; Shields, B. J.; Doyle, A. G. Predicting Reaction Yields via Supervised Learning. Acc. Chem. Res. 2021, 54, 1856–1865
  • Zahrt et al. 2019 Zahrt, A. F.; Henle, J. J.; Rose, B. T.; Wang, Y.; Darrow, W. T.; Denmark, S. E. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning. Science 2019, 363, eaau5631
  • Jorner et al. 2021 Jorner, K.; Brinck, T.; Norrby, P.-O.; Buttar, D. Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies. Chem. Sci. 2021, 12, 1163–1175
  • Reid and Sigman 2019 Reid, J. P.; Sigman, M. S. Holistic prediction of enantioselectivity in asymmetric catalysis. Nature 2019, 571, 343–348
  • Gensch et al. 2022 Gensch, T.; dos Passos Gomes, G.; Friederich, P.; Peters, E.; Gaudin, T.; Pollice, R.; Jorner, K.; Nigam, A.; Lindner-D’Addario, M.; Sigman, M. S.; Aspuru-Guzik, A. A comprehensive discovery platform for organophosphorus ligands for catalysis. J. Am. Chem. Soc. 2022, 144, 1205–1217
  • Santiago et al. 2018 Santiago, C. B.; Guo, J.-Y.; Sigman, M. S. Predictive and mechanistic multivariate linear regression models for reaction development. Chem. Sci. 2018, 9, 2398–2412
  • Jorner 2023 Jorner, K. Putting Chemical Knowledge to Work in Machine Learning for Reactivity. Chimia 2023, 77, 22
  • Gallegos et al. 2021 Gallegos, L. C.; Luchini, G.; St. John, P. C.; Kim, S.; Paton, R. S. Importance of Engineered and Learned Molecular Representations in Predicting Organic Reactivity, Selectivity, and Chemical Properties. Acc. Chem. Res. 2021, 54, 827–836
  • Williams et al. 2021 Williams, W. L.; Zeng, L.; Gensch, T.; Sigman, M. S.; Doyle, A. G.; Anslyn, E. V. The Evolution of Data-Driven Modeling in Organic Chemistry. ACS Cent. Sci. 2021, 7, 1622–1637
  • Devlin et al. 2018 Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint 2018, arXiv:1810.04805
  • Schwaller et al. 2019 Schwaller, P.; Laino, T.; Gaudin, T.; Bolgar, P.; Hunter, C. A.; Bekas, C.; Lee, A. A. Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction. ACS Cent. Sci. 2019, 5, 1572–1583
  • Schwaller et al. 2021 Schwaller, P.; Vaucher, A. C.; Laino, T.; Reymond, J.-L. Prediction of chemical reaction yields using deep learning. Mach. Learn.: Sci. Technol. 2021, 2, 015016
  • Doney et al. 2016 Doney, A. C.; Rooks, B. J.; Lu, T.; Wheeler, S. E. Design of Organocatalysts for Asymmetric Propargylations through Computational Screening. ACS Catal. 2016, 6, 7948–7955
  • Gasteiger et al. 2020 Gasteiger, J.; Giri, S.; Margraf, J. T.; Günnemann, S. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. arXiv preprint 2020, arXiv:2011.14115
  • Grambow et al. 2020 Grambow, C.; Pattanaik, L.; Green, W. Reactants, products, and transition states of elementary chemical reactions based on quantum chemistry. Sci. Data 2020, 7, 137
  • Duan et al. 2023 Duan, C.; Du, Y.; Jia, H.; Kulik, H. J. Accurate transition state generation with an object-aware equivariant elementary reaction diffusion model. arXiv preprint 2023, arXiv:2304.06174
  • Chen et al. 2013 Chen, W. L.; Chen, D. Z.; Taylor, K. T. Automatic reaction mapping and reaction center detection. WIREs Comput. Mol. Sci. 2013, 3, 560–593
  • Preciat Gonzalez et al. 2017 Preciat Gonzalez, G. A.; El Assal, L. R.; Noronha, A.; Thiele, I.; Haraldsdóttir, H. S.; Fleming, R. M. Comparative evaluation of atom mapping algorithms for balanced metabolic reactions: application to Recon 3D. J. Cheminform. 2017, 9, 1–15
  • Jaworski et al. 2019 Jaworski, W.; Szymkuć, S.; Mikulak-Klucznik, B.; Piecuch, K.; Klucznik, T.; Kaźmierowski, M.; Rydzewski, J.; Gambin, A.; Grzybowski, B. A. Automatic mapping of atoms across both simple and complex chemical reactions. Nat. Commun. 2019, 10, 1434
  • Schwaller et al. 2021 Schwaller, P.; Hoover, B.; Reymond, J.-L.; Strobelt, H.; Laino, T. Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Sci. Adv. 2021, 7, eabe4166
  • Stuyver and Coley 2023 Stuyver, T.; Coley, C. W. Machine Learning-Guided Computational Screening of New Candidate Reactions with High Bioorthogonal Click Potential. Chem. Eur. J. 2023, 29, e202300387
  • Stuyver and Coley 2022 Stuyver, T.; Coley, C. W. Quantum chemistry-augmented neural networks for reactivity prediction: Performance, generalizability, and explainability. J. Chem. Phys. 2022, 156, 084104
  • Spiekermann et al. 2022 Spiekermann, K.; Pattanaik, L.; Green, W. H. High accuracy barrier heights, enthalpies, and rate coefficients for chemical reactions. Sci. Data 2022, 9, 417
  • Stuyver et al. 2023 Stuyver, T.; Jorner, K.; Coley, C. W. Reaction profiles for quantum chemistry-computed [3+2] cycloaddition reactions. Sci. Data 2023, 10, 66
  • Geiger et al. 2022 Geiger, M.; Smidt, T.; M., A.; Miller, B. K.; Boomsma, W.; Dice, B.; Lapchevskyi, K.; Weiler, M.; Tyszkiewicz, M.; Uhrin, M.; Batzner, S.; Madisetti, D.; Frellsen, J.; Jung, N.; Sanborn, S.; jkh,; Wen, M.; Rackers, J.; Rød, M.; Bailey, M. e3nn/e3nn: 2022-12-12. 2022; https://doi.org/10.5281/zenodo.7430260
  • Corso et al. 2023 Corso, G.; Stärk, H.; Jing, B.; Barzilay, R.; Jaakkola, T. DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking. arXiv preprint 2023, arXiv:2210.01776
  • Landrum et al. 2023 Landrum, G.; Tosco, P.; Kelley, B.; Ric,; Sriniker,; Cosgrove, D.; Gedeck,; Vianello, R.; NadineSchneider,; Kawashima, E.; N, D.; Jones, G.; Dalke, A.; Cole, B.; Swain, M.; Turk, S.; AlexanderSavelyev,; Vaucher, A.; Wójcikowski, M.; Ichiru Take,; Probst, D.; Ujihara, K.; Scalfani, V. F.; Godin, G.; Pahl, A.; Francois Berenger,; JLVarjo,; Walker, R.; Jasondbiggs,; Strets123, rdkit/rdkit: 2023_03_1 (Q1 2023) Release. 2023; https://zenodo.org/record/7880616
  • Stärk et al. 2022 Stärk, H.; Ganea, O.; Pattanaik, L.; Barzilay, R.; Jaakkola, T. EquiBind: Geometric deep learning for drug binding structure prediction. International conference on machine learning. 2022; pp 20503–20521
  • Vaswani et al. 2017 Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008
  • Paszke et al. 2019 Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; Desmaison, A.; Kopf, A.; Yang, E.; DeVito, Z.; Raison, M.; Tejani, A.; Chilamkurthy, S.; Steiner, B.; Fang, L.; Bai, J.; Chintala, S. PyTorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037
  • Liao and Smidt 2022 Liao, Y.-L.; Smidt, T. Equiformer: Equivariant graph attention transformer for 3D atomistic graphs. arXiv preprint 2022, arXiv:2206.11990
  • Ganea et al. 2022 Ganea, O.-E.; Huang, X.; Bunne, C.; Bian, Y.; Barzilay, R.; Jaakkola, T.; Krause, A. Independent SE(3)-Equivariant Models for End-to-End Rigid Protein Docking. arXiv preprint 2022, arXiv:2111.07786
  • van Gerwen et al. 2023 van Gerwen, P.; Wodrich, M. D.; Laplaza, R.; Corminboeuf, C. Reply to Comment on ‘Physics-based representations for machine learning properties of chemical reactions’. Mach. Learn.: Sci. Technol. 2023, 4, 048002
  • Lowe 2012 Lowe, D. M. Extraction of chemical structures and reactions from the literature. Ph.D. thesis, University of Cambridge, 2012
  • Wu et al. 2018 Wu, Z.; Ramsundar, B.; Feinberg, E. N.; Gomes, J.; Geniesse, C.; Pappu, A. S.; Leswing, K.; Pande, V. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 2018, 9, 513–530
  • Yang et al. 2019 Yang, K.; Swanson, K.; Jin, W.; Coley, C.; Eiden, P.; Gao, H.; Guzman-Perez, A.; Hopper, T.; Kelley, B.; Mathea, M.; Palmer, A.; Settels, V.; Jaakkola, T.; Jensen, K.; Barzilay, R. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 2019, 59, 3370–3388
  • Bemis and Murcko 1996 Bemis, G. W.; Murcko, M. A. The Properties of Known Drugs. 1. Molecular Frameworks. J. Med. Chem. 1996, 39, 2887–2893
  • van der Maaten and Hinton 2008 van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605
  • Bannwarth et al. 2019 Bannwarth, C.; Ehlert, S.; Grimme, S. GFN2-xTB—An Accurate and Broadly Parametrized Self-Consistent Tight-Binding Quantum Chemical Method with Multipole Electrostatics and Density-Dependent Dispersion Contributions. J. Chem. Theory Comput. 2019, 15, 1652–1671
  • Blum and Reymond 2009 Blum, L. C.; Reymond, J.-L. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J. Am. Chem. Soc. 2009, 131, 8732–8733
  • Reymond 2015 Reymond, J.-L. The chemical space project. Acc. Chem. Res. 2015, 48, 722–730
  • Zimmerman 2015 Zimmerman, P. M. Single-ended transition state finding with the growing string method. J. Comput. Chem. 2015, 36, 601–611
  • Cordella et al. 2001 Cordella, L. P.; Foggia, P.; Sansone, C.; Vento, M. An improved algorithm for matching large graphs. 3rd IAPR-TC15 workshop on graph-based representations in pattern recognition. 2001; pp 149–159
  • Hagberg et al. 2008 Hagberg, A. A.; Schult, D. A.; Swart, P. J. Exploring Network Structure, Dynamics, and Function using NetworkX. Proceedings of the 7th Python in Science Conference. Pasadena, CA USA, 2008; pp 11–15
  • Riniker and Landrum 2015 Riniker, S.; Landrum, G. A. Better informed distance geometry: using what we know to improve conformation generation. J. Chem. Inf. Model. 2015, 55, 2562–2574
  • Tosco et al. 2014 Tosco, P.; Stiefl, N.; Landrum, G. Bringing the MMFF force field to the RDKit: implementation and validation. J. Cheminform. 2014, 6, 37
  • Kingma and Ba 2014 Kingma, D. P.; Ba, J. Adam: A method for stochastic optimization. arXiv preprint 2014, arXiv:1412.6980
  • Biewald 2020 Biewald, L. Experiment Tracking with Weights and Biases. 2020; https://www.wandb.com/, Software available from wandb.com
  • Christensen et al. 2017 Christensen, A. S.; Faber, F.; Huang, B.; Bratholm, L.; Tkatchenko, A.; Müller, K.-R.; von Lilienfeld, O. A. QML: A Python Toolkit for Quantum Machine Learning. https://github.com/qmlcode/qml, 2017