\jyear

2022

[1]\fnmShihao \surShao

[1]\fnmQinghua \surCui

1]\orgdivDepartment of Biomedical Informatics, State Key Laboratory of Vascular Homeostasis and Remodeling, School of Basic Medical Sciences, \orgnamePeking University, \orgaddress\cityBeijing, \postcode100191, \countryChina

2]\orgdivSchool of Electronics Engineering and Computer Science, \orgnamePeking University, \orgaddress\cityBeijing, \postcode100871, \countryChina

FreeCG: Free the Design Space of Clebsch–Gordan Transform for machine learning force field

[email protected] (S. S.)    \fnmHaoran \surGeng    [email protected] (Q. C.) [ [
Abstract

The Clebsch–Gordan transform (CG transform) effectively encodes many-body interactions. Many studies have demonstrated its accuracy in depicting atomic environments, although this accuracy comes at a high computational cost. This cost is hard to reduce because the required permutation equivariance limits the design space of the CG transform layer. We show that implementing the CG transform layer on permutation-invariant inputs allows complete freedom in the design of this layer without affecting symmetry. Building on this premise, we create a CG transform layer that operates on permutation-invariant abstract edges generated from real edge information. We introduce a group CG transform with sparse paths, abstract edges shuffling, and an attention enhancer to form a powerful and efficient CG transform layer. Our method, FreeCG, achieves state-of-the-art (SoTA) results in force prediction on MD17, rMD17, and MD22, and in property prediction on QM9, with notable improvements. It introduces a novel paradigm for carrying out efficient and expressive CG transforms in future geometric neural network designs.

keywords:
Group Equivariance; Tensor Product; Irreducible Representation; Machine Learning Force Field

1 Introduction

Accurate modelling of molecular force fields is of great importance for drug development amaro2018drug ; das2021drug ; chen2024design , materials science zepeda2017probing ; liu2024layer , chemical reaction kinetics zeng2020complex ; meuwly2021machine , and nanotechnology srivastava2021recent ; wang2023novel , among others. Density Functional Theory (DFT) kohn1965self and other ab initio methods martin2020electronic ; ceperley1980ground ; bartlett2007coupled demonstrate excellent precision but require intensive computational resources, which severely limits their use for many-body systems burke2012perspective ; jones2015density ; cohen2008insights . Classical force fields are cheap but do not offer the same level of precision lindorff2010improved ; brooks2009charmm . Machine Learning Force Fields (MLFFs) cui2024geometry ; wang2023quinnet ; wang2024enhancing ; musaelian2023learning ; batzner20223 ; drautz2019atomic ; batatia2022mace ; tholke2021equivariant ; schutt2018schnet ; chmiela2017machine offer a satisfying trade-off between accuracy and efficiency: they are expected to be as accurate as DFT or other high-accuracy references, but with orders-of-magnitude speedup.

Kernel-based methods chmiela2017machine ; drautz2019atomic are the starting point of MLFFs. sGDML chmiela2017machine introduces the properties of conservative fields into MLFFs: the negative partial derivative of the predicted energy with respect to atom positions is taken as the predicted force. Subsequently, attention shifted to deep neural networks. Message-Passing Neural Networks (MPNNs) have achieved SoTA results on several molecular dynamics datasets schutt2018schnet ; schutt2021equivariant . Group theory and group representation theory play important roles in the design of MPNNs for MLFFs. An intuitive idea is to maintain rotation and translation equivariance in the design of the neural network. For example, we naturally expect the predicted forces to transform together with the input molecule, while the energy remains unchanged. Graph neural networks that obey this property are called Equivariant Graph Neural Networks (EGNNs) thomas2018tensor ; satorras2021n ; gasteiger2020directional . On top of that, several works utilize the transformer architecture vaswani2017attention and report satisfying results fuchs2020se ; liao2022equiformer . To better model many-body interactions, irreducible representations (irreps) are adopted to represent high-order geometric objects. The CG transform is used to translate between different irreps. The use of irreps significantly enhances the expressivity of models. Several works process and aggregate geometric information by leveraging such high-degree irreps, which yields a significant performance boost batatia2022mace ; batzner20223 ; musaelian2023learning ; gasteiger2021gemnet ; thomas2018tensor .

However, the benefit of high-degree irreps and the CG transform on them comes at the cost of heavy computational overhead. Irreps are extensions of scalars and vectors, and in this sense the CG transform extends the dot product. The higher the degree of the irreps in the CG transform, the greater the computational demands. The requirement of permutation equivariance makes this burden hard to alleviate. EGNNs require each node to receive information from neighbouring atoms together with the edges linking them, and the heavy computation of the CG transform happens for each neighbouring atom and edge. We therefore cannot naïvely remove some of the neighbour computations, as doing so would break permutation equivariance. Moreover, the narrowness of the design space prevents us from freely constructing the CG transform layer. We need to operate on each neighbouring atom in the same way (e.g., previous works typically use the same Multi-Layer Perceptron (MLP), operating on scalar features of the edge, to produce the weights for each computation between the central atom and each neighbouring one batzner20223 ; musaelian2023learning ). To confront this challenge, we propose FreeCG. The model combines geometric features from the edges surrounding each atom. We call the different aggregated edge geometric features abstract edges, which are permutation invariant. By the transitivity of invariance, we show that the CG transform on these abstract edges is always permutation invariant, regardless of its concrete design, and does not affect the permutation equivariance of the layer, thus being free of the burdens above. Furthermore, the abstract edges are constructed from different real edges, so they contain refined features of them for better model expressive power. The invariant nature of abstract edges allows us to assign different weights to different edges, instead of weights computed by the same MLP.
We put abstract edges into groups and operate on each group individually to further decrease the computational demands. Works that keep $\mathrm{E}(3)$ equivariance are more expensive batzner20223 ; musaelian2023learning , since this requires an extra parity argument taking the value $1$ or $-1$, which doubles the number of irreps. Instead, we select an efficient set of paths for the CG transform so that we maintain $\mathrm{E}(3)$ equivariance while being more efficient than keeping SE(3) equivariance. Abstract edges shuffling, inspired by zhang2018shufflenet , is also applied to combine irreps features. The abstract edges are then plugged back into the self-attention calculation to improve the quality of the attention scores. The operations above are made possible by the equivariance proposition about the abstract edges.
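The invariance argument above can be checked numerically on a toy example. The following is a minimal sketch, not the FreeCG construction: the exponential weighting inside `abstract_edges` and the function `free_design` are hypothetical stand-ins, chosen only to show that once the inputs are sums over neighbours, any function of them, however asymmetric, is unaffected by reordering the neighbours.

```python
import math

def abstract_edges(neighbours, n_abstract=3):
    """Aggregate real edge vectors into a few order-independent summaries.

    Abstract edge k is sum_b exp(-(k + 1) * |r_b|) * r_b. This weighting is an
    illustrative assumption, not the construction used in the paper.
    """
    edges = []
    for k in range(n_abstract):
        acc = [0.0, 0.0, 0.0]
        for r in neighbours:
            w = math.exp(-(k + 1) * math.hypot(*r))
            acc = [s + w * t for s, t in zip(acc, r)]
        edges.append(acc)
    return edges

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def free_design(edges):
    """An arbitrary, deliberately asymmetric function of the abstract edges."""
    return (2.0 * dot(edges[0], edges[1])
            - 0.7 * dot(edges[1], edges[2])
            + dot(edges[2], edges[2]) ** 2)

# Edge vectors from a central atom to four neighbours (made-up numbers).
neighbours = [(1.0, 0.0, 0.0), (0.0, 1.2, 0.3), (-0.4, 0.5, 0.9), (0.2, -1.1, 0.6)]
shuffled = [neighbours[i] for i in (2, 0, 3, 1)]

# Reordering the neighbours changes nothing, although free_design treats
# each abstract edge differently.
difference = free_design(abstract_edges(neighbours)) - free_design(abstract_edges(shuffled))
assert abs(difference) < 1e-12
```

The tolerance is needed only because floating-point summation order differs between the two neighbour orderings.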

To evaluate FreeCG, following previous works wang2024enhancing ; wang2023quinnet ; batzner20223 ; musaelian2023learning , we use standard force field prediction benchmarks: MD17 chmiela2017machine , revised MD17 (rMD17) Christensen2020 , and MD22 chmiela2023accurate . To further examine the generalization of FreeCG to molecular property prediction, we also conduct experiments on QM9 ruddigkeit2012enumeration ; ramakrishnan2014quantum . We follow the conventional training/validation/test splits and report our results against several SoTA methods on these datasets. Remarkably, FreeCG outperforms other methods in force prediction on most molecules, by large margins. Ablation studies are also conducted to validate the effectiveness of each proposed module and to evaluate the sensitivity to the hyper-parameters. To examine the efficiency of our modifications to the CG transform, we provide theoretical numbers and measure the speed w.r.t. the group number and sparsity of the CG transform. The speed and size of the overall FreeCG model are also benchmarked, which demonstrates the efficiency of the whole model.

The contributions of our work are summarized as follows:

  • We reveal two major issues in the current EGNNs with CG transform: heavy computation demands and narrow design space.

  • We propose to leverage permutation-invariant abstract edges and, through the proposition we establish, completely free the design space of the CG transform.

  • We propose FreeCG, comprising three main components: group CG transform with sparse paths, abstract edges shuffling, and attention enhancer, which together yield an information-rich and efficient model with high-degree CG transform (see Fig. 1).

  • Experiments on the small-molecule datasets MD17 and rMD17, the large-molecule dataset MD22, and the molecular property dataset QM9 reveal the SoTA performance of FreeCG.

  • We benchmark the speed and memory usage of our proposed modules and the overall FreeCG, demonstrating their efficiency.

  • Since the design space of CG transform is unrestricted, it presents a new paradigm for designing CG transform in future research, extending beyond the design in this work.

2 Results

2.1 Background

Group, equivariance and invariance. Permutation, rotation, and translation form different groups in group theory. Formally, a set with a binary operation $(G,*)$ is said to be a group if and only if the following conditions hold: 1) $g_1 * g_2 \in G$ for any $g_1, g_2 \in G$ (closure); 2) $(g_1 * g_2) * g_3 = g_1 * (g_2 * g_3)$ for any $g_1, g_2, g_3 \in G$ (associativity); 3) there exists a group element $e \in G$ such that $g * e = e * g = g$ for any $g \in G$ ($e$ is the identity element); 4) for each $g \in G$ there is a group element $g^{\prime}$ such that $g * g^{\prime} = g^{\prime} * g = e$ ($g^{\prime}$ is the inverse element). According to representation theory, the group elements $g \in G$ can be represented as linear transformations $\mathcal{P}_{V}(g) \in GL(V)$ on a vector space $V$. A function $f: X \to Y$, where $X$ and $Y$ are vector spaces, is said to be $G$-equivariant if and only if $f(\mathcal{P}_{X}(g)x) = \mathcal{P}_{Y}(g)f(x)$ for any $g \in G$. $G$-invariance is the special case in which $\mathcal{P}_{Y}(g)$ is the identity matrix. Permutation equivariance and $\mathrm{E}(3)$-equivariance are two properties that each layer of our model obeys. Permutation equivariance means that reordering the node or edge features at the input of a layer reorders its output in the same way. $\mathrm{E}(3)$-equivariance covers rotation, translation, and reflection. Translation is explicitly guaranteed by considering only the relative distances between atoms, so we consider $\mathrm{O}(3)$-equivariance, where translations are omitted. It is intuitive that directional features should change correspondingly when the whole molecule is rotated or reflected.
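The two definitions above can be verified numerically on simple maps. The sketch below is only an illustration with hypothetical helper names: `scaled` implements $f(x) = \lVert x \rVert^2 x$, which is $\mathrm{O}(3)$-equivariant, and `sq_norm` implements $f(x) = \lVert x \rVert^2$, which is $\mathrm{O}(3)$-invariant. A rotation about the z-axis and the point inversion stand in for general group elements.

```python
import math

def rotate_z(v, angle):
    """Apply a rotation about the z-axis (an element of SO(3)) to a 3-vector."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)

def invert(v):
    """Point inversion x -> -x, an element of O(3) that is not a rotation."""
    return tuple(-t for t in v)

def sq_norm(v):
    """Invariant map: the output representation P_Y(g) is the identity."""
    return sum(t * t for t in v)

def scaled(v):
    """Equivariant map f(x) = |x|^2 x: the output transforms like the input."""
    n = sq_norm(v)
    return tuple(n * t for t in v)

def close(a, b, tol=1e-9):
    return all(abs(p - q) < tol for p, q in zip(a, b))

x = (0.3, -1.2, 0.7)
for g in (lambda v: rotate_z(v, 0.9), invert):
    # Equivariance: f(P_X(g) x) == P_Y(g) f(x)
    assert close(scaled(g(x)), g(scaled(x)))
    # Invariance: f(P_X(g) x) == f(x)
    assert abs(sq_norm(g(x)) - sq_norm(x)) < 1e-9
```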

Tensor, irreps and CG transform. Tensors are high-dimensional generalizations of scalars, vectors, and matrices. Scalars and vectors are both special cases of Cartesian tensors. The tensor product generates high-rank tensors from low-rank ones. Formally, tensors are the results of tensor products of several vectors and covectors. In our context, it is not essential to distinguish between vectors and covectors. Tensor representations of groups can be further decomposed into a direct sum of irreps. For example, the representation of $\mathrm{SO}(3)$ (which omits reflections compared to $\mathrm{O}(3)$) on the 9-dimensional space obtained from the tensor product of two $3\times 3$ rotation matrices can be decomposed into $1\times 1$ ($l=0$), $3\times 3$ ($l=1$), and $5\times 5$ ($l=2$) irreps, which are called Wigner-D matrices. In EGNNs, we often project the distance vector between atoms onto the unit sphere $S^{2}$ centered at the central atom. In fact, $S^{2}$ is homeomorphic to the quotient space $\mathrm{SO}(3)/\mathrm{SO}(2)$, thus it also has its own irreps, e.g., the $l=0$ scalar and the $l=1$ vector. $S^{2}$ irreps are the main features we maintain in our model, where an irrep of degree $l$ has $2l+1$ elements, which are often indexed by $m$. To combine these features, we can calculate the tensor product between them, and the results can, again, be decomposed into irreps. This process is the CG transform, which utilizes CG coefficients to perform the transformation. For instance, $A^{1,l_{2}l_{3}\mapsto l_{1}}_{m_{1}}=\sum_{m_{2},m_{3}}C^{l_{1}l_{2}l_{3}}_{m_{1}m_{2}m_{3}}A^{2,l_{2}}_{m_{2}}A^{3,l_{3}}_{m_{3}}$, where $A^{l}$ are $S^{2}$ irreps, $m$ indexes the elements of the irreps, and $C$ denotes the CG coefficients. To satisfy $\mathrm{O}(3)$ equivariance, we consider an additional variable, the parity $p$, which takes the value $1$ or $-1$. Irreps with $p=-1$ change sign when the space is reflected, while those with $p=1$ remain unchanged. The above formula of the CG transform becomes:

\begin{equation}
A^{1,l_{2}p_{2}l_{3}p_{3}\mapsto l_{1}p_{1}}_{m_{1}}=\mathbbm{1}_{(p_{1}=p_{2}p_{3})}\sum_{m_{2},m_{3}}C^{l_{1}l_{2}l_{3}}_{m_{1}m_{2}m_{3}}A^{2,l_{2}p_{2}}_{m_{2}}A^{3,l_{3}p_{3}}_{m_{3}}
\tag{1}
\end{equation}

where $\mathbbm{1}_{(\mathit{expression})}$ is the indicator function, which outputs $1$ if $\mathit{expression}$ is true and $0$ otherwise. Given a vector (an $l=1$ $S^{2}$ irrep), we can lift it to irreps of arbitrary degree $l$ and parity $p=(-1)^{l}$ via a series of real spherical harmonics $(Y^{l}_{m=1},\dots,Y^{l}_{m=2l+1})$. For further details about group theory, we refer interested readers to related books and papers zee2016group ; raczka1986theory ; thomas2018tensor ; jeevanjee2011introduction ; cohen2018spherical .
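A familiar special case may help make the CG transform and the parity rule concrete. For two ordinary vectors ($l=1$, $p=-1$), the three output paths $l=0$, $l=1$, and $l=2$ correspond, up to normalisation and a change of basis, to the dot product, the cross product, and the symmetric traceless part of the outer product. The sketch below works in the Cartesian basis rather than with explicit CG coefficients, and the function names are illustrative only.

```python
def dot(a, b):
    """l=1 x l=1 -> l=0 path (a scalar), up to normalisation."""
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    """l=1 x l=1 -> l=1 path (a pseudovector), up to normalisation."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def sym_traceless(a, b):
    """l=1 x l=1 -> l=2 path: symmetric traceless 3x3 matrix (5 free components)."""
    tr = dot(a, b)
    return [[0.5 * (a[i] * b[j] + a[j] * b[i]) - (tr / 3.0 if i == j else 0.0)
             for j in range(3)] for i in range(3)]

def invert(v):
    """Point inversion: flips the sign of every p = -1 vector."""
    return tuple(-t for t in v)

a, b = (0.2, -0.5, 1.1), (1.3, 0.4, -0.6)

# Dimension count of the decomposition: 3 * 3 = 1 + 3 + 5.
dims = [2 * l + 1 for l in (0, 1, 2)]
assert sum(dims) == 3 * 3

# Parity selection rule p_1 = p_2 * p_3: two odd inputs give even outputs,
# so every output is unchanged when both inputs are inverted.
p_out = (-1) * (-1)
assert p_out == 1
assert abs(dot(invert(a), invert(b)) - dot(a, b)) < 1e-12
assert all(abs(u - v) < 1e-12
           for u, v in zip(cross(invert(a), invert(b)), cross(a, b)))
```

The cross product illustrates why parity must be tracked separately from the degree: it has $l=1$ like a vector, but $p=1$, so it does not flip under reflection.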

2.2 Problem analysis

The task of force field prediction can be formalised as follows: given a set of atoms with their positions and atom types $\{\bm{X},\bm{Z}\}$, the neural network $f_{\theta}$ with parameters $\theta$ aims to predict the energy, from which the predicted force on each atom is derived. In each layer of NequIP batzner20223 , messages from neighboring atoms are aggregated and combined with the features of the central atom. The messages are created via the CG transform between irreps. Here, we revisit the critical step of constructing messages to a central atom $a$ in NequIP:

\begin{equation}
\mathcal{L}^{l_{e}p_{e}l_{n}p_{n}\mapsto l_{o}p_{o}}_{acm_{o}}(\bm{X},\bm{N})=\mathbbm{1}_{(p_{o}=p_{e}p_{n})}\sum_{m_{e},m_{n}}C^{l_{o}l_{e}l_{n}}_{m_{o}m_{e}m_{n}}\sum_{b\in\mathcal{N}(a)}\left(R(\lVert r_{ab}\rVert)^{l_{o}l_{e}l_{n}}_{c}\right)Y^{l_{e}}_{m_{e}}\left(\frac{r_{ab}}{\lVert r_{ab}\rVert}\right)N^{l_{n}p_{n}}_{bcm_{n}}
\tag{2}
\end{equation}

where $\mathcal{N}(a)$ is the set of neighboring atoms of atom $a$, $R$ is an MLP, $\lVert\cdot\rVert$ is the Euclidean norm, $N_{b}$ denotes the features of node $b$, and $r_{ab}$ is the vector pointing from atom $a$ to atom $b$. Consider the vector function form of Eq. 2: $\bm{\mathcal{L}}^{l_{e}p_{e}l_{n}p_{n}\mapsto l_{o}p_{o}}_{cm_{o}}=(\mathcal{L}^{l_{e}p_{e}l_{n}p_{n}\mapsto l_{o}p_{o}}_{1cm_{o}},\mathcal{L}^{l_{e}p_{e}l_{n}p_{n}\mapsto l_{o}p_{o}}_{2cm_{o}},\dots)$, which is permutation equivariant w.r.t. permutation operations acting on $\bm{X}$ and $\bm{N}$. Formally, this means $\bm{\mathcal{L}}^{l_{e}p_{e}l_{n}p_{n}\mapsto l_{o}p_{o}}_{cm_{o}}(\mathcal{P}_{\bm{X}}\bm{X},\mathcal{P}_{\bm{N}}\bm{N})=\mathcal{P}_{\bm{\mathcal{L}}}\bm{\mathcal{L}}^{l_{e}p_{e}l_{n}p_{n}\mapsto l_{o}p_{o}}_{cm_{o}}(\bm{X},\bm{N})$. Put simply, if we exchange the indices of two atoms, for example 1 and 2, and feed them into the function $\bm{\mathcal{L}}^{l_{e}p_{e}l_{n}p_{n}\mapsto l_{o}p_{o}}_{cm_{o}}$, the result equals what we obtain by directly exchanging entries 1 and 2 of the output of $\bm{\mathcal{L}}^{l_{e}p_{e}l_{n}p_{n}\mapsto l_{o}p_{o}}_{cm_{o}}$, which is $(\mathcal{L}^{l_{e}p_{e}l_{n}p_{n}\mapsto l_{o}p_{o}}_{2cm_{o}},\mathcal{L}^{l_{e}p_{e}l_{n}p_{n}\mapsto l_{o}p_{o}}_{1cm_{o}},\dots)$. This property is simple but very important for molecular neural networks, since the properties of a molecule should not depend on the order in which its atoms are listed.
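The index-exchange statement above can be checked on a stripped-down analogue of Eq. 2. The sketch below keeps only the sum over neighbours with a shared radial function and scalar node features; the CG coefficients, spherical harmonics, and channels are omitted. The names `radial`, `layer`, and `permute` are hypothetical, and every other atom is treated as a neighbour (no cutoff), which is an assumption made for brevity.

```python
import math

def radial(d):
    """Stand-in for the shared MLP R: any fixed function of the distance."""
    return math.exp(-d)

def layer(X, N):
    """Toy scalar analogue of Eq. 2: out[a] = sum_b R(|r_ab|) * N[b]."""
    out = []
    for a in range(len(X)):
        msg = 0.0
        for b in range(len(X)):
            if b != a:
                msg += radial(math.dist(X[a], X[b])) * N[b]
        out.append(msg)
    return out

def permute(seq, perm):
    """Reorder a list so that new position i holds old element perm[i]."""
    return [seq[i] for i in perm]

X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (0.3, 0.2, 2.0)]
N = [0.5, -1.0, 2.0, 0.25]
perm = [1, 0, 2, 3]  # exchange the first two atoms

# Permuting the inputs and then applying the layer ...
lhs = layer(permute(X, perm), permute(N, perm))
# ... matches applying the layer and then permuting the outputs.
rhs = permute(layer(X, N), perm)
assert all(abs(p - q) < 1e-12 for p, q in zip(lhs, rhs))
```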

Most works take this property for granted. However, permutation equivariance is both important and fragile. It restricts the design space to a very small scope and makes the network scale poorly as the number of neighbors grows. Specifically, it brings the following issues:

Problem 1.

The CG transform layer scales as $\mathcal{O}(\max_i \mathrm{card}(\mathcal{N}(i)))$, where $\mathrm{card}(X)$ is the number of elements in the set $X$. One cannot arbitrarily remove calculations for a specific neighboring atom, because doing so would break permutation equivariance.

Problem 2.

The design space is limited by the need to maintain permutation equivariance. For example, in Eq. 2, the formulation and the parameters of $R$ must be the same across different neighboring atoms, which rules out more complicated CG transform layers.

Problem 1 brings heavy computational demands, as the CG transform itself is very time-consuming compared to dot products and element-wise multiplication. We provide a detailed analysis of the efficiency of the CG transform in the Methods section. Problem 2, in turn, narrows the design space and makes it hard to design a highly expressive CG transform layer, as only a limited set of structures maintains permutation equivariance. To address these problems, we aim to free the CG transform in message transmission from the constraints of permutation equivariance without compromising the overall equivariance of the network. Here, we leverage a simple and useful mathematical property. Consider a function $h$ that can be written as:

$$h(x) = h'(h_1(x), h_2(x), \ldots) \tag{3}$$

If the $h_*(x)$ are all $G$-invariant, then no matter how we design the outer function $h'$, the overall function $h$ must be $G$-equivariant (in fact $G$-invariant, which is equivariance with a trivial action on the output). The proof is simple, as:

$$h'(h_1(\mathcal{P}_X(g)x), h_2(\mathcal{P}_X(g)x), \ldots) = \mathcal{P}_h(e)\, h'(h_1(x), h_2(x), \ldots) \tag{4}$$

Invariant components do not affect the equivariance of the layer. Thus, we can first obtain a set of invariant functions $h_*$ and then freely design the function $h$ of them.

2.3 FreeCG

Abstract edges. The above proposition presents an elegant way to solve Problems 1 and 2. The idea is to put the CG transform inside the function $h$; by the proposition, this completely frees the design space of the CG transform. The first step is to construct the permutation-invariant functions $h_*$. To emphasize geometric information, we want these $h_*$ to be aggregations of edge features. We call the $h_*$ abstract edges. For the concrete design, we take the transformer architecture in VisNet wang2024enhancing as an efficient tool to construct abstract edges. Details of VisNet are given in the Appendix. In VisNet, each edge maintains high-degree features $E_{ij} = E^{l=1}_{ij} \oplus E^{l=2}_{ij}$ consisting of irreps $E^{l}_{ij} = Y^{l}(r_{ij}/\lVert r_{ij}\rVert)$. These features do not depend on the layer index $L$.
The computed attention $a^{L}_{ij,t}$ is multiplied with each edge. The sum $\hat{E}^{L}_{i,t} = \sum_{ij \in \mathcal{E}(i)} a^{L}_{ij,t} E^{L}_{ij}$ forms an abstract edge, where we omit the degree $l$, and $L$ is the index of the layer. $t$ denotes the index of the $t$-th abstract edge in $(\hat{E}^{L}_{i,t=1}, \hat{E}^{L}_{i,t=2}, \ldots)$.
In the original VisNet, this sum was used to update the geometric feature, $d\overline{E}^{L+1}_{i} = \hat{E}^{L}_{i} + o^{L,1}_{i} \cdot \mathrm{Linear}(\overline{E}^{L}_{i})$, where $\mathrm{Linear}$ is a fully-connected linear operation that acts across the dimension of $t$ and thus does not break equivariance, and $o^{L,1}_{i}$ is a variable generated from the node feature, as we will introduce in the Methods section. We leverage the fact that $\hat{E}^{L}$ fits the requirements of $h_*$ in our proposition, take them as abstract edges, and propose methods to construct the CG transform function $h$ upon them. The proof that each abstract edge meets the requirement for $h_*$, namely that it is permutation invariant, is in the Methods section.
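
The construction above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes only $l=1$ edge features (unit vectors), uses random numbers as a stand-in for the attention scores $a_{ij,t}$, and picks an arbitrary channel-mixing matrix `W` as a hypothetical choice of the freely designed outer function. It shows that the output is unchanged when the neighbors are reordered, and that it still rotates with the input.

```python
import numpy as np

def abstract_edges(edge_feats, attn):
    """Attention-weighted sums of real edge features for one atom i.

    edge_feats: (n_neighbors, D) array of edge features E_ij.
    attn:       (n_neighbors, T) array of attention weights a_{ij,t}.
    Returns a (T, D) array whose row t is sum_j a_{ij,t} * E_ij.
    """
    return attn.T @ edge_feats

rng = np.random.default_rng(0)
n_neighbors, T, D = 5, 4, 3

# l = 1 features only (assumption): normalized relative positions.
r = rng.normal(size=(n_neighbors, 3))
E = r / np.linalg.norm(r, axis=1, keepdims=True)
a = rng.normal(size=(n_neighbors, T))  # stand-in for attention scores

# Hypothetical outer function: weights that differ per abstract edge index t.
W = rng.normal(size=(T, T))

def h_prime(E_hat):
    return W @ E_hat

E_hat = abstract_edges(E, a)
out = h_prime(E_hat)

# Reordering the neighbors (edges and their attention together) changes nothing.
perm = rng.permutation(n_neighbors)
invariant = np.allclose(out, h_prime(abstract_edges(E[perm], a[perm])))

# Rotating the inputs rotates the outputs, since W only mixes the index t.
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
equivariant = np.allclose(h_prime(abstract_edges(E @ R.T, a)), out @ R.T)
```

Because the sum over neighbors removes any dependence on their order, nothing downstream of `abstract_edges` needs to share parameters across neighbors.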

Group CG transform. The number of abstract edges is chosen by us, so the cost of computing the CG transform in Problem 1 is held constant. The proposition above gives us enough freedom to construct the CG transform function $h$, expanding the design space to its maximum and alleviating Problem 2. The idea is to use the CG transform to replace the updating mechanism of $\overline{E}$ in VisNet. A naïve attempt is to directly take the CG transform between $\overline{E}^{L}$ and $\hat{E}^{L}$ to acquire $\overline{E}^{L+1}$. However, we want to further decrease the $O(T^2)$ time complexity of the CG transform, where $T$ is the number of abstract edges, even though it is a constant. Leveraging the unlimited freedom in constructing $h$, and taking inspiration from group convolution krizhevsky2012imagenet, we propose the group CG transform (the word "group" here is distinct from the group in group theory). We first split the abstract edges of $\overline{E}^{L}$ and $\hat{E}^{L}$ into groups, where each abstract edge index belongs to some group $U_g$, the integer $g$ ranges from $1$ to $G$, and $G$ is a hyper-parameter for the total number of groups. Then a group CG transform acts as:

$$
\begin{aligned}
d\overline{E}^{L+1,l_o,p_o}_{i,t_o m_o} = {} & \mathbbm{1}_{(p_o = p_1 p_2)} \sum_{l_1,l_2} \sum_{m_1,m_2} C^{l_o,l_1,l_2}_{m_o m_1 m_2} \\
& \sum_{t_1,t_2 \in U_g} W^{l_o,l_1,l_2}_{t_o t_1 t_2}\, o^{L,1}\, \mathrm{Linear}(\overline{E}^{L}_{i})^{l_1,p_1}_{t_1 m_1}\, \hat{E}^{L,l_2,p_2}_{i,t_2 m_2}
\end{aligned}
\tag{5}
$$

where $t_o \in U_g$. The group CG transform decreases the time complexity to $O(T^2/G)$. The parameters $W$ of the CG transform are also worth emphasizing. They need not be shared across different abstract edges $t$ to preserve permutation equivariance, and no shared MLP is needed to compute per-edge weights. Thus, we directly assign different weights $W$ to different abstract edges to enhance the expressive power of the model. In contrast to previous methods, we also save the computational cost of calculating weights for each edge.
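
The $O(T^2/G)$ count can be illustrated with a few lines of Python. This is a sketch under stated assumptions: abstract edge indices are 0-based and assigned to groups in contiguous blocks (the text does not fix the assignment), $T$ is divisible by $G$, and the cost proxy is the number of input pairs $(t_1, t_2)$ that fall in the same group. The helper names `make_groups` and `count_input_pairs` are hypothetical.

```python
def make_groups(T, G):
    """Split indices 0..T-1 into G contiguous groups of equal size (assumed layout)."""
    if T % G != 0:
        raise ValueError("this sketch assumes T is divisible by G")
    size = T // G
    return [list(range(g * size, (g + 1) * size)) for g in range(G)]

def count_input_pairs(groups):
    """Number of (t1, t2) pairs combined by the transform: both must share a group."""
    return sum(len(U) ** 2 for U in groups)

T = 8
# G = 1 is the ungrouped case with T**2 pairs; larger G gives T**2 / G pairs.
pair_counts = {G: count_input_pairs(make_groups(T, G)) for G in (1, 2, 4)}
```

With $T = 8$, the counts are 64, 32, and 16 for $G = 1, 2, 4$, matching $T^2/G$.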

Sparse path. Typically, ensuring SO(3) equivariance is considered more efficient than ensuring O(3) equivariance. This is because O(3) often requires both $p=1$ and $p=-1$ for a single $l$, so the total computation is quadrupled and memory usage is doubled. Here we propose a method that keeps O(3) equivariance while being even more efficient than SO(3). We keep only $(l=1, p=-1)$ and $(l=2, p=1)$, which are the parities $p=(-1)^l$ of the spherical harmonics themselves. This suffices for each output irrep to contain information from both input irreps through the CG transform, as $(l=1,p=-1)*(l=2,p=1) \mapsto (l=1,p=-1)$, $(l=1,p=-1)*(l=1,p=-1) \mapsto (l=2,p=1)$, $(l=2,p=1)*(l=2,p=1) \mapsto (l=2,p=1)$, and $(l=2,p=1)*(l=1,p=-1) \mapsto (l=1,p=-1)$. There are only 4 paths, in contrast to 8 paths for SO(3), so the layer is O(3) equivariant yet more efficient than an SO(3)-equivariant one. We show the paths in Fig. 2(a).
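
The path counts follow from two selection rules: the triangle inequality $|l_1 - l_2| \le l_o \le l_1 + l_2$ and, for O(3), the parity rule $p_o = p_1 p_2$. The short enumeration below is an illustrative sketch (the function name `allowed_paths` is hypothetical) that counts ordered input pairs whose output lies in the retained set of irreps.

```python
from itertools import product

def allowed_paths(irreps, use_parity):
    """List (input1, input2, output) triples permitted by the selection rules.

    irreps: list of (l, p) pairs that are kept as both inputs and outputs.
    use_parity: if True, additionally require p_out == p_in1 * p_in2 (O(3)).
    """
    paths = []
    for (l1, p1), (l2, p2) in product(irreps, repeat=2):
        for (lo, po) in irreps:
            if not abs(l1 - l2) <= lo <= l1 + l2:
                continue  # triangle inequality violated
            if use_parity and po != p1 * p2:
                continue  # parity not conserved
            paths.append(((l1, p1), (l2, p2), (lo, po)))
    return paths

# O(3) with the sparse choice: odd l = 1 and even l = 2.
o3_paths = allowed_paths([(1, -1), (2, 1)], use_parity=True)

# SO(3) with l = 1 and l = 2: parity is ignored (the second entry is a placeholder).
so3_paths = allowed_paths([(1, 0), (2, 0)], use_parity=False)
```

Here `o3_paths` has 4 entries and `so3_paths` has 8, in line with the counts stated above.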

Abstract edges shuffling. Inspired by ShuffleNet zhang2018shufflenet, we also shuffle the abstract edges so that information is exchanged comprehensively. We shuffle all the abstract edges: specifically, we increase the indices of all irreps by $1.5\,T/G$, and if an index exceeds $T$, we start counting from 1 again. In theory, the shuffling strategy can be arbitrary as long as the same strategy is used for each layer at every inference. This process is depicted in Fig. 2(b). The ablation on different strategies is shown in the Results section.
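
As a rough illustration, the cyclic shift can be written with `np.roll`. The sketch assumes 0-based indices, contiguous groups, and an even group size so that $1.5\,T/G$ is an integer; rounding in other cases is our assumption, not something the text specifies.

```python
import numpy as np

def shuffle_abstract_edges(E_hat, G, factor=1.5):
    """Cyclically move abstract edge t to index (t + factor * T / G) mod T.

    E_hat: array whose first axis indexes the T abstract edges.
    """
    T = E_hat.shape[0]
    shift = int(round(factor * T / G))  # rounding is an assumption of this sketch
    return np.roll(E_hat, shift, axis=0)

T, G = 8, 2
edge_ids = np.arange(T)                    # label each abstract edge by its old index
shuffled = shuffle_abstract_edges(edge_ids, G)

# Old group of each edge now sitting in new group 0 (positions 0..3).
old_groups_in_new_group0 = sorted(int(t) // (T // G) for t in shuffled[: T // G])
```

With $T = 8$ and $G = 2$ the shift is 6, so each new group holds two edges from each old group. A shift of $1.0\,T/G$ would move whole groups without mixing them.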

Abstract edges enhance self-attention. The Transformer integrates information from neighboring atoms in molecular tasks through the self-attention mechanism, which aims to capture relations between atoms exhibiting strong interatomic correlations. To better utilize abstract edge information, we leverage it to augment the generation of attention scores. To calculate the self-attention, the node scalar features are processed to generate a query $Q$, key $K$, and value $V$ for each atom. Then, the self-attention is computed as $A_{ij} = \mathrm{SiLU}(Q_i \odot K_j)$, where $\odot$ represents the dot product and SiLU is the activation function. Note that VisNet differs from other transformer-based models in that $A_{ij}$ is passed through SiLU rather than a Softmax across different $j$. We integrate the information of abstract edges by:

$$A_{ij} = \mathrm{SiLU}\big(Q_i \odot K_j + \max_{t}(\overline{E}^{L}_{j,t} \odot E_{ij})\big) \tag{6}$$

where $\max_{t}(\overline{E}^{L}_{j,t} \odot E_{ij})$ means that each abstract edge $\overline{E}^{L}_{j,t}$ is dotted with the real edge features $E_{ij}$, and the maximum value over the abstract edges is taken. There is no superscript $L$ on $E_{ij}$ because the real edge features are the same across layers. We take this term as an additional contribution to the self-attention, as it measures how much the abstract edges contain the information of the edge linking atoms $i$ and $j$.
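
Eq. 6 can be sketched for a single pair $(i, j)$ as follows. This is an illustrative single-head version that assumes $l=1$ features only, so that the dot product between an abstract edge and a real edge is an ordinary 3-vector dot product; the function names are hypothetical.

```python
import numpy as np

def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def enhanced_attention(q_i, k_j, E_bar_j, E_ij):
    """A_ij = SiLU(Q_i . K_j + max_t(E_bar_{j,t} . E_ij)).

    q_i, k_j: (C,) query and key vectors.
    E_bar_j:  (T, D) abstract edges of atom j.
    E_ij:     (D,) real edge feature between atoms i and j.
    """
    edge_term = np.max(E_bar_j @ E_ij)  # best match among the T abstract edges
    return silu(q_i @ k_j + edge_term)

E_bar = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
E_ij = np.array([0.0, 1.0, 0.0])
score = enhanced_attention(np.zeros(4), np.zeros(4), E_bar, E_ij)

# The added term is built from dot products, so the score is rotation invariant.
rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
score_rotated = enhanced_attention(np.zeros(4), np.zeros(4), E_bar @ R.T, R @ E_ij)
```

In this toy input the second abstract edge aligns with the real edge, so the edge term is 2 and the score is $\mathrm{SiLU}(2)$.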

2.4 Experiments

To evaluate the performance of FreeCG, we compare it with other SoTA MLFFs on molecular force field datasets: the small-molecule dataset MD17 chmiela2017machine with its revised version, rMD17 Christensen2020, and the large-molecule dataset MD22 chmiela2023accurate. To test the generalization capacity of FreeCG, we also evaluate it on a standard molecular property prediction dataset, QM9 ruddigkeit2012enumeration ; ramakrishnan2014quantum. We include popular SoTA models in the comparison: sGDML chmiela2017machine, SchNet schutt2018schnet, DimeNet gasteiger2020directional, SphereNet liu2021spherical, PaxNet zhang2022efficient, PaiNN schutt2021equivariant, SpookyNet unke2021spookynet, ET tholke2021equivariant, GemNet gasteiger2021gemnet, ComENet wang2022comenet, NequIP batzner20223, UniTE qiao2022informing, SO3KRATES frank2022so3krates, MACE batatia2022mace, Allegro musaelian2023learning, BOTNet batatia2022design, VisNet wang2024enhancing, and QuinNet wang2023quinnet. Ablations on each component and the corresponding hyper-parameters of FreeCG are also presented. A comprehensive introduction to each dataset and the detailed training settings are given in the Methods section.

Table 1: Performance on the MD17 dataset.

| Molecule | SchNet | DimeNet | PaiNN | SpookyNet | ET | GemNet | NequIP | SO3KRATES | VisNet | QuinNet | FreeCG |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Energy prediction | | | | | | | | | | | |
| Aspirin | 0.37 | 0.204 | 0.167 | 0.151 | 0.123 | - | 0.131 | 0.139 | 0.116 | 0.119 | 0.110 |
| Ethanol | 0.08 | 0.064 | 0.064 | 0.052 | 0.052 | - | 0.051 | 0.052 | 0.051 | 0.050 | 0.049 |
| Malondialdehyde | 0.13 | 0.104 | 0.091 | 0.079 | 0.077 | - | 0.076 | 0.077 | 0.075 | 0.078 | 0.094 |
| Naphthalene | 0.16 | 0.122 | 0.116 | 0.116 | 0.085 | - | 0.113 | 0.115 | 0.085 | 0.101 | 0.083 |
| Salicylic acid | 0.20 | 0.134 | 0.116 | 0.114 | 0.093 | - | 0.106 | 0.016 | 0.092 | 0.101 | 0.090 |
| Toluene | 0.12 | 0.102 | 0.095 | 0.094 | 0.074 | - | 0.092 | 0.095 | 0.074 | 0.080 | 0.076 |
| Uracil | 0.14 | 0.115 | 0.106 | 0.105 | 0.095 | - | 0.104 | 0.103 | 0.095 | 0.096 | 0.097 |
| Force prediction | | | | | | | | | | | |
| Aspirin | 1.35 | 0.499 | 0.338 | 0.258 | 0.253 | 0.217 | 0.184 | 0.236 | 0.155 | 0.145 | 0.122 |
| Ethanol | 0.39 | 0.230 | 0.224 | 0.094 | 0.109 | 0.085 | 0.071 | 0.096 | 0.060 | 0.060 | 0.053 |
| Malondialdehyde | 0.66 | 0.383 | 0.319 | 0.167 | 0.169 | 0.155 | 0.129 | 0.147 | 0.100 | 0.097 | 0.095 |
| Naphthalene | 0.58 | 0.215 | 0.077 | 0.089 | 0.061 | 0.051 | 0.039 | 0.074 | 0.039 | 0.039 | 0.034 |
| Salicylic acid | 0.85 | 0.374 | 0.195 | 0.180 | 0.129 | 0.125 | 0.090 | 0.145 | 0.084 | 0.080 | 0.070 |
| Toluene | 0.57 | 0.216 | 0.094 | 0.087 | 0.067 | 0.060 | 0.046 | 0.073 | 0.039 | 0.039 | 0.035 |
| Uracil | 0.56 | 0.301 | 0.139 | 0.119 | 0.095 | 0.097 | 0.076 | 0.111 | 0.062 | 0.062 | 0.059 |

The results are reported in mean absolute error (MAE). The energy and force are measured in kcal/mol and kcal/mol/Å, respectively. The best numbers are marked in bold.
Table 2: Performance on the rMD17 dataset.

| Molecule | UNiTE | GemNet | NequIP | MACE | Allegro | BOTNet | VisNet | QuinNet | FreeCG |
|---|---|---|---|---|---|---|---|---|---|
| Energy prediction | | | | | | | | | |
| Aspirin | 0.055 | - | 0.0530 | 0.0507 | 0.0530 | 0.0530 | 0.0445 | 0.0486 | 0.0530 |
| Azobenzene | 0.025 | - | 0.0161 | 0.0277 | 0.0277 | 0.0161 | 0.0156 | 0.0394 | 0.0217 |
| Benzene | 0.002 | - | 0.0009 | 0.0092 | 0.0069 | 0.0007 | 0.0007 | 0.0096 | 0.0107 |
| Ethanol | 0.014 | - | 0.0092 | 0.0032 | 0.0092 | 0.0092 | 0.0078 | 0.0096 | 0.0087 |
| Malonaldehyde | 0.025 | - | 0.0184 | 0.0185 | 0.0138 | 0.185 | 0.0132 | 0.0168 | 0.0146 |
| Naphthalene | 0.011 | - | 0.0046 | 0.1153 | 0.0046 | 0.0046 | 0.0057 | 0.0174 | 0.0118 |
| Paracetamol | 0.044 | - | 0.0323 | 0.0300 | 0.0346 | 0.0300 | 0.0258 | 0.0362 | 0.0392 |
| Salicylic acid | 0.017 | - | 0.0161 | 0.0208 | 0.0208 | 0.0185 | 0.0161 | 0.033 | 0.0233 |
| Toluene | 0.010 | - | 0.0069 | 0.0115 | 0.0092 | 0.0069 | 0.0059 | 0.0139 | 0.0334 |
| Uracil | 0.013 | - | 0.0092 | 0.0115 | 0.0138 | 0.0092 | 0.0069 | 0.0149 | 0.0116 |
| Force prediction | | | | | | | | | |
| Aspirin | 0.175 | 0.2191 | 0.1891 | 0.1522 | 0.1684 | 0.1900 | 0.1520 | 0.1429 | 0.1212 |
| Azobenzene | 0.097 | - | 0.0669 | 0.0692 | 0.0600 | 0.0761 | 0.0585 | 0.0513 | 0.0486 |
| Benzene | 0.017 | 0.0115 | 0.0069 | 0.0069 | 0.0046 | 0.0069 | 0.0056 | 0.0047 | 0.0056 |
| Ethanol | 0.085 | 0.083 | 0.0646 | 0.0484 | 0.0484 | 0.0738 | 0.0522 | 0.0516 | 0.0438 |
| Malonaldehyde | 0.152 | 0.1522 | 0.0118 | 0.0946 | 0.0830 | 0.1338 | 0.0893 | 0.0875 | 0.0802 |
| Naphthalene | 0.060 | 0.0438 | 0.0300 | 0.0369 | 0.0208 | 0.0415 | 0.0291 | 0.0242 | 0.0228 |
| Paracetamol | 0.164 | - | 0.1361 | 0.1107 | 0.1130 | 0.1338 | 0.1029 | 0.0979 | 0.0840 |
| Salicylic acid | 0.088 | 0.1222 | 0.0922 | 0.0715 | 0.0669 | 0.0992 | 0.0795 | 0.0771 | 0.0648 |
| Toluene | 0.058 | 0.0507 | 0.0369 | 0.0350 | 0.0415 | 0.0438 | 0.0264 | 0.0244 | 0.0239 |
| Uracil | 0.088 | 0.0876 | 0.0669 | 0.0484 | 0.0415 | 0.0738 | 0.0495 | 0.0487 | 0.0446 |

The results are reported in MAE. The energy and force are measured in kcal/mol and kcal/mol/Å, respectively. The best numbers are marked in bold.
Table 3: Performance on the MD22 dataset.

| Molecule | sGDML | ViSNet | ViSNet-Improper | ViSNet-LSRM | MACE | QuinNet | FreeCG |
|---|---|---|---|---|---|---|---|
| Energy prediction | | | | | | | |
| Ac-Ala3-NHMe | 0.391 | 0.0636 | 0.0546 | 0.0673 | 0.0631 | 0.0840 | 0.507 |
| AT-AT | 0.720 | 0.0708 | 0.0668 | 0.0780 | 0.108 | 0.144 | 0.0665 |
| AT-AT-CG-CG | 1.42 | 0.196 | 0.197 | 0.118 | 0.154 | 0.379 | 0.254 |
| DHA | 1.29 | 0.0741 | 0.0700 | 0.0897 | 0.135 | 0.118 | 0.0761 |
| Buckyball catcher | 1.17 | 0.508 | 0.537 | 0.319 | 0.489 | 0.563 | 0.512 |
| Stachyose | 4.00 | 0.0915 | 0.0882 | 0.104 | 0.122 | 0.226 | 0.183 |
| Double-walled nanotube | 4.00 | 0.800 | 0.601 | 1.81 | 1.67 | 1.81 | 0.543 |
| Force prediction | | | | | | | |
| Ac-Ala3-NHMe | 0.790 | 0.0830 | 0.0709 | 0.0942 | 0.0876 | 0.0681 | 0.0531 |
| AT-AT | 0.690 | 0.0812 | 0.0776 | 0.0781 | 0.0992 | 0.0687 | 0.0634 |
| AT-AT-CG-CG | 0.700 | 0.148 | 0.139 | 0.1064 | 0.1153 | 0.1273 | 0.1252 |
| DHA | 0.750 | 0.0598 | 0.0554 | 0.0598 | 0.0646 | 0.0515 | 0.0507 |
| Buckyball catcher | 0.680 | 0.184 | 0.201 | 0.1026 | 0.0853 | 0.1091 | 0.1783 |
| Stachyose | 0.680 | 0.0879 | 0.0802 | 0.0767 | 0.0876 | 0.0543 | 0.612 |
| Double-walled nanotube | 0.520 | 0.362 | 0.292 | 0.3391 | 0.2767 | 0.2473 | 0.2449 |

The results are reported in MAE. The energy and force are measured in kcal/mol and kcal/mol/Å, respectively. The best numbers are marked in bold. Note that the energy MAE is not divided by the total number of atoms, unlike in wang2023quinnet.
Table 4: Molecular property prediction on the QM9 dataset.

| Target | Unit | SchNet | EGNN | DimeNet++ | PaiNN | SphereNet | PaxNet | ET | ComENet | ViSNet | FreeCG |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $\mu$ | mD | 33 | 29 | 29.7 | 12 | 24.5 | 10.8 | 11 | 24.5 | 9.5 | 11.4 |
| $\alpha$ | $ma_0^3$ | 235 | 71 | 43.5 | 45 | 44.9 | 44.7 | 59 | 45.2 | 41.1 | 38.2 |
| $\epsilon_{HOMO}$ | meV | 41 | 29 | 24.6 | 27.6 | 22.8 | 22.8 | 20.3 | 23.1 | 17.3 | 16.6 |
| $\epsilon_{LUMO}$ | meV | 34 | 25 | 19.5 | 20.4 | 18.9 | 19.2 | 17.5 | 19.8 | 14.8 | 13.5 |
| $U_0$ | meV | 14 | 11 | 6.32 | 5.85 | 6.26 | 5.9 | 6.15 | 6.59 | 4.23 | 4.11 |
| $U$ | meV | 19 | 12 | 6.28 | 5.83 | 6.36 | 5.92 | 6.38 | 6.82 | 4.25 | 4.51 |

The results are reported in MAE. The best numbers are marked in bold.

Dynamics on small molecules. MD17 is a well-known molecular dynamics benchmark for small molecules. FreeCG outperforms the other models in all force prediction tasks. It significantly decreases the force prediction error for the most hard-to-predict molecule in this dataset, aspirin, by 15%. Remarkably, our method also decreases the MAE by over 10% for ethanol, naphthalene, and salicylic acid. FreeCG does not have a particular preference for molecule size: it demonstrates strong performance on aspirin (180.2 g/mol) and excels on ethanol (46.1 g/mol). The energy prediction is also competitive with other SoTA methods. rMD17 is the revised version of MD17, in which the trajectories were recomputed with higher accuracy. The force prediction accuracy of FreeCG still leads on the majority of the molecules. It improves the force results over the baseline model, VisNet, on all molecules except benzene, and achieves SoTA results on more molecules. Note that accuracy on benzene is already extremely high with previous models. The results for MD17 and rMD17 are reported in Tabs. 1 and 2, respectively.

Dynamics on large molecules. MD22 is a large-molecule benchmark adopted by several studies wang2024enhancing ; wang2023quinnet ; li2023long. As shown in Tab. 3, FreeCG also performs well on large-scale data. It leads in most tracks for force prediction and shows comparable results for energy prediction. Remarkably, the decreases in MAE for energy and force prediction on Ac-Ala3-NHMe are both around 20%. The other models perform less consistently on force prediction, while VisNet-LSRM exhibits strong performance on energy prediction. It is also reasonable that all modern deep neural network-based methods outperform sGDML, which is a classical kernel method.

Molecular property prediction. To examine the generalization power of FreeCG on molecular property prediction, we use QM9 as a standard benchmark for this task. FreeCG performs best on most properties, and VisNet performs second best on most measures. Although these two methods were proposed as MLFFs, they are even more competitive than the others in molecular property prediction tasks.

Efficiency benchmarking. The Chignolin dataset wang2023aimd comprises nearly 10,000 structures of a 166-atom mini-protein and is taken as a benchmark for testing memory usage and inference speed. We compare the inference speed and memory usage of FreeCG with VisNet, NequIP, and Allegro. The results are shown in Tab. 3. FreeCG adds little extra time and memory cost compared to the baseline model, VisNet. It is also the most efficient in both memory and speed compared to the other two CG transform-based methods, NequIP and Allegro. The overall results demonstrate the efficiency of FreeCG. The number of groups in the group CG transform also impacts the inference speed. Fig. 4 shows the theoretical number of paths and the actual inference time for different group numbers. A computational analysis of the CG transform is given in the Methods section.

Ablation Study. We conduct ablations on the different modules we propose, as well as on the strategies for abstract edges shuffling. The results are shown in Tab. 5 and reveal that each of our modules contributes to the final score of FreeCG. In the final implementation of abstract edges shuffling, we increase the index of each abstract edge by $1.5\,T/G$. Here we also study the influence of the shuffling strategy, comparing $0.5\,T/G$, $1.0\,T/G$, and $1.5\,T/G$. The results show that $1.5\,T/G$ works best. The number of groups is also evaluated, and a small number of groups appears to be a good choice.

3 Discussion

This work proposes FreeCG, a geometric neural network that frees the design space of the CG transform. We reveal two main issues in designing CG transform-based neural networks: 1) the computational overhead and 2) the limited design space of the CG transform layer. We analyse and prove that these two problems are rooted in the mathematical constraint posed by permutation equivariance. By proposing and leveraging an interesting proposition, we bypass this constraint by designing the CG transform layer upon permutation-invariant abstract edges. On top of this free design platform, we propose group CG transform, sparse path, abstract edges shuffling, and attention enhancer to achieve a highly expressive and efficient MLFF model. We conduct experiments on various data types and tasks, e.g., force prediction for small and large molecules, and molecular property prediction. The results show that FreeCG is the current SoTA for force and property prediction. The speed and memory demands are also tested on the Chignolin dataset.

Beyond this, the proposed design paradigm can be reused in future CG transform-based neural networks. The proposition shows that once permutation-invariant mathematical objects are created, the CG transform built on them can be designed freely. Both the way of creating permutation-invariant objects and the design of the CG transform layer on top of them can be pushed well beyond what we do in this work. It thus points to a paradigm for expressive and efficient CG transform-based neural network design.

4 Methods

4.1 Experimental settings

We conduct all experiments under the same software and hardware settings. The machine is equipped with an Intel® Xeon® Gold 6330 CPU @ 2.00GHz and an NVIDIA Tesla A100 80G GPU. The experiments for each molecule are run on a single GPU. PyTorch 1.10.0 is used as the basic machine learning library. For the CG transform operations, we adopt e3nn 0.5.1. Matplotlib 3.0.3 is used for plotting. Details are listed in Tab. 6. For the training/validation/test splits, we follow previous works wang2024enhancing ; wang2023quinnet . The model evaluated on the test set is selected by its performance on the validation set: if the validation score does not improve for a given number of epochs, we terminate training and select the checkpoint with the best validation score. As in previous works, an Exponential Moving Average (EMA) is adopted to generate the model weights. The hyperparameters and detailed training configurations are reported in Tab. 7.
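The checkpoint selection and EMA described above can be sketched as follows. This is a minimal illustration rather than the actual training code: the decay value, the patience value, and the names `ema_update` and `EarlyStopper` are hypothetical, and the real settings are those reported in Tab. 7.

```python
def ema_update(ema_weights, weights, decay=0.999):
    """Blend the current weights into the running average (decay is an assumed value)."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


class EarlyStopper:
    """Track the best validation score; stop after `patience` epochs without improvement."""

    def __init__(self, patience):
        self.patience = patience
        self.best_score = float("inf")  # lower validation error is better
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_score):
        """Record one validation result and return True when training should stop."""
        if val_score < self.best_score:
            self.best_score, self.best_epoch = val_score, epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy validation curve (made-up numbers): improvement stalls after epoch 2.
stopper = EarlyStopper(patience=3)
last_epoch = None
for epoch, score in enumerate([1.0, 0.8, 0.7, 0.75, 0.71, 0.9, 0.6]):
    last_epoch = epoch
    if stopper.step(epoch, score):
        break
```

In this toy run, training stops once three consecutive epochs fail to improve, and the checkpoint recorded in `stopper.best_epoch` would be the one restored for testing.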

4.2 Model implementation

Here we show how FreeCG is built upon VisNet. This section provides detailed explanations of the implementation details, ensuring FreeCG can be replicated effectively.

Input layer. The input consists of the atom coordinates and types $\{\bm{X}=(r_{1},r_{2},\ldots,r_{N}),\ \bm{Z}=(z_{1},z_{2},\ldots,z_{N})\}$, where $r_{i}\in\mathbbm{R}^{3}$ is the Cartesian coordinate of atom $i$ and $z_{i}$ is its atom type (atomic number). We first embed the atom types into the latent space and take them as the node features of the first layer, $h_{i}={\rm embedding}(z_{i})\in\mathbbm{R}^{C}$, where $C$ is the dimension of the latent space.
For each atom $i$, we only consider the neighbouring atoms $\mathcal{N}(i)$ within a given radius. We keep the distance vector $r_{ij}$ from the central atom to each neighbouring atom and lift it to ($l=1,p=-1$ and $l=2,p=1$) irreps $E_{ij}\in\mathbbm{R}^{3+5}$ via real spherical harmonics applied to the unit vector, $E_{ij}=Y^{l}(r_{ij}/\lVert r_{ij}\rVert)$; we also compute the corresponding Euclidean norm $\lVert r_{ij}\rVert$. The norms are then converted to high-dimensional scalar features (edge attributes) $f_{ij}={\rm RBF}(\lVert r_{ij}\rVert)\in\mathbbm{R}^{C}$ by radial basis functions (RBFs). We also maintain zero-initialized abstract edges $\overline{E}^{L=0}_{i}=\bm{0}$ for each node, to be updated in the following layers.
We assign the same number of abstract edges as the dimension of the latent features, so that no additional operations are required to align the dimensions.
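The lifting step of the input layer can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the e3nn-based implementation: the normalisation and component ordering of the real spherical harmonics, the Gaussian form of the RBF, and the names `lift_to_irreps` and `gaussian_rbf` with their default parameters are all hypothetical choices.

```python
import math


def lift_to_irreps(r_ij):
    """Map a distance vector to its norm and the 3 + 5 real spherical-harmonic components."""
    norm = math.sqrt(sum(c * c for c in r_ij))
    x, y, z = (c / norm for c in r_ij)  # unit vector r_ij / ||r_ij||
    s3 = math.sqrt(3.0)
    l1 = [x, y, z]  # l = 1 components (assumed ordering)
    l2 = [
        s3 * x * y,
        s3 * y * z,
        0.5 * (3.0 * z * z - 1.0),
        s3 * x * z,
        0.5 * s3 * (x * x - y * y),
    ]  # l = 2 components (assumed normalisation: unit norm for a unit vector)
    return norm, l1 + l2


def gaussian_rbf(distance, num_basis=4, cutoff=5.0, gamma=1.0):
    """Expand a scalar distance on evenly spaced Gaussian centres (assumed RBF form)."""
    centres = [cutoff * k / (num_basis - 1) for k in range(num_basis)]
    return [math.exp(-gamma * (distance - c) ** 2) for c in centres]


norm, E_ij = lift_to_irreps((1.0, 2.0, 2.0))  # E_ij has 3 + 5 = 8 components
f_ij = gaussian_rbf(norm)                      # scalar edge attributes
```

The eight-component output corresponds to $E_{ij}\in\mathbbm{R}^{3+5}$, while the RBF expansion plays the role of the edge attributes $f_{ij}$.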

Intermediate layers. We use a superscript $L$ to denote the index of the layer a feature belongs to. Message passing between atoms is implemented by a transformer architecture. For each atom $i$, the neighbouring atoms $j\in\mathcal{N}(i)$ send messages to $i$, and the messages are aggregated to update the information of $i$. The query, key, and value of the node features are first calculated: $q_{i}=f_{q}(h_{i})$, $k_{j}=f_{k}(h_{j})$, $v_{j}=f_{v}(h_{j})$. The edge attributes are also converted to auxiliary terms $dk_{j}=f_{dk}(f_{ij})$ and $dv_{j}=f_{dv}(f_{ij})$, which modulate the keys and values of the atoms. All functions $f$ here are fully-connected linear operations. We then calculate the attention from $i$ to $j$:

$$a_{ij}={\rm SiLU}\Big({\rm Cutoff}(\lVert r_{ij}\rVert)\,q_{i}k_{j}\,dk_{j}+{\rm AttEnhancer}(r_{ij},\overline{E}^{L}_{j})\Big) \tag{7}$$

where ${\rm Cutoff}(\cdot)$ is a cosine cutoff function and ${\rm AttEnhancer}(\cdot)$ is the proposed attention enhancer module, detailed next. Recall that $\overline{E}^{L}_{i}\in\mathbbm{R}^{C*8}$ and $r_{ij}\in\mathbbm{R}^{8}$. Each of the $C$ abstract edges undergoes a dot product with $r_{ij}$, and the highest value among them is the output of ${\rm AttEnhancer}$. In other words,

$${\rm AttEnhancer}(\overline{E}^{L}_{i},r_{ij})=\max_{C}(\overline{E}^{L}_{i}\odot r_{ij}) \tag{8}$$

as introduced in Eq. (6). The values are then multiplied by $dv$ and the attention:

$$\hat{v}_{j\mapsto i}=v_{j}\cdot dv_{j}\cdot a_{ij} \tag{9}$$
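As a rough sketch of Eqs. (7) and (8), the attention score for one pair of atoms could be computed as below. Several details are assumptions made for illustration: the cosine cutoff takes its common form $0.5(\cos(\pi d/r_{c})+1)$, the product $q_{i}k_{j}dk_{j}$ is read as a sum over channels, and the function names are hypothetical.

```python
import math


def cosine_cutoff(distance, cutoff):
    """Decay smoothly from 1 at distance 0 to 0 at the cutoff radius (assumed form)."""
    if distance >= cutoff:
        return 0.0
    return 0.5 * (math.cos(math.pi * distance / cutoff) + 1.0)


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def att_enhancer(abstract_edges, r_ij):
    """Eq. (8): dot each of the C abstract edges with r_ij and keep the maximum."""
    return max(sum(e * r for e, r in zip(edge, r_ij)) for edge in abstract_edges)


def attention(q_i, k_j, dk_j, distance, cutoff, abstract_edges, r_ij):
    """Eq. (7), reading q_i * k_j * dk_j as a channel-wise product summed over channels."""
    qk = sum(q * k * d for q, k, d in zip(q_i, k_j, dk_j))
    return silu(cosine_cutoff(distance, cutoff) * qk + att_enhancer(abstract_edges, r_ij))


# Toy inputs: C = 2 abstract edges, each with 8 components.
abstract_edges = [[1.0, 0, 0, 0, 0, 0, 0, 0], [0, 2.0, 0, 0, 0, 0, 0, 0]]
r_ij = [0.5, 1.0, 0, 0, 0, 0, 0, 0]
enhancer = att_enhancer(abstract_edges, r_ij)
# Beyond the cutoff only the enhancer term contributes to the score.
a_ij = attention([1.0, 1.0], [1.0, 1.0], [1.0, 1.0], 6.0, 5.0, abstract_edges, r_ij)
```

The max over abstract edges is insensitive to how the $C$ edges are ordered, so the enhancer adds geometric information without depending on their indexing.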

The message $\hat{v}_{j\mapsto i}$ then undergoes two different fully-connected operations to generate two coefficients $s_{1}$ and $s_{2}$, which are used to generate the abstract edges:

$$\hat{E}^{L}_{j\mapsto i}=\overline{E}^{L}_{i}\cdot s_{1}+E_{ij}\cdot s_{2} \tag{10}$$

This variable, together with $\hat{v}_{j\mapsto i}$, is aggregated by summation:

$$\hat{E}^{L}_{i}=\sum_{j\in\mathcal{N}(i)}\hat{E}^{L}_{j\mapsto i} \tag{11}$$
$$\hat{v}^{L}_{i}=\sum_{j\in\mathcal{N}(i)}\hat{v}^{L}_{j\mapsto i} \tag{12}$$

$\hat{v}^{L}_{i}$ is then converted to three variables for further operations:

$$o^{L,1}_{i},o^{L,2}_{i},o^{L,3}_{i}={\rm Linear}(\hat{v}^{L}_{i}) \tag{13}$$

$\hat{E}^{L}_{i}\in\mathbbm{R}^{C*8}$ and $\overline{E}^{L}_{i}\in\mathbbm{R}^{C*8}$ are used for the following group CG transform and abstract edges shuffling. First, $\overline{E}^{L}_{i}$ undergoes a fully-connected operation along the $C$ dimension and is multiplied by $o^{L,1}_{i}$, giving $o^{L,1}_{i}\cdot{\rm Linear}(\overline{E}^{L}_{i})$.
This product and $\hat{E}^{L}_{i}$ are then each divided into $G$ groups along the $C$ dimension, yielding $\hat{E}^{L}_{i,t\in G_{g}}\in\mathbbm{R}^{\frac{C}{G}*8}$ and $(o^{L,1}_{i}\cdot{\rm Linear}(\overline{E}^{L}_{i}))_{t\in G_{g}}\in\mathbbm{R}^{\frac{C}{G}*8}$. For each group, we perform the CG transform between the two variables in fully connected form with learnable weights, and concatenate the results to generate $d\overline{E}^{\prime L+1}_{i}$ before shuffling, as shown in Eq. (5).
For the shuffling strategy, we add $\frac{3C}{2G}$ to each index of the abstract edges $d\overline{E}^{\prime L+1}_{i}$. The result is then added to $\hat{E}^{L}_{i}$ to form a residual structure:

$$d\overline{E}^{L+1}_{i}={\rm shuffle}(d\overline{E}^{\prime L+1}_{i})+\hat{E}^{L}_{i} \tag{14}$$

where $d\overline{E}^{L+1}_{i}$ is added to $\overline{E}^{L}_{i}$ to obtain $\overline{E}^{L+1}_{i}$. Next, we update $h$ and $f$. We first show the update for $h$:

$$dh^{L+1}_{i}=h^{L}_{i}+\Big({\rm Linear_{1}}(\overline{E}^{L}_{i})\odot{\rm Linear_{2}}(\overline{E}^{L}_{i})\Big)\cdot o^{L,2}_{i}+o^{L,3}_{i} \tag{15}$$

To update $f$, we follow VisNet and leverage the rejection of vectors:

$$df^{L+1}_{ij}=f^{L}_{ij}+{\rm RejCalc_{trg}}(\overline{E}^{L}_{i},r_{ij})\odot{\rm RejCalc_{src}}(\overline{E}^{L}_{i},r_{ij})\cdot{\rm SiLU}({\rm Linear}(f^{L}_{ij})) \tag{16}$$

where the rejection calculation module ${\rm RejCalc}$ is:

$${\rm RejCalc_{mode}}(a,b)=a-({\rm Linear_{mode}}(a)\odot b)\cdot b \tag{17}$$
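The geometric idea behind Eq. (17) is vector rejection: removing from $a$ its component along $b$. The sketch below is a simplification for intuition only. It assumes plain 3-vectors, a unit-length $b$, the product $\odot$ read as a dot product, and the learnable ${\rm Linear_{mode}}$ replaced by the identity; the function name is hypothetical.

```python
def rejection(a, b):
    """Return a minus its projection onto the unit vector b (Linear_mode taken as identity)."""
    dot = sum(x * y for x, y in zip(a, b))
    return [x - dot * y for x, y in zip(a, b)]


a = [1.0, 2.0, 3.0]
b = [0.0, 0.0, 1.0]     # unit vector along z
rej = rejection(a, b)   # the part of a that is orthogonal to b
```

The result has no component left along $b$, which is the property the edge update in Eq. (16) relies on.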

The updated $\overline{E}^{L+1}$, $h^{L+1}$, and $f^{L+1}$ are fed into the next layer.
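The grouping and shuffling steps of this layer can be illustrated with a small index-level sketch. It assumes that adding $\frac{3C}{2G}$ to an index wraps around modulo $C$ (a cyclic shift) and that groups are contiguous blocks; both are assumptions, and integers stand in for the actual abstract edges in $\mathbbm{R}^{8}$. The function names are hypothetical.

```python
def split_into_groups(edges, num_groups):
    """Split C abstract edges into G contiguous groups of C // G edges each."""
    size = len(edges) // num_groups
    return [edges[g * size:(g + 1) * size] for g in range(num_groups)]


def shuffle_edges(edges, num_groups):
    """Shift every edge index by 3C / (2G); wrapping modulo C is an assumption."""
    num_edges = len(edges)
    shift = (3 * num_edges) // (2 * num_groups)
    shuffled = [None] * num_edges
    for index, edge in enumerate(edges):
        shuffled[(index + shift) % num_edges] = edge
    return shuffled


edges = list(range(8))                  # stand-ins for C = 8 abstract edges
groups = split_into_groups(edges, 2)    # G = 2 groups of 4 edges
shuffled = shuffle_edges(edges, 2)      # shift = 3 * 8 // (2 * 2) = 6
regrouped = split_into_groups(shuffled, 2)
```

Because the shift is one and a half group widths, every group in the next layer contains edges that came from two different groups, which is what lets information cross group boundaries.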

Output layers are different with respect to the task our model performs. We introduce the details for each task.

Force field prediction. Our model predicts an energy-conserving force field, i.e., we derive the forces from the predicted potential energy. Following VisNet wang2024enhancing and PaiNN schutt2021equivariant , we predict the potential energy of the molecule via an equivariant gated module:

$$h^{L+1}_{i},r^{L+1}_{i}={\rm MLP}\Big({\rm Concat}(h,\lVert{\rm Linear_{1}}(\overline{E}^{L}_{i})\rVert)\Big) \tag{18}$$

where ${\rm MLP}$ is a multi-layer perceptron with one hidden layer. One more step updates $\overline{E}^{L+1}_{i}$:

$$\overline{E}^{L+1}_{i}={\rm Linear_{2}}(\overline{E}^{L}_{i})\cdot r^{L+1}_{i} \tag{19}$$

We stack this module twice; the total energy of the molecule is then the sum of the last-layer node features $h^{\bm{L}}$:

$$y=\sum_{i}h^{\bm{L}}_{i} \tag{20}$$

and the force is the negative gradient of the total energy:

$$F_{i}=-\nabla_{r_{i}}y \tag{21}$$
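The relation between energy and force can be checked numerically with a toy example. The sketch below swaps in two stand-ins that are not part of the model: a harmonic pair potential replaces the predicted energy $y$, and central finite differences replace the automatic differentiation a framework such as PyTorch would provide. All names and parameter values are hypothetical.

```python
import math


def energy(positions, k=1.0, r0=1.0):
    """Toy harmonic pair energy standing in for the predicted total energy y."""
    total = 0.0
    for a in range(len(positions)):
        for b in range(a + 1, len(positions)):
            d = math.dist(positions[a], positions[b])
            total += 0.5 * k * (d - r0) ** 2
    return total


def forces(positions, eps=1e-6):
    """F_i = -dE/dr_i, approximated by central finite differences."""
    result = []
    for i in range(len(positions)):
        f_i = []
        for axis in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][axis] += eps
            minus[i][axis] -= eps
            f_i.append(-(energy(plus) - energy(minus)) / (2.0 * eps))
        result.append(f_i)
    return result


pos = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]  # bond stretched beyond r0 = 1.0
F = forces(pos)                            # forces pull the two atoms together
```

Because both forces come from one scalar energy, they sum to zero here, which illustrates why deriving forces from the energy yields a conservative field.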

Property prediction. The calculations for properties in QM9 follow the same procedure as energy prediction in force field prediction, with the exception of molecular dipole and electronic spatial extent. We first need to calculate the center of mass $r_{c}$, which is:

$$r_{c}=\frac{\sum_{i}m_{i}\cdot r_{i}}{\sum_{i}m_{i}} \tag{22}$$

For molecular dipole, the formula is:

$$\mu=\left\|\sum_{i}\overline{E}_{i}^{\bm{L}}+h_{i}^{\bm{L}}(r_{i}-r_{c})\right\| \tag{23}$$

and for electronic spatial extent:

$$\langle R^{2}\rangle=\sum_{i}h_{i}^{\bm{L}}\lVert r_{i}-r_{c}\rVert \tag{24}$$

It suffices to change the output head for different tasks.
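As an illustration of such an output head, the center of mass of Eq. (22) and the dipole magnitude of Eq. (23) could be computed as below. This is a sketch under assumptions: each atom is given a hypothetical 3-vector standing in for the vector part of $\overline{E}_{i}^{\bm{L}}$ and a scalar standing in for $h_{i}^{\bm{L}}$, and the function names and toy values are not from the paper.

```python
import math


def center_of_mass(masses, positions):
    """Eq. (22): mass-weighted mean of the atomic positions."""
    total = sum(masses)
    return [sum(m * p[k] for m, p in zip(masses, positions)) / total for k in range(3)]


def dipole_magnitude(vectors, scalars, masses, positions):
    """Eq. (23): norm of sum_i (E_i + h_i * (r_i - r_c)), with E_i taken as 3-vectors."""
    r_c = center_of_mass(masses, positions)
    total = [0.0, 0.0, 0.0]
    for e_i, h_i, r_i in zip(vectors, scalars, positions):
        for k in range(3):
            total[k] += e_i[k] + h_i * (r_i[k] - r_c[k])
    return math.sqrt(sum(c * c for c in total))


# Toy diatomic: opposite scalar outputs on two atoms placed 2 units apart.
masses = [1.0, 1.0]
positions = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
vectors = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
scalars = [-1.0, 1.0]
r_c = center_of_mass(masses, positions)
mu = dipole_magnitude(vectors, scalars, masses, positions)
```

Only this final reduction changes between tasks; the layers that produce the per-atom scalars and vectors stay the same.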

4.3 Datasets details

MD17 and rMD17. Both are molecular dynamics datasets for small molecules. MD17 chmiela2017machine was proposed by Chmiela, S., et al. and contains ab initio-level molecular dynamics trajectories. The dataset provides four types of information: 1) atomic numbers, 2) atomic positions, 3) molecular energy, and 4) the force on each atom. To alleviate the noise in the trajectory computation, Christensen, A. S. et al. proposed the revised MD17 (rMD17) Christensen2020 , where the molecular trajectories are calculated at the PBE/def2-SVP level of theory. The tight SCF convergence and dense DFT integration grid further guarantee the accuracy of the calculated trajectories.

MD22. Compared to MD17 and rMD17, MD22 comprises larger molecules, ranging from 42 to 370 atoms. The trajectories are sampled between 400 K and 500 K at 1 fs resolution. The energy and force labels are acquired at the PBE+MBD level of theory. The root mean squared test error of force prediction is controlled to be around 1 kcal/mol/Å in the original paper chmiela2023accurate . Thus, the training set sizes vary across molecules: generally, the larger the molecule, the smaller the training set.

QM9. QM9 consists of around 130,000 molecules with 19 property regression tasks. It is a subset of the GDB-17 database ruddigkeit2012enumeration . The data are calculated at the B3LYP/6-31G(2df,p) level of DFT. Since the properties differ in nature, we adopt different output heads for them, as discussed in Sec. 4.2.

Chignolin. The AIMD-Chig dataset consists of 2 million conformations of the 166-atom protein Chignolin, sampled at the M06-2X/6-31G* level of DFT. There are around 10,000 conformations, which cover folded, unfolded, and metastable states. We take this dataset as our efficiency benchmark, following wang2024enhancing .

4.4 Proof of the permutation invariance of abstract edges

According to Sec. 4.2, we first recall the last step for generating abstract edges:

$$\hat{E}^{L}_{i}=\sum_{j\in\mathcal{N}(i)}\hat{E}^{L}_{j\mapsto i}=\sum_{p\in P}\sum_{j\in\mathcal{N}(i)}\frac{\mathcal{P}(p)\hat{E}^{L}_{j\mapsto i}}{{\rm Card}(P)} \tag{25}$$

where $P$ is the set of all permutation operations; we omit the subscript of $\mathcal{P}$ that specifies the space it acts on. Note that we are proving that the abstract edges of each atom are permutation invariant, so that the CG transform can be freely designed per atom; the permutation is therefore applied to $j$ but not to $i$. Because the expression averages over all permutation operations, the last step is permutation invariant. It then suffices to show that each of the previous steps is at least permutation equivariant. It further suffices to show that they are equivariant w.r.t. a single index swap, since every permutation can be composed of several swaps. If we exchange, without loss of generality, indices $x$ and $y$, then the $a_{ij}$ in which $x$ or $y$ appears as the subscript $j$ are exchanged with each other, and so are $\hat{v}_{j\mapsto i}$ and $\hat{E}^{L}_{j\mapsto i}$. Thus, the remaining steps are equivariant w.r.t. a single swap, and hence permutation equivariant. This concludes the proof that abstract edges are permutation invariant.
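The key step of the argument, that a sum over neighbours does not depend on their order, can be checked directly. The sketch below uses made-up message values for three hypothetical neighbours and exact rational arithmetic so that the comparison is not blurred by floating-point rounding; `aggregate` is an illustrative name.

```python
from fractions import Fraction
from itertools import permutations


def aggregate(messages):
    """Sum per-neighbour messages component-wise, as in Eq. (11)."""
    return [sum(component) for component in zip(*messages)]


# Hypothetical messages from three neighbours j of one central atom i.
messages = [
    [Fraction(1, 2), Fraction(-1), Fraction(3)],
    [Fraction(2), Fraction(1, 3), Fraction(0)],
    [Fraction(-5, 4), Fraction(7), Fraction(1)],
]

reference = aggregate(messages)
# Re-aggregate under every ordering of the neighbours and compare.
all_equal = all(aggregate(list(p)) == reference for p in permutations(messages))
```

All six orderings give the same aggregate, so anything computed from it afterwards, including the CG transform, cannot break permutation symmetry.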

4.5 Analysis on the efficiency of CG transform

CG transform comprises of two single steps: 1) tensor product between two irreps, and 2) the decomposition of the output tensors into irreps. These transforms are actually quadratic homogeneous polynomials. For the sake of convenience, we discuss SO(3) group here. Recall the CG transform formula:

Cmclc=ma,mbCAmalaBmblbsubscriptsuperscript𝐶subscript𝑙𝑐subscript𝑚𝑐subscriptsubscript𝑚𝑎subscript𝑚𝑏𝐶subscriptsuperscript𝐴subscript𝑙𝑎subscript𝑚𝑎subscriptsuperscript𝐵subscript𝑙𝑏subscript𝑚𝑏C^{l_{c}}_{m_{c}}=\sum_{m_{a},m_{b}}CA^{l_{a}}_{m_{a}}B^{l_{b}}_{m_{b}}italic_C start_POSTSUPERSCRIPT italic_l start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_C italic_A start_POSTSUPERSCRIPT italic_l start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_B start_POSTSUPERSCRIPT italic_l start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT end_POSTSUBSCRIPT (26)

where $m_a + m_b = m_c$. For example, if we take a single multiplication or addition as one basic operation, then generating an $l=2$ irrep from two $l=1$ irreps consumes 1 basic operation for each of $m=\pm 2$, 3 for each of $m=\pm 1$, and 5 for $m=0$, i.e. 13 basic operations in total. A good way to interpret irreps is as a generalization of scalars and vectors. A dot product between two vectors consumes only 5 basic operations, compared with the 13 above. Thus, the CG transform is very time-consuming. The number of basic operations for the CG transform between each pair of degrees is shown in Tab. 8.
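
The counting argument above can be reproduced with a short script. This is a sketch under stated assumptions rather than the authors' procedure: the function name `cg_cost` is hypothetical, every $(m_a, m_b)$ pair allowed by $m_a + m_b = m_c$ is charged one multiplication, summing $n$ terms is charged $n-1$ additions, and terms whose coefficient happens to vanish are not removed, so the result is an upper bound.

```python
def cg_cost(l_a, l_b, l_c):
    """Count product terms and basic operations for one path l_a x l_b -> l_c.

    Returns (ops_per_m, total_terms, total_ops), where ops_per_m maps each
    output order m_c to its number of multiplications plus additions.
    Assumption: terms with a vanishing coefficient are still counted.
    """
    # Triangle rule: the path exists only if |l_a - l_b| <= l_c <= l_a + l_b.
    if not abs(l_a - l_b) <= l_c <= l_a + l_b:
        return {}, 0, 0

    ops_per_m = {}
    total_terms = 0
    for m_c in range(-l_c, l_c + 1):
        # Pairs (m_a, m_b) with m_a + m_b = m_c and |m_b| <= l_b.
        n_terms = sum(
            1
            for m_a in range(-l_a, l_a + 1)
            if -l_b <= m_c - m_a <= l_b
        )
        total_terms += n_terms
        # n multiplications and n - 1 additions.
        ops_per_m[m_c] = 2 * n_terms - 1 if n_terms else 0

    return ops_per_m, total_terms, sum(ops_per_m.values())


if __name__ == "__main__":
    for path in [(1, 1, 2), (1, 1, 0), (0, 2, 2)]:
        per_m, terms, ops = cg_cost(*path)
        print(path, per_m, terms, ops)
```

Under these assumptions, `cg_cost(1, 1, 2)` gives 9 product terms and 13 basic operations, in line with the count in the text, and `cg_cost(1, 1, 0)` gives the 5 operations of a dot product.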

Author contributions

S. S. initiated and conceived the study, conducted all experiments, and wrote the manuscript under the guidance of Q. C. H. G. discussed the project and set up the experimental platform. Q. C. supervised the project. All authors reviewed and approved the final manuscript.

Data availability

Code availability

The code for reproduction will be made publicly available upon official publication.

References

  • (1) Amaro, R. E. & Mulholland, A. J. Multiscale methods in drug design bridge chemical and biological complexity in the search for cures. Nature Reviews Chemistry 2, 0148 (2018).
  • (2) Das, P. et al. Accelerated antimicrobial discovery via deep generative models and molecular dynamics simulations. Nature Biomedical Engineering 5, 613–623 (2021).
  • (3) Chen, S. et al. Design of target specific peptide inhibitors using generative deep learning and molecular dynamics simulations. Nature Communications 15, 1611 (2024).
  • (4) Zepeda-Ruiz, L. A., Stukowski, A., Oppelstrup, T. & Bulatov, V. V. Probing the limits of metal plasticity with molecular dynamics simulations. Nature 550, 492–495 (2017).
  • (5) Liu, M. et al. Layer-by-layer phase transformation in Ti3O5 revealed by machine-learning molecular dynamics simulations. Nature Communications 15, 3079 (2024).
  • (6) Zeng, J., Cao, L., Xu, M., Zhu, T. & Zhang, J. Z. Complex reaction processes in combustion unraveled by neural network-based molecular dynamics simulation. Nature Communications 11, 5713 (2020).
  • (7) Meuwly, M. Machine learning for chemical reactions. Chemical Reviews 121, 10218–10239 (2021).
  • (8) Srivastava, I., Kotia, A., Ghosh, S. K. & Ali, M. K. A. Recent advances of molecular dynamics simulations in nanotribology. Journal of Molecular Liquids 335, 116154 (2021).
  • (9) Wang, Z., Zhu, J. & Li, S. Novel strategy for reducing the minimum miscible pressure in a CO2–oil system using nonionic surfactant: Insights from molecular dynamics simulations. Applied Energy 352, 121966 (2023).
  • (10) Kohn, W. & Sham, L. J. Self-consistent equations including exchange and correlation effects. Physical Review 140, A1133 (1965).
  • (11) Martin, R. M. Electronic structure: basic theory and practical methods (Cambridge University Press, 2020).
  • (12) Ceperley, D. M. & Alder, B. J. Ground state of the electron gas by a stochastic method. Physical Review Letters 45, 566 (1980).
  • (13) Bartlett, R. J. & Musiał, M. Coupled-cluster theory in quantum chemistry. Reviews of Modern Physics 79, 291 (2007).
  • (14) Burke, K. Perspective on density functional theory. The Journal of Chemical Physics 136 (2012).
  • (15) Jones, R. O. Density functional theory: Its origins, rise to prominence, and future. Reviews of Modern Physics 87, 897 (2015).
  • (16) Cohen, A. J., Mori-Sánchez, P. & Yang, W. Insights into current limitations of density functional theory. Science 321, 792–794 (2008).
  • (17) Lindorff-Larsen, K. et al. Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins: Structure, Function, and Bioinformatics 78, 1950–1958 (2010).
  • (18) Brooks, B. R. et al. CHARMM: the biomolecular simulation program. Journal of Computational Chemistry 30, 1545–1614 (2009).
  • (19) Cui, T. et al. Geometry-enhanced pretraining on interatomic potentials. Nature Machine Intelligence 1–9 (2024).
  • (20) Wang, Z., Liu, G., Zhou, Y., Wang, T. & Shao, B. Quinnet: efficiently incorporating quintuple interactions into geometric deep learning force fields. In Proceedings of the 37th International Conference on Neural Information Processing Systems, 77043–77055 (2023).
  • (21) Wang, Y. et al. Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing. Nature Communications 15, 313 (2024).
  • (22) Musaelian, A. et al. Learning local equivariant representations for large-scale atomistic dynamics. Nature Communications 14, 579 (2023).
  • (23) Batzner, S. et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature Communications 13, 2453 (2022).
  • (24) Drautz, R. Atomic cluster expansion for accurate and transferable interatomic potentials. Physical Review B 99, 014104 (2019).
  • (25) Batatia, I., Kovacs, D. P., Simm, G., Ortner, C. & Csányi, G. Mace: Higher order equivariant message passing neural networks for fast and accurate force fields. Advances in Neural Information Processing Systems 35, 11423–11436 (2022).
  • (26) Thölke, P. & De Fabritiis, G. Equivariant transformers for neural network based molecular potentials. In International Conference on Learning Representations (2021).
  • (27) Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A. & Müller, K.-R. Schnet–a deep learning architecture for molecules and materials. The Journal of Chemical Physics 148 (2018).
  • (28) Chmiela, S. et al. Machine learning of accurate energy-conserving molecular force fields. Science Advances 3, e1603015 (2017).
  • (29) Schütt, K., Unke, O. & Gastegger, M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. In International Conference on Machine Learning, 9377–9388 (PMLR, 2021).
  • (30) Thomas, N. et al. Tensor field networks: Rotation-and translation-equivariant neural networks for 3d point clouds. arXiv preprint arXiv:1802.08219 (2018).
  • (31) Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. In International Conference on Machine Learning, 9323–9332 (PMLR, 2021).
  • (32) Gasteiger, J., Groß, J. & Günnemann, S. Directional message passing for molecular graphs. arXiv preprint arXiv:2003.03123 (2020).
  • (33) Vaswani, A. et al. Attention is all you need. Advances in neural information processing systems 30 (2017).
  • (34) Fuchs, F., Worrall, D., Fischer, V. & Welling, M. SE(3)-transformers: 3d roto-translation equivariant attention networks. Advances in Neural Information Processing Systems 33, 1970–1981 (2020).
  • (35) Liao, Y.-L. & Smidt, T. Equiformer: Equivariant graph attention transformer for 3d atomistic graphs. arXiv preprint arXiv:2206.11990 (2022).
  • (36) Gasteiger, J., Becker, F. & Günnemann, S. Gemnet: Universal directional graph neural networks for molecules. Advances in Neural Information Processing Systems 34, 6790–6802 (2021).
  • (37) Zhang, X., Zhou, X., Lin, M. & Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE conference on computer vision and pattern recognition, 6848–6856 (2018).
  • (38) Christensen, A. S. & Von Lilienfeld, O. A. On the role of gradients for machine learning of molecular energies and forces. Machine Learning: Science and Technology 1, 045018 (2020).
  • (39) Chmiela, S. et al. Accurate global machine learning force fields for molecules with hundreds of atoms. Science Advances 9, eadf0873 (2023).
  • (40) Ruddigkeit, L., Van Deursen, R., Blum, L. C. & Reymond, J.-L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. Journal of Chemical Information and Modeling 52, 2864–2875 (2012).
  • (41) Ramakrishnan, R., Dral, P. O., Rupp, M. & Von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data 1, 1–7 (2014).
  • (42) Zee, A. Group theory in a nutshell for physicists, vol. 17 (Princeton University Press, 2016).
  • (43) Raczka, R. & Barut, A. O. Theory of group representations and applications (World Scientific Publishing Company, 1986).
  • (44) Jeevanjee, N. An introduction to tensors and group theory for physicists (Springer, 2011).
  • (45) Cohen, T. S., Geiger, M., Köhler, J. & Welling, M. Spherical cnns. arXiv preprint arXiv:1801.10130 (2018).
  • (46) Krizhevsky, A., Sutskever, I. & Hinton, G. E. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25 (2012).
  • (47) Liu, Y. et al. Spherical message passing for 3d graph networks. arXiv preprint arXiv:2102.05013 (2021).
  • (48) Zhang, S., Liu, Y. & Xie, L. Efficient and accurate physics-aware multiplex graph neural networks for 3d small molecules and macromolecule complexes. arXiv preprint arXiv:2206.02789 (2022).
  • (49) Unke, O. T. et al. Spookynet: Learning force fields with electronic degrees of freedom and nonlocal effects. Nature Communications 12, 7273 (2021).
  • (50) Wang, L., Liu, Y., Lin, Y., Liu, H. & Ji, S. Comenet: Towards complete and efficient message passing for 3d molecular graphs. Advances in Neural Information Processing Systems 35, 650–664 (2022).
  • (51) Qiao, Z. et al. Informing geometric deep learning with electronic interactions to accelerate quantum chemistry. Proceedings of the National Academy of Sciences 119, e2205221119 (2022).
  • (52) Frank, T., Unke, O. & Müller, K.-R. So3krates: Equivariant attention for interactions on arbitrary length-scales in molecular systems. Advances in Neural Information Processing Systems 35, 29400–29413 (2022).
  • (53) Batatia, I. et al. The design space of E(3)-equivariant atom-centered interatomic potentials. arXiv preprint arXiv:2205.06643 (2022).
  • (54) Li, Y. et al. Long-short-range message-passing: A physics-informed framework to capture non-local interaction for scalable molecular dynamics simulation. arXiv preprint arXiv:2304.13542 (2023).
  • (55) Wang, T., He, X., Li, M., Shao, B. & Liu, T.-Y. Aimd-chig: Exploring the conformational space of a 166-atom protein chignolin with ab initio molecular dynamics. Scientific Data 10, 549 (2023).
Figure 1: Main problems and FreeCG overview. a. Permutation equivariance requires performing the CG transform per atom with consistent settings, which limits the available design space. b. We construct abstract edges that are invariant under permutations, on which the CG transform is applied. This approach ensures that the entire layer is always permutation equivariant, maximizing the available design space. c. The architecture of a single layer of FreeCG. The self-attention mechanism generates abstract edges through a permutation-invariant process. The abstract edges are also used to enhance the quality of the attention score, denoted as Attention Enhancer. d. The Group CG transform organizes abstract edges into groups and performs the CG transform on each group. We adopt a sparse path for the CG transform, which lowers the computational demand while maintaining the stronger O(3) equivariance. Abstract edges shuffling improves the information exchange between different irreps. Details of the sparse path and abstract edges shuffling are given in Fig. 2.
Figure 2: Details on sparse path and abstract edges shuffling. a. The sparse path has two useful properties: 1) The number of paths is smaller than under the weaker SO(3) equivariance (4 vs. 8). 2) Each output irrep contains information from input irreps of both degree $l=1$ and $l=2$. b. The shuffling strategy adds a constant $k$ to the index of each abstract edge. The shuffled result is then added to $\hat{E}^{L}_{i}$, giving the final added value $d\overline{E}^{L+1}_{i}$.
Figure 3: The speed and size of FreeCG compared with other SoTA models. Numbers are reported for a single Chignolin molecule.
Figure 4: Efficiency analysis for group CG transform. a. The number of paths for the CG transform under different group numbers, with the number of irreps held the same. b. The actual running time of the CG transform for different group numbers. Here we adopt the sparse path strategy for computing 512 irreps (before grouping) for each $l$. Full TP denotes not using the sparse path.
Table 5: Ablation on different modules.
Method Aspirin
Val loss Energy Force
VisNet - 0.116 0.155
+ Group tensor product
32 groups 0.0509 0.123 0.144
8 groups 0.0416 0.112 0.129
+ Group shuffling
1-group shuffle 0.0401 0.112 0.128
0.5-group shuffle 0.0396 0.110 0.128
1.5-group shuffle 0.0384 0.111 0.125
+ Attention Enhancer 0.0345 0.110 0.122
Abstract edges shuffling and Attention Enhancer are added upon the best choices of the above modules, with respect to the validation loss.
Table 6: Hardware and software settings.
Category    Component         Setting
Hardware    CPU               Intel® Xeon® Gold 6330 CPU @ 2.00GHz
Hardware    GPU               NVIDIA Tesla A100
Software    Neural Network    PyTorch 1.10.0
Software    Equivariance      e3nn 0.5.1
Software    Plotting          Matplotlib 3.0.3
Table 7: Hyperparameters for each dataset.
Hyperparameter MD17 rMD17 MD22 QM9
Initial learning rate 4e-4, 2e-4 2e-4 2e-4, 1e-4 1e-4
Learning rate decay factor 0.8
Learning rate decay patience 30 30 30 15
Learning rate warmup step 1000 1000 1000 10000
Optimizer AdamW ($\beta$ = (0.9, 0.999))
Epoch 3000 3000 3000 1500
Batch size 4 4 4 32
Number of layers 9
Cutoff 5.0, 4.0 5.0 5.0, 4.0 5.0
Force/Energy loss weights 0.95/0.05 0.95/0.05 0.95/0.05 -
Dimension of latent feature 256 256 256 512
Number of groups 8
Table 8: The basic operation number for each type of CG transform.
      $l_o=0$          $l_o=1$          $l_o=2$
      0    1    2      0    1    2      0    1    2
0     1    -    -      -    3    -      -    -    5
1     -    3    -      3    6    9      -    9    12
2     -    -    5      -    9    12     5    12   19
$l_o$ denotes the output degree. The column and row numbers denote the degrees of the two input irreps, respectively. The cyan blocks denote the operations in normal neural networks, while the others are for high-order CG transforms.