Neural Polarization: Toward Electron Density for Molecules by Extending Equivariant Networks

Bumju Kwak
Independent Researcher
[email protected]
Jeonghee Jo*
Korea Institute of Science and Technology (KIST)
[email protected]
Abstract

Recent SO(3)-equivariant models embed a molecule as a set of single atoms fixed in three-dimensional space, which is analogous to a ball-and-stick view. This perspective provides a concise view of atom arrangements; however, the surrounding electron density cannot be represented, and its polarization effects may be underestimated. To overcome this limitation, we propose Neural Polarization, a novel method that extends equivariant networks by embedding each atom as a pair of fixed and moving points. Motivated by density functional theory, Neural Polarization represents a molecule in a space-filling view that includes the electron density, in contrast with a ball-and-stick view. Neural Polarization can be flexibly applied to most types of existing equivariant models. We show that Neural Polarization can improve the prediction performance of existing models over a wide range of targets. Finally, we verify from a mathematical standpoint that our method can improve expressiveness and equivariance.

1 Introduction

For chemical engineering, accurate estimation of molecular conformation together with electronic configuration is essential. Quantum chemistry [1, 2] is a branch of chemistry that studies the quantum mechanics of molecular conformations, based on microscopic analysis of a single atom and its surroundings. The most common approach to quantum mechanical modeling of a molecule is density functional theory (DFT) [3]. From this perspective, the main strategy for solving the many-electron problem is to describe the system through a functional of a single function, which corresponds to the electron density of the molecule in three-dimensional space [4, 5].

In DFT, this electron density can be represented by atomic orbital functions, using a basis set consisting of radial basis functions and spherical harmonics in three-dimensional space [6, 7]. Spherical harmonics are a set of angular basis functions indexed by a degree ($L$), an integer that represents the angular frequency of an orbital [8, 9]. Specifically, $d$ and $f$ orbitals are also called “polarization functions”, because they can describe a distortion of the electron cloud, called polarization [10, 11, 12].

These polarization functions are useful in describing molecular properties that involve valence electrons, and consequently affect various types of properties of a molecule [13, 14]. To estimate quantum mechanical properties while accounting for polarization effects, the basis sets for DFT calculation need to include polarization functions of higher degree [15, 16]. QM9 [17], one of the popular benchmark datasets in deep learning, was also calculated with basis sets at the 6-31G(2df,p) level [18], which contain polarization functions.

Previous SO(3)-equivariant networks, including Equiformer [19], NequIP [20], and others [21, 22, 23, 24, 25, 26], also use radial bases and spherical harmonics to represent molecular conformation. However, this approach may be limited because representing a molecule as a set of point-like atoms cannot cover the electron density. Therefore, most existing SO(3)-equivariant networks are potentially limited in predicting molecular potential energy and related properties, since they do not express the electron density.

To address these challenges, we propose Neural Polarization, a novel and flexible extension method for SO(3)-equivariant networks motivated by DFT. It allows each equivariant block to explicitly consider the polarization effect of the electron density while keeping SO(3)-equivariance. The key point of Neural Polarization is the introduction of an additional “movable point”, which is defined similarly to the existing atom but can update its location during the training process. These moving points can be viewed as direction indicators describing the polarized electron density, which is closer to a space-filling view of a molecule. We did not place any additional constraint on the movements of these movable points, expecting that the atomic orbital polarization caused by the electron configuration can be learned for better molecular representation learning.

We applied Neural Polarization to three existing SO(3)-equivariant networks for quantum mechanical property prediction, and trained the extended models from scratch (without pretrained parameters). We verified that Neural Polarization significantly improved the prediction performance over a wide range of targets compared with the originally reported results, especially for thermodynamic potential-related targets. We also visualized the trajectories of each movable point in the three trained models, and observed that the shifting patterns of movable points have distinctive characteristics depending on the target objective. The experimental results support our initial assumption that Neural Polarization, which explicitly models the directional surroundings of an atom, induces the latent features to behave more like the electron density in DFT. We also analyzed the pattern of the positions of movable points in comparison with the fundamentals of chemical bonding. Finally, we mathematically verified that equivariant networks equipped with Neural Polarization attain a strictly lower approximation error at the same maximum degree of spherical harmonics, and higher model expressiveness, than the original networks.

The contributions of this study are summarized as follows.

1. Motivated by DFT, we developed a novel extension method, Neural Polarization, for SO(3)-equivariant networks. It introduces movable points, with the expectation that their positions can capture polarization effects for improved molecular representation.

2. We validated the effectiveness of Neural Polarization through the performance gain in the experimental results. In addition, the trajectories of movable points from the trained models showed that Neural Polarization can adaptively find a better description of polarization, depending on the target objective.

3. We verified that Neural Polarization lowers the approximation error relative to the original network when both use the same spherical harmonics, and improves the model expressiveness.

2 Preliminary of equivariant neural networks

Due to space constraints, this section presents only the most important preliminaries. The remaining parts, DFT and SO(3)-equivariance, are introduced in Appendix A.4.

By definition, a group-equivariant network consists of group-equivariant layers such that its output transforms equivariantly under specified group operations applied to its input [27]. Meanwhile, a group-invariant network has a group-invariant final layer, and all its other layers are group-equivariant. To predict an energy-related molecular property $y$ by learning a molecule embedding $E(\mathbf{x})$ from atom positions $\mathbf{x} = \{x_i\}$, a network of $T$ layers should be group-invariant: it consists of group-equivariant layers $M_t$ for $t = 0, \ldots, T-1$, followed by a final group-invariant readout function $R$, or pooling layer. That is, $\hat{Y} = R \circ M_{T-1} \circ M_{T-2} \circ \ldots \circ M_0 \circ E(\mathbf{x})$.

Embedding layer. In general, the embedding layer $E$ is located at the front of the network. The embedding layer learns feature vectors of individual atoms $\mathbf{v}_0 = \{v_i\}$ from atom positions $\mathbf{x} = \{x_i\}$ and atomic numbers $\mathbf{a} = \{a_i\}$.

Equivariant layer. A function $f$ that satisfies $g \circ f(x) = f(g \circ x)$ for any group element $g \in G$ is called a $G$-equivariant layer. To represent molecular structures in Euclidean space, a geometric group such as SO(3) or SE(3) is generally selected as $G$. Many equivariant networks [28, 29, 24, 21, 25] use a message-passing function [30] as the framework for their equivariant layers. However, there is no explicit constraint on the choice of architecture. For example, [19, 31] consist of equivariant self-attention layers for molecules. We write $M_t$ as taking the given positions $\mathbf{x}$ and the learnable equivariant vectors $\mathbf{v}_t$; however, some networks use only $\mathbf{v}_t$ to learn $\mathbf{v}_{t+1}$.
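The equivariance condition above can be checked numerically. The following is a minimal sketch, assuming a hypothetical `toy_layer` that is not taken from any of the cited networks: it maps atom positions to one vector per atom, and the check compares $g \circ f(x)$ with $f(g \circ x)$ for a random rotation $g$.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random 3x3 rotation matrix (an element of SO(3)) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))       # fix column signs; q stays orthogonal
    if np.linalg.det(q) < 0:          # flip one axis so that det(q) = +1
        q[:, 0] = -q[:, 0]
    return q

def toy_layer(x):
    """Hypothetical equivariant map: each atom receives a distance-weighted
    sum of displacement vectors to all other atoms. x has shape (N, 3)."""
    diff = x[:, None, :] - x[None, :, :]                 # (N, N, 3) pairwise displacements
    dist2 = np.sum(diff ** 2, axis=-1, keepdims=True)    # (N, N, 1), unchanged by rotations
    return np.sum(np.exp(-dist2) * diff, axis=1)         # (N, 3), rotates with the input

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
g = random_rotation(rng)

lhs = toy_layer(x) @ g.T      # g applied to f(x)
rhs = toy_layer(x @ g.T)      # f applied to g x
max_gap = float(np.max(np.abs(lhs - rhs)))
```

The weights depend only on distances, which rotations preserve, while the displacement vectors rotate with the input; this is why `max_gap` stays at the level of floating-point error.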

Pooling (readout) layer. A pooling layer $R$, also called a readout function, is located at the end of the network and produces the output $\hat{y}$. In molecular property prediction, this layer merges all equivariant feature vectors and produces a scalar-valued prediction.

$$\text{Embedding layer:} \quad \mathbf{v}_0 = E(\mathbf{x}, \mathbf{a}) \tag{1}$$
$$\text{Equivariant layer:} \quad \mathbf{v}_{t+1} = M_t(\mathbf{x}, \mathbf{v}_t) \tag{2}$$
$$\text{Pooling layer:} \quad \hat{y} = R(\mathbf{v}_T) \tag{3}$$
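The composition in Eqs. (1)-(3) can be sketched as three small functions. This is only an illustration under simplifying assumptions: `embed`, `layer`, `readout`, the scalar-only features, and the weights are hypothetical and do not reproduce any baseline architecture. Because the toy layer sees positions only through interatomic distances, the prediction is invariant to rotations of the input.

```python
import numpy as np

def embed(x, a):
    """E in Eq. (1): toy embedding with one scalar channel per atom.
    Positions are accepted to mirror the signature but are unused here."""
    return a.astype(float)[:, None] / 10.0                          # (N, 1)

def layer(x, v, w):
    """M_t in Eq. (2): toy update that only sees interatomic distances,
    so the scalar features are unchanged under rotations of x."""
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)   # (N, N)
    return np.tanh(v + w * (np.exp(-dist) @ v))                     # (N, 1)

def readout(v):
    """R in Eq. (3): sum pooling over atoms gives a scalar prediction."""
    return float(v.sum())

def predict(x, a, weights):
    v = embed(x, a)
    for w in weights:            # t = 0, ..., T-1
        v = layer(x, v, w)
    return readout(v)

# Toy water-like geometry (illustrative coordinates, not from the paper).
x = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
a = np.array([8, 1, 1])
weights = [0.5, -0.3, 0.8]       # T = 3 layers

theta = 0.7                      # rotation about the x-axis
rot = np.array([[1.0, 0.0, 0.0],
                [0.0, np.cos(theta), -np.sin(theta)],
                [0.0, np.sin(theta), np.cos(theta)]])

y = predict(x, a, weights)
y_rotated = predict(x @ rot.T, a, weights)
```

Here `y` and `y_rotated` agree up to floating-point error, which is the group-invariance property required of the full network.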

3 Methods

3.1 Motivation of Neural Polarization by DFT

The role of electron density in DFT calculation

A molecular conformation is represented by the atomic numbers $\mathbf{a}$ and positions $\mathbf{x}$ of its $N$ constituent atoms. A molecular property $\hat{y}$ can be predicted from the conformation $X = \{x_i, a_i\}_{i=1,\ldots,N} = \{\mathbf{x}, \mathbf{a}\}$. DFT has been the most widely used method for addressing this problem, providing more accurate and reliable predictions than recent deep learning-based approaches [32, 33]. We hypothesized that the prediction performance of deep neural networks would benefit from fundamental concepts of DFT. In particular, we aim to propose a methodology for improving the prediction performance of existing equivariant networks based on the fundamentals of DFT.

As mentioned in Appendix A.3, a DFT calculation proceeds in two sequential steps [34]. The first step calculates the electron density $\rho$, a function defined on any vector $\vec{r} \in \mathbb{R}^3$, from the given $X$. The second step predicts the molecular property $\hat{y}$ from the calculated $\rho$ with a well-defined functional. The first step, $X \rightarrow \rho$, requires solving the computation-intensive Kohn-Sham equation [35], whereas in the second step the molecular energy (or $\hat{y}$) can be easily calculated from the electron density $\rho$ using pre-defined functionals.

Construction of a link between electron density and feature space of equivariant networks

Based on these principles, we developed the assumption that if the latent feature $\chi = \{\mathbf{x}, \mathbf{v}\}$ of an equivariant network is equivalent to $\rho$, or if $\chi$ can express all information contained in $\rho$, the network would produce more accurate and reliable predictions, close to those of DFT. More concretely, our research objective is to develop a novel equivariant latent feature $\chi$, built from atom positions $\mathbf{x}$ and features $\mathbf{v}$, for SO(3)-equivariant networks, such that each layer can approximate the electron density $\rho$ of a given molecule in a DFT calculation. Precisely, we aim to train $\xi$ and an appropriate $\chi$ that satisfy $\xi(\chi) = \rho$.

In DFT calculations, the electron density $\rho$ is expressed as a linear combination of a finite set of basis functions. Analogously, the latent feature in equivariant networks resides in a finite-dimensional vector space. In addition, in DFT, finding the optimal representation of $\rho$ using basis sets is analogous to finding $\xi$ under the constraints of linearity and invertibility. Based on this connection between finding optimal DFT basis sets and constructing an expressive latent space for SO(3)-equivariant networks for molecular property prediction, we introduce a methodology based on DFT basis set selection in order to improve the performance of the neural network.

Among the factors considered when choosing a basis set, we focused on “polarization”, which is one of the significant characteristics of a molecule. Polarization refers to a distortion of the electron cloud toward a specific direction $\tilde{x}$, depending on the electron configuration around an atomic nucleus. The shape of the polarization is determined by complex interatomic interactions, and has a direct effect on various properties. To incorporate polarization effects, DFT generally uses high-degree polarization functions. Analogously, if we learn a basis function for the equivariant features $\tilde{v}_i$ of a movable point $\tilde{x}_i$ near the original atom $x_i$, the latent space can effectively learn polarization functions. Therefore, if we extend an equivariant network to incorporate a pair ($\tilde{x}_i$, $\tilde{v}_i$) in feature space as a hint of the polarization effect, the network would be more powerful and expressive in representing atom surroundings and would show better prediction performance. Figure 1 gives a schematic diagram comparing these concepts.

3.2 Neural Polarization

We give a high-level description of Neural Polarization, because the internal structure of each module depends on the original baseline network. Neural Polarization is an extension methodology for SO(3)-equivariant networks. It adds a movable point $\tilde{\mathbf{x}}$ for each atom together with its corresponding equivariant feature vector $\tilde{\mathbf{v}}$, and uses equivariant layers (or blocks) $\tilde{M}_t$ that are extended to take $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{v}}$ as inputs.

The first step initializes the movable points with position $\tilde{\mathbf{x}}_0$ and type $\tilde{\mathbf{a}}_0$, and embeds them using $E$ to create $\tilde{\mathbf{v}}_0$. The initial position $\tilde{\mathbf{x}}_0$ is the same as the position $\mathbf{x}$ of the original atoms.

$$\tilde{\mathbf{x}}_0 = \mathbf{x} \tag{4}$$
$$\tilde{\mathbf{v}}_0 = E(\tilde{\mathbf{x}}_0, \tilde{\mathbf{a}}) \tag{5}$$

Second, the network updates $\tilde{\mathbf{v}}_t$ with $\tilde{M}_t$. In contrast to the original $M_t(\mathbf{x}, \mathbf{v}_t)$ of Eq. (2), $\tilde{M}_t$ is defined on the extended inputs $([\mathbf{x}; \tilde{\mathbf{x}}_t], [\mathbf{v}_t; \tilde{\mathbf{v}}_t])$. Our equivariant $\tilde{M}_t$ updates $(\mathbf{v}_{t+1}, \tilde{\mathbf{v}}_{t+1})$ based on $(\mathbf{x}, \tilde{\mathbf{x}}_t, \mathbf{v}_t, \tilde{\mathbf{v}}_t)$ using $M_t$.

$$\tilde{M}_t : [\mathbf{v}_{t+1}; \tilde{\mathbf{v}}_{t+1}] = M_t([\mathbf{x}; \tilde{\mathbf{x}}_t], [\mathbf{v}_t; \tilde{\mathbf{v}}_t]) \tag{6}$$

$\tilde{\mathbf{x}}_{t+1}$ is produced by an additional projection layer $\pi_t : \mathbb{V}^{k \times N} \rightarrow \mathbb{R}^{3 \times N}$, given an equivariant feature $\tilde{\mathbf{v}}_t$ of $\tilde{M}_t$. We constructed $\pi_t$ as a sequential block of an equivariant layer and a linear layer; however, there is no constraint on the composition of the $\pi$ block. Note that $\tilde{M}_t$ does not modify the original atom positions $\mathbf{x}$, following the baseline networks.

$$\Delta\tilde{\mathbf{x}}_t = \pi_t(\tilde{\mathbf{v}}_t) \tag{7}$$
$$\tilde{\mathbf{x}}_{t+1} = \tilde{\mathbf{x}}_t + \Delta\tilde{\mathbf{x}}_t \tag{8}$$

The overview and pseudocode of Neural Polarization are given in Figure 2 and Algorithm 2, respectively, in comparison with the original framework. In broad terms, optimizing $M_t$ and $\{x, \tilde{x}, v, \tilde{v}\}$ may correspond to finding the optimal $\xi$ and $\chi$, respectively.
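The update in Eqs. (4)-(8) can be sketched with toy components. Everything named here is an assumption made for illustration: `equivariant_layer` stands in for $M_t$, `projection` stands in for $\pi_t$ (a plain linear mix of vector channels rather than the equivariant-plus-linear block used in the paper), the same weights `w` and `p` are shared across layers, and the initial features are zeros instead of a learned embedding as in Eq. (5).

```python
import numpy as np

def equivariant_layer(pos, vec, w):
    """Toy stand-in for M_t: updates vector features of shape (P, k, 3)
    from positions of shape (P, 3) using distance-weighted displacements."""
    diff = pos[:, None, :] - pos[None, :, :]                      # (P, P, 3)
    gate = np.exp(-np.sum(diff ** 2, axis=-1, keepdims=True))     # (P, P, 1), rotation invariant
    geom = np.sum(gate * diff, axis=1)                            # (P, 3), rotation equivariant
    return vec + w[None, :, None] * geom[:, None, :]              # (P, k, 3)

def projection(vec_tilde, p):
    """Toy stand-in for pi_t: mixes the k vector channels into one 3D shift per point."""
    return np.einsum("k,nkc->nc", p, vec_tilde)                   # (N, 3)

def neural_polarization_step(x, x_tilde, v, v_tilde, w, p):
    n = x.shape[0]
    pos = np.concatenate([x, x_tilde], axis=0)       # [x; x~_t]
    vec = np.concatenate([v, v_tilde], axis=0)       # [v_t; v~_t]
    out = equivariant_layer(pos, vec, w)             # Eq. (6)
    delta = projection(v_tilde, p)                   # Eq. (7), uses v~_t
    return out[:n], out[n:], x_tilde + delta         # Eq. (8); x itself is never moved

def run(x, w, p, steps=2):
    n, k = x.shape[0], w.shape[0]
    x_tilde = x.copy()                               # Eq. (4)
    v = np.zeros((n, k, 3))                          # zero features: simplification of Eq. (5)
    v_tilde = np.zeros((n, k, 3))
    for _ in range(steps):
        v, v_tilde, x_tilde = neural_polarization_step(x, x_tilde, v, v_tilde, w, p)
    return x_tilde

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
w, p = rng.normal(size=3), rng.normal(size=3)

theta = 0.9                                          # rotation about the x-axis
rot = np.array([[1.0, 0.0, 0.0],
                [0.0, np.cos(theta), -np.sin(theta)],
                [0.0, np.sin(theta), np.cos(theta)]])

moved = run(x, w, p)
moved_from_rotated = run(x @ rot.T, w, p)
```

In this sketch the movable points leave the atom positions after two steps, and rotating the input molecule rotates the final movable points by the same rotation, which is the behaviour needed for the extension to keep SO(3)-equivariance.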

3.3 Mathematical interpretation of Neural Polarization

We have introduced the process of training movable points and their equivariant features $\{\tilde{\mathbf{x}}, \tilde{\mathbf{v}}\}$ in networks with Neural Polarization. To investigate the advantage of $\{\tilde{\mathbf{x}}, \tilde{\mathbf{v}}\}$ in approximating $\rho$, we also conducted a theoretical analysis. In particular, we discuss the capability of Neural Polarization to approximate the electron density $\rho$, for which we introduce the following definition.

Definition 1

Define the error $\mathcal{E}(\rho, \hat{\rho})$ between an electron density $\rho$ and $\hat{\rho}$ as $\mathcal{E}(\rho, \hat{\rho}) = \int_{\mathbb{R}^3} |\rho - \hat{\rho}|^2 \, dV$, and the error between $P$ and $\rho$ as $\mathcal{E}(P, \rho) = \min_{\hat{\rho} \in P} \mathcal{E}(\rho, \hat{\rho})$.

The error $\mathcal{E}(P, \rho)$ defined in Definition 1 can be regarded as a metric for the capability of $P$ to approximate $\rho$. Let $P$ and $\tilde{P}[\tilde{\mathbf{x}}]$ denote the latent features of the electron density in the original network and in the network with Neural Polarization, respectively. Then, the following holds by placing $\tilde{\mathbf{x}}$ where the error $\mathcal{E}(P, \rho)$ occurs. The detailed definition and proof are provided in A.5.

Proposition 1

For any electron density $\rho$, there exists $\tilde{\mathbf{x}}$ that satisfies $\mathcal{E}(P, \rho) > \mathcal{E}(\tilde{P}[\tilde{\mathbf{x}}], \rho)$.

Proposition 1 shows that Neural Polarization can achieve a better approximation than the original network for arbitrary electron densities. Because the electron density is itself a function, this proposition supports the claim that Neural Polarization can obtain node features $\mathbf{v}$ that are closer to the real electron density than those of the original network. Therefore, we have demonstrated that Neural Polarization provides a better approximation to the electron density within neural networks.
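Definition 1 can be made concrete with a small numerical experiment. This sketch is an illustration only and is not the proof in A.5: the Gaussian functions, the grid, the toy "polarized" density, and the location of the movable point are all assumptions. It treats $P$ as the span of one atom-centred function and $\tilde{P}[\tilde{\mathbf{x}}]$ as the span of that function plus one centred at $\tilde{\mathbf{x}}$, and approximates the integral by a Riemann sum.

```python
import numpy as np

def gaussian(grid, center, width=0.6):
    """Isotropic Gaussian evaluated on grid points of shape (G, 3)."""
    return np.exp(-np.sum((grid - center) ** 2, axis=-1) / (2 * width ** 2))

def min_error(rho, basis, dv):
    """E(P, rho): minimum over the span of the basis columns of the integrated
    squared error, found by least squares and approximated by a Riemann sum."""
    coeffs, *_ = np.linalg.lstsq(basis, rho, rcond=None)
    return float(np.sum((rho - basis @ coeffs) ** 2) * dv)

axis = np.linspace(-3.0, 3.0, 25)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
dv = (axis[1] - axis[0]) ** 3

atom = np.zeros(3)
# Toy "polarized" density: a cloud shifted away from the nucleus.
rho = gaussian(grid, np.array([0.4, 0.0, 0.0]), width=0.5)

# P: one atom-centred basis function.
basis_p = np.stack([gaussian(grid, atom)], axis=1)
# P~[x~]: the same function plus one centred at a hypothetical movable point.
x_tilde = np.array([0.5, 0.0, 0.0])
basis_p_tilde = np.stack([gaussian(grid, atom), gaussian(grid, x_tilde)], axis=1)

err_p = min_error(rho, basis_p, dv)
err_p_tilde = min_error(rho, basis_p_tilde, dv)
```

In this toy setting `err_p_tilde` is smaller than `err_p`, since the extended span contains the original one and the added function overlaps with the residual left by the atom-centred fit.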

4 Experiment

To confirm the effect of Neural Polarization on general SO(3)-equivariant models, we selected three equivariant models with different architectures (EGNN [36], Equiformer [19], TorchMD-NET [31]) as the baseline networks. We performed experiments on QM9 [17] and MD17 [37], which are the most commonly used datasets for molecular property prediction. Lastly, to investigate whether Neural Polarization is also effective on non-molecular tasks involving particle movements, we conducted experiments on the n-body system task proposed in EGNN. Implementation details are presented in Appendix A.8.

Figure 1: Conceptual overview of our research compared with other methodologies. In DFT, there exists a one-to-one correspondence between $\rho(\vec{r})$ and the molecular conformation $\{\mathbf{x}, \mathbf{a}\}$, each of which fully determines the other properties of the molecule. With Neural Polarization, the baseline SO(3)-equivariant networks can provide a richer representation of the electron density $\rho(\vec{r})$.
Figure 2: Comparison between the ball-and-stick view and the space-filling view (left), and the overall architecture of equivariant networks with Neural Polarization (right). Neural Polarization allows a molecular representation to incorporate the nearby environment, including polarization, which is more similar to a space-filling view of a molecule. We expect that the two moving terms $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{v}}$ can infer and use the characteristics of the electron density $\rho$ of each molecule during the training process, without any additional information or constraints.
| Target | Unit | EGNN w/o NP | EGNN w/ NP | TorchMD-NET w/o NP | TorchMD-NET w/ NP | Equiformer w/o NP | Equiformer w/ NP | avg. $\Delta$% |
|---|---|---|---|---|---|---|---|---|
| $\mu$ | D | 0.029 | 0.03 | 0.011 | 0.014 | 0.011 | 0.010 | +4.92% |
| $\alpha$ | $a_0^3$ | 0.071 | 0.071 | 0.059 | 0.0447 | 0.046 | 0.0527 | -3.31% |
| $\epsilon_{\mathrm{HOMO}}$ | meV | 29 | 29.9 | 20.3 | 18.4 | 16.5 | 16.7 | -2.04% |
| $\epsilon_{\mathrm{LUMO}}$ | meV | 25 | 23.4 | 17.5 | 17.8 | 14.3 | 14.0 | -2.43% |
| $\Delta_{\epsilon}$ | meV | 48 | 47.9 | 36.1 | 41.9 | 30 | 33.7 | +8.20% |
| $\langle R^2 \rangle$ | $a_0^2$ | 0.106 | 0.089 | 0.033 | 0.085 | 0.251 | 0.162 | -4.29% |
| $zpve$ | meV | 1.55 | 1.50 | 1.84 | 1.22 | 1.26 | 2.15 | -4.25% |
| $U_0$ | meV | 11 | 9.52 | 6.15 | 5.5 | 6.59 | 5.54 | -15.44% |
| $U$ | meV | 12 | 10.33 | 6.38 | 5.4 | 6.74 | 5.49 | -19.03% |
| $H$ | meV | 12 | 9.52 | 6.16 | 5.6 | 6.63 | 6.27 | -13.93% |
| $G$ | meV | 12 | 11.5 | 7.62 | 6.6 | 7.63 | 7.01 | -9.55% |
| $C_v$ | cal/mol K | 0.031 | 0.032 | 0.026 | 0.023 | 0.023 | 0.023 | -3.31% |
| avg. $\Delta$% (per model) | | | -6.84% | | -5.29% | | -4.76% | -5.63% |

Table 1: Mean absolute error on QM9.
  • NP: Neural Polarization.

Table 2: Mean absolute errors (MAE) of the energy and force prediction of MD17 (Unit: kcal/mol/Å).

| Molecule | Target | TorchMD-NET w/o NP | TorchMD-NET w/ NP | avg. $\Delta$% |
|---|---|---|---|---|
| Aspirin | Energy | 0.123 | 0.126 | 2.38% |
| Aspirin | Forces | 0.253 | 0.224 | -12.95% |
| Benzene | Energy | 0.058 | 0.05424 | -6.93% |
| Benzene | Forces | 0.196 | 0.1174 | -10.48% |
| Ethanol | Energy | 0.052 | 0.0524 | 0.76% |
| Ethanol | Forces | 0.109 | 0.0878 | -24.15% |
| Malonaldehyde | Energy | 0.077 | 0.0794 | 3.02% |
| Malonaldehyde | Forces | 0.169 | 0.146 | -15.75% |
| Naphthalene | Energy | 0.085 | 0.081 | -4.94% |
| Naphthalene | Forces | 0.061 | 0.1594 | 61.73% |
| Salicylic acid | Energy | 0.093 | 0.08086 | -15.01% |
| Salicylic acid | Forces | 0.129 | 0.1262 | -2.22% |
| Toluene | Energy | 0.074 | 0.058 | -27.59% |
| Toluene | Forces | 0.067 | 0.057 | -17.54% |
| Uracil | Energy | 0.095 | 0.0857 | -10.85% |
| Uracil | Forces | 0.095 | 0.0857 | -10.85% |
| Average | Energy | | | -7.39% |
| Average | Forces | | | -4.03%* |

  • *The performance gain is -13.42% except for the case of Naphthalene.
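The per-row percentages in Table 2 can be reproduced with a short calculation. The normalisation used below, (w/ NP − w/o NP) divided by the w/ NP value, is an assumption inferred from the reported numbers for the rows shown; the text does not state the formula, and `relative_change` is a hypothetical helper name.

```python
def relative_change(without_np, with_np):
    """Percentage change normalised by the w/ NP value (inferred convention)."""
    return 100.0 * (with_np - without_np) / with_np

# (w/o NP, w/ NP) MAE pairs copied from Table 2.
rows = {
    ("Aspirin", "Energy"): (0.123, 0.126),
    ("Aspirin", "Forces"): (0.253, 0.224),
    ("Toluene", "Energy"): (0.074, 0.058),
}

deltas = {key: round(relative_change(*pair), 2) for key, pair in rows.items()}
```

Under this convention a negative value means that the model with Neural Polarization has the lower error.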

Algorithm 1 SO(3)-equivariant network without Neural Polarization
Given $\mathbf{x}\in\mathbb{R}^{3}$, $\mathbf{a}\in\mathbb{R}$, $\mathbf{v}\in\mathbb{V}$ and a layer index $t=0,1,\dots,(T-1)$.
$\mathbf{v}_{0} \longleftarrow$ Embedding($\mathbf{x}$, $\mathbf{a}$)
for $t=0,1,\dots,(T-1)$ do
   $\mathbf{v}_{t+1} \longleftarrow$ EquivariantLayer($\mathbf{v}_{t}$)
end for
$\hat{y} \longleftarrow$ Pooling($\mathbf{v}_{T}$)
return $\hat{y}$

Algorithm 2 SO(3)-equivariant network with Neural Polarization (proposed)
Given $\mathbf{x},\tilde{\mathbf{x}}\in\mathbb{R}^{3}$, $\mathbf{a},\tilde{\mathbf{a}}\in\mathbb{R}$, $\mathbf{v},\tilde{\mathbf{v}}\in\mathbb{V}$ and a layer index $t=0,1,\dots,(T-1)$.
$\mathbf{v}_{0},\tilde{\mathbf{v}}_{0} \longleftarrow$ Embedding($\mathbf{x}$, $\mathbf{a},\tilde{\mathbf{a}}$)
for $t=0,1,\dots,(T-1)$ do
   $\mathbf{v}_{t+1},\tilde{\mathbf{v}}_{t+1} \longleftarrow$ EquivariantLayer([$\mathbf{x}_{t},\tilde{\mathbf{x}}_{t}$], [$\mathbf{v}_{t},\tilde{\mathbf{v}}_{t}$])
   $\Delta\tilde{\mathbf{x}}_{t} \longleftarrow \text{Proj}_{t}(\tilde{\mathbf{v}}_{t+1})$
   $\tilde{\mathbf{x}}_{t+1} \longleftarrow \tilde{\mathbf{x}}_{t} + \Delta\tilde{\mathbf{x}}_{t}$
end for
$\hat{y} \longleftarrow$ Pooling($\tilde{\mathbf{v}}_{T}$)
return $\hat{y}$
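The control flow of Algorithm 2 can be sketched in plain Python. This is only a toy illustration under loud assumptions, not the authors' implementation: the layer `toy_equivariant_layer`, the zero-vector embedding, the scalar projection `step`, the norm-sum pooling, and the choice to start each movable point on its atom are all hypothetical stand-ins, and atom types are omitted. What the sketch keeps from Algorithm 2 is the structure: fixed points $\mathbf{x}$, movable points $\tilde{\mathbf{x}}$ updated by a projection of $\tilde{\mathbf{v}}$ after every layer, and pooling over $\tilde{\mathbf{v}}_{T}$.

```python
import math

def sub(a, b): return [a[k] - b[k] for k in range(3)]
def add(a, b): return [a[k] + b[k] for k in range(3)]
def scale(s, a): return [s * a[k] for k in range(3)]
def norm(a): return math.sqrt(sum(c * c for c in a))

def toy_equivariant_layer(points, feats):
    """Hypothetical stand-in for EquivariantLayer.

    Each vector feature receives the sum of distance-weighted relative
    positions to all other points, which rotates together with the input."""
    out = []
    for i, (p_i, v_i) in enumerate(zip(points, feats)):
        msg = [0.0, 0.0, 0.0]
        for j, p_j in enumerate(points):
            if i == j:
                continue
            r = sub(p_j, p_i)
            msg = add(msg, scale(math.exp(-norm(r)), r))
        out.append(add(v_i, msg))
    return out

def forward(x, num_layers=3, step=0.1):
    """Toy version of the Algorithm 2 loop for atom positions x."""
    n = len(x)
    x_tilde = [list(p) for p in x]              # assumption: movable points start on the atoms
    v = [[0.0, 0.0, 0.0] for _ in range(n)]     # toy embedding of fixed points
    v_tilde = [[0.0, 0.0, 0.0] for _ in range(n)]  # toy embedding of movable points
    for _ in range(num_layers):
        # One layer over the concatenated fixed and movable points.
        new = toy_equivariant_layer(list(x) + x_tilde, v + v_tilde)
        v, v_tilde = new[:n], new[n:]
        # Toy Proj_t: a scalar multiple of the vector feature gives the displacement.
        delta = [scale(step, f) for f in v_tilde]
        x_tilde = [add(p, d) for p, d in zip(x_tilde, delta)]
    y_hat = sum(norm(f) for f in v_tilde)       # invariant pooling over v_tilde only
    return y_hat, x_tilde
```

Because every step is built from relative positions, norms, and scalar multiples, rotating the input rotates the returned `x_tilde` and leaves `y_hat` unchanged, which is the property the real layers are designed to preserve.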

Figure 3: The trajectories of $\tilde{x}$ for three different molecules, from models trained on three types of target objectives, $\epsilon_{HOMO}$ (top), $zpve$ (middle), and $U_{0}$ (bottom). The first row is the distribution of electronegativity of each target molecule, calculated by GAMESS [38] at the 6-31G level. The second, third, and last rows show the trajectories of $\tilde{x}$ (small dark red dots near each atom) from three different models trained on $\epsilon_{HOMO}$, $zpve$, and $U_{0}$, respectively. The baseline network was TorchMD-NET. More figures are provided in A.9.
Table 3: Ablation results for Equiformer on three QM9 targets, $\mu$, $\epsilon_{HOMO}$ and $U_{0}$. The second row, $\{\mathbf{x},\mathbf{x}\}$, is the ablation that isolates the effect of $\tilde{\mathbf{x}}$ while keeping the same computational cost and weight parameters. The scale is the same as in Table 1.
Method $\mu$ $\epsilon_{HOMO}$ $U_{0}$
Equiformer 0.0118 16.5 6.59
Equiformer + $\{\mathbf{x},\mathbf{x}\}$ 0.0159 17.7 8.80
Equiformer + NP 0.0109 16.7 5.54
Table 4: Mean Squared Error (MSE) for future position estimation in the n-body task (the prediction of particles’ movement) proposed in [36]. The baseline EGNN results are taken from the original paper.
Method MSE
EGNN 0.0071
EGNN + NP 0.0051

5 Results

5.1 The performance gains on the QM9 and MD17 datasets

The experimental results on the QM9 dataset are presented in Table 1 for all 12 targets, grouped by whether Neural Polarization was applied (marked as ‘with NP’) to each baseline network. Most of the baseline results (the left side) were reproduced by training each model from scratch with the source code provided on its official page. A few were taken from the original papers when the source code raised computation or compatibility issues in our environment.

For EGNN, the error was reduced on 9 targets with Neural Polarization (counting the unchanged $\alpha$ case), with an average error change rate of -6.84% over all 12 targets. For TorchMD-NET, the error was reduced on 7 targets, with an average error change rate of -5.29% over all 12 targets. For Equiformer, the error was reduced on 8 targets (counting the unchanged $C_{v}$ case), with an average error change rate of -4.76% over all targets.

Interestingly, for the thermodynamic properties $U_{0}, U, H, G$, the error was significantly reduced regardless of the baseline network. For these four targets, the average performance gain (the mean of the three error change rates, one per baseline) ranges from -10% to -20%, whereas the other six targets with reduced error (all except $\mu$ and $\Delta_{\epsilon}$) showed gains ranging from -2% to -5%. The average error change rates of the three baseline models did not differ significantly: EGNN showed the best performance gain of -6.84%, followed by TorchMD-NET (-5.29%) and Equiformer (-4.76%).

Next, we analyze the MD17 dataset, which consists of eight molecules, using TorchMD-NET as the baseline network. Each molecule has two types of targets, energy and forces. The results are presented in Table 2. For energy prediction, the error was reduced in five molecules when the network was trained with Neural Polarization, and the average error change rate over the energies of all eight molecules is -7.39%. For force prediction, the error was reduced in seven of the eight molecules, the exception being Naphthalene. For Naphthalene, the force error increased significantly, by 61.73%, although the energy error decreased. The reason for these contrasting results is not clear. In summary, the average error change rates over the eight molecules were -7.39% for energy prediction and -4.03% for force prediction. Excluding Naphthalene, the average error change rate for force prediction falls to -13.42%.
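The percentages quoted for Table 2 can be cross-checked with a few lines of arithmetic. The sketch below assumes that the error change rate is computed as (error with NP minus baseline error) divided by the error with NP. This convention is inferred from the visible rows of Table 2 rather than stated in the text, and the function name `change_rate` is hypothetical.

```python
def change_rate(baseline, with_np):
    """Error change rate in percent, relative to the error with NP.

    Assumed convention, inferred from the rows of Table 2."""
    return (with_np - baseline) / with_np * 100.0

# Rows copied from Table 2: (baseline error, error with NP).
naphthalene_forces = change_rate(0.061, 0.1594)
toluene_energy = change_rate(0.074, 0.058)

# Removing the Naphthalene row (+61.73%) from the eight-molecule
# force average of -4.03% leaves a plain mean over seven molecules.
forces_without_naphthalene = (8 * -4.03 - 61.73) / 7

print(round(naphthalene_forces, 2))          # 61.73
print(round(toluene_energy, 2))              # -27.59
print(round(forces_without_naphthalene, 2))  # -13.42
```

The last value matches the -13.42% quoted for force prediction without Naphthalene, which suggests the reported averages are unweighted means over molecules.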

5.2 Investigation of the polarization trajectory

To validate our assumption that updating $\tilde{\mathbf{x}}$ and its equivariant feature $\tilde{\mathbf{v}}$ helps exploit a molecule’s electron density for molecular property prediction, we analyzed the final positions of $\tilde{\mathbf{x}}$ extracted from the trained models. We selected targets of various types and tracked the trajectory of every $\tilde{\mathbf{x}}$ during the training process.

We observed that the most decisive factor in the movement of $\tilde{x}$ is the target objective, rather than the atom type or bond types. In the model trained with Neural Polarization on $zpve$, $\tilde{x}$ tends to move outward, away from the molecule's center of mass, in contrast to the model trained on $\epsilon_{HOMO}$. One possible explanation for these characteristic patterns is that Neural Polarization adaptively optimizes $\tilde{x}$ depending on the training target, rather than merely increasing the number of parameters for the original atoms.

Another notable point is that each $\tilde{x}$ did not deviate by more than half of the bond length from the atom $x$ it belongs to. Although further analysis is needed to explain the movement of $\tilde{x}$, this trend may be evidence that $\tilde{x}$ plays a supporting role for the original $x$ while capturing the characteristics of molecular properties.
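The half-bond-length criterion can be made concrete with a small check. The sketch below is an illustration only: the helper `within_half_bond` is hypothetical, the coordinates are made-up toy values rather than data from the paper, and comparing against the shortest bond attached to each atom is an assumption, since the text does not say which bond length was used. Every atom is assumed to have at least one bond.

```python
import math

def dist(p, q):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def within_half_bond(x, x_tilde, bonds):
    """For each atom i, test whether |x_tilde[i] - x[i]| is at most half
    of the shortest bond attached to atom i (assumed criterion)."""
    result = []
    for i in range(len(x)):
        lengths = [dist(x[a], x[b]) for a, b in bonds if i in (a, b)]
        half = 0.5 * min(lengths)
        result.append(dist(x[i], x_tilde[i]) <= half)
    return result

# Toy three-atom chain with bond length 1.5; values are invented for illustration.
x = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
x_tilde = [(0.3, 0.1, 0.0), (1.5, 0.4, 0.0), (2.0, 0.0, 0.0)]
bonds = [(0, 1), (1, 2)]

flags = within_half_bond(x, x_tilde, bonds)
print(flags)  # [True, True, False]
```

In this toy case the third movable point has moved 1.0 from its atom, which exceeds half of the 1.5 bond, so it is the only one flagged as violating the criterion.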

5.3 Ablation study

To test our assumption that the performance improvement is not simply caused by an increase in the number of variables and parameters, we trained Neural Polarization on a pair of non-movable points $\{\mathbf{x},\mathbf{x}\}$, which simply replicates the inputs. We conducted this ablation study on $\epsilon_{HOMO}$, $\mu$, and $U_{0}$ in QM9. As shown in Table 3, using $\{\mathbf{x},\mathbf{x}\}$ rather than $\{\mathbf{x},\tilde{\mathbf{x}}\}$ increases the prediction error, whereas improvements over the baseline appeared only with Neural Polarization. Based on this, we conclude that movable points play a significant role in improving the performance of property prediction tasks.

5.4 Neural Polarization in other domains

We found that 1 holds not only for the electron density of molecules but also for general 3-D density functions. To examine this assumption, we conducted experiments on the n-body task proposed in [36]. As demonstrated in Table 4, Neural Polarization also improved performance on the n-body task, so its benefit is not limited to molecular tasks. This observation suggests that Neural Polarization may generalize beyond molecular tasks.

6 Discussion

To analyze the effect of Neural Polarization on molecular property prediction tasks, we examined the final positions of $\tilde{x}$ together with the directions of the covalent bonds in each molecule. In chemistry, a single bond is formed by two atoms sharing one electron pair, while a double bond arises from the sharing of two electron pairs, leading to electron densities aligned parallel to the bond axis. In aromatic rings, the delocalized electron densities form planar regions above and below the ring plane. In accord with these fundamentals, the trajectories from atoms with single bonds lie near the bonds, as shown in Figure 3 (right), while the trajectories from aromatic rings lie in the same plane as the ring, as shown in Figure 3 (left). In addition, we observed that most of the final locations $\tilde{\mathbf{x}}_{T}$ (small red dots) are displaced from the original atom locations $\mathbf{x}_{0}$, but by no more than half the bond length, which is related to an inherent property of a covalent bond. From these observations, we assume that Neural Polarization recognizes various molecular properties, including the characteristics of bonding types, and captures the polarization effect of each molecule.

7 Conclusion

We proposed Neural Polarization, which enables the intermediate state of an equivariant network to better represent the electron density corresponding to the intermediate state in DFT calculations. Accompanied by flexible applicability, Neural Polarization demonstrated performance improvements across diverse tasks and models, showing that it can be trained toward the polarization characteristics of electron density in accord with our assumption. Based on these results, we expect Neural Polarization to bring improvements to general molecular problems. For future work, we will propose various methodologies inspired by aspects beyond polarization.

Limitations

While Neural Polarization does not change the computational complexity of the original model, it introduces an additional computational cost. Meanwhile, for deeper insights from trajectories, it would be beneficial to validate the approach on molecular datasets that include electron density information.

Broader impacts

Our study can lead to advanced research subjects that bridge the gap between quantum chemistry and deep learning. In addition, the development of Neural Polarization involves concepts from DFT, equivariant neural networks, and molecular modeling. This interdisciplinary approach could foster collaborations between researchers from different fields, such as physics, chemistry, and machine learning.

References

  • [1] I.N. Levine. Quantum chemistry. Pearson advanced chemistry series, 2014.
  • [2] Albert P Bartók, Mike C Payne, Risi Kondor, and Gábor Csányi. Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons. Physical review letters, 104(13):136403, 2010.
  • [3] Narbe Mardirossian and Martin Head-Gordon. Thirty years of density functional theory in computational chemistry: an overview and extensive assessment of 200 density functionals. Molecular Physics, 115(19):2315–2372, 2017.
  • [4] Weitao Yang. Direct calculation of electron density in density-functional theory. Physical review letters, 66(11):1438, 1991.
  • [5] Eugene S Kryachko and Eduardo V Ludeña. Energy density functional theory of many-electron systems, volume 4. Springer Science & Business Media, 2012.
  • [6] J Andzelm and E Wimmer. Density functional gaussian-type-orbital approach to molecular geometries, vibrations, and reaction energies. The Journal of chemical physics, 96(2):1280–1303, 1992.
  • [7] Jens Jørgen Mortensen, Lars Bruno Hansen, and Karsten Wedel Jacobsen. Real-space grid implementation of the projector augmented wave method. Physical review B, 71(3):035109, 2005.
  • [8] Richard J Morris, Rafael J Najmanovich, Abdullah Kahraman, and Janet M Thornton. Real spherical harmonic expansion coefficients as 3d shape descriptors for protein binding pocket and ligand comparisons. Bioinformatics, 21(10):2347–2355, 2005.
  • [9] Jeanne L McHale. Molecular spectroscopy. CRC Press, 2017.
  • [10] Praveen C Hariharan and John A Pople. The influence of polarization functions on molecular orbital hydrogenation energies. Theoretica chimica acta, 28:213–222, 1973.
  • [11] RHWJ Ditchfield, Warren J Hehre, and John A Pople. Self-consistent molecular-orbital methods. ix. an extended gaussian-type basis for molecular-orbital studies of organic molecules. The Journal of Chemical Physics, 54(2):724–728, 1971.
  • [12] Vitaly A Rassolov, Mark A Ratner, John A Pople, Paul C Redfern, and Larry A Curtiss. 6-31g* basis set for third-row atoms. Journal of Computational Chemistry, 22(9):976–984, 2001.
  • [13] Trygve Helgaker, Sonia Coriani, Poul Jørgensen, Kasper Kristensen, Jeppe Olsen, and Kenneth Ruud. Recent advances in wave function-based methods of molecular-property calculations. Chemical reviews, 112(1):543–631, 2012.
  • [14] Mati Karelson, Victor S Lobanov, and Alan R Katritzky. Quantum-chemical descriptors in qsar/qspr studies. Chemical reviews, 96(3):1027–1044, 1996.
  • [15] Frank Jensen. Polarization consistent basis sets: Principles. The Journal of Chemical Physics, 115(20):9113–9125, 2001.
  • [16] Daniel Sánchez-Portal, Pablo Ordejon, Emilio Artacho, and Jose M Soler. Density-functional method for very large systems with lcao basis sets. International journal of quantum chemistry, 65(5):453–461, 1997.
  • [17] Raghunathan Ramakrishnan, Pavlo O Dral, Matthias Rupp, and O Anatole von Lilienfeld. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 1, 2014.
  • [18] Ernest R Davidson and David Feller. Basis set selection for molecular calculations. Chemical Reviews, 86(4):681–696, 1986.
  • [19] Yi-Lun Liao and Tess Smidt. Equiformer: Equivariant graph attention transformer for 3d atomistic graphs. In The Eleventh International Conference on Learning Representations, 2022.
  • [20] Simon Batzner, Albert Musaelian, Lixin Sun, Mario Geiger, Jonathan P Mailoa, Mordechai Kornbluth, Nicola Molinari, Tess E Smidt, and Boris Kozinsky. E (3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature communications, 13(1):1–11, 2022.
  • [21] Brandon Anderson, Truong-Son Hy, and Risi Kondor. Cormorant: Covariant molecular neural networks. arXiv preprint arXiv:1906.04015, 2019.
  • [22] Fabian B Fuchs, Daniel E Worrall, Volker Fischer, and Max Welling. Se (3)-transformers: 3d roto-translation equivariant attention networks. arXiv preprint arXiv:2006.10503, 2020.
  • [23] Johannes Brandstetter, Rob Hesselink, Elise van der Pol, Erik J Bekkers, and Max Welling. Geometric and physical quantities improve e (3) equivariant message passing. In International Conference on Learning Representations, 2021.
  • [24] Oliver T Unke, Stefan Chmiela, Michael Gastegger, Kristof T Schütt, Huziel E Sauceda, and Klaus-Robert Müller. Spookynet: Learning force fields with electronic degrees of freedom and nonlocal effects. Nature communications, 12(1):7273, 2021.
  • [25] Ilyes Batatia, David P Kovacs, Gregor Simm, Christoph Ortner, and Gábor Csányi. Mace: Higher order equivariant message passing neural networks for fast and accurate force fields. Advances in Neural Information Processing Systems, 35:11423–11436, 2022.
  • [26] Thorben Frank, Oliver Unke, and Klaus-Robert Müller. So3krates: Equivariant attention for interactions on arbitrary length-scales in molecular systems. Advances in Neural Information Processing Systems, 35:29400–29413, 2022.
  • [27] William Raymond Scott. Group theory. Courier Corporation, 2012.
  • [28] Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley. Tensor field networks: Rotation-and translation-equivariant neural networks for 3d point clouds. arXiv preprint arXiv:1802.08219, 2018.
  • [29] Kristof Schütt, Oliver Unke, and Michael Gastegger. Equivariant message passing for the prediction of tensorial properties and molecular spectra. In International Conference on Machine Learning, pages 9377–9388. PMLR, 2021.
  • [30] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In International Conference on Machine Learning, pages 1263–1272. PMLR, 2017.
  • [31] Philipp Thölke and Gianni De Fabritiis. Torchmd-net: Equivariant transformers for neural network based molecular potentials. arXiv preprint arXiv:2202.02541, 2022.
  • [32] Bhupalee Kalita, Li Li, Ryan J McCarty, and Kieron Burke. Learning to approximate density functionals. Accounts of Chemical Research, 54(4):818–826, 2021.
  • [33] Gabriel R Schleder, Antonio CM Padilha, Carlos Mera Acosta, Marcio Costa, and Adalberto Fazzio. From dft to machine learning: recent approaches to materials science–a review. Journal of Physics: Materials, 2(3):032001, 2019.
  • [34] Eberhard Engel. Density functional theory. Springer, 2011.
  • [35] Walter Kohn and Lu Jeu Sham. Self-consistent equations including exchange and correlation effects. Physical review, 140(4A):A1133, 1965.
  • [36] Vıctor Garcia Satorras, Emiel Hoogeboom, and Max Welling. E (n) equivariant graph neural networks. In International conference on machine learning, pages 9323–9332. PMLR, 2021.
  • [37] Stefan Chmiela, Huziel E Sauceda, Igor Poltavsky, Klaus-Robert Müller, and Alexandre Tkatchenko. sgdml: Constructing accurate and data efficient molecular force fields using machine learning. Computer Physics Communications, 240:38–45, 2019.
  • [38] Giuseppe M. J. Barca, Colleen Bertoni, Laura Carrington, Dipayan Datta, Nuwan De Silva, J. Emiliano Deustua, Dmitri G. Fedorov, Jeffrey R. Gour, Anastasia O. Gunina, Emilie Guidez, Taylor Harville, Stephan Irle, Joe Ivanic, Karol Kowalski, Sarom S. Leang, Hui Li, Wei Li, Jesse J. Lutz, Ilias Magoulas, Joani Mato, Vladimir Mironov, Hiroya Nakata, Buu Q. Pham, Piotr Piecuch, David Poole, Spencer R. Pruitt, Alistair P. Rendell, Luke B. Roskop, Klaus Ruedenberg, Tosaporn Sattasathuchana, Michael W. Schmidt, Jun Shen, Lyudmila Slipchenko, Masha Sosonkina, Vaibhav Sundriyal, Ananta Tiwari, Jorge L. Galvez Vallejo, Bryce Westheimer, Marta Wloch, Peng Xu, Federico Zahariev, and Mark S. Gordon. Recent developments in the general atomic and molecular electronic structure system. The Journal of Chemical Physics, 152(15):154102, April 2020.
  • [39] Kristof T Schütt, Pieter-Jan Kindermans, Huziel E Sauceda, Stefan Chmiela, Alexandre Tkatchenko, and Klaus-Robert Müller. Schnet: A continuous-filter convolutional neural network for modeling quantum interactions. arXiv preprint arXiv:1706.08566, 2017.
  • [40] Johannes Klicpera, Shankari Giri, Johannes T. Margraf, and Stephan Günnemann. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. In NeurIPS-W, 2020.
  • [41] Johannes Gasteiger, Florian Becker, and Stephan Günnemann. Gemnet: Universal directional graph neural networks for molecules. Advances in Neural Information Processing Systems, 34:6790–6802, 2021.
  • [42] Johan J de Swart. The octet model and its clebsch-gordan coefficients. In The Eightfold Way, pages 120–143. CRC Press, 2018.
  • [43] Eugen Wigner. Gruppentheorie und ihre anwendung auf die quantenmechanik der atomspektren. Monatshefte für Mathematik und Physik, 1931.
  • [44] Congyue Deng, Or Litany, Yueqi Duan, Adrien Poulenard, Andrea Tagliasacchi, and Leonidas J Guibas. Vector neurons: A general framework for so (3)-equivariant networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12200–12209, 2021.
  • [45] Emiel Hoogeboom, Vıctor Garcia Satorras, Clément Vignac, and Max Welling. Equivariant diffusion for molecule generation in 3d. In International conference on machine learning, pages 8867–8887. PMLR, 2022.
  • [46] Kevin Ryczko, David A Strubbe, and Isaac Tamblyn. Deep learning and density-functional theory. Physical Review A, 100(2):022512, 2019.
  • [47] Kristof T Schütt, Michael Gastegger, Alexandre Tkatchenko, K-R Müller, and Reinhard J Maurer. Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions. Nature communications, 10(1):5024, 2019.
  • [48] Ryan Pederson, Bhupalee Kalita, and Kieron Burke. Machine learning and density functional theory. Nature Reviews Physics, 4(6):357–358, 2022.
  • [49] Bing Huang, Guido Falk von Rudorff, and O Anatole von Lilienfeld. The central role of density functional theory in the ai age. Science, 381(6654):170–175, 2023.
  • [50] Tian Xie and Jeffrey C Grossman. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Physical review letters, 120(14):145301, 2018.
  • [51] Omar Allam, Byung Woo Cho, Ki Chul Kim, and Seung Soon Jang. Application of dft-based machine learning for developing molecular electrode materials in li-ion batteries. RSC advances, 8(69):39414–39420, 2018.
  • [52] Andrew Lee, Suchismita Sarker, James E Saal, Logan Ward, Christopher Borg, Apurva Mehta, and Christopher Wolverton. Machine learned synthesizability predictions aided by density functional theory. Communications Materials, 3(1):73, 2022.
  • [53] Pin Chen, Jianwen Chen, Hui Yan, Qing Mo, Zexin Xu, Jinyu Liu, Wenqing Zhang, Yuedong Yang, and Yutong Lu. Improving material property prediction by leveraging the large-scale computational database and deep learning. The Journal of Physical Chemistry C, 126(38):16297–16305, 2022.
  • [54] Hsin-Yuan Huang, Richard Kueng, Giacomo Torlai, Victor V Albert, and John Preskill. Provably efficient machine learning for quantum many-body problems. Science, 377(6613):eabk3333, 2022.
  • [55] Bing Huang, Guido Falk von Rudorff, and O Anatole von Lilienfeld. Towards self-driving laboratories in chemistry and materials sciences: The central role of dft in the era of ai. arXiv preprint arXiv:2304.03272, 2023.
  • [56] Chenru Duan, Fang Liu, Aditya Nandy, and Heather J Kulik. Putting density functional theory to the test in machine-learning-accelerated materials discovery. The Journal of Physical Chemistry Letters, 12(19):4628–4637, 2021.
  • [57] Haiyang Yu, Meng Liu, Youzhi Luo, Alex Strasser, Xiaofeng Qian, Xiaoning Qian, and Shuiwang Ji. Qh9: A quantum hamiltonian prediction benchmark for qm9 molecules. Advances in Neural Information Processing Systems, 36, 2024.
  • [58] Pierre Hohenberg and Walter Kohn. Inhomogeneous electron gas. Physical review, 136(3B):B864, 1964.
  • [59] Jean-Pierre Serre et al. Linear representations of finite groups, volume 42. Springer, 1977.
  • [60] Teturo Inui, Yukito Tanabe, and Yositaka Onodera. Group theory and its applications in physics, volume 78. Springer Science & Business Media, 2012.
  • [61] Deutsche Akademie der Wissenschaften zu Berlin. Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften zu Berlin, volume Jan-Mai 1882. Berlin, Deutsche Akademie der Wissenschaften zu Berlin, 1882-1918, 1882. https://www.biodiversitylibrary.org/bibliography/42231.
  • [62] Brian C Hall. Lie groups, Lie algebras, and representations. Springer, 2013.
  • [63] Claus Müller. Spherical harmonics, volume 17. Springer, 2006.
  • [64] M Shiraishi. Spin-Weighted Spherical Harmonic Function. Springer, 2013.
  • [65] Stefan Chmiela, Alexandre Tkatchenko, Huziel E Sauceda, Igor Poltavsky, Kristof T Schütt, and Klaus-Robert Müller. Machine learning of accurate energy-conserving molecular force fields. Science advances, 3(5):e1603015, 2017.
  • [66] Thomas Kipf, Ethan Fetaya, Kuan-Chieh Wang, Max Welling, and Richard Zemel. Neural relational inference for interacting systems. In International conference on machine learning, pages 2688–2697. PMLR, 2018.

Appendix A Appendix / supplemental material

A.1 Table of notations

Table S1: Table of notations in this manuscript
Variable Definition
$i$ an index of an atom of a molecule
$t$ a layer index of a baseline network
$N$ the number of atoms in a molecule
$x_{i}$ a three-dimensional coordinate of the $i$-th atom
$a_{i}$ an atomic number (or atom type) of the $i$-th atom
$v_{t,i}$ a node feature of the $t$-th layer output
$\mathbf{x}$ the set $\{x_{0},x_{1},\dots,x_{N-1}\}$
$\hat{y}$ predicted target (molecular property)
$X$ molecule conformation
$\rho$ electron density
$\chi$ latent feature of a baseline network
$\xi$ mapping from latent feature $\chi$ to electron density $\rho$

A.2 Related works

Molecule property prediction based on molecule structure in Euclidean space. Based on the relationship between molecule conformation and its properties, many existing networks use molecule conformation as the input for property prediction tasks. SchNet [39] introduced a radial basis function for embedding continuous-valued atom-atom distances, and DimeNet [40] used Bessel basis functions for embedding continuous-valued angles between three atoms.

To represent molecule structures located in Euclidean space more accurately, recent studies introduced geometry-group theory into their networks. Various types of representation theory have been used in modern chemistry for structural analysis of molecules or crystals. Cormorant [21], NequIP [20], GemNet [41], PaiNN [29], and SpookyNet [24] are well-known examples that use geometry-group equivariant blocks for molecule conformation. These studies contributed to more accurate and reliable molecule structure learning; however, the effects caused by electron configurations may still be limited in this type of representation.

Two branches of implementing equivariant networks. Broadly speaking, there are two branches for implementing geometry-group equivariance in a neural network. One branch relies on the representation techniques for the SO(3) group in physics. In particular, these works introduced Clebsch-Gordan coefficients [42] or Wigner-D matrices [43] for the SO(3)-equivariant tensor product of features defined on a spherical basis. Cormorant [21], NequIP [20], and SE(3)-Transformer [22] are examples that belong to this category. Further explanation is given in A.4.

On the other hand, EGNN [36] and several following works [44, 45] did not use the computationally expensive tensor product. Instead, they separated scalar-valued features for scales and vector-valued features for directions, and trained them as individual features of the molecule structure. In general, this approach is relatively efficient in terms of computational complexity for larger molecules.

Utilization of DFT for property prediction in machine learning. As machine learning-based methods have progressed toward solving more sophisticated problems in chemistry and materials science, many recent studies [46, 47, 32, 48, 49] focused on DFT for a wide range of tasks. [50, 51] studied DFT-related property prediction for given materials. [52, 53, 54] are examples that consider DFT for other molecule-related tasks. In a review paper [55], the authors argued that understanding DFT will be necessary background for machine learning-based methods that explore chemical properties.

The prediction performance of current networks. Despite the rapid increase in the prediction performance of deep neural networks over a short period, there remains a considerable gap compared to DFT methods. Several reports [56, 1, 37, 57] have discussed these limitations of current neural networks in chemistry research. They pointed out that one possible limitation is that deep learning approaches could not use enough chemical information, including DFT, in appropriate ways.

A.3 Brief introduction about DFT

Density functional theory (DFT) [1] is one of the most widely used methodologies for studying molecular properties. The overall process of DFT calculation consists of inferring the electron density and calculating the properties of the molecule based on the computed electron density. Molecules are composed of multiple atoms with surrounding electrons. According to quantum mechanics, the location of an electron cannot be specified as a point, but rather as a probability distribution over the space, called electron density. The first Hohenberg-Kohn theorem [58] states that the ground state electron density of a molecule uniquely determines the external potential, and consequently all ground state molecular properties. This theorem implies that knowing just the electron density is enough to calculate any ground state property including energy.
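The chain of dependencies implied by the first Hohenberg-Kohn theorem can be written compactly, for a non-degenerate ground state. The symbols $v_{\mathrm{ext}}$ (external potential), $\hat{H}$ (Hamiltonian), $\Psi_{0}$ (ground state wavefunction), $E_{0}$ (ground state energy) and $N_{e}$ (number of electrons) are introduced here only for illustration and are not notation from this manuscript; $\rho$ follows Table S1, with $\rho_{0}$ denoting the ground state density.

```latex
\rho_{0}(\mathbf{r})
\;\longmapsto\; v_{\mathrm{ext}}(\mathbf{r}) \ \text{(up to an additive constant)}
\;\longmapsto\; \hat{H}
\;\longmapsto\; \Psi_{0}
\;\longmapsto\; E_{0} = E[\rho_{0}],
\qquad
N_{e} = \int \rho_{0}(\mathbf{r})\, d\mathbf{r}
```

Read left to right, each object is fixed by the one before it, which is why a model that captures the density well has, in principle, access to every ground state property.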

A.4 Equivariance and representations in SO(3) and SE(3) symmetry

We briefly review several concepts on equivariance as essential background for our motivation and strategy. A group [27] $(G, \circ)$ is an algebraic structure consisting of a non-empty set $G = \{g\}$ and a binary operation $\circ : G \times G \rightarrow G$ satisfying three requirements: associativity, the existence of an identity element, and the existence of an inverse element. A (left) group action $a$ of $G$ on a set $X$ is a function $a : G \times X \rightarrow X$ satisfying identity and compatibility for all $g, h \in G$ and all $x \in X$.
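The axioms above can be checked numerically on a concrete group. The sketch below is an illustration of ours, using 3D rotation matrices (the group $\mathbb{SO}(3)$ introduced later in this section) as $G$ and $\mathbb{R}^3$ as $X$. The helper `random_rotation` and the function `act` are hypothetical names introduced only for this example.

```python
import numpy as np

def random_rotation(rng):
    """Return a random 3x3 rotation matrix (orthogonal, determinant +1)."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs of the QR factor
    if np.linalg.det(q) < 0:      # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

def act(g, x):
    """Left group action of a rotation matrix g on a point x."""
    return g @ x

rng = np.random.default_rng(0)
g, h, k = (random_rotation(rng) for _ in range(3))
e = np.eye(3)                     # identity element of the group
x = rng.normal(size=3)            # a point of the set X = R^3

checks = {
    "associativity": np.allclose((g @ h) @ k, g @ (h @ k)),
    "identity element": np.allclose(e @ g, g) and np.allclose(g @ e, g),
    "inverse element": np.allclose(g.T @ g, e),  # inverse of a rotation is its transpose
    "action identity": np.allclose(act(e, x), x),
    "action compatibility": np.allclose(act(g @ h, x), act(g, act(h, x))),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```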

A group representation [59, 60] $\varphi$ is a group homomorphism from a group $G$ to a general linear group $GL(V)$, which enables group actions to be represented as matrix multiplications in a (finite-dimensional) vector space $V$. If $\varphi : G \rightarrow GL(V)$ has only trivial subrepresentations, it is called an irreducible representation, or irrep. One important property of irreducible representations is the Great Orthogonality Theorem [61], stated through Schur's orthogonality relations [62]: $\sum_{g \in G}^{|G|} \varphi^{(L)}(g)_{l_m, l_n} = 0$ for $l_n, l_m = 1, \ldots, L$, where $L$ is the dimension of $\varphi$, for all $\varphi \neq I_L$. This theorem is proved by Schur's lemma: 1) if $V$ and $W$ are not isomorphic, then there are no nontrivial $G$-linear maps between them, and 2) if $V = W$ and $\varphi_V = \varphi_W$, then the only nontrivial $G$-linear maps are scalar multiples of the identity.

A function $f$ satisfying $f(g \circ x) = f(x)$ is called an invariant function of the group $G$ on $X$, while it is called equivariant if it satisfies $f(g \circ x) = g \circ f(x)$.
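The two definitions can be illustrated on a toy point cloud. In the sketch below, which is our own example and not a model from this paper, the matrix of pairwise distances serves as an invariant function and the centroid as an equivariant function under rotation. The five random points are a hypothetical stand-in for atom positions.

```python
import numpy as np

def random_rotation(rng):
    """Return a random 3x3 rotation matrix (orthogonal, determinant +1)."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs of the QR factor
    if np.linalg.det(q) < 0:      # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

def pairwise_distances(x):
    """Invariant function: all interatomic distances of an (N, 3) point set."""
    diff = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def centroid(x):
    """Equivariant function: the mean position of an (N, 3) point set."""
    return x.mean(axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))       # hypothetical toy "molecule" with 5 atoms
R = random_rotation(rng)
x_rot = x @ R.T                   # apply g to every point: x_i -> R x_i

invariant_ok = np.allclose(pairwise_distances(x_rot), pairwise_distances(x))
equivariant_ok = np.allclose(centroid(x_rot), R @ centroid(x))
print("f(g x) = f(x):", invariant_ok)
print("f(g x) = g f(x):", equivariant_ok)
```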

In this study, we focus on the special orthogonal group $\mathbb{SO}(3)$, the group of all rotations in three-dimensional Euclidean space under composition. The irreducible representations of $\mathbb{SO}(3)$ are the Wigner-D matrices $D^L(g)$ of dimension $2L+1$; each is a $(2L+1) \times (2L+1)$ matrix with elements $D^L_{m,m'}$, where $-L \leq m, m' \leq L$ (footnote: we assume integer $L$, $m = m'$ for real-valued spherical harmonics).

Spherical harmonics [63] are a set of orthonormal basis functions for the irreducible representations of $\mathbb{SO}(3)$, denoted by $Y_m^L(\theta, \phi)$ with an integer degree $L$. On the unit sphere $S^2$, any square-integrable function $s : S^2 \rightarrow \mathbb{C}$ can be expanded as a linear combination $s(\theta, \phi) = \sum_{L=0}^{\infty} \sum_{m=-L}^{L} f_m^{L\ast} Y_m^L(\theta, \phi)$.
Accordingly, any group action $g \in \mathbb{SO}(3)$ can be expressed as a direct sum of Wigner-D matrices $D^L_{m,m'}$ with a change of basis $P : P^{-1} D(g) P = D'(g)$ as follows:

$$D(g) = P^{-1} \Bigg( \bigoplus_i D^{l_i}(g) \Bigg) P = P^{-1} \begin{bmatrix} D^{l_0}(g) & & \\ & D^{l_1}(g) & \\ & & \ldots \end{bmatrix} P \tag{9}$$
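A small numerical sketch of Eq. (9), under assumptions of ours: we take one $l = 0$ block ($D^0(g) = 1$) and two $l = 1$ blocks, and use the fact that in a Cartesian basis $D^1(g)$ is the rotation matrix itself. The helper names `direct_sum` and `rep`, and the particular change of basis `P`, are hypothetical. The check confirms that the resulting 7-dimensional matrices still form a representation, that is, $D(gh) = D(g)D(h)$.

```python
import numpy as np

def random_rotation(rng):
    """Return a random 3x3 rotation matrix (orthogonal, determinant +1)."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs of the QR factor
    if np.linalg.det(q) < 0:      # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

def direct_sum(blocks):
    """Place square blocks along the diagonal of a larger zero matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out

def rep(R, P):
    """Reducible representation P^{-1} (D^0 + D^1 + D^1) P of a rotation R."""
    blocks = [np.ones((1, 1)), R, R]   # D^0(g) = 1, D^1(g) = R (Cartesian basis)
    return np.linalg.inv(P) @ direct_sum(blocks) @ P

rng = np.random.default_rng(2)
# Hypothetical change of basis, kept close to the identity so it is well conditioned.
P = np.eye(7) + 0.1 * rng.normal(size=(7, 7))
g, h = random_rotation(rng), random_rotation(rng)

# Homomorphism property holds although D(g) is no longer block diagonal.
homomorphism_ok = np.allclose(rep(g @ h, P), rep(g, P) @ rep(h, P))
print("D(gh) = D(g) D(h):", homomorphism_ok)
```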

We can write the action of a rotation $R$ in three-dimensional Euclidean space on spherical harmonics as follows:

$$Y_m^L(\theta + \Delta\theta, \phi + \Delta\phi) = \sum_{m'=-L}^{L} \Big[ D^L_{m,m'}(\alpha, \beta, \gamma) \Big] Y^L_{m'}(\theta, \phi) \tag{10}$$
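Eq. (10) can be checked directly for $L = 1$. The sketch below assumes real spherical harmonics ordered as the $(x, y, z)$ components of the unit direction (other orderings, such as $(y, z, x)$, are also common), in which case $D^1$ coincides with the rotation matrix. The function names are hypothetical and the angles are arbitrary.

```python
import numpy as np

def random_rotation(rng):
    """Return a random 3x3 rotation matrix (orthogonal, determinant +1)."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs of the QR factor
    if np.linalg.det(q) < 0:      # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

def unit_vector(theta, phi):
    """Unit direction for polar angle theta and azimuthal angle phi."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def real_sph_harm_l1(theta, phi):
    """Real L = 1 spherical harmonics, ordered as (x, y, z) components."""
    return np.sqrt(3.0 / (4.0 * np.pi)) * unit_vector(theta, phi)

def angles_of(n):
    """Polar and azimuthal angles of a unit vector."""
    return np.arccos(np.clip(n[2], -1.0, 1.0)), np.arctan2(n[1], n[0])

rng = np.random.default_rng(3)
R = random_rotation(rng)
theta, phi = 0.7, 2.1
theta_rot, phi_rot = angles_of(R @ unit_vector(theta, phi))

lhs = real_sph_harm_l1(theta_rot, phi_rot)   # harmonics at the rotated angles
rhs = R @ real_sph_harm_l1(theta, phi)       # D^1 applied at the original angles
print("Eq. (10) holds for L = 1:", np.allclose(lhs, rhs))
```

For $L > 1$ the same relation holds with larger Wigner-D matrices, which are usually taken from a library rather than written by hand.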

For a tensor product of two spherical tensors $f^{L_1}_{m_1}$ and $f^{L_2}_{m_2}$ in $\mathbb{SO}(3)$, the Clebsch-Gordan coefficients [62] $C_{(L_1,m_1),(L_2,m_2)}^{(L_3,m_3)}$ assign a numerical weight to each component in the decomposition of the product of the two spherical tensors, as follows:

$$f^{L_3}_{m_3} = \sum_{m_1=-L_1}^{L_1} \sum_{m_2=-L_2}^{L_2} C_{(L_1,m_1),(L_2,m_2)}^{(L_3,m_3)} f^{L_1}_{m_1} f^{L_2}_{m_2} \tag{11}$$
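For $L_1 = L_2 = 1$, the coupling in Eq. (11) has a familiar Cartesian form: up to normalization constants, the $L_3 = 0$ part is the dot product, the $L_3 = 1$ part is the cross product, and the $L_3 = 2$ part is the symmetric traceless part of the outer product. The sketch below is an illustration of ours with a hypothetical function name; it verifies that each part transforms as expected under a proper rotation.

```python
import numpy as np

def random_rotation(rng):
    """Return a random 3x3 rotation matrix (orthogonal, determinant +1)."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs of the QR factor
    if np.linalg.det(q) < 0:      # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

def couple_l1_l1(a, b):
    """Split the product of two L = 1 vectors into L3 = 0, 1, 2 parts."""
    scalar = a @ b                               # L3 = 0: 1 component
    vector = np.cross(a, b)                      # L3 = 1: 3 components
    outer = np.outer(a, b)
    # L3 = 2: symmetric traceless matrix with 5 independent components.
    traceless = 0.5 * (outer + outer.T) - np.eye(3) * scalar / 3.0
    return scalar, vector, traceless

rng = np.random.default_rng(4)
a, b = rng.normal(size=3), rng.normal(size=3)
R = random_rotation(rng)

s, v, t = couple_l1_l1(a, b)
s_rot, v_rot, t_rot = couple_l1_l1(R @ a, R @ b)

print("L3 = 0 is invariant:", np.isclose(s_rot, s))
print("L3 = 1 rotates as a vector:", np.allclose(v_rot, R @ v))
print("L3 = 2 rotates as a rank-2 tensor:", np.allclose(t_rot, R @ t @ R.T))
```

The component counts $1 + 3 + 5 = 9$ match the $3 \times 3$ entries of the outer product, so no information is lost in the decomposition.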

In quantum chemistry, we can describe the electron charge distribution (cloud) of an atom generated by the interactions between the atom and its electrons [1]. It is called an atomic orbital, and is described in spherical coordinates with a radial term $R(r)$ and spherical harmonics $Y_m^l(\theta, \phi)$ of polar angle $\theta$ and azimuthal angle $\phi$, with degree $l$ and order $m$. The integer-valued degrees of real spherical harmonics (footnote: notation) $l = 0, 1, 2, \ldots$ correspond to the $s, p, d, \ldots$ orbitals of an atom, and a higher-$l$ basis orbital can capture a higher angular frequency of a function $f_m^L$ defined on the surface of a sphere $S^2$.
The total angular momentum coupling can be described by an element of the rotation group $g \in \mathbb{SO}(3)$, and the Wigner-D matrix [64] $D^L_{m,m'}$ with the Clebsch-Gordan coefficients $\{C_{(L_1,m_1),(L_2,m_2)}^{(L_3,m_3)}\}$ is used to take the product of two angular momenta represented as spherical tensors of dimension $2L_1+1$ and $2L_2+1$, respectively.

A.5 Equivalence between selection of basis set and node feature

Let us assume a DFT basis set $B$ with spherical harmonics. Since the node feature $\mathbf{v}_t \in \mathbb{V}_{eq}$ computed by an equivariant layer is equivariant under $\mathbb{SO}(3)$ transformations, $\mathbf{v}_t$ may be represented as a combination of irreducible representations of $\mathbb{SO}(3)$. Using the notation defined in Eq. (11), let us represent the $\{L, m\}$-degree $\mathbb{SO}(3)$ irreducible feature of $v_{t,i}$ as $f^L_{mi} = \sum_j f^L_{mij}$. With this definition we can write $v_{t,i} = \sum_{L,m,i,j} f^L_{mij}$.

Now, consider a linear map $\xi'_{1/2}$ which maps $f^L_{mij}$ into $R_{ijL}(r - x_i) Y^L_m(\theta_i, \phi_i)$. Then the map $\xi'(\mathbf{v}) = \xi'_{1/2}(\mathbf{v})^2 \in (\mathbb{V}_{eq}^N \rightarrow (\mathbb{R}^3 \rightarrow \mathbb{R}))$ maps the layer output $\mathbf{v}_t$ into an electron density, which can be represented as a quadratic form under $B$. Meanwhile, any electron density yielded by DFT with $B$ is also a quadratic form under $B$.
Therefore, any invertible linear map $\xi$ defined for $\mathbf{v}_t$ can be converted, via $\xi'_{1/2} \cdot \xi^{-1}$, into a linear map whose image is a subset of the electron densities that can be generated by DFT.
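The following sketch illustrates the "linear map, then square" construction on a single atom. It is a simplified stand-in of ours, not the paper's implementation: the basis contains one s-type and three p-type Gaussian functions without normalization constants, the coefficients are random numbers playing the role of node features, and all names are hypothetical. It confirms that the squared amplitude is non-negative and equals an explicit quadratic form in the coefficients.

```python
import numpy as np

def basis_values(points, center, alpha=1.0):
    """Evaluate one s-type and three p-type Gaussian basis functions.

    Each function is a radial Gaussian times an L = 0 or L = 1 angular factor;
    normalization constants are omitted for brevity.
    """
    d = points - center                          # (n_points, 3)
    radial = np.exp(-alpha * np.sum(d**2, axis=1))
    s = radial[:, None]                          # L = 0
    p = d * radial[:, None]                      # L = 1: proportional to (x, y, z)
    return np.concatenate([s, p], axis=1)        # (n_points, 4)

rng = np.random.default_rng(5)
points = rng.uniform(-2.0, 2.0, size=(100, 3))   # sample points in space
center = np.zeros(3)                             # hypothetical atom position x_i
coeffs = rng.normal(size=4)                      # hypothetical stand-in for node features

B = basis_values(points, center)
amplitude = B @ coeffs                           # linear in the features
density = amplitude**2                           # squaring gives a density-like field

# The same field written explicitly as a quadratic form c^T (b b^T) c per point.
quadratic = np.einsum("i,ni,nj,j->n", coeffs, B, B, coeffs)

print("density is non-negative:", bool(np.all(density >= 0.0)))
print("matches quadratic form:", np.allclose(density, quadratic))
```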

A.6 Proof of Proposition 1

Let $B_P$ be the basis set of $P$ and $\tilde{B}_P[\tilde{\mathbf{x}}]$ be the basis set of $\tilde{P}[\tilde{\mathbf{x}}]$. Since the basis $B_P$ is equivalent to that of DFT by the equivariance constraint, any basis function in $B_P$ can be represented by a combination of spherical harmonics and a radial function centered at a position $x_i$.

$$B_P = \{ R_{ijL}(r - x_i) Y^L_m(\theta_i, \phi_i) \mid 1 \leq i \leq N, -L \leq m \leq L, j \leq J_{Lm} \} \tag{12}$$

Similarly, a basis set with Neural Polarization, $\tilde{B}_P[\tilde{\mathbf{x}}]$, can be represented as below.

$$\tilde{B}_P[\tilde{\mathbf{x}}] = P \cap \{ R_{ijLm}(r - \tilde{x}_i) Y^L_m(\tilde{\theta_i}, \tilde{\phi_i}) \mid i \leq N, -L \leq m \leq L, j \leq J_{Lm} \} \tag{13}$$

Since $B_P$ is a finite basis, there exists a density $\rho$ such that $\rho \notin P = \mathrm{span}(B_P)$. With this $\rho$, let $\rho_{min}$ be an electron density which yields the minimal error $\mathcal{E}(P, \rho) = \mathcal{E}(\rho_{min}, \rho)$. When $L = 0$, the spherical harmonic is constant and $R_{ij0}(r - \tilde{x}_i) \in \tilde{B}_P[\tilde{\mathbf{x}}]$. For convenience, let us define $\phi[\tilde{x}_i] = R_{ij0}(r - \tilde{x}_i)$.

Since $\mathcal{E}(\rho_{min}, \rho) > 0$, we have $\rho - \rho_{min} \neq 0$, and there exists some $\tilde{x}_i \in \mathbb{R}^3$ such that $\int_{\mathbf{R}^3} (\rho - \rho_{min}) \phi[\tilde{x}_i] \, dV = \Delta \neq 0$ because of the radial symmetry of $\phi[\tilde{x}_i]$. With this $\tilde{x}_i$, and with $\phi[\tilde{x}_i]$ normalized so that $\int_{\mathbf{R}^3} \phi[\tilde{x}_i]^2 \, dV = 1$ (as used in Eq. (16)),

\begin{align}
\mathcal{E}(\rho_{min} + \frac{\Delta}{2}\phi[\tilde{x}_i], \rho) &= \int_{\mathbf{R}^3} (\rho - \rho_{min} - \frac{\Delta}{2}\phi[\tilde{x}_i])^2 \, dV \tag{14} \\
&= \int_{\mathbf{R}^3} (\rho - \rho_{min})^2 + (\frac{\Delta}{2}\phi[\tilde{x}_i])^2 - \Delta \phi[\tilde{x}_i] (\rho - \rho_{min}) \, dV \tag{15} \\
&= \mathcal{E}(\rho_{min}, \rho) + (\frac{\Delta}{2})^2 - \Delta^2 \tag{16} \\
&= \mathcal{E}(\rho_{min}, \rho) - \frac{3}{4}\Delta^2 \tag{17} \\
&< \mathcal{E}(\rho_{min}, \rho) \tag{18}
\end{align}

Because $\rho_{min} + \frac{\Delta}{2}\phi_{ij0} \in \tilde{P}[\tilde{\mathbf{x}}]$ with $\tilde{x}_i \in \tilde{\mathbf{x}}$,

$$\mathcal{E}(\tilde{P}[\tilde{\mathbf{x}}], \rho) \leq \mathcal{E}(\rho_{min} + \frac{\Delta}{2}\phi_{ij0}, \rho) < \mathcal{E}(\rho_{min}, \rho) = \mathcal{E}(P, \rho) \tag{20}$$
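The inequality can be illustrated with a one-dimensional toy analogue, which is a simplification of ours and not the setting of the proposition. Gaussians centered at two fixed "atom" positions form the original basis, one extra Gaussian at a movable center is then added, and the squared $L^2$ error of the best least-squares fit to an off-center target is compared. All centers, widths, and function names are hypothetical choices.

```python
import numpy as np

def gaussian(x, center, width):
    """Unnormalized Gaussian bump used as an s-type radial function in 1D."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def best_fit_error(basis, target, dx):
    """Squared L2 error of the least-squares projection of target onto span(basis)."""
    coeffs, *_ = np.linalg.lstsq(basis, target, rcond=None)
    residual = target - basis @ coeffs
    return np.sum(residual**2) * dx

x = np.linspace(-6.0, 6.0, 2001)
dx = x[1] - x[0]

# Target "density" shifted away from the atom positions, mimicking polarization.
target = gaussian(x, 0.3, 0.4)

atom_centers = [0.0, 1.5]     # fixed points (hypothetical atom positions)
moving_center = 0.35          # one extra movable point (hypothetical value)

fixed_basis = np.stack([gaussian(x, c, 0.5) for c in atom_centers], axis=1)
augmented_basis = np.column_stack([fixed_basis, gaussian(x, moving_center, 0.5)])

err_fixed = best_fit_error(fixed_basis, target, dx)
err_augmented = best_fit_error(augmented_basis, target, dx)

print(f"error with fixed centers only: {err_fixed:.3e}")
print(f"error with one moving center:  {err_augmented:.3e}")
```

Because the augmented span contains the original one, the error cannot increase; it strictly decreases whenever the residual of the original fit overlaps with the added function, which mirrors the role of $\Delta \neq 0$ in the proof.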

A.7 Datasets

The QM9 dataset [17] contains various molecular properties for 134k small molecules. The molecules are composed of C, N, O, H, and F and contain up to 29 atoms. This dataset is frequently used to measure the performance of deep learning models for molecules. It consists of molecular conformations, given as the atomic coordinates of each molecule, and 12 different molecular properties calculated from these conformations, such as internal energy and dipole moment.

The MD17 dataset [65] contains molecular dynamics (MD) trajectories of 10 small molecules. The dataset consists of molecular conformations from the trajectories with the corresponding molecular energy and atomic forces. The task for this dataset is to predict the forces and energy for a given conformation. In this paper, we used the MD17 dataset configuration adopted in TorchMD-NET, which is the base model for comparison.

The n-body system task used in our experiments was suggested in EGNN, which extended the task proposed in [66]. The dataset consists of trajectories of five charged particles, simulated based on the interactions between their charges. The task for this dataset is to predict the positions after 1000 steps from the given initial positions.

A.8 Detailed Implementation of Neural Polarization

We implemented Neural Polarization based on the official GitHub repositories provided for each model. Neural Polarization was implemented in the same way for the three types of baseline networks, except for the projection layer. For EGNN, we used a multilayer perceptron (MLP) with a single hidden layer of dimension 128 on the node feature. In TorchMD-NET, the projection layer was implemented as a single linear layer without bias for the $l=1$ feature of each layer output $\tilde{\mathbf{v}}_{t,i}$. For Equiformer, the projection layer was implemented as a shallow equivariant network based on a tensor product with one $l=1$ feature.
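The sketch below illustrates one property of a bias-free linear layer acting on $l=1$ features: mixing vector channels with scalar weights commutes with rotation, whereas a constant bias vector would not. This is a NumPy illustration of ours, not the TorchMD-NET code; the channel count, shapes, and names are assumptions made for the example.

```python
import numpy as np

def random_rotation(rng):
    """Return a random 3x3 rotation matrix (orthogonal, determinant +1)."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs of the QR factor
    if np.linalg.det(q) < 0:      # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

def project_l1(features, weight):
    """Bias-free linear mixing of l = 1 channels: (C, 3) features -> one 3-vector."""
    return weight @ features

rng = np.random.default_rng(6)
n_channels = 8                                  # hypothetical channel count
features = rng.normal(size=(n_channels, 3))     # hypothetical l = 1 features of one atom
weight = rng.normal(size=n_channels)            # stand-in for learned weights, no bias
R = random_rotation(rng)

out = project_l1(features, weight)
out_rot = project_l1(features @ R.T, weight)    # rotate every channel, then project

equivariant_ok = np.allclose(out_rot, R @ out)

# A constant bias vector does not rotate with the input, so it breaks equivariance.
bias = np.array([0.1, 0.0, 0.0])
biased_ok = np.allclose(out_rot + bias, R @ (out + bias))

print("bias-free projection is equivariant:", equivariant_ok)
print("projection with bias is equivariant:", biased_ok)
```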

To minimize undesired effects on performance arising from hyperparameter optimization, we kept all hyperparameters fixed to the values provided in the official repositories, except for the learning rate and the batch size. We halved the batch size only in cases where out-of-memory (OOM) errors occurred. For other implementation details such as data splitting, optimizer, learning rate scheduling strategy, and objective loss function, we followed the configurations implemented in the official repositories. Experiments were conducted on NVIDIA V100, A40, or A100 GPUs. Each experiment with the QM9 dataset required at most 240 GPU hours, while the other experiments required less than 24 GPU hours. Training times differ across model types, because we followed the early-stopping condition implemented in each original repository.

A.9 Trajectories generated by Neural Polarization for molecules in the QM9 dataset

Figure S1: More trajectories of $\tilde{x}$ trained on $\epsilon_{HOMO}$, based on TorchMD-NET.
Figure S2: More trajectories of $\tilde{x}$ trained on $U_0$, based on TorchMD-NET.
Figure S3: More positions of the movable point $\tilde{x}$ trained on $zpve$, based on TorchMD-NET.

A.10 Visualization of atomic and molecular orbitals

Figure S4: Orbitals of various molecules. The polarization effect is characterized by each molecule type.