Neural Polarization: Toward Electron Density for Molecules by Extending Equivariant Networks

Bumju Kwak
Independent Researcher
Jeonghee Jo*
Korea Institute of Science and Technology (KIST)

Abstract

Recent SO(3)-equivariant models embed a molecule as a set of single atoms fixed in three-dimensional space, which is analogous to a ball-and-stick view. This perspective provides a concise view of atom arrangements; however, it cannot represent the surrounding electron density, and its polarization effects may be underestimated. To overcome this limitation, we propose Neural Polarization, a novel method that extends equivariant networks by embedding each atom as a pair of fixed and moving points. Motivated by density functional theory, Neural Polarization represents a molecule in a space-filling view that includes the electron density, in contrast with a ball-and-stick view. Neural Polarization can be flexibly applied to most types of existing equivariant models. We show that Neural Polarization can improve the prediction performance of existing models over a wide range of targets. Finally, we verify from a mathematical standpoint that our method can improve expressiveness and equivariance.

1 Introduction

For chemical engineering, accurate estimation of molecular conformation together with electronic configuration is essential. Quantum chemistry [1, 2] is a branch of chemistry that studies the quantum mechanics of molecular conformations, based on microscopic analysis of a single atom and its surroundings. The most common approach to quantum mechanical modeling of a molecule is density functional theory (DFT) [3]. In DFT, the main strategy is to treat the many-electron system as a functional of a single function, which corresponds to the electron density of a molecule in three-dimensional space [4, 5].

In DFT, this electron density can be represented by atomic orbital functions, using a basis set consisting of radial basis functions and spherical harmonics in three-dimensional space [6, 7]. Spherical harmonics are a set of angular basis functions indexed by a degree ( $L$ ), an integer representing the angular frequency of orbitals [8, 9]. Specifically, $d$ and $f$ orbitals are also called “polarization functions”, because they can describe a distortion of an electron cloud, known as polarization [10, 11, 12].

These polarization functions are useful in describing molecular properties that involve valence electrons, and consequently affect various types of properties of a molecule [13, 14]. To estimate quantum mechanical properties while accounting for polarization effects, the basis sets for DFT calculations need to include polarization functions of higher degree [15, 16]. QM9 [17], one of the popular datasets for deep learning benchmarks, was also calculated with the 6-31G(2df,p) basis set [18], which contains polarization functions.

Previous SO(3)-equivariant networks, including Equiformer [19], NequIP [20], and others [21, 22, 23, 24, 25, 26], also use radial bases and spherical harmonics to represent molecular conformations. However, this approach may be limited because representing a molecule as a set of point atoms cannot capture the electron density. Therefore, most existing SO(3)-equivariant networks are potentially limited in predicting molecular potential energy and related properties, because they do not express the electron density.

To address these challenges, we propose Neural Polarization, a novel and flexible extension method for SO(3)-equivariant networks motivated by DFT, which allows each equivariant block to explicitly consider the polarization effect of electron density while keeping SO(3)-equivariance. The key idea of Neural Polarization is to introduce an additional “movable point” for each atom, which is defined similarly to the existing atom but can update its location during the training process. These moving points can be viewed as a type of direction indicator describing the polarized electron density, which is closer to a space-filling view of a molecule. We did not impose any additional constraint on the movements of these movable points, expecting that the atomic orbital polarization caused by the electron configuration can be learned for better molecular representation learning.

We applied Neural Polarization to three existing SO(3)-equivariant networks for quantum mechanical property prediction, and trained the extended models from scratch (without pretrained parameters). We verified that Neural Polarization improved the prediction performance over a wide range of targets compared with the original reports, especially for targets related to thermodynamic potentials. We also visualized the trajectories of each movable point in the three trained models, and observed that the shifting patterns of movable points have distinctive characteristics depending on the target objective. The experimental results support our initial assumption that Neural Polarization, which explicitly models the directional surroundings of an atom, induces the latent features to behave more like the electron density in DFT. We also analyzed the pattern of the positions of movable points in comparison with the fundamentals of chemical bonding. Finally, we mathematically verified that equivariant networks equipped with Neural Polarization have a strictly lower approximation error for the same maximum degree of spherical harmonics, and higher model expressiveness, compared to the original networks.

The contributions of this study are summarized below.

1. Motivated by DFT, we developed a novel extension method, Neural Polarization, for SO(3)-equivariant networks by introducing movable points, expecting that their positions can capture the effect of polarization for improved molecular representation.

2. We validated the effectiveness of Neural Polarization through the performance gains in the experimental results. In addition, the trajectories of movable points from the trained models showed that Neural Polarization can adaptively find a better description of polarization, depending on the target objective.

3. We verified that Neural Polarization lowers the approximation error when using the same spherical harmonics as the original network, and improves the model expressiveness.

2 Preliminary of equivariant neural networks

Due to space constraints, this section presents only the most essential preliminaries. The remaining parts, DFT and SO(3)-equivariance, are introduced in Appendix A.4.

By definition, a group-equivariant network consists of group-equivariant layers such that its output transforms equivariantly under specified group operations applied to its input [27]. Meanwhile, a group-invariant network has a group-invariant final layer, and all of its other layers are group-equivariant. To predict an energy-related molecular property $y$ by learning a molecule embedding representation $E(\mathbf{x})$ from atom positions $\mathbf{x}=\{x_{i}\}$ , the network of $T$ layers should be group-invariant: it consists of $T$ group-equivariant layers $M_{0},\ldots,M_{T-1}$ followed by a final group-invariant readout function $R$ , or a pooling layer. That is, $\hat{y}=R\circ{M_{T-1}}\circ{M_{T-2}}\circ\ldots\circ{M_{0}}\circ{E}(\mathbf{x})$ .

Embedding layer. In general, the embedding layer $E$ is located at the front of the network. It learns the feature vectors of individual atoms $\mathbf{v}_{0}=\{v_{i}\}$ from atom positions $\mathbf{x}=\{x_{i}\}$ and atomic numbers $\mathbf{a}=\{a_{i}\}$ .

Equivariant layer. A function $f$ that satisfies $g\circ f(x)=f(g\circ x)$ for any group element $g\in G$ is called a $G$ -equivariant layer. For representing molecular structures in Euclidean space, the rotation group SO(3) or the special Euclidean group SE(3) is generally selected as $G$ . Many equivariant networks [28, 29, 24, 21, 25] use a message-passing function [30] as the framework for their equivariant layers. However, there is no explicit constraint on the choice of architecture. For example, [19, 31] consist of equivariant self-attention layers for molecules. We write $M_{t}$ as taking the given positions $\mathbf{x}$ and the learnable equivariant features $\mathbf{v}_{t}$ ; however, some networks use only $\mathbf{v}_{t}$ to compute $\mathbf{v}_{t+1}$ .
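The equivariance condition $g\circ f(x)=f(g\circ x)$ can be checked numerically. The following is a minimal sketch in NumPy, assuming a toy layer (`toy_equivariant_layer`, a hypothetical name) that sums relative position vectors weighted by a function of pairwise distance; it is not a layer from any of the cited networks.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random proper rotation matrix (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix column signs of the factorization
    if np.linalg.det(q) < 0:         # flip one axis to land in SO(3)
        q[:, 0] = -q[:, 0]
    return q

def toy_equivariant_layer(x):
    """Map positions (N, 3) to one vector per atom (N, 3).

    Each output is a sum of relative position vectors weighted by a
    function of the rotation-invariant pairwise distance.
    """
    diff = x[:, None, :] - x[None, :, :]                  # (N, N, 3)
    dist = np.linalg.norm(diff, axis=-1, keepdims=True)   # (N, N, 1)
    weight = np.exp(-dist ** 2)                           # invariant scalars
    return (weight * diff).sum(axis=1)                    # (N, 3)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))          # arbitrary test positions
R = random_rotation(rng)

lhs = toy_equivariant_layer(x @ R.T)   # f(g . x)
rhs = toy_equivariant_layer(x) @ R.T   # g . f(x)
equivariance_gap = np.abs(lhs - rhs).max()
print(equivariance_gap)                # expected to be at floating-point noise level
```

Because the weights depend only on distances and the vectors are differences of positions, rotating the input rotates the output in the same way, which is the property the definition requires.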

Pooling (Readout) layer. A pooling layer $R$ , also called a readout function, is located at the end of the network and produces the output $\hat{y}$ . In molecular property prediction, this layer merges all equivariant feature vectors and produces a scalar-valued prediction.

Embedding layer:	$\displaystyle\mathbf{v}_{0}=E(\mathbf{x},\mathbf{a})$	(1)
Equivariant layer:	$\displaystyle\mathbf{v}_{t+1}=M_{t}(\mathbf{x},\mathbf{v}_{t})$	(2)
Pooling layer:	$\displaystyle\hat{y}=R(\mathbf{v}_{T})$	(3)
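Equations (1) to (3) can be sketched as a small forward pass. The code below is an illustrative toy under strong assumptions: the features are scalar (degree-0) only, the function bodies are hypothetical stand-ins rather than the layers of any cited model, and the three-atom geometry is made up for the example. It only shows how $E$ , $M_{t}$ and $R$ compose into a rotation-invariant prediction.

```python
import numpy as np

def embedding(x, a, dim=4):
    """E: atomic numbers -> invariant scalar features v_0 of shape (N, dim).

    Positions x are accepted to mirror Eq. (1) but are unused in this toy.
    """
    freqs = np.arange(1, dim + 1)
    return np.sin(np.outer(a, freqs))

def equivariant_layer(x, v):
    """M_t: update features from pairwise distances, which rotations preserve."""
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)   # (N, N)
    messages = np.exp(-dist) @ v                                    # (N, dim)
    return np.tanh(v + messages)

def pooling(v):
    """R: sum over atoms and channels to obtain a scalar prediction."""
    return float(v.sum())

def predict(x, a, num_layers=3):
    v = embedding(x, a)                  # Eq. (1)
    for _ in range(num_layers):
        v = equivariant_layer(x, v)      # Eq. (2)
    return pooling(v)                    # Eq. (3)

# Hypothetical three-atom geometry and atomic numbers, for illustration only.
x = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
a = np.array([8.0, 1.0, 1.0])

theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])

y = predict(x, a)
y_rot = predict(x @ Rz.T, a)   # same molecule, rotated about the z axis
print(y, y_rot)
```

The two printed values should agree up to floating-point error, since every operation depends on positions only through interatomic distances.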

3 Methods

3.1 Motivation of Neural Polarization by DFT

The role of electron density in DFT calculation

A molecular conformation is represented by the set of atomic numbers $\mathbf{a}$ and positions $\mathbf{x}$ of its $N$ constituent atoms. A molecular property $\hat{y}$ can be predicted from the conformation $X=\{x_{i},a_{i}\}_{i=1,...,N}=\{\mathbf{x},\mathbf{a}\}$ . DFT has been the most widely used method for this problem, providing more accurate and reliable predictions than recent deep learning-based approaches [32, 33]. We hypothesized that the prediction performance of deep neural networks would benefit from the fundamental concepts of DFT. In particular, we aim to propose a methodology for improving the prediction performance of existing equivariant networks based on the fundamentals of DFT.

As mentioned in Appendix A.3, a DFT calculation proceeds in two sequential steps [34]. The first step calculates the electron density $\rho$ , a function defined on any vector $\vec{r}\in\mathbb{R}^{3}$ , from the given $X$ . The second step predicts the molecular property $\hat{y}$ from the calculated $\rho$ using a well-defined functional. The first step $X\rightarrow\rho$ requires solving the computationally intensive Kohn-Sham equation [35], whereas the molecular energy (or $\hat{y}$ ) can be easily calculated from the electron density $\rho$ using pre-defined functionals in the second step.

Construction of a link between electron density and feature space of equivariant networks

Based on these principles, we assumed that if the latent feature $\chi=\{\mathbf{x},\mathbf{v}\}$ of an equivariant network is equivalent to $\rho$ , or if $\chi$ can express all information contained in $\rho$ , the network would give more accurate and reliable predictions, close to those of DFT. More concretely, our research objective is to develop a novel equivariant latent feature $\chi$ with atom positions $\mathbf{x}$ and $\mathbf{v}$ for SO(3)-equivariant networks, such that each layer can approximate the electron density $\rho$ of a given molecule in a DFT calculation. To be precise, we aim to train a mapping $\xi$ and an appropriate $\chi$ that satisfy $\xi(\chi)=\rho$ .

In DFT calculations, the electron density $\rho$ is expressed as a linear combination of a finite set of basis functions. Analogously, the latent feature in equivariant networks resides in a finite-dimensional vector space. In addition, in DFT, finding the optimal representation for $\rho$ using basis sets is analogous to finding $\xi$ under the constraints of linearity and invertibility. Based on this connection between finding optimal DFT basis sets and constructing an expressive latent space for SO(3)-equivariant networks for molecule property prediction, we aim to introduce the methodology based on selecting the DFT basis set, in order to improve the performance of the neural network.

Among the factors considered when choosing a basis set, we focused on “polarization”, which is one of the significant characteristics of a molecule. Polarization refers to a distortion of the electron cloud toward a specific direction $\tilde{x}$ , depending on the electron configuration around an atomic nucleus. The shape of the polarization is determined by complex interatomic interactions, and has a direct effect on various properties. To incorporate polarization effects, DFT generally uses high-degree polarization functions. Analogously, if we learn a basis function for the equivariant features $\tilde{v}_{i}$ of a movable point $\tilde{x}_{i}$ near the original atom $x_{i}$ , the latent space can effectively learn polarization functions. Therefore, if we can extend an equivariant network to incorporate a pair ( $\tilde{x}_{i}$ , $\tilde{v}_{i}$ ) in feature space as a hint of the polarization effect, the network would be more powerful and expressive in representing atom surroundings and show better prediction performance. Figure 1 shows a schematic diagram comparing these concepts.

3.2 Neural Polarization

We give a high-level description of Neural Polarization, because the internal structure of each module depends on the original baseline network. Neural Polarization is an extension methodology for SO(3)-equivariant networks that adds a movable point $\tilde{\mathbf{x}}$ for each atom with its corresponding equivariant feature vector $\tilde{\mathbf{v}}$ , and $T$ equivariant layers (or blocks) $\tilde{M}_{t}$ that are extended to take $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{v}}$ as inputs.

The first step is to initialize the movable points with positions $\tilde{\mathbf{x}}_{0}$ and types $\tilde{\mathbf{a}}$ , and to embed them using $E$ to create $\tilde{\mathbf{v}}_{0}$ . The initial position $\tilde{\mathbf{x}}_{0}$ is the same as the position $\mathbf{x}$ of the original atoms.

	$\displaystyle\tilde{\mathbf{x}}_{0}=\mathbf{x}$		(4)
	$\displaystyle\tilde{\mathbf{v}}_{0}=E(\tilde{\mathbf{x}}_{0},\tilde{\mathbf{a}})$		(5)

Second, the network updates $\tilde{\mathbf{v}}_{t}$ with $\tilde{M}_{t}$ . In contrast to the original $M_{t}(\mathbf{x},\mathbf{v}_{t})$ , $\tilde{M}_{t}$ is defined on the extended inputs $([\mathbf{x};\tilde{\mathbf{x}}_{t}],[\mathbf{v}_{t};\tilde{\mathbf{v}}_{t}])$ . Our equivariant $\tilde{M}_{t}$ computes $(\mathbf{v}_{t+1},\tilde{\mathbf{v}}_{t+1})$ from $(\mathbf{x},\tilde{\mathbf{x}}_{t},\mathbf{v}_{t},\tilde{\mathbf{v}}_{t})$ using ${M}_{t}$ .

	$\displaystyle\tilde{M}_{t}:[\mathbf{v}_{t+1};\tilde{\mathbf{v}}_{t+1}]=M_{t}([\mathbf{x};\tilde{\mathbf{x}}_{t}],[\mathbf{v}_{t};\tilde{\mathbf{v}}_{t}])$		(6)

$\tilde{\mathbf{x}}_{t+1}$ is produced by an additional projection layer $\pi_{t}:\mathbb{V}^{k\times N}\rightarrow\mathbb{R}^{3\times N}$ applied to the equivariant feature $\tilde{\mathbf{v}}_{t}$ of $\tilde{M}_{t}$ . We constructed $\pi_{t}$ as a sequential block of an equivariant layer and a linear layer; however, there is no constraint on the composition of the $\pi$ block. Note that $\tilde{M}_{t}$ does not modify the original atom positions $\mathbf{x}$ , following the baseline networks.

	$\displaystyle\Delta\tilde{\mathbf{x}}_{t}=\pi_{t}(\tilde{\mathbf{v}}_{t})$		(7)
	$\displaystyle\tilde{\mathbf{x}}_{t+1}=\tilde{\mathbf{x}}_{t}+\Delta\tilde{\mathbf{x}}_{t}$		(8)

The overview and pseudocode of Neural Polarization are given in Figure 2 and Algorithm 2, respectively, in comparison with the original framework. In broad terms, optimizing $M_{t}$ and $\{x,\tilde{x},v,\tilde{v}\}$ may correspond to finding the optimal $\xi$ and $\chi$ , respectively.
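The update rules in Eqs. (4) to (8) can be illustrated with a toy forward pass. This is a sketch under stated assumptions, not the authors' implementation: features are scalar only, `equivariant_layer` is a hypothetical stand-in for $M_{t}$ , and `projection` replaces $\pi_{t}$ with an EGNN-style weighted sum of relative position vectors so that the displacement rotates with the molecule. The type labels `a_mov` for the movable points and the three-atom geometry are also made up for the example.

```python
import numpy as np

def embedding(a, dim=4):
    """Toy E: type labels -> scalar features of shape (N, dim)."""
    return np.sin(np.outer(a, np.arange(1, dim + 1)))

def equivariant_layer(points, feats):
    """Stand-in for M_t applied to the concatenated inputs [x; x~], [v; v~]."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.tanh(feats + np.exp(-dist) @ feats)

def projection(points, feats_mov, x_mov):
    """Stand-in for pi_t: a rotation-equivariant displacement per movable point."""
    diff = x_mov[:, None, :] - points[None, :, :]            # (N, 2N, 3)
    dist = np.linalg.norm(diff, axis=-1, keepdims=True)      # (N, 2N, 1)
    gate = np.tanh(feats_mov.mean(axis=1))[:, None, None]    # invariant scalar
    return 0.05 * (gate * np.exp(-dist) * diff).sum(axis=1)  # (N, 3)

def predict_with_np(x, a, a_mov, num_layers=3):
    n = len(x)
    x_mov = x.copy()                                   # Eq. (4)
    v, v_mov = embedding(a), embedding(a_mov)          # Eq. (5)
    for _ in range(num_layers):
        points = np.concatenate([x, x_mov])            # [x; x~_t]
        feats = equivariant_layer(points, np.concatenate([v, v_mov]))  # Eq. (6)
        v, v_mov = feats[:n], feats[n:]
        # Eqs. (7)-(8); the updated features are used here, following Algorithm 2.
        x_mov = x_mov + projection(points, v_mov, x_mov)
    return float(v_mov.sum()), x_mov    # pooling over v~_T, as in Algorithm 2

x = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
a = np.array([8.0, 1.0, 1.0])
a_mov = a + 0.5    # hypothetical type labels for the movable points

theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])

y, x_mov = predict_with_np(x, a, a_mov)
y_rot, x_mov_rot = predict_with_np(x @ Rz.T, a, a_mov)
print(y, y_rot)
```

In this toy, the original positions `x` are never modified, the movable points drift away from their atoms layer by layer, the prediction is unchanged under rotation, and the final movable points rotate together with the input.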

3.3 Mathematical interpretation of Neural Polarization

We have introduced the process of training movable points and their equivariant features $\{\tilde{\mathbf{x}},\tilde{\mathbf{v}}\}$ in networks with Neural Polarization. To investigate the advantage of $\{\tilde{\mathbf{x}},\tilde{\mathbf{v}}\}$ in approximating $\rho$ , we also conducted a theoretical analysis of these terms. In particular, we discuss the approximation capability of Neural Polarization for the electron density $\rho$ , for which we introduce the following definition.

Definition 1

We define the error between an electron density $\rho$ and an approximation $\hat{\rho}$ as $\mathcal{E}(\rho,\hat{\rho})=\int_{\mathbb{R}^{3}}{|\rho-\hat{\rho}|^{2}\,dV}$ , and the error between a set $P$ and $\rho$ as $\mathcal{E}(P,\rho)=\underset{\hat{\rho}\in P}{\mathrm{min}}\>\mathcal{E}(\rho,\hat{\rho})$ .

The error $\mathcal{E}(P,\rho)$ in Definition 1 can be regarded as a measure of how well $P$ can approximate $\rho$ . Let $P$ and $\tilde{P}[\tilde{\mathbf{x}}]$ denote the latent features of electron density in the original network and in the network with Neural Polarization, respectively. Then the following holds, by placing $\tilde{\mathbf{x}}$ where the error $\mathcal{E}(P,\rho)$ occurs. The detailed definition and proof are provided in Appendix A.5.

Proposition 1

For any electron density $\rho$ , there exists $\tilde{\mathbf{x}}$ that satisfies $\mathcal{E}(P,\rho)>\mathcal{E}(\tilde{P}[\tilde{\mathbf{x}}],\rho)$ .

Proposition 1 shows that Neural Polarization can achieve better approximation than the original network for arbitrary electron densities. Because the electron density itself is a type of function, this proposition supports that Neural Polarization can obtain node features $\mathbf{v}$ that are closer to the real electron density compared to the original network. Therefore, we have demonstrated that Neural Polarization provides better approximation to the electron density within neural networks.
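The intuition behind Definition 1 and Proposition 1 can be illustrated numerically. The sketch below is a one-dimensional toy, not the construction used in the proof: it assumes a made-up "polarized" density, a small Gaussian basis centered on the atom for $P$ , and the same basis with one extra center (a hypothetical movable point at 0.4) for $\tilde{P}[\tilde{\mathbf{x}}]$ . The error is the least-squares residual, integrated on a grid.

```python
import numpy as np

def gaussian_basis(r, centers, widths):
    """Columns are Gaussians exp(-(r - c)^2 / (2 w^2)) for every (center, width)."""
    cols = [np.exp(-(r - c) ** 2 / (2.0 * w ** 2)) for c in centers for w in widths]
    return np.stack(cols, axis=1)

def projection_error(basis, rho, dr):
    """Minimum over coefficients of the integral of |rho - rho_hat|^2."""
    coef, *_ = np.linalg.lstsq(basis, rho, rcond=None)
    residual = rho - basis @ coef
    return float(np.sum(residual ** 2) * dr)

r = np.linspace(-5.0, 5.0, 2001)
dr = r[1] - r[0]

# Hypothetical polarized density: a symmetric lobe plus a lobe shifted to one side.
rho = np.exp(-r ** 2) + 0.3 * np.exp(-(r - 0.6) ** 2 / 0.5)

widths = [0.5, 1.0]
# P: basis functions centered only on the atom at r = 0.
err_base = projection_error(gaussian_basis(r, [0.0], widths), rho, dr)
# P~: the same basis plus functions centered on a movable point at r = 0.4.
err_np = projection_error(gaussian_basis(r, [0.0, 0.4], widths), rho, dr)
print(err_base, err_np)
```

Since the atom-centered basis contains only even functions, it cannot capture the one-sided distortion, whereas the enlarged basis spans a superset of the original space and can only lower the residual. Where the extra center is best placed depends on the density, which is the role of the learned $\tilde{\mathbf{x}}$ .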

4 Experiment

To confirm the effect of Neural Polarization on general SO(3)-equivariant models, we selected three equivariant models with different architectures (EGNN [36], Equiformer [19], TorchMD-NET [31]) as the baseline networks. We performed experiments on QM9 [17] and MD17 [37], which are among the most commonly used datasets for molecular property prediction. Lastly, to investigate whether Neural Polarization can be effective on non-molecular tasks involving particle movements, we conducted experiments on the n-body system task proposed in EGNN. Details of the implementations are presented in Appendix A.8.

Figure 1: Conceptual overview of our research compared with other methodologies. In DFT, there exists a one-to-one correspondence between $\rho(\vec{r})$ and the molecular conformation $\{\mathbf{x},\mathbf{a}\}$ , each of which fully determines the other properties of the molecule. With Neural Polarization, the baseline SO(3)-equivariant networks can provide a richer representation of the electron density $\rho(\vec{r})$ .

Table 1: Mean absolute error on QM9.

Target	Unit	EGNN		TorchMD-NET		Equiformer		avg. $\Delta$ %
		w/o NP	w/ NP	w/o NP	w/ NP	w/o NP	w/ NP
$\mu$	D	0.029	0.03	0.011	0.014	0.011	0.010	+4.92%
$\alpha$	${a_{0}}^{3}$	0.071	0.071	0.059	0.0447	0.046	0.0527	-3.31%
$\epsilon_{\mathrm{HOMO}}$	meV	29	29.9	20.3	18.4	16.5	16.7	-2.04%
$\epsilon_{\mathrm{LUMO}}$	meV	25	23.4	17.5	17.8	14.3	14.0	-2.43%
$\Delta_{\epsilon}$	meV	48	47.9	36.1	41.9	30	33.7	+8.20%
< $R^{2}$ >	${a_{0}}^{2}$	0.106	0.089	0.033	0.085	0.251	0.162	-4.29%
$zpve$	meV	1.55	1.50	1.84	1.22	1.26	2.15	-4.25%
$U_{0}$	meV	11	9.52	6.15	5.5	6.59	5.54	-15.44%
$U$	meV	12	10.33	6.38	5.4	6.74	5.49	-19.03%
$H$	meV	12	9.52	6.16	5.6	6.63	6.27	-13.93%
$G$	meV	12	11.5	7.62	6.6	7.63	7.01	-9.55%
$C_{v}$	cal/mol K	0.031	0.032	0.026	0.023	0.023	0.023	-3.31%
avg. $\Delta$ %			-6.84%		-5.29%		-4.76%	-5.63%

NP: Neural Polarization.

Table 2: Mean absolute errors (MAE) of the energy and force prediction on MD17 (Units: kcal/mol for energy, kcal/mol/Å for forces).

Molecule	Target	TorchMD-NET		avg. $\Delta$ %
		w/o NP	w/ NP
Aspirin	Energy	0.123	0.126	2.38%
	Forces	0.253	0.224	-12.95%
Benzene	Energy	0.058	0.05424	-6.93%
	Forces	0.196	0.1174	-10.48%
Ethanol	Energy	0.052	0.0524	0.76%
	Forces	0.109	0.0878	-24.15%
Malonaldehyde	Energy	0.077	0.0794	3.02%
	Forces	0.169	0.146	-15.75%
Naphthalene	Energy	0.085	0.081	-4.94%
	Forces	0.061	0.1594	61.73%
Salicylic acid	Energy	0.093	0.08086	-15.01%
	Forces	0.129	0.1262	-2.22%
Toluene	Energy	0.074	0.058	-27.59%
	Forces	0.067	0.057	-17.54%
Uracil	Energy	0.095	0.0857	-10.85%
	Forces	0.095	0.0857	-10.85%
Average	Energy			-7.39%
	Forces			-4.03%*

*The performance gain is -13.42% when Naphthalene is excluded.

Algorithm 1 SO(3)-equivariant network without Neural Polarization
Given $\mathbf{x}\in\mathbb{R}^{3}$ , $\mathbf{a}\in\mathbb{R}$ , $\mathbf{v}\in\mathbb{V}$ and a layer index $t=0,1,...,(T-1)$ .
$\mathbf{v}_{0}$ $\longleftarrow$ Embedding( $\mathbf{x}$ , $\mathbf{a}$ )
for $t=0,1,...,(T-1)$ do
    $\mathbf{v}_{t+1}$ $\longleftarrow$ EquivariantLayer( $\mathbf{v}_{t}$ )
end for
$\hat{y}$ $\longleftarrow$ Pooling( $\mathbf{v}_{T}$ )
return $\hat{y}$

Algorithm 2 SO(3)-equivariant network with Neural Polarization (proposed)
Given $\mathbf{x},\tilde{\mathbf{x}}\in\mathbb{R}^{3}$ , $\mathbf{a},\tilde{\mathbf{a}}\in\mathbb{R}$ , $\mathbf{v},\tilde{\mathbf{v}}\in\mathbb{V}$ and a layer index $t=0,1,...,(T-1)$ .
$\mathbf{v}_{0},\tilde{\mathbf{v}}_{0}$ $\longleftarrow$ Embedding( $\mathbf{x}$ , $\mathbf{a},\tilde{\mathbf{a}}$ )
for $t=0,1,...,(T-1)$ do
    $\mathbf{v}_{t+1},\tilde{\mathbf{v}}_{t+1}$ $\longleftarrow$ EquivariantLayer([ $\mathbf{x}_{t},\tilde{\mathbf{x}}_{t}$ ], [ $\mathbf{v}_{t},\tilde{\mathbf{v}}_{t}$ ])
    $\Delta\tilde{\mathbf{x}}_{t}$ $\longleftarrow$ $\text{Proj}_{t}$ ( $\tilde{\mathbf{v}}_{t+1}$ )
    $\tilde{\mathbf{x}}_{t+1}$ $\longleftarrow$ $\tilde{\mathbf{x}}_{t}$ + $\Delta\tilde{\mathbf{x}}_{t}$
end for
$\hat{y}$ $\longleftarrow$ Pooling( $\tilde{\mathbf{v}}_{T}$ )
return $\hat{y}$

Table 3: Ablation results trained with Equiformer on three types of QM9 targets: $\mu$ , $\epsilon_{HOMO}$ and $U_{0}$ . $\{\mathbf{x},\mathbf{x}\}$ in the second row is the ablation study for comparing the effect of $\tilde{\mathbf{x}}$ , keeping the same computational cost and weight parameters. Units are the same as in Table 1.

Method	$\mu$	$\epsilon_{HOMO}$	$U_{0}$
Equiformer	0.0118	16.5	6.59
Equiformer + $\{\mathbf{x},\mathbf{x}\}$	0.0159	17.7	8.80
Equiformer + NP	0.0109	16.7	5.54

Table 4: Mean Squared Error (MSE) for the future position estimation in n-body task (the prediction of particles’ movement), proposed in [36]. The results of baseline EGNN are retrieved from the original paper.

Method	MSE
EGNN	0.0071
EGNN + NP	0.0051

5 Result

5.1 The performance gains on QM9 and MD17 dataset

The experimental results on the QM9 dataset are presented in Table 1 for all 12 targets, separated by whether Neural Polarization was applied (marked as ‘w/ NP’) or not for each baseline network. Most of the baseline results (the ‘w/o NP’ columns) were reproduced by training the source code provided on each model's official page from scratch. A few were taken from the original papers in cases where the source code had computation or compatibility issues in our environment.

For EGNN, the error with Neural Polarization was reduced or unchanged on 9 targets (unchanged for $\alpha$ ), with an average error change rate of -6.84% over all 12 targets. For TorchMD-NET, the error was reduced on 7 targets with Neural Polarization, with an average error change rate of -5.29% over all 12 targets. For Equiformer, the error was reduced or unchanged on 8 targets with Neural Polarization (unchanged for $C_{v}$ ), with an average error change rate of -4.76% over all targets.

Interestingly, for the thermodynamic properties $U_{0},U,H,G$ , the error was substantially reduced regardless of the baseline network type. For these four targets, the average performance gain (the average of the three error change rates, one per baseline) ranges from about -10% to -20%, whereas the other six targets with an average error reduction (all except $\mu$ and $\Delta_{\epsilon}$ ) show gains in the range of -2% to -5%. Comparing the average error change rate of each baseline model, there was no large difference between the three models: EGNN showed the best performance gain of -6.84%, followed by TorchMD-NET (-5.29%) and Equiformer (-4.76%).

Next, we analyze the MD17 dataset, which consists of eight molecules, using TorchMD-NET as the baseline network. Each molecule has two types of targets, energy and forces. These results are presented in Table 2. For energy prediction, the error was reduced in five molecules when training the network with Neural Polarization, and the average error change rate over the energies of all eight molecules is -7.39%. For force prediction, the error was reduced in seven of the eight molecules with Neural Polarization, the exception being naphthalene. For the force prediction of naphthalene, the error increased by 61.73%, although the error decreased for its energy prediction. The reason for this contrasting result is not clear. In summary, the average error change rates over the eight molecules were -7.39% for energy predictions and -4.03% for force predictions. Excluding naphthalene, the average error change rate of the force predictions was -13.42%.
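As a worked check of the error change rates, the sketch below recomputes a few entries of Table 2. The normalization used here, dividing by the error obtained with Neural Polarization, is an assumption inferred from the reported numbers rather than a formula stated in the text; the helper name `error_change_rate` is hypothetical.

```python
def error_change_rate(mae_without_np, mae_with_np):
    """Percent change, normalized by the error obtained with Neural Polarization."""
    return 100.0 * (mae_with_np - mae_without_np) / mae_with_np

# Aspirin rows of Table 2 (w/o NP, w/ NP).
aspirin_energy = error_change_rate(0.123, 0.126)
aspirin_forces = error_change_rate(0.253, 0.224)

# Per-molecule force rates as listed in Table 2 (percent).
force_rates = {
    "Aspirin": -12.95, "Benzene": -10.48, "Ethanol": -24.15,
    "Malonaldehyde": -15.75, "Naphthalene": 61.73,
    "Salicylic acid": -2.22, "Toluene": -17.54, "Uracil": -10.85,
}
avg_all = sum(force_rates.values()) / len(force_rates)
rest = [rate for name, rate in force_rates.items() if name != "Naphthalene"]
avg_without_naphthalene = sum(rest) / len(rest)

print(round(aspirin_energy, 2), round(aspirin_forces, 2))
print(round(avg_all, 2), round(avg_without_naphthalene, 2))
```

Under this assumed normalization, the Aspirin entries and both force averages (with and without naphthalene) match the values reported in Table 2 to two decimal places.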

5.2 Investigation of the polarization trajectory

To validate our assumption that updating $\tilde{\mathbf{x}}$ and its equivariant feature $\tilde{\mathbf{v}}$ can facilitate exploiting molecule’s electron density for molecular property prediction, we analyzed the final position of $\tilde{\mathbf{x}}$ extracted from the trained models. We selected targets of various types, and tracked the trajectory of every $\tilde{\mathbf{x}}$ during the training process.

We observed that the factor that most strongly determines the movement of $\tilde{x}$ is the target objective, rather than the atom type or bond type. $\tilde{x}$ from the model trained on $\epsilon_{homo}$ whereas in the trained model with Neural Polarization on $zpve$ , $\tilde{x}$ tends to move outward, away from the molecule's center of mass. One possible explanation for these characteristic patterns is that Neural Polarization was adaptively trained to optimize $\tilde{x}$ depending on the training target type, rather than just increasing the number of parameters for the original atoms.

Another notable point is that each $\tilde{x}$ did not deviate by more than half of the bond length from the atom $x$ it belongs to. Although further in-depth analyses would be needed to explain the movement of $\tilde{x}$ , this trend may be evidence that $\tilde{x}$ plays a role in supporting the original ${x}$ while capturing the characteristics of molecular properties.

5.3 Ablation study

To test that the performance improvement is not simply caused by an increase in the number of variables and parameters, we trained Neural Polarization on a pair of non-movable points $\{\mathbf{x},\mathbf{x}\}$ , which simply replicates the inputs. We conducted this ablation study on $\epsilon_{HOMO}$ , $\mu$ , and $U_{0}$ in QM9. As shown in Table 3, using $\{\mathbf{x},\mathbf{x}\}$ rather than $\{\mathbf{x},\tilde{\mathbf{x}}\}$ increases the prediction error, and improvements appeared only with the full Neural Polarization. Based on this, we found that movable points play a significant role in improving the performance of property prediction tasks.

5.4 Neural Polarization in other domain

We hypothesized that Proposition 1 holds not only for the electron density of molecules but also for general 3-D density functions. To examine this assumption, we conducted experiments on the n-body task proposed in [36]. As shown in Table 4, Neural Polarization also improved performance on the n-body task, so its benefit is not limited to molecular tasks. This observation suggests that Neural Polarization may generalize beyond molecular tasks.

6 Discussion

To analyze the effect of Neural Polarization on molecular property prediction tasks, we examined the final positions of $\tilde{x}$ together with the directions of the covalent bonds in the molecule. In chemistry, a single bond is formed by the sharing of an electron pair between two atoms, while a double bond arises from the sharing of two pairs of electrons, leading to electron densities aligned parallel to the bond axis. In the case of aromatic rings, the delocalized electron densities form planar regions above and below the ring plane. In accord with these fundamentals, as shown in Figure 3 (right), the trajectories from atoms with single bonds lie near the bonds, while the trajectories from aromatic rings lie in the same plane as the ring, as shown in Figure 3 (left). In addition, we observed that most of the final locations $\tilde{\mathbf{x}}_{T}$ (small red dots) are displaced from the original atom location $\mathbf{x}_{0}$ , but by no more than half the bond length, which is related to the inherent property of a covalent bond. From these observations, we assume that Neural Polarization recognizes various molecular properties, including the characteristics of bonding types, and captures the characteristics of the polarization effect of each molecule.

7 Conclusion

We proposed Neural Polarization, which enables the intermediate state of an equivariant network to better represent the electron density corresponding to the intermediate state in DFT calculations. Along with its flexible applicability, Neural Polarization demonstrated performance improvements across diverse tasks and models, showing that it can be trained toward the polarization characteristics of electron density in accord with our assumption. Based on these results, we expect Neural Polarization to bring improvements to general molecular problems. In future work, we will explore further methodologies inspired by aspects of DFT beyond polarization.

Limitations

While Neural Polarization does not change the computational complexity of the original model, it introduces an additional computational cost. Meanwhile, for deeper insights from trajectories, it would be beneficial to validate the approach on molecular datasets that include electron density information.

Broader impacts

Our study can lead to advanced research on bridging the gap between quantum chemistry and deep learning. In addition, the development of Neural Polarization involves concepts from DFT, equivariant neural networks, and molecular modeling. This interdisciplinary approach could foster collaborations between researchers from different fields, such as physics, chemistry, and machine learning.

References

[1] I.N. Levine. Quantum chemistry. Pearson advanced chemistry series, 2014.
[2] Albert P Bartók, Mike C Payne, Risi Kondor, and Gábor Csányi. Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons. Physical Review Letters, 104(13):136403, 2010.
[3] Narbe Mardirossian and Martin Head-Gordon. Thirty years of density functional theory in computational chemistry: an overview and extensive assessment of 200 density functionals. Molecular Physics, 115(19):2315–2372, 2017.
[4] Weitao Yang. Direct calculation of electron density in density-functional theory. Physical Review Letters, 66(11):1438, 1991.
[5] Eugene S Kryachko and Eduardo V Ludeña. Energy density functional theory of many-electron systems, volume 4. Springer Science & Business Media, 2012.
[6] J Andzelm and E Wimmer. Density functional gaussian-type-orbital approach to molecular geometries, vibrations, and reaction energies. The Journal of Chemical Physics, 96(2):1280–1303, 1992.
[7] Jens Jørgen Mortensen, Lars Bruno Hansen, and Karsten Wedel Jacobsen. Real-space grid implementation of the projector augmented wave method. Physical Review B, 71(3):035109, 2005.
[8] Richard J Morris, Rafael J Najmanovich, Abdullah Kahraman, and Janet M Thornton. Real spherical harmonic expansion coefficients as 3d shape descriptors for protein binding pocket and ligand comparisons. Bioinformatics, 21(10):2347–2355, 2005.
[9] Jeanne L McHale. Molecular spectroscopy. CRC Press, 2017.
[10] Praveen C Hariharan and John A Pople. The influence of polarization functions on molecular orbital hydrogenation energies. Theoretica chimica acta, 28:213–222, 1973.
[11] RHWJ Ditchfield, Warren J Hehre, and John A Pople. Self-consistent molecular-orbital methods. ix. an extended gaussian-type basis for molecular-orbital studies of organic molecules. The Journal of Chemical Physics, 54(2):724–728, 1971.
[12] Vitaly A Rassolov, Mark A Ratner, John A Pople, Paul C Redfern, and Larry A Curtiss. 6-31g* basis set for third-row atoms. Journal of Computational Chemistry, 22(9):976–984, 2001.
[13] Trygve Helgaker, Sonia Coriani, Poul Jørgensen, Kasper Kristensen, Jeppe Olsen, and Kenneth Ruud. Recent advances in wave function-based methods of molecular-property calculations. Chemical reviews, 112(1):543–631, 2012.
[14] Mati Karelson, Victor S Lobanov, and Alan R Katritzky. Quantum-chemical descriptors in qsar/qspr studies. Chemical reviews, 96(3):1027–1044, 1996.
[15] Frank Jensen. Polarization consistent basis sets: Principles. The Journal of Chemical Physics, 115(20):9113–9125, 2001.
[16] Daniel Sánchez-Portal, Pablo Ordejon, Emilio Artacho, and Jose M Soler. Density-functional method for very large systems with lcao basis sets. International journal of quantum chemistry, 65(5):453–461, 1997.
[17] Raghunathan Ramakrishnan, Pavlo O Dral, Matthias Rupp, and O Anatole von Lilienfeld. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 1, 2014.
[18] Ernest R Davidson and David Feller. Basis set selection for molecular calculations. Chemical Reviews, 86(4):681–696, 1986.
[19] Yi-Lun Liao and Tess Smidt. Equiformer: Equivariant graph attention transformer for 3d atomistic graphs. In The Eleventh International Conference on Learning Representations, 2022.
[20] Simon Batzner, Albert Musaelian, Lixin Sun, Mario Geiger, Jonathan P Mailoa, Mordechai Kornbluth, Nicola Molinari, Tess E Smidt, and Boris Kozinsky. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature communications, 13(1):1–11, 2022.
[21] Brandon Anderson, Truong-Son Hy, and Risi Kondor. Cormorant: Covariant molecular neural networks. arXiv preprint arXiv:1906.04015, 2019.
[22] Fabian B Fuchs, Daniel E Worrall, Volker Fischer, and Max Welling. SE(3)-transformers: 3d roto-translation equivariant attention networks. arXiv preprint arXiv:2006.10503, 2020.
[23] Johannes Brandstetter, Rob Hesselink, Elise van der Pol, Erik J Bekkers, and Max Welling. Geometric and physical quantities improve E(3) equivariant message passing. In International Conference on Learning Representations, 2021.
[24] Oliver T Unke, Stefan Chmiela, Michael Gastegger, Kristof T Schütt, Huziel E Sauceda, and Klaus-Robert Müller. Spookynet: Learning force fields with electronic degrees of freedom and nonlocal effects. Nature communications, 12(1):7273, 2021.
[25] Ilyes Batatia, David P Kovacs, Gregor Simm, Christoph Ortner, and Gábor Csányi. Mace: Higher order equivariant message passing neural networks for fast and accurate force fields. Advances in Neural Information Processing Systems, 35:11423–11436, 2022.
[26] Thorben Frank, Oliver Unke, and Klaus-Robert Müller. So3krates: Equivariant attention for interactions on arbitrary length-scales in molecular systems. Advances in Neural Information Processing Systems, 35:29400–29413, 2022.
[27] William Raymond Scott. Group theory. Courier Corporation, 2012.
[28] Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley. Tensor field networks: Rotation-and translation-equivariant neural networks for 3d point clouds. arXiv preprint arXiv:1802.08219, 2018.
[29] Kristof Schütt, Oliver Unke, and Michael Gastegger. Equivariant message passing for the prediction of tensorial properties and molecular spectra. In International Conference on Machine Learning, pages 9377–9388. PMLR, 2021.
[30] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In International Conference on Machine Learning, pages 1263–1272. PMLR, 2017.
[31] Philipp Thölke and Gianni De Fabritiis. Torchmd-net: Equivariant transformers for neural network based molecular potentials. arXiv preprint arXiv:2202.02541, 2022.
[32] Bhupalee Kalita, Li Li, Ryan J McCarty, and Kieron Burke. Learning to approximate density functionals. Accounts of Chemical Research, 54(4):818–826, 2021.
[33] Gabriel R Schleder, Antonio CM Padilha, Carlos Mera Acosta, Marcio Costa, and Adalberto Fazzio. From dft to machine learning: recent approaches to materials science–a review. Journal of Physics: Materials, 2(3):032001, 2019.
[34] Eberhard Engel. Density functional theory. Springer, 2011.
[35] Walter Kohn and Lu Jeu Sham. Self-consistent equations including exchange and correlation effects. Physical review, 140(4A):A1133, 1965.
[36] Víctor Garcia Satorras, Emiel Hoogeboom, and Max Welling. E(n) equivariant graph neural networks. In International conference on machine learning, pages 9323–9332. PMLR, 2021.
[37] Stefan Chmiela, Huziel E Sauceda, Igor Poltavsky, Klaus-Robert Müller, and Alexandre Tkatchenko. sgdml: Constructing accurate and data efficient molecular force fields using machine learning. Computer Physics Communications, 240:38–45, 2019.
[38] Giuseppe M. J. Barca, Colleen Bertoni, Laura Carrington, Dipayan Datta, Nuwan De Silva, J. Emiliano Deustua, Dmitri G. Fedorov, Jeffrey R. Gour, Anastasia O. Gunina, Emilie Guidez, Taylor Harville, Stephan Irle, Joe Ivanic, Karol Kowalski, Sarom S. Leang, Hui Li, Wei Li, Jesse J. Lutz, Ilias Magoulas, Joani Mato, Vladimir Mironov, Hiroya Nakata, Buu Q. Pham, Piotr Piecuch, David Poole, Spencer R. Pruitt, Alistair P. Rendell, Luke B. Roskop, Klaus Ruedenberg, Tosaporn Sattasathuchana, Michael W. Schmidt, Jun Shen, Lyudmila Slipchenko, Masha Sosonkina, Vaibhav Sundriyal, Ananta Tiwari, Jorge L. Galvez Vallejo, Bryce Westheimer, Marta Wloch, Peng Xu, Federico Zahariev, and Mark S. Gordon. Recent developments in the general atomic and molecular electronic structure system. The Journal of Chemical Physics, 152(15):154102, April 2020.
[39] Kristof T Schütt, Pieter-Jan Kindermans, Huziel E Sauceda, Stefan Chmiela, Alexandre Tkatchenko, and Klaus-Robert Müller. Schnet: A continuous-filter convolutional neural network for modeling quantum interactions. arXiv preprint arXiv:1706.08566, 2017.
[40] Johannes Klicpera, Shankari Giri, Johannes T. Margraf, and Stephan Günnemann. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. In NeurIPS-W, 2020.
[41] Johannes Gasteiger, Florian Becker, and Stephan Günnemann. Gemnet: Universal directional graph neural networks for molecules. Advances in Neural Information Processing Systems, 34:6790–6802, 2021.
[42] Johan J de Swart. The octet model and its clebsch-gordan coefficients. In The Eightfold Way, pages 120–143. CRC Press, 2018.
[43] Eugen Wigner. Gruppentheorie und ihre anwendung auf die quantenmechanik der atomspektren. Monatshefte für Mathematik und Physik, 1931.
[44] Congyue Deng, Or Litany, Yueqi Duan, Adrien Poulenard, Andrea Tagliasacchi, and Leonidas J Guibas. Vector neurons: A general framework for SO(3)-equivariant networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12200–12209, 2021.
[45] Emiel Hoogeboom, Víctor Garcia Satorras, Clément Vignac, and Max Welling. Equivariant diffusion for molecule generation in 3d. In International conference on machine learning, pages 8867–8887. PMLR, 2022.
[46] Kevin Ryczko, David A Strubbe, and Isaac Tamblyn. Deep learning and density-functional theory. Physical Review A, 100(2):022512, 2019.
[47] Kristof T Schütt, Michael Gastegger, Alexandre Tkatchenko, K-R Müller, and Reinhard J Maurer. Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions. Nature communications, 10(1):5024, 2019.
[48] Ryan Pederson, Bhupalee Kalita, and Kieron Burke. Machine learning and density functional theory. Nature Reviews Physics, 4(6):357–358, 2022.
[49] Bing Huang, Guido Falk von Rudorff, and O Anatole von Lilienfeld. The central role of density functional theory in the ai age. Science, 381(6654):170–175, 2023.
[50] Tian Xie and Jeffrey C Grossman. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Physical review letters, 120(14):145301, 2018.
[51] Omar Allam, Byung Woo Cho, Ki Chul Kim, and Seung Soon Jang. Application of dft-based machine learning for developing molecular electrode materials in li-ion batteries. RSC advances, 8(69):39414–39420, 2018.
[52] Andrew Lee, Suchismita Sarker, James E Saal, Logan Ward, Christopher Borg, Apurva Mehta, and Christopher Wolverton. Machine learned synthesizability predictions aided by density functional theory. Communications Materials, 3(1):73, 2022.
[53] Pin Chen, Jianwen Chen, Hui Yan, Qing Mo, Zexin Xu, Jinyu Liu, Wenqing Zhang, Yuedong Yang, and Yutong Lu. Improving material property prediction by leveraging the large-scale computational database and deep learning. The Journal of Physical Chemistry C, 126(38):16297–16305, 2022.
[54] Hsin-Yuan Huang, Richard Kueng, Giacomo Torlai, Victor V Albert, and John Preskill. Provably efficient machine learning for quantum many-body problems. Science, 377(6613):eabk3333, 2022.
[55] Bing Huang, Guido Falk von Rudorff, and O Anatole von Lilienfeld. Towards self-driving laboratories in chemistry and materials sciences: The central role of dft in the era of ai. arXiv preprint arXiv:2304.03272, 2023.
[56] Chenru Duan, Fang Liu, Aditya Nandy, and Heather J Kulik. Putting density functional theory to the test in machine-learning-accelerated materials discovery. The Journal of Physical Chemistry Letters, 12(19):4628–4637, 2021.
[57] Haiyang Yu, Meng Liu, Youzhi Luo, Alex Strasser, Xiaofeng Qian, Xiaoning Qian, and Shuiwang Ji. Qh9: A quantum hamiltonian prediction benchmark for qm9 molecules. Advances in Neural Information Processing Systems, 36, 2024.
[58] Pierre Hohenberg and Walter Kohn. Inhomogeneous electron gas. Physical review, 136(3B):B864, 1964.
[59] Jean-Pierre Serre et al. Linear representations of finite groups, volume 42. Springer, 1977.
[60] Teturo Inui, Yukito Tanabe, and Yositaka Onodera. Group theory and its applications in physics, volume 78. Springer Science & Business Media, 2012.
[61] Deutsche Akademie der Wissenschaften zu Berlin. Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften zu Berlin, volume Jan-Mai 1882. Berlin, Deutsche Akademie der Wissenschaften zu Berlin, 1882-1918, 1882. https://www.biodiversitylibrary.org/bibliography/42231.
[62] Brian C Hall. Lie groups, Lie algebras, and representations. Springer, 2013.
[63] Claus Müller. Spherical harmonics, volume 17. Springer, 2006.
[64] M Shiraishi. Spin-Weighted Spherical Harmonic Function. Springer, 2013.
[65] Stefan Chmiela, Alexandre Tkatchenko, Huziel E Sauceda, Igor Poltavsky, Kristof T Schütt, and Klaus-Robert Müller. Machine learning of accurate energy-conserving molecular force fields. Science advances, 3(5):e1603015, 2017.
[66] Thomas Kipf, Ethan Fetaya, Kuan-Chieh Wang, Max Welling, and Richard Zemel. Neural relational inference for interacting systems. In International conference on machine learning, pages 2688–2697. PMLR, 2018.

Appendix A Appendix / supplemental material

A.1 Table of notations

Table S1: Table of notations in this manuscript

Variable	Definition
$i$	an index of an atom of a molecule
$t$	a layer index of a baseline network
$N$	the number of atoms in a molecule
$x_{i}$	a three-dimensional coordinate of the $i$-th atom
$a_{i}$	an atomic number (or atom type) of the $i$-th atom
$v_{t,i}$	a node feature of the $t$-th layer output
$\mathbf{x}$	a set of $\{x_{0},x_{1},...,x_{N-1}\}$
$\hat{y}$	predicted target (molecular property)
$X$	molecule conformation
$\rho$	electron density
$\chi$	latent feature of a baseline network
$\xi$	mapping from latent feature $\chi$ to electron density $\rho$

A.2 Related works

Molecule property prediction based on molecule structure in Euclidean space. Based on the relationship between molecule conformation and its properties, many existing networks utilized molecule conformation as a source for property prediction tasks. SchNet [39] introduced a radial basis function for embedding continuous-valued atom-atom distances, and DimeNet [40] utilized Bessel basis functions for embedding continuous-valued angles between three atoms.

To represent molecule structures in Euclidean space more accurately, recent studies introduced the theory of geometric symmetry groups into their networks. Various types of representation theory have been utilized in modern chemistry for structural analysis of molecules or crystals. Cormorant [21], NequIP [20], GemNet [41], PaiNN [29], and SpookyNet [24] are well-known examples that use group-equivariant blocks for molecule conformation. These studies contributed to more accurate and reliable learning of molecule structure; however, the effects caused by electron configurations may still be captured only to a limited extent in this type of representation.

Two branches of implementing equivariant networks. Broadly speaking, there are two branches for implementing geometric group equivariance in a neural network. One branch relies on the representation techniques for the SO(3) group used in physics. In particular, these works introduced Clebsch-Gordan coefficients [42] or Wigner-D matrices [43] to build an SO(3)-equivariant tensor product for features defined on a spherical basis. Cormorant [21], NequIP [20], and SE(3)-Transformer [22] are examples that belong to this category. Further explanation is given in A.4.

On the other hand, EGNN [36] and several follow-up works [44, 45] did not use a computationally expensive tensor product. Instead, they separated scalar-valued features for scales and vector-valued features for directions, and trained them as individual features of the molecule structure. In general, this approach is relatively efficient in terms of computational complexity for larger molecules.

Utilization of DFT for property prediction in machine learning. As machine learning-based methods have progressed toward solving more sophisticated problems in chemistry and materials science, many recent studies [46, 47, 32, 48, 49] focused on DFT for a wide range of tasks. Studies [50, 51] addressed DFT-related property prediction for given materials, and [52, 53, 54] are examples that consider DFT for other molecule-related tasks. In a review paper [55], the authors argued that understanding DFT will be a necessary background for machine learning-based methods that explore chemical properties.

The prediction performance of current networks. Despite the rapid increase in the prediction performance of deep neural networks over a short period, there remains a considerable gap compared to DFT methods. Several reports [56, 1, 37, 57] discussed these limitations of current neural networks in chemistry research, and they pointed out that one possible limitation is that deep learning approaches could not utilize enough chemical information, including DFT, in appropriate ways.

A.3 Brief introduction about DFT

Density functional theory (DFT) [1] is one of the most widely used methodologies for studying molecular properties. The overall process of DFT calculation consists of inferring the electron density and calculating the properties of the molecule based on the computed electron density. Molecules are composed of multiple atoms with surrounding electrons. According to quantum mechanics, the location of an electron cannot be specified as a point, but rather as a probability distribution over the space, called electron density. The first Hohenberg-Kohn theorem [58] states that the ground state electron density of a molecule uniquely determines the external potential, and consequently all ground state molecular properties. This theorem implies that knowing just the electron density is enough to calculate any ground state property including energy.

A.4 Equivariance and representations in SO(3) and SE(3) symmetry

We briefly review several concepts of equivariance as essential background for our motivation and strategy. A group [27] $(G,\circ)$ is an algebraic structure consisting of a non-empty set $G=\{g\}$ and a binary operation $\circ:G\times G\rightarrow G$ that satisfies three requirements: associativity, the existence of an identity element, and the existence of an inverse for every element. A (left) group action $a$ of $G$ on a set $X$ is a function $a:G\times X\rightarrow X$ that satisfies identity and compatibility for all $g,h\in G$ and all $x\in X$.

A group representation [59, 60] $\varphi$ is a group homomorphism from a group $G$ to a general linear group $GL(V)$, which enables group actions to be represented as matrix multiplications on a (finite-dimensional) vector space $V$. If $\varphi:G\rightarrow GL(V)$ has only trivial subrepresentations, it is called an irreducible representation, or irrep. One important property of irreducible representations is the Great Orthogonality Theorem [61], stated through Schur’s orthogonality relations [62]. In particular, for a finite group $G$ and any nontrivial irrep $\varphi^{(L)}$ of dimension $L$, $\sum_{g\in G}\varphi^{(L)}(g)_{l_{m},l_{n}}=0$ for all $l_{m},l_{n}=1,...,L$. This theorem is proved by Schur’s lemma: 1) if $V$ and $W$ are not isomorphic, then there are no nontrivial $G$-linear maps between them, and 2) if $V=W$ and $\varphi_{V}=\varphi_{W}$, then the only nontrivial $G$-linear maps are scalar multiples of the identity.
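The orthogonality statement above can be checked numerically on a small finite group. The following is a minimal sketch, assuming the dihedral group $D_3$ and its two-dimensional real irrep as an illustrative choice (neither appears in the paper); the helper names are hypothetical.

```python
import numpy as np

def rotation(angle):
    """2x2 rotation matrix by the given angle."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def reflection(line_angle):
    """2x2 reflection across the line through the origin at line_angle."""
    c, s = np.cos(2 * line_angle), np.sin(2 * line_angle)
    return np.array([[c, s], [s, -c]])

def d3_irrep():
    """The six matrices of the two-dimensional irrep of D3 (illustrative choice)."""
    angles = [2 * np.pi * k / 3 for k in range(3)]
    return np.stack([rotation(a) for a in angles] +
                    [reflection(a / 2) for a in angles])

mats = d3_irrep()                      # shape (|G|, 2, 2) with |G| = 6
irrep_sum = mats.sum(axis=0)           # vanishes for a nontrivial irrep
trivial_sum = np.ones((6, 1, 1)).sum(axis=0)  # trivial irrep sums to |G|

# Full orthogonality relation for a real orthogonal irrep of dimension d:
# sum_g D(g)_ij D(g)_kl = (|G| / d) * delta_ik * delta_jl
gram = np.einsum("gij,gkl->ijkl", mats, mats)
expected = (6 / 2) * np.einsum("ik,jl->ijkl", np.eye(2), np.eye(2))
```

Here `irrep_sum` is the zero matrix, while the trivial representation sums to the group order, which is the contrast the theorem expresses.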

A function $f$ that satisfies $f(g\circ x)=f(x)$ for all $g\in G$ is called invariant under the action of the group $G$ on $X$, while it is called equivariant if it satisfies $f(g\circ x)=g\circ f(x)$.
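The two definitions can be illustrated with a small numerical check. This is a sketch under our own assumptions: the functions `pairwise_distances` and `offsets_from_centroid` are hypothetical examples of an invariant and an equivariant map, not functions used in the paper.

```python
import numpy as np

def random_rotation(rng):
    """Draw a 3x3 rotation matrix (determinant +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))        # fix the sign ambiguity of each column
    if np.linalg.det(q) < 0:           # flip one axis to land in SO(3)
        q[:, 0] = -q[:, 0]
    return q

def pairwise_distances(x):
    """Invariant example: all atom-atom distances."""
    diff = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def offsets_from_centroid(x):
    """Equivariant example: vectors from the centroid to each atom."""
    return x - x.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))            # five toy atom positions
rot = random_rotation(rng)
x_rot = x @ rot.T                      # apply g to every position

# f(g x) == f(x) for the invariant map
invariant_ok = np.allclose(pairwise_distances(x_rot), pairwise_distances(x))
# f(g x) == g f(x) for the equivariant map
equivariant_ok = np.allclose(offsets_from_centroid(x_rot),
                             offsets_from_centroid(x) @ rot.T)
```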

In this study, we focus on the special orthogonal group $\mathbb{SO}(3)$, the group of all rotations in three-dimensional Euclidean space under composition. The irreducible representations of $\mathbb{SO}(3)$ are given by the Wigner-D matrices $D^{L}(g)$ of dimension $2L+1$; each $D^{L}(g)$ is a $(2L+1)\times(2L+1)$ matrix with entries $D^{L}_{m,m^{\prime}}$, where $-L\leq m,m^{\prime}\leq L$ (footnote: we assume integer L, m=m’ for real-valued spherical harmonics).

Spherical harmonics [63] are a set of orthonormal basis functions for the irreducible representations of $\mathbb{SO}(3)$, denoted by $Y_{m}^{L}(\theta,\phi)$ with an integer degree $L$. On the unit sphere $S^{2}$, any square-integrable function $s:S^{2}\rightarrow\mathbb{C}$ can be expanded as a linear combination $s(\theta,\phi)=\sum_{L=0}^{\infty}\sum_{m=-L}^{L}f_{m}^{L\ast}Y_{m}^{L}(\theta,\phi)$. Accordingly, any representation $D(g)$ of a group element $g\in\mathbb{SO}(3)$ can be expressed as a direct sum of Wigner-D matrices $D^{L}_{m,m^{\prime}}$ with a change of basis $P:P^{-1}D(g)P=D^{\prime}(g)$ as follows:

$$D(g)=P^{-1}\Bigg(\bigoplus_{i}D^{l_{i}}(g)\Bigg)P=P^{-1}\begin{bmatrix}D^{l_{0}}(g)&&\\ &D^{l_{1}}(g)&\\ &&\ddots\end{bmatrix}P \tag{9}$$

The action of a rotation $R$ in three-dimensional Euclidean space, parameterized by the angles $(\alpha,\beta,\gamma)$, on the spherical harmonics can be written as follows:

$$Y_{m}^{L}(\theta+\Delta\theta,\phi+\Delta\phi)=\sum_{m^{\prime}=-L}^{L}\Big[D^{L}_{m,m^{\prime}}(\alpha,\beta,\gamma)\Big]Y^{L}_{m^{\prime}}(\theta,\phi) \tag{10}$$
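For $L=1$ the relation in Eq. (10) can be verified directly, because the real degree-1 spherical harmonics are proportional to the Cartesian components $(y,z,x)$ of the unit direction. The sketch below assumes this common ordering for $m=-1,0,1$ and a ZYZ Euler-angle convention; both are conventions we picked for illustration, and other libraries may differ.

```python
import numpy as np

# Normalization constant of the degree-1 real spherical harmonics.
C1 = np.sqrt(3.0 / (4.0 * np.pi))
# Reorders Cartesian (x, y, z) into the assumed m = -1, 0, 1 order (y, z, x).
PERM = np.array([[0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [1.0, 0.0, 0.0]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def euler_zyz(alpha, beta, gamma):
    """Rotation matrix from ZYZ Euler angles (assumed convention)."""
    return rot_z(alpha) @ rot_y(beta) @ rot_z(gamma)

def real_sph_harm_l1(directions):
    """Degree-1 real spherical harmonics for an array of 3D directions."""
    unit = directions / np.linalg.norm(directions, axis=-1, keepdims=True)
    return C1 * unit @ PERM.T

def wigner_d_l1(rot):
    """Degree-1 Wigner-D matrix in the real basis: a permuted rotation matrix."""
    return PERM @ rot @ PERM.T

rng = np.random.default_rng(0)
directions = rng.normal(size=(4, 3))
rot = euler_zyz(0.3, 1.1, -0.7)

lhs = real_sph_harm_l1(directions @ rot.T)               # Y evaluated after rotating
rhs = real_sph_harm_l1(directions) @ wigner_d_l1(rot).T  # D^1 applied to Y
```

Both sides agree up to floating-point error, which is the $L=1$ instance of Eq. (10); higher degrees need the general Wigner-D matrices and are not covered by this shortcut.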

For a tensor product of two spherical tensors $f^{L_{1}}_{m_{1}}$ and $f^{L_{2}}_{m_{2}}$ in $\mathbb{SO}(3)$, the Clebsch-Gordan coefficients [62] $C_{(L_{1},m_{1}),(L_{2},m_{2})}^{(L_{3},m_{3})}$ are used to assign the numerical value of each decomposed type of the two spherical tensors as follows:

$$f^{L_{3}}_{m_{3}}=\sum_{m_{1}=-L_{1}}^{L_{1}}\sum_{m_{2}=-L_{2}}^{L_{2}}C_{(L_{1},m_{1}),(L_{2},m_{2})}^{(L_{3},m_{3})}f^{L_{1}}_{m_{1}}f^{L_{2}}_{m_{2}} \tag{11}$$
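A familiar special case of Eq. (11) is $L_{1}=L_{2}=1$. In a Cartesian basis and up to normalization constants, the coupling to $L_{3}=0$ reduces to the dot product and the coupling to $L_{3}=1$ reduces to the cross product, with the Kronecker delta and the Levi-Civita symbol playing the role of the coefficient tensors. The sketch below illustrates this under those assumptions; the function names are hypothetical.

```python
import numpy as np

def levi_civita():
    """Rank-3 antisymmetric tensor eps[k, i, j]."""
    eps = np.zeros((3, 3, 3))
    for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        eps[i, j, k] = 1.0
        eps[i, k, j] = -1.0
    return eps

def couple_to_scalar(u, v):
    """1 x 1 -> 0 coupling (dot product, unnormalized)."""
    return np.einsum("i,i->", u, v)

def couple_to_vector(u, v):
    """1 x 1 -> 1 coupling (cross product, unnormalized)."""
    return np.einsum("kij,i,j->k", levi_civita(), u, v)

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

u = np.array([0.2, -1.0, 0.5])
v = np.array([1.5, 0.3, -0.4])
rot = rot_z(0.4) @ rot_x(-1.2)

scalar = couple_to_scalar(u, v)
vector = couple_to_vector(u, v)
scalar_rot = couple_to_scalar(rot @ u, rot @ v)   # unchanged: invariant output
vector_rot = couple_to_vector(rot @ u, rot @ v)   # rotates with the inputs
```

The scalar output is invariant and the vector output is equivariant, which is the property that Clebsch-Gordan tensor products guarantee for general degrees.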

In quantum chemistry, we can describe the electron charge distribution (cloud) of an atom generated by the interactions between the atom and its electrons [1]. This distribution is called an atomic orbital, and it is described in spherical coordinates by a radial term $R(r)$ and spherical harmonics $Y_{m}^{l}(\theta,\phi)$ of polar angle $\theta$ and azimuthal angle $\phi$, with degree $l$ and order $m$. The integer-valued degrees of real spherical harmonics $l=0,1,2,...$ correspond to the $s,p,d,...$ orbitals of an atom, and basis orbitals of higher $l$ can capture higher angular frequencies of a function $f_{m}^{L}$ defined on the surface of a sphere $S^{2}$. The total angular momentum coupling can be described by an element of the rotation group $g\in\mathbb{SO}(3)$, and the Wigner-D matrix [64] $D^{L}_{m,m^{\prime}}$ together with the Clebsch-Gordan coefficients $\{C_{(L_{1},m_{1}),(L_{2},m_{2})}^{(L_{3},m_{3})}\}$ is used to take the product of two angular momenta represented as spherical tensors of dimension $2L_{1}+1$ and $2L_{2}+1$, respectively.

A.5 Equivalence between selection of basis set and node feature

Let us assume a DFT basis set $B$ built from spherical harmonics. Since the node feature $\mathbf{v}_{t}\in\mathbb{V}_{eq}$ computed by an equivariant layer is equivariant under $\mathbb{SO}(3)$ transformations, $\mathbf{v}_{t}$ can be represented as a combination of irreducible representations of $\mathbb{SO}(3)$. Using the notation of Eq. (11), let us denote the $\{L,m\}$-degree $\mathbb{SO}(3)$ irreducible feature of ${v}_{t,i}$ by $f^{L}_{mi}=\sum_{j}f^{L}_{mij}$. With this definition, we can write ${v}_{t,i}=\sum_{L,m,j}{f^{L}_{mij}}$.

Now, consider a linear map $\xi^{\prime}_{1/2}$ that maps $f^{L}_{mij}$ to $R_{ijL}(r-x_{i})Y^{L}_{m}(\theta_{i},\phi_{i})$. Then the map $\xi^{\prime}:\mathbb{V}_{eq}^{N}\rightarrow(\mathbb{R}^{3}\rightarrow\mathbb{R})$, defined by $\xi^{\prime}(\mathbf{v})={\xi^{\prime}_{1/2}(\mathbf{v})}^{2}$, maps the layer output $\mathbf{v}_{t}$ to an electron density that can be represented as a quadratic form under $B$. Meanwhile, any electron density obtained from DFT with $B$ is also a quadratic form under $B$. Therefore, any invertible linear map $\xi$ defined on $\mathbf{v}_{t}$ can be converted, through $\xi^{\prime}_{1/2}\cdot\xi^{-1}$, into a linear map whose image is a subset of the electron densities that DFT can generate.
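The construction of $\xi^{\prime}$ can be sketched numerically: coefficients are mapped linearly to an amplitude built from atom-centered basis functions, and the density is the square of that amplitude, hence a quadratic form in the coefficients. The code below is only an illustration under our own assumptions: Gaussian radial functions, degrees $L\in\{0,1\}$, and the names `basis_values` and `density` are hypothetical and not taken from the paper.

```python
import numpy as np

def basis_values(points, centers, widths):
    """Evaluate atom-centered basis functions at the given points.

    For each center, one L=0 function (Gaussian) and three L=1 functions
    (Gaussian times the Cartesian offset) are returned.
    Output shape: (num_points, num_centers * 4).
    """
    rel = points[:, None, :] - centers[None, :, :]            # (P, N, 3)
    r2 = np.sum(rel ** 2, axis=-1)                            # (P, N)
    radial = np.exp(-r2 / (2.0 * widths[None, :] ** 2))       # (P, N)
    s_part = radial[..., None]                                # L = 0
    p_part = radial[..., None] * rel                          # L = 1
    stacked = np.concatenate([s_part, p_part], axis=-1)       # (P, N, 4)
    return stacked.reshape(len(points), -1)

def density(points, centers, widths, coeffs):
    """Square of the linear amplitude, i.e. a quadratic form in coeffs."""
    amplitude = basis_values(points, centers, widths) @ coeffs
    return amplitude ** 2

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0]])  # two toy atoms
widths = np.array([0.6, 0.5])
coeffs = rng.normal(size=centers.shape[0] * 4)
points = rng.uniform(-2.0, 3.0, size=(50, 3))

rho = density(points, centers, widths, coeffs)

# Same values written explicitly as c^T (b b^T) c at every point.
b = basis_values(points, centers, widths)
rho_quadratic = np.einsum("pk,k,pl,l->p", b, coeffs, b, coeffs)
```

Because the density is a square, it is non-negative everywhere, and the nonzero $L=1$ coefficients shift density away from the atom centers, which is the polarization effect the main text refers to.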

A.6 Proof of Proposition 1

Let $B_{P}$ be the basis set of $P$ and $\tilde{B}_{P}[\tilde{\mathbf{x}}]$ the basis set of $\tilde{P}[\tilde{\mathbf{x}}]$. Since the basis $B_{P}$ is equivalent to a DFT basis set under the equivariance constraint, any basis function in $B_{P}$ can be represented by a combination of spherical harmonics and a radial function centered at a position $x_{i}$.

$$B_{P}=\{R_{ijL}(r-x_{i})Y^{L}_{m}(\theta_{i},\phi_{i})\mid 1\leq i\leq N,\ -L\leq m\leq L,\ j\leq J_{Lm}\} \tag{12}$$

Similarly, a basis set with Neural Polarization $\tilde{B}_{P}[\tilde{\mathbf{x}}]$ can be represented as below.

$$\tilde{B}_{P}[\tilde{\mathbf{x}}]=P\cap\{R_{ijL}(r-\tilde{x}_{i})Y^{L}_{m}(\tilde{\theta}_{i},\tilde{\phi}_{i})\mid i\leq N,\ -L\leq m\leq L,\ j\leq J_{Lm}\} \tag{13}$$

Since $B_{P}$ is a finite basis, there exists a density $\rho$ such that $\rho\notin P=span(B_{P})$. For this $\rho$, let $\rho_{min}$ be an electron density in $P$ that yields the minimal error, $\mathcal{E}(P,\rho)=\mathcal{E}(\rho_{min},\rho)$. When $L=0$, the spherical harmonic is constant and $R_{ij0}(r-\tilde{x}_{i})\in\tilde{B}_{P}[\tilde{\mathbf{x}}]$. For convenience, let us define $\phi[\tilde{x}_{i}]=R_{ij0}(r-\tilde{x}_{i})$.

Since $\mathcal{E}(\rho_{min},\rho)>0$, $\rho-\rho_{min}\neq 0$, and there exists some $\tilde{x}_{i}\in\mathbb{R}^{3}$ such that $\int_{\mathbb{R}^{3}}(\rho-\rho_{min})\phi[\tilde{x}_{i}]{dV}=\Delta\neq 0$ because of the radial symmetry of $\phi[\tilde{x}_{i}]$. With this $\tilde{x}_{i}$, and assuming $\phi[\tilde{x}_{i}]$ is normalized such that $\int_{\mathbb{R}^{3}}\phi[\tilde{x}_{i}]^{2}{dV}=1$,

$\displaystyle\mathcal{E}(\rho_{min}+\frac{\Delta}{2}\phi[\tilde{x}_{i}],\rho)$	$\displaystyle=\int_{\mathbb{R}^{3}}(\rho-\rho_{min}-\frac{\Delta}{2}\phi[\tilde{x}_{i}])^{2}{dV}$	(14)
	$\displaystyle=\int_{\mathbb{R}^{3}}(\rho-\rho_{min})^{2}+(\frac{\Delta}{2}\phi[\tilde{x}_{i}])^{2}-\Delta\phi[\tilde{x}_{i}](\rho-\rho_{min}){dV}$	(15)
	$\displaystyle=\mathcal{E}(\rho_{min},\rho)+(\frac{\Delta}{2})^{2}-\Delta^{2}$	(16)
	$\displaystyle=\mathcal{E}(\rho_{min},\rho)-\frac{3}{4}\Delta^{2}$	(17)
	$\displaystyle<\mathcal{E}(\rho_{min},\rho)$	(18)

Because $\rho_{min}+\frac{\Delta}{2}\phi[\tilde{x}_{i}]\in\tilde{P}[\tilde{\mathbf{x}}]$ with $\tilde{x}_{i}\in\tilde{\mathbf{x}}$, we obtain

$$\mathcal{E}(\tilde{P}[\tilde{\mathbf{x}}],\rho)\leq\mathcal{E}(\rho_{min}+\frac{\Delta}{2}\phi[\tilde{x}_{i}],\rho)<\mathcal{E}(\rho_{min},\rho)=\mathcal{E}(P,\rho) \tag{20}$$
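The inequality can be illustrated with a one-dimensional toy problem. This is not the proof itself; it is a sketch assuming Gaussian basis functions on a grid, a target density shifted away from the atom center, and a least-squares definition of the error $\mathcal{E}$. All names and numerical values are our own choices.

```python
import numpy as np

def gaussian(x, center, width):
    return np.exp(-(x - center) ** 2 / (2.0 * width ** 2))

def projection_error(target, basis, dx):
    """Squared L2 error of the best least-squares fit of target in span(basis)."""
    coeffs, *_ = np.linalg.lstsq(basis, target, rcond=None)
    residual = target - basis @ coeffs
    return float(np.sum(residual ** 2) * dx)

x = np.linspace(-6.0, 6.0, 2401)
dx = x[1] - x[0]

# Target density: a cloud shifted away from the atom at the origin.
target = gaussian(x, 0.3, 0.5)

# Fixed basis: one function centered on the atom position.
fixed_basis = np.stack([gaussian(x, 0.0, 0.5)], axis=1)
# Extended basis: the same function plus one centered at a moved point.
moved_basis = np.stack([gaussian(x, 0.0, 0.5),
                        gaussian(x, 0.2, 0.5)], axis=1)

err_fixed = projection_error(target, fixed_basis, dx)
err_moved = projection_error(target, moved_basis, dx)
```

Adding a basis function can never increase the least-squares error, and here the error strictly decreases because the residual of the fixed fit is not orthogonal to the function centered at the moved point, mirroring the role of $\Delta\neq 0$ in the argument above.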

A.7 Datasets

The QM9 dataset [17] contains various molecular properties for 134k small molecules. The molecules in the dataset are composed of C, N, O, H, and F and contain up to 29 atoms. This dataset is frequently used for measuring the performance of deep learning models for molecules. The dataset consists of molecular conformations, given as the atomic coordinates of each molecule, and 12 different molecular properties calculated from these conformations, such as internal energy and dipole moment.

The MD17 dataset [65] contains molecular dynamics (MD) trajectories of 10 small molecules. The dataset consists of molecular conformations from the trajectories with the corresponding molecular energy and atomic forces. The task for this dataset is to predict the forces and energy for a given conformation. In this paper, we used the MD17 dataset configuration used in TorchMD-NET, which is the base model for comparison.

The n-body system task used in our experiments was suggested in EGNN, which extended the task proposed in [66]. The dataset consists of trajectories generated by five charged particles, whose trajectories are simulated based on charged interactions. The task for this dataset is to predict the positions after 1000 steps from the given initial positions.

A.8 Detailed Implementation of Neural Polarization

We implemented Neural Polarization based on the official GitHub repositories provided for each model. Neural Polarization was implemented in the same way for the three types of baseline networks, except for the projection layer. For EGNN, we used a multilayer perceptron (MLP) with a single hidden layer of dimension 128 on the node feature. In TorchMD-NET, the projection layer was implemented as a single linear layer without bias for the $l=1$ feature of each layer output $\mathbf{\tilde{v}_{t,i}}$. For Equiformer, the projection layer was implemented as a shallow equivariant network based on a tensor product with one $l=1$ feature.
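To illustrate why a bias-free linear layer is a natural choice for $l=1$ features, the sketch below mixes the channels of per-atom vector features into a single displacement and checks rotation equivariance. It is a hypothetical NumPy illustration, not the authors' implementation: the function `moving_points`, the tensor shapes, and in particular the assumption that the moving point is the atom position plus the projected displacement are ours.

```python
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def moving_points(positions, vector_features, weights, bias=None):
    """Hypothetical projection: x~_i = x_i + sum_c w_c v_{i,c} (+ optional bias).

    positions: (N, 3), vector_features: (N, C, 3), weights: (C,).
    The bias argument exists only to show that it breaks equivariance.
    """
    displacement = np.einsum("c,ncd->nd", weights, vector_features)
    if bias is not None:
        displacement = displacement + bias
    return positions + displacement

rng = np.random.default_rng(0)
positions = rng.normal(size=(6, 3))
vector_features = rng.normal(size=(6, 8, 3))
weights = rng.normal(size=8)
rot = rot_z(0.9) @ rot_x(0.5)

# Without bias: rotating the inputs rotates the output.
moved = moving_points(positions, vector_features, weights)
moved_from_rotated = moving_points(positions @ rot.T,
                                   vector_features @ rot.T, weights)

# With a constant bias vector the same check fails.
bias = np.array([0.1, 0.0, 0.0])
biased = moving_points(positions, vector_features, weights, bias)
biased_from_rotated = moving_points(positions @ rot.T,
                                    vector_features @ rot.T, weights, bias)
```

A constant bias vector does not rotate with the molecule, so including it would make the projected points depend on the choice of coordinate frame.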

To minimize undesired effects on performance arising from hyperparameter optimization, we fixed all hyperparameters, except the learning rate and the batch size, to the values provided in the official repositories. We halved the batch size only in cases where out-of-memory (OOM) errors occurred. For other implementation details such as data splitting, optimizer, learning rate scheduling strategy, and objective loss function, we followed the configurations implemented in the official repositories. Experiments were conducted on NVIDIA V100, A40, or A100 GPUs. Each experiment with the QM9 dataset required at most 240 GPU hours, while other experiments required less than 24 GPU hours. Training times differ across model types, because we followed the early-stopping condition implemented in each original repository.