A Theoretical Framework for an Efficient Normalizing Flow-Based Solution to the Schrödinger Equation

Daniel Freedman1        Eyal Rozenberg1,2        Alex Bronstein2
1Verily Research        2Technion - Israel Institute of Technology
Abstract

A central problem in quantum mechanics involves solving the Electronic Schrödinger Equation for a molecule or material. The Variational Monte Carlo approach to this problem approximates a particular variational objective via sampling, and then optimizes this approximated objective over a chosen parameterized family of wavefunctions, known as the ansatz. Recently neural networks have been used as the ansatz, with accompanying success. However, sampling from such wavefunctions has required the use of a Markov Chain Monte Carlo approach, which is inherently inefficient. In this work, we propose a solution to this problem via an ansatz which is cheap to sample from, yet satisfies the requisite quantum mechanical properties. We prove that a normalizing flow using the following two essential ingredients satisfies our requirements: (a) a base distribution which is constructed from Determinantal Point Processes; (b) flow layers which are equivariant to a particular subgroup of the permutation group. We then show how to construct both continuous and discrete normalizing flows which satisfy the requisite equivariance. We further demonstrate the manner in which the non-smooth nature (“cusps”) of the wavefunction may be captured, and how the framework may be generalized to provide induction across multiple molecules. The resulting theoretical framework entails an efficient approach to solving the Electronic Schrödinger Equation.

1 Introduction

The Electronic Schrödinger Equation   A central problem in quantum mechanics involves solving the Electronic Schrödinger Equation to compute the ground state energy and wavefunction of a molecule or material. This problem has manifold applications in chemistry, condensed matter physics, and materials science. A standard computational approach to this problem is based on Variational Monte Carlo (Ceperley and Alder, 1986; Austin et al., 2012; Gubernatis et al., 2016; Foulkes et al., 2001; Needs et al., 2009): a particular variational objective is approximated via sampling, and the approximated objective is optimized over a family of wavefunctions, yielding an upper bound on the ground state energy. The heart of this method is the wavefunction family, also known as the ansatz; recent work has proposed using neural networks as a flexible ansatz, and has achieved very high quality results, which we now describe further.

Neural Network Ansätze   We begin by noting that various works have used neural networks as the ansätze in the case of pure-spin systems (sometimes also referred to as “discrete space systems”), for example (Carleo and Troyer, 2017; Deng et al., 2017; Gao and Duan, 2017; Levine et al., 2019; Sharir et al., 2022; Passetti et al., 2023). In terms of continuous space problems of the sort that interest us, DeepWF (Han et al., 2019) bases its ansatz on the classical Slater-Jastrow formalism, but learns both the symmetric and antisymmetric parts; the latter contains only two-electron terms, limiting the accuracy. PauliNet (Hermann et al., 2020; Schätzle et al., 2021) also bases its ansatz on the Slater-Jastrow-Backflow form, but does so in a way that captures many-electron interactions, while respecting permutation-equivariance; this, as well as the inclusion of cusp terms, leads to much higher accuracy (e.g. 97.3% of the correlation energy for boron atoms). FermiNet (Pfau et al., 2020; Spencer et al., 2020) attains still higher accuracy (e.g. 99.8% of the correlation energy for boron atoms) by using an appropriately designed neural network to represent the entire wavefunction, which contains a generalization of Slater determinants to account for all-electron interactions. A hybrid solution which improves upon both PauliNet and FermiNet is presented in (Gerard et al., 2022). Techniques for learning / induction across several molecules or materials at once are presented in (Gao and Günnemann, 2023; Scherbela et al., 2024; Gerard et al., 2024). We briefly mention applications to periodic systems (Wilson et al., 2022; Li et al., 2022; Pescia et al., 2022; Cassella et al., 2023); techniques that use Diffusion Monte Carlo (Wilson et al., 2021; Ren et al., 2023); and methods that deal with excited states (Entwistle et al., 2023; Pfau et al., 2023; Naito et al., 2023). Finally, we mention two works which use normalizing flows (Thiede et al., 2022; Saleh et al., 2023). Both are limited in their applicability, as the former is restricted to one-dimensional systems by construction, while the latter makes use of the flows in a non-standard way and thus cannot scale past systems with a few electrons.

Goals and Contributions   In order to be able to apply the Variational Monte Carlo formalism to the ansätze just described, such as PauliNet or FermiNet, one must be able to sample from the densities corresponding to the wavefunctions given by their neural networks. In general, this is only possible using Markov Chain Monte Carlo (MCMC) techniques such as Langevin Monte Carlo (Umrigar et al., 1993) or any of several variations. The issue with using such MCMC approaches to sampling is that they are inherently time-consuming: each sample is itself the solution of a stochastic differential equation as time goes to infinity. The main goal of this paper is to solve the problem of sampling inefficiency, thereby yielding faster algorithms for solving the Electronic Schrödinger Equation. We achieve this goal by specifying a wavefunction ansatz which is easy to sample from, yet satisfies the requisite quantum mechanical properties. The ansatz is based on normalizing flows, which unlike (Thiede et al., 2022; Saleh et al., 2023) are general and can be applied to a space of any dimensionality. We provide the following contributions:

  • We establish that such an ansatz can be instantiated as a normalizing flow with these characteristics: (a) its base distribution is symmetric under permutations, and vanishes for identical electrons; (b) the flow transformation is equivariant to a particular subgroup of the permutation group.

  • We show that the base distribution can be constructed using a particular combination of Determinantal Point Processes.

  • We construct both continuous and discrete normalizing flows obeying the requisite equivariance.

  • We provide a training regimen based on standard stochastic gradient descent.

  • We show how to accommodate cusps, which encapsulate non-smooth aspects of the wavefunction.

  • We generalize the framework so that induction across multiple molecules may be accommodated, while including the necessary additional invariances, in particular rigid motion invariance.

2 Problem Setup

2.1 Goals

The Setting   Our overall goal is to compute the ground state wavefunction and energy of a molecule given its molecular parameters and spin multiplicity. Denote $x_i = (r_i, s_i)$ to be the pair consisting of the position and spin for the $i^{th}$ electron; $x$ will denote the entire ordered list $(x_1, \dots, x_n)$, with corresponding definitions for $r$ and $s$. We specify wavefunctions as $\psi(x)$; due to the fact that electrons are Fermions, valid wavefunctions must be antisymmetric, that is if $\pi \in \mathbb{S}_n$ is a permutation, then

$$\psi(\pi x) = (-1)^{\pi} \psi(x) \tag{1}$$

where as usual, $(-1)^{\pi}$ is shorthand for $(-1)^{N(\pi)}$ where $N(\pi)$ is the minimal number of flips to produce $\pi$.

Let $R_I$ and $Z_I$ denote the position and atomic number of the $I^{th}$ nucleus, and let the Laplacian for the $i^{th}$ electron be $\Delta_i = \frac{\partial^2}{\partial r_{i1}^2} + \frac{\partial^2}{\partial r_{i2}^2} + \frac{\partial^2}{\partial r_{i3}^2}$; then the Hamiltonian is given by

$$H = -\frac{1}{2} \underbrace{\sum_i \Delta_i}_{\Delta} + \underbrace{\sum_{i>j} \frac{1}{\|r_i - r_j\|} - \sum_{iI} \frac{Z_I}{\|r_i - R_I\|} + \sum_{I>J} \frac{Z_I Z_J}{\|R_I - R_J\|}}_{V(x)} = -\tfrac{1}{2}\Delta + V(x) \tag{2}$$

Our goal is to compute the ground state wavefunction, which we denote as $\psi_0(x)$, and the corresponding ground state energy $E_0$. They may be computed using the variational principle:

$$\psi_0 = \operatorname*{argmin}_{\psi \in \Psi} \frac{\langle \psi | H | \psi \rangle}{\langle \psi | \psi \rangle} \qquad \text{and} \qquad E_0 = \frac{\langle \psi_0 | H | \psi_0 \rangle}{\langle \psi_0 | \psi_0 \rangle} \tag{3}$$

where $\Psi$ is the set of all possible valid wavefunctions, and $H$ is the Hamiltonian. If we specify the wavefunction ansatz as a neural network with parameters $\theta$, this becomes

$$\theta^* = \operatorname*{argmin}_{\theta} \frac{\langle \psi(\cdot;\theta) | H | \psi(\cdot;\theta) \rangle}{\langle \psi(\cdot;\theta) | \psi(\cdot;\theta) \rangle} \qquad \text{and} \qquad E^* = \frac{\langle \psi(\cdot;\theta^*) | H | \psi(\cdot;\theta^*) \rangle}{\langle \psi(\cdot;\theta^*) | \psi(\cdot;\theta^*) \rangle} \geq E_0 \tag{4}$$

That is, we compute an upper bound $E^*$ to the ground state energy $E_0$. The more expressive the ansatz, the tighter the bound will be.

Variational Monte Carlo   The issue with the formulation to this point is the need to compute the inner products in Equations (3) and (4), which correspond to very high-dimensional integrals. A standard solution to this problem is based on a Monte Carlo scheme. To begin with, let us define the local energy as

$$\mathcal{E}(x) \equiv \frac{H\psi(x)}{\psi(x)} = -\frac{\Delta\psi(x)}{2\psi(x)} + V(x) \qquad \text{and} \qquad \mathcal{E}_r(x) = \text{Real}\{\mathcal{E}(x)\} \tag{5}$$

In this case, one can simplify the minimand in Equation (3) (see Appendix A) as

$$\frac{\langle \psi | H | \psi \rangle}{\langle \psi | \psi \rangle} \,=\, \mathbb{E}_{x \sim \rho(\cdot)}\left[\mathcal{E}_r(x)\right] \,\approx\, \frac{1}{K} \sum_{k=1}^{K} \mathcal{E}_r\left(x^{(k)}\right) \qquad \text{with} \qquad \rho(x) = \frac{|\psi(x)|^2}{\langle \psi | \psi \rangle} \tag{6}$$

where the $x^{(k)}$ are sampled from $\rho(x) = |\psi(x)|^2 / \langle \psi | \psi \rangle$.
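To make the estimator in Equation (6) concrete, the following Python sketch applies it to a deliberately simple case that is not taken from this paper: a single electron in a hydrogen atom ($Z=1$, nucleus at the origin) with the trial wavefunction $\psi_\alpha(r) = e^{-\alpha \|r\|}$. Under these assumptions the local energy of Equation (5) works out to $-\alpha^2/2 + (\alpha - 1)/\|r\|$, and $\rho \propto e^{-2\alpha\|r\|}$ can be sampled directly, since the radius follows a Gamma distribution with shape 3 and scale $1/(2\alpha)$. The function names and the parameter $\alpha$ are illustrative; with a single electron, antisymmetry plays no role.

```python
import numpy as np


def sample_positions(alpha, num_samples, rng):
    """Draw exact samples from rho(r) proportional to exp(-2 * alpha * |r|) in 3D."""
    # Radial part: density proportional to radius^2 * exp(-2 * alpha * radius).
    radius = rng.gamma(shape=3.0, scale=1.0 / (2.0 * alpha), size=num_samples)
    # Angular part: uniform direction on the unit sphere.
    direction = rng.normal(size=(num_samples, 3))
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)
    return radius[:, None] * direction


def local_energy(positions, alpha):
    """Local energy H psi / psi for psi = exp(-alpha * |r|) and V = -1 / |r|."""
    radius = np.linalg.norm(positions, axis=1)
    return -0.5 * alpha**2 + (alpha - 1.0) / radius


def vmc_energy(alpha, num_samples=200_000, seed=0):
    """Monte Carlo average of the local energy, as in Equation (6)."""
    rng = np.random.default_rng(seed)
    positions = sample_positions(alpha, num_samples, rng)
    return float(np.mean(local_energy(positions, alpha)))


if __name__ == "__main__":
    for alpha in (0.8, 1.0, 1.2):
        print(alpha, vmc_energy(alpha))
```

At $\alpha = 1$ the trial function is the exact ground state, so every sample has local energy $-1/2$ and the estimate has zero variance. For other values of $\alpha$ the estimate approaches $\alpha^2/2 - \alpha$, which lies above $-1/2$, consistent with the upper bound in Equation (4).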

2.2 The General Approach

As enumerated in Section 1, a number of recent works have followed the above approach using a variety of neural networks as the ansatz for the wavefunction $\psi(\cdot;\theta)$. In order to do so, one must be able to sample from $\rho(x;\theta) = |\psi(x;\theta)|^2 / \langle \psi | \psi \rangle$; and as the networks are quite general, the only feasible method for sampling is a Markov Chain Monte Carlo technique such as Langevin Monte Carlo (Umrigar et al., 1993) or any of several variations. These techniques can be time-consuming, as each sample is itself the solution of a stochastic differential equation as time goes to infinity.

A solution to this problem presents itself if we can somehow specify a wavefunction $\psi(x)$ which is easy to sample from. We are interested in wavefunctions which satisfy the following three properties:

  1. (W1) There is an explicit functional form for the wavefunction $\psi(x)$.

  2. (W2) $\psi$ is antisymmetric.

  3. (W3) We can sample non-iteratively (in constant time) from $|\psi(\cdot)|^2$.

The first two properties are necessary for any form of Variational Monte Carlo: (W1) allows us to evaluate the local energy $\mathcal{E}_r$ in (5) for use in (6); and (W2) is required for valid electronic (Fermionic) wavefunctions. But (W3) is the new ingredient: if we have a family of wavefunctions $\psi$ satisfying (W1)-(W3), then solving the minimization in (4) via the Monte Carlo approach in (6) will be considerably accelerated, as each sample will only require constant time to generate. We add a fourth property, which is not strictly necessary but is both desirable and will prove useful:

  4. (W4) $\psi$ is normalized, that is $\int |\psi(x)|^2 dx = 1$.

It turns out that generating such wavefunctions is possible using the following procedure:

Theorem 1.

Let $\rho(\cdot)$ be a probability density function which we can sample from in constant time. Let $\rho(\cdot)$ satisfy two additional properties:

  1. (D1) $\rho(x)$ is symmetric: $\rho(\pi x) = \rho(x)$ for all permutations $\pi \in \mathbb{S}_n$.

  2. (D2) $\rho(x) = 0$ if $x_i = x_j$ for any $i \neq j$.

Finally, let $\kappa(x)$ be a complex function which satisfies $|\kappa(x)| = 1 \,\, \forall x$, and is nearly antisymmetric:

$$\kappa(\pi x) = \begin{cases} (-1)^{\pi} \kappa(x) & \text{if } x_i \neq x_j \text{ for all } i \neq j \\ \bar{\kappa} & \text{otherwise} \end{cases} \tag{7}$$

where $\bar{\kappa} \in \mathbb{C}$ is an arbitrary value with $|\bar{\kappa}| = 1$. Then $\psi$ satisfies (W1)-(W4) if and only if $\psi$ can be written as $\psi(x) = \kappa(x) \sqrt{\rho(x)}$ with $\kappa$ and $\rho$ satisfying the above-stated properties.

Proof: See Appendix B.

The general idea expressed in Theorem 1 is that we can build the wavefunction $\psi$ out of an easy-to-sample-from density function satisfying additional properties (D1)-(D2); and a nearly antisymmetric phase function $\kappa$. In what follows, we will show how to construct both of these ingredients. But before doing so, we take a short detour to address the most important practical scenario, that of fixed spin multiplicity.

2.3 Fixed Spin Multiplicity

Notation   As in most approaches to this problem, we assume that the spin multiplicity of the molecule is specified, which is equivalent to fixing the number of spin up and spin down electrons, denoted $n_u$ and $n_d$ respectively, with $n_u + n_d = n$. Define the canonical spin vector to be given by $\bar{s} = [\uparrow, \dots, \uparrow, \downarrow, \dots, \downarrow]$, i.e. the first $n_u$ are $\uparrow$, the last $n_d$ are $\downarrow$. We let the sets of indices of up and down spin electrons for the canonical spin vector be denoted by $\mathcal{N}_u = \{1, \dots, n_u\}$ and $\mathcal{N}_d = \{n_u + 1, \dots, n\}$. Finally, we will be interested in the subgroup of permutations in which a permutation is applied separately to spin-up and spin-down electrons. We denote this subgroup by

$$\mathbb{G} \equiv \mathbb{S}_{\mathcal{N}_u} \times \mathbb{S}_{\mathcal{N}_d} \quad\quad (\mathbb{G} \text{ is a subgroup of } \mathbb{S}_n) \tag{8}$$

Specification of the Density   In the case of fixed spin multiplicity, the specification of the density $\rho(x)$ is simplified:

Theorem 2.

Given a configuration $x = (r, s)$, let a permutation which maps the spin vector $s$ to the canonical spin vector $\bar{s}$ be given by $\bar{\pi}_s$, i.e. $\bar{s} = \bar{\pi}_s s$. Let $\bar{\rho}(r)$ be a density function on electron positions (i.e. no spins) satisfying

  1. (R1) $\bar{\rho}$ is $\mathbb{G}$-invariant: $\bar{\rho}(\pi r) = \bar{\rho}(r)$ for all $\pi \in \mathbb{G}$.

  2. (R2) $\bar{\rho}(r) = 0$ if $r_i = r_j$ for some $i \neq j$ with $i, j \in \mathcal{N}_u$ or $i, j \in \mathcal{N}_d$.

A density $\rho(x) = \rho(r, s)$ satisfies conditions (D1)-(D2) in Theorem 1 if and only if it may be written as $\rho(r, s) = \bar{\rho}(\bar{\pi}_s r)$ for a density $\bar{\rho}(r)$ satisfying conditions (R1) and (R2).

Proof: See Appendix C.

To summarize: in the case of fixed spin multiplicity, specifying a wavefunction $\psi$ satisfying our desired conditions (W1)-(W4) is equivalent to specifying a density $\bar{\rho}(r)$ satisfying conditions (R1)-(R2); and then applying the transformations given in Theorems 1 and 2 to map from $\bar{\rho}$ to $\psi$ (we have for the moment ignored the issue of the phase $\kappa$, which we return to in Sections 3.6 and 4.1). Therefore, henceforth we will focus exclusively on specifying densities $\bar{\rho}(r)$ satisfying conditions (R1)-(R2). To avoid unnecessary notational complexity, we will drop the bars and simply write $\rho(r)$.
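The lifting $\rho(r, s) = \bar{\rho}(\bar{\pi}_s r)$ of Theorem 2 can be sketched in a few lines of Python. The sketch below is illustrative only: spins are encoded as $+1$ (up) and $-1$ (down), the permutation $\bar{\pi}_s$ is obtained from a stable sort, and `toy_rho_bar` is a hypothetical, unnormalized stand-in for $\bar{\rho}$ (a Gaussian envelope times squared same-spin distances) chosen only because it satisfies (R1) and (R2). Note that $\bar{\pi}_s$ is not unique, but any two choices differ by an element of $\mathbb{G}$, so by (R1) the value of $\rho(r, s)$ does not depend on the choice.

```python
import numpy as np


def canonical_permutation(spins):
    """Indices that reorder spins to the canonical form: all up (+1), then all down (-1)."""
    spins = np.asarray(spins)
    # Stable sort keeps the relative order inside each spin block.
    return np.argsort(-spins, kind="stable")


def toy_rho_bar(positions, n_up):
    """Hypothetical unnormalized density on positions satisfying (R1) and (R2)."""
    value = np.exp(-np.sum(positions**2))
    # Squared distances between same-spin pairs: invariant under permutations
    # inside each block, and zero when two same-spin electrons coincide.
    for block in (positions[:n_up], positions[n_up:]):
        for i in range(len(block)):
            for j in range(i + 1, len(block)):
                value *= np.sum((block[i] - block[j]) ** 2)
    return value


def lifted_density(positions, spins):
    """rho(r, s) = rho_bar(pi_s r): reorder to canonical spins, then evaluate rho_bar."""
    positions = np.asarray(positions, dtype=float)
    spins = np.asarray(spins)
    n_up = int(np.sum(spins == 1))
    order = canonical_permutation(spins)
    return toy_rho_bar(positions[order], n_up)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r = rng.normal(size=(5, 3))
    s = np.array([1, -1, 1, -1, 1])
    perm = rng.permutation(5)
    # (D1): permuting positions and spins together leaves the density unchanged.
    print(lifted_density(r, s), lifted_density(r[perm], s[perm]))
```

In this sketch, two electrons of opposite spin may occupy the same position without forcing the density to zero, whereas two electrons of the same spin may not, which mirrors the distinction between (D2) on $x$ and (R2) on $r$.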

3 Using Normalizing Flows to Construct the Wavefunction Ansatz

3.1 Sufficient Properties of the Normalizing Flow’s Base Density and Transformation

Our goal is to use a normalizing flow to construct the density $\rho(r)$. Let $D$ be the ambient dimension (i.e. $D = 3$) and $n$ be the number of electrons. The relevant vectors will live in the space $\mathbb{R}^{Dn}$ construed as the Cartesian product $\mathbb{R}^D \times \dots \times \mathbb{R}^D$ (which is of course isomorphic to $\mathbb{R}^{Dn}$). A normalizing flow will consist of two ingredients: (1) a base random variable $z$, which lives in $\mathbb{R}^{Dn}$, and is described by the density $\rho_z(z)$; (2) an invertible transformation $T: \mathbb{R}^{Dn} \to \mathbb{R}^{Dn}$, such that $r = T(z)$. In this case, the density $\rho(r)$ is the push-forward of $\rho_z$ along $T$, and is given by the change of variables formula

$$\rho(r) = \rho_z(T^{-1}(r)) \, |\det J_{T^{-1}}(r)| \tag{9}$$

Recall that we would like our density $\rho(r)$ to satisfy conditions (R1)-(R2) laid out in Theorem 2. The following theorem establishes conditions for this to occur:

Theorem 3.

Suppose that we have a normalizing flow, whose base density $\rho_z$ satisfies properties (R1) and (R2) from Theorem 2, and whose transformation $T$ is $\mathbb{G}$-equivariant. Then the density resulting from the normalizing flow will satisfy properties (R1) and (R2).

Proof: See Appendix D.

Armed with this key result, we now set out to design the base density $\rho_z$ and transformation $T$ which satisfy the conditions of Theorem 3.
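As a numerical illustration of Theorem 3 and the change of variables formula (9), the following Python sketch uses what is arguably the simplest $\mathbb{G}$-equivariant map: each electron is scaled and shifted by parameters that depend only on its spin species. This is not one of the flows constructed in this paper; the map, the parameter names `log_scale` and `shift`, and the unnormalized `toy_base_density` are all assumptions made for illustration. Because the map acts identically on every electron of a given spin, permuting electrons within a spin block commutes with it, and its Jacobian is diagonal.

```python
import numpy as np


def toy_base_density(z, n_up):
    """Hypothetical unnormalized base density satisfying (R1) and (R2)."""
    value = np.exp(-np.sum(z**2))
    for block in (z[:n_up], z[n_up:]):
        for i in range(len(block)):
            for j in range(i + 1, len(block)):
                value *= np.sum((block[i] - block[j]) ** 2)
    return value


def flow_inverse(r, n_up, log_scale, shift):
    """Inverse of T(z)_i = exp(log_scale[c_i]) * z_i + shift[c_i], with c_i the spin species.

    log_scale has shape (2,), shift has shape (2, D); species 0 is up, 1 is down.
    Returns z = T^{-1}(r) and log |det J_{T^{-1}}(r)|.
    """
    species = (np.arange(len(r)) >= n_up).astype(int)
    z = (r - shift[species]) * np.exp(-log_scale[species])[:, None]
    # Diagonal Jacobian: each of the D coordinates of electron i is scaled
    # by exp(-log_scale[c_i]).
    log_abs_det = -r.shape[1] * np.sum(log_scale[species])
    return z, log_abs_det


def flow_density(r, n_up, log_scale, shift):
    """Push-forward density from Equation (9)."""
    z, log_abs_det = flow_inverse(r, n_up, log_scale, shift)
    return toy_base_density(z, n_up) * np.exp(log_abs_det)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_up = 3
    r = rng.normal(size=(5, 3))
    log_scale = np.array([0.3, -0.2])
    shift = rng.normal(size=(2, 3))
    perm = np.array([2, 0, 1, 4, 3])  # permutes up and down blocks separately
    print(flow_density(r, n_up, log_scale, shift),
          flow_density(r[perm], n_up, log_scale, shift))
```

The two printed values agree up to floating-point error, as (R1) requires, and setting two same-spin positions equal drives the density to zero, as (R2) requires. A permutation that mixes the two spin blocks would in general change the value, since the map is only equivariant to $\mathbb{G}$ and not to all of $\mathbb{S}_n$.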

3.2 The Base Density via Determinantal Point Processes

In most cases in machine learning, the base density for a normalizing flow is taken to be a standard distribution, most often a Gaussian. In our case, we require that the base density have certain special properties, namely (R1) and (R2) from Theorem 2. It turns out that Determinantal Point Processes (DPPs) have just the properties we require. In particular, we are interested in the class of DPPs known as Projection DPPs (Gautier et al., 2019; Lavancier et al., 2015), which can be specified as follows. We will let y𝑦yitalic_y specify a generic point in Dsuperscript𝐷\mathbb{R}^{D}blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT. Let hk:D:subscript𝑘superscript𝐷h_{k}:\mathbb{R}^{D}\to\mathbb{R}italic_h start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT : blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT → blackboard_R for k=1,,n𝑘1𝑛k=1,\dots,nitalic_k = 1 , … , italic_n be a set of n𝑛nitalic_n functions which are orthogonal, that is hi,hj=Dhi(y)hj(y)𝑑y=δijsubscript𝑖subscript𝑗subscriptsuperscript𝐷subscript𝑖𝑦subscript𝑗𝑦differential-d𝑦subscript𝛿𝑖𝑗\langle h_{i},h_{j}\rangle=\int_{\mathbb{R}^{D}}h_{i}(y)h_{j}(y)dy=\delta_{ij}⟨ italic_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ⟩ = ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_y ) italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_y ) italic_d italic_y = italic_δ start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT. 
Let $H(y)$ be the column vector composed by stacking the individual functions $h_i(y)$ and define the kernel function as $K(y, y') = H(y)^T H(y')$. Then for a given collection of $n$ points in $\mathbb{R}^D$, that is $r = (r_1, \dots, r_n)$, we define the $n \times n$ kernel matrix $\mathbf{K}_n(r)$, from which the density of the Projection DPP may be specified:

$$\mathbf{K}_n(r) = \begin{bmatrix} K(r_1, r_1) & \dots & K(r_1, r_n) \\ \vdots & \ddots & \vdots \\ K(r_n, r_1) & \dots & K(r_n, r_n) \end{bmatrix} \qquad \Rightarrow \qquad \rho_{dpp}(r; n) = \frac{1}{n!} \det \mathbf{K}_n(r) \tag{10}$$

Since $\mathbf{K}_n(r)$ is positive semi-definite, its determinant is non-negative, so that $\rho_{dpp}(r; n)$ is non-negative, as desired. A proof that $\rho_{dpp}(r; n)$ is properly normalized (i.e. integrates to $1$) can be found, for example, in Proposition 2.10 of (Johansson, 2006).
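
As a brief worked illustration (our own, not part of the original derivation), write $M_{ik} = h_k(r_i)$ for the $n \times n$ matrix of function values. Then $\mathbf{K}_n(r) = M M^T$, so the density is the square of a Slater-type determinant; the case $n = 2$ is written out explicitly:

```latex
\rho_{dpp}(r; n) = \frac{1}{n!} \det\left( M M^T \right)
                 = \frac{1}{n!} \left( \det \left[ h_k(r_i) \right]_{i,k=1}^{n} \right)^2 ,
\qquad
\rho_{dpp}\big( (r_1, r_2); 2 \big)
  = \frac{1}{2} \Big( h_1(r_1) h_2(r_2) - h_2(r_1) h_1(r_2) \Big)^2 .
```

In this form it is immediate that the density is non-negative, is unchanged when $r_1$ and $r_2$ are swapped, and vanishes when $r_1 = r_2$.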

Given the notion of a Projection DPP, we may define the base density as follows. As above, let the base random variable be $z$, where $z$ can be broken into spin-up and spin-down pieces, denoted $z_u$ and $z_d$. (Specifically, $z_u$ and $z_d$ are the parts of $z$ corresponding to electrons in $\mathcal{N}_u$ and $\mathcal{N}_d$, respectively.) The base density can then be constructed by taking

$$\rho_z(z) = \rho_{dpp}(z_u; n_u) \, \rho_{dpp}(z_d; n_d) \tag{11}$$

That is, $z_u$ and $z_d$ are drawn from two independent Projection DPPs. We then have the following theorem:

Theorem 4.

Let $\rho_z$ be the density specified in Equation (11). Then $\rho_z$ satisfies conditions (R1) and (R2) from Theorem 2.

Proof: See Appendix E.
We therefore have an explicit form for the base density from Equations (10) and (11). Furthermore, sampling from the base density amounts to sampling from two independent Projection DPPs. A sampling procedure for Projection DPPs is specified in Appendix F.
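
To make Equations (10) and (11) concrete, the following is a minimal sketch in plain Python. It assumes, purely for illustration, that $D = 1$ and that the $h_k$ are the first few Hermite functions (which are orthonormal on the real line); the function names `hermite_functions`, `rho_dpp` and `rho_base` are hypothetical and do not come from the paper.

```python
import math


def hermite_functions(y):
    """First three orthonormal Hermite functions on the real line (assumed h_k)."""
    g = math.pi ** -0.25 * math.exp(-0.5 * y * y)
    return [g, math.sqrt(2.0) * y * g, (2.0 * y * y - 1.0) / math.sqrt(2.0) * g]


def det(m):
    """Determinant by Laplace expansion; adequate for the tiny n used here."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def kernel(y, y_prime, n):
    """K(y, y') = H(y)^T H(y'), built from the first n functions."""
    hy, hyp = hermite_functions(y)[:n], hermite_functions(y_prime)[:n]
    return sum(a * b for a, b in zip(hy, hyp))


def rho_dpp(r, n):
    """Equation (10): det K_n(r) / n! for a list r of n points."""
    assert len(r) == n
    K = [[kernel(ri, rj, n) for rj in r] for ri in r]
    return det(K) / math.factorial(n)


def rho_base(z_up, z_down):
    """Equation (11): product of two independent Projection DPP densities."""
    return rho_dpp(z_up, len(z_up)) * rho_dpp(z_down, len(z_down))


z_up, z_down = [0.3, -1.1], [0.7]
base = rho_base(z_up, z_down)
swapped = rho_base(z_up[::-1], z_down)      # permute the two spin-up electrons
coincident = rho_base([0.3, 0.3], z_down)   # two spin-up electrons at one point
```

In this toy setting `swapped` agrees with `base`, and `coincident` is zero, reflecting the determinantal structure of the density. Sampling is not covered here; the procedure of Appendix F would be needed for that.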

3.3 $\mathbb{G}$-Equivariant Layers

As noted in Section 3, we require the normalizing flow transformation to be $\mathbb{G}$-equivariant. Of course, chaining together many layers which are each $\mathbb{G}$-equivariant results in an overall transformation which is also $\mathbb{G}$-equivariant. Now, suppose that a particular layer $\ell$ can be written as

$$r^{\ell+1} = T^\ell(r^\ell) \tag{12}$$

where $r^\ell = (r_1^\ell, \dots, r_n^\ell)$ and likewise for $r^{\ell+1}$. We will need to see the action on the spin-up and spin-down electrons separately, so we denote $r_u^\ell = (r_i^\ell)_{i \in \mathcal{N}_u}$ and $r_d^\ell = (r_i^\ell)_{i \in \mathcal{N}_d}$; and we may write

$$r_u^{\ell+1} = T_u^\ell(r_u^\ell, r_d^\ell) \quad \text{and} \quad r_d^{\ell+1} = T_d^\ell(r_u^\ell, r_d^\ell) \tag{13}$$

For notational convenience, we use $\alpha \in \{u, d\}$ to denote the spin, and the complement of the spin is given by $\hat{\alpha}$ (i.e. if $\alpha = u$ then $\hat{\alpha} = d$ and vice-versa). Then we have the following theorem:

Theorem 5.

The transformation $T^\ell$ is $\mathbb{G}$-equivariant if and only if

$$T_\alpha^\ell(\pi_\alpha r_\alpha^\ell, \pi_{\hat{\alpha}} r_{\hat{\alpha}}^\ell) = \pi_\alpha T_\alpha^\ell(r_\alpha^\ell, r_{\hat{\alpha}}^\ell) \qquad \alpha \in \{u, d\} \tag{14}$$

That is, $T_\alpha^\ell$ is equivariant with respect to permutations of $r_\alpha^\ell$, and invariant with respect to permutations of $r_{\hat{\alpha}}^\ell$.

Proof: See Appendix G.
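
The condition in Equation (14) can be checked numerically for any candidate layer. The sketch below is illustrative only: `toy_layer` is a hypothetical layer built from pooled means (so that it is equivariant in its own spin and invariant in the other), and `broken_layer` is a deliberate counter-example; neither is the architecture of Appendix K.

```python
import math


def toy_layer(r_alpha, r_other):
    """Hypothetical T_alpha: per-electron update using only symmetric pooled means."""
    D = len(r_alpha[0])
    mean_same = [sum(p[k] for p in r_alpha) / len(r_alpha) for k in range(D)]
    mean_other = [sum(p[k] for p in r_other) / len(r_other) for k in range(D)]
    return [tuple(p[k] + 0.5 * math.tanh(mean_same[k] - p[k]) - 0.2 * mean_other[k]
                  for k in range(D))
            for p in r_alpha]


def broken_layer(r_alpha, r_other):
    """Counter-example: singles out whichever electron happens to be listed first."""
    first = r_alpha[0]
    return [tuple(x + 0.1 * f for x, f in zip(p, first)) for p in r_alpha]


def permute(points, perm):
    """Apply a permutation: output position i holds points[perm[i]]."""
    return [points[j] for j in perm]


def satisfies_eq14(layer, r_alpha, r_other, perm_alpha, perm_other, tol=1e-12):
    """Compare both sides of Equation (14) for one pair of permutations."""
    lhs = layer(permute(r_alpha, perm_alpha), permute(r_other, perm_other))
    rhs = permute(layer(r_alpha, r_other), perm_alpha)
    return all(abs(x - y) <= tol for p, q in zip(lhs, rhs) for x, y in zip(p, q))


r_up = [(0.1, 0.2, 0.3), (1.0, -0.5, 0.4), (-0.7, 0.9, 0.0)]
r_down = [(0.3, 0.3, -1.2), (2.0, 0.1, 0.5)]
ok = satisfies_eq14(toy_layer, r_up, r_down, [2, 0, 1], [1, 0])
bad = satisfies_eq14(broken_layer, r_up, r_down, [2, 0, 1], [1, 0])
```

A single pair of permutations is of course not a proof; the check is meant as a quick sanity test during development.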

We now show how to specify continuous and discrete normalizing flows satisfying Theorem 5.

3.4 Continuous Normalizing Flows

According to Theorem 3, we are required to find a transformation which is $\mathbb{G}$-equivariant. We now show this can be achieved via a continuous normalizing flow. We specify this flow via the ordinary differential equation (ODE)

$$\frac{dv}{dt} = \Gamma_t(v), \quad \text{with} \quad v(0) = z \sim \rho_z(\cdot) \quad \text{and} \quad r = v(1) \tag{15}$$

That is, the transformation $r = T(z)$ is derived as follows: the initial condition is sampled from the base density, and $r$ is obtained by integrating the ODE forward to time $t = 1$. The $t$-dependence of $\Gamma$ is indicated via a subscript for notational convenience. We then have the following theorem:

Theorem 6.

Let the transformation $r = T(z)$ be specified as in Equation (15). Then $T$ is $\mathbb{G}$-equivariant if $\Gamma_t$ is $\mathbb{G}$-equivariant for all $t$.

Proof: See Appendix H.
It therefore suffices to design a $\mathbb{G}$-equivariant function $\Gamma_t$. Let us break this down by spin: from Theorem 5, we know that this implies that for all $t$, we have that $\Gamma_t(\pi_\alpha r_\alpha, \pi_{\hat{\alpha}} r_{\hat{\alpha}}) = \pi_\alpha \Gamma_t(r_\alpha, r_{\hat{\alpha}})$ for $\alpha \in \{u, d\}$. We show in Appendix K how to implement a layer of $\Gamma$ with a combination of multihead attention, fully connected layers, and linear projections ($\Gamma$ can be composed of many such layers).
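
The content of Theorem 6 can be illustrated with a small numerical experiment. The sketch below makes two assumptions that are not in the paper: the vector field `gamma` is a hypothetical hand-written equivariant function (not the attention-based network of Appendix K), and the ODE of Equation (15) is integrated with a fixed-step forward Euler scheme rather than an adaptive solver.

```python
import math


def gamma(t, v_up, v_down):
    """Hypothetical G-equivariant field: per-electron terms plus symmetric means."""
    def field(own, other):
        D = len(own[0])
        m_own = [sum(p[k] for p in own) / len(own) for k in range(D)]
        m_oth = [sum(p[k] for p in other) / len(other) for k in range(D)]
        return [[math.tanh(m_own[k] - p[k]) + 0.3 * math.cos(t) * m_oth[k]
                 for k in range(D)]
                for p in own]
    return field(v_up, v_down), field(v_down, v_up)


def flow(z_up, z_down, steps=200):
    """Forward Euler approximation of r = v(1) in Equation (15)."""
    v_up = [list(p) for p in z_up]
    v_down = [list(p) for p in z_down]
    dt = 1.0 / steps
    for s in range(steps):
        g_up, g_down = gamma(s * dt, v_up, v_down)
        v_up = [[x + dt * g for x, g in zip(p, gp)] for p, gp in zip(v_up, g_up)]
        v_down = [[x + dt * g for x, g in zip(p, gp)] for p, gp in zip(v_down, g_down)]
    return v_up, v_down


def max_diff(a, b):
    """Largest coordinate-wise discrepancy between two lists of points."""
    return max(abs(x - y) for p, q in zip(a, b) for x, y in zip(p, q))


z_up = [(0.1, 0.2, 0.3), (1.0, -0.5, 0.4), (-0.7, 0.9, 0.0)]
z_down = [(0.3, 0.3, -1.2), (2.0, 0.1, 0.5)]
perm_up, perm_down = [2, 0, 1], [1, 0]

out_up, out_down = flow(z_up, z_down)
p_up, p_down = flow([z_up[j] for j in perm_up], [z_down[j] for j in perm_down])

# T(pi z) should match pi T(z), separately for each spin species.
err_up = max_diff(p_up, [out_up[j] for j in perm_up])
err_down = max_diff(p_down, [out_down[j] for j in perm_down])
```

Both `err_up` and `err_down` should be at the level of floating-point round-off, since every Euler step is itself an equivariant map.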

Continuous normalizing flows are elegant; however, they can present some numerical difficulties. In particular, the issue of ODE stiffness frequently arises in deep learning pipelines involving continuous normalizing flows. Thus, we now present an alternative method, based on discrete normalizing flows.

3.5 Discrete Normalizing Flows

Our goal is now to design functions $T_u^\ell$ and $T_d^\ell$ which satisfy Equation (14), and for which the overall transformation $T^\ell = (T_u^\ell, T_d^\ell)$ is invertible. The layer we propose here is designed to avoid sacrificing expressivity, especially when compared to many layers which are designed for discrete normalizing flows. In particular, the main issue will be to show that expressivity can be retained even with the joint requirements of invertibility and $\mathbb{G}$-equivariance. We note that the kind of transformation we propose below is not generally used for normalizing flows, as the determinant of its Jacobian is not fast to compute; however, this is not an issue in our case, as the dimensions of the spaces we are dealing with are relatively small. For a more detailed discussion, see Appendix I.

To solve this problem, we introduce the Split Subspace Layer; we note that this layer may be of broader interest in machine learning, independent of the current setting. As before, we take $D$ to represent the ambient spatial dimension; in our case, $D = 3$. A key parameter for the $\ell^{th}$ layer will be the orthogonal matrix $\Lambda_\alpha^\ell \in O(D)$; in particular, we divide this matrix into two pieces

$$\Lambda_\alpha^\ell = [\beta_\alpha^\ell, \xi_\alpha^\ell] \quad \text{with } \beta_\alpha^\ell \in \mathbb{R}^{D \times D_\beta} \quad \text{and} \quad \xi_\alpha^\ell \in \mathbb{R}^{D \times (D - D_\beta)} \tag{16}$$

That is, $\beta_\alpha^\ell$ represents the first $D_\beta$ columns of $\Lambda_\alpha^\ell$, and $\xi_\alpha^\ell$ represents the final $D - D_\beta$ columns. For each electron $i$, we compute the inner products of its coordinates with the columns of $\beta_\alpha^\ell$, i.e.

$$\gamma_{\alpha,i}^\ell = (\beta_\alpha^\ell)^T r_{\alpha,i}^\ell \quad \text{so that } \gamma_{\alpha,i}^\ell \in \mathbb{R}^{D_\beta} \tag{17}$$

We can collect the individual vectors $\gamma_{\alpha,i}^\ell$ into a list $\gamma_\alpha^\ell = (\gamma_{\alpha,i}^\ell)_{i \in \mathcal{N}_\alpha}$. Given this, we define the Split Subspace Layer $T_\alpha^\ell$ on a per-electron basis by

$$r_{\alpha,i}^{\ell+1} = T_{\alpha,i}^\ell(r_\alpha^\ell, r_{\hat{\alpha}}^\ell) = r_{\alpha,i}^\ell + \xi_\alpha^\ell \varphi_{\alpha,i}^\ell(\gamma_\alpha^\ell, \gamma_{\hat{\alpha}}^\ell) \quad \text{with} \quad \varphi_{\alpha,i}^\ell(\gamma_\alpha^\ell, \gamma_{\hat{\alpha}}^\ell) \in \mathbb{R}^{D - D_\beta} \tag{18}$$

where $\varphi_\alpha^\ell$ is a network, and $\varphi_{\alpha,i}^\ell$ is the part of (the output of) $\varphi_\alpha^\ell$ corresponding to the $i^{th}$ electron. The layer is referred to as the Split Subspace Layer due to the fact that its input lies in one subspace of $\mathbb{R}^D$, given by $\beta_\alpha^\ell$; whereas its output is in the orthogonal complement of this subspace, given by $\xi_\alpha^\ell$.

The main ingredient of the layer is the network $\varphi_\alpha^\ell$. We now establish two things: (1) the layer is invertible for any choice of $\varphi_\alpha^\ell$; and (2) conditions on $\varphi_\alpha^\ell$ which achieve $\mathbb{G}$-equivariance of $T_\alpha^\ell$.

Theorem 7.

Let $T^\ell$ be a Split Subspace Layer, as given in Equation (18). Then $T^\ell$ is invertible. In particular, let $\underline{\gamma}_{\alpha,i}^{\ell+1} = (\beta_\alpha^\ell)^T r_{\alpha,i}^{\ell+1}$; then the inverse of the layer is given by

$$r_{\alpha,i}^\ell = r_{\alpha,i}^{\ell+1} - \xi_\alpha^\ell \varphi_{\alpha,i}^\ell(\underline{\gamma}_\alpha^{\ell+1}, \underline{\gamma}_{\hat{\alpha}}^{\ell+1}) \tag{19}$$

Furthermore, the layer $T^\ell$ is $\mathbb{G}$-equivariant if

$$\varphi_\alpha^\ell(\pi_\alpha \gamma_\alpha^\ell, \pi_{\hat{\alpha}} \gamma_{\hat{\alpha}}^\ell) = \pi_\alpha \varphi_\alpha^\ell(\gamma_\alpha^\ell, \gamma_{\hat{\alpha}}^\ell) \tag{20}$$

i.e. if $\varphi_\alpha^\ell(\gamma_\alpha^\ell, \gamma_{\hat{\alpha}}^\ell)$ is equivariant with respect to permutations on $\gamma_\alpha^\ell$ and invariant with respect to permutations on $\gamma_{\hat{\alpha}}^\ell$.

Proof: See Appendix J.
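
The idea behind the invertibility claim can be sketched in one line, using only the orthogonality of $\Lambda_\alpha^\ell$ (the full argument is in Appendix J). Because the columns of $\beta_\alpha^\ell$ and $\xi_\alpha^\ell$ are mutually orthogonal, $(\beta_\alpha^\ell)^T \xi_\alpha^\ell = 0$, and applying $(\beta_\alpha^\ell)^T$ to Equation (18) gives

```latex
\underline{\gamma}_{\alpha,i}^{\ell+1}
  = (\beta_\alpha^\ell)^T r_{\alpha,i}^{\ell+1}
  = (\beta_\alpha^\ell)^T r_{\alpha,i}^{\ell}
    + \underbrace{(\beta_\alpha^\ell)^T \xi_\alpha^\ell}_{=\,0}\,
      \varphi_{\alpha,i}^\ell(\gamma_\alpha^\ell, \gamma_{\hat{\alpha}}^\ell)
  = \gamma_{\alpha,i}^{\ell} .
```

The layer thus leaves the inputs of $\varphi_\alpha^\ell$ untouched for both spins, so the same shift can be recomputed from the output $r^{\ell+1}$ and subtracted, which is exactly Equation (19).
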
The Split Subspace Layer therefore depends on an implementation of the network $\varphi_\alpha^\ell$ which satisfies Equation (20). We show in Appendix K how $\varphi_\alpha^\ell$ can be implemented with a combination of multihead attention, fully connected layers, and linear projections. We specify a more general version of the Split Subspace Layer in Appendix L.
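
A minimal sketch of the Split Subspace Layer of Equations (16) to (19) follows, with $D = 3$ and $D_\beta = 2$. Several pieces are assumptions made for illustration: the orthogonal matrices are obtained by Gram-Schmidt from arbitrary vectors, and `phi` is a hypothetical hand-written stand-in for the network $\varphi_\alpha^\ell$ (chosen to satisfy Equation (20)), not the architecture of Appendix K.

```python
import math


def gram_schmidt(vectors):
    """Orthonormalise linearly independent vectors; result[k] is column k of Lambda."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = sum(x * y for x, y in zip(w, b))
            w = [x - proj * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        basis.append([x / norm for x in w])
    return basis


def project(cols, r):
    """(cols)^T r: inner product of r with each column, as in Equation (17)."""
    return [sum(c * x for c, x in zip(col, r)) for col in cols]


def phi(gam_own, gam_other, out_dim):
    """Hypothetical network: equivariant in gam_own, invariant in gam_other."""
    pooled_own = sum(sum(g) for g in gam_own) / len(gam_own)
    pooled_oth = sum(sum(g) for g in gam_other) / len(gam_other)
    return [[math.tanh(sum(g) + 0.5 * pooled_own - 0.3 * pooled_oth + k)
             for k in range(out_dim)]
            for g in gam_own]


def split_subspace(r_own, r_other, lam_own, lam_other, d_beta, sign):
    """Equation (18) for sign=+1; its inverse, Equation (19), for sign=-1."""
    beta, xi = lam_own[:d_beta], lam_own[d_beta:]
    gam_own = [project(beta, r) for r in r_own]
    gam_other = [project(lam_other[:d_beta], r) for r in r_other]
    shifts = phi(gam_own, gam_other, len(xi))
    # Add (or subtract) xi @ phi_i, a vector in the orthogonal complement of beta.
    return [[x + sign * sum(s * col[k] for s, col in zip(shift, xi))
             for k, x in enumerate(r)]
            for r, shift in zip(r_own, shifts)]


def layer(r_up, r_down, lam_up, lam_down, d_beta=2, sign=1.0):
    """Full layer T = (T_u, T_d); both halves read the same input configuration."""
    new_up = split_subspace(r_up, r_down, lam_up, lam_down, d_beta, sign)
    new_down = split_subspace(r_down, r_up, lam_down, lam_up, d_beta, sign)
    return new_up, new_down


lam_up = gram_schmidt([[1.0, 2.0, 0.5], [0.0, 1.0, -1.0], [1.0, 0.0, 1.0]])
lam_down = gram_schmidt([[0.5, -1.0, 1.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
r_up = [[0.1, 0.2, 0.3], [1.0, -0.5, 0.4], [-0.7, 0.9, 0.0]]
r_down = [[0.3, 0.3, -1.2], [2.0, 0.1, 0.5]]

fwd_up, fwd_down = layer(r_up, r_down, lam_up, lam_down)
back_up, back_down = layer(fwd_up, fwd_down, lam_up, lam_down, sign=-1.0)
```

Here `back_up` and `back_down` should reproduce `r_up` and `r_down` up to round-off, regardless of how `phi` is chosen; only the equivariance property depends on the form of `phi`.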

3.6 Training via SGD

Log Domain: Density   In order to avoid numerical issues, it is best to operate in the log domain. Suppose that

$$\psi(r) = e^{q(r) + i w(r)} \quad \Leftrightarrow \quad q(r) = \tfrac{1}{2} \log \rho(r) \quad \text{and} \quad w(r) = \text{atan2}\left(\kappa_i(r), \kappa_r(r)\right) \tag{21}$$

where $\kappa_r(r)$ and $\kappa_i(r)$ are the real and imaginary parts of the phase $\kappa(r)$, respectively; and atan2 is the “full” arctangent.
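
As a small illustration of Equation (21), the sketch below converts a density value and a phase into the pair $(q, w)$ and back into a wavefunction value. It assumes the phase is supplied as a Python complex number; the helper names `log_domain` and `wavefunction` are hypothetical.

```python
import cmath
import math


def log_domain(rho, kappa):
    """Return (q, w) of Equation (21) from a density value and a complex phase."""
    q = 0.5 * math.log(rho)
    w = math.atan2(kappa.imag, kappa.real)  # four-quadrant arctangent
    return q, w


def wavefunction(q, w):
    """psi = exp(q + i w)."""
    return cmath.exp(complex(q, w))


# A very small density is still a moderate number in the log domain.
q, w = log_domain(rho=1e-200, kappa=complex(-1.0, 0.0))
psi = wavefunction(q, w)
```

Working with `q` directly avoids the underflow that products of many small density factors would otherwise cause.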

The log-density q(r;θ)𝑞𝑟𝜃q(r;\theta)italic_q ( italic_r ; italic_θ ) may be computed for both continuous and discrete normalizing flows, where we now introduce the parameters θ𝜃\thetaitalic_θ of the network explicitly. Consider a sample z𝑧zitalic_z chosen from the base density ρz(z)subscript𝜌𝑧𝑧\rho_{z}(z)italic_ρ start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT ( italic_z ), and in analogy to q(r)𝑞𝑟q(r)italic_q ( italic_r ), define qz(z)=12logρz(z)subscript𝑞𝑧𝑧12subscript𝜌𝑧𝑧q_{z}(z)=\tfrac{1}{2}\log\rho_{z}(z)italic_q start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT ( italic_z ) = divide start_ARG 1 end_ARG start_ARG 2 end_ARG roman_log italic_ρ start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT ( italic_z ). Now, in the case of a continuous normalizing flow, let v(t)𝑣𝑡v(t)italic_v ( italic_t ) satisfy Equation (15); then q(r;θ)𝑞𝑟𝜃q(r;\theta)italic_q ( italic_r ; italic_θ ) can be by computed (Chen et al., 2018) by solving the ODE

$$\frac{da}{dt}=-\tfrac{1}{2}\,\text{Trace}\left(\frac{\partial\Gamma_{t}}{\partial v}(v(t);\theta)\right)\quad\text{with}\quad a(0)=q_{z}(z)\quad\text{and}\quad q(r;\theta)=a(1) \tag{22}$$

which is the continuous analogue of the change of variables formula. In the case of a discrete normalizing flow, fix the following notation: $r^{0}=z$, $r=r^{L+1}$, and $T=T^{L}\circ\dots\circ T^{0}$. Then we may use a logarithmic version of the standard change of variables formula (9):

$$q(r;\theta)=q_{z}\left(T^{-1}(r;\theta)\right)+\frac{1}{2}\sum_{\ell=0}^{L}\log\left|\det J_{(T^{\ell})^{-1}}(r^{\ell+1};\theta)\right| \tag{23}$$
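The two computations of $q(r;\theta)$ can be sketched numerically under strong simplifying assumptions, none of which come from the framework itself: a standard Gaussian base density in place of the DPP-based one, affine layers $T^{\ell}(x)=A_{\ell}x+b_{\ell}$ for the discrete flow, and a linear vector field $\Gamma_t(v)=Av$ for the continuous flow. Since $q=\tfrac{1}{2}\log\rho$, the instantaneous change of variables of Chen et al. (2018), $d\log\rho/dt=-\text{Trace}(\partial\Gamma_t/\partial v)$, contributes half of the trace to $a(t)$. All function names are hypothetical.

```python
import numpy as np

def q_base(z):
    """q_z(z) = 0.5 * log rho_z(z) for a standard Gaussian base (assumed here)."""
    return 0.5 * (-0.5 * z @ z - 0.5 * z.size * np.log(2.0 * np.pi))

def q_discrete(r, layers):
    """Eq. (23) for affine layers T^l(x) = A_l x + b_l, listed as [(A_0, b_0), ...]."""
    x, half_logdets = r, 0.0
    for A, b in reversed(layers):               # undo T^L first and T^0 last
        x = np.linalg.solve(A, x - b)           # apply (T^l)^{-1}
        _, logabsdet = np.linalg.slogdet(A)
        half_logdets -= 0.5 * logabsdet         # Jacobian of the inverse map is A^{-1}
    return q_base(x) + half_logdets

def flow_linear(v0, A, steps=200):
    """RK4 integration of dv/dt = A v over t in [0, 1]; v0 may be a vector or a matrix."""
    v, h = np.array(v0, dtype=float), 1.0 / steps
    for _ in range(steps):
        k1 = A @ v
        k2 = A @ (v + 0.5 * h * k1)
        k3 = A @ (v + 0.5 * h * k2)
        k4 = A @ (v + h * k3)
        v = v + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return v

def q_continuous(z, A, steps=200):
    """Transport z to r = v(1) and integrate a(t) from a(0) = q_z(z) to a(1) = q(r)."""
    r = flow_linear(z, A, steps)
    a, h = q_base(z), 1.0 / steps
    for _ in range(steps):
        # The Jacobian of a linear field is A everywhere, so the trace is constant;
        # a general Gamma_t would need the trace evaluated along the trajectory v(t).
        a -= 0.5 * h * np.trace(A)
    return r, a
```

In both cases the flowed variable is again Gaussian, so the result can be compared against half the logarithm of the exact Gaussian density; for a learned flow no such closed form is available and the formulas above are the only route to $q(r;\theta)$.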

Log Domain: Gradient of the Objective   Recall that our goal in finding an approximation to the ground state wavefunction is to solve the optimization problem in Equation (4). Using Equation (6) and noting that $\langle\psi(\cdot;\theta)|\psi(\cdot;\theta)\rangle=1$ since $\rho(\cdot;\theta)$ is normalized, we may write the objective function to be minimized as

$$\mathcal{L}(\theta)=\langle\psi(\cdot;\theta)|H|\psi(\cdot;\theta)\rangle\,=\,\mathbb{E}_{r\sim\rho(\cdot;\theta)}\left[\mathcal{E}_{r}(r;\theta)\right]\,\approx\,\frac{1}{K}\sum_{k=1}^{K}\mathcal{E}_{r}\left(r^{(k)};\theta\right) \tag{24}$$

with samples $r^{(k)}\sim\rho(\cdot;\theta)$. Then we have the following theorem, which shows that the local energy can be written entirely as a function of $q(r;\theta)$ and the potential $V(r)$, so that the phase $w(r;\theta)$ does not appear; and furthermore gives the gradient of the objective function $\mathcal{L}(\theta)$.

Theorem 8.

The local energy can be written as

$$\mathcal{E}_{r}(r;\theta)=-\tfrac{1}{2}\Delta_{r}q(r;\theta)-\tfrac{1}{2}\|\nabla_{r}q(r;\theta)\|^{2}+V(r) \tag{25}$$

In particular, the local energy is independent of the phase $w(r;\theta)$. Furthermore, let

$$\Omega(r;\theta)=\nabla_{\theta}\mathcal{E}_{r}(r;\theta)+2\mathcal{E}_{r}(r;\theta)\nabla_{\theta}q(r;\theta) \tag{26}$$

Then the gradient of the loss function may be written as

$$\nabla_{\theta}\mathcal{L}(\theta)=\mathbb{E}_{r\sim\rho(\cdot;\theta)}\left[\Omega(r;\theta)\right]\approx\frac{1}{K}\sum_{k=1}^{K}\Omega\left(r^{(k)};\theta\right) \tag{27}$$

with samples $r^{(k)}\sim\rho(\cdot;\theta)$.

Proof: See Appendix M.
Thus, in order to optimize the objective in Equation (24), we may use gradient descent with the gradient estimate in Equation (27). A detailed version of the optimization routine is given in Appendix N.
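Theorem 8 can be checked on a toy problem that is not taken from the paper: the one-dimensional harmonic oscillator with $V(x)=x^{2}/2$ and the Gaussian ansatz $\psi(x;\theta)\propto e^{-\theta x^{2}/2}$, for which $\rho(\cdot;\theta)$ is a zero-mean Gaussian with variance $1/(2\theta)$. In this sketch the derivatives of $q$ are written out by hand (a practical implementation would presumably use automatic differentiation), and exact Gaussian sampling stands in for sampling from the flow. For this toy, $\mathcal{L}(\theta)=\theta/4+1/(4\theta)$, so the exact gradient is $1/4-1/(4\theta^{2})$.

```python
import numpy as np

def local_energy(x, theta):
    """Eq. (25) with V(x) = x^2 / 2 and q(x) = -theta x^2 / 2 + log(theta / pi) / 4."""
    grad_q = -theta * x                  # dq/dx
    lap_q = -theta                       # d^2 q / dx^2
    return -0.5 * lap_q - 0.5 * grad_q**2 + 0.5 * x**2

def omega(x, theta):
    """Eq. (26): grad_theta E_r + 2 E_r grad_theta q."""
    dE_dtheta = 0.5 - theta * x**2
    dq_dtheta = -0.5 * x**2 + 0.25 / theta
    return dE_dtheta + 2.0 * local_energy(x, theta) * dq_dtheta

def estimate(theta, K=200_000, seed=0):
    """Monte Carlo estimates of Eq. (24) and Eq. (27) from K exact samples of rho."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(0.5 / theta), size=K)
    return local_energy(x, theta).mean(), omega(x, theta).mean()
```

At $\theta=1$ the ansatz is the exact ground state: the local energy equals $1/2$ for every sample and the estimated gradient is zero up to sampling noise.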

4 Further Details: Phase, Cusps, and Induction

4.1 The Phase

Since the Hamiltonian is time-reversal invariant and Hermitian, its eigenvalues are real and its eigenfunctions can be chosen to be real. Since the ground-state wavefunction we are looking for is real, the phase can be taken to belong to the two-element set $\{0,\pi\}$. Given that we now know how to solve for an approximation to the density $\rho_{0}(r)$ corresponding to the ground state wavefunction, we now show one way of assigning the phase so that the resulting ground state wavefunction $\psi_{0}(r)$ is appropriately antisymmetric.

Theorem 9.

Let $\rho_{0}(r)$ be the density for the ground state wavefunction. Let $\prec$ be a strict total order on $\mathbb{R}^{D}$, and define the set

$$\mathcal{R}=\{r=(r_{1},\dots,r_{n}):r_{1}\prec r_{2}\prec\dots\prec r_{n_{u}}\,\,\text{ and }\,\,r_{n_{u}+1}\prec r_{n_{u}+2}\prec\dots\prec r_{n}\} \tag{28}$$

For any $r$ with no coincident electrons, i.e. $r_{i}\neq r_{j}$ for all $i\neq j$, define the permutation $\pi_{\prec}(r)\in\mathbb{G}$ by $\pi_{\prec}(r)\,r\in\mathcal{R}$. Then a valid antisymmetric ground state wavefunction is given by

$$\psi_{0}(r)=\begin{cases}(-1)^{\pi_{\prec}(r)}\sqrt{\rho_{0}(r)}&\text{if }r_{i}\neq r_{j}\,\,\forall i\neq j\\ 0&\text{otherwise}\end{cases} \tag{29}$$

Proof: See Appendix O.
Thus, given the density $\rho_{0}$, we can use Theorem 9 to easily compute the ground state wavefunction $\psi_{0}$. A question remains: what is the strict total order $\prec$? Any choice is valid, but the simplest thing to do is to use lexicographic ordering on the coordinates of the two points in $\mathbb{R}^{D}$ that are being compared.
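The sign assignment of Theorem 9 with the lexicographic order can be sketched as follows. The sketch assumes that a configuration is stored as an array of shape $(n, D)$ whose first $n_u$ rows are the spin-up electrons, and it uses a placeholder density `rho_toy` that is invariant under electron permutations; both conventions and all function names are illustrative assumptions.

```python
import numpy as np

def perm_sign(p):
    """Sign of a permutation given as an index array, from its cycle decomposition."""
    seen = np.zeros(len(p), dtype=bool)
    sign = 1
    for start in range(len(p)):
        length, j = 0, start
        while not seen[j]:
            seen[j] = True
            j = p[j]
            length += 1
        if length > 0 and length % 2 == 0:     # an even-length cycle is an odd permutation
            sign = -sign
    return sign

def lex_argsort(points):
    """Indices sorting the rows of `points` lexicographically (first coordinate first)."""
    return np.lexsort(points.T[::-1])          # np.lexsort treats the last key as primary

def psi0(r, n_up, rho0):
    """Eq. (29): signed square root of the density, zero if two electrons coincide."""
    r = np.asarray(r, dtype=float)
    n = r.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if np.array_equal(r[i], r[j]):
                return 0.0
    # The sorting permutation acts separately on the spin-up and spin-down blocks.
    sign = perm_sign(lex_argsort(r[:n_up])) * perm_sign(lex_argsort(r[n_up:]))
    return sign * np.sqrt(rho0(r))

def rho_toy(r):
    """Placeholder permutation-invariant (unnormalized) density, for illustration only."""
    return float(np.exp(-np.sum(np.asarray(r) ** 2)))
```

Exchanging two electrons of the same spin changes the parity of the sorting permutation and leaves the density unchanged, so `psi0` flips sign, as antisymmetry requires.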

4.2 Incorporating Cusps

Electron-Electron Cusps   Wavefunctions are known to have certain non-smooth properties, known as cusps. In particular, the gradient of the wavefunction should exhibit a discontinuity when two electrons coincide. One way to incorporate such gradient discontinuities is via the introduction of terms which depend on the distance between electrons (Pfau et al., 2020); as the distance is itself a continuous but non-smooth function of the electron positions, using distances can allow us to model such cusps. In the case of the discrete normalizing flow, our goal will be to design a layer which incorporates the inter-electron distances directly. Given the requirements of a normalizing flow, the challenge is to enforce invertibility for such a layer. We have the following result:

Theorem 10.

Let the set of distances be given by $\delta^{\ell}=\left\{\delta_{ij}^{\ell}\right\}_{i<j}$ where $\delta_{ij}^{\ell}=\|r_{i}^{\ell}-r_{j}^{\ell}\|$. Consider a layer of the form

$$r_{i}^{\ell+1}=\Theta^{\ell}(\delta^{\ell};\theta)\,r_{i}^{\ell}+t^{\ell}(\delta^{\ell};\theta)\qquad\text{with }\Theta^{\ell}(\delta^{\ell};\theta)\in O(D)\text{ and }t^{\ell}(\delta^{\ell};\theta)\in\mathbb{R}^{D} \tag{30}$$

Then the layer is both $\mathbb{G}$-equivariant and invertible.

Proof: See Appendix P.
The essence of this layer is to rotate all electrons in a given configuration $r=(r_{1},\dots,r_{n})$ by the same rotation matrix $\Theta^{\ell}$ and to translate them by the same translation vector $t^{\ell}$; the rotation matrix and translation vector are both functions of the configuration $r$ entirely through the distances $\delta^{\ell}$. The latter fact is crucial, as it means that different configurations $r$ are treated differently, which gives the layer expressivity. An implementation of this layer based on a Deep Set architecture (Zaheer et al., 2017) is given in Appendix Q.
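A minimal sketch of a layer of the form (30) for $D=3$ is given below. It is not the Deep Set implementation of Appendix Q: the maps from distances to $\Theta^{\ell}$ and $t^{\ell}$ are hypothetical stand-ins built from symmetric averages of the distances (invariant under all electron permutations, and hence under $\mathbb{G}$), with fixed weight matrices `W` and `U` playing the role of learned parameters. The orthogonal matrix is produced by a Cayley transform of a skew-symmetric matrix, which is one of several possible parameterizations.

```python
import numpy as np

def pairwise_distances(r):
    """All inter-electron distances delta_ij, i < j, for r of shape (n, 3)."""
    i, j = np.triu_indices(r.shape[0], k=1)
    return np.linalg.norm(r[i] - r[j], axis=1)

def rotation_and_shift(delta, W, U):
    """Hypothetical stand-in for the learned maps Theta(delta) and t(delta)."""
    feats = np.array([np.mean(delta), np.mean(delta**2), np.mean(np.exp(-delta))])
    s = np.tanh(W @ feats)                        # three parameters of a skew matrix
    S = np.array([[0.0, -s[2], s[1]],
                  [s[2], 0.0, -s[0]],
                  [-s[1], s[0], 0.0]])
    I = np.eye(3)
    Theta = np.linalg.solve(I - S, I + S)         # Cayley transform, orthogonal
    t = U @ feats
    return Theta, t

def forward(r, W, U):
    """Apply r_i -> Theta r_i + t to every electron (rows of r)."""
    Theta, t = rotation_and_shift(pairwise_distances(r), W, U)
    return r @ Theta.T + t

def inverse(r_next, W, U):
    """Invert the layer using only its output."""
    # A rigid motion preserves all pairwise distances, so delta can be recomputed
    # from the output, which recovers the same Theta and t.
    Theta, t = rotation_and_shift(pairwise_distances(r_next), W, U)
    return (r_next - t) @ Theta
```

The inverse never needs the input configuration, which is the property that makes a distance-dependent layer usable inside a normalizing flow.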

It is also known that the gradient of the wavefunction should exhibit a discontinuity when an electron and nucleus coincide. The treatment is similar, and is given in Appendix R.

4.3 Induction Across Multiple Molecules

In an effort to accelerate the ground state computation, we may try to learn the ground state wavefunctions and energies for an entire class of molecules simultaneously, as in (Gao and Günnemann, 2023; Scherbela et al., 2024; Gerard et al., 2024). In particular, the molecular parameters are given by $R=(R_{1},\dots,R_{N})$, the nuclear positions; and $Z=(Z_{1},\dots,Z_{N})$, the atomic numbers of each nucleus. Then our goal is to learn a function of the form $\psi_{0}(x;R,Z)$, i.e. a ground state wavefunction which is explicitly parameterized by the molecular parameters. This entails computing the density $\rho(r;R,Z)$. However, this latter task is made more complicated by the fact that two new invariances are required:

$$\rho(r;\pi R,\pi Z)=\rho(r;R,Z)\quad\text{for }\pi\in\mathbb{S}_{N}\quad\text{(nuclear permutation invariance)} \tag{31}$$

$$\rho(\tau r;\tau R,Z)=\rho(r;R,Z)\quad\text{for }\tau\in E(D)\quad\text{(joint rigid motion invariance)} \tag{32}$$

We henceforth assume that the nuclei have their center of mass at the origin, i.e. $\bar{R}=\frac{1}{N}\sum_{I=1}^{N}R_{I}=0$; this removes the need to deal with translations, which generally require special (and uninteresting) treatment, e.g. see (Satorras et al., 2021). Thus, Equation (32) becomes

$$\rho(\Theta r;\Theta R,Z)=\rho(r;R,Z)\quad\text{for }\Theta\in O(D)\quad\text{(joint rotation invariance)} \tag{33}$$

We now show that densities satisfying Equations (31) and (33) can be realized via a variation of the continuous normalizing flow we have introduced in Section 3.4:

Theorem 11.

Let $\bar{R}=\frac{1}{N}\sum_{I=1}^{N}R_{I}=0$. Consider a continuous normalizing flow of the form $dv/dt=\Gamma_{t}(v;R,Z)$ with $v(0)=z\sim\rho_{z}(\cdot)$ and $r=v(1)$. Let the function $\Gamma_{t}$ be invariant with respect to nuclear permutations and equivariant with respect to joint rotations, i.e. for all $t$

$$\Gamma_{t}(v;\pi R,\pi Z)=\Gamma_{t}(v;R,Z)\,\,\forall\pi\in\mathbb{S}_{N}\qquad\Gamma_{t}(\Theta v;\Theta R,Z)=\Theta\Gamma_{t}(v;R,Z)\,\,\forall\Theta\in O(D) \tag{34}$$

Furthermore, suppose that the base density is invariant with respect to rotations, $\rho_{z}(\Theta z)=\rho_{z}(z)$ for $\Theta\in O(D)$. Then the resulting density $\rho(r;R,Z)$ satisfies Equations (31) and (33).

Proof: See Appendix S.
First, we note that the base density in Equation (11) can be made invariant to rotations by constructing the relevant Projection DPP from a kernel function $K(y,y^{\prime})=H(y)^{T}H(y^{\prime})$, where the functions $h_{i}(y)$ are derived by taking arbitrary rotationally-invariant functions $\tilde{h}_{i}(y)$ and orthogonalizing them with Gram-Schmidt; e.g. one may use Gaussians of varying bandwidths, $\tilde{h}_{i}(y)=e^{-\|y\|^{2}/\sigma_{i}^{2}}$.
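The Gaussian example can be sketched numerically, under the assumption that $H(y)$ is the vector of the orthonormalized functions $h_{i}(y)$ and that orthogonality is meant in $L^{2}(\mathbb{R}^{D})$. For Gaussians the Gram matrix is available in closed form, $\langle\tilde{h}_{i},\tilde{h}_{j}\rangle=\left(\pi/(\sigma_{i}^{-2}+\sigma_{j}^{-2})\right)^{D/2}$, and applying the inverse of its Cholesky factor is equivalent to running Gram-Schmidt on the $\tilde{h}_{i}$ in order. The function names are hypothetical.

```python
import numpy as np

def orthonormal_radial_basis(sigmas, D=3):
    """Return H(y), with components orthonormal in L2(R^D), built from Gaussians."""
    inv = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    # Closed-form Gram matrix of h~_i(y) = exp(-|y|^2 / sigma_i^2).
    G = (np.pi / (inv[:, None] + inv[None, :])) ** (D / 2.0)
    L = np.linalg.cholesky(G)                  # G = L L^T; L^{-1} performs Gram-Schmidt

    def H(y):
        raw = np.exp(-np.dot(y, y) * inv)      # values h~_i(y), depend on |y| only
        return np.linalg.solve(L, raw)         # values h_i(y)

    return H, G, L

def kernel(H, y, y_prime):
    """K(y, y') = H(y)^T H(y')."""
    return H(y) @ H(y_prime)
```

Because every $h_{i}$ depends on $y$ only through $\|y\|$, the kernel satisfies $K(\Theta y,\Theta y^{\prime})=K(y,y^{\prime})$ for any $\Theta\in O(D)$, which is the ingredient the rotation-invariant base density relies on.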

Now, we turn to the construction of $\Gamma_{t}$. Recall from Theorem 6 that $\Gamma_{t}(\cdot;R,Z)$ must be $\mathbb{G}$-equivariant for all $t$. Furthermore, we have already noted that $\mathbb{G}$-equivariant functions may be constructed using a combination of standard pieces: multihead attention, fully connected layers, and linear projections. It would be nice if we were able to use this result while also incorporating the extra conditions in Equation (32). We now show that this is possible:

Theorem 12.

Let $\phi_{t}(v;R,Z)$ be a function which is $\mathbb{G}$-equivariant with respect to $v$, i.e. $\phi_{t}(gv;R,Z)=g\phi_{t}(v;R,Z)$ for $g\in\mathbb{G}$. Let $\omega_{t}(v;R,Z)$ be a function whose output is itself a rotation, i.e. $\omega_{t}(v;R,Z)\in O(D)$. Let $\omega_{t}$ be $\mathbb{G}$-invariant with respect to $v$, and $O(D)$-equivariant jointly with respect to $v$ and $R$, i.e. $\omega_{t}(\Theta v;\Theta R,Z)=\Theta\omega_{t}(v;R,Z)$. Finally, let both $\phi_{t}$ and $\omega_{t}$ be permutation-invariant jointly with respect to $R$ and $Z$, i.e. $\phi_{t}(v;\pi R,\pi Z)=\phi_{t}(v;R,Z)$ and likewise for $\omega_{t}$. Then the function

$$\Gamma_{t}(v;R,Z)=\zeta\phi_{t}(\zeta^{-1}v;\zeta^{-1}R,Z)\qquad\text{where}\qquad\zeta=\omega_{t}(v;R,Z) \tag{35}$$

satisfies the properties in Equation (34) and is $\mathbb{G}$-equivariant with respect to $v$.

Proof: See Appendix T.
We can use the previously mentioned recipe in Appendix K in order to construct a $\mathbb{G}$-equivariant $\phi_{t}$, with an extra path in the network for the $R,Z$ dependence, based on either a Deep Set or a Transformer architecture with pooling to gain the requisite invariance. The function $\omega_{t}$ can be constructed by using an $E(D)$ Equivariant Graph Neural Network (Satorras et al., 2021) whose output is a rotation matrix, similar to what is done in (Kaba et al., 2023). More detailed information is contained in Appendix U.
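The construction in Equation (35) can be illustrated with a small NumPy sketch for $D=3$. Neither component below is the architecture described above: `frame` is a hypothetical stand-in for $\omega_{t}$ that orthonormalizes three rotation-equivariant vectors (assumed to be linearly independent), and `phi` is a hypothetical stand-in for $\phi_{t}$ with shared per-electron weights, which makes it equivariant to all electron permutations and therefore to $\mathbb{G}$. Configurations are stored as rows, so applying $\zeta^{-1}=\zeta^{T}$ to every electron is a right-multiplication by $\zeta$.

```python
import numpy as np

def frame(v, R, Z):
    """Hypothetical omega_t: orthogonal matrix built from rotation-equivariant vectors."""
    a1 = Z @ R                                    # sum_I Z_I R_I
    a2 = v.mean(axis=0)                           # electron centroid
    a3 = (Z * np.sum(R**2, axis=1)) @ R           # sum_I Z_I |R_I|^2 R_I
    A = np.stack([a1, a2, a3], axis=1)            # the vectors are the columns of A
    Q, U = np.linalg.qr(A)
    return Q * np.sign(np.diag(U))                # fix column signs (Gram-Schmidt convention)

def phi(v, R, Z, W, C):
    """Hypothetical phi_t: electron-permutation equivariant, nuclear-permutation invariant."""
    nuclear = (Z[:, None] * np.tanh(R @ C)).sum(axis=0)    # pooled over the nuclei
    per_electron = np.tanh(v @ W)
    return per_electron + per_electron.mean(axis=0) + nuclear

def gamma(v, R, Z, W, C):
    """Eq. (35) with row storage: v has shape (n, 3) and R has shape (N, 3)."""
    zeta = frame(v, R, Z)
    out = phi(v @ zeta, R @ zeta, Z, W, C)        # phi evaluated in the canonical frame
    return out @ zeta.T                           # rotate the result back
```

Note that `phi` itself has no rotational symmetry; the joint rotation equivariance of `gamma` comes entirely from evaluating it in the frame supplied by `frame`, in the spirit of the canonicalization of (Kaba et al., 2023).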

5 Concluding Remarks, Limitations, and Future Work

We have demonstrated a theoretical framework for efficiently solving the Electronic Schrödinger Equation using normalizing flows. Using these flows allows us to sample efficiently from the wavefunction, thereby side-stepping the need for time-consuming MCMC approaches to sampling. The framework’s construction does not easily admit extensions to either diffusion models (Yang et al., 2023) or flow-matching (Lipman et al., 2022), both of which are very powerful and useful techniques. Future work will focus on adapting the framework to accommodate one or both of these methods.

References

  • Austin et al. (2012) Brian M Austin, Dmitry Yu Zubarev, and William A Lester Jr. Quantum monte carlo and related approaches. Chemical reviews, 112(1):263–288, 2012.
  • Carleo and Troyer (2017) Giuseppe Carleo and Matthias Troyer. Solving the quantum many-body problem with artificial neural networks. Science, 355(6325):602–606, 2017.
  • Cassella et al. (2023) Gino Cassella, Halvard Sutterud, Sam Azadi, ND Drummond, David Pfau, James S Spencer, and W Matthew C Foulkes. Discovering quantum phase transitions with fermionic neural networks. Physical Review Letters, 130(3):036401, 2023.
  • Ceperley and Alder (1986) David Ceperley and Berni Alder. Quantum monte carlo. Science, 231(4738):555–560, 1986.
  • Chen et al. (2018) Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. Advances in neural information processing systems, 31, 2018.
  • Deng et al. (2017) Dong-Ling Deng, Xiaopeng Li, and S Das Sarma. Quantum entanglement in neural network states. Physical Review X, 7(2):021021, 2017.
  • Entwistle et al. (2023) Michael T Entwistle, Zeno Schätzle, Paolo A Erdman, Jan Hermann, and Frank Noé. Electronic excited states in deep variational monte carlo. Nature Communications, 14(1):274, 2023.
  • Foulkes et al. (2001) William MC Foulkes, Lubos Mitas, RJ Needs, and Guna Rajagopal. Quantum monte carlo simulations of solids. Reviews of Modern Physics, 73(1):33, 2001.
  • Gao and Günnemann (2023) Nicholas Gao and Stephan Günnemann. Generalizing neural wave functions. In International Conference on Machine Learning, pages 10708–10726. PMLR, 2023.
  • Gao and Duan (2017) Xun Gao and Lu-Ming Duan. Efficient representation of quantum many-body states with deep neural networks. Nature communications, 8(1):662, 2017.
  • Gautier et al. (2019) Guillaume Gautier, Rémi Bardenet, and Michal Valko. On two ways to use determinantal point processes for monte carlo integration. Advances in Neural Information Processing Systems, 32, 2019.
  • Gerard et al. (2022) Leon Gerard, Michael Scherbela, Philipp Marquetand, and Philipp Grohs. Gold-standard solutions to the schrödinger equation using deep learning: How much physics do we need? Advances in Neural Information Processing Systems, 35:10282–10294, 2022.
  • Gerard et al. (2024) Leon Gerard, Michael Scherbela, Halvard Sutterud, Matthew Foulkes, and Philipp Grohs. Transferable neural wavefunctions for solids. arXiv preprint arXiv:2405.07599, 2024.
  • Gubernatis et al. (2016) James Gubernatis, Naoki Kawashima, and Philipp Werner. Quantum Monte Carlo Methods. Cambridge University Press, 2016.
  • Han et al. (2019) Jiequn Han, Linfeng Zhang, and E Weinan. Solving many-electron schrödinger equation using deep neural networks. Journal of Computational Physics, 399:108929, 2019.
  • Hermann et al. (2020) Jan Hermann, Zeno Schätzle, and Frank Noé. Deep-neural-network solution of the electronic schrödinger equation. Nature Chemistry, 12(10):891–897, 2020.
  • Johansson (2006) Kurt Johansson. Random matrices and determinantal processes. In Les Houches, volume 83, pages 1–56. Elsevier, 2006.
  • Kaba et al. (2023) Sékou-Oumar Kaba, Arnab Kumar Mondal, Yan Zhang, Yoshua Bengio, and Siamak Ravanbakhsh. Equivariance with learned canonicalization functions. In International Conference on Machine Learning, pages 15546–15566. PMLR, 2023.
  • Köhler et al. (2020) Jonas Köhler, Leon Klein, and Frank Noé. Equivariant flows: exact likelihood generative learning for symmetric densities. In International conference on machine learning, pages 5361–5370. PMLR, 2020.
  • Lavancier et al. (2015) Frédéric Lavancier, Jesper Møller, and Ege Rubak. Determinantal point process models and statistical inference. Journal of the Royal Statistical Society Series B: Statistical Methodology, 77(4):853–877, 2015.
  • Levine et al. (2019) Yoav Levine, Or Sharir, Nadav Cohen, and Amnon Shashua. Quantum entanglement in deep learning architectures. Physical review letters, 122(6):065301, 2019.
  • Li et al. (2022) Xiang Li, Zhe Li, and Ji Chen. Ab initio calculation of real solids via neural network ansatz. Nature Communications, 13(1):7895, 2022.
  • Lipman et al. (2022) Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2022.
  • Naito et al. (2023) Tomoya Naito, Hisashi Naito, Koji Hashimoto, et al. Multi-body wave function of ground and low-lying excited states using unornamented deep neural networks. Physical Review Research, 5(3):033189, 2023.
  • Needs et al. (2009) Richard J Needs, Michael D Towler, Neil D Drummond, and P López Ríos. Continuum variational and diffusion quantum monte carlo calculations. Journal of Physics: Condensed Matter, 22(2):023201, 2009.
  • Passetti et al. (2023) Giacomo Passetti, Damian Hofmann, Pit Neitemeier, Lukas Grunwald, Michael A Sentef, and Dante M Kennes. Can neural quantum states learn volume-law ground states? Physical Review Letters, 131(3):036502, 2023.
  • Pescia et al. (2022) Gabriel Pescia, Jiequn Han, Alessandro Lovato, Jianfeng Lu, and Giuseppe Carleo. Neural-network quantum states for periodic systems in continuous space. Physical Review Research, 4(2):023138, 2022.
  • Pfau et al. (2020) David Pfau, James S Spencer, Alexander GDG Matthews, and W Matthew C Foulkes. Ab initio solution of the many-electron schrödinger equation with deep neural networks. Physical Review Research, 2(3):033429, 2020.
  • Pfau et al. (2023) David Pfau, Simon Axelrod, Halvard Sutterud, Ingrid von Glehn, and James S Spencer. Natural quantum monte carlo computation of excited states. arXiv preprint arXiv:2308.16848, 2023.
  • Ren et al. (2023) Weiluo Ren, Weizhong Fu, Xiaojie Wu, and Ji Chen. Towards the ground state of molecules via diffusion monte carlo on neural networks. Nature Communications, 14(1):1860, 2023.
  • Saleh et al. (2023) Yahya Saleh, Álvaro Fernández Corral, Armin Iske, Jochen Küpper, and Andrey Yachmenev. Computing excited states of molecules using normalizing flows. arXiv preprint arXiv:2308.16468, 2023.
  • Satorras et al. (2021) Víctor Garcia Satorras, Emiel Hoogeboom, and Max Welling. E(n) equivariant graph neural networks. In International conference on machine learning, pages 9323–9332. PMLR, 2021.
  • Schätzle et al. (2021) Zeno Schätzle, Jan Hermann, and Frank Noé. Convergence to the fixed-node limit in deep variational monte carlo. The Journal of Chemical Physics, 154(12), 2021.
  • Scherbela et al. (2024) Michael Scherbela, Leon Gerard, and Philipp Grohs. Towards a transferable fermionic neural wavefunction for molecules. Nature Communications, 15(1):120, 2024.
  • Sharir et al. (2022) Or Sharir, Amnon Shashua, and Giuseppe Carleo. Neural tensor contractions and the expressive power of deep neural quantum states. Physical Review B, 106(20):205136, 2022.
  • Spencer et al. (2020) James S Spencer, David Pfau, Aleksandar Botev, and W Matthew C Foulkes. Better, faster fermionic neural networks. arXiv preprint arXiv:2011.07125, 2020.
  • Thiede et al. (2022) Luca Thiede, Chong Sun, and Alán Aspuru-Guzik. Waveflow: Enforcing boundary conditions in smooth normalizing flows with application to fermionic wave functions. arXiv preprint arXiv:2211.14839, 2022.
  • Umrigar et al. (1993) CJ Umrigar, MP Nightingale, and KJ Runge. A diffusion monte carlo algorithm with very small time-step errors. The Journal of chemical physics, 99(4):2865–2890, 1993.
  • Wilson et al. (2021) Max Wilson, Nicholas Gao, Filip Wudarski, Eleanor Rieffel, and Norm M Tubman. Simulations of state-of-the-art fermionic neural network wave functions with diffusion monte carlo. arXiv preprint arXiv:2103.12570, 2021.
  • Wilson et al. (2022) Max Wilson, Saverio Moroni, Markus Holzmann, Nicholas Gao, Filip Wudarski, Tejs Vegge, and Arghya Bhowmik. Wave function ansatz (but periodic) networks and the homogeneous electron gas. arXiv preprint arXiv:2202.04622, 2022.
  • Yang et al. (2023) Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Wentao Zhang, Bin Cui, and Ming-Hsuan Yang. Diffusion models: A comprehensive survey of methods and applications. ACM Computing Surveys, 56(4):1–39, 2023.
  • Zaheer et al. (2017) Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Russ R Salakhutdinov, and Alexander J Smola. Deep sets. Advances in neural information processing systems, 30, 2017.

Appendix A Derivation of Equation (6)

Recall that the local energy is defined as

$$\mathcal{E}(x)\equiv\frac{H\psi(x)}{\psi(x)}=-\frac{\Delta\psi(x)}{2\psi(x)}+V(x) \tag{36}$$

with

$$\mathcal{E}_{r}(x)=\text{Real}\{\mathcal{E}(x)\} \tag{37}$$

In this case, one may write

$$\begin{aligned}
\frac{\langle\psi|H|\psi\rangle}{\langle\psi|\psi\rangle}&=\text{Real}\left\{\frac{\langle\psi|H|\psi\rangle}{\langle\psi|\psi\rangle}\right\}\\
&=\text{Real}\left\{\frac{\int\psi^{*}(x)H\psi(x)dx}{\langle\psi|\psi\rangle}\right\}\\
&=\frac{1}{\langle\psi|\psi\rangle}\int\text{Real}\left\{\psi^{*}(x)H\psi(x)\right\}dx\\
&=\frac{1}{\langle\psi|\psi\rangle}\int\text{Real}\left\{\frac{\psi(x)}{\psi(x)}\psi^{*}(x)H\psi(x)\right\}dx\\
&=\frac{1}{\langle\psi|\psi\rangle}\int\text{Real}\left\{|\psi(x)|^{2}\frac{H\psi(x)}{\psi(x)}\right\}dx\\
&=\int\text{Real}\left\{\frac{H\psi(x)}{\psi(x)}\right\}\frac{|\psi(x)|^{2}}{\langle\psi|\psi\rangle}dx\\
&=\int\mathcal{E}_{r}(x)\rho(x)dx\\
&=\mathbb{E}_{x\sim\rho(\cdot)}\left[\mathcal{E}_{r}(x)\right]\\
&\approx\frac{1}{K}\sum_{k=1}^{K}\mathcal{E}_{r}\left(x^{(k)}\right)
\end{aligned}\tag{38}$$

where the $x^{(k)}$ are sampled from $\rho(\cdot)=|\psi(\cdot)|^{2}/\langle\psi|\psi\rangle$. Note that in the first line, we have used the fact that $H$ is a symmetric operator so that the quadratic form is real; in the third line, the fact that $\langle\psi|\psi\rangle$ is real; and in the sixth line, the fact that $|\psi(x)|^{2}$ is real.
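As an illustration of the estimator in Equation (38), and not something taken from the paper, the following minimal sketch applies it to a one-dimensional harmonic oscillator, $H=-\tfrac{1}{2}\,d^{2}/dx^{2}+\tfrac{1}{2}x^{2}$, with the assumed trial family $\psi_{\alpha}(x)=e^{-\alpha x^{2}/2}$. For this real-valued toy ansatz the local energy is already real, and $\rho$ is a Gaussian with variance $1/(2\alpha)$, so it can be sampled directly. The function names `local_energy`, `estimate_energy` and `exact_energy` are hypothetical.

```python
import math
import random


def local_energy(x, alpha):
    """Local energy H psi / psi for psi(x) = exp(-alpha * x^2 / 2)."""
    # Kinetic part: -psi'' / (2 psi) = alpha / 2 - alpha^2 x^2 / 2.
    # Potential part: V(x) = x^2 / 2.
    return 0.5 * alpha + 0.5 * x * x * (1.0 - alpha * alpha)


def estimate_energy(alpha, num_samples, seed=0):
    """Average the local energy over samples x^(k) drawn from rho."""
    rng = random.Random(seed)
    # |psi|^2 is proportional to exp(-alpha x^2): a Gaussian with variance 1 / (2 alpha).
    sigma = math.sqrt(1.0 / (2.0 * alpha))
    total = 0.0
    for _ in range(num_samples):
        total += local_energy(rng.gauss(0.0, sigma), alpha)
    return total / num_samples


def exact_energy(alpha):
    """Closed form of <psi|H|psi> / <psi|psi> for this toy trial family."""
    return alpha / 4.0 + 1.0 / (4.0 * alpha)
```

At $\alpha=1$ the trial function is the exact ground state, so the local energy is constant and the estimate has zero variance; for other values of $\alpha$ the sample mean approaches `exact_energy(alpha)` as $K$ grows.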

Appendix B Proof of Theorem 1

Theorem.

Let $\rho(\cdot)$ be a probability density function which we can sample from in constant time. Let $\rho(\cdot)$ satisfy two additional properties:

  1. (D1) $\rho(x)$ is symmetric: $\rho(\pi x)=\rho(x)$ for all permutations $\pi\in\mathbb{S}_{n}$.

  2. (D2) $\rho(x)=0$ if $x_{i}=x_{j}$ for any $i,j$.

Finally, let $\kappa(x)$ be a complex function which satisfies $|\kappa(x)|=1\,\,\forall x$, and is nearly antisymmetric:

$$\kappa(\pi x)=\begin{cases}(-1)^{\pi}\kappa(x)&\text{if }x_{i}\neq x_{j}\text{ for all }i,j\\ \bar{\kappa}&\text{otherwise}\end{cases}$$

where $\bar{\kappa}\in\mathbb{C}$ is an arbitrary value with $|\bar{\kappa}|=1$. Then $\psi$ satisfies (W1)-(W4) if and only if $\psi$ can be written as $\psi(x)=\kappa(x)\sqrt{\rho(x)}$ with $\kappa$ and $\rho$ satisfying the above-stated properties.

Proof.

Suppose that $\psi(x)=\kappa(x)\sqrt{\rho(x)}$; let us prove each of the properties (W1)-(W4).

(W1) The functional form for $\psi(x)$ is just $\kappa(x)\sqrt{\rho(x)}$, which we know explicitly.

(W2) Antisymmetry of $\psi$: we break down by cases. Suppose that $x$ is such that $x_{i}\neq x_{j}\text{ for all }i,j$. Then:

$$\begin{aligned}
\psi(\pi x)&=\kappa(\pi x)\sqrt{\rho(\pi x)}\\
&=(-1)^{\pi}\kappa(x)\sqrt{\rho(x)}\\
&=(-1)^{\pi}\psi(x)
\end{aligned}\tag{39}$$

where in the second line we have used the two facts that $\kappa$ is antisymmetric and $\rho$ is symmetric. Now, suppose that $x_{i}=x_{j}$ for some $i,j$:

$$\begin{aligned}
\psi(\pi x)&=\kappa(\pi x)\sqrt{\rho(\pi x)}\\
&=\bar{\kappa}\sqrt{\rho(x)}\\
&=0
\end{aligned}\tag{40}$$

where in the third line, we have used (D2). But this is precisely what is required for an antisymmetric function: if $a(x)$ is antisymmetric and $x_{i}=x_{j}$ for some $i,j$, then $\pi_{ij}x=x$, where $\pi_{ij}$ is the permutation which flips $i$ and $j$, so that

$$a(\pi_{ij}x)=a(x)\,\Rightarrow\,-a(x)=a(x)\,\Rightarrow\,a(x)=0\,\Rightarrow\,a(\pi x)=(-1)^{\pi}a(x)=0 \tag{41}$$

where we have used the fact that $(-1)^{\pi_{ij}}=-1$, since only one flip is required.

(W3) We can sample in constant time from $\rho(x)=\|\psi(x)\|^{2}$ by assumption.

(W4) $\psi$ is normalized:

$$\int\|\psi(x)\|^{2}dx=\int|\kappa(x)|^{2}\rho(x)dx=\int\rho(x)dx=1 \tag{42}$$

since $\rho(x)$ is a probability density function.

Thus, we have proved the forward direction.
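The forward direction can be checked numerically on a toy example. The sketch below is an illustration under stated assumptions rather than the construction used in the paper: it takes scalar coordinates, an unnormalized $\rho$ built from a squared Vandermonde product times a Gaussian (which is symmetric and vanishes at coincident points), and $\kappa$ equal to the sign of the Vandermonde product, with $\bar{\kappa}=1$. Since $\rho$ is not normalized here, only (W2) is tested. The names `sign`, `apply`, `vandermonde` and `is_antisymmetric` are hypothetical, and a permutation is represented as a tuple acting by $(\pi x)_{i}=x_{\pi(i)}$.

```python
import itertools
import math


def sign(perm):
    """Parity (-1)^pi of a permutation given as a tuple of indices, via inversion count."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def apply(perm, x):
    """Permute the entries of x: (pi x)_i = x_{perm[i]}."""
    return tuple(x[p] for p in perm)


def vandermonde(x):
    """Product of (x_i - x_j) over i < j; it changes sign under any transposition."""
    prod = 1.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            prod *= x[i] - x[j]
    return prod


def rho(x):
    """Toy unnormalized density: symmetric (D1) and zero at coincident points (D2)."""
    return vandermonde(x) ** 2 * math.exp(-sum(v * v for v in x))


KAPPA_BAR = 1.0  # arbitrary unit-modulus value used where some x_i = x_j


def kappa(x):
    """Unit-modulus, nearly antisymmetric factor: the sign of the Vandermonde product."""
    v = vandermonde(x)
    if v == 0.0:
        return KAPPA_BAR
    return 1.0 if v > 0 else -1.0


def psi(x):
    """psi(x) = kappa(x) * sqrt(rho(x))."""
    return kappa(x) * math.sqrt(rho(x))


def is_antisymmetric(x, tol=1e-12):
    """Check psi(pi x) = (-1)^pi psi(x) for every permutation pi of the entries of x."""
    return all(
        abs(psi(apply(perm, x)) - sign(perm) * psi(x)) <= tol
        for perm in itertools.permutations(range(len(x)))
    )
```

Calling `is_antisymmetric` on a point with distinct coordinates exercises Equation (39), while a point with two equal coordinates exercises Equation (40), where the value of $\bar{\kappa}$ is irrelevant because $\rho$ vanishes.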

Now, let us assume properties (W1)-(W4). We can always express a complex number $c$ in terms of a magnitude and a phase; in particular, we may write $c=me^{i\nu}$, where $m\geq 0$ is a real number, and $\nu\in[0,2\pi)$. (W1) tells us that we have an explicit form for the complex-valued function $\psi(x)$; thus, we know that

$$\psi(x)=m(x)e^{i\nu(x)}\equiv\sqrt{\rho(x)}\kappa(x) \tag{43}$$

with $|\kappa(x)|=1$, where we have used the fact that $m(x)\geq 0$. Note that $\rho(x)=\|\psi(x)\|^{2}$. As $\rho(x)\geq 0$, and

$$\int\rho(x)dx=\int\|\psi(x)\|^{2}dx=1 \tag{44}$$

by (W4), then $\rho(x)$ is a density. Furthermore, by (W3) $\rho(x)=\|\psi(x)\|^{2}$ may be sampled in constant time. Finally, by (W2), $\psi(x)$ is antisymmetric; thus,

$$\begin{aligned}
&\psi(\pi x)=(-1)^{\pi}\psi(x)\quad\forall\pi,x\\
\Leftrightarrow\quad&\sqrt{\rho(\pi x)}\kappa(\pi x)=(-1)^{\pi}\sqrt{\rho(x)}\kappa(x)\quad\forall\pi,x
\end{aligned}\tag{45}$$

In cases where $\pi x=x$ and $(-1)^{\pi}=-1$, then we have that

$$\begin{aligned}
&\sqrt{\rho(\pi x)}\kappa(\pi x)=(-1)^{\pi}\sqrt{\rho(x)}\kappa(x)\\
\Leftrightarrow\quad&\sqrt{\rho(x)}\kappa(x)=-\sqrt{\rho(x)}\kappa(x)\\
\Leftrightarrow\quad&\rho(x)=0
\end{aligned}\tag{46}$$

where the third line follows from the fact that $\kappa(x)\neq 0$ since $|\kappa(x)|=1$. However, note that $\pi x=x$ and $(-1)^{\pi}=-1$ is equivalent to $x_{i}=x_{j}$ for some $i,j$. Thus, we have established (D2). Furthermore, in such cases we can take $\kappa(x)=\bar{\kappa}$, since it plays no role. In all other cases, i.e. where $x_{i}\neq x_{j}\text{ for all }i,j$, we have that

$$\begin{aligned}
&\sqrt{\rho(\pi x)}\kappa(\pi x)=(-1)^{\pi}\sqrt{\rho(x)}\kappa(x)\quad\forall\pi,x\text{ such that }x_{i}\neq x_{j}\text{ for all }i,j\\
\Leftrightarrow\quad&\sqrt{\rho(\pi x)}=\sqrt{\rho(x)}\quad\text{and}\quad\kappa(\pi x)=(-1)^{\pi}\kappa(x)\quad\forall\pi,x\text{ such that }x_{i}\neq x_{j}\text{ for all }i,j
\end{aligned}\tag{47}$$

where the second line holds since this must hold true for all $\pi$ and all relevant $x$. $\kappa(\pi x)=(-1)^{\pi}\kappa(x)$ establishes the remainder of the nearly antisymmetric character of $\kappa$. Finally,

$$\sqrt{\rho(\pi x)}=\sqrt{\rho(x)}\quad\Leftrightarrow\quad\rho(\pi x)=\rho(x)\quad\forall\pi,x\text{ such that }x_{i}\neq x_{j}\text{ for all }i,j \tag{48}$$

since $\rho(x)\geq 0$. This shows that $\rho(x)$ is symmetric for the case $x_{i}\neq x_{j}\text{ for all }i,j$. Indeed $\rho(x)$ is symmetric for all $x$, including those for which $x_{i}=x_{j}$ for some $i,j$, as in the latter case we have shown that $\rho(x)=0$. This establishes (D1) and completes the proof. ∎
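The reverse direction amounts to a magnitude and phase split of $\psi$, as in Equation (43). The following sketch illustrates this on an assumed toy wavefunction of two scalar coordinates, $\psi(x_{1},x_{2})=(x_{1}-x_{2})\,e^{i(x_{1}+x_{2})}\,e^{-(x_{1}^{2}+x_{2}^{2})/2}$, which is antisymmetric but not normalized, so (W4) is not represented. The name `decompose` and the choice $\bar{\kappa}=1$ are hypothetical.

```python
import cmath
import math


def psi(x1, x2):
    """Toy complex antisymmetric wavefunction of two scalar coordinates (unnormalized)."""
    return (x1 - x2) * cmath.exp(1j * (x1 + x2)) * math.exp(-0.5 * (x1 * x1 + x2 * x2))


KAPPA_BAR = 1.0 + 0.0j  # arbitrary unit-modulus value used where psi vanishes


def decompose(x1, x2):
    """Return (rho, kappa) with psi = kappa * sqrt(rho), rho = |psi|^2 and |kappa| = 1."""
    value = psi(x1, x2)
    magnitude = abs(value)
    # Where psi is zero the phase is undefined, so kappa is set to KAPPA_BAR.
    kappa = value / magnitude if magnitude > 0.0 else KAPPA_BAR
    return magnitude ** 2, kappa
```

Swapping the two arguments leaves the returned `rho` unchanged and negates the returned `kappa`, matching (D1) and the nearly antisymmetric property; at $x_{1}=x_{2}$ the returned `rho` is zero, matching (D2).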

Appendix C Proof of Theorem 2

Theorem.

Given a configuration $x=(r,s)$, let a permutation which maps the spin vector $s$ to the canonical spin vector $\bar{s}$ be given by $\bar{\pi}_{s}$, i.e. $\bar{s}=\bar{\pi}_{s}s$. Let $\bar{\rho}(r)$ be a density function on electron positions (i.e. no spins) satisfying

  1. (R1) $\bar{\rho}$ is $\mathbb{G}$-invariant: $\bar{\rho}(\pi r)=\bar{\rho}(r)\text{ for all }\pi\in\mathbb{G}$

  2. (R2) $\bar{\rho}(r)=0\text{ if }r_{i}=r_{j},\text{ for }i,j\in\mathcal{N}_{u}\text{ or }i,j\in\mathcal{N}_{d}$

A density $\rho(x)=\rho(r,s)$ satisfies conditions (D1)-(D2) in Theorem 1 if and only if it may be written as $\rho(r,s)=\bar{\rho}(\bar{\pi}_{s}r)$ for a density $\bar{\rho}(r)$ satisfying conditions (R1) and (R2).
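The construction $\rho(r,s)=\bar{\rho}(\bar{\pi}_{s}r)$ in the theorem can be sketched in a few lines. The example below is a toy illustration under several assumptions that are not from the paper: positions are scalars rather than points in three dimensions, spins are encoded as `UP = 0` and `DOWN = 1` with the canonical order placing all up spins first, $\bar{\pi}_{s}$ is chosen as the stable sort of the spins, and $\bar{\rho}$ is an unnormalized product of squared within-block differences times a Gaussian. The names `canonical_perm`, `rho_bar` and `rho` are hypothetical.

```python
import itertools
import math

UP, DOWN = 0, 1  # hypothetical spin encoding; canonical order puts all UP first


def canonical_perm(s):
    """Indices that stably sort the spin vector s into canonical order (UP before DOWN)."""
    return tuple(sorted(range(len(s)), key=lambda i: s[i]))


def apply(perm, v):
    """Permute entries: (pi v)_i = v_{perm[i]}."""
    return tuple(v[p] for p in perm)


def rho_bar(r, n_up):
    """Toy unnormalized density on canonically ordered positions, satisfying (R1)-(R2)."""
    value = math.exp(-sum(x * x for x in r))
    for block in (r[:n_up], r[n_up:]):
        for a, b in itertools.combinations(block, 2):
            value *= (a - b) ** 2  # vanishes when two same-spin electrons coincide
    return value


def rho(r, s):
    """rho(r, s) = rho_bar(pi_bar_s r): canonicalize the ordering, then evaluate rho_bar."""
    perm = canonical_perm(s)
    return rho_bar(apply(perm, r), n_up=s.count(UP))
```

Because `rho_bar` is invariant to reordering within the up block and within the down block, any other choice of canonicalizing permutation would give the same value. Permuting `r` and `s` jointly by any element of $\mathbb{S}_{n}$ leaves `rho` unchanged up to floating-point rounding, which is condition (D1).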

Proof.

Let us prove the forward direction: assume a density $\rho(x)=\rho(r,s)$ satisfying conditions (D1) and (D2), and we will show that it must be written as $\rho(r,s)=\bar{\rho}(\bar{\pi}_{s}r)$ for $\bar{\rho}$ satisfying conditions (R1) and (R2). Let $\bar{r}=\bar{\pi}_{s}r$ and $\bar{x}=\bar{\pi}_{s}x=(\bar{r},\bar{s})$. In this case, we have that

$$\rho(x)=\rho(\bar{\pi}_{s}x)=\rho(\bar{x}) \tag{49}$$

where the first equality comes from our requirement that $\rho(x)$ satisfy condition (D1), and the second equality from the definition of $\bar{x}$. As a result, it is sufficient for us to focus on constructing a density $\rho(\bar{x})=\rho(\bar{r},\bar{s})$, i.e. a density where the spins are in canonical order. As $\bar{s}$ is fixed as the canonical ordering, we may suppress it, writing $\rho(\bar{x})=\rho(\bar{r},\bar{s})\equiv\bar{\rho}(\bar{r})$ for a function $\bar{\rho}$. Now, $\bar{\rho}$ must satisfy condition (D1); however, the only permutations that are relevant are those that preserve the canonical spin ordering $\bar{s}$. More specifically, the relevant permutations $\pi$ are those for which $\pi\bar{s}=\bar{s}$; it is easy to see that those permutations form the group $\mathbb{S}_{\mathcal{N}_{u}}\times\mathbb{S}_{\mathcal{N}_{d}}=\mathbb{G}$. Thus, we must have that

$$\bar{\rho}(\pi\bar{r})=\bar{\rho}(\bar{r})\text{ for all }\pi\in\mathbb{G} \tag{50}$$

That is, $\bar{\rho}$ is $\mathbb{G}$-invariant, which is condition (R1).
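The claim that the permutations preserving $\bar{s}$ form $\mathbb{S}_{\mathcal{N}_{u}}\times\mathbb{S}_{\mathcal{N}_{d}}$ can be checked by brute force for a small system. The sketch below is illustrative only; it assumes a canonical spin vector given as a tuple of labels with all up spins first, and the names `stabilizer` and `compose` are hypothetical.

```python
import itertools
import math


def apply(perm, v):
    """Permute entries: (pi v)_i = v_{perm[i]}."""
    return tuple(v[p] for p in perm)


def stabilizer(s_bar):
    """All permutations pi with pi s_bar = s_bar, found by brute-force enumeration."""
    n = len(s_bar)
    return [p for p in itertools.permutations(range(n)) if apply(p, s_bar) == s_bar]


def compose(p, q):
    """Composition such that apply(compose(p, q), v) == apply(p, apply(q, v))."""
    return tuple(q[i] for i in p)
```

For a canonical vector with three up spins followed by two down spins, the stabilizer has $3!\cdot 2!$ elements, each of which maps the up indices to up indices and the down indices to down indices, and the set is closed under composition.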

Now let us turn to condition (D2), which states that $\rho(x)=0$ if $x_{i}=x_{j}$ for any $i,j$. The requirement $x_{i}=x_{j}$ implies that both $r_{i}=r_{j}$ and $s_{i}=s_{j}$; and the condition $s_{i}=s_{j}$ is equivalent to $i,j\in\mathcal{N}_{u}$ or $i,j\in\mathcal{N}_{d}$. Thus, condition (D2) is equivalent to

$$\bar{\rho}(\bar{r})=0\text{ if }\bar{r}_{i}=\bar{r}_{j},\text{ for }i,j\in\mathcal{N}_{u}\text{ or }i,j\in\mathcal{N}_{d} \tag{51}$$

which is simply condition (R2).

Thus, we have proven conditions (R1) and (R2) must hold. Finally, using Equation (49) and the definitions of $\bar{\rho}$ and $\bar{r}$, we have that

$$\rho(x)=\rho(\bar{x})=\bar{\rho}(\bar{r})=\bar{\rho}(\bar{\pi}_{s}r) \tag{52}$$

which completes the proof of the forward direction.

Now, let us prove the reverse direction: assume a density ρ¯¯𝜌\bar{\rho}over¯ start_ARG italic_ρ end_ARG satisfying conditions (R1) and (R2), and we will show that ρ(r,s)=ρ¯(π¯sr)𝜌𝑟𝑠¯𝜌subscript¯𝜋𝑠𝑟\rho(r,s)=\bar{\rho}(\bar{\pi}_{s}r)italic_ρ ( italic_r , italic_s ) = over¯ start_ARG italic_ρ end_ARG ( over¯ start_ARG italic_π end_ARG start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT italic_r ) satisfies conditions (D1) and (D2). Let us begin by computing π¯πssubscript¯𝜋𝜋𝑠\bar{\pi}_{\pi s}over¯ start_ARG italic_π end_ARG start_POSTSUBSCRIPT italic_π italic_s end_POSTSUBSCRIPT, which will prove useful in what follows. π¯πssubscript¯𝜋𝜋𝑠\bar{\pi}_{\pi s}over¯ start_ARG italic_π end_ARG start_POSTSUBSCRIPT italic_π italic_s end_POSTSUBSCRIPT is defined by s¯=π¯πsπs¯𝑠subscript¯𝜋𝜋𝑠𝜋𝑠\bar{s}=\bar{\pi}_{\pi s}\pi sover¯ start_ARG italic_s end_ARG = over¯ start_ARG italic_π end_ARG start_POSTSUBSCRIPT italic_π italic_s end_POSTSUBSCRIPT italic_π italic_s. However, we also know that s¯=π¯ss¯𝑠subscript¯𝜋𝑠𝑠\bar{s}=\bar{\pi}_{s}sover¯ start_ARG italic_s end_ARG = over¯ start_ARG italic_π end_ARG start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT italic_s; setting these equal gives

$$\bar{\pi}_{\pi s}\pi s = \bar{\pi}_s s \quad\Rightarrow\quad \bar{\pi}_{\pi s}\pi = \bar{\pi}_s\hat{\pi} \tag{53}$$

where $\hat{\pi}$ is some permutation that leaves $s$ unchanged, i.e. such that $\hat{\pi}s=s$. Thus

$$\bar{\pi}_{\pi s} = \bar{\pi}_s\hat{\pi}\pi^{-1} \tag{54}$$

However, we know that $\bar{s}=\bar{\pi}_s s$, so that $s=\bar{\pi}_s^{-1}\bar{s}$. Using the fact that $\hat{\pi}s=s$ gives

$$\hat{\pi}\bar{\pi}_s^{-1}\bar{s} = \bar{\pi}_s^{-1}\bar{s} \quad\Rightarrow\quad \hat{\pi}\bar{\pi}_s^{-1} = \bar{\pi}_s^{-1}\breve{\pi} \tag{55}$$

where $\breve{\pi}$ is some permutation that leaves $\bar{s}$ unchanged, which precisely implies that $\breve{\pi}\in\mathbb{G}$. Rearranging gives

$$\hat{\pi} = \bar{\pi}_s^{-1}\breve{\pi}\bar{\pi}_s \quad\Rightarrow\quad \bar{\pi}_{\pi s} = \bar{\pi}_s\bar{\pi}_s^{-1}\breve{\pi}\bar{\pi}_s\pi^{-1} = \breve{\pi}\bar{\pi}_s\pi^{-1} \tag{56}$$

Now, for any permutation $\pi$, we have that

$$\rho(\pi x) = \bar{\rho}(\bar{\pi}_{\pi s}\pi r) = \bar{\rho}(\breve{\pi}\bar{\pi}_s\pi^{-1}\pi r) = \bar{\rho}(\breve{\pi}(\bar{\pi}_s r)) = \bar{\rho}(\bar{\pi}_s r) = \rho(x) \tag{57}$$

where in the second-to-last equality, we have used the fact that $\breve{\pi}\in\mathbb{G}$, and that $\bar{\rho}$ is $\mathbb{G}$-invariant by (R1). Thus, we have established property (D1), i.e. that $\rho$ is symmetric.

Now, turning to condition (D2), let us consider an $x$ such that $x_i=x_j$ for a particular $i,j$. Continuing to use the notation $\bar{r}=\bar{\pi}_s r$, this implies that $\bar{r}_i=\bar{r}_j$ for either $i,j\in\mathcal{N}_u$ or $i,j\in\mathcal{N}_d$. Thus, by condition (R2), we have that $\bar{\rho}(\bar{r})=0$. But then

$$\rho(x) = \bar{\rho}(\bar{\pi}_s r) = \bar{\rho}(\bar{r}) = 0 \tag{58}$$

which is precisely condition (D2); this completes the proof. ∎
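As an illustrative numerical check of the reverse direction (not part of the proof), the following sketch builds $\rho(r,s)=\bar{\rho}(\bar{\pi}_s r)$ from a toy $\bar{\rho}$ and verifies (D1) and (D2) on one configuration. The particular $\bar{\rho}$, the electron counts, and the choice of $\bar{\pi}_s$ as a stable sort placing spin-up electrons first are all assumptions made for the example.

```python
import itertools
import numpy as np

N_UP = 2  # hypothetical split: 2 spin-up and 2 spin-down electrons


def rho_bar(r_bar):
    """Toy unnormalized density on spin-sorted positions (spin-up rows first).

    It is invariant to permutations within each spin block (R1) and
    vanishes when two same-spin positions coincide (R2).
    """
    total = np.exp(-np.sum(r_bar ** 2))
    for block in (r_bar[:N_UP], r_bar[N_UP:]):
        for a, b in itertools.combinations(range(len(block)), 2):
            total *= np.sum((block[a] - block[b]) ** 2)
    return total


def sort_perm(s):
    """Assumed form of pi_bar_s: indices moving spin-up (+1) ahead of spin-down (-1)."""
    return np.argsort(-s, kind="stable")


def rho(r, s):
    """rho(r, s) = rho_bar(pi_bar_s r)."""
    return rho_bar(r[sort_perm(s)])


rng = np.random.default_rng(0)
r = rng.normal(size=(4, 3))
s = np.array([1, -1, -1, 1])

# (D1): rho(pi x) should equal rho(x) for every permutation pi of the 4 electrons.
base = rho(r, s)
max_dev = max(
    abs(rho(r[list(p)], s[list(p)]) - base)
    for p in itertools.permutations(range(4))
)

# (D2): electrons 0 and 3 are both spin-up; placing them at the same point
# should make the density vanish.
r_clash = r.copy()
r_clash[3] = r_clash[0]
clash_value = rho(r_clash, s)
```

Here `max_dev` measures the largest deviation from symmetry over all 24 permutations, and `clash_value` is the density at a same-spin coincidence.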

Appendix D Proof of Theorem 3

Theorem.

Suppose that we have a normalizing flow whose base density $\rho_z$ satisfies properties (R1) and (R2) from Theorem 2, and whose transformation $T$ is $\mathbb{G}$-equivariant. Then the density resulting from the normalizing flow will satisfy properties (R1) and (R2).

Proof.

Let us begin by proving that the density resulting from the normalizing flow will satisfy condition (R1). Theorem 1 in (Köhler et al., 2020) states the following: “Let $\rho$ be a density on $\mathbb{R}^m$ which is $\mathbb{G}$-invariant and $\mathbb{G}>\mathbb{H}$. If $f$ is an $\mathbb{H}$-equivariant diffeomorphism, then $\rho_f$, the push-forward of $\rho$ along $f$, is $\mathbb{H}$-invariant.” In our instance, we may take $\mathbb{H}=\mathbb{G}$, and thereby have established that the density resulting from the normalizing flow is $\mathbb{G}$-invariant, thus satisfying condition (R1).

We now turn to proving that the density resulting from the normalizing flow will satisfy condition (R2). Suppose that the random variable for the base density is given by $z$, with density $\rho_z$; and the normalizing flow is given by transformation $T$, i.e. $r=T(z)$. Then by the change of variables formula, we know that the density of $r$ is given by $\rho_r(r)=\rho_z(T^{-1}(r))\,|\det J_{T^{-1}}(r)|$. Now, we are interested in the case when $r_i=r_j$ for $i,j\in\mathcal{N}_u$ (the case of $\mathcal{N}_d$ is identical). Let $\pi_{ij}\in\mathbb{G}$ be the permutation whose only action is to flip the coordinates of electrons $i$ and $j$. Given that $r_i=r_j$, then by definition we have that $\pi_{ij}r=r$. In this case, we have that

$$z \equiv T^{-1}(r) = T^{-1}(\pi_{ij}r) = \pi_{ij}T^{-1}(r) \tag{59}$$

where the latter equality is due to the $\mathbb{G}$-equivariance of $T^{-1}$, which follows straightforwardly from the $\mathbb{G}$-equivariance of $T$. Rearranging the above, we have that

$$T^{-1}(r) = \pi_{ij}^{-1}z = \pi_{ij}z \tag{60}$$

where the second equality is due to the fact that $\pi_{ij}^{-1}=\pi_{ij}$, as $\pi_{ij}$ simply flips electrons $i$ and $j$. However, $z=T^{-1}(r)$, so combining Equations (59) and (60) gives $z=\pi_{ij}z$. Plugging this into the equation for the change of variables gives

$$\rho_r(r) = \rho_z(T^{-1}(r))\,|\det J_{T^{-1}}(r)| = \rho_z(z)\,|\det J_{T^{-1}}(r)| \tag{61}$$

But we know that $z$ is such that $z=\pi_{ij}z$, which means that $z_i=z_j$; and for such $z$’s, we know that $\rho_z(z)=0$, by the assumption of condition (R2) for the base density. Thus, plugging back into Equation (61) gives

$$\rho_r(r) = 0\cdot|\det J_{T^{-1}}(r)| = 0 \tag{62}$$

as desired. ∎
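The argument for (R2) can be illustrated numerically. The sketch below is only a toy instance under stated assumptions: the map `T`, the base density `rho_z`, and the electron counts are hypothetical choices, not constructions from this paper. It checks that $T^{-1}$ commutes with the swap $\pi_{ij}$, that a same-spin coincidence in $r$ is inherited by $z=T^{-1}(r)$, and that the pushed-forward density therefore vanishes there.

```python
import numpy as np

N_UP, N_DOWN = 2, 2  # hypothetical electron counts


def scale_from(down_rows):
    """Positive factor depending only on a permutation-invariant summary of the spin-down rows."""
    return np.exp(np.tanh(np.mean(np.linalg.norm(down_rows, axis=1))))


def T(z):
    """Toy G-equivariant flow: rescale the spin-up rows, pass the spin-down rows through."""
    return np.concatenate([scale_from(z[N_UP:]) * z[:N_UP], z[N_UP:]])


def T_inv(r):
    """Inverse of T (the spin-down rows are unchanged, so the scale is recoverable)."""
    return np.concatenate([r[:N_UP] / scale_from(r[N_UP:]), r[N_UP:]])


def log_abs_det_jac_inv(r):
    """log|det J_{T^{-1}}(r)|: each of the 3 * N_UP spin-up coordinates is divided by the scale."""
    return -3 * N_UP * np.log(scale_from(r[N_UP:]))


def rho_z(z):
    """Toy unnormalized base density vanishing on same-spin coincidences (R2)."""
    gap_u = np.sum((z[0] - z[1]) ** 2)
    gap_d = np.sum((z[2] - z[3]) ** 2)
    return gap_u * gap_d * np.exp(-np.sum(z ** 2))


def rho_r(r):
    """Change of variables: rho_z(T^{-1}(r)) |det J_{T^{-1}}(r)|."""
    return rho_z(T_inv(r)) * np.exp(log_abs_det_jac_inv(r))


rng = np.random.default_rng(1)
r = rng.normal(size=(N_UP + N_DOWN, 3))
swap = np.array([1, 0, 2, 3])  # pi_ij exchanging the two spin-up electrons

# Equivariance of T^{-1}: T^{-1}(pi r) should equal pi T^{-1}(r).
equivariance_gap = np.max(np.abs(T_inv(r[swap]) - T_inv(r)[swap]))
round_trip_gap = np.max(np.abs(T(T_inv(r)) - r))

# A coincidence r_0 = r_1 is mapped to a coincidence z_0 = z_1.
r_clash = r.copy()
r_clash[1] = r_clash[0]
z_clash = T_inv(r_clash)
density_at_clash = rho_r(r_clash)
```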

Appendix E Proof of Theorem 4

Theorem.

Let $\rho_z(z)=\rho_{dpp}(z_u;n_u)\,\rho_{dpp}(z_d;n_d)$. Then $\rho_z$ satisfies conditions (R1) and (R2) from Theorem 2.

Proof.

Let us begin with property (R1): we would like to prove that $\rho_z$ is $\mathbb{G}$-invariant. Let $\pi\in\mathbb{G}$; as $\mathbb{G}=\mathbb{S}_{\mathcal{N}_u}\times\mathbb{S}_{\mathcal{N}_d}$, we may write the permutation $\pi=\pi_u\otimes\pi_d$, where $\pi_u$ is a permutation which applies to the indices in $\mathcal{N}_u$, and similarly for $\pi_d$ and the indices in $\mathcal{N}_d$. Thus,

$$\rho_z(\pi z) = \rho_{dpp}(\pi_u z_u;n_u)\,\rho_{dpp}(\pi_d z_d;n_d) \tag{63}$$

Now, recall that the Projection DPP’s density is defined by

$$\mathbf{K}_n(r) = \begin{bmatrix} K(r_1,r_1) & \dots & K(r_1,r_n) \\ \vdots & \ddots & \vdots \\ K(r_n,r_1) & \dots & K(r_n,r_n) \end{bmatrix} \qquad\Rightarrow\qquad \rho_{dpp}(r;n) = \frac{1}{n!}\det\mathbf{K}_n(r) \tag{64}$$

Thus, we must compute $\mathbf{K}_n(\pi r)$ for a permutation $\pi$. We may represent the action of $\pi$ on a vector of length $n$ by an $n\times n$ matrix $P_\pi$. It is then straightforward to see that

$$\mathbf{K}_n(\pi r) = P_\pi\mathbf{K}_n(r)P_\pi^T \tag{65}$$

and thus that

$$\det\mathbf{K}_n(\pi r) = \det\left(P_\pi\mathbf{K}_n(r)P_\pi^T\right) = \det\left(P_\pi^T P_\pi\mathbf{K}_n(r)\right) = \det(P_\pi)^2\det\mathbf{K}_n(r) = \det\mathbf{K}_n(r) \tag{66}$$

where the second equality uses the cyclic property of the determinant; the third equality that a determinant of products is the product of determinants; and the fourth equality that the determinant of a permutation matrix $P_\pi$ is $\pm 1$. Thus, we have that

$$\rho_{dpp}(\pi_u z_u;n_u) = \frac{1}{n_u!}\det\mathbf{K}_{n_u}(\pi_u z_u) = \frac{1}{n_u!}\det\mathbf{K}_{n_u}(z_u) = \rho_{dpp}(z_u;n_u) \tag{67}$$

Likewise, $\rho_{dpp}(\pi_d z_d;n_d)=\rho_{dpp}(z_d;n_d)$. This gives finally that $\rho_z(\pi z)=\rho_z(z)$, establishing that $\rho_z$ is $\mathbb{G}$-invariant, i.e. satisfies condition (R1).

We now turn to condition (R2): we would like to prove that $\rho_z(z)=0$ if $z_i=z_j$, for $i,j\in\mathcal{N}_u$ or $i,j\in\mathcal{N}_d$. Let us focus on the case of spin-up electrons, i.e. $i,j\in\mathcal{N}_u$; the spin-down case will follow analogously. We know that

$$\rho_{dpp}(z_u;n_u) = \frac{1}{n_u!}\det\mathbf{K}_{n_u}(z_u) \tag{68}$$

Given the definition of the matrix $\mathbf{K}_{n_u}(z_u)$, it is straightforward to see that if $z_i=z_j$, then $\mathbf{K}_{n_u}(z_u)$ has identical columns for $i$ and $j$. However, a matrix with two identical columns is rank deficient, and therefore has determinant $0$. Thus, we have that $\rho_{dpp}(z_u;n_u)=0$ so that $\rho_z(z)=0$, establishing that $\rho_z$ satisfies condition (R2). ∎
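Both steps of this proof can be checked numerically on a small example. The sketch below is illustrative only and rests on assumptions: points are taken in $[0,1]$ rather than in three dimensions, and the kernel is taken to be $K(x,y)=H(x)^T H(y)$ with `H` the first $n$ real Fourier functions, which is one possible orthonormal family rather than the one used in this paper.

```python
import math
import numpy as np


def H(y, n):
    """First n real Fourier functions on [0, 1], evaluated at each entry of y."""
    feats = [np.ones_like(y)]
    k = 1
    while len(feats) < n:
        feats.append(np.sqrt(2.0) * np.cos(2 * np.pi * k * y))
        if len(feats) < n:
            feats.append(np.sqrt(2.0) * np.sin(2 * np.pi * k * y))
        k += 1
    return np.stack(feats, axis=-1)  # shape (len(y), n)


def kernel_matrix(r):
    """K_n(r) with entries K(r_i, r_j) = H(r_i)^T H(r_j)."""
    feats = H(r, len(r))
    return feats @ feats.T


def rho_dpp(r):
    """Projection DPP density det K_n(r) / n!, as in Equation (64)."""
    return np.linalg.det(kernel_matrix(r)) / math.factorial(len(r))


rng = np.random.default_rng(2)
r = rng.uniform(size=4)
perm = rng.permutation(4)
P = np.eye(4)[perm]  # permutation matrix with (P r)_i = r[perm[i]]

K = kernel_matrix(r)
K_perm = kernel_matrix(r[perm])  # compare with P K P^T, Equation (65)

# Two coincident points give identical rows and columns, hence a zero determinant.
r_clash = r.copy()
r_clash[2] = r_clash[0]
clash_density = rho_dpp(r_clash)
```

Up to floating-point error, `K_perm` agrees with `P @ K @ P.T`, the density is unchanged by the permutation, and `clash_density` is zero.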

Appendix F Sampling Procedure for Projection DPPs

In order to sample from a Projection DPP, we may follow the procedure outlined in (Lavancier et al., 2015), which we reproduce in Algorithm 1. We note that the speed of the sampling algorithm is largely unimportant, as one may draw as many samples as one would like offline, prior to (and independently of) the process of minimizing the variational objective.

Algorithm 1 Sampling from a Projection Determinantal Point Process
Require:  $n$, $H(y)$
  sample $r_n$ from the distribution with density $\rho_n(y)=\frac{1}{n}\|H(y)\|^2$
  $e_1 \leftarrow H(r_n)/\|H(r_n)\|$
  for $i=n-1$ to $1$ do
     sample $r_i$ from the distribution with density $\rho_i(y)=\frac{1}{i}\|H(y)\|^2-\frac{1}{i}\sum_{j=1}^{n-i}|e_j^T H(y)|^2$
     $c_i \leftarrow H(r_i)-\frac{1}{i}\sum_{j=1}^{n-i}\left(e_j^T H(r_i)\right)e_j$
     $e_{n-i+1} \leftarrow c_i/\|c_i\|$
  end for
  return  $r=(r_1,\dots,r_n)$

In order to sample from $\rho_i(y)$, one may use rejection sampling; for further details, see (Lavancier et al., 2015). Note that the algorithm can be generalized in a straightforward fashion to a complex orthonormal basis $H(x)$ by replacing all transposes with Hermitian transposes.
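A minimal sketch of this sequential sampling scheme is given below, under several assumptions that are not taken from the paper: points live in $[0,1]$, `H` is the real Fourier family with odd $n$ (so that $\|H(y)\|^2=n$ and a uniform proposal suffices for rejection sampling), and the orthogonalization step is written as plain Gram-Schmidt, i.e. the full projection onto the previous $e_j$ is subtracted without a $1/i$ factor. The function names are hypothetical.

```python
import numpy as np


def H(y, n):
    """First n real Fourier functions on [0, 1] (n odd), evaluated at a scalar y."""
    ks = np.arange(1, (n - 1) // 2 + 1)
    return np.concatenate([
        [1.0],
        np.sqrt(2.0) * np.cos(2 * np.pi * ks * y),
        np.sqrt(2.0) * np.sin(2 * np.pi * ks * y),
    ])


def sample_projection_dpp(n, rng):
    """Draw n points in [0, 1] following the structure of Algorithm 1."""
    if n % 2 != 1:
        raise ValueError("this toy basis needs odd n so that ||H(y)||^2 == n")
    basis = []   # orthonormal vectors e_1, ..., e_{n-i}
    points = []  # filled in the order r_n, r_{n-1}, ..., r_1
    for _ in range(n):
        # Rejection sampling from rho_i with a uniform proposal on [0, 1]:
        # i * rho_i(y) = ||H(y)||^2 - sum_j |e_j^T H(y)|^2 is bounded by n.
        while True:
            y = rng.uniform()
            h = H(y, n)
            residual = h - sum((e @ h) * e for e in basis)
            if rng.uniform() * n < residual @ residual:
                break
        points.append(y)
        # Gram-Schmidt update: the normalized residual becomes the next e_j.
        basis.append(residual / np.linalg.norm(residual))
    return np.array(points[::-1])  # reorder as (r_1, ..., r_n)


rng = np.random.default_rng(3)
sample = sample_projection_dpp(5, rng)
```

Because the squared residual norm equals $i\,\rho_i(y)$ when the $e_j$ are orthonormal, a single residual computation serves both the acceptance test and the basis update.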

Appendix G Proof of Theorem 5

Theorem.

The transformation $T^{\ell}$ is $\mathbb{G}$-equivariant if and only if

$$T_\alpha^{\ell}(\pi_\alpha r_\alpha^{\ell}, \pi_{\hat{\alpha}} r_{\hat{\alpha}}^{\ell}) = \pi_\alpha T_\alpha^{\ell}(r_\alpha^{\ell}, r_{\hat{\alpha}}^{\ell}) \qquad \alpha\in\{u,d\}$$

That is, $T_\alpha^{\ell}$ is equivariant with respect to $r_\alpha^{\ell}$, and invariant with respect to $r_{\hat{\alpha}}^{\ell}$.
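To make the condition concrete, the following sketch shows one simple map with this structure: a per-electron linear term plus mean-pooled summaries of the same-spin and opposite-spin electrons. It is a hypothetical example chosen for illustration, not a layer proposed in this paper, and it only checks the symmetry condition (invertibility is not addressed).

```python
import numpy as np


def layer(r_u, r_d, W_self, W_same, W_other):
    """Toy layer: equivariant in the same-spin rows, invariant in the opposite-spin rows."""
    def update(r_a, r_b):
        # The per-electron term follows any reordering of r_a; the pooled
        # terms do not change under reordering of either r_a or r_b.
        return r_a @ W_self + r_a.mean(axis=0) @ W_same + r_b.mean(axis=0) @ W_other
    return update(r_u, r_d), update(r_d, r_u)


rng = np.random.default_rng(4)
r_u, r_d = rng.normal(size=(3, 3)), rng.normal(size=(2, 3))
W_self, W_same, W_other = (rng.normal(size=(3, 3)) for _ in range(3))
pi_u, pi_d = rng.permutation(3), rng.permutation(2)

out_u, out_d = layer(r_u, r_d, W_self, W_same, W_other)
perm_u, perm_d = layer(r_u[pi_u], r_d[pi_d], W_self, W_same, W_other)
```

Permuting the inputs by $(\pi_u,\pi_d)$ permutes the spin-up output by $\pi_u$ alone and the spin-down output by $\pi_d$ alone, so `perm_u` matches `out_u[pi_u]` and `perm_d` matches `out_d[pi_d]`.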

Proof.

Let us begin with the forward direction: suppose that $T^{\ell}$ is $\mathbb{G}$-equivariant. Let $\pi\in\mathbb{G}$; as $\mathbb{G}=\mathbb{S}_{\mathcal{N}_{u}}\times\mathbb{S}_{\mathcal{N}_{d}}$, we may write the permutation $\pi=\pi_{u}\otimes\pi_{d}$, where $\pi_{u}$ is a permutation which applies to the indices in $\mathcal{N}_{u}$, and similarly for $\pi_{d}$ and the indices in $\mathcal{N}_{d}$. Then $\mathbb{G}$-equivariance of $T^{\ell}$ implies

$$T^{\ell}(\pi r^{\ell})=\pi T^{\ell}(r^{\ell})=\pi r^{\ell+1} \tag{69}$$

Now, let us break this down by spin. Note that

$$\pi r^{\ell+1}=(\pi_{u}r_{u}^{\ell+1},\pi_{d}r_{d}^{\ell+1}) \tag{70}$$

and also

$$T^{\ell}(\pi r^{\ell})=(T_{u}^{\ell}(\pi_{u}r_{u}^{\ell},\pi_{d}r_{d}^{\ell})\,,\,T_{d}^{\ell}(\pi_{u}r_{u}^{\ell},\pi_{d}r_{d}^{\ell})) \tag{71}$$

But Equation (69) says that $\pi r^{\ell+1}=T^{\ell}(\pi r^{\ell})$, so we may combine the last two equations to give

$$T_{u}^{\ell}(\pi_{u}r_{u}^{\ell},\pi_{d}r_{d}^{\ell})=\pi_{u}T_{u}^{\ell}(r_{u}^{\ell},r_{d}^{\ell})\qquad\text{and}\qquad T_{d}^{\ell}(\pi_{u}r_{u}^{\ell},\pi_{d}r_{d}^{\ell})=\pi_{d}T_{d}^{\ell}(r_{u}^{\ell},r_{d}^{\ell}) \tag{72}$$

In words, $T_{u}^{\ell}$ is equivariant with respect to $\pi_{u}$, and invariant with respect to $\pi_{d}$; and the reverse is true for $T_{d}^{\ell}$. For notational convenience, we use $\alpha\in\{u,d\}$ to denote the spin, and the complement of the spin is given by $\hat{\alpha}$ (i.e. if $\alpha=u$ then $\hat{\alpha}=d$). In this case, we may summarize Equation (72) as

$$T_{\alpha}^{\ell}(\pi_{\alpha}r_{\alpha}^{\ell},\pi_{\hat{\alpha}}r_{\hat{\alpha}}^{\ell})=\pi_{\alpha}T_{\alpha}^{\ell}(r_{\alpha}^{\ell},r_{\hat{\alpha}}^{\ell})\qquad\alpha\in\{u,d\} \tag{73}$$

which completes the proof for the forward direction.

Now, suppose that $T_{\alpha}^{\ell}(\pi_{\alpha}r_{\alpha}^{\ell},\pi_{\hat{\alpha}}r_{\hat{\alpha}}^{\ell})=\pi_{\alpha}T_{\alpha}^{\ell}(r_{\alpha}^{\ell},r_{\hat{\alpha}}^{\ell})$. For a given permutation $\pi=\pi_{u}\otimes\pi_{d}$, we have

$$T^{\ell}(\pi r^{\ell})=(T_{u}^{\ell}(\pi_{u}r_{u}^{\ell},\pi_{d}r_{d}^{\ell})\,,\,T_{d}^{\ell}(\pi_{u}r_{u}^{\ell},\pi_{d}r_{d}^{\ell}))=(\pi_{u}T_{u}^{\ell}(r_{u}^{\ell},r_{d}^{\ell})\,,\,\pi_{d}T_{d}^{\ell}(r_{u}^{\ell},r_{d}^{\ell}))=\pi T^{\ell}(r^{\ell}) \tag{74}$$

so that $T^{\ell}$ is $\mathbb{G}$-equivariant, as desired. This completes the proof for the reverse direction. ∎
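The per-spin condition can also be checked numerically. The following is a minimal sketch, not the layer used in this work: it assumes a hypothetical toy layer in which each spin block adds a row-wise term and two mean-pooled terms, so that it is equivariant in its own spin and invariant in the other. The names `spin_block`, `layer`, and the weight matrices are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
D, n_u, n_d = 3, 4, 3  # spatial dimension and electron counts (illustrative)


def make_weights():
    # Three D x D matrices: own-row term, same-spin pooled term, other-spin pooled term.
    return [0.5 * rng.normal(size=(D, D)) for _ in range(3)]


def spin_block(r_same, r_other, weights):
    """Toy T_alpha: equivariant in r_same, invariant in r_other (an assumption)."""
    w_self, w_same, w_other = weights
    pooled_same = r_same.mean(axis=0, keepdims=True)    # unchanged by row permutations
    pooled_other = r_other.mean(axis=0, keepdims=True)  # unchanged by row permutations
    update = r_same @ w_self + pooled_same @ w_same + pooled_other @ w_other
    return r_same + np.tanh(update)


weights_u, weights_d = make_weights(), make_weights()


def layer(r_u, r_d):
    """T = (T_u, T_d), each block built from the toy spin_block above."""
    return spin_block(r_u, r_d, weights_u), spin_block(r_d, r_u, weights_d)


r_u, r_d = rng.normal(size=(n_u, D)), rng.normal(size=(n_d, D))
pi_u, pi_d = rng.permutation(n_u), rng.permutation(n_d)

# Compare T(pi r) with pi T(r), spin by spin.
out_u, out_d = layer(r_u, r_d)
perm_out_u, perm_out_d = layer(r_u[pi_u], r_d[pi_d])
is_equivariant = bool(
    np.allclose(perm_out_u, out_u[pi_u]) and np.allclose(perm_out_d, out_d[pi_d])
)
print(is_equivariant)
```

Row indexing such as `r_u[pi_u]` plays the role of $\pi_{u}r_{u}^{\ell}$; the check passes only because each block satisfies the per-spin condition of the theorem.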

Appendix H Proof of Theorem 6

Theorem.

Let the transformation $r=T(z)$ be specified by the ODE

$$\frac{dv}{dt}=\Gamma(v),\quad\text{with}\quad v(0)=z\sim\rho_{z}(\cdot)\quad\text{and}\quad r=v(1)$$

Then $T$ is $\mathbb{G}$-equivariant if $\Gamma$ is $\mathbb{G}$-equivariant.

Proof.

The result follows directly from Theorem 2 in (Köhler et al., 2020). ∎
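As an illustration of the statement, the sketch below integrates a toy $\mathbb{G}$-equivariant vector field with a fixed-step forward Euler scheme and checks that the resulting map commutes with per-spin permutations. Several things here are assumptions made purely for the example: the form of `gamma`, the Euler solver and step count, and the use of Gaussian samples in place of the base distribution $\rho_{z}$.

```python
import numpy as np

rng = np.random.default_rng(1)
D, n_u, n_d = 3, 3, 2  # illustrative sizes
W = {a: (0.5 * rng.normal(size=(D, D)), 0.5 * rng.normal(size=(D, D))) for a in "ud"}


def gamma(v_u, v_d):
    """Toy G-equivariant vector field (hypothetical), returned as a (u, d) pair."""
    def block(v_same, v_other, weights):
        w_self, w_other = weights
        pooled = v_other.mean(axis=0, keepdims=True)  # invariant to other-spin permutations
        return np.tanh(v_same @ w_self + pooled @ w_other)
    return block(v_u, v_d, W["u"]), block(v_d, v_u, W["d"])


def flow(z_u, z_d, steps=100):
    """Forward Euler approximation of r = v(1) for dv/dt = Gamma(v), v(0) = z."""
    v_u, v_d, dt = z_u.copy(), z_d.copy(), 1.0 / steps
    for _ in range(steps):
        g_u, g_d = gamma(v_u, v_d)
        v_u, v_d = v_u + dt * g_u, v_d + dt * g_d
    return v_u, v_d


# Placeholder samples standing in for z ~ rho_z (not the base distribution of the paper).
z_u, z_d = rng.normal(size=(n_u, D)), rng.normal(size=(n_d, D))
pi_u, pi_d = rng.permutation(n_u), rng.permutation(n_d)

r_u, r_d = flow(z_u, z_d)
r_u_perm, r_d_perm = flow(z_u[pi_u], z_d[pi_d])
flow_is_equivariant = bool(
    np.allclose(r_u_perm, r_u[pi_u]) and np.allclose(r_d_perm, r_d[pi_d])
)
print(flow_is_equivariant)
```

Each Euler step is itself an equivariant map, so the discretized flow inherits the property exactly (up to floating-point rounding), mirroring the continuous-time argument.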

Appendix I The Complexity of Discrete Normalizing Flows

The limiting factor in the complexity of the discrete normalizing flow is the computation of determinants; unlike traditional normalizing flows, we make no effort to accelerate the computation of the determinant of the Jacobian, which allows us to have more expressive layers. In particular, the relevant space is of dimension $Dn$, where $D=3$ and $n$ is on the order of tens of electrons for small molecules. Thus, the overall dimension of the space is in the low hundreds.

We note that the cost of computing the determinant of the Jacobian is cubic in the dimension; for a low-dimensional space this is acceptable. Furthermore, popular methods based on neural networks, such as FermiNet (Pfau et al., 2020) and PauliNet (Hermann et al., 2020), use determinants explicitly in their ansätze, so that they have similar complexity. However, these methods use Markov Chain Monte Carlo sampling, so that they incur extra overhead from having to sample by solving for the limit of a stochastic differential equation, which our method avoids.
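To make the scaling concrete, the following sketch evaluates the log-determinant of a dense $Dn\times Dn$ matrix directly and reports the cubic growth factor. It is only an illustration: the helper names `log_abs_det` and `dense_cost_scale` are hypothetical, and the random matrix merely stands in for a layer Jacobian.

```python
import numpy as np

D = 3  # spatial dimension per electron


def log_abs_det(jacobian):
    """Log of |det J| computed densely, with no structural shortcut."""
    _, logabsdet = np.linalg.slogdet(jacobian)
    return logabsdet


def dense_cost_scale(n_electrons):
    """Return (dimension, dimension**3): the size and the cubic growth factor."""
    dim = D * n_electrons
    return dim, dim ** 3


rng = np.random.default_rng(0)
n = 30  # tens of electrons, as for a small molecule
dim, cubic_factor = dense_cost_scale(n)

# Stand-in Jacobian: a perturbation of the identity of size Dn x Dn.
jacobian = np.eye(dim) + 0.01 * rng.normal(size=(dim, dim))
print(dim, cubic_factor, log_abs_det(jacobian))
```

For $n=30$ the matrix is only $90\times 90$, which is why a dense determinant remains affordable in this regime.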

Appendix J Proof of Theorem 7

Theorem.

Let $T^{\ell}$ be a Split Subspace Layer, as given by

$$r_{\alpha,i}^{\ell+1}=T_{\alpha,i}^{\ell}(r_{\alpha}^{\ell},r_{\hat{\alpha}}^{\ell})=r_{\alpha,i}^{\ell}+\xi_{\alpha}^{\ell}\varphi_{\alpha,i}^{\ell}(\gamma_{\alpha}^{\ell},\gamma_{\hat{\alpha}}^{\ell})\quad\text{with}\quad\varphi_{\alpha,i}^{\ell}(\gamma_{\alpha}^{\ell},\gamma_{\hat{\alpha}}^{\ell})\in\mathbb{R}^{D-D_{\beta}}$$

Then $T^{\ell}$ is invertible. In particular, let $\underline{\gamma}_{\alpha,i}^{\ell+1}=(\beta_{\alpha}^{\ell})^{T}r_{\alpha,i}^{\ell+1}$; then the inverse of the layer is given by

$$r_{\alpha,i}^{\ell}=r_{\alpha,i}^{\ell+1}-\xi_{\alpha}^{\ell}\varphi_{\alpha,i}^{\ell}(\underline{\gamma}_{\alpha}^{\ell+1},\underline{\gamma}_{\hat{\alpha}}^{\ell+1})$$

Furthermore, the layer $T^{\ell}$ is $\mathbb{G}$-equivariant if

$$\varphi_{\alpha}^{\ell}(\pi_{\alpha}\gamma_{\alpha}^{\ell},\pi_{\hat{\alpha}}\gamma_{\hat{\alpha}}^{\ell})=\pi_{\alpha}\varphi_{\alpha}^{\ell}(\gamma_{\alpha}^{\ell},\gamma_{\hat{\alpha}}^{\ell})$$

i.e. if $\varphi_{\alpha}^{\ell}(\gamma_{\alpha}^{\ell},\gamma_{\hat{\alpha}}^{\ell})$ is equivariant with respect to permutations on $\gamma_{\alpha}^{\ell}$ and invariant with respect to permutations on $\gamma_{\hat{\alpha}}^{\ell}$.

Proof.

Let us first prove the layer’s inverse. First, note that $\underline{\gamma}_{\alpha,i}^{\ell+1}$ can be computed entirely from variables in layer $\ell+1$. Also note that $\underline{\gamma}_{\alpha,i}^{\ell+1}\neq\gamma_{\alpha,i}^{\ell+1}$, since $\gamma_{\alpha,i}^{\ell+1}=(\beta_{\alpha}^{\ell+1})^{T}r_{\alpha,i}^{\ell+1}$; that is, $\underline{\gamma}_{\alpha,i}^{\ell+1}$ uses $\beta_{\alpha}^{\ell}$, while $\gamma_{\alpha,i}^{\ell+1}$ uses $\beta_{\alpha}^{\ell+1}$. Now, we show that

$$\begin{aligned}
\underline{\gamma}_{\alpha,i}^{\ell+1} &= (\beta_{\alpha}^{\ell})^{T}r_{\alpha,i}^{\ell+1} \\
&= (\beta_{\alpha}^{\ell})^{T}\left(r_{\alpha,i}^{\ell}+\xi_{\alpha}^{\ell}\varphi_{\alpha,i}^{\ell}(\gamma_{\alpha}^{\ell},\gamma_{\hat{\alpha}}^{\ell})\right) \\
&= (\beta_{\alpha}^{\ell})^{T}r_{\alpha,i}^{\ell}+(\beta_{\alpha}^{\ell})^{T}\xi_{\alpha}^{\ell}\varphi_{\alpha,i}^{\ell}(\gamma_{\alpha}^{\ell},\gamma_{\hat{\alpha}}^{\ell}) \\
&= \gamma_{\alpha,i}^{\ell}+0=\gamma_{\alpha,i}^{\ell}
\end{aligned}$$

where the equality in the last line follows from the fact that $\xi_{\alpha}^{\ell}$ is the orthogonal complement of $\beta_{\alpha}^{\ell}$, so that $(\beta_{\alpha}^{\ell})^{T}\xi_{\alpha}^{\ell}=0$. That is, the Split Subspace Layer has the nice property that it preserves projections onto the subspace given by $\beta_{\alpha}^{\ell}$.

Given this, the inverse is straightforwardly computed by rearranging Equation (18):

$$r_{\alpha,i}^{\ell}=r_{\alpha,i}^{\ell+1}-\xi_{\alpha}^{\ell}\varphi_{\alpha,i}^{\ell}(\gamma_{\alpha}^{\ell},\gamma_{\hat{\alpha}}^{\ell})=r_{\alpha,i}^{\ell+1}-\xi_{\alpha}^{\ell}\varphi_{\alpha,i}^{\ell}(\underline{\gamma}_{\alpha}^{\ell+1},\underline{\gamma}_{\hat{\alpha}}^{\ell+1}) \tag{75}$$

Note that everything on the right-hand side depends on variables from layer $\ell+1$, as desired. Thus, we have shown the layer in Equation (18) is invertible regardless of the form of the network $\varphi_{\alpha}^{\ell}$.
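The inversion argument can be exercised numerically. The sketch below is one possible realization under stated assumptions: $\beta_{\alpha}^{\ell}$ and $\xi_{\alpha}^{\ell}$ are taken as complementary column blocks of a random orthogonal matrix, $D_{\beta}=1$, and `phi` is a hypothetical toy network (the argument above does not depend on its form). Electron positions are stored as rows, so $\gamma_{\alpha}=r_{\alpha}\beta_{\alpha}$ and the update is $\varphi_{\alpha}\,\xi_{\alpha}^{T}$.

```python
import numpy as np

rng = np.random.default_rng(0)
D, D_beta = 3, 1            # D_beta = 1 is an illustrative choice
n = {"u": 4, "d": 3}        # electrons per spin (illustrative)
other = {"u": "d", "d": "u"}


def split_basis():
    """Orthonormal beta (D x D_beta) and xi (D x (D - D_beta)) with beta^T xi = 0."""
    q, _ = np.linalg.qr(rng.normal(size=(D, D)))
    return q[:, :D_beta], q[:, D_beta:]


basis = {a: split_basis() for a in n}
weights = {a: (rng.normal(size=(D_beta, D - D_beta)),
               rng.normal(size=(D_beta, D - D_beta))) for a in n}


def phi(a, gamma):
    """Toy stand-in for the network phi_alpha; any function of gamma would do."""
    w_self, w_other = weights[a]
    pooled = gamma[other[a]].mean(axis=0, keepdims=True)
    return np.tanh(gamma[a] @ w_self + pooled @ w_other)


def forward(r):
    # gamma_alpha = beta_alpha^T r_alpha, applied row by row.
    gamma = {a: r[a] @ basis[a][0] for a in r}
    return {a: r[a] + phi(a, gamma) @ basis[a][1].T for a in r}


def inverse(r_next):
    # gamma_bar uses the layer's own beta applied to the layer's output.
    gamma_bar = {a: r_next[a] @ basis[a][0] for a in r_next}
    return {a: r_next[a] - phi(a, gamma_bar) @ basis[a][1].T for a in r_next}


r = {a: rng.normal(size=(n[a], D)) for a in n}
r_next = forward(r)
r_back = inverse(r_next)

projection_ok = all(np.allclose(r_next[a] @ basis[a][0], r[a] @ basis[a][0]) for a in r)
roundtrip_ok = all(np.allclose(r_back[a], r[a]) for a in r)
print(projection_ok, roundtrip_ok)
```

The inverse never needs to solve an equation: because the projection onto $\beta_{\alpha}^{\ell}$ is unchanged by the layer, the input to $\varphi_{\alpha}^{\ell}$ can be recomputed from the output alone.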

Now let us turn to proving the layer’s 𝔾𝔾\mathbb{G}blackboard_G-equivariance. Recall that the conditions for 𝔾𝔾\mathbb{G}blackboard_G-equivariance are given by Equation (14), which we can combine with Equation (18):

$$\begin{aligned}
& T_{\alpha}^{\ell}(\pi_{\alpha}r_{\alpha}^{\ell},\pi_{\hat{\alpha}}r_{\hat{\alpha}}^{\ell}) = \pi_{\alpha}T_{\alpha}^{\ell}(r_{\alpha}^{\ell},r_{\hat{\alpha}}^{\ell}) \\
\Leftrightarrow\quad & T_{\alpha,i}^{\ell}(\pi_{\alpha}r_{\alpha}^{\ell},\pi_{\hat{\alpha}}r_{\hat{\alpha}}^{\ell}) = T_{\alpha,\pi_{\alpha}(i)}^{\ell}(r_{\alpha}^{\ell},r_{\hat{\alpha}}^{\ell}) \\
\Leftrightarrow\quad & r_{\alpha,\pi_{\alpha}(i)}^{\ell} + \xi_{\alpha}^{\ell}\varphi_{\alpha,i}^{\ell}(\pi_{\alpha}\gamma_{\alpha}^{\ell},\pi_{\hat{\alpha}}\gamma_{\hat{\alpha}}^{\ell}) = r_{\alpha,\pi_{\alpha}(i)}^{\ell} + \xi_{\alpha}^{\ell}\varphi_{\alpha,\pi_{\alpha}(i)}^{\ell}(\gamma_{\alpha}^{\ell},\gamma_{\hat{\alpha}}^{\ell}) \\
\Leftrightarrow\quad & \varphi_{\alpha,i}^{\ell}(\pi_{\alpha}\gamma_{\alpha}^{\ell},\pi_{\hat{\alpha}}\gamma_{\hat{\alpha}}^{\ell}) = \varphi_{\alpha,\pi_{\alpha}(i)}^{\ell}(\gamma_{\alpha}^{\ell},\gamma_{\hat{\alpha}}^{\ell}) \\
\Leftrightarrow\quad & \varphi_{\alpha}^{\ell}(\pi_{\alpha}\gamma_{\alpha}^{\ell},\pi_{\hat{\alpha}}\gamma_{\hat{\alpha}}^{\ell}) = \pi_{\alpha}\varphi_{\alpha}^{\ell}(\gamma_{\alpha}^{\ell},\gamma_{\hat{\alpha}}^{\ell})
\end{aligned}\tag{76}$$

where $\pi_{\alpha}(i)$ indicates the index that electron $i$ is moved to under the permutation $\pi_{\alpha}$; and the fourth line follows from the fact that the previous statement must be true for all possible outputs of $\varphi_{\alpha}^{\ell}$. This completes the proof. ∎

Appendix K Implementation of the $\mathbb{G}$-Equivariant Layer

As we have seen, invertibility places no special restrictions on the form of $\varphi_{\alpha}^{\ell}$. With regard to the conditions imposed by $\mathbb{G}$-equivariance in Equation (20), there are several ways to achieve them. We propose the following method, as it uses standard off-the-shelf architectures; we use the variables $\zeta_{\alpha,i}^{\ell}$ to represent intermediate quantities.

  1. Lifting: Map each value $\gamma_{\alpha,i}^{\ell}$ from dimension $D_{\beta}$ to dimension $D_{\zeta}$:

    $$\zeta_{\alpha,i}^{\ell} = W_{\alpha}\gamma_{\alpha,i}^{\ell} \tag{77}$$

    where there are two matrices $W_{\alpha}$ of dimension $D_{\zeta}\times D_{\beta}$, one for each spin $\alpha\in\{u,d\}$.

  2. Multihead Attention: We have two Multihead Attention (MHA) layers $\tau_{\alpha}^{\ell}$, one for each spin. Each MHA takes as input the list $\zeta_{\alpha}^{\ell}=\{\zeta_{\alpha,i}^{\ell}\}_{i\in\mathcal{N}_{\alpha}}$. The output of the MHA is then

    $$\zeta_{\alpha}^{\ell}\leftarrow\tau_{\alpha}^{\ell}(\zeta_{\alpha}^{\ell}) \tag{78}$$
  3. Fully Connected Layer Per Spin: There are two fully connected layers $\mu_{\alpha}^{\ell}$, one for each spin. The layer is applied per electron, with the same layer being applied to all electrons of a given spin:

    $$\zeta_{\alpha,i}^{\ell}\leftarrow\mu_{\alpha}^{\ell}(\zeta_{\alpha,i}^{\ell})\quad\text{for } i\in\mathcal{N}_{\alpha} \tag{79}$$
  4. Average: Form the average values: $\bar{\zeta}_{\alpha}^{\ell}=\frac{1}{n_{\alpha}}\sum_{i\in\mathcal{N}_{\alpha}}\zeta_{\alpha,i}^{\ell}$.

  5. Fully Connected Layer with Spin Mixing: We have two fully connected layers $\hat{\mu}_{\alpha}^{\ell}$, one for each spin. Then:

    $$\varphi_{\alpha,i}^{\ell}(\gamma_{\alpha}^{\ell},\gamma_{\hat{\alpha}}^{\ell})=\hat{\mu}_{\alpha}^{\ell}(\texttt{CAT}(\zeta_{\alpha,i}^{\ell}\,,\,\bar{\zeta}_{\hat{\alpha}}^{\ell}))\quad\text{for } i\in\mathcal{N}_{\alpha} \tag{80}$$

    The output of the MLPs $\mu_{\alpha}^{\ell}$ is of dimension $D-D_{\beta}$.
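The five steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: a single-head self-attention without residual connections stands in for a full Multihead Attention layer, all weights are random, the final mixing layer is purely linear, and every name (`init_spin_params`, `features`, `phi`, the dictionary keys) is hypothetical rather than taken from the paper.

```python
import math
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def silu(v):
    return [x / (1.0 + math.exp(-x)) for x in v]   # smooth activation

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def init_spin_params(rng, d_beta, d_zeta, d_out):
    """Hypothetical parameter container for one spin."""
    return {
        "W": rand_matrix(rng, d_zeta, d_beta),          # step 1: lifting
        "Wq": rand_matrix(rng, d_zeta, d_zeta),         # step 2: attention
        "Wk": rand_matrix(rng, d_zeta, d_zeta),
        "Wv": rand_matrix(rng, d_zeta, d_zeta),
        "M": rand_matrix(rng, d_zeta, d_zeta),          # step 3: per-electron layer
        "M_hat": rand_matrix(rng, d_out, 2 * d_zeta),   # step 5: spin mixing
    }

def attention(zs, p):
    """Single-head self-attention, a simplified stand-in for an MHA layer."""
    qs = [matvec(p["Wq"], z) for z in zs]
    ks = [matvec(p["Wk"], z) for z in zs]
    vs = [matvec(p["Wv"], z) for z in zs]
    scale = math.sqrt(len(zs[0]))
    out = []
    for q in qs:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in ks]
        top = max(scores)                               # stabilise the softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, vs)) / total
                    for c in range(len(vs[0]))])
    return out

def features(gammas, p):
    """Steps 1-3 for one spin: lift, attend, per-electron layer."""
    zs = [matvec(p["W"], g) for g in gammas]
    zs = attention(zs, p)
    return [silu(matvec(p["M"], z)) for z in zs]

def phi(gamma_same, gamma_other, p_same, p_other):
    z_same = features(gamma_same, p_same)
    z_other = features(gamma_other, p_other)
    n = len(z_other)
    # Step 4: average over the electrons of the opposite spin.
    z_bar = [sum(z[c] for z in z_other) / n for c in range(len(z_other[0]))]
    # Step 5: concatenate and apply the mixing layer per electron.
    return [matvec(p_same["M_hat"], z + z_bar) for z in z_same]
```

Permuting the entries of `gamma_same` should permute the rows of the output in the same way, and permuting `gamma_other` should leave the output unchanged up to round-off, which is the pair of conditions the construction is meant to satisfy.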

Due to the permutation-equivariance of Multihead Attention, the $\mathbb{G}$-equivariance follows naturally. Some comments are in order:

  • We can choose $D_{\beta}\in\{1,\dots,D-1\}$. Since in our case $D=3$, this gives us exactly two choices: $D_{\beta}=1$ or $D_{\beta}=2$.

  • The fully connected layers should use smooth activation functions, i.e. not ReLU. There are many possible smooth substitutes for ReLU-like activations, such as Swish, SiLU, etc.

  • To achieve orthogonalization, i.e. to ensure that $\xi_{\alpha}^{\ell}$ is itself orthonormal and is also orthogonal to $\beta_{\alpha}^{\ell}$, it is important to use a smooth procedure. Gram-Schmidt may be employed for this purpose: an initial (e.g. random) set of vectors is chosen, which is then orthonormalized by the procedure.

  • In the special case of Helium, there are only 2 electrons: one which is spin-up, and the other which is spin-down. In this case, the requirement that $\varphi_{\alpha}^{\ell}(\gamma_{\alpha}^{\ell},\gamma_{\hat{\alpha}}^{\ell})$ be equivariant with respect to permutations of $\gamma_{\alpha}^{\ell}$ is trivially satisfied; likewise, the requirement that $\varphi_{\alpha}^{\ell}(\gamma_{\alpha}^{\ell},\gamma_{\hat{\alpha}}^{\ell})$ be invariant with respect to permutations of $\gamma_{\hat{\alpha}}^{\ell}$ is also trivially satisfied. As a result, the Multihead Attention layers $\tau_{\alpha}^{\ell}$ may be replaced by the identity, with everything else remaining the same.
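The orthogonalization comment above can be made concrete with a short sketch of modified Gram-Schmidt. It assumes that the columns of $\beta_{\alpha}^{\ell}$ are already orthonormal and are passed in as `fixed`; the function name, the tolerance `eps`, and the error handling are illustrative assumptions. Each operation is differentiable in the candidate vectors as long as they stay linearly independent of the vectors before them.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(fixed, candidates, eps=1e-12):
    """Orthonormalize `candidates` against `fixed` and against each other.

    `fixed` is assumed to be orthonormal already (e.g. the columns of beta).
    Only the newly built vectors (e.g. the columns of xi) are returned.
    """
    basis = [list(v) for v in fixed]
    new = []
    for v in candidates:
        w = list(v)
        for b in basis:
            c = dot(w, b)                              # component along b
            w = [wi - c * bi for wi, bi in zip(w, b)]  # project it out
        norm = math.sqrt(dot(w, w))
        if norm < eps:
            raise ValueError("candidate vectors are linearly dependent")
        w = [wi / norm for wi in w]
        basis.append(w)
        new.append(w)
    return new
```

For $D=3$ and $D_{\beta}=1$, calling `gram_schmidt([beta], [v1, v2])` with two generic candidate vectors would return the two columns of a matrix playing the role of $\xi_{\alpha}^{\ell}$.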

Appendix L A Generalized Variant of the Split Subspace Layer

We note that a generalized variant of the Split Subspace Layer is as follows:

$$r_{\alpha,i}^{\ell+1}=T_{\alpha,i}^{\ell}(r_{\alpha}^{\ell},r_{\hat{\alpha}}^{\ell})=\beta_{\alpha}^{\ell}\eta_{\alpha,i}^{\ell}(\gamma_{\alpha}^{\ell},\gamma_{\hat{\alpha}}^{\ell})+\xi_{\alpha}^{\ell}\varphi_{\alpha,i}^{\ell}(\gamma_{\alpha}^{\ell},\gamma_{\hat{\alpha}}^{\ell}) \tag{81}$$

where both $\varphi$ and $\eta$ satisfy the conditions in (20), and $\eta$ is explicitly invertible in the sense that the system of equations $y_{\alpha,i}=\eta_{\alpha,i}^{\ell}(\gamma_{\alpha}^{\ell},\gamma_{\hat{\alpha}}^{\ell})$ for all $\alpha,i$ may be inverted to solve for all values of $\gamma_{\alpha,i}^{\ell}$.
An example of such an $\eta$ is given by $\eta_{\alpha,i}^{\ell}(\gamma_{\alpha}^{\ell},\gamma_{\hat{\alpha}}^{\ell})=f(A\gamma_{\alpha,i}^{\ell}+\sum_{j\neq i}B\gamma_{\alpha,j}^{\ell}+\sum_{j}C\gamma_{\hat{\alpha},j}^{\ell})$ for $D_{\beta}\times D_{\beta}$ matrices $A,B,C$ and an invertible nonlinearity $f:\mathbb{R}^{D_{\beta}}\to\mathbb{R}^{D_{\beta}}$ (such as the cube of each element).
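To illustrate what "explicitly invertible" can mean for this example, the following sketch treats the scalar case $D_{\beta}=1$ with $f$ the cube, so that $A,B,C$ are numbers shared by both spins. The function names and the closed-form solution via the per-spin totals are assumptions made for this illustration; the inverse exists only when $A\neq B$ and the $2\times 2$ system for the totals is non-singular.

```python
import math

def cube(x):
    return x ** 3

def cube_root(y):
    return math.copysign(abs(y) ** (1.0 / 3.0), y)

def eta(gamma_same, gamma_other, A, B, C):
    """eta_i = f(A g_i + B sum_{j != i} g_j + C sum_j g'_j), with f = cube."""
    s_same, s_other = sum(gamma_same), sum(gamma_other)
    return [cube(A * g + B * (s_same - g) + C * s_other) for g in gamma_same]

def eta_inverse(y_up, y_down, A, B, C):
    """Recover (gamma_up, gamma_down) from the outputs of both spins."""
    z_up = [cube_root(y) for y in y_up]        # undo f elementwise
    z_down = [cube_root(y) for y in y_down]
    n_u, n_d, a = len(z_up), len(z_down), A - B
    # Summing the linear equations over each spin gives a 2x2 system for
    # the totals S_u = sum(gamma_up) and S_d = sum(gamma_down).
    m11, m12 = a + n_u * B, n_u * C
    m21, m22 = n_d * C, a + n_d * B
    det = m11 * m22 - m12 * m21
    if abs(a) < 1e-12 or abs(det) < 1e-12:
        raise ValueError("A, B, C do not give an invertible system")
    Z_u, Z_d = sum(z_up), sum(z_down)
    S_u = (Z_u * m22 - m12 * Z_d) / det        # Cramer's rule
    S_d = (m11 * Z_d - m21 * Z_u) / det
    g_up = [(z - B * S_u - C * S_d) / a for z in z_up]
    g_down = [(z - B * S_d - C * S_u) / a for z in z_down]
    return g_up, g_down
```

Because $\eta$ depends on the same-spin inputs only through $\gamma_{\alpha,i}$ and symmetric sums, it is equivariant in $\gamma_{\alpha}$ and invariant in $\gamma_{\hat{\alpha}}$ by construction.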

Appendix M Proof of Theorem 8

Theorem.

The local energy can be written as

$$\mathcal{E}_{r}(r;\theta)=-\tfrac{1}{2}\Delta_{r}q(r;\theta)-\tfrac{1}{2}\|\nabla_{r}q(r;\theta)\|^{2}+V(r)$$

In particular, the local energy is independent of the phase $w(r;\theta)$. Furthermore, let

$$\Omega(r;\theta)=\nabla_{\theta}\mathcal{E}_{r}(r;\theta)+2\mathcal{E}_{r}(r;\theta)\nabla_{\theta}q(r;\theta)$$

Then the gradient of the loss function may be written as

$$\nabla_{\theta}\mathcal{L}(\theta)=\mathbb{E}_{r\sim\rho(\cdot;\theta)}\left[\Omega(r;\theta)\right]\approx\frac{1}{K}\sum_{k=1}^{K}\Omega\left(r^{(k)};\theta\right)$$

with samples $r^{(k)}\sim\rho(\cdot;\theta)$.
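As a sanity check of the estimator in the theorem, the following sketch applies it to a toy problem that is not taken from the paper: a single particle in one dimension with $V(r)=r^{2}/2$ and a Gaussian ansatz $\rho(r;\theta)=\sqrt{\theta/\pi}\,e^{-\theta r^{2}}$, so that $q=\tfrac{1}{2}\log\rho$ (assuming $\rho=|\psi|^{2}=e^{2q}$). All function names are hypothetical. In this case $\mathcal{L}(\theta)=\theta/4+1/(4\theta)$, which gives a closed-form gradient to compare against.

```python
import random

def local_energy(r, theta):
    # q = -theta r^2 / 2 + const, so dq/dr = -theta r and d2q/dr2 = -theta.
    # E_r = -0.5 q'' - 0.5 (q')^2 + V(r)
    return 0.5 * theta - 0.5 * (theta * r) ** 2 + 0.5 * r * r

def d_local_energy_d_theta(r, theta):
    return 0.5 - theta * r * r

def d_q_d_theta(r, theta):
    # q = 0.5 * log(rho) with rho = sqrt(theta / pi) * exp(-theta r^2)
    return -0.5 * r * r + 0.25 / theta

def omega(r, theta):
    return (d_local_energy_d_theta(r, theta)
            + 2.0 * local_energy(r, theta) * d_q_d_theta(r, theta))

def estimate_gradient(theta, num_samples=200_000, seed=0):
    """Monte Carlo average of Omega over samples r ~ rho(.; theta)."""
    rng = random.Random(seed)
    sigma = (0.5 / theta) ** 0.5          # rho is the normal N(0, 1 / (2 theta))
    total = 0.0
    for _ in range(num_samples):
        total += omega(rng.gauss(0.0, sigma), theta)
    return total / num_samples

def exact_gradient(theta):
    # d/dtheta of L(theta) = theta / 4 + 1 / (4 theta)
    return 0.25 - 0.25 / theta ** 2
```

The sample average should agree with `exact_gradient` up to Monte Carlo noise, and should be close to zero at $\theta=1$, the exact ground state of this toy problem.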

Proof.

For the moment, we suppress $\theta$ for convenience. Recall that the overall (i.e. complex) local energy is defined by

$$\mathcal{E}(r)=\frac{H\psi(r)}{\psi(r)}=-\frac{\Delta\psi(r)}{2\psi(r)}+V(r) \tag{82}$$

Let $r_{j,d}$ be the $d^{th}$ component of the position vector of the $j^{th}$ electron, $d=1,\dots,D$. Then plugging in $\psi(r)=e^{q(r)+iw(r)}$, we have that

$$\frac{\partial\psi}{\partial r_{j,d}}=e^{q(r)+iw(r)}\left(\frac{\partial q}{\partial r_{j,d}}+i\frac{\partial w}{\partial r_{j,d}}\right)=\psi(r)\left(\frac{\partial q}{\partial r_{j,d}}+i\frac{\partial w}{\partial r_{j,d}}\right) \tag{83}$$

and

$$\begin{aligned}
\frac{\partial^{2}\psi}{\partial r_{j,d}^{2}} &= e^{q(r)+iw(r)}\left(\frac{\partial^{2}q}{\partial r_{j,d}^{2}}+i\frac{\partial^{2}w}{\partial r_{j,d}^{2}}\right)+e^{q(r)+iw(r)}\left(\frac{\partial q}{\partial r_{j,d}}+i\frac{\partial w}{\partial r_{j,d}}\right)^{2} \\
&= \psi(r)\left[\left(\frac{\partial^{2}q}{\partial r_{j,d}^{2}}+\left(\frac{\partial q}{\partial r_{j,d}}\right)^{2}-\left(\frac{\partial w}{\partial r_{j,d}}\right)^{2}\right)+i\left(\frac{\partial^{2}w}{\partial r_{j,d}^{2}}+2\frac{\partial q}{\partial r_{j,d}}\frac{\partial w}{\partial r_{j,d}}\right)\right]
\end{aligned}\tag{84}$$

With the appropriate summation, this immediately yields

$$\mathcal{E}(r)=-\frac{1}{2}\sum_{j=1}^{n}\left[\left(\Delta_{j}q+\|\nabla_{j}q\|^{2}-\|\nabla_{j}w\|^{2}\right)+i\left(\Delta_{j}w+2\nabla_{j}q\cdot\nabla_{j}w\right)\right]+V(r) \tag{85}$$
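Equation (85) can be checked numerically in a hedged way for a single electron in one dimension. The functions `q` and `w` below are arbitrary smooth choices made up for this check (in particular, `w` is a smooth phase rather than one restricted to a two-element set), and the Laplacian of $\psi$ is approximated by central finite differences with a hypothetical step `h`.

```python
import cmath
import math

# Hypothetical smooth log-amplitude q and phase w, with analytic derivatives.
def q(r):   return -0.5 * r * r + 0.1 * math.sin(r)
def dq(r):  return -r + 0.1 * math.cos(r)
def d2q(r): return -1.0 - 0.1 * math.sin(r)

def w(r):   return 0.3 * r + 0.05 * r * r
def dw(r):  return 0.3 + 0.1 * r
def d2w(r): return 0.1

def psi(r):
    return cmath.exp(complex(q(r), w(r)))

def potential(r):
    return 0.5 * r * r

def local_energy_direct(r, h=1e-4):
    """H psi / psi with the second derivative taken by central differences."""
    lap = (psi(r + h) - 2.0 * psi(r) + psi(r - h)) / (h * h)
    return -0.5 * lap / psi(r) + potential(r)

def local_energy_formula(r):
    """Right-hand side of Equation (85) for n = 1, D = 1."""
    real = d2q(r) + dq(r) ** 2 - dw(r) ** 2
    imag = d2w(r) + 2.0 * dq(r) * dw(r)
    return -0.5 * complex(real, imag) + potential(r)
```

The two functions should agree to within the finite-difference error at any point `r`, for both the real and the imaginary parts.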

so that its real part simplifies to

$$\begin{aligned}
\mathcal{E}_{r}(r) &= \frac{1}{2}\sum_{j=1}^{n}\left(\|\nabla_{j}w\|^{2}-\Delta_{j}q-\|\nabla_{j}q\|^{2}\right)+V(r) \\
&= \tfrac{1}{2}\left(\|\nabla w\|^{2}-\Delta q-\|\nabla q\|^{2}\right)+V(r)
\end{aligned}\tag{86}$$

Now, it is known that, since the Hamiltonian is Hermitian and time-reversal invariant, its eigenvalues are real and its eigenfunctions can be chosen to be real. Since the ground-state wavefunction we are looking for is real, the phase $w(r)$ can be taken to belong to the two-element set $\{0,\pi\}$, where $w(r)=0$ corresponds to positive values of the wavefunction $\psi(r)$, and $w(r)=\pi$ to negative values of $\psi(r)$. Thus, wherever the sign of $\psi(r)$ does not change, $w(r)$ is constant, and therefore $\|\nabla w(r)\|^{2}=0$.

We are then left to consider the case when the sign of $\psi(r)$ flips, and therefore there is a discontinuity in $w(r)$; this occurs precisely where $\psi(r)=0$. However, recall from Equation (24) that

$$\mathcal{L}(\theta)=\langle\psi(\cdot;\theta)|H|\psi(\cdot;\theta)\rangle\,=\,\mathbb{E}_{r\sim\rho(\cdot;\theta)}\left[\mathcal{E}_{r}(r;\theta)\right] \tag{87}$$

When $\psi(r)=0$ then $\rho(r)=0$; thus, samples where there is a discontinuity are never selected. We may therefore set the local energy at such values of $r$ to any value we wish, without affecting the value of $\mathcal{L}(\theta)$. In particular, we are free to set $\|\nabla w(r)\|=0$ at such points. In conclusion, then, we have demonstrated that

$$\mathcal{E}_{r}(r;\theta)=-\tfrac{1}{2}\Delta_{r}q(r;\theta)-\tfrac{1}{2}\|\nabla_{r}q(r;\theta)\|^{2}+V(r) \tag{88}$$

which is independent of the phase $w(r)$.
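As a quick sanity check of Equation (88), consider a toy case that is not treated in the text: one particle in one dimension with the harmonic potential $V(r)=\tfrac{1}{2}r^{2}$ and a Gaussian wavefunction, so that $q(r)=-\tfrac{1}{2}r^{2}$ up to an additive constant. Equation (88) then gives $\mathcal{E}_{r}(r)=\tfrac{1}{2}$ at every $r$, the known ground state energy of this system. The sketch below is only illustrative: the function names are hypothetical, and central finite differences stand in for the auto-differentiation used in the paper.

```python
def local_energy(q, V, r, h=1e-3):
    """Evaluate Equation (88) in one dimension with central finite differences.

    q is the log-amplitude (half the log-density) and V the potential;
    both are plain Python callables chosen for this illustration.
    """
    grad_q = (q(r + h) - q(r - h)) / (2.0 * h)
    lap_q = (q(r + h) - 2.0 * q(r) + q(r - h)) / (h * h)
    return -0.5 * lap_q - 0.5 * grad_q ** 2 + V(r)


def q_gauss(r):
    # Log-amplitude of a Gaussian wavefunction (additive constant dropped).
    return -0.5 * r * r


def V_harmonic(r):
    # Harmonic potential, an assumption made for this toy example.
    return 0.5 * r * r


# The local energy should be (numerically) constant across sample points.
energies = [local_energy(q_gauss, V_harmonic, r) for r in (-1.5, -0.3, 0.0, 0.7, 2.0)]
```

A constant local energy is the signature of an exact eigenfunction, which is why the variance of $\mathcal{E}_{r}$ is often monitored during optimization.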

Turning to the second part of the theorem, we note that

$$\mathcal{L}(\theta)\,=\,\mathbb{E}_{r\sim\rho(\cdot;\theta)}\left[\mathcal{E}_{r}(r;\theta)\right]\,=\,\int\mathcal{E}_{r}(r;\theta)\rho(r;\theta)\,dr \tag{89}$$

so that

$$\nabla_{\theta}\mathcal{L}(\theta)\,=\,\int\left(\nabla_{\theta}\mathcal{E}_{r}(r;\theta)\rho(r;\theta)+\mathcal{E}_{r}(r;\theta)\nabla_{\theta}\rho(r;\theta)\right)dr \tag{90}$$

However, since $q(r;\theta)=\frac{1}{2}\log\rho(r;\theta)$, we have $\nabla_{\theta}q(r;\theta)=\nabla_{\theta}\rho(r;\theta)/\left(2\rho(r;\theta)\right)$, or $\nabla_{\theta}\rho(r;\theta)=2\rho(r;\theta)\nabla_{\theta}q(r;\theta)$. Plugging this in gives

$$\begin{aligned}
\nabla_{\theta}\mathcal{L}(\theta) &= \int\left(\nabla_{\theta}\mathcal{E}_{r}(r;\theta)\rho(r;\theta)+2\mathcal{E}_{r}(r;\theta)\rho(r;\theta)\nabla_{\theta}q(r;\theta)\right)dr\\
&= \int\rho(r;\theta)\left(\nabla_{\theta}\mathcal{E}_{r}(r;\theta)+2\mathcal{E}_{r}(r;\theta)\nabla_{\theta}q(r;\theta)\right)dr\\
&= \mathbb{E}_{r\sim\rho(\cdot;\theta)}\left[\Omega(r;\theta)\right]
\end{aligned} \tag{91}$$

where $\Omega(r;\theta)=\nabla_{\theta}\mathcal{E}_{r}(r;\theta)+2\mathcal{E}_{r}(r;\theta)\nabla_{\theta}q(r;\theta)$. ∎
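The identity in Equation (91) can be checked numerically on a toy family that is not part of the paper: the normalized one-dimensional Gaussian density $\rho(r;\theta)=\sqrt{\theta/\pi}\,e^{-\theta r^{2}}$ with $V(r)=\tfrac{1}{2}r^{2}$, for which $\mathcal{L}(\theta)=\theta/4+1/(4\theta)$ in closed form. The sketch below, with hypothetical function names and a simple trapezoid rule in place of sampling, compares the expectation of $\Omega$ against a finite-difference derivative of $\mathcal{L}$. Note that it relies on $\rho$ being normalized for every $\theta$, as is the case for a normalizing flow.

```python
import math


def q(r, theta):
    # Half the log of the normalized density rho = sqrt(theta/pi) * exp(-theta r^2).
    return 0.25 * math.log(theta / math.pi) - 0.5 * theta * r * r


def rho(r, theta):
    return math.exp(2.0 * q(r, theta))


def local_energy(r, theta):
    # Equation (88) with grad_r q = -theta r, lap_r q = -theta, V = r^2 / 2.
    return 0.5 * theta - 0.5 * (theta * r) ** 2 + 0.5 * r * r


def omega(r, theta):
    # Omega = d/dtheta of the local energy + 2 * local energy * d/dtheta of q.
    d_energy = 0.5 - theta * r * r
    d_q = 0.25 / theta - 0.5 * r * r
    return d_energy + 2.0 * local_energy(r, theta) * d_q


def expect(f, theta, lo=-12.0, hi=12.0, n=20001):
    # Trapezoid rule for the expectation of f under rho(.; theta).
    h = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        r = lo + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * f(r, theta) * rho(r, theta)
    return total * h


def loss(theta):
    return expect(local_energy, theta)


theta0 = 0.6
grad_from_omega = expect(omega, theta0)
eps = 1e-4
grad_finite_diff = (loss(theta0 + eps) - loss(theta0 - eps)) / (2.0 * eps)
```

Both quantities should agree with the closed-form derivative $\tfrac{1}{4}-\tfrac{1}{4\theta^{2}}$ of the toy objective, which vanishes at $\theta=1$, the exact ground state.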

Appendix N Optimization of the Objective Function

In order to optimize the objective in Equation (24), we use the procedure in Algorithm 2, which is specified for the discrete normalizing flow; the procedure for the continuous normalizing flow will be similar. Note that we initially draw a large number $K_{large}$ of samples from the base density; we emphasize that this step can be performed entirely offline, and does not entail additional computational complexity.

Algorithm 2 Computation of Ground State Wavefunction and Energy
Require: base log-density $q_{z}(\cdot)$, normalizing flow $\{T^{\ell}(\cdot;\theta)\}_{\ell=0}^{L}$, potential $V(\cdot)$, learning rate $\epsilon$
  sample $\mathcal{Z}=\left\{z^{(k)}\right\}_{k=1}^{K_{large}}$ for $z^{(k)}\sim\rho_{z}(\cdot)$ and $K_{large}$ a very large number of samples
  take $q(r;\theta)$ from (23) and use auto-differentiation to compute $\nabla_{r}q(r;\theta)$ and $\Delta_{r}q(r;\theta)$
  using $\nabla_{r}q(r;\theta)$ and $\Delta_{r}q(r;\theta)$, compute $\mathcal{E}_{r}(r;\theta)$ from (25)
  using auto-differentiation, compute the function $\Omega(r;\theta)$ as in (26)
  initialize $\theta$, e.g. using Xavier initialization
  while not converged do
     sample $K$ values of $z^{(k)}$ from $\mathcal{Z}$
     compute $r^{(k)}=T(z^{(k)};\theta)$ using $T=T^{L}\circ\dots\circ T^{0}$
     compute the energy $E=\frac{1}{K}\sum_{k=1}^{K}\mathcal{E}_{r}(r^{(k)};\theta)$
     compute the gradient $g=\frac{1}{K}\sum_{k=1}^{K}\Omega(r^{(k)};\theta)$
     take $\theta\leftarrow\theta-\epsilon g$
  end while
  return $E$, $\theta$
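The structure of Algorithm 2 can be illustrated on a deliberately tiny problem. The sketch below is not the paper's flow: it assumes a single particle in one dimension with $V(r)=\tfrac{1}{2}r^{2}$, a base density $\rho_{z}=\mathcal{N}(0,\tfrac{1}{2})$, and a one-parameter scaling flow $r=e^{\theta}z$. For this toy choice $q$, $\mathcal{E}_{r}$ and $\Omega$ are written out by hand rather than obtained by auto-differentiation, and all names (`run_toy_vmc`, `pool`, and so on) are hypothetical.

```python
import math
import random


def run_toy_vmc(theta=0.5, lr=0.1, steps=300, K=256, K_large=20000, seed=0):
    """Toy version of Algorithm 2 for a 1D harmonic oscillator."""
    rng = random.Random(seed)
    # Offline step: draw a large pool of samples from the base density N(0, 1/2).
    pool = [rng.gauss(0.0, math.sqrt(0.5)) for _ in range(K_large)]
    energy = float("nan")
    for _ in range(steps):
        s = math.exp(theta)           # scale of the flow r = s * z
        a = math.exp(-2.0 * theta)    # a = 1 / s^2
        batch = rng.sample(pool, K)   # minibatch drawn from the pool
        e_sum, g_sum = 0.0, 0.0
        for z in batch:
            r = s * z                                        # push sample through the flow
            e_loc = 0.5 * a + 0.5 * (1.0 - a * a) * r * r    # local energy, Equation (88)
            d_e = -a + 2.0 * a * a * r * r                   # d/dtheta of e_loc at fixed r
            d_q = a * r * r - 0.5                            # d/dtheta of q at fixed r
            e_sum += e_loc
            g_sum += d_e + 2.0 * e_loc * d_q                 # Omega(r; theta)
        energy = e_sum / K
        theta -= lr * (g_sum / K)                            # gradient step
    return energy, theta
```

With these assumptions the exact minimizer is $\theta=0$ with energy $\tfrac{1}{2}$, so a call such as `run_toy_vmc()` should return values close to that pair. The same loop applies unchanged when the hand-written derivatives are replaced by auto-differentiated ones.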

Appendix O Proof of Theorem 9

Theorem.

Let $\rho_{0}(r)$ be the density for the ground state wavefunction. Let $\prec$ be a strict total order on $\mathbb{R}^{D}$, and define the set

$$\mathcal{R}=\{r=(r_{1},\dots,r_{n}):r_{1}\prec r_{2}\prec\dots\prec r_{n_{u}}\,\,\text{ and }\,\,r_{n_{u}+1}\prec r_{n_{u}+2}\prec\dots\prec r_{n}\}$$

For any $r$ with no coinciding pair of electrons, i.e. $r_{i}\neq r_{j}$ for all $i\neq j$, define the permutation $\pi_{\prec}(r)\in\mathbb{G}$ by $\pi_{\prec}(r)\,r\in\mathcal{R}$. Then a valid antisymmetric ground state wavefunction is given by

$$\psi_{0}(r)=\begin{cases}(-1)^{\pi_{\prec}(r)}\sqrt{\rho_{0}(r)}&\text{if }r_{i}\neq r_{j}\,\,\forall i\neq j\\ 0&\text{otherwise}\end{cases}$$
Proof.

We begin by noting that the set $\mathcal{R}$ consists of those configurations in which the spin-up electrons appear in ascending order, according to the ordering relation $\prec$, and the spin-down electrons also appear in ascending order. First, consider the case of $r$ for which $r_{i}=r_{j}$ for some pair of electrons $i$ and $j$; in this case, $\psi_{0}(r)=0$, as is required by antisymmetry. Now, consider the case of $r$ for which $r_{i}\neq r_{j}$ for all $i\neq j$. In this case, for any permutation $\pi\in\mathbb{G}$ we have that

$$\psi_{0}(\pi r)=(-1)^{\pi_{\prec}(\pi r)}\sqrt{\rho_{0}(\pi r)} \tag{92}$$

However, recall that $\pi_{\prec}(r)$ is defined by

$$\pi_{\prec}(r)\,r\in\mathcal{R} \tag{93}$$

Therefore, $\pi_{\prec}(\pi r)$ is defined by

$$\pi_{\prec}(\pi r)\,\pi r\in\mathcal{R} \tag{94}$$

Comparing the latter two equations, and noting that the permutation in $\mathbb{G}$ which sorts $r$ into $\mathcal{R}$ is unique when the $r_{i}$ are distinct, we see that

$$\pi_{\prec}(\pi r)\,\pi=\pi_{\prec}(r)\qquad\Rightarrow\qquad\pi_{\prec}(\pi r)=\pi_{\prec}(r)\,\pi^{-1} \tag{95}$$

Furthermore, we know that as $\rho_{0}(r)$ is the density for the ground state wavefunction, it must satisfy property (D1) of Theorem 2, namely it must be $\mathbb{G}$-invariant; therefore, we must have that

$$\rho_{0}(\pi r)=\rho_{0}(r) \tag{96}$$

Plugging Equations (95) and (96) into (92) gives

$$\begin{aligned}
\psi_{0}(\pi r) &= (-1)^{\pi_{\prec}(r)\pi^{-1}}\sqrt{\rho_{0}(r)}\\
&= (-1)^{\pi}(-1)^{\pi_{\prec}(r)}\sqrt{\rho_{0}(r)}\\
&= (-1)^{\pi}\psi_{0}(r)
\end{aligned} \tag{97}$$

where in the second line, we have used the facts that $(-1)^{\pi_{a}\pi_{b}}=(-1)^{\pi_{a}}(-1)^{\pi_{b}}$ and that $(-1)^{\pi^{-1}}=(-1)^{\pi}$. But Equation (97) is exactly the antisymmetry property we desire, and so we have completed the proof.

Finally, we note that $\psi_{0}(r)>0$ for $r\in\mathcal{R}$; this is an arbitrary choice, and we could have equally well defined a second ground state wavefunction $\tilde{\psi}_{0}$ with $\tilde{\psi}_{0}(r)<0$ for $r\in\mathcal{R}$. It is easy to see that in this case, $\tilde{\psi}_{0}(r)=-\psi_{0}(r)$ for all $r$. However, this is not surprising: either $\psi_{0}$ or $-\psi_{0}$ may be taken as an eigenfunction of $H$, as eigenfunctions are only defined up to sign. ∎
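The construction in the theorem can be sketched in a few lines. The example below makes several assumptions that are not in the text: the strict total order $\prec$ is taken to be lexicographic order on coordinate tuples, the sign of the sorting permutation is computed by counting inversions separately within the spin-up and spin-down blocks, and `rho_toy` is a hypothetical permutation-invariant, unnormalized density used only for illustration.

```python
import math


def parity(points):
    """Sign of the permutation that sorts `points` (tuples) lexicographically."""
    inversions = sum(
        1
        for i in range(len(points))
        for j in range(i + 1, len(points))
        if points[j] < points[i]
    )
    return -1 if inversions % 2 else 1


def psi0(r, n_up, rho0):
    """Signed square root of rho0 for electron positions r (a list of tuples).

    The first n_up entries are spin-up electrons, the rest spin-down.
    """
    if len(set(r)) < len(r):  # some pair of electrons coincides
        return 0.0
    sign = parity(r[:n_up]) * parity(r[n_up:])
    return sign * math.sqrt(rho0(r))


def rho_toy(r):
    # Hypothetical permutation-invariant (unnormalized) density.
    return math.exp(-sum(x * x for p in r for x in p))
```

Swapping two same-spin electrons changes the inversion count of the corresponding block by an odd number, so `psi0` flips sign, which is the antisymmetry property of Equation (97).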

Appendix P Proof of Theorem 10

Theorem.

Let the set of distances be given by $\delta^{\ell}=\left\{\delta_{ij}^{\ell}\right\}_{i<j}$ where $\delta_{ij}^{\ell}=\|r_{i}^{\ell}-r_{j}^{\ell}\|$. Given a layer of the form

$$r_{i}^{\ell+1}=\Theta^{\ell}(\delta^{\ell};\theta)\,r_{i}^{\ell}+t^{\ell}(\delta^{\ell};\theta)\qquad\text{with }\Theta^{\ell}(\delta^{\ell};\theta)\in O(D)\text{ and }t^{\ell}(\delta^{\ell};\theta)\in\mathbb{R}^{D},$$

the layer is both $\mathbb{G}$-equivariant and invertible.

Proof.

Let us begin with invertibility. We may compute the inter-electron distances at layer $\ell+1$:

$$\begin{aligned}
\delta_{ij}^{\ell+1} &= \|r_{i}^{\ell+1}-r_{j}^{\ell+1}\|\\
&= \|\Theta^{\ell}(\delta^{\ell};\theta)\,r_{i}^{\ell}+t^{\ell}(\delta^{\ell};\theta)-\Theta^{\ell}(\delta^{\ell};\theta)\,r_{j}^{\ell}-t^{\ell}(\delta^{\ell};\theta)\|\\
&= \|\Theta^{\ell}(\delta^{\ell};\theta)(r_{i}^{\ell}-r_{j}^{\ell})\|=\|r_{i}^{\ell}-r_{j}^{\ell}\|=\delta_{ij}^{\ell}
\end{aligned} \tag{98}$$

where the third line holds since $\Theta^{\ell}(\delta^{\ell};\theta)\in O(D)$. That is, since we are transforming all of the electrons with the same orthogonal matrix and translation vector, the inter-electron distances are preserved. As a result, the inverse is simply

$$\begin{aligned}
r_{i}^{\ell} &= \Theta^{\ell}(\delta^{\ell};\theta)^{-1}\,(r_{i}^{\ell+1}-t^{\ell}(\delta^{\ell};\theta))\\
&= \Theta^{\ell}(\delta^{\ell+1};\theta)^{T}\,(r_{i}^{\ell+1}-t^{\ell}(\delta^{\ell+1};\theta))
\end{aligned} \tag{99}$$

where we have used the fact that for an orthogonal matrix, $\Theta^{-1}=\Theta^{T}$, together with $\delta^{\ell+1}=\delta^{\ell}$ from Equation (98). Note that all of the arguments on the right-hand side of the equation depend only on quantities from layer $\ell+1$, as desired.

Having established invertibility, let us turn to $\mathbb{G}$-equivariance. Let $\pi\in\mathbb{G}$, and denote the layer by $r^{\ell+1}=Q(r^{\ell})$, so that $r_{i}^{\ell+1}=Q_{i}(r^{\ell})$. Note that since $\delta^{\ell}$ is the set of distances, we have that $\pi\delta^{\ell}=\delta^{\ell}$: a set is inherently unordered, and therefore is unaffected by permutations. Then we have that

$$\begin{aligned}
Q_{i}(\pi r^{\ell}) &= \Theta^{\ell}(\pi\delta^{\ell};\theta)\,r_{\pi(i)}^{\ell}+t^{\ell}(\pi\delta^{\ell};\theta)\\
&= \Theta^{\ell}(\delta^{\ell};\theta)\,r_{\pi(i)}^{\ell}+t^{\ell}(\delta^{\ell};\theta)\\
&= Q_{\pi(i)}(r^{\ell})
\end{aligned} \tag{100}$$

so that $Q(\pi r^{\ell})=\pi Q(r^{\ell})$, as desired. ∎
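Both properties can be checked numerically. The following is a minimal sketch in $D=2$ under stated assumptions: the sorted list of pairwise distances stands in for the set $\delta^{\ell}$, and `theta_and_t` is a hypothetical hand-written stand-in for the learned maps $\Theta^{\ell}$ and $t^{\ell}$ (any function of the distance set would do).

```python
import math


def pair_distances(r):
    """Sorted list of inter-electron distances: a permutation-invariant summary."""
    n = len(r)
    return sorted(math.dist(r[i], r[j]) for i in range(n) for j in range(i + 1, n))


def theta_and_t(delta):
    # Hypothetical stand-in for the learned maps; depends only on the distances.
    mean = sum(delta) / len(delta)
    angle = math.tanh(mean)
    c, s = math.cos(angle), math.sin(angle)
    rot = ((c, -s), (s, c))               # an element of O(2)
    t = (0.1 * mean, -0.3 * max(delta))   # translation vector in R^2
    return rot, t


def forward(r):
    """Apply r_i -> Theta r_i + t with Theta, t computed from the input distances."""
    rot, t = theta_and_t(pair_distances(r))
    return [
        (rot[0][0] * x + rot[0][1] * y + t[0], rot[1][0] * x + rot[1][1] * y + t[1])
        for x, y in r
    ]


def inverse(r_next):
    """Apply r_i -> Theta^T (r_i - t), recomputing Theta, t from the output distances."""
    rot, t = theta_and_t(pair_distances(r_next))
    return [
        (rot[0][0] * (x - t[0]) + rot[1][0] * (y - t[1]),
         rot[0][1] * (x - t[0]) + rot[1][1] * (y - t[1]))
        for x, y in r_next
    ]
```

Because the distances are preserved by the layer, `inverse` never needs access to the input of `forward`, which mirrors the second line of Equation (99).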

Appendix Q Implementation of the Electron-Electron Cusp Layer

Recall from Equation (30) that the network must be a function of the set of inter-electron distances $\delta^{\ell}$. Using multihead attention will be inefficient, as we must apply it to all pairs of electrons, leading to complexity that is quartic in the number of electrons. Instead, we propose the following Deep Set (Zaheer et al., 2017) style layer:

  1. 1.

    MLP Per Electron Pair: Apply the same Multilayer Perceptron ηsuperscript𝜂\eta^{\ell}italic_η start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT to each electron pair individually:

    ζij=η(δij)for all i<jformulae-sequencesuperscriptsubscript𝜁𝑖𝑗superscript𝜂superscriptsubscript𝛿𝑖𝑗for all 𝑖𝑗\zeta_{ij}^{\ell}=\eta^{\ell}(\delta_{ij}^{\ell})\quad\text{for all }i<jitalic_ζ start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT = italic_η start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT ( italic_δ start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_ℓ end_POSTSUPERSCRIPT ) for all italic_i < italic_j (101)
  2. Average: Form the average value: $\bar{\zeta}^{\ell} = \frac{1}{\frac{1}{2}n(n-1)} \sum_{i<j} \zeta_{ij}^{\ell}$.

  3. Overall MLP: Apply a Multilayer Perceptron $\hat{\eta}^{\ell}$ to the average:

    $$\bar{\zeta}^{\ell} \leftarrow \hat{\eta}^{\ell}(\bar{\zeta}^{\ell}) \tag{102}$$

    The output should be of dimension $D^2 + D$, which is equal to $12$ when $D=3$.

  4. Split into Rotation and Translation:

    $$\begin{aligned}
    t^{\ell} &= \text{first $D$ components of } \bar{\zeta}^{\ell} \\
    A^{\ell} &= \text{last $D^2$ components of } \bar{\zeta}^{\ell}, \text{ reshaped into a $D \times D$ matrix} \\
    B^{\ell} &= A^{\ell} - (A^{\ell})^{T}, \text{ a skew-symmetric matrix} \\
    \Theta^{\ell} &= \exp(B^{\ell}), \text{ using the matrix exponential}
    \end{aligned} \tag{103}$$

Notes:

  • The reason we parameterize the rotation as the exponential of a skew-symmetric matrix is so that the layer can effectively be a residual-style layer: if we choose $A^{\ell}=0$ and $t^{\ell}=0$, then $B^{\ell}=0$ and $\Theta^{\ell}=\exp(0)=I$, so we recover $r_i^{\ell+1}=r_i^{\ell}$. (With a direct parameterization of the rotation matrix, the identity transformation $r_i^{\ell+1}=r_i^{\ell}$ is only recovered if the network outputs exactly $\Theta^{\ell}=I$, which is harder to achieve.)

  • It is proposed to use one such layer, or a very small number of such layers, somewhere near the beginning of the flow. The work of incorporating the cusps in the appropriate manner can then be performed by subsequent layers.
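The four steps above, together with the residual-style property from the first note, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the tiny `mlp` helper, the hidden width `h`, the random weights, and the eigendecomposition-based `expm_skew` are all hypothetical choices made here, and the layer map $r_i \mapsto \Theta^{\ell} r_i + t^{\ell}$ is taken from Equation (100).

```python
import numpy as np

def mlp(x, layers):
    """Tiny tanh MLP; `layers` is a list of (W, b) pairs (hypothetical stand-in)."""
    for k, (W, b) in enumerate(layers):
        x = x @ W + b
        if k < len(layers) - 1:
            x = np.tanh(x)
    return x

def expm_skew(B):
    """exp(B) for real skew-symmetric B, using the fact that iB is Hermitian."""
    w, V = np.linalg.eigh(1j * B)                 # iB = V diag(w) V^H with real w
    return ((V * np.exp(-1j * w)) @ V.conj().T).real

def cusp_layer(r, eta, eta_hat):
    """Apply r_i -> Theta r_i + t, with Theta and t computed from pair distances."""
    n, D = r.shape
    i, j = np.triu_indices(n, k=1)                # all pairs with i < j
    delta = np.linalg.norm(r[i] - r[j], axis=1)   # inter-electron distances
    zeta = mlp(delta[:, None], eta)               # step 1: shared MLP per pair
    zeta_bar = zeta.mean(axis=0)                  # step 2: average over pairs
    zeta_bar = mlp(zeta_bar, eta_hat)             # step 3: output of size D^2 + D
    t = zeta_bar[:D]                              # step 4: first D components
    A = zeta_bar[D:].reshape(D, D)                #         last D^2 components
    Theta = expm_skew(A - A.T)                    #         rotation from skew part
    return r @ Theta.T + t, Theta, t

rng = np.random.default_rng(0)
n, D, h = 5, 3, 8                                 # hypothetical sizes
eta = [(rng.normal(size=(1, h)), rng.normal(size=h)),
       (rng.normal(size=(h, h)), rng.normal(size=h))]
eta_hat = [(rng.normal(size=(h, h)), rng.normal(size=h)),
           (rng.normal(size=(h, D * D + D)), rng.normal(size=D * D + D))]
r = rng.normal(size=(n, D))
r_next, Theta, t = cusp_layer(r, eta, eta_hat)

# A final layer with zero weights gives A = 0 and t = 0, hence the identity map.
zero_hat = [(np.zeros((h, D * D + D)), np.zeros(D * D + D))]
r_same, _, _ = cusp_layer(r, eta, zero_hat)
```

Because $\Theta^{\ell}$ and $t^{\ell}$ depend only on the unordered set of distances, permuting the rows of `r` permutes the rows of the output in the same way, which is the equivariance established in Equation (100).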

Appendix R Electron-Nuclear Cusps

It is also known that the gradient of the wavefunction should exhibit a discontinuity when an electron and a nucleus coincide. As in the case of electron-electron cusps, we may treat this by incorporating the electron-nuclear distances directly; we may design our layer exactly analogously to the electron-electron cusp layer, with one main caveat: to preserve invertibility, we can only deal with a single nucleus at a time. In particular, for a given nucleus $I$ with position $R_I$, let $\delta_I^{\ell} = \{\delta_{iI}^{\ell}\}_{i=1}^{n}$ with $\delta_{iI}^{\ell} = \|r_i^{\ell} - R_I\|$. Then the layer looks like

$$r_i^{\ell+1} = \Theta^{\ell}(\delta_I^{\ell};\theta)\,(r_i^{\ell} - R_I) + R_I \tag{104}$$

Note in the above that only the rotation matrix is parameterized, and the translation vector is fixed. We must include one such layer for each nucleus $I$.
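A minimal sketch of the single-nucleus layer in Equation (104) and of one way to invert it is given here. The inversion argument is our reading rather than something spelled out in the text: a rotation about $R_I$ leaves every distance $\|r_i - R_I\|$ unchanged, so $\Theta^{\ell}$ can be recomputed from the layer's output. The function `theta_from_distances`, its hand-picked symmetric features, and the weight matrix `W` are hypothetical stand-ins for the network $\Theta^{\ell}(\delta_I^{\ell};\theta)$.

```python
import numpy as np

def expm_skew(B):
    """exp(B) for real skew-symmetric B, using the fact that iB is Hermitian."""
    w, V = np.linalg.eigh(1j * B)
    return ((V * np.exp(-1j * w)) @ V.conj().T).real

def theta_from_distances(delta, W, D):
    """Hypothetical stand-in for Theta(delta_I; theta): symmetric features -> rotation."""
    feats = np.stack([delta, delta ** 2, np.exp(-delta)], axis=1).mean(axis=0)
    A = (feats @ W).reshape(D, D)
    return expm_skew(A - A.T)

def nuclear_cusp_layer(r, R_I, W):
    """Rotate all electrons about the nucleus at R_I, as in Equation (104)."""
    delta = np.linalg.norm(r - R_I, axis=1)
    Theta = theta_from_distances(delta, W, r.shape[1])
    return (r - R_I) @ Theta.T + R_I

def nuclear_cusp_layer_inverse(r_next, R_I, W):
    """Invert the layer: the distances to R_I are preserved, so Theta is recoverable."""
    delta = np.linalg.norm(r_next - R_I, axis=1)
    Theta = theta_from_distances(delta, W, r_next.shape[1])
    return (r_next - R_I) @ Theta + R_I            # Theta^{-1} = Theta^T

rng = np.random.default_rng(1)
n, D = 4, 3
W = rng.normal(size=(3, D * D))                    # hypothetical weights
R_I = rng.normal(size=D)
r = rng.normal(size=(n, D))
r_next = nuclear_cusp_layer(r, R_I, W)
r_back = nuclear_cusp_layer_inverse(r_next, R_I, W)
```

A rotation about a second nucleus would change the distances to the first, so the same recomputation would not be available if several nuclei were handled in a single layer.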

Appendix S Proof of Theorem 11

Theorem.

Let $\bar{R} = \frac{1}{N}\sum_{I=1}^{N} R_I = 0$. Consider a continuous normalizing flow of the form $dv/dt = \Gamma_t(v;R,Z)$ with $v(0) = z \sim \rho_z(\cdot)$ and $r = v(1)$. Let the function $\Gamma_t$ be invariant with respect to nuclear permutations and equivariant with respect to joint rotations, i.e. for all $t$

$$\Gamma_t(v;\pi R,\pi Z) = \Gamma_t(v;R,Z) \;\; \forall \pi \in \mathbb{S}_N \qquad\quad \Gamma_t(\Theta v;\Theta R,Z) = \Theta\,\Gamma_t(v;R,Z) \;\; \forall \Theta \in O(D)$$

Furthermore, suppose that the base density is invariant with respect to rotations, $\rho_z(\Theta z) = \rho_z(z)$ for $\Theta \in O(D)$. Then the resulting density $\rho(r;R,Z)$ satisfies Equations (31) and (33).

Proof.

Let us first consider permutation invariance, i.e. Equation (31). Let $r$ be produced by solving the flow

$$dv/dt = \Gamma_t(v;R,Z) \text{ with } v(0) = z \sim \rho_z(\cdot) \text{ and } r = v(1) \tag{105}$$

Consider a permutation $\pi$ on the nuclei, and let $\tilde{r}$ be the resulting electronic positions. Then $\tilde{r}$ is produced by solving the flow

$$d\tilde{v}/dt = \Gamma_t(\tilde{v};\pi R,\pi Z) \text{ with } \tilde{v}(0) = z \sim \rho_z(\cdot) \text{ and } \tilde{r} = \tilde{v}(1) \tag{106}$$

However, we know that $\Gamma_t(\tilde{v};\pi R,\pi Z) = \Gamma_t(\tilde{v};R,Z)$. Thus, $\tilde{r}$ is given by

$$d\tilde{v}/dt = \Gamma_t(\tilde{v};R,Z) \text{ with } \tilde{v}(0) = z \sim \rho_z(\cdot) \text{ and } \tilde{r} = \tilde{v}(1) \tag{107}$$

which is precisely equivalent to the equation for $r$; thus $\tilde{r} = r$, i.e. the random variables representing the electronic positions are identical in both cases. Thus, their distributions must be equal: $\rho(r;\pi R,\pi Z) = \rho(r;R,Z)$, so Equation (31) is established.

Let us now turn to joint rotation invariance, i.e. Equation (33). As we know that $\Gamma_t$ satisfies rotation equivariance, i.e. $\Gamma_t(\Theta v;\Theta R,Z) = \Theta\,\Gamma_t(v;R,Z)$, we may apply Theorems 1 and 2 from (Köhler et al., 2020) (noting that $R$ is irrelevant for the flow, which is entirely in $v$). This yields immediately that $\rho(\Theta r;\Theta R,Z) = \rho(r;R,Z)$, so Equation (33) is established. ∎
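The sample-level statements behind this proof can be checked numerically on a toy example. The sketch below is illustrative only: the vector field `gamma` (each electron pulled toward every nucleus) is a hypothetical choice that happens to satisfy both conditions of the theorem, and the flow is approximated by forward Euler with an arbitrary step count. It checks the flow map, not the density itself; the density statement additionally uses the rotation invariance of $\rho_z$.

```python
import numpy as np

def gamma(t, v, R, Z):
    """Toy vector field (an assumption): invariant to nuclear permutations,
    equivariant to joint rotations of v and R."""
    diff = R[None, :, :] - v[:, None, :]                 # (n, N, D): nucleus minus electron
    w = Z[None, :] * np.exp(-np.sum(diff ** 2, axis=2))  # (n, N) rotation-invariant weights
    return (1.0 + t) * np.einsum("ij,ijk->ik", w, diff)  # sum over nuclei

def flow(z, R, Z, steps=50):
    """Forward Euler approximation of v(1) for dv/dt = gamma_t(v; R, Z), v(0) = z."""
    v, dt = z.copy(), 1.0 / steps
    for k in range(steps):
        v = v + dt * gamma(k * dt, v, R, Z)
    return v

rng = np.random.default_rng(2)
n, N, D = 4, 3, 3
R = rng.normal(size=(N, D))
R -= R.mean(axis=0)                                # enforce the centering R_bar = 0
Z = np.array([1.0, 1.0, 8.0])                      # hypothetical atomic numbers
z = rng.normal(size=(n, D))                        # one base sample
Theta, _ = np.linalg.qr(rng.normal(size=(D, D)))   # a random element of O(D)
perm = np.array([2, 0, 1])                         # a permutation of the nuclei

r = flow(z, R, Z)
r_perm = flow(z, R[perm], Z[perm])                 # same sample, relabelled nuclei
r_rot = flow(z @ Theta.T, R @ Theta.T, Z)          # jointly rotated sample and nuclei
```

Each Euler step inherits the symmetry of `gamma`, so `r_perm` agrees with `r` and `r_rot` agrees with the rotated `r` up to floating-point rounding.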

Appendix T Proof of Theorem 12

Theorem.

Let $\phi_t(v;R,Z)$ be a function which is $\mathbb{G}$-equivariant with respect to $v$, i.e. $\phi_t(gv;R,Z) = g\,\phi_t(v;R,Z)$ for $g \in \mathbb{G}$. Let $\omega_t(v;R,Z)$ be a function whose output is itself a rotation, i.e. $\omega_t(v;R,Z) \in O(D)$. Let $\omega_t$ be $\mathbb{G}$-invariant with respect to $v$, and $O(D)$-equivariant jointly with respect to $v$ and $R$, i.e. $\omega_t(\Theta v;\Theta R,Z) = \Theta\,\omega_t(v;R,Z)$. Finally, let both $\phi_t$ and $\omega_t$ be permutation-invariant jointly with respect to $R$ and $Z$, i.e. $\phi_t(v;\pi R,\pi Z) = \phi_t(v;R,Z)$ and likewise for $\omega_t$. Then the function

$$\Gamma_t(v;R,Z) = \zeta\,\phi_t(\zeta^{-1}v;\zeta^{-1}R,Z) \qquad \text{where} \qquad \zeta = \omega_t(v;R,Z)$$

satisfies the properties in Equation (34) and is $\mathbb{G}$-equivariant with respect to $v$.

Proof.

Let us begin with the first condition in Equation (34), namely we wish to show that $\Gamma_t(v;\pi R,\pi Z) = \Gamma_t(v;R,Z)$. Use tildes to denote the variables after the permutation $\pi$ has been applied. Thus,

$$\tilde{\zeta} = \omega_t(v;\pi R,\pi Z) = \omega_t(v;R,Z) = \zeta \tag{108}$$

where we have used the fact that $\omega_t$ is permutation-invariant jointly with respect to $R$ and $Z$. Then

$$\begin{aligned}
\Gamma_t(v;\pi R,\pi Z) &= \tilde{\zeta}\,\phi_t(\tilde{\zeta}^{-1}v;\tilde{\zeta}^{-1}\pi R,\pi Z) \\
&= \zeta\,\phi_t(\zeta^{-1}v;\zeta^{-1}\pi R,\pi Z) \\
&= \zeta\,\phi_t(\zeta^{-1}v;\pi\zeta^{-1}R,\pi Z) \\
&= \zeta\,\phi_t(\zeta^{-1}v;\zeta^{-1}R,Z) \\
&= \Gamma_t(v;R,Z)
\end{aligned} \tag{109}$$

where in the second line we have used the fact that $\tilde{\zeta} = \zeta$; in the third line, the fact that the operation of applying an identical rotation to a list of vectors commutes with a permutation applied to that list of vectors; and in the fourth line, the fact that $\phi_t$ is permutation-invariant jointly with respect to $R$ and $Z$. We have thus established the first condition in Equation (34).

Now let us turn to the second condition in Equation (34), that is we need to show that $\Gamma_t(\Theta v;\Theta R,Z) = \Theta\,\Gamma_t(v;R,Z)$. We have that

$$\tilde{\zeta} = \omega_t(\Theta v;\Theta R,Z) = \Theta\,\omega_t(v;R,Z) = \Theta\zeta \tag{110}$$

where we have used the fact that $\omega_t$ is $O(D)$-equivariant jointly with respect to $v$ and $R$. Then

$$\begin{aligned}
\Gamma_t(\Theta v;\Theta R,Z) &= \tilde{\zeta}\,\phi_t(\tilde{\zeta}^{-1}\Theta v;\tilde{\zeta}^{-1}\Theta R,Z) \\
&= \Theta\zeta\,\phi_t(\zeta^{-1}\Theta^{-1}\Theta v;\zeta^{-1}\Theta^{-1}\Theta R,Z) \\
&= \Theta\zeta\,\phi_t(\zeta^{-1}v;\zeta^{-1}R,Z) \\
&= \Theta\,\Gamma_t(v;R,Z)
\end{aligned} \tag{111}$$

as desired, where in the second line we have substituted $\tilde{\zeta} = \Theta\zeta$ and hence $\tilde{\zeta}^{-1} = \zeta^{-1}\Theta^{-1}$.

Finally, let us turn to demonstrating the $\mathbb{G}$-equivariance of $\Gamma_t$ with respect to $v$. Let $g \in \mathbb{G}$; then we have that

$$\tilde{\zeta} = \omega_t(gv;R,Z) = \omega_t(v;R,Z) = \zeta \tag{112}$$

where we have used the fact that $\omega_t$ is $\mathbb{G}$-invariant with respect to $v$. Then

$$\begin{aligned}
\Gamma_t(gv;R,Z) &= \tilde{\zeta}\,\phi_t(\tilde{\zeta}^{-1}gv;\tilde{\zeta}^{-1}R,Z) \\
&= \zeta\,\phi_t(\zeta^{-1}gv;\zeta^{-1}R,Z) \\
&= \zeta\,\phi_t(g\zeta^{-1}v;\zeta^{-1}R,Z) \\
&= \zeta g\,\phi_t(\zeta^{-1}v;\zeta^{-1}R,Z) \\
&= g\zeta\,\phi_t(\zeta^{-1}v;\zeta^{-1}R,Z) \\
&= g\,\Gamma_t(v;R,Z)
\end{aligned} \tag{113}$$

where in the second line we have used the fact that $\tilde{\zeta} = \zeta$; in the third and fifth lines, the fact that the operation of applying an identical rotation to a list of vectors commutes with a permutation applied to that list of vectors; and in the fourth line, the fact that $\phi_t$ is $\mathbb{G}$-equivariant with respect to $v$. This completes the proof. ∎
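The construction in Theorem 12 can be exercised numerically with toy choices of $\phi_t$ and $\omega_t$. Everything in the sketch below is an assumption made for illustration: `phi` uses arbitrary matrices so that it is deliberately not rotation-equivariant on its own, `omega` orthonormalizes $D$ hand-built equivariant vectors via a sign-fixed QR factorization (which coincides with Gram-Schmidt for full-rank input), and both are symmetric under all electron permutations, which contains any subgroup $\mathbb{G}$. Positions are stored as row vectors, so applying a rotation $\Theta$ to every vector is `x @ Theta.T`.

```python
import numpy as np

rng = np.random.default_rng(3)
D, n, N = 3, 4, 3
W1, W2, U = (rng.normal(size=(D, D)) for _ in range(3))   # arbitrary weights

def phi(v, R, Z):
    """Toy phi_t: permutation-equivariant in v, permutation-invariant in (R, Z),
    but NOT rotation-equivariant."""
    nuclei = (Z[:, None] * R).sum(axis=0)                 # invariant summary of (R, Z)
    return np.tanh(v @ W1 + v.mean(axis=0) @ W2 + nuclei @ U)

def omega(v, R, Z):
    """Toy omega_t: an orthogonal matrix built from D jointly equivariant vectors."""
    vecs = np.stack([
        (np.linalg.norm(v, axis=1, keepdims=True) ** k * v).sum(axis=0)
        + (Z[:, None] * np.linalg.norm(R, axis=1, keepdims=True) ** k * R).sum(axis=0)
        for k in range(D)
    ], axis=1)                                            # columns are the vectors
    Q, Rfac = np.linalg.qr(vecs)
    return Q * np.sign(np.diag(Rfac))                     # positive diagonal: unique Q

def gamma(v, R, Z):
    """Gamma_t(v; R, Z) = zeta phi_t(zeta^{-1} v; zeta^{-1} R, Z)."""
    zeta = omega(v, R, Z)
    # Row-vector convention: applying zeta^{-1} = zeta^T to each row x is x @ zeta.
    return phi(v @ zeta, R @ zeta, Z) @ zeta.T

Z = np.array([1.0, 1.0, 8.0])                             # hypothetical atomic numbers
R = rng.normal(size=(N, D))
v = rng.normal(size=(n, D))
Theta, _ = np.linalg.qr(rng.normal(size=(D, D)))          # random element of O(D)
g = np.array([1, 0, 3, 2])                                # hypothetical within-spin swap
pi = np.array([2, 0, 1])                                  # permutation of the nuclei
out = gamma(v, R, Z)
```

The role of $\zeta^{-1}$ is visible in `gamma`: the inputs `v @ zeta` and `R @ zeta` are unchanged when `v` and `R` are rotated together, so an unconstrained `phi` can be used, and the final multiplication by `zeta.T` restores the equivariance.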

Appendix U Implementation of Continuous Normalizing Flow for Multiple Molecules

We must implement both networks mentioned in Theorem 12: the functions $\phi_t$ and $\omega_t$. The function $\phi_t$ is $\mathbb{G}$-equivariant, so we may use the general recipe described in Appendix K; however, it has the additional properties that it depends on both $R$ and $Z$, and that it must be permutation-invariant jointly with respect to these two variables. Therefore, the following minor modification may be made to the recipe described in Appendix K (noting that the notation changes slightly, as we no longer have layers $\ell$ because the flow is continuous, and that we replace the variables $\gamma_{\alpha,i}^{\ell}$ with $v_{\alpha,i}$). We compute a Deep Set (Zaheer et al., 2017) function on $R, Z$, i.e. on the inputs $\{(R_I, Z_I)\}$; the output of this function is permutation-invariant by construction. This output is then fed into the Fully Connected Layer with Spin Mixing as an extra input. An alternative to the Deep Set approach is to apply a transformer to $R, Z$, where each token is the pair $(R_I, Z_I)$, and then apply an averaging step at the end; this will also produce a permutation-invariant function.
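A minimal sketch of the Deep Set function on $\{(R_I, Z_I)\}$ follows. The single-layer inner and outer networks, the widths, and the use of sum pooling are hypothetical choices; how the resulting embedding enters the Fully Connected Layer with Spin Mixing is not shown, since that layer is defined in Appendix K.

```python
import numpy as np

def deep_set_nuclei(R, Z, inner, outer):
    """Permutation-invariant embedding of {(R_I, Z_I)}:
    per-nucleus network, sum pooling, then an outer network."""
    tokens = np.concatenate([R, Z[:, None]], axis=1)   # (N, D + 1), one row per nucleus
    W1, b1 = inner
    pooled = np.tanh(tokens @ W1 + b1).sum(axis=0)     # pooling removes the ordering
    W2, b2 = outer
    return np.tanh(pooled @ W2 + b2)

rng = np.random.default_rng(4)
D, h, out_dim = 3, 16, 8                               # hypothetical sizes
inner = (rng.normal(size=(D + 1, h)), rng.normal(size=h))
outer = (rng.normal(size=(h, out_dim)), rng.normal(size=out_dim))

R = rng.normal(size=(3, D))
Z = np.array([1.0, 1.0, 8.0])                          # hypothetical atomic numbers
perm = np.array([2, 0, 1])
emb = deep_set_nuclei(R, Z, inner, outer)
emb_perm = deep_set_nuclei(R[perm], Z[perm], inner, outer)

# The same weights accept a molecule with a different number of nuclei.
R_other = rng.normal(size=(5, D))
Z_other = np.array([1.0, 1.0, 1.0, 1.0, 6.0])
emb_other = deep_set_nuclei(R_other, Z_other, inner, outer)
```

Since the pooled vector has a fixed size regardless of $N$, the embedding has the same dimension for every molecule, which is what allows a single set of weights to be shared across molecules.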

In order to implement the function $\omega_t$, recall that its output is a rotation matrix. Furthermore, it is $\mathbb{G}$-invariant in $v$; $O(D)$-equivariant with respect to $v$ and $R$ jointly; and permutation-invariant with respect to $R$ and $Z$ jointly. We may use an EGNN architecture (Satorras et al., 2021) jointly on electrons and nuclei. In the EGNN:

  • The positions of the electrons and nuclei are initialized as $v$ and $R$ respectively.

  • The hidden vectors of the electrons and nuclei are initialized in order to encode two things:

    1. Whether the vertex corresponds to an electron or a nucleus.

    2. Properties of the vertex: (a) in the case of an electron, whether the spin is up or down; (b) in the case of a nucleus, the atomic number $Z_I$.

    This encoding can be achieved via combining one-hot vectors with linear projections of varying dimensionalities.

For each of the $D$ final layers of the EGNN, one may then take the position vectors for that layer and form an average over them; this yields a total of $D$ new vectors. These $D$ vectors are clearly $\mathbb{G}$-invariant in $v$, as reordering within spins does not matter; permutation-invariant in $R$ and $Z$ jointly; and $O(D)$-equivariant with respect to $v$ and $R$ jointly, by the built-in equivariance properties of EGNNs. We then take these $D$ vectors and perform Gram-Schmidt on them to obtain a rotation matrix $\Theta$, noting that Gram-Schmidt retains the equivariance property. A similar idea is discussed in (Kaba et al., 2023). This completes the implementation.
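The claim that Gram-Schmidt retains the equivariance property can be illustrated directly. In the sketch below the $D$ averaged position vectors are replaced by random vectors, which is an assumption standing in for the EGNN output; the check is that rotating the inputs by $\Theta' \in O(D)$ rotates the resulting orthonormal frame by the same $\Theta'$.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize the rows of `vectors` (shape (D, D)) in order.
    Returns a matrix whose columns are the orthonormal vectors."""
    basis = []
    for a in vectors:
        for q in basis:
            a = a - (a @ q) * q          # remove the component along each earlier direction
        basis.append(a / np.linalg.norm(a))
    return np.stack(basis, axis=1)

rng = np.random.default_rng(5)
D = 3
vectors = rng.normal(size=(D, D))                       # stand-in for averaged positions
Theta_prime, _ = np.linalg.qr(rng.normal(size=(D, D)))  # random element of O(D)

frame = gram_schmidt(vectors)
frame_rot = gram_schmidt(vectors @ Theta_prime.T)       # every input vector rotated
```

The procedure only uses inner products and linear combinations of the input vectors, both of which are compatible with orthogonal transformations, so `frame_rot` equals `Theta_prime @ frame` up to rounding. The output is orthogonal with determinant $\pm 1$, i.e. an element of $O(D)$, and the construction assumes the $D$ input vectors are linearly independent.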