A Theoretical Framework for an Efficient Normalizing Flow-Based Solution to the Schrödinger Equation
Abstract
A central problem in quantum mechanics involves solving the Electronic Schrödinger Equation for a molecule or material. The Variational Monte Carlo approach to this problem approximates a particular variational objective via sampling, and then optimizes this approximated objective over a chosen parameterized family of wavefunctions, known as the ansatz. Recently neural networks have been used as the ansatz, with accompanying success. However, sampling from such wavefunctions has required the use of a Markov Chain Monte Carlo approach, which is inherently inefficient. In this work, we propose a solution to this problem via an ansatz which is cheap to sample from, yet satisfies the requisite quantum mechanical properties. We prove that a normalizing flow using the following two essential ingredients satisfies our requirements: (a) a base distribution which is constructed from Determinantal Point Processes; (b) flow layers which are equivariant to a particular subgroup of the permutation group. We then show how to construct both continuous and discrete normalizing flows which satisfy the requisite equivariance. We further demonstrate the manner in which the non-smooth nature (“cusps”) of the wavefunction may be captured, and how the framework may be generalized to provide induction across multiple molecules. The resulting theoretical framework entails an efficient approach to solving the Electronic Schrödinger Equation.
1 Introduction
The Electronic Schrödinger Equation A central problem in quantum mechanics involves solving the Electronic Schrödinger Equation to compute the ground state energy and wavefunction of a molecule or material. This problem has manifold applications in chemistry, condensed matter physics, and materials science. A standard computational approach to this problem is based on Variational Monte Carlo (Ceperley and Alder, 1986; Austin et al., 2012; Gubernatis et al., 2016; Foulkes et al., 2001; Needs et al., 2009): a particular variational objective is approximated via sampling, and the approximated objective is optimized over a family of wavefunctions, yielding an upper bound on the ground state energy. The heart of this method is the wavefunction family, also known as the ansatz; recent work has proposed using neural networks as a flexible ansatz, and has achieved very high quality results, which we now describe further.
Neural Network Ansätze We begin by noting that various works have used neural networks as the ansätze in the case of pure-spin systems (sometimes also referred to as “discrete space systems”), for example (Carleo and Troyer, 2017; Deng et al., 2017; Gao and Duan, 2017; Levine et al., 2019; Sharir et al., 2022; Passetti et al., 2023). In terms of continuous space problems of the sort that interest us, DeepWF (Han et al., 2019) bases its ansatz on the classical Slater-Jastrow formalism, but learns both the symmetric and antisymmetric parts; the latter contains only two-electron terms, limiting the accuracy. PauliNet (Hermann et al., 2020; Schätzle et al., 2021) also bases its ansatz on the Slater-Jastrow-Backflow form, but does so in a way that captures many-electron interactions, while respecting permutation-equivariance; this, as well as the inclusion of cusp terms, leads to much higher accuracy (e.g. 97.3% of the correlation energy for boron atoms). FermiNet (Pfau et al., 2020; Spencer et al., 2020) attains still higher accuracy (e.g. 99.8% of the correlation energy for boron atoms) by using an appropriately designed neural network to represent the entire wavefunction, which contains a generalization of Slater determinants to account for all-electron interactions. A hybrid solution which improves upon both PauliNet and FermiNet is presented in (Gerard et al., 2022). Techniques for learning / induction across several molecules or materials at once are presented in (Gao and Günnemann, 2023; Scherbela et al., 2024; Gerard et al., 2024). We briefly mention applications to periodic systems (Wilson et al., 2022; Li et al., 2022; Pescia et al., 2022; Cassella et al., 2023); techniques that use Diffusion Monte Carlo (Wilson et al., 2021; Ren et al., 2023); and methods that deal with excited states (Entwistle et al., 2023; Pfau et al., 2023; Naito et al., 2023). Finally, we mention two works which use normalizing flows (Thiede et al., 2022; Saleh et al., 2023). Both are limited in their applicability, as the former is restricted to one-dimensional systems by construction, while the latter makes use of the flows in a non-standard way and thus cannot scale past systems with a few electrons.
Goals and Contributions In order to be able to apply the Variational Monte Carlo formalism to the ansätze just described, such as PauliNet or FermiNet, one must be able to sample from the densities corresponding to the wavefunctions given by their neural networks. In general, this is only possible using Markov Chain Monte Carlo (MCMC) techniques such as Langevin Monte Carlo (Umrigar et al., 1993) or any of several variations. The issue with using such MCMC approaches to sampling is that they are inherently time-consuming: each sample is itself the solution of a stochastic differential equation as time goes to infinity. The main goal of this paper is to solve the problem of sampling inefficiency, thereby yielding faster algorithms for solving the Electronic Schrödinger Equation. We achieve this goal by specifying a wavefunction ansatz which is easy to sample from, yet satisfies the requisite quantum mechanical properties. The ansatz is based on normalizing flows, which unlike (Thiede et al., 2022; Saleh et al., 2023) are general and can be applied to a space of any dimensionality. We provide the following contributions:
• We establish that such an ansatz can be instantiated as a normalizing flow with these characteristics: (a) its base distribution is symmetric under permutations, and vanishes for identical electrons; (b) the flow transformation is equivariant to a particular subgroup of the permutation group.
• We show that the base distribution can be constructed using a particular combination of Determinantal Point Processes.
• We construct both continuous and discrete normalizing flows obeying the requisite equivariance.
• We provide a training regimen based on standard stochastic gradient descent.
• We show how to accommodate cusps, which encapsulate non-smooth aspects of the wavefunction.
• We generalize the framework so that induction across multiple molecules may be accommodated, while including the necessary additional invariances, in particular rigid motion invariance.
2 Problem Setup
2.1 Goals
The Setting Our overall goal is to compute the ground state wavefunction and energy of a molecule given its molecular parameters and spin multiplicity. Denote $x_i = (r_i, s_i)$ to be the pair consisting of the position $r_i \in \mathbb{R}^3$ and spin $s_i \in \{\uparrow, \downarrow\}$ for the $i^{th}$ electron; $x$ will denote the entire ordered list $(x_1, \dots, x_n)$, with corresponding definitions for $r$ and $s$. We specify wavefunctions as $\psi(x)$; due to the fact that electrons are Fermions, valid wavefunctions must be antisymmetric, that is if $\pi$ is a permutation, then
$$\psi(\pi x) = (-1)^{\pi}\, \psi(x) \tag{1}$$
where as usual, $(-1)^{\pi}$ is shorthand for $(-1)^{N(\pi)}$ where $N(\pi)$ is the minimal number of flips to produce $\pi$.
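Let $R_I$ and $Z_I$ denote the position and atomic number of the $I^{th}$ nucleus, and let the Laplacian for the $i^{th}$ electron be $\nabla_i^2$; then the Hamiltonian (in atomic units) is given by
$$H = -\frac{1}{2} \sum_i \nabla_i^2 + V(r), \qquad V(r) = \sum_{i<j} \frac{1}{\|r_i - r_j\|} - \sum_{i,I} \frac{Z_I}{\|r_i - R_I\|} + \sum_{I<J} \frac{Z_I Z_J}{\|R_I - R_J\|} \tag{2}$$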
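Our goal is to compute the ground state wavefunction, which we denote as $\psi_0$, and the corresponding ground state energy $E_0$. They may be computed using the variational principle:
$$E_0 = \min_{\psi \in \Psi} \frac{\langle \psi | H | \psi \rangle}{\langle \psi | \psi \rangle}, \qquad \psi_0 = \arg\min_{\psi \in \Psi} \frac{\langle \psi | H | \psi \rangle}{\langle \psi | \psi \rangle} \tag{3}$$
where $\Psi$ is the set of all possible valid wavefunctions, and $H$ is the Hamiltonian. If we specify the wavefunction ansatz as a neural network $\psi_\theta$ with parameters $\theta$, this becomes
$$E_0 \le \min_{\theta} \frac{\langle \psi_\theta | H | \psi_\theta \rangle}{\langle \psi_\theta | \psi_\theta \rangle} \tag{4}$$
That is, we compute an upper bound to the ground state energy $E_0$. The more expressive the ansatz, the tighter the bound will be.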
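Variational Monte Carlo The issue with the formulation to this point is the need to compute the inner products in Equations (3) and (4), which correspond to very high-dimensional integrals. A standard solution to this problem is based on a Monte Carlo scheme. To begin with, let us define the local energy as
$$E_{loc}(x) = \frac{H\psi(x)}{\psi(x)} \tag{5}$$
In this case, one can simplify the minimand in Equation (3) (see Appendix A) as
$$\frac{\langle \psi | H | \psi \rangle}{\langle \psi | \psi \rangle} = \mathbb{E}_{x \sim |\psi|^2}\left[ E_{loc}(x) \right] \approx \frac{1}{N} \sum_{k=1}^{N} E_{loc}(x^{(k)}) \tag{6}$$
where the $x^{(k)}$ are sampled from the density proportional to $|\psi(x)|^2$.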
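As a small illustration of the Monte Carlo estimate in Equation (6), the sketch below uses a toy problem that is not taken from this paper: a single particle in a one-dimensional harmonic well, with the Gaussian trial function $\psi_a(x) = e^{-a x^2}$. The names `local_energy` and `vmc_energy` and the parameter `a` are assumptions made for the example. Because $|\psi_a|^2$ is itself Gaussian, each sample can be drawn directly, with no Markov chain.

```python
import math
import random

def local_energy(x, a):
    """Local energy (H psi)(x) / psi(x) for psi(x) = exp(-a x^2)
    with the toy Hamiltonian H = -1/2 d^2/dx^2 + 1/2 x^2."""
    return a + (0.5 - 2.0 * a * a) * x * x

def vmc_energy(a, n_samples, rng):
    """Monte Carlo average of the local energy over samples from |psi|^2."""
    # |psi|^2 is proportional to exp(-2 a x^2), a Gaussian with variance 1/(4a),
    # so each sample is drawn directly rather than by running a Markov chain.
    sigma = 1.0 / (2.0 * math.sqrt(a))
    total = 0.0
    for _ in range(n_samples):
        total += local_energy(rng.gauss(0.0, sigma), a)
    return total / n_samples

rng = random.Random(0)
e_exact = vmc_energy(0.5, 1000, rng)    # a = 1/2 is the exact ground state
e_upper = vmc_energy(1.0, 20000, rng)   # any other a gives an upper bound
```

For $a = 1/2$ the local energy is constant and equal to the exact ground state energy $1/2$; for $a = 1$ the estimate fluctuates around $a/2 + 1/(8a) = 0.625$, which lies above the ground state energy as the variational principle requires.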
2.2 The General Approach
As enumerated in Section 1, a number of recent works have followed the above approach using a variety of neural networks as the ansatz for the wavefunction $\psi_\theta$. In order to do so, one must be able to sample from $|\psi_\theta|^2$; and as the networks are quite general, the only feasible method for sampling is a Markov Chain Monte Carlo technique such as Langevin Monte Carlo (Umrigar et al., 1993) or any of several variations. These techniques can be time-consuming, as each sample is itself the solution of a stochastic differential equation as time goes to infinity.
A solution to this problem presents itself if we can somehow specify a wavefunction which is easy to sample from. We are interested in wavefunctions which satisfy the following three properties:
(W1) There is an explicit functional form for the wavefunction $\psi$.
(W2) $\psi$ is antisymmetric.
(W3) We can sample non-iteratively (in constant time) from $|\psi|^2$.
The first two properties are necessary for any form of Variational Monte Carlo: (W1) allows us to evaluate the local energy in (5) for use in (6); and (W2) is required for valid electronic (Fermionic) wavefunctions. But (W3) is the new ingredient: if we have a family of wavefunctions satisfying (W1)-(W3), then solving the minimization in (4) via the Monte Carlo approach in (6) will be considerably accelerated, as each sample will only require constant time to generate. We add a fourth property, which is not strictly necessary but is both desirable and will prove useful:
(W4) $\psi$ is normalized, that is $\langle \psi | \psi \rangle = 1$.
It turns out that generating such wavefunctions is possible using the following procedure:
Theorem 1.
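Let $\rho(x)$ be a probability density function which we can sample from in constant time. Let $\rho$ satisfy two additional properties:
(D1) $\rho$ is symmetric: $\rho(\pi x) = \rho(x)$ for all permutations $\pi$.
(D2) $\rho(x) = 0$ if $x_i = x_j$ for any $i \neq j$.
Finally, let $\varphi(x)$ be a complex function which satisfies $|\varphi(x)| = 1$, and is nearly antisymmetric:
$$\varphi(\pi x) = \begin{cases} (-1)^{\pi} \varphi(x) & \text{if } x_i \neq x_j \text{ for all } i \neq j \\ c & \text{otherwise} \end{cases} \tag{7}$$
where $c$ is an arbitrary value with $|c| = 1$. Then $\psi$ satisfies (W1)-(W4) if and only if $\psi$ can be written as $\psi(x) = \sqrt{\rho(x)}\, \varphi(x)$ with $\rho$ and $\varphi$ satisfying the above-stated properties.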
Proof: See Appendix B.
The general idea expressed in Theorem 1 is that we can build the wavefunction $\psi$ out of an easy-to-sample-from density function $\rho$ satisfying additional properties (D1)-(D2); and a nearly antisymmetric phase function $\varphi$. In what follows, we will show how to construct both of these ingredients. But before doing so, we take a short detour to address the most important practical scenario, that of fixed spin multiplicity.
2.3 Fixed Spin Multiplicity
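Notation As in most approaches to this problem, we assume that the spin multiplicity of the molecule is specified, which is equivalent to fixing the number of spin up and spin down electrons, denoted $n_\uparrow$ and $n_\downarrow$ respectively, with $n_\uparrow + n_\downarrow = n$. Define the canonical spin vector to be given by $s^c = (\uparrow, \dots, \uparrow, \downarrow, \dots, \downarrow)$, i.e. the first $n_\uparrow$ are $\uparrow$, the last $n_\downarrow$ are $\downarrow$. We let the sets of indices of up and down spin electrons for the canonical spin vector be denoted by $I_\uparrow = \{1, \dots, n_\uparrow\}$ and $I_\downarrow = \{n_\uparrow + 1, \dots, n\}$. Finally, we will be interested in the subgroup of permutations in which a permutation is applied separately to spin-up and spin-down electrons. We denote this subgroup by
$$G = S(I_\uparrow) \times S(I_\downarrow) \tag{8}$$
where $S(I)$ denotes the group of permutations of the index set $I$.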
Specification of the Density In the case of fixed spin multiplicity, the specification of the density is simplified:
Theorem 2.
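Given a configuration $x = (r, s)$, let a permutation which maps the spin vector $s$ to the canonical spin vector be given by $\pi_s$, i.e. $\pi_s s = s^c$. Let $\bar{\rho}(r)$ be a density function on electron positions (i.e. no spins) satisfying
(R1) $\bar{\rho}$ is $G$-invariant: $\bar{\rho}(\pi r) = \bar{\rho}(r)$ for all $\pi \in G$.
(R2) $\bar{\rho}(r) = 0$ if $r_i = r_j$ for any $i \neq j$ of the same spin, i.e. with $i, j \in I_\uparrow$ or $i, j \in I_\downarrow$.
A density $\rho$ satisfies conditions (D1)-(D2) in Theorem 1 if and only if it may be written as $\rho(r, s) \propto \bar{\rho}(\pi_s r)$ for a density $\bar{\rho}$ satisfying conditions (R1) and (R2).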
Proof: See Appendix C.
To summarize: in the case of fixed spin multiplicity, specifying a wavefunction $\psi$ satisfying our desired conditions (W1)-(W4) is equivalent to specifying a density $\bar{\rho}$ satisfying conditions (R1)-(R2); and then applying the transformations given in Theorems 1 and 2 to map from $\bar{\rho}$ to $\psi$. (Footnote 1: We have for the moment ignored the issue of the phase $\varphi$, which we return to in Sections 3.6 and 4.1.)
Therefore, henceforth we will focus exclusively on specifying densities $\bar{\rho}$ satisfying conditions (R1)-(R2). To avoid unnecessary notational complexity, we will drop the bars and simply write $\rho$.
3 Using Normalizing Flows to Construct the Wavefunction Ansatz
3.1 Sufficient Properties of the Normalizing Flow’s Base Density and Transformation
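Our goal is to use a normalizing flow to construct the density $\rho$. Let $d$ be the ambient dimension (i.e. $d = 3$) and $n$ be the number of electrons. The relevant vectors will live in the space $\mathbb{R}^{d \times n}$, construed as the Cartesian product $(\mathbb{R}^d)^n$ (which is of course isomorphic to $\mathbb{R}^{dn}$). A normalizing flow will consist of two ingredients: (1) a base random variable $z$, which lives in $(\mathbb{R}^d)^n$, and is described by the density $\rho_z$; (2) an invertible transformation $T : (\mathbb{R}^d)^n \to (\mathbb{R}^d)^n$, such that $r = T(z)$. In this case, the density $\rho$ is the push-forward of $\rho_z$ along $T$, and is given by the change of variables formula
$$\rho(r) = \rho_z\left( T^{-1}(r) \right) \left| \det J_{T^{-1}}(r) \right| \tag{9}$$
where $J_{T^{-1}}$ denotes the Jacobian of $T^{-1}$.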
Recall that we would like our density to satisfy conditions (R1)-(R2) laid out in Theorem 2. The following theorem establishes conditions for this to occur:
Theorem 3.
Suppose that we have a normalizing flow, whose base density $\rho_z$ satisfies properties (R1) and (R2) from Theorem 2, and whose transformation $T$ is $G$-equivariant. Then the density $\rho$ resulting from the normalizing flow will satisfy properties (R1) and (R2).
3.2 The Base Density via Determinantal Point Processes
In most cases in machine learning, the base density for a normalizing flow is taken to be a standard distribution, most often a Gaussian. In our case, we require that the base density have certain special properties, namely (R1) and (R2) from Theorem 2. It turns out that Determinantal Point Processes (DPPs) have just the properties we require. In particular, we are interested in the class of DPPs known as Projection DPPs (Gautier et al., 2019; Lavancier et al., 2015), which can be specified as follows. We will let $y$ specify a generic point in $\mathbb{R}^d$. Let $h_k : \mathbb{R}^d \to \mathbb{R}$ for $k = 1, \dots, m$ be a set of functions which are orthonormal, that is $\int h_k(y)\, h_l(y)\, dy = \delta_{kl}$. Let $h(y)$ be the column vector composed by stacking the individual functions $h_k(y)$, and define the kernel function as $K(y, y') = h(y)^T h(y')$. Then for a given collection of $m$ points in $\mathbb{R}^d$, that is $Y = (y_1, \dots, y_m)$, we define the $m \times m$ kernel matrix $K(Y) = [K(y_i, y_j)]_{i,j}$, from which the density of the Projection DPP may be specified:
$$p(Y) = \frac{1}{m!} \det K(Y) \tag{10}$$
Since $K(Y)$ is positive semi-definite, it follows that its determinant is non-negative so that $p(Y)$ is non-negative, as desired. A proof that $p$ is properly normalized (i.e. integrates to $1$) can be found, for example, in Proposition 2.10 of (Johansson, 2006).
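The following sketch evaluates a Projection DPP density of this determinantal form in a deliberately simplified setting. It assumes one-dimensional points and uses the first few orthonormal Hermite functions as the orthonormal family; both choices, and the helper names `hermite_functions`, `det` and `dpp_density`, are assumptions made for illustration only. It then checks numerically the properties needed of a base density: symmetry under permutations, vanishing at coincident points, and normalization.

```python
import math

def hermite_functions(y, m):
    """First m (m <= 3) orthonormal Hermite functions on the real line, evaluated at y."""
    h0 = math.pi ** -0.25 * math.exp(-0.5 * y * y)
    return [h0, math.sqrt(2.0) * y * h0, (2.0 * y * y - 1.0) / math.sqrt(2.0) * h0][:m]

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def dpp_density(points):
    """Projection DPP density det[K(y_i, y_j)] / m! with K(y, y') = sum_k h_k(y) h_k(y')."""
    m = len(points)
    feats = [hermite_functions(y, m) for y in points]
    kernel = [[sum(u * v for u, v in zip(fi, fj)) for fj in feats] for fi in feats]
    return det(kernel) / math.factorial(m)

p_abc = dpp_density([0.3, -1.1, 0.8])
p_bca = dpp_density([-1.1, 0.8, 0.3])      # same points, permuted
p_clash = dpp_density([0.3, 0.3, 0.8])     # two coincident points

# Riemann-sum check of the normalization for m = 2 on the grid [-6, 6]^2.
step = 0.06
grid = [-6.0 + step * k for k in range(201)]
total = sum(dpp_density([u, v]) for u in grid for v in grid) * step * step
```

Here `p_abc` and `p_bca` agree up to rounding, `p_clash` vanishes because two rows of the kernel matrix coincide, and `total` is close to one.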
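Given the notion of a Projection DPP, we may define the base density as follows. As above, let the base random variable be $z$, where $z$ can be broken into spin-up and spin-down pieces, denoted $z^\uparrow$ and $z^\downarrow$. (Specifically, $z^\uparrow$ and $z^\downarrow$ are the parts of $z$ corresponding to electrons in $I_\uparrow$ and $I_\downarrow$, respectively.) The base density can then be constructed by taking
$$\rho_z(z) = p^\uparrow(z^\uparrow)\, p^\downarrow(z^\downarrow) \tag{11}$$
where $p^\uparrow$ and $p^\downarrow$ are Projection DPP densities on $n_\uparrow$ and $n_\downarrow$ points, respectively. That is, $z^\uparrow$ and $z^\downarrow$ are chosen from two independent Projection DPPs. We then have the following theorem: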
Theorem 4.
The base density defined in Equation (11) satisfies properties (R1) and (R2) from Theorem 2.
3.3 $G$-Equivariant Layers
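As noted in Section 3.1, we require the normalizing flow transformation $T$ to be $G$-equivariant. Of course, chaining together many layers which are each $G$-equivariant results in an overall transformation which is also $G$-equivariant. Now, suppose that a particular layer can be written as
$$v = \Lambda(u) \tag{12}$$
where $u = (u_1, \dots, u_n)$ with $u_i \in \mathbb{R}^d$, and likewise for $v$. We will need to see the action on the spin-up and spin-down electrons separately, so we denote $u^\uparrow = (u_i)_{i \in I_\uparrow}$ and $u^\downarrow = (u_i)_{i \in I_\downarrow}$; and we may write
$$v^\uparrow = \Lambda^\uparrow(u^\uparrow, u^\downarrow), \qquad v^\downarrow = \Lambda^\downarrow(u^\downarrow, u^\uparrow) \tag{13}$$
For notational convenience, we use $\sigma \in \{\uparrow, \downarrow\}$ to denote the spin, and the complement of the spin is given by $\bar{\sigma}$ (i.e. if $\sigma = \uparrow$ then $\bar{\sigma} = \downarrow$ and vice-versa). Then we have the following theorem:
Theorem 5.
The transformation $\Lambda$ is $G$-equivariant if and only if
$$\Lambda^\sigma(\pi^\sigma u^\sigma, \pi^{\bar{\sigma}} u^{\bar{\sigma}}) = \pi^\sigma \Lambda^\sigma(u^\sigma, u^{\bar{\sigma}}) \quad \text{for all } \sigma \text{ and all } (\pi^\uparrow, \pi^\downarrow) \in G \tag{14}$$
That is, $\Lambda^\sigma$ is equivariant with respect to $u^\sigma$, and invariant with respect to $u^{\bar{\sigma}}$.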
Proof: See Appendix G.
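The condition in Theorem 5 can be checked numerically on a toy layer. The sketch below is not the attention-based construction of Appendix K; it assumes a much simpler mean-pooling layer in which each electron is updated from its own position, the mean of the same-spin electrons and the mean of the opposite-spin electrons. The function names and the weights `w` are hypothetical. The layer is not invertible in general, so it illustrates only the equivariance condition, not a complete flow layer.

```python
import math

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def equivariant_layer(up, down, w=(0.7, -0.4, 0.9)):
    """Toy layer: each electron sees itself, its own-spin mean and the opposite-spin mean."""
    mu_up, mu_down = mean(up), mean(down)

    def update(r, same, other):
        return [math.tanh(w[0] * r[k] + w[1] * same[k] + w[2] * other[k])
                for k in range(len(r))]

    new_up = [update(r, mu_up, mu_down) for r in up]
    new_down = [update(r, mu_down, mu_up) for r in down]
    return new_up, new_down

up = [[0.1, 0.2, 0.3], [1.0, -0.5, 0.4], [-0.7, 0.9, 0.0]]
down = [[0.5, 0.5, -1.0], [-0.2, 0.3, 0.8]]

out_up, out_down = equivariant_layer(up, down)
# Reorder the spin-up electrons as (2, 0, 1) and swap the two spin-down electrons.
perm_up, perm_down = equivariant_layer([up[2], up[0], up[1]], [down[1], down[0]])
```

Permuting the inputs within each spin group permutes the outputs in the same way: `perm_up` equals `out_up` reordered as (2, 0, 1), and `perm_down` equals `out_down` with its two entries swapped, up to rounding.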
We now show how to specify continuous and discrete normalizing flows satisfying Theorem 5.
3.4 Continuous Normalizing Flows
According to Theorem 3, we are required to find a transformation $T$ which is $G$-equivariant. We now show this can be achieved via a continuous normalizing flow. We specify this flow via the ordinary differential equation (ODE)
$$\frac{d r_t}{dt} = f_t(r_t), \qquad r_0 = z \tag{15}$$
That is, the transformation $r = T(z)$ is derived as follows: the initial condition $r_0 = z$ is sampled from the base density; and $r = r_1$ is obtained by integrating the ODE forward to time $t = 1$. The $t$-dependence is indicated via a subscript for notational convenience. We then have the following theorem:
Theorem 6.
Let the transformation $T$ be specified as in Equation (15). Then $T$ is $G$-equivariant if the velocity field $f_t$ is $G$-equivariant for all $t$.
Proof: See Appendix H.
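It therefore suffices to design a $G$-equivariant function $f_t$. Let us break this down by spin: from Theorem 5, we know that this implies that for all $\sigma$, we have that $f_t^\sigma(\pi^\sigma r^\sigma, \pi^{\bar{\sigma}} r^{\bar{\sigma}}) = \pi^\sigma f_t^\sigma(r^\sigma, r^{\bar{\sigma}})$, where $f_t^\sigma$ is the part of $f_t$ acting on the spin-$\sigma$ electrons and $\bar{\sigma}$ is the opposite spin. We show in Appendix K how to implement a layer of $f_t$ with a combination of multihead attention, fully connected layers, and linear projections ($f_t$ can be composed of many such layers).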
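A rough numerical sketch of such a flow is given below, under several simplifying assumptions that are not part of the paper: positions are one-dimensional, the velocity field is a time-independent mean-pooling function rather than an attention network, the ODE is integrated with a plain Euler scheme, and the divergence is estimated by finite differences. The names `field`, `divergence` and `integrate_flow` are hypothetical. The code tracks both the transported point and the accumulated change in log-density.

```python
import math

N_UP = 2  # assumption: the first N_UP coordinates are spin-up, the rest spin-down

def field(r, w=(0.5, -0.3, 0.2)):
    """Toy G-equivariant velocity field on 1D positions (one scalar per electron)."""
    up, down = r[:N_UP], r[N_UP:]
    mu_up, mu_down = sum(up) / len(up), sum(down) / len(down)
    return ([math.tanh(w[0] * x + w[1] * mu_up + w[2] * mu_down) for x in up] +
            [math.tanh(w[0] * x + w[1] * mu_down + w[2] * mu_up) for x in down])

def divergence(f, r, eps=1e-5):
    """Central finite-difference estimate of sum_i d f_i / d r_i."""
    total = 0.0
    for i in range(len(r)):
        plus, minus = list(r), list(r)
        plus[i] += eps
        minus[i] -= eps
        total += (f(plus)[i] - f(minus)[i]) / (2.0 * eps)
    return total

def integrate_flow(z, f, steps=200):
    """Euler integration of dr/dt = f(r) and d(log rho)/dt = -div f(r) from t = 0 to 1."""
    r, delta_logp, h = list(z), 0.0, 1.0 / steps
    for _ in range(steps):
        delta_logp -= h * divergence(f, r)
        r = [x + h * v for x, v in zip(r, f(r))]
    return r, delta_logp

z = [0.4, -1.2, 0.9, 0.1]
r, dlogp = integrate_flow(z, field)
# Swapping the two spin-up inputs should swap the two spin-up outputs.
r_swapped, dlogp_swapped = integrate_flow([z[1], z[0], z[2], z[3]], field)
# Sanity check on a linear field f(r) = a r, whose exact log-density change is -n a.
r_lin, dlogp_lin = integrate_flow(z, lambda q: [0.3 * x for x in q])
```

The swapped run returns the same spin-up outputs in swapped order and the same log-density change, which is the behaviour Theorem 6 predicts for an equivariant velocity field. In practice an adaptive ODE solver and an exact or stochastic trace would replace the Euler steps and finite differences used here.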
Continuous normalizing flows are elegant; however, they can present some numerical difficulties. In particular, the issue of ODE stiffness frequently arises in deep learning pipelines involving continuous normalizing flows. Thus, we now present an alternative method, based on discrete normalizing flows.
3.5 Discrete Normalizing Flows
Our goal is now to design functions $\Lambda^\uparrow$ and $\Lambda^\downarrow$, the spin-up and spin-down parts of a layer, which satisfy Equation (14), and for which the overall transformation is invertible. The goal of the layer we propose here is not to sacrifice expressivity, especially when compared to many layers which are designed for discrete normalizing flows. In particular, the main issue will be to show that the expressivity can be retained even with the joint requirements of invertibility and $G$-equivariance. We note that the kind of transformation we propose below is not generally used for normalizing flows, as the determinant of its Jacobian is not fast to compute; however, this is not an issue in our case, as the dimension of the spaces we are dealing with is relatively small. For a more detailed discussion, see Appendix I.
To solve this problem, we introduce the Split Subspace Layer; we note that this layer may be of broader interest in machine learning, independent of the current setting. As before, we take $d$ to represent the ambient spatial dimension; in our case, $d = 3$. A key parameter for the layer will be the orthogonal matrix $Q \in \mathbb{R}^{d \times d}$; in particular, we divide this matrix into 2 pieces
$$Q = [\, Q_1 \;\; Q_2 \,] \tag{16}$$
That is, $Q_1$ represents the first $d_1$ columns of $Q$, and $Q_2$ represents the final $d - d_1$ columns. For each electron $i$, we compute the inner product of its coordinates $u_i$ with $Q_1$, i.e.
$$a_i = Q_1^T u_i \tag{17}$$
We can collect the individual vectors into a list $a = (a_1, \dots, a_n)$. Given this, we define the Split Subspace Layer on a per-electron basis by
(18) |
where $\gamma$ is a network, and $\gamma_i$ is the part of (the output of) $\gamma$ corresponding to the $i^{th}$ electron. The layer is referred to as the Split Subspace Layer due to the fact that its input is one subspace of $\mathbb{R}^d$, given by the column space of $Q_1$; whereas its output is in the orthogonal complement of this subspace, given by the column space of $Q_2$.
The main ingredient of the layer is the network $\gamma$. We now show two things: (1) the layer is invertible for any choice of $\gamma$; (2) we derive conditions on $\gamma$ to achieve $G$-equivariance of the layer.
Theorem 7.
Let be a Split Subspace Layer, as given in Equation (18). Then is invertible. In particular, let ; then the inverse of the layer is given by
(19) |
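Furthermore, the layer is $G$-equivariant if
$$\gamma^\sigma(\pi^\sigma a^\sigma, \pi^{\bar{\sigma}} a^{\bar{\sigma}}) = \pi^\sigma \gamma^\sigma(a^\sigma, a^{\bar{\sigma}}) \tag{20}$$
i.e. if $\gamma^\sigma$, the part of the network output belonging to spin-$\sigma$ electrons, is equivariant with respect to permutations on the spin-$\sigma$ inputs $a^\sigma$ and invariant with respect to permutations on the opposite-spin inputs $a^{\bar{\sigma}}$.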
Proof: See Appendix J.
The Split Subspace Layer therefore depends on implementation of the network $\gamma$ so that it satisfies Equation (20). We show in Appendix K how $\gamma$ can be implemented with a combination of multihead attention, fully connected layers, and linear projections. We specify a more general version of the Split Subspace Layer in Appendix L.
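To make the idea of reading from one subspace and writing to its orthogonal complement concrete, the sketch below implements one hypothetical additive instance: each electron is shifted along the $Q_2$ directions by an amount computed only from the $Q_1$ projections of all electrons. This additive form, the particular matrix $Q$, and the toy network `gamma` are assumptions made for illustration; the exact form of Equation (18) is not reproduced here. The point of the example is that the $Q_1$ projections are unchanged by the layer, so the shift can be recomputed from the output and subtracted.

```python
import math

N_UP = 2          # assumption: the first N_UP electrons are spin-up
ANGLE = 0.6       # hypothetical parameter defining the orthogonal matrix Q
# Columns of Q form a rotation about the z-axis: Q1 = [q1], Q2 = [q2, q3].
Q1 = [math.cos(ANGLE), math.sin(ANGLE), 0.0]
Q2 = [[-math.sin(ANGLE), math.cos(ANGLE), 0.0], [0.0, 0.0, 1.0]]

def project(r):
    """Coordinate of every electron along the Q1 subspace."""
    return [sum(q * x for q, x in zip(Q1, ri)) for ri in r]

def gamma(a):
    """Toy network: equivariant in same-spin inputs, invariant in opposite-spin inputs."""
    up, down = a[:N_UP], a[N_UP:]
    mu_up, mu_down = sum(up) / len(up), sum(down) / len(down)
    out = [(math.tanh(x + mu_up), x * math.sin(mu_down)) for x in up]
    out += [(math.tanh(x + mu_down), x * math.sin(mu_up)) for x in down]
    return out

def shift(r, sign):
    """Add (sign = +1) or subtract (sign = -1) Q2 * gamma_i(Q1^T r) for each electron."""
    g = gamma(project(r))
    return [[x + sign * (gi[0] * Q2[0][k] + gi[1] * Q2[1][k]) for k, x in enumerate(ri)]
            for ri, gi in zip(r, g)]

def forward(r):
    return shift(r, +1.0)

def inverse(y):
    # Q1^T y_i = Q1^T r_i because Q1 is orthogonal to Q2, so gamma can be recomputed from y.
    return shift(y, -1.0)

r = [[0.1, 0.2, 0.3], [1.0, -0.5, 0.4], [-0.7, 0.9, 0.0], [0.5, 0.5, -1.0]]
y = forward(r)
r_back = inverse(y)
y_swapped = forward([r[1], r[0], r[2], r[3]])   # swap the two spin-up electrons
```

The round trip `inverse(forward(r))` recovers `r` for any choice of `gamma`, and swapping two same-spin electrons at the input swaps the corresponding outputs.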
3.6 Training via SGD
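Log Domain: Density In order to avoid numerical issues, it is best to operate in the log domain. Suppose that
$$\log \psi = \tfrac{1}{2} \log \rho + i\, \mathrm{atan2}(\varphi_I, \varphi_R) \tag{21}$$
where $\varphi_R$ and $\varphi_I$ are the real and imaginary parts of the phase $\varphi$, respectively; and atan2 is the “full” arctangent.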
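The log-density $\log \rho_\theta$ may be computed for both continuous and discrete normalizing flows, where we now introduce the parameters $\theta$ of the network explicitly. Consider a sample $z$ chosen from the base density $\rho_z$, and in analogy to $r_t$, define $\ell_t$ to be the log-density of $r_t$, with $\ell_0 = \log \rho_z(z)$. Now, in the case of a continuous normalizing flow, let $r_t$ satisfy Equation (15) with velocity field $f_t$; then $\ell_t$ can be computed (Chen et al., 2018) by solving the ODE
$$\frac{d \ell_t}{dt} = -\nabla \cdot f_t(r_t) \tag{22}$$
which is the continuous analogue of the change of variables formula. In the case of a discrete normalizing flow, fix the following notation: $T = T_L \circ \cdots \circ T_1$, $z_0 = z$, and $z_k = T_k(z_{k-1})$, so that $r = z_L$. Then we may use a logarithmic version of the standard change of variables formula (9):
$$\log \rho_\theta(r) = \log \rho_z(z) - \sum_{k=1}^{L} \log \left| \det J_{T_k}(z_{k-1}) \right| \tag{23}$$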
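The discrete-flow bookkeeping can be sketched in a few lines. In the example below each layer returns its output together with the log-determinant of its Jacobian, and the log-density of the final point is the base log-density minus the accumulated log-determinants. The affine layers and the standard Gaussian base are stand-ins chosen so that the result can be compared with a closed form; they are assumptions for illustration and are neither equivariant Split Subspace Layers nor the DPP base of Section 3.2.

```python
import math

def affine_layer(scale, shift):
    """Invertible layer y = scale * x + shift; returns (y, log|det Jacobian|)."""
    def apply(x):
        return [scale * v + shift for v in x], len(x) * math.log(abs(scale))
    return apply

def log_standard_normal(z):
    """Stand-in base log-density (a Gaussian, not the DPP base of Section 3.2)."""
    return -0.5 * sum(v * v for v in z) - 0.5 * len(z) * math.log(2.0 * math.pi)

def flow_log_density(z, layers):
    """Push z through the layers, accumulating log rho(r) = log rho_z(z) - sum_k log|det J_k|."""
    log_rho, x = log_standard_normal(z), list(z)
    for layer in layers:
        x, log_det = layer(x)
        log_rho -= log_det
    return x, log_rho

layers = [affine_layer(2.0, 0.5), affine_layer(-0.25, 1.0), affine_layer(3.0, -2.0)]
z = [0.3, -1.2, 0.7]
r, log_rho = flow_log_density(z, layers)

# Closed form for comparison: the composition is r = s z + b with s = -1.5, b = 0.625,
# so r is Gaussian with mean b and standard deviation |s| in every coordinate.
s, b = 2.0 * -0.25 * 3.0, (0.5 * -0.25 + 1.0) * 3.0 - 2.0
expected = sum(-0.5 * ((v - b) / s) ** 2 - math.log(abs(s) * math.sqrt(2.0 * math.pi))
               for v in r)
```

The accumulated value `log_rho` matches the closed-form Gaussian log-density `expected` up to rounding, which is the change of variables formula applied layer by layer.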
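Log Domain: Gradient of the Objective Recall that our goal in finding an approximation to the ground state wavefunction is to solve the optimization problem in Equation (4). Using Equation (6) and noting that $\langle \psi_\theta | \psi_\theta \rangle = 1$ since $\psi_\theta$ is normalized, we may write the objective function to be minimized as
$$\mathcal{L}(\theta) = \mathbb{E}_{r \sim \rho_\theta}\left[ E_{loc}(r; \theta) \right] \approx \frac{1}{N} \sum_{k=1}^{N} E_{loc}(r^{(k)}; \theta) \tag{24}$$
with samples $r^{(k)} \sim \rho_\theta$. Then we have the following theorem, which shows that the local energy can be written entirely as a function of $\log \rho_\theta$ and the potential $V$, so that the phase $\varphi$ does not appear; and furthermore gives the gradient of the objective function $\mathcal{L}(\theta)$.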
Theorem 8.
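The local energy can be written as
$$E_{loc}(r; \theta) = -\frac{1}{4} \sum_i \nabla_i^2 \log \rho_\theta(r) - \frac{1}{8} \sum_i \left\| \nabla_i \log \rho_\theta(r) \right\|^2 + V(r) \tag{25}$$
In particular, the local energy is independent of the phase $\varphi$. Furthermore, let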
(26) |
Then the gradient of the loss function may be written as
(27) |
with samples .
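For readers who want to see why the local energy in Theorem 8 depends only on the log-density, the following short derivation may help. It is a sketch under two assumptions: the Hamiltonian has the form $H = -\tfrac{1}{2}\sum_i \nabla_i^2 + V$ in atomic units, and the phase $\varphi$ is locally constant (as it is for a real wavefunction with $\varphi = \pm 1$ away from the nodes), so that only $\sqrt{\rho}$ is differentiated.

```latex
\begin{align*}
\psi = \sqrt{\rho}\,\varphi,\ \nabla_i \varphi = 0
\quad\Longrightarrow\quad
\frac{\nabla_i^2 \psi}{\psi}
  &= \nabla_i^2 \log|\psi| + \left\| \nabla_i \log|\psi| \right\|^2
   = \tfrac{1}{2}\,\nabla_i^2 \log\rho + \tfrac{1}{4}\left\| \nabla_i \log\rho \right\|^2 , \\
E_{loc} = \frac{H\psi}{\psi}
  &= -\tfrac{1}{2} \sum_i \frac{\nabla_i^2 \psi}{\psi} + V
   = -\tfrac{1}{4} \sum_i \nabla_i^2 \log\rho
     - \tfrac{1}{8} \sum_i \left\| \nabla_i \log\rho \right\|^2 + V .
\end{align*}
```

The first line uses $\log|\psi| = \tfrac{1}{2}\log\rho$ together with the identity $\nabla^2 g / g = \nabla^2 \log g + \|\nabla \log g\|^2$ for a positive function $g$.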
4 Further Details: Phase, Cusps, and Induction
4.1 The Phase
Since the Hamiltonian is time-reversal invariant and Hermitian, its eigenvalues are real and its eigenfunctions can be chosen to be real. Since the ground-state wavefunction we are looking for is real, the phase $\varphi$ can be taken to belong to the two element set $\{-1, +1\}$. Given that we now know how to solve for an approximation to the density $\rho$ corresponding to the ground state wavefunction, we now show one way of assigning the phase so that the resulting ground state wavefunction is appropriately antisymmetric.
Theorem 9.
Let be the density for the ground state wavefunction. Let be a strict total order on , and define the set
(28) |
For any without , define the permutation by . Then a valid antisymmetric ground state wavefunction is given by
(29) |
Proof: See Appendix O.
Thus, given the density , we can use Theorem 9 to easily compute the ground state wavefunction . A question remains: what is the strict total order ? Any choice is valid, but the simplest thing to do is to use lexicographic ordering on the coordinates of the two points in that are being compared.
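A minimal sketch of this phase assignment follows. It assumes that the phase is the sign of the permutation which sorts the electrons under the lexicographic order, that only positions are compared (spin is ignored here for simplicity), and that the density is a made-up symmetric function vanishing at coincident points; `sorting_sign`, `toy_density` and `wavefunction` are hypothetical names. Python tuples compare lexicographically, which supplies the strict total order for distinct points.

```python
import math

def sorting_sign(points):
    """Sign (+1 or -1) of the permutation that sorts `points` lexicographically."""
    inversions = sum(1 for i in range(len(points)) for j in range(i + 1, len(points))
                     if points[j] < points[i])
    return -1 if inversions % 2 else 1

def toy_density(points):
    """Hypothetical symmetric density that vanishes when two points coincide (unnormalized)."""
    value = math.exp(-sum(c * c for p in points for c in p))
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            value *= sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    return value

def wavefunction(points):
    """psi(x) = sign(sorting permutation) * sqrt(rho(x)) for distinct points."""
    return sorting_sign(points) * math.sqrt(toy_density(points))

x = [(0.3, -0.1, 0.5), (-0.4, 0.2, 0.0), (0.3, -0.2, 0.9)]
psi = wavefunction(x)
psi_swap = wavefunction([x[1], x[0], x[2]])        # one transposition
psi_cycle = wavefunction([x[1], x[2], x[0]])       # a 3-cycle (even permutation)
```

A single transposition flips the sign of the result while an even permutation leaves it unchanged, so the constructed function is antisymmetric even though the density itself is symmetric.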
4.2 Incorporating Cusps
Electron-Electron Cusps Wavefunctions are known to have certain non-smooth properties, known as cusps. In particular, the gradient of the wavefunction should exhibit a discontinuity when two electrons coincide. One way to incorporate such gradient discontinuities is via the introduction of terms which depend on the distance between electrons (Pfau et al., 2020); as the distance is itself a continuous but non-smooth function of the electron positions, using distances can allow us to model such cusps. In the case of the discrete normalizing flow, our goal will be to design a layer which incorporates the inter-electron distances directly. Given the requirements of a normalizing flow, the challenge is to enforce invertibility for such a layer. We have the following result:
Theorem 10.
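Let the set of distances be given by $\mathcal{D} = \{ D_{ij} \}_{i<j}$ where $D_{ij} = \| r_i - r_j \|$. Consider a layer of the form
$$r_i' = U(\mathcal{D})\, r_i + \tau(\mathcal{D}) \tag{30}$$
where $U(\mathcal{D})$ is a rotation matrix and $\tau(\mathcal{D})$ is a translation vector. Then the layer is both $G$-equivariant and invertible.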
Proof: See Appendix P.
The essence of this layer is to transform all electrons in a given configuration by the same rotation matrix and translation vector; and the rotation matrix and translation vector are both functions of the configuration entirely through the distances. The latter fact is crucial, as it means that different configurations are treated differently, which gives the layer expressivity. An implementation of this layer based on a Deep Set architecture (Zaheer et al., 2017) is given in Appendix Q.
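The invertibility argument can be illustrated with a short sketch. It assumes a particular, hand-written choice of rotation (about the z-axis) and translation, both computed from symmetric sums over the pairwise distances; the function `rigid_motion` and its coefficients are hypothetical and stand in for the Deep Set network of Appendix Q. Because a rigid motion preserves all inter-electron distances, the same rotation and translation can be recomputed from the output and undone.

```python
import math

def distances(r):
    """All pairwise inter-electron distances of a configuration."""
    return [math.dist(r[i], r[j]) for i in range(len(r)) for j in range(i + 1, len(r))]

def rigid_motion(dists):
    """Hypothetical rotation angle (about the z-axis) and translation, symmetric in the distances."""
    angle = 0.8 * sum(math.tanh(d) for d in dists)
    shift = (0.0, 0.1 * sum(d * d for d in dists), math.tanh(sum(dists)))
    return angle, shift

def rotate_z(p, angle):
    """Rotate a 3D point about the z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])

def forward(r):
    """Apply the same distance-dependent rotation and translation to every electron."""
    angle, shift = rigid_motion(distances(r))
    return [tuple(a + b for a, b in zip(rotate_z(p, angle), shift)) for p in r]

def inverse(y):
    # A rigid motion preserves distances, so the rotation and translation
    # can be recomputed from the output configuration.
    angle, shift = rigid_motion(distances(y))
    return [rotate_z(tuple(a - b for a, b in zip(p, shift)), -angle) for p in y]

r = [(0.1, 0.2, 0.3), (1.0, -0.5, 0.4), (-0.7, 0.9, 0.0)]
y = forward(r)
r_back = inverse(y)
y_perm = forward([r[2], r[0], r[1]])   # the same electrons in a different order
```

The round trip recovers the input, and reordering the electrons reorders the outputs in the same way, since the distances enter only through permutation-invariant sums. The distance itself is continuous but not smooth where two electrons coincide, which is what allows a layer of this kind to represent cusps.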
It is also known that the gradient of the wavefunction should exhibit a discontinuity when an electron and nucleus coincide. The treatment is similar, and is given in Appendix R.
4.3 Induction Across Multiple Molecules
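In an effort to accelerate the ground state computation, we may try to learn the ground state wavefunctions and energies for an entire class of molecules simultaneously, as in (Gao and Günnemann, 2023; Scherbela et al., 2024; Gerard et al., 2024). In particular, the molecular parameters are given by $R$, the nuclear positions; and $Z$, the atomic numbers of each nucleus. Then our goal is to learn a function of the form $\psi_0(x; R, Z)$, i.e. a ground state wavefunction which is explicitly parameterized by the molecular parameters. This entails computing the density $\rho(r; R, Z)$. However, this latter task is made more complicated by the fact that two new invariances are required:
$$\rho(r; \pi R, \pi Z) = \rho(r; R, Z) \quad \text{for all permutations } \pi \text{ of the nuclei} \tag{31}$$
$$\rho(U r + \tau; U R + \tau, Z) = \rho(r; R, Z) \quad \text{for all rotations } U \text{ and translations } \tau \tag{32}$$
where the rotation and translation are applied to every electron position and every nuclear position.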
We henceforth assume that the nuclei have their center of mass at the origin, i.e. ; this removes the need to deal with translations, which generally require special (and uninteresting) treatment, e.g. see (Satorras et al., 2021). Thus, Equation (32) becomes
$$\rho(U r; U R, Z) = \rho(r; R, Z) \quad \text{for all rotations } U \tag{33}$$
We now show that densities satisfying Equations (31) and (33) can be realized via a variation of the continuous normalizing flow we have introduced in Section 3.4:
Theorem 11.
Let . Given a continuous normalizing flow of the form with and . Let the function be invariant with respect to nuclear permutations and equivariant with respect to joint rotations, i.e. for all
(34) |
Furthermore, suppose that the base density is invariant with respect to rotations, for . Then the resulting density satisfies Equations (31) and (33).
Proof: See Appendix S.
First, we note that the base density in Equation (11) can be made invariant to rotations by constructing the relevant Projection DPP from a kernel function , where the functions are derived from taking arbitrary rotationally-invariant functions , and orthogonalizing them with Gram-Schmidt; e.g. one may use Gaussians of varying bandwidths, .
Now, we turn to the construction of . Recall from Theorem 6 that must be -equivariant for all . Furthermore, we have already noted that -equivariant functions may be constructed using a combination of standard pieces: multihead attention, fully connected layers, and linear projections. It would be nice if we were able to use this result while also incorporating the extra conditions in Equation (32). We now show that this is possible:
Theorem 12.
Let be a function which is -equivariant with respect to i.e. for . Let be a function whose output is itself a rotation, i.e. . Let be -invariant with respect to , and -equivariant jointly with respect to and i.e. . Finally, let both and be permutation-invariant jointly with respect to and i.e. and likewise for . Then the function
(35) |
satisfies the properties in Equation (34) and is -equivariant with respect to .
Proof: See Appendix T.
We can use the previously mentioned recipe in Appendix K in order to construct a -equivariant , with an extra path in the network for the dependence, based on either Deep Set or a Transformer architecture with pooling to gain the requisite invariance. The function can be constructed by using an Equivariant Graph Neural Network (Satorras et al., 2021) whose output is a rotation matrix, similar to what is done in (Kaba et al., 2023). More detailed information is contained in Appendix U.
5 Concluding Remarks, Limitations, and Future Work
We have demonstrated a theoretical framework for efficiently solving the Electronic Schrödinger Equation using normalizing flows. Using these flows allows us to sample efficiently from the wavefunction, thereby side-stepping the need for time-consuming MCMC approaches to sampling. The framework’s construction does not easily admit extensions to either diffusion models (Yang et al., 2023) or flow-matching (Lipman et al., 2022), both of which are very powerful and useful techniques. Future work will focus on adapting the framework to accommodate one or both of these methods.
References
- Austin et al. (2012) Brian M Austin, Dmitry Yu Zubarev, and William A Lester Jr. Quantum monte carlo and related approaches. Chemical reviews, 112(1):263–288, 2012.
- Carleo and Troyer (2017) Giuseppe Carleo and Matthias Troyer. Solving the quantum many-body problem with artificial neural networks. Science, 355(6325):602–606, 2017.
- Cassella et al. (2023) Gino Cassella, Halvard Sutterud, Sam Azadi, ND Drummond, David Pfau, James S Spencer, and W Matthew C Foulkes. Discovering quantum phase transitions with fermionic neural networks. Physical Review Letters, 130(3):036401, 2023.
- Ceperley and Alder (1986) David Ceperley and Berni Alder. Quantum monte carlo. Science, 231(4738):555–560, 1986.
- Chen et al. (2018) Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. Advances in neural information processing systems, 31, 2018.
- Deng et al. (2017) Dong-Ling Deng, Xiaopeng Li, and S Das Sarma. Quantum entanglement in neural network states. Physical Review X, 7(2):021021, 2017.
- Entwistle et al. (2023) Michael T Entwistle, Zeno Schätzle, Paolo A Erdman, Jan Hermann, and Frank Noé. Electronic excited states in deep variational monte carlo. Nature Communications, 14(1):274, 2023.
- Foulkes et al. (2001) William MC Foulkes, Lubos Mitas, RJ Needs, and Guna Rajagopal. Quantum monte carlo simulations of solids. Reviews of Modern Physics, 73(1):33, 2001.
- Gao and Günnemann (2023) Nicholas Gao and Stephan Günnemann. Generalizing neural wave functions. In International Conference on Machine Learning, pages 10708–10726. PMLR, 2023.
- Gao and Duan (2017) Xun Gao and Lu-Ming Duan. Efficient representation of quantum many-body states with deep neural networks. Nature communications, 8(1):662, 2017.
- Gautier et al. (2019) Guillaume Gautier, Rémi Bardenet, and Michal Valko. On two ways to use determinantal point processes for monte carlo integration. Advances in Neural Information Processing Systems, 32, 2019.
- Gerard et al. (2022) Leon Gerard, Michael Scherbela, Philipp Marquetand, and Philipp Grohs. Gold-standard solutions to the schrödinger equation using deep learning: How much physics do we need? Advances in Neural Information Processing Systems, 35:10282–10294, 2022.
- Gerard et al. (2024) Leon Gerard, Michael Scherbela, Halvard Sutterud, Matthew Foulkes, and Philipp Grohs. Transferable neural wavefunctions for solids. arXiv preprint arXiv:2405.07599, 2024.
- Gubernatis et al. (2016) James Gubernatis, Naoki Kawashima, and Philipp Werner. Quantum Monte Carlo Methods. Cambridge University Press, 2016.
- Han et al. (2019) Jiequn Han, Linfeng Zhang, and E Weinan. Solving many-electron schrödinger equation using deep neural networks. Journal of Computational Physics, 399:108929, 2019.
- Hermann et al. (2020) Jan Hermann, Zeno Schätzle, and Frank Noé. Deep-neural-network solution of the electronic schrödinger equation. Nature Chemistry, 12(10):891–897, 2020.
- Johansson (2006) Kurt Johansson. Random matrices and determinantal processes. In Les Houches, volume 83, pages 1–56. Elsevier, 2006.
- Kaba et al. (2023) Sékou-Oumar Kaba, Arnab Kumar Mondal, Yan Zhang, Yoshua Bengio, and Siamak Ravanbakhsh. Equivariance with learned canonicalization functions. In International Conference on Machine Learning, pages 15546–15566. PMLR, 2023.
- Köhler et al. (2020) Jonas Köhler, Leon Klein, and Frank Noé. Equivariant flows: exact likelihood generative learning for symmetric densities. In International conference on machine learning, pages 5361–5370. PMLR, 2020.
- Lavancier et al. (2015) Frédéric Lavancier, Jesper Møller, and Ege Rubak. Determinantal point process models and statistical inference. Journal of the Royal Statistical Society Series B: Statistical Methodology, 77(4):853–877, 2015.
- Levine et al. (2019) Yoav Levine, Or Sharir, Nadav Cohen, and Amnon Shashua. Quantum entanglement in deep learning architectures. Physical review letters, 122(6):065301, 2019.
- Li et al. (2022) Xiang Li, Zhe Li, and Ji Chen. Ab initio calculation of real solids via neural network ansatz. Nature Communications, 13(1):7895, 2022.
- Lipman et al. (2022) Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2022.
- Naito et al. (2023) Tomoya Naito, Hisashi Naito, Koji Hashimoto, et al. Multi-body wave function of ground and low-lying excited states using unornamented deep neural networks. Physical Review Research, 5(3):033189, 2023.
- Needs et al. (2009) Richard J Needs, Michael D Towler, Neil D Drummond, and P López Ríos. Continuum variational and diffusion quantum Monte Carlo calculations. Journal of Physics: Condensed Matter, 22(2):023201, 2009.
- Passetti et al. (2023) Giacomo Passetti, Damian Hofmann, Pit Neitemeier, Lukas Grunwald, Michael A Sentef, and Dante M Kennes. Can neural quantum states learn volume-law ground states? Physical Review Letters, 131(3):036502, 2023.
- Pescia et al. (2022) Gabriel Pescia, Jiequn Han, Alessandro Lovato, Jianfeng Lu, and Giuseppe Carleo. Neural-network quantum states for periodic systems in continuous space. Physical Review Research, 4(2):023138, 2022.
- Pfau et al. (2020) David Pfau, James S Spencer, Alexander GDG Matthews, and W Matthew C Foulkes. Ab initio solution of the many-electron Schrödinger equation with deep neural networks. Physical Review Research, 2(3):033429, 2020.
- Pfau et al. (2023) David Pfau, Simon Axelrod, Halvard Sutterud, Ingrid von Glehn, and James S Spencer. Natural quantum Monte Carlo computation of excited states. arXiv preprint arXiv:2308.16848, 2023.
- Ren et al. (2023) Weiluo Ren, Weizhong Fu, Xiaojie Wu, and Ji Chen. Towards the ground state of molecules via diffusion Monte Carlo on neural networks. Nature Communications, 14(1):1860, 2023.
- Saleh et al. (2023) Yahya Saleh, Álvaro Fernández Corral, Armin Iske, Jochen Küpper, and Andrey Yachmenev. Computing excited states of molecules using normalizing flows. arXiv preprint arXiv:2308.16468, 2023.
- Satorras et al. (2021) Víctor Garcia Satorras, Emiel Hoogeboom, and Max Welling. E(n) equivariant graph neural networks. In International conference on machine learning, pages 9323–9332. PMLR, 2021.
- Schätzle et al. (2021) Zeno Schätzle, Jan Hermann, and Frank Noé. Convergence to the fixed-node limit in deep variational Monte Carlo. The Journal of Chemical Physics, 154(12), 2021.
- Scherbela et al. (2024) Michael Scherbela, Leon Gerard, and Philipp Grohs. Towards a transferable fermionic neural wavefunction for molecules. Nature Communications, 15(1):120, 2024.
- Sharir et al. (2022) Or Sharir, Amnon Shashua, and Giuseppe Carleo. Neural tensor contractions and the expressive power of deep neural quantum states. Physical Review B, 106(20):205136, 2022.
- Spencer et al. (2020) James S Spencer, David Pfau, Aleksandar Botev, and W Matthew C Foulkes. Better, faster fermionic neural networks. arXiv preprint arXiv:2011.07125, 2020.
- Thiede et al. (2022) Luca Thiede, Chong Sun, and Alán Aspuru-Guzik. Waveflow: Enforcing boundary conditions in smooth normalizing flows with application to fermionic wave functions. arXiv preprint arXiv:2211.14839, 2022.
- Umrigar et al. (1993) CJ Umrigar, MP Nightingale, and KJ Runge. A diffusion Monte Carlo algorithm with very small time-step errors. The Journal of chemical physics, 99(4):2865–2890, 1993.
- Wilson et al. (2021) Max Wilson, Nicholas Gao, Filip Wudarski, Eleanor Rieffel, and Norm M Tubman. Simulations of state-of-the-art fermionic neural network wave functions with diffusion Monte Carlo. arXiv preprint arXiv:2103.12570, 2021.
- Wilson et al. (2022) Max Wilson, Saverio Moroni, Markus Holzmann, Nicholas Gao, Filip Wudarski, Tejs Vegge, and Arghya Bhowmik. Wave function ansatz (but periodic) networks and the homogeneous electron gas. arXiv preprint arXiv:2202.04622, 2022.
- Yang et al. (2023) Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Wentao Zhang, Bin Cui, and Ming-Hsuan Yang. Diffusion models: A comprehensive survey of methods and applications. ACM Computing Surveys, 56(4):1–39, 2023.
- Zaheer et al. (2017) Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Russ R Salakhutdinov, and Alexander J Smola. Deep sets. Advances in neural information processing systems, 30, 2017.
Appendix A Derivation of Equation (6)
Recall that the local energy is defined as
(36) |
with
(37) |
In this case, one may write
(38) |
where the are sampled from . Note that in the first line, we have used the fact that is a symmetric operator so that the quadratic form is real; in the third line, the fact that is real; and in the sixth line, the fact that is real.
Appendix B Proof of Theorem 1
Theorem.
Let be a probability density function which we can sample from in constant time. Let satisfy two additional properties:
- (D1) is symmetric: for all permutations .
- (D2) if for any .
Finally, let be a complex function which satisfies , and is nearly antisymmetric:
where is an arbitrary value with . Then satisfies (W1)-(W4) if and only if can be written as with and satisfying the above-stated properties.
Proof.
Suppose that , let us prove each of properties (W1)-(W4).
(W1) The functional form for is just , which we know explicitly.
(W2) Antisymmetry of : we break down by cases. Suppose that is such that . Then:
(39) |
where in the second line we have used the two facts that is antisymmetric and is symmetric. Now, suppose that for some :
(40) |
where in the third line, we have used (D2). But this is precisely what is required for an antisymmetric function: if is antisymmetric and for some , then , where is the permutation which flips and , so that
(41) |
where we have used the fact that , since only one flip is required.
(W3) We can sample in constant time from by assumption.
(W4) is normalized:
(42) |
since is a probability density function.
Thus, we have proved the forward direction.
Now, let us assume properties (W1)-(W4). We can always express a complex number in terms of a magnitude and a phase; in particular, we may write , where is a real number, and . (W1) tells us that we have an explicit form for the complex-valued function ; thus, we know that
(43) |
with , where we have used the fact that . Note that . As , and
(44) |
by (W4), then is a density. Furthermore, by (W3) may be sampled in constant time. Finally, by (W2), is antisymmetric; thus,
(45) |
In cases where and , then we have that
(46) |
where the third line follows from the fact that since = 1. However, note that and is equivalent to for some . Thus, we have established (D2). Furthermore, in such cases we can take , since it plays no role. In all other cases, i.e. where , we have that
(47) |
where the second line holds since this must hold true for all and all relevant . This establishes the remainder of the nearly antisymmetric character of . Finally,
(48) |
since . This shows that is symmetric for the case . Indeed is symmetric for all , including those for which for some , as in the latter case we have shown that . This establishes (D1) and completes the proof. ∎
Appendix C Proof of Theorem 2
Theorem.
Given a configuration , let a permutation which maps the spin vector to the canonical spin vector be given by , i.e. . Let be a density function on electron positions (i.e. no spins) satisfying
- (R1) is -invariant:
- (R2)
A density satisfies conditions (D1)-(D2) in Theorem 1 if and only if it may be written as for a density satisfying conditions (R1) and (R2).
Proof.
Let us prove the forward direction: assume a density satisfying conditions (D1) and (D2), and we will show that it must be written as for satisfying conditions (R1) and (R2). Let and . In this case, we have that
(49) |
where the first equality comes from our requirement that satisfy condition (D1), and the second equality from the definition of . As a result, it is sufficient for us to focus on constructing a density , i.e. a density where the spins are in canonical order. As is fixed as the canonical ordering, we may suppress it, writing for a function . Now, must satisfy condition (D1); however, the only permutations that are relevant are those that preserve the canonical spin ordering . More specifically, the relevant permutations are those for which ; it is easy to see that those permutations form the group . Thus, we must have that
(50) |
That is, is -invariant, which is condition (R1).
Now let us turn to condition (D2), which states that if for any . The requirement implies that both and ; and the condition is equivalent to or . Thus, condition (D2) is equivalent to
(51) |
which is simply condition (R2).
Thus, we have proven conditions (R1) and (R2) must hold. Finally, using Equation (49) and the definitions of and , we have that
(52) |
which completes the proof of the forward direction.
Now, let us prove the reverse direction: assume a density satisfying conditions (R1) and (R2), and we will show that satisfies conditions (D1) and (D2). Let us begin by computing , which will prove useful in what follows. is defined by . However, we also know that ; setting these equal gives
(53) |
where is some permutation that leaves unchanged, i.e. such that . Thus
(54) |
However, we know that so that . Using the fact that gives
(55) |
where is some permutation that leaves unchanged; which precisely implies that . Rearranging gives
(56) |
Now, for any permutation , we have that
(57) |
where in the second-to-last equality, we have used the fact that , and that is -invariant by (R1). Thus, we have established property (D1), i.e. that is symmetric.
Now, turning to condition (D2), let us consider an such that for a particular . Continuing to use the notation , this implies that for either or . Thus, by condition (R2), we have that . But then
(58) |
which is precisely condition (D2); this completes the proof. ∎
Appendix D Proof of Theorem 3
Theorem.
Suppose that we have a normalizing flow, whose base density satisfies properties (R1) and (R2) from Theorem 2, and whose transformation is -equivariant. Then the density resulting from the normalizing flow will satisfy properties (R1) and (R2).
Proof.
Let us begin by proving that the density resulting from the normalizing flow will satisfy condition (R1). Theorem 1 in (Köhler et al., 2020) states the following: “Let be a density on which is -invariant and . If is an -equivariant diffeomorphism, then , the push-forward of along , is -invariant.” In our instance, we may take , and thereby have established that the density resulting from the normalizing flow is -invariant, thus satisfying condition (R1).
We now turn to proving that the density resulting from the normalizing flow will satisfy condition (R2). Suppose that the random variable for the base density is given by , with density ; and the normalizing flow is given by transformation , i.e. . Then by the change of variables formula, we know that the density of is given by . Now, we are interested in the case when for (we may equally consider the case of , they are identical). Let be the permutation whose only action is to flip the coordinates of electrons and . Given that , then by definition we have that . In this case, we have that
(59) |
where the latter equality is due to the -equivariance of , which follows straightforwardly from the -equivariance of . Rearranging the above, we have that
(60) |
where the second equality is due to the fact that , as simply flips electrons and . However, , so combining Equations (59) and (60) gives . Plugging this into the equation for the change of variables gives
(61) |
But we know that is such that , which means that ; and for such ’s, we know that , by the assumption of condition (R2) for the base density. Thus, plugging back into Equation (61) gives
(62) |
as desired. ∎
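As a numerical sanity check of the two claims, the following minimal sketch uses a one-dimensional toy system of same-spin particles. This is an assumption made purely for illustration; the setting of the theorem is three-dimensional with two spin groups. The names `base_density`, `flow`, and `pushforward_density`, as well as the coupling strength 0.1, are hypothetical choices. They are picked so that the base density is permutation-invariant and vanishes on coincidences, and so that the map is a permutation-equivariant diffeomorphism whose Jacobian determinant is available in closed form.

```python
import numpy as np

def base_density(z):
    """Unnormalized toy base density: permutation-invariant, zero if z_i == z_j."""
    i, j = np.triu_indices(len(z), k=1)
    return np.prod((z[i] - z[j]) ** 2) * np.exp(-np.sum(z ** 2))

def flow(z):
    """Equivariant diffeomorphism: all particles share one shift that depends on the mean."""
    return z + 0.1 * np.tanh(np.mean(z))

def flow_jacobian_det(z):
    """Jacobian is I + (c/n) 1 1^T with c = 0.1 sech^2(mean z), so det = 1 + c."""
    return 1.0 + 0.1 / np.cosh(np.mean(z)) ** 2

def pushforward_density(z):
    """Density of x = flow(z), evaluated at x, by the change of variables formula."""
    return base_density(z) / abs(flow_jacobian_det(z))

z = np.array([0.3, -1.2, 0.8, 2.0])
perm = np.array([2, 0, 3, 1])
z_coincide = np.array([0.5, 0.5, -1.0, 2.0])   # particles 0 and 1 coincide

# Equivariance of the map, then the analogue of (R1), then the analogue of (R2).
print(np.allclose(flow(z[perm]), flow(z)[perm]))
print(np.isclose(pushforward_density(z[perm]), pushforward_density(z)))
print(flow(z_coincide)[0] == flow(z_coincide)[1], pushforward_density(z_coincide))
```

Because the map is equivariant, coincident inputs are mapped to coincident outputs, which is the mechanism the proof uses to transfer (R2) from the base density to the flow density.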
Appendix E Proof of Theorem 4
Theorem.
Let . Then satisfies conditions (R1) and (R2) from Theorem 2.
Proof.
Let us begin with property (R1): we would like to prove that is -invariant. Let ; as , we may write the permutation , where is a permutation which applies to the indices in , and similarly for and the indices in . Thus,
(63) |
Now, recall that the Projection DPP’s density is defined by
(64) |
Thus, we must compute for a permutation . We may represent the action of on a vector of length by an matrix . It is then straightforward to see that
(65) |
and thus that
(66) |
where the second equality uses the cyclic property of the determinant; the third equality that a determinant of products is the product of determinants; and the fourth equality that the determinant of a permutation matrix is . Thus, we have that
(67) |
Likewise, . This gives finally that , establishing that is -invariant, i.e. satisfies condition (R1).
We now turn to condition (R2): we would like to prove that . Let us focus on the case of spin-up electrons, i.e. ; the spin-down case will follow analogously. We know that
(68) |
Given the definition of the matrix , it is straightforward to see that if , then has identical columns for and . However, a matrix with two identical columns is rank deficient, and therefore has determinant . Thus, we have that so that , establishing that satisfies condition (R2). ∎
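The two determinant arguments above can be checked numerically. The sketch below is illustrative only: it treats a single spin group, and it uses Gaussian orbitals with random centres (the names `kernel_matrix`, `dpp_density`, and `centers` are hypothetical). These orbitals are not orthonormal, so the density is left unnormalized; only the symmetry and the vanishing on coincidences are examined.

```python
import numpy as np

def kernel_matrix(x, centers):
    """K[i, j] = sum_k phi_k(x_i) phi_k(x_j), with Gaussian orbitals phi_k."""
    sq_dist = np.sum((centers[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    phi = np.exp(-0.5 * sq_dist)          # phi[k, i] = phi_k(x_i)
    return phi.T @ phi

def dpp_density(x, centers):
    """Unnormalized determinantal density det[K(x_i, x_j)]."""
    return np.linalg.det(kernel_matrix(x, centers))

rng = np.random.default_rng(0)
n = 3
centers = rng.normal(size=(n, 3))         # one orbital per electron
x = rng.normal(size=(n, 3))               # electron positions
perm = np.array([1, 2, 0])

x_coincide = x.copy()
x_coincide[1] = x_coincide[0]             # two electrons at the same position

print(dpp_density(x, centers), dpp_density(x[perm], centers))
print(dpp_density(x_coincide, centers))   # zero up to rounding error
```

Permuting the electrons permutes rows and columns of the kernel matrix simultaneously, which leaves the determinant unchanged, while coincident electrons produce two identical columns and hence a singular matrix.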
Appendix F Sampling Procedure for Projection DPPs
In order to sample from a Projection DPP, we may follow the procedure outlined in (Lavancier et al., 2015), which we reproduce in Algorithm 1. We note that the speed of the sampling algorithm is largely unimportant, as one may sample as many samples as one would like offline, prior to (and independent from) the process of minimizing the variational objective.
In order to sample from , one may use rejection sampling; for further details, see (Lavancier et al., 2015). Note that the algorithm can be generalized in a straightforward fashion to a complex orthonormal basis by replacing all transposes with Hermitian transposes.
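To convey the structure of sequential Projection DPP sampling, the following sketch treats a finite ground set rather than continuous space. This is a simplifying assumption: on a finite set each conditional distribution can be sampled directly, so no rejection step is needed, and the code should not be read as the continuous algorithm of Lavancier et al. (2015). The function names and the random orthonormal basis are hypothetical.

```python
import numpy as np

def random_projection_basis(n_sites, n_points, rng):
    """Orthonormal columns spanning a random n_points-dimensional subspace."""
    q, _ = np.linalg.qr(rng.normal(size=(n_sites, n_points)))
    return q

def sample_projection_dpp(V, rng):
    """Draw one sample from the finite Projection DPP with kernel K = V V^T."""
    V = V.copy()
    items = []
    for _ in range(V.shape[1]):
        # Conditional distribution of the next point: normalized squared row norms.
        probs = np.sum(V ** 2, axis=1)
        probs = probs / probs.sum()
        j = int(rng.choice(len(probs), p=probs))
        items.append(j)
        # Restrict the subspace to vectors that vanish at site j.
        k = np.argmax(np.abs(V[j]))
        pivot = V[:, k].copy()
        V = np.delete(V, k, axis=1)
        V = V - np.outer(pivot, V[j] / pivot[j])
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)        # re-orthonormalize the remaining basis
    return items

rng = np.random.default_rng(1)
V = random_projection_basis(n_sites=20, n_points=4, rng=rng)
sample = sample_projection_dpp(V, rng)
print(sample)
```

Each draw removes one dimension from the subspace, so exactly as many points are produced as there are basis functions, and a site that has already been selected has vanishing probability of being selected again.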
Appendix G Proof of Theorem 5
Theorem.
The transformation is -equivariant if and only if
That is, is equivariant with respect to , and invariant with respect to .
Proof.
Let us begin with the forward direction: suppose that is -equivariant. Let ; as , we may write the permutation , where is a permutation which applies to the indices in , and similarly for and the indices in . Then -equivariance of implies
(69) |
Now, let us break this down by spin. Note that
(70) |
and also
(71) |
But Equation (69) says that , so we may combine the last two equations to give
(72) |
In words, is equivariant with respect to , and invariant with respect to ; and the reverse is true for . For notational convenience, we use to denote the spin, and the complement of the spin is given by (i.e. if then ). In this case, we may summarize Equation (72) as
(73) |
which completes the proof for the forward direction.
Now, suppose that . For a given permutation , we have
(74) |
so that is -equivariant, as desired. This completes the proof for the reverse direction. ∎
Appendix H Proof of Theorem 6
Theorem.
Let the transformation be specified by the ODE
Then is -equivariant if is -equivariant.
Proof.
The result follows directly from Theorem 2 in (Köhler et al., 2020). ∎
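The statement can be illustrated with a discretized flow. In the sketch below, the vector field `velocity` is a hypothetical toy field that is equivariant to permutations within each spin group, and the ODE is integrated with explicit Euler steps (our choice for brevity; the theorem concerns the exact solution). Since every Euler step is equivariant, so is their composition.

```python
import numpy as np

def velocity(x, spins, t):
    """Toy equivariant field: same-spin attraction plus a drift set by the other spin group."""
    v = np.zeros_like(x)
    for s in (0, 1):
        own_mean = x[spins == s].mean(axis=0)
        other_mean = x[spins != s].mean(axis=0)
        v[spins == s] = np.tanh(own_mean - x[spins == s]) + 0.5 * np.cos(t) * other_mean
    return v

def integrate(x0, spins, n_steps=100, t_end=1.0):
    """Explicit Euler integration of dx/dt = velocity(x, t) from t = 0 to t_end."""
    x = x0.copy()
    dt = t_end / n_steps
    for k in range(n_steps):
        x = x + dt * velocity(x, spins, k * dt)
    return x

rng = np.random.default_rng(0)
spins = np.array([0, 0, 0, 1, 1])         # 0 = spin-up, 1 = spin-down
x0 = rng.normal(size=(5, 3))
perm = np.array([2, 0, 1, 4, 3])          # permutes electrons within each spin group

print(np.allclose(integrate(x0[perm], spins), integrate(x0, spins)[perm]))
```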
Appendix I The Complexity of Discrete Normalizing Flows
The limiting factor in the complexity of the discrete normalizing flow is the computation of determinants; unlike traditional normalizing flows, we make no effort to accelerate the determinant of the Jacobian, which allows us to have more expressive layers. In particular, the relevant space is of dimension , where and is on the order of tens of electrons for small molecules. Thus, the overall dimension of the space is in the low hundreds.
We note that the determinant of the Jacobian is cubic in the dimension; for a low-dimensional space this is acceptable. Furthermore, popular methods based on neural networks, such as FermiNet (Pfau et al., 2020) and PauliNet (Hermann et al., 2020) use determinants explicitly in their ansätze, so that they have similar complexity. However, these methods use Markov Chain Monte Carlo sampling, so that they incur extra overhead from having to sample by solving for the limit of a stochastic differential equation, which our method avoids.
Appendix J Proof of Theorem 7
Theorem.
Let be a Split Subspace Layer, as given by
Then is invertible. In particular, let ; then the inverse of the layer is given by
Furthermore, the layer is -equivariant if
i.e. if is equivariant with respect to permutations on and invariant with respect to permutations on .
Proof.
Let us first prove the layer’s inverse. First, note that can be computed entirely from variables in layer . Also note that , since - i.e. uses , while uses . Now, we show that
where the equality in the last line follows from the fact that is the orthogonal complement of , so that . That is, the Split Subspace Layer has the nice property that it preserves projections onto the subspace given by .
Given this, the inverse is straightforwardly computed by rearranging Equation (18):
(75) |
Note that everything on the right-hand side depends on variables from layer , as desired. Thus, we have shown the layer in Equation (18) is invertible regardless of the form of the network .
Now let us turn to proving the layer’s -equivariance. Recall that the conditions for -equivariance are given by Equation (14), which we can combine with Equation (18):
(76) |
where indicates the index that electron is moved to under the permutation ; and the fourth line follows from the fact that the previous statement must be true for all possible outputs of . This completes the proof. ∎
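The invertibility argument can be made concrete under an assumed form of the layer. The sketch below supposes that the layer adds, to each electron, a vector lying in the orthogonal complement subspace, computed from the projections of the electrons onto the retained subspace. This additive form, the names `W`, `W_perp`, and `gamma`, and the use of a QR factorization in place of Gram-Schmidt are all assumptions made for illustration; the placeholder network is not spin-aware, as only invertibility is examined here.

```python
import numpy as np

def make_subspaces(d=3, d_w=1, seed=0):
    """Orthonormal basis W of a subspace and W_perp of its orthogonal complement."""
    q, _ = np.linalg.qr(np.random.default_rng(seed).normal(size=(d, d)))
    return q[:, :d_w], q[:, d_w:]

def make_gamma(d_w, d_perp, seed=1):
    """Placeholder network: an arbitrary smooth map from projections to complement coordinates."""
    C = np.random.default_rng(seed).normal(size=(d_w, d_perp))
    def gamma(p):
        # Uses each electron's own projection and the mean over all electrons.
        return np.tanh(p @ C + p.mean())
    return gamma

def forward(x, W, W_perp, gamma):
    p = x @ W                              # projections onto span(W)
    return x + gamma(p) @ W_perp.T         # update lives entirely in the complement

def inverse(y, W, W_perp, gamma):
    p = y @ W                              # unchanged by the layer, so computable from y
    return y - gamma(p) @ W_perp.T

W, W_perp = make_subspaces()
gamma = make_gamma(W.shape[1], W_perp.shape[1])
x = np.random.default_rng(2).normal(size=(4, 3))
y = forward(x, W, W_perp, gamma)

print(np.allclose(y @ W, x @ W))                        # projections are preserved
print(np.allclose(inverse(y, W, W_perp, gamma), x))     # exact inverse, any gamma
```

The inverse never requires inverting the network itself, which is why no restriction on its form is needed.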
Appendix K Implementation of the -Equivariant Layer
As we have seen, invertibility places no special restrictions on the form of . With regard to the conditions imposed by -equivariance in Equation (20), there are several ways to achieve them. We propose the following method, as it uses standard off-the-shelf architectures; we use the variables to represent intermediate quantities.
1. Lifting: Map each value from dimension to dimension :
(77) |
where there are two matrices of dimension , one for each spin .
2. Multihead Attention: We have two Multihead Attention (MHA) layers , one for each spin. Each MHA takes as input the list . The output of the MHA is then
(78) |
3. Fully Connected Layer Per Spin: There are two fully connected layers , one for each spin. The layer is applied per electron, with the same layer being applied to electrons of a given spin:
(79) |
4. Average: Form the average values: .
5. Fully Connected Layer with Spin Mixing: We have two fully connected layers , one for each spin. Then:
(80) |
The output of the MLPs is of dimension .
Due to the permutation-equivariance of Multihead Attention, the -equivariance follows naturally. Some comments are in order:
- We can choose . Since in our case , this gives us exactly two choices: or .
- The fully connected layers should use smooth activation functions, i.e. not ReLU. There are many possible smooth substitutes for ReLU-like activations, such as Swish (also known as SiLU), GELU, or softplus.
- To achieve orthogonalization, i.e. to ensure that is itself orthonormal and is also orthogonal to , it is important to use a smooth procedure. Gram-Schmidt may be employed for this purpose: an initial (e.g. random) set of vectors are chosen, which are then orthonormalized by the procedure.
- In the special case of Helium, there are only 2 electrons: one which is spin-up, and the other which is spin-down. In this case, the requirement that be equivariant with respect to permutations of is trivially satisfied; likewise, the requirement that be invariant with respect to permutations of is also trivially satisfied. As a result, the Multihead Attention layers may be replaced by the identity, with everything else remaining the same.
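The five steps can be sketched in a few lines of NumPy. The code below is a simplified illustration rather than a reference implementation: a single attention head stands in for Multihead Attention, each fully connected layer is a single linear map, and the exact inputs to the spin-mixing layer (here, the electron's own features concatenated with the average over the opposite spin) are an assumption. All names and sizes are hypothetical.

```python
import numpy as np

D_IN, D_H, D_OUT = 1, 8, 2     # illustrative input, hidden, and output dimensions

def init_params(seed=0):
    rng = np.random.default_rng(seed)
    shapes = {"lift": (D_IN, D_H), "wq": (D_H, D_H), "wk": (D_H, D_H),
              "wv": (D_H, D_H), "fc": (D_H, D_H), "mix": (2 * D_H, D_OUT)}
    # One independent set of weights per spin (0 = up, 1 = down).
    return {s: {k: 0.5 * rng.normal(size=v) for k, v in shapes.items()} for s in (0, 1)}

def swish(a):
    return a / (1.0 + np.exp(-a))          # smooth activation

def self_attention(h, wq, wk, wv):
    """Single-head self-attention; permutation-equivariant over the rows of h."""
    scores = (h @ wq) @ (h @ wk).T / np.sqrt(h.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ (h @ wv)

def gamma(p, spins, params):
    feats = {}
    for s in (0, 1):
        q = params[s]
        h = p[spins == s] @ q["lift"]                        # 1. lifting
        h = self_attention(h, q["wq"], q["wk"], q["wv"])     # 2. attention within one spin
        feats[s] = swish(h @ q["fc"])                        # 3. per-electron layer
    avg = {s: feats[s].mean(axis=0) for s in (0, 1)}         # 4. average per spin
    out = np.zeros((len(spins), D_OUT))
    for s in (0, 1):                                         # 5. spin mixing
        other = np.broadcast_to(avg[1 - s], feats[s].shape)
        out[spins == s] = np.concatenate([feats[s], other], axis=1) @ params[s]["mix"]
    return out

params = init_params()
spins = np.array([0, 0, 0, 1, 1])
p = np.random.default_rng(1).normal(size=(5, D_IN))
perm = np.array([2, 0, 1, 4, 3])           # permutation within each spin group
print(np.allclose(gamma(p[perm], spins, params), gamma(p, spins, params)[perm]))
```

Electrons of the opposite spin enter only through an average, which is what makes the output invariant to their ordering, while attention and per-electron layers preserve equivariance within a spin group.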
Appendix L A Generalized Variant of the Split Subspace Layer
We note that a generalized variant of the Split Subspace Layer is as follows:
(81) |
where both and satisfy the conditions in (20), and is explicitly invertible in the sense that the system of equations for all may be inverted to solve for all values of . An example of such an is given by for matrices and an invertible nonlinearity (such as the cube of each element).
Appendix M Proof of Theorem 8
Theorem.
The local energy can be written as
In particular, the local energy is independent of the phase . Furthermore, let
Then the gradient of the loss function may be written as
with samples .
Proof.
For the moment, we suppress for convenience. Recall that the overall (i.e. complex) local energy is defined by
(82) |
Let be the component of the position vector of the electron, . Then plugging in , we have that
(83) |
and
(84) |
With the appropriate summation, this immediately yields
(85) |
so that its real part simplifies to
(86) |
Now, it is known that since the Hamiltonian is time-reversal invariant and Hermitian, both its eigenvalues and its eigenfunctions are real. Since the ground-state wavefunction we are looking for is real, the phase can be taken to belong to the two element set , where corresponds to positive values of the wavefunction , and to negative values of . Thus, where the sign of does not change is constant, and therefore .
We are then left to consider the case when the sign of flips, and therefore there is a discontinuity in ; this occurs precisely where . However, recall from Equation (24)
(87) |
When then ; thus, samples where there is a discontinuity are never selected. We may therefore set the local energy at such values of to any value we wish, without affecting the value of . In particular, we are free to set at such points. In conclusion, then, we have demonstrated that
(88) |
which is independent of the phase .
Turning to the second part of the theorem, we note that
(89) |
so that
(90) |
However, since , then , or . Plugging this in gives
(91) |
where . ∎
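For a real wavefunction whose squared magnitude is the density, the kinetic part of the local energy can be expressed through the logarithm of the density alone, using the identity that the Laplacian of the wavefunction divided by the wavefunction equals one half of the Laplacian of the log-density plus one quarter of the squared norm of its gradient. The sketch below evaluates the resulting expression by central finite differences and checks it on the hydrogen atom in atomic units, where the exact ground state has a constant local energy of -0.5. Whether this arrangement of terms matches the theorem statement exactly is an assumption; the function names and the finite-difference step are illustrative.

```python
import numpy as np

def local_energy(log_rho, potential, x, h=1e-3):
    """E_loc = -1/4 lap(log rho) - 1/8 |grad log rho|^2 + V, via central differences."""
    x = np.asarray(x, dtype=float)
    f0 = log_rho(x)
    grad_sq, lap = 0.0, 0.0
    for k in range(x.size):
        e = np.zeros_like(x)
        e.flat[k] = h
        f_plus, f_minus = log_rho(x + e), log_rho(x - e)
        grad_sq += ((f_plus - f_minus) / (2 * h)) ** 2
        lap += (f_plus - 2 * f0 + f_minus) / h ** 2
    return -0.25 * lap - 0.125 * grad_sq + potential(x)

def hydrogen_log_rho(x):
    """Log-density of the hydrogen ground state, up to an additive constant."""
    return -2.0 * np.linalg.norm(x)

def hydrogen_potential(x):
    """Coulomb attraction to a nucleus at the origin (atomic units)."""
    return -1.0 / np.linalg.norm(x)

for point in ([0.3, -0.4, 0.5], [1.0, 2.0, -0.5]):
    print(local_energy(hydrogen_log_rho, hydrogen_potential, point))
```

Only the log-density appears, so no phase information is required to evaluate the energy at points where the density is positive. In practice one would replace the finite differences by automatic differentiation.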
Appendix N Optimization of the Objective Function
In order to optimize the objective in Equation (24), we use the procedure in Algorithm 2, which is specified for the discrete normalizing flow; the procedure for the continuous normalizing flow will be similar. Note that we initially sample a large number of samples from the base density; we emphasize that this step can be performed entirely offline, and does not entail additional computational complexity.
Appendix O Proof of Theorem 9
Theorem.
Let be the density for the ground state wavefunction. Let be a strict total order on , and define the set
For any without , define the permutation by . Then a valid antisymmetric ground state wavefunction is given by
Proof.
We begin by noting that the set contains the spin-up electrons in ascending order, according to the ordering relation , and the spin-down electrons also in ascending order. Now, begin by considering the case of for which for some pair of electrons and ; in this case, , as is required by antisymmetry. Now, consider the case of for which . In this case, for any permutation we have that
(92) |
However, recall that is defined by
(93) |
Therefore, is defined by
(94) |
Comparing the latter two equations, we see that
(95) |
Furthermore, we know that as is the density for the ground state wavefunction, it must satisfy property (D1) of Theorem 2, namely it must be -invariant; therefore, we must have that
(96) |
Plugging Equations (95) and (96) into (92) gives
(97) |
where in the second line, we have used the facts that ; and that . But Equation (97) is exactly the antisymmetry property we desire, and so we have completed the proof.
Finally, we note that for ; this is an arbitrary choice, and we could have equally well defined a second ground state wavefunction with for . It is easy to see that in this case, for all . However, this is not surprising: either or may be taken as an eigenfunction of , as eigenfunctions are only defined up to sign. ∎
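The construction can be sketched as follows. We assume, for illustration, that the strict total order is the lexicographic order on position coordinates, that the canonical arrangement places spin-up electrons first, and that the wavefunction is the sign of the sorting permutation times the square root of the density. The density `toy_density` is a hypothetical symmetric function and is left unnormalized.

```python
import numpy as np

def sorting_permutation(x, spins):
    """Indices placing spin-up (0) before spin-down (1), each in lexicographic order."""
    # np.lexsort uses the LAST key as the primary key.
    return np.lexsort((x[:, 2], x[:, 1], x[:, 0], spins))

def permutation_sign(perm):
    """Parity of a permutation from its cycle decomposition."""
    sign, seen = 1, np.zeros(len(perm), dtype=bool)
    for start in range(len(perm)):
        length, j = 0, start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length > 0 and length % 2 == 0:
            sign = -sign                   # every even-length cycle is an odd permutation
    return sign

def toy_density(x, spins):
    """Symmetric toy density that vanishes when two same-spin electrons coincide."""
    value = np.exp(-np.sum(x ** 2))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if spins[i] == spins[j]:
                value *= np.sum((x[i] - x[j]) ** 2)
    return value

def wavefunction(x, spins, density):
    return permutation_sign(sorting_permutation(x, spins)) * np.sqrt(density(x, spins))

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
spins = np.array([0, 1, 0, 1, 0])
swap = np.array([3, 1, 2, 0, 4])           # exchange electrons 0 and 3

print(wavefunction(x, spins, toy_density), wavefunction(x[swap], spins[swap], toy_density))
```

Exchanging two electrons composes the sorting permutation with a transposition, which flips its parity while leaving the symmetric density unchanged.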
Appendix P Proof of Theorem 10
Theorem.
Let the set of distances be given by where . Given a layer of the form
Then the layer is both -equivariant as well as invertible.
Proof.
Let us begin with invertibility. We may compute the inter-electron distances at layer :
(98) |
where the third line holds since . That is, since we are rotating and translating all of the electrons with the same rotation matrix and translation vector the inter-electron distances are preserved. As a result, the inverse is simply
(99) |
where we have used the fact that for a rotation matrix, . Note that all of the arguments on the right-hand side of the equation depend only on quantities from layer , as desired.
Having established invertibility, let us turn to -equivariance. Let , and denote the layer by , so that . Note that since is the set of distances, we have that : a set is inherently unordered, and therefore is unaffected by permutations. Then we have that
(100) |
so that , as desired. ∎
Appendix Q Implementation of the Electron-Electron Cusp Layer
Recall from Equation (30) that the network must be a function of the set of inter-electron distances . Using multihead attention will be inefficient, as we must apply it to all pairs of electrons, leading to quartic complexity. Instead, we propose the following Deep Set (Zaheer et al., 2017) style layer:
1. MLP Per Electron Pair: Apply the same Multilayer Perceptron to each electron pair individually:
(101) |
2. Average: Form the average value: .
3. Overall MLP: Apply a Multilayer Perceptron to the average:
(102) |
The output should be of dimension , which is equal to when .
4. Split into Rotation and Translation:
(103) |
Notes:
- The reason we parameterize the rotation as an exponential of a skew-symmetric matrix is so that the layer can effectively be a residual-style layer: if we choose and , then we recover . (This is harder if we use a rotation matrix directly, as the identity transformation is only recovered if , which is harder to achieve.)
- It is proposed to use one such layer, or a very small number of such layers, somewhere near the beginning of the flow. The work of incorporating the cusps in the appropriate manner can then be performed by subsequent layers.
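A compact sketch of such a layer in three dimensions is given below. Each Multilayer Perceptron is reduced to a single small map, the weights in `params` are hypothetical, and the matrix exponential of the skew-symmetric generator is evaluated with Rodrigues' formula, which is our choice of implementation. The layer applies one rotation and one translation to all electrons, both computed from the inter-electron distances.

```python
import numpy as np

def pair_distances(x):
    i, j = np.triu_indices(len(x), k=1)
    return np.linalg.norm(x[i] - x[j], axis=1)

def rotation_from_vector(a):
    """Matrix exponential of the skew-symmetric matrix built from a (Rodrigues' formula)."""
    A = np.array([[0.0, -a[2], a[1]], [a[2], 0.0, -a[0]], [-a[1], a[0], 0.0]])
    theta = np.linalg.norm(a)
    if theta < 1e-8:
        return np.eye(3) + A               # first-order expansion near the identity
    return np.eye(3) + np.sin(theta) / theta * A + (1 - np.cos(theta)) / theta ** 2 * (A @ A)

def deep_set(dists, params):
    w1, w2 = params
    h = np.tanh(dists[:, None] * w1[None, :])   # 1. same map applied to every pair distance
    pooled = h.mean(axis=0)                     # 2. average over pairs
    out = pooled @ w2                           # 3. overall map to six numbers
    return out[:3], out[3:]                     # 4. rotation generator and translation

def forward(x, params):
    a, t = deep_set(pair_distances(x), params)
    return x @ rotation_from_vector(a).T + t

def inverse(y, params):
    a, t = deep_set(pair_distances(y), params)  # distances are unchanged by the layer
    return (y - t) @ rotation_from_vector(a)    # uses R^{-1} = R^T

rng = np.random.default_rng(0)
params = (rng.normal(size=8), 0.3 * rng.normal(size=(8, 6)))
x = rng.normal(size=(4, 3))
y = forward(x, params)
print(np.allclose(inverse(y, params), x), np.allclose(pair_distances(y), pair_distances(x)))
```

Setting the final weights to zero gives a zero generator and a zero translation, so the layer reduces to the identity, which is the residual-style behaviour described in the notes above.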
Appendix R Electron-Nuclear Cusps
It is also known that the gradient of the wavefunction should exhibit a discontinuity when an electron and nucleus coincide. As in the case of electron-electron cusps, we may treat this by incorporating the electron-nuclear distances directly; we may design our layer exactly analogously to the electron-electron cusp layer, with one main caveat: to preserve invertibility, we can only deal with a single nucleus at a time. In particular, for a given nucleus with position , let with . Then the layer looks like
(104) |
Note in the above that only the rotation matrix is parameterized, and the translation vector is fixed. We must include one such layer for each nucleus .
Appendix S Proof of Theorem 11
Theorem.
Let . Given a continuous normalizing flow of the form with and . Let the function be invariant with respect to nuclear permutations and equivariant with respect to joint rotations, i.e. for all
Furthermore, suppose that the base density is invariant with respect to rotations, for . Then the resulting density satisfies Equations (31) and (33).
Proof.
Let us first consider permutation invariance, i.e. Equation (31). Let be produced by solving the flow
(105) |
Consider a permutation on the nuclei, and let be the resulting electronic positions. Then is produced by solving the flow
(106) |
However, we know that . Thus, is given by
(107) |
which is precisely equivalent to the equation for ; thus , i.e. the random variables representing the electronic positions are identical in both cases. Thus, their distributions must be equal: , so Equation (31) is established.
Let us now turn to joint rotation invariance, i.e. Equation (33). As we know that satisfies rotation equivariance, i.e. , we may apply Theorems 1 and 2 from (Köhler et al., 2020) (noting that is irrelevant for the flow, which is entirely in ). This yields immediately that , so Equation (33) is established. ∎
Appendix T Proof of Theorem 12
Theorem.
Let be a function which is -equivariant with respect to i.e. for . Let be a function whose output is itself a rotation, i.e. . Let be -invariant with respect to , and -equivariant jointly with respect to and i.e. . Finally, let both and be permutation-invariant jointly with respect to and i.e. and likewise for . Then the function
satisfies the properties in Equation (34) and is -equivariant with respect to .
Proof.
Let us begin with the first condition in Equation (34), namely we wish to show that . Use tildes to denote the variables after the permutation has been applied. Thus,
(108) |
where we have used the fact that is permutation-invariant jointly with respect to and . Then
(109) |
where in the second line we have used the fact that ; in the third line, the fact that the operation of applying an identical rotation to a list of vectors commutes with a permutation applied to that list of vectors; and in the fourth line, the fact that is permutation-invariant jointly with respect to and . We have thus established the first condition in Equation (34).
Now let us turn to the second condition in Equation (34), that is we need to show that . We have that
(110) |
where we have used the fact that is -equivariant jointly with respect to and . Then
(111) |
as desired.
Finally, let us turn to demonstrating the -equivariance of with respect to . Let ; then we have that
(112) |
where we have used the fact that is -invariant with respect to . Then
(113) |
where in the second line we have used the fact that ; in the third and fifth lines, the fact that the operation of applying an identical rotation to a list of vectors commutes with a permutation applied to that list of vectors; and in the fourth line, the fact that is -equivariant with respect to . This completes the proof. ∎
Appendix U Implementation of Continuous Normalizing Flow for Multiple Molecules
We must implement both networks mentioned in Theorem 12: the functions and . The function is -equivariant, so that we may use the general recipe described in Appendix K; however, it has the additional properties that it depends on both and , and must be permutation-invariant jointly with respect to these two variables. Therefore, the following minor modification may be made to the recipe described in Appendix K (noting that the notation changes slightly as we no longer have layers - the flow is continuous; and that we replace the variables with ). We compute a Deep Set (Zaheer et al., 2017) function on , i.e. on the inputs ; the output of this function is permutation-invariant by construction. This output is then fed into the Fully Connected Layer with Spin Mixing as an extra input. An alternative to the Deep Set approach is to apply a transformer to , where each token is the pair , and then apply an averaging step at the end; this will also produce a permutation-invariant function.
In order to implement the function , recall that its output is a rotation matrix. Furthermore, it is -invariant in ; -equivariant with respect to and jointly; and permutation-invariant with respect to and jointly. We may use an EGNN architecture (Satorras et al., 2021) jointly on electrons and nuclei. In the EGNN:
- The positions of the electrons and nuclei are initialized as and respectively.
- The hidden vectors of the electrons and nuclei are initialized in order to encode two things:
  1. Whether the vertex corresponds to an electron or a nucleus.
  2. Properties of the vertex: (a) in the case of an electron, whether the spin is up or down; (b) in the case of a nucleus, the atomic number .
  This encoding can be achieved via combining one-hot vectors with linear projections of varying dimensionalities.

For each of the final layers of the EGNN, one may then take the position vectors for that layer and form an average over them; this yields a total of new vectors. These vectors are clearly -invariant in , as reordering within spins does not matter; permutation-invariant in and jointly; and -equivariant with respect to and jointly, by the built-in equivariance properties of EGNNs. We then take these vectors, and perform Gram-Schmidt on them to obtain a rotation matrix , noting that Gram-Schmidt retains the equivariance property. A similar idea is discussed in (Kaba et al., 2023). This completes the implementation.
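The role of the rotation-valued network can be illustrated with a much simpler stand-in. In the sketch below, the EGNN is replaced by two hand-picked vectors (the mean electron position and the charge-weighted mean nuclear position), which is an assumption made only to keep the example short; Gram-Schmidt plus a cross product then yields a rotation matrix. The function `inner` is a hypothetical placeholder with no rotational symmetry, and `velocity` evaluates it in the canonical frame and rotates the result back.

```python
import numpy as np

def frame(x, nuclei, charges):
    """Rotation matrix built from two permutation-invariant, rotation-equivariant vectors."""
    v1 = x.mean(axis=0)
    v2 = (charges[:, None] * nuclei).sum(axis=0) / charges.sum()
    e1 = v1 / np.linalg.norm(v1)
    u2 = v2 - (v2 @ e1) * e1               # Gram-Schmidt step
    e2 = u2 / np.linalg.norm(u2)
    e3 = np.cross(e1, e2)                  # completes a right-handed frame (det = +1)
    return np.stack([e1, e2, e3], axis=1)

def inner(x, nuclei):
    """Placeholder per-electron function that is NOT rotation-equivariant by itself."""
    return np.tanh(x * np.array([1.0, 2.0, 3.0]) + nuclei.mean(axis=0))

def velocity(x, nuclei, charges):
    Q = frame(x, nuclei, charges)
    # Evaluate in the canonical frame (rows are Q^T x_i), then rotate back.
    return inner(x @ Q, nuclei @ Q) @ Q.T

def random_rotation(seed=0):
    q, _ = np.linalg.qr(np.random.default_rng(seed).normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]                 # flip one axis to obtain a proper rotation
    return q

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
nuclei = rng.normal(size=(2, 3))
charges = np.array([1.0, 6.0])
U = random_rotation(1)

lhs = velocity(x @ U.T, nuclei @ U.T, charges)
rhs = velocity(x, nuclei, charges) @ U.T
print(np.allclose(lhs, rhs))
```

The frame rotates together with the molecule, so the canonical coordinates seen by the inner function do not change under a joint rotation. This sketch fails when the two vectors are parallel or zero, a degeneracy that a learned set of several vectors may help to avoid.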