Space-Time Continuous PDE Forecasting using Equivariant Neural Fields

David M. Knigge^∗,1, David R. Wessels^∗,1, Riccardo Valperga¹, Samuele Papa¹, Jan-Jakob Sonke²,
Efstratios Gavves ^†,1, Erik J. Bekkers ^†,1
¹University of Amsterdam ² Netherlands Cancer Institute
[email protected], [email protected]

Abstract

Recently, Conditional Neural Fields (NeFs) have emerged as a powerful modelling paradigm for PDEs, by learning solutions as flows in the latent space of the Conditional NeF. Although benefiting from favourable properties of NeFs such as grid-agnosticity and space-time-continuous dynamics modelling, this approach limits the ability to impose known constraints of the PDE on the solutions - e.g. symmetries or boundary conditions - in favour of modelling flexibility. Instead, we propose a space-time continuous NeF-based solving framework that - by preserving geometric information in the latent space - respects known symmetries of the PDE. We show that modelling solutions as flows of pointclouds over the group of interest $G$ improves generalization and data-efficiency. We validate that our framework readily generalizes to unseen spatial and temporal locations, as well as geometric transformations of the initial conditions - where other NeF-based PDE forecasting methods fail - and improves over baselines in a number of challenging geometries.

1 Introduction

Footnote: * shared first author, † shared lead advising.

Partial Differential Equations (PDEs) are a foundational tool in modelling and understanding spatio-temporal dynamics across diverse scientific domains. Classically, PDEs are solved using numerical methods such as finite elements, finite volumes, or spectral methods. In recent years, Deep Learning (DL) methods have emerged as promising alternatives due to the abundance of observed and simulated data as well as the accessibility of computational resources, with applications ranging from fluid simulations and weather modelling [49, 7] to biology [32].

Figure 1: We propose to solve an equivariant PDE in function space by solving an equivariant ODE in latent space. Through our proposed framework, which leverages Equivariant Neural Fields $f_{\theta}$, a field $\nu_{t}$ is represented by a set of latents $z^{\nu}_{t}=\{(p_{i}^{\nu},\mathbf{c}_{i}^{\nu})\}_{i=1}^{N}$ consisting of a pose $p_{i}$ and context vector $\mathbf{c}_{i}$. Using meta-learning, the initial latent $z^{\nu}_{0}$ is fit in only 3 SGD steps, after which an equivariant neural ODE $F_{\psi}$ models the solution as a latent flow.

The systems modelled by PDEs often have underlying symmetries. For example, heat diffusion or fluid dynamics can be modelled with differential operators which are rotation equivariant, i.e., given a solution to the system of PDEs, its rotation is also a valid solution (assuming boundary conditions are symmetric, i.e. they transform according to the relevant group action). In such scenarios it is sensible, and even desirable, to design neural networks that incorporate and preserve such symmetries to improve generalization and data-efficiency [12, 47, 4].

Crucially, DL-based approaches often rely on data sampled on a regular grid, without the inherent ability to generalize outside of it, which is restrictive in many scenarios [39]. To this end, [49] propose to use Neural Fields (NeFs) for modelling and forecasting PDE dynamics. This is done by fitting a neural ODE [11] to the conditioning variables of a conditional Neural Field trained to reconstruct states of the PDE [13]. However, this approach fails to leverage the aforementioned known symmetries of the system. Furthermore, using neural fields as representations has proved difficult due to the non-linear nature of neural networks [13, 3, 34], limiting performance in more challenging settings. We posit that NeF-based modelling of PDE dynamics benefits from representations that account for the symmetries of the system, as this allows for introducing inductive biases into the model that ought to be reflected in solutions. Furthermore, we show that meta-learning [28, 44] the NeF backbone improves performance for complex PDEs by further structuring the NeF’s latent space, simplifying the task of the neural ODE.

We introduce a framework for space-time continuous equivariant PDE solving, by adapting a class of $\mathrm{SE(n)}$-Equivariant Neural Fields (ENFs) to PDE-specific symmetries. We leverage the ENF as a representation for modelling spatiotemporal dynamics. We solve PDEs by learning a flow in the latent space of the ENF - starting at a point $z_{0}$ corresponding to the initial state of the PDE - with an equivariant graph-based neural ODE [11] that we develop from previous work [5]. We extend the ENF to equivariances beyond $\mathrm{SE(n)}$ by extending its weight-sharing scheme to equivalence classes for specific symmetries relevant to our setting. Furthermore, we show how meta-learning [14, 28, 44, 13] can not only significantly reduce inference time of the proposed framework, but also substantially simplify the structure of the latent space of the ENF, thereby simplifying the learning process of the latent dynamics for the neural ODE model. We present the following contributions:

• We introduce a framework for spatio-temporally continuous PDE solving that respects known symmetries of the PDE through equivariance constraints.

• We show that correctly chosen equivariance constraints as inductive bias improve performance of the solver - in terms of MSE - in spatio-temporally continuous settings, i.e. evaluated off the training grid and beyond the training horizon.

• We show how meta-learning improves the structure of the latent space of the ENF, simplifying the learning process and leading to better performance in solving PDEs.

We structure the paper as follows: in Sec. 2 we provide an overview of the mathematical preliminaries and describe the problem setting. Our proposed framework is introduced in Sec. 3. We validate our framework on different PDEs defined over a variety of geometries in Sec. 4, with differing equivariance constraints, showing competitive performance compared to other neural PDE solvers. We provide an in-depth positioning of our approach in relation to other work in Appx. A.

2 Mathematical background and problem setting

Continuous spatiotemporal dynamics forecasting.

The setting considered is data-driven learning of the dynamics of a system described by continuous observables. In particular, we consider flows of fields, denoted with $\hat{\nu}:\mathbb{R}^{d}\times[0,T]\rightarrow\mathbb{R}^{c}$. We use $\hat{\nu}_{t}$ as a shorthand for $\hat{\nu}(\cdot,t)$. We assume the flow is governed by a PDE, and consider the Initial Value Problem (IVP) of predicting $\hat{\nu}_{t}$ from a given $\nu_{0}$. The dataset consists of field snapshots $\nu:\mathcal{X}\times\llbracket T\rrbracket\rightarrow\mathbb{R}^{c}$, in which $\llbracket T\rrbracket:=\{1,2,\dots,T\}$ denotes the set of time points on which the flow is sampled and $\mathcal{X}\subset\mathbb{R}^{d}$ is a set of coordinate values. For each time point we are given a set of input-output pairs $[\mathcal{X},\nu(\mathcal{X})]$ where $\nu(\mathcal{X})\subset\mathbb{R}^{c}$ are the values of the field at those coordinates. Importantly, the locations at which the field is sampled need not be regular, i.e., we do not require the training data to be on a grid or to be regularly spaced in time, nor need coordinate values be identical for train and test sets. Following [49], we distinguish between $t_{\text{in}}$ - referring to values within the training time horizon $\left[0,T\right]$ - and $t_{\text{out}}$ - referring analogously to values beyond $T$.

Neural Fields in dynamics modelling.

Conditional Neural Fields (NeFs) are a class of coordinate-based neural networks, often trained to reconstruct discretely-sampled input continuously. More specifically, a conditional neural field $f_{\theta}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{d}$ is a field - parameterized by a neural network with parameters $\theta$ - that maps input coordinates $x\in\mathbb{R}^{n}$ in the data domain alongside conditioning latents $z$ to $d$-dimensional signal values $\nu(x)\in\mathbb{R}^{d}$. By associating a conditioning latent $z^{\nu}\in\mathbb{R}^{c}$ to each signal ${\nu}$, a single conditional NeF $f_{\theta}:\mathbb{R}^{n}\times\mathbb{R}^{c}\rightarrow\mathbb{R}^{d}$ can learn to represent families $\mathcal{D}$ of continuous signals such that $\forall\,\nu\in\mathcal{D}:\nu(x)\approx f_{\theta}(x;z^{\nu})$. [49] propose to use conditional NeFs for PDE modelling by learning a continuous flow in the latent space of a conditional neural field. In particular, a set of latents $\{z_{i}^{\nu}\}_{i=1}^{T}$ is obtained by fitting a conditional neural field to a given set of observations $\{\nu_{i}\}_{i=1}^{T}$ at timesteps $1,...,T$; simultaneously, a neural ODE [11] $F_{\psi}$ is trained to map pairs of temporally contiguous latents s.t. solutions correspond to the trajectories traced by the learned latents. Though this approach yields impressive results for sparse and irregular data in planar PDEs, we show it breaks down on complex geometries. We hypothesize that this is due to the lack of a latent space that preserves the geometric transformations characterizing the symmetries of the systems we are modelling, and as such propose an extension of this framework where such symmetries are preserved.

Symmetries and weight sharing.

Given a group $G$ with identity element $e\in G$, and a set $X$, a group action is a map $\mathcal{T}:G\times X\rightarrow X$. For simplicity, we denote the action of $g\in G$ on $x\in X$ as $gx:=\mathcal{T}(g,x)$, and call a smooth manifold equipped with a $G$ action a $G$-space. A group action is homomorphic to $G$ with its group product, namely it is such that $ex=x$ and $(gh)x=g(hx)$. As an example, we are interested in the Special Euclidean group $\mathrm{SE(n)}{=}\mathbb{R}^{n}\rtimes SO(n)$: group elements of $\mathrm{SE(n)}$ are identified by a translation $\mathbf{x}\in\mathbb{R}^{n}$ and a rotation $\mathbf{R}_{\theta}\in SO(n)$, with group operation $gg^{\prime}=(\mathbf{x},\mathbf{R}_{\theta})\,(\mathbf{x}^{\prime},\mathbf{R}_{\theta^{\prime}})=(\mathbf{R}_{\theta}\mathbf{x}^{\prime}+\mathbf{x},\mathbf{R}_{\theta}\mathbf{R}_{\theta^{\prime}})$. We denote by $\mathcal{L}_{g}$ the left action of $G$ on function spaces, defined as $\mathcal{L}_{g}f(\mathbf{x}^{\prime})=f(g^{-1}\mathbf{x}^{\prime})=f(\mathbf{R}_{\theta}^{-1}(\mathbf{x}^{\prime}-\mathbf{x}))$. Many PDEs are defined by equivariant differential operators $\mathcal{N}$ such that for a given state $\nu$: $\mathcal{L}_{g}\mathcal{N}[\nu]=\mathcal{N}[\mathcal{L}_{g}\nu]$. If the boundary conditions do not break the symmetry, namely if the boundary is symmetric with respect to the same group action, then a $G$-transformed solution to the IVP for some $\nu_{0}$ corresponds to the solution for the $G$-transformed initial value. For example, the laws of physics do not depend on the choice of coordinate system, which implies that many PDEs are defined by $\mathrm{SE(n)}$-equivariant differential operators. The geometric deep learning literature shows that models can benefit from leveraging the inherent symmetries or invariances present in the data by constraining the searchable function space through weight sharing [9, 25, 5].
Recall that in our framework we model flows of fields, solutions to PDEs defined by equivariant differential operators, with ordinary differential equations in the latent space of conditional neural fields. We leverage the symmetries of the system for two key aspects of the proposed method: first by making the relation between signals and corresponding latents equivariant; second, by using equivariant ODEs, namely ODEs defined by equivariant vector fields: if $\frac{dz}{d\tau}{=}F(z)$ is such that $F\left(gz\right)=gF\left(z\right)$ , then solutions are mapped to solutions by the group action.
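To make the last statement concrete, the following is a minimal numerical sketch, not the paper's model. It assumes a hand-written $\mathrm{SO(2)}$-equivariant vector field `F` on $\mathbb{R}^{2}$ and an explicit Euler integrator `flow` (both hypothetical names chosen for illustration), and checks that solving from a rotated initial value agrees with rotating the solution.

```python
import numpy as np

def rot(theta):
    """2D rotation matrix R_theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def F(z):
    """Hypothetical SO(2)-equivariant vector field on R^2, i.e. F(R z) = R F(z).

    The radial term is scaled by a function of |z|^2 (rotation invariant) and
    the second term is a fixed 90-degree rotation, which commutes with SO(2).
    """
    r2 = np.sum(z * z)
    return (1.0 - r2) * z + rot(np.pi / 2) @ z

def flow(z0, t=1.0, steps=200):
    """Explicit Euler approximation of z_t = z_0 + int_0^t F(z_tau) d tau."""
    z, dt = np.array(z0, dtype=float), t / steps
    for _ in range(steps):
        z = z + dt * F(z)
    return z

g = rot(0.7)                    # group element acting on the latent
z0 = np.array([0.3, -0.5])      # initial value
lhs = flow(g @ z0)              # solve from the transformed initial value
rhs = g @ flow(z0)              # transform the solution
print(np.allclose(lhs, rhs))
```

Because each Euler step $z\mapsto z+\Delta t\,F(z)$ is itself equivariant, the discrete trajectory inherits the symmetry, not only the exact continuous flow.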

3 Method

We adapt the work of [49], and consider the following optimization problem (we highlight that [49] optimize latents $z^{\nu}_{t}$, neural field $f_{\theta}$, and ODE $F_{\psi}$ using two separate objectives, whereas we found that our framework is more stable under single-objective optimization):

$$\min_{\theta,\psi,z_{\tau}}\;\mathbb{E}_{\nu\in D,\,x\in\mathcal{X},\,t\in\llbracket T\rrbracket}\left\|\nu_{t}(x)-f_{\theta}(x;z^{\nu}_{t})\right\|_{2}^{2},\quad\text{where}\quad z^{\nu}_{t}=z^{\nu}_{0}+\int_{0}^{t}F_{\psi}(z^{\nu}_{\tau})\,d\tau\,, \tag{1}$$

with $f_{\theta}(x;z^{\nu}_{t})$ a decoder tasked with reconstructing state $\nu_{t}$ from latent $z_{t}^{\nu}$ and $F_{\psi}$ a neural ODE that maps a latent to its temporal derivative: $\dfrac{dz_{\tau}^{\nu}}{d\tau}{=}F_{\psi}(z^{\nu}_{\tau})$ , modelling the solution as flow in latent space starting at the initial latent $z_{0}^{\nu}$ - see Fig. 1 for a visual intuition.
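As a rough illustration of how the objective in Eq. (1) can be evaluated, the sketch below uses toy stand-ins: a cosine-feature decoder for $f_{\theta}$, a linear vector field for $F_{\psi}$, and explicit Euler integration for the integral. None of these choices come from the paper; they only show the unroll-then-reconstruct structure.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, T = 4, 32, 5                      # latent size, number of coordinates, timesteps

W = rng.normal(size=(K, 1))             # toy decoder parameters (stand-in for theta)
A = -0.5 * np.eye(K)                    # toy linear latent dynamics (stand-in for psi)

def decoder(x, z):
    """Toy conditional field f(x; z) = cos(x W^T) z, with x of shape (M, 1)."""
    return np.cos(x @ W.T) @ z          # shape (M,)

def F(z):
    """Toy latent vector field F(z) = A z."""
    return A @ z

def unroll(z0, n_steps, dt=1.0, substeps=20):
    """Euler approximation of z_t = z_0 + int_0^t F(z_tau) d tau at t = dt, 2 dt, ..."""
    z, out = z0.copy(), []
    for _ in range(n_steps):
        for _ in range(substeps):
            z = z + (dt / substeps) * F(z)
        out.append(z.copy())
    return out

def objective(z0, x, targets):
    """Mean over t and x of |nu_t(x) - f(x; z_t)|^2, following Eq. (1)."""
    latents = unroll(z0, len(targets))
    return np.mean([(nu_t - decoder(x, z_t)) ** 2
                    for nu_t, z_t in zip(targets, latents)])

x = rng.uniform(-1, 1, size=(M, 1))
z0 = rng.normal(size=K)
targets = [decoder(x, z) for z in unroll(z0, T)]   # synthetic data from the same toy model
loss_true = objective(z0, x, targets)              # zero by construction
loss_perturbed = objective(z0 + 0.5, x, targets)   # a wrong initial latent is penalized
```

In the actual framework the parameters of both networks would be updated by backpropagating through the unrolled trajectory; the sketch only evaluates the loss.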

Equivariant space-time continuous dynamics forecasting.

PDEs defined by a $G$-equivariant differential operator - for which $\mathcal{L}_{g}\mathcal{N}[\nu]=\mathcal{N}[\mathcal{L}_{g}\nu]$ - are such that solutions are mapped to other solutions by the group action if the boundary conditions are symmetric. We would like to leverage this property, and constrain the neural ODE $F_{\psi}$ such that the solutions it finds in latent space can be mapped onto each other by the group action. Our motivation for this is twofold: (1) it is natural for our model to have, by construction, the geometric properties that the modelled system is known to possess, and (2) we obtain more structured latent representations, which facilitates the job of the neural ODE. To achieve this we first need the latent space $Z$ to be equipped with a well-defined group action with respect to which $\forall g\in G,z\in Z:F_{\psi}(gz)\,{=}gF_{\psi}(z)$, and, most importantly, we need the relation between the reconstructed field and the corresponding latent to be equivariant, i.e.,

$$\forall g\in G,\,x\in\mathcal{X}:\quad\mathcal{L}_{g}f_{\theta}(x;z_{t}^{\nu})=f_{\theta}(g^{-1}x;z_{t}^{\nu})=f_{\theta}(x;gz^{\nu}_{t}). \tag{2}$$

Note that, somewhat imprecisely, we call this condition equivariance to convey the idea even though it is not, strictly speaking, the commonly used definition of equivariance for general operators. If we consider the decoder as a mapping from latents to fields, we can make the notion of equivariance of this mapping more precise. Namely

$$f(x)=D_{\theta}(z),\quad D_{\theta}(z):z_{t}^{\nu}\mapsto f_{\theta}(\cdot\,;z_{t}^{\nu})\,,\qquad f(g^{-1}x)=D_{\theta}(gz),\quad D_{\theta}(gz):g\,z_{t}^{\nu}\mapsto f_{\theta}(g^{-1}\,\cdot\,;z_{t}^{\nu})\,. \tag{3}$$

In Sec. 3.1 we describe the Equivariant Neural Field (ENF)-based decoder, which satisfies Eq. (2). Second, in Sec. 3.2 we outline the graph-based equivariant neural ODE. Sec. 3.3 explains the motivation for, and use of, meta-learning for obtaining the ENF backbone parameters. We show how the combination of equivariance and meta-learning produces much more structured latent representations of continuous signals (Fig. 3).

3.1 Representing PDE states with Equivariant Neural Fields

We briefly recap ENFs here, referring the reader to [48] for more detail. We extend ENFs to symmetries for PDEs over varying geometries.

ENFs as cross-attention over bi-invariant attributes.

Attention-based conditional neural fields represent a signal $\nu\in\mathcal{D}$ with a corresponding latent set $z^{\nu}$ [50]. This class of conditional neural fields obtains signal-specific reconstructions $\nu(x)\approx f_{\theta}(x;z^{\nu})$ through a cross-attention operation between the latent set $z^{\nu}$ and input coordinates $x$. ENFs [48] extend this approach by imposing equivariance constraints w.r.t. a group ${\rm G}\subseteq{\rm SE(n)}$ on the relation between the neural field and the latents, such that transformations of the signal $\nu$ correspond to transformations of the latent $z^{\nu}$ (Eq. (2)). For this condition to hold, we need a well-defined action on the latent space $Z$ of $f_{\theta}$. To this end, ENFs define elements of the latent set $z^{\nu}$ as tuples of pose $p_{i}\in{\rm G}$ and context $\mathbf{c}_{i}\in\mathbb{R}^{d}$, $z^{\nu}:=\{(p_{i},\mathbf{c}_{i})\}^{N}_{i=1}$. The latent space is then equipped with a group action defined as $gz=\{(gp_{i},\mathbf{c}_{i})\}_{i=1}^{N}$. To achieve equivariance over transformations, ENFs follow [5], where equivariance is achieved with convolutional weight-sharing over equivalence classes of point pairs $x,x^{\prime}$. ENFs instead extend weight-sharing to cross-attention over bi-invariant attributes of $z,x$ pairs.

Weight-sharing over bi-invariant attributes of $z,x$ is motivated by Eq. 2, by which we have:

$$f_{\theta}(x;z)=f_{\theta}(gx;gz). \tag{4}$$

Intuitively, the above equation says that a transformation $g$ on the domain of $f_{\theta}$, i.e. $g^{-1}x$, can be undone by also acting with $g$ on $z$. In other words, the output of the neural field $f_{\theta}$ should be bi-invariant to $g$-transformations of the pair $z,x$. For a specific pair $(z_{i},x_{m})\in Z\times X$, the term bi-invariant attribute $\mathbf{a}_{i,m}$ describes a function $\mathbf{a}:(z_{i},x_{m})\mapsto\mathbf{a}(z_{i},x_{m})$ such that $\mathbf{a}(z_{i},x_{m})=\mathbf{a}(gz_{i},gx_{m})$. Throughout the paper we use $\mathbf{a}_{i,m}$ as shorthand for $\mathbf{a}(z_{i},x_{m})$.

To parameterize $f_{\theta}$, we can accordingly choose any function that is bi-invariant to $G$-transformations of $z,x$. In particular, for an input coordinate $x_{m}$, ENFs choose to make $f_{\theta}$ a cross-attention operation between attributes $\mathbf{a}_{i,m}$ and the invariant context vectors $\mathbf{c}_{i}$:

$$f_{\theta}(x_{m};z)=\operatorname{cross\_attn}(\mathbf{a}_{:,m},\mathbf{c}_{:},\mathbf{c}_{:}). \tag{5}$$

As an example, for ${\rm SE(n)}$ -equivariance, we can define the bi-invariant simply using the group action: $\mathbf{a}^{\rm SE(n)}_{i,m}=p^{-1}_{i}x_{m}=\mathbf{R}_{i}^{T}(x_{m}-x_{i})$ , which is bi-invariant by:

$$\forall g\in{\rm SE(n)}:\;\;(p_{i},x)\;\mapsto\;(g\,p_{i},g\,x)\;\;\;\Leftrightarrow\;\;\;p_{i}^{-1}x\;\mapsto\;(g\,p_{i})^{-1}g\,x=p_{i}^{-1}g^{-1}g\,x=p_{i}^{-1}x\,. \tag{6}$$
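The identity in Eq. (6) can be checked numerically. The sketch below is an illustration for $\mathrm{SE(2)}$ only; it represents a pose as a (translation, rotation matrix) pair, and the helper names `act_on_point`, `act_on_pose` and `bi_invariant` are hypothetical, not taken from the paper's code.

```python
import numpy as np

def rot(theta):
    """2D rotation matrix R_theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def act_on_point(g, x):
    """Action of g = (t, R) in SE(2) on a point x in R^2: g x = R x + t."""
    t, R = g
    return R @ x + t

def act_on_pose(g, p):
    """Group product g p = (R t_p + t, R R_p) for a pose p = (t_p, R_p)."""
    t, R = g
    t_p, R_p = p
    return (R @ t_p + t, R @ R_p)

def bi_invariant(p, x):
    """a = p^{-1} x = R_p^T (x - t_p): the query point expressed in the pose frame."""
    t_p, R_p = p
    return R_p.T @ (x - t_p)

p = (np.array([0.2, -1.0]), rot(0.4))   # pose: position and orientation
x = np.array([1.5, 0.3])                # query coordinate
g = (np.array([-0.7, 2.0]), rot(1.1))   # arbitrary roto-translation

a = bi_invariant(p, x)
a_g = bi_invariant(act_on_pose(g, p), act_on_point(g, x))
print(np.allclose(a, a_g))
```

Since the attribute is unchanged when pose and coordinate are transformed together, any function of it (such as the cross-attention in Eq. (5)) satisfies Eq. (4).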

Bi-invariant attributes for PDE solving.

As explained above, ENF is equivariant to $\mathrm{SE(n)}$-transformations by defining $f_{\theta}$ as a function of an ${\rm SE(n)}$-bi-invariant attribute $\mathbf{a}^{\rm SE(n)}$. Although many physical processes adhere to roto-translational symmetries, we are also interested in solving PDEs that - due to the geometry of the domain, their specific formulation, and/or their boundary conditions - are not fully $\rm SE(n)$-equivariant. As such, we are interested in extending ENFs to equivariances that are not strictly (subsets of) $\rm SE(n)$, which we show we can achieve by finding bi-invariants that respect these particular transformations. Below, we provide two examples; the other invariants we use in the experiments - including a "bi-invariant" $\mathbf{a}^{\emptyset}$ that is not actually bi-invariant to any geometric transformations, which we use to ablate over equivariance constraints - are in Appx. D.

The flat 2-torus. When the physical domain of interest is continuous and extends indefinitely, periodic boundary conditions are often used, i.e. the PDE is defined over a space topologically equivalent to that of the 2-torus. Such boundary conditions break $\rm SO(2)$ symmetries; assuming the domain has periodicity $\pi$ and none of the terms of this PDE depend on the choice of coordinate frame, these boundary conditions imply that the PDE is equivariant to periodic translations, i.e. the group of translations modulo $\pi$: $\mathbb{T}^{2}\equiv\mathbb{R}^{2}/\mathbb{Z}^{2}$. In this case, periodic functions over $x,y$ with period $\pi$ would work as a bi-invariant, i.e. using poses $p\in\mathbb{T}^{2}$, $\mathbf{a}^{\mathbb{T}^{2}}=\cos(2\pi(x_{0}-p_{0}))+\cos(2\pi(x_{1}-p_{1}))$ - which happens to be bi-invariant to rotations by $\frac{\pi}{2}$ as well. Instead, since we do not assume any rotational symmetries to exist on the torus, we opt for a non-rotationally symmetric function:

$$\mathbf{a}_{i,m}^{\mathbb{T}^{2}}=\cos(2\pi(x_{m}^{0}-p_{i}^{0}))\oplus\cos(2\pi(x_{m}^{1}-p_{i}^{1})), \tag{7}$$

where $\oplus$ denotes concatenation. This bi-invariant is used in experiments on Navier-Stokes over the flat 2-Torus.
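A small sketch of the attribute in Eq. (7) follows. It assumes coordinates normalized so that the domain has unit period, which is what the $2\pi$ factor inside the cosine implies; the function name `torus_attribute` is hypothetical. The check contrasts the concatenated attribute with the summed variant mentioned above under a rotation by $\frac{\pi}{2}$.

```python
import numpy as np

def torus_attribute(p, x):
    """Concatenation of cos(2*pi*(x^0 - p^0)) and cos(2*pi*(x^1 - p^1)), cf. Eq. (7)."""
    return np.cos(2 * np.pi * (x - p))

def rot90(v):
    """Rotation by pi/2 about the origin: (v0, v1) -> (-v1, v0)."""
    return np.array([-v[1], v[0]])

p = np.array([0.15, 0.80])              # pose on the torus (unit period assumed)
x = np.array([0.55, 0.10])              # query coordinate
shift = np.array([0.37, -1.25])         # arbitrary translation, wrapped below

a = torus_attribute(p, x)
a_shifted = torus_attribute((p + shift) % 1.0, (x + shift) % 1.0)
a_rotated = torus_attribute(rot90(p), rot90(x))

print(np.allclose(a, a_shifted))                # invariant to periodic translations
print(np.allclose(a, a_rotated))                # concatenation is not rotation invariant
print(np.isclose(a.sum(), a_rotated.sum()))     # the summed variant is
```

The rotation only swaps the two cosine terms, which is why the sum cannot distinguish it while the concatenation can.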

The 2-sphere. In some settings a PDE may be symmetric only to rotations about a certain axis. An example is that of the global shallow-water equations on the two-sphere - used to model geophysical processes such as atmospheric flow [16] - which are characterised by rotational symmetry only about the Earth’s axis of rotation, due to the inclusion of a term for Coriolis acceleration that breaks full $\rm SO(3)$ equivariance. We use poses $p\in\rm SO(3)$ parametrised by Euler angles $\phi,\theta,\gamma$, and spherical coordinates $\phi,\theta$ for $x\in S^{2}$. We make the first two Euler angles coincide with the spherical coordinates and define a bi-invariant for rotations around the axis $\theta=\pi$:

$$\mathbf{a}^{\text{SW}}_{i,m}=\Delta\phi_{p_{i},x_{m}}\oplus\theta_{p_{i}}\oplus\gamma_{p_{i}}\oplus\theta_{x_{m}}, \tag{8}$$

where $\Delta\phi_{p_{i},x_{m}}{=}\phi_{p_{i}}{-}\phi_{x_{m}}{-}2\pi$ if $\phi_{p_{i}}{-}\phi_{x_{m}}>\pi$, $\Delta\phi_{p_{i},x_{m}}{=}\phi_{p_{i}}{-}\phi_{x_{m}}+2\pi$ if $\phi_{p_{i}}{-}\phi_{x_{m}}<{-}\pi$, and $\Delta\phi_{p_{i},x_{m}}{=}\phi_{p_{i}}{-}\phi_{x_{m}}$ otherwise, to adjust for periodicity.
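The wrapped difference and the attribute of Eq. (8) can be sketched as follows. The sketch assumes that $\phi$ is the azimuthal angle taking values in $[0,2\pi)$ and that the symmetry in question shifts $\phi$ of both pose and coordinate by the same amount; these conventions, and the function names, are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def wrap_delta_phi(phi_p, phi_x):
    """Difference phi_p - phi_x mapped back into [-pi, pi] to respect periodicity."""
    d = phi_p - phi_x
    if d > np.pi:
        d -= 2 * np.pi
    elif d < -np.pi:
        d += 2 * np.pi
    return d

def sw_attribute(pose, x):
    """Delta phi (+) theta_p (+) gamma_p (+) theta_x, cf. Eq. (8)."""
    phi_p, theta_p, gamma_p = pose
    phi_x, theta_x = x
    return np.array([wrap_delta_phi(phi_p, phi_x), theta_p, gamma_p, theta_x])

def rotate_azimuth(phi, alpha):
    """Shift the azimuth by alpha and wrap to [0, 2*pi)."""
    return (phi + alpha) % (2 * np.pi)

pose = (5.9, 1.0, 0.3)                  # Euler angles (phi, theta, gamma) of a pose
x = (0.4, 2.0)                          # spherical coordinates (phi, theta) of a point
alpha = 1.0                             # rotation angle about the symmetry axis

a = sw_attribute(pose, x)
a_rot = sw_attribute(
    (rotate_azimuth(pose[0], alpha), pose[1], pose[2]),
    (rotate_azimuth(x[0], alpha), x[1]),
)
print(np.allclose(a, a_rot))
```

In this example the raw difference is $5.5>\pi$ before the rotation and negative after it, so the wrapping is what keeps the first component unchanged.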

In summary, to parameterize an ENF equivariant with respect to a specific group we are simply required to find attributes that are bi-invariant with respect to the same group. In general we achieve this by using group-valued poses and their action on the PDE domain.

3.2 PDE solution as latent space flow

Let $z^{\nu}_{0}$ be a latent set that faithfully reconstructs the initial state $\nu_{0}$. We want to define a neural ODE $F_{\psi}$ that maps latents $z^{\nu}_{t}$ to their temporal derivatives $\dfrac{dz_{\tau}^{\nu}}{d\tau}{=}F_{\psi}(z^{\nu}_{\tau})$ and that is equivariant with respect to the group action: $gF_{\psi}(z^{\nu}_{\tau}){=}F_{\psi}(gz^{\nu}_{\tau})$. To this end, we use a message passing neural network (MPNN) to learn a flow of poses $p_{i}$ and contexts $\mathbf{c}_{i}$ over time. We base our architecture on P$\Theta$NITA [5], which employs convolutional weight-sharing over bi-invariants for $\rm SE(n)$. For an in-depth recap of message-passing frameworks, we refer the reader to Appx. A. Since $F_{\psi}$ is required to be equivariant w.r.t. the group action, any updates to the poses $p_{i}$ should also be equivariant. [40] propose to parameterize an equivariant node position update by using a basis spanned by relative node positions $x_{j}-x_{i}$. In our setting, poses $p_{i}$ are points on a manifold $M$ equipped with a group action. As such, we analogously propose parameterizing pose updates by a weighted combination of logarithmic maps $\log_{p_{i}}(p_{j})$, which intuitively describe the relative position between $p_{i},p_{j}$ in the tangent space $T_{p_{i}}M$, or the displacement from $p_{i}$ to $p_{j}$. We integrate the resulting pose update over the manifold through the exponential map $\exp_{p_{i}}$. In the Euclidean case $\log_{p_{i}}(p_{j}){=}x_{j}-x_{i}$ and we recover the node position updates of [40]. In short, the message passing layers we use consist of the following update functions:

$$\begin{aligned}\mathbf{c}_{i}^{l+1}&=\sum\limits_{(p_{j}^{l},\mathbf{c}_{j}^{l})\in z^{\nu,l}}k^{\text{context}}(\mathbf{a}^{l}_{i,j})\,\mathbf{c}_{j}^{l},\\ p_{i}^{l+1}&=\exp_{p_{i}^{l}}\bigg(\frac{1}{N}\sum\limits_{(p_{j}^{l},\mathbf{c}_{j}^{l})\in z^{\nu,l}}k^{\text{pose}}(\mathbf{a}^{l}_{i,j})\,\mathbf{c}_{j}^{l}\log_{p_{i}^{l}}(p_{j}^{l})\bigg),\end{aligned} \tag{9}$$

with $k^{\text{context}},k^{\text{pose}}$ message functions weighting the incoming context and pose updates, parameterized by a two-layer MLP as a function of the respective bi-invariant.
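The update functions in Eq. (9) can be sketched for the Euclidean special case, where $\log_{p_{i}}(p_{j})=p_{j}-p_{i}$ and $\exp_{p_{i}}(v)=p_{i}+v$. The sketch makes several assumptions that are not stated in the text: poses are positions in $\mathbb{R}^{2}$ only, the invariant attribute is the pairwise distance, the MLP weights are random and untrained, and the product $k^{\text{pose}}(\mathbf{a}_{i,j})\,\mathbf{c}_{j}$ is read as an inner product yielding one scalar weight per pair.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, H = 5, 8, 16                      # number of latents, context dim, hidden width

def make_mlp(d_in, d_out):
    """Two-layer MLP with random weights (untrained, for illustration only)."""
    W1, b1 = rng.normal(size=(d_in, H)), rng.normal(size=H)
    W2, b2 = rng.normal(size=(H, d_out)) / np.sqrt(H), rng.normal(size=d_out)
    return lambda a: np.tanh(a @ W1 + b1) @ W2 + b2

k_context = make_mlp(1, C)              # per-channel weights for incoming contexts
k_pose = make_mlp(1, C)                 # contracted with c_j to give a scalar weight

def layer(p, c):
    """One message-passing update in the spirit of Eq. (9), Euclidean case."""
    rel = p[None, :, :] - p[:, None, :]                  # rel[i, j] = p_j - p_i
    a = np.linalg.norm(rel, axis=-1, keepdims=True)      # invariant attribute |p_j - p_i|
    c_new = np.sum(k_context(a) * c[None, :, :], axis=1)            # shape (N, C)
    w = np.sum(k_pose(a) * c[None, :, :], axis=-1, keepdims=True)   # scalar weight per pair
    p_new = p + np.mean(w * rel, axis=1)                 # exp map of the averaged update
    return p_new, c_new

theta = 0.9
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
t = np.array([1.0, -2.0])
p, c = rng.normal(size=(N, 2)), rng.normal(size=(N, C))

p1, c1 = layer(p, c)
p2, c2 = layer(p @ R.T + t, c)          # same layer applied to roto-translated poses
print(np.allclose(p2, p1 @ R.T + t), np.allclose(c2, c1))
```

Pose updates transform with the input while contexts stay invariant, which is the behaviour required of $F_{\psi}$. On a curved manifold the subtraction and addition would be replaced by the manifold's logarithmic and exponential maps.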

3.3 Obtaining the initial latent $z^{\nu}_{0}$

So far we have not discussed how to obtain the latent $z^{\nu}_{0}$ corresponding to the initial condition. An approach often used in the conditional neural field literature is that of autodecoding [35], where latents $z^{\nu}$ are optimized for reconstruction of the input signal $\nu$ with SGD. Optimizing a NeF for reconstruction does not necessarily lead to good quality representations [34], i.e. using MSE-based autodecoding to obtain latents $z^{\nu}_{t}$ - as is proposed by [49] - may complicate the latent space, impeding optimization of the neural ODE $F_{\psi}$. Moreover, autodecoding requires many optimization steps at inference (for reference, [49] use 300-500 steps). [13] propose meta-learning as a way to overcome long inference times, as it allows for fitting latents in a few steps - typically three or four. We hypothesize that meta-learning may also structure the latent space - similar to the impact of equivariance constraints - since the very limited number of optimization steps requires efficient organization of latents $z^{\nu}_{t}$ around the (shared) initialization, forcing together the latent representations of contiguous states. To this end, we propose to use meta-learning for obtaining the initial latent $z^{\nu}_{0}$, which is then unrolled by the neural ODE $F_{\psi}$ to find solutions $z^{\nu}_{t}$.
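The few-step fitting described above can be sketched with a toy decoder that is linear in the latent, so that the gradient can be written by hand. Everything here is an illustrative assumption: the cosine-feature decoder, the zero initialization standing in for the meta-learned one, the step size, and the name `inner_loop`. The outer loop that would learn the decoder and the shared initialization is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M = 6, 64                            # latent size, number of observed coordinates

W = rng.normal(size=(K, 1))             # frozen toy decoder weights (stand-in for theta)
z_init = np.zeros(K)                    # shared initialization (meta-learned in practice)

def features(x):
    """Toy features so that the decoder is f(x; z) = features(x) @ z."""
    return np.cos(x @ W.T) / np.sqrt(K)

def inner_loop(x, nu0, n_steps=3, lr=0.5):
    """Fit a latent to (x, nu0) with a few gradient steps on the reconstruction MSE."""
    Phi, z, losses = features(x), z_init.copy(), []
    for _ in range(n_steps):
        residual = Phi @ z - nu0
        losses.append(np.mean(residual ** 2))
        z = z - lr * (2.0 / len(nu0)) * Phi.T @ residual   # gradient of the MSE w.r.t. z
    losses.append(np.mean((Phi @ z - nu0) ** 2))           # loss after the last step
    return z, losses

x = rng.uniform(-1, 1, size=(M, 1))
nu0 = features(x) @ rng.normal(size=K)  # synthetic initial state
z0, losses = inner_loop(x, nu0)
print(losses)
```

With only three steps available, the latents of all signals necessarily stay close to the shared initialization, which is the structural effect hypothesized in the paragraph above.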

3.4 Equivariance and meta-learning structure the latent space $Z$

As a first validation of the hypotheses that both equivariance constraints and meta-learning introduce structure to the latent space of $f_{\theta}$, we visualize latent spaces of different variants of the ENF. We fit ENFs to a dataset consisting of solutions to the heat equation for various initial conditions (details in Appx. E). For each sample $\nu_{t}$, we obtain a set of latents $z^{\nu}_{t}$, which we average over the invariant context vectors $\mathbf{c}_{i}\in\mathbb{R}^{c}$ to obtain a single vector in $\mathbb{R}^{c}$ invariant to a group action according to the chosen bi-invariant. Next, we apply t-SNE [46] to the resulting vectors in $\mathbb{R}^{c}$. We use three different setups: (a) no meta-learning, with model weights $\theta$ and latents $z^{\nu}_{t}$ optimized for every $\nu_{t}$ separately using autodecoding [35], and no equivariance imposed (per Eq. 15), shown in Fig. 3(a); (b) meta-learning is used to obtain $\theta$, $z^{\nu}_{t}$, but no equivariance is imposed, shown in Fig. 3(b); and (c) meta-learning is used to obtain $\theta$, $z^{\nu}_{t}$ and ${\rm SE(2)}$-equivariance is imposed by weight-sharing over $\mathbf{a}^{\rm SE(n)}$ bi-invariants, shown in Fig. 3(c). The results confirm our intuition that both meta-learning and equivariance improve latent-space structure.

Recap: optimization objective.

We use a meta-learning inner-loop [28, 13] to obtain the initial latent $z^{\nu}_{0}$ under supervision of coordinate-value pairs $(x,\nu_{0}(x))_{x\in\mathcal{X}}$ from $\nu_{0}$. This latent is unrolled for $t_{\text{train}}$ timesteps using $F_{\psi}$. The obtained latents $z^{\nu}_{t}$ are used to reconstruct states $\nu_{t}$ along the trajectory of $\nu$, and parameters of $f_{\theta},F_{\psi}$ are optimised for reconstruction MSE, as shown in the left-hand side of Eq. 1. See Appx. B for detailed pseudocode of this process.

4 Experiments

We intend to show the impact of symmetry-preservation in continuous PDE solving. To this end we perform a range of experiments assessing different qualities of our model on tasks with different symmetries. First, we investigate the equivariance properties of our framework by evaluating it against unseen geometric transformations of the initial conditions. Next, we assess generalization and extrapolation capabilities w.r.t. unseen spatial locations and time horizons inside and outside the time ranges seen during training respectively, robustness to partial test-time observations, and data-efficiency. As the continuous nature of NeF-based PDE solving allows, we verify these properties for PDEs defined over challenging geometries: the plane $\mathbb{R}^{2}$, the 2-torus $\mathbb{T}^{2}$, the sphere $S^{2}$ and the 3D ball $\mathbb{B}^{3}$. Architectural details and hyperparameters are in Appx. E. Code is attached to submission.

4.1 Datasets and evaluation

All datasets are obtained by randomly sampling disjoint sets of initial conditions for train and test sets, and solving them using numerical methods. Dataset-specific details on generation can be found in Appx. E.

• Heat equation on $\mathbb{R}^{2}$ and $S^{2}$. The heat equation describes diffusion over a surface: $\frac{dc}{dt}=D\nabla^{2}c$, where $c$ is a scalar field, and $D$ is the diffusivity coefficient. We solve it on the 2D plane, where $\nabla^{2}c=\frac{\partial^{2}c}{\partial x_{1}^{2}}+\frac{\partial^{2}c}{\partial x_{2}^{2}}$, and on the 2-sphere $S^{2}$, where in spherical coordinates $\nabla^{2}c=\left(\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial c}{\partial\theta}\right)+\frac{1}{\sin^{2}\theta}\frac{\partial^{2}c}{\partial\phi^{2}}\right)$. Although this is a relatively simple PDE, we find that defining it over a non-trivial geometry such as the sphere proves hard for non-equivariant methods.

• Navier-Stokes on $\mathbb{T}^{2}$. We solve 2D Navier-Stokes [42] for an incompressible fluid with dynamics $\frac{dv}{dt}=-u\nabla v+\mu\Delta v+f,\;v=\nabla\times u,\;\nabla\cdot u=0$, where $u$ is the velocity field, $v$ the vorticity, $\mu$ the viscosity and $f$ a forcing term (see Appx. E). We create a dataset of solutions for the vorticity using Gaussian random fields as initial conditions. Due to the incompressibility condition, it is natural to solve this PDE with periodic boundary conditions corresponding to the topology of a 2-torus $\mathbb{T}^{2}$ - implying equivariance to periodic translation.

• Shallow-water on $S^{2}$. The global shallow-water equations model large-scale oceanic and atmospheric flow on the globe, derived from Navier-Stokes under the assumption of shallow fluid depth. The global shallow-water equations (see Appx. E) include terms for Coriolis acceleration, which makes this problem equivariant to rotation about the globe’s axis of rotation. We follow the IVP specified by [16], and create a dataset of paired vorticity-fluid height solutions.

• Internally-heated convection in a 3D ball. We solve the Boussinesq equation for internally heated convection in a ball, a model relevant for example in the context of the Earth’s mantle convection. It involves continuity equations for mass conservation, momentum equations for fluid flow under pressure, viscous forces and buoyancy, and a term modelling heat transfer. We generate initial conditions varying the internal temperature using $N(0,1)$ noise and obtain solutions for the temperature defined over a regular spherical $\phi,\theta,r$ grid.

Evaluation.

All reported MSE values are for predictions obtained given only the initial condition $\nu_{0}$, with standard deviation over 3 runs. We evaluate two settings, for both train and test sets: a generalization setting with time evolution happening within the horizon seen during training ($t_{\text{in}}$), and an extrapolation setting with the time evolution happening outside the horizon seen during training ($t_{\text{out}}$). For both cases we measure the mean-squared error (MSE). To position our work relative to competitive data-driven PDE solvers, on the 2D Navier-Stokes experiment we provide comparisons with a range of baselines. In most other settings these models cannot straightforwardly be applied, and we only compare to [49], to our knowledge the only other fully continuous PDE solving method in the literature.

Equivariance properties - heat equation on the plane.

To verify our framework respects the posed equivariance constraints, we create a dataset of solutions to the heat equation that requires a neural solver to respect equivariance constraints to achieve good performance. Specifically, for initial conditions we randomly insert a pulse of variable intensity in $x=(x_{1},x_{2})\in\mathbb{R}^{2}$ s.t. $-1{<}x_{1}{<}1,0{<}x_{2}{<}1$ for the training data and $-1{<}x_{1}{<}1,-1{<}x_{2}{<}0$ for the test data. Intuitively, train and test sets contain spikes under different disjoint sets of roto-translations (see Fig. 4). We train variants of our framework with ( $\mathbf{a}^{\rm SE(2)}$ , Eq. 6) and without ( $\mathbf{a}^{\emptyset}$ , Eq. 15) equivariance constraints. In this dataset, we set $t_{\text{in}}=\left[0,...,9\right]$ , and evaluation horizon $t_{\text{out}}=\left[10,...,20\right]$ . Results in Tab. 1 show that the non-equivariant model, as well as the baseline [49] are unable to successfully solve test initial conditions, whereas the equivariant model performs well.

Table 2: MSE $\downarrow$ for Navier-Stokes on $\mathbb{T}^{2}$.

	$t_{\text{in}}$ train	$t_{\text{out}}$ train	$t_{\text{in}}$ test	$t_{\text{out}}$ test
	100% of $\nu_{0}$ observed
CNODE [2]	6.02E-02	3.35E-01	5.48E-02	3.17E-01
FNO	9.43E-05	2.11E-03	8.44E-05	1.60E-03
G-FNO	3.13E-05	3.49E-04	3.15E-05	3.52E-04
DINo [49]	8.20E-03	6.85E-02	1.11E-02	9.08E-02
Ours AD, $\mathbf{a}^{\mathbb{T}_{2}/\pi}$	5.60 $\pm$ 0.43E-02	0.37 $\pm$ 0.34E-01	6.75 $\pm$ 0.62E-02	4.00 $\pm$ 0.38E-01
Ours $\mathbf{a}^{\emptyset}$	1.41 $\pm$ 1.83E-02	1.67 $\pm$ 1.27E-01	2.60 $\pm$ 3.16E-02	2.14 $\pm$ 1.46E-01
Ours $\mathbf{a}^{\mathbb{T}_{2}/\pi}$	1.45 $\pm$ 0.08E-03	9.14 $\pm$ 0.36E-03	1.57 $\pm$ 0.09E-03	1.16 $\pm$ 0.14E-02
	50% of $\nu_{0}$ observed
CNODE [2]	1.38E-01	6.33E-01	1.52E-01	6.76E-01
FNO	3.31E-02	1.39E-01	3.20E-02	1.47E-01
G-FNO	2.75E-02	1.17E-01	2.32E-02	1.01E-01
DINo [49]	3.67E-02	2.81E-01	3.74E-02	2.83E-01
Ours AD, $\mathbf{a}^{\mathbb{T}_{2}/\pi}$	6.89 $\pm$ 2.68E-02	3.95 $\pm$ 2.18E-01	7.01 $\pm$ 3.56E-02	4.01 $\pm$ 2.29E-01
Ours $\mathbf{a}^{\emptyset}$	1.05 $\pm$ 0.04E-02	1.45 $\pm$ 0.01E-01	2.60 $\pm$ 3.16E-02	2.14 $\pm$ 1.46E-01
Ours $\mathbf{a}^{\mathbb{T}_{2}/\pi}$	1.50 $\pm$ 0.17E-03	8.97 $\pm$ 1.57E-03	5.75 $\pm$ 2.58E-03	5.03 $\pm$ 2.63E-02
	5% of $\nu_{0}$ observed
CNODE [2]	1.23E+01	2.14E+01	1.20E+01	4.35E+01
FNO	4.13E-01	7.70E-01	3.84E-01	7.07E-01
G-FNO	3.56E-01	7.09E-01	3.40E-01	6.47E-01
DINo [49]	3.67E-02	2.81E-01	3.94E-02	2.91E-01
Ours AD, $\mathbf{a}^{\mathbb{T}_{2}/\pi}$	6.89 $\pm$ 2.68E-02	3.95 $\pm$ 2.18E-01	7.01 $\pm$ 3.56E-02	4.01 $\pm$ 2.29E-01
Ours $\mathbf{a}^{\emptyset}$	7.31 $\pm$ 1.37E-02	2.97 $\pm$ 2.42E-01	7.96 $\pm$ 1.65E-02	3.35 $\pm$ 3.41E-01
Ours $\mathbf{a}^{\mathbb{T}_{2}/\pi}$	3.19 $\pm$ 1.07E-02	1.33 $\pm$ 0.35E-01	3.44 $\pm$ 1.43E-02	1.61 $\pm$ 4.93E-01

Robustness to subsampling & time-horizons - Navier-Stokes on the 2-Torus.

We perform an experiment assessing the impact of equivariance constraints and meta-learning on robustness to sparse test-time observations of the initial condition. To this end, we train three models on a fully-observed train set: one with equivariance constraints ($\mathbf{a}^{\mathbb{T}^{2}}$, Eq. 7), one without ($\mathbf{a}^{\emptyset}$, Eq. 15), and one with equivariance constraints but without meta-learning (AD $\mathbf{a}^{\mathbb{T}^{2}}$, Eq. 7). The training horizon is $t_{\text{in}}=\left[0,...,9\right]$ and the evaluation horizon is $t_{\text{out}}=\left[10,...,20\right]$. Subsequently, we apply the trained models to the problem of solving from sparse initial conditions $v_{0}$, with observation rates at which $50\%$ and $5\%$ of the initial condition is observed (Tab. 2). Approaches operating on discrete (CNODE [2]) and regular grids (FNO [29], G-FNO [20]) perform very well when evaluated on fully-observed regular grids, outperforming continuous approaches (ours, [49]). However, we note that all discrete/regular models greatly deteriorate in performance when observation rates decrease. Equivariance constraints and meta-learning clearly improve performance overall, achieving the best performance in all sparse settings. Our proposed framework performs competitively with discrete baselines and other NeF-based PDE solving methods [49] in the fully observed setting. To qualitatively assess long-term stability well beyond the train horizon, we visualize a test trajectory and the solution found by our model for $t_{\text{in}}=\left[0,...,9\right],t_{\text{out}}=\left[10,...,20\right]$ and beyond in Fig. 6.
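The text does not spell out how the sparse observations of $v_{0}$ are drawn. A minimal sketch, assuming uniform random subsampling without replacement of (coordinate, value) pairs, is given below; `subsample_observations` and its arguments are hypothetical names introduced for illustration.

```python
import random


def subsample_observations(coords, values, rate, seed=0):
    """Keep a random fraction `rate` of the observed (coordinate, value) pairs.

    coords: list of grid coordinates, values: matching list of field values.
    Assumption: observations are dropped uniformly at random.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must lie in (0, 1]")
    rng = random.Random(seed)
    n_keep = max(1, round(rate * len(coords)))
    # Sorting keeps the retained points in their original order.
    keep = sorted(rng.sample(range(len(coords)), n_keep))
    return [coords[i] for i in keep], [values[i] for i in keep]
```

Because a neural field is queried per coordinate, the retained pairs can be passed to the latent-fitting step unchanged, whereas grid-based baselines need the missing cells to be filled in some way.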

Data-efficiency - Diffusion on the sphere.

To assess the impact of equivariance on data efficiency, we vary the size of the training set of heat equation solutions from 16 to 64 trajectories and apply a model with ($\mathbf{a}^{\rm SO(3)}$, Eq. 13) and without ($\mathbf{a}^{\emptyset}$, Eq. 15) equivariance constraints. In this dataset, we set the training horizon $t_{\text{in}}=\left[0,...,9\right]$ and the evaluation horizon $t_{\text{out}}=\left[10,...,20\right]$. We visualize $t_{\text{in}}$ test and train MSE in Fig. 6. These results show the non-equivariant model overfitting the training set for smaller numbers of trajectories while remaining unable to solve the PDE satisfactorily, whereas the equivariant model generalizes well even with only 16 training trajectories.

Table 3: MSE $\downarrow$ on Shallow-Water equations on the sphere.

	$t_{\text{in}}$ train	$t_{\text{out}}$ train	$t_{\text{in}}$ test	$t_{\text{out}}$ test
DINo [49]	2.94E-03	7.56E-02	3.06E-03	7.78E-02
Ours $\mathbf{a}^{\mathbb{B}^{3}}$	5.79 $\pm$ 0.17E-04	7.72 $\pm$ 0.55E-03	5.99 $\pm$ 0.15E-04	7.97 $\pm$ 0.46E-03

Super-resolution - Shallow-Water on the sphere.

Due to their continuous nature, NeF-based approaches inherently support zero-shot super-resolution. In this setting, we generate a set of solutions for the global shallow-water equations over $\mathbb{S}^{2}$ at $2\times$ resolution, and apply mean-pooling with a kernel size of 2 to obtain a low-resolution dataset. We train a model that respects rotational symmetries along the rotation axis of the globe ($\mathbf{a}^{\rm SW}$, Eq. 8) at train resolution, and evaluate the model by solving initial conditions at $2\times$ resolution (Tab. 4.1, Fig. 7). In this dataset, we set the training horizon $t_{\text{in}}=\left[0,...,9\right]$ and the evaluation horizon $t_{\text{out}}=\left[10,...,14\right]$. First, we note that our model has difficulty capturing the dynamics near $t_{\text{out}}$ and beyond the training horizon, i.e. $t>9$. We suspect this is because of an accumulation of reconstruction errors impacting the ability of $F_{\psi}$ to model the relatively volatile dynamics of these equations. This points to a drawback of NeF-based solvers: error accumulation starts with the reconstruction error on the initial condition. Across our experiments, we found that this error can be reduced by increasing model capacity, at a steep cost in computational complexity attributable to the global attention operator in the ENF backbone. Regarding super-resolution, the model is able to solve the high-resolution initial conditions without significantly increased MSE, and it does not produce significant artefacts in the process.
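The construction of the low-resolution dataset can be sketched as follows. This is a minimal illustration that assumes non-overlapping $2\times 2$ windows (stride 2) over a state stored as a nested list on a regular grid; `mean_pool_2x2` is a hypothetical helper, and the stride is an assumption since only the kernel size is stated.

```python
def mean_pool_2x2(field):
    """Downsample a 2D grid (list of rows) by averaging 2x2 blocks.

    Assumption: windows do not overlap, so an H x W grid becomes H/2 x W/2.
    """
    height, width = len(field), len(field[0])
    if height % 2 or width % 2:
        raise ValueError("grid height and width must be even")
    return [
        [
            (field[i][j] + field[i][j + 1]
             + field[i + 1][j] + field[i + 1][j + 1]) / 4.0
            for j in range(0, width, 2)
        ]
        for i in range(0, height, 2)
    ]
```

The model is then trained on the pooled states and queried at the coordinates of the original high-resolution grid at test time.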

Challenging geometries - Internally heated convection in 3D ball.

We show the value of inductive biases in modelling over a challenging geometry. We apply an equivariant model ( $\mathbf{a}^{\mathbb{B}^{3}}$ , Eq. 14) to a set of solutions to Boussinesq internally heated convection in a ball defined over a regular $\phi,\theta,r$ -grid, where we set $t_{\text{in}}=\left[0,...,9\right]$ , and evaluation horizon $t_{\text{out}}=\left[10,...,14\right]$ . Results (Tab. 4, Fig. 8) for our equivariant model show good generalization compared to a non-equivariant baseline [49]. We interpret this as an indication of a marked reduction in solving-complexity when correctly accounting for a PDE’s symmetries.

5 Conclusion

We introduce a novel equivariant space-time continuous framework for solving partial differential equations (PDEs). Uniquely, our method handles sparse or irregularly sampled observations of the initial state while respecting symmetry constraints and boundary conditions of the underlying PDE. We clearly show the benefit of symmetry preservation over a range of challenging tasks, where existing methods fail to capture the underlying dynamics.

References

Auzina et al. [2024] Ilze Amanda Auzina, Çağatay Yıldız, Sara Magliacane, Matthias Bethge, and Efstratios Gavves. Modulated neural odes. Advances in Neural Information Processing Systems, 36, 2024.
Ayed et al. [2020] Ibrahim Ayed, Emmanuel De Bezenac, Arthur Pajot, and Patrick Gallinari. Learning the spatio-temporal dynamics of physical processes from partial observations. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3232–3236. IEEE, 2020.
Bauer et al. [2023] Matthias Bauer, Emilien Dupont, Andy Brock, Dan Rosenbaum, Jonathan Richard Schwarz, and Hyunjik Kim. Spatial functa: Scaling functa to imagenet classification and generation. arXiv preprint arXiv:2302.03130, 2023.
Bekkers [2019] Erik J Bekkers. B-spline cnns on lie groups. In International Conference on Learning Representations, 2019.
Bekkers et al. [2023] Erik J Bekkers, Sharvaree Vadgama, Rob D Hesselink, Putri A van der Linden, and David W Romero. Fast, expressive se $(n)$ equivariant networks through weight-sharing in position-orientation space. arXiv preprint arXiv:2310.02970, 2023.
Brandstetter et al. [2021] Johannes Brandstetter, Rob Hesselink, Elise van der Pol, Erik J Bekkers, and Max Welling. Geometric and physical quantities improve e (3) equivariant message passing. arXiv preprint arXiv:2110.02905, 2021.
Brandstetter et al. [2022a] Johannes Brandstetter, Rianne van den Berg, Max Welling, and Jayesh K Gupta. Clifford neural layers for pde modeling. arXiv preprint arXiv:2209.04934, 2022a.
Brandstetter et al. [2022b] Johannes Brandstetter, Daniel Worrall, and Max Welling. Message passing neural pde solvers. arXiv preprint arXiv:2202.03376, 2022b.
Bronstein et al. [2021] Michael M Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478, 2021.
Burns et al. [2020] Keaton J. Burns, Geoffrey M. Vasil, Jeffrey S. Oishi, Daniel Lecoanet, and Benjamin P. Brown. Dedalus: A flexible framework for numerical simulations with spectral methods. Physical Review Research, 2(2):023068, April 2020. doi: 10.1103/PhysRevResearch.2.023068.
Chen et al. [2018] Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. Advances in neural information processing systems, 31, 2018.
Cohen and Welling [2016] Taco Cohen and Max Welling. Group equivariant convolutional networks. In International conference on machine learning, pages 2990–2999. PMLR, 2016.
Dupont et al. [2022] Emilien Dupont, Hyunjik Kim, SM Eslami, Danilo Rezende, and Dan Rosenbaum. From data to functa: Your data point is a function and you can treat it like one. arXiv preprint arXiv:2201.12204, 2022.
Finn et al. [2017] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In International conference on machine learning, pages 1126–1135. PMLR, 2017.
Finzi et al. [2020] Marc Finzi, Samuel Stanton, Pavel Izmailov, and Andrew Gordon Wilson. Generalizing convolutional neural networks for equivariance to lie groups on arbitrary continuous data. In International Conference on Machine Learning, pages 3165–3176. PMLR, 2020.
Galewsky et al. [2004] Joseph Galewsky, Richard K Scott, and Lorenzo M Polvani. An initial-value problem for testing numerical models of the global shallow-water equations. Tellus A: Dynamic Meteorology and Oceanography, 56(5):429–440, 2004.
Gilmer et al. [2017] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In International conference on machine learning, pages 1263–1272. PMLR, 2017.
Greydanus et al. [2019] Samuel Greydanus, Misko Dzamba, and Jason Yosinski. Hamiltonian neural networks. Advances in neural information processing systems, 32, 2019.
Guo et al. [2016] Xiaoxiao Guo, Wei Li, and Francesco Iorio. Convolutional neural networks for steady flow approximation. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 481–490, 2016.
Helwig et al. [2023] Jacob Helwig, Xuan Zhang, Cong Fu, Jerry Kurtin, Stephan Wojtowytsch, and Shuiwang Ji. Group equivariant fourier neural operators for partial differential equations. In Proceedings of the 40th International Conference on Machine Learning, PMLR 202, 2023.
Hernández et al. [2021] Quercus Hernández, Alberto Badías, David González, Francisco Chinesta, and Elías Cueto. Structure-preserving neural networks. Journal of Computational Physics, 426:109950, 2021.
Jin et al. [2020] Pengzhan Jin, Zhen Zhang, Aiqing Zhu, Yifa Tang, and George Em Karniadakis. Sympnets: Intrinsic structure-preserving symplectic networks for identifying hamiltonian systems. Neural Networks, 132:166–179, 2020.
Kingma and Ba [2014] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
Kipf and Welling [2016] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
Knigge et al. [2022] David M Knigge, David W Romero, and Erik J Bekkers. Exploiting redundancy: Separable group convolutional networks on lie groups. In International Conference on Machine Learning, pages 11359–11386. PMLR, 2022.
Kofinas et al. [2023] Miltiadis Miltos Kofinas, Erik Bekkers, Naveen Nagaraja, and Efstratios Gavves. Latent field discovery in interacting dynamical systems with neural fields. Advances in Neural Information Processing Systems, 36, 2023.
Kovachki et al. [2021] Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces. arXiv preprint arXiv:2108.08481, 2021.
Li et al. [2017] Zhenguo Li, Fengwei Zhou, Fei Chen, and Hang Li. Meta-sgd: Learning to learn quickly for few-shot learning. arXiv preprint arXiv:1707.09835, 2017.
Li et al. [2020] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895, 2020.
Liu et al. [2023] Yongtuo Liu, Sara Magliacane, Miltiadis Kofinas, and Efstratios Gavves. Graph switching dynamical systems. In International Conference on Machine Learning, pages 21867–21883. PMLR, 2023.
Liu et al. [2024] Yongtuo Liu, Sara Magliacane, Miltiadis Kofinas, and Efstratios Gavves. Amortized equation discovery in hybrid dynamical systems, 2024.
Moser et al. [2023] Philipp Moser, Wolfgang Fenz, Stefan Thumfart, Isabell Ganitzer, and Michael Giretzlehner. Modeling of 3d blood flows with physics-informed neural networks: Comparison of network architectures. Fluids, 8(2):46, 2023.
Nichol et al. [2018] Alex Nichol, Joshua Achiam, and John Schulman. On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999, 2018.
Papa et al. [2023] Samuele Papa, David M Knigge, Riccardo Valperga, Nikita Moriakov, Miltos Kofinas, Jan-Jakob Sonke, and Efstratios Gavves. Neural modulation fields for conditional cone beam neural tomography. arXiv preprint arXiv:2307.08351, 2023.
Park et al. [2019] Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 165–174, 2019.
Perez et al. [2018] Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, and Aaron Courville. Film: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
Pervez et al. [2024] Adeel Pervez, Francesco Locatello, and Efstratios Gavves. Mechanistic neural networks for scientific machine learning. arXiv preprint arXiv:2402.13077, 2024.
Pfaff et al. [2020] Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter W Battaglia. Learning mesh-based simulation with graph networks. arXiv preprint arXiv:2010.03409, 2020.
Prasthofer et al. [2022] Michael Prasthofer, Tim De Ryck, and Siddhartha Mishra. Variable-input deep operator networks. arXiv preprint arXiv:2205.11404, 2022.
Satorras et al. [2021] Víctor Garcia Satorras, Emiel Hoogeboom, and Max Welling. E (n) equivariant graph neural networks. In International conference on machine learning, pages 9323–9332. PMLR, 2021.
Sitzmann et al. [2020] Vincent Sitzmann, Eric Chan, Richard Tucker, Noah Snavely, and Gordon Wetzstein. Metasdf: Meta-learning signed distance functions. Advances in Neural Information Processing Systems, 33:10136–10147, 2020.
Stokes et al. [1851] George Gabriel Stokes et al. On the effect of the internal friction of fluids on the motion of pendulums. 1851.
Tancik et al. [2020] Matthew Tancik, Pratul Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan Barron, and Ren Ng. Fourier features let networks learn high frequency functions in low dimensional domains. Advances in neural information processing systems, 33:7537–7547, 2020.
Tancik et al. [2021] Matthew Tancik, Ben Mildenhall, Terrance Wang, Divi Schmidt, Pratul P Srinivasan, Jonathan T Barron, and Ren Ng. Learned initializations for optimizing coordinate-based neural representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2846–2855, 2021.
Valperga et al. [2022] Riccardo Valperga, Kevin Webster, Dmitry Turaev, Victoria Klein, and Jeroen Lamb. Learning reversible symplectic dynamics. In Proceedings of The 4th Annual Learning for Dynamics and Control Conference, volume 168 of Proceedings of Machine Learning Research, pages 906–916. PMLR, 23–24 Jun 2022.
Van der Maaten and Hinton [2008] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(11), 2008.
Weiler and Cesa [2019] Maurice Weiler and Gabriele Cesa. General e (2)-equivariant steerable cnns. Advances in neural information processing systems, 32, 2019.
Wessels et al. [2024] David R Wessels, David M Knigge, Samuele Papa, Riccardo Valperga, Efstratios Gavves, and Erik J Bekkers. Grounding continuous representations in geometry: Equivariant neural fields. ArXiv Preprint arXiv:, 2024.
Yin et al. [2022] Yuan Yin, Matthieu Kirchmeyer, Jean-Yves Franceschi, Alain Rakotomamonjy, and Patrick Gallinari. Continuous pde dynamics forecasting with implicit neural representations. arXiv preprint arXiv:2209.14855, 2022.
Zhang et al. [2023] Biao Zhang, Jiapeng Tang, Matthias Niessner, and Peter Wonka. 3dshape2vecset: A 3d shape representation for neural fields and generative diffusion models. ACM Transactions on Graphics (TOG), 42(4):1–16, 2023.
Zhdanov et al. [2024] Maksim Zhdanov, David Ruhe, Maurice Weiler, Ana Lucic, Johannes Brandstetter, and Patrick Forré. Clifford-steerable convolutional neural networks. arXiv preprint arXiv:2402.14730, 2024.
Zwicker [2020] David Zwicker. py-pde: A python package for solving partial differential equations. Journal of Open Source Software, 5(48):2158, 2020. doi: 10.21105/joss.02158. URL https://doi.org/10.21105/joss.02158.

Appendix A Related work

DL approaches to dynamics modelling

In recent years, the learning of spatiotemporal dynamics has been receiving significant attention, either for modelling interacting systems [31, 30], scientific Machine Learning [49, 8, 7, 37, 26, 51], or even videos [1]. Most DL methods for solving PDEs attempt to directly replace solvers with mappings between finite-dimensional Euclidean spaces, i.e. through the use of CNNs [19, 2] or GNNs [38, 8] often applied autoregressively to an observed (discretized) PDE state. Instead, the Neural Operator (NO) [27] paradigm attempts to learn infinite-dimensional operators, i.e. mappings between function spaces, with limited success. Fourier Neural Operator (FNO) [29] extends this method by performing convolutions in the spectral domain. FNO obtains much improved performance, but due to its reliance on FFT is limited to data on regular grids.

Inductive biases in DL and dynamics modelling

Geometric Deep Learning aims to improve model generalization and performance by constraining/designing a model’s space of learnable functions based on geometric principles. Prominent examples include Group Equivariant Convolutional Networks and Steerable CNNs [12, 4], generalizations of CNNs that respect symmetries of the data, such as dilations and continuous rotations [47, 15, 25]. Analogously, Graph Neural Networks (GNNs) [24] or Message Passing Neural Networks (MPNNs) [17] are variants of neural networks that respect the set-permutations naturally found in graph data. They are typically formulated for graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$, with nodes $i\in\mathcal{V}$ and edges $\mathcal{E}$. Typically, nodes are embedded into a node vector $f_{i}^{0}$, which is subsequently updated over multiple layers of message passing. Message passing consists of (1) computing messages $m_{i,j}$ over edges $(i,j)$ from node $j$ to $i$ with the message function, taking into account edge attributes $a_{i,j}$: $m_{i,j}=\phi_{m}(f_{i}^{l},f_{j}^{l},a_{i,j})$; (2) aggregating incoming messages: $m_{i}=\sum_{j\in\mathcal{N}(i)}m_{i,j}$; and (3) computing updated node features: $f_{i}^{l+1}=\phi_{u}(f_{i}^{l},m_{i})$.
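The three message-passing steps can be sketched in a few lines. This is an illustrative implementation on plain dictionaries; `message_passing_layer`, the `msg_dim` argument and the data layout are assumptions made for the example, and in practice $\phi_{m}$ and $\phi_{u}$ would be learnable networks rather than the callables passed in here.

```python
def message_passing_layer(features, edges, edge_attr, phi_m, phi_u, msg_dim):
    """One round of message passing on a graph.

    features:  dict node -> list of floats (f_i^l)
    edges:     list of (i, j) pairs; a message flows from node j to node i
    edge_attr: dict (i, j) -> edge attribute a_ij
    phi_m:     message function, phi_m(f_i, f_j, a_ij) -> list of floats
    phi_u:     update function, phi_u(f_i, m_i) -> list of floats
    msg_dim:   length of each message vector
    """
    # (1) + (2): compute messages and sum them per receiving node.
    aggregated = {i: [0.0] * msg_dim for i in features}
    for (i, j) in edges:
        m_ij = phi_m(features[i], features[j], edge_attr[(i, j)])
        aggregated[i] = [a + m for a, m in zip(aggregated[i], m_ij)]
    # (3): update every node from its own feature and its aggregated message.
    return {i: phi_u(features[i], aggregated[i]) for i in features}
```

Because the sum in step (2) does not depend on the order of the neighbours, the layer respects node permutations by construction.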

Recently, such methods have also been adapted for sparse physical data, e.g. for molecular property prediction [40, 6], where the GNN is additionally required to respect transformation symmetries. [5] unifies these approaches to equivariance under the guise of weight sharing over equivalence classes defined by bi-invariant attributes of pairs of nodes $i,j$, a viewpoint we leverage in constructing the equivariant conditioning latent $z_{t}^{\nu}$ corresponding to a PDE state $\nu_{t}$. In the context of dynamics modelling, equivariant architectures have been employed to incorporate various properties of physical systems in the modelling process; examples of such properties are the symplectic structure [22], discrete symmetries such as reversing symmetries [45], and energy conservation [18, 21].

Neural Fields in dynamics modelling

Conditional Neural Fields (NeFs) are a class of coordinate-based neural networks, often trained to reconstruct discretely-sampled input continuously. More specifically, a conditional neural field $f_{\theta}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{d}$ is a field, parameterized by a neural network with parameters $\theta$, that maps input coordinates $x\in\mathbb{R}^{n}$ in the data domain alongside conditioning latents $z$ to $d$-dimensional signal values $f(x)\in\mathbb{R}^{d}$. By associating a conditioning latent $z^{\nu}\in\mathbb{R}^{c}$ to each signal ${\nu}$, a single conditional NeF $f_{\theta}:\mathbb{R}^{n}\times\mathbb{R}^{c}\rightarrow\mathbb{R}^{d}$ can learn to represent families $\mathcal{D}$ of continuous signals such that $\forall\,\nu\in\mathcal{D}:f(\mathbf{x})\approx f_{\theta}(\mathbf{x};\mathbf{z}^{\nu})$. [13] showed the viability of using the latents $\mathbf{z}^{i}$ as representations for downstream tasks (e.g. classification, generation), proposing a framework for learning on neural fields. This framework inherits desirable properties of neural fields, such as inherent support for sparsely and/or irregularly sampled data, and independence of signal resolution. [49] propose to use conditional NeFs for PDE modelling by learning a continuous flow in the latent space of a conditional neural field. In particular, a set of latents $\{\mathbf{z}_{i}^{\nu}\}_{i=1}^{T}$ is obtained by fitting a conditional neural field to a given set of observations $\{\nu_{i}\}_{i=1}^{T}$ at timesteps $1,...,T$; simultaneously, a neural ODE [11] $F_{\psi}$ is trained to map pairs of temporally continuous latents s.t. solutions correspond to the trajectories traced by the learned latents. Though this approach yields impressive results for sparse and irregular data in planar PDEs, we show it breaks down on more challenging geometries. We hypothesize that this is due to the lack of a latent space that preserves the geometric transformations with respect to which the systems we model are symmetric, and as such propose an extension of this framework in which such symmetries are preserved.

Obtaining Neural Fields representations

Most NeF-based approaches to representation or reconstruction use SGD to optimize (a subset of) the parameters of the NeF, inevitably leading to significant overhead in inference; conditional NeFs require optimizing a (set of) latents from initialization to reconstruct a novel sample. Accordingly, research has explored ways of addressing this limitation. [41, 44] propose using Meta-Learning [14, 33] to optimize for an initialization for the NeF from which it is possible to reconstruct a novel sample in as few as 3 gradient descent steps. [13] propose to meta-learn the NeF backbone, but fix the initialization for the latent $\mathbf{z}$ and instead optimize the learning rate used in its optimization using Meta-SGD [28]. Recently, work has also explored the relation between initialization/optimization of a NeF and its value as downstream representation; [34] show that (1) using a shared NeF initialization and (2) limiting the number of gradient updates to the NeF improves performance in downstream tasks, as this simplifies the complex relation between a NeF’s parameter space and its output function space. We combine these insights and make Meta-Learning part of our equivariant PDE solving pipeline, as it enables fast inference and we show it to simplify the latent space of the ENF, improving performance of the neural ODE solver.
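To make the idea of fitting a latent in a handful of gradient steps concrete, the sketch below runs three gradient-descent steps on the latent of a toy field. It assumes a linear stand-in $f(x;z)=\sum_{k}z_{k}\,b_{k}(x)$ with fixed basis functions instead of a neural field, so that the MSE gradient can be written in closed form; `field`, `fit_latent` and the learning rate are hypothetical choices for illustration only.

```python
def field(z, basis, x):
    """Toy conditional field f(x; z) = sum_k z[k] * basis[k](x)."""
    return sum(zk * b(x) for zk, b in zip(z, basis))


def fit_latent(z_init, basis, coords, targets, lr=0.1, num_steps=3):
    """Fit the latent z to one signal with a few gradient-descent steps.

    Only z is updated; the backbone (here the fixed basis) stays untouched,
    mirroring the inner loop that reconstructs a novel sample.
    """
    z = list(z_init)
    n = len(coords)
    for _ in range(num_steps):
        residuals = [field(z, basis, x) - y for x, y in zip(coords, targets)]
        # Gradient of the MSE with respect to each latent entry.
        grad = [
            2.0 / n * sum(r * b(x) for r, x in zip(residuals, coords))
            for b in basis
        ]
        z = [zk - lr * gk for zk, gk in zip(z, grad)]
    return z
```

In a meta-learned setup the initialization `z_init` (and possibly `lr`) would itself be trained so that these few steps suffice; that outer loop is not shown here.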

Appendix B Pseudocode for optimization objective

See Alg. 1 for pseudocode of the training loop that we use, written for a single data sample for simplicity of notation. For simplicity, we further assume that an Euler stepper is used to solve the neural ODE, but this can be replaced by any solver. For inference, the procedure is identical, except that we do not perform gradient updates to $\theta,\psi$.

Algorithm 1 Optimization objective

Randomly initialize neural field $f_{\theta}$
Randomly initialize neural ODE $F_{\psi}$
while not done do
    Sample initial states and coordinates $\nu_{0}$.
    Initialize latents $z_{0}^{\nu}\leftarrow\{(p_{i},\mathbf{c}_{i})\}_{i=1}^{N}$
    for all step $\in N_{\text{initial state opt}}=3$ do
        $z_{0}^{\nu}\leftarrow z_{0}^{\nu}-\epsilon\nabla_{z_{0}^{\nu}}\mathcal{L}_{\text{mse}}\big(f_{\theta}(\cdot,z_{0}^{\nu}),\nu_{0}\big)$
    end for
    for all $t\in\left[1,...,t_{\text{in}}\right]$ do
        $z^{\nu}_{t}\leftarrow z^{\nu}_{0}+\int_{0}^{t}F_{\psi}(z^{\nu}_{\tau})d\tau$
    end for
    Update $\theta,\psi$ per:
        $\theta\leftarrow\theta-\eta\nabla_{\theta}\mathcal{L}_{\text{mse}}^{\prime}$
        $\psi\leftarrow\psi-\eta\nabla_{\psi}\mathcal{L}_{\text{mse}}^{\prime}$
    with $\mathcal{L}_{\text{mse}}^{\prime}=\mathcal{L}_{\text{mse}}\big(\big\{f_{\theta}(\cdot,z^{\nu}_{t}),\nu_{t}\big\}_{t=0}^{t_{\text{in}}}\big)$
end while
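The latent rollout in the second loop of Alg. 1 can be sketched with a forward-Euler stepper. The snippet treats the latent as a flat list of floats and takes the vector field as an arbitrary callable; `euler_rollout`, the step size and the flattening of the latent set are illustrative assumptions, and any other ODE solver could be substituted.

```python
def euler_rollout(z0, vector_field, num_steps, dt=1.0):
    """Integrate dz/dt = vector_field(z) with forward Euler.

    z0:           initial latent, flattened to a list of floats
    vector_field: callable standing in for the neural ODE F_psi
    Returns the trajectory [z_0, z_1, ..., z_num_steps].
    """
    z = list(z0)
    trajectory = [list(z)]
    for _ in range(num_steps):
        dz = vector_field(z)
        z = [zi + dt * dzi for zi, dzi in zip(z, dz)]
        trajectory.append(list(z))
    return trajectory
```

Each latent in the returned trajectory would then be decoded by the neural field and compared with the corresponding state $\nu_{t}$ to form the training loss.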

Appendix C Equivariant Neural Fields

ENF to reconstruct PDE states

For ease of notation we denote $\mathbf{P}$ and $\mathbf{C}$ the matrices containing poses and corresponding appearances stacked row-wise, i.e. $\mathbf{P}_{i,:}=p_{i}^{T}$ and $\mathbf{C}_{i,:}=\mathbf{c}_{i}^{T}$ . Furthermore, we denote $\mathbf{A}$ as the matrix containing all bi-invariants $\mathbf{a}_{i,m}$ stacked row-wise, i.e. $\mathbf{A}_{i,:}=\mathbf{a}_{i,m}^{T}$ :

$$f_{\theta}(\mathbf{x};{z}^{\nu_{t}}):=\operatorname{softmax}\bigg(\frac{\mathbf{Q}(\mathbf{A})\mathbf{K}^{T}(\mathbf{C})}{\sqrt{d_{k}}}+\mathbf{G}(\mathbf{A})\bigg)\mathbf{V}(\mathbf{C};\mathbf{A}),\tag{10}$$

where the softmax is applied over the latent set and $d_{k}$ is the hidden dimensionality of the ENF. The query matrix $\mathbf{Q}$ is constructed as $\mathbf{Q}{=}\mathbf{W}_{q}\gamma_{q}^{T}(\mathbf{A})$, with $\gamma_{q}$ a Gaussian RFF embedding [43] followed by a linear layer $\mathbf{W}_{q}$, i.e. $\mathbf{Q}$ consists of the RFF-embedded bi-invariants of the input coordinate $x_{m}$ and each of the latent poses $p_{i}$, stacked row-wise. The key matrix is given by a learnable linear transformation $\mathbf{W}_{k}$ of the context vectors $\mathbf{c}_{i}$: $\mathbf{K}{=}\mathbf{W}_{k}\mathbf{C}^{T}$. The attention coefficients that result from the inner product of $\mathbf{Q},\mathbf{K}$ are weighted by a Gaussian window $\mathbf{G}$ whose magnitude is conditioned on the distance between latent poses and input coordinates as $\mathbf{G}_{i}=\sigma_{\text{att}}(||p_{i}-\mathbf{x}||^{2})$, with $\sigma_{\text{att}}$ a hyperparameter that determines the locality of each of the latents. Finally, the value matrix is calculated as a learnable linear transformation $\mathbf{W}_{v}$ of the appearances $\mathbf{C}$, conditioned through FiLM modulation [36] by a second RFF embedding of the relative poses, split into scale and shift modulations: $\mathbf{V}{=}\mathbf{W}_{v}\mathbf{C}\odot\mathbf{W}_{v_{\alpha}}\gamma_{v_{\alpha}}(\mathbf{A})+\mathbf{W}_{v_{\beta}}\gamma_{v_{\beta}}(\mathbf{A})$. The latents $z^{\nu}_{t}$ are optimized for a single state $\nu_{t}$, whereas the parameters $\theta$ of the ENF backbone, which consist of all the learnable parameters of the linear layers $\mathbf{W}_{q},\mathbf{W}_{k},\mathbf{W}_{v},\mathbf{W}_{v_{\alpha}},\mathbf{W}_{v_{\beta}}$ used to construct $\mathbf{Q},\mathbf{K},\mathbf{V}$, are shared over all states.

The overall architecture consists of a linear layer $\mathbf{W}:\mathbb{R}^{c}\rightarrow\mathbb{R}^{d}$ applied to $\mathbf{c}_{i}\in\mathbb{R}^{c}$, followed by a layernorm. After this, the cross-attention described above is applied, followed by three $d\text{-dim}$ linear layers, the final one mapping to the output dimension $\mathbb{R}^{\text{out}}$.
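As a rough illustration of the cross-attention in Eq. 10, the sketch below evaluates a single-head version at one query coordinate using plain Python lists. It is not the ENF implementation: the input linear layer, layernorm and output layers are omitted, values are computed from the context vectors following $\mathbf{V}(\mathbf{C};\mathbf{A})$, the Gaussian window is assumed to enter the logits as $-\sigma_{\text{att}}\|p_{i}-\mathbf{x}\|^{2}$ so that nearby latents receive more weight, and all names (`matvec`, `rff`, `enf_attention`, the keys of `params`) are hypothetical.

```python
import math


def matvec(W, v):
    """Multiply a matrix W (list of rows) with a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rff(a, B):
    """Random Fourier features [cos(Ba), sin(Ba)] for a frequency matrix B."""
    proj = matvec(B, a)
    return [math.cos(t) for t in proj] + [math.sin(t) for t in proj]


def enf_attention(x, poses, contexts, params, bi_invariant, sigma_att=1.0):
    """Evaluate a toy ENF-style cross-attention at one coordinate x.

    poses, contexts: the latent set {(p_i, c_i)}.
    params: weight matrices W_q, W_k, W_v, W_va, W_vb and RFF frequency
            matrices B_q, B_va, B_vb, all given as lists of rows.
    bi_invariant: callable (p_i, x) -> attribute a_i used for conditioning.
    """
    d_k = len(params["W_k"])
    logits, values = [], []
    for p, c in zip(poses, contexts):
        a = bi_invariant(p, x)
        q = matvec(params["W_q"], rff(a, params["B_q"]))   # query from a_i
        k = matvec(params["W_k"], c)                       # key from c_i
        # Gaussian window on the positional part of the pose (assumed sign).
        dist2 = sum((pi - xi) ** 2 for pi, xi in zip(p, x))
        score = sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
        logits.append(score - sigma_att * dist2)
        # FiLM-modulated value: scale and shift depend on the bi-invariant.
        scale = matvec(params["W_va"], rff(a, params["B_va"]))
        shift = matvec(params["W_vb"], rff(a, params["B_vb"]))
        v = matvec(params["W_v"], c)
        values.append([vi * si + bi for vi, si, bi in zip(v, scale, shift)])
    # Softmax over the latent set, then a weighted sum of the values.
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]
```

Since the coordinate $\mathbf{x}$ only enters through `bi_invariant` and the pose-to-coordinate distance, transforming poses and coordinate jointly by a group element that leaves both unchanged leaves the output unchanged as well, which is the property discussed in the next paragraphs.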

Equivariance follows from sharing $\mathbf{Q},\mathbf{K},\mathbf{V}$ over equivalence classes

Note that the latent space of the ENF is equipped with a group action: $gz^{\nu}_{t}=\{(gp_{i},\mathbf{c}_{i})\}_{i=1}^{N}$. As an example, $\rm SE(2)$-equivariance of the ENF follows from bi-invariance of the quantity $\mathbf{a}$ used to construct $\mathbf{Q}$ under the group action:

$$\forall g\in SE(n):\;\;(p_{i},\mathbf{x})\;\mapsto\;(g\,p_{i},g\,\mathbf{x})\;\;\;\Leftrightarrow\;\;\;p_{i}^{-1}\mathbf{x}\;\mapsto\;(g\,p_{i})^{-1}g\,\mathbf{x}=p_{i}^{-1}g^{-1}g\,\mathbf{x}=p_{i}^{-1}\mathbf{x}.\tag{11}$$

And so, constructing the matrix containing the relative poses of bi-transformed poses and coordinates $(g\mathbf{P})^{-1}g\mathbf{x}$ as $((g\mathbf{P})^{-1}g\mathbf{x})_{i,:}=p_{i}^{-1}g^{-1}g\mathbf{x}=p_{i}^{-1}\mathbf{x}$, we trivially have:

$$\forall g\in SE(n):(p_{i},\mathbf{x})\mapsto(g\,p_{i},g\,\mathbf{x})\;\;\;\Leftrightarrow\;\;\;\mathbf{Q}(\mathbf{A})\mapsto\mathbf{Q}(g\mathbf{A})=\mathbf{Q}(\mathbf{A}).\tag{12}$$
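The bi-invariance argument can be checked numerically for $\rm SE(2)$. The sketch below assumes poses are stored as (translation, angle) triples and that the relative pose $p^{-1}\mathbf{x}$ is the coordinate expressed in the frame of the pose; the function names are hypothetical and the parameterization is an assumption made for this example.

```python
import math


def se2_act_point(g, x):
    """Apply g = (tx, ty, theta) in SE(2) to a point x = (x1, x2)."""
    tx, ty, theta = g
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1] + tx, s * x[0] + c * x[1] + ty)


def se2_act_pose(g, p):
    """Left-multiply a pose p = (px, py, phi) by g: rotate/translate, add angles."""
    px, py = se2_act_point(g, (p[0], p[1]))
    return (px, py, p[2] + g[2])


def bi_invariant(p, x):
    """Relative position p^{-1} x, i.e. x expressed in the frame of pose p."""
    dx, dy = x[0] - p[0], x[1] - p[1]
    c, s = math.cos(p[2]), math.sin(p[2])
    # Multiply by the transpose (inverse) of the pose's rotation matrix.
    return (c * dx + s * dy, -s * dx + c * dy)


p = (0.3, -0.2, 0.7)
x = (1.0, 0.5)
g = (2.0, -1.0, 1.1)
before = bi_invariant(p, x)
after = bi_invariant(se2_act_pose(g, p), se2_act_point(g, x))
print(before, after)  # the two tuples agree up to floating-point error
```

Any quantity built only from such attributes, including $\mathbf{Q}(\mathbf{A})$, is therefore unchanged when poses and coordinates are transformed together.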

Appendix D Defining additional bi-invariant attributes

Other examples of the bi-invariant attributes used in the experiments section are listed here.

Full rotation symmetries on the 2-sphere. For the global shallow water equations we defined $\mathbf{a}^{\rm SW}$ as an attribute that is bi-invariant only to rotations about the globe’s axis, i.e. rotations over $\phi$. In our experiments we also solve diffusion over the sphere, which is fully $\mathrm{SO(3)}$ rotationally symmetric. To achieve equivariance to full 3D rotations, we take poses $p\in{\rm SO(3)}$ parameterized by Euler angles, which act on points $x\in S^{2}$ parameterized by 3D unit vectors $\mathbf{x}$ through 3D rotation matrices, allowing us to calculate the bi-invariant $p^{-1}x$:

$$\mathbf{a}^{\rm SO(3)}_{i,m}=\mathbf{R}_{i}\mathbf{x}_{m}.\tag{13}$$

This bi-invariant is used in our experiments for diffusion on the 2-sphere.

The 3D ball $\mathbb{B}^{3}$. We experiment with the Boussinesq equation for internally heated convection in a ball. The PDE is fully rotationally symmetric, but since the heat source $K$ is at a fixed point (the center of the ball), it is not symmetric to translations of the initial conditions within the ball. As such, we let $p\in\mathrm{SO(3)}\times\mathbb{R}$, parameterized by $\phi,\theta,\gamma,r$ s.t. $0<r<1$. The PDE is defined over spherical coordinates $(\phi,\theta,r)$, which we map to vectors $\mathbf{x}\in\mathbb{R}^{3}$. We then use the following bi-invariant, which is only symmetric to rotations in $\mathrm{SO(3)}$:

$$\mathbf{a}^{\mathbb{B}^{3}}_{i,m}=\mathbf{R}_{i}\mathbf{x}_{m}\oplus r_{p_{i}}\oplus r_{x_{m}}.\tag{14}$$

No transformation symmetries. A simple "bi-invariant" for this setting that preserves all geometric information is given by simply concatenating coordinates $p$ with coordinates $x$ :

\mathbf{a}^{\emptyset}_{i,m}=p_{i}\oplus x_{m}

(15)

Parameterizing the cross-attention operation in Eq. 5 as a function of this bi-invariant results in a framework without any equivariance constraints. We use this in experiments to ablate the equivariance constraints and measure their impact on performance.

Appendix E Experimental Details

E.1 Dataset creation

To create the datasets of PDE solutions we used py-pde [52] for Navier-Stokes and the diffusion equation on the plane. For the shallow-water equation and the diffusion equation on the sphere, as well as the internally heated convection in a 3D ball, we used Dedalus [10].

Diffusion on the plane.

For the diffusion equation on the plane we use as initial conditions narrow spikes centred at random locations in the left half of the domain for the train set, and in the right half of the domain for the test set. States are defined on a 64 $\times$ 64 grid ranging from -3 to 3. Spike locations are sampled uniformly between -2 and 2 for $x$ and between 0 and 2 for $y$ in the training set, and between -2 and 2 for $x$ and between -2 and 0 for $y$ in the test set. A value sampled uniformly between 5.0 and 5.5 is inserted at the sampled location. We solve the equation with an Euler solver for 27 steps, discarding the first 7, with a timestep $dt=0.01$ . We generate 1024 training and 128 test trajectories.
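The generation procedure can be sketched with a hand-rolled explicit Euler scheme. This is not the py-pde setup used for the actual dataset. The diffusivity `D`, the periodic boundary handling and the nearest-cell placement of the spike are assumptions made only so the example runs; the grid size, sampling ranges, timestep and step counts follow the description above.

```python
import random

N, LO, HI = 64, -3.0, 3.0
DX = (HI - LO) / N
DT, STEPS, DISCARD = 0.01, 27, 7
D = 0.1  # assumed diffusivity (not stated in the text); satisfies D*DT/DX**2 < 0.25

def initial_condition(rng, y_range):
    """Zero field with a single spike at a uniformly sampled location."""
    u = [[0.0] * N for _ in range(N)]
    x0 = rng.uniform(-2.0, 2.0)
    y0 = rng.uniform(*y_range)
    i = min(N - 1, int((x0 - LO) / DX))  # first index follows x
    j = min(N - 1, int((y0 - LO) / DX))  # second index follows y
    u[i][j] = rng.uniform(5.0, 5.5)
    return u

def euler_step(u):
    """One explicit Euler step with a 5-point Laplacian and periodic wrap-around."""
    new = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            lap = (u[(i + 1) % N][j] + u[i - 1][j]
                   + u[i][(j + 1) % N] + u[i][j - 1] - 4.0 * u[i][j]) / DX ** 2
            new[i][j] = u[i][j] + DT * D * lap
    return new

def trajectory(rng, y_range):
    """Run STEPS Euler steps and keep the states after the first DISCARD ones."""
    u = initial_condition(rng, y_range)
    states = []
    for _ in range(STEPS):
        u = euler_step(u)
        states.append(u)
    return states[DISCARD:]

rng = random.Random(0)
train_traj = trajectory(rng, (0.0, 2.0))   # training range for y
test_traj = trajectory(rng, (-2.0, 0.0))   # test range for y
print(len(train_traj), len(train_traj[0]), len(train_traj[0][0]))
```

Discarding the first steps lets the single-cell spike spread into a smooth bump before states are recorded.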

Navier-Stokes on the flat 2-torus.

For Navier-Stokes on the flat 2-torus we use Gaussian random fields as initial conditions and solve the PDE using a Crank-Nicolson method with timestep $dt=1.0$ for 20 steps. The PDE is

		$\displaystyle\frac{dv}{dt}=-u\cdot\nabla v+\mu\Delta v+f$
		$\displaystyle v=\nabla\times u$
		$\displaystyle\nabla\cdot u=0,$

where $u$ is the velocity field, $v$ the vorticity, $\mu$ the viscosity and $f$ a forcing term. States are defined on a 64 $\times$ 64 grid. We generate 8192 training and 512 test trajectories.

Diffusion on the 2-sphere.

For the diffusion dataset on the sphere, states are defined over a $128\times 64$ $\phi,\theta$ grid. Initial conditions are generated as a Gaussian peak with $\sigma=0.25$ inserted at a random point on the sphere. The equation is solved for 20 timesteps with RK4 and $dt=1.0$ . We generate 256 training and 64 test trajectories.
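One way such an initial condition could be constructed is sketched below. The text fixes only the grid size and $\sigma$. The unit peak amplitude, the use of geodesic (great-circle) distance, the uniform sampling of the centre over the sphere and the cell-centred colatitude grid are all assumptions made for illustration.

```python
import math
import random

N_PHI, N_THETA, SIGMA = 128, 64, 0.25

def unit_vector(phi, theta):
    """3D unit vector from longitude phi and colatitude theta."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def gaussian_peak(rng):
    """Gaussian of the geodesic distance to a random centre, on a phi-theta grid."""
    phi0 = rng.uniform(0.0, 2.0 * math.pi)
    theta0 = math.acos(rng.uniform(-1.0, 1.0))  # uniform over the sphere's area
    centre = unit_vector(phi0, theta0)
    field = []
    for i in range(N_PHI):
        phi = 2.0 * math.pi * i / N_PHI
        row = []
        for j in range(N_THETA):
            theta = math.pi * (j + 0.5) / N_THETA  # cell centres avoid the poles
            v = unit_vector(phi, theta)
            dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(centre, v))))
            d = math.acos(dot)  # great-circle distance on the unit sphere
            row.append(math.exp(-d * d / (2.0 * SIGMA ** 2)))
        field.append(row)
    return field

field = gaussian_peak(random.Random(0))
peak = max(map(max, field))
print(len(field), len(field[0]), peak)
```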

Spherical shallow-water equations [16].

The global shallow-water equations are

		$\displaystyle\frac{du}{dt}=-fk\times u-g\nabla h+\nu\Delta u$
		$\displaystyle\frac{dh}{dt}=-h\nabla\cdot u+\nu\Delta h,$

where $\frac{d}{dt}$ is the material derivative, $k$ is the unit vector orthogonal to the surface of the sphere, $u$ is the velocity field that is tangent to the spherical surface and $h$ is the thickness of the fluid layer. The rest are constant parameters of the Earth (see [16] for details). As initial conditions we follow [16] and use a basic zonal flow, representing a mid-latitude tropospheric jet, with a correspondingly balanced height field:

u(\phi)=\begin{cases}0&\text{for }\phi\leq\phi_{0}\\ \frac{u_{\max}}{e_{n}}\exp\left[\frac{1}{(\phi-\phi_{0})(\phi-\phi_{1})}\right]&\text{for }\phi_{0}<\phi<\phi_{1}\\ 0&\text{for }\phi\geq\phi_{1}\end{cases}

where $u_{\max}=80\,\mathrm{m\,s^{-1}}$ , $\phi_{0}=\pi/7$ , $\phi_{1}=\pi/2-\phi_{0}$ , and $e_{n}=\exp[-4/(\phi_{1}-\phi_{0})^{2}]$ normalizes the profile so that its maximum, attained at the midpoint of the jet, equals $u_{\max}$ . With this initial zonal flow, we numerically integrate the balance equation

gh(\phi)=gh_{0}-\int^{\phi}au(\phi^{\prime})\left[f+\frac{\tan(\phi^{\prime})}{a}u(\phi^{\prime})\right]\,d\phi^{\prime},

to obtain the height $h$ . We then randomly generate small un-balanced perturbations $h^{\prime}$ to the height field

h^{\prime}(\theta,\phi)=\hat{h}\cos(\phi)e^{-[(\theta_{2}-\theta)/\alpha]^{2}}e^{-[(\phi_{2}-\phi)/\beta]^{2}}

by uniformly sampling $\alpha,\beta,\hat{h},\theta_{2},$ and $\phi_{2}$ within a neighbourhood of the values used in [16]. States are defined on a 192 $\times$ 96 grid for the high-resolution dataset, which is subsequently downsampled by $2\times 2$ mean pooling to a $96\times 48$ grid. We generate 512 training trajectories and 64 test trajectories.
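The zonal jet and the balance integral can be sketched as follows, taking $\phi_{1}=\pi/2-\phi_{0}$ and $e_{n}=\exp[-4/(\phi_{1}-\phi_{0})^{2}]$ so that the jet peaks at $u_{\max}$. The Earth radius and rotation rate are standard values assumed here, as is the Coriolis parameter $f=2\Omega\sin\phi$; the trapezoidal rule is an illustrative choice of quadrature, and $h_{0}$ is left unspecified, so only the geopotential drop $gh_{0}-gh(\phi)$ is computed.

```python
import math

U_MAX = 80.0                                # jet maximum [m/s]
PHI0 = math.pi / 7.0                        # southern edge of the jet
PHI1 = math.pi / 2.0 - PHI0                 # northern edge of the jet
E_N = math.exp(-4.0 / (PHI1 - PHI0) ** 2)   # normalisation so that max(u) = U_MAX
# Assumed Earth constants (standard values, not stated in this section):
A = 6.37122e6                               # radius [m]
OMEGA = 7.292e-5                            # rotation rate [1/s]

def zonal_wind(phi):
    """Zonal velocity u(phi): a smooth bump supported on (PHI0, PHI1)."""
    if phi <= PHI0 or phi >= PHI1:
        return 0.0
    return U_MAX / E_N * math.exp(1.0 / ((phi - PHI0) * (phi - PHI1)))

def balance_integrand(phi):
    """a * u * (f + tan(phi) / a * u) with f = 2 * OMEGA * sin(phi) (assumed)."""
    u = zonal_wind(phi)
    f = 2.0 * OMEGA * math.sin(phi)
    return A * u * (f + math.tan(phi) / A * u)

def geopotential_drop(phi, n=2000):
    """Trapezoidal estimate of g*h0 - g*h(phi); the integrand vanishes outside the jet."""
    upper = min(phi, PHI1)
    if upper <= PHI0:
        return 0.0
    step = (upper - PHI0) / n
    total = 0.5 * (balance_integrand(PHI0) + balance_integrand(upper))
    for k in range(1, n):
        total += balance_integrand(PHI0 + k * step)
    return total * step

mid = 0.5 * (PHI0 + PHI1)
u_mid = zonal_wind(mid)                 # equals U_MAX up to rounding
drop_across_jet = geopotential_drop(PHI1)
print(u_mid, drop_across_jet)
```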

Internally-heated convection in the ball.

The equations for the internally-heated convection system are listed below. They involve the thermal diffusivity ( $\kappa$ ) and kinematic viscosity ( $\nu$ ), given by:

\kappa=\left(\text{Ra}\cdot\text{Pr}\right)^{-1/2}

\nu=\left(\frac{\text{Ra}}{\text{Pr}}\right)^{-1/2}

We set $\text{Ra}=1e-6$ and $\text{Pr}=1$ .

1. Incompressibility condition (continuity equation):

\nabla\cdot\mathbf{u}+\tau_{p}=0

2. Momentum equation (Navier-Stokes equation):

\frac{\partial\mathbf{u}}{\partial t}-\nu\nabla^{2}\mathbf{u}+\nabla p-\mathbf{r}T+\text{lift}(\tau_{u})=-(\nabla\times\mathbf{u})\times\mathbf{u}

3. Temperature equation:

\frac{\partial T}{\partial t}-\kappa\nabla^{2}T+\text{lift}(\tau_{T})=-\mathbf{u}\cdot\nabla T+\kappa T_{\text{source}}

4. Shear stress boundary condition (stress-free condition):

\text{Shear Stress}=0\text{ on the boundary}

5. No penetration boundary condition (radial component of velocity at $r=1$ ):

\text{radial}(\mathbf{u}(r=1))=0

6. Thermal boundary condition (radial gradient of temperature at $r=1$ ):

\text{radial}(\nabla T(r=1))=-2

7. Pressure gauge condition:

\int p\,dV=0

The boundary conditions imposed are stress-free and no-penetration for the velocity field and a constant thermal flux at the outer boundary. These conditions are enforced using penalty terms ( $\tau$ ) that are lifted into the domain using higher-order basis functions.

States are defined over a $64\times 24\times 24$ $\phi,\theta,r$ grid. We use an SBDF2 solver which we constrain by $dt_{\text{min}}=1e-4$ and $dt_{\text{max}}=2e-2$ . We evolve the PDE for 26 timesteps, discarding the first 6. We generate 512 training trajectories and 64 test trajectories.

E.2 Training details

We provide hyperparameters per experiment. We optimize the weights of the neural field $f_{\theta}$ and the neural ODE $F_{\psi}$ with Adam [23], using learning rates of 1E-4 and 1E-3 respectively. We initialize the inner learning rate that we use in Meta-SGD [28] for learning $z^{\nu}$ at 1.0 for $p$ and 5.0 for $\mathbf{c}$ . For the neural ODE $F_{\psi}$ , we use 3 of our message passing layers in the architecture specified in [5], with a hidden dimensionality of 128. The std parameter of the RFF embedding functions $\gamma_{q},\gamma_{v_{\alpha}},\gamma_{v_{\beta}}$ (see Appx. C) is chosen per experiment. We run all experiments on a single A100. All experiments are run 3 times.
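To make the role of the std parameter concrete, the sketch below implements a generic random Fourier feature (RFF) embedding in which the projection weights are drawn from a zero-mean Gaussian with the given std. The exact form used in Appx. C may differ; the $2\pi$ factor, the cosine/sine concatenation and the names `make_rff` and `embed` are assumptions for illustration.

```python
import math
import random

def make_rff(in_dim, num_features, std, seed=0):
    """Return an embedding a -> [cos(2*pi*W a), sin(2*pi*W a)] with W_ij ~ N(0, std^2)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, std) for _ in range(in_dim)] for _ in range(num_features)]

    def embed(a):
        proj = [2.0 * math.pi * sum(w * x for w, x in zip(row, a)) for row in W]
        return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]

    return embed

# Two of the std values reported in the per-experiment settings.
embed_smooth = make_rff(in_dim=2, num_features=16, std=0.05)
embed_sharp = make_rff(in_dim=2, num_features=16, std=0.2)

a = [0.3, -0.7]
e_smooth = embed_smooth(a)
e_sharp = embed_sharp(a)
sq_norm = sum(v * v for v in e_smooth)  # cos^2 + sin^2 sums to num_features
print(len(e_smooth), sq_norm)
```

A larger std draws higher frequencies, so the embedding varies faster with the attribute and the downstream layers can represent finer detail, at the cost of smoothness.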

Diffusion on the plane.

We use 4 latents with $\mathbf{c}\in\mathbb{R}^{16}$ . We set the hidden dim of the ENF to 64 and use 2 attention heads. We train the model for 1000 epochs. We set $\gamma_{q}=0.05,\gamma_{v_{\alpha}}=0.01,\gamma_{v_{\beta}}=0.01$ . We use a batch size of 8. The model takes approximately 8 hours to train.

Navier-Stokes on the flat 2-torus.

We use 4 latents with $\mathbf{c}\in\mathbb{R}^{16}$ . We set the hidden dim of the ENF to 64 and use 2 attention heads. We train the model for 2000 epochs. We set $\gamma_{q}=0.05,\gamma_{v_{\alpha}}=0.2,\gamma_{v_{\beta}}=0.2$ . We use a batch size of 4. The model takes approximately 48 hours to train.

Diffusion on the 2-sphere.

We use 18 latents with $\mathbf{c}\in\mathbb{R}^{4}$ . We set the hidden dim of the ENF to 16 and use 2 attention heads. We train the model for 1500 epochs. We set $\gamma_{q}=0.01,\gamma_{v_{\alpha}}=0.01,\gamma_{v_{\beta}}=0.01$ . We use a batch size of 2. The model takes approximately 12 hours to train.

Spherical shallow-water equations [16].

We use 8 latents with $\mathbf{c}\in\mathbb{R}^{32}$ . We set the hidden dim of the ENF to 128 and use 2 attention heads. We train the model for 1500 epochs. We set $\gamma_{q}=0.05,\gamma_{v_{\alpha}}=0.2,\gamma_{v_{\beta}}=0.2$ . We use a batch size of 2. The model takes approximately 24 hours to train.

Internally-heated convection in the ball

Baselines

As baseline models on Navier-Stokes we train FNO and GFNO [29] with 8 modes and 32 channels for 700 epochs (until convergence). We train CNODE [2] with 4 layers of size 64 for 300 epochs (until convergence). We train DINo on all experiments for 2000 epochs with an architecture as specified in [49]. For the IHC and shallow-water experiments, we increase the latent dim from 100 to 200, the number of layers for the neural ODE from 3 to 5, and the latent dim of the neural field decoder from 64 to 256, as per [49].

	$t_{\text{in}}$ train	$t_{\text{out}}$ train	$t_{\text{in}}$ test	$t_{\text{out}}$ test
DINo [49]	5.92E-04	2.40E-04	3.85E-03	5.12E-03
Ours $\mathbf{a}^{\emptyset}$	6.23 $\pm$ 1.01E-06	4.90 $\pm$ 20.1E-06	2.19 $\pm$ 0.32E-03	5.08 $\pm$ 13.2E-04
Ours $\mathbf{a}^{\rm SE(2)}$	1.18 $\pm$ 0.45E-05	2.53 $\pm$ 3.50E-05	1.50 $\pm$ 0.77E-05	2.53 $\pm$ 3.43E-05

	$t_{\text{in}}$ train	$t_{\text{out}}$ train	$t_{\text{in}}$ test	$t_{\text{out}}$ test
	Train resolution
DINo [49]	1.75E-04	1.36E-03	2.01E-04	1.37E-03
Ours $\mathbf{a}^{\rm SW}$	9.94 $\pm$ 0.41E-05	1.89 $\pm$ 0.03E-03	1.09 $\pm$ 1.14E-04	1.87 $\pm$ 0.04E-03
	zero-shot 2x super-resolution
DINo [49]	3.03E-04	2.03E-03	3.37E-04	2.03E-03
Ours $\mathbf{a}^{\rm SW}$	1.58 $\pm$ 0.02E-04	1.96 $\pm$ 0.02E-03	1.61 $\pm$ 0.01E-04	1.93 $\pm$ 0.02E-03
