Space-Time Continuous PDE Forecasting using Equivariant Neural Fields

David M. Knigge∗,1, David R. Wessels∗,1, Riccardo Valperga1, Samuele Papa1, Jan-Jakob Sonke2,
Efstratios Gavves †,1, Erik J. Bekkers †,1
1 University of Amsterdam    2 Netherlands Cancer Institute
Abstract

Recently, Conditional Neural Fields (NeFs) have emerged as a powerful modelling paradigm for PDEs, by learning solutions as flows in the latent space of the Conditional NeF. Although benefiting from favourable properties of NeFs such as grid-agnosticity and space-time-continuous dynamics modelling, this approach limits the ability to impose known constraints of the PDE on the solutions (e.g. symmetries or boundary conditions) in favour of modelling flexibility. Instead, we propose a space-time continuous NeF-based solving framework that, by preserving geometric information in the latent space, respects known symmetries of the PDE. We show that modelling solutions as flows of pointclouds over the group of interest $G$ improves generalization and data-efficiency. We validate that our framework readily generalizes to unseen spatial and temporal locations, as well as to geometric transformations of the initial conditions, where other NeF-based PDE forecasting methods fail, and that it improves over baselines in a number of challenging geometries.

1 Introduction

* Shared first author. † Shared lead advising.

Partial Differential Equations (PDEs) are a foundational tool in modelling and understanding spatio-temporal dynamics across diverse scientific domains. Classically, PDEs are solved using numerical methods such as finite elements, finite volumes, or spectral methods. In recent years, Deep Learning (DL) methods have emerged as promising alternatives due to the abundance of observed and simulated data as well as the accessibility of computational resources, with applications ranging from fluid simulations and weather modelling [49, 7] to biology [32].

Figure 1: We propose to solve an equivariant PDE in function space by solving an equivariant ODE in latent space. Through our proposed framework, which leverages Equivariant Neural Fields $f_\theta$, a field $\nu_t$ is represented by a set of latents $z^\nu_t = \{(p_i^\nu, \mathbf{c}_i^\nu)\}_{i=1}^N$ consisting of a pose $p_i$ and context vector $\mathbf{c}_i$. Using meta-learning, the initial latent $z^\nu_0$ is fit in only 3 SGD steps, after which an equivariant neural ODE $F_\psi$ models the solution as a latent flow.

The systems modelled by PDEs often have underlying symmetries. For example, heat diffusion or fluid dynamics can be modelled with differential operators that are rotation equivariant, i.e., given a solution to the system of PDEs, its rotation is also a valid solution (assuming boundary conditions are symmetric, i.e. they transform according to the relevant group action). In such scenarios it is sensible, and even desirable, to design neural networks that incorporate and preserve such symmetries to improve generalization and data-efficiency [12, 47, 4].

Crucially, DL-based approaches often rely on data sampled on a regular grid, without the inherent ability to generalize outside of it, which is restrictive in many scenarios [39]. To this end, [49] propose to use Neural Fields (NeFs) for modelling and forecasting PDE dynamics. This is done by fitting a neural ODE [11] to the conditioning variables of a conditional Neural Field trained to reconstruct states of the PDE [13]. However, this approach fails to leverage the aforementioned known symmetries of the system. Furthermore, using neural fields as representations has proved difficult due to the non-linear nature of neural networks [13, 3, 34], limiting performance in more challenging settings. We posit that NeF-based modelling of PDE dynamics benefits from representations that account for the symmetries of the system, as this allows for introducing inductive biases into the model that ought to be reflected in solutions. Furthermore, we show that obtaining the NeF backbone through meta-learning [28, 44] improves performance for complex PDEs by further structuring the NeF’s latent space, simplifying the task of the neural ODE.

We introduce a framework for space-time continuous equivariant PDE solving, by adapting a class of $\mathrm{SE}(n)$-Equivariant Neural Fields (ENFs) to PDE-specific symmetries. We leverage the ENF as a representation for modelling spatiotemporal dynamics. We solve PDEs by learning a flow in the latent space of the ENF, starting at a point $z_0$ corresponding to the initial state of the PDE, with an equivariant graph-based neural ODE [11] that we develop from previous work [5]. We extend the ENF to equivariances beyond $\mathrm{SE}(n)$ by extending its weight-sharing scheme to equivalence classes for specific symmetries relevant to our setting. Furthermore, we show how meta-learning [14, 28, 44, 13] can not only significantly reduce the inference time of the proposed framework, but also substantially simplify the structure of the latent space of the ENF, thereby simplifying the learning process of the latent dynamics for the neural ODE model. We present the following contributions:

  • We introduce a framework for spatio-temporally continuous PDE solving that respects known symmetries of the PDE through equivariance constraints.

  • We show that correctly chosen equivariance constraints as inductive bias improve the performance of the solver, in terms of MSE, in spatio-temporally continuous settings, i.e. evaluated off the training grid and beyond the training horizon.

  • We show how meta-learning improves the structure of the latent space of the ENF, simplifying the learning process, leading to better performance in solving PDEs.

We structure the paper as follows: in Sec. 2 we provide an overview of the mathematical preliminaries and describe the problem setting. Our proposed framework is introduced in Sec. 3. We validate our framework on different PDEs defined over a variety of geometries in Sec. 4, with differing equivariance constraints, showing competitive performance compared to other neural PDE solvers. We provide an in-depth positioning of our approach in relation to other work in Appx. A.

2 Mathematical background and problem setting

Continuous spatiotemporal dynamics forecasting.

The setting considered is data-driven learning of the dynamics of a system described by continuous observables. In particular, we consider flows of fields, denoted with $\hat{\nu}: \mathbb{R}^d \times [0, T] \rightarrow \mathbb{R}^c$. We use $\hat{\nu}_t$ as a shorthand for $\hat{\nu}(\cdot, t)$. We assume the flow is governed by a PDE, and consider the Initial Value Problem (IVP) of predicting $\hat{\nu}_t$ from a given $\nu_0$. The dataset consists of field snapshots $\nu: \mathcal{X} \times \llbracket T \rrbracket \rightarrow \mathbb{R}^c$, in which $\llbracket T \rrbracket := \{1, 2, \dots, T\}$ denotes the set of time points on which the flow is sampled and $\mathcal{X} \subset \mathbb{R}^d$ is a set of coordinate values. For each time point we are given a set of input-output pairs $[\mathcal{X}, \nu(\mathcal{X})]$, where $\nu(\mathcal{X}) \subset \mathbb{R}^c$ are the values of the field at those coordinates.
Importantly, the locations at which the field is sampled need not be regular, i.e., we do not require the training data to be on a grid or to be regularly spaced in time, nor need coordinate values be identical for train and test sets. Following [49], we distinguish between $t_{\text{in}}$, referring to values within the training time horizon $[0, T]$, and $t_{\text{out}}$, referring analogously to values beyond $T$.

Neural Fields in dynamics modelling.

Conditional Neural Fields (NeFs) are a class of coordinate-based neural networks, often trained to reconstruct discretely-sampled input continuously. More specifically, a conditional neural field $f_\theta: \mathbb{R}^n \rightarrow \mathbb{R}^d$ is a field, parameterized by a neural network with parameters $\theta$, that maps input coordinates $x \in \mathbb{R}^n$ in the data domain alongside conditioning latents $z$ to $d$-dimensional signal values $\nu(x) \in \mathbb{R}^d$. By associating a conditioning latent $z^\nu \in \mathbb{R}^c$ to each signal $\nu$, a single conditional NeF $f_\theta: \mathbb{R}^n \times \mathbb{R}^c \rightarrow \mathbb{R}^d$ can learn to represent families $\mathcal{D}$ of continuous signals such that $\forall\, \nu \in \mathcal{D}: \nu(x) \approx f_\theta(x; z^\nu)$.
[49] propose to use conditional NeFs for PDE modelling by learning a continuous flow in the latent space of a conditional neural field. In particular, a set of latents $\{z_i^\nu\}_{i=1}^T$ is obtained by fitting a conditional neural field to a given set of observations $\{\nu_i\}_{i=1}^T$ at timesteps $1, \dots, T$; simultaneously, a neural ODE [11] $F_\psi$ is trained to map pairs of temporally contiguous latents such that solutions correspond to the trajectories traced by the learned latents. Though this approach yields impressive results for sparse and irregular data in planar PDEs, we show it breaks down on complex geometries. We hypothesize that this is due to the lack of a latent space that preserves the relevant geometric transformations that characterize the symmetries of the systems we are modelling, and as such propose an extension of this framework where such symmetries are preserved.

Symmetries and weight sharing.

Given a group $G$ with identity element $e \in G$, and a set $X$, a group action is a map $\mathcal{T}: G \times X \rightarrow X$. For simplicity, we denote the action of $g \in G$ on $x \in X$ as $gx := \mathcal{T}(g, x)$, and call a smooth manifold equipped with a $G$ action a $G$-space. A group action is homomorphic to $G$ with its group product, namely it is such that $ex = x$ and $(gh)x = g(hx)$. As an example, we are interested in the Special Euclidean group $\mathrm{SE}(n) = \mathbb{R}^n \rtimes \mathrm{SO}(n)$: group elements of $\mathrm{SE}(n)$ are identified by a translation $t \in \mathbb{R}^n$ and a rotation $\mathbf{R} \in \mathrm{SO}(n)$, with group operation $g g' = (t, \mathbf{R})\,(t', \mathbf{R}') = (\mathbf{R}t' + t, \mathbf{R}\mathbf{R}')$. We denote by $\mathcal{L}_g$ the left action of $G$ on function spaces, defined as $\mathcal{L}_g f(x) = f(g^{-1}x) = f(\mathbf{R}^{-1}(x - t))$. Many PDEs are defined by equivariant differential operators $\mathcal{N}$ such that for a given state $\nu$: $\mathcal{L}_g \mathcal{N}[\nu] = \mathcal{N}[\mathcal{L}_g \nu]$. If the boundary conditions do not break the symmetry, namely if the boundary is symmetric with respect to the same group action, then a $G$-transformed solution to the IVP for some $\nu_0$ corresponds to the solution for the $G$-transformed initial value. For example, the laws of physics do not depend on the choice of coordinate system; this implies that many PDEs are defined by $\mathrm{SE}(n)$-equivariant differential operators.
The geometric deep learning literature shows that models can benefit from leveraging the inherent symmetries or invariances present in the data by constraining the searchable function space through weight sharing [9, 25, 5]. Recall that in our framework we model flows of fields, solutions to PDEs defined by equivariant differential operators, with ordinary differential equations in the latent space of conditional neural fields. We leverage the symmetries of the system for two key aspects of the proposed method: first, by making the relation between signals and corresponding latents equivariant; second, by using equivariant ODEs, namely ODEs defined by equivariant vector fields: if $\frac{dz}{d\tau} = F(z)$ is such that $F(gz) = gF(z)$, then solutions are mapped to solutions by the group action.
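The statement that an equivariant vector field maps solutions to solutions can be checked numerically. The following is a minimal sketch, assuming a hand-written rotation-equivariant vector field on $\mathbb{R}^2$; the function `F`, the forward-Euler integrator `euler_flow`, and the step size are illustrative choices and not the learned neural ODE of the framework.

```python
import numpy as np

def rot(theta):
    """2D rotation matrix in SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def F(z):
    """Toy rotation-equivariant vector field on R^2 (hypothetical, for illustration).

    The first term rescales z by a function of |z|; the second applies a fixed
    quarter turn. Both commute with 2D rotations, so F(R z) = R F(z).
    """
    quarter_turn = np.array([[0.0, -1.0], [1.0, 0.0]])
    return (1.0 - z @ z) * z + 0.5 * quarter_turn @ z

def euler_flow(z0, dt=0.01, steps=500):
    """Integrate dz/dtau = F(z) with forward Euler, starting from z0."""
    z = z0.copy()
    for _ in range(steps):
        z = z + dt * F(z)
    return z

R = rot(0.7)
z0 = np.array([0.3, -1.2])

# Path 1: solve first, then transform the solution.
solve_then_rotate = R @ euler_flow(z0)
# Path 2: transform the initial condition, then solve.
rotate_then_solve = euler_flow(R @ z0)

print(np.allclose(solve_then_rotate, rotate_then_solve))
```

Because each Euler step is itself an equivariant map, the two paths agree up to floating-point error regardless of the step size; only the accuracy of the flow, not its symmetry, depends on the integrator.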

3 Method

We adapt the work of [49], and consider the following optimization problem (we highlight that [49] optimize latents $z^\nu_t$, neural field $f_\theta$, and ODE $F_\psi$ using two separate objectives; we instead found that our framework is more stable under single-objective optimization):

$$\min_{\theta, \psi, z_\tau} \; \mathbb{E}_{\nu \in D,\, x \in \mathcal{X},\, t \in \llbracket T \rrbracket} \left\| \nu_t(x) - f_\theta(x; z^\nu_t) \right\|_2^2, \quad \text{where} \quad z^\nu_t = z^\nu_0 + \int_0^t F_\psi(z^\nu_\tau)\, d\tau, \tag{1}$$

with $f_\theta(x; z^\nu_t)$ a decoder tasked with reconstructing state $\nu_t$ from latent $z_t^\nu$ and $F_\psi$ a neural ODE that maps a latent to its temporal derivative, $\frac{dz_\tau^\nu}{d\tau} = F_\psi(z^\nu_\tau)$, modelling the solution as a flow in latent space starting at the initial latent $z_0^\nu$ (see Fig. 1 for a visual intuition).
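To make the structure of this objective concrete, the following is a minimal sketch that only evaluates the loss for a single trajectory; it performs no optimization. The decoder `f_theta` and vector field `F_psi` below are small hypothetical stand-ins (a sine readout and a tanh layer), the integral is approximated with forward Euler, and all shapes and weights are assumptions made for illustration.

```python
import numpy as np

def F_psi(z, W):
    """Toy latent vector field dz/dtau = tanh(W z); stands in for the neural ODE."""
    return np.tanh(W @ z)

def f_theta(x, z, A):
    """Toy conditional decoder: field values at coordinates x given latent z."""
    # x: (M, d) coordinates, z: (k,) latent, A: (d, k) weights -> (M,) values
    return np.sin(x @ (A @ z))

def rollout(z0, W, n_steps, dt):
    """Forward-Euler approximation of z_t = z_0 + int_0^t F_psi(z_tau) dtau."""
    zs = [z0]
    for _ in range(n_steps):
        zs.append(zs[-1] + dt * F_psi(zs[-1], W))
    return np.stack(zs)  # (n_steps + 1, k)

def objective(z0, W, A, xs, nu, dt):
    """Mean squared reconstruction error over all snapshots and coordinates."""
    zs = rollout(z0, W, n_steps=nu.shape[0] - 1, dt=dt)
    preds = np.stack([f_theta(xs, z, A) for z in zs])  # (T, M)
    return np.mean((nu - preds) ** 2)

rng = np.random.default_rng(0)
d, k, M, T = 2, 4, 50, 6
xs = rng.uniform(-1, 1, size=(M, d))  # irregularly sampled coordinates
W, A = rng.normal(size=(k, k)), rng.normal(size=(d, k))
z0 = rng.normal(size=k)

# Synthetic snapshots generated by the same toy model, so the matching
# initial latent reproduces them exactly.
nu = np.stack([f_theta(xs, z, A) for z in rollout(z0, W, T - 1, dt=0.1)])

loss_matching = objective(z0, W, A, xs, nu, dt=0.1)
loss_perturbed = objective(z0 + 0.5, W, A, xs, nu, dt=0.1)
print(loss_matching, loss_perturbed)
```

Note that only the initial latent enters the rollout: every later latent is produced by integrating the vector field, so the reconstruction error at all time steps depends jointly on the decoder, the vector field, and the initial latent.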

Equivariant space-time continuous dynamics forecasting.

PDEs defined by a $G$-equivariant differential operator, for which $\mathcal{L}_g \mathcal{N}[\nu] = \mathcal{N}[\mathcal{L}_g \nu]$, are such that solutions are mapped to other solutions by the group action if the boundary conditions are symmetric. We would like to leverage this property, and constrain the neural ODE $F_\psi$ such that the solutions it finds in latent space can be mapped onto each other by the group action. Our motivation for this is twofold: (1) it is natural for our model to have, by construction, the geometric properties that the modelled system is known to possess, and (2) it yields more structured latent representations and facilitates the job of the neural ODE. To achieve this we first need the latent space $Z$ to be equipped with a well-defined group action with respect to which $\forall g \in G, z \in Z: F_\psi(gz) = gF_\psi(z)$, and, most importantly, we need the relation between the reconstructed field and the corresponding latent to be equivariant, i.e.,

$$\forall g \in G,\; x \in \mathcal{X}: \quad \mathcal{L}_g f_\theta(x; z_t^\nu) = f_\theta(g^{-1}x; z_t^\nu) = f_\theta(x; g z^\nu_t). \tag{2}$$

Note that, somewhat imprecisely, we call this condition equivariance to convey the idea even though it is not, strictly speaking, the commonly used definition of equivariance for general operators. If we consider the decoder as a mapping from latents to fields, we can make the notion of equivariance of this mapping more precise. Namely,

$$f(x) = D_\theta(z), \quad D_\theta(z): z_t^\nu \mapsto f_\theta(\cdot\,; z_t^\nu), \qquad f(g^{-1}x) = D_\theta(gz), \quad D_\theta(gz): g z_t^\nu \mapsto f_\theta(g^{-1}\cdot\,; z_t^\nu). \tag{3}$$

Figure 2: The proposed framework respects pre-defined symmetries of the PDE: a rotated solution $\mathcal{L}_g \nu_T$ may be obtained either by solving from latent $z^\nu_0$ (top-left) and transforming the solution $z_T^\nu$ (top-right) to $g z_T^\nu$ (bottom-right), or by transforming $z^\nu_0$ to $g z^\nu_0$ (bottom-left) and solving from there.

First, in Sec. 3.1 we describe the Equivariant Neural Field (ENF)-based decoder, which satisfies Eq. (2). Second, in Sec. 3.2 we outline the graph-based equivariant neural ODE. Finally, Sec. 3.3 explains the motivation for, and use of, meta-learning for obtaining the ENF backbone parameters. We show how the combination of equivariance and meta-learning produces much more structured latent representations of continuous signals (Fig. 3).

3.1 Representing PDE states with Equivariant Neural Fields

We briefly recap ENFs here, referring the reader to [48] for more detail. We extend ENFs to symmetries for PDEs over varying geometries.

ENFs as cross-attention over bi-invariant attributes.

Attention-based conditional neural fields represent a signal $\nu \in \mathcal{D}$ with a corresponding latent set $z^\nu$ [50]. This class of conditional neural fields obtains signal-specific reconstructions $\nu(x) \approx f_\theta(x; z^\nu)$ through a cross-attention operation between the latent set $z^\nu$ and input coordinates $x$. ENFs [48] extend this approach by imposing equivariance constraints w.r.t. a group $G \subseteq \mathrm{SE}(n)$ on the relation between the neural field and the latents, such that transformations of the signal $\nu$ correspond to transformations of the latent $z^\nu$ (Eq. (2)). For this condition to hold, we need a well-defined action on the latent space $Z$ of $f_\theta$.
To this end, ENFs define elements of the latent set $z^\nu$ as tuples of a pose $p_i \in G$ and a context $\mathbf{c}_i \in \mathbb{R}^d$, $z^\nu := \{(p_i, \mathbf{c}_i)\}_{i=1}^N$. The latent space is then equipped with a group action defined as $gz = \{(g p_i, \mathbf{c}_i)\}_{i=1}^N$. To achieve equivariance over transformations, ENFs follow [5], where equivariance is achieved with convolutional weight-sharing over equivalence classes of point pairs $x, x'$. ENFs instead extend weight-sharing to cross-attention over bi-invariant attributes of $z, x$ pairs.

Weight-sharing over bi-invariant attributes of $z, x$ is motivated by Eq. (2), from which, substituting $x \mapsto gx$, we have:

$$f_\theta(x; z) = f_\theta(gx; gz). \tag{4}$$

Intuitively, the above equation says that a transformation g𝑔gitalic_g on the domain of fθsubscript𝑓𝜃f_{\theta}italic_f start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT, i.e. g1xsuperscript𝑔1𝑥g^{-1}xitalic_g start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_x, can be undone by also acting with g𝑔gitalic_g on z𝑧zitalic_z. In other words, the output of the neural field fθsubscript𝑓𝜃f_{\theta}italic_f start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT should be bi-invariant to glimit-from𝑔g-italic_g -transformations of the pair z,x𝑧𝑥z,xitalic_z , italic_x. For a specific pair (zi,xm)Z×Xsubscript𝑧𝑖subscript𝑥𝑚𝑍𝑋(z_{i},x_{m})\in Z\times X( italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) ∈ italic_Z × italic_X, the term bi-invariant attribute 𝐚i,msubscript𝐚𝑖𝑚\mathbf{a}_{i,m}bold_a start_POSTSUBSCRIPT italic_i , italic_m end_POSTSUBSCRIPT describes a function 𝐚:(zi,xm)𝐚(zi,xm):𝐚maps-tosubscript𝑧𝑖subscript𝑥𝑚𝐚subscript𝑧𝑖subscript𝑥𝑚\mathbf{a}:(z_{i},x_{m})\mapsto\mathbf{a}(z_{i},x_{m})bold_a : ( italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) ↦ bold_a ( italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) such that 𝐚(zi,xm)=𝐚(gzi,gxm)𝐚subscript𝑧𝑖subscript𝑥𝑚𝐚𝑔subscript𝑧𝑖𝑔subscript𝑥𝑚\mathbf{a}(z_{i},x_{m})=\mathbf{a}(gz_{i},gx_{m})bold_a ( italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) = bold_a ( italic_g italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_g italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ). 
Throughout the paper we use $\mathbf{a}_{i,m}$ as shorthand for $\mathbf{a}(z_{i},x_{m})$.

To parameterize $f_{\theta}$, we can accordingly choose any function that is bi-invariant to $G$-transformations of $z,x$. In particular, for an input coordinate $x_{m}$, ENFs choose to make $f_{\theta}$ a cross-attention operation between attributes $\mathbf{a}_{i,m}$ and the invariant context vectors $\mathbf{c}_{i}$:

$$f_{\theta}(x_{m},z)=\operatorname{cross\_attn}(\mathbf{a}_{:,m},\mathbf{c}_{:},\mathbf{c}_{:}) \tag{5}$$
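As an illustration of Eq. 5, the following is a minimal sketch of a single-head cross-attention field, assuming queries are computed from the attributes $\mathbf{a}_{:,m}$ and keys and values from the contexts $\mathbf{c}_{:}$, with one attention score per latent. The function name `cross_attn_field`, the weight matrices `W_q`, `W_k`, `W_v` and all shapes are hypothetical choices for this example; the actual ENF architecture may differ in its details.

```python
import numpy as np

def cross_attn_field(a, c, W_q, W_k, W_v):
    """Evaluate a toy neural field at a single coordinate x_m.

    a: (N, d_a) bi-invariant attributes a_{i,m}, one row per latent i.
    c: (N, d_c) context vectors c_i.
    Returns a (d_out,) vector, the field value at x_m.
    """
    q = a @ W_q                                   # (N, d) queries from attributes
    k = c @ W_k                                   # (N, d) keys from contexts
    v = c @ W_v                                   # (N, d_out) values from contexts
    logits = np.sum(q * k, axis=-1) / np.sqrt(q.shape[-1])  # one score per latent
    w = np.exp(logits - logits.max())
    w = w / w.sum()                               # softmax over the N latents
    return w @ v                                  # attention-weighted sum of values

rng = np.random.default_rng(0)
N, d_a, d_c, d, d_out = 5, 2, 8, 16, 1
a = rng.normal(size=(N, d_a))
c = rng.normal(size=(N, d_c))
W_q = rng.normal(size=(d_a, d))
W_k = rng.normal(size=(d_c, d))
W_v = rng.normal(size=(d_c, d_out))
out = cross_attn_field(a, c, W_q, W_k, W_v)
```

Since the coordinate $x_m$ enters only through the attributes, any transformation that leaves $\mathbf{a}_{:,m}$ unchanged also leaves the output unchanged. The output is likewise independent of the ordering of the latents, as expected for a latent set.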

As an example, for $\mathrm{SE}(n)$-equivariance, we can define the bi-invariant simply using the group action: $\mathbf{a}^{\mathrm{SE}(n)}_{i,m}=p^{-1}_{i}x_{m}=\mathbf{R}_{i}^{T}(x_{m}-x_{i})$, which is bi-invariant by:

$$\forall g\in\mathrm{SE}(n):\;\;(p_{i},x)\;\mapsto\;(g\,p_{i},g\,x)\;\;\;\Leftrightarrow\;\;\;p_{i}^{-1}x\;\mapsto\;(g\,p_{i})^{-1}g\,x=p_{i}^{-1}g^{-1}g\,x=p_{i}^{-1}x\,. \tag{6}$$
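The identity in Eq. 6 can be checked numerically. The sketch below does so for $\mathrm{SE}(2)$, assuming a pose $p_i$ is stored as a position $x_i$ together with a rotation matrix $\mathbf{R}_i$; the helper names `rot` and `se2_attribute` are introduced here for illustration only.

```python
import numpy as np

def rot(angle):
    """2x2 rotation matrix for the given angle (radians)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def se2_attribute(R_i, x_i, x_m):
    """a_{i,m} = p_i^{-1} x_m = R_i^T (x_m - x_i) for a pose p_i = (x_i, R_i)."""
    return R_i.T @ (x_m - x_i)

rng = np.random.default_rng(0)
R_i, x_i = rot(rng.uniform(0, 2 * np.pi)), rng.normal(size=2)   # pose p_i
x_m = rng.normal(size=2)                                         # query coordinate
R_g, t_g = rot(rng.uniform(0, 2 * np.pi)), rng.normal(size=2)    # group element g

a_before = se2_attribute(R_i, x_i, x_m)
# g acts on the pose as (R_g x_i + t_g, R_g R_i) and on the coordinate as R_g x_m + t_g.
a_after = se2_attribute(R_g @ R_i, R_g @ x_i + t_g, R_g @ x_m + t_g)
```

Up to floating-point error, `a_before` and `a_after` coincide: the translation $t_g$ cancels in the difference and the rotation $\mathbf{R}_g$ cancels against its transpose.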

Bi-invariant attributes for PDE solving.

As explained above, ENF is equivariant to $\mathrm{SE}(n)$-transformations by defining $f_{\theta}$ as a function of an $\mathrm{SE}(n)$-bi-invariant attribute $\mathbf{a}^{\mathrm{SE}(n)}$. Although many physical processes adhere to roto-translational symmetries, we are also interested in solving PDEs that, due to the geometry of the domain, their specific formulation, and/or their boundary conditions, are not fully $\mathrm{SE}(n)$-equivariant. As such, we are interested in extending ENFs to equivariances that are not strictly (subsets of) $\mathrm{SE}(n)$, which we show we can achieve by finding bi-invariants that respect these particular transformations. Below, we provide two examples. The other invariants we use in the experiments are given in Appx. D, including a "bi-invariant" $\mathbf{a}^{\emptyset}$ that is not actually bi-invariant to any geometric transformation, which we use to ablate over equivariance constraints.

The flat 2-torus. When the physical domain of interest is continuous and extends indefinitely, periodic boundary conditions are often used, i.e. the PDE is defined over a space topologically equivalent to that of the 2-torus. Such boundary conditions break $\mathrm{SO}(2)$ symmetries; assuming the domain has periodicity $\pi$ and none of the terms of this PDE depend on the choice of coordinate frame, these boundary conditions imply that the PDE is equivariant to periodic translations: the group of translations modulo $\pi$: $\mathbb{T}^{2}\equiv\mathbb{R}^{2}/\mathbb{Z}^{2}$. In this case, periodic functions over $x,y$ with periods $\pi$ would work as a bi-invariant, i.e. using poses $p\in\mathbb{T}^{2}$, $\mathbf{a}^{\mathbb{T}^{2}}=\cos(2\pi(x_{0}-p_{0}))+\cos(2\pi(x_{1}-p_{1}))$, which happens to be bi-invariant to rotations by $\frac{\pi}{2}$ as well. Instead, since we do not assume any rotational symmetries to exist on the torus, we opt for a non-rotationally symmetric function:

$$\mathbf{a}_{i,m}^{\mathbb{T}^{2}}=\cos(2\pi(x_{m}^{0}-p_{i}^{0}))\oplus\cos(2\pi(x_{m}^{1}-p_{i}^{1})), \tag{7}$$

where $\oplus$ denotes concatenation. This bi-invariant is used in experiments on Navier-Stokes over the flat 2-torus.
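To make the difference between the summed and the concatenated torus attribute concrete, the sketch below evaluates both, assuming coordinates are normalized so that the period is 1 in each axis (which is what the factor $2\pi$ inside the cosines implies). The function names and the test points are hypothetical.

```python
import numpy as np

def torus_attribute(x, p):
    """Eq. 7: per-axis periodic offsets, concatenated into a 2-vector."""
    return np.cos(2 * np.pi * (x - p))

def torus_attribute_sum(x, p):
    """Summed variant, which is additionally invariant to 90-degree rotations."""
    return np.sum(np.cos(2 * np.pi * (x - p)))

def rot90(u):
    """Rotate a 2D point by 90 degrees about the origin: (u0, u1) -> (-u1, u0)."""
    return np.array([-u[1], u[0]])

rng = np.random.default_rng(0)
x, p = rng.uniform(size=2), rng.uniform(size=2)       # coordinate and pose on the torus
shift = rng.uniform(size=2) + np.array([3.0, -2.0])   # arbitrary translation

# A periodic translation acts on both coordinate and pose, wrapping modulo 1.
x_s, p_s = (x + shift) % 1.0, (p + shift) % 1.0

a_concat = torus_attribute(x, p)
a_concat_shifted = torus_attribute(x_s, p_s)
a_concat_rotated = torus_attribute(rot90(x), rot90(p))
a_sum = torus_attribute_sum(x, p)
a_sum_rotated = torus_attribute_sum(rot90(x), rot90(p))
```

Both variants are unchanged by the periodic translation. A rotation by $\frac{\pi}{2}$ swaps the two cosine terms, which leaves the sum unchanged but permutes the entries of the concatenated attribute, so only the latter distinguishes rotated configurations.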

The 2-sphere. In some settings a PDE may be symmetric only to rotations about a certain axis. An example is that of the global shallow-water equations on the two-sphere, used to model geophysical processes such as atmospheric flow [16], which are characterised by rotational symmetry only about the earth’s axis of rotation due to the inclusion of a term for Coriolis acceleration that breaks full $\mathrm{SO}(3)$ equivariance. We use poses $p\in\mathrm{SO}(3)$ parametrised by Euler angles $\phi,\theta,\gamma$, and spherical coordinates $\phi,\theta$ for $x\in S^{2}$. We make the first two Euler angles coincide with the spherical coordinates and define a bi-invariant for rotations around the axis $\theta=\pi$:

$$\mathbf{a}^{\text{SW}}_{i,m}=\Delta\phi_{p_{i},x_{m}}\oplus\theta_{p_{i}}\oplus\gamma_{p_{i}}\oplus\theta_{x_{m}}, \tag{8}$$

where $\Delta\phi_{p_{i},x_{m}}=\phi_{p_{i}}-\phi_{x_{m}}-2\pi$ if $\phi_{p_{i}}-\phi_{x_{m}}>\pi$ and $\Delta\phi_{p_{i},x_{m}}=\phi_{p_{i}}-\phi_{x_{m}}+2\pi$ if $\phi_{p_{i}}-\phi_{x_{m}}<-\pi$, to adjust for periodicity; otherwise $\Delta\phi_{p_{i},x_{m}}=\phi_{p_{i}}-\phi_{x_{m}}$.
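The wrapped azimuth difference and the attribute of Eq. 8 can be sketched as follows. The sketch assumes all azimuths lie in $[0, 2\pi)$ and an Euler-angle convention in which a rotation about the symmetry axis shifts only the azimuth $\phi$ of both pose and coordinate; the function names and the numerical values are hypothetical.

```python
import numpy as np

def wrap_delta_phi(phi_p, phi_x):
    """Azimuth difference wrapped into [-pi, pi] (inputs assumed in [0, 2*pi))."""
    d = phi_p - phi_x
    if d > np.pi:
        d -= 2 * np.pi
    elif d < -np.pi:
        d += 2 * np.pi
    return d

def sw_attribute(pose, x):
    """Eq. 8 for pose = (phi, theta, gamma) and coordinate x = (phi, theta)."""
    phi_p, theta_p, gamma_p = pose
    phi_x, theta_x = x
    return np.array([wrap_delta_phi(phi_p, phi_x), theta_p, gamma_p, theta_x])

def rotate_azimuth(phi, alpha):
    """Rotate by alpha about the symmetry axis and map back into [0, 2*pi)."""
    return (phi + alpha) % (2 * np.pi)

pose = (0.3, 1.0, 0.5)      # (phi, theta, gamma) of a pose p_i
x = (6.0, 2.0)              # (phi, theta) of a coordinate x_m
alpha = 1.7                 # rotation angle about the symmetry axis

a_before = sw_attribute(pose, x)
a_after = sw_attribute((rotate_azimuth(pose[0], alpha), pose[1], pose[2]),
                       (rotate_azimuth(x[0], alpha), x[1]))
```

Only the relative azimuth enters the attribute, so it is unaffected by a joint rotation about the axis, while the polar angles of pose and coordinate are kept as absolute quantities.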

In summary, to parameterize an ENF equivariant with respect to a specific group we are simply required to find attributes that are bi-invariant with respect to the same group. In general we achieve this by using group-valued poses and their action on the PDE domain.

3.2 PDE solution as latent space flow

Let $z^{\nu}_{0}$ be a latent set that faithfully reconstructs the initial state $\nu_{0}$. We want to define a neural ODE $F_{\psi}$ that maps latents $z^{\nu}_{t}$ to their temporal derivatives, $\dfrac{dz_{\tau}^{\nu}}{d\tau}=F_{\psi}(z^{\nu}_{\tau})$, and that is equivariant with respect to the group action: $gF_{\psi}(z^{\nu}_{\tau})=F_{\psi}(gz^{\nu}_{\tau})$. To this end, we use a message passing neural network (MPNN) to learn a flow of poses $p_{i}$ and contexts $\mathbf{c}_{i}$ over time. 
We base our architecture on P$\Theta$NITA [5], which employs convolutional weight-sharing over bi-invariants for $\mathrm{SE}(n)$. For an in-depth recap of message-passing frameworks, we refer the reader to Appx. A. Since $F_{\psi}$ is required to be equivariant w.r.t. the group action, any updates to the poses $p_{i}$ should also be equivariant. [40] propose to parameterize an equivariant node position update by using a basis spanned by relative node positions $x_{j}-x_{i}$. In our setting, poses $p_{i}$ are points on a manifold $M$ equipped with a group action. As such, we analogously propose parameterizing pose updates by a weighted combination of logarithmic maps $\log_{p_{i}}(p_{j})$, which intuitively describe the relative position between $p_{i},p_{j}$ in the tangent space $T_{p_{i}}M$, or the displacement from $p_{i}$ to $p_{j}$. 
We integrate the resulting pose update over the manifold through the exponential map $\exp_{p_{i}}$. In the Euclidean case $\log_{p_{i}}(p_{j})=x_{j}-x_{i}$ and we recover the node position updates of [40]. In short, the message passing layers we use consist of the following update functions:

$$\begin{aligned}
\mathbf{c}_{i}^{l+1} &= \sum_{(p_{j},\mathbf{c}_{j})\in z^{\nu,l}} k^{\text{context}}(\mathbf{a}^{l}_{i,j})\,\mathbf{c}_{j}^{l}, \\
p_{i}^{l+1} &= \exp_{p_{i}^{l}}\bigg(\frac{1}{N}\sum_{(p_{j}^{l},\mathbf{c}_{j}^{l})\in z^{\nu,l}} k^{\text{pose}}(\mathbf{a}^{l}_{i,j})\,\mathbf{c}_{j}^{l}\,\log_{p_{i}^{l}}(p_{j}^{l})\bigg),
\end{aligned} \tag{9}$$

with $k^{\text{context}},k^{\text{pose}}$ message functions weighting the incoming context and pose updates, each parameterized by a two-layer MLP as a function of the respective bi-invariant.
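A minimal sketch of one layer of Eq. 9 in the Euclidean case is given below, where $\log_{p_i}(p_j)=p_j-p_i$ and $\exp_{p_i}(v)=p_i+v$. Several choices here are assumptions made for illustration and are not taken from the paper: poses are plain positions in $\mathbb{R}^2$, the bi-invariant is the pairwise distance, $k^{\text{context}}$ weights the context elementwise, and $k^{\text{pose}}(\mathbf{a})\,\mathbf{c}_j$ is read as an inner product giving one scalar weight per pair. The names `mlp`, `init_mlp` and `mp_layer` are hypothetical.

```python
import numpy as np

def init_mlp(rng, d_in, d_hidden, d_out):
    """Random parameters of a two-layer MLP."""
    return (rng.normal(size=(d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(size=(d_hidden, d_out)), np.zeros(d_out))

def mlp(params, a):
    """Two-layer MLP applied along the last axis."""
    W1, b1, W2, b2 = params
    return np.tanh(a @ W1 + b1) @ W2 + b2

def mp_layer(p, c, ctx_params, pose_params):
    """One message passing layer for poses p (N, n) and contexts c (N, d)."""
    rel = p[None, :, :] - p[:, None, :]                # rel[i, j] = log_{p_i}(p_j)
    a = np.linalg.norm(rel, axis=-1, keepdims=True)    # (N, N, 1) invariant attribute
    k_ctx = mlp(ctx_params, a)                         # (N, N, d) context weights
    k_pose = mlp(pose_params, a)                       # (N, N, d) pose weights
    c_new = np.sum(k_ctx * c[None, :, :], axis=1)      # sum_j k(a_ij) * c_j
    w = np.sum(k_pose * c[None, :, :], axis=-1)        # (N, N) scalar weight per pair
    p_new = p + np.mean(w[..., None] * rel, axis=1)    # exp_{p_i}(1/N sum_j w_ij rel_ij)
    return p_new, c_new

rng = np.random.default_rng(0)
N, n, d, h = 6, 2, 4, 8
p = rng.normal(size=(N, n))
c = rng.normal(size=(N, d))
ctx_params = init_mlp(rng, 1, h, d)
pose_params = init_mlp(rng, 1, h, d)
p_new, c_new = mp_layer(p, c, ctx_params, pose_params)
```

Because the weights depend on the poses only through an invariant attribute, roto-translating the input poses leaves the updated contexts unchanged and roto-translates the updated poses in the same way.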

Figure 3: We show the impact of meta-learning and equivariance on the latent space of the ENF when representing trajectories of PDE states. Fig. 3(a) shows a T-SNE plot of the latent space of $f_{\theta}$ when $z^{\nu}_{t}$ is optimized with autodecoding, and no weight sharing over bi-invariants is enforced. Fig. 3(b) shows the latent space when meta-learning is used, but no weight sharing is enforced. Fig. 3(c) shows the latent space when $z^{\nu}_{t}$ are obtained using meta-learning and $f_{\theta}$ shares weights over $\mathbf{a}^{\mathrm{SE}(n)}$.

3.3 Obtaining the initial latent $z^{\nu}_{0}$

Until now we have not discussed how to obtain the latents $z^{\nu}_{0}$ corresponding to the initial condition. An approach often used in the conditional neural field literature is autodecoding [35], where latents $z^{\nu}$ are optimized for reconstruction of the input signal $\nu$ with SGD. Optimizing a NeF for reconstruction does not necessarily lead to good-quality representations [34], i.e. using MSE-based autodecoding to obtain latents $z^{\nu}_{t}$, as proposed by [49], may complicate the latent space, impeding optimization of the neural ODE $F_{\psi}$. Moreover, autodecoding requires many optimization steps at inference (for reference, [49] use 300-500 steps). [13] propose meta-learning as a way to overcome long inference times, as it allows for fitting latents in a few steps, typically three or four. We hypothesize that meta-learning may also structure the latent space, similar to the impact of equivariance constraints, since the very limited number of optimization steps requires efficient organization of latents $z^{\nu}_{t}$ around the (shared) initialization, forcing together the latent representations of contiguous states. 
To this end, we propose to use meta-learning for obtaining the initial latent $z^{\nu}_{0}$, which is then unrolled by the neural ODE $F_{\psi}$ to find solutions $z^{\nu}_{t}$.
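The few-step fitting of a latent from a shared initialization can be illustrated with a toy example. The sketch below is not the ENF decoder: it assumes a fixed Fourier-feature decoder that is linear in the latent, so the gradient of the MSE is available in closed form, and it shows only the inner loop (in meta-learning, the decoder weights and the initialization would additionally be trained through this loop). The names `features`, `mse` and `inner_loop`, the learning rate and the toy signal are hypothetical.

```python
import numpy as np

def features(x, n_freq=4):
    """Fixed Fourier features standing in for the decoder f_theta."""
    freqs = np.arange(1, n_freq + 1)
    return np.concatenate([np.sin(np.outer(x, freqs)),
                           np.cos(np.outer(x, freqs))], axis=1)   # (M, 2 * n_freq)

def mse(z, x, nu):
    """Reconstruction error of the signal nu from latent z."""
    return np.mean((features(x) @ z - nu) ** 2)

def inner_loop(z_init, x, nu, lr=0.5, steps=3):
    """Fit a latent to one signal with a few gradient steps from a shared init."""
    z = z_init.copy()
    phi = features(x)
    for _ in range(steps):
        grad = 2.0 * phi.T @ (phi @ z - nu) / len(x)   # gradient of the MSE w.r.t. z
        z = z - lr * grad
    return z

x = np.linspace(0.0, 2 * np.pi, 64, endpoint=False)    # sampled coordinates
nu0 = np.sin(x) + 0.5 * np.cos(2 * x)                  # toy initial state nu_0
z_init = np.zeros(8)                                   # shared initialization
z0 = inner_loop(z_init, x, nu0)

loss_before = mse(z_init, x, nu0)
loss_after = mse(z0, x, nu0)
```

With only three steps the latent stays close to the shared initialization, which is the mechanism hypothesized above to organize the latent space.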

3.4 Equivariance and meta-learning structure the latent space $Z$

As a first validation of the hypotheses that both equivariance constraints and meta-learning introduce structure to the latent space of $f_{\theta}$, we visualize latent spaces of different variants of the ENF. We fit ENFs to a dataset consisting of solutions to the heat equation for various initial conditions (details in Appx. E). For each sample $\nu_{t}$, we obtain a set of latents $z^{\nu}_{t}$, which we average over the invariant context vectors $\mathbf{c}_{i}\in\mathbb{R}^{c}$ to obtain a single vector in $\mathbb{R}^{c}$ invariant to a group action according to the chosen bi-invariant. Next, we apply T-SNE [46] to the resulting vectors in $\mathbb{R}^{c}$. We use three different setups: (a) no meta-learning, with model weights $\theta$ and latents $z^{\nu}_{t}$ optimized for every $\nu_{t}$ separately using autodecoding [35], and no equivariance imposed (per Eq. 15), shown in Fig. 3(a); (b) meta-learning is used to obtain $\theta, z^{\nu}_{t}$, but no equivariance is imposed, shown in Fig. 3(b); and (c) meta-learning is used to obtain $\theta, z^{\nu}_{t}$ and $\mathrm{SE}(2)$-equivariance is imposed by weight-sharing over $\mathbf{a}^{\mathrm{SE}(n)}$ bi-invariants, shown in Fig. 3(c). The resulting plots support our intuition that both meta-learning and equivariance improve latent-space structure.

Recap: optimization objective.

We use a meta-learning inner-loop [28, 13] to obtain the initial latent $z^{\nu}_{0}$ under supervision of coordinate-value pairs $(x,\nu(x)_{0})_{x\in\mathcal{X}}$ from $\nu_{0}$. This latent is unrolled for $t_{\text{train}}$ timesteps using $F_{\psi}$. The obtained latents $z^{\nu}_{t}$ are used to reconstruct the states $\nu_{t}$ along the trajectory of $\nu$, and the parameters of $f_{\theta},F_{\psi}$ are optimised for reconstruction MSE, as shown in the left-hand side of Eq. 1. See Appx. B for detailed pseudocode of this process.
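The unroll-then-reconstruct part of this objective can be sketched on a toy problem. The sketch below assumes forward-Euler integration, linear latent dynamics and a two-parameter decoder, all of which are stand-ins chosen so that the exact solution is known; they are not the paper's $F_{\psi}$, $f_{\theta}$ or ODE solver, and the names `unroll`, `trajectory_mse`, `F` and `decode` are hypothetical.

```python
import numpy as np

def unroll(z0, F, n_steps, dt):
    """Forward-Euler unroll of dz/dtau = F(z); returns latents z_0, ..., z_{n_steps}."""
    zs = [z0]
    for _ in range(n_steps):
        zs.append(zs[-1] + dt * F(zs[-1]))
    return np.stack(zs)

def trajectory_mse(zs, decode, x, nu):
    """Reconstruction MSE averaged over all unrolled time steps and coordinates."""
    preds = np.stack([decode(x, z) for z in zs])   # (T, M)
    return np.mean((preds - nu) ** 2)

A = np.array([[0.0, 1.0], [-1.0, 0.0]])

def F(z):
    """Toy latent dynamics (a rotation of the latent)."""
    return A @ z

def decode(x, z):
    """Toy decoder: the latent holds the sine and cosine coefficients."""
    return z[0] * np.sin(x) + z[1] * np.cos(x)

x = np.linspace(0.0, 2 * np.pi, 32, endpoint=False)
dt, n_steps = 0.01, 10
times = dt * np.arange(n_steps + 1)
nu = np.sin(x[None, :] - times[:, None])     # (T, M) ground truth: a travelling wave
z0 = np.array([1.0, 0.0])                    # latent that reconstructs nu_0 = sin(x)
zs = unroll(z0, F, n_steps, dt)
loss = trajectory_mse(zs, decode, x, nu)
```

In training, `loss` would be minimized with respect to the parameters of both the decoder and the latent dynamics; here both are fixed to the exact solution, so the remaining error is only the Euler discretization error.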

4 Experiments

We intend to show the impact of symmetry-preservation in continuous PDE solving. To this end we perform a range of experiments assessing different qualities of our model on tasks with different symmetries. First, we investigate the equivariance properties of our framework by evaluating it against unseen geometric transformations of the initial conditions. Next, we assess generalization and extrapolation capabilities w.r.t. unseen spatial locations and time horizons inside and outside the time ranges seen during training respectively, robustness to partial test-time observations, and data-efficiency. As the continuous nature of NeF-based PDE solving allows, we verify these properties for PDEs defined over challenging geometries: the plane $\mathbb{R}^{2}$, the 2-torus $\mathbb{T}^{2}$, the sphere $S^{2}$ and the 3D ball $\mathbb{B}^{3}$. Architectural details and hyperparameters are in Appx. E. Code is attached to the submission.

Table 1: MSE ($\downarrow$) for the heat equation on $\mathbb{R}^{2}$.

| Model | $t_{\text{in}}$ train | $t_{\text{out}}$ train | $t_{\text{in}}$ test | $t_{\text{out}}$ test |
|---|---|---|---|---|
| DINo [49] | 5.92E-04 | 2.40E-04 | 3.85E-03 | 5.12E-03 |
| Ours $\mathbf{a}^{\emptyset}$ | 6.23±1.01E-06 | 4.90±20.1E-06 | 2.19±0.32E-03 | 5.08±13.2E-04 |
| Ours $\mathbf{a}^{\mathrm{SE}(2)}$ | 1.18±0.45E-05 | 2.53±3.50E-05 | 1.50±0.77E-05 | 2.53±3.43E-05 |
Figure 4: A train and test sample from the planar diffusion dataset. Initial conditions for train and test are spikes in disjoint subsets of $\mathbb{R}^{2}$.

4.1 Datasets and evaluation

All datasets are obtained by randomly sampling disjoint sets of initial conditions for the train and test sets and solving them with numerical methods. Dataset-specific details on generation can be found in Appx. E.
• Heat equation on $\mathbb{R}^{2}$ and $S^{2}$. The heat equation describes diffusion over a surface: $\frac{dc}{dt}=D\nabla^{2}c$, where $c$ is a scalar field and $D$ is the diffusivity coefficient. We solve it on the 2D plane, where $\nabla^{2}c=\frac{\partial^{2}c}{\partial x_{1}^{2}}+\frac{\partial^{2}c}{\partial x_{2}^{2}}$, and on the 2-sphere $S^{2}$, where in spherical coordinates $\nabla^{2}c=\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial c}{\partial\theta}\right)+\frac{1}{\sin^{2}\theta}\frac{\partial^{2}c}{\partial\phi^{2}}$. Although this is a relatively simple PDE, we find that defining it over a non-trivial geometry such as the sphere proves hard for non-equivariant methods.
• Navier-Stokes on $\mathbb{T}^{2}$. We solve 2D Navier-Stokes [42] for an incompressible fluid with dynamics $\frac{dv}{dt}=-u\cdot\nabla v+\mu\Delta v+f$, $v=\nabla\times u$, $\nabla\cdot u=0$, where $u$ is the velocity field, $v$ the vorticity, $\mu$ the viscosity and $f$ a forcing term (see Appx. E). We create a dataset of solutions for the vorticity using Gaussian random fields as initial conditions. Due to the incompressibility condition, it is natural to solve this PDE with periodic boundary conditions corresponding to the topology of a 2-torus $\mathbb{T}^{2}$, implying equivariance to periodic translation.
• Shallow-water on $\mathbb{S}^{2}$. The global shallow-water equations model large-scale oceanic and atmospheric flow on the globe, and are derived from Navier-Stokes under the assumption of shallow fluid depth. These equations (see Appx. E) include terms for Coriolis acceleration, which makes this problem equivariant to rotation along the globe's axis of rotation. We follow the IVP specified by [16] and create a dataset of paired vorticity-fluid height solutions.
• Internally-heated convection in a 3D ball. We solve the Boussinesq equation for internally heated convection in a ball, a model relevant for example in the context of the Earth's mantle convection. It involves continuity equations for mass conservation, momentum equations for fluid flow under pressure, viscous forces and buoyancy, and a term modelling heat transfer. We generate initial conditions by varying the internal temperature using $N(0,1)$ noise and obtain solutions for the temperature defined over a regular spherical $\phi,\theta,r$ grid.
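
As an illustration of what "solving with numerical methods" means for the simplest of these PDEs, the following is a minimal sketch of an explicit finite-difference solver for the heat equation. It assumes a periodic square grid and forward-Euler time stepping; the function names, grid size, $D$, and step sizes are illustrative choices and not the setup used to generate the datasets (see Appx. E for that).

```python
import numpy as np

def laplacian_periodic(c, dx):
    """Five-point finite-difference Laplacian with periodic boundaries."""
    return (
        np.roll(c, 1, axis=0) + np.roll(c, -1, axis=0)
        + np.roll(c, 1, axis=1) + np.roll(c, -1, axis=1)
        - 4.0 * c
    ) / dx**2

def heat_step(c, D, dx, dt):
    """One forward-Euler step of dc/dt = D * laplacian(c)."""
    return c + dt * D * laplacian_periodic(c, dx)

def solve_heat(c0, D=1.0, dx=1.0, dt=0.1, n_steps=20):
    """Return the trajectory [c_0, ..., c_{n_steps}] stacked along axis 0."""
    # Stability condition of the explicit scheme in 2D.
    assert D * dt / dx**2 <= 0.25, "explicit scheme is unstable for this step size"
    traj = [c0]
    for _ in range(n_steps):
        traj.append(heat_step(traj[-1], D, dx, dt))
    return np.stack(traj)

# Hypothetical initial condition: a single pulse on a 32x32 grid.
c0 = np.zeros((32, 32))
c0[8, 20] = 1.0
trajectory = solve_heat(c0)
```

With periodic boundaries the total amount of $c$ is conserved while the pulse spreads out, which gives a quick sanity check on any generated trajectory.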

Evaluation.

All reported MSE values are for predictions obtained given only the initial condition $v_{0}$, with standard deviation over 3 runs. For both train and test sets we evaluate two settings: a generalization setting, where the time evolution happens within the horizon seen during training ($t_{\text{in}}$), and an extrapolation setting, where the time evolution happens outside the horizon seen during training ($t_{\text{out}}$). In both cases we measure the mean-squared error (MSE). To position our work relative to competitive data-driven PDE solvers, we provide comparisons with a range of baselines on the 2D Navier-Stokes experiment. In most other settings these models cannot straightforwardly be applied, and we only compare to [49], to our knowledge the only other fully continuous PDE solving method in the literature.
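
The two reported quantities can be sketched as follows. This is a minimal sketch assuming that predictions and targets are stored as arrays with time on the first axis and that the first 10 frames form $t_{\text{in}}$; `horizon_mse` and the synthetic data are hypothetical and only serve to show the split.

```python
import numpy as np

def horizon_mse(pred, target, t_in_len=10):
    """Split a rollout of shape (T, ...) into t_in / t_out and return both MSEs."""
    sq_err = (pred - target) ** 2
    mse_in = sq_err[:t_in_len].mean()    # frames seen during training
    mse_out = sq_err[t_in_len:].mean()   # extrapolation frames
    return float(mse_in), float(mse_out)

# Hypothetical rollout of 21 frames: t_in = [0..9], t_out = [10..20].
rng = np.random.default_rng(0)
target = rng.normal(size=(21, 16, 16))
pred = target.copy()
pred[10:] += 0.1  # constant offset in the extrapolation window only
mse_in, mse_out = horizon_mse(pred, target)
```

In this toy case the error is zero inside the training horizon and $0.1^{2}$ outside it; in the tables the same two numbers are reported separately for train and test initial conditions.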

Equivariance properties - heat equation on the plane.

To verify that our framework respects the posed equivariance constraints, we create a dataset of solutions to the heat equation that requires a neural solver to respect equivariance constraints to achieve good performance. Specifically, for initial conditions we randomly insert a pulse of variable intensity at $x=(x_{1},x_{2})\in\mathbb{R}^{2}$ such that $-1<x_{1}<1$, $0<x_{2}<1$ for the training data and $-1<x_{1}<1$, $-1<x_{2}<0$ for the test data. Intuitively, train and test sets contain spikes under disjoint sets of roto-translations (see Fig. 4). We train variants of our framework with ($\mathbf{a}^{\rm SE(2)}$, Eq. 6) and without ($\mathbf{a}^{\emptyset}$, Eq. 15) equivariance constraints. In this dataset, we set $t_{\text{in}}=[0,...,9]$ and the evaluation horizon $t_{\text{out}}=[10,...,20]$. Results in Tab. 1 show that the non-equivariant model, as well as the baseline [49], are unable to successfully solve test initial conditions, whereas the equivariant model performs well.
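
The split described above can be sketched as follows. This is a minimal sketch that only samples pulse locations; the function name and sample counts are assumptions, and the pulse intensity and shape used for the actual dataset are not modelled here.

```python
import numpy as np

def sample_pulse_centres(n, split, rng):
    """Sample pulse centres in the upper (train) or lower (test) half of [-1, 1]^2."""
    x1 = rng.uniform(-1.0, 1.0, size=n)
    if split == "train":
        x2 = rng.uniform(0.0, 1.0, size=n)
    elif split == "test":
        x2 = rng.uniform(-1.0, 0.0, size=n)
    else:
        raise ValueError(f"unknown split: {split}")
    return np.stack([x1, x2], axis=-1)

rng = np.random.default_rng(0)
train_centres = sample_pulse_centres(128, "train", rng)
test_centres = sample_pulse_centres(128, "test", rng)

# Rotating a train location by pi about the origin lands in the test half-plane.
rotated = -train_centres
```

The last line makes the intuition explicit: every test location is a rotated copy of some admissible train location, so a solver that is equivariant to such transformations has, in effect, already seen the test distribution.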

Figure 5: A Navier-Stokes test sample (top) and corresponding predictions from our model (bottom). We visualize predictions in the train horizon $t_{\text{in}}=[0,...,9]$, $t_{\text{out}}=[10,...,20]$ and beyond. The model remains stable well beyond the train horizon, but due to accumulated errors fails to capture the dynamics for $t>40$.
Figure 6: Test MSE on $t_{\text{in}}$ for increasing training set sizes for the heat equation over the sphere. The equivariant model improves over the non-equivariant model. For reference we show the performance of DINo [49] trained on 256 trajectories.
Table 2: MSE $\downarrow$ for Navier-Stokes on $\mathbb{T}^{2}$.
| Model | $t_{\text{in}}$ train | $t_{\text{out}}$ train | $t_{\text{in}}$ test | $t_{\text{out}}$ test |
|---|---|---|---|---|
| 100% of $\nu_{0}$ observed | | | | |
| CNODE [2] | 6.02E-02 | 3.35E-01 | 5.48E-02 | 3.17E-01 |
| FNO | 9.43E-05 | 2.11E-03 | 8.44E-05 | 1.60E-03 |
| G-FNO | 3.13E-05 | 3.49E-04 | 3.15E-05 | 3.52E-04 |
| DINo [49] | 8.20E-03 | 6.85E-02 | 1.11E-02 | 9.08E-02 |
| Ours AD, $\mathbf{a}^{\mathbb{T}_{2}/\pi}$ | 5.60±0.43E-02 | 0.37±0.34E-01 | 6.75±0.62E-02 | 4.00±0.38E-01 |
| Ours $\mathbf{a}^{\emptyset}$ | 1.41±1.83E-02 | 1.67±1.27E-01 | 2.60±3.16E-02 | 2.14±1.46E-01 |
| Ours $\mathbf{a}^{\mathbb{T}_{2}/\pi}$ | 1.45±0.08E-03 | 9.14±0.36E-03 | 1.57±0.09E-03 | 1.16±0.14E-02 |
| 50% of $\nu_{0}$ observed | | | | |
| CNODE [2] | 1.38E-01 | 6.33E-01 | 1.52E-01 | 6.76E-01 |
| FNO | 3.31E-02 | 1.39E-01 | 3.20E-02 | 1.47E-01 |
| G-FNO | 2.75E-02 | 1.17E-01 | 2.32E-02 | 1.01E-01 |
| DINo [49] | 3.67E-02 | 2.81E-01 | 3.74E-02 | 2.83E-01 |
| Ours AD, $\mathbf{a}^{\mathbb{T}_{2}/\pi}$ | 6.89±2.68E-02 | 3.95±2.18E-01 | 7.01±3.56E-02 | 4.01±2.29E-01 |
| Ours $\mathbf{a}^{\emptyset}$ | 1.05±0.04E-02 | 1.45±0.01E-01 | 2.60±3.16E-02 | 2.14±1.46E-01 |
| Ours $\mathbf{a}^{\mathbb{T}_{2}/\pi}$ | 1.50±0.17E-03 | 8.97±1.57E-03 | 5.75±2.58E-03 | 5.03±2.63E-02 |
| 5% of $\nu_{0}$ observed | | | | |
| CNODE [2] | 1.23E+01 | 2.14E+01 | 1.20E+01 | 4.35E+01 |
| FNO | 4.13E-01 | 7.70E-01 | 3.84E-01 | 7.07E-01 |
| G-FNO | 3.56E-01 | 7.09E-01 | 3.40E-01 | 6.47E-01 |
| DINo [49] | 3.67E-02 | 2.81E-01 | 3.94E-02 | 2.91E-01 |
| Ours AD, $\mathbf{a}^{\mathbb{T}_{2}/\pi}$ | 6.89±2.68E-02 | 3.95±2.18E-01 | 7.01±3.56E-02 | 4.01±2.29E-01 |
| Ours $\mathbf{a}^{\emptyset}$ | 7.31±1.37E-02 | 2.97±2.42E-01 | 7.96±1.65E-02 | 3.35±3.41E-01 |
| Ours $\mathbf{a}^{\mathbb{T}_{2}/\pi}$ | 3.19±1.07E-02 | 1.33±0.35E-01 | 3.44±1.43E-02 | 1.61±4.93E-01 |

Robustness to subsampling & time-horizons - Navier-Stokes on the 2-Torus.

We perform an experiment assessing the impact of equivariance constraints and meta-learning on robustness to sparse test-time observations of the initial condition. To this end, we train three models on a fully-observed train set: one with equivariance constraints ($\mathbf{a}^{\mathbb{T}^{2}}$, Eq. 7), one without ($\mathbf{a}^{\emptyset}$, Eq. 15), and one with equivariance constraints but without meta-learning (AD $\mathbf{a}^{\mathbb{T}^{2}}$, Eq. 7). The training horizon is $t_{\text{in}}=[0,...,9]$ and the evaluation horizon is $t_{\text{out}}=[10,...,20]$. Subsequently, we apply the trained models to the problem of solving from sparse initial conditions $v_{0}$, with observation rates where $50\%$ and $5\%$ of the initial condition is observed (Tab. 2). Approaches operating on discrete (CNODE [2]) and regular grids (FNO [29], G-FNO [20]) perform very well when evaluated on fully-observed regular grids, outperforming continuous approaches (ours, [49]). However, the performance of all discrete/regular models deteriorates greatly when observation rates decrease. Equivariance constraints and meta-learning clearly improve performance overall, achieving the best performance in all sparse settings. In the fully observed setting, our proposed framework performs competitively with discrete baselines and other NeF-based PDE solving methods [49].
To qualitatively assess long-term stability well beyond the train horizon, we visualize a test trajectory and the solution found by our model for $t_{\text{in}}=[0,...,9]$, $t_{\text{out}}=[10,...,20]$ and beyond in Fig. 5.
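
Sparse observation of the initial condition can be sketched as a random subsampling of the observed points. The sketch below assumes uniform sampling without replacement over a flattened grid; the helper name, grid size and test function are hypothetical, and the exact sampling scheme used in the experiments may differ.

```python
import numpy as np

def subsample_initial_condition(coords, values, rate, rng):
    """Keep a random fraction `rate` of the observed points of v_0."""
    n_points = coords.shape[0]
    n_keep = max(1, int(round(rate * n_points)))
    idx = rng.choice(n_points, size=n_keep, replace=False)
    return coords[idx], values[idx]

# Hypothetical 64x64 regular grid on [0, 1)^2, flattened to a point set.
grid = np.linspace(0.0, 1.0, 64, endpoint=False)
coords = np.stack(np.meshgrid(grid, grid, indexing="ij"), axis=-1).reshape(-1, 2)
values = np.sin(2 * np.pi * coords[:, 0]) * np.cos(2 * np.pi * coords[:, 1])

rng = np.random.default_rng(0)
coords_50, values_50 = subsample_initial_condition(coords, values, 0.50, rng)
coords_5, values_5 = subsample_initial_condition(coords, values, 0.05, rng)
```

The result is an irregular point set, which a coordinate-based model can consume directly, whereas grid-based baselines need the missing values to be filled in or masked before they can be applied.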

Data-efficiency - Diffusion on the sphere.

To assess the impact of equivariance on data efficiency, we vary the size of the training set of heat equation solutions from 16 to 64 trajectories and apply a model with ($\mathbf{a}^{\rm SO(3)}$, Eq. 13) and without ($\mathbf{a}^{\emptyset}$, Eq. 15) equivariance constraints. In this dataset, we set $t_{\text{in}}=[0,...,9]$ and the evaluation horizon $t_{\text{out}}=[10,...,20]$. We visualize $t_{\text{in}}$ test and train MSE in Fig. 6. These results show the non-equivariant model overfitting the training set for smaller numbers of trajectories while being unable to solve the PDE satisfactorily, whereas the equivariant model generalizes well even with only 16 training trajectories.

Table 3: MSE $\downarrow$ on Shallow-Water equations on the sphere.
| Model | $t_{\text{in}}$ train | $t_{\text{out}}$ train | $t_{\text{in}}$ test | $t_{\text{out}}$ test |
|---|---|---|---|---|
| Train resolution | | | | |
| DINo [49] | 1.75E-04 | 1.36E-03 | 2.01E-04 | 1.37E-03 |
| Ours $\mathbf{a}^{\rm SW}$ | 9.94±0.41E-05 | 1.89±0.03E-03 | 1.09±1.14E-04 | 1.87±0.04E-03 |
| Zero-shot 2x super-resolution | | | | |
| DINo [49] | 3.03E-04 | 2.03E-03 | 3.37E-04 | 2.03E-03 |
| Ours $\mathbf{a}^{\rm SW}$ | 1.58±0.02E-04 | 1.96±0.02E-03 | 1.61±0.01E-04 | 1.93±0.02E-03 |
Figure 7: Test samples at train resolution (top), $2\times$ train resolution (middle) and corresponding predictions from our equivariant model ($\mathbf{a}^{\rm SW}$, Eq. 8) (bottom). The model does not produce significant upsampling artefacts, but fails to capture dynamics outside the training horizon.
Table 4: MSE $\downarrow$ on Internally-Heated Convection in the ball.
| Model | $t_{\text{in}}$ train | $t_{\text{out}}$ train | $t_{\text{in}}$ test | $t_{\text{out}}$ test |
|---|---|---|---|---|
| DINo [49] | 2.94E-03 | 7.56E-02 | 3.06E-03 | 7.78E-02 |
| Ours $\mathbf{a}^{\mathbb{B}^{3}}$ | 5.79±0.17E-04 | 7.72±0.55E-03 | 5.99±0.15E-04 | 7.97±0.46E-03 |
Figure 8: Test samples (top) and corresponding predictions from our model equivariant to $S^{2}$-rotations in the ball (Eq. 14).

Super-resolution - Shallow-Water on the sphere.

Due to their continuous nature, NeF-based approaches inherently support zero-shot super-resolution. In this setting, we generate a set of solutions for the global shallow-water equations over $\mathbb{S}^{2}$ at $2\times$ resolution, and apply mean-pooling with a kernel size of 2 to obtain a low-resolution dataset. We train a model that respects rotational symmetries along the rotation axis of the globe ($\mathbf{a}^{\rm SW}$, Eq. 8) at train resolution, and evaluate the model by solving initial conditions at $2\times$ resolution (Tab. 3, Fig. 7). In this dataset, we set $t_{\text{in}}=[0,...,9]$ and the evaluation horizon $t_{\text{out}}=[10,...,14]$. First, we note that our model has difficulty capturing the dynamics near $t_{\text{out}}$ and beyond the training horizon, i.e. $t>9$. We suspect this is because accumulated reconstruction errors impact the ability of $F_{\psi}$ to model the relatively volatile dynamics of these equations. This points to a drawback of NeF-based solvers: error accumulation starts with the reconstruction error on the initial condition. Across our experiments, we found that this error can be reduced by increasing model capacity, at a steep cost in computational complexity attributable to the global attention operator in the ENF backbone. Regarding super-resolution: the model is able to solve the high-resolution initial conditions without a significant increase in MSE, and it does not produce significant artefacts in the process.
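
The downsampling step that produces the low-resolution training data can be sketched as follows. This is a minimal sketch of mean-pooling with kernel size 2 on a single 2D field; it assumes non-overlapping windows (stride 2) and an even grid size, and it ignores any area weighting on the sphere. The function name is hypothetical.

```python
import numpy as np

def mean_pool_2x(field):
    """Average non-overlapping 2x2 blocks of an (H, W) field with even H and W."""
    h, w = field.shape
    assert h % 2 == 0 and w % 2 == 0, "H and W must be even"
    # Index [i, a, j, b] of the reshaped array is field[2*i + a, 2*j + b].
    return field.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

high_res = np.arange(16, dtype=float).reshape(4, 4)
low_res = mean_pool_2x(high_res)
```

For the 4x4 example the four block means are 2.5, 4.5, 10.5 and 12.5. A model trained on fields like `low_res` is then queried at the coordinates of `high_res` at test time.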

Challenging geometries - Internally heated convection in 3D ball.

We show the value of inductive biases in modelling over a challenging geometry. We apply an equivariant model ($\mathbf{a}^{\mathbb{B}^{3}}$, Eq. 14) to a set of solutions to Boussinesq internally heated convection in a ball defined over a regular $\phi,\theta,r$ grid, where we set $t_{\text{in}}=[0,...,9]$ and the evaluation horizon $t_{\text{out}}=[10,...,14]$. Results (Tab. 4, Fig. 8) for our equivariant model show good generalization compared to a non-equivariant baseline [49]. We interpret this as an indication of a marked reduction in solving complexity when correctly accounting for a PDE's symmetries.

5 Conclusion

We introduce a novel equivariant space-time continuous framework for solving partial differential equations (PDEs). Uniquely, our method handles sparse or irregularly sampled observations of the initial state while respecting symmetry constraints and boundary conditions of the underlying PDE. We show the benefit of symmetry preservation over a range of challenging tasks where existing methods fail to capture the underlying dynamics.

References

  • Auzina et al. [2024] Ilze Amanda Auzina, Çağatay Yıldız, Sara Magliacane, Matthias Bethge, and Efstratios Gavves. Modulated neural odes. Advances in Neural Information Processing Systems, 36, 2024.
  • Ayed et al. [2020] Ibrahim Ayed, Emmanuel De Bezenac, Arthur Pajot, and Patrick Gallinari. Learning the spatio-temporal dynamics of physical processes from partial observations. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3232–3236. IEEE, 2020.
  • Bauer et al. [2023] Matthias Bauer, Emilien Dupont, Andy Brock, Dan Rosenbaum, Jonathan Richard Schwarz, and Hyunjik Kim. Spatial functa: Scaling functa to imagenet classification and generation. arXiv preprint arXiv:2302.03130, 2023.
  • Bekkers [2019] Erik J Bekkers. B-spline cnns on lie groups. In International Conference on Learning Representations, 2019.
  • Bekkers et al. [2023] Erik J Bekkers, Sharvaree Vadgama, Rob D Hesselink, Putri A van der Linden, and David W Romero. Fast, expressive se$(n)$ equivariant networks through weight-sharing in position-orientation space. arXiv preprint arXiv:2310.02970, 2023.
  • Brandstetter et al. [2021] Johannes Brandstetter, Rob Hesselink, Elise van der Pol, Erik J Bekkers, and Max Welling. Geometric and physical quantities improve e (3) equivariant message passing. arXiv preprint arXiv:2110.02905, 2021.
  • Brandstetter et al. [2022a] Johannes Brandstetter, Rianne van den Berg, Max Welling, and Jayesh K Gupta. Clifford neural layers for pde modeling. arXiv preprint arXiv:2209.04934, 2022a.
  • Brandstetter et al. [2022b] Johannes Brandstetter, Daniel Worrall, and Max Welling. Message passing neural pde solvers. arXiv preprint arXiv:2202.03376, 2022b.
  • Bronstein et al. [2021] Michael M Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478, 2021.
  • Burns et al. [2020] Keaton J. Burns, Geoffrey M. Vasil, Jeffrey S. Oishi, Daniel Lecoanet, and Benjamin P. Brown. Dedalus: A flexible framework for numerical simulations with spectral methods. Physical Review Research, 2(2):023068, April 2020. doi: 10.1103/PhysRevResearch.2.023068.
  • Chen et al. [2018] Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. Advances in neural information processing systems, 31, 2018.
  • Cohen and Welling [2016] Taco Cohen and Max Welling. Group equivariant convolutional networks. In International conference on machine learning, pages 2990–2999. PMLR, 2016.
  • Dupont et al. [2022] Emilien Dupont, Hyunjik Kim, SM Eslami, Danilo Rezende, and Dan Rosenbaum. From data to functa: Your data point is a function and you can treat it like one. arXiv preprint arXiv:2201.12204, 2022.
  • Finn et al. [2017] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In International conference on machine learning, pages 1126–1135. PMLR, 2017.
  • Finzi et al. [2020] Marc Finzi, Samuel Stanton, Pavel Izmailov, and Andrew Gordon Wilson. Generalizing convolutional neural networks for equivariance to lie groups on arbitrary continuous data. In International Conference on Machine Learning, pages 3165–3176. PMLR, 2020.
  • Galewsky et al. [2004] Joseph Galewsky, Richard K Scott, and Lorenzo M Polvani. An initial-value problem for testing numerical models of the global shallow-water equations. Tellus A: Dynamic Meteorology and Oceanography, 56(5):429–440, 2004.
  • Gilmer et al. [2017] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In International conference on machine learning, pages 1263–1272. PMLR, 2017.
  • Greydanus et al. [2019] Samuel Greydanus, Misko Dzamba, and Jason Yosinski. Hamiltonian neural networks. Advances in neural information processing systems, 32, 2019.
  • Guo et al. [2016] Xiaoxiao Guo, Wei Li, and Francesco Iorio. Convolutional neural networks for steady flow approximation. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 481–490, 2016.
  • Helwig et al. [2023] Jacob Helwig, Xuan Zhang, Cong Fu, Jerry Kurtin, Stephan Wojtowytsch, and Shuiwang Ji. Group equivariant fourier neural operators for partial differential equations. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023.
  • Hernández et al. [2021] Quercus Hernández, Alberto Badías, David González, Francisco Chinesta, and Elías Cueto. Structure-preserving neural networks. Journal of Computational Physics, 426:109950, 2021.
  • Jin et al. [2020] Pengzhan Jin, Zhen Zhang, Aiqing Zhu, Yifa Tang, and George Em Karniadakis. Sympnets: Intrinsic structure-preserving symplectic networks for identifying hamiltonian systems. Neural Networks, 132:166–179, 2020.
  • Kingma and Ba [2014] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Kipf and Welling [2016] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
  • Knigge et al. [2022] David M Knigge, David W Romero, and Erik J Bekkers. Exploiting redundancy: Separable group convolutional networks on lie groups. In International Conference on Machine Learning, pages 11359–11386. PMLR, 2022.
  • Kofinas et al. [2023] Miltiadis Miltos Kofinas, Erik Bekkers, Naveen Nagaraja, and Efstratios Gavves. Latent field discovery in interacting dynamical systems with neural fields. Advances in Neural Information Processing Systems, 36, 2023.
  • Kovachki et al. [2021] Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces. arXiv preprint arXiv:2108.08481, 2021.
  • Li et al. [2017] Zhenguo Li, Fengwei Zhou, Fei Chen, and Hang Li. Meta-sgd: Learning to learn quickly for few-shot learning. arXiv preprint arXiv:1707.09835, 2017.
  • Li et al. [2020] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895, 2020.
  • Liu et al. [2023] Yongtuo Liu, Sara Magliacane, Miltiadis Kofinas, and Efstratios Gavves. Graph switching dynamical systems. In International Conference on Machine Learning, pages 21867–21883. PMLR, 2023.
  • Liu et al. [2024] Yongtuo Liu, Sara Magliacane, Miltiadis Kofinas, and Efstratios Gavves. Amortized equation discovery in hybrid dynamical systems, 2024.
  • Moser et al. [2023] Philipp Moser, Wolfgang Fenz, Stefan Thumfart, Isabell Ganitzer, and Michael Giretzlehner. Modeling of 3d blood flows with physics-informed neural networks: Comparison of network architectures. Fluids, 8(2):46, 2023.
  • Nichol et al. [2018] Alex Nichol, Joshua Achiam, and John Schulman. On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999, 2018.
  • Papa et al. [2023] Samuele Papa, David M Knigge, Riccardo Valperga, Nikita Moriakov, Miltos Kofinas, Jan-Jakob Sonke, and Efstratios Gavves. Neural modulation fields for conditional cone beam neural tomography. arXiv preprint arXiv:2307.08351, 2023.
  • Park et al. [2019] Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 165–174, 2019.
  • Perez et al. [2018] Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, and Aaron Courville. Film: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
  • Pervez et al. [2024] Adeel Pervez, Francesco Locatello, and Efstratios Gavves. Mechanistic neural networks for scientific machine learning. arXiv preprint arXiv:2402.13077, 2024.
  • Pfaff et al. [2020] Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter W Battaglia. Learning mesh-based simulation with graph networks. arXiv preprint arXiv:2010.03409, 2020.
  • Prasthofer et al. [2022] Michael Prasthofer, Tim De Ryck, and Siddhartha Mishra. Variable-input deep operator networks. arXiv preprint arXiv:2205.11404, 2022.
  • Satorras et al. [2021] Vıctor Garcia Satorras, Emiel Hoogeboom, and Max Welling. E (n) equivariant graph neural networks. In International conference on machine learning, pages 9323–9332. PMLR, 2021.
  • Sitzmann et al. [2020] Vincent Sitzmann, Eric Chan, Richard Tucker, Noah Snavely, and Gordon Wetzstein. Metasdf: Meta-learning signed distance functions. Advances in Neural Information Processing Systems, 33:10136–10147, 2020.
  • Stokes et al. [1851] George Gabriel Stokes et al. On the effect of the internal friction of fluids on the motion of pendulums. 1851.
  • Tancik et al. [2020] Matthew Tancik, Pratul Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan Barron, and Ren Ng. Fourier features let networks learn high frequency functions in low dimensional domains. Advances in neural information processing systems, 33:7537–7547, 2020.
  • Tancik et al. [2021] Matthew Tancik, Ben Mildenhall, Terrance Wang, Divi Schmidt, Pratul P Srinivasan, Jonathan T Barron, and Ren Ng. Learned initializations for optimizing coordinate-based neural representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2846–2855, 2021.
  • Valperga et al. [2022] Riccardo Valperga, Kevin Webster, Dmitry Turaev, Victoria Klein, and Jeroen Lamb. Learning reversible symplectic dynamics. In Proceedings of The 4th Annual Learning for Dynamics and Control Conference, volume 168 of Proceedings of Machine Learning Research, pages 906–916. PMLR, 23–24 Jun 2022.
  • Van der Maaten and Hinton [2008] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(11), 2008.
  • Weiler and Cesa [2019] Maurice Weiler and Gabriele Cesa. General e (2)-equivariant steerable cnns. Advances in neural information processing systems, 32, 2019.
  • Wessels et al. [2024] David R Wessels, David M Knigge, Samuele Papa, Riccardo Valperga, Efstratios Gavves, and Erik J Bekkers. Grounding continuous representations in geometry: Equivariant neural fields. ArXiv Preprint arXiv:, 2024.
  • Yin et al. [2022] Yuan Yin, Matthieu Kirchmeyer, Jean-Yves Franceschi, Alain Rakotomamonjy, and Patrick Gallinari. Continuous pde dynamics forecasting with implicit neural representations. arXiv preprint arXiv:2209.14855, 2022.
  • Zhang et al. [2023] Biao Zhang, Jiapeng Tang, Matthias Niessner, and Peter Wonka. 3dshape2vecset: A 3d shape representation for neural fields and generative diffusion models. ACM Transactions on Graphics (TOG), 42(4):1–16, 2023.
  • Zhdanov et al. [2024] Maksim Zhdanov, David Ruhe, Maurice Weiler, Ana Lucic, Johannes Brandstetter, and Patrick Forré. Clifford-steerable convolutional neural networks. arXiv preprint arXiv:2402.14730, 2024.
  • Zwicker [2020] David Zwicker. py-pde: A python package for solving partial differential equations. Journal of Open Source Software, 5(48):2158, 2020. doi: 10.21105/joss.02158. URL https://doi.org/10.21105/joss.02158.

Appendix A Related work

DL approaches to dynamics modelling

In recent years, the learning of spatiotemporal dynamics has been receiving significant attention, either for modelling interacting systems [31, 30], scientific Machine Learning [49, 8, 7, 37, 26, 51], or even videos [1]. Most DL methods for solving PDEs attempt to directly replace solvers with mappings between finite-dimensional Euclidean spaces, i.e. through the use of CNNs [19, 2] or GNNs [38, 8], often applied autoregressively to an observed (discretized) PDE state. Instead, the Neural Operator (NO) [27] paradigm attempts to learn infinite-dimensional operators, i.e. mappings between function spaces, with limited success. Fourier Neural Operator (FNO) [29] extends this method by performing convolutions in the spectral domain. FNO obtains much improved performance, but due to its reliance on the FFT is limited to data on regular grids.
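
The spectral-domain convolution underlying FNO can be sketched as follows. This is a minimal 1D NumPy illustration and not the implementation of [29]; `spectral_conv_1d`, the number of retained modes and the random weights standing in for learned parameters are assumptions made for the example.

```python
import numpy as np

def spectral_conv_1d(u, weights):
    """Scale the lowest len(weights) Fourier modes of u and drop the rest."""
    u_hat = np.fft.rfft(u)                      # requires regularly spaced samples
    out_hat = np.zeros_like(u_hat)
    k = len(weights)
    out_hat[:k] = u_hat[:k] * weights
    return np.fft.irfft(out_hat, n=u.shape[-1])

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 64, endpoint=False)
u = np.sin(2 * np.pi * x) + 0.5 * np.cos(2 * np.pi * 3 * x)
weights = rng.normal(size=8) + 1j * rng.normal(size=8)   # stand-in for learned weights
out = spectral_conv_1d(u, weights)
```

Because the operation is a pointwise multiplication in Fourier space, it commutes with periodic shifts of the input. The call to `np.fft.rfft` is also where the regular-grid requirement enters: the transform assumes equally spaced samples.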

Inductive biases in DL and dynamics modelling

Geometric Deep Learning aims to improve model generalization and performance by constraining/designing a model's space of learnable functions based on geometric principles. Prominent examples include Group Equivariant Convolutional Networks and Steerable CNNs [12, 4], generalizations of CNNs that respect symmetries of the data, such as dilations and continuous rotations [47, 15, 25]. Analogously, Graph Neural Networks (GNNs) [24] or Message Passing Neural Networks (MPNNs) [17] are a variant of neural network that respects set-permutations naturally found in graph data. They are typically formulated for graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$, with nodes $i\in\mathcal{V}$ and edges $\mathcal{E}$. Typically nodes are embedded into a node vector $f_{i}^{0}$, which is subsequently updated over multiple layers of message passing.
Message passing consists of (1) computing messages $m_{i,j}$ over edges $i,j$ from node $j$ to $i$ with the message function (taking into account edge attributes $a_{i,j}$): $m_{i,j}=\phi_{m}(f_{i}^{l},f_{j}^{l},a_{i,j})$, (2) aggregating incoming messages: $m_{i}=\sum_{j\in\mathcal{N}(i)}m_{i,j}$, (3) computing updated node features $f_{i}^{l+1}=\phi_{u}(f_{i}^{l},m_{i})$.
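
The three steps above can be sketched as a single layer in NumPy. This is a minimal sketch in which $\phi_{m}$ and $\phi_{u}$ are single-layer toy networks with random weights; the weight names, feature sizes and the example graph are hypothetical and not taken from any of the cited architectures.

```python
import numpy as np

rng = np.random.default_rng(0)
C, A = 4, 1                                   # feature and edge-attribute sizes
W_m = rng.normal(size=(2 * C + A, C))         # toy weights of phi_m
W_u = rng.normal(size=(2 * C, C))             # toy weights of phi_u

def phi_m(f_i, f_j, a_ij):
    """Message function m_ij = phi_m(f_i, f_j, a_ij)."""
    return np.tanh(np.concatenate([f_i, f_j, a_ij]) @ W_m)

def phi_u(f_i, m_i):
    """Update function f_i' = phi_u(f_i, m_i)."""
    return np.tanh(np.concatenate([f_i, m_i]) @ W_u)

def message_passing_layer(f, edges, edge_attr):
    """f: (N, C) node features; edges: list of (i, j) with messages flowing j -> i."""
    m = np.zeros_like(f)
    for (i, j), a_ij in zip(edges, edge_attr):
        m[i] += phi_m(f[i], f[j], a_ij)       # steps (1) and (2): message, then sum over N(i)
    return np.stack([phi_u(f_i, m_i) for f_i, m_i in zip(f, m)])  # step (3): update

# Hypothetical 3-node path graph with bidirectional edges and unit edge attributes.
f0 = rng.normal(size=(3, C))
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
edge_attr = np.ones((len(edges), A))
f1 = message_passing_layer(f0, edges, edge_attr)
```

Because the aggregation in step (2) is a sum, relabelling the nodes (and the edges accordingly) only permutes the rows of the output, which is the permutation symmetry referred to above.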

Recently, such methods have also been adapted for sparse physical data, e.g. for molecular property prediction [40, 6], where the GNN is additionally required to respect transformation symmetries. [5] unifies these approaches to equivariance under the guise of weight sharing over equivalence classes defined by bi-invariant attributes of pairs of nodes $i,j$, a viewpoint we leverage in constructing the equivariant conditioning latent $z_t^{\nu}$ corresponding to a PDE state $\nu_t$. In the context of dynamics modelling, equivariant architectures have been employed to incorporate various properties of physical systems in the modelling process. Examples of such properties are the symplectic structure [22], discrete symmetries such as reversing symmetries [45], and energy conservation [18, 21].

Neural Fields in dynamics modelling

Conditional Neural Fields (NeFs) are a class of coordinate-based neural networks, often trained to reconstruct discretely-sampled input continuously. More specifically, a conditional neural field $f_\theta:\mathbb{R}^n\rightarrow\mathbb{R}^d$ is a field, parameterized by a neural network with parameters $\theta$, that maps input coordinates $x\in\mathbb{R}^n$ in the data domain, alongside conditioning latents $z$, to $d$-dimensional signal values $f(x)\in\mathbb{R}^d$. By associating a conditioning latent $z^{\nu}\in\mathbb{R}^c$ to each signal $\nu$, a single conditional NeF $f_\theta:\mathbb{R}^n\times\mathbb{R}^c\rightarrow\mathbb{R}^d$ can learn to represent families $\mathcal{D}$ of continuous signals such that $\forall\,\nu\in\mathcal{D}: f(\mathbf{x})\approx f_\theta(\mathbf{x};\mathbf{z}^{\nu})$.
[13] showed the viability of using the latents $\mathbf{z}^i$ as representations for downstream tasks (e.g. classification, generation), proposing a framework for learning on neural fields. This framework inherits desirable properties of neural fields, such as inherent support for sparsely and/or irregularly sampled data, and independence of signal resolution. [49] propose to use conditional NeFs for PDE modelling by learning a continuous flow in the latent space of a conditional neural field. In particular, a set of latents $\{\mathbf{z}_i^{\nu}\}_{i=1}^T$ is obtained by fitting a conditional neural field to a given set of observations $\{\nu_i\}_{i=1}^T$ at timesteps $1,\dots,T$; simultaneously, a neural ODE [11] $F_\psi$ is trained to map pairs of temporally continuous latents such that solutions correspond to the trajectories traced by the learned latents. Though this approach yields impressive results for sparse and irregular data in planar PDEs, we show it breaks down on more challenging geometries. We hypothesize that this is due to the lack of a latent space that preserves the relevant geometric transformations with respect to which the systems we are modelling are symmetric, and as such propose an extension of this framework in which such symmetries are preserved.

Obtaining Neural Fields representations

Most NeF-based approaches to representation or reconstruction use SGD to optimize (a subset of) the parameters of the NeF, inevitably leading to significant overhead at inference; conditional NeFs require optimizing a (set of) latents from initialization to reconstruct a novel sample. Accordingly, research has explored ways of addressing this limitation. [41, 44] propose using Meta-Learning [14, 33] to optimize for an initialization of the NeF from which it is possible to reconstruct a novel sample in as few as 3 gradient descent steps. [13] propose to meta-learn the NeF backbone, but fix the initialization for the latent $\mathbf{z}$ and instead optimize the learning rate used in its optimization using Meta-SGD [28]. Recently, work has also explored the relation between initialization/optimization of a NeF and its value as a downstream representation; [34] show that (1) using a shared NeF initialization and (2) limiting the number of gradient updates to the NeF improves performance in downstream tasks, as this simplifies the complex relation between a NeF's parameter space and its output function space. We combine these insights and make Meta-Learning part of our equivariant PDE solving pipeline, as it enables fast inference and we show it to simplify the latent space of the ENF, improving performance of the neural ODE solver.
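To make the few-step fitting of a latent concrete, the sketch below shows only the inner loop: three gradient steps on a latent, starting from a shared initialization, with the backbone held fixed. The backbone here is a hypothetical stand-in (fixed Fourier features with a decoder that is linear in the latent), and the learning rate is set by hand rather than meta-learned; none of this is the ENF itself.

```python
import numpy as np

def features(x, freqs):
    """Fixed Fourier features standing in for a frozen NeF backbone."""
    return np.concatenate([np.sin(x[:, None] * freqs), np.cos(x[:, None] * freqs)], axis=1)

def mse(z, phi, target):
    return float(np.mean((phi @ z - target) ** 2))

def fit_latent(phi, target, z_init, lr, num_steps=3):
    """Few-step gradient descent on the latent only; the backbone stays fixed."""
    z = z_init.copy()
    for _ in range(num_steps):
        residual = phi @ z - target
        grad = 2.0 * phi.T @ residual / len(target)  # d(MSE)/dz for a linear decoder
        z = z - lr * grad
    return z

x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
freqs = np.arange(1, 5)
phi = features(x, freqs)                              # (64, 8)
target = 0.7 * np.sin(2 * x) - 0.3 * np.cos(3 * x)    # toy signal to reconstruct
z_init = np.zeros(phi.shape[1])                       # shared initialization
z_fit = fit_latent(phi, target, z_init, lr=0.5, num_steps=3)
loss_before, loss_after = mse(z_init, phi, target), mse(z_fit, phi, target)
```

In a meta-learned setting, the backbone (and, with Meta-SGD, the step size `lr`) would be trained in an outer loop so that these few inner steps suffice for novel samples.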

Appendix B Pseudocode for optimization objective

See Alg. 1 for pseudocode of the training loop that we use, written for a single data sample for simplicity of notation. For simplicity, we further assume we're using an Euler stepper to solve the neural ODE, but this can be replaced by any solver. For inference, the procedure is identical, except we do not perform gradient updates to $\theta,\psi$.

Algorithm 1 Optimization objective
Randomly initialize neural field $f_\theta$
Randomly initialize neural ODE $F_\psi$
while not done do
    Sample initial states and coordinates $\nu_0$.
    Initialize latents $z_0^{\nu}\leftarrow\{(p_i,\mathbf{c}_i)\}_{i=1}^{N}$.
    for all step $\in N_{\text{initial state opt}}=3$ do
        $z_0^{\nu}\leftarrow z_0^{\nu}-\epsilon\nabla_{z_0^{\nu}}\mathcal{L}_{\text{mse}}\big(f_\theta(\cdot,z_0^{\nu}),\nu_0\big)$
    end for
    for all $t\in[1,\dots,t_{\text{in}}]$ do
        $z^{\nu}_t\leftarrow z^{\nu}_0+\int_0^t F_\psi(z^{\nu}_\tau)\,d\tau$
    end for
    Update $\theta,\psi$ per:
    $\theta\leftarrow\theta-\eta\nabla_\theta\mathcal{L}_{\text{mse}}'$, $\psi\leftarrow\psi-\eta\nabla_\psi\mathcal{L}_{\text{mse}}'$ with $\mathcal{L}_{\text{mse}}'=\mathcal{L}_{\text{mse}}\big(\{f_\theta(\cdot,z^{\nu}_t),\nu_t\}_{t=0}^{t_{\text{in}}}\big)$
end while
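The unrolling step of Alg. 1 with an Euler stepper, together with the trajectory loss, can be sketched as follows. The latent vector field and the decoder are toy stand-ins chosen so that the result can be compared against a closed-form solution (a linear rotation field and an identity decoder, both assumptions); gradient updates to $\theta,\psi$ are omitted, so this corresponds to the forward pass only.

```python
import numpy as np

def euler_unroll(F, z0, num_steps, dt=1.0):
    """Explicit Euler: z_{t+1} = z_t + dt * F(z_t); returns [z_0, ..., z_T]."""
    traj = [z0]
    for _ in range(num_steps):
        traj.append(traj[-1] + dt * F(traj[-1]))
    return np.stack(traj)

def trajectory_mse(decode, traj, targets):
    """Mean squared error of the decoded latents against the observed states."""
    preds = np.stack([decode(z) for z in traj])
    return float(np.mean((preds - targets) ** 2))

# Toy stand-ins (assumptions): a linear latent vector field and an identity decoder.
A = np.array([[0.0, -1.0], [1.0, 0.0]])   # generator of planar rotations
F = lambda z: A @ z
decode = lambda z: z

z0 = np.array([1.0, 0.0])
dt, num_steps = 0.01, 100
traj = euler_unroll(F, z0, num_steps, dt)
t = dt * np.arange(num_steps + 1)
exact = np.stack([np.cos(t), np.sin(t)], axis=1)   # closed-form solution of dz/dt = A z
loss = trajectory_mse(decode, traj, exact)
```

Swapping `euler_unroll` for a higher-order integrator changes only how `traj` is produced; the loss over the decoded trajectory stays the same.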

Appendix C Equivariant Neural Fields

ENF to reconstruct PDE states

For ease of notation we denote by $\mathbf{P}$ and $\mathbf{C}$ the matrices containing poses and corresponding appearances stacked row-wise, i.e. $\mathbf{P}_{i,:}=p_i^T$ and $\mathbf{C}_{i,:}=\mathbf{c}_i^T$. Furthermore, we denote by $\mathbf{A}$ the matrix containing all bi-invariants $\mathbf{a}_{i,m}$ stacked row-wise, i.e. $\mathbf{A}_{i,:}=\mathbf{a}_{i,m}^T$:

$$f_\theta(\mathbf{x};z^{\nu_t}):=\operatorname{softmax}\bigg(\frac{\mathbf{Q}(\mathbf{A})\mathbf{K}^T(\mathbf{C})}{\sqrt{d_k}}+\mathbf{G}(\mathbf{A})\bigg)\mathbf{V}(\mathbf{C};\mathbf{A}), \tag{10}$$

where the softmax is applied over the latent set and with $d_k$ the hidden dimensionality of the ENF. The query matrix $\mathbf{Q}$ is constructed as $\mathbf{Q}=\mathbf{W}_q\gamma_q^T(\mathbf{A})$, with $\gamma_q$ a Gaussian RFF embedding [43] followed by a linear layer $\mathbf{W}_q$, i.e. $\mathbf{Q}$ consists of the RFF-embedded bi-invariants of the input coordinate $x_m$ and each of the latent poses $p_i$, stacked row-wise. The key matrix is given by a learnable linear transformation $\mathbf{W}_k$ of the context vectors $\mathbf{c}_i$: $\mathbf{K}=\mathbf{W}_k\mathbf{C}^T$.
The attention coefficients that result from the inner product of $\mathbf{Q},\mathbf{K}$ are weighted by a Gaussian window $\mathbf{G}$ whose magnitude is conditioned on the squared distance between latent poses and input coordinates: $\mathbf{G}_i=\sigma_{\text{att}}(||p_i-\mathbf{x}||^2)$, with $\sigma_{\text{att}}$ a hyperparameter which determines the locality of each of the latents. Finally, the value matrix is calculated as a learnable linear transformation $\mathbf{W}_v$ of the appearances $\mathbf{C}$, conditioned through FiLM modulation [36] by a second RFF embedding of the relative poses, split into scale and shift modulations: $\mathbf{V}=\mathbf{W}_v\mathbf{C}\odot\mathbf{W}_{v_\alpha}\gamma_{v_\alpha}(\mathbf{A})+\mathbf{W}_{v_\beta}\gamma_{v_\beta}(\mathbf{A})$.
The latents $z^{\nu}_t$ are optimized for a single state $\nu_t$, whereas the parameters $\theta$ of the ENF backbone, which consist of all the learnable parameters of the linear layers $\mathbf{W}_q,\mathbf{W}_k,\mathbf{W}_v,\mathbf{W}_{v_\alpha},\mathbf{W}_{v_\beta}$ used to construct $\mathbf{Q},\mathbf{K},\mathbf{V}$, are shared over all states.
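The cross-attention described above can be sketched in NumPy for a single query coordinate. This is one reading of the construction under several assumptions that are not taken from the text: poses are plain 2D positions with the translation bi-invariant $\mathbf{x}-p_i$, each latent's query is paired with its own key through a dot product, the Gaussian window enters the logits as $-\sigma_{\text{att}}\|p_i-\mathbf{x}\|^2$, and all weights are random. Every name in the code is hypothetical.

```python
import numpy as np

def rff(a, B):
    """Random Fourier feature embedding of bi-invariants a with frequency matrix B."""
    proj = a @ B
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

def enf_forward(x, poses, contexts, params, sigma_att=2.0):
    """Evaluate the sketch at one coordinate x from latents {(p_i, c_i)}."""
    a = x[None, :] - poses                              # (N, 2) translation bi-invariants
    q = rff(a, params["B_q"]) @ params["W_q"]           # (N, d_k) queries from geometry
    k = contexts @ params["W_k"]                        # (N, d_k) keys from contexts
    g = -sigma_att * np.sum(a ** 2, axis=-1)            # (N,) log of a Gaussian window
    logits = np.sum(q * k, axis=-1) / np.sqrt(q.shape[-1]) + g
    w = np.exp(logits - logits.max())
    w = w / w.sum()                                     # softmax over the latent set
    emb_v = rff(a, params["B_v"])
    # FiLM: scale and shift the transformed contexts by embeddings of the bi-invariants
    v = (contexts @ params["W_v"]) * (emb_v @ params["W_alpha"]) + emb_v @ params["W_beta"]
    return w @ v, w                                     # (d,) output and (N,) weights

rng = np.random.default_rng(0)
N, c, n_freq, d_k, d = 5, 8, 16, 32, 3
params = {
    "B_q": rng.normal(size=(2, n_freq)), "W_q": rng.normal(size=(2 * n_freq, d_k)),
    "W_k": rng.normal(size=(c, d_k)), "W_v": rng.normal(size=(c, d)),
    "B_v": rng.normal(size=(2, n_freq)),
    "W_alpha": rng.normal(size=(2 * n_freq, d)), "W_beta": rng.normal(size=(2 * n_freq, d)),
}
poses = rng.uniform(-1, 1, size=(N, 2))
contexts = rng.normal(size=(N, c))
x = np.array([0.2, -0.4])
out, weights = enf_forward(x, poses, contexts, params)

# Shifting the query and every pose by the same vector leaves the bi-invariants unchanged.
shift = np.array([0.7, -1.3])
out_shifted, _ = enf_forward(x + shift, poses + shift, contexts, params)
```

Since geometry enters only through `a`, any transformation that leaves `a` unchanged leaves the output unchanged, which is the mechanism behind the equivariance argument.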

The overall architecture consists of a linear layer $\mathbf{W}:\mathbb{R}^c\rightarrow\mathbb{R}^d$ applied to $\mathbf{c}_i\in\mathbb{R}^c$, followed by a layernorm. After this, the cross-attention listed above is applied, followed by three $d$-dimensional linear layers, the final one mapping to the output dimension $\mathbb{R}^{\text{out}}$.

Equivariance follows from sharing $\mathbf{Q},\mathbf{K},\mathbf{V}$ over equivalence classes

Note that the latent space of the ENF is equipped with a group action as: $gz^{\nu}_t=\{(gp_i,\mathbf{a}_i)\}_{i=1}^{N}$. As an example, $\mathrm{SE}(2)$-equivariance of the ENF follows from bi-invariance of the quantity $\mathbf{a}$ used to construct $\mathbf{Q}$ under the group action:

$$\forall g\in SE(n):\;\;(p_i,\mathbf{x})\;\mapsto\;(g\,p_i,g\,\mathbf{x})\;\;\;\Leftrightarrow\;\;\;p_i^{-1}\mathbf{x}\;\mapsto\;(g\,p_i)^{-1}g\,\mathbf{x}=p_i^{-1}g^{-1}g\,\mathbf{x}=p_i^{-1}\mathbf{x}. \tag{11}$$

And so, constructing the matrix containing the relative poses of bi-transformed poses and coordinates $(g\mathbf{P})^{-1}g\mathbf{x}$ as $((g\mathbf{P})^{-1}g\mathbf{x})_{i,:}=p_i^{-1}g^{-1}g\mathbf{x}=p_i^{-1}\mathbf{x}$, we trivially have:

$$\forall g\in SE(n):(p_i,\mathbf{x})\mapsto(g\,p_i,g\,\mathbf{x})\;\;\;\Leftrightarrow\;\;\;\mathbf{Q}(\mathbf{A})\mapsto\mathbf{Q}(g\mathbf{A})=\mathbf{Q}(\mathbf{A}). \tag{12}$$
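The invariance of $p_i^{-1}\mathbf{x}$ can be checked numerically for $\mathrm{SE}(2)$. The sketch below represents group elements as $3\times 3$ homogeneous matrices; the helper names and the particular pose, point and group element are arbitrary choices for illustration.

```python
import numpy as np

def se2(theta, tx, ty):
    """Homogeneous 3x3 matrix of a planar rotation by theta followed by a translation."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]])

def bi_invariant(p, x):
    """Relative position p^{-1} x of point x in the frame of pose p."""
    x_h = np.append(x, 1.0)                 # homogeneous coordinates
    return (np.linalg.inv(p) @ x_h)[:2]

p = se2(0.3, 1.0, -2.0)                     # latent pose p_i
x = np.array([0.5, 0.25])                   # input coordinate
g = se2(1.1, -0.4, 0.9)                     # arbitrary group element

a = bi_invariant(p, x)
x_g = (g @ np.append(x, 1.0))[:2]           # g x
a_g = bi_invariant(g @ p, x_g)              # (g p)^{-1} g x
```

Here `a` and `a_g` agree up to floating-point error, so any function of the bi-invariant, such as $\mathbf{Q}(\mathbf{A})$, is unchanged when poses and coordinates are transformed together.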

Appendix D Defining additional bi-invariant attributes

Other examples of the bi-invariant attributes that are used in the experiments section are listed here.

Full rotation symmetries on the 2-sphere. For the global shallow water equations we defined $\mathbf{a}^{\rm SW}$ as an attribute that is bi-invariant only to rotations about the globe's axis, i.e. rotations over $\phi$. In our experiments we also solve diffusion over the sphere, which is fully $\mathrm{SO}(3)$ rotationally symmetric. To achieve equivariance to full 3D rotations, we take poses $p\in\mathrm{SO}(3)$ parameterized by Euler angles, which act on points $x\in S^2$ parameterized by 3D unit vectors $\mathbf{x}$ through 3D rotation matrices, allowing us to calculate the bi-invariant $p^{-1}x$:

$$\mathbf{a}^{\rm SO(3)}_{i,m}=\mathbf{R}_i\mathbf{x}_m. \tag{13}$$

This bi-invariant is used in our experiments for diffusion on the 2-sphere.
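A small numerical check of the $\mathrm{SO}(3)$ case is sketched below. It assumes a ZYZ Euler-angle convention and computes $p^{-1}x$ by applying the transpose of the pose's rotation matrix; whether $\mathbf{R}_i$ in Eq. 13 already denotes this inverse rotation is not specified here, so the code should be read under that assumption. All function names are illustrative.

```python
import numpy as np

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def pose_matrix(alpha, beta, gamma):
    """Rotation matrix from ZYZ Euler angles (the convention is an assumption)."""
    return rot_z(alpha) @ rot_y(beta) @ rot_z(gamma)

def sphere_point(phi, theta):
    """Unit vector for longitude phi and colatitude theta."""
    return np.array([np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)])

def bi_invariant(R_p, x):
    """p^{-1} x; the inverse of a rotation matrix is its transpose."""
    return R_p.T @ x

R_p = pose_matrix(0.4, 1.2, -0.7)            # latent pose
x = sphere_point(2.0, 0.9)                   # point on the sphere
R_g = pose_matrix(-1.0, 0.3, 2.5)            # arbitrary global rotation g

a = bi_invariant(R_p, x)
a_g = bi_invariant(R_g @ R_p, R_g @ x)       # transform pose and point together
```

The attribute stays a unit vector and is identical before and after the global rotation, which is what makes it usable as a conditioning variable for a fully rotation-equivariant field.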

The 3D ball $\mathbb{B}^3$. We experiment with the Boussinesq equation for internally heated convection in a ball. The PDE is fully rotationally symmetric, but since the heat source $K$ is at a fixed point (the center of the ball), it is not symmetric to translations of the initial conditions within the ball. As such, we let $p\in\mathrm{SO}(3)\times\mathbb{R}$, parameterized by $\phi,\theta,\gamma,r$ s.t. $0<r<1$. The PDE is defined over spherical coordinates $(\phi,\theta,r)$, which we map to vectors $\mathbf{x}\in\mathbb{R}^3$. We then use the following bi-invariant, which is only symmetric to rotations in $\mathrm{SO}(3)$:

$$\mathbf{a}^{\mathbb{B}^3}_{i,m}=\mathbf{R}_i\mathbf{x}_m\oplus r_{p_i}\oplus r_{x_m}. \tag{14}$$

No transformation symmetries. A simple "bi-invariant" for this setting that preserves all geometric information is given by simply concatenating coordinates $p$ with coordinates $x$:

$$\mathbf{a}^{\emptyset}_{i,m}=p_i\oplus x_m \tag{15}$$

Parameterizing the cross-attention operation in Eq. 5 as a function of this bi-invariant results in a framework without any equivariance constraints. We use this in experiments to ablate the equivariance constraints and their impact on performance.

Appendix E Experimental Details

E.1 Dataset creation

For creating the dataset of PDE solutions we used py-pde [52] for Navier-Stokes and the diffusion equation on the plane. For the shallow-water equation and the diffusion equation on the sphere, as well as the internally heated convection in a 3D ball we used Dedalus [10].

Diffusion on the plane.

For the diffusion equation on the plane we use as initial conditions narrow spikes centred at random locations in the left half of the domain for the train set, and in the right half of the domain for the test set. States are defined on a $64\times 64$ grid ranging from -3 to 3. Initial conditions are randomly sampled uniformly between -2 and 2 for $x$ and 0 and 2 for $y$ in the training set, and between -2 and 2 for $x$ and -2 and 0 for $y$ in the test set. A random value uniformly sampled between 5.0 and 5.5 is inserted at the randomly sampled location. We solve the equation with an Euler solver for 27 steps, discarding the first 7, with a timestep $dt=0.01$. We generate 1024 training and 128 test trajectories.
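The recipe above can be approximated with a plain NumPy finite-difference solver. This is a sketch rather than the py-pde setup used for the dataset, and several quantities are assumptions not given in the text: the diffusivity (set to 0.1 so that explicit Euler is stable at this resolution), periodic boundaries, and the reading of "27 steps" as 27 stored states including the initial one. The sampled ranges follow the numeric values quoted above.

```python
import numpy as np

def laplacian_periodic(u, dx):
    """Five-point Laplacian with periodic wrap-around (boundary handling is an assumption)."""
    neighbours = np.roll(u, 1, 0) + np.roll(u, -1, 0) + np.roll(u, 1, 1) + np.roll(u, -1, 1)
    return (neighbours - 4.0 * u) / dx**2

def simulate_diffusion(rng, y_range, n=64, lo=-3.0, hi=3.0, dt=0.01,
                       total_steps=27, discard=7, diffusivity=0.1):
    """Spike initial condition, explicit Euler steps, first `discard` states dropped."""
    grid = np.linspace(lo, hi, n)
    dx = grid[1] - grid[0]
    cx, cy = rng.uniform(-2.0, 2.0), rng.uniform(*y_range)
    ix, iy = np.abs(grid - cx).argmin(), np.abs(grid - cy).argmin()
    u = np.zeros((n, n))
    u[iy, ix] = rng.uniform(5.0, 5.5)        # narrow spike at the sampled location
    states = [u]
    for _ in range(total_steps - 1):
        u = u + dt * diffusivity * laplacian_periodic(u, dx)
        states.append(u)
    return np.stack(states[discard:])

rng = np.random.default_rng(0)
train_traj = simulate_diffusion(rng, y_range=(0.0, 2.0))    # (20, 64, 64)
test_traj = simulate_diffusion(rng, y_range=(-2.0, 0.0))    # (20, 64, 64)
```

Discarding the first states means the stored trajectories start from a slightly smoothed spike rather than a single non-zero grid cell.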

Navier-Stokes on the flat 2-torus.

For Navier-Stokes on the flat 2-torus we use Gaussian random fields as initial conditions and solve the PDE using a Crank-Nicolson method with timestep $dt=1.0$ for 20 steps. The PDE is

$$\frac{dv}{dt}=-u\cdot\nabla v+\mu\Delta v+f,$$
$$v=\nabla\times u,$$
$$\nabla\cdot u=0,$$

where $u$ is the velocity field, $v$ the vorticity, $\mu$ the viscosity and $f$ a forcing term. States are defined on a $64\times 64$ grid. We generate 8192 training and 512 test trajectories.

Diffusion on the 2-sphere.

For the diffusion dataset on the sphere, states are defined over a $128\times 64$ $(\phi,\theta)$ grid. Initial conditions are generated as a Gaussian peak inserted at a random point on the sphere with $\sigma=0.25$. The equation is solved for 20 timesteps with RK4 and $dt=1.0$. We generate 256 training and 64 test trajectories.

Spherical shallow-water equations [16].

The global shallow-water equations are

$$\frac{du}{dt}=-fk\times u-g\nabla h+\nu\Delta u,$$
$$\frac{dh}{dt}=-h\nabla\cdot u+\nu\Delta h,$$

where $\frac{d}{dt}$ is the material derivative, $k$ is the unit vector orthogonal to the surface of the sphere, $u$ is the velocity field tangent to the spherical surface, and $h$ is the thickness of the fluid layer. The rest are constant parameters of the Earth (see [16] for details). As initial conditions we follow [16] and use a basic zonal flow, representing a mid-latitude tropospheric jet, with a correspondingly balanced height field.

$$u(\phi)=\begin{cases}0&\text{for }\phi\leq\phi_0\\ \frac{u_{\max}}{e_n}\exp\left[\frac{1}{(\phi-\phi_0)(\phi-\phi_1)}\right]&\text{for }\phi_0<\phi<\phi_1\\ 0&\text{for }\phi\geq\phi_1\end{cases}$$

where $u_{\max}=80\,\mathrm{m\,s^{-1}}$, $\phi_0=\pi/7$, $\phi_1=\pi/2-\phi_0$, and $e_n=\exp[-4/(\phi_1-\phi_0)^2]$. With this initial zonal flow, we numerically integrate the balance equation

$$gh(\phi)=gh_0-\int^{\phi}a\,u(\phi')\left[f+\frac{\tan(\phi')}{a}u(\phi')\right]d\phi',$$

to obtain the height $h$. We then randomly generate small unbalanced perturbations $h^{\prime}$ to the height field

$$h^{\prime}(\theta,\phi)=\hat{h}\cos(\phi)\,e^{-(\theta_{2}-\theta/\alpha)^{2}}e^{-[(\phi_{2}-\phi)/\beta]^{2}}$$

by uniformly sampling $\alpha$, $\beta$, $\hat{h}$, $\theta_{2}$, and $\phi_{2}$ within a neighbourhood of the values used in [16]. States are defined on a $192\times 96$ grid for the high-resolution dataset, which is subsequently downsampled by $2\times 2$ mean pooling to a $96\times 48$ grid. We generate 512 training trajectories and 64 test trajectories.
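The deterministic parts of this data-generation recipe (jet profile, balanced height, and mean pooling) can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the simulation code used for the dataset: it assumes $\phi_{1}=\pi/2-\phi_{0}$ and the normaliser $e_{n}=\exp[-4/(\phi_{1}-\phi_{0})^{2}]$, a Coriolis parameter $f=2\Omega\sin\phi$, Earth-like values for $\Omega$ and $a$, an arbitrary $gh_{0}$ attached to the first grid latitude, and a grid read as 192 longitudes by 96 latitudes. The function names are hypothetical, and the random perturbation $h^{\prime}$ is not included.

```python
import math

U_MAX = 80.0                # m/s, from the text
PHI0 = math.pi / 7          # southern jet edge, from the text
PHI1 = math.pi / 2 - PHI0   # northern jet edge (assumed: pi/2 - phi_0)
E_N = math.exp(-4.0 / (PHI1 - PHI0) ** 2)  # assumed normaliser: peak equals U_MAX
OMEGA = 7.292e-5            # assumed Earth rotation rate in rad/s (not in the text)
RADIUS = 6.37122e6          # assumed Earth radius a in m (not in the text)


def zonal_velocity(phi):
    """Zonal wind u(phi) in m/s at latitude phi (radians); zero outside the jet."""
    if phi <= PHI0 or phi >= PHI1:
        return 0.0
    return U_MAX / E_N * math.exp(1.0 / ((phi - PHI0) * (phi - PHI1)))


def balanced_gh(phi_grid, gh0):
    """Trapezoidal integration of the balance equation on an ascending latitude grid.

    gh0 is taken as the value of g*h at phi_grid[0] (an assumption; the text
    leaves the lower integration limit implicit).
    """
    def integrand(phi):
        f = 2.0 * OMEGA * math.sin(phi)  # Coriolis parameter, assumed standard form
        u = zonal_velocity(phi)
        return RADIUS * u * (f + math.tan(phi) / RADIUS * u)

    gh = [gh0]
    prev = integrand(phi_grid[0])
    for lo, hi in zip(phi_grid, phi_grid[1:]):
        cur = integrand(hi)
        gh.append(gh[-1] - 0.5 * (prev + cur) * (hi - lo))
        prev = cur
    return gh


def mean_pool_2x2(field):
    """Average non-overlapping 2x2 blocks of a 2D list of lists."""
    rows, cols = len(field), len(field[0])
    if rows % 2 or cols % 2:
        raise ValueError("both dimensions must be even")
    return [
        [
            (field[i][j] + field[i][j + 1] + field[i + 1][j] + field[i + 1][j + 1]) / 4.0
            for j in range(0, cols, 2)
        ]
        for i in range(0, rows, 2)
    ]


# 96 latitudes strictly inside (-pi/2, pi/2), broadcast over 192 longitudes.
lats = [-math.pi / 2 + (i + 0.5) * math.pi / 96 for i in range(96)]
gh = balanced_gh(lats, gh0=1.0e5)  # gh0 is an arbitrary illustrative value
high_res = [[value] * 192 for value in gh]
low_res = mean_pool_2x2(high_res)  # 48 rows of 96 values
```

Because the integrand vanishes wherever $u=0$, the balanced height stays at $gh_{0}$ south of the jet and only drops across the band $\phi_{0}<\phi<\phi_{1}$.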

Internally-heated convection in the ball.

The equations for the internally-heated convection system are listed below. They involve the thermal diffusivity ($\kappa$) and kinematic viscosity ($\nu$), given by:

$$\kappa=\left(\text{Ra}\cdot\text{Pr}\right)^{-1/2}$$
$$\nu=\left(\frac{\text{Ra}}{\text{Pr}}\right)^{-1/2}$$

We set Ra = 1e-6 and Pr = 1.
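As a small worked illustration of the two definitions, the helper below (a hypothetical name, not taken from the simulation code) maps a Rayleigh and Prandtl number to $\kappa$ and $\nu$. It only restates the formulas above; the numbers are passed in as arguments rather than fixed.

```python
def diffusivities(rayleigh: float, prandtl: float) -> tuple[float, float]:
    """Return (kappa, nu) from the nondimensional Rayleigh and Prandtl numbers."""
    kappa = (rayleigh * prandtl) ** -0.5  # thermal diffusivity
    nu = (rayleigh / prandtl) ** -0.5     # kinematic viscosity
    return kappa, nu


# With Pr = 1 the two expressions coincide, so kappa equals nu.
kappa, nu = diffusivities(rayleigh=4.0, prandtl=1.0)
```

For any $\text{Pr}=1$ setting, both coefficients reduce to $\text{Ra}^{-1/2}$.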

1. Incompressibility condition (continuity equation):

$$\nabla\cdot\mathbf{u}+\tau_{p}=0$$

2. Momentum equation (Navier-Stokes equation):

$$\frac{\partial\mathbf{u}}{\partial t}-\nu\nabla^{2}\mathbf{u}+\nabla p-\mathbf{r}T+\text{lift}(\tau_{u})=-\mathbf{u}\times(\nabla\times\mathbf{u})$$

3. Temperature equation:

$$\frac{\partial T}{\partial t}-\kappa\nabla^{2}T+\text{lift}(\tau_{T})=-\mathbf{u}\cdot\nabla T+\kappa T_{\text{source}}$$

4. Shear stress boundary condition (stress-free condition):

$$\text{Shear Stress}=0\text{ on the boundary}$$

5. No penetration boundary condition (radial component of velocity at $r=1$):

$$\text{radial}(\mathbf{u}(r=1))=0$$

6. Thermal boundary condition (radial gradient of temperature at $r=1$):

$$\text{radial}(\nabla T(r=1))=-2$$

7. Pressure gauge condition:

$$\int p\,dV=0$$

The boundary conditions imposed are stress-free and no-penetration for the velocity field and a constant thermal flux at the outer boundary. These conditions are enforced using penalty terms ($\tau$) that are lifted into the domain using higher-order basis functions.

States are defined over a $64\times 24\times 24$ grid in $(\phi,\theta,r)$. We use an SBDF2 solver, with the timestep constrained to lie between $dt_{\min}=10^{-4}$ and $dt_{\max}=2\times 10^{-2}$. We evolve the PDE for 26 timesteps, discarding the first 6. We generate 512 training trajectories and 64 test trajectories.
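The time-stepping constraints amount to clamping whatever step size the solver proposes and dropping the initial transient. A minimal sketch, assuming the proposed step comes from some adaptive criterion (not specified in the text) and that the 26 timesteps are stored as a sequence of snapshots; `clamp_dt` and `drop_transient` are hypothetical helper names:

```python
DT_MIN, DT_MAX = 1e-4, 2e-2  # timestep bounds from the text
N_STEPS, N_DISCARD = 26, 6   # evolved and discarded timesteps from the text


def clamp_dt(proposed_dt: float) -> float:
    """Restrict a proposed timestep to the interval [DT_MIN, DT_MAX]."""
    return min(max(proposed_dt, DT_MIN), DT_MAX)


def drop_transient(snapshots):
    """Discard the first N_DISCARD snapshots of a trajectory."""
    return snapshots[N_DISCARD:]


trajectory = list(range(N_STEPS))    # stand-in for 26 saved states
kept = drop_transient(trajectory)    # the 20 states that remain
```

Under this reading, each trajectory contributes 20 states to the dataset.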

E.2 Training details

We provide hyperparameters per experiment. We optimize the weights of the neural field $f_{\theta}$ and the neural ODE $F_{\psi}$ with Adam [23], using learning rates of 1E-4 and 1E-3 respectively. We initialize the inner learning rate that we use in Meta-SGD [28] for learning $z^{\nu}$ at 1.0 for $p$ and 5.0 for $\mathbf{c}$. For the neural ODE $F_{\psi}$, we use 3 of our message passing layers in the architecture specified in [5], with a hidden dimensionality of 128. The std parameter of the RFF embedding functions $\gamma_{q},\gamma_{v_{\alpha}},\gamma_{v_{\beta}}$ (see Appx. C) is chosen per experiment. We run all experiments on a single A100. All experiments are run 3 times.
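To make the role of the two inner learning rates concrete, the following sketch shows a single Meta-SGD-style inner update, $z\leftarrow z-\alpha\odot\nabla_{z}\mathcal{L}$, applied separately to the pose and context parts of a latent. It is a simplified illustration under assumptions: poses are treated as plain Euclidean vectors, each group uses one scalar rate, the gradients are supplied by the caller, and the outer loop that trains the rates themselves is omitted. The function and constant names are hypothetical.

```python
LR_POSE_INIT = 1.0     # initial inner learning rate for p, from the text
LR_CONTEXT_INIT = 5.0  # initial inner learning rate for c, from the text


def inner_step(pose, context, grad_pose, grad_context,
               lr_pose=LR_POSE_INIT, lr_context=LR_CONTEXT_INIT):
    """One gradient step on a latent (pose, context) with group-wise step sizes."""
    new_pose = [p - lr_pose * g for p, g in zip(pose, grad_pose)]
    new_context = [c - lr_context * g for c, g in zip(context, grad_context)]
    return new_pose, new_context


# Toy values: a 2D pose and a 1D context vector with hand-picked gradients.
pose, context = inner_step([1.0, 2.0], [0.5], [0.1, -0.2], [0.1])
```

In Meta-SGD the step sizes are learnable parameters, so 1.0 and 5.0 are only the starting points of the optimisation.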

Diffusion on the plane.

We use 4 latents with $\mathbf{c}\in\mathbb{R}^{16}$. We set the hidden dim of the ENF to 64 and use 2 attention heads. We train the model for 1000 epochs. We set $\gamma_{q}=0.05$, $\gamma_{v_{\alpha}}=0.01$, $\gamma_{v_{\beta}}=0.01$. We use a batch size of 8. The model takes approximately 8 hours to train.

Navier-Stokes on the flat 2-torus.

We use 4 latents with $\mathbf{c}\in\mathbb{R}^{16}$. We set the hidden dim of the ENF to 64 and use 2 attention heads. We train the model for 2000 epochs. We set $\gamma_{q}=0.05$, $\gamma_{v_{\alpha}}=0.2$, $\gamma_{v_{\beta}}=0.2$. We use a batch size of 4. The model takes approximately 48 hours to train.

Diffusion on the 2-sphere.

We use 18 latents with $\mathbf{c}\in\mathbb{R}^{4}$. We set the hidden dim of the ENF to 16 and use 2 attention heads. We train the model for 1500 epochs. We set $\gamma_{q}=0.01$, $\gamma_{v_{\alpha}}=0.01$, $\gamma_{v_{\beta}}=0.01$. We use a batch size of 2. The model takes approximately 12 hours to train.

Spherical shallow-water equations [16].

We use 8 latents with $\mathbf{c}\in\mathbb{R}^{32}$. We set the hidden dim of the ENF to 128 and use 2 attention heads. We train the model for 1500 epochs. We set $\gamma_{q}=0.05$, $\gamma_{v_{\alpha}}=0.2$, $\gamma_{v_{\beta}}=0.2$. We use a batch size of 2. The model takes approximately 24 hours to train.

Internally-heated convection in the ball.

We use 8 latents with $\mathbf{c}\in\mathbb{R}^{32}$. We set the hidden dim of the ENF to 128 and use 2 attention heads. We train the model for 1500 epochs. We set $\gamma_{q}=0.05$, $\gamma_{v_{\alpha}}=0.2$, $\gamma_{v_{\beta}}=0.2$. We use a batch size of 2. The model takes approximately 24 hours to train.
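For side-by-side comparison, the per-experiment settings listed above can be collected into a single configuration table. The sketch below is only one possible organisation: the class, field names, and dictionary keys are hypothetical, the values are transcribed from the paragraphs above, and the context dimension of the last two experiments is read as 32.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    num_latents: int
    context_dim: int       # dimensionality of c
    hidden_dim: int        # hidden dim of the ENF
    num_heads: int         # attention heads
    epochs: int
    rff_std_q: float       # gamma_q
    rff_std_v_alpha: float # gamma_{v_alpha}
    rff_std_v_beta: float  # gamma_{v_beta}
    batch_size: int
    approx_train_hours: int


CONFIGS = {
    "diffusion_plane": ExperimentConfig(4, 16, 64, 2, 1000, 0.05, 0.01, 0.01, 8, 8),
    "navier_stokes_torus": ExperimentConfig(4, 16, 64, 2, 2000, 0.05, 0.2, 0.2, 4, 48),
    "diffusion_sphere": ExperimentConfig(18, 4, 16, 2, 1500, 0.01, 0.01, 0.01, 2, 12),
    "shallow_water_sphere": ExperimentConfig(8, 32, 128, 2, 1500, 0.05, 0.2, 0.2, 2, 24),
    "convection_ball": ExperimentConfig(8, 32, 128, 2, 1500, 0.05, 0.2, 0.2, 2, 24),
}

# Total latent size per sample, counting only the context vectors.
context_sizes = {name: cfg.num_latents * cfg.context_dim for name, cfg in CONFIGS.items()}
```

Laid out this way, it is easy to see that the two planar experiments share an architecture and differ only in training length, RFF std, and batch size, while the shallow-water and convection experiments use identical settings.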

Baselines

As baseline models on Navier-Stokes we train FNO and GFNO [29] with 8 modes and 32 channels for 700 epochs (until convergence). We train CNODE [2] with 4 layers of size 64 for 300 epochs (until convergence). We train DINo on all experiments for 2000 epochs with an architecture as specified in [49]. For the IHC and shallow-water experiments, we increase the latent dim from 100 to 200, the number of layers for the neural ODE from 3 to 5, and the latent dim of the neural field decoder from 64 to 256, as per [49].