Senior Thesis CS

Data-Driven Computing Methods for Nonlinear Physics Systems with Geometric Constraints

Yun** Tong

Undergraduate Computer Science Thesis

Advised by

Professor Bo Zhu

Dartmouth College

Hanover, New Hampshire

June, 2024

Abstract

In a landscape where scientific discovery is increasingly driven by data, the integration of machine learning (ML) with traditional scientific methodologies has emerged as a transformative approach. This paper introduces a novel, data-driven framework that synergizes physics-based priors with advanced ML techniques to address the computational and practical limitations inherent in first-principle-based methods and brute-force machine learning methods. Our framework showcases four algorithms, each embedding a specific physics-based prior tailored to a particular class of nonlinear systems, including separable and nonseparable Hamiltonian systems, hyperbolic partial differential equations, and incompressible fluid dynamics. The intrinsic incorporation of physical laws preserves the system’s intrinsic symmetries and conservation laws, ensuring solutions are physically plausible and computationally efficient. The integration of these priors also enhances the expressive power of neural networks, enabling them to capture complex patterns typical in physical phenomena that conventional methods often miss. As a result, our models outperform existing data-driven techniques in terms of prediction accuracy, robustness, and predictive capability, particularly in recognizing features absent from the training set, despite relying on small datasets, short training periods, and small sample sizes.

Acknowledgements

I am deeply grateful for the generous financial support from Dartmouth Undergraduate Advising & Research, which provided me with the Presidential Scholarship, Sophomore and Junior Research Scholarships, the Leave Term Research Grant, and support through the Women in Science Project. Additionally, I wish to acknowledge the Neukom Scholarship program from the Neukom Institute for Computational Science.

I extend my heartfelt thanks to my supervisor, Professor Bo Zhu, for his unwavering support and profound inspiration. I am especially grateful for the opportunities he offered me as a first-year student, which opened my career as a researcher. I wish him a prolific and successful career at Georgia Tech.

I also recognize the invaluable assistance of the team at the Dartmouth Visual Computing Lab. Special thanks to Dr. Shiying Xiong, now an Assistant Professor at Zhejiang University, for his extensive help in various aspects of my research. He is not only a super talented researcher in computational physics but also a remarkable coworker. Additional thanks go to Xingzhe He for his assistance with deep learning algorithms, among many others in the lab. Without their collective effort and collaboration, these works would not have been possible.

I also want to thank Professor Deeparnab Chakrabarty for his support and inspiration, as well as Professor Soroush Vosoughi and Professor Yaoqing Yang for being on my thesis committee. Additionally, I am grateful to the other professors and students who have taught and helped me at Dartmouth.

List of Publications

The following papers were published during the completion of my undergraduate studies and will be introduced in this thesis (listed in chronological order):

1. Tong, Y., Xiong, S., He, X., Pan, G., & Zhu, B. (2021). Symplectic Neural Networks in Taylor Series Form for Hamiltonian Systems. Journal of Computational Physics, 437, 110325.
2. Xiong, S., Tong, Y., He, X., Yang, S., Yang, C., & Zhu, B. (2021). Nonseparable Symplectic Neural Networks. In Proceedings of the International Conference on Learning Representations.
3. Xiong, S., He, X., Tong, Y., Deng, Y., & Zhu, B. (2023). Neural Vortex Method: from Finite Lagrangian Particles to Infinite Dimensional Eulerian Dynamics. Computers and Fluids, 258, 105811.
4. Tong, Y., Xiong, S., He, X., Yang, S., Wang, Z., Tao, R., Liu, R., & Zhu, B. (2024). RoeNet: Predicting Discontinuity of Hyperbolic Systems from Continuous Data. International Journal for Numerical Methods in Engineering, 125, e7406.

Author’s Contribution

The work presented in this thesis is the product of scientific collaboration. Here I detail my specific contributions to each project. For the project listed first [1], my responsibilities included the initial generation of research ideas, implementation of the methodologies, conducting experiments, and writing the research paper. For the remaining three projects [2, 3, 4], my primary roles involved conducting experiments and writing the respective papers. Additionally, for the fourth project [4], I was involved in idea generation and was responsible for the paper writing and revision.

1 Introduction

From the time of Newton, two principal paradigms have shaped the methodologies of scientific research: the Keplerian paradigm, or the data-driven approach, and the Newtonian paradigm, or the first-principle-based approach [5]. The first-principle-based approach is fundamental and elegant, but the dilemma we often face is its practicality. There are many time-dependent problems in science, where the equations of motion are too complex for full solution, either because the equations are not certain or because the computational cost is too high. Additionally, for a dynamic system governed by some unknown mechanics, it is challenging to identify governing equations by directly observing the system’s state, especially when such observation is partial and the sample data is sparse.

Now, the data-driven approach has become a very powerful tool with the advancement of statistical methods and machine learning (ML). This approach enables us to handle physical systems by statistically exploring their underlying structures. Data-driven approaches have proven their efficacy in uncovering the underlying governing equations of a variety of physical systems, ranging from fluid mechanics [6] and wave physics [7] to quantum physics [8], thermodynamics [9], and materials science [10]. Moreover, various ML methods have significantly advanced the numerical simulation of complex and high-dimensional dynamical systems. These methods integrate learning paradigms with simulation infrastructures, enhancing the modeling of ordinary differential equations [11], linear and nonlinear partial differential equations [4, 12], high-dimensional partial differential equations [13], and inverse problems [14], among others.

Despite these advancements, data-driven methods like neural networks, which exhibit remarkable generalization abilities across various fields, face significant challenges. These methods require large, clean datasets and depend heavily on complex, black-box network structures that are highly sensitive to input variations. Additionally, brute-force machine learning with conventional toolkits such as deep neural networks often struggles with the high dimensionality of input-output spaces, the cost of data acquisition, the production of physically implausible results, and the inability to handle extrapolation robustly. These factors make it difficult to predict long-term dynamical behaviors accurately.

To address these challenges, we introduce a novel, data-driven framework designed to make accurate, long-term predictions in a computationally efficient manner. The key innovation lies in incorporating physics-based priors into the learning algorithms so that the physics structure of the underlying system is intrinsically preserved. As a result, our models outperform other state-of-the-art data-driven methods in terms of prediction accuracy, robustness, and predictive capability, particularly in recognizing features absent from the training set. This superior performance is achieved despite relying on smaller datasets, shorter training periods, and limited sample sizes. At the same time, our models are significantly more computationally efficient than traditional first-principles-based methods, while achieving a similar level of accuracy.

This thesis details four algorithms we have developed over time, each incorporating a distinct physics-based prior relevant to a specific type of nonlinear system. The algorithm names, the associated physics priors, and the systems they address are as follows:

1. Symplectic Taylor Neural Networks (Taylor-nets): The symplectic structure in separable Hamiltonian systems [1],
2. Nonseparable Symplectic Neural Networks (NSSNNs): The symplectic structure in nonseparable Hamiltonian systems [2],
3. Roe Neural Networks (RoeNet): Hyperbolic Conservation Law in hyperbolic partial differential equations (PDEs) [4],
4. Neural Vortex Method (NVM): Helmholtz’s Theorems in incompressible fluid dynamics [3].

Overall, the key advantages and contributions of our methodologies are as follows:

• Preservation of Intrinsic Symmetries and Conservation Laws: Our methodologies integrate physics-based priors within the learning algorithms, which significantly narrows the solution space. This reduction not only streamlines the computational demands but also preserves the mathematical symmetries and physical conservation laws inherent in the systems being modeled. Such an approach ensures that the generated solutions are not only efficient but also robust and aligned with physical reality, enhancing both the reliability and validity of the predictions.

• Enhanced Expressive Power of Neural Networks: By embedding physics-based structures into our models, we expand the network’s capacity to capture and reproduce complex patterns that are typical in solutions to physical phenomena. Conventional deep neural networks often struggle to identify such patterns when they are not represented within the training dataset. Our approach supports generalized solutions to PDEs and expands the solution space, allowing for a more comprehensive encapsulation of the potential physical behaviors, significantly improving the model’s applicability and predictive accuracy.

The thesis will be organized into several key sections: an introductory section and a related work section that outline the research background; a methodology section that elaborates on the mathematical foundations, including an introduction to the supervised learning and numerical methods we used to develop our methodologies; an implementation section that details the algorithm design and proofs of four methodologies respectively; and a results section that summarizes the key implementation and experimental findings. The paper will conclude with a discussion on the implications of the results and potential avenues for future research.

Table 1: Overview of the key concepts related to four methodologies.

	Taylor-nets	NSSNNs	RoeNet	NVM
Physics System	Separable Hamiltonians	Nonseparable Hamiltonians	Hyperbolic PDEs	Incompressible Fluid Dynamics
Prior Embedded	Symplectic Structure	Symplectic Structure	Hyperbolic Conservation Law	Helmholtz’s Theorems
Solver	Separable Symplectic Integrator	Nonseparable Symplectic Integrator	Roe Solver	Lagrangian Vortex Method
Key Advantages	Accurately approximate the continuous-time evolution over a long term		Predict future discontinuities with short-term continuous data	Reconstruct continuous vortex dynamics with a small number of vortex particles

Table 1 summarizes the key concepts related to the four methods, including the specific physics systems they model, the type of physical principles or priors they embed, the integrative techniques they employ, and the primary advantages each method offers. These comparative insights provide an at-a-glance understanding of the distinct capabilities and applications of each method. The details will be addressed comprehensively in Section 3.2 and Section 4.

2 Related Work

Neural Networks for Hamiltonian Systems.

Greydanus et al. introduce Hamiltonian Neural Networks (HNNs) to preserve the Hamiltonian energy of systems by reformulating the loss function [15]. Inspired by HNNs, a series of methods that intrinsically embed a symplectic integrator into the recurrent neural network architecture were proposed, including SRNN [16], and SSINN [17]. Methods like HNN face two primary challenges: they require the temporal derivatives of system momentum and position to compute the loss function, which are hard to obtain from real-world systems, and they do not strictly preserve the symplectic structure as their symplectomorphism is governed by the loss function. Our model, Taylor-net [1], addresses these limitations by integrating a solver into the network architecture to avoid the need for time derivatives and by embedding a symmetrical structure directly within the neural networks, rather than adjusting the loss function. Moreover, these methods have been extended, via combination with graph networks [18, 19], to address large-scale N-body problems where interactions are driven by forces between particle pairs.

While the above methods are all designed to solve separable Hamiltonian systems, SympNet constructs symplectic mappings of system variables across neighboring time steps to handle both separable and nonseparable Hamiltonian systems [20]. However, the number of parameters in SympNet grows quadratically with the system size, $O(N^{2})$, which poses challenges for application to high-dimensional N-body problems. Our model, NSSNN, addresses these issues with a novel network architecture tailored for nonseparable systems, which significantly reduces the complexity of parameter scaling [2]. Additionally, Hamiltonian-based neural networks have been adapted for broader applications. Toth et al. developed the Hamiltonian Generative Network (HGN) to infer Hamiltonian dynamics from high-dimensional observations, such as image data [21]. Furthermore, Zhong et al. introduced Symplectic ODE-Net (SymODEN), which incorporates an external control term into the standard Hamiltonian framework, enhancing the model’s applicability to controlled dynamical systems [22].

Neural Networks for Discontinuous Functions.

The use of deep learning networks to approximate discontinuous functions is well-supported theoretically, as highlighted in various studies on Hölder spaces [23], piecewise smooth functions [24], linear estimators [25], and highly adaptive, spatially anisotropic target functions [26]. Building on these foundations, Physics-Informed Neural Networks (PINNs) were introduced by Raissi et al. as a data-driven approach to solving nonlinear problems [27], leveraging the well-known capability of deep neural networks to act as universal function approximators [28]. Among their key attributes, PINNs ensure the preservation of symmetry, invariance, and conservation principles that are inherent in the physical laws governing the observed data [29]. Michoski et al. demonstrated that PINNs could capture irregular solutions to PDEs without the need for any regularization [30]. Additionally, Mao et al. utilized PINNs to approximate solutions for high-speed flows by integrating the Euler equations with initial and boundary conditions into the loss function [31]. However, while these studies demonstrate the robust capabilities of PINNs, they often do not address extrapolation beyond the training set, a critical aspect for ensuring the generalizability of the models to a wider range of scenarios.

Neural Networks for Fluid Dynamics.

Recent advancements in fluid dynamics analysis have increasingly leveraged data-driven approaches powered by machine learning [32, 33, 34]. Recognizing the limitations in traditional brute-force machine learning methods, current research efforts are increasingly focused on integrating physical priors into learning algorithms, aiming to equip neural networks with a foundational understanding of physical laws, rather than approaching the data naively [35, 36, 37, 38, 39, 40]. Significant efforts have been made to encode these physical constraints efficiently, such as incorporating the Navier-Stokes (NS) equations [12], modeling incompressibility constraints [41], and mapping dynamics of wave phenomena onto recurrent neural network computations [7]. Moreover, understanding complex fluid dynamics through machine learning involves embedding the structure of partial differential equations (PDEs) within neural network architectures [42, 43, 44, 45, 46, 47]. Ideally, these machine learning models designed to solve PDEs should be able to evolve the flow fields independently, obtaining initial-condition invariance without the need for a specific solver. However, the high dimensionality of the problems and insufficient supervisory data continue to pose significant challenges.

3 Methodology

3.1 Supervised learning

We used supervised learning for all of our models. Supervised learning is a subset of machine learning in which an algorithm learns a function that maps an input to an output from example input-output pairs. It infers this function from labeled training data, where each example is a pair consisting of an input object and a desired output value. The algorithm analyzes the training data and produces an inferred function that can be used for mapping new examples. The sequential steps involved in developing a supervised learning model, from determining the type of training dataset to evaluating the model’s accuracy, are:

1. Determine the Type of Training Dataset: Identify whether the problem is a classification or regression to select the appropriate type of training dataset.
2. Collect/Gather the Labelled Training Data: Assemble a dataset where each instance is tagged with the correct answer or outcome.
3. Split the Training Dataset: Divide the dataset into three parts:
   • Training dataset: used to train the model.
   • Test dataset: used to test the model’s predictions.
   • Validation dataset: used to tune the model’s hyperparameters.
4. Determine the Input Features: Select the features of the training dataset that contain sufficient information for the model to accurately predict the output.
5. Determine the Suitable Algorithm: Choose an appropriate algorithm for the model based on the problem type.
6. Execute the Algorithm on the Training Dataset: Train the model using the selected algorithm on the training dataset. Utilize the validation set to adjust control parameters as needed.
7. Evaluate the Model’s Accuracy: Test the model using the test dataset to assess its accuracy. A model that correctly predicts the output indicates high accuracy.
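As a minimal sketch of step 3 above, the following Python function shuffles a list of labelled samples and splits it into training, validation, and test subsets. The function name `split_dataset`, the 70/15/15 proportions, and the fixed seed are illustrative assumptions and are not the settings used for the models in this thesis.

```python
import random


def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle labelled samples and split them into (train, validation, test)."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty test share")
    shuffled = list(samples)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is held out for testing
    return train, val, test


# Hypothetical labelled data: input x paired with target 2x.
pairs = [(x, 2 * x) for x in range(100)]
train_set, val_set, test_set = split_dataset(pairs)
```

The three subsets are disjoint and together contain every sample exactly once, so the test set stays unseen during training and hyperparameter tuning.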

3.1.1 Optimizer

In the context of neural networks, optimizers are crucial for minimizing the loss function, i.e., the difference between the actual and predicted outputs. One of the popular optimizers is the Adam optimizer [48], which combines the advantages of two other extensions of stochastic gradient descent, namely Adaptive Gradient Algorithm and Root Mean Square Propagation. The Adam optimizer’s update equations are given by:

\begin{aligned}
m_{t}&=\beta_{1}m_{t-1}+(1-\beta_{1})g_{t}\\
v_{t}&=\beta_{2}v_{t-1}+(1-\beta_{2})g_{t}^{2}\\
\hat{m}_{t}&=\frac{m_{t}}{1-\beta_{1}^{t}}\\
\hat{v}_{t}&=\frac{v_{t}}{1-\beta_{2}^{t}}\\
\theta_{t+1}&=\theta_{t}-\frac{\alpha\hat{m}_{t}}{\sqrt{\hat{v}_{t}}+\epsilon}
\end{aligned}

where $\theta$ represents the parameters of the model, $g_{t}$ is the gradient of the loss function with respect to the parameters at timestep $t$, and $m_{t}$ and $v_{t}$ are estimates of the first and second moments of the gradients, respectively. $\alpha$ is the learning rate, and $\beta_{1}$, $\beta_{2}$, and $\epsilon$ are hyperparameters.
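The update equations above translate almost line by line into code. The sketch below is a plain-Python illustration for a list of scalar parameters. The function names, the toy quadratic objective, and the default values $\alpha=10^{-3}$, $\beta_{1}=0.9$, $\beta_{2}=0.999$, $\epsilon=10^{-8}$ are assumptions chosen for illustration, not the configuration used in our experiments.

```python
import math


def adam_step(theta, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for lists of scalars; t is the 1-based step index."""
    new_theta, new_m, new_v = [], [], []
    for th, g, m_i, v_i in zip(theta, grad, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g        # first-moment estimate m_t
        v_i = beta2 * v_i + (1.0 - beta2) * g * g    # second-moment estimate v_t
        m_hat = m_i / (1.0 - beta1 ** t)             # bias-corrected first moment
        v_hat = v_i / (1.0 - beta2 ** t)             # bias-corrected second moment
        new_theta.append(th - alpha * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


def minimize_quadratic(steps=2000, alpha=0.05):
    """Toy problem (assumed): minimize (theta0 - 3)^2 + (theta1 + 1)^2."""
    theta, m, v = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for t in range(1, steps + 1):
        grad = [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]
        theta, m, v = adam_step(theta, grad, m, v, t, alpha=alpha)
    return theta
```

Because of the bias correction, the very first step has magnitude close to $\alpha$ regardless of the gradient scale, which is one reason the method is relatively insensitive to how the loss is scaled.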

3.1.2 Loss Functions

The choice of loss function is pivotal in guiding the training of the model towards its objective. In our methods, we use several common loss functions in supervised learning, including:

L1 Loss (Absolute Loss)

Defined as $L(y,\hat{y})=\sum|y-\hat{y}|$ , where $y$ is the true value and $\hat{y}$ is the predicted value.

L2 Loss (Squared Loss)

Given by $L(y,\hat{y})=\sum(y-\hat{y})^{2}$ . This loss function is sensitive to outliers as it squares the differences, hence penalizing larger errors more.

Cross-Entropy Loss

The Cross-Entropy Loss is widely used in classification tasks to measure the performance of a classification model whose output is a probability value between 0 and 1. The Cross-Entropy Loss formula is given by:

L(y,\hat{y})=-\frac{1}{N}\sum_{i=1}^{N}\left[y_{i}\log(\hat{y}_{i})+(1-y_{i})\log(1-\hat{y}_{i})\right]

where $L(y,\hat{y})$ is the loss function, $N$ is the number of samples, $y_{i}$ is the actual label of the $i$ -th sample, and $\hat{y}_{i}$ is the predicted probability for the $i$ -th sample.

For multi-class classification, the generalized formula is:

L(y,\hat{y})=-\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M}y_{ic}\log(\hat{y}_{ic})

where $M$ is the number of classes, $y_{ic}$ indicates whether class $c$ is the correct classification for observation $i$ , and $\hat{y}_{ic}$ is the predicted probability that observation $i$ is of class $c$ .

Focal Loss

Focal Loss is an adapted version of Cross-Entropy Loss, which addresses the problem of class imbalance by focusing more on hard-to-classify examples. It is particularly useful in scenarios where there is a large class imbalance. The formula for Focal Loss is given by:

L(y,\hat{y})=-\alpha_{t}(1-\hat{y}_{t})^{\gamma}\log(\hat{y}_{t})

where $\alpha_{t}$ is a weighting factor for the class $t$ to counteract class imbalance, $\gamma$ is a focusing parameter that adjusts the rate at which easy examples are down-weighted, $\hat{y}_{t}$ is the predicted probability of the class with label $t$, and $(1-\hat{y}_{t})^{\gamma}$ reduces the loss for well-classified examples, putting more focus on hard, misclassified examples. This helps to improve the robustness and performance of classification models trained on datasets where some classes are much more frequent than others.
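To make the definitions above concrete, the following sketch implements the four losses in plain Python. The function names, the clamping constant `eps` (added here only to avoid evaluating $\log 0$), and the default values $\alpha_{t}=0.25$ and $\gamma=2$ are illustrative assumptions.

```python
import math


def l1_loss(y, y_hat):
    """Sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(y, y_hat))


def l2_loss(y, y_hat):
    """Sum of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Mean binary cross-entropy; y holds labels in {0, 1}, y_hat probabilities."""
    total = 0.0
    for label, p in zip(y, y_hat):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0) (added safeguard)
        total += label * math.log(p) + (1.0 - label) * math.log(1.0 - p)
    return -total / len(y)


def cross_entropy(y_onehot, y_prob, eps=1e-12):
    """Mean multi-class cross-entropy over one-hot labels and class probabilities."""
    total = 0.0
    for labels, probs in zip(y_onehot, y_prob):
        total += sum(l * math.log(max(p, eps)) for l, p in zip(labels, probs))
    return -total / len(y_onehot)


def focal_loss(p_t, alpha_t=0.25, gamma=2.0):
    """Focal loss for one sample; p_t is the predicted probability of the true class."""
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

Setting `gamma=0` and `alpha_t=1` in `focal_loss` recovers the ordinary cross-entropy term $-\log(\hat{y}_{t})$, which shows that the focusing factor is the only difference between the two losses.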

3.1.3 Activation Functions

Activation functions are non-linear functions applied to the output of a neuron in a neural network. They decide whether a neuron should be activated or not, helping the network learn complex patterns in the data.

ReLU

One of the most popular activation functions is the Rectified Linear Unit (ReLU). It is defined as:

f(x)=\max(0,x)

where $x$ is the input to the neuron. ReLU is favored for its simplicity and efficiency, promoting faster convergence in training due to its linear, non-saturating form.

Next, we will outline the various models that were employed in the development of our model.

3.1.4 Residual Networks (ResNets) and Residual Blocks (ResBlocks)

ResNets are designed to enable training of very deep neural networks through the introduction of Residual Blocks (ResBlocks), which use skip connections or shortcuts to jump over some layers [49]. Numerous studies have shown ResNets to be a neural network architecture highly suitable for deep learning and computer vision. They offer distinctive advantages in mitigating problems such as vanishing gradients during network training.

ResBlocks

A Residual Block allows the gradient to flow through the network directly, without passing through non-linear activations, by using skip connections. This is mathematically represented as:

\mathbf{h}_{\text{out}}=\mathcal{F}(\mathbf{h}_{\text{in}},\{\theta_{i}\})+\mathbf{h}_{\text{in}}

where $\mathbf{h}_{\text{in}}$ is the input to the ResBlock, $\mathcal{F}(\mathbf{h}_{\text{in}},\{\theta_{i}\})$ represents the residual mapping to be learned by layers of the ResBlock, and $\mathbf{h}_{\text{out}}$ is the output of the ResBlock. The addition operation is element-wise, allowing the network to learn identity mappings efficiently, which is crucial for training deep networks.
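The residual formula can be illustrated with a small plain-Python sketch. Here the residual mapping $\mathcal{F}$ is assumed to be a two-layer perceptron with a ReLU in between; that choice, the helper names, and the example weights are illustrative assumptions, not the block design used in our networks.

```python
def relu(x):
    """Element-wise ReLU, f(x) = max(0, x)."""
    return [max(0.0, v) for v in x]


def linear(W, b, x):
    """Dense layer: returns W x + b for a row-major weight matrix W."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def res_block(h_in, W1, b1, W2, b2):
    """h_out = F(h_in) + h_in, with F = linear -> ReLU -> linear (assumed form)."""
    residual = linear(W2, b2, relu(linear(W1, b1, h_in)))
    return [r + h for r, h in zip(residual, h_in)]  # element-wise skip connection


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros_matrix = [[0.0, 0.0], [0.0, 0.0]]
zeros_vector = [0.0, 0.0]

# With all residual weights at zero the block reduces to the identity mapping.
unchanged = res_block([1.0, -2.0], zeros_matrix, zeros_vector, zeros_matrix, zeros_vector)
# With identity weights the ReLU output [1, 0] is added to the input.
shifted = res_block([1.0, -2.0], identity, zeros_vector, identity, zeros_vector)
```

The first call shows why identity mappings are easy to represent: the block only has to drive $\mathcal{F}$ to zero rather than learn the identity through a stack of non-linear layers.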

3.1.5 Neural Ordinary Differential Equations (Neural ODEs) and the Adjoint Method

Neural ODEs are a class of models that represent the continuous dynamics of hidden states using differential equations [50]. Unlike traditional neural networks that apply a discrete sequence of transformations, Neural ODEs model the derivative of the hidden state as a continuous transformation:

\frac{d\mathbf{h}(t)}{dt}=f(\mathbf{h}(t),t,\theta)

(1)

where $\mathbf{h}(t)$ is the hidden state at time $t$ , $f$ is a neural network parameterized by $\theta$ defining the time derivative of the hidden state, making the model capable of learning continuous-time dynamics.

At the heart of the model is the observation that, viewing a neural network as a dynamical system, we can treat a chain of residual blocks as the solution of an ordinary differential equation (ODE) discretized with the Euler method. Given a residual network that consists of a sequence of transformations

\bm{h}_{t+1}=\bm{h}_{t}+f(\bm{h}_{t},\theta_{t}),

(2)

the idea is to parameterize the continuous dynamics using the ODE in (1), whose right-hand side is specified by the neural network.

In a Neural ODE framework, the evolution of the hidden state $z$ is governed by an ODE parameterized by a neural network:

\frac{dz(t)}{dt}=f(z(t),t,\theta),

(3)

where $t$ is time, $\theta$ represents the parameters of the neural network, and $f$ is a function approximated by the neural network defining the dynamics of $z$ .

To optimize Neural ODEs, the adjoint method is utilized, providing an efficient means for calculating gradients with respect to the parameters $\theta$ during backpropagation [51]. Rather than differentiating through the ODE solver, we solve the adjoint ODE defined as:

\frac{d\mathbf{a}(t)}{dt}=-\mathbf{a}(t)^{\top}\frac{\partial f}{\partial\mathbf{h}(t)},

(4)

where $\mathbf{a}(t)=\frac{dL}{d\mathbf{h}(t)}$ is the gradient of the loss function $L$ with respect to the hidden state.

The gradient of the loss with respect to the parameters is then obtained by integrating:

\frac{dL}{d\theta}=-\int_{t_{1}}^{t_{0}}\mathbf{a}(t)^{\top}\frac{\partial f}{\partial\theta}\,dt,

(5)

over the interval from $t_{0}$ to $t_{1}$ , the duration of the forward pass. The adjoint state $\mathbf{a}(t)$ is initialized at the end of the forward pass and integrated backward in time to obtain the necessary gradients for parameter updates.
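A scalar example makes the adjoint computation easy to check by hand. The sketch below assumes the toy dynamics $f(h,t,\theta)=\theta h$ with loss $L=h(t_{1})$, for which $h(t_{1})=h_{0}e^{\theta(t_{1}-t_{0})}$ and $dL/d\theta=(t_{1}-t_{0})\,h_{0}\,e^{\theta(t_{1}-t_{0})}$ in closed form. It integrates the state forward, then integrates the state, the adjoint, and the gradient accumulator backward from $t_{1}$ to $t_{0}$; accumulating $-a\,\partial f/\partial\theta$ along this backward pass equals $\int_{t_{0}}^{t_{1}}a\,\partial f/\partial\theta\,dt$. The function names, the RK4 stepping, and the step count are illustrative assumptions.

```python
import math


def rk4_step(f, t, s, dt):
    """One classical RK4 step for ds/dt = f(t, s); dt may be negative."""
    k1 = f(t, s)
    k2 = f(t + dt / 2, [x + dt / 2 * k for x, k in zip(s, k1)])
    k3 = f(t + dt / 2, [x + dt / 2 * k for x, k in zip(s, k2)])
    k4 = f(t + dt, [x + dt * k for x, k in zip(s, k3)])
    return [x + dt / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(s, k1, k2, k3, k4)]


def adjoint_gradient(theta, h0, t0=0.0, t1=1.0, n=200):
    """Return (h(t1), dL/dtheta) for dh/dt = theta * h and L = h(t1)."""
    dt = (t1 - t0) / n

    def forward(t, s):
        return [theta * s[0]]                      # f(h, t, theta) = theta * h

    def backward(t, s):
        h, a, _ = s
        # dh/dt = f, da/dt = -a * df/dh, dg/dt = -a * df/dtheta
        return [theta * h, -a * theta, -a * h]

    state, t = [h0], t0
    for _ in range(n):                             # forward pass from t0 to t1
        state = rk4_step(forward, t, state, dt)
        t += dt
    h1 = state[0]

    aug, t = [h1, 1.0, 0.0], t1                    # a(t1) = dL/dh(t1) = 1, g(t1) = 0
    for _ in range(n):                             # backward pass from t1 to t0
        aug = rk4_step(backward, t, aug, -dt)
        t -= dt
    return h1, aug[2]


h_end, grad = adjoint_gradient(theta=0.5, h0=2.0)
```

Only the final state is kept from the forward pass; the backward pass reconstructs $h(t)$ on the fly, which is the source of the memory savings of the adjoint method.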

3.2 Numerical Methods

First, we present four methods for solving ordinary differential equations (ODEs): the Euler method, the Runge-Kutta method, the separable symplectic integrator, and the nonseparable symplectic integrator.

3.2.1 Euler Method

The Euler method represents one of the most straightforward numerical strategies for approximating solutions to ODEs. As a first-order numerical method, it provides an initial approach for solving initial value problems defined by $\frac{d\bm{y}}{dt}=\bm{f}(t,\bm{y})$ with the initial condition $\bm{y}(t_{0})=\bm{y}_{0}$ . Despite its simplicity, the Euler method is fundamental in the introduction to more sophisticated numerical methods for differential equations.

This method calculates the next state vector $\bm{y}$ by proceeding in the direction of the derivative $\bm{f}(t,\bm{y})$ , scaled by the timestep $dt$ . The updated state $\bm{y}$ at time $t+dt$ is given by:

\bm{y}(t+dt)=\bm{y}(t)+\bm{f}(t,\bm{y}(t))\cdot dt

As a consequence of its first-order accuracy, the local truncation error for the Euler method is of the order $O(dt^{2})$ , while the global error is of the order $O(dt)$ . This relatively large error suggests that while the Euler method can be beneficial for straightforward problems and educational purposes, it may not be the best choice for scenarios that demand high precision over extended durations.
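The first-order behavior can be observed directly in a short experiment. The sketch below applies the Euler update to the test problem $dy/dt=-y$, $y(0)=1$, whose exact solution is $e^{-t}$; the function names and the test problem are assumptions chosen for illustration.

```python
import math


def euler_solve(f, t0, y0, dt, n_steps):
    """Advance dy/dt = f(t, y) from (t0, y0) by n_steps explicit Euler steps."""
    t, y = t0, list(y0)
    for _ in range(n_steps):
        dy = f(t, y)
        y = [yi + dyi * dt for yi, dyi in zip(y, dy)]  # y(t+dt) = y(t) + f(t, y) dt
        t += dt
    return y


def decay(t, y):
    """Right-hand side of the test problem dy/dt = -y."""
    return [-y[0]]


exact = math.exp(-1.0)
error_coarse = abs(euler_solve(decay, 0.0, [1.0], 0.01, 100)[0] - exact)
error_fine = abs(euler_solve(decay, 0.0, [1.0], 0.005, 200)[0] - exact)
error_ratio = error_coarse / error_fine  # close to 2 for a first-order method
```

Halving the timestep roughly halves the error at the final time, consistent with a global error of order $O(dt)$.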

3.2.2 Runge-Kutta Method

The Runge-Kutta methods are a prominent family of iterative techniques for the numerical resolution of ODEs. The fourth-order Runge-Kutta method, commonly referred to as RK4, is particularly renowned for its balance between computational efficiency and accuracy. This method is applied to approximate the solution of an initial value problem defined by the ODE $\frac{d\bm{y}}{dt}=\bm{f}(t,\bm{y})$ with the initial condition $\bm{y}(t_{0})=\bm{y}_{0}$ .

RK4 progresses the solution by computing a weighted average of four increments, where each increment evaluates the derivative $\bm{f}(t,\bm{y})$ at various points within the timestep $dt$ . The solution $\bm{y}$ at a subsequent time $t+dt$ is determined using the formula:

\bm{y}(t+dt)=\bm{y}(t)+\frac{1}{6}(\bm{k}_{1}+2\bm{k}_{2}+2\bm{k}_{3}+\bm{k}_{4})

with the increments given by:

\begin{aligned}
\bm{k}_{1}&=\bm{f}(t,\bm{y})\cdot dt,\\
\bm{k}_{2}&=\bm{f}\left(t+\frac{dt}{2},\bm{y}+\frac{\bm{k}_{1}}{2}\right)\cdot dt,\\
\bm{k}_{3}&=\bm{f}\left(t+\frac{dt}{2},\bm{y}+\frac{\bm{k}_{2}}{2}\right)\cdot dt,\\
\bm{k}_{4}&=\bm{f}(t+dt,\bm{y}+\bm{k}_{3})\cdot dt.
\end{aligned}

As a fourth-order method, the RK4 achieves a local truncation error of the order $O(dt^{5})$ and a global error of the order $O(dt^{4})$ . This substantial accuracy renders the RK4 method highly effective for a broad spectrum of applications, offering an excellent trade-off between the computational demands and the precision of the solution.
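The RK4 increments can be sketched in a few lines of Python. The example below reuses the test problem $dy/dt=-y$, $y(0)=1$ (an assumption made for illustration, as are the function names) and compares the error at $t=1$ for two step sizes.

```python
import math


def rk4_step(f, t, y, dt):
    """One RK4 step using the increments k1..k4 defined above."""
    k1 = [dt * v for v in f(t, y)]
    k2 = [dt * v for v in f(t + dt / 2, [yi + ki / 2 for yi, ki in zip(y, k1)])]
    k3 = [dt * v for v in f(t + dt / 2, [yi + ki / 2 for yi, ki in zip(y, k2)])]
    k4 = [dt * v for v in f(t + dt, [yi + ki for yi, ki in zip(y, k3)])]
    return [yi + (a + 2 * b + 2 * c + d) / 6
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def rk4_solve(f, t0, y0, dt, n_steps):
    """Advance dy/dt = f(t, y) from (t0, y0) by n_steps RK4 steps."""
    t, y = t0, list(y0)
    for _ in range(n_steps):
        y = rk4_step(f, t, y, dt)
        t += dt
    return y


def decay(t, y):
    """Right-hand side of the test problem dy/dt = -y."""
    return [-y[0]]


exact = math.exp(-1.0)
error_coarse = abs(rk4_solve(decay, 0.0, [1.0], 0.1, 10)[0] - exact)
error_fine = abs(rk4_solve(decay, 0.0, [1.0], 0.05, 20)[0] - exact)
error_ratio = error_coarse / error_fine  # close to 2**4 = 16 for a fourth-order method
```

Halving the timestep reduces the error by a factor of about sixteen, in line with the $O(dt^{4})$ global error, at the cost of four derivative evaluations per step.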

3.2.3 Separable Symplectic Integrator

Symplectic integrators are a class of numerical integration schemes specifically designed for simulating Hamiltonian systems.

A Hamiltonian system is characterized by $N$ pairs of canonical coordinates, denoted by generalized positions $\bm{q}=(q_{1},q_{2},\cdots,q_{N})$ and generalized momenta $\bm{p}=(p_{1},p_{2},\cdots,p_{N})$. The evolution of these coordinates over time is governed by Hamilton’s equations, expressed as

\begin{dcases}\frac{\textrm{d}\bm{q}}{\textrm{d}t}=\frac{\partial\mathcal{H}}{\partial\bm{p}},\\ \frac{\textrm{d}\bm{p}}{\textrm{d}t}=-\frac{\partial\mathcal{H}}{\partial\bm{q}},\end{dcases}

(6)

with the initial condition

(\bm{q}(t_{0}),\bm{p}(t_{0}))=(\bm{q}_{0},\bm{p}_{0}).

(7)

Here, the function $\mathcal{H}=\mathcal{H}(\bm{q},\bm{p})$ is the Hamiltonian, which corresponds to the total energy of the system.

In a separable Hamiltonian system, the Hamiltonian $\mathcal{H}$ can be split into a kinetic energy part $T(\bm{p})$ and a potential energy part $V(\bm{q})$. Consequently, the Hamiltonian of a separable Hamiltonian system can be expressed in the form:

\mathcal{H}(\bm{q},\bm{p})=T(\bm{p})+V(\bm{q}).

(8)

Symplectic integrators are distinguished by their ability to preserve the symplectic structure of phase space, an essential property for ensuring the long-term stability and accuracy of the simulation. By conserving quantities analogous to energy, these methods avoid the numerical dissipation typical of other numerical schemes, making them particularly well-suited for simulating dynamical systems over extended periods.

The specific symplectic integrator we use is the fourth-order scheme described by Forest and Ruth [52] and Yoshida [53]. It is designed for separable Hamiltonian systems of the form shown in equation (8) and advances the system’s state over a timestep $dt$ by applying a sequence of operations that preserve the symplectic geometry of phase space, which is crucial for accurately simulating the long-term behavior of Hamiltonian systems. The procedure is as follows:

1. Initialize with $(\bm{q}_{0},\bm{p}_{0})$ at $t=t_{0}$ .

2. For each time step $dt$ , update $(\bm{q},\bm{p})$ through the following sequence of operations:

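For each step $j$ from 1 to 4, execute the following updates in order (the position update comes first, which matches the ordering in Algorithm 1 and the choice $d_{4}=0$):

•

Update position $\bm{q}$ by a fraction of the time step:

\bm{q}=\bm{q}+c_{j}\nabla T(\bm{p})\cdot dt.

(9)

•

Update momentum $\bm{p}$ by a fraction of the time step, using the updated position:

\bm{p}=\bm{p}-d_{j}\nabla V(\bm{q})\cdot dt.

(10)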

The coefficients $c_{j}$ and $d_{j}$ are chosen to eliminate lower-order error terms, ensuring fourth-order accuracy. These coefficients are typically defined as [52, 53, 54]:

\begin{aligned}c_{1}&=c_{4}=\frac{1}{2(2-2^{1/3})},&c_{2}&=c_{3}=\frac{1-2^{1/3}}{2(2-2^{1/3})},\\ d_{1}&=d_{3}=\frac{1}{2-2^{1/3}},&d_{2}&=-\frac{2^{1/3}}{2-2^{1/3}},&d_{4}&=0.\end{aligned}

(11)

Repeat these steps for each time step $dt$ , iteratively advancing the system from $(\bm{q}_{0},\bm{p}_{0})$ at $t_{0}$ to $(\bm{q}_{n},\bm{p}_{n})$ at $t_{0}+n\cdot dt$ , where $n$ is the number of time steps.

The fourth-order symplectic integrator is characterized by its fourth-order accuracy in the numerical simulation of Hamiltonian systems. This indicates that the local truncation error of the method is of the order $O(dt^{5})$ , implying that the error introduced in a single timestep decreases as the fifth power of the timestep size. Consequently, the global error, or the cumulative error over a fixed interval of time, is of the order $O(dt^{4})$ . Such high-order accuracy is especially beneficial for simulations requiring long-term stability and precision, as it permits the use of relatively large timestep sizes while maintaining a low overall numerical error.
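The procedure and coefficients above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: it assumes a scalar system, applies the position update before the momentum update in each substep (the ordering of Algorithm 1), and uses a harmonic oscillator $\mathcal{H}=p^{2}/2+q^{2}/2$ as a hypothetical test case. The names `step4`, `integrate`, and `error_at_t1` are illustrative only.

```python
import math

# Coefficients (11) of the fourth-order symplectic scheme.
CBRT2 = 2.0 ** (1.0 / 3.0)
W = 2.0 - CBRT2
C = [1 / (2 * W), (1 - CBRT2) / (2 * W), (1 - CBRT2) / (2 * W), 1 / (2 * W)]
D = [1 / W, -CBRT2 / W, 1 / W, 0.0]


def step4(q, p, dt, grad_T, grad_V):
    """One time step: four (position, momentum) substeps."""
    for c, d in zip(C, D):
        q = q + c * grad_T(p) * dt   # position update with c_j
        p = p - d * grad_V(q) * dt   # momentum update with d_j
    return q, p


def integrate(q, p, dt, n, grad_T, grad_V):
    """Advance (q, p) by n time steps of size dt."""
    for _ in range(n):
        q, p = step4(q, p, dt, grad_T, grad_V)
    return q, p


# Hypothetical test system (an assumption of this sketch):
# harmonic oscillator with T(p) = p^2 / 2 and V(q) = q^2 / 2.
def grad_T(p):
    return p


def grad_V(q):
    return q


def error_at_t1(n):
    """Error at t = 1 against the exact solution q = cos t, p = -sin t."""
    q, p = integrate(1.0, 0.0, 1.0 / n, n, grad_T, grad_V)
    return math.hypot(q - math.cos(1.0), p + math.sin(1.0))


# Halving dt should reduce the global error by roughly 2^4.
observed_order = math.log2(error_at_t1(20) / error_at_t1(40))
```

Under these assumptions, `observed_order` should come out close to 4, in line with the $O(dt^{4})$ global error discussed above.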

3.2.4 Nonseparable Symplectic Integrator

Given a Hamiltonian system described by (6) with initial condition (7), we now consider the more general case of an arbitrary Hamiltonian, which may be separable or nonseparable. In [55], a generic, high-order, explicit, and symplectic time integrator was proposed to solve (6) for such an arbitrary Hamiltonian $\mathcal{H}$. The method works with an augmented Hamiltonian

\overline{\mathcal{H}}(\bm{q},\bm{p},\bm{x},\bm{y}):=\mathcal{H}_{A}+\mathcal{% H}_{B}+\omega\mathcal{H}_{C}

(12)

with

\mathcal{H}_{A}=\mathcal{H}(\bm{q},\bm{y}),~{}~{}\mathcal{H}_{B}=\mathcal{H}(% \bm{x},\bm{p}),~{}~{}\mathcal{H}_{C}=\frac{1}{2}\left(\|\bm{q}-\bm{x}\|_{2}^{2% }+\|\bm{p}-\bm{y}\|_{2}^{2}\right)

(13)

in an extended phase space with symplectic two-form $\textrm{d}\bm{q}\wedge\textrm{d}\bm{p}+\textrm{d}\bm{x}\wedge\textrm{d}\bm{y}$, where $\omega$ is a constant that controls the binding of the original system and the artificial restraint.

Notice that Hamilton’s equations for $\overline{\mathcal{H}}$,

\begin{dcases}\frac{\textrm{d}\bm{q}}{\textrm{d}t}=\frac{\partial\overline{\mathcal{H}}}{\partial\bm{p}}=\frac{\partial\mathcal{H}(\bm{x},\bm{p})}{\partial\bm{p}}+\omega(\bm{p}-\bm{y}),\\ \frac{\textrm{d}\bm{p}}{\textrm{d}t}=-\frac{\partial\overline{\mathcal{H}}}{\partial\bm{q}}=-\frac{\partial\mathcal{H}(\bm{q},\bm{y})}{\partial\bm{q}}-\omega(\bm{q}-\bm{x}),\\ \frac{\textrm{d}\bm{x}}{\textrm{d}t}=\frac{\partial\overline{\mathcal{H}}}{\partial\bm{y}}=\frac{\partial\mathcal{H}(\bm{q},\bm{y})}{\partial\bm{y}}-\omega(\bm{p}-\bm{y}),\\ \frac{\textrm{d}\bm{y}}{\textrm{d}t}=-\frac{\partial\overline{\mathcal{H}}}{\partial\bm{x}}=-\frac{\partial\mathcal{H}(\bm{x},\bm{p})}{\partial\bm{x}}+\omega(\bm{q}-\bm{x}),\end{dcases}

(14)

with the initial condition $(\bm{q},\bm{p},\bm{x},\bm{y})|_{t=t_{0}}=(\bm{q}_{0},\bm{p}_{0},\bm{q}_{0},\bm{p}_{0})$, have the same exact solution as (6), in the sense that $(\bm{q},\bm{p},\bm{x},\bm{y})=(\bm{q},\bm{p},\bm{q},\bm{p})$ for all time. Hence, we can obtain the solution of (6) by solving (14). The coefficient $\omega$ acts as a regularizer that stabilizes the numerical results.
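The equivalence claimed above can be checked by direct substitution. The following short calculation is a sketch of that step, using only the symbols already defined: setting $\bm{x}=\bm{q}$ and $\bm{y}=\bm{p}$ in (14) removes the $\omega$ terms, recovers (6), and shows that the differences $\bm{x}-\bm{q}$ and $\bm{y}-\bm{p}$ do not change.

```latex
\begin{aligned}
\bm{x}=\bm{q},\ \bm{y}=\bm{p}\ \Longrightarrow\
&\frac{\textrm{d}\bm{q}}{\textrm{d}t}
 =\frac{\partial\mathcal{H}(\bm{q},\bm{p})}{\partial\bm{p}}+\omega(\bm{p}-\bm{p})
 =\frac{\partial\mathcal{H}(\bm{q},\bm{p})}{\partial\bm{p}},\\
&\frac{\textrm{d}\bm{p}}{\textrm{d}t}
 =-\frac{\partial\mathcal{H}(\bm{q},\bm{p})}{\partial\bm{q}}-\omega(\bm{q}-\bm{q})
 =-\frac{\partial\mathcal{H}(\bm{q},\bm{p})}{\partial\bm{q}},\\
&\frac{\textrm{d}(\bm{x}-\bm{q})}{\textrm{d}t}=\bm{0},\qquad
 \frac{\textrm{d}(\bm{y}-\bm{p})}{\textrm{d}t}=\bm{0}.
\end{aligned}
```

Since the initial condition places the augmented system on the set $\bm{x}=\bm{q}$, $\bm{y}=\bm{p}$, the exact solution stays on it.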

It is possible to construct high-order symplectic integrators for $\overline{\mathcal{H}}$ with explicit updates. Denote by $\bm{\phi}_{1}^{\delta}$, $\bm{\phi}_{2}^{\delta}$, and $\bm{\phi}_{3}^{\delta}$ the time-$\delta$ flows of $\mathcal{H}_{A}$, $\mathcal{H}_{B}$, and $\omega\mathcal{H}_{C}$, respectively, each acting on $(\bm{q},\bm{p},\bm{x},\bm{y})$. Writing $\mathcal{H}_{\theta}$ for the Hamiltonian (in Section 4.2, its neural-network approximation), these maps are given by

\begin{bmatrix}\bm{q}\\ \bm{p}-\delta[\partial\mathcal{H}_{\theta}(\bm{q},\bm{y})/\partial\bm{q}]\\ \bm{x}+\delta[\partial\mathcal{H}_{\theta}(\bm{q},\bm{y})/\partial\bm{y}]\\ \bm{y}\end{bmatrix},~{}\begin{bmatrix}\bm{q}+\delta[\partial\mathcal{H}_{\theta}(\bm{x},\bm{p})/\partial\bm{p}]\\ \bm{p}\\ \bm{x}\\ \bm{y}-\delta[\partial\mathcal{H}_{\theta}(\bm{x},\bm{p})/\partial\bm{x}]\end{bmatrix},~{}\textrm{and}~{}\frac{1}{2}\begin{bmatrix}\begin{pmatrix}\bm{q}+\bm{x}\\ \bm{p}+\bm{y}\end{pmatrix}+\bm{R}^{\delta}\begin{pmatrix}\bm{q}-\bm{x}\\ \bm{p}-\bm{y}\end{pmatrix}\\ \begin{pmatrix}\bm{q}+\bm{x}\\ \bm{p}+\bm{y}\end{pmatrix}-\bm{R}^{\delta}\begin{pmatrix}\bm{q}-\bm{x}\\ \bm{p}-\bm{y}\end{pmatrix}\end{bmatrix},

(15)

respectively. Here

\bm{R}^{\delta}:=\begin{bmatrix}\cos(2\omega\delta)\bm{I}&\sin(2\omega\delta)\bm{I}\\ -\sin(2\omega\delta)\bm{I}&\cos(2\omega\delta)\bm{I}\end{bmatrix},~{}~{}\textrm{where}~{}\bm{I}~{}\textrm{is the identity matrix}.

(16)

We remark that $\bm{x}$ and $\bm{y}$ are just auxiliary variables, which are theoretically equal to $\bm{q}$ and $\bm{p}$ .

We then construct a numerical integrator that approximates the flow of $\overline{\mathcal{H}}$ by composing these maps. It is well known that the composition

(\bm{q}_{i},\bm{p}_{i},\bm{x}_{i},\bm{y}_{i})=\bm{\phi}_{1}^{\textrm{d}t/2}% \circ\bm{\phi}_{2}^{\textrm{d}t/2}\circ\bm{\phi}_{3}^{\textrm{d}t}\circ\bm{% \phi}_{2}^{\textrm{d}t/2}\circ\bm{\phi}_{1}^{\textrm{d}t/2}\circ(\bm{q}_{i-1},% \bm{p}_{i-1},\bm{x}_{i-1},\bm{y}_{i-1})

(17)

commonly known as Strang splitting, has a third-order local error (and is thus a second-order method) and is symmetric.
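The three flows in (15) and the Strang composition (17) can be sketched as follows. This is an illustrative example under stated assumptions, not the implementation from [55]: it treats a single degree of freedom ($N=1$), uses the nonseparable Hamiltonian $\mathcal{H}=(q^{2}+1)(p^{2}+1)/2$ as a hypothetical test case with hand-written derivatives, and picks $\omega=20$ and $\textrm{d}t=10^{-3}$ arbitrarily. The function names are illustrative only.

```python
import math


# Hypothetical nonseparable test Hamiltonian (an assumption of this sketch).
def H(q, p):
    return 0.5 * (q * q + 1.0) * (p * p + 1.0)


def dH_d1(a, b):
    """Derivative of H with respect to its first argument, evaluated at (a, b)."""
    return a * (b * b + 1.0)


def dH_d2(a, b):
    """Derivative of H with respect to its second argument, evaluated at (a, b)."""
    return b * (a * a + 1.0)


def phi1(s, delta):
    """Time-delta flow of H_A = H(q, y): q and y stay fixed."""
    q, p, x, y = s
    return (q, p - delta * dH_d1(q, y), x + delta * dH_d2(q, y), y)


def phi2(s, delta):
    """Time-delta flow of H_B = H(x, p): x and p stay fixed."""
    q, p, x, y = s
    return (q + delta * dH_d2(x, p), p, x, y - delta * dH_d1(x, p))


def phi3(s, delta, omega):
    """Time-delta flow of omega * H_C: rotate the differences by R^delta in (16)."""
    q, p, x, y = s
    c, sn = math.cos(2 * omega * delta), math.sin(2 * omega * delta)
    dq, dp = q - x, p - y
    rq, rp = c * dq + sn * dp, -sn * dq + c * dp
    return (0.5 * (q + x + rq), 0.5 * (p + y + rp),
            0.5 * (q + x - rq), 0.5 * (p + y - rp))


def strang_step(s, dt, omega):
    """One step of (17); the rightmost map in the composition is applied first."""
    s = phi1(s, dt / 2)
    s = phi2(s, dt / 2)
    s = phi3(s, dt, omega)
    s = phi2(s, dt / 2)
    s = phi1(s, dt / 2)
    return s


omega, dt = 20.0, 1e-3
state = (0.5, 0.0, 0.5, 0.0)          # (q0, p0, x0, y0) = (q0, p0, q0, p0)
H0 = H(state[0], state[1])
for _ in range(1000):
    state = strang_step(state, dt, omega)
q, p, x, y = state
```

In this setting the auxiliary pair $(x,y)$ should remain close to $(q,p)$ and $\mathcal{H}(q,p)$ should stay close to its initial value `H0`, which is the behavior the binding term $\omega\mathcal{H}_{C}$ is meant to encourage.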

Next, we introduce two methods for solving partial differential equations (PDEs): the Roe solver and the Lagrangian vortex method.

3.2.5 Roe Solver

In continuum mechanics, a one-dimensional hyperbolic conservation law is a first-order quasilinear hyperbolic PDE

\frac{\partial\bm{u}}{\partial t}+\frac{\partial\bm{F}(\bm{u})}{\partial x}=0,

(18)

with an initial condition

\bm{u}(t=t_{0},x)=\bm{u}_{0}(x),

(19)

and a proper boundary condition. Here the $N_{c}$-component vector $\bm{u}=[u^{(1)},u^{(2)},\cdots,u^{(N_{c})}]^{T}$ is the conserved quantity, $t\in T=[t_{0},t_{1}]$ denotes the time variable, $x$ denotes the spatial coordinate in a computational domain $\Omega$, and $\bm{F}=[F^{(1)},F^{(2)},\cdots,F^{(N_{c})}]^{T}$ is an $N_{c}$-component flux function. Conservation laws of the form (18) are fundamental in continuum mechanics; examples include mass, momentum, and energy conservation in fluid mechanics [56].

Equation (18) can also be expressed in a weak form, which extends the class of admissible solutions to include discontinuous solutions. Specifically, by defining an arbitrary test function $\phi(t,x)$ that is continuously differentiable both in time and space with compact support, and integrating (18) $\times\phi$ in the space-time domain $T\times\Omega$ , the weak form of (18) is derived as

\int_{T\times\Omega}\left(\bm{u}\frac{\partial\phi}{\partial t}+\bm{F}\frac{% \partial\phi}{\partial x}\right)\textrm{d}t\textrm{d}x=0.

(20)

We remark that, with generalized Stokes theorem, all the partial derivatives of $\bm{u}$ and $\bm{F}$ in (18) have been passed on to the test function $\phi$ in (20), which with the former hypothesis is sufficiently smooth to admit these derivatives [57]. In the absence of ambiguity, we refer to the solution of (18) below as a weak solution that satisfies (20).

In addition, (18) can be written in a high dimensional form

\frac{\partial\bm{u}}{\partial t}+\sum_{i=1}^{N_{d}}\frac{\partial\bm{F}_{i}(% \bm{u})}{\partial x_{i}}=\bm{0},

(21)

where $x_{1},x_{2},\cdots,x_{N_{d}}$ denote the $N_{d}$ -dimensional spatial coordinates. Since every dimension in the second term of (21), namely $\partial\bm{F}_{i}(\bm{u})/\partial x_{i}$ , has the same form $\partial\bm{F}(\bm{u})/\partial x$ as the second term of (18), (21) can be easily solved if given the solution of (18). Thus, we will only discuss the numerical method to solve (18).

Philip L. Roe proposed an approximate Riemann solver based on the Godunov scheme [58] that constructs an estimate of the intercell numerical flux of $\bm{F}$ in (18) at the interface of two neighboring computational cells in a discretized space-time computational domain. In particular, the Roe solver discretizes (18) as

\bm{u}_{j}^{n+1}=\bm{u}_{j}^{n}-\lambda_{r}\left(\hat{\bm{F}}_{j+\frac{1}{2}}^% {n}-\hat{\bm{F}}_{j-\frac{1}{2}}^{n}\right),

(22)

where $\lambda_{r}=\Delta t/\Delta x$ is the ratio of the temporal step size $\Delta t$ to the spatial step size $\Delta x$ , $j=1,...,N_{g}$ is the grid node index, and

\hat{\bm{F}}_{j+\frac{1}{2}}^{n}=\hat{\bm{F}}(\bm{u}_{j}^{n},\bm{u}_{j+1}^{n})

(23)

with

\hat{\bm{F}}(\bm{u},\bm{v})=\frac{1}{2}\left[\bm{F}(\bm{u})+\bm{F}(\bm{v})-|% \tilde{\bm{A}}(\bm{u},\bm{v})|(\bm{v}-\bm{u})\right].

(24)

Here, the Roe matrix $\tilde{\bm{A}}$ is assumed constant between two cells and must obey the following three Roe conditions:

1. Matrix $\tilde{\bm{A}}$ is a diagonalizable matrix with real eigenvalues, i.e., matrix $\tilde{\bm{A}}(\bm{u},\bm{v})$ can be diagonalized as

\tilde{\bm{A}}=\bm{L}^{-1}\bm{\Lambda}\bm{L}

(25)

with an invertible matrix $\bm{L}$ and a diagonal matrix $\bm{\Lambda}=\textrm{diag}(\Lambda_{1},\cdots,\Lambda_{N_{c}})$.

2. Matrix $\tilde{\bm{A}}$ is consistent with the exact Jacobian, that is,

\lim_{\bm{u}_{j},\bm{u}_{j+1}\rightarrow\bm{u}}\tilde{\bm{A}}(\bm{u}_{j},\bm{u}_{j+1})=\frac{\partial\bm{F}(\bm{u})}{\partial\bm{u}}.

(26)

3. The physical quantity $\bm{u}$ is conserved across the interface between two computational cells, that is,

\bm{F}_{j+1}-\bm{F}_{j}=\tilde{\bm{A}}(\bm{u}_{j+1}-\bm{u}_{j}).

(27)

We denote the absolute value of $\tilde{\bm{A}}(\bm{u},\bm{v})$ as

|\tilde{\bm{A}}|=\bm{L}^{-1}|\bm{\Lambda}|\bm{L},

(28)

where $|\bm{\Lambda}|=\textrm{diag}(|\Lambda_{1}|,\cdots,|\Lambda_{N_{c}}|)$ is the absolute value of $\bm{\Lambda}$ . Substituting (23), (24) and (28) into (22) along with the third Roe condition (27) yields

\begin{aligned}\bm{u}_{j}^{n+1}={}&\bm{u}_{j}^{n}-\frac{1}{2}\lambda_{r}\Big[(\bm{L}^{n}_{j+\frac{1}{2}})^{-1}(\bm{\Lambda}_{j+\frac{1}{2}}^{n}-|\bm{\Lambda}_{j+\frac{1}{2}}^{n}|)\bm{L}_{j+\frac{1}{2}}^{n}(\bm{u}_{j+1}^{n}-\bm{u}_{j}^{n})\\ &+(\bm{L}^{n}_{j-\frac{1}{2}})^{-1}(\bm{\Lambda}_{j-\frac{1}{2}}^{n}+|\bm{\Lambda}_{j-\frac{1}{2}}^{n}|)\bm{L}_{j-\frac{1}{2}}^{n}(\bm{u}_{j}^{n}-\bm{u}_{j-1}^{n})\Big],\end{aligned}

(29)

with

\bm{L}_{j+\frac{1}{2}}^{n}=\bm{L}(\bm{u}_{j}^{n},\bm{u}_{j+1}^{n}),~{}~{}~{}~{}\bm{\Lambda}_{j+\frac{1}{2}}^{n}=\bm{\Lambda}(\bm{u}_{j}^{n},\bm{u}_{j+1}^{n}).

(30)

Equation (29) serves as a template for the evolution from $\bm{u}_{j}^{n}$ to $\bm{u}_{j}^{n+1}$ in the Roe solver.
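For readers who want the intermediate step behind (29), the substitution can be written out as below. The shorthand $\tilde{\bm{A}}_{j\pm\frac{1}{2}}$ for the Roe matrix evaluated at the corresponding pair of neighboring cells is introduced here only for this sketch. The flux difference $\bm{F}_{j+1}-\bm{F}_{j-1}$ is split as $(\bm{F}_{j+1}-\bm{F}_{j})+(\bm{F}_{j}-\bm{F}_{j-1})$, and the third Roe condition (27) is applied to each part.

```latex
\begin{aligned}
\hat{\bm{F}}_{j+\frac{1}{2}}^{n}-\hat{\bm{F}}_{j-\frac{1}{2}}^{n}
&=\frac{1}{2}\Big[\bm{F}_{j+1}-\bm{F}_{j-1}
  -|\tilde{\bm{A}}_{j+\frac{1}{2}}|(\bm{u}_{j+1}^{n}-\bm{u}_{j}^{n})
  +|\tilde{\bm{A}}_{j-\frac{1}{2}}|(\bm{u}_{j}^{n}-\bm{u}_{j-1}^{n})\Big]\\
&=\frac{1}{2}\Big[\big(\tilde{\bm{A}}_{j+\frac{1}{2}}-|\tilde{\bm{A}}_{j+\frac{1}{2}}|\big)(\bm{u}_{j+1}^{n}-\bm{u}_{j}^{n})
  +\big(\tilde{\bm{A}}_{j-\frac{1}{2}}+|\tilde{\bm{A}}_{j-\frac{1}{2}}|\big)(\bm{u}_{j}^{n}-\bm{u}_{j-1}^{n})\Big].
\end{aligned}
```

Inserting $\tilde{\bm{A}}\pm|\tilde{\bm{A}}|=\bm{L}^{-1}(\bm{\Lambda}\pm|\bm{\Lambda}|)\bm{L}$ from (25) and (28) and multiplying by $\lambda_{r}$ gives (29).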

The key to designing an effective Roe solver is to find a Roe matrix $\tilde{\bm{A}}$ that satisfies the three Roe conditions. To construct the Roe matrix $\tilde{\bm{A}}$ in (25), the Roe solver derives $\bm{L}$ and $\bm{\Lambda}$ analytically from $\bm{F}(\bm{u})$. These are then plugged into (29) to solve for $\bm{u}$ in (18). The Roe solver linearizes Riemann problems, and this linearization captures the problem’s nonlinear jumps while remaining computationally efficient.
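As a concrete instance of the scheme (22) to (24), the sketch below applies it to the scalar inviscid Burgers equation with flux $F(u)=u^{2}/2$. This example is an assumption chosen for illustration and is not one of the systems studied in this work. In the scalar case the Roe matrix reduces to the arithmetic mean $\tilde{A}(u,v)=(u+v)/2$, which satisfies the conservation condition (27) exactly because $F(v)-F(u)=\tfrac{1}{2}(u+v)(v-u)$. Periodic boundaries, the grid size, and the value of $\lambda_{r}$ are likewise arbitrary choices.

```python
import math


def flux(u):
    """Burgers flux F(u) = u^2 / 2 (hypothetical test problem)."""
    return 0.5 * u * u


def roe_flux(u, v):
    """Numerical flux (24) with the scalar Roe average a = (u + v) / 2."""
    a = 0.5 * (u + v)
    return 0.5 * (flux(u) + flux(v) - abs(a) * (v - u))


def roe_step(u, lam):
    """One update of (22) on a periodic grid; lam is dt / dx."""
    n = len(u)
    # f[j] is the interface flux between cell j and cell j + 1.
    f = [roe_flux(u[j], u[(j + 1) % n]) for j in range(n)]
    # f[j - 1] with j = 0 wraps around to the last interface (periodicity).
    return [u[j] - lam * (f[j] - f[j - 1]) for j in range(n)]


n_cells = 100
lam = 0.4                                   # CFL number lam * max|u| = 0.6
u0 = [1.0 + 0.5 * math.sin(2 * math.pi * (j + 0.5) / n_cells)
      for j in range(n_cells)]
u = list(u0)
for _ in range(100):
    u = roe_step(u, lam)
```

Because the update is written in flux-difference form, the discrete total `sum(u)` is preserved up to round-off, and for this strictly positive initial profile the scheme reduces to upwinding, so the solution stays within the initial bounds even after the shock forms.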

3.2.6 Lagrangian Vortex Method (LVM)

Given a fluid velocity field $\bm{u}(\bm{x},t)$ subject to the incompressibility constraint, its underlying dynamics can be described by the Navier–Stokes (NS) equations

\begin{dcases}\frac{D\bm{u}}{Dt}=-\frac{1}{\rho}\bm{\nabla}p+\nu\nabla^{2}\bm{% u}+\bm{f},\\ \bm{\nabla}\cdot\bm{u}=0,\end{dcases}

(31)

where $t$ denotes the time, $D/Dt=\partial/\partial t+\bm{u}\cdot\bm{\nabla}$ is the material derivative, $p$ is the pressure, $\nu$ is the kinematic viscosity, $\rho$ is the density, and $\bm{f}$ denotes the body accelerations (per unit mass) acting on the continuum, such as gravity, inertial accelerations, and electric field acceleration.

An alternative form of the NS equations can be obtained by defining the vorticity field $\bm{\omega}=\bm{\nabla}\times\bm{u}$, which leads to the following vorticity dynamical equation

\begin{dcases}\frac{D\bm{\omega}}{Dt}=(\bm{\omega}\cdot\bm{\nabla})\bm{u}+\nu% \bm{\nabla}^{2}\bm{\omega}+\bm{\nabla}\times\bm{f},\\ \nabla^{2}\bm{\Psi}=-\bm{\omega},~{}~{}\bm{u}=\bm{\nabla}\times\bm{\Psi},\end{dcases}

(32)

where $\bm{\Psi}$ is a vector potential whose curl is the velocity field. Although this form does not seem to bring any simplification, the key insight behind this transformation stems from Helmholtz’s theorems [59], which state that the dynamics of the vorticity field can be described by vortex surfaces/lines, which are Lagrangian surfaces/lines flowing with the velocity field in inviscid flows [60, 61].

The LVM discretizes the vorticity dynamical equation (32) with $N$ particles resulting in a set of ODEs for the particle strengths $\bm{\Gamma}=\{\bm{\Gamma}_{i}|i=1,\cdots,N\}$ and the particle positions $\bm{X}=\{\bm{X}_{i}|i=1,\cdots,N\}$ as

\begin{dcases}\frac{\textrm{d}\bm{\Gamma}_{i}}{\textrm{d}t}=\bm{\gamma}_{i},\\ \frac{\textrm{d}\bm{X}_{i}}{\textrm{d}t}=\bm{u}_{i}+\bm{v}_{i}.\end{dcases}

(33)

Here, the particle strength $\bm{\Gamma}_{i}$ is the integral of $\bm{\omega}$ over the $i^{\textrm{th}}$ computational element, and $\bm{u}_{i}$ is the induced velocity calculated by the Biot–Savart (BS) law

\bm{u}_{i}=\frac{1}{2(n_{d}-1)\pi}\sum_{j\neq i}^{N}\frac{\bm{\Gamma}_{j}% \times(\bm{X}_{i}-\bm{X}_{j})}{|\bm{X}_{i}-\bm{X}_{j}|^{n_{d}}+\mathcal{R}^{n_% {d}}},

(34)

where $n_{d}$ is the dimension of the flow field. In addition, $\bm{\gamma}_{i}$ and $\bm{v}_{i}$ are the rate of change of the particle strength and the drift velocity [62], respectively. To avoid singularities in the BS law, we introduce the numerical regularization parameter $\mathcal{R}$ in the LVM and set $\mathcal{R}=0.1$. The effect of the regularization parameter on the simulated flow evolution is rather small because of the large spacing between the vortex particles.
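The regularized sum (34) is straightforward to evaluate directly. The sketch below does so for the two-dimensional case $n_{d}=2$, where each strength $\bm{\Gamma}_{j}$ has only a component normal to the plane and the cross product reduces to $\Gamma_{j}(-r_{y},r_{x})$. The function name `induced_velocity` and the two-vortex configuration are assumptions made for illustration.

```python
import math


def induced_velocity(X, Gamma, R=0.1):
    """Velocity induced at each particle by all others, following (34) with n_d = 2.

    X is a list of (x, y) positions; Gamma[j] is the out-of-plane strength.
    """
    N = len(X)
    U = []
    for i in range(N):
        ux = uy = 0.0
        for j in range(N):
            if j == i:
                continue
            rx, ry = X[i][0] - X[j][0], X[i][1] - X[j][1]
            denom = rx * rx + ry * ry + R * R     # |r|^2 + R^2
            ux += -Gamma[j] * ry / denom          # (Gamma z) x r = Gamma (-ry, rx)
            uy += Gamma[j] * rx / denom
        U.append((ux / (2 * math.pi), uy / (2 * math.pi)))
    return U


# Hypothetical configuration: two equal vortices two units apart.
X = [(-1.0, 0.0), (1.0, 0.0)]
Gamma = [1.0, 1.0]
U = induced_velocity(X, Gamma)
expected_speed = 2.0 / ((4.0 + 0.1 ** 2) * 2 * math.pi)
```

For this pair the two induced velocities are equal and opposite and perpendicular to the line joining the particles, so the pair rotates about its midpoint. With spacing 2 and $\mathcal{R}=0.1$ the regularization changes the speed by well under one percent, consistent with the remark above about widely spaced particles.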

In a two-dimensional ideal fluid flow, i.e., a strictly inviscid barotropic flow with conservative body forces, the movements of Lagrangian particles with conserved vorticity strength are determined by the velocity field they create, thus allowing us to advance the simulation temporally [63]. However, in a real three-dimensional flow, under the action of vortex stretching, vortex distortion, viscous dissipation, external forces, etc., the Lagrangian advection of vortex particles and their strengths need to be corrected by $\bm{\gamma}_{i}$ and $\bm{v}_{i}$ in (33).

We remark that the NS equations can be accurately modeled by the LVM with a large number of computational elements and a reasonable discrete distribution. However, the implementation of the LVM faces a major challenge: modeling the right-hand sides (r.h.s.) of the set of ordinary differential equations based on the NS equations. First, the assumption that the vortices are point-like largely limits the use of the continuous BS law. Second, the drift velocity due to the external force cannot be obtained using the LVM without knowing the function of the external force. Even given the function, the LVM still fails to capture the drift velocity accurately in most cases [62]. Finally, when two particles are close enough, the singularity of the discrete BS law leads to a significant numerical error. These problems make the LVM inaccurate or inapplicable for solving the underlying fluid dynamics in many situations [63].

4 Implementation

4.1 Symplectic Taylor Neural Networks (Taylor-nets)

4.1.1 Symplectomorphism in Hamiltonian Mechanics

Consider a separable Hamiltonian system described by (6), (7), and (8). Substituting (8) into (6) yields

\begin{dcases}\frac{\textrm{d}\bm{q}}{\textrm{d}t}=\frac{\partial T(\bm{p})}{% \partial\bm{p}},\\ \frac{\textrm{d}\bm{p}}{\textrm{d}t}=-\frac{\partial V(\bm{q})}{\partial\bm{q}% }.\end{dcases}

(35)

This set of equations is fundamental in designing our neural networks. Our model will learn the r.h.s. of (35) under the framework of ODE-net.

One of the important features of the time evolution of Hamilton’s equations is symplectomorphism, which represents a transformation of phase space that is volume-preserving. In the setting of canonical coordinates, symplectomorphism means the transformation of the phase flow of a Hamiltonian system conserves the symplectic two-form

\textrm{d}\bm{p}\wedge\textrm{d}\bm{q}\equiv\sum_{j=1}^{N}\left(\textrm{d}p_{j% }\wedge\textrm{d}q_{j}\right),

(36)

where $\wedge$ denotes the wedge product of two differential forms. Inspired by the symplectomorphism feature, we aim to construct a neural network architecture that intrinsically preserves Hamiltonian structure.

4.1.2 A symmetric network in Taylor expansion form

To learn the gradients of the Hamiltonian with respect to the generalized coordinates, we propose a set of symmetric networks as the underpinning mechanism:

\begin{dcases}\bm{T}_{p}(\bm{p},\bm{\theta}_{p})\rightarrow\frac{\partial T(% \bm{p})}{\partial\bm{p}},\\ \bm{V}_{q}(\bm{q},\bm{\theta}_{q})\rightarrow\frac{\partial V(\bm{q})}{% \partial\bm{q}},\end{dcases}

(37)

with parameters $(\bm{\theta}_{p},\bm{\theta}_{q})$ that are designed to learn the two r.h.s. of (35), respectively. Here, the “ $\rightarrow$ ” represents our attempt to use the left-hand side (l.h.s.) to learn the r.h.s. Substituting (37) into (35) yields

\begin{dcases}\frac{\textrm{d}\bm{q}}{\textrm{d}t}=\bm{T}_{p}(\bm{p},\bm{% \theta}_{p}),\\ \frac{\textrm{d}\bm{p}}{\textrm{d}t}=-\bm{V}_{q}(\bm{q},\bm{\theta}_{q}).\end{dcases}

(38)

Therefore, under the initial condition (7), the trajectories of the canonical coordinates can be integrated as

\begin{dcases}\bm{q}(t)=\bm{q}_{0}+\int_{t_{0}}^{t}\bm{T}_{p}(\bm{p},\bm{% \theta}_{p})\textrm{d}t,\\ \bm{p}(t)=\bm{p}_{0}-\int_{t_{0}}^{t}\bm{V}_{q}(\bm{q},\bm{\theta}_{q})\textrm% {d}t.\end{dcases}

(39)

From (37), we obtain

\begin{dcases}\frac{\partial\bm{T}_{p}(\bm{p},\bm{\theta}_{p})}{\partial\bm{p}% }\rightarrow\frac{\partial^{2}T(\bm{p})}{\partial\bm{p}^{2}},\\ \frac{\partial\bm{V}_{q}(\bm{q},\bm{\theta}_{q})}{\partial\bm{q}}\rightarrow% \frac{\partial^{2}V(\bm{q})}{\partial\bm{q}^{2}}.\end{dcases}

(40)

The r.h.s. of (40) are the Hessian matrices of $T$ and $V$, respectively, which are symmetric. We can therefore design $\bm{T}_{p}(\bm{p},\bm{\theta}_{p})$ and $\bm{V}_{q}(\bm{q},\bm{\theta}_{q})$ as symmetric mappings, that is,

\frac{\partial\bm{T}_{p}(\bm{p},\bm{\theta}_{p})}{\partial\bm{p}}=\left[\frac{% \partial\bm{T}_{p}(\bm{p},\bm{\theta}_{p})}{\partial\bm{p}}\right]^{T},

(41)

and

\frac{\partial\bm{V}_{q}(\bm{q},\bm{\theta}_{q})}{\partial\bm{q}}=\left[\frac{% \partial\bm{V}_{q}(\bm{q},\bm{\theta}_{q})}{\partial\bm{q}}\right]^{T}.

(42)

Because traditional deep neural networks stack multiple nonlinear layers, their Jacobians are in general not symmetric, so they cannot be guaranteed to fulfill (41) and (42). We therefore restrict ourselves to a three-layer network of the form linear-activation-linear, where the weights of the two linear layers are the transpose of each other. To maintain the expressive power of the network, we construct symmetric nonlinear terms in the same form as the terms of a Taylor polynomial and combine them linearly. Specifically, we construct a symmetric network $\bm{T}_{p}(\bm{p},\bm{\theta}_{p})$ as

\bm{T}_{p}(\bm{p},\bm{\theta}_{p})=\left(\sum_{i=1}^{M}\bm{A}_{i}^{T}\circ f_{% i}\circ\bm{A}_{i}-\bm{B}_{i}^{T}\circ f_{i}\circ\bm{B}_{i}\right)\circ\bm{p}+% \bm{b},

(43)

where ‘ $\circ$ ’ denotes function composition, $\bm{A}_{i}$ and $\bm{B}_{i}$ are fully connected layers of size $N_{h}\times N$, $\bm{b}$ is an $N$-dimensional bias, $M$ is the number of terms in the Taylor series expansion, and $f_{i}$ is an element-wise function representing the $i^{\textrm{th}}$-order term in the Taylor polynomial

f_{i}(x)=\frac{1}{i!}x^{i}.

(44)

Figure 1 plots a schematic diagram of $\bm{T}_{p}(\bm{p},\bm{\theta}_{p})$ in Taylor-net. The input of $\bm{T}_{p}(\bm{p},\bm{\theta}_{p})$ is $\bm{p}$, and $\bm{\theta}_{p}=(\bm{A}_{i},\bm{B}_{i},\bm{b})$. We construct a negative term $\bm{B}_{i}^{T}\circ f_{i}\circ\bm{B}_{i}$ following a positive term $\bm{A}_{i}^{T}\circ f_{i}\circ\bm{A}_{i}$, since any symmetric matrix can be written as the difference of two positive semidefinite matrices.

Figure 1: The schematic diagram of $\bm{T}_{p}(\bm{p},\bm{\theta}_{p})$ in Taylor-net. Source: [1].

To prove that (43) is symmetric, that is, that it fulfills (41), we introduce Theorem 4.1.

Theorem 4.1.

The network (43) satisfies (41).

Proof.

From (43), we have

\frac{\partial\bm{T}_{p}(\bm{p},\bm{\theta}_{p})}{\partial\bm{p}}=\sum_{i=1}^{% M}\bm{A}_{i}^{T}\bm{\Lambda}_{i}^{A}\bm{A}_{i}-\bm{B}_{i}^{T}\bm{\Lambda}_{i}^% {B}\bm{B}_{i},

(45)

with

\bm{\Lambda}_{i}^{A}=\textrm{diag}\left(\frac{\textrm{d}f_{i}}{\textrm{d}x}\Bigg|_{x=\bm{A}_{i}\circ\bm{p}}\right),

(46)

and

\bm{\Lambda}_{i}^{B}=\textrm{diag}\left(\frac{\textrm{d}f_{i}}{\textrm{d}x}\Bigg|_{x=\bm{B}_{i}\circ\bm{p}}\right).

(47)

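Each term $\bm{A}_{i}^{T}\bm{\Lambda}_{i}^{A}\bm{A}_{i}$ and $\bm{B}_{i}^{T}\bm{\Lambda}_{i}^{B}\bm{B}_{i}$ has a diagonal middle factor and therefore equals its own transpose, so (45) is a symmetric matrix that satisfies (41). ∎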

In fact, $\bm{T}_{p}(\bm{p},\bm{\theta}_{p})$ in (41) and $\bm{V}_{q}(\bm{q},\bm{\theta}_{q})$ in (42) satisfy the same property, so we construct $\bm{V}_{q}$ in a similar form as

\bm{V}_{q}(\bm{q},\bm{\theta}_{q})=\left(\sum_{i=1}^{M}\bm{C}_{i}^{T}\circ f_{% i}\circ\bm{C}_{i}-\bm{D}_{i}^{T}\circ f_{i}\circ\bm{D}_{i}\right)\circ\bm{q}+% \bm{d}.

(48)

Here, $\bm{C}_{i}$, $\bm{D}_{i}$, and $\bm{d}$ have the same structure as $\bm{A}_{i}$, $\bm{B}_{i}$, and $\bm{b}$ in (43), and $(\bm{C}_{i},\bm{D}_{i},\bm{d})=\bm{\theta}_{q}$.
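The structure of (43) and the symmetry property (41) can be checked numerically with a small sketch. The code below is an illustration only, written with plain Python lists rather than a deep learning framework. The sizes $N=3$, $N_{h}=4$, $M=3$, the random weights, and the finite-difference Jacobian are all assumptions of the example, and the helper names are hypothetical.

```python
import math
import random


def matvec(A, v):
    """A v for an N_h x N matrix A stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def matTvec(A, h):
    """A^T h for an N_h x N matrix A."""
    return [sum(A[k][n] * h[k] for k in range(len(A))) for n in range(len(A[0]))]


def taylor_term(x, i):
    """Element-wise f_i(x) = x^i / i! from (44)."""
    return x ** i / math.factorial(i)


def taylor_net(p, As, Bs, b):
    """Evaluate (43): sum_i [A_i^T f_i(A_i p) - B_i^T f_i(B_i p)] + b."""
    out = list(b)
    for i, (A, B) in enumerate(zip(As, Bs), start=1):
        pos = matTvec(A, [taylor_term(h, i) for h in matvec(A, p)])
        neg = matTvec(B, [taylor_term(h, i) for h in matvec(B, p)])
        out = [o + u - v for o, u, v in zip(out, pos, neg)]
    return out


def jacobian(fun, p, eps=1e-6):
    """Central finite-difference Jacobian, J[l][m] = d fun_l / d p_m."""
    N = len(p)
    cols = []
    for m in range(N):
        hi, lo = list(p), list(p)
        hi[m] += eps
        lo[m] -= eps
        cols.append([(a - c) / (2 * eps) for a, c in zip(fun(hi), fun(lo))])
    return [[cols[m][l] for m in range(N)] for l in range(N)]


rng = random.Random(0)
N, N_h, M = 3, 4, 3


def rand_mat():
    return [[rng.uniform(-1, 1) for _ in range(N)] for _ in range(N_h)]


As = [rand_mat() for _ in range(M)]
Bs = [rand_mat() for _ in range(M)]
b = [rng.uniform(-1, 1) for _ in range(N)]
p = [rng.uniform(-1, 1) for _ in range(N)]

J = jacobian(lambda v: taylor_net(v, As, Bs, b), p)
asym = max(abs(J[l][m] - J[m][l]) for l in range(N) for m in range(N))
```

The quantity `asym` measures how far the numerical Jacobian is from its transpose. For weights tied as in (43) it should vanish up to finite-difference error, whatever the random weights are.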

4.1.3 Symplectic Taylor neural networks

Next, we substitute the constructed networks (43) and (48) into (39) to learn the Hamiltonian system (35). We employ ODE-net [50], introduced in Section 3.1.5, as our computational infrastructure. Inspired by the idea of ODE-net, we design neural networks that can learn continuous time evolution. In the Hamiltonian system (35), where the coordinates are integrated as (39), we can implement a time integrator to solve for $\bm{p}$ and $\bm{q}$. ODE-net uses the fourth-order Runge–Kutta method, which is not symplectic; to make the neural networks structure-preserving, we need an integrator that is symplectic. Therefore, we introduce Taylor-net, which combines the symmetric Taylor series expansion with the fourth-order symplectic integrator to construct symplectic neural networks that learn the gradients of the Hamiltonian with respect to the generalized coordinates and, ultimately, the temporal integral of a Hamiltonian system.

Algorithm 1 Integrate (39) by using the fourth-order symplectic integrator. Source: [1].

Input: $\bm{q}_{0},\bm{p}_{0},t_{0},t,\Delta t$; $\bm{F}_{t}^{j}$ in (49) and $\bm{F}_{k}^{j}$ in (50) with $j=1,2,3,4$;
Output: $\bm{q}(t),\bm{p}(t)$
$n=\textrm{floor}[(t-t_{0})/\Delta t]$;
for $i=1,\cdots,n$
    $(\bm{k}_{p}^{0},\bm{k}_{q}^{0})=(\bm{p}_{i-1},\bm{q}_{i-1})$;
    for $j=1,\cdots,4$
        $(\bm{t}_{p}^{j-1},\bm{t}_{q}^{j-1})=\bm{F}_{t}^{j}(\bm{k}_{p}^{j-1},\bm{k}_{q}^{j-1},\Delta t)$;
        $(\bm{k}_{p}^{j},\bm{k}_{q}^{j})=\bm{F}_{k}^{j}(\bm{t}_{p}^{j-1},\bm{t}_{q}^{j-1},\Delta t)$;
    end
    $(\bm{p}_{i},\bm{q}_{i})=(\bm{k}_{p}^{4},\bm{k}_{q}^{4})$;
end
$\bm{q}(t)=\bm{q}_{n},\bm{p}(t)=\bm{p}_{n}$

For the constructed networks (43) and (48), we integrate (39) by using the fourth-order symplectic integrator introduced in Section 3.2.3. Specifically, we have an input layer $(\bm{q}_{0},\bm{p}_{0})$ at $t=t_{0}$ and an output layer $(\bm{q}_{n},\bm{p}_{n})$ at $t=t_{0}+n\textrm{d}t$. The recursive relations of $(\bm{q}_{i},\bm{p}_{i}),i=1,2,\cdots,n$, can be expressed by Algorithm 1. The input functions in Algorithm 1 are

\bm{F}_{t}^{j}(\bm{p},\bm{q},\textrm{d}t)=\left(\bm{p},\bm{q}+c_{j}\bm{T}_{p}(% \bm{p},\bm{\theta}_{p})\textrm{d}t\right),

(49)

and

\bm{F}_{k}^{j}(\bm{p},\bm{q},\textrm{d}t)=\left(\bm{p}-d_{j}\bm{V}_{q}(\bm{q},% \bm{\theta}_{q})\textrm{d}t,\bm{q}\right),

(50)

with coefficients (11).

Relationships (49) and (50) are obtained by replacing $\partial T(\bm{p})/\partial\bm{p}$ and $\partial V(\bm{q})/\partial\bm{q}$ in the fourth-order symplectic integrator with the deliberately designed neural networks $\bm{T}_{p}(\bm{p},\bm{\theta}_{p})$ and $\bm{V}_{q}(\bm{q},\bm{\theta}_{q})$, respectively. Figure 2 plots a schematic diagram of Taylor-net, which is described by Algorithm 1. The input of Taylor-net is $(\bm{q}_{0},\bm{p}_{0})$, and the output is $(\bm{q}_{n},\bm{p}_{n})$. Taylor-net consists of $n$ iterations of the fourth-order symplectic integrator. The input of the integrator is $(\bm{q}_{i-1},\bm{p}_{i-1})$, and the output is $(\bm{q}_{i},\bm{p}_{i})$. Within the integrator, the output of $\bm{T}_{p}$ is used to calculate $\bm{q}$, while the output of $\bm{V}_{q}$ is used to calculate $\bm{p}$, which is signified by the shoelace-like pattern in the diagram. The intermediate variables $\bm{t}_{p}^{0}\cdots\bm{t}_{p}^{4}$ and $\bm{k}_{q}^{0}\cdots\bm{k}_{q}^{4}$ correspond to the four substeps of the fourth-order scheme.

By constructing the network $\bm{T}_{p}(\bm{p},\bm{\theta}_{p})$ in (43) that satisfies (41), we show that Theorem 4.2 holds, so the network (49) preserves the symplectic structure of the system.

Theorem 4.2.

For a given $\textrm{d}t$, the mapping $\bm{F}_{t}^{j}(:,:,\textrm{d}t):\mathbb{R}^{2N}\rightarrow\mathbb{R}^{2N}$ in (49) is a symplectomorphism if and only if the Jacobian of $\bm{T}_{p}$ is a symmetric matrix, that is, it satisfies (41).

Proof.

Let

(\bm{t}_{p},\bm{t}_{q})=\bm{F}_{t}^{j}(\bm{k}_{p},\bm{k}_{q},\textrm{d}t).

(51)

From (49), we have

\begin{aligned}&\textrm{d}\bm{t}_{p}\wedge\textrm{d}\bm{t}_{q}=\textrm{d}\bm{k}_{p}\wedge\textrm{d}\bm{k}_{q}+\\ &\frac{1}{2}\sum_{l,m=1}^{N}c_{j}\textrm{d}t\left[\frac{\partial\bm{T}_{p}(\bm{k}_{p},\bm{\theta}_{p})}{\partial\bm{k}_{p}}\Bigg|_{l,m}-\frac{\partial\bm{T}_{p}(\bm{k}_{p},\bm{\theta}_{p})}{\partial\bm{k}_{p}}\Bigg|_{m,l}\right]\textrm{d}\bm{k}_{p}|_{l}\wedge\textrm{d}\bm{k}_{p}|_{m}.\end{aligned}

(52)

Here $\bm{A}|_{l,m}$ refers to the entry in the $l$-th row and $m$-th column of a matrix $\bm{A}$, and $\bm{x}|_{l}$ refers to the $l$-th component of a vector $\bm{x}$. From (52), we know that $\textrm{d}\bm{t}_{p}\wedge\textrm{d}\bm{t}_{q}=\textrm{d}\bm{k}_{p}\wedge\textrm{d}\bm{k}_{q}$ is equivalent to

\frac{\partial\bm{T}_{p}(\bm{k}_{p},\bm{\theta}_{p})}{\partial\bm{k}_{p}}\Bigg% {|}_{l,m}-\frac{\partial\bm{T}_{p}(\bm{k}_{p},\bm{\theta}_{p})}{\partial\bm{k}% _{p}}\Bigg{|}_{m,l}=0,\quad\forall l,m=1,2,\cdots,N,

(53)

which is (41). ∎

Similar to Theorem 4.2, we can find the relationship between $\bm{F}_{k}^{j}$ and the Jacobian of $\bm{V}_{q}$. The proof of Theorem 4.3 is omitted as it is similar to the proof of Theorem 4.2.

Theorem 4.3.

For a given $\textrm{d}t$, the mapping $\bm{F}_{k}^{j}(:,:,\textrm{d}t):\mathbb{R}^{2N}\rightarrow\mathbb{R}^{2N}$ in (50) is a symplectomorphism if and only if the Jacobian of $\bm{V}_{q}$ is a symmetric matrix, that is, it satisfies (42).

Suppose that $\Phi_{1}$ and $\Phi_{2}$ are two symplectomorphisms. Then, by the chain rule, their composite map $\Phi_{2}\circ\Phi_{1}$ is also a symplectomorphism. Thus, the symplectomorphism of Algorithm 1 is guaranteed by Theorems 4.2 and 4.3.
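The "if and only if" in Theorem 4.2 can also be seen numerically. A map with Jacobian $\bm{M}$ is symplectic exactly when $\bm{M}^{T}\bm{\Omega}\bm{M}=\bm{\Omega}$, where $\bm{\Omega}$ is the canonical block matrix with blocks $\bm{0}$, $\bm{I}$, $-\bm{I}$, $\bm{0}$. The sketch below evaluates this defect for the substep (49), assuming for simplicity a linear stand-in $\bm{T}_{p}(\bm{p})=\bm{W}\bm{p}$ so that the Jacobian of $\bm{T}_{p}$ is the constant matrix $\bm{W}$. The matrices, the value of $c_{j}\textrm{d}t$, and the function names are hypothetical choices for this example.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(r) for r in zip(*A)]


def symplectic_defect(W, c_dt):
    """max |M^T Omega M - Omega| for the map (p, q) -> (p, q + c_dt * W p)."""
    N = len(W)
    I = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
    Z = [[0.0] * N for _ in range(N)]
    S = [[c_dt * w for w in row] for row in W]
    # Jacobian in (p, q) ordering: [[I, 0], [S, I]].
    Mjac = [I[i] + Z[i] for i in range(N)] + [S[i] + I[i] for i in range(N)]
    negI = [[-v for v in row] for row in I]
    # Canonical symplectic matrix: [[0, I], [-I, 0]].
    Omega = [Z[i] + I[i] for i in range(N)] + [negI[i] + Z[i] for i in range(N)]
    Dm = matmul(transpose(Mjac), matmul(Omega, Mjac))
    return max(abs(Dm[i][j] - Omega[i][j])
               for i in range(2 * N) for j in range(2 * N))


W_sym = [[2.0, 0.5], [0.5, -1.0]]     # symmetric Jacobian
W_asym = [[2.0, 0.5], [0.1, -1.0]]    # non-symmetric Jacobian
defect_sym = symplectic_defect(W_sym, 0.1)
defect_asym = symplectic_defect(W_asym, 0.1)
```

Multiplying the blocks out shows that the defect equals the largest entry of $c_{j}\textrm{d}t\,(\bm{W}-\bm{W}^{T})$ in magnitude, so it vanishes for the symmetric matrix and is nonzero for the non-symmetric one.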

4.2 Nonseparable Symplectic Neural Networks (NSSNNs)

Our model aims to learn the dynamical evolution of $(\bm{q},\bm{p})$ in (6) by embedding (14) into the framework of NeuralODE [50]. We learn the nonseparable Hamiltonian dynamics (6) by constructing an augmented system (14), from which we can obtain the energy function $\mathcal{H}(\bm{q},\bm{p})$ by training the neural network $\mathcal{H}_{\theta}(\bm{q},\bm{p})$ with parameter $\bm{\theta}$ and calculate the gradient $\bm{\nabla}\mathcal{H}_{\theta}(\bm{q},\bm{p})$ by taking the in-graph gradient. For the constructed network $\mathcal{H}_{\theta}(\bm{q},\bm{p})$ , we integrate (14) by using the second-order symplectic integrator [55]. Specifically, we will have an input layer $(\bm{q},\bm{p},\bm{x},\bm{y})=(\bm{q}_{0},\bm{p}_{0},\bm{q}_{0},\bm{p}_{0})$ at $t=t_{0}$ and an output layer $(\bm{q},\bm{p},\bm{x},\bm{y})=(\bm{q}_{n},\bm{p}_{n},\bm{x}_{n},\bm{y}_{n})$ at $t=t_{0}+n\textrm{d}t$ .

Algorithm 2 Integrate (14) by using the second-order symplectic integrator. Source: [2].

Input: $\bm{q}_{0},\bm{p}_{0},t_{0},t,\textrm{d}t$; $\bm{\phi}_{1}^{\delta}$, $\bm{\phi}_{2}^{\delta}$, and $\bm{\phi}_{3}^{\delta}$ in (15);
Output: $(\hat{q},\hat{p},\hat{x},\hat{y})=(\bm{q}_{n},\bm{p}_{n},\bm{x}_{n},\bm{y}_{n})$
$(\bm{q}_{0},\bm{p}_{0},\bm{x}_{0},\bm{y}_{0})=(\bm{q}_{0},\bm{p}_{0},\bm{q}_{0},\bm{p}_{0})$;
$n=\textrm{floor}[(t-t_{0})/\textrm{d}t]$;
for $i=1\to n$
    $(\bm{q}_{i},\bm{p}_{i},\bm{x}_{i},\bm{y}_{i})=\bm{\phi}_{1}^{\textrm{d}t/2}\circ\bm{\phi}_{2}^{\textrm{d}t/2}\circ\bm{\phi}_{3}^{\textrm{d}t}\circ\bm{\phi}_{2}^{\textrm{d}t/2}\circ\bm{\phi}_{1}^{\textrm{d}t/2}\circ(\bm{q}_{i-1},\bm{p}_{i-1},\bm{x}_{i-1},\bm{y}_{i-1})$;
end

The recursive relations of $(\bm{q}_{i},\bm{p}_{i},\bm{x}_{i},\bm{y}_{i}),i=1,2,\cdots,n$, are expressed by Algorithm 2. Figure 3(a) shows that NSSNN training is composed of a forward pass through a differentiable symplectic integrator as well as a backpropagation step through the model. Figure 3(b) plots the schematic diagram of NSSNN. Moreover, given (15), since $\bm{x}$ and $\bm{y}$ are theoretically equal to $\bm{q}$ and $\bm{p}$, we can use the data set of $(\bm{q},\bm{p})$ to construct the data set containing variables $(\bm{q},\bm{p},\bm{x},\bm{y})$.

In addition, by constructing the network $\mathcal{H}_{\theta}$, we show that Theorem 4.4 holds, so the maps $\bm{\phi}_{1}^{\delta},\bm{\phi}_{2}^{\delta}$, and $\bm{\phi}_{3}^{\delta}$ in (15) preserve the symplectic structure of the system. Since the composition of two symplectomorphisms is again a symplectomorphism by the chain rule, the symplectomorphism of Algorithm 2 is guaranteed by Theorem 4.4.

Theorem 4.4.

For a given $\delta$, the mappings $\bm{\phi}_{1}^{\delta}$, $\bm{\phi}_{2}^{\delta}$, and $\bm{\phi}_{3}^{\delta}$ in (15) are symplectomorphisms.

Proof.

Let

(\bm{t}_{j}^{q},\bm{t}_{j}^{p},\bm{t}_{j}^{x},\bm{t}_{j}^{y})=\bm{\phi}_{j}^{% \delta}(\bm{q},\bm{p},\bm{x},\bm{y}),~{}~{}j=1,2,3.

(54)

From the first equation of (15), we have

\begin{aligned}&\textrm{d}\bm{t}_{1}^{q}\wedge\textrm{d}\bm{t}_{1}^{p}+\textrm{d}\bm{t}_{1}^{x}\wedge\textrm{d}\bm{t}_{1}^{y}\\ ={}&\textrm{d}\bm{q}\wedge\textrm{d}\left[\bm{p}-\delta\frac{\partial\mathcal{H}_{\theta}(\bm{q},\bm{y})}{\partial\bm{q}}\right]+\textrm{d}\left[\bm{x}+\delta\frac{\partial\mathcal{H}_{\theta}(\bm{q},\bm{y})}{\partial\bm{y}}\right]\wedge\textrm{d}\bm{y}\\ ={}&\textrm{d}\bm{q}\wedge\textrm{d}\bm{p}+\textrm{d}\bm{x}\wedge\textrm{d}\bm{y}+\delta\left[\frac{\partial^{2}\mathcal{H}_{\theta}(\bm{q},\bm{y})}{\partial\bm{q}\partial\bm{y}}-\frac{\partial^{2}\mathcal{H}_{\theta}(\bm{q},\bm{y})}{\partial\bm{y}\partial\bm{q}}\right]\textrm{d}\bm{q}\wedge\textrm{d}\bm{y}\\ ={}&\textrm{d}\bm{q}\wedge\textrm{d}\bm{p}+\textrm{d}\bm{x}\wedge\textrm{d}\bm{y}.\end{aligned}

(55)

Similarly, we can prove that $\textrm{d}\bm{t}_{2}^{q}\wedge\textrm{d}\bm{t}_{2}^{p}+\textrm{d}\bm{t}_{2}^{x}\wedge\textrm{d}\bm{t}_{2}^{y}=\textrm{d}\bm{q}\wedge\textrm{d}\bm{p}+\textrm{d}\bm{x}\wedge\textrm{d}\bm{y}$ . In addition, from the third equation of (15), we can directly deduce that $\textrm{d}\bm{t}_{3}^{q}\wedge\textrm{d}\bm{t}_{3}^{p}+\textrm{d}\bm{t}_{3}^{x}\wedge\textrm{d}\bm{t}_{3}^{y}=\textrm{d}\bm{q}\wedge\textrm{d}\bm{p}+\textrm{d}\bm{x}\wedge\textrm{d}\bm{y}$ . ∎
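The computation in the proof can also be checked numerically. The following is a minimal sketch for one degree of freedom: it evaluates the Jacobian $J$ of $\bm{\phi}_{1}^{\delta}$ by central differences and measures how far $J^{T}\Omega J$ is from $\Omega$ , where $\Omega$ is the matrix of $\textrm{d}q\wedge\textrm{d}p+\textrm{d}x\wedge\textrm{d}y$ . The form of the map is read off from the first line of the derivation above. The analytic Hamiltonian $0.5(q^{2}+1)(y^{2}+1)$ stands in for the trained network, and all function names are illustrative assumptions rather than part of our implementation.

```python
import numpy as np

def grad_H(q, y):
    """Partial derivatives of the stand-in H(q, y) = 0.5 (q^2 + 1)(y^2 + 1)."""
    return q * (y**2 + 1.0), y * (q**2 + 1.0)

def phi1(z, delta):
    """(q, p, x, y) -> (q, p - delta dH/dq, x + delta dH/dy, y), with H evaluated at (q, y)."""
    q, p, x, y = z
    dHdq, dHdy = grad_H(q, y)
    return np.array([q, p - delta * dHdq, x + delta * dHdy, y])

def jacobian(f, z, h=1e-5):
    """Central-difference Jacobian of f at z."""
    n = z.size
    J = np.zeros((n, n))
    for k in range(n):
        e = np.zeros(n)
        e[k] = h
        J[:, k] = (f(z + e) - f(z - e)) / (2.0 * h)
    return J

# Matrix of dq ^ dp + dx ^ dy in the ordering (q, p, x, y).
OMEGA = np.array([[0.0, 1.0, 0.0, 0.0],
                  [-1.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0],
                  [0.0, 0.0, -1.0, 0.0]])

z0 = np.array([0.3, -0.7, 0.3, -0.7])
J = jacobian(lambda z: phi1(z, delta=0.1), z0)
residual = np.max(np.abs(J.T @ OMEGA @ J - OMEGA))  # small if phi1 is symplectic
```

The residual is limited only by the finite-difference error, which agrees with the cancellation of the mixed second derivatives in (55).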

We show a motivational example in Figure 4 by comparing our approach with a traditional HNN method [15] in terms of structural design and predictive ability. We refer the readers to Section 5.2.3 for a detailed discussion. As shown in Figure 4, the vortices evolved using NSSNN remain well separated, as in the ground truth, while the vortices evolved using HNN merge because HNN fails to conserve the symplectic structure of a nonseparable system. The conservative capability of NSSNN springs from our design of the auxiliary variables (red $x$ and $y$ ), which convert the original nonseparable system into a higher-dimensional quasi-separable system where we can adopt a symplectic integrator.

4.3 Roe Neural Networks (RoeNet)

We introduce our design of the Roe template with pseudoinverse embedding, which accommodates the data processing and training over the entire learning pipeline. In particular, we present our basic ideas in Section 4.3.1 and a detailed description of our network architecture in Section 4.3.2.

4.3.1 Roe template with Pseudoinverse Embedding

Recall the one-dimensional hyperbolic conservation law described in (18). Without a given $\bm{F}$ , we learn the weak solution of (18) using a neural network that incorporates the framework of a Roe solver. For time integration of $\bm{u}$ in (29), we need to construct the matrix functions $\bm{L}$ and $\bm{\Lambda}$ . Using neural networks to approximate $\bm{L}$ and $\bm{\Lambda}$ in (30) directly is ineffective, because the number of learnable parameters is then limited by the number of components $N_{c}$ of $\bm{u}$ , and learning in such a tiny parameter space is impractical. To enhance the expressiveness of our model, we use neural networks $\bm{L}_{\theta}$ and $\bm{\Lambda}_{\phi}$ to replace $\bm{L}$ and $\bm{\Lambda}$ in (30), respectively. Similar to (30), the inputs to $\bm{L}_{\theta}$ and $\bm{\Lambda}_{\phi}$ remain $(\bm{u}_{j}^{n},\bm{u}_{j+1}^{n})$ . However, the outputs of $\bm{L}_{\theta}$ and $\bm{\Lambda}_{\phi}$ are now an $N_{h}\times N_{c}$ matrix and an $N_{h}\times N_{h}$ diagonal matrix, respectively, where the positive integer $N_{h}$ is a hidden dimension. Furthermore, we introduce the concept of pseudoinverses by replacing $\bm{L}^{-1}$ with

\bm{L}_{\theta}^{+}=(\bm{L}_{\theta}^{T}\bm{L}_{\theta})^{-1}\bm{L}_{\theta}^{T}.

(56)

Here, the transpose and inverse operations are applied to the output matrix, that is

\bm{L}_{\theta}^{+}(\bm{u}_{j}^{n},\bm{u}_{j+1}^{n})=[\bm{L}_{\theta}(\bm{u}_{j}^{n},\bm{u}_{j+1}^{n})^{T}\bm{L}_{\theta}(\bm{u}_{j}^{n},\bm{u}_{j+1}^{n})]^{-1}\bm{L}_{\theta}(\bm{u}_{j}^{n},\bm{u}_{j+1}^{n})^{T}.

(57)
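As a small illustration of (56), the sketch below forms the pseudoinverse of a tall matrix and checks that it acts as a left inverse. A random $N_{h}\times N_{c}$ matrix stands in for one output of $\bm{L}_{\theta}$ ; the sizes and the function name are assumptions made for the example only.

```python
import numpy as np

def left_pseudoinverse(L):
    """Return (L^T L)^{-1} L^T for a tall matrix L with full column rank."""
    return np.linalg.solve(L.T @ L, L.T)

rng = np.random.default_rng(0)
N_h, N_c = 8, 3                      # illustrative sizes, with N_h >= N_c
L = rng.standard_normal((N_h, N_c))  # stand-in for one output of L_theta
L_plus = left_pseudoinverse(L)

# L_plus @ L should be the N_c x N_c identity, and L_plus should match NumPy's pinv.
left_identity_error = np.max(np.abs(L_plus @ L - np.eye(N_c)))
pinv_error = np.max(np.abs(L_plus - np.linalg.pinv(L)))
```

Note that $\bm{L}_{\theta}^{+}\bm{L}_{\theta}$ is the identity on the $N_{c}$ components of $\bm{u}$ , whereas $\bm{L}_{\theta}\bm{L}_{\theta}^{+}$ is in general only a projection in the hidden dimension.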

Substituting $\bm{L}_{\theta}$ , $\bm{\Lambda}_{\phi}$ , and (56) into (29) and (30) yields

\begin{aligned}
\bm{u}_{j}^{n+1}=~&\bm{u}_{j}^{n}-\frac{1}{2}\lambda_{r}(\bm{L}^{n}_{j+\frac{1}{2},\theta})^{+}(\bm{\Lambda}_{j+\frac{1}{2},\phi}^{n}-\|\bm{\Lambda}_{j+\frac{1}{2},\phi}^{n}\|)\bm{L}_{j+\frac{1}{2},\theta}^{n}(\bm{u}_{j+1}^{n}-\bm{u}_{j}^{n})\\
&-\frac{1}{2}\lambda_{r}(\bm{L}^{n}_{j-\frac{1}{2},\theta})^{+}(\bm{\Lambda}_{j-\frac{1}{2},\phi}^{n}+\|\bm{\Lambda}_{j-\frac{1}{2},\phi}^{n}\|)\bm{L}_{j-\frac{1}{2},\theta}^{n}(\bm{u}_{j}^{n}-\bm{u}_{j-1}^{n}),
\end{aligned}

(58)

with

\bm{L}_{j+\frac{1}{2},\theta}^{n}=\bm{L}_{\theta}(\bm{u}_{j}^{n},\bm{u}_{j+1}^{n}),~{}~{}~{}~{}\bm{\Lambda}_{j+\frac{1}{2},\phi}^{n}=\bm{\Lambda}_{\phi}(\bm{u}_{j}^{n},\bm{u}_{j+1}^{n}).

(59)

Equation (58) serves as our template to evolve the system’s states from $\bm{u}_{j}^{n}$ to $\bm{u}_{j}^{n+1}$ in RoeNet.

Figure 5 presents a schematic diagram of RoeNet, which predicts future discontinuities from smooth observations. We note that for hyperbolic conservation laws with discontinuous solutions, RoeNet can accurately forecast long-term outcomes that are either fully or partially discontinuous. This is achievable even when the training data provided cover only a short window and contain limited information on discontinuities.

4.3.2 Neural network architecture

Figure 6 shows an overview of our neural network architecture. In summary, RoeNet consists of $\bm{L}_{\theta}$ and $\bm{\Lambda}_{\phi}$ , two networks embedded in (58) to serve as our template to evolve the system’s states from $\bm{u}_{j}^{n}$ to $\bm{u}_{j}^{n+1}$ .

Specifically, the network in Figure 6 contains two parts, each consisting of an $\bm{L}_{\theta}$ and a $\bm{\Lambda}_{\phi}$ . The first part takes $\bm{u}_{j-1}^{n}$ and $\bm{u}_{j}^{n}$ as the input of both $\bm{L}_{\theta}$ and $\bm{\Lambda}_{\phi}$ and outputs $\bm{L}_{j-\frac{1}{2},\theta}$ through $\bm{L}_{\theta}$ and $\bm{\Lambda}_{j-\frac{1}{2},\phi}$ through $\bm{\Lambda}_{\phi}$ . The input is the concatenated vector $[\bm{u}^{n,(1)}_{j-1},\cdots,\bm{u}^{n,(N_{c})}_{j-1};\bm{u}^{n,(1)}_{j},\cdots,\bm{u}^{n,(N_{c})}_{j}]$ of length $2N_{c}$ , where each of the two states has $N_{c}$ components. The output matrix $\bm{L}_{j-\frac{1}{2},\theta}$ is of size $N_{h}\times N_{c}$ , and the other output matrix $\bm{\Lambda}_{j-\frac{1}{2},\phi}$ is a diagonal matrix of size $N_{h}\times N_{h}$ . The second part takes $\bm{u}_{j}^{n}$ and $\bm{u}_{j+1}^{n}$ as the input of both $\bm{L}_{\theta}$ and $\bm{\Lambda}_{\phi}$ and outputs $\bm{L}_{j+\frac{1}{2},\theta}$ through $\bm{L}_{\theta}$ and $\bm{\Lambda}_{j+\frac{1}{2},\phi}$ through $\bm{\Lambda}_{\phi}$ . Its input is the vector $[\bm{u}^{n,(1)}_{j},\cdots,\bm{u}^{n,(N_{c})}_{j};\bm{u}^{n,(1)}_{j+1},\cdots,\bm{u}^{n,(N_{c})}_{j+1}]$ of length $2N_{c}$ . The output matrices $\bm{L}_{j+\frac{1}{2},\theta}$ and $\bm{\Lambda}_{j+\frac{1}{2},\phi}$ take the same form as the output matrices in the first part. Given the four output matrices $\bm{L}_{j-\frac{1}{2},\theta}$ , $\bm{\Lambda}_{j-\frac{1}{2},\phi}$ , $\bm{L}_{j+\frac{1}{2},\theta}$ , and $\bm{\Lambda}_{j+\frac{1}{2},\phi}$ , we combine them through (58) and (59) to obtain $\bm{u}_{j}^{n+1}$ .

Networks $\bm{L}_{\theta}$ and $\bm{\Lambda}_{\phi}$ both consist of a chain of ResBlocks [49] with a linear layer of size $N_{h}\times N_{c}$ and $N_{h}$ at the end, respectively. Each ResBlock has the same architecture as in [49], with two layers and one ReLU layer, except that the 2D convolution layers are replaced by linear layers. The $N_{h}$ values output by $\bm{\Lambda}_{\phi}$ are then placed on the diagonal of an $N_{h}\times N_{h}$ matrix. Note that the number in the parentheses is the dimension of the output of each ResBlock, and the computation procedure for grid cell $j$ is applied to all grid cells. Since the computation for each cell depends only on that cell and its adjacent cells, we can process all cells in parallel to achieve high efficiency.

In addition, we implement two ways of padding to address different boundary conditions. For periodic boundary conditions, we use periodic padding, e.g., if $j=0$ , then $\bm{u}_{j-1}=\bm{u}_{N_{g}}$ , where $N_{g}$ is the number of grid nodes. For Neumann boundary conditions, we use replicate padding, e.g., if $j=0$ , then $\bm{u}_{j-1}=\bm{u}_{0}$ .
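The two padding rules can be sketched with NumPy's `np.pad`, whose `wrap` and `edge` modes correspond to periodic and replicate padding. The helper name `pad_cells` and the zero-based array layout are assumptions of this example.

```python
import numpy as np

def pad_cells(u, boundary):
    """Add one ghost cell on each side of the cell axis (axis 0)."""
    mode = {"periodic": "wrap", "neumann": "edge"}[boundary]
    pad_width = [(1, 1)] + [(0, 0)] * (u.ndim - 1)
    return np.pad(u, pad_width, mode=mode)

u = np.array([1.0, 2.0, 3.0, 4.0])
periodic = pad_cells(u, "periodic")   # left ghost = last cell, right ghost = first cell
neumann = pad_cells(u, "neumann")     # each ghost copies the nearest boundary cell
```

With the ghost cells in place, the update for every interior index can use the same stencil $(\bm{u}_{j-1},\bm{u}_{j},\bm{u}_{j+1})$ without special cases at the boundary.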

By introducing a hidden dimension $N_{h}$ , we have increased the number of network parameters and enhanced the network’s expressive capacity. However, the expansion of the parameter space could lead to multiple numerical optimal solutions during training. To address this, we employ a regularized loss function, which helps ensure that the network parameters converge to a local optimal solution. Importantly, our goal is to use the network to accurately model the evolution of PDEs over time and space; achieving a unique solution for the network parameters is not a requirement.

Algorithm 3 summarizes the recursive relation from the input layer

\bm{u}(t=0)=[\bm{u}_{1}(t=0),\cdots,\bm{u}_{N_{g}}(t=0)]

(60)

to the output layer

\hat{\bm{u}}(t=T_{span})=[\hat{\bm{u}}_{1}(t=T_{span}),\cdots,\hat{\bm{u}}_{N_% {g}}(t=T_{span})],

(61)

for each time step in RoeNet. Here $N_{g}$ is the spatial grid size and $T_{span}$ is the time span $T_{train}$ or $T_{predict}$ . As described in Algorithm 3, feeding $\bm{u}(t=0)$ , $T_{span}=T_{train}$ , the temporal step $\Delta t$ , the spatial step $\Delta x$ , and the constructed networks $\bm{L}_{\theta}$ and $\bm{\Lambda}_{\phi}$ into RoeNet, we obtain the predicted $\hat{\bm{u}}(t=T_{train})$ . Then, we choose the MSE as our loss function

\mathcal{L}_{RoeNet}=\|\bm{u}(t=T_{train})-\hat{\bm{u}}(t=T_{train})\|_{MSE}.

(62)

Algorithm 3 Recursive relation from the input layer to the output layer in RoeNet. Here, $\bm{u}_{j},~{}j=1,2,\cdots,N_{g}$ represents discretized points of $\bm{u}$ in spatial coordinate. Source: [4].

1: Inputs: $\bm{u}_{j}(t=0),~{}j=1,2,\cdots,N_{g}$ , $T_{span}$ , $\Delta t$ , $\Delta x$ , $\bm{L}_{\theta}$ , $\bm{\Lambda}_{\phi}$

2: Outputs: $\hat{\bm{u}}_{j}(t=T_{span})=\bm{u}_{j}^{N_{t}}$

3: $N_{t}=\text{floor}(T_{span}/\Delta t)$

4: $\lambda_{r}=\Delta t/\Delta x$

5: $\bm{u}_{j}^{0}=\bm{u}_{j}(t=0),~{}j=1,2,\cdots,N_{g}$

6: for $n=0$ to $N_{t}-1$ do

7: Calculate $\bm{L}_{j\pm\frac{1}{2},\theta}^{n}$ and $\bm{\Lambda}_{j\pm\frac{1}{2},\phi}^{n}$ by substituting $\bm{u}_{j}^{n},~{}j=1,2,\cdots,N_{g}$ , $\bm{L}_{\theta}$ , and $\bm{\Lambda}_{\phi}$ into (59)

8: Calculate $\bm{u}_{j}^{n+1},~{}j=1,2,\cdots,N_{g}$ by substituting $\bm{u}_{j}^{n}$ , $\bm{L}_{j\pm\frac{1}{2},\theta}^{n}$ , $\bm{\Lambda}_{j\pm\frac{1}{2},\phi}^{n}$ , and $\lambda_{r}$ into (58)

9: end for
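The recursion of Algorithm 3 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not our training code: the trained networks are replaced by hypothetical stand-in functions `L_fn` and `Lam_fn`, the diagonal matrix $\bm{\Lambda}$ is stored as a vector of its diagonal entries, $\|\bm{\Lambda}\|$ in (58) is read as the entrywise absolute value of that diagonal, and periodic padding is used.

```python
import numpy as np

def left_pinv(L):
    """(L^T L)^{-1} L^T, as in (56)."""
    return np.linalg.solve(L.T @ L, L.T)

def roe_step(u, lam_r, L_fn, Lam_fn):
    """One update of (58) on a periodic grid; u has shape (N_g, N_c)."""
    u_lft = np.roll(u, 1, axis=0)    # u_{j-1} with periodic padding
    u_rgt = np.roll(u, -1, axis=0)   # u_{j+1} with periodic padding
    out = np.empty_like(u)
    for j in range(u.shape[0]):
        L_p, lam_p = L_fn(u[j], u_rgt[j]), Lam_fn(u[j], u_rgt[j])   # interface j + 1/2
        L_m, lam_m = L_fn(u_lft[j], u[j]), Lam_fn(u_lft[j], u[j])   # interface j - 1/2
        right = left_pinv(L_p) @ ((lam_p - np.abs(lam_p)) * (L_p @ (u_rgt[j] - u[j])))
        left = left_pinv(L_m) @ ((lam_m + np.abs(lam_m)) * (L_m @ (u[j] - u_lft[j])))
        out[j] = u[j] - 0.5 * lam_r * (right + left)
    return out

def roe_rollout(u0, T_span, dt, dx, L_fn, Lam_fn):
    """Steps 3 to 9 of Algorithm 3."""
    N_t = int(np.floor(T_span / dt))
    lam_r = dt / dx
    u = u0.copy()
    for _ in range(N_t):
        u = roe_step(u, lam_r, L_fn, Lam_fn)
    return u

# Stand-ins for the trained networks: a fixed tall matrix and unit wave speeds.
rng = np.random.default_rng(0)
L_const = rng.standard_normal((3, 2))           # N_h = 3, N_c = 2
L_fn = lambda u_left, u_right: L_const
Lam_fn = lambda u_left, u_right: np.ones(3)     # diagonal entries of Lambda

u0 = rng.standard_normal((8, 2))                # N_g = 8 cells
u_T = roe_rollout(u0, T_span=0.375, dt=0.125, dx=0.125, L_fn=L_fn, Lam_fn=Lam_fn)
```

With these stand-ins, every diagonal entry of $\bm{\Lambda}$ equals one and $\lambda_{r}=1$ , so (58) reduces to the first-order upwind scheme at unit CFL number and each step shifts the profile by exactly one cell. After three steps, `u_T` therefore matches `np.roll(u0, 3, axis=0)` up to rounding, which gives a simple consistency check of the template.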

4.4 Neural Vortex Method (NVM)

To accurately and efficiently quantify fluid dynamics, we propose the novel NVM framework. This framework utilizes physics-informed neural networks to extract and translate information from the Eulerian specification of the flow field (or images of flow visualizations) into knowledge about the underlying fluid field. As detailed in Figure 7, we integrate these networks with a vorticity-to-velocity Poisson solver to build a fully automated toolchain that extracts high-resolution Eulerian flow fields from Lagrangian inductive priors. This design addresses the challenge of learning directly from high-dimensional observations, such as images, which traditional methods struggle to convert directly into velocity and pressure fields.

We construct a vortex detection network in Section 4.4.1 to identify the positions and the vorticity of Lagrangian vortices from a grid-based velocity field, which from a mathematical perspective connects (31) with (33). This approach simplifies the vorticity field to include only the detected vortices. Given the detected vortices, we then use a vortex dynamics network in Section 4.4.2 to learn the underlying governing dynamics of these finite structures. The dynamics network accurately models the right-hand side of (33) under various conditions, resolving the longstanding problem in LVM.

The training of the NVM involves two primary steps: training the detection and dynamics networks. We employ high-fidelity data from direct numerical simulation (DNS) of interactions among 2 to 6 vortices, although the model can generalize to any vorticity field with an arbitrary number of vortices. We initially train the detection network using data from randomly generated vortices and their vorticity fields, then identify vortices’ positions and strengths using this trained network to facilitate the subsequent training of the dynamics network.

4.4.1 Detection network

The input of the detection network is a vorticity field of size $200\times 200\times 1$ . As shown in Figure 8, we first feed the vorticity field into a small one-stage detection network and obtain a feature map of size $25\times 25\times 512$ (after downsampling 3 times). The detection network consists of a Conv2d-BatchNorm-ReLU combo and a 6-layer-structured ResBlock chain whose size can be adjusted dynamically to the complexity of the problem. The primary reason for downsampling is to avoid extremely unbalanced data and multiple predictions for the same vortex. We then forward the feature map to 2 branches. In the first branch, we conduct a $1\times 1$ convolution to generate a probability score $\hat{p}$ that a vortex exists. If $\hat{p}>0.5$ , we consider a vortex to be present within the corresponding cells of the original $200\times 200\times 1$ vorticity field. In the second branch, we predict the position relative to the upper-left corner of the feature-map cell if the cell contains a vortex. Afterward, we set a bounding box of $10\times 10$ around each predicted vortex and use the weighted average of the positions of the cells of the original vorticity field to find the exact position of the vortex. Finally, the vortex particle strength is calculated as the sum of the values of the cells in the bounding box normalized by the cell area.
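The refinement step at the end of this pipeline can be sketched as follows. This is an illustration under assumptions that the text does not fix: the weights of the average are taken as the absolute vorticity, the normalization by the cell area is read as multiplying the summed vorticity by the cell area, and the function name and arguments are hypothetical.

```python
import numpy as np

def refine_vortex(vorticity, center, cell_size, half_width=5):
    """Refine a coarse detection inside a (2*half_width) x (2*half_width) box.

    Returns the |vorticity|-weighted mean position (in cell-index units) and the
    strength, taken here as sum(vorticity) * cell area.
    """
    i0, j0 = center
    rows = np.arange(max(i0 - half_width, 0), min(i0 + half_width, vorticity.shape[0]))
    cols = np.arange(max(j0 - half_width, 0), min(j0 + half_width, vorticity.shape[1]))
    box = vorticity[np.ix_(rows, cols)]
    weights = np.abs(box)
    total = weights.sum()
    pos_i = (weights.sum(axis=1) @ rows) / total
    pos_j = (weights.sum(axis=0) @ cols) / total
    strength = box.sum() * cell_size**2
    return (pos_i, pos_j), strength

# Synthetic test field: one narrow Gaussian vortex centered at cell (80, 120).
sigma = 0.8
ii, jj = np.meshgrid(np.arange(200), np.arange(200), indexing="ij")
field = np.exp(-((ii - 80) ** 2 + (jj - 120) ** 2) / (2.0 * sigma**2))
position, strength = refine_vortex(field, center=(81, 119), cell_size=0.5)
```

Even though the coarse detection `(81, 119)` is off by one cell in each direction, the weighted average recovers the true center, and the strength approximates the integral of the Gaussian over the box.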

In the training process, we penalize the wrong position detection only if the cell containing a vortex in the ground truth given by DNS is not detected. This idea is similar to the real-time object detection in [64]. We do not use the weighted average method to find the position in the training to ensure the detection network can produce detection results as accurately as possible. We use the focal loss [65] to further relieve the unbalanced classification problem.

We mainly use the detection network to generate training data for the dynamics network because we want to use the high-resolution data generated by the method mentioned in Section 5.4.1 instead of by the approximate particle method (BS law). Moreover, there are many situations where BS law is inapplicable, as discussed previously in Section 3.2.6. The detection network enables us to find the positions of the vortices accurately regardless of the situation.

The detection network is responsible for providing necessary information to the dynamics network. After the training, we use the well-trained detection network to detect the vortices in the initial vorticity fields and the evolved vorticity field, both generated by the method in Section 5.4.1. We then apply the nearest-neighbor method to pair the vortices detected in these two fields. Figure 9 shows the case of two fields at $t=0$ and $t=0.2$ . The idea of nearest-neighbor pairing can be perceived from Figure 9 (c). The sample, or these two fields, is dropped if different numbers of vortices are detected in the initial and evolved fields or if a large difference exists in the vorticity of paired vortices. The successfully detected vortices in the initial and evolved vorticity fields are passed together into the dynamics network for its training.
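The pairing and filtering logic described above can be sketched as follows. The relative tolerance `max_rel_diff`, the rejection of samples in which two initial vortices select the same partner, and the function name are assumptions of this example rather than settings of our pipeline.

```python
import numpy as np

def pair_vortices(pos0, vort0, pos1, vort1, max_rel_diff=0.2):
    """Pair vortices of two frames; return index pairs, or None if the sample is dropped."""
    if len(pos0) != len(pos1):
        return None                            # different numbers of detections
    pos0, pos1 = np.asarray(pos0, float), np.asarray(pos1, float)
    dist = np.linalg.norm(pos0[:, None, :] - pos1[None, :, :], axis=-1)
    match = dist.argmin(axis=1)                # nearest evolved vortex for each initial one
    if len(set(match.tolist())) != len(match):
        return None                            # two vortices claimed the same partner
    v0, v1 = np.asarray(vort0, float), np.asarray(vort1, float)[match]
    if np.any(np.abs(v1 - v0) > max_rel_diff * np.abs(v0)):
        return None                            # vorticity of a pair differs too much
    return [(i, int(k)) for i, k in enumerate(match)]

pairs = pair_vortices([[0.0, 0.0], [1.0, 0.0]], [1.0, -1.0],
                      [[1.05, 0.1], [0.02, 0.1]], [-0.98, 1.01])
dropped = pair_vortices([[0.0, 0.0]], [1.0], [[0.0, 0.1], [1.0, 0.0]], [1.0, 1.0])
```

Here `pairs` links each initial vortex to the evolved vortex closest to it, and `dropped` is `None` because the two frames contain different numbers of detections.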

4.4.2 Dynamics network

To learn the underlying dynamics of the vortices, we build a graph neural network similar to [19]. We predict the velocity of one vortex due to influences exerted by the other vortices and the external force. Then we use the fourth-order Runge–Kutta integrator to calculate the position at the next time step. As shown in Figure 10, for each vortex, we use a neural network $A(\theta_{1})$ to predict the influences exerted by the other vortices and add them up. Specifically, for the $i$ th vortex, we consider each vortex $j$ ( $j\neq i$ ). The difference of their positions is $\textrm{diff}_{ij}=\textrm{pos}_{i}-\textrm{pos}_{j}$ , and their L2 distance is $\textrm{dist}_{ij}=\|\textrm{diff}_{ij}\|_{2}$ . The input of $A(\theta_{1})$ is the vector $(\textrm{diff}_{ij},\textrm{dist}_{ij},\textrm{vort}_{j})$ of length 4. Here, pos and vort are detected by the detection network. The output of $A(\theta_{1})$ is a vector with the same dimension as the flow field, characterizing the velocity induced by the $j$ th vortex on the $i$ th vortex. In this way, we can calculate the velocity induced by each vortex $j$ ( $j\neq i$ ) on the vortex $i$ . We sum up all the induced velocities on the vortex $i$ and treat the result as the induced velocity exerted by the other vortices.

In addition, we use another neural network $A(\theta_{2})$ , to predict the influence caused by the external force, which is determined by the local vorticity and the position of the vortex. The input of $A(\theta_{2})$ is a vector of length 3. The output is the influence exerted by the environment on the vortex $i$ , i.e., the induced velocity of the external force to $i$ th vortex.

The reason we separate the induced velocity into two parts, i.e., $A(\theta_{1})$ and $A(\theta_{2})$ , is as follows. On the one hand, the induced velocities between vortex particles are global and exhibit a certain symmetry, i.e., the vortex particles interact with each other following the same law. On the other hand, the influence of external forces on vortex particles is usually local and direct; thus, we do not need to consider the interaction between particles. The effect of the vortex stretching term in three-dimensional vortex flows or the diffusion term in viscous flows is also local and should be included in network $A(\theta_{2})$ . Note that the outputs of both $A(\theta_{1})$ and $A(\theta_{2})$ are vectors with the same dimension as the flow field. Thus, we can add the two kinds of influence together, and the result is defined as the velocity of the vortex $i$ . We feed the velocity into the fourth-order Runge–Kutta integrator to obtain the predicted position of vortex $i$ .
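The structure of this computation can be sketched as follows. Since the trained networks are not available in a short example, the Biot–Savart kernel of a point vortex stands in for $A(\theta_{1})$ and a zero function stands in for $A(\theta_{2})$ ; both stand-ins and all function names are assumptions made for illustration.

```python
import numpy as np

def induced_velocity(diff, dist, vort_j):
    """Stand-in for A(theta_1): Biot-Savart velocity induced at vortex i by vortex j."""
    return vort_j / (2.0 * np.pi * dist**2) * np.array([-diff[1], diff[0]])

def external_velocity(pos_i, vort_i):
    """Stand-in for A(theta_2): no external forcing in this sketch."""
    return np.zeros(2)

def velocities(pos, vort):
    """Sum the pairwise contributions and the external one for every vortex."""
    vel = np.zeros_like(pos)
    for i in range(len(pos)):
        for j in range(len(pos)):
            if j != i:
                diff = pos[i] - pos[j]
                vel[i] += induced_velocity(diff, np.linalg.norm(diff), vort[j])
        vel[i] += external_velocity(pos[i], vort[i])
    return vel

def rk4_step(pos, vort, dt):
    """Fourth-order Runge-Kutta update of all vortex positions."""
    k1 = velocities(pos, vort)
    k2 = velocities(pos + 0.5 * dt * k1, vort)
    k3 = velocities(pos + 0.5 * dt * k2, vort)
    k4 = velocities(pos + dt * k3, vort)
    return pos + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Two equal vortices: they should orbit their midpoint at constant separation.
pos = np.array([[-0.5, 0.0], [0.5, 0.0]])
vort = np.array([1.0, 1.0])
for _ in range(100):
    pos = rk4_step(pos, vort, dt=0.01)
separation = np.linalg.norm(pos[0] - pos[1])
centroid = pos.mean(axis=0)
```

Because the same pairwise function is applied to every ordered pair $(i,j)$ , the model is independent of the number of vortices, which is what allows a network trained on a few vortices to be applied to larger systems.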

In addition, in predicting the evolution of the flow field, NVM replaces the discrete BS method with a dynamics network composed of ResBlocks. We chose 5-layer ResBlocks to improve the expressiveness of the dynamics network so that we can learn dynamics of different complexity on the same network. Since the dynamics network with 5-layer ResBlocks is more complex than the discrete BS method, the computational cost per vortex particle of NVM is higher than that of the Lagrangian vortex method. However, the number of vortex particles needed to predict the evolution of the flow field using NVM is much smaller. Therefore, the overall computational cost of NVM can be greatly reduced.

5 Results

We present several experiments here to highlight the key advantages of our methodologies. For additional examples and ablation tests, please refer to [1, 2, 3, 4].

5.1 Taylor-nets

5.1.1 Dataset generation and training settings

To make a fair comparison with the ground truth, we generate our training and testing datasets by using the same numerical integrator based on a given analytical Hamiltonian. In the learning process, we generate $N_{train}$ training samples, and for each training sample, we first pick a random initial point $(\bm{q}_{0},\bm{p}_{0})$ (input), then use the symplectic integrator discussed in Section 3.2.3 to calculate the value $(\bm{q}_{n},\bm{p}_{n})$ (target) of the trajectory at the end of the training period $T_{train}$ . We do the same to generate a validation dataset with $N_{validation}=100$ samples and the same time span as $T_{train}$ , and we calculate the validation loss $L_{validation}$ alongside the training loss $L_{train}$ to evaluate the training process. In addition, we generate a set of testing data with $N_{test}=100$ samples and a predicting time span $T_{predict}$ that is around 6000 times larger than $T_{train}$ , and we calculate the prediction error $\epsilon_{p}$ to evaluate the predictive ability of the model. For simplicity, we use $(\bm{\hat{p}}_{n},\bm{\hat{q}}_{n})$ to represent the predicted values using our trained model.

We remark that our training dataset is smaller than that used by the other methods. Most of the methods, e.g., ODE-net [50] and HNN [15], have to rely on intermediate data in their training data to train the model. That is, their dataset is

[(\bm{q}_{0}^{(s)},\bm{p}_{0}^{(s)}),(\bm{q}_{1}^{(s)},\bm{p}_{1}^{(s)}),\dots,(\bm{q}_{n-1}^{(s)},\bm{p}_{n-1}^{(s)}),(\bm{q}_{n}^{(s)},\bm{p}_{n}^{(s)})]_{s=1}^{N_{train}},

where $(\bm{q}_{1}^{(s)},\bm{p}_{1}^{(s)}),\dots,(\bm{q}_{n-1}^{(s)},\bm{p}_{n-1}^{(s)})$ are $n-1$ intermediate points collected within $T_{train}$ in between $(\bm{q}_{0}^{(s)},\bm{p}_{0}^{(s)})$ and $(\bm{q}_{n}^{(s)},\bm{p}_{n}^{(s)})$ . In contrast, we only use two data points per sample, the initial data point and the end point, and our dataset looks like

\left[(\bm{q}_{0}^{(s)},\bm{p}_{0}^{(s)}),(\bm{q}_{n}^{(s)},\bm{p}_{n}^{(s)})\right]_{s=1}^{N_{train}},

which is $n-1$ times smaller than the dataset of the other methods, if we do not count $(\bm{q}_{0}^{(s)},\bm{p}_{0}^{(s)})$ . Our predicting time span $T_{predict}$ is around 6000 times the training period $T_{train}$ used in the training dataset (as compared to 10 times in HNN). This leads to a 600 times compression of the training data in the dimension of temporal evolution. Note that we fix $T_{train}$ and $T_{predict}$ in practice so that we can train our network more efficiently on GPU. One can also choose to generate training data with different $T_{train}$ for each sample to obtain more robust performance.

We use the Adam optimizer [48]. We choose the automatic differentiation method as our backward propagation method. We have tried both the adjoint sensitivity method, which is used in ODE-net [50] and the automatic differentiation method. Both methods can be used to train the model well. However, we found that using the adjoint sensitivity method is much slower than using the automatic differentiation method considering the large parameter size of neural networks.

All $A_{i}$ and $B_{i}$ in (43) are initialized as $A_{i},B_{i}\sim\mathcal{N}(0,\sqrt{2/[N*N_{h}*(i+1)]})$ , where $N$ is the dimension of the system and $N_{h}$ is the size of the hidden layers. The loss function is

L_{train}=\frac{1}{N_{train}}\sum_{s=1}^{N_{train}}\left(\|\bm{\hat{p}}_{n}^{(s)}-\bm{p}_{n}^{(s)}\|_{1}+\|\bm{\hat{q}}_{n}^{(s)}-\bm{q}_{n}^{(s)}\|_{1}\right).

(63)

The validation loss $L_{validation}$ has the same form as (63) but is evaluated on a dataset different from the training dataset. We choose the $L1$ loss instead of the Mean Square Error (MSE) loss because of its better performance.

We present the experimental results for an ideal pendulum system, whose Hamiltonian is defined as

\mathcal{H}(q,p)=\frac{1}{2}p^{2}-\cos{(q)}.

(64)

We pick a random initial point for training $(\bm{q}_{0},\bm{p}_{0})\in\left[-2,2\right]\times\left[-2,2\right]$ .
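The data generation for this system can be sketched as follows. The sketch uses the second-order Störmer–Verlet (leapfrog) scheme as a stand-in symplectic integrator, which is an assumption of the example and not necessarily the scheme of Section 3.2.3; the number of substeps and all function names are likewise illustrative. Each sample stores only the initial point and the end point.

```python
import numpy as np

def leapfrog(q, p, dt, n_steps):
    """Stormer-Verlet integration of H = p^2 / 2 - cos(q)."""
    for _ in range(n_steps):
        p = p - 0.5 * dt * np.sin(q)   # half kick: dp/dt = -dH/dq = -sin(q)
        q = q + dt * p                 # drift:     dq/dt =  dH/dp = p
        p = p - 0.5 * dt * np.sin(q)   # half kick
    return q, p

def make_dataset(n_samples, T_train, n_steps, seed=0):
    """Return initial points (inputs) and end points (targets), each of shape (n_samples, 2)."""
    rng = np.random.default_rng(seed)
    q0 = rng.uniform(-2.0, 2.0, n_samples)
    p0 = rng.uniform(-2.0, 2.0, n_samples)
    qn, pn = leapfrog(q0, p0, T_train / n_steps, n_steps)
    return np.stack([q0, p0], axis=1), np.stack([qn, pn], axis=1)

def energy(z):
    """Pendulum Hamiltonian (64) evaluated on rows (q, p)."""
    return 0.5 * z[:, 1] ** 2 - np.cos(z[:, 0])

inputs, targets = make_dataset(n_samples=15, T_train=0.01, n_steps=10)
energy_drift = np.max(np.abs(energy(targets) - energy(inputs)))
```

The small value of `energy_drift` reflects the near-conservation of the Hamiltonian by the symplectic scheme, which is the property we want the training targets to carry.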

To show the predictive ability of our model, we pick $T_{predict}=20\pi$ . We pick 15 as the sample size since we find that small $N_{train}$ ’s are sufficient to generate excellent results. We use 100 epochs for training, and 10 as the $step\_size$ (the period of learning rate decay), and 0.8 as $\gamma$ (the multiplicative factor of learning rate decay). The learning rate of each parameter group is decayed by $\gamma$ every $step\_size$ epochs, which prevents the model from overshooting the local minimum. The dynamic learning rate can also make our model converge faster. $M$ indicates the number of terms of the Taylor polynomial introduced in the construction of the neural networks (43). Through experimentation, we find that 8 terms can represent most functions well. We choose 16 as $N_{h}$ , the dimension of hidden layers.

5.1.2 Predictive ability and robustness

Table 2: Comparison of $\epsilon_{p}$ for the pendulum problem without noise, with noise $\sigma_{1},\sigma_{2}\sim\mathcal{N}(0,0.1)$ , and with noise $\sigma_{1},\sigma_{2}\sim\mathcal{N}(0,0.5)$ . Source: [1].

Methods	Taylor-net	HNN	ODE-net
$\epsilon_{p}$ , without noise	0.213	0.377	1.416
$\epsilon_{p}$ , with noise $\sigma_{1},\sigma_{2}\sim\mathcal{N}(0,0.1)$	1.667	2.433	3.301
$\epsilon_{p}$ , with noise $\sigma_{1},\sigma_{2}\sim\mathcal{N}(0,0.5)$	1.293	2.416	27.114

Now, to assess how well our method can predict the future flow, we compare the predictive ability of Taylor-net with that of ODE-net and HNN. We apply all three methods to the pendulum problem and let $T_{train}=0.01$ and $T_{predict}=20\pi$ . We evaluate the performance of the models by calculating the average prediction error at each predicted point, defined by

\epsilon_{p}^{(n_{t})}=\frac{1}{N_{test}}\sum_{s=1}^{N_{test}}\left(\|\bm{\hat{p}}^{(s,n_{t})}_{n}-\bm{p}_{n}^{(s,n_{t})}\|_{1}+\|\bm{\hat{q}}_{n}^{(s,n_{t})}-\bm{q}_{n}^{(s,n_{t})}\|_{1}\right),

(65)

and the average $\epsilon_{p}^{(n_{t})}$ over $T_{predict}$ is

\epsilon_{p}=\frac{1}{N_{T}}\sum_{n_{t}=1}^{N_{T}}\epsilon_{p}^{(n_{t})},

(66)

where $N_{test}$ represents the testing sample size specified in Section 5.1.1 and $N_{T}=T_{predict}/\Delta t$ with $\Delta t=0.01$ . We find that Taylor-net has a stronger predictive ability than the other two methods. The first row of Table 2 shows the average prediction error of 100 testing samples using the three methods over $T_{predict}$ when no noise is added. The prediction error of HNN is almost double that of Taylor-net, while the prediction error of ODE-net is about 7 times that of Taylor-net. To examine the difference in more detail, we plot the prediction results. Figure 11 shows the prediction error $\epsilon_{p}^{(n_{t})}$ against $t=n_{t}\Delta t$ over $T_{predict}$ for all three methods. In Figure 12, we plot the predicted position $q$ against time for all three methods as well as the ground truth in order to see how well the prediction results match the ground truth. From Figure 12 (a), we can see that the prediction of ODE-net gradually deviates from the ground truth as time progresses, while the predictions of Taylor-net and HNN stay mostly consistent with the ground truth, with the former being slightly closer. The difference between Taylor-net and HNN can be seen more clearly in Figure 11 (a). The prediction error of Taylor-net is smaller than that of the other two methods, and the difference becomes more apparent as time increases. The prediction error of ODE-net is larger than that of HNN and Taylor-net at the beginning of $T_{predict}$ and increases at a much faster rate than the other two. Although the prediction error of HNN shows no obvious difference from that of Taylor-net at the beginning, it gradually diverges from it.
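The two error measures (65) and (66) can be computed from arrays of predicted and reference trajectories as sketched below. The array layout `(N_test, N_T, dim)` and the function name are assumptions of this example, and the average over samples is taken over the sum of both $L1$ terms.

```python
import numpy as np

def prediction_errors(q_hat, p_hat, q_true, p_true):
    """Arrays have shape (N_test, N_T, dim). Returns eps_p^{(n_t)} of (65) and eps_p of (66)."""
    l1 = np.abs(p_hat - p_true).sum(axis=-1) + np.abs(q_hat - q_true).sum(axis=-1)
    eps_per_step = l1.mean(axis=0)     # average over the test samples
    return eps_per_step, eps_per_step.mean()

# Toy data: two samples, three time steps, one degree of freedom.
q_true = np.zeros((2, 3, 1))
p_true = np.zeros((2, 3, 1))
q_hat = q_true + 0.1                                   # constant position error
p_hat = p_true + np.array([0.0, 1.0, 2.0]).reshape(1, 3, 1)  # growing momentum error
eps_per_step, eps_mean = prediction_errors(q_hat, p_hat, q_true, p_true)
```

Plotting `eps_per_step` against $t=n_{t}\Delta t$ gives curves of the kind shown in Figure 11, while `eps_mean` corresponds to the entries of Table 2.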

5.2 NSSNNs

5.2.1 Dataset generation and training settings

We use 6 linear layers with hidden size 64 to model $\mathcal{H}_{\theta}$ , all of which are followed by a Sigmoid activation function except the last one. The derivatives $\partial\mathcal{H}_{\theta}/\partial\bm{p}$ , $\partial\mathcal{H}_{\theta}/\partial\bm{q}$ , $\partial\mathcal{H}_{\theta}/\partial\bm{x}$ , $\partial\mathcal{H}_{\theta}/\partial\bm{y}$ are all obtained by automatic differentiation in PyTorch [66]. The weights of the linear layers are initialized by Xavier initialization [67].

We generate the dataset for training and validation using a high-precision numerical solver [55], where the ratio of training to validation data is $9:1$ . We set $(\bm{q}_{0}^{j},\bm{p}_{0}^{j})$ as the start input and $(\bm{q}^{j},\bm{p}^{j})$ as the target with $j=1,2,\cdots,N_{s}$ , and the time span between $(\bm{q}_{0}^{j},\bm{p}_{0}^{j})$ and $(\bm{q}^{j},\bm{p}^{j})$ is $T_{train}$ . We feed $(\bm{q}_{0},\bm{p}_{0})=(\bm{q}_{0}^{j},\bm{p}_{0}^{j}),~{}t_{0}=0,~{}t=T_{train}$ , and the time step $\textrm{d}t$ into Algorithm 2 to get the predicted variables $(\hat{\bm{q}}^{j},\hat{\bm{p}}^{j},\hat{\bm{x}}^{j},\hat{\bm{y}}^{j})$ . Accordingly, the loss function is defined as

\mathcal{L}_{NSSNN}=\frac{1}{N_{b}}\sum_{j=1}^{N_{b}}\left(\|\bm{q}^{(j)}-\hat{\bm{q}}^{(j)}\|_{1}+\|\bm{p}^{(j)}-\hat{\bm{p}}^{(j)}\|_{1}+\|\bm{q}^{(j)}-\hat{\bm{x}}^{(j)}\|_{1}+\|\bm{p}^{(j)}-\hat{\bm{y}}^{(j)}\|_{1}\right),

(67)

where $N_{b}=512$ is the batch size of the training samples. We use the Adam optimizer [48] with learning rate 0.05. The learning rate is multiplied by 0.8 every 10 epochs.
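The resulting learning-rate schedule is a simple step decay, sketched below in plain Python. The function name is illustrative, and the horizon of 100 epochs is chosen only for the example; the rule itself is the one a step scheduler such as PyTorch's `StepLR` applies.

```python
def step_decay(lr0, gamma, step_size, epoch):
    """Learning rate after `epoch` epochs when it is multiplied by gamma every step_size epochs."""
    return lr0 * gamma ** (epoch // step_size)

# Initial rate 0.05, multiplied by 0.8 every 10 epochs.
schedule = [step_decay(0.05, 0.8, 10, epoch) for epoch in range(100)]
```

The rate stays at 0.05 for the first 10 epochs, drops to 0.04 for the next 10, and has decayed by a factor of $0.8^{9}$ by the last 10 epochs of this horizon.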

Taking the system $\mathcal{H}(q,p)=0.5(q^{2}+1)(p^{2}+1)$ as an example, we carry out a series of ablation tests based on our constructed networks to find the proper parameters. Normally, we set the time span, time step, and dataset size as $T=0.01$ , $\textrm{d}t=0.01$ and $N_{s}=1280$ . The choice of $\omega$ in (14) is largely flexible since NSSNN is not sensitive to the parameter $\omega$ when it is larger than a certain threshold. We pick the $L1$ loss function to train our network due to its better performance. In addition, we already introduced a regularization term in the symplectic integrator embedded in the network; thus, there is no need to add a regularization term in the loss function. The integration time step in the symplectic integrator is a vital parameter, and the choice of $\textrm{d}t$ largely depends on the time span $T_{train}$ . In general, we should take a relatively small $\textrm{d}t$ for a dataset with a larger time span $T_{train}$ .

5.2.2 Spring system

We compare five implementations that learn and predict Hamiltonian systems. The first one is NeuralODE [50], which trains the system by embedding the network $\bm{f}_{\theta}\to(\textrm{d}\bm{q}/\textrm{d}t,\textrm{d}\bm{p}/\textrm{d}t)$ into the Runge-Kutta (RK) integrator. The other four, however, achieve the goal by fitting the Hamiltonian $\mathcal{H}_{\theta}\to\mathcal{H}$ based on (6). Specifically, HNN trains the network with the constraints of the Hamiltonian symplectic gradient along with the time derivative of system variables and then embeds the well-trained $\mathcal{H}_{\theta}$ into the RK integrator for predicting the system [15]. The third and fourth implementations are ablation tests. One of them is improved HNN (IHNN), which embeds the well-trained $\mathcal{H}_{\theta}$ into the nonseparable symplectic integrator (Tao’s integrator) for predicting. The other is to directly embed $\mathcal{H}_{\theta}$ into the RK integrator for training, which we call HRK. The fifth method is NSSNN, which embeds $\mathcal{H}_{\theta}$ into the nonseparable symplectic integrator for training.

For a fair comparison, we adopt the same network structure (except that the dimension of the output layer in NeuralODE is two times larger than that in the other four), the same $L1$ loss function, and the same dataset size. All integration schemes are second-order accurate, and the other parameters are consistent with those in Section 5.2.1. The time derivative in the dataset for training HNN and IHNN is obtained by the first difference method

\frac{\textrm{d}\bm{q}}{\textrm{d}t}\approx\frac{\bm{q}(T_{train})-\bm{q}(0)}{T_{train}}~{}~{}~{}~{}\textrm{and}~{}~{}~{}~{}\frac{\textrm{d}\bm{p}}{\textrm{d}t}\approx\frac{\bm{p}(T_{train})-\bm{p}(0)}{T_{train}}.

(68)

Figure 13 demonstrates the differences between the five methods using a spring system $\mathcal{H}=0.5(q^{2}+p^{2})$ with different time spans $T_{train}=0.4,~{}1$ and the same time step $\textrm{d}t=0.2$ . We can see that by introducing the nonseparable symplectic integrator into the prediction of the Hamiltonian system, NSSNN has a stronger long-term predicting ability than all the other methods. In addition, the prediction of HNN and IHNN relies on the dataset with time derivatives; consequently, it leads to a larger error when the given time span $T_{train}$ is large.
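The growth of the derivative error with $T_{train}$ can be made concrete for the spring system, whose exact flow is a rotation in phase space. The sketch below compares the first-difference estimate in (68) with the exact time derivative at $t=0$ ; the initial point $(q_{0},p_{0})=(1,0)$ and the function name are assumptions of the example.

```python
import math

def first_difference_error(T_train, q0=1.0, p0=0.0):
    """Error of (68) for the spring H = 0.5 (q^2 + p^2), whose exact flow is a rotation."""
    qT = q0 * math.cos(T_train) + p0 * math.sin(T_train)
    pT = p0 * math.cos(T_train) - q0 * math.sin(T_train)
    dq_fd, dp_fd = (qT - q0) / T_train, (pT - p0) / T_train
    dq_true, dp_true = p0, -q0        # Hamilton's equations at t = 0
    return math.hypot(dq_fd - dq_true, dp_fd - dp_true)

errors = {T: first_difference_error(T) for T in (0.01, 0.4, 1.0)}
```

The error is roughly $T_{train}/2$ for small time spans and keeps growing for $T_{train}=0.4$ and $T_{train}=1$ , so the derivative labels used by HNN and IHNN become increasingly inaccurate, whereas methods that train through an integrator do not need these labels.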

5.2.3 Modeling vortex dynamics of multi-particle system

For two-dimensional vortex particle systems, the dynamical equations of particle positions $(x_{j},y_{j}),~{}j=1,2,\cdots,N_{v}$ with particle strengths $\Gamma_{j}$ can be written in the generalized Hamiltonian form as

\Gamma_{j}\frac{\textrm{d}x_{j}}{\textrm{d}t}=-\frac{\partial\mathcal{H}^{p}}{\partial y_{j}},\quad\Gamma_{j}\frac{\textrm{d}y_{j}}{\textrm{d}t}=\frac{\partial\mathcal{H}^{p}}{\partial x_{j}},\quad\textrm{with}\quad\mathcal{H}^{p}=\frac{1}{4\pi}\sum_{j,k=1}^{N_{v}}\Gamma_{j}\Gamma_{k}\log(|\bm{x}_{j}-\bm{x}_{k}|),\quad\bm{x}_{j}=(x_{j},y_{j}).

(69)

By including the given particle strengths $\Gamma_{j}$ in Algorithm 1, we can still adopt the method described above to learn the Hamiltonian in (69) when there are only a few particles. However, for a system with $N_{v}\gg 2$ particles, the cost of collecting training data from all $N_{v}$ particles might be high, and the training process can be time-consuming. Thus, instead of collecting information from all $N_{v}$ particles to train our model, we only use data collected from two bodies as training data to predict the dynamics of $N_{v}$ particles.

Specifically, we assume that the interaction models between particle pairs with unit particle strengths $\Gamma_{j}=1$ are the same, and that their corresponding Hamiltonian can be represented as a network $\hat{\mathcal{H}}_{\theta}(\bm{x}_{j},\bm{x}_{k})$, based on which the corresponding Hamiltonian of $N_{v}$ particles can be written as [19, 18]

\mathcal{H}_{\theta}^{p}=\sum_{j,k=1}^{N_{v}}\Gamma_{j}\Gamma_{k}\hat{\mathcal{H}}_{\theta}(\bm{x}_{j},\bm{x}_{k}).

(70)

We embed (70) into the symplectic integrator that includes $\Gamma_{j}$ to obtain the final network architecture.
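The pairwise construction in (70) can be sketched as follows. Here `pair_h` is a stand-in for the trained network $\hat{\mathcal{H}}_{\theta}$ and simply evaluates the analytic kernel of (69); the velocities follow from the generalized Hamiltonian form, with central differences used in place of automatic differentiation. Skipping the self-interaction terms $j=k$ and all function names are assumptions made for this illustration.

```python
import math

def pair_h(xj, xk):
    """Stand-in for the learned pair Hamiltonian (analytic kernel of (69))."""
    return math.log(math.dist(xj, xk)) / (4.0 * math.pi)

def total_h(points, gammas):
    """Weighted sum of the pair Hamiltonian over all ordered pairs, as in (70)."""
    total = 0.0
    for j, xj in enumerate(points):
        for k, xk in enumerate(points):
            if j != k:  # assumption: self-interaction terms are skipped
                total += gammas[j] * gammas[k] * pair_h(xj, xk)
    return total

def velocities(points, gammas, h=1e-5):
    """Particle velocities from (69), using central differences of total_h."""
    def shifted(j, axis, delta):
        moved = [list(pt) for pt in points]
        moved[j][axis] += delta
        return total_h(moved, gammas)

    result = []
    for j, gamma in enumerate(gammas):
        dh_dx = (shifted(j, 0, h) - shifted(j, 0, -h)) / (2.0 * h)
        dh_dy = (shifted(j, 1, h) - shifted(j, 1, -h)) / (2.0 * h)
        # Gamma_j dx_j/dt = -dH/dy_j and Gamma_j dy_j/dt = dH/dx_j
        result.append((-dh_dy / gamma, dh_dx / gamma))
    return result

# Two unit-strength vortices one unit apart rotate about their midpoint.
print(velocities([(0.0, 0.0), (1.0, 0.0)], [1.0, 1.0]))
```

Because the pair model is evaluated independently for every pair, the same function serves any number of particles, which is what allows a model trained on two-body data to be applied to $N_{v}$ particles.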

The setup of the multi-particle problem is similar to that of the previous problems. The training time span is $T_{train}=0.01$, while the prediction period can be up to $T_{predict}=40$. We use 2048 clean data samples to train our model, and the loss converges after about 100 epochs. In Figure 14, we use our trained model to predict the dynamics of 6000-particle systems, including Taylor and Leapfrog vortices. We generate results for the Taylor vortex and the Leapfrog vortex using NSSNN and HNN and compare them with the ground truth. Vortex elements are initialized with the corresponding initial vorticity conditions of the Taylor vortex and the Leapfrog vortex [68]. The difficulty in numerically modeling these two systems lies in keeping the distinct dynamical vortices separated rather than letting them merge into a larger structure. In both cases, the vortices evolved using NSSNN remain well separated, as in the ground truth, while the vortices evolved using HNN merge together.

5.3 RoeNet

5.3.1 Dataset generation and training settings

For our experiments, we construct datasets using either analytical solutions or numerical solutions calculated with a high-resolution finite difference method. These datasets are then divided into training and validation sets in a $9:1$ ratio. The physical quantities solved in our experiments are of order $O(1)$ and, consequently, do not require normalization.

We train the network over a time span defined as $T_{train}$ and use it to predict target values over a time span of $T_{predict}$ , where $T_{predict}>T_{train}$ and $T_{predict}$ starts no earlier than $T_{train}$ .

In all experiments, the Adam optimizer [48] is employed, with a learning rate of $\gamma$ as listed in Table 3. The learning rate decays by a multiplicative factor of 0.9 every 5 to 20 epochs. This optimizer is chosen for its ability to adapt learning rates based on the gradient history of each parameter, which facilitates faster and more precise convergence compared to methods with fixed learning rates. Training is conducted with batch sizes ranging from 8 to 32, and all models undergo 100 epochs to ensure convergence. Notably, extending the number of training epochs can enhance training accuracy, reflecting a trade-off between training time and accuracy.
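As a small illustration of the decay rule, the sketch below evaluates a learning rate that is multiplied by 0.9 once every fixed number of epochs. The base rate `1e-3` and the step size of 10 epochs are placeholder assumptions, and the function name is hypothetical.

```python
def stepped_learning_rate(base_lr, epoch, step_size, factor=0.9):
    """Learning rate after multiplying by `factor` once every `step_size` epochs."""
    return base_lr * factor ** (epoch // step_size)

# Hypothetical settings: base rate 1e-3, one decay every 10 epochs, 100 epochs.
schedule = [stepped_learning_rate(1e-3, epoch, step_size=10) for epoch in range(100)]
print(schedule[0], schedule[-1])
```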

Table 3: Experimental set-up for RoeNet. Source: [4].

	1C Linear	Sod Tube
Boundary condition	Periodic	Neumann
Time step $\Delta t$	0.02	0.001
Space step $\Delta x$	0.01	0.005
Training time span	0.04	0.06
Predicting time span $>$	2	0.1
Data set samples	500	2000
Data set generation	Analytical	Analytical
Components number $N_{c}$	1	3
Hidden dimension $N_{h}$	1	64

5.3.2 A simple example

Taking a linear hyperbolic PDE with one component (1C Linear in Table 3)

\begin{dcases}\bm{F}=u,\\ u(t=0,x)=e^{-300x^{2}}\end{dcases}

(71)

in (18) as an example, we evaluate the performance of RoeNet. This hyperbolic PDE models a Gaussian wave traveling along a line at constant speed. Figure 15 illustrates the propagation of this Gaussian wave over time, simulated using RoeNet with both clean and noisy training data sets, alongside results from the Roe solver and the analytical solution. RoeNet’s predictions, regardless of noise in the training data, align closely with the analytical results throughout the entire computational time domain. In contrast, simulations using the Roe solver show rapid flattening and dissipation of the wave over time. Although the prediction error of RoeNet does accumulate gradually, this increase in numerical error is significantly slower than that observed with traditional numerical methods. As a result, RoeNet demonstrates superior performance with its more accurate predictions.
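The flattening produced by the Roe solver can be reproduced in a few lines. For the scalar linear flux $\bm{F}=u$, the Roe flux reduces to first-order upwinding, which is known to be dissipative. The sketch below assumes that (18) has the conservation form $\partial u/\partial t+\partial\bm{F}/\partial x=0$, a periodic unit domain, and a time step chosen so that the CFL number is 0.5; these choices, and the function names, are illustrative and are not the settings of Table 3.

```python
import math

def roe_flux(u_left, u_right, speed=1.0):
    """Roe flux for the scalar linear flux F = speed * u (reduces to upwinding)."""
    return 0.5 * speed * (u_left + u_right) - 0.5 * abs(speed) * (u_right - u_left)

def advect(u, dx, dt, steps):
    """First-order finite-volume update with periodic boundaries."""
    n = len(u)
    for _ in range(steps):
        # flux[i] is the flux through the interface between cells i and i + 1
        flux = [roe_flux(u[i], u[(i + 1) % n]) for i in range(n)]
        u = [u[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(n)]
    return u

dx, dt = 0.01, 0.005  # assumed grid; CFL number dt / dx = 0.5
x = [-0.5 + i * dx for i in range(100)]
u0 = [math.exp(-300.0 * xi * xi) for xi in x]
u_end = advect(u0, dx, dt, steps=400)  # t = 2: the wave crosses the domain twice
print(max(u0), max(u_end))
```

The total mass is conserved, but the peak of the Gaussian decays substantially by $t=2$, which is the dissipation that a learned scheme has to avoid.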

5.3.3 Sod shock tube

We use the one-dimensional diatomic ideal gas problem to assess the performance of our model in solving multi-component Riemann problems with nonlinear flux functions (Sod Tube in Table 3). Specifically, the system is modeled by (18) with

\begin{dcases}\bm{u}=(\rho,\rho v,e)^{T},\\ \bm{F}=[\rho v,\rho v^{2}+p,v(e+p)]^{T},\end{dcases}

(72)

where $\rho$ is the density, $p$ is the pressure, $e$ is the energy, $v$ is the velocity, and the pressure $p$ is related to the conserved quantities through the equation of state $p=(\gamma-1)\left(e-0.5\rho v^{2}\right)$ with a heat capacity ratio $\gamma\approx 1.4$ . We apply our model to the Sod shock tube problem [69], a one-dimensional Riemann problem in the form of (18) with (72). The time evolution of this problem can be described by solving the mass, momentum, and energy conservation of ideal gas inside a slender tube, which leads to three characteristics, describing the propagation speed of various regions in the system [69]. In Figure 16, we plot the three components of the problem, at $t=0.1$ . Note that due to the dissipation effects incorporated in our model, there is no sign of sonic glitch. The result shows that RoeNet exhibits higher accuracy in predicting the discontinuities of the nonlinear Riemann problem.
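For reference, the flux function in (72) and the equation of state can be written out directly. The sketch below evaluates them for the standard initial states of Sod's problem, $(\rho,v,p)=(1,0,1)$ on the left and $(0.125,0,0.1)$ on the right; using these values here is an assumption, since the thesis does not list its initial condition, and the function names are hypothetical.

```python
GAMMA = 1.4  # heat capacity ratio of a diatomic ideal gas

def pressure(rho, momentum, energy):
    """Equation of state p = (gamma - 1) * (e - 0.5 * rho * v**2)."""
    v = momentum / rho
    return (GAMMA - 1.0) * (energy - 0.5 * rho * v * v)

def euler_flux(rho, momentum, energy):
    """Flux F = (rho * v, rho * v**2 + p, v * (e + p)) of the conserved variables."""
    v = momentum / rho
    p = pressure(rho, momentum, energy)
    return (momentum, momentum * v + p, v * (energy + p))

def conserved(rho, v, p):
    """Conserved variables u = (rho, rho * v, e) from primitive variables."""
    return (rho, rho * v, p / (GAMMA - 1.0) + 0.5 * rho * v * v)

# Standard Sod states (assumed here): left and right of the initial discontinuity.
left = conserved(1.0, 0.0, 1.0)
right = conserved(0.125, 0.0, 0.1)
print(euler_flux(*left), euler_flux(*right))
```

With the gas at rest on both sides, only the momentum flux is nonzero and it equals the pressure, so the pressure jump alone drives the initial motion.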

5.3.4 Comparison with other methods

Current neural network methods, such as Physics-Informed Neural Networks (PINNs) [27], typically require a pre-established PDE model and continuous interaction with this model during training to adjust the loss, and they often use quasi-Newton optimizers such as L-BFGS, which can result in extended training durations. In contrast, RoeNet operates independently of any explicit equation knowledge, utilizing only the training datasets and relying on more efficient gradient-based optimizers such as SGD.

Conventional neural networks struggle to predict the emergence and evolution of discontinuous solutions without a governing equation. Our model, RoeNet, showcases a unique capability to handle tasks that traditional machine learning approaches cannot, particularly in predicting dynamics for future times not included in the training data. This is demonstrated in Figure 17, where RoeNet outperforms PINNs [70] in the simulation of the 1C Linear problem described in Section 5.3.2, providing accurate predictions for future states beyond the training scope.

RoeNet, as a data-driven solver, does not require prior knowledge of the system’s evolution equations, setting it apart from traditional numerical methods. It employs an optimization-based approach to construct its numerical scheme, with an optimization space that fully encompasses that of the Roe solver. This enables RoeNet to deliver more precise simulations of PDE evolution compared to conventional numerical approaches.

5.4 NVM

5.4.1 Dataset generation and training settings

We randomly sample 2 to 6 vortices and create the initial vorticity field through convolution with a Gaussian kernel $\sim\mathcal{N}(0,0.01)$ . This process is repeated 2000 times to generate $N_{s}=2000$ samples. DNS is performed to solve (31) in the periodic box using a standard pseudo-spectral method [71]. Aliasing errors are removed using the two-thirds truncation method with the maximum wavenumber $k_{\max}\approx N/3$ . The Fourier coefficients of the velocity are advanced in time using a second-order Adams–Bashforth method. The time step is chosen to ensure that the Courant–Friedrichs–Lewy number is less than $0.5$ for numerical stability and accuracy. To obtain accurate DNS data samples, we set the grid size as $N=1024$ . Regarding the kinematic viscosity, we set $\nu=0$ and $\nu=0.001$ for different cases. The pseudo-spectral method used in this DNS is similar to that described in [61, 72, 73].

We use $N_{train}=0.9N_{s}=1600$ samples with the time span $T_{train}$ for the training of the dynamics network. The DNS dataset is generated with random initial conditions independent of the predicted vortex evolution. The time step of vortex evolution is set as $\textrm{d}t$. For the leapfrog example, we set the parameters as $T_{train}=1$ and $\textrm{d}t=0.001$. For the turbulent flow example, we set the parameters as $T_{train}=0.001$ and $\textrm{d}t=0.001$. For other examples, the parameters are set as $T_{train}=0.2$ and $\textrm{d}t=0.1$. In general, the parameters are chosen within a wide range, indicating the robustness of the network. We use the trained network to predict the vortex dynamics at time $T_{predict}$. In the results section, we show that the prediction time span $T_{predict}$ can be much larger than the training time span $T_{train}$, in some cases by a factor of tens.

For both the detection network and the dynamics network, we use the Adam optimizer [48] with a learning rate of 1e-3. The learning rate decays every 20 epochs by a multiplicative factor of 0.8. For the detection network, we use a batch size of 32 and train it for 350 epochs. We use the cross entropy as the classification loss and the L1 loss for position prediction. To alleviate the class imbalance problem in the detection network, we apply the focal loss [65] with $\alpha=0.4$ and $\gamma=2$. Training takes 15 minutes to converge on a single Nvidia RTX 2080Ti GPU. For the dynamics network, we use a batch size of 64 and train it for 500 epochs. We use the L1 loss for position prediction. Training takes 25 minutes to converge on a single Nvidia RTX 2080Ti GPU.
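The effect of the focal loss on imbalanced data can be seen from its per-sample form. The sketch below implements the binary focal loss of [65] with $\alpha=0.4$ and $\gamma=2$ and compares it with plain cross entropy. Treating the classification as binary, applying $\alpha$ to the positive class, and the function names are assumptions for illustration; the exact form used in the detection network may differ.

```python
import math

def cross_entropy(prob, target):
    """Binary cross entropy for one sample; prob is the predicted P(target = 1)."""
    p_t = prob if target == 1 else 1.0 - prob
    return -math.log(p_t)

def focal_loss(prob, target, alpha=0.4, gamma=2.0):
    """Binary focal loss: cross entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    p_t = prob if target == 1 else 1.0 - prob
    alpha_t = alpha if target == 1 else 1.0 - alpha  # assumed class weighting
    return alpha_t * (1.0 - p_t) ** gamma * cross_entropy(prob, target)

# An easy positive (p = 0.9) is down-weighted far more than a hard one (p = 0.1).
for prob in (0.9, 0.1):
    print(prob, focal_loss(prob, 1) / cross_entropy(prob, 1))
```

Well-classified samples therefore contribute little to the gradient, which keeps the abundant easy samples from dominating the training signal.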

5.4.2 Comparison between NVM and LVM

To demonstrate that NVM captures fluid dynamics better than the traditional LVM, we compare the prediction results of the NVM and the LVM for solving the NS equations in the periodic box. In the prediction, we initialize two vortex particles at $\bm{X}_{1}=(\pi-0.4,\pi-0.6)$ and $\bm{X}_{2}=(\pi+0.4,\pi+0.6)$, with corresponding particle strengths $\Gamma_{1}=0.75$ and $\Gamma_{2}=0.75$. We plot the results of the NVM, the results of the LVM, and the relative error of velocity in the simulation in Figure 18 (a), (b), and (c), respectively. Here, the relative error of velocity is defined as

\epsilon_{u}=\frac{\|\bm{u}_{predict}-\bm{u}_{true}\|_{L^{2}}}{\|\bm{u}_{true}\|_{L^{2}}},

(73)

where $\bm{u}_{predict}$ denotes the predicted or simulated solution and $\bm{u}_{true}$ denotes the ground truth solution.
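On a uniform grid, the relative error in (73) can be evaluated as a ratio of discrete norms, since the cell area cancels. A minimal sketch, assuming each velocity field is stored as a list of $(u_{x},u_{y})$ samples (a hypothetical layout):

```python
import math

def l2_norm(field):
    """Discrete L2 norm of a vector field sampled on a uniform grid."""
    return math.sqrt(sum(ux * ux + uy * uy for ux, uy in field))

def relative_l2_error(u_predict, u_true):
    """Relative velocity error as in (73)."""
    diff = [(px - tx, py - ty) for (px, py), (tx, ty) in zip(u_predict, u_true)]
    return l2_norm(diff) / l2_norm(u_true)

u_true = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]
u_pred = [(1.1, 0.0), (0.0, 1.9), (-1.0, 1.0)]
print(relative_l2_error(u_pred, u_true))
```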

In Figure 18 (a), the predictions made by NVM match the vortex positions generated by DNS almost perfectly, while the predictions made by the BS law in Figure 18 (b) contain a large error. The divergence of the relative error of velocity is shown in Figure 18 (c): the advantage of NVM over the traditional method increases as the prediction period becomes longer.

5.4.3 Turbulent flows

Besides simple systems, NVM is capable of predicting complicated turbulence systems. This example’s primary purpose is to illustrate our network’s ability to handle more complex problems.

Figure 19 depicts the two-dimensional Lagrangian scalar fields at $t=1$ with the initial condition $\phi=x$ and resolution $2000^{2}$ . The governing equation of the Lagrangian scalar fields is

\frac{\partial\phi}{\partial t}+\bm{u}\cdot\bm{\nabla}\phi=0.

(74)

The evolution of the Lagrangian scalar fields is induced by $O(10)$ and $O(100)$ NVM vortex particles at random positions $\sim U(0,4)$ with random strengths $\sim U(0,2)$. We remark that the same trained model is used for both cases. There is no correlation between the positions and vortex particle strengths of the two sets of vortex particles.

Based on the particle velocity field from the NVM, a backward-particle-tracking method is applied to solve (74). Then the iso-contour of the Lagrangian field can be extracted as material structures in the evolution [74, 75, 76, 77, 78]. In Figure 19 (a), the spiral structure [79, 80] of individual NVM vortex particles can be observed clearly due to the small number of NVM vortex particles. In Figure 19 (b), the underlying field exhibits turbulent behaviors since it is generated with a large number of NVM vortex particles.
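The idea of backward particle tracking is that (74) keeps $\phi$ constant along particle paths, so $\phi(\bm{x},t)$ equals the initial value at the point from which the particle started. The sketch below traces a point back in time with a midpoint RK2 step and then evaluates the initial condition $\phi=x$. A steady rigid rotation is used as a placeholder velocity field; in the thesis the velocity comes from the NVM and varies in time, and the function names and step count are assumptions.

```python
import math

def velocity(x, y):
    """Placeholder velocity field: rigid rotation with unit angular speed."""
    return -y, x

def trace_back(x, y, t, steps=1000):
    """Integrate dX/ds = -u(X) for a duration t to find the starting position."""
    dt = t / steps
    for _ in range(steps):
        u1, v1 = velocity(x, y)
        xm, ym = x - 0.5 * dt * u1, y - 0.5 * dt * v1  # midpoint of the backward step
        u2, v2 = velocity(xm, ym)
        x, y = x - dt * u2, y - dt * v2
    return x, y

def scalar_field(x, y, t):
    """Solution of (74) at (x, y, t) for the initial condition phi = x."""
    x0, _ = trace_back(x, y, t)
    return x0

print(scalar_field(1.0, 0.0, 1.0))
```

Each grid point is traced independently, so the field can be evaluated at any resolution without storing a grid-based velocity history, at the cost of one trajectory per point.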

Generally, high-resolution results such as those in Figure 19 can only be achieved with grid-based methods on supercomputers [74], whereas NVM allows them to be generated on a laptop with a GPU. This demonstrates that NVM can generate an accurate depiction of complex turbulence systems at low computational cost.

6 Conclusion

6.1 Summary

This thesis introduces a novel data-driven framework, which demonstrates a significant advancement in predictive modeling for long-term forecasts by integrating physics-based priors into learning algorithms. This integration ensures intrinsic preservation of the physical structures of the systems analyzed, thereby maintaining mathematical symmetries and physical conservation laws. As a result, the models demonstrate superior performance in terms of prediction accuracy, robustness, and predictive capability, particularly in identifying patterns not present within the training dataset, despite the use of small datasets, short training periods, and small sample sizes.

In particular, we have developed four distinct algorithms, each designed to incorporate specific physics-based priors relevant to different types of nonlinear systems. These include the symplectic structure for both separable and nonseparable Hamiltonian systems, Hyperbolic Conservation Law for hyperbolic partial differential equations, and Helmholtz’s Theorem for incompressible fluid dynamics. The integration of physics-based priors not only narrows the solution space, thereby streamlining computational demands, but also enhances the reliability and validity of the predictions. Moreover, embedding these structures within neural networks significantly expands their capacity to capture and reproduce complex patterns inherent in physical phenomena, which conventional networks often fail to recognize. This expanded capability allows for a more comprehensive representation of potential physical behaviors, substantially improving the models’ applicability and predictive accuracy.

6.2 Limitations and Future Work

We also recognize that our models have several limitations. Firstly, neural networks that include an embedded integrator often require a longer training period compared to those trained on datasets with explicit time derivatives. Secondly, our method employs an explicit scheme for time evolution, which necessitates a small time step to ensure accuracy. Although a smaller time step can lead to higher discretization accuracy, this advantage must be weighed against increased training costs and the risk of gradient explosion. In our future work, we are considering the adoption of implicit schemes, such as leveraging RNN structures, which may offer more stability and efficiency. In addition, our current model is designed as an end-to-end system that does not account for environmental variability. To address this issue, we will explore online learning techniques to enhance the model’s adaptability in changing conditions. Lastly, to enhance the applicability of our model, a significant focus of our future research will be dedicated to developing scalable methods that can be generalized to various PDEs, aiming to achieve a versatile and universally applicable framework for various systems.

References

Tong et al. [2021] Yunjin Tong, Shiying Xiong, Xingzhe He, Guanghan Pan, and Bo Zhu. Symplectic neural networks in taylor series form for hamiltonian systems. Journal of Computational Physics, 437:110325, 2021.
Xiong et al. [2020] Shiying Xiong, Yunjin Tong, Xingzhe He, Shuqi Yang, Cheng Yang, and Bo Zhu. Nonseparable symplectic neural networks. arXiv preprint arXiv:2010.12636, 2020.
Xiong et al. [2023] Shiying Xiong, Xingzhe He, Yunjin Tong, Yitong Deng, and Bo Zhu. Neural vortex method: From finite lagrangian particles to infinite dimensional eulerian dynamics. Computers & Fluids, 258:105811, 2023.
Tong et al. [2024] Yunjin Tong, Shiying Xiong, Xingzhe He, Shuqi Yang, Zhecheng Wang, Rui Tao, Runze Liu, and Bo Zhu. Roenet: Predicting discontinuity of hyperbolic systems from continuous data. International Journal for Numerical Methods in Engineering, 125(6):e7406, 2024.
Weinan [2021] E Weinan. The dawning of a new era in applied mathematics. Notices of the American Mathematical Society, 68(4):565–571, 2021.
Brunton et al. [2020] S. L. Brunton, B. R. Noack, and P. Koumoutsakos. Machine Learning for Fluid Mechanics. Annu. Rev. Fluid Mech., 52:477–508, 2020.
Hughes et al. [2019] T. W. Hughes, I. A. D. Williamson, M. Minkov, and S. Fan. Wave physics as an analog recurrent neural network. Sci. Adv., 5:6946, 2019.
Sellier et al. [2019] J. M. Sellier, G. M. Caron, and J. Leygonie. Signed particles and neural networks, towards efficient simulations of quantum systems. J. Comput. Phys., 387:154–162, 2019.
Hernandez et al. [2020] Quercus Hernandez, Alberto Badias, David Gonzalez, Francisco Chinesta, and Elias Cueto. Structure-preserving neural networks. arXiv:2004.04653, 2020.
Teichert et al. [2019] G. H. Teichert, A. R. Natarajan, A. Van der Ven, and K. Garikipati. Machine learning materials physics: Integrable deep neural networks enable scale bridging by learning free energy functions. Comput. Methods Appl. Mech. Engrg., 353:201–216, 2019.
Regazzoni et al. [2019] F Regazzoni, L Dedé, and A Quarteroni. Machine learning for fast and reliable solution of time-dependent differential equations. J. Comput. Phys., 397:108852, 2019.
Raissi and Karniadakis [2018] M. Raissi and G. E. Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential equations. J. Comput. Phys., 357:125–141, 2018.
Sirignano and Spiliopoulos [2018] J. Sirignano and K. Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys., 375:1339–1364, 2018.
Raissi et al. [2019] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys., 378:686–707, 2019.
Greydanus et al. [2019] S. Greydanus, M. Dzamba, and J. Yosinski. Hamiltonian neural networks. In Conference on Neural Information Processing Systems, pages 15379–15389, 2019.
Chen et al. [2020] Z. Chen, J. Zhang, M. Arjovsky, and L. Bottou. Symplectic recurrent neural networks. In International Conference on Learning Representations, 2020.
DiPietro et al. [2020] D. DiPietro, S. Xiong, and B. Zhu. Sparse symplectically integrated neural networks. In Advances in Neural Information Processing Systems, 2020.
Sanchez-Gonzalez et al. [2019] A. Sanchez-Gonzalez, V. Bapst, K. Cranmer, and P. Battaglia. Hamiltonian graph networks with ODE integrators. arXiv:1909.12790, 2019.
Battaglia et al. [2016] P. Battaglia, R. Pascanu, M. Lai, and D. J. Rezende. Interaction networks for learning about objects, relations and physics. In Advances in Neural Information Processing Systems, pages 4502–4510, 2016.
Jin et al. [2020] P. Jin, A. Zhu, G. E. Karniadakis, and Y. Tang. Symplectic networks: intrinsic structure-preserving networks for identifying Hamiltonian systems. arXiv:2001.03750, 2020.
Toth et al. [2020] P. Toth, D. J. Rezende, A. Jaegle, S. Racaniére, A. Botev, and I. Higgins. Hamiltonian generative networks. In International Conference on Learning Representations, 2020.
Zhong et al. [2020] Y. D. Zhong, B. Dey, and A. Chakraborty. Symplectic ODE-Net: learning Hamiltonian dynamics with control. In International Conference on Learning Representations, 2020.
Yarotsky [2017] D. Yarotsky. Error bounds for approximations with deep ReLU networks. Neural Netw., 94:103–114, 2017.
Petersen and Voigtländer [2018] P. Petersen and F. Voigtländer. Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Netw., 170:296–330, 2018.
Imaizumi and Fukumizu [2019] M. Imaizumi and K. Fukumizu. Deep learning networks learn non-smooth functions effectively. In The Institute of Statistical Mathematics, pages 869–878. The 22nd International Conference on Artificial Intelligence and Statistics, 2019.
Suzuki [2019] T. Suzuki. Adaptivity of deep relu network for learning in besov and mixed smooth besov spaces: Optimal rate and curse of dimensionality. In The University of Tokyo. International Conference on Learning Representations, 2019.
Raissi et al. [2017] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Inferring solutions of differential equations using noisy multi-fidelity data. J. Comput. Phys., 335:736–746, 2017.
Hornik et al. [1989] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Netw., 2:359–366, 1989.
Zhang et al. [2019] D. Zhang, L. Guo, and G. E. Karniadakis. Learning in modal space: solving time-dependent stochastic PDEs using physics-informed neural networks. SIAM J. Sci. Comput., 42:A639–A665, 2019.
Michoski et al. [2019] C. Michoski, M. Milosavljevic, T. Oliver, and D. Hatch. Solving differential equations using deep neural networks. Neurocomputing, 399:193–212, 2019.
Mao et al. [2020] Z. Mao, A. D. Jagtap, and G. E. Karniadakis. Physics-informed neural networks for high-speed flows. Comput. Method. Appl. M., 360:112789, 2020.
Duraisamy et al. [2019] K. Duraisamy, G. Iaccarino, and H. Xiao. Turbulence modeling in the age of data. Annu. Rev. Fluid Mech., 51:357–377, 2019.
Xie et al. [2018] Y. Xie, E. Franz, M. Chu, and N. Thuerey. tempogan: A temporally coherent, volumetric gan for super-resolution fluid flow. ACM Trans. Graph., 37(4):1–15, 2018.
Chu and Thuerey [2017] M. Chu and N. Thuerey. Data-driven synthesis of smoke flows with cnn-based feature descriptors. ACM Trans. Graph., 36(4):1–14, 2017.
Anderson et al. [1996] J. Anderson, I. Kevrekidis, and R. Rico-Martinez. A comparison of recurrent training algorithms for time series analysis and system identification. Comput. Chem. Eng., 20:S751–S756, 1996.
Crutchfield and McNamara [1987] James P Crutchfield and Bruce S McNamara. Equations of motion from a data series. Complex Syst., 1(417-452):121, 1987.
Daniels and Nemenman [2015] Bryan C Daniels and Ilya Nemenman. Automated adaptive inference of phenomenological dynamical models. Nat. Commun., 6(1):1–8, 2015.
Wang et al. [2017] J. Wang, J. Wu, and H. Xiao. Physics-informed machine learning approach for reconstructing reynolds stress modeling discrepancies based on dns data. Phys. Rev. Fluids, 2(3):034603, 2017.
Hammond et al. [2022] J. Hammond, F. Montomoli, M. Pietropaoli, R. D. Sandberg, and V. M. Machine learning for the development of data-driven turbulence closures in coolant systems. J. Turbomach., 144(8):081003, 2022.
Xu et al. [2022] X. Xu, F. Waschkowski, A. S. Ooi, and R. D. Sandberg. Towards robust and accurate reynolds-averaged closures for natural convection via multi-objective cfd-driven machine learning. Int. J. Heat Mass Transf., 187:122557, 2022.
Mohan et al. [2020a] A. T. Mohan, N. Lubbers, D. Livescu, and M. Chertkov. Embedding hard physical constraints in convolutional neural networks for 3D turbulence. In International Conference on Learning Representations, 2020a.
Yang et al. [2019] X. Yang, S. Zafar, J. Wang, and H. Xiao. Predictive large-eddy-simulation wall modeling via physics-informed neural networks. Phys. Rev. Fluids, 4:034602, 2019.
Raissi et al. [2020] M. Raissi, A. Yazdani, and G. E. Karniadakis. Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science, 367(6481):1026–1030, 2020.
Belbute-Peres et al. [2020] F. Belbute-Peres, T. Economon, and Z. Kolter. Combining differentiable pde solvers and graph neural networks for fluid flow prediction. In International Conference on Machine Learning, pages 2402–2411, 2020.
Lye et al. [2020] K. Lye, S. Mishra, and D. Ray. Deep learning observables in computational fluid dynamics. J. Comput. Phys., 410:109339, 2020.
White et al. [2019] Cristina White, Daniela Ushizima, and Charbel Farhat. Neural networks predict fluid dynamics solutions from tiny datasets. arXiv preprint arXiv:1902.00091, 2019.
Mohan et al. [2020b] Arvind T Mohan, Nicholas Lubbers, Daniel Livescu, and Michael Chertkov. Embedding hard physical constraints in neural network coarse-graining of 3d turbulence. arXiv preprint arXiv:2002.00021, 2020b.
Kingma and Ba [2014] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
He et al. [2016] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
Chen et al. [2018] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud. Neural ordinary differential equations. In Conference on Neural Information Processing Systems, pages 6571–6583, 2018.
Pontryagin [2018] Lev Semenovich Pontryagin. Mathematical theory of optimal processes. Routledge, 2018.
Forest and Ruth [1990] E. Forest and R. D. Ruth. Fourth-order symplectic integration. Physica D, 43:105–117, 1990.
Yoshida [1990] H. Yoshida. Construction of higher order symplectic integrators. Phys. Lett. A, 150:262–268, 1990.
Candy and Rozmus [1991] J. Candy and W. Rozmus. A symplectic integration algorithm for separable Hamiltonian functions. J. Comput. Phys., 92:230–256, 1991.
Tao [2016] Molei Tao. Explicit symplectic approximation of nonseparable hamiltonians: Algorithm and long time performance. Physical Review E, 94(4):043303, 2016.
Wu et al. [2015] J. Z. Wu, H. Y. Ma, and M. D. Zhou. Vortical Flows. Springer, 2015.
Evans [2010] L. C. Evans. Partial Differential Equations. American Mathematical Society, 2 edition, 2010.
Roe [1981] P. L. Roe. Approximate riemann solvers, parameter vectors and difference schemes. J. Comput. Phys., 43:357–372, 1981.
Helmholtz [1858] H. Helmholtz. Über Integrale der hydrodynamischen Gleichungen, welche den Wirbelbewegungen entsprechen. J. Reine Angew. Math., 55:25–55, 1858.
Yang and Pullin [2010] Y. Yang and D. I. Pullin. On Lagrangian and vortex-surface fields for flows with Taylor–Green and Kida–Pelz initial conditions. J. Fluid Mech., 661:446–481, 2010.
Xiong and Yang [2017] S. Xiong and Y. Yang. The boundary-constraint method for constructing vortex-surface fields. J. Comput. Phys., 339:31–45, 2017.
Hao et al. [2019] J. Hao, S. Xiong, and Y. Yang. Tracking vortex surfaces frozen in the virtual velocity in non-ideal flows. J. Fluid Mech., 863:513–544, 2019.
Cottet and Koumoutsakos [2000] G.H. Cottet and P.D. Koumoutsakos. Vortex Methods: Theory and Practice. Cambridge University Press, 2000.
Redmon et al. [2016] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
Lin et al. [2017] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal loss for dense object detection. IEEE Trans. Vis. Comput. Graph., pages 2980–2988, 2017.
Paszke et al. [2019] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8026–8037, 2019.
Glorot and Bengio [2010] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249–256, 2010.
Qu et al. [2019] Z. Qu, X. Zhang, M. Gao, C. Jiang, and B. Chen. Efficient and conservative fluids using bidirectional mapping. ACM Trans. Graph., 38:1–12, 2019.
Sod [1978] G. A. Sod. A survey of several finite difference methods for systems of nonlinear hyperbolic conservation laws. J. Comput. Phys., 27:1–31, 1978.
Lu et al. [2019] L. Lu, X. Meng, Z. Mao, and G. E. Karniadakis. DeepXDE: A deep learning library for solving differential equations. SIAM Rev. Soc. Ind. Appl. Math., 63:208–228, 2019.
Rogallo [1981] R. S. Rogallo. Numerical experiments in homogeneous turbulence. In Technical Report TM81315, NASA, 1981.
Xiong and Yang [2019] S. Xiong and Y. Yang. Construction of knotted vortex tubes with the writhe-dependent helicity. Phys. Fluids, 31:047101, 2019.
Xiong and Yang [2020] S. Xiong and Y. Yang. Effects of twist on the evolution of knotted magnetic flux tubes. J. Fluid Mech., 895:A28, 2020.
Yang et al. [2010] Y. Yang, D. I. Pullin, and I. Bermejo-Moreno. Multi-scale geometric analysis of Lagrangian structures in isotropic turbulence. J. Fluid Mech., 654:233–270, 2010.
Yang and Pullin [2011] Y. Yang and D. I. Pullin. Geometric study of Lagrangian and Eulerian structures in turbulent channel flow. J. Fluid Mech., 674:67–92, 2011.
Zhao et al. [2016] Y. Zhao, Y. Yang, and S. Chen. Evolution of material surfaces in the temporal transition in channel flow. J. Fluid Mech., 793:840–876, 2016.
Zheng et al. [2016] W. Zheng, Y. Yang, and S. Chen. Evolutionary geometry of Lagrangian structures in a transitional boundary layer. Phys. Fluids, 28:035110, 2016.
Zheng et al. [2019] W. Zheng, S. Ruan, Y. Yang, L. He, and S. Chen. Image-based modelling of the skin-friction coefficient in compressible boundary-layer transition. J. Fluid Mech., 875:1175–1203, 2019.
Lundgren [1982] T. S. Lundgren. Strained spiral vortex model for turbulent fine structure. Phys. Fluids, 25:2193–2203, 1982.
Lundgren [1993] T. S. Lundgren. A small-scale turbulence model. Phys. Fluids A, 5:1472, 1993.