Data-Driven Computing Methods for Nonlinear Physics Systems with Geometric Constraints

Yunjin Tong

Undergraduate Computer Science Thesis

Advised by

Professor Bo Zhu

Dartmouth College

Hanover, New Hampshire

June, 2024

Abstract

In a landscape where scientific discovery is increasingly driven by data, the integration of machine learning (ML) with traditional scientific methodologies has emerged as a transformative approach. This paper introduces a novel, data-driven framework that synergizes physics-based priors with advanced ML techniques to address the computational and practical limitations inherent in first-principle-based methods and brute-force machine learning methods. Our framework showcases four algorithms, each embedding a specific physics-based prior tailored to a particular class of nonlinear systems, including separable and nonseparable Hamiltonian systems, hyperbolic partial differential equations, and incompressible fluid dynamics. The intrinsic incorporation of physical laws preserves the system’s intrinsic symmetries and conservation laws, ensuring solutions are physically plausible and computationally efficient. The integration of these priors also enhances the expressive power of neural networks, enabling them to capture complex patterns typical in physical phenomena that conventional methods often miss. As a result, our models outperform existing data-driven techniques in terms of prediction accuracy, robustness, and predictive capability, particularly in recognizing features absent from the training set, despite relying on small datasets, short training periods, and small sample sizes.

Acknowledgements

I am deeply grateful for the generous financial support from Dartmouth Undergraduate Advising & Research, which provided me with the Presidential Scholarship, Sophomore and Junior Research Scholarships, the Leave Term Research Grant, and support through the Women in Science Project. Additionally, I wish to acknowledge the Neukom Scholarship program from the Neukom Institute for Computational Science.

I extend my heartfelt thanks to my supervisor, Professor Bo Zhu, for his unwavering support and profound inspiration. I am especially grateful for the opportunities he offered me as a first-year student, which opened my career as a researcher. I wish him a prolific and successful career at Georgia Tech.

I also recognize the invaluable assistance of the team at the Dartmouth Visual Computing Lab. Special thanks to Dr. Shiying Xiong, now an Assistant Professor at Zhejiang University, for his extensive help in various aspects of my research. He is not only a highly talented researcher in computational physics but also a remarkable coworker. Additional thanks go to Xingzhe He for his assistance with deep learning algorithms, among many others in the lab. Without their collective effort and collaboration, these works would not have been possible.

I also want to thank Professor Deeparnab Chakrabarty for his support and inspiration, as well as Professor Soroush Vosoughi and Professor Yaoqing Yang for being on my thesis committee. Additionally, I am grateful to the other professors and students who have taught and helped me at Dartmouth.

List of Publications

The following papers were published during the completion of my undergraduate studies and will be introduced in this thesis (listed in chronological order):

  1. Tong, Y., Xiong, S., He, X., Pan, G., & Zhu, B. (2021). Symplectic Neural Networks in Taylor Series Form for Hamiltonian Systems. Journal of Computational Physics, 437, 110325.

  2. Xiong, S., Tong, Y., He, X., Yang, S., Yang, C., & Zhu, B. (2021). Nonseparable Symplectic Neural Networks. In Proceedings of the International Conference on Learning Representations.

  3. Xiong, S., He, X., Tong, Y., Deng, Y., & Zhu, B. (2023). Neural Vortex Method: from Finite Lagrangian Particles to Infinite Dimensional Eulerian Dynamics. Computers and Fluids, 258, 105811.

  4. Tong, Y., Xiong, S., He, X., Yang, S., Wang, Z., Tao, R., Liu, R., & Zhu, B. (2024). RoeNet: Predicting Discontinuity of Hyperbolic Systems from Continuous Data. International Journal for Numerical Methods in Engineering, 125, e7406.

Author’s Contribution

The work presented in this thesis is the product of scientific collaboration. Here I detail my specific contributions to each project. For the project listed first [1], my responsibilities included the initial generation of research ideas, implementation of the methodologies, conducting experiments, and writing the research paper. For the remaining three projects [2, 3, 4], my primary roles involved conducting experiments and writing the respective papers. Additionally, for the fourth project [4], I was involved in idea generation and was responsible for the paper writing and revision.

1 Introduction

From the time of Newton, two principal paradigms have shaped the methodologies of scientific research: the Keplerian paradigm, or the data-driven approach, and the Newtonian paradigm, or the first-principle-based approach [5]. The first-principle-based approach is fundamental and elegant, but the dilemma we often face is its practicality. There are many time-dependent problems in science, where the equations of motion are too complex for full solution, either because the equations are not certain or because the computational cost is too high. Additionally, for a dynamic system governed by some unknown mechanics, it is challenging to identify governing equations by directly observing the system’s state, especially when such observation is partial and the sample data is sparse.

Now, the data-driven approach has become a very powerful tool with the advancement of statistical methods and machine learning (ML). This approach enables us to handle physical systems by statistically exploring their underlying structures. Data-driven approaches have proven their efficacy in uncovering the underlying governing equations of a variety of physical systems, ranging from fluid mechanics [6] and wave physics [7] to quantum physics [8], thermodynamics [9], and materials science [10]. Moreover, various ML methods have significantly advanced the numerical simulation of complex and high-dimensional dynamical systems. These methods integrate learning paradigms with simulation infrastructures, enhancing the modeling of ordinary differential equations [11], linear and nonlinear partial differential equations [4, 12], high-dimensional partial differential equations [13], and inverse problems [14], among others.

Despite these advancements, data-driven methods like neural networks, which exhibit remarkable generalization abilities across various fields, face significant challenges. These methods require large, clean datasets and depend heavily on complex, black-box network structures that are highly sensitive to input variations. Additionally, brute-force machine learning with conventional toolkits such as deep neural networks often struggles with the high dimensionality of input-output spaces, the cost of data acquisition, the production of physically implausible results, and the inability to handle extrapolation robustly. These factors make it difficult to predict long-term dynamical behaviors accurately.

To address these challenges, we introduce a novel, data-driven framework designed to make accurate, long-term predictions in a computationally efficient manner. The key innovation lies in incorporating physics-based priors into the learning algorithms so that the physics structure of the underlying system is intrinsically preserved. As a result, our models outperform other state-of-the-art data-driven methods in terms of prediction accuracy, robustness, and predictive capability, particularly in recognizing features absent from the training set. This superior performance is achieved despite relying on smaller datasets, shorter training periods, and limited sample sizes. At the same time, our models are significantly more computationally efficient than traditional first-principles-based methods, while achieving a similar level of accuracy.

This thesis details four algorithms we have developed over time, each incorporating a distinct physics-based prior relevant to a specific type of nonlinear system. The algorithm names, the associated physics priors, and the systems they address are as follows:

  1. Symplectic Taylor Neural Networks (Taylor-nets): The symplectic structure in separable Hamiltonian systems [1],

  2. Nonseparable Symplectic Neural Networks (NSSNNs): The symplectic structure in nonseparable Hamiltonian systems [2],

  3. Roe Neural Networks (RoeNet): Hyperbolic Conservation Law in hyperbolic partial differential equations (PDEs) [4],

  4. Neural Vortex Method (NVM): Helmholtz’s Theorems in incompressible fluid dynamics [3].

Overall, the key advantages and contributions of our methodologies are as follows:

  • Preservation of Intrinsic Symmetries and Conservation Laws: Our methodologies integrate physics-based priors within the learning algorithms, which significantly narrows the solution space. This reduction not only streamlines the computational demands but also preserves the mathematical symmetries and physical conservation laws inherent in the systems being modeled. Such an approach ensures that the generated solutions are not only efficient but also robust and aligned with physical reality, enhancing both the reliability and validity of the predictions.

  • Enhanced Expressive Power of Neural Networks: By embedding physics-based structures into our models, we expand the network’s capacity to capture and reproduce complex patterns that are typical in solutions to physical phenomena. Conventional deep neural networks often struggle to identify such patterns when they are not represented within the training dataset. Our approach supports generalized solutions to PDEs and expands the solution space, allowing for a more comprehensive encapsulation of the potential physical behaviors, significantly improving the model’s applicability and predictive accuracy.

The thesis is organized into several key sections: an introductory section and a related work section that outline the research background; a methodology section that elaborates on the mathematical foundations, including an introduction to the supervised learning and numerical methods we used to develop our methodologies; an implementation section that details the algorithm design and proofs for each of the four methodologies; and a results section that summarizes the key implementation and experimental findings. The thesis concludes with a discussion of the implications of the results and potential avenues for future research.

Table 1: Overview of the key concepts related to four methodologies.
| | Taylor-nets | NSSNNs | RoeNet | NVM |
|---|---|---|---|---|
| Physics System | Separable Hamiltonians | Nonseparable Hamiltonians | Hyperbolic PDEs | Incompressible Fluid Dynamics |
| Prior Embedded | Symplectic Structure | Symplectic Structure | Hyperbolic Conservation Law | Helmholtz’s Theorems |
| Solver | Separable Symplectic Integrator | Nonseparable Symplectic Integrator | Roe Solver | Lagrangian Vortex Method |
| Key Advantages | Accurately approximate the continuous-time evolution over a long term | Accurately approximate the continuous-time evolution over a long term | Predict future discontinuities with short-term continuous data | Reconstruct continuous vortex dynamics with a small number of vortex particles |

Table 1 summarizes the key concepts related to the four methods, including the specific physics systems they model, the type of physical principles or priors they embed, the integrative techniques they employ, and the primary advantages each method offers. These comparative insights provide an at-a-glance understanding of the distinct capabilities and applications of each method. The details will be addressed comprehensively in Section 3.2 and Section 4.

2 Related Work

Neural Networks for Hamiltonian Systems.

Greydanus et al. introduced Hamiltonian Neural Networks (HNNs) to preserve the Hamiltonian energy of systems by reformulating the loss function [15]. Inspired by HNNs, a series of methods that intrinsically embed a symplectic integrator into the recurrent neural network architecture were proposed, including SRNN [16] and SSINN [17]. Methods like HNN face two primary challenges: they require the temporal derivatives of system momentum and position to compute the loss function, which are hard to obtain from real-world systems, and they do not strictly preserve the symplectic structure, as their symplectomorphism is governed by the loss function. Our model, Taylor-net [1], addresses these limitations by integrating a solver into the network architecture to avoid the need for time derivatives and by embedding a symmetrical structure directly within the neural networks, rather than adjusting the loss function. Moreover, these methods have been extended, via combination with graph networks [18, 19], to address large-scale N-body problems where interactions are driven by forces between particle pairs.

While the above methods are all designed to solve separable Hamiltonian systems, Jin et al. proposed SympNet, which constructs symplectic mappings of system variables across neighboring time steps to handle both separable and nonseparable Hamiltonian systems [20]. However, the number of parameters in SympNet grows quadratically with the system size, $O(N^{2})$, which poses challenges for application to high-dimensional N-body problems. Our model, NSSNN, addresses these issues with a novel network architecture tailored for nonseparable systems, which significantly reduces the complexity of parameter scaling [2]. Additionally, Hamiltonian-based neural networks have been adapted for broader applications. Toth et al. developed the Hamiltonian Generative Network (HGN) to infer Hamiltonian dynamics from high-dimensional observations, such as image data [21]. Furthermore, Zhong et al. introduced Symplectic ODE-Net (SymODEN), which incorporates an external control term into the standard Hamiltonian framework, enhancing the model’s applicability to controlled dynamical systems [22].

Neural Networks for Discontinuous Functions.

The use of deep learning networks to approximate discontinuous functions is well-supported theoretically, as highlighted in various studies on Hölder spaces [23], piecewise smooth functions [24], linear estimators [25], and highly adaptive, spatially anisotropic target functions [26]. Building on these foundations, Physics-Informed Neural Networks (PINNs) were introduced by Raissi et al. as a data-driven approach to solving nonlinear problems [27], leveraging the well-known capability of deep neural networks to act as universal function approximators [28]. Among their key attributes, PINNs ensure the preservation of symmetry, invariance, and conservation principles that are inherent in the physical laws governing the observed data [29]. Michoski et al. demonstrated that PINNs could capture irregular solutions to PDEs without the need for any regularization [30]. Additionally, Mao et al. utilized PINNs to approximate solutions for high-speed flows by integrating the Euler equations with initial and boundary conditions into the loss function [31]. However, while these studies demonstrate the robust capabilities of PINNs, they often do not address extrapolation beyond the training set, a critical aspect for ensuring the generalizability of the models to a wider range of scenarios.

Neural Networks for Fluid Dynamics.

Recent advancements in fluid dynamics analysis have increasingly leveraged data-driven approaches powered by machine learning [32, 33, 34]. Recognizing the limitations of brute-force machine learning methods, current research efforts are increasingly focused on integrating physical priors into learning algorithms, aiming to equip neural networks with a foundational understanding of physical laws, rather than approaching the data naively [35, 36, 37, 38, 39, 40]. Significant efforts have been made to encode these physical constraints efficiently, such as incorporating the Navier-Stokes (NS) equations [12], modeling incompressibility constraints [41], and mapping dynamics of wave phenomena onto recurrent neural network computations [7]. Moreover, understanding complex fluid dynamics through machine learning involves embedding the structure of partial differential equations (PDEs) within neural network architectures [42, 43, 44, 45, 46, 47]. Ideally, these machine learning models designed to solve PDEs should be able to evolve the flow fields independently, obtaining initial-condition invariance without the need for a specific solver. However, the high dimensionality of the problems and insufficient supervisory data continue to pose significant challenges.

3 Methodology

3.1 Supervised learning

We used supervised learning for all of our models. Supervised learning is a subset of machine learning in which an algorithm learns a function that maps an input to an output from example input-output pairs. It infers this function from labeled training data, where each example pairs an input object with a desired output value. The learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. The sequential steps involved in developing a supervised learning model, from determining the type of training dataset to evaluating the model’s accuracy, are:

  1. Determine the Type of Training Dataset: Identify whether the problem is a classification or a regression problem to select the appropriate type of training dataset.

  2. Collect/Gather the Labelled Training Data: Assemble a dataset where each instance is tagged with the correct answer or outcome.

  3. Split the Training Dataset: Divide the dataset into three parts:

    • Training dataset: used to train the model.

    • Test dataset: used to test the model’s predictions.

    • Validation dataset: used to tune the model’s hyperparameters.

  4. Determine the Input Features: Select the features of the training dataset that contain sufficient information for the model to accurately predict the output.

  5. Determine the Suitable Algorithm: Choose an appropriate algorithm for the model based on the problem type.

  6. Execute the Algorithm on the Training Dataset: Train the model using the selected algorithm on the training dataset. Utilize the validation set to adjust control parameters as needed.

  7. Evaluate the Model’s Accuracy: Test the model on the held-out test dataset to assess its accuracy. A model whose predictions closely match the test labels has high accuracy.
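
The splitting step above can be sketched as follows. This is a minimal illustration only: the function name `split_dataset`, the 70/15/15 proportions, and the toy regression data are assumptions made for the example and are not taken from our experiments.

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle labelled samples and split them into train/validation/test lists.

    The fractions are illustrative defaults; the remainder goes to the test set.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty test set")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)      # reproducible shuffle
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Toy labelled data: inputs x with labels y = 2x + 1 (hypothetical).
data = [(x, 2 * x + 1) for x in range(100)]
train, val, test = split_dataset(data)
```

Shuffling before splitting avoids ordering bias in this toy setting; for time-series data such as trajectories, a chronological split may be more appropriate.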

3.1.1 Optimizer

In the context of neural networks, optimizers are crucial for minimizing the loss function, which measures the difference between the actual and predicted outputs. One popular optimizer is Adam [48], which combines the advantages of two other extensions of stochastic gradient descent, namely the Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSProp). The Adam optimizer’s update equations are given by:

\[
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2} \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^{t}} \\
\hat{v}_t &= \frac{v_t}{1-\beta_2^{t}} \\
\theta_{t+1} &= \theta_t - \frac{\alpha\, \hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon}
\end{aligned}
\]

where $\theta$ represents the parameters of the model, $g_t$ is the gradient of the loss function with respect to the parameters at timestep $t$, and $m_t$ and $v_t$ are estimates of the first and the second moments of the gradients, respectively. $\alpha$ is the learning rate, and $\beta_1$, $\beta_2$, and $\epsilon$ are hyperparameters.
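
To make the update rule concrete, the following is a minimal pure-Python sketch of Adam applied to a toy quadratic loss. The function name `adam_minimize`, the toy loss, and the hyperparameter values are illustrative assumptions; in practice one would rely on the optimizer provided by a deep learning framework.

```python
import math

def adam_minimize(grad_fn, theta, alpha=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Run Adam for a fixed number of steps and return the final parameters."""
    theta = list(theta)
    m = [0.0] * len(theta)   # first-moment estimates m_t
    v = [0.0] * len(theta)   # second-moment estimates v_t
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        for i in range(len(theta)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)   # bias-corrected first moment
            v_hat = v[i] / (1 - beta2 ** t)   # bias-corrected second moment
            theta[i] -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta

def quadratic_grad(theta):
    """Gradient of the toy loss (theta0 - 3)^2 + (theta1 + 1)^2."""
    return [2 * (theta[0] - 3.0), 2 * (theta[1] + 1.0)]

# The minimizer of the toy loss is (3, -1).
theta_star = adam_minimize(quadratic_grad, [0.0, 0.0])
```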

3.1.2 Loss Functions

The choice of loss function is pivotal in guiding the training of the model towards its objective. In our methods, we use several common loss functions in supervised learning, including:

L1 Loss (Absolute Loss)

Defined as $L(y,\hat{y}) = \sum |y - \hat{y}|$, where $y$ is the true value and $\hat{y}$ is the predicted value.

L2 Loss (Squared Loss)

Given by $L(y,\hat{y}) = \sum (y - \hat{y})^{2}$. This loss function is sensitive to outliers as it squares the differences, hence penalizing larger errors more.
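
The difference in outlier sensitivity between the two losses can be seen in a small sketch. The helper names and the toy predictions below are assumptions chosen for illustration: both predictions have the same total absolute error, but one concentrates it in a single outlier.

```python
def l1_loss(y_true, y_pred):
    """Sum of absolute errors."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred))

def l2_loss(y_true, y_pred):
    """Sum of squared errors."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred))

y_true = [1.0, 2.0, 3.0, 4.0]
y_spread = [1.5, 2.5, 3.5, 4.5]    # four errors of size 0.5
y_outlier = [1.0, 2.0, 3.0, 6.0]   # one error of size 2.0

# L1 treats the two predictions identically, L2 penalizes the outlier more.
l1_spread, l1_outlier = l1_loss(y_true, y_spread), l1_loss(y_true, y_outlier)
l2_spread, l2_outlier = l2_loss(y_true, y_spread), l2_loss(y_true, y_outlier)
```

Here the L1 loss is 2.0 in both cases, whereas the L2 loss rises from 1.0 to 4.0 when the error is concentrated in one sample.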

Cross-Entropy Loss

The Cross-Entropy Loss is widely used in classification tasks to measure the performance of a classification model whose output is a probability value between 0 and 1. For binary classification, the Cross-Entropy Loss is given by:

\[
L(y,\hat{y}) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)\right]
\]

where $L(y,\hat{y})$ is the loss function, $N$ is the number of samples, $y_i$ is the actual label of the $i$-th sample, and $\hat{y}_i$ is the predicted probability for the $i$-th sample.

For multi-class classification, the generalized formula is:

\[
L(y,\hat{y}) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\log(\hat{y}_{ic})
\]

where $M$ is the number of classes, $y_{ic}$ indicates whether class $c$ is the correct classification for observation $i$, and $\hat{y}_{ic}$ is the predicted probability that observation $i$ is of class $c$.
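
As a quick consistency check between the two formulas, the sketch below evaluates both on the same toy predictions, writing the binary problem as a two-class problem with one-hot labels. The function names and data are illustrative assumptions, and the code assumes all predicted probabilities lie strictly between 0 and 1.

```python
import math

def binary_cross_entropy(y_true, y_prob):
    """Mean binary cross-entropy; y_prob[i] is the predicted P(label = 1)."""
    n = len(y_true)
    total = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_prob))
    return -total / n

def categorical_cross_entropy(y_onehot, y_prob):
    """Mean multi-class cross-entropy with one-hot labels y_onehot[i][c]."""
    n = len(y_onehot)
    total = 0.0
    for labels, probs in zip(y_onehot, y_prob):
        # Only the true class (label 1) contributes to the inner sum.
        total += sum(y * math.log(p) for y, p in zip(labels, probs) if y > 0)
    return -total / n

# Toy binary problem and the same problem written with M = 2 classes.
labels = [1, 0, 1]
probs = [0.9, 0.2, 0.6]
onehot = [[0, 1], [1, 0], [0, 1]]
probs_2class = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.6]]

bce = binary_cross_entropy(labels, probs)
cce = categorical_cross_entropy(onehot, probs_2class)
```

With $M = 2$ the multi-class formula reduces to the binary one, so `bce` and `cce` agree up to floating-point rounding.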

Focal Loss

Focal Loss is an adapted version of Cross-Entropy Loss, which addresses the problem of class imbalance by focusing more on hard-to-classify examples. It is particularly useful in scenarios where there is a large class imbalance. The formula for Focal Loss is given by:

\[
L(y,\hat{y}) = -\alpha_t (1-\hat{y}_t)^{\gamma} \log(\hat{y}_t)
\]

where $\alpha_t$ is a weighting factor for the class $t$ to counteract class imbalance, $\gamma$ is a focusing parameter that adjusts the rate at which easy examples are down-weighted, $\hat{y}_t$ is the predicted probability of the class with label $t$, and $(1-\hat{y}_t)^{\gamma}$ reduces the loss for well-classified examples, putting more focus on hard, misclassified examples. Focal Loss is particularly useful for training on datasets where some classes are much more frequent than others, helping to improve the robustness and performance of classification models in imbalanced datasets.
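
The down-weighting effect of the factor $(1-\hat{y}_t)^{\gamma}$ can be illustrated with a short sketch. The function names and the values $\alpha_t = 0.25$, $\gamma = 2$ are assumptions chosen for the example, not settings used in our models.

```python
import math

def cross_entropy(p_true):
    """Cross-entropy of one sample, given the predicted probability of its true class."""
    return -math.log(p_true)

def focal_loss(p_true, alpha=0.25, gamma=2.0):
    """Focal loss of one sample; alpha and gamma are illustrative defaults."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

easy, hard = 0.9, 0.1   # well-classified vs. misclassified example

# Relative weight of the easy example compared with the hard one.
ce_ratio = cross_entropy(easy) / cross_entropy(hard)
focal_ratio = focal_loss(easy) / focal_loss(hard)

# With alpha = 1 and gamma = 0 the focal loss reduces to cross-entropy.
reduces_to_ce = focal_loss(hard, alpha=1.0, gamma=0.0) == cross_entropy(hard)
```

The easy example contributes a much smaller share of the total under the focal loss than under plain cross-entropy, which is the intended focusing behavior.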

3.1.3 Activation Functions

Activation functions are non-linear functions applied to the output of a neuron in a neural network. They decide whether a neuron should be activated or not, helping the network learn complex patterns in the data.

ReLU

One of the most popular activation functions is the Rectified Linear Unit (ReLU). It is defined as:

\[
f(x) = \max(0, x)
\]

where $x$ is the input to the neuron. ReLU is favored for its simplicity and efficiency: its piecewise-linear form does not saturate for positive inputs, which promotes faster convergence in training.

Next, we outline the network architectures that were employed in the development of our models.

3.1.4 Residual Networks (ResNets) and Residual Blocks (ResBlocks)

ResNets are designed to enable the training of very deep neural networks through the introduction of Residual Blocks (ResBlocks), which use skip connections, or shortcuts, to jump over some layers [49]. Numerous studies have shown ResNets to be an architecture well suited to deep learning and computer vision. They offer distinctive advantages in mitigating problems such as vanishing gradients during network training.

ResBlocks

A Residual Block allows the gradient to flow through the network directly, without passing through non-linear activations, by using skip connections. This is mathematically represented as:

\[
\mathbf{h}_{\text{out}} = \mathcal{F}(\mathbf{h}_{\text{in}}, \{\theta_i\}) + \mathbf{h}_{\text{in}}
\]

where $\mathbf{h}_{\text{in}}$ is the input to the ResBlock, $\mathcal{F}(\mathbf{h}_{\text{in}}, \{\theta_i\})$ represents the residual mapping to be learned by the layers of the ResBlock, and $\mathbf{h}_{\text{out}}$ is the output of the ResBlock. The addition operation is element-wise, allowing the network to learn identity mappings efficiently, which is crucial for training deep networks.
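
As a minimal sketch of the residual mapping above, the following pure-Python snippet assumes a hypothetical two-layer residual function $\mathcal{F}(\mathbf{h}) = W_2\,\mathrm{ReLU}(W_1 \mathbf{h})$ without biases. The weight names `W1` and `W2` and the helper functions are illustrative and do not describe the layers used in our models.

```python
def relu(x):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, xi) for xi in x]

def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def residual_block(h_in, W1, W2):
    """h_out = F(h_in) + h_in with the assumed F(h) = W2 relu(W1 h)."""
    residual = matvec(W2, relu(matvec(W1, h_in)))
    return [r + h for r, h in zip(residual, h_in)]   # skip connection

h = [1.0, -2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
half = [[0.5, 0.0], [0.0, 0.5]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

h_out = residual_block(h, identity, half)       # [1.5, -2.0]
h_same = residual_block(h, zeros, zeros)        # zero residual: identity map
```

When the residual branch outputs zero, the block reproduces its input exactly, which is why identity mappings are easy for such a block to represent.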

3.1.5 Neural Ordinary Differential Equations (Neural ODEs) and the Adjoint Method

Neural ODEs are a class of models that represent the continuous dynamics of hidden states using differential equations [50]. Unlike traditional neural networks that apply a discrete sequence of transformations, Neural ODEs model the derivative of the hidden state as a continuous transformation:

\[
\frac{d\mathbf{h}(t)}{dt} = f(\mathbf{h}(t), t, \theta) \tag{1}
\]

where $\mathbf{h}(t)$ is the hidden state at time $t$ and $f$ is a neural network parameterized by $\theta$ that defines the time derivative of the hidden state, making the model capable of learning continuous-time dynamics.

The key idea is to view a neural network as a dynamical system: the chain of residual blocks in a network can be treated as the solution of an ordinary differential equation (ODE) discretized with the Euler method. Given a residual network that consists of a sequence of transformations

\[
\mathbf{h}_{t+1} = \mathbf{h}_t + f(\mathbf{h}_t, \theta_t), \tag{2}
\]

the idea is to replace the discrete layers with continuous dynamics, defined by the neural-network ODE in (1).

In a Neural ODE framework, the evolution of the hidden state $z(t)$, which plays the role of $\mathbf{h}(t)$ in (1), is governed by an ODE parameterized by a neural network:

\[
\frac{dz(t)}{dt} = f(z(t), t, \theta), \tag{3}
\]

where $t$ is time, $\theta$ represents the parameters of the neural network, and $f$ is a function approximated by the neural network defining the dynamics of $z$.

To optimize Neural ODEs, the adjoint method is utilized, providing an efficient means for calculating gradients with respect to the parameters $\theta$ during backpropagation [51]. Rather than differentiating through the ODE solver, we solve the adjoint ODE defined as:

$$\frac{d\mathbf{a}(t)}{dt} = -\mathbf{a}(t)^{\top} \frac{\partial f}{\partial \mathbf{h}(t)}, \tag{4}$$

where $\mathbf{a}(t) = \frac{dL}{d\mathbf{h}(t)}$ is the gradient of the loss function $L$ with respect to the hidden state.

The gradient of the loss with respect to the parameters is then obtained by integrating:

$$\frac{dL}{d\theta} = -\int_{t_1}^{t_0} \mathbf{a}(t)^{\top} \frac{\partial f}{\partial \theta}\, dt, \tag{5}$$

backward in time from $t_1$ to $t_0$, that is, over the duration of the forward pass traversed in reverse; the minus sign accounts for the reversed integration limits. The adjoint state $\mathbf{a}(t)$ is initialized at the end of the forward pass as $\mathbf{a}(t_1) = dL/d\mathbf{h}(t_1)$ and integrated backward in time to obtain the necessary gradients for parameter updates.
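The adjoint procedure can be illustrated on a problem with a known answer. The sketch below is not the thesis implementation: it assumes the scalar dynamics $f(h, t, \theta) = \theta h$, the loss $L = \tfrac{1}{2} h(t_1)^2$, and a hand-written RK4 solver, and all function names are hypothetical. It integrates the state, the adjoint, and the running gradient backward from $t_1$ to $t_0$, accumulating $dL/d\theta = \int_{t_0}^{t_1} \mathbf{a}^{\top} \partial f / \partial \theta \, dt$.

```python
import numpy as np

def rk4_step(func, t, y, dt):
    """One classical RK4 step; dt may be negative for backward integration."""
    k1 = func(t, y)
    k2 = func(t + dt / 2, y + dt / 2 * k1)
    k3 = func(t + dt / 2, y + dt / 2 * k2)
    k4 = func(t + dt, y + dt * k3)
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def adjoint_gradient(theta, h0, t0, t1, n_steps=200):
    """Return (L, dL/dtheta) for dh/dt = theta * h and L = 0.5 * h(t1)**2."""
    dt = (t1 - t0) / n_steps

    # Forward pass: integrate the hidden state from t0 to t1.
    h = float(h0)
    for i in range(n_steps):
        h = rk4_step(lambda t, y: theta * y, t0 + i * dt, h, dt)
    loss = 0.5 * h ** 2

    # Backward pass on the augmented state s = (h, a, g).
    def augmented(t, s):
        h_t, a_t, _ = s
        dh = theta * h_t       # f(h, t, theta)
        da = -a_t * theta      # adjoint ODE: -a * df/dh
        dg = -a_t * h_t        # running gradient: -a * df/dtheta
        return np.array([dh, da, dg])

    # a(t1) = dL/dh(t1) = h(t1); the gradient accumulator starts at zero.
    s = np.array([h, h, 0.0])
    for i in range(n_steps):
        s = rk4_step(augmented, t1 - i * dt, s, -dt)
    return loss, s[2]

theta, h0, t_end = 0.3, 1.5, 2.0
loss, grad = adjoint_gradient(theta, h0, 0.0, t_end)
# Closed form for this toy problem: dL/dtheta = t1 * h0**2 * exp(2*theta*t1).
exact = t_end * h0 ** 2 * np.exp(2 * theta * t_end)
```

Only the final state is kept from the forward pass; the hidden state is recomputed alongside the adjoint during the backward solve, which is what keeps the memory cost independent of the number of solver steps.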

3.2 Numerical Methods

First, we present four methods for solving ordinary differential equations (ODEs): the Euler method, the Runge-Kutta method, the separable symplectic integrator, and the nonseparable symplectic integrator.

3.2.1 Euler Method

The Euler method represents one of the most straightforward numerical strategies for approximating solutions to ODEs. As a first-order numerical method, it provides an initial approach for solving initial value problems defined by $\frac{d\bm{y}}{dt} = \bm{f}(t, \bm{y})$ with the initial condition $\bm{y}(t_0) = \bm{y}_0$. Despite its simplicity, the Euler method is fundamental in the introduction to more sophisticated numerical methods for differential equations.

This method calculates the next state vector $\bm{y}$ by proceeding in the direction of the derivative $\bm{f}(t, \bm{y})$, scaled by the timestep $dt$. The updated state $\bm{y}$ at time $t + dt$ is given by:

$$\bm{y}(t + dt) = \bm{y}(t) + \bm{f}(t, \bm{y}(t)) \cdot dt$$

As a consequence of its first-order accuracy, the local truncation error for the Euler method is of the order $O(dt^2)$, while the global error is of the order $O(dt)$. This relatively large error suggests that while the Euler method can be beneficial for straightforward problems and educational purposes, it may not be the best choice for scenarios that demand high precision over extended durations.
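The first-order global error can be observed directly. The following sketch assumes the test problem $dy/dt = -y$ with $y(0) = 1$, chosen here only because its exact solution $e^{-t}$ is known; the helper names are hypothetical.

```python
import numpy as np

def euler_step(f, t, y, dt):
    """One forward Euler step: y(t + dt) = y(t) + f(t, y(t)) * dt."""
    return y + f(t, y) * dt

def integrate(step, f, y0, t0, t1, n_steps):
    """Advance y from t0 to t1 using n_steps equal steps."""
    dt = (t1 - t0) / n_steps
    t, y = t0, np.asarray(y0, dtype=float)
    for _ in range(n_steps):
        y = step(f, t, y, dt)
        t += dt
    return y

def decay(t, y):
    """Right-hand side of the illustrative test problem dy/dt = -y."""
    return -y

exact = np.exp(-1.0)
err_coarse = abs(integrate(euler_step, decay, [1.0], 0.0, 1.0, 100)[0] - exact)
err_fine = abs(integrate(euler_step, decay, [1.0], 0.0, 1.0, 200)[0] - exact)
ratio = err_coarse / err_fine  # close to 2 for a first-order method
```

Halving the timestep roughly halves the error at the final time, which is the $O(dt)$ behavior described above.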

3.2.2 Runge-Kutta Method

The Runge-Kutta methods are a prominent family of iterative techniques for the numerical resolution of ODEs. The fourth-order Runge-Kutta method, commonly referred to as RK4, is particularly renowned for its balance between computational efficiency and accuracy. This method is applied to approximate the solution of an initial value problem defined by the ODE $\frac{d\bm{y}}{dt} = \bm{f}(t, \bm{y})$ with the initial condition $\bm{y}(t_0) = \bm{y}_0$.

RK4 progresses the solution by computing a weighted average of four increments, where each increment evaluates the derivative $\bm{f}(t, \bm{y})$ at various points within the timestep $dt$. The solution $\bm{y}$ at a subsequent time $t + dt$ is determined using the formula:

$$\bm{y}(t + dt) = \bm{y}(t) + \frac{1}{6}(\bm{k}_1 + 2\bm{k}_2 + 2\bm{k}_3 + \bm{k}_4)$$

with the increments given by:

$$\begin{aligned}
\bm{k}_1 &= \bm{f}(t, \bm{y}) \cdot dt, \\
\bm{k}_2 &= \bm{f}\left(t + \frac{dt}{2}, \bm{y} + \frac{\bm{k}_1}{2}\right) \cdot dt, \\
\bm{k}_3 &= \bm{f}\left(t + \frac{dt}{2}, \bm{y} + \frac{\bm{k}_2}{2}\right) \cdot dt, \\
\bm{k}_4 &= \bm{f}(t + dt, \bm{y} + \bm{k}_3) \cdot dt.
\end{aligned}$$

As a fourth-order method, the RK4 achieves a local truncation error of the order $O(dt^5)$ and a global error of the order $O(dt^4)$. This substantial accuracy renders the RK4 method highly effective for a broad spectrum of applications, offering an excellent trade-off between the computational demands and the precision of the solution.
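The RK4 update translates almost line by line into code. The sketch below assumes the same illustrative test problem $dy/dt = -y$, $y(0) = 1$, and uses hypothetical helper names; the increments carry the factor $dt$, as in the formulas above.

```python
import numpy as np

def rk4_step(f, t, y, dt):
    """One RK4 step; each increment k_i already includes the factor dt."""
    k1 = f(t, y) * dt
    k2 = f(t + dt / 2, y + k1 / 2) * dt
    k3 = f(t + dt / 2, y + k2 / 2) * dt
    k4 = f(t + dt, y + k3) * dt
    return y + (k1 + 2 * k2 + 2 * k3 + k4) / 6

def integrate(step, f, y0, t0, t1, n_steps):
    """Advance y from t0 to t1 using n_steps equal steps."""
    dt = (t1 - t0) / n_steps
    t, y = t0, np.asarray(y0, dtype=float)
    for _ in range(n_steps):
        y = step(f, t, y, dt)
        t += dt
    return y

def decay(t, y):
    """Right-hand side of the illustrative test problem dy/dt = -y."""
    return -y

exact = np.exp(-1.0)
err_coarse = abs(integrate(rk4_step, decay, [1.0], 0.0, 1.0, 10)[0] - exact)
err_fine = abs(integrate(rk4_step, decay, [1.0], 0.0, 1.0, 20)[0] - exact)
ratio = err_coarse / err_fine  # close to 2**4 = 16 for a fourth-order method
```

Halving the timestep reduces the final-time error by roughly a factor of sixteen, consistent with the $O(dt^4)$ global error.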

3.2.3 Separable Symplectic Integrator

Symplectic integrators are a class of numerical integration schemes specifically designed for simulating Hamiltonian systems.

A Hamiltonian system is characterized by $N$ pairs of canonical coordinates, denoted by generalized positions $\bm{q} = (q_1, q_2, \cdots, q_N)$ and generalized momenta $\bm{p} = (p_1, p_2, \cdots, p_N)$. The evolution of these coordinates over time is governed by Hamilton’s equations, expressed as

$$\begin{dcases}
\frac{\textrm{d}\bm{q}}{\textrm{d}t} = \frac{\partial \mathcal{H}}{\partial \bm{p}}, \\
\frac{\textrm{d}\bm{p}}{\textrm{d}t} = -\frac{\partial \mathcal{H}}{\partial \bm{q}},
\end{dcases} \tag{6}$$

with the initial condition

$$(\bm{q}(t_0), \bm{p}(t_0)) = (\bm{q}_0, \bm{p}_0). \tag{7}$$

The function $\mathcal{H} = \mathcal{H}(\bm{q}, \bm{p})$ is the Hamiltonian, which corresponds to the total energy of the system.

In a separable Hamiltonian system, the Hamiltonian $\mathcal{H}$ can be split into a kinetic energy part $T(\bm{p})$ and a potential energy part $V(\bm{q})$, so that it can be expressed in the form

$$\mathcal{H}(\bm{q}, \bm{p}) = T(\bm{p}) + V(\bm{q}). \tag{8}$$

Symplectic integrators are distinguished by their ability to preserve the symplectic structure of phase space, an essential property for ensuring the long-term stability and accuracy of the simulation. By nearly conserving a slightly perturbed Hamiltonian that stays close to the true energy, these methods avoid the numerical dissipation typical of other numerical schemes, making them particularly well-suited for simulating dynamical systems over extended periods.

The specific symplectic integrator we use is the fourth-order scheme described by Forest and Ruth [52] and Yoshida [53]. It is designed for separable Hamiltonian systems of the form (8) and advances the system’s state over a timestep $dt$ by applying a sequence of operations, each of which preserves the symplectic geometry of phase space. This preservation is crucial for accurately simulating the long-term behavior of Hamiltonian systems. The procedure is as follows:

1. Initialize with $(\bm{q}_0, \bm{p}_0)$ at $t = t_0$.

2. For each time step $dt$, update $(\bm{q}, \bm{p})$ through the following sequence of operations. For each stage $j = 4, 3, 2, 1$, execute the following two updates in order:

    • Update momentum $\bm{p}$ by a fraction of the time step:

      $$\bm{p} = \bm{p} - d_j \nabla V(\bm{q}) \cdot dt. \tag{9}$$
    • Update position $\bm{q}$ by a fraction of the time step:

      $$\bm{q} = \bm{q} + c_j \nabla T(\bm{p}) \cdot dt. \tag{10}$$

    Since $d_4 = 0$, the first momentum update leaves $\bm{p}$ unchanged, so the stages form the symmetric sequence of position and momentum updates on which the fourth-order accuracy of the scheme relies. Equivalently, one may run $j$ from 1 to 4 with the position update (10) applied before the momentum update (9).

The coefficients $c_j$ and $d_j$ are chosen to eliminate lower-order error terms, ensuring fourth-order accuracy. These coefficients are typically defined as [52, 53, 54]:

$$\begin{aligned}
c_1 &= c_4 = \frac{1}{2(2 - 2^{1/3})}, & c_2 &= c_3 = \frac{1 - 2^{1/3}}{2(2 - 2^{1/3})}, \\
d_1 &= d_3 = \frac{1}{2 - 2^{1/3}}, & d_2 &= -\frac{2^{1/3}}{2 - 2^{1/3}}, \quad d_4 = 0.
\end{aligned} \tag{11}$$

Repeat these steps for each time step $dt$, iteratively advancing the system from $(\bm{q}_0, \bm{p}_0)$ at $t_0$ to $(\bm{q}_n, \bm{p}_n)$ at $t_0 + n \cdot dt$, where $n$ is the number of time steps.

The fourth-order symplectic integrator is characterized by its fourth-order accuracy in the numerical simulation of Hamiltonian systems. This indicates that the local truncation error of the method is of the order $O(dt^5)$, implying that the error introduced in a single timestep decreases as the fifth power of the timestep size. Consequently, the global error, or the cumulative error over a fixed interval of time, is of the order $O(dt^4)$. Such high-order accuracy is especially beneficial for simulations requiring long-term stability and precision, as it permits the use of relatively large timestep sizes while maintaining a low overall numerical error.
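The procedure and coefficients above can be sketched in a few lines. This is an illustrative sketch rather than the thesis code: it assumes the stages are traversed as $j = 4, 3, 2, 1$ with the momentum update applied before the position update (which, because $d_4 = 0$, yields a symmetric composition), and it uses the unit harmonic oscillator $T(p) = p^2/2$, $V(q) = q^2/2$ as a hypothetical test case with known solution $q = \cos t$, $p = -\sin t$.

```python
import numpy as np

# Fourth-order coefficients c_1..c_4 and d_1..d_4 from equation (11).
W = 2.0 ** (1.0 / 3.0)
C = np.array([1.0, 1.0 - W, 1.0 - W, 1.0]) / (2.0 * (2.0 - W))
D = np.array([1.0, -W, 1.0, 0.0]) / (2.0 - W)

def symplectic4_step(q, p, grad_T, grad_V, dt):
    """One step of the fourth-order scheme for H(q, p) = T(p) + V(q)."""
    # Assumption of this sketch: stages run j = 4, 3, 2, 1, momentum first.
    for c, d in zip(C[::-1], D[::-1]):
        p = p - d * grad_V(q) * dt   # momentum update, equation (9)
        q = q + c * grad_T(p) * dt   # position update, equation (10)
    return q, p

def state_error(n_steps, t_end=1.0):
    """Phase-space error at t_end for the unit harmonic oscillator."""
    q, p = 1.0, 0.0
    dt = t_end / n_steps
    for _ in range(n_steps):
        q, p = symplectic4_step(q, p, lambda mom: mom, lambda pos: pos, dt)
    return np.hypot(q - np.cos(t_end), p + np.sin(t_end))

err_coarse = state_error(20)
err_fine = state_error(40)
ratio = err_coarse / err_fine  # close to 16 if the scheme is fourth order
```

The stage ordering matters: the coefficients sum to one in any order, but only a symmetric arrangement of the position and momentum updates cancels the lower-order error terms.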

3.2.4 Nonseparable Symplectic Integrator

Given a Hamiltonian system described in (6) with initial condition (7), we now consider the more general case of an arbitrary Hamiltonian, which may be separable or nonseparable. In the original computational physics work [55], a generic, high-order, explicit, and symplectic time integrator was proposed to solve (6) for an arbitrary separable or nonseparable Hamiltonian $\mathcal{H}$. This is implemented by considering an augmented Hamiltonian

$$\overline{\mathcal{H}}(\bm{q}, \bm{p}, \bm{x}, \bm{y}) := \mathcal{H}_A + \mathcal{H}_B + \omega \mathcal{H}_C \tag{12}$$

with

$$\mathcal{H}_A = \mathcal{H}(\bm{q}, \bm{y}), \quad \mathcal{H}_B = \mathcal{H}(\bm{x}, \bm{p}), \quad \mathcal{H}_C = \frac{1}{2}\left( \|\bm{q} - \bm{x}\|_2^2 + \|\bm{p} - \bm{y}\|_2^2 \right) \tag{13}$$

in an extended phase space with symplectic two-form $\textrm{d}\bm{q} \wedge \textrm{d}\bm{p} + \textrm{d}\bm{x} \wedge \textrm{d}\bm{y}$, where $\omega$ is a constant that controls the strength of the binding between the two copies of the original system imposed by the artificial restraint $\mathcal{H}_C$.

Notice that Hamilton’s equations for $\overline{\mathcal{H}}$,

$$\begin{dcases}
\frac{\textrm{d}\bm{q}}{\textrm{d}t} = \frac{\partial \overline{\mathcal{H}}}{\partial \bm{p}} = \frac{\partial \mathcal{H}(\bm{x}, \bm{p})}{\partial \bm{p}} + \omega(\bm{p} - \bm{y}), \\
\frac{\textrm{d}\bm{p}}{\textrm{d}t} = -\frac{\partial \overline{\mathcal{H}}}{\partial \bm{q}} = -\frac{\partial \mathcal{H}(\bm{q}, \bm{y})}{\partial \bm{q}} - \omega(\bm{q} - \bm{x}), \\
\frac{\textrm{d}\bm{x}}{\textrm{d}t} = \frac{\partial \overline{\mathcal{H}}}{\partial \bm{y}} = \frac{\partial \mathcal{H}(\bm{q}, \bm{y})}{\partial \bm{y}} - \omega(\bm{p} - \bm{y}), \\
\frac{\textrm{d}\bm{y}}{\textrm{d}t} = -\frac{\partial \overline{\mathcal{H}}}{\partial \bm{x}} = -\frac{\partial \mathcal{H}(\bm{x}, \bm{p})}{\partial \bm{x}} + \omega(\bm{q} - \bm{x}),
\end{dcases} \tag{14}$$

with the initial condition $(\bm{q}, \bm{p}, \bm{x}, \bm{y})|_{t=t_0} = (\bm{q}_0, \bm{p}_0, \bm{q}_0, \bm{p}_0)$, have the same exact solution as (6) in the sense that $(\bm{q}, \bm{p}, \bm{x}, \bm{y}) = (\bm{q}, \bm{p}, \bm{q}, \bm{p})$. Hence, we can get the solution of (6) by solving (14). The coefficient $\omega$ acts as a regularizer, which stabilizes the numerical results.
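The claim that the two copies stay equal can be verified with a short calculation. The following is a sketch, assuming $\mathcal{H}$ is smooth enough for the extended system (14) to have a unique solution; it subtracts the equations of (14) pairwise.

```latex
% Differences between the two copies, obtained by subtracting rows of (14):
\begin{aligned}
\frac{\textrm{d}(\bm{q}-\bm{x})}{\textrm{d}t}
  &= \frac{\partial \mathcal{H}(\bm{x},\bm{p})}{\partial \bm{p}}
   - \frac{\partial \mathcal{H}(\bm{q},\bm{y})}{\partial \bm{y}}
   + 2\omega(\bm{p}-\bm{y}), \\
\frac{\textrm{d}(\bm{p}-\bm{y})}{\textrm{d}t}
  &= -\frac{\partial \mathcal{H}(\bm{q},\bm{y})}{\partial \bm{q}}
   + \frac{\partial \mathcal{H}(\bm{x},\bm{p})}{\partial \bm{x}}
   - 2\omega(\bm{q}-\bm{x}).
\end{aligned}
% On the diagonal x = q, y = p both right-hand sides vanish, so a solution
% that starts with x = q and y = p keeps x = q and y = p for all time.
```

On this diagonal the binding terms $\omega(\bm{p} - \bm{y})$ and $\omega(\bm{q} - \bm{x})$ in (14) are zero, and the first two equations of (14) reduce to Hamilton’s equations (6) for $(\bm{q}, \bm{p})$.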

It is possible to construct high-order symplectic integrators for $\overline{\mathcal{H}}$ with explicit updates. Denote by $\bm{\phi}_1^{\delta}$, $\bm{\phi}_2^{\delta}$, and $\bm{\phi}_3^{\delta}$ the time-$\delta$ flows of $\mathcal{H}_A$, $\mathcal{H}_B$, and $\omega\mathcal{H}_C$. Applied to $(\bm{q}, \bm{p}, \bm{x}, \bm{y})$, they are given by

$$\begin{bmatrix}
\bm{q} \\
\bm{p} - \delta\,[\partial \mathcal{H}(\bm{q}, \bm{y}) / \partial \bm{q}] \\
\bm{x} + \delta\,[\partial \mathcal{H}(\bm{q}, \bm{y}) / \partial \bm{y}] \\
\bm{y}
\end{bmatrix},
\quad
\begin{bmatrix}
\bm{q} + \delta\,[\partial \mathcal{H}(\bm{x}, \bm{p}) / \partial \bm{p}] \\
\bm{p} \\
\bm{x} \\
\bm{y} - \delta\,[\partial \mathcal{H}(\bm{x}, \bm{p}) / \partial \bm{x}]
\end{bmatrix},
\quad \textrm{and} \quad
\frac{1}{2}
\begin{bmatrix}
\begin{pmatrix} \bm{q} + \bm{x} \\ \bm{p} + \bm{y} \end{pmatrix}
+ \bm{R}^{\delta} \begin{pmatrix} \bm{q} - \bm{x} \\ \bm{p} - \bm{y} \end{pmatrix} \\
\begin{pmatrix} \bm{q} + \bm{x} \\ \bm{p} + \bm{y} \end{pmatrix}
- \bm{R}^{\delta} \begin{pmatrix} \bm{q} - \bm{x} \\ \bm{p} - \bm{y} \end{pmatrix}
\end{bmatrix}, \tag{15}$$

respectively. Here

$$\bm{R}^{\delta} := \begin{bmatrix} \cos(2\omega\delta)\bm{I} & \sin(2\omega\delta)\bm{I} \\ -\sin(2\omega\delta)\bm{I} & \cos(2\omega\delta)\bm{I} \end{bmatrix}, \quad \textrm{where } \bm{I} \textrm{ is an identity matrix}. \tag{16}$$

We remark that $\bm{x}$ and $\bm{y}$ are just auxiliary variables, which are theoretically equal to $\bm{q}$ and $\bm{p}$.

We then construct a numerical integrator that approximates the flow of $\overline{\mathcal{H}}$ by composing these maps. It is well known that the composition

$$(\bm{q}_i, \bm{p}_i, \bm{x}_i, \bm{y}_i) = \bm{\phi}_1^{\textrm{d}t/2} \circ \bm{\phi}_2^{\textrm{d}t/2} \circ \bm{\phi}_3^{\textrm{d}t} \circ \bm{\phi}_2^{\textrm{d}t/2} \circ \bm{\phi}_1^{\textrm{d}t/2} \circ (\bm{q}_{i-1}, \bm{p}_{i-1}, \bm{x}_{i-1}, \bm{y}_{i-1}), \tag{17}$$

commonly known as Strang splitting, has a third-order local error (and is thus a second-order method) and is symmetric.
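The three flows and their Strang composition can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: the nonseparable Hamiltonian $\mathcal{H}(q, p) = \tfrac{1}{2}(q^2 + 1)(p^2 + 1)$, the value $\omega = 5$, the step size, and all function names are hypothetical choices. The callables `dH_dq` and `dH_dp` return the partial derivatives of $\mathcal{H}$ with respect to its first and second argument.

```python
import numpy as np

def phi_A(q, p, x, y, dH_dq, dH_dp, delta):
    """Time-delta flow of H_A = H(q, y): q and y are held fixed."""
    p = p - delta * dH_dq(q, y)
    x = x + delta * dH_dp(q, y)
    return q, p, x, y

def phi_B(q, p, x, y, dH_dq, dH_dp, delta):
    """Time-delta flow of H_B = H(x, p): x and p are held fixed."""
    q = q + delta * dH_dp(x, p)
    y = y - delta * dH_dq(x, p)
    return q, p, x, y

def phi_C(q, p, x, y, omega, delta):
    """Time-delta flow of omega * H_C: rotates (q - x, p - y) by R^delta."""
    c, s = np.cos(2 * omega * delta), np.sin(2 * omega * delta)
    dq, dp = q - x, p - y
    rq, rp = c * dq + s * dp, -s * dq + c * dp
    sq, sp = q + x, p + y
    return 0.5 * (sq + rq), 0.5 * (sp + rp), 0.5 * (sq - rq), 0.5 * (sp - rp)

def strang_step(state, dH_dq, dH_dp, omega, dt):
    """One second-order step: phi_1, phi_2, phi_3, phi_2, phi_1 as in (17)."""
    q, p, x, y = state
    q, p, x, y = phi_A(q, p, x, y, dH_dq, dH_dp, dt / 2)
    q, p, x, y = phi_B(q, p, x, y, dH_dq, dH_dp, dt / 2)
    q, p, x, y = phi_C(q, p, x, y, omega, dt)
    q, p, x, y = phi_B(q, p, x, y, dH_dq, dH_dp, dt / 2)
    q, p, x, y = phi_A(q, p, x, y, dH_dq, dH_dp, dt / 2)
    return q, p, x, y

# Illustrative nonseparable Hamiltonian H = (q^2 + 1)(p^2 + 1) / 2.
def H(q, p):
    return 0.5 * (q ** 2 + 1.0) * (p ** 2 + 1.0)

def dH_dq(q, p):
    return q * (p ** 2 + 1.0)

def dH_dp(q, p):
    return p * (q ** 2 + 1.0)

q0, p0 = 0.5, 0.5
state = (q0, p0, q0, p0)          # auxiliary copy starts equal to (q, p)
for _ in range(1000):
    state = strang_step(state, dH_dq, dH_dp, omega=5.0, dt=1e-3)
q, p, x, y = state
energy_error = abs(H(q, p) - H(q0, p0))
copy_defect = max(abs(q - x), abs(p - y))
```

Every sub-step is explicit even though $\mathcal{H}$ itself is not separable, because each of $\mathcal{H}_A$ and $\mathcal{H}_B$ depends on only one variable from each conjugate pair of the extended phase space.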

Next, we introduce two methods for solving partial differential equations (PDEs), which are the Roe solver and Lagrangian Vortex Method.

3.2.5 Roe Solver

In continuum mechanics, a one-dimensional hyperbolic conservation law is a first-order quasilinear hyperbolic PDE

$$\frac{\partial \bm{u}}{\partial t} + \frac{\partial \bm{F}(\bm{u})}{\partial x} = 0, \tag{18}$$

with an initial condition

$$\bm{u}(t = t_0, x) = \bm{u}_0(x), \tag{19}$$

and a proper boundary condition. Here the $N_c$-component vector $\bm{u} = [u^{(1)}, u^{(2)}, \cdots, u^{(N_c)}]^T$ is the conserved quantity, $t \in T = [t_0, t_1]$ denotes the time variable, $x$ denotes the spatial coordinate in a computational domain $\Omega$, and $\bm{F} = [F^{(1)}, F^{(2)}, \cdots, F^{(N_c)}]^T$ is an $N_c$-component flux function. The conservation laws described by (18) are fundamental in continuum mechanics, such as mass conservation, momentum conservation, and energy conservation in fluid mechanics [56].

Equation (18) can also be expressed in a weak form, which extends the class of admissible solutions to include discontinuous solutions. Specifically, by defining an arbitrary test function $\phi(t,x)$ that is continuously differentiable both in time and space with compact support, and integrating (18)$\times\phi$ in the space-time domain $T\times\Omega$, the weak form of (18) is derived as

$$\int_{T\times\Omega}\left(\bm{u}\frac{\partial\phi}{\partial t}+\bm{F}\frac{\partial\phi}{\partial x}\right)\textrm{d}t\,\textrm{d}x=0. \tag{20}$$

We remark that, by the generalized Stokes theorem, all the partial derivatives of $\bm{u}$ and $\bm{F}$ in (18) have been passed on to the test function $\phi$ in (20), which by the hypothesis above is sufficiently smooth to admit these derivatives [57]. In the absence of ambiguity, we refer to the solution of (18) below as a weak solution that satisfies (20).
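The step from (18) to (20) can be written out explicitly. The following is only a sketch, assuming that the compact support of $\phi$ means $\phi$ vanishes on the boundary of $T\times\Omega$, so that no boundary terms survive. Multiplying (18) by $\phi$, integrating, and applying the product rule gives

$$\begin{aligned}
0&=\int_{T\times\Omega}\phi\left(\frac{\partial\bm{u}}{\partial t}+\frac{\partial\bm{F}}{\partial x}\right)\textrm{d}t\,\textrm{d}x\\
&=\int_{T\times\Omega}\left[\frac{\partial(\phi\bm{u})}{\partial t}+\frac{\partial(\phi\bm{F})}{\partial x}\right]\textrm{d}t\,\textrm{d}x-\int_{T\times\Omega}\left(\bm{u}\frac{\partial\phi}{\partial t}+\bm{F}\frac{\partial\phi}{\partial x}\right)\textrm{d}t\,\textrm{d}x.
\end{aligned}$$

The first integral on the last line is a space-time divergence. It reduces to an integral of $\phi\bm{u}$ and $\phi\bm{F}$ over the boundary of $T\times\Omega$, which is zero under the stated assumption, and what remains is (20).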

In addition, (18) can be written in a higher-dimensional form

$$\frac{\partial\bm{u}}{\partial t}+\sum_{i=1}^{N_{d}}\frac{\partial\bm{F}_{i}(\bm{u})}{\partial x_{i}}=\bm{0}, \tag{21}$$

where $x_1,x_2,\cdots,x_{N_d}$ denote the $N_d$-dimensional spatial coordinates. Since every term in the sum of (21), namely $\partial\bm{F}_i(\bm{u})/\partial x_i$, has the same form $\partial\bm{F}(\bm{u})/\partial x$ as the second term of (18), (21) can be solved readily once the solution of (18) is available. Thus, we will only discuss the numerical method to solve (18).

Philip L. Roe proposed an approximate Riemann solver based on the Godunov scheme that constructs an estimate of the intercell numerical flux of $\bm{F}$ in (18) on the interface of two neighboring computational cells in a discretized space-time computational domain [58]. In particular, the Roe solver discretizes (18) as

$$\bm{u}_{j}^{n+1}=\bm{u}_{j}^{n}-\lambda_{r}\left(\hat{\bm{F}}_{j+\frac{1}{2}}^{n}-\hat{\bm{F}}_{j-\frac{1}{2}}^{n}\right), \tag{22}$$

where $\lambda_r=\Delta t/\Delta x$ is the ratio of the temporal step size $\Delta t$ to the spatial step size $\Delta x$, $j=1,\ldots,N_g$ is the grid node index, and

$$\hat{\bm{F}}_{j+\frac{1}{2}}^{n}=\hat{\bm{F}}(\bm{u}_{j}^{n},\bm{u}_{j+1}^{n}) \tag{23}$$

with

$$\hat{\bm{F}}(\bm{u},\bm{v})=\frac{1}{2}\left[\bm{F}(\bm{u})+\bm{F}(\bm{v})-|\tilde{\bm{A}}(\bm{u},\bm{v})|(\bm{v}-\bm{u})\right]. \tag{24}$$

Here, the Roe matrix $\tilde{\bm{A}}$ is assumed constant between two cells and must obey the following Roe conditions:

  1. Matrix $\tilde{\bm{A}}$ is a diagonalizable matrix with real eigenvalues, i.e., matrix $\tilde{\bm{A}}(\bm{u},\bm{v})$ can be diagonalized as

    $$\tilde{\bm{A}}=\bm{L}^{-1}\bm{\Lambda}\bm{L} \tag{25}$$

    with an invertible matrix $\bm{L}$ and a diagonal matrix $\bm{\Lambda}=\textrm{diag}(\Lambda_{1},\cdots,\Lambda_{N_{c}})$.

  2. Matrix $\tilde{\bm{A}}$ is consistent with an exact Jacobian, that is

    $$\lim_{\bm{u}_{j},\bm{u}_{j+1}\rightarrow\bm{u}}\tilde{\bm{A}}(\bm{u}_{j},\bm{u}_{j+1})=\frac{\partial\bm{F}(\bm{u})}{\partial\bm{u}}. \tag{26}$$

  3. Physical quantity $\bm{u}$ is conserved on the interface between two computational cells as

    $$\bm{F}_{j+1}-\bm{F}_{j}=\tilde{\bm{A}}(\bm{u}_{j+1}-\bm{u}_{j}). \tag{27}$$

We denote the absolute value of $\tilde{\bm{A}}(\bm{u},\bm{v})$ as

$$|\tilde{\bm{A}}|=\bm{L}^{-1}|\bm{\Lambda}|\bm{L}, \tag{28}$$

where $|\bm{\Lambda}|=\textrm{diag}(|\Lambda_{1}|,\cdots,|\Lambda_{N_{c}}|)$ is the absolute value of $\bm{\Lambda}$. Substituting (23), (24) and (28) into (22) along with the third Roe condition (27) yields

$$\begin{aligned}
\bm{u}_{j}^{n+1}={}&\bm{u}_{j}^{n}-\frac{1}{2}\lambda_{r}\Big[(\bm{L}^{n}_{j+\frac{1}{2}})^{-1}(\bm{\Lambda}_{j+\frac{1}{2}}^{n}-|\bm{\Lambda}_{j+\frac{1}{2}}^{n}|)\bm{L}_{j+\frac{1}{2}}^{n}(\bm{u}_{j+1}^{n}-\bm{u}_{j}^{n})\\
&+(\bm{L}^{n}_{j-\frac{1}{2}})^{-1}(\bm{\Lambda}_{j-\frac{1}{2}}^{n}+|\bm{\Lambda}_{j-\frac{1}{2}}^{n}|)\bm{L}_{j-\frac{1}{2}}^{n}(\bm{u}_{j}^{n}-\bm{u}_{j-1}^{n})\Big],
\end{aligned} \tag{29}$$

with

$$\bm{L}_{j+\frac{1}{2}}^{n}=\bm{L}(\bm{u}_{j}^{n},\bm{u}_{j+1}^{n}),~~~~\bm{\Lambda}_{j+\frac{1}{2}}^{n}=\bm{\Lambda}(\bm{u}_{j}^{n},\bm{u}_{j+1}^{n}). \tag{30}$$

Equation (29) serves as a template of evolution from $\bm{u}_{j}^{n}$ to $\bm{u}_{j}^{n+1}$ in the Roe solver.
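The intermediate step behind (29) can be sketched as follows. Here $\tilde{\bm{A}}_{j\pm\frac{1}{2}}$ is shorthand introduced only for this sketch for the Roe matrix at the two interfaces of cell $j$, and the time superscript $n$ is dropped. Inserting (24) into the flux difference of (22), grouping the terms by interface, and then applying the third Roe condition (27) at each interface gives

$$\begin{aligned}
\hat{\bm{F}}_{j+\frac{1}{2}}-\hat{\bm{F}}_{j-\frac{1}{2}}
&=\frac{1}{2}\left[\bm{F}_{j+1}-\bm{F}_{j}-|\tilde{\bm{A}}_{j+\frac{1}{2}}|(\bm{u}_{j+1}-\bm{u}_{j})\right]+\frac{1}{2}\left[\bm{F}_{j}-\bm{F}_{j-1}+|\tilde{\bm{A}}_{j-\frac{1}{2}}|(\bm{u}_{j}-\bm{u}_{j-1})\right]\\
&=\frac{1}{2}\left[(\tilde{\bm{A}}_{j+\frac{1}{2}}-|\tilde{\bm{A}}_{j+\frac{1}{2}}|)(\bm{u}_{j+1}-\bm{u}_{j})+(\tilde{\bm{A}}_{j-\frac{1}{2}}+|\tilde{\bm{A}}_{j-\frac{1}{2}}|)(\bm{u}_{j}-\bm{u}_{j-1})\right].
\end{aligned}$$

Using (25) and (28), $\tilde{\bm{A}}\pm|\tilde{\bm{A}}|=\bm{L}^{-1}(\bm{\Lambda}\pm|\bm{\Lambda}|)\bm{L}$, which turns the last line into the bracket of (29).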

The key to designing an effective Roe solver is to find the Roe matrix $\tilde{\bm{A}}$ that satisfies the three Roe conditions. To construct a Roe matrix $\tilde{\bm{A}}$ in (25), the Roe solver uses an analytical approach to solve for $\bm{L}$ and $\bm{\Lambda}$ based on $\bm{F}(\bm{u})$. The Roe matrix is then plugged into (29) to ultimately solve for $\bm{u}$ in (18). The Roe solver linearizes Riemann problems, and this linearization recognizes the problem's nonlinear jumps while remaining computationally efficient.
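To make the update (22) with the flux (24) concrete, the following is a minimal sketch for a scalar conservation law. The choice of the Burgers flux $F(u)=u^2/2$, the periodic grid, the initial profile, and all function names are assumptions made for illustration and are not taken from the text. In the scalar case the Roe matrix reduces to a single wave speed, and for the Burgers flux the arithmetic mean of the two states satisfies the third Roe condition (27) exactly.

```python
import numpy as np

def flux(u):
    """Burgers flux F(u) = u^2 / 2 (illustrative choice, not from the text)."""
    return 0.5 * u**2

def roe_speed(u, v):
    """Scalar Roe 'matrix': a = (u + v) / 2 gives F(v) - F(u) = a * (v - u)."""
    return 0.5 * (u + v)

def roe_flux(u, v):
    """Intercell flux (24); for a scalar law |A| is simply |a|."""
    a = roe_speed(u, v)
    return 0.5 * (flux(u) + flux(v) - np.abs(a) * (v - u))

def roe_step(u, lam):
    """One update (22) on a periodic grid, with lam = dt / dx."""
    f_right = roe_flux(u, np.roll(u, -1))  # numerical flux at j + 1/2
    f_left = np.roll(f_right, 1)           # numerical flux at j - 1/2
    return u - lam * (f_right - f_left)

def simulate(n_cells=200, n_steps=100, cfl=0.5):
    """Advance a smooth, strictly positive profile for n_steps time steps."""
    x = np.linspace(0.0, 1.0, n_cells, endpoint=False)
    dx = x[1] - x[0]
    u = 1.0 + 0.5 * np.sin(2.0 * np.pi * x)
    for _ in range(n_steps):
        dt = cfl * dx / np.max(np.abs(u))  # time step from a CFL condition
        u = roe_step(u, dt / dx)
    return x, u
```

Because the update is written in flux-difference form, the sum of the cell values is preserved on the periodic grid up to round-off. The initial profile is kept strictly positive so that no sonic point occurs; a production solver would typically add an entropy fix for that case, which this sketch omits.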

3.2.6 Lagrangian Vortex Method (LVM)

Given a fluid velocity field $\bm{u}(\bm{x},t)$ with an incompressible constraint, its underlying dynamics can be described by the NS equations

$$\begin{cases}\dfrac{D\bm{u}}{Dt}=-\dfrac{1}{\rho}\bm{\nabla}p+\nu\nabla^{2}\bm{u}+\bm{f},\\ \bm{\nabla}\cdot\bm{u}=0,\end{cases} \tag{31}$$

where $t$ denotes the time, $D/Dt=\partial/\partial t+\bm{u}\cdot\bm{\nabla}$ is the material derivative, $p$ is the pressure, $\nu$ is the kinematic viscosity, $\rho$ is the density, and $\bm{f}$ denotes the body accelerations (per unit mass) acting on the continuum, for example, gravity, inertial accelerations, electric field acceleration, and so on.

An alternative form of the NS equations can be obtained by defining the vorticity field $\bm{\omega}=\bm{\nabla}\times\bm{u}$, which leads to the following vorticity dynamical equation

$$\begin{cases}\dfrac{D\bm{\omega}}{Dt}=(\bm{\omega}\cdot\bm{\nabla})\bm{u}+\nu\bm{\nabla}^{2}\bm{\omega}+\bm{\nabla}\times\bm{f},\\ \nabla^{2}\bm{\Psi}=-\bm{\omega},~~\bm{u}=\bm{\nabla}\times\bm{\Psi},\end{cases} \tag{32}$$

where $\bm{\Psi}$ is a vector potential whose curl is the velocity field. Although this form does not seem to bring any simplification, the key insight behind this transformation stems from Helmholtz's theorems [59], which state that the dynamics of the vorticity field can be described by vortex surfaces/lines, which are Lagrangian surfaces/lines flowing with the velocity field in inviscid flows [60, 61].

The LVM discretizes the vorticity dynamical equation (32) with $N$ particles, resulting in a set of ODEs for the particle strengths $\bm{\Gamma}=\{\bm{\Gamma}_{i}\,|\,i=1,\cdots,N\}$ and the particle positions $\bm{X}=\{\bm{X}_{i}\,|\,i=1,\cdots,N\}$ as

$$\begin{cases}\dfrac{\textrm{d}\bm{\Gamma}_{i}}{\textrm{d}t}=\bm{\gamma}_{i},\\ \dfrac{\textrm{d}\bm{X}_{i}}{\textrm{d}t}=\bm{u}_{i}+\bm{v}_{i}.\end{cases} \tag{33}$$

Here, the particle strength $\bm{\Gamma}_{i}$ is the integral of $\bm{\omega}$ over the $i^{\textrm{th}}$ computational element, and $\bm{u}_{i}$ is the induced velocity calculated by the BS law

$$\bm{u}_{i}=\frac{1}{2(n_{d}-1)\pi}\sum_{j\neq i}^{N}\frac{\bm{\Gamma}_{j}\times(\bm{X}_{i}-\bm{X}_{j})}{|\bm{X}_{i}-\bm{X}_{j}|^{n_{d}}+\mathcal{R}^{n_{d}}}, \tag{34}$$

where $n_{d}$ is the dimension of the flow field. In addition, $\bm{\gamma}_{i}$ and $\bm{v}_{i}$ are the change rate of the particle strength and the drift velocity [62], respectively. To avoid singularities in the BS law, we introduce the numerical regularization parameter $\mathcal{R}$ in the LVM and set $\mathcal{R}=0.1$. The effect of the regularization parameter on the simulated flow evolution of the vortex particles is rather small because of the large spacing between the vortex particles.

In a two-dimensional ideal fluid flow, i.e., a strictly inviscid barotropic flow with conservative body forces, the movements of Lagrangian particles with conserved vorticity strength are determined by the velocity field they create, thus allowing us to advance the simulation temporally [63]. However, in a real three-dimensional flow, under the action of vortex stretching, vortex distortion, viscous dissipation, external forces, etc., the Lagrangian advection of vortex particles and their strength need to be corrected by $\bm{\gamma}_{i}$ and $\bm{v}_{i}$ in (33).
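The two-dimensional ideal case can be sketched in a few lines. The code below evaluates the regularized BS sum (34) with $n_d=2$, where each strength is treated as a scalar multiplying the out-of-plane unit vector, and advances the positions in (33) with $\bm{\gamma}_i=\bm{0}$ and $\bm{v}_i=\bm{0}$. The function names, the array layout, and the forward-Euler time stepping are assumptions made for this illustration rather than the scheme used in the thesis.

```python
import numpy as np

def induced_velocity_2d(X, gamma, reg=0.1):
    """Regularized Biot-Savart sum (34) with n_d = 2.

    X: (N, 2) particle positions; gamma: (N,) scalar strengths, i.e. the
    out-of-plane component of each Gamma_j; reg: the parameter R.
    """
    d = X[:, None, :] - X[None, :, :]        # d[i, j] = X_i - X_j
    r2 = np.sum(d**2, axis=-1) + reg**2      # |X_i - X_j|^2 + R^2
    np.fill_diagonal(r2, np.inf)             # drop the j == i term
    # For Gamma = gamma * e_z, the cross product Gamma x d is gamma * (-d_y, d_x).
    cross = np.stack([-d[..., 1], d[..., 0]], axis=-1) * gamma[None, :, None]
    return np.sum(cross / r2[..., None], axis=1) / (2.0 * np.pi)

def advect(X, gamma, dt, n_steps, reg=0.1):
    """Forward-Euler integration of dX_i/dt = u_i with fixed strengths."""
    X = X.copy()
    for _ in range(n_steps):
        X = X + dt * induced_velocity_2d(X, gamma, reg)
    return X
```

For two vortices of equal strength, the induced velocities are equal and opposite, so the pair rotates about its fixed centroid. The pairwise evaluation costs $O(N^2)$ per step, which is acceptable for a sketch but not for large particle counts.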

We remark that the NS equations can be accurately modeled by the LVM with a large number of computational elements and a reasonable discrete distribution. However, the implementation of the LVM faces a major challenge: modeling the right-hand sides (r.h.s.) of the set of ordinary differential equations based on the NS equations. First, the assumption that the vortices are point-like largely limits the use of the continuous BS law. Second, the drift velocity due to the external force cannot be obtained using the LVM without knowing the function of the external force. Even given the function, the LVM still fails to capture the drift velocity accurately in most cases [62]. Finally, when two particles are close enough, the singularity of the discrete BS law leads to a significant numerical error. These problems make the LVM inaccurate or inapplicable for solving the underlying fluid dynamics in many situations [63].

4 Implementation

4.1 Symplectic Taylor Neural Networks (Taylor-nets)

4.1.1 Symplectomorphism in Hamiltonian Mechanics

Consider a separable Hamiltonian system described by (6), (7), and (8). Substituting (8) into (6) yields

$$\begin{cases}\dfrac{\textrm{d}\bm{q}}{\textrm{d}t}=\dfrac{\partial T(\bm{p})}{\partial\bm{p}},\\ \dfrac{\textrm{d}\bm{p}}{\textrm{d}t}=-\dfrac{\partial V(\bm{q})}{\partial\bm{q}}.\end{cases} \tag{35}$$

This set of equations is fundamental in designing our neural networks. Our model will learn the r.h.s. of (35) under the framework of ODE-net.

One of the important features of the time evolution of Hamilton’s equations is symplectomorphism, which represents a transformation of phase space that is volume-preserving. In the setting of canonical coordinates, symplectomorphism means the transformation of the phase flow of a Hamiltonian system conserves the symplectic two-form

$$\textrm{d}\bm{p}\wedge\textrm{d}\bm{q}\equiv\sum_{j=1}^{N}\left(\textrm{d}p_{j}\wedge\textrm{d}q_{j}\right), \tag{36}$$

where $\wedge$ denotes the wedge product of two differential forms. Inspired by the symplectomorphism feature, we aim to construct a neural network architecture that intrinsically preserves the Hamiltonian structure.

4.1.2 A symmetric network in Taylor expansion form

To learn the gradients of the Hamiltonian with respect to the generalized coordinates, we propose a set of symmetric networks

$$\begin{cases}\bm{T}_{p}(\bm{p},\bm{\theta}_{p})\rightarrow\dfrac{\partial T(\bm{p})}{\partial\bm{p}},\\ \bm{V}_{q}(\bm{q},\bm{\theta}_{q})\rightarrow\dfrac{\partial V(\bm{q})}{\partial\bm{q}},\end{cases} \tag{37}$$

with parameters $(\bm{\theta}_{p},\bm{\theta}_{q})$ that are designed to learn the r.h.s. of (35), respectively. Here, the “$\rightarrow$” represents our attempt to use the left-hand side (l.h.s.) to learn the r.h.s. Substituting (37) into (35) yields

$$\begin{cases}\dfrac{\textrm{d}\bm{q}}{\textrm{d}t}=\bm{T}_{p}(\bm{p},\bm{\theta}_{p}),\\ \dfrac{\textrm{d}\bm{p}}{\textrm{d}t}=-\bm{V}_{q}(\bm{q},\bm{\theta}_{q}).\end{cases} \tag{38}$$

Therefore, under the initial condition (7), the trajectories of the canonical coordinates can be integrated as

$$\begin{cases}\bm{q}(t)=\bm{q}_{0}+\displaystyle\int_{t_{0}}^{t}\bm{T}_{p}(\bm{p},\bm{\theta}_{p})\,\textrm{d}t,\\ \bm{p}(t)=\bm{p}_{0}-\displaystyle\int_{t_{0}}^{t}\bm{V}_{q}(\bm{q},\bm{\theta}_{q})\,\textrm{d}t.\end{cases} \tag{39}$$
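As an illustration of how the two integrals above might be evaluated numerically, the sketch below steps (38) forward with the Störmer-Verlet (leapfrog) scheme. The choice of leapfrog is an assumption made for this example, since this section does not fix the integrator, and the callables `T_p` and `V_q` stand in for the learned networks; any functions with the same signature can be passed.

```python
import numpy as np

def leapfrog(q0, p0, T_p, V_q, dt, n_steps):
    """Integrate dq/dt = T_p(p), dp/dt = -V_q(q) with kick-drift-kick steps.

    T_p and V_q are placeholders for the networks in (38); here they are
    plain callables mapping an array to an array of the same shape.
    """
    q = np.array(q0, dtype=float)
    p = np.array(p0, dtype=float)
    traj = [(q.copy(), p.copy())]
    for _ in range(n_steps):
        p = p - 0.5 * dt * V_q(q)  # half step in momentum
        q = q + dt * T_p(p)        # full step in position
        p = p - 0.5 * dt * V_q(q)  # second half step in momentum
        traj.append((q.copy(), p.copy()))
    return traj
```

For instance, passing `T_p = lambda p: p` and `V_q = lambda q: q` reproduces a unit harmonic oscillator, for which the energy $(p^2+q^2)/2$ stays close to its initial value over long runs. Each sub-step updates only one of the two variables using the other, which is possible precisely because the system is separable.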

From (37), we obtain

$$\begin{cases}\dfrac{\partial\bm{T}_{p}(\bm{p},\bm{\theta}_{p})}{\partial\bm{p}}\rightarrow\dfrac{\partial^{2}T(\bm{p})}{\partial\bm{p}^{2}},\\ \dfrac{\partial\bm{V}_{q}(\bm{q},\bm{\theta}_{q})}{\partial\bm{q}}\rightarrow\dfrac{\partial^{2}V(\bm{q})}{\partial\bm{q}^{2}}.\end{cases} \tag{40}$$

The r.h.s. of (40) are the Hessian matrices of $T$ and $V$, respectively, which are symmetric, so we can design $\bm{T}_{p}(\bm{p},\bm{\theta}_{p})$ and $\bm{V}_{q}(\bm{q},\bm{\theta}_{q})$ as symmetric mappings, that is,

$$\frac{\partial\bm{T}_{p}(\bm{p},\bm{\theta}_{p})}{\partial\bm{p}}=\left[\frac{\partial\bm{T}_{p}(\bm{p},\bm{\theta}_{p})}{\partial\bm{p}}\right]^{T}, \tag{41}$$

and

$$\frac{\partial\bm{V}_{q}(\bm{q},\bm{\theta}_{q})}{\partial\bm{q}}=\left[\frac{\partial\bm{V}_{q}(\bm{q},\bm{\theta}_{q})}{\partial\bm{q}}\right]^{T}. \tag{42}$$

Due to the multiple nonlinear layers in their construction, traditional deep neural networks cannot fulfill (41) and (42). Therefore, we use a three-layer network of the form linear-activation-linear, where the weights of the two linear layers are the transpose of each other. To maintain the expressive power of the networks, we construct symmetric nonlinear terms, of the same form as the terms of a Taylor polynomial, and combine them linearly. Specifically, we construct a symmetric network $\bm{T}_{p}(\bm{p},\bm{\theta}_{p})$ as

$$\bm{T}_p(\bm{p},\bm{\theta}_p) = \left(\sum_{i=1}^{M} \bm{A}_i^{T}\circ f_i\circ\bm{A}_i - \bm{B}_i^{T}\circ f_i\circ\bm{B}_i\right)\circ\bm{p} + \bm{b}, \tag{43}$$

where ‘$\circ$’ denotes function composition, $\bm{A}_i$ and $\bm{B}_i$ are fully connected layers of size $N_h \times N$, $\bm{b}$ is an $N$-dimensional bias, $M$ is the number of terms in the Taylor series expansion, and $f_i$ is an element-wise function representing the $i^{\textrm{th}}$-order term in the Taylor polynomial,

$$f_i(x) = \frac{1}{i!}x^{i}. \tag{44}$$

Figure 1 shows a schematic diagram of $\bm{T}_p(\bm{p},\bm{\theta}_p)$ in Taylor-net. The input of $\bm{T}_p(\bm{p},\bm{\theta}_p)$ is $\bm{p}$, and $\bm{\theta}_p = (\bm{A}_i, \bm{B}_i, \bm{b})$. We pair each positive term $\bm{A}_i^{T}\circ f_i\circ\bm{A}_i$ with a negative term $\bm{B}_i^{T}\circ f_i\circ\bm{B}_i$ because any symmetric matrix can be written as the difference of two positive semidefinite matrices.
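One way to see the last statement is sketched below. The symbols $\bm{S}$, $\bm{Q}$, $\bm{\Lambda}_{+}$, and $\bm{\Lambda}_{-}$ are introduced here only for illustration and do not appear elsewhere in the text. Any real symmetric matrix $\bm{S}$ has an eigendecomposition $\bm{S} = \bm{Q}\bm{\Lambda}\bm{Q}^{T}$ with $\bm{Q}$ orthogonal and $\bm{\Lambda}$ diagonal. Splitting the eigenvalues by sign, with $\bm{\Lambda}_{+} = \max(\bm{\Lambda}, 0)$ and $\bm{\Lambda}_{-} = \max(-\bm{\Lambda}, 0)$ taken entrywise, gives

$$\bm{S} = \bm{Q}\bm{\Lambda}_{+}\bm{Q}^{T} - \bm{Q}\bm{\Lambda}_{-}\bm{Q}^{T} = \bm{A}^{T}\bm{A} - \bm{B}^{T}\bm{B}, \qquad \bm{A} = \bm{\Lambda}_{+}^{1/2}\bm{Q}^{T}, \quad \bm{B} = \bm{\Lambda}_{-}^{1/2}\bm{Q}^{T}.$$

For instance, $\textrm{diag}(2,-3) = \textrm{diag}(2,0) - \textrm{diag}(0,3)$. For the first-order term $f_1(x) = x$, the Jacobian contribution of (43) is exactly $\bm{A}_1^{T}\bm{A}_1 - \bm{B}_1^{T}\bm{B}_1$, so a single pair of terms can already represent any constant symmetric Jacobian, provided $N_h$ is at least $N$.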

Figure 1: The schematic diagram of $\bm{T}_p(\bm{p},\bm{\theta}_p)$ in Taylor-net. Source: [1].
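The construction in (43) and (44) can be sketched in a few lines of NumPy. This is a minimal illustration, not the thesis implementation: the function names, the array layout (stacking the $\bm{A}_i$ and $\bm{B}_i$ along a leading axis), the random initialization, and the sizes `M`, `N_h`, `N` are all assumptions made here. The finite-difference Jacobian serves only as a numerical check of (41).

```python
import math
import numpy as np

def taylor_term(x, i):
    """Element-wise f_i(x) = x**i / i!, as in (44)."""
    return x**i / math.factorial(i)

def symmetric_taylor_net(p, A, B, b):
    """Evaluate (43): sum_i [A_i^T f_i(A_i p) - B_i^T f_i(B_i p)] + b.

    A and B have shape (M, N_h, N); p and b have shape (N,).
    """
    out = b.copy()
    for i in range(1, A.shape[0] + 1):
        out += A[i - 1].T @ taylor_term(A[i - 1] @ p, i)
        out -= B[i - 1].T @ taylor_term(B[i - 1] @ p, i)
    return out

def numerical_jacobian(fn, p, eps=1e-6):
    """Central finite-difference Jacobian, used only to check symmetry."""
    n = p.size
    jac = np.zeros((n, n))
    for k in range(n):
        step = np.zeros(n)
        step[k] = eps
        jac[:, k] = (fn(p + step) - fn(p - step)) / (2 * eps)
    return jac

# Hypothetical sizes and random weights, chosen only for this demonstration.
rng = np.random.default_rng(0)
M, N_h, N = 3, 5, 4
A = 0.5 * rng.standard_normal((M, N_h, N))
B = 0.5 * rng.standard_normal((M, N_h, N))
b = rng.standard_normal(N)
p = rng.standard_normal(N)

J = numerical_jacobian(lambda v: symmetric_taylor_net(v, A, B, b), p)
asymmetry = np.max(np.abs(J - J.T))
print("max |J - J^T| =", asymmetry)
```

The printed asymmetry should sit at the level of finite-difference noise for any choice of weights, since the symmetry comes from the transposed weight sharing rather than from training.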

To show that the network (43) has a symmetric Jacobian, that is, that it fulfills (41), we introduce Theorem 4.1.

Theorem 4.1.

The network (43) satisfies (41).

Proof.

From (43), we have

$$\frac{\partial \bm{T}_p(\bm{p},\bm{\theta}_p)}{\partial \bm{p}} = \sum_{i=1}^{M} \bm{A}_i^{T}\bm{\Lambda}_i^{A}\bm{A}_i - \bm{B}_i^{T}\bm{\Lambda}_i^{B}\bm{B}_i, \tag{45}$$

with

$$\bm{\Lambda}_i^{A} = \textrm{diag}\left(\frac{\textrm{d}f_i}{\textrm{d}x}\Bigg|_{x=\bm{A}_i\circ\bm{p}}\right), \tag{46}$$

and

$$\bm{\Lambda}_i^{B} = \textrm{diag}\left(\frac{\textrm{d}f_i}{\textrm{d}x}\Bigg|_{x=\bm{B}_i\circ\bm{p}}\right). \tag{47}$$

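Since $\bm{\Lambda}_i^{A}$ and $\bm{\Lambda}_i^{B}$ are diagonal, every term $\bm{A}_i^{T}\bm{\Lambda}_i^{A}\bm{A}_i$ and $\bm{B}_i^{T}\bm{\Lambda}_i^{B}\bm{B}_i$ equals its own transpose, so (45) is a symmetric matrix that satisfies (41). ∎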
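The step from (43) to (45) can be spelled out for a single summand. As a sketch, write $\bm{g}(\bm{p}) = \bm{A}^{T} f(\bm{A}\bm{p})$ for one term of the sum; the symbol $\bm{g}$ and the dropped index $i$ are shorthand introduced here. The chain rule gives, component by component,

$$\frac{\partial g_l}{\partial p_m} = \sum_{k=1}^{N_h} A_{kl}\, f'\big((\bm{A}\bm{p})_k\big)\, A_{km} = \left(\bm{A}^{T}\bm{\Lambda}\bm{A}\right)_{lm}, \qquad \bm{\Lambda} = \textrm{diag}\big(f'(\bm{A}\bm{p})\big).$$

The middle expression is unchanged when $l$ and $m$ are swapped, which is the symmetry required by (41). The same argument applies to the $\bm{B}_i$ terms, and the constant bias $\bm{b}$ does not contribute to the Jacobian.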

In fact, $\bm{T}_p(\bm{p},\bm{\theta}_p)$ in (41) and $\bm{V}_q(\bm{q},\bm{\theta}_q)$ in (42) must satisfy the same property, so we construct $\bm{V}_q$ in a similar form as

$$\bm{V}_q(\bm{q},\bm{\theta}_q) = \left(\sum_{i=1}^{M} \bm{C}_i^{T}\circ f_i\circ\bm{C}_i - \bm{D}_i^{T}\circ f_i\circ\bm{D}_i\right)\circ\bm{q} + \bm{d}. \tag{48}$$

Here, $\bm{C}_i$, $\bm{D}_i$, and $\bm{d}$ have the same structure as $\bm{A}_i$, $\bm{B}_i$, and $\bm{b}$ in (43), and $(\bm{C}_i, \bm{D}_i, \bm{d}) = \bm{\theta}_q$.

4.1.3 Symplectic Taylor neural networks

Next, we substitute the constructed networks (43) and (48) into (39) to learn the Hamiltonian system (35). We employ ODE-net [50], introduced in 3.1.5, as our computational infrastructure. Inspired by the idea of ODE-net, we design neural networks that can learn continuous time evolution. In the Hamiltonian system (35), where the coordinates are integrated as in (39), we can implement a time integrator to solve for $\bm{p}$ and $\bm{q}$. ODE-net uses the fourth-order Runge–Kutta method, which is not symplectic; to make the neural networks structure-preserving, we need an integrator that is symplectic. Therefore, we introduce Taylor-net, in which we design the symmetric Taylor series expansion and use the fourth-order symplectic integrator to construct symplectic neural networks. These networks learn the gradients of the Hamiltonian with respect to the generalized coordinates and, ultimately, the temporal integral of a Hamiltonian system.

Algorithm 1 Integrate (39) by using the fourth-order symplectic integrator. Source: [1].
Input: $\bm{q}_0, \bm{p}_0, t_0, t, \Delta t$,
    $\bm{F}_t^{j}$ in (49) and $\bm{F}_k^{j}$ in (50) with $j = 1, 2, 3, 4$;
Output: $\bm{q}(t), \bm{p}(t)$
$n = \textrm{floor}[(t - t_0)/\Delta t]$;
for $i = 1, \dots, n$
    $(\bm{k}_p^{0}, \bm{k}_q^{0}) = (\bm{p}_{i-1}, \bm{q}_{i-1})$;
    for $j = 1, \dots, 4$
        $(\bm{t}_p^{j-1}, \bm{t}_q^{j-1}) = \bm{F}_t^{j}(\bm{k}_p^{j-1}, \bm{k}_q^{j-1}, \Delta t)$,
        $(\bm{k}_p^{j}, \bm{k}_q^{j}) = \bm{F}_k^{j}(\bm{t}_p^{j-1}, \bm{t}_q^{j-1}, \Delta t)$,
    end
    $(\bm{p}_i, \bm{q}_i) = (\bm{k}_p^{4}, \bm{k}_q^{4})$;
end
$\bm{q}(t) = \bm{q}_n, \bm{p}(t) = \bm{p}_n$.

For the constructed networks (43) and (48), we integrate (39) by using the fourth-order symplectic integrator introduced in 3.2.3. Specifically, we have an input layer $(\bm{q}_0,\bm{p}_0)$ at $t=t_0$ and an output layer $(\bm{q}_n,\bm{p}_n)$ at $t=t_0+n\,\textrm{d}t$. The recursive relations of $(\bm{q}_i,\bm{p}_i)$, $i=1,2,\cdots,n$, are expressed by Algorithm 1. The input functions in Algorithm 1 are

$$\bm{F}_t^{j}(\bm{p},\bm{q},\textrm{d}t) = \left(\bm{p},\; \bm{q} + c_j\bm{T}_p(\bm{p},\bm{\theta}_p)\,\textrm{d}t\right), \tag{49}$$

and

$$\bm{F}_k^{j}(\bm{p},\bm{q},\textrm{d}t) = \left(\bm{p} - d_j\bm{V}_q(\bm{q},\bm{\theta}_q)\,\textrm{d}t,\; \bm{q}\right), \tag{50}$$

with coefficients (11).
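The loop structure of Algorithm 1 with the sub-steps (49) and (50) can be sketched as follows. Two assumptions are made here and should be checked against the text: the coefficient values are the common Forest-Ruth/Yoshida choice for a fourth-order symplectic scheme, which may or may not coincide with (11), and the trained networks $\bm{T}_p$ and $\bm{V}_q$ are replaced by the exact gradients of a harmonic oscillator, $H = \frac{1}{2}(p^2 + q^2)$, so that the result can be compared with a known solution. The function name `symplectic4` is hypothetical.

```python
import numpy as np

# Assumed coefficients (Forest-Ruth/Yoshida form); compare with (11) before reuse.
_CBRT2 = 2.0 ** (1.0 / 3.0)
_W = 2.0 - _CBRT2
C = np.array([1.0, 1.0 - _CBRT2, 1.0 - _CBRT2, 1.0]) / (2.0 * _W)
D = np.array([1.0, -_CBRT2, 1.0, 0.0]) / _W

def symplectic4(q0, p0, t0, t, dt, T_p, V_q):
    """Algorithm 1: n steps, each alternating F_t^j (49) and F_k^j (50)."""
    n = int(np.floor((t - t0) / dt))
    q = np.array(q0, dtype=float)
    p = np.array(p0, dtype=float)
    for _ in range(n):
        for c_j, d_j in zip(C, D):
            q = q + c_j * T_p(p) * dt   # F_t^j: update q, keep p
            p = p - d_j * V_q(q) * dt   # F_k^j: update p, keep q
    return q, p, n

# Stand-ins for the networks: dT/dp = p and dV/dq = q (harmonic oscillator).
dt = 0.01
q_n, p_n, n = symplectic4([1.0], [0.0], 0.0, 1.0, dt,
                          T_p=lambda p: p, V_q=lambda q: q)

energy_n = 0.5 * float(p_n @ p_n + q_n @ q_n)
t_end = n * dt
print("q error:", abs(q_n[0] - np.cos(t_end)))
print("p error:", abs(p_n[0] + np.sin(t_end)))
print("energy error:", abs(energy_n - 0.5))
```

In Taylor-net the two lambdas would be replaced by the networks (43) and (48), and the whole loop would be differentiated through during training.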

Relationships (49) and (50) are obtained by replacing $\partial T(\bm{p})/\partial\bm{p}$ and $\partial V(\bm{q})/\partial\bm{q}$ in the fourth-order symplectic integrator with the deliberately designed neural networks $\bm{T}_p(\bm{p},\bm{\theta}_p)$ and $\bm{V}_q(\bm{q},\bm{\theta}_q)$, respectively. Figure 2 shows a schematic diagram of Taylor-net as described by Algorithm 1. The input of Taylor-net is $(\bm{q}_0,\bm{p}_0)$, and the output is $(\bm{q}_n,\bm{p}_n)$. Taylor-net consists of $n$ iterations of the fourth-order symplectic integrator. The input of each iteration is $(\bm{q}_{i-1},\bm{p}_{i-1})$, and the output is $(\bm{q}_i,\bm{p}_i)$. Within the integrator, the output of $\bm{T}_p$ is used to calculate $\bm{q}$, while the output of $\bm{V}_q$ is used to calculate $\bm{p}$, which is signified by the shoelace-like pattern in the diagram. The four intermediate variables $\bm{t}_p^{0}\cdots\bm{t}_p^{4}$ and $\bm{k}_q^{0}\cdots\bm{k}_q^{4}$ indicate that the scheme is fourth-order.

Figure 2: The schematic diagram of Taylor-net. The input of Taylor-net is $(\bm{q}_0,\bm{p}_0)$, and the output is $(\bm{q}_n,\bm{p}_n)$. Taylor-net consists of $n$ iterations of the fourth-order symplectic integrator. The input of the integrator is $(\bm{q}_{i-1},\bm{p}_{i-1})$, and the output is $(\bm{q}_i,\bm{p}_i)$. The four intermediate variables $\bm{t}_p^{0}\cdots\bm{t}_p^{4}$ and $\bm{k}_q^{0}\cdots\bm{k}_q^{4}$ show that the scheme is fourth-order. Source: [1].

Because the network $\bm{T}_p(\bm{p},\bm{\theta}_p)$ in (43) is constructed to satisfy (41), Theorem 4.2 applies and shows that the map (49) preserves the symplectic structure of the system.

Theorem 4.2.

For a given $\textrm{d}t$, the mapping $\bm{F}_t^{j}(:,:,\textrm{d}t):\mathbb{R}^{2N}\rightarrow\mathbb{R}^{2N}$ in (49) is a symplectomorphism if and only if the Jacobian of $\bm{T}_p$ is a symmetric matrix, that is, it satisfies (41).

Proof.

Let

$$(\bm{t}_p,\bm{t}_q) = \bm{F}_t^{j}(\bm{k}_p,\bm{k}_q,\textrm{d}t). \tag{51}$$

From (49), we have

$$\begin{aligned} \textrm{d}\bm{t}_p\wedge\textrm{d}\bm{t}_q = {} & \textrm{d}\bm{k}_p\wedge\textrm{d}\bm{k}_q \\ & + \frac{1}{2}\sum_{l,m=1}^{N} c_j\,\textrm{d}t\left[\frac{\partial\bm{T}_p(\bm{k}_p,\bm{\theta}_p)}{\partial\bm{k}_p}\Bigg|_{l,m} - \frac{\partial\bm{T}_p(\bm{k}_p,\bm{\theta}_p)}{\partial\bm{k}_p}\Bigg|_{m,l}\right]\textrm{d}\bm{k}_p|_{l}\wedge\textrm{d}\bm{k}_p|_{m}. \end{aligned} \tag{52}$$

Here $\bm{A}|_{l,m}$ refers to the entry in the $l$-th row and $m$-th column of a matrix $\bm{A}$, and $\bm{x}|_{l}$ refers to the $l$-th component of a vector $\bm{x}$. From (52), we know that $\textrm{d}\bm{t}_p\wedge\textrm{d}\bm{t}_q = \textrm{d}\bm{k}_p\wedge\textrm{d}\bm{k}_q$ is equivalent to

$$\frac{\partial\bm{T}_p(\bm{k}_p,\bm{\theta}_p)}{\partial\bm{k}_p}\Bigg|_{l,m} - \frac{\partial\bm{T}_p(\bm{k}_p,\bm{\theta}_p)}{\partial\bm{k}_p}\Bigg|_{m,l} = 0, \quad \forall\, l,m = 1,2,\cdots,N, \tag{53}$$

which is (41). ∎
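Theorem 4.2 can also be checked numerically in matrix form. The sketch below assumes the standard matrix characterization of a symplectic map, $\bm{M}^{T}\bm{\Omega}\bm{M} = \bm{\Omega}$, with the state ordered as $(\bm{p}, \bm{q})$; the names `shear_jacobian` and `symplectic_defect`, the test matrix `G`, and the values of `c_j` and `dt` are illustrative choices, not taken from the thesis.

```python
import numpy as np

def shear_jacobian(S, c_j, dt):
    """Jacobian of (p, q) -> (p, q + c_j * T_p(p) * dt), where S = dT_p/dp."""
    n = S.shape[0]
    I, Z = np.eye(n), np.zeros((n, n))
    return np.block([[I, Z], [c_j * dt * S, I]])

def symplectic_defect(Mat):
    """Largest entry of Mat^T Omega Mat - Omega; zero for a symplectic matrix."""
    n = Mat.shape[0] // 2
    I, Z = np.eye(n), np.zeros((n, n))
    Omega = np.block([[Z, I], [-I, Z]])
    return np.max(np.abs(Mat.T @ Omega @ Mat - Omega))

# A deliberately non-symmetric matrix and its symmetrized counterpart.
G = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])

sym_defect = symplectic_defect(shear_jacobian(G + G.T, c_j=0.7, dt=0.1))
asym_defect = symplectic_defect(shear_jacobian(G, c_j=0.7, dt=0.1))
print("symmetric Jacobian defect:    ", sym_defect)
print("non-symmetric Jacobian defect:", asym_defect)
```

The defect equals $c_j\,\textrm{d}t$ times the largest entry of $|\bm{S} - \bm{S}^{T}|$, so it vanishes exactly when the Jacobian of $\bm{T}_p$ is symmetric, in line with (53).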

Similar to Theorem 4.2, we can find the relationship between $\bm{F}_k^{j}$ and the Jacobian of $\bm{V}_q$. The proof of Theorem 4.3 is omitted, as it is similar to the proof of Theorem 4.2.

Theorem 4.3.

For a given $\textrm{d}t$, the mapping $\bm{F}_k^{j}(:,:,\textrm{d}t):\mathbb{R}^{2N}\rightarrow\mathbb{R}^{2N}$ in (50) is a symplectomorphism if and only if the Jacobian of $\bm{V}_q$ is a symmetric matrix, that is, it satisfies (42).

Suppose that $\Phi_1$ and $\Phi_2$ are two symplectomorphisms. Then, by the chain rule, their composite map $\Phi_2\circ\Phi_1$ is also a symplectomorphism. Thus, Theorems 4.2 and 4.3 guarantee that the map defined by Algorithm 1 is a symplectomorphism.
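The chain-rule argument can be written out in one line. As a sketch, with notation introduced here for illustration only, let $\bm{\Omega}$ denote the canonical symplectic matrix, so that a map is a symplectomorphism when its Jacobian $\bm{M}$ satisfies $\bm{M}^{T}\bm{\Omega}\bm{M} = \bm{\Omega}$, and let $\bm{M}_1 = D\Phi_1(\bm{z})$ and $\bm{M}_2 = D\Phi_2(\Phi_1(\bm{z}))$. The chain rule gives $D(\Phi_2\circ\Phi_1)(\bm{z}) = \bm{M}_2\bm{M}_1$, and therefore

$$(\bm{M}_2\bm{M}_1)^{T}\,\bm{\Omega}\,(\bm{M}_2\bm{M}_1) = \bm{M}_1^{T}\left(\bm{M}_2^{T}\bm{\Omega}\bm{M}_2\right)\bm{M}_1 = \bm{M}_1^{T}\bm{\Omega}\bm{M}_1 = \bm{\Omega}.$$

Applying this repeatedly to the eight sub-steps $\bm{F}_t^{j}$ and $\bm{F}_k^{j}$, $j = 1,\dots,4$, of one iteration, and then across the $n$ iterations, covers the whole of Algorithm 1.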

4.2 Nonseparable Symplectic Neural Networks (NSSNNs)

Our model aims to learn the dynamical evolution of $(\bm{q},\bm{p})$ in (6) by embedding (14) into the framework of NeuralODE [50]. We learn the nonseparable Hamiltonian dynamics (6) by constructing the augmented system (14). We approximate the energy function $\mathcal{H}(\bm{q},\bm{p})$ by training a neural network $\mathcal{H}_{\theta}(\bm{q},\bm{p})$ with parameters $\bm{\theta}$, and we calculate the gradient $\bm{\nabla}\mathcal{H}_{\theta}(\bm{q},\bm{p})$ by taking the in-graph gradient. For the constructed network $\mathcal{H}_{\theta}(\bm{q},\bm{p})$, we integrate (14) by using the second-order symplectic integrator [55]. Specifically, we have an input layer $(\bm{q},\bm{p},\bm{x},\bm{y}) = (\bm{q}_0,\bm{p}_0,\bm{q}_0,\bm{p}_0)$ at $t=t_0$ and an output layer $(\bm{q},\bm{p},\bm{x},\bm{y}) = (\bm{q}_n,\bm{p}_n,\bm{x}_n,\bm{y}_n)$ at $t=t_0+n\,\textrm{d}t$.

Algorithm 2 Integrate (14) by using the second-order symplectic integrator. Source: [2].
Input: $\bm{q}_0, \bm{p}_0, t_0, t, \textrm{d}t$;   $\bm{\phi}_1^{\delta}$, $\bm{\phi}_2^{\delta}$, and $\bm{\phi}_3^{\delta}$ in (15);
Output: $(\hat{q},\hat{p},\hat{x},\hat{y}) = (\bm{q}_n,\bm{p}_n,\bm{x}_n,\bm{y}_n)$
$(\bm{q}_0,\bm{p}_0,\bm{x}_0,\bm{y}_0) = (\bm{q}_0,\bm{p}_0,\bm{q}_0,\bm{p}_0)$;
$n = \textrm{floor}[(t-t_0)/\textrm{d}t]$;
    for $i = 1 \to n$
        $(\bm{q}_i,\bm{p}_i,\bm{x}_i,\bm{y}_i) = \bm{\phi}_1^{\textrm{d}t/2}\circ\bm{\phi}_2^{\textrm{d}t/2}\circ\bm{\phi}_3^{\textrm{d}t}\circ\bm{\phi}_2^{\textrm{d}t/2}\circ\bm{\phi}_1^{\textrm{d}t/2}\circ(\bm{q}_{i-1},\bm{p}_{i-1},\bm{x}_{i-1},\bm{y}_{i-1})$;
    end

Figure 3(a) shows that the forward pass of NSSNN is composed of a forward pass through a differentiable symplectic integrator as well as a backpropagation step through the model, and Figure 3(b) shows the schematic diagram of NSSNN. For the constructed network $\mathcal{H}_{\theta}(\bm{q},\bm{p})$, we integrate (14) using the second-order symplectic integrator [55]. Specifically, the input layer of the integrator is $(\bm{q},\bm{p},\bm{x},\bm{y}) = (\bm{q}_0,\bm{p}_0,\bm{q}_0,\bm{p}_0)$ at $t = t_0$, and the output layer is $(\bm{q},\bm{p},\bm{x},\bm{y}) = (\bm{q}_n,\bm{p}_n,\bm{x}_n,\bm{y}_n)$ at $t = t_0 + n\,\textrm{d}t$. The recursive relations of $(\bm{q}_i,\bm{p}_i,\bm{x}_i,\bm{y}_i)$, $i = 1, 2, \cdots, n$, are expressed by Algorithm 2. Moreover, given (15), since $\bm{x}$ and $\bm{y}$ are theoretically equal to $\bm{q}$ and $\bm{p}$, we can use the data set of $(\bm{q},\bm{p})$ to construct the data set containing the variables $(\bm{q},\bm{p},\bm{x},\bm{y})$.
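The loop in Algorithm 2 can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the NSSNN implementation: the learned network $\mathcal{H}_{\theta}$ is replaced by a hand-written nonseparable Hamiltonian with analytic gradients (`H`, `grad_H`), the update in `phi1` follows the form used in the proof of Theorem 4.4, and `phi2` and `phi3` are assumed to be the mirrored update and the binding rotation of the extended phase space construction in [55], with a hypothetical coupling constant `omega`. All function names are made up for this example.

```python
import numpy as np

def H(q, p):
    """Stand-in nonseparable Hamiltonian (an assumption, not the learned H_theta)."""
    return 0.5 * (q**2 + 1.0) * (p**2 + 1.0)

def grad_H(q, p):
    """Analytic gradients (dH/dq, dH/dp) of the stand-in Hamiltonian."""
    return q * (p**2 + 1.0), p * (q**2 + 1.0)

def phi1(state, delta):
    # Flow of H(q, y): q and y stay fixed, p and x are updated.
    q, p, x, y = state
    dH_dq, dH_dy = grad_H(q, y)
    return q, p - delta * dH_dq, x + delta * dH_dy, y

def phi2(state, delta):
    # Assumed mirrored flow of H(x, p): x and p stay fixed, q and y are updated.
    q, p, x, y = state
    dH_dx, dH_dp = grad_H(x, p)
    return q + delta * dH_dp, p, x, y - delta * dH_dx

def phi3(state, delta, omega):
    # Assumed binding flow: rotates the differences (q - x, p - y) by 2*omega*delta.
    q, p, x, y = state
    c, s = np.cos(2.0 * omega * delta), np.sin(2.0 * omega * delta)
    dq, dp = q - x, p - y
    rq, rp = c * dq + s * dp, -s * dq + c * dp
    return (0.5 * (q + x + rq), 0.5 * (p + y + rp),
            0.5 * (q + x - rq), 0.5 * (p + y - rp))

def step(state, dt, omega):
    """One symmetric composition phi1 . phi2 . phi3 . phi2 . phi1 (rightmost first)."""
    state = phi1(state, dt / 2)
    state = phi2(state, dt / 2)
    state = phi3(state, dt, omega)
    state = phi2(state, dt / 2)
    return phi1(state, dt / 2)

def integrate(q0, p0, t0, t, dt, omega=20.0):
    """Algorithm 2: copy (q0, p0) into (x0, y0), then apply n composed steps."""
    state = (q0, p0, q0, p0)
    n = int(np.floor((t - t0) / dt))
    for _ in range(n):
        state = step(state, dt, omega)
    return state
```

Because every sub-step is an exact flow and the composition is symmetric, applying `step` with `-dt` undoes a step with `dt`, which is a convenient sanity check when swapping in a learned Hamiltonian.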

Figure 3: (a) The forward pass of an NSSNN is composed of a forward pass through a differentiable symplectic integrator as well as a backpropagation step through the model. (b) The schematic diagram of NSSNN. Source: [2].

In addition, by constructing the network $\mathcal{H}_{\theta}$, we show that Theorem 4.4 holds, so the networks $\bm{\phi}_1^{\delta}$, $\bm{\phi}_2^{\delta}$, and $\bm{\phi}_3^{\delta}$ in (15) preserve the symplectic structure of the system. Suppose that $\Phi_1$ and $\Phi_2$ are two symplectomorphisms. Then their composite map $\Phi_2 \circ \Phi_1$ is also a symplectomorphism, by the chain rule. Since Algorithm 2 is a composition of the maps $\bm{\phi}_j^{\delta}$, its symplecticity is guaranteed by Theorem 4.4.
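The chain-rule step can be written out explicitly. As a sketch, with notation that is assumed here rather than taken from the text: let $\bm{z} = (\bm{q},\bm{p},\bm{x},\bm{y})$, let $D\Phi$ denote the Jacobian of a map $\Phi$, and let $\bm{\Omega}$ be the constant matrix representing the 2-form $\textrm{d}\bm{q}\wedge\textrm{d}\bm{p} + \textrm{d}\bm{x}\wedge\textrm{d}\bm{y}$, so that $\Phi$ is a symplectomorphism exactly when $(D\Phi)^T \bm{\Omega}\, D\Phi = \bm{\Omega}$ everywhere.

```latex
\begin{aligned}
D(\Phi_2 \circ \Phi_1)(\bm{z}) &= D\Phi_2\big(\Phi_1(\bm{z})\big)\, D\Phi_1(\bm{z}), \\
\big[D(\Phi_2 \circ \Phi_1)\big]^T \bm{\Omega}\, \big[D(\Phi_2 \circ \Phi_1)\big]
  &= (D\Phi_1)^T \big[(D\Phi_2)^T \bm{\Omega}\, D\Phi_2\big] D\Phi_1 \\
  &= (D\Phi_1)^T \bm{\Omega}\, D\Phi_1 = \bm{\Omega}.
\end{aligned}
```

The same argument extends by induction to any finite composition, which is the case needed for the five-fold composition in each step of Algorithm 2.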

Theorem 4.4.

For a given $\delta$, the mappings $\bm{\phi}_1^{\delta}$, $\bm{\phi}_2^{\delta}$, and $\bm{\phi}_3^{\delta}$ in (15) are symplectomorphisms.

Proof.

Let

$$
(\bm{t}_j^q, \bm{t}_j^p, \bm{t}_j^x, \bm{t}_j^y) = \bm{\phi}_j^{\delta}(\bm{q},\bm{p},\bm{x},\bm{y}), \quad j = 1, 2, 3.
\tag{54}
$$

From the first equation of (15), we have

$$
\begin{aligned}
& \textrm{d}\bm{t}_1^q \wedge \textrm{d}\bm{t}_1^p + \textrm{d}\bm{t}_1^x \wedge \textrm{d}\bm{t}_1^y \\
={} & \textrm{d}\bm{q} \wedge \textrm{d}\left[\bm{p} - \delta \frac{\partial \mathcal{H}_{\theta}(\bm{q},\bm{y})}{\partial \bm{q}}\right] + \textrm{d}\left[\bm{x} + \delta \frac{\partial \mathcal{H}_{\theta}(\bm{q},\bm{y})}{\partial \bm{p}}\right] \wedge \textrm{d}\bm{y} \\
={} & \textrm{d}\bm{q} \wedge \textrm{d}\bm{p} + \textrm{d}\bm{x} \wedge \textrm{d}\bm{y} + \delta \left[\frac{\partial^2 \mathcal{H}_{\theta}(\bm{q},\bm{y})}{\partial \bm{q}\,\partial \bm{y}} - \frac{\partial^2 \mathcal{H}_{\theta}(\bm{q},\bm{y})}{\partial \bm{y}\,\partial \bm{q}}\right] \textrm{d}\bm{q} \wedge \textrm{d}\bm{y} \\
={} & \textrm{d}\bm{q} \wedge \textrm{d}\bm{p} + \textrm{d}\bm{x} \wedge \textrm{d}\bm{y}.
\end{aligned}
\tag{55}
$$

Similarly, we can prove that $\textrm{d}\bm{t}_2^q \wedge \textrm{d}\bm{t}_2^p + \textrm{d}\bm{t}_2^x \wedge \textrm{d}\bm{t}_2^y = \textrm{d}\bm{q} \wedge \textrm{d}\bm{p} + \textrm{d}\bm{x} \wedge \textrm{d}\bm{y}$. In addition, from the third equation of (15), we can directly deduce that $\textrm{d}\bm{t}_3^q \wedge \textrm{d}\bm{t}_3^p + \textrm{d}\bm{t}_3^x \wedge \textrm{d}\bm{t}_3^y = \textrm{d}\bm{q} \wedge \textrm{d}\bm{p} + \textrm{d}\bm{x} \wedge \textrm{d}\bm{y}$. ∎
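The identity (55) can also be checked numerically. The following is a minimal sketch for scalar $q, p, x, y$, assuming a hand-written stand-in Hamiltonian in place of $\mathcal{H}_{\theta}$ and a central finite-difference Jacobian. The names `OMEGA`, `phi1`, `jacobian`, and `symplectic_defect` are hypothetical. A map with Jacobian $J$ preserves $\textrm{d}q\wedge\textrm{d}p + \textrm{d}x\wedge\textrm{d}y$ exactly when $J^T \Omega J = \Omega$.

```python
import numpy as np

# Matrix of the 2-form dq^dp + dx^dy in the ordering z = (q, p, x, y).
OMEGA = np.array([[0.0, 1.0, 0.0, 0.0],
                  [-1.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0],
                  [0.0, 0.0, -1.0, 0.0]])

def phi1(z, delta):
    """First sub-map, with stand-in H(q, y) = 0.5*(q^2+1)*(y^2+1) (an assumption)."""
    q, p, x, y = z
    dH_dq = q * (y**2 + 1.0)   # derivative in the first argument
    dH_dy = y * (q**2 + 1.0)   # derivative in the second (momentum) argument
    return np.array([q, p - delta * dH_dq, x + delta * dH_dy, y])

def jacobian(f, z, h=1e-6):
    """Central finite-difference Jacobian of f at z."""
    z = np.asarray(z, dtype=float)
    J = np.zeros((z.size, z.size))
    for k in range(z.size):
        e = np.zeros(z.size)
        e[k] = h
        J[:, k] = (f(z + e) - f(z - e)) / (2.0 * h)
    return J

def symplectic_defect(f, z):
    """Largest entry of |J^T OMEGA J - OMEGA|; near zero for a symplectic map."""
    J = jacobian(f, z)
    return np.max(np.abs(J.T @ OMEGA @ J - OMEGA))
```

For example, `symplectic_defect(lambda z: phi1(z, 0.1), [0.3, -0.7, 1.1, 0.4])` should be at the level of finite-difference error, whereas a non-symplectic map such as a uniform scaling `lambda z: 1.1 * z` gives a clearly nonzero defect.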

Figure 4: Comparison between NSSNN and HNN regarding the network design and prediction results of a vortex flow example. Source: [2].


We show a motivational example in Figure 4 by comparing our approach with a traditional HNN method [15] in terms of structural design and predictive ability. We refer the readers to Section 5.2.3 for a detailed discussion. As shown in Figure 4, the vortices evolved using NSSNN remain well separated, as in the ground truth, while the vortices evolved using HNN merge because HNN fails to conserve the symplectic structure of a nonseparable system. The conservative capability of NSSNN springs from our design of the auxiliary variables (red $x$ and $y$), which convert the original nonseparable system into a higher-dimensional quasi-separable system to which a symplectic integrator can be applied.

4.3 Roe Neural Networks (RoeNet)

Figure 5: Schematic diagram of RoeNet to predict future discontinuity from smooth observations. The blue band shows the distribution of the training set with respect to time, and the training set does not necessarily contain discontinuous solutions to the equations. Meanwhile, the orange band represents the solutions predicted with RoeNet, which may contain discontinuous solutions. Source: [4].

We introduce our design of the Roe template with pseudoinverse embedding, which accommodates the data processing and training over the entire learning pipeline. In particular, we present our basic ideas in Section 4.3.1 and a detailed description of our network architecture in Section 4.3.2.

4.3.1 Roe template with Pseudoinverse Embedding

Recall the one-dimensional hyperbolic conservation law described in (18). Without a given $\bm{F}$, we learn the weak solution of (18) using a neural network that incorporates the framework of a Roe solver. For the time integration of $\bm{u}$ in (29), we need to construct the matrix functions $\bm{L}$ and $\bm{\Lambda}$. Using neural networks to approximate $\bm{L}$ and $\bm{\Lambda}$ in (30) directly is ineffective, because the number of learnable parameters is then limited by the number of components $N_c$ of $\bm{u}$, and learning in such a tiny parameter space is impractical. To enhance the expressiveness of our model, we use neural networks $\bm{L}_{\theta}$ and $\bm{\Lambda}_{\phi}$ to replace $\bm{L}$ and $\bm{\Lambda}$ in (30), respectively. As in (30), the inputs to $\bm{L}_{\theta}$ and $\bm{\Lambda}_{\phi}$ remain $(\bm{u}_j^n, \bm{u}_{j+1}^n)$. However, the outputs of $\bm{L}_{\theta}$ and $\bm{\Lambda}_{\phi}$ are now an $N_h \times N_c$ matrix and an $N_h \times N_h$ diagonal matrix, respectively, where the positive integer $N_h$ is a hidden dimension. Furthermore, we introduce the concept of pseudoinverses by replacing $\bm{L}^{-1}$ with

$$
\bm{L}_{\theta}^{+} = (\bm{L}_{\theta}^{T} \bm{L}_{\theta})^{-1} \bm{L}_{\theta}^{T}.
\tag{56}
$$

Here, the transpose and inverse operations are applied to the output matrix, that is

$$
\bm{L}_{\theta}^{+}(\bm{u}_j^n, \bm{u}_{j+1}^n) = \left[\bm{L}_{\theta}(\bm{u}_j^n, \bm{u}_{j+1}^n)^{T} \bm{L}_{\theta}(\bm{u}_j^n, \bm{u}_{j+1}^n)\right]^{-1} \bm{L}_{\theta}^{T}(\bm{u}_j^n, \bm{u}_{j+1}^n).
\tag{57}
$$
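To make (56) concrete, the following minimal NumPy sketch evaluates the left pseudoinverse of a tall matrix and compares it with NumPy's SVD-based `np.linalg.pinv`. The random matrix stands in for one network output $\bm{L}_{\theta}(\bm{u}_j^n, \bm{u}_{j+1}^n)$, the sizes are arbitrary, and the helper name `left_pseudoinverse` is hypothetical. The formula assumes that the $N_h \times N_c$ matrix has full column rank, which requires $N_h \geq N_c$.

```python
import numpy as np

def left_pseudoinverse(L):
    """Return (L^T L)^{-1} L^T for an (N_h, N_c) matrix with full column rank."""
    # Solving the normal equations avoids forming the explicit inverse.
    return np.linalg.solve(L.T @ L, L.T)

rng = np.random.default_rng(0)
N_h, N_c = 8, 3                      # illustrative sizes only
L = rng.normal(size=(N_h, N_c))      # stand-in for one output of L_theta
L_plus = left_pseudoinverse(L)       # shape (N_c, N_h)
```

Here `L_plus @ L` recovers the $N_c \times N_c$ identity, so $\bm{L}_{\theta}^{+}$ plays the role of $\bm{L}^{-1}$ in the template even though $\bm{L}_{\theta}$ is not square.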

Substituting $\bm{L}_{\theta}$, $\bm{\Lambda}_{\phi}$, and (56) into (29) and (30) yields

$$
\begin{aligned}
\bm{u}_j^{n+1} ={} & \bm{u}_j^{n} - \frac{1}{2}\lambda_r \big(\bm{L}^{n}_{j+\frac{1}{2},\theta}\big)^{+} \big(\bm{\Lambda}_{j+\frac{1}{2},\phi}^{n} - |\bm{\Lambda}_{j+\frac{1}{2},\phi}^{n}|\big) \bm{L}_{j+\frac{1}{2},\theta}^{n} (\bm{u}_{j+1}^{n} - \bm{u}_j^{n}) \\
& - \frac{1}{2}\lambda_r \big(\bm{L}^{n}_{j-\frac{1}{2},\theta}\big)^{+} \big(\bm{\Lambda}_{j-\frac{1}{2},\phi}^{n} + |\bm{\Lambda}_{j-\frac{1}{2},\phi}^{n}|\big) \bm{L}_{j-\frac{1}{2},\theta}^{n} (\bm{u}_j^{n} - \bm{u}_{j-1}^{n}),
\end{aligned}
\tag{58}
$$

with

$$
\bm{L}_{j+\frac{1}{2},\theta}^{n} = \bm{L}_{\theta}(\bm{u}_j^n, \bm{u}_{j+1}^n), \qquad \bm{\Lambda}_{j+\frac{1}{2},\phi}^{n} = \bm{\Lambda}_{\phi}(\bm{u}_j^n, \bm{u}_{j+1}^n).
\tag{59}
$$

Equation (58) serves as our template to evolve the system's states from $\bm{u}_j^n$ to $\bm{u}_j^{n+1}$ in RoeNet.
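One update of the template (58) can be sketched as follows. This is an illustrative sketch, not the RoeNet code: `L_fn` and `Lam_fn` are hypothetical callables standing in for $\bm{L}_{\theta}$ and $\bm{\Lambda}_{\phi}$, with `Lam_fn` returning only the $N_h$ diagonal entries. Periodic boundaries are assumed for simplicity, and $\lambda_r$ is passed in as a given scalar `lam_r`.

```python
import numpy as np

def left_pseudoinverse(L):
    """(L^T L)^{-1} L^T, as in (56); assumes full column rank."""
    return np.linalg.solve(L.T @ L, L.T)

def roe_step(u, L_fn, Lam_fn, lam_r):
    """Advance u of shape (N, N_c) by one step of (58), periodic in j (assumed)."""
    u_next = np.roll(u, -1, axis=0)          # row j holds u_{j+1}
    minus = np.zeros_like(u)                 # (Lam - |Lam|) term at interface j+1/2
    plus = np.zeros_like(u)                  # (Lam + |Lam|) term at interface j+1/2
    for j in range(u.shape[0]):
        L = L_fn(u[j], u_next[j])            # (N_h, N_c), see (59)
        lam = Lam_fn(u[j], u_next[j])        # (N_h,), diagonal of Lambda
        w = L @ (u_next[j] - u[j])           # difference mapped to hidden space
        L_plus = left_pseudoinverse(L)       # (N_c, N_h)
        minus[j] = L_plus @ ((lam - np.abs(lam)) * w)
        plus[j] = L_plus @ ((lam + np.abs(lam)) * w)
    # Cell j uses the minus term of interface j+1/2 and the plus term of j-1/2.
    return u - 0.5 * lam_r * minus - 0.5 * lam_r * np.roll(plus, 1, axis=0)
```

As a sanity check, if `L_fn` returns a constant full-rank matrix and `Lam_fn` returns a constant positive value $a$ in every entry, then $\bm{L}^{+}\bm{\Lambda}\bm{L} = a\bm{I}$ and the step reduces to the first-order upwind update $\bm{u}_j^{n+1} = \bm{u}_j^n - \lambda_r a (\bm{u}_j^n - \bm{u}_{j-1}^n)$.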

Figure 5 presents a schematic diagram of RoeNet, which predicts future discontinuities from smooth observations. We note that for hyperbolic conservation laws with discontinuous solutions, RoeNet can accurately forecast long-term outcomes that are either fully or partially discontinuous. This is achievable even when the training data provided cover only a short window and contain limited information on discontinuities.

4.3.2 Neural network architecture

Figure 6 shows an overview of our neural network architecture. In summary, RoeNet consists of $\bm{L}_{\theta}$ and $\bm{\Lambda}_{\phi}$, two networks embedded in (58) to serve as our template to evolve the system's states from $\bm{u}_j^n$ to $\bm{u}_j^{n+1}$.

Figure 6: The architecture of the neural network that evolves the system's states from $\bm{u}_j^n$ to $\bm{u}_j^{n+1}$ in RoeNet. This network takes the current conserved quantity $\bm{u}^n_j$ and its direct neighbors, $\bm{u}^n_{j-1}$ and $\bm{u}^n_{j+1}$, as the inputs and outputs the conserved quantity $\bm{u}^{n+1}_j$ of the next time step. The ResBlock has the same architecture as in [49], only with the 2D convolution layers replaced by linear layers. The number in the parentheses is the dimension of each ResBlock output. Source: [4].

Specifically, the network in Figure 6 contains two parts, each consists of a 𝑳θsubscript𝑳𝜃\bm{L}_{\theta}bold_italic_L start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT and a 𝚲ϕsubscript𝚲italic-ϕ\bm{\Lambda}_{\phi}bold_Λ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT. The first part takes 𝒖j1nsuperscriptsubscript𝒖𝑗1𝑛\bm{u}_{j-1}^{n}bold_italic_u start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and 𝒖jnsuperscriptsubscript𝒖𝑗𝑛\bm{u}_{j}^{n}bold_italic_u start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT as input of both 𝑳θsubscript𝑳𝜃\bm{L}_{\theta}bold_italic_L start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT and 𝚲ϕsubscript𝚲italic-ϕ\bm{\Lambda}_{\phi}bold_Λ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT and outputs 𝑳j12,θsubscript𝑳𝑗12𝜃\bm{L}_{j-\frac{1}{2},\theta}bold_italic_L start_POSTSUBSCRIPT italic_j - divide start_ARG 1 end_ARG start_ARG 2 end_ARG , italic_θ end_POSTSUBSCRIPT through 𝑳θsubscript𝑳𝜃\bm{L}_{\theta}bold_italic_L start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT and 𝚲j12,ϕsubscript𝚲𝑗12italic-ϕ\bm{\Lambda}_{j-\frac{1}{2},\phi}bold_Λ start_POSTSUBSCRIPT italic_j - divide start_ARG 1 end_ARG start_ARG 2 end_ARG , italic_ϕ end_POSTSUBSCRIPT through 𝚲ϕsubscript𝚲italic-ϕ\bm{\Lambda}_{\phi}bold_Λ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT. 
The input 𝒖j1nsuperscriptsubscript𝒖𝑗1𝑛\bm{u}_{j-1}^{n}bold_italic_u start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and 𝒖jnsuperscriptsubscript𝒖𝑗𝑛\bm{u}_{j}^{n}bold_italic_u start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT is a vector [𝒖j1n,(1),,𝒖j1n,(Nc);𝒖jn,(1),,𝒖jn,(Nc)]subscriptsuperscript𝒖𝑛1𝑗1subscriptsuperscript𝒖𝑛subscript𝑁𝑐𝑗1subscriptsuperscript𝒖𝑛1𝑗subscriptsuperscript𝒖𝑛subscript𝑁𝑐𝑗[\bm{u}^{n,(1)}_{j-1},\cdots,\bm{u}^{n,(N_{c})}_{j-1};\bm{u}^{n,(1)}_{j},% \cdots,\bm{u}^{n,(N_{c})}_{j}][ bold_italic_u start_POSTSUPERSCRIPT italic_n , ( 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT , ⋯ , bold_italic_u start_POSTSUPERSCRIPT italic_n , ( italic_N start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT ; bold_italic_u start_POSTSUPERSCRIPT italic_n , ( 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , ⋯ , bold_italic_u start_POSTSUPERSCRIPT italic_n , ( italic_N start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ] of length 2Nc2subscript𝑁𝑐2N_{c}2 italic_N start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT with Ncsubscript𝑁𝑐N_{c}italic_N start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT components. 
The output matrix Lj12,θsubscript𝐿𝑗12𝜃L_{j-\frac{1}{2},\theta}italic_L start_POSTSUBSCRIPT italic_j - divide start_ARG 1 end_ARG start_ARG 2 end_ARG , italic_θ end_POSTSUBSCRIPT is of size (Nc×Nh)subscript𝑁𝑐subscript𝑁(N_{c}\times N_{h})( italic_N start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ), and the other output matrix 𝚲j12,ϕsubscript𝚲𝑗12italic-ϕ\bm{\Lambda}_{j-\frac{1}{2},\phi}bold_Λ start_POSTSUBSCRIPT italic_j - divide start_ARG 1 end_ARG start_ARG 2 end_ARG , italic_ϕ end_POSTSUBSCRIPT is a diagonal matrix of size (Nh×Nh)subscript𝑁subscript𝑁(N_{h}\times N_{h})( italic_N start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ). The second part takes 𝒖jnsuperscriptsubscript𝒖𝑗𝑛\bm{u}_{j}^{n}bold_italic_u start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and 𝒖j+1nsuperscriptsubscript𝒖𝑗1𝑛\bm{u}_{j+1}^{n}bold_italic_u start_POSTSUBSCRIPT italic_j + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT as the input for both 𝑳θsubscript𝑳𝜃\bm{L}_{\theta}bold_italic_L start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT and 𝚲ϕsubscript𝚲italic-ϕ\bm{\Lambda}_{\phi}bold_Λ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT and outputs 𝑳j+12,θsubscript𝑳𝑗12𝜃\bm{L}_{j+\frac{1}{2},\theta}bold_italic_L start_POSTSUBSCRIPT italic_j + divide start_ARG 1 end_ARG start_ARG 2 end_ARG , italic_θ end_POSTSUBSCRIPT through 𝑳θsubscript𝑳𝜃\bm{L}_{\theta}bold_italic_L start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT and 𝚲j+12,ϕsubscript𝚲𝑗12italic-ϕ\bm{\Lambda}_{j+\frac{1}{2},\phi}bold_Λ start_POSTSUBSCRIPT italic_j + divide start_ARG 1 end_ARG start_ARG 2 end_ARG , italic_ϕ end_POSTSUBSCRIPT through 𝚲ϕsubscript𝚲italic-ϕ\bm{\Lambda}_{\phi}bold_Λ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT. 
The input 𝒖jnsuperscriptsubscript𝒖𝑗𝑛\bm{u}_{j}^{n}bold_italic_u start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and 𝒖j+1nsuperscriptsubscript𝒖𝑗1𝑛\bm{u}_{j+1}^{n}bold_italic_u start_POSTSUBSCRIPT italic_j + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT is a vector [𝒖jn,(1),,𝒖jn,(Nc);𝒖j+1n,(1),,𝒖j+1n,(Nc)]subscriptsuperscript𝒖𝑛1𝑗subscriptsuperscript𝒖𝑛subscript𝑁𝑐𝑗subscriptsuperscript𝒖𝑛1𝑗1subscriptsuperscript𝒖𝑛subscript𝑁𝑐𝑗1[\bm{u}^{n,(1)}_{j},\cdots,\bm{u}^{n,(N_{c})}_{j};\bm{u}^{n,(1)}_{j+1},\cdots,% \bm{u}^{n,(N_{c})}_{j+1}][ bold_italic_u start_POSTSUPERSCRIPT italic_n , ( 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , ⋯ , bold_italic_u start_POSTSUPERSCRIPT italic_n , ( italic_N start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ; bold_italic_u start_POSTSUPERSCRIPT italic_n , ( 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j + 1 end_POSTSUBSCRIPT , ⋯ , bold_italic_u start_POSTSUPERSCRIPT italic_n , ( italic_N start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j + 1 end_POSTSUBSCRIPT ] of length 2Nc2subscript𝑁𝑐2N_{c}2 italic_N start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT. The output matrices 𝑳j+12,θsubscript𝑳𝑗12𝜃\bm{L}_{j+\frac{1}{2},\theta}bold_italic_L start_POSTSUBSCRIPT italic_j + divide start_ARG 1 end_ARG start_ARG 2 end_ARG , italic_θ end_POSTSUBSCRIPT and 𝚲j+12,ϕsubscript𝚲𝑗12italic-ϕ\bm{\Lambda}_{j+\frac{1}{2},\phi}bold_Λ start_POSTSUBSCRIPT italic_j + divide start_ARG 1 end_ARG start_ARG 2 end_ARG , italic_ϕ end_POSTSUBSCRIPT take the same form as the output matrices in the first part. 
Given the four output matrices $\bm{L}_{j-\frac{1}{2},\theta}$, $\bm{\Lambda}_{j-\frac{1}{2},\phi}$, $\bm{L}_{j+\frac{1}{2},\theta}$, and $\bm{\Lambda}_{j+\frac{1}{2},\phi}$, we combine them through (58) and (59) to obtain $\bm{u}_j^{n+1}$. Networks $\bm{L}_\theta$ and $\bm{\Lambda}_\phi$ both consist of a chain of ResBlocks [49] followed by a final linear layer of size $N_h \times N_c$ and $N_h$, respectively. The ResBlock architecture comprises two convolutional layers and one ReLU layer. The $N_h$ values output by $\bm{\Lambda}_\phi$ are then placed on the diagonal of an $N_h \times N_h$ diagonal matrix.
The ResBlock has the same architecture as in [49], only with the 2D convolution layers replaced by linear layers. Note that the number in the parentheses is the dimension of the output of each ResBlock, and the computation procedure for grid cell $j$ is applied to all grid cells. Since the computation for each cell depends only on that cell and its adjacent cells, all cells can be processed in parallel, which makes training efficient.

In addition, we implement two padding schemes to handle different boundary conditions. For periodic boundary conditions, we use periodic padding, e.g., if $j=0$, then $\bm{u}_{j-1} = \bm{u}_{N_g}$, where $N_g$ is the number of grid nodes. For Neumann boundary conditions, we use replicate padding, e.g., if $j=0$, then $\bm{u}_{j-1} = \bm{u}_0$.
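
The two padding rules can be illustrated with a minimal sketch. The function name `pad_cells` and the list-based representation of the grid are assumptions made for illustration only; they are not taken from the RoeNet implementation.

```python
def pad_cells(u, mode):
    """Add one ghost cell on each side of a 1D list of cell values.

    'periodic'  : the left ghost cell copies the last cell and vice versa.
    'replicate' : each ghost cell copies its nearest boundary cell (Neumann-type).
    """
    if mode == "periodic":
        return [u[-1]] + list(u) + [u[0]]
    if mode == "replicate":
        return [u[0]] + list(u) + [u[-1]]
    raise ValueError("mode must be 'periodic' or 'replicate'")


u = [1.0, 2.0, 3.0]
print(pad_cells(u, "periodic"))   # ghost cells wrap around the domain
print(pad_cells(u, "replicate"))  # ghost cells repeat the boundary values
```

With one ghost cell on each side, every interior index $j$ has well-defined neighbors $j-1$ and $j+1$, so the same per-cell computation can be applied at the boundaries.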

By introducing a hidden dimension $N_h$, we increase the number of network parameters and enhance the network's expressive capacity. However, the enlarged parameter space could admit multiple numerically optimal solutions during training. To address this, we employ a regularized loss function, which helps ensure that the network parameters converge to a locally optimal solution. Importantly, our goal is to use the network to accurately model the evolution of PDEs over time and space; achieving a unique solution for the network parameters is not a requirement.

Algorithm 3 summarizes the recursive relation from the input layer

$$\bm{u}(t=0) = [\bm{u}_1(t=0), \cdots, \bm{u}_{N_g}(t=0)] \tag{60}$$

to the output layer

$$\hat{\bm{u}}(t=T_{span}) = [\hat{\bm{u}}_1(t=T_{span}), \cdots, \hat{\bm{u}}_{N_g}(t=T_{span})], \tag{61}$$

for each time step in RoeNet. Here $N_g$ is the spatial grid size and $T_{span}$ is the time span, either $T_{train}$ or $T_{predict}$. As described in Algorithm 3, feeding $\bm{u}(t=0)$, $T_{span} = T_{train}$, the temporal step $\Delta t$, the spatial step $\Delta x$, and the constructed networks $\bm{L}_\theta$ and $\bm{\Lambda}_\phi$ into RoeNet yields the prediction $\hat{\bm{u}}(t=T_{train})$. We then choose the MSE as our loss function

$$\mathcal{L}_{RoeNet} = \|\bm{u}(t=T_{train}) - \hat{\bm{u}}(t=T_{train})\|_{MSE}. \tag{62}$$
Algorithm 3 Recursive relation from the input layer to the output layer in RoeNet. Here, $\bm{u}_j,~j=1,2,\cdots,N_g$ represents the discretized points of $\bm{u}$ in the spatial coordinate. Source: [4].
1: Inputs: $\bm{u}_j(t=0),~j=1,2,\cdots,N_g$, $T_{span}$, $\Delta t$, $\Delta x$, $\bm{L}_\theta$, $\bm{\Lambda}_\phi$
2: Outputs: $\hat{\bm{u}}_j(t=T_{span}) = \bm{u}_j^{N_t}$
3: $N_t = \text{floor}(T_{span}/\Delta t)$
4: $\lambda_r = \Delta t/\Delta x$
5: $\bm{u}_j^0 = \bm{u}_j(t=0),~j=1,2,\cdots,N_g$
6: for $n=0$ to $N_t - 1$ do
7:     Calculate $\bm{L}_{j\pm\frac{1}{2},\theta}^n$, $\bm{\Lambda}_{j\pm\frac{1}{2},\phi}^n$ by substituting $\bm{u}_j^n,~j=1,2,\cdots,N_g$, $\bm{L}_\theta$, and $\bm{\Lambda}_\phi$ into (59)
8:     Calculate $\bm{u}_j^{n+1},~j=1,2,\cdots,N_g$ by substituting $\bm{u}_j^n$, $\bm{L}_{j\pm\frac{1}{2},\theta}^n$, $\bm{\Lambda}_{j\pm\frac{1}{2},\phi}^n$, and $\lambda_r$ into (58)
9: end for
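
The loop structure of Algorithm 3 can be sketched for the scalar case $N_c = N_h = 1$ with periodic padding. This is only an illustration under stated assumptions: equations (58) and (59) are not reproduced in this section, so the update below assumes a Roe-type form $u_j^{n+1} = u_j^n - \frac{1}{2}\lambda_r\,[L^{-1}(\Lambda - |\Lambda|)L\,(u_{j+1}^n - u_j^n) + L^{-1}(\Lambda + |\Lambda|)L\,(u_j^n - u_{j-1}^n)]$ evaluated at the right and left interfaces, respectively. The callables `l_fn` and `lam_fn` are hypothetical stand-ins for the trained networks $\bm{L}_\theta$ and $\bm{\Lambda}_\phi$, and the name `roenet_rollout` is likewise an assumption.

```python
import math


def roenet_rollout(u0, t_span, dt, dx, l_fn, lam_fn):
    """Scalar rollout following the line structure of Algorithm 3."""
    n_t = math.floor(t_span / dt)   # line 3: number of time steps
    lam_r = dt / dx                 # line 4: ratio of temporal to spatial step
    u = list(u0)                    # line 5: initial state
    n_g = len(u)
    for _ in range(n_t):            # line 6
        new_u = []
        for j in range(n_g):
            # Periodic neighbors: u[-1] wraps to the last cell.
            um, uc, up = u[j - 1], u[j], u[(j + 1) % n_g]
            # Line 7: evaluate the stand-in "networks" at both interfaces.
            l_m, a_m = l_fn(um, uc), lam_fn(um, uc)
            l_p, a_p = l_fn(uc, up), lam_fn(uc, up)
            # Line 8: assumed Roe-type update; 1/l acts as the inverse of L.
            right = (1.0 / l_p) * (a_p - abs(a_p)) * l_p * (up - uc)
            left = (1.0 / l_m) * (a_m + abs(a_m)) * l_m * (uc - um)
            new_u.append(uc - 0.5 * lam_r * (right + left))
        u = new_u
    return u


# Stand-ins: constant wave speed 1 and trivial eigenvector "matrix" 1.
step_profile = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
result = roenet_rollout(step_profile, t_span=2.0, dt=0.5, dx=0.5,
                        l_fn=lambda a, b: 1.0, lam_fn=lambda a, b: 1.0)
print(result)
```

With these stand-ins the update reduces to the first-order upwind scheme, and with $\lambda_r = 1$ each step moves the profile by exactly one cell. Note that `math.floor(t_span / dt)` is sensitive to floating-point rounding, so the example uses step sizes that are exactly representable.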

4.4 Neural Vortex Method (NVM)

To accurately and efficiently quantify fluid dynamics, we propose the novel NVM framework. This framework utilizes physics-informed neural networks to extract and translate information from the Eulerian specification of the flow field (or images of flow visualizations) into knowledge about the underlying fluid field. As detailed in Figure 7, we integrate these networks with a vorticity-to-velocity Poisson solver to build a fully automated toolchain that extracts high-resolution Eulerian flow fields from Lagrangian inductive priors. This design addresses the challenge of learning directly from high-dimensional observations, such as images, which traditional methods struggle to convert directly into velocity and pressure fields.

Figure 7: Schematic diagram of the NVM. Our system is constituted of two networks, the detection network and the dynamics network, which are embedded with a vorticity-to-velocity Poisson solver. Source: [3].

We construct a vortex detection network in Section 4.4.1 to identify the positions and the vorticity of Lagrangian vortices from a grid-based velocity field, which from a mathematical perspective connects (31) with (33). This approach simplifies the vorticity field to include only the detected vortices. Given the detected vortices, we then use a vortex dynamics network in Section 4.4.2 to learn the underlying governing dynamics of these finite structures. The dynamics network accurately models the right-hand side of (33) under various conditions, resolving the longstanding problem in LVM.

The training of the NVM involves two primary steps: training the detection and dynamics networks. We employ high-fidelity data from direct numerical simulation (DNS) of interactions among 2 to 6 vortices, although the model can generalize to any vorticity field with an arbitrary number of vortices. We initially train the detection network using data from randomly generated vortices and their vorticity fields, then identify vortices’ positions and strengths using this trained network to facilitate the subsequent training of the dynamics network.

4.4.1 Detection network

The input of the detection network is a vorticity field of size $200 \times 200 \times 1$. As shown in Figure 8, we first feed the vorticity field into a small one-stage detection network and obtain a feature map of size $25 \times 25 \times 512$ (we downsample three times). The detection network consists of a Conv2d-BatchNorm-ReLU combo and a 6-layer ResBlock chain whose size can be adjusted to the complexity of the problem. The primary reason for downsampling is to avoid extremely unbalanced data and multiple predictions for the same vortex. We then forward the feature map to two branches. In the first branch, we apply a $1 \times 1$ convolution to generate a probability score $\hat{p}$ that a vortex exists. If $\hat{p} > 0.5$, we consider a vortex to exist within the corresponding cells of the original $200 \times 200 \times 1$ vorticity field. In the second branch, we predict the position of the vortex relative to the upper-left corner of the feature-map cell, if the cell contains a vortex. Afterward, we place a $10 \times 10$ bounding box around each predicted vortex and use the weighted average of the positions of the cells of the original vorticity field to find the exact position of the vortex. Finally, the vortex particle strength is calculated as the sum of the values of the cells in the bounding box normalized by the cell area.
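
The post-processing step described above can be sketched as follows. Several details are assumptions made for illustration: the stride of 8 follows from $200 / 25$; the in-cell offset is taken to lie in $[0, 1)$ in units of a feature-map cell; the weights in the weighted average are taken to be the absolute vorticity values; and "normalized by the cell area" is interpreted as multiplying the summed vorticity by the cell area. The function names are hypothetical.

```python
def coarse_position(row, col, off_x, off_y, stride=8):
    """Map a feature-map cell and its predicted in-cell offset to field coordinates."""
    return ((col + off_x) * stride, (row + off_y) * stride)


def refine_vortex(omega, cx, cy, box=10, cell_area=1.0):
    """Refine a coarse vortex position inside a box x box window of the field.

    omega is a list of rows; returns (x, y, strength), or None for an empty window.
    """
    height, width = len(omega), len(omega[0])
    r0 = max(0, int(cy) - box // 2)
    c0 = max(0, int(cx) - box // 2)
    r1, c1 = min(height, r0 + box), min(width, c0 + box)
    weight_sum = x_sum = y_sum = total = 0.0
    for r in range(r0, r1):
        for c in range(c0, c1):
            value = omega[r][c]
            weight_sum += abs(value)      # assumed weight: |vorticity|
            x_sum += abs(value) * c
            y_sum += abs(value) * r
            total += value
    if weight_sum == 0.0:
        return None
    # Assumed interpretation: strength = summed vorticity times cell area.
    return (x_sum / weight_sum, y_sum / weight_sum, total * cell_area)


field = [[0.0] * 20 for _ in range(20)]
field[7][9] = field[7][10] = 2.0
cx, cy = coarse_position(row=1, col=1, off_x=0.0, off_y=0.0)
print(refine_vortex(field, cx, cy, cell_area=0.25))
```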

Figure 8: The architecture of the detection network. It takes the vorticity field as input and outputs the position and vortex particle strength for each vortex detected. Conv denotes the Conv2d-BatchNorm-ReLU combo, and the ResBlock is the same as in [49]. In each ResBlock, we use stride 2 to downsample the feature map. The ResBlock chain has six layers. The number in the parentheses is the output dimension. Source: [3].

In the training process, we penalize the wrong position detection only if the cell containing a vortex in the ground truth given by DNS is not detected. This idea is similar to the real-time object detection in [64]. We do not use the weighted average method to find the position during training, so that the detection network itself learns to produce detection results as accurately as possible. We use the focal loss [65] to further alleviate the unbalanced classification problem.

We mainly use the detection network to generate training data for the dynamics network because we want to use the high-resolution data generated by the method mentioned in Section 5.4.1 instead of by the approximate particle method (BS law). Moreover, there are many situations where BS law is inapplicable, as discussed previously in Section 3.2.6. The detection network enables us to find the positions of the vortices accurately regardless of the situation.

The detection network is responsible for providing necessary information to the dynamics network. After training, we use the trained detection network to detect the vortices in the initial vorticity field and the evolved vorticity field, both generated by the method in Section 5.4.1. We then apply the nearest-neighbor method to pair the vortices detected in these two fields. Figure 9 shows the case of two fields at $t=0$ and $t=0.2$. The idea of nearest-neighbor pairing is illustrated in Figure 9 (c). The sample, i.e., the pair of fields, is dropped if different numbers of vortices are detected in the initial and evolved fields or if a large difference exists in the vorticity of paired vortices. The successfully detected vortices in the initial and evolved vorticity fields are passed together into the dynamics network for its training.
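
The pairing and filtering rules above can be sketched as a greedy nearest-neighbor match. The function name `pair_vortices`, the one-to-one greedy strategy, and the relative tolerance `max_rel_diff` are assumptions for illustration; the text does not specify the threshold used to decide that a vorticity difference is "large".

```python
import math


def pair_vortices(initial, evolved, max_rel_diff=0.1):
    """Pair vortices (x, y, vorticity) between two fields.

    Returns a list of (initial, evolved) pairs, or None if the sample is dropped.
    """
    if len(initial) != len(evolved):
        return None  # different numbers of detections: drop the sample
    remaining = list(range(len(evolved)))
    pairs = []
    for x0, y0, w0 in initial:
        # Nearest not-yet-used vortex in the evolved field.
        k = min(remaining,
                key=lambda i: math.hypot(evolved[i][0] - x0, evolved[i][1] - y0))
        x1, y1, w1 = evolved[k]
        if abs(w1 - w0) > max_rel_diff * abs(w0):
            return None  # vorticity changed too much: drop the sample
        remaining.remove(k)
        pairs.append(((x0, y0, w0), (x1, y1, w1)))
    return pairs


field_t0 = [(0.0, 0.0, 1.0), (1.0, 0.0, -1.0)]
field_t1 = [(1.1, 0.1, -1.02), (0.1, 0.0, 1.01)]
print(pair_vortices(field_t0, field_t1))
```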

Figure 9: An example of vorticity contours at (a) $t=0$, (b) $t=0.2$, and (c) the superposition of $t=0$ and $t=0.2$. The black circles indicate the locations recognized by the detection network. The evolution from (a) to (b) is calculated by DNS. Source: [3].

4.4.2 Dynamics network

To learn the underlying dynamics of the vortices, we build a graph neural network similar to [19]. We predict the velocity of one vortex due to the influences exerted by the other vortices and the external force. We then use the fourth-order Runge–Kutta integrator to calculate the position at the next time step. As shown in Figure 10, for each vortex, we use a neural network $A(\theta_1)$ to predict the influences exerted by the other vortices and add them up. Specifically, for the $i$th vortex, we consider each vortex $j$ ($j \neq i$). The difference of their positions is $\textrm{diff}_{ij} = \textrm{pos}_i - \textrm{pos}_j$, and their L2 distance is $\textrm{dist}_{ij} = \|\textrm{diff}_{ij}\|_2$. The input of $A(\theta_1)$ is the vector $(\textrm{diff}_{ij}, \textrm{dist}_{ij}, \textrm{vort}_j)$ of length 4. Here, pos and vort are detected by the detection network.
The output of $A(\theta_1)$ is a vector with the same dimension as the flow field, characterizing the velocity induced by the $j$th vortex on the $i$th vortex. In this way, we can calculate the velocity induced by each vortex $j$ ($j \neq i$) on vortex $i$. We sum all the induced velocities on vortex $i$ and treat the result as the velocity induced by the other vortices.
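
The pairwise feature construction and summation can be sketched in two dimensions as follows. Since the trained network $A(\theta_1)$ is not available here, the sketch substitutes the classical 2D point-vortex Biot–Savart kernel as a stand-in; this substitution, and the names `biot_savart_stand_in` and `induced_velocity`, are assumptions for illustration only.

```python
import math


def biot_savart_stand_in(feature):
    """Stand-in for A(theta_1): velocity induced by a 2D point vortex."""
    dx, dy, dist, vort_j = feature
    coef = vort_j / (2.0 * math.pi * dist * dist)
    return (-coef * dy, coef * dx)


def induced_velocity(i, pos, vort, a1=biot_savart_stand_in):
    """Sum the velocities induced on vortex i by every other vortex j."""
    vx = vy = 0.0
    for j in range(len(pos)):
        if j == i:
            continue
        dx = pos[i][0] - pos[j][0]          # diff_ij = pos_i - pos_j
        dy = pos[i][1] - pos[j][1]
        dist = math.hypot(dx, dy)           # dist_ij = ||diff_ij||_2
        ux, uy = a1((dx, dy, dist, vort[j]))  # feature vector of length 4
        vx += ux
        vy += uy
    return (vx, vy)


positions = [(0.0, 0.0), (1.0, 0.0)]
strengths = [2.0 * math.pi, 2.0 * math.pi]
print(induced_velocity(0, positions, strengths))
print(induced_velocity(1, positions, strengths))
```

Because the same function `a1` is applied to every ordered pair, all vortices interact through the same law, and the number of vortices can vary between samples without changing the network.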

Figure 10: The architecture of the dynamics network. It takes the particles' attributes as input and outputs each vortex's position. The ResBlock has the same architecture as in [49] with the convolution layers replaced by linear layers. The number in the parentheses is the output dimension. Source: [3].

In addition, we use another neural network, $A(\theta_2)$, to predict the influence caused by the external force, which is determined by the local vorticity and the position of the vortex. The input of $A(\theta_2)$ is a vector of length 3. The output is the influence exerted by the environment on vortex $i$, i.e., the velocity induced by the external force on the $i$th vortex.

We separate the induced velocity into two parts, i.e., $A(\theta_1)$ and $A(\theta_2)$, for the following reason. On the one hand, the induced velocities between vortex particles are global and exhibit a certain symmetry, i.e., the vortex particles interact with each other following the same law. On the other hand, the influence of external forces on vortex particles is usually local and direct; thus, we do not need to consider the interaction between particles. The effect of the vortex stretching term in three-dimensional vortex flows or the diffusion term in viscous flows is also local and should be included in network $A(\theta_2)$. Note that the outputs of $A(\theta_1)$ and $A(\theta_2)$ are both vectors with the same dimension as the flow field. Thus, we can add the two kinds of influence together and define the result as the velocity of vortex $i$. We feed the velocity into the fourth-order Runge–Kutta integrator to obtain the predicted position of vortex $i$.
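
The final step, adding the two velocity contributions and advancing the positions with the classical fourth-order Runge–Kutta scheme, can be sketched as below. The callables `interaction_fn` and `external_fn` are hypothetical stand-ins for the summed output of $A(\theta_1)$ and the output of $A(\theta_2)$; in the example they are replaced by a rigid rotation and a zero field so that the result can be compared against a known solution.

```python
import math


def total_velocity(pos, interaction_fn, external_fn):
    """Velocity of each vortex: interaction part plus external part."""
    return [(ix + ex, iy + ey)
            for (ix, iy), (ex, ey) in zip(interaction_fn(pos), external_fn(pos))]


def rk4_step(pos, velocity_fn, dt):
    """One classical RK4 step for a list of 2D positions."""
    def shifted(base, k, h):
        return [(x + h * kx, y + h * ky) for (x, y), (kx, ky) in zip(base, k)]

    k1 = velocity_fn(pos)
    k2 = velocity_fn(shifted(pos, k1, 0.5 * dt))
    k3 = velocity_fn(shifted(pos, k2, 0.5 * dt))
    k4 = velocity_fn(shifted(pos, k3, dt))
    return [(x + dt / 6.0 * (a[0] + 2 * b[0] + 2 * c[0] + d[0]),
             y + dt / 6.0 * (a[1] + 2 * b[1] + 2 * c[1] + d[1]))
            for (x, y), a, b, c, d in zip(pos, k1, k2, k3, k4)]


# Stand-ins: rigid rotation about the origin, no external influence.
rotation = lambda pos: [(-y, x) for x, y in pos]
no_force = lambda pos: [(0.0, 0.0) for _ in pos]
velocity = lambda pos: total_velocity(pos, rotation, no_force)

new_pos = rk4_step([(1.0, 0.0)], velocity, dt=0.1)
print(new_pos, (math.cos(0.1), math.sin(0.1)))
```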

In addition, when predicting the evolution of the flow field, NVM replaces the discrete BS method with a dynamics network composed of ResBlocks. We chose a 5-layer ResBlock chain to improve the expressiveness of the dynamics network so that the same network can learn dynamics of different complexity. Since the dynamics network with a 5-layer ResBlock chain is more complex than the discrete BS method, the per-particle computational cost of NVM is higher than that of the Lagrangian vortex method. However, although the ResBlocks themselves are relatively expensive, NVM needs far fewer vortex particles to predict the evolution of the flow field, so its overall computational cost can be greatly reduced.

5 Results

We present several experiments here to highlight the key advantages of our methodologies. For additional examples and ablation tests, please refer to [1, 2, 3, 4].

5.1 Taylor-nets

5.1.1 Dataset generation and training settings

To make a fair comparison with the ground truth, we generate our training and testing datasets by using the same numerical integrator based on a given analytical Hamiltonian. In the learning process, we generate $N_{train}$ training samples, and for each training sample, we first pick a random initial point $(\bm{q}_0, \bm{p}_0)$ (input), then use the symplectic integrator discussed in Section 3.2.3 to calculate the value $(\bm{q}_n, \bm{p}_n)$ (target) of the trajectory at the end of the training period $T_{train}$. We do the same to generate a validation dataset with $N_{validation} = 100$ samples and the same time span as $T_{train}$, and calculate the validation loss $L_{validation}$ along with the training loss $L_{train}$ to evaluate the training process.
In addition, we generate a set of testing data with $N_{test} = 100$ samples and a predicting time span $T_{predict}$ that is around 6000 times larger, and calculate the prediction error $\epsilon_p$ to evaluate the predictive ability of the model. For simplicity, we use $(\bm{\hat{p}}_n, \bm{\hat{q}}_n)$ to represent the values predicted by our trained model.
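
The endpoint-only dataset generation can be sketched as follows. The integrator of Section 3.2.3 is not reproduced in this section, so the sketch substitutes the Störmer–Verlet (leapfrog) scheme, a standard symplectic integrator for separable Hamiltonians, and uses the harmonic oscillator $H = p^2/2 + q^2/2$ as a stand-in system. The function names, sampling range, and step size are assumptions for illustration.

```python
import random


def leapfrog(q, p, dt, n_steps, dh_dq, dh_dp):
    """Stormer-Verlet integration of a separable Hamiltonian H = T(p) + V(q)."""
    for _ in range(n_steps):
        p -= 0.5 * dt * dh_dq(q)   # half kick
        q += dt * dh_dp(p)         # drift
        p -= 0.5 * dt * dh_dq(q)   # half kick
    return q, p


def make_endpoint_dataset(n_samples, t_train, dt, seed=0):
    """Return [((q0, p0), (qn, pn)), ...]: only initial and end points are kept."""
    rng = random.Random(seed)
    n_steps = round(t_train / dt)
    data = []
    for _ in range(n_samples):
        q0, p0 = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        # Stand-in system: harmonic oscillator, dH/dq = q and dH/dp = p.
        qn, pn = leapfrog(q0, p0, dt, n_steps, lambda q: q, lambda p: p)
        data.append(((q0, p0), (qn, pn)))
    return data


dataset = make_endpoint_dataset(n_samples=5, t_train=1.0, dt=0.01)
print(len(dataset), dataset[0])
```

Each sample stores two points regardless of how many integrator steps lie between them, which is what keeps the dataset small compared with approaches that store the full trajectory.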

We remark that our training dataset is smaller than that used by the other methods. Most methods, e.g., ODE-net [50] and HNN [15], rely on intermediate data points along each trajectory to train the model. That is, their dataset is

$$\left[(\bm{q}_0^{(s)}, \bm{p}_0^{(s)}), (\bm{q}_1^{(s)}, \bm{p}_1^{(s)}), \dots, (\bm{q}_{n-1}^{(s)}, \bm{p}_{n-1}^{(s)}), (\bm{q}_n^{(s)}, \bm{p}_n^{(s)})\right]_{s=1}^{N_{train}},$$

where $(\bm{q}_1^{(s)}, \bm{p}_1^{(s)}), \dots, (\bm{q}_{n-1}^{(s)}, \bm{p}_{n-1}^{(s)})$ are $n-1$ intermediate points collected within $T_{train}$ between $(\bm{q}_0^{(s)}, \bm{p}_0^{(s)})$ and $(\bm{q}_n^{(s)}, \bm{p}_n^{(s)})$. In contrast, we use only two data points per sample, the initial point and the end point, so our dataset looks like

$$\left[(\bm{q}_0^{(s)}, \bm{p}_0^{(s)}), (\bm{q}_n^{(s)}, \bm{p}_n^{(s)})\right]_{s=1}^{N_{train}},$$

which is $n-1$ times smaller than the dataset of the other methods, if we do not count $(\bm{q}_0^{(s)}, \bm{p}_0^{(s)})$. Our predicting time span $T_{predict}$ is around 6000 times the training period $T_{train}$ used in the training dataset (as compared to 10 times in HNN). This leads to a 600-fold compression of the training data along the temporal dimension. Note that we fix $T_{train}$ and $T_{predict}$ in practice so that we can train our network more efficiently on a GPU. One can also choose to generate training data with a different $T_{train}$ for each sample to obtain more robust performance.

We use the Adam optimizer [48] and choose automatic differentiation as our backward propagation method. We tried both the adjoint sensitivity method, which is used in ODE-net [50], and automatic differentiation. Both can train the model well. However, we found the adjoint sensitivity method to be much slower than automatic differentiation, given the large parameter size of the neural networks.

All $A_{i}$ and $B_{i}$ in (43) are initialized as $A_{i},B_{i}\sim\mathcal{N}(0,\sqrt{2/[N*N_{h}*(i+1)]})$, where $N$ is the dimension of the system and $N_{h}$ is the size of the hidden layers. The loss function is

$$L_{train}=\frac{1}{N_{train}}\sum_{s=1}^{N_{train}}\|\bm{\hat{p}}_{n}^{(s)}-\bm{p}_{n}^{(s)}\|_{1}+\|\bm{\hat{q}}_{n}^{(s)}-\bm{q}_{n}^{(s)}\|_{1}. \tag{63}$$

The validation loss $L_{validation}$ is the same as (63) but evaluated on a dataset different from the training dataset. We choose the $L1$ loss instead of the Mean Square Error (MSE) loss because of its better performance.
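As a rough illustration of the endpoint loss in (63), the following sketch averages the $L1$ distance between predicted and true end states over the samples. The function names and the list-of-lists data layout are assumptions made for this example, and the sum over $s$ is read as applying to both norm terms.

```python
def l1_norm(v):
    """Sum of absolute values of a vector given as a sequence of floats."""
    return sum(abs(x) for x in v)


def l1_endpoint_loss(q_pred, p_pred, q_true, p_true):
    """Mean over samples of ||p_hat - p||_1 + ||q_hat - q||_1, following (63).

    Each argument is a list of samples; each sample is a list of floats.
    The layout is an assumption for illustration only.
    """
    n_samples = len(q_true)
    total = 0.0
    for qh, ph, q, p in zip(q_pred, p_pred, q_true, p_true):
        total += l1_norm([a - b for a, b in zip(ph, p)])
        total += l1_norm([a - b for a, b in zip(qh, q)])
    return total / n_samples
```

The same function can serve as the validation loss by passing in held-out samples instead of training samples.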

We now present the experimental results for an ideal pendulum system, which is defined by the Hamiltonian

$$\mathcal{H}(q,p)=\frac{1}{2}p^{2}-\cos{(q)}. \tag{64}$$

We pick a random initial point for training $(\bm{q}_{0},\bm{p}_{0})\in\left[-2,2\right]\times\left[-2,2\right]$.
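A minimal sketch of how start/end training pairs for the pendulum could be produced is given below. It assumes uniform sampling on $[-2,2]\times[-2,2]$ and uses a classical fourth-order Runge-Kutta scheme with small substeps as a stand-in reference solver; neither choice is stated in this section, and all function names are hypothetical.

```python
import math
import random


def hamiltonian(q, p):
    """Pendulum Hamiltonian H = p^2 / 2 - cos(q) from (64)."""
    return 0.5 * p * p - math.cos(q)


def pendulum_rhs(q, p):
    """Hamilton's equations: dq/dt = dH/dp = p, dp/dt = -dH/dq = -sin(q)."""
    return p, -math.sin(q)


def rk4_flow(q, p, t_span, substeps=100):
    """Advance (q, p) by t_span with classical RK4 (assumed reference solver)."""
    h = t_span / substeps
    for _ in range(substeps):
        k1q, k1p = pendulum_rhs(q, p)
        k2q, k2p = pendulum_rhs(q + 0.5 * h * k1q, p + 0.5 * h * k1p)
        k3q, k3p = pendulum_rhs(q + 0.5 * h * k2q, p + 0.5 * h * k2p)
        k4q, k4p = pendulum_rhs(q + h * k3q, p + h * k3p)
        q += h * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
        p += h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return q, p


def make_dataset(n_train=15, t_train=0.01, seed=0):
    """Return n_train pairs ((q0, p0), (qn, pn)) separated by t_train."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_train):
        q0 = rng.uniform(-2.0, 2.0)  # uniform sampling is an assumption
        p0 = rng.uniform(-2.0, 2.0)
        data.append(((q0, p0), rk4_flow(q0, p0, t_train)))
    return data
```

Only the start and end states of each short trajectory are stored, which matches the two-point structure of the training set described above.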

To show the predictive ability of our model, we pick $T_{predict}=20\pi$. We pick 15 as the sample size $N_{train}$, since we find that a small $N_{train}$ is sufficient to generate excellent results. We train for 100 epochs, with $step\_size=10$ (the period of learning rate decay) and $\gamma=0.8$ (the multiplicative factor of learning rate decay). The learning rate of each parameter group is decayed by $\gamma$ every $step\_size$ epochs, which prevents the model from overshooting the local minimum. The dynamic learning rate also makes our model converge faster. $M$ indicates the number of terms of the Taylor polynomial introduced in the construction of the neural networks (43). Through experimentation, we find that 8 terms can represent most functions well. We choose $N_{h}=16$ as the dimension of the hidden layers.
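The step decay described above can be written out explicitly as follows. The initial learning rate of `1e-3` is a placeholder, since this section does not state it for Taylor-net. The parameter names `step_size` and `gamma` follow the text and also match PyTorch's `StepLR` scheduler, though the scheduler actually used is not named here.

```python
def step_decay_lr(initial_lr, epoch, step_size=10, gamma=0.8):
    """Learning rate after `epoch` epochs, multiplied by gamma every step_size epochs."""
    return initial_lr * gamma ** (epoch // step_size)


# Schedule over the 100 training epochs; 1e-3 is a hypothetical initial rate.
schedule = [step_decay_lr(1e-3, epoch) for epoch in range(100)]
```

With these settings the rate is reduced nine times during training, so the final rate is $0.8^{9}\approx 0.134$ times the initial one.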

5.1.2 Predictive ability and robustness

Figure 11: Prediction error $\epsilon_{p}^{(n_{t})}$ at different $t$ from $t=0$ to $t=20\pi$ for the pendulum problem (a) without noise, (b) with noise $\sigma_{1},\sigma_{2}\sim\mathcal{N}(0,0.1)$, and (c) with noise $\sigma_{1},\sigma_{2}\sim\mathcal{N}(0,0.5)$. In the figure, $t=n_{t}\Delta t$, where $\Delta t=0.01$. $\epsilon_{p}^{(n_{t})}$ is the prediction error at the $n_{t}^{\textrm{th}}$ predicted point among the total $N_{T}=T_{predict}/\Delta t$ predicted points. We use $T_{train}=0.01$, $T_{train}=0.5$ and $T_{train}=1$ to train the model in (a), (b), and (c), respectively. Source: [1].
Table 2: Comparison of $\epsilon_{p}$ for the pendulum problem without noise, with noise $\sigma_{1},\sigma_{2}\sim\mathcal{N}(0,0.1)$, and with noise $\sigma_{1},\sigma_{2}\sim\mathcal{N}(0,0.5)$. Source: [1].

| Methods | Taylor-net | HNN | ODE-net |
| --- | --- | --- | --- |
| $\epsilon_{p}$, without noise | 0.213 | 0.377 | 1.416 |
| $\epsilon_{p}$, with noise $\sigma_{1},\sigma_{2}\sim\mathcal{N}(0,0.1)$ | 1.667 | 2.433 | 3.301 |
| $\epsilon_{p}$, with noise $\sigma_{1},\sigma_{2}\sim\mathcal{N}(0,0.5)$ | 1.293 | 2.416 | 27.114 |
Figure 12: Prediction results of position $\bm{q}$ from $t=0$ to $t=20\pi$ for the pendulum problem using Taylor-net, HNN, and ODE-net (a) without noise, (b) with noise $\sigma_{1},\sigma_{2}\sim\mathcal{N}(0,0.1)$, and (c) with noise $\sigma_{1},\sigma_{2}\sim\mathcal{N}(0,0.5)$. For all the models, we set the initial point as $(\bm{q}_{0},\bm{p}_{0})=(1,1)$. We use $T_{train}=0.01$, $T_{train}=0.5$ and $T_{train}=1$ to train the model in (a), (b), and (c), respectively. All the methods are trained until $L_{validation}$ converges. Source: [1].

Now, to assess how well our method can predict the future flow, we compare the predictive ability of Taylor-net with that of ODE-net and HNN. We apply all three methods to the pendulum problem, with $T_{train}=0.01$ and $T_{predict}=20\pi$. We evaluate the performance of the models by calculating the average prediction error at each predicted point, defined by

$$\epsilon_{p}^{(n_{t})}=\frac{1}{N_{test}}\sum_{s=1}^{N_{test}}\|\bm{\hat{p}}_{n}^{(s,n_{t})}-\bm{p}_{n}^{(s,n_{t})}\|_{1}+\|\bm{\hat{q}}_{n}^{(s,n_{t})}-\bm{q}_{n}^{(s,n_{t})}\|_{1}, \tag{65}$$

and the average of $\epsilon_{p}^{(n_{t})}$ over $T_{predict}$ is

$$\epsilon_{p}=\frac{1}{N_{T}}\sum_{n_{t}=1}^{N_{T}}\epsilon_{p}^{(n_{t})}, \tag{66}$$

where $N_{test}$ represents the testing sample size specified in Section 5.1.1 and $N_{T}=T_{predict}/\Delta t$ with $\Delta t=0.01$. After experimentation, we find that Taylor-net has stronger predictive ability than the other two methods. The first row of Table 2 shows the average prediction error of 100 testing samples using the three methods over $T_{predict}$ when no noise is added. The prediction error of HNN is almost double that of Taylor-net, while the prediction error of ODE-net is about 7 times that of Taylor-net. To analyze the difference more closely, we made several plots that compare the prediction results. Figure 11 shows the prediction error $\epsilon_{p}^{(n_{t})}$ against $t=n_{t}\Delta t$ over $T_{predict}$ for all three methods. In Figure 12, we plot the predicted position $q$ against time for all three methods, together with the ground truth, to see how well the prediction results match the ground truth.
From Figure 12 (a), we can already see that the prediction of ODE-net gradually deviates from the ground truth as time progresses, while the predictions of Taylor-net and HNN stay mostly consistent with the ground truth, with the former being slightly closer. The difference between Taylor-net and HNN can be seen more clearly in Figure 11 (a). Observe that the prediction error of Taylor-net is clearly smaller than that of the other two methods, and the difference becomes more apparent as time increases. The prediction error of ODE-net is larger than that of HNN and Taylor-net at the beginning of $T_{predict}$ and increases at a much faster rate than that of the other two methods. Although the prediction error of HNN shows no obvious difference from that of Taylor-net at the beginning, it gradually diverges from it.
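The two error measures in (65) and (66) can be sketched as below. The nested-list layout (`pred[s][k]` is the flattened state at time step `k` of test sample `s`, with $\bm{q}$ followed by $\bm{p}$) is an assumption for illustration. Because the $L1$ norm is additive over components, summing absolute differences over the flattened state equals the sum of the two norms in (65).

```python
def pointwise_error(pred, truth):
    """Return [eps_p^(n_t) for each time step], following (65).

    pred[s][k] and truth[s][k] are sequences of floats holding the state
    (q components followed by p components) of sample s at step k.
    """
    n_test = len(truth)
    n_steps = len(truth[0])
    errors = []
    for k in range(n_steps):
        total = 0.0
        for s in range(n_test):
            total += sum(abs(a - b) for a, b in zip(pred[s][k], truth[s][k]))
        errors.append(total / n_test)
    return errors


def average_error(errors):
    """Time average of the pointwise errors, following (66)."""
    return sum(errors) / len(errors)
```

As a check on the ratios quoted in the text, the no-noise row of Table 2 gives $0.377/0.213\approx 1.77$ for HNN and $1.416/0.213\approx 6.65$ for ODE-net, relative to Taylor-net.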

5.2 NSSNNs

5.2.1 Dataset generation and training settings

We use 6 linear layers with hidden size 64 to model $\mathcal{H}_{\theta}$, all of which are followed by a Sigmoid activation function except the last one. The derivatives $\partial\mathcal{H}_{\theta}/\partial\bm{p}$, $\partial\mathcal{H}_{\theta}/\partial\bm{q}$, $\partial\mathcal{H}_{\theta}/\partial\bm{x}$, $\partial\mathcal{H}_{\theta}/\partial\bm{y}$ are all obtained by automatic differentiation in PyTorch [66]. The weights of the linear layers are initialized by Xavier initialization [67].

We generate the dataset for training and validation using a high-precision numerical solver [55], where the ratio of training to validation data is $9:1$. We set $(\bm{q}_{0}^{j},\bm{p}_{0}^{j})$ as the start input and $(\bm{q}^{j},\bm{p}^{j})$ as the target, with $j=1,2,\cdots,N_{s}$, and the time span between $(\bm{q}_{0}^{j},\bm{p}_{0}^{j})$ and $(\bm{q}^{j},\bm{p}^{j})$ is $T_{train}$. We feed $(\bm{q}_{0},\bm{p}_{0})=(\bm{q}_{0}^{j},\bm{p}_{0}^{j})$, $t_{0}=0$, $t=T_{train}$, and the time step $\textrm{d}t$ into Algorithm 1 to get the predicted variables $(\hat{\bm{q}}^{j},\hat{\bm{p}}^{j},\hat{\bm{x}}^{j},\hat{\bm{y}}^{j})$. Accordingly, the loss function is defined as

$$\mathcal{L}_{NSSNN}=\frac{1}{N_{b}}\sum_{j=1}^{N_{b}}\|\bm{q}^{(j)}-\hat{\bm{q}}^{(j)}\|_{1}+\|\bm{p}^{(j)}-\hat{\bm{p}}^{(j)}\|_{1}+\|\bm{q}^{(j)}-\hat{\bm{x}}^{(j)}\|_{1}+\|\bm{p}^{(j)}-\hat{\bm{y}}^{(j)}\|_{1}, \tag{67}$$

where $N_{b}=512$ is the batch size of the training samples. We use the Adam optimizer [48] with learning rate 0.05. The learning rate is multiplied by 0.8 every 10 epochs.
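The loss in (67) penalizes both copies of the state produced by the integrator: the pair $(\hat{\bm{q}},\hat{\bm{p}})$ and the auxiliary pair $(\hat{\bm{x}},\hat{\bm{y}})$ are each compared with the same target $(\bm{q},\bm{p})$. A minimal sketch follows, with hypothetical function names and a plain list-of-lists batch layout assumed for illustration.

```python
def l1_distance(u, v):
    """L1 distance between two equally long sequences of floats."""
    return sum(abs(a - b) for a, b in zip(u, v))


def nssnn_loss(q, p, q_hat, p_hat, x_hat, y_hat):
    """Batch-averaged four-term L1 loss, following (67).

    Every argument is a list over the batch; each entry is a list of floats.
    """
    n_batch = len(q)
    total = 0.0
    for j in range(n_batch):
        total += l1_distance(q[j], q_hat[j]) + l1_distance(p[j], p_hat[j])
        total += l1_distance(q[j], x_hat[j]) + l1_distance(p[j], y_hat[j])
    return total / n_batch
```

Tying $(\hat{\bm{x}},\hat{\bm{y}})$ to the same targets pushes the two copies of the state toward each other at the end of the training interval.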

Taking the system $\mathcal{H}(q,p)=0.5(q^{2}+1)(p^{2}+1)$ as an example, we carry out a series of ablation tests based on our constructed networks to find the proper parameters. By default, we set the time span, time step and dataset size to $T=0.01$, $\textrm{d}t=0.01$ and $N_{s}=1280$. The choice of $\omega$ in (14) is largely flexible, since NSSNN is not sensitive to the parameter $\omega$ when it is larger than a certain threshold. We pick the $L1$ loss function to train our network due to its better performance. In addition, we already introduced a regularization term in the symplectic integrator embedded in the network; thus, there is no need to add a regularization term to the loss function. The integration time step in the symplectic integrator is a vital parameter, and the choice of $\textrm{d}t$ largely depends on the time span $T_{train}$. In general, we should take a relatively small $\textrm{d}t$ for a dataset with a larger time span $T_{train}$.

5.2.2 Spring system

We compare five implementations that learn and predict Hamiltonian systems. The first one is NeuralODE [50], which trains the system by embedding the network $\bm{f}_{\theta}\to(\textrm{d}\bm{q}/\textrm{d}t,\textrm{d}\bm{p}/\textrm{d}t)$ into the Runge-Kutta (RK) integrator. The other four, however, achieve the goal by fitting the Hamiltonian $\mathcal{H}_{\theta}\to\mathcal{H}$ based on (6). Specifically, HNN trains the network with the constraints of the Hamiltonian symplectic gradient along with the time derivative of the system variables, and then embeds the well-trained $\mathcal{H}_{\theta}$ into the RK integrator for predicting the system [15]. The third and fourth implementations are ablation tests. One of them is improved HNN (IHNN), which embeds the well-trained $\mathcal{H}_{\theta}$ into the nonseparable symplectic integrator (Tao's integrator) for prediction. The other directly embeds $\mathcal{H}_{\theta}$ into the RK integrator for training, which we call HRK. The fifth method is NSSNN, which embeds $\mathcal{H}_{\theta}$ into the nonseparable symplectic integrator for training.
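To make the role of the nonseparable symplectic integrator more concrete, here is a rough sketch of one common second-order form of Tao's integrator in an extended phase space $(q,p,x,y)$, applied to the analytic Hamiltonian $\mathcal{H}=0.5(q^{2}+1)(p^{2}+1)$ rather than a learned $\mathcal{H}_{\theta}$. Algorithm 1 and (14) are not reproduced in this section, so the splitting order, the binding strength `omega=20.0`, the initial state, and all function names are assumptions, not the thesis's implementation.

```python
import math


def hamiltonian(q, p):
    """Example nonseparable Hamiltonian H = 0.5 (q^2 + 1)(p^2 + 1)."""
    return 0.5 * (q * q + 1.0) * (p * p + 1.0)


def grad_h(a, b):
    """Partial derivatives of H(a, b) with respect to its first and second argument."""
    return a * (b * b + 1.0), b * (a * a + 1.0)


def phi_a(state, d):
    """Flow of H(q, y): q and y are frozen, p and x are updated."""
    q, p, x, y = state
    dh_dq, dh_dy = grad_h(q, y)
    return q, p - d * dh_dq, x + d * dh_dy, y


def phi_b(state, d):
    """Flow of H(x, p): x and p are frozen, q and y are updated."""
    q, p, x, y = state
    dh_dx, dh_dp = grad_h(x, p)
    return q + d * dh_dp, p, x, y - d * dh_dx


def phi_c(state, d, omega):
    """Flow of the binding term omega * 0.5 * ((q - x)^2 + (p - y)^2).

    The sums q + x and p + y are conserved; the differences rotate by 2*omega*d.
    """
    q, p, x, y = state
    c, s = math.cos(2.0 * omega * d), math.sin(2.0 * omega * d)
    sum_q, sum_p = q + x, p + y
    dif_q, dif_p = q - x, p - y
    rot_q = c * dif_q + s * dif_p
    rot_p = -s * dif_q + c * dif_p
    return (0.5 * (sum_q + rot_q), 0.5 * (sum_p + rot_p),
            0.5 * (sum_q - rot_q), 0.5 * (sum_p - rot_p))


def tao_step(state, dt, omega):
    """One symmetric (Strang-type) composition of the three flows."""
    state = phi_a(state, 0.5 * dt)
    state = phi_b(state, 0.5 * dt)
    state = phi_c(state, dt, omega)
    state = phi_b(state, 0.5 * dt)
    state = phi_a(state, 0.5 * dt)
    return state


def integrate(q0, p0, t_end, dt=0.01, omega=20.0):
    """Start both copies at (q0, p0) and advance to t_end."""
    state = (q0, p0, q0, p0)
    for _ in range(round(t_end / dt)):
        state = tao_step(state, dt, omega)
    return state
```

In a learned setting, `grad_h` would be replaced by derivatives of the network obtained through automatic differentiation, while the three flows and their composition stay the same.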

Figure 13: Comparison of prediction results of $(q,p)$ for the spring system $\mathcal{H}=0.5(q^{2}+p^{2})$ from $t=0$ to $t=200$ with $(q_{0},p_{0})=(0,-3)$. The time spans of the datasets are $T_{train}=0.4$ (first row) and $T_{train}=1$ (second row). The five columns correspond to the five methods NeuralODE, HNN, IHNN, HRK, and NSSNN, respectively. The red line denotes the ground truth and the blue line denotes the prediction; the two overlap perfectly for NSSNN. The prediction ability of HNN and IHNN improves significantly as $T_{train}$ of the dataset decreases; however, such data may be hard to obtain in actual experimental measurements. Source: [2].

For a fair comparison, we adopt the same network structure (except that the dimension of the output layer in NeuralODE is two times larger than that in the other four), the same $L1$ loss function, and the same dataset size. All integration schemes are second-order accurate, and the other parameters are kept consistent with those in Section 5.2.1. The time derivatives in the dataset for training HNN and IHNN are obtained by the first difference method

$$\frac{\textrm{d}\bm{q}}{\textrm{d}t}\approx\frac{\bm{q}(T_{train})-\bm{q}(0)}{T_{train}}~~~~\textrm{and}~~~~\frac{\textrm{d}\bm{p}}{\textrm{d}t}\approx\frac{\bm{p}(T_{train})-\bm{p}(0)}{T_{train}}. \tag{68}$$

Figure 13 demonstrates the differences between the five methods using a spring system $\mathcal{H}=0.5(q^{2}+p^{2})$ with different time spans $T_{train}=0.4,~1$ and the same time step $\textrm{d}t=0.2$. We can see that by introducing the nonseparable symplectic integrator into the prediction of the Hamiltonian system, NSSNN has a stronger long-term predictive ability than all the other methods. In addition, the prediction of HNN and IHNN relies on a dataset with time derivatives; consequently, it leads to a larger error when the given time span $T_{train}$ is large.
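The growth of the derivative error with $T_{train}$ can be checked directly on the spring system, whose exact flow is a rotation in phase space. The sketch below forms the first-difference estimates of (68) from the exact solution and compares them with the true derivatives $\textrm{d}q/\textrm{d}t=p$, $\textrm{d}p/\textrm{d}t=-q$ at $t=0$. The function names are hypothetical.

```python
import math


def spring_exact(q0, p0, t):
    """Exact flow of H = 0.5 (q^2 + p^2): a rotation by angle t in phase space."""
    c, s = math.cos(t), math.sin(t)
    return q0 * c + p0 * s, -q0 * s + p0 * c


def first_difference(q0, p0, t_train):
    """First-difference estimates of dq/dt and dp/dt over one training span."""
    q1, p1 = spring_exact(q0, p0, t_train)
    return (q1 - q0) / t_train, (p1 - p0) / t_train


def derivative_error(q0, p0, t_train):
    """L1 error of the estimates against the true derivatives at t = 0."""
    dq_fd, dp_fd = first_difference(q0, p0, t_train)
    dq_true, dp_true = p0, -q0  # Hamilton's equations for the spring
    return abs(dq_fd - dq_true) + abs(dp_fd - dp_true)


# Errors for the initial point of Figure 13 at three training spans.
errors = {t: derivative_error(0.0, -3.0, t) for t in (0.01, 0.4, 1.0)}
```

For the initial point $(0,-3)$ the error increases monotonically across these three spans, which is consistent with HNN and IHNN degrading when only coarsely sampled data are available.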

5.2.3 Modeling vortex dynamics of multi-particle system

For two-dimensional vortex particle systems, the dynamical equations of the particle positions $(x_{j},y_{j}),~j=1,2,\cdots,N_{v}$ with particle strengths $\Gamma_{j}$ can be written in the generalized Hamiltonian form as

$$\Gamma_{j}\frac{\textrm{d}x_{j}}{\textrm{d}t}=-\frac{\partial\mathcal{H}^{p}}{\partial y_{j}},~~~~\Gamma_{j}\frac{\textrm{d}y_{j}}{\textrm{d}t}=\frac{\partial\mathcal{H}^{p}}{\partial x_{j}},~~~~\textrm{with}~~~~\mathcal{H}^{p}=\frac{1}{4\pi}\sum_{j,k=1}^{N_{v}}\Gamma_{j}\Gamma_{k}\log(|x_{j}-x_{k}|). \tag{69}$$

By including the given particle strengths $\Gamma_j$ in Algorithm 1, we can still adopt the method mentioned above to learn the Hamiltonian in (69) when the number of particles is small. However, for a system with $N_v\gg 2$ particles, the cost of collecting training data from all $N_v$ particles might be high, and the training process can be time-consuming. Thus, instead of collecting information from all $N_v$ particles to train our model, we only use data collected from two bodies as training data to predict the dynamics of $N_v$ particles.

Specifically, we assume the interaction models between particle pairs with unit particle strengths $\Gamma_j=1$ are the same, and their corresponding Hamiltonian can be represented as a network $\hat{\mathcal{H}}_{\theta}(\bm{x}_j,\bm{x}_k)$, based on which the corresponding Hamiltonian of $N_v$ particles can be written as [19, 18]

$$
\mathcal{H}_{\theta}^{p}=\sum_{j,k=1}^{N_v}\Gamma_j\Gamma_k\hat{\mathcal{H}}_{\theta}(\bm{x}_j,\bm{x}_k).
\tag{70}
$$

We embed (70) into the symplectic integrator that includes $\Gamma_j$ to obtain the final network architecture.
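The pairwise construction in (70) and the Hamiltonian form in (69) can be illustrated with a minimal sketch. It assumes the analytic kernel $\log(|\bm{x}_j-\bm{x}_k|)/(4\pi)$ as a stand-in for the learned network $\hat{\mathcal{H}}_{\theta}$, skips the singular self-interaction term $j=k$, and uses central finite differences instead of automatic differentiation. The function names (`pair_energy`, `total_hamiltonian`, `velocities`) are hypothetical and are not taken from the source.

```python
import numpy as np

def pair_energy(xj, xk):
    """Analytic stand-in for the learned pairwise network (unit strengths)."""
    return np.log(np.linalg.norm(xj - xk)) / (4.0 * np.pi)

def total_hamiltonian(pos, gamma, pair=pair_energy):
    """Sum Gamma_j * Gamma_k * pair(x_j, x_k) over all ordered pairs j != k."""
    n = len(gamma)
    h = 0.0
    for j in range(n):
        for k in range(n):
            if j != k:  # assumption: the singular self-interaction is skipped
                h += gamma[j] * gamma[k] * pair(pos[j], pos[k])
    return h

def velocities(pos, gamma, pair=pair_energy, eps=1e-6):
    """Particle velocities from the Hamiltonian form, via central differences."""
    grad = np.zeros_like(pos)
    for j in range(pos.shape[0]):
        for d in range(2):
            step = np.zeros_like(pos)
            step[j, d] = eps
            grad[j, d] = (total_hamiltonian(pos + step, gamma, pair)
                          - total_hamiltonian(pos - step, gamma, pair)) / (2 * eps)
    vel = np.empty_like(pos)
    vel[:, 0] = -grad[:, 1] / gamma  # Gamma_j dx_j/dt = -dH/dy_j
    vel[:, 1] = grad[:, 0] / gamma   # Gamma_j dy_j/dt =  dH/dx_j
    return vel

# Three vortices with illustrative (made-up) positions and strengths
pos = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
gamma = np.array([1.0, 0.5, 2.0])
vel = velocities(pos, gamma)
```

With the analytic kernel, the resulting velocities coincide with the classical point-vortex (Biot-Savart) velocities; replacing `pair_energy` with a trained pairwise model would leave the rest of the assembly unchanged.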

Figure 14: Taylor and Leapfrog vortex. We generate results of the Taylor vortex and the Leapfrog vortex using NSSNN and HNN, and compare them with the ground truth. 6000 vortex elements are used with the corresponding initial vorticity conditions of the Taylor vortex and the Leapfrog vortex. Source: [2].

The setup of the multi-particle problem is similar to the previous problems. The training time span is $T_{train}=0.01$, while the prediction period can be up to $T_{predict}=40$. We use 2048 clean data samples to train our model. The training process takes about 100 epochs for the loss to converge. In Figure 14, we use our trained model to predict the dynamics of 6000-particle systems, including Taylor and Leapfrog vortices. We generate results for the Taylor vortex and the Leapfrog vortex using NSSNN and HNN and compare them with the ground truth. The vortex elements are initialized with the corresponding initial vorticity conditions of the Taylor vortex and the Leapfrog vortex [68]. The difficulty in numerically modeling these two systems lies in keeping the different dynamical vortices separated instead of having them merge into a bigger structure. In both cases, the vortices evolved using NSSNN remain separated, as the ground truth shows, while the vortices merge when using HNN.

5.3 RoeNet

5.3.1 Dataset generation and training settings

For our experiments, we construct datasets using either analytical solutions or numerical solutions calculated with a high-resolution finite difference method. These datasets are then divided into training and validation sets in a $9:1$ ratio. The physical quantities solved in our experiments are of order $O(1)$ and, consequently, do not require normalization.

We train the network over a time span defined as $T_{train}$ and use it to predict target values over a time span of $T_{predict}$, where $T_{predict}>T_{train}$ and $T_{predict}$ starts no earlier than $T_{train}$.

In all experiments, the Adam optimizer [48] is employed, with a learning rate of $\gamma$ as listed in Table 3. The learning rate decays by a multiplicative factor of 0.9 every 5 to 20 epochs. This optimizer is chosen for its ability to adapt learning rates based on the gradient history of each parameter, which facilitates faster and more precise convergence compared to methods with fixed learning rates. Training is conducted with batch sizes ranging from 8 to 32, and all models undergo 100 epochs to ensure convergence. Notably, extending the number of training epochs can enhance training accuracy, reflecting a trade-off between training time and accuracy.
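The step decay described above can be written out as a short sketch. The base rate `1e-3` and the interval of 10 epochs are placeholder values chosen for illustration (the text only states a factor of 0.9 and an interval between 5 and 20 epochs), and `step_decay_lr` is a hypothetical helper name.

```python
def step_decay_lr(base_lr, epoch, decay_every, factor=0.9):
    """Learning rate at `epoch` under a multiplicative step decay."""
    return base_lr * factor ** (epoch // decay_every)

# Placeholder settings for illustration only
base_lr = 1e-3
schedule = [step_decay_lr(base_lr, e, decay_every=10) for e in range(100)]
```

Over 100 epochs with an interval of 10, the rate is reduced nine times. In PyTorch, the same kind of schedule is typically expressed with `torch.optim.lr_scheduler.StepLR`.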

Table 3: Experimental set-up for RoeNet. Source: [4].

| | 1C Linear | Sod Tube |
|---|---|---|
| Boundary condition | Periodic | Neumann |
| Time step $\Delta t$ | 0.02 | 0.001 |
| Space step $\Delta x$ | 0.01 | 0.005 |
| Training time span | 0.04 | 0.06 |
| Predicting time span | $>2$ | 0.1 |
| Data set samples | 500 | 2000 |
| Data set generation | Analytical | Analytical |
| Components number $N_c$ | 1 | 3 |
| Hidden dimension $N_h$ | 1 | 64 |

5.3.2 A simple example

Figure 15: Comparison of RoeNet and Roe solver for solving a one-component linear hyperbolic PDE (1C Linear in Table 3). (a) $t=0$, (b) $t=0.4$, (c) $t=0.8$, (d) $t=1.2$. The legends “RoeNet” and “RoeNet (noise)” denote networks trained on the clean dataset and on the dataset with noise $\epsilon\sim\mathcal{N}(0,0.1)$, respectively. Source: [4].

Taking a linear hyperbolic PDE with one component (1C Linear in Table 3)

$$
\begin{cases}
\bm{F}=u,\\
u(t=0,x)=e^{-300x^{2}}
\end{cases}
\tag{71}
$$

in (18) as an example, we evaluate the performance of RoeNet. This hyperbolic PDE models a Gaussian wave traveling along a line at constant speed. Figure 15 illustrates the propagation of this Gaussian wave over time, simulated using RoeNet with both clean and noisy training data sets, alongside results from the Roe solver and the analytical solution. RoeNet’s predictions, regardless of noise in the training data, align closely with the analytical results throughout the entire computational time domain. In contrast, simulations using the Roe solver show rapid flattening and dissipation of the wave over time. Although the prediction error of RoeNet does accumulate gradually, this increase in numerical error is significantly slower than that observed with traditional numerical methods. As a result, RoeNet demonstrates superior performance with its more accurate predictions.
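The flattening of the Roe-solver result can be reproduced with a minimal sketch. For a scalar linear flux $\bm{F}=u$ with unit wave speed, the Roe flux reduces to the first-order upwind scheme, which is what the code below implements. The periodic domain $[-0.5, 0.5)$ and the time step `dt = 0.005` (Courant number 0.5) are assumptions made for this illustration and are not taken from the source; only `dx = 0.01` follows Table 3.

```python
import numpy as np

n = 100
dx = 1.0 / n          # 0.01, as in Table 3
dt = 0.005            # assumed value giving a Courant number of 0.5
x = np.linspace(-0.5, 0.5, n, endpoint=False)  # assumed periodic domain
u0 = np.exp(-300.0 * x**2)

def upwind_step(u, cfl):
    """One first-order upwind step for u_t + u_x = 0 on a periodic grid."""
    return u - cfl * (u - np.roll(u, 1))

t_end = 0.4
n_steps = round(t_end / dt)
u = u0.copy()
for _ in range(n_steps):
    u = upwind_step(u, dt / dx)

# Exact solution: the initial profile shifted by t_end, wrapped periodically
shifted = (x - t_end + 0.5) % 1.0 - 0.5
u_exact = np.exp(-300.0 * shifted**2)
peak_numerical, peak_exact = u.max(), u_exact.max()
```

The scheme conserves the discrete sum of `u`, but numerical diffusion lowers and widens the pulse, so `peak_numerical` falls clearly below `peak_exact` by $t=0.4$.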

5.3.3 Sod shock tube

We take the one-dimensional diatomic ideal gas problem to assess the performance of our model on solving multi-component Riemann problems with nonlinear flux functions (Sod Tube in Table 3). Specifically, the system is modeled by (18) with

$$
\begin{cases}
\bm{u}=(\rho,\rho v,e)^{T},\\
\bm{F}=[\rho v,\rho v^{2}+p,v(e+p)]^{T},
\end{cases}
\tag{72}
$$

where $\rho$ is the density, $p$ is the pressure, $e$ is the energy, $v$ is the velocity, and the pressure $p$ is related to the conserved quantities through the equation of state $p=(\gamma-1)\left(e-0.5\rho v^{2}\right)$ with a heat capacity ratio $\gamma\approx 1.4$. We apply our model to the Sod shock tube problem [69], a one-dimensional Riemann problem in the form of (18) with (72). The time evolution of this problem can be described by solving the mass, momentum, and energy conservation of ideal gas inside a slender tube, which leads to three characteristics, describing the propagation speed of various regions in the system [69]. In Figure 16, we plot the three components of the problem at $t=0.1$. Note that due to the dissipation effects incorporated in our model, there is no sign of sonic glitch. The result shows that RoeNet exhibits higher accuracy in predicting the discontinuities of the nonlinear Riemann problem.
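As a small worked example of (72) and the equation of state, the sketch below converts primitive variables to the conserved vector and evaluates the flux. The left and right states are the classic Sod initial values $(\rho, v, p) = (1, 0, 1)$ and $(0.125, 0, 0.1)$, which are assumed here because the source does not list its initial condition; the function names are hypothetical.

```python
import numpy as np

GAMMA = 1.4  # heat capacity ratio of a diatomic ideal gas

def pressure(u):
    """Equation of state: p = (gamma - 1) * (e - 0.5 * rho * v^2)."""
    rho, mom, e = u
    v = mom / rho
    return (GAMMA - 1.0) * (e - 0.5 * rho * v**2)

def flux(u):
    """Flux F = [rho v, rho v^2 + p, v (e + p)] of the 1D Euler equations."""
    rho, mom, e = u
    v = mom / rho
    p = pressure(u)
    return np.array([rho * v, rho * v**2 + p, v * (e + p)])

def conserved(rho, v, p):
    """Build u = (rho, rho v, e) from primitive variables."""
    e = p / (GAMMA - 1.0) + 0.5 * rho * v**2
    return np.array([rho, rho * v, e])

# Classic Sod states (assumed for illustration)
u_left = conserved(1.0, 0.0, 1.0)
u_right = conserved(0.125, 0.0, 0.1)
f_left, f_right = flux(u_left), flux(u_right)
```

Because both states are initially at rest, only the momentum component of the flux is nonzero and equals the pressure, so the pressure jump across the interface is what drives the subsequent wave pattern.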

Figure 16: Comparison of RoeNet and Roe solver for solving a Riemann problem with three components and a nonlinear flux function (Sod Tube in Table 3). (a), (b), and (c) plot the comparison of the prediction results using RoeNet, numerical results solved with Roe solver, and the analytical solutions at $t=0.1$ of the three components $u^{(1)}$, $u^{(2)}$, and $u^{(3)}$, respectively. Source: [4].

5.3.4 Comparison with other methods

Current neural network methods, such as Physics-Informed Neural Networks (PINNs) [27], typically require a pre-established PDE model and continuous interaction with this model during training to adjust the loss, and they often use quasi-Newton optimizers such as L-BFGS, which can lead to extended training durations. In contrast, RoeNet operates independently of any explicit equation knowledge, utilizing only the training datasets and relying on first-order gradient-based optimizers such as Adam [48].

Conventional neural networks struggle to predict the emergence and evolution of discontinuous solutions without a governing equation. Our model, RoeNet, showcases a unique capability to handle tasks that traditional machine learning approaches cannot, particularly in predicting dynamics for future times not included in the training data. This is demonstrated in Figure 17, where RoeNet outperforms PINNs [70] in the simulation of the 1C Linear problem described in Section 5.3.2, providing accurate predictions for future states beyond the training scope.

Figure 17: Comparison of the numerical results solved with Roe solver and the prediction results using RoeNet and PINNs [70] at (a) $t=0$, (b) $t=0.4$, (c) $t=0.8$ for the problem 1C Linear in Table 3. Source: [4].

RoeNet, as a data-driven solver, does not require prior knowledge of the system’s evolution equations, setting it apart from traditional numerical methods. It employs an optimization-based approach to construct its numerical scheme, with an optimization space that fully encompasses that of the Roe solver. This enables RoeNet to deliver more precise simulations of PDE evolution compared to conventional numerical approaches.

5.4 NVM

5.4.1 Dataset generation and training settings

We randomly sample 2 to 6 vortices and create the initial vorticity field through convolution with a Gaussian kernel $\sim\mathcal{N}(0,0.01)$. This process is repeated 2000 times to generate $N_s=2000$ samples. DNS is performed to solve (31) in the periodic box using a standard pseudo-spectral method [71]. Aliasing errors are removed using the two-thirds truncation method with the maximum wavenumber $k_{\max}\approx N/3$. The Fourier coefficients of the velocity are advanced in time using a second-order Adams–Bashforth method. The time step is chosen to ensure that the Courant–Friedrichs–Lewy number is less than $0.5$ for numerical stability and accuracy. To obtain accurate DNS data samples, we set the grid size as $N=1024$. Regarding the kinematic viscosity, we set $\nu=0$ and $\nu=0.001$ for different cases. The pseudo-spectral method used in this DNS is similar to that described in [61, 72, 73].
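The two-thirds truncation can be sketched as a spectral mask. The example assumes a $2\pi$-periodic box with integer wavenumbers and a square truncation applied to each wavenumber component separately (a circular cutoff on $|\bm{k}|$ is another common choice); it uses a small grid of 64 points instead of $N=1024$ to stay lightweight, and `dealias_mask` and `dealias` are hypothetical names.

```python
import numpy as np

def dealias_mask(n):
    """Boolean mask keeping Fourier modes with |kx|, |ky| <= n // 3."""
    k = np.fft.fftfreq(n, d=1.0 / n)  # integer wavenumbers 0..n/2-1, -n/2..-1
    keep = np.abs(k) <= n // 3
    return keep[:, None] & keep[None, :]

def dealias(field_hat, mask):
    """Zero out the Fourier coefficients outside the retained band."""
    return np.where(mask, field_hat, 0.0)

n = 64  # small grid for illustration; the DNS above uses N = 1024
mask = dealias_mask(n)
w = np.random.default_rng(0).standard_normal((n, n))  # arbitrary test field
w_hat = dealias(np.fft.fft2(w), mask)
w_filtered = np.fft.ifft2(w_hat).real
```

In a pseudo-spectral solver, a mask of this kind would typically be applied to the nonlinear term after it is transformed back to Fourier space at each time step.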

We use $N_{train}=0.9N_s=1600$ samples with the time span $T_{train}$ for the training of the dynamics network. The DNS dataset is generated with random initial conditions independent of the predicted vortex evolution. The time step of vortex evolution is set as $\textrm{d}t$. For the leapfrog example, we set the parameters as $T_{train}=1$ and $\textrm{d}t=0.001$. For the turbulent flow example, we set the parameters as $T_{train}=0.001$ and $\textrm{d}t=0.001$. For other examples, the parameters are set as $T_{train}=0.2$ and $\textrm{d}t=0.1$. In general, the parameters are chosen within a wide range, indicating the robustness of the network. We use the trained network to predict the vortex dynamics at time $T_{predict}$. We show in the results section that the prediction time span $T_{predict}$ is larger than the training time span $T_{train}$, in some cases up to tens of times $T_{train}$.

For both the detection network and the dynamics network, we use the Adam optimizer [48] with a learning rate of 1e-3. The learning rate decays every 20 epochs by a multiplicative factor of 0.8. For the detection network, we use a batch size of 32 and train it for 350 epochs. We use the cross entropy as the classification loss and L1 loss for position prediction. To relieve the unbalanced data problem in the detection network, we implement Focal loss [65] with $\alpha=0.4$ and $\gamma=2$. It takes 15 minutes to converge on a single Nvidia RTX 2080Ti GPU. For the dynamics network, we use a batch size of 64 and train it for 500 epochs. We use L1 loss for position prediction. It takes 25 minutes to converge on a single Nvidia RTX 2080Ti GPU.
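To make the role of $\alpha$ and $\gamma$ concrete, the following is a minimal NumPy sketch of the binary form of the focal loss, $-\alpha_t(1-p_t)^{\gamma}\log(p_t)$. Whether the detection network uses this binary form or a multi-class variant is an assumption, and the probabilities and labels below are made up for illustration.

```python
import numpy as np

def focal_loss(prob, target, alpha=0.4, gamma=2.0, eps=1e-12):
    """Mean binary focal loss for predicted probabilities of the positive class."""
    prob = np.clip(prob, eps, 1.0 - eps)
    p_t = np.where(target == 1, prob, 1.0 - prob)          # probability of the true class
    alpha_t = np.where(target == 1, alpha, 1.0 - alpha)    # class-balancing weight
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

# Made-up predictions and labels for illustration
prob = np.array([0.9, 0.6, 0.2, 0.05])
target = np.array([1, 1, 0, 0])
loss = focal_loss(prob, target)
```

The factor $(1-p_t)^{\gamma}$ down-weights examples that are already classified confidently, so the abundant easy negatives contribute little and the loss concentrates on the harder cases. Setting $\gamma=0$ and $\alpha=0.5$ recovers half of the ordinary binary cross entropy.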

5.4.2 Comparison between NVM and LVM

Figure 18: Comparison of NVM and LVM for solving NS equations in the periodic box. (a) NVM, (b) LVM, and (c) the relative error of velocity in the flow simulation. The red dots indicate the positions of the 2 vortices at different time steps generated by DNS. The black circles in (a) and (b) are the prediction and simulation results of the NVM and LVM, respectively. The black arrows indicate the directions of motion of the 2 vortices. Source: [3].

To demonstrate that NVM is a better approach to capturing fluid dynamics than the traditional LVM, we compare the prediction results of the NVM and the LVM for solving NS equations in the periodic box. In the prediction, we initialize two vortex particles at $\bm{X}_1=(\pi-0.4,\pi-0.6)$ and $\bm{X}_2=(\pi+0.4,\pi+0.6)$, where the corresponding particle strengths are $\Gamma_1=0.75$ and $\Gamma_2=0.75$. We plot the results using the NVM and LVM and the relative error of velocity in the simulation in Figure 18 (a), (b), and (c), respectively. Here, the relative error of velocity is defined as

$$
\epsilon_u=\frac{\|\bm{u}_{predict}-\bm{u}_{true}\|_{L^{2}}}{\|\bm{u}_{true}\|_{L^{2}}},
\tag{73}
$$

where $\bm{u}_{predict}$ denotes the predicted or simulated solution and $\bm{u}_{true}$ denotes the ground truth solution.
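On a uniform grid, the metric in (73) can be evaluated as a ratio of discrete norms, since the constant cell area cancels between numerator and denominator. The sketch below assumes velocity fields stored as arrays of shape `(ny, nx, 2)`; the function name and the test fields are illustrative only.

```python
import numpy as np

def relative_l2_error(u_predict, u_true):
    """Relative L2 error between two velocity fields sampled on the same grid."""
    diff = np.ravel(u_predict - u_true)
    return np.linalg.norm(diff) / np.linalg.norm(np.ravel(u_true))

# Illustrative fields: a uniform flow and a copy that is 10% too fast
u_true = np.ones((8, 8, 2))
u_predict = 1.1 * u_true
err = relative_l2_error(u_predict, u_true)
```

For this example the error is 0.1 up to rounding, matching the 10% overshoot in the predicted field.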

In Figure 18 (a), the predictions made by NVM match the positions of the vortices generated by DNS almost perfectly, while the predictions made by the BS law in Figure 18 (b) contain a large error. The relative error of velocity in Figure 18 (c) shows that NVM outperforms the traditional method by an increasing margin as the prediction period becomes longer.

5.4.3 Turbulent flows

Figure 19: Two-dimensional Lagrangian scalar fields at $t=1$ with the initial condition $\phi=x$ and resolution $2000^{2}$. The evolution of the Lagrangian scalar fields is induced by (a) $O(10)$ and (b) $O(100)$ random NVM vortex particles. Source: [3].

Besides simple systems, NVM is capable of predicting complicated turbulence systems. This example’s primary purpose is to illustrate our network’s ability to handle more complex problems.

Figure 19 depicts the two-dimensional Lagrangian scalar fields at $t=1$ with the initial condition $\phi=x$ and resolution $2000^{2}$. The governing equation of the Lagrangian scalar fields is

$$
\frac{\partial\phi}{\partial t}+\bm{u}\cdot\bm{\nabla}\phi=0.
\tag{74}
$$

The evolution of the Lagrangian scalar fields is induced by $O(10)$ and $O(100)$ NVM vortex particles at random positions $\sim U(0,4)$ with random strengths $\sim U(0,2)$. We remark that the same trained model is used for both cases. There is no correlation between the positions and vortex particle strengths of the two sets of vortex particles.

Based on the particle velocity field from the NVM, a backward-particle-tracking method is applied to solve (74). Then the iso-contour of the Lagrangian field can be extracted as material structures in the evolution [74, 75, 76, 77, 78]. In Figure 19 (a), the spiral structure [79, 80] of individual NVM vortex particles can be observed clearly due to the small number of NVM vortex particles. In Figure 19 (b), the underlying field exhibits turbulent behaviors since it is generated with a large number of NVM vortex particles.
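The backward-particle-tracking idea can be sketched as follows: since (74) states that $\phi$ is constant along particle paths, $\phi(\bm{x},t)$ equals the initial value at the point reached by tracing the particle at $\bm{x}$ back to $t=0$. The example assumes a rigid-rotation velocity field as a stand-in for the NVM velocity, a classical fourth-order Runge-Kutta integrator, and a coarse grid; none of these choices are taken from the source, and the function names are hypothetical.

```python
import numpy as np

def velocity(pos, t):
    """Stand-in velocity field: rigid rotation about the origin, u = (-y, x)."""
    return np.stack([-pos[..., 1], pos[..., 0]], axis=-1)

def backtrack(pos, t_end, n_steps, vel=velocity):
    """Trace particles from time t_end back to t = 0 with classical RK4."""
    h = -t_end / n_steps  # negative step: integrate backward in time
    t = t_end
    for _ in range(n_steps):
        k1 = vel(pos, t)
        k2 = vel(pos + 0.5 * h * k1, t + 0.5 * h)
        k3 = vel(pos + 0.5 * h * k2, t + 0.5 * h)
        k4 = vel(pos + h * k3, t + h)
        pos = pos + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return pos

xs = np.linspace(-1.0, 1.0, 33)  # coarse grid for illustration
grid = np.stack(np.meshgrid(xs, xs, indexing="ij"), axis=-1)
t_end = 1.0
origin = backtrack(grid, t_end, n_steps=100)
phi = origin[..., 0]  # phi(x, t) = phi_0(X(0)) with the initial condition phi_0 = x
```

For rigid rotation the result can be checked against the closed form $\phi = x\cos t + y\sin t$. Each grid point is traced independently, which is why the approach parallelizes well at high resolution.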

Generally, the high-resolution results shown in Figure 19 can only be achieved by supercomputing with grid-based methods [74], while NVM allows them to be generated on a laptop with a GPU. We demonstrate that NVM is capable of generating an accurate depiction of complex turbulence systems at low computational cost.

6 Conclusion

6.1 Summary

This thesis introduces a novel data-driven framework, which demonstrates a significant advancement in predictive modeling for long-term forecasts by integrating physics-based priors into learning algorithms. This integration ensures intrinsic preservation of the physical structures of the systems analyzed, thereby maintaining mathematical symmetries and physical conservation laws. As a result, the models demonstrate superior performance in terms of prediction accuracy, robustness, and predictive capability, particularly in identifying patterns not present within the training dataset, despite the use of small datasets, short training periods, and small sample sizes.

In particular, we have developed four distinct algorithms, each designed to incorporate specific physics-based priors relevant to different types of nonlinear systems. These include the symplectic structure for both separable and nonseparable Hamiltonian systems, Hyperbolic Conservation Law for hyperbolic partial differential equations, and Helmholtz’s Theorem for incompressible fluid dynamics. The integration of physics-based priors not only narrows the solution space, thereby streamlining computational demands, but also enhances the reliability and validity of the predictions. Moreover, embedding these structures within neural networks significantly expands their capacity to capture and reproduce complex patterns inherent in physical phenomena, which conventional networks often fail to recognize. This expanded capability allows for a more comprehensive representation of potential physical behaviors, substantially improving the models’ applicability and predictive accuracy.

6.2 Limitations and Future Work

We also recognize that our models have several limitations. Firstly, neural networks that include an embedded integrator often require a longer training period compared to those trained on datasets with explicit time derivatives. Secondly, our method employs an explicit scheme for time evolution, which necessitates a small time step to ensure accuracy. Although a smaller time step can lead to higher discretization accuracy, this advantage must be weighed against increased training costs and the risk of gradient explosion. In our future work, we are considering the adoption of implicit schemes, such as leveraging RNN structures, which may offer more stability and efficiency. In addition, our current model is designed as an end-to-end system that does not account for environmental variability. To address this issue, we will explore online learning techniques to enhance the model’s adaptability in changing conditions. Lastly, to enhance the applicability of our model, a significant focus of our future research will be dedicated to developing scalable methods that can be generalized to various PDEs, aiming to achieve a versatile and universally applicable framework for various systems.

References

  • Tong et al. [2021] Yunjin Tong, Shiying Xiong, Xingzhe He, Guanghan Pan, and Bo Zhu. Symplectic neural networks in taylor series form for hamiltonian systems. Journal of Computational Physics, 437:110325, 2021.
  • Xiong et al. [2020] Shiying Xiong, Yunjin Tong, Xingzhe He, Shuqi Yang, Cheng Yang, and Bo Zhu. Nonseparable symplectic neural networks. arXiv preprint arXiv:2010.12636, 2020.
  • Xiong et al. [2023] Shiying Xiong, Xingzhe He, Yunjin Tong, Yitong Deng, and Bo Zhu. Neural vortex method: From finite lagrangian particles to infinite dimensional eulerian dynamics. Computers & Fluids, 258:105811, 2023.
  • Tong et al. [2024] Yunjin Tong, Shiying Xiong, Xingzhe He, Shuqi Yang, Zhecheng Wang, Rui Tao, Runze Liu, and Bo Zhu. Roenet: Predicting discontinuity of hyperbolic systems from continuous data. International Journal for Numerical Methods in Engineering, 125(6):e7406, 2024.
  • Weinan [2021] E Weinan. The dawning of a new era in applied mathematics. Notices of the American Mathematical Society, 68(4):565–571, 2021.
  • Brunton et al. [2020] S. L. Brunton, B. R. Noack, and P. Koumoutsakos. Machine Learning for Fluid Mechanics. Annu. Rev. Fluid Mech., 52:477–508, 2020.
  • Hughes et al. [2019] T. W. Hughes, I. A. D. Williamson, M. Minkov, and S. Fan. Wave physics as an analog recurrent neural network. Sci. Adv., 5:6946, 2019.
  • Sellier et al. [2019] J. M. Sellier, G. M. Caron, and J. Leygonie. Signed particles and neural networks, towards efficient simulations of quantum systems. J. Comput. Phys., 387:154–162, 2019.
  • Hernandez et al. [2020] Quercus Hernandez, Alberto Badias, David Gonzalez, Francisco Chinesta, and Elias Cueto. Structure-preserving neural networks. arXiv:2004.04653, 2020.
  • Teichert et al. [2019] G. H. Teichert, A. R. Natarajan, A. Van der Ven, and K. Garikipati. Machine learning materials physics: Integrable deep neural networks enable scale bridging by learning free energy functions. Comput. Methods Appl. Mech. Engrg., 353:201–216, 2019.
  • Regazzoni et al. [2019] F Regazzoni, L Dedé, and A Quarteroni. Machine learning for fast and reliable solution of time-dependent differential equations. J. Comput. Phys., 397:108852, 2019.
  • Raissi and Karniadakis [2018] M. Raissi and G. E. Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential equations. J. Comput. Phys., 357:125–141, 2018.
  • Sirignano and Spiliopoulos [2018] J. Sirignano and K. Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys., 375:686–707, 2018.
  • Raissi et al. [2019] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys., 378:686–707, 2019.
  • Greydanus et al. [2019] S. Greydanus, M. Dzamba, and J. Yosinski. Hamiltonian neural networks. In Conference on Neural Information Processing Systems, pages 15379–15389, 2019.
  • Chen et al. [2020] Z. Chen, J. Zhang, M. Arjovsky, and L. Bottou. Symplectic recurrent neural networks. In International Conference on Learning Representations, 2020.
  • DiPietro et al. [2020] D. DiPietro, S. Xiong, and B. Zhu. Sparse symplectically integrated neural networks. In Advances in Neural Information Processing Systems, 2020.
  • Sanchez-Gonzalez et al. [2019] A. Sanchez-Gonzalez, V. Bapst, K. Cranmer, and P. Battaglia. Hamiltonian graph networks with ODE integrators. arXiv:1909.12790, 2019.
  • Battaglia et al. [2016] P. Battaglia, R. Pascanu, M. Lai, and D. J. Rezende. Interaction networks for learning about objects, relations and physics. In Advances in Neural Information Processing Systems, pages 4502–4510, 2016.
  • Jin et al. [2020] P. Jin, A. Zhu, G. E. Karniadakis, and Y. Tang. Symplectic networks: intrinsic structure-preserving networks for identifying Hamiltonian systems. arXiv:2001.03750, 2020.
  • Toth et al. [2020] P. Toth, D. J. Rezende, A. Jaegle, S. Racaniére, A. Botev, and I. Higgins. Hamiltonian generative networks. In International Conference on Learning Representations, 2020.
  • Zhong et al. [2020] Y. D. Zhong, B. Dey, and A. Chakraborty. Symplectic ODE-Net: learning Hamiltonian dynamics with control. In International Conference on Learning Representations, 2020.
  • Yarotsky [2017] D. Yarotsky. Error bounds for approximations with deep ReLU networks. Neural Netw., 94:103–114, 2017.
  • Petersen and Voigtländer [2018] P. Petersen and F. Voigtländer. Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Netw., 170:296–330, 2018.
  • Imaizumi and Fukumizu [2019] M. Imaizumi and K. Fukumizu. Deep learning networks learn non-smooth functions effectively. In The Institute of Statistical Mathematics, pages 869–878. The 22nd International Conference on Artificial Intelligence and Statistics, 2019.
  • Suzuki [2019] T. Suzuki. Adaptivity of deep relu network for learning in besov and mixed smooth besov spaces: Optimal rate and curse of dimensionality. In The University of Tokyo. International Conference on Learning Representations, 2019.
  • Raissi et al. [2017] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Inferring solutions of differential equations using noisy multi-fidelity data. J. Comput. Phys., 335:736–746, 2017.
  • Hornik et al. [1989] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Netw., 2:359–366, 1989.
  • Zhang et al. [2019] D. Zhang, L. Guo, and G. E. Karniadakis. Learning in modal space: solving time-dependent stochastic PDEs using physics-informed neural networks. SIAM J. Sci. Comput., 42:A639–A665, 2019.
  • Michoski et al. [2019] C. Michoski, M. Milosavljevic, T. Oliver, and D. Hatch. Solving differential equations using deep neural networks. Neurocomputing, 399:193–212, 2019.
  • Mao et al. [2020] Z. Mao, A. D. Jagtap, and G. E. Karniadakis. Physics-informed neural networks for high-speed flows. Comput. Method. Appl. M., 360:112789, 2020.
  • Duraisamy et al. [2019] K. Duraisamy, G. Iaccarino, and H. Xiao. Turbulence modeling in the age of data. Annu. Rev. Fluid Mech., 51:357–377, 2019.
  • Xie et al. [2018] Y. Xie, E. Franz, M. Chu, and N. Thuerey. tempoGAN: A temporally coherent, volumetric GAN for super-resolution fluid flow. ACM Trans. Graph., 37(4):1–15, 2018.
  • Chu and Thuerey [2017] M. Chu and N. Thuerey. Data-driven synthesis of smoke flows with CNN-based feature descriptors. ACM Trans. Graph., 36(4):1–14, 2017.
  • Anderson et al. [1996] J. Anderson, I. Kevrekidis, and R. Rico-Martinez. A comparison of recurrent training algorithms for time series analysis and system identification. Comput. Chem. Eng., 20:S751–S756, 1996.
  • Crutchfield and McNamara [1987] J. P. Crutchfield and B. S. McNamara. Equations of motion from a data series. Complex Syst., 1:417–452, 1987.
  • Daniels and Nemenman [2015] B. C. Daniels and I. Nemenman. Automated adaptive inference of phenomenological dynamical models. Nat. Commun., 6(1):1–8, 2015.
  • Wang et al. [2017] J. Wang, J. Wu, and H. Xiao. Physics-informed machine learning approach for reconstructing Reynolds stress modeling discrepancies based on DNS data. Phys. Rev. Fluids, 2(3):034603, 2017.
  • Hammond et al. [2022] J. Hammond, F. Montomoli, M. Pietropaoli, R. D. Sandberg, and V. Michelassi. Machine learning for the development of data-driven turbulence closures in coolant systems. J. Turbomach., 144(8):081003, 2022.
  • Xu et al. [2022] X. Xu, F. Waschkowski, A. S. Ooi, and R. D. Sandberg. Towards robust and accurate Reynolds-averaged closures for natural convection via multi-objective CFD-driven machine learning. Int. J. Heat Mass Transf., 187:122557, 2022.
  • Mohan et al. [2020a] A. T. Mohan, N. Lubbers, D. Livescu, and M. Chertkov. Embedding hard physical constraints in convolutional neural networks for 3D turbulence. In International Conference on Learning Representations, 2020a.
  • Yang et al. [2019] X. Yang, S. Zafar, J. Wang, and H. Xiao. Predictive large-eddy-simulation wall modeling via physics-informed neural networks. Phys. Rev. Fluids, 4:034602, 2019.
  • Raissi et al. [2020] M. Raissi, A. Yazdani, and G. E. Karniadakis. Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science, 367(6481):1026–1030, 2020.
  • Belbute-Peres et al. [2020] F. Belbute-Peres, T. Economon, and Z. Kolter. Combining differentiable PDE solvers and graph neural networks for fluid flow prediction. In International Conference on Machine Learning, pages 2402–2411, 2020.
  • Lye et al. [2020] K. Lye, S. Mishra, and D. Ray. Deep learning observables in computational fluid dynamics. J. Comput. Phys., 410:109339, 2020.
  • White et al. [2019] C. White, D. Ushizima, and C. Farhat. Neural networks predict fluid dynamics solutions from tiny datasets. arXiv:1902.00091, 2019.
  • Mohan et al. [2020b] A. T. Mohan, N. Lubbers, D. Livescu, and M. Chertkov. Embedding hard physical constraints in neural network coarse-graining of 3D turbulence. arXiv:2002.00021, 2020b.
  • Kingma and Ba [2014] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
  • He et al. [2016] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  • Chen et al. [2018] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud. Neural ordinary differential equations. In Conference on Neural Information Processing Systems, pages 6571–6583, 2018.
  • Pontryagin [2018] L. S. Pontryagin. Mathematical Theory of Optimal Processes. Routledge, 2018.
  • Forest and Ruth [1990] E. Forest and R. D. Ruth. Fourth-order symplectic integration. Physica D, 43:105–117, 1990.
  • Yoshida [1990] H. Yoshida. Construction of higher order symplectic integrators. Phys. Lett. A, 150:262–268, 1990.
  • Candy and Rozmus [1991] J. Candy and W. Rozmus. A symplectic integration algorithm for separable Hamiltonian functions. J. Comput. Phys., 92:230–256, 1991.
  • Tao [2016] M. Tao. Explicit symplectic approximation of nonseparable Hamiltonians: Algorithm and long time performance. Phys. Rev. E, 94(4):043303, 2016.
  • Wu et al. [2015] J. Z. Wu, H. Y. Ma, and M. D. Zhou. Vortical Flows. Springer, 2015.
  • Evans [2010] L. C. Evans. Partial Differential Equations. American Mathematical Society, 2nd edition, 2010.
  • Roe [1981] P. L. Roe. Approximate Riemann solvers, parameter vectors and difference schemes. J. Comput. Phys., 43:357–372, 1981.
  • Helmholtz [1858] H. Helmholtz. Über Integrale der hydrodynamischen Gleichungen, welche den Wirbelbewegungen entsprechen. J. Reine Angew. Math., 55:25–55, 1858.
  • Yang and Pullin [2010] Y. Yang and D. I. Pullin. On Lagrangian and vortex-surface fields for flows with Taylor–Green and Kida–Pelz initial conditions. J. Fluid Mech., 661:446–481, 2010.
  • Xiong and Yang [2017] S. Xiong and Y. Yang. The boundary-constraint method for constructing vortex-surface fields. J. Comput. Phys., 339:31–45, 2017.
  • Hao et al. [2019] J. Hao, S. Xiong, and Y. Yang. Tracking vortex surfaces frozen in the virtual velocity in non-ideal flows. J. Fluid Mech., 863:513–544, 2019.
  • Cottet and Koumoutsakos [2000] G. H. Cottet and P. D. Koumoutsakos. Vortex Methods: Theory and Practice. Cambridge University Press, 2000.
  • Redmon et al. [2016] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
  • Lin et al. [2017] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 2980–2988, 2017.
  • Paszke et al. [2019] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8026–8037, 2019.
  • Glorot and Bengio [2010] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249–256, 2010.
  • Qu et al. [2019] Z. Qu, X. Zhang, M. Gao, C. Jiang, and B. Chen. Efficient and conservative fluids using bidirectional mapping. ACM Trans. Graph., 38:1–12, 2019.
  • Sod [1978] G. A. Sod. A survey of several finite difference methods for systems of nonlinear hyperbolic conservation laws. J. Comput. Phys., 27:1–31, 1978.
  • Lu et al. [2019] L. Lu, X. Meng, Z. Mao, and G. E. Karniadakis. DeepXDE: A deep learning library for solving differential equations. SIAM Rev., 63:208–228, 2019.
  • Rogallo [1981] R. S. Rogallo. Numerical experiments in homogeneous turbulence. Technical Report TM81315, NASA, 1981.
  • Xiong and Yang [2019] S. Xiong and Y. Yang. Construction of knotted vortex tubes with the writhe-dependent helicity. Phys. Fluids, 31:047101, 2019.
  • Xiong and Yang [2020] S. Xiong and Y. Yang. Effects of twist on the evolution of knotted magnetic flux tubes. J. Fluid Mech., 895:A28, 2020.
  • Yang et al. [2010] Y. Yang, D. I. Pullin, and I. Bermejo-Moreno. Multi-scale geometric analysis of Lagrangian structures in isotropic turbulence. J. Fluid Mech., 654:233–270, 2010.
  • Yang and Pullin [2011] Y. Yang and D. I. Pullin. Geometric study of Lagrangian and Eulerian structures in turbulent channel flow. J. Fluid Mech., 674:67–92, 2011.
  • Zhao et al. [2016] Y. Zhao, Y. Yang, and S. Chen. Evolution of material surfaces in the temporal transition in channel flow. J. Fluid Mech., 793:840–876, 2016.
  • Zheng et al. [2016] W. Zheng, Y. Yang, and S. Chen. Evolutionary geometry of Lagrangian structures in a transitional boundary layer. Phys. Fluids, 28:035110, 2016.
  • Zheng et al. [2019] W. Zheng, S. Ruan, Y. Yang, L. He, and S. Chen. Image-based modelling of the skin-friction coefficient in compressible boundary-layer transition. J. Fluid Mech., 875:1175–1203, 2019.
  • Lundgren [1982] T. S. Lundgren. Strained spiral vortex model for turbulent fine structure. Phys. Fluids, 25:2193–2203, 1982.
  • Lundgren [1993] T. S. Lundgren. A small-scale turbulence model. Phys. Fluids A, 5:1472, 1993.