Data-Driven Computing Methods for Nonlinear Physics Systems with Geometric Constraints
Yunjin Tong
Undergraduate Computer Science Thesis
Advised by
Professor Bo Zhu
Dartmouth College
Hanover, New Hampshire
June, 2024
Abstract
In a landscape where scientific discovery is increasingly driven by data, the integration of machine learning (ML) with traditional scientific methodologies has emerged as a transformative approach. This paper introduces a novel, data-driven framework that synergizes physics-based priors with advanced ML techniques to address the computational and practical limitations inherent in first-principle-based methods and brute-force machine learning methods. Our framework showcases four algorithms, each embedding a specific physics-based prior tailored to a particular class of nonlinear systems, including separable and nonseparable Hamiltonian systems, hyperbolic partial differential equations, and incompressible fluid dynamics. The intrinsic incorporation of physical laws preserves the system’s intrinsic symmetries and conservation laws, ensuring solutions are physically plausible and computationally efficient. The integration of these priors also enhances the expressive power of neural networks, enabling them to capture complex patterns typical in physical phenomena that conventional methods often miss. As a result, our models outperform existing data-driven techniques in terms of prediction accuracy, robustness, and predictive capability, particularly in recognizing features absent from the training set, despite relying on small datasets, short training periods, and small sample sizes.
Acknowledgements
I am deeply grateful for the generous financial support from Dartmouth Undergraduate Advising & Research, which provided me with the Presidential Scholarship, Sophomore and Junior Research Scholarships, the Leave Term Research Grant, and support through the Women in Science Project. Additionally, I wish to acknowledge the Neukom Scholarship program from Neukom Institute for Computational Science.
I extend my heartfelt thanks to my supervisor, Professor Bo Zhu, for his unwavering support and profound inspiration. I am especially grateful for the opportunities he offered me as a first-year student, which opened my career as a researcher. I wish him a prolific and successful career at Georgia Tech.
I also recognize the invaluable assistance of the team at the Dartmouth Visual Computing Lab. Special thanks to Dr. Shiying Xiong, now an Assistant Professor at Zhejiang University, for his extensive help in various aspects of my research. He is not only a super talented researcher in computational physics but also a remarkable coworker. Additional thanks go to Xingzhe He for his assistance with deep learning algorithms, among many others in the lab. Without their collective effort and collaboration, these works would not have been possible.
I also want to thank Professor Deeparnab Chakrabarty for his support and inspiration, as well as Professor Soroush Vosoughi and Professor Yaoqing Yang for being on my thesis committee. Additionally, I am grateful to the other professors and students who have taught and helped me at Dartmouth.
List of Publications
The following papers were published during the completion of my undergraduate studies and will be introduced in this thesis (listed in chronological order):
1. Tong, Y., Xiong, S., He, X., Pan, G., & Zhu, B. (2021). Symplectic Neural Networks in Taylor Series Form for Hamiltonian Systems. Journal of Computational Physics, 437, 110325.
2. Xiong, S., Tong, Y., He, X., Yang, S., Yang, C., & Zhu, B. (2021). Nonseparable Symplectic Neural Networks. In Proceedings of the International Conference on Learning Representations.
3. Xiong, S., He, X., Tong, Y., Deng, Y., & Zhu, B. (2023). Neural Vortex Method: from Finite Lagrangian Particles to Infinite Dimensional Eulerian Dynamics. Computers and Fluids, 258, 105811.
4. Tong, Y., Xiong, S., He, X., Yang, S., Wang, Z., Tao, R., Liu, R., & Zhu, B. (2024). RoeNet: Predicting Discontinuity of Hyperbolic Systems from Continuous Data. International Journal for Numerical Methods in Engineering, 125, e7406.
Author’s Contribution
The work presented in this thesis is the product of scientific collaboration. Here I detail my specific contributions to each project. For the project listed first [1], my responsibilities included the initial generation of research ideas, implementation of the methodologies, conducting experiments, and writing the research paper. For the remaining three projects [2, 3, 4], my primary roles involved conducting experiments and writing the respective papers. Additionally, for the fourth project [4], I was involved in idea generation and was responsible for the paper writing and revision.
Contents
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 4 Implementation
- 5 Results
- 6 Conclusion
1 Introduction
From the time of Newton, two principal paradigms have shaped the methodologies of scientific research: the Keplerian paradigm, or the data-driven approach, and the Newtonian paradigm, or the first-principle-based approach [5]. The first-principle-based approach is fundamental and elegant, but the dilemma we often face is its practicality. There are many time-dependent problems in science, where the equations of motion are too complex for full solution, either because the equations are not certain or because the computational cost is too high. Additionally, for a dynamic system governed by some unknown mechanics, it is challenging to identify governing equations by directly observing the system’s state, especially when such observation is partial and the sample data is sparse.
Now, the data-driven approach has become a very powerful tool with the advancement of statistical methods and machine learning (ML). This approach enables us to handle physical systems by statistically exploring their underlying structures. Data-driven approaches have proven their efficacy in uncovering the underlying governing equations of a variety of physical systems, ranging from fluid mechanics [6] and wave physics [7] to quantum physics [8], thermodynamics [9], and materials science [10]. Moreover, various ML methods have significantly advanced the numerical simulation of complex and high-dimensional dynamical systems. These methods integrate learning paradigms with simulation infrastructures, enhancing the modeling of ordinary differential equations [11], linear and nonlinear partial differential equations [4, 12], high-dimensional partial differential equations [13], and inverse problems [14], among others.
Despite these advancements, data-driven methods like neural networks, which exhibit remarkable generalization abilities across various fields, face significant challenges. These methods require large, clean datasets and depend heavily on complex, black-box network structures that are highly sensitive to input variations. Additionally, brute-force machine learning with conventional toolkits such as deep neural networks often struggles with the high dimensionality of input-output spaces, the cost of data acquisition, the production of physically implausible results, and the inability to handle extrapolation robustly. These factors make it difficult to predict long-term dynamical behaviors accurately.
To address these challenges, we introduce a novel, data-driven framework designed to make accurate, long-term predictions in a computationally efficient manner. The key innovation lies in incorporating physics-based priors into the learning algorithms so that the physics structure of the underlying system is intrinsically preserved. As a result, our models outperform other state-of-the-art data-driven methods in terms of prediction accuracy, robustness, and predictive capability, particularly in recognizing features absent from the training set. This superior performance is achieved despite relying on smaller datasets, shorter training periods, and limited sample sizes. At the same time, our models are significantly more computationally efficient than traditional first-principles-based methods, while achieving a similar level of accuracy.
This thesis details four algorithms we have developed over time, each incorporating a distinct physics-based prior relevant to a specific type of nonlinear system. The algorithm names, the associated physics priors, and the systems they address are as follows:
1. Symplectic Taylor Neural Networks (Taylor-nets): The symplectic structure in separable Hamiltonian systems [1],
2. Nonseparable Symplectic Neural Networks (NSSNNs): The symplectic structure in nonseparable Hamiltonian systems [2],
3. Roe Neural Networks (RoeNet): Hyperbolic Conservation Law in hyperbolic partial differential equations (PDEs) [4],
4. Neural Vortex Method (NVM): Helmholtz’s Theorems in incompressible fluid dynamics [3].
Overall, the key advantages and contributions of our methodologies are as follows:
- Preservation of Intrinsic Symmetries and Conservation Laws: Our methodologies integrate physics-based priors within the learning algorithms, which significantly narrows the solution space. This reduction not only streamlines the computational demands but also preserves the mathematical symmetries and physical conservation laws inherent in the systems being modeled. Such an approach ensures that the generated solutions are not only efficient but also robust and aligned with physical reality, enhancing both the reliability and validity of the predictions.
- Enhanced Expressive Power of Neural Networks: By embedding physics-based structures into our models, we expand the network’s capacity to capture and reproduce complex patterns that are typical in solutions to physical phenomena. Conventional deep neural networks often struggle to identify such patterns when they are not represented within the training dataset. Our approach supports generalized solutions to PDEs and expands the solution space, allowing for a more comprehensive encapsulation of the potential physical behaviors, significantly improving the model’s applicability and predictive accuracy.
The thesis will be organized into several key sections: an introductory section and a related work section that outline the research background; a methodology section that elaborates on the mathematical foundations, including an introduction to the supervised learning and numerical methods we used to develop our methodologies; an implementation section that details the algorithm design and proofs of four methodologies respectively; and a results section that summarizes the key implementation and experimental findings. The paper will conclude with a discussion on the implications of the results and potential avenues for future research.
| | Taylor-nets | NSSNNs | RoeNet | NVM |
|---|---|---|---|---|
| Physics System | Separable Hamiltonians | Nonseparable Hamiltonians | Hyperbolic PDEs | Incompressible Fluid Dynamics |
| Prior Embedded | Symplectic Structure | Symplectic Structure | Hyperbolic Conservation Law | Helmholtz’s Theorems |
| Solver | Separable Symplectic Integrator | Nonseparable Symplectic Integrator | Roe Solver | Lagrangian Vortex Method |
| Key Advantages | Accurately approximate the continuous-time evolution over a long term | Accurately approximate the continuous-time evolution over a long term | Predict future discontinuities with short-term continuous data | Reconstruct continuous vortex dynamics with a small number of vortex particles |
Table 1 summarizes the key concepts related to the four methods, including the specific physics systems they model, the type of physical principles or priors they embed, the integrative techniques they employ, and the primary advantages each method offers. These comparative insights provide an at-a-glance understanding of the distinct capabilities and applications of each method. The details will be addressed comprehensively in Section 3.2 and Section 4.
2 Related Work
Neural Networks for Hamiltonian Systems.
Greydanus et al. introduce Hamiltonian Neural Networks (HNNs) to preserve the Hamiltonian energy of systems by reformulating the loss function [15]. Inspired by HNNs, a series of methods that intrinsically embed a symplectic integrator into the recurrent neural network architecture were proposed, including SRNN [16], and SSINN [17]. Methods like HNN face two primary challenges: they require the temporal derivatives of system momentum and position to compute the loss function, which are hard to obtain from real-world systems, and they do not strictly preserve the symplectic structure as their symplectomorphism is governed by the loss function. Our model, Taylor-net [1], addresses these limitations by integrating a solver into the network architecture to avoid the need for time derivatives and by embedding a symmetrical structure directly within the neural networks, rather than adjusting the loss function. Moreover, these methods have been extended, via combination with graph networks [18, 19], to address large-scale N-body problems where interactions are driven by forces between particle pairs.
While the above methods are all designed to solve separable Hamiltonian systems, Jin et al. proposed SympNet, which constructs symplectic mappings of system variables across neighboring time steps to handle both separable and nonseparable Hamiltonian systems [20]. However, the parameter scalability of SympNet, growing quadratically with the system size $N$, poses challenges for application to high-dimensional N-body problems. Our model, NSSNN, addresses these issues with a novel network architecture tailored for nonseparable systems, which significantly reduces the complexity of parameter scaling [2]. Additionally, Hamiltonian-based neural networks have been adapted for broader applications. Toth et al. developed the Hamiltonian Generative Network (HGN) to infer Hamiltonian dynamics from high-dimensional observations, such as image data [21]. Furthermore, Zhong et al. introduced Symplectic ODE-Net (SymODEN), which incorporates an external control term into the standard Hamiltonian framework, enhancing the model’s applicability to controlled dynamical systems [22].
Neural Networks for Discontinuous Functions.
The use of deep learning networks to approximate discontinuous functions is well-supported theoretically, as highlighted in various studies on Hölder spaces [23], piecewise smooth functions [24], linear estimators [25], and highly adaptive, spatially anisotropic target functions [26]. Building on these foundations, Physics-Informed Neural Networks (PINNs) were introduced by Raissi et al. as a data-driven approach to solving nonlinear problems [27], leveraging the well-known capability of deep neural networks to act as universal function approximators [28]. Among their key attributes, PINNs ensure the preservation of symmetry, invariance, and conservation principles that are inherent in the physical laws governing the observed data [29]. Michoski et al. demonstrated that PINNs could capture irregular solutions to PDEs without the need for any regularization [30]. Additionally, Mao et al. utilized PINNs to approximate solutions for high-speed flows by integrating the Euler equations with initial and boundary conditions into the loss function [31]. However, while these studies demonstrate the robust capabilities of PINNs, they often do not address extrapolation beyond the training set, a critical aspect for ensuring the generalizability of the models to a wider range of scenarios.
Neural Networks for Fluid Dynamics.
Recent advancements in fluid dynamics analysis have increasingly leveraged data-driven approaches powered by machine learning [32, 33, 34]. Recognizing the limitations of traditional brute-force machine learning methods, current research efforts are increasingly focused on integrating physical priors into learning algorithms, aiming to equip neural networks with a foundational understanding of physical laws, rather than approaching the data naively [35, 36, 37, 38, 39, 40]. Significant efforts have been made to encode these physical constraints efficiently, such as incorporating the Navier-Stokes (NS) equations [12], modeling incompressibility constraints [41], and mapping dynamics of wave phenomena onto recurrent neural network computations [7]. Moreover, understanding complex fluid dynamics through machine learning involves embedding the structure of partial differential equations (PDEs) within neural network architectures [42, 43, 44, 45, 46, 47]. Ideally, these machine learning models designed to solve PDEs should be able to evolve the flow fields independently, obtaining initial-condition invariance without the need for a specific solver. However, the high dimensionality of the problems and insufficient supervisory data continue to pose significant challenges.
3 Methodology
3.1 Supervised learning
We used supervised learning for all of our models. Supervised learning is a subset of machine learning where an algorithm learns a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples. Each example is a pair consisting of an input object and a desired output value. The supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. The sequential steps involved in developing a supervised learning model, from determining the type of training dataset to evaluating the model’s accuracy, are:
1. Determine the Type of Training Dataset: Identify whether the problem is a classification or a regression problem to select the appropriate type of training dataset.
2. Collect/Gather the Labelled Training Data: Assemble a dataset where each instance is tagged with the correct answer or outcome.
3. Split the Training Dataset: Divide the dataset into three parts:
   - Training dataset: used to train the model.
   - Test dataset: used to test the model’s predictions.
   - Validation dataset: used to tune the model’s hyperparameters.
4. Determine the Input Features: Select the features of the training dataset that contain sufficient information for the model to accurately predict the output.
5. Determine the Suitable Algorithm: Choose an appropriate algorithm for the model based on the problem type.
6. Execute the Algorithm on the Training Dataset: Train the model using the selected algorithm on the training dataset. Utilize the validation set to adjust control parameters as needed.
7. Evaluate the Model’s Accuracy: Test the model using the test dataset to assess its accuracy. A model whose predictions closely match the held-out outputs has high accuracy.
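The splitting step in the list above can be sketched in a few lines of Python. This is only an illustration under assumed settings: the function name `split_dataset`, the 70/15/15 fractions, and the toy data `y = x**2` are hypothetical choices, not values taken from the thesis.

```python
import random


def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle labelled samples and split them into train/validation/test lists."""
    if not 0.0 < train_frac + val_frac < 1.0:
        raise ValueError("train_frac + val_frac must lie strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = int(round(len(shuffled) * train_frac))
    n_val = int(round(len(shuffled) * val_frac))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held out for testing
    return train, val, test


# Toy labelled data (hypothetical): inputs x paired with targets y = x**2.
data = [(x, x * x) for x in range(100)]
train_set, val_set, test_set = split_dataset(data)
```

Shuffling before splitting is the usual choice for independent samples; for time series such as trajectories, a chronological split may be more appropriate.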
3.1.1 Optimizer
In the context of neural networks, optimizers are crucial for minimizing the loss function, i.e., the difference between the actual and predicted outputs. One of the popular optimizers is the Adam optimizer [48], which combines the advantages of two other extensions of stochastic gradient descent, namely Adaptive Gradient Algorithm and Root Mean Square Propagation. The Adam optimizer’s update equations are given by:
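$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2, \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \\
\theta_{t+1} &= \theta_t - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},
\end{aligned}
$$
where $\theta$ represents the parameters of the model, $g_t$ is the gradient of the loss function with respect to the parameters at timestep $t$, and $m_t$ and $v_t$ are estimates of the first and the second moments of the gradients, respectively, with $\hat{m}_t$ and $\hat{v}_t$ their bias-corrected versions. $\alpha$ is the learning rate, and $\beta_1$, $\beta_2$, and $\epsilon$ are hyperparameters.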
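As a minimal sketch of the standard Adam update from [48], the following pure-Python function applies one step to a list of scalar parameters. The function name `adam_step`, the default hyperparameter values, and the quadratic test objective are assumptions made for illustration only.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of scalar parameters.

    Returns the updated (theta, m, v); t is the 1-based step counter.
    """
    new_theta, new_m, new_v = [], [], []
    for th, g, m_i, v_i in zip(theta, grad, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g          # first-moment estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m_i / (1 - beta1 ** t)               # bias correction
        v_hat = v_i / (1 - beta2 ** t)
        new_theta.append(th - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


# Hypothetical objective: minimize f(theta) = sum(theta_i ** 2), gradient 2 * theta_i.
theta = [1.0, -2.0]
m = [0.0, 0.0]
v = [0.0, 0.0]
for t in range(1, 2001):
    grad = [2.0 * th for th in theta]
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
final_loss = sum(th * th for th in theta)
```

Because of the bias correction, the very first step has magnitude close to the learning rate regardless of the gradient scale.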
3.1.2 Loss Functions
The choice of loss function is pivotal in guiding the training of the model towards its objective. In our methods, we use several common loss functions in supervised learning, including:
L1 Loss (Absolute Loss)
Defined as $L_1(y, \hat{y}) = |y - \hat{y}|$, where $y$ is the true value and $\hat{y}$ is the predicted value.
L2 Loss (Squared Loss)
Given by $L_2(y, \hat{y}) = (y - \hat{y})^2$. This loss function is sensitive to outliers as it squares the differences, hence penalizing larger errors more.
Cross-Entropy Loss
The Cross-Entropy Loss is widely used in classification tasks to measure the performance of a classification model whose output is a probability value between 0 and 1. The Cross-Entropy Loss formula is given by:
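$$
L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right],
$$
where $L$ is the loss function, $N$ is the number of samples, $y_i$ is the actual label of the $i$-th sample, and $p_i$ is the predicted probability for the $i$-th sample.
For multi-class classification, the generalized formula is:
$$
L = -\sum_{c=1}^{M} y_{o,c} \log(p_{o,c}),
$$
where $M$ is the number of classes, $y_{o,c}$ indicates whether class $c$ is the correct classification for observation $o$, and $p_{o,c}$ is the predicted probability that observation $o$ is of class $c$.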
Focal Loss
Focal Loss is an adapted version of Cross-Entropy Loss, which addresses the problem of class imbalance by focusing more on hard-to-classify examples. It is particularly useful in scenarios where there is a large class imbalance. The formula for Focal Loss is given by:
$$
FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t),
$$
where $\alpha_t$ is a weighting factor for the class to counteract class imbalance, $\gamma$ is a focusing parameter that adjusts the rate at which easy examples are down-weighted, $p_t$ is the predicted probability of the class with label $t$ (the true class), and $(1 - p_t)^{\gamma}$ reduces the loss for well-classified examples, putting more focus on hard, misclassified examples. Focal Loss is particularly useful for training on datasets where some classes are much more frequent than others, helping to improve the robustness and performance of classification models in imbalanced datasets.
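The four loss functions above can be written directly from their standard definitions. The sketch below is illustrative only: the function names and the default values `alpha_t=0.25` and `gamma=2.0` are assumptions, not settings reported in the thesis.

```python
import math


def l1_loss(y, y_hat):
    """Absolute error between a true value and a prediction."""
    return abs(y - y_hat)


def l2_loss(y, y_hat):
    """Squared error; grows quadratically, so outliers dominate."""
    return (y - y_hat) ** 2


def binary_cross_entropy(labels, probs):
    """Mean binary cross-entropy over samples; labels are 0 or 1."""
    n = len(labels)
    total = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, probs))
    return -total / n


def categorical_cross_entropy(one_hot, probs):
    """Cross-entropy of one observation given a one-hot label vector."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)


def focal_loss(p_t, alpha_t=0.25, gamma=2.0):
    """Focal loss for the predicted probability p_t of the true class."""
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9)   # well-classified example: strongly down-weighted
hard = focal_loss(0.1)   # misclassified example: dominates the loss
```

With `gamma=0` and `alpha_t=1`, the focal loss reduces to the ordinary cross-entropy term, which is a quick sanity check on the implementation.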
3.1.3 Activation Functions
Activation functions are non-linear functions applied to the output of a neuron in a neural network. They decide whether a neuron should be activated or not, helping the network learn complex patterns in the data.
ReLU
One of the most popular activation functions is the Rectified Linear Unit (ReLU). It is defined as:
$$
f(x) = \max(0, x),
$$
where $x$ is the input to the neuron. ReLU is favored for its simplicity and efficiency, promoting faster convergence in training due to its piecewise-linear, non-saturating form.
Next, we will outline the various models that were employed in the development of our model.
3.1.4 Residual Networks (ResNets) and Residual Blocks (ResBlocks)
ResNets are designed to enable training of very deep neural networks through the introduction of Residual Blocks (ResBlocks), which use skip connections or shortcuts to jump over some layers [49]. ResNets have been shown in numerous studies to be a neural network architecture highly suitable for deep learning and computer vision. They offer distinctive advantages in mitigating problems such as vanishing gradients during network training.
ResBlocks
A Residual Block allows the gradient to flow through the network directly, without passing through non-linear activations, by using skip connections. This is mathematically represented as:
$$
y = \mathcal{F}(x, \{W_i\}) + x,
$$
where $x$ is the input to the ResBlock, $\mathcal{F}(x, \{W_i\})$ represents the residual mapping to be learned by the layers of the ResBlock, and $y$ is the output of the ResBlock. The addition operation is element-wise, allowing the network to learn identity mappings efficiently, which is crucial for training deep networks.
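A residual block with a ReLU activation can be sketched without any deep learning library. The two-layer form of the residual mapping used here, `F(x) = W2 relu(W1 x + b1) + b2`, is an assumption chosen for illustration; the helper names `relu`, `linear`, and `res_block` are hypothetical.

```python
def relu(x):
    """Element-wise ReLU, max(0, x)."""
    return [max(0.0, v) for v in x]


def linear(weights, bias, x):
    """Dense layer: weights is a list of rows, returns W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def res_block(x, w1, b1, w2, b2):
    """Residual block y = F(x) + x with F(x) = W2 relu(W1 x + b1) + b2."""
    fx = linear(w2, b2, relu(linear(w1, b1, x)))
    return [f + xi for f, xi in zip(fx, x)]  # element-wise skip connection


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros_matrix = [[0.0, 0.0], [0.0, 0.0]]
zeros_vector = [0.0, 0.0]
x = [1.0, -2.0]

# With zero weights the residual branch vanishes and the block is the identity map.
y_identity = res_block(x, zeros_matrix, zeros_vector, zeros_matrix, zeros_vector)
# With identity weights the residual branch contributes relu(x).
y_relu = res_block(x, identity, zeros_vector, identity, zeros_vector)
```

The zero-weight case shows why identity mappings are easy to represent: the block only has to drive the residual branch to zero.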
3.1.5 Neural Ordinary Differential Equations (Neural ODEs) and the Adjoint Method
Neural ODEs are a class of models that represent the continuous dynamics of hidden states using differential equations [50]. Unlike traditional neural networks that apply a discrete sequence of transformations, Neural ODEs model the derivative of the hidden state as a continuous transformation:
$$
\frac{dh(t)}{dt} = f(h(t), t, \theta), \tag{1}
$$
where $h(t)$ is the hidden state at time $t$, and $f$ is a neural network parameterized by $\theta$ defining the time derivative of the hidden state, making the model capable of learning continuous-time dynamics.
The key idea is that, viewing a neural network as a dynamical system, we can treat the chain of residual blocks in a neural network as the solution of an ordinary differential equation (ODE) discretized with the Euler method. Given a residual network that consists of a sequence of transformations
$$
h_{t+1} = h_t + f(h_t, \theta_t), \tag{2}
$$
the idea is to parameterize the continuous dynamics using the ODE specified by the neural network in (1).
In a Neural ODE framework, the evolution of the hidden state is governed by an ODE parameterized by a neural network:
$$
\frac{dh(t)}{dt} = f(h(t), t, \theta), \tag{3}
$$
where $t$ is time, $\theta$ represents the parameters of the neural network, and $f$ is a function approximated by the neural network defining the dynamics of $h$.
To optimize Neural ODEs, the adjoint method is utilized, providing an efficient means for calculating gradients with respect to the parameters during backpropagation [51]. Rather than differentiating through the ODE solver, we solve the adjoint ODE defined as:
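$$
\frac{da(t)}{dt} = -a(t)^{\top} \frac{\partial f(h(t), t, \theta)}{\partial h}, \tag{4}
$$
where $a(t) = \partial L / \partial h(t)$ is the gradient of the loss function $L$ with respect to the hidden state.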
The gradient of the loss with respect to the parameters is then obtained by integrating:
$$
\frac{dL}{d\theta} = -\int_{t_1}^{t_0} a(t)^{\top} \frac{\partial f(h(t), t, \theta)}{\partial \theta}\, dt, \tag{5}
$$
over the interval from $t_1$ to $t_0$, the duration of the forward pass. The adjoint state is initialized at the end of the forward pass and integrated backward in time to obtain the necessary gradients for parameter updates.
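To make the backward pass concrete, here is a small worked example on a scalar ODE where everything can be checked analytically. The setup is an assumption made for illustration: dynamics $f(h, t, \theta) = \theta h$, loss $L = h(t_1)$, and an RK4 integrator standing in for the ODE solver. For this choice the exact gradient is $dL/d\theta = (t_1 - t_0)\, h_0\, e^{\theta (t_1 - t_0)}$.

```python
import math


def rk4_step(f, t, y, dt):
    """One classical RK4 step for a list-valued state y."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + dt, [yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def integrate(f, y0, t_start, t_end, n_steps):
    """Integrate from t_start to t_end; dt is negative when going backward."""
    dt = (t_end - t_start) / n_steps
    t, y = t_start, list(y0)
    for _ in range(n_steps):
        y = rk4_step(f, t, y, dt)
        t += dt
    return y


theta, h0, t0, t1 = 0.5, 2.0, 0.0, 1.0

# Forward pass: dh/dt = theta * h, so L = h(t1).
h1 = integrate(lambda t, y: [theta * y[0]], [h0], t0, t1, 100)[0]


def augmented(t, state):
    """Backward dynamics of (h, a, g): the state, the adjoint, and dL/dtheta."""
    h, a, g = state
    return [theta * h,      # dh/dt = f
            -a * theta,     # da/dt = -a * df/dh
            -a * h]         # dg/dt = -a * df/dtheta


# a(t1) = dL/dh(t1) = 1, gradient accumulator starts at 0.
h_back, a0, grad_theta = integrate(augmented, [h1, 1.0, 0.0], t1, t0, 100)
exact_grad = (t1 - t0) * h0 * math.exp(theta * (t1 - t0))
```

The backward solve recovers the initial state `h_back`, the sensitivity `a0` of the loss to the initial condition, and the parameter gradient in a single pass, without storing the forward trajectory.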
3.2 Numerical Methods
First, we present four methods for solving ordinary differential equations (ODEs), which include the Euler method, Runge-Kutta method, Symplectic Integrator, and Non-separable Symplectic Integrator.
3.2.1 Euler Method
The Euler method represents one of the most straightforward numerical strategies for approximating solutions to ODEs. As a first-order numerical method, it provides an initial approach for solving initial value problems defined by $\frac{dy}{dt} = f(t, y)$ with the initial condition $y(t_0) = y_0$. Despite its simplicity, the Euler method is fundamental in the introduction to more sophisticated numerical methods for differential equations.
This method calculates the next state vector $y_{n+1}$ by proceeding in the direction of the derivative $f(t_n, y_n)$, scaled by the timestep $\Delta t$. The updated state at time $t_{n+1} = t_n + \Delta t$ is given by:
$$
y_{n+1} = y_n + \Delta t\, f(t_n, y_n).
$$
As a consequence of its first-order accuracy, the local truncation error for the Euler method is of the order $O(\Delta t^2)$, while the global error is of the order $O(\Delta t)$. This relatively large error suggests that while the Euler method can be beneficial for straightforward problems and educational purposes, it may not be the best choice for scenarios that demand high precision over extended durations.
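The first-order global error can be observed numerically. The sketch below uses a hypothetical test problem, $dy/dt = -y$ with $y(0) = 1$, whose exact solution at $t = 1$ is $e^{-1}$; halving the timestep should roughly halve the error.

```python
import math


def euler(f, t0, y0, dt, n_steps):
    """Explicit Euler: advance y by dt * f(t, y) for n_steps steps."""
    t, y = t0, y0
    for _ in range(n_steps):
        y = y + dt * f(t, y)
        t += dt
    return y


def decay(t, y):
    """Right-hand side of the test problem dy/dt = -y."""
    return -y


exact = math.exp(-1.0)
err_coarse = abs(euler(decay, 0.0, 1.0, 0.1, 10) - exact)
err_fine = abs(euler(decay, 0.0, 1.0, 0.05, 20) - exact)
error_ratio = err_coarse / err_fine  # close to 2 for a first-order method
```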
3.2.2 Runge-Kutta Method
The Runge-Kutta methods are a prominent family of iterative techniques for the numerical resolution of ODEs. The fourth-order Runge-Kutta method, commonly referred to as RK4, is particularly renowned for its balance between computational efficiency and accuracy. This method is applied to approximate the solution of an initial value problem defined by the ODE $\frac{dy}{dt} = f(t, y)$ with the initial condition $y(t_0) = y_0$.
RK4 progresses the solution by computing a weighted average of four increments, where each increment evaluates the derivative at various points within the timestep $\Delta t$. The solution at a subsequent time $t_{n+1} = t_n + \Delta t$ is determined using the formula:
$$
y_{n+1} = y_n + \frac{\Delta t}{6} \left( k_1 + 2 k_2 + 2 k_3 + k_4 \right),
$$
with the increments given by:
$$
\begin{aligned}
k_1 &= f(t_n, y_n), \\
k_2 &= f\!\left(t_n + \tfrac{\Delta t}{2},\; y_n + \tfrac{\Delta t}{2} k_1\right), \\
k_3 &= f\!\left(t_n + \tfrac{\Delta t}{2},\; y_n + \tfrac{\Delta t}{2} k_2\right), \\
k_4 &= f\!\left(t_n + \Delta t,\; y_n + \Delta t\, k_3\right).
\end{aligned}
$$
As a fourth-order method, the RK4 achieves a local truncation error of the order $O(\Delta t^5)$ and a global error of the order $O(\Delta t^4)$. This substantial accuracy renders the RK4 method highly effective for a broad spectrum of applications, offering an excellent trade-off between the computational demands and the precision of the solution.
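The fourth-order convergence can be checked the same way as for the Euler method. The test problem $dy/dt = -y$, $y(0) = 1$ is again a hypothetical choice; halving the timestep should reduce the error at $t = 1$ by a factor of roughly 16.

```python
import math


def rk4(f, t0, y0, dt, n_steps):
    """Classical fourth-order Runge-Kutta for a scalar ODE dy/dt = f(t, y)."""
    t, y = t0, y0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt / 2 * k1)
        k3 = f(t + dt / 2, y + dt / 2 * k2)
        k4 = f(t + dt, y + dt * k3)
        y = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)  # weighted average of increments
        t += dt
    return y


def decay(t, y):
    """Right-hand side of the test problem dy/dt = -y."""
    return -y


exact = math.exp(-1.0)
err_coarse = abs(rk4(decay, 0.0, 1.0, 0.1, 10) - exact)
err_fine = abs(rk4(decay, 0.0, 1.0, 0.05, 20) - exact)
error_ratio = err_coarse / err_fine  # close to 2**4 = 16 for a fourth-order method
```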
3.2.3 Separable Symplectic Integrator
Symplectic integrators are a class of numerical integration schemes specifically designed for simulating Hamiltonian systems.
A Hamiltonian system is characterized by $N$ pairs of canonical coordinates, denoted by generalized positions $q$ and generalized momenta $p$. The evolution of these coordinates over time is governed by Hamilton’s equations, expressed as
$$
\frac{dq}{dt} = \frac{\partial H}{\partial p}, \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial q}, \tag{6}
$$
with the initial condition
$$
(q(t_0), p(t_0)) = (q_0, p_0). \tag{7}
$$
In a general setting, $q = (q_1, q_2, \ldots, q_N)$ represents the positions and $p = (p_1, p_2, \ldots, p_N)$ denotes their momenta. The function $H = H(q, p)$ is the Hamiltonian, which corresponds to the total energy of the system.
In a separable Hamiltonian system, the Hamiltonian $H$ can be split into a kinetic energy part $T(p)$ and a potential energy part $V(q)$. Consequently, the Hamiltonian of a separable Hamiltonian system can be expressed in this form:
$$
H(q, p) = T(p) + V(q). \tag{8}
$$
Symplectic integrators are distinguished by their ability to preserve the symplectic structure of phase space, an essential property for ensuring the long-term stability and accuracy of the simulation. By conserving quantities analogous to energy, these methods avoid the numerical dissipation typical of other numerical schemes, making them particularly well-suited for simulating dynamical systems over extended periods.
The specific symplectic integrator we use is the fourth-order symplectic integrator described by Forest and Ruth [52] and Yoshida [53]. It applies a sequence of operations that integrate the system’s equations of motion over a timestep $\Delta t$ while preserving the symplectic geometry of phase space, which is crucial for accurately simulating the long-term behavior of Hamiltonian systems. The integrator is specifically designed for separable Hamiltonian systems of the form shown in equation (8). The procedure is as follows:
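1. Initialize with $(q_0, p_0)$ at $t_0$.
2. For each time step $\Delta t$, update $(q, p)$ through the following sequence of operations. Starting from the current state $(q^{0}, p^{0}) = (q, p)$, for each sub-step $i$ from 1 to 4, execute the following updates:
   - Update momentum by a fraction of the time step:
     $$
     p^{i} = p^{i-1} - c_i\, \Delta t\, \frac{\partial V}{\partial q}\!\left(q^{i-1}\right). \tag{9}
     $$
   - Update position by a fraction of the time step:
     $$
     q^{i} = q^{i-1} + d_i\, \Delta t\, \frac{\partial T}{\partial p}\!\left(p^{i}\right). \tag{10}
     $$

The coefficients $c_i$ and $d_i$ are chosen to eliminate lower-order error terms, ensuring fourth-order accuracy. These coefficients are typically defined as [52, 53, 54]:
$$
\begin{aligned}
c_1 &= c_4 = \frac{1}{2\,(2 - 2^{1/3})}, & c_2 &= c_3 = \frac{1 - 2^{1/3}}{2\,(2 - 2^{1/3})}, \\
d_1 &= d_3 = \frac{1}{2 - 2^{1/3}}, & d_2 &= -\frac{2^{1/3}}{2 - 2^{1/3}}, \qquad d_4 = 0.
\end{aligned} \tag{11}
$$
Repeat these steps for each time step $\Delta t$, iteratively advancing the system from $(q_0, p_0)$ at $t_0$ to $(q_n, p_n)$ at $t_n = t_0 + n \Delta t$, where $n$ is the number of time steps.
The fourth-order symplectic integrator is characterized by its fourth-order accuracy in the numerical simulation of Hamiltonian systems. This indicates that the local truncation error of the method is of the order $O(\Delta t^5)$, implying that the error introduced in a single timestep decreases as the fifth power of the timestep size. Consequently, the global error, or the cumulative error over a fixed interval of time, is of the order $O(\Delta t^4)$. Such high-order accuracy is especially beneficial for simulations requiring long-term stability and precision, as it permits the use of relatively large timestep sizes while maintaining a low overall numerical error.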
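The following sketch applies a fourth-order symplectic step to a harmonic oscillator, $H = p^2/2 + q^2/2$, chosen here as a hypothetical test case because its exact solution is $q(t) = \cos t$, $p(t) = -\sin t$ for $(q_0, p_0) = (1, 0)$. It assumes the common Forest-Ruth/Yoshida coefficient values, with each sub-step updating momentum first and then position; the pairing of coefficients with sub-steps is an assumption of this sketch.

```python
import math

CBRT2 = 2.0 ** (1.0 / 3.0)
# Momentum (kick) coefficients and position (drift) coefficients; each set sums to 1.
C = [1 / (2 * (2 - CBRT2)), (1 - CBRT2) / (2 * (2 - CBRT2)),
     (1 - CBRT2) / (2 * (2 - CBRT2)), 1 / (2 * (2 - CBRT2))]
D = [1 / (2 - CBRT2), -CBRT2 / (2 - CBRT2), 1 / (2 - CBRT2), 0.0]


def symplectic4_step(q, p, dt, dT_dp, dV_dq):
    """One fourth-order symplectic step for a separable H = T(p) + V(q)."""
    for c, d in zip(C, D):
        p = p - c * dt * dV_dq(q)   # momentum update with the current position
        q = q + d * dt * dT_dp(p)   # position update with the new momentum
    return q, p


def simulate(dt, n_steps, q=1.0, p=0.0):
    """Harmonic oscillator: dT/dp = p and dV/dq = q."""
    for _ in range(n_steps):
        q, p = symplectic4_step(q, p, dt, lambda mom: mom, lambda pos: pos)
    return q, p


def error_at_t1(dt, n_steps):
    q, p = simulate(dt, n_steps)
    return math.hypot(q - math.cos(1.0), p + math.sin(1.0))


error_ratio = error_at_t1(0.02, 50) / error_at_t1(0.01, 100)  # near 16
q_long, p_long = simulate(0.1, 1000)
energy_drift = abs(0.5 * (q_long ** 2 + p_long ** 2) - 0.5)
```

The error ratio probes the fourth-order convergence, while the small energy drift after 1000 steps at a fairly large timestep illustrates the long-term stability discussed above.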
3.2.4 Nonseparable Symplectic Integrator
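Given a Hamiltonian system described in (6) with initial condition (7), we now consider a more general case: an arbitrary Hamiltonian system, separable or nonseparable. In the original research of [55] in computational physics, a generic, high-order, explicit, and symplectic time integrator was proposed to solve (6) for an arbitrary separable or nonseparable Hamiltonian $H$. This is implemented by considering an augmented Hamiltonian
$$
\bar{H}(q, p, x, y) := H_A + H_B + \omega H_C \tag{12}
$$
with
$$
H_A = H(q, y), \qquad H_B = H(x, p), \qquad H_C = \frac{1}{2} \left( \|q - x\|_2^2 + \|p - y\|_2^2 \right) \tag{13}
$$
in an extended phase space with symplectic two form $dq \wedge dp + dx \wedge dy$, where $\omega$ is a constant that controls the binding of the original system and the artificial restraint.
Notice that the Hamilton’s equations for $\bar{H}$
$$
\begin{aligned}
\frac{dq}{dt} &= \frac{\partial \bar{H}}{\partial p} = \frac{\partial H(x, p)}{\partial p} + \omega (p - y), \\
\frac{dp}{dt} &= -\frac{\partial \bar{H}}{\partial q} = -\frac{\partial H(q, y)}{\partial q} - \omega (q - x), \\
\frac{dx}{dt} &= \frac{\partial \bar{H}}{\partial y} = \frac{\partial H(q, y)}{\partial y} - \omega (p - y), \\
\frac{dy}{dt} &= -\frac{\partial \bar{H}}{\partial x} = -\frac{\partial H(x, p)}{\partial x} + \omega (q - x),
\end{aligned} \tag{14}
$$
with the initial condition $(q, p, x, y)|_{t = t_0} = (q_0, p_0, q_0, p_0)$ have the same exact solution as (6) in the sense that $(q, p, x, y) = (q, p, q, p)$. Hence, we can get the solution of (6) by solving (14). The coefficient $\omega$ acts as a regularizer, which stabilizes the numerical results.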
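It is possible to construct high-order symplectic integrators for $\bar{H}$ with explicit updates. Denote respectively by $\phi_1^{\delta}$, $\phi_2^{\delta}$, and $\phi_3^{\delta}$ the time-$\delta$ flows of $H_A$, $H_B$, and $\omega H_C$. $\phi_1^{\delta}$, $\phi_2^{\delta}$, and $\phi_3^{\delta}$ are given by
$$
\begin{bmatrix} q \\ p - \delta\, \partial H(q, y) / \partial q \\ x + \delta\, \partial H(q, y) / \partial y \\ y \end{bmatrix}, \qquad
\begin{bmatrix} q + \delta\, \partial H(x, p) / \partial p \\ p \\ x \\ y - \delta\, \partial H(x, p) / \partial x \end{bmatrix}, \qquad
\frac{1}{2} \begin{bmatrix} \begin{pmatrix} q + x \\ p + y \end{pmatrix} + R^{\delta} \begin{pmatrix} q - x \\ p - y \end{pmatrix} \\[2ex] \begin{pmatrix} q + x \\ p + y \end{pmatrix} - R^{\delta} \begin{pmatrix} q - x \\ p - y \end{pmatrix} \end{bmatrix}, \tag{15}
$$
respectively. Here
$$
R^{\delta} := \begin{bmatrix} \cos(2 \omega \delta)\, I & \sin(2 \omega \delta)\, I \\ -\sin(2 \omega \delta)\, I & \cos(2 \omega \delta)\, I \end{bmatrix}, \tag{16}
$$
where $I$ is the identity matrix.
We remark that $x$ and $y$ are just auxiliary variables, which are theoretically equal to $q$ and $p$.
Then we construct a numerical integrator that approximates the flow of $\bar{H}$ by composing these maps: it is well known that
$$
\phi_1^{\delta/2} \circ \phi_2^{\delta/2} \circ \phi_3^{\delta} \circ \phi_2^{\delta/2} \circ \phi_1^{\delta/2}, \tag{17}
$$
commonly named as Strang splitting, has a 3rd-order local error (thus a 2nd-order method), and is a symmetric method.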
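The three sub-flows and their Strang composition can be sketched for a scalar nonseparable system. The Hamiltonian $H(q, p) = \frac{1}{2}(q^2 + 1)(p^2 + 1)$, the initial state, and the values of $\delta$ and $\omega$ below are hypothetical choices for illustration, not settings from the thesis.

```python
import math


def hamiltonian(q, p):
    """Nonseparable test Hamiltonian H = (q^2 + 1)(p^2 + 1) / 2 (illustrative)."""
    return 0.5 * (q * q + 1.0) * (p * p + 1.0)


def dH_dq(q, p):
    return q * (p * p + 1.0)


def dH_dp(q, p):
    return p * (q * q + 1.0)


def phi_a(state, delta):
    """Exact flow of H_A = H(q, y): q and y are frozen, p and x move."""
    q, p, x, y = state
    return (q, p - delta * dH_dq(q, y), x + delta * dH_dp(q, y), y)


def phi_b(state, delta):
    """Exact flow of H_B = H(x, p): x and p are frozen, q and y move."""
    q, p, x, y = state
    return (q + delta * dH_dp(x, p), p, x, y - delta * dH_dq(x, p))


def phi_c(state, delta, omega):
    """Exact flow of omega * H_C: rotates (q - x, p - y), keeps the sums fixed."""
    q, p, x, y = state
    c, s = math.cos(2 * omega * delta), math.sin(2 * omega * delta)
    u, v = q - x, p - y
    ru, rv = c * u + s * v, -s * u + c * v
    return (0.5 * (q + x + ru), 0.5 * (p + y + rv),
            0.5 * (q + x - ru), 0.5 * (p + y - rv))


def strang_step(state, delta, omega):
    """Second-order symmetric composition of the three sub-flows."""
    state = phi_a(state, delta / 2)
    state = phi_b(state, delta / 2)
    state = phi_c(state, delta, omega)
    state = phi_b(state, delta / 2)
    return phi_a(state, delta / 2)


q0, p0, delta, omega = 0.5, 0.0, 0.01, 5.0
state = (q0, p0, q0, p0)          # auxiliary copies start equal to (q, p)
for _ in range(500):
    state = strang_step(state, delta, omega)
q, p, x, y = state
energy_drift = abs(hamiltonian(q, p) - hamiltonian(q0, p0))
copy_gap = abs(q - x) + abs(p - y)

# Symmetry check: a step with -delta undoes a step with +delta.
round_trip = strang_step(strang_step((q0, p0, q0, p0), delta, omega), -delta, omega)
```

Every sub-flow is explicit even though $H$ is nonseparable, because each one freezes exactly the variables that appear inside the partial derivatives it evaluates.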
Next, we introduce two methods for solving partial differential equations (PDEs), which are the Roe solver and Lagrangian Vortex Method.
3.2.5 Roe Solver
In continuum mechanics, a one-dimensional hyperbolic conservation law is a first-order quasilinear hyperbolic PDE
$$
\frac{\partial u}{\partial t} + \frac{\partial F(u)}{\partial x} = 0 \tag{18}
$$
with an initial condition
$$
u(t = 0, x) = u_0(x) \tag{19}
$$
and a proper boundary condition. Here the $N_c$-component vector $u = u(t, x)$ is the conserved quantity, $t$ denotes the time variable, $x$ denotes the spatial coordinate in a computational domain $\Omega$, and $F = F(u)$ is an $N_c$-component flux function. The conservation laws described by (18) are fundamental in continuum mechanics, such as mass conservation, momentum conservation, and energy conservation in fluid mechanics [56].
Equation (18) can also be expressed in a weak form, which extends the class of admissible solutions to include discontinuous solutions. Specifically, by defining an arbitrary test function that is continuously differentiable both in time and space with compact support, and integrating (18) in the space-time domain , the weak form of (18) is derived as
(20) |
We remark that, with generalized Stokes theorem, all the partial derivatives of and in (18) have been passed on to the test function in (20), which with the former hypothesis is sufficiently smooth to admit these derivatives [57]. In the absence of ambiguity, we refer to the solution of (18) below as a weak solution that satisfies (20).
In addition, (18) can be written in a high dimensional form
(21) |
where denote the -dimensional spatial coordinates. Since every dimension in the second term of (21), namely , has the same form as the second term of (18), (21) can be easily solved if given the solution of (18). Thus, we will only discuss the numerical method to solve (18).
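Philip L. Roe proposed an approximate Riemann solver based on the Godunov scheme that constructs an estimate of the intercell numerical flux of $F$ in (18) on the interface of two neighboring computational cells in a discretized space-time computational domain [58]. In particular, the Roe solver discretizes (18) as
$$u_j^{n+1} = u_j^n - \lambda_r \left( \hat{F}^n_{j+\frac{1}{2}} - \hat{F}^n_{j-\frac{1}{2}} \right), \tag{22}$$
where $\lambda_r = \Delta t / \Delta x$ is the ratio of the temporal step size $\Delta t$ to the spatial step size $\Delta x$, $j$ is the grid node index, and
$$\hat{F}^n_{j+\frac{1}{2}} = \hat{F}(u_j^n, u_{j+1}^n) \tag{23}$$
with
$$\hat{F}(u, v) = \frac{1}{2}\left[ F(u) + F(v) - |\tilde{A}(u,v)| (v - u) \right]. \tag{24}$$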
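Here, the Roe matrix $\tilde{A}$ is assumed constant between two cells and must obey the following Roe conditions:
1. Matrix $\tilde{A}$ is diagonalizable with real eigenvalues, i.e., it can be written as
$$\tilde{A} = L^{-1} \Lambda L \tag{25}$$
with an invertible matrix $L$ and a diagonal matrix $\Lambda$.
2. Matrix $\tilde{A}$ is consistent with the exact Jacobian, that is,
$$\lim_{u_j, u_{j+1} \to u} \tilde{A}(u_j, u_{j+1}) = \frac{\partial F(u)}{\partial u}. \tag{26}$$
3. The physical quantity $u$ is conserved on the interface between two computational cells:
$$F(u_{j+1}) - F(u_j) = \tilde{A}(u_j, u_{j+1}) (u_{j+1} - u_j). \tag{27}$$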
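We denote the absolute value of $\tilde{A}$ as
$$|\tilde{A}| = L^{-1} |\Lambda| L, \tag{28}$$
where $|\Lambda|$ is the elementwise absolute value of $\Lambda$. Substituting (23), (24), and (28) into (22), along with the third Roe condition (27), yields
$$u_j^{n+1} = u_j^n - \frac{1}{2}\lambda_r \Big[ L^{-1}_{j+\frac{1}{2}} \big( \Lambda_{j+\frac{1}{2}} - |\Lambda_{j+\frac{1}{2}}| \big) L_{j+\frac{1}{2}} \big( u^n_{j+1} - u^n_j \big) + L^{-1}_{j-\frac{1}{2}} \big( \Lambda_{j-\frac{1}{2}} + |\Lambda_{j-\frac{1}{2}}| \big) L_{j-\frac{1}{2}} \big( u^n_j - u^n_{j-1} \big) \Big] \tag{29}$$
with
$$L_{j+\frac{1}{2}} = L(u_j^n, u_{j+1}^n), \quad \Lambda_{j+\frac{1}{2}} = \Lambda(u_j^n, u_{j+1}^n). \tag{30}$$
Equation (29) serves as a template of evolution from $u_j^n$ to $u_j^{n+1}$ in the Roe solver.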
The key to designing an effective Roe solver is to find a Roe matrix $\tilde{A}$ that satisfies the three Roe conditions. To construct the Roe matrix in (25), the Roe solver uses an analytical approach to solve for $L$ and $\Lambda$ based on $F$. The Roe matrix is then plugged into (29) to ultimately solve for $u$ in (18). The Roe solver linearizes Riemann problems; this linearization still captures the problem's nonlinear jumps while remaining computationally efficient.
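To make the update template concrete, the following minimal sketch applies it to a scalar example. It assumes the inviscid Burgers flux $F(u) = u^2/2$, for which the Roe matrix reduces to the scalar average $(u_j + u_{j+1})/2$ (this choice satisfies the conservation condition exactly), and a periodic grid. The grid size, step sizes, and initial profile are illustrative choices, not settings from this thesis.

```python
import math

def roe_step_burgers(u, lam):
    """One step of the flux-difference update for the scalar flux F(u) = u^2 / 2.

    u   : list of cell values on a periodic grid
    lam : ratio dt / dx
    """
    n = len(u)
    new_u = []
    for j in range(n):
        um, uc, up = u[j - 1], u[j], u[(j + 1) % n]   # periodic neighbours
        a_right = 0.5 * (uc + up)    # Roe average at interface j + 1/2
        a_left = 0.5 * (um + uc)     # Roe average at interface j - 1/2
        incr = ((a_right - abs(a_right)) * (up - uc)
                + (a_left + abs(a_left)) * (uc - um))
        new_u.append(uc - 0.5 * lam * incr)
    return new_u

n_cells, dt = 100, 0.002
dx = 1.0 / n_cells
u0 = [0.5 + math.sin(2.0 * math.pi * (j + 0.5) * dx) for j in range(n_cells)]
u = list(u0)
for _ in range(200):                 # t = 0.4, after a shock has formed
    u = roe_step_burgers(u, dt / dx)
```

With this step ratio the scheme is an upwind update in which only the wave moving toward a cell contributes to it. The total of `u` over the periodic grid is preserved up to round-off, and no new extrema are created even after the smooth initial profile steepens into a discontinuity.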
3.2.6 Lagrangian Vortex Method (LVM)
Given a fluid velocity field with an incompressible constraint, its underlying dynamics can be described by the NS equations
(31) |
where denotes the time, is the material derivative, is the pressure, is the kinematic viscosity, is the density, and is the body accelerations (per unit mass) acting on the continuum, for example, gravity, inertial accelerations, electric field acceleration, and so on.
The alternative form of the NS equations could be obtained by defining the vorticity field , which leads to the following vorticity dynamical equation
(32) |
Here, the velocity field is the curl of a vector potential. Although this form does not seem to bring any simplification, the key insight behind this transformation stems from Helmholtz's theorems [59], which state that the dynamics of the vorticity field can be described by vortex surfaces/lines, which are Lagrangian surfaces/lines flowing with the velocity field in inviscid flows [60, 61].
The LVM discretizes the vorticity dynamical equation (32) with particles resulting in a set of ODEs for the particle strengths and the particle positions as
(33) |
Here, the particle strength is the integral of over the computational element, is the induced velocity calculated by BS law
(34) |
where is the dimension of the flow field. In addition, and are the change rate of the particle strength and the drift velocity [62], respectively. To avoid singularities in the BS law, we introduce the numerical regularization parameter in the LVM as . The effect of the regularization parameter on the dynamics of the flow evolution of the simulated vortex particles is rather small because of the large spacing between the vortex particles.
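The role of the regularization parameter can be illustrated with a minimal two-dimensional sketch. It assumes the usual point-vortex form of the discrete BS law and a simple smoothing of the kernel denominator by `eps`; the exact regularized kernel used in this thesis is not reproduced here, so both the function name and the smoothing form are assumptions.

```python
import math

def induced_velocities(positions, strengths, eps=0.0):
    """Velocity induced on each 2D point vortex by all the others.

    positions : list of (x, y)
    strengths : list of circulations
    eps       : smoothing length (hypothetical regularization, see lead-in)
    """
    velocities = []
    for i, (xi, yi) in enumerate(positions):
        ux = uy = 0.0
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue                      # a vortex does not move itself
            rx, ry = xi - xj, yi - yj
            denom = 2.0 * math.pi * (rx * rx + ry * ry + eps * eps)
            ux += -strengths[j] * ry / denom  # velocity is perpendicular to r
            uy += strengths[j] * rx / denom
        velocities.append((ux, uy))
    return velocities

# Two equal vortices one unit apart: they induce opposite velocities and co-rotate.
pair_vel = induced_velocities([(-0.5, 0.0), (0.5, 0.0)], [1.0, 1.0])
```

Without smoothing, the induced velocity grows without bound as two particles approach each other. With `eps > 0` it stays bounded, while particles separated by much more than `eps` are almost unaffected, which is consistent with the remark above about widely spaced particles.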
In a two-dimensional ideal fluid flow, i.e., a strictly inviscid barotropic flow with conservative body forces, the movements of Lagrangian particles with conserved vorticity strength are determined by the velocity field they create, thus allowing us to advance the simulation temporally [63]. However, in the real three-dimensional flow, under the action of vortex stretching, vortex distortion, viscous dissipation, external forces, etc., the Lagrangian advection of vortex particles and their strength need to be corrected by and in (33).
We remark that the NS equations can be accurately modeled by the LVM with a large number of computational elements and a reasonable discrete distribution. However, the implementation of the LVM faces a major challenge, which is to model the right-hand sides (r.h.s.) of the set of ordinary differential equations based on the NS equations. First, the assumption that the vortices are point-like largely limits the use of the continuous BS law. Second, the drift velocity due to the external force cannot be obtained using the LVM without knowing the function of the external force. Even given the function, the LVM still fails to capture the drift velocity accurately in most cases [62]. Finally, when two particles are close enough, the singularity of the discrete BS law leads to a significant numerical error. These problems make the LVM inaccurate or inapplicable for solving the underlying fluid dynamics in many situations [63].
4 Implementation
4.1 Symplectic Taylor Neural Networks (Taylor-nets)
4.1.1 Symplectomorphism in Hamiltonian Mechanics
Consider a separable Hamiltonian system described by (6), (7), and (8). Substituting (8) into (6) yields
(35) |
This set of equations is fundamental in designing our neural networks. Our model will learn the r.h.s. of (35) under the framework of ODE-net.
One of the important features of the time evolution of Hamilton’s equations is symplectomorphism, which represents a transformation of phase space that is volume-preserving. In the setting of canonical coordinates, symplectomorphism means the transformation of the phase flow of a Hamiltonian system conserves the symplectic two-form
(36) |
where denotes the wedge product of two differential forms. Inspired by the symplectomorphism feature, we aim to construct a neural network architecture that intrinsically preserves Hamiltonian structure.
4.1.2 A symmetric network in Taylor expansion form
To learn the gradients of the Hamiltonian with respect to the generalized coordinates, we propose the following underpinning mechanism, which is a set of symmetric networks
(37) |
with parameters that are designed to learn the r.h.s. of (35), respectively. Here, the “” represents our attempt to use the left-hand side (l.h.s) to learn the r.h.s. Substituting (37) into (35) yields
(38) |
Therefore, under the initial condition (7), the trajectories of the canonical coordinates can be integrated as
(39) |
From (37), we obtain
(40) |
The r.h.s. of (40) are the Hessian matrices of $T$ and $V$, respectively, so we can design $T_p$ and $V_q$ as symmetric mappings, that is,
(41) |
and
(42) |
Because traditional deep neural networks stack multiple nonlinear layers, they cannot in general fulfill (41) and (42). We therefore restrict ourselves to a three-layer network of the form linear-activation-linear, where the weights of the two linear layers are the transpose of each other. To maintain the expressive power of the networks, we construct symmetric nonlinear terms, which have the same form as the terms of a Taylor polynomial, and combine them linearly. Specifically, we construct a symmetric network as
(43) |
where ‘’ denotes the function composition, and are fully connected layers with size , is a dimensional bias, is the number of terms in the Taylor series expansion, and is an element-wise function, representing the order term in the Taylor polynomial
(44) |
Figure 1 plots a schematic diagram of in Taylor-net. The input of is , and , . We construct a negative term following a positive term , since two positive semidefinite matrices with opposite signs can represent any symmetric matrix.
![Refer to caption](x1.png)
(48) |
Here, , , and have the same structure as (43), and , .
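The symmetry property can be checked numerically. The following minimal sketch assumes that the network has the form $\sum_i \left[A_i^T f_i(A_i p) - B_i^T f_i(B_i p)\right] + b$ with $f_i(x) = x^i / i!$ applied element-wise, which is how we read the description above. The layer sizes and random weights are illustrative, and the finite-difference Jacobian is only a verification device, not part of the model.

```python
import math
import numpy as np

def taylor_net(p, A, B, b):
    """Evaluate sum_i [A_i^T f_i(A_i p) - B_i^T f_i(B_i p)] + b, f_i(x) = x^i / i!."""
    out = b.copy()
    for i, (Ai, Bi) in enumerate(zip(A, B), start=1):
        out += Ai.T @ ((Ai @ p) ** i / math.factorial(i))   # positive term
        out -= Bi.T @ ((Bi @ p) ** i / math.factorial(i))   # negative term
    return out

def jacobian_fd(f, p, h=1e-6):
    """Central finite-difference Jacobian, used only to check symmetry."""
    n = p.size
    J = np.zeros((n, n))
    for k in range(n):
        e = np.zeros(n)
        e[k] = h
        J[:, k] = (f(p + e) - f(p - e)) / (2.0 * h)
    return J

rng = np.random.default_rng(0)
dim, hidden, terms = 3, 5, 4           # illustrative sizes, not the thesis settings
A = [0.5 * rng.standard_normal((hidden, dim)) for _ in range(terms)]
B = [0.5 * rng.standard_normal((hidden, dim)) for _ in range(terms)]
b = rng.standard_normal(dim)
p = rng.standard_normal(dim)
J = jacobian_fd(lambda v: taylor_net(v, A, B, b), p)
asymmetry = np.max(np.abs(J - J.T))
```

Each term has Jacobian $A_i^T \operatorname{diag}(f_i'(A_i p)) A_i$, which is symmetric by construction, so `asymmetry` should be at the level of finite-difference error for any choice of weights.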
4.1.3 Symplectic Taylor neural networks
Next, we substitute the constructed networks (43) and (48) into (39) to learn the Hamiltonian system (35). We employ ODE-net [50], introduced in Section 3.1.5, as our computational infrastructure. Inspired by the idea of ODE-net, we design neural networks that can learn continuous time evolution. In the Hamiltonian system (35), where the coordinates are integrated as (39), we can implement a time integrator to solve for $q$ and $p$. ODE-net uses the fourth-order Runge–Kutta method, which is not symplectic; to make the neural networks structure-preserving, we need to implement an integrator that is symplectic. Therefore, we introduce Taylor-net, in which we combine the symmetric Taylor series expansion with the fourth-order symplectic integrator to construct symplectic neural networks that learn the gradients of the Hamiltonian with respect to the generalized coordinates and, ultimately, the temporal integral of a Hamiltonian system.
For the constructed networks (43) and (48), we integrate (39) by using the fourth-order symplectic integrator introduced in 3.2.3. Specifically, we will have an input layer at and an output layer at . The recursive relations of , can be expressed by Algorithm 1. The input function in Algorithm 1 are
Relationships (49) and (50) are obtained by replacing and in the fourth-order symplectic integrator with deliberately designed neural networks and , respectively. Figure 2 plots a schematic diagram of Taylor-net which is described by Algorithm 1. The input of Taylor-net is , and the output is . Taylor-net consists of iterations of fourth-order symplectic integrator. The input of the integrator is , and the output is . Within the integrator, the output of is used to calculate , while the output of is used to calculate , which is signified by the shoelace-like pattern in the diagram. The four intermediate variables and indicate that the scheme is fourth-order.
![Refer to caption](x2.png)
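The alternating update pattern of the integrator can be sketched as follows. The coefficients are the standard Forest-Ruth/Yoshida fourth-order values, which we assume match the integrator of Section 3.2.3. The two gradient functions stand in for the learned networks, and the pendulum Hamiltonian $\frac{1}{2}p^2 - \cos q$ is used only as a test case.

```python
import math

# Standard Forest-Ruth / Yoshida coefficients (assumed to match Section 3.2.3).
_W = 2.0 ** (1.0 / 3.0)
C = (1.0 / (2.0 * (2.0 - _W)), (1.0 - _W) / (2.0 * (2.0 - _W)),
     (1.0 - _W) / (2.0 * (2.0 - _W)), 1.0 / (2.0 * (2.0 - _W)))
D = (1.0 / (2.0 - _W), -_W / (2.0 - _W), 1.0 / (2.0 - _W), 0.0)

def symplectic4_step(q, p, dt, dT_dp, dV_dq):
    """One fourth-order step for a separable Hamiltonian H = T(p) + V(q)."""
    for c, d in zip(C, D):
        q = q + c * dt * dT_dp(p)    # position update uses the momentum gradient
        p = p - d * dt * dV_dq(q)    # momentum update uses the new position
    return q, p

def pendulum_energy(q, p):
    return 0.5 * p * p - math.cos(q)

q, p = 1.0, 0.0
e0 = pendulum_energy(q, p)
for _ in range(1000):                # integrate to t = 10 with step 0.01
    q, p = symplectic4_step(q, p, 0.01, lambda m: m, math.sin)
energy_error = abs(pendulum_energy(q, p) - e0)
```

In Taylor-net, the callables `dT_dp` and `dV_dq` would be replaced by the two symmetric networks, so that each sub-step remains a shear map with a symmetric Jacobian. The four sub-steps correspond to the four intermediate variables mentioned above.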
By constructing the network in (43) that satisfies (41), we show that Theorem 4.2 holds, so the network (49) preserves the symplectic structure of the system.
Theorem 4.2.
Proof.
Let
(51) |
From (49), we have
(52) | ||||
Here refers to the entry in the -th row and -th column of a matrix , refers to the -th component of vector . From (55), we know that is equivalent to
(53) |
which is (41). ∎
Similar to Theorem 4.2, we can find the relationship between and the Jacobian of . The proof of Theorem 4.3 is omitted as it is similar to the proof of Theorem 4.2.
Theorem 4.3.
4.2 Nonseparable Symplectic Neural Networks (NSSNNs)
Our model aims to learn the dynamical evolution of in (6) by embedding (14) into the framework of NeuralODE [50]. We learn the nonseparable Hamiltonian dynamics (6) by constructing an augmented system (14), from which we can obtain the energy function by training the neural network with parameter and calculate the gradient by taking the in-graph gradient. For the constructed network , we integrate (14) by using the second-order symplectic integrator [55]. Specifically, we will have an input layer at and an output layer at .
The recursive relations between the input and output layers are expressed by Algorithm 2. Figure 3(a) shows that the forward pass of NSSNN consists of a forward pass through a differentiable symplectic integrator as well as a backpropagation step through the model. Figure 3(b) plots the schematic diagram of NSSNN. Moreover, given (15), since the auxiliary variables $x$ and $y$ are theoretically equal to $q$ and $p$, we can use the data set of $(q, p)$ to construct the data set containing the variables $(q, p, x, y)$.
![Refer to caption](x3.png)
In addition, by constructing the network , we show that Theorem 4.4 holds, so the networks , and in (15) preserve the symplectic structure of the system. Suppose that and are two symplectomorphisms. Then, it is easy to show that their composite map is also symplectomorphism due to the chain rule. Thus, the symplectomorphism of Algorithm 2 can be guaranteed by Theorem 4.4.
Theorem 4.4.
For a given $\delta$, the mappings $\phi_1^{\delta}$, $\phi_2^{\delta}$, and $\phi_3^{\delta}$ in (15) are symplectomorphisms.
Proof.
Let
(54) |
![Refer to caption](x4.png)
We show a motivational example in Figure 4 by comparing our approach with a traditional HNN method [15] regarding their structural designs and predictive abilities. We refer the readers to Section 5.2.3 for a detailed discussion. As shown in Figure 4, the vortices evolved using NSSNN remain well separated, as in the ground truth, while the vortices evolved using HNN merge together because HNN fails to conserve the symplectic structure of a nonseparable system. The conservative capability of NSSNN stems from our design of the auxiliary variables (red $x$ and $y$), which converts the original nonseparable system into a higher-dimensional quasi-separable system where we can adopt a symplectic integrator.
4.3 Roe Neural Networks (RoeNet)
![Refer to caption](x5.png)
We introduce our design of the Roe template with pseudoinverse embedding, which accommodates the data processing and training over the entire learning pipeline. In particular, we present our basic ideas in Section 4.3.1 and a detailed description of our network architecture in Section 4.3.2.
4.3.1 Roe template with Pseudoinverse Embedding
Recall the one-dimensional hyperbolic conservation law described in (18). Without a given $F$, we learn the weak solution of (18) using a neural network that incorporates the framework of a Roe solver. For time integration of $u$ in (29), we need to construct the matrix functions $L$ and $\Lambda$. Approximating $L$ and $\Lambda$ in (30) directly with neural networks is ineffective, because the number of learnable parameters is then limited by the number of components of $u$, and such a tiny parameter space is impractical to learn. To enhance the expressiveness of our model, we use neural networks $L_\theta$ and $\Lambda_\phi$ to replace $L$ and $\Lambda$ in (30), respectively. As in (30), the inputs to $L_\theta$ and $\Lambda_\phi$ remain $u_j^n$ and $u_{j+1}^n$. However, the outputs of $L_\theta$ and $\Lambda_\phi$ are now an $N_h \times N_c$ matrix and an $N_h \times N_h$ diagonal matrix, respectively, where the positive integer $N_h$ is a hidden dimension. Furthermore, we introduce the concept of pseudoinverses by replacing $L_\theta^{-1}$ with
(56) |
Here, the transpose and inverse operations are applied to the output matrix, that is
(57) |
Substituting , , and (56) into (29) and (30) yields
(58) | ||||
with
(59) |
Equation (58) serves as our template to evolve the system’s states from to in RoeNet.
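The following minimal sketch illustrates how a tall matrix with a hidden dimension can take the place of the square eigenvector matrix in the update. It assumes that the pseudoinverse is the left pseudoinverse $(L^T L)^{-1} L^T$, as suggested by the transpose and inverse operations mentioned above. The random matrices stand in for network outputs, and all function names and sizes are hypothetical.

```python
import numpy as np

def pseudo_inverse(L):
    """Left pseudoinverse (L^T L)^{-1} L^T of a tall matrix L of shape (N_h, N_c)."""
    return np.linalg.solve(L.T @ L, L.T)

def roe_template_update(u_prev, u_curr, u_next,
                        L_left, lam_left, L_right, lam_right, ratio):
    """Update one cell from its two interfaces.

    L_left, L_right     : (N_h, N_c) matrices at interfaces j-1/2 and j+1/2
    lam_left, lam_right : length-N_h vectors holding the diagonal entries
    ratio               : dt / dx
    """
    right = pseudo_inverse(L_right) @ (
        (lam_right - np.abs(lam_right)) * (L_right @ (u_next - u_curr)))
    left = pseudo_inverse(L_left) @ (
        (lam_left + np.abs(lam_left)) * (L_left @ (u_curr - u_prev)))
    return u_curr - 0.5 * ratio * (right + left)

rng = np.random.default_rng(0)
n_hidden, n_comp = 6, 2                          # illustrative sizes
L_demo = rng.standard_normal((n_hidden, n_comp)) # stand-in for a network output
lam_demo = rng.standard_normal(n_hidden)         # stand-in for a network output
u_const = np.array([1.0, -2.0])
u_same = roe_template_update(u_const, u_const, u_const,
                             L_demo, lam_demo, L_demo, lam_demo, 0.1)
```

Because the left pseudoinverse satisfies $L^{+} L = I$, the template reduces to the classical update when the hidden dimension equals the number of components and $L$ is invertible. A uniform state is left unchanged regardless of the matrices, as `u_same` shows.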
Figure 5 presents a schematic diagram of RoeNet, which predicts future discontinuities from smooth observations. We note that for hyperbolic conservation laws with discontinuous solutions, RoeNet can accurately forecast long-term outcomes that are either fully or partially discontinuous. This is achievable even when the training data provided cover only a short window and contain limited information on discontinuities.
4.3.2 Neural network architecture
Figure 6 shows an overview of our neural network architecture. In summary, RoeNet consists of and , two networks embedded in (58) to serve as our template to evolve the system’s states from to .
![Refer to caption](x6.png)
Specifically, the network in Figure 6 contains two parts, each consists of a and a . The first part takes and as input of both and and outputs through and through . The input and is a vector of length with components. The output matrix is of size , and the other output matrix is a diagonal matrix of size . The second part takes and as the input for both and and outputs through and through . The input and is a vector of length . The output matrices and take the same form as the output matrices in the first part. Given the four output matrices , , , and , we combine them through (58) and (59) to obtain . Networks and both consist of a chain of ResBlocks [49] with a linear layer of size and at the end, respectively. The ResBlock architecture comprises two convolutional layers and one ReLU layer. The learned parameters by is then transferred into a diagonal matrix of with the learned parameters as its diagonal. The ResBlock has the same architecture as in [49], only with the 2D convolution layers replaced by linear layers. Note that the number in the parentheses is the dimension of the output of each ResBlock, and the computation procedure for grid cell is applied to all grid cells. Since the computation of each node is independent of other cells except the adjacent cells, we could train them in parallel to achieve high efficiency.
In addition, we implement two ways of padding to address different boundary conditions. For periodic boundary conditions, we use the periodic padding, e.g., if , then , where is the number of the grid node. For Neumann boundary conditions, we use the replicate padding, e.g., if , then .
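The two padding modes can be sketched in a few lines. The function names are hypothetical, and a single ghost cell per side is assumed, which is enough for a stencil that only touches adjacent cells.

```python
def pad_periodic(u):
    """Ghost cells copy the opposite end of the grid."""
    return [u[-1]] + list(u) + [u[0]]

def pad_replicate(u):
    """Ghost cells copy the nearest interior cell (zero-gradient boundary)."""
    return [u[0]] + list(u) + [u[-1]]

periodic_example = pad_periodic([1, 2, 3])     # [3, 1, 2, 3, 1]
replicate_example = pad_replicate([1, 2, 3])   # [1, 1, 2, 3, 3]
```

After padding, every interior cell has a left and a right neighbour, so the same update rule can be applied uniformly to all cells.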
By introducing a hidden dimension , we have increased the number of network parameters and enhanced the network’s expressive capacity. However, the expansion of the parameter space could lead to multiple numerical optimal solutions during training. To address this, we employ a regularized loss function, which helps ensure that the network parameters converge to a local optimal solution. Importantly, our goal is to use the network to accurately model the evolution of PDEs over time and space; achieving a unique solution for the network parameters is not a requirement.
Algorithm 3 summarizes the recursive relation from the input layer
(60) |
to the output layer
(61) |
for each time step in RoeNet. Here is the spatial grid size and is the time span or . As described in Algorithm 3, feeding , , temporal step , spatial step , and the constructed networks and into RoeNet, we could get predicted . Then, we choose the MSE as our loss function
(62) |
4.4 Neural Vortex Method (NVM)
To accurately and efficiently quantify fluid dynamics, we propose the novel NVM framework. This framework utilizes physics-informed neural networks to extract and translate information from the Eulerian specification of the flow field (or images of flow visualizations) into knowledge about the underlying fluid field. As detailed in Figure 7, we integrate these networks with a vorticity-to-velocity Poisson solver to build a fully automated toolchain that extracts high-resolution Eulerian flow fields from Lagrangian inductive priors. This design addresses the challenge of learning directly from high-dimensional observations, such as images, which traditional methods struggle to convert directly into velocity and pressure fields.
![Refer to caption](x7.png)
We construct a vortex detection network in Section 4.4.1 to identify the positions and the vorticity of Lagrangian vortices from a grid-based velocity field, which from a mathematical perspective connects (31) with (33). This approach simplifies the vorticity field to include only the detected vortices. Given the detected vortices, we then use a vortex dynamics network in Section 4.4.2 to learn the underlying governing dynamics of these finite structures. The dynamics network accurately models the r.h.s. of (33) under various conditions, which addresses the longstanding difficulty of the LVM discussed in Section 3.2.6.
The training of the NVM involves two primary steps: training the detection and dynamics networks. We employ high-fidelity data from direct numerical simulation (DNS) of interactions among 2 to 6 vortices, although the model can generalize to any vorticity field with an arbitrary number of vortices. We initially train the detection network using data from randomly generated vortices and their vorticity fields, then identify vortices’ positions and strengths using this trained network to facilitate the subsequent training of the dynamics network.
4.4.1 Detection network
The input of the detection network is a vorticity field of size . As shown in Figure 8, we first feed the vorticity field into a small one-stage detection network and get the feature map of size (we downsampled 3 times). The detection network consists of a Conv2d-BatchNorm-ReLU combo and a 6-layer-structured ResBlock chain whose size can be adjusted dynamically to the complexity of the problem. The primary reason for downsampling is to avoid extremely unbalanced data and multiple predictions for the same vortex. We then forward the feature map to 2 branches. In the first branch, we conduct a convolution to generate a probability score of the possibility that there exists a vortex. If , we believe there exists a vortex within the corresponding cells of the original vorticity field. In the second branch, we predict the relative position to the left-up corner of the cell of the feature map if the cell contains a vortex. Afterward, we set a bounding box of around these predicted vortices and use the weighted average of the positions of the cells of the original vorticity field to find the exact position of the vortex. Finally, the vortex particle strength is calculated as the sum of the value of the cells in the bounding box normalized by the cell area.
![Refer to caption](x8.png)
In the training process, we penalize the wrong position detection only if the cell containing a vortex in the ground truth given by DNS is not detected. This idea is similar to the real-time object detection in [64]. We do not use the weighted average method to find the position in the training to ensure the detection network can produce detection results as accurately as possible. We use the focal loss [65] to further relieve the unbalanced classification problem.
We mainly use the detection network to generate training data for the dynamics network because we want to use the high-resolution data generated by the method mentioned in Section 5.4.1 instead of by the approximate particle method (BS law). Moreover, there are many situations where BS law is inapplicable, as discussed previously in Section 3.2.6. The detection network enables us to find the positions of the vortices accurately regardless of the situation.
The detection network is responsible for providing necessary information to the dynamics network. After the training, we use the well-trained detection network to detect the vortices in the initial vorticity fields and the evolved vorticity field, both generated by the method in Section 5.4.1. We then apply the nearest-neighbor method to pair the vortices detected in these two fields. Figure 9 shows the case of two fields at and . The idea of nearest-neighbor pairing can be perceived from Figure 9 (c). The sample, or these two fields, is dropped if different numbers of vortices are detected in the initial and evolved fields or if a large difference exists in the vorticity of paired vortices. The successfully detected vortices in the initial and evolved vorticity fields are passed together into the dynamics network for its training.
![Refer to caption](x9.png)
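The pairing and filtering step described above can be sketched as follows. The tuple layout `(x, y, vorticity)`, the function name, and the relative tolerance `max_rel_diff` are assumptions made for illustration; the threshold actually used for a "large difference" is not specified here.

```python
import math

def pair_vortices(initial, evolved, max_rel_diff=0.2):
    """Pair vortices across two frames; return None if the sample is dropped.

    initial, evolved : lists of (x, y, vorticity)
    max_rel_diff     : hypothetical tolerance on the relative vorticity change
    """
    if len(initial) != len(evolved):
        return None                       # different vortex counts: drop sample
    pairs = []
    for x0, y0, w0 in initial:
        # Nearest detected vortex in the evolved field (Euclidean distance).
        nearest = min(evolved, key=lambda v: math.hypot(v[0] - x0, v[1] - y0))
        if abs(nearest[2] - w0) > max_rel_diff * abs(w0):
            return None                   # vorticity mismatch: drop sample
        pairs.append(((x0, y0, w0), nearest))
    return pairs

frame_a = [(0.0, 0.0, 1.0), (1.0, 0.0, -1.0)]
frame_b = [(1.05, 0.1, -0.95), (0.1, 0.05, 1.02)]
pairs = pair_vortices(frame_a, frame_b)
```

This simple version does not prevent two initial vortices from being matched to the same evolved vortex. For short time spans, where each vortex moves much less than the spacing between vortices, that case should be rare.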
4.4.2 Dynamics network
To learn the underlying dynamics of the vortices, we build a graph neural network similar to [19]. We predict the velocity of one vortex due to influences exerted by the other vortices and the external force. Then we use the fourth-order Runge–Kutta integrator to calculate the position in the next timestamp. As shown in Figure 10, for each vortex, we use a neural network to predict the influences exerted by the other vortices and add them up. Specifically, for each th vortex, we consider the vortex . The difference of their positions can be calculated by , and their L2 distance is . The input of the is the vector of length 4. Here, pos and vort are detected by the detection network. The output of is a vector with the same dimension of the flow field, characterizing the induced velocity of the th vortex to the th vortex. In this way, we can calculate the induced velocity of each vortex () on the vortex . We sum up all the induced velocities on the vortex and treat the result as the induced velocity exerted by the other vortices.
![Refer to caption](x10.png)
In addition, we use another neural network , to predict the influence caused by the external force, which is determined by the local vorticity and the position of the vortex. The input of is a vector of length 3. The output is the influence exerted by the environment on the vortex , i.e., the induced velocity of the external force to th vortex.
The reason we separate the induced velocity into two parts, i.e., and , is as follows. On the one hand, the induced velocities between vortex particles are global, and exhibit a certain symmetry, i.e., the vortex particles interact with each other following the same law. In contrast, the influence of external forces on vortex particles is usually local and direct; thus, we do not need to consider the interaction between particles. The effect of the vortex stretching term in three-dimensional vortex flows or diffusion term in viscous flows is also local and should be included in network . Note that both the outputs of and are a vector with the same dimension of the flow field. Thus, we can add the two kinds of influence together, whose result is defined as the velocity of the vortex . We feed the velocity into the fourth-order Runge–Kutta integrator to obtain the predicted position of vortex .
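The structure of the dynamics model, a sum of pairwise contributions plus a local external term followed by a Runge–Kutta step, can be sketched as follows. The two functions `pairwise_velocity` and `external_velocity` are hypothetical stand-ins for the two learned networks: the first uses the analytic point-vortex kernel and the second returns zero, so the sketch shows only the data flow, not the learned behavior.

```python
import math

def pairwise_velocity(dx, dy, dist, vort_j):
    # Stand-in for the learned pairwise network (inputs: position difference,
    # distance, and vorticity of the inducing vortex): 2D point-vortex kernel.
    scale = vort_j / (2.0 * math.pi * dist * dist)
    return (-scale * dy, scale * dx)

def external_velocity(x, y, vort):
    # Stand-in for the learned external-force network: no external influence.
    return (0.0, 0.0)

def total_velocity(pos, vort):
    """Sum pairwise contributions over j != i, then add the external term."""
    vel = []
    for i, (xi, yi) in enumerate(pos):
        ux, uy = external_velocity(xi, yi, vort[i])
        for j, (xj, yj) in enumerate(pos):
            if j != i:
                dx, dy = xi - xj, yi - yj
                vx, vy = pairwise_velocity(dx, dy, math.hypot(dx, dy), vort[j])
                ux, uy = ux + vx, uy + vy
        vel.append((ux, uy))
    return vel

def rk4_step(pos, vort, dt):
    """Classical fourth-order Runge-Kutta update of all vortex positions."""
    def shifted(k, h):
        return [(x + h * kx, y + h * ky) for (x, y), (kx, ky) in zip(pos, k)]
    k1 = total_velocity(pos, vort)
    k2 = total_velocity(shifted(k1, 0.5 * dt), vort)
    k3 = total_velocity(shifted(k2, 0.5 * dt), vort)
    k4 = total_velocity(shifted(k3, dt), vort)
    return [(x + dt / 6.0 * (a[0] + 2 * b[0] + 2 * c[0] + d[0]),
             y + dt / 6.0 * (a[1] + 2 * b[1] + 2 * c[1] + d[1]))
            for (x, y), a, b, c, d in zip(pos, k1, k2, k3, k4)]

pos = [(-0.5, 0.0), (0.5, 0.0)]
vort = [1.0, 1.0]
for _ in range(100):
    pos = rk4_step(pos, vort, 0.01)
separation = math.hypot(pos[0][0] - pos[1][0], pos[0][1] - pos[1][1])
```

Because the same pairwise function is applied to every ordered pair, the model handles any number of vortices without retraining. With the analytic stand-in, two equal vortices co-rotate at a constant separation, which gives a simple sanity check of the integration loop.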
In addition, in predicting the evolution of the flow field, NVM replaces the discrete BS method with a dynamics network composed of ResBlocks. We chose 5-layer ResBlocks to improve the expressiveness of the dynamics network so that we can learn dynamics of different complexity with the same network. Since this network is more complex than the discrete BS method, the per-particle computational cost of NVM is higher than that of the Lagrangian vortex method. However, the number of vortex particles needed to predict the evolution of the flow field using NVM is much smaller, so the overall computational cost of NVM can be greatly reduced.
5 Results
We present several experiments here to highlight the key advantages of our methodologies. For additional examples and ablation tests, please refer to [1, 2, 3, 4].
5.1 Taylor-nets
5.1.1 Dataset generation and training settings
To make a fair comparison with the ground truth, we generate our training and testing datasets by using the same numerical integrator based on a given analytical Hamiltonian. In the learning process, we generate training samples, and for each training sample, we first pick a random initial point (input), then use the symplectic integrator discussed in Section 3.2.3 to calculate the value (target) of the trajectory at the end of the training period . We do the same to generate a validation dataset with samples and the same time span as and calculate the validation loss along the training loss to evaluate the training process. In addition, we generate a set of testing data with samples and predicting time span that is around 6000 times larger and calculate the prediction error to evaluate the predictive ability of the model. For simplicity, we use to represent the predicted values using our trained model.
We remark that our training dataset is smaller than that used by the other methods. Most methods, e.g., ODE-net [50] and HNN [15], have to rely on intermediate data in their training data to train the model. That is, the dataset is
where are intermediate points collected within in between and . On the other hand, we only use two data points per sample, the initial data point and the end point, and our dataset looks like
which is times smaller than the dataset of the other methods, if we do not count . Our predicting time span is around 6000 times the training period used in the training dataset (as compared to 10 times in HNN). This leads to a 600-fold compression of the training data along the temporal dimension. Note that we fix and in practice so that we can train our network more efficiently on GPU. One can also choose to generate training data with different for each sample to obtain more robust performance.
We use the Adam optimizer [48]. We choose the automatic differentiation method as our backward propagation method. We have tried both the adjoint sensitivity method, which is used in ODE-net [50] and the automatic differentiation method. Both methods can be used to train the model well. However, we found that using the adjoint sensitivity method is much slower than using the automatic differentiation method considering the large parameter size of neural networks.
All and in (43) are initialized as , where is the dimension of the system and is the size of the hidden layers. The loss function is
(63) |
The validation loss is the same as (63) but evaluated on a dataset different from the training dataset. We choose the $L_1$ loss instead of the Mean Square Error (MSE) loss because of its better performance.
We will introduce the experimental result for an ideal pendulum system, which is defined by
(64) |
We pick a random initial point for training .
To show the predictive ability of our model, we pick . We pick 15 as the sample size since we find that small ’s are sufficient to generate excellent results. We use 100 epochs for training, and 10 as the (the period of learning rate decay), and 0.8 as (the multiplicative factor of learning rate decay). The learning rate of each parameter group is decayed by every epochs, which prevents the model from overshooting the local minimum. The dynamic learning rate can also make our model converge faster. indicates the number of terms of the Taylor polynomial introduced in the construction of the neural networks (43). Through experimentation, we find that 8 terms can represent most functions well. We choose 16 as , the dimension of hidden layers.
5.1.2 Predictive ability and robustness
![Refer to caption](extracted/5682117/graph/diff_new.jpg)
Methods | Taylor-net | HNN | ODE-net |
---|---|---|---|
, without noise | 0.213 | 0.377 | 1.416 |
, with noise | 1.667 | 2.433 | 3.301 |
, with noise | 1.293 | 2.416 | 27.114 |
![Refer to caption](extracted/5682117/graph/pred.jpg)
Now, to assess how well our method can predict the future flow, we compare the predictive ability of Taylor-net with ODE-net and HNN. We apply all three methods on the pendulum problem, and let and . We evaluate the performance of the models by calculating the average prediction error at each predicted point, defined by
(65) |
and the average over is
(66) |
where represents the testing sample size specified in Section 5.1.1 and with . After experimentation, we find that Taylor-net has stronger predictive ability than the other two methods. The first row of Table 2 shows the average prediction error of 100 testing samples using the three methods over when no noise is added. The prediction error of HNN is almost double that of Taylor-net, while the prediction error of ODE-net is about 7 times that of Taylor-net. To analyze the difference more quantitatively, we made several plots to help us better compare the prediction results. Figure 11 shows the plots of prediction error against over for all three methods. In Figure 12, we plot the prediction of position against time period for all three methods as well as the ground truth in order to see how well the prediction results match the ground truth. From Figure 12 (a), we can already see that the prediction result of ODE-net gradually deviates from the ground truth as time progresses, while the prediction of Taylor-net and HNN stays mostly consistent with the ground truth, with the former being slightly closer to the ground truth. The difference between Taylor-net and HNN can be seen more clearly in Figure 11 (a). Observe that the prediction error of Taylor-net is obviously smaller than that of the other two methods, and the difference becomes more and more apparent as time increases. The prediction error of ODE-net is larger than HNN and Taylor-net at the beginning of and increases at a much faster rate than the other two methods. Although the prediction error of HNN has no obvious difference from that of Taylor-net at the beginning, it gradually diverges from the prediction error of Taylor-net.
5.2 NSSNNs
5.2.1 Dataset generation and training settings
We use 6 linear layers with hidden size 64 to model , all of which are followed by a Sigmoid activation function except the last one. The derivatives , , , are all obtained by automatic differentiation in PyTorch [66]. The weights of the linear layers are initialized by Xavier initialization [67].
We generate the dataset for training and validation using a high-precision numerical solver [55], where the ratio of training and validation datasets is . We set the dataset as the start input and as the target with , and the time span between and is . We feed , and the time step into Algorithm 1 to get the predicted variables . Accordingly, the loss function is defined as
(67) |
where is the batch size of the training samples. We use the Adam optimizer [48] with learning rate 0.05. The learning rate is multiplied by 0.8 every 10 epochs.
Taking system as an example, we carry out a series of ablation tests based on our constructed networks to find the proper parameters. Normally, we set the time span, time step and dataset size as , and . The choice of in (14) is largely flexible since NSSNN is not sensitive to the parameter when it is larger than a certain threshold. We pick the loss function to train our network due to its better performance. In addition, we already introduced a regularization term in the symplectic integrator embedded in the network; thus, there is no need to add the regularization term in the loss function. The integration time step in the symplectic integrator is a vital parameter, and the choice of largely depends on the time span . In general, we should take relatively small for the dataset with larger time span .
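As a worked example of this schedule, the sketch below computes the learning rate at a few epochs. The function name `step_decay_lr` is hypothetical. In PyTorch the same rule is available as `torch.optim.lr_scheduler.StepLR` with `step_size=10` and `gamma=0.8`, although which scheduler was actually used is not stated here.

```python
def step_decay_lr(epoch, base_lr=0.05, factor=0.8, step=10):
    """Learning rate in effect after `epoch` completed epochs."""
    return base_lr * factor ** (epoch // step)


# Learning rates at a few representative epochs.
schedule = {e: step_decay_lr(e) for e in (0, 9, 10, 25, 50)}
```

After 50 epochs the rate has been reduced five times, to roughly a third of its initial value (0.05 × 0.8⁵ ≈ 0.0164).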
5.2.2 Spring system
We compare five implementations that learn and predict Hamiltonian systems. The first one is NeuralODE [50], which trains the system by embedding the network into the Runge-Kutta (RK) integrator. The other four, however, achieve the goal by fitting the Hamiltonian based on (6). Specifically, HNN trains the network with the constraints of the Hamiltonian symplectic gradient along with the time derivative of system variables and then embeds the well-trained into the RK integrator for predicting the system [15]. The third and fourth implementations are ablation tests. One of them is improved HNN (IHNN), which embeds the well-trained into the nonseparable symplectic integrator (Tao’s integrator) for predicting. The other is to directly embed into the RK integrator for training, which we call HRK. The fifth method is NSSNN, which embeds into the nonseparable symplectic integrator for training.
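To make the role of the nonseparable symplectic integrator concrete, the following is a minimal sketch of one second-order step of Tao's scheme [55] for a scalar system. It assumes the gradients of the Hamiltonian are given as plain functions `dHdq` and `dHdp` (in NSSNN they would come from automatic differentiation of the network), and the test Hamiltonian, step size, and binding constant `omega` are illustrative choices rather than the settings used in our experiments.

```python
import math


def tao_step(q, p, x, y, dHdq, dHdp, dt, omega):
    """One second-order step of Tao's explicit integrator (scalar q, p).

    (x, y) is the auxiliary copy of (q, p) in the extended phase space.
    dHdq(a, b) and dHdp(a, b) are the partial derivatives of H evaluated
    at position a and momentum b.
    """
    def phi_a(q, p, x, y, h):      # exact flow of H(q, y)
        return q, p - h * dHdq(q, y), x + h * dHdp(q, y), y

    def phi_b(q, p, x, y, h):      # exact flow of H(x, p)
        return q + h * dHdp(x, p), p, x, y - h * dHdq(x, p)

    def phi_c(q, p, x, y, h):      # exact flow of the binding term
        c, s = math.cos(2 * omega * h), math.sin(2 * omega * h)
        sq, sp, dq, dp = q + x, p + y, q - x, p - y
        rq, rp = c * dq + s * dp, -s * dq + c * dp
        return (sq + rq) / 2, (sp + rp) / 2, (sq - rq) / 2, (sp - rp) / 2

    state = phi_a(q, p, x, y, dt / 2)
    state = phi_b(*state, dt / 2)
    state = phi_c(*state, dt)
    state = phi_b(*state, dt / 2)
    return phi_a(*state, dt / 2)


# Illustrative nonseparable Hamiltonian H = (q^2 + 1)(p^2 + 1) / 2.
def hamiltonian(q, p):
    return 0.5 * (q * q + 1) * (p * p + 1)


def dHdq(q, p):
    return q * (p * p + 1)


def dHdp(q, p):
    return p * (q * q + 1)


q, p = 1.0, 0.0
x, y = q, p                        # the copy starts on top of the original
h0 = hamiltonian(q, p)
for _ in range(1000):
    q, p, x, y = tao_step(q, p, x, y, dHdq, dHdp, dt=0.005, omega=20.0)
energy_drift = abs(hamiltonian(q, p) - h0)
```

HRK and NeuralODE replace this step with a Runge-Kutta update, which has no such structure-preserving property.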
![Refer to caption](x11.png)
For a fair comparison, we adopt the same network structure (except that the dimension of the output layer in NeuralODE is twice that in the other four), the same loss function, and the same dataset size for all methods. All integration schemes are second-order accurate, and the other parameters are consistent with those in Section 5.2.1. The time derivative in the dataset for training HNN and IHNN is obtained by the first difference method
(68) |
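A first difference of this kind can be sketched as follows. The forward-difference form, the function name `first_difference`, and the sine test signal are assumptions for illustration. The estimate is only first-order accurate, so its error grows in proportion to the spacing between samples.

```python
import math


def first_difference(series, dt):
    """Forward-difference estimate of the time derivative.

    Returns one value per sample except the last.
    """
    return [(series[i + 1] - series[i]) / dt for i in range(len(series) - 1)]


# Toy signal q(t) = sin(t), whose exact derivative is cos(t).
dt = 0.01
q = [math.sin(i * dt) for i in range(101)]
dq = first_difference(q, dt)
max_err = max(abs(dq[i] - math.cos(i * dt)) for i in range(len(dq)))
```

Here `max_err` is bounded by `dt / 2` times the largest second derivative of the signal, so doubling the sample spacing roughly doubles the error in the derivative labels.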
Figure 13 demonstrates the differences between the five methods using a spring system with different time spans and the same time step . We can see that by introducing the nonseparable symplectic integrator into the prediction of the Hamiltonian system, NSSNN has a stronger long-term predicting ability than all the other methods. In addition, the prediction of HNN and IHNN relies on a dataset with time derivatives; consequently, it leads to a larger error when the given time span is large.
5.2.3 Modeling vortex dynamics of multi-particle system
For two-dimensional vortex particle systems, the dynamical equations of particle positions with particle strengths can be written in the generalized Hamiltonian form as
(69) |
By including the given particle strengths in Algorithm 1, we can still adopt the method mentioned above to learn the Hamiltonian in (69) when there are fewer particles. However, considering a system with particles, the cost to collect training data from all particles might be high, and the training process can be time-consuming. Thus, instead of collecting information from all particles to train our model, we only use data collected from two bodies as training data to make predictions of the dynamics of particles.
Specifically, we assume the interactive models between particle pairs with unit particle strengths are the same, and their corresponding Hamiltonian can be represented as network , based on which the corresponding Hamiltonian of particles can be written as [19, 18]
(70) |
We embed (70) into the symplectic integrator that includes to obtain the final network architecture.
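The idea of assembling a many-particle Hamiltonian from a single learned pair interaction can be sketched as below. Two things are assumptions: the form of the sum (each unordered pair weighted by the product of its strengths, as in the classical point-vortex Hamiltonian), and the analytic `pair_hamiltonian`, which stands in for the trained network only so that the example runs without a model.

```python
import math
from itertools import combinations


def pair_hamiltonian(xi, xj):
    """Stand-in for the trained two-particle network (unit strengths).

    Assumption: the analytic free-space point-vortex interaction.
    """
    r = math.dist(xi, xj)
    return -math.log(r) / (2 * math.pi)


def total_hamiltonian(positions, strengths, pair_h=pair_hamiltonian):
    """Sum strength-weighted pair interactions over all unordered pairs."""
    return sum(strengths[i] * strengths[j] * pair_h(positions[i], positions[j])
               for i, j in combinations(range(len(positions)), 2))


# Three hypothetical particles with different strengths.
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
strengths = [1.0, 2.0, 3.0]
h_total = total_hamiltonian(positions, strengths)
```

Because only `pair_h` is learned, the same trained model can be reused for any number of particles; the cost of evaluating the sum grows quadratically with the particle count.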
![Refer to caption](x12.png)
The setup of the multi-particle problem is similar to the previous problems. The training time span is while the prediction period can be up to . We use 2048 clean data samples to train our model. The training process takes about 100 epochs for the loss to converge. In Figure 14, we use our trained model to predict the dynamics of 6000-particle systems, including Taylor and Leapfrog vortices. We generate results of Taylor vortex and Leapfrog vortex using NSSNN and HNN and compare them with the ground truth. Vortex elements are used with corresponding initial vorticity conditions of Taylor vortex and Leapfrog vortex [68]. The difficulty of the numerical modeling of these two systems lies in keeping the different dynamical vortices separated rather than letting them merge into a bigger structure. In both cases, the vortices evolved using NSSNN are separated nicely as the ground truth shows, while the vortices merge together using HNN.
5.3 RoeNet
5.3.1 Dataset generation and training settings
For our experiments, we construct datasets using either analytical solutions or numerical solutions calculated with a high-resolution finite difference method. These datasets are then divided into training and validation sets in a ratio. The physical quantities solved in our experiments are of order and, consequently, do not require normalization.
We train the network over a time span defined as and use it to predict target values over a time span of , where and starts no earlier than .
In all experiments, the Adam optimizer [48] is employed, with a learning rate of as listed in Table 3. The learning rate decays by a multiplicative factor of 0.9 every 5 to 20 epochs. This optimizer is chosen for its ability to adapt learning rates based on the gradient history of each parameter, which facilitates faster and more precise convergence compared to methods with fixed learning rates. Training is conducted with batch sizes ranging from 8 to 32, and all models undergo 100 epochs to ensure convergence. Notably, extending the number of training epochs can enhance training accuracy, reflecting a trade-off between training time and accuracy.
Parameter | 1C Linear | Sod Tube |
Boundary condition | Periodic | Neumann |
Time step | 0.02 | 0.001 |
Space step | 0.01 | 0.005 |
Training time span | 0.04 | 0.06 |
Predicting time span | 2 | 0.1 |
Data set samples | 500 | 2000 |
Data set generation | Analytical | Analytical |
Components number | 1 | 3 |
Hidden dimension | 1 | 64 |
5.3.2 A simple example
![Refer to caption](x13.png)
![Refer to caption](x14.png)
![Refer to caption](x15.png)
![Refer to caption](x16.png)
Taking a linear hyperbolic PDE with one component (1C Linear in Table 3)
(71) |
in (18) as an example, we evaluate the performance of RoeNet. This hyperbolic PDE models a Gaussian wave traveling along a line at constant speed. Figure 15 illustrates the propagation of this Gaussian wave over time, simulated using RoeNet with both clean and noisy training data sets, alongside results from the Roe solver and the analytical solution. RoeNet’s predictions, regardless of noise in the training data, align closely with the analytical results throughout the entire computational time domain. In contrast, simulations using the Roe solver show rapid flattening and dissipation of the wave over time. Although the prediction error of RoeNet does accumulate gradually, this increase in numerical error is significantly slower than that observed with traditional numerical methods. As a result, RoeNet demonstrates superior performance with its more accurate predictions.
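The flattening seen with the Roe solver can be reproduced in a few lines. For a scalar linear flux the Roe flux reduces to first-order upwinding, which is numerically diffusive. The sketch below assumes a flux F(u) = u with unit wave speed, a periodic unit domain, and step sizes chosen to give a CFL number of 0.5; these values are illustrative and are not the settings of Table 3.

```python
import math


def roe_flux(u_left, u_right, speed):
    """Roe numerical flux for the scalar linear flux F(u) = speed * u."""
    return (0.5 * speed * (u_left + u_right)
            - 0.5 * abs(speed) * (u_right - u_left))


def advect(u, speed, dt, dx, steps):
    """Advance a periodic 1D field with a first-order Roe (upwind) scheme."""
    n = len(u)
    for _ in range(steps):
        # flux[i] is the flux through the interface between cells i and i+1.
        flux = [roe_flux(u[i], u[(i + 1) % n], speed) for i in range(n)]
        u = [u[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(n)]
    return u


dx, dt, speed = 0.01, 0.005, 1.0   # assumed values, CFL number 0.5
x = [i * dx for i in range(100)]
u0 = [math.exp(-300.0 * (xi - 0.5) ** 2) for xi in x]
u1 = advect(u0, speed, dt, dx, steps=200)   # one full trip around the domain
```

The exact solution after one period is the initial Gaussian itself, yet the peak of `u1` is visibly lower than that of `u0` while the total mass is unchanged, which is the dissipation that RoeNet avoids.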
5.3.3 Sod shock tube
We take the one-dimensional diatomic ideal gas problem to assess the performance of our model on solving multi-component Riemann problems with nonlinear flux functions (Sod Tube in Table 3). Specifically, the system is modeled by (18) with
(72) |
where is the density, is the pressure, is the energy, is the velocity, and the pressure is related to the conserved quantities through the equation of state with a heat capacity ratio . We apply our model to the Sod shock tube problem [69], a one-dimensional Riemann problem in the form of (18) with (72). The time evolution of this problem can be described by solving the mass, momentum, and energy conservation of ideal gas inside a slender tube, which leads to three characteristics, describing the propagation speed of various regions in the system [69]. In Figure 16, we plot the three components of the problem, at . Note that due to the dissipation effects incorporated in our model, there is no sign of sonic glitch. The result shows that RoeNet exhibits higher accuracy in predicting the discontinuities of the nonlinear Riemann problem.
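For readers who want to check the quantities involved, the sketch below evaluates the standard one-dimensional Euler flux and the ideal-gas equation of state. The variable names, the heat capacity ratio of 1.4 (the usual value for a diatomic gas), and the classical Sod initial states [69] are assumptions about the setup rather than values taken from our configuration.

```python
def pressure(rho, mom, energy, gamma=1.4):
    """Ideal-gas equation of state in terms of conserved variables."""
    return (gamma - 1.0) * (energy - 0.5 * mom * mom / rho)


def euler_flux(rho, mom, energy, gamma=1.4):
    """Flux of the 1D Euler equations for conserved (rho, rho*v, E)."""
    v = mom / rho
    p = pressure(rho, mom, energy, gamma)
    return (mom, mom * v + p, v * (energy + p))


def conserved(rho, v, p, gamma=1.4):
    """Convert primitive (rho, v, p) to conserved (rho, rho*v, E)."""
    return (rho, rho * v, p / (gamma - 1.0) + 0.5 * rho * v * v)


# Classical Sod initial states: gas at rest on both sides of a diaphragm.
left = conserved(1.0, 0.0, 1.0)
right = conserved(0.125, 0.0, 0.1)
flux_left = euler_flux(*left)
```

With the gas initially at rest, the mass and energy fluxes vanish and only the pressure contributes to the momentum flux; the jump in that term across the diaphragm is what drives the shock, contact, and rarefaction waves.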
![Refer to caption](x17.png)
![Refer to caption](x18.png)
![Refer to caption](x19.png)
5.3.4 Comparison with other methods
Current neural network methods, such as Physics-Informed Neural Networks (PINNs) [27], typically require a pre-established PDE model and continuous interaction with this model during training to adjust the loss. They often use quasi-Newton optimizers such as L-BFGS, which can lead to long training times. In contrast, RoeNet operates without explicit knowledge of the governing equation, using only the training datasets and first-order gradient-based optimizers such as SGD or Adam [48].
Conventional neural networks struggle to predict the emergence and evolution of discontinuous solutions without a governing equation. Our model, RoeNet, showcases a unique capability to handle tasks that traditional machine learning approaches cannot, particularly in predicting dynamics for future times not included in the training data. This is demonstrated in Figure 17, where RoeNet outperforms PINNs [70] in the simulation of the 1C Linear problem described in Section 5.3.2, providing accurate predictions for future states beyond the training scope.
![Refer to caption](extracted/5682117/graph/DeepXDE.png)
RoeNet, as a data-driven solver, does not require prior knowledge of the system’s evolution equations, setting it apart from traditional numerical methods. It employs an optimization-based approach to construct its numerical scheme, with an optimization space that fully encompasses that of the Roe solver. This enables RoeNet to deliver more precise simulations of PDE evolution compared to conventional numerical approaches.
5.4 NVM
5.4.1 Dataset generation and training settings
We randomly sample 2 to 6 vortices and create the initial vorticity field through convolution with a Gaussian kernel . This process is repeated 2000 times to generate samples. DNS is performed to solve (31) in the periodic box using a standard pseudo-spectral method [71]. Aliasing errors are removed using the two-thirds truncation method with the maximum wavenumber . The Fourier coefficients of the velocity are advanced in time using a second-order Adams–Bashforth method. The time step is chosen to ensure that the Courant–Friedrichs–Lewy number is less than for numerical stability and accuracy. To obtain accurate DNS data samples, we set the grid size as . Regarding the kinematic viscosity, we set and for different cases. The pseudo-spectral method used in this DNS is similar to that described in [61, 72, 73].
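The second-order Adams-Bashforth update used for time advancement can be sketched on a scalar toy problem. In the DNS, the right-hand side would be the Fourier-space nonlinear and viscous terms; here a simple decay equation stands in for it, and the function name and step size are illustrative assumptions.

```python
import math


def adams_bashforth2(f, u0, dt, steps):
    """Integrate du/dt = f(u) with second-order Adams-Bashforth.

    The first step uses forward Euler because the scheme needs the
    slopes at two previous time levels.
    """
    f_prev = f(u0)
    u = u0 + dt * f_prev
    for _ in range(steps - 1):
        f_curr = f(u)
        u = u + dt * (1.5 * f_curr - 0.5 * f_prev)
        f_prev = f_curr
    return u


# Toy check on du/dt = -u, whose exact solution is exp(-t).
u_end = adams_bashforth2(lambda u: -u, 1.0, dt=0.001, steps=1000)
error = abs(u_end - math.exp(-1.0))
```

The scheme costs one right-hand-side evaluation per step, which matters when each evaluation involves forward and inverse Fourier transforms of the whole field.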
We use samples with the time span for the training of the dynamics network. The DNS dataset is generated with random initial conditions independent of the predicted vortex evolution. The time step of vortex evolution is set as . For the leapfrog example, we set the parameters as and . For the turbulent flow example, we set the parameters as and . For other examples, the parameters are set as and . In general, the parameters are chosen within a wide range, indicating the robustness of the network. We use the trained network to predict the vortex dynamics at time . As shown in the results section, the prediction time span is larger than the training time span, in some cases up to tens of times .
For both the detection network and the dynamics network, we use the Adam optimizer [48] with a learning rate of 1e-3. The learning rate decays every 20 epochs by a multiplicative factor of 0.8. For the detection network, we use a batch size of 32 and train it for 350 epochs. We use the cross entropy as the classification loss and L1 loss for position prediction. To relieve the unbalanced data problem in the detection network, we implement Focal loss [65] with and . It takes 15 minutes to converge on a single Nvidia RTX 2080Ti GPU. For the dynamics network, we use a batch size of 64 and train it for 500 epochs. We use L1 loss for position prediction. It takes 25 minutes to converge on a single Nvidia RTX 2080Ti GPU.
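The effect of Focal loss on an unbalanced classification problem can be seen from its definition for a single prediction. The sketch below follows the binary form given by Lin et al. [65]; the default values of `alpha` and `gamma` are the ones suggested in that paper and are not necessarily the values used for our detection network.

```python
import math


def focal_loss(prob, label, alpha=0.25, gamma=2.0):
    """Binary focal loss for one prediction.

    prob: predicted probability of the positive class; label: 0 or 1.
    alpha and gamma default to the values suggested by Lin et al.
    """
    p_t = prob if label == 1 else 1.0 - prob
    alpha_t = alpha if label == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.95, 1)   # confident, correct prediction
hard = focal_loss(0.30, 1)   # poorly classified positive
```

The factor `(1 - p_t) ** gamma` shrinks the contribution of well-classified examples, so the many easy background cells do not dominate the gradient; setting `gamma=0` recovers a class-weighted cross entropy.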
5.4.2 Comparison between NVM and LVM
![Refer to caption](x20.png)
To demonstrate that NVM is a better approach to capturing fluid dynamics than the traditional LVM, we compare the prediction results of the NVM and the LVM for solving NS equations in the periodic box. In the prediction, we initialize two vortex particles at and , where the corresponding particle strengths are and . We plot the results using the NVM and LVM and the relative error of velocity in the simulation in Figure 18 (a), (b), and (c), respectively. Here, the relative error of velocity is defined as
(73) |
where denotes the predicted or simulated solution and denotes the ground truth solution.
In Figure 18 (a), the predictions made by NVM match the positions of the vortices generated by DNS almost perfectly, while the predictions made by the BS law in Figure 18 (b) contain a large error. The relative error of velocity in Figure 18 (c) shows that the advantage of NVM over the traditional method grows as the prediction period becomes longer.
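A relative error of this kind can be computed as in the sketch below, which assumes an L2 norm over velocity samples on a common set of grid points. The choice of norm, the function name, and the toy data are assumptions for illustration.

```python
import math


def relative_error(u_pred, u_true):
    """Relative L2 error between two sampled 2D velocity fields.

    Each field is a list of (u, v) samples at the same grid points.
    """
    diff = sum((a - c) ** 2 + (b - d) ** 2
               for (a, b), (c, d) in zip(u_pred, u_true))
    norm = sum(c * c + d * d for c, d in u_true)
    return math.sqrt(diff / norm)


# Hypothetical samples at four grid points.
u_true = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
u_pred = [(1.1, 0.0), (0.0, 0.9), (-1.0, 0.1), (0.0, -1.0)]
err = relative_error(u_pred, u_true)
```

Normalizing by the magnitude of the reference field makes errors comparable across cases with different vortex strengths.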
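For comparison, the velocity that a traditional vortex method assigns to each particle can be sketched with the two-dimensional Biot-Savart kernel for point vortices. The sketch assumes free space (no periodic images) and no core regularization, and the two vortices are hypothetical, so it illustrates the kind of closed-form rule that NVM replaces with a learned one rather than the exact LVM used in the comparison.

```python
import math


def biot_savart_velocity(point, vortices):
    """Velocity induced at `point` by 2D point vortices in free space.

    vortices: list of ((x, y), strength) pairs. Periodic images and
    core regularization are ignored in this sketch.
    """
    px, py = point
    u = v = 0.0
    for (vx, vy), gamma in vortices:
        dx, dy = px - vx, py - vy
        r2 = dx * dx + dy * dy
        u += -gamma * dy / (2.0 * math.pi * r2)
        v += gamma * dx / (2.0 * math.pi * r2)
    return u, v


# Two hypothetical vortices; each moves with the velocity induced by the other.
vortices = [((0.0, 0.0), 2.0 * math.pi), ((1.0, 0.0), 2.0 * math.pi)]
vel_second = biot_savart_velocity(vortices[1][0], vortices[:1])
vel_first = biot_savart_velocity(vortices[0][0], vortices[1:])
```

Two equal vortices induce equal and opposite velocities on each other and therefore rotate about their midpoint. The kernel contains no viscosity and no periodic-box correction, which is one plausible source of the deviation from DNS.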
5.4.3 Turbulent flows
![Refer to caption](x21.png)
Besides simple systems, NVM is capable of predicting complicated turbulence systems. This example’s primary purpose is to illustrate our network’s ability to handle more complex problems.
Figure 19 depicts the two-dimensional Lagrangian scalar fields at with the initial condition and resolution . The governing equation of the Lagrangian scalar fields is
(74) |
The evolution of the Lagrangian scalar fields is induced by and NVM vortex particles at random positions with random strengths . We remark that the same trained model is used for both cases. There is no correlation between the positions and vortex particle strengths of the two sets of vortex particles.
Based on the particle velocity field from the NVM, a backward-particle-tracking method is applied to solve (74). Then the iso-contour of the Lagrangian field can be extracted as material structures in the evolution [74, 75, 76, 77, 78]. In Figure 19 (a), the spiral structure [79, 80] of individual NVM vortex particles can be observed clearly due to the small number of NVM vortex particles. In Figure 19 (b), the underlying field exhibits turbulent behaviors since it is generated with a large number of NVM vortex particles.
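Backward particle tracking can be sketched as follows, assuming that (74) is a pure advection equation so that the scalar is constant along particle paths. The rigid-rotation `velocity` function is a stand-in for the NVM velocity field, and the fourth-order Runge-Kutta tracer and the initial field are illustrative choices.

```python
import math


def velocity(x, y, t):
    """Assumed stand-in velocity field: rigid counter-clockwise rotation."""
    return -y, x


def trace_back(x, y, t_end, steps):
    """Integrate dx/dt = u backward from t_end to 0 with classical RK4."""
    h = -t_end / steps
    t = t_end
    for _ in range(steps):
        k1 = velocity(x, y, t)
        k2 = velocity(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1], t + 0.5 * h)
        k3 = velocity(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1], t + 0.5 * h)
        k4 = velocity(x + h * k3[0], y + h * k3[1], t + h)
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        t += h
    return x, y


def scalar_at(x, y, t_end, phi0, steps=200):
    """Value of the advected scalar: phi0 at the particle's start point."""
    return phi0(*trace_back(x, y, t_end, steps))


# The particle found at (0, 1) after a quarter turn started at (1, 0).
phi = scalar_at(0.0, 1.0, math.pi / 2, phi0=lambda x, y: x)
```

Each grid point is traced independently, so the scalar field can be evaluated at any resolution without diffusing it, and the work parallelizes naturally on a GPU.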
Generally, high-resolution results such as those in Figure 19 can only be achieved with supercomputers when using grid-based methods [74], while NVM allows them to be generated on a laptop with a GPU. This demonstrates that NVM can produce an accurate depiction of complex turbulence systems at low computational cost.
6 Conclusion
6.1 Summary
This thesis introduces a novel data-driven framework, which demonstrates a significant advancement in predictive modeling for long-term forecasts by integrating physics-based priors into learning algorithms. This integration ensures intrinsic preservation of the physical structures of the systems analyzed, thereby maintaining mathematical symmetries and physical conservation laws. As a result, the models demonstrate superior performance in terms of prediction accuracy, robustness, and predictive capability, particularly in identifying patterns not present within the training dataset, despite the use of small datasets, short training periods, and small sample sizes.
In particular, we have developed four distinct algorithms, each designed to incorporate specific physics-based priors relevant to different types of nonlinear systems. These include the symplectic structure for both separable and nonseparable Hamiltonian systems, Hyperbolic Conservation Law for hyperbolic partial differential equations, and Helmholtz’s Theorem for incompressible fluid dynamics. The integration of physics-based priors not only narrows the solution space, thereby streamlining computational demands, but also enhances the reliability and validity of the predictions. Moreover, embedding these structures within neural networks significantly expands their capacity to capture and reproduce complex patterns inherent in physical phenomena, which conventional networks often fail to recognize. This expanded capability allows for a more comprehensive representation of potential physical behaviors, substantially improving the models’ applicability and predictive accuracy.
6.2 Limitations and Future Work
We also recognize that our models have several limitations. Firstly, neural networks that include an embedded integrator often require a longer training period compared to those trained on datasets with explicit time derivatives. Secondly, our method employs an explicit scheme for time evolution, which necessitates a small time step to ensure accuracy. Although a smaller time step can lead to higher discretization accuracy, this advantage must be weighed against increased training costs and the risk of gradient explosion. In our future work, we are considering the adoption of implicit schemes, such as leveraging RNN structures, which may offer more stability and efficiency. In addition, our current model is designed as an end-to-end system that does not account for environmental variability. To address this issue, we will explore online learning techniques to enhance the model’s adaptability in changing conditions. Lastly, to enhance the applicability of our model, a significant focus of our future research will be dedicated to developing scalable methods that can be generalized to various PDEs, aiming to achieve a versatile and universally applicable framework for various systems.
References
- Tong et al. [2021] Yunjin Tong, Shiying Xiong, Xingzhe He, Guanghan Pan, and Bo Zhu. Symplectic neural networks in taylor series form for hamiltonian systems. Journal of Computational Physics, 437:110325, 2021.
- Xiong et al. [2020] Shiying Xiong, Yunjin Tong, Xingzhe He, Shuqi Yang, Cheng Yang, and Bo Zhu. Nonseparable symplectic neural networks. arXiv preprint arXiv:2010.12636, 2020.
- Xiong et al. [2023] Shiying Xiong, Xingzhe He, Yunjin Tong, Yitong Deng, and Bo Zhu. Neural vortex method: From finite lagrangian particles to infinite dimensional eulerian dynamics. Computers & Fluids, 258:105811, 2023.
- Tong et al. [2024] Yunjin Tong, Shiying Xiong, Xingzhe He, Shuqi Yang, Zhecheng Wang, Rui Tao, Runze Liu, and Bo Zhu. Roenet: Predicting discontinuity of hyperbolic systems from continuous data. International Journal for Numerical Methods in Engineering, 125(6):e7406, 2024.
- Weinan [2021] E Weinan. The dawning of a new era in applied mathematics. Notices of the American Mathematical Society, 68(4):565–571, 2021.
- Brunton et al. [2020] S. L. Brunton, B. R. Noack, and P. Koumoutsakos. Machine Learning for Fluid Mechanics. Annu. Rev. Fluid Mech., 52:477–508, 2020.
- Hughes et al. [2019] T. W. Hughes, I. A. D. Williamson, M. Minkov, and S. Fan. Wave physics as an analog recurrent neural network. Sci. Adv., 5:6946, 2019.
- Sellier et al. [2019] J. M. Sellier, G. M. Caron, and J. Leygonie. Signed particles and neural networks, towards efficient simulations of quantum systems. J. Comput. Phys., 387:154–162, 2019.
- Hernandez et al. [2020] Quercus Hernandez, Alberto Badias, David Gonzalez, Francisco Chinesta, and Elias Cueto. Structure-preserving neural networks. arXiv:2004.04653, 2020.
- Teichert et al. [2019] G. H. Teichert, A. R. Natarajan, A. Van der Ven, and K. Garikipati. Machine learning materials physics: Integrable deep neural networks enable scale bridging by learning free energy functions. Comput. Methods Appl. Mech. Engrg., 353:201–216, 2019.
- Regazzoni et al. [2019] F Regazzoni, L Dedé, and A Quarteroni. Machine learning for fast and reliable solution of time-dependent differential equations. J. Comput. Phys., 397:108852, 2019.
- Raissi and Karniadakis [2018] M. Raissi and G. E. Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential equations. J. Comput. Phys., 357:125–141, 2018.
- Sirignano and Spiliopoulos [2018] J. Sirignano and K. Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys., 375:686–707, 2018.
- Raissi et al. [2019] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys., 378:686–707, 2019.
- Greydanus et al. [2019] S. Greydanus, M. Dzamba, and J. Yosinski. Hamiltonian neural networks. In Conference on Neural Information Processing Systems, pages 15379–15389, 2019.
- Chen et al. [2020] Z. Chen, J. Zhang, M. Arjovsky, and L. Bottou. Symplectic recurrent neural networks. In International Conference on Learning Representations, 2020.
- DiPietro et al. [2020] D. DiPietro, S. Xiong, and B. Zhu. Sparse symplectically integrated neural networks. In Advances in Neural Information Processing Systems, 2020.
- Sanchez-Gonzalez et al. [2019] A. Sanchez-Gonzalez, V. Bapst, K. Cranmer, and P. Battaglia. Hamiltonian graph networks with ODE integrators. arXiv:1909.12790, 2019.
- Battaglia et al. [2016] P. Battaglia, R. Pascanu, M. Lai, and D. J. Rezende. Interaction networks for learning about objects, relations and physics. In Advances in Neural Information Processing Systems, pages 4502–4510, 2016.
- Jin et al. [2020] P. Jin, A. Zhu, G. E. Karniadakis, and Y. Tang. Symplectic networks: intrinsic structure-preserving networks for identifying Hamiltonian systems. arXiv:2001.03750, 2020.
- Toth et al. [2020] P. Toth, D. J. Rezende, A. Jaegle, S. Racaniére, A. Botev, and I. Higgins. Hamiltonian generative networks. In International Conference on Learning Representations, 2020.
- Zhong et al. [2020] Y. D. Zhong, B. Dey, and A. Chakraborty. Symplectic ODE-Net: learning Hamiltonian dynamics with control. In International Conference on Learning Representations, 2020.
- Yarotsky [2017] D. Yarotsky. Error bounds for approximations with deep ReLU networks. Neural Netw., 94:103–114, 2017.
- Petersen and Voigtländer [2018] P. Petersen and F. Voigtländer. Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Netw., 170:296–330, 2018.
- Imaizumi and Fukumizu [2019] M. Imaizumi and K. Fukumizu. Deep learning networks learn non-smooth functions effectively. In The Institute of Statistical Mathematics, pages 869–878. The 22nd International Conference on Artificial Intelligence and Statistics, 2019.
- Suzuki [2019] T. Suzuki. Adaptivity of deep relu network for learning in besov and mixed smooth besov spaces: Optimal rate and curse of dimensionality. In The University of Tokyo. International Conference on Learning Representations, 2019.
- Raissi et al. [2017] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Inferring solutions of differential equations using noisy multi-fidelity data. J. Comput. Phys., 335:736–746, 2017.
- Hornik et al. [1989] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Netw., 2:359–366, 1989.
- Zhang et al. [2019] D. Zhang, L. Guo, and G. E. Karniadakis. Learning in modal space: solving time-dependent stochastic PDEs using physics-informed neural networks. SIAM J. Sci. Comput., 42:A639–A665, 2019.
- Michoski et al. [2019] C. Michoski, M. Milosavljevic, T. Oliver, and D. Hatch. Solving differential equations using deep neural networks. Neurocomputing, 399:193–212, 2019.
- Mao et al. [2020] Z. Mao, A. D. Jagtap, and G. E. Karniadakis. Physics-informed neural networks for high-speed flows. Comput. Method. Appl. M., 360:112789, 2020.
- Duraisamy et al. [2019] K. Duraisamy, G. Iaccarino, and H. Xiao. Turbulence modeling in the age of data. Annu. Rev. Fluid Mech., 51:357–377, 2019.
- Xie et al. [2018] Y. Xie, E. Franz, M. Chu, and N. Thuerey. tempogan: A temporally coherent, volumetric gan for super-resolution fluid flow. ACM Trans. Graph., 37(4):1–15, 2018.
- Chu and Thuerey [2017] M. Chu and N. Thuerey. Data-driven synthesis of smoke flows with cnn-based feature descriptors. ACM Trans. Graph., 36(4):1–14, 2017.
- Anderson et al. [1996] J. Anderson, I. Kevrekidis, and R. Rico-Martinez. A comparison of recurrent training algorithms for time series analysis and system identification. Comput. Chem. Eng., 20:S751–S756, 1996.
- Crutchfield and McNamara [1987] James P Crutchfield and Bruce S McNamara. Equations of motion from a data series. Complex Syst., 1:417–452, 1987.
- Daniels and Nemenman [2015] Bryan C Daniels and Ilya Nemenman. Automated adaptive inference of phenomenological dynamical models. Nat. Commun., 6(1):1–8, 2015.
- Wang et al. [2017] J. Wang, J. Wu, and H. Xiao. Physics-informed machine learning approach for reconstructing reynolds stress modeling discrepancies based on dns data. Phys. Rev. Fluids, 2(3):034603, 2017.
- Hammond et al. [2022] J. Hammond, F. Montomoli, M. Pietropaoli, R. D. Sandberg, and V. M. Machine learning for the development of data-driven turbulence closures in coolant systems. J. Turbomach., 144(8):081003, 2022.
- Xu et al. [2022] X. Xu, F. Waschkowski, A. S. Ooi, and R. D. Sandberg. Towards robust and accurate reynolds-averaged closures for natural convection via multi-objective cfd-driven machine learning. Int. J. Heat Mass Transf., 187:122557, 2022.
- Mohan et al. [2020a] A. T. Mohan, N. Lubbers, D. Livescu, and M. Chertkov. Embedding hard physical constraints in convolutional neural networks for 3D turbulence. In International Conference on Learning Representations, 2020a.
- Yang et al. [2019] X. Yang, S. Zafar, J. Wang, and H. Xiao. Predictive large-eddy-simulation wall modeling via physics-informed neural networks. Phys. Rev. Fluids, 4:034602, 2019.
- Raissi et al. [2020] M. Raissi, A. Yazdani, and G. E. Karniadakis. Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science, 367(6481):1026–1030, 2020.
- Belbute-Peres et al. [2020] F. Belbute-Peres, T. Economon, and Z. Kolter. Combining differentiable pde solvers and graph neural networks for fluid flow prediction. In International Conference on Machine Learning, pages 2402–2411, 2020.
- Lye et al. [2020] K. Lye, S. Mishra, and D. Ray. Deep learning observables in computational fluid dynamics. J. Comput. Phys., 410:109339, 2020.
- White et al. [2019] Cristina White, Daniela Ushizima, and Charbel Farhat. Neural networks predict fluid dynamics solutions from tiny datasets. arXiv preprint arXiv:1902.00091, 2019.
- Mohan et al. [2020b] Arvind T Mohan, Nicholas Lubbers, Daniel Livescu, and Michael Chertkov. Embedding hard physical constraints in neural network coarse-graining of 3d turbulence. arXiv preprint arXiv:2002.00021, 2020b.
- Kingma and Ba [2014] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
- He et al. [2016] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
- Chen et al. [2018] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud. Neural ordinary differential equations. In Conference on Neural Information Processing Systems, pages 6571–6583, 2018.
- Pontryagin [2018] Lev Semenovich Pontryagin. Mathematical theory of optimal processes. Routledge, 2018.
- Forest and Ruth [1990] E. Forest and R. D. Ruth. Fourth-order symplectic integration. Physica D, 43:105–117, 1990.
- Yoshida [1990] H. Yoshida. Construction of higher order symplectic integrators. Phys. Lett. A, 150:262–268, 1990.
- Candy and Rozmus [1991] J. Candy and W. Rozmus. A symplectic integration algorithm for separable Hamiltonian functions. J. Comput. Phys., 92:230–256, 1991.
- Tao [2016] Molei Tao. Explicit symplectic approximation of nonseparable hamiltonians: Algorithm and long time performance. Physical Review E, 94(4):043303, 2016.
- Wu et al. [2015] J. Z. Wu, H. Y. Ma, and M. D. Zhou. Vortical Flows. Springer, 2015.
- Evans [2010] L. C. Evans. Partial Differential Equations. American Mathematical Society, 2 edition, 2010.
- Roe [1981] P. L. Roe. Approximate riemann solvers, parameter vectors and difference schemes. J. Comput. Phys., 43:357–372, 1981.
- Helmholtz [1858] H. Helmholtz. Über Integrale der hydrodynamischen Gleichungen, welche den Wirbelbewegungen entsprechen. J. Reine Angew. Math., 55:25–55, 1858.
- Yang and Pullin [2010] Y. Yang and D. I. Pullin. On Lagrangian and vortex-surface fields for flows with Taylor–Green and Kida–Pelz initial conditions. J. Fluid Mech., 661:446–481, 2010.
- Xiong and Yang [2017] S. Xiong and Y. Yang. The boundary-constraint method for constructing vortex-surface fields. J. Comput. Phys., 339:31–45, 2017.
- Hao et al. [2019] J. Hao, S. Xiong, and Y. Yang. Tracking vortex surfaces frozen in the virtual velocity in non-ideal flows. J. Fluid Mech., 863:513–544, 2019.
- Cottet and Koumoutsakos [2000] G.H. Cottet and P.D. Koumoutsakos. Vortex Methods: Theory and Practice. Cambridge University Press, 2000.
- Redmon et al. [2016] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
- Lin et al. [2017] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 2980–2988, 2017.
- Paszke et al. [2019] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8026–8037, 2019.
- Glorot and Bengio [2010] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249–256, 2010.
- Qu et al. [2019] Z. Qu, X. Zhang, M. Gao, C. Jiang, and B. Chen. Efficient and conservative fluids using bidirectional mapping. ACM Trans. Graph., 38:1–12, 2019.
- Sod [1978] G. A. Sod. A survey of several finite difference methods for systems of nonlinear hyperbolic conservation laws. J. Comput. Phys., 27:1–31, 1978.
- Lu et al. [2019] L. Lu, X. Meng, Z. Mao, and G. E. Karniadakis. DeepXDE: A deep learning library for solving differential equations. SIAM Rev. Soc. Ind. Appl. Math., 63:208–228, 2019.
- Rogallo [1981] R. S. Rogallo. Numerical experiments in homogeneous turbulence. In Technical Report TM81315, NASA, 1981.
- Xiong and Yang [2019] S. Xiong and Y. Yang. Construction of knotted vortex tubes with the writhe-dependent helicity. Phys. Fluids, 31:047101, 2019.
- Xiong and Yang [2020] S. Xiong and Y. Yang. Effects of twist on the evolution of knotted magnetic flux tubes. J. Fluid Mech., 895:A28, 2020.
- Yang et al. [2010] Y. Yang, D. I. Pullin, and I. Bermejo-Moreno. Multi-scale geometric analysis of Lagrangian structures in isotropic turbulence. J. Fluid Mech., 654:233–270, 2010.
- Yang and Pullin [2011] Y. Yang and D. I. Pullin. Geometric study of Lagrangian and Eulerian structures in turbulent channel flow. J. Fluid Mech., 674:67–92, 2011.
- Zhao et al. [2016] Y. Zhao, Y. Yang, and S. Chen. Evolution of material surfaces in the temporal transition in channel flow. J. Fluid Mech., 793:840–876, 2016.
- Zheng et al. [2016] W. Zheng, Y. Yang, and S. Chen. Evolutionary geometry of Lagrangian structures in a transitional boundary layer. Phys. Fluids, 28:035110, 2016.
- Zheng et al. [2019] W. Zheng, S. Ruan, Y. Yang, L. He, and S. Chen. Image-based modelling of the skin-friction coefficient in compressible boundary-layer transition. J. Fluid. Mech., 875:1175–1203, 2019.
- Lundgren [1982] T. S. Lundgren. Strained spiral vortex model for turbulent fine structure. Phys. Fluids, 25:2193–2203, 1982.
- Lundgren [1993] T. S. Lundgren. A small-scale turbulence model. Phys. Fluids A, 5:1472, 1993.