Search | arXiv e-print repository

Score-fPINN: Fractional Score-Based Physics-Informed Neural Networks for High-Dimensional Fokker-Planck-Levy Equations

Authors: Zheyuan Hu, Zhongqiang Zhang, George Em Karniadakis, Kenji Kawaguchi

Abstract: We introduce an innovative approach for solving high-dimensional Fokker-Planck-Lévy (FPL) equations in modeling non-Brownian processes across disciplines such as physics, finance, and ecology. We utilize a fractional score function and physics-informed neural networks (PINNs) to lift the curse of dimensionality (CoD) and alleviate numerical overflow from exponentially decaying solutions with dimensions. The introduction of a fractional score function allows us to transform the FPL equation into a second-order partial differential equation without fractional Laplacian and thus can be readily solved with standard physics-informed neural networks (PINNs). We propose two methods to obtain a fractional score function: fractional score matching (FSM) and score-fPINN for fitting the fractional score function. While FSM is more cost-effective, it relies on known conditional distributions. On the other hand, score-fPINN is independent of specific stochastic differential equations (SDEs) but requires evaluating the PINN model's derivatives, which may be more costly. We conduct our experiments on various SDEs and demonstrate numerical stability and effectiveness of our method in dealing with high-dimensional problems, marking a significant advancement in addressing the CoD in FPL equations.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 16 pages, 1 figure

ACM Class: F.2.2; I.2.7

arXiv:2404.08809 [pdf, other]

Leveraging viscous Hamilton-Jacobi PDEs for uncertainty quantification in scientific machine learning

Authors: Zongren Zou, Tingwei Meng, Paula Chen, Jérôme Darbon, George Em Karniadakis

Abstract: Uncertainty quantification (UQ) in scientific machine learning (SciML) combines the predictive power of SciML with methods for quantifying the reliability of the learned models. However, two major challenges remain: limited interpretability and expensive training procedures. We provide a new interpretation for UQ problems by establishing a new theoretical connection between some Bayesian inference problems arising in SciML and viscous Hamilton-Jacobi partial differential equations (HJ PDEs). Namely, we show that the posterior mean and covariance can be recovered from the spatial gradient and Hessian of the solution to a viscous HJ PDE. As a first exploration of this connection, we specialize to Bayesian inference problems with linear models, Gaussian likelihoods, and Gaussian priors. In this case, the associated viscous HJ PDEs can be solved using Riccati ODEs, and we develop a new Riccati-based methodology that provides computational advantages when continuously updating the model predictions. Specifically, our Riccati-based approach can efficiently add or remove data points to the training set invariant to the order of the data and continuously tune hyperparameters. Moreover, neither update requires retraining on or access to previously incorporated data. We provide several examples from SciML involving noisy data and epistemic uncertainty to illustrate the potential advantages of our approach. In particular, this approach's amenability to data streaming applications demonstrates its potential for real-time inferences, which, in turn, allows for applications in which the predicted uncertainty is used to dynamically alter the learning process.

Submitted 12 April, 2024; originally announced April 2024.

MSC Class: 35F21; 62F15; 65L99; 65N99; 68T05; 35B37

arXiv:2402.07465 [pdf, other]

Score-Based Physics-Informed Neural Networks for High-Dimensional Fokker-Planck Equations

Authors: Zheyuan Hu, Zhongqiang Zhang, George Em Karniadakis, Kenji Kawaguchi

Abstract: The Fokker-Planck (FP) equation is a foundational PDE in stochastic processes. However, the curse of dimensionality (CoD) poses a challenge when dealing with high-dimensional FP PDEs. Although Monte Carlo and vanilla Physics-Informed Neural Networks (PINNs) have shown the potential to tackle CoD, both methods exhibit numerical errors in high dimensions when dealing with the probability density function (PDF) associated with Brownian motion. The point-wise PDF values tend to decrease exponentially as dimension increases, surpassing the precision of numerical simulations and resulting in substantial errors. Moreover, due to its massive sampling, Monte Carlo fails to offer fast sampling. Modeling the logarithm likelihood (LL) via vanilla PINNs transforms the FP equation into a difficult HJB equation, whose error grows rapidly with dimension. To this end, we propose a novel approach utilizing a score-based solver to fit the score function in SDEs. The score function, defined as the gradient of the LL, plays a fundamental role in inferring LL and PDF and enables fast SDE sampling. Three fitting methods, Score Matching (SM), Sliced SM (SSM), and Score-PINN, are introduced. The proposed score-based SDE solver operates in two stages: first, employing SM, SSM, or Score-PINN to acquire the score; and second, solving the LL via an ODE using the obtained score. Comparative evaluations across these methods showcase varying trade-offs. The proposed method is evaluated across diverse SDEs, including anisotropic OU processes, geometric Brownian motion, and Brownian motion with varying eigenspace. We also test various distributions, including Gaussian, Log-normal, Laplace, and Cauchy. The numerical results demonstrate the score-based SDE solver's stability, speed, and performance across different settings, solidifying its potential as a solution to CoD for high-dimensional FP equations.

Submitted 12 February, 2024; originally announced February 2024.

Comments: 22 pages

MSC Class: 14J60

arXiv:2312.14499 [pdf, other]

doi 10.1016/j.cma.2024.116883

Hutchinson Trace Estimation for High-Dimensional and High-Order Physics-Informed Neural Networks

Authors: Zheyuan Hu, Zekun Shi, George Em Karniadakis, Kenji Kawaguchi

Abstract: Physics-Informed Neural Networks (PINNs) have proven effective in solving partial differential equations (PDEs), especially when some data are available by seamlessly blending data and physics. However, extending PINNs to high-dimensional and even high-order PDEs encounters significant challenges due to the computational cost associated with automatic differentiation in the residual loss. Herein, we address the limitations of PINNs in handling high-dimensional and high-order PDEs by introducing Hutchinson Trace Estimation (HTE). Starting with the second-order high-dimensional PDEs ubiquitous in scientific computing, HTE transforms the calculation of the entire Hessian matrix into a Hessian vector product (HVP). This approach alleviates the computational bottleneck via Taylor-mode automatic differentiation and significantly reduces memory consumption from the Hessian matrix to HVP. We further showcase HTE's convergence to the original PINN loss and its unbiased behavior under specific conditions. Comparisons with Stochastic Dimension Gradient Descent (SDGD) highlight the distinct advantages of HTE, particularly in scenarios with significant variance among dimensions. We further extend HTE to higher-order and higher-dimensional PDEs, specifically addressing the biharmonic equation. By employing tensor-vector products (TVP), HTE efficiently computes the colossal tensor associated with the fourth-order high-dimensional biharmonic equation, saving memory and enabling rapid computation. The effectiveness of HTE is illustrated through experimental setups, demonstrating comparable convergence rates with SDGD under memory and speed constraints. Additionally, HTE proves valuable in accelerating the Gradient-Enhanced PINN (gPINN) version as well as the Biharmonic equation. Overall, HTE opens up a new capability in scientific machine learning for tackling high-order and high-dimensional PDEs.

Submitted 3 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: Published in Computer Methods in Applied Mechanics and Engineering

MSC Class: 14J60

Journal ref: Computer Methods in Applied Mechanics and Engineering, Volume 424, 1 May 2024, 116883
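
The Hutchinson trace idea summarized in the abstract above can be illustrated with a small sketch. It estimates the Laplacian (the trace of the Hessian) as the average of v^T H v over random Rademacher probe vectors v, so only Hessian-vector products are needed and the full Hessian is never formed. This is not the authors' implementation: the abstract describes Taylor-mode automatic differentiation, whereas this sketch substitutes a central finite-difference Hessian-vector product so that it runs with the Python standard library alone. The names `hvp`, `hutchinson_laplacian`, and `toy_grad` are hypothetical.

```python
import random


def hvp(grad, x, v, eps=1e-5):
    """Approximate the Hessian-vector product H(x) v by central differences
    of the gradient: (grad(x + eps*v) - grad(x - eps*v)) / (2*eps).
    Assumption: stands in for the autodiff-based HVP used in practice."""
    x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
    x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
    g_plus, g_minus = grad(x_plus), grad(x_minus)
    return [(a - b) / (2.0 * eps) for a, b in zip(g_plus, g_minus)]


def hutchinson_laplacian(grad, x, num_probes, rng):
    """Estimate trace(H(x)) as the mean of v^T H v over Rademacher probes."""
    d = len(x)
    total = 0.0
    for _ in range(num_probes):
        # Rademacher probe: each entry is +1 or -1 with equal probability.
        v = [rng.choice((-1.0, 1.0)) for _ in range(d)]
        hv = hvp(grad, x, v)
        total += sum(vi * hi for vi, hi in zip(v, hv))
    return total / num_probes


def toy_grad(x):
    """Gradient of the toy function f(x) = sum_i x_i^2 + x_0 * x_1,
    whose Hessian has trace 2 * len(x) and one off-diagonal coupling."""
    g = [2.0 * xi for xi in x]
    g[0] += x[1]
    g[1] += x[0]
    return g


if __name__ == "__main__":
    rng = random.Random(0)
    point = [0.3 * i for i in range(10)]
    print(hutchinson_laplacian(toy_grad, point, 2000, rng))
```

For the toy function the exact Laplacian is 2d, and the off-diagonal coupling is what makes individual probes noisy, so the estimate only approaches 2d as the number of probes grows.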

arXiv:2311.15283 [pdf, other]

Bias-Variance Trade-off in Physics-Informed Neural Networks with Randomized Smoothing for High-Dimensional PDEs

Authors: Zheyuan Hu, Zhouhao Yang, Yezhen Wang, George Em Karniadakis, Kenji Kawaguchi

Abstract: While physics-informed neural networks (PINNs) have been proven effective for low-dimensional partial differential equations (PDEs), the computational cost remains a hurdle in high-dimensional scenarios. This is particularly pronounced when computing high-order and high-dimensional derivatives in the physics-informed loss. Randomized Smoothing PINN (RS-PINN) introduces Gaussian noise for stochastic smoothing of the original neural net model, enabling Monte Carlo methods for derivative approximation, eliminating the need for costly auto-differentiation. Despite its computational efficiency in high dimensions, RS-PINN introduces biases in both loss and gradients, negatively impacting convergence, especially when coupled with stochastic gradient descent (SGD). We present a comprehensive analysis of biases in RS-PINN, attributing them to the nonlinearity of the Mean Squared Error (MSE) loss and the PDE nonlinearity. We propose tailored bias correction techniques based on the order of PDE nonlinearity. The unbiased RS-PINN allows for a detailed examination of its pros and cons compared to the biased version. Specifically, the biased version has a lower variance and runs faster than the unbiased version, but it is less accurate due to the bias. To optimize the bias-variance trade-off, we combine the two approaches in a hybrid method that balances the rapid convergence of the biased version with the high accuracy of the unbiased version. In addition, we present an enhanced implementation of RS-PINN. Extensive experiments on diverse high-dimensional PDEs, including Fokker-Planck, HJB, viscous Burgers', Allen-Cahn, and Sine-Gordon equations, illustrate the bias-variance trade-off and highlight the effectiveness of the hybrid RS-PINN. Empirical guidelines are provided for selecting biased, unbiased, or hybrid versions, depending on the dimensionality and nonlinearity of the specific PDE problem.

Submitted 26 November, 2023; originally announced November 2023.

Comments: 21 pages, 5 figures

MSC Class: 14J60
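
The derivative-by-sampling idea in the abstract above can be sketched as follows. If the model is smoothed with Gaussian noise, u(x) = E[f(x + delta)] with delta ~ N(0, sigma^2 I), then the gradient of u equals E[delta * f(x + delta)] / sigma^2, so it can be estimated from function evaluations alone. This is a minimal illustration, not the paper's implementation: the antithetic pairing (evaluating at both x + delta and x - delta) is an assumption added here to reduce variance, and `smoothed_gradient` and `toy_model` are hypothetical names.

```python
import random


def smoothed_gradient(f, x, sigma, num_samples, rng):
    """Monte Carlo estimate of the gradient of the Gaussian-smoothed model
    u(x) = E[f(x + delta)], delta ~ N(0, sigma^2 I), using only evaluations
    of f. Antithetic pairs (+delta, -delta) are an illustrative choice."""
    d = len(x)
    grad = [0.0] * d
    for _ in range(num_samples):
        delta = [rng.gauss(0.0, sigma) for _ in range(d)]
        f_plus = f([xi + di for xi, di in zip(x, delta)])
        f_minus = f([xi - di for xi, di in zip(x, delta)])
        # Scalar weight shared by all components of this sample.
        weight = (f_plus - f_minus) / (2.0 * sigma * sigma)
        for i in range(d):
            grad[i] += delta[i] * weight
    return [g / num_samples for g in grad]


def toy_model(x):
    """Toy stand-in for a network output: f(x) = sum_i x_i^2.
    Its smoothed version has gradient 2 * x."""
    return sum(xi * xi for xi in x)


if __name__ == "__main__":
    rng = random.Random(0)
    print(smoothed_gradient(toy_model, [0.5, -1.0, 2.0], 0.1, 20000, rng))
```

The estimate of the smoothed model's gradient is unbiased, but as the abstract notes, plugging such estimates into a squared residual loss introduces bias, which is the effect the paper analyzes and corrects.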

arXiv:2307.12306 [pdf, other]

doi 10.1016/j.neunet.2024.106369

Tackling the Curse of Dimensionality with Physics-Informed Neural Networks

Authors: Zheyuan Hu, Khemraj Shukla, George Em Karniadakis, Kenji Kawaguchi

Abstract: The curse-of-dimensionality taxes computational resources heavily with exponentially increasing computational cost as the dimension increases. This poses great challenges in solving high-dimensional PDEs, as Richard E. Bellman first pointed out over 60 years ago. While there has been some recent success in solving numerically partial differential equations (PDEs) in high dimensions, such computations are prohibitively expensive, and true scaling of general nonlinear PDEs to high dimensions has never been achieved. We develop a new method of scaling up physics-informed neural networks (PINNs) to solve arbitrary high-dimensional PDEs. The new method, called Stochastic Dimension Gradient Descent (SDGD), decomposes a gradient of PDEs into pieces corresponding to different dimensions and randomly samples a subset of these dimensional pieces in each iteration of training PINNs. We prove theoretically the convergence and other desired properties of the proposed method. We demonstrate in various diverse tests that the proposed method can solve many notoriously hard high-dimensional PDEs, including the Hamilton-Jacobi-Bellman (HJB) and the Schrödinger equations in tens of thousands of dimensions very fast on a single GPU using the PINNs mesh-free approach. Notably, we solve nonlinear PDEs with nontrivial, anisotropic, and inseparable solutions in 100,000 effective dimensions in 12 hours on a single GPU using SDGD with PINNs. Since SDGD is a general training methodology of PINNs, it can be applied to any current and future variants of PINNs to scale them up for arbitrary high-dimensional PDEs.

Submitted 17 May, 2024; v1 submitted 23 July, 2023; originally announced July 2023.

Comments: Accepted by Neural Networks. Code is available at https://github.com/zheyuanhu01/SDGD_PINN

MSC Class: 14J60 ACM Class: F.2.2; I.2.7

Journal ref: Neural Networks, Volume 176, 2024, 106369, ISSN 0893-6080
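
The dimension-sampling idea behind SDGD, as described in the abstract above, can be sketched on a single operator. A Laplacian is a sum of d per-dimension second derivatives; sampling a random subset of dimensions and rescaling by d / (subset size) gives an unbiased estimate at a fraction of the cost. This sketch covers only that sampling step, not the training loop or the authors' code (linked in the Comments line above), and it uses finite differences in place of automatic differentiation. The names `second_partial`, `sampled_laplacian`, and `toy_u` are hypothetical.

```python
import math
import random


def second_partial(u, x, i, h=1e-3):
    """Central finite-difference estimate of d^2 u / d x_i^2 at x.
    Assumption: stands in for the autodiff derivative used in a PINN."""
    x_plus = list(x)
    x_plus[i] += h
    x_minus = list(x)
    x_minus[i] -= h
    return (u(x_plus) - 2.0 * u(x) + u(x_minus)) / (h * h)


def sampled_laplacian(u, x, batch_dims, rng):
    """Unbiased Laplacian estimate from a random subset of dimensions:
    sum the sampled per-dimension pieces and rescale by d / batch_dims."""
    d = len(x)
    dims = rng.sample(range(d), batch_dims)  # sampled without replacement
    return (d / batch_dims) * sum(second_partial(u, x, i) for i in dims)


def toy_u(x):
    """Toy solution u(x) = sum_i sin(x_i); its Laplacian is -sum_i sin(x_i)."""
    return sum(math.sin(xi) for xi in x)


if __name__ == "__main__":
    rng = random.Random(0)
    point = [0.1 * i for i in range(20)]
    print(sampled_laplacian(toy_u, point, 5, rng))
```

With `batch_dims` equal to the full dimension the estimate reduces to the ordinary Laplacian; smaller subsets trade variance for lower per-iteration cost.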

arXiv:2211.08939 [pdf, other]

doi 10.1016/j.engappai.2023.107183

Augmented Physics-Informed Neural Networks (APINNs): A gating network-based soft domain decomposition methodology

Authors: Zheyuan Hu, Ameya D. Jagtap, George Em Karniadakis, Kenji Kawaguchi

Abstract: In this paper, we propose the augmented physics-informed neural network (APINN), which adopts soft and trainable domain decomposition and flexible parameter sharing to further improve the extended PINN (XPINN) as well as the vanilla PINN methods. In particular, a trainable gate network is employed to mimic the hard decomposition of XPINN, which can be flexibly fine-tuned for discovering a potentially better partition. It weight-averages several sub-nets as the output of APINN. APINN does not require complex interface conditions, and its sub-nets can take advantage of all training samples rather than just part of the training data in their subdomains. Lastly, each sub-net shares part of the common parameters to capture the similar components in each decomposed function. Furthermore, following the PINN generalization theory in Hu et al. [2021], we show that APINN can improve generalization by proper gate network initialization and general domain & function decomposition. Extensive experiments on different types of PDEs demonstrate how APINN improves the PINN and XPINN methods. Specifically, we present examples where XPINN performs similarly to or worse than PINN, so that APINN can significantly improve both. We also show cases where XPINN is already better than PINN, so APINN can still slightly improve XPINN. Furthermore, we visualize the optimized gating networks and their optimization trajectories, and connect them with their performance, which helps discover the possibly optimal decomposition. Interestingly, if initialized with different decompositions, the performance of the corresponding APINNs can differ drastically. This, in turn, shows the potential to design an optimal domain decomposition for the differential equation problem under consideration.

Submitted 29 September, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: Accepted at Engineering Applications of Artificial Intelligence (EAAI)

Journal ref: Engineering Applications of Artificial Intelligence, Volume 126, Part B, November 2023, 107183

arXiv:2204.02488 [pdf, other]

Discovering and forecasting extreme events via active learning in neural operators

Authors: Ethan Pickering, Stephen Guth, George Em Karniadakis, Themistoklis P. Sapsis

Abstract: Extreme events in society and nature, such as pandemic spikes, rogue waves, or structural failures, can have catastrophic consequences. Characterizing extremes is difficult as they occur rarely, arise from seemingly benign conditions, and belong to complex and often unknown infinite-dimensional systems. Such challenges render attempts at characterizing them moot. We address each of these difficulties by combining novel training schemes in Bayesian experimental design (BED) with an ensemble of deep neural operators (DNOs). This model-agnostic framework pairs a BED scheme that actively selects data for quantifying extreme events with an ensemble of DNOs that approximate infinite-dimensional nonlinear operators. We find that not only does this framework clearly beat Gaussian processes (GPs) but that 1) shallow ensembles of just two members perform best; 2) extremes are uncovered regardless of the state of initial data (i.e. with or without extremes); 3) our method eliminates "double-descent" phenomena; 4) the use of batches of suboptimal acquisition points compared to step-by-step global optima does not hinder BED performance; and 5) Monte Carlo acquisition outperforms standard optimizers in high dimensions. Together these conclusions form the foundation of an AI-assisted experimental infrastructure that can efficiently infer and pinpoint critical situations across many domains, from physical to societal systems.

Submitted 20 September, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

Comments: 25 pages, 8 figures, Submitted to Nature Computational Science

arXiv:2109.09444 [pdf, other]

doi 10.1137/21M1447039

When Do Extended Physics-Informed Neural Networks (XPINNs) Improve Generalization?

Authors: Zheyuan Hu, Ameya D. Jagtap, George Em Karniadakis, Kenji Kawaguchi

Abstract: Physics-informed neural networks (PINNs) have become a popular choice for solving high-dimensional partial differential equations (PDEs) due to their excellent approximation power and generalization ability. Recently, Extended PINNs (XPINNs) based on domain decomposition methods have attracted considerable attention due to their effectiveness in modeling multiscale and multiphysics problems and their parallelization. However, theoretical understanding of their convergence and generalization properties remains unexplored. In this study, we take an initial step towards understanding how and when XPINNs outperform PINNs. Specifically, for general multi-layer PINNs and XPINNs, we first provide a prior generalization bound via the complexity of the target functions in the PDE problem, and a posterior generalization bound via the posterior matrix norms of the networks after optimization. Moreover, based on our bounds, we analyze the conditions under which XPINNs improve generalization. Concretely, our theory shows that the key building block of XPINN, namely the domain decomposition, introduces a tradeoff for generalization. On the one hand, XPINNs decompose the complex PDE solution into several simple parts, which decreases the complexity needed to learn each part and boosts generalization. On the other hand, decomposition leads to less training data being available in each subdomain, and hence such a model is typically prone to overfitting and may become less generalizable. Empirically, we choose five PDEs to show when XPINNs perform better than, similar to, or worse than PINNs, hence demonstrating and justifying our new theory.

Submitted 18 October, 2022; v1 submitted 20 September, 2021; originally announced September 2021.

Comments: Published in SIAM Journal on Scientific Computing (SISC)

Journal ref: SIAM Journal on Scientific Computing Vol. 44, Iss. 5 (2022)

arXiv:2008.10653 [pdf, other]

Solving Inverse Stochastic Problems from Discrete Particle Observations Using the Fokker-Planck Equation and Physics-informed Neural Networks

Authors: Xiaoli Chen, Liu Yang, Jinqiao Duan, George Em Karniadakis

Abstract: The Fokker-Planck (FP) equation governing the evolution of the probability density function (PDF) is applicable to many disciplines but it requires specification of the coefficients for each case, which can be functions of space-time and not just constants, hence requiring the development of a data-driven modeling approach. When the data available is directly on the PDF, then there exist methods for inverse problems that can be employed to infer the coefficients and thus determine the FP equation and subsequently obtain its solution. Herein, we address a more realistic scenario, where only sparse data are given on the particles' positions at a few time instants, which are not sufficient to accurately construct directly the PDF even at those times from existing methods, e.g., kernel estimation algorithms. To this end, we develop a general framework based on physics-informed neural networks (PINNs) that introduces a new loss function using the Kullback-Leibler divergence to connect the stochastic samples with the FP equation, to simultaneously learn the equation and infer the multi-dimensional PDF at all times. In particular, we consider two types of inverse problems, type I where the FP equation is known but the initial PDF is unknown, and type II in which, in addition to unknown initial PDF, the drift and diffusion terms are also unknown. In both cases, we investigate problems with either Brownian or Lévy noise or a combination of both. We demonstrate the new PINN framework in detail in the one-dimensional case (1D) but we also provide results for up to 5D demonstrating that we can infer both the FP equation and dynamics simultaneously at all times with high accuracy using only very few discrete observations of the particles.

Submitted 24 August, 2020; originally announced August 2020.

Comments: The first two authors contributed equally to this paper. Corresponding author: George Em Karniadakis

arXiv:2008.01915 [pdf, other]

Generative Ensemble Regression: Learning Particle Dynamics from Observations of Ensembles with Physics-Informed Deep Generative Models

Authors: Liu Yang, Constantinos Daskalakis, George Em Karniadakis

Abstract: We propose a new method for inferring the governing stochastic ordinary differential equations (SODEs) by observing particle ensembles at discrete and sparse time instants, i.e., multiple "snapshots". Particle coordinates at a single time instant, possibly noisy or truncated, are recorded in each snapshot but are unpaired across the snapshots. By training a physics-informed generative model that generates "fake" sample paths, we aim to fit the observed particle ensemble distributions with a curve in the probability measure space, which is induced from the inferred particle dynamics. We employ different metrics to quantify the differences between distributions, e.g., the sliced Wasserstein distances and the adversarial losses in generative adversarial networks (GANs). We refer to this method as generative "ensemble-regression" (GER), in analogy to the classic "point-regression", where we infer the dynamics by performing regression in the Euclidean space. We illustrate the GER by learning the drift and diffusion terms of particle ensembles governed by SODEs with Brownian motions and Levy processes up to 100 dimensions. We also discuss how to treat cases with noisy or truncated observations. Apart from systems consisting of independent particles, we also tackle nonlocal interacting particle systems with unknown interaction potential parameters by constructing a physics-informed loss function. Finally, we investigate scenarios of paired observations and discuss how to reduce the dimensionality in such cases by proving a convergence theorem that provides theoretical support.

Submitted 20 March, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

arXiv:2005.03596 [pdf, other]

Physics-informed neural network for ultrasound nondestructive quantification of surface breaking cracks

Authors: Khemraj Shukla, Patricio Clark Di Leoni, James Blackshire, Daniel Sparkman, George Em Karniadakis

Abstract: We introduce an optimized physics-informed neural network (PINN) trained to solve the problem of identifying and characterizing a surface breaking crack in a metal plate. PINNs are neural networks that can combine data and physics in the learning process by adding the residuals of a system of Partial Differential Equations to the loss function. Our PINN is supervised with realistic ultrasonic surface acoustic wave data acquired at a frequency of 5 MHz. The ultrasonic surface wave data is represented as a surface deformation on the top surface of a metal plate, measured by using the method of laser vibrometry. The PINN is physically informed by the acoustic wave equation and its convergence is sped up using adaptive activation functions. The adaptive activation function uses a scalable hyperparameter in the activation function, which is optimized to achieve best performance of the network as it changes dynamically the topology of the loss function involved in the optimization process. The use of adaptive activation functions significantly improves the convergence, notably observed in the current study. We use PINNs to estimate the speed of sound of the metal plate, which we do with an error of 1%, and then, by allowing the speed of sound to be space dependent, we identify and characterize the crack as the positions where the speed of sound has decreased. Our study also shows the effect of sub-sampling of the data on the sensitivity of sound speed estimates. More broadly, the resulting model shows a promising deep neural network model for ill-posed inverse problems.

Submitted 7 May, 2020; originally announced May 2020.

Comments: 19 pages, 12 Figures

arXiv:2003.06097 [pdf, other]

doi 10.1016/j.jcp.2020.109913

B-PINNs: Bayesian Physics-Informed Neural Networks for Forward and Inverse PDE Problems with Noisy Data

Authors: Liu Yang, Xuhui Meng, George Em Karniadakis

Abstract: We propose a Bayesian physics-informed neural network (B-PINN) to solve both forward and inverse nonlinear problems described by partial differential equations (PDEs) and noisy data. In this Bayesian framework, the Bayesian neural network (BNN) combined with a PINN for PDEs serves as the prior while the Hamiltonian Monte Carlo (HMC) or the variational inference (VI) could serve as an estimator of the posterior. B-PINNs make use of both physical laws and scattered noisy measurements to provide predictions and quantify the aleatoric uncertainty arising from the noisy data in the Bayesian framework. Compared with PINNs, in addition to uncertainty quantification, B-PINNs obtain more accurate predictions in scenarios with large noise due to their capability of avoiding overfitting. We conduct a systematic comparison between the two different approaches for the B-PINN posterior estimation (i.e., HMC or VI), along with dropout used for quantifying uncertainty in deep neural networks. Our experiments show that HMC is more suitable than VI for the B-PINNs posterior estimation, while dropout employed in PINNs can hardly provide accurate predictions with reasonable uncertainty. Finally, we replace the BNN in the prior with a truncated Karhunen-Loève (KL) expansion combined with HMC or a deep normalizing flow (DNF) model as posterior estimators. The KL is as accurate as BNN and much faster but this framework cannot be easily extended to high-dimensional problems unlike the BNN based framework.

Submitted 13 March, 2020; originally announced March 2020.

Comments: The first two authors contributed equally to this work

arXiv:2001.03750 [pdf, other]

SympNets: Intrinsic structure-preserving symplectic networks for identifying Hamiltonian systems

Authors: Pengzhan Jin, Zhen Zhang, Aiqing Zhu, Yifa Tang, George Em Karniadakis

Abstract: We propose new symplectic networks (SympNets) for identifying Hamiltonian systems from data based on a composition of linear, activation and gradient modules. In particular, we define two classes of SympNets: the LA-SympNets composed of linear and activation modules, and the G-SympNets composed of gradient modules. Correspondingly, we prove two new universal approximation theorems that demonstrate that SympNets can approximate arbitrary symplectic maps based on appropriate activation functions. We then perform several experiments including the pendulum, double pendulum and three-body problems to investigate the expressivity and the generalization ability of SympNets. The simulation results show that even very small size SympNets can generalize well, and are able to handle both separable and non-separable Hamiltonian systems with data points resulting from short or long time steps. In all the test cases, SympNets outperform the baseline models, and are much faster in training and prediction. We also develop an extended version of SympNets to learn the dynamics from irregularly sampled data. This extended version of SympNets can be thought of as a universal model representing the solution to an arbitrary Hamiltonian system.

Submitted 19 August, 2020; v1 submitted 11 January, 2020; originally announced January 2020.

arXiv:1912.00873 [pdf, other]

Variational Physics-Informed Neural Networks For Solving Partial Differential Equations

Authors: E. Kharazmi, Z. Zhang, G. E. Karniadakis

Abstract: Physics-informed neural networks (PINNs) [31] use automatic differentiation to solve partial differential equations (PDEs) by penalizing the PDE in the loss function at a random set of points in the domain of interest. Here, we develop a Petrov-Galerkin version of PINNs based on the nonlinear approximation of deep neural networks (DNNs) by selecting the {\em trial space} to be the space of neural networks and the {\em test space} to be the space of Legendre polynomials. We formulate the \textit{variational residual} of the PDE using the DNN approximation by incorporating the variational form of the problem into the loss function of the network and construct a \textit{variational physics-informed neural network} (VPINN). By integrating by parts the integrand in the variational form, we lower the order of the differential operators represented by the neural networks, hence effectively reducing the training cost in VPINNs while increasing their accuracy compared to PINNs that essentially employ delta test functions. For shallow networks with one hidden layer, we analytically obtain explicit forms of the \textit{variational residual}. We demonstrate the performance of the new formulation for several examples that show clear advantages of VPINNs over PINNs in terms of both accuracy and speed.

Submitted 27 November, 2019; originally announced December 2019.

Comments: 24 pages, 12 figures

arXiv:1910.13444 [pdf, other]

Highly-scalable, physics-informed GANs for learning solutions of stochastic PDEs

Authors: Liu Yang, Sean Treichler, Thorsten Kurth, Keno Fischer, David Barajas-Solano, Josh Romero, Valentin Churavy, Alexandre Tartakovsky, Michael Houston, Prabhat, George Karniadakis

Abstract: Uncertainty quantification for forward and inverse problems is a central challenge across physical and biomedical disciplines. We address this challenge for the problem of modeling subsurface flow at the Hanford Site by combining stochastic computational models with observational data using physics-informed GAN models. The geographic extent, spatial heterogeneity, and multiple correlation length scales of the Hanford Site require training a computationally intensive GAN model to thousands of dimensions. We develop a hierarchical scheme for exploiting domain parallelism, map discriminators and generators to multiple GPUs, and employ efficient communication schemes to ensure training stability and convergence. We developed a highly optimized implementation of this scheme that scales to 27,500 NVIDIA Volta GPUs and 4584 nodes on the Summit supercomputer with a 93.1% scaling efficiency, achieving peak and sustained half-precision rates of 1228 PF/s and 1207 PF/s.

Submitted 28 October, 2019; originally announced October 2019.

Comments: 3rd Deep Learning on Supercomputers Workshop (DLS) at SC19

arXiv:1910.03193 [pdf, other]

doi 10.1038/s42256-021-00302-5

DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators

Authors: Lu Lu, Pengzhan Jin, George Em Karniadakis

Abstract: While it is widely known that neural networks are universal approximators of continuous functions, a less known and perhaps more powerful result is that a neural network with a single hidden layer can approximate accurately any nonlinear continuous operator. This universal approximation theorem is suggestive of the potential application of neural networks in learning nonlinear operators from data. However, the theorem guarantees only a small approximation error for a sufficiently large network, and does not consider the important optimization and generalization errors. To realize this theorem in practice, we propose deep operator networks (DeepONets) to learn operators accurately and efficiently from a relatively small dataset. A DeepONet consists of two sub-networks, one for encoding the input function at a fixed number of sensors $x_i, i=1,\dots,m$ (branch net), and another for encoding the locations for the output functions (trunk net). We perform systematic simulations for identifying two types of operators, i.e., dynamic systems and partial differential equations, and demonstrate that DeepONet significantly reduces the generalization error compared to the fully-connected networks. We also derive theoretically the dependence of the approximation error in terms of the number of sensors (where the input function is defined) as well as the input function type, and we verify the theorem with computational results. More importantly, we observe high-order error convergence in our computational tests, namely polynomial rates (from half order to fourth order) and even exponential convergence with respect to the training dataset size.
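The branch/trunk structure described in this abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the layer widths, the tanh activation, the latent width `p`, and the helper names `mlp_init`, `mlp_apply`, and `deeponet` are all assumptions chosen for illustration, and the networks are untrained.

```python
import numpy as np

def mlp_init(sizes, rng):
    """Random weights and zero biases for a fully connected network (hypothetical helper)."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_apply(params, x):
    """Apply tanh hidden layers followed by a linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

rng = np.random.default_rng(0)
m, p = 50, 32                          # m sensors, p latent features (assumed sizes)
branch = mlp_init([m, 64, p], rng)     # encodes the input function sampled at the sensors
trunk = mlp_init([1, 64, p], rng)      # encodes the output location y

def deeponet(u_sensors, y):
    """Combine branch and trunk features by an inner product over the latent dimension."""
    b = mlp_apply(branch, u_sensors)   # shape (n_functions, p)
    t = mlp_apply(trunk, y)            # shape (n_locations, p)
    return b @ t.T                     # shape (n_functions, n_locations)

x = np.linspace(0.0, 1.0, m)                    # fixed sensor locations x_1..x_m
u = np.stack([np.sin(2 * np.pi * x), x ** 2])   # two input functions sampled at the sensors
y = np.linspace(0.0, 1.0, 7)[:, None]           # seven query locations
out = deeponet(u, y)
print(out.shape)
```

Each row of `out` holds the predicted output function for one input function, evaluated at the query locations; training would fit the branch and trunk weights to operator data.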

Submitted 14 April, 2020; v1 submitted 7 October, 2019; originally announced October 2019.

arXiv:1909.12228 [pdf, other]

doi 10.1098/rspa.2020.0334

Locally adaptive activation functions with slope recovery term for deep and physics-informed neural networks

Authors: Ameya D. Jagtap, Kenji Kawaguchi, George Em Karniadakis

Abstract: We propose two approaches of locally adaptive activation functions namely, layer-wise and neuron-wise locally adaptive activation functions, which improve the performance of deep and physics-informed neural networks. The local adaptation of activation function is achieved by introducing a scalable parameter in each layer (layer-wise) and for every neuron (neuron-wise) separately, and then optimizing it using a variant of stochastic gradient descent algorithm. In order to further increase the training speed, an activation slope based slope recovery term is added in the loss function, which further accelerates convergence, thereby reducing the training cost. On the theoretical side, we prove that in the proposed method, the gradient descent algorithms are not attracted to sub-optimal critical points or local minima under practical conditions on the initialization and learning rate, and that the gradient dynamics of the proposed method is not achievable by base methods with any (adaptive) learning rates. We further show that the adaptive activation methods accelerate the convergence by implicitly multiplying conditioning matrices to the gradient of the base method without any explicit computation of the conditioning matrix and the matrix-vector product. The different adaptive activation functions are shown to induce different implicit conditioning matrices. Furthermore, the proposed methods with the slope recovery are shown to accelerate the training process.

Submitted 17 June, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

Comments: 19 pages, 8 figures

arXiv:1909.10145 [pdf, other]

doi 10.1016/j.cma.2020.113250

PPINN: Parareal Physics-Informed Neural Network for time-dependent PDEs

Authors: Xuhui Meng, Zhen Li, Dongkun Zhang, George Em Karniadakis

Abstract: Physics-informed neural networks (PINNs) encode physical conservation laws and prior physical knowledge into the neural networks, ensuring the correct physics is represented accurately while alleviating the need for supervised learning to a great degree. While effective for relatively short-term time integration, when long time integration of the time-dependent PDEs is sought, the time-space domain may become arbitrarily large and hence training of the neural network may become prohibitively expensive. To this end, we develop a parareal physics-informed neural network (PPINN), hence decomposing a long-time problem into many independent short-time problems supervised by an inexpensive/fast coarse-grained (CG) solver. In particular, the serial CG solver is designed to provide approximate predictions of the solution at discrete times, while initiating many fine PINNs simultaneously to correct the solution iteratively. There is a two-fold benefit from training PINNs with small-data sets rather than working on a large-data set directly, i.e., training of individual PINNs with small-data is much faster, while training the fine PINNs can be readily parallelized. Consequently, compared to the original PINN approach, the proposed PPINN approach may achieve a significant speedup for long-time integration of PDEs, assuming that the CG solver is fast and can provide reasonable predictions of the solution, hence aiding the PPINN solution to converge in just a few iterations. To investigate the PPINN performance on solving time-dependent PDEs, we first apply the PPINN to solve the Burgers equation, and subsequently we apply the PPINN to solve a two-dimensional nonlinear diffusion-reaction equation. Our results demonstrate that PPINNs converge in a couple of iterations with significant speed-ups proportional to the number of time-subdomains employed.
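The coarse-predict, fine-correct iteration that this abstract builds on is the classical parareal scheme, which can be sketched for a scalar ODE as follows. This is a sketch under stated assumptions, not the PPINN method: the fine propagator here is a many-step explicit Euler integrator standing in for the fine PINNs, the coarse propagator is a single Euler step, and the names `euler` and `parareal` are hypothetical.

```python
import numpy as np

def euler(f, u, t0, t1, steps):
    """Explicit Euler integration of du/dt = f(t, u) from t0 to t1."""
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        u = u + dt * f(t, u)
        t += dt
    return u

def parareal(f, u0, t_grid, iters, coarse_steps=1, fine_steps=100):
    """Classical parareal: serial coarse sweep plus independent fine corrections."""
    n = len(t_grid) - 1
    G = lambda u, i: euler(f, u, t_grid[i], t_grid[i + 1], coarse_steps)
    F = lambda u, i: euler(f, u, t_grid[i], t_grid[i + 1], fine_steps)
    U = np.empty(n + 1)
    U[0] = u0
    for i in range(n):                             # initial serial coarse prediction
        U[i + 1] = G(U[i], i)
    for _ in range(iters):
        F_old = [F(U[i], i) for i in range(n)]     # independent per subdomain, parallelizable
        G_old = [G(U[i], i) for i in range(n)]
        U_new = np.empty(n + 1)
        U_new[0] = u0
        for i in range(n):                         # serial coarse sweep with fine correction
            U_new[i + 1] = G(U_new[i], i) + F_old[i] - G_old[i]
        U = U_new
    return U

decay = lambda t, u: -u                            # test problem du/dt = -u
grid = np.linspace(0.0, 2.0, 9)                    # eight time subdomains
print(parareal(decay, 1.0, grid, iters=2))
```

After as many iterations as there are subdomains, the scheme reproduces the serial fine solution up to rounding; the speed-up comes from stopping after far fewer iterations.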

Submitted 22 September, 2019; originally announced September 2019.

Comments: 17 pages, 7 figures, 5 tables

arXiv:1909.09459 [pdf, other]

Physics-informed semantic inpainting: Application to geostatistical modeling

Authors: Qiang Zheng, Lingzao Zeng, Zhendan Cao, George Em Karniadakis

Abstract: A fundamental problem in geostatistical modeling is to infer the heterogeneous geological field based on limited measurements and some prior spatial statistics. Semantic inpainting, a technique for image processing using deep generative models, has been recently applied for this purpose, demonstrating its effectiveness in dealing with complex spatial patterns. However, the original semantic inpainting framework incorporates only information from direct measurements, while in geostatistics indirect measurements are often plentiful. To overcome this limitation, here we propose a physics-informed semantic inpainting framework, employing the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) and jointly incorporating the direct and indirect measurements by exploiting the underlying physical laws. Our simulation results for a high-dimensional problem with 512 dimensions show that in the new method, the physical conservation laws are satisfied and contribute to enhancing the inpainting performance compared to using only the direct measurements.

Submitted 23 December, 2019; v1 submitted 19 September, 2019; originally announced September 2019.

arXiv:1908.11462 [pdf, other]

Potential Flow Generator with $L_2$ Optimal Transport Regularity for Generative Models

Authors: Liu Yang, George Em Karniadakis

Abstract: We propose a potential flow generator with $L_2$ optimal transport regularity, which can be easily integrated into a wide range of generative models including different versions of GANs and flow-based models. We show the correctness and robustness of the potential flow generator in several 2D problems, and illustrate the concept of "proximity" due to the $L_2$ optimal transport regularity. Subsequently, we demonstrate the effectiveness of the potential flow generator in image translation tasks with unpaired training data from the MNIST dataset and the CelebA dataset.

Submitted 29 August, 2019; originally announced August 2019.

arXiv:1907.09696 [pdf, other]

doi 10.1615/.2020034126

Trainability of ReLU networks and Data-dependent Initialization

Authors: Yeonjong Shin, George Em Karniadakis

Abstract: In this paper, we study the trainability of rectified linear unit (ReLU) networks. A ReLU neuron is said to be dead if it only outputs a constant for any input. Two death states of neurons are introduced: tentative and permanent death. A network is then said to be trainable if the number of permanently dead neurons is sufficiently small for a learning task. We refer to the probability of a network being trainable as trainability. We show that a network being trainable is a necessary condition for successful training and the trainability serves as an upper bound of successful training rates. In order to quantify the trainability, we study the probability distribution of the number of active neurons at the initialization. In many applications, over-specified or over-parameterized neural networks are successfully employed and shown to be trained effectively. With the notion of trainability, we show that over-parameterization is both a necessary and a sufficient condition for minimizing the training loss. Furthermore, we propose a data-dependent initialization method in the over-parameterized setting. Numerical examples are provided to demonstrate the effectiveness of the method and our theoretical findings.

Submitted 31 March, 2020; v1 submitted 23 July, 2019; originally announced July 2019.

arXiv:1907.04502 [pdf, other]

doi 10.1137/19M1274067

DeepXDE: A deep learning library for solving differential equations

Authors: Lu Lu, Xuhui Meng, Zhiping Mao, George E. Karniadakis

Abstract: Deep learning has achieved remarkable success in diverse applications; however, its use in solving partial differential equations (PDEs) has emerged only recently. Here, we present an overview of physics-informed neural networks (PINNs), which embed a PDE into the loss of the neural network using automatic differentiation. The PINN algorithm is simple, and it can be applied to different types of PDEs, including integro-differential equations, fractional PDEs, and stochastic PDEs. Moreover, from the implementation point of view, PINNs solve inverse problems as easily as forward problems. We propose a new residual-based adaptive refinement (RAR) method to improve the training efficiency of PINNs. For pedagogical reasons, we compare the PINN algorithm to a standard finite element method. We also present a Python library for PINNs, DeepXDE, which is designed to serve both as an education tool to be used in the classroom as well as a research tool for solving problems in computational science and engineering. Specifically, DeepXDE can solve forward problems given initial and boundary conditions, as well as inverse problems given some extra measurements. DeepXDE supports complex-geometry domains based on the technique of constructive solid geometry, and enables the user code to be compact, resembling closely the mathematical formulation. We introduce the usage of DeepXDE and its customizability, and we also demonstrate the capability of PINNs and the user-friendliness of DeepXDE for five different examples. More broadly, DeepXDE contributes to the more rapid development of the emerging Scientific Machine Learning field.

Submitted 14 February, 2020; v1 submitted 10 July, 2019; originally announced July 2019.

arXiv:1905.11427 [pdf, other]

doi 10.1016/j.neunet.2020.06.024

Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness

Authors: Pengzhan Jin, Lu Lu, Yifa Tang, George Em Karniadakis

Abstract: The accuracy of deep learning, i.e., deep neural networks, can be characterized by dividing the total error into three main types: approximation error, optimization error, and generalization error. Whereas there are some satisfactory answers to the problems of approximation and optimization, much less is known about the theory of generalization. Most existing theoretical works for generalization fail to explain the performance of neural networks in practice. To derive a meaningful bound, we study the generalization error of neural networks for classification problems in terms of data distribution and neural network smoothness. We introduce the cover complexity (CC) to measure the difficulty of learning a data set and the inverse of the modulus of continuity to quantify neural network smoothness. A quantitative bound for expected accuracy/error is derived by considering both the CC and neural network smoothness. Although most of the analysis is general and not specific to neural networks, we validate our theoretical assumptions and results numerically for neural networks by several data sets of images. The numerical results confirm that the expected error of trained networks scaled with the square root of the number of classes has a linear relationship with respect to the CC. We also observe a clear consistency between test loss and neural network smoothness during the training process. In addition, we demonstrate empirically that the neural network smoothness decreases when the network size increases whereas the smoothness is insensitive to training dataset size.

Submitted 25 June, 2020; v1 submitted 27 May, 2019; originally announced May 2019.

arXiv:1905.01205 [pdf, other]

Learning in Modal Space: Solving Time-Dependent Stochastic PDEs Using Physics-Informed Neural Networks

Authors: Dongkun Zhang, Ling Guo, George Em Karniadakis

Abstract: One of the open problems in scientific computing is the long-time integration of nonlinear stochastic partial differential equations (SPDEs). We address this problem by taking advantage of recent advances in scientific machine learning and the dynamically orthogonal (DO) and bi-orthogonal (BO) methods for representing stochastic processes. Specifically, we propose two new Physics-Informed Neural Networks (PINNs) for solving time-dependent SPDEs, namely the NN-DO/BO methods, which incorporate the DO/BO constraints into the loss function with an implicit form instead of generating explicit expressions for the temporal derivatives of the DO/BO modes. Hence, the proposed methods overcome some of the drawbacks of the original DO/BO methods: we do not need the assumption that the covariance matrix of the random coefficients is invertible as in the original DO method, and we can remove the assumption of no eigenvalue crossing as in the original BO method. Moreover, the NN-DO/BO methods can be used to solve time-dependent stochastic inverse problems with the same formulation and computational complexity as for forward problems. We demonstrate the capability of the proposed methods via several numerical examples: (1) A linear stochastic advection equation with deterministic initial condition where the original DO/BO method would fail; (2) Long-time integration of the stochastic Burgers' equation with many eigenvalue crossings during the whole time evolution where the original BO method fails; (3) Nonlinear reaction diffusion equation: we consider both the forward and the inverse problem, including noisy initial data, to investigate the flexibility of the NN-DO/BO methods in handling inverse and mixed type problems. Taken together, these simulation results demonstrate that the NN-DO/BO methods can be employed to effectively quantify uncertainty propagation in a wide range of physical problems.

Submitted 3 September, 2019; v1 submitted 3 May, 2019; originally announced May 2019.

arXiv:1903.06733 [pdf, other]

doi 10.4208/cicp.OA-2020-0165

Dying ReLU and Initialization: Theory and Numerical Examples

Authors: Lu Lu, Yeonjong Shin, Yanhui Su, George Em Karniadakis

Abstract: The dying ReLU refers to the problem when ReLU neurons become inactive and only output 0 for any input. There are many empirical and heuristic explanations of why ReLU neurons die. However, little is known about its theoretical analysis. In this paper, we rigorously prove that a deep ReLU network will eventually die in probability as the depth goes to infinity. Several methods have been proposed to alleviate the dying ReLU. Perhaps, one of the simplest treatments is to modify the initialization procedure. One common way of initializing weights and biases uses symmetric probability distributions, which suffers from the dying ReLU. We thus propose a new initialization procedure, namely, a randomized asymmetric initialization. We prove that the new initialization can effectively prevent the dying ReLU. All parameters required for the new initialization are theoretically designed. Numerical examples are provided to demonstrate the effectiveness of the new initialization procedure.
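The effect this abstract describes, a network that outputs 0 for every input, can be probed empirically with a small Monte Carlo sketch. The setup below is an assumption made for illustration and is not taken from the paper: zero biases, a symmetric zero-mean normal initialization with variance 2/fan_in, a scalar input, and "dead" checked only on a finite sample of inputs. The names `relu_forward`, `is_dead`, and `dead_fraction` are hypothetical.

```python
import numpy as np

def relu_forward(x, weights):
    """Forward pass through bias-free ReLU layers (zero biases are an assumption)."""
    h = x
    for W in weights:
        h = np.maximum(h @ W, 0.0)
    return h

def is_dead(weights, x):
    """True if the network outputs exactly 0 for every sampled input."""
    return bool(np.all(relu_forward(x, weights) == 0.0))

def dead_fraction(depth, width, trials, rng, n_inputs=64):
    """Fraction of random symmetric initializations that are dead on the sampled inputs."""
    x = rng.uniform(-1.0, 1.0, size=(n_inputs, 1))
    sizes = [1] + [width] * depth
    dead = 0
    for _ in range(trials):
        weights = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))
                   for m, n in zip(sizes[:-1], sizes[1:])]
        dead += is_dead(weights, x)
    return dead / trials

rng = np.random.default_rng(0)
for depth in (2, 10, 50):
    print(depth, dead_fraction(depth, width=2, trials=200, rng=rng))
```

Comparing the printed fractions across depths for a fixed narrow width gives a rough empirical view of how depth affects the chance of being dead at initialization under these assumptions.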

Submitted 21 October, 2020; v1 submitted 15 March, 2019; originally announced March 2019.

arXiv:1812.06467 [pdf, other]

doi 10.1098/rsfs.2018.0083

Linking Gaussian Process regression with data-driven manifold embeddings for nonlinear data fusion

Authors: Seungjoon Lee, Felix Dietrich, George E. Karniadakis, Ioannis G. Kevrekidis

Abstract: In statistical modeling with Gaussian Process regression, it has been shown that combining (few) high-fidelity data with (many) low-fidelity data can enhance prediction accuracy, compared to prediction based on the few high-fidelity data only. Such information fusion techniques for multifidelity data commonly approach the high-fidelity model $f_h(t)$ as a function of two variables $(t,y)$, and then use $f_l(t)$ as the $y$ data. More generally, the high-fidelity model can be written as a function of several variables $(t,y_1,y_2,\ldots)$; the low-fidelity model $f_l$ and, say, some of its derivatives, can then be substituted for these variables. In this paper, we will explore mathematical algorithms for multifidelity information fusion that use such an approach towards improving the representation of the high-fidelity function with only a few training data points. Given that $f_h$ may not be a simple function -- and sometimes not even a function -- of $f_l$, we demonstrate that using additional functions of $t$, such as derivatives or shifts of $f_l$, can drastically improve the approximation of $f_h$ through Gaussian Processes. We also point out a connection with "embedology" techniques from topology and dynamical systems.

Submitted 16 December, 2018; originally announced December 2018.

arXiv:1811.02033 [pdf, other]

Physics-Informed Generative Adversarial Networks for Stochastic Differential Equations

Authors: Liu Yang, Dongkun Zhang, George Em Karniadakis

Abstract: We developed a new class of physics-informed generative adversarial networks (PI-GANs) to solve in a unified manner forward, inverse and mixed stochastic problems based on a limited number of scattered measurements. Unlike standard GANs relying only on data for training, here we encoded into the architecture of GANs the governing physical laws in the form of stochastic differential equations (SDEs) using automatic differentiation. In particular, we applied Wasserstein GANs with gradient penalty (WGAN-GP) for its enhanced stability compared to vanilla GANs. We first tested WGAN-GP in approximating Gaussian processes of different correlation lengths based on data realizations collected from simultaneous reads at sparsely placed sensors. We obtained good approximation of the generated stochastic processes to the target ones even for a mismatch between the input noise dimensionality and the effective dimensionality of the target stochastic processes. We also studied the overfitting issue for both the discriminator and generator, and we found that overfitting occurs also in the generator in addition to the discriminator as previously reported. Subsequently, we considered the solution of elliptic SDEs requiring approximations of three stochastic processes, namely the solution, the forcing, and the diffusion coefficient. We used three generators for the PI-GANs, two of them were feed forward deep neural networks (DNNs) while the other one was the neural network induced by the SDE. Depending on the data, we employed one or multiple feed forward DNNs as the discriminators in PI-GANs. Here, we have demonstrated the accuracy and effectiveness of PI-GANs in solving SDEs for up to 30 dimensions, but in principle, PI-GANs could tackle very high dimensional problems given more sensor data with low-polynomial growth in computational cost.

Submitted 5 November, 2018; originally announced November 2018.

arXiv:1810.11596 [pdf, other]

doi 10.1007/s42967-019-00031-y

Nonlocal flocking dynamics: Learning the fractional order of PDEs from particle simulations

Authors: Zhiping Mao, Zhen Li, George Em Karniadakis

Abstract: Flocking refers to collective behavior of a large number of interacting entities, where the interactions between discrete individuals produce collective motion on the large scale. We employ an agent-based model to describe the microscopic dynamics of each individual in a flock, and use a fractional PDE to model the evolution of macroscopic quantities of interest. The macroscopic models with phenomenological interaction functions are derived by applying the continuum hypothesis to the microscopic model. Instead of specifying the fPDEs with an ad hoc fractional order for nonlocal flocking dynamics, we learn the effective nonlocal influence function in fPDEs directly from particle trajectories generated by the agent-based simulations. We demonstrate how the learning framework is used to connect the discrete agent-based model to the continuum fPDEs in 1D and 2D nonlocal flocking dynamics. In particular, a Cucker-Smale particle model is employed to describe the microscale dynamics of each individual, while Euler equations with nonlocal interaction terms are used to compute the evolution of macroscale quantities. The trajectories generated by the particle simulations mimic the field data of tracking logs that can be obtained experimentally. They can be used to learn the fractional order of the influence function using a Gaussian process regression model implemented with the Bayesian optimization. We show that the numerical solution of the learned Euler equations solved by the finite volume scheme can yield correct density distributions consistent with the collective behavior of the agent-based system. The proposed method offers new insights on how to scale the discrete agent-based models to the continuum-based PDE models, and could serve as a paradigm on extracting effective governing equations for nonlocal flocking dynamics directly from particle trajectories.

Submitted 30 October, 2018; v1 submitted 27 October, 2018; originally announced October 2018.

Comments: 22 pages, 7 figures

Journal ref: Commun. Appl. Math. Comput. 2019, 1: 597-619

arXiv:1809.08327 [pdf, other]

doi 10.1016/j.jcp.2019.07.048

Quantifying total uncertainty in physics-informed neural networks for solving forward and inverse stochastic problems

Authors: Dongkun Zhang, Lu Lu, Ling Guo, George Em Karniadakis

Abstract: Physics-informed neural networks (PINNs) have recently emerged as an alternative way of solving partial differential equations (PDEs) without the need of building elaborate grids, instead, using a straightforward implementation. In particular, in addition to the deep neural network (DNN) for the solution, a second DNN is considered that represents the residual of the PDE. The residual is then combined with the mismatch in the given data of the solution in order to formulate the loss function. This framework is effective but is lacking uncertainty quantification of the solution due to the inherent randomness in the data or due to the approximation limitations of the DNN architecture. Here, we propose a new method with the objective of endowing the DNN with uncertainty quantification for both sources of uncertainty, i.e., the parametric uncertainty and the approximation uncertainty. We first account for the parametric uncertainty when the parameter in the differential equation is represented as a stochastic process. Multiple DNNs are designed to learn the modal functions of the arbitrary polynomial chaos (aPC) expansion of its solution by using stochastic data from sparse sensors. We can then make predictions from new sensor measurements very efficiently with the trained DNNs. Moreover, we employ dropout to correct the over-fitting and also to quantify the uncertainty of DNNs in approximating the modal functions. We then design an active learning strategy based on the dropout uncertainty to place new sensors in the domain to improve the predictions of DNNs. Several numerical tests are conducted for both the forward and the inverse problems to quantify the effectiveness of PINNs combined with uncertainty quantification. This NN-aPC new paradigm of physics-informed deep learning with uncertainty quantification can be readily applied to other types of stochastic PDEs in multi-dimensions.

Submitted 21 September, 2018; originally announced September 2018.

arXiv:1808.08952 [pdf, other]

doi 10.1017/jfm.2018.872

Deep Learning of Vortex Induced Vibrations

Authors: Maziar Raissi, Zhicheng Wang, Michael S. Triantafyllou, George Em Karniadakis

Abstract: Vortex induced vibrations of bluff bodies occur when the vortex shedding frequency is close to the natural frequency of the structure. Of interest is the prediction of the lift and drag forces on the structure given some limited and scattered information on the velocity field. This is an inverse problem that is not straightforward to solve using standard computational fluid dynamics (CFD) methods, especially since no information is provided for the pressure. An even greater challenge is to infer the lift and drag forces given some dye or smoke visualizations of the flow field. Here we employ deep neural networks that are extended to encode the incompressible Navier-Stokes equations coupled with the structure's dynamic motion equation. In the first case, given scattered data in space-time on the velocity field and the structure's motion, we use four coupled deep neural networks to infer very accurately the structural parameters, the entire time-dependent pressure field (with no prior training data), and reconstruct the velocity vector field and the structure's dynamic motion. In the second case, given scattered data in space-time on a concentration field only, we use five coupled deep neural networks to infer very accurately the vector velocity field and all other quantities of interest as before. This new paradigm of inference in fluid mechanics for coupled multi-physics problems enables velocity and pressure quantification from flow snapshots in small subdomains and can be exploited for flow control applications and also for system identification.

Submitted 26 August, 2018; originally announced August 2018.

Comments: arXiv admin note: text overlap with arXiv:1808.04327

arXiv:1808.04947 [pdf, other]

doi 10.4208/cicp.OA-2020-0165

Collapse of Deep and Narrow Neural Nets

Authors: Lu Lu, Yanhui Su, George Em Karniadakis

Abstract: Recent theoretical work has demonstrated that deep neural networks have superior performance over shallow networks, but their training is more difficult, e.g., they suffer from the vanishing gradient problem. This problem can be typically resolved by the rectified linear unit (ReLU) activation. However, here we show that even for such activation, deep and narrow neural networks (NNs) will converge to erroneous mean or median states of the target function depending on the loss with high probability. Deep and narrow NNs are encountered in solving partial differential equations with high-order derivatives. We demonstrate this collapse of such NNs both numerically and theoretically, and provide estimates of the probability of collapse. We also construct a diagram of a safe region for designing NNs that avoid the collapse to erroneous states. Finally, we examine different ways of initialization and normalization that may avoid the collapse problem. Asymmetric initializations may reduce the probability of collapse but do not totally eliminate it.

Submitted 23 December, 2018; v1 submitted 14 August, 2018; originally announced August 2018.

arXiv:1808.04327 [pdf, other]

Hidden Fluid Mechanics: A Navier-Stokes Informed Deep Learning Framework for Assimilating Flow Visualization Data

Authors: Maziar Raissi, Alireza Yazdani, George Em Karniadakis

Abstract: We present hidden fluid mechanics (HFM), a physics informed deep learning framework capable of encoding an important class of physical laws governing fluid motions, namely the Navier-Stokes equations. In particular, we seek to leverage the underlying conservation laws (i.e., for mass, momentum, and energy) to infer hidden quantities of interest such as velocity and pressure fields merely from spatio-temporal visualizations of a passive scalar (e.g., dye or smoke), transported in arbitrarily complex domains (e.g., in human arteries or brain aneurysms). Our approach towards solving the aforementioned data assimilation problem is unique as we design an algorithm that is agnostic to the geometry or the initial and boundary conditions. This makes HFM highly flexible in choosing the spatio-temporal domain of interest for data acquisition as well as subsequent training and predictions. Consequently, the predictions made by HFM are among those cases where a pure machine learning strategy or a mere scientific computing approach simply cannot reproduce. The proposed algorithm achieves accurate predictions of the pressure and velocity fields in both two and three dimensional flows for several benchmark problems motivated by real-world applications. Our results demonstrate that this relatively simple methodology can be used in physical and biomedical problems to extract valuable quantitative information (e.g., lift and drag forces or wall shear stresses in arteries) for which direct measurements may not be possible.

Submitted 13 August, 2018; originally announced August 2018.

arXiv:1808.00931 [pdf, other]

Machine Learning of Space-Fractional Differential Equations

Authors: Mamikon Gulian, Maziar Raissi, Paris Perdikaris, George Karniadakis

Abstract: Data-driven discovery of "hidden physics" -- i.e., machine learning of differential equation models underlying observed data -- has recently been approached by embedding the discovery problem into a Gaussian Process regression of spatial data, treating and discovering unknown equation parameters as hyperparameters of a modified "physics informed" Gaussian Process kernel. This kernel includes the parametrized differential operators applied to a prior covariance kernel. We extend this framework to linear space-fractional differential equations. The methodology is compatible with a wide variety of fractional operators in $\mathbb{R}^d$ and stationary covariance kernels, including the Matern class, and can optimize the Matern parameter during training. We provide a user-friendly and feasible way to perform fractional derivatives of kernels, via a unified set of d-dimensional Fourier integral formulas amenable to generalized Gauss-Laguerre quadrature. The implementation of fractional derivatives has several benefits. First, it allows for discovering fractional-order PDEs for systems characterized by heavy tails or anomalous diffusion, bypassing the analytical difficulty of fractional calculus. Data sets exhibiting such features are of increasing prevalence in physical and financial domains. Second, a single fractional-order archetype allows for a derivative of arbitrary order to be learned, with the order itself being a parameter in the regression. This is advantageous even when used for discovering integer-order equations; the user is not required to assume a "dictionary" of derivatives of various orders, and directly controls the parsimony of the models being discovered. We illustrate on several examples, including fractional-order interpolation of advection-diffusion and modeling relative stock performance in the S&P 500 with alpha-stable motion via a fractional diffusion equation.

Submitted 2 August, 2019; v1 submitted 2 August, 2018; originally announced August 2018.

Comments: 26 pages, 10 figures. In v2, a minor change to the formatting of a handful of references was made in the bibliography; the main text was unchanged. In v3, minor improvements were made to the exposition; more details about motivation, examples, optimization, and relation to previous works were given

MSC Class: 35R11; 65N21; 62M10; 62F15; 60G15; 60G52

arXiv:1806.11187 [pdf, other]

doi 10.1016/j.jcp.2019.01.045

Neural-net-induced Gaussian process regression for function approximation and PDE solution

Authors: Guofei Pang, Liu Yang, George Em Karniadakis

Abstract: Neural-net-induced Gaussian process (NNGP) regression inherits both the high expressivity of deep neural networks (deep NNs) as well as the uncertainty quantification property of Gaussian processes (GPs). We generalize the current NNGP to first include a larger number of hyperparameters and subsequently train the model by maximum likelihood estimation. Unlike previous works on NNGP that targeted classification, here we apply the generalized NNGP to function approximation and to solving partial differential equations (PDEs). Specifically, we develop an analytical iteration formula to compute the covariance function of GP induced by deep NN with an error-function nonlinearity. We compare the performance of the generalized NNGP for function approximations and PDE solutions with those of GPs and fully-connected NNs. We observe that for smooth functions the generalized NNGP can yield the same order of accuracy with GP, while both NNGP and GP outperform deep NN. For non-smooth functions, the generalized NNGP is superior to GP and comparable or superior to deep NN.

Submitted 21 June, 2018; originally announced June 2018.

arXiv:1801.01236 [pdf, other]

Multistep Neural Networks for Data-driven Discovery of Nonlinear Dynamical Systems

Authors: Maziar Raissi, Paris Perdikaris, George Em Karniadakis

Abstract: The process of transforming observed data into predictive mathematical models of the physical world has always been paramount in science and engineering. Although data is currently being collected at an ever-increasing pace, devising meaningful models out of such observations in an automated fashion still remains an open problem. In this work, we put forth a machine learning approach for identifying nonlinear dynamical systems from data. Specifically, we blend classical tools from numerical analysis, namely the multi-step time-stepping schemes, with powerful nonlinear function approximators, namely deep neural networks, to distill the mechanisms that govern the evolution of a given data-set. We test the effectiveness of our approach for several benchmark problems involving the identification of complex, nonlinear and chaotic dynamics, and we demonstrate how this allows us to accurately learn the dynamics, forecast future states, and identify basins of attraction. In particular, we study the Lorenz system, the fluid flow behind a cylinder, the Hopf bifurcation, and the glycolytic oscillator model as an example of complicated nonlinear dynamics typical of biological systems.

Submitted 3 January, 2018; originally announced January 2018.

arXiv:1711.10566 [pdf, other]

Physics Informed Deep Learning (Part II): Data-driven Discovery of Nonlinear Partial Differential Equations

Authors: Maziar Raissi, Paris Perdikaris, George Em Karniadakis

Abstract: We introduce physics informed neural networks -- neural networks that are trained to solve supervised learning tasks while respecting any given law of physics described by general nonlinear partial differential equations. In this second part of our two-part treatise, we focus on the problem of data-driven discovery of partial differential equations. Depending on whether the available data is scattered in space-time or arranged in fixed temporal snapshots, we introduce two main classes of algorithms, namely continuous time and discrete time models. The effectiveness of our approach is demonstrated using a wide range of benchmark problems in mathematical physics, including conservation laws, incompressible fluid flow, and the propagation of nonlinear shallow-water waves.

Submitted 28 November, 2017; originally announced November 2017.

arXiv:1711.10561 [pdf, other]

Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations

Authors: Maziar Raissi, Paris Perdikaris, George Em Karniadakis

Abstract: We introduce physics informed neural networks -- neural networks that are trained to solve supervised learning tasks while respecting any given law of physics described by general nonlinear partial differential equations. In this two part treatise, we present our developments in the context of solving two main classes of problems: data-driven solution and data-driven discovery of partial differential equations. Depending on the nature and arrangement of the available data, we devise two distinct classes of algorithms, namely continuous time and discrete time models. The resulting neural networks form a new class of data-efficient universal function approximators that naturally encode any underlying physical laws as prior information. In this first part, we demonstrate how these networks can be used to infer solutions to partial differential equations, and obtain physics-informed surrogate models that are fully differentiable with respect to all input coordinates and free parameters.

Submitted 28 November, 2017; originally announced November 2017.

arXiv:1708.00588 [pdf, other]

doi 10.1016/j.jcp.2017.11.039

Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations

Authors: Maziar Raissi, George Em Karniadakis

Abstract: While there is currently a lot of enthusiasm about "big data", useful data is usually "small" and expensive to acquire. In this paper, we present a new paradigm of learning partial differential equations from small data. In particular, we introduce hidden physics models, which are essentially data-efficient learning machines capable of leveraging the underlying laws of physics, expressed by time dependent and nonlinear partial differential equations, to extract patterns from high-dimensional data generated from experiments. The proposed methodology may be applied to the problem of learning, system identification, or data-driven discovery of partial differential equations. Our framework relies on Gaussian processes, a powerful tool for probabilistic inference over functions, that enables us to strike a balance between model complexity and data fitting. The effectiveness of the proposed approach is demonstrated through a variety of canonical problems, spanning a number of scientific domains, including the Navier-Stokes, Schrödinger, Kuramoto-Sivashinsky, and time dependent linear fractional equations. The methodology provides a promising new direction for harnessing the long-standing developments of classical methods in applied mathematics and mathematical physics to design learning machines with the ability to operate in complex domains without requiring large quantities of data.

Submitted 21 August, 2017; v1 submitted 1 August, 2017; originally announced August 2017.

arXiv:1703.10230 [pdf, other]

Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations

Authors: Maziar Raissi, Paris Perdikaris, George Em Karniadakis

Abstract: We introduce the concept of numerical Gaussian processes, which we define as Gaussian processes with covariance functions resulting from temporal discretization of time-dependent partial differential equations. Numerical Gaussian processes, by construction, are designed to deal with cases where: (1) all we observe are noisy data on black-box initial conditions, and (2) we are interested in quantifying the uncertainty associated with such noisy data in our solutions to time-dependent partial differential equations. Our method circumvents the need for spatial discretization of the differential operators by proper placement of Gaussian process priors. This is an attempt to construct structured and data-efficient learning machines, which are explicitly informed by the underlying physics that possibly generated the observed data. The effectiveness of the proposed approach is demonstrated through several benchmark problems involving linear and nonlinear time-dependent operators. In all examples, we are able to recover accurate approximations of the latent solutions, and consistently propagate uncertainty, even in cases involving very long time integration.

Submitted 29 March, 2017; originally announced March 2017.

MSC Class: 65C20; 68T05; 65M75

arXiv:1701.02440 [pdf, other]

doi 10.1016/j.jcp.2017.07.050

Machine Learning of Linear Differential Equations using Gaussian Processes

Authors: Maziar Raissi, George Em Karniadakis

Abstract: This work leverages recent advances in probabilistic machine learning to discover conservation laws expressed by parametric linear equations. Such equations involve, but are not limited to, ordinary and partial differential, integro-differential, and fractional order operators. Here, Gaussian process priors are modified according to the particular form of such operators and are employed to infer parameters of the linear equations from scarce and possibly noisy observations. Such observations may come from experiments or "black-box" computer simulations.

Submitted 10 January, 2017; originally announced January 2017.

arXiv:1604.07484 [pdf, other]

Deep Multi-fidelity Gaussian Processes

Authors: Maziar Raissi, George Karniadakis

Abstract: We develop a novel multi-fidelity framework that goes far beyond the classical AR(1) Co-kriging scheme of Kennedy and O'Hagan (2000). Our method can handle general discontinuous cross-correlations among systems with different levels of fidelity. A combination of multi-fidelity Gaussian Processes (AR(1) Co-kriging) and deep neural networks enables us to construct a method that is immune to discontinuities. We demonstrate the effectiveness of the new technology using standard benchmark problems designed to resemble the outputs of complicated high- and low-fidelity codes.

Submitted 25 April, 2016; originally announced April 2016.

Showing 1–42 of 42 results for author: Karniadakis, G