Search | arXiv e-print repository

Solving High-Dimensional Inverse Problems with Auxiliary Uncertainty via Operator Learning with Limited Data

Authors: Joseph Hart, Mamikon Gulian, Indu Manickam, Laura Swiler

Abstract: In complex large-scale systems such as climate, important effects are caused by a combination of confounding processes that are not fully observable. The identification of sources from observations of system state is vital for attribution and prediction, which inform critical policy decisions. The difficulty of these types of inverse problems lies in the inability to isolate sources and the cost of simulating computational models. Surrogate models may enable the many-query algorithms required for source identification, but data challenges arise from high dimensionality of the state and source, limited ensembles of costly model simulations to train a surrogate model, and few and potentially noisy state observations for inversion due to measurement limitations. The influence of auxiliary processes adds an additional layer of uncertainty that further confounds source identification. We introduce a framework based on (1) calibrating deep neural network surrogates to the flow maps provided by an ensemble of simulations obtained by varying sources, and (2) using these surrogates in a Bayesian framework to identify sources from observations via optimization. Focusing on an atmospheric dispersion exemplar, we find that the expressive and computationally efficient nature of the deep neural network operator surrogates in appropriately reduced dimension allows for source identification with uncertainty quantification using limited data.
Introducing a variable wind field as an auxiliary process, we find that a Bayesian approximation error approach is essential for reliable source inversion when uncertainty due to wind stresses the algorithm.
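The Bayesian approximation error (BAE) idea can be illustrated with a toy scalar problem. This is not the authors' implementation: it assumes a hypothetical linear forward model d = a*s + noise whose coefficient a depends on an uncertain auxiliary process, a Gaussian prior on the source s, and a surrogate that knows only the nominal coefficient a0. The approximation error (a - a0)*s is sampled by Monte Carlo and folded into the likelihood as extra Gaussian noise, treated as independent of s for simplicity. The function name `bae_posterior` and all parameter values are illustrative.

```python
import random

def bae_posterior(d, a0, sigma, tau, a_sd, n_samples=20000, seed=0):
    """Scalar Gaussian posterior for a source s given one observation d.

    Hypothetical forward model: d = a * s + noise, noise ~ N(0, sigma^2),
    prior s ~ N(0, tau^2). The coefficient a depends on an uncertain
    auxiliary process (e.g. wind), a ~ N(a0, a_sd^2), but the surrogate
    only uses the nominal value a0.
    """
    rng = random.Random(seed)
    # Monte Carlo samples of the approximation error eps = (a - a0) * s.
    errs = []
    for _ in range(n_samples):
        s = rng.gauss(0.0, tau)
        a = rng.gauss(a0, a_sd)
        errs.append((a - a0) * s)
    m_eps = sum(errs) / n_samples
    v_eps = sum((e - m_eps) ** 2 for e in errs) / (n_samples - 1)

    def posterior(err_mean, err_var):
        # Linear-Gaussian update with likelihood N(d | a0*s + err_mean, sigma^2 + err_var).
        lik_var = sigma ** 2 + err_var
        post_var = 1.0 / (1.0 / tau ** 2 + a0 ** 2 / lik_var)
        post_mean = post_var * a0 * (d - err_mean) / lik_var
        return post_mean, post_var

    naive = posterior(0.0, 0.0)    # ignores the auxiliary uncertainty
    bae = posterior(m_eps, v_eps)  # error modeled as Gaussian, independent of s
    return naive, bae

naive, bae = bae_posterior(d=2.0, a0=1.0, sigma=0.1, tau=1.0, a_sd=0.3)
```

With these values the naive posterior variance is about 1/101, while the BAE posterior variance is about 1/11: the inflated likelihood keeps the source estimate from being overconfident when the auxiliary process is uncertain.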

Submitted 20 March, 2023; originally announced March 2023.

Comments: 29 pages, 10 figures

arXiv:2204.10909

Error-in-variables modelling for operator learning

Authors: Ravi G. Patel, Indu Manickam, Myoungkyu Lee, Mamikon Gulian

Abstract: Deep operator learning has emerged as a promising tool for reduced-order modelling and PDE model discovery. Leveraging the expressive power of deep neural networks, especially in high dimensions, such methods learn the mapping between functional state variables. While proposed methods have assumed noise only in the dependent variables, experimental and numerical data for operator learning typically exhibit noise in the independent variables as well, since both variables represent signals that are subject to measurement error. In regression on scalar data, failure to account for noisy independent variables can lead to biased parameter estimates. With noisy independent variables, linear models fitted via ordinary least squares (OLS) will show attenuation bias, wherein the estimated slope is biased toward zero. In this work, we derive an analogue of attenuation bias for linear operator regression with white noise in both the independent and dependent variables. In the nonlinear setting, we computationally demonstrate underprediction of the action of the Burgers operator in the presence of noise in the independent variable. We propose error-in-variables (EiV) models for two operator regression methods, MOR-Physics and DeepONet, and demonstrate that these new models reduce bias in the presence of noisy independent variables for a variety of operator learning problems. Considering the Burgers operator in 1D and 2D, we demonstrate that EiV operator learning robustly recovers operators in high-noise regimes that defeat OLS operator learning.
We also introduce an EiV model for time-evolving PDE discovery and show that OLS and EiV perform similarly in learning the Kuramoto-Sivashinsky evolution operator from corrupted data, suggesting that the effect of bias in OLS operator learning depends on the regularity of the target operator.
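Attenuation bias in the scalar setting is easy to reproduce. The sketch below is an illustration, not the paper's operator-regression code: it fits OLS slopes with and without Gaussian noise in the regressor and compares the noisy-regressor slope with the classical shrinkage factor var(x) / (var(x) + var(noise)). The function names and parameter values are hypothetical.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least squares slope of y on x, fitted with an intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def attenuation_demo(true_slope=2.0, sd_x=1.0, sd_noise_x=1.0, sd_noise_y=0.5,
                     n=100_000, seed=1):
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, sd_x) for _ in range(n)]
    y = [true_slope * x + rng.gauss(0.0, sd_noise_y) for x in x_true]
    x_obs = [x + rng.gauss(0.0, sd_noise_x) for x in x_true]  # noisy independent variable
    # Classical attenuation factor for Gaussian noise in the regressor.
    factor = sd_x ** 2 / (sd_x ** 2 + sd_noise_x ** 2)
    return ols_slope(x_true, y), ols_slope(x_obs, y), true_slope * factor

clean, noisy, predicted = attenuation_demo()
```

With equal signal and noise variance in x the factor is 1/2, so the slope fitted to the noisy regressor lands near 1 instead of the true value 2.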

Submitted 19 July, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

Comments: 23 pages, 10 figures

arXiv:2110.11531

Fractional Modeling in Action: A Survey of Nonlocal Models for Subsurface Transport, Turbulent Flows, and Anomalous Materials

Authors: Jorge Suzuki, Mamikon Gulian, Mohsen Zayernouri, Marta D'Elia

Abstract: Modeling of phenomena such as anomalous transport via fractional-order differential equations has been established as an effective alternative to classical (integer-order) partial differential equations, owing to their inherent ability to describe large-scale behavior with greater efficiency than fully-resolved classical models. In this review article, we first provide a broad overview of fractional-order derivatives with a clear emphasis on the stochastic processes that underlie their use. We then survey three exemplary application areas (subsurface transport, turbulence, and anomalous materials) in which fractional-order differential equations provide accurate and predictive models. For each area, we report on the evidence of anomalous behavior that justifies the use of fractional-order models, and survey both foundational models and more expressive state-of-the-art models. We also propose avenues for future research, including more advanced and physically sound models, as well as tools for calibration and discovery of fractional-order models.
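As a concrete example of a fractional-order derivative, the Grunwald-Letnikov formula approximates the Riemann-Liouville derivative (lower terminal 0) by a weighted backward difference with weights (-1)^k binom(alpha, k). This is a standard textbook discretization offered as an illustration, not a method taken from the survey; the function name and step count are assumptions.

```python
import math

def grunwald_letnikov(f, x, alpha, n=2000):
    """Grunwald-Letnikov approximation of the order-alpha derivative of f at x,
    with lower terminal 0:

        D^alpha f(x) ~ h^(-alpha) * sum_{k=0}^{n} w_k * f(x - k h),  h = x / n,

    where w_k = (-1)^k * binom(alpha, k) is computed recursively.
    """
    h = x / n
    w = 1.0                            # w_0 = 1
    total = f(x)
    for k in range(1, n + 1):
        w *= 1.0 - (alpha + 1.0) / k   # w_k = w_{k-1} * (1 - (alpha + 1) / k)
        total += w * f(x - k * h)
    return total / h ** alpha

# Riemann-Liouville reference for f(t) = t: D^alpha t = t^(1 - alpha) / Gamma(2 - alpha).
alpha = 0.5
exact = 1.0 ** (1 - alpha) / math.gamma(2 - alpha)
approx = grunwald_letnikov(lambda t: t, 1.0, alpha)
```

For alpha = 1 the weights reduce to 1, -1, 0, 0, ..., recovering the ordinary backward difference; the scheme is first-order accurate in the step h.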

Submitted 21 October, 2021; originally announced October 2021.

Comments: 75 pages, 16 figures

Report number: SAND2021-11291 R

arXiv:2107.03066

Probabilistic partition of unity networks: clustering based deep approximation

Authors: Nat Trask, Mamikon Gulian, Andy Huang, Kookjin Lee

Abstract: Partition of unity networks (POU-Nets) have been shown capable of realizing algebraic convergence rates for regression and solution of PDEs, but require empirical tuning of training parameters. We enrich POU-Nets with a Gaussian noise model to obtain a probabilistic generalization amenable to gradient-based minimization of a maximum likelihood loss. The resulting architecture provides spatial representations of both noiseless and noisy data as Gaussian mixtures with closed-form expressions for the variance, which provide an estimator of local error. The training process yields remarkably sharp partitions of input space based upon correlation of function values. This classification of training points is amenable to a hierarchical refinement strategy that significantly improves the localization of the regression, allowing higher-order polynomial approximation to be used. The framework scales more favorably to large data sets than Gaussian process regression and allows for spatially varying uncertainty, leveraging the expressive power of deep neural networks while bypassing expensive training associated with other probabilistic deep learning methods. Compared to standard deep neural networks, the framework demonstrates hp-convergence without the use of regularizers to tune the localization of partitions. We provide benchmarks quantifying performance in high and low dimensions, demonstrating that convergence rates depend only on the latent dimension of data within high-dimensional space.
Finally, we introduce a new open-source data set of PDE-based simulations of a semiconductor device and perform unsupervised extraction of a physically interpretable reduced-order basis.
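The closed-form variance of a Gaussian mixture follows from the law of total variance. The sketch below assumes, for illustration, that the predictive distribution at a point x is the mixture sum_i phi_i(x) N(mu_i(x), sigma_i^2) with partition-of-unity weights phi_i(x); this is a simplified reading of the abstract rather than the paper's exact model, and the function name is hypothetical.

```python
def mixture_mean_var(weights, means, variances):
    """Mean and variance of the Gaussian mixture sum_i w_i N(mu_i, s_i^2).

    Law of total variance: Var = sum_i w_i (s_i^2 + mu_i^2) - mean^2.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to one (partition of unity)")
    mean = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(w * (v + m * m) for w, m, v in zip(weights, means, variances))
    return mean, second_moment - mean * mean

# Two equally weighted local experts that disagree (means 0 and 2).
mean, var = mixture_mean_var([0.5, 0.5], [0.0, 2.0], [1.0, 1.0])
```

When local experts disagree in a transition region, the mu_i^2 terms raise the variance above the weighted average of the sigma_i^2, which is one way such a variance can act as a local error indicator.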

Submitted 7 July, 2021; originally announced July 2021.

Comments: 12 pages, 6 figures

arXiv:2101.11256

Partition of unity networks: deep hp-approximation

Authors: Kookjin Lee, Nathaniel A. Trask, Ravi G. Patel, Mamikon A. Gulian, Eric C. Cyr

Abstract: Approximation theorists have established best-in-class optimal approximation rates of deep neural networks by utilizing their ability to simultaneously emulate partitions of unity and monomials. Motivated by this, we propose partition of unity networks (POUnets), which incorporate these elements directly into the architecture. Classification architectures of the type used to learn probability measures are used to build a meshfree partition of space, while polynomial spaces with learnable coefficients are associated with each partition. The resulting hp-element-like approximation allows the use of a fast least-squares optimizer, and the resulting architecture size need not scale exponentially with spatial dimension, breaking the curse of dimensionality. An abstract approximation result establishes desirable properties to guide network design. Numerical results for two choices of architecture demonstrate that POUnets yield hp-convergence for smooth functions and consistently outperform MLPs for piecewise polynomial functions with large numbers of discontinuities.
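For a fixed partition, a model of the form y(x) = sum_i phi_i(x) p_i(x) is linear in the polynomial coefficients, which is what makes a least-squares solve possible. In the sketch below the partition is a hypothetical stand-in: a softmax over fixed quadratic logits with hand-picked centers, rather than a trained classification network, and each partition carries a linear polynomial. Only the coefficient solve is shown; training the partition is omitted.

```python
import math

def softmax_partition(x, centers, sharpness):
    """Partition of unity phi_i(x): softmax over logits -sharpness * (x - c_i)^2."""
    logits = [-sharpness * (x - c) ** 2 for c in centers]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def features(x, centers, sharpness):
    """Basis functions phi_i(x) * x^p for p = 0, 1 (one linear polynomial per partition)."""
    phi = softmax_partition(x, centers, sharpness)
    return [p * x ** deg for p in phi for deg in (0, 1)]

def solve(mat, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [row[:] + [v] for row, v in zip(mat, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol

def fit_pou(xs, ys, centers, sharpness=200.0, ridge=1e-10):
    """Least-squares fit of the polynomial coefficients for a fixed partition."""
    rows = [features(x, centers, sharpness) for x in xs]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    coef = solve(ata, aty)
    return lambda x: sum(c * f for c, f in zip(coef, features(x, centers, sharpness)))

xs = [i / 200 for i in range(201)]
model = fit_pou(xs, [abs(x - 0.5) for x in xs], centers=[0.25, 0.75])
```

Fitting |x - 0.5| with centers at 0.25 and 0.75 places one linear piece on each side of the kink, illustrating how the partition localizes the polynomial approximation.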

Submitted 27 January, 2021; originally announced January 2021.

Comments: 8 pages, 5 figures

arXiv:2006.10123

A block coordinate descent optimizer for classification problems exploiting convexity

Authors: Ravi G. Patel, Nathaniel A. Trask, Mamikon A. Gulian, Eric C. Cyr

Abstract: Second-order optimizers hold intriguing potential for deep learning, but suffer from increased cost and sensitivity to the non-convexity of the loss surface as compared to gradient-based approaches. We introduce a coordinate descent method to train deep neural networks for classification tasks that exploits global convexity of the cross-entropy loss in the weights of the linear layer. Our hybrid Newton/Gradient Descent (NGD) method is consistent with the interpretation of hidden layers as providing an adaptive basis and the linear layer as providing an optimal fit of the basis to data. By alternating between a second-order method to find globally optimal parameters for the linear layer and gradient descent to train the hidden layers, we ensure an optimal fit of the adaptive basis to data throughout training. The size of the Hessian in the second-order step scales only with the number of weights in the linear layer and not with the depth and width of the hidden layers; furthermore, the approach is applicable to arbitrary hidden-layer architectures. Previous work applying this adaptive basis perspective to regression problems demonstrated significant improvements in accuracy at reduced training cost, and this work can be viewed as an extension of this approach to classification problems. We first prove that the resulting Hessian matrix is symmetric positive semi-definite, and that the Newton step realizes a global minimizer.
By studying classification of manufactured two-dimensional point cloud data, we demonstrate both an improvement in validation error and a striking qualitative difference in the basis functions encoded in the hidden layer when trained using NGD. Application to image classification benchmarks for both dense and convolutional architectures reveals improved training accuracy, suggesting possible gains of second-order methods over gradient descent.
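The convexity that NGD exploits can be seen in the simplest case: with the hidden-layer outputs held fixed, the cross-entropy loss is convex in the output-layer weights, and its Hessian is a sum of positive semi-definite terms. The sketch below shows only the Newton half of the alternation, for a binary logistic output with a single hypothetical feature h(x) = tanh(x); the paper treats multi-class cross-entropy and arbitrary hidden layers. The small ridge term is an added assumption that makes the Hessian strictly positive definite, and the gradient-descent step on the hidden layer is omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_hess(b, w, feats, labels, ridge):
    """Gradient and Hessian of the ridge-regularized binary cross-entropy in (b, w)."""
    g_b, g_w = ridge * b, ridge * w
    h_bb, h_bw, h_ww = ridge, 0.0, ridge
    for h, y in zip(feats, labels):
        p = sigmoid(b + w * h)
        g_b += p - y
        g_w += (p - y) * h
        s = p * (1.0 - p)  # s >= 0, so each data term adds a PSD matrix
        h_bb += s
        h_bw += s * h
        h_ww += s * h * h
    return (g_b, g_w), (h_bb, h_bw, h_ww)

def newton_linear_layer(feats, labels, ridge=1e-3, iters=25):
    """Newton's method for the output-layer weights with the hidden features fixed."""
    b, w = 0.0, 0.0
    for _ in range(iters):
        (g_b, g_w), (h_bb, h_bw, h_ww) = grad_hess(b, w, feats, labels, ridge)
        det = h_bb * h_ww - h_bw * h_bw  # > 0 because the ridge makes H positive definite
        # Solve the 2x2 Newton system H [db, dw]^T = [g_b, g_w]^T explicitly.
        db = (h_ww * g_b - h_bw * g_w) / det
        dw = (h_bb * g_w - h_bw * g_b) / det
        b, w = b - db, w - dw
    return b, w

# Hypothetical fixed "hidden layer" h(x) = tanh(x) and non-separable labels.
xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1, 1]
feats = [math.tanh(x) for x in xs]
b, w = newton_linear_layer(feats, labels)
```

Because the loss is convex in (b, w), the point where the gradient vanishes is the global minimizer for the current features, matching the "optimal fit of the basis" interpretation.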

Submitted 17 June, 2020; originally announced June 2020.

Comments: 10 pages, 4 figures

arXiv:2006.09319

doi 10.1615/JMachLearnModelComput.2020035155

A Survey of Constrained Gaussian Process Regression: Approaches and Implementation Challenges

Authors: Laura Swiler, Mamikon Gulian, Ari Frankel, Cosmin Safta, John Jakeman

Abstract: Gaussian process regression is a popular Bayesian framework for surrogate modeling of expensive data sources. As part of a broader effort in scientific machine learning, many recent works have incorporated physical constraints or other a priori information within Gaussian process regression to supplement limited data and regularize the behavior of the model. We provide an overview and survey of several classes of Gaussian process constraints, including positivity or bound constraints, monotonicity and convexity constraints, differential equation constraints provided by linear PDEs, and boundary condition constraints. We compare the strategies behind each approach as well as the differences in implementation, concluding with a discussion of the computational challenges introduced by constraints.

Submitted 6 January, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: 42 pages, 3 figures. Version 3: DOI & Reference added; appeared in Journal of Machine Learning for Modeling and Computing. Version 2 includes minor additions, clarifications and improvements to notation

Journal ref: Journal of Machine Learning for Modeling and Computing, 1(2):119-156 (2020)

arXiv:1912.04862

Robust Training and Initialization of Deep Neural Networks: An Adaptive Basis Viewpoint

Authors: Eric C. Cyr, Mamikon A. Gulian, Ravi G. Patel, Mauro Perego, Nathaniel A. Trask

Abstract: Motivated by the gap between theoretical optimal approximation rates of deep neural networks (DNNs) and the accuracy realized in practice, we seek to improve the training of DNNs. The adoption of an adaptive basis viewpoint of DNNs leads to novel initializations and a hybrid least squares/gradient descent optimizer. We provide analysis of these techniques and illustrate via numerical examples dramatic increases in accuracy and convergence rate for benchmarks characterizing scientific applications where DNNs are currently used, including regression problems and physics-informed neural networks for the solution of partial differential equations.
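The hybrid least squares/gradient descent idea can be sketched on a one-hidden-layer tanh network: the output coefficients are re-solved exactly by ridge-regularized least squares for the current basis, and the hidden weights then take a gradient step with those coefficients frozen. This is an illustrative sketch under stated assumptions (a constant plus three tanh units, a backtracking step size, a sin(3x) target, hypothetical function names), not the authors' code or their proposed initialization.

```python
import math

def basis(x, a, d):
    """Adaptive basis: a constant plus tanh(a_j x + d_j) for each hidden unit."""
    return [1.0] + [math.tanh(aj * x + dj) for aj, dj in zip(a, d)]

def solve(mat, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [row[:] + [v] for row, v in zip(mat, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol

def least_squares_layer(xs, ys, a, d, ridge):
    """Optimal linear-layer coefficients for the current basis (ridge normal equations)."""
    rows = [basis(x, a, d) for x in xs]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(ata, aty)

def objective(xs, ys, a, d, c, ridge):
    """0.5 * sum of squared residuals + 0.5 * ridge * |c|^2."""
    sse = sum((sum(ci * bi for ci, bi in zip(c, basis(x, a, d))) - y) ** 2
              for x, y in zip(xs, ys))
    return 0.5 * sse + 0.5 * ridge * sum(ci * ci for ci in c)

def hidden_gradient(xs, ys, a, d, c):
    """Gradient of the objective w.r.t. hidden weights a and biases d, with c held fixed."""
    ga, gd = [0.0] * len(a), [0.0] * len(d)
    for x, y in zip(xs, ys):
        t = [math.tanh(aj * x + dj) for aj, dj in zip(a, d)]
        resid = c[0] + sum(cj * tj for cj, tj in zip(c[1:], t)) - y
        for j, tj in enumerate(t):
            common = resid * c[j + 1] * (1.0 - tj * tj)
            ga[j] += common * x
            gd[j] += common
    return ga, gd

def hybrid_ls_gd(xs, ys, a, d, outer=20, lr=0.1, ridge=1e-6):
    c = least_squares_layer(xs, ys, a, d, ridge)
    history = [objective(xs, ys, a, d, c, ridge)]
    for _ in range(outer):
        ga, gd = hidden_gradient(xs, ys, a, d, c)
        step = lr
        while step > 1e-8:  # backtracking: accept only a step that lowers the objective
            a_new = [aj - step * g for aj, g in zip(a, ga)]
            d_new = [dj - step * g for dj, g in zip(d, gd)]
            if objective(xs, ys, a_new, d_new, c, ridge) < history[-1]:
                a, d = a_new, d_new
                break
            step *= 0.5
        c = least_squares_layer(xs, ys, a, d, ridge)  # re-fit the linear layer exactly
        history.append(objective(xs, ys, a, d, c, ridge))
    return a, d, c, history

xs = [-1.0 + 2.0 * i / 40 for i in range(41)]
ys = [math.sin(3.0 * x) for x in xs]
a, d, c, history = hybrid_ls_gd(xs, ys, a=[1.0, 2.0, 3.0], d=[-0.5, 0.0, 0.5])
```

Because each gradient step is accepted only if it lowers the objective, and the subsequent least-squares solve is optimal for the new basis, the recorded objective values are non-increasing.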

Submitted 10 December, 2019; originally announced December 2019.

Comments: 26 pages

arXiv:1808.00931

Machine Learning of Space-Fractional Differential Equations

Authors: Mamikon Gulian, Maziar Raissi, Paris Perdikaris, George Karniadakis

Abstract: Data-driven discovery of "hidden physics" (i.e., machine learning of differential equation models underlying observed data) has recently been approached by embedding the discovery problem into a Gaussian Process regression of spatial data, treating and discovering unknown equation parameters as hyperparameters of a modified "physics informed" Gaussian Process kernel. This kernel includes the parametrized differential operators applied to a prior covariance kernel. We extend this framework to linear space-fractional differential equations. The methodology is compatible with a wide variety of fractional operators in $\mathbb{R}^d$ and stationary covariance kernels, including the Matérn class, and can optimize the Matérn parameter during training. We provide a user-friendly and feasible way to compute fractional derivatives of kernels, via a unified set of $d$-dimensional Fourier integral formulas amenable to generalized Gauss-Laguerre quadrature. The implementation of fractional derivatives has several benefits. First, it allows for discovering fractional-order PDEs for systems characterized by heavy tails or anomalous diffusion, bypassing the analytical difficulty of fractional calculus. Data sets exhibiting such features are increasingly prevalent in physical and financial domains. Second, a single fractional-order archetype allows for a derivative of arbitrary order to be learned, with the order itself being a parameter in the regression.
This is advantageous even when used for discovering integer-order equations; the user is not required to assume a "dictionary" of derivatives of various orders, and directly controls the parsimony of the models being discovered. We illustrate the method on several examples, including fractional-order interpolation of advection-diffusion and modeling relative stock performance in the S&P 500 with alpha-stable motion via a fractional diffusion equation.
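Fractional derivatives of a stationary kernel can be evaluated in Fourier space by multiplying the spectral density by the operator's symbol. The 1D sketch below applies the fractional Laplacian, whose symbol is |w|^alpha, to one argument of a squared-exponential kernel. The kernel, operator, and names are illustrative assumptions: the paper works with $d$-dimensional formulas, covers kernels such as the Matérn class, and uses generalized Gauss-Laguerre quadrature, whereas this sketch substitutes a plain trapezoidal rule on a truncated interval.

```python
import math

def se_spectral_density(w, ell=1.0):
    """Spectral density S(w) of k(r) = exp(-r^2 / (2 ell^2)), using the
    convention k(r) = (1 / 2pi) * integral_R S(w) exp(i w r) dw."""
    return math.sqrt(2.0 * math.pi) * ell * math.exp(-0.5 * (ell * w) ** 2)

def fractional_kernel(r, alpha, ell=1.0, w_max=12.0, n=6000):
    """Apply the operator with Fourier symbol |w|^alpha to one argument of the
    SE kernel, evaluated at lag r:

        (1 / pi) * integral_0^inf w^alpha S(w) cos(w r) dw,

    approximated with the composite trapezoidal rule on [0, w_max]."""
    h = w_max / n
    total = 0.0
    for i in range(n + 1):
        w = i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * w ** alpha * se_spectral_density(w, ell) * math.cos(w * r)
    return total * h / math.pi

# alpha = 2 should recover -k''(r) = (1 - r^2) exp(-r^2 / 2) for ell = 1.
check = fractional_kernel(0.5, 2.0)
reference = (1.0 - 0.5 ** 2) * math.exp(-0.5 ** 2 / 2.0)
```

Setting alpha = 2 reproduces the classical result -k''(r), which gives a simple correctness check; non-integer values such as alpha = 0.5 give a fractional-order kernel through the same integral.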

Submitted 2 August, 2019; v1 submitted 2 August, 2018; originally announced August 2018.

Comments: 26 pages, 10 figures. In v2, a minor change to the formatting of a handful of references was made in the bibliography; the main text was unchanged. In v3, minor improvements were made to the exposition; more details about motivation, examples, optimization, and relation to previous works were given

MSC Class: 35R11; 65N21; 62M10; 62F15; 60G15; 60G52

Showing 1–9 of 9 results for author: Gulian, M