Optimal deep learning of holomorphic operators between Banach spaces

Authors: Ben Adcock, Nick Dexter, Sebastian Moraga

Abstract: Operator learning problems arise in many key areas of scientific computing where Partial Differential Equations (PDEs) are used to model physical systems. In such scenarios, the operators map between Banach or Hilbert spaces. In this work, we tackle the problem of learning operators between Banach spaces, in contrast to the vast majority of past works considering only Hilbert spaces. We focus on learning holomorphic operators - an important class of problems with many applications. We combine arbitrary approximate encoders and decoders with standard feedforward Deep Neural Network (DNN) architectures - specifically, those with constant width exceeding the depth - under standard $\ell^2$-loss minimization. We first identify a family of DNNs such that the resulting Deep Learning (DL) procedure achieves optimal generalization bounds for such operators. For standard fully-connected architectures, we then show that there are uncountably many minimizers of the training problem that yield equivalent optimal performance. The DNN architectures we consider are 'problem agnostic', with width and depth only depending on the amount of training data $m$ and not on regularity assumptions of the target operator. Next, we show that DL is optimal for this problem: no recovery procedure can surpass these generalization bounds up to log terms. Finally, we present numerical results demonstrating the practical performance on challenging problems including the parametric diffusion, Navier-Stokes-Brinkman and Boussinesq PDEs.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.01539 [pdf, other]

Physics-informed deep learning and compressive collocation for high-dimensional diffusion-reaction equations: practical existence theory and numerics

Authors: Simone Brugiapaglia, Nick Dexter, Samir Karam, Weiqi Wang

Abstract: On the forefront of scientific computing, Deep Learning (DL), i.e., machine learning with Deep Neural Networks (DNNs), has emerged as a powerful new tool for solving Partial Differential Equations (PDEs). It has been observed that DNNs are particularly well suited to weakening the effect of the curse of dimensionality, a term coined by Richard E. Bellman in the late '50s to describe challenges such as the exponential dependence of the sample complexity, i.e., the number of samples required to solve an approximation problem, on the dimension of the ambient space. However, although DNNs have been used to solve PDEs since the '90s, the literature underpinning their mathematical efficiency in terms of numerical analysis (i.e., stability, accuracy, and sample complexity) is only recently beginning to emerge. In this paper, we leverage recent advancements in function approximation using sparsity-based techniques and random sampling to develop and analyze an efficient high-dimensional PDE solver based on DL. We show, both theoretically and numerically, that it can compete with a novel stable and accurate compressive spectral collocation method. In particular, we demonstrate a new practical existence theorem, which establishes the existence of a class of trainable DNNs with suitable bounds on the network architecture and a sufficient condition on the sample complexity, with logarithmic or, at worst, linear scaling in dimension, such that the resulting networks stably and accurately approximate a diffusion-reaction PDE with high probability.

Submitted 10 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2404.03761 [pdf, other]

Learning smooth functions in high dimensions: from sparse polynomials to deep neural networks

Authors: Ben Adcock, Simone Brugiapaglia, Nick Dexter, Sebastian Moraga

Abstract: Learning approximations to smooth target functions of many variables from finite sets of pointwise samples is an important task in scientific computing and its many applications in computational science and engineering. Despite well over half a century of research on high-dimensional approximation, this remains a challenging problem. Yet, significant advances have been made in the last decade towards efficient methods for doing this, commencing with so-called sparse polynomial approximation methods and continuing most recently with methods based on Deep Neural Networks (DNNs). In tandem, there have been substantial advances in the relevant approximation theory and analysis of these techniques. In this work, we survey this recent progress. We describe the contemporary motivations for this problem, which stem from parametric models and computational uncertainty quantification; the relevant function classes, namely, classes of infinite-dimensional, Banach-valued, holomorphic functions; fundamental limits of learnability from finite data for these classes; and finally, sparse polynomial and DNN methods for efficiently learning such functions from finite data. For the latter, there is currently a significant gap between the approximation theory of DNNs and the practical performance of deep learning. Aiming to narrow this gap, we develop the topic of practical existence theory, which asserts the existence of dimension-independent DNN architectures and training strategies that achieve provably near-optimal generalization errors in terms of the amount of training data.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2311.14886 [pdf, ps, other]

A unified framework for learning with nonlinear model classes from arbitrary linear samples

Authors: Ben Adcock, Juan M. Cardenas, Nick Dexter

Abstract: This work considers the fundamental problem of learning an unknown object from training data using a given model class. We introduce a unified framework that allows for objects in arbitrary Hilbert spaces, general types of (random) linear measurements as training data and general types of nonlinear model classes. We establish a series of learning guarantees for this framework. These guarantees provide explicit relations between the amount of training data and properties of the model class to ensure near-best generalization bounds. In doing so, we also introduce and develop the key notion of the variation of a model class with respect to a distribution of sampling operators. To exhibit the versatility of this framework, we show that it can accommodate many different types of well-known problems of interest. We present examples such as matrix sketching by random sampling, compressed sensing with isotropic vectors, active learning in regression and compressed sensing with generative models. In all cases, we show how known results become straightforward corollaries of our general learning guarantees. For compressed sensing with generative models, we also present a number of generalizations and improvements of recent results. In summary, our work not only introduces a unified way to study learning unknown objects from general types of data, but also establishes a series of general theoretical guarantees which consolidate and improve various known results.

Submitted 24 November, 2023; originally announced November 2023.

arXiv:2310.16940 [pdf, ps, other]

Optimal approximation of infinite-dimensional holomorphic functions II: recovery from i.i.d. pointwise samples

Authors: Ben Adcock, Nick Dexter, Sebastian Moraga

Abstract: Infinite-dimensional, holomorphic functions have been studied in detail over the last several decades, due to their relevance to parametric differential equations and computational uncertainty quantification. The approximation of such functions from finitely many samples is of particular interest, due to the practical importance of constructing surrogate models to complex mathematical models of physical processes. In a previous work [5], we studied the approximation of so-called Banach-valued, $(\boldsymbol{b},\varepsilon)$-holomorphic functions on the infinite-dimensional hypercube $[-1,1]^{\mathbb{N}}$ from $m$ (potentially adaptive) samples. In particular, we derived lower bounds for the adaptive $m$-widths for classes of such functions, which showed that certain algebraic rates of the form $m^{1/2-1/p}$ are the best possible regardless of the sampling-recovery pair. In this work, we continue this investigation by focusing on the practical case where the samples are pointwise evaluations drawn identically and independently from a probability measure. Specifically, for Hilbert-valued $(\boldsymbol{b},\varepsilon)$-holomorphic functions, we show that the same rates can be achieved (up to a small polylogarithmic or algebraic factor) for essentially arbitrary tensor-product Jacobi (ultraspherical) measures. Our reconstruction maps are based on least squares and compressed sensing procedures using the corresponding orthonormal Jacobi polynomials. In doing so, we strengthen and generalize past work that has derived weaker nonuniform guarantees for the uniform and Chebyshev measures (and corresponding polynomials) only. We also extend various best $s$-term polynomial approximation error bounds to arbitrary Jacobi polynomial expansions. Overall, we demonstrate that i.i.d. pointwise samples are near-optimal for the recovery of infinite-dimensional, holomorphic functions.

Submitted 25 October, 2023; originally announced October 2023.

MSC Class: 65D40; 41A10; 41A63; 65Y20; 41A25
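
As a worked illustration of the rate $m^{1/2-1/p}$ quoted in the abstract above, take the hypothetical value $p = 1/2$. The abstract does not fix $p$, so this choice is an assumption made only to carry out the arithmetic. Under it, the rate becomes $m^{-3/2}$, and doubling the number of samples would shrink such a bound by a factor of roughly 0.35:

```latex
% Assumption: p = 1/2, chosen only for illustration.
\[
  m^{1/2 - 1/p}\Big|_{p = 1/2} = m^{1/2 - 2} = m^{-3/2},
  \qquad
  \frac{(2m)^{-3/2}}{m^{-3/2}} = 2^{-3/2} \approx 0.354 .
\]
```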

arXiv:2306.00945 [pdf, other]

CS4ML: A general framework for active learning with arbitrary data based on Christoffel functions

Authors: Ben Adcock, Juan M. Cardenas, Nick Dexter

Abstract: We introduce a general framework for active learning in regression problems. Our framework extends the standard setup by allowing for general types of data, rather than merely pointwise samples of the target function. This generalization covers many cases of practical interest, such as data acquired in transform domains (e.g., Fourier data), vector-valued data (e.g., gradient-augmented data), data acquired along continuous curves, and multimodal data (i.e., combinations of different types of measurements). Our framework considers random sampling according to a finite number of sampling measures and arbitrary nonlinear approximation spaces (model classes). We introduce the concept of generalized Christoffel functions and show how these can be used to optimize the sampling measures. We prove that this leads to near-optimal sample complexity in various important cases. This paper focuses on applications in scientific computing, where active learning is often desirable, since it is usually expensive to generate data. We demonstrate the efficacy of our framework for gradient-augmented learning with polynomials, Magnetic Resonance Imaging (MRI) using generative models and adaptive sampling for solving PDEs using Physics-Informed Neural Networks (PINNs).

Submitted 7 December, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

arXiv:2305.18642 [pdf, ps, other]

Optimal approximation of infinite-dimensional holomorphic functions

Authors: Ben Adcock, Nick Dexter, Sebastian Moraga

Abstract: Over the last decade, approximating functions in infinite dimensions from samples has gained increasing attention in computational science and engineering, especially in computational uncertainty quantification. This is primarily due to the relevance of functions that are solutions to parametric differential equations in various fields, e.g. chemistry, economics, engineering, and physics. While acquiring accurate and reliable approximations of such functions is inherently difficult, current benchmark methods exploit the fact that such functions often belong to certain classes of holomorphic functions to get algebraic convergence rates in infinite dimensions with respect to the number of (potentially adaptive) samples $m$. Our work focuses on providing theoretical approximation guarantees for the class of $(\boldsymbol{b},\varepsilon)$-holomorphic functions, demonstrating that these algebraic rates are the best possible for Banach-valued functions in infinite dimensions. We establish lower bounds using a reduction to a discrete problem in combination with the theory of $m$-widths, Gelfand widths and Kolmogorov widths. We study two cases, known and unknown anisotropy, in which the relative importance of the variables is known and unknown, respectively. A key conclusion of our paper is that in the latter setting, approximation from finite samples is impossible without some inherent ordering of the variables, even if the samples are chosen adaptively. Finally, in both cases, we demonstrate near-optimal, non-adaptive (random) sampling and recovery strategies which achieve close to the same rates as the lower bounds.

Submitted 16 October, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

MSC Class: 65D40; 41A10; 41A63; 65Y20; 41A25

arXiv:2211.12633 [pdf, ps, other]

Near-optimal learning of Banach-valued, high-dimensional functions via deep neural networks

Authors: Ben Adcock, Simone Brugiapaglia, Nick Dexter, Sebastian Moraga

Abstract: The past decade has seen increasing interest in applying Deep Learning (DL) to Computational Science and Engineering (CSE). Driven by impressive results in applications such as computer vision, Uncertainty Quantification (UQ), genetics, simulations and image processing, DL is increasingly supplanting classical algorithms, and seems poised to revolutionize scientific computing. However, DL is not yet well-understood from the standpoint of numerical analysis. Little is known about the efficiency and reliability of DL from the perspectives of stability, robustness, accuracy, and sample complexity. In particular, approximating solutions to parametric PDEs is an objective of UQ for CSE. Training data for such problems is often scarce and corrupted by errors. Moreover, the target function is a possibly infinite-dimensional smooth function taking values in the PDE solution space, generally an infinite-dimensional Banach space. This paper provides arguments for Deep Neural Network (DNN) approximation of such functions, with both known and unknown parametric dependence, that overcome the curse of dimensionality. We establish practical existence theorems that describe classes of DNNs with dimension-independent architecture size and training procedures based on minimizing the (regularized) $\ell^2$-loss which achieve near-optimal algebraic rates of convergence. These results involve key extensions of compressed sensing for Banach-valued recovery and polynomial emulation with DNNs. When approximating solutions of parametric PDEs, our results account for all sources of error, i.e., sampling, optimization, approximation and physical discretization, and allow for training high-fidelity DNN approximations from coarse-grained sample data. Our theoretical results fall into the category of non-intrusive methods, providing a theoretical alternative to classical methods for high-dimensional approximation.

Submitted 29 July, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

Comments: 49 pages

MSC Class: 65D40 (Primary) 68T07 (Secondary) 68Q32

arXiv:2208.12190 [pdf, other]

CAS4DL: Christoffel Adaptive Sampling for function approximation via Deep Learning

Authors: Ben Adcock, Juan M. Cardenas, Nick Dexter

Abstract: The problem of approximating smooth, multivariate functions from sample points arises in many applications in scientific computing, e.g., in computational Uncertainty Quantification (UQ) for science and engineering. In these applications, the target function may represent a desired quantity of interest of a parameterized Partial Differential Equation (PDE). Due to the large cost of solving such problems, where each sample is computed by solving a PDE, sample efficiency is a key concern in these applications. Recently, there has been increasing focus on the use of Deep Neural Networks (DNNs) and Deep Learning (DL) for learning such functions from data. In this work, we propose an adaptive sampling strategy, CAS4DL (Christoffel Adaptive Sampling for Deep Learning), to increase the sample efficiency of DL for multivariate function approximation. Our novel approach is based on interpreting the second to last layer of a DNN as a dictionary of functions defined by the nodes on that layer. With this viewpoint, we then define an adaptive sampling strategy motivated by adaptive sampling schemes recently proposed for linear approximation schemes, wherein samples are drawn randomly with respect to the Christoffel function of the subspace spanned by this dictionary. We present numerical experiments comparing CAS4DL with standard Monte Carlo (MC) sampling. Our results demonstrate that CAS4DL often yields substantial savings in the number of samples required to achieve a given accuracy, particularly in the case of smooth activation functions, and that it shows better stability than MC. These results are therefore a promising step towards fully adapting DL to scientific computing applications.

Submitted 25 August, 2022; originally announced August 2022.

Comments: 34 pages, 9 figures
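
The idea of drawing samples with respect to the Christoffel function of a subspace, mentioned in the abstract above, can be sketched in a simplified setting. The example below is not the CAS4DL method: as an assumption, it replaces the DNN-derived dictionary with the first `n` Legendre polynomials on $[-1,1]$, and it samples from a finite candidate grid rather than a continuous domain. The function names (`legendre_values`, `christoffel`, `christoffel_sample`) and the parameters `grid_size` and `seed` are hypothetical and chosen only for illustration.

```python
import random


def legendre_values(x, n):
    """Return [P_0(x), ..., P_{n-1}(x)] using the three-term recurrence."""
    values = [1.0, x][:n]
    for k in range(1, n - 1):
        # (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
        values.append(((2 * k + 1) * x * values[k] - k * values[k - 1]) / (k + 1))
    return values


def christoffel(x, n):
    """K(x) = sum_k (2k + 1) P_k(x)^2 for the span of P_0, ..., P_{n-1}.

    The factors (2k + 1) make the P_k orthonormal with respect to the
    uniform probability measure on [-1, 1]. In the classical convention,
    K is the reciprocal of the Christoffel function.
    """
    return sum((2 * k + 1) * p * p for k, p in enumerate(legendre_values(x, n)))


def christoffel_sample(n, m, grid_size=2001, seed=0):
    """Draw m points from a uniform grid with probability proportional to K."""
    rng = random.Random(seed)
    grid = [-1.0 + 2.0 * i / (grid_size - 1) for i in range(grid_size)]
    weights = [christoffel(x, n) for x in grid]
    return rng.choices(grid, weights=weights, k=m)


# Assumed illustrative sizes: a 5-dimensional subspace and 10 samples.
samples = christoffel_sample(n=5, m=10)
```

Because $K$ is largest near the endpoints of the interval for polynomial subspaces, this weighting places more samples there than uniform Monte Carlo sampling would. An adaptive scheme would recompute the weights whenever the subspace changes.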

arXiv:2203.13908 [pdf, other]

On efficient algorithms for computing near-best polynomial approximations to high-dimensional, Hilbert-valued functions from limited samples

Authors: Ben Adcock, Simone Brugiapaglia, Nick Dexter, Sebastian Moraga

Abstract: Sparse polynomial approximation has become indispensable for approximating smooth, high- or infinite-dimensional functions from limited samples. This is a key task in computational science and engineering, e.g., surrogate modelling in uncertainty quantification where the function is the solution map of a parametric or stochastic differential equation (DE). Yet, sparse polynomial approximation lacks a complete theory. On the one hand, there is a well-developed theory of best $s$-term polynomial approximation, which asserts exponential or algebraic rates of convergence for holomorphic functions. On the other, there are increasingly mature methods such as (weighted) $\ell^1$-minimization for computing such approximations. While the sample complexity of these methods has been analyzed with compressed sensing, whether they achieve best $s$-term approximation rates is not fully understood. Furthermore, these methods are not algorithms per se, as they involve exact minimizers of nonlinear optimization problems. This paper closes these gaps. Specifically, we consider the following question: are there robust, efficient algorithms for computing approximations to finite- or infinite-dimensional, holomorphic and Hilbert-valued functions from limited samples that achieve best $s$-term rates? We answer this affirmatively by introducing algorithms and theoretical guarantees that assert exponential or algebraic rates of convergence, along with robustness to sampling, algorithmic, and physical discretization errors. We tackle both scalar- and Hilbert-valued functions, this being key to parametric or stochastic DEs. Our results involve significant developments of existing techniques, including a novel restarted primal-dual iteration for solving weighted $\ell^1$-minimization problems in Hilbert spaces. Our theory is supplemented by numerical experiments demonstrating the efficacy of these algorithms.

Submitted 6 November, 2023; v1 submitted 25 March, 2022; originally announced March 2022.

arXiv:2202.02360 [pdf, other]

Towards optimal sampling for learning sparse approximation in high dimensions

Authors: Ben Adcock, Juan M. Cardenas, Nick Dexter, Sebastian Moraga

Abstract: In this chapter, we discuss recent work on learning sparse approximations to high-dimensional functions on data, where the target functions may be scalar-, vector- or even Hilbert space-valued. Our main objective is to study how the sampling strategy affects the sample complexity -- that is, the number of samples that suffice for accurate and stable recovery -- and to use this insight to obtain optimal or near-optimal sampling procedures. We consider two settings. First, when a target sparse representation is known, in which case we present a near-complete answer based on drawing independent random samples from carefully-designed probability measures. Second, we consider the more challenging scenario when such representation is unknown. In this case, while not giving a full answer, we describe a general construction of sampling measures that improves over standard Monte Carlo sampling. We present examples using algebraic and trigonometric polynomials, and for the former, we also introduce a new procedure for function approximation on irregular (i.e., nontensorial) domains. The effectiveness of this procedure is shown through numerical examples. Finally, we discuss a number of structured sparsity models, and how they may lead to better approximations.

Submitted 4 February, 2022; originally announced February 2022.

arXiv:2202.00144 [pdf, other]

An Adaptive sampling and domain learning strategy for multivariate function approximation on unknown domains

Authors: Ben Adcock, Juan M. Cardenas, Nick Dexter

Abstract: Many problems in computational science and engineering can be described in terms of approximating a smooth function of $d$ variables, defined over an unknown domain of interest $Ω\subset \mathbb{R}^d$, from sample data. Here both the curse of dimensionality ($d\gg 1$) and the lack of domain knowledge with $Ω$ potentially irregular and/or disconnected are confounding factors for sampling-based methods. Naïve approaches often lead to wasted samples and inefficient approximation schemes. For example, uniform sampling can result in upwards of 20% wasted samples in some problems. In surrogate model construction in computational uncertainty quantification (UQ), the high cost of computing samples calls for a more efficient sampling procedure. In recent years, methods for computing such approximations from sample data have been studied in the case of irregular domains. The advantages of computing sampling measures depending on an approximation space $P$ of $\dim(P)=N$ have been shown. In particular, such methods confer advantages such as stability and well-conditioning, with $\mathcal{O}(N\log(N))$ as sample complexity. The recently-proposed adaptive sampling for general domains (ASGD) strategy is one method to construct these sampling measures. The main contribution of this paper is to improve ASGD by adaptively updating the sampling measures over unknown domains. We achieve this by first introducing a general domain adaptivity strategy (GDAS), which approximates the function and domain of interest from sample points. Second, we propose adaptive sampling for unknown domains (ASUD), which generates sampling measures over a domain that may not be known in advance. Our results show that the ASUD approach consistently achieves the same or smaller errors as uniform sampling, but using fewer, and often significantly fewer, evaluations.

Submitted 4 October, 2022; v1 submitted 31 January, 2022; originally announced February 2022.

arXiv:2108.04862 [pdf, other]

Matching Algorithms for Blood Donation

Authors: Duncan C McElfresh, Christian Kroer, Sergey Pupyrev, Eric Sodomka, Karthik Sankararaman, Zack Chauvin, Neil Dexter, John P Dickerson

Abstract: Global demand for donated blood far exceeds supply, and unmet need is greatest in low- and middle-income countries; experts suggest that large-scale coordination is necessary to alleviate demand. Using the Facebook Blood Donation tool, we conduct the first large-scale algorithmic matching of blood donors with donation opportunities. While measuring actual donation rates remains a challenge, we measure donor action (e.g., making a donation appointment) as a proxy for actual donation. We develop automated policies for matching donors with donation opportunities, based on an online matching model. We provide theoretical guarantees for these policies, both regarding the number of expected donations and the equitable treatment of blood recipients. In simulations, a simple matching strategy increases the number of donations by 5-10%; a pilot experiment with real donors shows a 5% relative increase in donor action rate (from 3.7% to 3.9%). When scaled to the global Blood Donation tool user base, this corresponds to an increase of around one hundred thousand users taking action toward donation. Further, observing donor action on a social network can shed light onto donor behavior and response to incentives. Our initial findings align with several observations made in the medical and social science literature regarding donor behavior.

Submitted 13 August, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

Comments: An early version of this paper appeared at EC'20. (https://doi.org/10.1145/3391403.3399458)

ACM Class: J.3; J.4

arXiv:2012.06081 [pdf, other]

Deep Neural Networks Are Effective At Learning High-Dimensional Hilbert-Valued Functions From Limited Data

Authors: Ben Adcock, Simone Brugiapaglia, Nick Dexter, Sebastian Moraga

Abstract: Accurate approximation of scalar-valued functions from sample points is a key task in computational science. Recently, machine learning with Deep Neural Networks (DNNs) has emerged as a promising tool for scientific computing, with impressive results achieved on problems where the dimension of the data or problem domain is large. This work broadens this perspective, focusing on approximating functions that are Hilbert-valued, i.e. take values in a separable, but typically infinite-dimensional, Hilbert space. This arises in science and engineering problems, in particular those involving solution of parametric Partial Differential Equations (PDEs). Such problems are challenging: 1) pointwise samples are expensive to acquire, 2) the function domain is high dimensional, and 3) the range lies in a Hilbert space. Our contributions are twofold. First, we present a novel result on DNN training for holomorphic functions with so-called hidden anisotropy. This result introduces a DNN training procedure and full theoretical analysis with explicit guarantees on error and sample complexity. The error bound is explicit in three key errors occurring in the approximation procedure: the best approximation, measurement, and physical discretization errors. Our result shows that there exists a procedure (albeit non-standard) for learning Hilbert-valued functions via DNNs that performs as well as, but no better than current best-in-class schemes. It gives a benchmark lower bound for how well DNNs can perform on such problems. Second, we examine whether better performance can be achieved in practice through different types of architectures and training. We provide preliminary numerical results illustrating practical performance of DNNs on parametric PDEs. We consider different parameters, modifying the DNN architecture to achieve better and competitive results, comparing these to current best-in-class schemes.

Submitted 4 March, 2021; v1 submitted 10 December, 2020; originally announced December 2020.

arXiv:2009.08555 [pdf, other]

Improved recovery guarantees and sampling strategies for TV minimization in compressive imaging

Authors: Ben Adcock, Nick Dexter, Qinghong Xu

Abstract: In this paper, we consider the use of Total Variation (TV) minimization for compressive imaging; that is, image reconstruction from subsampled measurements. Focusing on two important imaging modalities -- namely, Fourier imaging and structured binary imaging via the Walsh--Hadamard transform -- we derive uniform recovery guarantees asserting stable and robust recovery for arbitrary random sampling strategies. Using this, we then derive a class of theoretically-optimal sampling strategies. For Fourier sampling, we show recovery of an image with approximately $s$-sparse gradient from $m \gtrsim_d s \cdot \log^2(s) \cdot \log^4(N)$ measurements, in $d \geq 1$ dimensions. When $d = 2$, this improves the current state-of-the-art result by a factor of $\log(s) \cdot \log(N)$. It also extends it to arbitrary dimensions $d \geq 2$. For Walsh sampling, we prove that $m \gtrsim_d s \cdot \log^2(s) \cdot \log^2(N/s) \cdot \log^3(N)$ measurements suffice in $d \geq 2$ dimensions. To the best of our knowledge, this is the first recovery guarantee for structured binary sampling with TV minimization.

Submitted 17 September, 2020; originally announced September 2020.

arXiv:2001.07523 [pdf, other]

The gap between theory and practice in function approximation with deep neural networks

Authors: Ben Adcock, Nick Dexter

Abstract: Deep learning (DL) is transforming industry as decision-making processes are being automated by deep neural networks (DNNs) trained on real-world data. Driven partly by rapidly-expanding literature on DNN approximation theory showing they can approximate a rich variety of functions, such tools are increasingly being considered for problems in scientific computing. Yet, unlike traditional algorithms in this field, little is known about DNNs from the principles of numerical analysis, e.g., stability, accuracy, computational efficiency and sample complexity. In this paper we introduce a computational framework for examining DNNs in practice, and use it to study empirical performance with regard to these issues. We study performance of DNNs of different widths & depths on test functions in various dimensions, including smooth and piecewise smooth functions. We also compare DL against best-in-class methods for smooth function approx. based on compressed sensing (CS). Our main conclusion from these experiments is that there is a crucial gap between the approximation theory of DNNs and their practical performance, with trained DNNs performing relatively poorly on functions for which there are strong approximation results (e.g. smooth functions), yet performing well in comparison to best-in-class methods for other functions. To analyze this gap further, we provide some theoretical insights. We establish a practical existence theorem, asserting existence of a DNN architecture and training procedure that offers the same performance as CS. This establishes a key theoretical benchmark, showing the gap can be closed, albeit via a strategy guaranteed to perform as well as, but no better than, current best-in-class schemes. Nevertheless, it demonstrates the promise of practical DNN approx., by highlighting potential for better schemes through careful design of DNN architectures and training strategies.

Submitted 15 February, 2021; v1 submitted 16 January, 2020; originally announced January 2020.

arXiv:1905.05853 [pdf, other]

Reconstructing high-dimensional Hilbert-valued functions via compressed sensing

Authors: Nick Dexter, Hoang Tran, Clayton Webster

Abstract: We present and analyze a novel sparse polynomial technique for approximating high-dimensional Hilbert-valued functions, with application to parameterized partial differential equations (PDEs) with deterministic and stochastic inputs. Our theoretical framework treats the function approximation problem as a joint sparse recovery problem, where the set of jointly sparse vectors is possibly infinite. To achieve the simultaneous reconstruction of Hilbert-valued functions in both parametric domain and Hilbert space, we propose a novel mixed-norm based $\ell_1$ regularization method that exploits both energy and sparsity. Our approach requires extensions of concepts such as the restricted isometry and null space properties, allowing us to prove recovery guarantees for sparse Hilbert-valued function reconstruction. We complement the enclosed theory with an algorithm for Hilbert-valued recovery, based on standard forward-backward algorithm, meanwhile establishing its strong convergence in the considered infinite-dimensional setting. Finally, we demonstrate the minimal sample complexity requirements of our approach, relative to other popular methods, with numerical experiments approximating the solutions of high-dimensional parameterized elliptic PDEs.

Submitted 14 May, 2019; originally announced May 2019.

Comments: 5 pages, 2 figures. Accepted for poster at SampTA 2019

Journal ref: https://sampta2019.sciencesconf.org/267707

arXiv:1812.06174 [pdf, other]

doi 10.1051/m2an/2019048

A mixed $\ell_1$ regularization approach for sparse simultaneous approximation of parameterized PDEs

Authors: Nick Dexter, Hoang Tran, Clayton Webster

Abstract: We present and analyze a novel sparse polynomial technique for the simultaneous approximation of parameterized partial differential equations (PDEs) with deterministic and stochastic inputs. Our approach treats the numerical solution as a jointly sparse reconstruction problem through the reformulation of the standard basis pursuit denoising, where the set of jointly sparse vectors is infinite. To achieve global reconstruction of sparse solutions to parameterized elliptic PDEs over both physical and parametric domains, we combine the standard measurement scheme developed for compressed sensing in the context of bounded orthonormal systems with a novel mixed-norm based $\ell_1$ regularization method that exploits both energy and sparsity. In addition, we are able to prove that, with minimal sample complexity, error estimates comparable to the best $s$-term and quasi-optimal approximations are achievable, while requiring only a priori bounds on polynomial truncation error with respect to the energy norm. Finally, we perform extensive numerical experiments on several high-dimensional parameterized elliptic PDE models to demonstrate the superior recovery properties of the proposed approach.

Submitted 14 December, 2018; originally announced December 2018.

Comments: 23 pages, 4 figures

Journal ref: https://www.esaim-m2an.org/articles/m2an/pdf/2019/06/m2an180226.pdf

arXiv:1711.02591 [pdf, ps, other]

doi 10.1007/s11228-021-00603-2

On the strong convergence of forward-backward splitting in reconstructing jointly sparse signals

Authors: Nick Dexter, Hoang Tran, Clayton Webster

Abstract: We consider the problem of reconstructing an infinite set of sparse, finite-dimensional vectors, that share a common sparsity pattern, from incomplete measurements. This is in contrast to the work [17], where the single vector signal can be infinite-dimensional, and [28], which extends the aforementioned work to the joint sparse recovery of finite number of infinite-dimensional vectors. In our case, to take account of the joint sparsity and promote the coupling of nonvanishing components, we employ a convex relaxation approach with mixed norm penalty $\ell_{2,1}$. This paper discusses the computation of the solutions of linear inverse problems with such relaxation by a forward-backward splitting algorithm. However, since the solution matrix possesses infinitely many columns, the arguments of [17] no longer apply. As such, we establish new strong convergence results for the algorithm, in particular when the set of jointly sparse vectors is infinite.

Submitted 25 November, 2021; v1 submitted 7 November, 2017; originally announced November 2017.

Journal ref: Set-Valued Var. Anal. (2021)

arXiv:1602.05823 [pdf, other]

doi 10.1090/mcom/3272

Polynomial approximation via compressed sensing of high-dimensional functions on lower sets

Authors: Abdellah Chkifa, Nick Dexter, Hoang Tran, Clayton G. Webster

Abstract: This work proposes and analyzes a compressed sensing approach to polynomial approximation of complex-valued functions in high dimensions. Of particular interest is the setting where the target function is smooth, characterized by a rapidly decaying orthonormal expansion, whose most important terms are captured by a lower (or downward closed) set. By exploiting this fact, we present an innovative weighted $\ell_1$ minimization procedure with a precise choice of weights, and a new iterative hard thresholding method, for imposing the downward closed preference. Theoretical results reveal that our computational approaches possess a provably reduced sample complexity compared to existing compressed sensing techniques presented in the literature. In addition, the recovery of the corresponding best approximation using these methods is established through an improved bound for the restricted isometry property. Our analysis represents an extension of the approach for Hadamard matrices in [5] to the general case of continuous bounded orthonormal systems, quantifies the dependence of sample complexity on the successful recovery probability, and provides an estimate on the number of measurements with explicit constants. Numerical examples are provided to support the theoretical results and demonstrate the computational efficiency of the novel weighted $\ell_1$ minimization strategy.

Submitted 18 February, 2016; originally announced February 2016.

Comments: 33 pages, 3 figures

MSC Class: 65F35; 42C05; 41A10; 6008

Journal ref: https://www.ams.org/journals/mcom/2018-87-311/S0025-5718-2017-03272-5/home.html

arXiv:1507.05545 [pdf, other]

doi 10.1016/j.camwa.2015.12.005

Explicit cost bounds of stochastic Galerkin approximations for parameterized PDEs with random coefficients

Authors: Nick Dexter, Clayton Webster, Guannan Zhang

Abstract: This work analyzes the overall computational complexity of the stochastic Galerkin finite element method (SGFEM) for approximating the solution of parameterized elliptic partial differential equations with both affine and non-affine random coefficients. To compute the fully discrete solution, such approaches employ a Galerkin projection in both the deterministic and stochastic domains, produced here by a combination of finite elements and a global orthogonal basis, defined on an isotropic total degree index set, respectively. To account for the sparsity of the resulting system, we present a rigorous cost analysis that considers the total number of coupled finite element systems that must be simultaneously solved in the SGFEM. However, to maintain sparsity as the coefficient becomes increasingly nonlinear in the parameterization, it is necessary to also approximate the coefficient by an additional orthogonal expansion. In this case we prove a rigorous complexity estimate for the number of floating point operations (FLOPs) required per matrix-vector multiplication of the coupled system. Based on such complexity estimates we also develop explicit cost bounds in terms of FLOPs to solve the stochastic Galerkin (SG) systems to a prescribed tolerance, which are used to compare with the minimal complexity estimates of a stochastic collocation finite element method (SCFEM), shown in our previous work [16]. Finally, computational evidence complements the theoretical estimates and supports our conclusion that, in the case that the coefficient is affine, the coupled SG system can be solved more efficiently than the decoupled SC systems. However, as the coefficient becomes more nonlinear, it becomes prohibitively expensive to obtain an approximation with the SGFEM.

Submitted 22 June, 2016; v1 submitted 20 July, 2015; originally announced July 2015.

Journal ref: Computers and Mathematics with Applications, Volume 71, Issue 11, June 2016, Pages 2231-2256

Showing 1–21 of 21 results for author: Dexter, N