Search | arXiv e-print repository

Optimal deep learning of holomorphic operators between Banach spaces

Authors: Ben Adcock, Nick Dexter, Sebastian Moraga

Abstract: Operator learning problems arise in many key areas of scientific computing where Partial Differential Equations (PDEs) are used to model physical systems. In such scenarios, the operators map between Banach or Hilbert spaces. In this work, we tackle the problem of learning operators between Banach spaces, in contrast to the vast majority of past works considering only Hilbert spaces. We focus on l… ▽ More Operator learning problems arise in many key areas of scientific computing where Partial Differential Equations (PDEs) are used to model physical systems. In such scenarios, the operators map between Banach or Hilbert spaces. In this work, we tackle the problem of learning operators between Banach spaces, in contrast to the vast majority of past works considering only Hilbert spaces. We focus on learning holomorphic operators - an important class of problems with many applications. We combine arbitrary approximate encoders and decoders with standard feedforward Deep Neural Network (DNN) architectures - specifically, those with constant width exceeding the depth - under standard $\ell^2$-loss minimization. We first identify a family of DNNs such that the resulting Deep Learning (DL) procedure achieves optimal generalization bounds for such operators. For standard fully-connected architectures, we then show that there are uncountably many minimizers of the training problem that yield equivalent optimal performance. The DNN architectures we consider are `problem agnostic', with width and depth only depending on the amount of training data $m$ and not on regularity assumptions of the target operator. Next, we show that DL is optimal for this problem: no recovery procedure can surpass these generalization bounds up to log terms. Finally, we present numerical results demonstrating the practical performance on challenging problems including the parametric diffusion, Navier-Stokes-Brinkman and Boussinesq PDEs. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.01539 [pdf, other]

Physics-informed deep learning and compressive collocation for high-dimensional diffusion-reaction equations: practical existence theory and numerics

Authors: Simone Brugiapaglia, Nick Dexter, Samir Karam, Weiqi Wang

Abstract: On the forefront of scientific computing, Deep Learning (DL), i.e., machine learning with Deep Neural Networks (DNNs), has emerged a powerful new tool for solving Partial Differential Equations (PDEs). It has been observed that DNNs are particularly well suited to weakening the effect of the curse of dimensionality, a term coined by Richard E. Bellman in the late `50s to describe challenges such a… ▽ More On the forefront of scientific computing, Deep Learning (DL), i.e., machine learning with Deep Neural Networks (DNNs), has emerged a powerful new tool for solving Partial Differential Equations (PDEs). It has been observed that DNNs are particularly well suited to weakening the effect of the curse of dimensionality, a term coined by Richard E. Bellman in the late `50s to describe challenges such as the exponential dependence of the sample complexity, i.e., the number of samples required to solve an approximation problem, on the dimension of the ambient space. However, although DNNs have been used to solve PDEs since the `90s, the literature underpinning their mathematical efficiency in terms of numerical analysis (i.e., stability, accuracy, and sample complexity), is only recently beginning to emerge. In this paper, we leverage recent advancements in function approximation using sparsity-based techniques and random sampling to develop and analyze an efficient high-dimensional PDE solver based on DL. We show, both theoretically and numerically, that it can compete with a novel stable and accurate compressive spectral collocation method. In particular, we demonstrate a new practical existence theorem, which establishes the existence of a class of trainable DNNs with suitable bounds on the network architecture and a sufficient condition on the sample complexity, with logarithmic or, at worst, linear scaling in dimension, such that the resulting networks stably and accurately approximate a diffusion-reaction PDE with high probability. △ Less

Submitted 10 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2404.03761 [pdf, other]

Learning smooth functions in high dimensions: from sparse polynomials to deep neural networks

Authors: Ben Adcock, Simone Brugiapaglia, Nick Dexter, Sebastian Moraga

Abstract: Learning approximations to smooth target functions of many variables from finite sets of pointwise samples is an important task in scientific computing and its many applications in computational science and engineering. Despite well over half a century of research on high-dimensional approximation, this remains a challenging problem. Yet, significant advances have been made in the last decade towa… ▽ More Learning approximations to smooth target functions of many variables from finite sets of pointwise samples is an important task in scientific computing and its many applications in computational science and engineering. Despite well over half a century of research on high-dimensional approximation, this remains a challenging problem. Yet, significant advances have been made in the last decade towards efficient methods for doing this, commencing with so-called sparse polynomial approximation methods and continuing most recently with methods based on Deep Neural Networks (DNNs). In tandem, there have been substantial advances in the relevant approximation theory and analysis of these techniques. In this work, we survey this recent progress. We describe the contemporary motivations for this problem, which stem from parametric models and computational uncertainty quantification; the relevant function classes, namely, classes of infinite-dimensional, Banach-valued, holomorphic functions; fundamental limits of learnability from finite data for these classes; and finally, sparse polynomial and DNN methods for efficiently learning such functions from finite data. For the latter, there is currently a significant gap between the approximation theory of DNNs and the practical performance of deep learning. Aiming to narrow this gap, we develop the topic of practical existence theory, which asserts the existence of dimension-independent DNN architectures and training strategies that achieve provably near-optimal generalization errors in terms of the amount of training data. △ Less

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2311.14886 [pdf, ps, other]

A unified framework for learning with nonlinear model classes from arbitrary linear samples

Authors: Ben Adcock, Juan M. Cardenas, Nick Dexter

Abstract: This work considers the fundamental problem of learning an unknown object from training data using a given model class. We introduce a unified framework that allows for objects in arbitrary Hilbert spaces, general types of (random) linear measurements as training data and general types of nonlinear model classes. We establish a series of learning guarantees for this framework. These guarantees pro… ▽ More This work considers the fundamental problem of learning an unknown object from training data using a given model class. We introduce a unified framework that allows for objects in arbitrary Hilbert spaces, general types of (random) linear measurements as training data and general types of nonlinear model classes. We establish a series of learning guarantees for this framework. These guarantees provide explicit relations between the amount of training data and properties of the model class to ensure near-best generalization bounds. In doing so, we also introduce and develop the key notion of the variation of a model class with respect to a distribution of sampling operators. To exhibit the versatility of this framework, we show that it can accommodate many different types of well-known problems of interest. We present examples such as matrix sketching by random sampling, compressed sensing with isotropic vectors, active learning in regression and compressed sensing with generative models. In all cases, we show how known results become straightforward corollaries of our general learning guarantees. For compressed sensing with generative models, we also present a number of generalizations and improvements of recent results. In summary, our work not only introduces a unified way to study learning unknown objects from general types of data, but also establishes a series of general theoretical guarantees which consolidate and improve various known results. △ Less

Submitted 24 November, 2023; originally announced November 2023.

arXiv:2306.00945 [pdf, other]

CS4ML: A general framework for active learning with arbitrary data based on Christoffel functions

Authors: Ben Adcock, Juan M. Cardenas, Nick Dexter

Abstract: We introduce a general framework for active learning in regression problems. Our framework extends the standard setup by allowing for general types of data, rather than merely pointwise samples of the target function. This generalization covers many cases of practical interest, such as data acquired in transform domains (e.g., Fourier data), vector-valued data (e.g., gradient-augmented data), data… ▽ More We introduce a general framework for active learning in regression problems. Our framework extends the standard setup by allowing for general types of data, rather than merely pointwise samples of the target function. This generalization covers many cases of practical interest, such as data acquired in transform domains (e.g., Fourier data), vector-valued data (e.g., gradient-augmented data), data acquired along continuous curves, and, multimodal data (i.e., combinations of different types of measurements). Our framework considers random sampling according to a finite number of sampling measures and arbitrary nonlinear approximation spaces (model classes). We introduce the concept of generalized Christoffel functions and show how these can be used to optimize the sampling measures. We prove that this leads to near-optimal sample complexity in various important cases. This paper focuses on applications in scientific computing, where active learning is often desirable, since it is usually expensive to generate data. We demonstrate the efficacy of our framework for gradient-augmented learning with polynomials, Magnetic Resonance Imaging (MRI) using generative models and adaptive sampling for solving PDEs using Physics-Informed Neural Networks (PINNs). △ Less

Submitted 7 December, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

arXiv:2208.12190 [pdf, other]

CAS4DL: Christoffel Adaptive Sampling for function approximation via Deep Learning

Authors: Ben Adcock, Juan M. Cardenas, Nick Dexter

Abstract: The problem of approximating smooth, multivariate functions from sample points arises in many applications in scientific computing, e.g., in computational Uncertainty Quantification (UQ) for science and engineering. In these applications, the target function may represent a desired quantity of interest of a parameterized Partial Differential Equation (PDE). Due to the large cost of solving such pr… ▽ More The problem of approximating smooth, multivariate functions from sample points arises in many applications in scientific computing, e.g., in computational Uncertainty Quantification (UQ) for science and engineering. In these applications, the target function may represent a desired quantity of interest of a parameterized Partial Differential Equation (PDE). Due to the large cost of solving such problems, where each sample is computed by solving a PDE, sample efficiency is a key concerning these applications. Recently, there has been increasing focus on the use of Deep Neural Networks (DNN) and Deep Learning (DL) for learning such functions from data. In this work, we propose an adaptive sampling strategy, CAS4DL (Christoffel Adaptive Sampling for Deep Learning) to increase the sample efficiency of DL for multivariate function approximation. Our novel approach is based on interpreting the second to last layer of a DNN as a dictionary of functions defined by the nodes on that layer. With this viewpoint, we then define an adaptive sampling strategy motivated by adaptive sampling schemes recently proposed for linear approximation schemes, wherein samples are drawn randomly with respect to the Christoffel function of the subspace spanned by this dictionary. We present numerical experiments comparing CAS4DL with standard Monte Carlo (MC) sampling. Our results demonstrate that CAS4DL often yields substantial savings in the number of samples required to achieve a given accuracy, particularly in the case of smooth activation functions, and it shows a better stability in comparison to MC. These results therefore are a promising step towards fully adapting DL towards scientific computing applications. △ Less

Submitted 25 August, 2022; originally announced August 2022.

Comments: 34 pages, 9 figures

arXiv:2203.13908 [pdf, other]

On efficient algorithms for computing near-best polynomial approximations to high-dimensional, Hilbert-valued functions from limited samples

Authors: Ben Adcock, Simone Brugiapaglia, Nick Dexter, Sebastian Moraga

Abstract: Sparse polynomial approximation has become indispensable for approximating smooth, high- or infinite-dimensional functions from limited samples. This is a key task in computational science and engineering, e.g., surrogate modelling in uncertainty quantification where the function is the solution map of a parametric or stochastic differential equation (DE). Yet, sparse polynomial approximation lack… ▽ More Sparse polynomial approximation has become indispensable for approximating smooth, high- or infinite-dimensional functions from limited samples. This is a key task in computational science and engineering, e.g., surrogate modelling in uncertainty quantification where the function is the solution map of a parametric or stochastic differential equation (DE). Yet, sparse polynomial approximation lacks a complete theory. On the one hand, there is a well-developed theory of best $s$-term polynomial approximation, which asserts exponential or algebraic rates of convergence for holomorphic functions. On the other, there are increasingly mature methods such as (weighted) $\ell^1$-minimization for computing such approximations. While the sample complexity of these methods has been analyzed with compressed sensing, whether they achieve best $s$-term approximation rates is not fully understood. Furthermore, these methods are not algorithms per se, as they involve exact minimizers of nonlinear optimization problems. This paper closes these gaps. Specifically, we consider the following question: are there robust, efficient algorithms for computing approximations to finite- or infinite-dimensional, holomorphic and Hilbert-valued functions from limited samples that achieve best $s$-term rates? We answer this affirmatively by introducing algorithms and theoretical guarantees that assert exponential or algebraic rates of convergence, along with robustness to sampling, algorithmic, and physical discretization errors. We tackle both scalar- and Hilbert-valued functions, this being key to parametric or stochastic DEs. Our results involve significant developments of existing techniques, including a novel restarted primal-dual iteration for solving weighted $\ell^1$-minimization problems in Hilbert spaces. Our theory is supplemented by numerical experiments demonstrating the efficacy of these algorithms. △ Less

Submitted 6 November, 2023; v1 submitted 25 March, 2022; originally announced March 2022.

arXiv:2108.04862 [pdf, other]

Matching Algorithms for Blood Donation

Authors: Duncan C McElfresh, Christian Kroer, Sergey Pupyrev, Eric Sodomka, Karthik Sankararaman, Zack Chauvin, Neil Dexter, John P Dickerson

Abstract: Global demand for donated blood far exceeds supply, and unmet need is greatest in low- and middle-income countries; experts suggest that large-scale coordination is necessary to alleviate demand. Using the Facebook Blood Donation tool, we conduct the first large-scale algorithmic matching of blood donors with donation opportunities. While measuring actual donation rates remains a challenge, we mea… ▽ More Global demand for donated blood far exceeds supply, and unmet need is greatest in low- and middle-income countries; experts suggest that large-scale coordination is necessary to alleviate demand. Using the Facebook Blood Donation tool, we conduct the first large-scale algorithmic matching of blood donors with donation opportunities. While measuring actual donation rates remains a challenge, we measure donor action (e.g., making a donation appointment) as a proxy for actual donation. We develop automated policies for matching donors with donation opportunities, based on an online matching model. We provide theoretical guarantees for these policies, both regarding the number of expected donations and the equitable treatment of blood recipients. In simulations, a simple matching strategy increases the number of donations by 5-10%; a pilot experiment with real donors shows a 5% relative increase in donor action rate (from 3.7% to 3.9%). When scaled to the global Blood Donation tool user base, this corresponds to an increase of around one hundred thousand users taking action toward donation. Further, observing donor action on a social network can shed light onto donor behavior and response to incentives. Our initial findings align with several observations made in the medical and social science literature regarding donor behavior. △ Less

Submitted 13 August, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

Comments: An early version of this paper appeared at EC'20. (https://doi.org/10.1145/3391403.3399458)

ACM Class: J.3; J.4

arXiv:2012.06081 [pdf, other]

Deep Neural Networks Are Effective At Learning High-Dimensional Hilbert-Valued Functions From Limited Data

Authors: Ben Adcock, Simone Brugiapaglia, Nick Dexter, Sebastian Moraga

Abstract: Accurate approximation of scalar-valued functions from sample points is a key task in computational science. Recently, machine learning with Deep Neural Networks (DNNs) has emerged as a promising tool for scientific computing, with impressive results achieved on problems where the dimension of the data or problem domain is large. This work broadens this perspective, focusing on approximating funct… ▽ More Accurate approximation of scalar-valued functions from sample points is a key task in computational science. Recently, machine learning with Deep Neural Networks (DNNs) has emerged as a promising tool for scientific computing, with impressive results achieved on problems where the dimension of the data or problem domain is large. This work broadens this perspective, focusing on approximating functions that are Hilbert-valued, i.e. take values in a separable, but typically infinite-dimensional, Hilbert space. This arises in science and engineering problems, in particular those involving solution of parametric Partial Differential Equations (PDEs). Such problems are challenging: 1) pointwise samples are expensive to acquire, 2) the function domain is high dimensional, and 3) the range lies in a Hilbert space. Our contributions are twofold. First, we present a novel result on DNN training for holomorphic functions with so-called hidden anisotropy. This result introduces a DNN training procedure and full theoretical analysis with explicit guarantees on error and sample complexity. The error bound is explicit in three key errors occurring in the approximation procedure: the best approximation, measurement, and physical discretization errors. Our result shows that there exists a procedure (albeit non-standard) for learning Hilbert-valued functions via DNNs that performs as well as, but no better than current best-in-class schemes. It gives a benchmark lower bound for how well DNNs can perform on such problems. Second, we examine whether better performance can be achieved in practice through different types of architectures and training. We provide preliminary numerical results illustrating practical performance of DNNs on parametric PDEs. We consider different parameters, modifying the DNN architecture to achieve better and competitive results, comparing these to current best-in-class schemes. △ Less

Submitted 4 March, 2021; v1 submitted 10 December, 2020; originally announced December 2020.

arXiv:2009.08555 [pdf, other]

Improved recovery guarantees and sampling strategies for TV minimization in compressive imaging

Authors: Ben Adcock, Nick Dexter, Qinghong Xu

Abstract: In this paper, we consider the use of Total Variation (TV) minimization for compressive imaging; that is, image reconstruction from subsampled measurements. Focusing on two important imaging modalities -- namely, Fourier imaging and structured binary imaging via the Walsh--Hadamard transform -- we derive uniform recovery guarantees asserting stable and robust recovery for arbitrary random sampling… ▽ More In this paper, we consider the use of Total Variation (TV) minimization for compressive imaging; that is, image reconstruction from subsampled measurements. Focusing on two important imaging modalities -- namely, Fourier imaging and structured binary imaging via the Walsh--Hadamard transform -- we derive uniform recovery guarantees asserting stable and robust recovery for arbitrary random sampling strategies. Using this, we then derive a class of theoretically-optimal sampling strategies. For Fourier sampling, we show recovery of an image with approximately $s$-sparse gradient from $m \gtrsim_d s \cdot \log^2(s) \cdot \log^4(N)$ measurements, in $d \geq 1$ dimensions. When $d = 2$, this improves the current state-of-the-art result by a factor of $\log(s) \cdot \log(N)$. It also extends it to arbitrary dimensions $d \geq 2$. For Walsh sampling, we prove that $m \gtrsim_d s \cdot \log^2(s) \cdot \log^2(N/s) \cdot \log^3(N) $ measurements suffice in $d \geq 2$ dimensions. To the best of our knowledge, this is the first recovery guarantee for structured binary sampling with TV minimization. △ Less

Submitted 17 September, 2020; originally announced September 2020.

arXiv:2001.07523 [pdf, other]

The gap between theory and practice in function approximation with deep neural networks

Authors: Ben Adcock, Nick Dexter

Abstract: Deep learning (DL) is transforming industry as decision-making processes are being automated by deep neural networks (DNNs) trained on real-world data. Driven partly by rapidly-expanding literature on DNN approximation theory showing they can approximate a rich variety of functions, such tools are increasingly being considered for problems in scientific computing. Yet, unlike traditional algorithm… ▽ More Deep learning (DL) is transforming industry as decision-making processes are being automated by deep neural networks (DNNs) trained on real-world data. Driven partly by rapidly-expanding literature on DNN approximation theory showing they can approximate a rich variety of functions, such tools are increasingly being considered for problems in scientific computing. Yet, unlike traditional algorithms in this field, little is known about DNNs from the principles of numerical analysis, e.g., stability, accuracy, computational efficiency and sample complexity. In this paper we introduce a computational framework for examining DNNs in practice, and use it to study empirical performance with regard to these issues. We study performance of DNNs of different widths & depths on test functions in various dimensions, including smooth and piecewise smooth functions. We also compare DL against best-in-class methods for smooth function approx. based on compressed sensing (CS). Our main conclusion from these experiments is that there is a crucial gap between the approximation theory of DNNs and their practical performance, with trained DNNs performing relatively poorly on functions for which there are strong approximation results (e.g. smooth functions), yet performing well in comparison to best-in-class methods for other functions. To analyze this gap further, we provide some theoretical insights. We establish a practical existence theorem, asserting existence of a DNN architecture and training procedure that offers the same performance as CS. This establishes a key theoretical benchmark, showing the gap can be closed, albeit via a strategy guaranteed to perform as well as, but no better than, current best-in-class schemes. Nevertheless, it demonstrates the promise of practical DNN approx., by highlighting potential for better schemes through careful design of DNN architectures and training strategies. △ Less

Submitted 15 February, 2021; v1 submitted 16 January, 2020; originally announced January 2020.

Showing 1–11 of 11 results for author: Dexter, N