-
A note on best n-term approximation for generalized Wiener classes
Authors:
Ronald DeVore,
Guergana Petrova,
Przemyslaw Wojtaszczyk
Abstract:
We determine the best n-term approximation of generalized Wiener model classes in a Hilbert space $H $. This theory is then applied to several special cases.
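The abstract does not spell out the construction. In the simplest case, approximation is measured in $H$ with respect to an orthonormal basis. There, a best $n$-term approximation keeps the $n$ coefficients of largest absolute value, and by Parseval its error is the $\ell_2$ norm of the discarded coefficients. The sketch below illustrates only this standard fact for a finitely supported coefficient sequence. It does not reproduce the paper's analysis of generalized Wiener classes, and the function name `best_n_term` is hypothetical.

```python
import math


def best_n_term(coeffs, n):
    """Best n-term approximation with respect to an orthonormal basis.

    coeffs[k] is the coefficient <f, e_k> of f in an orthonormal basis (e_k).
    Returns (approx, error): approx keeps the n coefficients of largest
    absolute value and sets the others to zero; error is ||f - approx||_H.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # Indices sorted by decreasing |coefficient| (ties broken by index).
    order = sorted(range(len(coeffs)), key=lambda k: (-abs(coeffs[k]), k))
    keep = set(order[:n])
    approx = [c if k in keep else 0.0 for k, c in enumerate(coeffs)]
    # Parseval: the error is the l2 norm of the discarded coefficients.
    error = math.sqrt(sum(c * c for k, c in enumerate(coeffs) if k not in keep))
    return approx, error


coeffs = [0.1, -3.0, 0.5, 2.0, -0.2]
approx, err = best_n_term(coeffs, 2)
print(approx)  # [0.0, -3.0, 0.0, 2.0, 0.0]
```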
Submitted 25 June, 2024; v1 submitted 15 June, 2024;
originally announced June 2024.
-
Convergence and error control of consistent PINNs for elliptic PDEs
Authors:
Andrea Bonito,
Ronald DeVore,
Guergana Petrova,
Jonathan W. Siegel
Abstract:
We provide an a priori analysis of a certain class of numerical methods, commonly referred to as collocation methods, for solving elliptic boundary value problems. They begin with information in the form of point values of the right side f of such equations and point values of the boundary function g and utilize only this information to numerically approximate the solution u of the Partial Differential Equation (PDE). For such a method to provide an approximation to u with guaranteed error bounds, additional assumptions on f and g, called model class assumptions, are needed. We determine the best error (in the energy norm) of approximating u, in terms of the number of point samples m, under all Besov class model assumptions for the right hand side $f$ and boundary g.
We then turn to the study of numerical procedures and ask whether a proposed numerical procedure (nearly) achieves the optimal recovery error. We analyze numerical methods which generate the numerical approximation to $u$ by minimizing a specified data driven loss function over a set $Σ$ which is either a finite dimensional linear space, or more generally, a finite dimensional manifold. We show that the success of such a procedure depends critically on choosing a correct data driven loss function that is consistent with the PDE and provides sharp error control. Based on this analysis a loss function $L^*$ is proposed.
We also address the recent methods of Physics Informed Neural Networks (PINNs). Minimization of the new loss $L^*$ over neural network spaces $Σ$ is referred to as consistent PINNs (CPINNs). We prove that CPINNs provide an optimal recovery of the solution $u$, provided that the optimization problem can be numerically executed and $Σ$ has sufficient approximation capabilities. Finally, numerical examples illustrating the benefits of the CPINNs are given.
Submitted 13 June, 2024;
originally announced June 2024.
-
Neural networks: deep, shallow, or in between?
Authors:
Guergana Petrova,
Przemyslaw Wojtaszczyk
Abstract:
We give estimates from below for the error of approximation of a compact subset from a Banach space by the outputs of feed-forward neural networks with width W, depth l and Lipschitz activation functions. We show that, modulo logarithmic factors, rates better than the rates of the entropy numbers can be attained only for neural networks whose depth l goes to infinity, and that there is no gain if we fix the depth and let the width W go to infinity.
Submitted 11 October, 2023;
originally announced October 2023.
-
Solving PDEs with Incomplete Information
Authors:
Peter Binev,
Andrea Bonito,
Albert Cohen,
Wolfgang Dahmen,
Ronald DeVore,
Guergana Petrova
Abstract:
We consider the problem of numerically approximating the solutions to a partial differential equation (PDE) when there is insufficient information to determine a unique solution. Our main example is the Poisson boundary value problem, when the boundary data is unknown and instead one observes finitely many linear measurements of the solution. We view this setting as an optimal recovery problem and develop theory and numerical algorithms for its solution. The main vehicle employed is the derivation and approximation of the Riesz representers of these functionals with respect to relevant Hilbert spaces of harmonic functions.
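The sketch below is a finite-dimensional illustration of the role of Riesz representers. It is an assumption-laden stand-in, not the paper's spaces of harmonic functions. It takes $H=\mathbb R^N$ with inner product $\langle u,v\rangle = u^\top M v$ for a symmetric positive definite matrix $M$, and functionals $l_j(u)=a_j\cdot u$. The Riesz representer of $l_j$ solves $M\phi_j=a_j$. The element of minimal norm consistent with the data $z_j=l_j(u)$ is $u^*=\sum_j c_j\phi_j$, where $Gc=z$ and $G_{ij}=\langle\phi_i,\phi_j\rangle$. This is the standard optimal recovery when the model class is a ball centered at the origin. The names `min_norm_recovery`, `M`, `A`, `z` are hypothetical.

```python
import numpy as np


def min_norm_recovery(M, A, z):
    """Minimal-norm u in R^N with l_j(u) = z_j, for <u, v> = u^T M v.

    M : (N, N) symmetric positive definite matrix (defines the inner product).
    A : (m, N) matrix whose j-th row a_j defines l_j(u) = a_j . u.
    z : (m,) observed values l_j(u).
    """
    # Riesz representers: <phi_j, v> = a_j . v for all v  <=>  M phi_j = a_j.
    Phi = np.linalg.solve(M, A.T)  # column j is phi_j
    # Gram matrix G_ij = <phi_i, phi_j> = a_i . phi_j.
    G = A @ Phi
    c = np.linalg.solve(G, z)
    return Phi @ c  # u* = sum_j c_j phi_j


rng = np.random.default_rng(0)
N, m = 6, 3
B = rng.standard_normal((N, N))
M = B @ B.T + N * np.eye(N)
A = rng.standard_normal((m, N))
u_true = rng.standard_normal(N)
z = A @ u_true
u_star = min_norm_recovery(M, A, z)
print(np.allclose(A @ u_star, z))  # True
```

Any other element consistent with the data differs from $u^*$ by a vector $M$-orthogonal to every $\phi_j$, so it has a larger norm.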
Submitted 20 December, 2023; v1 submitted 13 January, 2023;
originally announced January 2023.
-
Limitations on approximation by deep and shallow neural networks
Authors:
Guergana Petrova,
Przemysław Wojtaszczyk
Abstract:
We prove Carl's type inequalities for the error of approximation of compact sets K by deep and shallow neural networks. This in turn gives lower bounds on how well we can approximate the functions in K when requiring the approximants to come from outputs of such networks. Our results are obtained as a byproduct of the study of the recently introduced Lipschitz widths.
Submitted 30 November, 2022;
originally announced December 2022.
-
Optimal Learning
Authors:
Peter Binev,
Andrea Bonito,
Ronald DeVore,
Guergana Petrova
Abstract:
This paper studies the problem of learning an unknown function $f$ from given data about $f$. The learning problem is to give an approximation $\hat f$ to $f$ that predicts the values of $f$ away from the data. There are numerous settings for this learning problem depending on (i) what additional information we have about $f$ (known as a model class assumption), (ii) how we measure the accuracy of how well $\hat f$ predicts $f$, (iii) what is known about the data and data sites, (iv) whether the data observations are polluted by noise. A mathematical description of the optimal performance possible (the smallest possible error of recovery) is known in the presence of a model class assumption. Under standard model class assumptions, it is shown in this paper that a near optimal $\hat f$ can be found by solving a certain discrete over-parameterized optimization problem with a penalty term. Here, near optimal means that the error is bounded by a fixed constant times the optimal error. This explains the advantage of over-parameterization which is commonly used in modern machine learning. The main results of this paper prove that over-parameterized learning with an appropriate loss function gives a near optimal approximation $\hat f$ of the function $f$ from which the data is collected. Quantitative bounds are given for how much over-parameterization needs to be employed and how the penalization needs to be scaled in order to guarantee a near optimal recovery of $f$. An extension of these results to the case where the data is polluted by additive deterministic noise is also given.
Submitted 26 June, 2023; v1 submitted 29 March, 2022;
originally announced March 2022.
-
On the entropy numbers and the Kolmogorov widths
Authors:
Guergana Petrova,
Przemyslaw Wojtaszczyk
Abstract:
Direct estimates between linear or nonlinear Kolmogorov widths and entropy numbers are presented. These estimates are derived using the recently introduced Lipschitz widths. Applications for m-term approximation are obtained.
Submitted 15 February, 2022;
originally announced March 2022.
-
Lipschitz widths
Authors:
Guergana Petrova,
Przemyslaw Wojtaszczyk
Abstract:
This paper introduces a measure, called Lipschitz widths, of the optimal performance possible of certain nonlinear methods of approximation. It discusses their relation to entropy numbers and other well known widths such as the Kolmogorov and the stable manifold widths. It also shows that the Lipschitz widths provide a theoretical benchmark for the approximation quality achieved via deep neural networks.
Submitted 1 November, 2021;
originally announced November 2021.
-
Neural Network Approximation
Authors:
Ronald DeVore,
Boris Hanin,
Guergana Petrova
Abstract:
Neural Networks (NNs) are the method of choice for building learning algorithms. Their popularity stems from their empirical success on several challenging learning problems. However, most scholars agree that a convincing theoretical explanation for this success is still lacking.
This article surveys the known approximation properties of the outputs of NNs with the aim of uncovering the properties that are not present in the more traditional methods of approximation used in numerical analysis. Comparisons are made with traditional approximation methods from the viewpoint of rate distortion. Another major component in the analysis of numerical approximation is the computational time needed to construct the approximation and this in turn is intimately connected with the stability of the approximation algorithm. So the stability of numerical approximation using NNs is a large part of the analysis put forward.
The survey, for the most part, is concerned with NNs using the popular ReLU activation function. In this case, the outputs of the NNs are piecewise linear functions on rather complicated partitions of the domain of $f$ into cells that are convex polytopes. When the architecture of the NN is fixed and the parameters are allowed to vary, the set of output functions of the NN is a parameterized nonlinear manifold. It is shown that this manifold has certain space filling properties leading to an increased ability to approximate (better rate distortion) but at the expense of numerical stability. The space filling creates a challenge to the numerical method in finding best or good parameter choices when trying to approximate.
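The piecewise linear structure described above is easy to see for a scalar input and a single hidden layer. Each neuron $\max(w_ix+b_i,0)$ switches at $x=-b_i/w_i$. Between consecutive switch points the output is affine, so the cells are intervals. The sketch below is a hypothetical minimal example. The names `shallow_relu` and `breakpoints` and the weights are illustrative, not taken from the survey. Deeper networks and higher input dimensions produce the more complicated polytope partitions mentioned in the abstract.

```python
def relu(t):
    return max(t, 0.0)


def shallow_relu(x, W, B, C, d=0.0):
    """Scalar one-hidden-layer ReLU network: d + sum_i C[i] * relu(W[i] * x + B[i])."""
    return d + sum(c * relu(w * x + b) for w, b, c in zip(W, B, C))


def breakpoints(W, B):
    """Points where some neuron switches on or off; the output is affine in between."""
    return sorted({-b / w for w, b in zip(W, B) if w != 0})


W = [1.0, -2.0, 1.0]
B = [1.0, 1.0, -1.0]
C = [1.0, 0.5, -2.0]
print(breakpoints(W, B))  # [-1.0, 0.5, 1.0]
```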
Submitted 28 December, 2020;
originally announced December 2020.
-
Optimal Stable Nonlinear Approximation
Authors:
Albert Cohen,
Ronald DeVore,
Guergana Petrova,
Przemyslaw Wojtaszczyk
Abstract:
While it is well known that nonlinear methods of approximation can often perform dramatically better than linear methods, there are still questions on how to measure the optimal performance possible for such methods. This paper studies nonlinear methods of approximation that are compatible with numerical implementation in that they are required to be numerically stable. A measure of optimal performance, called {\em stable manifold widths}, for approximating a model class $K$ in a Banach space $X$ by stable manifold methods is introduced. Fundamental inequalities between these stable manifold widths and the entropy of $K$ are established. The effects of requiring stability in the settings of deep learning and compressed sensing are discussed.
Submitted 21 September, 2020;
originally announced September 2020.
-
Nonlinear Methods for Model Reduction
Authors:
Andrea Bonito,
Albert Cohen,
Ronald DeVore,
Diane Guignard,
Peter Jantsch,
Guergana Petrova
Abstract:
The usual approach to model reduction for parametric partial differential equations (PDEs) is to construct a linear space $V_n$ which approximates well the solution manifold $\mathcal{M}$ consisting of all solutions $u(y)$ with $y$ the vector of parameters. This linear reduced model $V_n$ is then used for various tasks such as building an online forward solver for the PDE or estimating parameters from data observations. It is well understood in other problems of numerical computation that nonlinear methods such as adaptive approximation, $n$-term approximation, and certain tree-based methods may provide improved numerical efficiency. For model reduction, a nonlinear method would replace the linear space $V_n$ by a nonlinear space $Σ_n$. This idea has already been suggested in recent papers on model reduction where the parameter domain is decomposed into a finite number of cells and a linear space of low dimension is assigned to each cell.
Up to this point, little is known in terms of performance guarantees for such a nonlinear strategy. Moreover, most numerical experiments for nonlinear model reduction use a parameter dimension of only one or two. In this work, a step is made towards a more cohesive theory for nonlinear model reduction. Framing these methods in the general setting of library approximation allows us to give a first comparison of their performance with that of standard linear approximation for any general compact set. We then turn to the study of these methods for solution manifolds of parametrized elliptic PDEs. We study a very specific example of library approximation where the parameter domain is split into a finite number $N$ of rectangular cells and where different reduced affine spaces of dimension $m$ are assigned to each cell. The performance of this nonlinear procedure is analyzed from the viewpoint of accuracy of approximation versus $m$ and $N$.
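In library approximation, an element $u$ is approximated from a finite collection of spaces (here affine ones). The error is the distance to the closest member of the library. The sketch below computes that quantity in $\mathbb R^N$ with the Euclidean norm, for a hypothetical library of (center, basis) pairs. The cell-based scheme described above would instead use the space assigned to the cell containing the parameter $y$, which can only give a larger or equal error. All names are hypothetical.

```python
import numpy as np


def affine_projection(u, center, basis):
    """Orthogonal projection of u onto the affine space center + span(basis columns)."""
    Q, _ = np.linalg.qr(basis)
    return center + Q @ (Q.T @ (u - center))


def library_error(u, library):
    """Distance from u to the union of the affine spaces in the library.

    library : list of (center, basis) pairs; each basis is an (N, m) array.
    """
    return min(float(np.linalg.norm(u - affine_projection(u, c, B)))
               for c, B in library)


e = np.eye(4)
library = [(np.zeros(4), e[:, [0]]), (np.ones(4), e[:, [1]])]
u = np.array([1.0, 3.0, 1.0, 1.0])  # lies in the second affine space
print(library_error(u, library))
```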
Submitted 5 May, 2020;
originally announced May 2020.
-
Polynomial Approximation of Anisotropic Analytic Functions of Several Variables
Authors:
Andrea Bonito,
Ronald DeVore,
Diane Guignard,
Peter Jantsch,
Guergana Petrova
Abstract:
Motivated by numerical methods for solving parametric partial differential equations, this paper studies the approximation of multivariate analytic functions by algebraic polynomials. We introduce various anisotropic model classes based on Taylor expansions, and study their approximation by finite dimensional polynomial spaces $\cal{P}_Λ$ described by lower sets $Λ$. Given a budget $n$ for the dimension of $\cal{P}_Λ$, we prove that certain lower sets $Λ_n$, with cardinality $n$, provide a certifiable approximation error that is in a certain sense optimal, and that these lower sets have a simple definition in terms of simplices. Our main goal is to obtain approximation results when the number of variables $d$ is large and even infinite, and so we concentrate almost exclusively on the case $d=\infty$. We also emphasize obtaining results which hold for the full range $n\ge 1$, rather than asymptotic results that only hold for $n$ sufficiently large. In applications, one typically wants $n$ small to comply with computational budgets.
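The abstract states that the near-optimal lower sets $Λ_n$ are defined through simplices, but it does not give the formula. The sketch below therefore only illustrates, under stated assumptions, the two notions involved. The first is a weighted (anisotropic) simplex $\{\nu\in\mathbb N_0^d:\ \sum_j w_j\nu_j\le t\}$ with hypothetical positive weights $w_j$ and finite $d$. The second is a check that an index set is lower (downward closed). It is not the paper's construction of $Λ_n$, which concentrates on $d=\infty$. The function names are hypothetical.

```python
from itertools import product


def weighted_simplex(weights, t):
    """Multi-indices nu in N_0^d with sum_j weights[j] * nu[j] <= t.

    The weights are assumed strictly positive, so the set is finite.
    """
    ranges = [range(int(t // w) + 1) for w in weights]
    return {nu for nu in product(*ranges)
            if sum(w * k for w, k in zip(weights, nu)) <= t}


def is_lower(index_set):
    """True if nu in the set implies nu - e_j in the set whenever nu_j > 0.

    Checking unit steps suffices: any mu <= nu is reached by such steps.
    """
    for nu in index_set:
        for j, k in enumerate(nu):
            if k > 0 and nu[:j] + (k - 1,) + nu[j + 1:] not in index_set:
                return False
    return True


Lam = weighted_simplex([1, 2], 4)
print(len(Lam), is_lower(Lam))  # 9 True
```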
Submitted 15 January, 2020; v1 submitted 27 April, 2019;
originally announced April 2019.
-
Diffusion Coefficients Estimation for Elliptic Partial Differential Equations
Authors:
Andrea Bonito,
Albert Cohen,
Ronald DeVore,
Guergana Petrova,
Gerrit Welper
Abstract:
This paper considers the Dirichlet problem $$ -\mathrm{div}(a\nabla u_a)=f \quad \hbox{on}\,\, D, \qquad u_a=0\quad \hbox{on}\,\,\partial D, $$ for a Lipschitz domain $D\subset \mathbb R^d$, where $a$ is a scalar diffusion function. For a fixed $f$, we discuss under which conditions $a$ is uniquely determined and when $a$ can be stably recovered from the knowledge of $u_a$.
A first result is that whenever $a\in H^1(D)$, with $0<λ\le a\le Λ$ on $D$, and $f\in L_\infty(D)$ is strictly positive, then $$ \|a-b\|_{L_2(D)}\le C\|u_a-u_b\|_{H_0^1(D)}^{1/6}. $$ More generally, it is shown that the assumption $a\in H^1(D)$ can be weakened to $a\in H^s(D)$, for certain $s<1$, at the expense of lowering the exponent $1/6$ to a value that depends on $s$.
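A standard finite-difference discretization in one space dimension makes it possible to experiment numerically with the forward map $a\mapsto u_a$ whose stable inversion is studied here. The sketch below assumes $D=(0,1)$, a uniform grid, and $a$ sampled at cell midpoints. The function names are hypothetical. The code makes no claim about the exponent $1/6$; it only computes $u_a$ and a discrete $H_0^1$ norm of $u_a-u_b$.

```python
import numpy as np


def solve_dirichlet_1d(a, f, n):
    """Finite differences for -(a u')' = f on (0, 1), u(0) = u(1) = 0.

    a, f : vectorized callables; n : number of interior grid nodes.
    Uses the three-point scheme with a evaluated at cell midpoints.
    Returns the grid x (endpoints included) and the nodal values of u.
    """
    h = 1.0 / (n + 1)
    x = np.linspace(0.0, 1.0, n + 2)
    a_mid = a(0.5 * (x[:-1] + x[1:]))  # a at x_{i+1/2}, length n + 1
    main = (a_mid[:-1] + a_mid[1:]) / h**2
    off = -a_mid[1:-1] / h**2
    K = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    u = np.zeros(n + 2)
    u[1:-1] = np.linalg.solve(K, f(x[1:-1]))
    return x, u


def h10_norm(x, v):
    """Discrete H_0^1 norm ||v'||_{L_2} of the piecewise linear interpolant of v."""
    dx = np.diff(x)
    dv = np.diff(v) / dx
    return float(np.sqrt(np.sum(dv**2 * dx)))


def coef_a(s):
    return 1.0 + 0.5 * s


def coef_b(s):  # a small perturbation of coef_a
    return 1.0 + 0.5 * s + 0.05 * np.sin(np.pi * s)


def rhs(s):
    return np.ones_like(s)


x, ua = solve_dirichlet_1d(coef_a, rhs, 199)
_, ub = solve_dirichlet_1d(coef_b, rhs, 199)
print(h10_norm(x, ua - ub))
```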
Submitted 16 December, 2016; v1 submitted 16 September, 2016;
originally announced September 2016.
-
Data Assimilation and Sampling in Banach spaces
Authors:
Ronald DeVore,
Guergana Petrova,
Przemyslaw Wojtaszczyk
Abstract:
This paper studies the problem of approximating a function $f$ in a Banach space $X$ from measurements $l_j(f)$, $j=1,\dots,m$, where the $l_j$ are linear functionals from $X^*$. Most results study this problem for classical Banach spaces $X$ such as the $L_p$ spaces, $1\le p\le \infty$, and for $K$ the unit ball of a smoothness space in $X$. Our interest in this paper is in the model classes $K=K(ε,V)$, with $ε>0$ and $V$ a finite dimensional subspace of $X$, which consist of all $f\in X$ such that $\mathrm{dist}(f,V)_X\le ε$. These model classes, called {\it approximation sets}, arise naturally in application domains such as parametric partial differential equations, uncertainty quantification, and signal processing.
A general theory for the recovery of approximation sets in a Banach space is given. This theory includes tight a priori bounds on optimal performance, and algorithms for finding near optimal approximations. We show how the recovery problem for approximation sets is connected with well-studied concepts in Banach space theory such as liftings and the angle between spaces. Examples are given that show how this theory can be used to recover several recent results on sampling and data assimilation.
Submitted 5 August, 2016; v1 submitted 19 February, 2016;
originally announced February 2016.
-
Data Assimilation in Reduced Modeling
Authors:
Peter Binev,
Albert Cohen,
Wolfgang Dahmen,
Ronald DeVore,
Guergana Petrova,
Przemyslaw Wojtaszczyk
Abstract:
We consider the problem of optimal recovery of an element $u$ of a Hilbert space $\mathcal{H}$ from $m$ measurements obtained through known linear functionals on $\mathcal{H}$. Problems of this type are well studied \cite{MRW} under an assumption that $u$ belongs to a prescribed model class, e.g. a known compact subset of $\mathcal{H}$. Motivated by reduced modeling for parametric partial differential equations, this paper considers another setting where the additional information about $u$ is in the form of how well $u$ can be approximated by a certain known subspace $V_n$ of $\mathcal{H}$ of dimension $n$, or more generally, how well $u$ can be approximated by each $k$-dimensional subspace $V_k$ of a sequence of nested subspaces $V_0\subset V_1\subset\cdots\subset V_n$. A recovery algorithm for the one-space formulation, proposed in \cite{MPPY}, is proven here to be optimal and to have a simple formulation, if certain favorable bases are chosen to represent $V_n$ and the measurements. The major contribution of the present paper is to analyze the multi-space case, for which it is shown that the set of all $u$ satisfying the given information can be described as the intersection of a family of known ellipsoids in $\mathcal{H}$. It follows that a near optimal recovery algorithm in the multi-space problem is to identify any point in this intersection, which can provide much better accuracy than in the one-space problem. Two iterative algorithms based on alternating projections are proposed for recovery in the multi-space problem. A detailed analysis of one of them provides a posteriori performance estimates for the iterates, stopping criteria, and convergence rates. Since the limit of the algorithm is a point in the intersection of the aforementioned ellipsoids, it provides a near optimal recovery for $u$.
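The one-space algorithm referred to above is commonly stated as follows. Let $W\subset\mathcal H$ be the span of the Riesz representers of the measurement functionals, and let $w=P_Wu$ be the observed data. Take $v^*\in V_n$ minimizing $\|w-P_Wv\|$ over $V_n$, and return $u^*=w+P_{W^\perp}v^*$. The sketch below implements that statement in $\mathcal H=\mathbb R^N$ with the Euclidean inner product, which is an assumption. It omits the favorable bases used in the paper and the multi-space alternating projections. All names are hypothetical.

```python
import numpy as np


def orth(A):
    """Orthonormal basis of the column span of A (A assumed to have full column rank)."""
    Q, _ = np.linalg.qr(A)
    return Q


def one_space_recovery(W, V, w):
    """One-space recovery u* = w + P_{W^perp} v* in R^N (Euclidean inner product).

    W : (N, m) columns span the measurement space W.
    V : (N, n) columns span the reduced space V_n (n <= m).
    w : (N,) observed part P_W u of the unknown u.
    """
    Qw = orth(W)
    PW_V = Qw @ (Qw.T @ V)  # P_W applied to each basis vector of V_n
    # v* = argmin_{v in V_n} ||w - P_W v||, a small least-squares problem.
    coef, *_ = np.linalg.lstsq(PW_V, w, rcond=None)
    v_star = V @ coef
    # u* = w + (v* - P_W v*): keep the data, fill W^perp from V_n.
    return w + v_star - Qw @ (Qw.T @ v_star)


rng = np.random.default_rng(1)
N, m, n = 10, 5, 3
W = rng.standard_normal((N, m))
V = rng.standard_normal((N, n))
u = V @ rng.standard_normal(n)  # u lies in V_n
Qw = orth(W)
w = Qw @ (Qw.T @ u)
print(np.allclose(one_space_recovery(W, V, w), u))  # True
```

When $u\in V_n$ and $P_W$ is injective on $V_n$, the least-squares residual vanishes and $u$ is recovered exactly. For general $u$, the output always reproduces the data, since $P_Wu^*=w$.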
Submitted 15 June, 2015;
originally announced June 2015.
-
Rescaled Pure Greedy Algorithm for Convex Optimization
Authors:
Zheming Gao,
Guergana Petrova
Abstract:
We suggest a new greedy strategy for convex optimization in Banach spaces and prove its convergence rates under a suitable behavior of the modulus of uniform smoothness of the objective function.
Submitted 13 May, 2015;
originally announced May 2015.
-
Rescaled Pure Greedy Algorithm for Hilbert and Banach Spaces
Authors:
Guergana Petrova
Abstract:
We show that a very simple modification of the Pure Greedy Algorithm for approximating functions by sparse sums from a dictionary in a Hilbert or, more generally, a Banach space has optimal convergence rates on the class of convex combinations of dictionary elements.
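As we read the construction, one step of the Rescaled Pure Greedy Algorithm in a Hilbert space performs a Pure Greedy Algorithm step and then rescales. First, select $\varphi_m\in\mathcal D$ maximizing $|\langle f-f_{m-1},\varphi\rangle|$. Then set $\hat f_m=f_{m-1}+\langle f-f_{m-1},\varphi_m\rangle\varphi_m$. Finally, take $f_m=s_m\hat f_m$ with $s_m=\langle f,\hat f_m\rangle/\|\hat f_m\|^2$, i.e. the orthogonal projection of $f$ onto the line spanned by $\hat f_m$. The sketch below assumes $\mathbb R^N$ with the Euclidean inner product and a finite dictionary of unit-norm columns. It does not cover the Banach space version, and the name `rescaled_pure_greedy` is hypothetical.

```python
import numpy as np


def rescaled_pure_greedy(f, D, steps):
    """Rescaled Pure Greedy Algorithm in R^N with the Euclidean inner product.

    f : (N,) float array to approximate.
    D : (N, K) dictionary whose columns have unit norm.
    Returns the iterates [f_0, f_1, ..., f_steps] with f_0 = 0.
    """
    fm = np.zeros_like(f)
    iterates = [fm]
    for _ in range(steps):
        scores = D.T @ (f - fm)              # <f - f_{m-1}, phi> for every phi
        k = int(np.argmax(np.abs(scores)))   # greedy selection
        f_hat = fm + scores[k] * D[:, k]     # Pure Greedy step
        norm2 = f_hat @ f_hat
        if norm2 == 0.0:                     # degenerate case: f_hat = 0
            break
        fm = (f @ f_hat / norm2) * f_hat     # rescale: project f onto span{f_hat}
        iterates.append(fm)
    return iterates


its = rescaled_pure_greedy(np.array([3.0, -2.0, 1.0]), np.eye(3), 3)
print(np.allclose(its[-1], [3.0, -2.0, 1.0]))  # True
```

Because $f_m$ is the best approximation to $f$ from a line that contains $\hat f_m$, the error $\|f-f_m\|$ never exceeds $\|f-\hat f_m\|\le\|f-f_{m-1}\|$.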
Submitted 13 May, 2015;
originally announced May 2015.
-
Greedy Strategies for Convex Optimization
Authors:
Hao Nguyen,
Guergana Petrova
Abstract:
We investigate two greedy strategies for finding an approximation to the minimum of a convex function $E$ defined on a Hilbert space $H$. We prove convergence rates for these algorithms under suitable conditions on the objective function $E$. These conditions involve the behavior of the modulus of smoothness and the modulus of uniform convexity of $E$.
Submitted 8 January, 2014;
originally announced January 2014.
-
Greedy Algorithms for Reduced Bases in Banach Spaces
Authors:
Ronald DeVore,
Guergana Petrova,
Przemyslaw Wojtaszczyk
Abstract:
Given a Banach space $X$ and one of its compact sets $F$, we consider the problem of finding a good $n$-dimensional space $X_n \subset X$ which can be used to approximate the elements of $F$. The best possible error we can achieve for such an approximation is given by the Kolmogorov width $d_n(F)_X$. However, finding the space which gives this performance is typically numerically intractable. Recently, a new greedy strategy for obtaining good spaces was given in the context of the reduced basis method for solving a parametric family of PDEs. The performance of this greedy algorithm was initially analyzed in A. Buffa, Y. Maday, A.T. Patera, C. Prud'homme, and G. Turinici, "A Priori convergence of the greedy algorithm for the parameterized reduced basis", M2AN Math. Model. Numer. Anal., 46 (2012), 595-603 in the case $X = H$ is a Hilbert space. The results there were significantly improved on in P. Binev, A. Cohen, W. Dahmen, R. DeVore, G. Petrova, and P. Wojtaszczyk, "Convergence rates for greedy algorithms in reduced bases Methods", SIAM J. Math. Anal., 43 (2011), 1457-1472. The purpose of the present paper is to give a new analysis of the performance of such greedy algorithms. Our analysis not only gives improved results for the Hilbert space case but can also be applied to the same greedy procedure in general Banach spaces.
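The greedy strategy discussed above can be sketched in its basic form for a finite training set in $\mathbb R^N$ with the Euclidean norm, which covers only the Hilbert space case. Start from $V_0=\{0\}$. At each step, pick the element of $F$ farthest from the current space $V_k$, add it, and record $\sigma_k=\max_{f\in F}\mathrm{dist}(f,V_k)$. The finite training set, the Gram-Schmidt update of residuals, and the name `greedy_reduced_basis` are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np


def greedy_reduced_basis(F, n):
    """Basic greedy selection of a reduced basis from a finite set F in R^N.

    F : (N, M) array whose columns are the elements of F.
    Returns the chosen column indices and sigma_k = max_f dist(f, V_k),
    where V_k is spanned by the first k chosen elements (V_0 = {0}).
    """
    R = np.array(F, dtype=float)  # residuals f - P_{V_k} f, one per column
    chosen, sigmas = [], []
    for _ in range(n):
        dists = np.linalg.norm(R, axis=0)
        j = int(np.argmax(dists))
        if dists[j] == 0.0:  # every element of F already lies in V_k
            break
        chosen.append(j)
        sigmas.append(float(dists[j]))
        q = R[:, j] / dists[j]      # new orthonormal direction (Gram-Schmidt)
        R -= np.outer(q, q @ R)     # update all residuals: project off q
    return chosen, sigmas


chosen, sigmas = greedy_reduced_basis(np.diag([3.0, 2.0, 1.0]), 3)
print(chosen, sigmas)  # [0, 1, 2] [3.0, 2.0, 1.0]
```

Since $V_k\subset V_{k+1}$, the recorded errors $\sigma_k$ are nonincreasing. The analyses cited above compare their decay with that of the Kolmogorov widths $d_n(F)_X$.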
Submitted 10 April, 2012;
originally announced April 2012.