-
A note on best n-term approximation for generalized Wiener classes
Authors:
Ronald DeVore,
Guergana Petrova,
Przemyslaw Wojtaszczyk
Abstract:
We determine the best n-term approximation of generalized Wiener model classes in a Hilbert space $H$. This theory is then applied to several special cases.
Submitted 25 June, 2024; v1 submitted 15 June, 2024;
originally announced June 2024.
-
Convergence and error control of consistent PINNs for elliptic PDEs
Authors:
Andrea Bonito,
Ronald DeVore,
Guergana Petrova,
Jonathan W. Siegel
Abstract:
We provide an a priori analysis of a certain class of numerical methods, commonly referred to as collocation methods, for solving elliptic boundary value problems. They begin with information in the form of point values of the right side f of such equations and point values of the boundary function g and utilize only this information to numerically approximate the solution u of the Partial Differential Equation (PDE). For such a method to provide an approximation to u with guaranteed error bounds, additional assumptions on f and g, called model class assumptions, are needed. We determine the best error (in the energy norm) of approximating u, in terms of the number of point samples m, under all Besov class model assumptions for the right hand side $f$ and boundary g.
We then turn to the study of numerical procedures and ask whether a proposed numerical procedure (nearly) achieves the optimal recovery error. We analyze numerical methods which generate the numerical approximation to $u$ by minimizing a specified data driven loss function over a set $Σ$ which is either a finite dimensional linear space, or more generally, a finite dimensional manifold. We show that the success of such a procedure depends critically on choosing a correct data driven loss function that is consistent with the PDE and provides sharp error control. Based on this analysis a loss function $L^*$ is proposed.
We also address the recent methods of Physics Informed Neural Networks (PINNs). Minimization of the new loss $L^*$ over neural network spaces $Σ$ is referred to as consistent PINNs (CPINNs). We prove that CPINNs provide an optimal recovery of the solution $u$, provided that the optimization problem can be numerically executed and $Σ$ has sufficient approximation capabilities. Finally, numerical examples illustrating the benefits of CPINNs are given.
Submitted 13 June, 2024;
originally announced June 2024.
-
Constructions of bounded solutions of $div\, {\mathbf u}=f$ in critical spaces
Authors:
Albert Cohen,
Ronald DeVore,
Eitan Tadmor
Abstract:
We construct uniformly bounded solutions of the equation $div\, {\mathbf u}=f$ for arbitrary data $f$ in the critical spaces $L^d(Ω)$, where $Ω$ is a domain of ${\mathbb R}^d$. This question was addressed by Bourgain & Brezis, [On the equation ${\rm div}\, Y=f$ and application to control of phases, JAMS 16(2) (2003) 393-426], who proved that although the problem has a uniformly bounded solution, it is critical in the sense that there exists no linear solution operator for general $L^d$-data. We first discuss the validity of this existence result under weaker conditions than $f\in L^d(Ω)$, and then focus our work on constructive processes for such uniformly bounded solutions. In the $d=2$ case, we present a direct one-step explicit construction, which generalizes for $d>2$ to a $(d-1)$-step construction based on induction. An explicit construction is proposed for compactly supported data in $L^{2,\infty}(Ω)$ in the $d=2$ case. We also present constructive approaches based on optimization of a certain loss functional adapted to the problem. This approach provides a two-step construction in the $d=2$ case. This optimization is used as the building block of a hierarchical multistep process introduced in [E. Tadmor, Hierarchical construction of bounded solutions in critical regularity spaces, CPAM 69(6) (2016) 1087-1109] that converges to a solution in more general situations.
Submitted 21 May, 2024;
originally announced May 2024.
-
Weighted variation spaces and approximation by shallow ReLU networks
Authors:
Ronald DeVore,
Robert D. Nowak,
Rahul Parhi,
Jonathan W. Siegel
Abstract:
We investigate the approximation of functions $f$ on a bounded domain $Ω\subset \mathbb{R}^d$ by the outputs of single-hidden-layer ReLU neural networks of width $n$. This form of nonlinear $n$-term dictionary approximation has been intensely studied since it is the simplest case of neural network approximation (NNA). There are several celebrated approximation results for this form of NNA that introduce novel model classes of functions on $Ω$ whose approximation rates avoid the curse of dimensionality. These novel classes include Barron classes, and classes based on sparsity or variation such as the Radon-domain BV classes.
The present paper is concerned with the definition of these novel model classes on domains $Ω$. The current definition of these model classes does not depend on the domain $Ω$. A new and more proper definition of model classes on domains is given by introducing the concept of weighted variation spaces. These new model classes are intrinsic to the domain itself. The importance of these new model classes is that they are strictly larger than the classical (domain-independent) classes. Yet, it is shown that they maintain the same NNA rates.
Submitted 28 July, 2023;
originally announced July 2023.
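For orientation, the objects studied in this abstract are functions of the form $\sum_{i=1}^n a_i\,\mathrm{ReLU}(w_i\cdot x+b_i)$. The following is a minimal sketch of evaluating such a width-$n$ network; the function name, argument names, and the absolute-value example are illustrative assumptions and are not taken from the paper.

```python
def relu(t):
    """ReLU activation: max(t, 0)."""
    return t if t > 0.0 else 0.0


def shallow_relu_network(x, weights, biases, outer):
    """Evaluate sum_i outer[i] * relu(<weights[i], x> + biases[i]).

    x       : input point in R^d (list of floats)
    weights : n inner weight vectors, each of length d
    biases  : n inner biases
    outer   : n outer coefficients
    (All names are hypothetical, chosen for this illustration.)
    """
    total = 0.0
    for w, b, a in zip(weights, biases, outer):
        pre_activation = sum(wj * xj for wj, xj in zip(w, x)) + b
        total += a * relu(pre_activation)
    return total


# Width n = 2 in dimension d = 1 reproduces |x| = relu(x) + relu(-x).
abs_net = dict(weights=[[1.0], [-1.0]], biases=[0.0, 0.0], outer=[1.0, 1.0])
```

Calling `shallow_relu_network([-0.5], **abs_net)` evaluates the two-neuron network at $x=-0.5$. The width $n$ is simply the number of terms in the sum, which is why this setting is an instance of $n$-term dictionary approximation.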
-
Solving PDEs with Incomplete Information
Authors:
Peter Binev,
Andrea Bonito,
Albert Cohen,
Wolfgang Dahmen,
Ronald DeVore,
Guergana Petrova
Abstract:
We consider the problem of numerically approximating the solutions to a partial differential equation (PDE) when there is insufficient information to determine a unique solution. Our main example is the Poisson boundary value problem, when the boundary data is unknown and instead one observes finitely many linear measurements of the solution. We view this setting as an optimal recovery problem and develop theory and numerical algorithms for its solution. The main vehicle employed is the derivation and approximation of the Riesz representers of these functionals with respect to relevant Hilbert spaces of harmonic functions.
Submitted 20 December, 2023; v1 submitted 13 January, 2023;
originally announced January 2023.
-
Optimal Learning
Authors:
Peter Binev,
Andrea Bonito,
Ronald DeVore,
Guergana Petrova
Abstract:
This paper studies the problem of learning an unknown function $f$ from given data about $f$. The learning problem is to give an approximation $\hat f$ to $f$ that predicts the values of $f$ away from the data. There are numerous settings for this learning problem depending on (i) what additional information we have about $f$ (known as a model class assumption), (ii) how we measure the accuracy of how well $\hat f$ predicts $f$, (iii) what is known about the data and data sites, (iv) whether the data observations are polluted by noise. A mathematical description of the optimal performance possible (the smallest possible error of recovery) is known in the presence of a model class assumption. Under standard model class assumptions, it is shown in this paper that a near optimal $\hat f$ can be found by solving a certain discrete over-parameterized optimization problem with a penalty term. Here, near optimal means that the error is bounded by a fixed constant times the optimal error. This explains the advantage of over-parameterization which is commonly used in modern machine learning. The main results of this paper prove that over-parameterized learning with an appropriate loss function gives a near optimal approximation $\hat f$ of the function $f$ from which the data is collected. Quantitative bounds are given for how much over-parameterization needs to be employed and how the penalization needs to be scaled in order to guarantee a near optimal recovery of $f$. An extension of these results to the case where the data is polluted by additive deterministic noise is also given.
Submitted 26 June, 2023; v1 submitted 29 March, 2022;
originally announced March 2022.
-
Neural Network Approximation
Authors:
Ronald DeVore,
Boris Hanin,
Guergana Petrova
Abstract:
Neural Networks (NNs) are the method of choice for building learning algorithms. Their popularity stems from their empirical success on several challenging learning problems. However, most scholars agree that a convincing theoretical explanation for this success is still lacking.
This article surveys the known approximation properties of the outputs of NNs with the aim of uncovering the properties that are not present in the more traditional methods of approximation used in numerical analysis. Comparisons are made with traditional approximation methods from the viewpoint of rate distortion. Another major component in the analysis of numerical approximation is the computational time needed to construct the approximation and this in turn is intimately connected with the stability of the approximation algorithm. So the stability of numerical approximation using NNs is a large part of the analysis put forward.
The survey, for the most part, is concerned with NNs using the popular ReLU activation function. In this case, the outputs of the NNs are piecewise linear functions on rather complicated partitions of the domain of $f$ into cells that are convex polytopes. When the architecture of the NN is fixed and the parameters are allowed to vary, the set of output functions of the NN is a parameterized nonlinear manifold. It is shown that this manifold has certain space filling properties leading to an increased ability to approximate (better rate distortion) but at the expense of numerical stability. The space filling creates a challenge to the numerical method in finding best or good parameter choices when trying to approximate.
Submitted 28 December, 2020;
originally announced December 2020.
-
Optimal Stable Nonlinear Approximation
Authors:
Albert Cohen,
Ronald DeVore,
Guergana Petrova,
Przemyslaw Wojtaszczyk
Abstract:
While it is well known that nonlinear methods of approximation can often perform dramatically better than linear methods, there are still questions on how to measure the optimal performance possible for such methods. This paper studies nonlinear methods of approximation that are compatible with numerical implementation in that they are required to be numerically stable. A measure of optimal performance, called {\em stable manifold widths}, for approximating a model class $K$ in a Banach space $X$ by stable manifold methods is introduced. Fundamental inequalities between these stable manifold widths and the entropy of $K$ are established. The effects of requiring stability in the settings of deep learning and compressed sensing are discussed.
Submitted 21 September, 2020;
originally announced September 2020.
-
Nonlinear Methods for Model Reduction
Authors:
Andrea Bonito,
Albert Cohen,
Ronald DeVore,
Diane Guignard,
Peter Jantsch,
Guergana Petrova
Abstract:
The usual approach to model reduction for parametric partial differential equations (PDEs) is to construct a linear space $V_n$ which approximates well the solution manifold $\mathcal{M}$ consisting of all solutions $u(y)$ with $y$ the vector of parameters. This linear reduced model $V_n$ is then used for various tasks such as building an online forward solver for the PDE or estimating parameters from data observations. It is well understood in other problems of numerical computation that nonlinear methods such as adaptive approximation, $n$-term approximation, and certain tree-based methods may provide improved numerical efficiency. For model reduction, a nonlinear method would replace the linear space $V_n$ by a nonlinear space $Σ_n$. This idea has already been suggested in recent papers on model reduction where the parameter domain is decomposed into a finite number of cells and a linear space of low dimension is assigned to each cell.
Up to this point, little is known in terms of performance guarantees for such a nonlinear strategy. Moreover, most numerical experiments for nonlinear model reduction use a parameter dimension of only one or two. In this work, a step is made towards a more cohesive theory for nonlinear model reduction. Framing these methods in the general setting of library approximation allows us to give a first comparison of their performance with that of standard linear approximation for any general compact set. We then turn to the study of these methods for solution manifolds of parametrized elliptic PDEs. We study a very specific example of library approximation where the parameter domain is split into a finite number $N$ of rectangular cells and where different reduced affine spaces of dimension $m$ are assigned to each cell. The performance of this nonlinear procedure is analyzed from the viewpoint of accuracy of approximation versus $m$ and $N$.
Submitted 5 May, 2020;
originally announced May 2020.
-
State Estimation -- The Role of Reduced Models
Authors:
Albert Cohen,
Wolfgang Dahmen,
Ron DeVore
Abstract:
The exploration of complex physical or technological processes usually requires exploiting available information from different sources: (i) physical laws often represented as a family of parameter dependent partial differential equations and (ii) data provided by measurement devices or sensors. The number of sensors is typically limited and data acquisition may be expensive and in some cases even harmful. This article reviews some recent developments for this "small-data" scenario where inversion is strongly aggravated by the typically large parametric dimensionality. The proposed concepts may be viewed as exploring alternatives to Bayesian inversion in favor of more deterministic accuracy quantification related to the required computational complexity. We discuss optimality criteria which delineate intrinsic information limits, and highlight the role of reduced models for developing efficient computational strategies. In particular, the need to adapt the reduced models -- not to a specific (possibly noisy) data set but rather to the sensor system -- is a central theme. This, in turn, is facilitated by exploiting geometric perspectives based on proper stable variational formulations of the continuous model.
Submitted 1 February, 2020;
originally announced February 2020.
-
Polynomial Approximation of Anisotropic Analytic Functions of Several Variables
Authors:
Andrea Bonito,
Ronald DeVore,
Diane Guignard,
Peter Jantsch,
Guergana Petrova
Abstract:
Motivated by numerical methods for solving parametric partial differential equations, this paper studies the approximation of multivariate analytic functions by algebraic polynomials. We introduce various anisotropic model classes based on Taylor expansions, and study their approximation by finite dimensional polynomial spaces $\cal{P}_Λ$ described by lower sets $Λ$. Given a budget $n$ for the dimension of $\cal{P}_Λ$, we prove that certain lower sets $Λ_n$, with cardinality $n$, provide a certifiable approximation error that is in a certain sense optimal, and that these lower sets have a simple definition in terms of simplices. Our main goal is to obtain approximation results when the number of variables $d$ is large and even infinite, and so we concentrate almost exclusively on the case $d=\infty$. We also emphasize obtaining results which hold for the full range $n\ge 1$, rather than asymptotic results that only hold for $n$ sufficiently large. In applications, one typically wants $n$ small to comply with computational budgets.
Submitted 15 January, 2020; v1 submitted 27 April, 2019;
originally announced April 2019.
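The abstract above indexes polynomial spaces by lower sets $Λ$, that is, sets of multi-indices that contain every componentwise-smaller index of each of their members. The following is a minimal sketch of that notion in finitely many variables. The weighted-simplex construction `simplex_index_set` is a hypothetical illustration of "a simple definition in terms of simplices" and is not claimed to be the paper's exact definition of $Λ_n$.

```python
from itertools import product


def is_lower_set(indices):
    """Check that a finite set of multi-indices is downward closed.

    It suffices to check immediate predecessors: for every nu in the set
    and every coordinate j with nu[j] > 0, nu - e_j must also be in the set.
    """
    index_set = set(map(tuple, indices))
    for nu in index_set:
        for j, value in enumerate(nu):
            if value > 0:
                predecessor = nu[:j] + (value - 1,) + nu[j + 1:]
                if predecessor not in index_set:
                    return False
    return True


def simplex_index_set(weights, budget):
    """All multi-indices nu with sum_j weights[j] * nu[j] <= budget.

    Assumes positive weights; larger weights allow fewer powers of the
    corresponding variable, which models anisotropy. (Hypothetical helper.)
    """
    ranges = [range(int(budget // w) + 1) for w in weights]
    return [
        nu
        for nu in product(*ranges)
        if sum(w * v for w, v in zip(weights, nu)) <= budget
    ]


# Two variables, the second weighted twice as heavily as the first.
example_set = simplex_index_set([1, 2], 2)
```

Any set built this way is automatically a lower set, because decreasing a coordinate can only decrease the weighted sum.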
-
Optimal reduced model algorithms for data-based state estimation
Authors:
Albert Cohen,
Wolfgang Dahmen,
Ron DeVore,
Jalal Fadili,
Olga Mula,
James Nichols
Abstract:
Reduced model spaces, such as reduced basis and polynomial chaos, are linear spaces $V_n$ of finite dimension $n$ which are designed for the efficient approximation of families of parametrized PDEs in a Hilbert space $V$. The manifold $\mathcal{M}$ that gathers the solutions of the PDE for all admissible parameter values is globally approximated by the space $V_n$ with some controlled accuracy $ε_n$, which is typically much smaller than when using standard approximation spaces of the same dimension such as finite elements. Reduced model spaces have also been proposed in [13] as a vehicle to design a simple linear recovery algorithm of the state $u\in\mathcal{M}$ corresponding to a particular solution when the values of parameters are unknown but a set of data is given by $m$ linear measurements of the state. The measurements are of the form $\ell_j(u)$, $j=1,\dots,m$, where the $\ell_j$ are linear functionals on $V$. The analysis of this approach in [2] shows that the recovery error is bounded by $μ_nε_n$, where $μ_n=μ(V_n,W)$ is the inverse of an inf-sup constant that describes the angle between $V_n$ and the space $W$ spanned by the Riesz representers of $(\ell_1,\dots,\ell_m)$. A reduced model space which is efficient for approximation might thus be ineffective for recovery if $μ_n$ is large or infinite. In this paper, we discuss the existence and construction of an optimal reduced model space for this recovery method, and we extend our search to affine spaces. Our basic observation is that this problem is equivalent to the search for an optimal affine algorithm for the recovery of $\mathcal{M}$ in the worst case error sense. This allows us to perform our search by a convex optimization procedure. Numerical tests illustrate that the reduced model spaces constructed from our approach perform better than the classical reduced basis spaces.
Submitted 2 August, 2020; v1 submitted 19 March, 2019;
originally announced March 2019.
-
Reduced Basis Greedy Selection Using Random Training Sets
Authors:
Albert Cohen,
Wolfgang Dahmen,
Ronald DeVore
Abstract:
Reduced bases have been introduced for the approximation of parametrized PDEs in applications where many online queries are required. Their numerical efficiency for such problems has been theoretically confirmed in \cite{BCDDPW,DPW}, where it is shown that the reduced basis space $V_n$ of dimension $n$, constructed by a certain greedy strategy, has approximation error similar to that of the optimal space associated to the Kolmogorov $n$-width of the solution manifold. The greedy construction of the reduced basis space is performed in an offline stage which requires at each step a maximization of the current error over the parameter space. For the purpose of numerical computation, this maximization is performed over a finite {\em training set} obtained through a discretization of the parameter domain. To guarantee a final approximation error $\varepsilon$ for the space generated by the greedy algorithm requires in principle that the snapshots associated to this training set constitute an approximation net for the solution manifold with accuracy of order $\varepsilon$. Hence, the size of the training set is the $\varepsilon$ covering number for $\mathcal{M}$ and this covering number typically behaves like $\exp(C\varepsilon^{-1/s})$ for some $C>0$ when the solution manifold has $n$-width decay $O(n^{-s})$. Thus, the sheer size of the training set prohibits implementation of the algorithm when $\varepsilon$ is small. The main result of this paper shows that, if one is willing to accept results which hold with high probability, rather than with certainty, then for a large class of relevant problems one may replace the fine discretization by a random training set of size polynomial in $\varepsilon^{-1}$. Our proof of this fact is established by using inverse inequalities for polynomials in high dimensions.
Submitted 18 February, 2020; v1 submitted 22 October, 2018;
originally announced October 2018.
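The greedy selection described in this abstract can be illustrated on a toy problem. The sketch below draws a random training set of parameters, computes the corresponding snapshots, and at each step adds the snapshot that is worst approximated by the current space. The solution map `toy_solution` is a hypothetical stand-in for a PDE solve, and the training set size, seed, and Euclidean norm are assumptions made only for this illustration; none of them come from the paper.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_out(v, basis):
    """Return v minus its components along an orthonormal basis."""
    r = list(v)
    for q in basis:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def greedy_reduced_basis(snapshots, n):
    """Greedy selection of at most n snapshots over a finite training set.

    Returns (chosen indices, orthonormal basis, errors), where errors[i] is
    the largest training error before the i-th basis element is added.
    """
    basis, chosen, errors = [], [], []
    for _ in range(n):
        residuals = [project_out(u, basis) for u in snapshots]
        errs = [math.sqrt(dot(r, r)) for r in residuals]
        k = max(range(len(snapshots)), key=errs.__getitem__)
        errors.append(errs[k])
        if errs[k] < 1e-12:
            break  # every training snapshot is already captured
        basis.append([x / errs[k] for x in residuals[k]])
        chosen.append(k)
    return chosen, basis, errors


def toy_solution(y, grid_size=20):
    # Hypothetical stand-in for a PDE solve: u(y) sampled on a grid in [0, 1].
    return [1.0 / (1.0 + y * j / (grid_size - 1)) for j in range(grid_size)]


def random_training_set(size, seed=0):
    # Random training set: parameters drawn uniformly from [0, 1].
    rng = random.Random(seed)
    return [toy_solution(rng.uniform(0.0, 1.0)) for _ in range(size)]


snapshots = random_training_set(50)
chosen, basis, errors = greedy_reduced_basis(snapshots, 4)
```

The recorded `errors` only measure the fit on the sampled training set. Controlling the error over the whole solution manifold with high probability is the subject of the paper's analysis and is not demonstrated by this sketch.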
-
Diffusion Coefficients Estimation for Elliptic Partial Differential Equations
Authors:
Andrea Bonito,
Albert Cohen,
Ronald DeVore,
Guergana Petrova,
Gerrit Welper
Abstract:
This paper considers the Dirichlet problem $$ -\mathrm{div}(a\nabla u_a)=f \quad \hbox{on}\,\,\ D, \qquad u_a=0\quad \hbox{on}\,\,\partial D, $$ for a Lipschitz domain $D\subset \mathbb R^d$, where $a$ is a scalar diffusion function. For a fixed $f$, we discuss under which conditions $a$ is uniquely determined and when $a$ can be stably recovered from the knowledge of $u_a$.
A first result is that whenever $a\in H^1(D)$, with $0<λ\le a\le Λ$ on $D$, and $f\in L_\infty(D)$ is strictly positive, then $$ \|a-b\|_{L_2(D)}\le C\|u_a-u_b\|_{H_0^1(D)}^{1/6}. $$ More generally, it is shown that the assumption $a\in H^1(D)$ can be weakened to $a\in H^s(D)$, for certain $s<1$, at the expense of lowering the exponent $1/6$ to a value that depends on $s$.
Submitted 16 December, 2016; v1 submitted 16 September, 2016;
originally announced September 2016.
-
Data Assimilation and Sampling in Banach spaces
Authors:
Ronald DeVore,
Guergana Petrova,
Przemyslaw Wojtaszczyk
Abstract:
This paper studies the problem of approximating a function $f$ in a Banach space $X$ from measurements $l_j(f)$, $j=1,\dots,m$, where the $l_j$ are linear functionals from $X^*$. Most results study this problem for classical Banach spaces $X$ such as the $L_p$ spaces, $1\le p\le \infty$, and for $K$ the unit ball of a smoothness space in $X$. Our interest in this paper is in the model classes $K=K(ε,V)$, with $ε>0$ and $V$ a finite dimensional subspace of $X$, which consists of all $f\in X$ such that $dist(f,V)_X\le ε$. These model classes, called {\it approximation sets}, arise naturally in application domains such as parametric partial differential equations, uncertainty quantification, and signal processing.
A general theory for the recovery of approximation sets in a Banach space is given. This theory includes tight a priori bounds on optimal performance, and algorithms for finding near optimal approximations. We show how the recovery problem for approximation sets is connected with well-studied concepts in Banach space theory such as liftings and the angle between spaces. Examples are given that show how this theory can be used to recover several recent results on sampling and data assimilation.
Submitted 5 August, 2016; v1 submitted 19 February, 2016;
originally announced February 2016.
-
Sparse polynomial approximation of parametric elliptic PDEs. Part II: lognormal coefficients
Authors:
Markus Bachmayr,
Albert Cohen,
Ronald DeVore,
Giovanni Migliorati
Abstract:
Elliptic partial differential equations with diffusion coefficients of lognormal form, that is $a=\exp(b)$, where $b$ is a Gaussian random field, are considered. We study the $\ell^p$ summability properties of the Hermite polynomial expansion of the solution in terms of the countably many scalar parameters appearing in a given representation of $b$. These summability results have direct consequences on the approximation rates of best $n$-term truncated Hermite expansions. Our results significantly improve on the state of the art estimates available for this problem. In particular, they take into account the support properties of the basis functions involved in the representation of $b$, in addition to the size of these functions. One interesting conclusion from our analysis is that in certain relevant cases, the Karhunen-Loève representation of $b$ may not be the best choice concerning the resulting sparsity and approximability of the Hermite expansion.
Submitted 23 September, 2015;
originally announced September 2015.
-
Orthogonal Matching Pursuit under the Restricted Isometry Property
Authors:
Albert Cohen,
Wolfgang Dahmen,
Ronald DeVore
Abstract:
This paper is concerned with the performance of Orthogonal Matching Pursuit (OMP) algorithms applied to a dictionary $\mathcal{D}$ in a Hilbert space $\mathcal{H}$. Given an element $f\in \mathcal{H}$, OMP generates a sequence of approximations $f_n$, $n=1,2,\dots$, each of which is a linear combination of $n$ dictionary elements chosen by a greedy criterion. It is studied whether the approximations $f_n$ are in some sense comparable to {\em best $n$ term approximation} from the dictionary. One important result related to this question is a theorem of Zhang \cite{TZ} in the context of sparse recovery of finite dimensional signals. This theorem shows that OMP exactly recovers $n$-sparse signals, whenever the dictionary $\mathcal{D}$ satisfies a Restricted Isometry Property (RIP) of order $An$ for some constant $A$, and that the procedure is also stable in $\ell^2$ under measurement noise. The main contribution of the present paper is to give a structurally simpler proof of Zhang's theorem, formulated in the general context of $n$ term approximation from a dictionary in arbitrary Hilbert spaces $\mathcal{H}$. Namely, it is shown that OMP generates near best $n$ term approximations under a similar RIP condition.
Submitted 15 June, 2015;
originally announced June 2015.
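As a concrete illustration of the OMP iteration described in this abstract, the following is a minimal finite-dimensional sketch. It assumes a dictionary given as a list of unit-norm vectors in $\mathbb{R}^N$ with the Euclidean inner product, and it computes the orthogonal projection by Gram-Schmidt. The function names and tolerances are assumptions made for this illustration, and the sketch does not check any RIP condition.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def omp(atoms, f, n):
    """Run at most n steps of Orthogonal Matching Pursuit.

    atoms : list of unit-norm vectors (the dictionary)
    f     : target vector
    Returns (selected atom indices, approximation f_n), where f_n is the
    orthogonal projection of f onto the span of the selected atoms.
    """
    residual = list(f)
    basis = []      # orthonormal basis of the span of the selected atoms
    selected = []
    for _ in range(n):
        # Greedy criterion: atom with the largest inner product with the residual.
        scores = [abs(dot(residual, g)) for g in atoms]
        k = max(range(len(atoms)), key=scores.__getitem__)
        if scores[k] < 1e-12:
            break  # residual is orthogonal to every atom
        # Gram-Schmidt: orthogonalize the new atom against the current basis.
        q = list(atoms[k])
        for b in basis:
            c = dot(q, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        norm = math.sqrt(dot(q, q))
        if norm < 1e-12:
            break  # new atom adds nothing to the current span
        q = [qi / norm for qi in q]
        basis.append(q)
        selected.append(k)
        # Projection step: remove the component of the residual along q.
        c = dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    approximation = [fi - ri for fi, ri in zip(f, residual)]
    return selected, approximation


# Orthonormal dictionary in R^3: OMP picks the largest coefficients first.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
selected, f2 = omp(atoms, [3.0, -1.0, 0.5], 2)
```

For an orthonormal dictionary such as the one in the example, OMP coincides with best $n$-term approximation. For redundant dictionaries the two can differ, which is the situation the abstract addresses.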
-
Data Assimilation in Reduced Modeling
Authors:
Peter Binev,
Albert Cohen,
Wolfgang Dahmen,
Ronald DeVore,
Guergana Petrova,
Przemyslaw Wojtaszczyk
Abstract:
We consider the problem of optimal recovery of an element $u$ of a Hilbert space $\mathcal{H}$ from $m$ measurements obtained through known linear functionals on $\mathcal{H}$. Problems of this type are well studied \cite{MRW} under an assumption that $u$ belongs to a prescribed model class, e.g. a known compact subset of $\mathcal{H}$. Motivated by reduced modeling for parametric partial differential equations, this paper considers another setting where the additional information about $u$ is in the form of how well $u$ can be approximated by a certain known subspace $V_n$ of $\mathcal{H}$ of dimension $n$, or more generally, how well $u$ can be approximated by each $k$-dimensional subspace $V_k$ of a sequence of nested subspaces $V_0\subset V_1\subset\cdots\subset V_n$. A recovery algorithm for the one-space formulation, proposed in \cite{MPPY}, is proven here to be optimal and to have a simple formulation, if certain favorable bases are chosen to represent $V_n$ and the measurements. The major contribution of the present paper is to analyze the multi-space case for which it is shown that the set of all $u$ satisfying the given information can be described as the intersection of a family of known ellipsoids in $\mathcal{H}$. It follows that a near optimal recovery algorithm in the multi-space problem is to identify any point in this intersection which can provide a much better accuracy than in the one-space problem. Two iterative algorithms based on alternating projections are proposed for recovery in the multi-space problem. A detailed analysis of one of them provides a posteriori performance estimates for the iterates, stopping criteria, and convergence rates. Since the limit of the algorithm is a point in the intersection of the aforementioned ellipsoids, it provides a near optimal recovery for $u$.
Submitted 15 June, 2015;
originally announced June 2015.
-
Approximation of high-dimensional parametric PDEs
Authors:
Albert Cohen,
Ronald DeVore
Abstract:
Parametrized families of PDEs arise in various contexts such as inverse problems, control and optimization, risk assessment, and uncertainty quantification. In most of these applications, the number of parameters is large or perhaps even infinite. Thus, the development of numerical methods for these parametric problems is faced with the possible curse of dimensionality. This article is directed at (i) identifying and understanding which properties of parametric equations allow one to avoid this curse and (ii) developing and analyzing effective numerical methods which fully exploit these properties and, in turn, are immune to the growth in dimensionality. The first part of this article studies the smoothness and approximability of the solution map, that is, the map $a\mapsto u(a)$ where $a$ is the parameter value and $u(a)$ is the corresponding solution to the PDE. It is shown that for many relevant parametric PDEs, the parametric smoothness of this map is typically holomorphic and also highly anisotropic in that the relevant parameters are of widely varying importance in describing the solution. These two properties are then exploited to establish convergence rates of $n$-term approximations to the solution map for which each term is separable in the parametric and physical variables. These results reveal that, at least on a theoretical level, the solution map can be well approximated by discretizations of moderate complexity, thereby showing how the curse of dimensionality is broken. This theoretical analysis is carried out through concepts of approximation theory such as best $n$-term approximation, sparsity, and $n$-widths. These notions determine a priori the best possible performance of numerical methods and thus serve as a benchmark for concrete algorithms. The second part of this article turns to the development of numerical algorithms based on the theoretically established sparse separable approximations.
The numerical methods studied fall into two general categories. The first uses polynomial expansions in terms of the parameters to approximate the solution map. The second one searches for suitable low dimensional spaces for simultaneously approximating all members of the parametric family. The numerical implementation of these approaches is carried out through adaptive and greedy algorithms. An a priori analysis of the performance of these algorithms establishes how well they meet the theoretical benchmarks.
Submitted 3 March, 2015; v1 submitted 24 February, 2015;
originally announced February 2015.
-
Kolmogorov widths under holomorphic mappings
Authors:
Albert Cohen,
Ronald DeVore
Abstract:
If $L$ is a bounded linear operator mapping the Banach space $X$ into the Banach space $Y$ and $K$ is a compact set in $X$, then the Kolmogorov widths of the image $L(K)$ do not exceed those of $K$ multiplied by the norm of $L$. We extend this result from linear maps to holomorphic mappings $u$ from $X$ to $Y$ in the following sense: when the $n$-widths of $K$ are $O(n^{-r})$ for some $r>1$, then those of $u(K)$ are $O(n^{-s})$ for any $s < r-1$. We then use these results to prove various theorems about Kolmogorov widths of manifolds consisting of solutions to certain parametrized PDEs. Results of this type are important in the numerical analysis of reduced bases and other reduced modeling methods, since the best possible performance of such methods is governed by the rate of decay of the Kolmogorov widths of the solution manifold.
Submitted 24 February, 2015;
originally announced February 2015.
-
Classification algorithms using adaptive partitioning
Authors:
Peter Binev,
Albert Cohen,
Wolfgang Dahmen,
Ronald DeVore
Abstract:
Algorithms for binary classification based on adaptive tree partitioning are formulated and analyzed for both their risk performance and their friendliness to numerical implementation. The algorithms can be viewed as generating a set approximation to the Bayes set and thus fall into the general category of set estimators. In contrast with the most studied tree-based algorithms, which utilize piecewise constant approximation on the generated partition [IEEE Trans. Inform. Theory 52 (2006) 1335-1353; Mach. Learn. 66 (2007) 209-242], we consider decorated trees, which allow us to derive higher order methods. Convergence rates for these methods are derived in terms of the parameter $α$ of margin conditions and a rate $s$ of best approximation of the Bayes set by decorated adaptive partitions. They can also be expressed in terms of the Besov smoothness $β$ of the regression function that governs its approximability by piecewise polynomials on adaptive partitions. The execution of the algorithms does not require knowledge of the smoothness or margin conditions. Besov smoothness conditions are weaker than the commonly used Hölder conditions, which govern approximation by nonadaptive partitions, and therefore for a given regression function can result in a higher rate of convergence. This in turn mitigates the compatibility conflict between smoothness and margin parameters.
Submitted 4 November, 2014;
originally announced November 2014.
-
Tensor-Sparsity of Solutions to High-Dimensional Elliptic Partial Differential Equations
Authors:
Wolfgang Dahmen,
Ronald DeVore,
Lars Grasedyck,
Endre Süli
Abstract:
A recurring theme in attempts to break the curse of dimensionality in the numerical approximations of solutions to high-dimensional partial differential equations (PDEs) is to employ some form of sparse tensor approximation. Unfortunately, there are only a few results that quantify the possible advantages of such an approach. This paper introduces a class $Σ_n$ of functions, which can be written as a sum of rank-one tensors using a total of at most $n$ parameters, and then uses this notion of sparsity to prove a regularity theorem for certain high-dimensional elliptic PDEs. It is shown, among other results, that whenever the right-hand side $f$ of the elliptic PDE can be approximated with a certain rate $\mathcal{O}(n^{-r})$ in the norm of ${\mathrm H}^{-1}$ by elements of $Σ_n$, then the solution $u$ can be approximated in ${\mathrm H}^1$ from $Σ_n$ to accuracy $\mathcal{O}(n^{-r'})$ for any $r'\in (0,r)$. Since these results require knowledge of the eigenbasis of the elliptic operator considered, we propose a second "basis-free" model of tensor sparsity and prove a regularity theorem for this second sparsity model as well. We then proceed to address the important question of the extent to which such regularity theorems translate into results on computational complexity. It is shown how this second model can be used to derive computational algorithms with performance that breaks the curse of dimensionality on certain model high-dimensional elliptic PDEs with tensor-sparse data.
Submitted 23 July, 2014;
originally announced July 2014.
-
Convex optimization on Banach Spaces
Authors:
R. A. DeVore,
V. N. Temlyakov
Abstract:
Greedy algorithms which use only function evaluations are applied to convex optimization in a general Banach space $X$. Along with algorithms that use exact evaluations, algorithms with approximate evaluations are treated. A priori upper bounds for the convergence rate of the proposed algorithms are given. These bounds depend on the smoothness of the objective function and the sparsity or compressibility (with respect to a given dictionary) of a point in $X$ where the minimum is attained.
Submitted 1 January, 2014;
originally announced January 2014.
-
Adaptive Finite Element Methods for Elliptic Problems with Discontinuous Coefficients
Authors:
Andrea Bonito,
Ronald A. DeVore,
Ricardo H. Nochetto
Abstract:
Elliptic partial differential equations (PDEs) with discontinuous diffusion coefficients occur in application domains such as diffusions through porous media, electro-magnetic field propagation on heterogeneous media, and diffusion processes on rough surfaces. The standard approach to numerically treating such problems using finite element methods is to assume that the discontinuities lie on the boundaries of the cells in the initial triangulation. However, this does not match applications where discontinuities occur on curves, surfaces, or manifolds, and could even be unknown beforehand. One of the obstacles to treating such discontinuity problems is that the usual perturbation theory for elliptic PDEs assumes bounds for the distortion of the coefficients in the $L_\infty$ norm and this in turn requires that the discontinuities are matched exactly when the coefficients are approximated. We present a new approach based on distortion of the coefficients in an $L_q$ norm with $q<\infty$ which therefore does not require the exact matching of the discontinuities. We then use this new distortion theory to formulate new adaptive finite element methods (AFEMs) for such discontinuity problems. We show that such AFEMs are optimal in the sense of distortion versus number of computations, and report insightful numerical results supporting our analysis.
Submitted 30 July, 2013; v1 submitted 14 January, 2013;
originally announced January 2013.
-
Greedy Algorithms for Reduced Bases in Banach Spaces
Authors:
Ronald DeVore,
Guergana Petrova,
Przemyslaw Wojtaszczyk
Abstract:
Given a Banach space $X$ and one of its compact sets $F$, we consider the problem of finding a good $n$-dimensional space $X_n \subset X$ which can be used to approximate the elements of $F$. The best possible error we can achieve for such an approximation is given by the Kolmogorov width $d_n(F)_X$. However, finding the space which gives this performance is typically numerically intractable. Recently, a new greedy strategy for obtaining good spaces was given in the context of the reduced basis method for solving a parametric family of PDEs. The performance of this greedy algorithm was initially analyzed in A. Buffa, Y. Maday, A.T. Patera, C. Prud'homme, and G. Turinici, "A Priori convergence of the greedy algorithm for the parameterized reduced basis", M2AN Math. Model. Numer. Anal., 46 (2012), 595-603, in the case where $X = H$ is a Hilbert space. The results there were significantly improved on in P. Binev, A. Cohen, W. Dahmen, R. DeVore, G. Petrova, and P. Wojtaszczyk, "Convergence rates for greedy algorithms in reduced bases Methods", SIAM J. Math. Anal., 43 (2011), 1457-1472. The purpose of the present paper is to give a new analysis of the performance of such greedy algorithms. Our analysis not only gives improved results for the Hilbert space case but can also be applied to the same greedy procedure in general Banach spaces.
Submitted 10 April, 2012;
originally announced April 2012.
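The greedy strategy mentioned in the abstract above is commonly stated as: pick the element of $F$ of largest norm, then repeatedly pick the element of $F$ farthest from the span of those already chosen. The sketch below illustrates this under simplifying assumptions that are not from the abstract: $F$ is a finite set of vectors in $\mathbb{R}^d$ with the Euclidean norm (a Hilbert space setting), and the maximum is computed exactly. The names `dot` and `greedy_reduced_basis` are hypothetical.

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def greedy_reduced_basis(F, n):
    """Greedily choose up to n elements of the finite set F.

    Returns (chosen, errors), where errors[k] is the largest distance
    from an element of F to the span of the first k chosen elements.
    """
    # residuals[j] is F[j] minus its projection onto the current span.
    residuals = [list(f) for f in F]
    chosen, errors = [], []
    for _ in range(n):
        norms = [math.sqrt(dot(r, r)) for r in residuals]
        k = max(range(len(F)), key=lambda j: norms[j])
        if norms[k] == 0.0:
            break  # every element of F already lies in the span
        chosen.append(k)
        errors.append(norms[k])
        # Normalize the worst residual and project it out of all residuals.
        q = [ri / norms[k] for ri in residuals[k]]
        updated = []
        for r in residuals:
            c = dot(r, q)
            updated.append([ri - c * qi for ri, qi in zip(r, q)])
        residuals = updated
    return chosen, errors
```

The recorded `errors` play the role of the greedy approximation errors whose decay is compared with that of the Kolmogorov widths $d_n(F)_X$.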
-
Iteratively re-weighted least squares minimization for sparse recovery
Authors:
Ingrid Daubechies,
Ronald DeVore,
Massimo Fornasier,
C. Sinan Gunturk
Abstract:
We analyze an Iteratively Re-weighted Least Squares (IRLS) algorithm for promoting $\ell_1$-minimization in sparse and compressible vector recovery. We prove its convergence and we estimate its local rate. We show how the algorithm can be modified in order to promote $\ell_t$-minimization for $t<1$, and how this modification produces superlinear rates of convergence.
Submitted 3 July, 2008;
originally announced July 2008.
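The idea of IRLS in the abstract above can be illustrated with a minimal sketch: each iteration solves a weighted least squares problem under the linear constraint, then resets the weights to roughly $1/|x_i|$ so that the weighted quadratic mimics the $\ell_1$ norm. To stay self-contained, the sketch assumes a single linear constraint $a\cdot x = y$, for which the weighted minimizer has a closed form. The smoothing parameter `eps` and its halving schedule are illustrative assumptions, not the update rule analyzed in the paper, and the name `irls_single_constraint` is hypothetical.

```python
import math

def irls_single_constraint(a, y, n_iter=60, eps=1.0):
    """IRLS sketch for: minimize sum |x_i| subject to sum a_i * x_i = y.

    Each step minimizes sum w_i * x_i**2 under the constraint, which by
    Lagrange multipliers gives x_i = lam * a_i / w_i.
    """
    w = [1.0] * len(a)
    x = [0.0] * len(a)
    for _ in range(n_iter):
        # Weighted least squares solution for the current weights.
        lam = y / sum(ai * ai / wi for ai, wi in zip(a, w))
        x = [lam * ai / wi for ai, wi in zip(a, w)]
        # Shrink the smoothing parameter (illustrative schedule only).
        eps *= 0.5
        # Reweight: w_i is about 1/|x_i|, regularized by eps to avoid division by zero.
        w = [1.0 / math.sqrt(xi * xi + eps * eps) for xi in x]
    return x
```

For the constraint $x_1 + 2x_2 = 2$, the $\ell_1$ minimizer is $(0, 1)$, and the iterates of this sketch approach it while satisfying the constraint at every step.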
-
Approximation and learning by greedy algorithms
Authors:
Andrew R. Barron,
Albert Cohen,
Wolfgang Dahmen,
Ronald A. DeVore
Abstract:
We consider the problem of approximating a given element $f$ from a Hilbert space $\mathcal{H}$ by means of greedy algorithms and the application of such procedures to the regression problem in statistical learning theory. We improve on the existing theory of convergence rates for both the orthogonal greedy algorithm and the relaxed greedy algorithm, as well as for the forward stepwise projection algorithm. For all these algorithms, we prove convergence results for a variety of function classes and not simply those that are related to the convex hull of the dictionary. We then show how these bounds for convergence rates lead to a new theory for the performance of greedy algorithms in learning. In particular, we build upon the results in [IEEE Trans. Inform. Theory 42 (1996) 2118--2132] to construct learning algorithms based on greedy approximations which are universally consistent and provide provable convergence rates for large classes of functions. The use of greedy algorithms in the context of learning is very appealing since it greatly reduces the computational burden when compared with standard model selection using general dictionaries.
Submitted 12 March, 2008;
originally announced March 2008.
-
Approximation using scattered shifts of a multivariate function
Authors:
Ronald DeVore,
Amos Ron
Abstract:
The approximation of a general $d$-variate function $f$ by the shifts $φ(\cdot-ξ)$, $ξ\in Ξ\subset \mathbb{R}^d$, of a fixed function $φ$ occurs in many applications such as data fitting, neural networks, and learning theory. When $Ξ=h\mathbb{Z}^d$ is a dilate of the integer lattice, there is a rather complete understanding of the approximation problem \cite{BDR,Johnson1} using Fourier techniques. However, in most applications the {\it center} set $Ξ$ is either given, or can be chosen with complete freedom. In both of these cases, the shift-invariant setting is too restrictive. This paper studies the approximation problem in the case $Ξ$ is arbitrary. It establishes approximation theorems whose error bounds reflect the local density of the points in $Ξ$. Two different settings are analyzed. The first is when the set $Ξ$ is prescribed in advance. In this case, the theorems of this paper show that, in analogy with the classical univariate spline approximation, improved approximation occurs in regions where the density is high. The second setting corresponds to the problem of non-linear approximation. In that setting the set $Ξ$ can be chosen using information about the target function $f$. We discuss how to `best' make these choices and give estimates for the approximation error.
Submitted 18 February, 2008;
originally announced February 2008.