-
A Low Rank Neural Representation of Entropy Solutions
Authors:
Donsub Rim,
Gerrit Welper
Abstract:
We construct a new representation of entropy solutions to nonlinear scalar conservation laws with a smooth convex flux function in a single spatial dimension. The representation is a generalization of the method of characteristics and possesses a compositional form. While it is a nonlinear representation, the embedded dynamics of the solution in the time variable is linear. This representation is then discretized as a manifold of implicit neural representations where the feedforward neural network architecture has a low rank structure. Finally, we show that the low rank neural representation with a fixed number of layers and a small number of coefficients can approximate any entropy solution regardless of the complexity of the shock topology, while retaining the linearity of the embedded dynamics.
Submitted 9 June, 2024;
originally announced June 2024.
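For orientation, the classical method of characteristics that the abstract above generalizes can be sketched as follows. This is a minimal illustration and not the representation constructed in the paper: it assumes Burgers' flux $f(u) = u^2/2$ as one example of a smooth convex flux, it is only valid before shocks form, and the function name `burgers_characteristics` and the bisection bracket are hypothetical choices made for this sketch.

```python
def burgers_characteristics(u0, x, t, lo=-10.0, hi=10.0, tol=1e-12):
    """Evaluate u(x, t) for u_t + (u^2/2)_x = 0 before any shock forms.

    Along a characteristic starting at x0 the solution is constant, so
    u(x, t) = u0(x0) where x0 solves x = x0 + u0(x0) * t.
    Assumption: x0 -> x0 + u0(x0) * t is increasing on [lo, hi]
    (characteristics have not crossed) and the root lies in [lo, hi].
    """
    def residual(x0):
        return x0 + u0(x0) * t - x

    a, b = lo, hi
    if residual(a) > 0 or residual(b) < 0:
        raise ValueError("foot of the characteristic is outside [lo, hi]")

    # Bisection for the foot x0 of the characteristic through (x, t).
    while b - a > tol:
        mid = 0.5 * (a + b)
        if residual(mid) < 0:
            a = mid
        else:
            b = mid
    return u0(0.5 * (a + b))


# Linear initial data u0(x) = x has the exact solution u(x, t) = x / (1 + t).
value = burgers_characteristics(lambda x: x, x=1.0, t=1.0)
```

Once characteristics cross, this construction breaks down, which is the regime the entropy-solution representation in the abstract is designed to handle.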
-
Approximation and Gradient Descent Training with Neural Networks
Authors:
G. Welper
Abstract:
It is well understood that neural networks with carefully hand-picked weights provide powerful function approximation and that they can be successfully trained in over-parametrized regimes. Since over-parametrization ensures zero training error, these two theories are not immediately compatible. Recent work uses the smoothness that is required for approximation results to extend a neural tangent kernel (NTK) optimization argument to an under-parametrized regime and show direct approximation bounds for networks trained by gradient flow. Since gradient flow is only an idealization of a practical method, this paper establishes analogous results for networks trained by gradient descent.
Submitted 19 May, 2024;
originally announced May 2024.
-
Performance bounds for Reduced Order Models with Application to Parametric Transport
Authors:
D. Rim,
G. Welper
Abstract:
The Kolmogorov $n$-width is an established benchmark to judge the performance of reduced basis and similar methods that produce linear reduced spaces. Although immensely successful in the elliptic regime, this width shows unsatisfactorily slow convergence rates for transport dominated problems. While this has triggered a large amount of work on nonlinear model reduction techniques, we are lacking a benchmark to evaluate their optimal performance.
Nonlinear benchmarks like manifold/stable/Lipschitz width applied to the solution manifold are often trivial if the degrees of freedom exceed the parameter dimension, and they ignore desirable structure such as offline/online decompositions. In this paper, we show that the same benchmarks applied to the full reduced order model pipeline from PDE to parametric quantity of interest provide non-trivial benchmarks, and we prove lower bounds for transport equations.
Submitted 22 October, 2023;
originally announced October 2023.
-
Approximation Results for Gradient Descent trained Neural Networks
Authors:
G. Welper
Abstract:
The paper contains approximation guarantees for neural networks that are trained with gradient flow, with error measured in the continuous $L_2(\mathbb{S}^{d-1})$-norm on the $d$-dimensional unit sphere and targets that are Sobolev smooth. The networks are fully connected of constant depth and increasing width. Although all layers are trained, the gradient flow convergence is based on a neural tangent kernel (NTK) argument for the non-convex second-to-last layer. Unlike standard NTK analysis, the continuous error norm implies an under-parametrized regime, made possible by the natural smoothness assumption required for approximation. The typical over-parametrization re-enters the results in the form of a loss in approximation rate relative to established approximation methods for Sobolev smooth functions.
Submitted 9 September, 2023;
originally announced September 2023.
-
Learning Trees of $\ell_0$-Minimization Problems
Authors:
G. Welper
Abstract:
The problem of computing minimally sparse solutions of under-determined linear systems is $NP$-hard in general. Subsets with extra properties may allow efficient algorithms; most notably, problems with the restricted isometry property (RIP) can be solved by convex $\ell_1$-minimization. While these classes have been very successful, they leave out many practical applications.
In this paper, we consider adaptable classes that are tractable after training on a curriculum of increasingly difficult samples. The setup is intended as a candidate model for a human mathematician, who may not be able to tackle an arbitrary proof right away, but may be successful in relatively flexible subclasses, or areas of expertise, after training on a suitable curriculum.
Submitted 5 February, 2023;
originally announced February 2023.
-
Approximation results for Gradient Descent trained Shallow Neural Networks in $1d$
Authors:
R. Gentile,
G. Welper
Abstract:
Two aspects of neural networks that have been extensively studied in the recent literature are their function approximation properties and their training by gradient descent methods. The approximation problem seeks accurate approximations with a minimal number of weights. In most of the current literature these weights are fully or partially hand-crafted, showing the capabilities of neural networks but not necessarily their practical performance. In contrast, optimization theory for neural networks heavily relies on an abundance of weights in over-parametrized regimes.
This paper balances these two demands and provides an approximation result for shallow networks in $1d$ with non-convex weight optimization by gradient descent. We consider finite width networks and infinite sample limits, which is the typical setup in approximation theory. Technically, this problem is not over-parametrized; however, some form of redundancy reappears as a loss in approximation rate compared to the best possible rates.
Submitted 17 September, 2022;
originally announced September 2022.
-
Non-Convex Compressed Sensing with Training Data
Authors:
G. Welper
Abstract:
Efficient algorithms for the sparse solution of under-determined linear systems $Ax = b$ are known for matrices $A$ satisfying suitable assumptions like the restricted isometry property (RIP). Without such assumptions little is known and without any assumptions on $A$ the problem is $NP$-hard. A common approach is to replace $\ell_1$ by $\ell_p$ minimization for $0 < p < 1$, which is no longer convex and typically requires some form of local initial values for provably convergent algorithms.
In this paper, we consider an alternative, where instead of suitable initial values we are provided with extra training problems $Ax = B_l$, $l=1, \dots, p$ that are related to our compressed sensing problem. They allow us to find the solution of the original problem $Ax = b$ with high probability in the range of a one layer linear neural network with comparatively few assumptions on the matrix $A$.
Submitted 20 January, 2021;
originally announced January 2021.
-
Universality of Gradient Descent Neural Network Training
Authors:
G. Welper
Abstract:
It has been observed that design choices of neural networks are often crucial for their successful optimization. In this article, we therefore discuss the question of whether it is always possible to redesign a neural network so that it trains well with gradient descent. This yields the following universality result: If, for a given network, there is any algorithm that can find good network weights for a classification task, then there exists an extension of this network that reproduces these weights and the corresponding forward output by mere gradient descent training. The construction is not intended for practical computations, but it provides some orientation on the possibilities of meta-learning and related approaches.
Submitted 27 July, 2020;
originally announced July 2020.
-
A Relaxation Argument for Optimization in Neural Networks and Non-Convex Compressed Sensing
Authors:
G. Welper
Abstract:
It has been observed in practical applications and in theoretical analysis that over-parametrization helps to find good minima in neural network training. Similarly, in this article we study widening and deepening neural networks by a relaxation argument so that the enlarged networks are rich enough to run $r$ copies of parts of the original network in parallel, without necessarily achieving zero training error as in over-parametrized scenarios. The partial copies can be combined in $r^θ$ possible ways for layer width $θ$. Therefore, the enlarged networks can potentially achieve the best training error of $r^θ$ random initializations, but it is not immediately clear if this can be realized via gradient descent or similar training methods.
The same construction can be applied to other optimization problems by introducing a similar layered structure. We apply this idea to non-convex compressed sensing, where we show that in some scenarios we can realize the $r^θ$ times increased chance to obtain a global optimum by solving a convex optimization problem of dimension $rθ$.
Submitted 2 February, 2020;
originally announced February 2020.
-
Transformed Snapshot Interpolation with High Resolution Transforms
Authors:
G. Welper
Abstract:
In the last few years, several methods have been developed to deal with jump singularities in parametric or stochastic hyperbolic PDEs. They typically use some alignment of the jump-sets in physical space before performing well established reduced order modelling techniques such as reduced basis methods, POD or simply interpolation. In the current literature, the transforms are typically of low resolution in space, mostly low order polynomials, Fourier modes or constant shifts. In this paper, we discuss higher resolution transforms in one of the recent methods, the transformed snapshot interpolation (TSI). We introduce a new discretization of the transforms with an appropriate behaviour near singularities and consider their numerical computation via an optimization procedure.
Submitted 4 January, 2019;
originally announced January 2019.
-
$h$ and $hp$-adaptive Interpolation by Transformed Snapshots for Parametric and Stochastic Hyperbolic PDEs
Authors:
G. Welper
Abstract:
The numerical approximation of solutions of parametric or stochastic hyperbolic PDEs is still a serious challenge. Because of shock singularities, most methods from the elliptic and parabolic regime, such as reduced basis methods, POD or polynomial chaos expansions, perform poorly. Recently, Welper [Interpolation of functions with parameter dependent jumps by transformed snapshots. SIAM Journal on Scientific Computing, 39(4):A1225-A1250, 2017] introduced a new approximation method, based on the alignment of the jump sets of the snapshots. If the structure of the jump sets changes with the parameter, this assumption is too restrictive. However, these changes are typically local in parameter space, so in this paper we explore $h$ and $hp$-adaptive methods to resolve them. Since local refinements do not scale to high dimensions, we introduce an alternative "tensorized" adaptation method.
Submitted 31 October, 2017;
originally announced October 2017.
-
Diffusion Coefficients Estimation for Elliptic Partial Differential Equations
Authors:
Andrea Bonito,
Albert Cohen,
Ronald DeVore,
Guergana Petrova,
Gerrit Welper
Abstract:
This paper considers the Dirichlet problem $$ -\mathrm{div}(a\nabla u_a)=f \quad \hbox{on}\,\, D, \qquad u_a=0\quad \hbox{on}\,\,\partial D, $$ for a Lipschitz domain $D\subset \mathbb R^d$, where $a$ is a scalar diffusion function. For a fixed $f$, we discuss under which conditions $a$ is uniquely determined and when $a$ can be stably recovered from the knowledge of $u_a$.
A first result is that whenever $a\in H^1(D)$, with $0<λ\le a\le Λ$ on $D$, and $f\in L_\infty(D)$ is strictly positive, then $$ \|a-b\|_{L_2(D)}\le C\|u_a-u_b\|_{H_0^1(D)}^{1/6}. $$ More generally, it is shown that the assumption $a\in H^1(D)$ can be weakened to $a\in H^s(D)$, for certain $s<1$, at the expense of lowering the exponent $1/6$ to a value that depends on $s$.
Submitted 16 December, 2016; v1 submitted 16 September, 2016;
originally announced September 2016.
-
Adaptive Anisotropic Petrov-Galerkin Methods for First Order Transport Equations
Authors:
W. Dahmen,
G. Kutyniok,
W.-Q. Lim,
C. Schwab,
G. Welper
Abstract:
This paper builds on recent developments of adaptive methods for linear transport equations based on certain stable variational formulations of Petrov-Galerkin type. The variational formulations allow us to employ meshes with cells of arbitrary aspect ratios. We develop a refinement scheme generating highly anisotropic partitions that is inspired by shearlet systems. We establish approximation rates for $N$-term approximations from corresponding piecewise polynomials for certain compact cartoon classes of functions. In contrast to earlier results in a curvelet or shearlet context, the cartoon classes are concisely defined through certain characteristic parameters, and the dependence of the approximation rates on these parameters is made explicit here. The approximation rate results then serve as a benchmark for subsequent applications to adaptive Galerkin solvers for transport equations. In numerical experiments, the new algorithms track $C^2$-curved shear layers and discontinuities stably and accurately, and realize essentially optimal rates. Finally, we treat parameter dependent transport problems, which arise in kinetic models as well as in radiative transfer. In heterogeneous media these problems feature propagation of singularities along curved characteristics, precluding, in particular, fast marching methods based on ray-tracing. Since the solutions are now functions of spatial variables and parameters, one has to address the curse of dimensionality. We show computationally, for a model parametric transport problem in heterogeneous media in 2 + 1 dimensions, that by sparse tensorization of the presently proposed spatial directionally adaptive scheme with hierarchic collocation in ordinate space, based on a stable variational formulation in high-dimensional phase space, the curse of dimensionality can be removed when approximating averaged bulk quantities.
Submitted 2 January, 2016;
originally announced January 2016.
-
Transformed snapshot interpolation
Authors:
G. Welper
Abstract:
Functions with jumps and kinks, typically arising from parameter dependent or stochastic hyperbolic PDEs, are notoriously difficult to approximate. If the jump location in physical space is parameter dependent or random, standard approximation techniques like reduced basis methods, PODs, polynomial chaos, etc. are known to yield poor convergence rates. In order to improve these rates, we propose a new approximation scheme. Like reduced basis methods, it relies on snapshots for the reconstruction of parameter dependent functions so that it is efficiently applicable in a PDE context. However, we allow a transformation of the physical coordinates before the use of a snapshot in the reconstruction, which allows us to realign the moving discontinuities and yields high convergence rates. The transforms are automatically computed by minimizing a training error. To show the feasibility of this approach, it is tested in 1d and 2d numerical experiments.
Submitted 5 May, 2015;
originally announced May 2015.
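The realignment idea in the abstract above can be illustrated with a toy example. This is only a sketch under strong assumptions: the parametric function is a single step with its jump at $x = \mu$, and the coordinate transform is a shift known in closed form, whereas the paper computes the transforms by minimizing a training error. The names `step`, `plain_interp` and `transformed_interp` are hypothetical.

```python
def step(x, mu):
    """Toy parametric function with a jump located at x = mu."""
    return 1.0 if x < mu else 0.0


def plain_interp(x, mu, mu0, mu1):
    """Linear interpolation in the parameter between two snapshots."""
    w = (mu1 - mu) / (mu1 - mu0)
    return w * step(x, mu0) + (1.0 - w) * step(x, mu1)


def transformed_interp(x, mu, mu0, mu1):
    """Same interpolation, but each snapshot is evaluated at shifted
    coordinates x + (mu_i - mu), which moves its jump from mu_i to mu.
    The shift is assumed known here; it is not learned."""
    w = (mu1 - mu) / (mu1 - mu0)
    return (w * step(x + (mu0 - mu), mu0)
            + (1.0 - w) * step(x + (mu1 - mu), mu1))


# Snapshots at mu0 = 0.3 and mu1 = 0.7, target parameter mu = 0.5.
x, mu, mu0, mu1 = 0.4, 0.5, 0.3, 0.7
exact = step(x, mu)
plain_error = abs(plain_interp(x, mu, mu0, mu1) - exact)
transformed_error = abs(transformed_interp(x, mu, mu0, mu1) - exact)
```

Between the two snapshot jump locations the plain interpolant produces a staircase with an intermediate value, while the aligned interpolant reproduces the step exactly in this toy setting.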
-
Efficient Resolution of Anisotropic Structures
Authors:
Wolfgang Dahmen,
Chunyan Huang,
Gitta Kutyniok,
Wang-Q Lim,
Christoph Schwab,
Gerrit Welper
Abstract:
We highlight some recent developments concerning the sparse representation of possibly high-dimensional functions exhibiting strong anisotropic features and low regularity in isotropic Sobolev or Besov scales. Specifically, we focus on the solution of transport equations which exhibit propagation of singularities where, additionally, high-dimensionality enters when the convection field, and hence the solutions, depend on parameters varying over some compact set. Important constituents of our approach are directionally adaptive discretization concepts motivated by compactly supported shearlet systems, and well-conditioned stable variational formulations that support trial spaces with anisotropic refinements with arbitrary directionalities. We prove that they provide tight error-residual relations which are used to contrive rigorously founded adaptive refinement schemes which converge in $L_2$. Moreover, in the context of parameter dependent problems we discuss two approaches serving different purposes and working under different regularity assumptions. For frequent query problems, making essential use of the novel well-conditioned variational formulations, a new Reduced Basis Method is outlined which exhibits a certain rate-optimal performance for indefinite, unsymmetric or singularly perturbed problems. For the radiative transfer problem with scattering, a sparse tensor method is presented which mitigates or even overcomes the curse of dimensionality under suitable (so far still isotropic) regularity assumptions. Numerical examples for both methods illustrate the theoretical findings.
Submitted 28 September, 2014;
originally announced September 2014.
-
Double Greedy Algorithms: Reduced Basis Methods for Transport Dominated Problems
Authors:
Wolfgang Dahmen,
Christian Plesken,
Gerrit Welper
Abstract:
The central objective of this paper is to develop reduced basis methods for parameter dependent transport dominated problems that are rigorously proven to exhibit rate-optimal performance when compared with the Kolmogorov $n$-widths of the solution sets. The central ingredient is the construction of computationally feasible "tight" surrogates which in turn are based on deriving a suitable well-conditioned variational formulation for the parameter dependent problem. The theoretical results are illustrated by numerical experiments for convection-diffusion and pure transport equations. In particular, the latter example sheds some light on the smoothness of the dependence of the solutions on the parameters.
Submitted 20 February, 2013;
originally announced February 2013.