Showing 1–17 of 17 results for author: Riekert, A

  1. arXiv:2406.14340  [pdf, other]

    math.OC cs.LG math.NA

    Learning rate adaptive stochastic gradient descent optimization methods: numerical simulations for deep learning methods for partial differential equations and convergence analyses

    Authors: Steffen Dereich, Arnulf Jentzen, Adrian Riekert

    Abstract: It is known that the standard stochastic gradient descent (SGD) optimization method, as well as accelerated and adaptive SGD optimization methods such as the Adam optimizer, fail to converge if the learning rates do not converge to zero (as, for example, in the situation of constant learning rates). Numerical simulations often use human-tuned deterministic learning rate schedules or small constant…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 68 pages, 8 figures

  2. arXiv:2402.05155  [pdf, other]

    math.OC cs.LG

    Non-convergence to global minimizers for Adam and stochastic gradient descent optimization and constructions of local minimizers in the training of artificial neural networks

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: Stochastic gradient descent (SGD) optimization methods such as the plain vanilla SGD method and the popular Adam optimizer are nowadays the method of choice in the training of artificial neural networks (ANNs). Despite the remarkable success of SGD methods in the ANN training in numerical simulations, it remains in essentially all practically relevant scenarios an open problem to rigorously explain…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 36 pages

  3. arXiv:2304.09250  [pdf, ps, other]

    math.NT

    A proof of the corrected Sister Beiter cyclotomic coefficient conjecture inspired by Zhao and Zhang

    Authors: Branko Juran, Pieter Moree, Adrian Riekert, David Schmitz, Julian Völlmecke

    Abstract: The largest coefficient (in absolute value) of a cyclotomic polynomial $Φ_n$ is called its height $A(n)$. In case $p$ is a fixed prime it turns out that as $q$ and $r$ range over all primes satisfying $p<q<r$, the height $A(pqr)$ assumes a maximum $M(p)$. In 1968, Sister Marion Beiter conjectured that $M(p)\leq (p+1)/2$. In 2009, this was disproved for every $p\ge 11$ by Yves Gallot and Pieter Mor…

    Submitted 18 April, 2023; originally announced April 2023.

    Comments: 20 pages, 3 tables, outcome of MPIM Internship project in 2015+2022

    Report number: MPIM-Bonn-2023 MSC Class: 11C08; 11T22

  4. arXiv:2304.05790  [pdf, ps, other]

    math.NA cs.LG

    Deep neural network approximation of composite functions without the curse of dimensionality

    Authors: Adrian Riekert

    Abstract: In this article we identify a general class of high-dimensional continuous functions that can be approximated by deep neural networks (DNNs) with the rectified linear unit (ReLU) activation without the curse of dimensionality. In other words, the number of DNN parameters grows at most polynomially in the input dimension and the approximation error. The functions in our class can be expressed as a…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: 19 pages

  5. arXiv:2302.03286  [pdf, other]

    math.NA stat.ML

    Algorithmically Designed Artificial Neural Networks (ADANNs): Higher order deep operator learning for parametric partial differential equations

    Authors: Arnulf Jentzen, Adrian Riekert, Philippe von Wurstemberger

    Abstract: In this article we propose a new deep learning approach to approximate operators related to parametric partial differential equations (PDEs). In particular, we introduce a new strategy to design specific artificial neural network (ANN) architectures in conjunction with specific ANN initialization schemes which are tailor-made for the particular approximation problem under consideration. In the pro…

    Submitted 29 May, 2024; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: 39 pages, 16 figures

  6. arXiv:2212.13111  [pdf, other]

    math.OC

    Convergence to good non-optimal critical points in the training of neural networks: Gradient descent optimization with one random initialization overcomes all bad non-global local minima with high probability

    Authors: Shokhrukh Ibragimov, Arnulf Jentzen, Adrian Riekert

    Abstract: Gradient descent (GD) methods for the training of artificial neural networks (ANNs) are nowadays among the most heavily employed computational schemes in the digital world. Despite the compelling success of such methods, it remains an open problem to provide a rigorous theoretical justification for the success of GD methods in the training of ANNs. The main difficulty is that the optimization risk…

    Submitted 26 December, 2022; originally announced December 2022.

    Comments: 98 pages, 15 figures, 10 Python codes

    MSC Class: 65K10; 65C50; 68T05; 60H35

  7. arXiv:2207.06246  [pdf, ps, other]

    math.OC cs.LG

    Normalized gradient flow optimization in the training of ReLU artificial neural networks

    Authors: Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg Weiss

    Abstract: The training of artificial neural networks (ANNs) is nowadays a highly relevant algorithmic procedure with many applications in science and industry. Roughly speaking, ANNs can be regarded as iterated compositions between affine linear functions and certain fixed nonlinear functions, which are usually multidimensional versions of a one-dimensional so-called activation function. The most popular ch…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: 26 pages, 1 figure

  8. arXiv:2202.11481  [pdf, other]

    math.OC

    On the existence of infinitely many realization functions of non-global local minima in the training of artificial neural networks with ReLU activation

    Authors: Shokhrukh Ibragimov, Arnulf Jentzen, Timo Kröger, Adrian Riekert

    Abstract: Gradient descent (GD) type optimization schemes are the standard instruments to train fully connected feedforward artificial neural networks (ANNs) with rectified linear unit (ReLU) activation and can be considered as temporal discretizations of solutions of gradient flow (GF) differential equations. It has recently been proved that the risk of every bounded GF trajectory converges in the training…

    Submitted 23 February, 2022; originally announced February 2022.

    Comments: 49 pages, 1 figure

    MSC Class: 68T07

  9. arXiv:2112.09684  [pdf, other]

    math.OC cs.LG math.NA math.ST

    On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers and we prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs under the assumption that the unnormalized probability density function of the probability distribution of the input data of the considered supervised learnin…

    Submitted 13 July, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

    Comments: 89 pages, 15 figures

    Journal ref: Journal of Machine Learning, 1 (2022), pp. 141-246

  10. arXiv:2112.07369  [pdf, other]

    cs.LG math.NA math.PR

    Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions

    Authors: Martin Hutzenthaler, Arnulf Jentzen, Katharina Pohl, Adrian Riekert, Luca Scarpa

    Abstract: In many numerical simulations stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs) but till this day it remains an open problem of research to provide a mathematical convergence analysis which rigorously explains the success of SGD type optimization methods in the training of DNNs. In this work we study SGD type optimiz…

    Submitted 22 June, 2023; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: 71 pages, 5 figures, 2 tables, 4 Python source codes. To appear in Electronic Research Archive

  11. arXiv:2108.08106  [pdf, other]

    cs.LG math.DS math.NA

    Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation

    Authors: Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg S. Weiss

    Abstract: The training of artificial neural networks (ANNs) with rectified linear unit (ReLU) activation via gradient descent (GD) type optimization schemes is nowadays a common industrially relevant procedure. Till this day in the scientific literature there is in general no mathematical convergence analysis which explains the numerical success of GD type optimization schemes in the training of ANNs with R…

    Submitted 18 August, 2021; originally announced August 2021.

    Comments: 30 pages. arXiv admin note: text overlap with arXiv:2107.04479, arXiv:2108.04620

    Journal ref: Electronic Research Archive 2023, Volume 31, Issue 5: 2519-2554

  12. arXiv:2108.04620  [pdf, other]

    math.OC cs.LG math.NA

    A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise linear target functions

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: Gradient descent (GD) type optimization methods are the standard instrument to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Despite the great success of GD type optimization methods in numerical simulations for the training of ANNs with ReLU activation, it remains - even in the simplest situation of the plain vanilla GD optimization method with random initi…

    Submitted 10 August, 2021; originally announced August 2021.

    Comments: 44 pages. arXiv admin note: text overlap with arXiv:2107.04479

    Journal ref: Journal of Machine Learning Research 23, 260 (2022), pp. 1-50

  13. arXiv:2107.04479  [pdf, ps, other]

    cs.LG math.DS math.NA

    Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: Gradient descent (GD) type optimization schemes are the standard methods to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Such schemes can be considered as discretizations of gradient flows (GFs) associated to the training of ANNs with ReLU activation and most of the key difficulties in the mathematical convergence analysis of GD type optimization schemes in…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: 37 pages

    Journal ref: Journal of Mathematical Analysis and Applications 517, 2 (2023)

  14. arXiv:2104.00277  [pdf, ps, other]

    math.NA cs.LG math.PR math.ST

    A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully-connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural network…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: 29 pages

    Journal ref: Zeitschrift für angewandte Mathematik und Physik 73 (2022)

  15. arXiv:2102.09924  [pdf, ps, other]

    math.NA cs.LG math.ST

    A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions

    Authors: Patrick Cheridito, Arnulf Jentzen, Adrian Riekert, Florian Rossmannek

    Abstract: Gradient descent optimization algorithms are the standard ingredients that are used to train artificial neural networks (ANNs). Even though a huge number of numerical simulations indicate that gradient descent optimization methods do indeed converge in the training of ANNs, until today there is no rigorous theoretical analysis which proves (or disproves) this conjecture. In particular, even in…

    Submitted 19 February, 2021; originally announced February 2021.

    Comments: 23 pages

    Journal ref: Journal of Complexity (2022)

  16. Convergence Rates for Empirical Measures of Markov Chains in Dual and Wasserstein Distances

    Authors: Adrian Riekert

    Abstract: We consider a Markov chain on $\mathbb{R}^d$ with invariant measure $μ$. We are interested in the rate of convergence of the empirical measures towards the invariant measure with respect to various dual distances, including in particular the $1$-Wasserstein distance. The main result of this article is a new upper bound for the expected distance, which is proved by combining a Fourier expansion wit…

    Submitted 12 October, 2022; v1 submitted 18 January, 2021; originally announced January 2021.

    Comments: 15 pages

    Journal ref: Statistics & Probability Letters 189 (2022)

  17. arXiv:2012.08443  [pdf, ps, other]

    cs.LG math.NA math.ST

    Strong overall error analysis for the training of artificial neural networks via random initializations

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: Although deep learning based approximation algorithms have been applied very successfully to numerous problems, at the moment the reasons for their performance are not entirely understood from a mathematical point of view. Recently, estimates for the convergence of the overall error have been obtained in the situation of deep supervised learning, but with an extremely slow rate of convergence. In…

    Submitted 15 December, 2020; originally announced December 2020.

    Comments: 40 pages

    Journal ref: Communications in Mathematics and Statistics (2023)