Showing 1–50 of 127 results for author: Jentzen, A

  1. arXiv:2407.08100

    cs.LG math.OC math.PR

    Non-convergence of Adam and other adaptive stochastic gradient descent optimization methods for non-vanishing learning rates

    Authors: Steffen Dereich, Robin Graeber, Arnulf Jentzen

    Abstract: Deep learning algorithms - typically consisting of a class of deep neural networks trained by a stochastic gradient descent (SGD) optimization method - are nowadays the key ingredients in many artificial intelligence (AI) systems and have revolutionized our ways of working and living in modern societies. For example, SGD methods are used to train powerful large language models (LLMs) such as versi…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 54 pages

    MSC Class: 60J22 (Primary); 65K10; 60J20; 65C40 (Secondary) ACM Class: G.1.6; F.2.0; G.3

  2. arXiv:2406.14340

    math.OC cs.LG math.NA

    Learning rate adaptive stochastic gradient descent optimization methods: numerical simulations for deep learning methods for partial differential equations and convergence analyses

    Authors: Steffen Dereich, Arnulf Jentzen, Adrian Riekert

    Abstract: It is known that the standard stochastic gradient descent (SGD) optimization method, as well as accelerated and adaptive SGD optimization methods such as the Adam optimizer fail to converge if the learning rates do not converge to zero (as, for example, in the situation of constant learning rates). Numerical simulations often use human-tuned deterministic learning rate schedules or small constant…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 68 pages, 8 figures

  3. arXiv:2406.10876

    cs.LG math.NA math.PR

    Deep neural networks with ReLU, leaky ReLU, and softplus activation provably overcome the curse of dimensionality for space-time solutions of semilinear partial differential equations

    Authors: Julia Ackermann, Arnulf Jentzen, Benno Kuckuck, Joshua Lee Padgett

    Abstract: It is a challenging topic in applied mathematics to solve high-dimensional nonlinear partial differential equations (PDEs). Standard approximation methods for nonlinear PDEs suffer under the curse of dimensionality (COD) in the sense that the number of computational operations of the approximation method grows at least exponentially in the PDE dimension and with such methods it is essentially impo…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 64 pages. arXiv admin note: text overlap with arXiv:2309.13722, arXiv:2310.20360

    MSC Class: 65M15; 65C05; 68T07 (Primary) 60H35 (Secondary)

  4. arXiv:2402.05155

    math.OC cs.LG

    Non-convergence to global minimizers for Adam and stochastic gradient descent optimization and constructions of local minimizers in the training of artificial neural networks

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: Stochastic gradient descent (SGD) optimization methods such as the plain vanilla SGD method and the popular Adam optimizer are nowadays the method of choice in the training of artificial neural networks (ANNs). Despite the remarkable success of SGD methods in the ANN training in numerical simulations, it remains in essentially all practical relevant scenarios an open problem to rigorously explain…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 36 pages

  5. arXiv:2310.20360

    cs.LG cs.AI math.NA math.PR stat.ML

    Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory

    Authors: Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger

    Abstract: This book aims to provide an introduction to the topic of deep learning algorithms. We review essential components of deep learning algorithms in full mathematical detail including different artificial neural network (ANN) architectures (such as fully-connected feedforward ANNs, convolutional ANNs, recurrent ANNs, residual ANNs, and ANNs with batch normalization) and different optimization algorit…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: 601 pages, 36 figures, 45 source codes

    MSC Class: 68T07

  6. arXiv:2309.13722

    math.NA cs.LG math.PR

    Deep neural networks with ReLU, leaky ReLU, and softplus activation provably overcome the curse of dimensionality for Kolmogorov partial differential equations with Lipschitz nonlinearities in the $L^p$-sense

    Authors: Julia Ackermann, Arnulf Jentzen, Thomas Kruse, Benno Kuckuck, Joshua Lee Padgett

    Abstract: Recently, several deep learning (DL) methods for approximating high-dimensional partial differential equations (PDEs) have been proposed. The interest that these methods have generated in the literature is in large part due to simulations which appear to demonstrate that such DL methods have the capacity to overcome the curse of dimensionality (COD) for PDEs in the sense that the number of computa…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: 52 pages

    MSC Class: 65M15; 65C05; 68T07 (Primary) 60H35 (Secondary)

  7. arXiv:2303.03390

    math.OC math.PR

    Nonlinear Monte Carlo methods with polynomial runtime for Bellman equations of discrete time high-dimensional stochastic optimal control problems

    Authors: Christian Beck, Arnulf Jentzen, Konrad Kleinberg, Thomas Kruse

    Abstract: Discrete time stochastic optimal control problems and Markov decision processes (MDPs), respectively, serve as fundamental models for problems that involve sequential decision making under uncertainty and as such constitute the theoretical foundation of reinforcement learning. In this article we study the numerical approximation of MDPs with infinite time horizon, finite control set, and general s…

    Submitted 3 March, 2023; originally announced March 2023.

    MSC Class: 90C40; 90C39; 60J05; 93E20; 65C05

  8. arXiv:2302.14690

    math.OC cs.LG math.NA stat.ML

    On the existence of minimizers in shallow residual ReLU neural network optimization landscapes

    Authors: Steffen Dereich, Arnulf Jentzen, Sebastian Kassing

    Abstract: Many mathematical convergence results for gradient descent (GD) based algorithms employ the assumption that the GD process is (almost surely) bounded and, also in concrete numerical simulations, divergence of the GD process may slow down, or even completely rule out, convergence of the error function. In practical relevant learning problems, it thus seems to be advisable to design the ANN architec…

    Submitted 28 February, 2023; originally announced February 2023.

    MSC Class: Primary 68T07; Secondary 68T05; 41A50

  9. arXiv:2302.03286

    math.NA stat.ML

    Algorithmically Designed Artificial Neural Networks (ADANNs): Higher order deep operator learning for parametric partial differential equations

    Authors: Arnulf Jentzen, Adrian Riekert, Philippe von Wurstemberger

    Abstract: In this article we propose a new deep learning approach to approximate operators related to parametric partial differential equations (PDEs). In particular, we introduce a new strategy to design specific artificial neural network (ANN) architectures in conjunction with specific ANN initialization schemes which are tailor-made for the particular approximation problem under consideration. In the pro…

    Submitted 29 May, 2024; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: 39 pages, 16 Figures

  10. arXiv:2301.08284

    math.NA cs.AI

    The necessity of depth for artificial neural networks to approximate certain classes of smooth and bounded functions without the curse of dimensionality

    Authors: Lukas Gonon, Robin Graeber, Arnulf Jentzen

    Abstract: In this article we study high-dimensional approximation capacities of shallow and deep artificial neural networks (ANNs) with the rectified linear unit (ReLU) activation. In particular, it is a key contribution of this work to reveal that for all $a,b\in\mathbb{R}$ with $b-a\geq 7$ we have that the functions $[a,b]^d\ni x=(x_1,\dots,x_d)\mapsto\prod_{i=1}^d x_i\in\mathbb{R}$ for $d\in\mathbb{N}$ a…

    Submitted 19 January, 2023; originally announced January 2023.

    Comments: 101 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:2112.14523

    MSC Class: 65D40; 68T07

  11. arXiv:2212.13111

    math.OC

    Convergence to good non-optimal critical points in the training of neural networks: Gradient descent optimization with one random initialization overcomes all bad non-global local minima with high probability

    Authors: Shokhrukh Ibragimov, Arnulf Jentzen, Adrian Riekert

    Abstract: Gradient descent (GD) methods for the training of artificial neural networks (ANNs) belong nowadays to the most heavily employed computational schemes in the digital world. Despite the compelling success of such methods, it remains an open problem to provide a rigorous theoretical justification for the success of GD methods in the training of ANNs. The main difficulty is that the optimization risk…

    Submitted 26 December, 2022; originally announced December 2022.

    Comments: 98 pages, 15 figures, 10 Python codes

    MSC Class: 65K10; 65C50; 68T05; 60H35

  12. arXiv:2211.15641

    math.OC

    Blow up phenomena for gradient descent optimization methods in the training of artificial neural networks

    Authors: Davide Gallon, Arnulf Jentzen, Felix Lindner

    Abstract: In this article we investigate blow up phenomena for gradient descent optimization methods in the training of artificial neural networks (ANNs). Our theoretical analysis is focused on shallow ANNs with one neuron on the input layer, one neuron on the output layer, and one hidden layer. For ANNs with ReLU activation and at least two neurons on the hidden layer we establish the existence of a target…

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: 84 pages, one figure

  13. arXiv:2210.13530

    math.NA math.PR stat.CO

    An efficient Monte Carlo scheme for Zakai equations

    Authors: Christian Beck, Sebastian Becker, Patrick Cheridito, Arnulf Jentzen, Ariel Neufeld

    Abstract: In this paper we develop a numerical method for efficiently approximating solutions of certain Zakai equations in high dimensions. The key idea is to transform a given Zakai SPDE into a PDE with random coefficients. We show that under suitable regularity assumptions on the coefficients of the Zakai equation, the corresponding random PDE admits a solution random field which, for almost all realizat…

    Submitted 20 August, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    MSC Class: 65C05; 65M75; 60H15; 62M20

  14. arXiv:2208.02083

    cs.LG math.DS math.OC

    Gradient descent provably escapes saddle points in the training of shallow ReLU networks

    Authors: Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

    Abstract: Dynamical systems theory has recently been applied in optimization to prove that gradient descent algorithms avoid so-called strict saddle points of the loss function. However, in many modern machine learning applications, the required regularity conditions are not satisfied. In particular, this is the case for rectified linear unit (ReLU) networks. In this paper, we prove a variant of the relevan…

    Submitted 3 August, 2022; originally announced August 2022.

    MSC Class: 68T07; 37D10 ACM Class: I.2.6; G.1.6

  15. arXiv:2207.06246

    math.OC cs.LG

    Normalized gradient flow optimization in the training of ReLU artificial neural networks

    Authors: Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg Weiss

    Abstract: The training of artificial neural networks (ANNs) is nowadays a highly relevant algorithmic procedure with many applications in science and industry. Roughly speaking, ANNs can be regarded as iterated compositions between affine linear functions and certain fixed nonlinear functions, which are usually multidimensional versions of a one-dimensional so-called activation function. The most popular ch…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: 26 pages, 1 figure

  16. arXiv:2206.13646

    cs.LG math.OC

    On bounds for norms of reparameterized ReLU artificial neural network parameters: sums of fractional powers of the Lipschitz norm control the network parameter vector

    Authors: Arnulf Jentzen, Timo Kröger

    Abstract: It is an elementary fact in the scientific literature that the Lipschitz norm of the realization function of a feedforward fully-connected rectified linear unit (ReLU) artificial neural network (ANN) can, up to a multiplicative constant, be bounded from above by sums of powers of the norm of the ANN parameter vector. Roughly speaking, in this work we reveal in the case of shallow ANNs that the con…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: 39 pages, 1 figure

  17. arXiv:2205.03672

    math.NA cs.LG math.PR

    Deep learning approximations for non-local nonlinear PDEs with Neumann boundary conditions

    Authors: Victor Boussange, Sebastian Becker, Arnulf Jentzen, Benno Kuckuck, Loïc Pellissier

    Abstract: Nonlinear partial differential equations (PDEs) are used to model dynamical processes in a large number of scientific fields, ranging from finance to biology. In many applications standard local models are not sufficient to accurately account for certain non-local phenomena such as, e.g., interactions at a distance. In order to properly capture these phenomena non-local nonlinear PDE models are fr…

    Submitted 7 May, 2022; originally announced May 2022.

    Comments: 59 pages

    MSC Class: 35R09 (Primary) 65M75; 45K05; 35K20; 65C05; 65M22; 68T07 (Secondary)

  18. arXiv:2202.11481

    math.OC

    On the existence of infinitely many realization functions of non-global local minima in the training of artificial neural networks with ReLU activation

    Authors: Shokhrukh Ibragimov, Arnulf Jentzen, Timo Kröger, Adrian Riekert

    Abstract: Gradient descent (GD) type optimization schemes are the standard instruments to train fully connected feedforward artificial neural networks (ANNs) with rectified linear unit (ReLU) activation and can be considered as temporal discretizations of solutions of gradient flow (GF) differential equations. It has recently been proved that the risk of every bounded GF trajectory converges in the training…

    Submitted 23 February, 2022; originally announced February 2022.

    Comments: 49 pages, 1 figure

    MSC Class: 68T07

  19. arXiv:2202.02717

    math.NA math.AP math.PR

    Learning the random variables in Monte Carlo simulations with stochastic gradient descent: Machine learning for parametric PDEs and financial derivative pricing

    Authors: Sebastian Becker, Arnulf Jentzen, Marvin S. Müller, Philippe von Wurstemberger

    Abstract: In financial engineering, prices of financial products are computed approximately many times each trading day with (slightly) different parameters in each calculation. In many financial models such prices can be approximated by means of Monte Carlo (MC) simulations. To obtain a good approximation the MC sample size usually needs to be considerably large resulting in a long computing time to obtain…

    Submitted 8 June, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

    Comments: 71 pages, 4 Figures, 14 Tables; to appear in Math. Finance

    MSC Class: 35K15; 65C05; 65M75; 68T99; 91G20

  20. arXiv:2112.14523

    math.NA

    Deep neural network approximation theory for high-dimensional functions

    Authors: Pierfrancesco Beneventano, Patrick Cheridito, Robin Graeber, Arnulf Jentzen, Benno Kuckuck

    Abstract: The purpose of this article is to develop machinery to study the capacity of deep neural networks (DNNs) to approximate high-dimensional functions. In particular, we show that DNNs have the expressive power to overcome the curse of dimensionality in the approximation of a large class of functions. More precisely, we prove that these functions can be approximated by DNNs on compact sets such that t…

    Submitted 29 December, 2021; originally announced December 2021.

    Comments: 82 pages, 1 figure

  21. arXiv:2112.09684

    math.OC cs.LG math.NA math.ST

    On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers and we prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs under the assumption that the unnormalized probability density function of the probability distribution of the input data of the considered supervised learnin…

    Submitted 13 July, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

    Comments: 89 pages, 15 figures

    Journal ref: Journal of Machine Learning, 1 (2022), pp. 141-246

  22. arXiv:2112.07369

    cs.LG math.NA math.PR

    Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions

    Authors: Martin Hutzenthaler, Arnulf Jentzen, Katharina Pohl, Adrian Riekert, Luca Scarpa

    Abstract: In many numerical simulations stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs) but till this day it remains an open problem of research to provide a mathematical convergence analysis which rigorously explains the success of SGD type optimization methods in the training of DNNs. In this work we study SGD type optimiz…

    Submitted 22 June, 2023; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: 71 pages, 5 figures, 2 tables, 4 Python source codes. To appear in Electronic Research Archive

  23. arXiv:2110.08297

    math.NA math.PR

    Strong $L^p$-error analysis of nonlinear Monte Carlo approximations for high-dimensional semilinear partial differential equations

    Authors: Martin Hutzenthaler, Arnulf Jentzen, Benno Kuckuck, Joshua Lee Padgett

    Abstract: Full-history recursive multilevel Picard (MLP) approximation schemes have been shown to overcome the curse of dimensionality in the numerical approximation of high-dimensional semilinear partial differential equations (PDEs) with general time horizons and Lipschitz continuous nonlinearities. However, each of the error analyses for MLP approximation schemes in the existing literature studies the…

    Submitted 15 October, 2021; originally announced October 2021.

    Comments: 42 pages.

  24. arXiv:2108.10602

    math.NA math.PR

    Overcoming the curse of dimensionality in the numerical approximation of backward stochastic differential equations

    Authors: Martin Hutzenthaler, Arnulf Jentzen, Thomas Kruse, Tuan Anh Nguyen

    Abstract: Backward stochastic differential equations (BSDEs) belong nowadays to the most frequently studied equations in stochastic analysis and computational stochastics. BSDEs in applications are often nonlinear and high-dimensional. In nearly all cases such nonlinear high-dimensional BSDEs cannot be solved explicitly and it has been and still is a very active topic of research to design and analyze numer…

    Submitted 24 August, 2021; originally announced August 2021.

  25. arXiv:2108.08106

    cs.LG math.DS math.NA

    Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation

    Authors: Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg S. Weiss

    Abstract: The training of artificial neural networks (ANNs) with rectified linear unit (ReLU) activation via gradient descent (GD) type optimization schemes is nowadays a common industrially relevant procedure. Till this day in the scientific literature there is in general no mathematical convergence analysis which explains the numerical success of GD type optimization schemes in the training of ANNs with R…

    Submitted 18 August, 2021; originally announced August 2021.

    Comments: 30 pages. arXiv admin note: text overlap with arXiv:2107.04479, arXiv:2108.04620

    Journal ref: Electronic Research Archive 2023, Volume 31, Issue 5: 2519-2554

  26. arXiv:2108.04620

    math.OC cs.LG math.NA

    A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise linear target functions

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: Gradient descent (GD) type optimization methods are the standard instrument to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Despite the great success of GD type optimization methods in numerical simulations for the training of ANNs with ReLU activation, it remains - even in the simplest situation of the plain vanilla GD optimization method with random initi…

    Submitted 10 August, 2021; originally announced August 2021.

    Comments: 44 pages. arXiv admin note: text overlap with arXiv:2107.04479

    Journal ref: Journal of Machine Learning Research 23, 260 (2022), pp. 1-50

  27. arXiv:2107.04479

    cs.LG math.DS math.NA

    Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: Gradient descent (GD) type optimization schemes are the standard methods to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Such schemes can be considered as discretizations of gradient flows (GFs) associated to the training of ANNs with ReLU activation and most of the key difficulties in the mathematical convergence analysis of GD type optimization schemes in…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: 37 pages

    Journal ref: Journal of Mathematical Analysis and Applications 517, 2 (2023)

  28. arXiv:2104.00277

    math.NA cs.LG math.PR math.ST

    A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully-connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural network…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: 29 pages

    Journal ref: Zeitschrift für angewandte Mathematik und Physik 73 (2022)

  29. Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions

    Authors: Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

    Abstract: In this paper, we analyze the landscape of the true loss of neural networks with one hidden layer and ReLU, leaky ReLU, or quadratic activation. In all three cases, we provide a complete classification of the critical points in the case where the target function is affine and one-dimensional. In particular, we show that there exist no local maxima and clarify the structure of saddle points. Moreov…

    Submitted 6 July, 2022; v1 submitted 19 March, 2021; originally announced March 2021.

    MSC Class: 68T07 ACM Class: I.2.6

    Journal ref: J Nonlinear Sci 32, 64 (2022)

  30. arXiv:2103.04488

    math.NA

    Lower bounds for artificial neural network approximations: A proof that shallow neural networks fail to overcome the curse of dimensionality

    Authors: Philipp Grohs, Shokhrukh Ibragimov, Arnulf Jentzen, Sarah Koppensteiner

    Abstract: Artificial neural networks (ANNs) have become a very powerful tool in the approximation of high-dimensional functions. Especially, deep ANNs, consisting of a large number of hidden layers, have been very successfully used in a series of practical relevant computational problems involving high-dimensional input data ranging from classification tasks in supervised learning to optimal decision proble…

    Submitted 7 March, 2021; originally announced March 2021.

    Comments: 53 pages

  31. arXiv:2103.02350

    math.NA math.PR

    Full history recursive multilevel Picard approximations for ordinary differential equations with expectations

    Authors: Christian Beck, Martin Hutzenthaler, Arnulf Jentzen, Emilia Magnani

    Abstract: We consider ordinary differential equations (ODEs) which involve expectations of a random variable. These ODEs are special cases of McKean-Vlasov stochastic differential equations (SDEs). A plain vanilla Monte Carlo approximation method for such ODEs requires a computational cost of order $\varepsilon^{-3}$ to achieve a root-mean-square error of size $\varepsilon$. In this work we adapt recently i…

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: 24 pages. arXiv admin note: substantial text overlap with arXiv:1903.05985

    MSC Class: 65Lxx; 65Mxx; 65Cxx; 65M75 ACM Class: G.1.0; G.1.7; G.1.m; G.3

  32. arXiv:2102.11840

    cs.LG math.NA math.PR

    Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases

    Authors: Arnulf Jentzen, Timo Kröger

    Abstract: In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches reach their limits. However, it is still unclear why randomly initialized gradient descent optimization algorithms, such as the well-known batch gradient descent, are able to achieve zero training loss in many situations even though the ob…

    Submitted 23 February, 2021; originally announced February 2021.

    Comments: 38 pages

  33. arXiv:2102.09924

    math.NA cs.LG math.ST

    A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions

    Authors: Patrick Cheridito, Arnulf Jentzen, Adrian Riekert, Florian Rossmannek

    Abstract: Gradient descent optimization algorithms are the standard ingredients that are used to train artificial neural networks (ANNs). Even though a huge number of numerical simulations indicate that gradient descent optimization methods do indeed converge in the training of ANNs, until today there is no rigorous theoretical analysis which proves (or disproves) this conjecture. In particular, even in…

    Submitted 19 February, 2021; originally announced February 2021.

    Comments: 23 pages

    Journal ref: Journal of Complexity (2022)

  34. An overview on deep learning-based approximation methods for partial differential equations

    Authors: Christian Beck, Martin Hutzenthaler, Arnulf Jentzen, Benno Kuckuck

    Abstract: It is one of the most challenging problems in applied mathematics to approximatively solve high-dimensional partial differential equations (PDEs). Recently, several deep learning-based approximation algorithms for attacking this problem have been proposed and tested numerically on a number of examples of high-dimensional PDEs. This has given rise to a lively field of research in which deep learnin…

    Submitted 18 November, 2022; v1 submitted 22 December, 2020; originally announced December 2020.

    Comments: 49 pages. Compared to the first version, the manuscript has been significantly expanded. In particular, Python source code implementing several of the presented methods using PyTorch, as well as numerical simulations have been added

    MSC Class: 65M99 (Primary); 35-02; 65-02; 68T07 (Secondary)

    Journal ref: Discrete Contin. Dyn. Syst. Ser. B 28 (2023), no. 6, 3697-3746

  35. arXiv:2012.08443

    cs.LG math.NA math.ST

    Strong overall error analysis for the training of artificial neural networks via random initializations

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: Although deep learning based approximation algorithms have been applied very successfully to numerous problems, at the moment the reasons for their performance are not entirely understood from a mathematical point of view. Recently, estimates for the convergence of the overall error have been obtained in the situation of deep supervised learning, but with an extremely slow rate of convergence. In…

    Submitted 15 December, 2020; originally announced December 2020.

    Comments: 40 pages

    Journal ref: Communications in Mathematics and Statistics (2023)

  36. arXiv:2012.04326

    math.NA

    High-dimensional approximation spaces of artificial neural networks and applications to partial differential equations

    Authors: Pierfrancesco Beneventano, Patrick Cheridito, Arnulf Jentzen, Philippe von Wurstemberger

    Abstract: In this paper we develop a new machinery to study the capacity of artificial neural networks (ANNs) to approximate high-dimensional functions without suffering from the curse of dimensionality. Specifically, we introduce a concept which we refer to as approximation spaces of artificial neural networks and we present several tools to handle those spaces. Roughly speaking, approximation spaces consi…

    Submitted 8 December, 2020; originally announced December 2020.

    Comments: 32 pages

  37. arXiv:2012.01194

    math.NA cs.LG math.PR stat.ML

    Deep learning based numerical approximation algorithms for stochastic partial differential equations and high-dimensional nonlinear filtering problems

    Authors: Christian Beck, Sebastian Becker, Patrick Cheridito, Arnulf Jentzen, Ariel Neufeld

    Abstract: In this article we introduce and study a deep learning based approximation algorithm for solutions of stochastic partial differential equations (SPDEs). In the proposed approximation algorithm we employ a deep neural network for every realization of the driving noise process of the SPDE to approximate the solution process of the SPDE under consideration. We test the performance of the proposed app…

    Submitted 2 December, 2020; originally announced December 2020.

  38. arXiv:2009.13989

    math.PR cs.CC math.NA

    Nonlinear Monte Carlo methods with polynomial runtime for high-dimensional iterated nested expectations

    Authors: Christian Beck, Arnulf Jentzen, Thomas Kruse

    Abstract: The approximative calculation of iterated nested expectations is a recurring challenging problem in applications. Nested expectations appear, for example, in the numerical approximation of solutions of backward stochastic differential equations (BSDEs), in the numerical approximation of solutions of semilinear parabolic partial differential equations (PDEs), in statistical physics, in optimal stop…

    Submitted 29 September, 2020; originally announced September 2020.

    Comments: 47 pages

    MSC Class: 65C05 (Primary) 65M75; 68Q25 (Secondary)

  39. arXiv:2009.02484  [pdf, ps, other]

    math.NA math.PR

    Multilevel Picard approximations for high-dimensional semilinear second-order PDEs with Lipschitz nonlinearities

    Authors: Martin Hutzenthaler, Arnulf Jentzen, Thomas Kruse, Tuan Anh Nguyen

    Abstract: The recently introduced full-history recursive multilevel Picard (MLP) approximation methods have turned out to be quite successful in the numerical approximation of solutions of high-dimensional nonlinear PDEs. In particular, there are mathematical convergence results in the literature which prove that MLP approximation methods do overcome the curse of dimensionality in the numerical approximatio…

    Submitted 9 October, 2020; v1 submitted 5 September, 2020; originally announced September 2020.

  40. Algorithms for Solving High Dimensional PDEs: From Nonlinear Monte Carlo to Machine Learning

    Authors: Weinan E, Jiequn Han, Arnulf Jentzen

    Abstract: In recent years, tremendous progress has been made on numerical algorithms for solving partial differential equations (PDEs) in a very high dimension, using ideas from either nonlinear (multilevel) Monte Carlo or deep learning. They are potentially free of the curse of dimensionality for many different applications and have been proven to be so in the case of some nonlinear Monte Carlo methods for…

    Submitted 11 September, 2020; v1 submitted 30 August, 2020; originally announced August 2020.

    MSC Class: 65C05; 65K10; 65M75; 90C06

    Journal ref: Nonlinearity 35 (2022) 278-310

  41. arXiv:2007.02723  [pdf, ps, other]

    math.NA cs.LG math.OC math.PR stat.ML

    Weak error analysis for stochastic gradient descent optimization algorithms

    Authors: Aritz Bercher, Lukas Gonon, Arnulf Jentzen, Diyora Salimova

    Abstract: Stochastic gradient descent (SGD) type optimization schemes are fundamental ingredients in a large number of machine learning based algorithms. In particular, SGD type optimization schemes are frequently employed in applications involving natural language processing, object and face recognition, fraud detection, computational advertisement, and numerical approximations of partial differential equa…

    Submitted 21 July, 2020; v1 submitted 3 July, 2020; originally announced July 2020.

    Comments: 123 pages

  42. arXiv:2006.07075  [pdf, ps, other]

    cs.LG math.NA stat.ML

    Non-convergence of stochastic gradient descent in the training of deep neural networks

    Authors: Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

    Abstract: Deep neural networks have successfully been trained in various application areas with stochastic gradient descent. However, there exists no rigorous mathematical explanation why this works so well. The training of neural networks with stochastic gradient descent has four different discretization parameters: (i) the network architecture; (ii) the amount of training data; (iii) the number of gradien…

    Submitted 29 January, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

    MSC Class: 68T07 (Primary) 65D15 (Secondary) ACM Class: I.2.6

    Journal ref: J. Complexity 64 (2021)

  43. arXiv:2006.02199  [pdf, ps, other]

    math.PR cs.LG math.AP math.NA

    Space-time deep neural network approximations for high-dimensional partial differential equations

    Authors: Fabian Hornung, Arnulf Jentzen, Diyora Salimova

    Abstract: It is one of the most challenging issues in applied mathematics to approximately solve high-dimensional partial differential equations (PDEs) and most of the numerical approximation methods for PDEs in the scientific literature suffer from the so-called curse of dimensionality in the sense that the number of computational operations employed in the corresponding approximation scheme to obtain an a…

    Submitted 3 June, 2024; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: 56 pages, 2 figures

  44. Numerical simulations for full history recursive multilevel Picard approximations for systems of high-dimensional partial differential equations

    Authors: Sebastian Becker, Ramon Braunwarth, Martin Hutzenthaler, Arnulf Jentzen, Philippe von Wurstemberger

    Abstract: One of the most challenging issues in applied mathematics is to develop and analyze algorithms which are able to approximately compute solutions of high-dimensional nonlinear partial differential equations (PDEs). In particular, it is very hard to develop approximation algorithms which do not suffer under the curse of dimensionality in the sense that the number of computational operations needed b…

    Submitted 25 May, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

    Comments: 21 pages

    MSC Class: 65M75 ACM Class: G.1.8

    Journal ref: Commun. Comput. Phys. 28 (2020), no. 5, 2109-2138

  45. On nonlinear Feynman-Kac formulas for viscosity solutions of semilinear parabolic partial differential equations

    Authors: Christian Beck, Martin Hutzenthaler, Arnulf Jentzen

    Abstract: The classical Feynman-Kac identity builds a bridge between stochastic analysis and partial differential equations (PDEs) by providing stochastic representations for classical solutions of linear Kolmogorov PDEs. This opens the door for the derivation of sampling based Monte Carlo approximation methods, which can be meshfree and thereby stand a chance to approximate solutions of PDEs without suffer…

    Submitted 16 April, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: 54 pages

    Journal ref: Stochastics and Dynamics (2021), 2150048, 68 pages

  46. arXiv:2003.01291  [pdf, other]

    math.ST cs.LG math.NA math.PR stat.ML

    Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation

    Authors: Arnulf Jentzen, Timo Welti

    Abstract: In spite of the accomplishments of deep learning based algorithms in numerous applications and very broad corresponding research interest, at the moment there is still no rigorous understanding of the reasons why such algorithms produce useful results in certain situations. A thorough mathematical analysis of deep learning based algorithms seems to be crucial in order to improve our understanding…

    Submitted 2 March, 2020; originally announced March 2020.

    Comments: 51 pages

    MSC Class: 62M45; 68T05; 62L20; 60H30

  47. arXiv:2003.00596  [pdf, ps, other]

    math.PR math.NA

    Overcoming the curse of dimensionality in the numerical approximation of high-dimensional semilinear elliptic partial differential equations

    Authors: Christian Beck, Lukas Gonon, Arnulf Jentzen

    Abstract: Recently, so-called full-history recursive multilevel Picard (MLP) approximation schemes have been introduced and shown to overcome the curse of dimensionality in the numerical approximation of semilinear parabolic partial differential equations (PDEs) with Lipschitz nonlinearities. The key contribution of this article is to introduce and analyze a new variant of MLP approximation schemes for cert…

    Submitted 1 March, 2020; originally announced March 2020.

    Comments: 50 pages

    MSC Class: 65Cxx; 65Mxx

  48. Counterexamples to local Lipschitz and local Hölder continuity with respect to the initial values for additive noise driven SDEs with smooth drift coefficient functions with at most polynomially growing derivatives

    Authors: Arnulf Jentzen, Benno Kuckuck, Thomas Müller-Gronbach, Larisa Yaroslavtseva

    Abstract: In the recent article [A. Jentzen, B. Kuckuck, T. Müller-Gronbach, and L. Yaroslavtseva, arXiv:1904.05963 (2019)] it has been proved that the solutions to every additive noise driven stochastic differential equation (SDE) which has a drift coefficient function with at most polynomially growing first order partial derivatives and which admits a Lyapunov-type condition (ensuring the existence of…

    Submitted 10 January, 2020; originally announced January 2020.

    Comments: 27 pages

    Journal ref: Discrete Contin. Dyn. Syst. Ser. B 27 (2022), no. 7, 3707-3724

  49. Pricing and hedging American-style options with deep learning

    Authors: Sebastian Becker, Patrick Cheridito, Arnulf Jentzen

    Abstract: In this paper we introduce a deep learning method for pricing and hedging American-style options. It first computes a candidate optimal stopping policy. From there it derives a lower bound for the price. Then it calculates an upper bound, a point estimate and confidence intervals. Finally, it constructs an approximate dynamic hedging strategy. We test the approach on different specifications of a…

    Submitted 18 July, 2020; v1 submitted 23 December, 2019; originally announced December 2019.

    Journal ref: Journal of Risk and Financial Management 13, 7 (2020)

  50. Efficient approximation of high-dimensional functions with neural networks

    Authors: Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

    Abstract: In this paper, we develop a framework for showing that neural networks can overcome the curse of dimensionality in different high-dimensional approximation problems. Our approach is based on the notion of a catalog network, which is a generalization of a standard neural network in which the nonlinear activation functions can vary from layer to layer as long as they are chosen from a predefined cat…

    Submitted 29 January, 2021; v1 submitted 9 December, 2019; originally announced December 2019.

    MSC Class: 68T07 ACM Class: I.2.0

    Journal ref: IEEE Trans. Neural Netw. Learn. Syst. (2021)