Search | arXiv e-print repository

Regularized Gauss-Newton for Optimizing Overparameterized Neural Networks

Authors: Adeyemi D. Adeoye, Philipp Christian Petersen, Alberto Bemporad

Abstract: The generalized Gauss-Newton (GGN) optimization method incorporates curvature estimates into its solution steps, and provides a good approximation to the Newton method for large-scale optimization problems. GGN has been found particularly interesting for practical training of deep neural networks, not only for its impressive convergence speed, but also for its close relation with neural tangent ke… ▽ More The generalized Gauss-Newton (GGN) optimization method incorporates curvature estimates into its solution steps, and provides a good approximation to the Newton method for large-scale optimization problems. GGN has been found particularly interesting for practical training of deep neural networks, not only for its impressive convergence speed, but also for its close relation with neural tangent kernel regression, which is central to recent studies that aim to understand the optimization and generalization properties of neural networks. This work studies a GGN method for optimizing a two-layer neural network with explicit regularization. In particular, we consider a class of generalized self-concordant (GSC) functions that provide smooth approximations to commonly-used penalty terms in the objective function of the optimization problem. This approach provides an adaptive learning rate selection technique that requires little to no tuning for optimal performance. We study the convergence of the two-layer neural network, considered to be overparameterized, in the optimization loop of the resulting GGN method for a given scaling of the network parameters. Our numerical experiments highlight specific aspects of GSC regularization that help to improve generalization of the optimized neural network. The code to reproduce the experimental results is available at https://github.com/adeyemiadeoye/ggn-score-nn. △ Less

Submitted 23 April, 2024; originally announced April 2024.

Comments: 27 pages, 9 figures, 2 tables

arXiv:2404.04549 [pdf, other]

Efficient Learning Using Spiking Neural Networks Equipped With Affine Encoders and Decoders

Authors: A. Martina Neuman, Philipp Christian Petersen

Abstract: We study the learning problem associated with spiking neural networks. Specifically, we consider hypothesis sets of spiking neural networks with affine temporal encoders and decoders and simple spiking neurons having only positive synaptic weights. We demonstrate that the positivity of the weights continues to enable a wide range of expressivity results, including rate-optimal approximation of smo… ▽ More We study the learning problem associated with spiking neural networks. Specifically, we consider hypothesis sets of spiking neural networks with affine temporal encoders and decoders and simple spiking neurons having only positive synaptic weights. We demonstrate that the positivity of the weights continues to enable a wide range of expressivity results, including rate-optimal approximation of smooth functions or approximation without the curse of dimensionality. Moreover, positive-weight spiking neural networks are shown to depend continuously on their parameters which facilitates classical covering number-based generalization statements. Finally, we observe that from a generalization perspective, contrary to feedforward neural networks or previous results for general spiking neural networks, the depth has little to no adverse effect on the generalization capabilities. △ Less

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2301.13867 [pdf, other]

Mathematical Capabilities of ChatGPT

Authors: Simon Frieder, Luca Pinchetti, Alexis Chevalier, Ryan-Rhys Griffiths, Tommaso Salvatori, Thomas Lukasiewicz, Philipp Christian Petersen, Julius Berner

Abstract: We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathematical Library), current datasets of natural-languag… ▽ More We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathematical Library), current datasets of natural-language mathematics, used to benchmark language models, either cover only elementary mathematics or are very small. We address this by publicly releasing two new datasets: GHOSTS and miniGHOSTS. These are the first natural-language datasets curated by working researchers in mathematics that (1) aim to cover graduate-level mathematics, (2) provide a holistic overview of the mathematical capabilities of language models, and (3) distinguish multiple dimensions of mathematical reasoning. These datasets also test whether ChatGPT and GPT-4 can be helpful assistants to professional mathematicians by emulating use cases that arise in the daily professional activities of mathematicians. We benchmark the models on a range of fine-grained performance metrics. For advanced mathematics, this is the most detailed evaluation effort to date. We find that ChatGPT can be used most successfully as a mathematical assistant for querying facts, acting as a mathematical search engine and knowledge base interface. GPT-4 can additionally be used for undergraduate-level mathematics but fails on graduate-level difficulty. Contrary to many positive reports in the media about GPT-4 and ChatGPT's exam-solving abilities (a potential case of selection bias), their overall mathematical performance is well below the level of a graduate student. Hence, if your goal is to use ChatGPT to pass a graduate-level math exam, you would be better off copying from your average peer! △ Less

Submitted 20 July, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

Comments: Added further evaluations on another ChatGPT version and on GPT-4. The GHOSTS and miniGHOSTS datasets are available at https://github.com/xyfrieder/science-GHOSTS

Journal ref: NeurIPS 2023 Datasets and Benchmarks

arXiv:2212.09507 [pdf, ps, other]

VC dimensions of group convolutional neural networks

Authors: Philipp Christian Petersen, Anna Sepliarskaia

Abstract: We study the generalization capacity of group convolutional neural networks. We identify precise estimates for the VC dimensions of simple sets of group convolutional neural networks. In particular, we find that for infinite groups and appropriately chosen convolutional kernels, already two-parameter families of convolutional neural networks have an infinite VC dimension, despite being invariant t… ▽ More We study the generalization capacity of group convolutional neural networks. We identify precise estimates for the VC dimensions of simple sets of group convolutional neural networks. In particular, we find that for infinite groups and appropriately chosen convolutional kernels, already two-parameter families of convolutional neural networks have an infinite VC dimension, despite being invariant to the action of an infinite group. △ Less

Submitted 19 December, 2022; originally announced December 2022.

MSC Class: 68T07; 68Q32; 68T05

arXiv:2210.00805 [pdf, other]

Limitations of neural network training due to numerical instability of backpropagation

Authors: Clemens Karner, Vladimir Kazeev, Philipp Christian Petersen

Abstract: We study the training of deep neural networks by gradient descent where floating-point arithmetic is used to compute the gradients. In this framework and under realistic assumptions, we demonstrate that it is highly unlikely to find ReLU neural networks that maintain, in the course of training with gradient descent, superlinearly many affine pieces with respect to their number of layers. In virtua… ▽ More We study the training of deep neural networks by gradient descent where floating-point arithmetic is used to compute the gradients. In this framework and under realistic assumptions, we demonstrate that it is highly unlikely to find ReLU neural networks that maintain, in the course of training with gradient descent, superlinearly many affine pieces with respect to their number of layers. In virtually all approximation theoretical arguments that yield high-order polynomial rates of approximation, sequences of ReLU neural networks with exponentially many affine pieces compared to their numbers of layers are used. As a consequence, we conclude that approximating sequences of ReLU neural networks resulting from gradient descent in practice differ substantially from theoretically constructed sequences. The assumptions and the theoretical results are compared to a numerical study, which yields concurring results. △ Less

Submitted 15 November, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

MSC Class: 65G50; 68T07; 41A25; 68T09

arXiv:2206.00934 [pdf, other]

Deep neural networks can stably solve high-dimensional, noisy, non-linear inverse problems

Authors: Andrés Felipe Lerma Pineda, Philipp Christian Petersen

Abstract: We study the problem of reconstructing solutions of inverse problems when only noisy measurements are available. We assume that the problem can be modeled with an infinite-dimensional forward operator that is not continuously invertible. Then, we restrict this forward operator to finite-dimensional spaces so that the inverse is Lipschitz continuous. For the inverse operator, we demonstrate that th… ▽ More We study the problem of reconstructing solutions of inverse problems when only noisy measurements are available. We assume that the problem can be modeled with an infinite-dimensional forward operator that is not continuously invertible. Then, we restrict this forward operator to finite-dimensional spaces so that the inverse is Lipschitz continuous. For the inverse operator, we demonstrate that there exists a neural network which is a robust-to-noise approximation of the operator. In addition, we show that these neural networks can be learned from appropriately perturbed training data. We demonstrate the admissibility of this approach to a wide range of inverse problems of practical interest. Numerical examples are given that support the theoretical findings. △ Less

Submitted 20 October, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

MSC Class: 35R30; 41A25; 68T05

arXiv:2010.12217 [pdf, other]

doi 10.1007/s10208-022-09565-9

Exponential ReLU Neural Network Approximation Rates for Point and Edge Singularities

Authors: Carlo Marcati, Joost A. A. Opschoor, Philipp C. Petersen, Christoph Schwab

Abstract: We prove exponential expressivity with stable ReLU Neural Networks (ReLU NNs) in $H^1(Ω)$ for weighted analytic function classes in certain polytopal domains $Ω$, in space dimension $d=2,3$. Functions in these classes are locally analytic on open subdomains $D\subset Ω$, but may exhibit isolated point singularities in the interior of $Ω$ or corner and edge singularities at the boundary… ▽ More We prove exponential expressivity with stable ReLU Neural Networks (ReLU NNs) in $H^1(Ω)$ for weighted analytic function classes in certain polytopal domains $Ω$, in space dimension $d=2,3$. Functions in these classes are locally analytic on open subdomains $D\subset Ω$, but may exhibit isolated point singularities in the interior of $Ω$ or corner and edge singularities at the boundary $\partial Ω$. The exponential expression rate bounds proved here imply uniform exponential expressivity by ReLU NNs of solution families for several elliptic boundary and eigenvalue problems with analytic data. The exponential approximation rates are shown to hold in space dimension $d = 2$ on Lipschitz polygons with straight sides, and in space dimension $d=3$ on Fichera-type polyhedral domains with plane faces. The constructive proofs indicate in particular that NN depth and size increase poly-logarithmically with respect to the target NN approximation accuracy $\varepsilon>0$ in $H^1(Ω)$. The results cover in particular solution sets of linear, second order elliptic PDEs with analytic data and certain nonlinear elliptic eigenvalue problems with analytic nonlinearities and singular, weighted analytic potentials as arise in electron structure models. In the latter case, the functions correspond to electron densities that exhibit isolated point singularities at the positions of the nuclei. Our findings provide in particular mathematical foundation of recently reported, successful uses of deep neural networks in variational electron structure algorithms. △ Less

Submitted 23 October, 2020; originally announced October 2020.

Comments: Found Comput Math (2022)

MSC Class: 35Q40; 41A25; 41A46; 65N30

Journal ref: Found. Comput. Math.23(2023), no.3, 1043-1127

arXiv:1901.05744 [pdf, ps, other]

The Oracle of DLphi

Authors: Dominik Alfke, Weston Baines, Jan Blechschmidt, Mauricio J. del Razo Sarmina, Amnon Drory, Dennis Elbrächter, Nando Farchmin, Matteo Gambara, Silke Glas, Philipp Grohs, Peter Hinz, Danijel Kivaranovic, Christian Kümmerle, Gitta Kutyniok, Sebastian Lunz, Jan Macdonald, Ryan Malthaner, Gregory Naisat, Ariel Neufeld, Philipp Christian Petersen, Rafael Reisenhofer, Jun-Da Sheng, Laura Thesing, Philipp Trunschke, Johannes von Lindheim , et al. (2 additional authors not shown)

Abstract: We present a novel technique based on deep learning and set theory which yields exceptional classification and prediction results. Having access to a sufficiently large amount of labelled training data, our methodology is capable of predicting the labels of the test data almost always even if the training data is entirely unrelated to the test data. In other words, we prove in a specific setting t… ▽ More We present a novel technique based on deep learning and set theory which yields exceptional classification and prediction results. Having access to a sufficiently large amount of labelled training data, our methodology is capable of predicting the labels of the test data almost always even if the training data is entirely unrelated to the test data. In other words, we prove in a specific setting that as long as one has access to enough data points, the quality of the data is irrelevant. △ Less

Submitted 27 January, 2019; v1 submitted 17 January, 2019; originally announced January 2019.

MSC Class: 68T05; 82C32

arXiv:1810.12835 [pdf, other]

Gamma-convergence of a shearlet-based Ginzburg--Landau energy

Authors: Philipp Christian Petersen, Endre Süli

Abstract: We introduce two shearlet-based Ginzburg--Landau energies, based on the continuous and the discrete shearlet transform. The energies result from replacing the elastic energy term of a classical Ginzburg--Landau energy by the weighted $L^2$-norm of a shearlet transform. The asymptotic behaviour of sequences of these energies is analysed within the framework of $Γ$-convergence and the limit energy i… ▽ More We introduce two shearlet-based Ginzburg--Landau energies, based on the continuous and the discrete shearlet transform. The energies result from replacing the elastic energy term of a classical Ginzburg--Landau energy by the weighted $L^2$-norm of a shearlet transform. The asymptotic behaviour of sequences of these energies is analysed within the framework of $Γ$-convergence and the limit energy is identified. We show that the limit energy of a characteristic function is an anisotropic surface integral over the interfaces of that function. We demonstrate that the anisotropy of the limit energy can be controlled by weighting the underlying shearlet transforms according to their directional parameter. △ Less

Submitted 27 November, 2019; v1 submitted 30 October, 2018; originally announced October 2018.

MSC Class: 42C40; 65T60; 49M25

Showing 1–9 of 9 results for author: Petersen, P C