
Stochastic Gradient Flow Dynamics of Test Risk and its Exact Solution for Weak Features

Authors: Rodrigo Veiga, Anastasia Remizova, Nicolas Macris

Abstract: We investigate the test risk of continuous-time stochastic gradient flow dynamics in learning theory. Using a path integral formulation we provide, in the regime of a small learning rate, a general formula for computing the difference between test risk curves of pure gradient and stochastic gradient flows. We apply the general theory to a simple model of weak features, which displays the double descent phenomenon, and explicitly compute the corrections brought about by the added stochastic term in the dynamics, as a function of time and model parameters. The analytical results are compared to simulations of discrete-time stochastic gradient descent and show good agreement.

Submitted 10 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: Accepted to ICML 2024

arXiv:2305.10408 [pdf, other]

Extracting Blockchain Concepts from Text

Authors: Rodrigo Veiga, Markus Endler, Valeria de Paiva

Abstract: Blockchains provide a mechanism through which mutually distrustful remote parties can reach consensus on the state of a ledger of information. With the great acceleration with which this space is developed, the demand for those seeking to learn about blockchain also grows. Being a technical subject, it can be quite intimidating to start learning. For this reason, the main objective of this project was to apply machine learning models to extract information from whitepapers and academic articles focused on the blockchain area to organize this information and aid users to navigate the space.

Submitted 6 May, 2023; originally announced May 2023.

Comments: 19 pages, 3 figures

arXiv:2203.12094 [pdf, other]

doi 10.1088/2632-2153/acb428

Learning curves for the multi-class teacher-student perceptron

Authors: Elisabetta Cornacchia, Francesca Mignacco, Rodrigo Veiga, Cédric Gerbelot, Bruno Loureiro, Lenka Zdeborová

Abstract: One of the most classical results in high-dimensional learning theory provides a closed-form expression for the generalisation error of binary classification with the single-layer teacher-student perceptron on i.i.d. Gaussian inputs. Both Bayes-optimal estimation and empirical risk minimisation (ERM) were extensively analysed for this setting. At the same time, a considerable part of modern machine learning practice concerns multi-class classification. Yet, an analogous analysis for the corresponding multi-class teacher-student perceptron was missing. In this manuscript we fill this gap by deriving and evaluating asymptotic expressions for both the Bayes-optimal and ERM generalisation errors in the high-dimensional regime. For Gaussian teacher weights, we investigate the performance of ERM with both cross-entropy and square losses, and explore the role of ridge regularisation in approaching Bayes-optimality. In particular, we observe that regularised cross-entropy minimisation yields close-to-optimal accuracy. Instead, for a binary teacher we show that a first-order phase transition arises in the Bayes-optimal performance.

Submitted 22 March, 2022; originally announced March 2022.

Comments: 14 pages + appendix

Journal ref: Machine Learning: Science and Technology 4 015019 (2022)

arXiv:2202.00293 [pdf, other]

Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks

Authors: Rodrigo Veiga, Ludovic Stephan, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

Abstract: Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high-dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.

Submitted 14 June, 2023; v1 submitted 1 February, 2022; originally announced February 2022.

Comments: 20 pages

Journal ref: Advances in Neural Information Processing Systems (2022), vol 35, pages 23244–23255

arXiv:1507.00216 [pdf, ps, other]

Implementing generating functions to obtain power indices with coalition configuration

Authors: Jorge Rodríguez Veiga, Guido I. Novoa Flores, Balbina V. Casas Méndez

Abstract: We consider the Banzhaf-Coleman and Owen power indices for weighted majority games modified by a coalition configuration. We present calculation algorithms of them that make use of the method of generating functions. We programmed the procedure in the open language R and it is illustrated by a real life example taken from social sciences.

Submitted 1 July, 2015; originally announced July 2015.

Comments: 14 pages, 1 Table

MSC Class: 91A80

ACM Class: F.4.3

Showing 1–5 of 5 results for author: Veiga, R