-
Stochastic Gradient Flow Dynamics of Test Risk and its Exact Solution for Weak Features
Authors:
Rodrigo Veiga,
Anastasia Remizova,
Nicolas Macris
Abstract:
We investigate the test risk of continuous-time stochastic gradient flow dynamics in learning theory. Using a path integral formulation we provide, in the regime of a small learning rate, a general formula for computing the difference between test risk curves of pure gradient and stochastic gradient flows. We apply the general theory to a simple model of weak features, which displays the double descent phenomenon, and explicitly compute the corrections brought about by the added stochastic term in the dynamics, as a function of time and model parameters. The analytical results are compared to simulations of discrete-time stochastic gradient descent and show good agreement.
Submitted 10 June, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Extracting Blockchain Concepts from Text
Authors:
Rodrigo Veiga,
Markus Endler,
Valeria de Paiva
Abstract:
Blockchains provide a mechanism through which mutually distrustful remote parties can reach consensus on the state of a ledger of information. As this space develops at an accelerating pace, the number of people seeking to learn about blockchain also grows. Because the subject is technical, it can be quite intimidating to start learning. For this reason, the main objective of this project was to apply machine learning models to extract information from whitepapers and academic articles focused on the blockchain area, in order to organize this information and help users navigate the space.
Submitted 6 May, 2023;
originally announced May 2023.
-
Learning curves for the multi-class teacher-student perceptron
Authors:
Elisabetta Cornacchia,
Francesca Mignacco,
Rodrigo Veiga,
Cédric Gerbelot,
Bruno Loureiro,
Lenka Zdeborová
Abstract:
One of the most classical results in high-dimensional learning theory provides a closed-form expression for the generalisation error of binary classification with the single-layer teacher-student perceptron on i.i.d. Gaussian inputs. Both Bayes-optimal estimation and empirical risk minimisation (ERM) were extensively analysed for this setting. At the same time, a considerable part of modern machine learning practice concerns multi-class classification. Yet, an analogous analysis for the corresponding multi-class teacher-student perceptron was missing. In this manuscript we fill this gap by deriving and evaluating asymptotic expressions for both the Bayes-optimal and ERM generalisation errors in the high-dimensional regime. For Gaussian teacher weights, we investigate the performance of ERM with both cross-entropy and square losses, and explore the role of ridge regularisation in approaching Bayes-optimality. In particular, we observe that regularised cross-entropy minimisation yields close-to-optimal accuracy. Instead, for a binary teacher we show that a first-order phase transition arises in the Bayes-optimal performance.
Submitted 22 March, 2022;
originally announced March 2022.
-
Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks
Authors:
Rodrigo Veiga,
Ludovic Stephan,
Bruno Loureiro,
Florent Krzakala,
Lenka Zdeborová
Abstract:
Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high-dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.
Submitted 14 June, 2023; v1 submitted 1 February, 2022;
originally announced February 2022.
-
Implementing generating functions to obtain power indices with coalition configuration
Authors:
Jorge Rodríguez Veiga,
Guido I. Novoa Flores,
Balbina V. Casas Méndez
Abstract:
We consider the Banzhaf-Coleman and Owen power indices for weighted majority games modified by a coalition configuration. We present calculation algorithms of them that make use of the method of generating functions. We programmed the procedure in the open language R and it is illustrated by a real life example taken from social sciences.
Submitted 1 July, 2015;
originally announced July 2015.