Showing 1–7 of 7 results for author: Pesme, S

  1. arXiv:2406.12763  [pdf, other]

    stat.ML cs.LG math.OC

    Implicit Bias of Mirror Flow on Separable Data

    Authors: Scott Pesme, Radu-Alexandru Dragomir, Nicolas Flammarion

    Abstract: We examine the continuous-time counterpart of mirror descent, namely mirror flow, on classification problems which are linearly separable. Such problems are minimised 'at infinity' and have many possible solutions; we study which solution is preferred by the algorithm depending on the mirror potential. For exponential tailed losses and under mild assumptions on the potential, we show that the iter…

    Submitted 19 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: Exact same text as first version but the acknowledgments section is updated

  2. arXiv:2403.05293  [pdf, other]

    cs.LG math.OC stat.ML

    Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks

    Authors: Hristo Papazov, Scott Pesme, Nicolas Flammarion

    Abstract: In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient descent with step size $γ$ and momentum parameter $β$ that allows us to identify an intrinsic quantity $λ = \frac{γ}{(1 - β)^2}$ which uniquely defines the optimisation path and provides a simple acceleration rule. Whe…

    Submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2304.00488  [pdf, other]

    cs.LG math.OC

    Saddle-to-Saddle Dynamics in Diagonal Linear Networks

    Authors: Scott Pesme, Nicolas Flammarion

    Abstract: In this paper we fully describe the trajectory of gradient flow over diagonal linear networks in the limit of vanishing initialisation. We show that the limiting flow successively jumps from a saddle of the training loss to another until reaching the minimum $\ell_1$-norm solution. This saddle-to-saddle dynamics translates to an incremental learning process as each saddle corresponds to the minimi…

    Submitted 25 October, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

  4. arXiv:2302.08982  [pdf, other]

    cs.LG math.OC stat.ML

    (S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability

    Authors: Mathieu Even, Scott Pesme, Suriya Gunasekar, Nicolas Flammarion

    Abstract: In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over diagonal linear networks. We prove the convergence of GD and SGD with macroscopic stepsizes in an overparametrised regression setting and characterise their solutions through an implicit regularisation problem. Our crisp ch…

    Submitted 25 October, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

  5. arXiv:2106.09524  [pdf, other]

    cs.LG

    Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity

    Authors: Scott Pesme, Loucas Pillaud-Vivien, Nicolas Flammarion

    Abstract: Understanding the implicit bias of training algorithms is of crucial importance in order to explain the success of overparametrised neural networks. In this paper, we study the dynamics of stochastic gradient descent over diagonal linear networks through its continuous time version, namely stochastic gradient flow. We explicitly characterise the solution chosen by the stochastic flow and prove tha…

    Submitted 7 December, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

  6. arXiv:2007.00534  [pdf, ps, other]

    cs.LG math.OC stat.ML

    On Convergence-Diagnostic based Step Sizes for Stochastic Gradient Descent

    Authors: Scott Pesme, Aymeric Dieuleveut, Nicolas Flammarion

    Abstract: Constant step-size Stochastic Gradient Descent exhibits two phases: a transient phase during which iterates make fast progress towards the optimum, followed by a stationary phase during which iterates oscillate around the optimal point. In this paper, we show that efficiently detecting this transition and appropriately decreasing the step size can lead to fast convergence rates. We analyse the cla…

    Submitted 1 July, 2020; originally announced July 2020.

  7. arXiv:2007.00399  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Online Robust Regression via SGD on the l1 loss

    Authors: Scott Pesme, Nicolas Flammarion

    Abstract: We consider the robust linear regression problem in the online setting where we have access to the data in a streaming manner, one data point after the other. More specifically, for a true parameter $θ^*$, we consider the corrupted Gaussian linear model $y = \langle x, θ^* \rangle + \varepsilon + b$ where the adversarial noise $b$ can take any value with probability $η$ and equals zero otherwis…

    Submitted 1 July, 2020; originally announced July 2020.