Showing 1–2 of 2 results for author: Notsawo, P J T

  1. arXiv:2310.12771  [pdf, other]

    cs.LG math.OC

    Stochastic Average Gradient : A Simple Empirical Investigation

    Authors: Pascal Junior Tikeng Notsawo

    Abstract: Despite the recent growth of theoretical studies and empirical successes of neural networks, gradient backpropagation is still the most widely used algorithm for training such networks. On the one hand, we have deterministic or full gradient (FG) approaches that have a cost proportional to the amount of training data used but have a linear convergence rate, and on the other hand, stochastic gradie…

    Submitted 27 July, 2023; originally announced October 2023.

    Comments: 37 pages, 52 figures. arXiv admin note: substantial text overlap with arXiv:1309.2388 by other authors

    ACM Class: G.1.6; I.2.6; I.2.8

  2. arXiv:2306.13253  [pdf, other]

    cs.LG

    Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok

    Authors: Pascal Jr. Tikeng Notsawo, Hattie Zhou, Mohammad Pezeshki, Irina Rish, Guillaume Dumas

    Abstract: This paper focuses on predicting the occurrence of grokking in neural networks, a phenomenon in which perfect generalization emerges long after signs of overfitting or memorization are observed. It has been reported that grokking can only be observed with certain hyper-parameters. This makes it critical to identify the parameters that lead to grokking. However, since grokking occurs after a large…

    Submitted 28 September, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

    Comments: 26 pages, 30 figures

    ACM Class: I.2.6