Showing 1–7 of 7 results for author: Radhakrishnan, A

Searching in archive math.
  1. arXiv:2306.04815  [pdf, other]

    cs.LG math.OC stat.ML

    Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning

    Authors: Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin

    Abstract: In this paper, we first present an explanation regarding the common occurrence of spikes in the training loss when neural networks are trained with stochastic gradient descent (SGD). We provide evidence that the spikes in the training loss of SGD are "catapults", an optimization phenomenon originally observed in GD with large learning rates in [Lewkowycz et al. 2020]. We empirically show that thes…

    Submitted 5 June, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: ICML 2024

  2. arXiv:2205.11787  [pdf, other]

    cs.LG math.OC stat.ML

    Quadratic models for understanding catapult dynamics of neural networks

    Authors: Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin

    Abstract: While neural networks can be approximated by linear models as their width increases, certain properties of wide neural networks cannot be captured by linear models. In this work we show that recently proposed Neural Quadratic Models can exhibit the "catapult phase" [Lewkowycz et al. 2020] that arises when training such models with large learning rates. We then empirically show that the behaviour o…

    Submitted 1 May, 2024; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: accepted in ICLR 2024; changed the title

  3. arXiv:2112.14872  [pdf, other]

    math.OC cs.LG

    Local Quadratic Convergence of Stochastic Gradient Descent with Adaptive Step Size

    Authors: Adityanarayanan Radhakrishnan, Mikhail Belkin, Caroline Uhler

    Abstract: Establishing a fast rate of convergence for optimization methods is crucial to their applicability in practice. With the increasing popularity of deep learning over the past decade, stochastic gradient descent and its adaptive variants (e.g. Adagrad, Adam, etc.) have become prominent methods of choice for machine learning practitioners. While a large number of works have demonstrated that these fi…

    Submitted 29 December, 2021; originally announced December 2021.

    Comments: ICML 2021 Workshop on Beyond first-order methods in ML systems

  4. arXiv:2104.13758  [pdf, other]

    math.NA cs.CE

    A Non-Nested Multilevel Method for Meshless Solution of the Poisson Equation in Heat Transfer and Fluid Flow

    Authors: Anand Radhakrishnan, Michael Xu, Shantanu Shahane, Surya Pratap Vanka

    Abstract: We present a non-nested multilevel algorithm for solving the Poisson equation discretized at scattered points using polyharmonic radial basis function (PHS-RBF) interpolations. We append polynomials to the radial basis functions to achieve exponential convergence of discretization errors. The interpolations are performed over local clouds of points and the Poisson equation is collocated at each of…

    Submitted 28 April, 2021; originally announced April 2021.

  5. arXiv:2010.01702  [pdf, other]

    math.NA physics.comp-ph physics.flu-dyn

    A High-Order Accurate Meshless Method for Solution of Incompressible Fluid Flow Problems

    Authors: Shantanu Shahane, Anand Radhakrishnan, Surya Pratap Vanka

    Abstract: Meshless solution to differential equations using radial basis functions (RBF) is an alternative to grid based methods commonly used. Since the meshless method does not need an underlying connectivity in the form of control volumes or elements, issues such as grid skewness that adversely impact accuracy are eliminated. Gaussian, Multiquadrics and inverse Multiquadrics are some of the most popular…

    Submitted 31 July, 2021; v1 submitted 4 October, 2020; originally announced October 2020.

  6. arXiv:1706.06091  [pdf, other]

    math.CO

    Counting Markov Equivalence Classes for DAG models on Trees

    Authors: Adityanarayanan Radhakrishnan, Liam Solus, Caroline Uhler

    Abstract: DAG models are statistical models satisfying a collection of conditional independence relations encoded by the nonedges of a directed acyclic graph (DAG) $\mathcal{G}$. Such models are used to model complex cause-effect systems across a variety of research fields. From observational data alone, a DAG model $\mathcal{G}$ is only recoverable up to Markov equivalence. Combinatorially, two DAGs are Ma…

    Submitted 17 June, 2017; originally announced June 2017.

    Comments: 31 Pages, 25 Figures, 1 Table

  7. arXiv:1611.07493  [pdf, other]

    math.CO

    Counting Markov Equivalence Classes by Number of Immoralities

    Authors: Adityanarayanan Radhakrishnan, Liam Solus, Caroline Uhler

    Abstract: Two directed acyclic graphs (DAGs) are called Markov equivalent if and only if they have the same underlying undirected graph (i.e. skeleton) and the same set of immoralities. Using observational data, a DAG model can only be determined up to Markov equivalence, and so it is desirable to understand the size and number of Markov equivalence classes (MECs) combinatorially. In this paper, we address…

    Submitted 17 June, 2017; v1 submitted 22 November, 2016; originally announced November 2016.

    Comments: 10 pages, 3 Figures, 1 Table