Showing 1–5 of 5 results for author: Cooper, Y

Searching in archive cs.

  1. arXiv:2307.15744  [pdf, ps, other]

    cs.LG math.DG math.OC

    How regularization affects the geometry of loss functions

    Authors: Nathaniel Bottman, Y. Cooper, Antonio Lerario

    Abstract: What neural networks learn depends fundamentally on the geometry of the underlying loss function. We study how different regularizers affect the geometry of this function. One of the most basic geometric properties of a smooth function is whether it is Morse or not. For nonlinear deep neural networks, the unregularized loss function $L$ is typically not Morse. We consider several different regular…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: 16 pages, 0 figures

  2. arXiv:2005.04210  [pdf, other]

    cs.LG cs.NE math.OC stat.ML

    The critical locus of overparameterized neural networks

    Authors: Y. Cooper

    Abstract: Many aspects of the geometry of loss functions in deep learning remain mysterious. In this paper, we work toward a better understanding of the geometry of the loss function $L$ of overparameterized feedforward neural networks. In this setting, we identify several components of the critical locus of $L$ and study their geometric properties. For networks of depth $\ell \geq 4$, we identify a locus o…

    Submitted 17 May, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

  3. arXiv:1809.05527  [pdf, ps, other]

    math.OC cs.LG math.DS stat.ML

    Gradient descent in higher codimension

    Authors: Y. Cooper

    Abstract: We consider the behavior of gradient flow and of discrete and noisy gradient descent. It is commonly noted that the addition of noise to the process of discrete gradient descent can affect the trajectory of gradient descent. In previous work, we observed such effects. There, we considered the case where the minima had codimension 1. In this note, we do some computer experiments and observe the beh…

    Submitted 18 April, 2019; v1 submitted 14 September, 2018; originally announced September 2018.

  4. arXiv:1808.04839  [pdf, other]

    math.OC cs.LG math.DS stat.ML

    Gradient descent in some simple settings

    Authors: Y. Cooper

    Abstract: In this note, we observe the behavior of gradient flow and discrete and noisy gradient descent in some simple settings. It is commonly noted that addition of noise to gradient descent can affect the trajectory of gradient descent. Here, we run some computer experiments for gradient descent on some simple functions, and observe this principle in some concrete examples.

    Submitted 18 April, 2019; v1 submitted 14 August, 2018; originally announced August 2018.

  5. arXiv:1804.10200  [pdf, ps, other]

    cs.LG cs.AI cs.NE stat.ML

    The loss landscape of overparameterized neural networks

    Authors: Y. Cooper

    Abstract: We explore some mathematical features of the loss landscape of overparameterized neural networks. A priori one might imagine that the loss function looks like a typical function from $\mathbb{R}^n$ to $\mathbb{R}$ - in particular, nonconvex, with discrete global minima. In this paper, we prove that in at least one important way, the loss function of an overparameterized neural network does not loo…

    Submitted 26 April, 2018; originally announced April 2018.

    Comments: 9 pages