Showing 1–2 of 2 results for author: Hu, E J

Searching in archive cond-mat.

  1. arXiv:2203.03466

    cs.LG cond-mat.dis-nn cs.NE

    Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

    Authors: Greg Yang, Edward J. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao

    Abstract: Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters. We show that, in the recently discovered Maximal Update Parametrization (muP), many optimal HPs remain stable even as model size changes. This leads to a new HP tuning paradigm we call muTransfer: parametrize the target model in muP, tune the HP indirectly on…

    Submitted 28 March, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: NeurIPS 2021

  2. arXiv:2011.14522

    cs.LG cond-mat.dis-nn cs.NE

    Feature Learning in Infinite-Width Neural Networks

    Authors: Greg Yang, Edward J. Hu

    Abstract: As its width tends to infinity, a deep neural network's behavior under gradient descent can become simplified and predictable (e.g. given by the Neural Tangent Kernel (NTK)), if it is parametrized appropriately (e.g. the NTK parametrization). However, we show that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can learn features, which is crucial…

    Submitted 15 July, 2022; v1 submitted 29 November, 2020; originally announced November 2020.

    Comments: 4th paper in the Tensor Programs series. Appearing in ICML 2021