Showing 1–3 of 3 results for author: Kamalakara, S R

Searching in archive cs.
  1. arXiv:2209.13569  [pdf, other]

    cs.LG stat.ML

    Exploring Low Rank Training of Deep Neural Networks

    Authors: Siddhartha Rao Kamalakara, Acyr Locatelli, Bharat Venkitesh, Jimmy Ba, Yarin Gal, Aidan N. Gomez

    Abstract: Training deep neural networks in low rank, i.e. with factorised layers, is of particular interest to the community: it offers efficiency over unfactorised training in terms of both memory consumption and training time. Prior work has focused on low rank approximations of pre-trained networks and training in low rank space with additional objectives, offering various ad hoc explanations for chosen…

    Submitted 27 September, 2022; originally announced September 2022.

  2. arXiv:2204.06514  [pdf, other]

    cs.LG cs.CL

    Scalable Training of Language Models using JAX pjit and TPUv4

    Authors: Joanna Yoo, Kuba Perlin, Siddhartha Rao Kamalakara, João G. M. Araújo

    Abstract: Modern large language models require distributed training strategies due to their size. The challenges of efficiently and robustly training them are met with rapid developments on both software and hardware frontiers. In this technical report, we explore challenges and design decisions associated with developing a scalable training framework, and present a quantitative analysis of efficiency impro…

    Submitted 13 April, 2022; originally announced April 2022.

    Comments: 5 pages, 4 figures

  3. arXiv:1905.13678  [pdf, other]

    cs.LG stat.ML

    Learning Sparse Networks Using Targeted Dropout

    Authors: Aidan N. Gomez, Ivan Zhang, Siddhartha Rao Kamalakara, Divyam Madaan, Kevin Swersky, Yarin Gal, Geoffrey E. Hinton

    Abstract: Neural networks are easier to optimise when they have many more weights than are required for modelling the mapping from inputs to outputs. This suggests a two-stage learning procedure that first learns a large net and then prunes away connections or hidden units. But standard training does not necessarily encourage nets to be amenable to pruning. We introduce targeted dropout, a method for traini…

    Submitted 9 September, 2019; v1 submitted 31 May, 2019; originally announced May 2019.