
Showing 1–5 of 5 results for author: Narang, S

  1. arXiv:1910.14613  [pdf, other]

    cs.LG cs.CL stat.ML

    Neural Assistant: Joint Action Prediction, Response Generation, and Latent Knowledge Reasoning

    Authors: Arvind Neelakantan, Semih Yavuz, Sharan Narang, Vishaal Prasad, Ben Goodrich, Daniel Duckworth, Chinnadhurai Sankar, Xifeng Yan

    Abstract: Task-oriented dialog presents a difficult challenge encompassing multiple problems including multi-turn language understanding and generation, knowledge retrieval and reasoning, and action prediction. Modern dialog systems typically begin by converting conversation history to a symbolic object referred to as belief state by using supervised learning. The belief state is then used to reason on an e…

    Submitted 31 October, 2019; originally announced October 2019.

  2. arXiv:1910.10683  [pdf, other]

    cs.LG cs.CL stat.ML

    Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

    Authors: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

    Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing…

    Submitted 19 September, 2023; v1 submitted 23 October, 2019; originally announced October 2019.

  3. arXiv:1712.00409  [pdf, other]

    cs.LG stat.ML

    Deep Learning Scaling is Predictable, Empirically

    Authors: Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou

    Abstract: Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve accuracy and result in better products. As DL application domains grow, we would like a deeper understanding of the relationships between training set size, comput…

    Submitted 1 December, 2017; originally announced December 2017.

    Comments: 19 pages, 11 figures

  4. arXiv:1711.02782  [pdf, other]

    cs.LG cs.AI stat.ML

    Block-Sparse Recurrent Neural Networks

    Authors: Sharan Narang, Eric Undersander, Gregory Diamos

    Abstract: Recurrent Neural Networks (RNNs) are used in state-of-the-art models in domains such as speech recognition, machine translation, and language modelling. Sparsity is a technique to reduce compute and memory requirements of deep learning models. Sparse RNNs are easier to deploy on devices and high-end server processors. Even though sparse operations need less compute and memory relative to their den…

    Submitted 7 November, 2017; originally announced November 2017.

  5. arXiv:1710.03740  [pdf, other]

    cs.AI cs.LG stat.ML

    Mixed Precision Training

    Authors: Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu

    Abstract: Deep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models also increases. We introduce a technique to train deep neural networks using half precision floating point numbers. In our technique, weights, activations and g…

    Submitted 15 February, 2018; v1 submitted 10 October, 2017; originally announced October 2017.

    Comments: Published as a conference paper at ICLR 2018