Skip to main content

Showing 1–3 of 3 results for author: Dewan, C

.
  1. arXiv:2212.12017  [pdf, other

    cs.CL

    OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization

    Authors: Srinivasan Iyer, Xi Victoria Lin, Ramakanth Pasunuru, Todor Mihaylov, Daniel Simig, ** Yu, Kurt Shuster, Tianlu Wang, Qing Liu, Punit Singh Koura, Xian Li, Brian O'Horo, Gabriel Pereyra, Jeff Wang, Christopher Dewan, Asli Celikyilmaz, Luke Zettlemoyer, Ves Stoyanov

    Abstract: Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero and few-shot generalization to unseen tasks. However, there is a limited understanding of the performance trade-offs of different decisions made during the instruction-tuning process. These decisions include the scale and diver… ▽ More

    Submitted 30 January, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

    Comments: 56 pages. v2->v3: fix OPT-30B evaluation results across benchmarks (previously we reported lower performance of this model due to an evaluation pipeline bug)

  2. arXiv:2205.01068  [pdf, other

    cs.CL cs.LG

    OPT: Open Pre-trained Transformer Language Models

    Authors: Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer

    Abstract: Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open… ▽ More

    Submitted 21 June, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

  3. arXiv:1812.08729  [pdf, other

    cs.CL

    PyText: A Seamless Path from NLP research to production

    Authors: Ahmed Aly, Kushal Lakhotia, Shicong Zhao, Mrinal Mohit, Barlas Oguz, Abhinav Arora, Sonal Gupta, Christopher Dewan, Stef Nelson-Lindall, Rushin Shah

    Abstract: We introduce PyText - a deep learning based NLP modeling framework built on PyTorch. PyText addresses the often-conflicting requirements of enabling rapid experimentation and of serving models at scale. It achieves this by providing simple and extensible interfaces for model components, and by using PyTorch's capabilities of exporting models for inference via the optimized Caffe2 execution engine.… ▽ More

    Submitted 12 December, 2018; originally announced December 2018.