Showing 1–3 of 3 results for author: Triefenbach, F

  1. arXiv:2208.01448  [pdf, other]

    cs.CL cs.LG

    AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

    Authors: Saleh Soltan, Shankar Ananthakrishnan, Jack FitzGerald, Rahul Gupta, Wael Hamza, Haidar Khan, Charith Peris, Stephen Rawls, Andy Rosenbaum, Anna Rumshisky, Chandana Satya Prakash, Mukund Sridhar, Fabian Triefenbach, Apurv Verma, Gokhan Tur, Prem Natarajan

    Abstract: In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks. In particular, we train a 20 billion parameter multilingual seq2seq model called Alexa Teacher Model (AlexaTM 20B) and show that it achieves s…

    Submitted 3 August, 2022; v1 submitted 2 August, 2022; originally announced August 2022.

  2. arXiv:2206.07808  [pdf, other]

    cs.CL cs.AI cs.LG

    Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems

    Authors: Jack FitzGerald, Shankar Ananthakrishnan, Konstantine Arkoudas, Davide Bernardi, Abhishek Bhagia, Claudio Delli Bovi, ** Cao, Rakesh Chada, Amit Chauhan, Luoxin Chen, Anurag Dwarakanath, Satyam Dwivedi, Turan Gojayev, Karthik Gopalakrishnan, Thomas Gueudre, Dilek Hakkani-Tur, Wael Hamza, Jonathan Hueser, Kevin Martin Jose, Haidar Khan, Beiye Liu, Jianhua Lu, Alessandro Manzotti, Pradeep Natarajan, Karolina Owczarzak, et al. (16 additional authors not shown)

    Abstract: We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M-170M parameters, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system. Though we train using 70% spoken-form data, our teacher models perform co…

    Submitted 15 June, 2022; originally announced June 2022.

    Comments: KDD 2022

    ACM Class: I.2.7

    Journal ref: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '22), August 14-18, 2022, Washington, DC, USA

  3. arXiv:2008.02603  [pdf, other]

    eess.AS cs.CL cs.SD

    Data balancing for boosting performance of low-frequency classes in Spoken Language Understanding

    Authors: Judith Gaspers, Quynh Do, Fabian Triefenbach

    Abstract: Despite the fact that data imbalance is becoming more and more common in real-world Spoken Language Understanding (SLU) applications, it has not been studied extensively in the literature. To the best of our knowledge, this paper presents the first systematic study on handling data imbalance for SLU. In particular, we discuss the application of existing data balancing techniques for SLU and propos…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: accepted at InterSpeech 2020