Showing 1–35 of 35 results for author: Hannun, A

Searching in archive cs.
  1. arXiv:2402.01093  [pdf, other]

    cs.LG cs.CL

    Specialized Language Models with Cheap Inference from Limited Domain Data

    Authors: David Grangier, Angelos Katharopoulos, Pierre Ablin, Awni Hannun

    Abstract: Large language models have emerged as a versatile tool but are challenging to apply to tasks lacking large inference budgets and large in-domain training sets. This work formalizes these constraints and distinguishes four important variables: the pretraining budget (for training before the target domain is known), the specialization budget (for training after the target domain is known), the infer…

    Submitted 1 February, 2024; originally announced February 2024.

  2. arXiv:2311.11973  [pdf, ps, other]

    cs.LG cs.CL

    Adaptive Training Distributions with Scalable Online Bilevel Optimization

    Authors: David Grangier, Pierre Ablin, Awni Hannun

    Abstract: Large neural networks pretrained on web-scale corpora are central to modern machine learning. In this paradigm, the distribution of the large, heterogeneous pretraining data rarely matches that of the application domain. This work considers modifying the pretraining distribution in the case where one has a small sample of data reflecting the targeted test conditions. We propose an algorithm motiva…

    Submitted 20 November, 2023; originally announced November 2023.

  3. arXiv:2311.06382  [pdf, other]

    cs.CL cs.LG

    Transfer Learning for Structured Pruning under Limited Task Data

    Authors: Lucio Dery, David Grangier, Awni Hannun

    Abstract: Large, pre-trained models are problematic to use in resource constrained applications. Fortunately, task-aware structured pruning methods offer a solution. These approaches reduce model size by dropping structural units like layers and attention heads in a manner that takes into account the end-task. However, these pruning algorithms require more task-specific data than is typically available. We…

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: 8 pages, 7 figures and 3 tables

  4. arXiv:2201.12465  [pdf, other]

    cs.LG cs.AI cs.DC

    Flashlight: Enabling Innovation in Tools for Machine Learning

    Authors: Jacob Kahn, Vineel Pratap, Tatiana Likhomanenko, Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard Grave, Gilad Avidov, Benoit Steiner, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

    Abstract: As the computational requirements for machine learning systems and the size and complexity of machine learning frameworks increase, essential framework innovation has become challenging. While computational needs have driven recent compiler, networking, and hardware advancements, utilization of those advancements by machine learning tools is occurring at a slower pace. This is in part due to the…

    Submitted 22 June, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: Presented at ICML 2022

  5. arXiv:2201.12208  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Star Temporal Classification: Sequence Classification with Partially Labeled Data

    Authors: Vineel Pratap, Awni Hannun, Gabriel Synnaeve, Ronan Collobert

    Abstract: We develop an algorithm which can learn from partially labeled and unsegmented sequential data. Most sequential loss functions, such as Connectionist Temporal Classification (CTC), break down when many labels are missing. We address this problem with Star Temporal Classification (STC) which uses a special star token to allow alignments which include all possible tokens whenever a token could be mi…

    Submitted 3 March, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

  6. arXiv:2110.02848  [pdf, other]

    cs.CL

    Parallel Composition of Weighted Finite-State Transducers

    Authors: Shubho Sengupta, Vineel Pratap, Awni Hannun

    Abstract: Finite-state transducers (FSTs) are frequently used in speech recognition. Transducer composition is an essential operation for combining different sources of information at different granularities. However, composition is also one of the more computationally expensive operations. Due to the heterogeneous structure of FSTs, parallel algorithms for composition are suboptimal in efficiency, generali…

    Submitted 6 October, 2021; originally announced October 2021.

  7. arXiv:2109.00984  [pdf, other]

    cs.LG cs.CR

    CrypTen: Secure Multi-Party Computation Meets Machine Learning

    Authors: Brian Knott, Shobha Venkataraman, Awni Hannun, Shubho Sengupta, Mark Ibrahim, Laurens van der Maaten

    Abstract: Secure multi-party computation (MPC) allows parties to perform computations on data while keeping that data private. This capability has great potential for machine-learning applications: it facilitates training of machine-learning models on private data sets owned by different parties, evaluation of one party's private model using another party's private data, etc. Although a range of studies imp…

    Submitted 15 September, 2022; v1 submitted 2 September, 2021; originally announced September 2021.

  8. arXiv:2108.00084  [pdf, other]

    cs.CL cs.SD eess.AS

    The History of Speech Recognition to the Year 2030

    Authors: Awni Hannun

    Abstract: The decade from 2010 to 2020 saw remarkable improvements in automatic speech recognition. Many people now use speech recognition on a daily basis, for example to perform voice search queries, send text messages, and interact with voice assistants like Amazon Alexa and Siri by Apple. Before 2010 most people rarely used speech recognition. Given the remarkable changes in the state of speech recognit…

    Submitted 30 July, 2021; originally announced August 2021.

  9. arXiv:2106.11151  [pdf, other]

    cs.NE

    The Role of Evolution in Machine Intelligence

    Authors: Awni Hannun

    Abstract: Machine intelligence can develop either directly from experience or by inheriting experience through evolution. The bulk of current research efforts focus on algorithms which learn directly from experience. I argue that the alternative, evolution, is important to the development of machine intelligence and underinvested in terms of research allocation. The primary aim of this work is to assess whe…

    Submitted 21 June, 2021; originally announced June 2021.

  10. arXiv:2104.09937  [pdf, other]

    cs.LG stat.ML

    Gradient Matching for Domain Generalization

    Authors: Yuge Shi, Jeffrey Seely, Philip H. S. Torr, N. Siddharth, Awni Hannun, Nicolas Usunier, Gabriel Synnaeve

    Abstract: Machine learning systems typically assume that the distributions of training and test sets match closely. However, a critical requirement of such systems in the real world is their ability to generalize to unseen domains. Here, we propose an inter-domain gradient matching objective that targets domain generalization by maximizing the inner product between gradients from different domains. Since di…

    Submitted 13 July, 2021; v1 submitted 20 April, 2021; originally announced April 2021.

  11. arXiv:2103.11766  [pdf, other]

    cs.LG

    Fixes That Fail: Self-Defeating Improvements in Machine-Learning Systems

    Authors: Ruihan Wu, Chuan Guo, Awni Hannun, Laurens van der Maaten

    Abstract: Machine-learning systems such as self-driving cars or virtual assistants are composed of a large number of machine-learning models that recognize image content, transcribe speech, analyze natural language, infer preferences, rank options, etc. Models in these systems are often developed and trained independently, which raises an obvious concern: Can improving a machine-learning model make the over…

    Submitted 31 May, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

  12. arXiv:2102.11673  [pdf, other]

    cs.LG cs.CR

    Measuring Data Leakage in Machine-Learning Models with Fisher Information

    Authors: Awni Hannun, Chuan Guo, Laurens van der Maaten

    Abstract: Machine-learning models contain information about the data they were trained on. This information leaks either through the model itself or through predictions made by the model. Consequently, when the training data contains sensitive attributes, assessing the amount of information leakage is paramount. We propose a method to quantify this leakage using the Fisher information of the model about the…

    Submitted 23 August, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

  13. arXiv:2012.06430  [pdf, other]

    cs.LG

    Data Appraisal Without Data Sharing

    Authors: Mimee Xu, Laurens van der Maaten, Awni Hannun

    Abstract: One of the most effective approaches to improving the performance of a machine learning model is to procure additional training data. A model owner seeking relevant training data from a data owner needs to appraise the data before acquiring it. However, without a formal agreement, the data owner does not want to share data. The resulting Catch-22 prevents efficient data markets from forming. This…

    Submitted 13 March, 2022; v1 submitted 11 December, 2020; originally announced December 2020.

  14. arXiv:2010.01003  [pdf, other]

    cs.LG stat.ML

    Differentiable Weighted Finite-State Transducers

    Authors: Awni Hannun, Vineel Pratap, Jacob Kahn, Wei-Ning Hsu

    Abstract: We introduce a framework for automatic differentiation with weighted finite-state transducers (WFSTs) allowing them to be used dynamically at training time. Through the separation of graphs from operations on graphs, this framework enables the exploration of new structured loss functions which in turn eases the encoding of prior knowledge into learning algorithms. We show how the framework can com…

    Submitted 2 October, 2020; originally announced October 2020.

  15. arXiv:2007.05089  [pdf, other]

    cs.LG stat.ML

    The Trade-Offs of Private Prediction

    Authors: Laurens van der Maaten, Awni Hannun

    Abstract: Machine learning models leak information about their training data every time they reveal a prediction. This is problematic when the training data needs to remain private. Private prediction methods limit how much information about the training data is leaked by each prediction. Private prediction can also be achieved using models that are trained by private training methods. In private prediction…

    Submitted 9 July, 2020; originally announced July 2020.

  16. arXiv:2007.03001  [pdf, other]

    eess.AS cs.CL cs.SD

    Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters

    Authors: Vineel Pratap, Anuroop Sriram, Paden Tomasello, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

    Abstract: We study training a single acoustic model for multiple languages with the aim of improving automatic speech recognition (ASR) performance on low-resource languages, and overall simplifying deployment of ASR systems that support diverse languages. We perform an extensive benchmark on 51 languages, with varying amount of training data by language (from 100 hours to 1100 hours). We compare three vari…

    Submitted 7 July, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

  17. arXiv:2005.09267  [pdf, other]

    cs.CL cs.SD eess.AS

    Iterative Pseudo-Labeling for Speech Recognition

    Authors: Qiantong Xu, Tatiana Likhomanenko, Jacob Kahn, Awni Hannun, Gabriel Synnaeve, Ronan Collobert

    Abstract: Pseudo-labeling has recently shown promise in end-to-end automatic speech recognition (ASR). We study Iterative Pseudo-Labeling (IPL), a semi-supervised algorithm which efficiently performs multiple iterations of pseudo-labeling on unlabeled data as the acoustic model evolves. In particular, IPL fine-tunes an existing model at each iteration using both labeled data and a subset of unlabeled data.…

    Submitted 26 August, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: INTERSPEECH 2020

  18. arXiv:2002.10336  [pdf, other]

    cs.CL cs.LG eess.AS

    Semi-Supervised Speech Recognition via Local Prior Matching

    Authors: Wei-Ning Hsu, Ann Lee, Gabriel Synnaeve, Awni Hannun

    Abstract: For sequence transduction tasks like speech recognition, a strong structured prior model encodes rich information about the target space, implicitly ruling out invalid sequences by assigning them low probability. In this work, we propose local prior matching (LPM), a semi-supervised objective that distills knowledge from a strong prior (e.g. a language model) to provide learning signal to a discri…

    Submitted 24 February, 2020; originally announced February 2020.

  19. arXiv:2001.09727  [pdf, other]

    cs.CL cs.SD eess.AS

    Scaling Up Online Speech Recognition Using ConvNets

    Authors: Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

    Abstract: We design an online end-to-end speech recognition system based on Time-Depth Separable (TDS) convolutions and Connectionist Temporal Classification (CTC). We improve the core TDS architecture in order to limit the future context and hence reduce latency while maintaining accuracy. The system has almost three times the throughput of a well tuned hybrid ASR baseline while also having lower latency a…

    Submitted 27 January, 2020; originally announced January 2020.

  20. arXiv:2001.03192  [pdf, other]

    cs.CR cs.IT cs.LG math.NA stat.CO

    Secure multiparty computations in floating-point arithmetic

    Authors: Chuan Guo, Awni Hannun, Brian Knott, Laurens van der Maaten, Mark Tygert, Ruiyu Zhu

    Abstract: Secure multiparty computations enable the distribution of so-called shares of sensitive data to multiple parties such that the multiple parties can effectively process the data while being unable to glean much information about the data (at least not without collusion among all parties to put back together all the shares). Thus, the parties may conspire to send all their processed results to a tru…

    Submitted 9 January, 2020; originally announced January 2020.

    Comments: 31 pages, 13 figures, 6 tables

    Journal ref: Information and Inference: a Journal of the IMA, iaaa038: 1-33, 2021

  21. arXiv:1911.03030  [pdf, other]

    cs.LG stat.ML

    Certified Data Removal from Machine Learning Models

    Authors: Chuan Guo, Tom Goldstein, Awni Hannun, Laurens van der Maaten

    Abstract: Good data stewardship requires removal of data at the request of the data's owner. This raises the question if and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to "remove" data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretica…

    Submitted 7 November, 2023; v1 submitted 7 November, 2019; originally announced November 2019.

    Comments: Accepted to ICML 2020

  22. arXiv:1910.07323  [pdf, ps, other]

    cs.CL cs.AI cs.LG eess.AS

    Lead2Gold: Towards exploiting the full potential of noisy transcriptions for speech recognition

    Authors: Adrien Dufraux, Emmanuel Vincent, Awni Hannun, Armelle Brun, Matthijs Douze

    Abstract: The transcriptions used to train an Automatic Speech Recognition (ASR) system may contain errors. Usually, either a quality control stage discards transcriptions with too many errors, or the noisy transcriptions are used as is. We introduce Lead2Gold, a method to train an ASR system that exploits the full potential of noisy transcriptions. Based on a noise model of transcription errors, Lead2Gold…

    Submitted 16 October, 2019; originally announced October 2019.

    Comments: 8 pages, 4 tables, Accepted for publication in ASRU 2019

    ACM Class: I.2.6; I.2.7

  23. arXiv:1910.05299  [pdf, other]

    cs.LG cs.CR stat.ML

    Privacy-Preserving Multi-Party Contextual Bandits

    Authors: Awni Hannun, Brian Knott, Shubho Sengupta, Laurens van der Maaten

    Abstract: Contextual bandits are online learners that, given an input, select an arm and receive a reward for that arm. They use the reward as a learning signal and aim to maximize the total reward over the inputs. Contextual bandits are commonly used to solve recommendation or ranking problems. This paper considers a learning setting in which multiple parties aim to train a contextual bandit together in a…

    Submitted 13 February, 2020; v1 submitted 11 October, 2019; originally announced October 2019.

  24. Self-Training for End-to-End Speech Recognition

    Authors: Jacob Kahn, Ann Lee, Awni Hannun

    Abstract: We revisit self-training in the context of end-to-end speech recognition. We demonstrate that training with pseudo-labels can substantially improve the accuracy of a baseline model. Key to our approach are a strong baseline acoustic and language model used to generate the pseudo-labels, filtering mechanisms tailored to common errors from sequence-to-sequence models, and a novel ensemble approach t…

    Submitted 23 February, 2020; v1 submitted 19 September, 2019; originally announced September 2019.

    Comments: To be published in the 45th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020

  25. arXiv:1906.04323  [pdf, other]

    cs.CL cs.SD eess.AS

    Word-level Speech Recognition with a Letter to Word Encoder

    Authors: Ronan Collobert, Awni Hannun, Gabriel Synnaeve

    Abstract: We propose a direct-to-word sequence model which uses a word network to learn word embeddings from letters. The word network can be integrated seamlessly with arbitrary sequence models including Connectionist Temporal Classification and encoder-decoder models with attention. We show our direct-to-word model can achieve word error rate gains over sub-word level models for speech recognition. We als…

    Submitted 14 July, 2020; v1 submitted 10 June, 2019; originally announced June 2019.

    Comments: ICML 2020

  26. arXiv:1904.02619  [pdf, other]

    cs.CL

    Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions

    Authors: Awni Hannun, Ann Lee, Qiantong Xu, Ronan Collobert

    Abstract: We propose a fully convolutional sequence-to-sequence encoder architecture with a simple and efficient decoder. Our model improves WER on LibriSpeech while being an order of magnitude more efficient than a strong RNN baseline. Key to our approach is a time-depth separable convolution block which dramatically reduces the number of parameters in the model while keeping the receptive field large. We…

    Submitted 4 April, 2019; originally announced April 2019.

  27. arXiv:1902.06022  [pdf, other]

    cs.CL

    A Fully Differentiable Beam Search Decoder

    Authors: Ronan Collobert, Awni Hannun, Gabriel Synnaeve

    Abstract: We introduce a new beam search decoder that is fully differentiable, making it possible to optimize at training time through the inference procedure. Our decoder allows us to combine models which operate at different granularities (e.g. acoustic and language models). It can be used when target sequences are not aligned to input sequences by considering all possible alignments between the two. We d…

    Submitted 15 February, 2019; originally announced February 2019.

  28. wav2letter++: The Fastest Open-source Speech Recognition System

    Authors: Vineel Pratap, Awni Hannun, Qiantong Xu, Jeff Cai, Jacob Kahn, Gabriel Synnaeve, Vitaliy Liptchinsky, Ronan Collobert

    Abstract: This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency. Here we explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases wav2letter++ is more than 2x faster th…

    Submitted 18 December, 2018; originally announced December 2018.

  29. arXiv:1707.01836  [pdf, other]

    cs.CV

    Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks

    Authors: Pranav Rajpurkar, Awni Y. Hannun, Masoumeh Haghpanahi, Codie Bourn, Andrew Y. Ng

    Abstract: We develop an algorithm which exceeds the performance of board certified cardiologists in detecting a wide range of heart arrhythmias from electrocardiograms recorded with a single-lead wearable monitor. We build a dataset with more than 500 times the number of unique patients than previously studied corpora. On this dataset, we train a 34-layer convolutional neural network which maps a sequence o…

    Submitted 6 July, 2017; originally announced July 2017.

  30. arXiv:1611.09405  [pdf, other]

    cs.CL

    An End-to-End Architecture for Keyword Spotting and Voice Activity Detection

    Authors: Chris Lengerich, Awni Hannun

    Abstract: We propose a single neural network architecture for two tasks: on-line keyword spotting and voice activity detection. We develop novel inference algorithms for an end-to-end Recurrent Neural Network trained with the Connectionist Temporal Classification loss function which allow our model to achieve high accuracy on both keyword spotting and voice activity detection without retraining. In contrast…

    Submitted 28 November, 2016; originally announced November 2016.

    Comments: NIPS 2016 End-to-End Learning for Speech and Audio Processing Workshop

  31. arXiv:1603.09509  [pdf, other]

    cs.CL cs.LG cs.NE cs.SD

    Learning Multiscale Features Directly From Waveforms

    Authors: Zhenyao Zhu, Jesse H. Engel, Awni Hannun

    Abstract: Deep learning has dramatically improved the performance of speech recognition systems through learning hierarchies of features optimized for the task at hand. However, true end-to-end learning, where features are learned directly from waveforms, has only recently reached the performance of hand-tailored representations based on the Fourier transform. In this paper, we detail an approach to use con…

    Submitted 5 April, 2016; v1 submitted 31 March, 2016; originally announced March 2016.

    Comments: "fix typo in the title"

  32. arXiv:1512.02595  [pdf, other]

    cs.CL

    Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

    Authors: Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, et al. (9 additional authors not shown)

    Abstract: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our app…

    Submitted 8 December, 2015; originally announced December 2015.

  33. arXiv:1412.5567  [pdf, other]

    cs.CL cs.LG cs.NE

    Deep Speech: Scaling up end-to-end speech recognition

    Authors: Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, Andrew Y. Ng

    Abstract: We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model backgroun…

    Submitted 19 December, 2014; v1 submitted 17 December, 2014; originally announced December 2014.

  34. arXiv:1408.2873  [pdf, ps, other]

    cs.CL cs.LG cs.NE

    First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs

    Authors: Awni Y. Hannun, Andrew L. Maas, Daniel Jurafsky, Andrew Y. Ng

    Abstract: We present a method to perform first-pass large vocabulary continuous speech recognition using only a neural network and language model. Deep neural network acoustic models are now commonplace in HMM-based speech recognition systems, but building such systems is a complex, domain-specific task. Recent work demonstrated the feasibility of discarding the HMM sequence modeling framework by directly p…

    Submitted 8 December, 2014; v1 submitted 12 August, 2014; originally announced August 2014.

  35. arXiv:1406.7806  [pdf, other]

    cs.CL cs.LG cs.NE stat.ML

    Building DNN Acoustic Models for Large Vocabulary Speech Recognition

    Authors: Andrew L. Maas, Peng Qi, Ziang Xie, Awni Y. Hannun, Christopher T. Lengerich, Daniel Jurafsky, Andrew Y. Ng

    Abstract: Deep neural networks (DNNs) are now a central component of nearly all state-of-the-art speech recognition systems. Building neural network acoustic models requires several design decisions including network architecture, size, and training loss function. This paper offers an empirical investigation on which aspects of DNN acoustic model design are most important for speech recognition system perfo…

    Submitted 20 January, 2015; v1 submitted 30 June, 2014; originally announced June 2014.