Showing 1–9 of 9 results for author: Lugosch, L

Searching in archive eess.
  1. arXiv:2305.13330  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Unsupervised ASR via Cross-Lingual Pseudo-Labeling

    Authors: Tatiana Likhomanenko, Loren Lugosch, Ronan Collobert

    Abstract: Recent work has shown that it is possible to train an $\textit{unsupervised}$ automatic speech recognition (ASR) system using only unpaired audio and text. Existing unsupervised ASR methods assume that no labeled data can be used for training. We argue that even if one does not have any labeled audio for a given language, there is $\textit{always}$ labeled data available for other languages. We sh…

    Submitted 16 February, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

  2. arXiv:2111.00161  [pdf, other]

    cs.CL cs.SD eess.AS

    Pseudo-Labeling for Massively Multilingual Speech Recognition

    Authors: Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert

    Abstract: Semi-supervised learning through pseudo-labeling has become a staple of state-of-the-art monolingual speech recognition systems. In this work, we extend pseudo-labeling to massively multilingual speech recognition with 60 languages. We propose a simple pseudo-labeling recipe that works well even with low-resource languages: train a supervised multilingual model, fine-tune it with semi-supervised l…

    Submitted 8 March, 2022; v1 submitted 29 October, 2021; originally announced November 2021.

    Comments: Accepted to ICASSP 2022. New version has links to code/models + more training curves for larger model. (Fixed code link.)

  3. arXiv:2106.04624  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    SpeechBrain: A General-Purpose Speech Toolkit

    Authors: Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-Chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato De Mori, Yoshua Bengio

    Abstract: SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: Preprint

  4. arXiv:2104.01604  [pdf, other]

    cs.CL eess.AS

    Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers

    Authors: Loren Lugosch, Piyush Papreja, Mirco Ravanelli, Abdelwahab Heba, Titouan Parcollet

    Abstract: This paper introduces Timers and Such, a new open source dataset of spoken English commands for common voice control use cases involving numbers. We describe the gap in existing spoken language understanding datasets that Timers and Such fills, the design and creation of the dataset, and experiments with a number of ASR-based and end-to-end baseline models, the code for which has been made availab…

    Submitted 30 September, 2021; v1 submitted 4 April, 2021; originally announced April 2021.

    Comments: Accepted to NeurIPS 2021 - Datasets and Benchmarks Track

  5. arXiv:1910.09463  [pdf, other]

    eess.AS

    Using Speech Synthesis to Train End-to-End Spoken Language Understanding Models

    Authors: Loren Lugosch, Brett Meyer, Derek Nowrouzezahrai, Mirco Ravanelli

    Abstract: End-to-end models are an attractive new approach to spoken language understanding (SLU) in which the meaning of an utterance is inferred directly from the raw audio without employing the standard pipeline composed of a separately trained speech recognizer and natural language understanding module. The downside of end-to-end SLU is that in-domain speech data must be recorded to train the model. In…

    Submitted 21 October, 2019; originally announced October 2019.

  6. arXiv:1904.03670  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Speech Model Pre-training for End-to-End Spoken Language Understanding

    Authors: Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, Yoshua Bengio

    Abstract: Whereas conventional spoken language understanding (SLU) systems map speech to text, and then text to intent, end-to-end SLU systems map speech directly to intent through a single trainable model. Achieving high accuracy with these end-to-end models without a large amount of training data is difficult. We propose a method to reduce the data requirements of end-to-end SLU in which the model is firs…

    Submitted 25 July, 2019; v1 submitted 7 April, 2019; originally announced April 2019.

    Comments: Accepted to Interspeech 2019

  7. arXiv:1811.10736  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    DONUT: CTC-based Query-by-Example Keyword Spotting

    Authors: Loren Lugosch, Samuel Myer, Vikrant Singh Tomar

    Abstract: Keyword spotting--or wakeword detection--is an essential feature for hands-free operation of modern voice-controlled devices. With such devices becoming ubiquitous, users might want to choose a personalized custom wakeword. In this work, we present DONUT, a CTC-based algorithm for online query-by-example keyword spotting that enables custom wakeword detection. The algorithm works by recording a sm…

    Submitted 26 November, 2018; originally announced November 2018.

    Comments: Accepted to NeurIPS 2018 Workshop on Interpretability and Robustness for Audio, Speech, and Language

  8. arXiv:1810.10902  [pdf, ps, other]

    cs.IT cs.LG eess.SP

    Learning from the Syndrome

    Authors: Loren Lugosch, Warren J. Gross

    Abstract: In this paper, we introduce the syndrome loss, an alternative loss function for neural error-correcting decoders based on a relaxation of the syndrome. The syndrome loss penalizes the decoder for producing outputs that do not correspond to valid codewords. We show that training with the syndrome loss yields decoders with consistently lower frame error rate for a number of short block codes, at lit…

    Submitted 22 October, 2018; originally announced October 2018.

    Comments: Accepted to Asilomar 2018 - special session on "Machine Learning for Wireless Systems"

  9. arXiv:1807.02465  [pdf, other]

    eess.AS cs.SD

    Tone Recognition Using Lifters and CTC

    Authors: Loren Lugosch, Vikrant Singh Tomar

    Abstract: In this paper, we present a new method for recognizing tones in continuous speech for tonal languages. The method works by converting the speech signal to a cepstrogram, extracting a sequence of cepstral features using a convolutional neural network, and predicting the underlying sequence of tones using a connectionist temporal classification (CTC) network. The performance of the proposed method i…

    Submitted 6 July, 2018; originally announced July 2018.

    Comments: Accepted to Interspeech 2018