Showing 1–12 of 12 results for author: Allauzen, C

  1. arXiv:2401.12789  [pdf, other]

    cs.CL cs.SD eess.AS

    Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study

    Authors: W. Ronny Huang, Cyril Allauzen, Tongzhou Chen, Kilol Gupta, Ke Hu, James Qin, Yu Zhang, Yongqiang Wang, Shuo-Yiin Chang, Tara N. Sainath

    Abstract: In the era of large models, the autoregressive nature of decoding often results in latency serving as a significant bottleneck. We propose a non-autoregressive LM-fused ASR system that effectively leverages the parallelization capabilities of accelerator hardware. Our approach combines the Universal Speech Model (USM) and the PaLM 2 language model in per-segment scoring mode, achieving an average…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: ICASSP 2024

  2. arXiv:2306.08133  [pdf, ps, other]

    eess.AS cs.CL

    Large-scale Language Model Rescoring on Long-form Data

    Authors: Tongzhou Chen, Cyril Allauzen, Yinghui Huang, Daniel Park, David Rybach, W. Ronny Huang, Rodrigo Cabrera, Kartik Audhkhasi, Bhuvana Ramabhadran, Pedro J. Moreno, Michael Riley

    Abstract: In this work, we study the impact of Large-scale Language Models (LLM) on Automated Speech Recognition (ASR) of YouTube videos, which we use as a source for long-form ASR. We demonstrate up to 8% relative reduction in Word Error Rate (WER) on US English (en-us) and code-switched Indian English (en-in) long-form ASR test sets and a reduction of up to 30% relative on Salient Term Error Rate (STER)…

    Submitted 5 September, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: 5 pages, accepted in ICASSP 2023

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  3. arXiv:2212.12442  [pdf, ps, other]

    cs.CL cs.LG

    Alignment Entropy Regularization

    Authors: Ehsan Variani, Ke Wu, David Rybach, Cyril Allauzen, Michael Riley

    Abstract: Existing training criteria in automatic speech recognition (ASR) permit the model to freely explore more than one time alignment between the feature and label sequences. In this paper, we use entropy to measure a model's uncertainty, i.e. how it chooses to distribute the probability mass over the set of allowed alignments. Furthermore, we evaluate the effect of entropy regularization in encouragin…

    Submitted 22 December, 2022; originally announced December 2022.

  4. arXiv:2211.15432  [pdf, other]

    cs.CL

    E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model

    Authors: W. Ronny Huang, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, David Rybach, Robert David, Rohit Prabhavalkar, Cyril Allauzen, Cal Peyser, Trevor D. Strohman

    Abstract: We explore unifying a neural segmenter with two-pass cascaded encoder ASR into a single model. A key challenge is allowing the segmenter (which runs in real-time, synchronously with the decoder) to finalize the 2nd pass (which runs 900 ms behind real-time) without introducing user-perceived latency or deletion errors during inference. We propose a design where the neural segmenter is integrated wi…

    Submitted 5 March, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: ICASSP 2023

  5. arXiv:2205.13674  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Global Normalization for Streaming Speech Recognition in a Modular Framework

    Authors: Ehsan Variani, Ke Wu, Michael Riley, David Rybach, Matt Shannon, Cyril Allauzen

    Abstract: We introduce the Globally Normalized Autoregressive Transducer (GNAT) for addressing the label bias problem in streaming speech recognition. Our solution admits a tractable exact computation of the denominator for the sequence-level normalization. Through theoretical and empirical results, we demonstrate that by switching to a globally normalized model, the word error rate gap between streaming an…

    Submitted 26 May, 2022; originally announced May 2022.

  6. arXiv:2204.10749  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR

    Authors: W. Ronny Huang, Shuo-Yiin Chang, David Rybach, Rohit Prabhavalkar, Tara N. Sainath, Cyril Allauzen, Cal Peyser, Zhiyun Lu

    Abstract: Improving the performance of end-to-end ASR models on long utterances ranging from minutes to hours in length is an ongoing challenge in speech recognition. A common solution is to segment the audio in advance using a separate voice activity detector (VAD) that decides segment boundary locations based purely on acoustic speech/non-speech information. VAD segmenters, however, may be sub-optimal for…

    Submitted 15 June, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

    Comments: Interspeech 2022

  7. arXiv:2204.07236  [pdf, other]

    cs.FL cs.CL

    A* shortest string decoding for non-idempotent semirings

    Authors: Kyle Gorman, Cyril Allauzen

    Abstract: The single shortest path algorithm is undefined for weighted finite-state automata over non-idempotent semirings because such semirings do not guarantee the existence of a shortest path. However, in non-idempotent semirings admitting an order satisfying a monotonicity condition (such as the plus-times or log semirings), the notion of shortest string is well-defined. We describe an algorithm which…

    Submitted 25 January, 2024; v1 submitted 14 April, 2022; originally announced April 2022.

    Comments: Ten pages, two figures. To appear in the proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics

  8. arXiv:2003.07705  [pdf, ps, other]

    eess.AS cs.CL cs.LG cs.SD

    Hybrid Autoregressive Transducer (HAT)

    Authors: Ehsan Variani, David Rybach, Cyril Allauzen, Michael Riley

    Abstract: This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems. The HAT model provides a way to measure the quality of the internal language model that can be used to decide whether inference with an external language model is beneficial or not. This artic…

    Submitted 12 March, 2020; originally announced March 2020.

  9. arXiv:1910.03432  [pdf, other]

    cs.CL cs.LG

    Federated Learning of N-gram Language Models

    Authors: Mingqing Chen, Ananda Theertha Suresh, Rajiv Mathews, Adeline Wong, Cyril Allauzen, Françoise Beaufays, Michael Riley

    Abstract: We propose algorithms to train production-quality n-gram language models using federated learning. Federated learning is a distributed computation platform that can be used to train global models for portable devices such as smart phones. Federated learning is especially relevant for applications handling privacy-sensitive data, such as virtual keyboards, because training is performed without the…

    Submitted 8 October, 2019; originally announced October 2019.

    Comments: 10 pages

  10. arXiv:0904.4686  [pdf, ps, other]

    cs.FL

    Linear-Space Computation of the Edit-Distance between a String and a Finite Automaton

    Authors: Cyril Allauzen, Mehryar Mohri

    Abstract: The problem of computing the edit-distance between a string and a finite automaton arises in a variety of applications in computational biology, text processing, and speech recognition. This paper presents linear-space algorithms for computing the edit-distance between a string and an arbitrary weighted automaton over the tropical semiring, or an unambiguous weighted automaton over an arbitrary…

    Submitted 29 April, 2009; originally announced April 2009.

  11. arXiv:0802.3254  [pdf, ps, other]

    cs.CC

    General Algorithms for Testing the Ambiguity of Finite Automata

    Authors: Cyril Allauzen, Mehryar Mohri, Ashish Rastogi

    Abstract: This paper presents efficient algorithms for testing the finite, polynomial, and exponential ambiguity of finite automata with $ε$-transitions. It gives an algorithm for testing the exponential ambiguity of an automaton $A$ in time $O(|A|_E^2)$, and finite or polynomial ambiguity in time $O(|A|_E^3)$. These complexities significantly improve over the previous best complexities given for the same…

    Submitted 22 February, 2008; originally announced February 2008.

  12. arXiv:0802.1465  [pdf, ps, other]

    cs.CC

    3-Way Composition of Weighted Finite-State Transducers

    Authors: Cyril Allauzen, Mehryar Mohri

    Abstract: Composition of weighted transducers is a fundamental algorithm used in many applications, including for computing complex edit-distances between automata, or string kernels in machine learning, or to combine different components of a speech recognition, speech synthesis, or information extraction system. We present a generalization of the composition of weighted transducers, 3-way composition, w…

    Submitted 22 February, 2008; v1 submitted 11 February, 2008; originally announced February 2008.

    Comments: Added missing acknowledgments