
Showing 1–5 of 5 results for author: Travadi, R

Searching in archive eess.
  1. arXiv:2406.09676  [pdf, other]

    eess.AS cs.CL

    Optimizing Byte-level Representation for End-to-end ASR

    Authors: Roger Hsiao, Liuhui Deng, Erik McDermott, Ruchir Travadi, Xiaodan Zhuang

    Abstract: We propose a novel approach to optimizing a byte-level representation for end-to-end automatic speech recognition (ASR). Byte-level representation is often used by large-scale multilingual ASR systems when the character set of the supported languages is large. The compactness and universality of byte-level representation allow the ASR models to use smaller output vocabularies and therefore, provid…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 5 pages, 1 figure

  2. arXiv:2211.01438  [pdf, other]

    eess.AS cs.CL cs.SD

    Variable Attention Masking for Configurable Transformer Transducer Speech Recognition

    Authors: Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang

    Abstract: This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios. We present a comprehensive set of experiments comparing fixed masking, where the same attention mask is applied at every frame, with chunked masking, where the attention mask for each frame is determined by chunk boundaries,…

    Submitted 18 April, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: To appear in ICASSP 2023

    Journal ref: International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

  3. arXiv:2008.05514  [pdf, other]

    eess.AS cs.CL cs.SD

    Online Automatic Speech Recognition with Listen, Attend and Spell Model

    Authors: Roger Hsiao, Dogan Can, Tim Ng, Ruchir Travadi, Arnab Ghoshal

    Abstract: The Listen, Attend and Spell (LAS) model and other attention-based automatic speech recognition (ASR) models have known limitations when operated in a fully online mode. In this paper, we analyze the online operation of LAS models to demonstrate that these limitations stem from the handling of silence regions and the reliability of the online attention mechanism at the edge of input buffers. We propos…

    Submitted 13 October, 2020; v1 submitted 12 August, 2020; originally announced August 2020.

    Comments: 5 pages, 4 figures; this version was submitted to IEEE Signal Processing Letters

  4. arXiv:1907.06859  [pdf, other]

    eess.AS cs.SD

    Towards Adapting NMF Dictionaries Using Total Variability Modeling for Noise-Robust Acoustic Features

    Authors: Kunal Dhawan, Colin Vaz, Ruchir Travadi, Shrikanth Narayanan

    Abstract: We propose an algorithm to extract noise-robust acoustic features from noisy speech. We use Total Variability Modeling in combination with Non-negative Matrix Factorization (NMF) to learn a total variability subspace and adapt NMF dictionaries for each utterance. Unlike several other approaches for extracting noise-robust features, our algorithm does not require a training corpus of parallel clean…

    Submitted 16 July, 2019; originally announced July 2019.

  5. Multimodal Representation Learning using Deep Multiset Canonical Correlation

    Authors: Krishna Somandepalli, Naveen Kumar, Ruchir Travadi, Shrikanth Narayanan

    Abstract: We propose Deep Multiset Canonical Correlation Analysis (dMCCA) as an extension to representation learning using CCA when the underlying signal is observed across multiple (more than two) modalities. We use a deep learning framework to learn non-linear transformations from different modalities to a shared subspace such that the representations maximize the ratio of between- and within-modality covar…

    Submitted 3 April, 2019; originally announced April 2019.
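The point in entry 1 that byte-level representation gives a compact, universal output vocabulary can be illustrated with a small sketch. It assumes plain UTF-8 bytes as the output units (a fixed vocabulary of 256 symbols). It does not show the optimized representation the paper proposes, and the helper names are hypothetical.

```python
# Hypothetical sketch: plain UTF-8 byte-level output units for multilingual ASR.
# This is NOT the optimized representation from the paper, only the baseline idea
# that any transcript maps onto a fixed 256-symbol vocabulary.

BYTE_VOCAB_SIZE = 256  # one output unit per possible byte value


def text_to_byte_tokens(text: str) -> list[int]:
    """Map a transcript to byte-level token IDs in the range 0..255."""
    return list(text.encode("utf-8"))


def byte_tokens_to_text(tokens: list[int]) -> str:
    """Map byte-level token IDs back to text.

    A decoder can emit byte sequences that are not valid UTF-8. Here invalid
    sequences become U+FFFD instead of raising an error.
    """
    return bytes(tokens).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for transcript in ["hello", "héllo", "你好"]:
        tokens = text_to_byte_tokens(transcript)
        # Characters outside ASCII expand to several byte tokens.
        print(f"{transcript!r}: {len(transcript)} chars -> {len(tokens)} tokens {tokens}")
        assert byte_tokens_to_text(tokens) == transcript
    # A truncated multi-byte character cannot be decoded cleanly.
    print(repr(byte_tokens_to_text([0xE4, 0xBD])))
```

The trade-off is visible in the output. The vocabulary stays small, but a Chinese character costs three output steps, and some byte sequences do not decode to valid text.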
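Entry 2 contrasts fixed masking, where the same attention mask is applied at every frame, with chunked masking, where chunk boundaries determine each frame's mask. A minimal sketch of both follows. The window sizes, the `left_chunks` history parameter, and the function names are assumptions chosen for illustration, not the paper's configuration or code. In each mask, `mask[i][j]` is True when frame `i` may attend to frame `j`.

```python
# Hypothetical sketch of two self-attention mask types for a streaming encoder.
# mask[i][j] is True when frame i may attend to frame j.


def fixed_mask(num_frames: int, left: int, right: int) -> list[list[bool]]:
    """Same relative window at every frame: `left` frames of history, `right` of lookahead."""
    return [
        [i - left <= j <= i + right for j in range(num_frames)]
        for i in range(num_frames)
    ]


def chunked_mask(num_frames: int, chunk_size: int, left_chunks: int) -> list[list[bool]]:
    """Each frame sees all frames in its own chunk plus `left_chunks` earlier chunks.

    Lookahead therefore varies with a frame's position inside its chunk.
    """
    mask = []
    for i in range(num_frames):
        chunk = i // chunk_size
        start = max(0, (chunk - left_chunks) * chunk_size)
        end = min(num_frames, (chunk + 1) * chunk_size)  # exclusive bound
        mask.append([start <= j < end for j in range(num_frames)])
    return mask


def show(mask: list[list[bool]]) -> None:
    """Print a mask with 'x' for allowed and '.' for blocked positions."""
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))


if __name__ == "__main__":
    print("fixed (left=1, right=0):")
    show(fixed_mask(6, left=1, right=0))
    print("chunked (chunk_size=2, left_chunks=1):")
    show(chunked_mask(6, chunk_size=2, left_chunks=1))
```

Under the fixed mask, latency is the same for every frame. Under the chunked mask, the first frame of a chunk sees further ahead than the last one. This is one reason chunk size is a natural knob when a single model must serve several deployment scenarios.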