Skip to main content

Showing 1–5 of 5 results for author: Kulshreshtha, D

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.17935  [pdf, other

    cs.CL cs.SD eess.AS

    Sequential Editing for Lifelong Training of Speech Recognition Models

    Authors: Devang Kulshreshtha, Saket Dingliwal, Brady Houston, Nikolaos Pappas, Srikanth Ronanki

    Abstract: Automatic Speech Recognition (ASR) traditionally assumes known domains, but adding data from a new domain raises concerns about computational inefficiencies linked to retraining models on both existing and new domains. Fine-tuning solely on new domain risks Catastrophic Forgetting (CF). To address this, Lifelong Learning (LLL) algorithms have been proposed for ASR. Prior research has explored tech… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  2. arXiv:2311.08402  [pdf, other

    cs.CL cs.IR cs.SD eess.AS

    Retrieve and Copy: Scaling ASR Personalization to Large Catalogs

    Authors: Sai Muralidhar Jayanthi, Devang Kulshreshtha, Saket Dingliwal, Srikanth Ronanki, Sravan Bodapati

    Abstract: Personalization of automatic speech recognition (ASR) models is a widely studied topic because of its many practical applications. Most recently, attention-based contextual biasing techniques are used to improve the recognition of rare words and domain specific entities. However, due to performance constraints, the biasing is often limited to a few thousand entities, restricting real-world usabili… ▽ More

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  3. arXiv:2311.02482  [pdf, other

    cs.SD cs.AI cs.LG eess.AS

    Generalized zero-shot audio-to-intent classification

    Authors: Veera Raghavendra Elluru, Devang Kulshreshtha, Rohit Paturi, Sravan Bodapati, Srikanth Ronanki

    Abstract: Spoken language understanding systems using audio-only data are gaining popularity, yet their ability to handle unseen intents remains limited. In this study, we propose a generalized zero-shot audio-to-intent classification framework with only a few sample text sentences per intent. To achieve this, we first train a supervised audio-to-intent classifier by making use of a self-supervised pre-trai… ▽ More

    Submitted 4 November, 2023; originally announced November 2023.

  4. arXiv:2307.00759  [pdf, other

    cs.CL cs.SD eess.AS

    Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages

    Authors: Devang Kulshreshtha, Saket Dingliwal, Brady Houston, Sravan Bodapati

    Abstract: Connectionist Temporal Classification (CTC) models are popular for their balance between speed and performance for Automatic Speech Recognition (ASR). However, these CTC models still struggle in other areas, such as personalization towards custom words. A recent approach explores Contextual Adapters, wherein an attention-based biasing model for CTC is used to improve the recognition of custom enti… ▽ More

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: Published at INTERSPEECH 2023

  5. arXiv:2305.03837  [pdf, other

    eess.AS cs.LG cs.SD

    Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation

    Authors: Nilaksh Das, Monica Sunkara, Sravan Bodapati, **glun Cai, Devang Kulshreshtha, Jeff Farris, Katrin Kirchhoff

    Abstract: End-to-end ASR models trained on large amount of data tend to be implicitly biased towards language semantics of the training data. Internal language model estimation (ILME) has been proposed to mitigate this bias for autoregressive models such as attention-based encoder-decoder and RNN-T. Typically, ILME is performed by modularizing the acoustic and language components of the model architecture,… ▽ More

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: Accepted to ICASSP 2023