Showing 1–5 of 5 results for author: Unni, V

Searching in archive cs.
  1. arXiv:2310.15970  [pdf, other]

    cs.CL cs.AI cs.LG

    Accented Speech Recognition With Accent-specific Codebooks

    Authors: Darshan Prabhu, Preethi Jyothi, Sriram Ganapathy, Vinit Unni

    Abstract: Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems. Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR. In this work, we propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks. These learnable codebooks capture…

    Submitted 26 October, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 Main Conference (Long Paper)

  2. arXiv:2307.05006  [pdf, ps, other]

    cs.CL cs.LG eess.AS

    Improving RNN-Transducers with Acoustic LookAhead

    Authors: Vinit S. Unni, Ashish Mittal, Preethi Jyothi, Sunita Sarawagi

    Abstract: RNN-Transducers (RNN-Ts) have gained widespread acceptance as an end-to-end model for speech to text conversion because of their high accuracy and streaming capabilities. A typical RNN-T independently encodes the input audio and the text context, and combines the two encodings by a thin joint network. While this architecture provides SOTA streaming accuracy, it also makes the model vulnerable to s…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: 5 pages, 1 fig, 7 tables, Proceedings of Interspeech 2023

  3. arXiv:2203.02317  [pdf, other]

    cs.CL cs.LG

    Adaptive Discounting of Implicit Language Models in RNN-Transducers

    Authors: Vinit Unni, Shreya Khare, Ashish Mittal, Preethi Jyothi, Sunita Sarawagi, Samarth Bharadwaj

    Abstract: RNN-Transducer (RNN-T) models have become synonymous with streaming end-to-end ASR systems. While they perform competitively on a number of evaluation categories, rare words pose a serious challenge to RNN-T models. One main reason for the degradation in performance on rare words is that the language model (LM) internal to RNN-Ts can become overconfident and lead to hallucinated predictions that a…

    Submitted 21 February, 2022; originally announced March 2022.

    Comments: Proceedings for ICASSP 2022

  4. arXiv:2106.12758  [pdf, other]

    physics.flu-dyn cs.LG nlin.CD

    Neural ODE to model and prognose thermoacoustic instability

    Authors: Jayesh Dhadphale, Vishnu R. Unni, Abhishek Saha, R. I. Sujith

    Abstract: In reacting flow systems, thermoacoustic instability, characterized by high-amplitude pressure fluctuations, is driven by a positive coupling between the unsteady heat release rate and the acoustic field of the combustor. When the underlying flow is turbulent, as a control parameter of the system is varied and the system approaches thermoacoustic instability, the acoustic pressure oscillations synchr…

    Submitted 13 August, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: 31 pages, 12 figures

  5. Multilingual and code-switching ASR challenges for low resource Indian languages

    Authors: Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan, Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan, Tejaswi Seeram, Basil Abraham

    Abstract: Recently, there is increasing interest in multilingual automatic speech recognition (ASR) where a speech recognition system caters to multiple low resource languages by taking advantage of low amounts of labeled corpora in multiple languages. With multilingualism becoming common in today's world, there has been increasing interest in code-switching ASR as well. In code-switching, multiple language…

    Submitted 31 March, 2021; originally announced April 2021.

    Comments: 6 pages