
Showing 1–16 of 16 results for author: Bodapati, S

Searching in archive eess.
  1. arXiv:2311.08402

    cs.CL cs.IR cs.SD eess.AS

    Retrieve and Copy: Scaling ASR Personalization to Large Catalogs

    Authors: Sai Muralidhar Jayanthi, Devang Kulshreshtha, Saket Dingliwal, Srikanth Ronanki, Sravan Bodapati

    Abstract: Personalization of automatic speech recognition (ASR) models is a widely studied topic because of its many practical applications. Most recently, attention-based contextual biasing techniques have been used to improve the recognition of rare words and domain-specific entities. However, due to performance constraints, the biasing is often limited to a few thousand entities, restricting real-world usabili…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  2. arXiv:2311.02482

    cs.SD cs.AI cs.LG eess.AS

    Generalized zero-shot audio-to-intent classification

    Authors: Veera Raghavendra Elluru, Devang Kulshreshtha, Rohit Paturi, Sravan Bodapati, Srikanth Ronanki

    Abstract: Spoken language understanding systems using audio-only data are gaining popularity, yet their ability to handle unseen intents remains limited. In this study, we propose a generalized zero-shot audio-to-intent classification framework with only a few sample text sentences per intent. To achieve this, we first train a supervised audio-to-intent classifier by making use of a self-supervised pre-trai…

    Submitted 4 November, 2023; originally announced November 2023.

  3. arXiv:2307.00759

    cs.CL cs.SD eess.AS

    Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages

    Authors: Devang Kulshreshtha, Saket Dingliwal, Brady Houston, Sravan Bodapati

    Abstract: Connectionist Temporal Classification (CTC) models are popular for their balance between speed and performance for Automatic Speech Recognition (ASR). However, these CTC models still struggle in other areas, such as personalization towards custom words. A recent approach explores Contextual Adapters, wherein an attention-based biasing model for CTC is used to improve the recognition of custom enti…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: Published at INTERSPEECH 2023

  4. arXiv:2307.00453

    cs.CL cs.SD eess.AS

    Don't Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters

    Authors: Anshu Bhatia, Sanchit Sinha, Saket Dingliwal, Karthik Gopalakrishnan, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Speech representations learned in a self-supervised fashion from massive unlabeled speech corpora have been adapted successfully toward several downstream tasks. However, such representations may be skewed toward canonical data characteristics of such corpora and perform poorly on atypical, non-native accented speaker populations. With the state-of-the-art HuBERT model as a baseline, we propose an…

    Submitted 1 July, 2023; originally announced July 2023.

  5. arXiv:2306.08175

    eess.AS cs.AI cs.LG cs.SD

    DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer ASR

    Authors: Goeric Huybrechts, Srikanth Ronanki, Xilai Li, Hadis Nosrati, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Conformer-based end-to-end models have become ubiquitous and are commonly used in both streaming and non-streaming automatic speech recognition (ASR). Techniques like dual-mode and dynamic chunk training helped unify streaming and non-streaming systems. However, there remains a performance gap between streaming with a full and limited past context. To address this issue, we propose the…

    Submitted 1 March, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

  6. arXiv:2305.03837

    eess.AS cs.LG cs.SD

    Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation

    Authors: Nilaksh Das, Monica Sunkara, Sravan Bodapati, Jinglun Cai, Devang Kulshreshtha, Jeff Farris, Katrin Kirchhoff

    Abstract: End-to-end ASR models trained on large amounts of data tend to be implicitly biased towards language semantics of the training data. Internal language model estimation (ILME) has been proposed to mitigate this bias for autoregressive models such as attention-based encoder-decoder and RNN-T. Typically, ILME is performed by modularizing the acoustic and language components of the model architecture,…

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: Accepted to ICASSP 2023

  7. arXiv:2304.09325

    eess.AS cs.SD

    Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR

    Authors: Xilai Li, Goeric Huybrechts, Srikanth Ronanki, Jeff Farris, Sravan Bodapati

    Abstract: Recently, there has been an increasing interest in unifying streaming and non-streaming speech recognition models to reduce development, training and deployment cost. The best-known approaches rely on either window-based or dynamic chunk-based attention strategies and causal convolutions to minimize the degradation due to streaming. However, the performance gap still remains relatively large between…

    Submitted 25 April, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: 5 pages, 3 figures, 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023)

  8. arXiv:2211.13280

    cs.CL cs.SD eess.AS

    Device Directedness with Contextual Cues for Spoken Dialog Systems

    Authors: Dhanush Bekal, Sundararajan Srinivasan, Sravan Bodapati, Srikanth Ronanki, Katrin Kirchhoff

    Abstract: In this work, we define barge-in verification as a supervised learning task where audio-only information is used to classify user spoken dialogue into true and false barge-ins. Following the success of pre-trained models, we use low-level speech representations from a self-supervised representation learning model for our downstream classification task. Further, we propose a novel technique to infu…

    Submitted 23 November, 2022; originally announced November 2022.

  9. arXiv:2210.09510

    cs.CL cs.SD eess.AS

    Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting

    Authors: Saket Dingliwal, Monica Sunkara, Sravan Bodapati, Srikanth Ronanki, Jeff Farris, Katrin Kirchhoff

    Abstract: End-to-end speech recognition models trained using joint Connectionist Temporal Classification (CTC)-Attention loss have gained popularity recently. In these models, a non-autoregressive CTC decoder is often used at inference time due to its speed and simplicity. However, such models are hard to personalize because of their conditional independence assumption that prevents output tokens from previ…

    Submitted 13 November, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: To appear in SLT 2022

  10. arXiv:2109.05092

    eess.AS cs.SD

    Remember the context! ASR slot error correction through memorization

    Authors: Dhanush Bekal, Ashish Shenoy, Monica Sunkara, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Accurate recognition of slot values such as domain-specific words or named entities by automatic speech recognition (ASR) systems forms the core of goal-oriented dialogue systems. Although it is a critical step with direct impact on downstream tasks such as language understanding, many domain-agnostic ASR systems tend to perform poorly on domain-specific or long-tail words. They are often supp…

    Submitted 17 September, 2021; v1 submitted 10 September, 2021; originally announced September 2021.

    Comments: 8 pages, 3 figures, 4 tables, Accepted to ASRU 2021

  11. arXiv:2106.09532

    eess.AS cs.CL cs.LG cs.SD

    ASR Adaptation for E-commerce Chatbots using Cross-Utterance Context and Multi-Task Language Modeling

    Authors: Ashish Shenoy, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Automatic Speech Recognition (ASR) robustness toward slot entities is critical in e-commerce voice assistants that involve monetary transactions and purchases. Along with effective domain adaptation, it is intuitive that cross-utterance contextual cues play an important role in disambiguating domain-specific content words from speech. In this paper, we investigate various techniques to improve co…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: Accepted at ACL-IJCNLP 2021 Workshop on e-Commerce and NLP (ECNLP)

  12. arXiv:2103.05834

    eess.AS

    Best of Both Worlds: Robust Accented Speech Recognition with Adversarial Transfer Learning

    Authors: Nilaksh Das, Sravan Bodapati, Monica Sunkara, Sundararajan Srinivasan, Duen Horng Chau

    Abstract: Training deep neural networks for automatic speech recognition (ASR) requires large amounts of transcribed speech. This becomes a bottleneck for training robust models for accented speech which typically contains high variability in pronunciation and other semantics, since obtaining large amounts of annotated accented data is both tedious and costly. Often, we only have access to large amounts of…

    Submitted 9 March, 2021; originally announced March 2021.

  13. arXiv:2102.06380

    cs.CL eess.AS

    Neural Inverse Text Normalization

    Authors: Monica Sunkara, Chaitanya Shivade, Sravan Bodapati, Katrin Kirchhoff

    Abstract: While there have been several contributions exploring state-of-the-art techniques for text normalization, the problem of inverse text normalization (ITN) remains relatively unexplored. The best-known approaches leverage finite state transducer (FST) based models which rely on manually curated rules and are hence not scalable. We propose an efficient and robust neural solution for ITN leveraging tr…

    Submitted 12 February, 2021; originally announced February 2021.

    Comments: 5 pages, accepted to ICASSP 2021

  14. arXiv:2008.00702

    eess.AS cs.CL

    Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech

    Authors: Monica Sunkara, Srikanth Ronanki, Dhanush Bekal, Sravan Bodapati, Katrin Kirchhoff

    Abstract: In this work, we explore a multimodal semi-supervised learning approach for punctuation prediction by learning representations from large amounts of unlabelled audio and text data. Conventional approaches in speech processing typically use forced alignment to encode per-frame acoustic features into word-level features and perform multimodal fusion of the resulting acoustic and lexical representatio…

    Submitted 3 August, 2020; originally announced August 2020.

    Comments: Accepted for Interspeech 2020

  15. arXiv:2007.02025

    cs.CL cs.SD eess.AS

    Robust Prediction of Punctuation and Truecasing for Medical ASR

    Authors: Monica Sunkara, Srikanth Ronanki, Kalpit Dixit, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Automatic speech recognition (ASR) systems in the medical domain that focus on transcribing clinical dictations and doctor-patient conversations often pose many challenges due to the complexity of the domain. ASR output typically undergoes automatic punctuation to enable users to speak naturally, without having to vocalise awkward and explicit punctuation commands, such as "period", "add comma" or…

    Submitted 11 July, 2020; v1 submitted 4 July, 2020; originally announced July 2020.

    Comments: Accepted for ACL NLPMC workshop 2020

  16. arXiv:2001.00605

    cs.LG cs.RO eess.SY

    Zero-Shot Reinforcement Learning with Deep Attention Convolutional Neural Networks

    Authors: Sahika Genc, Sunil Mallya, Sravan Bodapati, Tao Sun, Yunzhe Tao

    Abstract: Simulation-to-simulation and simulation-to-real-world transfer of neural network models has been a difficult problem. To close the reality gap, prior methods for simulation-to-real-world transfer focused on domain adaptation, decoupling perception and dynamics and solving each problem separately, and randomization of agent parameters and environment conditions to expose the learning agent to a var…

    Submitted 2 January, 2020; originally announced January 2020.