Showing 1–17 of 17 results for author: Dingliwal, S

  1. arXiv:2406.17935

    cs.CL cs.SD eess.AS

    Sequential Editing for Lifelong Training of Speech Recognition Models

    Authors: Devang Kulshreshtha, Saket Dingliwal, Brady Houston, Nikolaos Pappas, Srikanth Ronanki

    Abstract: Automatic Speech Recognition (ASR) traditionally assumes known domains, but adding data from a new domain raises concerns about computational inefficiencies linked to retraining models on both existing and new domains. Fine-tuning solely on new domain risks Catastrophic Forgetting (CF). To address this, Lifelong Learning (LLL) algorithms have been proposed for ASR. Prior research has explored tech…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  2. arXiv:2405.08317

    cs.CL cs.SD eess.AS

    SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

    Authors: Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, Daniel Garcia-Romero, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remains largely unclear. In this work, we investigate the potential vulnerabilities of such instruction-following speech-language models to adversarial attacks and jailbreaking. Specifically, we…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 9+6 pages, Submitted to ACL 2024

  3. arXiv:2405.08295

    cs.CL cs.SD eess.AS

    SpeechVerse: A Large-scale Generalizable Audio Language Model

    Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore devel…

    Submitted 31 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Single Column, 13 pages

  4. arXiv:2311.08402

    cs.CL cs.IR cs.SD eess.AS

    Retrieve and Copy: Scaling ASR Personalization to Large Catalogs

    Authors: Sai Muralidhar Jayanthi, Devang Kulshreshtha, Saket Dingliwal, Srikanth Ronanki, Sravan Bodapati

    Abstract: Personalization of automatic speech recognition (ASR) models is a widely studied topic because of its many practical applications. Most recently, attention-based contextual biasing techniques are used to improve the recognition of rare words and domain specific entities. However, due to performance constraints, the biasing is often limited to a few thousand entities, restricting real-world usabili…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  5. arXiv:2307.00759

    cs.CL cs.SD eess.AS

    Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages

    Authors: Devang Kulshreshtha, Saket Dingliwal, Brady Houston, Sravan Bodapati

    Abstract: Connectionist Temporal Classification (CTC) models are popular for their balance between speed and performance for Automatic Speech Recognition (ASR). However, these CTC models still struggle in other areas, such as personalization towards custom words. A recent approach explores Contextual Adapters, wherein an attention-based biasing model for CTC is used to improve the recognition of custom enti…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: Published at INTERSPEECH 2023

  6. arXiv:2307.00453

    cs.CL cs.SD eess.AS

    Don't Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters

    Authors: Anshu Bhatia, Sanchit Sinha, Saket Dingliwal, Karthik Gopalakrishnan, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Speech representations learned in a self-supervised fashion from massive unlabeled speech corpora have been adapted successfully toward several downstream tasks. However, such representations may be skewed toward canonical data characteristics of such corpora and perform poorly on atypical, non-native accented speaker populations. With the state-of-the-art HuBERT model as a baseline, we propose an…

    Submitted 1 July, 2023; originally announced July 2023.

  7. arXiv:2212.09095

    cs.CL cs.AI

    Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale

    Authors: Hritik Bansal, Karthik Gopalakrishnan, Saket Dingliwal, Sravan Bodapati, Katrin Kirchhoff, Dan Roth

    Abstract: Language models have been shown to perform better with an increase in scale on a wide variety of tasks via the in-context learning paradigm. In this paper, we investigate the hypothesis that the ability of a large language model to in-context learn-perform a task is not uniformly spread across all of its underlying components. Using a 66 billion parameter language model (OPT-66B) across a diverse…

    Submitted 16 August, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

    Comments: Accepted at Annual Meeting of the Association for Computational Linguistics (ACL) 2023, Main Proceedings

  8. arXiv:2210.09510

    cs.CL cs.SD eess.AS

    Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting

    Authors: Saket Dingliwal, Monica Sunkara, Sravan Bodapati, Srikanth Ronanki, Jeff Farris, Katrin Kirchhoff

    Abstract: End-to-end speech recognition models trained using joint Connectionist Temporal Classification (CTC)-Attention loss have gained popularity recently. In these models, a non-autoregressive CTC decoder is often used at inference time due to its speed and simplicity. However, such models are hard to personalize because of their conditional independence assumption that prevents output tokens from previ…

    Submitted 13 November, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: To appear in SLT 2022

  9. arXiv:2112.08718

    cs.CL cs.LG

    Prompt Tuning GPT-2 language model for parameter-efficient domain adaptation of ASR systems

    Authors: Saket Dingliwal, Ashish Shenoy, Sravan Bodapati, Ankur Gandhe, Ravi Teja Gadde, Katrin Kirchhoff

    Abstract: Automatic Speech Recognition (ASR) systems have found their use in numerous industrial applications in very diverse domains creating a need to adapt to new domains with small memory and deployment overhead. In this work, we introduce domain-prompts, a methodology that involves training a small number of domain embedding parameters to prime a Transformer-based Language Model (LM) to a particular do…

    Submitted 21 July, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted at InterSpeech 2022

  10. arXiv:2110.06502

    cs.CL

    Prompt-tuning in ASR systems for efficient domain-adaptation

    Authors: Saket Dingliwal, Ashish Shenoy, Sravan Bodapati, Ankur Gandhe, Ravi Teja Gadde, Katrin Kirchhoff

    Abstract: Automatic Speech Recognition (ASR) systems have found their use in numerous industrial applications in very diverse domains. Since domain-specific systems perform better than their generic counterparts on in-domain evaluation, the need for memory and compute-efficient domain adaptation is obvious. Particularly, adapting parameter-heavy transformer-based language models used for rescoring ASR hypot…

    Submitted 22 October, 2021; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: WeCNLP 2021 camera-ready

  11. arXiv:2101.06779

    cs.CL

    Few Shot Dialogue State Tracking using Meta-learning

    Authors: Saket Dingliwal, Bill Gao, Sanchit Agarwal, Chien-Wei Lin, Tagyoung Chung, Dilek Hakkani-Tur

    Abstract: Dialogue State Tracking (DST) forms a core component of automated chatbot based systems designed for specific goals like hotel, taxi reservation, tourist information, etc. With the increasing need to deploy such systems in new domains, solving the problem of zero/few-shot DST has become necessary. There has been a rising trend for learning to transfer knowledge from resource-rich domains to unknow…

    Submitted 5 April, 2021; v1 submitted 17 January, 2021; originally announced January 2021.

    Comments: To appear in EACL 2021

  12. arXiv:2008.08148

    cs.CV cs.LG

    Robust Handwriting Recognition with Limited and Noisy Data

    Authors: Hai Pham, Amrith Setlur, Saket Dingliwal, Tzu-Hsiang Lin, Barnabas Poczos, Kang Huang, Zhuo Li, Jae Lim, Collin McCormack, Tam Vu

    Abstract: Despite the advent of deep learning in computer vision, the general handwriting recognition problem is far from solved. Most existing approaches focus on handwriting datasets that have clearly written text and carefully segmented labels. In this paper, we instead focus on learning handwritten characters from maintenance logs, a constrained setting where data is very limited and noisy. We break the…

    Submitted 18 August, 2020; originally announced August 2020.

    Comments: ICFHR 2020

  13. arXiv:2007.02523

    cs.LG stat.ML

    Covariate Distribution Aware Meta-learning

    Authors: Amrith Setlur, Saket Dingliwal, Barnabas Poczos

    Abstract: Meta-learning has proven to be successful for few-shot learning across the regression, classification, and reinforcement learning paradigms. Recent approaches have adopted Bayesian interpretations to improve gradient-based meta-learners by quantifying the uncertainty of the post-adaptation estimates. Most of these works almost completely ignore the latent relationship between the covariate distrib…

    Submitted 27 November, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

    Journal ref: ICML 2020 Lifelong Learning Workshop

  14. arXiv:2003.04273

    cs.LG cs.NE stat.ML

    Finding Input Characterizations for Output Properties in ReLU Neural Networks

    Authors: Saket Dingliwal, Divyansh Pareek, Jatin Arora

    Abstract: Deep Neural Networks (DNNs) have emerged as a powerful mechanism and are being increasingly deployed in real-world safety-critical domains. Despite the widespread success, their complex architecture makes proving any formal guarantees about them difficult. Identifying how logical notions of high-level correctness relate to the complex low-level network architecture is a significant challenge. In t…

    Submitted 9 March, 2020; originally announced March 2020.

    Comments: 5 pages

  15. arXiv:1908.00860

    cs.LO

    Advances in Symmetry Breaking for SAT Modulo Theories

    Authors: Saket Dingliwal, Ronak Agarwal, Happy Mittal, Parag Singla

    Abstract: Symmetry breaking is a popular technique to reduce the search space for SAT solving by exploiting the underlying symmetry over variables and clauses in a formula. The key idea is to first identify sets of assignments which fall in the same symmetry class, and then impose ordering constraints, called Symmetry Breaking Predicates (SBPs), such that only one (or a small subset) of these assignments is…

    Submitted 16 January, 2020; v1 submitted 2 August, 2019; originally announced August 2019.

    Comments: SMT 2019, SMT, CVC4, Symmetry-breaking, starAI

  16. arXiv:1902.01629

    cs.SI

    Literature Survey on Finding Influential Communities in Large Scale Networks

    Authors: Prakhar Ganesh, Saket Dingliwal, Rahul Agarwal

    Abstract: Community or modular structure is considered to be a significant property of large scale real-world graphs such as social or information networks. Detecting influential clusters or communities in these graphs is a problem of considerable interest as it often accounts for the functionality of the system. We aim to provide a thorough exposition of the topic, including the main elements of the proble…

    Submitted 5 February, 2019; originally announced February 2019.

  17. arXiv:1902.01615

    cs.CL

    Restructuring Conversations using Discourse Relations for Zero-shot Abstractive Dialogue Summarization

    Authors: Prakhar Ganesh, Saket Dingliwal

    Abstract: Dialogue summarization is a challenging problem due to the informal and unstructured nature of conversational data. Recent advances in abstractive summarization have been focused on data-hungry neural models and adapting these models to a new domain requires the availability of domain-specific manually annotated corpus created by linguistic experts. We propose a zero-shot abstractive dialogue summ…

    Submitted 13 October, 2020; v1 submitted 5 February, 2019; originally announced February 2019.

    Comments: 4 pages + supplementary