
Showing 1–6 of 6 results for author: Gadde, R T

  1. arXiv:2308.09716

    cs.CV cs.AI

    Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization

    Authors: Soumik Mukhopadhyay, Saksham Suri, Ravi Teja Gadde, Abhinav Shrivastava

    Abstract: The task of lip synchronization (lip-sync) seeks to match the lips of human faces with different audio. It has various applications in the film industry as well as for creating virtual avatars and for video conferencing. This is a challenging problem as one needs to simultaneously introduce detailed, realistic lip movements while preserving the identity, pose, emotions, and image quality. Many of…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Website: see https://soumik-kanad.github.io/diff2lip . Submission under review

  2. arXiv:2112.08718

    cs.CL cs.LG

    Prompt Tuning GPT-2 language model for parameter-efficient domain adaptation of ASR systems

    Authors: Saket Dingliwal, Ashish Shenoy, Sravan Bodapati, Ankur Gandhe, Ravi Teja Gadde, Katrin Kirchhoff

    Abstract: Automatic Speech Recognition (ASR) systems have found their use in numerous industrial applications in very diverse domains creating a need to adapt to new domains with small memory and deployment overhead. In this work, we introduce domain-prompts, a methodology that involves training a small number of domain embedding parameters to prime a Transformer-based Language Model (LM) to a particular do…

    Submitted 21 July, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted at InterSpeech 2022

  3. arXiv:2110.06502

    cs.CL

    Prompt-tuning in ASR systems for efficient domain-adaptation

    Authors: Saket Dingliwal, Ashish Shenoy, Sravan Bodapati, Ankur Gandhe, Ravi Teja Gadde, Katrin Kirchhoff

    Abstract: Automatic Speech Recognition (ASR) systems have found their use in numerous industrial applications in very diverse domains. Since domain-specific systems perform better than their generic counterparts on in-domain evaluation, the need for memory and compute-efficient domain adaptation is obvious. Particularly, adapting parameter-heavy transformer-based language models used for rescoring ASR hypot…

    Submitted 22 October, 2021; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: WeCNLP 2021 camera-ready

  4. arXiv:2108.00082

    cs.CL cs.AI

    Towards Continual Entity Learning in Language Models for Conversational Agents

    Authors: Ravi Teja Gadde, Ivan Bulyko

    Abstract: Neural language models (LM) trained on diverse corpora are known to work well on previously seen entities, however, updating these models with dynamically changing entities such as place names, song titles and shopping items requires re-training from scratch and collecting full sentences containing these entities. We aim to address this issue, by introducing entity-aware language models (EALM), wh…

    Submitted 14 September, 2021; v1 submitted 30 July, 2021; originally announced August 2021.

    Comments: Submitted to NeurIPS 2021. Paper is under review

  5. arXiv:2007.16013

    cs.CL cs.LG stat.ML

    Neural Composition: Learning to Generate from Multiple Models

    Authors: Denis Filimonov, Ravi Teja Gadde, Ariya Rastrow

    Abstract: Decomposing models into multiple components is critically important in many applications such as language modeling (LM) as it enables adapting individual components separately and biasing of some components to the user's personal preferences. Conventionally, contextual and personalized adaptation for language models, are achieved through class-based factorization, which requires class-annotated da…

    Submitted 9 November, 2020; v1 submitted 10 July, 2020; originally announced July 2020.

    Comments: Self-Supervised Learning for Speech and Audio Processing Workshop @ NeurIPS 2020

    ACM Class: I.2.6; I.2.7

  6. arXiv:1904.03288

    eess.AS cs.CL cs.LG cs.SD

    Jasper: An End-to-End Convolutional Neural Acoustic Model

    Authors: Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde

    Abstract: In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data. Our model, Jasper, uses only 1D convolutions, batch normalization, ReLU, dropout, and residual connections. To improve training, we further introduce a new layer-wise optimizer called NovoGrad. Through experiments, we demonstrate that the proposed deep arc…

    Submitted 26 August, 2019; v1 submitted 5 April, 2019; originally announced April 2019.

    Comments: Accepted to INTERSPEECH 2019