Showing 1–13 of 13 results for author: Gadde, R

  1. arXiv:2308.09716

    cs.CV cs.AI

    Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization

    Authors: Soumik Mukhopadhyay, Saksham Suri, Ravi Teja Gadde, Abhinav Shrivastava

    Abstract: The task of lip synchronization (lip-sync) seeks to match the lips of human faces with different audio. It has various applications in the film industry as well as for creating virtual avatars and for video conferencing. This is a challenging problem as one needs to simultaneously introduce detailed, realistic lip movements while preserving the identity, pose, emotions, and image quality. Many of…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Website: see https://soumik-kanad.github.io/diff2lip . Submission under review

  2. arXiv:2202.11833

    cs.CV cs.AI

    Near Perfect GAN Inversion

    Authors: Qianli Feng, Viraj Shah, Raghudeep Gadde, Pietro Perona, Aleix Martinez

    Abstract: To edit a real photo using Generative Adversarial Networks (GANs), we need a GAN inversion algorithm to identify the latent vector that perfectly reproduces it. Unfortunately, whereas existing inversion algorithms can synthesize images similar to real photos, they cannot generate the identical clones needed in most applications. Here, we derive an algorithm that achieves near perfect reconstructio…

    Submitted 23 February, 2022; originally announced February 2022.

  3. arXiv:2201.10423

    cs.CV cs.GR cs.LG

    Rayleigh EigenDirections (REDs): GAN latent space traversals for multidimensional features

    Authors: Guha Balakrishnan, Raghudeep Gadde, Aleix Martinez, Pietro Perona

    Abstract: We present a method for finding paths in a deep generative model's latent space that can maximally vary one set of image features while holding others constant. Crucially, unlike past traversal approaches, ours can manipulate multidimensional features of an image such as facial identity and pixels within a specified region. Our method is principled and conceptually simple: optimal traversal direct…

    Submitted 25 January, 2022; originally announced January 2022.

  4. arXiv:2112.08718

    cs.CL cs.LG

    Prompt Tuning GPT-2 language model for parameter-efficient domain adaptation of ASR systems

    Authors: Saket Dingliwal, Ashish Shenoy, Sravan Bodapati, Ankur Gandhe, Ravi Teja Gadde, Katrin Kirchhoff

    Abstract: Automatic Speech Recognition (ASR) systems have found their use in numerous industrial applications in very diverse domains creating a need to adapt to new domains with small memory and deployment overhead. In this work, we introduce domain-prompts, a methodology that involves training a small number of domain embedding parameters to prime a Transformer-based Language Model (LM) to a particular do…

    Submitted 21 July, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted at InterSpeech 2022

  5. arXiv:2110.06502

    cs.CL

    Prompt-tuning in ASR systems for efficient domain-adaptation

    Authors: Saket Dingliwal, Ashish Shenoy, Sravan Bodapati, Ankur Gandhe, Ravi Teja Gadde, Katrin Kirchhoff

    Abstract: Automatic Speech Recognition (ASR) systems have found their use in numerous industrial applications in very diverse domains. Since domain-specific systems perform better than their generic counterparts on in-domain evaluation, the need for memory and compute-efficient domain adaptation is obvious. Particularly, adapting parameter-heavy transformer-based language models used for rescoring ASR hypot…

    Submitted 22 October, 2021; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: WeCNLP 2021 camera-ready

  6. arXiv:2108.00082

    cs.CL cs.AI

    Towards Continual Entity Learning in Language Models for Conversational Agents

    Authors: Ravi Teja Gadde, Ivan Bulyko

    Abstract: Neural language models (LM) trained on diverse corpora are known to work well on previously seen entities, however, updating these models with dynamically changing entities such as place names, song titles and shopping items requires re-training from scratch and collecting full sentences containing these entities. We aim to address this issue, by introducing entity-aware language models (EALM), wh…

    Submitted 14 September, 2021; v1 submitted 30 July, 2021; originally announced August 2021.

    Comments: Submitted to NeurIPS 2021. Paper is under review

  7. arXiv:2007.16013

    cs.CL cs.LG stat.ML

    Neural Composition: Learning to Generate from Multiple Models

    Authors: Denis Filimonov, Ravi Teja Gadde, Ariya Rastrow

    Abstract: Decomposing models into multiple components is critically important in many applications such as language modeling (LM) as it enables adapting individual components separately and biasing of some components to the user's personal preferences. Conventionally, contextual and personalized adaptation for language models, are achieved through class-based factorization, which requires class-annotated da…

    Submitted 9 November, 2020; v1 submitted 10 July, 2020; originally announced July 2020.

    Comments: Self-Supervised Learning for Speech and Audio Processing Workshop @ NeurIPS 2020

    ACM Class: I.2.6; I.2.7

  8. arXiv:1904.03288

    eess.AS cs.CL cs.LG cs.SD

    Jasper: An End-to-End Convolutional Neural Acoustic Model

    Authors: Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde

    Abstract: In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data. Our model, Jasper, uses only 1D convolutions, batch normalization, ReLU, dropout, and residual connections. To improve training, we further introduce a new layer-wise optimizer called NovoGrad. Through experiments, we demonstrate that the proposed deep arc…

    Submitted 26 August, 2019; v1 submitted 5 April, 2019; originally announced April 2019.

    Comments: Accepted to INTERSPEECH 2019

  9. arXiv:1811.00707

    cs.CL cs.LG cs.SD eess.AS

    Training Neural Speech Recognition Systems with Synthetic Speech Augmentation

    Authors: Jason Li, Ravi Gadde, Boris Ginsburg, Vitaly Lavrukhin

    Abstract: Building an accurate automatic speech recognition (ASR) system requires a large dataset that contains many hours of labeled speech samples produced by a diverse set of speakers. The lack of such open free datasets is one of the main issues preventing advancements in ASR research. To address this problem, we propose to augment a natural speech dataset with synthetic speech. We train very large end-…

    Submitted 1 November, 2018; originally announced November 2018.

    Comments: Pre-print. Work in progress, 5 pages, 1 figure

  10. arXiv:1708.03088

    cs.CV

    Semantic Video CNNs through Representation Warping

    Authors: Raghudeep Gadde, Varun Jampani, Peter V. Gehler

    Abstract: In this work, we propose a technique to convert CNN models for semantic segmentation of static images into CNNs for video data. We describe a warping method that can be used to augment existing architectures with very little extra computational cost. This module is called NetWarp and we demonstrate its use for a range of network architectures. The main design principle is to use optical flow of ad…

    Submitted 10 August, 2017; originally announced August 2017.

    Comments: ICCV 2017

  11. arXiv:1612.05478

    cs.CV

    Video Propagation Networks

    Authors: Varun Jampani, Raghudeep Gadde, Peter V. Gehler

    Abstract: We propose a technique that propagates information forward through video data. The method is conceptually simple and can be applied to tasks that require the propagation of structured information, such as semantic labels, based on video content. We propose a 'Video Propagation Network' that processes video frames in an adaptive manner. The model is applied online: it propagates information forward…

    Submitted 11 April, 2017; v1 submitted 16 December, 2016; originally announced December 2016.

    Comments: Appearing in Computer Vision and Pattern Recognition, 2017 (CVPR'17)

  12. arXiv:1606.06437

    cs.CV

    Efficient 2D and 3D Facade Segmentation using Auto-Context

    Authors: Raghudeep Gadde, Varun Jampani, Renaud Marlet, Peter V. Gehler

    Abstract: This paper introduces a fast and efficient segmentation technique for 2D images and 3D point clouds of building facades. Facades of buildings are highly structured and consequently most methods that have been proposed for this problem aim to make use of this strong prior information. Contrary to most prior work, we are describing a system that is almost domain independent and consists of standard…

    Submitted 21 June, 2016; originally announced June 2016.

    Comments: 8 pages

  13. arXiv:1511.06739

    cs.CV

    Superpixel Convolutional Networks using Bilateral Inceptions

    Authors: Raghudeep Gadde, Varun Jampani, Martin Kiefel, Daniel Kappler, Peter V. Gehler

    Abstract: In this paper we propose a CNN architecture for semantic image segmentation. We introduce a new 'bilateral inception' module that can be inserted in existing CNN architectures and performs bilateral filtering, at multiple feature-scales, between superpixels in an image. The feature spaces for bilateral filtering and other parameters of the module are learned end-to-end using standard backpropagati…

    Submitted 8 August, 2016; v1 submitted 20 November, 2015; originally announced November 2015.

    Comments: European Conference on Computer Vision (ECCV), 2016

    ACM Class: I.2.10; I.2.6