Showing 1–50 of 80 results for author: Namboodiri, V P

  1. arXiv:2406.14201  [pdf, other]

    cs.CV

    Trusting Semantic Segmentation Networks

    Authors: Samik Some, Vinay P. Namboodiri

    Abstract: Semantic segmentation has become an important task in computer vision with the growth of self-driving cars, medical image segmentation, etc. Although current models provide excellent results, they are still far from perfect and while there has been significant work in trying to improve the performance, both with respect to accuracy and speed of segmentation, there has been little work which analys…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.10892  [pdf, other]

    cs.LG

    DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning

    Authors: Utsav Singh, Souradip Chakraborty, Wesley A. Suttle, Brian M. Sadler, Vinay P Namboodiri, Amrit Singh Bedi

    Abstract: Learning control policies to perform complex robotics tasks from human preference data presents significant challenges. On the one hand, the complexity of such tasks typically requires learning policies to perform a variety of subtasks, then combining them to achieve the overall goal. At the same time, comprehensive, well-engineered reward functions are typically unavailable in such problems, whil…

    Submitted 16 June, 2024; originally announced June 2024.

  3. arXiv:2406.05881  [pdf, other]

    cs.LG cs.CL cs.RO

    LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning

    Authors: Utsav Singh, Pramit Bhattacharyya, Vinay P. Namboodiri

    Abstract: Developing interactive systems that leverage natural language instructions to solve complex robotic control tasks has been a long-desired goal in the robotics community. Large Language Models (LLMs) have demonstrated exceptional abilities in handling complex tasks, including logical reasoning, in-context learning, and code generation. However, predicting low-level robotic actions using LLMs poses…

    Submitted 16 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  4. arXiv:2404.13423  [pdf, other]

    cs.LG

    PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling

    Authors: Utsav Singh, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Singh Bedi

    Abstract: In this work, we introduce PIPER: Primitive-Informed Preference-based Hierarchical reinforcement learning via Hindsight Relabeling, a novel approach that leverages preference-based learning to learn a reward model, and subsequently uses this reward model to relabel higher-level replay buffers. Since this reward is unaffected by lower primitive behavior, our relabeling-based approach is able to mit…

    Submitted 16 June, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

  5. arXiv:2403.18063  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Heracles: A Hybrid SSM-Transformer Model for High-Resolution Image and Time-Series Analysis

    Authors: Badri N. Patro, Suhas Ranganath, Vinay P. Namboodiri, Vijay S. Agneeswaran

    Abstract: Transformers have revolutionized image modeling tasks with adaptations like DeIT, Swin, SVT, Biformer, STVit, and FDVIT. However, these models often face challenges with inductive bias and high quadratic complexity, making them less efficient for high-resolution images. State space models (SSMs) such as Mamba, V-Mamba, ViM, and SiMBA offer an alternative to handle high resolution images in compute…

    Submitted 3 June, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  6. arXiv:2311.14029  [pdf, other]

    cs.CV cs.LG

    Understanding the Vulnerability of CLIP to Image Compression

    Authors: Cangxiong Chen, Vinay P. Namboodiri, Julian Padget

    Abstract: CLIP is a widely used foundational vision-language model that is used for zero-shot image recognition and other image-text alignment tasks. We demonstrate that CLIP is vulnerable to change in image quality under compression. This surprising result is further analysed using an attribution method, Integrated Gradients. Using this attribution method, we are able to better understand both quantitativel…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: R0-FoMo: Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models at NeurIPS 2023

  7. arXiv:2309.08227  [pdf, other]

    cs.LG cs.AI cs.CV

    VERSE: Virtual-Gradient Aware Streaming Lifelong Learning with Anytime Inference

    Authors: Soumya Banerjee, Vinay K. Verma, Avideep Mukherjee, Deepak Gupta, Vinay P. Namboodiri, Piyush Rai

    Abstract: Lifelong learning or continual learning is the problem of training an AI agent continuously while also preventing it from forgetting its previously acquired knowledge. Streaming lifelong learning is a challenging setting of lifelong learning with the goal of continuous learning in a dynamic non-stationary environment without forgetting. We introduce a novel approach to lifelong learning, which is…

    Submitted 19 February, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

  8. arXiv:2306.06394  [pdf, other]

    cs.LG

    PEAR: Primitive enabled Adaptive Relabeling for boosting Hierarchical Reinforcement Learning

    Authors: Utsav Singh, Vinay P. Namboodiri

    Abstract: Hierarchical reinforcement learning (HRL) has the potential to solve complex long horizon tasks using temporal abstraction and increased exploration. However, hierarchical agents are difficult to train due to inherent non-stationarity. We present primitive enabled adaptive relabeling (PEAR), a two-phase approach where we first perform adaptive relabeling on a few expert demonstrations to generate…

    Submitted 21 April, 2024; v1 submitted 10 June, 2023; originally announced June 2023.

  9. arXiv:2304.06446  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    SpectFormer: Frequency and Attention is what you need in a Vision Transformer

    Authors: Badri N. Patro, Vinay P. Namboodiri, Vijay Srinivas Agneeswaran

    Abstract: Vision transformers have been applied successfully for image recognition tasks. These have been either multi-headed self-attention based (ViT, DeIT), similar to the original work in textual models, or more recently based on spectral layers (Fnet, GFNet, AFNO). We hypothesize that b…

    Submitted 14 April, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: The project page is available at https://badripatro.github.io/SpectFormers/

  10. arXiv:2304.03535  [pdf, other]

    cs.LG

    CRISP: Curriculum inducing Primitive Informed Subgoal Prediction

    Authors: Utsav Singh, Vinay P. Namboodiri

    Abstract: Hierarchical reinforcement learning (HRL) is a promising approach that uses temporal abstraction to solve complex long horizon problems. However, simultaneously learning a hierarchy of policies is unstable, as it is challenging to train the higher-level policy when the lower-level primitive is non-stationary. In this paper, we present CRISP, a novel HRL algorithm that effectively generates a curriculum…

    Submitted 21 April, 2024; v1 submitted 7 April, 2023; originally announced April 2023.

  11. arXiv:2301.11892  [pdf, other]

    cs.LG cs.AI cs.CV

    Streaming LifeLong Learning With Any-Time Inference

    Authors: Soumya Banerjee, Vinay Kumar Verma, Vinay P. Namboodiri

    Abstract: Despite rapid advancements in lifelong learning (LLL) research, a large body of research mainly focuses on improving the performance in the existing static continual learning (CL) setups. These methods lack the ability to succeed in a rapidly changing dynamic environment, where an AI agent needs to quickly learn new instances in a 'single pass' from the non-i.i.d (also possibly t…

    Submitted 27 January, 2023; originally announced January 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2110.10741

  12. arXiv:2210.16579  [pdf, other]

    cs.CV

    INR-V: A Continuous Representation Space for Video-based Generative Tasks

    Authors: Bipasha Sen, Aditya Agarwal, Vinay P Namboodiri, C. V. Jawahar

    Abstract: Generating videos is a complex task that is accomplished by generating a set of temporally coherent images frame-by-frame. This limits the expressivity of videos to only image-based operations on the individual video frames needing network designs to obtain temporally coherent trajectories in the underlying image space. We propose INR-V, a video representation network that learns a continuous spac…

    Submitted 2 April, 2023; v1 submitted 29 October, 2022; originally announced October 2022.

    Comments: Published in Transactions on Machine Learning Research (10/2022); https://openreview.net/forum?id=aIoEkwc2oB

  13. arXiv:2210.03692  [pdf, other]

    cs.CV

    Compressing Video Calls using Synthetic Talking Heads

    Authors: Madhav Agarwal, Anchit Gupta, Rudrabha Mukhopadhyay, Vinay P. Namboodiri, C V Jawahar

    Abstract: We leverage the modern advancements in talking head generation to propose an end-to-end system for talking head video compression. Our algorithm transmits pivot frames intermittently while the rest of the talking head video is generated by animating them. We use a state-of-the-art face reenactment network to detect key points in the non-pivot frames and transmit them to the receiver. A dense flow…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: British Machine Vision Conference (BMVC), 2022

  14. arXiv:2209.00642  [pdf, other]

    cs.CV cs.CL cs.SD eess.AS

    Lip-to-Speech Synthesis for Arbitrary Speakers in the Wild

    Authors: Sindhu B Hegde, K R Prajwal, Rudrabha Mukhopadhyay, Vinay P Namboodiri, C. V. Jawahar

    Abstract: In this work, we address the problem of generating speech from silent lip videos for any speaker in the wild. In stark contrast to previous works, our method (i) is not restricted to a fixed number of speakers, (ii) does not explicitly impose constraints on the domain or the vocabulary and (iii) deals with videos that are recorded in the wild as opposed to within laboratory settings. The task pres…

    Submitted 1 September, 2022; originally announced September 2022.

    Comments: Accepted in ACM-MM 2022, 9 pages, 2 pages supplementary, 7 Figures

  15. Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors

    Authors: Sindhu B Hegde, Rudrabha Mukhopadhyay, Vinay P Namboodiri, C. V. Jawahar

    Abstract: In this paper, we explore an interesting question of what can be obtained from an $8\times8$ pixel video sequence. Surprisingly, it turns out to be quite a lot. We show that when we process this $8\times8$ video with the right set of audio and image priors, we can obtain a full-length, $256\times256$ video. We achieve this $32\times$ scaling of an extremely low-resolution input using our novel aud…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: Accepted in ACM-MM 2022, 10 pages, 6 pages supplementary, 18 Figures

  16. arXiv:2206.02050  [pdf, other]

    cs.CV cs.SD eess.AS

    Learning Speaker-specific Lip-to-Speech Generation

    Authors: Munender Varshney, Ravindra Yadav, Vinay P. Namboodiri, Rajesh M Hegde

    Abstract: Understanding lip movement and inferring speech from it is notoriously difficult for the common person. The task of accurate lip-reading gets help from various cues of the speaker and their contextual or environmental setting. Every speaker has a different accent and speaking style, which can be inferred from their visual and speech features. This work aims to understand the correlation/mapp…

    Submitted 20 August, 2022; v1 submitted 4 June, 2022; originally announced June 2022.

    Comments: Accepted at ICPR 2022

  17. arXiv:2202.10943  [pdf, other]

    cs.LG cs.AI cs.CV

    Gradient Based Activations for Accurate Bias-Free Learning

    Authors: Vinod K Kurmi, Rishabh Sharma, Yash Vardhan Sharma, Vinay P. Namboodiri

    Abstract: Bias mitigation in machine learning models is imperative, yet challenging. While several approaches have been proposed, one view towards mitigating bias is through adversarial learning. A discriminator is used to identify the bias attributes such as gender, age or race in question. This discriminator is used adversarially to ensure that it cannot distinguish the bias attributes. The main drawback…

    Submitted 16 February, 2022; originally announced February 2022.

    Comments: AAAI 2022 (Accepted)

  18. arXiv:2110.10741  [pdf, other]

    cs.LG cs.AI cs.CV

    Class Incremental Online Streaming Learning

    Authors: Soumya Banerjee, Vinay Kumar Verma, Toufiq Parag, Maneesh Singh, Vinay P. Namboodiri

    Abstract: A wide variety of methods have been developed to enable lifelong learning in conventional deep neural networks. However, to succeed, these methods require a 'batch' of samples to be available and visited multiple times during training. While this works well in a static setting, these methods continue to suffer in a more realistic situation where data arrives in an online streaming manner. We e…

    Submitted 20 October, 2021; originally announced October 2021.

  19. Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor

    Authors: Anchit Gupta, Faizan Farooq Khan, Rudrabha Mukhopadhyay, Vinay P. Namboodiri, C. V. Jawahar

    Abstract: This paper proposes a video editor based on OpenShot with several state-of-the-art facial video editing algorithms as added functionalities. Our editor provides an easy-to-use interface to apply modern lip-syncing algorithms interactively. Apart from lip-syncing, the editor also uses audio and facial re-enactment to generate expressive talking faces. The manual control improves the overall experie…

    Submitted 16 October, 2021; originally announced October 2021.

    Comments: 9 pages, 7 figures, accepted in ICVGIP 2021

  20. arXiv:2109.12135  [pdf, other]

    cs.CV

    Attentive Contractive Flow with Lipschitz-constrained Self-Attention

    Authors: Avideep Mukherjee, Badri Narayan Patro, Vinay P. Namboodiri

    Abstract: Normalizing flows provide an elegant method for obtaining tractable density estimates from distributions by using invertible transformations. The main challenge is to improve the expressivity of the models while keeping the invertibility constraints intact. We propose to do so via the incorporation of localized self-attention. However, conventional self-attention mechanisms don't satisfy the requi…

    Submitted 6 September, 2023; v1 submitted 24 September, 2021; originally announced September 2021.

    Comments: 10 pages, to be published at BMVC 2023

  21. arXiv:2107.09622  [pdf, other]

    cs.CL

    More Parameters? No Thanks!

    Authors: Zeeshan Khan, Kartheek Akella, Vinay P. Namboodiri, C V Jawahar

    Abstract: This work studies the long-standing problems of model capacity and negative interference in multilingual neural machine translation (MNMT). We use network pruning techniques and observe that pruning 50-70% of the parameters from a trained MNMT model results in only a 0.29-1.98 drop in the BLEU score, suggesting that there exist large redundancies even in MNMT models. These observations motivate us t…

    Submitted 20 July, 2021; originally announced July 2021.

  22. arXiv:2107.05241  [pdf, other]

    cs.LG cs.CV stat.ML

    Prb-GAN: A Probabilistic Framework for GAN Modelling

    Authors: Blessen George, Vinod K. Kurmi, Vinay P. Namboodiri

    Abstract: Generative adversarial networks (GANs) are very popular for generating realistic images, but they often suffer from training instability and the phenomenon of mode loss. To attain greater diversity in GAN-synthesized data, it is critical to solve the problem of mode loss. Our work explores probabilistic approaches to GAN modelling that could allow us to tackle these issues. We p…

    Submitted 12 July, 2021; originally announced July 2021.

  23. arXiv:2107.04231  [pdf, other]

    cs.LG cs.AI cs.CV

    Exploring Dropout Discriminator for Domain Adaptation

    Authors: Vinod K Kurmi, Venkatesh K Subramanian, Vinay P. Namboodiri

    Abstract: Adaptation of a classifier to new domains is one of the challenging problems in machine learning. This has been addressed using many deep and non-deep learning based methods. Among the methodologies used, that of adversarial learning is widely applied to solve many deep learning problems along with domain adaptation. These methods are based on a discriminator that ensures source and target distrib…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: This work is an extension of our BMVC-2019 paper (arXiv:1907.10628)

  24. arXiv:2107.00727  [pdf, other]

    cs.LG cs.CV

    Mitigating Uncertainty of Classifier for Unsupervised Domain Adaptation

    Authors: Shanu Kumar, Vinod Kumar Kurmi, Praphul Singh, Vinay P Namboodiri

    Abstract: Understanding unsupervised domain adaptation has been an important task that has been well explored. However, the wide variety of methods have not analyzed the role of a classifier's performance in detail. In this paper, we thoroughly examine the role of a classifier in terms of matching source and target distributions. We specifically investigate the classifier ability by matching a) the distribu…

    Submitted 1 July, 2021; originally announced July 2021.

  25. arXiv:2107.00067  [pdf, other]

    cs.CV

    Fair Visual Recognition in Limited Data Regime using Self-Supervision and Self-Distillation

    Authors: Pratik Mazumder, Pravendra Singh, Vinay P. Namboodiri

    Abstract: Deep learning models generally learn the biases present in the training data. Researchers have proposed several approaches to mitigate such biases and make the model fair. Bias mitigation techniques assume that a sufficiently large number of training examples are present. However, we observe that if the training data is limited, then the effectiveness of bias mitigation methods is severely degrade…

    Submitted 30 June, 2021; originally announced July 2021.

    Comments: Under Review

  26. arXiv:2104.02656  [pdf, other]

    cs.CV cs.AI cs.GR cs.MM cs.SD eess.AS eess.IV

    Collaborative Learning to Generate Audio-Video Jointly

    Authors: Vinod K Kurmi, Vipul Bajaj, Badri N Patro, K S Venkatesh, Vinay P Namboodiri, Preethi Jyothi

    Abstract: There have been a number of techniques that have demonstrated the generation of multimedia data for one modality at a time using GANs, such as the ability to generate images, videos, and audio. However, so far, the task of multi-modal generation of data, specifically for audio and videos both, has not been sufficiently well-explored. Towards this, we propose a method that demonstrates that we are…

    Submitted 31 March, 2021; originally announced April 2021.

    Comments: ICASSP 2021 (Accepted)

  27. arXiv:2103.16597  [pdf, other]

    cs.CV

    Rectification-based Knowledge Retention for Continual Learning

    Authors: Pravendra Singh, Pratik Mazumder, Piyush Rai, Vinay P. Namboodiri

    Abstract: Deep learning models suffer from catastrophic forgetting when trained in an incremental learning setting. In this work, we propose a novel approach to address the task incremental learning problem, which involves training a model on new tasks that arrive in an incremental manner. The task incremental learning problem becomes even more challenging when the test set contains classes that are not par…

    Submitted 30 March, 2021; originally announced March 2021.

    Comments: Accepted in CVPR 2021

  28. arXiv:2102.09003  [pdf, other]

    cs.CV cs.AI

    Domain Impression: A Source Data Free Domain Adaptation Method

    Authors: Vinod K Kurmi, Venkatesh K Subramanian, Vinay P Namboodiri

    Abstract: Unsupervised domain adaptation methods solve the adaptation problem for an unlabeled target set, assuming that the source dataset is available with all labels. However, actual source samples are not always available in practical cases. This could be due to memory constraints, privacy concerns, and challenges in sharing data. This practical scenario creates a bottleneck in the domai…

    Submitted 17 February, 2021; originally announced February 2021.

    Comments: Published in WACV 2021

  29. arXiv:2102.01906  [pdf, other]

    cs.LG cs.CV

    Do Not Forget to Attend to Uncertainty while Mitigating Catastrophic Forgetting

    Authors: Vinod K Kurmi, Badri N. Patro, Venkatesh K. Subramanian, Vinay P. Namboodiri

    Abstract: One of the major limitations of deep learning models is that they face catastrophic forgetting in an incremental learning scenario. There have been several approaches proposed to tackle the problem of incremental learning. Most of these methods are based on knowledge distillation and do not adequately utilize the information provided by older task models, such as uncertainty estimation in predicti…

    Submitted 3 February, 2021; originally announced February 2021.

    Comments: Accepted WACV 2021

    Journal ref: WACV 2021

  30. arXiv:2012.05786  [pdf, other]

    cs.CL

    Exploring Pair-Wise NMT for Indian Languages

    Authors: Kartheek Akella, Sai Himal Allu, Sridhar Suresh Ragupathi, Aman Singhal, Zeeshan Khan, Vinay P. Namboodiri, C V Jawahar

    Abstract: In this paper, we address the task of improving pair-wise machine translation for specific low resource Indian languages. Multilingual NMT models have demonstrated a reasonable amount of effectiveness on resource-poor languages. In this work, we show that the performance of these models can be significantly improved upon by using back-translation through a filtered back-translation process and sub…

    Submitted 10 December, 2020; originally announced December 2020.

    Comments: ICON 2020 Short paper

  31. arXiv:2011.11067  [pdf, other]

    cs.CV

    RNNP: A Robust Few-Shot Learning Approach

    Authors: Pratik Mazumder, Pravendra Singh, Vinay P. Namboodiri

    Abstract: Learning from a few examples is an important practical aspect of training classifiers. Various works have examined this aspect quite well. However, all existing approaches assume that the few examples provided are always correctly labeled. This is a strong assumption, especially if one considers the current techniques for labeling using crowd-based labeling services. We address this issue by propo…

    Submitted 22 November, 2020; originally announced November 2020.

    Comments: Accepted in WACV 2021

  32. arXiv:2011.10727  [pdf, other]

    cs.CV

    Stochastic Talking Face Generation Using Latent Distribution Matching

    Authors: Ravindra Yadav, Ashish Sardana, Vinay P Namboodiri, Rajesh M Hegde

    Abstract: The ability to envisage the visual of a talking face based just on hearing a voice is a unique human capability. There have been a number of works that have solved for this ability recently. We differ from these approaches by enabling a variety of talking face generations based on single audio input. Indeed, just having the ability to generate a single talking face would make a system almost robot…

    Submitted 21 November, 2020; originally announced November 2020.

    Comments: InterSpeech 2020

  33. arXiv:2011.07340  [pdf, other]

    cs.CV

    Speech Prediction in Silent Videos using Variational Autoencoders

    Authors: Ravindra Yadav, Ashish Sardana, Vinay P Namboodiri, Rajesh M Hegde

    Abstract: Understanding the relationship between the auditory and visual signals is crucial for many different applications ranging from computer-generated imagery (CGI) and video editing automation to assisting people with hearing or visual impairments. However, this is challenging since the distribution of both audio and visual modality is inherently multimodal. Therefore, most of the existing methods ign…

    Submitted 14 November, 2020; originally announced November 2020.

  34. SHAD3S: A model to Sketch, Shade and Shadow

    Authors: Raghav B. Venkataramaiyer, Abhishek Joshi, Saisha Narang, Vinay P. Namboodiri

    Abstract: Hatching is a common method used by artists to accentuate the third dimension of a sketch, and to illuminate the scene. Our system SHAD3S attempts to compete with a human at hatching generic three-dimensional (3D) shapes, and also tries to assist her in a form exploration exercise. The novelty of our approach lies in the fact that we make no assumptions about the input other than that it represent…

    Submitted 4 September, 2021; v1 submitted 13 November, 2020; originally announced November 2020.

    Comments: 10 pages, 11 figures, 2 tables. Accepted to WACV 2021. Project Page: https://bvraghav.com/shad3s/

    Journal ref: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 3615-3624

  35. arXiv:2008.11451  [pdf, other]

    cs.CV

    Determinantal Point Process as an alternative to NMS

    Authors: Samik Some, Mithun Das Gupta, Vinay P. Namboodiri

    Abstract: We present a determinantal point process (DPP) inspired alternative to non-maximum suppression (NMS), which has become an integral step in all state-of-the-art object detection frameworks. DPPs have been shown to encourage diversity in subset selection problems. We pose NMS as a subset selection problem and posit that directly incorporating a DPP-like framework can improve the overall performance of…

    Submitted 20 June, 2024; v1 submitted 26 August, 2020; originally announced August 2020.

    Comments: Published in BMVC 2020

  36. Revisiting Low Resource Status of Indian Languages in Machine Translation

    Authors: Jerin Philip, Shashank Siripragada, Vinay P. Namboodiri, C. V. Jawahar

    Abstract: Indian language machine translation performance is hampered due to the lack of large scale multi-lingual sentence aligned corpora and robust benchmarks. Through this paper, we provide and analyse an automated framework to obtain such a corpus for Indian language neural machine translation (NMT) systems. Our pipeline consists of a baseline NMT system, a retrieval module, and an alignment module tha…

    Submitted 4 November, 2020; v1 submitted 11 August, 2020; originally announced August 2020.

    Comments: 10 pages, few figures, Preprint under review

    Journal ref: 8th ACM IKDD CODS and 26th COMAD (CODS COMAD 2021), January 2-4, 2021, Bangalore, India

  37. arXiv:2007.07691  [pdf]

    cs.CL

    A Multilingual Parallel Corpora Collection Effort for Indian Languages

    Authors: Shashank Siripragada, Jerin Philip, Vinay P. Namboodiri, C V Jawahar

    Abstract: We present sentence aligned parallel corpora across 10 Indian Languages - Hindi, Telugu, Tamil, Malayalam, Gujarati, Urdu, Bengali, Oriya, Marathi, Punjabi, and English - many of which are categorized as low resource. The corpora are compiled from online sources which have content shared across languages. The corpora presented significantly extend present resources that are either not large enoug…

    Submitted 15 July, 2020; originally announced July 2020.

    Comments: 9 pages. Accepted in LREC 2020

  38. Learning to Switch CNNs with Model Agnostic Meta Learning for Fine Precision Visual Servoing

    Authors: Prem Raj, Vinay P. Namboodiri, L. Behera

    Abstract: Convolutional Neural Networks (CNNs) have been successfully applied for relative camera pose estimation from labeled image-pair data, without requiring any hand-engineered features, camera intrinsic parameters or depth information. The trained CNN can be utilized for performing pose based visual servo control (PBVS). One of the ways to improve the quality of visual servo output is to improve the a…

    Submitted 9 July, 2020; originally announced July 2020.

    Comments: Accepted in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-2020). For video visit - https://youtu.be/GSG20lmWDUo

    Journal ref: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 10210-10217

  39. arXiv:2006.15919  [pdf, other]

    cs.CV

    Improving Few-Shot Learning using Composite Rotation based Auxiliary Task

    Authors: Pratik Mazumder, Pravendra Singh, Vinay P. Namboodiri

    Abstract: In this paper, we propose an approach to improve few-shot classification performance using a composite rotation based auxiliary task. Few-shot classification methods aim to produce neural networks that perform well for classes with a large number of training samples and classes with fewer training samples. They employ techniques to enable the network to produce highly discriminative featu…

    Submitted 22 November, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: Accepted in WACV 2021

  40. arXiv:2006.04406  [pdf, ps, other]

    cs.CV cs.LG

    Passive Batch Injection Training Technique: Boosting Network Performance by Injecting Mini-Batches from a different Data Distribution

    Authors: Pravendra Singh, Pratik Mazumder, Vinay P. Namboodiri

    Abstract: This work presents a novel training technique for deep neural networks that makes use of additional data from a distribution that is different from that of the original input data. This technique aims to reduce overfitting and improve the generalization performance of the network. Our proposed technique, namely Passive Batch Injection Training Technique (PBITT), even reduces the level of overfitti…

    Submitted 8 June, 2020; originally announced June 2020.

    Comments: Accepted in IJCNN 2020

  41. arXiv:2005.13402  [pdf, other]

    cs.CV cs.SD eess.AS

    AVGZSLNet: Audio-Visual Generalized Zero-Shot Learning by Reconstructing Label Features from Multi-Modal Embeddings

    Authors: Pratik Mazumder, Pravendra Singh, Kranti Kumar Parida, Vinay P. Namboodiri

    Abstract: In this paper, we propose a novel approach for generalized zero-shot learning in a multi-modal setting, where we have novel classes of audio/video during testing that are not seen during training. We use the semantic relatedness of text embeddings as a means for zero-shot learning by aligning audio and video embeddings with the corresponding class label text feature space. Our approach uses a cros…

    Submitted 23 November, 2020; v1 submitted 27 May, 2020; originally announced May 2020.

    Comments: Accepted in WACV 2021

  42. arXiv:2005.12892  [pdf, other]

    cs.CV cs.LG eess.IV

    Minimizing Supervision in Multi-label Categorization

    Authors: Rajat, Munender Varshney, Pravendra Singh, Vinay P. Namboodiri

    Abstract: Multiple categories of objects are present in most images. Treating this as a multi-class classification is not justified. We treat this as a multi-label classification problem. In this paper, we further aim to minimize the supervision required for multi-label classification. Specifically, we investigate an effective class of approaches that associate a weak localization w…

    Submitted 26 May, 2020; originally announced May 2020.

    Comments: Accepted in CVPR-W 2020

  43. arXiv:2002.10309  [pdf, other]

    cs.CV cs.CL cs.LG

    Uncertainty based Class Activation Maps for Visual Question Answering

    Authors: Badri N. Patro, Mayank Lunayach, Vinay P. Namboodiri

    Abstract: Understanding and explaining deep learning models is an imperative task. Towards this, we propose a method that obtains gradient-based certainty estimates that also provide visual attention maps. In particular, we address the visual question answering task. We incorporate modern probabilistic deep learning methods that we further improve by using the gradients for these estimates. These have two-fold…

    Submitted 23 January, 2020; originally announced February 2020.

    Comments: This work is an extension of our ICCV-2019 work. arXiv admin note: text overlap with arXiv:1908.06306

  44. arXiv:2001.08779  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Deep Bayesian Network for Visual Question Generation

    Authors: Badri N. Patro, Vinod K. Kurmi, Sandeep Kumar, Vinay P. Namboodiri

    Abstract: Generating natural questions from an image is a semantic task that requires using vision and language modalities to learn multimodal representations. Images can have multiple visual and language cues such as places, captions, and tags. In this paper, we propose a principled deep Bayesian learning framework that combines these cues to produce natural questions. We observe that with the addition of…

    Submitted 23 January, 2020; originally announced January 2020.

    Comments: WACV-2020 (Accepted)

  45. arXiv:2001.08730  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Robust Explanations for Visual Question Answering

    Authors: Badri N. Patro, Shivansh Pate, Vinay P. Namboodiri

    Abstract: In this paper, we propose a method to obtain robust explanations for visual question answering (VQA) that correlate well with the answers. Our model explains the answers obtained through a VQA model by providing visual and textual explanations. The main challenges that we address are i) Answers and textual explanations obtained by current methods are not well correlated and ii) Current methods for…

    Submitted 23 January, 2020; originally announced January 2020.

    Comments: WACV-2020 (Accepted)

  46. arXiv:2001.05545  [pdf, other]

    cs.CV cs.LG stat.ML

    A "Network Pruning Network" Approach to Deep Model Compression

    Authors: Vinay Kumar Verma, Pravendra Singh, Vinay P. Namboodiri, Piyush Rai

    Abstract: We present a filter pruning approach for deep model compression, using a multitask network. Our approach is based on learning a pruner network to prune a pre-trained target network. The pruner is essentially a multitask deep neural network with binary outputs that help identify the filters from each layer of the original network that do not have any significant contribution to the model and can…

    Submitted 15 January, 2020; originally announced January 2020.

    Comments: Accepted in WACV'20

  47. arXiv:2001.01240  [pdf, other]

    cs.CV cs.LG

    Cooperative Initialization based Deep Neural Network Training

    Authors: Pravendra Singh, Munender Varshney, Vinay P. Namboodiri

    Abstract: Researchers have proposed various activation functions. These activation functions help the deep network to learn non-linear behavior with a significant effect on training dynamics and task performance. The performance of these activations also depends on the initial state of the weight parameters, i.e., different initial states lead to a difference in the performance of a network. In this paper,…

    Submitted 5 January, 2020; originally announced January 2020.

    Comments: IEEE Winter Conference on Applications of Computer Vision (WACV), 2020

  48. arXiv:1912.13149  [pdf, other]

    cs.CL cs.LG

    Revisiting Paraphrase Question Generator using Pairwise Discriminator

    Authors: Badri N. Patro, Dev Chauhan, Vinod K. Kurmi, Vinay P. Namboodiri

    Abstract: In this paper, we propose a method for obtaining sentence-level embeddings. While the problem of securing word-level embeddings is very well studied, we propose a novel method for obtaining sentence-level embeddings. This is obtained by a simple method in the context of solving the paraphrase generation task. If we use a sequential encoder-decoder model for generating paraphrase, we would like the…

    Submitted 4 January, 2020; v1 submitted 30 December, 2019; originally announced December 2019.

    Comments: This work is an extension of our COLING-2018 paper arXiv:1806.00807

  49. arXiv:1912.09551  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Deep Exemplar Networks for VQA and VQG

    Authors: Badri N. Patro, Vinay P. Namboodiri

    Abstract: In this paper, we consider the problem of solving semantic tasks such as `Visual Question Answering' (VQA), where one aims to answer questions related to an image, and `Visual Question Generation' (VQG), where one aims to generate a natural question pertaining to an image. Solutions for VQA and VQG tasks have been proposed using variants of encoder-decoder deep learning based frameworks that have shown impr…

    Submitted 19 December, 2019; originally announced December 2019.

    Comments: This work is an extension of CVPR-2018 accepted paper arXiv:1804.00298 and EMNLP-2018 accepted paper arXiv:1808.03986

  50. arXiv:1912.07991  [pdf, other]

    cs.LG cs.CV stat.ML

    Jointly Trained Image and Video Generation using Residual Vectors

    Authors: Yatin Dandi, Aniket Das, Soumye Singhal, Vinay P. Namboodiri, Piyush Rai

    Abstract: In this work, we propose a modeling technique for jointly training image and video generation models by simultaneously learning to map latent variables with a fixed prior onto real images and interpolate over images to generate videos. The proposed approach models the variations in representations using residual vectors encoding the change at each time step over a summary vector for the entire vid…

    Submitted 17 December, 2019; originally announced December 2019.

    Comments: Accepted in 2020 Winter Conference on Applications of Computer Vision (WACV '20)