
LoMOE: Localized Multi-Object Editing via Multi-Diffusion

Authors: Goirik Chakrabarty, Aditya Chandrasekar, Ramya Hebbalaguppe, Prathosh AP

Abstract: Recent developments in the field of diffusion models have demonstrated an exceptional capacity to generate high-quality prompt-conditioned image edits. Nevertheless, previous approaches have primarily relied on textual prompts for image editing, which tend to be less effective when making precise edits to specific objects or fine-grained regions within a scene containing single/multiple objects. We introduce a novel framework for zero-shot localized multi-object editing through a multi-diffusion process to overcome this challenge. This framework empowers users to perform various operations on objects within an image, such as adding, replacing, or editing $\textbf{many}$ objects in a complex scene $\textbf{in one pass}$. Our approach leverages foreground masks and corresponding simple text prompts that exert localized influences on the target regions resulting in high-fidelity image editing. A combination of cross-attention and background preservation losses within the latent space ensures that the characteristics of the object being edited are preserved while simultaneously achieving a high-quality, seamless reconstruction of the background with fewer artifacts compared to the current methods. We also curate and release a dataset dedicated to multi-object editing, named $\texttt{LoMOE}$-Bench. Our experiments against existing state-of-the-art methods demonstrate the improved effectiveness of our approach in terms of both image editing quality and inference speed.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 18 pages

arXiv:2212.10005 [pdf, other]

Calibrating Deep Neural Networks using Explicit Regularisation and Dynamic Data Pruning

Authors: Ramya Hebbalaguppe, Rishabh Patra, Tirtharaj Dash, Gautam Shroff, Lovekesh Vig

Abstract: Deep neural networks (DNN) are prone to miscalibrated predictions, often exhibiting a mismatch between the predicted output and the associated confidence scores. Contemporary model calibration techniques mitigate the problem of overconfident predictions by pushing down the confidence of the winning class while increasing the confidence of the remaining classes across all test samples. However, from a deployment perspective, an ideal model is desired to (i) generate well-calibrated predictions for high-confidence samples with predicted probability, say, >0.95, and (ii) generate a higher proportion of legitimate high-confidence samples. To this end, we propose a novel regularization technique that can be used with classification losses, leading to state-of-the-art calibrated predictions at test time. From a deployment standpoint in safety-critical applications, only high-confidence samples from a well-calibrated model are of interest, as the remaining samples have to undergo manual inspection. Predictive confidence reduction of these potentially "high-confidence samples" is a downside of existing calibration approaches. We mitigate this by proposing a dynamic train-time data pruning strategy that prunes low-confidence samples every few epochs, providing an increase in "confident yet calibrated samples". We demonstrate state-of-the-art calibration performance across image classification benchmarks, reducing training time without much compromise in accuracy. We provide insights into why our dynamic pruning strategy that prunes low-confidence training samples leads to an increase in high-confidence samples at test time.

Submitted 20 December, 2022; originally announced December 2022.

Comments: The paper is accepted at the Winter Conference on Applications of Computer Vision (IEEE WACV) in the algorithms track. 8 pages main paper; 3 pages supplementary material

arXiv:2207.13916 [pdf, other]

A Novel Data Augmentation Technique for Out-of-Distribution Sample Detection using Compounded Corruptions

Authors: Ramya S. Hebbalaguppe, Soumya Suvra Goshal, Jatin Prakash, Harshad Khadilkar, Chetan Arora

Abstract: Modern deep neural network models are known to erroneously classify out-of-distribution (OOD) test data into one of the in-distribution (ID) training classes with high confidence. This can have disastrous consequences for safety-critical applications. A popular mitigation strategy is to train a separate classifier that can detect such OOD samples at test time. In most practical settings OOD examples are not known at train time, and hence a key question is: how to augment the ID data with synthetic OOD samples for training such an OOD detector? In this paper, we propose a novel Compounded Corruption technique for OOD data augmentation, termed CnC. One of the major advantages of CnC is that it does not require any hold-out data apart from the training set. Further, unlike current state-of-the-art (SOTA) techniques, CnC does not require backpropagation or ensembling at test time, making our method much faster at inference. Our extensive comparison with 20 methods from the major conferences in the last 4 years shows that a model trained using CnC-based data augmentation significantly outperforms SOTA, both in terms of OOD detection accuracy and inference time. We include a detailed post-hoc analysis to investigate the reasons for the success of our method and identify higher relative entropy and diversity of CnC samples as probable causes. We also provide theoretical insights via a piece-wise decomposition analysis on a two-dimensional dataset to reveal (visually and quantitatively) that our approach leads to a tighter boundary around ID classes, leading to better detection of OOD samples. Source code link: https://github.com/cnc-ood

Submitted 21 September, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

Comments: 16 pages of the main text, and supplemental material. Accepted in Research Track ECML'22. Project webpage: https://cnc-ood.github.io/

arXiv:2203.13834 [pdf, other]

A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration

Authors: Ramya Hebbalaguppe, Jatin Prakash, Neelabh Madan, Chetan Arora

Abstract: Deep Neural Networks (DNNs) are known to make overconfident mistakes, which makes their use problematic in safety-critical applications. State-of-the-art (SOTA) calibration techniques improve on the confidence of predicted labels alone and leave the confidence of non-max classes (e.g. top-2, top-5) uncalibrated. Such calibration is not suitable for label refinement using post-processing. Further, most SOTA techniques learn a few hyper-parameters post-hoc, leaving out the scope for image- or pixel-specific calibration. This makes them unsuitable for calibration under domain shift, or for dense prediction tasks like semantic segmentation. In this paper, we argue for intervening at train time itself, so as to directly produce calibrated DNN models. We propose a novel auxiliary loss function, Multi-class Difference in Confidence and Accuracy (MDCA), to achieve the same. MDCA can be used in conjunction with other application/task-specific loss functions. We show that training with MDCA leads to better-calibrated models in terms of Expected Calibration Error (ECE) and Static Calibration Error (SCE) on image classification and segmentation tasks. We report an ECE (SCE) score of 0.72 (1.60) on the CIFAR-100 dataset, in comparison to 1.90 (1.71) by the SOTA. Under domain shift, a ResNet-18 model trained on the PACS dataset using MDCA gives an average ECE (SCE) score of 19.7 (9.7) across all domains, compared to 24.2 (11.8) by the SOTA. For the segmentation task, we report a 2X reduction in calibration error on the PASCAL-VOC dataset in comparison to Focal Loss. Finally, MDCA training improves calibration even on imbalanced data, and for natural language classification tasks. The code is available at https://github.com/mdca-loss

Submitted 25 March, 2022; originally announced March 2022.

Comments: Accepted in IEEE Computer Vision and Pattern Recognition 2022
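
Note on the ECE metric named in the abstract above: the listing does not define it, so the following is only a minimal sketch of the commonly used binned estimator, not the authors' evaluation code. The function name `expected_calibration_error`, the 10 equal-width bins, and the half-open bin convention are assumptions made for illustration. The result is a fraction in [0, 1].

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin size / N) * |accuracy - mean confidence|.

    confidences: predicted probability of the winning class, one per sample.
    correct: booleans, True where the prediction matched the label.
    n_bins: number of equal-width bins on [0, 1] (an assumed default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one sample is required")

    # Assign each sample to a bin [k/n_bins, (k+1)/n_bins); 1.0 goes to the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Two samples at confidence 0.9 (both right), two at 0.6 (one right).
    print(expected_calibration_error([0.9, 0.9, 0.6, 0.6], [True, True, True, False]))
```

In the toy call, each occupied bin has a gap of about 0.1 between accuracy and mean confidence and holds half the samples, so the estimate is approximately 0.1.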

arXiv:2111.00506 [pdf, other]

PnPOOD: Out-Of-Distribution Detection for Text Classification via Plug and Play Data Augmentation

Authors: Mrinal Rawat, Ramya Hebbalaguppe, Lovekesh Vig

Abstract: While out-of-distribution (OOD) detection has been well explored in computer vision, there have been relatively few prior attempts in OOD detection for NLP classification. In this paper we argue that these prior attempts do not fully address the OOD problem and may suffer from data leakage and poor calibration of the resulting models. We present PnPOOD, a data augmentation technique to perform OOD detection via out-of-domain sample generation using the recently proposed Plug and Play Language Model (Dathathri et al., 2020). Our method generates high-quality discriminative samples close to the class boundaries, resulting in accurate OOD detection at test time. We demonstrate that our model outperforms prior models on OOD sample detection, and exhibits lower calibration error on the 20 newsgroup text and Stanford Sentiment Treebank datasets (Lang, 1995; Socher et al., 2013). We further highlight an important data leakage issue with datasets used in prior attempts at OOD detection, and share results on a new dataset for OOD detection that does not suffer from the same problem.

Submitted 31 October, 2021; originally announced November 2021.

Report number: Accepted in Uncertainty in Deep Learning, ICML'21

arXiv:1911.01320 [pdf, other]

Synthetic Video Generation for Robust Hand Gesture Recognition in Augmented Reality Applications

Authors: Varun Jain, Shivam Aggarwal, Suril Mehta, Ramya Hebbalaguppe

Abstract: Hand gestures are a natural means of interaction in Augmented Reality and Virtual Reality (AR/VR) applications. Recently, there has been an increased focus on removing the dependence of accurate hand gesture recognition on the complex sensor setups found in expensive proprietary devices such as the Microsoft HoloLens, Daqri and Meta Glasses. Most such solutions rely either on multi-modal sensor data or on deep neural networks that can benefit greatly from an abundance of labelled data. Datasets are an integral part of any deep learning based research. They have been the principal reason for the substantial progress in this field, both in terms of providing enough data for the training of these models and for benchmarking competing algorithms. However, it is becoming increasingly difficult to generate enough labelled data for complex tasks such as hand gesture recognition. The goal of this work is to introduce a framework capable of generating photo-realistic videos that have labelled hand bounding box and fingertip that can help in designing, training, and benchmarking models for hand-gesture recognition in AR/VR applications. We demonstrate the efficacy of our framework in generating videos with diverse backgrounds.

Submitted 5 December, 2019; v1 submitted 4 November, 2019; originally announced November 2019.

Comments: Presented at the ICCV 2019 Workshop: The 5th International Workshop on Observing And Understanding Hands In Action

arXiv:1910.12061 [pdf, other]

Variational Student: Learning Compact and Sparser Networks in Knowledge Distillation Framework

Authors: Srinidhi Hegde, Ranjitha Prasad, Ramya Hebbalaguppe, Vishwajith Kumar

Abstract: The holy grail in deep neural network research is porting memory- and computation-intensive network models to embedded platforms with a minimal compromise in model accuracy. To this end, we propose a novel approach, termed Variational Student, where we reap the benefits of the compressibility of the knowledge distillation (KD) framework and the sparsity-inducing abilities of variational inference (VI) techniques. Essentially, we build a sparse student network, whose sparsity is induced by the variational parameters found via optimizing a loss function based on VI, leveraging the knowledge learnt by an accurate but complex pre-trained teacher network. Further, for sparsity enhancement, we also employ a Block Sparse Regularizer on a concatenated tensor of teacher and student network weights. We demonstrate that the marriage of KD and the VI techniques inherits compression properties from the KD framework, and enhances levels of sparsity from the VI approach, with minimal compromise in the model accuracy. We benchmark our results on LeNet MLP and VGGNet (CNN) and illustrate a memory footprint reduction of 64x and 213x on these MLP and CNN variants, respectively, without a need to retrain the teacher network. Furthermore, in the low data regime, we observed that our method outperforms state-of-the-art Bayesian techniques in terms of accuracy.

Submitted 26 October, 2019; originally announced October 2019.

arXiv:1904.09843 [pdf, other]

GestARLite: An On-Device Pointing Finger Based Gestural Interface for Smartphones and Video See-Through Head-Mounts

Authors: Varun Jain, Gaurav Garg, Ramakrishna Perla, Ramya Hebbalaguppe

Abstract: Hand gestures form an intuitive means of interaction in Mixed Reality (MR) applications. However, accurate gesture recognition can be achieved only through state-of-the-art deep learning models or with the use of expensive sensors. Despite the robustness of these deep learning models, they are generally computationally expensive and obtaining real-time performance on-device is still a challenge. To this end, we propose a novel lightweight hand gesture recognition framework that works in First Person View for wearable devices. The models are trained on a GPU machine and ported to an Android smartphone for use with frugal wearable devices such as the Google Cardboard and VR Box. The proposed hand gesture recognition framework is driven by a cascade of state-of-the-art deep learning models: MobileNetV2 for hand localisation, our custom fingertip regression architecture, followed by a Bi-LSTM model for gesture classification. We extensively evaluate the framework on our EgoGestAR dataset. The overall framework works in real-time on mobile devices and achieves a classification accuracy of 80% on the EgoGestAR video dataset with an average latency of only 0.12 s.

Submitted 19 April, 2019; originally announced April 2019.

Comments: The AAAI 2019 Workshop on Plan, Activity, and Intent Recognition. arXiv admin note: substantial text overlap with arXiv:1904.06122

arXiv:1904.06122 [pdf]

AirPen: A Touchless Fingertip Based Gestural Interface for Smartphones and Head-Mounted Devices

Authors: Varun Jain, Ramya Hebbalaguppe

Abstract: Hand gestures are an intuitive, socially acceptable, and non-intrusive interaction modality in Mixed Reality (MR) and smartphone based applications. Unlike speech interfaces, they tend to perform well even in shared and public spaces. Hand gestures can also be used to interact with smartphones in situations where the user's ability to physically touch the device is impaired. However, accurate gesture recognition can be achieved through state-of-the-art deep learning models or with the use of expensive sensors. Despite the robustness of these deep learning models, they are computationally heavy and memory hungry, and obtaining real-time performance on-device without additional hardware is still a challenge. To address this, we propose AirPen: an analogue to pen on paper, but in air, for in-air writing and gestural commands that works seamlessly in First and Second Person View. The models are trained on a GPU machine and ported to an Android smartphone. AirPen comprises three deep learning models that work in tandem: MobileNetV2 for hand localisation, our custom fingertip regression architecture, followed by a Bi-LSTM model for gesture classification. The overall framework works in real-time on mobile devices and achieves a classification accuracy of 80% with an average latency of only 0.12 s.

Submitted 12 April, 2019; originally announced April 2019.

Comments: Presented at the CHI'19 Workshop: Addressing the Challenges of Situationally-Induced Impairments and Disabilities in Mobile Interaction, 2019 (arXiv:1904.05382)

Report number: SIID/2019/no05

arXiv:1706.03851 [pdf, ps, other]

Google Cardboard Dates Augmented Reality: Issues, Challenges and Future Opportunities

Authors: Ramakrishna Perla, Ramya Hebbalaguppe

Abstract: Google's frugal Cardboard solution for immersive Virtual Reality experiences has come a long way in the VR market. Google Cardboard VR applications will support us in fields such as education, virtual tourism, entertainment, gaming, design, etc. Recently, Qualcomm's Vuforia SDK has introduced support for developing mixed reality applications for Google Cardboard which can combine Virtual and Augmented Reality to develop exciting and immersive experiences. In this work, we present a comprehensive review of Google Cardboard for AR and also highlight its technical and subjective limitations by conducting a feasibility study through the inspection of a Desktop computer use-case. Additionally, we recommend future avenues for Google Cardboard in AR. This work also serves as a guide for Android/iOS developers, as there are no published scholarly articles or well-documented studies exclusively on Google Cardboard with both user and developer experience captured in one place.

Submitted 5 June, 2017; originally announced June 2017.

Showing 1–10 of 10 results for author: Hebbalaguppe, R