-
Semantic Graph Consistency: Going Beyond Patches for Regularizing Self-Supervised Vision Transformers
Authors:
Chaitanya Devaguptapu,
Sumukh Aithal,
Shrinivas Ramasubramanian,
Moyuru Yamada,
Manohar Kaul
Abstract:
Self-supervised learning (SSL) with vision transformers (ViTs) has proven effective for representation learning as demonstrated by the impressive performance on various downstream tasks. Despite these successes, existing ViT-based SSL architectures do not fully exploit the ViT backbone, particularly the patch tokens of the ViT. In this paper, we introduce a novel Semantic Graph Consistency (SGC) module to regularize ViT-based SSL methods and leverage patch tokens effectively. We reconceptualize images as graphs, with image patches as nodes, and infuse relational inductive biases into the SSL framework by explicit message passing using Graph Neural Networks. Our SGC loss acts as a regularizer, leveraging the underexploited patch tokens of ViTs to construct a graph and enforcing consistency between graph features across multiple views of an image. Extensive experiments on various datasets including ImageNet, RESISC and Food-101 show that our approach significantly improves the quality of learned representations, resulting in a 5-10% increase in performance when limited labeled data is used for linear evaluation. These experiments coupled with a comprehensive set of ablations demonstrate the promise of our approach in various settings.
Submitted 18 June, 2024;
originally announced June 2024.
-
Understanding Hallucinations in Diffusion Models through Mode Interpolation
Authors:
Sumukh K Aithal,
Pratyush Maini,
Zachary C. Lipton,
J. Zico Kolter
Abstract:
Colloquially speaking, image generation models based upon diffusion processes are frequently said to exhibit "hallucinations," samples that could never occur in the training data. But where do such hallucinations come from? In this paper, we study a particular failure mode in diffusion models, which we term mode interpolation. Specifically, we find that diffusion models smoothly "interpolate" between nearby data modes in the training set, to generate samples that are completely outside the support of the original training distribution; this phenomenon leads diffusion models to generate artifacts that never existed in real data (i.e., hallucinations). We systematically study the reasons for, and the manifestation of, this phenomenon. Through experiments on 1D and 2D Gaussians, we show how a discontinuous loss landscape in the diffusion model's decoder leads to a region where any smooth approximation will cause such hallucinations. Through experiments on artificial datasets with various shapes, we show how hallucination leads to the generation of combinations of shapes that never existed. Finally, we show that diffusion models in fact know when they go out of support and hallucinate. This is captured by the high variance in the trajectory of the generated sample during the final few steps of the backward sampling process. Using a simple metric to capture this variance, we can remove over 95% of hallucinations at generation time while retaining 96% of in-support samples. We conclude our exploration by showing the implications of such hallucination (and its removal) on the collapse (and stabilization) of recursive training on synthetic data with experiments on the MNIST and 2D Gaussians datasets. We release our code at https://github.com/locuslab/diffusion-model-hallucination.
Submitted 13 June, 2024;
originally announced June 2024.
-
Leveraging the Third Dimension in Contrastive Learning
Authors:
Sumukh Aithal,
Anirudh Goyal,
Alex Lamb,
Yoshua Bengio,
Michael Mozer
Abstract:
Self-Supervised Learning (SSL) methods operate on unlabeled data to learn robust representations useful for downstream tasks. Most SSL methods rely on augmentations obtained by transforming the 2D image pixel map. These augmentations ignore the fact that biological vision takes place in an immersive three-dimensional, temporally contiguous environment, and that low-level biological vision relies heavily on depth cues. Using a signal provided by a pretrained state-of-the-art monocular RGB-to-depth model (the Depth Prediction Transformer, Ranftl et al., 2021), we explore two distinct approaches to incorporating depth signals into the SSL framework. First, we evaluate contrastive learning using an RGB+depth input representation. Second, we use the depth signal to generate novel views from slightly different camera positions, thereby producing a 3D augmentation for contrastive learning. We evaluate these two approaches on three different SSL methods -- BYOL, SimSiam, and SwAV -- using ImageNette (a 10-class subset of ImageNet), ImageNet-100 and ImageNet-1k datasets. We find that both approaches to incorporating depth signals improve the robustness and generalization of the baseline SSL methods, though the first approach (with depth-channel concatenation) is superior. For instance, BYOL with the additional depth channel leads to an increase in downstream classification accuracy from 85.3% to 88.0% on ImageNette and 84.1% to 87.0% on ImageNet-C.
Submitted 27 January, 2023;
originally announced January 2023.
-
Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data
Authors:
Harsh Rangwani,
Sumukh K Aithal,
Mayank Mishra,
R. Venkatesh Babu
Abstract:
Real-world datasets exhibit imbalances of varying types and degrees. Several techniques based on re-weighting and margin adjustment of loss are often used to enhance the performance of neural networks, particularly on minority classes. In this work, we analyze the class-imbalanced learning problem by examining the loss landscape of neural networks trained with re-weighting and margin-based techniques. Specifically, we examine the spectral density of the Hessian of the class-wise loss, through which we observe that the network weights converge to a saddle point in the loss landscapes of minority classes. Following this observation, we also find that optimization methods designed to escape from saddle points can be effectively used to improve generalization on minority classes. We further theoretically and empirically demonstrate that Sharpness-Aware Minimization (SAM), a recent technique that encourages convergence to flat minima, can be effectively used to escape saddle points for minority classes. Using SAM results in a 6.2% increase in accuracy on the minority classes over the state-of-the-art Vector Scaling Loss, leading to an overall average increase of 4% across imbalanced datasets. The code is available at: https://github.com/val-iisc/Saddle-LongTail.
Submitted 28 December, 2022;
originally announced December 2022.
-
A Computationally Efficient, Robust Methodology for Evaluating Chemical Timescales with Detailed Chemical Kinetics
Authors:
S. M. Aithal
Abstract:
Turbulent reacting flows occur in a variety of engineering applications such as chemical reactors and power generating equipment (gas turbines and internal combustion engines). Turbulent reacting flows are characterized by two main timescales, namely, flow timescales and chemical (or reaction) timescales. Understanding the relative timescales of flow and reaction kinetics plays an important role, not only in the choice of models required for the accurate simulation of these devices but also in their design/optimization studies. There are several definitions of chemical timescales, which can largely be classified as algebraic or eigenvalue-based methods. The computational complexity (and hence cost) depends on the method of evaluation of the chemical timescales and the size of the chemical reaction mechanism. The computational cost and robustness of the methodology for evaluating the reaction timescales are important considerations in large-scale multi-dimensional simulations using detailed chemical mechanisms. In this work, we present a computationally efficient and robust methodology to evaluate chemical timescales based on the algebraic method. A comparison of this novel methodology with other traditional methods is presented for a range of fuel-air mixtures, pressures and temperature conditions. Additionally, chemical timescales are also presented for fuel-air mixtures at conditions of relevance to power generating equipment. The proposed method showed the same temporal characteristics as the eigenvalue-based methods with no additional computational cost for all the cases studied. The proposed method thus has the potential for use with multidimensional turbulent reacting flow simulations which require the computation of the Damköhler number.
Submitted 7 October, 2022;
originally announced October 2022.
-
A Closer Look at Smoothness in Domain Adversarial Training
Authors:
Harsh Rangwani,
Sumukh K Aithal,
Mayank Mishra,
Arihant Jain,
R. Venkatesh Babu
Abstract:
Domain adversarial training has been ubiquitous for achieving invariant representations and is used widely for various domain adaptation tasks. In recent times, methods converging to smooth optima have shown improved generalization for supervised learning tasks like classification. In this work, we analyze the effect of smoothness-enhancing formulations on domain adversarial training, the objective of which is a combination of task loss (e.g., classification, regression, etc.) and adversarial terms. We find that converging to smooth minima with respect to (w.r.t.) task loss stabilizes the adversarial training, leading to better performance on the target domain. In contrast to task loss, our analysis shows that converging to smooth minima w.r.t. adversarial loss leads to sub-optimal generalization on the target domain. Based on the analysis, we introduce the Smooth Domain Adversarial Training (SDAT) procedure, which effectively enhances the performance of existing domain adversarial methods for both classification and object detection tasks. Our analysis also provides insight into the extensive usage of SGD over Adam in the community for domain adversarial training.
Submitted 16 June, 2022;
originally announced June 2022.
-
S$^3$VAADA: Submodular Subset Selection for Virtual Adversarial Active Domain Adaptation
Authors:
Harsh Rangwani,
Arihant Jain,
Sumukh K Aithal,
R. Venkatesh Babu
Abstract:
Unsupervised domain adaptation (DA) methods have focused on achieving maximal performance through aligning features from source and target domains without using labeled data in the target domain. In real-world scenarios, however, it might be feasible to get labels for a small proportion of target data. In these scenarios, it is important to select maximally informative samples to label and find an effective way to combine them with the existing knowledge from source data. Towards achieving this, we propose S$^3$VAADA, which i) introduces a novel submodular criterion to select a maximally informative subset to label and ii) enhances a cluster-based DA procedure through novel improvements to effectively utilize all the available data for improving generalization on the target domain. Our approach consistently outperforms the competing state-of-the-art approaches on datasets with varying degrees of domain shift.
Submitted 18 September, 2021;
originally announced September 2021.
-
A Comprehensive Review On Various State Of Art Techniques For Eye Blink Detection
Authors:
Sannidhan MS,
Sunil Kumar Aithal,
Abhir Bhandary
Abstract:
Computer vision is considered one of the most important areas of research and has focused on developing many applications that have proved useful for both research and societal benefit. Today we witness many road mishaps that happen simply because of a lack of concentration while driving. To avoid this kind of disaster in day-to-day life, many technologies focus on keeping track of the vehicle driver's concentration. One such technology uses eye blink detection to determine the driver's concentration level. With the advent of many cost-effective high-end camera devices, it has become more efficient and cheaper to use eye blink detection for keeping track of the driver's concentration level. Hence this paper presents an exhaustive review of the implementations of various eye blink detection algorithms. The detection system has also been applied in other fields such as drowsiness detection, fatigue detection, and expression detection.
Submitted 26 November, 2019;
originally announced December 2019.
-
MaLTESE: Large-Scale Simulation-Driven Machine Learning for Transient Driving Cycles
Authors:
Shashi M. Aithal,
Prasanna Balaprakash
Abstract:
Optimal engine operation during a transient driving cycle is the key to achieving greater fuel economy, engine efficiency, and reduced emissions. In order to achieve continuously optimal engine operation, engine calibration methods use a combination of static correlations obtained from dynamometer tests for steady-state operating points and road and/or track performance data. As the parameter space of control variables, design variable constraints, and objective functions increases, the cost and duration for optimal calibration become prohibitively large. In order to reduce the number of dynamometer tests required for calibrating modern engines, a large-scale simulation-driven machine learning approach is presented in this work. A parallel, fast, robust, physics-based reduced-order engine simulator is used to obtain performance and emission characteristics of engines over a wide range of control parameters under various transient driving conditions (drive cycles). We scale the simulation up to 3,906 nodes of the Theta supercomputer at the Argonne Leadership Computing Facility to generate data required to train a machine learning model. The trained model is then used to predict various engine parameters of interest. Our results show that a deep-neural-network-based surrogate model achieves high accuracy for various engine parameters such as exhaust temperature, exhaust pressure, nitric oxide, and engine torque. Once trained, the deep-neural-network-based surrogate model is fast for inference: it requires about 16 microseconds to predict the engine performance and emissions for a single design configuration, compared with about 0.5 s per configuration with the engine simulator. Moreover, we demonstrate that transfer learning and retraining can be leveraged to incrementally retrain the surrogate model to cope with new configurations that fall outside the training data space.
Submitted 21 September, 2019;
originally announced September 2019.
-
Integrating goals after prioritization and evaluation-A Goal-oriented requirements engineering method
Authors:
S Vinay,
Shridhar Aithal,
Sudhakara Adiga
Abstract:
Decision support systems in requirements engineering play an important role in the software development life cycle. The relationship between functional and non-functional requirements often plays a crucial role in resolving conflicts or arriving at decisions in the requirements engineering phase. Goal-Oriented Requirements Engineering (GORE) methods make a good attempt at addressing these aspects, which are helpful in decision support. We propose a GORE method, Integrating goals after prioritization and evaluation (IGAPE). The method is semi-formal in nature, thereby ensuring active stakeholder participation. In this paper we elaborate the various steps of the IGAPE method. The output of IGAPE is then given as input to a decision support system which makes use of the Analytic Hierarchy Process (AHP) and the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). Integration of IGAPE with AHP and TOPSIS provides a clear rationale for the various decisions arrived at during the requirements engineering phase. The method is illustrated for an e-commerce application and is validated by an expert analysis approach.
Submitted 8 December, 2014;
originally announced December 2014.