-
Physical networks become what they learn
Authors:
Menachem Stern,
Marcelo Guzman,
Felipe Martins,
Andrea J Liu,
Vijay Balasubramanian
Abstract:
Physical networks can develop diverse responses, or functions, by design, evolution or learning. We focus on electrical networks of nodes connected by resistive edges. Such networks can learn by adapting edge conductances to lower a cost function that penalizes deviations from a desired response. The network must also satisfy Kirchhoff's law, balancing currents at nodes, or, equivalently, minimizing total power dissipation by adjusting node voltages. The adaptation is thus a double optimization process, in which a cost function is minimized with respect to conductances, while dissipated power is minimized with respect to node voltages. Here we study how this physical adaptation couples the cost landscape, the landscape of the cost function in the high-dimensional space of edge conductances, to the physical landscape, the dissipated power in the high-dimensional space of node voltages. We show how adaptation links the physical and cost Hessian matrices, suggesting that the physical response of networks to perturbations holds significant information about the functions to which they are adapted.
Submitted 13 June, 2024;
originally announced June 2024.
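The inner half of the double optimization in the abstract above (minimizing dissipated power over node voltages, which is equivalent to Kirchhoff's current law) can be illustrated with a minimal sketch. This is not the authors' code: the function name `solve_voltages`, the three-node chain, and the boundary voltages are assumptions chosen purely for illustration.

```python
import numpy as np

def solve_voltages(n_nodes, edges, conductances, fixed):
    """Return (voltages, Laplacian) for a resistor network.

    edges: list of (i, j) node pairs; conductances: one value per edge;
    fixed: dict {node: voltage} of externally imposed voltages (illustrative inputs).
    """
    # Weighted graph Laplacian; the dissipated power is P = V^T L V.
    L = np.zeros((n_nodes, n_nodes))
    for (i, j), k in zip(edges, conductances):
        L[i, i] += k
        L[j, j] += k
        L[i, j] -= k
        L[j, i] -= k
    fixed_idx = sorted(fixed)
    free_idx = [n for n in range(n_nodes) if n not in fixed]
    v = np.zeros(n_nodes)
    v[fixed_idx] = [fixed[n] for n in fixed_idx]
    # Setting dP/dV = 0 on the free nodes gives L_ff V_f = -L_fc V_c,
    # which is Kirchhoff's current law at every free node.
    L_ff = L[np.ix_(free_idx, free_idx)]
    L_fc = L[np.ix_(free_idx, fixed_idx)]
    v[free_idx] = np.linalg.solve(L_ff, -L_fc @ v[fixed_idx])
    return v, L

# Assumed toy network: chain 0 - 1 - 2 with conductances 1 and 3,
# node 0 held at 1 V and node 2 held at 0 V.
v, L = solve_voltages(3, [(0, 1), (1, 2)], [1.0, 3.0], {0: 1.0, 2: 0.0})
print(v[1])         # free node settles at 0.25
print((L @ v)[1])   # net current into the free node vanishes
```

The outer optimization, adapting the conductances to lower a cost function, would wrap a routine like this one; the abstract does not specify the update rule, so it is left out here.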
-
Can Better Text Semantics in Prompt Tuning Improve VLM Generalization?
Authors:
Hari Chandana Kuchibhotla,
Sai Srinivas Kancheti,
Abbavaram Gowtham Reddy,
Vineeth N Balasubramanian
Abstract:
Going beyond mere fine-tuning of vision-language models (VLMs), learnable prompt tuning has emerged as a promising, resource-efficient alternative. Despite their potential, effectively learning prompts faces the following challenges: (i) training in a low-shot scenario results in overfitting, limiting adaptability, and yielding weaker performance on newer classes or datasets; (ii) prompt-tuning's efficacy heavily relies on the label space, with decreased performance in large class spaces, signaling potential gaps in bridging image and class concepts. In this work, we investigate whether better text semantics can help address these concerns. In particular, we introduce a prompt-tuning method that leverages class descriptions obtained from Large Language Models (LLMs). These class descriptions are used to bridge image and text modalities. Our approach constructs part-level description-guided image and text features, which are subsequently aligned to learn more generalizable prompts. Our comprehensive experiments conducted across 11 benchmark datasets show that our method outperforms established methods, demonstrating substantial improvements.
Submitted 20 June, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
C2FDrone: Coarse-to-Fine Drone-to-Drone Detection using Vision Transformer Networks
Authors:
Sairam VC Rebbapragada,
Pranoy Panda,
Vineeth N Balasubramanian
Abstract:
A vision-based drone-to-drone detection system is crucial for various applications like collision avoidance, countering hostile drones, and search-and-rescue operations. However, detecting drones presents unique challenges, including small object sizes, distortion, occlusion, and real-time processing requirements. Current methods integrating multi-scale feature fusion and temporal information have limitations in handling extreme blur and minuscule objects. To address this, we propose a novel coarse-to-fine detection strategy based on vision transformers. We evaluate our approach on three challenging drone-to-drone detection datasets, achieving F1 score enhancements of 7%, 3%, and 1% on the FL-Drones, AOT, and NPS-Drones datasets, respectively. Additionally, we demonstrate real-time processing capabilities by deploying our model on an edge-computing device. Our code will be made publicly available.
Submitted 30 April, 2024;
originally announced April 2024.
-
Fiducial Focus Augmentation for Facial Landmark Detection
Authors:
Purbayan Kar,
Vishal Chudasama,
Naoyuki Onoe,
Pankaj Wasnik,
Vineeth Balasubramanian
Abstract:
Deep learning methods have led to significant improvements in performance on the facial landmark detection (FLD) task. However, detecting landmarks in challenging settings, such as head pose changes, exaggerated expressions, or uneven illumination, remains a challenge due to high variability and insufficient samples. This inadequacy can be attributed to the model's inability to effectively acquire appropriate facial structure information from the input images. To address this, we propose a novel image augmentation technique specifically designed for the FLD task to enhance the model's understanding of facial structures. To effectively utilize the newly proposed augmentation technique, we employ a Siamese architecture-based training mechanism with a Deep Canonical Correlation Analysis (DCCA)-based loss to achieve collective learning of high-level feature representations from two different views of the input images. Furthermore, we employ a Transformer + CNN-based network with a custom hourglass module as the robust backbone for the Siamese framework. Extensive experiments show that our approach outperforms multiple state-of-the-art approaches across various benchmark datasets.
Submitted 22 February, 2024;
originally announced February 2024.
-
Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks
Authors:
Tanmay Garg,
Deepika Vemuri,
Vineeth N Balasubramanian
Abstract:
This paper presents a novel concept learning framework for enhancing model interpretability and performance in visual classification tasks. Our approach appends an unsupervised explanation generator to the primary classifier network and makes use of adversarial training. During training, the explanation module is optimized to extract visual concepts from the classifier's latent representations, while the GAN-based module aims to discriminate images generated from concepts, from true images. This joint training scheme enables the model to implicitly align its internally learned concepts with human-interpretable visual properties. Comprehensive experiments demonstrate the robustness of our approach, while producing coherent concept activations. We analyse the learned concepts, showing their semantic concordance with object parts and visual attributes. We also study how perturbations in the adversarial training protocol impact both classification and concept acquisition. In summary, this work presents a significant step towards building inherently interpretable deep vision models with task-aligned concept representations - a key enabler for developing trustworthy AI for real-world perception tasks.
Submitted 3 April, 2024; v1 submitted 9 January, 2024;
originally announced January 2024.
-
Rethinking Robustness of Model Attributions
Authors:
Sandesh Kamath,
Sankalp Mittal,
Amit Deshpande,
Vineeth N Balasubramanian
Abstract:
For machine learning models to be reliable and trustworthy, their decisions must be interpretable. As these models find increasing use in safety-critical applications, it is important that not just the model predictions but also their explanations (as feature attributions) be robust to small human-imperceptible input perturbations. Recent works have shown that many attribution methods are fragile and have proposed improvements in either these methods or the model training. We observe two main causes for fragile attributions: first, the existing metrics of robustness (e.g., top-k intersection) over-penalize even reasonable local shifts in attribution, thereby making random perturbations appear as a strong attack, and second, the attribution can be concentrated in a small region even when there are multiple important parts in an image. To rectify this, we propose simple ways to strengthen existing metrics and attribution methods that incorporate locality of pixels in robustness metrics and diversity of pixel locations in attributions. Towards the role of model training in attributional robustness, we empirically observe that adversarially trained models have more robust attributions on smaller datasets; however, this advantage disappears in larger datasets. Code is available at https://github.com/ksandeshk/LENS.
Submitted 16 December, 2023;
originally announced December 2023.
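The top-k intersection metric mentioned in the abstract above can be sketched as follows, to show why a small local shift in attribution is penalized so heavily. This is one common formulation written for illustration, not code from the linked repository: the function name `topk_intersection` and the choice to rank features by absolute attribution are assumptions.

```python
import numpy as np

def topk_intersection(attr_a, attr_b, k):
    """Fraction of the k highest-magnitude features shared by two attribution maps."""
    a = np.abs(np.asarray(attr_a, dtype=float)).ravel()
    b = np.abs(np.asarray(attr_b, dtype=float)).ravel()
    # Indices of the k largest entries in each map (ranking by magnitude is an assumption).
    top_a = set(np.argsort(a)[-k:].tolist())
    top_b = set(np.argsort(b)[-k:].tolist())
    return len(top_a & top_b) / k

# Toy 1-D "image": the attribution peak moves by a single pixel.
original = [0.1, 0.2, 0.9, 0.3]
shifted = [0.1, 0.2, 0.3, 0.9]
print(topk_intersection(original, shifted, k=1))  # 0.0: a one-pixel shift scores as total disagreement
print(topk_intersection(original, shifted, k=2))  # 1.0: the same two pixels are on top in both maps
```

With k = 1 the metric treats a one-pixel displacement exactly like a completely different explanation, which is the kind of over-penalization the abstract argues a locality-aware metric should soften.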
-
The entropy of finite gravitating regions
Authors:
Vijay Balasubramanian,
Charlie Cummings
Abstract:
We develop a formalism for calculating the entanglement entropy of an arbitrary spatial region of a gravitating spacetime at a moment of time symmetry. The crucial ingredient is a path integral over embeddings of the region into the overall spacetime, interpretable as a sum over the edge modes associated with the region. We find that the entanglement entropy of a gravitating region equals the minimal surface area among all regions that enclose it. This suggests a notion of "terrestrial holography" where regions of space can encode larger ones, in contrast to the standard form of holography, in which degrees of freedom on the celestial sphere at the boundary of the universe encode the interior.
Submitted 11 January, 2024; v1 submitted 13 December, 2023;
originally announced December 2023.
-
Quantum chaos, integrability, and late times in the Krylov basis
Authors:
Vijay Balasubramanian,
Javier M. Magan,
Qingyue Wu
Abstract:
Quantum chaotic systems are conjectured to display a spectrum whose fine-grained features (gaps and correlations) are well described by Random Matrix Theory (RMT). We propose and develop a complementary version of this conjecture: quantum chaotic systems display a Lanczos spectrum whose local means and covariances are well described by RMT. To support this proposal, we first demonstrate its validity in examples of chaotic and integrable systems. We then show that for Haar-random initial states in RMTs the mean and covariance of the Lanczos spectrum suffice to produce the full long time behavior of general survival probabilities including the spectral form factor, as well as the spread complexity. In addition, for initial states with continuous overlap with energy eigenstates, we analytically find the long time averages of the probabilities of Krylov basis elements in terms of the mean Lanczos spectrum. This analysis suggests a notion of eigenstate complexity, the statistics of which differentiate integrable systems and classes of quantum chaos. Finally, we clarify the relation between spread complexity and the universality classes of RMT by exploring various values of the Dyson index and Poisson distributed spectra.
Submitted 6 December, 2023;
originally announced December 2023.
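As a rough illustration of the Lanczos coefficients and Krylov basis referred to in the abstract above, the sketch below runs the standard Lanczos recursion on a small Hermitian matrix. It is not the authors' code: the function name `lanczos_coefficients`, the random 5x5 test matrix, and the use of full re-orthogonalization are assumptions made for the example.

```python
import numpy as np

def lanczos_coefficients(H, psi0, tol=1e-12):
    """Return Lanczos coefficients (a_n, b_n) of Hermitian H for initial state psi0."""
    H = np.asarray(H, dtype=complex)
    psi0 = np.asarray(psi0, dtype=complex)
    basis = [psi0 / np.linalg.norm(psi0)]   # Krylov basis, built one vector at a time
    a, b = [], []
    dim = H.shape[0]
    for n in range(dim):
        w = H @ basis[n]
        a.append(np.vdot(basis[n], w).real)
        # Full re-orthogonalization against every earlier Krylov vector
        # (a numerical-stability choice assumed for this sketch).
        for q in basis:
            w = w - np.vdot(q, w) * q
        norm = np.linalg.norm(w)
        if norm < tol or n == dim - 1:
            break
        b.append(norm)
        basis.append(w / norm)
    return np.array(a), np.array(b)

# Assumed toy input: a random real symmetric matrix and a random initial state.
rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
H = (M + M.T) / 2
a, b = lanczos_coefficients(H, rng.normal(size=5))

# In the Krylov basis H is tridiagonal, with a_n on the diagonal and b_n beside it.
T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
print(np.allclose(np.linalg.eigvalsh(T), np.linalg.eigvalsh(H)))
```

The statistics of `a` and `b` over an ensemble of such matrices are the kind of quantities the abstract compares against Random Matrix Theory; the ensemble averaging itself is not shown here.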
-
Is stochastic thermodynamics the key to understanding the energy costs of computation?
Authors:
David Wolpert,
Jan Korbel,
Christopher Lynn,
Farita Tasnim,
Joshua Grochow,
Gülce Kardeş,
James Aimone,
Vijay Balasubramanian,
Eric de Giuli,
David Doty,
Nahuel Freitas,
Matteo Marsili,
Thomas E. Ouldridge,
Andrea Richa,
Paul Riechers,
Édgar Roldán,
Brenda Rubenstein,
Zoltan Toroczkai,
Joseph Paradiso
Abstract:
The relationship between the thermodynamic and computational characteristics of dynamical physical systems has been a major theoretical interest since at least the 19th century, and has been of increasing practical importance as the energetic cost of digital devices has exploded over the last half century. One of the most important thermodynamic features of real-world computers is that they operate very far from thermal equilibrium, in finite time, with many quickly (co-)evolving degrees of freedom. Such computers also must almost always obey multiple physical constraints on how they work. For example, all modern digital computers are periodic processes, governed by a global clock. Another example is that many computers are modular, hierarchical systems, with strong restrictions on the connectivity of their subsystems. These properties hold both for naturally occurring computers, like brains or Eukaryotic cells, and for digital systems. These features of real-world computers are absent in 20th century analyses of the thermodynamics of computational processes, which focused on quasi-statically slow processes. However, the field of stochastic thermodynamics has been developed in the last few decades - and it provides the formal tools for analyzing systems that have exactly these features of real-world computers. We argue here that these tools, together with other tools currently being developed in stochastic thermodynamics, may help us understand at a far deeper level just how the fundamental physical properties of dynamic systems are related to the computation that they perform.
Submitted 30 November, 2023; v1 submitted 28 November, 2023;
originally announced November 2023.
-
MADG: Margin-based Adversarial Learning for Domain Generalization
Authors:
Aveen Dayal,
Vimal K. B.,
Linga Reddy Cenkeramaddi,
C. Krishna Mohan,
Abhinav Kumar,
Vineeth N Balasubramanian
Abstract:
Domain Generalization (DG) techniques have emerged as a popular approach to address the challenges of domain shift in Deep Learning (DL), with the goal of generalizing well to the target domain unseen during training. In recent years, numerous methods have been proposed to address the DG setting, among which one popular approach is the adversarial learning-based methodology. The main idea behind adversarial DG methods is to learn domain-invariant features by minimizing a discrepancy metric. However, most adversarial DG methods use the 0-1 loss-based $\mathcal{H}\Delta\mathcal{H}$ divergence metric. In contrast, the margin loss-based discrepancy metric has the following advantages: more informative, tighter, practical, and efficiently optimizable. To mitigate this gap, this work proposes a novel adversarial learning DG algorithm, MADG, motivated by a margin loss-based discrepancy metric. The proposed MADG model learns domain-invariant features across all source domains and uses adversarial training to generalize well to the unseen target domain. We also provide a theoretical analysis of the proposed MADG model based on the unseen target error bound. Specifically, we construct the link between the source and unseen domains in the real-valued hypothesis space and derive the generalization bound using margin loss and Rademacher complexity. We extensively experiment with the MADG model on popular real-world DG datasets, VLCS, PACS, OfficeHome, DomainNet, and TerraIncognita. We evaluate the proposed algorithm on DomainBed's benchmark and observe consistent performance across all the datasets.
Submitted 14 November, 2023;
originally announced November 2023.
-
Causal Inference Using LLM-Guided Discovery
Authors:
Aniket Vashishtha,
Abbavaram Gowtham Reddy,
Abhinav Kumar,
Saketh Bachu,
Vineeth N Balasubramanian,
Amit Sharma
Abstract:
At the core of causal inference lies the challenge of determining reliable causal graphs solely based on observational data. Since the well-known backdoor criterion depends on the graph, any errors in the graph can propagate downstream to effect inference. In this work, we initially show that complete graph information is not necessary for causal effect inference; the topological order over graph variables (causal order) alone suffices. Further, given a node pair, causal order is easier to elicit from domain experts compared to graph edges since determining the existence of an edge can depend extensively on other variables. Interestingly, we find that the same principle holds for Large Language Models (LLMs) such as GPT-3.5-turbo and GPT-4, motivating an automated method to obtain causal order (and hence causal effect) with LLMs acting as virtual domain experts. To this end, we employ different prompting strategies and contextual cues to propose a robust technique of obtaining causal order from LLMs. Acknowledging LLMs' limitations, we also study possible techniques to integrate LLMs with established causal discovery algorithms, including constraint-based and score-based methods, to enhance their performance. Extensive experiments demonstrate that our approach significantly improves causal ordering accuracy as compared to discovery algorithms, highlighting the potential of LLMs to enhance causal inference across diverse fields.
Submitted 23 October, 2023;
originally announced October 2023.
-
Mitigating the Effect of Incidental Correlations on Part-based Learning
Authors:
Gaurav Bhatt,
Deepayan Das,
Leonid Sigal,
Vineeth N Balasubramanian
Abstract:
Intelligent systems possess a crucial characteristic of breaking complicated problems into smaller reusable components or parts and adjusting to new tasks using these part representations. However, current part-learners encounter difficulties in dealing with incidental correlations resulting from the limited observations of objects that may appear only in specific arrangements or with specific backgrounds. These incidental correlations may have a detrimental impact on the generalization and interpretability of learned part representations. This study asserts that part-based representations could be more interpretable and generalize better with limited data, employing two innovative regularization methods. The first regularization separates foreground and background information's generative process via a unique mixture-of-parts formulation. Structural constraints are imposed on the parts using a weakly-supervised loss, guaranteeing that the mixture-of-parts for foreground and background entails soft, object-agnostic masks. The second regularization assumes the form of a distillation loss, ensuring the invariance of the learned parts to the incidental background correlations. Furthermore, we incorporate sparse and orthogonal constraints to facilitate learning high-quality part representations. By reducing the impact of incidental background correlations on the learned parts, we exhibit state-of-the-art (SoTA) performance on few-shot learning tasks on benchmark datasets, including MiniImagenet, TieredImageNet, and FC100. We also demonstrate that the part-based representations acquired through our approach generalize better than existing techniques, even under domain shifts of the background and common data corruption on the ImageNet-9 dataset. The implementation is available on GitHub: https://github.com/GauravBh1010tt/DPViT.git
Submitted 30 September, 2023;
originally announced October 2023.
-
Explaining Deep Face Algorithms through Visualization: A Survey
Authors:
Thrupthi Ann John,
Vineeth N Balasubramanian,
C. V. Jawahar
Abstract:
Although current deep models for face tasks surpass human performance on some benchmarks, we do not understand how they work. Thus, we cannot predict how they will react to novel inputs, resulting in catastrophic failures and unwanted biases in the algorithms. Explainable AI helps bridge the gap, but currently, there are very few visualization algorithms designed for faces. This work undertakes a first-of-its-kind meta-analysis of explainability algorithms in the face domain. We explore the nuances and caveats of adapting general-purpose visualization algorithms to the face domain, illustrated by computing visualizations on popular face models. We review existing face explainability works and reveal valuable insights into the structure and hierarchy of face networks. We also determine the design considerations for practical face visualizations accessible to AI practitioners by conducting a user study on the utility of various explainability algorithms.
Submitted 26 September, 2023;
originally announced September 2023.
-
Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach
Authors:
Vimal K B,
Saketh Bachu,
Tanmay Garg,
Niveditha Lakshmi Narasimhan,
Raghavan Konuru,
Vineeth N Balasubramanian
Abstract:
Estimating the transferability of publicly available pretrained models to a target task has assumed an important place for transfer learning tasks in recent years. Existing efforts propose metrics that allow a user to choose one model from a pool of pre-trained models without having to fine-tune each model individually and identify one explicitly. With the growth in the number of available pre-trained models and the popularity of model ensembles, it also becomes essential to study the transferability of multiple-source models for a given target task. The few existing efforts study transferability in such multi-source ensemble settings using just the outputs of the classification layer and neglect possible domain or task mismatch. Moreover, they overlook the most important factor while selecting the source models, viz., the cohesiveness factor between them, which can impact the performance and confidence in the prediction of the ensemble. To address these gaps, we propose a novel Optimal tranSport-based suBmOdular tRaNsferability metric (OSBORN) to estimate the transferability of an ensemble of models to a downstream task. OSBORN collectively accounts for image domain difference, task difference, and cohesiveness of models in the ensemble to provide reliable estimates of transferability. We gauge the performance of OSBORN on both image classification and semantic segmentation tasks. Our setup includes 28 source datasets, 11 target datasets, 5 model architectures, and 2 pre-training methods. We benchmark our method against current state-of-the-art metrics MS-LEEP and E-LEEP, and outperform them consistently using the proposed approach.
Submitted 5 September, 2023;
originally announced September 2023.
-
de Sitter space is sometimes not empty
Authors:
Vijay Balasubramanian,
Yasunori Nomura,
Tomonori Ugajin
Abstract:
Multiple lines of evidence suggest that the Hilbert space of an isolated de Sitter universe is one dimensional but can appear larger when probed by a gravitating observer. To test this idea, we compute the von Neumann entropy of a field theory in a two-dimensional de Sitter universe which is entangled in a thermal-like state with the same field theory on a disjoint, asymptotically anti-de Sitter (AdS) black hole. Previously, it was shown that the replica trick for computing the entropy of such entangled gravitating systems requires the inclusion of a non-perturbative effect in quantum gravity -- novel wormholes connecting the two spaces. Here we show that: (a) the expected wormholes connecting de Sitter and AdS universes exist, avoiding a no-go theorem via the presence of sources on the AdS boundary; (b) the entanglement entropy vanishes if the nominal entropy of the de Sitter cosmological horizon (S_dS = A_horizon^dS / 4G_N) is less than the entropy of the AdS black hole horizon (S_BH = A_horizon^AdS / 4G_N), i.e., S_dS < S_BH; (c) the entanglement entropy is finite when S_dS > S_BH. Thus, the de Sitter Hilbert space is effectively nontrivial only when S_dS > S_BH. The AdS black hole we introduce can be regarded as an ``observer'' for de Sitter space. In this sense, our result is a non-perturbative generalization of the recent perturbative argument that the algebra of observables on the de Sitter static patch becomes nontrivial in the presence of an observer.
Submitted 18 August, 2023;
originally announced August 2023.
-
The Physical Effects of Learning
Authors:
Menachem Stern,
Andrea J. Liu,
Vijay Balasubramanian
Abstract:
Interacting many-body physical systems ranging from neural networks in the brain to folding proteins to self-modifying electrical circuits can learn to perform diverse tasks. This learning, both in nature and in engineered systems, can occur through evolutionary selection or through dynamical rules that drive active learning from experience. Here, we show that learning in linear physical networks with weak input signals leaves architectural imprints on the Hessian of a physical system. Compared to a generic organization of the system components, (a) the effective physical dimension of the response to inputs decreases, (b) the response of physical degrees of freedom to random perturbations (or system ``susceptibility'') increases, and (c) the low-eigenvalue eigenvectors of the Hessian align with the task. Overall, these effects embody the typical scenario for learning processes in physical systems in the weak input regime, suggesting ways of discovering whether a physical network may have been trained.
Submitted 27 January, 2024; v1 submitted 22 June, 2023;
originally announced June 2023.
-
On Counterfactual Data Augmentation Under Confounding
Authors:
Abbavaram Gowtham Reddy,
Saketh Bachu,
Saloni Dash,
Charchit Sharma,
Amit Sharma,
Vineeth N Balasubramanian
Abstract:
Counterfactual data augmentation has recently emerged as a method to mitigate confounding biases in the training data. These biases, such as spurious correlations, arise due to various observed and unobserved confounding variables in the data generation process. In this paper, we formally analyze how confounding biases impact downstream classifiers and present a causal viewpoint to the solutions based on counterfactual data augmentation. We explore how removing confounding biases serves as a means to learn invariant features, ultimately aiding in generalization beyond the observed data distribution. Additionally, we present a straightforward yet powerful algorithm for generating counterfactual images, which effectively mitigates the influence of confounding effects on downstream classifiers. Through experiments on MNIST variants and the CelebA datasets, we demonstrate how our simple augmentation method helps existing state-of-the-art methods achieve good results.
Submitted 21 November, 2023; v1 submitted 29 May, 2023;
originally announced May 2023.
-
Playing it safe: information constrains collective betting strategies
Authors:
Philipp Fleig,
Vijay Balasubramanian
Abstract:
Every interaction of a living organism with its environment involves the placement of a bet. Armed with partial knowledge about a stochastic world, the organism must decide its next step or near-term strategy, an act that implicitly or explicitly involves the assumption of a model of the world. Better information about environmental statistics can improve the bet quality, but in practice resources for information gathering are always limited. We argue that theories of optimal inference dictate that ``complex'' models are harder to infer with bounded information and lead to larger prediction errors. Thus, we propose a principle of ``playing it safe'' where, given finite information gathering capacity, biological systems should be biased towards simpler models of the world, and thereby to less risky betting strategies. In the framework of Bayesian inference, we show that there is an optimally safe adaptation strategy determined by the Bayesian prior. We then demonstrate that, in the context of stochastic phenotypic switching by bacteria, implementation of our principle of ``playing it safe'' increases fitness (population growth rate) of the bacterial collective. We suggest that the principle applies broadly to problems of adaptation, learning and evolution, and illuminates the types of environments in which organisms are able to thrive.
Submitted 28 May, 2023; v1 submitted 18 April, 2023;
originally announced April 2023.
-
Dispersion and Orientation patterns in nanorod-infused polymer melts
Authors:
Navid Afrasiabian,
Venkat Balasubramanian,
Colin Denniston
Abstract:
Introducing nanorods into a polymeric matrix can enhance the physical and mechanical properties of the resulting material. In this paper, we focus on understanding the dispersion and orientation patterns of nanorods in an unentangled polymer melt, particularly as a function of nanorod concentration, using Molecular Dynamics (MD) simulations. The system is comprised of flexible polymer chains and multi-thread nanorods that are equilibrated in the NPT ensemble. All interactions are purely repulsive except for those between polymers and rods. Results with attractive versus repulsive polymer-rod interactions are compared and contrasted. The concentration of rods has a direct impact on the phase behaviour of the system. At lower concentrations rods phase separate into nematic clusters, while at higher concentrations more isotropic and less structured rod configurations are observed. A detailed examination of the conformation of the polymer chains near the rod surface shows extension of the chains along the director of the rods (especially within clusters). The dispersion and orientation of the nanorods is a result of the competition between depletion entropic forces responsible for the formation of rod clusters, the enthalpic effects that improve mixing of rods and polymer, and entropic losses of polymers interpenetrating rod clusters.
Submitted 4 April, 2023;
originally announced April 2023.
-
$Δ$-Patching: A Framework for Rapid Adaptation of Pre-trained Convolutional Networks without Base Performance Loss
Authors:
Chaitanya Devaguptapu,
Samarth Sinha,
K J Joseph,
Vineeth N Balasubramanian,
Animesh Garg
Abstract:
Models pre-trained on large-scale datasets are often fine-tuned to support newer tasks and datasets that arrive over time. This process necessitates storing copies of the model over time for each task that the pre-trained model is fine-tuned to. Building on top of recent model patching work, we propose $Δ$-Patching for fine-tuning neural network models in an efficient manner, without the need to store model copies. We propose a simple and lightweight method called $Δ$-Networks to achieve this objective. Our comprehensive experiments across setting and architecture variants show that $Δ$-Networks outperform earlier model patching work while only requiring a fraction of parameters to be trained. We also show that this approach can be used for other problem settings such as transfer learning and zero-shot domain adaptation, as well as other tasks such as detection and segmentation.
Submitted 21 September, 2023; v1 submitted 26 March, 2023;
originally announced March 2023.
-
Towards Learning and Explaining Indirect Causal Effects in Neural Networks
Authors:
Abbavaram Gowtham Reddy,
Saketh Bachu,
Harsharaj Pathak,
Benin L Godfrey,
Vineeth N. Balasubramanian,
Varshaneya V,
Satya Narayanan Kar
Abstract:
Recently, there has been a growing interest in learning and explaining causal effects within Neural Network (NN) models. By virtue of NN architectures, previous approaches consider only direct and total causal effects assuming independence among input variables. We view an NN as a structural causal model (SCM) and extend our focus to include indirect causal effects by introducing feedforward connections among input neurons. We propose an ante-hoc method that captures and maintains direct, indirect, and total causal effects during NN model training. We also propose an algorithm for quantifying learned causal effects in an NN model and efficient approximation strategies for quantifying causal effects in high-dimensional data. Extensive experiments conducted on synthetic and real-world datasets demonstrate that the causal effects learned by our ante-hoc method better approximate the ground truth effects compared to existing methods.
Submitted 8 January, 2024; v1 submitted 24 March, 2023;
originally announced March 2023.
-
Classification of Methods to Reduce Clinical Alarm Signals for Remote Patient Monitoring: A Critical Review
Authors:
Teena Arora,
Venki Balasubramanian,
Andrew Stranieri,
Shenhan Mai,
Rajkumar Buyya,
Sardar Islam
Abstract:
Remote Patient Monitoring (RPM) is an emerging technology paradigm that helps reduce clinician workload through automated monitoring and intelligent alarm signals. High sensitivity and intelligent data-processing algorithms used in RPM devices produce frequent false-positive alarms, leading to alarm fatigue. This study aims to critically review the existing literature to identify the causes of these false-positive alarms and categorize the various interventions used in the literature to eliminate these causes. This categorization acts as a catalog and helps in the design of false alarm reduction algorithms. First, a step-by-step approach to building an effective alarm signal generator for clinical use is proposed in this work. Second, the possible causes of false-positive alarms amongst RPM applications were analyzed from the literature. Third, a critical review has been done of the various interventions used in the literature depending on causes and classification based on four major approaches: clinical knowledge, physiological data, medical sensor devices, and clinical environments. A practical clinical alarm strategy could be developed by following our pentagon approach. The first phase of this approach emphasizes identifying the various causes for the high number of false-positive alarms. Future research will focus on developing a false alarm reduction method using data mining.
Submitted 8 February, 2023;
originally announced February 2023.
-
Quantum Error Correction from Complexity in Brownian SYK
Authors:
Vijay Balasubramanian,
Arjun Kar,
Cathy Li,
Onkar Parrikar,
Harshit Rajgadia
Abstract:
We study the robustness of quantum error correction in a one-parameter ensemble of codes generated by the Brownian SYK model, where the parameter quantifies the encoding complexity. The robustness of error correction by a quantum code is upper bounded by the "mutual purity" of a certain entangled state between the code subspace and environment in the isometric extension of the error channel, where the mutual purity of a density matrix $ρ_{AB}$ is the difference $\mathcal{F}_ρ(A:B) \equiv \mathrm{Tr}\;ρ_{AB}^2 - \mathrm{Tr}\;ρ_A^2\;\mathrm{Tr}\;ρ_B^2$. We show that when the encoding complexity is small, the mutual purity is $O(1)$ for the erasure of a small number of qubits (i.e., the encoding is fragile). However, this quantity decays exponentially, becoming $O(1/N)$ for $O(\log N)$ encoding complexity. Further, at polynomial encoding complexity, the mutual purity saturates to a plateau of $O(e^{-N})$. We also find a hierarchy of complexity scales associated to a tower of subleading contributions to the mutual purity that quantitatively, but not qualitatively, adjust our error correction bound as encoding complexity increases. In the AdS/CFT context, our results suggest that any portion of the entanglement wedge of a general boundary subregion $A$ with sufficiently high encoding complexity is robustly protected against low-rank errors acting on $A$ with no prior access to the encoding map. From the bulk point of view, we expect such bulk degrees of freedom to be causally inaccessible from the region $A$ despite being encoded in it.
Submitted 17 January, 2023;
originally announced January 2023.
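The mutual purity defined in the abstract above, $\mathcal{F}_ρ(A:B) \equiv \mathrm{Tr}\;ρ_{AB}^2 - \mathrm{Tr}\;ρ_A^2\;\mathrm{Tr}\;ρ_B^2$, can be evaluated numerically for a small bipartite density matrix. The following is a minimal sketch and not the authors' code: the function names `partial_trace`, `purity`, and `mutual_purity`, and the NumPy array layout, are illustrative assumptions.

```python
import numpy as np

def partial_trace(rho_ab, dim_a, dim_b, keep):
    """Reduced density matrix of subsystem 'A' or 'B' (hypothetical helper)."""
    # Index order after the reshape is (a, b, a', b').
    r = np.asarray(rho_ab).reshape(dim_a, dim_b, dim_a, dim_b)
    if keep == "A":
        return np.einsum("ibjb->ij", r)  # trace out B
    return np.einsum("aiaj->ij", r)      # trace out A

def purity(rho):
    """Tr(rho^2) for a density matrix."""
    return float(np.real(np.trace(rho @ rho)))

def mutual_purity(rho_ab, dim_a, dim_b):
    """Tr(rho_AB^2) - Tr(rho_A^2) * Tr(rho_B^2), as defined in the abstract."""
    rho_a = partial_trace(rho_ab, dim_a, dim_b, "A")
    rho_b = partial_trace(rho_ab, dim_a, dim_b, "B")
    return purity(rho_ab) - purity(rho_a) * purity(rho_b)

# Two-qubit examples: a Bell state and an unentangled product state.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
rho_bell = np.outer(bell, bell.conj())
product = np.array([1.0, 0.0, 0.0, 0.0])
rho_product = np.outer(product, product.conj())

print(mutual_purity(rho_bell, 2, 2))     # 1 - (1/2)(1/2) = 0.75
print(mutual_purity(rho_product, 2, 2))  # 1 - 1 * 1 = 0.0
```

For the Bell state the global state is pure while each reduced state is maximally mixed, so the mutual purity is 3/4; for a product state it vanishes.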
-
Towards Estimating Transferability using Hard Subsets
Authors:
Tarun Ram Menta,
Surgan Jandial,
Akash Patil,
Vimal KB,
Saketh Bachu,
Balaji Krishnamurthy,
Vineeth N. Balasubramanian,
Chirag Agarwal,
Mausoom Sarkar
Abstract:
As transfer learning techniques are increasingly used to transfer knowledge from the source model to the target task, it becomes important to quantify which source models are suitable for a given target task without performing computationally expensive fine tuning. In this work, we propose HASTE (HArd Subset TransfErability), a new strategy to estimate the transferability of a source model to a particular target task using only a harder subset of target data. By leveraging the internal and output representations of model, we introduce two techniques, one class agnostic and another class specific, to identify harder subsets and show that HASTE can be used with any existing transferability metric to improve their reliability. We further analyze the relation between HASTE and the optimal average log likelihood as well as negative conditional entropy and empirically validate our theoretical bounds. Our experimental results across multiple source model architectures, target datasets, and transfer learning tasks show that HASTE modified metrics are consistently better or on par with the state of the art transferability metrics.
Submitted 17 January, 2023;
originally announced January 2023.
-
Microscopic origin of the entropy of astrophysical black holes
Authors:
Vijay Balasubramanian,
Albion Lawrence,
Javier M. Magan,
Martin Sasieta
Abstract:
We construct an infinite family of microstates for black holes in Minkowski spacetime which have effective semiclassical descriptions in terms of collapsing dust shells in the black hole interior. Quantum mechanical wormholes cause these states to have exponentially small, but universal, overlaps. We show that these overlaps imply that the microstates span a Hilbert space of log dimension equal to the event horizon area divided by four times the Newton constant, explaining the statistical origin of the Bekenstein-Hawking entropy.
Submitted 2 November, 2023; v1 submitted 16 December, 2022;
originally announced December 2022.
-
Microscopic origin of the entropy of black holes in general relativity
Authors:
Vijay Balasubramanian,
Albion Lawrence,
Javier M. Magan,
Martin Sasieta
Abstract:
We construct an infinite family of microstates with geometric interiors for eternal black holes in general relativity with negative cosmological constant in any dimension. Wormholes in the Euclidean path integral for gravity cause these states to have small, but non-zero, quantum mechanical overlaps that have a universal form. The overlaps have a dramatic consequence: the microstates span a Hilbert space of log dimension equal to the Bekenstein-Hawking entropy. The semiclassical microstates we construct contain Einstein-Rosen bridges of arbitrary size behind their horizons. Our results imply that all these bridges can be interpreted as quantum superpositions of wormholes of size at most exponential in the entropy.
Submitted 17 October, 2023; v1 submitted 5 December, 2022;
originally announced December 2022.
-
On the Robustness of Explanations of Deep Neural Network Models: A Survey
Authors:
Amlan Jyoti,
Karthik Balaji Ganesh,
Manoj Gayala,
Nandita Lakshmi Tunuguntla,
Sandesh Kamath,
Vineeth N Balasubramanian
Abstract:
Explainability has been widely stated as a cornerstone of the responsible and trustworthy use of machine learning models. With the ubiquitous use of Deep Neural Network (DNN) models expanding to risk-sensitive and safety-critical domains, many methods have been proposed to explain the decisions of these models. Recent years have also seen concerted efforts that have shown how such explanations can be distorted (attacked) by minor input perturbations. While there have been many surveys that review explainability methods themselves, there has been no effort hitherto to assimilate the different methods and metrics proposed to study the robustness of explanations of DNN models. In this work, we present a comprehensive survey of methods that study, understand, attack, and defend explanations of DNN models. We also present a detailed review of different metrics used to evaluate explanation methods, as well as describe attributional attack and defense methods. We conclude with lessons and take-aways for the community towards ensuring robust explanations of DNN model predictions.
Submitted 9 November, 2022;
originally announced November 2022.
-
NESTER: An Adaptive Neurosymbolic Method for Causal Effect Estimation
Authors:
Abbavaram Gowtham Reddy,
Vineeth N Balasubramanian
Abstract:
Causal effect estimation from observational data is a central problem in causal inference. Methods based on potential outcomes framework solve this problem by exploiting inductive biases and heuristics from causal inference. Each of these methods addresses a specific aspect of causal effect estimation, such as controlling propensity score, enforcing randomization, etc., by designing neural network (NN) architectures and regularizers. In this paper, we propose an adaptive method called Neurosymbolic Causal Effect Estimator (NESTER), a generalized method for causal effect estimation. NESTER integrates the ideas used in existing methods based on multi-head NNs for causal effect estimation into one framework. We design a Domain Specific Language (DSL) tailored for causal effect estimation based on causal inductive biases used in literature. We conduct a theoretical analysis to investigate NESTER's efficacy in estimating causal effects. Our comprehensive empirical results show that NESTER performs better than state-of-the-art methods on benchmark datasets.
Submitted 8 January, 2024; v1 submitted 8 November, 2022;
originally announced November 2022.
-
Counterfactual Generation Under Confounding
Authors:
Abbavaram Gowtham Reddy,
Saloni Dash,
Amit Sharma,
Vineeth N Balasubramanian
Abstract:
A machine learning model, under the influence of observed or unobserved confounders in the training data, can learn spurious correlations and fail to generalize when deployed. For image classifiers, augmenting a training dataset using counterfactual examples has been empirically shown to break spurious correlations. However, the counterfactual generation task itself becomes more difficult as the level of confounding increases. Existing methods for counterfactual generation under confounding consider a fixed set of interventions (e.g., texture, rotation) and are not flexible enough to capture diverse data-generating processes. Given a causal generative process, we formally characterize the adverse effects of confounding on any downstream tasks and show that the correlation between generative factors (attributes) can be used to quantitatively measure confounding between generative factors. To minimize such correlation, we propose a counterfactual generation method that learns to modify the value of any attribute in an image and generate new images given a set of observed attributes, even when the dataset is highly confounded. These counterfactual images are then used to regularize the downstream classifier such that the learned representations are the same across various generative factors conditioned on the class label. Our method is computationally efficient, simple to implement, and works well for any number of generative factors and confounding variables. Our experimental results on both synthetic (MNIST variants) and real-world (CelebA) datasets show the usefulness of our approach.
Submitted 10 December, 2022; v1 submitted 22 October, 2022;
originally announced October 2022.
-
Distilling the Undistillable: Learning from a Nasty Teacher
Authors:
Surgan Jandial,
Yash Khasbage,
Arghya Pal,
Vineeth N Balasubramanian,
Balaji Krishnamurthy
Abstract:
The inadvertent stealing of private/sensitive information using Knowledge Distillation (KD) has been getting significant attention recently and has guided subsequent defense efforts considering its critical nature. The recent work Nasty Teacher proposed to develop teachers which cannot be distilled or imitated by models attacking them. However, the promise of confidentiality offered by a nasty teacher is not well studied, and as a further step to strengthen against such loopholes, we attempt to bypass its defense and steal (or extract) information in its presence successfully. Specifically, we analyze Nasty Teacher from two different directions and subsequently leverage them carefully to develop simple yet efficient methodologies, named HTC and SCM, which increase the learning from Nasty Teacher by up to 68.63% on standard datasets. Additionally, we also explore an improvised defense method based on our insights of stealing. Our detailed set of experiments and ablations on diverse models/settings demonstrate the efficacy of our approach.
Submitted 21 October, 2022;
originally announced October 2022.
-
ARUBA: An Architecture-Agnostic Balanced Loss for Aerial Object Detection
Authors:
Rebbapragada V C Sairam,
Monish Keswani,
Uttaran Sinha,
Nishit Shah,
Vineeth N Balasubramanian
Abstract:
Deep neural networks tend to reciprocate the bias of their training dataset. In object detection, the bias exists in the form of various imbalances such as class, background-foreground, and object size. In this paper, we define the size of an object as the number of pixels it covers in an image and size imbalance as the over-representation of certain sizes of objects in a dataset. We aim to address the problem of size imbalance in drone-based aerial image datasets. Existing methods for solving size imbalance are based on architectural changes that utilize multiple scales of images or feature maps for detecting objects of different sizes. We, on the other hand, propose a novel ARchitectUre-agnostic BAlanced Loss (ARUBA) that can be applied as a plugin on top of any object detection model. It follows a neighborhood-driven approach inspired by the ordinality of object size. We evaluate the effectiveness of our approach through comprehensive experiments on aerial datasets such as HRSC2016, DOTAv1.0, DOTAv1.5 and VisDrone and obtain consistent improvement in performance.
Submitted 18 November, 2023; v1 submitted 10 October, 2022;
originally announced October 2022.
-
A Tale of Two Hungarians: Tridiagonalizing Random Matrices
Authors:
Vijay Balasubramanian,
Javier M. Magan,
Qingyue Wu
Abstract:
The Hungarian physicist Eugene Wigner introduced random matrix models in physics to describe the energy spectra of atomic nuclei. As such, the main goal of Random Matrix Theory (RMT) has been to derive the eigenvalue statistics of matrices drawn from a given distribution. The Wigner approach gives powerful insights into the properties of complex, chaotic systems in thermal equilibrium. Another Hungarian, Cornelius Lanczos, suggested a method of reducing the dynamics of any quantum system to a one-dimensional chain by tridiagonalizing the Hamiltonian relative to a given initial state. In the resulting matrix, the diagonal and off-diagonal Lanczos coefficients control transition amplitudes between elements of a distinguished basis of states. We connect these two approaches to the quantum mechanics of complex systems by deriving analytical formulae relating the potential defining a general RMT, or, equivalently, its density of states, to the Lanczos coefficients and their correlations. In particular, we derive an integral relation between the average Lanczos coefficients and the density of states, and, for polynomial potentials, algebraic equations that determine the Lanczos coefficients from the potential. We obtain these results for generic initial states in the thermodynamic limit. As an application, we compute the time-dependent ``spread complexity'' in Thermo-Field Double states and the spectral form factor for Gaussian and Non-Gaussian RMTs.
Submitted 30 September, 2022; v1 submitted 17 August, 2022;
originally announced August 2022.
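The Lanczos procedure mentioned in the abstract above, tridiagonalizing a Hamiltonian relative to a given initial state, can be sketched numerically as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `lanczos_coefficients` is hypothetical, and full reorthogonalization against all earlier Krylov vectors is a choice made here for numerical stability.

```python
import numpy as np

def lanczos_coefficients(h, psi0, steps):
    """Return diagonal (a_n) and off-diagonal (b_n) Lanczos coefficients.

    h     : Hermitian matrix (the Hamiltonian)
    psi0  : initial state; normalized internally
    steps : maximum number of Krylov basis vectors to build
    """
    h = np.asarray(h, dtype=complex)
    v = np.asarray(psi0, dtype=complex)
    basis = [v / np.linalg.norm(v)]
    a, b = [], []
    for n in range(steps):
        w = h @ basis[-1]
        a.append(float(np.real(np.vdot(basis[-1], w))))
        # Full reorthogonalization (assumption made here for stability):
        # remove components along every previously built Krylov vector.
        for u in basis:
            w = w - np.vdot(u, w) * u
        norm = np.linalg.norm(w)
        if n == steps - 1 or norm < 1e-12:
            break  # requested size reached, or Krylov space exhausted
        b.append(float(norm))
        basis.append(w / norm)
    return np.array(a), np.array(b)

# Toy example: a two-level Hamiltonian started in the first basis state.
h_toy = np.array([[0.0, 1.0], [1.0, 0.0]])
a_toy, b_toy = lanczos_coefficients(h_toy, np.array([1.0, 0.0]), 2)
print(a_toy, b_toy)
```

When the Krylov space spans the full Hilbert space, the tridiagonal matrix with `a` on the diagonal and `b` on the first off-diagonals has the same spectrum as the original Hamiltonian, which gives a simple consistency check.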
-
Class-Incremental Learning with Cross-Space Clustering and Controlled Transfer
Authors:
Arjun Ashok,
K J Joseph,
Vineeth Balasubramanian
Abstract:
In class-incremental learning, the model is expected to learn new classes continually while maintaining knowledge on previous classes. The challenge here lies in preserving the model's ability to effectively represent prior classes in the feature space, while adapting it to represent incoming new classes. We propose two distillation-based objectives for class incremental learning that leverage the structure of the feature space to maintain accuracy on previous classes, as well as enable learning the new classes. In our first objective, termed cross-space clustering (CSC), we propose to use the feature space structure of the previous model to characterize directions of optimization that maximally preserve the class: directions that all instances of a specific class should collectively optimize towards, and those that they should collectively optimize away from. Apart from minimizing forgetting, this indirectly encourages the model to cluster all instances of a class in the current feature space, and gives rise to a sense of herd-immunity, allowing all samples of a class to jointly keep the model from forgetting the class. Our second objective, termed controlled transfer (CT), tackles incremental learning from an understudied perspective of inter-class transfer. CT explicitly approximates and conditions the current model on the semantic similarities between incrementally arriving classes and prior classes. This allows the model to learn classes in such a way that it maximizes positive forward transfer from similar prior classes, thus increasing plasticity, and minimizes negative backward transfer on dissimilar prior classes, thereby strengthening stability. We perform extensive experiments on two benchmark datasets, adding our method (CSCCT) on top of three prominent class-incremental learning methods. We observe consistent performance improvement on a variety of experimental settings.
Submitted 16 August, 2022; v1 submitted 7 August, 2022;
originally announced August 2022.
-
Learning Modular Structures That Generalize Out-of-Distribution
Authors:
Arjun Ashok,
Chaitanya Devaguptapu,
Vineeth Balasubramanian
Abstract:
Out-of-distribution (O.O.D.) generalization remains a key challenge for real-world machine learning systems. We describe a method for O.O.D. generalization that, through training, encourages models to only preserve features in the network that are well reused across multiple training domains. Our method combines two complementary neuron-level regularizers with a probabilistic differentiable binary mask over the network, to extract a modular sub-network that achieves better O.O.D. performance than the original network. Preliminary evaluation on two benchmark datasets corroborates the promise of our method.
Submitted 7 August, 2022;
originally announced August 2022.
-
Novel Class Discovery without Forgetting
Authors:
K J Joseph,
Sujoy Paul,
Gaurav Aggarwal,
Soma Biswas,
Piyush Rai,
Kai Han,
Vineeth N Balasubramanian
Abstract:
Humans possess an innate ability to identify and differentiate instances that they are not familiar with, by leveraging and adapting the knowledge that they have acquired so far. Importantly, they achieve this without deteriorating the performance on their earlier learning. Inspired by this, we identify and formulate a new, pragmatic problem setting of NCDwF: Novel Class Discovery without Forgetting, which tasks a machine learning model to incrementally discover novel categories of instances from unlabeled data, while maintaining its performance on the previously seen categories. We propose 1) a method to generate pseudo-latent representations which act as a proxy for (no longer available) labeled data, thereby alleviating forgetting, 2) a mutual-information based regularizer which enhances unsupervised discovery of novel classes, and 3) a simple Known Class Identifier which aids generalized inference when the testing data contains instances from both seen and unseen categories. We introduce experimental protocols based on CIFAR-10, CIFAR-100 and ImageNet-1000 to measure the trade-off between knowledge retention and novel class discovery. Our extensive evaluations reveal that existing models catastrophically forget previously seen categories while identifying novel categories, while our method is able to effectively balance between the competing objectives. We hope our work will attract further research into this newly identified pragmatic problem setting.
Submitted 21 July, 2022;
originally announced July 2022.
-
INDIGO: Intrinsic Multimodality for Domain Generalization
Authors:
Puneet Mangla,
Shivam Chandhok,
Milan Aggarwal,
Vineeth N Balasubramanian,
Balaji Krishnamurthy
Abstract:
For models to generalize under unseen domains (a.k.a. domain generalization), it is crucial to learn feature representations that are domain-agnostic and capture the underlying semantics that makes up an object category. Recent advances towards weakly supervised vision-language models that learn holistic representations from cheap weakly supervised noisy text annotations have shown their ability on semantic understanding by capturing object characteristics that generalize under different domains. However, when multiple source domains are involved, the cost of curating textual annotations for every image in the dataset can blow up several times, depending on their number. This makes the process tedious and infeasible, hindering us from directly using these supervised vision-language approaches to achieve the best generalization on an unseen domain. Motivated by this, we study how multimodal information from existing pre-trained multimodal networks can be leveraged in an "intrinsic" way to make systems generalize under unseen domains. To this end, we propose IntriNsic multimodality for DomaIn GeneralizatiOn (INDIGO), a simple and elegant way of leveraging the intrinsic modality present in these pre-trained multimodal networks along with the visual modality to enhance generalization to unseen domains at test-time. We experiment on several Domain Generalization settings (ClosedDG, OpenDG, and Limited sources) and show state-of-the-art generalization performance on unseen domains. Further, we provide a thorough analysis to develop a holistic understanding of INDIGO.
Submitted 13 June, 2022;
originally announced June 2022.
-
On Conditioning the Input Noise for Controlled Image Generation with Diffusion Models
Authors:
Vedant Singh,
Surgan Jandial,
Ayush Chopra,
Siddharth Ramesh,
Balaji Krishnamurthy,
Vineeth N. Balasubramanian
Abstract:
Conditional image generation has paved the way for several breakthroughs in image editing, generating stock photos and 3-D object generation. This continues to be a significant area of interest with the rise of new state-of-the-art methods that are based on diffusion models. However, diffusion models provide very little control over the generated image, which led to subsequent works exploring techniques like classifier guidance, which provides a way to trade off diversity with fidelity. In this work, we explore techniques to condition diffusion models with carefully crafted input noise artifacts. This allows generation of images conditioned on semantic attributes. This is different from existing approaches that input Gaussian noise and further introduce conditioning at the diffusion model's inference step. Our experiments over several examples and conditional settings show the potential of our approach.
Submitted 8 May, 2022;
originally announced May 2022.
-
Proto2Proto: Can you recognize the car, the way I do?
Authors:
Monish Keswani,
Sriranjani Ramakrishnan,
Nishant Reddy,
Vineeth N Balasubramanian
Abstract:
Prototypical methods have recently gained a lot of attention due to their intrinsic interpretable nature, which is obtained through the prototypes. With growing use cases of model reuse and distillation, there is a need to also study transfer of interpretability from one model to another. We present Proto2Proto, a novel method to transfer interpretability of one prototypical part network to another via knowledge distillation. Our approach aims to add interpretability to the "dark" knowledge transferred from the teacher to the shallower student model. We propose two novel losses: "Global Explanation" loss and "Patch-Prototype Correspondence" loss to facilitate such a transfer. Global Explanation loss forces the student prototypes to be close to teacher prototypes, and Patch-Prototype Correspondence loss enforces the local representations of the student to be similar to that of the teacher. Further, we propose three novel metrics to evaluate the student's proximity to the teacher as measures of interpretability transfer in our settings. We qualitatively and quantitatively demonstrate the effectiveness of our method on CUB-200-2011 and Stanford Cars datasets. Our experiments show that the proposed method indeed achieves interpretability transfer from teacher to student while simultaneously exhibiting competitive performance.
Submitted 2 May, 2022; v1 submitted 25 April, 2022;
originally announced April 2022.
-
Spacing Loss for Discovering Novel Categories
Authors:
K J Joseph,
Sujoy Paul,
Gaurav Aggarwal,
Soma Biswas,
Piyush Rai,
Kai Han,
Vineeth N Balasubramanian
Abstract:
Novel Class Discovery (NCD) is a learning paradigm, where a machine learning model is tasked to semantically group instances from unlabeled data, by utilizing labeled instances from a disjoint set of classes. In this work, we first characterize existing NCD approaches into single-stage and two-stage methods based on whether they require access to labeled and unlabeled data together while discovering new classes. Next, we devise a simple yet powerful loss function that enforces separability in the latent space using cues from multi-dimensional scaling, which we refer to as Spacing Loss. Our proposed formulation can either operate as a standalone method or can be plugged into existing methods to enhance them. We validate the efficacy of Spacing Loss with thorough experimental evaluation across multiple settings on CIFAR-10 and CIFAR-100 datasets.
Submitted 22 April, 2022;
originally announced April 2022.
-
CholecTriplet2021: A benchmark challenge for surgical action triplet recognition
Authors:
Chinedu Innocent Nwoye,
Deepak Alapatt,
Tong Yu,
Armine Vardazaryan,
Fangfang Xia,
Zixuan Zhao,
Tong Xia,
Fucang Jia,
Yuxuan Yang,
Hao Wang,
Derong Yu,
Guoyan Zheng,
Xiaotian Duan,
Neil Getty,
Ricardo Sanchez-Matilla,
Maria Robu,
Li Zhang,
Huabin Chen,
Jiacheng Wang,
Liansheng Wang,
Bokai Zhang,
Beerend Gerats,
Sista Raviteja,
Rachana Sathish,
Rong Tao
, et al. (37 additional authors not shown)
Abstract:
Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out fine-grained interaction details about the surgical activity; yet those are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as triplets of <instrument, verb, target> combination delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms by competing teams are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison between them, in-depth result analysis, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and also highlights interesting directions for future research on fine-grained surgical activity recognition which is of utmost importance for the development of AI in surgery.
Submitted 29 December, 2022; v1 submitted 10 April, 2022;
originally announced April 2022.
-
Unseen Classes at a Later Time? No Problem
Authors:
Hari Chandana Kuchibhotla,
Sumitra S Malagi,
Shivam Chandhok,
Vineeth N Balasubramanian
Abstract:
Recent progress towards learning from limited supervision has encouraged efforts towards designing models that can recognize novel classes at test time (generalized zero-shot learning or GZSL). GZSL approaches assume knowledge of all classes, with or without labeled data, beforehand. However, practical scenarios demand models that are adaptable and can handle dynamic addition of new seen and unseen classes on the fly (that is, continual generalized zero-shot learning or CGZSL). One solution is to sequentially retrain and reuse conventional GZSL methods; however, such an approach suffers from catastrophic forgetting, leading to suboptimal generalization performance. A few recent efforts towards tackling CGZSL have been limited by differences in settings, practicality, data splits and protocols followed, inhibiting fair comparison and a clear direction forward. Motivated by these observations, in this work, we firstly consolidate the different CGZSL setting variants and propose a new Online-CGZSL setting which is more practical and flexible. Secondly, we introduce a unified feature-generative framework for CGZSL that leverages bi-directional incremental alignment to dynamically adapt to addition of new classes, with or without labeled data, that arrive over time in any of these CGZSL settings. Our comprehensive experiments and analysis on five benchmark datasets and comparison with baselines show that our approach consistently outperforms existing methods, especially on the more practical Online setting.
Submitted 30 March, 2022;
originally announced March 2022.
-
Energy-based Latent Aligner for Incremental Learning
Authors:
K J Joseph,
Salman Khan,
Fahad Shahbaz Khan,
Rao Muhammad Anwer,
Vineeth N Balasubramanian
Abstract:
Deep learning models tend to forget their earlier knowledge while incrementally learning new tasks. This behavior emerges because the parameter updates optimized for the new tasks may not align well with the updates suitable for older tasks. The resulting latent representation mismatch causes forgetting. In this work, we propose ELI: Energy-based Latent Aligner for Incremental Learning, which first learns an energy manifold for the latent representations such that previous task latents will have low energy and the current task latents have high energy values. This learned manifold is used to counter the representational shift that happens during incremental learning. The implicit regularization that is offered by our proposed methodology can be used as a plug-and-play module in existing incremental learning methodologies. We validate this through extensive evaluation on CIFAR-100, ImageNet subset, ImageNet 1k and Pascal VOC datasets. We observe consistent improvement when ELI is added to three prominent methodologies in class-incremental learning, across multiple incremental settings. Further, when added to the state-of-the-art incremental object detector, ELI provides over 5% improvement in detection accuracy, corroborating its effectiveness and complementary advantage to existing art.
Submitted 28 March, 2022;
originally announced March 2022.
-
Quantum Error Correction in the Black Hole Interior
Authors:
Vijay Balasubramanian,
Arjun Kar,
Cathy Li,
Onkar Parrikar
Abstract:
We study the quantum error correction properties of the black hole interior in a toy model for an evaporating black hole: Jackiw-Teitelboim gravity entangled with a non-gravitational bath. After the Page time, the black hole interior degrees of freedom in this system are encoded in the bath Hilbert space. We use the gravitational path integral to show that the interior density matrix is correctable against the action of quantum operations on the bath which (i) do not have prior access to details of the black hole microstates, and (ii) do not have a large, negative coherent information with respect to the maximally mixed state on the bath, with the lower bound controlled by the black hole entropy and code subspace dimension. Thus, the encoding of the black hole interior in the radiation is robust against generic, low-rank quantum operations. For erasure errors, gravity comes within an $O(1)$ distance of saturating the Singleton bound on the tolerance of error correcting codes. For typical errors in the bath to corrupt the interior, they must have a rank that is a large multiple of the bath Hilbert space dimension, with the precise coefficient set by the black hole entropy and code subspace dimension.
Submitted 11 April, 2022; v1 submitted 3 March, 2022;
originally announced March 2022.
-
Quantum chaos and the complexity of spread of states
Authors:
Vijay Balasubramanian,
Pawel Caputa,
Javier Magan,
Qingyue Wu
Abstract:
We propose a measure of quantum state complexity defined by minimizing the spread of the wave-function over all choices of basis. Our measure is controlled by the "survival amplitude" for a state to remain unchanged, and can be efficiently computed in theories with discrete spectra. For continuous Hamiltonian evolution, it generalizes Krylov operator complexity to quantum states. We apply our methods to the harmonic and inverted oscillators, particles on group manifolds, the Schwarzian theory, the SYK model, and random matrix models. For time-evolved thermofield double states in chaotic systems our measure shows four regimes: a linear "ramp" up to a "peak" that is exponential in the entropy, followed by a "slope" down to a "plateau". These regimes arise in the same physics producing the slope-dip-ramp-plateau structure of the Spectral Form Factor. Specifically, the complexity slope arises from spectral rigidity, distinguishing different random matrix ensembles.
Submitted 20 April, 2022; v1 submitted 14 February, 2022;
originally announced February 2022.
-
On Causally Disentangled Representations
Authors:
Abbavaram Gowtham Reddy,
Benin Godfrey L,
Vineeth N Balasubramanian
Abstract:
Representation learners that disentangle factors of variation have already proven to be important in addressing various real world concerns such as fairness and interpretability. Initially consisting of unsupervised models with independence assumptions, more recently, weak supervision and correlated features have been explored, but without a causal view of the generative process. In contrast, we work under the regime of a causal generative process where generative factors are either independent or can be potentially confounded by a set of observed or unobserved confounders. We present an analysis of disentangled representations through the notion of disentangled causal process. We motivate the need for new metrics and datasets to study causal disentanglement and propose two evaluation metrics and a dataset. We show that our metrics capture the desiderata of disentangled causal process. Finally, we perform an empirical study on state-of-the-art disentangled representation learners using our metrics and dataset to evaluate them from a causal perspective.
Submitted 10 December, 2021;
originally announced December 2021.
-
Matching Learned Causal Effects of Neural Networks with Domain Priors
Authors:
Sai Srinivas Kancheti,
Abbavaram Gowtham Reddy,
Vineeth N Balasubramanian,
Amit Sharma
Abstract:
A trained neural network can be interpreted as a structural causal model (SCM) that provides the effect of changing input variables on the model's output. However, if training data contains both causal and correlational relationships, a model that optimizes prediction accuracy may not necessarily learn the true causal relationships between input and output variables. On the other hand, expert users often have prior knowledge of the causal relationship between certain input variables and output from domain knowledge. Therefore, we propose a regularization method that aligns the learned causal effects of a neural network with domain priors, including both direct and total causal effects. We show that this approach can generalize to different kinds of domain priors, including monotonicity of causal effect of an input variable on output or zero causal effect of a variable on output for purposes of fairness. Our experiments on twelve benchmark datasets show its utility in regularizing a neural network model to maintain desired causal effects, without compromising on accuracy. Importantly, we also show that a model thus trained is robust and gets improved accuracy on noisy inputs.
Submitted 29 June, 2022; v1 submitted 24 November, 2021;
originally announced November 2021.
-
Feature Generation for Long-tail Classification
Authors:
Rahul Vigneswaran,
Marc T. Law,
Vineeth N. Balasubramanian,
Makarand Tapaswi
Abstract:
The visual world naturally exhibits an imbalance in the number of object or scene instances resulting in a long-tailed distribution. This imbalance poses significant challenges for classification models based on deep learning. Oversampling instances of the tail classes attempts to solve this imbalance. However, the limited visual diversity results in a network with poor representation ability. A simple counter to this is decoupling the representation and classifier networks and using oversampling only to train the classifier. In this paper, instead of repeatedly re-sampling the same image (and thereby features), we explore a direction that attempts to generate meaningful features by estimating the tail category's distribution. Inspired by ideas from recent work on few-shot learning, we create calibrated distributions to sample additional features that are subsequently used to train the classifier. Through several experiments on the CIFAR-100-LT (long-tail) dataset with varying imbalance factors and on mini-ImageNet-LT (long-tail), we show the efficacy of our approach and establish a new state-of-the-art. We also present a qualitative analysis of generated features using t-SNE visualizations and analyze the nearest neighbors used to calibrate the tail class distributions. Our code is available at https://github.com/rahulvigneswaran/TailCalibX.
Submitted 10 November, 2021;
originally announced November 2021.
-
Get Fooled for the Right Reason: Improving Adversarial Robustness through a Teacher-guided Curriculum Learning Approach
Authors:
Anindya Sarkar,
Anirban Sarkar,
Sowrya Gali,
Vineeth N Balasubramanian
Abstract:
Current SOTA adversarially robust models are mostly based on adversarial training (AT) and differ only by some regularizers either at inner maximization or outer minimization steps. Being repetitive in nature during the inner maximization step, they take a huge time to train. We propose a non-iterative method that enforces the following ideas during training. Attribution maps are more aligned to the actual object in the image for adversarially robust models compared to naturally trained models. Also, the allowed set of pixels to perturb an image (that changes model decision) should be restricted to the object pixels only, which reduces the attack strength by limiting the attack space. Our method achieves significant performance gains with a little extra effort (10-20%) over existing AT models and outperforms all other methods in terms of adversarial as well as natural accuracy. We have performed extensive experimentation with CIFAR-10, CIFAR-100, and TinyImageNet datasets and reported results against many popular strong adversarial attacks to prove the effectiveness of our method.
Submitted 30 October, 2021;
originally announced November 2021.
-
Multi-Domain Incremental Learning for Semantic Segmentation
Authors:
Prachi Garg,
Rohit Saluja,
Vineeth N Balasubramanian,
Chetan Arora,
Anbumani Subramanian,
C. V. Jawahar
Abstract:
Recent efforts in multi-domain learning for semantic segmentation attempt to learn multiple geographical datasets in a universal, joint model. A simple fine-tuning experiment performed sequentially on three popular road scene segmentation datasets demonstrates that existing segmentation frameworks fail at incrementally learning on a series of visually disparate geographical domains. When learning a new domain, the model catastrophically forgets previously learned knowledge. In this work, we pose the problem of multi-domain incremental learning for semantic segmentation. Given a model trained on a particular geographical domain, the goal is to (i) incrementally learn a new geographical domain, (ii) while retaining performance on the old domain, (iii) given that the previous domain's dataset is not accessible. We propose a dynamic architecture that assigns universally shared, domain-invariant parameters to capture homogeneous semantic features present in all domains, while dedicated domain-specific parameters learn the statistics of each domain. Our novel optimization strategy helps achieve a good balance between retention of old knowledge (stability) and acquiring new knowledge (plasticity). We demonstrate the effectiveness of our proposed solution on domain incremental settings pertaining to real-world driving scenes from roads of Germany (Cityscapes), the United States (BDD100k), and India (IDD).
Submitted 23 October, 2021;
originally announced October 2021.
-
A Framework for Learning Ante-hoc Explainable Models via Concepts
Authors:
Anirban Sarkar,
Deepak Vijaykeerthy,
Anindya Sarkar,
Vineeth N Balasubramanian
Abstract:
Self-explaining deep models are designed to learn the latent concept-based explanations implicitly during training, which eliminates the requirement of any post-hoc explanation generation technique. In this work, we propose one such model that appends an explanation generation module on top of any basic network and jointly trains the whole module that shows high predictive performance and generates meaningful explanations in terms of concepts. Our training strategy is suitable for unsupervised concept learning with much lesser parameter space requirements compared to baseline methods. Our proposed model also has provision for leveraging self-supervision on concepts to extract better explanations. However, with full concept supervision, we achieve the best predictive performance compared to recently proposed concept-based explainable models. We report both qualitative and quantitative results with our method, which shows better performance than recently proposed concept-based explainability methods. We reported exhaustive results with two datasets without ground truth concepts, i.e., CIFAR10, ImageNet, and two datasets with ground truth concepts, i.e., AwA2, CUB-200, to show the effectiveness of our method for both cases. To the best of our knowledge, we are the first ante-hoc explanation generation method to show results with a large-scale dataset such as ImageNet.
Submitted 30 November, 2021; v1 submitted 25 August, 2021;
originally announced August 2021.