Search | arXiv e-print repository

On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models

Authors: Sree Harsha Tanneru, Dan Ley, Chirag Agarwal, Himabindu Lakkaraju

Abstract: As Large Language Models (LLMs) are increasingly being employed in real-world applications in critical domains such as healthcare, it is important to ensure that the Chain-of-Thought (CoT) reasoning generated by these models faithfully captures their underlying behavior. While LLMs are known to generate CoT reasoning that is appealing to humans, prior studies have shown that these explanations do not accurately reflect the actual behavior of the underlying LLMs. In this work, we explore the promise of three broad approaches commonly employed to steer the behavior of LLMs to enhance the faithfulness of the CoT reasoning generated by LLMs: in-context learning, fine-tuning, and activation editing. Specifically, we introduce novel strategies for in-context learning, fine-tuning, and activation editing aimed at improving the faithfulness of the CoT reasoning. We then carry out extensive empirical analyses with multiple benchmark datasets to explore the promise of these strategies. Our analyses indicate that these strategies offer limited success in improving the faithfulness of the CoT reasoning, with only slight performance enhancements in controlled scenarios. Activation editing demonstrated minimal success, while fine-tuning and in-context learning achieved marginal improvements that failed to generalize across diverse reasoning and truthful question-answering benchmarks. In summary, our work underscores the inherent difficulty in eliciting faithful CoT reasoning from LLMs, suggesting that the current array of approaches may not be sufficient to address this complex challenge.

Submitted 1 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

arXiv:2403.03744 [pdf, other]

MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models

Authors: Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju

Abstract: As large language models (LLMs) develop increasingly sophisticated capabilities and find applications in medical settings, it becomes important to assess their medical safety due to their far-reaching implications for personal and public health, patient safety, and human rights. However, there is little to no understanding of the notion of medical safety in the context of LLMs, let alone how to evaluate and improve it. To address this gap, we first define the notion of medical safety in LLMs based on the Principles of Medical Ethics set forth by the American Medical Association. We then leverage this understanding to introduce MedSafetyBench, the first benchmark dataset specifically designed to measure the medical safety of LLMs. We demonstrate the utility of MedSafetyBench by using it to evaluate and improve the medical safety of LLMs. Our results show that publicly-available medical LLMs do not meet standards of medical safety and that fine-tuning them using MedSafetyBench improves their medical safety. By introducing this new benchmark dataset, our work enables a systematic study of the state of medical safety in LLMs and motivates future work in this area, thereby mitigating the safety risks of LLMs in medicine.

Submitted 13 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.14145 [pdf, other]

Multiply Robust Estimation for Local Distribution Shifts with Multiple Domains

Authors: Steven Wilkins-Reeves, Xu Chen, Qi Ma, Christine Agarwal, Aude Hofleitner

Abstract: Distribution shifts are ubiquitous in real-world machine learning applications, posing a challenge to the generalization of models trained on one data distribution to another. We focus on scenarios where data distributions vary across multiple segments of the entire population and only make local assumptions about the differences between training and test (deployment) distributions within each segment. We propose a two-stage multiply robust estimation method to improve model performance on each individual segment for tabular data analysis. The method involves fitting a linear combination of the base models, learned using clusters of training data from multiple segments, followed by a refinement step for each segment. Our method is designed to be implemented with commonly used off-the-shelf machine learning models. We establish theoretical guarantees on the generalization bound of the method on the test risk. With extensive experiments on synthetic and real datasets, we demonstrate that the proposed method substantially improves over existing alternatives in prediction accuracy and robustness on both regression and classification tasks. We also assess its effectiveness on a user city prediction dataset from Meta.

Submitted 3 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: 9 pages, 4 figures

arXiv:2402.06625 [pdf, other]

Understanding the Effects of Iterative Prompting on Truthfulness

Authors: Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

Abstract: The development of Large Language Models (LLMs) has notably transformed numerous sectors, offering impressive text generation capabilities. Yet, the reliability and truthfulness of these models remain pressing concerns. To this end, we investigate iterative prompting, a strategy hypothesized to refine LLM responses, assessing its impact on LLM truthfulness, an area which has not been thoroughly explored. Our extensive experiments delve into the intricacies of iterative prompting variants, examining their influence on the accuracy and calibration of model responses. Our findings reveal that naive prompting methods significantly undermine truthfulness, leading to exacerbated calibration errors. In response to these challenges, we introduce several prompting variants designed to address the identified issues. These variants demonstrate marked improvements over existing baselines, signaling a promising direction for future research. Our work provides a nuanced understanding of iterative prompting and introduces novel approaches to enhance the truthfulness of LLMs, thereby contributing to the development of more accurate and trustworthy AI systems.

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2402.04614 [pdf, other]

Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models

Authors: Chirag Agarwal, Sree Harsha Tanneru, Himabindu Lakkaraju

Abstract: Large Language Models (LLMs) are deployed as powerful tools for several natural language processing (NLP) applications. Recent works show that modern LLMs can generate self-explanations (SEs), which elicit their intermediate reasoning steps for explaining their behavior. Self-explanations have seen widespread adoption owing to their conversational and plausible nature. However, there is little to no understanding of their faithfulness. In this work, we discuss the dichotomy between faithfulness and plausibility in SEs generated by LLMs. We argue that while LLMs are adept at generating plausible explanations -- seemingly logical and coherent to human users -- these explanations do not necessarily align with the reasoning processes of the LLMs, raising concerns about their faithfulness. We highlight that the current trend towards increasing the plausibility of explanations, primarily driven by the demand for user-friendly interfaces, may come at the cost of diminishing their faithfulness. We assert that the faithfulness of explanations is critical in LLMs employed for high-stakes decision-making. Moreover, we emphasize the need for a systematic characterization of the faithfulness-plausibility requirements of different real-world applications and for ensuring that explanations meet those needs. While there are several approaches to improving plausibility, improving faithfulness is an open challenge. We call upon the community to develop novel methods to enhance the faithfulness of self-explanations, thereby enabling transparent deployment of LLMs in diverse high-stakes settings.

Submitted 13 March, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

arXiv:2311.03533 [pdf, other]

Quantifying Uncertainty in Natural Language Explanations of Large Language Models

Authors: Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju

Abstract: Large Language Models (LLMs) are increasingly used as powerful tools for several high-stakes natural language processing (NLP) applications. Recent prompting works claim to elicit intermediate reasoning steps and key tokens that serve as proxy explanations for LLM predictions. However, there is no certainty whether these explanations are reliable and reflect the LLMs' behavior. In this work, we make one of the first attempts at quantifying the uncertainty in explanations of LLMs. To this end, we propose two novel metrics -- $\textit{Verbalized Uncertainty}$ and $\textit{Probing Uncertainty}$ -- to quantify the uncertainty of generated explanations. While verbalized uncertainty involves prompting the LLM to express its confidence in its explanations, probing uncertainty leverages sample and model perturbations as a means to quantify the uncertainty. Our empirical analysis of benchmark datasets reveals that verbalized uncertainty is not a reliable estimate of explanation confidence. Further, we show that the probing uncertainty estimates are correlated with the faithfulness of an explanation, with lower uncertainty corresponding to explanations with higher faithfulness. Our study provides insights into the challenges and opportunities of quantifying uncertainty in LLM explanations, contributing to the broader discussion of the trustworthiness of foundation models.

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2310.05797 [pdf, other]

Are Large Language Models Post Hoc Explainers?

Authors: Nicholas Kroeger, Dan Ley, Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

Abstract: The increasing use of predictive models in high-stakes settings highlights the need for ensuring that relevant stakeholders understand and trust the decisions made by these models. To this end, several approaches have been proposed in recent literature to explain the behavior of complex predictive models in a post hoc fashion. However, despite the growing number of such post hoc explanation techniques, many require white-box access to the model and/or are computationally expensive, highlighting the need for next-generation post hoc explainers. Recently, Large Language Models (LLMs) have emerged as powerful tools that are effective at a wide variety of tasks. However, their potential to explain the behavior of other complex predictive models remains relatively unexplored. In this work, we carry out one of the initial explorations to analyze the effectiveness of LLMs in explaining other complex predictive models. To this end, we propose three novel approaches that exploit the in-context learning (ICL) capabilities of LLMs to explain the predictions made by other complex models. We conduct extensive experimentation with these approaches on real-world datasets to demonstrate that LLMs perform on par with state-of-the-art post hoc explainers, opening up promising avenues for future research into LLM-based post hoc explanations of complex predictive models.

Submitted 26 February, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

arXiv:2309.16452 [pdf, other]

On the Trade-offs between Adversarial Robustness and Actionable Explanations

Authors: Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

Abstract: As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders. However, it is unclear if these two notions can be simultaneously achieved or if there exist trade-offs between them. In this work, we make one of the first attempts at studying the impact of adversarially robust models on actionable explanations which provide end users with a means for recourse. We theoretically and empirically analyze the cost (ease of implementation) and validity (probability of obtaining a positive model prediction) of recourses output by state-of-the-art algorithms when the underlying models are adversarially robust vs. non-robust. More specifically, we derive theoretical bounds on the differences between the cost and the validity of the recourses generated by state-of-the-art algorithms for adversarially robust vs. non-robust linear and non-linear models. Our empirical results with multiple real-world datasets validate our theoretical results and show the impact of varying degrees of model robustness on the cost and validity of the resulting recourses. Our analyses demonstrate that adversarially robust models significantly increase the cost and reduce the validity of the resulting recourses, thus shedding light on the inherent trade-offs between adversarial robustness and actionable explanations.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.02705 [pdf, other]

Certifying LLM Safety against Adversarial Prompting

Authors: Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju

Abstract: Large language models (LLMs) are vulnerable to adversarial attacks that add malicious tokens to an input prompt to bypass the safety guardrails of an LLM and cause it to produce harmful content. In this work, we introduce erase-and-check, the first framework for defending against adversarial prompts with certifiable safety guarantees. Given a prompt, our procedure erases tokens individually and inspects the resulting subsequences using a safety filter. Our safety certificate guarantees that harmful prompts are not mislabeled as safe due to an adversarial attack up to a certain size. We implement the safety filter in two ways, using Llama 2 and DistilBERT, and compare the performance of erase-and-check for the two cases. We defend against three attack modes: i) adversarial suffix, where an adversarial sequence is appended at the end of a harmful prompt; ii) adversarial insertion, where the adversarial sequence is inserted anywhere in the middle of the prompt; and iii) adversarial infusion, where adversarial tokens are inserted at arbitrary positions in the prompt, not necessarily as a contiguous block. Our experimental results demonstrate that this procedure can obtain strong certified safety guarantees on harmful prompts while maintaining good empirical performance on safe prompts. Additionally, we propose three efficient empirical defenses: i) RandEC, a randomized subsampling version of erase-and-check; ii) GreedyEC, which greedily erases tokens that maximize the softmax score of the harmful class; and iii) GradEC, which uses gradient information to optimize tokens to erase. We demonstrate their effectiveness against adversarial prompts generated by the Greedy Coordinate Gradient (GCG) attack algorithm. The code for our experiments is available at https://github.com/aounon/certified-llm-safety.

Submitted 12 February, 2024; v1 submitted 6 September, 2023; originally announced September 2023.
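
Illustration (not part of the listing): the abstract above describes erasing tokens and checking each resulting subsequence with a safety filter. A minimal sketch of the adversarial-suffix mode, under stated assumptions, might look like the following. The function names, the `max_erase` parameter, and the toy filter are hypothetical stand-ins chosen for this example; they are not taken from the authors' repository, and a real filter would be a trained classifier rather than an exact-match rule.

```python
from typing import Callable, List


def erase_and_check_suffix(
    tokens: List[str],
    is_harmful: Callable[[List[str]], bool],
    max_erase: int,
) -> bool:
    """Return True if the prompt, or any version of it with up to
    `max_erase` trailing tokens erased, is flagged by the filter.

    Assumption: this covers only the suffix attack mode, where the
    adversarial tokens are appended at the end of the prompt.
    """
    # Never erase the whole prompt: keep at least one token.
    limit = min(max_erase, len(tokens) - 1)
    for erased in range(0, limit + 1):
        candidate = tokens[: len(tokens) - erased]
        if is_harmful(candidate):
            return True
    return False


def toy_filter(tokens: List[str]) -> bool:
    """Hypothetical filter that an appended suffix can fool: it only
    flags the exact two-token sequence below."""
    return tokens == ["build", "bomb"]


attacked_prompt = ["build", "bomb", "xx", "yy"]  # harmful prompt plus 2-token suffix
print(toy_filter(attacked_prompt))                         # filter alone is fooled
print(erase_and_check_suffix(attacked_prompt, toy_filter, 2))
```

In this sketch the filter alone misses the attacked prompt, while erasing up to two trailing tokens recovers the original sequence that the filter flags. If the erase budget is smaller than the suffix length, the check no longer recovers it, which is consistent with the abstract's phrasing of a guarantee "up to a certain size".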

arXiv:2307.13192 [pdf, other]

Counterfactual Explanation Policies in RL

Authors: Shripad V. Deshmukh, Srivatsan R, Supriti Vijay, Jayakumar Subramanian, Chirag Agarwal

Abstract: As Reinforcement Learning (RL) agents are increasingly employed in diverse decision-making problems using reward preferences, it becomes important to ensure that policies learned by these frameworks in mapping observations to a probability distribution of the possible actions are explainable. However, there is little to no work in the systematic understanding of these complex policies in a contrastive manner, i.e., what minimal changes to the policy would improve/worsen its performance to a desired level. In this work, we present COUNTERPOL, the first framework to analyze RL policies using counterfactual explanations in the form of minimal changes to the policy that lead to the desired outcome. We do so by incorporating counterfactuals in supervised learning in RL with the target outcome regulated using desired return. We establish a theoretical connection between COUNTERPOL and widely used trust region-based policy optimization methods in RL. Extensive empirical analysis shows the efficacy of COUNTERPOL in generating explanations for (un)learning skills while keeping close to the original policy. Our results on five different RL environments with diverse state and action spaces demonstrate the utility of counterfactual explanations, paving the way for new frontiers in designing and developing counterfactual policies.

Submitted 24 July, 2023; originally announced July 2023.

Comments: ICML Workshop on Counterfactuals in Minds and Machines, 2023

arXiv:2305.04073 [pdf, other]

Explaining RL Decisions with Trajectories

Authors: Shripad Vilasrao Deshmukh, Arpan Dasgupta, Balaji Krishnamurthy, Nan Jiang, Chirag Agarwal, Georgios Theocharous, Jayakumar Subramanian

Abstract: Explanation is a key component for the adoption of reinforcement learning (RL) in many real-world decision-making problems. In the literature, the explanation is often provided by saliency attribution to the features of the RL agent's state. In this work, we propose a complementary approach to these explanations, particularly for offline RL, where we attribute the policy decisions of a trained RL agent to the trajectories encountered by it during training. To do so, we encode trajectories in offline training data individually as well as collectively (encoding a set of trajectories). We then attribute policy decisions to a set of trajectories in this encoded space by estimating the sensitivity of the decision with respect to that set. Further, we demonstrate the effectiveness of the proposed approach in terms of quality of attributions as well as practical scalability in diverse environments that involve both discrete and continuous state and action spaces such as grid-worlds, video games (Atari) and continuous control (MuJoCo). We also conduct a human study on a simple navigation task to observe how their understanding of the task compares with data attributed for a trained RL policy. Keywords -- Explainable AI, Verifiability of AI Decisions, Explainable RL.

Submitted 22 January, 2024; v1 submitted 6 May, 2023; originally announced May 2023.

Comments: Published at International Conference on Learning Representations (ICLR), 2023

arXiv:2304.12631 [pdf, other]

Explain like I am BM25: Interpreting a Dense Model's Ranked-List with a Sparse Approximation

Authors: Michael Llordes, Debasis Ganguly, Sumit Bhatia, Chirag Agarwal

Abstract: Neural retrieval models (NRMs) have been shown to outperform their statistical counterparts owing to their ability to capture semantic meaning via dense document representations. These models, however, suffer from poor interpretability as they do not rely on explicit term matching. As a form of local per-query explanations, we introduce the notion of equivalent queries that are generated by maximizing the similarity between the NRM's results and the result set of a sparse retrieval system with the equivalent query. We then compare this approach with existing methods such as RM3-based query expansion and contrast differences in retrieval effectiveness and in the terms generated by each approach.

Submitted 25 April, 2023; originally announced April 2023.

Comments: Accepted at SIGIR 2023

arXiv:2303.10431 [pdf, other]

DeAR: Debiasing Vision-Language Models with Additive Residuals

Authors: Ashish Seth, Mayur Hemani, Chirag Agarwal

Abstract: Large pre-trained vision-language models (VLMs) reduce the time for developing predictive models for various vision-grounded language downstream tasks by providing rich, adaptable image and text representations. However, these models suffer from societal biases owing to the skewed distribution of various identity groups in the training data. These biases manifest as the skewed similarity between the representations for specific text concepts and images of people of different identity groups and, therefore, limit the usefulness of such models in real-world high-stakes applications. In this work, we present DeAR (Debiasing with Additive Residuals), a novel debiasing method that learns additive residual image representations to offset the original representations, ensuring fair output representations. In doing so, it reduces the ability of the representations to distinguish between the different identity groups. Further, we observe that the current fairness tests are performed on limited face image datasets that fail to indicate why a specific text concept should/should not apply to them. To bridge this gap and better evaluate DeAR, we introduce the Protected Attribute Tag Association (PATA) dataset - a new context-based bias benchmarking dataset for evaluating the fairness of large pre-trained VLMs. Additionally, PATA provides visual context for a diverse human population in different scenarios with both positive and negative connotations. Experimental results for fairness and zero-shot performance preservation using multiple datasets demonstrate the efficacy of our framework.

Submitted 18 March, 2023; originally announced March 2023.

Comments: Accepted to CVPR'23. Codes and dataset will be released soon

arXiv:2302.13406 [pdf, other]

GNNDelete: A General Strategy for Unlearning in Graph Neural Networks

Authors: Jiali Cheng, George Dasoulas, Huan He, Chirag Agarwal, Marinka Zitnik

Abstract: Graph unlearning, which involves deleting graph elements such as nodes, node labels, and relationships from a trained graph neural network (GNN) model, is crucial for real-world applications where data elements may become irrelevant, inaccurate, or privacy-sensitive. However, existing methods for graph unlearning either deteriorate model weights shared across all nodes or fail to effectively delete edges due to their strong dependence on local graph neighborhoods. To address these limitations, we introduce GNNDelete, a novel model-agnostic layer-wise operator that optimizes two critical properties, namely, Deleted Edge Consistency and Neighborhood Influence, for graph unlearning. Deleted Edge Consistency ensures that the influence of deleted elements is removed from both model weights and neighboring representations, while Neighborhood Influence guarantees that the remaining model knowledge is preserved after deletion. GNNDelete updates representations to delete nodes and edges from the model while retaining the rest of the learned knowledge. We conduct experiments on seven real-world graphs, showing that GNNDelete outperforms existing approaches by up to 38.8% (AUC) on edge, node, and node feature deletion tasks, and 32.2% on distinguishing deleted edges from non-deleted ones. Additionally, GNNDelete is efficient, taking 12.3x less time and 9.3x less space than retraining GNN from scratch on WordNet18.

Submitted 26 February, 2023; originally announced February 2023.

Comments: Accepted to ICLR2023

arXiv:2301.06928 [pdf, other]

Towards Estimating Transferability using Hard Subsets

Authors: Tarun Ram Menta, Surgan Jandial, Akash Patil, Vimal KB, Saketh Bachu, Balaji Krishnamurthy, Vineeth N. Balasubramanian, Chirag Agarwal, Mausoom Sarkar

Abstract: As transfer learning techniques are increasingly used to transfer knowledge from the source model to the target task, it becomes important to quantify which source models are suitable for a given target task without performing computationally expensive fine-tuning. In this work, we propose HASTE (HArd Subset TransfErability), a new strategy to estimate the transferability of a source model to a particular target task using only a harder subset of target data. By leveraging the internal and output representations of the model, we introduce two techniques, one class-agnostic and another class-specific, to identify harder subsets and show that HASTE can be used with any existing transferability metric to improve its reliability. We further analyze the relation between HASTE and the optimal average log likelihood as well as negative conditional entropy and empirically validate our theoretical bounds. Our experimental results across multiple source model architectures, target datasets, and transfer learning tasks show that HASTE-modified metrics are consistently better than or on par with the state-of-the-art transferability metrics.

Submitted 17 January, 2023; originally announced January 2023.

Comments: First three authors contributed equally

arXiv:2211.16731 [pdf, other]

Towards Training GNNs using Explanation Directed Message Passing

Authors: Valentina Giunchiglia, Chirag Varun Shukla, Guadalupe Gonzalez, Chirag Agarwal

Abstract: With the increasing use of Graph Neural Networks (GNNs) in critical real-world applications, several post hoc explanation methods have been proposed to understand their predictions. However, there has been no work in generating explanations on the fly during model training and utilizing them to improve the expressive power of the underlying GNN models. In this work, we introduce a novel explanation-directed neural message passing framework for GNNs, EXPASS (EXplainable message PASSing), which aggregates only embeddings from nodes and edges identified as important by a GNN explanation method. EXPASS can be used with any existing GNN architecture and subgraph-optimizing explainer to learn accurate graph embeddings. We theoretically show that EXPASS alleviates the oversmoothing problem in GNNs by slowing the layer-wise loss of Dirichlet energy and that the embedding difference between the vanilla message passing and EXPASS frameworks can be upper bounded by the difference of their respective model weights. Our empirical results show that graph embeddings learned using EXPASS improve the predictive performance and alleviate the oversmoothing problems of GNNs, opening up new frontiers in graph machine learning to develop explanation-based training frameworks.

Submitted 1 December, 2022; v1 submitted 29 November, 2022; originally announced November 2022.

Comments: Accepted to the proceedings of the First Learning on Graphs Conference (LoG 2022)

arXiv:2208.09339 [pdf, other]

Evaluating Explainability for Graph Neural Networks

Authors: Chirag Agarwal, Owen Queen, Himabindu Lakkaraju, Marinka Zitnik

Abstract: As post hoc explanations are increasingly used to understand the behavior of graph neural networks (GNNs), it becomes crucial to evaluate the quality and reliability of GNN explanations. However, assessing the quality of GNN explanations is challenging as existing graph datasets have no or unreliable ground-truth explanations for a given task. Here, we introduce a synthetic graph data generator, ShapeGGen, which can generate a variety of benchmark datasets (e.g., varying graph sizes, degree distributions, homophilic vs. heterophilic graphs) accompanied by ground-truth explanations. Further, the flexibility to generate diverse synthetic datasets and corresponding ground-truth explanations allows us to mimic the data generated by various real-world applications. We include ShapeGGen and several real-world graph datasets into an open-source graph explainability library, GraphXAI. In addition to synthetic and real-world graph datasets with ground-truth explanations, GraphXAI provides data loaders, data processing functions, visualizers, GNN model implementations, and evaluation metrics to benchmark the performance of GNN explainability methods.

Submitted 16 January, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

arXiv:2206.11104 [pdf, other]

OpenXAI: Towards a Transparent Evaluation of Model Explanations

Authors: Chirag Agarwal, Dan Ley, Satyapriya Krishna, Eshika Saxena, Martin Pawelczyk, Nari Johnson, Isha Puri, Marinka Zitnik, Himabindu Lakkaraju

Abstract: While several types of post hoc explanation methods have been proposed in recent literature, there is very little work on systematically benchmarking these methods. Here, we introduce OpenXAI, a comprehensive and extensible open-source framework for evaluating and benchmarking post hoc explanation methods. OpenXAI comprises the following key components: (i) a flexible synthetic data generator and a collection of diverse real-world datasets, pre-trained models, and state-of-the-art feature attribution methods, and (ii) open-source implementations of eleven quantitative metrics for evaluating faithfulness, stability (robustness), and fairness of explanation methods, in turn providing comparisons of several explanation methods across a wide variety of metrics, models, and datasets. OpenXAI is easily extensible, as users can readily evaluate custom explanation methods and incorporate them into our leaderboards. Overall, OpenXAI provides an automated end-to-end pipeline that not only simplifies and standardizes the evaluation of post hoc explanation methods, but also promotes transparency and reproducibility in benchmarking these methods. While the first release of OpenXAI supports only tabular datasets, the explanation methods and metrics that we consider are general enough to be applicable to other data modalities. OpenXAI datasets and models, implementations of state-of-the-art explanation methods and evaluation metrics, are publicly available at this GitHub link.

Submitted 13 March, 2024; v1 submitted 22 June, 2022; originally announced June 2022.

Comments: Newer version with updated results and code

arXiv:2206.01465 [pdf, other]

PAC Statistical Model Checking of Mean Payoff in Discrete- and Continuous-Time MDP

Authors: Chaitanya Agarwal, Shibashis Guha, Jan Křetínský, M. Pazhamalai

Abstract: Markov decision processes (MDP) and continuous-time MDP (CTMDP) are the fundamental models for non-deterministic systems with probabilistic uncertainty. Mean payoff (a.k.a. long-run average reward) is one of the most classic objectives considered in their context. We provide the first algorithm to compute mean payoff probably approximately correctly in unknown MDP; further, we extend it to unknown CTMDP. We do not require any knowledge of the state space, only a lower bound on the minimum transition probability, which has been advocated in literature. In addition to providing probably approximately correct (PAC) bounds for our algorithm, we also demonstrate its practical nature by running experiments on standard benchmarks.

Submitted 3 June, 2022; originally announced June 2022.

Comments: Full version of CAV 2022 paper, 57 pages

arXiv:2203.06877 [pdf, other]

Rethinking Stability for Attribution-based Explanations

Authors: Chirag Agarwal, Nari Johnson, Martin Pawelczyk, Satyapriya Krishna, Eshika Saxena, Marinka Zitnik, Himabindu Lakkaraju

Abstract: As attribution-based explanation methods are increasingly used to establish model trustworthiness in high-stakes situations, it is critical to ensure that these explanations are stable, e.g., robust to infinitesimal perturbations to an input. However, previous works have shown that state-of-the-art explanation methods generate unstable explanations. Here, we introduce metrics to quantify the stability of an explanation and show that several popular explanation methods are unstable. In particular, we propose new Relative Stability metrics that measure the change in output explanation with respect to change in input, model representation, or output of the underlying predictor. Finally, our experimental evaluation with three real-world datasets demonstrates interesting insights for seven explanation methods and different stability metrics.

Submitted 14 March, 2022; originally announced March 2022.

arXiv:2107.13098 [pdf, other]

A Tale Of Two Long Tails

Authors: Daniel D'souza, Zach Nussbaum, Chirag Agarwal, Sara Hooker

Abstract: As machine learning models are increasingly employed to assist human decision-makers, it becomes critical to communicate the uncertainty associated with these model predictions. However, the majority of work on uncertainty has focused on traditional probabilistic or ranking approaches - where the model assigns low probabilities or scores to uncertain examples. While this captures what examples are challenging for the model, it does not capture the underlying source of the uncertainty. In this work, we seek to identify examples the model is uncertain about and characterize the source of said uncertainty. We explore the benefits of designing a targeted intervention - targeted data augmentation of the examples where the model is uncertain over the course of training. We investigate whether the rate of learning in the presence of additional information differs between atypical and noisy examples. Our results show that this is indeed the case, suggesting that well-designed interventions over the course of training can be an effective way to characterize and distinguish between different sources of uncertainty.

Submitted 27 July, 2021; originally announced July 2021.

Comments: Preliminary results accepted to Workshop on Uncertainty and Robustness in Deep Learning (UDL), ICML, 2021

arXiv:2106.09992 [pdf, other]

Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis

Authors: Martin Pawelczyk, Chirag Agarwal, Shalmali Joshi, Sohini Upadhyay, Himabindu Lakkaraju

Abstract: As machine learning (ML) models become more widely deployed in high-stakes applications, counterfactual explanations have emerged as key tools for providing actionable model explanations in practice. Despite the growing popularity of counterfactual explanations, a deeper understanding of these explanations is still lacking. In this work, we systematically analyze counterfactual explanations through the lens of adversarial examples. We do so by formalizing the similarities between popular counterfactual explanation and adversarial example generation methods identifying conditions when they are equivalent. We then derive the upper bounds on the distances between the solutions output by counterfactual explanation and adversarial example generation methods, which we validate on several real-world data sets. By establishing these theoretical and empirical similarities between counterfactual explanations and adversarial examples, our work raises fundamental questions about the design and development of existing counterfactual explanation algorithms.

Submitted 19 October, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

Journal ref: International Conference on Artificial Intelligence and Statistics (AISTATS), 28-30 March 2022

arXiv:2106.09078 [pdf, other]

Probing GNN Explainers: A Rigorous Theoretical and Empirical Analysis of GNN Explanation Methods

Authors: Chirag Agarwal, Marinka Zitnik, Himabindu Lakkaraju

Abstract: As Graph Neural Networks (GNNs) are increasingly being employed in critical real-world applications, several methods have been proposed in recent literature to explain the predictions of these models. However, there has been little to no work on systematically analyzing the reliability of these methods. Here, we introduce the first-ever theoretical analysis of the reliability of state-of-the-art GNN explanation methods. More specifically, we theoretically analyze the behavior of various state-of-the-art GNN explanation methods with respect to several desirable properties (e.g., faithfulness, stability, and fairness preservation) and establish upper bounds on the violation of these properties. We also empirically validate our theoretical results using extensive experimentation with nine real-world graph datasets. Our empirical results further shed light on several interesting insights about the behavior of state-of-the-art GNN explanation methods.

Submitted 22 February, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: Accepted to AISTATS 2022

arXiv:2102.13186 [pdf, other]

Towards a Unified Framework for Fair and Stable Graph Representation Learning

Authors: Chirag Agarwal, Himabindu Lakkaraju, Marinka Zitnik

Abstract: As the representations output by Graph Neural Networks (GNNs) are increasingly employed in real-world applications, it becomes important to ensure that these representations are fair and stable. In this work, we establish a key connection between counterfactual fairness and stability and leverage it to propose a novel framework, NIFTY (uNIfying Fairness and stabiliTY), which can be used with any GNN to learn fair and stable representations. We introduce a novel objective function that simultaneously accounts for fairness and stability and develop a layer-wise weight normalization using the Lipschitz constant to enhance neural message passing in GNNs. In doing so, we enforce fairness and stability both in the objective function as well as in the GNN architecture. Further, we show theoretically that our layer-wise weight normalization promotes counterfactual fairness and stability in the resulting representations. We introduce three new graph datasets comprising high-stakes decisions in criminal justice and financial lending domains. Extensive experimentation with the above datasets demonstrates the efficacy of our framework.

Submitted 16 June, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

Comments: Accepted to UAI'21

Report number: Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI 2021)

Journal ref: PMLR 161:2114-2124, 2021

arXiv:2102.10618 [pdf, other]

Towards the Unification and Robustness of Perturbation and Gradient Based Explanations

Authors: Sushant Agarwal, Shahin Jabbari, Chirag Agarwal, Sohini Upadhyay, Zhiwei Steven Wu, Himabindu Lakkaraju

Abstract: As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner. In this work, we analyze two popular post hoc interpretation techniques: SmoothGrad which is a gradient based method, and a variant of LIME which is a perturbation based method. More specifically, we derive explicit closed form expressions for the explanations output by these two methods and show that they both converge to the same explanation in expectation, i.e., when the number of perturbed samples used by these methods is large. We then leverage this connection to establish other desirable properties, such as robustness, for these techniques. We also derive finite sample complexity bounds for the number of perturbations required for these methods to converge to their expected explanation. Finally, we empirically validate our theory using extensive experimentation on both synthetic and real world datasets.

Submitted 19 July, 2021; v1 submitted 21 February, 2021; originally announced February 2021.

Comments: The short version of this paper appears in the proceedings of ICML-21
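
For readers unfamiliar with SmoothGrad, the abstract above refers to averaging a model's gradient over noisy copies of the input. The following is a minimal sketch of that general idea only, not the authors' code or their closed-form analysis. The names `smoothgrad`, `grad_fn`, `sigma`, and `n_samples` are hypothetical, and the toy linear model is an assumption chosen so the result is easy to check by hand.

```python
import random


def smoothgrad(grad_fn, x, sigma=0.1, n_samples=50, seed=0):
    """Average grad_fn over Gaussian-perturbed copies of x (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so repeated calls give the same explanation
    total = [0.0] * len(x)
    for _ in range(n_samples):
        # Perturb every input feature with independent Gaussian noise.
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        grad = grad_fn(noisy)
        total = [t + g for t, g in zip(total, grad)]
    # The explanation is the mean gradient over all perturbed samples.
    return [t / n_samples for t in total]


def linear_grad(_x):
    """Gradient of the hypothetical toy model f(x) = 2*x0 - x1; constant everywhere."""
    return [2.0, -1.0]


saliency = smoothgrad(linear_grad, [0.5, 0.5])
```

For a linear model the gradient does not depend on the input, so the averaged explanation equals the weight vector regardless of the noise level; for nonlinear models the estimate depends on `sigma` and the number of samples, which is the regime the abstract's finite-sample bounds address.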

arXiv:2008.11600 [pdf, other]

Estimating Example Difficulty Using Variance of Gradients

Authors: Chirag Agarwal, Daniel D'souza, Sara Hooker

Abstract: In machine learning, a question of great interest is understanding what examples are challenging for a model to classify. Identifying atypical examples ensures the safe deployment of models, isolates samples that require further human inspection and provides interpretability into model behavior. In this work, we propose Variance of Gradients (VoG) as a valuable and efficient metric to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing. We show that data points with high VoG scores are far more difficult for the model to learn and over-index on corrupted or memorized examples. Further, restricting the evaluation to the test set instances with the lowest VoG improves the model's generalization performance. Finally, we show that VoG is a valuable and efficient ranking for out-of-distribution detection.

Submitted 21 June, 2022; v1 submitted 26 August, 2020; originally announced August 2020.

Comments: Accepted to CVPR 2022

arXiv:2007.04663 [pdf, other]

Automation Strategies for Unconstrained Crossword Puzzle Generation

Authors: Charu Agarwal, Rushikesh K. Joshi

Abstract: An unconstrained crossword puzzle is a generalization of the constrained crossword problem. In this problem, only the word vocabulary, and optionally the grid dimensions are known. Hence, it not only requires the algorithm to determine the word locations, but it also needs to come up with the grid geometry. This paper discusses algorithmic strategies for automatic crossword puzzle generation in such an unconstrained setting. The strategies proposed cover the tasks of selection of words from a given vocabulary, selection of grid sizes, grid resizing and adjustments, metrics for word fitting, back-tracking techniques, and also clue generation. The strategies have been formulated based on a study of the effect of word sequence permutation order on grid fitting. An end-to-end algorithm that combines these strategies is presented, and its performance is analyzed. The techniques have been found to be successful in quickly producing well-packed puzzles of even large sizes. Finally, a few example puzzles generated by our algorithm are also provided.

Submitted 9 July, 2020; originally announced July 2020.

Comments: 28 pages, 28 figures, category: cs, preprint

arXiv:2006.09373 [pdf, other]

The shape and simplicity biases of adversarially robust ImageNet-trained CNNs

Authors: Peijie Chen, Chirag Agarwal, Anh Nguyen

Abstract: Increasingly more similarities between human vision and convolutional neural networks (CNNs) have been revealed in the past few years. Yet, vanilla CNNs often fall short in generalizing to adversarial or out-of-distribution (OOD) examples on which humans demonstrate superior performance. Adversarial training is a leading learning algorithm for improving the robustness of CNNs on adversarial and OOD data; however, little is known about the properties, specifically the shape bias and internal features learned inside adversarially-robust CNNs. In this paper, we perform a thorough, systematic study to understand the shape bias and some internal mechanisms that enable the generalizability of AlexNet, GoogLeNet, and ResNet-50 models trained via adversarial training. We find that while standard ImageNet classifiers have a strong texture bias, their robust (R) counterparts rely heavily on shapes. Remarkably, adversarial training induces three simplicity biases into hidden neurons in the process of "robustifying" CNNs. That is, each convolutional neuron in R networks often changes to detecting (1) pixel-wise smoother patterns, i.e., a mechanism that blocks high-frequency noise from passing through the network; (2) more lower-level features, i.e., textures and colors (instead of objects); and (3) fewer types of inputs. Our findings reveal the interesting mechanisms that made networks more adversarially robust and also explain some recent findings, e.g., why R networks benefit from a much larger capacity (Xie et al. 2020) and can act as a strong image prior in image synthesis (Santurkar et al. 2019).

Submitted 12 September, 2022; v1 submitted 16 June, 2020; originally announced June 2020.

arXiv:2003.08754 [pdf, other]

SAM: The Sensitivity of Attribution Methods to Hyperparameters

Authors: Naman Bansal, Chirag Agarwal, Anh Nguyen

Abstract: Attribution methods can provide powerful insights into the reasons for a classifier's decision. We argue that a key desideratum of an explanation method is its robustness to input hyperparameters which are often randomly set or empirically tuned. High sensitivity to arbitrary hyperparameter choices does not only impede reproducibility but also questions the correctness of an explanation and impairs the trust of end-users. In this paper, we provide a thorough empirical study on the sensitivity of existing attribution methods. We found an alarming trend that many methods are highly sensitive to changes in their common hyperparameters e.g. even changing a random seed can yield a different explanation! Interestingly, such sensitivity is not reflected in the average explanation accuracy scores over the dataset as commonly reported in the literature. In addition, explanations generated for robust classifiers (i.e. which are trained to be invariant to pixel-wise perturbations) are surprisingly more robust than those generated for regular classifiers.

Submitted 12 April, 2020; v1 submitted 4 March, 2020; originally announced March 2020.

Comments: Oral paper at CVPR 2020

arXiv:2002.01053 [pdf, other]

doi 10.1109/ICIP40778.2020.9190825

Deep-URL: A Model-Aware Approach To Blind Deconvolution Based On Deep Unfolded Richardson-Lucy Network

Authors: Chirag Agarwal, Shahin Khobahi, Arindam Bose, Mojtaba Soltanalian, Dan Schonfeld

Abstract: The lack of interpretability in current deep learning models causes serious concerns as they are extensively used for various life-critical applications. Hence, it is of paramount importance to develop interpretable deep learning models. In this paper, we consider the problem of blind deconvolution and propose a novel model-aware deep architecture that allows for the recovery of both the blur kernel and the sharp image from the blurred image. In particular, we propose the Deep Unfolded Richardson-Lucy (Deep-URL) framework -- an interpretable deep-learning architecture that can be seen as an amalgamation of classical estimation technique and deep neural network, and consequently leads to improved performance. Our numerical investigations demonstrate significant improvement compared to state-of-the-art algorithms.

Submitted 7 June, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

Comments: Accepted. 27th IEEE International Conference on Image Processing (ICIP), 2020

arXiv:1910.04256 [pdf, other]

Explaining image classifiers by removing input features using generative models

Authors: Chirag Agarwal, Anh Nguyen

Abstract: Perturbation-based explanation methods often measure the contribution of an input feature to an image classifier's outputs by heuristically removing it via e.g. blurring, adding noise, or graying out, which often produces unrealistic, out-of-distribution samples. Instead, we propose to integrate a generative inpainter into three representative attribution methods to remove an input feature. Our proposed change improved all three methods in (1) generating more plausible counterfactual samples under the true data distribution; (2) being more accurate according to three metrics: object localization, deletion, and saliency metrics; and (3) being more robust to hyperparameter changes. Our findings were consistent across both ImageNet and Places365 datasets and two different pairs of classifiers and inpainters.

Submitted 6 October, 2020; v1 submitted 9 October, 2019; originally announced October 2019.

Comments: Accepted to Asian Conference on Computer Vision (ACCV), 2020

arXiv:1811.00621 [pdf, ps, other]

Improving Adversarial Robustness by Encouraging Discriminative Features

Authors: Chirag Agarwal, Anh Nguyen, Dan Schonfeld

Abstract: Deep neural networks (DNNs) have achieved state-of-the-art results in various pattern recognition tasks. However, they perform poorly on out-of-distribution adversarial examples, i.e., inputs that are specifically crafted by an adversary to cause DNNs to misbehave, questioning the security and reliability of applications. In this paper, we encourage DNN classifiers to learn more discriminative features by imposing a center loss in addition to the regular softmax cross-entropy loss. Intuitively, the center loss encourages DNNs to simultaneously learn a center for the deep features of each class and minimize the distances between the intra-class deep features and their corresponding class centers. We hypothesize that minimizing distances between intra-class features and maximizing the distances between inter-class features at the same time would improve a classifier's robustness to adversarial examples. Our results on state-of-the-art architectures on MNIST, CIFAR-10, and CIFAR-100 confirmed that intuition and highlighted the importance of discriminative features.

Submitted 8 May, 2019; v1 submitted 1 November, 2018; originally announced November 2018.

Comments: This article corresponds to the accepted version at IEEE ICIP 2019. We will link the DOI as soon as it is available

Journal ref: 2019 26th IEEE International Conference on Image Processing (ICIP)
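
The center-loss term described in the abstract above (pulling each deep feature toward the center of its class) can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the function name `center_loss`, the dictionary of fixed centers, and the choice to average over the batch are assumptions; in training, the centers would be learned jointly with the network and the term would be added to the softmax cross-entropy loss with a weighting factor.

```python
def center_loss(features, labels, centers):
    """Half the mean squared distance between each feature and its class center."""
    total = 0.0
    for feature, label in zip(features, labels):
        center = centers[label]
        # Squared Euclidean distance between the feature and its own class center.
        total += sum((f - c) ** 2 for f, c in zip(feature, center))
    # Averaging over the batch is an assumption made for this sketch.
    return 0.5 * total / len(features)


# Hypothetical toy batch: two 2-D features, one per class.
features = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
centers = {0: [0.0, 0.0], 1: [0.0, 1.0]}

loss = center_loss(features, labels, centers)
```

Here the first feature lies at squared distance 1 from its center and the second sits exactly on its center, so the loss is 0.5 * (1 + 0) / 2 = 0.25. The loss reaches zero only when every feature coincides with its class center, which is the intra-class compactness the abstract argues for.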

arXiv:1806.01477 [pdf, other]

An Explainable Adversarial Robustness Metric for Deep Learning Neural Networks

Authors: Chirag Agarwal, Bo Dong, Dan Schonfeld, Anthony Hoogs

Abstract: Deep Neural Networks (DNNs) have greatly advanced the field of computer vision by achieving state-of-the-art performance in various vision tasks. These results are not limited to the field of vision but can also be seen in speech recognition and machine translation tasks. Recently, DNNs have been found to fail badly when tested with samples that are crafted by making imperceptible changes to the original input images. This causes a gap between the validation and adversarial performance of a DNN. An effective and generalizable robustness metric for evaluating the performance of a DNN on these adversarial inputs is still missing from the literature. In this paper, we propose the Noise Sensitivity Score (NSS), a metric that quantifies the performance of a DNN on a specific input under different forms of fix-directional attacks. An insightful mathematical explanation is provided for deeply understanding the proposed metric. By leveraging the NSS, we also propose a skewness-based dataset robustness metric for evaluating a DNN's adversarial performance on a given dataset. Extensive experiments using widely used state-of-the-art architectures along with popular classification datasets, such as MNIST, CIFAR-10, CIFAR-100, and ImageNet, are used to validate the effectiveness and generalization of our proposed metrics. Instead of simply measuring a DNN's adversarial robustness in the input domain, as in previous works, the proposed NSS is built on top of an insightful mathematical understanding of the adversarial attack and gives a more explicit explanation of the robustness.

Submitted 6 June, 2018; v1 submitted 4 June, 2018; originally announced June 2018.

arXiv:1711.02581 [pdf, ps, other]

Convolutional Neural Network Steganalysis's Application to Steganography

Authors: Mehdi Sharifzadeh, Chirag Agarwal, Mohammed Aloraini, Dan Schonfeld

Abstract: This paper presents a novel approach to increase the performance bounds of image steganography under the criteria of minimizing distortion. The proposed approach utilizes a steganalysis convolutional neural network (CNN) framework to understand an image's model and embed in less detectable regions to preserve the model. In other words, the trained steganalysis CNN is used to calculate derivatives of the statistical model of an image with respect to embedding changes. The experimental results show that the proposed algorithm outperforms previous state-of-the-art methods in a wide range of low relative payloads when compared with HUGO, S-UNIWARD, and HILL by the state-of-the-art steganalysis.

Submitted 6 November, 2017; originally announced November 2017.

Comments: arXiv admin note: substantial text overlap with arXiv:1705.08616

arXiv:1705.08616 [pdf, ps, other]

A New Parallel Message-distribution Technique for Cost-based Steganography

Authors: Mehdi Sharifzadeh, Chirag Agarwal, Mahdi Salarian, Dan Schonfeld

Abstract: This paper presents two novel approaches to increase performance bounds of image steganography under the criteria of minimizing distortion. First, in order to efficiently use the images' capacities, we propose using parallel images in the embedding stage. The result is then used to prove sub-optimality of the message distribution technique used by all cost-based algorithms including HUGO, S-UNIWARD, and HILL. Second, a new distribution approach is presented to further improve the security of these algorithms. Experiments show that this distribution method avoids embedding in smooth regions and thus achieves a better performance, measured by state-of-the-art steganalysis, when compared with the currently used distribution.

Submitted 7 July, 2019; v1 submitted 24 May, 2017; originally announced May 2017.

arXiv:1705.07404 [pdf, other]

Convergence of backpropagation with momentum for network architectures with skip connections

Authors: Chirag Agarwal, Joe Klobusicky, Dan Schonfeld

Abstract: We study a class of deep neural networks whose architectures form a directed acyclic graph (DAG). For backpropagation defined by gradient descent with adaptive momentum, we show weights converge for a large class of nonlinear activation functions. The proof generalizes the results of Wu et al. (2008), who showed convergence for a feed forward network with one hidden layer. As an example of the effectiveness of DAG architectures, we describe compression through an autoencoder, and compare against sequential feed forward networks under several metrics.

Submitted 18 January, 2020; v1 submitted 21 May, 2017; originally announced May 2017.

arXiv:1209.2631 [pdf]

doi 10.1080/14786435.2013.824626

Role of Heterogeneities in Staebler-Wronski Effect

Authors: S. C. Agarwal

Abstract: The effect of light soaking (LS) on the properties of hydrogenated amorphous silicon presents many challenging puzzles. Some of them are discussed here, along with their present status. In particular the role of the heterogeneities in LS is examined. We find that for the majority of the solved as well as unsolved puzzles the long range potential fluctuations arising from the heterogeneities in the films can provide answers which look quite plausible.

Submitted 12 September, 2012; originally announced September 2012.

Comments: 10 pages, 7 figures

Report number: iitk/2012/5

arXiv:1207.5426 [pdf]

doi 10.1186/1556-276X-8-336

Ion beam generated surface ripples: new insight in the underlying mechanism

Authors: Tanuj Kumar, Ashish Kumar, D. C. Agarwal, N. P. Lalla, D. Kanjilal

Abstract: A new hydrodynamic mechanism is proposed for the ion beam induced surface patterning on solid surfaces. Unlike the standard mechanisms based on the ion beam impact generated erosion and mass redistribution at the free surface (proposed by Bradley-Harper (BH) and its extended theories), the new mechanism proposes that the ion beam induced saltation and creep processes, coupled with incompressible solid flow in the amorphous layer, lead to the formation of ripple patterns at the amorphous/crystalline (a/c) interface and hence at the free surface. Ion beam stimulated solid flow inside the amorphous layer controls the wavelength, whereas the amount of material transported and re-deposited at the a/c interface controls the amplitude of ripples. The new approach is verified by designed experiments and supported by the discrete simulation method.

Submitted 1 June, 2013; v1 submitted 23 July, 2012; originally announced July 2012.

Comments: 12 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:1206.0821

arXiv:1206.0821

Hydro-dynamics of surface patterning by ion beam irradiation: an interface phenomenon

Authors: Tanuj Kumar, D. C. Agarwal, S. A. Khan, N. P. Lalla, D. Kanjilal

Abstract: We show that the ion beam induced incompressible amorphous solid flow in terms of advection transport mechanism leads to the erosion and deposition of atoms at the amorphous/crystalline (a/c) interface, resulting in the formation of a pattern at the a/c interface as well as at the free surface. The ion beam impact generated erosion and mass redistribution at the free surface are found to have insignificant effect in patterning of the surface. By varying the thickness of the amorphous layer, it has been established that the a/c interface plays the dominant role in surface patterning. The morphological variation of the Si surface after 50 keV Ar+ ion bombardment has been investigated by atomic force microscopy (AFM) and cross-sectional transmission electron microscopy (X-TEM) as a function of ion fluence. Navier-Stokes flow inside the amorphous layer coupled with the Exner equation successfully explains the growth mechanism of surface patterning.

Submitted 21 February, 2014; v1 submitted 5 June, 2012; originally announced June 2012.

Comments: This paper has been withdrawn by the author due to a crucial error in calculation in equation 1

arXiv:0710.2587 [pdf]

doi 10.1103/PhysRevLett.100.067204

Origin of the anomalous magnetic circular dichroism spectral shape in ferromagnetic (Ga,Mn)As: Impurity bands inside the band gap

Authors: K. Ando, H. Saito, K. C. Agarwal, M. C. Debnath, V. Zayets

Abstract: The electronic structure of a prototype dilute magnetic semiconductor (DMS), Ga1-xMnxAs, is studied by magnetic circular dichroism (MCD) spectroscopy. We prove that the optical transitions originating from impurity bands cause the strong positive MCD background. The MCD signal due to the E0 transition from the valence band to the conduction band is negative, indicating that the p-d exchange interaction between the p-carriers and d-spin is antiferromagnetic. The negative E0 MCD signal also indicates that the hole-doping of the valence band is not so large as previously assumed. The impurity bands seem to play important roles for the ferromagnetism of Ga1-xMnxAs.

Submitted 13 October, 2007; originally announced October 2007.

Comments: 13 pages, 3 figures

arXiv:cond-mat/0505574 [pdf]

Ion Beam Synthesis of embedded SiC

Authors: Y. S. Katharria, V. Baranwal, D. C. Agarwal, R. Krishna, P. Kumar, F. Singh, D. Kanjilal

Abstract: The synthesis of embedded silicon carbide was carried out in N type silicon samples having (100) and (111) orientations using high dose implantation of carbon ions at room temperature. The variation of dose was employed to get the dose dependence of silicon carbide formation. Post-implantation annealing at 1000 C was carried out in order to anneal out the implantation-induced defects and to get silicon carbide precipitates in the silicon matrix. Detailed Fourier transform infrared spectroscopy analysis and x-ray diffraction studies confirm the formation of the cubic phase of silicon carbide. The grain size of the silicon carbide precipitates is estimated to be a few nanometers. The silicon carbide precipitates have been found to exhibit a better crystalline order in silicon (100) samples than in silicon (111) samples. The x-ray diffraction results also indicate the amorphization of the bombarded region of the silicon samples. After annealing, the amorphized region recrystallized into a polycrystalline phase of silicon.

Submitted 24 May, 2005; originally announced May 2005.

Comments: 10 pages, 6 figures

Showing 1–41 of 41 results for author: Agarwal, C