Search | arXiv e-print repository

Visual Hallucination: Definition, Quantification, and Prescriptive Remediations

Authors: Anku Rani, Vipula Rawte, Harshad Sharma, Neeraj Anand, Krishnav Rajbangshi, Amit Sheth, Amitava Das

Abstract: The troubling rise of hallucination presents perhaps the most significant impediment to the advancement of responsible AI. In recent times, considerable research has focused on detecting and mitigating hallucination in Large Language Models (LLMs). However, it's worth noting that hallucination is also quite prevalent in Vision-Language models (VLMs). In this paper, we offer a fine-grained discours… ▽ More The troubling rise of hallucination presents perhaps the most significant impediment to the advancement of responsible AI. In recent times, considerable research has focused on detecting and mitigating hallucination in Large Language Models (LLMs). However, it's worth noting that hallucination is also quite prevalent in Vision-Language models (VLMs). In this paper, we offer a fine-grained discourse on profiling VLM hallucination based on two tasks: i) image captioning, and ii) Visual Question Answering (VQA). We delineate eight fine-grained orientations of visual hallucination: i) Contextual Guessing, ii) Identity Incongruity, iii) Geographical Erratum, iv) Visual Illusion, v) Gender Anomaly, vi) VLM as Classifier, vii) Wrong Reading, and viii) Numeric Discrepancy. We curate Visual HallucInation eLiciTation (VHILT), a publicly available dataset comprising 2,000 samples generated using eight VLMs across two tasks of captioning and VQA along with human annotations for the categories as mentioned earlier. △ Less

Submitted 30 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

arXiv:2401.01867 [pdf, other]

Dataset Difficulty and the Role of Inductive Bias

Authors: Devin Kwok, Nikhil Anand, Jonathan Frankle, Gintare Karolina Dziugaite, David Rolnick

Abstract: Motivated by the goals of dataset pruning and defect identification, a growing body of methods have been developed to score individual examples within a dataset. These methods, which we call "example difficulty scores", are typically used to rank or categorize examples, but the consistency of rankings between different training runs, scoring methods, and model architectures is generally unknown. T… ▽ More Motivated by the goals of dataset pruning and defect identification, a growing body of methods have been developed to score individual examples within a dataset. These methods, which we call "example difficulty scores", are typically used to rank or categorize examples, but the consistency of rankings between different training runs, scoring methods, and model architectures is generally unknown. To determine how example rankings vary due to these random and controlled effects, we systematically compare different formulations of scores over a range of runs and model architectures. We find that scores largely share the following traits: they are noisy over individual runs of a model, strongly correlated with a single notion of difficulty, and reveal examples that range from being highly sensitive to insensitive to the inductive biases of certain model architectures. Drawing from statistical genetics, we develop a simple method for fingerprinting model architectures using a few sensitive examples. These findings guide practitioners in maximizing the consistency of their scores (e.g. by choosing appropriate scoring methods, number of runs, and subsets of examples), and establishes comprehensive baselines for evaluating scores in the future. △ Less

Submitted 3 January, 2024; originally announced January 2024.

Comments: 10 pages, 6 figures

arXiv:2312.11669 [pdf, other]

Prediction and Control in Continual Reinforcement Learning

Authors: Nishanth Anand, Doina Precup

Abstract: Temporal difference (TD) learning is often used to update the estimate of the value function which is used by RL agents to extract useful policies. In this paper, we focus on value function estimation in continual reinforcement learning. We propose to decompose the value function into two components which update at different timescales: a permanent value function, which holds general knowledge tha… ▽ More Temporal difference (TD) learning is often used to update the estimate of the value function which is used by RL agents to extract useful policies. In this paper, we focus on value function estimation in continual reinforcement learning. We propose to decompose the value function into two components which update at different timescales: a permanent value function, which holds general knowledge that persists over time, and a transient value function, which allows quick adaptation to new situations. We establish theoretical results showing that our approach is well suited for continual learning and draw connections to the complementary learning systems (CLS) theory from neuroscience. Empirically, this approach improves performance significantly on both prediction and control problems. △ Less

Submitted 18 December, 2023; originally announced December 2023.

Comments: Published at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2311.16302 [pdf, other]

Comprehensive Benchmarking of Entropy and Margin Based Scoring Metrics for Data Selection

Authors: Anusha Sabbineni, Nikhil Anand, Maria Minakova

Abstract: While data selection methods have been studied extensively in active learning, data pruning, and data augmentation settings, there is little evidence for the efficacy of these methods in industry scale settings, particularly in low-resource languages. Our work presents ways of assessing prospective training examples in those settings for their "usefulness" or "difficulty". We also demonstrate how… ▽ More While data selection methods have been studied extensively in active learning, data pruning, and data augmentation settings, there is little evidence for the efficacy of these methods in industry scale settings, particularly in low-resource languages. Our work presents ways of assessing prospective training examples in those settings for their "usefulness" or "difficulty". We also demonstrate how these measures can be used in selecting important examples for training supervised machine learning models. We primarily experiment with entropy and Error L2-Norm (EL2N) scores. We use these metrics to curate high quality datasets from a large pool of \textit{Weak Signal Labeled} data, which assigns no-defect high confidence hypotheses during inference as ground truth labels. We then conduct training data augmentation experiments using these de-identified datasets and demonstrate that score-based selection can result in a 2% decrease in semantic error rate and 4%-7% decrease in domain classification error rate when compared to the baseline technique of random selection. △ Less

Submitted 27 November, 2023; originally announced November 2023.

Comments: Accepted to Efficient Natural Language and Speech Processing (ENLSP-III) workshop at NeurIPS '23

arXiv:2311.16298 [pdf, other]

Influence Scores at Scale for Efficient Language Data Sampling

Authors: Nikhil Anand, Joshua Tan, Maria Minakova

Abstract: Modern ML systems ingest data aggregated from diverse sources, such as synthetic, human-annotated, and live customer traffic. Understanding \textit{which} examples are important to the performance of a learning algorithm is crucial for efficient model training. Recently, a growing body of literature has given rise to various "influence scores," which use training artifacts such as model confidence… ▽ More Modern ML systems ingest data aggregated from diverse sources, such as synthetic, human-annotated, and live customer traffic. Understanding \textit{which} examples are important to the performance of a learning algorithm is crucial for efficient model training. Recently, a growing body of literature has given rise to various "influence scores," which use training artifacts such as model confidence or checkpointed gradients to identify important subsets of data. However, these methods have primarily been developed in computer vision settings, and it remains unclear how well they generalize to language-based tasks using pretrained models. In this paper, we explore the applicability of influence scores in language classification tasks. We evaluate a diverse subset of these scores on the SNLI dataset by quantifying accuracy changes in response to pruning training data through random and influence-score-based sampling. We then stress-test one of the scores -- "variance of gradients" (VoG) from Agarwal et al. (2022) -- in an NLU model stack that was exposed to dynamic user speech patterns in a voice assistant type of setting. Our experiments demonstrate that in many cases, encoder-based language models can be finetuned on roughly 50% of the original data without degradation in performance metrics. Along the way, we summarize lessons learned from applying out-of-the-box implementations of influence scores, quantify the effects of noisy and class-imbalanced data, and offer recommendations on score-based sampling for better accuracy and training efficiency. △ Less

Submitted 27 November, 2023; originally announced November 2023.

Comments: Accepted at EMNLP '23

arXiv:2306.08845 [pdf, other]

Unsupervised speech intelligibility assessment with utterance level alignment distance between teacher and learner Wav2Vec-2.0 representations

Authors: Nayan Anand, Meenakshi Sirigiraju, Chiranjeevi Yarra

Abstract: Speech intelligibility is crucial in language learning for effective communication. Thus, to develop computer-assisted language learning systems, automatic speech intelligibility detection (SID) is necessary. Most of the works have assessed the intelligibility in a supervised manner considering manual annotations, which requires cost and time; hence scalability is limited. To overcome these, this… ▽ More Speech intelligibility is crucial in language learning for effective communication. Thus, to develop computer-assisted language learning systems, automatic speech intelligibility detection (SID) is necessary. Most of the works have assessed the intelligibility in a supervised manner considering manual annotations, which requires cost and time; hence scalability is limited. To overcome these, this work proposes an unsupervised approach for SID. The proposed approach considers alignment distance computed with dynamic-time war** (DTW) between teacher and learner representation sequence as a measure to separate intelligible versus non-intelligible speech. We obtain the feature sequence using current state-of-the-art self-supervised representations from Wav2Vec-2.0. We found the detection accuracies as 90.37\%, 92.57\% and 96.58\%, respectively, with three alignment distance measures -- mean absolute error, mean squared error and cosine distance (equal to one minus cosine similarity). △ Less

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2304.08243 [pdf, ps, other]

Stochastic Code Generation

Authors: Swapnil Sharma, Nikita Anand, Kranthi Kiran G. V

Abstract: Large language models pre-trained for code generation can generate high-quality short code but often struggle with generating coherent long code and understanding higher-level or system-level specifications. This issue is also observed in language modeling for long text generation, and one proposed solution is the use of a latent stochastic process. This approach involves generating a document pla… ▽ More Large language models pre-trained for code generation can generate high-quality short code but often struggle with generating coherent long code and understanding higher-level or system-level specifications. This issue is also observed in language modeling for long text generation, and one proposed solution is the use of a latent stochastic process. This approach involves generating a document plan and then producing text that is consistent with it. In this study, we investigate whether this technique can be applied to code generation to improve coherence. We base our proposed encoder and decoder on the pre-trained GPT-2 based CodeParrot model and utilize the APPS dataset for training. We evaluate our results using the HumanEval benchmark and observe that the modified Time Control model performs similarly to CodeParrot on this evaluation. △ Less

Submitted 13 April, 2023; originally announced April 2023.

Comments: 6 pages, 3 figures

arXiv:2304.06861 [pdf, other]

Evaluation of Social Biases in Recent Large Pre-Trained Models

Authors: Swapnil Sharma, Nikita Anand, Kranthi Kiran G. V., Alind Jain

Abstract: Large pre-trained language models are widely used in the community. These models are usually trained on unmoderated and unfiltered data from open sources like the Internet. Due to this, biases that we see in platforms online which are a reflection of those in society are in turn captured and learned by these models. These models are deployed in applications that affect millions of people and their… ▽ More Large pre-trained language models are widely used in the community. These models are usually trained on unmoderated and unfiltered data from open sources like the Internet. Due to this, biases that we see in platforms online which are a reflection of those in society are in turn captured and learned by these models. These models are deployed in applications that affect millions of people and their inherent biases are harmful to the targeted social groups. In this work, we study the general trend in bias reduction as newer pre-trained models are released. Three recent models ( ELECTRA, DeBERTa, and DistilBERT) are chosen and evaluated against two bias benchmarks, StereoSet and CrowS-Pairs. They are compared to the baseline of BERT using the associated metrics. We explore whether as advancements are made and newer, faster, lighter models are released: are they being developed responsibly such that their inherent social biases have been reduced compared to their older counterparts? The results are compiled and we find that all the models under study do exhibit biases but have generally improved as compared to BERT. △ Less

Submitted 13 April, 2023; originally announced April 2023.

Comments: 7 pages, 4 Tables

arXiv:2211.06739 [pdf, other]

Partial Binarization of Neural Networks for Budget-Aware Efficient Learning

Authors: Udbhav Bamba, Neeraj Anand, Saksham Aggarwal, Dilip K. Prasad, Deepak K. Gupta

Abstract: Binarization is a powerful compression technique for neural networks, significantly reducing FLOPs, but often results in a significant drop in model performance. To address this issue, partial binarization techniques have been developed, but a systematic approach to mixing binary and full-precision parameters in a single network is still lacking. In this paper, we propose a controlled approach to… ▽ More Binarization is a powerful compression technique for neural networks, significantly reducing FLOPs, but often results in a significant drop in model performance. To address this issue, partial binarization techniques have been developed, but a systematic approach to mixing binary and full-precision parameters in a single network is still lacking. In this paper, we propose a controlled approach to partial binarization, creating a budgeted binary neural network (B2NN) with our MixBin strategy. This method optimizes the mixing of binary and full-precision components, allowing for explicit selection of the fraction of the network to remain binary. Our experiments show that B2NNs created using MixBin outperform those from random or iterative searches and state-of-the-art layer selection methods by up to 3% on the ImageNet-1K dataset. We also show that B2NNs outperform the structured pruning baseline by approximately 23% at the extreme FLOP budget of 15%, and perform well in object tracking, with up to a 12.4% relative improvement over other baselines. Additionally, we demonstrate that B2NNs developed by MixBin can be transferred across datasets, with some cases showing improved performance over directly applying MixBin on the downstream data. △ Less

Submitted 8 November, 2023; v1 submitted 12 November, 2022; originally announced November 2022.

Comments: Accepted at WACV 2023 Conference

arXiv:2205.15019 [pdf, other]

Protein Structure and Sequence Generation with Equivariant Denoising Diffusion Probabilistic Models

Authors: Namrata Anand, Tudor Achim

Abstract: Proteins are macromolecules that mediate a significant fraction of the cellular processes that underlie life. An important task in bioengineering is designing proteins with specific 3D structures and chemical properties which enable targeted functions. To this end, we introduce a generative model of both protein structure and sequence that can operate at significantly larger scales than previous m… ▽ More Proteins are macromolecules that mediate a significant fraction of the cellular processes that underlie life. An important task in bioengineering is designing proteins with specific 3D structures and chemical properties which enable targeted functions. To this end, we introduce a generative model of both protein structure and sequence that can operate at significantly larger scales than previous molecular generative modeling approaches. The model is learned entirely from experimental data and conditions its generation on a compact specification of protein topology to produce a full-atom backbone configuration as well as sequence and side-chain predictions. We demonstrate the quality of the model via qualitative and quantitative analysis of its samples. Videos of sampling trajectories are available at https://nanand2.github.io/proteins . △ Less

Submitted 26 May, 2022; originally announced May 2022.

arXiv:2106.06508 [pdf, other]

Preferential Temporal Difference Learning

Authors: Nishanth Anand, Doina Precup

Abstract: Temporal-Difference (TD) learning is a general and very useful tool for estimating the value function of a given policy, which in turn is required to find good policies. Generally speaking, TD learning updates states whenever they are visited. When the agent lands in a state, its value can be used to compute the TD-error, which is then propagated to other states. However, it may be interesting, wh… ▽ More Temporal-Difference (TD) learning is a general and very useful tool for estimating the value function of a given policy, which in turn is required to find good policies. Generally speaking, TD learning updates states whenever they are visited. When the agent lands in a state, its value can be used to compute the TD-error, which is then propagated to other states. However, it may be interesting, when computing updates, to take into account other information than whether a state is visited or not. For example, some states might be more important than others (such as states which are frequently seen in a successful trajectory). Or, some states might have unreliable value estimates (for example, due to partial observability or lack of data), making their values less desirable as targets. We propose an approach to re-weighting states used in TD updates, both when they are the input and when they provide the target for the update. We prove that our approach converges with linear function approximation and illustrate its desirable empirical behaviour compared to other TD-style methods. △ Less

Submitted 23 August, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

Comments: Accepted at the 38th International Conference on Machine Learning (ICML, 2021)

Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021, 286-296

arXiv:2006.01294 [pdf, other]

NEMA: Automatic Integration of Large Network Management Databases

Authors: Fubao Wu, Han Hee Song, Jiangtao Yin, Lixin Gao, Mario Baldi, Narendra Anand

Abstract: Network management, whether for malfunction analysis, failure prediction, performance monitoring and improvement, generally involves large amounts of data from different sources. To effectively integrate and manage these sources, automatically finding semantic matches among their schemas or ontologies is crucial. Existing approaches on database matching mainly fall into two categories. One focuses… ▽ More Network management, whether for malfunction analysis, failure prediction, performance monitoring and improvement, generally involves large amounts of data from different sources. To effectively integrate and manage these sources, automatically finding semantic matches among their schemas or ontologies is crucial. Existing approaches on database matching mainly fall into two categories. One focuses on the schema-level matching based on schema properties such as field names, data types, constraints and schema structures. Network management databases contain massive tables (e.g., network products, incidents, security alert and logs) from different departments and groups with nonuniform field names and schema characteristics. It is not reliable to match them by those schema properties. The other category is based on the instance-level matching using general string similarity techniques, which are not applicable for the matching of large network management databases. In this paper, we develop a matching technique for large NEtwork MAnagement databases (NEMA) deploying instance-level matching for effective data integration and connection. We design matching metrics and scores for both numerical and non-numerical fields and propose algorithms for matching these fields. The effectiveness and efficiency of NEMA are evaluated by conducting experiments based on ground truth field pairs in large network management databases. Our measurement on large databases with 1,458 fields, each of which contains over 10 million records, reveals that the accuracies of NEMA are up to 95%. It achieves 2%-10% higher accuracy and 5x-14x speedup over baseline methods. △ Less

Submitted 1 June, 2020; originally announced June 2020.

Comments: 14 pages, 13 Figures, 7 tables

arXiv:2004.03497 [pdf, other]

ProGen: Language Modeling for Protein Generation

Authors: Ali Madani, Bryan McCann, Nikhil Naik, Nitish Shirish Keskar, Namrata Anand, Raphael R. Eguchi, Po-Ssu Huang, Richard Socher

Abstract: Generative modeling for protein engineering is key to solving fundamental problems in synthetic biology, medicine, and material science. We pose protein engineering as an unsupervised sequence generation problem in order to leverage the exponentially growing set of proteins that lack costly, structural annotations. We train a 1.2B-parameter language model, ProGen, on ~280M protein sequences condit… ▽ More Generative modeling for protein engineering is key to solving fundamental problems in synthetic biology, medicine, and material science. We pose protein engineering as an unsupervised sequence generation problem in order to leverage the exponentially growing set of proteins that lack costly, structural annotations. We train a 1.2B-parameter language model, ProGen, on ~280M protein sequences conditioned on taxonomic and keyword tags such as molecular function and cellular component. This provides ProGen with an unprecedented range of evolutionary sequence diversity and allows it to generate with fine-grained control as demonstrated by metrics based on primary sequence similarity, secondary structure accuracy, and conformational energy. △ Less

Submitted 7 March, 2020; originally announced April 2020.

arXiv:1905.09562 [pdf, other]

Recurrent Value Functions

Authors: Pierre Thodoroff, Nishanth Anand, Lucas Caccia, Doina Precup, Joelle Pineau

Abstract: Despite recent successes in Reinforcement Learning, value-based methods often suffer from high variance hindering performance. In this paper, we illustrate this in a continuous control setting where state of the art methods perform poorly whenever sensor noise is introduced. To overcome this issue, we introduce Recurrent Value Functions (RVFs) as an alternative to estimate the value function of a… ▽ More Despite recent successes in Reinforcement Learning, value-based methods often suffer from high variance hindering performance. In this paper, we illustrate this in a continuous control setting where state of the art methods perform poorly whenever sensor noise is introduced. To overcome this issue, we introduce Recurrent Value Functions (RVFs) as an alternative to estimate the value function of a state. We propose to estimate the value function of the current state using the value function of past states visited along the trajectory. Due to the nature of their formulation, RVFs have a natural way of learning an emphasis function that selectively emphasizes important states. First, we establish RVF's asymptotic convergence properties in tabular settings. We then demonstrate their robustness on a partially observable domain and continuous control tasks. Finally, we provide a qualitative interpretation of the learned emphasis function. △ Less

Submitted 23 May, 2019; originally announced May 2019.

arXiv:1412.7399 [pdf, other]

doi 10.1007/s11128-015-1105-y

Do quantum strategies always win?

Authors: Namit Anand, Colin Benjamin

Abstract: In a seminal paper, Meyer [David Meyer, Phys. Rev. Lett. 82, 1052 (1999)] described the advantages of quantum game theory by looking at the classical penny flip game. A player using a quantum strategy can win against a classical player almost 100\% of the time. Here we make a slight modification to the quantum game, with the two players sharing an entangled state to begin with. We then analyze two… ▽ More In a seminal paper, Meyer [David Meyer, Phys. Rev. Lett. 82, 1052 (1999)] described the advantages of quantum game theory by looking at the classical penny flip game. A player using a quantum strategy can win against a classical player almost 100\% of the time. Here we make a slight modification to the quantum game, with the two players sharing an entangled state to begin with. We then analyze two different scenarios, first in which quantum player makes unitary transformations to his qubit while the classical player uses a pure strategy of either flip** or not flip** the state of his qubit. In this case the quantum player always wins against the classical player. In the second scenario we have the quantum player making similar unitary transformations while the classical player makes use of a mixed strategy wherein he either flips or not with some probability "p". We show that in the second scenario, 100\% win record of a quantum player is drastically reduced and for a particular probability "p" the classical player can even win against the quantum player. This is of possible relevance to the field of quantum computation as we show that in this quantum game of preserving versus destroying entanglement a particular classical algorithm can beat the quantum algorithm. △ Less

Submitted 11 August, 2015; v1 submitted 23 December, 2014; originally announced December 2014.

Comments: 12 pages, 3 figures, expanded with material on general quantum unitaries and discussion on gaming the quantum

Journal ref: Quantum Information Processing, Volume 14, issue 11, pp 4027-4038 (November 2015)

Showing 1–15 of 15 results for author: Anand, N