Search | arXiv e-print repository

Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

Authors: Katherine M. Collins, Najoung Kim, Yonatan Bitton, Verena Rieser, Shayegan Omidshafiei, Yushi Hu, Sherol Chen, Senjuti Dutta, Minsuk Chang, Kimin Lee, Youwei Liang, Georgina Evans, Sahil Singla, Gang Li, Adrian Weller, Junfeng He, Deepak Ramachandran, Krishnamurthy Dj Dvijotham

Abstract: Human feedback plays a critical role in learning and refining reward models for text-to-image generation, but the optimal form the feedback should take for learning an accurate reward function has not been conclusively established. This paper investigates the effectiveness of fine-grained feedback which captures nuanced distinctions in image quality and prompt-alignment, compared to traditional coarse-grained feedback (for example, thumbs up/down or ranking between a set of options). While fine-grained feedback holds promise, particularly for systems catering to diverse societal preferences, we show that demonstrating its superiority to coarse-grained feedback is not automatic. Through experiments on real and synthetic preference data, we surface the complexities of building effective models due to the interplay of model choice, feedback type, and the alignment between human judgment and computational interpretation. We identify key challenges in eliciting and utilizing fine-grained feedback, prompting a reassessment of its assumed benefits and practicality. Our findings -- e.g., that fine-grained feedback can lead to worse models for a fixed budget, in some settings; however, in controlled settings with known attributes, fine-grained rewards can indeed be more helpful -- call for careful consideration of feedback attributes and potentially beckon novel modeling approaches to appropriately unlock the potential value of fine-grained feedback in-the-wild.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.09574 [pdf, other]

Online Bandit Learning with Offline Preference Data

Authors: Akhil Agnihotri, Rahul Jain, Deepak Ramachandran, Zheng Wen

Abstract: Reinforcement Learning with Human Feedback (RLHF) is at the core of fine-tuning methods for generative AI models for language and images. Such feedback is often sought as rank or preference feedback from human raters, as opposed to eliciting scores since the latter tends to be very noisy. On the other hand, RL theory and algorithms predominantly assume that a reward feedback is available. In particular, approaches for online learning that can be helpful in adaptive data collection via active learning cannot incorporate offline preference data. In this paper, we adopt a finite-armed linear bandit model as a prototypical model of online learning. We consider an offline preference dataset to be available generated by an expert of unknown 'competence'. We propose $\texttt{warmPref-PS}$, a posterior sampling algorithm for online learning that can be warm-started with an offline dataset with noisy preference feedback. We show that by modeling the competence of the expert that generated it, we are able to use such a dataset most effectively. We support our claims with novel theoretical analysis of its Bayesian regret, as well as extensive empirical evaluation of an approximate algorithm which performs substantially better (almost 25 to 50% regret reduction in our studies) as compared to baselines.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09563 [pdf, other]

e-COP : Episodic Constrained Optimization of Policies

Authors: Akhil Agnihotri, Rahul Jain, Deepak Ramachandran, Sahil Singla

Abstract: In this paper, we present the $\texttt{e-COP}$ algorithm, the first policy optimization algorithm for constrained Reinforcement Learning (RL) in episodic (finite horizon) settings. Such formulations are applicable when there are separate sets of optimization criteria and constraints on a system's behavior. We approach this problem by first establishing a policy difference lemma for the episodic setting, which provides the theoretical foundation for the algorithm. Then, we propose to combine a set of established and novel solution ideas to yield the $\texttt{e-COP}$ algorithm that is easy to implement and numerically stable, and provide a theoretical guarantee on optimality under certain scaling assumptions. Through extensive empirical analysis using benchmarks in the Safety Gym suite, we show that our algorithm has similar or better performance than SoTA (non-episodic) algorithms adapted for the episodic setting. The scalability of the algorithm opens the door to its application in safety-constrained Reinforcement Learning from Human Feedback for Large Language or Diffusion Models.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2403.17853 [pdf, other]

Using Domain Knowledge to Guide Dialog Structure Induction via Neural Probabilistic Soft Logic

Authors: Connor Pryor, Quan Yuan, Jeremiah Liu, Mehran Kazemi, Deepak Ramachandran, Tania Bedrax-Weiss, Lise Getoor

Abstract: Dialog Structure Induction (DSI) is the task of inferring the latent dialog structure (i.e., a set of dialog states and their temporal transitions) of a given goal-oriented dialog. It is a critical component for modern dialog system design and discourse analysis. Existing DSI approaches are often purely data-driven, deploy models that infer latent states without access to domain knowledge, underperform when the training corpus is limited/noisy, or have difficulty when test dialogs exhibit distributional shifts from the training domain. This work explores a neural-symbolic approach as a potential solution to these problems. We introduce Neural Probabilistic Soft Logic Dialogue Structure Induction (NEUPSL DSI), a principled approach that injects symbolic knowledge into the latent space of a generative neural model. We conduct a thorough empirical investigation on the effect of NEUPSL DSI learning on hidden representation quality, few-shot learning, and out-of-domain generalization performance. Over three dialog structure induction datasets and across unsupervised and semi-supervised settings for standard and cross-domain generalization, the injection of symbolic knowledge using NEUPSL DSI provides a consistent boost in performance over the canonical baselines.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.05576 [pdf]

Understanding Subjectivity through the Lens of Motivational Context in Model-Generated Image Satisfaction

Authors: Senjuti Dutta, Sherol Chen, Sunny Mak, Amnah Ahmad, Katherine Collins, Alena Butryna, Deepak Ramachandran, Krishnamurthy Dvijotham, Ellie Pavlick, Ravi Rajakumar

Abstract: Image generation models are poised to become ubiquitous in a range of applications. These models are often fine-tuned and evaluated using human quality judgments that assume a universal standard, failing to consider the subjectivity of such tasks. To investigate how to quantify subjectivity, and the scale of its impact, we measure how assessments differ among human annotators across different use cases. Simulating the effects of ordinarily latent elements of annotators' subjectivity, we contrive a set of motivations (t-shirt graphics, presentation visuals, and phone background images) to contextualize a set of crowdsourcing tasks. Our results show that human evaluations of images vary within individual contexts and across combinations of contexts. Three key factors affecting this subjectivity are image appearance, image alignment with text, and representation of objects mentioned in the text. Our study highlights the importance of taking individual users and contexts into account, both when building and evaluating generative models.

Submitted 26 February, 2024; originally announced March 2024.

arXiv:2312.16720 [pdf, other]

Prompt Expansion for Adaptive Text-to-Image Generation

Authors: Siddhartha Datta, Alexander Ku, Deepak Ramachandran, Peter Anderson

Abstract: Text-to-image generation models are powerful but difficult to use. Users craft specific prompts to get better images, though the images can be repetitive. This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort. The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts that are optimized such that, when passed to a text-to-image model, they generate a wider variety of appealing images. We conduct a human evaluation study that shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods. Overall, this paper presents a novel and effective approach to improving the text-to-image generation experience.

Submitted 27 December, 2023; originally announced December 2023.

arXiv:2312.10240 [pdf, other]

Rich Human Feedback for Text-to-Image Generation

Authors: Youwei Liang, Junfeng He, Gang Li, Peizhao Li, Arseniy Klimovskiy, Nicholas Carolan, Jiao Sun, Jordi Pont-Tuset, Sarah Young, Feng Yang, Junjie Ke, Krishnamurthy Dj Dvijotham, Katie Collins, Yiwen Luo, Yang Li, Kai J Kohlhoff, Deepak Ramachandran, Vidhya Navalpakkam

Abstract: Recent Text-to-Image (T2I) generation models such as Stable Diffusion and Imagen have made significant progress in generating high-resolution images based on text descriptions. However, many generated images still suffer from issues such as artifacts/implausibility, misalignment with text descriptions, and low aesthetic quality. Inspired by the success of Reinforcement Learning with Human Feedback (RLHF) for large language models, prior works collected human-provided scores as feedback on generated images and trained a reward model to improve the T2I generation. In this paper, we enrich the feedback signal by (i) marking image regions that are implausible or misaligned with the text, and (ii) annotating which words in the text prompt are misrepresented or missing on the image. We collect such rich human feedback on 18K generated images (RichHF-18K) and train a multimodal transformer to predict the rich feedback automatically. We show that the predicted rich human feedback can be leveraged to improve image generation, for example, by selecting high-quality training data to finetune and improve the generative models, or by creating masks with predicted heatmaps to inpaint the problematic regions. Notably, the improvements generalize to models (Muse) beyond those used to generate the images on which human feedback data were collected (Stable Diffusion variants). The RichHF-18K data set will be released in our GitHub repository: https://github.com/google-research/google-research/tree/master/richhf_18k.

Submitted 8 April, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: CVPR'24

arXiv:2312.09244 [pdf, other]

Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking

Authors: Jacob Eisenstein, Chirag Nagpal, Alekh Agarwal, Ahmad Beirami, Alex D'Amour, DJ Dvijotham, Adam Fisch, Katherine Heller, Stephen Pfohl, Deepak Ramachandran, Peter Shaw, Jonathan Berant

Abstract: Reward models play a key role in aligning language model applications towards human preferences. However, this setup creates an incentive for the language model to exploit errors in the reward model to achieve high estimated reward, a phenomenon often termed \emph{reward hacking}. A natural mitigation is to train an ensemble of reward models, aggregating over model outputs to obtain a more robust reward estimate. We explore the application of reward ensembles to alignment at both training time (through reinforcement learning) and inference time (through reranking). First, we show that reward models are \emph{underspecified}: reward models that perform similarly in-distribution can yield very different rewards when used in alignment, due to distribution shift. Second, underspecification results in overoptimization, where alignment to one reward model does not improve reward as measured by another reward model trained on the same data. Third, overoptimization is mitigated by the use of reward ensembles, and ensembles that vary by their \emph{pretraining} seeds lead to better generalization than ensembles that differ only by their \emph{fine-tuning} seeds, with both outperforming individual reward models. However, even pretrain reward ensembles do not eliminate reward hacking: we show several qualitative reward hacking phenomena that are not mitigated by ensembling because all reward models in the ensemble exhibit similar error patterns.

Submitted 20 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

arXiv:2311.00203 [pdf, other]

Modeling subjectivity (by Mimicking Annotator Annotation) in toxic comment identification across diverse communities

Authors: Senjuti Dutta, Sid Mittal, Sherol Chen, Deepak Ramachandran, Ravi Rajakumar, Ian Kivlichan, Sunny Mak, Alena Butryna, Praveen Paritosh

Abstract: The prevalence and impact of toxic discussions online have made content moderation crucial. Automated systems can play a vital role in identifying toxicity and reducing the reliance on human moderation. Nevertheless, identifying toxic comments for diverse communities continues to present challenges that are addressed in this paper. The two-part goal of this study is to (1) identify intuitive variances from annotator disagreement using quantitative analysis and (2) model the subjectivity of these viewpoints. To achieve our goal, we published a new dataset\footnote{\url{https://github.com/XXX}} with expert annotators' annotations and used two other public datasets to identify the subjectivity of toxicity. Then, leveraging the Large Language Model (LLM), we evaluate the model's ability to mimic diverse viewpoints on toxicity by varying the size of the training data and utilizing the same set of annotators as the test set used during model training and a separate set of annotators as the test set. We conclude that subjectivity is evident across all annotator groups, demonstrating the shortcomings of majority-rule voting. Moving forward, subjective annotations should serve as ground truth labels for training models for domains like toxicity in diverse communities.

Submitted 31 October, 2023; originally announced November 2023.

arXiv:2310.04475 [pdf, other]

Demystifying Embedding Spaces using Large Language Models

Authors: Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Jihwan Jeong, Lior Shani, Azamat Tulepbergenov, Deepak Ramachandran, Martin Mladenov, Craig Boutilier

Abstract: Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. Nevertheless, they often preclude direct interpretation. While downstream tasks make use of these compressed representations, meaningful interpretation usually requires visualization using dimensionality reduction or specialized machine learning interpretability methods. This paper addresses the challenge of making such embeddings more interpretable and broadly useful, by employing Large Language Models (LLMs) to directly interact with embeddings -- transforming abstract vectors into understandable narratives. By injecting embeddings into LLMs, we enable querying and exploration of complex embedding data. We demonstrate our approach on a variety of diverse tasks, including: enhancing concept activation vectors (CAVs), communicating novel embedded entities, and decoding user preferences in recommender systems. Our work couples the immense information potential of embeddings with the interpretative power of LLMs.

Submitted 13 March, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: Accepted to ICLR 2024

arXiv:2308.15299 [pdf, other]

TaskLAMA: Probing the Complex Task Understanding of Language Models

Authors: Quan Yuan, Mehran Kazemi, Xin Xu, Isaac Noble, Vaiva Imbrasaite, Deepak Ramachandran

Abstract: Structured Complex Task Decomposition (SCTD) is the problem of breaking down a complex real-world task (such as planning a wedding) into a directed acyclic graph over individual steps that contribute to achieving the task, with edges specifying temporal dependencies between them. SCTD is an important component of assistive planning tools, and a challenge for commonsense reasoning systems. We probe how accurately SCTD can be done with the knowledge extracted from Large Language Models (LLMs). We introduce a high-quality human-annotated dataset for this problem and novel metrics to fairly assess performance of LLMs against several baselines. Our experiments reveal that LLMs are able to decompose complex tasks into individual steps effectively, with a relative improvement of 15% to 280% over the best baseline. We also propose a number of approaches to further improve their performance, with a relative improvement of 7% to 37% over the base model. However, we find that LLMs still struggle to predict pairwise temporal dependencies, which reveals a gap in their understanding of complex tasks.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2306.07934 [pdf, other]

BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information

Authors: Mehran Kazemi, Quan Yuan, Deepti Bhatia, Najoung Kim, Xin Xu, Vaiva Imbrasaite, Deepak Ramachandran

Abstract: Automated reasoning with unstructured natural text is a key requirement for many potential applications of NLP and for developing robust AI systems. Recently, Language Models (LMs) have demonstrated complex reasoning capacities even without any finetuning. However, existing evaluation for automated reasoning assumes access to a consistent and coherent set of information over which models reason. When reasoning in the real world, the available information is frequently inconsistent or contradictory, and therefore models need to be equipped with a strategy to resolve such conflicts when they arise. One widely applicable way of resolving conflicts is to impose preferences over information sources (e.g., based on source credibility or information recency) and adopt the source with higher preference. In this paper, we formulate the problem of reasoning with contradictory information guided by preferences over sources as the classical problem of defeasible reasoning, and develop a dataset called BoardgameQA for measuring the reasoning capacity of LMs in this setting. BoardgameQA also incorporates reasoning with implicit background knowledge, to better reflect reasoning problems in downstream applications. We benchmark various LMs on BoardgameQA and the results reveal a significant gap in the reasoning capacity of state-of-the-art LMs on this problem, showing that reasoning with conflicting information does not surface out-of-the-box in LMs. While performance can be improved with finetuning, it nevertheless remains poor.

Submitted 13 June, 2023; originally announced June 2023.

arXiv:2304.10331 [pdf, other]

doi 10.1103/PhysRevA.108.012819

Nuclear T-violation search using octupole-deformed nuclei in a crystal

Authors: Harish D. Ramachandran, Amar C. Vutha

Abstract: Precision measurements with atoms and molecules can search for subtle violations of time-reversal symmetry (T) in nuclei, and thereby probe a variety of new physics models. We present a detailed scheme for a nuclear T-violation search experiment using $^{153}$Eu$^{3+}$ ions doped in non-centrosymmetric sites within a Y$_2$SiO$_5$ crystal. The ions in this solid contain nuclei that are highly sensitive to T-violation, and avail of large atomic enhancements by being polarized within the solid. But in particular, the system and methods that we discuss here enable the use of vast numbers of nuclei trapped in crystals, while also offering a number of stringent tests to ward off systematic errors. Our approach maps out a path to probe new physics at the PeV energy scale.

Submitted 7 July, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

Comments: 10 pages, 3 figures, 5 tables

Journal ref: Phys. Rev. A 108, 012819 (2023)

arXiv:2304.06189 [pdf, other]

Coherent quantum beats: spectroscopy of energy differences masked by inhomogeneous broadening

Authors: Harish D. Ramachandran, Julia E. Ford, Amar C. Vutha

Abstract: Precision spectroscopy of solid-state systems is challenging due to inhomogeneous broadening. We describe a technique -- coherent quantum beats -- that enables the measurement of small frequency shifts within an inhomogeneously broadened distribution while addressing the full ensemble. We show that the technique can be used to obtain improvements in signal size and spectral resolution, offering advantages for precision measurements in solids.

Submitted 30 June, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

Comments: 8 pages, 8 figures

arXiv:2302.05807 [pdf, other]

Pushing the Accuracy-Group Robustness Frontier with Introspective Self-play

Authors: Jeremiah Zhe Liu, Krishnamurthy Dj Dvijotham, Jihyeon Lee, Quan Yuan, Martin Strobel, Balaji Lakshminarayanan, Deepak Ramachandran

Abstract: Standard empirical risk minimization (ERM) training can produce deep neural network (DNN) models that are accurate on average but under-perform in under-represented population subgroups, especially when there are imbalanced group distributions in the long-tailed training data. Therefore, approaches that improve the accuracy-group robustness trade-off frontier of a DNN model (i.e. improving worst-group accuracy without sacrificing average accuracy, or vice versa) are of crucial importance. Uncertainty-based active learning (AL) can potentially improve the frontier by preferentially sampling underrepresented subgroups to create a more balanced training dataset. However, the quality of uncertainty estimates from modern DNNs tends to degrade in the presence of spurious correlations and dataset bias, compromising the effectiveness of AL for sampling tail groups. In this work, we propose Introspective Self-play (ISP), a simple approach to improve the uncertainty estimation of a deep neural network under dataset bias, by adding an auxiliary introspection task requiring a model to predict the bias for each data point in addition to the label. We show that ISP provably improves the bias-awareness of the model representation and the resulting uncertainty estimates. On two real-world tabular and language tasks, ISP serves as a simple "plug-in" for AL model training, consistently improving both the tail-group sampling rate and the final accuracy-fairness trade-off frontier of popular AL methods.

Submitted 11 February, 2023; originally announced February 2023.

Comments: Accepted to ICLR 2023. Included additional contribution from Martin Strobel

arXiv:2301.11293 [pdf, other]

Understanding Finetuning for Factual Knowledge Extraction from Language Models

Authors: Mehran Kazemi, Sid Mittal, Deepak Ramachandran

Abstract: Language models (LMs) pretrained on large corpora of text from the web have been observed to contain large amounts of various types of knowledge about the world. This observation has led to a new and exciting paradigm in knowledge graph construction where, instead of manual curation or text mining, one extracts knowledge from the parameters of an LM. Recently, it has been shown that finetuning LMs on a set of factual knowledge makes them produce better answers to queries from a different set, thus making finetuned LMs a good candidate for knowledge extraction and, consequently, knowledge graph construction. In this paper, we analyze finetuned LMs for factual knowledge extraction. We show that along with its previously known positive effects, finetuning also leads to a (potentially harmful) phenomenon which we call Frequency Shock, where at test time the model over-predicts rare entities that appear in the training set and under-predicts common entities that do not appear in the training set enough times. We show that Frequency Shock leads to a degradation in the predictions of the model and, beyond a point, the harm from Frequency Shock can even outweigh the positive effects of finetuning, making finetuning harmful overall. We then consider two solutions to remedy the identified negative effect: (1) model mixing and (2) mixture finetuning with the LM's pre-training task. The two solutions combined lead to significant improvements compared to vanilla finetuning.

Submitted 26 January, 2023; originally announced January 2023.

arXiv:2212.13894 [pdf, other]

LAMBADA: Backward Chaining for Automated Reasoning in Natural Language

Authors: Mehran Kazemi, Najoung Kim, Deepti Bhatia, Xin Xu, Deepak Ramachandran

Abstract: Remarkable progress has been made on automated reasoning with natural text, by using Language Models (LMs) and methods such as Chain-of-Thought and Selection-Inference. These techniques search for proofs in the forward direction from axioms to the conclusion, which suffers from a combinatorial explosion of the search space, and thus high failure rates for problems requiring longer chains of reasoning. The classical automated reasoning literature has shown that reasoning in the backward direction (i.e. from the intended conclusion to supporting axioms) is significantly more efficient at proof-finding. Importing this intuition into the LM setting, we develop a Backward Chaining algorithm, called LAMBADA, that decomposes reasoning into four sub-modules. These sub-modules are simply implemented by few-shot prompted LM inference. We show that LAMBADA achieves sizable accuracy boosts over state-of-the-art forward reasoning methods on challenging logical reasoning datasets, particularly when deep and accurate proof chains are required.

Submitted 29 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: Accepted at ACL 2023

arXiv:2211.06309 [pdf, other]

A Riemannian Genuine Measure of Entanglement for Pure States

Authors: Dharmaraj Ramachandran, Radhika Vathsan

Abstract: While several measures exist for entanglement of multipartite pure states, a true entanglement measure for mixed states still eludes us. A deeper study of the geometry of quantum states may be the way to address this issue, in which context we come up with a measure for pure states based on a geodesic distance on the space of quantum states. Our measure satisfies all the desirable properties of a "Genuine Measure of Entanglement" (GME), and in comparison with some of the other existing measures, shows better smoothness and discriminance.

Submitted 13 January, 2024; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: Revised version, submitted to QIP

arXiv:2207.07279 [pdf, other]

doi 10.1088/1367-2630/ace9f3

BaF molecules in neon ice: trapping, spectroscopy and optical control of electron spins

Authors: Samuel J. Li, Harish D. Ramachandran, Rhys Anderson, Amar C. Vutha

Abstract: We have trapped BaF molecules in neon ice, and used laser-induced fluorescence spectroscopy to map out optical transitions in the trapped molecules. Our measurements show that the neon lattice does not significantly perturb certain optical transitions in the trapped molecules. We used one of these transitions to polarize the electron spins, detect spin flips and measure hyperfine transitions in the trapped molecules, entirely using lasers. This demonstration with heavy polar molecules opens up new opportunities for precision measurements of beyond-standard-model physics.

Submitted 25 January, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

Journal ref: New J. Phys. 25, 082001 (2023)

arXiv:2205.10403 [pdf, other]

Tackling Provably Hard Representative Selection via Graph Neural Networks

Authors: Mehran Kazemi, Anton Tsitsulin, Hossein Esfandiari, MohammadHossein Bateni, Deepak Ramachandran, Bryan Perozzi, Vahab Mirrokni

Abstract: Representative Selection (RS) is the problem of finding a small subset of exemplars from a dataset that is representative of the dataset. In this paper, we study RS for attributed graphs, and focus on finding representative nodes that optimize the accuracy of a model trained on the selected representatives. Theoretically, we establish a new hardness result for RS (in the absence of a graph structure) by proving that a particular, highly practical variant of it (RS for Learning) is hard to approximate in polynomial time within any reasonable factor, which implies a significant potential gap between the optimum solution of widely-used surrogate functions and the actual accuracy of the model. We then study the setting where a (homophilous) graph structure is available, or can be constructed, between the data points. We show that with an appropriate modeling approach, the presence of such a structure can turn a hard RS (for learning) problem into one that can be effectively solved. To this end, we develop RS-GNN, a representation learning-based RS model based on Graph Neural Networks. Empirically, we demonstrate the effectiveness of RS-GNN on problems with predefined graph structures as well as problems with graphs induced from node feature similarities, by showing that RS-GNN achieves significant improvements over established baselines on a suite of eight benchmarks.

Submitted 19 July, 2023; v1 submitted 20 May, 2022; originally announced May 2022.

Comments: Accepted at the Transactions of Machine Learning Research (TMLR) Journal

arXiv:2205.06262 [pdf, other]

FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue

Authors: Alon Albalak, Yi-Lin Tuan, Pegah Jandaghi, Connor Pryor, Luke Yoffe, Deepak Ramachandran, Lise Getoor, Jay Pujara, William Yang Wang

Abstract: Task transfer, transferring knowledge contained in related tasks, holds the promise of reducing the quantity of labeled data required to fine-tune language models. Dialogue understanding encompasses many diverse tasks, yet task transfer has not been thoroughly studied in conversational AI. This work explores conversational task transfer by introducing FETA: a benchmark for few-sample task transfer in open-domain dialogue. FETA contains two underlying sets of conversations upon which there are 10 and 7 tasks annotated, enabling the study of intra-dataset task transfer; task transfer without domain adaptation. We utilize three popular language models and three learning algorithms to analyze the transferability between 132 source-target task pairs and create a baseline for future work. We run experiments in the single- and multi-source settings and report valuable findings, e.g., most performance trends are model-specific, and span extraction and multiple-choice tasks benefit the most from task transfer. In addition to task transfer, FETA can be a valuable resource for future research into the efficiency and generalizability of pre-training datasets and model architectures, as well as for learning settings such as continual and multitask learning.

Submitted 13 October, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

Comments: EMNLP 2022. benchmark available at https://alon-albalak.github.io/feta-website

arXiv:2202.02830 [pdf, other]

doi 10.1145/1122445.1122456

Discovering Personalized Semantics for Soft Attributes in Recommender Systems using Concept Activation Vectors

Authors: Christina Göpfert, Alex Haig, Yinlam Chow, Chih-wei Hsu, Ivan Vendrov, Tyler Lu, Deepak Ramachandran, Hubert Pham, Mohammad Ghavamzadeh, Craig Boutilier

Abstract: Interactive recommender systems have emerged as a promising paradigm to overcome the limitations of the primitive user feedback used by traditional recommender systems (e.g., clicks, item consumption, ratings). They allow users to express intent, preferences, constraints, and contexts in a richer fashion, often using natural language (including faceted search and dialogue). Yet more research is needed to find the most effective ways to use this feedback. One challenge is inferring a user's semantic intent from the open-ended terms or attributes often used to describe a desired item, and using it to refine recommendation results. Leveraging concept activation vectors (CAVs) [26], a recently developed approach for model interpretability in machine learning, we develop a framework to learn a representation that captures the semantics of such attributes and connects them to user preferences and behaviors in recommender systems. One novel feature of our approach is its ability to distinguish objective and subjective attributes (both subjectivity of degree and of sense), and associate different senses of subjective attributes with different users. We demonstrate on both synthetic and real-world data sets that our CAV representation not only accurately interprets users' subjective semantics, but can also be used to improve recommendations through interactive item critiquing.

Submitted 2 June, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

arXiv:2101.00391 [pdf, other]

Which Linguist Invented the Lightbulb? Presupposition Verification for Question-Answering

Authors: Najoung Kim, Ellie Pavlick, Burcu Karagol Ayan, Deepak Ramachandran

Abstract: Many Question-Answering (QA) datasets contain unanswerable questions, but their treatment in QA systems remains primitive. Our analysis of the Natural Questions (Kwiatkowski et al. 2019) dataset reveals that a substantial portion of unanswerable questions ($\sim$21%) can be explained based on the presence of unverifiable presuppositions. We discuss the shortcomings of current models in handling such questions, and describe how an improved system could handle them. Through a user preference study, we demonstrate that the oracle behavior of our proposed system that provides responses based on presupposition failure is preferred over the oracle behavior of existing QA systems. Then we discuss how our proposed system could be implemented, presenting a novel framework that breaks down the problem into three steps: presupposition generation, presupposition verification and explanation generation. We report our progress in tackling each subproblem, and present a preliminary approach to integrating these steps into an existing QA system. We find that adding presuppositions and their verifiability to an existing model yields modest gains in downstream performance and unanswerability detection. The biggest bottleneck is the verification component, which needs to be substantially improved for the integrated system to approach ideal behavior -- even transfer from the best entailment models currently falls short.

Submitted 3 September, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

Comments: ACL 2021 Camera-ready

arXiv:2010.05345 [pdf, other]

Do Language Embeddings Capture Scales?

Authors: Xikun Zhang, Deepak Ramachandran, Ian Tenney, Yanai Elazar, Dan Roth

Abstract: Pretrained Language Models (LMs) have been shown to possess significant linguistic, common sense, and factual knowledge. One form of knowledge that has not been studied yet in this context is information about the scalar magnitudes of objects. We show that pretrained language models capture a significant amount of this information but are short of the capability required for general common-sense reasoning. We identify contextual information in pre-training and numeracy as two key factors affecting their performance and show that a simple method of canonicalizing numbers can have a significant effect on the results.

Submitted 24 November, 2020; v1 submitted 11 October, 2020; originally announced October 2020.

Comments: Accepted at EMNLP Findings 2020 and EMNLP BlackboxNLP workshop 2020; 8 pages, 2 figures; Minor changes to the acknowledgment section

ACM Class: I.2.7

arXiv:1906.01327 [pdf, other]

How Large Are Lions? Inducing Distributions over Quantitative Attributes

Authors: Yanai Elazar, Abhijit Mahabal, Deepak Ramachandran, Tania Bedrax-Weiss, Dan Roth

Abstract: Most current NLP systems have little knowledge about quantitative attributes of objects and events. We propose an unsupervised method for collecting quantitative information from large amounts of web data, and use it to create a new, very large resource consisting of distributions over physical quantities associated with objects, adjectives, and verbs which we call Distributions over Quantitative (DoQ). This contrasts with recent work in this area which has focused on making only relative comparisons such as "Is a lion bigger than a wolf?". Our evaluation shows that DoQ compares favorably with state of the art results on existing datasets for relative comparisons of nouns and adjectives, and on a new dataset we introduce.

Submitted 4 June, 2019; originally announced June 2019.

arXiv:1906.00589 [pdf]

An upper bound for the clique number using clique ceiling numbers

Authors: R. Dharmarajan, D. Ramachandran

Abstract: In this article we present the idea of clique ceiling numbers of the vertices of a given graph that has a universal vertex. We follow up with a polynomial-time algorithm to compute an upper bound for the clique number of such a graph using clique ceiling numbers. We compare this algorithm with some upper bound formulas for the clique number.

Submitted 3 June, 2019; originally announced June 2019.

Comments: 09 pages

MSC Class: 05C07; 05C69

arXiv:1903.10700 [pdf]

On the tractability of the maximum clique problem

Authors: R. Dharmarajan, D. Ramachandran

Abstract: The maximum clique problem is a classical NP-complete problem in graph theory and has important applications in many domains. In this paper we show, in a partially non-constructive way, the existence of an exact polynomial-time algorithm for this problem. We outline the algorithm in pseudo-code style. Then we prove its exactness and efficiency by analysis.

Submitted 17 May, 2019; v1 submitted 26 March, 2019; originally announced March 2019.

Comments: 15 (fifteen) pages

MSC Class: 05C69

arXiv:1901.00626 [pdf]

A modified greedy algorithm to improve bounds for the vertex cover number

Authors: R. Dharmarajan, D. Ramachandran

Abstract: In any attempt at designing an efficient algorithm for the minimum vertex cover problem, obtaining good upper and lower bounds for the vertex cover number could be crucial. In this article we present a modified greedy algorithm of worst-case time complexity $O(n^3)$ to obtain bounds for the vertex cover number of an input graph of order n. Using simple facts, the proposed algorithm computes a lower bound for the vertex cover number. Then, using this lower bound, it outputs a minimal vertex cover and hence gives an upper bound. The algorithm ensures the output vertex cover is always minimal, a feature that improves upon the existing greedy algorithms.

Submitted 3 January, 2019; originally announced January 2019.

Comments: 13 pages

MSC Class: 05C69; 05C70

Showing 1–28 of 28 results for author: Ramachandran, D