Showing 1–20 of 20 results for author: Rajagopal, D

  1. arXiv:2404.09127  [pdf, other]

    cs.CL

    Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation

    Authors: Ruixin Yang, Dheeraj Rajagopal, Shirley Anugrah Hayati, Bin Hu, Dongyeop Kang

    Abstract: Uncertainty estimation is a significant issue for current large language models (LLMs) that are generally poorly calibrated and over-confident, especially with reinforcement learning from human feedback (RLHF). Unlike humans, whose decisions and confidences not only stem from intrinsic beliefs but can also be adjusted through daily observations, existing calibration methods for LLMs focus on estim…

    Submitted 10 May, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

    Comments: Accepted at ICLR 2024 Workshop on Reliable and Responsible Foundation Models

  2. arXiv:2311.09799  [pdf, other]

    cs.CL

    How Far Can We Extract Diverse Perspectives from Large Language Models?

    Authors: Shirley Anugrah Hayati, Minhwa Lee, Dheeraj Rajagopal, Dongyeop Kang

    Abstract: Collecting diverse human opinions is costly and challenging. This leads to a recent trend in collaborative efforts between humans and Large Language Models (LLMs) for generating diverse data, offering potential scalable and efficient solutions. However, the extent of LLMs' capability to generate diverse perspectives on subjective topics remains an unexplored question. In this study, we investigate…

    Submitted 18 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  3. arXiv:2310.12963  [pdf, other]

    cs.CL cs.AI

    AutoMix: Automatically Mixing Language Models

    Authors: Pranjal Aggarwal, Aman Madaan, Ankit Anand, Srividya Pranavi Potharaju, Swaroop Mishra, Pei Zhou, Aditya Gupta, Dheeraj Rajagopal, Karthik Kappaganthu, Yiming Yang, Shyam Upadhyay, Manaal Faruqui, Mausam

    Abstract: Large language models (LLMs) are now available from cloud API providers in various sizes and configurations. While this diversity offers a broad spectrum of choices, effectively leveraging the options to optimize computational cost and performance remains challenging. In this work, we present Automix, an approach that strategically routes queries to larger LMs, based on the approximate correctness…

    Submitted 28 June, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: The first two authors contributed equally. Work started and partly done during Aman's internship at Google. This version adds results on additional models and datasets

  4. arXiv:2210.07469  [pdf, other]

    cs.CL

    StyLEx: Explaining Style Using Human Lexical Annotations

    Authors: Shirley Anugrah Hayati, Kyumin Park, Dheeraj Rajagopal, Lyle Ungar, Dongyeop Kang

    Abstract: Large pre-trained language models have achieved impressive results on various style classification tasks, but they often learn spurious domain-specific words to make predictions (Hayati et al., 2021). While human explanation highlights stylistic tokens as important features for this task, we observe that model explanations often do not align with them. To tackle this issue, we introduce StyLEx, a…

    Submitted 14 April, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: EACL 2023

  5. arXiv:2205.12485  [pdf, other]

    cs.CL cs.AI

    Conditional set generation using Seq2seq models

    Authors: Aman Madaan, Dheeraj Rajagopal, Niket Tandon, Yiming Yang, Antoine Bosselut

    Abstract: Conditional set generation learns a mapping from an input sequence of tokens to a set. Several NLP tasks, such as entity typing and dialogue emotion tagging, are instances of set generation. Seq2Seq models, a popular choice for set generation, treat a set as a sequence and do not fully leverage its key properties, namely order-invariance and cardinality. We propose a novel algorithm for effectivel…

    Submitted 24 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022

  6. arXiv:2205.12416  [pdf, other]

    cs.CL

    Counterfactual Data Augmentation improves Factuality of Abstractive Summarization

    Authors: Dheeraj Rajagopal, Siamak Shakeri, Cicero Nogueira dos Santos, Eduard Hovy, Chung-Ching Chang

    Abstract: Abstractive summarization systems based on pretrained language models often generate coherent but factually inconsistent sentences. In this paper, we present a counterfactual data augmentation approach where we augment data with perturbed summaries that increase the training data diversity. Specifically, we present three augmentation approaches based on replacing (i) entities from other and the sa…

    Submitted 24 May, 2022; originally announced May 2022.

  7. arXiv:2111.00539  [pdf, other]

    cs.CL cs.AI

    Template Filling for Controllable Commonsense Reasoning

    Authors: Dheeraj Rajagopal, Vivek Khetan, Bogdan Sacaleanu, Anatole Gershman, Andrew Fano, Eduard Hovy

    Abstract: Large-scale sequence-to-sequence models have shown to be adept at both multiple-choice and open-domain commonsense reasoning tasks. However, the current systems do not provide the ability to control the various attributes of the reasoning chain. To enable better controllability, we propose to study the commonsense reasoning as a template filling task (TemplateCSR) -- where the language models fill…

    Submitted 13 October, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

  8. arXiv:2110.12349  [pdf, other]

    cs.AI cs.CL

    Think about it! Improving defeasible reasoning by first modeling the question scenario

    Authors: Aman Madaan, Niket Tandon, Dheeraj Rajagopal, Peter Clark, Yiming Yang, Eduard Hovy

    Abstract: Defeasible reasoning is the mode of reasoning where conclusions can be overturned by taking into account new evidence. Existing cognitive science literature on defeasible reasoning suggests that a person forms a mental model of the problem scenario before answering questions. Our research goal asks whether neural models can similarly benefit from envisioning the question scenario before answering…

    Submitted 24 October, 2021; originally announced October 2021.

    Comments: EMNLP 2021

  9. arXiv:2107.04140  [pdf, other]

    cs.AR

    First-Generation Inference Accelerator Deployment at Facebook

    Authors: Michael Anderson, Benny Chen, Stephen Chen, Summer Deng, Jordan Fix, Michael Gschwind, Aravind Kalaiah, Changkyu Kim, Jaewon Lee, Jason Liang, Haixin Liu, Yinghai Lu, Jack Montgomery, Arun Moorthy, Satish Nadathur, Sam Naghshineh, Avinash Nayak, Jongsoo Park, Chris Petersen, Martin Schatz, Narayanan Sundaram, Bangsheng Tang, Peter Tang, Amy Yang, Jiecao Yu, et al. (90 additional authors not shown)

    Abstract: In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses, large model sizes, as well as high compute, memory and network bandwidth requirements. We co-designed a high-performance, energy-efficient inference accelerator platform based on these requirements. We describe the in…

    Submitted 4 August, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

  10. arXiv:2105.05418  [pdf, other]

    cs.CL cs.AI

    Could you give me a hint? Generating inference graphs for defeasible reasoning

    Authors: Aman Madaan, Dheeraj Rajagopal, Niket Tandon, Yiming Yang, Eduard Hovy

    Abstract: Defeasible reasoning is the mode of reasoning where conclusions can be overturned by taking into account new evidence. A commonly used method in cognitive science and logic literature is to handcraft argumentation supporting inference graphs. While humans find inference graphs very useful for reasoning, constructing them at scale is difficult. In this paper, we automatically generate such inferenc…

    Submitted 28 May, 2021; v1 submitted 12 May, 2021; originally announced May 2021.

    Comments: Findings of the Association for Computational Linguistics: ACL 2021

  11. arXiv:2104.08765  [pdf, other]

    cs.CL

    Improving Neural Model Performance through Natural Language Feedback on Their Explanations

    Authors: Aman Madaan, Niket Tandon, Dheeraj Rajagopal, Yiming Yang, Peter Clark, Keisuke Sakaguchi, Ed Hovy

    Abstract: A class of explainable NLP models for reasoning tasks support their decisions by generating free-form or structured explanations, but what happens when these supporting structures contain errors? Our goal is to allow users to interactively correct explanation structures through natural language feedback. We introduce MERCURIE - an interactive system that refines its explanations for a given reason…

    Submitted 18 April, 2021; originally announced April 2021.

  12. arXiv:2104.00814  [pdf, other]

    cs.CL

    CURIE: An Iterative Querying Approach for Reasoning About Situations

    Authors: Dheeraj Rajagopal, Aman Madaan, Niket Tandon, Yiming Yang, Shrimai Prabhumoye, Abhilasha Ravichander, Peter Clark, Eduard Hovy

    Abstract: Recently, models have been shown to predict the effects of unexpected situations, e.g., would cloudy skies help or hinder plant growth? Given a context, the goal of such situational reasoning is to elicit the consequences of a new situation (st) that arises in that context. We propose a method to iteratively build a graph of relevant consequences explicitly in a structured situational graph (st-gr…

    Submitted 5 April, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

    Comments: This paper builds upon EIGEN (arXiv:2010.11764) and proposes a general framework for situational reasoning

  13. arXiv:2103.12279  [pdf, other]

    cs.CL

    SelfExplain: A Self-Explaining Architecture for Neural Text Classifiers

    Authors: Dheeraj Rajagopal, Vidhisha Balachandran, Eduard Hovy, Yulia Tsvetkov

    Abstract: We introduce SelfExplain, a novel self-explaining model that explains a text classifier's predictions using phrase-based concepts. SelfExplain augments existing neural classifiers by adding (1) a globally interpretable layer that identifies the most influential concepts in the training set for a given sample and (2) a locally interpretable layer that quantifies the contribution of each local input…

    Submitted 7 September, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

  14. arXiv:2011.08092  [pdf, other]

    cs.CL

    A Dataset for Tracking Entities in Open Domain Procedural Text

    Authors: Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi Mishra, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, Eduard Hovy

    Abstract: We present the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. For example, in a text describing fog removal using potatoes, a car window may transition between being foggy, sticky, opaque, and clear. Previous formulations of this task provide the text and entities involved, and ask how those entities change for just a sm…

    Submitted 30 October, 2020; originally announced November 2020.

    Comments: To appear in EMNLP 2020

  15. arXiv:2010.11764  [pdf, other]

    cs.CL

    EIGEN: Event Influence GENeration using Pre-trained Language Models

    Authors: Aman Madaan, Dheeraj Rajagopal, Yiming Yang, Abhilasha Ravichander, Eduard Hovy, Shrimai Prabhumoye

    Abstract: Reasoning about events and tracking their influences is fundamental to understanding processes. In this paper, we present EIGEN - a method to leverage pre-trained language models to generate event influences conditioned on a context, nature of their influence, and the distance in a reasoning chain. We also derive a new dataset for research and evaluation of methods for event influence generation.…

    Submitted 22 October, 2020; originally announced October 2020.

  16. arXiv:2005.01526  [pdf, other]

    cs.CL

    What-if I ask you to explain: Explaining the effects of perturbations in procedural text

    Authors: Dheeraj Rajagopal, Niket Tandon, Bhavana Dalvi, Peter Clark, Eduard Hovy

    Abstract: We address the task of explaining the effects of perturbations in procedural text, an important test of process comprehension. Consider a passage describing a rabbit's life-cycle: humans can easily explain the effect on the rabbit population if a female rabbit becomes ill -- i.e., the female rabbit would not become pregnant, and as a result not have babies leading to a decrease in rabbit populatio…

    Submitted 7 October, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

  17. arXiv:2003.00576  [pdf, other]

    cs.CL

    StructSum: Summarization via Structured Representations

    Authors: Vidhisha Balachandran, Artidoro Pagnoni, Jay Yoon Lee, Dheeraj Rajagopal, Jaime Carbonell, Yulia Tsvetkov

    Abstract: Abstractive text summarization aims at compressing the information of a long source document into a rephrased, condensed summary. Despite advances in modeling techniques, abstractive summarization models still suffer from several key challenges: (i) layout bias: they overfit to the style of training corpora; (ii) limited abstractiveness: they are optimized to copying n-grams from the source rather…

    Submitted 16 February, 2021; v1 submitted 1 March, 2020; originally announced March 2020.

  18. arXiv:1804.00720  [pdf, other]

    cs.CL cs.LG

    Simple and Effective Semi-Supervised Question Answering

    Authors: Bhuwan Dhingra, Danish Pruthi, Dheeraj Rajagopal

    Abstract: Recent success of deep learning models for the task of extractive Question Answering (QA) is hinged on the availability of large annotated corpora. However, large domain specific annotated corpora are limited and expensive to construct. In this work, we envision a system where the end user specifies a set of base documents and only a few labelled examples. Our system exploits the document structur…

    Submitted 2 April, 2018; originally announced April 2018.

    Comments: Short paper, NAACL 2018

  19. arXiv:1706.07230  [pdf, other]

    cs.LG cs.AI cs.CL cs.RO

    Gated-Attention Architectures for Task-Oriented Language Grounding

    Authors: Devendra Singh Chaplot, Kanthashree Mysore Sathyendra, Rama Kumar Pasumarthi, Dheeraj Rajagopal, Ruslan Salakhutdinov

    Abstract: To perform tasks specified by natural language instructions, autonomous agents need to extract semantically meaningful representations of language and map it to visual elements and actions in the environment. This problem is called task-oriented language grounding. We propose an end-to-end trainable neural architecture for task-oriented language grounding in 3D environments which assumes no prior…

    Submitted 8 January, 2018; v1 submitted 22 June, 2017; originally announced June 2017.

    Comments: To appear in AAAI-18

  20. Customer Data Clustering using Data Mining Technique

    Authors: Dr. Sankar Rajagopal

    Abstract: Classification and patterns extraction from customer data is very important for business support and decision making. Timely identification of newly emerging trends is very important in business process. Large companies are having huge volume of data but starving for knowledge. To overcome the organization current issue, the new breed of technique is required that has intelligence and capability t…

    Submitted 9 December, 2011; originally announced December 2011.

    Comments: 11 pages, 2 figures and 1 table

    Journal ref: International Journal of Database Management Systems (IJDMS) Vol.3, No.4, November 2011