Showing 1–25 of 25 results for author: Kalyan, A

Searching in archive cs.
  1. arXiv:2405.04325  [pdf, other]

    cs.CL

    Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation

    Authors: Atharvan Dogra, Ameet Deshpande, John Nay, Tanmay Rajpurohit, Ashwin Kalyan, Balaraman Ravindran

    Abstract: Recent developments in large language models (LLMs), while offering a powerful foundation for developing natural language agents, raise safety concerns about them and the autonomous agents built upon them. Deception is one potential capability of AI agents of particular concern, which we refer to as an act or statement that misleads, hides the truth, or promotes a belief that is not true in its en…

    Submitted 7 May, 2024; originally announced May 2024.

  2. arXiv:2404.08555  [pdf, other]

    cs.LG cs.AI cs.CL

    RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs

    Authors: Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva

    Abstract: State-of-the-art large language models (LLMs) have become indispensable tools for various tasks. However, training LLMs to serve as effective assistants for humans requires careful consideration. A promising approach is reinforcement learning from human feedback (RLHF), which leverages human feedback to update the model in accordance with human preferences and mitigate issues like toxicity and hal…

    Submitted 15 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  3. arXiv:2311.09735  [pdf, other]

    cs.LG cs.IR

    GEO: Generative Engine Optimization

    Authors: Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande

    Abstract: The advent of large language models (LLMs) has ushered in a new paradigm of search engines that use generative models to gather and summarize information to answer user queries. This emerging technology, which we formalize under the unified framework of generative engines (GEs), can generate accurate and personalized responses, rapidly replacing traditional search engines like Google and Bing. Gen…

    Submitted 28 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted to KDD 2024

  4. arXiv:2311.04892  [pdf, other]

    cs.CL

    Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs

    Authors: Shashank Gupta, Vaishnavi Shrivastava, Ameet Deshpande, Ashwin Kalyan, Peter Clark, Ashish Sabharwal, Tushar Khot

    Abstract: Recent works have showcased the ability of LLMs to embody diverse personas in their responses, exemplified by prompts like 'You are Yoda. Explain the Theory of Relativity.' While this ability allows personalization of LLMs and enables human behavior simulation, its effect on LLMs' capabilities remains unclear. To fill this gap, we present the first extensive study of the unintended side-effects of…

    Submitted 27 January, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: Project page: https://allenai.github.io/persona-bias. Paper to appear at ICLR 2024. Added results for other LLMs in v2 (similar findings)

  5. arXiv:2311.02807  [pdf, other]

    cs.LG cs.AI cs.CL

    QualEval: Qualitative Evaluation for Model Improvement

    Authors: Vishvak Murahari, Ameet Deshpande, Peter Clark, Tanmay Rajpurohit, Ashish Sabharwal, Karthik Narasimhan, Ashwin Kalyan

    Abstract: Quantitative evaluation metrics have traditionally been pivotal in gauging the advancements of artificial intelligence systems, including large language models (LLMs). However, these metrics have inherent limitations. Given the intricate nature of real-world tasks, a single scalar to quantify and compare is insufficient to capture the fine-grained nuances of model behavior. Metrics serve only as a…

    Submitted 5 May, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  6. arXiv:2310.06204  [pdf, other]

    cs.CL cs.AI

    Estimating Numbers without Regression

    Authors: Avijit Thawani, Jay Pujara, Ashwin Kalyan

    Abstract: Despite recent successes in language models, their ability to represent numbers is insufficient. Humans conceptualize numbers based on their magnitudes, effectively projecting them on a number line; whereas subword tokenization fails to explicitly capture magnitude by splitting numbers into arbitrary chunks. To alleviate this shortcoming, alternative approaches have been proposed that modify numbe…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: Workshop on Insights from Negative Results in NLP at EACL 2023

  7. arXiv:2309.00133  [pdf, other]

    cs.CV

    Distraction-free Embeddings for Robust VQA

    Authors: Atharvan Dogra, Deeksha Varshney, Ashwin Kalyan, Ameet Deshpande, Neeraj Kumar

    Abstract: The generation of effective latent representations and their subsequent refinement to incorporate precise information is an essential prerequisite for Vision-Language Understanding (VLU) tasks such as Video Question Answering (VQA). However, most existing methods for VLU focus on sparsely sampling or fine-graining the input information (e.g., sampling a sparse set of frames or text tokens), or add…

    Submitted 31 August, 2023; originally announced September 2023.

  8. arXiv:2308.03882  [pdf, other]

    cs.LG cs.AI stat.ML

    Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations

    Authors: Nirbhay Modhe, Qiaozi Gao, Ashwin Kalyan, Dhruv Batra, Govind Thattai, Gaurav Sukhatme

    Abstract: Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation by conservative value estimation -- penalizing values of unseen states and actions. Model-free methods penalize values at all unseen actions, while model-based methods are able to further exploit unseen states via model rollouts. However, such methods are handicapped in their ability to find unseen st…

    Submitted 24 September, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

  9. arXiv:2305.15093  [pdf, other]

    cs.CL cs.AI cs.LG

    C-STS: Conditional Semantic Textual Similarity

    Authors: Ameet Deshpande, Carlos E. Jimenez, Howard Chen, Vishvak Murahari, Victoria Graf, Tanmay Rajpurohit, Ashwin Kalyan, Danqi Chen, Karthik Narasimhan

    Abstract: Semantic textual similarity (STS), a cornerstone task in NLP, measures the degree of similarity between a pair of sentences, and has broad application in fields such as information retrieval and natural language understanding. However, sentence similarity can be inherently ambiguous, depending on the specific aspect of interest. We resolve this ambiguity by proposing a novel task called Conditiona…

    Submitted 6 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Published in EMNLP 2023

  10. arXiv:2305.14784  [pdf, other]

    cs.AI cs.CL cs.CY cs.LG

    Anthropomorphization of AI: Opportunities and Risks

    Authors: Ameet Deshpande, Tanmay Rajpurohit, Karthik Narasimhan, Ashwin Kalyan

    Abstract: Anthropomorphization is the tendency to attribute human-like traits to non-human entities. It is prevalent in many social contexts -- children anthropomorphize toys, adults do so with brands, and it is a literary device. It is also a versatile tool in science, with behavioral psychology and evolutionary biology meticulously documenting its consequences. With widespread adoption of AI systems, and…

    Submitted 24 May, 2023; originally announced May 2023.

  11. arXiv:2305.08844  [pdf, other]

    cs.CL

    RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs

    Authors: Afra Feyza Akyürek, Ekin Akyürek, Aman Madaan, Ashwin Kalyan, Peter Clark, Derry Wijaya, Niket Tandon

    Abstract: Despite their unprecedented success, even the largest language models make mistakes. Similar to how humans learn and improve using feedback, previous work proposed providing language models with natural language feedback to guide them in repairing their outputs. Because human-generated critiques are expensive to obtain, researchers have devised learned critique generators in lieu of human critics…

    Submitted 11 July, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  12. ProKnow: Process Knowledge for Safety Constrained and Explainable Question Generation for Mental Health Diagnostic Assistance

    Authors: Kaushik Roy, Manas Gaur, Misagh Soltani, Vipula Rawte, Ashwin Kalyan, Amit Sheth

    Abstract: Current Virtual Mental Health Assistants (VMHAs) provide counseling and suggestive care. They refrain from patient diagnostic assistance because they lack training in safety-constrained and specialized clinical process knowledge. In this work, we define Proknow as an ordered set of information that maps to evidence-based guidelines or categories of conceptual understanding to experts in a domain.…

    Submitted 1 June, 2023; v1 submitted 13 May, 2023; originally announced May 2023.

    Journal ref: Front. Big Data, 09 January 2023, Sec. Data Science, Volume 5 - 2022

  13. arXiv:2304.09133  [pdf]

    eess.IV cs.CV cs.LG

    Detection and Classification of Glioblastoma Brain Tumor

    Authors: Utkarsh Maurya, Appisetty Krishna Kalyan, Swapnil Bohidar, Dr. S. Sivakumar

    Abstract: Glioblastoma brain tumors are highly malignant and often require early detection and accurate segmentation for effective treatment. We are proposing two deep learning models in this paper, namely UNet and Deeplabv3, for the detection and segmentation of glioblastoma brain tumors using preprocessed brain MRI images. The performance evaluation is done for these models in terms of accuracy and comput…

    Submitted 18 April, 2023; originally announced April 2023.

    Comments: 12 pages, 8 figures

  14. arXiv:2304.05335  [pdf, other]

    cs.CL cs.AI cs.LG

    Toxicity in ChatGPT: Analyzing Persona-assigned Language Models

    Authors: Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan

    Abstract: Large language models (LLMs) have shown incredible capabilities and transcended the natural language processing (NLP) community, with adoption throughout many services like healthcare, therapy, education, and customer service. Since users include people with critical information needs like students or patients engaging with chatbots, the safety of these systems is of prime importance. Therefore, a…

    Submitted 11 April, 2023; originally announced April 2023.

  15. arXiv:2211.16634  [pdf, other]

    cs.CL cs.AI cs.LG

    SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers

    Authors: Ameet Deshpande, Md Arafat Sultan, Anthony Ferritto, Ashwin Kalyan, Karthik Narasimhan, Avirup Sil

    Abstract: Fine-tuning pre-trained language models (PLMs) achieves impressive performance on a range of downstream tasks, and their sizes have consequently been getting bigger. Since a different copy of the model is required for each task, this paradigm is infeasible for storage-constrained edge devices like mobile phones. In this paper, we propose SPARTAN, a parameter efficient (PE) and computationally fast…

    Submitted 29 November, 2022; originally announced November 2022.

  16. arXiv:2210.17517  [pdf, other]

    cs.CL cs.AI

    Lila: A Unified Benchmark for Mathematical Reasoning

    Authors: Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan

    Abstract: Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this domain, we propose LILA, a unified mathematical reasoning benchmark consisting of 23 diverse tasks along four dimensions: (i) mathematical abilities e.g., arithmetic, calculus (ii) language format e.g., q…

    Submitted 8 March, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

    MSC Class: 68T50 ACM Class: I.2.7

  17. arXiv:2209.14610  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning

    Authors: Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay Rajpurohit, Peter Clark, Ashwin Kalyan

    Abstract: Mathematical reasoning, a core ability of human intelligence, presents unique challenges for machines in abstract thinking and logical reasoning. Recent large pre-trained language models such as GPT-3 have achieved remarkable progress on mathematical reasoning tasks written in text form, such as math word problems (MWP). However, it is unknown if the models can handle more complex problems that in…

    Submitted 2 March, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: ICLR 2023. 26 pages and 18 figures. The data and code are available at https://promptpg.github.io

  18. arXiv:2209.09513  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG cs.MM

    Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

    Authors: Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan

    Abstract: When answering a question, humans utilize the information available across different modalities to synthesize a consistent and complete chain of thought (CoT). This process is normally a black box in the case of deep learning models like large-scale language models. Recently, science question benchmarks have been used to diagnose the multi-hop reasoning ability and interpretability of an AI system…

    Submitted 17 October, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

    Comments: Accepted to NeurIPS 2022. 22 pages, 17 figures, 9 tables. Project: https://scienceqa.github.io

  19. arXiv:2204.05660  [pdf, other]

    cs.CL cs.AI cs.LG

    NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks

    Authors: Swaroop Mishra, Arindam Mitra, Neeraj Varshney, Bhavdeep Sachdeva, Peter Clark, Chitta Baral, Ashwin Kalyan

    Abstract: Given the ubiquitous nature of numbers in text, reasoning with numbers to perform simple calculations is an important skill of AI systems. While many datasets and models have been developed to this end, state-of-the-art AI systems are brittle; failing to perform the underlying mathematical reasoning when they appear in a slightly different scenario. Drawing inspiration from GLUE that was proposed…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: ACL 2022

  20. arXiv:2110.14207  [pdf, other]

    cs.CL cs.AI

    How Much Coffee Was Consumed During EMNLP 2019? Fermi Problems: A New Reasoning Challenge for AI

    Authors: Ashwin Kalyan, Abhinav Kumar, Arjun Chandrasekaran, Ashish Sabharwal, Peter Clark

    Abstract: Many real-world problems require the combined application of multiple reasoning abilities employing suitable abstractions, commonsense knowledge, and creative synthesis of problem-solving strategies. To help advance AI systems towards such capabilities, we propose a new reasoning challenge, namely Fermi Problems (FPs), which are questions whose answers can only be approximately estimated because t…

    Submitted 20 December, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: Accepted for publication at EMNLP 2021, 11 pages, 5 tables, 4 figures

  21. arXiv:2106.14080  [pdf, other]

    cs.LG cs.AI stat.ML

    Model-Advantage and Value-Aware Models for Model-Based Reinforcement Learning: Bridging the Gap in Theory and Practice

    Authors: Nirbhay Modhe, Harish Kamath, Dhruv Batra, Ashwin Kalyan

    Abstract: This work shows that value-aware model learning, known for its numerous theoretical benefits, is also practically viable for solving challenging continuous control tasks in prevalent model-based reinforcement learning algorithms. First, we derive a novel value-aware model learning objective by bounding the model-advantage i.e. model performance difference, between two MDPs or models given a fixed…

    Submitted 28 January, 2022; v1 submitted 26 June, 2021; originally announced June 2021.

  22. arXiv:2106.05784  [pdf, other]

    cs.LG cs.AI cs.CL cs.PL cs.SE

    Programming Puzzles

    Authors: Tal Schuster, Ashwin Kalyan, Oleksandr Polozov, Adam Tauman Kalai

    Abstract: We introduce a new type of programming challenge called programming puzzles, as an objective and comprehensive evaluation of program synthesis, and release an open-source dataset of Python Programming Puzzles (P3). Each puzzle is defined by a short Python program $f$, and the goal is to find an input which makes $f$ return True. The puzzles are objective in that each one is specified entirely by t…

    Submitted 6 November, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 (Datasets and Benchmarks Track). Puzzles repository: https://github.com/microsoft/PythonProgrammingPuzzles

  23. arXiv:1806.02934  [pdf, other]

    stat.ML cs.CL cs.CV cs.LG

    Learn from Your Neighbor: Learning Multi-modal Mappings from Sparse Annotations

    Authors: Ashwin Kalyan, Stefan Lee, Anitha Kannan, Dhruv Batra

    Abstract: Many structured prediction problems (particularly in vision and language domains) are ambiguous, with multiple outputs being correct for an input - e.g. there are many ways of describing an image, multiple ways of translating a sentence; however, exhaustively annotating the applicability of all possible outputs is intractable due to exponentially large output spaces (e.g. all English sentences). I…

    Submitted 7 June, 2018; originally announced June 2018.

    Comments: To be presented at ICML 2018; 10 pages 5 figures

  24. arXiv:1804.01186  [pdf, ps, other]

    cs.AI cs.LG cs.PL

    Neural-Guided Deductive Search for Real-Time Program Synthesis from Examples

    Authors: Ashwin Kalyan, Abhishek Mohta, Oleksandr Polozov, Dhruv Batra, Prateek Jain, Sumit Gulwani

    Abstract: Synthesizing user-intended programs from a small number of input-output examples is a challenging problem with several important applications like spreadsheet manipulation, data wrangling and code refactoring. Existing synthesis systems either completely rely on deductive logic techniques that are extensively hand-engineered or on purely statistical models that need massive amounts of data, and in…

    Submitted 9 September, 2018; v1 submitted 3 April, 2018; originally announced April 2018.

    Comments: Published in ICLR 2018, International Conference on Learning Representations (2018)

  25. arXiv:1606.06424  [pdf]

    cs.IR cs.CL cs.LG

    A Novel Framework to Expedite Systematic Reviews by Automatically Building Information Extraction Training Corpora

    Authors: Tanmay Basu, Shraman Kumar, Abhishek Kalyan, Priyanka Jayaswal, Pawan Goyal, Stephen Pettifer, Siddhartha R. Jonnalagadda

    Abstract: A systematic review identifies and collates various clinical studies and compares data elements and results in order to provide an evidence based answer for a particular clinical question. The process is manual and involves lot of time. A tool to automate this process is lacking. The aim of this work is to develop a framework using natural language processing and machine learning to build informat…

    Submitted 21 June, 2016; originally announced June 2016.