Showing 1–21 of 21 results for author: Mishra, B D

Searching in archive cs.
  1. arXiv:2407.01725  [pdf, other]

    cs.CL cs.AI cs.LG

    DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

    Authors: Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Bhavana Dalvi Mishra, Abhijeetsingh Meena, Aryan Prakhar, Tirth Vora, Tushar Khot, Ashish Sabharwal, Peter Clark

    Abstract: Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets? To evaluate this question, we present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. The benchmark is designed to systemat…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Website: https://github.com/allenai/discoverybench

  2. arXiv:2406.06769  [pdf, other]

    cs.AI cs.CL

    DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents

    Authors: Peter Jansen, Marc-Alexandre Côté, Tushar Khot, Erin Bransom, Bhavana Dalvi Mishra, Bodhisattwa Prasad Majumder, Oyvind Tafjord, Peter Clark

    Abstract: Automated scientific discovery promises to accelerate progress across scientific domains. However, developing and evaluating an AI agent's capacity for end-to-end scientific reasoning is challenging as running real-world experiments is often prohibitively expensive or infeasible. In this work we introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's abil…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 9 pages, 4 figures. Preprint, under review

  3. arXiv:2402.14798  [pdf, other]

    cs.CL cs.AI

    Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic

    Authors: Nathaniel Weir, Kate Sanders, Orion Weller, Shreya Sharma, Dongwei Jiang, Zhengping Jiang, Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Jansen, Peter Clark, Benjamin Van Durme

    Abstract: Contemporary language models enable new opportunities for structured reasoning with text, such as the construction and evaluation of intuitive, proof-like textual entailment trees without relying on brittle formal logic. However, progress in this direction has been hampered by a long-standing lack of a clear protocol for determining what valid compositional entailment is. This absence causes noisy…

    Submitted 27 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  4. arXiv:2402.03244  [pdf, other]

    cs.LG cs.CL

    Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills

    Authors: Kolby Nottingham, Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Sameer Singh, Peter Clark, Roy Fox

    Abstract: Large language models (LLMs) have recently been used for sequential decision making in interactive environments. However, leveraging environment reward signals for continual LLM actor improvement is not straightforward. We propose Skill Set Optimization (SSO) for improving LLM actor performance through constructing and refining sets of transferable skills. SSO constructs skills by extracting commo…

    Submitted 22 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

  5. arXiv:2312.07527  [pdf, other]

    cs.CL cs.AI

    BaRDa: A Belief and Reasoning Dataset that Separates Factual Accuracy and Reasoning Ability

    Authors: Peter Clark, Bhavana Dalvi Mishra, Oyvind Tafjord

    Abstract: While there are numerous benchmarks comparing the performance of modern language models (LMs), end-task evaluations often conflate notions of *factual accuracy* ("truth") and *reasoning ability* ("rationality", or "honesty" in the sense of correctly reporting implications of beliefs). Our goal is a dataset that clearly distinguishes these two notions. Our approach is to leverage and extend a colle…

    Submitted 23 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Added note about how dataset sampling was performed

  6. arXiv:2310.10134  [pdf, other]

    cs.CL cs.AI cs.LG

    CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization

    Authors: Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Peter Jansen, Oyvind Tafjord, Niket Tandon, Li Zhang, Chris Callison-Burch, Peter Clark

    Abstract: Language agents have shown some ability to interact with an external environment, e.g., a virtual world such as ScienceWorld, to perform complex tasks, e.g., growing a plant, without the startup costs of reinforcement learning. However, despite their zero-shot capabilities, these agents to date do not continually improve over time beyond performance refinement on a specific task. Here we present C…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Project page: https://allenai.github.io/clin/

  7. arXiv:2212.10029  [pdf, other]

    cs.CL cs.AI

    Do language models have coherent mental models of everyday things?

    Authors: Yuling Gu, Bhavana Dalvi Mishra, Peter Clark

    Abstract: When people think of everyday things like an egg, they typically have a mental image associated with it. This allows them to correctly judge, for example, that "the yolk surrounds the shell" is a false statement. Do language models similarly have a coherent picture of such everyday things? To investigate this, we propose a benchmark dataset consisting of 100 everyday things, their parts, and the r…

    Submitted 8 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: ACL 2023

  8. arXiv:2210.16407  [pdf, other]

    cs.CL

    Just-DREAM-about-it: Figurative Language Understanding with DREAM-FLUTE

    Authors: Yuling Gu, Yao Fu, Valentina Pyatkin, Ian Magnusson, Bhavana Dalvi Mishra, Peter Clark

    Abstract: Figurative language (e.g., "he flew like the wind") is challenging to understand, as it is hard to tell what implicit information is being conveyed from the surface form alone. We hypothesize that to perform this task well, the reader needs to mentally elaborate the scene being described to identify a sensible meaning of the language. We present DREAM-FLUTE, a figurative language understanding sys…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: Accepted at The Third Workshop on Figurative Language Processing @ EMNLP 2022

  9. arXiv:2210.12217  [pdf, other]

    cs.AI cs.CL

    Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning

    Authors: Oyvind Tafjord, Bhavana Dalvi Mishra, Peter Clark

    Abstract: Our goal is a question-answering (QA) system that can show how its answers are implied by its own internal beliefs via a systematic chain of reasoning. Such a capability would allow better understanding of why a model produced the answer it did. Our approach is to recursively combine a trained backward-chaining model, capable of generating a set of premises entailing an answer hypothesis, with a v…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: accepted at EMNLP 2022. arXiv admin note: substantial text overlap with arXiv:2204.13074

  10. arXiv:2204.13074  [pdf, other]

    cs.CL cs.AI

    Towards Teachable Reasoning Systems: Using a Dynamic Memory of User Feedback for Continual System Improvement

    Authors: Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Clark

    Abstract: Our goal is a teachable reasoning system for question-answering (QA), where a user can interact with faithful answer explanations, and correct its errors so that the system improves over time. Our approach is to augment a QA model with a dynamic memory of user feedback, containing user-supplied corrections to erroneous model beliefs that users identify during interaction. Retrievals from memory ar…

    Submitted 21 October, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

    Comments: accepted at EMNLP 2022

  11. arXiv:2112.08656  [pdf, other]

    cs.CL cs.AI

    DREAM: Improving Situational QA by First Elaborating the Situation

    Authors: Yuling Gu, Bhavana Dalvi Mishra, Peter Clark

    Abstract: When people answer questions about a specific situation, e.g., "I cheated on my mid-term exam last week. Was that wrong?", cognitive science suggests that they form a mental picture of that situation before answering. While we do not know how language models (LMs) answer such questions, we conjecture that they may answer more accurately if they are also provided with additional details about the q…

    Submitted 5 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: to be published in NAACL 2022

  12. arXiv:2102.03315  [pdf, other]

    cs.CL cs.AI

    Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge

    Authors: Sumithra Bhakthavatsalam, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Peter Clark

    Abstract: We present the ARC-DA dataset, a direct-answer ("open response", "freeform") version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset. While ARC has been influential in the community, its multiple-choice format is unrepresentative of real-world questions, and multiple choice formats can be particularly susceptible to artifacts. The ARC-DA dataset addresses these concerns by converting…

    Submitted 5 February, 2021; originally announced February 2021.

  13. arXiv:2012.13048  [pdf, other]

    cs.CL cs.AI

    ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language

    Authors: Oyvind Tafjord, Bhavana Dalvi Mishra, Peter Clark

    Abstract: Transformers have been shown to emulate logical deduction over natural language theories (logical rules expressed in natural language), reliably assigning true/false labels to candidate implications. However, their ability to generate implications of a theory has not yet been demonstrated, and methods for reconstructing proofs of answers are imperfect. In this work we show that a generative model,…

    Submitted 3 June, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

    Comments: Findings of ACL 2021

  14. arXiv:2011.08092  [pdf, other]

    cs.CL

    A Dataset for Tracking Entities in Open Domain Procedural Text

    Authors: Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi Mishra, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, Eduard Hovy

    Abstract: We present the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. For example, in a text describing fog removal using potatoes, a car window may transition between being foggy, sticky, opaque, and clear. Previous formulations of this task provide the text and entities involved, and ask how those entities change for just a sm…

    Submitted 30 October, 2020; originally announced November 2020.

    Comments: To appear in EMNLP 2020

  15. arXiv:2003.13878  [pdf, other]

    cs.CL

    Procedural Reading Comprehension with Attribute-Aware Context Flow

    Authors: Aida Amini, Antoine Bosselut, Bhavana Dalvi Mishra, Yejin Choi, Hannaneh Hajishirzi

    Abstract: Procedural texts often describe processes (e.g., photosynthesis and cooking) that happen over entities (e.g., light, food). In this paper, we introduce an algorithm for procedural reading comprehension by translating the text into a general formalism that represents processes as a sequence of transitions over entity attributes (e.g., location, temperature). Leveraging pre-trained language models,…

    Submitted 30 March, 2020; originally announced March 2020.

  16. arXiv:1909.04745  [pdf, other]

    cs.CL cs.AI

    Everything Happens for a Reason: Discovering the Purpose of Actions in Procedural Text

    Authors: Bhavana Dalvi Mishra, Niket Tandon, Antoine Bosselut, Wen-tau Yih, Peter Clark

    Abstract: Our goal is to better comprehend procedural text, e.g., a paragraph about photosynthesis, by not only predicting what happens, but why some actions need to happen before others. Our approach builds on a prior process comprehension framework for predicting actions' effects, to also identify subsequent steps that those effects enable. We present our new model (XPAD) that biases effect predictions to…

    Submitted 18 September, 2019; v1 submitted 10 September, 2019; originally announced September 2019.

    Comments: Accepted to EMNLP 2019 as a long paper. This revision fixed a typo in an author name in references

  17. arXiv:1909.04739  [pdf, other]

    cs.CL cs.AI

    WIQA: A dataset for "What if..." reasoning over procedural text

    Authors: Niket Tandon, Bhavana Dalvi Mishra, Keisuke Sakaguchi, Antoine Bosselut, Peter Clark

    Abstract: We introduce WIQA, the first large-scale dataset of "What if..." questions over procedural text. WIQA contains three parts: a collection of paragraphs each describing a process, e.g., beach erosion; a set of crowdsourced influence graphs for each paragraph, describing how one change affects another; and a large (40k) collection of "What if...?" multiple-choice questions derived from the graphs. Fo…

    Submitted 10 September, 2019; originally announced September 2019.

    Comments: Accepted at EMNLP 2019

  18. arXiv:1909.01958  [pdf, other]

    cs.CL cs.AI

    From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project

    Authors: Peter Clark, Oren Etzioni, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Niket Tandon, Sumithra Bhakthavatsalam, Dirk Groeneveld, Michal Guerquin, Michael Schmitz

    Abstract: AI has achieved remarkable mastery over games such as Chess, Go, and Poker, and even Jeopardy, but the rich variety of standardized exams has remained a landmark challenge. Even in 2016, the best AI system achieved merely 59.3% on an 8th Grade science exam challenge. This paper reports unprecedented success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more…

    Submitted 1 February, 2021; v1 submitted 4 September, 2019; originally announced September 2019.

    Comments: AI Magazine 41 (4) Winter 2020. New analysis sections added

  19. arXiv:1906.08942  [pdf, other]

    cs.CL cs.LG

    Be Consistent! Improving Procedural Text Comprehension using Label Consistency

    Authors: Xinya Du, Bhavana Dalvi Mishra, Niket Tandon, Antoine Bosselut, Wen-tau Yih, Peter Clark, Claire Cardie

    Abstract: Our goal is procedural text comprehension, namely tracking how the properties of entities (e.g., their location) change with time given a procedural text (e.g., a paragraph about photosynthesis, a recipe). This task is challenging as the world is changing throughout the text, and despite recent advances, current systems still struggle with this task. Our approach is to leverage the fact that, for…

    Submitted 21 June, 2019; originally announced June 2019.

    Comments: NAACL 2019

  20. arXiv:1808.10012  [pdf, other]

    cs.AI

    Reasoning about Actions and State Changes by Injecting Commonsense Knowledge

    Authors: Niket Tandon, Bhavana Dalvi Mishra, Joel Grus, Wen-tau Yih, Antoine Bosselut, Peter Clark

    Abstract: Comprehending procedural text, e.g., a paragraph describing photosynthesis, requires modeling actions and the state changes they produce, so that questions about entities at different timepoints can be answered. Although several recent systems have shown impressive progress in this task, their predictions can be globally inconsistent or highly improbable. In this paper, we show how the predicted e…

    Submitted 29 August, 2018; originally announced August 2018.

    Comments: Accepted at EMNLP 2018. Niket Tandon and Bhavana Dalvi Mishra contributed equally to this work

  21. arXiv:1805.06975  [pdf, other]

    cs.CL

    Tracking State Changes in Procedural Text: A Challenge Dataset and Models for Process Paragraph Comprehension

    Authors: Bhavana Dalvi Mishra, Lifu Huang, Niket Tandon, Wen-tau Yih, Peter Clark

    Abstract: We present a new dataset and models for comprehending paragraphs about processes (e.g., photosynthesis), an important genre of text describing a dynamic world. The new dataset, ProPara, is the first to contain natural (rather than machine-generated) text about a changing world along with a full annotation of entity states (location and existence) during those changes (81k datapoints). The end-task…

    Submitted 17 May, 2018; originally announced May 2018.

    Comments: In Proc. NAACL'2018