Showing 1–50 of 53 results for author: Khot, T

  1. arXiv:2407.01725  [pdf, other]

    cs.CL cs.AI cs.LG

    DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

    Authors: Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Bhavana Dalvi Mishra, Abhijeetsingh Meena, Aryan Prakhar, Tirth Vora, Tushar Khot, Ashish Sabharwal, Peter Clark

    Abstract: Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets? To evaluate this question, we present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. The benchmark is designed to systemat…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Website: https://github.com/allenai/discoverybench

  2. arXiv:2406.06769  [pdf, other]

    cs.AI cs.CL

    DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents

    Authors: Peter Jansen, Marc-Alexandre Côté, Tushar Khot, Erin Bransom, Bhavana Dalvi Mishra, Bodhisattwa Prasad Majumder, Oyvind Tafjord, Peter Clark

    Abstract: Automated scientific discovery promises to accelerate progress across scientific domains. However, developing and evaluating an AI agent's capacity for end-to-end scientific reasoning is challenging as running real-world experiments is often prohibitively expensive or infeasible. In this work we introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's abil…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 9 pages, 4 figures. Preprint, under review

  3. arXiv:2406.06469  [pdf, other]

    cs.AI cs.CL cs.LG

    Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning

    Authors: Joongwon Kim, Bhargavi Paranjape, Tushar Khot, Hannaneh Hajishirzi

    Abstract: Language agents perform complex tasks by using tools to execute each step precisely. However, most existing agents are based on proprietary models or designed to target specific tasks, such as mathematics or multi-hop question answering. We introduce Husky, a holistic, open-source language agent that learns to reason over a unified action space to address a diverse set of complex tasks involving n…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 50 pages, 42 figures. Project webpage available at https://agent-husky.github.io/

  4. arXiv:2402.00838  [pdf, other]

    cs.CL

    OLMo: Accelerating the Science of Language Models

    Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, et al. (18 additional authors not shown)

    Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models…

    Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  5. arXiv:2311.05772  [pdf, other]

    cs.AI cs.CL cs.LG

    ADaPT: As-Needed Decomposition and Planning with Language Models

    Authors: Archiki Prasad, Alexander Koller, Mareike Hartmann, Peter Clark, Ashish Sabharwal, Mohit Bansal, Tushar Khot

    Abstract: Large Language Models (LLMs) are increasingly being used for interactive decision-making tasks requiring planning and adapting to the environment. Recent works employ LLMs-as-agents in broadly two ways: iteratively determining the next action (iterative executors) or generating plans and executing sub-tasks using LLMs (plan-and-execute). However, these methods struggle with task complexity, as the…

    Submitted 8 April, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: NAACL 2024 (findings) camera-ready. Project Page: https://allenai.github.io/adaptllm

  6. arXiv:2311.04892  [pdf, other]

    cs.CL

    Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs

    Authors: Shashank Gupta, Vaishnavi Shrivastava, Ameet Deshpande, Ashwin Kalyan, Peter Clark, Ashish Sabharwal, Tushar Khot

    Abstract: Recent works have showcased the ability of LLMs to embody diverse personas in their responses, exemplified by prompts like 'You are Yoda. Explain the Theory of Relativity.' While this ability allows personalization of LLMs and enables human behavior simulation, its effect on LLMs' capabilities remains unclear. To fill this gap, we present the first extensive study of the unintended side-effects of…

    Submitted 27 January, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: Project page: https://allenai.github.io/persona-bias. Paper to appear at ICLR 2024. Added results for other LLMs in v2 (similar findings)

  7. arXiv:2307.11694  [pdf, other]

    cs.AI cs.LG q-bio.BM q-bio.MN

    SynerGPT: In-Context Learning for Personalized Drug Synergy Prediction and Drug Design

    Authors: Carl Edwards, Aakanksha Naik, Tushar Khot, Martin Burke, Heng Ji, Tom Hope

    Abstract: Predicting synergistic drug combinations can help accelerate discovery of cancer treatments, particularly therapies personalized to a patient's specific tumor via biopsied cells. In this paper, we propose a novel setting and models for in-context drug synergy learning. We are given a small "personalized dataset" of 10-20 drug synergy relationships in the context of specific cancer cell targets. Ou…

    Submitted 24 October, 2023; v1 submitted 19 June, 2023; originally announced July 2023.

  8. arXiv:2306.04751  [pdf, other]

    cs.CL

    How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

    Authors: Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi

    Abstract: In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with state-of-the-art proprietary models, these claims are often accompanied by limited evaluation, making it difficult to compare models across the board and determine the utility of various resources. We provide a la…

    Submitted 30 October, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: 18 pages, 6 figures, 10 tables. NeurIPS 2023 Datasets and Benchmarks Track Camera Ready

  9. arXiv:2305.17306  [pdf, other]

    cs.CL cs.AI cs.LG

    Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance

    Authors: Yao Fu, Litu Ou, Mingyu Chen, Yuhao Wan, Hao Peng, Tushar Khot

    Abstract: As large language models (LLMs) are continuously being developed, their evaluation becomes increasingly important yet challenging. This work proposes Chain-of-Thought Hub, an open-source evaluation suite on the multi-step reasoning capabilities of large language models. We are interested in this setting for two reasons: (1) from the behavior of GPT and PaLM model family, we observe that complex re…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Preprint. Code at https://github.com/FranxYao/chain-of-thought-hub

  10. arXiv:2305.10142  [pdf, other]

    cs.CL

    Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback

    Authors: Yao Fu, Hao Peng, Tushar Khot, Mirella Lapata

    Abstract: We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing. We are interested in this question because if LLMs were able to improve each other, it would imply the possibility of creating strong AI agents with minimal human intervention. We ask two LLMs to negotiate with each other, playing the roles of a…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Preprint. Code at https://github.com/FranxYao/GPT-Bargaining

  11. arXiv:2301.12726  [pdf, other]

    cs.CL cs.AI cs.LG

    Specializing Smaller Language Models towards Multi-Step Reasoning

    Authors: Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, Tushar Khot

    Abstract: The surprising ability of Large Language Models (LLMs) to perform well on complex reasoning with only few-shot chain-of-thought prompts is believed to emerge only in very large-scale models (100+ billion parameters). We show that such abilities can, in fact, be distilled down from GPT-3.5 ($\ge$ 175B) to T5 variants ($\le$ 11B). We propose model specialization, to specialize the model's ability to…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: Preprint

  12. arXiv:2212.10509  [pdf, other]

    cs.CL

    Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

    Authors: Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal

    Abstract: Prompting-based large language models (LLMs) are surprisingly powerful at generating natural language reasoning steps or Chains-of-Thoughts (CoT) for multi-step question answering (QA). They struggle, however, when the necessary knowledge is either unavailable to the LLM or not up-to-date within its parameters. While using the question to retrieve relevant text from an external knowledge source he…

    Submitted 22 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: ACL'23 Camera Ready

  13. arXiv:2211.12604  [pdf, other]

    cs.CV cs.LG eess.IV

    SuperTran: Reference Based Video Transformer for Enhancing Low Bitrate Streams in Real Time

    Authors: Tejas Khot, Nataliya Shapovalova, Silviu Andrei, Walterio Mayol-Cuevas

    Abstract: This work focuses on low bitrate video streaming scenarios (e.g. 50 - 200Kbps) where the video quality is severely compromised. We present a family of novel deep generative models for enhancing perceptual video quality of such streams by performing super-resolution while also removing compression artifacts. Our model, which we call SuperTran, consumes as input a single high-quality, high-resolutio…

    Submitted 22 November, 2022; originally announced November 2022.

    Comments: 4 pages

  14. arXiv:2210.10040  [pdf, other]

    cs.CL cs.CY cs.LG cs.SI

    The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks

    Authors: Nikil Roashan Selvam, Sunipa Dev, Daniel Khashabi, Tushar Khot, Kai-Wei Chang

    Abstract: How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given language model? In this work, we study this question by contrasting social biases with non-social biases stemming from choices made during dataset construction that might not even be discernible to the human eye. To do so, we empirically simulate various alternat…

    Submitted 16 June, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: ACL 2023

  15. arXiv:2210.02406  [pdf, other]

    cs.CL

    Decomposed Prompting: A Modular Approach for Solving Complex Tasks

    Authors: Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal

    Abstract: Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn, especially when embedded in more complex tasks. To address this, we propose Decomposed Prompting, a new approach to solve complex tasks by deco…

    Submitted 11 April, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: ICLR'23 Camera Ready

  16. arXiv:2210.00720  [pdf, other]

    cs.CL cs.AI cs.LG

    Complexity-Based Prompting for Multi-Step Reasoning

    Authors: Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot

    Abstract: We study the task of prompting large-scale language models to perform multi-step reasoning. Existing work shows that when prompted with a chain of thoughts (CoT), sequences of short sentences describing intermediate reasoning steps towards a final answer, large language models can generate new reasoning chains and predict answers for new inputs. A central question is which reasoning examples make…

    Submitted 30 January, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: Preprint

  17. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  18. arXiv:2205.12496  [pdf, other]

    cs.CL cs.AI

    Teaching Broad Reasoning Skills for Multi-Step QA by Generating Hard Contexts

    Authors: Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal

    Abstract: Question-answering datasets require a broad set of reasoning skills. We show how to use question decompositions to teach language models these broad reasoning skills in a robust fashion. Specifically, we use widely available QDMR representations to programmatically create hard-to-cheat synthetic contexts for real questions in six multi-step reasoning datasets. These contexts are carefully designed…

    Submitted 3 November, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: Accepted at EMNLP'22

  19. arXiv:2205.03685  [pdf, other]

    cs.CL

    Better Retrieval May Not Lead to Better Question Answering

    Authors: Zhengzhong Liang, Tushar Khot, Steven Bethard, Mihai Surdeanu, Ashish Sabharwal

    Abstract: Considerable progress has been made recently in open-domain question answering (QA) problems, which require Information Retrieval (IR) and Reading Comprehension (RC). A popular approach to improve the system's performance is to improve the quality of the retrieved context from the IR stage. In this work we show that for StrategyQA, a challenging open-domain QA dataset that requires multi-hop reaso…

    Submitted 7 May, 2022; originally announced May 2022.

    Comments: 10 pages

  20. arXiv:2112.08348  [pdf, other]

    cs.CL

    Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts

    Authors: Daniel Khashabi, Shane Lyu, Sewon Min, Lianhui Qin, Kyle Richardson, Sean Welleck, Hannaneh Hajishirzi, Tushar Khot, Ashish Sabharwal, Sameer Singh, Yejin Choi

    Abstract: Fine-tuning continuous prompts for target tasks has recently emerged as a compact alternative to full model fine-tuning. Motivated by these promising results, we investigate the feasibility of extracting a discrete (textual) interpretation of continuous prompts that is faithful to the problem they solve. In practice, we observe a "wayward" behavior between the task solved by continuous prompts and…

    Submitted 4 May, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: NAACL 2022

  21. arXiv:2110.08542  [pdf, other]

    cs.CL

    Hey AI, Can You Solve Complex Tasks by Talking to Agents?

    Authors: Tushar Khot, Kyle Richardson, Daniel Khashabi, Ashish Sabharwal

    Abstract: Training giant models from scratch for each complex task is resource- and data-inefficient. To help develop models that can leverage existing systems, we propose a new challenge: Learning to solve complex tasks by communicating with existing agents (or models) in natural language. We design a synthetic benchmark, CommaQA, with three complex reasoning tasks (explicit, implicit, numeric) designed to…

    Submitted 9 May, 2022; v1 submitted 16 October, 2021; originally announced October 2021.

    Comments: Accepted to Findings of ACL 2022

  22. arXiv:2108.00573  [pdf, other]

    cs.CL cs.AI

    MuSiQue: Multihop Questions via Single-hop Question Composition

    Authors: Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal

    Abstract: Multihop reasoning remains an elusive goal as existing multihop benchmarks are known to be largely solvable via shortcuts. Can we create a question answering (QA) dataset that, by construction, \emph{requires} proper multihop reasoning? To this end, we introduce a bottom-up approach that systematically selects composable pairs of single-hop questions that are connected, i.e., where one reasoning s…

    Submitted 5 May, 2022; v1 submitted 1 August, 2021; originally announced August 2021.

    Comments: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2022

  23. arXiv:2106.01465  [pdf, other]

    cs.CL cs.AI cs.LG

    Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?

    Authors: Jieyu Zhao, Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Kai-Wei Chang

    Abstract: Is it possible to use natural language to intervene in a model's behavior and alter its prediction in a desired way? We investigate the effectiveness of natural language interventions for reading-comprehension systems, studying this in the context of social stereotypes. Specifically, we propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: 9 pages, Findings of ACL-IJCNLP 2021

  24. arXiv:2104.08727  [pdf, other]

    cs.CL cs.AI

    GooAQ: Open Question Answering with Diverse Answer Types

    Authors: Daniel Khashabi, Amos Ng, Tushar Khot, Ashish Sabharwal, Hannaneh Hajishirzi, Chris Callison-Burch

    Abstract: While day-to-day questions come with a variety of answer types, the current question-answering (QA) literature has failed to adequately address the answer diversity of questions. To this end, we present GooAQ, a large-scale dataset with a variety of answer types. This dataset contains over 5 million questions and 3 million answers collected from Google. GooAQ questions are collected semi-automatic…

    Submitted 10 September, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: EMNLP-Findings 2021

  25. arXiv:2102.03315  [pdf, other]

    cs.CL cs.AI

    Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge

    Authors: Sumithra Bhakthavatsalam, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Peter Clark

    Abstract: We present the ARC-DA dataset, a direct-answer ("open response", "freeform") version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset. While ARC has been influential in the community, its multiple-choice format is unrepresentative of real-world questions, and multiple choice formats can be particularly susceptible to artifacts. The ARC-DA dataset addresses these concerns by converting…

    Submitted 5 February, 2021; originally announced February 2021.

  26. arXiv:2101.02235  [pdf, other]

    cs.CL

    Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies

    Authors: Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, Jonathan Berant

    Abstract: A key limitation in current datasets for multi-hop reasoning is that the required steps for answering the question are mentioned in it explicitly. In this work, we introduce StrategyQA, a question answering (QA) benchmark where the required reasoning steps are implicit in the question, and should be inferred using a strategy. A fundamental challenge in this setup is how to elicit such creative que…

    Submitted 6 January, 2021; originally announced January 2021.

    Comments: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2021. Author's final version

  27. arXiv:2011.07127  [pdf, other]

    cs.CL

    IIRC: A Dataset of Incomplete Information Reading Comprehension Questions

    Authors: James Ferguson, Matt Gardner, Hannaneh Hajishirzi, Tushar Khot, Pradeep Dasigi

    Abstract: Humans often have to read multiple documents to address their information needs. However, most existing reading comprehension (RC) tasks only focus on questions for which the contexts provide all the information required to answer them, thus not evaluating a system's performance at identifying a potential lack of sufficient information and locating sources for that information. To fill this gap, w…

    Submitted 13 November, 2020; originally announced November 2020.

    Comments: EMNLP 2020

  28. arXiv:2010.12854  [pdf, other]

    cs.CL cs.AI

    ReadOnce Transformers: Reusable Representations of Text for Transformers

    Authors: Shih-Ting Lin, Ashish Sabharwal, Tushar Khot

    Abstract: We present ReadOnce Transformers, an approach to convert a transformer-based model into one that can build an information-capturing, task-independent, and compressed representation of text. The resulting representation is reusable across different examples and tasks, thereby requiring a document shared across many examples or tasks to only be \emph{read once}. This leads to faster training and eva…

    Submitted 3 August, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

    Comments: Accepted to ACL 2021 (Camera Ready)

  29. arXiv:2010.12753  [pdf, other]

    cs.CL

    Temporal Reasoning on Implicit Events from Distant Supervision

    Authors: Ben Zhou, Kyle Richardson, Qiang Ning, Tushar Khot, Ashish Sabharwal, Dan Roth

    Abstract: We propose TRACIE, a novel temporal reasoning dataset that evaluates the degree to which systems understand implicit events -- events that are not mentioned explicitly in natural language text but can be inferred from it. This introduces a new challenge in temporal reasoning research, where prior work has focused on explicitly mentioned events. Human readers can infer implicit events via commonsen…

    Submitted 7 May, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Accepted at NAACL 2021

  30. arXiv:2010.02428  [pdf, other]

    cs.CL

    UnQovering Stereotyping Biases via Underspecified Questions

    Authors: Tao Li, Tushar Khot, Daniel Khashabi, Ashish Sabharwal, Vivek Srikumar

    Abstract: While language embeddings have been shown to have stereotyping biases, how these biases affect downstream question answering (QA) models remains unexplored. We present UNQOVER, a general framework to probe and quantify biases through underspecified questions. We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors: positional dependence an…

    Submitted 9 October, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: Accepted at Findings of EMNLP 2020

  31. arXiv:2009.00751  [pdf, other]

    cs.CL cs.AI

    Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models

    Authors: Tushar Khot, Daniel Khashabi, Kyle Richardson, Peter Clark, Ashish Sabharwal

    Abstract: We propose a general framework called Text Modular Networks (TMNs) for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models. To ensure solvability of simpler tasks, TMNs learn the textual input-output behavior (i.e., language) of existing models through their datasets. This differs from prior decomposition-based approache…

    Submitted 12 April, 2021; v1 submitted 1 September, 2020; originally announced September 2020.

    Comments: Accepted to NAACL 2021

  32. arXiv:2005.00789  [pdf, other]

    cs.CL cs.AI cs.LG

    Is Multihop QA in DiRe Condition? Measuring and Reducing Disconnected Reasoning

    Authors: Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal

    Abstract: Has there been real progress in multi-hop question-answering? Models often exploit dataset artifacts to produce correct answers, without connecting information across multiple supporting facts. This limits our ability to measure true progress and defeats the purpose of building multi-hop QA datasets. We make three contributions towards addressing this. First, we formalize such undesirable behavior…

    Submitted 16 November, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: Accepted at EMNLP'20

  33. arXiv:2005.00700  [pdf, other]

    cs.CL cs.AI

    UnifiedQA: Crossing Format Boundaries With a Single QA System

    Authors: Daniel Khashabi, Sewon Min, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, Hannaneh Hajishirzi

    Abstract: Question answering (QA) tasks have been posed using a variety of formats, such as extractive span selection, multiple choice, etc. This has led to format-specialized models, and even to an implicit division in the QA community. We argue that such boundaries are artificial and perhaps unnecessary, given the reasoning abilities we seek to teach are not governed by the format. As evidence, we use the…

    Submitted 6 October, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: EMNLP 2020 (Findings)

  34. arXiv:2004.06753  [pdf, other]

    cs.CL

    A Simple Yet Strong Pipeline for HotpotQA

    Authors: Dirk Groeneveld, Tushar Khot, Mausam, Ashish Sabharwal

    Abstract: State-of-the-art models for multi-hop question answering typically augment large-scale language models like BERT with additional, intuitively useful capabilities such as named entity recognition, graph-based reasoning, and question decomposition. However, does their strong performance on popular multi-hop datasets really justify this added design complexity? Our results suggest that the answer may…

    Submitted 14 April, 2020; originally announced April 2020.

  35. arXiv:2004.04849  [pdf, other]

    cs.CL cs.AI cs.LG

    More Bang for Your Buck: Natural Perturbation for Robust Question Answering

    Authors: Daniel Khashabi, Tushar Khot, Ashish Sabharwal

    Abstract: While recent models have achieved human-level scores on many NLP datasets, we observe that they are considerably sensitive to small changes in input. As an alternative to the standard approach of addressing this issue by constructing training sets of completely new examples, we propose doing so via minimal perturbation of examples. Specifically, our approach involves first collecting a set of seed…

    Submitted 6 October, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

    Comments: EMNLP 2020

  36. arXiv:1910.11473  [pdf, other]

    cs.CL

    QASC: A Dataset for Question Answering via Sentence Composition

    Authors: Tushar Khot, Peter Clark, Michal Guerquin, Peter Jansen, Ashish Sabharwal

    Abstract: Composing knowledge from multiple pieces of texts is a key challenge in multi-hop question answering. We present a multi-hop reasoning dataset, Question Answering via Sentence Composition (QASC), that requires retrieving facts from a large corpus and composing them to answer a multiple-choice question. QASC is the first dataset to offer two desirable properties: (a) the facts to be composed are ann…

    Submitted 4 February, 2020; v1 submitted 24 October, 2019; originally announced October 2019.

    Comments: AAAI-20 Camera Ready Version

  37. arXiv:1909.09253  [pdf, other]

    cs.CL

    What's Missing: A Knowledge Gap Guided Approach for Multi-hop Question Answering

    Authors: Tushar Khot, Ashish Sabharwal, Peter Clark

    Abstract: Multi-hop textual question answering requires combining information from multiple sentences. We focus on a natural setting where, unlike typical reading comprehension, only partial information is provided with each question. The model must retrieve and use additional knowledge to correctly answer the question. To tackle this challenge, we develop a novel approach that explicitly identifies the kno…

    Submitted 19 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019

  38. arXiv:1909.01958  [pdf, other]

    cs.CL cs.AI

    From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project

    Authors: Peter Clark, Oren Etzioni, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Niket Tandon, Sumithra Bhakthavatsalam, Dirk Groeneveld, Michal Guerquin, Michael Schmitz

    Abstract: AI has achieved remarkable mastery over games such as Chess, Go, and Poker, and even Jeopardy, but the rich variety of standardized exams has remained a landmark challenge. Even in 2016, the best AI system achieved merely 59.3% on an 8th Grade science exam challenge. This paper reports unprecedented success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more…

    Submitted 1 February, 2021; v1 submitted 4 September, 2019; originally announced September 2019.

    Comments: AI Magazine 41 (4) Winter 2020. New analysis sections added

  39. arXiv:1906.03672  [pdf, other]

    cs.CL cs.AI

    Question Answering as Global Reasoning over Semantic Abstractions

    Authors: Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Dan Roth

    Abstract: We propose a novel method for exploiting the semantic structure of text to answer multiple-choice questions. The approach is especially suitable for domains that require reasoning over a diverse set of linguistic constructs but have limited training data. To address these challenges, we present the first system, to the best of our knowledge, that reasons over a wide range of semantic abstractions…

    Submitted 9 June, 2019; originally announced June 2019.

    Comments: Appeared in AAAI'18

  40. arXiv:1905.02706  [pdf, other]

    cs.CV cs.LG

    Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency

    Authors: Tejas Khot, Shubham Agrawal, Shubham Tulsiani, Christoph Mertz, Simon Lucey, Martial Hebert

    Abstract: We present a learning based approach for multi-view stereopsis (MVS). While current deep MVS methods achieve impressive results, they crucially rely on ground-truth 3D training data, and acquisition of such precise 3D geometry for supervision is a major hurdle. Our framework instead leverages photometric consistency between multiple views as supervisory signal for learning depth prediction in a wi…

    Submitted 6 June, 2019; v1 submitted 7 May, 2019; originally announced May 2019.

  41. arXiv:1904.09380  [pdf, other]

    cs.CL cs.AI cs.LG

    Repurposing Entailment for Multi-Hop Question Answering Tasks

    Authors: Harsh Trivedi, Heeyoung Kwon, Tushar Khot, Ashish Sabharwal, Niranjan Balasubramanian

    Abstract: Question Answering (QA) naturally reduces to an entailment problem, namely, verifying whether some text entails the answer to a question. However, for multi-hop QA tasks, which require reasoning with multiple sentences, it remains unclear how best to utilize entailment models pre-trained on large scale datasets such as SNLI, which are based on sentence pairs. We introduce Multee, a general archite…

    Submitted 19 April, 2019; originally announced April 2019.

    Comments: Accepted at NAACL'19

  42. arXiv:1901.02522  [pdf, other]

    cs.CL cs.AI

    On the Possibilities and Limitations of Multi-hop Reasoning Under Linguistic Imperfections

    Authors: Daniel Khashabi, Erfan Sadeqi Azer, Tushar Khot, Ashish Sabharwal, Dan Roth

    Abstract: Systems for language understanding have become remarkably strong at overcoming linguistic imperfections in tasks involving phrase matching or simple reasoning. Yet, their accuracy drops dramatically as the number of reasoning steps increases. We present the first formal framework to study such empirical observations. It allows one to quantify the amount and effect of ambiguity, redundancy, incompl…

    Submitted 1 May, 2020; v1 submitted 8 January, 2019; originally announced January 2019.

  43. arXiv:1811.01127  [pdf, other]

    cs.CL cs.AI

    Exploiting Explicit Paths for Multi-hop Reading Comprehension

    Authors: Souvik Kundu, Tushar Khot, Ashish Sabharwal, Peter Clark

    Abstract: We propose a novel, path-based reasoning approach for the multi-hop reading comprehension task where a system needs to combine facts from multiple passages to answer a question. Although inspired by multi-hop reasoning over knowledge graphs, our proposed approach operates directly over unstructured text. It generates potential paths through passages and scores them without any direct path supervis…

    Submitted 8 July, 2019; v1 submitted 2 November, 2018; originally announced November 2018.

    Journal ref: ACL 2019

  44. arXiv:1809.02789  [pdf, other]

    cs.CL

    Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering

    Authors: Todor Mihaylov, Peter Clark, Tushar Khot, Ashish Sabharwal

    Abstract: We present a new kind of question answering dataset, OpenBookQA, modeled after open book exams for assessing human understanding of a subject. The open book that comes with our questions is a set of 1329 elementary level science facts. Roughly 6000 questions probe an understanding of these facts and their application to novel situations. This requires combining an open book fact (e.g., metals cond…

    Submitted 8 September, 2018; originally announced September 2018.

    Comments: Published as conference long paper at EMNLP 2018

  45. arXiv:1808.09333  [pdf, other]

    cs.CL cs.AI

    Bridging Knowledge Gaps in Neural Entailment via Symbolic Models

    Authors: Dongyeop Kang, Tushar Khot, Ashish Sabharwal, Peter Clark

    Abstract: Most textual entailment models focus on lexical gaps between the premise text and the hypothesis, but rarely on knowledge gaps. We focus on filling these knowledge gaps in the Science Entailment task, by leveraging an external structured knowledge base (KB) of science facts. Our new architecture combines standard neural entailment models with a knowledge lookup module. To facilitate this lookup, w…

    Submitted 4 September, 2018; v1 submitted 28 August, 2018; originally announced August 2018.

    Comments: EMNLP 2018

  46. arXiv:1808.02123  [pdf, other]

    cs.LG cs.AI stat.ML

    Structure Learning for Relational Logistic Regression: An Ensemble Approach

    Authors: Nandini Ramanan, Gautam Kunapuli, Tushar Khot, Bahare Fatemi, Seyed Mehran Kazemi, David Poole, Kristian Kersting, Sriraam Natarajan

    Abstract: We consider the problem of learning Relational Logistic Regression (RLR). Unlike standard logistic regression, the features of RLRs are first-order formulae with associated weight vectors instead of scalar weights. We turn the problem of learning RLR to learning these vector-weighted formulae and develop a learning algorithm based on the recently successful functional-gradient boosting methods for…

    Submitted 6 August, 2018; originally announced August 2018.

  47. arXiv:1808.00671  [pdf, other]

    cs.CV cs.RO

    PCN: Point Completion Network

    Authors: Wentao Yuan, Tejas Khot, David Held, Christoph Mertz, Martial Hebert

    Abstract: Shape completion, the problem of estimating the complete geometry of objects from partial observations, lies at the core of many vision and robotics applications. In this work, we propose Point Completion Network (PCN), a novel learning-based approach for shape completion. Unlike existing shape completion methods, PCN directly operates on raw point clouds without any structural assumption (e.g. sy…

    Submitted 26 September, 2019; v1 submitted 2 August, 2018; originally announced August 2018.

    Comments: 3DV 2018 oral. Honorable mention for Best Paper award

  48. arXiv:1805.04680  [pdf, other]

    cs.CL cs.AI cs.LG

    AdvEntuRe: Adversarial Training for Textual Entailment with Knowledge-Guided Examples

    Authors: Dongyeop Kang, Tushar Khot, Ashish Sabharwal, Eduard Hovy

    Abstract: We consider the problem of learning textual entailment models with limited supervision (5K-10K training examples), and present two complementary approaches for it. First, we propose knowledge-guided adversarial example generators for incorporating large lexical resources in entailment models via only a handful of rule templates. Second, to make the entailment model - a discriminator - more robust,…

    Submitted 12 May, 2018; originally announced May 2018.

    Comments: ACL 2018

  49. arXiv:1803.05457  [pdf, other]

    cs.AI cs.CL cs.IR

    Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

    Authors: Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord

    Abstract: We present a new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering. Together, these constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful knowledge and reasoning than previous challenges such as SQuAD or SNLI. The ARC question set is partitioned into a Challenge Set and an Easy Set, where the Challenge Set contains o…

    Submitted 14 March, 2018; originally announced March 2018.

    Comments: 10 pages, 7 tables, 2 figures

  50. arXiv:1704.05572  [pdf, other]

    cs.AI cs.CL

    Answering Complex Questions Using Open Information Extraction

    Authors: Tushar Khot, Ashish Sabharwal, Peter Clark

    Abstract: While there has been substantial progress in factoid question-answering (QA), answering complex questions remains challenging, typically requiring both a large body of knowledge and inference techniques. Open Information Extraction (Open IE) provides a way to generate semi-structured knowledge for QA, but to date such knowledge has only been used to answer simple questions with retrieval-based met…

    Submitted 18 April, 2017; originally announced April 2017.

    Comments: Accepted as short paper at ACL 2017