
Showing 1–50 of 65 results for author: Gardner, M

  1. arXiv:2406.08375  [pdf, other]

    eess.SY cs.CE

    A Parameterized Nonlinear Magnetic Equivalent Circuit for Design and Fast Analysis of Radial Flux Magnetic Gears

    Authors: Danial Kazemikia, Matthew Gardner

    Abstract: Magnetic gears offer advantages over mechanical gears, including contactless power transfer, but require robust analysis tools for optimization and commercialization. This study proposes a rapid and accurate 2D nonlinear magnetic equivalent circuit (MEC) model for radial flux magnetic gears (RFMG). The model, featuring a parameterized gear geometry and adjustable flux tube distribution, accommodat… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  2. arXiv:2404.06962  [pdf, other]

    cs.LG cs.AI

    Advancing Real-time Pandemic Forecasting Using Large Language Models: A COVID-19 Case Study

    Authors: Hongru Du, Jianan Zhao, Yang Zhao, Shaochong Xu, Xihong Lin, Yiran Chen, Lauren M. Gardner, Hao Frank Yang

    Abstract: Forecasting the short-term spread of an ongoing disease outbreak is a formidable challenge due to the complexity of contributing factors, some of which can be characterized through interlinked, multi-modality variables such as epidemiological time series data, viral biology, population demographics, and the intersection of public policy and human behavior. Existing forecasting model frameworks str… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 35 pages, 10 figures

  3. arXiv:2305.14907  [pdf, other]

    cs.CL

    Coverage-based Example Selection for In-Context Learning

    Authors: Shivanshu Gupta, Matt Gardner, Sameer Singh

    Abstract: In-context learning (ICL), the ability of large language models to perform novel tasks by conditioning on a prompt with a few task examples, requires these examples to be informative about the test instance. The standard approach of independently ranking and selecting the most similar examples selects redundant examples while omitting important information. In this work, we show that BERTScore-Rec… ▽ More

    Submitted 6 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to EMNLP 2023 (Findings) Changelog: Added acknowledgments

  4. arXiv:2305.09572  [pdf, ps, other]

    cs.SE stat.CO

    UQpy v4.1: Uncertainty Quantification with Python

    Authors: Dimitrios Tsapetis, Michael D. Shields, Dimitris G. Giovanis, Audrey Olivier, Lukas Novak, Promit Chakroborty, Himanshu Sharma, Mohit Chauhan, Katiana Kontolati, Lohit Vandanapu, Dimitrios Loukrezis, Michael Gardner

    Abstract: This paper presents the latest improvements introduced in Version 4 of the UQpy, Uncertainty Quantification with Python, library. In the latest version, the code was restructured to conform with the latest Python coding conventions, refactored to simplify previous tightly coupled features, and improve its extensibility and modularity. To improve the robustness of UQpy, software engineering best pr… ▽ More

    Submitted 16 May, 2023; originally announced May 2023.

  5. arXiv:2212.04092  [pdf, other]

    cs.CL

    Successive Prompting for Decomposing Complex Questions

    Authors: Dheeru Dua, Shivanshu Gupta, Sameer Singh, Matt Gardner

    Abstract: Answering complex questions that require making latent decisions is a challenging task, especially when limited supervision is available. Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting by demonstrating how to output intermediate rationalizations while solving the complex question in a single pass. We introduce ``Suc… ▽ More

    Submitted 8 December, 2022; originally announced December 2022.

  6. arXiv:2211.00295  [pdf, other]

    cs.CL cs.AI

    CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation

    Authors: Abhilasha Ravichander, Matt Gardner, Ana Marasović

    Abstract: The full power of human language-based communication cannot be realized without negation. All human languages have some form of negation. Despite this, negation remains a challenging phenomenon for current natural language understanding systems. To facilitate the future development of models that can process negation effectively, we present CONDAQA, the first English reading comprehension dataset… ▽ More

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: EMNLP 2022

  7. arXiv:2205.08124  [pdf, other]

    cs.CL

    When to Use Multi-Task Learning vs Intermediate Fine-Tuning for Pre-Trained Encoder Transfer Learning

    Authors: Orion Weller, Kevin Seppi, Matt Gardner

    Abstract: Transfer learning (TL) in natural language processing (NLP) has seen a surge of interest in recent years, as pre-trained models have shown an impressive ability to transfer to novel tasks. Three main strategies have emerged for making use of multiple supervised datasets during fine-tuning: training on an intermediate task before training on the target task (STILTs), using multi-task learning (MTL)… ▽ More

    Submitted 17 May, 2022; originally announced May 2022.

    Comments: ACL 2022

  8. arXiv:2204.05991  [pdf, other]

    cs.CV cs.CL

    ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension

    Authors: Sanjay Subramanian, William Merrill, Trevor Darrell, Matt Gardner, Sameer Singh, Anna Rohrbach

    Abstract: Training a referring expression comprehension (ReC) model for a new visual domain requires collecting referring expressions, and potentially corresponding bounding boxes, for images in the domain. While large-scale pre-trained models are useful for image classification across domains, it remains unclear if they can be applied in a zero-shot manner to more complex tasks like ReC. We present ReCLIP,… ▽ More

    Submitted 2 May, 2022; v1 submitted 12 April, 2022; originally announced April 2022.

    Comments: ACL 2022

  9. arXiv:2203.12942  [pdf, other]

    cs.CL cs.AI cs.CY

    Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets

    Authors: Yuxiang Wu, Matt Gardner, Pontus Stenetorp, Pradeep Dasigi

    Abstract: Natural language processing models often exploit spurious correlations between task-independent features and labels in datasets to perform well only within the distributions they are trained on, while not generalising to different task distributions. We propose to tackle this problem by generating a debiased version of a dataset, which can then be used to train a debiased, off-the-shelf model, by… ▽ More

    Submitted 24 March, 2022; originally announced March 2022.

    Comments: Accepted to ACL 2022 main conference

  10. arXiv:2203.08445  [pdf, other]

    cs.CL

    Structurally Diverse Sampling for Sample-Efficient Training and Comprehensive Evaluation

    Authors: Shivanshu Gupta, Sameer Singh, Matt Gardner

    Abstract: A growing body of research has demonstrated the inability of NLP models to generalize compositionally and has tried to alleviate it through specialized architectures, training schemes, and data augmentation, among other approaches. In this work, we study a different approach: training on instances with diverse structures. We propose a model-agnostic algorithm for subsampling such sets of instances… ▽ More

    Submitted 1 November, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: Accepted at Findings of EMNLP 2022

  11. arXiv:2202.07206  [pdf, other]

    cs.CL cs.LG

    Impact of Pretraining Term Frequencies on Few-Shot Reasoning

    Authors: Yasaman Razeghi, Robert L. Logan IV, Matt Gardner, Sameer Singh

    Abstract: Pretrained Language Models (LMs) have demonstrated ability to perform numerical reasoning by extrapolating from a few examples in few-shot settings. However, the extent to which this extrapolation relies on robust reasoning is unclear. In this paper, we investigate how well these models reason with terms that are less frequent in the pretraining data. In particular, we examine the correlations bet… ▽ More

    Submitted 23 May, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

  12. arXiv:2112.08688  [pdf, other]

    cs.CL

    Evidentiality-guided Generation for Knowledge-Intensive NLP Tasks

    Authors: Akari Asai, Matt Gardner, Hannaneh Hajishirzi

    Abstract: Retrieval-augmented generation models have shown state-of-the-art performance across many knowledge-intensive NLP tasks such as open question answering and fact verification. These models are trained to generate the final output given the retrieved passages, which can be irrelevant to the original query, leading to learning spurious cues or answer memorization. This work introduces a method to inc… ▽ More

    Submitted 14 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Published as a conference paper at NAACL 2022 (long). Code available at https://github.com/AkariAsai/evidentiality_qa

  13. arXiv:2109.10613  [pdf, other]

    cs.CL

    COVR: A test-bed for Visually Grounded Compositional Generalization with real images

    Authors: Ben Bogin, Shivanshu Gupta, Matt Gardner, Jonathan Berant

    Abstract: While interest in models that generalize at test time to new compositions has risen in recent years, benchmarks in the visually-grounded domain have thus far been restricted to synthetic images. In this work, we propose COVR, a new test-bed for visually-grounded compositional generalization with real images. To create COVR, we use real images annotated with scene graphs, and propose an almost full… ▽ More

    Submitted 22 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021

  14. arXiv:2107.12708  [pdf, other]

    cs.CL cs.AI

    QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension

    Authors: Anna Rogers, Matt Gardner, Isabelle Augenstein

    Abstract: Alongside huge volumes of research on deep learning models in NLP in the recent years, there has been also much work on benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with over 80 new datasets appearing in the past two years. This study is the largest survey of the field to date. We provide an overv… ▽ More

    Submitted 19 September, 2022; v1 submitted 27 July, 2021; originally announced July 2021.

    Comments: Published in ACM Comput. Surv (2022). This version differs from the final version in that section 7 ("Languages") is not in the main paper rather than the supplementary materials

  15. arXiv:2107.07150  [pdf, other]

    cs.CL

    Tailor: Generating and Perturbing Text with Semantic Controls

    Authors: Alexis Ross, Tongshuang Wu, Hao Peng, Matthew E. Peters, Matt Gardner

    Abstract: Controlled text perturbation is useful for evaluating and improving model generalizability. However, current techniques rely on training a model for every target perturbation, which is expensive and hard to generalize. We present Tailor, a semantically-controlled text generation system. Tailor builds on a pretrained seq2seq model and produces textual outputs conditioned on control codes derived fr… ▽ More

    Submitted 17 March, 2022; v1 submitted 15 July, 2021; originally announced July 2021.

  16. arXiv:2107.05833  [pdf, other]

    cs.CL

    Enforcing Consistency in Weakly Supervised Semantic Parsing

    Authors: Nitish Gupta, Sameer Singh, Matt Gardner

    Abstract: The predominant challenge in weakly supervised semantic parsing is that of spurious programs that evaluate to correct answers for the wrong reasons. Prior work uses elaborate search strategies to mitigate the prevalence of spurious programs; however, they typically consider only one input at a time. In this work we explore the use of consistency between the output programs for related inputs to re… ▽ More

    Submitted 12 July, 2021; originally announced July 2021.

    Comments: Published in ACL 2021

  17. arXiv:2105.03011  [pdf, other]

    cs.CL

    A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers

    Authors: Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, Matt Gardner

    Abstract: Readers of academic research papers often read with the goal of answering specific questions. Question Answering systems that can answer those questions can make consumption of the content much more efficient. However, building such tools requires data that reflect the difficulty of the task arising from complex reasoning about claims made in multiple parts of a paper. In contrast, existing inform… ▽ More

    Submitted 6 May, 2021; originally announced May 2021.

    Comments: Accepted at NAACL 2021; Project page: https://allenai.org/project/qasper

  18. arXiv:2104.10034  [pdf, other]

    cs.CR

    On Generating and Labeling Network Traffic with Realistic, Self-Propagating Malware

    Authors: Molly Buchanan, Jeffrey W. Collyer, Jack W. Davidson, Saikat Dey, Mark Gardner, Jason D. Hiser, Jeffry Lang, Alastair Nottingham, Alina Oprea

    Abstract: Research and development of techniques which detect or remediate malicious network activity require access to diverse, realistic, contemporary data sets containing labeled malicious connections. In the absence of such data, said techniques cannot be meaningfully trained, tested, and evaluated. Synthetically produced data containing fabricated or merged network traffic is of limited value as it is… ▽ More

    Submitted 27 May, 2022; v1 submitted 20 April, 2021; originally announced April 2021.

    Comments: 4+2 pages, 3 figures, 1 table, for AI4CS-SDM21

  19. arXiv:2104.08758  [pdf, other]

    cs.CL cs.AI

    Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus

    Authors: Jesse Dodge, Maarten Sap, Ana Marasović, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Margaret Mitchell, Matt Gardner

    Abstract: Large language models have led to remarkable progress on many NLP tasks, and researchers are turning to ever-larger text corpora to train them. Some of the largest corpora available are made by scraping significant portions of the internet, and are frequently introduced with only minimal documentation. In this work we provide some of the first documentation for the Colossal Clean Crawled Corpus (C… ▽ More

    Submitted 30 September, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: EMNLP 2021 accepted paper camera ready version

  20. arXiv:2104.08744  [pdf, other]

    cs.CL

    Generative Context Pair Selection for Multi-hop Question Answering

    Authors: Dheeru Dua, Cicero Nogueira dos Santos, Patrick Ng, Ben Athiwaratkun, Bing Xiang, Matt Gardner, Sameer Singh

    Abstract: Compositional reasoning tasks like multi-hop question answering, require making latent decisions to get the final answer, given a question. However, crowdsourced datasets often capture only a slice of the underlying task distribution, which can induce unanticipated biases in models performing compositional reasoning. Furthermore, discriminatively trained models exploit such biases to get a better… ▽ More

    Submitted 18 April, 2021; originally announced April 2021.

  21. arXiv:2104.08735  [pdf, other]

    cs.CL

    Learning with Instance Bundles for Reading Comprehension

    Authors: Dheeru Dua, Pradeep Dasigi, Sameer Singh, Matt Gardner

    Abstract: When training most modern reading comprehension models, all the questions associated with a context are treated as being independent from each other. However, closely related questions and their corresponding answers are not independent, and leveraging these relationships could provide a strong supervision signal to a model. Drawing on ideas from contrastive estimation, we introduce several new su… ▽ More

    Submitted 18 April, 2021; originally announced April 2021.

  22. arXiv:2104.08646  [pdf, other]

    cs.CL

    Competency Problems: On Finding and Removing Artifacts in Language Data

    Authors: Matt Gardner, William Merrill, Jesse Dodge, Matthew E. Peters, Alexis Ross, Sameer Singh, Noah A. Smith

    Abstract: Much recent work in NLP has documented dataset artifacts, bias, and spurious correlations between input features and output labels. However, how to tell which features have "spurious" instead of legitimate correlations is typically left unspecified. In this work we argue that for complex language understanding tasks, all simple feature correlations are spurious, and we formalize this notion into a… ▽ More

    Submitted 28 December, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: EMNLP 2021. This version fixes an error in Proposition 1 and adds discussion (the EMNLP camera ready version is unfixed) (and v3 adds the acknowledgements that we forgot to put into v2)

  23. arXiv:2104.01759  [pdf, other]

    cs.CL

    Paired Examples as Indirect Supervision in Latent Decision Models

    Authors: Nitish Gupta, Sameer Singh, Matt Gardner, Dan Roth

    Abstract: Compositional, structured models are appealing because they explicitly decompose problems and provide interpretable intermediate outputs that give confidence that the model is not simply latching onto data artifacts. Learning these models is challenging, however, because end-task supervision only provides a weak indirect signal on what values the latent decisions should take. This often results in… ▽ More

    Submitted 4 April, 2021; originally announced April 2021.

  24. arXiv:2103.12235  [pdf, other]

    cs.CL

    Mitigating False-Negative Contexts in Multi-document Question Answering with Retrieval Marginalization

    Authors: Ansong Ni, Matt Gardner, Pradeep Dasigi

    Abstract: Question Answering (QA) tasks requiring information from multiple documents often rely on a retrieval model to identify relevant information for reasoning. The retrieval model is typically trained to maximize the likelihood of the labeled supporting evidence. However, when retrieving from large text corpora such as Wikipedia, the correct answer can often be obtained from multiple evidence candidat… ▽ More

    Submitted 8 September, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: Accepted to EMNLP 2021 (main conference)

  25. arXiv:2011.08115  [pdf, other]

    cs.CL

    Learning from Task Descriptions

    Authors: Orion Weller, Nicholas Lourie, Matt Gardner, Matthew E. Peters

    Abstract: Typically, machine learning systems solve new tasks by training on thousands of examples. In contrast, humans can solve new tasks by reading some instructions, with perhaps an example or two. To take a step toward closing this gap, we introduce a framework for developing NLP systems that solve new tasks after reading their descriptions, synthesizing prior work in this area. We instantiate this fra… ▽ More

    Submitted 16 November, 2020; originally announced November 2020.

    Comments: EMNLP 2020

  26. arXiv:2011.07127  [pdf, other]

    cs.CL

    IIRC: A Dataset of Incomplete Information Reading Comprehension Questions

    Authors: James Ferguson, Matt Gardner, Hannaneh Hajishirzi, Tushar Khot, Pradeep Dasigi

    Abstract: Humans often have to read multiple documents to address their information needs. However, most existing reading comprehension (RC) tasks only focus on questions for which the contexts provide all the information required to answer them, thus not evaluating a system's performance at identifying a potential lack of sufficient information and locating sources for that information. To fill this gap, w… ▽ More

    Submitted 13 November, 2020; originally announced November 2020.

    Comments: EMNLP 2020

  27. arXiv:2010.06694  [pdf, other]

    cs.HC

    Easy, Reproducible and Quality-Controlled Data Collection with Crowdaq

    Authors: Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan IV, Ana Marasovic, Zhen Nie

    Abstract: High-quality and large-scale data are key to success for AI systems. However, large-scale data annotation efforts are often confronted with a set of common challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators efficiently; and (3) reproducibility. To address these problems, we introduce Crowdaq, an open-source platform that standardizes the data collection… ▽ More

    Submitted 5 October, 2020; originally announced October 2020.

    Comments: Accepted to the demo track of EMNLP 2020

  28. arXiv:2010.06000  [pdf, other]

    cs.CV cs.CL

    MedICaT: A Dataset of Medical Images, Captions, and Textual References

    Authors: Sanjay Subramanian, Lucy Lu Wang, Sachin Mehta, Ben Bogin, Madeleine van Zuylen, Sravanthi Parasa, Sameer Singh, Matt Gardner, Hannaneh Hajishirzi

    Abstract: Understanding the relationship between figures and text is key to scientific document understanding. Medical figures in particular are quite complex, often consisting of several subfigures (75% of figures in our dataset), with detailed text describing their content. Previous work studying figures in scientific papers focused on classifying figure content rather than understanding how images relate… ▽ More

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: EMNLP-Findings 2020

  29. arXiv:2010.05647  [pdf, other]

    cs.CL

    Improving Compositional Generalization in Semantic Parsing

    Authors: Inbar Oren, Jonathan Herzig, Nitish Gupta, Matt Gardner, Jonathan Berant

    Abstract: Generalization of models to out-of-distribution (OOD) data has captured tremendous attention recently. Specifically, compositional generalization, i.e., whether a model generalizes to new structures built of components observed during training, has sparked substantial interest. In this work, we investigate compositional generalization in semantic parsing, a natural test-bed for compositional gener… ▽ More

    Submitted 12 October, 2020; originally announced October 2020.

  30. MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics

    Authors: Anthony Chen, Gabriel Stanovsky, Sameer Singh, Matt Gardner

    Abstract: Posing reading comprehension as a generation problem provides a great deal of flexibility, allowing for open-ended questions with few restrictions on possible answers. However, progress is impeded by existing generation metrics, which rely on token overlap and are agnostic to the nuances of reading comprehension. To address this, we introduce a benchmark for training and evaluating generative read… ▽ More

    Submitted 15 October, 2020; v1 submitted 7 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

  31. arXiv:2009.09363  [pdf, other]

    cs.CL

    Understanding Mention Detector-Linker Interaction in Neural Coreference Resolution

    Authors: Zhaofeng Wu, Matt Gardner

    Abstract: Despite significant recent progress in coreference resolution, the quality of current state-of-the-art systems still considerably trails behind human-level performance. Using the CoNLL-2012 and PreCo datasets, we dissect the best instantiation of the mainstream end-to-end coreference resolution model that underlies most current best-performing coreference systems, and empirically analyze the behav… ▽ More

    Submitted 8 September, 2021; v1 submitted 20 September, 2020; originally announced September 2020.

    Comments: CRAC @ EMNLP 2021

  32. arXiv:2007.00266  [pdf, other]

    cs.CL cs.AI cs.LG

    Latent Compositional Representations Improve Systematic Generalization in Grounded Question Answering

    Authors: Ben Bogin, Sanjay Subramanian, Matt Gardner, Jonathan Berant

    Abstract: Answering questions that involve multi-step reasoning requires decomposing them and using the answers of intermediate steps to reach the final answer. However, state-of-the-art models in grounded question answering often do not explicitly perform decomposition, leading to difficulties in generalization to out-of-distribution examples. In this work, we propose a model that computes a representation… ▽ More

    Submitted 10 November, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

    Comments: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2020. Author's final version

  33. arXiv:2005.00724  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Obtaining Faithful Interpretations from Compositional Neural Networks

    Authors: Sanjay Subramanian, Ben Bogin, Nitish Gupta, Tomer Wolfson, Sameer Singh, Jonathan Berant, Matt Gardner

    Abstract: Neural module networks (NMNs) are a popular approach for modeling compositionality: they achieve high accuracy when applied to problems in language and vision, while reflecting the compositional structure of the problem in the network architecture. However, prior work implicitly assumed that the structure of the network modules, describing the abstract reasoning process, provides a faithful explan… ▽ More

    Submitted 8 September, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: ACL 2020; first three authors contributed equally

  34. arXiv:2005.00242  [pdf, other]

    cs.CL

    TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions

    Authors: Qiang Ning, Hao Wu, Rujun Han, Nanyun Peng, Matt Gardner, Dan Roth

    Abstract: A critical part of reading is being able to understand the temporal relationships between events described in a passage of text, even when those relationships are not explicitly stated. However, current machine reading comprehension benchmarks have practically no questions that test temporal phenomena, so systems trained on these benchmarks have no capacity to answer questions such as "what happen… ▽ More

    Submitted 5 October, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

    Comments: 15 pages (incl. 4 pages in the appendix); accepted to EMNLP 2020

  35. Multi-Step Inference for Reasoning Over Paragraphs

    Authors: Jiangming Liu, Matt Gardner, Shay B. Cohen, Mirella Lapata

    Abstract: Complex reasoning over text requires understanding and chaining together free-form predicates and logical connectives. Prior work has largely tried to do this either symbolically or with black-box transformers. We present a middle ground between these two extremes: a compositional model reminiscent of neural module networks that can perform chained logical reasoning. This model first finds relevan… ▽ More

    Submitted 7 June, 2021; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: accepted by EMNLP 2020

  36. arXiv:2004.02709  [pdf, other]

    cs.CL

    Evaluating Models' Local Decision Boundaries via Contrast Sets

    Authors: Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, et al. (1 additional author not shown)

    Abstract: Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities. We propose a new annotation paradigm for NLP that helps to close systemati… ▽ More

    Submitted 1 October, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

  37. arXiv:2001.11770  [pdf, other]

    cs.CL

    Break It Down: A Question Understanding Benchmark

    Authors: Tomer Wolfson, Mor Geva, Ankit Gupta, Matt Gardner, Yoav Goldberg, Daniel Deutch, Jonathan Berant

    Abstract: Understanding natural language questions entails the ability to break down a question into the requisite steps for computing its answer. In this work, we introduce a Question Decomposition Meaning Representation (QDMR) for questions. QDMR constitutes the ordered list of steps, expressed through natural language, that are necessary for answering a question. We develop a crowdsourcing pipeline, show… ▽ More

    Submitted 31 January, 2020; originally announced January 2020.

    Comments: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2020. Author's final version

  38. arXiv:1912.12598  [pdf, ps, other]

    cs.CL

    ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension

    Authors: Dheeru Dua, Ananth Gottumukkala, Alon Talmor, Sameer Singh, Matt Gardner

    Abstract: Reading comprehension is one of the crucial tasks for furthering research in natural language understanding. A lot of diverse reading comprehension datasets have recently been introduced to study various phenomena in natural language, ranging from simple paraphrase matching and entity typing to entity tracking and understanding the implications of the context. Given the availability of many such d… ▽ More

    Submitted 29 December, 2019; originally announced December 2019.

  39. arXiv:1912.04971  [pdf, other]

    cs.CL

    Neural Module Networks for Reasoning over Text

    Authors: Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, Matt Gardner

    Abstract: Answering compositional questions that require multiple steps of reasoning against text is challenging, especially when they involve discrete, symbolic operations. Neural module networks (NMNs) learn to parse such questions as executable programs composed of learnable modules, performing well on synthetic visual QA domains. However, we find that it is challenging to learn these models for non-synt… ▽ More

    Submitted 15 February, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: Published in ICLR 2020 (International Conference on Learning Representations, 2020)

  40. arXiv:1910.08812  [pdf, other]

    cs.CV

    Deep Parametric Indoor Lighting Estimation

    Authors: Marc-André Gardner, Yannick Hold-Geoffroy, Kalyan Sunkavalli, Christian Gagné, Jean-François Lalonde

    Abstract: We present a method to estimate lighting from a single image of an indoor scene. Previous work has used an environment map representation that does not account for the localized nature of indoor lighting. Instead, we represent lighting as a set of discrete 3D lights with geometric and photometric parameters. We train a deep neural network to regress these parameters from a single image, on a datas… ▽ More

    Submitted 19 October, 2019; originally announced October 2019.

  41. arXiv:1909.11291  [pdf, ps, other]

    cs.CL

    Question Answering is a Format; When is it Useful?

    Authors: Matt Gardner, Jonathan Berant, Hannaneh Hajishirzi, Alon Talmor, Sewon Min

    Abstract: Recent years have seen a dramatic expansion of tasks and datasets posed as question answering, from reading comprehension, semantic role labeling, and even machine translation, to image and video understanding. With this expansion, there are many differing views on the utility and definition of "question answering" itself. Some argue that its scope should be narrow, or broad, or that it is overuse… ▽ More

    Submitted 25 September, 2019; originally announced September 2019.

  42. arXiv:1909.09251  [pdf, other]

    cs.CL cs.LG

    AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models

    Authors: Eric Wallace, Jens Tuyls, Junlin Wang, Sanjay Subramanian, Matt Gardner, Sameer Singh

    Abstract: Neural NLP models are increasingly accurate but are imperfect and opaque---they break in counterintuitive ways and leave end users puzzled at their behavior. Model interpretation methods ameliorate this opacity by providing explanations for specific model predictions. Unfortunately, existing interpretation codebases make it difficult to apply these methods to new models and tasks, which hinders ad…

    Submitted 19 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019 Demo

  43. arXiv:1909.07940  [pdf, other]

    cs.CL cs.LG

    Do NLP Models Know Numbers? Probing Numeracy in Embeddings

    Authors: Eric Wallace, Yizhong Wang, Sujian Li, Sameer Singh, Matt Gardner

    Abstract: The ability to understand and work with numbers (numeracy) is critical for many complex reasoning tasks. Currently, most NLP models treat numbers in text in the same way as other tokens---they embed them as distributed vectors. Is this enough to capture numeracy? We begin by investigating the numerical reasoning capabilities of a state-of-the-art question answering model on the DROP dataset. We fi…

    Submitted 18 September, 2019; v1 submitted 17 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019

  44. arXiv:1909.03553  [pdf, other]

    cs.CL cs.AI

    QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions

    Authors: Oyvind Tafjord, Matt Gardner, Kevin Lin, Peter Clark

    Abstract: We introduce the first open-domain dataset, called QuaRTz, for reasoning about textual qualitative relationships. QuaRTz contains general qualitative statements, e.g., "A sunscreen with a higher SPF protects the skin longer.", twinned with 3864 crowdsourced situated questions, e.g., "Billy is wearing sunscreen with a lower SPF than Lucy. Who will be best protected from the sun?", plus annotations…

    Submitted 8 September, 2019; originally announced September 2019.

    Comments: EMNLP'19

  45. arXiv:1908.11214  [pdf, other]

    cs.CL

    Global Reasoning over Database Structures for Text-to-SQL Parsing

    Authors: Ben Bogin, Matt Gardner, Jonathan Berant

    Abstract: State-of-the-art semantic parsers rely on auto-regressive decoding, emitting one symbol at a time. When tested against complex databases that are unobserved at training time (zero-shot), the parser often struggles to select the correct set of database constants in the new database, due to the local nature of decoding. In this work, we propose a semantic parser that globally reasons about the struc…

    Submitted 29 August, 2019; originally announced August 2019.

    Comments: EMNLP 2019

  46. arXiv:1908.07125  [pdf, other]

    cs.CL cs.LG

    Universal Adversarial Triggers for Attacking and Analyzing NLP

    Authors: Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh

    Abstract: Adversarial examples highlight model vulnerabilities and are useful for evaluation and interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens that trigger a model to produce a specific prediction when concatenated to any input from a dataset. We propose a gradient-guided search over tokens which finds short trigger sequences (e.g., one word for classification…

    Submitted 3 January, 2021; v1 submitted 19 August, 2019; originally announced August 2019.

    Comments: EMNLP 2019

  47. arXiv:1908.05852  [pdf, ps, other]

    cs.CL

    Reasoning Over Paragraph Effects in Situations

    Authors: Kevin Lin, Oyvind Tafjord, Peter Clark, Matt Gardner

    Abstract: A key component of successfully reading a passage of text is the ability to apply knowledge gained from the passage to a new situation. In order to facilitate progress on this kind of reading, we present ROPES, a challenging benchmark for reading comprehension targeting Reasoning Over Paragraph Effects in Situations. We target expository language describing causes and effects (e.g., "animal pollin…

    Submitted 15 December, 2019; v1 submitted 16 August, 2019; originally announced August 2019.

  48. arXiv:1908.05803  [pdf, other]

    cs.CL

    Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning

    Authors: Pradeep Dasigi, Nelson F. Liu, Ana Marasović, Noah A. Smith, Matt Gardner

    Abstract: Machine comprehension of texts longer than a single sentence often requires coreference resolution. However, most current reading comprehension benchmarks do not contain complex coreferential phenomena and hence fail to evaluate the ability of models to resolve coreference. We present a new crowdsourced dataset containing more than 24K span-selection questions that require resolving coreference am…

    Submitted 4 September, 2019; v1 submitted 15 August, 2019; originally announced August 2019.

    Comments: 8 pages including appendix; EMNLP 2019 accepted paper camera ready version

  49. arXiv:1906.07241  [pdf, other]

    cs.CL

    Barack's Wife Hillary: Using Knowledge-Graphs for Fact-Aware Language Modeling

    Authors: Robert L. Logan IV, Nelson F. Liu, Matthew E. Peters, Matt Gardner, Sameer Singh

    Abstract: Modeling human language requires the ability to not only generate fluent text but also encode factual knowledge. However, traditional language models are only capable of remembering facts seen at training time, and often have difficulty recalling them. To address this, we introduce the knowledge graph language model (KGLM), a neural language model with mechanisms for selecting and copying facts fr…

    Submitted 20 June, 2019; v1 submitted 17 June, 2019; originally announced June 2019.

  50. arXiv:1906.02900  [pdf, other]

    cs.CL cs.AI

    Compositional Questions Do Not Necessitate Multi-hop Reasoning

    Authors: Sewon Min, Eric Wallace, Sameer Singh, Matt Gardner, Hannaneh Hajishirzi, Luke Zettlemoyer

    Abstract: Multi-hop reading comprehension (RC) questions are challenging because they require reading and reasoning over multiple paragraphs. We argue that it can be difficult to construct large multi-hop RC datasets. For example, even highly compositional questions can be answered with a single hop if they target specific entity types, or the facts needed to answer them are redundant. Our analysis is cente…

    Submitted 7 June, 2019; originally announced June 2019.

    Comments: Published as a conference paper at ACL 2019 (short). Code available at https://github.com/shmsw25/single-hop-rc