Showing 1–47 of 47 results for author: Stanovsky, G

Searching in archive cs.
  1. arXiv:2406.16086

    cs.CL

    SEAM: A Stochastic Benchmark for Multi-Document Tasks

    Authors: Gili Lior, Avi Caciularu, Arie Cattan, Shahar Levy, Ori Shapira, Gabriel Stanovsky

    Abstract: Various tasks, such as summarization, multi-hop question answering, or coreference resolution, are naturally phrased over collections of real-world documents. Such tasks present a unique set of challenges, revolving around the lack of coherent narrative structure across documents, which often leads to contradiction, omission, or repetition of information. Despite their real-world application and c…

    Submitted 23 June, 2024; originally announced June 2024.

  2. arXiv:2406.13274

    cs.CL

    In-Context Learning on a Budget: A Case Study in Named Entity Recognition

    Authors: Uri Berger, Tal Baumel, Gabriel Stanovsky

    Abstract: Few shot in-context learning (ICL) typically assumes access to large annotated training sets. However, in many real world scenarios, such as domain adaptation, there is only a limited budget to annotate a small number of samples, with the goal of maximizing downstream performance. We study various methods for selecting samples to annotate within a predefined budget, specifically focusing on the na…

    Submitted 19 June, 2024; originally announced June 2024.

  3. arXiv:2406.00787

    cs.CL

    Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation

    Authors: Bar Iluz, Yanai Elazar, Asaf Yehudai, Gabriel Stanovsky

    Abstract: Most works on gender bias focus on intrinsic bias -- removing traces of information about a protected group from the model's internal representation. However, these works are often disconnected from the impact of such debiasing on downstream applications, which is the main motivation for debiasing in the first place. In this work, we systematically test how methods for intrinsic debiasing affect n…

    Submitted 2 June, 2024; originally announced June 2024.

  4. arXiv:2405.14863

    cs.CL cs.AI cs.LG

    A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns

    Authors: Asaf Yehudai, Taelin Karidi, Gabriel Stanovsky, Ariel Goldstein, Omri Abend

    Abstract: Cross-domain alignment refers to the task of mapping a concept from one domain to another. For example, "If a doctor were a color, what color would it be?". This seemingly peculiar task is designed to investigate how people represent concrete and abstract concepts through their mappings between categories and their reasoning processes over those mappings. In this paper, we adap…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: CogSci

  5. arXiv:2403.00499

    cs.CL

    Do Zombies Understand? A Choose-Your-Own-Adventure Exploration of Machine Cognition

    Authors: Ariel Goldstein, Gabriel Stanovsky

    Abstract: Recent advances in LLMs have sparked a debate on whether they understand text. In this position paper, we argue that opponents in this debate hold different definitions for understanding, and particularly differ in their view on the role of consciousness. To substantiate this claim, we propose a thought experiment involving an open-source chatbot $Z$ which excels on every possible benchmark, seemi…

    Submitted 1 March, 2024; originally announced March 2024.

  6. arXiv:2402.13906

    cs.CL

    Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction

    Authors: Gili Lior, Yoav Goldberg, Gabriel Stanovsky

    Abstract: Document collections of various domains, e.g., legal, medical, or financial, often share some underlying collection-wide structure, which captures information that can aid both human users and structure-aware models. We propose to identify the typical structure of document within a collection, which requires to capture recurring topics across the collection, while abstracting over arbitrary header…

    Submitted 20 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024 findings

  7. arXiv:2401.14493

    cs.CL cs.HC cs.LG

    K-QA: A Real-World Medical Q&A Benchmark

    Authors: Itay Manes, Naama Ronn, David Cohen, Ran Ilan Ber, Zehavi Horowitz-Kugler, Gabriel Stanovsky

    Abstract: Ensuring the accuracy of responses provided by large language models (LLMs) is crucial, particularly in clinical settings where incorrect information may directly impact patient health. To address this challenge, we construct K-QA, a dataset containing 1,212 patient questions originating from real-world conversations held on K Health (an AI-driven clinical platform). We employ a panel of in-house…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: The data and the evaluation script are available at https://github.com/Itaymanes/K-QA. Results and model comparisons can be viewed at https://huggingface.co/spaces/Itaykhealth/K-QA

  8. arXiv:2401.00595

    cs.CL

    State of What Art? A Call for Multi-Prompt LLM Evaluation

    Authors: Moran Mizrahi, Guy Kaplan, Dan Malkin, Rotem Dror, Dafna Shahaf, Gabriel Stanovsky

    Abstract: Recent advances in large language models (LLMs) have led to the development of various evaluation benchmarks. These benchmarks typically rely on a single instruction template for evaluating all LLMs on a specific task. In this paper, we comprehensively analyze the brittleness of results obtained via single-prompt evaluations across 6.5M instances, involving 20 different LLMs and 39 tasks from 3 be…

    Submitted 6 May, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: Accepted at TACL; pre-MIT Press publication version

  9. arXiv:2309.12491

    cs.CL cs.AI

    Exploring the Impact of Training Data Distribution and Subword Tokenization on Gender Bias in Machine Translation

    Authors: Bar Iluz, Tomasz Limisiewicz, Gabriel Stanovsky, David Mareček

    Abstract: We study the effect of tokenization on gender bias in machine translation, an aspect that has been largely overlooked in previous works. Specifically, we focus on the interactions between the frequency of gendered profession names in training data, their representation in the subword tokenizer's vocabulary, and gender bias. We observe that female and non-stereotypical gender inflections of profess…

    Submitted 30 September, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: Accepted to AACL 2023

  10. arXiv:2308.00225

    cs.AI cs.CY cs.LG

    Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias

    Authors: Itay Itzhak, Gabriel Stanovsky, Nir Rosenfeld, Yonatan Belinkov

    Abstract: Recent studies show that instruction tuning (IT) and reinforcement learning from human feedback (RLHF) improve the abilities of large language models (LMs) dramatically. While these tuning methods can help align models with human objectives and generate high-quality text, not much is known about their potential adverse effects. In this work, we investigate the effect of IT and RLHF on decision mak…

    Submitted 31 March, 2024; v1 submitted 31 July, 2023; originally announced August 2023.

    Comments: TACL 2024. Presented at ACL 2024. 12 pages

  11. arXiv:2306.01058

    cs.CL

    Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents

    Authors: Catherine Chen, Zejiang Shen, Dan Klein, Gabriel Stanovsky, Doug Downey, Kyle Lo

    Abstract: Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers. Layout-infused LMs are often evaluated on documents with familiar layout features (e.g., papers from the same publisher), but in practice models encounter documents with unfamiliar distributions of layout features, such as new combinations of text…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: To appear in ACL Findings 2023

  12. arXiv:2305.15389

    cs.CL

    Comparing Humans and Models on a Similar Scale: Towards Cognitive Gender Bias Evaluation in Coreference Resolution

    Authors: Gili Lior, Gabriel Stanovsky

    Abstract: Spurious correlations were found to be an important factor explaining model performance in various NLP tasks (e.g., gender or racial artifacts), often considered to be "shortcuts" to the actual task. However, humans tend to similarly make quick (and sometimes wrong) predictions based on societal and cognitive presuppositions. In this work we address the question: can we quantify the extent to wh…

    Submitted 24 May, 2023; originally announced May 2023.

  13. arXiv:2305.14336

    cs.CL

    Schema-Driven Information Extraction from Heterogeneous Tables

    Authors: Fan Bai, Junmo Kang, Gabriel Stanovsky, Dayne Freitag, Alan Ritter

    Abstract: In this paper, we explore the question of whether large language models can support cost-efficient information extraction from tables. We introduce schema-driven information extraction, a new task that transforms tabular data into structured records following a human-authored schema. To assess various LLM's capabilities on this task, we present a benchmark comprised of tables from four diverse dom…

    Submitted 12 March, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

  14. arXiv:2305.05302

    cs.CL

    The Perfect Victim: Computational Analysis of Judicial Attitudes towards Victims of Sexual Violence

    Authors: Eliya Habba, Renana Keydar, Dan Bareket, Gabriel Stanovsky

    Abstract: We develop computational models to analyze court statements in order to assess judicial attitudes toward victims of sexual violence in the Israeli court system. The study examines the resonance of "rape myths" in the criminal justice system's response to sex crimes, in particular in judicial assessment of victim's credibility. We begin by formulating an ontology for evaluating judicial attitudes t…

    Submitted 9 May, 2023; originally announced May 2023.

  15. arXiv:2303.07274

    cs.CV cs.AI cs.CL

    Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images

    Authors: Nitzan Bitton-Guetta, Yonatan Bitton, Jack Hessel, Ludwig Schmidt, Yuval Elovici, Gabriel Stanovsky, Roy Schwartz

    Abstract: Weird, unusual, and uncanny images pique the curiosity of observers because they challenge commonsense. For example, an image released during the 2022 world cup depicts the famous soccer stars Lionel Messi and Cristiano Ronaldo playing chess, which playfully violates our expectation that their competition should occur on the football field. Humans can easily recognize and interpret these unconvent…

    Submitted 12 August, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted to ICCV 2023. Website: whoops-benchmark.github.io

  16. arXiv:2302.08464

    cs.CL cs.AI cs.LG

    Evaluating and Improving the Coreference Capabilities of Machine Translation Models

    Authors: Asaf Yehudai, Arie Cattan, Omri Abend, Gabriel Stanovsky

    Abstract: Machine translation (MT) requires a wide range of linguistic capabilities, which current end-to-end models are expected to learn implicitly by observing aligned sentences in bilingual corpora. In this work, we ask: How well do MT models learn coreference resolution from implicit signal? To answer this question, we develop an evaluation methodology that derives coreference clusters from MT o…

    Submitted 16 February, 2023; originally announced February 2023.

    Comments: EACL paper

  17. arXiv:2302.04811

    cs.CL

    A Large-Scale Multilingual Study of Visual Constraints on Linguistic Selection of Descriptions

    Authors: Uri Berger, Lea Frermann, Gabriel Stanovsky, Omri Abend

    Abstract: We present a large, multilingual study into how vision constrains linguistic choice, covering four languages and five linguistic properties, such as verb transitivity or use of numerals. We propose a novel method that leverages existing corpora of images with captions written by native speakers, and apply it to nine corpora, comprising 600k images and 3M captions. We study the relation between vis…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: Accepted to EACL 2023 Findings

  18. arXiv:2212.04542

    cs.CV cs.AI cs.CL

    VASR: Visual Analogies of Situation Recognition

    Authors: Yonatan Bitton, Ron Yosef, Eli Strugo, Dafna Shahaf, Roy Schwartz, Gabriel Stanovsky

    Abstract: A core process in human cognition is analogical mapping: the ability to identify a similar relational structure between different situations. We introduce a novel task, Visual Analogies of Situation Recognition, adapting the classical word-analogy task into the visual domain. Given a triplet of images, the task is to select an image candidate B' that completes the analogy (A to A' is like B to wha…

    Submitted 8 December, 2022; originally announced December 2022.

    Comments: Accepted to AAAI 2023. Website: https://vasr-dataset.github.io/

  19. arXiv:2210.13039

    cs.CL

    "Covid vaccine is against Covid but Oxford vaccine is made at Oxford!" Semantic Interpretation of Proper Noun Compounds

    Authors: Keshav Kolluru, Gabriel Stanovsky, Mausam

    Abstract: Proper noun compounds, e.g., "Covid vaccine", convey information in a succinct manner (a "Covid vaccine" is a "vaccine that immunizes against the Covid disease"). These are commonly used in short-form domains, such as news headlines, but are largely ignored in information-seeking applications. To address this limitation, we release a new manually annotated dataset, ProNCI, consisting of 22.5K prop…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: Accepted at EMNLP'22

  20. arXiv:2210.07135

    cs.CL

    You Can Have Your Data and Balance It Too: Towards Balanced and Efficient Multilingual Models

    Authors: Tomasz Limisiewicz, Dan Malkin, Gabriel Stanovsky

    Abstract: Multilingual models have been widely used for cross-lingual transfer to low-resource languages. However, the performance on these languages is hindered by their underrepresentation in the pretraining data. To alleviate this problem, we propose a novel multilingual training technique based on teacher-student knowledge distillation. In this setting, we utilize monolingual teacher models optimized fo…

    Submitted 26 May, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: SIGTYP 2023

  21. arXiv:2207.12576

    cs.CL cs.AI cs.CV cs.HC

    WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models

    Authors: Yonatan Bitton, Nitzan Bitton Guetta, Ron Yosef, Yuval Elovici, Mohit Bansal, Gabriel Stanovsky, Roy Schwartz

    Abstract: While vision-and-language models perform well on tasks such as visual question answering, they struggle when it comes to basic human commonsense reasoning skills. In this work, we introduce WinoGAViL: an online game of vision-and-language associations (e.g., between werewolves and a full moon), used as a dynamic evaluation benchmark. Inspired by the popular card game Codenames, a spymaster gives a…

    Submitted 11 October, 2022; v1 submitted 25 July, 2022; originally announced July 2022.

    Comments: Accepted to NeurIPS 2022, Datasets and Benchmarks. Website: https://winogavil.github.io/

  22. arXiv:2205.05974

    cs.CL

    A Computational Acquisition Model for Multimodal Word Categorization

    Authors: Uri Berger, Gabriel Stanovsky, Omri Abend, Lea Frermann

    Abstract: Recent advances in self-supervised modeling of text and images open new opportunities for computational models of child language acquisition, which is believed to rely heavily on cross-modal signals. However, prior studies have been limited by their reliance on vision models trained on large image datasets annotated with a pre-defined set of depicted object categories. This is (a) not faithful to…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: Accepted to NAACL 2022

  23. arXiv:2205.04086

    cs.CL cs.AI

    A Balanced Data Approach for Evaluating Cross-Lingual Transfer: Mapping the Linguistic Blood Bank

    Authors: Dan Malkin, Tomasz Limisiewicz, Gabriel Stanovsky

    Abstract: We show that the choice of pretraining languages affects downstream cross-lingual transfer for BERT-based models. We inspect zero-shot performance in balanced data conditions to mitigate data size confounds, classifying pretraining languages that improve downstream performance as donors, and languages that are improved in zero-shot performance as recipients. We develop a method of quadratic time c…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: Accepted to NAACL 2022

  24. arXiv:2204.12708

    cs.CL

    On the Limitations of Dataset Balancing: The Lost Battle Against Spurious Correlations

    Authors: Roy Schwartz, Gabriel Stanovsky

    Abstract: Recent work has shown that deep learning models in NLP are highly sensitive to low-level correlations between simple features and specific output labels, leading to overfitting and lack of generalization. To mitigate this problem, a common practice is to balance datasets by adding new instances or by filtering out "easy" instances (Sakaguchi et al., 2020), culminating in a recent proposal to elimi…

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: Findings of NAACL 2022

  25. arXiv:2110.12383

    cs.CL

    Automated Extraction of Sentencing Decisions from Court Cases in the Hebrew Language

    Authors: Mohr Wenger, Tom Kalir, Noga Berger, Carmit Chalamish, Renana Keydar, Gabriel Stanovsky

    Abstract: We present the task of Automated Punishment Extraction (APE) in sentencing decisions from criminal court cases in Hebrew. Addressing APE will enable the identification of sentencing patterns and constitute an important stepping stone for many follow up legal NLP applications in Hebrew, including the prediction of sentencing decisions. We curate a dataset of sexual assault sentencing decisions and…

    Submitted 24 October, 2021; originally announced October 2021.

    Comments: Accepted to the Natural Legal Language Processing workshop (NLLP 2021), colocated with EMNLP 2021

  26. arXiv:2109.04513

    cs.CL

    Filling the Gaps in Ancient Akkadian Texts: A Masked Language Modelling Approach

    Authors: Koren Lazar, Benny Saret, Asaf Yehudai, Wayne Horowitz, Nathan Wasserman, Gabriel Stanovsky

    Abstract: We present models which complete missing text given transliterations of ancient Mesopotamian documents, originally written on cuneiform clay tablets (2500 BCE - 100 CE). Due to the tablets' deterioration, scholars often rely on contextual cues to manually fill in missing parts in the text in a subjective and time-consuming process. We identify that this challenge can be formulated as a masked lang…

    Submitted 24 October, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: Accepted to EMNLP 2021 (Main Conference)

  27. arXiv:2109.03858

    cs.CL

    Collecting a Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation

    Authors: Shahar Levy, Koren Lazar, Gabriel Stanovsky

    Abstract: Recent works have found evidence of gender bias in models of machine translation and coreference resolution using mostly synthetic diagnostic datasets. While these quantify bias in a controlled experiment, they often do so on a small scale and consist mostly of artificial, out-of-distribution sentences. In this work, we find grammatical patterns indicating stereotypical and non-stereotypical gende…

    Submitted 10 September, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

    Comments: Accepted to Findings of EMNLP 2021

  28. arXiv:2109.02040

    cs.CL cs.CV cs.LG

    Data Efficient Masked Language Modeling for Vision and Language

    Authors: Yonatan Bitton, Gabriel Stanovsky, Michael Elhadad, Roy Schwartz

    Abstract: Masked language modeling (MLM) is one of the key sub-tasks in vision-language pretraining. In the cross-modal setting, tokens in the sentence are masked at random, and the model predicts the masked tokens given the image and the text. In this paper, we observe several key disadvantages of MLM in this setting. First, as captions tend to be short, in a third of the sentences no token is sampled. Sec…

    Submitted 5 September, 2021; originally announced September 2021.

    Comments: Accepted to Findings of EMNLP 2021

  29. arXiv:2106.04192

    cs.CL

    Realistic Evaluation Principles for Cross-document Coreference Resolution

    Authors: Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan

    Abstract: We point out that common evaluation practices for cross-document coreference resolution have been unrealistically permissive in their assumed settings, yielding inflated results. We propose addressing this issue via two evaluation methodology principles. First, as in other tasks, models should be evaluated on predicted mentions rather than on gold mentions. Doing this raises a subtle issue regardi…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: *SEM 2021

  30. arXiv:2106.01210

    cs.CL

    Cross-document Coreference Resolution over Predicted Mentions

    Authors: Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan

    Abstract: Coreference resolution has been mostly investigated within a single document scope, showing impressive progress in recent years based on end-to-end models. However, the more challenging task of cross-document (CD) coreference resolution remained relatively under-explored, with the few recent models applied only to gold mentions. Here, we introduce the first end-to-end model for CD coreference reso…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: Findings of ACL 2021

  31. arXiv:2103.09591

    cs.CL cs.CV

    Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA

    Authors: Yonatan Bitton, Gabriel Stanovsky, Roy Schwartz, Michael Elhadad

    Abstract: Recent works have shown that supervised models often exploit data artifacts to achieve good test scores while their performance severely degrades on samples outside their training distribution. Contrast sets (Gardner et al., 2020) quantify this phenomenon by perturbing test samples in a minimal way such that the output label is modified. While most contrast sets were created manually, requiring int…

    Submitted 17 March, 2021; originally announced March 2021.

    Comments: Accepted to NAACL 2021

  32. arXiv:2101.10244

    cs.CL

    Process-Level Representation of Scientific Protocols with Interactive Annotation

    Authors: Ronen Tamari, Fan Bai, Alan Ritter, Gabriel Stanovsky

    Abstract: We develop Process Execution Graphs (PEG), a document-level representation of real-world wet lab biochemistry protocols, addressing challenges such as cross-sentence relations, long-range coreference, grounding, and implicit arguments. We manually annotate PEGs in a corpus of complex lab protocols with a novel interactive textual simulator that keeps track of entity traits and semantic constraints…

    Submitted 14 April, 2021; v1 submitted 25 January, 2021; originally announced January 2021.

    Comments: EACL 2021 camera ready. Data, models and code at https://textlabs.github.io/

  33. arXiv:2101.06561

    cs.CL cs.AI

    GENIE: Toward Reproducible and Standardized Human Evaluation for Text Generation

    Authors: Daniel Khashabi, Gabriel Stanovsky, Jonathan Bragg, Nicholas Lourie, Jungo Kasai, Yejin Choi, Noah A. Smith, Daniel S. Weld

    Abstract: While often assumed a gold standard, effective human evaluation of text generation remains an important, open area for research. We revisit this problem with a focus on producing consistent evaluations that are reproducible -- over time and across different populations. We study this goal in different stages of the human evaluation pipeline. In particular, we consider design choices for the annota…

    Submitted 31 October, 2022; v1 submitted 16 January, 2021; originally announced January 2021.

    Comments: Accepted to EMNLP 2022 main conference, visit our project page at: https://genie.apps.allenai.org

  34. arXiv:2010.06018

    cs.CL

    Gender Coreference and Bias Evaluation at WMT 2020

    Authors: Tom Kocmi, Tomasz Limisiewicz, Gabriel Stanovsky

    Abstract: Gender bias in machine translation can manifest when choosing gender inflections based on spurious gender correlations. For example, always translating doctors as men and nurses as women. This can be particularly harmful as models become more popular and deployed within commercial systems. Our work presents the largest evidence for the phenomenon in more than 19 systems submitted to the WMT over f…

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: Accepted WMT20

  35. MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics

    Authors: Anthony Chen, Gabriel Stanovsky, Sameer Singh, Matt Gardner

    Abstract: Posing reading comprehension as a generation problem provides a great deal of flexibility, allowing for open-ended questions with few restrictions on possible answers. However, progress is impeded by existing generation metrics, which rely on token overlap and are agnostic to the nuances of reading comprehension. To address this, we introduce a benchmark for training and evaluating generative read…

    Submitted 15 October, 2020; v1 submitted 7 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

  36. arXiv:2009.11032

    cs.CL

    Streamlining Cross-Document Coreference Resolution: Evaluation and Modeling

    Authors: Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan

    Abstract: Recent evaluation protocols for Cross-document (CD) coreference resolution have often been inconsistent or lenient, leading to incomparable results across works and overestimation of performance. To facilitate proper future research on this task, our primary contribution is proposing a pragmatic evaluation methodology which assumes access to only raw text -- rather than assuming gold mentions, dis…

    Submitted 23 October, 2020; v1 submitted 23 September, 2020; originally announced September 2020.

  37. arXiv:2004.13671

    cs.CL

    Active Learning for Coreference Resolution using Discrete Annotation

    Authors: Belinda Z. Li, Gabriel Stanovsky, Luke Zettlemoyer

    Abstract: We improve upon pairwise annotation for active learning in coreference resolution, by asking annotators to identify mention antecedents if a presented mention pair is deemed not coreferent. This simple modification, when combined with a novel mention clustering algorithm for selecting which examples to label, is much more efficient in terms of the performance obtained per annotation budget. In exp…

    Submitted 18 May, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

    Comments: 12 pages, 7 figures, ACL 2020

  38. arXiv:2004.07453

    cs.CL cs.LG

    The Right Tool for the Job: Matching Model and Instance Complexities

    Authors: Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith

    Abstract: As NLP models become larger, executing a trained model requires significant computational resources incurring monetary and environmental costs. To better respect a given inference budget, we propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit" from neural network calculations for simple instances, and late (and accurate) exi…

    Submitted 8 May, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: ACL 2020; 12 pages; code available in https://github.com/allenai/sledgehammer

  39. arXiv:2003.04567

    cs.AI cs.CL cs.LG

    Ecological Semantics: Programming Environments for Situated Language Understanding

    Authors: Ronen Tamari, Gabriel Stanovsky, Dafna Shahaf, Reut Tsarfaty

    Abstract: Large-scale natural language understanding (NLU) systems have made impressive progress: they can be applied flexibly across a variety of tasks, and employ minimal structural assumptions. However, extensive empirical research has shown this to be a double-edged sword, coming at the cost of shallow understanding: inferior generalization, grounding and explainability. Grounded language learning appro…

    Submitted 24 May, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

    Comments: Camera ready for Bridging AI and Cognitive Science (BAICS) workshop at ICLR2020. For interactive demos, see https://eco-sem.github.io/

  40. arXiv:1911.03243

    cs.CL

    Controlled Crowdsourcing for High-Quality QA-SRL Annotation

    Authors: Paul Roit, Ayal Klein, Daniela Stepanov, Jonathan Mamou, Julian Michael, Gabriel Stanovsky, Luke Zettlemoyer, Ido Dagan

    Abstract: Question-answer driven Semantic Role Labeling (QA-SRL) was proposed as an attractive open and natural flavour of SRL, potentially attainable from laymen. Recently, a large-scale crowdsourced QA-SRL corpus and a trained parser were released. Trying to replicate the QA-SRL annotation for new texts, we found that the resulting annotations were lacking in quality, particularly in coverage, making them…

    Submitted 13 May, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

  41. arXiv:1910.11966

    cs.CL

    Y'all should read this! Identifying Plurality in Second-Person Personal Pronouns in English Texts

    Authors: Gabriel Stanovsky, Ronen Tamari

    Abstract: Distinguishing between singular and plural "you" in English is a challenging task which has potential for downstream applications, such as machine translation or coreference resolution. While formal written English does not distinguish between these cases, other languages (such as Spanish), as well as other dialects of English (via phrases such as "y'all"), do make this distinction. We make use of…

    Submitted 25 October, 2019; originally announced October 2019.

    Comments: Accepted to the 5th Workshop on Noisy User-generated Text

  42. arXiv:1910.02228

    cs.CL

    On the Limits of Learning to Actively Learn Semantic Representations

    Authors: Omri Koshorek, Gabriel Stanovsky, Yichu Zhou, Vivek Srikumar, Jonathan Berant

    Abstract: One of the goals of natural language understanding is to develop models that map sentences into meaning representations. However, training such models requires expensive annotation of complex structures, which hinders their adoption. Learning to actively-learn (LTAL) is a recent paradigm for reducing the amount of labeled data by learning a policy that selects which samples should be labeled. In t…

    Submitted 5 October, 2019; originally announced October 2019.

    Comments: CoNLL 2019

  43. arXiv:1906.07883  [pdf, other]

    cs.DL cs.CY cs.SI

    Gender trends in computer science authorship

    Authors: Lucy Lu Wang, Gabriel Stanovsky, Luca Weihs, Oren Etzioni

    Abstract: A large-scale, up-to-date analysis of Computer Science literature (11.8M papers through 2019) reveals that, if trends from the last 50 years continue, parity between the number of male and female authors will not be reached in this century. In contrast, parity is projected to be reached within two to three decades or may have already been reached in other fields of study like Medicine or Sociology…

    Submitted 28 January, 2021; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: 13 pages, 8 figures, 2 tables, 4 appendices; Communications of the ACM

  44. arXiv:1906.00591  [pdf, ps, other]

    cs.CL

    Evaluating Gender Bias in Machine Translation

    Authors: Gabriel Stanovsky, Noah A. Smith, Luke Zettlemoyer

    Abstract: We present the first challenge set and evaluation protocol for the analysis of gender bias in machine translation (MT). Our approach uses two recent coreference resolution datasets composed of English sentences which cast participants into non-stereotypical gender roles (e.g., "The doctor asked the nurse to help her in the operation"). We devise an automatic gender bias evaluation method for eight…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: Accepted to ACL 2019

  45. arXiv:1903.00161  [pdf, other]

    cs.CL

    DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs

    Authors: Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner

    Abstract: Reading comprehension has recently seen rapid progress, with systems matching humans on the most popular datasets for the task. However, a large body of work has highlighted the brittleness of these systems, showing that there is much work left to be done. We introduce a new English reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs. In this cro…

    Submitted 16 April, 2019; v1 submitted 1 March, 2019; originally announced March 2019.

  46. arXiv:1711.05885  [pdf, other]

    cs.CL

    Crowdsourcing Question-Answer Meaning Representations

    Authors: Julian Michael, Gabriel Stanovsky, Luheng He, Ido Dagan, Luke Zettlemoyer

    Abstract: We introduce Question-Answer Meaning Representations (QAMRs), which represent the predicate-argument structure of a sentence as a set of question-answer pairs. We also develop a crowdsourcing scheme to show that QAMRs can be labeled with very little training, and gather a dataset with over 5,000 sentences and 100,000 questions. A detailed qualitative analysis demonstrates that the crowd-generated…

    Submitted 15 November, 2017; originally announced November 2017.

    Comments: 8 pages, 6 figures, 2 tables

  47. arXiv:1603.01648  [pdf, other]

    cs.CL

    Getting More Out Of Syntax with PropS

    Authors: Gabriel Stanovsky, Jessica Ficler, Ido Dagan, Yoav Goldberg

    Abstract: Semantic NLP applications often rely on dependency trees to recognize major elements of the proposition structure of sentences. Yet, while much semantic structure is indeed expressed by syntax, many phenomena are not easily read out of dependency trees, often leading to further ad-hoc heuristic post-processing or to information loss. To directly address the needs of semantic applications, we prese…

    Submitted 4 March, 2016; originally announced March 2016.