Showing 1–22 of 22 results for author: Marasovic, A

  1. arXiv:2402.14897  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Chain-of-Thought Unfaithfulness as Disguised Accuracy

    Authors: Oliver Bentham, Nathan Stringham, Ana Marasović

    Abstract: Understanding the extent to which Chain-of-Thought (CoT) generations align with a large language model's (LLM) internal computations is critical for deciding whether to trust an LLM's output. As a proxy for CoT faithfulness, Lanham et al. (2023) propose a metric that measures a model's dependence on its CoT for producing an answer. Within a single family of proprietary models, they find that LLMs…

    Submitted 21 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: TMLR accepted paper camera-ready version. First two authors contributed equally. 8 pages main, 13 pages appendix

  2. arXiv:2311.09694  [pdf, other]

    cs.CL

    Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness

    Authors: Ashim Gupta, Rishanth Rajendhran, Nathan Stringham, Vivek Srikumar, Ana Marasović

    Abstract: Do larger and more performant models resolve NLP's longstanding robustness issues? We investigate this question using over 20 models of different sizes spanning different architectural choices and pretraining objectives. We conduct evaluations using (a) out-of-domain and challenge test sets, (b) behavioral testing with CheckLists, (c) contrast sets, and (d) adversarial inputs. Our analysis reveals…

    Submitted 3 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: To appear at NAACL 24 - main conference. The code is available at: https://github.com/utahnlp/scaling_robustness/

  3. arXiv:2310.13781  [pdf, other]

    cs.CL

    How Much Consistency Is Your Accuracy Worth?

    Authors: Jacob K. Johnson, Ana Marasović

    Abstract: Contrast set consistency is a robustness measurement that evaluates the rate at which a model correctly responds to all instances in a bundle of minimally different examples relying on the same knowledge. To draw additional insights, we propose to complement consistency with relative consistency -- the probability that an equally accurate model would surpass the consistency of the proposed model,…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: BlackboxNLP 2023 accepted paper camera-ready version; 6 pages main, 3 pages appendix

  4. arXiv:2211.00295  [pdf, other]

    cs.CL cs.AI

    CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation

    Authors: Abhilasha Ravichander, Matt Gardner, Ana Marasović

    Abstract: The full power of human language-based communication cannot be realized without negation. All human languages have some form of negation. Despite this, negation remains a challenging phenomenon for current natural language understanding systems. To facilitate the future development of models that can process negation effectively, we present CONDAQA, the first English reading comprehension dataset…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: EMNLP 2022

  5. arXiv:2210.13575  [pdf, other]

    cs.CL cs.AI

    Does Self-Rationalization Improve Robustness to Spurious Correlations?

    Authors: Alexis Ross, Matthew E. Peters, Ana Marasović

    Abstract: Rationalization is fundamental to human reasoning and learning. NLP models trained to produce rationales along with predictions, called self-rationalization models, have been investigated for their interpretability and utility to end-users. However, the extent to which training with human-written rationales facilitates learning remains an under-explored question. We ask whether training models to…

    Submitted 24 October, 2022; originally announced October 2022.

  6. arXiv:2209.06293  [pdf, other]

    cs.CL cs.CV

    Do Androids Laugh at Electric Sheep? Humor "Understanding" Benchmarks from The New Yorker Caption Contest

    Authors: Jack Hessel, Ana Marasović, Jena D. Hwang, Lillian Lee, Jeff Da, Rowan Zellers, Robert Mankoff, Yejin Choi

    Abstract: Large neural networks can now generate jokes, but do they really "understand" humor? We challenge AI models with three tasks derived from the New Yorker Cartoon Caption Contest: matching a joke to a cartoon, identifying a winning caption, and explaining why a winning caption is funny. These tasks encapsulate progressively more sophisticated aspects of "understanding" a cartoon; key elements are th…

    Submitted 6 July, 2023; v1 submitted 13 September, 2022; originally announced September 2022.

    Journal ref: ACL 2023

  7. arXiv:2205.11686  [pdf, other]

    cs.CL cs.CV

    On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization

    Authors: Shruti Palaskar, Akshita Bhagia, Yonatan Bisk, Florian Metze, Alan W Black, Ana Marasović

    Abstract: Combining the visual modality with pretrained language models has been surprisingly effective for simple descriptive tasks such as image captioning. More general text generation however remains elusive. We take a step back and ask: How do these models work for more complex generative tasks, i.e. conditioning on both text and images? Are multimodal models simply visually adapted language models, or…

    Submitted 22 October, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: v2: EMNLP Findings 2022 accepted paper camera-ready version. 9 pages main, 2 pages appendix

  8. arXiv:2111.08284  [pdf, other]

    cs.CL

    Few-Shot Self-Rationalization with Natural Language Prompts

    Authors: Ana Marasović, Iz Beltagy, Doug Downey, Matthew E. Peters

    Abstract: Self-rationalization models that predict task labels and generate free-text elaborations for their predictions could enable more intuitive interaction with NLP systems. These models are, however, currently trained with a large amount of human-written free-text explanations for each task which hinders their broader usage. We propose to study a more realistic setting of self-rationalization using fe…

    Submitted 25 April, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

    Comments: v2: NAACL Findings 2022 accepted paper camera-ready version. First two authors contributed equally. 9 pages main, 3 pages appendix

  9. arXiv:2105.08855  [pdf, other]

    cs.CL

    Effective Attention Sheds Light On Interpretability

    Authors: Kaiser Sun, Ana Marasović

    Abstract: An attention matrix of a transformer self-attention sublayer can provably be decomposed into two components and only one of them (effective attention) contributes to the model output. This leads us to ask whether visualizing effective attention gives different conclusions than interpretation of standard attention. Using a subset of the GLUE tasks and BERT, we carry out an analysis to compare the t…

    Submitted 18 May, 2021; originally announced May 2021.

    Comments: Accepted to Findings of ACL 2021

  10. arXiv:2104.08758  [pdf, other]

    cs.CL cs.AI

    Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus

    Authors: Jesse Dodge, Maarten Sap, Ana Marasović, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Margaret Mitchell, Matt Gardner

    Abstract: Large language models have led to remarkable progress on many NLP tasks, and researchers are turning to ever-larger text corpora to train them. Some of the largest corpora available are made by scraping significant portions of the internet, and are frequently introduced with only minimal documentation. In this work we provide some of the first documentation for the Colossal Clean Crawled Corpus (C…

    Submitted 30 September, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: EMNLP 2021 accepted paper camera ready version

  11. arXiv:2102.12060  [pdf, other]

    cs.CL cs.AI cs.LG

    Teach Me to Explain: A Review of Datasets for Explainable Natural Language Processing

    Authors: Sarah Wiegreffe, Ana Marasović

    Abstract: Explainable NLP (ExNLP) has increasingly focused on collecting human-annotated textual explanations. These explanations are used downstream in three ways: as data augmentation to improve performance on a predictive task, as supervision to train models to produce explanations for their predictions, and as a ground-truth to evaluate model-generated explanations. In this review, we identify 65 datase…

    Submitted 7 December, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: v3: NeurIPS 2021 accepted paper camera-ready version. The content of v3 is almost the same as of v1-2 but is more condensed. v4: Fixed a typo in the title and added acknowledgements. 10 pages main, 6 pages appendix

  12. arXiv:2012.15793  [pdf, other]

    cs.CL

    Promoting Graph Awareness in Linearized Graph-to-Text Generation

    Authors: Alexander Hoyle, Ana Marasović, Noah Smith

    Abstract: Generating text from structured inputs, such as meaning representations or RDF triples, has often involved the use of specialized graph-encoding neural networks. However, recent applications of pretrained transformers to linearizations of graph inputs have yielded state-of-the-art generation results on graph-to-text tasks. Here, we explore the ability of these linearized models to encode local gra…

    Submitted 31 December, 2020; originally announced December 2020.

  13. arXiv:2012.13985  [pdf, other]

    cs.CL cs.AI

    Explaining NLP Models via Minimal Contrastive Editing (MiCE)

    Authors: Alexis Ross, Ana Marasović, Matthew E. Peters

    Abstract: Humans have been shown to give contrastive explanations, which explain why an observed event happened rather than some other counterfactual event (the contrast case). Despite the influential role that contrastivity plays in how humans explain, this property is largely missing from current methods for explaining NLP models. We present Minimal Contrastive Editing (MiCE), a method for producing contr…

    Submitted 23 June, 2021; v1 submitted 27 December, 2020; originally announced December 2020.

  14. arXiv:2010.12762  [pdf, other]

    cs.CL

    Measuring Association Between Labels and Free-Text Rationales

    Authors: Sarah Wiegreffe, Ana Marasović, Noah A. Smith

    Abstract: In interpretable NLP, we require faithful rationales that reflect the model's decision-making process for an explained instance. While prior work focuses on extractive rationales (a subset of the input words), we investigate their less-studied counterpart: free-text natural language rationales. We demonstrate that pipelines, existing models for faithful extractive rationalization on information-ex…

    Submitted 29 August, 2022; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Revision to EMNLP 2021 camera-ready; corrects simulatability terminology and clarifies computation of rationale quality metric (no results changed). For a detailed explanation of changes, see https://github.com/allenai/label_rationale_association

  15. arXiv:2010.07526  [pdf, other]

    cs.CL cs.CV

    Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs

    Authors: Ana Marasović, Chandra Bhagavatula, Jae Sung Park, Ronan Le Bras, Noah A. Smith, Yejin Choi

    Abstract: Natural language rationales could provide intuitive, higher-level explanations that are easily understandable by humans, complementing the more broadly studied lower-level explanations based on gradients or attention weights. We present the first study focused on generating natural language rationales across several complex visual reasoning tasks: visual commonsense reasoning, visual-textual entai…

    Submitted 15 October, 2020; originally announced October 2020.

    Comments: Accepted to Findings of EMNLP

  16. arXiv:2010.07487  [pdf, other]

    cs.AI cs.CY

    Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI

    Authors: Alon Jacovi, Ana Marasović, Tim Miller, Yoav Goldberg

    Abstract: Trust is a central component of the interaction between people and AI, in that 'incorrect' levels of trust may cause misuse, abuse or disuse of the technology. But what, precisely, is the nature of trust in AI? What are the prerequisites and goals of the cognitive mechanism of trust, and how can we promote them, or assess whether they are being satisfied in a given interaction? This work aims to a…

    Submitted 20 January, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

    Comments: Accepted to ACM FAccT 2021

  17. arXiv:2010.06694  [pdf, other]

    cs.HC

    Easy, Reproducible and Quality-Controlled Data Collection with Crowdaq

    Authors: Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan IV, Ana Marasović, Zhen Nie

    Abstract: High-quality and large-scale data are key to success for AI systems. However, large-scale data annotation efforts are often confronted with a set of common challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators efficiently; and (3) reproducibility. To address these problems, we introduce Crowdaq, an open-source platform that standardizes the data collection…

    Submitted 5 October, 2020; originally announced October 2020.

    Comments: Accepted to the demo track of EMNLP 2020

  18. arXiv:2004.10964  [pdf, other]

    cs.CL cs.LG

    Don't Stop Pretraining: Adapt Language Models to Domains and Tasks

    Authors: Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith

    Abstract: Language models pretrained on text from a wide variety of sources form the foundation of today's NLP. In light of the success of these broad-coverage models, we investigate whether it is still helpful to tailor a pretrained model to the domain of a target task. We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks, s…

    Submitted 5 May, 2020; v1 submitted 23 April, 2020; originally announced April 2020.

    Comments: ACL 2020

  19. arXiv:1908.05803  [pdf, other]

    cs.CL

    Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning

    Authors: Pradeep Dasigi, Nelson F. Liu, Ana Marasović, Noah A. Smith, Matt Gardner

    Abstract: Machine comprehension of texts longer than a single sentence often requires coreference resolution. However, most current reading comprehension benchmarks do not contain complex coreferential phenomena and hence fail to evaluate the ability of models to resolve coreference. We present a new crowdsourced dataset containing more than 24K span-selection questions that require resolving coreference am…

    Submitted 4 September, 2019; v1 submitted 15 August, 2019; originally announced August 2019.

    Comments: 8 pages including appendix; EMNLP 2019 accepted paper camera ready version

  20. arXiv:1711.00768  [pdf, other]

    cs.CL

    SRL4ORL: Improving Opinion Role Labeling using Multi-task Learning with Semantic Role Labeling

    Authors: Ana Marasović, Anette Frank

    Abstract: For over a decade, machine learning has been used to extract opinion-holder-target structures from text to answer the question "Who expressed what kind of sentiment towards what?". Recent neural approaches do not outperform the state-of-the-art feature-based models for Opinion Role Labeling (ORL). We suspect this is due to the scarcity of labeled training data and address this issue using differen…

    Submitted 19 April, 2018; v1 submitted 2 November, 2017; originally announced November 2017.

    Comments: Published in NAACL 2018

    Journal ref: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)

  21. arXiv:1706.02256  [pdf, other]

    cs.CL stat.ML

    A Mention-Ranking Model for Abstract Anaphora Resolution

    Authors: Ana Marasović, Leo Born, Juri Opitz, Anette Frank

    Abstract: Resolving abstract anaphora is an important, but difficult task for text understanding. Yet, with recent advances in representation learning this task becomes a more tangible aim. A central property of abstract anaphora is that it establishes a relation between the anaphor embedded in the anaphoric sentence and its (typically non-nominal) antecedent. We propose a mention-ranking model that learns…

    Submitted 21 July, 2017; v1 submitted 7 June, 2017; originally announced June 2017.

    Comments: In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP). Copenhagen, Denmark

  22. arXiv:1608.05243  [pdf, ps, other]

    cs.CL

    Multilingual Modal Sense Classification using a Convolutional Neural Network

    Authors: Ana Marasović, Anette Frank

    Abstract: Modal sense classification (MSC) is a special WSD task that depends on the meaning of the proposition in the modal's scope. We explore a CNN architecture for classifying modal sense in English and German. We show that CNNs are superior to manually designed feature-based classifiers and a standard NN classifier. We analyze the feature maps learned by the CNN and identify known and previously unatte…

    Submitted 18 August, 2016; originally announced August 2016.

    Comments: Final version, accepted at the 1st Workshop on Representation Learning for NLP, held in conjunction with ACL 2016