Showing 1–23 of 23 results for author: Pezzelle, S

  1. arXiv:2407.04559  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition

    Authors: Aditya K Surikuchi, Raquel Fernández, Sandro Pezzelle

    Abstract: Visual storytelling consists in generating a natural language story given a temporally ordered sequence of images. This task is not only challenging for models, but also very difficult to evaluate with automatic metrics since there is no consensus about what makes a story 'good'. In this paper, we introduce a novel method that measures story quality in terms of human likeness regarding three key a…

    Submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2406.18403  [pdf, other]

    cs.CL

    LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks

    Authors: Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernández, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, André F. T. Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K Surikuchi, Ece Takmaz, Alberto Testoni

    Abstract: There is an increasing trend towards evaluating NLP models with LLM-generated judgments instead of human judgments. In the absence of a comparison against human data, this raises concerns about the validity of these evaluations; in case they are conducted with proprietary models, this also raises concerns over reproducibility. We provide JUDGE-BENCH, a collection of 20 NLP datasets with human anno…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2403.17806  [pdf, other]

    cs.LG cs.CL

    Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms

    Authors: Michael Hanna, Sandro Pezzelle, Yonatan Belinkov

    Abstract: Many recent language model (LM) interpretability studies have adopted the circuits framework, which aims to find the minimal computational subgraph, or circuit, that explains LM behavior on a given task. Most studies determine which edges belong in a LM's circuit by performing causal interventions on each edge independently, but this scales poorly with model size. Edge attribution patching (EAP),…

    Submitted 26 March, 2024; originally announced March 2024.

  4. arXiv:2403.06935  [pdf, other]

    cs.CL

    Naming, Describing, and Quantifying Visual Objects in Humans and LLMs

    Authors: Alberto Testoni, Juell Sprott, Sandro Pezzelle

    Abstract: While human speakers use a variety of different expressions when describing the same object in an image, giving rise to a distribution of plausible labels driven by pragmatic constraints, the extent to which current Vision & Language Large Language Models (VLLMs) can mimic this crucial feature of language use is an open question. This applies to common, everyday objects, but it is particularly int…

    Submitted 4 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted to ACL 2024 (main conference)

  5. arXiv:2402.12486  [pdf, other]

    cs.CL

    Do Pre-Trained Language Models Detect and Understand Semantic Underspecification? Ask the DUST!

    Authors: Frank Wildenburg, Michael Hanna, Sandro Pezzelle

    Abstract: In everyday language use, speakers frequently utter and interpret sentences that are semantically underspecified, namely, whose content is insufficient to fully convey their message or interpret them univocally. For example, to interpret the underspecified sentence "Don't spend too much", which leaves implicit what (not) to spend, additional linguistic context or outside knowledge is needed. In th…

    Submitted 12 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  6. arXiv:2402.01352  [pdf, other]

    cs.CL cs.AI cs.CV

    Describing Images $\textit{Fast and Slow}$: Quantifying and Predicting the Variation in Human Signals during Visuo-Linguistic Processes

    Authors: Ece Takmaz, Sandro Pezzelle, Raquel Fernández

    Abstract: There is an intricate relation between the properties of an image and how humans behave while describing the image. This behavior shows ample variation, as manifested in human signals such as eye movements and when humans start to describe the image. Despite the value of such signals of visuo-linguistic variation, they are virtually disregarded in the training of current pretrained models, which m…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: To appear in EACL 2024

  7. arXiv:2310.17770  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    GROOViST: A Metric for Grounding Objects in Visual Storytelling

    Authors: Aditya K Surikuchi, Sandro Pezzelle, Raquel Fernández

    Abstract: A proper evaluation of stories generated for a sequence of images -- the task commonly referred to as visual storytelling -- must consider multiple aspects, such as coherence, grammatical correctness, and visual grounding. In this work, we focus on evaluating the degree of grounding, that is, the extent to which a story is about the entities shown in the images. We analyze current metrics, both de…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: In EMNLP 2023 main conference proceedings (to appear)

  8. arXiv:2310.15061  [pdf, other]

    cs.CL cs.AI cs.CV

    The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models

    Authors: Xinyi Chen, Raquel Fernández, Sandro Pezzelle

    Abstract: Despite the impressive performance achieved by pre-trained language-and-vision models in downstream tasks, it remains an open question whether this reflects a proper understanding of image-text interaction. In this work, we explore to what extent they handle basic linguistic constructions -- active-passive voice, coordination, and relative clauses -- that even preschool children can typically mast…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: This is the camera-ready version of the paper that will be published in the Proceedings of EMNLP 2023 (Singapore, 6-10 December 2023)

  9. arXiv:2310.15004  [pdf, other]

    cs.CL

    When Language Models Fall in Love: Animacy Processing in Transformer Language Models

    Authors: Michael Hanna, Yonatan Belinkov, Sandro Pezzelle

    Abstract: Animacy - whether an entity is alive and sentient - is fundamental to cognitive processing, impacting areas such as memory, vision, and language. However, animacy is not always expressed directly in language: in English it often manifests indirectly, in the form of selectional constraints on verbs and adjectives. This poses a potential issue for transformer language models (LMs): they often train…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: To appear at EMNLP 2023

  10. arXiv:2307.00897  [pdf, other]

    cs.LG cs.AI

    Fixing confirmation bias in feature attribution methods via semantic match

    Authors: Giovanni Cinà, Daniel Fernandez-Llaneza, Ludovico Deponte, Nishant Mishra, Tabea E. Röber, Sandro Pezzelle, Iacer Calixto, Rob Goedhart, Ş. İlker Birbil

    Abstract: Feature attribution methods have become a staple method to disentangle the complex behavior of black box models. Despite their success, some scholars have argued that such methods suffer from a serious flaw: they do not allow a reliable interpretation in terms of human concepts. Simply put, visualizing an array of feature contributions is not enough for humans to conclude something about a model's…

    Submitted 26 February, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

  11. arXiv:2306.05240  [pdf, other]

    cs.CL cs.AI cs.CV

    Dealing with Semantic Underspecification in Multimodal NLP

    Authors: Sandro Pezzelle

    Abstract: Intelligent systems that aim at mastering language as humans do must deal with its semantic underspecification, namely, the possibility for a linguistic signal to convey only part of the information needed for communication to succeed. Consider the usages of the pronoun they, which can leave the gender and number of its referent(s) underspecified. Semantic underspecification is not a bug but a cru…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: To appear in the Proceedings of ACL 2023 (main conference). 13 pages, 3 figures

  12. arXiv:2305.19933  [pdf, other]

    cs.CL cs.AI cs.CV

    Speaking the Language of Your Listener: Audience-Aware Adaptation via Plug-and-Play Theory of Mind

    Authors: Ece Takmaz, Nicolo' Brandizzi, Mario Giulianelli, Sandro Pezzelle, Raquel Fernández

    Abstract: Dialogue participants may have varying levels of knowledge about the topic under discussion. In such cases, it is essential for speakers to adapt their utterances by taking their audience into account. Yet, it is an open question how such adaptation can be modelled in computational agents. In this paper, we model a visually grounded referential game between a knowledgeable speaker and a listener w…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: To appear in Findings of ACL 2023

  13. arXiv:2302.07232  [pdf, other]

    cs.CL

    A Psycholinguistic Analysis of BERT's Representations of Compounds

    Authors: Lars Buijtelaar, Sandro Pezzelle

    Abstract: This work studies the semantic representations learned by BERT for compounds, that is, expressions such as sunlight or bodyguard. We build on recent studies that explore semantic information in Transformers at the word level and test whether BERT aligns with human semantic intuitions when dealing with expressions (e.g., sunlight) whose overall meaning depends -- to a various extent -- on the seman…

    Submitted 14 February, 2023; originally announced February 2023.

    Comments: To appear in the Proceedings of EACL 2023 (main conference)

    ACM Class: I.2.7; J.5

  14. arXiv:2011.04592  [pdf, other]

    cs.CL cs.CV

    Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze

    Authors: Ece Takmaz, Sandro Pezzelle, Lisa Beinborn, Raquel Fernández

    Abstract: When speakers describe an image, they tend to look at objects before mentioning them. In this paper, we investigate such sequential cross-modal alignment by modelling the image description generation process computationally. We take as our starting point a state-of-the-art image captioning system and develop several model variants that exploit information from human gaze patterns recorded during l…

    Submitted 9 November, 2020; originally announced November 2020.

    Comments: In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)

  15. arXiv:2011.04554  [pdf, other]

    cs.CL cs.CV

    Refer, Reuse, Reduce: Generating Subsequent References in Visual and Conversational Contexts

    Authors: Ece Takmaz, Mario Giulianelli, Sandro Pezzelle, Arabella Sinclair, Raquel Fernández

    Abstract: Dialogue participants often refer to entities or situations repeatedly within a conversation, which contributes to its cohesiveness. Subsequent references exploit the common ground accumulated by the interlocutors and hence have several interesting properties, namely, they tend to be shorter and reuse expressions that were effective in previous mentions. In this paper, we tackle the generation of…

    Submitted 9 November, 2020; originally announced November 2020.

    Comments: In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)

  16. arXiv:1908.10285  [pdf, other]

    cs.CL cs.AI cs.CV

    Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts

    Authors: Sandro Pezzelle, Raquel Fernández

    Abstract: This work aims at modeling how the meaning of gradable adjectives of size ('big', 'small') can be learned from visually-grounded contexts. Inspired by cognitive and linguistic evidence showing that the use of these expressions relies on setting a threshold that is dependent on a specific context, we investigate the ability of multi-modal models in assessing whether an object is 'big' or 'small' in…

    Submitted 27 August, 2019; originally announced August 2019.

    Comments: Accepted at EMNLP-IJCNLP 2019

  17. arXiv:1809.04344  [pdf, other]

    cs.CV cs.AI cs.CL

    The Wisdom of MaSSeS: Majority, Subjectivity, and Semantic Similarity in the Evaluation of VQA

    Authors: Shailza Jolly, Sandro Pezzelle, Tassilo Klein, Andreas Dengel, Moin Nabi

    Abstract: We introduce MASSES, a simple evaluation metric for the task of Visual Question Answering (VQA). In its standard form, the VQA task is operationalized as follows: Given an image and an open-ended question in natural language, systems are required to provide a suitable answer. Currently, model performance is evaluated by means of a somehow simplistic metric: If the predicted answer is chosen by at…

    Submitted 12 September, 2018; originally announced September 2018.

    Comments: 10 pages, 7 figures

  18. arXiv:1806.00354  [pdf, other]

    cs.CL cs.AI

    Some of Them Can be Guessed! Exploring the Effect of Linguistic Context in Predicting Quantifiers

    Authors: Sandro Pezzelle, Shane Steinert-Threlkeld, Raffaella Bernardi, Jakub Szymanik

    Abstract: We study the role of linguistic context in predicting quantifiers ('few', 'all'). We collect crowdsourced data from human participants and test various models in a local (single-sentence) and a global context (multi-sentence) condition. Models significantly out-perform humans in the former setting and are only slightly better in the latter. While human performance improves with more linguistic con…

    Submitted 1 June, 2018; originally announced June 2018.

    Comments: ACL 2018

  19. arXiv:1804.05018  [pdf, other]

    cs.CV cs.LG stat.ML

    Comparatives, Quantifiers, Proportions: A Multi-Task Model for the Learning of Quantities from Vision

    Authors: Sandro Pezzelle, Ionut-Teodor Sorodoc, Raffaella Bernardi

    Abstract: The present work investigates whether different quantification mechanisms (set comparison, vague quantification, and proportional estimation) can be jointly learned from visual scenes by a multi-task computational model. The motivation is that, in humans, these processes underlie the same cognitive, non-symbolic ability, which allows an automatic estimation and comparison of set magnitudes. We sho…

    Submitted 13 April, 2018; originally announced April 2018.

    Comments: 12 pages (references included). To appear in the Proceedings of NAACL-HLT 2018

    MSC Class: 68T45

    Journal ref: Proceedings of NAACL-HLT 2018

  20. arXiv:1705.01359  [pdf, other]

    cs.CV cs.CL cs.MM

    FOIL it! Find One mismatch between Image and Language caption

    Authors: Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurelie Herbelot, Moin Nabi, Enver Sangineto, Raffaella Bernardi

    Abstract: In this paper, we aim to understand whether current language and vision (LaVi) models truly grasp the interaction between the two modalities. To this end, we propose an extension of the MSCOCO dataset, FOIL-COCO, which associates images with both correct and "foil" captions, that is, descriptions of the image that are highly similar to the original ones, but contain one single mistake ("foil word"…

    Submitted 3 May, 2017; originally announced May 2017.

    Comments: To appear at ACL 2017

  21. arXiv:1704.02923  [pdf, other]

    cs.CL cs.AI cs.CV

    Pay Attention to Those Sets! Learning Quantification from Images

    Authors: Ionut Sorodoc, Sandro Pezzelle, Aurélie Herbelot, Mariella Dimiccoli, Raffaella Bernardi

    Abstract: Major advances have recently been made in merging language and vision representations. But most tasks considered so far have confined themselves to the processing of objects and lexicalised relations amongst objects (content words). We know, however, that humans (even pre-school children) can abstract over raw data to perform certain types of higher-level reasoning, expressed in natural language b…

    Submitted 10 April, 2017; originally announced April 2017.

    Comments: Submitted to Journal Paper, 28 pages, 12 figures, 5 tables

  22. arXiv:1702.05270  [pdf, other]

    cs.CL cs.AI cs.CV

    Be Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers from Vision

    Authors: Sandro Pezzelle, Marco Marelli, Raffaella Bernardi

    Abstract: People can refer to quantities in a visual scene by using either exact cardinals (e.g. one, two, three) or natural language quantifiers (e.g. few, most, all). In humans, these two processes underlie fairly different cognitive and neural mechanisms. Inspired by this evidence, the present study proposes two models for learning the objective meaning of cardinals and quantifiers from visual scenes con…

    Submitted 17 February, 2017; originally announced February 2017.

    Comments: Accepted at EACL 2017. 7 pages

  23. arXiv:1606.06031  [pdf, other]

    cs.CL cs.AI cs.LG

    The LAMBADA dataset: Word prediction requiring a broad discourse context

    Authors: Denis Paperno, Germán Kruszewski, Angeliki Lazaridou, Quan Ngoc Pham, Raffaella Bernardi, Sandro Pezzelle, Marco Baroni, Gemma Boleda, Raquel Fernández

    Abstract: We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the whole passage, but not if they only see the last sentence preceding the target word. To succeed on LAM…

    Submitted 20 June, 2016; originally announced June 2016.

    Comments: 10 pages, Accepted as a long paper for ACL 2016