Showing 1–19 of 19 results for author: Boleda, G

  1. arXiv:2311.10181  [pdf, other]

    cs.CL

    The Impact of Familiarity on Naming Variation: A Study on Object Naming in Mandarin Chinese

    Authors: Yunke He, Xixian Liao, Jialing Liang, Gemma Boleda

    Abstract: Different speakers often produce different names for the same object or entity (e.g., "woman" vs. "tourist" for a female tourist). The reasons behind variation in naming are not well understood. We create a Language and Vision dataset for Mandarin Chinese that provides an average of 20 names for 1319 naturalistic images, and investigate how familiarity with a given kind of object relates to the de…

    Submitted 16 November, 2023; originally announced November 2023.

  2. arXiv:2305.14468  [pdf, other]

    cs.CV cs.CL

    Run Like a Girl! Sports-Related Gender Bias in Language and Vision

    Authors: Sophia Harrison, Eleonora Gualdoni, Gemma Boleda

    Abstract: Gender bias in Language and Vision datasets and models has the potential to perpetuate harmful stereotypes and discrimination. We analyze gender bias in two Language and Vision datasets. Consistent with prior work, we find that both datasets underrepresent women, which promotes their invisibilization. Moreover, we hypothesize and find that a bias affects human naming choices for people playing spo…

    Submitted 23 May, 2023; originally announced May 2023.

  3. arXiv:2210.11512  [pdf, other]

    cs.CL

    Communication breakdown: On the low mutual intelligibility between human and neural captioning

    Authors: Roberto Dessì, Eleonora Gualdoni, Francesca Franzon, Gemma Boleda, Marco Baroni

    Abstract: We compare the 0-shot performance of a neural caption-based image retriever when given as input either human-produced captions or captions generated by a neural captioner. We conduct this comparison on the recently introduced ImageCoDe data-set (Krojer et al., 2022) which contains hard distractors nearly identical to the images to be retrieved. We find that the neural retriever has much higher per…

    Submitted 27 April, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: Accepted as a short paper at EMNLP 2022

  4. arXiv:2109.13105  [pdf, other]

    cs.CL

    Does referent predictability affect the choice of referential form? A computational approach using masked coreference resolution

    Authors: Laura Aina, Xixian Liao, Gemma Boleda, Matthijs Westera

    Abstract: It is often posited that more predictable parts of a speaker's meaning tend to be made less explicit, for instance using shorter, less informative words. Studying these dynamics in the domain of referring expressions has proven difficult, with existing studies, both psycholinguistic and corpus-based, providing contradictory results. We test the hypothesis that speakers produce less informative ref…

    Submitted 27 September, 2021; originally announced September 2021.

  5. arXiv:2004.03902  [pdf, other]

    cs.CL

    Deep daxes: Mutual exclusivity arises through both learning biases and pragmatic strategies in neural networks

    Authors: Kristina Gulordava, Thomas Brochhagen, Gemma Boleda

    Abstract: Children's tendency to associate novel words with novel referents has been taken to reflect a bias toward mutual exclusivity. This tendency may be advantageous both as (1) an ad-hoc referent selection heuristic to single out referents lacking a label and as (2) an organizing principle of lexical acquisition. This paper investigates under which circumstances cross-situational neural models can come…

    Submitted 1 April, 2021; v1 submitted 8 April, 2020; originally announced April 2020.

    Journal ref: Proceedings of the Annual Conference of the Cognitive Science Society, 42, 2020, 2089-2095

  6. arXiv:1911.02103  [pdf, other]

    cs.CV cs.CL cs.MM

    Recurrent Instance Segmentation using Sequences of Referring Expressions

    Authors: Alba Herrera-Palacio, Carles Ventura, Carina Silberer, Ionut-Teodor Sorodoc, Gemma Boleda, Xavier Giro-i-Nieto

    Abstract: The goal of this work is to segment the objects in an image that are referred to by a sequence of linguistic descriptions (referring expressions). We propose a deep neural network with recurrent layers that output a sequence of binary masks, one for each referring expression provided by the user. The recurrent layers in the architecture allow the model to condition each predicted mask on the previ…

    Submitted 5 November, 2019; originally announced November 2019.

    Comments: 3rd NeurIPS Workshop on Visually Grounded Interaction and Language (ViGIL, 2019)

  7. arXiv:1906.05149  [pdf, other]

    cs.CL

    Putting words in context: LSTM language models and lexical ambiguity

    Authors: Laura Aina, Kristina Gulordava, Gemma Boleda

    Abstract: In neural network models of language, words are commonly represented using context-invariant representations (word embeddings) which are then put in context in the hidden layers. Since words are often ambiguous, representing the contextually relevant information is not trivial. We investigate how an LSTM language model deals with lexical ambiguity in English, designing a method to probe its hidden…

    Submitted 12 June, 2019; originally announced June 2019.

    Comments: To appear in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL)

  8. arXiv:1905.07356  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Don't Blame Distributional Semantics if it can't do Entailment

    Authors: Matthijs Westera, Gemma Boleda

    Abstract: Distributional semantics has had enormous empirical success in Computational Linguistics and Cognitive Science in modeling various semantic phenomena, such as semantic similarity, and distributional models are widely used in state-of-the-art Natural Language Processing systems. However, the theoretical status of distributional semantics within a broader theory of language and cognition is still un…

    Submitted 17 May, 2019; originally announced May 2019.

    Comments: To appear in Proceedings of the 13th International Conference on Computational Semantics (IWCS 2019), Gothenburg, Sweden

  9. arXiv:1905.06649  [pdf, other]

    cs.CL

    What do Entity-Centric Models Learn? Insights from Entity Linking in Multi-Party Dialogue

    Authors: Laura Aina, Carina Silberer, Matthijs Westera, Ionut-Teodor Sorodoc, Gemma Boleda

    Abstract: Humans use language to refer to entities in the external world. Motivated by this, in recent years several models that incorporate a bias towards learning entity representations have been proposed. Such entity-centric models have shown empirical success, but we still know little about why. In this paper we analyze the behavior of two recently proposed entity-centric models in a referential task, E…

    Submitted 16 May, 2019; originally announced May 2019.

    Comments: To appear in Proceedings of NAACL 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics

  10. Distributional Semantics and Linguistic Theory

    Authors: Gemma Boleda

    Abstract: Distributional semantics provides multi-dimensional, graded, empirically induced word representations that successfully capture many aspects of meaning in natural languages, as shown in a large body of work in computational linguistics; yet, its impact in theoretical linguistics has so far been limited. This review provides a critical discussion of the literature on distributional semantics, with…

    Submitted 18 March, 2020; v1 submitted 6 May, 2019; originally announced May 2019.

    Comments: 22 pages, 4 figures; preprint version (minor modifications wrt previous version). When citing this article, please use the journal reference: Boleda, G. 2020. Distributional Semantics and Linguistic Theory. Annu. Rev. Linguist. 6:213-34

    Journal ref: Annu. Rev. Linguist. 6:213-34 (2020)

  11. arXiv:1809.03169  [pdf, other]

    cs.CL

    Short-Term Meaning Shift: A Distributional Exploration

    Authors: Marco Del Tredici, Raquel Fernández, Gemma Boleda

    Abstract: We present the first exploration of meaning shift over short periods of time in online communities using distributional representations. We create a small annotated dataset and use it to assess the performance of a standard model for meaning shift detection on short-term meaning shift. We find that the model has problems distinguishing meaning shift from referential phenomena, and propose a measur…

    Submitted 30 April, 2019; v1 submitted 10 September, 2018; originally announced September 2018.

    Comments: Accepted at NAACL2019

  12. Instantiation

    Authors: Abhijeet Gupta, Gemma Boleda, Sebastian Pado

    Abstract: In computational linguistics, a large body of work exists on distributed modeling of lexical relations, focussing largely on lexical relations such as hypernymy (scientist -- person) that hold between two categories, as expressed by common nouns. In contrast, computational linguistics has paid little attention to entities denoted by proper nouns (Marie Curie, Mumbai, ...). These have investigated…

    Submitted 5 August, 2018; originally announced August 2018.

    Comments: submitted to Computational Linguistics

    Journal ref: Substantially revised version published at Cognitive Science (2021)

  13. arXiv:1805.05370  [pdf, other]

    cs.CL

    AMORE-UPF at SemEval-2018 Task 4: BiLSTM with Entity Library

    Authors: Laura Aina, Carina Silberer, Ionut-Teodor Sorodoc, Matthijs Westera, Gemma Boleda

    Abstract: This paper describes our winning contribution to SemEval 2018 Task 4: Character Identification on Multiparty Dialogues. It is a simple, standard model with one key innovation, an entity library. Our results show that this innovation greatly facilitates the identification of infrequent characters. Because of the generic nature of our model, this finding is potentially relevant to any task that requ…

    Submitted 14 May, 2018; originally announced May 2018.

  14. arXiv:1702.01815  [pdf, other]

    cs.CL

    Living a discrete life in a continuous world: Reference with distributed representations

    Authors: Gemma Boleda, Sebastian Padó, Nghia The Pham, Marco Baroni

    Abstract: Reference is a crucial property of language that allows us to connect linguistic expressions to the world. Modeling it requires handling both continuous and discrete aspects of meaning. Data-driven models excel at the former, but struggle with the latter, and the reverse is true for symbolic models. This paper (a) introduces a concrete referential task to test both aspects, called cross-modal en…

    Submitted 4 September, 2017; v1 submitted 6 February, 2017; originally announced February 2017.

    Comments: Accepted at IWCS 2017. Final version, 9 pages

  15. "Show me the cup": Reference with Continuous Representations

    Authors: Gemma Boleda, Sebastian Padó, Marco Baroni

    Abstract: One of the most basic functions of language is to refer to objects in a shared scene. Modeling reference with continuous representations is challenging because it requires individuation, i.e., tracking and distinguishing an arbitrary number of referents. We introduce a neural network model that, given a definite description and a set of objects represented by natural images, points to the intended…

    Submitted 28 June, 2016; originally announced June 2016.

    Journal ref: In: Gelbukh A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10761. Springer, Cham

  16. arXiv:1606.06031  [pdf, other]

    cs.CL cs.AI cs.LG

    The LAMBADA dataset: Word prediction requiring a broad discourse context

    Authors: Denis Paperno, Germán Kruszewski, Angeliki Lazaridou, Quan Ngoc Pham, Raffaella Bernardi, Sandro Pezzelle, Marco Baroni, Gemma Boleda, Raquel Fernández

    Abstract: We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the whole passage, but not if they only see the last sentence preceding the target word. To succeed on LAM…

    Submitted 20 June, 2016; originally announced June 2016.

    Comments: 10 pages, Accepted as a long paper for ACL 2016

  17. arXiv:1407.8322  [pdf, other]

    physics.soc-ph cs.CL physics.data-an

    Zipf's law for word frequencies: word forms versus lemmas in long texts

    Authors: Alvaro Corral, Gemma Boleda, Ramon Ferrer-i-Cancho

    Abstract: Zipf's law is a fundamental paradigm in the statistics of written and spoken natural language as well as in other communication systems. We raise the question of the elementary units for which Zipf's law should hold in the most natural way, studying its validity for plain word forms and for the corresponding lemma forms. In order to have as homogeneous sources as possible, we analyze some of the l…

    Submitted 13 July, 2015; v1 submitted 31 July, 2014; originally announced July 2014.

    Journal ref: PLoS ONE 10 (7), e0129031

  18. arXiv:1303.0705  [pdf, ps, other]

    physics.soc-ph cond-mat.stat-mech nlin.AO

    A scaling law beyond Zipf's law and its relation to Heaps' law

    Authors: Francesc Font-Clos, Gemma Boleda, Álvaro Corral

    Abstract: The dependence on text length of the statistical properties of word occurrences has long been considered a severe limitation of quantitative linguistics. We propose a simple scaling form for the distribution of absolute word frequencies which uncovers the robustness of this distribution as text grows. In this way, the shape of the distribution is always the same and it is only a scale parameter whi…

    Submitted 9 September, 2013; v1 submitted 4 March, 2013; originally announced March 2013.

    Comments: Final version, to appear in NJP

  19. arXiv:0901.2924  [pdf, ps, other]

    physics.soc-ph cs.CL

    Universal Complex Structures in Written Language

    Authors: Alvaro Corral, Ramon Ferrer-i-Cancho, Gemma Boleda, Albert Diaz-Guilera

    Abstract: Quantitative linguistics has provided us with a number of empirical laws that characterise the evolution of languages and competition amongst them. In terms of language usage, one of the most influential results is Zipf's law of word frequencies. Zipf's law appears to be universal, and may not even be unique to human language. However, there is ongoing controversy over whether Zipf's law is a go…

    Submitted 19 January, 2009; originally announced January 2009.

    Comments: Short paper