Showing 1–6 of 6 results for author: Salaberria, A

  1. arXiv:2406.09952  [pdf, other]

    cs.CV cs.CL cs.LG

    BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval

    Authors: Imanol Miranda, Ander Salaberria, Eneko Agirre, Gorka Azkune

    Abstract: Existing Vision-Language Compositionality (VLC) benchmarks like SugarCrepe are formulated as image-to-text retrieval problems, where, given an image, the models need to select between the correct textual description and a synthetic hard negative text. In this work we present the Bidirectional Vision-Language Compositionality (BiVLC) dataset. The novelty of BiVLC is to add a synthetic hard negative…

    Submitted 14 June, 2024; originally announced June 2024.

  2. Grounding Spatial Relations in Text-Only Language Models

    Authors: Gorka Azkune, Ander Salaberria, Eneko Agirre

    Abstract: This paper shows that text-only Language Models (LM) can learn to ground spatial relations like "left of" or "below" if they are provided with explicit location information of objects and they are properly trained to leverage those locations. We perform experiments on a verbalized version of the Visual Spatial Reasoning (VSR) dataset, where images are coupled with textual statements which contain…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted in Neural Networks

  3. arXiv:2403.00587  [pdf, other]

    cs.CV cs.AI

    Improving Explicit Spatial Relationships in Text-to-Image Generation through an Automatically Derived Dataset

    Authors: Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre, Frank Keller

    Abstract: Existing work has observed that current text-to-image systems do not accurately reflect explicit spatial relations between objects such as 'left of' or 'below'. We hypothesize that this is because explicit spatial relations rarely appear in the image captions used to train these models. We propose an automatic method that, given existing images, generates synthetic captions that contain 14 explici…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 12 pages and 5 figures

  4. arXiv:2304.10637  [pdf, other]

    cs.CL

    IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition using Knowledge Bases

    Authors: Iker García-Ferrero, Jon Ander Campos, Oscar Sainz, Ander Salaberria, Dan Roth

    Abstract: Named Entity Recognition (NER) is a core natural language processing task in which pre-trained language models have shown remarkable performance. However, standard benchmarks like CoNLL 2003 do not address many of the challenges that deployed NER systems face, such as having to classify emerging or complex entities in a fine-grained way. In this paper we present a novel NER cascade approach compri…

    Submitted 27 April, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: SemEval 2023

  5. Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering

    Authors: Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre

    Abstract: Integrating outside knowledge for reasoning in visio-linguistic tasks such as visual question answering (VQA) is an open problem. Given that pretrained language models have been shown to include world knowledge, we propose to use a unimodal (text-only) train and inference procedure based on automatic off-the-shelf captioning of images and pretrained language models. Our results on a visual questio…

    Submitted 25 March, 2022; v1 submitted 15 September, 2021; originally announced September 2021.

    Comments: Under review. 25 pages with 4 figures

    Journal ref: Expert Systems with Applications, Volume 212, 2023, 118669

  6. arXiv:2004.01894  [pdf, other]

    cs.CL

    Evaluating Multimodal Representations on Visual Semantic Textual Similarity

    Authors: Oier Lopez de Lacalle, Ander Salaberria, Aitor Soroa, Gorka Azkune, Eneko Agirre

    Abstract: The combination of visual and textual representations has produced excellent results in tasks such as image captioning and visual question answering, but the inference capabilities of multimodal representations are largely untested. In the case of textual representations, inference tasks such as Textual Entailment and Semantic Textual Similarity have been often used to benchmark the quality of tex…

    Submitted 4 April, 2020; originally announced April 2020.

    Comments: Accepted in ECAI-2020, 8 pages, 6 tables, 6 figures