Showing 1–12 of 12 results for author: Gonzalez-Dios, I

Searching in archive cs.
  1. arXiv:2405.10579  [pdf, other]

    cs.CL

    A Hard Nut to Crack: Idiom Detection with Conversational Large Language Models

    Authors: Francesca De Luca Fornaciari, Begoña Altuna, Itziar Gonzalez-Dios, Maite Melero

    Abstract: In this work, we explore idiomatic language processing with Large Language Models (LLMs). We introduce the Idiomatic language Test Suite IdioTS, a new dataset of difficult examples specifically designed by language experts to assess the capabilities of LLMs to process figurative language at sentence level. We propose a comprehensive evaluation methodology based on an idiom detection task, where LL…

    Submitted 17 May, 2024; originally announced May 2024.

  2. arXiv:2310.15941  [pdf, other]

    cs.CL

    This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models

    Authors: Iker García-Ferrero, Begoña Altuna, Javier Álvez, Itziar Gonzalez-Dios, German Rigau

    Abstract: Although large language models (LLMs) have apparently acquired a certain level of grammatical knowledge and the ability to make generalizations, they fail to interpret negation, a crucial step in Natural Language Processing. We try to clarify the reasons for the sub-optimal performance of LLMs understanding negation. We introduce a large semi-automatically generated dataset of circa 400,000 descri…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted at the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)

  3. arXiv:2306.03189  [pdf]

    cs.CL

    Easy-to-Read in Germany: A Survey on its Current State and Available Resources

    Authors: Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel

    Abstract: Easy-to-Read Language (E2R) is a controlled language variant that makes any written text more accessible through the use of clear, direct and simple language. It is mainly aimed at people with cognitive or intellectual disabilities, among other target users. Plain Language (PL), on the other hand, is a variant of a given language, which aims to promote the use of simple language to communicate inf…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: 10th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, 2023

  4. arXiv:2303.03915  [pdf, other]

    cs.CL cs.AI

    The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

    Authors: Hugo Laurençon, Lucile Saulnier, Thomas Wang, Christopher Akiki, Albert Villanova del Moral, Teven Le Scao, Leandro Von Werra, Chenghao Mou, Eduardo González Ponferrada, Huu Nguyen, Jörg Frohberg, Mario Šaško, Quentin Lhoest, Angelina McMillan-Major, Gerard Dupont, Stella Biderman, Anna Rogers, Loubna Ben allal, Francesco De Toni, Giada Pistilli, Olivier Nguyen, Somaieh Nikpoor, Maraim Masoud, Pierre Colombo, Javier de la Rosa, et al. (29 additional authors not shown)

    Abstract: As language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings. The BigScience workshop, a 1-year international and multidisciplinary initiative, was formed with the goal of researching and training large language models as a values-driven undertaking, putting issues of ethics, harm, and governance in the f…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: NeurIPS 2022, Datasets and Benchmarks Track

    ACM Class: I.2.7

  5. arXiv:2211.05100  [pdf, other]

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  6. arXiv:2211.03152  [pdf, other]

    cs.CL cs.AI

    Noisy Channel for Automatic Text Simplification

    Authors: Oscar M Cumbicus-Pineda, Iker Gutiérrez-Fandiño, Itziar Gonzalez-Dios, Aitor Soroa

    Abstract: In this paper we present a simple re-ranking method for Automatic Sentence Simplification based on the noisy channel scheme. Instead of directly computing the best simplification given a complex text, the re-ranking method also considers the probability of the simple sentence to produce the complex counterpart, as well as the probability of the simple text itself, according to a language model. Ou…

    Submitted 6 November, 2022; originally announced November 2022.

    Comments: 8 pages

  7. arXiv:2205.01376  [pdf, other]

    cs.CL

    Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning

    Authors: Oscar Sainz, Itziar Gonzalez-Dios, Oier Lopez de Lacalle, Bonan Min, Eneko Agirre

    Abstract: Recent work has shown that NLP tasks such as Relation Extraction (RE) can be recast as Textual Entailment tasks using verbalizations, with strong performance in zero-shot and few-shot settings thanks to pre-trained entailment models. The fact that relations in current RE datasets are easily verbalized casts doubts on whether entailment would be effective in more complex tasks. In this work we sh…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: Accepted as Findings of NAACL2022

  8. arXiv:2109.04870  [pdf, other]

    cs.CL cs.AI

    MultiAzterTest: a Multilingual Analyzer on Multiple Levels of Language for Readability Assessment

    Authors: Kepa Bengoetxea, Itziar Gonzalez-Dios

    Abstract: Readability assessment is the task of determining how difficult or easy a text is or which level/grade it has. Traditionally, language-dependent readability formulae have been used, but these formulae take few text characteristics into account. However, Natural Language Processing (NLP) tools that assess the complexity of texts are able to measure more features and can be adapted to diffe…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: 33 pages

    MSC Class: 68T50; 91F20 ACM Class: I.2.7

  9. arXiv:2107.00333  [pdf, other]

    cs.CL

    Multilingual Central Repository: a Cross-lingual Framework for Developing Wordnets

    Authors: Xavier Gómez Guinovart, Itziar Gonzalez-Dios, Antoni Oliver, German Rigau

    Abstract: Language resources are necessary for language processing, but building them is costly, involves many researchers from different areas and needs constant updating. In this paper, we describe the crosslingual framework used for developing the Multilingual Central Repository (MCR), a multilingual knowledge base that includes wordnets of Basque, Catalan, English, Galician, Portuguese, Spanish and the fo…

    Submitted 2 July, 2021; v1 submitted 1 July, 2021; originally announced July 2021.

    Comments: 11 pages, 1 figure. To appear in Special Issue on Linking, Integrating and Extending Wordnets, Linguistic Issues in Language Technology (LiLT) Volume 10, Issue 4, Sep 2017

  10. arXiv:1909.02314  [pdf, ps, other]

    cs.AI cs.CL

    Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis

    Authors: Javier Álvez, Itziar Gonzalez-Dios, German Rigau

    Abstract: We describe a detailed analysis of a sample of a large benchmark of commonsense reasoning problems that has been automatically obtained from WordNet, SUMO and their mapping. The objective is to provide a better assessment of the quality of both the benchmark and the involved knowledge resources for advanced commonsense reasoning tasks. By means of this analysis, we are able to detect some knowledge…

    Submitted 6 September, 2019; v1 submitted 5 September, 2019; originally announced September 2019.

    Comments: 9 pages, 2 figures, 2 tables; 10th Global WordNet Conference - GWC 2019

    MSC Class: 68T30 ACM Class: I.2.4

  11. arXiv:1808.04620  [pdf, ps, other]

    cs.AI

    Applying the Closed World Assumption to SUMO-based FOL Ontologies for Effective Commonsense Reasoning

    Authors: Javier Álvez, Itziar Gonzalez-Dios, German Rigau

    Abstract: Most commonly, the Open World Assumption is adopted as a standard strategy for the design, construction and use of ontologies. This strategy limits the inferencing capabilities of any system because non-asserted statements (missing knowledge) could be assumed to be alternatively true or false. As we will demonstrate, this is especially the case of first-order logic (FOL) ontologies where non-asser…

    Submitted 4 March, 2020; v1 submitted 14 August, 2018; originally announced August 2018.

    Comments: 7 pages, 2 figures, 4 tables

    MSC Class: 68T30 ACM Class: I.2.4

  12. arXiv:1805.07824  [pdf, ps, other]

    cs.CL

    Validating WordNet Meronymy Relations using Adimen-SUMO

    Authors: Javier Álvez, Itziar Gonzalez-Dios, German Rigau

    Abstract: In this paper, we report on the practical application of a novel approach for validating the knowledge of WordNet using Adimen-SUMO. In particular, this paper focuses on cross-checking the WordNet meronymy relations against the knowledge encoded in Adimen-SUMO. Our validation approach tests a large set of competency questions (CQs), which are derived (semi)-automatically from the knowledge encoded…

    Submitted 20 May, 2018; originally announced May 2018.

    Comments: 14 pages, 10 tables

    MSC Class: 68T30 ACM Class: I.2.4