Showing 1–13 of 13 results for author: Saggion, H

  1. arXiv:2404.07814  [pdf, ps, other]

    cs.CL

    MultiLS-SP/CA: Lexical Complexity Prediction and Lexical Simplification Resources for Catalan and Spanish

    Authors: Stefan Bott, Horacio Saggion, Nelson Peréz Rojas, Martin Solis Salazar, Saul Calderon Ramirez

    Abstract: Automatic lexical simplification is a task to substitute lexical items that may be unfamiliar and difficult to understand with easier and more common words. This paper presents MultiLS-SP/CA, a novel dataset for lexical simplification in Spanish and Catalan. This dataset represents the first of its kind in Catalan and a substantial addition to the sparse data on automatic lexical simplification wh…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Submitted to the 40th edition of the SEPLN Conference. Under Revision

  2. arXiv:2312.09897  [pdf, other]

    cs.AI

    A Novel Dataset for Financial Education Text Simplification in Spanish

    Authors: Nelson Perez-Rojas, Saul Calderon-Ramirez, Martin Solis-Salazar, Mario Romero-Sandoval, Monica Arias-Monge, Horacio Saggion

    Abstract: Text simplification, crucial in natural language processing, aims to make texts more comprehensible, particularly for specific groups like visually impaired Spanish speakers, a less-represented language in this field. In Spanish, there are few datasets that can be used to create text simplification systems. Our research has the primary objective to develop a Spanish financial text simplification d…

    Submitted 15 December, 2023; originally announced December 2023.

  3. Creating a silver standard for patent simplification

    Authors: Silvia Casola, Alberto Lavelli, Horacio Saggion

    Abstract: Patents are legal documents that aim at protecting inventions on the one hand and at making technical knowledge circulate on the other. Their complex style -- a mix of legal, technical, and extremely vague language -- makes their content hard to access for humans and machines and poses substantial challenges to the information retrieval community. This paper proposes an approach to automatically s…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: This paper has been published at SIGIR 2023

  4. arXiv:2307.02120  [pdf, other]

    cs.CL

    Multilingual Controllable Transformer-Based Lexical Simplification

    Authors: Kim Cheng Sheang, Horacio Saggion

    Abstract: Text is by far the most ubiquitous source of knowledge and information and should be made easily accessible to as many people as possible; however, texts often contain complex words that hinder reading comprehension and accessibility. Therefore, suggesting simpler alternatives for complex words without compromising meaning would help convey the information to a broader audience. This paper propose…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: The paper is accepted for SEPLN 2023

  5. arXiv:2303.08032  [pdf, other]

    cs.CL cs.LG

    Verifying the Robustness of Automatic Credibility Assessment

    Authors: Piotr Przybyła, Alexander Shvets, Horacio Saggion

    Abstract: Text classification methods have been widely investigated as a way to detect content of low credibility: fake news, social media bots, propaganda, etc. Quite accurate models (likely based on deep neural networks) help in moderating public electronic platforms and often cause content creators to face rejection of their submissions or removal of already published texts. Having the incentive to evade…

    Submitted 11 August, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

  6. arXiv:2302.02900  [pdf, other]

    cs.CL cs.LG

    Controllable Lexical Simplification for English

    Authors: Kim Cheng Sheang, Daniel Ferrés, Horacio Saggion

    Abstract: Fine-tuning Transformer-based approaches have recently shown exciting results on sentence simplification task. However, so far, no research has applied similar approaches to the Lexical Simplification (LS) task. In this paper, we present ConLS, a Controllable Lexical Simplification system fine-tuned with T5 (a Transformer-based model pre-trained with a BERT-style approach and several other tasks).…

    Submitted 6 February, 2023; originally announced February 2023.

  7. arXiv:2302.02888  [pdf, other]

    cs.CL cs.LG

    Findings of the TSAR-2022 Shared Task on Multilingual Lexical Simplification

    Authors: Horacio Saggion, Sanja Štajner, Daniel Ferrés, Kim Cheng Sheang, Matthew Shardlow, Kai North, Marcos Zampieri

    Abstract: We report findings of the TSAR-2022 shared task on multilingual lexical simplification, organized as part of the Workshop on Text Simplification, Accessibility, and Readability TSAR-2022 held in conjunction with EMNLP 2022. The task called the Natural Language Processing research community to contribute with methods to advance the state of the art in multilingual lexical simplification for English…

    Submitted 6 February, 2023; originally announced February 2023.

  8. arXiv:2209.05301  [pdf, ps, other]

    cs.CL

    Lexical Simplification Benchmarks for English, Portuguese, and Spanish

    Authors: Sanja Stajner, Daniel Ferres, Matthew Shardlow, Kai North, Marcos Zampieri, Horacio Saggion

    Abstract: Even in highly-developed countries, as many as 15-30% of the population can only understand texts written using a basic vocabulary. Their understanding of everyday texts is limited, which prevents them from taking an active role in society and making informed decisions regarding healthcare, legal representation, or democratic choice. Lexical simplification is a natural language processing task th…

    Submitted 12 September, 2022; originally announced September 2022.

  9. arXiv:1904.00648  [pdf, other]

    cs.CL

    Recognizing Musical Entities in User-generated Content

    Authors: Lorenzo Porcaro, Horacio Saggion

    Abstract: Recognizing Musical Entities is important for Music Information Retrieval (MIR) since it can improve the performance of several tasks such as music recommendation, genre classification or artist similarity. However, most entity recognition systems in the music domain have concentrated on formal texts (e.g. artists' biographies, encyclopedic articles, etc.), ignoring rich and noisy user-generated c…

    Submitted 1 April, 2019; originally announced April 2019.

    Comments: International Conference on Computational Linguistics and Intelligent Text Processing (CICLing) 2019

  10. arXiv:1805.00731  [pdf, other]

    cs.CL

    Exploring Emoji Usage and Prediction Through a Temporal Variation Lens

    Authors: Francesco Barbieri, Luis Marujo, Pradeep Karuturi, William Brendel, Horacio Saggion

    Abstract: The frequent use of Emojis on social media platforms has created a new form of multimodal social interaction. Developing methods for the study and representation of emoji semantics helps to improve future multimodal communication systems. In this paper, we explore the usage and semantics of emojis over time. We compare emoji embeddings trained on a corpus of different seasons and show that some em…

    Submitted 2 May, 2018; originally announced May 2018.

    Comments: Emojis @ ICWSM 2018

  11. arXiv:1803.02392  [pdf, other]

    cs.CL

    Multimodal Emoji Prediction

    Authors: Francesco Barbieri, Miguel Ballesteros, Francesco Ronzano, Horacio Saggion

    Abstract: Emojis are small images that are commonly included in social media text messages. The combination of visual and textual content in the same message builds up a modern way of communication, that automatic systems are not used to deal with. In this paper we extend recent advances in emoji prediction by putting forward a multimodal approach that is able to predict emojis in Instagram posts. Instagram…

    Submitted 17 April, 2018; v1 submitted 6 March, 2018; originally announced March 2018.

    Comments: NAACL 2018 (short)

  12. arXiv:1702.07285  [pdf, other]

    cs.CL

    Are Emojis Predictable?

    Authors: Francesco Barbieri, Miguel Ballesteros, Horacio Saggion

    Abstract: Emojis are ideograms which are naturally combined with plain text to visually complement or condense the meaning of a message. Despite being widely used in social media, their underlying semantics have received little attention from a Natural Language Processing standpoint. In this paper, we investigate the relation between words and emojis, studying the novel task of predicting which emojis are e…

    Submitted 24 February, 2017; v1 submitted 23 February, 2017; originally announced February 2017.

    Comments: To appear at EACL 2017

  13. arXiv:1606.02514  [pdf, other]

    cs.CL

    DefExt: A Semi Supervised Definition Extraction Tool

    Authors: Luis Espinosa-Anke, Roberto Carlini, Horacio Saggion, Francesco Ronzano

    Abstract: We present DefExt, an easy to use semi supervised Definition Extraction Tool. DefExt is designed to extract from a target corpus those textual fragments where a term is explicitly mentioned together with its core features, i.e. its definition. It works on the back of a Conditional Random Fields based sequential labeling algorithm and a bootstrapping approach. Bootstrapping enables the model to gra…

    Submitted 8 June, 2016; originally announced June 2016.

    Comments: GLOBALEX 2016 Lexicographic Resources for Human Language Technology Workshop Programme (p. 24) (2016, May)