Showing 1–8 of 8 results for author: Tonelli, S

  1. arXiv:2402.13800

    cs.CL cs.SI

    The Geography of Information Diffusion in Online Discourse on Europe and Migration

    Authors: Elisa Leonardelli, Sara Tonelli

    Abstract: The online diffusion of information related to Europe and migration has been little investigated from an external point of view. However, this is a very relevant topic, especially if users have had no direct contact with Europe and its perception depends solely on information retrieved online. In this work we analyse the information circulating online about Europe and migration after retrieving a…

    Submitted 21 February, 2024; originally announced February 2024.

  2. arXiv:2402.02975

    cs.CL

    Putting Context in Context: the Impact of Discussion Structure on Text Classification

    Authors: Nicolò Penzo, Antonio Longa, Bruno Lepri, Sara Tonelli, Marco Guerini

    Abstract: Current text classification approaches usually focus on the content to be classified. Contextual aspects (both linguistic and extra-linguistic) are usually neglected, even in tasks based on online discussions. Still in many cases the multi-party and multi-turn nature of the context from which these elements are selected can be fruitfully exploited. In this work, we propose a series of experiments…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted to EACL 2024 main conference

  3. Agreeing to Disagree: Annotating Offensive Language Datasets with Annotators' Disagreement

    Authors: Elisa Leonardelli, Stefano Menini, Alessio Palmero Aprosio, Marco Guerini, Sara Tonelli

    Abstract: Since state-of-the-art approaches to offensive language detection rely on supervised learning, it is crucial to quickly adapt them to the continuously evolving scenario of social media. While several approaches have been proposed to tackle the problem from an algorithmic perspective, so to reduce the need for annotated data, less attention has been paid to the quality of these data. Following a tr…

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: To appear at EMNLP 2021 (long paper)

  4. Monolingual and Cross-Lingual Acceptability Judgments with the Italian CoLA corpus

    Authors: Daniela Trotta, Raffaele Guarasci, Elisa Leonardelli, Sara Tonelli

    Abstract: The development of automated approaches to linguistic acceptability has been greatly fostered by the availability of the English CoLA corpus, which has also been included in the widely used GLUE benchmark. However, this kind of research for languages other than English, as well as the analysis of cross-lingual approaches, has been hindered by the lack of resources with a comparable size in other l…

    Submitted 24 September, 2021; originally announced September 2021.

    Comments: Findings of EMNLP 2021. Dataset available at https://github.com/dhfbk/ItaCoLA-dataset

  5. arXiv:2107.02472

    cs.CL cs.AI cs.CY cs.HC

    Empowering NGOs in Countering Online Hate Messages

    Authors: Yi-Ling Chung, Serra Sinem Tekiroglu, Sara Tonelli, Marco Guerini

    Abstract: Studies on online hate speech have mostly focused on the automated detection of harmful messages. Little attention has been devoted so far to the development of effective strategies to fight hate speech, in particular through the creation of counter-messages. While existing manual scrutiny and intervention strategies are time-consuming and not scalable, advances in natural language processing have…

    Submitted 6 July, 2021; originally announced July 2021.

    Comments: Preprint of the paper published in Online Social Networks and Media Journal (OSNEM)

  6. arXiv:2103.14916

    cs.CL

    Abuse is Contextual, What about NLP? The Role of Context in Abusive Language Annotation and Detection

    Authors: Stefano Menini, Alessio Palmero Aprosio, Sara Tonelli

    Abstract: The datasets most widely used for abusive language detection contain lists of messages, usually tweets, that have been manually judged as abusive or not by one or more annotators, with the annotation performed at message level. In this paper, we investigate what happens when the hateful content of a message is judged also based on the context, given that messages are often ambiguous and need to be…

    Submitted 27 March, 2021; originally announced March 2021.

  7. arXiv:2005.02235

    cs.CL

    Creating a Multimodal Dataset of Images and Text to Study Abusive Language

    Authors: Alessio Palmero Aprosio, Stefano Menini, Sara Tonelli

    Abstract: In order to study online hate speech, the availability of datasets containing the linguistic phenomena of interest are of crucial importance. However, when it comes to specific target groups, for example teenagers, collecting such data may be problematic due to issues with consent and privacy restrictions. Furthermore, while text-only datasets of this kind have been widely used, limitations set by…

    Submitted 5 May, 2020; originally announced May 2020.

  8. Following the footsteps of giants: Modeling the mobility of historically notable individuals using Wikipedia

    Authors: Lorenzo Lucchini, Sara Tonelli, Bruno Lepri

    Abstract: The steady growth of digitized historical information is continuously stimulating new different approaches to the fields of Digital Humanities and Computational Social Science. In this work, we use Natural Language Processing techniques to retrieve large amounts of historical information from Wikipedia. In particular, the pages of a set of historically notable individuals are processed to catch th…

    Submitted 16 December, 2019; originally announced December 2019.

    Journal ref: EPJ Data Sci. 8, 36 (2019)