Showing 1–5 of 5 results for author: Riabi, A

  1. arXiv:2406.17875

    cs.CL

    Cloaked Classifiers: Pseudonymization Strategies on Sensitive Classification Tasks

    Authors: Arij Riabi, Menel Mahamdi, Virginie Mouilleron, Djamé Seddah

    Abstract: Protecting privacy is essential when sharing data, particularly in the case of an online radicalization dataset that may contain personal information. In this paper, we explore the balance between preserving data usefulness and ensuring robust privacy safeguards, since regulations like the European GDPR shape how personal information must be handled. We share our method for manually pseudonymizing…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Proceedings of the fifth Workshop on Privacy in Natural Language Processing

  2. arXiv:2311.09122

    cs.CL

    Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

    Authors: Stephen Mayhew, Terra Blevins, Shuheng Liu, Marek Šuppa, Hila Gonen, Joseph Marvin Imperial, Börje F. Karlsson, Peiqin Lin, Nikola Ljubešić, LJ Miranda, Barbara Plank, Arij Riabi, Yuval Pinter

    Abstract: We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages. The overarching goal of UNER is to provide high-quality, cross-lingually consistent annotations to facilitate and standardize multilingual NER research. UNER v1 contains 18 datasets annotated with named entities in a cross-lingual consistent schema across 12 diverse langu…

    Submitted 29 June, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: NAACL 2024 Camera-ready

  3. arXiv:2210.13029

    cs.CL

    Multilingual Auxiliary Tasks Training: Bridging the Gap between Languages for Zero-Shot Transfer of Hate Speech Detection Models

    Authors: Syrielle Montariol, Arij Riabi, Djamé Seddah

    Abstract: Zero-shot cross-lingual transfer learning has been shown to be highly challenging for tasks involving a lot of linguistic specificities or when a cultural gap is present between languages, such as in hate speech detection. In this paper, we highlight this limitation for hate speech detection in several domains and languages using strict experimental settings. Then, we propose to train on multiling…

    Submitted 25 October, 2022; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: Accepted to Findings of AACL-IJCNLP 2022

  4. arXiv:2110.13658

    cs.CL cs.LG

    Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?

    Authors: Arij Riabi, Benoît Sagot, Djamé Seddah

    Abstract: Recent impressive improvements in NLP, largely based on the success of contextual neural language models, have been mostly demonstrated on at most a couple dozen high-resource languages. Building language models and, more generally, NLP systems for non-standardized and low-resource languages remains a challenging task. In this work, we focus on North-African colloquial dialectal Arabic written usi…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: Camera ready version. Accepted to WNUT 2021

  5. arXiv:2010.12643

    cs.CL

    Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering

    Authors: Arij Riabi, Thomas Scialom, Rachel Keraron, Benoît Sagot, Djamé Seddah, Jacopo Staiano

    Abstract: Coupled with the availability of large scale datasets, deep learning architectures have enabled rapid progress on the Question Answering task. However, most of those datasets are in English, and the performances of state-of-the-art multilingual models are significantly lower when evaluated on non-English data. Due to high data collection costs, it is not realistic to obtain annotated data for each…

    Submitted 14 October, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: 7 pages