Showing 1–9 of 9 results for author: al-azzawi, s

Searching in archive cs.
  1. arXiv:2311.09828

    cs.CL

    AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages

    Authors: Jiayi Wang, David Ifeoluwa Adelani, Sweta Agrawal, Marek Masiak, Ricardo Rei, Eleftheria Briakou, Marine Carpuat, Xuanli He, Sofia Bourhim, Andiswa Bukula, Muhidin Mohamed, Temitayo Olatoye, Tosin Adewumi, Hamam Mokayed, Christine Mwase, Wangui Kimotho, Foutse Yuehgoh, Anuoluwapo Aremu, Jessica Ojo, Shamsuddeen Hassan Muhammad, Salomey Osei, Abdul-Hakeem Omotayo, Chiamaka Chukwuneke, Perez Ogayo, Oumaima Hourrane, et al. (33 additional authors not shown)

    Abstract: Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed on n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of eval…

    Submitted 23 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted by NAACL 2024

  2. arXiv:2304.12847

    cs.CL

    NLP-LTU at SemEval-2023 Task 10: The Impact of Data Augmentation and Semi-Supervised Learning Techniques on Text Classification Performance on an Imbalanced Dataset

    Authors: Sana Sabah Al-Azzawi, György Kovács, Filip Nilsson, Tosin Adewumi, Marcus Liwicki

    Abstract: In this paper, we propose a methodology for task 10 of SemEval23, focusing on detecting and classifying online sexism in social media posts. The task is tackling a serious issue, as detecting harmful content on social media platforms is crucial for mitigating the harm of these posts on users. Our solution for this task is based on an ensemble of fine-tuned transformer-based models (BERTweet, RoBER…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: 6 pages, 5 figures. This paper has been accepted at the SemEval workshop at the ACL 2023 conference

  3. arXiv:2304.09972

    cs.CL

    MasakhaNEWS: News Topic Classification for African languages

    Authors: David Ifeoluwa Adelani, Marek Masiak, Israel Abebe Azime, Jesujoba Alabi, Atnafu Lambebo Tonja, Christine Mwase, Odunayo Ogundepo, Bonaventure F. P. Dossou, Akintunde Oladipo, Doreen Nixdorf, Chris Chinenye Emezue, Sana Al-Azzawi, Blessing Sibanda, Davis David, Lolwethu Ndolela, Jonathan Mukiibi, Tunde Ajayi, Tatiana Moteu, Brian Odhiambo, Abraham Owodunni, Nnaemeka Obiefuna, Muhidin Mohamed, Shamsuddeen Hassan Muhammad, Teshome Mulugeta Ababu, Saheed Abdullahi Salahudeen, et al. (40 additional authors not shown)

    Abstract: African languages are severely under-represented in NLP research due to lack of datasets covering several NLP tasks. While there are individual language specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographical and typologically-diverse African…

    Submitted 20 September, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

    Comments: Accepted to IJCNLP-AACL 2023 (main conference)

  4. arXiv:2304.06459

    cs.CL cs.AI

    Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages

    Authors: Israel Abebe Azime, Sana Sabah Al-Azzawi, Atnafu Lambebo Tonja, Iyanuoluwa Shode, Jesujoba Alabi, Ayodele Awokoya, Mardiyyah Oduwole, Tosin Adewumi, Samuel Fanijo, Oyinkansola Awosan, Oreen Yousuf

    Abstract: AfriSenti-SemEval Shared Task 12 of SemEval-2023. The task aims to perform monolingual sentiment classification (sub-task A) for 12 African languages, multilingual sentiment classification (sub-task B), and zero-shot sentiment classification (task C). For sub-task A, we conducted experiments using classical machine learning classifiers, Afro-centric language models, and language-specific models. F…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: SemEval 2023

  5. arXiv:2304.04029

    cs.CL

    Bipol: A Novel Multi-Axes Bias Evaluation Metric with Explainability for NLP

    Authors: Lama Alkhaled, Tosin Adewumi, Sana Sabah Sabry

    Abstract: We introduce bipol, a new metric with explainability, for estimating social bias in text data. Harmful bias is prevalent in many online sources of data that are used for training machine learning (ML) models. In a step to address this challenge we create a novel metric that involves a two-step process: corpus-level evaluation based on model classification and sentence-level evaluation based on (se…

    Submitted 16 September, 2023; v1 submitted 8 April, 2023; originally announced April 2023.

    Comments: Published in Elsevier's Natural Language Processing Journal

  6. Lon-ea at SemEval-2023 Task 11: A Comparison of Activation Functions for Soft and Hard Label Prediction

    Authors: Peyman Hosseini, Mehran Hosseini, Sana Sabah Al-Azzawi, Marcus Liwicki, Ignacio Castro, Matthew Purver

    Abstract: We study the influence of different activation functions in the output layer of deep neural network models for soft and hard label prediction in the learning with disagreement task. In this task, the goal is to quantify the amount of disagreement via predicting soft labels. To predict the soft labels, we use BERT-based preprocessors and encoders and vary the activation function used in the output…

    Submitted 3 January, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

    Comments: Accepted in ACL 2023 SemEval Workshop as selected task paper

    ACM Class: I.2.7

  7. arXiv:2301.12139

    cs.CL

    Bipol: Multi-axes Evaluation of Bias with Explainability in Benchmark Datasets

    Authors: Tosin Adewumi, Isabella Södergren, Lama Alkhaled, Sana Sabah Sabry, Foteini Liwicki, Marcus Liwicki

    Abstract: We investigate five English NLP benchmark datasets (on the superGLUE leaderboard) and two Swedish datasets for bias, along multiple axes. The datasets are the following: Boolean Question (Boolq), CommitmentBank (CB), Winograd Schema Challenge (WSC), Wino-gender diagnostic (AXg), Recognising Textual Entailment (RTE), Swedish CB, and SWEDN. Bias can be harmful and it is known to be common in data, w…

    Submitted 16 September, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

    Comments: Accepted at RANLP 2023

  8. arXiv:2210.05480

    cs.CL

    T5 for Hate Speech, Augmented Data and Ensemble

    Authors: Tosin Adewumi, Sana Sabah Sabry, Nosheen Abid, Foteini Liwicki, Marcus Liwicki

    Abstract: We conduct relatively extensive investigations of automatic hate speech (HS) detection using different state-of-the-art (SoTA) baselines over 11 subtasks of 6 different datasets. Our motivation is to determine which of the recent SoTA models is best for automatic hate speech detection and what advantage methods like data augmentation and ensemble may have on the best model, if any. We carry out 6…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: 15 pages, 18 figures

  9. arXiv:2202.05690

    cs.CL

    HaT5: Hate Language Identification using Text-to-Text Transfer Transformer

    Authors: Sana Sabah Sabry, Tosin Adewumi, Nosheen Abid, György Kovács, Foteini Liwicki, Marcus Liwicki

    Abstract: We investigate the performance of a state-of-the-art (SoTA) architecture T5 (available on the SuperGLUE) and compare it with 3 other previous SoTA architectures across 5 different tasks from 2 relatively diverse datasets. The datasets are diverse in terms of the number and types of tasks they have. To improve performance, we augment the training data by using an autoregressive model. We achieve ne…

    Submitted 11 February, 2022; originally announced February 2022.

    Comments: 7 pages, 3 figures, conference

    MSC Class: 68