Skip to main content

Showing 1–6 of 6 results for author: Mbaye, D

.
  1. arXiv:2305.13989  [pdf, other

    cs.CL

    MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages

    Authors: Cheikh M. Bamba Dione, David Adelani, Peter Nabende, Jesujoba Alabi, Thapelo Sindane, Happy Buzaaba, Shamsuddeen Hassan Muhammad, Chris Chinenye Emezue, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jonathan Mukiibi, Blessing Sibanda, Bonaventure F. P. Dossou, Andiswa Bukula, Rooweither Mabuya, Allahsera Auguste Tapo, Edwin Munkoh-Buabeng, victoire Memdjokam Koagne, Fatoumata Ouoba Kabore, Amelia Taylor, Godson Kalipe, Tebogo Macucwa, Vukosi Marivate , et al. (19 additional authors not shown)

    Abstract: In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the UD (universal dependencies) guidelines. We conducted extensive POS baseline experiments using conditional random field and several multilingual pre-trained language models. We applied various cross-l… ▽ More

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023 (Main conference)

  2. arXiv:2305.08518  [pdf, other

    cs.CL cs.AI

    Beqi: Revitalize the Senegalese Wolof Language with a Robust Spelling Corrector

    Authors: Derguene Mbaye, Moussa Diallo

    Abstract: The progress of Natural Language Processing (NLP), although fast in recent years, is not at the same pace for all languages. African languages in particular are still behind and lack automatic processing tools. Some of these tools are very important for the development of these languages but also have an important role in many NLP applications. This is particularly the case for automatic spell che… ▽ More

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: 17 pages, 7 figures, 5 tables

  3. arXiv:2305.00606  [pdf, other

    cs.CL

    Low-Resourced Machine Translation for Senegalese Wolof Language

    Authors: Derguene Mbaye, Moussa Diallo, Thierno Ibrahima Diop

    Abstract: Natural Language Processing (NLP) research has made great advancements in recent years with major breakthroughs that have established new benchmarks. However, these advances have mainly benefited a certain group of languages commonly referred to as resource-rich such as English and French. Majority of other languages with weaker resources are then left behind which is the case for most African lan… ▽ More

    Submitted 30 April, 2023; originally announced May 2023.

    Comments: 14 pages, 5 figures, 2 Tables, 8th International Congress on Information and Communication Technology (ICICT 2023)

  4. arXiv:2210.12391  [pdf, other

    cs.CL

    MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition

    Authors: David Ifeoluwa Adelani, Graham Neubig, Sebastian Ruder, Shruti Rijhwani, Michael Beukman, Chester Palen-Michel, Constantine Lignos, Jesujoba O. Alabi, Shamsuddeen H. Muhammad, Peter Nabende, Cheikh M. Bamba Dione, Andiswa Bukula, Rooweither Mabuya, Bonaventure F. P. Dossou, Blessing Sibanda, Happy Buzaaba, Jonathan Mukiibi, Godson Kalipe, Derguene Mbaye, Amelia Taylor, Fatoumata Kabore, Chris Chinenye Emezue, Anuoluwapo Aremu, Perez Ogayo, Catherine Gitau , et al. (20 additional authors not shown)

    Abstract: African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity r… ▽ More

    Submitted 15 November, 2022; v1 submitted 22 October, 2022; originally announced October 2022.

    Comments: Accepted to EMNLP 2022 (updated Github link)

  5. arXiv:2205.02022  [pdf, other

    cs.CL

    A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation

    Authors: David Ifeoluwa Adelani, Jesujoba Oluwadara Alabi, Angela Fan, Julia Kreutzer, Xiaoyu Shen, Machel Reid, Dana Ruiter, Dietrich Klakow, Peter Nabende, Ernie Chang, Tajuddeen Gwadabe, Freshia Sackey, Bonaventure F. P. Dossou, Chris Chinenye Emezue, Colin Leong, Michael Beukman, Shamsuddeen Hassan Muhammad, Guyo Dub Jarso, Oreen Yousuf, Andre Niyongabo Rubungo, Gilles Hacheme, Eric Peter Wairagala, Muhammad Umair Nasir, Benjamin Ayoade Ajibade, Tunde Oluwaseyi Ajayi , et al. (20 additional authors not shown)

    Abstract: Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out in these datasets. This is primarily because many widely spoken languages are not well represented on the web and therefore excluded from the large-scale crawls used to create datasets. Furthermore, downstream users of these models… ▽ More

    Submitted 22 August, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: Accepted to NAACL 2022 (added evaluation data for amh, kin, nya, sna, xho)

  6. arXiv:2103.11811  [pdf

    cs.CL cs.AI

    MasakhaNER: Named Entity Recognition for African Languages

    Authors: David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D'souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jesujoba Alabi, Seid Muhie Yimam, Tajuddeen Gwadabe, Ignatius Ezeani, Rubungo Andre Niyongabo, Jonathan Mukiibi , et al. (36 additional authors not shown)

    Abstract: We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders. We detail characteristics of the languages to help researchers understand the challenges that these languages pose for NER. We… ▽ More

    Submitted 5 July, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: Accepted to TACL 2021, pre-MIT Press publication version