
Showing 1–3 of 3 results for author: Yigezu, M G

  1. arXiv:2403.13737  [pdf, ps, other]

    cs.CL

    EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation

    Authors: Atnafu Lambebo Tonja, Israel Abebe Azime, Tadesse Destaw Belay, Mesay Gemeda Yigezu, Moges Ahmed Mehamed, Abinew Ali Ayele, Ebrahim Chekol Jibril, Michael Melese Woldeyohannis, Olga Kolesnikova, Philipp Slusallek, Dietrich Klakow, Shengwu Xiong, Seid Muhie Yimam

    Abstract: Large language models (LLMs) have gained popularity recently due to their outstanding performance in various downstream Natural Language Processing (NLP) tasks. However, low-resource languages are still lagging behind current state-of-the-art (SOTA) developments in the field of NLP due to insufficient resources to train LLMs. Ethiopian languages exhibit remarkable linguistic diversity, encompassin…

    Submitted 23 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-Coling 2024

  2. arXiv:2304.09972  [pdf, other]

    cs.CL

    MasakhaNEWS: News Topic Classification for African languages

    Authors: David Ifeoluwa Adelani, Marek Masiak, Israel Abebe Azime, Jesujoba Alabi, Atnafu Lambebo Tonja, Christine Mwase, Odunayo Ogundepo, Bonaventure F. P. Dossou, Akintunde Oladipo, Doreen Nixdorf, Chris Chinenye Emezue, sana al-azzawi, Blessing Sibanda, Davis David, Lolwethu Ndolela, Jonathan Mukiibi, Tunde Ajayi, Tatiana Moteu, Brian Odhiambo, Abraham Owodunni, Nnaemeka Obiefuna, Muhidin Mohamed, Shamsuddeen Hassan Muhammad, Teshome Mulugeta Ababu, Saheed Abdullahi Salahudeen, et al. (40 additional authors not shown)

    Abstract: African languages are severely under-represented in NLP research due to lack of datasets covering several NLP tasks. While there are individual language specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographical and typologically-diverse African…

    Submitted 20 September, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

    Comments: Accepted to IJCNLP-AACL 2023 (main conference)

  3. arXiv:2211.14459  [pdf, other]

    cs.CL cs.AI

    Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts

    Authors: Atnafu Lambebo Tonja, Mesay Gemeda Yigezu, Olga Kolesnikova, Moein Shahiki Tash, Grigori Sidorov, Alexander Gelbuk

    Abstract: Using code-mixed data in natural language processing (NLP) research currently gets a lot of attention. Language identification of social media code-mixed text has been an interesting problem of study in recent years due to the advancement and influences of social media in communication. This paper presents the Instituto Politécnico Nacional, Centro de Investigación en Computación (CIC) team's syst…

    Submitted 25 November, 2022; originally announced November 2022.