AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages
Authors:
Odunayo Ogundepo,
Tajuddeen R. Gwadabe,
Clara E. Rivera,
Jonathan H. Clark,
Sebastian Ruder,
David Ifeoluwa Adelani,
Bonaventure F. P. Dossou,
Abdou Aziz DIOP,
Claytone Sikasote,
Gilles Hacheme,
Happy Buzaaba,
Ignatius Ezeani,
Rooweither Mabuya,
Salomey Osei,
Chris Emezue,
Albert Njoroge Kahira,
Shamsuddeen H. Muhammad,
Akintunde Oladipo,
Abraham Toluwase Owodunni,
Atnafu Lambebo Tonja,
Iyanuoluwa Shode,
Akari Asai,
Tunde Oluwaseyi Ajayi,
Clemencia Siro,
Steven Arthur, et al. (27 additional authors not shown)
Abstract:
African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems -- those that retrieve answer content from other languages while serving people in their native language -- offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, AfriQA focuses on languages where cross-lingual answer content is the only high-coverage source of answer content. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, AfriQA proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.
Submitted 11 May, 2023;
originally announced May 2023.
The African Stopwords project: curating stopwords for African languages
Authors:
Chris Emezue,
Hellina Nigatu,
Cynthia Thinwa,
Helper Zhou,
Shamsuddeen Muhammad,
Lerato Louis,
Idris Abdulmumin,
Samuel Oyerinde,
Benjamin Ajibade,
Olanrewaju Samuel,
Oviawe Joshua,
Emeka Onwuegbuzia,
Handel Emezue,
Ifeoluwatayo A. Ige,
Atnafu Lambebo Tonja,
Chiamaka Chukwuneke,
Bonaventure F. P. Dossou,
Naome A. Etori,
Mbonu Chinedu Emmanuel,
Oreen Yousuf,
Kaosarat Aina,
Davis David
Abstract:
Stopwords are fundamental in Natural Language Processing (NLP) techniques for information retrieval. One of the common tasks in preprocessing of text data is the removal of stopwords. Currently, while high-resource languages like English benefit from the availability of several stopwords, low-resource languages, such as those found in the African continent, have none that are standardized and available for use in NLP packages. Stopwords in the context of African languages are understudied and can reveal information about the crossover between languages. The African Stopwords project aims to study and curate stopwords for African languages. In this paper, we present our current progress on ten African languages as well as future plans for the project.
Submitted 21 March, 2023;
originally announced April 2023.