-
1st AfricaNLP Workshop Proceedings, 2020
Authors:
Kathleen Siminyu,
Laura Martinus,
Vukosi Marivate
Abstract:
Proceedings of the 1st AfricaNLP Workshop, held on 26 April 2020 alongside ICLR 2020 as a virtual conference (formerly planned for Addis Ababa, Ethiopia).
Submitted 20 November, 2020;
originally announced November 2020.
-
Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages
Authors:
Wilhelmina Nekoto,
Vukosi Marivate,
Tshinondiwa Matsila,
Timi Fasubaa,
Tajudeen Kolawole,
Taiwo Fagbohungbe,
Solomon Oluwole Akinola,
Shamsuddeen Hassan Muhammad,
Salomon Kabongo,
Salomey Osei,
Sackey Freshia,
Rubungo Andre Niyongabo,
Ricky Macharm,
Perez Ogayo,
Orevaoghene Ahia,
Musie Meressa,
Mofe Adeyemi,
Masabata Mokgesi-Selinga,
Lawrence Okegbemi,
Laura Jane Martinus,
Kolawole Tajudeen,
Kevin Degila,
Kelechi Ogueji,
Kathleen Siminyu,
Julia Kreutzer,
et al. (23 additional authors not shown)
Abstract:
Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem that goes beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role in information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT remains centered on a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all the agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets and MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released under https://github.com/masakhane-io/masakhane-mt.
Submitted 6 November, 2020; v1 submitted 5 October, 2020;
originally announced October 2020.
-
Neural Machine Translation for South Africa's Official Languages
Authors:
Laura Martinus,
Jason Webster,
Joanne Moonsamy,
Moses Shaba Jnr,
Ridha Moosa,
Robert Fairon
Abstract:
Recent advances in neural machine translation (NMT) have led to state-of-the-art results for many European-based translation tasks. However, despite these advances, there has been little focus on applying these methods to African languages. In this paper, we seek to address this gap by creating NMT benchmark BLEU scores between English and the ten remaining official languages of South Africa.
Submitted 8 May, 2020;
originally announced May 2020.
-
Masakhane -- Machine Translation For Africa
Authors:
Iroro Orife,
Julia Kreutzer,
Blessing Sibanda,
Daniel Whitenack,
Kathleen Siminyu,
Laura Martinus,
Jamiil Toure Ali,
Jade Abbott,
Vukosi Marivate,
Salomon Kabongo,
Musie Meressa,
Espoir Murhabazi,
Orevaoghene Ahia,
Elan van Biljon,
Arshath Ramkilowan,
Adewale Akinfaderin,
Alp Öktem,
Wole Akin,
Ghollah Kioko,
Kevin Degila,
Herman Kamper,
Bonaventure Dossou,
Chris Emezue,
Kelechi Ogueji,
Abdallah Bashir
Abstract:
Africa has over 2000 languages. Despite this, African languages account for a small portion of available resources and publications in Natural Language Processing (NLP). This is due to multiple factors, including a lack of focus from government and funding, discoverability, a lack of community, sheer language complexity, difficulty in reproducing papers, and a lack of benchmarks to compare techniques. To begin to address the identified problems, MASAKHANE, an open-source, continent-wide, distributed, online research effort for machine translation for African languages, was founded. In this paper, we discuss our methodology for building the community and spurring research from the African continent, and outline the community's success in addressing the identified problems affecting African NLP.
Submitted 13 March, 2020;
originally announced March 2020.
-
Benchmarking Neural Machine Translation for Southern African Languages
Authors:
Laura Martinus,
Jade Z. Abbott
Abstract:
Unlike major Western languages, most African languages are very low-resourced. Furthermore, the resources that do exist are often scattered and difficult to obtain and discover. As a result, the data and code for existing research have rarely been shared. This has led to a struggle to reproduce reported results, and few publicly available benchmarks for African machine translation models exist. To start to address these problems, we trained neural machine translation models for 5 Southern African languages on publicly available datasets. Code is provided for training the models and evaluating them on a newly released evaluation set, with the aim of spurring future research in the field for Southern African languages.
Submitted 17 June, 2019;
originally announced June 2019.
-
A Focus on Neural Machine Translation for African Languages
Authors:
Laura Martinus,
Jade Z. Abbott
Abstract:
African languages are numerous, complex and low-resourced. The datasets required for machine translation are difficult to discover, and existing research is hard to reproduce. Minimal attention has been given to machine translation for African languages, so there is scant research regarding the problems that arise when using machine translation techniques. To begin addressing these problems, we trained models to translate English into five of the official South African languages (Afrikaans, isiZulu, Northern Sotho, Setswana, Xitsonga), making use of modern neural machine translation techniques. The results obtained show the promise of using neural machine translation techniques for African languages. By providing reproducible, publicly available data, code and results, this research aims to provide a starting point for other researchers in African machine translation to compare to and build upon.
Submitted 14 June, 2019; v1 submitted 11 June, 2019;
originally announced June 2019.
-
Towards Neural Machine Translation for African Languages
Authors:
Jade Z. Abbott,
Laura Martinus
Abstract:
Given that South African education is in crisis, strategies for the improvement and sustainability of high-quality, up-to-date education must be explored. As education migrates online, the inclusion of machine translation for low-resourced local languages becomes necessary. This paper aims to spur the use of current neural machine translation (NMT) techniques for low-resourced local languages. The paper demonstrates state-of-the-art performance on English-to-Setswana translation using the Autshumato dataset. Using the Transformer architecture outperformed previous techniques by 5.33 BLEU points. This demonstrates the promise of using current NMT techniques for African languages.
Submitted 13 November, 2018;
originally announced November 2018.