-
What Happens When Small Is Made Smaller? Exploring the Impact of Compression on Small Data Pretrained Language Models
Authors:
Busayo Awobade,
Mardiyyah Oduwole,
Steven Kolawole
Abstract:
Compression techniques have been crucial in advancing machine learning by enabling efficient training and deployment of large-scale language models. However, these techniques have received limited attention in the context of low-resource language models, which are trained on even smaller amounts of data and under computational constraints, a scenario known as the "low-resource double-bind." This paper investigates the effectiveness of pruning, knowledge distillation, and quantization on an exclusively low-resourced, small-data language model, AfriBERTa. Through a battery of experiments, we assess the effects of compression on performance across several metrics beyond accuracy. Our study provides evidence that compression techniques significantly improve the efficiency and effectiveness of small-data language models, confirming that the prevailing beliefs regarding the effects of compression on large, heavily parameterized models hold true for less-parameterized, small-data models.
Submitted 6 April, 2024;
originally announced April 2024.
-
A robust regularized extreme learning machine for regression problems based on self-adaptive accelerated extra-gradient algorithm
Authors:
Muideen Adegoke,
Lateef O. Jolaoso,
Mardiyyah Oduwole
Abstract:
The Extreme Learning Machine (ELM) technique is a machine learning approach for constructing feed-forward neural networks with a single hidden layer and their models. The ELM model can be constructed while being trained by concurrently reducing both the modeling errors and the norm of the output weights. The squared loss is widely used in the objective function of ELM, which can then be treated as a LASSO problem and solved with the fast iterative shrinkage-thresholding algorithm (FISTA). In this paper, however, the minimization problem is solved from a variational inequality perspective, giving rise to improved algorithms that are more efficient than FISTA. A fast general extra-gradient algorithm, a form of first-order algorithm, is developed with inertial acceleration techniques and a variable stepsize that is updated at every iteration. Strong convergence and a linear rate of convergence of the proposed algorithm are established under mild conditions. Two experiments are presented to illustrate the computational advantage of the proposed method over other algorithms.
Submitted 27 January, 2024;
originally announced January 2024.
-
MasakhaNEWS: News Topic Classification for African languages
Authors:
David Ifeoluwa Adelani,
Marek Masiak,
Israel Abebe Azime,
Jesujoba Alabi,
Atnafu Lambebo Tonja,
Christine Mwase,
Odunayo Ogundepo,
Bonaventure F. P. Dossou,
Akintunde Oladipo,
Doreen Nixdorf,
Chris Chinenye Emezue,
Sana Al-Azzawi,
Blessing Sibanda,
Davis David,
Lolwethu Ndolela,
Jonathan Mukiibi,
Tunde Ajayi,
Tatiana Moteu,
Brian Odhiambo,
Abraham Owodunni,
Nnaemeka Obiefuna,
Muhidin Mohamed,
Shamsuddeen Hassan Muhammad,
Teshome Mulugeta Ababu,
Saheed Abdullahi Salahudeen,
et al. (40 additional authors not shown)
Abstract:
African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS -- a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited for zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern exploiting training (PET), prompting language models (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In the few-shot setting, we show that with as few as 10 examples per label, we achieved more than 90% (i.e. 86.0 F1 points) of the performance of full supervised training (92.6 F1 points) leveraging the PET approach.
Submitted 20 September, 2023; v1 submitted 19 April, 2023;
originally announced April 2023.
-
Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages
Authors:
Israel Abebe Azime,
Sana Sabah Al-Azzawi,
Atnafu Lambebo Tonja,
Iyanuoluwa Shode,
Jesujoba Alabi,
Ayodele Awokoya,
Mardiyyah Oduwole,
Tosin Adewumi,
Samuel Fanijo,
Oyinkansola Awosan,
Oreen Yousuf
Abstract:
AfriSenti-SemEval Shared Task 12 of SemEval-2023. The task aims to perform monolingual sentiment classification (sub-task A) for 12 African languages, multilingual sentiment classification (sub-task B), and zero-shot sentiment classification (task C). For sub-task A, we conducted experiments using classical machine learning classifiers, Afro-centric language models, and language-specific models. For task B, we fine-tuned multilingual pre-trained language models that support many of the languages in the task. For task C, we make use of a parameter-efficient Adapter approach that leverages monolingual texts in the target language for effective zero-shot transfer. Our findings suggest that using pre-trained Afro-centric language models improves performance for low-resource African languages. We also ran experiments using adapters for zero-shot tasks, and the results suggest that we can obtain promising results by using adapters with a limited amount of resources.
Submitted 13 April, 2023;
originally announced April 2023.