Showing 1–1 of 1 results for author: Alrefaie, M T

  1. arXiv:2403.11130

    cs.CL

    Exploring Tokenization Strategies and Vocabulary Sizes for Enhanced Arabic Language Models

    Authors: Mohamed Taher Alrefaie, Nour Eldin Morsy, Nada Samir

    Abstract: This paper presents a comprehensive examination of the impact of tokenization strategies and vocabulary sizes on the performance of Arabic language models in downstream natural language processing tasks. Our investigation focused on the effectiveness of four tokenizers across various tasks, including News Classification, Hate Speech Detection, Sentiment Analysis, and Natural Language Inference. Le…

    Submitted 17 March, 2024; originally announced March 2024.