-
Evaluation of Semantic Search and its Role in Retrieved-Augmented-Generation (RAG) for Arabic Language
Authors:
Ali Mahboub,
Muhy Eddin Za'ter,
Bashar Al-Rfooh,
Yazan Estaitia,
Adnan Jaljuli,
Asma Hakouz
Abstract:
The latest advancements in machine learning and deep learning have brought forth the concept of semantic similarity, which has proven immensely beneficial in multiple applications and has largely replaced keyword search. However, evaluating semantic similarity and conducting searches for a specific query across various documents continue to be a complicated task. This complexity stems from the multifaceted nature of the task and the lack of standard benchmarks, and these challenges are further amplified for the Arabic language. This paper endeavors to establish a straightforward yet potent benchmark for semantic search in Arabic. Moreover, to precisely evaluate the effectiveness of these metrics and the dataset, we conduct our assessment of semantic search within the framework of retrieval-augmented generation (RAG).
Submitted 30 May, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
A New Benchmark for Evaluating Automatic Speech Recognition in the Arabic Call Domain
Authors:
Qusai Abo Obaidah,
Muhy Eddin Za'ter,
Adnan Jaljuli,
Ali Mahboub,
Asma Hakouz,
Bashar Al-Rfooh,
Yazan Estaitia
Abstract:
This work is an attempt to introduce a comprehensive benchmark for Arabic speech recognition, specifically tailored to address the challenges of telephone conversations in the Arabic language. Arabic, characterized by its rich dialectal diversity and phonetic complexity, presents a number of unique challenges for automatic speech recognition (ASR) systems. These challenges are further amplified in the domain of telephone calls, where audio quality, background noise, and conversational speech styles negatively affect recognition accuracy. Our work aims to establish a robust benchmark that not only encompasses the broad spectrum of Arabic dialects but also emulates the real-world conditions of call-based communications. By incorporating diverse dialectal expressions and accounting for the variable quality of call recordings, this benchmark seeks to provide a rigorous testing ground for the development and evaluation of ASR systems capable of navigating the complexities of Arabic speech in telephonic contexts. This work also attempts to establish a baseline performance evaluation using state-of-the-art ASR technologies.
Submitted 30 May, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Parallel Vertex Cover Algorithms on GPUs
Authors:
Peter Yamout,
Karim Barada,
Adnan Jaljuli,
Amer E. Mouawad,
Izzat El Hajj
Abstract:
Finding small vertex covers in a graph has applications in numerous domains. Two common formulations of the problem include: Minimum Vertex Cover, which finds the smallest vertex cover in a graph, and Parameterized Vertex Cover, which finds a vertex cover whose size is less than or equal to some parameter $k$. Algorithms for both formulations traverse a search tree, which grows exponentially with the size of the graph or the value of $k$.
Parallelizing the traversal of the vertex cover search tree on GPUs is challenging for multiple reasons. First, the search tree is a narrow binary tree, which makes it difficult to extract enough sub-trees to process in parallel to fully utilize the GPU's resources. Second, the search tree is highly imbalanced, which makes load balancing across a massive number of parallel GPU workers challenging. Third, keeping around all the intermediate state needed to traverse many sub-trees in parallel puts high pressure on the GPU's memory resources and may act as a limiting factor to parallelism.
To address these challenges, we propose an approach to traverse the vertex cover search tree in parallel using GPUs while handling dynamic load balancing. Each thread block traverses a different sub-tree using a local stack; however, we also use a global worklist to balance load. Blocks contribute branches of their sub-trees to the global worklist on an as-needed basis, while blocks that finish their sub-trees get new ones from the global worklist. We use degree arrays to represent intermediate graphs so that the representation is compact in memory, which avoids limiting parallelism, but also self-contained, which is necessary for load balancing. Our evaluation shows that, compared to prior work, our hybrid approach of using local stacks and a global worklist substantially improves performance and reduces load imbalance, especially on difficult instances of the problem.
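As a rough illustration of the search-tree traversal and the degree-array representation described in this abstract, the following is a minimal sequential Python sketch of Parameterized Vertex Cover. It is not the authors' GPU implementation: the function names, the max-degree branching rule, and the single Python list standing in for both the local stack and the global worklist are all assumptions made for illustration, and the graph is assumed to be simple and undirected with a symmetric adjacency list.

```python
from typing import List


def remove_vertex(adj: List[List[int]], deg: List[int], v: int) -> List[int]:
    """Return a new degree array with v placed in the cover (marked -1)."""
    new_deg = deg.copy()
    for u in adj[v]:
        if new_deg[u] >= 0:      # u is still present in the intermediate graph
            new_deg[u] -= 1
    new_deg[v] = -1              # -1 marks a removed (covered) vertex
    return new_deg


def has_vertex_cover(adj: List[List[int]], k: int) -> bool:
    """Decide whether the graph has a vertex cover of size <= k."""
    if not adj:
        return True
    # Each stack entry is a self-contained sub-problem: (degree array, budget).
    # Hypothetical stand-in for the local stack / global worklist.
    stack = [([len(nbrs) for nbrs in adj], k)]
    while stack:
        deg, budget = stack.pop()
        v = max(range(len(deg)), key=lambda i: deg[i])
        if deg[v] <= 0:
            return True          # no edges remain, so the cover is complete
        if budget == 0:
            continue             # edges remain but no budget: prune this branch
        # Branch 1: put v in the cover.
        stack.append((remove_vertex(adj, deg, v), budget - 1))
        # Branch 2: put all remaining neighbours of v in the cover.
        nbrs = [u for u in adj[v] if deg[u] >= 0]
        if len(nbrs) <= budget:
            child = deg
            for u in nbrs:
                child = remove_vertex(adj, child, u)
            stack.append((child, budget - len(nbrs)))
    return False


triangle = [[1, 2], [0, 2], [0, 1]]
print(has_vertex_cover(triangle, 1), has_vertex_cover(triangle, 2))
```

Because every stack entry carries its own degree array and remaining budget, any entry could in principle be handed to a different worker without further context, which is the self-contained property the abstract says load balancing requires.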
Submitted 21 April, 2022;
originally announced April 2022.