Quranic Audio Dataset: Crowdsourced and Labeled Recitation from Non-Arabic Speakers
Authors:
Raghad Salameh,
Mohamad Al Mdfaa,
Nursultan Askarbekuly,
Manuel Mazzara
Abstract:
This paper addresses the challenge of learning to recite the Quran for non-Arabic speakers. We explore the possibility of crowdsourcing a carefully annotated Quranic dataset, on top of which AI models can be built to simplify the learning process. In particular, we use the volunteer-based crowdsourcing genre and implement a crowdsourcing API to gather audio assets. We integrated the API into an existing mobile application called NamazApp to collect audio recitations. We developed a crowdsourcing platform called Quran Voice for annotating the gathered audio assets. As a result, we have collected around 7000 Quranic recitations from a pool of 1287 participants across more than 11 non-Arabic countries, and we have annotated 1166 recitations from the dataset in six categories. We have achieved a crowd accuracy of 0.77, an inter-rater agreement of 0.63 between the annotators, and 0.89 between the labels assigned by the algorithm and the expert judgments.
Submitted 4 May, 2024;
originally announced May 2024.
Mapping the Unseen: Unified Promptable Panoptic Mapping with Dynamic Labeling using Foundation Models
Authors:
Mohamad Al Mdfaa,
Raghad Salameh,
Sergey Zagoruyko,
Gonzalo Ferrer
Abstract:
In the field of robotics and computer vision, efficient and accurate semantic mapping remains a significant challenge due to the growing demand for intelligent machines that can comprehend and interact with complex environments. Conventional panoptic mapping methods, however, are limited by predefined semantic classes, thus making them ineffective for handling novel or unforeseen objects. In response to this limitation, we introduce the Unified Promptable Panoptic Mapping (UPPM) method. UPPM utilizes recent advances in foundation models to enable real-time, on-demand label generation using natural language prompts. By incorporating a dynamic labeling strategy into traditional panoptic mapping techniques, UPPM provides significant improvements in adaptability and versatility while maintaining high performance levels in map reconstruction. We demonstrate our approach on real-world and simulated datasets. Results show that UPPM can accurately reconstruct scenes and segment objects while generating rich semantic labels through natural language interactions. A series of ablation experiments validated the advantages of foundation model-based labeling over fixed label sets.
Submitted 3 May, 2024;
originally announced May 2024.