-
AudRandAug: Random Image Augmentations for Audio Classification
Authors:
Teerath Kumar,
Muhammad Turab,
Alessandra Mileo,
Malika Bendechache,
Takfarinas Saber
Abstract:
Data augmentation has proven to be effective in training neural networks. Recently, a method called RandAug was proposed, randomly selecting data augmentation techniques from a predefined search space. RandAug has demonstrated significant performance improvements for image-related tasks while imposing minimal computational overhead. However, no prior research has explored the application of RandAu…
▽ More
Data augmentation has proven to be effective in training neural networks. Recently, a method called RandAug was proposed, randomly selecting data augmentation techniques from a predefined search space. RandAug has demonstrated significant performance improvements for image-related tasks while imposing minimal computational overhead. However, no prior research has explored the application of RandAug specifically for audio data augmentation, which converts audio into an image-like pattern. To address this gap, we introduce AudRandAug, an adaptation of RandAug for audio data. AudRandAug selects data augmentation policies from a dedicated audio search space. To evaluate the effectiveness of AudRandAug, we conducted experiments using various models and datasets. Our findings indicate that AudRandAug outperforms other existing data augmentation methods regarding accuracy performance.
△ Less
Submitted 9 September, 2023;
originally announced September 2023.
-
Go Together: Bridging the Gap between Learners and Teachers
Authors:
Asim Irfan,
Atif Nawaz,
Muhammad Turab,
Muhmmad Azeem,
Mashal Adnan,
Ahsan Mehmood,
Sarfaraz Ahmed,
Adnan Ashraf
Abstract:
After the pandemic, humanity has been facing different types of challenges. Social relationships, societal values, and academic and professional behavior have been hit the most. People are shifting their routines to social media and gadgets, and getting addicted to their isolation. This sudden change in their lives has caused an unusual social breakdown and endangered their mental health. In mid-2…
▽ More
After the pandemic, humanity has been facing different types of challenges. Social relationships, societal values, and academic and professional behavior have been hit the most. People are shifting their routines to social media and gadgets, and getting addicted to their isolation. This sudden change in their lives has caused an unusual social breakdown and endangered their mental health. In mid-2021, Pakistan's first Human Library was established under Hel**Mind to overcome these effects. Despite online sessions and webinars, Hel**Mind needs technology to reach the masses. In this work, we customized the UI or UX of a Go Together Mobile Application (GTMA) to meet the requirements of the client organization. A very interesting concept of the book (expert listener or psychologist) and the reader is introduced in GTMA. It offers separate dashboards, separate reviews or rating systems, booking, and venue information to engage the human reader with his or her favorite human book. The loyalty program enables the members to avail discounts through a mobile application and its membership is global where both the human-reader and human-books can register under the platform. The minimum viable product has been approved by our client organization.
△ Less
Submitted 23 July, 2023;
originally announced August 2023.
-
Advanced Audio Aid for Blind People
Authors:
Savera Sarwar,
Muhammad Turab,
Danish Channa,
Aisha Chandio,
M. Uzair Sohu,
Vikram Kumar
Abstract:
One of the most important senses in human life is vision, without it life is totally filled with darkness. According to WHO globally millions of people are visually impaired estimated there are 285 million, of whom some millions are blind. Unfortunately, there are around 2.4 million people are blind in our beloved country Pakistan. Human are a crucial part of society and the blind community is a m…
▽ More
One of the most important senses in human life is vision, without it life is totally filled with darkness. According to WHO globally millions of people are visually impaired estimated there are 285 million, of whom some millions are blind. Unfortunately, there are around 2.4 million people are blind in our beloved country Pakistan. Human are a crucial part of society and the blind community is a main part of society. The technologies are grown so far to make the life of humans easier more comfortable and more reliable for. However, this disability of the blind community would reduce their chance of using such innovative products. Therefore, the visually impaired community believe that they are burden to other societies and they do not capture in normal activities separates the blind people from society and because of this believe did not participate in the normally tasks of society . The visual impair people mainly face most of the problems in this real-time The aim of this work is to turn the real time world into an audio world by telling blind person about the objects in their way and can read printed text. This will enable blind persons to identify the things and read the text without any external help just by using the object detection and reading system in real time. Objective of this work: i) Object detection ii) Read printed text, using state-of-the-art (SOTA) technology.
△ Less
Submitted 17 November, 2022;
originally announced December 2022.
-
Data Dimension Reduction makes ML Algorithms efficient
Authors:
Wisal Khan,
Muhammad Turab,
Waqas Ahmad,
Syed Hasnat Ahmad,
Kelash Kumar,
Bin Luo
Abstract:
Data dimension reduction (DDR) is all about map** data from high dimensions to low dimensions, various techniques of DDR are being used for image dimension reduction like Random Projections, Principal Component Analysis (PCA), the Variance approach, LSA-Transform, the Combined and Direct approaches, and the New Random Approach. Auto-encoders (AE) are used to learn end-to-end map**. In this pap…
▽ More
Data dimension reduction (DDR) is all about map** data from high dimensions to low dimensions, various techniques of DDR are being used for image dimension reduction like Random Projections, Principal Component Analysis (PCA), the Variance approach, LSA-Transform, the Combined and Direct approaches, and the New Random Approach. Auto-encoders (AE) are used to learn end-to-end map**. In this paper, we demonstrate that pre-processing not only speeds up the algorithms but also improves accuracy in both supervised and unsupervised learning. In pre-processing of DDR, first PCA based DDR is used for supervised learning, then we explore AE based DDR for unsupervised learning. In PCA based DDR, we first compare supervised learning algorithms accuracy and time before and after applying PCA. Similarly, in AE based DDR, we compare unsupervised learning algorithm accuracy and time before and after AE representation learning. Supervised learning algorithms including support-vector machines (SVM), Decision Tree with GINI index, Decision Tree with entropy and Stochastic Gradient Descent classifier (SGDC) and unsupervised learning algorithm including K-means clustering, are used for classification purpose. We used two datasets MNIST and FashionMNIST Our experiment shows that there is massive improvement in accuracy and time reduction after pre-processing in both supervised and unsupervised learning.
△ Less
Submitted 17 November, 2022;
originally announced November 2022.
-
Investigating Multi-Feature Selection and Ensembling for Audio Classification
Authors:
Muhammad Turab,
Teerath Kumar,
Malika Bendechache,
Takfarinas Saber
Abstract:
Deep Learning (DL) algorithms have shown impressive performance in diverse domains. Among them, audio has attracted many researchers over the last couple of decades due to some interesting patterns--particularly in classification of audio data. For better performance of audio classification, feature selection and combination play a key role as they have the potential to make or break the performan…
▽ More
Deep Learning (DL) algorithms have shown impressive performance in diverse domains. Among them, audio has attracted many researchers over the last couple of decades due to some interesting patterns--particularly in classification of audio data. For better performance of audio classification, feature selection and combination play a key role as they have the potential to make or break the performance of any DL model. To investigate this role, we conduct an extensive evaluation of the performance of several cutting-edge DL models (i.e., Convolutional Neural Network, EfficientNet, MobileNet, Supper Vector Machine and Multi-Perceptron) with various state-of-the-art audio features (i.e., Mel Spectrogram, Mel Frequency Cepstral Coefficients, and Zero Crossing Rate) either independently or as a combination (i.e., through ensembling) on three different datasets (i.e., Free Spoken Digits Dataset, Audio Urdu Digits Dataset, and Audio Gujarati Digits Dataset). Overall, results suggest feature selection depends on both the dataset and the model. However, feature combinations should be restricted to the only features that already achieve good performances when used individually (i.e., mostly Mel Spectrogram, Mel Frequency Cepstral Coefficients). Such feature combination/ensembling enabled us to outperform the previous state-of-the-art results irrespective of our choice of DL model.
△ Less
Submitted 15 June, 2022;
originally announced June 2022.