-
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Authors:
Paarth Neekhara,
Shehzeen Hussain,
Subhankar Ghosh,
Jason Li,
Rafael Valle,
Rohan Badlani,
Boris Ginsburg
Abstract:
Large Language Model (LLM) based text-to-speech (TTS) systems have demonstrated remarkable capabilities in handling large speech datasets and generating natural speech for new speakers. However, LLM-based TTS models are not robust as the generated output can contain repeating words, missing words and mis-aligned speech (referred to as hallucinations or attention errors), especially when the text contains multiple occurrences of the same token. We examine these challenges in an encoder-decoder transformer model and find that certain cross-attention heads in such models implicitly learn the text and speech alignment when trained for predicting speech tokens for a given text. To make the alignment more robust, we propose techniques utilizing CTC loss and attention priors that encourage monotonic cross-attention over the text tokens. Our guided attention training technique does not introduce any new learnable parameters and significantly improves robustness of LLM-based TTS models.
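The abstract says the method uses attention priors to encourage monotonic cross-attention but does not give their form. As an illustration only, the sketch below builds one common kind of static prior: a beta-binomial distribution over text positions whose mean slides along the diagonal as decoder frames advance, as used in some TTS alignment work. Whether this paper uses exactly this prior is an assumption. The names beta_binomial_prior and apply_prior and the scaling parameter are hypothetical.

```python
import math


def beta_binomial_prior(num_text: int, num_frames: int, scaling: float = 1.0) -> list[list[float]]:
    """Hypothetical helper: row t (one per decoder frame) is a beta-binomial
    pmf over text positions 0..num_text-1 whose mean moves along the diagonal."""
    n = num_text - 1
    prior = []
    for t in range(1, num_frames + 1):
        a = scaling * t
        b = scaling * (num_frames + 1 - t)
        row = []
        for k in range(num_text):
            # log pmf = log C(n, k) + log B(k + a, n - k + b) - log B(a, b)
            log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            log_beta_num = math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
            log_beta_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
            row.append(math.exp(log_comb + log_beta_num - log_beta_den))
        prior.append(row)
    return prior


def apply_prior(attn_row: list[float], prior_row: list[float], eps: float = 1e-8) -> list[float]:
    """Hypothetical helper: bias one decoder step's cross-attention toward the
    diagonal by multiplying with the prior, then renormalize to sum to 1."""
    biased = [a * (p + eps) for a, p in zip(attn_row, prior_row)]
    z = sum(biased)
    return [v / z for v in biased]


prior = beta_binomial_prior(num_text=5, num_frames=8)
# The expected text position of each row increases frame by frame (monotonic).
print([round(sum(k * v for k, v in enumerate(row)), 2) for row in prior])
```

An equivalent option is to add the log-prior to the attention logits before the softmax; the multiplicative form above is simply easier to read.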
Submitted 25 June, 2024;
originally announced June 2024.
-
Proactive Blockage Prediction for UAV assisted Handover in Future Wireless Network
Authors:
Iftikhar Ahmad,
Ahsan Raza Khan,
Abdul Jabbar,
Muhammad Alquraan,
Lina Mohjazi,
Masood Ur Rehman,
Muhammad Ali Imran,
Ahmed Zoha,
Sajjad Hussain
Abstract:
Future wireless communication applications demand seamless connectivity, higher throughput, and low latency, for which the millimeter-wave (mmWave) band is considered a promising technology. Nevertheless, line-of-sight (LoS) is often mandatory for mmWave communication, which makes these links sensitive to sudden changes in the environment. Therefore, it is necessary to maintain the LoS link for a reliable connection. One technique for maintaining LoS is proactive handover (HO). However, proactive HO is challenging because it requires continuous information about the surrounding wireless network to anticipate potential blockage. This paper presents a proactive blockage prediction mechanism in which an unmanned aerial vehicle (UAV) is used as the base station for HO. The proposed scheme uses computer vision (CV) to identify potential blocking objects and to estimate user speed and location. To assess the effectiveness of the proposed scheme, the system is evaluated using a publicly available dataset for blockage prediction. The study integrates scenarios from Vision-based Wireless (ViWi) and UAV channel modeling, generating wireless data samples relevant to UAVs. The antenna modeling on the UAV end incorporates a polarization-matched scenario to optimize signal reception. The results demonstrate that UAV-assisted handover not only ensures seamless connectivity but also enhances overall network performance by 20%. This research contributes to the advancement of proactive blockage mitigation strategies in wireless networks, showcasing the potential of UAVs as dynamic and adaptable base stations.
Submitted 26 June, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
M$^{2}$UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models
Authors:
Shansong Liu,
Atin Sakkeer Hussain,
Chenshuo Sun,
Ying Shan
Abstract:
Research leveraging large language models (LLMs) is surging. Many works harness the powerful reasoning capabilities of these models to comprehend various modalities, such as text, speech, images, and videos. They also use LLMs to understand human intention and generate desired outputs like images, videos, and music. However, research that combines both understanding and generation using LLMs is still limited and in its nascent stage. To address this gap, we introduce a Multi-modal Music Understanding and Generation (M$^{2}$UGen) framework that integrates an LLM's abilities to comprehend and generate music for different modalities. The M$^{2}$UGen framework is purpose-built to unlock creative potential from diverse sources of inspiration, encompassing music, image, and video, through the use of pretrained MERT, ViT, and ViViT models, respectively. To enable music generation, we explore the use of AudioLDM 2 and MusicGen. Multi-modal understanding and music generation are bridged through the integration of the LLaMA 2 model. Furthermore, we use the MU-LLaMA model to generate extensive datasets that support text/image/video-to-music generation, facilitating the training of our M$^{2}$UGen framework. We conduct a thorough evaluation of our proposed framework. The experimental results demonstrate that our model matches or surpasses the performance of current state-of-the-art models.
Submitted 4 March, 2024; v1 submitted 19 November, 2023;
originally announced November 2023.
-
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Authors:
Paarth Neekhara,
Shehzeen Hussain,
Rafael Valle,
Boris Ginsburg,
Rishabh Ranjan,
Shlomo Dubnov,
Farinaz Koushanfar,
Julian McAuley
Abstract:
We propose SelfVC, a training strategy to iteratively improve a voice conversion model with self-synthesized examples. Previous efforts on voice conversion focus on factorizing speech into explicitly disentangled representations that separately encode speaker characteristics and linguistic content. However, disentangling speech representations to capture such attributes using task-specific loss terms can lead to information loss. In this work, instead of explicitly disentangling attributes with loss terms, we present a framework to train a controllable voice conversion model on entangled speech representations derived from self-supervised learning (SSL) and speaker verification models. First, we develop techniques to derive prosodic information from the audio signal and SSL representations to train predictive submodules in the synthesis model. Next, we propose a training strategy to iteratively improve the synthesis model for voice conversion, by creating a challenging training objective using self-synthesized examples. We demonstrate that incorporating such self-synthesized examples during training improves the speaker similarity of generated speech as compared to a baseline voice conversion model trained solely on heuristically perturbed inputs. Our framework is trained without any text and achieves state-of-the-art results in zero-shot voice conversion on metrics evaluating naturalness, speaker similarity, and intelligibility of synthesized audio.
Submitted 3 May, 2024; v1 submitted 14 October, 2023;
originally announced October 2023.
-
Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning
Authors:
Shansong Liu,
Atin Sakkeer Hussain,
Chenshuo Sun,
Ying Shan
Abstract:
Text-to-music generation (T2M-Gen) faces a major obstacle due to the scarcity of large-scale publicly available music datasets with natural language captions. To address this, we propose the Music Understanding LLaMA (MU-LLaMA), capable of answering music-related questions and generating captions for music files. Our model utilizes audio representations from a pretrained MERT model to extract music features. However, obtaining a suitable dataset for training the MU-LLaMA model remains challenging, as existing publicly accessible audio question answering datasets lack the necessary depth for open-ended music question answering. To fill this gap, we present a methodology for generating question-answer pairs from existing audio captioning datasets and introduce the MusicQA Dataset designed for answering open-ended music-related questions. The experiments demonstrate that the proposed MU-LLaMA model, trained on our designed MusicQA dataset, achieves outstanding performance in both music question answering and music caption generation across various metrics, outperforming current state-of-the-art (SOTA) models in both fields and offering a promising advancement in the T2M-Gen research field.
Submitted 22 August, 2023;
originally announced August 2023.
-
A Hybrid Deep Spatio-Temporal Attention-Based Model for Parkinson's Disease Diagnosis Using Resting State EEG Signals
Authors:
Niloufar Delfan,
Mohammadreza Shahsavari,
Sadiq Hussain,
Robertas Damaševičius,
U. Rajendra Acharya
Abstract:
Parkinson's disease (PD), a severe and progressive neurological illness, affects millions of individuals worldwide. For effective treatment and management of PD, an accurate and early diagnosis is crucial. This study presents a deep learning-based model for the diagnosis of PD using resting state electroencephalogram (EEG) signals. The objective of the study is to develop an automated model that can extract complex hidden nonlinear features from EEG and demonstrate its generalizability on unseen data. The model is a hybrid architecture consisting of a convolutional neural network (CNN), a bidirectional gated recurrent unit (Bi-GRU), and an attention mechanism. The proposed method is evaluated on three public datasets (UC San Diego dataset, PRED-CT, and University of Iowa (UI) dataset), with one dataset used for training and the other two for evaluation. The results show that the proposed model can accurately diagnose PD with high performance on both the training and hold-out datasets. The model also performs well even when part of the input information is missing. The results of this work have significant implications for patient treatment and for ongoing investigations into the early detection of Parkinson's disease. The proposed model holds promise as a non-invasive and reliable technique for early PD detection using resting state EEG.
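The abstract names a CNN, Bi-GRU and attention hybrid without specifying the attention layer. A minimal sketch of one common option is shown below: attention pooling that scores each Bi-GRU time step against a learned query vector and returns the softmax-weighted sum of the hidden states. The dot-product scoring, the function name attention_pool and the toy inputs are assumptions, not the paper's design.

```python
import math


def attention_pool(hidden: list[list[float]], w: list[float]) -> tuple[list[float], list[float]]:
    """Hypothetical attention pooling over time steps.

    hidden: one feature vector per time step (e.g. Bi-GRU outputs).
    w: learned query vector (assumed); the score of step t is hidden[t] . w.
    Returns (context vector, attention weights).
    """
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in hidden]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    dim = len(hidden[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden)) for d in range(dim)]
    return context, alphas


# Toy example: three time steps of 2-D features; the query favours dimension 0.
ctx, weights = attention_pool([[1.0, 0.0], [0.2, 0.8], [0.0, 1.0]], [2.0, 0.0])
print(ctx, weights)
```

The pooled context vector would then feed a classifier head; the attention weights indicate which EEG time steps the model relied on.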
Submitted 14 August, 2023;
originally announced August 2023.
-
Verifiable Sustainability in Data Centers
Authors:
Syed Rafiul Hussain,
Patrick McDaniel,
Anshul Gandhi,
Kanad Ghose,
Kartik Gopalan,
Dongyoon Lee,
Yu David Liu,
Zhenhua Liu,
Shuai Mu,
Erez Zadok
Abstract:
Data centers have significant energy needs, both embodied and operational, affecting sustainability adversely. The current techniques and tools for collecting, aggregating, and reporting verifiable sustainability data are vulnerable to cyberattacks and misuse, requiring new security and privacy-preserving solutions. This paper outlines security challenges and research directions for addressing these pressing requirements.
Submitted 12 January, 2024; v1 submitted 22 July, 2023;
originally announced July 2023.
-
NetFlick: Adversarial Flickering Attacks on Deep Learning Based Video Compression
Authors:
Jung-Woo Chang,
Nojan Sheybani,
Shehzeen Samarah Hussain,
Mojan Javaheripi,
Seira Hidano,
Farinaz Koushanfar
Abstract:
Video compression plays a significant role in IoT devices for the efficient transport of visual data while satisfying all underlying bandwidth constraints. Deep learning-based video compression methods are rapidly replacing traditional algorithms and providing state-of-the-art results on edge devices. However, recently developed adversarial attacks demonstrate that digitally crafted perturbations can break the rate-distortion relationship of video compression. In this work, we present a real-world LED attack targeting video compression frameworks. Our physically realizable attack, dubbed NetFlick, can degrade the spatio-temporal correlation between successive frames by injecting flickering temporal perturbations. In addition, we propose universal perturbations that can degrade performance on incoming video without prior knowledge of its contents. Experimental results demonstrate that NetFlick can successfully deteriorate the performance of video compression frameworks in both digital and physical settings and can be further extended to attack downstream video classification networks.
Submitted 3 April, 2023;
originally announced April 2023.
-
ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations
Authors:
Shehzeen Hussain,
Paarth Neekhara,
Jocelyn Huang,
Jason Li,
Boris Ginsburg
Abstract:
In this work, we propose a zero-shot voice conversion method using speech representations trained with self-supervised learning. First, we develop a multi-task model to decompose a speech utterance into features such as linguistic content, speaker characteristics, and speaking style. To disentangle content and speaker representations, we propose a training strategy based on Siamese networks that encourages similarity between the content representations of the original and pitch-shifted audio. Next, we develop a synthesis model with pitch and duration predictors that can effectively reconstruct the speech signal from its decomposed representation. Our framework allows controllable and speaker-adaptive synthesis to perform zero-shot any-to-any voice conversion, achieving state-of-the-art results on metrics evaluating speaker similarity, intelligibility, and naturalness. Using just 10 seconds of data for a target speaker, our framework can perform voice swapping and achieves a speaker verification EER of 5.5% for seen speakers and 8.4% for unseen speakers.
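The equal error rate (EER) quoted above is the operating point where a verifier's false acceptance rate (impostor trials scored at or above the threshold) equals its false rejection rate (genuine trials scored below it). A minimal sketch that sweeps candidate thresholds over the observed scores is below. The function name and toy scores are illustrative; with few trials the two rates rarely cross exactly, so the sketch returns the average of the two rates at the closest point.

```python
def equal_error_rate(genuine: list[float], impostor: list[float]) -> float:
    """Approximate EER from similarity scores (higher = more likely same speaker)."""
    best_gap, best_eer = float("inf"), 1.0
    for th in sorted(set(genuine) | set(impostor)):
        far = sum(s >= th for s in impostor) / len(impostor)  # false acceptances
        frr = sum(s < th for s in genuine) / len(genuine)     # false rejections
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


# Toy scores: one impostor trial outscores one genuine trial.
print(equal_error_rate([0.9, 0.6, 0.4], [0.5, 0.3, 0.1]))
```

Production evaluations typically interpolate the ROC curve instead of using only observed scores as thresholds, which matters little once there are thousands of trials.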
Submitted 16 February, 2023;
originally announced February 2023.
-
Machine learning for accelerating the discovery of high performance low-cost solar cells: a systematic review
Authors:
Satyam Bhatti,
Habib Ullah Manzoor,
Bruno Michel,
Ruy Sebastian Bonilla,
Richard Abrams,
Ahmed Zoha,
Sajjad Hussain,
Rami Ghannam
Abstract:
Solar photovoltaic (PV) technology has emerged as an efficient and versatile method for converting the Sun's vast energy into electricity. Innovation in developing new materials and solar cell architectures is required to ensure that lightweight, portable, and flexible miniaturized electronic devices operate for long periods with reduced battery demand. Recent advances in biomedical implantable and wearable devices have coincided with a growing interest in efficient energy-harvesting solutions. Such devices primarily rely on rechargeable batteries to satisfy their energy needs. Moreover, Artificial Intelligence (AI) and Machine Learning (ML) techniques are touted as game changers in energy harvesting, especially in solar energy materials. In this article, we systematically review a range of ML techniques for optimizing the performance of low-cost solar cells for miniaturized electronic devices. Our systematic review reveals that these ML techniques can expedite the discovery of new solar cell materials and architectures. In particular, this review covers a broad range of ML techniques targeted at producing low-cost solar cells. Moreover, we present a new method of classifying the literature according to data synthesis, ML algorithms, optimization, and fabrication process. In addition, our review reveals that the Gaussian Process Regression (GPR) ML technique with Bayesian Optimization (BO) enables the design of the most promising low-cost solar cell architecture. Therefore, our review is a critical evaluation of existing ML techniques and is presented to guide researchers in discovering the next generation of low-cost solar cells using ML techniques.
Submitted 26 December, 2022;
originally announced December 2022.
-
A Survey on Energy Optimization Techniques in UAV-Based Cellular Networks: From Conventional to Machine Learning Approaches
Authors:
Attai Ibrahim Abubakar,
Iftikhar Ahmad,
Kenechi G. Omeke,
Metin Ozturk,
Cihat Ozturk,
Ali Makine Abdel-Salam,
Michael S. Mollel,
Qammer H. Abbasi,
Sajjad Hussain,
Muhammad Ali Imran
Abstract:
Wireless communication networks have been witnessing unprecedented demand due to the increasing number of connected devices and emerging bandwidth-hungry applications. Despite many competent technologies for capacity enhancement, such as millimeter wave communications and network densification, there is still room and need for further capacity enhancement in wireless communication networks, especially for unusual gatherings of people, such as sports competitions and musical concerts. Unmanned aerial vehicles (UAVs) have been identified as one of the promising options to enhance capacity due to their easy implementation, pop-up operation, and cost-effective nature. The main idea is to deploy base stations on UAVs and operate them as flying base stations, thereby bringing additional capacity to where it is needed. However, because UAVs mostly have limited energy storage, their energy consumption must be optimized to increase flight time. In this survey, we investigate different energy optimization techniques with a top-level classification in terms of the optimization algorithm employed: conventional and machine learning (ML). Such classification helps in understanding the state of the art and the current trend in terms of methodology. In this regard, various optimization techniques are identified from the related literature, and they are presented under the above-mentioned classes of employed optimization methods. In addition, for completeness, we include a brief tutorial on the optimization methods and on the power supply and charging mechanisms of UAVs. Moreover, novel concepts, such as reflective intelligent surfaces and landing spot optimization, are also covered to capture the latest trends in the literature.
Submitted 17 April, 2022;
originally announced April 2022.
-
Multi-task Voice Activated Framework using Self-supervised Learning
Authors:
Shehzeen Hussain,
Van Nguyen,
Shuhua Zhang,
Erik Visser
Abstract:
Self-supervised learning methods such as wav2vec 2.0 have shown promising results in learning speech representations from unlabelled and untranscribed speech data that are useful for speech recognition. Since these representations are learned without any task-specific supervision, they can also be useful for other voice-activated tasks like speaker verification, keyword spotting, and emotion classification. In our work, we propose a general-purpose framework for adapting a pre-trained wav2vec 2.0 model to different voice-activated tasks. We develop downstream network architectures that operate on the contextualized speech representations of wav2vec 2.0 to adapt the representations for solving a given task. Finally, we extend our framework to perform multi-task learning by jointly optimizing the network parameters on multiple voice-activated tasks using a shared transformer backbone. Both our single-task and multi-task frameworks achieve state-of-the-art results on speaker verification and keyword spotting benchmarks. Our best-performing models achieve 1.98% and 3.15% EER on the VoxCeleb1 test set when trained on VoxCeleb2 and VoxCeleb1, respectively, and 98.23% accuracy on the Google Speech Commands v1.0 keyword spotting dataset.
Submitted 19 March, 2022; v1 submitted 3 October, 2021;
originally announced October 2021.
-
Revenue Maximization through Cell Switching and Spectrum Leasing in 5G HetNets
Authors:
Attai Ibrahim Abubakar,
Cihat Ozturk,
Metin Ozturk,
Michael S. Mollel,
Syed Muhammad Asad,
Naveed Ul Hassan,
Sajjad Hussain,
Muhammad Ali Imran
Abstract:
One way of achieving improved capacity in mobile cellular networks is network densification. Even though densification increases the capacity of the network, it also leads to increased energy consumption, which can be curbed by dynamically switching off some base stations (BSs) during periods of low traffic. However, dynamic cell switching has the challenge of spectrum under-utilization, as the spectrum originally occupied by the BSs that are turned off remains dormant. This dormant spectrum can be leased by the primary network (PN) operators, who hold the license, to secondary network (SN) operators who cannot afford to purchase a spectrum license. This enables the PN to gain additional revenue from spectrum leasing as well as electricity cost savings due to reduced energy consumption. Therefore, in this work, we propose a cell switching and spectrum leasing framework based on the simulated annealing (SA) algorithm to maximize the revenue of the PN while respecting quality-of-service constraints. The performance evaluation reveals that the proposed method performs very close to the optimal exhaustive search method with a significant reduction in computational complexity.
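The abstract does not spell out the revenue model, so the sketch below uses a deliberately simplified, hypothetical one: switching off small cell i saves energy_saving[i] and earns lease_income[i] from leasing its spectrum, and the quality-of-service constraint is modeled as the macro cell having to absorb the traffic of every switched-off cell within macro_capacity. It shows a generic simulated annealing loop (single-cell toggle moves, Metropolis acceptance, geometric cooling) next to the exhaustive search it is compared against. All names and parameter values are assumptions, not the paper's formulation.

```python
import itertools
import math
import random


def revenue(state, energy_saving, lease_income, traffic, macro_capacity):
    """Hypothetical toy PN revenue; state[i] == 0 means small cell i is off."""
    offloaded = sum(t for s, t in zip(state, traffic) if s == 0)
    if offloaded > macro_capacity:
        return -math.inf  # QoS violated: the macro cell cannot absorb the offloaded traffic
    return sum(e + l for s, e, l in zip(state, energy_saving, lease_income) if s == 0)


def simulated_annealing(params, iters=5000, t0=10.0, cooling=0.999, seed=0):
    rng = random.Random(seed)
    n = len(params["traffic"])
    state = [1] * n  # all cells on: always feasible, revenue 0
    current = revenue(state, **params)
    best_state, best = state[:], current
    temp = t0
    for _ in range(iters):
        cand = state[:]
        cand[rng.randrange(n)] ^= 1  # neighbour: toggle one small cell on/off
        value = revenue(cand, **params)
        delta = value - current
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        # Infeasible candidates give delta = -inf, so exp(...) = 0 and they are rejected.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            state, current = cand, value
            if current > best:
                best_state, best = state[:], current
        temp *= cooling  # geometric cooling schedule
    return best_state, best


def exhaustive_search(params):
    """Optimal baseline: evaluate all 2^n on/off patterns."""
    n = len(params["traffic"])
    candidates = ([list(s), revenue(list(s), **params)]
                  for s in itertools.product([0, 1], repeat=n))
    return max(candidates, key=lambda pair: pair[1])


params = {"energy_saving": [3, 2, 4, 1], "lease_income": [2, 2, 1, 3],
          "traffic": [5, 3, 4, 2], "macro_capacity": 8}
print(simulated_annealing(params), exhaustive_search(params))
```

Exhaustive search grows as 2^n in the number of small cells, while the annealing loop runs a fixed number of iterations; that gap is the complexity trade-off the abstract refers to.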
Submitted 26 August, 2021;
originally announced August 2021.
-
UncertaintyFuseNet: Robust Uncertainty-aware Hierarchical Feature Fusion Model with Ensemble Monte Carlo Dropout for COVID-19 Detection
Authors:
Moloud Abdar,
Soorena Salari,
Sina Qahremani,
Hak-Keung Lam,
Fakhri Karray,
Sadiq Hussain,
Abbas Khosravi,
U. Rajendra Acharya,
Vladimir Makarenkov,
Saeid Nahavandi
Abstract:
The COVID-19 (Coronavirus disease 2019) pandemic has become a major global threat to human health and well-being. Thus, the development of computer-aided detection (CAD) systems capable of accurately distinguishing COVID-19 from other diseases using chest computed tomography (CT) and X-ray data is of immediate priority. Such automatic systems are usually based on traditional machine learning or deep learning methods. Unlike most existing studies, which used either CT scan or X-ray images in COVID-19 case classification, we present a simple but efficient deep learning feature fusion model, called UncertaintyFuseNet, which is able to accurately classify large datasets of both types of images. We argue that the uncertainty of the model's predictions should be taken into account in the learning process, even though most existing studies have overlooked it. We quantify the prediction uncertainty in our feature fusion model using the effective Ensemble MC Dropout (EMCD) technique. A comprehensive simulation study has been conducted to compare the results of our new model to existing approaches, evaluating the performance of competing models in terms of Precision, Recall, F-Measure, Accuracy and ROC curves. The obtained results demonstrate the efficiency of our model, which provided prediction accuracies of 99.08% and 96.35% for the considered CT scan and X-ray datasets, respectively. Moreover, our UncertaintyFuseNet model was generally robust to noise and performed well with previously unseen data. The source code of our implementation is freely available at: https://github.com/moloud1987/UncertaintyFuseNet-for-COVID-19-Classification.
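Monte Carlo dropout estimates predictive uncertainty by keeping dropout active at inference, running several stochastic forward passes, and averaging the resulting class probabilities; an ensemble variant repeats this over several independently trained models. The sketch below assumes that reading of EMCD and uses a toy linear classifier with dropout on its input features in place of the paper's fusion network, reporting predictive entropy as the uncertainty score. All names and parameters are hypothetical.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward_with_dropout(x, weights, p, rng):
    """Toy model (assumed): weights holds one weight vector per class.
    Inverted dropout on the inputs stays ON at inference time."""
    kept = [xi / (1 - p) if rng.random() >= p else 0.0 for xi in x]
    logits = [sum(w_i * k_i for w_i, k_i in zip(w, kept)) for w in weights]
    return softmax(logits)


def ensemble_mc_dropout(x, ensemble, p=0.2, passes=30, seed=0):
    """Average class probabilities over every model and every stochastic pass;
    return (mean probabilities, predictive entropy in nats)."""
    rng = random.Random(seed)
    samples = [forward_with_dropout(x, w, p, rng) for w in ensemble for _ in range(passes)]
    k = len(samples[0])
    mean = [sum(s[c] for s in samples) / len(samples) for c in range(k)]
    entropy = -sum(q * math.log(q) for q in mean if q > 0)
    return mean, entropy


# Two toy "ensemble members", each a 2-class linear model on 2 features.
models = [[[5.0, 5.0], [0.0, 0.0]], [[4.0, 4.0], [0.0, 0.0]]]
print(ensemble_mc_dropout([1.0, 1.0], models))
```

Higher entropy flags inputs on which the passes disagree; such cases could be routed to a clinician rather than auto-classified.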
Submitted 30 January, 2022; v1 submitted 18 May, 2021;
originally announced May 2021.
-
Combining a Convolutional Neural Network with Autoencoders to Predict the Survival Chance of COVID-19 Patients
Authors:
Fahime Khozeimeh,
Danial Sharifrazi,
Navid Hoseini Izadi,
Javad Hassannataj Joloudari,
Afshin Shoeibi,
Roohallah Alizadehsani,
Juan M. Gorriz,
Sadiq Hussain,
Zahra Alizadeh Sani,
Hossein Moosaei,
Abbas Khosravi,
Saeid Nahavandi,
Sheikh Mohammed Shariful Islam
Abstract:
COVID-19 has caused many deaths worldwide. The automation of the diagnosis of this virus is highly desired. Convolutional neural networks (CNNs) have shown outstanding classification performance on image datasets. To date, it appears that COVID computer-aided diagnosis systems based on CNNs and clinical information have not yet been analysed or explored. We propose a novel method, named the CNN-AE, to predict the survival chance of COVID-19 patients using a CNN trained with clinical information. Notably, the required resources to prepare CT images are expensive and limited compared to those required to collect clinical data, such as blood pressure, liver disease, etc. We evaluated our method using a publicly available clinical dataset that we collected. The dataset properties were carefully analysed to extract important features and compute the correlations of features. A data augmentation procedure based on autoencoders (AEs) was proposed to balance the dataset. The experimental results revealed that the average accuracy of the CNN-AE (96.05%) was higher than that of the CNN (92.49%). To demonstrate the generality of our augmentation method, we trained some existing mortality risk prediction methods on our dataset (with and without data augmentation) and compared their performances. We also evaluated our method using another dataset for further generality verification. To show that clinical data can be used for COVID-19 survival chance prediction, the CNN-AE was compared with multiple pre-trained deep models that were tuned based on CT images.
Submitted 8 August, 2021; v1 submitted 18 April, 2021;
originally announced April 2021.
-
WaveGuard: Understanding and Mitigating Audio Adversarial Examples
Authors:
Shehzeen Hussain,
Paarth Neekhara,
Shlomo Dubnov,
Julian McAuley,
Farinaz Koushanfar
Abstract:
There has been a recent surge in adversarial attacks on deep learning based automatic speech recognition (ASR) systems. These attacks pose new challenges to deep learning security and have raised significant concerns in deploying ASR systems in safety-critical applications. In this work, we introduce WaveGuard: a framework for detecting adversarial inputs that are crafted to attack ASR systems. Our framework incorporates audio transformation functions and analyses the ASR transcriptions of the original and transformed audio to detect adversarial inputs. We demonstrate that our defense framework is able to reliably detect adversarial examples constructed by four recent audio adversarial attacks, with a variety of audio transformation functions. With careful regard for best practices in defense evaluations, we analyze our proposed defense and its strength to withstand adaptive and robust attacks in the audio domain. We empirically demonstrate that audio transformations that recover audio from perceptually informed representations can lead to a strong defense that is robust against an adaptive adversary even in a complete white-box setting. Furthermore, WaveGuard can be used out of the box and integrated directly with any ASR model to efficiently detect audio adversarial examples, without the need for model retraining.
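The detection rule described above compares the ASR transcription of an input with that of its transformed version and flags a large disagreement. A minimal sketch of that comparison is shown below, assuming character error rate (CER) as the distance and a hypothetical threshold of 0.3. The ASR model and the audio transformation are left out, so the functions take the two transcripts directly; the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def is_adversarial(transcript_original: str, transcript_transformed: str,
                   threshold: float = 0.3) -> bool:
    """Hypothetical rule: benign audio should transcribe almost identically
    after the transformation; adversarial audio usually does not."""
    return char_error_rate(transcript_original, transcript_transformed) > threshold


print(is_adversarial("open the door", "open the door"))         # benign: transcripts agree
print(is_adversarial("delete all files", "the weather is nice"))  # suspicious: transcripts diverge
```

In practice the threshold would be chosen on held-out benign and adversarial examples to trade off false positives against detection rate.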
Submitted 4 March, 2021;
originally announced March 2021.
-
Fusion of convolution neural network, support vector machine and Sobel filter for accurate detection of COVID-19 patients using X-ray images
Authors:
Danial Sharifrazi,
Roohallah Alizadehsani,
Mohamad Roshanzamir,
Javad Hassannataj Joloudari,
Afshin Shoeibi,
Mahboobeh Jafari,
Sadiq Hussain,
Zahra Alizadeh Sani,
Fereshteh Hasanzadeh,
Fahime Khozeimeh,
Abbas Khosravi,
Saeid Nahavandi,
Maryam Panahiazar,
Assef Zare,
Sheikh Mohammed Shariful Islam,
U Rajendra Acharya
Abstract:
The coronavirus disease (COVID-19) is currently the most common contagious disease and is prevalent all over the world. The main challenge of this disease is initial diagnosis, to prevent secondary infections and its spread from one person to another. Therefore, it is essential to use an automatic diagnosis system alongside clinical procedures for the rapid diagnosis of COVID-19 to prevent its spread. Artificial intelligence techniques using computed tomography (CT) images of the lungs and chest radiography have the potential to achieve high diagnostic performance for COVID-19. In this study, a fusion of a convolutional neural network (CNN), a support vector machine (SVM), and a Sobel filter is proposed to detect COVID-19 using X-ray images. A new X-ray image dataset was collected, and the images were high-pass filtered with a Sobel filter to obtain their edges. These edge images are then fed to a CNN deep learning model followed by an SVM classifier, evaluated with a ten-fold cross-validation strategy. The method is designed to learn from limited data. Our results show that the proposed CNN-SVM with Sobel filtering (CNN-SVM+Sobel) achieved the highest classification accuracy, 99.02%, in detecting COVID-19, which shows that using a Sobel filter can improve the performance of the CNN. Unlike most other studies, this method does not use a pre-trained network. We have also validated our model on six public databases and obtained the highest performance. Hence, our developed model is ready for clinical application.
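The Sobel preprocessing step described above computes horizontal and vertical intensity gradients and combines them into an edge-magnitude image. A minimal NumPy sketch of that filtering step follows; it uses the standard 3x3 Sobel kernels and a "valid" (unpadded) window, which are assumptions, since the abstract does not specify padding or normalization, and it omits the CNN and SVM stages entirely.

```python
import numpy as np

# Standard 3x3 Sobel kernels: SOBEL_X responds to vertical edges, SOBEL_Y to horizontal ones.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def filter2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """2D cross-correlation over the 'valid' region (no padding)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def sobel_edges(image: np.ndarray) -> np.ndarray:
    """Gradient magnitude sqrt(gx^2 + gy^2); the kernel flip of true convolution
    would only change signs, which the magnitude discards."""
    gx = filter2d_valid(image, SOBEL_X)
    gy = filter2d_valid(image, SOBEL_Y)
    return np.hypot(gx, gy)
```

Flat regions map to zero and intensity steps map to large values, which is the "high-pass" behavior the abstract relies on to emphasize structure in the X-ray images.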
Submitted 13 February, 2021;
originally announced February 2021.
-
Uncertainty-Aware Semi-Supervised Method Using Large Unlabeled and Limited Labeled COVID-19 Data
Authors:
Roohallah Alizadehsani,
Danial Sharifrazi,
Navid Hoseini Izadi,
Javad Hassannataj Joloudari,
Afshin Shoeibi,
Juan M. Gorriz,
Sadiq Hussain,
Juan E. Arco,
Zahra Alizadeh Sani,
Fahime Khozeimeh,
Abbas Khosravi,
Saeid Nahavandi,
Sheikh Mohammed Shariful Islam,
U Rajendra Acharya
Abstract:
The new coronavirus has caused more than one million deaths and continues to spread rapidly. This virus targets the lungs, causing respiratory distress that can be mild or severe. X-ray or computed tomography (CT) images of the lungs can reveal whether or not the patient is infected with COVID-19. Many researchers are trying to improve COVID-19 detection using artificial intelligence. Our motivation is to develop an automatic method that can cope with scenarios in which preparing labeled data is time-consuming or expensive. In this article, we propose Semi-supervised Classification using Limited Labeled Data (SCLLD), relying on Sobel edge detection and Generative Adversarial Networks (GANs), to automate COVID-19 diagnosis. The GAN discriminator outputs a probability, which is used for classification in this work. The proposed system is trained using 10,000 CT scans collected from Omid Hospital, and a public dataset is also used to validate our system. The proposed method is compared with other state-of-the-art supervised methods such as Gaussian processes. To the best of our knowledge, this is the first semi-supervised method presented for COVID-19 detection. Our system can learn from a mixture of limited labeled and unlabeled data, a setting in which supervised learners fail due to an insufficient amount of labeled data. Thus, our semi-supervised training method significantly outperforms supervised training of a Convolutional Neural Network (CNN) when labeled training data is scarce. The 95% confidence intervals for our method in terms of accuracy, sensitivity, and specificity are 99.56 ± 0.20%, 99.88 ± 0.24%, and 99.40 ± 0.18%, respectively, whereas the intervals for the supervised CNN are 68.34 ± 4.11%, 91.2 ± 6.15%, and 46.40 ± 5.21%.
Submitted 24 December, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Expressive Neural Voice Cloning
Authors:
Paarth Neekhara,
Shehzeen Hussain,
Shlomo Dubnov,
Farinaz Koushanfar,
Julian McAuley
Abstract:
Voice cloning is the task of learning to synthesize the voice of an unseen speaker from a few samples. While current voice cloning methods achieve promising results in Text-to-Speech (TTS) synthesis for a new voice, these approaches lack the ability to control the expressiveness of synthesized audio. In this work, we propose a controllable voice cloning method that allows fine-grained control over various style aspects of the synthesized speech for an unseen speaker. We achieve this by explicitly conditioning the speech synthesis model on a speaker encoding, pitch contour and latent style tokens during training. Through both quantitative and qualitative evaluations, we show that our framework can be used for various expressive voice cloning tasks using only a few transcribed or untranscribed speech samples for a new speaker. These cloning tasks include style transfer from a reference speech, synthesizing speech directly from text, and fine-grained style control by manipulating the style conditioning variables during inference.
Submitted 30 January, 2021;
originally announced February 2021.
-
EEG based Major Depressive disorder and Bipolar disorder detection using Neural Networks: A review
Authors:
Sana Yasin,
Syed Asad Hussain,
Sinem Aslan,
Imran Raza,
Muhammad Muzammel,
Alice Othmani
Abstract:
Mental disorders represent critical public health challenges, as they are leading contributors to the global burden of disease and strongly influence the social and financial welfare of individuals. This comprehensive review concentrates on two mental disorders, Major Depressive Disorder (MDD) and Bipolar Disorder (BD), covering noteworthy publications from the last ten years. There is a pressing need for phenotypic characterization of psychiatric disorders with biomarkers. Electroencephalography (EEG) signals could offer a rich signature for MDD and BD and could thereby improve understanding of the pathophysiological mechanisms underlying these mental disorders. In this review, we focus on works that feed EEG signals to neural networks. Among those studies, we discuss a variety of EEG-based protocols, biomarkers, and public datasets for depression and bipolar disorder detection. We conclude with a discussion and recommendations that will help improve the reliability of developed models and lead to more accurate and more deterministic computational-intelligence-based systems in psychiatry. This review should serve as a structured and valuable starting point for researchers working on depression and bipolar disorder recognition using EEG signals.
Submitted 4 February, 2021; v1 submitted 28 September, 2020;
originally announced September 2020.
-
Automated Detection and Forecasting of COVID-19 using Deep Learning Techniques: A Review
Authors:
Afshin Shoeibi,
Marjane Khodatars,
Mahboobeh Jafari,
Navid Ghassemi,
Delaram Sadeghi,
Parisa Moridian,
Ali Khadem,
Roohallah Alizadehsani,
Sadiq Hussain,
Assef Zare,
Zahra Alizadeh Sani,
Fahime Khozeimeh,
Saeid Nahavandi,
U. Rajendra Acharya,
Juan M. Gorriz
Abstract:
Coronavirus, or COVID-19, is a hazardous disease that has endangered the health of many people around the world by directly affecting the lungs. The virus is a medium-sized, enveloped, single-stranded RNA virus with one of the largest RNA genomes and a size of approximately 120 nm. X-ray and computed tomography (CT) imaging modalities are widely used to obtain a fast and accurate medical diagnosis. Identifying COVID-19 from these medical images is extremely challenging, as it is time-consuming and prone to human error. Hence, artificial intelligence (AI) methodologies can be used to obtain consistently high performance. Among AI methods, deep learning (DL) networks have recently gained popularity compared to conventional machine learning (ML). Unlike in conventional ML, all stages of feature extraction, feature selection, and classification are accomplished automatically in DL models. In this paper, a complete survey of studies applying DL techniques to COVID-19 diagnosis and lung segmentation is presented, concentrating on works that used X-ray and CT images. Additionally, papers on forecasting coronavirus prevalence in different parts of the world with DL are reviewed. Lastly, the challenges faced in detecting COVID-19 with DL techniques and directions for future research are discussed.
Submitted 10 February, 2024; v1 submitted 16 July, 2020;
originally announced July 2020.
-
Energy Optimization in Ultra-Dense Radio Access Networks via Traffic-Aware Cell Switching
Authors:
Metin Ozturk,
Attai Ibrahim Abubakar,
João Pedro Battistella Nadas,
Rao Naveed Bin Rais,
Sajjad Hussain,
Muhammad Ali Imran
Abstract:
Ultra-dense deployments in 5G, the next generation of cellular networks, are one way to provide ultra-high throughput by bringing users closer to the base stations. However, 5G deployments must not incur a large increase in energy consumption, both to stay cost-effective and, most importantly, to reduce the carbon footprint of cellular networks. We propose a reinforcement learning cell switching algorithm to minimize energy consumption in ultra-dense deployments without compromising the quality of service (QoS) experienced by users. The proposed algorithm learns which small cells (SCs) to turn off at any given time based on the traffic load of the SCs and the macro cell. To validate the idea, we used the open call detail record (CDR) dataset from the city of Milan, Italy, and tested our algorithm against typical operational benchmark solutions. With the obtained results, we demonstrate exactly when and how the proposed algorithm provides energy savings, and how it does so without reducing users' QoS. Most importantly, we show that our solution performs very similarly to exhaustive search, with the advantage of being scalable and less complex.
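A toy version of the switching decision can be written as tabular Q-learning over a discretized traffic load. Everything in the sketch below is a hypothetical simplification, not the paper's formulation: the number of small cells, the capacity model (the macro cell and each active SC carry one load unit), the reward (power saved minus a QoS penalty) and the i.i.d. traffic model are assumptions, and the discount is set to 0 because in this toy the next load does not depend on the switching action.

```python
import numpy as np

N_SMALL_CELLS = 4   # hypothetical number of SCs under one macro cell
LOAD_LEVELS = 5     # traffic load discretized to 0 (idle) .. 4 (peak), in capacity units


def reward(load: int, n_off: int, p_sc: float = 10.0, qos_penalty: float = 100.0) -> float:
    """Hypothetical reward: power saved by sleeping SCs, minus a penalty when the
    macro cell (1 unit) plus the active SCs (1 unit each) cannot carry the load."""
    capacity = 1 + (N_SMALL_CELLS - n_off)
    return p_sc * n_off - (qos_penalty if load > capacity else 0.0)


def q_update(Q, s, a, r, s_next, lr=0.1, discount=0.0):
    """Tabular Q-learning: Q[s, a] += lr * (r + discount * max_a' Q[s', a'] - Q[s, a])."""
    Q[s, a] += lr * (r + discount * Q[s_next].max() - Q[s, a])


def train(steps=20000, epsilon=0.2, seed=0):
    """Epsilon-greedy Q-learning; action a means 'switch off a small cells'."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((LOAD_LEVELS, N_SMALL_CELLS + 1))
    load = int(rng.integers(LOAD_LEVELS))
    for _ in range(steps):
        if rng.random() < epsilon:
            a = int(rng.integers(N_SMALL_CELLS + 1))   # explore
        else:
            a = int(Q[load].argmax())                 # exploit
        next_load = int(rng.integers(LOAD_LEVELS))    # toy traffic: i.i.d., ignores the action
        q_update(Q, load, a, reward(load, a), next_load)
        load = next_load
    return Q
```

With these settings the learned greedy policy, `train().argmax(axis=1)`, sleeps more small cells at low load and keeps more of them active as the load rises. A realistic formulation would replace the i.i.d. load with the measured traffic traces and use a non-zero discount if switching decisions affect future states.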
Submitted 8 July, 2020;
originally announced July 2020.
-
Deep Learning for Neuroimaging-based Diagnosis and Rehabilitation of Autism Spectrum Disorder: A Review
Authors:
Marjane Khodatars,
Afshin Shoeibi,
Delaram Sadeghi,
Navid Ghassemi,
Mahboobeh Jafari,
Parisa Moridian,
Ali Khadem,
Roohallah Alizadehsani,
Assef Zare,
Yinan Kong,
Abbas Khosravi,
Saeid Nahavandi,
Sadiq Hussain,
U. Rajendra Acharya,
Michael Berk
Abstract:
Accurate diagnosis of Autism Spectrum Disorder (ASD) followed by effective rehabilitation is essential for managing this disorder. Artificial intelligence (AI) techniques can help physicians apply automatic diagnosis and rehabilitation procedures. AI techniques comprise traditional machine learning (ML) approaches and deep learning (DL) techniques. Conventional ML methods employ various feature extraction and classification techniques, whereas in DL, feature extraction and classification are accomplished intelligently and jointly. DL methods for ASD diagnosis have focused on neuroimaging-based approaches. Neuroimaging techniques provide non-invasive disease markers potentially useful for ASD diagnosis. Structural and functional neuroimaging techniques give physicians substantial information about the structure (anatomy and structural connectivity) and function (activity and functional connectivity) of the brain. Owing to the intricate structure and function of the brain, devising optimal procedures for ASD diagnosis from neuroimaging data without powerful AI techniques such as DL may be challenging. In this paper, studies that use DL networks to distinguish ASD are investigated. Rehabilitation tools that use DL networks to support ASD patients are also assessed. Finally, we present important challenges in the automated detection and rehabilitation of ASD and propose some future work.
Submitted 1 November, 2021; v1 submitted 2 July, 2020;
originally announced July 2020.
-
Epileptic Seizures Detection Using Deep Learning Techniques: A Review
Authors:
Afshin Shoeibi,
Marjane Khodatars,
Navid Ghassemi,
Mahboobeh Jafari,
Parisa Moridian,
Roohallah Alizadehsani,
Maryam Panahiazar,
Fahime Khozeimeh,
Assef Zare,
Hossein Hosseini-Nejad,
Abbas Khosravi,
Amir F. Atiya,
Diba Aminshahidi,
Sadiq Hussain,
Modjtaba Rouhani,
Saeid Nahavandi,
Udyavara Rajendra Acharya
Abstract:
A variety of screening approaches have been proposed to diagnose epileptic seizures using electroencephalography (EEG) and magnetic resonance imaging (MRI) modalities. Artificial intelligence encompasses a variety of areas, and one of its branches is deep learning (DL). Before the rise of DL, conventional machine learning algorithms relying on feature extraction were used, which limited their performance to the skill of those handcrafting the features. In DL, by contrast, feature extraction and classification are entirely automated. The advent of these techniques has led to significant advances in many areas of medicine, such as the diagnosis of epileptic seizures. In this study, a comprehensive overview of works on automated epileptic seizure detection using DL techniques and neuroimaging modalities is presented. Various methods proposed to diagnose epileptic seizures automatically using EEG and MRI modalities are described. In addition, rehabilitation systems developed for epileptic seizures using DL are analyzed and summarized. The rehabilitation tools include cloud computing techniques and the hardware required to implement DL algorithms. The important challenges in accurate automated detection of epileptic seizures using DL with EEG and MRI modalities are discussed. The advantages and limitations of employing DL-based techniques for epileptic seizure diagnosis are presented. Finally, the most promising DL models proposed and possible future work on automated epileptic seizure detection are delineated.
Submitted 29 May, 2021; v1 submitted 2 July, 2020;
originally announced July 2020.
-
Context-Aware Wireless Connectivity and Processing Unit Optimization for IoT Networks
Authors:
Metin Ozturk,
Attai Ibrahim Abubakar,
Rao Naveed Bin Rais,
Mona Jaber,
Sajjad Hussain,
Muhammad Ali Imran
Abstract:
A novel approach is presented in this work for context-aware connectivity and processing optimization of Internet of Things (IoT) networks. Unlike state-of-the-art approaches, the proposed approach simultaneously selects the best connectivity and processing unit (e.g., device, fog, or cloud), along with the percentage of data to be offloaded, by jointly optimizing energy consumption, response time, security, and monetary cost. The proposed scheme employs a reinforcement learning algorithm and achieves significant gains compared to deterministic solutions. In particular, the response-time and security requirements of IoT devices are taken as inputs, along with the remaining battery level of the devices, and the developed algorithm returns an optimized policy. The results show that only our method meets the holistic multi-objective optimization criteria; the benchmark approaches may achieve better results on a particular metric, but at the cost of failing to reach the other targets. Thus, the proposed approach is a device-centric, context-aware solution that accounts for monetary and battery constraints.
Submitted 29 April, 2020;
originally announced May 2020.
-
Vehicle Intrusion And Theft Control System Using GSM and GPS -- An advance and viable approach
Authors:
Ashad Mustafa,
Hassan Jameel,
Mohtashim Baqar,
Rameez Ahmed Khan,
Zeeshan M Yaqoob,
Zeeshan Rahim,
Syed Safdar Hussain
Abstract:
This paper presents a novel approach to the design and development of a feasible, embedded vehicle intrusion and theft control system using GSM (Global System for Mobile Communications) and GPS (Global Positioning System). The proposed system uses GSM technology as one of its distinguishing building blocks. A Holux GR89 GPS module is used to track the position of the vehicle, and mercury switches are used to collect analog data continuously; in case of an intrusion, variations are observed in the sensor readings. The microcontroller continuously collects these sensor readings and decides, on their basis, whether an intrusion has occurred. In case of an intrusion, a message from a predefined set of messages is sent to the owner of the vehicle, who can then reply via SMS to lock the gears of the vehicle or seize its engine from a remote location. A relay, driven by the microcontroller, controls the gears and engine of the vehicle. A prototype system was built and tested, and the results were positive and encouraging.
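The decision and command flow described above (sensor readings, intrusion decision, alert SMS, owner reply, relay action) can be sketched as plain functions. This is an illustration only: the deviation-from-baseline rule, the threshold, the message text, and the `LOCK`/`SEIZE` command words are hypothetical, and the GSM modem, GPS receiver, and relay I/O of the real microcontroller firmware are abstracted away.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Actuators:
    """Relay states driven by the microcontroller (hypothetical names)."""
    gears_locked: bool = False
    engine_seized: bool = False


def intrusion_detected(baseline: Sequence[float], readings: Sequence[float],
                       threshold: float) -> bool:
    """Declare an intrusion if any sensor deviates from its resting value by more than `threshold`."""
    return any(abs(r - b) > threshold for r, b in zip(readings, baseline))


def alert_message(lat: float, lon: float) -> str:
    """One message from a predefined set, tagged with the current GPS fix."""
    return f"INTRUSION ALERT at {lat:.5f},{lon:.5f}. Reply LOCK or SEIZE."


def handle_owner_sms(text: str, act: Actuators) -> Optional[str]:
    """Map the owner's SMS reply to a relay action; unknown commands are ignored."""
    command = text.strip().upper()
    if command == "LOCK":
        act.gears_locked = True
        return "Gears locked"
    if command == "SEIZE":
        act.engine_seized = True
        return "Engine seized"
    return None
```

Ignoring unrecognized replies, rather than acting on a best guess, keeps a stray or spoofed SMS from triggering the relays.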
Submitted 9 April, 2020;
originally announced April 2020.
-
FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA
Authors:
Shehzeen Hussain,
Mojan Javaheripi,
Paarth Neekhara,
Ryan Kastner,
Farinaz Koushanfar
Abstract:
Autoregressive convolutional neural networks (CNNs) have been widely exploited for sequence generation tasks such as audio synthesis, language modeling, and neural machine translation. WaveNet is a deep autoregressive CNN composed of several stacked layers of dilated convolution that is used for sequence generation. While WaveNet produces state-of-the-art audio generation results, the naive inference implementation is quite slow: it takes a few minutes to generate just one second of audio on a high-end GPU. In this work, we develop FastWave, the first accelerator platform for autoregressive convolutional neural networks, and address the associated design challenges. We design the Fast-Wavenet inference model in Vivado HLS and perform a wide range of optimizations, including fixed-point implementation, array partitioning, and pipelining. Our model uses a fully parameterized parallel architecture for fast matrix-vector multiplication that enables per-layer customized latency fine-tuning for further throughput improvement. Our experiments comparatively assess the trade-off between throughput and resource utilization for various optimizations. Our best WaveNet design on the Xilinx XCVU13P FPGA, which uses only on-chip memory, achieves 66x faster generation than a CPU implementation and 11x faster generation than a GPU implementation.
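The Fast-Wavenet idea that FastWave builds on avoids recomputing the whole receptive field for every new sample: each dilated layer keeps a queue of its past inputs, so producing one sample costs one kernel evaluation per layer. The sketch below illustrates that caching in plain Python under simplifying assumptions (hypothetical names, one channel per layer, kernel size 2, dilation doubling per layer, tanh activation, no residual or skip connections, floating point rather than fixed point) and checks it against a naive recomputation; it is not the HLS design.

```python
from collections import deque

import numpy as np


def naive_stack(x, weights):
    """Reference: run every dilated layer over the whole sequence from scratch."""
    h = np.asarray(x, dtype=float)
    for layer, (w_past, w_now) in enumerate(weights):
        d = 2 ** layer
        # past[t] = h[t - d], with zeros before the start of the sequence (causal padding)
        past = np.concatenate([np.zeros(d), h])[: len(h)]
        h = np.tanh(w_past * past + w_now * h)
    return h


class CachedStack:
    """Fast-Wavenet-style inference: one queue of past inputs per dilated layer."""

    def __init__(self, weights):
        self.weights = weights
        # Layer l with dilation d = 2**l must remember its last d inputs.
        self.queues = [deque([0.0] * 2 ** layer, maxlen=2 ** layer)
                       for layer in range(len(weights))]

    def step(self, x_t):
        """Consume one new input sample and return the top layer's output for it."""
        h = float(x_t)
        for (w_past, w_now), queue in zip(self.weights, self.queues):
            past = queue[0]   # this layer's input from d steps ago
            queue.append(h)   # maxlen evicts queue[0] automatically
            h = float(np.tanh(w_past * past + w_now * h))
        return h
```

`CachedStack.step` touches each layer once per sample, whereas rerunning `naive_stack` over the growing history repeats work for every new sample. In autoregressive generation the output of `step` would be fed back as the next input; the per-layer queues are also a natural fit for on-chip buffers.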
Submitted 9 February, 2020;
originally announced February 2020.
-
Quantum Calculus-based Volterra LMS for Nonlinear Channel Estimation
Authors:
Muhammad Usman,
Muhammad Sohail Ibrahim,
Jawwad Ahmad,
Syed Saiq Hussain,
Muhammad Moinuddin
Abstract:
A novel adaptive filtering method called $q$-Volterra least mean square ($q$-VLMS) is presented in this paper. The $q$-VLMS is a nonlinear extension of conventional LMS and is based on Jackson's derivative from $q$-calculus. In Volterra LMS, the large variance of the input signal makes convergence very slow. With appropriate manipulation, we improve the convergence performance of the Volterra LMS. The step-size bounds of the proposed algorithm are analyzed, and the analytical results are verified through computer simulations of a nonlinear channel estimation problem.
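Jackson's derivative replaces the limit in the ordinary derivative with a multiplicative step, $D_q f(x) = \frac{f(qx) - f(x)}{(q-1)x}$, which recovers $f'(x)$ as $q \to 1$; for monomials it gives $D_q x^n = [n]_q x^{n-1}$ with $[n]_q = \frac{q^n - 1}{q - 1}$. The sketch below shows that operator together with a conventional second-order Volterra LMS baseline (memory 2) of the kind $q$-VLMS extends. It deliberately does not implement the paper's $q$-VLMS update rule, which the abstract does not state; the regressor layout and step size are assumptions for illustration.

```python
import numpy as np


def jackson_derivative(f, x: float, q: float) -> float:
    """Jackson q-derivative D_q f(x) = (f(q x) - f(x)) / ((q - 1) x), for x != 0 and q != 1."""
    if x == 0 or q == 1:
        raise ValueError("D_q f(x) is evaluated here only for x != 0 and q != 1")
    return (f(q * x) - f(x)) / ((q - 1) * x)


def q_number(n: int, q: float) -> float:
    """[n]_q = (q**n - 1) / (q - 1), so that D_q x**n = [n]_q * x**(n - 1)."""
    return (q ** n - 1) / (q - 1)


def volterra_features(x_now: float, x_prev: float) -> np.ndarray:
    """Second-order Volterra regressor with memory 2: linear and quadratic terms."""
    return np.array([x_now, x_prev, x_now ** 2, x_now * x_prev, x_prev ** 2])


def lms_step(w: np.ndarray, u: np.ndarray, d: float, mu: float) -> np.ndarray:
    """Conventional LMS update w <- w + mu * e * u with e = d - w.u (not the q-VLMS rule)."""
    e = d - w @ u
    return w + mu * e * u
```

For example, `jackson_derivative(lambda t: t ** 3, 2.0, 1.5)` equals $[3]_{1.5} \cdot 2^2 = 4.75 \cdot 4 = 19$, while the ordinary derivative at the same point is 12. Identifying a nonlinear channel then amounts to calling `lms_step` on `volterra_features(x[n], x[n - 1])` for each new sample.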
Submitted 7 August, 2019;
originally announced August 2019.
-
Universal Adversarial Perturbations for Speech Recognition Systems
Authors:
Paarth Neekhara,
Shehzeen Hussain,
Prakhar Pandey,
Shlomo Dubnov,
Julian McAuley,
Farinaz Koushanfar
Abstract:
In this work, we demonstrate the existence of universal adversarial audio perturbations that cause automatic speech recognition (ASR) systems to mis-transcribe audio signals. We propose an algorithm to find a single quasi-imperceptible perturbation which, when added to an arbitrary speech signal, will most likely fool the victim speech recognition model. Our experiments demonstrate the proposed technique by crafting audio-agnostic universal perturbations for a state-of-the-art ASR system, Mozilla DeepSpeech. Additionally, by performing a transferability test on a WaveNet-based ASR system, we show that such perturbations generalize to a significant extent to models that were not available during training.
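"Quasi-imperceptible" is commonly quantified for audio with the relative loudness $\mathrm{dB}_x(\delta) = \mathrm{dB}(\delta) - \mathrm{dB}(x)$, where $\mathrm{dB}(x) = \max_i 20\log_{10}|x_i|$, together with an $L_\infty$ bound on the perturbation. The sketch below assumes that convention (taken from prior audio attack work; whether this paper uses exactly these settings is an assumption) and shows the outer loop of a universal-perturbation search: one shared perturbation is updated whenever it fails to fool an example, then projected back into the $L_\infty$ ball. `is_fooled` and `update_direction` are hypothetical callbacks standing in for the victim ASR model and a per-example attack step.

```python
from typing import Callable, Sequence

import numpy as np


def db(signal: np.ndarray) -> float:
    """Peak level in decibels, 20 * log10(max |x_i|); the epsilon avoids log(0)."""
    return 20.0 * np.log10(np.max(np.abs(signal)) + 1e-12)


def relative_loudness_db(perturbation: np.ndarray, audio: np.ndarray) -> float:
    """dB_x(delta) = dB(delta) - dB(x); more negative means quieter relative to the speech."""
    return db(perturbation) - db(audio)


def project_linf(delta: np.ndarray, epsilon: float) -> np.ndarray:
    """Project onto the L-infinity ball ||delta||_inf <= epsilon by clipping each sample."""
    return np.clip(delta, -epsilon, epsilon)


def universal_perturbation(dataset: Sequence[np.ndarray],
                           is_fooled: Callable[[np.ndarray], bool],
                           update_direction: Callable[[np.ndarray], np.ndarray],
                           epsilon: float,
                           max_passes: int = 5) -> np.ndarray:
    """Accumulate one shared perturbation over many inputs, re-projecting after each update."""
    delta = np.zeros_like(dataset[0])
    for _ in range(max_passes):
        for x in dataset:
            if not is_fooled(x + delta):
                # Placeholder for a per-example attack step on the ASR loss.
                delta = project_linf(delta + update_direction(x + delta), epsilon)
    return delta
```

For example, a perturbation with peak 0.01 added to speech with peak 1.0 sits at -40 dB relative to the signal. Because the same `delta` must serve every input, it is carried across examples and passes instead of being recomputed per signal, which is what makes it "universal".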
Submitted 15 August, 2019; v1 submitted 9 May, 2019;
originally announced May 2019.
-
SwishNet: A Fast Convolutional Neural Network for Speech, Music and Noise Classification and Segmentation
Authors:
Md. Shamim Hussain,
Mohammad Ariful Haque
Abstract:
Speech, music and noise classification/segmentation is an important preprocessing step for audio processing/indexing. To this end, we propose SwishNet, a novel 1D Convolutional Neural Network (CNN). It is a fast and lightweight architecture that operates on MFCC features and is suitable for the front end of an audio processing pipeline. We show that the performance of our network can be improved by distilling knowledge from a 2D CNN pretrained on ImageNet. We investigated the performance of our network on the MUSAN corpus, an openly available comprehensive collection of noise, music and speech samples suitable for deep learning. The proposed network achieved high overall accuracy in clip classification (clip length 0.5-2 s; >97% accuracy) and frame-wise segmentation (>93% accuracy) tasks, with even higher accuracy (>99%) in the speech/non-speech discrimination task. To verify the robustness of our model, we trained it on MUSAN, evaluated it on a different corpus, GTZAN, and found good accuracy with very little fine-tuning. We also demonstrated that our model is fast on both CPU and GPU, consumes little memory, and is suitable for implementation in embedded systems.
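Knowledge distillation from a larger teacher is usually implemented by training the student on the teacher's temperature-softened class probabilities in addition to the hard labels. The sketch below shows that standard loss (Hinton et al.'s formulation) and the Swish activation $x \cdot \sigma(x)$ from which SwishNet presumably takes its name; the loss form and the temperature and weighting values are assumptions, since the abstract does not give the paper's exact distillation objective, and the 2D ImageNet teacher is reduced to a matrix of precomputed logits.

```python
import numpy as np


def swish(x):
    """Swish activation x * sigmoid(x), written as x / (1 + exp(-x))."""
    return x / (1.0 + np.exp(-x))


def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature, shifted for numerical stability."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * cross-entropy on hard labels."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student_t = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student_t + 1e-12)), axis=-1)
    soft = temperature ** 2 * kl.mean()   # T^2 keeps gradient scale comparable across T
    p_student = softmax(student_logits)
    hard = -np.log(p_student[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * soft + (1.0 - alpha) * hard
```

A higher temperature exposes more of the teacher's relative confidence across the speech, music and noise classes, which is the extra signal a small student can learn from beyond the hard labels.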
Submitted 1 December, 2018;
originally announced December 2018.
-
Adaptive Control of Scalar Plants in the Presence of Unmodeled Dynamics
Authors:
Heather S. Hussain,
Megumi M. Matsutani,
Anuradha M. Annaswamy,
Eugene Lavretsky
Abstract:
Robust adaptive control of scalar plants in the presence of unmodeled dynamics is established in this paper. It is shown that implementation of a projection algorithm with standard adaptive control of a scalar plant ensures global boundedness of the overall adaptive system for a class of unmodeled dynamics.
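A minimal simulation makes the setting concrete: a scalar plant with an unknown pole, a model-reference adaptive controller designed for that plant alone, a fast actuator lag the design ignores (the unmodeled dynamics), and a gradient adaptive law whose parameter estimate is kept inside fixed bounds. All numerical values are illustrative assumptions, the clipping step is a discrete-time stand-in for the continuous projection operator, and the simulation illustrates, rather than proves, the boundedness result stated above.

```python
import numpy as np


def simulate_mrac(a_p=1.0, a_m=-2.0, gamma=5.0, theta_bounds=(-10.0, 10.0),
                  tau_um=0.01, dt=1e-3, t_end=20.0):
    """Euler simulation of scalar model-reference adaptive control with projection.

    Plant:            x'   = a_p x + z          (a_p unknown to the controller)
    Unmodeled lag:    z'   = (u - z) / tau_um   (actuator dynamics ignored by the design)
    Reference model:  x_m' = a_m x_m + r
    Control law:      u    = theta x + r        (ideal theta* = a_m - a_p)
    Adaptive law:     theta' = -gamma e x, with e = x - x_m, kept within theta_bounds
    """
    lo, hi = theta_bounds
    x = x_m = z = theta = 0.0
    xs, thetas = [], []
    for k in range(int(t_end / dt)):
        r = np.sin(k * dt)                 # persistently exciting reference
        u = theta * x + r
        e = x - x_m
        # Clipping after the Euler step approximates the projection operator.
        theta = min(max(theta - dt * gamma * e * x, lo), hi)
        x, x_m, z = (x + dt * (a_p * x + z),
                     x_m + dt * (a_m * x_m + r),
                     z + dt * (u - z) / tau_um)
        xs.append(x)
        thetas.append(theta)
    return np.array(xs), np.array(thetas)
```

Without the bounds, parameter drift driven by the unmodeled lag could in principle push `theta` to gains the neglected dynamics cannot tolerate; the projection rules that out by construction, which is the mechanism behind the boundedness claim.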
Submitted 31 January, 2013;
originally announced February 2013.