Search | arXiv e-print repository

Integration of Programmable Diffraction with Digital Neural Networks

Authors: Md Sadman Sakib Rahman, Aydogan Ozcan

Abstract: Optical imaging and sensing systems based on diffractive elements have seen massive advances over the last several decades. Earlier generations of diffractive optical processors were, in general, designed to deliver information to an independent system that was separately optimized, primarily driven by human vision or perception. With the recent advances in deep learning and digital neural networks, there have been efforts to establish diffractive processors that are jointly optimized with digital neural networks serving as their back-end. These jointly optimized hybrid (optical+digital) processors establish a new "diffractive language" between input electromagnetic waves that carry analog information and neural networks that process the digitized information at the back-end, providing the best of both worlds. Such hybrid designs can process spatially and temporally coherent, partially coherent, or incoherent input waves, providing universal coverage for any spatially varying set of point spread functions that can be optimized for a given task, executed in collaboration with digital neural networks. In this article, we highlight the utility of this exciting collaboration between engineered and programmed diffraction and digital neural networks for a diverse range of applications. We survey some of the major innovations enabled by the push-pull relationship between analog wave processing and digital neural networks, also covering the significant benefits that could be reaped through the synergy between these two complementary paradigms.

Submitted 15 June, 2024; originally announced June 2024.

Comments: 30 Pages, 6 Figures

arXiv:2403.06438 [pdf, other]

Unification of Secret Key Generation and Wiretap Channel Transmission

Authors: Yingbo Hua, Md Saydur Rahman

Abstract: This paper presents further insights into a recently developed round-trip communication scheme called "Secret-message Transmission by Echoing Encrypted Probes (STEEP)". A legitimate wireless channel between a multi-antenna user (Alice) and a single-antenna user (Bob) in the presence of a multi-antenna eavesdropper (Eve) is focused on. STEEP does not require full-duplex, channel reciprocity or Eve's channel state information, but is able to yield a positive secrecy rate in bits per channel use between Alice and Bob in every channel coherence period as long as Eve's receive channel is not noiseless. This secrecy rate does not diminish as coherence time increases. Various statistical behaviors of STEEP's secrecy capacity due to random channel fading are also illustrated.

Submitted 11 March, 2024; originally announced March 2024.

Comments: This paper has been accepted for presentation at IEEE ICC 2024

arXiv:2310.14005 [pdf, ps, other]

Ophthalmic Biomarker Detection Using Ensembled Vision Transformers -- Winning Solution to IEEE SPS VIP Cup 2023

Authors: H. A. Z. Sameen Shahgir, Khondker Salman Sayeed, Tanjeem Azwad Zaman, Md. Asif Haider, Sheikh Saifur Rahman Jony, M. Sohel Rahman

Abstract: This report outlines our approach in the IEEE SPS VIP Cup 2023: Ophthalmic Biomarker Detection competition. Our primary objective in this competition was to identify biomarkers from Optical Coherence Tomography (OCT) images obtained from a diverse range of patients. Using robust augmentations and 5-fold cross-validation, we trained two vision transformer-based models: MaxViT and EVA-02, and ensembled them at inference time. We find MaxViT's use of convolution layers followed by strided attention to be better suited for the detection of local features while EVA-02's use of normal attention mechanism and knowledge distillation is better for detecting global features. Ours was the best-performing solution in the competition, achieving a patient-wise F1 score of 0.814 in the first phase and 0.8527 in the second and final phase of VIP Cup 2023, scoring 3.8% higher than the next-best solution.

Submitted 21 October, 2023; originally announced October 2023.

arXiv:2309.12502 [pdf, ps, other]

doi 10.1109/TSP.2023.3310252

Secure Degree of Freedom of Wireless Networks Using Collaborative Pilots

Authors: Yingbo Hua, Qingpeng Liang, Md Saydur Rahman

Abstract: A wireless network of full-duplex nodes/users, using anti-eavesdropping channel estimation (ANECE) based on collaborative pilots, can yield a positive secure degree-of-freedom (SDoF) regardless of the number of antennas an eavesdropper may have. This paper presents novel results on SDoF of ANECE by analyzing secret-key capacity (SKC) of each pair of nodes in a network of multiple collaborative nodes per channel coherence period. Each transmission session of ANECE has two phases: phase 1 is used for pilots, and phase 2 is used for random symbols. This results in two parts of SDoF of ANECE. Both lower and upper bounds on the SDoF of ANECE for any number of users are shown, and the conditions for the two bounds to meet are given. This leads to important discoveries, including: a) The phase-1 SDoF is the same for both multi-user ANECE and pair-wise ANECE while the former may require only a fraction of the number of time slots needed by the latter; b) For a three-user network, the phase-2 SDoF of all-user ANECE is generally larger than that of pair-wise ANECE; c) For a two-user network, a modified ANECE deploying square-shaped nonsingular pilot matrices yields a higher total SDoF than the original ANECE. The multi-user ANECE and the modified two-user ANECE shown in this paper appear to be the best full-duplex schemes known today in terms of SDoF subject to each node using a given number of antennas for both transmitting and receiving.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2308.02588 [pdf, other]

Unmasking Parkinson's Disease with Smile: An AI-enabled Screening Framework

Authors: Tariq Adnan, Md Saiful Islam, Wasifur Rahman, Sangwu Lee, Sutapa Dey Tithi, Kazi Noshin, Imran Sarker, M Saifur Rahman, Ehsan Hoque

Abstract: Parkinson's disease (PD) diagnosis remains challenging due to lacking a reliable biomarker and limited access to clinical care. In this study, we present an analysis of the largest video dataset containing micro-expressions to screen for PD. We collected 3,871 videos from 1,059 unique participants, including 256 self-reported PD patients. The recordings are from diverse sources encompassing participants' homes across multiple countries, a clinic, and a PD care facility in the US. Leveraging facial landmarks and action units, we extracted features relevant to Hypomimia, a prominent symptom of PD characterized by reduced facial expressions. An ensemble of AI models trained on these features achieved an accuracy of 89.7% and an Area Under the Receiver Operating Characteristic (AUROC) of 89.3% while being free from detectable bias across population subgroups based on sex and ethnicity on held-out data. Further analysis reveals that features from the smiling videos alone lead to comparable performance, even on two external test sets the model has never seen during training, suggesting the potential for PD risk assessment from smiling selfie videos.

Submitted 3 August, 2023; originally announced August 2023.

arXiv:2304.09023 [pdf, other]

doi 10.1016/j.ifacol.2023.10.111

Measurement-Based Control for Minimizing Energy Functions in Quantum Systems

Authors: Henrik Glavind Clausen, Salahuddin Abdul Rahman, Özkan Karabacak, Rafal Wisniewski

Abstract: In variational quantum algorithms (VQAs), the most common objective is to find the minimum energy eigenstate of a given energy Hamiltonian. In this paper, we consider the general problem of finding a sufficient control Hamiltonian structure that, under a given feedback control law, ensures convergence to the minimum energy eigenstate of a given energy function. By including quantum non-demolition (QND) measurements in the loop, convergence to a pure state can be ensured from an arbitrary mixed initial state. Based on existing results on strict control Lyapunov functions, we formulate a semidefinite optimization problem, whose solution defines a non-unique control Hamiltonian, which is sufficient to ensure almost sure convergence to the minimum energy eigenstate under the given feedback law and the action of QND measurements. A numerical example is provided to showcase the proposed methodology.

Submitted 18 April, 2023; originally announced April 2023.

Comments: Accepted for IFAC 2023 - 22nd World Congress of the International Federation of Automatic Control

Journal ref: IFAC-PapersOnLine, Volume 56, Issue 2, 2023, Pages 5171-5178

arXiv:2210.12921 [pdf]

Investigating self-supervised, weakly supervised and fully supervised training approaches for multi-domain automatic speech recognition: a study on Bangladeshi Bangla

Authors: Ahnaf Mozib Samin, M. Humayon Kobir, Md. Mushtaq Shahriyar Rafee, M. Firoz Ahmed, Mehedi Hasan, Partha Ghosh, Shafkat Kibria, M. Shahidur Rahman

Abstract: Despite huge improvements in automatic speech recognition (ASR) employing neural networks, ASR systems still suffer from a lack of robustness and generalizability issues due to domain shifting. This is mainly because principal corpus design criteria are often not identified and examined adequately while compiling ASR datasets. In this study, we investigate the robustness of the state-of-the-art transfer learning approaches such as self-supervised wav2vec 2.0 and weakly supervised Whisper as well as fully supervised convolutional neural networks (CNNs) for multi-domain ASR. We also demonstrate the significance of domain selection while building a corpus by assessing these models on a novel multi-domain Bangladeshi Bangla ASR evaluation benchmark - BanSpeech, which contains approximately 6.52 hours of human-annotated speech and 8085 utterances from 13 distinct domains. SUBAK.KO, a mostly read speech corpus for the morphologically rich language Bangla, has been used to train the ASR systems. Experimental evaluation reveals that self-supervised cross-lingual pre-training is the best strategy compared to weak supervision and full supervision to tackle the multi-domain ASR task. Moreover, the ASR models trained on SUBAK.KO face difficulty recognizing speech from domains with mostly spontaneous speech. The BanSpeech will be publicly available to meet the need for a challenging evaluation benchmark for Bangla ASR.

Submitted 10 May, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

arXiv:2203.05408 [pdf, other]

Attacks as Defenses: Designing Robust Audio CAPTCHAs Using Attacks on Automatic Speech Recognition Systems

Authors: Hadi Abdullah, Aditya Karlekar, Saurabh Prasad, Muhammad Sajidur Rahman, Logan Blue, Luke A. Bauer, Vincent Bindschaedler, Patrick Traynor

Abstract: Audio CAPTCHAs are supposed to provide a strong defense for online resources; however, advances in speech-to-text mechanisms have rendered these defenses ineffective. Audio CAPTCHAs cannot simply be abandoned, as they are specifically named by the W3C as important enablers of accessibility. Accordingly, demonstrably more robust audio CAPTCHAs are important to the future of a secure and accessible Web. We look to recent literature on attacks on speech-to-text systems for inspiration for the construction of robust, principle-driven audio defenses. We begin by comparing 20 recent attack papers, classifying and measuring their suitability to serve as the basis of new "robust to transcription" but "easy for humans to understand" CAPTCHAs. After showing that none of these attacks alone are sufficient, we propose a new mechanism that is both comparatively intelligible (evaluated through a user study) and hard to automatically transcribe (i.e., $P({\rm transcription}) = 4 \times 10^{-5}$). Finally, we demonstrate that our audio samples have a high probability of being detected as CAPTCHAs when given to speech-to-text systems ($P({\rm evasion}) = 1.77 \times 10^{-4}$). In so doing, we not only demonstrate a CAPTCHA that is approximately four orders of magnitude more difficult to crack, but that such systems can be designed based on the insights gained from attack papers using the differences between the ways that humans and computers process audio.

Submitted 10 March, 2022; originally announced March 2022.

arXiv:2202.06128 [pdf, other]

Grasp-and-Lift Detection from EEG Signal Using Convolutional Neural Network

Authors: Md. Kamrul Hasan, Sifat Redwan Wahid, Faria Rahman, Shanjida Khan Maliha, Sauda Binte Rahman

Abstract: People undergoing neuromuscular dysfunctions and amputated limbs require automatic prosthetic appliances. In developing such prostheses, the precise detection of brain motor actions is imperative for the Grasp-and-Lift (GAL) tasks. Because of the low-cost and non-invasive essence of Electroencephalography (EEG), it is widely preferred for detecting motor actions during the controls of prosthetic tools. This article has automated the hand movement activity viz GAL detection method from the 32-channel EEG signals. The proposed pipeline essentially combines preprocessing and end-to-end detection steps, eliminating the requirement of hand-crafted feature engineering. Preprocessing action consists of raw signal denoising, using either Discrete Wavelet Transform (DWT) or highpass or bandpass filtering and data standardization. The detection step consists of Convolutional Neural Network (CNN)- or Long Short Term Memory (LSTM)-based model. All the investigations utilize the publicly available WAY-EEG-GAL dataset, having six different GAL events. The best experiment reveals that the proposed framework achieves an average area under the ROC curve of 0.944, employing the DWT-based denoising filter, data standardization, and CNN-based detection model. The obtained outcome designates an excellent achievement of the introduced method in detecting GAL events from the EEG signals, turning it applicable to prosthetic appliances, brain-computer interfaces, robotic arms, etc.

Submitted 12 February, 2022; originally announced February 2022.

Comments: Accepted in https://icaeee2022.com/

arXiv:2201.00458 [pdf, other]

Lung-Originated Tumor Segmentation from Computed Tomography Scan (LOTUS) Benchmark

Authors: Parnian Afshar, Arash Mohammadi, Konstantinos N. Plataniotis, Keyvan Farahani, Justin Kirby, Anastasia Oikonomou, Amir Asif, Leonard Wee, Andre Dekker, Xin Wu, Mohammad Ariful Haque, Shahruk Hossain, Md. Kamrul Hasan, Uday Kamal, Winston Hsu, Jhih-Yuan Lin, M. Sohel Rahman, Nabil Ibtehaz, Sh. M. Amir Foisol, Kin-Man Lam, Zhong Guang, Runze Zhang, Sumohana S. Channappayya, Shashank Gupta, Chander Dev

Abstract: Lung cancer is one of the deadliest cancers, and in part its effective diagnosis and treatment depend on the accurate delineation of the tumor. Human-centered segmentation, which is currently the most common approach, is subject to inter-observer variability, and is also time-consuming, considering the fact that only experts are capable of providing annotations. Automatic and semi-automatic tumor segmentation methods have recently shown promising results. However, as different researchers have validated their algorithms using various datasets and performance metrics, reliably evaluating these methods is still an open challenge. The goal of the Lung-Originated Tumor Segmentation from Computed Tomography Scan (LOTUS) Benchmark created through 2018 IEEE Video and Image Processing (VIP) Cup competition, is to provide a unique dataset and pre-defined metrics, so that different researchers can develop and evaluate their methods in a unified fashion. The 2018 VIP Cup started with a global engagement from 42 countries to access the competition data. At the registration stage, there were 129 members clustered into 28 teams from 10 countries, out of which 9 teams made it to the final stage and 6 teams successfully completed all the required tasks. In a nutshell, all the algorithms proposed during the competition, are based on deep learning models combined with a false positive reduction technique. Methods developed by the three finalists show promising results in tumor segmentation, however, more effort should be put into reducing the false positive rate. This competition manuscript presents an overview of the VIP-Cup challenge, along with the proposed algorithms and results.

Submitted 2 January, 2022; originally announced January 2022.

arXiv:2111.09262 [pdf, other]

Segmentation of Lung Tumor from CT Images using Deep Supervision

Authors: Farhanaz Farheen, Md. Salman Shamil, Nabil Ibtehaz, M. Sohel Rahman

Abstract: Lung cancer is a leading cause of death in most countries of the world. Since prompt diagnosis of tumors can allow oncologists to discern their nature, type and the mode of treatment, tumor detection and segmentation from CT Scan images is a crucial field of study worldwide. This paper approaches lung tumor segmentation by applying two-dimensional discrete wavelet transform (DWT) on the LOTUS dataset for more meticulous texture analysis whilst integrating information from neighboring CT slices before feeding them to a Deeply Supervised MultiResUNet model. Variations in learning rates, decay and optimization algorithms while training the network have led to different dice co-efficients, the detailed statistics of which have been included in this paper. We also discuss the challenges in this dataset and how we opted to overcome them. In essence, this study aims to maximize the success rate of predicting tumor regions from two dimensional CT Scan slices by experimenting with a number of adequate networks, resulting in a dice co-efficient of 0.8472.

Submitted 17 November, 2021; originally announced November 2021.

arXiv:2111.08480 [pdf]

doi 10.3390/s22030919

A Shallow U-Net Architecture for Reliably Predicting Blood Pressure (BP) from Photoplethysmogram (PPG) and Electrocardiogram (ECG) Signals

Authors: Sakib Mahmud, Nabil Ibtehaz, Amith Khandakar, Anas Tahir, Tawsifur Rahman, Khandaker Reajul Islam, Md Shafayet Hossain, M. Sohel Rahman, Mohammad Tariqul Islam, Muhammad E. H. Chowdhury

Abstract: Cardiovascular diseases are the most common causes of death around the world. To detect and treat heart-related diseases, continuous Blood Pressure (BP) monitoring along with many other parameters are required. Several invasive and non-invasive methods have been developed for this purpose. Most existing methods used in the hospitals for continuous monitoring of BP are invasive. On the contrary, cuff-based BP monitoring methods, which can predict Systolic Blood Pressure (SBP) and Diastolic Blood Pressure (DBP), cannot be used for continuous monitoring. Several studies attempted to predict BP from non-invasively collectible signals such as Photoplethysmogram (PPG) and Electrocardiogram (ECG), which can be used for continuous monitoring. In this study, we explored the applicability of autoencoders in predicting BP from PPG and ECG signals. The investigation was carried out on 12,000 instances of 942 patients of the MIMIC-II dataset and it was found that a very shallow, one-dimensional autoencoder can extract the relevant features to predict the SBP and DBP with the state-of-the-art performance on a very large dataset. Independent test set from a portion of the MIMIC-II dataset provides an MAE of 2.333 and 0.713 for SBP and DBP, respectively. On an external dataset of forty subjects, the model trained on the MIMIC-II dataset, provides an MAE of 2.728 and 1.166 for SBP and DBP, respectively. For both the cases, the results met British Hypertension Society (BHS) Grade A and surpassed the studies from the current literature.

Submitted 12 November, 2021; originally announced November 2021.

Comments: 22 pages, Figure 8, Table 13

Journal ref: Sensors 2022, 22(3), 919

arXiv:2110.13250 [pdf, other]

Beyond $L_p$ clipping: Equalization-based Psychoacoustic Attacks against ASRs

Authors: Hadi Abdullah, Muhammad Sajidur Rahman, Christian Peeters, Cassidy Gibson, Washington Garcia, Vincent Bindschaedler, Thomas Shrimpton, Patrick Traynor

Abstract: Automatic Speech Recognition (ASR) systems convert speech into text and can be placed into two broad categories: traditional and fully end-to-end. Both types have been shown to be vulnerable to adversarial audio examples that sound benign to the human ear but force the ASR to produce malicious transcriptions. Of these attacks, only the "psychoacoustic" attacks can create examples with relatively imperceptible perturbations, as they leverage the knowledge of the human auditory system. Unfortunately, existing psychoacoustic attacks can only be applied against traditional models, and are obsolete against the newer, fully end-to-end ASRs. In this paper, we propose an equalization-based psychoacoustic attack that can exploit both traditional and fully end-to-end ASRs. We successfully demonstrate our attack against real-world ASRs that include DeepSpeech and Wav2Letter. Moreover, we employ a user study to verify that our method creates low audible distortion. Specifically, 80 of the 100 participants voted in favor of all our attack audio samples as less noisier than the existing state-of-the-art attack. Through this, we demonstrate both types of existing ASR pipelines can be exploited with minimum degradation to attack audio quality.

Submitted 25 October, 2021; originally announced October 2021.

Comments: accepted at ACML 2021

arXiv:2110.07365 [pdf, other]

DynoLoc: Infrastructure-free RF Tracking in Dynamic Indoor Environments

Authors: Md. Shaifur Rahman, Ayon Chakraborty, Karthikeyan Sunderasan, Sampath Rangarajan

Abstract: Promising solutions exist today that can accurately track mobile entities indoor using visual inertial odometry in favorable visual conditions, or by leveraging fine-grained ranging (RF, ultrasonic, IR, etc.) to reference anchors. However, they are unable to directly cater to "dynamic" indoor environments (e.g. first responder scenarios, multi-player AR/VR gaming in everyday spaces, etc.) that are devoid of such favorable conditions. Indeed, we show that the need for "infrastructure-free", and robustness to "node mobility" and "visual conditions" in such environments, motivates a robust RF-based approach along with the need to address a novel and challenging variant of its infrastructure-free (i.e. peer-to-peer) localization problem that is latency-bounded - accurate tracking of mobile entities imposes a latency budget that not only affects the solution computation but also the collection of peer-to-peer ranges themselves. In this work, we present the design and deployment of DynoLoc that addresses this latency-bounded infrastructure-free RF localization problem. To this end, DynoLoc unravels the fundamental tradeoff between latency and localization accuracy and incorporates design elements that judiciously leverage the available ranging resources to adaptively estimate the joint topology of nodes, coupled with robust algorithm that maximizes the localization accuracy even in the face of practical environmental artifacts (wireless connectivity and multipath, node mobility, etc.). This allows DynoLoc to track (every second) a network of few tens of mobile entities even at speeds of 1-2 m/s with median accuracies under 1-2 m (compared to 5m+ with baselines), without infrastructure support. We demonstrate DynoLoc's potential in a real-world firefighters' drill, as well as two other use cases of (i) multi-player AR/VR gaming, and (ii) active shooter tracking by first responders.

Submitted 19 March, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

Comments: The work was done when all the authors were employees of NEC Laboratories America and is protected by the patent applications: US20210306977A1 and US20210185491A1 available in the public domain

arXiv:2108.00500 [pdf, other]

End to End Bangla Speech Synthesis

Authors: Prithwiraj Bhattacharjee, Rajan Saha Raju, Arif Ahmad, M. Shahidur Rahman

Abstract: Text-to-Speech (TTS) system is a system where speech is synthesized from a given text following any particular approach. Concatenative synthesis, Hidden Markov Model (HMM) based synthesis, Deep Learning (DL) based synthesis with multiple building blocks, etc. are the main approaches for implementing a TTS system. Here, we are presenting our deep learning-based end-to-end Bangla speech synthesis system. It has been implemented with minimal human annotation using only 3 major components (Encoder, Decoder, Post-processing net including waveform synthesis). It does not require any frontend preprocessor and Grapheme-to-Phoneme (G2P) converter. Our model has been trained with phonetically balanced 20 hours of single speaker speech data. It has obtained a 3.79 Mean Opinion Score (MOS) on a scale of 5.0 as subjective evaluation and a 0.77 Perceptual Evaluation of Speech Quality (PESQ) score on a scale of [-0.5, 4.5] as objective evaluation. It is outperforming all existing non-commercial state-of-the-art Bangla TTS systems based on naturalness.

Submitted 1 August, 2021; originally announced August 2021.

arXiv:2107.08177 [pdf]

doi 10.1021/acsphotonics.1c01365

Computer-free, all-optical reconstruction of holograms using diffractive networks

Authors: Md Sadman Sakib Rahman, Aydogan Ozcan

Abstract: Reconstruction of in-line holograms of unknown objects in general suffers from twin-image artifacts due to the appearance of an out-of-focus image overlapping with the desired image to be reconstructed. Computer-based iterative phase retrieval algorithms and learning-based methods have been used for the suppression of such image artifacts in digital holography. Here we report an all-optical hologram reconstruction method that can instantly retrieve the image of an unknown object from its in-line hologram and eliminate twin-image artifacts without using a digital processor or a computer. Multiple transmissive diffractive layers are trained using deep learning so that the diffracted light from an arbitrary input hologram is processed all-optically, through light-matter interaction, to reconstruct the image of an unknown object at the speed of light propagation and without the need for any external power. This passive all-optical processor composed of spatially-engineered transmissive layers forms a diffractive network, which successfully generalizes to reconstruct in-line holograms of unknown, new objects and exhibits improved diffraction efficiency as well as extended depth-of-field at the hologram recording distance. This all-optical hologram processor and the underlying design framework can find numerous applications in coherent imaging and holographic display-related applications owing to its major advantages in terms of image reconstruction speed and computer-free operation.

Submitted 17 July, 2021; originally announced July 2021.

Comments: 19 Pages, 5 Figures

Journal ref: ACS Photonics (2021)

arXiv:2103.07985 [pdf]

COVID-19 Infection Localization and Severity Grading from Chest X-ray Images

Authors: Anas M. Tahir, Muhammad E. H. Chowdhury, Amith Khandakar, Tawsifur Rahman, Yazan Qiblawey, Uzair Khurshid, Serkan Kiranyaz, Nabil Ibtehaz, M Shohel Rahman, Somaya Al-Madeed, Khaled Hameed, Tahir Hamid, Sakib Mahmud, Maymouna Ezeddin

Abstract: Coronavirus disease 2019 (COVID-19) has been the main agenda of the whole world, since it came into sight in December 2019 as it has significantly affected the world economy and healthcare system. Given the effects of COVID-19 on pulmonary tissues, chest radiographic imaging has become a necessity for screening and monitoring the disease. Numerous studies have proposed Deep Learning approaches for the automatic diagnosis of COVID-19. Although these methods achieved astonishing performance in detection, they have used limited chest X-ray (CXR) repositories for evaluation, usually with a few hundred COVID-19 CXR images only. Thus, such data scarcity prevents reliable evaluation with the potential of overfitting. In addition, most studies showed no or limited capability in infection localization and severity grading of COVID-19 pneumonia. In this study, we address this urgent need by proposing a systematic and unified approach for lung segmentation and COVID-19 localization with infection quantification from CXR images. To accomplish this, we have constructed the largest benchmark dataset with 33,920 CXR images, including 11,956 COVID-19 samples, where the annotation of ground-truth lung segmentation masks is performed on CXRs by a novel human-machine collaborative approach. An extensive set of experiments was performed using the state-of-the-art segmentation networks, U-Net, U-Net++, and Feature Pyramid Networks (FPN). The developed network, after an extensive iterative process, reached a superior performance for lung region segmentation with Intersection over Union (IoU) of 96.11% and Dice Similarity Coefficient (DSC) of 97.99%. Furthermore, COVID-19 infections of various shapes and types were reliably localized with 83.05% IoU and 88.21% DSC. Finally, the proposed approach has achieved an outstanding COVID-19 detection performance with both sensitivity and specificity values above 99%.

Submitted 14 March, 2021; originally announced March 2021.

Comments: 30 pages, 5 figures, 4 tables

arXiv:2101.00686 [pdf]

An Evolution of CNN Object Classifiers on Low-Resolution Images

Authors: Md. Mohsin Kabir, Abu Quwsar Ohi, Md. Saifur Rahman, M. F. Mridha

Abstract: Object classification is a significant task in computer vision. It has become an effective research area as an important aspect of image processing and the building block of image localization, detection, and scene parsing. Object classification from low-quality images is difficult for the variance of object colors, aspect ratios, and cluttered backgrounds. The field of object classification has seen remarkable advancements, with the development of deep convolutional neural networks (DCNNs). Deep neural networks have been demonstrated as very powerful systems for facing the challenge of object classification from high-resolution images, but deploying such object classification networks on the embedded device remains challenging due to the high computational and memory requirements. Using high-quality images often causes high computational and memory complexity, whereas low-quality images can solve this issue. Hence, in this paper, we investigate an optimal architecture that accurately classifies low-quality images using DCNNs architectures. To validate different baselines on low-quality images, we perform experiments using webcam captured image datasets of 10 different objects. In this research work, we evaluate the proposed architecture by implementing popular CNN architectures. The experimental results validate that the MobileNet architecture delivers better than most of the available CNN architectures for low-resolution webcam image datasets.

Submitted 3 January, 2021; originally announced January 2021.

Comments: Accepted in IEEE Honet 2020

arXiv:2012.14337 [pdf, other]

WiFresh: Age-of-Information from Theory to Implementation

Authors: Igor Kadota, Muhammad Shahir Rahman, Eytan Modiano

Abstract: Emerging applications, such as smart factories and fleets of drones, increasingly rely on sharing time-sensitive information for monitoring and control. In such application domains, it is essential to keep information fresh, as outdated information loses its value and can lead to system failures and safety risks. The Age-of-Information is a performance metric that captures how fresh the information is from the perspective of the destination. In this paper, we show that as the congestion in the wireless network increases, the Age-of-Information degrades sharply, leading to outdated information at the destination. Leveraging years of theoretical research, we propose WiFresh: an unconventional architecture that achieves near optimal information freshness in wireless networks of any size, even when the network is overloaded. Our experimental results show that WiFresh can improve information freshness by two orders of magnitude when compared to an equivalent standard WiFi network. We propose and realize two strategies for implementing WiFresh: one at the MAC layer using hardware-level programming and another at the Application layer using Python.

Submitted 28 December, 2020; originally announced December 2020.

arXiv:2012.11532 [pdf, other]

Dual-CyCon Net: A Cycle Consistent Dual-Domain Convolutional Neural Network Framework for Detection of Partial Discharge

Authors: Mohammad Zunaed, Ankur Nath, Md. Saifur Rahman

Abstract: In the last decade, researchers have been investigating the severity of insulation breakdown caused by partial discharge (PD) in overhead transmission lines with covered conductors or electrical equipment such as generators and motors used in various industries. Developing an effective partial discharge detection system can lead to significant savings on maintenance and prevent power disruptions. Traditional methods rely on hand-crafted features and domain expertise to identify partial discharge patterns in the electrical current. Many data-driven deep learning-based methods have been proposed in recent years to remove this ad hoc feature extraction. However, most of these methods either operate in the time-domain or frequency-domain. Many research approaches have been developed to generate phase-resolved partial discharge (PRPD) patterns from raw PD sensor data. These PRPD diagrams suggest a correlation between partial discharge activities occurring in an alternating electrical waveform's positive and negative half-cycles. However, this correlation criterion between half-cycles has remained unexplored in deep learning-based methods. This work proposes a novel feature-fusion-based Dual-CyCon Net that can utilize all time, frequency, and phase domain features for joint learning in one cohesive framework. Our proposed cycle-consistency loss exploits any relation between an alternating electrical signal's positive and negative half-cycles to calibrate the model's sensitivity. This loss explores cycle-invariant PD-specific features, enabling the model to learn more robust, noise-invariant features for PD detection. A case study of our proposed framework on a public real-world noisy measurement from high-frequency voltage sensors to detect damaged power lines has achieved a state-of-the-art MCC score of 0.8455.

Submitted 19 October, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

arXiv:2009.06869 [pdf]

doi 10.1038/s41377-020-00446-w

Ensemble learning of diffractive optical networks

Authors: Md Sadman Sakib Rahman, Jingxi Li, Deniz Mengu, Yair Rivenson, Aydogan Ozcan

Abstract: A plethora of research advances have emerged in the fields of optics and photonics that benefit from harnessing the power of machine learning. Specifically, there has been a revival of interest in optical computing hardware, due to its potential advantages for machine learning tasks in terms of parallelization, power efficiency and computation speed. Diffractive Deep Neural Networks (D2NNs) form such an optical computing framework, which benefits from deep learning-based design of successive diffractive layers to all-optically process information as the input light diffracts through these passive layers. D2NNs have demonstrated success in various tasks, including e.g., object classification, spectral-encoding of information, optical pulse shaping and imaging, among others. Here, we significantly improve the inference performance of diffractive optical networks using feature engineering and ensemble learning. After independently training a total of 1252 D2NNs that were diversely engineered with a variety of passive input filters, we applied a pruning algorithm to select an optimized ensemble of D2NNs that collectively improve their image classification accuracy. Through this pruning, we numerically demonstrated that ensembles of N=14 and N=30 D2NNs achieve blind testing accuracies of 61.14% and 62.13%, respectively, on the classification of CIFAR-10 test images, providing an inference improvement of >16% compared to the average performance of the individual D2NNs within each ensemble. These results constitute the highest inference accuracies achieved to date by any diffractive optical neural network design on the same dataset and might provide a significant leapfrog to extend the application space of diffractive optical image classification and machine vision systems.

Submitted 15 September, 2020; originally announced September 2020.

Comments: 22 Pages, 4 Figures, 1 Table

Journal ref: Light: Science & Applications (2021)

arXiv:2005.01669 [pdf]

PPG2ABP: Translating Photoplethysmogram (PPG) Signals to Arterial Blood Pressure (ABP) Waveforms using Fully Convolutional Neural Networks

Authors: Nabil Ibtehaz, Sakib Mahmud, Muhammad E. H. Chowdhury, Amith Khandakar, Mohamed Arselene Ayari, Anas Tahir, M. Sohel Rahman

Abstract: Cardiovascular diseases are one of the most severe causes of mortality, taking a heavy toll of lives annually throughout the world. The continuous monitoring of blood pressure seems to be the most viable option, but this demands an invasive process, bringing about several layers of complexities. This motivates us to develop a method to predict the continuous arterial blood pressure (ABP) waveform through a non-invasive approach using photoplethysmogram (PPG) signals. In addition we explore the advantage of deep learning as it would free us from sticking to ideally shaped PPG signals only, by making handcrafted feature computation irrelevant, which is a shortcoming of the existing approaches. Thus, we present, PPG2ABP, a deep learning based method, that manages to predict the continuous ABP waveform from the input PPG signal, with a mean absolute error of 4.604 mmHg, preserving the shape, magnitude and phase in unison. However, the more astounding success of PPG2ABP turns out to be that the computed values of DBP, MAP and SBP from the predicted ABP waveform outperforms the existing works under several metrics, despite that PPG2ABP is not explicitly trained to do so.

Submitted 26 September, 2022; v1 submitted 4 May, 2020; originally announced May 2020.

arXiv:2004.03747 [pdf]

COVID_MTNet: COVID-19 Detection with Multi-Task Deep Learning Approaches

Authors: Md Zahangir Alom, M M Shaifur Rahman, Mst Shamima Nasrin, Tarek M. Taha, Vijayan K. Asari

Abstract: COVID-19 is currently one of the most life-threatening problems around the world. The fast and accurate detection of the COVID-19 infection is essential to identify, take better decisions and ensure treatment for the patients which will help save their lives. In this paper, we propose a fast and efficient way to identify COVID-19 patients with multi-task deep learning (DL) methods. Both X-ray and CT scan images are considered to evaluate the proposed technique. We employ our Inception Residual Recurrent Convolutional Neural Network with Transfer Learning (TL) approach for COVID-19 detection and our NABLA-N network model for segmenting the regions infected by COVID-19. The detection model shows around 84.67% testing accuracy from X-ray images and 98.78% accuracy in CT-images. A novel quantitative analysis strategy is also proposed in this paper to determine the percentage of infected regions in X-ray and CT images. The qualitative and quantitative results demonstrate promising results for COVID-19 detection and infected region localization.

Submitted 18 April, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

Comments: 11 pages, 15 figures

arXiv:1912.10371 [pdf]

Spectral remapping of natural signals

Authors: Md. Shoaibur Rahman

Abstract: Here we present an algorithm to procedurally remap spectral contents of natural signals. The algorithm takes in two inputs: a signal whose spectral component needs to be remapped and a warping or remapping function. The algorithm generates one output, which is a remapped version of the original signal. The input signal is remapped into the output signal in two steps. In the analysis step, the algorithm performs a series of operations to modify the spectral content, i.e., compute the warped phase of the signal according to the given remapping function. In the synthesis step, the modified spectral content is combined with the envelope information of the input signal to reconstruct the warped or remapped output signal.

Submitted 21 December, 2019; originally announced December 2019.

Comments: 7 pages, 7 figures

arXiv:1910.05262 [pdf, other]

Hear "No Evil", See "Kenansville": Efficient and Transferable Black-Box Attacks on Speech Recognition and Voice Identification Systems

Authors: Hadi Abdullah, Muhammad Sajidur Rahman, Washington Garcia, Logan Blue, Kevin Warren, Anurag Swarnim Yadav, Tom Shrimpton, Patrick Traynor

Abstract: Automatic speech recognition and voice identification systems are being deployed in a wide array of applications, from providing control mechanisms to devices lacking traditional interfaces, to the automatic transcription of conversations and authentication of users. Many of these applications have significant security and privacy considerations. We develop attacks that force mistranscription and misidentification in state of the art systems, with minimal impact on human comprehension. Processing pipelines for modern systems are comprised of signal preprocessing and feature extraction steps, whose output is fed to a machine-learned model. Prior work has focused on the models, using white-box knowledge to tailor model-specific attacks. We focus on the pipeline stages before the models, which (unlike the models) are quite similar across systems. As such, our attacks are black-box and transferable, and demonstrably achieve mistranscription and misidentification rates as high as 100% by modifying only a few frames of audio. We perform a study via Amazon Mechanical Turk demonstrating that there is no statistically significant difference between human perception of regular and perturbed audio. Our findings suggest that models may learn aspects of speech that are generally not perceived by human subjects, but that are crucial for model accuracy. We also find that certain English language phonemes (in particular, vowels) are significantly more susceptible to our attack. We show that the attacks are effective when mounted over cellular networks, where signals are subject to degradation due to transcoding, jitter, and packet loss.

Submitted 11 October, 2019; originally announced October 2019.

arXiv:1907.10418 [pdf]

Improving Malaria Parasite Detection from Red Blood Cell using Deep Convolutional Neural Networks

Authors: Aimon Rahman, Hasib Zunair, M Sohel Rahman, Jesia Quader Yuki, Sabyasachi Biswas, Md Ashraful Alam, Nabila Binte Alam, M. R. C. Mahdy

Abstract: Malaria is a female anopheles mosquito-bite inflicted life-threatening disease which is considered endemic in many parts of the world. This article focuses on improving malaria detection from patches segmented from microscopic images of red blood cell smears by introducing a deep convolutional neural network. Compared to the traditional methods that use tedious hand engineering feature extraction, the proposed method uses deep learning in an end-to-end arrangement that performs both feature extraction and classification directly from the raw segmented patches of the red blood smears. The dataset used in this study was taken from National Institute of Health named NIH Malaria Dataset. The evaluation metric accuracy and loss along with 5-fold cross validation was used to compare and select the best performing architecture. To maximize the performance, existing standard pre-processing techniques from the literature has also been experimented. In addition, several other complex architectures have been implemented and tested to pick the best performing model. A holdout test has also been conducted to verify how well the proposed model generalizes on unseen data. Our best model achieves an accuracy of almost 97.77%.

Submitted 23 July, 2019; originally announced July 2019.

Comments: Application of deep learning in biological science for the early detection of disease

arXiv:1905.11883 [pdf, other]

A Case Study on the Effects of Partial Solar Eclipse on Distributed Photovoltaic Systems and Management Areas

Authors: Aditya Sundararajan, Temitayo O. Olowu, Longfei Wei, Shahinur Rahman, Arif I. Sarwat

Abstract: Photovoltaic (PV) systems depend on irradiance, ambient temperature and module temperature. A solar eclipse causes significant changes in these parameters, thereby impacting PV generation profile, performance, and power quality of larger grid where they connect to. This paper presents a case study to evaluate the impacts of the solar eclipse of August 21, 2017 on two real-world grid-tied PV systems (1.4MW and 355kW) in Miami and Daytona, Florida, the feeders they are connected to, and the management areas they belong to. Four types of analyses are conducted to obtain a comprehensive picture of the impacts using 1-minute PV generation data, hourly weather data, real feeder parameters, and daily reliability data. These analyses include: individual PV system performance measurement using power performance index; power quality analysis at the point of interconnection; a study on the operation of voltage regulating devices on the feeders during eclipse peak using an IEEE 8500 test case distribution feeder; and reliability study involving a multilayer perceptron framework for forecasting system reliability of the management areas. Results from this study provide a unique insight into how solar eclipses impact the behavior of PV systems and the grid, which would be of concern to electric utilities in future high penetration scenarios.

Submitted 24 May, 2019; originally announced May 2019.

Comments: Accepted by IET Smart Grid journal

arXiv:1811.00244 [pdf, ps, other]

A Variational Step for Reduction of Mixed Gaussian-Impulse Noise from Images

Authors: Mohammad Tariqul Islam, Dipayan Saha, S. M. Mahbubur Rahman, M. Omair Ahmad, M. N. S. Swamy

Abstract: Reduction of mixed noise is an ill posed problem for the occurrence of contrasting distributions of noise in the image. The mixed noise that is usually encountered is the simultaneous presence of additive white Gaussian noise (AWGN) and impulse noise (IN). A standard approach to denoise an image with such corruption is to apply a rank order filter (ROF) followed by an efficient linear filter to remove the residual noise. However, ROF cannot completely remove the heavy tail of the noise distribution originating from the IN and thus the denoising performance can be suboptimal. In this paper, we present a variational step to remove the heavy tail of the noise distribution. Through experiments, it is shown that this approach can significantly improve the denoising performance of mixed AWGN-IN using well-established methods.

Submitted 1 November, 2018; originally announced November 2018.

Comments: ICECE, Dhaka, Bangladesh, 2018

arXiv:1807.02684 [pdf, other]

doi 10.1016/j.bspc.2018.12.016

VFPred: A Fusion of Signal Processing and Machine Learning techniques in Detecting Ventricular Fibrillation from ECG Signals

Authors: Nabil Ibtehaz, M. Saifur Rahman, M. Sohel Rahman

Abstract: Ventricular Fibrillation (VF), one of the most dangerous arrhythmias, is responsible for sudden cardiac arrests. Thus, various algorithms have been developed to predict VF from Electrocardiogram (ECG), which is a binary classification problem. In the literature, we find a number of algorithms based on signal processing, where, after some robust mathematical operations the decision is given based on a predefined threshold over a single value. On the other hand, some machine learning based algorithms are also reported in the literature; however, these algorithms merely combine some parameters and make a prediction using those as features. Both the approaches have their perks and pitfalls; thus our motivation was to coalesce them to get the best out of the both worlds. Hence we have developed, VFPred that, in addition to employing a signal processing pipeline, namely, Empirical Mode Decomposition and Discrete Time Fourier Transform for useful feature extraction, uses a Support Vector Machine for efficient classification. VFPred turns out to be a robust algorithm as it is able to successfully segregate the two classes with equal confidence (Sensitivity = 99.99%, Specificity = 98.40%) even from a short signal of 5 seconds long, whereas existing works though requires longer signals, flourishes in one but fails in the other.

Submitted 15 November, 2018; v1 submitted 7 July, 2018; originally announced July 2018.

Showing 1–29 of 29 results for author: Rahman, S