-
Syn-Att: Synthetic Speech Attribution via Semi-Supervised Unknown Multi-Class Ensemble of CNNs
Authors:
Md Awsafur Rahman,
Bishmoy Paul,
Najibul Haque Sarker,
Zaber Ibn Abdul Hakim,
Shaikh Anowarul Fattah,
Mohammad Saquib
Abstract:
With the huge technological advances introduced by deep learning in audio & speech processing, many novel synthetic speech techniques have achieved incredibly realistic results. As these methods generate realistic fake human voices, they can be used in malicious acts such as impersonation, spreading fake news, spoofing, and media manipulation. Hence, the ability to distinguish synthetic from natural speech has become an urgent necessity. Moreover, being able to tell which algorithm has been used to generate a synthetic speech track can be of preeminent importance to track down the culprit. In this paper, a novel strategy is proposed to attribute a synthetic speech track to the generator that is used to synthesize it. The proposed detector transforms the audio into a log-mel spectrogram, extracts features using a CNN, and classifies the track as one of five known algorithms or as unknown, utilizing semi-supervision and ensembling to significantly improve its robustness and generalizability. The proposed detector is validated on two evaluation datasets consisting of a total of 18,000 weakly perturbed (Eval 1) & 10,000 strongly perturbed (Eval 2) synthetic speeches. The proposed method outperforms other top teams in accuracy by 12-13% on Eval 2 and 1-2% on Eval 1, in the IEEE SP Cup challenge at ICASSP 2022.
Submitted 15 September, 2023;
originally announced September 2023.
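The ensemble step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each CNN outputs six logits (five known generators plus one "unknown" class) and that the ensemble simply averages softmax probabilities; the label names, logit values, and averaging rule are illustrative assumptions, not details taken from the paper.

```python
import math

# Hypothetical label set: five known generators plus one "unknown" class.
LABELS = ["alg_0", "alg_1", "alg_2", "alg_3", "alg_4", "unknown"]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(per_model_logits):
    """Average the class probabilities of several models and pick the argmax."""
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    mean = [sum(p[c] for p in probs) / n_models for c in range(len(LABELS))]
    best = max(range(len(LABELS)), key=lambda c: mean[c])
    return LABELS[best], mean

# Made-up logits from three CNNs for a single audio clip.
logits = [
    [0.1, 2.0, 0.3, 0.0, 0.2, 1.0],
    [0.0, 1.5, 0.1, 0.2, 0.1, 1.8],
    [0.2, 2.2, 0.0, 0.1, 0.3, 0.9],
]
label, mean_probs = ensemble_predict(logits)
```

Here the second model on its own would vote "unknown", but the averaged probabilities favour `alg_1`, which is the kind of smoothing an ensemble is meant to provide.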
-
Flexible Beamforming in B5G for Improving Tethered UAV Coverage over Smart Environments
Authors:
Abdu Saif,
Nor Shahida Mohd Shah,
Soreen Ameen Fattah,
Saeed Hamood Alsamhi,
Santosh Kumar,
Ali Saad Al khuraib
Abstract:
Unmanned Aerial Vehicles (UAVs) are being used for wireless communications in smart environments. However, the need for mobility, scalability of data transmission over wide areas, and the required coverage area make UAV beamforming essential for better coverage and user experience. To this end, we propose a flexible beamforming approach to improve tethered UAV coverage quality and maximize the number of users experiencing the minimum required rate in any target environment. Our solution demonstrates a significant achievement in flexible beamforming in smart environments, including urban, suburban, dense urban, and high-rise urban. Furthermore, the beamforming gain is mainly concentrated on the target to improve the coverage area based on various scenarios. Simulation results show that the proposed approach can produce a flexible beam that focuses the transmitted signal towards the receiver and significantly improves received power by reducing signal power spread. In the case of no beamforming, signal power spreads out as distance increases, reducing the signal strength. Furthermore, our proposed solution is suitable for improving UAV coverage and reliability in smart and harsh environments.
Submitted 14 July, 2023;
originally announced July 2023.
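To make the idea of concentrating power on a target concrete, the sketch below uses a textbook conjugate (matched) beamformer on a uniform linear array. This is a generic illustration, not the flexible beamforming scheme proposed in the paper; the element count, half-wavelength spacing, and angles are assumptions chosen for the example.

```python
import cmath
import math

def steering_vector(n_elements, theta_rad, spacing_wavelengths=0.5):
    """Phase response of a uniform linear array toward angle theta (from broadside)."""
    return [
        cmath.exp(2j * math.pi * spacing_wavelengths * n * math.sin(theta_rad))
        for n in range(n_elements)
    ]

def beam_power_gain(n_elements, steer_rad, look_rad):
    """Power gain toward look_rad when the array is steered to steer_rad.

    Weights are conjugate-matched to the steering direction and scaled so
    the total transmit power equals that of a single element.
    """
    target = steering_vector(n_elements, steer_rad)
    weights = [a.conjugate() / math.sqrt(n_elements) for a in target]
    look = steering_vector(n_elements, look_rad)
    field = sum(w * a for w, a in zip(weights, look))
    return abs(field) ** 2

N = 8                                  # assumed number of antenna elements
steer = math.radians(20.0)             # assumed direction of the target user
on_target = beam_power_gain(N, steer, steer)
off_target = beam_power_gain(N, steer, math.radians(-40.0))
```

With these assumptions the gain toward the steered direction equals the number of elements, while the gain toward the off-target angle falls well below that of a single element.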
-
A Novel Hierarchical-Classification-Block Based Convolutional Neural Network for Source Camera Model Identification
Authors:
Mohammad Zunaed,
Shaikh Anowarul Fattah
Abstract:
Digital security has been an active area of research interest due to the rapid adoption of internet infrastructure and the increasing popularity of social media and digital cameras. Due to inherent differences in the working principles used to generate an image, different camera brands leave behind different intrinsic processing noises which can be used to identify the camera brand. In the last decade, many signal processing and deep learning-based methods have been proposed to identify and isolate this noise from the scene details in an image to detect the source camera brand. One prominent solution is to utilize a hierarchical classification system rather than the traditional single-classifier approach. Different individual networks are used for brand-level and model-level source camera identification. This approach allows for better scaling and requires minimal modifications for adding a new camera brand/model to the solution. However, using different full-fledged networks for both brand and model-level classification substantially increases memory consumption and training complexity. Moreover, the low-level features extracted from the initial layers of the different networks often coincide, resulting in redundant weights. To mitigate the training and memory complexity, we propose a classifier-block-level hierarchical system instead of a network-level one for source camera model classification. Our proposed approach not only results in significantly fewer parameters but also retains the capability to add a new camera model with minimal modification. Thorough experimentation on the publicly available Dresden dataset shows that our proposed approach can achieve the same level of state-of-the-art performance but requires fewer parameters compared to a state-of-the-art network-level hierarchical-based system.
Submitted 8 December, 2022;
originally announced December 2022.
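The parameter argument in the abstract above can be sketched with simple arithmetic. The following assumes every network consists of a feature-extracting backbone plus a classifier head, that all heads are the same size, and that the network-level system uses one full network for brands plus one per brand; the sizes and the toy heads are hypothetical, not figures from the paper.

```python
def network_level_params(backbone, head, n_brands):
    """One full network for brand classification plus one full network per brand."""
    return (1 + n_brands) * (backbone + head)

def block_level_params(backbone, head, n_brands):
    """One shared backbone; only the classifier blocks are replicated."""
    return backbone + (1 + n_brands) * head

def classify(features, brand_head, model_heads):
    """Route shared features through the brand block, then that brand's model block."""
    brand = brand_head(features)
    return brand, model_heads[brand](features)

# Hypothetical sizes, chosen only to show the trend.
BACKBONE, HEAD, BRANDS = 1_000_000, 50_000, 5
net_level = network_level_params(BACKBONE, HEAD, BRANDS)
blk_level = block_level_params(BACKBONE, HEAD, BRANDS)

# Toy heads standing in for trained classifier blocks.
brand_head = lambda f: "brand_a" if f[0] > 0 else "brand_b"
model_heads = {
    "brand_a": lambda f: "a_model_1" if f[1] > 0 else "a_model_2",
    "brand_b": lambda f: "b_model_1",
}
prediction = classify([0.7, -0.2], brand_head, model_heads)
```

Under these assumptions, adding a new camera model only touches one entry in `model_heads`, and the shared backbone is counted once rather than once per brand.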
-
A Novel Multi-Stage Training Approach for Human Activity Recognition from Multimodal Wearable Sensor Data Using Deep Neural Network
Authors:
Tanvir Mahmud,
A. Q. M. Sazzad Sayyed,
Shaikh Anowarul Fattah,
Sun-Yuan Kung
Abstract:
Deep neural networks are an effective choice to automatically recognize human actions utilizing data from various wearable sensors. These networks automate the process of feature extraction relying completely on data. However, various noises in time series data with complex inter-modal relationships among sensors make this process more complicated. In this paper, we have proposed a novel multi-stage training approach that increases diversity in this feature extraction process to make accurate recognition of actions by combining varieties of features extracted from diverse perspectives. Initially, instead of using a single type of transformation, numerous transformations are employed on time series data to obtain variegated representations of the features encoded in raw data. An efficient deep CNN architecture is proposed that can be individually trained to extract features from different transformed spaces. Later, these CNN feature extractors are merged into an optimal architecture finely tuned for optimizing diversified extracted features through a combined training stage or multiple sequential training stages. This approach offers the opportunity to explore the encoded features in raw sensor data utilizing multifarious observation windows with immense scope for efficient selection of features for final convergence. Extensive experiments on three publicly available datasets show consistently outstanding performance, with average five-fold cross-validation accuracy of 99.29% on UCI HAR database, 99.02% on USC HAR database, and 97.21% on SKODA database, outperforming other state-of-the-art approaches.
Submitted 3 January, 2021;
originally announced January 2021.
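The first stage described in the abstract above, deriving several transformed representations from one sensor window, might look like the sketch below. The abstract does not name the transformations used, so the three views here (raw, first difference, moving average) are purely illustrative assumptions; each view would feed its own feature extractor.

```python
def first_difference(x):
    """Sample-to-sample change, emphasising rapid movements."""
    return [b - a for a, b in zip(x, x[1:])]

def moving_average(x, width=3):
    """Simple smoothing that suppresses high-frequency noise."""
    return [sum(x[i:i + width]) / width for i in range(len(x) - width + 1)]

def make_views(window):
    """Return several transformed representations of one sensor window."""
    return {
        "raw": list(window),
        "diff": first_difference(window),
        "smooth": moving_average(window),
    }

# Made-up accelerometer samples for a single window.
window = [0.0, 1.0, 4.0, 9.0, 16.0]
views = make_views(window)
```

The views have different lengths, so in practice each one would be padded or cropped to the input size its extractor expects.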
-
CovTANet: A Hybrid Tri-level Attention Based Network for Lesion Segmentation, Diagnosis, and Severity Prediction of COVID-19 Chest CT Scans
Authors:
Tanvir Mahmud,
Md. Jahin Alam,
Sakib Chowdhury,
Shams Nafisa Ali,
Md Maisoon Rahman,
Shaikh Anowarul Fattah,
Mohammad Saquib
Abstract:
Rapid and precise diagnosis of COVID-19 is one of the major challenges faced by the global community to control the spread of this ever-growing pandemic. In this paper, a hybrid neural network is proposed, named CovTANet, to provide an end-to-end clinical diagnostic tool for early diagnosis, lesion segmentation, and severity prediction of COVID-19 utilizing chest computed tomography (CT) scans. A multi-phase optimization strategy is introduced for solving the challenges of complicated diagnosis at a very early stage of infection, where an efficient lesion segmentation network is optimized initially and later integrated into a joint optimization framework for the diagnosis and severity prediction tasks, providing feature enhancement of the infected regions. Moreover, for overcoming the challenges with diffused, blurred, and varying shaped edges of COVID lesions with novel and diverse characteristics, a novel segmentation network is introduced, namely Tri-level Attention-based Segmentation Network (TA-SegNet). This network has significantly reduced semantic gaps in subsequent encoding decoding stages, with immense parallelization of multi-scale features for faster convergence providing considerable performance improvement over traditional networks. Furthermore, a novel tri-level attention mechanism has been introduced, which is repeatedly utilized over the network, combining channel, spatial, and pixel attention schemes for faster and more efficient generalization of contextual information embedded in the feature map through feature re-calibration and enhancement operations. Outstanding performances have been achieved in all three tasks through extensive experimentation on a large publicly available dataset containing 1110 chest CT-volumes, which signifies the effectiveness of the proposed scheme at the current stage of the pandemic.
Submitted 3 January, 2021;
originally announced January 2021.
-
Automatic Diagnosis of Malaria from Thin Blood Smear Images using Deep Convolutional Neural Network with Multi-Resolution Feature Fusion
Authors:
Tanvir Mahmud,
Shaikh Anowarul Fattah
Abstract:
Malaria, a life-threatening disease, infects millions of people every year throughout the world, demanding faster diagnosis for proper treatment before any damage occurs. In this paper, an end-to-end deep learning-based approach is proposed for faster diagnosis of malaria from thin blood smear images by making efficient optimizations of features extracted from diversified receptive fields. Firstly, an efficient, highly scalable deep neural network, named DilationNet, is proposed that incorporates features from a large spectrum by varying dilation rates of convolutions to extract features from different receptive areas. Next, the raw images are resampled to various resolutions to introduce variations in the receptive fields that are used for independently optimizing different forms of DilationNet scaled for different resolutions of images. Afterward, a feature fusion scheme is introduced with the proposed DeepFusionNet architecture for jointly optimizing the feature space of these individually trained networks operating on different levels of observations. All the convolutional layers of various forms of DilationNets that are optimized to extract spatial features from different resolutions of images are directly transferred to provide a variegated feature space from any image. Later, joint optimization of these spatial features is carried out in the DeepFusionNet to extract the most relevant representation of the sample image. This scheme offers the opportunity to explore the feature space extensively by varying the observation level to accurately diagnose the abnormality. Extensive experiments on a publicly available dataset show outstanding performance, with accuracy over 99.5%, outperforming other state-of-the-art approaches.
Submitted 9 December, 2020;
originally announced December 2020.
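The link between dilation rate and receptive area mentioned in the abstract above follows from a standard relation: a kernel of size k with dilation d spans d(k - 1) + 1 input positions. The sketch below applies that relation to a stack of stride-1 layers; the layer configuration is an assumed example and is not DilationNet's actual design.

```python
def effective_kernel(kernel, dilation):
    """Input span covered by one dilated convolution kernel."""
    return dilation * (kernel - 1) + 1

def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation_rate) pairs.
    """
    rf = 1
    for kernel, dilation in layers:
        rf += dilation * (kernel - 1)
    return rf

# Assumed example: three 3x3 convolutions with dilation rates 1, 2 and 4.
spans = [effective_kernel(3, d) for d in (1, 2, 4)]
stacked = receptive_field([(3, 1), (3, 2), (3, 4)])
plain = receptive_field([(3, 1), (3, 1), (3, 1)])
```

With the same number of weights, the dilated stack covers 15 input positions per axis compared with 7 for the undilated stack, which is why varying the dilation rate is a cheap way to vary the receptive area.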
-
CovSegNet: A Multi Encoder-Decoder Architecture for Improved Lesion Segmentation of COVID-19 Chest CT Scans
Authors:
Tanvir Mahmud,
Md Awsafur Rahman,
Shaikh Anowarul Fattah,
Sun-Yuan Kung
Abstract:
Automatic lung lesion segmentation in chest CT scans is considered a pivotal stage towards accurate diagnosis and severity measurement of COVID-19. Traditional U-shaped encoder-decoder architectures and their variants lose contextual information in pooling/upsampling operations, widen the semantic gaps between encoded and decoded feature maps, and suffer from vanishing gradients due to their sequential gradient propagation, all of which result in sub-optimal performance. Moreover, operating with 3D CT-volumes poses further limitations due to the exponential increase of computational complexity, making the optimization difficult. In this paper, an automated COVID-19 lesion segmentation scheme is proposed utilizing a highly efficient neural network architecture, namely CovSegNet, to overcome these limitations. Additionally, a two-phase training scheme is introduced where a deeper 2D-network is employed for generating ROI-enhanced CT-volume followed by a shallower 3D-network for further enhancement with more contextual information without increasing computational burden. Along with the traditional vertical expansion of U-Net, we have introduced horizontal expansion with multi-stage encoder-decoder modules for achieving optimum performance. Additionally, multi-scale feature maps are integrated into the scale transition process to overcome the loss of contextual information. Moreover, a multi-scale fusion module is introduced with a pyramid fusion scheme to reduce the semantic gaps between subsequent encoder/decoder modules while facilitating the parallel optimization for efficient gradient propagation. Outstanding performances have been achieved on three publicly available datasets, largely outperforming other state-of-the-art approaches. The proposed scheme can be easily extended for achieving optimum segmentation performances in a wide variety of applications.
Submitted 2 December, 2020;
originally announced December 2020.
-
SPECMAR: Fast Heart Rate Estimation from PPG Signal using a Modified Spectral Subtraction Scheme with Composite Motion Artifacts Reference Generation
Authors:
Mohammad Tariqul Islam,
Sk. Tanvir Ahmed,
Celia Shahnaz,
Shaikh Anowarul Fattah
Abstract:
The task of heart rate estimation using photoplethysmographic (PPG) signal is challenging due to the presence of various motion artifacts in the recorded signals. In this paper, a fast algorithm for heart rate estimation based on modified SPEctral subtraction scheme utilizing Composite Motion Artifacts Reference generation (SPECMAR) is proposed using two-channel PPG and three-axis accelerometer signals. First, the preliminary noise reduction is obtained by filtering unwanted frequency components from the recorded signals. Next, a composite motion artifacts reference generation method is developed to be employed in the proposed SPECMAR algorithm for motion artifacts reduction. The heart rate is then computed from the noise- and motion-artifact-reduced PPG signal. Finally, a heart rate tracking algorithm is proposed considering neighboring estimates. The performance of the SPECMAR algorithm has been tested on a publicly available PPG database. The average heart rate estimation error is found to be 2.09 BPM on 23 recordings. The Pearson correlation is 0.9907. Due to low computational complexity, the method is faster than the compared methods. The low estimation error and smooth, fast heart rate tracking make SPECMAR an ideal choice to be implemented in wearable devices.
Submitted 27 November, 2018; v1 submitted 15 October, 2018;
originally announced October 2018.
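The core idea in the abstract above, removing motion artifacts in the frequency domain before picking the heart rate peak, can be sketched with plain magnitude spectral subtraction. This is a simplified stand-in and not the modified SPECMAR scheme: it uses a single motion reference instead of a composite one, omits the tracking stage, and the sampling rate, search band, and synthetic signals are all assumptions.

```python
import cmath
import math

def magnitude_spectrum(signal, bins):
    """Magnitude of the DFT of `signal` at the requested bin indices."""
    n = len(signal)
    return {
        k: abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                   for i, x in enumerate(signal)))
        for k in bins
    }

def estimate_bpm(ppg, motion_ref, fs, band_hz=(0.5, 3.5)):
    """Subtract the motion spectrum from the PPG spectrum and return the peak in BPM."""
    n = len(ppg)
    lo = math.ceil(band_hz[0] * n / fs)
    hi = math.floor(band_hz[1] * n / fs)
    bins = range(lo, hi + 1)
    ppg_mag = magnitude_spectrum(ppg, bins)
    ref_mag = magnitude_spectrum(motion_ref, bins)
    # Negative differences are clipped to zero after subtraction.
    cleaned = {k: max(ppg_mag[k] - ref_mag[k], 0.0) for k in bins}
    peak = max(cleaned, key=cleaned.get)
    return 60.0 * peak * fs / n

# Synthetic 8-second window: 1.5 Hz pulse plus a stronger 2.0 Hz motion artifact.
fs, n = 25, 200
t = [i / fs for i in range(n)]
ppg = [math.sin(2 * math.pi * 1.5 * ti) + 2.0 * math.sin(2 * math.pi * 2.0 * ti) for ti in t]
motion = [2.0 * math.sin(2 * math.pi * 2.0 * ti + 0.3) for ti in t]
bpm = estimate_bpm(ppg, motion, fs)
```

Without the subtraction, the largest peak in the PPG spectrum would be the 2.0 Hz artifact (120 BPM); after subtraction the 1.5 Hz pulse component dominates.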