Search | arXiv e-print repository

Sound Tagging in Infant-centric Home Soundscapes

Authors: Mohammad Nur Hossain Khan, Jialu Li, Nancy L. McElwain, Mark Hasegawa-Johnson, Bashima Islam

Abstract: Certain environmental noises have been associated with negative developmental outcomes for infants and young children. Though classifying or tagging sound events in a domestic environment is an active research area, previous studies focused on data collected from a non-stationary microphone placed in the environment or from the perspective of adults. Further, many of these works ignore infants or young children in the environment or have data collected from only a single family where noise from the fixed sound source can be moderate at the infant's position or vice versa. Thus, despite the recent success of large pre-trained models for noise event detection, the performance of these models on infant-centric noise soundscapes in the home is yet to be explored. To bridge this gap, we have collected and labeled noises in home soundscapes from 22 families in an unobtrusive manner, where the data are collected through an infant-worn recording device. In this paper, we explore the performance of a large pre-trained model (Audio Spectrogram Transformer [AST]) on our noise-conditioned infant-centric environmental data as well as publicly available home environmental datasets. Utilizing different training strategies such as resampling, utilizing public datasets, mixing public and infant-centric training sets, and data augmentation using noise and masking, we evaluate the performance of a large pre-trained model on sparse and imbalanced infant-centric data. Our results show that fine-tuning the large pre-trained model by combining our collected dataset with public datasets increases the F1-score from 0.11 (public datasets) and 0.76 (collected datasets) to 0.84 (combined datasets) and Cohen's Kappa from 0.013 (public datasets) and 0.77 (collected datasets) to 0.83 (combined datasets) compared to only training with public or collected datasets, respectively.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted in IEEE/ACM CHASE 2024

arXiv:2405.17520 [pdf, other]

Advancing Medical Image Segmentation with Mini-Net: A Lightweight Solution Tailored for Efficient Segmentation of Medical Images

Authors: Syed Javed, Tariq M. Khan, Abdul Qayyum, Arcot Sowmya, Imran Razzak

Abstract: Accurate segmentation of anatomical structures and abnormalities in medical images is crucial for computer-aided diagnosis and analysis. While deep learning techniques excel at this task, their computational demands pose challenges. Additionally, some cutting-edge segmentation methods, though effective for general object segmentation, may not be optimised for medical images. To address these issues, we propose Mini-Net, a lightweight segmentation network specifically designed for medical images. With fewer than 38,000 parameters, Mini-Net efficiently captures both high- and low-frequency features, enabling real-time applications in various medical imaging scenarios. We evaluate Mini-Net on various datasets, including DRIVE, STARE, ISIC-2016, ISIC-2018, and MoNuSeg, demonstrating its robustness and good performance compared to state-of-the-art methods.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2404.15337 [pdf, other]

RSSI Estimation for Constrained Indoor Wireless Networks using ANN

Authors: Samrah Arif, M. Arif Khan, Sabih Ur Rehman

Abstract: In the expanding field of the Internet of Things (IoT), wireless channel estimation is a significant challenge. This is specifically true for low-power IoT (LP-IoT) communication, where efficiency and accuracy are extremely important. This research establishes two distinct LP-IoT wireless channel estimation models using Artificial Neural Networks (ANN): a Feature-based ANN model and a Sequence-based ANN model. Both models have been constructed to enhance LP-IoT communication by lowering the estimation error in the LP-IoT wireless channel. The Feature-based model aims to capture complex patterns of measured Received Signal Strength Indicator (RSSI) data using environmental characteristics. The Sequence-based approach utilises predetermined categorisation techniques to estimate the RSSI sequence of specifically selected environment characteristics. The findings demonstrate that our suggested approaches attain remarkable precision in channel estimation, with an improvement in MSE of 88.29% of the Feature-based model and 97.46% of the Sequence-based model over existing research. Additionally, the comparative analysis of these techniques with traditional and other Deep Learning (DL)-based techniques also highlights the superior performance of our developed models and their potential in real-world IoT applications.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.11771 [pdf]

IoT-Driven Cloud-based Energy and Environment Monitoring System for Manufacturing Industry

Authors: Nitol Saha, Md Masruk Aulia, Md. Mostafizur Rahman, Mohammed Shafiul Alam Khan

Abstract: This research focused on the development of a cost-effective IoT solution for energy and environment monitoring geared towards manufacturing industries. The proposed system is developed using open-source software that can be easily deployed in any manufacturing environment. The system collects real-time temperature, humidity, and energy data from different devices running on different communication protocols such as TCP/IP, Modbus, etc., and the data is transferred wirelessly using an MQTT client to a database working as a cloud storage solution. The collected data is then visualized and analyzed using a website running on a host machine working as a web client.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.09342 [pdf, other]

Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf

Abstract: Advancements in technology have led to the use of multimodal systems in various real-world applications. Among them, audio-visual systems are among the most widely used multimodal systems. In recent years, associating the face and voice of a person has gained attention due to the presence of a unique correlation between them. The Face-voice Association in Multilingual Environments (FAME) Challenge 2024 focuses on exploring face-voice association under the unique condition of a multilingual scenario. This condition is inspired by the fact that half of the world's population is bilingual and people most often communicate in multilingual scenarios. The challenge uses a dataset, namely Multilingual Audio-Visual (MAV-Celeb), for exploring face-voice association in multilingual environments. This report provides the details of the challenge, dataset, baselines and task details for the FAME Challenge.

Submitted 16 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

Comments: ACM Multimedia Conference - Grand Challenge

arXiv:2403.14120 [pdf, other]

Advancing IIoT with Over-the-Air Federated Learning: The Role of Iterative Magnitude Pruning

Authors: Fazal Muhammad Ali Khan, Hatem Abou-Zeid, Aryan Kaushik, Syed Ali Hassan

Abstract: The industrial Internet of Things (IIoT) under Industry 4.0 heralds an era of interconnected smart devices where data-driven insights and machine learning (ML) fuse to revolutionize manufacturing. A noteworthy development in IIoT is the integration of federated learning (FL), which addresses data privacy and security among devices. FL enables edge sensors, also known as peripheral intelligence units (PIUs), to learn and adapt using their data locally, without explicit sharing of confidential data, to facilitate a collaborative yet confidential learning process. However, the lower memory footprint and computational power of PIUs inherently require deep neural network (DNN) models that have a very compact size. Model compression techniques such as pruning can be used to reduce the size of DNN models by removing unnecessary connections that have little impact on the model's performance, thus making the models more suitable for the limited resources of PIUs. Targeting the notion of compact yet robust DNN models, we propose the integration of iterative magnitude pruning (IMP) of the DNN model being trained in an over-the-air FL (OTA-FL) environment for IIoT. We provide a tutorial overview and also present a case study of the effectiveness of IMP in OTA-FL for an IIoT environment. Finally, we present future directions for enhancing and optimizing these deep compression techniques further, aiming to push the boundaries of IIoT capabilities in acquiring compact yet robust and high-performing DNN models.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 6 pages, 6 figures

arXiv:2403.08099 [pdf, other]

Application of Distributed Arithmetic to Adaptive Filtering Algorithms: Trends, Challenges and Future

Authors: Mohd. Tasleem Khan

Abstract: The utilization of distributed arithmetic (DA) in adaptive filtering (AF) algorithms has gained significant attention in recent years due to its potential to enhance computational efficiency and reduce resource requirements. This paper presents an exploration of the application of DA to AF algorithms, analyzing trends, discussing challenges, and outlining future prospects. It begins by providing an overview of both DA and AF algorithms, highlighting their individual merits and established applications. Subsequently, the integration of DA into AF algorithms is explored, showcasing its ability to optimize multiply-accumulate operations and mitigate the computational burden associated with AF algorithms. Throughout the paper, the critical trends observed in the field are discussed, including advancements in DA-based hardware architectures. Moreover, the challenges encountered in implementing DA-based AF are also discussed. The continued evolution of DA techniques to cater to the demands of modern AF applications, including real-time processing, resource-constrained environments, and high-dimensional data streams, is anticipated. In conclusion, this paper consolidates the current state of applying DA to AF algorithms, offering insights into prevailing trends, discussing challenges, and presenting future research and development in the field. The fusion of these two domains holds promise for achieving improved computational efficiency, reduced hardware complexity, and enhanced performance in various signal processing applications.

Submitted 17 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.05415 [pdf]

An Overview of Automated Vehicle Platooning Strategies

Authors: M Sabbir Salek, Mugdha Basu Thakur, Pardha Sai Krishna Ala, Mashrur Chowdhury, Matthias Schmid, Pamela Murray-Tuite, Sakib Mahmud Khan, Venkat Krovi

Abstract: Automated vehicle (AV) platooning has the potential to improve the safety, operational, and energy efficiency of surface transportation systems by limiting or eliminating human involvement in the driving tasks. The theoretical validity of the AV platooning strategies has been established and practical applications are being tested under real-world conditions. The emergence of sensors, communication, and control strategies has resulted in rapid and constant evolution of AV platooning strategies. In this paper, we review the state-of-the-art knowledge in AV platooning using a five-component platooning framework, which includes vehicle model, information-receiving process, information flow topology, spacing policy, and controller and discuss the advantages and limitations of the components. Based on the discussion about existing strategies and associated limitations, potential future research directions are presented.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.18859 [pdf]

doi 10.1016/j.xcrp.2024.101941

Taking Second-life Batteries from Exhausted to Empowered using Experiments, Data Analysis, and Health Estimation

Authors: Xiaofan Cui, Muhammad Aadil Khan, Gabriele Pozzato, Surinder Singh, Ratnesh Sharma, Simona Onori

Abstract: The reuse of retired electric vehicle batteries in grid energy storage offers environmental and economic benefits. This study concentrates on health monitoring algorithms for retired batteries deployed in grid storage. Over 15 months of testing, we collect, analyze, and publicize a dataset of second-life batteries, implementing a cycling protocol simulating grid energy storage load profiles within a 3-4 V voltage window. Four machine-learning-based health estimation models, relying on online-accessible features and initial capacity, are compared, with the selected model achieving a mean absolute percentage error below 2.3% on test data. Additionally, an adaptive online health estimation algorithm is proposed by integrating a clustering-based method, thus limiting estimation errors during online deployment. These results showcase the feasibility of repurposing retired batteries for second-life applications. Based on obtained data and power demand, these second-life batteries exhibit potential for over a decade of grid energy storage use.

Submitted 8 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: 16 pages, 8 figures

arXiv:2401.15804 [pdf, other]

Brain Tumor Diagnosis Using Quantum Convolutional Neural Networks

Authors: Muhammad Al-Zafar Khan, Nouhaila Innan, Abdullah Al Omar Galib, Mohamed Bennai

Abstract: Integrating Quantum Convolutional Neural Networks (QCNNs) into medical diagnostics represents a transformative advancement in the classification of brain tumors. This research details a high-precision design and execution of a QCNN model specifically tailored to identify and classify brain cancer images. Our proposed QCNN architecture and algorithm have achieved an exceptional classification accuracy of 99.67%, demonstrating the model's potential as a powerful tool for clinical applications. The remarkable performance of our model underscores its capability to facilitate rapid and reliable brain tumor diagnoses, potentially streamlining the decision-making process in treatment planning. These findings strongly support the further investigation and application of quantum computing and quantum machine learning methodologies in medical imaging, suggesting a future where quantum-enhanced diagnostics could significantly elevate the standard of patient care and treatment outcomes.

Submitted 30 January, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

Comments: 10 pages, 9 figures, 45 references

arXiv:2401.04734 [pdf, other]

Online Adaptive Data-driven State-of-health Estimation for Second-life Batteries with BIBO Stability Guarantees

Authors: Xiaofan Cui, Muhammad Aadil Khan, Simona Onori

Abstract: A key challenge that is currently hindering the widespread deployment and use of retired electric vehicle (EV) batteries for second-life (SL) applications is the ability to accurately estimate and monitor their state of health (SOH). Second-life battery systems can be sourced from different battery packs with a lack of knowledge of their historical usage. To facilitate the on-the-field use of SL batteries, this paper introduces an online adaptive health estimation strategy with guaranteed stability. This method relies exclusively on operational data that can be accessed in real-time from SL batteries. The adaptation algorithm is designed to ensure bounded-input-bounded-output (BIBO) stability. The effectiveness of the proposed approach is shown on a laboratory-aged experimental data set of retired EV batteries. The estimator gains are dynamically adapted to accommodate the distinct characteristics of each individual cell, making it a promising candidate for future SL battery management systems (BMS2).

Submitted 7 January, 2024; originally announced January 2024.

arXiv:2312.10585 [pdf, ps, other]

ESDMR-Net: A Lightweight Network With Expand-Squeeze and Dual Multiscale Residual Connections for Medical Image Segmentation

Authors: Tariq M Khan, Syed S. Naqvi, Erik Meijering

Abstract: Segmentation is an important task in a wide range of computer vision applications, including medical image analysis. Recent years have seen an increase in the complexity of medical image segmentation approaches based on sophisticated convolutional neural network architectures. This progress has led to incremental enhancements in performance on widely recognised benchmark datasets. However, most of the existing approaches are computationally demanding, which limits their practical applicability. This paper presents an expand-squeeze dual multiscale residual network (ESDMR-Net), which is a fully convolutional network that is particularly well-suited for resource-constrained computing hardware such as mobile devices. ESDMR-Net focuses on extracting multiscale features, enabling the learning of contextual dependencies among semantically distinct features. The ESDMR-Net architecture allows dual-stream information flow within encoder-decoder pairs. The expansion operation (depthwise separable convolution) makes all of the rich features with multiscale information available to the squeeze operation (bottleneck layer), which then extracts the necessary information for the segmentation task. The Expand-Squeeze (ES) block helps the network pay more attention to under-represented classes, which contributes to improved segmentation accuracy. To enhance the flow of information across multiple resolutions or scales, we integrated dual multiscale residual (DMR) blocks into the skip connection. This integration enables the decoder to access features from various levels of abstraction, ultimately resulting in more comprehensive feature representations. We present experiments on seven datasets from five distinct examples of applications. Our model achieved the best results despite having significantly fewer trainable parameters, with a reduction of two or even three orders of magnitude.

Submitted 16 December, 2023; originally announced December 2023.

arXiv:2312.01212 [pdf]

A Comparative Analysis Towards Melanoma Classification Using Transfer Learning by Analyzing Dermoscopic Images

Authors: Md. Fahim Uddin, Nafisa Tafshir, Mohammad Monirujjaman Khan

Abstract: Melanoma is a type of skin cancer that starts in the cells known as melanocytes. It is more dangerous than other types of skin cancer because it can spread to other organs. Melanoma can be fatal if it spreads to other parts of the body. Early detection is the key to cure, but diagnosis requires skilled doctors. This paper presents a system that combines deep learning techniques with established transfer learning methods to enable skin lesion classification and diagnosis of melanoma skin lesions. Using Convolutional Neural Networks (CNNs), this research presents a method for categorizing melanoma images as benign or malignant. Researchers have used deep learning techniques to train on a large number of photos; to get the expected result, deep neural networks need to be trained with a huge number of parameters, as dermoscopic images are sensitive and very hard to classify. This paper emphasizes building models with less complexity and comparatively better accuracy, using limited datasets and somewhat less deep networks, so that the system can predict melanoma from input dermoscopic images as correctly as possible on devices with less computational power. The dataset has been obtained from the ISIC Archive. Multiple pre-trained models (ResNet101, DenseNet, EfficientNet, InceptionV3) have been implemented using transfer learning techniques to complete the comparative analysis, and every model achieved good accuracy. Before training the models, the data has been augmented by multiple parameters to improve the accuracy. Moreover, the results are better than the previous state-of-the-art approaches and adequate to predict melanoma. Among these architectures, DenseNet performed better than the others, giving a validation accuracy of 96.64%, validation loss of 9.43%, and test set accuracy of 99.63%.

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2310.20190 [pdf, other]

Visible to Thermal image Translation for improving visual task in low light conditions

Authors: Md Azim Khan

Abstract: Several visual tasks, such as pedestrian detection and image-to-image translation, are challenging to accomplish in low light using RGB images. Heat variation of objects in thermal images can be used to overcome this. In this work, an end-to-end framework, which consists of a generative network and a detector network, is proposed to translate RGB images into thermal ones and compare generated thermal images with real data. We have collected images from two different locations using the Parrot Anafi Thermal drone. After that, we created a two-stream network, preprocessed and augmented the image data, and trained the generator and discriminator models from scratch. The findings demonstrate that it is feasible to translate RGB training data to thermal data using GAN. As a result, thermal data can now be produced more quickly and affordably, which is useful for security and surveillance applications.

Submitted 8 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.19323 [pdf, other]

A Low-Complexity Machine Learning Design for mmWave Beam Prediction

Authors: Muhammad Qurratulain Khan, Abdo Gaber, Mohammad Parvini, Philipp Schulz, Gerhard Fettweis

Abstract: The 3rd Generation Partnership Project (3GPP) is currently studying machine learning (ML) for the fifth generation (5G)-Advanced New Radio (NR) air interface, where spatial and temporal-domain beam prediction are important use cases. With this background, this letter presents a low-complexity ML design that expedites the spatial-domain beam prediction to reduce the power consumption and the reference signaling overhead, which are currently imperative for frequent beam measurements. Complexity analysis and evaluation results showcase that the proposed model achieves state-of-the-art accuracy with lower computational complexity, resulting in reduced power consumption and faster beam prediction. Furthermore, important observations on the generalization of the proposed model are presented in this letter.

Submitted 10 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.17142 [pdf]

Single channel speech enhancement by colored spectrograms

Authors: Sania Gul, Muhammad Salman Khan, Muhammad Fazeel

Abstract: Speech enhancement concerns the processes required to remove unwanted background sounds from the target speech to improve its quality and intelligibility. In this paper, a novel approach for single-channel speech enhancement is presented, using colored spectrograms. We propose the use of a deep neural network (DNN) architecture adapted from the pix2pix generative adversarial network (GAN) and train it over colored spectrograms of speech to denoise them. After denoising, the colors of spectrograms are translated to magnitudes of short-time Fourier transform (STFT) using a shallow regression neural network. These estimated STFT magnitudes are later combined with the noisy phases to obtain an enhanced speech. The results show an improvement of almost 0.84 points in the perceptual evaluation of speech quality (PESQ) and 1% in the short-term objective intelligibility (STOI) over the unprocessed noisy data. The gain in quality and intelligibility over the unprocessed signal is almost equal to the gain achieved by the baseline methods used for comparison with the proposed model, but at a much reduced computational cost. The proposed solution offers a comparative PESQ score at almost 10 times reduced computational cost than a similar baseline model that has generated the highest PESQ score trained on grayscaled spectrograms, while it provides only a 1% deficit in STOI at 28 times reduced computational cost when compared to another baseline system based on convolutional neural network-GAN (CNN-GAN) that produces the most intelligible speech.

Submitted 26 October, 2023; originally announced October 2023.

Comments: 18 pages, 6 figures, 5 tables

arXiv:2310.11651 [pdf, other]

US Microelectronics Packaging Ecosystem: Challenges and Opportunities

Authors: Rouhan Noor, Himanandhan Reddy Kottur, Patrick J Craig, Liton Kumar Biswas, M Shafkat M Khan, Nitin Varshney, Hamed Dalir, Elif Akçalı, Bahareh Ghane Motlagh, Charles Woychik, Yong-Kyu Yoon, Navid Asadizanjani

Abstract: The semiconductor industry is experiencing a significant shift from traditional methods of shrinking devices and reducing costs. Chip designers actively seek new technological solutions to enhance cost-effectiveness while incorporating more features into the silicon footprint. One promising approach is Heterogeneous Integration (HI), which involves advanced packaging techniques to integrate independently designed and manufactured components using the most suitable process technology. However, adopting HI introduces design and security challenges. To enable HI, research and development of advanced packaging is crucial. The existing research raises the possible security threats in the advanced packaging supply chain, as most of the Outsourced Semiconductor Assembly and Test (OSAT) facilities/vendors are offshore. To deal with the increasing demand for semiconductors and to ensure a secure semiconductor supply chain, there are sizable efforts from the United States (US) government to bring semiconductor fabrication facilities onshore. However, the US-based advanced packaging capabilities must also be ramped up to fully realize the vision of establishing a secure, efficient, resilient semiconductor supply chain. Our effort was motivated to identify the possible bottlenecks and weak links in the advanced packaging supply chain based in the US.

Submitted 30 October, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: 22 pages, 8 figures

arXiv:2310.07245 [pdf, other]

Crowd Counting in Harsh Weather using Image Denoising with Pix2Pix GANs

Authors: Muhammad Asif Khan, Hamid Menouar, Ridha Hamila

Abstract: Visual crowd counting estimates the density of the crowd using deep learning models such as convolutional neural networks (CNNs). The performance of the model heavily relies on the quality of the training data that constitutes crowd images. In harsh weather such as fog, dust, and low light conditions, the inference performance may severely degrade on the noisy and blurred images. In this paper, we propose the use of a Pix2Pix generative adversarial network (GAN) to first denoise the crowd images prior to passing them to the counting model. A Pix2Pix network is trained using synthetic noisy images generated from original crowd images, and the pretrained generator is then used in the inference engine to estimate the crowd density in unseen, noisy crowd images. The performance is tested on the JHU-Crowd dataset to validate the significance of the proposed method, particularly when high reliability and accuracy are required.

Submitted 11 October, 2023; originally announced October 2023.

Comments: The paper has been accepted for presentation at the IEEE 38th International Conference on Image and Vision Computing New Zealand (IVCNZ 2023). The final manuscript can be accessed at IEEE Xplore

arXiv:2309.16975 [pdf]

doi 10.1109/ACCESS.2023.3320809

Perceptual Tone Mapping Model for High Dynamic Range Imaging

Authors: Imran Mehmood, Xinye Shi, M. Usman Khan, Ming Ronnier Luo

Abstract: One of the key challenges in tone mapping is to preserve the perceptual quality of high dynamic range (HDR) images when mapping them to standard dynamic range (SDR) displays. Traditional tone mapping operators (TMOs) compress the luminance of HDR images without considering the surround and display conditions, leading to suboptimal results. Current research addresses this challenge by incorporating perceptual color appearance attributes. In this work, we propose a TMO (TMOz) that leverages CIECAM16 perceptual attributes, i.e., brightness, colorfulness, and hue. TMOz accounts for the effects of both the surround and the display conditions to achieve more optimal colorfulness reproduction. The perceptual brightness is compressed, and the perceptual color scales, i.e., colorfulness and hue, are derived from HDR images by employing CIECAM16 color adaptation equations. A psychophysical experiment was conducted to automate the brightness compression parameter. The model employs a fully automatic and adaptive approach, obviating the requirement for manual parameter selection. TMOz was evaluated in terms of contrast, colorfulness and overall image quality. The objective and subjective evaluation methods revealed that the proposed model outperformed the state-of-the-art TMOs.

Submitted 29 September, 2023; originally announced September 2023.

arXiv:2309.04968 [pdf, other]

LMBiS-Net: A Lightweight Multipath Bidirectional Skip Connection based CNN for Retinal Blood Vessel Segmentation

Authors: Mufassir M. Abbasi, Shahzaib Iqbal, Asim Naveed, Tariq M. Khan, Syed S. Naqvi, Wajeeha Khalid

Abstract: Blinding eye diseases are often correlated with altered retinal morphology, which can be clinically identified by segmenting retinal structures in fundus images. However, current methodologies often fall short in accurately segmenting delicate vessels. Although deep learning has shown promise in medical image segmentation, its reliance on repeated convolution and pooling operations can hinder the representation of edge information, ultimately limiting overall segmentation accuracy. In this paper, we propose a lightweight pixel-level CNN named LMBiS-Net for the segmentation of retinal vessels with an exceptionally low number of learnable parameters (only 0.172 M). The network uses multipath feature extraction blocks and incorporates bidirectional skip connections for the information flow between the encoder and decoder. Additionally, we have optimized the efficiency of the model by carefully selecting the number of filters to avoid filter overlap. This optimization significantly reduces training time and enhances computational efficiency. To assess the robustness and generalizability of LMBiS-Net, we performed comprehensive evaluations on various aspects of retinal images. Specifically, the model was subjected to rigorous tests to accurately segment retinal vessels, which play a vital role in ophthalmological diagnosis and treatment. By focusing on the retinal blood vessels, we were able to thoroughly analyze the performance and effectiveness of the LMBiS-Net model.
The results of our tests demonstrate that LMBiS-Net is not only robust and generalizable but also capable of maintaining high levels of segmentation accuracy. These characteristics highlight the potential of LMBiS-Net as an efficient tool for high-speed and accurate segmentation of retinal images in various clinical applications.

Submitted 10 September, 2023; originally announced September 2023.

arXiv:2309.03535 [pdf, other]

Feature Enhancer Segmentation Network (FES-Net) for Vessel Segmentation

Authors: Tariq M. Khan, Muhammad Arsalan, Shahzaib Iqbal, Imran Razzak, Erik Meijering

Abstract: Diseases such as diabetic retinopathy and age-related macular degeneration pose a significant risk to vision, highlighting the importance of precise segmentation of retinal vessels for the tracking and diagnosis of progression. However, existing vessel segmentation methods that heavily rely on encoder-decoder structures struggle to capture contextual information about retinal vessel configurations, leading to challenges in reconciling semantic disparities between encoder and decoder features. To address this, we propose a novel feature enhancement segmentation network (FES-Net) that achieves accurate pixel-wise segmentation without requiring additional image enhancement steps. FES-Net directly processes the input image and utilizes four prompt convolutional blocks (PCBs) during downsampling, complemented by a shallow upsampling approach to generate a binary mask for each class. We evaluate the performance of FES-Net on four publicly available state-of-the-art datasets: DRIVE, STARE, CHASE, and HRF. The evaluation results clearly demonstrate the superior performance of FES-Net compared to other competitive approaches documented in the existing literature.

Submitted 7 September, 2023; originally announced September 2023.

arXiv:2308.00856 [pdf, other]

Differential Privacy for Adaptive Weight Aggregation in Federated Tumor Segmentation

Authors: Muhammad Irfan Khan, Esa Alhoniemi, Elina Kontio, Suleiman A. Khan, Mojtaba Jafaritadi

Abstract: Federated Learning (FL) is a distributed machine learning approach that safeguards privacy by creating an impartial global model while respecting the privacy of individual client data. However, the conventional FL method can introduce security risks when dealing with diverse client data, potentially compromising privacy and data integrity. To address these challenges, we present a differential privacy (DP) federated deep learning framework in medical image segmentation. In this paper, we extend our similarity weight aggregation (SimAgg) method to DP-SimAgg algorithm, a differentially private similarity-weighted aggregation algorithm for brain tumor segmentation in multi-modal magnetic resonance imaging (MRI). Our DP-SimAgg method not only enhances model segmentation capabilities but also provides an additional layer of privacy preservation. Extensive benchmarking and evaluation of our framework, with computational performance as a key consideration, demonstrate that DP-SimAgg enables accurate and robust brain tumor segmentation while minimizing communication costs during model training. This advancement is crucial for preserving the privacy of medical image data and safeguarding sensitive information. In conclusion, adding a differential privacy layer in the global weight aggregation phase of the federated brain tumor segmentation provides a promising solution to privacy concerns without compromising segmentation model efficacy. By leveraging DP, we ensure the protection of client data against adversarial attacks and malicious participants.

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2307.12759 [pdf, other]

Code-Switched Urdu ASR for Noisy Telephonic Environment using Data Centric Approach with Hybrid HMM and CNN-TDNN

Authors: Muhammad Danyal Khan, Raheem Ali, Arshad Aziz

Abstract: Call centers have huge amounts of audio data that can be used to gain valuable business insights, and manual transcription of phone calls is a tedious task. An effective Automatic Speech Recognition (ASR) system can accurately transcribe these calls for easy search through call history for specific context and content, allowing automatic call monitoring and improving QoS through keyword search and sentiment analysis. ASR for call centers requires more robustness, as telephonic environments are generally noisy. Moreover, there are many low-resourced languages on the verge of extinction that can be preserved with the help of ASR technology. Urdu is the $10^{th}$ most widely spoken language in the world, with 231,295,440 speakers worldwide, yet it remains a resource-constrained language in ASR. Regional call-center conversations operate in the local language, with a mix of English numbers and technical terms generally causing a "code-switching" problem. Hence, this paper describes an implementation framework for a resource-efficient Automatic Speech Recognition/Speech-to-Text system in a noisy call-center environment using Chain Hybrid HMM and CNN-TDNN for code-switched Urdu. Using a hybrid HMM-DNN approach allowed us to utilize the advantages of neural networks with less labelled data. Adding a CNN to the TDNN has been shown to work better in noisy environments due to the CNN's additional frequency dimension, which captures extra information from noisy speech, thus improving accuracy.
We collected data from various open sources and labelled some of the unlabelled data after analysing its general context and content in Urdu as well as commonly used words from other languages, primarily English. We achieved a WER of 5.2% in noisy as well as clean environments, on isolated words or numbers as well as on continuous spontaneous speech.

Submitted 24 July, 2023; originally announced July 2023.

Comments: 32 pages, 19 figures, 2 tables, preprint

arXiv:2307.07096 [pdf, other]

Low Rank Properties for Estimating Microphones Start Time and Sources Emission Time

Authors: Faxian Cao, Yongqiang Cheng, Adil Mehmood Khan, Zhijing Yang, S. M. Ahsan Kazmi, Yingxiu Chang

Abstract: Uncertainty in timing information pertaining to the start time of microphone recordings and sources' emission time poses significant challenges in various applications, such as joint microphones and sources localization. Traditional optimization methods, which directly estimate this unknown timing information (UTIm), often fall short compared to approaches exploiting the low-rank property (LRP). LRP encompasses an additional low-rank structure, facilitating a linear constraint on UTIm to help formulate related low-rank structure information. This method allows us to attain globally optimal solutions for UTIm, given proper initialization. However, the initialization process often involves randomness, leading to suboptimal, local minimum values. This paper presents a novel, combined low-rank approximation (CLRA) method designed to mitigate the effects of this random initialization. We introduce three new LRP variants, underpinned by mathematical proof, which allow the UTIm to draw on a richer pool of low-rank structural information. Utilizing this augmented low-rank structural information from both LRP and the proposed variants, we formulate four linear constraints on the UTIm. Employing the proposed CLRA algorithm, we derive global optimal solutions for the UTIm via these four linear constraints. Experimental results highlight the superior performance of our method over existing state-of-the-art approaches, measured in terms of both the recovery number and reduced estimation errors of UTIm.

Submitted 21 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

Comments: 13 pages for main content; 9 pages for proof of proposed low rank properties; 13 figures

arXiv:2306.14255 [pdf, other]

AttResDU-Net: Medical Image Segmentation Using Attention-based Residual Double U-Net

Authors: Akib Mohammed Khan, Alif Ashrafee, Fahim Shahriar Khan, Md. Bakhtiar Hasan, Md. Hasanul Kabir

Abstract: Manually inspecting polyps from a colonoscopy for colorectal cancer or performing a biopsy on skin lesions for skin cancer are time-consuming, laborious, and complex procedures. Automatic medical image segmentation aims to expedite this diagnosis process. However, numerous challenges exist due to significant variations in the appearance and sizes of objects with no distinct boundaries. This paper proposes an attention-based residual Double U-Net architecture (AttResDU-Net) that improves on the existing medical image segmentation networks. Inspired by the Double U-Net, this architecture incorporates attention gates on the skip connections and residual connections in the convolutional blocks. The attention gates allow the model to retain more relevant spatial information by suppressing irrelevant feature representation from the down-sampling path, so that the model learns to focus on target regions of varying shapes and sizes. Moreover, the residual connections help to train deeper models by ensuring better gradient flow. We conducted experiments on three datasets: CVC Clinic-DB, ISIC 2018, and the 2018 Data Science Bowl datasets and achieved Dice Coefficient scores of 94.35%, 91.68% and 92.45% respectively. Our results suggest that AttResDU-Net can serve as a reliable method for automatic medical image segmentation in practice.

Submitted 25 June, 2023; originally announced June 2023.

Comments: Accepted in 2023 International Joint Conference on Neural Networks (IJCNN 2023)

arXiv:2306.06145 [pdf, other]

LDMRes-Net: Enabling Efficient Medical Image Segmentation on IoT and Edge Platforms

Authors: Shahzaib Iqbal, Tariq M. Khan, Syed S. Naqvi, Muhammad Usman, Imran Razzak

Abstract: In this study, we propose LDMRes-Net, a lightweight dual-multiscale residual block-based computational neural network tailored for medical image segmentation on IoT and edge platforms. Conventional U-Net-based models face challenges in meeting the speed and efficiency demands of real-time clinical applications, such as disease monitoring, radiation therapy, and image-guided surgery. LDMRes-Net overcomes these limitations with its remarkably low number of learnable parameters (0.072M), making it highly suitable for resource-constrained devices. The model's key innovation lies in its dual multi-residual block architecture, which enables the extraction of refined features on multiple scales, enhancing overall segmentation performance. To further optimize efficiency, the number of filters is carefully selected to prevent overlap, reduce training time, and improve computational efficiency. The study includes comprehensive evaluations, focusing on segmentation of the retinal image of vessels and hard exudates crucial for the diagnosis and treatment of ophthalmology. The results demonstrate the robustness, generalizability, and high segmentation accuracy of LDMRes-Net, positioning it as an efficient tool for accurate and rapid medical image segmentation in diverse clinical applications, particularly on IoT and edge platforms. Such advances hold significant promise for improving healthcare outcomes and enabling real-time medical image analysis in resource-limited settings.

Submitted 7 September, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

arXiv:2306.00147 [pdf, other]

Stochastic Analysis of LMS Algorithm with Delayed Block Coefficient Adaptation

Authors: Mohd. Tasleem Khan, Oscar Gustafsson

Abstract: In high sample-rate applications of the least-mean-square (LMS) adaptive filtering algorithm, pipelining and/or block processing is required. As opposed to earlier work, pipelining and block processing are jointly considered to obtain what we refer to as the delayed block LMS (DBLMS) algorithm. Different stochastic analyses for the steady and transient states to estimate the step-size bound, adaptation accuracy, and adaptation speed based on the recursive relation of delayed block excess mean square error (MSE) are presented. The effects of different amounts of pipelining delay and block sizes on the adaptation accuracy and speed of the adaptive filter with different filter lengths and speed-ups are studied. It is concluded that for a constant speed-up, a large delay and small block size lead to a slower convergence rate compared to a small delay and large block size with almost the same steady-state MSE. Monte Carlo simulations indicate a good agreement with the proposed estimates for Gaussian inputs.

Submitted 21 June, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

Comments: 13 pages, 8 figures
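
To make the idea in the abstract above concrete, the following is a minimal sketch of a block LMS filter whose coefficient updates are applied a fixed number of blocks late, as pipelining would cause. It is an illustration under stated assumptions, not the paper's DBLMS formulation: the function name `delayed_block_lms`, its parameters, the averaging of the gradient over each block, and the FIFO used to model the delay are all hypothetical choices.

```python
import numpy as np

def delayed_block_lms(x, d, num_taps, block_size, delay_blocks, mu):
    """Block LMS whose gradient for block k is applied delay_blocks blocks later.

    Hypothetical sketch: x is the input signal, d the desired signal,
    mu the step size. With delay_blocks=0 this reduces to plain block LMS.
    """
    w = np.zeros(num_taps)
    # FIFO of gradient sums still "in the pipeline" (assumed delay model).
    pending = [np.zeros(num_taps) for _ in range(delay_blocks)]
    errors = np.zeros(len(x))
    for start in range(num_taps - 1, len(x) - block_size + 1, block_size):
        grad = np.zeros(num_taps)
        for n in range(start, start + block_size):
            u = x[n - num_taps + 1:n + 1][::-1]  # [x[n], x[n-1], ...]
            e = d[n] - w @ u                     # error with current weights
            errors[n] = e
            grad += e * u                        # accumulate over the block
        pending.append(grad)
        # Apply the oldest gradient, averaged over the block.
        w = w + (mu / block_size) * pending.pop(0)
    return w, errors
```

Calling the function with the same block size and step size but different `delay_blocks` values on a known FIR system gives a quick, informal way to see how delay affects convergence speed.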

arXiv:2305.11397 [pdf, other]

Are Microphone Signals Alone Sufficient for Self-Positioning?

Authors: Faxian Cao, Yongqiang Cheng, Adil Mehmood Khan, Zhijing Yang

Abstract: In an era where asynchronous environments pose challenges to traditional self-positioning methods, we propose a new transformation to the existing paradigm. Traditionally, time of arrival (TOA) measurements require both microphone and source signals, limiting their applicability in environments with unknown emission time of human voices or sources and unknown recording start time of independent microphones. To address this issue, our research pioneers a mapping function capable of transforming both TOA and time difference of arrival (TDOA) formulas, demonstrating, for the first time, that they can be identical to one another. This implies that microphone signals alone are sufficient for self-positioning without the need for source signal waveforms, a groundbreaking advancement in the field that carries the potential to revolutionize self-positioning techniques, expanding their applicability in challenging environments. Supported by a robust mathematical proof and compelling experimental results, this research represents a timely and significant contribution to the current discourse in signal and audio processing.

Submitted 6 July, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: 1 figure, including 3 sub-figures

arXiv:2304.12856 [pdf, other]

Retinal Vessel Segmentation via a Multi-resolution Contextual Network and Adversarial Learning

Authors: Tariq M. Khan, Syed S. Naqvi, Antonio Robles-Kelly, Imran Razzak

Abstract: Timely and affordable computer-aided diagnosis of retinal diseases is pivotal in precluding blindness. Accurate retinal vessel segmentation plays an important role in disease progression and diagnosis of such vision-threatening diseases. To this end, we propose a Multi-resolution Contextual Network (MRC-Net) that addresses these issues by extracting multi-scale features to learn contextual dependencies between semantically different features and using bi-directional recurrent learning to model former-latter and latter-former dependencies. Another key idea is training in adversarial settings for foreground segmentation improvement through optimization of the region-based scores. This novel strategy boosts the performance of the segmentation network in terms of the Dice score (and correspondingly Jaccard index) while keeping the number of trainable parameters comparatively low. We have evaluated our method on three benchmark datasets, including DRIVE, STARE, and CHASE, demonstrating its superior performance as compared with competitive approaches elsewhere in the literature.

Submitted 25 April, 2023; originally announced April 2023.
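
The abstract above notes that a gain in Dice score carries over to the Jaccard index. As a reminder, using the standard definitions rather than anything taken from the paper, for a predicted mask $P$ and a ground-truth mask $G$ the two scores are monotonically related:

```latex
\[
D = \frac{2\,|P \cap G|}{|P| + |G|}, \qquad
J = \frac{|P \cap G|}{|P \cup G|}, \qquad
J = \frac{D}{2 - D}, \qquad
D = \frac{2J}{1 + J}.
\]
```

The relation follows from $|P \cup G| = |P| + |G| - |P \cap G|$. For example, a Dice score of 0.80 corresponds to a Jaccard index of $0.80 / 1.20 \approx 0.667$.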

arXiv:2304.11445 [pdf, other]

Improving Stain Invariance of CNNs for Segmentation by Fusing Channel Attention and Domain-Adversarial Training

Authors: Kudaibergen Abutalip, Numan Saeed, Mustaqeem Khan, Abdulmotaleb El Saddik

Abstract: Variability in staining protocols, such as different slide preparation techniques, chemicals, and scanner configurations, can result in a diverse set of whole slide images (WSIs). This distribution shift can negatively impact the performance of deep learning models on unseen samples, presenting a significant challenge for developing new computational pathology applications. In this study, we propose a method for improving the generalizability of convolutional neural networks (CNNs) to stain changes in a single-source setting for semantic segmentation. Recent studies indicate that style features mainly exist as covariances in earlier network layers. We design a channel attention mechanism based on these findings that detects stain-specific features and modify the previously proposed stain-invariant training scheme. We reweigh the outputs of earlier layers and pass them to the stain-adversarial training branch. We evaluate our method on multi-center, multi-stain datasets and demonstrate its effectiveness through interpretability analysis. Our approach achieves substantial improvements over baselines and competitive performance compared to other methods, as measured by various evaluation metrics. We also show that combining our method with stain augmentation leads to mutually beneficial results and outperforms other techniques. Overall, our study makes significant contributions to the field of computational pathology.

Submitted 22 April, 2023; originally announced April 2023.

arXiv:2304.09756 [pdf, other]

Contactless Human Activity Recognition using Deep Learning with Flexible and Scalable Software Define Radio

Authors: Muhammad Zakir Khan, Jawad Ahmad, Wadii Boulila, Matthew Broadbent, Syed Aziz Shah, Anis Koubaa, Qammer H. Abbasi

Abstract: Ambient computing is gaining popularity as a major technological advancement for the future. The modern era has witnessed a surge in the advancement in healthcare systems, with viable radio frequency solutions proposed for remote and unobtrusive human activity recognition (HAR). Specifically, this study investigates the use of Wi-Fi channel state information (CSI) as a novel method of ambient sensing that can be employed as a contactless means of recognizing human activity in indoor environments. These methods avoid additional costly hardware required for vision-based systems, which are privacy-intrusive, by (re)using Wi-Fi CSI for various safety and security applications. During an experiment utilizing universal software-defined radio (USRP) to collect CSI samples, it was observed that a subject engaged in six distinct activities, which included no activity, standing, sitting, and leaning forward, across different areas of the room. Additionally, more CSI samples were collected when the subject walked in two different directions. This study presents a Wi-Fi CSI-based HAR system that assesses and contrasts deep learning approaches, namely convolutional neural network (CNN), long short-term memory (LSTM), and hybrid (LSTM+CNN), employed for accurate activity recognition. The experimental results indicate that LSTM surpasses current models and achieves an average accuracy of 95.3% in multi-activity classification when compared to CNN and hybrid techniques.
In the future, research needs to study the significance of resilience in diverse and dynamic environments to identify the activity of multiple users.

Submitted 18 April, 2023; originally announced April 2023.

arXiv:2304.02836 [pdf, other]

Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification

Authors: Thomas Z. Li, John M. Still, Kaiwen Xu, Ho Hin Lee, Leon Y. Cai, Aravind R. Krishnan, Riqiang Gao, Mirza S. Khan, Sanja Antic, Michael Kammer, Kim L. Sandler, Fabien Maldonado, Bennett A. Landman, Thomas A. Lasko

Abstract: The accuracy of predictive models for solitary pulmonary nodule (SPN) diagnosis can be greatly increased by incorporating repeat imaging and medical context, such as electronic health records (EHRs). However, clinically routine modalities such as imaging and diagnostic codes can be asynchronous and irregularly sampled over different time scales which are obstacles to longitudinal multimodal learning. In this work, we propose a transformer-based multimodal strategy to integrate repeat imaging with longitudinal clinical signatures from routinely collected EHRs for SPN classification. We perform unsupervised disentanglement of latent clinical signatures and leverage time-distance scaled self-attention to jointly learn from clinical signatures expressions and chest computed tomography (CT) scans. Our classifier is pretrained on 2,668 scans from a public dataset and 1,149 subjects with longitudinal chest CTs, billing codes, medications, and laboratory tests from EHRs of our home institution. Evaluation on 227 subjects with challenging SPNs revealed a significant AUC improvement over a longitudinal multimodal baseline (0.824 vs 0.752 AUC), as well as improvements over a single cross-section multimodal scenario (0.809 AUC) and a longitudinal imaging-only scenario (0.741 AUC). This work demonstrates significant advantages with a novel approach for co-learning longitudinal imaging and non-imaging phenotypes with transformers. Code available at https://github.com/MASILab/lmsignatures.

Submitted 29 June, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

Comments: Accepted to MICCAI 2023

arXiv:2304.01576 [pdf, other]

MESAHA-Net: Multi-Encoders based Self-Adaptive Hard Attention Network with Maximum Intensity Projections for Lung Nodule Segmentation in CT Scan

Authors: Muhammad Usman, Azka Rehman, Abdullah Shahid, Siddique Latif, Shi Sub Byon, Sung Hyun Kim, Tariq Mahmood Khan, Yeong Gil Shin

Abstract: Accurate lung nodule segmentation is crucial for early-stage lung cancer diagnosis, as it can substantially enhance patient survival rates. Computed tomography (CT) images are widely employed for early diagnosis in lung nodule analysis. However, the heterogeneity of lung nodules, size diversity, and the complexity of the surrounding environment pose challenges for developing robust nodule segmentation methods. In this study, we propose an efficient end-to-end framework, the multi-encoder-based self-adaptive hard attention network (MESAHA-Net), for precise lung nodule segmentation in CT scans. MESAHA-Net comprises three encoding paths, an attention block, and a decoder block, facilitating the integration of three types of inputs: CT slice patches, forward and backward maximum intensity projection (MIP) images, and region of interest (ROI) masks encompassing the nodule. By employing a novel adaptive hard attention mechanism, MESAHA-Net iteratively performs slice-by-slice 2D segmentation of lung nodules, focusing on the nodule region in each slice to generate 3D volumetric segmentation of lung nodules. The proposed framework has been comprehensively evaluated on the LIDC-IDRI dataset, the largest publicly available dataset for lung nodule segmentation. The results demonstrate that our approach is highly robust for various lung nodule types, outperforming previous state-of-the-art techniques in terms of segmentation accuracy and computational complexity, rendering it suitable for real-time clinical implementation.

Submitted 4 April, 2023; originally announced April 2023.
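
The forward and backward maximum intensity projection (MIP) inputs mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not define the projection window, so the `depth` parameter, the slice ordering, and the function names `mip` and `forward_backward_mip` are assumptions made purely for illustration, not the authors' implementation.

```python
def mip(slices):
    """Pixel-wise maximum over a list of equally sized 2D slices."""
    rows, cols = len(slices[0]), len(slices[0][0])
    return [[max(s[r][c] for s in slices) for c in range(cols)]
            for r in range(rows)]


def forward_backward_mip(volume, k, depth):
    """Return (forward, backward) MIPs around slice k.

    Assumption: 'forward' projects slices k..k+depth and 'backward'
    projects slices k-depth..k, clipped at the volume boundaries.
    """
    forward = mip(volume[k:k + depth + 1])
    backward = mip(volume[max(0, k - depth):k + 1])
    return forward, backward


# Toy 3-slice volume of 2x2 images, indexed as volume[z][row][col].
volume = [
    [[1, 0], [0, 0]],
    [[0, 2], [0, 0]],
    [[0, 0], [3, 0]],
]
forward, backward = forward_backward_mip(volume, k=1, depth=1)
```

Each projection keeps the brightest voxel along the slice axis, so a structure that spans neighbouring slices stays visible in a single 2D image.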

arXiv:2301.04115 [pdf, other]

Sensing the Environment with 5G Scattered Signals (5G-CommSense): A Feasibility Analysis

Authors: Sandip Jana, Amit Kumar Mishra, Mohammed Zafar Ali Khan

Abstract: By making use of the sensors and AI (SensAI) algorithms for a specialized task, Application Specific INstrumentation (ASIN) framework uses less computational overhead and gives a good performance. This work evaluates the feasibility of the ASIN framework dependent Communication based Sensing (CommSense) system using 5th Generation New Radio (5G NR) infrastructure. Since our proposed system is backed up by 5G NR infra, this system is termed as 5G-CommSense. In this paper, we have used NR channel models specified by the 3rd Generation Partnership Project (3GPP) and added white Gaussian noise (AWGN) to vary the signal to noise ratio at the receiver. Finally, from our simulation result, we conclude that the proposed system is practically feasible.

Submitted 10 January, 2023; originally announced January 2023.

Comments: 3 pages, Accepted in conference
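
Varying the receiver signal-to-noise ratio by adding white Gaussian noise, as the abstract above describes, can be sketched as follows. This is a generic illustration on a real-valued test tone, not the authors' 5G NR simulation; the function names `add_awgn` and `measured_snr_db` and the sine-wave signal are assumptions.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Return the signal plus white Gaussian noise scaled to a target SNR (dB)."""
    rng = rng or random.Random()
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB, computed from the clean signal and its noisy copy."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical test signal: a sampled sine wave.
clean = [math.sin(2 * math.pi * 0.01 * n) for n in range(20000)]
noisy = add_awgn(clean, snr_db=10.0, rng=random.Random(0))
```

The noise variance is derived from the measured signal power, so the same function can sweep any list of SNR values without knowing the signal amplitude in advance.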

arXiv:2212.14618 [pdf]

Blind Restoration of Real-World Audio by 1D Operational GANs

Authors: Turker Ince, Serkan Kiranyaz, Ozer Can Devecioglu, Muhammad Salman Khan, Muhammad Chowdhury, Moncef Gabbouj

Abstract: Objective: Despite numerous studies proposed for audio restoration in the literature, most of them focus on an isolated restoration problem such as denoising or dereverberation, ignoring other artifacts. Moreover, assuming a noisy or reverberant environment with limited number of fixed signal-to-distortion ratio (SDR) levels is a common practice. However, real-world audio is often corrupted by a blend of artifacts such as reverberation, sensor noise, and background audio mixture with varying types, severities, and duration. In this study, we propose a novel approach for blind restoration of real-world audio signals by Operational Generative Adversarial Networks (Op-GANs) with temporal and spectral objective metrics to enhance the quality of restored audio signal regardless of the type and severity of each artifact corrupting it. Methods: 1D Operational-GANs are used with generative neuron model optimized for blind restoration of any corrupted audio signal. Results: The proposed approach has been evaluated extensively over the benchmark TIMIT-RAR (speech) and GTZAN-RAR (non-speech) datasets corrupted with a random blend of artifacts each with a random severity to mimic real-world audio signals. Average SDR improvements of over 7.2 dB and 4.9 dB are achieved, respectively, which are substantial when compared with the baseline methods.
Significance: This is a pioneer study in blind audio restoration with the unique capability of direct (time-domain) restoration of real-world audio whilst achieving an unprecedented level of performance for a wide SDR range and artifact types. Conclusion: 1D Op-GANs can achieve robust and computationally effective real-world audio restoration with significantly improved performance. The source codes and the generated real-world audio datasets are shared publicly with the research community in a dedicated GitHub repository.

Submitted 20 January, 2023; v1 submitted 30 December, 2022; originally announced December 2022.

arXiv:2212.01445 [pdf, other]

Drones-aided Asset Maintenance in Hospitals

Authors: Muhammad Asif Khan, Hamid Menouar, Ridha Hamila

Abstract: The rapid outbreak of COVID-19 pandemic invoked scientists and researchers to prepare the world for future disasters. During the pandemic, global authorities on healthcare urged the importance of disinfection of objects and surfaces. To implement efficient and safe disinfection services during the pandemic, robots have been utilized for indoor assets. In this paper, we envision the use of drones for disinfection of outdoor assets in hospitals and other facilities. Such heterogeneous assets may have different service demands (e.g., service time, quantity of the disinfectant material etc.), whereas drones have typically limited capacity (i.e., travel time, disinfectant carrying capacity). To serve all the facility assets in an efficient manner, the drone to assets allocation and drone travel routes must be optimized. In this paper, we formulate the capacitated vehicle routing problem (CVRP) to find optimal route for each drone such that the total service time is minimized, while simultaneously the drones meet the demands of each asset allocated to it. The problem is solved using mixed integer programming (MIP). As CVRP is an NP-hard problem, we propose a lightweight heuristic to achieve sub-optimal performance while reducing the time complexity in solving the problem involving a large number of assets.

Submitted 2 December, 2022; originally announced December 2022.

Comments: Paper accepted at 2022 2nd International Conference on Computers and Automation (CompAuto 2022)
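
The abstract above does not specify the proposed heuristic, so the following is only a generic nearest-neighbour construction heuristic for a capacitated vehicle routing problem, shown to illustrate what a lightweight, sub-optimal CVRP solver can look like. The function name `greedy_cvrp`, the Euclidean distance model, and the toy coordinates and demands are all assumptions.

```python
import math


def greedy_cvrp(depot, assets, demands, capacity):
    """Build one route per trip by always visiting the nearest unserved
    asset whose demand still fits in the remaining drone capacity."""
    unserved = set(range(len(assets)))
    routes = []
    while unserved:
        position, load, route = depot, 0.0, []
        while True:
            feasible = [i for i in unserved if load + demands[i] <= capacity]
            if not feasible:
                break
            nxt = min(feasible, key=lambda i: math.dist(position, assets[i]))
            route.append(nxt)
            load += demands[nxt]
            position = assets[nxt]
            unserved.remove(nxt)
        if not route:
            # No unserved asset fits even in an empty drone.
            raise ValueError("an asset demand exceeds the drone capacity")
        routes.append(route)
    return routes


# Hypothetical instance: four assets, each drone trip carries at most 5 units.
depot = (0.0, 0.0)
assets = [(1.0, 0.0), (2.0, 0.0), (0.0, 3.0), (0.0, 4.0)]
demands = [2, 2, 3, 3]
routes = greedy_cvrp(depot, assets, demands, capacity=5)
```

A greedy construction like this needs at most a quadratic number of distance evaluations in the number of assets, which is the usual trade-off against an exact mixed integer programming solve: much faster on large instances, but with no optimality guarantee.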

arXiv:2210.15122 [pdf]

Experimental Comparison of SNR and RSSI for LoRa-ESL Based on Machine Clustering and Arithmetic Distribution

Authors: Malak Abid Ali Khan, Hongbin Ma, Syed Muhammad Aamir, Cekderi Anil Baris

Abstract: LoRa lacks the sensing capabilities of channel status. Received signal strength indicator (RSSI) decreases due to collision, interference, and near-far effect while for signal-to-noise ratio (SNR), the packets are rejected by decreasing the transmission power (TP) at a higher spreading factor (SF). To overcome these challenges in the case of electric shelf label (ESL) to minimize the dependency on retransmission and acknowledgment, the end devices (EDs) are allocated around gateways (GWs) based on machine clustering with dynamic SF for SNR while dynamic TP for RSSI. The experimental results determined that the RSSI approach is more dominant than SNR because of determining the exact locality of the ED that diminished the capture effect. Arithmetic distribution of EDs for various GWs in different clusters helps to minify the near-far effect. The resultant received power (RP) at each cluster is higher for most of the connected EDs than the threshold RP.

Submitted 13 December, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.09086 [pdf]

Artificial Intelligence Nomenclature Identified From Delphi Study on Key Issues Related to Trust and Barriers to Adoption for Autonomous Systems

Authors: Thomas E. Doyle, Victoria Tucci, Calvin Zhu, Yifei Zhang, Basem Yassa, Sajjad Rashidiani, Md Asif Khan, Reza Samavi, Michael Noseworthy, Steven Yule

Abstract: The rapid integration of artificial intelligence across traditional research domains has generated an amalgamation of nomenclature. As cross-discipline teams work together on complex machine learning challenges, finding a consensus of basic definitions in the literature is a more fundamental problem. As a step in the Delphi process to define issues with trust and barriers to the adoption of autonomous systems, our study first collected and ranked the top concerns from a panel of international experts from the fields of engineering, computer science, medicine, aerospace, and defence, with experience working with artificial intelligence. This document presents a summary of the literature definitions for nomenclature derived from expert feedback.

Submitted 14 October, 2022; originally announced October 2022.

Comments: 6 pages

arXiv:2210.08168 [pdf, other]

MKIS-Net: A Light-Weight Multi-Kernel Network for Medical Image Segmentation

Authors: Tariq M. Khan, Muhammad Arsalan, Antonio Robles-Kelly, Erik Meijering

Abstract: Image segmentation is an important task in medical imaging. It constitutes the backbone of a wide variety of clinical diagnostic methods, treatments, and computer-aided surgeries. In this paper, we propose a multi-kernel image segmentation net (MKIS-Net), which uses multiple kernels to create an efficient receptive field and enhance segmentation performance. As a result of its multi-kernel design, MKIS-Net is a light-weight architecture with a small number of trainable parameters. Moreover, these multi-kernel receptive fields also contribute to better segmentation results. We demonstrate the efficacy of MKIS-Net on several tasks including segmentation of retinal vessels, skin lesion segmentation, and chest X-ray segmentation. The performance of the proposed network is quite competitive, and often superior, in comparison to state-of-the-art methods. Moreover, in some cases MKIS-Net has more than an order of magnitude fewer trainable parameters than existing medical image segmentation alternatives and is at least four times smaller than other light-weight architectures.

Submitted 14 October, 2022; originally announced October 2022.

arXiv:2210.03234 [pdf, other]

Swarm of UAVs for Network Management in 6G: A Technical Review

Authors: Muhammad Asghar Khan, Neeraj Kumar, Syed Agha Hassnain Mohsan, Wali Ullah Khan, Moustafa M. Nasralla, Mohammed H. Alsharif, Justyna Żywiołek, Insaf Ullah

Abstract: Fifth-generation (5G) cellular networks have led to the implementation of beyond 5G (B5G) networks, which are capable of incorporating autonomous services to swarm of unmanned aerial vehicles (UAVs). They provide capacity expansion strategies to address massive connectivity issues and guarantee ultra-high throughput and low latency, especially in extreme or emergency situations where network density, bandwidth, and traffic patterns fluctuate. On the one hand, 6G technology integrates AI/ML, IoT, and blockchain to establish ultra-reliable, intelligent, secure, and ubiquitous UAV networks. 6G networks, on the other hand, rely on new enabling technologies such as air interface and transmission technologies, as well as a unique network design, posing new challenges for the swarm of UAVs. Keeping these challenges in mind, this article focuses on the security and privacy, intelligence, and energy-efficiency issues faced by swarms of UAVs operating in 6G mobile networks. In this state-of-the-art review, we integrated blockchain and AI/ML with UAV networks utilizing the 6G ecosystem. The key findings are then presented, and potential research challenges are identified. We conclude the review by shedding light on future research in this emerging field of research.

Submitted 6 October, 2022; originally announced October 2022.

Comments: 19, 9

arXiv:2208.05184 [pdf]

Preserving the beamforming effect for spatial cue-based pseudo-binaural dereverberation of a single source

Authors: Sania Gul, Muhammad Salman Khan, Syed Waqar Shah

Abstract: Reverberations are unavoidable in enclosures, resulting in reduced intelligibility for hearing impaired and non native listeners and even for the normal hearing listeners in noisy circumstances. It also degrades the performance of machine listening applications. In this paper, we propose a novel approach of binaural dereverberation of a single speech source, using the differences in the interaural cues of the direct path signal and the reverberations. Two beamformers, spaced at an interaural distance, are used to extract the reverberations from the reverberant speech. The interaural cues generated by these reverberations and those generated by the direct path signal act as a two class dataset, used for the training of U-Net (a deep convolutional neural network). After its training, the beamformers are removed and the trained U-Net along with the maximum likelihood estimation (MLE) algorithm is used to discriminate between the direct path cues from the reverberation cues, when the system is exposed to the interaural spectrogram of the reverberant speech signal. Our proposed model has outperformed the classical signal processing dereverberation model weighted prediction error in terms of cepstral distance (CEP), frequency weighted segmental signal to noise ratio (FWSEGSNR) and signal to reverberation modulation energy ratio (SRMR) by 1.4 points, 8 dB and 0.6 dB.
It has achieved better performance than the deep learning based dereverberation model by gaining 1.3 points improvement in CEP with comparable FWSEGSNR, using training dataset which is almost 8 times smaller than required for that model. The proposed model also sustained its performance under relatively similar unseen acoustic conditions and at positions in the vicinity of its training position.

Submitted 10 August, 2022; originally announced August 2022.

Comments: 25 pages, 7 figures

arXiv:2208.04998 [pdf, ps, other]

Towards Enabling Next Generation Societal Virtual Reality Applications for Virtual Human Teleportation

Authors: Jacob Chakareski, Mahmudur Khan, Murat Yuksel

Abstract: Virtual reality (VR) is an emerging technology of great societal potential. Some of its most exciting and promising use cases include remote scene content and untethered lifelike navigation. This article first highlights the relevance of such future societal applications and the challenges ahead towards enabling them. It then provides a broad and contextual high-level perspective of several emerging technologies and unconventional techniques and argues that only by their synergistic integration can the fundamental performance bottlenecks of hyper-intensive computation, ultra-high data rate, and ultra-low latency be overcome to enable untethered and lifelike VR-based remote scene immersion. A novel future system concept is introduced that embodies this holistic integration, unified with a rigorous analysis, to capture the fundamental synergies and interplay between communications, computation, and signal scalability that arise in this context, and advance its performance at the same time. Several representative results highlighting these trade-offs and the benefits of the envisioned system are presented at the end.

Submitted 9 August, 2022; originally announced August 2022.

Comments: This is an extended version (with more details) of a tutorial feature article that will appear in the IEEE Signal Processing Magazine in September 2022

arXiv:2208.04626 [pdf]

Recycling an anechoic pre-trained speech separation deep neural network for binaural dereverberation of a single source

Authors: Sania Gul, Muhammad Salman Khan, Syed Waqar Shah, Ata Ur-Rehman

Abstract: Reverberation results in reduced intelligibility for both normal and hearing-impaired listeners. This paper presents a novel psychoacoustic approach of dereverberation of a single speech source by recycling a pre-trained binaural anechoic speech separation neural network. As training the deep neural network (DNN) is a lengthy and computationally expensive process, the advantage of using a pre-trained separation network for dereverberation is that the network does not need to be retrained, saving both time and computational resources. The interaural cues of a reverberant source are given to this pretrained neural network to discriminate between the direct path signal and the reverberant speech. The results show an average improvement of 1.3% in signal intelligibility, 0.83 dB in SRMR (signal to reverberation energy ratio) and 0.16 points in perceptual evaluation of speech quality (PESQ) over other state-of-the-art signal processing dereverberation algorithms and 14% in intelligibility and 0.35 points in quality over orthogonal matching pursuit with spectral subtraction (OSS), a machine learning based dereverberation algorithm.

Submitted 9 August, 2022; originally announced August 2022.

Comments: 15 pages, 4 figures

arXiv:2207.10807 [pdf]

A Machine Learning Approach for Driver Identification Based on CAN-BUS Sensor Data

Authors: Md. Abbas Ali Khan, Mphammad Hanif Ali, AKM Fazlul Haque, Md. Tarek Habib

Abstract: Driver identification is a momentous field of modern decorated vehicles in the controller area network (CAN-BUS) perspective. Many conventional systems are used to identify the driver. One step ahead, most of the researchers use sensor data of CAN-BUS but there are some difficulties because of the variation of the protocol of different models of vehicle. Our aim is to identify the driver through supervised learning algorithms based on driving behavior analysis. To determine the driver, a driver verification technique is proposed that evaluates driving pattern using the measurement of CAN sensor data. In this paper on-board diagnostic (OBD-II) is used to capture the data from the CAN-BUS sensor and the sensors are listed under SAE J1979 statement. According to the service of OBD-II, drive identification is possible. However, we have gained two types of accuracy on a complete data set with 10 drivers and a partial data set with two drivers. The accuracy is good with less number of drivers compared to the higher number of drivers. We have achieved statistically significant results in terms of accuracy in contrast to the baseline algorithm.

Submitted 15 July, 2022; originally announced July 2022.

arXiv:2207.06551 [pdf, other]

Body Composition Assessment with Limited Field-of-view Computed Tomography: A Semantic Image Extension Perspective

Authors: Kaiwen Xu, Thomas Li, Mirza S. Khan, Riqiang Gao, Sanja L. Antic, Yuankai Huo, Kim L. Sandler, Fabien Maldonado, Bennett A. Landman

Abstract: Field-of-view (FOV) tissue truncation beyond the lungs is common in routine lung screening computed tomography (CT). This poses limitations for opportunistic CT-based body composition (BC) assessment as key anatomical structures are missing. Traditionally, extending the FOV of CT is considered as a CT reconstruction problem using limited data. However, this approach relies on the projection domain data which might not be available in application. In this work, we formulate the problem from the semantic image extension perspective which only requires image data as inputs. The proposed two-stage method identifies a new FOV border based on the estimated extent of the complete body and imputes missing tissues in the truncated region. The training samples are simulated using CT slices with complete body in FOV, making the model development self-supervised. We evaluate the validity of the proposed method in automatic BC assessment using lung screening CT with limited FOV. The proposed method effectively restores the missing tissues and reduces BC assessment error introduced by FOV tissue truncation. In the BC assessment for a large-scale lung screening CT dataset, this correction improves both the intra-subject consistency and the correlation with anthropometric approximations. The developed method is available at https://github.com/MASILab/S-EFOV.

Submitted 15 April, 2023; v1 submitted 13 July, 2022; originally announced July 2022.

Comments: Updated with additional evaluation and clarification

arXiv:2206.07595 [pdf]

BIO-CXRNET: A Robust Multimodal Stacking Machine Learning Technique for Mortality Risk Prediction of COVID-19 Patients using Chest X-Ray Images and Clinical Data

Authors: Tawsifur Rahman, Muhammad E. H. Chowdhury, Amith Khandakar, Zaid Bin Mahbub, Md Sakib Abrar Hossain, Abraham Alhatou, Eynas Abdalla, Sreekumar Muthiyal, Khandaker Farzana Islam, Saad Bin Abul Kashem, Muhammad Salman Khan, Susu M. Zughaier, Maqsud Hossain

Abstract: Fast and accurate detection of the disease can significantly help in reducing the strain on the healthcare facility of any country to reduce the mortality during any pandemic. The goal of this work is to create a multimodal system using a novel machine learning framework that uses both Chest X-ray (CXR) images and clinical data to predict severity in COVID-19 patients. In addition, the study presents a nomogram-based scoring technique for predicting the likelihood of death in high-risk patients. This study uses 25 biomarkers and CXR images in predicting the risk in 930 COVID-19 patients admitted during the first wave of COVID-19 (March-June 2020) in Italy. The proposed multimodal stacking technique produced the precision, sensitivity, and F1-score, of 89.03%, 90.44%, and 89.03%, respectively to identify low or high-risk patients. This multimodal approach improved the accuracy by 6% in comparison to the CXR image or clinical data alone. Finally, a nomogram scoring system using multivariate logistic regression was used to stratify the mortality risk among the high-risk patients identified in the first stage. Lactate Dehydrogenase (LDH), O2 percentage, White Blood Cells (WBC) Count, Age, and C-reactive protein (CRP) were identified as useful predictors using a random forest feature selection model. A nomogram score based on these five predictor parameters and a CXR image was developed for quantifying the probability of death and categorizing patients into two risk groups: survived (<50%), and death (>=50%), respectively.
The multi-modal technique was able to predict the death probability of high-risk patients with an F1 score of 92.88%. The area under the curves for the development and validation cohorts are 0.981 and 0.939, respectively.

Submitted 15 June, 2022; originally announced June 2022.

Comments: 25 pages, 8 Tables, 10 Figures

arXiv:2204.12177 [pdf]

doi 10.1007/978-3-030-89817-5_6

A Comparative Study on Approaches to Acoustic Scene Classification using CNNs

Authors: Ishrat Jahan Ananya, Sarah Suad, Shadab Hafiz Choudhury, Mohammad Ashrafuzzaman Khan

Abstract: Acoustic scene classification is a process of characterizing and classifying the environments from sound recordings. The first step is to generate features (representations) from the recorded sound and then classify the background environments. However, different kinds of representations have dramatic effects on the accuracy of the classification. In this paper, we explored the three such representations on classification accuracy using neural networks. We investigated the spectrograms, MFCCs, and embeddings representations using different CNN networks and autoencoders. Our dataset consists of sounds from three settings of indoors and outdoors environments - thus the dataset contains sound from six different kinds of environments. We found that the spectrogram representation has the highest classification accuracy while MFCC has the lowest classification accuracy. We reported our findings, insights as well as some guidelines to achieve better accuracy for environment classification using sounds.

Submitted 26 April, 2022; originally announced April 2022.

Comments: Presented at 2021 Mexican International Conference on Artificial Intelligence. Published in Advances in Computational Intelligence, MICAI 2021, Lecture Notes in Computer Science. 12 pages, 3 figures, 5 tables

Journal ref: Advances in Computational Intelligence, MICAI 2021, Lecture Notes in Artificial Intelligence vol. 13067, pp. 81-91 (2021)

arXiv:2203.15474 [pdf, other]

Gaussian Control Barrier Functions : A Non-Parametric Paradigm to Safety

Authors: Mouhyemen Khan, Tatsuya Ibuki, Abhijit Chatterjee

Abstract: Inspired by the success of control barrier functions (CBFs) in addressing safety, and the rise of data-driven techniques for modeling functions, we propose a non-parametric approach for online synthesis of CBFs using Gaussian Processes (GPs). Mathematical constructs such as CBFs have achieved safety by designing a candidate function a priori. However, designing such a candidate function can be challenging. A practical example of such a setting would be to design a CBF in a disaster recovery scenario where safe and navigable regions need to be determined. The decision boundary for safety in such an example is unknown and cannot be designed a priori. In our approach, we work with safety samples or observations to construct the CBF online by assuming a flexible GP prior on these samples, and term our formulation as a Gaussian CBF. GPs have favorable properties, in addition to being non-parametric, such as analytical tractability and robust uncertainty estimation. This allows realizing the posterior components with high safety guarantees by incorporating variance estimation, while also computing associated partial derivatives in closed-form to achieve safe control. Moreover, the synthesized safety function from our approach allows changing the corresponding safe set arbitrarily based on the data, thus allowing non-convex safe sets. We validate our approach experimentally on a quadrotor by demonstrating safe control for fixed but arbitrary safe sets and collision avoidance where the safe set is constructed online.
Finally, we juxtapose Gaussian CBFs with regular CBFs in the presence of noisy states to highlight its flexibility and robustness to noise. The experiment video can be seen at: https://youtu.be/HX6uokvCiGk.

Submitted 1 August, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

arXiv:2202.08146 [pdf, other]

A Prospective Approach for Human-to-Human Interaction Recognition from Wi-Fi Channel Data using Attention Bidirectional Gated Recurrent Neural Network with GUI Application Implementation

Authors: Md. Mohi Uddin Khan, Abdullah Bin Shams, Md. Mohsin Sarker Raihan

Abstract: Human Activity Recognition (HAR) research has gained significant momentum due to recent technological advancements, artificial intelligence algorithms, the need for smart cities, and socioeconomic transformation. However, existing computer vision and sensor-based HAR solutions have limitations such as privacy issues, memory and power consumption, and discomfort in wearing sensors for which researchers are observing a paradigm shift in HAR research. In response, WiFi-based HAR is gaining popularity due to the availability of more coarse-grained Channel State Information. However, existing WiFi-based HAR approaches are limited to classifying independent and non-concurrent human activities performed within equal time duration. Recent research commonly utilizes a Single Input Multiple Output communication link with a WiFi signal of 5 GHz channel frequency, using two WiFi routers or two Intel 5300 NICs as transmitter-receiver. Our study, on the other hand, utilizes a Multiple Input Multiple Output radio link between a WiFi router and an Intel 5300 NIC, with the time-series Wi-Fi channel state information based on 2.4 GHz channel frequency for mutual human-to-human concurrent interaction recognition. The proposed Self-Attention guided Bidirectional Gated Recurrent Neural Network (Attention-BiGRU) deep learning model can classify 13 mutual interactions with a maximum benchmark accuracy of 94% for a single subject-pair.
This has been expanded for ten subject pairs, which secured a benchmark accuracy of 88% with improved classification around the interaction-transition region. An executable graphical user interface (GUI) software has also been developed in this study using the PyQt5 Python module to classify, save, and display the overall mutual concurrent human interactions performed within a given time duration. ...

Submitted 9 May, 2023; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: 48 Pages. This is the pre-print version article submitted for peer-review to a prestigious journal

arXiv:2202.06372 [pdf]

doi 10.1080/0952813X.2023.2165724

A Survey of Deep Learning Techniques for the Analysis of COVID-19 and their usability for Detecting Omicron

Authors: Asifullah Khan, Saddam Hussain Khan, Mahrukh Saif, Asiya Batool, Anabia Sohail, Muhammad Waleed Khan

Abstract: The Coronavirus (COVID-19) outbreak in December 2019 has become an ongoing threat to humans worldwide, creating a health crisis that infected millions of lives, as well as devastating the global economy. Deep learning (DL) techniques have proved helpful in analysis and delineation of infectious regions in radiological images in a timely manner. This paper makes an in-depth survey of DL techniques and draws a taxonomy based on diagnostic strategies and learning approaches. DL techniques are systematically categorized into classification, segmentation, and multi-stage approaches for COVID-19 diagnosis at image and region level analysis. Each category includes pre-trained and custom-made Convolutional Neural Network architectures for detecting COVID-19 infection in radiographic imaging modalities; X-Ray, and Computer Tomography (CT). Furthermore, a discussion is made on challenges in developing diagnostic techniques such as cross-platform interoperability and examining imaging modality. Similarly, a review of the various methodologies and performance measures used in these techniques is also presented. This survey provides an insight into the promising areas of research in DL for analyzing radiographic images, and further accelerates the research in designing customized DL based diagnostic tools for effectively dealing with new variants of COVID-19 and emerging challenges.

Submitted 4 April, 2022; v1 submitted 13 February, 2022; originally announced February 2022.

Comments: Pages: 44, Figures: 7, Tables: 14

Showing 1–50 of 119 results for author: Khan, M