Search | arXiv e-print repository

arXiv:2406.17051 [pdf, other]

Leveraging Knowledge Distillation for Lightweight Skin Cancer Classification: Balancing Accuracy and Computational Efficiency

Authors: Niful Islam, Khan Md Hasib, Fahmida Akter Joti, Asif Karim, Sami Azam

Abstract: Skin cancer is a major concern to public health, accounting for one-third of the reported cancers. If not detected early, the cancer has the potential for severe consequences. Recognizing the critical need for effective skin cancer classification, we address the limitations of existing models, which are often too large to deploy in areas with limited computational resources. In response, we presen… ▽ More Skin cancer is a major concern to public health, accounting for one-third of the reported cancers. If not detected early, the cancer has the potential for severe consequences. Recognizing the critical need for effective skin cancer classification, we address the limitations of existing models, which are often too large to deploy in areas with limited computational resources. In response, we present a knowledge distillation based approach for creating a lightweight yet high-performing classifier. The proposed solution involves fusing three models, namely ResNet152V2, ConvNeXtBase, and ViT Base, to create an effective teacher model. The teacher model is then employed to guide a lightweight student model of size 2.03 MB. This student model is further compressed to 469.77 KB using 16-bit quantization, enabling smooth incorporation into edge devices. With six-stage image preprocessing, data augmentation, and a rigorous ablation study, the model achieves an impressive accuracy of 98.75% on the HAM10000 dataset and 98.94% on the Kaggle dataset in classifying benign and malignant skin cancers. With its high accuracy and compact size, our model appears to be a potential choice for accurate skin cancer classification, particularly in resource-constrained settings. △ Less

Submitted 28 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

arXiv:2402.14902 [pdf, other]

BIONIB: Blockchain-based IoT using Novelty Index in Bridge Health Monitoring

Authors: Divija Swetha Gadiraju, Ryan McMaster, Saeed Eftekhar Azam, Deepak Khazanchi

Abstract: Bridge health monitoring becomes crucial with the deployment of IoT sensors. The challenge lies in securely storing vast amounts of data and extracting useful information to promptly identify unhealthy bridge conditions. To address this challenge, we propose BIONIB, wherein real-time IoT data is stored on the blockchain for monitoring bridges. One of the emerging blockchains, EOSIO is used because… ▽ More Bridge health monitoring becomes crucial with the deployment of IoT sensors. The challenge lies in securely storing vast amounts of data and extracting useful information to promptly identify unhealthy bridge conditions. To address this challenge, we propose BIONIB, wherein real-time IoT data is stored on the blockchain for monitoring bridges. One of the emerging blockchains, EOSIO is used because of its exceptional scaling capabilities for monitoring the health of bridges. The approach involves collecting data from IoT sensors and using an unsupervised machine learning-based technique called the Novelty Index (NI) to observe meaningful patterns in the data. Smart contracts of EOSIO are used in implementation because of their efficiency, security, and programmability, making them well-suited for handling complex transactions and automating processes within decentralized applications. BIONIB provides secure storage benefits of blockchain, as well as useful predictions based on the NI. Performance analysis uses real-time data collected from IoT sensors at the bridge in healthy and unhealthy states. The data is collected with extensive experimentation with different loads, climatic conditions, and the health of the bridge. The performance of BIONIB under varying numbers of sensors and various numbers of participating blockchain nodes is observed. We observe a tradeoff between throughput, latency, and computational resources. Storage efficiency can be increased by manifolds with a slight increase in latency caused by NI calculation. As latency is not a significant concern in bridge health applications, the results demonstrate that BIONIB has high throughput, parallel processing, and high security while efficiently scaled. △ Less

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.14757 [pdf, other]

SHM-Traffic: DRL and Transfer learning based UAV Control for Structural Health Monitoring of Bridges with Traffic

Authors: Divija Swetha Gadiraju, Saeed Eftekhar Azam, Deepak Khazanchi

Abstract: This work focuses on using advanced techniques for structural health monitoring (SHM) for bridges with Traffic. We propose an approach using deep reinforcement learning (DRL)-based control for Unmanned Aerial Vehicle (UAV). Our approach conducts a concrete bridge deck survey while traffic is ongoing and detects cracks. The UAV performs the crack detection, and the location of cracks is initially u… ▽ More This work focuses on using advanced techniques for structural health monitoring (SHM) for bridges with Traffic. We propose an approach using deep reinforcement learning (DRL)-based control for Unmanned Aerial Vehicle (UAV). Our approach conducts a concrete bridge deck survey while traffic is ongoing and detects cracks. The UAV performs the crack detection, and the location of cracks is initially unknown. We use two edge detection techniques. First, we use canny edge detection for crack detection. We also use a Convolutional Neural Network (CNN) for crack detection and compare it with canny edge detection. Transfer learning is applied using CNN with pre-trained weights obtained from a crack image dataset. This enables the model to adapt and improve its performance in identifying and localizing cracks. Proximal Policy Optimization (PPO) is applied for UAV control and bridge surveys. The experimentation across various scenarios is performed to evaluate the performance of the proposed methodology. Key metrics such as task completion time and reward convergence are observed to gauge the effectiveness of the approach. We observe that the Canny edge detector offers up to 40\% lower task completion time, while the CNN excels in up to 12\% better damage detection and 1.8 times better rewards. △ Less

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2401.03236 [pdf, other]

Challenges of Data-Driven Simulation of Diverse and Consistent Human Driving Behaviors

Authors: Kalle Kujanpää, Daulet Baimukashev, Shibei Zhu, Shoaib Azam, Farzeen Munir, Gokhan Alcan, Ville Kyrki

Abstract: Building simulation environments for develo** and testing autonomous vehicles necessitates that the simulators accurately model the statistical realism of the real-world environment, including the interaction with other vehicles driven by human drivers. To address this requirement, an accurate human behavior model is essential to incorporate the diversity and consistency of human driving behavio… ▽ More Building simulation environments for develo** and testing autonomous vehicles necessitates that the simulators accurately model the statistical realism of the real-world environment, including the interaction with other vehicles driven by human drivers. To address this requirement, an accurate human behavior model is essential to incorporate the diversity and consistency of human driving behavior. We propose a mathematical framework for designing a data-driven simulation model that simulates human driving behavior more realistically than the currently used physics-based simulation models. Experiments conducted using the NGSIM dataset validate our hypothesis regarding the necessity of considering the complexity, diversity, and consistency of human driving behavior when aiming to develop realistic simulators. △ Less

Submitted 6 January, 2024; originally announced January 2024.

arXiv:2310.19405 [pdf, other]

Radar-Lidar Fusion for Object Detection by Designing Effective Convolution Networks

Authors: Farzeen Munir, Shoaib Azam, Tomasz Kucner, Ville Kyrki, Moongu Jeon

Abstract: Object detection is a core component of perception systems, providing the ego vehicle with information about its surroundings to ensure safe route planning. While cameras and Lidar have significantly advanced perception systems, their performance can be limited in adverse weather conditions. In contrast, millimeter-wave technology enables radars to function effectively in such conditions. However,… ▽ More Object detection is a core component of perception systems, providing the ego vehicle with information about its surroundings to ensure safe route planning. While cameras and Lidar have significantly advanced perception systems, their performance can be limited in adverse weather conditions. In contrast, millimeter-wave technology enables radars to function effectively in such conditions. However, relying solely on radar for building a perception system doesn't fully capture the environment due to the data's sparse nature. To address this, sensor fusion strategies have been introduced. We propose a dual-branch framework to integrate radar and Lidar data for enhanced object detection. The primary branch focuses on extracting radar features, while the auxiliary branch extracts Lidar features. These are then combined using additive attention. Subsequently, the integrated features are processed through a novel Parallel Forked Structure (PFS) to manage scale variations. A region proposal head is then utilized for object detection. We evaluated the effectiveness of our proposed method on the Radiate dataset using COCO metrics. The results show that it surpasses state-of-the-art methods by $1.89\%$ and $2.61\%$ in favorable and adverse weather conditions, respectively. This underscores the value of radar-Lidar fusion in achieving precise object detection and localization, especially in challenging weather conditions. △ Less

Submitted 30 October, 2023; originally announced October 2023.

Comments: ITSC conference paper

arXiv:2310.00098 [pdf, other]

Federated Learning with Differential Privacy for End-to-End Speech Recognition

Authors: Martin Pelikan, Sheikh Shams Azam, Vitaly Feldman, Jan "Honza" Silovsky, Kunal Talwar, Tatiana Likhomanenko

Abstract: While federated learning (FL) has recently emerged as a promising approach to train machine learning models, it is limited to only preliminary explorations in the domain of automatic speech recognition (ASR). Moreover, FL does not inherently guarantee user privacy and requires the use of differential privacy (DP) for robust privacy guarantees. However, we are not aware of prior work on applying DP… ▽ More While federated learning (FL) has recently emerged as a promising approach to train machine learning models, it is limited to only preliminary explorations in the domain of automatic speech recognition (ASR). Moreover, FL does not inherently guarantee user privacy and requires the use of differential privacy (DP) for robust privacy guarantees. However, we are not aware of prior work on applying DP to FL for ASR. In this paper, we aim to bridge this research gap by formulating an ASR benchmark for FL with DP and establishing the first baselines. First, we extend the existing research on FL for ASR by exploring different aspects of recent $\textit{large end-to-end transformer models}$: architecture design, seed models, data heterogeneity, domain shift, and impact of cohort size. With a $\textit{practical}$ number of central aggregations we are able to train $\textbf{FL models}$ that are \textbf{nearly optimal} even with heterogeneous data, a seed model from another domain, or no pre-trained seed model. Second, we apply DP to FL for ASR, which is non-trivial since DP noise severely affects model training, especially for large transformer models, due to highly imbalanced gradients in the attention block. We counteract the adverse effect of DP noise by reviving per-layer clip** and explaining why its effect is more apparent in our case than in the prior work. Remarkably, we achieve user-level ($7.2$, $10^{-9}$)-$\textbf{DP}$ (resp. ($4.5$, $10^{-9}$)-$\textbf{DP}$) with a 1.3% (resp. 4.6%) absolute drop in the word error rate for extrapolation to high (resp. low) population scale for $\textbf{FL with DP in ASR}$. △ Less

Submitted 29 September, 2023; originally announced October 2023.

Comments: Under review

arXiv:2309.13102 [pdf, other]

Importance of Smoothness Induced by Optimizers in FL4ASR: Towards Understanding Federated Learning for End-to-End ASR

Authors: Sheikh Shams Azam, Tatiana Likhomanenko, Martin Pelikan, Jan "Honza" Silovsky

Abstract: In this paper, we start by training End-to-End Automatic Speech Recognition (ASR) models using Federated Learning (FL) and examining the fundamental considerations that can be pivotal in minimizing the performance gap in terms of word error rate between models trained using FL versus their centralized counterpart. Specifically, we study the effect of (i) adaptive optimizers, (ii) loss characterist… ▽ More In this paper, we start by training End-to-End Automatic Speech Recognition (ASR) models using Federated Learning (FL) and examining the fundamental considerations that can be pivotal in minimizing the performance gap in terms of word error rate between models trained using FL versus their centralized counterpart. Specifically, we study the effect of (i) adaptive optimizers, (ii) loss characteristics via altering Connectionist Temporal Classification (CTC) weight, (iii) model initialization through seed start, (iv) carrying over modeling setup from experiences in centralized training to FL, e.g., pre-layer or post-layer normalization, and (v) FL-specific hyperparameters, such as number of local epochs, client sampling size, and learning rate scheduler, specifically for ASR under heterogeneous data distribution. We shed light on how some optimizers work better than others via inducing smoothness. We also summarize the applicability of algorithms, trends, and propose best practices from prior works in FL (in general) toward End-to-End ASR models. △ Less

Submitted 22 September, 2023; originally announced September 2023.

Comments: In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023

arXiv:2308.13545 [pdf, other]

Feature Extraction Using Deep Generative Models for Bangla Text Classification on a New Comprehensive Dataset

Authors: Md. Rafi-Ur-Rashid, Sami Azam, Mirjam Jonkman

Abstract: The selection of features for text classification is a fundamental task in text mining and information retrieval. Despite being the sixth most widely spoken language in the world, Bangla has received little attention due to the scarcity of text datasets. In this research, we collected, annotated, and prepared a comprehensive dataset of 212,184 Bangla documents in seven different categories and mad… ▽ More The selection of features for text classification is a fundamental task in text mining and information retrieval. Despite being the sixth most widely spoken language in the world, Bangla has received little attention due to the scarcity of text datasets. In this research, we collected, annotated, and prepared a comprehensive dataset of 212,184 Bangla documents in seven different categories and made it publicly accessible. We implemented three deep learning generative models: LSTM variational autoencoder (LSTM VAE), auxiliary classifier generative adversarial network (AC-GAN), and adversarial autoencoder (AAE) to extract text features, although their applications are initially found in the field of computer vision. We utilized our dataset to train these three models and used the feature space obtained in the document classification task. We evaluated the performance of the classifiers and found that the adversarial autoencoder model produced the best feature space. △ Less

Submitted 21 August, 2023; originally announced August 2023.

Comments: To be submitted in Heliyon Journal

arXiv:2210.06758 [pdf, other]

Exploring Contextual Representation and Multi-Modality for End-to-End Autonomous Driving

Authors: Shoaib Azam, Farzeen Munir, Ville Kyrki, Moongu Jeon, Witold Pedrycz

Abstract: Learning contextual and spatial environmental representations enhances autonomous vehicle's hazard anticipation and decision-making in complex scenarios. Recent perception systems enhance spatial understanding with sensor fusion but often lack full environmental context. Humans, when driving, naturally employ neural maps that integrate various factors such as historical data, situational subtletie… ▽ More Learning contextual and spatial environmental representations enhances autonomous vehicle's hazard anticipation and decision-making in complex scenarios. Recent perception systems enhance spatial understanding with sensor fusion but often lack full environmental context. Humans, when driving, naturally employ neural maps that integrate various factors such as historical data, situational subtleties, and behavioral predictions of other road users to form a rich contextual understanding of their surroundings. This neural map-based comprehension is integral to making informed decisions on the road. In contrast, even with their significant advancements, autonomous systems have yet to fully harness this depth of human-like contextual understanding. Motivated by this, our work draws inspiration from human driving patterns and seeks to formalize the sensor fusion approach within an end-to-end autonomous driving framework. We introduce a framework that integrates three cameras (left, right, and center) to emulate the human field of view, coupled with top-down bird-eye-view semantic data to enhance contextual representation. The sensor data is fused and encoded using a self-attention mechanism, leading to an auto-regressive waypoint prediction module. We treat feature representation as a sequential problem, employing a vision transformer to distill the contextual interplay between sensor modalities. The efficacy of the proposed method is experimentally evaluated in both open and closed-loop settings. Our method achieves displacement error by 0.67m in open-loop settings, surpassing current methods by 6.9% on the nuScenes dataset. In closed-loop evaluations on CARLA's Town05 Long and Longest6 benchmarks, the proposed method enhances driving performance, route completion, and reduces infractions. △ Less

Submitted 16 January, 2024; v1 submitted 13 October, 2022; originally announced October 2022.

arXiv:2202.05500 [pdf, other]

Multi-Modal Fusion for Sensorimotor Coordination in Steering Angle Prediction

Authors: Farzeen Munir, Shoaib Azam, Byung-Geun Lee, Moongu Jeon

Abstract: Imitation learning is employed to learn sensorimotor coordination for steering angle prediction in an end-to-end fashion requires expert demonstrations. These expert demonstrations are paired with environmental perception and vehicle control data. The conventional frame-based RGB camera is the most common exteroceptive sensor modality used to acquire the environmental perception data. The frame-ba… ▽ More Imitation learning is employed to learn sensorimotor coordination for steering angle prediction in an end-to-end fashion requires expert demonstrations. These expert demonstrations are paired with environmental perception and vehicle control data. The conventional frame-based RGB camera is the most common exteroceptive sensor modality used to acquire the environmental perception data. The frame-based RGB camera has produced promising results when used as a single modality in learning end-to-end lateral control. However, the conventional frame-based RGB camera has limited operability in illumination variation conditions and is affected by the motion blur. The event camera provides complementary information to the frame-based RGB camera. This work explores the fusion of frame-based RGB and event data for learning end-to-end lateral control by predicting steering angle. In addition, how the representation from event data fuse with frame-based RGB data helps to predict the lateral control robustly for the autonomous vehicle. To this end, we propose DRFuser, a novel convolutional encoder-decoder architecture for learning end-to-end lateral control. The encoder module is branched between the frame-based RGB data and event data along with the self-attention layers. Moreover, this study has also contributed to our own collected dataset comprised of event, frame-based RGB, and vehicle control data. The efficacy of the proposed method is experimentally evaluated on our collected dataset, Davis Driving dataset (DDD), and Carla Eventscape dataset. The experimental results illustrate that the proposed method DRFuser outperforms the state-of-the-art in terms of root-mean-square error (RMSE) and mean absolute error (MAE) used as evaluation metrics. △ Less

Submitted 11 February, 2022; originally announced February 2022.

arXiv:2202.00280 [pdf, other]

Recycling Model Updates in Federated Learning: Are Gradient Subspaces Low-Rank?

Authors: Sheikh Shams Azam, Seyyedali Hosseinalipour, Qiang Qiu, Christopher Brinton

Abstract: In this paper, we question the rationale behind propagating large numbers of parameters through a distributed system during federated learning. We start by examining the rank characteristics of the subspace spanned by gradients across epochs (i.e., the gradient-space) in centralized model training, and observe that this gradient-space often consists of a few leading principal components accounting… ▽ More In this paper, we question the rationale behind propagating large numbers of parameters through a distributed system during federated learning. We start by examining the rank characteristics of the subspace spanned by gradients across epochs (i.e., the gradient-space) in centralized model training, and observe that this gradient-space often consists of a few leading principal components accounting for an overwhelming majority (95-99%) of the explained variance. Motivated by this, we propose the "Look-back Gradient Multiplier" (LBGM) algorithm, which exploits this low-rank property to enable gradient recycling between model update rounds of federated learning, reducing transmissions of large parameters to single scalars for aggregation. We analytically characterize the convergence behavior of LBGM, revealing the nature of the trade-off between communication savings and model performance. Our subsequent experimental results demonstrate the improvement LBGM obtains in communication overhead compared to conventional federated learning on several datasets and deep learning models. Additionally, we show that LBGM is a general plug-and-play algorithm that can be used standalone or stacked on top of existing sparsification techniques for distributed model training. △ Less

Submitted 1 February, 2022; originally announced February 2022.

Comments: In Proceedings of the 10th International Conference on Learning Representations (ICLR) 2022

arXiv:2111.15257 [pdf, other]

ARTSeg: Employing Attention for Thermal images Semantic Segmentation

Authors: Farzeen Munir, Shoaib Azam, Unse Fatima, Moongu Jeon

Abstract: The research advancements have made the neural network algorithms deployed in the autonomous vehicle to perceive the surrounding. The standard exteroceptive sensors that are utilized for the perception of the environment are cameras and Lidar. Therefore, the neural network algorithms developed using these exteroceptive sensors have provided the necessary solution for the autonomous vehicle's perce… ▽ More The research advancements have made the neural network algorithms deployed in the autonomous vehicle to perceive the surrounding. The standard exteroceptive sensors that are utilized for the perception of the environment are cameras and Lidar. Therefore, the neural network algorithms developed using these exteroceptive sensors have provided the necessary solution for the autonomous vehicle's perception. One major drawback of these exteroceptive sensors is their operability in adverse weather conditions, for instance, low illumination and night conditions. The useability and affordability of thermal cameras in the sensor suite of the autonomous vehicle provide the necessary improvement in the autonomous vehicle's perception in adverse weather conditions. The semantics of the environment benefits the robust perception, which can be achieved by segmenting different objects in the scene. In this work, we have employed the thermal camera for semantic segmentation. We have designed an attention-based Recurrent Convolution Network (RCNN) encoder-decoder architecture named ARTSeg for thermal semantic segmentation. The main contribution of this work is the design of encoder-decoder architecture, which employ units of RCNN for each encoder and decoder block. Furthermore, additive attention is employed in the decoder module to retain high-resolution features and improve the localization of features. The efficacy of the proposed method is evaluated on the available public dataset, showing better performance with other state-of-the-art methods in mean intersection over union (IoU). △ Less

Submitted 30 November, 2021; originally announced November 2021.

arXiv:2109.03350 [pdf, other]

Federated Learning Beyond the Star: Local D2D Model Consensus with Global Cluster Sampling

Authors: Frank Po-Chen Lin, Seyyedali Hosseinalipour, Sheikh Shams Azam, Christopher G. Brinton, Nicolò Michelusi

Abstract: Federated learning has emerged as a popular technique for distributing model training across the network edge. Its learning architecture is conventionally a star topology between the devices and a central server. In this paper, we propose two timescale hybrid federated learning (TT-HF), which migrates to a more distributed topology via device-to-device (D2D) communications. In TT-HF, local model t… ▽ More Federated learning has emerged as a popular technique for distributing model training across the network edge. Its learning architecture is conventionally a star topology between the devices and a central server. In this paper, we propose two timescale hybrid federated learning (TT-HF), which migrates to a more distributed topology via device-to-device (D2D) communications. In TT-HF, local model training occurs at devices via successive gradient iterations, and the synchronization process occurs at two timescales: (i) macro-scale, where global aggregations are carried out via device-server interactions, and (ii) micro-scale, where local aggregations are carried out via D2D cooperative consensus formation in different device clusters. Our theoretical analysis reveals how device, cluster, and network-level parameters affect the convergence of TT-HF, and leads to a set of conditions under which a convergence rate of O(1/t) is guaranteed. Experimental results demonstrate the improvements in convergence and utilization that can be obtained by TT-HF over state-of-the-art federated learning baselines. △ Less

Submitted 12 September, 2021; v1 submitted 7 September, 2021; originally announced September 2021.

Comments: This paper has been published in IEEE Global Communications Conference 2021 (Globecom). arXiv admin note: substantial text overlap with arXiv:2103.10481

arXiv:2103.10481 [pdf, other]

Semi-Decentralized Federated Learning with Cooperative D2D Local Model Aggregations

Authors: Frank Po-Chen Lin, Seyyedali Hosseinalipour, Sheikh Shams Azam, Christopher G. Brinton, Nicolo Michelusi

Abstract: Federated learning has emerged as a popular technique for distributing machine learning (ML) model training across the wireless edge. In this paper, we propose two timescale hybrid federated learning (TT-HF), a semi-decentralized learning architecture that combines the conventional device-to-server communication paradigm for federated learning with device-to-device (D2D) communications for model t… ▽ More Federated learning has emerged as a popular technique for distributing machine learning (ML) model training across the wireless edge. In this paper, we propose two timescale hybrid federated learning (TT-HF), a semi-decentralized learning architecture that combines the conventional device-to-server communication paradigm for federated learning with device-to-device (D2D) communications for model training. In TT-HF, during each global aggregation interval, devices (i) perform multiple stochastic gradient descent iterations on their individual datasets, and (ii) aperiodically engage in consensus procedure of their model parameters through cooperative, distributed D2D communications within local clusters. With a new general definition of gradient diversity, we formally study the convergence behavior of TT-HF, resulting in new convergence bounds for distributed ML. We leverage our convergence bounds to develop an adaptive control algorithm that tunes the step size, D2D communication rounds, and global aggregation period of TT-HF over time to target a sublinear convergence rate of O(1/t) while minimizing network resource utilization. Our subsequent experiments demonstrate that TT-HF significantly outperforms the current art in federated learning in terms of model accuracy and/or network energy consumption in different scenarios where local device datasets exhibit statistical heterogeneity. Finally, our numerical evaluations demonstrate robustness against outages caused by fading channels, as well favorable performance with non-convex loss functions. △ Less

Submitted 30 September, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

Comments: This paper has been published in IEEE Journal on Selected Areas in Communications (JSAC)

arXiv:2103.03150 [pdf, other]

SSTN: Self-Supervised Domain Adaptation Thermal Object Detection for Autonomous Driving

Authors: Farzeen Munir, Shoaib Azam, Moongu Jeon

Abstract: The sensibility and sensitivity of the environment play a decisive role in the safe and secure operation of autonomous vehicles. This perception of the surrounding is way similar to human visual representation. The human's brain perceives the environment by utilizing different sensory channels and develop a view-invariant representation model. Kee** in this context, different exteroceptive senso… ▽ More The sensibility and sensitivity of the environment play a decisive role in the safe and secure operation of autonomous vehicles. This perception of the surrounding is way similar to human visual representation. The human's brain perceives the environment by utilizing different sensory channels and develop a view-invariant representation model. Kee** in this context, different exteroceptive sensors are deployed on the autonomous vehicle for perceiving the environment. The most common exteroceptive sensors are camera, Lidar and radar for autonomous vehicle's perception. Despite being these sensors have illustrated their benefit in the visible spectrum domain yet in the adverse weather conditions, for instance, at night, they have limited operation capability, which may lead to fatal accidents. In this work, we explore thermal object detection to model a view-invariant model representation by employing the self-supervised contrastive learning approach. For this purpose, we have proposed a deep neural network Self Supervised Thermal Network (SSTN) for learning the feature embedding to maximize the information between visible and infrared spectrum domain by contrastive learning, and later employing these learned feature representation for the thermal object detection using multi-scale encoder-decoder transformer network. The proposed method is extensively evaluated on the two publicly available datasets: the FLIR-ADAS dataset and the KAIST Multi-Spectral dataset. The experimental results illustrate the efficacy of the proposed method. △ Less

Submitted 30 November, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

arXiv:2101.03531 [pdf, other]

Channel Boosting Feature Ensemble for Radar-based Object Detection

Authors: Shoaib Azam, Farzeen Munir, Moongu Jeon

Abstract: Autonomous vehicles are conceived to provide safe and secure services by validating the safety standards as indicated by SOTIF-ISO/PAS-21448 (Safety of the intended functionality). Kee** in this context, the perception of the environment plays an instrumental role in conjunction with localization, planning and control modules. As a pivotal algorithm in the perception stack, object detection prov… ▽ More Autonomous vehicles are conceived to provide safe and secure services by validating the safety standards as indicated by SOTIF-ISO/PAS-21448 (Safety of the intended functionality). Kee** in this context, the perception of the environment plays an instrumental role in conjunction with localization, planning and control modules. As a pivotal algorithm in the perception stack, object detection provides extensive insights into the autonomous vehicle's surroundings. Camera and Lidar are extensively utilized for object detection among different sensor modalities, but these exteroceptive sensors have limitations in resolution and adverse weather conditions. In this work, radar-based object detection is explored provides a counterpart sensor modality to be deployed and used in adverse weather conditions. The radar gives complex data; for this purpose, a channel boosting feature ensemble method with transformer encoder-decoder network is proposed. The object detection task using radar is formulated as a set prediction problem and evaluated on the publicly available dataset in both good and good-bad weather conditions. The proposed method's efficacy is extensively evaluated using the COCO evaluation metric, and the best-proposed model surpasses its state-of-the-art counterpart method by $12.55\%$ and $12.48\%$ in both good and good-bad weather conditions. △ Less

Submitted 30 November, 2021; v1 submitted 10 January, 2021; originally announced January 2021.

arXiv:2010.01792 [pdf, other]

Can we Generalize and Distribute Private Representation Learning?

Authors: Sheikh Shams Azam, Tae** Kim, Seyyedali Hosseinalipour, Carlee Joe-Wong, Saurabh Bagchi, Christopher Brinton

Abstract: We study the problem of learning representations that are private yet informative, i.e., provide information about intended "ally" targets while hiding sensitive "adversary" attributes. We propose Exclusion-Inclusion Generative Adversarial Network (EIGAN), a generalized private representation learning (PRL) architecture that accounts for multiple ally and adversary attributes unlike existing PRL s… ▽ More We study the problem of learning representations that are private yet informative, i.e., provide information about intended "ally" targets while hiding sensitive "adversary" attributes. We propose Exclusion-Inclusion Generative Adversarial Network (EIGAN), a generalized private representation learning (PRL) architecture that accounts for multiple ally and adversary attributes unlike existing PRL solutions. While centrally-aggregated dataset is a prerequisite for most PRL techniques, data in real-world is often siloed across multiple distributed nodes unwilling to share the raw data because of privacy concerns. We address this practical constraint by develo** D-EIGAN, the first distributed PRL method that learns representations at each node without transmitting the source data. We theoretically analyze the behavior of adversaries under the optimal EIGAN and D-EIGAN encoders and the impact of dependencies among ally and adversary tasks on the optimization objective. Our experiments on various datasets demonstrate the advantages of EIGAN in terms of performance, robustness, and scalability. In particular, EIGAN outperforms the previous state-of-the-art by a significant accuracy margin (47% improvement), and D-EIGAN's performance is consistently on par with EIGAN under different network settings. △ Less

Submitted 30 January, 2022; v1 submitted 5 October, 2020; originally announced October 2020.

Comments: In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS) 2022

arXiv:2009.08020 [pdf, other]

doi 10.1109/TITS.2021.3102479

LDNet: End-to-End Lane Marking Detection Approach Using a Dynamic Vision Sensor

Authors: Farzeen Munir, Shoaib Azam, Moongu Jeon, Byung-Geun Lee, Witold Pedrycz

Abstract: Modern vehicles are equipped with various driver-assistance systems, including automatic lane kee**, which prevents unintended lane departures. Traditional lane detection methods incorporate handcrafted or deep learning-based features followed by postprocessing techniques for lane extraction using frame-based RGB cameras. The utilization of frame-based RGB cameras for lane detection tasks is pro… ▽ More Modern vehicles are equipped with various driver-assistance systems, including automatic lane kee**, which prevents unintended lane departures. Traditional lane detection methods incorporate handcrafted or deep learning-based features followed by postprocessing techniques for lane extraction using frame-based RGB cameras. The utilization of frame-based RGB cameras for lane detection tasks is prone to illumination variations, sun glare, and motion blur, which limits the performance of lane detection methods. Incorporating an event camera for lane detection tasks in the perception stack of autonomous driving is one of the most promising solutions for mitigating challenges encountered by frame-based RGB cameras. The main contribution of this work is the design of the lane marking detection model, which employs the dynamic vision sensor. This paper explores the novel application of lane marking detection using an event camera by designing a convolutional encoder followed by the attention-guided decoder. The spatial resolution of the encoded features is retained by a dense atrous spatial pyramid pooling (ASPP) block. The additive attention mechanism in the decoder improves performance for high dimensional input encoded features that promote lane localization and relieve postprocessing computation. The efficacy of the proposed work is evaluated using the DVS dataset for lane extraction (DET). The experimental results show a significant improvement of $5.54\%$ and $5.03\%$ in $F1$ scores in multiclass and binary-class lane marking detection tasks. Additionally, the intersection over union ($IoU$) scores of the proposed method surpass those of the best-performing state-of-the-art method by $6.50\%$ and $9.37\%$ in multiclass and binary-class tasks, respectively. △ Less

Submitted 30 November, 2021; v1 submitted 16 September, 2020; originally announced September 2020.

Journal ref: Munir, Farzeen, Shoaib Azam, Moongu Jeon, Byung-Geun Lee, and Witold Pedrycz. "LDNet: End-to-End Lane Marking Detection Approach Using a Dynamic Vision Sensor." IEEE Transactions on Intelligent Transportation Systems (2021)

arXiv:2007.09511 [pdf, other]

doi 10.1109/TNET.2022.3143495

Multi-Stage Hybrid Federated Learning over Large-Scale D2D-Enabled Fog Networks

Authors: Seyyedali Hosseinalipour, Sheikh Shams Azam, Christopher G. Brinton, Nicolo Michelusi, Vaneet Aggarwal, David J. Love, Huaiyu Dai

Abstract: Federated learning has generated significant interest, with nearly all works focused on a "star" topology where nodes/devices are each connected to a central server. We migrate away from this architecture and extend it through the network dimension to the case where there are multiple layers of nodes between the end devices and the server. Specifically, we develop multi-stage hybrid federated lear… ▽ More Federated learning has generated significant interest, with nearly all works focused on a "star" topology where nodes/devices are each connected to a central server. We migrate away from this architecture and extend it through the network dimension to the case where there are multiple layers of nodes between the end devices and the server. Specifically, we develop multi-stage hybrid federated learning (MH-FL), a hybrid of intra- and inter-layer model learning that considers the network as a multi-layer cluster-based structure. MH-FL considers the topology structures among the nodes in the clusters, including local networks formed via device-to-device (D2D) communications, and presumes a semi-decentralized architecture for federated learning. It orchestrates the devices at different network layers in a collaborative/cooperative manner (i.e., using D2D interactions) to form local consensus on the model parameters and combines it with multi-stage parameter relaying between layers of the tree-shaped hierarchy. We derive the upper bound of convergence for MH-FL with respect to parameters of the network topology (e.g., the spectral radius) and the learning algorithm (e.g., the number of D2D rounds in different clusters). We obtain a set of policies for the D2D rounds at different clusters to guarantee either a finite optimality gap or convergence to the global optimum. We then develop a distributed control algorithm for MH-FL to tune the D2D rounds in each cluster over time to meet specific convergence criteria. Our experiments on real-world datasets verify our analytical results and demonstrate the advantages of MH-FL in terms of resource utilization metrics. △ Less

Submitted 12 January, 2022; v1 submitted 18 July, 2020; originally announced July 2020.

Comments: This paper is accepted for publication in IEEE/ACM Transactions on Networking

Journal ref: IEEE/ACM Transactions on Networking, 2022

arXiv:2006.00821 [pdf, other]

Exploring Thermal Images for Object Detection in Underexposure Regions for Autonomous Driving

Authors: Farzeen Munir, Shoaib Azam, Muhammd Aasim Rafique, Ahmad Muqeem Sheri, Moongu Jeon, Witold Pedrycz

Abstract: Underexposure regions are vital to construct a complete perception of the surroundings for safe autonomous driving. The availability of thermal cameras has provided an essential alternate to explore regions where other optical sensors lack in capturing interpretable signals. A thermal camera captures an image using the heat difference emitted by objects in the infrared spectrum, and object detecti… ▽ More Underexposure regions are vital to construct a complete perception of the surroundings for safe autonomous driving. The availability of thermal cameras has provided an essential alternate to explore regions where other optical sensors lack in capturing interpretable signals. A thermal camera captures an image using the heat difference emitted by objects in the infrared spectrum, and object detection in thermal images becomes effective for autonomous driving in challenging conditions. Although object detection in the visible spectrum domain imaging has matured, thermal object detection lacks effectiveness. A significant challenge is scarcity of labeled data for the thermal domain which is desiderata for SOTA artificial intelligence techniques. This work proposes a domain adaptation framework which employs a style transfer technique for transfer learning from visible spectrum images to thermal images. The framework uses a generative adversarial network (GAN) to transfer the low-level features from the visible spectrum domain to the thermal domain through style consistency. The efficacy of the proposed method of object detection in thermal images is evident from the improved results when used styled images from publicly available thermal image datasets (FLIR ADAS and KAIST Multi-Spectral). △ Less

Submitted 3 May, 2021; v1 submitted 1 June, 2020; originally announced June 2020.

arXiv:2002.06604 [pdf, other]

Key Points Estimation and Point Instance Segmentation Approach for Lane Detection

Authors: Yeongmin Ko, Younkwan Lee, Shoaib Azam, Farzeen Munir, Moongu Jeon, Witold Pedrycz

Abstract: Perception techniques for autonomous driving should be adaptive to various environments. In the case of traffic line detection, an essential perception module, many condition should be considered, such as number of traffic lines and computing power of the target system. To address these problems, in this paper, we propose a traffic line detection method called Point Instance Network (PINet); the m… ▽ More Perception techniques for autonomous driving should be adaptive to various environments. In the case of traffic line detection, an essential perception module, many condition should be considered, such as number of traffic lines and computing power of the target system. To address these problems, in this paper, we propose a traffic line detection method called Point Instance Network (PINet); the method is based on the key points estimation and instance segmentation approach. The PINet includes several stacked hourglass networks that are trained simultaneously. Therefore the size of the trained models can be chosen according to the computing power of the target environment. We cast a clustering problem of the predicted key points as an instance segmentation problem; the PINet can be trained regardless of the number of the traffic lines. The PINet achieves competitive accuracy and false positive on the TuSimple and Culane datasets, popular public datasets for lane detection. Our code is available at https://github.com/koyeongmin/PINet_new △ Less

Submitted 13 September, 2020; v1 submitted 16 February, 2020; originally announced February 2020.

Comments: Submitted to "IEEE Transactions on Intelligent Transportation Systems"

arXiv:1804.11149 [pdf, other]

doi 10.5281/zenodo.1474513

Q-Map: Clinical Concept Mining from Clinical Documents

Authors: Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi Kasivajjala

Abstract: Over the past decade, there has been a steep rise in the data-driven analysis in major areas of medicine, such as clinical decision support system, survival analysis, patient similarity analysis, image analytics etc. Most of the data in the field are well-structured and available in numerical or categorical formats which can be used for experiments directly. But on the opposite end of the spectrum… ▽ More Over the past decade, there has been a steep rise in the data-driven analysis in major areas of medicine, such as clinical decision support system, survival analysis, patient similarity analysis, image analytics etc. Most of the data in the field are well-structured and available in numerical or categorical formats which can be used for experiments directly. But on the opposite end of the spectrum, there exists a wide expanse of data that is intractable for direct analysis owing to its unstructured nature which can be found in the form of discharge summaries, clinical notes, procedural notes which are in human written narrative format and neither have any relational model nor any standard grammatical structure. An important step in the utilization of these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map in this paper, which is a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique which is based on a string matching algorithm that is indexed on curated knowledge sources, that is both fast and configurable. The authors also briefly examine its comparative performance with MetaMap, one of the most reputed tools for medical concepts retrieval and present the advantages the former displays over the latter. △ Less

Submitted 16 July, 2018; v1 submitted 30 April, 2018; originally announced April 2018.

Comments: 6 pages, 1 figure

Journal ref: International Journal of Computer and Information Engineering, 12(9), 2018, 691 - 696

Showing 1–22 of 22 results for author: Azam, S