-
Optimization of End-to-End AoI in Edge-Enabled Vehicular Fog Systems: A Dueling-DQN Approach
Authors:
Seifu Birhanu Tadele,
Binayak Kar,
Frezer Guteta Wakgra,
Asif Uddin Khan
Abstract:
In real-time status update services for the Internet of Things (IoT), the timely dissemination of information is crucial to maintaining its relevance. Failing to keep up with these updates results in outdated information. The age of information (AoI) serves as a metric to quantify the freshness of information. Existing works on optimizing AoI primarily focus on the transmission time from the information source to the monitor, neglecting the transmission time from the monitor to the destination. This oversight significantly impacts information freshness and subsequently affects decision-making accuracy. To address this gap, we designed an edge-enabled vehicular fog system to lighten the computational burden on IoT devices. We examined how information transmission and request-response times influence end-to-end AoI. As a solution, we proposed a dueling deep Q-network (dueling-DQN), a deep reinforcement learning (DRL)-based algorithm, and compared its performance with a DQN policy and analytical results. Our simulation results demonstrate that the proposed dueling-DQN algorithm outperforms both DQN and analytical methods, highlighting its effectiveness in improving real-time system information freshness. Considering the complete end-to-end transmission process, our optimization approach can improve decision-making performance and overall system efficiency.
Submitted 3 July, 2024;
originally announced July 2024.
-
Beyond Diagonal IRS Assisted Ultra Massive THz Systems: A Low Resolution Approach
Authors:
Wali Ullah Khan,
Chandan Kumar Sheemar,
Zaid Abdullah,
Eva Lagunas,
Symeon Chatzinotas
Abstract:
Terahertz communications have the potential to revolutionize data transfer with unmatched speed and facilitate the development of new high-bandwidth applications. This paper studies the performance of a downlink terahertz system assisted by a beyond diagonal intelligent reconfigurable surface (BD-IRS). For enhanced energy efficiency and low cost, a joint precoding and BD-IRS phase shift design satisfying the $1$-bit resolution constraints to maximize the spectral efficiency is presented. The original problem is non-linear, NP-hard, and intricately coupled, and obtaining an optimal solution is challenging. To reduce the complexity, we first transform the optimization problem into two problems and then iteratively solve them to achieve an efficient solution. Numerical results demonstrate that the proposed approach for the BD-IRS assisted terahertz system significantly enhances the spectral efficiency compared to the conventional diagonal IRS assisted system.
Submitted 22 June, 2024;
originally announced June 2024.
-
Shape2.5D: A Dataset of Texture-less Surfaces for Depth and Normals Estimation
Authors:
Muhammad Saif Ullah Khan,
Muhammad Zeshan Afzal,
Didier Stricker
Abstract:
Reconstructing texture-less surfaces poses unique challenges in computer vision, primarily due to the lack of specialized datasets that cater to the nuanced needs of depth and normals estimation in the absence of textural information. We introduce "Shape2.5D," a novel, large-scale dataset designed to address this gap. Comprising 364k frames spanning 2635 3D models and 48 unique objects, our dataset provides depth and surface normal maps for texture-less object reconstruction. The proposed dataset includes synthetic images rendered with 3D modeling software to simulate various lighting conditions and viewing angles. It also includes a real-world subset comprising 4672 frames captured with a depth camera. Our comprehensive benchmarks, performed using a modified encoder-decoder network, showcase the dataset's capability to support the development of algorithms that robustly estimate depth and normals from RGB images. Our open-source data generation pipeline allows the dataset to be extended and adapted for future research. The dataset is publicly available at https://github.com/saifkhichi96/Shape25D.
Submitted 22 June, 2024;
originally announced June 2024.
-
Enhanced Bank Check Security: Introducing a Novel Dataset and Transformer-Based Approach for Detection and Verification
Authors:
Muhammad Saif Ullah Khan,
Tahira Shehzadi,
Rabeya Noor,
Didier Stricker,
Muhammad Zeshan Afzal
Abstract:
Automated signature verification on bank checks is critical for fraud prevention and ensuring transaction authenticity. This task is challenging due to the coexistence of signatures with other textual and graphical elements on real-world documents. Verification systems must first detect the signature and then validate its authenticity, a dual challenge often overlooked by current datasets and methodologies focusing only on verification. To address this gap, we introduce a novel dataset specifically designed for signature verification on bank checks. This dataset includes a variety of signature styles embedded within typical check elements, providing a realistic testing ground for advanced detection methods. Moreover, we propose a novel approach for writer-independent signature verification using an object detection network. Our detection-based verification method treats genuine and forged signatures as distinct classes within an object detection framework, effectively handling both detection and verification. We employ a DINO-based network augmented with a dilation module to detect and verify signatures on check images simultaneously. Our approach achieves an AP of 99.2 for genuine and 99.4 for forged signatures, a significant improvement over the DINO baseline, which scored 93.1 and 89.3 for genuine and forged signatures, respectively. This improvement highlights our dilation module's effectiveness in reducing both false positives and negatives. Our results demonstrate substantial advancements in detection-based signature verification technology, offering enhanced security and efficiency in financial document processing.
Submitted 20 June, 2024;
originally announced June 2024.
-
Finding Blind Spots in Evaluator LLMs with Interpretable Checklists
Authors:
Sumanth Doddapaneni,
Mohammed Safi Ur Rahman Khan,
Sshubam Verma,
Mitesh M. Khapra
Abstract:
Large Language Models (LLMs) are increasingly relied upon to evaluate text outputs of other LLMs, thereby influencing leaderboards and development decisions. However, concerns persist over the accuracy of these assessments and the potential for misleading conclusions. In this work, we investigate the effectiveness of LLMs as evaluators for text generation tasks. We propose FBI, a novel framework designed to examine the proficiency of Evaluator LLMs in assessing four critical abilities in other LLMs: factual accuracy, instruction following, coherence in long-form writing, and reasoning proficiency. By introducing targeted perturbations in answers generated by LLMs that clearly impact one of these key capabilities, we test whether an Evaluator LLM can detect these quality drops. By creating a total of 2400 perturbed answers covering 22 perturbation categories, we conduct a comprehensive study using different evaluation strategies on five prominent LLMs commonly used as evaluators in the literature. Our findings reveal significant shortcomings in current Evaluator LLMs, which failed to identify quality drops in over 50% of cases on average. Single-answer and pairwise evaluations demonstrated notable limitations, whereas reference-based evaluations showed comparatively better performance. These results underscore the unreliable nature of current Evaluator LLMs and advocate for cautious implementation in practical applications. Code and data are available at https://github.com/AI4Bharat/FBI.
Submitted 19 June, 2024;
originally announced June 2024.
-
Situational Instructions Database: Task Guidance in Dynamic Environments
Authors:
Muhammad Saif Ullah Khan,
Sankalp Sinha,
Didier Stricker,
Muhammad Zeshan Afzal
Abstract:
The Situational Instructions Database (SID) addresses the need for enhanced situational awareness in artificial intelligence (AI) systems operating in dynamic environments. By integrating detailed scene graphs with dynamically generated, task-specific instructions, SID provides a novel dataset that allows AI systems to perform complex, real-world tasks with improved context sensitivity and operational accuracy. This dataset leverages advanced generative models to simulate a variety of realistic scenarios based on the 3D Semantic Scene Graphs (3DSSG) dataset, enriching it with scenario-specific information that details environmental interactions and tasks. SID facilitates the development of AI applications that can adapt to new and evolving conditions without extensive retraining, supporting research in autonomous technology and AI-driven decision-making processes. This dataset is instrumental in developing robust, context-aware AI agents capable of effectively navigating and responding to unpredictable settings. Available for research and development, SID serves as a critical resource for advancing the capabilities of intelligent systems in complex environments. Dataset available at https://github.com/mindgarage/situational-instructions-database.
Submitted 19 June, 2024;
originally announced June 2024.
-
Beyond Diagonal RIS for 6G Non-Terrestrial Networks: Potentials and Challenges
Authors:
Wali Ullah Khan,
Asad Mahmood,
Muhammad Ali Jamshed,
Eva Lagunas,
Manzoor Ahmed,
Symeon Chatzinotas
Abstract:
Reconfigurable intelligent surface (RIS) has emerged as a promising technology in both terrestrial and non-terrestrial networks (NTNs) due to its ability to manipulate wireless environments for better connectivity. Many studies have focused on conventional RIS with diagonal phase response matrices. This simple RIS architecture, though less expensive, has limited flexibility in engineering the wireless channels. As the latest member of RIS technology, beyond diagonal RIS (BD-RIS) has recently been proposed in terrestrial setups. Due to the interconnected phase response elements, BD-RIS significantly enhances the control over the wireless environment. This work discusses the potential and challenges of BD-RIS in NTNs. We begin with the motivation and recent advances in BD-RIS. Subsequently, we discuss the fundamentals of BD-RIS and NTNs. We then outline the application of BD-RIS in NTNs, followed by a case study on BD-RIS enabled non-orthogonal multiple access low earth orbit satellite communication. Finally, we highlight challenges and research directions with concluding remarks.
Submitted 15 June, 2024;
originally announced June 2024.
-
Pragmatic Formal Verification Methodology for Clock Domain Crossing (CDC)
Authors:
Aman Kumar,
Muhammad Ul Haque Khan,
Bijitendra Mittra
Abstract:
Modern System-on-Chip (SoC) designs are becoming increasingly complex due to technology upscaling. SoC designs often operate on multiple asynchronous clock domains, further adding to the complexity of the overall design. To make the devices power efficient, designers take a Globally-Asynchronous Locally-Synchronous (GALS) approach that creates multiple asynchronous domains. The resulting Clock Domain Crossings (CDCs) are prone to metastability effects, and functional verification of such CDCs is very important to ensure that no bug escapes. Conventional verification methods, such as register transfer level (RTL) simulations and static timing analysis, are not enough to address these CDC issues, which may lead to verification gaps. Additionally, identifying these CDC-related bugs is very time-consuming and is one of the most common reasons for costly silicon re-spins. This paper focuses on the development of a pragmatic formal verification methodology to minimize CDC issues by exercising Metastability Injection (MSI) in different CDC paths.
Submitted 20 April, 2024;
originally announced June 2024.
-
Diffusion-Inspired Quantum Noise Mitigation in Parameterized Quantum Circuits
Authors:
Hoang-Quan Nguyen,
Xuan Bac Nguyen,
Samuel Yen-Chi Chen,
Hugh Churchill,
Nicholas Borys,
Samee U. Khan,
Khoa Luu
Abstract:
Parameterized Quantum Circuits (PQCs) have been acknowledged as a leading strategy to utilize near-term quantum advantages in multiple problems, including machine learning and combinatorial optimization. When applied to specific tasks, the parameters in the quantum circuits are trained to minimize the target function. Although there have been comprehensive studies to improve the performance of PQCs on practical tasks, the errors caused by quantum noise degrade the performance when running on real quantum computers. In particular, when the quantum state is transformed through multiple quantum circuit layers, the effect of the quantum noise accumulates and the state moves closer to the maximally mixed state, or complete noise. This paper studies the relationship between the quantum noise and the diffusion model. Then, we propose a novel diffusion-inspired learning approach to mitigate the quantum noise in the PQCs and reduce the error for specific tasks. Through our experiments, we illustrate the efficiency of the learning strategy and achieve state-of-the-art performance on classification tasks in the quantum noise scenarios.
Submitted 2 June, 2024;
originally announced June 2024.
-
Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach
Authors:
Muhammad Saif Ullah Khan,
Dhavalkumar Limbachiya,
Didier Stricker,
Muhammad Zeshan Afzal
Abstract:
Human pose estimation is a key task in computer vision with various applications such as activity recognition and interactive systems. However, the lack of consistency in the annotated skeletons across different datasets poses challenges in developing universally applicable models. To address this challenge, we propose a novel approach integrating multi-teacher knowledge distillation with a unified skeleton representation. Our networks are jointly trained on the COCO and MPII datasets, containing 17 and 16 keypoints, respectively. We demonstrate enhanced adaptability by predicting an extended set of 21 keypoints, 4 (COCO) and 5 (MPII) more than original annotations, improving cross-dataset generalization. Our joint models achieved an average accuracy of 70.89 and 76.40, compared to 53.79 and 55.78 when trained on a single dataset and evaluated on both. Moreover, we evaluate all 21 points predicted by our two models, reporting an AP of 66.84 and 72.75 on the Halpe dataset. This highlights the potential of our technique to address one of the most pressing challenges in pose estimation research and application: the inconsistency in skeletal annotations.
Submitted 30 May, 2024;
originally announced May 2024.
-
Quantum Visual Feature Encoding Revisited
Authors:
Xuan-Bac Nguyen,
Hoang-Quan Nguyen,
Hugh Churchill,
Samee U. Khan,
Khoa Luu
Abstract:
Although quantum machine learning was introduced some time ago, its applications in computer vision are still limited. This paper, therefore, revisits the quantum visual encoding strategies, the initial step in quantum machine learning. Investigating the root cause, we uncover that the existing quantum encoding design fails to ensure information preservation of the visual features after the encoding process, thus complicating the learning process of the quantum machine learning models. In particular, the problem, termed "Quantum Information Gap" (QIG), leads to a gap of information between classical and corresponding quantum features. We provide theoretical proof and practical demonstrations of this finding and underscore the significance of QIG, as it directly impacts the performance of quantum machine learning algorithms. To tackle this challenge, we introduce a simple but efficient new loss function named Quantum Information Preserving (QIP) to minimize this gap, resulting in enhanced performance of quantum machine learning algorithms. Extensive experiments validate the effectiveness of our approach, showcasing superior performance compared to current methodologies and consistently achieving state-of-the-art results in quantum modeling.
Submitted 30 May, 2024;
originally announced May 2024.
-
QClusformer: A Quantum Transformer-based Framework for Unsupervised Visual Clustering
Authors:
Xuan-Bac Nguyen,
Hoang-Quan Nguyen,
Samuel Yen-Chi Chen,
Samee U. Khan,
Hugh Churchill,
Khoa Luu
Abstract:
Unsupervised vision clustering, a cornerstone in computer vision, has been studied for decades, yielding significant outcomes across numerous vision tasks. However, these algorithms involve substantial computational demands when confronted with vast amounts of unlabeled data. Conversely, Quantum computing holds promise in expediting unsupervised algorithms when handling large-scale databases. In this study, we introduce QClusformer, a pioneering Transformer-based framework leveraging Quantum machines to tackle unsupervised vision clustering challenges. Specifically, we design the Transformer architecture, including the self-attention module and transformer blocks, from a Quantum perspective to enable execution on Quantum hardware. In addition, we present QClusformer, a variant based on the Transformer architecture, tailored for unsupervised vision clustering tasks. By integrating these elements into an end-to-end framework, QClusformer consistently outperforms previous methods running on classical computers. Empirical evaluations across diverse benchmarks, including MS-Celeb-1M and DeepFashion, underscore the superior performance of QClusformer compared to state-of-the-art methods.
Submitted 30 May, 2024;
originally announced May 2024.
-
Knowledge-grounded Adaptation Strategy for Vision-language Models: Building Unique Case-set for Screening Mammograms for Residents Training
Authors:
Aisha Urooj Khan,
John Garrett,
Tyler Bradshaw,
Lonie Salkowski,
Jiwoong Jason Jeong,
Amara Tariq,
Imon Banerjee
Abstract:
A visual-language model (VLM) pre-trained on natural images and text pairs faces a significant barrier when applied to medical contexts due to domain shift. Yet, adapting or fine-tuning these VLMs for medical use presents considerable hurdles, including domain misalignment, limited access to extensive datasets, and high class imbalance. Hence, there is a pressing need for strategies to effectively adapt these VLMs to the medical domain, as such adaptations would prove immensely valuable in healthcare applications. In this study, we propose a framework designed to adeptly tailor VLMs to the medical domain, employing selective sampling and hard-negative mining techniques for enhanced performance in retrieval tasks. We validate the efficacy of our proposed approach by implementing it across two distinct VLMs: an in-domain VLM (MedCLIP) and an out-of-domain VLM (ALBEF). We assess the performance of these models both in their original off-the-shelf state and after undergoing our proposed training strategies, using two extensive datasets containing mammograms and their corresponding reports. Our evaluation spans zero-shot, few-shot, and supervised scenarios. Through our approach, we observe a notable enhancement in Recall@K performance for the image-text retrieval task.
Submitted 30 May, 2024;
originally announced May 2024.
-
BRACTIVE: A Brain Activation Approach to Human Visual Brain Learning
Authors:
Xuan-Bac Nguyen,
Hojin Jang,
Xin Li,
Samee U. Khan,
Pawan Sinha,
Khoa Luu
Abstract:
The human brain is a highly efficient processing unit, and understanding how it works can inspire new algorithms and architectures in machine learning. In this work, we introduce a novel framework named Brain Activation Network (BRACTIVE), a transformer-based approach to studying the human visual brain. The main objective of BRACTIVE is to align the visual features of subjects with corresponding brain representations via fMRI signals. It allows us to identify the brain's Regions of Interest (ROIs) of the subjects. Unlike previous brain research methods, which can only identify ROIs for one subject at a time and are limited by the number of subjects, BRACTIVE automatically extends this identification to multiple subjects and ROIs. Our experiments demonstrate that BRACTIVE effectively identifies person-specific regions of interest, such as face and body-selective areas, aligning with neuroscience findings and indicating potential applicability to various object categories. More importantly, we found that leveraging human visual brain activity to guide deep neural networks enhances performance across various benchmarks. This points to the potential of BRACTIVE in both neuroscience and machine intelligence studies.
Submitted 29 May, 2024;
originally announced May 2024.
-
CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification
Authors:
Sankalp Sinha,
Muhammad Saif Ullah Khan,
Talha Uddin Sheikh,
Didier Stricker,
Muhammad Zeshan Afzal
Abstract:
Zero-shot learning has been extensively investigated in the broader field of visual recognition, attracting significant interest recently. However, the current work on zero-shot learning in document image classification remains scarce. The existing studies either focus exclusively on zero-shot inference, or their evaluation does not align with the established criteria of zero-shot evaluation in the visual recognition domain. We provide a comprehensive document image classification analysis in Zero-Shot Learning (ZSL) and Generalized Zero-Shot Learning (GZSL) settings to address this gap. Our methodology and evaluation align with the established practices of this domain. Additionally, we propose zero-shot splits for the RVL-CDIP dataset. Furthermore, we introduce CICA (pronounced 'ki-ka'), a framework that enhances the zero-shot learning capabilities of CLIP. CICA consists of a novel 'content module' designed to leverage any generic document-related textual information. The discriminative features extracted by this module are aligned with CLIP's text and image features using a novel 'coupled-contrastive' loss. Our module improves CLIP's ZSL top-1 accuracy by 6.7% and GZSL harmonic mean by 24% on the RVL-CDIP dataset. Our module is lightweight and adds only 3.3% more parameters to CLIP. Our work sets the direction for future research in zero-shot document classification.
Submitted 6 May, 2024;
originally announced May 2024.
-
Leveraging Pre-trained CNNs for Efficient Feature Extraction in Rice Leaf Disease Classification
Authors:
Md. Shohanur Islam Sobuj,
Md. Imran Hossen,
Md. Foysal Mahmud,
Mahbub Ul Islam Khan
Abstract:
Rice disease classification is a critical task in agricultural research, and in this study, we rigorously evaluate the impact of integrating feature extraction methodologies within pre-trained convolutional neural networks (CNNs). Initial investigations into baseline models, devoid of feature extraction, revealed commendable performance with ResNet-50 and ResNet-101 achieving accuracies of 91% and 92%, respectively. Subsequent integration of Histogram of Oriented Gradients (HOG) yielded substantial improvements across architectures, notably propelling the accuracy of EfficientNet-B7 from 92% to an impressive 97%. Conversely, the application of Local Binary Patterns (LBP) demonstrated more conservative performance enhancements. Moreover, employing Gradient-weighted Class Activation Mapping (Grad-CAM) unveiled that HOG integration resulted in heightened attention to disease-specific features, corroborating the performance enhancements observed. Visual representations further validated HOG's notable influence, showcasing a discernible surge in accuracy across epochs due to focused attention on disease-affected regions. These results underscore the pivotal role of feature extraction, particularly HOG, in refining representations and bolstering classification accuracy. The study's significant highlight was the achievement of 97% accuracy with EfficientNet-B7 employing HOG and Grad-CAM, a noteworthy advancement in optimizing pre-trained CNN-based rice disease identification systems. The findings advocate for the strategic integration of advanced feature extraction techniques with cutting-edge pre-trained CNN architectures, presenting a promising avenue for substantially augmenting the precision and effectiveness of image-based disease classification systems in agricultural contexts.
Submitted 26 February, 2024;
originally announced May 2024.
-
Multi-Objective Offloading Optimization in MEC and Vehicular-Fog Systems: A Distributed-TD3 Approach
Authors:
Frezer Guteta Wakgra,
Binayak Kar,
Seifu Birhanu Tadele,
Shan-Hsiang Shen,
Asif Uddin Khan
Abstract:
The emergence of 5G networks has enabled the deployment of a two-tier edge and vehicular-fog network. It comprises Multi-access Edge Computing (MEC) and Vehicular-Fogs (VFs), strategically positioned closer to Internet of Things (IoT) devices, reducing propagation latency compared to cloud-based solutions and ensuring satisfactory quality of service (QoS). However, during high-traffic events like concerts or athletic contests, MEC sites may face congestion and become overloaded. Using offloading techniques, we can transfer computationally intensive tasks from resource-constrained devices to those with sufficient capacity, accelerating tasks and extending device battery life. In this research, we consider offloading within a two-tier MEC and VF architecture, involving offloading from MEC to MEC and from MEC to VF. The primary objective is to minimize the average system cost, considering both latency and energy consumption. To achieve this goal, we formulate a multi-objective optimization problem aimed at minimizing latency and energy while considering given resource constraints. To facilitate decision-making for nearly optimal computational offloading, we design an equivalent reinforcement learning environment that accurately represents the network architecture and the formulated problem. To accomplish this, we propose a Distributed-TD3 (DTD3) approach, which builds on the TD3 algorithm. Extensive simulations demonstrate that our strategy achieves faster convergence and higher efficiency compared to other benchmark solutions.
Submitted 18 April, 2024;
originally announced April 2024.
-
Space Physiology and Technology: Musculoskeletal Adaptations, Countermeasures, and the Opportunity for Wearable Robotics
Authors:
Shamas Ul Ebad Khan,
Re** John Varghese,
Panagiotis Kassanos,
Dario Farina,
Etienne Burdet
Abstract:
Space poses significant challenges for human physiology, leading to physiological adaptations in response to an environment vastly different from Earth. While these adaptations can be beneficial, they may not fully counteract the adverse impact of space-related stressors. A comprehensive understanding of these physiological adaptations is needed to devise effective countermeasures to support human life in space. This review focuses on the impact of the environment in space on the musculoskeletal system. It highlights the complex interplay between bone and muscle adaptation, the underlying physiological mechanisms, and their implications on astronaut health. Furthermore, the review delves into the deployed and current advances in countermeasures and proposes, as a perspective for future developments, wearable sensing and robotic technologies, such as exoskeletons, as a fitting alternative.
Submitted 4 April, 2024;
originally announced April 2024.
-
Automated System-level Testing of Unmanned Aerial Systems
Authors:
Hassan Sartaj,
Asmar Muqeet,
Muhammad Zohaib Iqbal,
Muhammad Uzair Khan
Abstract:
Unmanned aerial systems (UAS) rely on various avionics systems that are safety-critical and mission-critical. A major requirement of international safety standards is to perform rigorous system-level testing of avionics software systems. The current industrial practice is to manually create test scenarios, manually/automatically execute these scenarios using simulators, and manually evaluate outcomes. The test scenarios typically consist of setting certain flight or environment conditions and testing the system under test in these settings. The state-of-the-art approaches for this purpose also require manual test scenario development and evaluation. In this paper, we propose a novel approach to automate the system-level testing of the UAS. The proposed approach (AITester) utilizes model-based testing and artificial intelligence (AI) techniques to automatically generate, execute, and evaluate various test scenarios. The test scenarios are generated on the fly, i.e., during test execution based on the environmental context at runtime. The approach is supported by a toolset. We empirically evaluate the proposed approach on two core components of UAS, an autopilot system of an unmanned aerial vehicle (UAV) and cockpit display systems (CDS) of the ground control station (GCS). The results show that the AITester effectively generates test scenarios causing deviations from the expected behavior of the UAV autopilot and reveals potential flaws in the GCS-CDS.
Submitted 23 March, 2024;
originally announced March 2024.
-
FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks
Authors:
Muhammad Saif Ullah Khan,
Muhammad Ferjad Naeem,
Federico Tombari,
Luc Van Gool,
Didier Stricker,
Muhammad Zeshan Afzal
Abstract:
We propose FocusCLIP, integrating subject-level guidance--a specialized mechanism for target-specific supervision--into the CLIP framework for improved zero-shot transfer on human-centric tasks. Our novel contributions enhance CLIP on both the vision and text sides. On the vision side, we incorporate ROI heatmaps emulating human visual attention mechanisms to emphasize subject-relevant image regions. On the text side, we introduce human pose descriptions to provide rich contextual information. For human-centric tasks, FocusCLIP is trained with images from the MPII Human Pose dataset. The proposed approach surpassed CLIP by an average of 8.61% across five previously unseen datasets covering three human-centric tasks. FocusCLIP achieved an average accuracy of 33.65% compared to 25.04% by CLIP. We observed a 3.98% improvement in activity recognition, a 14.78% improvement in age classification, and a 7.06% improvement in emotion recognition. Moreover, using our proposed single-shot LLM prompting strategy, we release a high-quality MPII Pose Descriptions dataset to encourage further research in multimodal learning for human-centric tasks. Furthermore, we also demonstrate the effectiveness of our subject-level supervision on non-human-centric tasks. FocusCLIP shows a 2.47% improvement over CLIP in zero-shot bird classification using the CUB dataset. Our findings emphasize the potential of integrating subject-level guidance with general pretraining methods for enhanced downstream performance.
Submitted 25 March, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages
Authors:
Mohammed Safi Ur Rahman Khan,
Priyam Mehta,
Ananth Sankar,
Umashankar Kumaravelan,
Sumanth Doddapaneni,
Suriyaprasaad G,
Varun Balan G,
Sparsh Jain,
Anoop Kunchukuttan,
Pratyush Kumar,
Raj Dabre,
Mitesh M. Khapra
Abstract:
Despite the considerable advancements in English LLMs, the progress in building comparable models for other languages has been hindered due to the scarcity of tailored resources. Our work aims to bridge this divide by introducing an expansive suite of resources specifically designed for the development of Indic LLMs, covering 22 languages, containing a total of 251B tokens and 74.8M instruction-response pairs. Recognizing the importance of both data quality and quantity, our approach combines highly curated manually verified data, unverified yet valuable data, and synthetic data. We build a clean, open-source pipeline for curating pre-training data from diverse sources, including websites, PDFs, and videos, incorporating best practices for crawling, cleaning, flagging, and deduplication. For instruction fine-tuning, we amalgamate existing Indic datasets, translate/transliterate English datasets into Indian languages, and utilize LLaMa2 and Mixtral models to create conversations grounded in articles from Indian Wikipedia and Wikihow. Additionally, we address toxicity alignment by generating toxic prompts for multiple scenarios and then generating non-toxic responses by feeding these toxic prompts to an aligned LLaMa2 model. We hope that the datasets, tools, and resources released as a part of this work will not only propel the research and development of Indic LLMs but also establish an open-source blueprint for extending such efforts to other languages. The data and other artifacts created as part of this work are released with permissive licenses.
Submitted 10 March, 2024;
originally announced March 2024.
-
Personality Trait Recognition using ECG Spectrograms and Deep Learning
Authors:
Muhammad Mohsin Altaf,
Saadat Ullah Khan,
Muhammad Majd,
Syed Muhammad Anwar
Abstract:
This paper presents an innovative approach to recognizing personality traits using deep learning (DL) methods applied to electrocardiogram (ECG) signals. Within the framework of detecting the big five personality traits model encompassing extraversion, neuroticism, agreeableness, conscientiousness, and openness, the research explores the potential of ECG-derived spectrograms as informative features. Optimal window sizes for spectrogram generation are determined, and a convolutional neural network (CNN), specifically Resnet-18, and visual transformer (ViT) are employed for feature extraction and personality trait classification. The study utilizes the publicly available ASCERTAIN dataset, which comprises various physiological signals, including ECG recordings, collected from 58 participants during the presentation of video stimuli categorized by valence and arousal levels. The outcomes of this study demonstrate noteworthy performance in personality trait classification, consistently achieving F1-scores exceeding 0.9 across different window sizes and personality traits. These results emphasize the viability of ECG signal spectrograms as a valuable modality for personality trait recognition, with Resnet-18 exhibiting effectiveness in discerning distinct personality traits.
Submitted 6 February, 2024;
originally announced February 2024.
-
Airavata: Introducing Hindi Instruction-tuned LLM
Authors:
Jay Gala,
Thanmay Jayakumar,
Jaavid Aktar Husain,
Aswanth Kumar M,
Mohammed Safi Ur Rahman Khan,
Diptesh Kanojia,
Ratish Puduppully,
Mitesh M. Khapra,
Raj Dabre,
Rudra Murthy,
Anoop Kunchukuttan
Abstract:
We announce the initial release of "Airavata," an instruction-tuned LLM for Hindi. Airavata was created by fine-tuning OpenHathi with diverse, instruction-tuning Hindi datasets to make it better suited for assistive tasks. Along with the model, we also share the IndicInstruct dataset, which is a collection of diverse instruction-tuning datasets to enable further research for Indic LLMs. Additionally, we present evaluation benchmarks and a framework for assessing LLM performance across tasks in Hindi. Currently, Airavata supports Hindi, but we plan to expand this to all 22 scheduled Indic languages. You can access all artifacts at https://ai4bharat.github.io/airavata.
Submitted 26 February, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
Efficient Test Data Generation for MC/DC with OCL and Search
Authors:
Hassan Sartaj,
Muhammad Zohaib Iqbal,
Atif Aftab Ahmed Jilani,
Muhammad Uzair Khan
Abstract:
System-level testing of avionics software systems requires compliance with different international safety standards such as DO-178C. An important consideration of the avionics industry is automated test data generation according to the criteria suggested by safety standards. One of the recommended criteria by DO-178C is the modified condition/decision coverage (MC/DC) criterion. The current model-based test data generation approaches use constraints written in Object Constraint Language (OCL), and apply search techniques to generate test data. These approaches either do not support MC/DC criterion or suffer from performance issues while generating test data for large-scale avionics systems. In this paper, we propose an effective way to automate MC/DC test data generation during model-based testing. We develop a strategy that utilizes case-based reasoning (CBR) and range reduction heuristics designed to solve MC/DC-tailored OCL constraints. We performed an empirical study to compare our proposed strategy for MC/DC test data generation using CBR, range reduction, both CBR and range reduction, with an original search algorithm, and random search. We also empirically compared our strategy with existing constraint-solving approaches. The results show that both CBR and range reduction for MC/DC test data generation outperform the baseline approach. Moreover, the combination of both CBR and range reduction for MC/DC test data generation is an effective approach compared to existing constraint solvers.
Submitted 23 June, 2024; v1 submitted 7 January, 2024;
originally announced January 2024.
-
Transformer Network for Multi-Person Tracking and Re-Identification in Unconstrained Environment
Authors:
Hamza Mukhtar,
Muhammad Usman Ghani Khan
Abstract:
Multi-object tracking (MOT) has profound applications in a variety of fields, including surveillance, sports analytics, self-driving, and cooperative robotics. Despite considerable advancements, existing MOT methodologies tend to falter when faced with non-uniform movements, occlusions, and appearance-reappearance scenarios of the objects. Recognizing this inadequacy, we put forward an integrated MOT method that not only marries object detection and identity linkage within a singular, end-to-end trainable framework but also equips the model with the ability to maintain object identity links over long periods of time. Our proposed model, named STMMOT, is built around four key modules: 1) candidate proposal generation, which generates object proposals via a vision-transformer encoder-decoder architecture that detects the object from each frame in the video; 2) scale variant pyramid, a progressive pyramid structure to learn the self-scale and cross-scale similarities in multi-scale feature maps; 3) spatio-temporal memory encoder, extracting the essential information from the memory associated with each object under tracking; and 4) spatio-temporal memory decoder, simultaneously resolving the tasks of object detection and identity association for MOT. Our system leverages a robust spatio-temporal memory module that retains extensive historical observations and effectively encodes them using an attention-based aggregator. The uniqueness of STMMOT lies in representing objects as dynamic query embeddings that are updated continuously, which enables the prediction of object states with attention mechanisms and eradicates the need for post-processing.
Submitted 19 December, 2023;
originally announced December 2023.
-
Brainformer: Mimic Human Visual Brain Functions to Machine Vision Models via fMRI
Authors:
Xuan-Bac Nguyen,
Xin Li,
Pawan Sinha,
Samee U. Khan,
Khoa Luu
Abstract:
Human perception plays a vital role in forming beliefs and understanding reality. A deeper understanding of brain functionality will lead to the development of novel deep neural networks. In this work, we introduce a novel framework named Brainformer, a straightforward yet effective Transformer-based framework, to analyze Functional Magnetic Resonance Imaging (fMRI) patterns in the human perception system from a machine-learning perspective. Specifically, we present the Multi-scale fMRI Transformer to explore brain activity patterns through fMRI signals. This architecture includes a simple yet efficient module for high-dimensional fMRI signal encoding and incorporates a novel embedding technique called 3D Voxels Embedding. Secondly, drawing inspiration from the functionality of the brain's Region of Interest, we introduce a novel loss function called Brain fMRI Guidance Loss. This loss function mimics brain activity patterns from these regions in the deep neural network using fMRI data. This work introduces a prospective approach to transfer knowledge from human perception to neural networks. Our experiments demonstrate that leveraging fMRI information allows the machine vision model to achieve results comparable to State-of-the-Art methods in various image recognition tasks.
Submitted 29 May, 2024; v1 submitted 30 November, 2023;
originally announced December 2023.
-
Deriving and Evaluating a Detailed Taxonomy of Game Bugs
Authors:
Nigar Azhar Butt,
Salman Sherin,
Muhammad Uzair Khan,
Atif Aftab Jilani,
Muhammad Zohaib Iqbal
Abstract:
Game development has become an extremely competitive multi-billion-dollar industry. Many games fail even after years of development efforts because of game-breaking bugs that disrupt the game-play and ruin the player experience. The goal of this work is to provide a bug taxonomy for games that will help game developers in developing bug-resistant games, game testers in designing and executing fault-finding test cases, and researchers in evaluating game testing approaches. For this purpose, we performed a Multivocal Literature Review (MLR) by analyzing 436 sources, out of which 189 (78 academic and 111 grey) sources reporting bugs encountered in the game development industry were selected for analysis. We validate the proposed taxonomy by conducting a survey involving different game industry practitioners. The MLR allowed us to finalize a detailed taxonomy of 63 game bug categories from an end-user perspective, including eight first-tier categories: Gaming Balance, Implementation Response, Network, Sound, Temporal, Unexpected Crash, Navigational, and Non-Temporal faults. We observed that manual approaches towards game testing are still widely used. Only one of the approaches targets sound bugs, whereas game balancing and how to incorporate machine learning in game testing are trending in the recent literature. Most of the game testing techniques are specialized and dependent on specific platforms.
Submitted 28 November, 2023;
originally announced November 2023.
-
A Generalized Space-Efficient Algorithm for Quantum Bit String Comparators
Authors:
Khuram Shahzad,
Omar Usman Khan
Abstract:
Quantum Bit String Comparators (QBSC) operate on two sequences of n-qubits, enabling the determination of their relationships, such as equality, greater than, or less than. This is analogous to the way conditional statements are used in programming languages. Consequently, QBSCs play a crucial role in various algorithms that can be executed or adapted for quantum computers. The development of efficient and generalized comparators for any $n$-qubit length has long posed a challenge, as they have a high-cost footprint and lead to quantum delays. Comparators that are efficient are associated with inputs of fixed length. As a result, comparators without a generalized circuit cannot be employed at a higher level, though they are well-suited for problems with limited size requirements. In this paper, we introduce a generalized design for the comparison of two $n$-qubit logic states using just two ancillary bits. The design is examined on the basis of qubit requirements, ancillary bit usage, quantum cost, quantum delay, gate operations, and circuit complexity, and is tested comprehensively on various input lengths. The work allows for sufficient flexibility in the design of quantum algorithms, which can accelerate quantum algorithm development.
Submitted 14 November, 2023; v1 submitted 11 November, 2023;
originally announced November 2023.
-
Distributed Delay-Tolerant Strategies for Equality-Constraint Sum-Preserving Resource Allocation
Authors:
Mohammadreza Doostmohammadian,
Alireza Aghasi,
Maria Vrakopoulou,
Hamid R. Rabiee,
Usman A. Khan,
Themistoklis Charalambou
Abstract:
This paper proposes two nonlinear dynamics to solve the constrained distributed optimization problem for resource allocation over a multi-agent network. In this setup, the coupling constraint refers to the resource-demand balance, which is preserved at all times. The proposed solutions can address various model nonlinearities, for example, due to quantization and/or saturation. Further, they make it possible to reach faster convergence or to robustify the solution against impulsive noise or uncertainties. We prove convergence over weakly connected networks using convex analysis and Lyapunov theory. Our findings show that convergence can be reached for general sign-preserving odd nonlinearity. We further propose delay-tolerant mechanisms to handle general bounded heterogeneous time-varying delays over the communication network of agents while preserving all-time feasibility. This work finds application in CPU scheduling and coverage control, among others. This paper advances the state-of-the-art by addressing (i) possible nonlinearity on the agents/links, meanwhile handling (ii) resource-demand feasibility at all times, (iii) uniform-connectivity instead of all-time connectivity, and (iv) possible heterogeneous and time-varying delays. To the best of our knowledge, no existing work addresses contributions (i)-(iv) altogether. Simulations and comparative analysis are provided to corroborate our contributions.
Submitted 27 October, 2023;
originally announced October 2023.
-
Motion Planning for Autonomous Ground Vehicles Using Artificial Potential Fields: A Review
Authors:
Aziz ur Rehman,
Ahsan Tanveer,
M. Touseef Ashraf,
Umer Khan
Abstract:
Autonomous ground vehicle systems have found extensive potential and practical applications in the modern world. The development of an autonomous ground vehicle poses a significant challenge, particularly in identifying the best path plan, based on defined performance metrics such as safety margin, shortest time, and energy consumption. Various techniques for motion planning have been proposed by researchers, one of which is the use of artificial potential fields. Several authors in the past two decades have proposed various modified versions of the artificial potential field algorithms. The variations of the traditional APF approach have given an answer to prior shortcomings. This gives potential rise to a strategic survey on the improved versions of this algorithm. This study presents a review of motion planning for autonomous ground vehicles using artificial potential fields. Each article is evaluated based on criteria that involve the environment type, which may be either static or dynamic, the evaluation scenario, which may be real-time or simulated, and the method used for improving the search performance of the algorithm. All the customized designs of planning models are analyzed and evaluated. At the end, the results of the review are discussed, and future works are proposed.
Submitted 22 October, 2023;
originally announced October 2023.
-
Perceptual Tone Mapping Model for High Dynamic Range Imaging
Authors:
Imran Mehmood,
Xinye Shi,
M. Usman Khan,
Ming Ronnier Luo
Abstract:
One of the key challenges in tone mapping is to preserve the perceptual quality of high dynamic range (HDR) images when mapping them to standard dynamic range (SDR) displays. Traditional tone mapping operators (TMOs) compress the luminance of HDR images without considering the surround and display conditions, leading to suboptimal results. Current research addresses this challenge by incorporating perceptual color appearance attributes. In this work, we propose a TMO (TMOz) that leverages CIECAM16 perceptual attributes, i.e., brightness, colorfulness, and hue. TMOz accounts for the effects of both the surround and the display conditions to achieve more optimal colorfulness reproduction. The perceptual brightness is compressed, and the perceptual color scales, i.e., colorfulness and hue, are derived from HDR images by employing CIECAM16 color adaptation equations. A psychophysical experiment was conducted to automate the brightness compression parameter. The model employs a fully automatic and adaptive approach, obviating the requirement for manual parameter selection. TMOz was evaluated in terms of contrast, colorfulness and overall image quality. The objective and subjective evaluation methods revealed that the proposed model outperformed the state-of-the-art TMOs.
Submitted 29 September, 2023;
originally announced September 2023.
-
Quantum Vision Clustering
Authors:
Xuan Bac Nguyen,
Hugh Churchill,
Khoa Luu,
Samee U. Khan
Abstract:
Unsupervised visual clustering has garnered significant attention in recent times, aiming to characterize distributions of unlabeled visual images through clustering based on a parameterized appearance approach. Alternatively, clustering algorithms can be viewed as assignment problems, often characterized as NP-hard, yet precisely solvable for small instances on contemporary hardware. Adiabatic quantum computing (AQC) emerges as a promising solution, poised to deliver substantial speedups for a range of NP-hard optimization problems. However, existing clustering formulations face challenges in quantum computing adoption due to scalability issues. In this study, we present the first clustering formulation tailored for resolution using adiabatic quantum computing. An Ising model is introduced to represent the quantum mechanical system implemented on AQC. The proposed approach demonstrates high competitiveness compared to state-of-the-art optimization-based methods, even when utilizing off-the-shelf integer programming solvers. Lastly, this work showcases the solvability of the proposed clustering problem on current-generation real quantum computers for small examples and analyzes the properties of the obtained solutions.
Submitted 17 December, 2023; v1 submitted 18 September, 2023;
originally announced September 2023.
-
Sequence-Based Nanobody-Antigen Binding Prediction
Authors:
Usama Sardar,
Sarwan Ali,
Muhammad Sohaib Ayub,
Muhammad Shoaib,
Khurram Bashir,
Imdad Ullah Khan,
Murray Patterson
Abstract:
Nanobodies (Nb) are monomeric heavy-chain fragments derived from heavy-chain only antibodies naturally found in Camelids and Sharks. Their considerably small size (~3-4 nm; 13 kDa) and favorable biophysical properties make them attractive targets for recombinant production. Furthermore, their unique ability to bind selectively to specific antigens, such as toxins, chemicals, bacteria, and viruses, makes them powerful tools in cell biology, structural biology, medical diagnostics, and future therapeutic agents in treating cancer and other serious illnesses. However, a critical challenge in nanobody production is the unavailability of nanobodies for a majority of antigens. Although some computational methods have been proposed to screen potential nanobodies for given target antigens, their practical application is highly restricted due to their reliance on 3D structures. Moreover, predicting nanobody-antigen interactions (binding) is a time-consuming and labor-intensive task. This study aims to develop a machine-learning method to predict Nanobody-Antigen binding solely based on the sequence data. We curated a comprehensive dataset of Nanobody-Antigen binding and nonbinding data and devised an embedding method based on gapped k-mers to predict binding based only on sequences of nanobody and antigen. Our approach achieves up to 90% accuracy in binding prediction and is significantly more efficient compared to the widely-used computational docking technique.
Submitted 14 July, 2023;
originally announced August 2023.
-
CAMP: A Context-Aware Cricket Players Performance Metric
Authors:
Muhammad Sohaib Ayub,
Naimat Ullah,
Sarwan Ali,
Imdad Ullah Khan,
Mian Muhammad Awais,
Muhammad Asad Khan,
Safiullah Faizullah
Abstract:
Cricket is the second most popular sport after soccer in terms of viewership. However, the assessment of individual player performance, a fundamental task in team sports, is currently primarily based on aggregate performance statistics, including average runs and wickets taken. We propose Context-Aware Metric of player Performance, CAMP, to quantify individual players' contributions toward a cricket match outcome. CAMP employs data mining methods and enables effective data-driven decision-making for selection and drafting, coaching and training, team line-ups, and strategy development. CAMP incorporates the exact context of performance, such as opponents' strengths and specific circumstances of games, such as pressure situations. We empirically evaluate CAMP on data of limited-over cricket matches between 2001 and 2019. In every match, a committee of experts declares one player as the best player, called Man of the Match (MoM). The top two rated players by CAMP match with MoM in 83% of the 961 games. Thus, the CAMP rating of the best player closely matches that of the domain experts. By this measure, CAMP significantly outperforms the current best-known players' contribution measure based on the Duckworth-Lewis-Stern (DLS) method.
Submitted 14 July, 2023;
originally announced July 2023.
-
A Comprehensive Overview of Large Language Models
Authors:
Humza Naveed,
Asad Ullah Khan,
Shi Qiu,
Muhammad Saqib,
Saeed Anwar,
Muhammad Usman,
Naveed Akhtar,
Nick Barnes,
Ajmal Mian
Abstract:
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks and beyond. This success of LLMs has led to a large influx of research contributions in this direction. These works encompass diverse topics such as architectural innovations, better training strategies, context length improvements, fine-tuning, multi-modal LLMs, robotics, datasets, benchmarking, efficiency, and more. With the rapid development of techniques and regular breakthroughs in LLM research, it has become considerably challenging to perceive the bigger picture of the advances in this direction. Considering the rapidly emerging plethora of literature on LLMs, it is imperative that the research community is able to benefit from a concise yet comprehensive overview of the recent developments in this field. This article provides an overview of the existing literature on a broad range of LLM-related concepts. Our self-contained comprehensive overview of LLMs discusses relevant background concepts along with covering the advanced topics at the frontier of research in LLMs. This review article is intended to not only provide a systematic survey but also a quick comprehensive reference for the researchers and practitioners to draw insights from extensive informative summaries of the existing works to advance the LLM research.
Submitted 9 April, 2024; v1 submitted 12 July, 2023;
originally announced July 2023.
-
Physical Color Calibration of Digital Pathology Scanners for Robust Artificial Intelligence Assisted Cancer Diagnosis
Authors:
Xiaoyi Ji,
Richard Salmon,
Nita Mulliqi,
Umair Khan,
Yinxi Wang,
Anders Blilie,
Henrik Olsson,
Bodil Ginnerup Pedersen,
Karina Dalsgaard Sørensen,
Benedicte Parm Ulhøi,
Svein R Kjosavik,
Emilius AM Janssen,
Mattias Rantalainen,
Lars Egevad,
Pekka Ruusuvuori,
Martin Eklund,
Kimmo Kartasalo
Abstract:
The potential of artificial intelligence (AI) in digital pathology is limited by technical inconsistencies in the production of whole slide images (WSIs), leading to degraded AI performance and posing a challenge for widespread clinical application as fine-tuning algorithms for each new site is impractical. Changes in the imaging workflow can also lead to compromised diagnoses and patient safety risks. We evaluated whether physical color calibration of scanners can standardize WSI appearance and enable robust AI performance. We employed a color calibration slide in four different laboratories and evaluated its impact on the performance of an AI system for prostate cancer diagnosis on 1,161 WSIs. Color standardization resulted in consistently improved AI model calibration and significant improvements in Gleason grading performance. The study demonstrates that physical color calibration provides a potential solution to the variation introduced by different scanners, making AI-based cancer diagnostics more reliable and applicable in clinical settings.
Submitted 7 July, 2023;
originally announced July 2023.
-
UTOPIA: Unconstrained Tracking Objects without Preliminary Examination via Cross-Domain Adaptation
Authors:
Pha Nguyen,
Kha Gia Quach,
John Gauch,
Samee U. Khan,
Bhiksha Raj,
Khoa Luu
Abstract:
Multiple Object Tracking (MOT) aims to find bounding boxes and identities of targeted objects in consecutive video frames. While fully-supervised MOT methods have achieved high accuracy on existing datasets, they cannot generalize well on a newly obtained dataset or a new unseen domain. In this work, we first address the MOT problem from the cross-domain point of view, imitating the process of new data acquisition in practice. Then, a new cross-domain MOT adaptation from existing datasets is proposed without any pre-defined human knowledge in understanding and modeling objects. It can also learn and update itself from the target data feedback. The intensive experiments are designed on four challenging settings, including MOTSynth to MOT17, MOT17 to MOT20, MOT17 to VisDrone, and MOT17 to DanceTrack. We then prove the adaptability of the proposed self-supervised learning strategy. The experiments also show superior performance on tracking metrics MOTA and IDF1, compared to fully supervised, unsupervised, and self-supervised state-of-the-art methods.
Submitted 16 June, 2023;
originally announced June 2023.
-
Robust Brain Age Estimation via Regression Models and MRI-derived Features
Authors:
Mansoor Ahmed,
Usama Sardar,
Sarwan Ali,
Shafiq Alam,
Murray Patterson,
Imdad Ullah Khan
Abstract:
The determination of biological brain age is a crucial biomarker in the assessment of neurological disorders and understanding of the morphological changes that occur during aging. Various machine learning models have been proposed for estimating brain age through Magnetic Resonance Imaging (MRI) of healthy controls. However, developing a robust brain age estimation (BAE) framework has been challenging due to the selection of appropriate MRI-derived features and the high cost of MRI acquisition. In this study, we present a novel BAE framework using the Open Big Healthy Brain (OpenBHB) dataset, which is a new multi-site and publicly available benchmark dataset that includes region-wise feature metrics derived from T1-weighted (T1-w) brain MRI scans of 3965 healthy controls aged 6 to 86 years. Our approach integrates three different MRI-derived region-wise features and different regression models, resulting in a highly accurate brain age estimation with a Mean Absolute Error (MAE) of 3.25 years, demonstrating the framework's robustness. We also analyze our model's regression-based performance on gender-wise (male and female) healthy test groups. The proposed BAE framework provides a new approach for estimating brain age, which has important implications for the understanding of neurological disorders and age-related brain changes.
Submitted 8 June, 2023;
originally announced June 2023.
-
A Comprehensive Survey on Affective Computing; Challenges, Trends, Applications, and Future Directions
Authors:
Sitara Afzal,
Haseeb Ali Khan,
Imran Ullah Khan,
Md. Jalil Piran,
Jong Weon Lee
Abstract:
As the name suggests, affective computing aims to recognize human emotions, sentiments, and feelings. There is a wide range of fields that study affective computing, including languages, sociology, psychology, computer science, and physiology. However, no research has ever been done to determine how machine learning (ML) and mixed reality (XR) interact together. This paper discusses the significance of affective computing, as well as its ideas, conceptions, methods, and outcomes. By using approaches of ML and XR, we survey and discuss recent methodologies in affective computing. We survey the state-of-the-art approaches along with current affective data resources. Further, we discuss various applications where affective computing has a significant impact, which will aid future scholars in gaining a better understanding of its significance and practical relevance.
Submitted 8 May, 2023;
originally announced May 2023.
-
GAANet: Ghost Auto Anchor Network for Detecting Varying Size Drones in Dark
Authors:
Misha Urooj Khan,
Maham Misbah,
Zeeshan Kaleem,
Yansha Deng,
Abbas Jamalipour
Abstract:
The usage of drones has tremendously increased in different sectors spanning from military to industrial applications. Despite all the benefits they offer, their misuse can lead to mishaps, and tackling them becomes more challenging particularly at night due to their small size and low visibility conditions. To overcome those limitations and improve the detection accuracy at night, we propose an object detector called Ghost Auto Anchor Network (GAANet) for infrared (IR) images. The detector uses a YOLOv5 core to address challenges in object detection for IR images, such as poor accuracy and a high false alarm rate caused by extended altitudes, poor lighting, and low image resolution. To improve performance, we implemented auto anchor calculation, modified the conventional convolution block to ghost-convolution, adjusted the input channel size, and used the AdamW optimizer. To enhance the precision of multiscale tiny object recognition, we also introduced an additional extra-small object feature extractor and detector. Experimental results on a custom IR dataset with multiple classes (birds, drones, planes, and helicopters) demonstrate that GAANet shows improvement compared to state-of-the-art detectors. In comparison to GhostNet-YOLOv5, GAANet improves overall mean average precision (mAP@50), recall, and precision by around 2.5%, 2.3%, and 1.4%, respectively. The dataset and code for this paper are available as open source at https://github.com/ZeeshanKaleem/GhostAutoAnchorNet.
Submitted 5 May, 2023;
originally announced May 2023.
-
Virus2Vec: Viral Sequence Classification Using Machine Learning
Authors:
Sarwan Ali,
Babatunde Bello,
Prakash Chourasia,
Ria Thazhe Punathil,
Pin-Yu Chen,
Imdad Ullah Khan,
Murray Patterson
Abstract:
Understanding the host-specificity of different families of viruses sheds light on the origin of, e.g., SARS-CoV-2, rabies, and other such zoonotic pathogens in humans. It enables epidemiologists, medical professionals, and policymakers to curb existing epidemics and prevent future ones promptly. In the family Coronaviridae (of which SARS-CoV-2 is a member), it is well-known that the spike protein is the point of contact between the virus and the host cell membrane. On the other hand, the two traditional mammalian orders, Carnivora (carnivores) and Chiroptera (bats), are recognized to be responsible for maintaining and spreading the Rabies Lyssavirus (RABV). We propose Virus2Vec, a feature-vector representation for viral (nucleotide or amino acid) sequences that enables vector-space-based machine learning models to identify viral hosts. Virus2Vec generates numerical feature vectors for unaligned sequences, allowing us to forego the computationally expensive sequence alignment step from the pipeline. Virus2Vec leverages the power of both the minimizer and position weight matrix (PWM) to generate compact feature vectors. Using several classifiers, we empirically evaluate Virus2Vec on real-world spike sequences of Coronaviridae and rabies virus sequence data to predict the host (identifying the reservoirs of infection). Our results demonstrate that Virus2Vec outperforms the predictive accuracies of baseline and state-of-the-art methods.
Submitted 24 April, 2023;
originally announced April 2023.
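The Virus2Vec abstract above mentions minimizers without defining them. As a rough illustration only, the sketch below uses one common definition: the minimizer of a k-mer is its lexicographically smallest m-mer (m < k). The function names and the forward-strand-only choice are assumptions made for this example; the paper's exact construction, including how minimizers are combined with the position weight matrix, is not given in the abstract.

```python
def minimizer(kmer, m):
    """Return the lexicographically smallest m-mer inside kmer (assumed definition)."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def minimizers(seq, k, m):
    """Slide a window of length k over seq and collect one minimizer per window."""
    return [minimizer(seq[i:i + k], m) for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    # Three 5-mers (GATTA, ATTAC, TTACA) yield three minimizers of length 3.
    print(minimizers("GATTACA", k=5, m=3))  # ['ATT', 'ATT', 'ACA']
```

Because consecutive windows often share the same minimizer, the resulting list is more repetitive than the raw k-mer list, which is one reason minimizers lend themselves to compact representations.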
-
Learning Situation Hyper-Graphs for Video Question Answering
Authors:
Aisha Urooj Khan,
Hilde Kuehne,
Bo Wu,
Kim Chheu,
Walid Bousselham,
Chuang Gan,
Niels Lobo,
Mubarak Shah
Abstract:
Answering questions about complex situations in videos requires not only capturing the presence of actors, objects, and their relations but also the evolution of these relationships over time. A situation hyper-graph is a representation that describes situations as scene sub-graphs for video frames and hyper-edges for connected sub-graphs and has been proposed to capture all such information in a compact structured form. In this work, we propose an architecture for Video Question Answering (VQA) that enables answering questions related to video content by predicting situation hyper-graphs, coined Situation Hyper-Graph based Video Question Answering (SHG-VQA). To this end, we train a situation hyper-graph decoder to implicitly identify graph representations with actions and object/human-object relationships from the input video clip, and to use cross-attention between the predicted situation hyper-graphs and the question embedding to predict the correct answer. The proposed method is trained in an end-to-end manner and optimized by a VQA loss with the cross-entropy function and a Hungarian matching loss for the situation graph prediction. The effectiveness of the proposed architecture is extensively evaluated on two challenging benchmarks: AGQA and STAR. Our results show that learning the underlying situation hyper-graphs helps the system to significantly improve its performance for novel challenges of video question-answering tasks.
Submitted 6 May, 2023; v1 submitted 17 April, 2023;
originally announced April 2023.
-
Upper Limb Movement Execution Classification using Electroencephalography for Brain Computer Interface
Authors:
Saadat Ullah Khan,
Muhammad Majid,
Syed Muhammad Anwar
Abstract:
Accurate classification of upper limb movements using electroencephalography (EEG) signals has gained significant importance in recent years due to the prevalence of brain-computer interfaces. The upper limbs in the human body are crucial since different skeletal segments combine to make a range of motion that helps us in our routine daily tasks. Decoding EEG-based upper limb movements can be of great help to people with spinal cord injury (SCI) or other neuro-muscular diseases such as amyotrophic lateral sclerosis (ALS), primary lateral sclerosis, and periodic paralysis. These conditions can manifest in a loss of sensory and motor function, which could make a person reliant on others to provide care in day-to-day activities. We can detect and classify upper limb movement activities, whether they be executed or imagined, using an EEG-based brain-computer interface (BCI). Toward this goal, we focus our attention on decoding movement execution (ME) of the upper limb in this study. For this purpose, we utilize a publicly available EEG dataset that contains EEG signal recordings from fifteen subjects acquired using a 61-channel EEG device. We propose a method to classify four ME classes for different subjects using spectrograms of the EEG data through pre-trained deep learning (DL) models. Our proposed method of using EEG spectrograms for the classification of ME has shown significant results, where the highest average classification accuracy (for four ME classes) obtained is 87.36%, with one subject achieving the best classification accuracy of 97.03%.
Submitted 1 April, 2023;
originally announced April 2023.
-
BioSequence2Vec: Efficient Embedding Generation For Biological Sequences
Authors:
Sarwan Ali,
Usama Sardar,
Murray Patterson,
Imdad Ullah Khan
Abstract:
Representation learning is an important step in the machine learning pipeline. Given the current biological sequencing data volume, learning an explicit representation is prohibitive due to the dimensionality of the resulting feature vectors. Kernel-based methods, e.g., SVM, are a proven efficient and useful alternative for several machine learning (ML) tasks such as sequence classification. Three challenges with kernel methods are (i) the computation time, (ii) the memory usage (storing an $n\times n$ matrix), and (iii) the usage of kernel matrices limited to kernel-based ML methods (difficult to generalize on non-kernel classifiers). While (i) can be solved using approximate methods, challenge (ii) remains for typical kernel methods. Similarly, although non-kernel-based ML methods can be applied to kernel matrices by extracting principal components (kernel PCA), it may result in information loss, while being computationally expensive. In this paper, we propose a general-purpose representation learning approach that embodies kernel methods' qualities while avoiding computation, memory, and generalizability challenges. This involves computing a low-dimensional embedding of each sequence, using random projections of its $k$-mer frequency vectors, significantly reducing the computation needed to compute the dot product and the memory needed to store the resulting representation. Our proposed fast and alignment-free embedding method can be used as input to any distance (e.g., $k$ nearest neighbors) and non-distance (e.g., decision tree) based ML method for classification and clustering tasks. Using different forms of biological sequences as input, we perform a variety of real-world classification tasks, such as SARS-CoV-2 lineage and gene family classification, outperforming several state-of-the-art embedding and kernel methods in predictive performance.
Submitted 1 April, 2023;
originally announced April 2023.
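The BioSequence2Vec abstract above describes embedding a sequence via random projections of its $k$-mer frequency vector. A minimal, generic sketch of that idea follows, assuming a DNA alphabet and a random sign matrix scaled by 1/sqrt(d); the function names, the alphabet, and the choice of projection are illustrative assumptions, not the authors' implementation.

```python
import itertools
import random


def kmer_counts(seq, k, alphabet="ACGT"):
    """Count every k-mer of seq into a fixed-length vector of size len(alphabet)**k."""
    index = {"".join(p): i
             for i, p in enumerate(itertools.product(alphabet, repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing characters outside the alphabet
            counts[j] += 1
    return counts


def random_sign_matrix(dim_in, dim_out, seed=0):
    """Build a dim_out x dim_in matrix with entries +/- 1/sqrt(dim_out) (assumed choice)."""
    rng = random.Random(seed)
    scale = 1.0 / dim_out ** 0.5
    return [[rng.choice((-scale, scale)) for _ in range(dim_in)]
            for _ in range(dim_out)]


def embed(seq, k, matrix, alphabet="ACGT"):
    """Project the k-mer count vector of seq down to len(matrix) dimensions."""
    counts = kmer_counts(seq, k, alphabet)
    return [sum(w * c for w, c in zip(row, counts)) for row in matrix]


if __name__ == "__main__":
    k, dim_out = 3, 8
    matrix = random_sign_matrix(4 ** k, dim_out, seed=42)
    print(embed("ACGTACGTTGCA", k, matrix))
```

The same projection matrix must be reused for every sequence so that dot products between embeddings approximate dot products between the original count vectors; the output dimension trades memory against approximation quality.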
-
Efficient Classification of SARS-CoV-2 Spike Sequences Using Federated Learning
Authors:
Prakash Chourasia,
Taslim Murad,
Zahra Tayebi,
Sarwan Ali,
Imdad Ullah Khan,
Murray Patterson
Abstract:
This paper presents a federated learning (FL) approach to train an AI model for SARS-CoV-2 variant classification. We analyze the SARS-CoV-2 spike sequences in a distributed way, without data sharing, to detect different variants of this rapidly mutating coronavirus. Our method maintains the confidentiality of local data (that could be stored in different locations) yet allows us to reliably detect and identify different known and unknown variants of the novel coronavirus SARS-CoV-2. Using the proposed approach, we achieve an overall accuracy of 93% on the coronavirus variant identification task. We also provide details regarding how the proposed model follows the main laws of federated learning, such as the laws of data ownership, data privacy, model aggregation, and model heterogeneity. Since the proposed model is distributed, it could scale on "Big Data" easily. We plan to use this proof-of-concept to implement a privacy-preserving pandemic response strategy.
Submitted 8 November, 2023; v1 submitted 16 February, 2023;
originally announced February 2023.
-
Metaverse for Wireless Systems: Architecture, Advances, Standardization, and Open Challenges
Authors:
Latif U. Khan,
Mohsen Guizani,
Dusit Niyato,
Ala Al-Fuqaha,
Merouane Debbah
Abstract:
The growing landscape of emerging wireless applications is a key driver toward the development of novel wireless system designs. Such a design can be based on the metaverse that uses a virtual model of the physical world systems along with other schemes/technologies (e.g., optimization theory, machine learning, and blockchain). A metaverse using a virtual model performs proactive intelligent analytics prior to a user request for efficient management of the wireless system resources. Additionally, a metaverse will enable self-sustainability to operate wireless systems with the least possible intervention from network operators. Although the metaverse can offer many benefits, it faces some challenges as well. Therefore, in this tutorial, we discuss the role of a metaverse in enabling wireless applications. We present an overview, key enablers, design aspects (i.e., metaverse for wireless and wireless for metaverse), and a novel high-level architecture of metaverse-based wireless systems. We discuss metaverse management, reliability, and security of the metaverse-based system. Furthermore, we discuss recent advances and standardization of metaverse-enabled wireless systems. Finally, we outline open challenges and present possible solutions.
Submitted 22 June, 2023; v1 submitted 26 January, 2023;
originally announced January 2023.
-
SafeSpace MFNet: Precise and Efficient MultiFeature Drone Detection Network
Authors:
Misha Urooj Khan,
Mahnoor Dil,
Muhammad Zeshan Alam,
Farooq Alam Orakazi,
Abdullah M. Almasoud,
Zeeshan Kaleem,
Chau Yuen
Abstract:
The increasing prevalence of unmanned aerial vehicles (UAVs), commonly known as drones, has generated a demand for reliable detection systems. The inappropriate use of drones presents potential security and privacy hazards, particularly concerning sensitive facilities. To overcome those obstacles, we propose MultiFeatureNet (MFNet), a solution that enhances feature representation by capturing the most concentrated feature maps. Additionally, we present MultiFeatureNet-Feature Attention (MFNet-FA), a technique that adaptively weights different channels of the input feature maps. To meet the requirements of multi-scale detection, we present small (S), medium (M), and large (L) versions of MFNet and MFNet-FA. The outcomes reveal notable performance enhancements. For optimal bird detection, MFNet-M (Ablation study 2) achieves a precision of 99.8%, while for UAV detection, MFNet-L (Ablation study 2) achieves a precision score of 97.2%. Among the options, MFNet-FA-S (Ablation study 3) emerges as the most resource-efficient alternative, considering its small feature map size, computational demands (GFLOPs), and operational efficiency (in frames per second). This makes it particularly suitable for deployment on hardware with limited capabilities. Additionally, MFNet-FA-S (Ablation study 3) stands out for its swift real-time inference and multiple-object detection due to the incorporation of the FA module. The proposed MFNet-L with the focus module (Ablation study 2) demonstrates the best classification outcomes, with an average precision of 98.4%, average recall of 96.6%, average mean average precision (mAP) of 98.3%, and average intersection over union (IoU) of 72.8%. To encourage reproducible research, the dataset and code for MFNet are freely available as an open-source project: github.com/ZeeshanKaleem/MultiFeatureNet.
Submitted 6 October, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
TF-Net: Deep Learning Empowered Tiny Feature Network for Night-time UAV Detection
Authors:
Maham Misbah,
Misha Urooj Khan,
Zhaohui Yang,
Zeeshan Kaleem
Abstract:
Technological advancements have normalized the usage of unmanned aerial vehicles (UAVs) in every sector, spanning from military to commercial, but they also pose serious security concerns due to their enhanced functionalities and easy access to private and highly secured areas. Several instances related to UAVs have raised security concerns, leading to UAV detection research studies. Visual techniques are widely adopted for UAV detection, but they perform poorly at night, in complex backgrounds, and in adverse weather conditions. Therefore, a robust night vision-based drone detection system is required that could efficiently tackle this problem. Infrared cameras are increasingly used for nighttime surveillance due to their wide applications in night vision equipment. This paper uses a deep learning-based TinyFeatureNet (TF-Net), which is an improved version of YOLOv5s, to accurately detect UAVs during the night using infrared (IR) images. In the proposed TF-Net, we introduce architectural changes in the neck and backbone of the YOLOv5s. We also simulated four different YOLOv5 models (s, m, n, l) and the proposed TF-Net for a fair comparison. The results showed better performance for the proposed TF-Net in terms of precision, IoU, GFLOPS, model size, and FPS compared to the YOLOv5s. TF-Net yielded the best results with 95.7% precision, 84% mAP, and 44.8% IoU.
Submitted 29 November, 2022;
originally announced November 2022.
-
Motor imagery classification using EEG spectrograms
Authors:
Saadat Ullah Khan,
Muhammad Majid,
Syed Muhammad Anwar
Abstract:
The loss of limb motion arising from damage to the spinal cord is a disability that could affect people while performing their day-to-day activities. The restoration of limb movement would enable people with spinal cord injury to interact with their environment more naturally and this is where a brain-computer interface (BCI) system could be beneficial. The detection of limb movement imagination (MI) could be significant for such a BCI, where the detected MI can guide the computer system. Using MI detection through electroencephalography (EEG), we can recognize the imagination of movement in a user and translate this into a physical movement. In this paper, we utilize pre-trained deep learning (DL) algorithms for the classification of imagined upper limb movements. We use a publicly available EEG dataset with data representing seven classes of limb movements. We compute the spectrograms of the time series EEG signal and use them as an input to the DL model for MI classification. Our novel approach for the classification of upper limb movements using pre-trained DL algorithms and spectrograms has achieved significantly improved results for seven movement classes. When compared with the recently proposed state-of-the-art methods, our algorithm achieved a significant average accuracy of 84.9% for classifying seven movements.
Submitted 15 November, 2022;
originally announced November 2022.
-
Machine Learning for Metaverse-enabled Wireless Systems: Vision, Requirements, and Challenges
Authors:
Latif U. Khan,
Ibrar Yaqoob,
Khaled Salah,
Choong Seon Hong,
Dusit Niyato,
Zhu Han,
Mohsen Guizani
Abstract:
Today's wireless systems are posing key challenges in terms of quality of service and quality of physical experience. Metaverse has the potential to reshape, transform, and add innovations to the existing wireless systems. A metaverse is a collective virtual open space that can enable wireless systems using digital twins, digital avatars, and interactive experience technologies. Machine learning (ML) is indispensable for modeling twins, avatars, and deploying interactive experience technologies. In this paper, we present the role of ML in enabling metaverse-based wireless systems. We identify and discuss a set of key requirements for advancing ML in the metaverse-based wireless systems. Moreover, we present a case study of distributed split federated learning for efficiently training meta-space models. Finally, we discuss the future challenges along with potential solutions.
Submitted 7 November, 2022;
originally announced November 2022.