Search | arXiv e-print repository

SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators

Authors: Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque

Abstract: Emerging multi-model workloads with heavy models like recent large language models significantly increased the compute and memory demands on hardware. To address such increasing demands, designing a scalable hardware architecture became a key problem. Among recent solutions, the 2.5D silicon interposer multi-chip module (MCM)-based AI accelerator has been actively explored as a promising scalable… ▽ More Emerging multi-model workloads with heavy models like recent large language models significantly increased the compute and memory demands on hardware. To address such increasing demands, designing a scalable hardware architecture became a key problem. Among recent solutions, the 2.5D silicon interposer multi-chip module (MCM)-based AI accelerator has been actively explored as a promising scalable solution due to their significant benefits in the low engineering cost and composability. However, previous MCM accelerators are based on homogeneous architectures with fixed dataflow, which encounter major challenges from highly heterogeneous multi-model workloads due to their limited workload adaptivity. Therefore, in this work, we explore the opportunity in the heterogeneous dataflow MCM AI accelerators. We identify the scheduling of multi-model workload on heterogeneous dataflow MCM AI accelerator is an important and challenging problem due to its significance and scale, which reaches O(10^18) scale even for a single model case on 6x6 chiplets. We develop a set of heuristics to navigate the huge scheduling space and codify them into a scheduler with advanced techniques such as inter-chiplet pipelining. Our evaluation on ten multi-model workload scenarios for datacenter multitenancy and AR/VR use-cases has shown the efficacy of our approach, achieving on average 35.3% and 31.4% less energy-delay product (EDP) for the respective applications settings compared to homogeneous baselines. △ Less

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2402.14642 [pdf, other]

Distributed Radiance Fields for Edge Video Compression and Metaverse Integration in Autonomous Driving

Authors: Eugen Šlapak, Matúš Dopiriak, Mohammad Abdullah Al Faruque, Juraj Gazda, Marco Levorato

Abstract: The metaverse is a virtual space that combines physical and digital elements, creating immersive and connected digital worlds. For autonomous mobility, it enables new possibilities with edge computing and digital twins (DTs) that offer virtual prototy**, prediction, and more. DTs can be created with 3D scene reconstruction methods that capture the real world's geometry, appearance, and dynamics.… ▽ More The metaverse is a virtual space that combines physical and digital elements, creating immersive and connected digital worlds. For autonomous mobility, it enables new possibilities with edge computing and digital twins (DTs) that offer virtual prototy**, prediction, and more. DTs can be created with 3D scene reconstruction methods that capture the real world's geometry, appearance, and dynamics. However, sending data for real-time DT updates in the metaverse, such as camera images and videos from connected autonomous vehicles (CAVs) to edge servers, can increase network congestion, costs, and latency, affecting metaverse services. Herein, a new method is proposed based on distributed radiance fields (RFs), multi-access edge computing (MEC) network for video compression and metaverse DT updates. RF-based encoder and decoder are used to create and restore representations of camera images. The method is evaluated on a dataset of camera images from the CARLA simulator. Data savings of up to 80% were achieved for H.264 I-frame - P-frame pairs by using RFs instead of I-frames, while maintaining high peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) qualitative metrics for the reconstructed images. Possible uses and challenges for the metaverse and autonomous mobility are also discussed. △ Less

Submitted 22 February, 2024; originally announced February 2024.

Comments: 6 pages, 6 figures, conference

arXiv:2402.13233 [pdf, other]

SMORE: Similarity-based Hyperdimensional Domain Adaptation for Multi-Sensor Time Series Classification

Authors: Junyao Wang, Mohammad Abdullah Al Faruque

Abstract: Many real-world applications of the Internet of Things (IoT) employ machine learning (ML) algorithms to analyze time series information collected by interconnected sensors. However, distribution shift, a fundamental challenge in data-driven ML, arises when a model is deployed on a data distribution different from the training data and can substantially degrade model performance. Additionally, incr… ▽ More Many real-world applications of the Internet of Things (IoT) employ machine learning (ML) algorithms to analyze time series information collected by interconnected sensors. However, distribution shift, a fundamental challenge in data-driven ML, arises when a model is deployed on a data distribution different from the training data and can substantially degrade model performance. Additionally, increasingly sophisticated deep neural networks (DNNs) are required to capture intricate spatial and temporal dependencies in multi-sensor time series data, often exceeding the capabilities of today's edge devices. In this paper, we propose SMORE, a novel resource-efficient domain adaptation (DA) algorithm for multi-sensor time series classification, leveraging the efficient and parallel operations of hyperdimensional computing. SMORE dynamically customizes test-time models with explicit consideration of the domain context of each sample to mitigate the negative impacts of domain shifts. Our evaluation on a variety of multi-sensor time series classification tasks shows that SMORE achieves on average 1.98% higher accuracy than state-of-the-art (SOTA) DNN-based DA algorithms with 18.81x faster training and 4.63x faster inference. △ Less

Submitted 26 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: arXiv admin note: text overlap with arXiv:2308.03295

arXiv:2402.12642 [pdf, other]

Rampo: A CEGAR-based Integration of Binary Code Analysis and System Falsification for Cyber-Kinetic Vulnerability Detection

Authors: Kohei Tsujio, Mohammad Abdullah Al Faruque, Yasser Shoukry

Abstract: This paper presents a novel tool, named Rampo, that can perform binary code analysis to identify cyber kinetic vulnerabilities in CPS. The tool takes as input a Signal Temporal Logic (STL) formula that describes the kinetic effect, i.e., the behavior of the physical system, that one wants to avoid. The tool then searches the possible cyber trajectories in the binary code that may lead to such phys… ▽ More This paper presents a novel tool, named Rampo, that can perform binary code analysis to identify cyber kinetic vulnerabilities in CPS. The tool takes as input a Signal Temporal Logic (STL) formula that describes the kinetic effect, i.e., the behavior of the physical system, that one wants to avoid. The tool then searches the possible cyber trajectories in the binary code that may lead to such physical behavior. This search integrates binary code analysis tools and hybrid systems falsification tools using a Counter-Example Guided Abstraction Refinement (CEGAR) approach. Rampo starts by analyzing the binary code to extract symbolic constraints that represent the different paths in the code. These symbolic constraints are then passed to a Satisfiability Modulo Theories (SMT) solver to extract the range of control signals that can be produced by each path in the code. The next step is to search over possible physical trajectories using a hybrid systems falsification tool that adheres to the behavior of the cyber paths and yet leads to violations of the STL formula. Since the number of cyber paths that need to be explored increases exponentially with the length of physical trajectories, we iteratively perform refinement of the cyber path constraints based on the previous falsification result and traverse the abstract path tree obtained from the control program to explore the search space of the system. To illustrate the practical utility of binary code analysis in identifying cyber kinetic vulnerabilities, we present case studies from diverse CPS domains, showcasing how they can be discovered in their control programs. Our tool could compute the same number of vulnerabilities while leading to a speedup that ranges from 3x to 98x. △ Less

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2401.05443 [pdf, other]

doi 10.1145/3639477.3639743

LLM4PLC: Harnessing Large Language Models for Verifiable Programming of PLCs in Industrial Control Systems

Authors: Mohamad Fakih, Rahul Dharmaji, Yasamin Moghaddas, Gustavo Quiros Araya, Oluwatosin Ogundare, Mohammad Abdullah Al Faruque

Abstract: Although Large Language Models (LLMs) have established pre-dominance in automated code generation, they are not devoid of shortcomings. The pertinent issues primarily relate to the absence of execution guarantees for generated code, a lack of explainability, and suboptimal support for essential but niche programming languages. State-of-the-art LLMs such as GPT-4 and LLaMa2 fail to produce valid pr… ▽ More Although Large Language Models (LLMs) have established pre-dominance in automated code generation, they are not devoid of shortcomings. The pertinent issues primarily relate to the absence of execution guarantees for generated code, a lack of explainability, and suboptimal support for essential but niche programming languages. State-of-the-art LLMs such as GPT-4 and LLaMa2 fail to produce valid programs for Industrial Control Systems (ICS) operated by Programmable Logic Controllers (PLCs). We propose LLM4PLC, a user-guided iterative pipeline leveraging user feedback and external verification tools including grammar checkers, compilers and SMV verifiers to guide the LLM's generation. We further enhance the generation potential of LLM by employing Prompt Engineering and model fine-tuning through the creation and usage of LoRAs. We validate this system using a FischerTechnik Manufacturing TestBed (MFTB), illustrating how LLMs can evolve from generating structurally flawed code to producing verifiably correct programs for industrial applications. We run a complete test suite on GPT-3.5, GPT-4, Code Llama-7B, a fine-tuned Code Llama-7B model, Code Llama-34B, and a fine-tuned Code Llama-34B model. The proposed pipeline improved the generation success rate from 47% to 72%, and the Survey-of-Experts code quality from 2.25/10 to 7.75/10. To promote open research, we share the complete experimental setup, the LLM Fine-Tuning Weights, and the video demonstrations of the different programs on our dedicated webpage. △ Less

Submitted 8 January, 2024; originally announced January 2024.

Comments: 12 pages; 8 figures; Appearing in the 46th International Conference on Software Engineering: Software Engineering in Practice; for demo website, see https://sites.google.com/uci.edu/llm4plc/home

ACM Class: D.2.4; I.2.7; I.2.2

arXiv:2312.09401 [pdf, other]

Inter-Layer Scheduling Space Exploration for Multi-model Inference on Heterogeneous Chiplets

Authors: Mohanad Odema, Hyoukjun Kwon, Mohammad Abdullah Al Faruque

Abstract: To address increasing compute demand from recent multi-model workloads with heavy models like large language models, we propose to deploy heterogeneous chiplet-based multi-chip module (MCM)-based accelerators. We develop an advanced scheduling framework for heterogeneous MCM accelerators that comprehensively consider complex heterogeneity and inter-chiplet pipelining. Our experiments using our fra… ▽ More To address increasing compute demand from recent multi-model workloads with heavy models like large language models, we propose to deploy heterogeneous chiplet-based multi-chip module (MCM)-based accelerators. We develop an advanced scheduling framework for heterogeneous MCM accelerators that comprehensively consider complex heterogeneity and inter-chiplet pipelining. Our experiments using our framework on GPT-2 and ResNet-50 models on a 4-chiplet system have shown upto 2.2x and 1.9x increase in throughput and energy efficiency, compared to a monolithic accelerator with an optimized output-stationary dataflow. △ Less

Submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted poster abstract to the IBM IEEE AI Compute Symposium (AICS'23)

arXiv:2311.07705 [pdf, other]

Robust and Scalable Hyperdimensional Computing With Brain-Like Neural Adaptations

Authors: Junyao Wang, Mohammad Abdullah Al Faruque

Abstract: The Internet of Things (IoT) has facilitated many applications utilizing edge-based machine learning (ML) methods to analyze locally collected data. Unfortunately, popular ML algorithms often require intensive computations beyond the capabilities of today's IoT devices. Brain-inspired hyperdimensional computing (HDC) has been introduced to address this issue. However, existing HDCs use static enco… ▽ More The Internet of Things (IoT) has facilitated many applications utilizing edge-based machine learning (ML) methods to analyze locally collected data. Unfortunately, popular ML algorithms often require intensive computations beyond the capabilities of today's IoT devices. Brain-inspired hyperdimensional computing (HDC) has been introduced to address this issue. However, existing HDCs use static encoders, requiring extremely high dimensionality and hundreds of training iterations to achieve reasonable accuracy. This results in a huge efficiency loss, severely impeding the application of HDCs in IoT systems. We observed that a main cause is that the encoding module of existing HDCs lacks the capability to utilize and adapt to information learned during training. In contrast, neurons in human brains dynamically regenerate all the time and provide more useful functionalities when learning new information. While the goal of HDC is to exploit the high-dimensionality of randomly generated base hypervectors to represent the information as a pattern of neural activity, it remains challenging for existing HDCs to support a similar behavior as brain neural regeneration. In this work, we present dynamic HDC learning frameworks that identify and regenerate undesired dimensions to provide adequate accuracy with significantly lowered dimensionalities, thereby accelerating both the training and inference. △ Less

Submitted 13 November, 2023; originally announced November 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2304.05503

arXiv:2308.03295 [pdf, other]

DOMINO: Domain-invariant Hyperdimensional Classification for Multi-Sensor Time Series Data

Authors: Junyao Wang, Luke Chen, Mohammad Abdullah Al Faruque

Abstract: With the rapid evolution of the Internet of Things, many real-world applications utilize heterogeneously connected sensors to capture time-series information. Edge-based machine learning (ML) methodologies are often employed to analyze locally collected data. However, a fundamental issue across data-driven ML approaches is distribution shift. It occurs when a model is deployed on a data distributi… ▽ More With the rapid evolution of the Internet of Things, many real-world applications utilize heterogeneously connected sensors to capture time-series information. Edge-based machine learning (ML) methodologies are often employed to analyze locally collected data. However, a fundamental issue across data-driven ML approaches is distribution shift. It occurs when a model is deployed on a data distribution different from what it was trained on, and can substantially degrade model performance. Additionally, increasingly sophisticated deep neural networks (DNNs) have been proposed to capture spatial and temporal dependencies in multi-sensor time series data, requiring intensive computational resources beyond the capacity of today's edge devices. While brain-inspired hyperdimensional computing (HDC) has been introduced as a lightweight solution for edge-based learning, existing HDCs are also vulnerable to the distribution shift challenge. In this paper, we propose DOMINO, a novel HDC learning framework addressing the distribution shift problem in noisy multi-sensor time-series data. DOMINO leverages efficient and parallel matrix operations on high-dimensional space to dynamically identify and filter out domain-variant dimensions. Our evaluation on a wide range of multi-sensor time series classification tasks shows that DOMINO achieves on average 2.04% higher accuracy than state-of-the-art (SOTA) DNN-based domain generalization techniques, and delivers 16.34x faster training and 2.89x faster inference. More importantly, DOMINO performs notably better when learning from partially labeled and highly imbalanced data, providing 10.93x higher robustness against hardware noises than SOTA DNNs. △ Less

Submitted 18 August, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

arXiv:2307.08065 [pdf, other]

MaGNAS: A Map**-Aware Graph Neural Architecture Search Framework for Heterogeneous MPSoC Deployment

Authors: Mohanad Odema, Halima Bouzidi, Hamza Ouarnoughi, Smail Niar, Mohammad Abdullah Al Faruque

Abstract: Graph Neural Networks (GNNs) are becoming increasingly popular for vision-based applications due to their intrinsic capacity in modeling structural and contextual relations between various parts of an image frame. On another front, the rising popularity of deep vision-based applications at the edge has been facilitated by the recent advancements in heterogeneous multi-processor Systems on Chips (M… ▽ More Graph Neural Networks (GNNs) are becoming increasingly popular for vision-based applications due to their intrinsic capacity in modeling structural and contextual relations between various parts of an image frame. On another front, the rising popularity of deep vision-based applications at the edge has been facilitated by the recent advancements in heterogeneous multi-processor Systems on Chips (MPSoCs) that enable inference under real-time, stringent execution requirements. By extension, GNNs employed for vision-based applications must adhere to the same execution requirements. Yet contrary to typical deep neural networks, the irregular flow of graph learning operations poses a challenge to running GNNs on such heterogeneous MPSoC platforms. In this paper, we propose a novel unified design-map** approach for efficient processing of vision GNN workloads on heterogeneous MPSoC platforms. Particularly, we develop MaGNAS, a map**-aware Graph Neural Architecture Search framework. MaGNAS proposes a GNN architectural design space coupled with prospective map** options on a heterogeneous SoC to identify model architectures that maximize on-device resource efficiency. To achieve this, MaGNAS employs a two-tier evolutionary search to identify optimal GNNs and map** pairings that yield the best performance trade-offs. Through designing a supernet derived from the recent Vision GNN (ViG) architecture, we conducted experiments on four (04) state-of-the-art vision datasets using both (i) a real hardware SoC platform (NVIDIA Xavier AGX) and (ii) a performance/cost model simulator for DNN accelerators. Our experimental results demonstrate that MaGNAS is able to provide 1.57x latency speedup and is 3.38x more energy-efficient for several vision datasets executed on the Xavier MPSoC vs. the GPU-only deployment while sustaining an average 0.11% accuracy reduction from the baseline. △ Less

Submitted 16 July, 2023; originally announced July 2023.

Comments: This article appears as part of the ESWEEK-TECS special issue and was presented in the International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES), 2023

arXiv:2306.15748 [pdf, other]

CARMA: Context-Aware Runtime Reconfiguration for Energy-Efficient Sensor Fusion

Authors: Yifan Zhang, Arnav Vaibhav Malawade, Xiaofang Zhang, Yuhui Li, DongHwan Seong, Mohammad Abdullah Al Faruque, Sitao Huang

Abstract: Autonomous systems (AS) are systems that can adapt and change their behavior in response to unanticipated events and include systems such as aerial drones, autonomous vehicles, and ground/aquatic robots. AS require a wide array of sensors, deep-learning models, and powerful hardware platforms to perceive and safely operate in real-time. However, in many contexts, some sensing modalities negatively… ▽ More Autonomous systems (AS) are systems that can adapt and change their behavior in response to unanticipated events and include systems such as aerial drones, autonomous vehicles, and ground/aquatic robots. AS require a wide array of sensors, deep-learning models, and powerful hardware platforms to perceive and safely operate in real-time. However, in many contexts, some sensing modalities negatively impact perception while increasing the system's overall energy consumption. Since AS are often energy-constrained edge devices, energy-efficient sensor fusion methods have been proposed. However, existing methods either fail to adapt to changing scenario conditions or to optimize energy efficiency system-wide. We propose CARMA: a context-aware sensor fusion approach that uses context to dynamically reconfigure the computation flow on a Field-Programmable Gate Array (FPGA) at runtime. By clock-gating unused sensors and model sub-components, CARMA significantly reduces the energy used by a multi-sensory object detector without compromising performance. We use a Deep-learning Processor Unit (DPU) based reconfiguration approach to minimize the latency of model reconfiguration. We evaluate multiple context-identification strategies, propose a novel system-wide energy-performance joint optimization, and evaluate scenario-specific perception performance. Across challenging real-world sensing contexts, CARMA outperforms state-of-the-art methods with up to 1.3x speedup and 73% lower energy consumption. △ Less

Submitted 27 June, 2023; originally announced June 2023.

Comments: Accepted to be published in the 2023 ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED 2023)

arXiv:2304.08600 [pdf, other]

RS2G: Data-Driven Scene-Graph Extraction and Embedding for Robust Autonomous Perception and Scenario Understanding

Authors: Junyao Wang, Arnav Vaibhav Malawade, Junhong Zhou, Shih-Yuan Yu, Mohammad Abdullah Al Faruque

Abstract: Effectively capturing intricate interactions among road users is of critical importance to achieving safe navigation for autonomous vehicles. While graph learning (GL) has emerged as a promising approach to tackle this challenge, existing GL models rely on predefined domain-specific graph extraction rules that often fail in real-world drastically changing scenarios. Additionally, these graph extra… ▽ More Effectively capturing intricate interactions among road users is of critical importance to achieving safe navigation for autonomous vehicles. While graph learning (GL) has emerged as a promising approach to tackle this challenge, existing GL models rely on predefined domain-specific graph extraction rules that often fail in real-world drastically changing scenarios. Additionally, these graph extraction rules severely impede the capability of existing GL methods to generalize knowledge across domains. To address this issue, we propose RoadScene2Graph (RS2G), an innovative autonomous scenario understanding framework with a novel data-driven graph extraction and modeling approach that dynamically captures the diverse relations among road users. Our evaluations demonstrate that on average RS2G outperforms the state-of-the-art (SOTA) rule-based graph extraction method by 4.47% and the SOTA deep learning model by 22.19% in subjective risk assessment. More importantly, RS2G delivers notably better performance in transferring knowledge gained from simulation environments to unseen real-world scenarios. △ Less

Submitted 4 September, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

arXiv:2303.08215 [pdf, other]

Stress Detection using Context-Aware Sensor Fusion from Wearable Devices

Authors: Nafiul Rashid, Trier Mortlock, Mohammad Abdullah Al Faruque

Abstract: Wearable medical technology has become increasingly popular in recent years. One function of wearable health devices is stress detection, which relies on sensor inputs to determine the mental state of patients. This continuous, real-time monitoring can provide healthcare professionals with vital physiological data and enhance the quality of patient care. Current methods of stress detection lack: (… ▽ More Wearable medical technology has become increasingly popular in recent years. One function of wearable health devices is stress detection, which relies on sensor inputs to determine the mental state of patients. This continuous, real-time monitoring can provide healthcare professionals with vital physiological data and enhance the quality of patient care. Current methods of stress detection lack: (i) robustness -- wearable health sensors contain high levels of measurement noise that degrades performance, and (ii) adaptation -- static architectures fail to adapt to changing contexts in sensing conditions. We propose to address these deficiencies with SELF-CARE, a generalized selective sensor fusion method of stress detection that employs novel techniques of context identification and ensemble machine learning. SELF-CARE uses a learning-based classifier to process sensor features and model the environmental variations in sensing conditions known as the noise context. SELF-CARE uses noise context to selectively fuse different sensor combinations across an ensemble of models to perform robust stress classification. Our findings suggest that for wrist-worn devices, sensors that measure motion are most suitable to understand noise context, while for chest-worn devices, the most suitable sensors are those that detect muscle contraction. SELF-CARE demonstrates state-of-the-art performance on the WESAD dataset. Using wrist-based sensors, SELF-CARE achieves 86.34% and 94.12% accuracy for the 3-class and 2-class stress classification problems, respectively. For chest-based wearable sensors, SELF-CARE achieves 86.19% (3-class) and 93.68% (2-class) classification accuracy. This work demonstrates the benefits of utilizing selective, context-aware sensor fusion in mobile health sensing that can be applied broadly to Internet of Things applications. △ Less

Submitted 14 March, 2023; originally announced March 2023.

Comments: 14 pages, 7 figures

arXiv:2303.04292 [pdf, other]

ERUDITE: Human-in-the-Loop IoT for an Adaptive Personalized Learning System

Authors: Mojtaba Taherisadr, Mohammad Abdullah Al Faruque, Salma Elmalaki

Abstract: Thanks to the rapid growth in wearable technologies and recent advancement in machine learning and signal processing, monitoring complex human contexts becomes feasible, paving the way to develop human-in-the-loop IoT systems that naturally evolve to adapt to the human and environment state autonomously. Nevertheless, a central challenge in designing many of these IoT systems arises from the requi… ▽ More Thanks to the rapid growth in wearable technologies and recent advancement in machine learning and signal processing, monitoring complex human contexts becomes feasible, paving the way to develop human-in-the-loop IoT systems that naturally evolve to adapt to the human and environment state autonomously. Nevertheless, a central challenge in designing many of these IoT systems arises from the requirement to infer the human mental state, such as intention, stress, cognition load, or learning ability. While different human contexts can be inferred from the fusion of different sensor modalities that can correlate to a particular mental state, the human brain provides a richer sensor modality that gives us more insights into the required human context. This paper proposes ERUDITE, a human-in-the-loop IoT system for the learning environment that exploits recent wearable neurotechnology to decode brain signals. Through insights from concept learning theory, ERUDITE can infer the human state of learning and understand when human learning increases or declines. By quantifying human learning as an input sensory signal, ERUDITE can provide adequate personalized feedback to humans in a learning environment to enhance their learning experience. ERUDITE is evaluated across $15$ participants and showed that by using the brain signals as a sensor modality to infer the human learning state and providing personalized adaptation to the learning environment, the participants' learning performance increased on average by $26\%$. Furthermore, we showed that ERUDITE can be deployed on an edge-based prototype to evaluate its practicality and scalability. △ Less

Submitted 20 November, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

Comments: It is under review in the IEEE IoT journal

arXiv:2302.12926 [pdf, other]

Map-and-Conquer: Energy-Efficient Map** of Dynamic Neural Nets onto Heterogeneous MPSoCs

Authors: Halima Bouzidi, Mohanad Odema, Hamza Ouarnoughi, Smail Niar, Mohammad Abdullah Al Faruque

Abstract: Heterogeneous MPSoCs comprise diverse processing units of varying compute capabilities. To date, the map** strategies of neural networks (NNs) onto such systems are yet to exploit the full potential of processing parallelism, made possible through both the intrinsic NNs' structure and underlying hardware composition. In this paper, we propose a novel framework to effectively map NNs onto heterog… ▽ More Heterogeneous MPSoCs comprise diverse processing units of varying compute capabilities. To date, the map** strategies of neural networks (NNs) onto such systems are yet to exploit the full potential of processing parallelism, made possible through both the intrinsic NNs' structure and underlying hardware composition. In this paper, we propose a novel framework to effectively map NNs onto heterogeneous MPSoCs in a manner that enables them to leverage the underlying processing concurrency. Specifically, our approach identifies an optimal partitioning scheme of the NN along its `width' dimension, which facilitates deployment of concurrent NN blocks onto different hardware computing units. Additionally, our approach contributes a novel scheme to deploy partitioned NNs onto the MPSoC as dynamic multi-exit networks for additional performance gains. Our experiments on a standard MPSoC platform have yielded dynamic map** configurations that are 2.1x more energy-efficient than the GPU-only map** while incurring 1.7x less latency than DLA-only map**. △ Less

Submitted 24 February, 2023; originally announced February 2023.

Comments: Accepted to the 60th ACM/IEEE Design Automation Conference (DAC 2023)

arXiv:2302.12493 [pdf, other]

SEO: Safety-Aware Energy Optimization Framework for Multi-Sensor Neural Controllers at the Edge

Authors: Mohanad Odema, James Ferlez, Yasser Shoukry, Mohammad Abdullah Al Faruque

Abstract: Runtime energy management has become quintessential for multi-sensor autonomous systems at the edge for achieving high performance given the platform constraints. Typical for such systems, however, is to have their controllers designed with formal guarantees on safety that precede in priority such optimizations, which in turn limits their application in real settings. In this paper, we propose a n… ▽ More Runtime energy management has become quintessential for multi-sensor autonomous systems at the edge for achieving high performance given the platform constraints. Typical for such systems, however, is to have their controllers designed with formal guarantees on safety that precede in priority such optimizations, which in turn limits their application in real settings. In this paper, we propose a novel energy optimization framework that is aware of the autonomous system's safety state, and leverages it to regulate the application of energy optimization methods so that the system's formal safety properties are preserved. In particular, through the formal characterization of a system's safety state as a dynamic processing deadline, the computing workloads of the underlying models can be adapted accordingly. For our experiments, we model two popular runtime energy optimization methods, offloading and gating, and simulate an autonomous driving system (ADS) use-case in the CARLA simulation environment with performance characterizations obtained from the standard Nvidia Drive PX2 ADS platform. Our results demonstrate that through a formal awareness of the perceived risks in the test case scenario, energy efficiency gains are still achieved (reaching 89.9%) while maintaining the desired safety properties. △ Less

Submitted 24 February, 2023; originally announced February 2023.

Comments: Accepted to the 60th ACM/IEEE Design Automation Conference (DAC 2023)

arXiv:2302.06572 [pdf, other]

EnergyShield: Provably-Safe Offloading of Neural Network Controllers for Energy Efficiency

Authors: Mohanad Odema, James Ferlez, Goli Vaisi, Yasser Shoukry, Mohammad Abdullah Al Faruque

Abstract: To mitigate the high energy demand of Neural Network (NN) based Autonomous Driving Systems (ADSs), we consider the problem of offloading NN controllers from the ADS to nearby edge-computing infrastructure, but in such a way that formal vehicle safety properties are guaranteed. In particular, we propose the EnergyShield framework, which repurposes a controller ''shield'' as a low-power runtime safe… ▽ More To mitigate the high energy demand of Neural Network (NN) based Autonomous Driving Systems (ADSs), we consider the problem of offloading NN controllers from the ADS to nearby edge-computing infrastructure, but in such a way that formal vehicle safety properties are guaranteed. In particular, we propose the EnergyShield framework, which repurposes a controller ''shield'' as a low-power runtime safety monitor for the ADS vehicle. Specifically, the shield in EnergyShield provides not only safety interventions but also a formal, state-based quantification of the tolerable edge response time before vehicle safety is compromised. Using EnergyShield, an ADS can then save energy by wirelessly offloading NN computations to edge computers, while still maintaining a formal guarantee of safety until it receives a response (on-vehicle hardware provides a just-in-time fail safe). To validate the benefits of EnergyShield, we implemented and tested it in the Carla simulation environment. Our results show that EnergyShield maintains safe vehicle operation while providing significant energy savings compared to on-vehicle NN evaluation: from 24% to 54% less energy across a range of wireless conditions and edge delays. △ Less

Submitted 13 February, 2023; originally announced February 2023.

Comments: Accepted to be published in the proceedings of the 14th ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS 2023)

arXiv:2301.02723 [pdf, other]

CFG2VEC: Hierarchical Graph Neural Network for Cross-Architectural Software Reverse Engineering

Authors: Shih-Yuan Yu, Yonatan Gizachew Achamyeleh, Chonghan Wang, Anton Kocheturov, Patrick Eisen, Mohammad Abdullah Al Faruque

Abstract: Mission-critical embedded software is critical to our society's infrastructure but can be subject to new security vulnerabilities as technology advances. When security issues arise, Reverse Engineers (REs) use Software Reverse Engineering (SRE) tools to analyze vulnerable binaries. However, existing tools have limited support, and REs undergo a time-consuming, costly, and error-prone process that… ▽ More Mission-critical embedded software is critical to our society's infrastructure but can be subject to new security vulnerabilities as technology advances. When security issues arise, Reverse Engineers (REs) use Software Reverse Engineering (SRE) tools to analyze vulnerable binaries. However, existing tools have limited support, and REs undergo a time-consuming, costly, and error-prone process that requires experience and expertise to understand the behaviors of software and vulnerabilities. To improve these tools, we propose $\textit{cfg2vec}$, a Hierarchical Graph Neural Network (GNN) based approach. To represent binary, we propose a novel Graph-of-Graph (GoG) representation, combining the information of control-flow and function-call graphs. Our $\textit{cfg2vec}$ learns how to represent each binary function compiled from various CPU architectures, utilizing hierarchical GNN and the siamese network-based supervised learning architecture. We evaluate $\textit{cfg2vec}$'s capability of predicting function names from stripped binaries. Our results show that $\textit{cfg2vec}$ outperforms the state-of-the-art by $24.54\%$ in predicting function names and can even achieve $51.84\%$ better given more training data. Additionally, $\textit{cfg2vec}$ consistently outperforms the state-of-the-art for all CPU architectures, while the baseline requires multiple training to achieve similar performance. More importantly, our results demonstrate that our $\textit{cfg2vec}$ could tackle binaries built from unseen CPU architectures, thus indicating that our approach can generalize the learned knowledge. Lastly, we demonstrate its practicability by implementing it as a Ghidra plugin used during resolving DARPA Assured MicroPatching (AMP) challenges. △ Less

Submitted 6 January, 2023; originally announced January 2023.

arXiv:2212.03354 [pdf, other]

HADAS: Hardware-Aware Dynamic Neural Architecture Search for Edge Performance Scaling

Authors: Halima Bouzidi, Mohanad Odema, Hamza Ouarnoughi, Mohammad Abdullah Al Faruque, Smail Niar

Abstract: Dynamic neural networks (DyNNs) have become viable techniques to enable intelligence on resource-constrained edge devices while maintaining computational efficiency. In many cases, the implementation of DyNNs can be sub-optimal due to its underlying backbone architecture being developed at the design stage independent of both: (i) the dynamic computing features, e.g. early exiting, and (ii) the re… ▽ More Dynamic neural networks (DyNNs) have become viable techniques to enable intelligence on resource-constrained edge devices while maintaining computational efficiency. In many cases, the implementation of DyNNs can be sub-optimal due to its underlying backbone architecture being developed at the design stage independent of both: (i) the dynamic computing features, e.g. early exiting, and (ii) the resource efficiency features of the underlying hardware, e.g., dynamic voltage and frequency scaling (DVFS). Addressing this, we present HADAS, a novel Hardware-Aware Dynamic Neural Architecture Search framework that realizes DyNN architectures whose backbone, early exiting features, and DVFS settings have been jointly optimized to maximize performance and resource efficiency. Our experiments using the CIFAR-100 dataset and a diverse set of edge computing platforms have seen HADAS dynamic models achieve up to 57% energy efficiency gains compared to the conventional dynamic ones while maintaining the desired level of accuracy scores. Our code is available at https://github.com/HalimaBouzidi/HADAS △ Less

Submitted 6 December, 2022; originally announced December 2022.

Comments: To be published in the 26th IEEE/ACM Design, Automation & Test in Europe Conference & Exhibition (DATE), April 2023, Antwerp, Belgium

arXiv:2210.03719 [pdf, other]

BayesImposter: Bayesian Estimation Based .bss Imposter Attack on Industrial Control Systems

Authors: Anomadarshi Barua, Lelin Pan, Mohammad Abdullah Al Faruque

Abstract: Over the last six years, several papers used memory deduplication to trigger various security issues, such as leaking heap-address and causing bit-flip in the physical memory. The most essential requirement for successful memory deduplication is to provide identical copies of a physical page. Recent works use a brute-force approach to create identical copies of a physical page that is an inaccurat… ▽ More Over the last six years, several papers used memory deduplication to trigger various security issues, such as leaking heap-address and causing bit-flip in the physical memory. The most essential requirement for successful memory deduplication is to provide identical copies of a physical page. Recent works use a brute-force approach to create identical copies of a physical page that is an inaccurate and time-consuming primitive from the attacker's perspective. Our work begins to fill this gap by providing a domain-specific structured way to duplicate a physical page in cloud settings in the context of industrial control systems (ICSs). Here, we show a new attack primitive - \textit{BayesImposter}, which points out that the attacker can duplicate the .bss section of the target control DLL file of cloud protocols using the \textit{Bayesian estimation} technique. Our approach results in less memory (i.e., 4 KB compared to GB) and time (i.e., 13 minutes compared to hours) compared to the brute-force approach used in recent works. We point out that ICSs can be expressed as state-space models; hence, the \textit{Bayesian estimation} is an ideal choice to be combined with memory deduplication for a successful attack in cloud settings. To demonstrate the strength of \textit{BayesImposter}, we create a real-world automation platform using a scaled-down automated high-bay warehouse and industrial-grade SIMATIC S7-1500 PLC from Siemens as a target ICS. We demonstrate that \textit{BayesImposter} can predictively inject false commands into the PLC that can cause possible equipment damage with machine failure in the target ICS. Moreover, we show that \textit{BayesImposter} is capable of adversarial control over the target ICS resulting in severe consequences, such as killing a person but making it looks like an accident. Therefore, we also provide countermeasures to prevent the attack. △ Less

Submitted 7 October, 2022; originally announced October 2022.

arXiv:2210.03688 [pdf, other]

A Wolf in Sheep's Clothing: Spreading Deadly Pathogens Under the Disguise of Popular Music

Authors: Anomadarshi Barua, Yonatan Gizachew Achamyeleh, Mohammad Abdullah Al Faruque

Abstract: A Negative Pressure Room (NPR) is an essential requirement by the Bio-Safety Levels (BSLs) in biolabs or infectious-control hospitals to prevent deadly pathogens from being leaked from the facility. An NPR maintains a negative pressure inside with respect to the outside reference space so that microbes are contained inside of an NPR. Nowadays, differential pressure sensors (DPSs) are utilized by t… ▽ More A Negative Pressure Room (NPR) is an essential requirement by the Bio-Safety Levels (BSLs) in biolabs or infectious-control hospitals to prevent deadly pathogens from being leaked from the facility. An NPR maintains a negative pressure inside with respect to the outside reference space so that microbes are contained inside of an NPR. Nowadays, differential pressure sensors (DPSs) are utilized by the Building Management Systems (BMSs) to control and monitor the negative pressure in an NPR. This paper demonstrates a non-invasive and stealthy attack on NPRs by spoofing a DPS at its resonant frequency. Our contributions are: (1) We show that DPSs used in NPRs typically have resonant frequencies in the audible range. (2) We use this finding to design malicious music to create resonance in DPSs, resulting in an overshooting in the DPS's normal pressure readings. (3) We show how the resonance in DPSs can fool the BMSs so that the NPR turns its negative pressure to a positive one, causing a potential \textit{leak} of deadly microbes from NPRs. We do experiments on 8 DPSs from 5 different manufacturers to evaluate their resonant frequencies considering the sampling tube length and find resonance in 6 DPSs. We can achieve a 2.5 Pa change in negative pressure from a $\sim$7 cm distance when a sampling tube is not present and from a $\sim$2.5 cm distance for a 1 m sampling tube length. We also introduce an interval-time variation approach for an adversarial control over the negative pressure and show that the \textit{forged} pressure can be varied within 12 - 33 Pa. Our attack is also capable of attacking multiple NPRs simultaneously. Moreover, we demonstrate our attack at a real-world NPR located in an anonymous bioresearch facility, which is FDA approved and follows CDC guidelines. We also provide countermeasures to prevent the attack. △ Less

Submitted 7 October, 2022; originally announced October 2022.

arXiv:2208.09741 [pdf, other]

Sensor Security: Current Progress, Research Challenges, and Future Roadmap

Authors: Anomadarshi Barua, Mohammad Abdullah Al Faruque

Abstract: Sensors are one of the most pervasive and integral components of today's safety-critical systems. Sensors serve as a bridge between physical quantities and connected systems. The connected systems with sensors blindly believe the sensor as there is no way to authenticate the signal coming from a sensor. This could be an entry point for an attacker. An attacker can inject a fake input signal along… ▽ More Sensors are one of the most pervasive and integral components of today's safety-critical systems. Sensors serve as a bridge between physical quantities and connected systems. The connected systems with sensors blindly believe the sensor as there is no way to authenticate the signal coming from a sensor. This could be an entry point for an attacker. An attacker can inject a fake input signal along with the legitimate signal by using a suitable spoofing technique. As the sensor's transducer is not smart enough to differentiate between a fake and legitimate signal, the injected fake signal eventually can collapse the connected system. This type of attack is known as the transduction attack. Over the last decade, several works have been published to provide a defense against the transduction attack. However, the defenses are proposed on an ad-hoc basis; hence, they are not well-structured. Our work begins to fill this gap by providing a checklist that a defense technique should always follow to be considered as an ideal defense against the transduction attack. We name this checklist as the Golden reference of sensor defense. We provide insights on how this Golden reference can be achieved and argue that sensors should be redesigned from the transducer level to the sensor electronics level. We point out that only hardware or software modification is not enough; instead, a hardware/software (HW/SW) co-design approach is required to ride on this future roadmap to the robust and resilient sensor. △ Less

Submitted 20 August, 2022; originally announced August 2022.

arXiv:2207.08865 [pdf, other]

doi 10.1145/3508352.3549356

Romanus: Robust Task Offloading in Modular Multi-Sensor Autonomous Driving Systems

Authors: Luke Chen, Mohanad Odema, Mohammad Abdullah Al Faruque

Abstract: Due to the high performance and safety requirements of self-driving applications, the complexity of modern autonomous driving systems (ADS) has been growing, instigating the need for more sophisticated hardware which could add to the energy footprint of the ADS platform. Addressing this, edge computing is poised to encompass self-driving applications, enabling the compute-intensive autonomy-relate… ▽ More Due to the high performance and safety requirements of self-driving applications, the complexity of modern autonomous driving systems (ADS) has been growing, instigating the need for more sophisticated hardware which could add to the energy footprint of the ADS platform. Addressing this, edge computing is poised to encompass self-driving applications, enabling the compute-intensive autonomy-related tasks to be offloaded for processing at compute-capable edge servers. Nonetheless, the intricate hardware architecture of ADS platforms, in addition to the stringent robustness demands, set forth complications for task offloading which are unique to autonomous driving. Hence, we present $ROMANUS$, a methodology for robust and efficient task offloading for modular ADS platforms with multi-sensor processing pipelines. Our methodology entails two phases: (i) the introduction of efficient offloading points along the execution path of the involved deep learning models, and (ii) the implementation of a runtime solution based on Deep Reinforcement Learning to adapt the operating mode according to variations in the perceived road scene complexity, network connectivity, and server load. Experiments on the object detection use case demonstrated that our approach is 14.99% more energy-efficient than pure local execution while achieving a 77.06% reduction in risky behavior from a robust-agnostic offloading baseline. △ Less

Submitted 18 July, 2022; originally announced July 2022.

Comments: This paper has been accepted to the 2022 International Conference On Computer-Aided Design (ICCAD 2022)

arXiv:2207.06664 [pdf, other]

Golden Reference-Free Hardware Trojan Localization using Graph Convolutional Network

Authors: Rozhin Yasaei, Sina Faezi, Mohammad Abdullah Al Faruque

Abstract: The globalization of the Integrated Circuit (IC) supply chain has moved most of the design, fabrication, and testing process from a single trusted entity to various untrusted third-party entities worldwide. The risk of using untrusted third-Party Intellectual Property (3PIP) is the possibility for adversaries to insert malicious modifications known as Hardware Trojans (HTs). These HTs can compromi… ▽ More The globalization of the Integrated Circuit (IC) supply chain has moved most of the design, fabrication, and testing process from a single trusted entity to various untrusted third-party entities worldwide. The risk of using untrusted third-Party Intellectual Property (3PIP) is the possibility for adversaries to insert malicious modifications known as Hardware Trojans (HTs). These HTs can compromise the integrity, deteriorate the performance, deny the service, and alter the functionality of the design. While numerous HT detection methods have been proposed in the literature, the crucial task of HT localization is overlooked. Moreover, a few existing HT localization methods have several weaknesses: reliance on a golden reference, inability to generalize for all types of HT, lack of scalability, low localization resolution, and manual feature engineering/property definition. To overcome their shortcomings, we propose a novel, golden reference-free HT localization method at the pre-silicon stage by leveraging Graph Convolutional Network (GCN). In this work, we convert the circuit design to its intrinsic data structure, graph and extract the node attributes. Afterward, the graph convolution performs automatic feature extraction for nodes to classify the nodes as Trojan or benign. Our automated approach does not burden the designer with manual code review. It locates the Trojan signals with 99.6% accuracy, 93.1% F1-score, and a false-positive rate below 0.009%. △ Less

Submitted 14 July, 2022; originally announced July 2022.

Comments: IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 2022

arXiv:2205.11832 [pdf, other]

Neural Contextual Bandits Based Dynamic Sensor Selection for Low-Power Body-Area Networks

Authors: Berken Utku Demirel, Luke Chen, Mohammad Abdullah Al Faruque

Abstract: Providing health monitoring devices with machine intelligence is important for enabling automatic mobile healthcare applications. However, this brings additional challenges due to the resource scarcity of these devices. This work introduces a neural contextual bandits based dynamic sensor selection methodology for high-performance and resource-efficient body-area networks to realize next generatio… ▽ More Providing health monitoring devices with machine intelligence is important for enabling automatic mobile healthcare applications. However, this brings additional challenges due to the resource scarcity of these devices. This work introduces a neural contextual bandits based dynamic sensor selection methodology for high-performance and resource-efficient body-area networks to realize next generation mobile health monitoring devices. The methodology utilizes contextual bandits to select the most informative sensor combinations during runtime and ignore redundant data for decreasing transmission and computing power in a body area network (BAN). The proposed method has been validated using one of the most common health monitoring applications: cardiac activity monitoring. Solutions from our proposed method are compared against those from related works in terms of classification performance and energy while considering the communication energy consumption. Our final solutions could reach $78.8\%$ AU-PRC on the PTB-XL ECG dataset for cardiac abnormality detection while decreasing the overall energy consumption and computational energy by $3.7 \times$ and $4.3 \times$, respectively. △ Less

Submitted 24 May, 2022; originally announced May 2022.

Comments: To be appeared at The International Symposium on Low Power Electronics and Design (ISLPED) 2022

arXiv:2205.03974 [pdf, other]

SELF-CARE: Selective Fusion with Context-Aware Low-Power Edge Computing for Stress Detection

Authors: Nafiul Rashid, Trier Mortlock, Mohammad Abdullah Al Faruque

Abstract: Detecting human stress levels and emotional states with physiological body-worn sensors is a complex task, but one with many health-related benefits. Robustness to sensor measurement noise and energy efficiency of low-power devices remain key challenges in stress detection. We propose SELFCARE, a fully wrist-based method for stress detection that employs context-aware selective sensor fusion that… ▽ More Detecting human stress levels and emotional states with physiological body-worn sensors is a complex task, but one with many health-related benefits. Robustness to sensor measurement noise and energy efficiency of low-power devices remain key challenges in stress detection. We propose SELFCARE, a fully wrist-based method for stress detection that employs context-aware selective sensor fusion that dynamically adapts based on data from the sensors. Our method uses motion to determine the context of the system and learns to adjust the fused sensors accordingly, improving performance while maintaining energy efficiency. SELF-CARE obtains state-of-the-art performance across the publicly available WESAD dataset, achieving 86.34% and 94.12% accuracy for the 3-class and 2-class classification problems, respectively. Evaluation on real hardware shows that our approach achieves up to 2.2x (3-class) and 2.7x (2-class) energy efficiency compared to traditional sensor fusion. △ Less

Submitted 8 May, 2022; originally announced May 2022.

Comments: 4 pages, 3 figures, 1 table, accepted to be published in 18th Annual International Conference on Distributed Computing in Sensor Systems (DCOSS 2022)

arXiv:2204.11431 [pdf, other]

Hardware Trojan Detection using Graph Neural Networks

Authors: Rozhin Yasaei, Luke Chen, Shih-Yuan Yu, Mohammad Abdullah Al Faruque

Abstract: The globalization of the Integrated Circuit (IC) supply chain has moved most of the design, fabrication, and testing process from a single trusted entity to various untrusted third-party entities around the world. The risk of using untrusted third-Party Intellectual Property (3PIP) is the possibility for adversaries to insert malicious modifications known as Hardware Trojans (HTs). These HTs can c… ▽ More The globalization of the Integrated Circuit (IC) supply chain has moved most of the design, fabrication, and testing process from a single trusted entity to various untrusted third-party entities around the world. The risk of using untrusted third-Party Intellectual Property (3PIP) is the possibility for adversaries to insert malicious modifications known as Hardware Trojans (HTs). These HTs can compromise the integrity, deteriorate the performance, and deny the functionality of the intended design. Various HT detection methods have been proposed in the literature; however, many fall short due to their reliance on a golden reference circuit, a limited detection scope, the need for manual code review, or the inability to scale with large modern designs. We propose a novel golden reference-free HT detection method for both Register Transfer Level (RTL) and gate-level netlists by leveraging Graph Neural Networks (GNNs) to learn the behavior of the circuit through a Data Flow Graph (DFG) representation of the hardware design. We evaluate our model on a custom dataset by expanding the Trusthub HT benchmarks \cite{trusthub1}. The results demonstrate that our approach detects unknown HTs with 97% recall (true positive rate) very fast in 21.1ms for RTL and 84% recall in 13.42s for Gate-Level Netlist. △ Less

Submitted 25 April, 2022; originally announced April 2022.

arXiv:2202.11330 [pdf, other]

EcoFusion: Energy-Aware Adaptive Sensor Fusion for Efficient Autonomous Vehicle Perception

Authors: Arnav Vaibhav Malawade, Trier Mortlock, Mohammad Abdullah Al Faruque

Abstract: Autonomous vehicles use multiple sensors, large deep-learning models, and powerful hardware platforms to perceive the environment and navigate safely. In many contexts, some sensing modalities negatively impact perception while increasing energy consumption. We propose EcoFusion: an energy-aware sensor fusion approach that uses context to adapt the fusion method and reduce energy consumption witho… ▽ More Autonomous vehicles use multiple sensors, large deep-learning models, and powerful hardware platforms to perceive the environment and navigate safely. In many contexts, some sensing modalities negatively impact perception while increasing energy consumption. We propose EcoFusion: an energy-aware sensor fusion approach that uses context to adapt the fusion method and reduce energy consumption without affecting perception performance. EcoFusion performs up to 9.5% better at object detection than existing fusion methods with approximately 60% less energy and 58% lower latency on the industry-standard Nvidia Drive PX2 hardware platform. We also propose several context-identification strategies, implement a joint optimization between energy and performance, and present scenario-specific results. △ Less

Submitted 23 February, 2022; originally announced February 2022.

Comments: Accepted to be published in the 59th ACM/IEEE Design Automation Conference (DAC 2022)

arXiv:2201.06644 [pdf, other]

HydraFusion: Context-Aware Selective Sensor Fusion for Robust and Efficient Autonomous Vehicle Perception

Authors: Arnav Vaibhav Malawade, Trier Mortlock, Mohammad Abdullah Al Faruque

Abstract: Although autonomous vehicles (AVs) are expected to revolutionize transportation, robust perception across a wide range of driving contexts remains a significant challenge. Techniques to fuse sensor data from camera, radar, and lidar sensors have been proposed to improve AV perception. However, existing methods are insufficiently robust in difficult driving contexts (e.g., bad weather, low light, s… ▽ More Although autonomous vehicles (AVs) are expected to revolutionize transportation, robust perception across a wide range of driving contexts remains a significant challenge. Techniques to fuse sensor data from camera, radar, and lidar sensors have been proposed to improve AV perception. However, existing methods are insufficiently robust in difficult driving contexts (e.g., bad weather, low light, sensor obstruction) due to rigidity in their fusion implementations. These methods fall into two broad categories: (i) early fusion, which fails when sensor data is noisy or obscured, and (ii) late fusion, which cannot leverage features from multiple sensors and thus produces worse estimates. To address these limitations, we propose HydraFusion: a selective sensor fusion framework that learns to identify the current driving context and fuses the best combination of sensors to maximize robustness without compromising efficiency. HydraFusion is the first approach to propose dynamically adjusting between early fusion, late fusion, and combinations in-between, thus varying both how and when fusion is applied. We show that, on average, HydraFusion outperforms early and late fusion approaches by 13.66% and 14.54%, respectively, without increasing computational complexity or energy consumption on the industry-standard Nvidia Drive PX2 AV hardware platform. We also propose and evaluate both static and deep-learning-based context identification strategies. Our open-source code and model implementation are available at https://github.com/AICPS/hydrafusion. △ Less

Submitted 17 January, 2022; originally announced January 2022.

Comments: Accepted to be published in the 13th ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS 2022)

arXiv:2112.07901 [pdf, other]

Energy-Efficient Real-Time Heart Monitoring on Edge-Fog-Cloud Internet-of-Medical-Things

Authors: Berken Utku Demirel, Islam Abdelsalam Bayoumy, Mohammad Abdullah Al Faruque

Abstract: The recent developments in wearable devices and the Internet of Medical Things (IoMT) allow real-time monitoring and recording of electrocardiogram (ECG) signals. However, continuous monitoring of ECG signals is challenging in low-power wearable devices due to energy and memory constraints. Therefore, in this paper, we present a novel and energy-efficient methodology for continuously monitoring th… ▽ More The recent developments in wearable devices and the Internet of Medical Things (IoMT) allow real-time monitoring and recording of electrocardiogram (ECG) signals. However, continuous monitoring of ECG signals is challenging in low-power wearable devices due to energy and memory constraints. Therefore, in this paper, we present a novel and energy-efficient methodology for continuously monitoring the heart for low-power wearable devices. The proposed methodology is composed of three different layers: 1) a Noise/Artifact detection layer to grade the quality of the ECG signals; 2) a Normal/Abnormal beat classification layer to detect the anomalies in the ECG signals, and 3) an Abnormal beat classification layer to detect diseases from ECG signals. Moreover, a distributed multi-output Convolutional Neural Network (CNN) architecture is used to decrease the energy consumption and latency between the edge-fog/cloud. Our methodology reaches an accuracy of 99.2% on the well-known MIT-BIH Arrhythmia dataset. Evaluation on real hardware shows that our methodology is suitable for devices having a minimum RAM of 32KB. Moreover, the proposed methodology achieves $7\times$ more energy efficiency compared to state-of-the-art works. △ Less

Submitted 15 December, 2021; originally announced December 2021.

Comments: To be appeared at IEEE Internet of Things (IoT) Journal

arXiv:2111.06123 [pdf, other]

Spatio-Temporal Scene-Graph Embedding for Autonomous Vehicle Collision Prediction

Authors: Arnav V. Malawade, Shih-Yuan Yu, Brandon Hsu, Deepan Muthirayan, Pramod P. Khargonekar, Mohammad A. Al Faruque

Abstract: In autonomous vehicles (AVs), early warning systems rely on collision prediction to ensure occupant safety. However, state-of-the-art methods using deep convolutional networks either fail at modeling collisions or are too expensive/slow, making them less suitable for deployment on AV edge hardware. To address these limitations, we propose sg2vec, a spatio-temporal scene-graph embedding methodology… ▽ More In autonomous vehicles (AVs), early warning systems rely on collision prediction to ensure occupant safety. However, state-of-the-art methods using deep convolutional networks either fail at modeling collisions or are too expensive/slow, making them less suitable for deployment on AV edge hardware. To address these limitations, we propose sg2vec, a spatio-temporal scene-graph embedding methodology that uses Graph Neural Network (GNN) and Long Short-Term Memory (LSTM) layers to predict future collisions via visual scene perception. We demonstrate that sg2vec predicts collisions 8.11% more accurately and 39.07% earlier than the state-of-the-art method on synthesized datasets, and 29.47% more accurately on a challenging real-world collision dataset. We also show that sg2vec is better than the state-of-the-art at transferring knowledge from synthetic datasets to real-world driving datasets. Finally, we demonstrate that sg2vec performs inference 9.3x faster with an 88.0% smaller model, 32.4% less power, and 92.8% less energy than the state-of-the-art method on the industry-standard Nvidia DRIVE PX 2 platform, making it more suitable for implementation on the edge. △ Less

Submitted 11 November, 2021; originally announced November 2021.

arXiv:2110.11475 [pdf, other]

Future of Smart Classroom in the Era of Wearable Neurotechnology

Authors: Mojtaba Taherisadr, Berken Utku Demirel, Mohammad Abdullah Al Faruque, Salma Elmalaki

Abstract: Interdisciplinary research among engineering, computer science, and neuroscience to understand and utilize the human brain signals resulted in advances and widespread applicability of wearable neurotechnology in adaptive human-in-the-loop smart systems. Considering these advances, we envision that future education will exploit the advances in wearable neurotechnology and move toward more personali… ▽ More Interdisciplinary research among engineering, computer science, and neuroscience to understand and utilize the human brain signals resulted in advances and widespread applicability of wearable neurotechnology in adaptive human-in-the-loop smart systems. Considering these advances, we envision that future education will exploit the advances in wearable neurotechnology and move toward more personalized smart classrooms where instructions and interactions are tailored towards. students' individual strengths and needs. In this paper, we discuss the future of smart classrooms and how advances in neuroscience, machine learning, and embedded systems as key enablers will provide the infrastructure for envisioned smart classrooms and personalized education along with open challenges that are required to be addressed. △ Less

Submitted 21 October, 2021; originally announced October 2021.

Comments: 4 pages

arXiv:2110.02259 [pdf, other]

Multi-Modal Attack Detection for Cyber-Physical Additive Manufacturing

Authors: Shih-Yuan Yu, Arnav Vaibhav Malawade, Mohammad Abdullah Al Faruque

Abstract: Cyber-Physical Additive Manufacturing (AM) constructs a physical 3D object layer-by-layer according to its digital representation and has been vastly applied to fast prototy** and the manufacturing of functional end-products across fields. The computerization of traditional production processes propels these technological advancements; however, this also introduces new vulnerabilities, necessita… ▽ More Cyber-Physical Additive Manufacturing (AM) constructs a physical 3D object layer-by-layer according to its digital representation and has been vastly applied to fast prototy** and the manufacturing of functional end-products across fields. The computerization of traditional production processes propels these technological advancements; however, this also introduces new vulnerabilities, necessitating the study of cyberattacks on these systems. The AM Sabotage Attack is one kind of kinetic cyberattack that originates from the cyber domain and can eventually lead to physical damage, injury, or even death. By introducing inconspicuous yet damaging alterations in any specific process of the AM digital process chain, the attackers can compromise the structural integrity of a manufactured component in a manner that is invisible to a human observer. If the manufactured objects are critical for their system, those attacks can even compromise the whole system's structural integrity and pose a severe safety risk to its users. For example, an inconspicuous void (less than 1 mm in dimension) placed in the 3D design of a tensile test specimen can reduce its yield load by 14%. However, security studies primarily focus on securing digital assets, overlooking the fact that AM systems are CPSs. △ Less

Submitted 5 October, 2021; originally announced October 2021.

Journal ref: TC-CPS Newsletter Volume 05, Issue 01 (Mar. 2020)

arXiv:2109.08632 [pdf, other]

Graph Learning for Cognitive Digital Twins in Manufacturing Systems

Authors: Trier Mortlock, Deepan Muthirayan, Shih-Yuan Yu, Pramod P. Khargonekar, Mohammad A. Al Faruque

Abstract: Future manufacturing requires complex systems that connect simulation platforms and virtualization with physical data from industrial processes. Digital twins incorporate a physical twin, a digital twin, and the connection between the two. Benefits of using digital twins, especially in manufacturing, are abundant as they can increase efficiency across an entire manufacturing life-cycle. The digita… ▽ More Future manufacturing requires complex systems that connect simulation platforms and virtualization with physical data from industrial processes. Digital twins incorporate a physical twin, a digital twin, and the connection between the two. Benefits of using digital twins, especially in manufacturing, are abundant as they can increase efficiency across an entire manufacturing life-cycle. The digital twin concept has become increasingly sophisticated and capable over time, enabled by rises in many technologies. In this paper, we detail the cognitive digital twin as the next stage of advancement of a digital twin that will help realize the vision of Industry 4.0. Cognitive digital twins will allow enterprises to creatively, effectively, and efficiently exploit implicit knowledge drawn from the experience of existing manufacturing systems. They also enable more autonomous decisions and control, while improving the performance across the enterprise (at scale). This paper presents graph learning as one potential pathway towards enabling cognitive functionalities in manufacturing digital twins. A novel approach to realize cognitive digital twins in the product design stage of manufacturing that utilizes graph learning is presented. △ Less

Submitted 17 September, 2021; originally announced September 2021.

arXiv:2109.01183 [pdf, other]

roadscene2vec: A Tool for Extracting and Embedding Road Scene-Graphs

Authors: Arnav Vaibhav Malawade, Shih-Yuan Yu, Brandon Hsu, Harsimrat Kaeley, Anurag Karra, Mohammad Abdullah Al Faruque

Abstract: Recently, road scene-graph representations used in conjunction with graph learning techniques have been shown to outperform state-of-the-art deep learning techniques in tasks including action classification, risk assessment, and collision prediction. To enable the exploration of applications of road scene-graph representations, we introduce roadscene2vec: an open-source tool for extracting and emb… ▽ More Recently, road scene-graph representations used in conjunction with graph learning techniques have been shown to outperform state-of-the-art deep learning techniques in tasks including action classification, risk assessment, and collision prediction. To enable the exploration of applications of road scene-graph representations, we introduce roadscene2vec: an open-source tool for extracting and embedding road scene-graphs. The goal of roadscene2vec is to enable research into the applications and capabilities of road scene-graphs by providing tools for generating scene-graphs, graph learning models to generate spatio-temporal scene-graph embeddings, and tools for visualizing and analyzing scene-graph-based methodologies. The capabilities of roadscene2vec include (i) customized scene-graph generation from either video clips or data from the CARLA simulator, (ii) multiple configurable spatio-temporal graph embedding models and baseline CNN-based models, (iii) built-in functionality for using graph and sequence embeddings for risk assessment and collision prediction applications, (iv) tools for evaluating transfer learning, and (v) utilities for visualizing scene-graphs and analyzing the explainability of graph learning models. We demonstrate the utility of roadscene2vec for these use cases with experimental results and qualitative evaluations for both graph learning models and CNN-based models. roadscene2vec is available at https://github.com/AICPS/roadscene2vec. △ Less

Submitted 30 December, 2021; v1 submitted 2 September, 2021; originally announced September 2021.

arXiv:2108.03166 [pdf, other]

Feature Augmented Hybrid CNN for Stress Recognition Using Wrist-based Photoplethysmography Sensor

Authors: Nafiul Rashid, Luke Chen, Manik Dautta, Abel Jimenez, Peter Tseng, Mohammad Abdullah Al Faruque

Abstract: Stress is a physiological state that hampers mental health and has serious consequences to physical health. Moreover, the COVID-19 pandemic has increased stress levels among people across the globe. Therefore, continuous monitoring and detection of stress are necessary. The recent advances in wearable devices have allowed the monitoring of several physiological signals related to stress. Among the… ▽ More Stress is a physiological state that hampers mental health and has serious consequences to physical health. Moreover, the COVID-19 pandemic has increased stress levels among people across the globe. Therefore, continuous monitoring and detection of stress are necessary. The recent advances in wearable devices have allowed the monitoring of several physiological signals related to stress. Among them, wrist-worn wearable devices like smartwatches are most popular due to their convenient usage. And the photoplethysmography (PPG) sensor is the most prevalent sensor in almost all consumer-grade wrist-worn smartwatches. Therefore, this paper focuses on using a wrist-based PPG sensor that collects Blood Volume Pulse (BVP) signals to detect stress which may be applicable for consumer-grade wristwatches. Moreover, state-of-the-art works have used either classical machine learning algorithms to detect stress using hand-crafted features or have used deep learning algorithms like Convolutional Neural Network (CNN) which automatically extracts features. This paper proposes a novel hybrid CNN (H-CNN) classifier that uses both the hand-crafted features and the automatically extracted features by CNN to detect stress using the BVP signal. Evaluation on the benchmark WESAD dataset shows that, for 3-class classification (Baseline vs. Stress vs. Amusement), our proposed H-CNN outperforms traditional classifiers and normal CNN by 5% and 7% accuracy, and 10% and 7% macro F1 score, respectively. Also for 2-class classification (Stress vs. Non-stress), our proposed H-CNN outperforms traditional classifiers and normal CNN by 3% and ~5% accuracy, and ~3% and ~7% macro F1 score, respectively. △ Less

Submitted 2 August, 2021; originally announced August 2021.

Comments: 4 pages, 3 figures, to be published in 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2021

arXiv:2108.00672 [pdf, other]

Energy-efficient Blood Pressure Monitoring based on Single-site Photoplethysmogram on Wearable Devices

Authors: Wenrui Lin, Berken Utku Demirel, Mohammad Abdullah Al Faruque, G. P. Li

Abstract: The paper proposes accurate Blood Pressure Monitoring (BPM) based on a single-site Photoplethysmographic (PPG) sensor and provides an energy-efficient solution on edge cuffless wearable devices. Continuous PPG signal preprocessed and used as input of the Artificial Neural Network (ANN), and outputs systolic BP (SBP), diastolic BP (DBP), and mean arterial BP (MAP) values for each heartbeat. The imp… ▽ More The paper proposes accurate Blood Pressure Monitoring (BPM) based on a single-site Photoplethysmographic (PPG) sensor and provides an energy-efficient solution on edge cuffless wearable devices. Continuous PPG signal preprocessed and used as input of the Artificial Neural Network (ANN), and outputs systolic BP (SBP), diastolic BP (DBP), and mean arterial BP (MAP) values for each heartbeat. The improvement of the BPM accuracy is obtained by removing outliers in the preprocessing step and the whole-based inputs compared to parameter-based inputs extracted from the PPG signal. Performance obtained is $3.42 \pm 5.42$ mmHg (MAE $\pm$ RMSD) for SBP, $1.92 \pm 3.29$ mmHg for DBP, and $2.21 \pm 3.50$ mmHg for MAP which is competitive compared to other studies. This is the first BPM solution with edge computing artificial intelligence as we have learned so far. Evaluation experiments on real hardware show that the solution takes 42.2 ms, 18.2 KB RAM, and 2.1 mJ average energy per reading. △ Less

Submitted 2 August, 2021; originally announced August 2021.

Comments: 4 pages, 3 figures, EMBC conference

arXiv:2108.00216 [pdf, other]

Single-Channel EEG Based Arousal Level Estimation Using Multitaper Spectrum Estimation at Low-Power Wearable Devices

Authors: Berken Utku Demirel, Ivan Skelin, Haoxin Zhang, Jack J. Lin, Mohammad Abdullah Al Faruque

Abstract: This paper proposes a novel lightweight method using the multitaper power spectrum to estimate arousal levels at wearable devices. We show that the spectral slope (1/f) of the electrophysiological power spectrum reflects the scale-free neural activity. To evaluate the proposed feature's performance, we used scalp EEG recorded during anesthesia and sleep with technician-scored Hypnogram annotations… ▽ More This paper proposes a novel lightweight method using the multitaper power spectrum to estimate arousal levels at wearable devices. We show that the spectral slope (1/f) of the electrophysiological power spectrum reflects the scale-free neural activity. To evaluate the proposed feature's performance, we used scalp EEG recorded during anesthesia and sleep with technician-scored Hypnogram annotations. It is shown that the proposed methodology discriminates wakefulness from reduced arousal solely based on the neurophysiological brain state with more than 80% accuracy. Therefore, our findings describe a common electrophysiological marker that tracks reduced arousal states, which can be applied to different applications (e.g., emotion detection, driver drowsiness). Evaluation on hardware shows that the proposed methodology can be implemented for devices with a minimum RAM of 512 KB with 55 mJ average energy consumption. △ Less

Submitted 31 July, 2021; originally announced August 2021.

Comments: To appear at the 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC'21), October 31

arXiv:2108.00078 [pdf]

Technical Report for HW2VEC -- A Graph Learning Tool for Automating Hardware Security

Authors: Yasamin Moghaddas, Tommy Nguyen, Shih-Yuan Yu, Rozhin Yasaei, Mohammad Abdullah Al Faruque

Abstract: In this technical report, we present HW2VEC [11], an open-source graph learning tool for hardware security, and its implementation details (Figure 1). HW2VEC provides toolboxes for graph representation extraction in the form of Data Flow Graphs (DFGs) or Abstract Syntax Trees (ASTs) from hardware designs at RTL and GLN levels. Besides, HW2VEC also offers graph learning tools for representing hardw… ▽ More In this technical report, we present HW2VEC [11], an open-source graph learning tool for hardware security, and its implementation details (Figure 1). HW2VEC provides toolboxes for graph representation extraction in the form of Data Flow Graphs (DFGs) or Abstract Syntax Trees (ASTs) from hardware designs at RTL and GLN levels. Besides, HW2VEC also offers graph learning tools for representing hardware designs in vectors that preserve both structural features and behavioral features. To the best of our knowledge, HW2VEC is the first open-source research tool that supports applying graph learning methods to hardware designs in different abstraction levels for hardware security. We organize the remainder of this technical report as follows: Section 2 introduces the architecture of HW2VEC; Section 3 gives information about the use-case implementations; Section 4 provides the experimental results and demonstrates the performance of HW2VEC for two hardware security applications: HT detection and IP piracy detection; finally, Section 5 will conclude this report. △ Less

Submitted 27 July, 2021; originally announced August 2021.

Comments: arXiv admin note: text overlap with arXiv:2107.12328

arXiv:2107.12328 [pdf, other]

HW2VEC: A Graph Learning Tool for Automating Hardware Security

Authors: Shih-Yuan Yu, Rozhin Yasaei, Qingrong Zhou, Tommy Nguyen, Mohammad Abdullah Al Faruque

Abstract: The time-to-market pressure and continuous growing complexity of hardware designs have promoted the globalization of the Integrated Circuit (IC) supply chain. However, such globalization also poses various security threats in each phase of the IC supply chain. Although the advancements of Machine Learning (ML) have pushed the frontier of hardware security, most conventional ML-based methods can on… ▽ More The time-to-market pressure and continuous growing complexity of hardware designs have promoted the globalization of the Integrated Circuit (IC) supply chain. However, such globalization also poses various security threats in each phase of the IC supply chain. Although the advancements of Machine Learning (ML) have pushed the frontier of hardware security, most conventional ML-based methods can only achieve the desired performance by manually finding a robust feature representation for circuits that are non-Euclidean data. As a result, modeling these circuits using graph learning to improve design flows has attracted research attention in the Electronic Design Automation (EDA) field. However, due to the lack of supporting tools, only a few existing works apply graph learning to resolve hardware security issues. To attract more attention, we propose HW2VEC, an open-source graph learning tool that lowers the threshold for newcomers to research hardware security applications with graphs. HW2VEC provides an automated pipeline for extracting a graph representation from a hardware design in various abstraction levels (register transfer level or gate-level netlist). Besides, HW2VEC users can automatically transform the non-Euclidean hardware designs into Euclidean graph embeddings for solving their problems. In this paper, we demonstrate that HW2VEC can achieve state-of-the-art performance on two hardware security-related tasks: Hardware Trojan Detection and Intellectual Property Piracy Detection. We provide the time profiling results for the graph extraction and the learning pipelines in HW2VEC. △ Less

Submitted 26 July, 2021; originally announced July 2021.

arXiv:2107.10895 [pdf, other]

doi 10.1145/3477006

SAGE: A Split-Architecture Methodology for Efficient End-to-End Autonomous Vehicle Control

Authors: Arnav Malawade, Mohanad Odema, Sebastien Lajeunesse-DeGroot, Mohammad Abdullah Al Faruque

Abstract: Autonomous vehicles (AV) are expected to revolutionize transportation and improve road safety significantly. However, these benefits do not come without cost; AVs require large Deep-Learning (DL) models and powerful hardware platforms to operate reliably in real-time, requiring between several hundred watts to one kilowatt of power. This power consumption can dramatically reduce vehicles' driving… ▽ More Autonomous vehicles (AV) are expected to revolutionize transportation and improve road safety significantly. However, these benefits do not come without cost; AVs require large Deep-Learning (DL) models and powerful hardware platforms to operate reliably in real-time, requiring between several hundred watts to one kilowatt of power. This power consumption can dramatically reduce vehicles' driving range and affect emissions. To address this problem, we propose SAGE: a methodology for selectively offloading the key energy-consuming modules of DL architectures to the cloud to optimize edge energy usage while meeting real-time latency constraints. Furthermore, we leverage Head Network Distillation (HND) to introduce efficient bottlenecks within the DL architecture in order to minimize the network overhead costs of offloading with almost no degradation in the model's performance. We evaluate SAGE using an Nvidia Jetson TX2 and an industry-standard Nvidia Drive PX2 as the AV edge devices and demonstrate that our offloading strategy is practical for a wide range of DL models and internet connection bandwidths on 3G, 4G LTE, and WiFi technologies. Compared to edge-only computation, SAGE reduces energy consumption by an average of 36.13%, 47.07%, and 55.66% for an AV with one low-resolution camera, one high-resolution camera, and three high-resolution cameras, respectively. SAGE also reduces upload data size by up to 98.40% compared to direct camera offloading. △ Less

Submitted 22 July, 2021; originally announced July 2021.

Comments: This article appears as part of the ESWEEK-TECS special issue and was presented in the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2021

arXiv:2107.10823 [pdf]

Establishing Digital Recognition and Identification of Microscopic Objects for Implementation of Artificial Intelligence (AI) Guided Microassembly

Authors: Tuo Zhou, Shih-Yuan Yu, Matthew Michaels, Fangzhou Du, Lawrence Kulinsky, Mohammad Abdullah Al Faruque

Abstract: s miniaturization of electrical and mechanical components used in modern technology progresses, there is an increasing need for high-throughput and low-cost micro-scale assembly techniques. Many current micro-assembly methods are serial in nature, resulting in unfeasibly low throughput. Additionally, the need for increasingly smaller tools to pick and place individual microparts makes these method… ▽ More s miniaturization of electrical and mechanical components used in modern technology progresses, there is an increasing need for high-throughput and low-cost micro-scale assembly techniques. Many current micro-assembly methods are serial in nature, resulting in unfeasibly low throughput. Additionally, the need for increasingly smaller tools to pick and place individual microparts makes these methods cost prohibitive. Alternatively, parallel self-assembly or directed-assembly techniques can be employed by utilizing forces dominant at the micro and nano scales such as electro-kinetic, thermal, and capillary forces. However, these forces are governed by complex equations and often act on microparts simultaneously and competitively, making modeling and simulation difficult. The research in this paper presents a novel phenomenological approach to directed micro-assembly through the use of artificial intelligence to correlate micro-particle movement via dielectrophoretic and electro-osmotic forces in response to varying frequency of an applied non-uniform electric field. This research serves as a proof of concept of the application of artificial intelligence to create high yield low-cost micro-assembly techniques, which will prove useful in a variety of fields including micro-electrical-mechanical systems (MEMS), biotechnology, and tissue engineering. △ Less

Submitted 22 July, 2021; originally announced July 2021.

arXiv:2107.09309 [pdf, other]

LENS: Layer Distribution Enabled Neural Architecture Search in Edge-Cloud Hierarchies

Authors: Mohanad Odema, Nafiul Rashid, Berken Utku Demirel, Mohammad Abdullah Al Faruque

Abstract: Edge-Cloud hierarchical systems employing intelligence through Deep Neural Networks (DNNs) endure the dilemma of workload distribution within them. Previous solutions proposed to distribute workloads at runtime according to the state of the surroundings, like the wireless conditions. However, such conditions are usually overlooked at design time. This paper addresses this issue for DNN architectur… ▽ More Edge-Cloud hierarchical systems employing intelligence through Deep Neural Networks (DNNs) endure the dilemma of workload distribution within them. Previous solutions proposed to distribute workloads at runtime according to the state of the surroundings, like the wireless conditions. However, such conditions are usually overlooked at design time. This paper addresses this issue for DNN architectural design by presenting a novel methodology, LENS, which administers multi-objective Neural Architecture Search (NAS) for two-tiered systems, where the performance objectives are refashioned to consider the wireless communication parameters. From our experimental search space, we demonstrate that LENS improves upon the traditional solution's Pareto set by 76.47% and 75% with respect to the energy and latency metrics, respectively. △ Less

Submitted 20 July, 2021; originally announced July 2021.

Comments: To appear at the 58th IEEE/ACM Design Automation Conference (DAC), December 2021, San Francisco, CA, USA

arXiv:2107.09130 [pdf, other]

GNN4IP: Graph Neural Network for Hardware Intellectual Property Piracy Detection

Authors: Rozhin Yasaei, Shih-Yuan Yu, Emad Kasaeyan Naeini, Mohammad Abdullah Al Faruque

Abstract: Aggressive time-to-market constraints and enormous hardware design and fabrication costs have pushed the semiconductor industry toward hardware Intellectual Properties (IP) core design. However, the globalization of the integrated circuits (IC) supply chain exposes IP providers to theft and illegal redistribution of IPs. Watermarking and fingerprinting are proposed to detect IP piracy. Nevertheles… ▽ More Aggressive time-to-market constraints and enormous hardware design and fabrication costs have pushed the semiconductor industry toward hardware Intellectual Properties (IP) core design. However, the globalization of the integrated circuits (IC) supply chain exposes IP providers to theft and illegal redistribution of IPs. Watermarking and fingerprinting are proposed to detect IP piracy. Nevertheless, they come with additional hardware overhead and cannot guarantee IP security as advanced attacks are reported to remove the watermark, forge, or bypass it. In this work, we propose a novel methodology, GNN4IP, to assess similarities between circuits and detect IP piracy. We model the hardware design as a graph and construct a graph neural network model to learn its behavior using the comprehensive dataset of register transfer level codes and gate-level netlists that we have gathered. GNN4IP detects IP piracy with 96% accuracy in our dataset and recognizes the original IP in its obfuscated version with 100% accuracy. △ Less

Submitted 19 July, 2021; originally announced July 2021.

arXiv:2102.11450 [pdf, other]

doi 10.1109/TII.2021.3062030

Neuroscience-Inspired Algorithms for the Predictive Maintenance of Manufacturing Systems

Authors: Arnav V. Malawade, Nathan D. Costa, Deepan Muthirayan, Pramod P. Khargonekar, Mohammad A. Al Faruque

Abstract: If machine failures can be detected preemptively, then maintenance and repairs can be performed more efficiently, reducing production costs. Many machine learning techniques for performing early failure detection using vibration data have been proposed; however, these methods are often power and data-hungry, susceptible to noise, and require large amounts of data preprocessing. Also, training is u… ▽ More If machine failures can be detected preemptively, then maintenance and repairs can be performed more efficiently, reducing production costs. Many machine learning techniques for performing early failure detection using vibration data have been proposed; however, these methods are often power and data-hungry, susceptible to noise, and require large amounts of data preprocessing. Also, training is usually only performed once before inference, so they do not learn and adapt as the machine ages. Thus, we propose a method of performing online, real-time anomaly detection for predictive maintenance using Hierarchical Temporal Memory (HTM). Inspired by the human neocortex, HTMs learn and adapt continuously and are robust to noise. Using the Numenta Anomaly Benchmark, we empirically demonstrate that our approach outperforms state-of-the-art algorithms at preemptively detecting real-world cases of bearing failures and simulated 3D printer failures. Our approach achieves an average score of 64.71, surpassing state-of-the-art deep-learning (49.38) and statistical (61.06) methods. △ Less

Submitted 22 February, 2021; originally announced February 2021.

arXiv:2102.01875 [pdf, other]

AHAR: Adaptive CNN for Energy-efficient Human Activity Recognition in Low-power Edge Devices

Authors: Nafiul Rashid, Berken Utku Demirel, Mohammad Abdullah Al Faruque

Abstract: Human Activity Recognition (HAR) is one of the key applications of health monitoring that requires continuous use of wearable devices to track daily activities. This paper proposes an Adaptive CNN for energy-efficient HAR (AHAR) suitable for low-power edge devices. Unlike traditional early exit architecture that makes the exit decision based on classification confidence, AHAR proposes a novel adap… ▽ More Human Activity Recognition (HAR) is one of the key applications of health monitoring that requires continuous use of wearable devices to track daily activities. This paper proposes an Adaptive CNN for energy-efficient HAR (AHAR) suitable for low-power edge devices. Unlike traditional early exit architecture that makes the exit decision based on classification confidence, AHAR proposes a novel adaptive architecture that uses an output block predictor to select a portion of the baseline architecture to use during the inference phase. Experimental results show that traditional early exit architectures suffer from performance loss whereas our adaptive architecture provides similar or better performance as the baseline one while being energy-efficient. We validate our methodology in classifying locomotion activities from two datasets- Opportunity and w-HAR. Compared to the fog/cloud computing approaches for the Opportunity dataset, our baseline and adaptive architecture shows a comparable weighted F1 score of 91.79%, and 91.57%, respectively. For the w-HAR dataset, our baseline and adaptive architecture outperforms the state-of-the-art works with a weighted F1 score of 97.55%, and 97.64%, respectively. Evaluation on real hardware shows that our baseline architecture is significantly energy-efficient (422.38x less) and memory-efficient (14.29x less) compared to the works on the Opportunity dataset. For the w-HAR dataset, our baseline architecture requires 2.04x less energy and 2.18x less memory compared to the state-of-the-art work. Moreover, experimental results show that our adaptive architecture is 12.32% (Opportunity) and 11.14% (w-HAR) energy-efficient than our baseline while providing similar (Opportunity) or better (w-HAR) performance with no significant memory overhead. △ Less

Submitted 3 January, 2022; v1 submitted 2 February, 2021; originally announced February 2021.

Comments: 11 pages, 9 figures, Accepted to be published in IEEE Internet of Things Journal

arXiv:2009.06435 [pdf, other]

Scene-Graph Augmented Data-Driven Risk Assessment of Autonomous Vehicle Decisions

Authors: Shih-Yuan Yu, Arnav V. Malawade, Deepan Muthirayan, Pramod P. Khargonekar, Mohammad A. Al Faruque

Abstract: Despite impressive advancements in Autonomous Driving Systems (ADS), navigation in complex road conditions remains a challenging problem. There is considerable evidence that evaluating the subjective risk level of various decisions can improve ADS' safety in both normal and complex driving scenarios. However, existing deep learning-based methods often fail to model the relationships between traffi… ▽ More Despite impressive advancements in Autonomous Driving Systems (ADS), navigation in complex road conditions remains a challenging problem. There is considerable evidence that evaluating the subjective risk level of various decisions can improve ADS' safety in both normal and complex driving scenarios. However, existing deep learning-based methods often fail to model the relationships between traffic participants and can suffer when faced with complex real-world scenarios. Besides, these methods lack transferability and explainability. To address these limitations, we propose a novel data-driven approach that uses scene-graphs as intermediate representations. Our approach includes a Multi-Relation Graph Convolution Network, a Long-Short Term Memory Network, and attention layers for modeling the subjective risk of driving maneuvers. To train our model, we formulate this task as a supervised scene classification problem. We consider a typical use case to demonstrate our model's capabilities: lane changes. We show that our approach achieves a higher classification accuracy than the state-of-the-art approach on both large (96.4% vs. 91.2%) and small (91.8% vs. 71.2%) synthesized datasets, also illustrating that our approach can learn effectively even from smaller datasets. We also show that our model trained on a synthesized dataset achieves an average accuracy of 87.8% when tested on a real-world dataset compared to the 70.3% accuracy achieved by the state-of-the-art model trained on the same synthesized dataset, showing that our approach can more effectively transfer knowledge. Finally, we demonstrate that the use of spatial and temporal attention layers improves our model's performance by 2.7% and 0.7% respectively, and increases its explainability. △ Less

Submitted 31 August, 2020; originally announced September 2020.

Comments: 10 pages, 5 figures, 1 table

arXiv:1906.04239 [pdf, other]

Pykg2vec: A Python Library for Knowledge Graph Embedding

Authors: Shih Yuan Yu, Sujit Rokka Chhetri, Arquimedes Canedo, Palash Goyal, Mohammad Abdullah Al Faruque

Abstract: Pykg2vec is an open-source Python library for learning the representations of the entities and relations in knowledge graphs. Pykg2vec's flexible and modular software architecture currently implements 16 state-of-the-art knowledge graph embedding algorithms, and is designed to easily incorporate new algorithms. The goal of pykg2vec is to provide a practical and educational platform to accelerate r… ▽ More Pykg2vec is an open-source Python library for learning the representations of the entities and relations in knowledge graphs. Pykg2vec's flexible and modular software architecture currently implements 16 state-of-the-art knowledge graph embedding algorithms, and is designed to easily incorporate new algorithms. The goal of pykg2vec is to provide a practical and educational platform to accelerate research in knowledge graph representation learning. Pykg2vec is built on top of TensorFlow and Python's multiprocessing framework and provides modules for batch generation, Bayesian hyperparameter optimization, mean rank evaluation, embedding, and result visualization. Pykg2vec is released under the MIT License and is also available in the Python Package Index (PyPI). The source code of pykg2vec is available at https://github.com/Sujit-O/pykg2vec. △ Less

Submitted 4 June, 2019; originally announced June 2019.

Comments: 5 pages, 5 figures, few code snippets

arXiv:1808.08213 [pdf, other]

doi 10.1145/3240765.3243477

Future Automation Engineering using Structural Graph Convolutional Neural Networks

Authors: Jiang Wan, Blake S. Pollard, Sujit Rokka Chhetri, Palash Goyal, Mohammad Abdullah Al Faruque, Arquimedes Canedo

Abstract: The digitalization of automation engineering generates large quantities of engineering data that is interlinked in knowledge graphs. Classifying and clustering subgraphs according to their functionality is useful to discover functionally equivalent engineering artifacts that exhibit different graph structures. This paper presents a new graph learning algorithm designed to classify engineering data… ▽ More The digitalization of automation engineering generates large quantities of engineering data that is interlinked in knowledge graphs. Classifying and clustering subgraphs according to their functionality is useful to discover functionally equivalent engineering artifacts that exhibit different graph structures. This paper presents a new graph learning algorithm designed to classify engineering data artifacts -- represented in the form of graphs -- according to their structure and neighborhood features. Our Structural Graph Convolutional Neural Network (SGCNN) is capable of learning graphs and subgraphs with a novel graph invariant convolution kernel and downsampling/pooling algorithm. On a realistic engineering-related dataset, we show that SGCNN is capable of achieving ~91% classification accuracy. △ Less

Submitted 24 August, 2018; originally announced August 2018.

Comments: ICCAD 2018

arXiv:1311.6005 [pdf]

Modeling and Simulation of the EV Charging in a Residential Distribution Power Grid

Authors: Fereidoun Ahourai, Irvin Huang, Mohammad Abdullah Al Faruque

Abstract: There are numerous advantages of using Electric Vehicles (EVs) as an alternative method of transportation. However, an increase in EV usage in the existing residential distribution grid poses problems such as overloading the existing infrastructure. In this paper, we have modeled and simulated a residential distribution grid in GridLAB-D (an open-source software tool used to model, simulate, and a… ▽ More There are numerous advantages of using Electric Vehicles (EVs) as an alternative method of transportation. However, an increase in EV usage in the existing residential distribution grid poses problems such as overloading the existing infrastructure. In this paper, we have modeled and simulated a residential distribution grid in GridLAB-D (an open-source software tool used to model, simulate, and analyze power distribution systems) to illustrate the problems associated with a higher EV market penetration rates in the residential domain. Power grid upgrades or control algorithms at the transformer level are required to overcome issues such as transformer overloading. We demonstrate the method of coordinating EV charging in a residential distribution grid so as to overcome the overloading problem without any upgrades in the distribution grid. △ Less

Submitted 23 November, 2013; originally announced November 2013.

Comments: Proceedings of Green Energy and Systems Conference 2013, November 25, Long Beach, CA, USA

arXiv:1007.3633 [pdf]

Alternatives to Mobile Keypad Design: Improved Text Feed

Authors: Satish Narayana Srirama, M. A. A. Faruque, M. A. S. Munni

Abstract: In this paper we tried to focus on some of the problems with the mobile keypad and text entering in these devices, and tried to give some possible suggestions. We mainly took some of the basic Human Computer Interaction principles and some general issues into consideration. In this paper we tried to focus on some of the problems with the mobile keypad and text entering in these devices, and tried to give some possible suggestions. We mainly took some of the basic Human Computer Interaction principles and some general issues into consideration. △ Less

Submitted 21 July, 2010; originally announced July 2010.

Comments: 6th International Conference on Computer and Information Technology (ICCIT2003), 19-21 December, 2003

Showing 1–50 of 50 results for author: Faruque, M A A