-
Network architecture search of X-ray based scientific applications
Authors:
Adarsha Balaji,
Ramyad Hadidi,
Gregory Kollmer,
Mohammed E. Fouda,
Prasanna Balaprakash
Abstract:
X-ray and electron diffraction-based microscopy use Bragg peak detection and ptychography to perform 3-D imaging at an atomic resolution. Typically, these techniques are implemented using computationally complex tasks such as fitting a pseudo-Voigt function or solving a complex inverse problem. Recently, the use of deep neural networks has improved the existing state-of-the-art approaches. However, the design and development of the neural network models depend on time- and labor-intensive tuning of the model by application experts. To that end, we propose a hyperparameter search (HPS) and neural architecture search (NAS) approach to automate the design and optimization of the neural network models for model size, energy consumption and throughput. We demonstrate the improved performance of the auto-tuned models when compared to the manually tuned BraggNN and PtychoNN benchmarks. We study and demonstrate the importance of exploring the search space of tunable hyperparameters in enhancing the performance of Bragg peak detection and ptychographic reconstruction. Our NAS and HPS of (1) BraggNN achieves a 31.03\% improvement in Bragg peak detection accuracy with an 87.57\% reduction in model size, and (2) PtychoNN achieves a 16.77\% improvement in model accuracy and a 12.82\% reduction in model size when compared to the baseline PtychoNN model. When run on the Orin-AGX edge platform, the optimized BraggNN and PtychoNN models demonstrate a 10.51\% and 9.47\% reduction in inference latency and a 44.18\% and 15.34\% reduction in energy consumption, respectively, compared to their baselines.
Submitted 16 April, 2024;
originally announced April 2024.
-
AudioFool: Fast, Universal and synchronization-free Cross-Domain Attack on Speech Recognition
Authors:
Mohamad Fakih,
Rouwaida Kanj,
Fadi Kurdahi,
Mohammed E. Fouda
Abstract:
Automatic Speech Recognition (ASR) systems have been shown to be vulnerable to adversarial attacks that manipulate the command executed on the device. Recent research has focused on exploring methods to create such attacks; however, some issues relating to Over-The-Air (OTA) attacks have not been properly addressed. In our work, we examine the properties that robust attacks need in order to be compatible with the OTA model, and we design a method of generating attacks with arbitrary desired properties, namely invariance to synchronization and robustness to filtering. This allows a Denial-of-Service (DoS) attack against ASR systems. We achieve these characteristics by constructing attacks in a modified frequency domain through an inverse Fourier transform. We evaluate our method on standard keyword classification tasks and analyze it in OTA, and we analyze the properties of the cross-domain attacks to explain the efficiency of the approach.
Submitted 20 September, 2023;
originally announced September 2023.
-
A Survey on Multi-AP Coordination Approaches over Emerging WLANs: Future Directions and Open Challenges
Authors:
Shikhar Verma,
Tiago Koketsu Rodrigues,
Yuichi Kawamoto,
Mostafa M. Fouda,
Nei Kato
Abstract:
Recent advancements in wireless local area network (WLAN) technology include IEEE 802.11be and 802.11ay, often known as Wi-Fi 7 and WiGig, respectively. The goal of these developments is to provide Extremely High Throughput (EHT) and low latency to meet the demands of future applications such as 8K videos, augmented and virtual reality, the Internet of Things, telesurgery, and other developing technologies. IEEE 802.11be includes new features such as 320 MHz bandwidth, multi-link operation, Multi-user Multi-Input Multi-Output, orthogonal frequency-division multiple access, and Multiple-Access Point (multi-AP) coordination (MAP-Co) to achieve EHT. With the increase in the number of overlapping APs and inter-AP interference, researchers have focused on studying MAP-Co approaches for coordinated transmission in IEEE 802.11be, making MAP-Co a key feature of future WLANs. Moreover, similar issues may arise in EHF-band WLANs, particularly for standards beyond IEEE 802.11ay. This has prompted researchers to investigate the implementation of MAP-Co over future 802.11ay WLANs. Thus, in this article, we provide a comprehensive review of the state-of-the-art MAP-Co features and their shortcomings concerning emerging WLANs. Finally, we discuss several novel future directions and open challenges for MAP-Co.
Submitted 19 December, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Low Precision Quantization-aware Training in Spiking Neural Networks with Differentiable Quantization Function
Authors:
Ayan Shymyrbay,
Mohammed E. Fouda,
Ahmed Eltawil
Abstract:
Deep neural networks have been proven to be highly effective tools in various domains, yet their computational and memory costs restrict them from being widely deployed on portable devices. The recent rapid increase of edge computing devices has led to an active search for techniques to address the above-mentioned limitations of machine learning frameworks. The quantization of artificial neural networks (ANNs), which converts the full-precision synaptic weights into low-bit versions, emerged as one of the solutions. At the same time, spiking neural networks (SNNs) have become an attractive alternative to conventional ANNs due to their temporal information processing capability, energy efficiency, and high biological plausibility. Despite being driven by the same motivation, the simultaneous utilization of both concepts has yet to be thoroughly studied. Therefore, this work aims to bridge the gap between recent progress in quantized neural networks and SNNs. It presents an extensive study on the performance of a quantization function, represented as a linear combination of sigmoid functions, used for low-bit weight quantization in SNNs. The presented quantization function demonstrates state-of-the-art performance on four popular benchmarks, CIFAR10-DVS, DVS128 Gesture, N-Caltech101, and N-MNIST, for binary networks (64.05\%, 95.45\%, 68.71\%, and 99.43\%, respectively) with small accuracy drops and up to 31$\times$ memory savings, which outperforms existing methods.
Submitted 30 May, 2023;
originally announced May 2023.
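The quantization function described in the abstract above, a linear combination of sigmoid functions, can be illustrated with a minimal sketch. The level set, the `temperature` steepness parameter, and the function names are illustrative assumptions and are not taken from the paper; the authors' exact parameterization may differ.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_quantize(w: float, levels: list[float], temperature: float) -> float:
    """Smooth staircase over sorted `levels` (hypothetical helper).

    Starts at the lowest level and adds one sigmoid step per gap between
    adjacent levels, so the result is a linear combination of sigmoids.
    """
    out = levels[0]
    for lo, hi in zip(levels, levels[1:]):
        midpoint = 0.5 * (lo + hi)  # each step is centered between two levels
        out += (hi - lo) * sigmoid(temperature * (w - midpoint))
    return out


# Binary case: levels -1 and +1; a larger temperature gives a sharper step.
print(soft_quantize(0.3, [-1.0, 1.0], temperature=5.0))
print(soft_quantize(0.3, [-1.0, 1.0], temperature=1000.0))
```

With levels `[-1.0, 1.0]` and a large temperature, the function approaches a hard binary quantizer while remaining differentiable everywhere, which is the property that lets it be used inside gradient-based training.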
-
Smart Handover with Predicted User Behavior using Convolutional Neural Networks for WiGig Systems
Authors:
Tiago Koketsu Rodrigues,
Shikhar Verma,
Yuichi Kawamoto,
Nei Kato,
Mostafa M. Fouda,
Muhammad Ismail
Abstract:
WiGig networks and 60 GHz frequency communications have a lot of potential for commercial and personal use. They can offer extremely high transmission rates but at the cost of low range and penetration. Due to these issues, WiGig systems are unstable and need to rely on frequent handovers to maintain high-quality connections. However, this solution is problematic as it forces users into bad connections and downtime before they are switched to a better access point. In this work, we use Machine Learning to identify patterns in user behaviors and predict user actions. This prediction is used to perform proactive handovers, switching users to access points with better future transmission rates and a more stable environment based on the future state of the user. Results show that not only is the proposal effective at predicting channel data, but the use of such predictions also improves system performance and avoids unnecessary handovers.
Submitted 28 March, 2023;
originally announced March 2023.
-
Thermal Heating in ReRAM Crossbar Arrays: Challenges and Solutions
Authors:
Kamilya Smagulova,
Mohammed E. Fouda,
Ahmed Eltawil
Abstract:
The higher speed, scalability and parallelism offered by ReRAM crossbar arrays foster development of ReRAM-based next generation AI accelerators. At the same time, sensitivity of ReRAM to temperature variations decreases the R_on/R_off ratio and negatively affects the achieved accuracy and reliability of the hardware. Various works on temperature-aware optimization and remapping in ReRAM crossbar arrays reported up to 58\% improvement in accuracy and 2.39$\times$ ReRAM lifetime enhancement. This paper classifies the challenges caused by thermal heating, starting from constraints in ReRAM cells' dimensions and characteristics to their placement in the architecture. In addition, it reviews available solutions designed to mitigate the impact of these challenges, including emerging temperature-resilient DNN training methods. Our work also provides a summary of the techniques and their advantages and limitations.
Submitted 31 January, 2023; v1 submitted 28 December, 2022;
originally announced December 2022.
-
G2NetPL: Generic Game-Theoretic Network for Partial-Label Image Classification
Authors:
Rabab Abdelfattah,
Xin Zhang,
Mostafa M. Fouda,
Xiaofeng Wang,
Song Wang
Abstract:
Multi-label image classification aims to predict all possible labels in an image. It is usually formulated as a partial-label learning problem, since it could be expensive in practice to annotate all the labels in every training image. Existing works on partial-label learning focus on the case where each training image is labeled with only a subset of its positive/negative labels. To effectively address partial-label classification, this paper proposes an end-to-end Generic Game-theoretic Network (G2NetPL) for partial-label learning, which can be applied to most partial-label settings, including a very challenging, but annotation-efficient case where only a subset of the training images are labeled, each with only one positive label, while the rest of the training images remain unlabeled. In G2NetPL, each unobserved label is associated with a soft pseudo label, which, together with the network, formulates a two-player non-zero-sum non-cooperative game. The objective of the network is to minimize the loss function with given pseudo labels, while the pseudo labels will seek convergence to 1 (positive) or 0 (negative) with a penalty of deviating from the predicted labels determined by the network. In addition, we introduce a confidence-aware scheduler into the loss of the network to adaptively perform easy-to-hard learning for different labels. Extensive experiments demonstrate that our proposed G2NetPL outperforms many state-of-the-art multi-label classification methods under various partial-label settings on three different datasets.
Submitted 20 October, 2022;
originally announced October 2022.
-
Mixed-Precision Neural Networks: A Survey
Authors:
Mariam Rakka,
Mohammed E. Fouda,
Pramod Khargonekar,
Fadi Kurdahi
Abstract:
Mixed-precision Deep Neural Networks achieve the energy efficiency and throughput needed for hardware deployment, particularly when the resources are limited, without sacrificing accuracy. However, the optimal per-layer bit precision that preserves accuracy is not easily found, especially with the abundance of models, datasets, and quantization techniques that creates an enormous search space. In order to tackle this difficulty, a body of literature has emerged recently, and several frameworks that achieved promising accuracy results have been proposed. In this paper, we start by summarizing the quantization techniques generally used in the literature. Then, we present a thorough survey of the mixed-precision frameworks, categorized according to their optimization techniques, such as reinforcement learning, and their quantization techniques, such as deterministic rounding. Furthermore, we discuss the advantages and shortcomings of each framework and compare the frameworks side by side. We finally give guidelines for future mixed-precision frameworks.
Submitted 11 August, 2022;
originally announced August 2022.
-
DNA Pattern Matching Acceleration with Analog Resistive CAM
Authors:
Jinane Bazzi,
Jana Sweidan,
Mohammed E. Fouda,
Rouwaida Kanj,
Ahmed M. Eltawil
Abstract:
DNA pattern matching is essential for many widely used bioinformatics applications. Disease diagnosis is one of these applications, since analyzing changes in DNA sequences can increase our understanding of possible genetic diseases. The remarkable growth in the size of DNA datasets has resulted in challenges in discovering DNA patterns efficiently in terms of run time and power consumption. In this paper, we propose an efficient hardware and software codesign that determines the chance of the occurrence of repeat-expansion diseases using DNA pattern matching. The proposed design parallelizes the DNA pattern matching task using associative memory realized with analog content-addressable memory and implements an algorithm that returns the maximum number of consecutive occurrences of a specific pattern within a DNA sequence. We fully implement all the required hardware circuits with PTM 45-nm technology, and we evaluate the proposed architecture on a practical human DNA dataset. The results show that our design is energy-efficient and significantly accelerates the DNA pattern matching task compared to previous approaches described in the literature.
Submitted 30 May, 2022;
originally announced May 2022.
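The algorithm mentioned in the abstract above returns the maximum number of consecutive occurrences of a pattern within a DNA sequence. The following is a plain software reference sketch of that computation only. It does not model the analog CAM hardware or its parallelism, and the function name and example strings are hypothetical.

```python
def max_consecutive_repeats(sequence: str, pattern: str) -> int:
    """Return the longest run of back-to-back copies of `pattern` in `sequence`."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    n, k = len(sequence), len(pattern)
    # runs[i] = number of consecutive copies of `pattern` starting at index i.
    runs = [0] * (n + 1)
    best = 0
    # Scan right to left so runs[i + k] is already known when visiting i.
    for i in range(n - k, -1, -1):
        if sequence[i:i + k] == pattern:
            runs[i] = 1 + runs[i + k]
            best = max(best, runs[i])
    return best


# "CAG" repeats three times in a row starting at index 1, then once more later.
print(max_consecutive_repeats("ACAGCAGCAGTTCAG", "CAG"))  # prints 3
```

The right-to-left scan keeps the work at one fixed-length comparison per position, so the run length at each index is derived from the run length one pattern-width further along.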
-
BackLink: Supervised Local Training with Backward Links
Authors:
Wenzhe Guo,
Mohammed E Fouda,
Ahmed M. Eltawil,
Khaled N. Salama
Abstract:
Empowered by the backpropagation (BP) algorithm, deep neural networks have dominated the race in solving various cognitive tasks. The restricted training pattern in the standard BP requires end-to-end error propagation, causing large memory cost and prohibiting model parallelization. Existing local training methods aim to resolve the training obstacle by completely cutting off the backward path between modules and isolating their gradients to reduce memory cost and accelerate the training process. These methods prevent errors from flowing between modules and hence information exchange, resulting in inferior performance. This work proposes a novel local training algorithm, BackLink, which introduces inter-module backward dependency and allows errors to flow between modules. The algorithm lets information flow backward along the network. To preserve the computational advantage of local training, BackLink restricts the error propagation length within the module. Extensive experiments performed in various deep convolutional neural networks demonstrate that our method consistently improves the classification performance of local training algorithms over other methods. For example, in ResNet32 with 16 local modules, our method surpasses the conventional greedy local training method by 4.00\% and a recent work by 1.83\% in accuracy on CIFAR10. Analysis of computational costs reveals that small overheads are incurred in GPU memory costs and runtime on multiple GPUs. Our method can lead to up to a 79\% reduction in memory cost and a 52\% reduction in simulation runtime in ResNet110 compared to the standard BP. Therefore, our method could create new opportunities for improving training algorithms towards better efficiency and biological plausibility.
Submitted 14 May, 2022;
originally announced May 2022.
-
DT2CAM: A Decision Tree to Content Addressable Memory Framework
Authors:
Mariam Rakka,
Mohammed E. Fouda,
Rouwaida Kanj,
Fadi Kurdahi
Abstract:
Decision trees are considered one of the most powerful tools for data classification. Accelerating the decision tree search is crucial for on-the-edge applications that have limited power and latency budgets. In this paper, we propose a Content Addressable Memory (CAM) Compiler for Decision Tree (DT) inference acceleration. We propose a novel "adaptive-precision" scheme that results in a compact implementation and enables an efficient bijective mapping to Ternary Content Addressable Memories while maintaining high inference accuracies. In addition, a Resistive-CAM (ReCAM) functional synthesizer is developed for mapping the decision tree to the ReCAM and performing functional simulations for energy, latency, and accuracy evaluations. We study the decision tree accuracy under hardware non-idealities including device defects, manufacturing variability, and input encoding noise. We test our framework on various DT datasets including \textit{Give Me Some Credit}, \textit{Titanic}, and \textit{COVID-19}. Our results reveal up to 42.4\% energy savings and up to 17.8x better energy-delay-area product compared to the state-of-the-art hardware accelerators, and up to 333 million decisions per second for the pipelined implementation.
Submitted 12 April, 2022;
originally announced April 2022.
-
Efficient Analog CAM Design
Authors:
Jinane Bazzi,
Jana Sweidan,
Mohammed E. Fouda,
Rouwaida Kanj,
Ahmed M. Eltawil
Abstract:
Content Addressable Memories (CAMs) are considered a key enabler for in-memory computing (IMC). IMC shows order-of-magnitude improvements in energy efficiency and throughput compared to traditional computing techniques. Recently, analog CAMs (aCAMs) were proposed as a means to improve storage density and energy efficiency. In this work, we propose two new aCAM cells to improve data encoding and robustness as compared to existing aCAM cells. We propose a methodology to choose the margin and interval width for data encoding. In addition, we perform a comprehensive comparison against prior work in terms of the number of intervals, noise sensitivity, dynamic range, energy, latency, area, and probability of failure.
Submitted 4 March, 2022;
originally announced March 2022.
-
In-memory Associative Processors: Tutorial, Potential, and Challenges
Authors:
Mohammed E. Fouda,
Hasan Erdem Yantir,
Ahmed M. Eltawil,
Fadi Kurdahi
Abstract:
In-memory computing is an emerging computing paradigm that overcomes the limitations of existing von Neumann computing architectures, such as the memory-wall bottleneck. In this paradigm, the computations are performed directly on the data stored in the memory, which greatly reduces the memory-processor communications during computation. Hence, significant speedup and energy savings could be achieved, especially with data-intensive applications. Associative processors (APs) were proposed in the seventies and were recently revived thanks to high-density memories. In this tutorial brief, we overview the functionalities and recent trends of APs in addition to the implementation of each content-addressable memory with different technologies. The AP operations and runtime complexity are also summarized. We also explain and explore the possible applications that can benefit from APs. Finally, the AP limitations, challenges, and future directions are discussed.
Submitted 12 April, 2022; v1 submitted 1 March, 2022;
originally announced March 2022.
-
Efficient Training of Spiking Neural Networks with Temporally-Truncated Local Backpropagation through Time
Authors:
Wenzhe Guo,
Mohammed E. Fouda,
Ahmed M. Eltawil,
Khaled Nabil Salama
Abstract:
Directly training spiking neural networks (SNNs) has remained challenging due to complex neural dynamics and intrinsic non-differentiability in firing functions. The well-known backpropagation through time (BPTT) algorithm proposed to train SNNs suffers from a large memory footprint and prohibits backward and update unlocking, making it impossible to exploit the potential of locally-supervised training methods. This work proposes an efficient and direct training algorithm for SNNs that integrates a locally-supervised training method with a temporally-truncated BPTT algorithm. The proposed algorithm explores both temporal and spatial locality in BPTT and contributes to a significant reduction in computational cost, including GPU memory utilization, main memory access and arithmetic operations. We thoroughly explore the design space concerning temporal truncation length and local training block size and benchmark their impact on classification accuracy of different networks running different types of tasks. The results reveal that temporal truncation has a negative effect on the accuracy of classifying frame-based datasets, but leads to improvement in accuracy on dynamic-vision-sensor (DVS) recorded datasets. In spite of the resulting information loss, local training is capable of alleviating overfitting. The combined effect of temporal truncation and local training can slow the accuracy drop and even improve accuracy. In addition, training deep SNN models such as AlexNet on the CIFAR10-DVS dataset leads to a 7.26% increase in accuracy, an 89.94% reduction in GPU memory, a 10.79% reduction in memory access, and a 99.64% reduction in MAC operations compared to the standard end-to-end BPTT.
Submitted 13 December, 2021;
originally announced January 2022.
-
CAPTIVE: Constrained Adversarial Perturbations to Thwart IC Reverse Engineering
Authors:
Amir Hosein Afandizadeh Zargari,
Marzieh AshrafiAmiri,
Minjun Seo,
Sai Manoj Pudukotai Dinakarrao,
Mohammed E. Fouda,
Fadi Kurdahi
Abstract:
Reverse engineering (RE) in Integrated Circuits (IC) is a process in which one attempts to extract the internals of an IC, extract the circuit structure, and determine the gate-level information of an IC. In general, the RE process can be carried out for validation as well as for intellectual property (IP) theft. In addition, RE also facilitates different illicit activities such as inserting a hardware Trojan, pirating or counterfeiting a design, or developing an attack. In this work, we propose an approach to introduce cognitive perturbations, with the aid of adversarial machine learning, to the IC layout that could prevent the RE process from succeeding. We first construct a layer-by-layer image dataset of 45nm predictive technology. With this dataset, we propose a conventional neural network model called RecoG-Net to recognize the logic gates, which is the first step in RE. RecoG-Net successfully recognizes the gates with more than 99.7% accuracy. Our thwarting approach utilizes the concept of adversarial attack generation algorithms to generate perturbations. Unlike traditional adversarial attacks in machine learning, the perturbation generation needs to be highly constrained to meet the fab rules such as Design Rule Checking (DRC) and Layout vs. Schematic (LVS) checks. Hence, we propose CAPTIVE as a constrained perturbation generation approach satisfying the DRC. The experiments show that the accuracy of reverse engineering using machine learning techniques can decrease from 100% to approximately 30% based on the adversary generator.
Submitted 21 October, 2021;
originally announced October 2021.
-
In-memory Multi-valued Associative Processor
Authors:
Mira Hout,
Mohammed E. Fouda,
Rouwaida Kanj,
Ahmed M. Eltawil
Abstract:
In-memory associative processor architectures are offered as a strong candidate to overcome the memory-wall bottleneck and to enable vector/parallel arithmetic operations. In this paper, we extend the functionality of the associative processor to multi-valued arithmetic. To allow for in-memory compute implementation of arithmetic or logic functions, we propose a structured methodology enabling the automatic generation of the corresponding look-up tables (LUTs). We propose two approaches to build the LUTs: a first approach that formalizes the intuition behind LUT pass ordering and a more optimized approach that reduces the number of required write cycles. To demonstrate these methodologies, we present a novel ternary associative processor (TAP) architecture that is employed to implement efficient ternary vector in-place addition. A SPICE-MATLAB co-simulator is implemented to test the functionality of the TAP and to evaluate the performance of the proposed AP ternary in-place adder implementations in terms of energy, delay, and area. Results show that compared to the binary AP adder, the ternary AP adder results in a 12.25\% and 6.2\% reduction in energy and area, respectively. The ternary AP also demonstrates a 52.64\% reduction in energy and a delay that is up to 9.5x smaller when compared to a state-of-the-art ternary carry-lookahead adder.
Submitted 18 October, 2021;
originally announced October 2021.
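The abstract above describes generating look-up tables (LUTs) automatically for multi-valued arithmetic. As a rough illustration of what such a table contains, the sketch below enumerates a ternary full-adder LUT and uses it for digit-wise addition. It is a functional model only: it does not reproduce the paper's LUT pass ordering, write-cycle optimization, or in-place hardware behavior, and all names are hypothetical.

```python
from itertools import product


def build_adder_lut(radix: int = 3) -> dict[tuple[int, int, int], tuple[int, int]]:
    """Map (digit_a, digit_b, carry_in) to (sum_digit, carry_out) for one digit position."""
    lut = {}
    for a, b, carry_in in product(range(radix), range(radix), range(2)):
        total = a + b + carry_in
        lut[(a, b, carry_in)] = (total % radix, total // radix)
    return lut


def add_digits(x: list[int], y: list[int], radix: int = 3) -> list[int]:
    """Add two equal-length little-endian digit vectors using only LUT lookups."""
    lut = build_adder_lut(radix)
    carry = 0
    result = []
    for a, b in zip(x, y):
        digit, carry = lut[(a, b, carry)]
        result.append(digit)
    result.append(carry)  # final carry becomes the most significant digit
    return result


# 5 = [2, 1] and 8 = [2, 2] in little-endian ternary; 13 = [1, 1, 1].
print(len(build_adder_lut()))          # prints 18
print(add_digits([2, 1], [2, 2]))      # prints [1, 1, 1]
```

The ternary table has 3 x 3 x 2 = 18 entries, since the carry into a two-operand addition is always 0 or 1 regardless of the radix.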
-
Efficient Noise Mitigation Technique for Quantum Computing
Authors:
Ali Shaib,
Mohamad H. Naim,
Mohammed E. Fouda,
Rouwaida Kanj,
Fadi Kurdahi
Abstract:
Quantum computers have enabled solving problems beyond current computers' capabilities. However, this requires handling noise arising from unwanted interactions in these systems. Several protocols have been proposed to address efficient and accurate quantum noise profiling and mitigation. In this work, we propose a novel protocol that efficiently estimates the average output of a noisy quantum device to be used for quantum noise mitigation. The multi-qubit system average behavior is approximated as a special form of a Pauli channel, where Clifford gates are used to estimate the average output for circuits of different depths. The characterized Pauli channel error rates and the state preparation and measurement errors are then used to construct the outputs for different depths, thereby eliminating the need for large simulations and enabling efficient mitigation. We demonstrate the efficiency of the proposed protocol on four IBM Q 5-qubit quantum devices. Our method demonstrates improved accuracy with efficient noise characterization. We report up to 88\% and 69\% improvement for the proposed approach compared to the unmitigated and pure measurement error mitigation approaches, respectively.
Submitted 10 September, 2021;
originally announced September 2021.
-
Resistive Neural Hardware Accelerators
Authors:
Kamilya Smagulova,
Mohammed E. Fouda,
Fadi Kurdahi,
Khaled Salama,
Ahmed Eltawil
Abstract:
Deep Neural Networks (DNNs), as a subset of Machine Learning (ML) techniques, entail that real-world data can be learned and that decisions can be made in real-time. However, their wide adoption is hindered by a number of software and hardware limitations. The existing general-purpose hardware platforms used to accelerate DNNs are facing new challenges associated with the growing amount of data and the exponentially increasing complexity of computations. Emerging non-volatile memory (NVM) devices and the processing-in-memory (PIM) paradigm are creating a new generation of hardware architectures with increased computing and storage capabilities. In particular, the shift towards ReRAM-based in-memory computing has great potential in the implementation of area- and power-efficient inference and in training large-scale neural network architectures. These can accelerate the process of the IoT-enabled AI technologies entering our daily life. In this survey, we review the state-of-the-art ReRAM-based DNN many-core accelerators and show their superiority compared to CMOS counterparts. The review covers different aspects of hardware and software realization of DNN accelerators, their present limitations, and future prospects. In particular, comparison of the accelerators shows the need for the introduction of new performance metrics and benchmarking standards. In addition, the major concerns regarding the efficient design of accelerators include a lack of accuracy in simulation tools for software and hardware co-design.
Submitted 8 September, 2021;
originally announced September 2021.
-
Detection of False-Reading Attacks in the AMI Net-Metering System
Authors:
Mahmoud M. Badr,
Mohamed I. Ibrahem,
Mohamed Mahmoud,
Mostafa M. Fouda,
Waleed Alasmary
Abstract:
In the smart grid, malicious customers may compromise their smart meters (SMs) to report false readings to achieve financial gains illegally. Reporting false readings not only causes hefty financial losses to the utility but may also degrade the grid performance because the reported readings are used for energy management. This paper is the first work that investigates this problem in the net-metering system, in which one SM is used to report the difference between the power consumed and the power generated. First, we prepare a benign dataset for the net-metering system by processing a real power consumption and generation dataset. Then, we propose a new set of attacks tailored for the net-metering system to create a malicious dataset. After that, we analyze the data and find time correlations between the net meter readings and correlations between the readings and relevant data obtained from trustworthy sources such as the solar irradiance and temperature. Based on the data analysis, we propose a general multi-data-source deep hybrid learning-based detector to identify the false-reading attacks. Our detector is trained on net meter readings of all customers besides data from the trustworthy sources to enhance the detector performance by learning the correlations between them. The rationale here is that although an attacker can report false readings, he cannot manipulate the solar irradiance and temperature values because they are beyond his control. Extensive experiments have been conducted, and the results indicate that our detector can identify the false-reading attacks with a high detection rate and a low false alarm rate.
Submitted 2 December, 2020;
originally announced December 2020.
-
On-Chip Error-triggered Learning of Multi-layer Memristive Spiking Neural Networks
Authors:
Melika Payvand,
Mohammed E. Fouda,
Fadi Kurdahi,
Ahmed M. Eltawil,
Emre O. Neftci
Abstract:
Recent breakthroughs in neuromorphic computing show that local forms of gradient descent learning are compatible with Spiking Neural Networks (SNNs) and synaptic plasticity. Although SNNs can be scalably implemented using neuromorphic VLSI, an architecture that can learn using gradient-descent in situ is still missing. In this paper, we propose a local, gradient-based, error-triggered learning algorithm with online ternary weight updates. The proposed algorithm enables online training of multi-layer SNNs with memristive neuromorphic hardware showing a small loss in the performance compared with the state of the art. We also propose a hardware architecture based on memristive crossbar arrays to perform the required vector-matrix multiplications. The necessary peripheral circuitry, including pre-synaptic, post-synaptic and write circuits required for online training, has been designed in the sub-threshold regime for power saving with a standard 180 nm CMOS process.
Submitted 21 November, 2020;
originally announced November 2020.
-
Privacy-Preserving and Efficient Data Collection Scheme for AMI Networks Using Deep Learning
Authors:
Mohamed I. Ibrahem,
Mohamed Mahmoud,
Mostafa M. Fouda,
Fawaz Alsolami,
Waleed Alasmary,
Xuemin Shen
Abstract:
In advanced metering infrastructure (AMI), smart meters (SMs), which are installed at the consumer side, send fine-grained power consumption readings periodically to the electricity utility for load monitoring and energy management. Change and transmit (CAT) is an efficient approach to collect these readings, where the readings are not transmitted when there is not enough change in consumption. However, this approach causes a privacy problem: by analyzing the transmission pattern of an SM, sensitive information on the house dwellers can be inferred. For instance, since the transmission pattern is distinguishable when dwellers are traveling, attackers may analyze the pattern to launch a presence-privacy attack (PPA) to infer whether the dwellers are absent from home. In this paper, we propose a scheme, called "STDL", for efficient collection of power consumption readings in AMI networks while preserving the consumers' privacy by sending spoofing transmissions (redundant real readings) using a deep-learning approach. We first use a clustering technique and real power consumption readings to create a dataset for transmission patterns using the CAT approach. Then, we train an attacker model using deep-learning, and our evaluations indicate that the success rate of the attacker is about 91%. Finally, we train a deep-learning-based defense model to send spoofing transmissions efficiently to thwart the PPA. Extensive evaluations are conducted, and the results indicate that our scheme can reduce the attacker's success rate to 13.52% in case he knows the defense model and to 3.15% in case he does not know the model, while still achieving high efficiency in terms of the number of readings that should be transmitted. Our measurements indicate that the proposed scheme can reduce the number of readings that should be transmitted by about 41% compared to continuously transmitting readings.
Submitted 7 November, 2020;
originally announced November 2020.
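The change-and-transmit (CAT) idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `change_and_transmit`, the fixed `threshold`, and the choice to compare each reading against the last transmitted one (rather than the previous reading) are all assumptions made for illustration.

```python
def change_and_transmit(readings, threshold):
    """Return the (index, value) pairs a hypothetical CAT meter would send.

    Assumption: a reading is sent only when it differs from the last
    transmitted reading by at least `threshold`.
    """
    if not readings:
        return []
    sent = [(0, readings[0])]  # assumption: the first reading is always sent
    last_sent = readings[0]
    for index, value in enumerate(readings[1:], start=1):
        if abs(value - last_sent) >= threshold:
            sent.append((index, value))
            last_sent = value
    return sent


# Illustrative readings (made-up values, arbitrary units).
readings = [1.0, 1.1, 1.05, 2.0, 2.1, 0.2, 0.2]
sent = change_and_transmit(readings, threshold=0.5)
print(sent)
```

The gaps between transmitted indices are exactly the transmission pattern that, according to the abstract, an attacker could analyze to infer whether the dwellers are home.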
-
Detection of Lying Electrical Vehicles in Charging Coordination Application Using Deep Learning
Authors:
Ahmed Shafee,
Mostafa M. Fouda,
Mohamed Mahmoud,
Waleed Alasmary,
Abdulah J. Aljohani,
Fathi Amsaad
Abstract:
The simultaneous charging of many electric vehicles (EVs) stresses the distribution system and may cause grid instability in severe cases. The best way to avoid this problem is by charging coordination. The idea is that the EVs should report data (such as state-of-charge (SoC) of the battery) to run a mechanism to prioritize the charging requests and select the EVs that should charge during this time slot and defer other requests to future time slots. However, EVs may lie and send false data to receive high charging priority illegally. In this paper, we first study this attack to evaluate the gains of the lying EVs and how their behavior impacts the honest EVs and the performance of the charging coordination mechanism. Our evaluations indicate that lying EVs have a greater chance to get charged compared to honest EVs and they degrade the performance of the charging coordination mechanism. Then, an anomaly-based detector that uses deep neural networks (DNNs) is devised to identify the lying EVs. To do that, we first create an honest dataset for the charging coordination application using real driving traces and information revealed by EV manufacturers, and then we also propose a number of attacks to create malicious data. We train and evaluate two models, the multi-layer perceptron (MLP) and the gated recurrent unit (GRU), using this dataset, and the GRU detector gives better results. Our evaluations indicate that our detector can detect lying EVs with high accuracy and low false positive rate.
Submitted 28 May, 2020;
originally announced May 2020.
-
Efficient Privacy-Preserving Electricity Theft Detection with Dynamic Billing and Load Monitoring for AMI Networks
Authors:
Mohamed I. Ibrahem,
Mahmoud Nabil,
Mostafa M. Fouda,
Mohamed Mahmoud,
Waleed Alasmary,
Fawaz Alsolami
Abstract:
In advanced metering infrastructure (AMI), smart meters (SMs) are installed at the consumer side to send fine-grained power consumption readings periodically to the system operator (SO) for load monitoring, energy management, billing, etc. However, fraudulent consumers launch electricity theft cyber-attacks by reporting false readings to reduce their bills illegally. These attacks not only cause financial losses but may also degrade the grid performance because the readings are used for grid management. To identify these attackers, the existing schemes employ machine-learning models using the consumers' fine-grained readings, which violates the consumers' privacy by revealing their lifestyle. In this paper, we propose an efficient scheme that enables the SO to detect electricity theft, compute bills, and monitor load while preserving the consumers' privacy. The idea is that SMs encrypt their readings using functional encryption, and the SO uses the ciphertexts to (i) compute the bills following a dynamic pricing approach, (ii) monitor the grid load, and (iii) evaluate a machine-learning model to detect fraudulent consumers, without being able to learn the individual readings to preserve consumers' privacy. We adapted a functional encryption scheme so that the encrypted readings are aggregated for billing and load monitoring and only the aggregated value is revealed to the SO. Also, we exploited the inner-product operations on encrypted readings to evaluate a machine-learning model to detect fraudulent consumers. A real dataset is used to evaluate our scheme, and our evaluations indicate that our scheme is secure and can detect fraudulent consumers accurately with low communication and computation overhead.
Submitted 28 May, 2020;
originally announced May 2020.
-
Application of ICA on Self-Interference Cancellation of In-band Full Duplex Systems
Authors:
Mohammed E. Fouda,
Sergey Shaboyan,
Ayman Elezabi,
Ahmed Eltawil
Abstract:
In this letter, we propose a modified version of the Fast Independent Component Analysis (FICA) algorithm to solve the self-interference cancellation (SIC) problem in In-band Full Duplex (IBFD) communication systems. The complex mixing problem is mathematically formulated to suit the real-valued blind source separation (BSS) algorithms. In addition, we propose a method to estimate the ambiguity factors associated with ICA lumped together with the channels and residual separation error. Experiments were performed on an FD platform where FICA-based BSS was applied for SIC in the frequency domain. Experimental results show superior performance compared to least squares SIC by up to 6 dB gain in the SNR.
Submitted 3 January, 2020;
originally announced January 2020.
-
AI Aided Noise Processing of Spintronic Based IoT Sensor for Magnetocardiography Application
Authors:
Attayeb Mohsen,
Muftah Al-Mahdawi,
Mostafa M. Fouda,
Mikihiko Oogane,
Yasuo Ando,
Zubair Md Fadlullah
Abstract:
As we are about to embark upon the highly hyped "Society 5.0", powered by the Internet of Things (IoT), traditional ways to monitor human heart signals for tracking cardio-vascular conditions are challenging, particularly in remote healthcare settings. On the merits of low power consumption, portability, and non-intrusiveness, there are no suitable IoT solutions that can provide information comparable to the conventional Electrocardiography (ECG). In this paper, we propose an IoT device utilizing a spintronic ultra-sensitive sensor that measures the magnetic fields produced by cardio-vascular electrical activity, i.e. Magnetocardiography (MCG). After that, we treat the low-frequency noise generated by the sensors, which is also a challenge for most other sensors dealing with low-frequency bio-magnetic signals. Instead of relying on generic signal processing techniques such as averaging or filtering, we employ deep-learning training on bio-magnetic signals. Using an existing dataset of ECG records, MCG labels are synthetically constructed. A unique deep learning structure composed of combined Convolutional Neural Network (CNN) with Gated Recurrent Unit (GRU) is trained using the labeled data moving through a striding window, which is able to smartly capture and eliminate the noise features. Simulation results are reported to evaluate the effectiveness of the proposed method that demonstrates encouraging performance.
Submitted 10 June, 2020; v1 submitted 8 November, 2019;
originally announced November 2019.
-
Error-triggered Three-Factor Learning Dynamics for Crossbar Arrays
Authors:
Melika Payvand,
Mohammed Fouda,
Fadi Kurdahi,
Ahmed Eltawil,
Emre O. Neftci
Abstract:
Recent breakthroughs suggest that local, approximate gradient descent learning is compatible with Spiking Neural Networks (SNNs). Although SNNs can be scalably implemented using neuromorphic VLSI, an architecture that can learn in-situ as accurately as conventional processors is still missing. Here, we propose a subthreshold circuit architecture designed through insights obtained from machine learning and computational neuroscience that could achieve such accuracy. Using a surrogate gradient learning framework, we derive local, error-triggered learning dynamics compatible with crossbar arrays and the temporal dynamics of SNNs. The derivation reveals that circuits used for inference and training dynamics can be shared, which simplifies the circuit and suppresses the effects of fabrication mismatch. We present SPICE simulations on XFAB 180nm process, as well as large-scale simulations of the spiking neural networks on event-based benchmarks, including a gesture recognition task. Our results show that the number of updates can be reduced hundred-fold compared to the standard rule while achieving performances that are on par with the state-of-the-art.
Submitted 14 October, 2019;
originally announced October 2019.
-
Spiking Neural Networks for Inference and Learning: A Memristor-based Design Perspective
Authors:
M. E. Fouda,
F. Kurdahi,
A. Eltawil,
E. Neftci
Abstract:
On metrics of density and power efficiency, neuromorphic technologies have the potential to surpass mainstream computing technologies in tasks where real-time functionality, adaptability, and autonomy are essential. While algorithmic advances in neuromorphic computing are proceeding successfully, the potential of memristors to improve neuromorphic computing has not yet borne fruit, primarily because they are often used as a drop-in replacement for conventional memory. However, interdisciplinary approaches anchored in machine learning theory suggest that multifactor plasticity rules matching neural and synaptic dynamics to the device capabilities can take better advantage of memristor dynamics and its stochasticity. Furthermore, such plasticity rules generally show much higher performance than that of classical Spike Time Dependent Plasticity (STDP) rules. This chapter reviews the recent development in learning with spiking neural network models and their possible implementation with memristor-based hardware.
Submitted 8 October, 2019; v1 submitted 4 September, 2019;
originally announced September 2019.
-
Mimic Learning to Generate a Shareable Network Intrusion Detection Model
Authors:
Ahmed Shafee,
Mohamed Baza,
Douglas A. Talbert,
Mostafa M. Fouda,
Mahmoud Nabil,
Mohamed Mahmoud
Abstract:
Purveyors of malicious network attacks continue to increase the complexity and the sophistication of their techniques, and their ability to evade detection continues to improve as well. Hence, intrusion detection systems must also evolve to meet these increasingly challenging threats. Machine learning is often used to support this needed improvement. However, training a good prediction model can require a large set of labelled training data. Such datasets are difficult to obtain because privacy concerns prevent the majority of intrusion detection agencies from sharing their sensitive data. In this paper, we propose the use of mimic learning to enable the transfer of intrusion detection knowledge through a teacher model trained on private data to a student model. This student model provides a means of publicly sharing knowledge extracted from private data without sharing the data itself. Our results confirm that the proposed scheme can produce a student intrusion detection model that mimics the teacher model without requiring access to the original dataset.
Submitted 18 February, 2020; v1 submitted 2 May, 2019;
originally announced May 2019.
-
Non-Stationary Polar Codes for Resistive Memories
Authors:
Marwen Zorgui,
Mohammed E. Fouda,
Zhiying Wang,
Ahmed M. Eltawil,
Fadi Kurdahi
Abstract:
Resistive memories are considered a promising memory technology enabling high storage densities with in-memory computing capabilities. However, the readout reliability of resistive memories is impaired due to the inevitable existence of wire resistance, resulting in the sneak path problem. Motivated by this problem, we study polar coding over channels with different reliability levels, termed non-stationary polar codes, and we propose a technique improving its bit error rate (BER) performance. We then apply the framework of non-stationary polar codes to the crossbar array and evaluate its BER performance under two modeling approaches, namely binary symmetric channels (BSCs) and binary asymmetric channels (BACs). Finally, we propose a technique for biasing the proportion of high-resistance states in the crossbar array and show its advantage in further reducing the BER. Several simulations are carried out using a SPICE-like simulator, exhibiting a significant reduction in BER.
Submitted 18 April, 2019;
originally announced April 2019.
-
On Resistive Memories: One Step Row Readout Technique and Sensing Circuitry
Authors:
Mohammed E Fouda,
Ahmed M. Eltawil,
Fadi Kurdahi
Abstract:
Transistor-based memories are rapidly approaching their maximum density per unit area. Resistive crossbar arrays enable denser memory due to the small size of switching devices. However, due to the resistive nature of these memories, they suffer from current sneak paths complicating the readout procedure. In this paper, we propose a row readout technique with circuitry that can be used to read selector-less resistive crossbar based memories. High throughput reading and writing techniques are needed to overcome the memory-wall bottleneck problem and to enable the near-memory computing paradigm. The proposed technique can read the entire row of dense crossbar arrays in one cycle, unlike previously published techniques. The requirements for the readout circuitry are discussed and satisfied in the proposed circuit. Additionally, an approximated expression for the power consumed while reading the array is derived. A figure of merit is defined and used to compare the proposed approach with existing reading techniques. Finally, a quantitative analysis of the effect of biasing mismatch on the array size is discussed.
Submitted 4 March, 2019;
originally announced March 2019.
-
One-Dimensional Vector based Pattern Matching
Authors:
Y. M. Fouda
Abstract:
Template matching is a basic method in image analysis to extract useful information from images. In this paper, we suggest a new method for pattern matching. Our method transforms the template image from a two-dimensional image into a one-dimensional vector. All sub-windows (of the same size as the template) in the reference image are likewise transformed into one-dimensional vectors. The three similarity measures SAD, SSD, and Euclidean are used to compute the likeness between the template and all sub-windows in the reference image to find the best match. The experimental results show the superior performance of the proposed method over the conventional methods on various templates of different sizes.
Submitted 10 September, 2014;
originally announced September 2014.
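As a rough illustration of the matching scheme this abstract outlines, the sketch below flattens the template and every same-size sub-window into one-dimensional vectors and scores them with SAD, SSD, or Euclidean distance. The abstract does not say how the paper performs the 2-D to 1-D transformation, so the plain row-major flattening here is an assumption, and all function names are hypothetical.

```python
import math


def flatten(image):
    """Row-major flattening of a 2-D list (an assumed 2-D to 1-D transform)."""
    return [pixel for row in image for pixel in row]


def sad(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ssd(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance, i.e. the square root of the SSD."""
    return math.sqrt(ssd(a, b))


def best_match(reference, template, measure=sad):
    """Return (score, row, col) of the sub-window most similar to the template."""
    t_rows, t_cols = len(template), len(template[0])
    r_rows, r_cols = len(reference), len(reference[0])
    template_vec = flatten(template)
    best = None
    for row in range(r_rows - t_rows + 1):
        for col in range(r_cols - t_cols + 1):
            window = [reference[row + i][col:col + t_cols] for i in range(t_rows)]
            score = measure(flatten(window), template_vec)
            if best is None or score < best[0]:
                best = (score, row, col)
    return best


# Made-up 4x4 reference image containing the 2x2 template at row 1, column 1.
reference = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
template = [[5, 6], [7, 8]]
print(best_match(reference, template))             # SAD
print(best_match(reference, template, ssd))        # SSD
print(best_match(reference, template, euclidean))  # Euclidean
```

All three measures agree on where the lowest score is here because the template occurs exactly in the reference; on noisy images they can rank candidate windows differently.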