-
Network architecture search of X-ray based scientific applications
Authors:
Adarsha Balaji,
Ramyad Hadidi,
Gregory Kollmer,
Mohammed E. Fouda,
Prasanna Balaprakash
Abstract:
X-ray and electron diffraction-based microscopy use Bragg peak detection and ptychography to perform 3-D imaging at atomic resolution. Typically, these techniques are implemented using computationally complex tasks such as fitting a pseudo-Voigt function or solving a complex inverse problem. Recently, the use of deep neural networks has improved on the existing state-of-the-art approaches. However, the design and development of the neural network models depend on time- and labor-intensive tuning of the model by application experts. To that end, we propose a hyperparameter search (HPS) and neural architecture search (NAS) approach to automate the design and optimization of the neural network models for model size, energy consumption, and throughput. We demonstrate the improved performance of the auto-tuned models compared to the manually tuned BraggNN and PtychoNN benchmarks. We study and demonstrate the importance of exploring the search space of tunable hyperparameters in enhancing the performance of Bragg peak detection and ptychographic reconstruction. With our NAS and HPS, (1) BraggNN achieves a 31.03% improvement in Bragg peak detection accuracy with an 87.57% reduction in model size, and (2) PtychoNN achieves a 16.77% improvement in model accuracy and a 12.82% reduction in model size compared to the baseline PtychoNN model. When run on the Orin-AGX edge platform, the optimized BraggNN and PtychoNN models demonstrate 10.51% and 9.47% reductions in inference latency and 44.18% and 15.34% reductions in energy consumption, respectively, compared to their baselines.
Submitted 16 April, 2024;
originally announced April 2024.
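For orientation, the pseudo-Voigt profile named in the abstract above is commonly defined as a weighted sum of a Lorentzian and a Gaussian that share one width. The sketch below illustrates only that standard definition; it is not the fitting code used by the authors or by BraggNN. The function names, the shared-FWHM parameterization, and the mixing parameter `eta` are assumptions chosen for illustration.

```python
import math


def gaussian(x, center, fwhm):
    """Unit-area Gaussian with the given full width at half maximum."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((x - center) / sigma) ** 2) / norm


def lorentzian(x, center, fwhm):
    """Unit-area Lorentzian with the given full width at half maximum."""
    gamma = fwhm / 2.0
    return (gamma / math.pi) / ((x - center) ** 2 + gamma ** 2)


def pseudo_voigt(x, center, fwhm, eta, amplitude=1.0):
    """Linear mix eta * Lorentzian + (1 - eta) * Gaussian, with 0 <= eta <= 1.

    The shared-FWHM form used here is one common convention (an assumption);
    other parameterizations exist.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    mix = eta * lorentzian(x, center, fwhm) + (1.0 - eta) * gaussian(x, center, fwhm)
    return amplitude * mix
```

Locating a peak with this model usually means fitting `center`, `fwhm`, `eta`, and `amplitude` to measured intensities with an iterative optimizer. That per-peak optimization is the kind of computationally expensive step the abstract refers to.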
-
AudioFool: Fast, Universal and synchronization-free Cross-Domain Attack on Speech Recognition
Authors:
Mohamad Fakih,
Rouwaida Kanj,
Fadi Kurdahi,
Mohammed E. Fouda
Abstract:
Automatic Speech Recognition (ASR) systems have been shown to be vulnerable to adversarial attacks that manipulate the command executed on the device. Recent research has focused on exploring methods to create such attacks; however, some issues relating to Over-The-Air (OTA) attacks have not been properly addressed. In our work, we examine the properties needed for robust attacks compatible with the OTA model, and we design a method for generating attacks with such desired properties, namely invariance to synchronization and robustness to filtering. This allows a Denial-of-Service (DoS) attack against ASR systems. We achieve these characteristics by constructing attacks in a modified frequency domain through an inverse Fourier transform. We evaluate our method on standard keyword classification tasks and analyze it in the OTA setting, and we analyze the properties of the cross-domain attacks to explain the efficiency of the approach.
Submitted 20 September, 2023;
originally announced September 2023.
-
Efficient Training of Spiking Neural Networks with Temporally-Truncated Local Backpropagation through Time
Authors:
Wenzhe Guo,
Mohammed E. Fouda,
Ahmed M. Eltawil,
Khaled Nabil Salama
Abstract:
Directly training spiking neural networks (SNNs) has remained challenging due to complex neural dynamics and intrinsic non-differentiability in firing functions. The well-known backpropagation through time (BPTT) algorithm proposed to train SNNs suffers from a large memory footprint and prohibits backward and update unlocking, making it impossible to exploit the potential of locally-supervised training methods. This work proposes an efficient and direct training algorithm for SNNs that integrates a locally-supervised training method with a temporally-truncated BPTT algorithm. The proposed algorithm explores both temporal and spatial locality in BPTT and contributes to a significant reduction in computational cost, including GPU memory utilization, main memory access, and arithmetic operations. We thoroughly explore the design space concerning temporal truncation length and local training block size and benchmark their impact on the classification accuracy of different networks running different types of tasks. The results reveal that temporal truncation has a negative effect on the accuracy of classifying frame-based datasets but leads to improved accuracy on dynamic-vision-sensor (DVS) recorded datasets. Despite the resulting information loss, local training is capable of alleviating overfitting. The combined effect of temporal truncation and local training can slow the accuracy drop and even improve accuracy. In addition, training deep SNN models such as AlexNet on the CIFAR10-DVS dataset leads to a 7.26% increase in accuracy, an 89.94% reduction in GPU memory, a 10.79% reduction in memory access, and a 99.64% reduction in MAC operations compared to the standard end-to-end BPTT.
Submitted 13 December, 2021;
originally announced January 2022.
-
Configurable Independent Component Analysis Preprocessing Accelerator
Authors:
Hsi-Hung Lu,
Chung-An Shen,
Mohammed E. Fouda,
Ahmed M. Eltawil
Abstract:
Independent component analysis (ICA) has been used in many applications, including self-interference cancellation for in-band full-duplex wireless systems and anomaly detection in the industrial Internet of Things. This paper presents a high-throughput and highly efficient configurable preprocessing accelerator for the ICA algorithm. The proposed ICA accelerator has three major blocks that perform data centering, covariance matrix computation, and eigenvalue decomposition (EVD). Specifically, the proposed accelerator is based on a high-performance matrix multiplication array (MMA). The proposed MMA architecture uses time-multiplexed processing so that the efficiency of hardware utilization is greatly enhanced. Furthermore, the processing flow utilizes parallel processing such that the centering, the calculation of the covariance matrix, and the EVD are conducted simultaneously and are individually pipelined to maximize throughput. This paper presents the architecture, circuit design, and performance estimates based on post-layout extraction of the proposed preprocessing ICA accelerator. The proposed design achieves a throughput of 40.7 kMatrices per second at a complexity of 73.3 kGE.
Submitted 30 April, 2022; v1 submitted 10 January, 2022;
originally announced January 2022.
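The three preprocessing stages named in the abstract above (centering, covariance computation, and EVD) can be sketched in software as follows. This is a minimal reference model in plain Python, not the authors' hardware design. The cyclic Jacobi method for the EVD, the final whitening step, and all function names are assumptions made for illustration; the abstract does not say which EVD algorithm the accelerator uses.

```python
import math


def center(X):
    """Subtract each channel's mean. X is a list of channels (rows)."""
    out = []
    for row in X:
        mean = sum(row) / len(row)
        out.append([v - mean for v in row])
    return out


def covariance(Xc):
    """Covariance (dividing by N) of already-centered channels."""
    n = len(Xc[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / n for rj in Xc] for ri in Xc]


def jacobi_evd(A, sweeps=50, tol=1e-20):
    """EVD of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V) with eigenvectors stored in the columns of V.
    """
    n = len(A)
    A = [row[:] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                apq = A[p][q]
                if abs(apq) < 1e-300:
                    continue
                diff = A[q][q] - A[p][p]
                # Angle that zeroes A[p][q], kept within [-pi/4, pi/4].
                if diff >= 0:
                    theta = 0.5 * math.atan2(2.0 * apq, diff)
                else:
                    theta = 0.5 * math.atan2(-2.0 * apq, -diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns: A <- A J
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows: A <- J^T A
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def whiten(X):
    """Center, estimate covariance, decompose, then scale to unit covariance.

    Assumes the covariance matrix is full rank (all eigenvalues > 0).
    """
    Xc = center(X)
    eigvals, V = jacobi_evd(covariance(Xc))
    n, t_len = len(Xc), len(Xc[0])
    # W = diag(eigvals)^(-1/2) @ V^T
    W = [[V[j][i] / math.sqrt(eigvals[i]) for j in range(n)] for i in range(n)]
    return [[sum(W[i][k] * Xc[k][t] for k in range(n)) for t in range(t_len)]
            for i in range(n)]
```

The whitened output has an identity covariance matrix, which is the usual starting point for ICA iterations.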
-
Efficient Noise Mitigation Technique for Quantum Computing
Authors:
Ali Shaib,
Mohamad H. Naim,
Mohammed E. Fouda,
Rouwaida Kanj,
Fadi Kurdahi
Abstract:
Quantum computers have enabled solving problems beyond the capabilities of current computers. However, this requires handling noise arising from unwanted interactions in these systems. Several protocols have been proposed to address efficient and accurate quantum noise profiling and mitigation. In this work, we propose a novel protocol that efficiently estimates the average output of a noisy quantum device to be used for quantum noise mitigation. The average behavior of the multi-qubit system is approximated as a special form of a Pauli channel, where Clifford gates are used to estimate the average output for circuits of different depths. The characterized Pauli channel error rates and the state preparation and measurement errors are then used to construct the outputs for different depths, thereby eliminating the need for large simulations and enabling efficient mitigation. We demonstrate the efficiency of the proposed protocol on four IBM Q 5-qubit quantum devices. Our method demonstrates improved accuracy with efficient noise characterization. We report up to 88% and 69% improvement for the proposed approach compared to the unmitigated and pure measurement error mitigation approaches, respectively.
Submitted 10 September, 2021;
originally announced September 2021.
-
Pinched Hysteresis Loops In Nonlinear Resonators
Authors:
A. S. Elwakil,
M. E. Fouda,
S. Majzoub,
A. G. Radwan
Abstract:
This paper shows that pinched hysteresis can be observed in simple nonlinear resonance circuits containing a single diode that behaves as a voltage-controlled switch. Mathematical models are derived and numerically validated for both series and parallel resonator circuits. The lobe area of the pinched loop is found to increase with increasing frequency, and multiple pinch-points are possible with an odd-symmetric nonlinearity such as a cubic nonlinearity. Experiments have been performed to prove the existence of pinched hysteresis with a single diode and with two anti-parallel diodes. The formation of a pinched loop in these circuits confirms that 1) pinched hysteresis is not a fingerprint of memristors and 2) the existence of a nonlinearity is essential for generating this behavior. Finally, an application in a digital logic circuit is validated.
Submitted 10 October, 2020;
originally announced October 2020.
-
Application of ICA on Self-Interference Cancellation of In-band Full Duplex Systems
Authors:
Mohammed E. Fouda,
Sergey Shaboyan,
Ayman Elezabi,
Ahmed Eltawil
Abstract:
In this letter, we propose a modified version of the Fast Independent Component Analysis (FICA) algorithm to solve the self-interference cancellation (SIC) problem in In-band Full Duplex (IBFD) communication systems. The complex mixing problem is mathematically formulated to suit real-valued blind source separation (BSS) algorithms. In addition, we propose a method to estimate the ambiguity factors associated with ICA, lumped together with the channels and the residual separation error. Experiments were performed on an FD platform where FICA-based BSS was applied for SIC in the frequency domain. Experimental results show that the proposed approach outperforms least-squares SIC, with a gain of up to 6 dB in SNR.
Submitted 3 January, 2020;
originally announced January 2020.
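One standard way to make a complex mixing problem suit real-valued algorithms, as the abstract above describes, is to stack real and imaginary parts so that y = Hx becomes a real system of twice the size. The sketch below shows only that textbook identity; whether the letter uses exactly this formulation is not stated in the abstract, and the function names are hypothetical.

```python
def complex_to_real_system(H):
    """Map a complex m x n matrix H to the real 2m x 2n block matrix
    [[Re H, -Im H],
     [Im H,  Re H]]."""
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]
    return top + bottom


def stack_real(x):
    """Stack a complex vector as [Re x; Im x]."""
    return [v.real for v in x] + [v.imag for v in x]


def matvec(A, x):
    """Plain real matrix-vector product."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]
```

With these helpers, `matvec(complex_to_real_system(H), stack_real(x))` equals `stack_real(y)` for `y = Hx`, so a real-valued BSS routine can operate on the stacked observations.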