Search | arXiv e-print repository

C3PU: Cross-Coupling Capacitor Processing Unit Using Analog-Mixed Signal In-Memory Computing for AI Inference

Authors: Dima Kilani, Baker Mohammad, Yasmin Halawani, Mohammed F. Tolba, Hani Saleh

Abstract: This paper presents a novel cross-coupling capacitor processing unit (C3PU) that supports analog-mixed signal in memory computing to perform multiply-and-accumulate (MAC) operations. The C3PU consists of a capacitive unit, a CMOS transistor, and a voltage-to-time converter (VTC). The capacitive unit serves as a computational element that holds the multiplier operand and performs multiplication onc… ▽ More This paper presents a novel cross-coupling capacitor processing unit (C3PU) that supports analog-mixed signal in memory computing to perform multiply-and-accumulate (MAC) operations. The C3PU consists of a capacitive unit, a CMOS transistor, and a voltage-to-time converter (VTC). The capacitive unit serves as a computational element that holds the multiplier operand and performs multiplication once the multiplicand is applied at the terminal. The multiplicand is the input voltage that is converted to a pulse width signal using a low power VTC. The transistor transfers this multiplication where a voltage level is generated. A demonstrator of 5x4 C3PU array that is capable of implementing 4 MAC units is presented. The design has been verified using Monte Carlo simulation in 65 nm technology. The 5x4 C3PU consumed energy of 66.4 fJ/MAC at 0.3 V voltage supply with an error of 5.7%. The proposed unit achieves lower energy and occupies a smaller area by 3.4x and 3.6x, respectively, with similar error value when compared to a digital-based 8x4-bit fixed point MAC unit. The C3PU has been utilized through an iris fower classification utilizing an artificial neural network which achieved a 90% classification accuracy compared to ideal accuracy of 96.67% using MATLAB. △ Less

Submitted 11 October, 2021; originally announced October 2021.

Comments: 10 pages, 12 figures and 7 tables

arXiv:2105.02954 [pdf, other]

Deep Neural Networks Based Weight Approximation and Computation Reuse for 2-D Image Classification

Authors: Mohammed F. Tolba, Huruy Tekle Tesfai, Hani Saleh, Baker Mohammad, Mahmoud Al-Qutayri

Abstract: Deep Neural Networks (DNNs) are computationally and memory intensive, which makes their hardware implementation a challenging task especially for resource constrained devices such as IoT nodes. To address this challenge, this paper introduces a new method to improve DNNs performance by fusing approximate computing with data reuse techniques to be used for image recognition applications. DNNs weigh… ▽ More Deep Neural Networks (DNNs) are computationally and memory intensive, which makes their hardware implementation a challenging task especially for resource constrained devices such as IoT nodes. To address this challenge, this paper introduces a new method to improve DNNs performance by fusing approximate computing with data reuse techniques to be used for image recognition applications. DNNs weights are approximated based on the linear and quadratic approximation methods during the training phase, then, all of the weights are replaced with the linear/quadratic coefficients to execute the inference in a way where different weights could be computed using the same coefficients. This leads to a repetition of the weights across the processing element (PE) array, which in turn enables the reuse of the DNN sub-computations (computational reuse) and leverage the same data (data reuse) to reduce DNNs computations, memory accesses, and improve energy efficiency albeit at the cost of increased training time. Complete analysis for both MNIST and CIFAR 10 datasets is presented for image recognition , where LeNet 5 revealed a reduction in the number of parameters by a factor of 1211.3x with a drop of less than 0.9% in accuracy. When compared to the state of the art Row Stationary (RS) method, the proposed architecture saved 54% of the total number of adders and multipliers needed. Overall, the proposed approach is suitable for IoT edge devices as it reduces the memory size requirement as well as the number of needed memory accesses. △ Less

Submitted 28 April, 2021; originally announced May 2021.

Comments: 10 pages 9 figures

arXiv:1706.07886 [pdf, other]

doi 10.1016/j.patrec.2010.09.019

Fundamental Matrix Estimation: A Study of Error Criteria

Authors: Mohammed E. Fathy, Ashraf S. Hussein, Mohammed F. Tolba

Abstract: The fundamental matrix (FM) describes the geometric relations that exist between two images of the same scene. Different error criteria are used for estimating FMs from an input set of correspondences. In this paper, the accuracy and efficiency aspects of the different error criteria were studied. We mathematically and experimentally proved that the most popular error criterion, the symmetric epip… ▽ More The fundamental matrix (FM) describes the geometric relations that exist between two images of the same scene. Different error criteria are used for estimating FMs from an input set of correspondences. In this paper, the accuracy and efficiency aspects of the different error criteria were studied. We mathematically and experimentally proved that the most popular error criterion, the symmetric epipolar distance, is biased. It was also shown that despite the similarity between the algebraic expressions of the symmetric epipolar distance and Sampson distance, they have different accuracy properties. In addition, a new error criterion, Kanatani distance, was proposed and was proved to be the most effective for use during the outlier removal phase from accuracy and efficiency perspectives. To thoroughly test the accuracy of the different error criteria, we proposed a randomized algorithm for Reprojection Error-based Correspondence Generation (RE-CG). As input, RE-CG takes an FM and a desired reprojection error value $d$. As output, RE-CG generates a random correspondence having that error value. Mathematical analysis of this algorithm revealed that the success probability for any given trial is 1 - (2/3)^2 at best and is 1 - (6/7)^2 at worst while experiments demonstrated that the algorithm often succeeds after only one trial. △ Less

Submitted 23 June, 2017; originally announced June 2017.

Comments: 15 pages, 7 figures, Pattern Recognition Letters, 2011

arXiv:1609.06260 [pdf, other]

GAdaBoost: Accelerating Adaboost Feature Selection with Genetic Algorithms

Authors: Mai Tolba, Mohamed Moustafa

Abstract: Boosted cascade of simple features, by Viola and Jones, is one of the most famous object detection frameworks. However, it suffers from a lengthy training process. This is due to the vast features space and the exhaustive search nature of Adaboost. In this paper we propose GAdaboost: a Genetic Algorithm to accelerate the training procedure through natural feature selection. Specifically, we propos… ▽ More Boosted cascade of simple features, by Viola and Jones, is one of the most famous object detection frameworks. However, it suffers from a lengthy training process. This is due to the vast features space and the exhaustive search nature of Adaboost. In this paper we propose GAdaboost: a Genetic Algorithm to accelerate the training procedure through natural feature selection. Specifically, we propose to limit Adaboost search within a subset of the huge feature space, while evolving this subset following a Genetic Algorithm. Experiments demonstrate that our proposed GAdaboost is up to 3.7 times faster than Adaboost. We also demonstrate that the price of this speedup is a mere decrease (3%, 4%) in detection accuracy when tested on FDDB benchmark face detection set, and Caltech Web Faces respectively. △ Less

Submitted 20 September, 2016; originally announced September 2016.

Comments: 8th International Conference on Evolutionary Computation Theory and Applications (ECTA 2016). Final paper will appear at the SCITEPRESS Digital Library

Showing 1–4 of 4 results for author: Tolba, M