-
Fully-parallel Convolutional Neural Network Hardware
Authors:
Christiam F. Frasser,
Pablo Linares-Serrano,
V. Canals,
Miquel Roca,
T. Serrano-Gotarredona,
Josep L. Rossello
Abstract:
A new trans-disciplinary knowledge area, Edge Artificial Intelligence or Edge Intelligence, is beginning to receive a tremendous amount of interest from the machine learning community due to the ever increasing popularization of the Internet of Things (IoT). Unfortunately, the incorporation of AI characteristics into edge computing devices presents the drawbacks of being power and area hungry for typical machine learning techniques such as Convolutional Neural Networks (CNN). In this work, we propose a new power-and-area-efficient architecture for implementing Artificial Neural Networks (ANNs) in hardware, based on the exploitation of the correlation phenomenon in Stochastic Computing (SC) systems. The proposed architecture can solve the difficult implementation challenges that SC presents for CNN applications, such as the high resources used in binary-to-stochastic conversion, the inaccuracy produced by undesired correlation between signals, and the stochastic maximum function implementation. Compared with traditional binary logic implementations, experimental results showed an improvement of 19.6x and 6.3x in terms of speed performance and energy efficiency, respectively, for the FPGA implementation. We have also realized a full VLSI implementation of the proposed SC-CNN architecture, demonstrating that our optimizations achieve an 18x area reduction over a previous SC-DNN architecture VLSI implementation in a comparable technological node. For the first time, a fully-parallel CNN such as LENET-5 is embedded and tested in a single FPGA, showing the benefits of using stochastic computing for embedded applications, in contrast to traditional binary logic implementations.
Submitted 22 June, 2020;
originally announced June 2020.
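The abstract above refers to exploiting correlation in stochastic computing and to a stochastic maximum function, but gives no circuit details. The following is only a minimal software sketch of the general, well-known effect, assuming unipolar encoding (a value in [0, 1] is the probability of a 1 in the bit stream) and comparator-based stream generation. The function names (`to_stream`, `sc_demo`) and all parameters are hypothetical and do not come from the paper. It illustrates that an AND gate multiplies two uncorrelated streams, whereas streams generated from the same random source turn AND into a minimum and OR into a maximum.

```python
import random


def to_stream(p, rands):
    """Comparator-based conversion: emit 1 whenever the random sample is below p."""
    return [1 if r < p else 0 for r in rands]


def estimate(bits):
    """Recover the encoded value as the fraction of 1s in the stream."""
    return sum(bits) / len(bits)


def sc_demo(a=0.3, b=0.8, n=20000, seed=0):
    """Compare AND/OR gates on uncorrelated and fully correlated streams."""
    rng = random.Random(seed)
    shared = [rng.random() for _ in range(n)]  # one random source
    other = [rng.random() for _ in range(n)]   # an independent second source

    xa = to_stream(a, shared)
    xb_independent = to_stream(b, other)   # uncorrelated with xa
    xb_correlated = to_stream(b, shared)   # same source as xa: fully correlated

    product = estimate([p & q for p, q in zip(xa, xb_independent)])
    minimum = estimate([p & q for p, q in zip(xa, xb_correlated)])
    maximum = estimate([p | q for p, q in zip(xa, xb_correlated)])
    return product, minimum, maximum


if __name__ == "__main__":
    product, minimum, maximum = sc_demo()
    print(f"AND, uncorrelated: {product:.3f} (a*b = 0.240)")
    print(f"AND, correlated:   {minimum:.3f} (min = 0.300)")
    print(f"OR,  correlated:   {maximum:.3f} (max = 0.800)")
```

Sharing one random source between converters is also one plausible way to reduce binary-to-stochastic conversion cost, although whether the authors do exactly this is not stated in the abstract.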
-
Stochastic-based Neural Network hardware acceleration for an efficient ligand-based virtual screening
Authors:
Christian F. Frasser,
Carola de Benito,
Vincent Canals,
Miquel Roca,
Pedro J. Ballester,
Josep L. Rossello
Abstract:
Artificial Neural Networks (ANN) have been popularized in many science and technological areas due to their capacity to solve many complex pattern matching problems. That is the case of Virtual Screening, a research area that studies how to identify those molecular compounds with the highest probability to present biological activity for a therapeutic target. Due to the vast number of small organic compounds and the thousands of targets for which such large-scale screening can potentially be carried out, there has been an increasing interest in the research community to increase both processing speed and energy efficiency in the screening of molecular databases. In this work, we present a classification model describing each molecule with a single energy-based vector and propose a machine-learning system based on the use of ANNs. Different ANNs are studied with respect to their suitability to identify biochemical similarities. Also, a high-performance and energy-efficient hardware acceleration platform based on the use of stochastic computing is proposed for the ANN implementation. This platform is of utility when screening vast libraries of compounds. As a result, the proposed model showed appreciable improvements with respect to previously published works in terms of the main relevant characteristics (accuracy, speed and energy-efficiency).
Submitted 3 June, 2020;
originally announced June 2020.
-
Reservoir Computing Hardware with Cellular Automata
Authors:
Alejandro Morán,
Christiam F. Frasser,
Josep L. Rosselló
Abstract:
Elementary cellular automata (ECA) are a widely studied one-dimensional processing methodology where the successive iteration of the automaton may lead to the recreation of rich pattern dynamics. Recently, cellular automata have been proposed as a feasible way to implement Reservoir Computing (RC) systems in which the automata rule is fixed and the training is performed using a linear regression. In this work we perform an exhaustive study of the performance of the different ECA rules when applied to pattern recognition of time-independent input signals using an RC scheme. Once the different ECA rules have been tested, the most accurate one (rule 90) is selected to implement a digital circuit. Rule 90 is easily reproduced using a reduced set of XOR gates and shift-registers, thus representing a high-performance alternative for RC hardware implementation in terms of processing time, circuit area, power dissipation and system accuracy. The model (both in software and its hardware implementation) has been tested using a pattern recognition task of handwritten numbers (the MNIST database) for which we obtained competitive results in terms of accuracy, speed and power dissipation. The proposed model can be considered to be a low-cost method to implement fast pattern recognition digital circuits.
Submitted 21 June, 2018; v1 submitted 13 June, 2018;
originally announced June 2018.
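The abstract above notes that rule 90 needs only XOR gates and shift registers. As a minimal software sketch of why, each new cell under rule 90 is the XOR of its left and right neighbours, so a whole row can be updated with two shifts and one XOR. The periodic (wrap-around) boundary, the integer bit-vector representation, and the function names `rule90_step` and `expand` are assumptions made for illustration, not details taken from the paper.

```python
def rule90_step(state, width):
    """One rule 90 update on a row of `width` cells packed into an integer.

    Each new cell is the XOR of its two neighbours. A periodic boundary is
    assumed here, so the shifts are implemented as rotations.
    """
    mask = (1 << width) - 1
    left = ((state << 1) | (state >> (width - 1))) & mask   # rotate left by one
    right = ((state >> 1) | (state << (width - 1))) & mask  # rotate right by one
    return left ^ right


def expand(state, width, steps):
    """Return the initial row followed by `steps` successive rule 90 rows."""
    rows = [state]
    for _ in range(steps):
        state = rule90_step(state, width)
        rows.append(state)
    return rows


if __name__ == "__main__":
    # A single active cell grows into the familiar Sierpinski-like pattern.
    for row in expand(0b00010000, 8, 3):
        print(format(row, "08b"))
```

In a reservoir computing setting, the rows produced by such an expansion would typically serve as the feature vector fed to the trained linear readout; how the authors encode the input and collect the features is not specified in the abstract.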
-
Stochastic-Based Pattern Recognition Analysis
Authors:
V. Canals,
A. Morro,
J. L. Rosselló
Abstract:
In this work we review the basic principles of stochastic logic and propose its application to probabilistic-based pattern-recognition analysis. The proposed technique is intrinsically a parallel comparison of input data to various pre-stored categories using Bayesian techniques. We design smart pulse-based stochastic-logic blocks to provide an efficient pattern recognition analysis. The proposed architecture is applied to a specific navigation problem. The resulting system is orders of magnitude faster than processor-based solutions.
Submitted 20 February, 2012;
originally announced February 2012.
-
A Fast Concurrent Power-Thermal Model for Sub-100nm Digital ICs
Authors:
J. L. Rossello,
V. Canals,
S. A. Bota,
A. Keshavarzi,
J. Segura
Abstract:
As technology scales down, the static power is expected to become a significant fraction of the total power. The exponential dependence of static power on the operating temperature makes the thermal profile estimation of high-performance ICs a key issue in computing the total power dissipated in next-generation technologies. In this paper we present accurate and compact analytical models to estimate the static power dissipation and the temperature of operation of CMOS gates. The models are the fundamentals of a performance estimation tool in which numerical procedures are avoided for any computation, enabling faster estimation and optimization. The models developed are compared to measurements and SPICE simulations for a 0.12 μm technology, showing excellent results.
Submitted 25 October, 2007;
originally announced October 2007.
-
Smart Temperature Sensor for Thermal Testing of Cell-Based ICs
Authors:
S. A. Bota,
M. Rosales,
J. L. Rossello,
J. Segura
Abstract:
In this paper we present a simple and efficient built-in temperature sensor for thermal monitoring of standard-cell based VLSI circuits. The proposed smart temperature sensor uses a ring-oscillator composed of complex gates instead of inverters to optimize its linearity. Simulation results from a 0.18$μ$m CMOS technology show that the non-linearity error of the sensor can be reduced when an adequate set of standard logic gates is selected.
Submitted 25 October, 2007;
originally announced October 2007.