-
Thermal Heating in ReRAM Crossbar Arrays: Challenges and Solutions
Authors:
Kamilya Smagulova,
Mohammed E. Fouda,
Ahmed Eltawil
Abstract:
The higher speed, scalability, and parallelism offered by ReRAM crossbar arrays foster the development of ReRAM-based next-generation AI accelerators. At the same time, the sensitivity of ReRAM to temperature variations decreases the R_on/R_off ratio and negatively affects the accuracy and reliability of the hardware. Various works on temperature-aware optimization and remapping in ReRAM crossbar arrays have reported up to 58% improvement in accuracy and a 2.39× enhancement in ReRAM lifetime. This paper classifies the challenges caused by thermal heating, from constraints on ReRAM cells' dimensions and characteristics to their placement in the architecture. In addition, it reviews available solutions designed to mitigate the impact of these challenges, including emerging temperature-resilient DNN training methods. Our work also provides a summary of the techniques and their advantages and limitations.
Submitted 31 January, 2023; v1 submitted 28 December, 2022;
originally announced December 2022.
-
Resistive Neural Hardware Accelerators
Authors:
Kamilya Smagulova,
Mohammed E. Fouda,
Fadi Kurdahi,
Khaled Salama,
Ahmed Eltawil
Abstract:
Deep Neural Networks (DNNs), as a subset of Machine Learning (ML) techniques, make it possible to learn from real-world data and to make decisions in real time. However, their wide adoption is hindered by a number of software and hardware limitations. The existing general-purpose hardware platforms used to accelerate DNNs face new challenges associated with the growing amount of data and the exponentially increasing complexity of computations. Emerging non-volatile memory (NVM) devices and the processing-in-memory (PIM) paradigm are creating a new generation of hardware architectures with increased computing and storage capabilities. In particular, the shift towards ReRAM-based in-memory computing has great potential for area- and power-efficient inference and for training large-scale neural network architectures. These can accelerate the entry of IoT-enabled AI technologies into our daily life. In this survey, we review the state-of-the-art ReRAM-based many-core DNN accelerators and show their superiority over CMOS counterparts. The review covers different aspects of the hardware and software realization of DNN accelerators, their present limitations, and future prospects. In particular, a comparison of the accelerators shows the need for new performance metrics and benchmarking standards. In addition, the major concerns regarding the efficient design of accelerators include the lack of accuracy in simulation tools for software and hardware co-design.
Submitted 8 September, 2021;
originally announced September 2021.
-
Wafer Quality Inspection using Memristive LSTM, ANN, DNN and HTM
Authors:
Kazybek Adam,
Kamilya Smagulova,
Olga Krestinskaya,
Alex Pappachen James
Abstract:
Automated wafer inspection and quality control is a complex and time-consuming task, which can be sped up using neuromorphic memristive architectures, either as a separate inspection device or integrated directly into sensors. This paper presents a performance analysis and comparison of different neuromorphic architectures for patterned wafer quality inspection and classification. The application of non-volatile memristive devices in these architectures ensures low power consumption, small on-chip area, and scalability. We demonstrate that Long Short-Term Memory (LSTM) outperforms the other architectures for the same number of training iterations, and has relatively low on-chip area and power consumption.
Submitted 27 September, 2018;
originally announced September 2018.
-
Memristive LSTM network hardware architecture for time-series predictive modeling problem
Authors:
Kazybek Adam,
Kamilya Smagulova,
Alex Pappachen James
Abstract:
Analysis of time-series data makes it possible to identify long-term trends and make predictions that can help to improve our lives. With the rapid development of artificial neural networks, the long short-term memory (LSTM) recurrent neural network (RNN) configuration has been found capable of dealing with time-series forecasting problems where data points are time-dependent and exhibit seasonality trends. The gated structure of the LSTM cell and the flexibility in network topology (one-to-many, many-to-one, etc.) make it possible to model systems with multiple input variables and to control several parameters, such as the size of the look-back window used to make a prediction and the number of time steps to be predicted. These features make LSTM an attractive alternative to conventional methods such as autoregression models, the simple average, moving average, naive approach, ARIMA, Holt's linear trend method, the Holt-Winters seasonal method, and others. In this paper, we propose a hardware implementation of an LSTM network architecture for the time-series forecasting problem. All simulations were performed using TSMC 0.18 μm CMOS technology and the HP memristor model.
Submitted 9 September, 2018;
originally announced September 2018.
-
Design of CMOS-memristor Circuits for LSTM architecture
Authors:
Kamilya Smagulova,
Kazybek Adam,
Olga Krestinskaya,
Alex Pappachen James
Abstract:
Long Short-Term Memory (LSTM) architecture is a well-known approach for building recurrent neural networks (RNNs), useful for sequential data processing in applications such as natural language processing. The near-sensor hardware implementation of LSTM is challenging due to its large parallelism and complexity. We propose a 0.18 μm CMOS, GST memristor LSTM hardware architecture for near-sensor processing. The proposed system is validated on a forecasting problem based on a Keras model.
Submitted 6 June, 2018;
originally announced June 2018.
-
CMOS-Memristor Hybrid Integrated Pixel Sensors
Authors:
Kamilya Smagulova,
Aigerim Tankimanova,
Alex Pappachen James
Abstract:
An increase in image resolution requires image sensors to pack a larger number of circuit components into a given area. On the other hand, high-speed processing of signals from the sensors requires pixels that can carry out pixel-parallel operations. In this paper, we propose modified 3T and 4T CMOS wide dynamic range pixels, which we refer to as the 2T-M and 3T-M configurations, comprising MOSFETs and memristors. The low leakage currents and small area of memristors help to achieve the objective of reducing the area, while the possibility of creating arrays of memristors and MOSFETs across different layers within the chip makes it possible to scale the circuit architecture.
Submitted 15 October, 2016;
originally announced October 2016.
-
CMOS-Memristor Dendrite Threshold Circuits
Authors:
Askhat Zhanbossinov,
Kamilya Smagulova,
Alex Pappachen James
Abstract:
Non-linear neuron models overcome the limitations of linear binary neuron models, which cannot compute linearly non-separable functions such as XOR. While several biologically plausible models based on dendrite thresholds have been reported in previous studies, the hardware implementation of such non-linear neuron models remains an open problem. In this paper, we propose a circuit design for implementing the logical dendrite non-linearity response of the dendrite spike and saturation types. The proposed dendrite cells are used to build an XOR circuit and an intensity detection circuit that consist of different combinations of dendrite cells with saturating and spiking responses. The dendrite cells are designed using a set of memristors, Zener diodes, and CMOS NOT gates. The circuits are designed, analyzed, and verified on circuit boards.
Submitted 16 September, 2016;
originally announced September 2016.