Search | arXiv e-print repository

System-level Impact of Non-Ideal Program-Time of Charge Trap Flash (CTF) on Deep Neural Network

Authors: S. Shrivastava, A. Biswas, S. Chakrabarty, G. Dash, V. Saraswat, U. Ganguly

Abstract: Learning of deep neural networks (DNN) using Resistive Processing Unit (RPU) architecture is energy-efficient as it utilizes dedicated neuromorphic hardware and stochastic computation of weight updates for in-memory computing. Charge Trap Flash (CTF) devices can implement RPU-based weight updates in DNNs. However, prior work has shown that the weight updates (V_T) in CTF-based RPU are impacted by the non-ideal program time of CTF. The non-ideal program time is affected by two factors: first, the number of input pulses (N) or pulse width (pw), and second, the gap between successive update pulses (t_gap) used for the stochastic computation of weight updates. Therefore, the impact of this non-ideal program time must be studied for neural network training simulations. In this study, we first propose a pulse-train design compensation technique to reduce the total error caused by the non-ideal program time of CTF and the stochastic variance of a network. Second, we simulate RPU-based DNN with non-ideal program time of CTF on the MNIST and Fashion-MNIST datasets. We find that for larger N (~1000), learning performance approaches the ideal (software-level) training level and, therefore, is not much impacted by the choice of t_gap used to implement RPU-based weight updates. However, for lower N (<500), learning performance depends on the t_gap of the pulses. Finally, we perform an ablation study to isolate the causal factor of the improved learning performance. We conclude that the lower noise level in the weight updates is most likely the significant factor in improving the learning performance of DNN. Thus, our study attempts to compensate for the error caused by non-ideal program time and standardize the pulse length (N) and pulse gap (t_gap) specifications for CTF-based RPUs for accurate system-level on-chip training.
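
The link between a larger N and lower update noise can be illustrated with a minimal sketch of stochastic pulse coincidence. This is not the authors' simulation: it ignores the CTF program-time non-ideality and t_gap entirely, and the function names and the firing probabilities `p_x` and `p_d` are assumptions made for illustration. It only shows that the relative spread of the coincidence count shrinks as the pulse-train length grows.

```python
import random
import statistics


def coincidence_counts(p_x, p_d, n_pulses, n_trials=500, seed=0):
    """Count coincident pulses between two independent stochastic pulse trains.

    p_x, p_d  -- hypothetical per-slot firing probabilities of the two trains
    n_pulses  -- pulse-train length (N in the abstract)
    n_trials  -- number of independent weight updates to simulate
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_trials):
        # A slot contributes to the update only when both trains fire in it.
        hits = sum(
            1
            for _ in range(n_pulses)
            if rng.random() < p_x and rng.random() < p_d
        )
        counts.append(hits)
    return counts


def relative_noise(p_x, p_d, n_pulses, n_trials=500, seed=0):
    """Standard deviation of the coincidence count divided by its mean."""
    counts = coincidence_counts(p_x, p_d, n_pulses, n_trials, seed)
    return statistics.pstdev(counts) / statistics.mean(counts)


if __name__ == "__main__":
    for n in (100, 500, 1000):
        print(n, round(relative_noise(0.5, 0.5, n), 3))
```

Under these assumptions the count is binomial, so its relative noise scales roughly as 1/sqrt(N), which is consistent with the abstract's observation that longer pulse trains give less noisy updates.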

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2304.03124

FeFET-based MirrorBit cell for High-density NVM storage

Authors: Paritosh Meihar, Rowtu Srinu, Vivek Saraswat, Sandip Lashkare, Halid Mulaosmanovic, Ajay Kumar Singh, Stefan Dünkel, Sven Beyer, Udayan Ganguly

Abstract: HfO2-based ferroelectric field-effect transistors (FeFETs) have attracted considerable attention for non-volatile memory applications because of their low power, fast switching speed, high scalability, and CMOS compatibility. In this work, we show an n-channel FeFET-based multibit memory, termed MirrorBit, which effectively doubles the chip density by programming gradient ferroelectric polarizations in the gate using an appropriate biasing scheme. We have experimentally demonstrated MirrorBit on GlobalFoundries HfO2-based FeFET devices fabricated in 28 nm bulk HKMG CMOS technology. Retention of MirrorBit states has been shown up to $10^5$ s at different temperatures, and the endurance is found to be more than $10^3$ cycles. A TCAD simulation is also presented to explain the origin and working of MirrorBit states, based on a FeFET model calibrated using the GlobalFoundries FeFET device. We have also proposed the array-level implementation and sensing methodology of the MirrorBit memory. Thus, we have converted a 1-bit FeFET into a 2-bit FeFET using a particular programming scheme in an existing FeFET, without needing any notable alteration of the fabrication process, to double the chip density for high-density non-volatile memory storage.

Submitted 14 September, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

Comments: 6 pages, 9 figures

arXiv:2111.02885

Stochasticity Invariance Control in Pr$_{1-x}$Ca$_x$MnO$_3$ RRAM to enable Large-Scale Stochastic Recurrent Neural Networks

Authors: Vivek Saraswat, Udayan Ganguly

Abstract: Emerging non-volatile memories have been proposed for a wide range of applications, from easing the von Neumann bottleneck to neuromorphic applications. Specifically, scalable RRAMs based on Pr$_{1-x}$Ca$_x$MnO$_3$ (PCMO) that exhibit analog switching have been demonstrated as an integrating neuron, an analog synapse, and a voltage-controlled oscillator. More recently, the inherent stochasticity of memristors has been proposed for efficient hardware implementations of Boltzmann Machines. However, as the problem size scales, the number of neurons increases, and controlling the stochastic distribution tightly over many iterations becomes necessary. This requires parametric control over stochasticity. Here, we characterize the stochastic Set in PCMO RRAMs. We identify that the Set time distribution depends on the internal state of the device (i.e., resistance) in addition to the external input (i.e., voltage pulse). This requires the confluence of contradictory properties, namely stochastic switching and deterministic state control, in the same device. Unlike "stochastic-everywhere" filamentary memristors, PCMO RRAMs allow us to leverage (i) the stochastic Set in negative polarity and (ii) the deterministic analog Reset in positive polarity to demonstrate a 100x reduction in Set time distribution drift. The impact on Boltzmann Machines' performance is analyzed: as opposed to "fixed external input stochasticity", "state-monitored stochasticity" can solve problems 20x larger in size. State monitoring also tunes out the effect of device-to-device variability on the distributions, providing 10x better performance. In addition to the physical insights, this study establishes that experimental stochasticity in PCMO RRAMs can be used reliably in stochastic recurrent neural networks over many iterations.

Submitted 2 November, 2021; originally announced November 2021.

arXiv:2105.01358

Simplified Klinokinesis using Spiking Neural Networks for Resource-Constrained Navigation on the Neuromorphic Processor Loihi

Authors: Apoorv Kishore, Vivek Saraswat, Udayan Ganguly

Abstract: C. elegans shows chemotaxis using klinokinesis, where the worm uses a single concentration sensor to compute the concentration gradient and forages through gradient ascent/descent towards the target concentration, followed by contour tracking. The biomimetic implementation requires complex neurons with multiple ion channel dynamics as well as interneurons for control. While this is a key capability of autonomous robots, its implementation on energy-efficient neuromorphic hardware like Intel's Loihi requires adaptation of the network to hardware-specific constraints, which has not been achieved. In this paper, we demonstrate the adaptation of chemotaxis based on klinokinesis to Loihi by implementing the necessary neuronal dynamics with only LIF neurons, as well as a complete spike-based implementation of all functions, e.g., the Heaviside function and subtractions. Our results show that the Loihi implementation is equivalent to its software counterpart in Python in terms of performance, both during foraging and contour tracking. The Loihi results are also resilient in noisy environments. Thus, we demonstrate a successful adaptation of chemotaxis on Loihi, which can now be combined with the rich array of SNN blocks for SNN-based complex robotic control.

Submitted 4 May, 2021; originally announced May 2021.

arXiv:2104.14264

Hardware-Friendly Synaptic Orders and Timescales in Liquid State Machines for Speech Classification

Authors: Vivek Saraswat, Ajinkya Gorad, Anand Naik, Aakash Patil, Udayan Ganguly

Abstract: Liquid State Machines are brain-inspired spiking neural networks (SNNs) with random reservoir connectivity and bio-mimetic neuronal and synaptic models. Reservoir computing networks are proposed as an alternative to deep neural networks for solving temporal classification problems. Previous studies suggest that a 2nd order (double exponential) synaptic waveform is crucial for achieving high accuracy in TI-46 spoken digits recognition. The proposal of long-time-range (ms) bio-mimetic synaptic waveforms is a challenge for compact and power-efficient neuromorphic hardware. In this work, we analyze the role of synaptic orders, namely δ (high output for a single time step), 0th (rectangular with a finite pulse width), 1st (exponential fall), and 2nd order (exponential rise and fall), and of synaptic timescales on the reservoir output response and on the TI-46 spoken digits classification accuracy under a more comprehensive parameter sweep. We find the optimal operating point to be correlated with an optimal range of spiking activity in the reservoir. Further, the proposed 0th order synapses perform on par with the biologically plausible 2nd order synapses. This is a substantial relaxation for circuit designers, as synapses are the most abundant components in an in-memory implementation of SNNs. The circuit benefits for both analog and mixed-signal realizations of the 0th order synapse are highlighted, demonstrating 2-3 orders of magnitude savings in area and power consumption by eliminating Op-Amps and Digital-to-Analog Converter circuits. This has major implications for a complete neural network implementation, with a focus on peripheral limitations and the algorithmic simplifications that overcome them.
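
The four synaptic orders named in the abstract can be sketched as discrete-time response kernels for a single input spike. This is an illustrative reading of the parenthetical descriptions only, not the paper's model: the function names, the unit amplitudes, the unnormalized double-exponential form, and the time constants in the usage lines are all assumptions.

```python
import math


def delta_kernel(n_steps):
    """Delta synapse: high output for a single time step."""
    return [1.0 if k == 0 else 0.0 for k in range(n_steps)]


def zeroth_order_kernel(n_steps, width_steps):
    """0th order synapse: rectangular pulse with a finite width."""
    return [1.0 if k < width_steps else 0.0 for k in range(n_steps)]


def first_order_kernel(n_steps, dt, tau):
    """1st order synapse: instantaneous rise followed by an exponential fall."""
    return [math.exp(-k * dt / tau) for k in range(n_steps)]


def second_order_kernel(n_steps, dt, tau_fall, tau_rise):
    """2nd order synapse: exponential rise and fall (double exponential).

    Assumes tau_fall > tau_rise so that the response is non-negative.
    """
    return [
        math.exp(-k * dt / tau_fall) - math.exp(-k * dt / tau_rise)
        for k in range(n_steps)
    ]


if __name__ == "__main__":
    # Hypothetical values: 1 ms time step, 8 ms fall and 4 ms rise constants.
    steps, dt = 50, 1e-3
    second = second_order_kernel(steps, dt, tau_fall=8e-3, tau_rise=4e-3)
    print("2nd order kernel peaks at step", second.index(max(second)))
```

The 0th order kernel needs only a pulse width, whereas the 1st and 2nd order kernels require one or two decaying state variables per synapse, which gives a sense of why the rectangular form is cheaper to realize in hardware.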

Submitted 29 April, 2021; originally announced April 2021.

Showing 1–5 of 5 results for author: Saraswat, V