Search | arXiv e-print repository

doi 10.1109/TCSII.2022.3207378

Fully-Binarized, Parallel, RRAM-based Computing Primitive for In-Memory Similarity Search

Authors: Sandeep Kaur Kingra, Vivek Parmar, Deepak Verma, Alessandro Bricalli, Giuseppe Piccolboni, Gabriel Molas, Amir Regev, Manan Suri

Abstract: In this work, we propose a fully-binarized XOR-based IMSS (In-Memory Similarity Search) using RRAM (Resistive Random Access Memory) arrays. XOR (Exclusive OR) operation is realized using 2T-2R bitcells arranged along the column in an array. This enables simultaneous match operation across multiple stored data vectors by performing analog column-wise XOR operation and summation to compute HD (Hammi… ▽ More In this work, we propose a fully-binarized XOR-based IMSS (In-Memory Similarity Search) using RRAM (Resistive Random Access Memory) arrays. XOR (Exclusive OR) operation is realized using 2T-2R bitcells arranged along the column in an array. This enables simultaneous match operation across multiple stored data vectors by performing analog column-wise XOR operation and summation to compute HD (Hamming Distance). The proposed scheme is experimentally validated on fabricated RRAM arrays. Full-system validation is performed through SPICE simulations using open source Skywater 130 nm CMOS PDK demonstrating energy of 17 fJ per XOR operation using the proposed bitcell with a full-system power dissipation of 145 $μ$W. Using projected estimations at advanced nodes (28 nm) energy savings of $\approx$1.5$\times$ compared to the state-of-the-art can be observed for a fixed workload. Application-level validation is performed on HSI (Hyper-Spectral Image) pixel classification task using the Salinas dataset demonstrating an accuracy of 90%. △ Less

Submitted 18 September, 2022; v1 submitted 4 August, 2022; originally announced August 2022.

arXiv:2206.00250 [pdf, other]

doi 10.1109/TNANO.2022.3193921

Time-multiplexed In-memory computation scheme for map** Quantized Neural Networks on hybrid CMOS-OxRAM building blocks

Authors: Sandeep Kaur Kingra, Vivek Parmar, Manoj Sharma, Manan Suri

Abstract: In this work, we experimentally demonstrate two key building blocks for realizing Binary/Ternary Neural Networks (BNNs/TNNs): (i) 130 nm CMOS based sigmoidal neurons and (ii) HfOx based multi-level (MLC) OxRAM-synaptic blocks. An optimized vector matrix multiplication programming scheme that utilizes the two building blocks is also presented. Compared to prior approaches that utilize differential… ▽ More In this work, we experimentally demonstrate two key building blocks for realizing Binary/Ternary Neural Networks (BNNs/TNNs): (i) 130 nm CMOS based sigmoidal neurons and (ii) HfOx based multi-level (MLC) OxRAM-synaptic blocks. An optimized vector matrix multiplication programming scheme that utilizes the two building blocks is also presented. Compared to prior approaches that utilize differential synaptic structures, a single device per synapse with two sets of READ operations is used. Proposed hardware map** strategy shows performance change of <5% (decrease of 2-5% for TNN, increase of 0.2% for BNN) compared to ideal quantized neural networks (QNN) with significant memory savings in the order of 16-32x for classification problem on Fashion MNIST (FMNIST) dataset. Impact of OxRAM device variability on the performance of Hardware QNN (BNN/TNN) is also analyzed. △ Less

Submitted 28 July, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

arXiv:2006.05657 [pdf, other]

Methodology for Realizing VMM with Binary RRAM Arrays: Experimental Demonstration of Binarized-ADALINE Using OxRAM Crossbar

Authors: Sandeep Kaur Kingra, Vivek Parmar, Shubham Negi, Sufyan Khan, Boris Hudec, Tuo-Hung Hou, Manan Suri

Abstract: In this paper, we present an efficient hardware map** methodology for realizing vector matrix multiplication (VMM) on resistive memory (RRAM) arrays. Using the proposed VMM computation technique, we experimentally demonstrate a binarized-ADALINE (Adaptive Linear) classifier on an OxRAM crossbar. An 8x8 OxRAM crossbar with Ni/3-nm HfO2/7 nm Al-doped-TiO2/TiN device stack is used. Weight training… ▽ More In this paper, we present an efficient hardware map** methodology for realizing vector matrix multiplication (VMM) on resistive memory (RRAM) arrays. Using the proposed VMM computation technique, we experimentally demonstrate a binarized-ADALINE (Adaptive Linear) classifier on an OxRAM crossbar. An 8x8 OxRAM crossbar with Ni/3-nm HfO2/7 nm Al-doped-TiO2/TiN device stack is used. Weight training for the binarized-ADALINE classifier is performed ex-situ on UCI cancer dataset. Post weight generation the OxRAM array is carefully programmed to binary weight-states using the proposed weight map** technique on a custom-built testbench. Our VMM powered binarized-ADALINE network achieves a classification accuracy of 78% in simulation and 67% in experiments. Experimental accuracy was found to drop mainly due to crossbar inherent sneak-path issues and RRAM device programming variability. △ Less

Submitted 10 June, 2020; originally announced June 2020.

Comments: Accepted for presentation at the IEEE International Symposium on Circuits and Systems (ISCAS) 2020

arXiv:1811.05772 [pdf]

doi 10.1038/s41598-020-59121-0

SLIM: Simultaneous Logic-in-Memory Computing Exploiting Bilayer Analog OxRAM Devices

Authors: Sandeep Kaur Kingra, Vivek Parmar, Che-Chia Chang, Boris Hudec, Tuo-Hung Hou, Manan Suri

Abstract: Von Neumann architecture based computers isolate/physically separate computation and storage units i.e. data is shuttled between computation unit (processor) and memory unit to realize logic/ arithmetic and storage functions. This to-and-fro movement of data leads to a fundamental limitation of modern computers, known as the memory wall. Logic in-Memory (LIM) approaches aim to address this bottlen… ▽ More Von Neumann architecture based computers isolate/physically separate computation and storage units i.e. data is shuttled between computation unit (processor) and memory unit to realize logic/ arithmetic and storage functions. This to-and-fro movement of data leads to a fundamental limitation of modern computers, known as the memory wall. Logic in-Memory (LIM) approaches aim to address this bottleneck by computing inside the memory units and thereby eliminating the energy-intensive and time-consuming data movement. However, most LIM approaches reported in literature are not truly "simultaneous" as during LIM operation the bitcell can be used only as a Memory cell or only as a Logic cell. The bitcell is not capable of storing both the Memory/Logic outputs simultaneously. Here, we propose a novel 'Simultaneous Logic in-Memory' (SLIM) methodology that allows to implement both Memory and Logic operations simultaneously on the same bitcell in a non-destructive manner without losing the previously stored Memory state. Through extensive experiments we demonstrate the SLIM methodology using non-filamentary bilayer analog OxRAM devices with NMOS transistors (2T-1R bitcell). Detailed programming scheme, array level implementation and controller architecture are also proposed. Furthermore, to study the impact of introducing SLIM array in the memory hierarchy, a simple image processing application (edge detection) is also investigated. It has been estimated that by performing all computations inside the SLIM array, the total Energy Delay Product (EDP) reduces by ~ 40x in comparison to a modern-day computer. EDP saving owing to reduction in data transfer between CPU Memory is observed to be ~ 780x. △ Less

Submitted 13 February, 2020; v1 submitted 14 November, 2018; originally announced November 2018.

Journal ref: Sci Rep 10, 2567 (2020)

Showing 1–4 of 4 results for author: Kingra, S K