-
A Survey of Graph and Attention Based Hyperspectral Image Classification Methods for Remote Sensing Data
Authors:
Aryan Vats,
Manan Suri
Abstract:
The use of Deep Learning techniques for classification in Hyperspectral Imaging (HSI) is rapidly growing and achieving improved performances. Due to the nature of the data captured by sensors that produce HSI images, a common issue is the dimensionality of the bands that may or may not contribute to the label class distinction. Due to the widespread nature of class labels, Principal Component Analysis is a common method used for reducing the dimensionality. However, there may exist methods that incorporate all bands of the Hyperspectral image with the help of the Attention mechanism. Furthermore, to yield better spectral spatial feature extraction, recent methods have also explored the usage of Graph Convolution Networks and their unique ability to use node features in prediction, which is akin to the pixel spectral makeup. In this survey we present a comprehensive summary of Graph based and Attention based methods to perform Hyperspectral Image Classification for remote sensing and aerial HSI images. We also summarize relevant datasets on which these techniques have been evaluated and benchmark the processing techniques.
Submitted 15 October, 2023;
originally announced October 2023.
-
Optimized Implementation of Neuromorphic HATS Algorithm on FPGA
Authors:
Khushal Sethi,
Manan Suri
Abstract:
In this paper, we present the first-ever optimized hardware implementation of a state-of-the-art neuromorphic approach, the Histogram of Averaged Time Surfaces (HATS) algorithm, for event-based object classification on FPGA for asynchronous time-based image sensors (ATIS). Our implementation achieves a latency of 3.3 ms for the N-CARS dataset samples and is capable of processing 2.94 Mevts/s. Speed-up is achieved by using parallelism in the design, and multiple Processing Elements can be added. The Zynq-7000 SoC from Xilinx is used as the development platform. The tradeoff between Average Absolute Error and Resource Utilization for the fixed-precision implementation is analyzed and presented. The proposed FPGA implementation is $\sim$32x more power-efficient than the software implementation.
Submitted 12 September, 2023;
originally announced September 2023.
-
ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER
Authors:
Sreyan Ghosh,
Utkarsh Tyagi,
Manan Suri,
Sonal Kumar,
S Ramaneswaran,
Dinesh Manocha
Abstract:
Complex Named Entity Recognition (NER) is the task of detecting linguistically complex named entities in low-context text. In this paper, we present ACLM (Attention-map aware keyword selection for Conditional Language Model fine-tuning), a novel data augmentation approach based on conditional generation to address the data scarcity problem in low-resource complex NER. ACLM alleviates the context-entity mismatch issue, a problem existing NER data augmentation techniques suffer from and often generates incoherent augmentations by placing complex named entities in the wrong context. ACLM builds on BART and is optimized on a novel text reconstruction or denoising task - we use selective masking (aided by attention maps) to retain the named entities and certain keywords in the input sentence that provide contextually relevant additional knowledge or hints about the named entities. Compared with other data augmentation strategies, ACLM can generate more diverse and coherent augmentations preserving the true word sense of complex entities in the sentence. We demonstrate the effectiveness of ACLM both qualitatively and quantitatively on monolingual, cross-lingual, and multilingual complex NER across various low-resource settings. ACLM outperforms all our neural baselines by a significant margin (1%-36%). In addition, we demonstrate the application of ACLM to other domains that suffer from data scarcity (e.g., biomedical). In practice, ACLM generates more effective and factual augmentations for these domains than prior methods. Code: https://github.com/Sreyan88/ACLM
Submitted 1 June, 2023;
originally announced June 2023.
-
The Geometry of Multilingual Language Models: An Equality Lens
Authors:
Cheril Shah,
Yashashree Chandak,
Manan Suri
Abstract:
Understanding the representations of different languages in multilingual language models is essential for comprehending their cross-lingual properties, predicting their performance on downstream tasks, and identifying any biases across languages. In our study, we analyze the geometry of three multilingual language models in Euclidean space and find that all languages are represented by unique geometries. Using a geometric separability index we find that although languages tend to be closer according to their linguistic family, they are almost separable with languages from other families. We also introduce a Cross-Lingual Similarity Index to measure the distance of languages with each other in the semantic space. Our findings indicate that the low-resource languages are not represented as well as high-resource languages in any of the models.
Submitted 13 May, 2023;
originally announced May 2023.
-
CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network
Authors:
Sreyan Ghosh,
Manan Suri,
Purva Chiniya,
Utkarsh Tyagi,
Sonal Kumar,
Dinesh Manocha
Abstract:
The tremendous growth of social media users interacting in online conversations has led to significant growth in hate speech, affecting people from various demographics. Most of the prior works focus on detecting explicit hate speech, which is overt and leverages hateful phrases, with very little work focusing on detecting hate speech that is implicit or denotes hatred through indirect or coded language. In this paper, we present CoSyn, a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations. CoSyn introduces novel ways to encode these external contexts and employs a novel context interaction mechanism that clearly captures the interplay between them, making independent assessments of the amounts of information to be retrieved from these noisy contexts. Additionally, it carries out all these operations in the hyperbolic space to account for the scale-free dynamics of social media. We demonstrate the effectiveness of CoSyn on 6 hate speech datasets and show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.
Submitted 24 October, 2023; v1 submitted 2 March, 2023;
originally announced March 2023.
-
WADER at SemEval-2023 Task 9: A Weak-labelling framework for Data augmentation in tExt Regression Tasks
Authors:
Manan Suri,
Aaryak Garg,
Divya Chaudhary,
Ian Gorton,
Bijendra Kumar
Abstract:
Intimacy is an essential element of human relationships and language is a crucial means of conveying it. Textual intimacy analysis can reveal social norms in different contexts and serve as a benchmark for testing computational models' ability to understand social information. In this paper, we propose a novel weak-labeling strategy for data augmentation in text regression tasks called WADER. WADER uses data augmentation to address the problems of data imbalance and data scarcity and provides a method for data augmentation in cross-lingual, zero-shot tasks. We benchmark the performance of State-of-the-Art pre-trained multilingual language models using WADER and analyze the use of sampling techniques to mitigate bias in data and optimally select augmentation candidates. Our results show that WADER outperforms the baseline model and provides a direction for mitigating data imbalance and scarcity in text regression tasks.
Submitted 5 March, 2023;
originally announced March 2023.
-
A novel multimodal dynamic fusion network for disfluency detection in spoken utterances
Authors:
Sreyan Ghosh,
Utkarsh Tyagi,
Sonal Kumar,
Manan Suri,
Rajiv Ratn Shah
Abstract:
Disfluency, though originating from human spoken utterances, is primarily studied as a uni-modal text-based Natural Language Processing (NLP) task. Based on early-fusion and self-attention-based multimodal interaction between text and acoustic modalities, in this paper, we propose a novel multimodal architecture for disfluency detection from individual utterances. Our architecture leverages a multimodal dynamic fusion network that adds minimal parameters over an existing text encoder commonly used in prior art to leverage the prosodic and acoustic cues hidden in speech. Through experiments, we show that our proposed model achieves state-of-the-art results on the widely used English Switchboard for disfluency detection and outperforms prior unimodal and multimodal systems in literature by a significant margin. In addition, we make a thorough qualitative analysis and show that, unlike text-only systems, which suffer from spurious correlations in the data, our system overcomes this problem through additional cues from speech signals. We make all our codes publicly available on GitHub.
Submitted 26 November, 2022;
originally announced November 2022.
-
Exploiting Nanoelectronic Properties of Memory Chips for Prevention of IC Counterfeiting
Authors:
Supriya Chakraborty,
Tamoghno Das,
Manan Suri
Abstract:
This study presents a methodology for anticounterfeiting of Non-Volatile Memory (NVM) chips. In particular, we experimentally demonstrate a generalized methodology for detecting (i) Integrated Circuit (IC) origin, (ii) recycled or used NVM chips, and (iii) identification of used locations (addresses) in the chip. Our proposed methodology inspects latency and variability signatures of Commercial-Off-The-Shelf (COTS) NVM chips. The proposed technique requires low-cycle (~100) pre-conditioning and utilizes Machine Learning (ML) algorithms. We observe different trends in evolution of latency (sector erase or page write) with cycling on different NVM technologies from different vendors. ML assisted approach is utilized for detecting IC manufacturers with 95.1 % accuracy obtained on prepared test dataset consisting of 3 different NVM technologies including 6 different manufacturers (9 types of chips).
Submitted 9 September, 2022;
originally announced September 2022.
-
Low-Power Hardware-Based Deep-Learning Diagnostics Support Case Study
Authors:
Khushal Sethi,
Vivek Parmar,
Manan Suri
Abstract:
Deep learning research has generated widespread interest leading to emergence of a large variety of technological innovations and applications. As significant proportion of deep learning research focuses on vision based applications, there exists a potential for using some of these techniques to enable low-power portable health-care diagnostic support solutions. In this paper, we propose an embedded-hardware-based implementation of microscopy diagnostic support system for PoC case study on: (a) Malaria in thick blood smears, (b) Tuberculosis in sputum samples, and (c) Intestinal parasite infection in stool samples. We use a Squeeze-Net based model to reduce the network size and computation time. We also utilize the Trained Quantization technique to further reduce memory footprint of the learned models. This enables microscopy-based detection of pathogens that classifies with laboratory expert level accuracy as a standalone embedded hardware platform. The proposed implementation is 6x more power-efficient compared to conventional CPU-based implementation and has an inference time of $\sim$ 3 ms/sample.
Submitted 3 September, 2022;
originally announced September 2022.
-
Fully-Binarized, Parallel, RRAM-based Computing Primitive for In-Memory Similarity Search
Authors:
Sandeep Kaur Kingra,
Vivek Parmar,
Deepak Verma,
Alessandro Bricalli,
Giuseppe Piccolboni,
Gabriel Molas,
Amir Regev,
Manan Suri
Abstract:
In this work, we propose a fully-binarized XOR-based IMSS (In-Memory Similarity Search) using RRAM (Resistive Random Access Memory) arrays. XOR (Exclusive OR) operation is realized using 2T-2R bitcells arranged along the column in an array. This enables simultaneous match operation across multiple stored data vectors by performing analog column-wise XOR operation and summation to compute HD (Hamming Distance). The proposed scheme is experimentally validated on fabricated RRAM arrays. Full-system validation is performed through SPICE simulations using open source Skywater 130 nm CMOS PDK demonstrating energy of 17 fJ per XOR operation using the proposed bitcell with a full-system power dissipation of 145 $μ$W. Using projected estimations at advanced nodes (28 nm) energy savings of $\approx$1.5$\times$ compared to the state-of-the-art can be observed for a fixed workload. Application-level validation is performed on HSI (Hyper-Spectral Image) pixel classification task using the Salinas dataset demonstrating an accuracy of 90%.
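The XOR-and-sum idea in this abstract can be illustrated in software. The sketch below is only a functional analogue, not the RRAM circuit or the authors' implementation: it assumes each stored vector and the query are packed into Python integers, and the names `hamming_distance` and `nearest` are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count differing bits: XOR marks mismatches, then the set bits are summed."""
    return bin(a ^ b).count("1")


def nearest(stored: list[int], query: int) -> tuple[int, int]:
    """Return (index, distance) of the stored vector closest to the query.

    In hardware the per-vector distances would be produced in parallel;
    here they are simply computed one after another.
    """
    distances = [hamming_distance(v, query) for v in stored]
    best = min(range(len(stored)), key=distances.__getitem__)
    return best, distances[best]


if __name__ == "__main__":
    stored = [0b0110, 0b1100, 0b0111]
    print(nearest(stored, 0b0111))
```

A smaller Hamming distance means a closer match, so similarity search reduces to finding the minimum over the stored vectors.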
Submitted 18 September, 2022; v1 submitted 4 August, 2022;
originally announced August 2022.
-
Memory-Oriented Design-Space Exploration of Edge-AI Hardware for XR Applications
Authors:
Vivek Parmar,
Syed Shakib Sarwar,
Ziyun Li,
Hsien-Hsin S. Lee,
Barbara De Salvo,
Manan Suri
Abstract:
Low-Power Edge-AI capabilities are essential for on-device extended reality (XR) applications to support the vision of Metaverse. In this work, we investigate two representative XR workloads: (i) Hand detection and (ii) Eye segmentation, for hardware design space exploration. For both applications, we train deep neural networks and analyze the impact of quantization and hardware specific bottlenecks. Through simulations, we evaluate a CPU and two systolic inference accelerator implementations. Next, we compare these hardware solutions with advanced technology nodes. The impact of integrating state-of-the-art emerging non-volatile memory technology (STT/SOT/VGSOT MRAM) into the XR-AI inference pipeline is evaluated. We found that significant energy benefits (>=24%) can be achieved for hand detection (IPS=10) and eye segmentation (IPS=0.1) by introducing non-volatile memory in the memory hierarchy for designs at 7nm node while meeting minimum IPS (inference per second). Moreover, we can realize substantial reduction in area (>=30%) owing to the small form factor of MRAM compared to traditional SRAM.
Submitted 28 March, 2023; v1 submitted 8 June, 2022;
originally announced June 2022.
-
Low Power Neuromorphic EMG Gesture Classification
Authors:
Sai Sukruth Bezugam,
Ahmed Shaban,
Manan Suri
Abstract:
EMG (Electromyograph) signal based gesture recognition can prove vital for applications such as smart wearables and bio-medical neuro-prosthetic control. Spiking Neural Networks (SNNs) are promising for low-power, real-time EMG gesture recognition, owing to their inherent spike/event driven spatio-temporal dynamics. In literature, there are limited demonstrations of neuromorphic hardware implementation (at full chip/board/system scale) for EMG gesture classification. Moreover, most literature attempts exploit primitive SNNs based on LIF (Leaky Integrate and Fire) neurons. In this work, we address the aforementioned gaps with the following key contributions: (1) Low-power, high accuracy demonstration of EMG-signal based gesture recognition using neuromorphic Recurrent Spiking Neural Networks (RSNN). In particular, we propose a multi-time scale recurrent neuromorphic system based on special double-exponential adaptive threshold (DEXAT) neurons. Our network achieves state-of-the-art classification accuracy (90%) while using ~53% fewer neurons than the best reported prior art on the Roshambo EMG dataset. (2) A new multi-channel spike encoder scheme for efficient processing of real-valued EMG data on neuromorphic systems. (3) A unique multi-compartment methodology to implement complex adaptive neurons on Intel's dedicated neuromorphic Loihi chip is shown. (4) RSNN implementation on Loihi (Nahuku 32) achieves significant energy/latency benefits of ~983X/19X compared to GPU for a batch size of 50.
Submitted 4 June, 2022;
originally announced June 2022.
-
Time-multiplexed In-memory computation scheme for mapping Quantized Neural Networks on hybrid CMOS-OxRAM building blocks
Authors:
Sandeep Kaur Kingra,
Vivek Parmar,
Manoj Sharma,
Manan Suri
Abstract:
In this work, we experimentally demonstrate two key building blocks for realizing Binary/Ternary Neural Networks (BNNs/TNNs): (i) 130 nm CMOS based sigmoidal neurons and (ii) HfOx based multi-level (MLC) OxRAM-synaptic blocks. An optimized vector matrix multiplication programming scheme that utilizes the two building blocks is also presented. Compared to prior approaches that utilize differential synaptic structures, a single device per synapse with two sets of READ operations is used. Proposed hardware mapping strategy shows performance change of <5% (decrease of 2-5% for TNN, increase of 0.2% for BNN) compared to ideal quantized neural networks (QNN) with significant memory savings in the order of 16-32x for classification problem on Fashion MNIST (FMNIST) dataset. Impact of OxRAM device variability on the performance of Hardware QNN (BNN/TNN) is also analyzed.
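One plausible reading of the 16-32x memory figure is simple bit-width arithmetic. The sketch below assumes a 32-bit baseline weight, 1 bit per binary weight and 2 bits per ternary weight; the abstract does not state these assumptions, and the threshold rule and function names are hypothetical, not the authors' quantization scheme.

```python
def binarize(weights: list[float]) -> list[int]:
    """Map each weight to +1 or -1 by sign (zero is mapped to +1 by convention)."""
    return [1 if w >= 0 else -1 for w in weights]


def ternarize(weights: list[float], threshold: float) -> list[int]:
    """Map each weight to -1, 0 or +1; small magnitudes collapse to 0."""
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def memory_saving(baseline_bits: int, quantized_bits: int) -> float:
    """Ratio of storage per weight before and after quantization."""
    return baseline_bits / quantized_bits


if __name__ == "__main__":
    w = [0.3, -0.2, 0.05]
    print(binarize(w), memory_saving(32, 1))        # binary weights, 1 bit each
    print(ternarize(w, 0.1), memory_saving(32, 2))  # ternary weights, 2 bits each
```

Under these assumptions a binary network stores 32x less weight data and a ternary network 16x less, which matches the range quoted in the abstract.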
Submitted 28 July, 2022; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Bird-Area Water-Bodies Dataset (BAWD) and Predictive AI Model for Avian Botulism Outbreak (AVI-BoT)
Authors:
Narayani Bhatia,
Devang Mahesh,
Jashandeep Singh,
Manan Suri
Abstract:
Avian botulism is a paralytic bacterial disease in birds often leading to high fatality. In-vitro diagnostic techniques such as Mouse Bioassay, ELISA, PCR are usually non-preventive, post-mortem in nature, and require invasive sample collection from affected sites or dead birds. In this study, we build a first-ever multi-spectral, remote-sensing imagery based global Bird-Area Water-bodies Dataset (BAWD) (i.e. fused satellite images of warm-water lakes/marshy-lands or similar water-body sites that are important for avian fauna) backed by on-ground reporting evidence of outbreaks. BAWD consists of 16 topographically diverse global sites monitored over a time-span of 4 years (2016-2021). We propose a first-ever Artificial Intelligence based (AI) model to predict potential outbreak of Avian botulism called AVI-BoT (Aerosol Visible, Infra-red (NIR/SWIR) and Bands of Thermal). We also train and investigate a simpler (5-band) Causative-Factor model (based on prominent physiological factors reported in literature) to predict Avian botulism. AVI-BoT demonstrates a training accuracy of 0.96 and validation accuracy of 0.989 on BAWD, far superior in comparison to our model based on causative factors. We also perform an ablation study and perform a detailed feature-space analysis. We further analyze three test case study locations - Lower Klamath National Wildlife Refuge and Langvlei and Rondevlei lakes where an outbreak had occurred, and Pong Dam where an outbreak had not occurred and confirm predictions with on-ground reportings. The proposed technique presents a scale-able, low-cost, non-invasive methodology for continuous monitoring of bird-habitats against botulism outbreaks with the potential of saving valuable fauna lives.
Submitted 17 November, 2022; v1 submitted 3 May, 2021;
originally announced May 2021.
-
NV-Fogstore : Device-aware hybrid caching in fog computing environments
Authors:
Khushal Sethi,
Manan Suri
Abstract:
Edge caching via the placement of distributed storages throughout the network is a promising solution to reduce latency and network costs of content delivery. With the advent of the upcoming 5G future, billions of F-RAN (Fog-Radio Access Network) nodes will be created and used for the purpose of Edge Caching. Hence, the total amount of memory deployed at the edge is expected to increase 100 times.
Currently used DRAM-based caches in CDN (Content Delivery Networks) are extremely power-hungry and costly. Our purpose is to reduce the cost of ownership and recurring costs (of power consumption) in an F-RAN node while maintaining Quality of Service.
For our purpose, we propose NV-FogStore, a scalable hybrid key-value storage architecture for the utilization of Non-Volatile Memories (such as RRAM, MRAM, Intel Optane) in Edge Cache.
We further describe in detail a novel, hierarchical, write-damage, size and frequency aware content caching policy H-GREEDY for our architecture.
We show that our policy can be tuned as per performance objectives to lower the power, energy consumption and total cost compared with a DRAM-only system, for a relatively small trade-off in the average access latency.
Submitted 20 October, 2020;
originally announced October 2020.
-
Exploration of Optimized Semantic Segmentation Architectures for edge-Deployment on Drones
Authors:
Vivek Parmar,
Narayani Bhatia,
Shubham Negi,
Manan Suri
Abstract:
In this paper, we present an analysis on the impact of network parameters for semantic segmentation architectures in context of UAV data processing. We present the analysis on the DroneDeploy Segmentation benchmark. Based on the comparative analysis we identify the optimal network architecture to be FPN-EfficientNetB3 with pretrained encoder backbones based on Imagenet Dataset. The network achieves IoU score of 0.65 and F1-score of 0.71 over the validation dataset. We also compare the various architectures in terms of their memory footprint and inference latency with further exploration of the impact of TensorRT based optimizations. We achieve memory savings of ~4.1x and latency improvement of 10% compared to Model: FPN and Backbone: InceptionResnetV2.
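For readers unfamiliar with the two metrics reported in this abstract, the sketch below shows the standard per-class definitions of IoU and F1 on flattened binary masks. It is a generic illustration, not the benchmark's evaluation code; how the paper averages over classes or images is not stated here, and returning 1.0 when both masks are empty is a convention chosen for this example.

```python
def confusion_counts(pred: list[int], truth: list[int]) -> tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for 0/1 masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, fn


def iou_score(pred: list[int], truth: list[int]) -> float:
    """Intersection over union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0  # assumed convention for empty masks


def f1_score(pred: list[int], truth: list[int]) -> float:
    """F1 (Dice) score: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # assumed convention for empty masks


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1]
    truth = [1, 0, 0, 1, 1]
    print(iou_score(pred, truth), f1_score(pred, truth))
```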
Submitted 6 July, 2020;
originally announced July 2020.
-
Unified Characterization Platform for Emerging NVM Technology: Neural Network Application Benchmarking Using off-the-shelf NVM Chips
Authors:
Supriya Chakraborty,
Abhishek Gupta,
Manan Suri
Abstract:
In this paper, we present a unified FPGA based electrical test-bench for characterizing different emerging NonVolatile Memory (NVM) chips. In particular, we present detailed electrical characterization and benchmarking of multiple commercially available, off-the-shelf, NVM chips viz.: MRAM, FeRAM, CBRAM, and ReRAM. We investigate important NVM parameters such as: (i) current consumption patterns, (ii) endurance, and (iii) error characterization. The proposed FPGA based testbench is then utilized for a Proof-of-Concept (PoC) Neural Network (NN) image classification application. Four emerging NVM chips are benchmarked against standard SRAM and Flash technology for the AI application as active weight memory during inference mode.
Submitted 10 June, 2020;
originally announced June 2020.
-
Methodology for Realizing VMM with Binary RRAM Arrays: Experimental Demonstration of Binarized-ADALINE Using OxRAM Crossbar
Authors:
Sandeep Kaur Kingra,
Vivek Parmar,
Shubham Negi,
Sufyan Khan,
Boris Hudec,
Tuo-Hung Hou,
Manan Suri
Abstract:
In this paper, we present an efficient hardware mapping methodology for realizing vector matrix multiplication (VMM) on resistive memory (RRAM) arrays. Using the proposed VMM computation technique, we experimentally demonstrate a binarized-ADALINE (Adaptive Linear) classifier on an OxRAM crossbar. An 8x8 OxRAM crossbar with Ni/3-nm HfO2/7 nm Al-doped-TiO2/TiN device stack is used. Weight training for the binarized-ADALINE classifier is performed ex-situ on UCI cancer dataset. Post weight generation the OxRAM array is carefully programmed to binary weight-states using the proposed weight mapping technique on a custom-built testbench. Our VMM powered binarized-ADALINE network achieves a classification accuracy of 78% in simulation and 67% in experiments. Experimental accuracy was found to drop mainly due to crossbar inherent sneak-path issues and RRAM device programming variability.
Submitted 10 June, 2020;
originally announced June 2020.
-
Investigation of Data Deletion Vulnerabilities in NAND Flash Memory Based Storage
Authors:
Abhilash Garg,
Supriya Chakraborty,
Manoj Malik,
Devesh Kumar,
Satyajeet Singh,
Manan Suri
Abstract:
Semiconductor NAND Flash based memory technology dominates the electronic Non-Volatile storage media market. Though NAND Flash offers superior performance and reliability over conventional magnetic HDDs, it suffers from certain data-security vulnerabilities. Such vulnerabilities can expose sensitive information stored on the media to security risks. It is thus necessary to study in detail the fundamental reasons behind data-security vulnerabilities of NAND Flash for use in critical applications. In this paper, the problem of unreliable data-deletion/sanitization in commercial NAND Flash media is investigated along with the fundamental reasons leading to such vulnerabilities. Exhaustive software based data recovery experiments (multiple iterations) have been carried out on commercial NAND Flash storage media (8 GB and 16 GB) for different types of filesystems (NTFS and FAT) and OS specific delete/Erase instructions. 100% data recovery is obtained for Windows and Linux based delete/Erase commands. An inverse effect of performance enhancement techniques like wear levelling, bad block management etc. is also observed with the help of software based recovery experiments.
Submitted 21 January, 2020;
originally announced January 2020.
-
The Statistical Dictionary-based String Matching Problem
Authors:
M. Suri,
S. Rini
Abstract:
In the Dictionary-based String Matching (DSM) problem, a retrieval system has access to a source sequence and stores the position of a certain number of strings in a posting table. When a user inquires about the position of a string, the retrieval system, instead of searching in the source sequence directly, relies on the posting table to answer the query more efficiently. In this paper, the Statistical DSM problem is proposed as a statistical and information-theoretic formulation of the classic DSM problem in which both the source and the query have a statistical description while the strings stored in the posting sequence are described as a code. Through this formulation, we are able to define the efficiency of the retrieval system as the average cost of answering a user's query in the limit of a sufficiently long source sequence. This formulation is used to study the retrieval performance for two cases: (i) all the strings of a given length, referred to as k-grams, and (ii) prefix-free codes.
Submitted 22 November, 2018;
originally announced November 2018.
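The classic (non-statistical) DSM setup described in the abstract above, for the k-gram case, can be sketched as follows. This is a minimal illustration under assumptions: the function names `build_posting_table` and `lookup`, and the fallback to a direct scan for queries shorter than k, are hypothetical choices and are not taken from the paper, which studies the average query cost rather than any particular implementation.

```python
from collections import defaultdict


def build_posting_table(source, k):
    """Record the start positions of every length-k substring (k-gram) of source."""
    table = defaultdict(list)
    for i in range(len(source) - k + 1):
        table[source[i:i + k]].append(i)
    return dict(table)


def lookup(table, source, query, k):
    """Return all start positions of query in source, using the table to narrow the search."""
    if len(query) < k:
        # The table stores only k-grams, so fall back to scanning the source directly.
        return [i for i in range(len(source) - len(query) + 1)
                if source.startswith(query, i)]
    # Only positions where the first k-gram of the query occurs need to be verified.
    candidates = table.get(query[:k], [])
    return [i for i in candidates if source.startswith(query, i)]
```

The cost of a query is then governed by how many candidate positions the table returns, which is the kind of quantity a statistical description of the source and the query makes it possible to average.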
-
SLIM: Simultaneous Logic-in-Memory Computing Exploiting Bilayer Analog OxRAM Devices
Authors:
Sandeep Kaur Kingra,
Vivek Parmar,
Che-Chia Chang,
Boris Hudec,
Tuo-Hung Hou,
Manan Suri
Abstract:
Von Neumann architecture based computers physically separate computation and storage units, i.e. data is shuttled between the computation unit (processor) and the memory unit to realize logic/arithmetic and storage functions. This to-and-fro movement of data leads to a fundamental limitation of modern computers, known as the memory wall. Logic in-Memory (LIM) approaches aim to address this bottleneck by computing inside the memory units, thereby eliminating the energy-intensive and time-consuming data movement. However, most LIM approaches reported in literature are not truly "simultaneous", as during LIM operation the bitcell can be used only as a Memory cell or only as a Logic cell. The bitcell is not capable of storing both the Memory and Logic outputs simultaneously. Here, we propose a novel 'Simultaneous Logic in-Memory' (SLIM) methodology that allows both Memory and Logic operations to be implemented simultaneously on the same bitcell in a non-destructive manner, without losing the previously stored Memory state. Through extensive experiments we demonstrate the SLIM methodology using non-filamentary bilayer analog OxRAM devices with NMOS transistors (2T-1R bitcell). A detailed programming scheme, array level implementation and controller architecture are also proposed. Furthermore, to study the impact of introducing a SLIM array in the memory hierarchy, a simple image processing application (edge detection) is also investigated. It has been estimated that by performing all computations inside the SLIM array, the total Energy Delay Product (EDP) reduces by ~ 40x in comparison to a modern-day computer. The EDP saving owing to the reduction in data transfer between the CPU and memory is observed to be ~ 780x.
Submitted 13 February, 2020; v1 submitted 14 November, 2018;
originally announced November 2018.
-
Low-Power (1T1N) Skyrmionic Synapses for Spiking Neuromorphic Systems
Authors:
Tinish Bhattacharya,
Sai Li,
Yangqi Huang,
Wang Kang,
Weisheng Zhao,
Manan Suri
Abstract:
In this work, we propose a '1-transistor 1-nanotrack' (1T1N) synapse based on movement of magnetic skyrmions using spin polarised current pulses. The proposed synaptic bit-cell has 4 terminals and fully decoupled spike transmission and programming paths. With careful tuning of programming parameters we ensure multi-level non-volatile conductance evolution in the proposed skyrmionic synapse. Through micromagnetic simulations, we studied in detail the impact of programming conditions (current density, pulse width) on synaptic performance parameters such as number of conductance levels and energy per transition. The programming parameters chosen for all further analysis gave rise to a synapse with 7 distinct conductance states and 1.2 fJ per conductance state transition event. Exploiting bidirectional conductance modulation, the 1T1N synapse is able to undergo long-term potentiation (LTP) and depression (LTD) according to a simplified variant of the biological spike timing dependent plasticity (STDP) rule. We present a subthreshold CMOS spike generator circuit which, when coupled with a well known subthreshold integrator circuit, produces custom pre- and post-neuronal spike shapes, responsible for implementing unsupervised learning with the proposed 1T1N synaptic bit-cell and consuming ~ 0.25 pJ/event. A spiking neural network (SNN) incorporating the characteristics of the 1T1N synapse was simulated for two separate applications: pattern extraction from noisy video streams and MNIST classification.
Submitted 6 November, 2018;
originally announced November 2018.
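The idea of a simplified STDP rule acting on a synapse with a small number of discrete conductance states, as described in the abstract above, can be sketched as below. Only the count of 7 states comes from the abstract. The coincidence window `STDP_WINDOW_MS`, the one-level step per spike pair and the function name are assumptions made for illustration, and may differ from the rule actually used by the authors.

```python
N_LEVELS = 7            # number of conductance states reported in the abstract
STDP_WINDOW_MS = 20.0   # hypothetical coincidence window (assumed, not from the paper)


def stdp_update(level, t_pre_ms, t_post_ms, window_ms=STDP_WINDOW_MS):
    """Return the conductance level after one pre/post spike pair."""
    dt = t_post_ms - t_pre_ms
    if dt == 0 or abs(dt) > window_ms:
        return level  # simultaneous or too far apart: no change
    if dt > 0:
        # Pre-synaptic spike precedes post-synaptic spike: potentiation (LTP).
        return min(level + 1, N_LEVELS - 1)
    # Post-synaptic spike precedes pre-synaptic spike: depression (LTD).
    return max(level - 1, 0)
```

Clamping at the lowest and highest levels reflects the bounded, bidirectional conductance modulation of a multi-level device.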
-
MASTISK
Authors:
Tinish Bhattacharya,
Vivek Parmar,
Manan Suri
Abstract:
In this paper, we present MASTISK (MAchine-learning and Synaptic-plasticity Technology Integrated Simulation frameworK). MASTISK is an open-source, versatile and flexible tool developed in MATLAB for design exploration of dedicated neuromorphic hardware using nanodevices and hybrid CMOS-nanodevice circuits. MASTISK has a hierarchical organization capturing details at the level of devices, circuits (i.e. neurons or activation functions, synapses or weights) and architectures (i.e. topology, learning-rules, algorithms). In the current version, MASTISK provides a user-friendly interface for design and simulation of spiking neural networks (SNN) powered by spatio-temporal learning rules such as Spike-Timing Dependent Plasticity (STDP). Users may provide the network definition as a simple input parameter file, and the framework is capable of performing automated learning/inference simulations. Validation case-studies of the proposed open source simulator will be published in the proceedings of IJCNN 2018. The proposed framework offers new functionalities, compared to similar simulation tools in literature, such as: (i) arbitrary synaptic circuit modeling capability with both identical and non-identical stimuli, (ii) arbitrary spike modeling, and (iii) nanodevice based neuron emulation. The code of MASTISK is available on request at: https://gitlab.com/NVM IITD Research/MASTISK/wikis/home
Submitted 3 April, 2018;
originally announced April 2018.
-
Design Exploration of Hybrid CMOS-OxRAM Deep Generative Architectures
Authors:
Vivek Parmar,
Manan Suri
Abstract:
Deep Learning and its applications have gained tremendous interest recently in both academia and industry. Restricted Boltzmann Machines (RBMs) offer a key methodology to implement deep learning paradigms. This paper presents a novel approach for realizing hybrid CMOS-OxRAM based deep generative models (DGM). In our proposed hybrid DGM architectures, HfOx based (filamentary-type switching) OxRAM devices are extensively used for realizing multiple computational and non-computational functions such as: (i) synapses (weights), (ii) internal neuron-state storage, (iii) stochastic neuron activation and (iv) programmable signal normalization. To validate the proposed scheme we have simulated two different architectures: (i) Deep Belief Network (DBN) and (ii) Stacked Denoising Autoencoder (SDA), for classification and reconstruction of hand-written digits from a reduced MNIST dataset of 6000 images. Contrastive divergence (CD) specially optimized for OxRAM devices was used to drive the synaptic weight update mechanism of each layer in the network. The overall learning rule was based on greedy layer-wise learning with no back propagation, which allows the network to be trained to a good pre-training stage. Performance of the simulated hybrid CMOS-RRAM DGM model matches closely with the software based model for a 2-layer deep network. Top-3 test accuracy achieved by the DBN was 95.5%. MSE of the SDA network was 0.003, lower than the software based approach. Endurance analysis of the simulated architectures shows that for 200 epochs of training (single RBM layer), the maximum number of switching events per OxRAM device was ~ 7000 cycles.
Submitted 6 January, 2018;
originally announced January 2018.
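For readers unfamiliar with contrastive divergence, the weight update mentioned in the abstract above can be illustrated with a plain software CD-1 step for a binary RBM. This is the textbook algorithm only, not the OxRAM-optimized variant used by the authors. The function name `cd1_step`, the parameter names and the use of probabilities rather than samples in the negative phase are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(W, b_vis, b_hid, v0, lr, rng):
    """One CD-1 update on a batch v0 of shape (batch, n_visible); returns updated parameters."""
    # Positive phase: hidden probabilities and a stochastic hidden sample from the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: a single Gibbs step back to the visible layer and up again.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    batch = v0.shape[0]
    # Update: difference between data-driven and reconstruction-driven correlations.
    W = W + lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / batch
    b_vis = b_vis + lr * (v0 - p_v1).mean(axis=0)
    b_hid = b_hid + lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid
```

In greedy layer-wise training, a step like this would be applied to one RBM layer at a time, with the hidden activations of a trained layer serving as the visible data for the next layer.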
-
Dispenser Concept for Unmanned Aerial Vehicles (UAV, Drone, UAS)
Authors:
Manan Suri
Abstract:
System, design and methodology to load and dispense different articles from an autonomous aircraft are disclosed. In one embodiment, the design of a unique detachable dispenser for delivery of articles is described along with an intelligent methodology of loading and delivering the articles to and from the dispenser. Design of the dispenser, interaction of the dispenser with the flight control unit and ground control or base-station, and interaction of the base station with the sender or recipient of the article, are also described.
Submitted 22 September, 2017;
originally announced October 2017.
-
Exploiting OxRAM Resistive Switching for Dynamic Range Improvement of CMOS Image Sensors
Authors:
Ashwani Kumar,
Mukul Sarkar,
Manan Suri
Abstract:
We present a unique application of OxRAM devices in CMOS Image Sensors (CIS) for dynamic range (DR) improvement. We propose a modified 3T-APS (Active Pixel Sensor) circuit that incorporates OxRAM in a 1T-1R configuration. DR improvement is achieved by resistive compression of the pixel output signal through autonomous programming of OxRAM device resistance during exposure. We show that by carefully preconditioning the OxRAM resistance, pixel DR can be enhanced. The detailed impact of OxRAM SET-to-RESET and RESET-to-SET transitions on pixel DR is discussed. For experimental validation with specific OxRAM preprogrammed states, a 4 Kb 10 nm thick HfOx (1T-1R) matrix was fabricated and characterized. In the best case, a relative pixel DR improvement of ~ 50 dB was obtained for our design.
Submitted 5 May, 2017;
originally announced May 2017.