
TCAM-SSD: A Framework for Search-Based Computing in Solid-State Drives

Authors: Ryan Wong, Nikita Kim, Kevin Higgs, Sapan Agarwal, Engin Ipek, Saugata Ghose, Ben Feinberg

Abstract: As the amount of data produced in society continues to grow at an exponential rate, modern applications are incurring significant performance and energy penalties due to high data movement between the CPU and memory/storage. While processing in main memory can alleviate these penalties, it is becoming increasingly difficult to keep large datasets entirely in main memory. This has led to a recent push for in-storage computation, where processing is performed inside the storage device. We propose TCAM-SSD, a new framework for search-based computation inside the NAND flash memory arrays of a conventional solid-state drive (SSD), which requires lightweight modifications to only the array periphery and firmware. TCAM-SSD introduces a search manager and link table, which can logically partition the NAND flash memory's contents into search-enabled regions and standard storage regions. Together, these light firmware changes enable TCAM-SSD to seamlessly handle block I/O operations, in addition to new search operations, thereby reducing end-to-end execution time and total data movement. We provide an NVMe-compatible interface that provides programmers with the ability to dynamically allocate data on and make use of TCAM-SSD, allowing the system to be leveraged by a wide variety of applications. We evaluate three example use cases of TCAM-SSD to demonstrate its benefits. For transactional databases, TCAM-SSD can mitigate the performance penalties for applications with large datasets, achieving a 60.9% speedup over a conventional system that retrieves data from the SSD and computes using the CPU. For database analytics, TCAM-SSD provides an average speedup of 17.7x over a conventional system for a collection of analytical queries. For graph analytics, we combine TCAM-SSD's associative search with a sparse data structure, speeding up graph computing for larger-than-memory datasets by 14.5%.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2109.01262 [pdf, other]

doi 10.1109/MCAS.2022.3214409

On the Accuracy of Analog Neural Network Inference Accelerators

Authors: T. Patrick Xiao, Ben Feinberg, Christopher H. Bennett, Venkatraman Prabhakar, Prashant Saxena, Vineet Agrawal, Sapan Agarwal, Matthew J. Marinella

Abstract: Specialized accelerators have recently garnered attention as a method to reduce the power consumption of neural network inference. A promising category of accelerators utilizes nonvolatile memory arrays to both store weights and perform $\textit{in situ}$ analog computation inside the array. While prior work has explored the design space of analog accelerators to optimize performance and energy efficiency, there is seldom a rigorous evaluation of the accuracy of these accelerators. This work shows how architectural design decisions, particularly in mapping neural network parameters to analog memory cells, influence inference accuracy. When evaluated using ResNet50 on ImageNet, the resilience of the system to analog non-idealities - cell programming errors, analog-to-digital converter resolution, and array parasitic resistances - all improve when analog quantities in the hardware are made proportional to the weights in the network. Moreover, contrary to the assumptions of prior work, nearly equivalent resilience to cell imprecision can be achieved by fully storing weights as analog quantities, rather than spreading weight bits across multiple devices, often referred to as bit slicing. By exploiting proportionality, analog system designers have the freedom to match the precision of the hardware to the needs of the algorithm, rather than attempting to guarantee the same level of precision in the intermediate results as an equivalent digital accelerator. This ultimately results in an analog accelerator that is more accurate, more robust to analog errors, and more energy-efficient.

Submitted 3 February, 2022; v1 submitted 2 September, 2021; originally announced September 2021.

Comments: Changes in v3: modified definition of state-independent error (factor of 2) for fairer comparison to state-proportional. Added more results on INT4 network

Journal ref: IEEE Circuits and Systems Magazine, vol. 22, no. 4, pp. 26-48, 2022

arXiv:2004.00802 [pdf]

Device-aware inference operations in SONOS nonvolatile memory arrays

Authors: Christopher H. Bennett, T. Patrick Xiao, Ryan Dellana, Vineet Agrawal, Ben Feinberg, Venkatraman Prabhakar, Krishnaswamy Ramkumar, Long Hinh, Swatilekha Saha, Vijay Raghavan, Ramesh Chettuvetty, Sapan Agarwal, Matthew J. Marinella

Abstract: Non-volatile memory arrays can deploy pre-trained neural network models for edge inference. However, these systems are affected by device-level noise and retention issues. Here, we examine damage caused by these effects, introduce a mitigation strategy, and demonstrate its use in a fabricated array of SONOS (Silicon-Oxide-Nitride-Oxide-Silicon) devices. On MNIST, fashion-MNIST, and CIFAR-10 tasks, our approach increases resilience to synaptic noise and drift. We also show strong performance can be realized with ADCs of 5-8 bits precision.

Submitted 2 April, 2020; originally announced April 2020.

Comments: To be presented at IEEE International Reliability Physics Symposium (IRPS) 2020

arXiv:2003.10396 [pdf, other]

Evaluating complexity and resilience trade-offs in emerging memory inference machines

Authors: Christopher H. Bennett, Ryan Dellana, T. Patrick Xiao, Ben Feinberg, Sapan Agarwal, Suma Cardwell, Matthew J. Marinella, William Severa, Brad Aimone

Abstract: Neuromorphic-style inference only works well if limited hardware resources are maximized properly, e.g. accuracy continues to scale with parameters and complexity in the face of potential disturbance. In this work, we use realistic crossbar simulations to highlight that compact implementations of deep neural networks are unexpectedly susceptible to collapse from multiple system disturbances. Our work proposes a middle path towards high performance and strong resilience utilizing the Mosaics framework, and specifically by re-using synaptic connections in a recurrent neural network implementation that possesses a natural form of noise-immunity.

Submitted 25 February, 2020; originally announced March 2020.

arXiv:1205.2671 [pdf, other]

Fundamental Physics at the Intensity Frontier

Authors: J. L. Hewett, H. Weerts, R. Brock, J. N. Butler, B. C. K. Casey, J. Collar, A. de Gouvea, R. Essig, Y. Grossman, W. Haxton, J. A. Jaros, C. K. Jung, Z. T. Lu, K. Pitts, Z. Ligeti, J. R. Patterson, M. Ramsey-Musolf, J. L. Ritchie, A. Roodman, K. Scholberg, C. E. M. Wagner, G. P. Zeller, S. Aefsky, A. Afanasev, K. Agashe , et al. (443 additional authors not shown)

Abstract: The Proceedings of the 2011 workshop on Fundamental Physics at the Intensity Frontier. Science opportunities at the intensity frontier are identified and described in the areas of heavy quarks, charged leptons, neutrinos, proton decay, new light weakly-coupled particles, and nucleons, nuclei, and atoms.

Submitted 11 May, 2012; originally announced May 2012.

Comments: 229 pages

Report number: ANL-HEP-TR-12-25, SLAC-R-991

Showing 1–5 of 5 results for author: Feinberg, B