-
TCAM-SSD: A Framework for Search-Based Computing in Solid-State Drives
Authors:
Ryan Wong,
Nikita Kim,
Kevin Higgs,
Sapan Agarwal,
Engin Ipek,
Saugata Ghose,
Ben Feinberg
Abstract:
As the amount of data produced in society continues to grow at an exponential rate, modern applications are incurring significant performance and energy penalties due to high data movement between the CPU and memory/storage. While processing in main memory can alleviate these penalties, it is becoming increasingly difficult to keep large datasets entirely in main memory. This has led to a recent push for in-storage computation, where processing is performed inside the storage device.
We propose TCAM-SSD, a new framework for search-based computation inside the NAND flash memory arrays of a conventional solid-state drive (SSD), which requires lightweight modifications to only the array periphery and firmware. TCAM-SSD introduces a search manager and link table, which can logically partition the NAND flash memory's contents into search-enabled regions and standard storage regions. Together, these light firmware changes enable TCAM-SSD to seamlessly handle block I/O operations, in addition to new search operations, thereby reducing end-to-end execution time and total data movement. We provide an NVMe-compatible interface that provides programmers with the ability to dynamically allocate data on and make use of TCAM-SSD, allowing the system to be leveraged by a wide variety of applications. We evaluate three example use cases of TCAM-SSD to demonstrate its benefits. For transactional databases, TCAM-SSD can mitigate the performance penalties for applications with large datasets, achieving a 60.9% speedup over a conventional system that retrieves data from the SSD and computes using the CPU. For database analytics, TCAM-SSD provides an average speedup of 17.7x over a conventional system for a collection of analytical queries. For graph analytics, we combine TCAM-SSD's associative search with a sparse data structure, speeding up graph computing for larger-than-memory datasets by 14.5%.
Submitted 11 March, 2024;
originally announced March 2024.
-
On the Accuracy of Analog Neural Network Inference Accelerators
Authors:
T. Patrick Xiao,
Ben Feinberg,
Christopher H. Bennett,
Venkatraman Prabhakar,
Prashant Saxena,
Vineet Agrawal,
Sapan Agarwal,
Matthew J. Marinella
Abstract:
Specialized accelerators have recently garnered attention as a method to reduce the power consumption of neural network inference. A promising category of accelerators utilizes nonvolatile memory arrays to both store weights and perform $\textit{in situ}$ analog computation inside the array. While prior work has explored the design space of analog accelerators to optimize performance and energy efficiency, there is seldom a rigorous evaluation of the accuracy of these accelerators. This work shows how architectural design decisions, particularly in mapping neural network parameters to analog memory cells, influence inference accuracy. When evaluated using ResNet50 on ImageNet, the resilience of the system to analog non-idealities - cell programming errors, analog-to-digital converter resolution, and array parasitic resistances - all improve when analog quantities in the hardware are made proportional to the weights in the network. Moreover, contrary to the assumptions of prior work, nearly equivalent resilience to cell imprecision can be achieved by fully storing weights as analog quantities, rather than spreading weight bits across multiple devices, often referred to as bit slicing. By exploiting proportionality, analog system designers have the freedom to match the precision of the hardware to the needs of the algorithm, rather than attempting to guarantee the same level of precision in the intermediate results as an equivalent digital accelerator. This ultimately results in an analog accelerator that is more accurate, more robust to analog errors, and more energy-efficient.
Submitted 3 February, 2022; v1 submitted 2 September, 2021;
originally announced September 2021.
-
Device-aware inference operations in SONOS nonvolatile memory arrays
Authors:
Christopher H. Bennett,
T. Patrick Xiao,
Ryan Dellana,
Vineet Agrawal,
Ben Feinberg,
Venkatraman Prabhakar,
Krishnaswamy Ramkumar,
Long Hinh,
Swatilekha Saha,
Vijay Raghavan,
Ramesh Chettuvetty,
Sapan Agarwal,
Matthew J. Marinella
Abstract:
Non-volatile memory arrays can deploy pre-trained neural network models for edge inference. However, these systems are affected by device-level noise and retention issues. Here, we examine damage caused by these effects, introduce a mitigation strategy, and demonstrate its use in a fabricated array of SONOS (Silicon-Oxide-Nitride-Oxide-Silicon) devices. On MNIST, fashion-MNIST, and CIFAR-10 tasks, our approach increases resilience to synaptic noise and drift. We also show strong performance can be realized with ADCs of 5-8 bits of precision.
Submitted 2 April, 2020;
originally announced April 2020.
-
Evaluating complexity and resilience trade-offs in emerging memory inference machines
Authors:
Christopher H. Bennett,
Ryan Dellana,
T. Patrick Xiao,
Ben Feinberg,
Sapan Agarwal,
Suma Cardwell,
Matthew J. Marinella,
William Severa,
Brad Aimone
Abstract:
Neuromorphic-style inference only works well if limited hardware resources are maximized properly, e.g. accuracy continues to scale with parameters and complexity in the face of potential disturbance. In this work, we use realistic crossbar simulations to highlight that compact implementations of deep neural networks are unexpectedly susceptible to collapse from multiple system disturbances. Our work proposes a middle path towards high performance and strong resilience utilizing the Mosaics framework, and specifically by re-using synaptic connections in a recurrent neural network implementation that possesses a natural form of noise-immunity.
Submitted 25 February, 2020;
originally announced March 2020.
-
Fundamental Physics at the Intensity Frontier
Authors:
J. L. Hewett,
H. Weerts,
R. Brock,
J. N. Butler,
B. C. K. Casey,
J. Collar,
A. de Gouvea,
R. Essig,
Y. Grossman,
W. Haxton,
J. A. Jaros,
C. K. Jung,
Z. T. Lu,
K. Pitts,
Z. Ligeti,
J. R. Patterson,
M. Ramsey-Musolf,
J. L. Ritchie,
A. Roodman,
K. Scholberg,
C. E. M. Wagner,
G. P. Zeller,
S. Aefsky,
A. Afanasev,
K. Agashe
, et al. (443 additional authors not shown)
Abstract:
The Proceedings of the 2011 workshop on Fundamental Physics at the Intensity Frontier. Science opportunities at the intensity frontier are identified and described in the areas of heavy quarks, charged leptons, neutrinos, proton decay, new light weakly-coupled particles, and nucleons, nuclei, and atoms.
Submitted 11 May, 2012;
originally announced May 2012.