-
AGNI: In-Situ, Iso-Latency Stochastic-to-Binary Number Conversion for In-DRAM Deep Learning
Authors:
Supreeth Mysore Shivanandamurthy,
Sairam Sri Vatsavai,
Ishan Thakkar,
Sayed Ahmad Salehi
Abstract:
Recent years have seen a rapid increase in research activity in the field of DRAM-based Processing-In-Memory (PIM) accelerators, where the analog computing capability of DRAM is employed by minimally changing the inherent structure of DRAM peripherals to accelerate various data-centric applications. Several DRAM-based PIM accelerators for Convolutional Neural Networks (CNNs) have also been reporte…
▽ More
Recent years have seen a rapid increase in research activity in the field of DRAM-based Processing-In-Memory (PIM) accelerators, where the analog computing capability of DRAM is employed by minimally changing the inherent structure of DRAM peripherals to accelerate various data-centric applications. Several DRAM-based PIM accelerators for Convolutional Neural Networks (CNNs) have also been reported. Among these, the accelerators leveraging in-DRAM stochastic arithmetic have shown manifold improvements in processing latency and throughput, due to the ability of stochastic arithmetic to convert multiplications into simple bit-wise logical AND operations. However,the use of in-DRAM stochastic arithmetic for CNN acceleration requires frequent stochastic to binary number conversions. For that, prior works employ full adder-based or serial counter based in-DRAM circuits. These circuits consume large area and incur long latency. Their in-DRAM implementations also require heavy modifications in DRAM peripherals, which significantly diminishes the benefits of using stochastic arithmetic in these accelerators. To address these shortcomings, this paper presents a new substrate for in-DRAM stochastic-to-binary number conversion called AGNI. AGNI makes minor modifications in DRAM peripherals using pass transistors, capacitors, encoders, and charge pumps, and re-purposes the sense amplifiers as voltage comparators, to enable in-situ binary conversion of input statistic operands of different sizes with iso latency.
△ Less
Submitted 11 February, 2023;
originally announced February 2023.
-
ATRIA: A Bit-Parallel Stochastic Arithmetic Based Accelerator for In-DRAM CNN Processing
Authors:
Supreeth Mysore Shivanandamurthy,
Ishan. G. Thakkar,
Sayed Ahmad Salehi
Abstract:
With the rapidly growing use of Convolutional Neural Networks (CNNs) in real-world applications related to machine learning and Artificial Intelligence (AI), several hardware accelerator designs for CNN inference and training have been proposed recently. In this paper, we present ATRIA, a novel bit-pArallel sTochastic aRithmetic based In-DRAM Accelerator for energy-efficient and high-speed inferen…
▽ More
With the rapidly growing use of Convolutional Neural Networks (CNNs) in real-world applications related to machine learning and Artificial Intelligence (AI), several hardware accelerator designs for CNN inference and training have been proposed recently. In this paper, we present ATRIA, a novel bit-pArallel sTochastic aRithmetic based In-DRAM Accelerator for energy-efficient and high-speed inference of CNNs. ATRIA employs light-weight modifications in DRAM cell arrays to implement bit-parallel stochastic arithmetic based acceleration of multiply-accumulate (MAC) operations inside DRAM. ATRIA significantly improves the latency, throughput, and efficiency of processing CNN inferences by performing 16 MAC operations in only five consecutive memory operation cycles. We mapped the inference tasks of four benchmark CNNs on ATRIA to compare its performance with five state-of-the-art in-DRAM CNN accelerators from prior work. The results of our analysis show that ATRIA exhibits only 3.5% drop in CNN inference accuracy and still achieves improvements of up to 3.2x in frames-per-second (FPS) and up to 10x in efficiency (FPS/W/mm2), compared to the best-performing in-DRAM accelerator from prior work.
△ Less
Submitted 26 May, 2021;
originally announced May 2021.
-
ODIN: A Bit-Parallel Stochastic Arithmetic Based Accelerator for In-Situ Neural Network Processing in Phase Change RAM
Authors:
Supreeth Mysore Shivanandamurthy,
Ishan. G. Thakkar,
Sayed Ahmad Salehi
Abstract:
Due to the very rapidly growing use of Artificial Neural Networks (ANNs) in real-world applications related to machine learning and Artificial Intelligence (AI), several hardware accelerator de-signs for ANNs have been proposed recently. In this paper, we present a novel processing-in-memory (PIM) engine called ODIN that employs hybrid binary-stochastic bit-parallel arithmetic in-side phase change…
▽ More
Due to the very rapidly growing use of Artificial Neural Networks (ANNs) in real-world applications related to machine learning and Artificial Intelligence (AI), several hardware accelerator de-signs for ANNs have been proposed recently. In this paper, we present a novel processing-in-memory (PIM) engine called ODIN that employs hybrid binary-stochastic bit-parallel arithmetic in-side phase change RAM (PCRAM) to enable a low-overhead in-situ acceleration of all essential ANN functions such as multiply-accumulate (MAC), nonlinear activation, and pooling. We mapped four ANN benchmark applications on ODIN to compare its performance with a conventional processor-centric design and a crossbar-based in-situ ANN accelerator from prior work. The results of our analysis for the considered ANN topologies indicate that our ODIN accelerator can be at least 5.8x faster and 23.2x more energy-efficient, and up to 90.8x faster and 1554x more energy-efficient, compared to the crossbar-based in-situ ANN accelerator from prior work.
△ Less
Submitted 5 March, 2021;
originally announced March 2021.
-
Low-cost Stochastic Number Generators for Stochastic Computing
Authors:
Sayed Ahmad Salehi
Abstract:
Stochastic unary computing provides low-area circuits. However, the required area consuming stochastic number generators (SNGs) in these circuits can diminish their overall gain in area, particularly if several SNGs are required. We propose area-efficient SNGs by sharing the permuted output of one linear feedback shift register (LFSR) among several SNGs. With no hardware overhead, the proposed arc…
▽ More
Stochastic unary computing provides low-area circuits. However, the required area consuming stochastic number generators (SNGs) in these circuits can diminish their overall gain in area, particularly if several SNGs are required. We propose area-efficient SNGs by sharing the permuted output of one linear feedback shift register (LFSR) among several SNGs. With no hardware overhead, the proposed architecture generates stochastic bit streams with minimum stochastic computing correlation (SCC). Compared to the circular shifting approach presented in prior work, our approach produces stochastic bit streams with 67% less average SCC when a 10-bit LFSR is shared between two SNGs. To generalize our approach, we propose an algorithm to find a set of m permutations (n>m>2) with minimum pairwise SCC, for an n-bit LFSR. The search space for finding permutations with exact minimum SCC grows rapidly when n increases and it is intractable to perform a search algorithm using accurately calculated pairwise SCC values, for n>9. We propose a similarity function that can be used in the proposed search algorithm to quickly find a set of permutations with SCC values close to the minimum one. We evaluated our approach for several applications. The results show that, compared to prior work, it achieves lower MSE with the same (or even lower) area. Additionally, based on simulation results, we show that replacing the comparator component of an SNG circuit with a weighted binary generator can reduce SCC.
△ Less
Submitted 3 January, 2020;
originally announced January 2020.