Showing 1–2 of 2 results for author: Soltaniyeh, M
-
SPOTS: An Accelerator for Sparse Convolutional Networks Leveraging Systolic General Matrix-Matrix Multiplication
Authors:
Mohammadreza Soltaniyeh,
Richard P. Martin,
Santosh Nagarakatte
Abstract:
This paper proposes a new hardware accelerator for sparse convolutional neural networks (CNNs) by building a hardware unit to perform the Image to Column (IM2COL) transformation of the input feature map coupled with a systolic array-based general matrix-matrix multiplication (GEMM) unit. Our design carefully overlaps the IM2COL transformation with the GEMM computation to maximize parallelism. We p…
▽ More
This paper proposes a new hardware accelerator for sparse convolutional neural networks (CNNs) by building a hardware unit to perform the Image to Column (IM2COL) transformation of the input feature map coupled with a systolic array-based general matrix-matrix multiplication (GEMM) unit. Our design carefully overlaps the IM2COL transformation with the GEMM computation to maximize parallelism. We propose a novel design for the IM2COL unit that uses a set of distributed local memories connected by a ring network, which improves energy efficiency and latency by streaming the input feature map only once. We propose a tall systolic array for the GEMM unit while also providing the ability to organize it as multiple small GEMM units, which enables our design to handle a wide range of CNNs and their parameters. Further, our design improves performance by effectively map** the sparse data to the hardware units by utilizing sparsity in both input feature maps and weights. Our prototype, SPOTS, is on average 1.74X faster than Eyeriss. It is also 78X, and 12X more energy-efficient when compared to CPU and GPU implementations, respectively.
△ Less
Submitted 24 November, 2021; v1 submitted 28 July, 2021;
originally announced July 2021.
-
Synergistic CPU-FPGA Acceleration of Sparse Linear Algebra
Authors:
Mohammadreza Soltaniyeh,
Richard P. Martin,
Santosh Nagarakatte
Abstract:
This paper describes REAP, a software-hardware approach that enables high performance sparse linear algebra computations on a cooperative CPU-FPGA platform. REAP carefully separates the task of organizing the matrix elements from the computation phase. It uses the CPU to provide a first-pass re-organization of the matrix elements, allowing the FPGA to focus on the computation. We introduce a new i…
▽ More
This paper describes REAP, a software-hardware approach that enables high performance sparse linear algebra computations on a cooperative CPU-FPGA platform. REAP carefully separates the task of organizing the matrix elements from the computation phase. It uses the CPU to provide a first-pass re-organization of the matrix elements, allowing the FPGA to focus on the computation. We introduce a new intermediate representation that allows the CPU to communicate the sparse data and the scheduling decisions to the FPGA. The computation is optimized on the FPGA for effective resource utilization with pipelining. REAP improves the performance of Sparse General Matrix Multiplication (SpGEMM) and Sparse Cholesky Factorization by 3.2X and 1.85X compared to widely used sparse libraries for them on the CPU, respectively.
△ Less
Submitted 28 April, 2020;
originally announced April 2020.