Showing 1–2 of 2 results for author: Soltaniyeh, M

Search v0.5.6 released 2020-02-24

arXiv:2107.13386

cs.AR

SPOTS: An Accelerator for Sparse Convolutional Networks Leveraging Systolic General Matrix-Matrix Multiplication

Authors: Mohammadreza Soltaniyeh, Richard P. Martin, Santosh Nagarakatte

Abstract: This paper proposes a new hardware accelerator for sparse convolutional neural networks (CNNs) by building a hardware unit to perform the Image to Column (IM2COL) transformation of the input feature map coupled with a systolic array-based general matrix-matrix multiplication (GEMM) unit. Our design carefully overlaps the IM2COL transformation with the GEMM computation to maximize parallelism. We propose a novel design for the IM2COL unit that uses a set of distributed local memories connected by a ring network, which improves energy efficiency and latency by streaming the input feature map only once. We propose a tall systolic array for the GEMM unit while also providing the ability to organize it as multiple small GEMM units, which enables our design to handle a wide range of CNNs and their parameters. Further, our design improves performance by effectively mapping the sparse data to the hardware units by utilizing sparsity in both input feature maps and weights. Our prototype, SPOTS, is on average 1.74X faster than Eyeriss. It is also 78X and 12X more energy-efficient when compared to CPU and GPU implementations, respectively.

Submitted 24 November, 2021; v1 submitted 28 July, 2021; originally announced July 2021.

Comments: 24 pages

Report number: Rutgers Department of Computer Science Technical Report DCS-TR-756

arXiv:2004.13907

cs.DC cs.MS cs.PL

Synergistic CPU-FPGA Acceleration of Sparse Linear Algebra

Authors: Mohammadreza Soltaniyeh, Richard P. Martin, Santosh Nagarakatte

Abstract: This paper describes REAP, a software-hardware approach that enables high performance sparse linear algebra computations on a cooperative CPU-FPGA platform. REAP carefully separates the task of organizing the matrix elements from the computation phase. It uses the CPU to provide a first-pass re-organization of the matrix elements, allowing the FPGA to focus on the computation. We introduce a new intermediate representation that allows the CPU to communicate the sparse data and the scheduling decisions to the FPGA. The computation is optimized on the FPGA for effective resource utilization with pipelining. REAP improves the performance of Sparse General Matrix Multiplication (SpGEMM) and Sparse Cholesky Factorization by 3.2X and 1.85X, respectively, compared to widely used sparse libraries on the CPU.

Submitted 28 April, 2020; originally announced April 2020.

Comments: 12 pages

Report number: Rutgers Computer Science Technical Report DCS-TR-750
