NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference

Sun, Ruiqi; Ye, Siwei; Zhao, Jie; He, Xin; Li, Yiran; Zou, An

arXiv:2305.14405 (cs)

[Submitted on 23 May 2023 (v1), last revised 8 Feb 2024 (this version, v3)]

Abstract: The inherent diversity of computation types within individual Deep Neural Network (DNN) models imposes a corresponding need for a varied set of computation units within hardware processors. This diversity poses a significant constraint on computation efficiency during the execution of different neural networks. In this study, we present NeuralMatrix, a framework that transforms the computation of entire DNNs into linear matrix operations. This transformation seamlessly enables the execution of various DNN models using a single General-Purpose Matrix Multiplication (GEMM) accelerator. Extensive experimental results spanning different DNN models demonstrate that our approach preserves network accuracy while providing both generality and application-specific levels of computation efficiency. This allows a broad spectrum of DNN models to be executed using a single GEMM accelerator, eliminating the need for additional special function units.
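
The abstract does not say how a nonlinear operation is turned into a linear one. As an illustration only, the sketch below assumes a piecewise linear approximation: the input range is split into segments, a slope and intercept are precomputed per segment, and each evaluation becomes a table lookup followed by one multiply-add. The choice of GELU, the range [-8, 8], the segment count, and the names `gelu`, `build_segments`, and `piecewise_linear` are all hypothetical and are not taken from the paper.

```python
import math


def gelu(x):
    """Exact GELU, used here only as the nonlinear reference function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def build_segments(fn, lo, hi, n):
    """Precompute (slope, intercept) for n equal-width segments on [lo, hi].

    Each segment is the straight line through fn at its two endpoints.
    """
    width = (hi - lo) / n
    table = []
    for i in range(n):
        x0 = lo + i * width
        x1 = x0 + width
        k = (fn(x1) - fn(x0)) / width   # slope of the chord
        b = fn(x0) - k * x0             # intercept so the line passes through (x0, fn(x0))
        table.append((k, b))
    return table


def piecewise_linear(x, table, lo, hi):
    """Approximate fn(x) with one lookup and one multiply-add.

    Inputs outside [lo, hi] reuse the first or last segment, which
    extrapolates that segment's line (an assumption of this sketch).
    """
    n = len(table)
    clamped = min(max(x, lo), hi)
    i = min(int((clamped - lo) / (hi - lo) * n), n - 1)
    k, b = table[i]
    return k * x + b


if __name__ == "__main__":
    table = build_segments(gelu, -8.0, 8.0, 256)
    xs = [i / 100.0 for i in range(-1000, 1001)]
    worst = max(abs(piecewise_linear(x, table, -8.0, 8.0) - gelu(x)) for x in xs)
    print("segments:", len(table), "max abs error:", worst)
```

The only arithmetic at evaluation time is `k * x + b`, the kind of element-wise linear operation that can share hardware with matrix multiplication. The trade-off in this sketch is table size against approximation error: more segments lower the error but need more stored parameters.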

Comments:	11 pages, 6 figures, submitted to the 41st International Conference on Machine Learning
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
Cite as:	arXiv:2305.14405 [cs.LG]
	(or arXiv:2305.14405v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2305.14405

Submission history

From: Ruiqi Sun
[v1] Tue, 23 May 2023 12:03:51 UTC (3,424 KB)
[v2] Fri, 6 Oct 2023 13:28:30 UTC (279 KB)
[v3] Thu, 8 Feb 2024 10:11:27 UTC (498 KB)
