MaxEVA: Maximizing the Efficiency of Matrix Multiplication on Versal AI Engine

Taka, Endri; Arora, Aman; Wu, Kai-Chiang; Marculescu, Diana

arXiv:2311.04980 (cs)

[Submitted on 8 Nov 2023 (v1), last revised 14 Nov 2023 (this version, v2)]

Abstract: The increasing computational and memory requirements of Deep Learning (DL) workloads have led to outstanding innovations in hardware architectures. An archetype of such architectures is the novel Versal AI Engine (AIE) by AMD/Xilinx. The AIE comprises multiple programmable processors optimized for vector-based algorithms. An AIE array consisting of 400 processor cores operating at 1.25 GHz is able to deliver a peak throughput of 8 TFLOPs for 32-bit floating-point (fp32) and 128 TOPs for 8-bit integer (int8) precision. In this work, we propose MaxEVA: a novel framework to efficiently map Matrix Multiplication (MatMul) workloads on Versal AIE devices. Our framework maximizes the performance and energy efficiency of MatMul applications by efficiently exploiting features of the AIE architecture and resolving performance bottlenecks from multiple angles. When demonstrated on the VC1902 device of the VCK190 board, MaxEVA achieves up to 5.44 TFLOPs and 77.01 TOPs throughput for fp32 and int8 precisions, respectively. In terms of energy efficiency, MaxEVA attains up to 124.16 GFLOPs/W for fp32 and 1.16 TOPs/W for int8. Our proposed method substantially outperforms the state-of-the-art approach, exhibiting up to 2.19x throughput gain and 20.4% higher energy efficiency. The MaxEVA framework provides notable insights to fill the knowledge gap in effectively designing MatMul-based DL workloads on the new Versal AIE devices.
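The peak and achieved throughput figures quoted in the abstract can be related with simple arithmetic. The following is a minimal sketch, not part of the paper: the function names are hypothetical, and it assumes the quoted peak is spread evenly over all 400 cores. It derives the per-core operations per cycle implied by the peak numbers and the fraction of peak that the reported results represent.

```python
# Figures quoted in the abstract.
NUM_CORES = 400
CLOCK_HZ = 1.25e9
PEAK_OPS = {"fp32": 8e12, "int8": 128e12}          # peak ops/s of the AIE array
ACHIEVED_OPS = {"fp32": 5.44e12, "int8": 77.01e12}  # best reported MaxEVA results


def ops_per_core_per_cycle(peak_ops, num_cores=NUM_CORES, clock_hz=CLOCK_HZ):
    """Operations each core must complete per clock cycle to reach the peak.

    Assumption: the array-level peak divides evenly across the cores.
    """
    return peak_ops / (num_cores * clock_hz)


def utilization(achieved_ops, peak_ops):
    """Fraction of the quoted peak throughput that was actually achieved."""
    return achieved_ops / peak_ops


if __name__ == "__main__":
    for precision in PEAK_OPS:
        per_cycle = ops_per_core_per_cycle(PEAK_OPS[precision])
        util = utilization(ACHIEVED_OPS[precision], PEAK_OPS[precision])
        print(f"{precision}: {per_cycle:.0f} ops/cycle/core, {util:.1%} of peak")
```

Under these assumptions the peak figures correspond to 16 fp32 operations and 256 int8 operations per core per cycle, and the reported results reach roughly 68% (fp32) and 60% (int8) of peak.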

Comments:	Accepted as full paper at FPT 2023
Subjects:	Hardware Architecture (cs.AR)
Cite as:	arXiv:2311.04980 [cs.AR]
	(or arXiv:2311.04980v2 [cs.AR] for this version)
	https://doi.org/10.48550/arXiv.2311.04980

Submission history

From: Endri Taka
[v1] Wed, 8 Nov 2023 19:02:05 UTC (3,303 KB)
[v2] Tue, 14 Nov 2023 00:42:17 UTC (3,303 KB)
