Showing 1–2 of 2 results for author: Mustafa, E

Search v0.5.6 released 2020-02-24

arXiv:2401.04012 [pdf, other]

cs.AR

MX: Enhancing RISC-V's Vector ISA for Ultra-Low Overhead, Energy-Efficient Matrix Multiplication

Authors: Matteo Perotti, Yichao Zhang, Matheus Cavalcante, Enis Mustafa, Luca Benini

Abstract: Dense Matrix Multiplication (MatMul) is arguably one of the most ubiquitous compute-intensive kernels, spanning linear algebra, DSP, graphics, and machine learning applications. Thus, MatMul optimization is crucial not only in high-performance processors but also in embedded low-power platforms. Several Instruction Set Architectures (ISAs) have recently included matrix extensions to improve MatMul performance and efficiency at the cost of added matrix register files and units. In this paper, we propose Matrix eXtension (MX), a lightweight approach that builds upon the open-source RISC-V Vector (RVV) ISA to boost MatMul energy efficiency. Instead of adding expensive dedicated hardware, MX uses the pre-existing vector register file and functional units to create a hybrid vector/matrix engine at a negligible area cost (< 3%), which comes from a compact near-FPU tile buffer for higher data reuse, and no clock frequency overhead. We implement MX on a compact and highly energy-optimized RVV processor and evaluate it in both a Dual- and 64-Core cluster in a 12-nm technology node. MX boosts the Dual-Core's energy efficiency by 10% for a double-precision 64x64x64 matrix multiplication with the same FPU utilization (~97%) and by 25% on the 64-Core cluster for the same benchmark on 32-bit data, with a 56% performance gain.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2208.05558 [pdf, other]

cs.DC

Federated Learning for Digital Twin-Based Vehicular Networks: Architecture and Challenges

Authors: Latif U. Khan, Ehzaz Mustafa, Junaid Shuja, Faisal Rehman, Kashif Bilal, Zhu Han, Choong Seon Hong

Abstract: Emerging intelligent transportation applications, such as accident reporting, lane change assistance, collision avoidance, and infotainment, will be based on diverse requirements (e.g., latency, reliability, quality of physical experience). To fulfill such requirements, there is a significant need to deploy a digital twin-based intelligent transportation system. Although the twin-based implementation of vehicular networks can offer performance optimization, modeling twins is a significantly challenging task. Machine learning (ML) can be a preferable solution for building such a virtual model, and specifically federated learning (FL) is a distributed learning scheme that can better preserve privacy compared to centralized ML. Although FL can offer performance enhancement, it requires careful design. Therefore, in this article, we present an overview of FL for the twin-based vehicular network. A general architecture showing FL for the twin-based vehicular network is proposed. Our proposed architecture consists of two spaces: a twin space and a physical space. The physical space consists of all the physical entities (e.g., cars and edge servers) required for vehicular networks, whereas the twin space refers to the logical space that is used for the deployment of twins. A twin space can be implemented using either edge servers or cloud servers. We also outline a few use cases of FL for the twin-based vehicular network. Finally, the paper is concluded and an outlook on open challenges is presented.

Submitted 10 August, 2022; originally announced August 2022.
