Search | arXiv e-print repository

arXiv:2402.19184 [pdf, other]

Data Transfer Optimizations for Host-CPU and Accelerators in AXI4MLIR

Authors: Jude Haris, Nicolas Bohm Agostini, Antonino Tumeo, David Kaeli, José Cano

Abstract: As custom hardware accelerators become more prevalent, it becomes increasingly important to automatically generate efficient host-driver code that can fully leverage the capabilities of these accelerators. This approach saves time and reduces the likelihood of errors that can occur during manual implementation. AXI4MLIR extends the MLIR compiler framework to generate host-driver code for custom ac… ▽ More As custom hardware accelerators become more prevalent, it becomes increasingly important to automatically generate efficient host-driver code that can fully leverage the capabilities of these accelerators. This approach saves time and reduces the likelihood of errors that can occur during manual implementation. AXI4MLIR extends the MLIR compiler framework to generate host-driver code for custom accelerators for linear algebra problems. By leveraging specific compiler optimizations, we can further increase accelerator utilization. In this work we offer two key observations through a MatMul accelerator case study. First, the accelerator's compute core utilization is less than 10%, and second, the critical latency bottleneck is caused by copying data between the heap and memory-mapped DMA buffers. We identify a set of missing host code optimizations to improve the under-utilization and the latency bottleneck. Therefore, we propose three key host-code data-movement-related optimizations, extending AXI4MLIR. The optimizations provide DMA-based data allocation, coalescing of DMA transfers, and pipelining of the accelerator's load, compute, and store stages. △ Less

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2312.14821 [pdf, other]

doi 10.1109/CGO57630.2024.10444801

AXI4MLIR: User-Driven Automatic Host Code Generation for Custom AXI-Based Accelerators

Authors: Nicolas Bohm Agostini, Jude Haris, Perry Gibson, Malith Jayaweera, Norm Rubin, Antonino Tumeo, José L. Abellán, José Cano, David Kaeli

Abstract: This paper addresses the need for automatic and efficient generation of host driver code for arbitrary custom AXI-based accelerators targeting linear algebra algorithms, an important workload in various applications, including machine learning and scientific computing. While existing tools have focused on automating accelerator prototy**, little attention has been paid to the host-accelerator in… ▽ More This paper addresses the need for automatic and efficient generation of host driver code for arbitrary custom AXI-based accelerators targeting linear algebra algorithms, an important workload in various applications, including machine learning and scientific computing. While existing tools have focused on automating accelerator prototy**, little attention has been paid to the host-accelerator interaction. This paper introduces AXI4MLIR, an extension of the MLIR compiler framework designed to facilitate the automated generation of host-accelerator driver code. With new MLIR attributes and transformations, AXI4MLIR empowers users to specify accelerator features (including their instructions) and communication patterns and exploit the host memory hierarchy. We demonstrate AXI4MLIR's versatility across different types of accelerators and problems, showcasing significant CPU cache reference reductions (up to 56%) and up to a 1.65x speedup compared to manually optimized driver code implementations. AXI4MLIR implementation is open-source and available at: https://github.com/AXI4MLIR/axi4mlir. △ Less

Submitted 22 December, 2023; originally announced December 2023.

Comments: 13 pages, 17 figures, to appear in CGO2024

ACM Class: D.3.3

arXiv:2211.12504 [pdf, other]

Identifying gender bias in blockbuster movies through the lens of machine learning

Authors: Muhammad Junaid Haris, Aanchal Upreti, Melih Kurtaran, Filip Ginter, Sebastien Lafond, Sepinoud Azimi

Abstract: The problem of gender bias is highly prevalent and well known. In this paper, we have analysed the portrayal of gender roles in English movies, a medium that effectively influences society in sha** people's beliefs and opinions. First, we gathered scripts of films from different genres and derived sentiments and emotions using natural language processing techniques. Afterwards, we converted the… ▽ More The problem of gender bias is highly prevalent and well known. In this paper, we have analysed the portrayal of gender roles in English movies, a medium that effectively influences society in sha** people's beliefs and opinions. First, we gathered scripts of films from different genres and derived sentiments and emotions using natural language processing techniques. Afterwards, we converted the scripts into embeddings, i.e. a way of representing text in the form of vectors. With a thorough investigation, we found specific patterns in male and female characters' personality traits in movies that align with societal stereotypes. Furthermore, we used mathematical and machine learning techniques and found some biases wherein men are shown to be more dominant and envious than women, whereas women have more joyful roles in movies. In our work, we introduce, to the best of our knowledge, a novel technique to convert dialogues into an array of emotions by combining it with Plutchik's wheel of emotions. Our study aims to encourage reflections on gender equality in the domain of film and facilitate other researchers in analysing movies automatically instead of using manual approaches. △ Less

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2110.00478 [pdf, other]

SECDA: Efficient Hardware/Software Co-Design of FPGA-based DNN Accelerators for Edge Inference

Authors: Jude Haris, Perry Gibson, José Cano, Nicolas Bohm Agostini, David Kaeli

Abstract: Edge computing devices inherently face tight resource constraints, which is especially apparent when deploying Deep Neural Networks (DNN) with high memory and compute demands. FPGAs are commonly available in edge devices. Since these reconfigurable circuits can achieve higher throughput and lower power consumption than general purpose processors, they are especially well-suited for DNN acceleratio… ▽ More Edge computing devices inherently face tight resource constraints, which is especially apparent when deploying Deep Neural Networks (DNN) with high memory and compute demands. FPGAs are commonly available in edge devices. Since these reconfigurable circuits can achieve higher throughput and lower power consumption than general purpose processors, they are especially well-suited for DNN acceleration. However, existing solutions for designing FPGA-based DNN accelerators for edge devices come with high development overheads, given the cost of repeated FPGA synthesis passes, reimplementation in a Hardware Description Language (HDL) of the simulated design, and accelerator system integration. In this paper we propose SECDA, a new hardware/software co-design methodology to reduce design time of optimized DNN inference accelerators on edge devices with FPGAs. SECDA combines cost-effective SystemC simulation with hardware execution, streamlining design space exploration and the development process via reduced design evaluation time. As a case study, we use SECDA to efficiently develop two different DNN accelerator designs on a PYNQ-Z1 board, a platform that includes an edge FPGA. We quickly and iteratively explore the system's hardware/software stack, while identifying and mitigating performance bottlenecks. We evaluate the two accelerator designs with four common DNN models, achieving an average performance speedup across models of up to 3.5$\times$ with a 2.9$\times$ reduction in energy consumption over CPU-only inference. Our code is available at https://github.com/gicLAB/SECDA △ Less

Submitted 1 October, 2021; originally announced October 2021.

Comments: This paper is accepted to SBAC-PAD 2021

Showing 1–4 of 4 results for author: Haris, J