Search | arXiv e-print repository

It's all about PR -- Smart Benchmarking AI Accelerators using Performance Representatives

Authors: Alexander Louis-Ferdinand Jung, Jannik Steinmetz, Jonathan Gietz, Konstantin Lübeck, Oliver Bringmann

Abstract: Statistical models are widely used to estimate the performance of commercial off-the-shelf (COTS) AI hardware accelerators. However, training of statistical performance models often requires vast amounts of data, leading to a significant time investment and can be difficult in case of limited hardware availability. To alleviate this problem, we propose a novel performance modeling methodology that… ▽ More Statistical models are widely used to estimate the performance of commercial off-the-shelf (COTS) AI hardware accelerators. However, training of statistical performance models often requires vast amounts of data, leading to a significant time investment and can be difficult in case of limited hardware availability. To alleviate this problem, we propose a novel performance modeling methodology that significantly reduces the number of training samples while maintaining good accuracy. Our approach leverages knowledge of the target hardware architecture and initial parameter sweeps to identify a set of Performance Representatives (PR) for deep neural network (DNN) layers. These PRs are then used for benchmarking, building a statistical performance model, and making estimations. This targeted approach drastically reduces the number of training samples needed, opposed to random sampling, to achieve a better estimation accuracy. We achieve a Mean Absolute Percentage Error (MAPE) of as low as 0.02% for single-layer estimations and 0.68% for whole DNN estimations with less than 10000 training samples. The results demonstrate the superiority of our method for single-layer estimations compared to models trained with randomly sampled datasets of the same size. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted version for: SAMOS'24

arXiv:2402.00069 [pdf, other]

Using the Abstract Computer Architecture Description Language to Model AI Hardware Accelerators

Authors: Mika Markus Müller, Alexander Richard Manfred Borst, Konstantin Lübeck, Alexander Louis-Ferdinand Jung, Oliver Bringmann

Abstract: Artificial Intelligence (AI) has witnessed remarkable growth, particularly through the proliferation of Deep Neural Networks (DNNs). These powerful models drive technological advancements across various domains. However, to harness their potential in real-world applications, specialized hardware accelerators are essential. This demand has sparked a market for parameterizable AI hardware accelerato… ▽ More Artificial Intelligence (AI) has witnessed remarkable growth, particularly through the proliferation of Deep Neural Networks (DNNs). These powerful models drive technological advancements across various domains. However, to harness their potential in real-world applications, specialized hardware accelerators are essential. This demand has sparked a market for parameterizable AI hardware accelerators offered by different vendors. Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements. The decision involves choosing the right hardware and configuring a suitable set of parameters. However, comparing different accelerator design alternatives remains a complex task. Often, engineers rely on data sheets, spreadsheet calculations, or slow black-box simulators, which only offer a coarse understanding of the performance characteristics. The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams, which helps to communicate computer architecture on different abstraction levels and allows for inferring performance characteristics. In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results. △ Less

Submitted 30 January, 2024; originally announced February 2024.

Comments: Accepted Version for: MBMV'24

arXiv:2209.03807 [pdf, other]

Hardware Accelerator and Neural Network Co-Optimization for Ultra-Low-Power Audio Processing Devices

Authors: Christoph Gerum, Adrian Frischknecht, Tobias Hald, Paul Palomero Bernardo, Konstantin Lübeck, Oliver Bringmann

Abstract: The increasing spread of artificial neural networks does not stop at ultralow-power edge devices. However, these very often have high computational demand and require specialized hardware accelerators to ensure the design meets power and performance constraints. The manual optimization of neural networks along with the corresponding hardware accelerators can be very challenging. This paper present… ▽ More The increasing spread of artificial neural networks does not stop at ultralow-power edge devices. However, these very often have high computational demand and require specialized hardware accelerators to ensure the design meets power and performance constraints. The manual optimization of neural networks along with the corresponding hardware accelerators can be very challenging. This paper presents HANNAH (Hardware Accelerator and Neural Network seArcH), a framework for automated and combined hardware/software co-design of deep neural networks and hardware accelerators for resource and power-constrained edge devices. The optimization approach uses an evolution-based search algorithm, a neural network template technique, and analytical KPI models for the configurable UltraTrail hardware accelerator template to find an optimized neural network and accelerator configuration. We demonstrate that HANNAH can find suitable neural networks with minimized power consumption and high accuracy for different audio classification tasks such as single-class wake word detection, multi-class keyword detection, and voice activity detection, which are superior to the related work. △ Less

Submitted 29 September, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

Comments: Accepted Version for: EUROMICRO DSD 2022

Showing 1–3 of 3 results for author: Lübeck, K