-
ApproxDARTS: Differentiable Neural Architecture Search with Approximate Multipliers
Authors:
Michal Pinos,
Lukas Sekanina,
Vojtech Mrazek
Abstract:
Integrating the principles of approximate computing into the design of hardware-aware deep neural networks (DNN) has led to DNNs implementations showing good output quality and highly optimized hardware parameters such as low latency or inference energy. In this work, we present ApproxDARTS, a neural architecture search (NAS) method enabling the popular differentiable neural architecture search me…
▽ More
Integrating the principles of approximate computing into the design of hardware-aware deep neural networks (DNN) has led to DNNs implementations showing good output quality and highly optimized hardware parameters such as low latency or inference energy. In this work, we present ApproxDARTS, a neural architecture search (NAS) method enabling the popular differentiable neural architecture search method called DARTS to exploit approximate multipliers and thus reduce the power consumption of generated neural networks. We showed on the CIFAR-10 data set that the ApproxDARTS is able to perform a complete architecture search within less than $10$ GPU hours and produce competitive convolutional neural networks (CNN) containing approximate multipliers in convolutional layers. For example, ApproxDARTS created a CNN showing an energy consumption reduction of (a) $53.84\%$ in the arithmetic operations of the inference phase compared to the CNN utilizing the native $32$-bit floating-point multipliers and (b) $5.97\%$ compared to the CNN utilizing the exact $8$-bit fixed-point multipliers, in both cases with a negligible accuracy drop. Moreover, the ApproxDARTS is $2.3\times$ faster than a similar but evolutionary algorithm-based method called EvoApproxNAS.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Exploring Quantization and Map** Synergy in Hardware-Aware Deep Neural Network Accelerators
Authors:
Jan Klhufek,
Miroslav Safar,
Vojtech Mrazek,
Zdenek Vasicek,
Lukas Sekanina
Abstract:
Energy efficiency and memory footprint of a convolutional neural network (CNN) implemented on a CNN inference accelerator depend on many factors, including a weight quantization strategy (i.e., data types and bit-widths) and map** (i.e., placement and scheduling of DNN elementary operations on hardware units of the accelerator). We show that enabling rich mixed quantization schemes during the im…
▽ More
Energy efficiency and memory footprint of a convolutional neural network (CNN) implemented on a CNN inference accelerator depend on many factors, including a weight quantization strategy (i.e., data types and bit-widths) and map** (i.e., placement and scheduling of DNN elementary operations on hardware units of the accelerator). We show that enabling rich mixed quantization schemes during the implementation can open a previously hidden space of map**s that utilize the hardware resources more effectively. CNNs utilizing quantized weights and activations and suitable map**s can significantly improve trade-offs among the accuracy, energy, and memory requirements compared to less carefully optimized CNN implementations. To find, analyze, and exploit these map**s, we: (i) extend a general-purpose state-of-the-art map** tool (Timeloop) to support mixed quantization, which is not currently available; (ii) propose an efficient multi-objective optimization algorithm to find the most suitable bit-widths and map** for each DNN layer executed on the accelerator; and (iii) conduct a detailed experimental evaluation to validate the proposed method. On two CNNs (MobileNetV1 and MobileNetV2) and two accelerators (Eyeriss and Simba) we show that for a given quality metric (such as the accuracy on ImageNet), energy savings are up to 37% without any accuracy drop.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Xel-FPGAs: An End-to-End Automated Exploration Framework for Approximate Accelerators in FPGA-Based Systems
Authors:
Bharath Srinivas Prabakaran,
Vojtech Mrazek,
Zdenek Vasicek,
Lukas Sekanina,
Muhammad Shafique
Abstract:
Generation and exploration of approximate circuits and accelerators has been a prominent research domain to achieve energy-efficiency and/or performance improvements. This research has predominantly focused on ASICs, while not achieving similar gains when deployed for FPGA-based accelerator systems, due to the inherent architectural differences between the two. In this work, we propose a novel fra…
▽ More
Generation and exploration of approximate circuits and accelerators has been a prominent research domain to achieve energy-efficiency and/or performance improvements. This research has predominantly focused on ASICs, while not achieving similar gains when deployed for FPGA-based accelerator systems, due to the inherent architectural differences between the two. In this work, we propose a novel framework, Xel-FPGAs, which leverages statistical or machine learning models to effectively explore the architecture-space of state-of-the-art ASIC-based approximate circuits to cater them for FPGA-based systems given a simple RTL description of the target application. We have also evaluated the scalability of our framework on a multi-stage application using a hierarchical search strategy. The Xel-FPGAs framework is capable of reducing the exploration time by up to 95%, when compared to the default synthesis, place, and route approaches, while identifying an improved set of Pareto-optimal designs for a given application, when compared to the state-of-the-art. The complete framework is open-source and available online at https://github.com/ehw-fit/xel-fpgas.
△ Less
Submitted 8 August, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
-
RoHNAS: A Neural Architecture Search Framework with Conjoint Optimization for Adversarial Robustness and Hardware Efficiency of Convolutional and Capsule Networks
Authors:
Alberto Marchisio,
Vojtech Mrazek,
Andrea Massa,
Beatrice Bussolino,
Maurizio Martina,
Muhammad Shafique
Abstract:
Neural Architecture Search (NAS) algorithms aim at finding efficient Deep Neural Network (DNN) architectures for a given application under given system constraints. DNNs are computationally-complex as well as vulnerable to adversarial attacks. In order to address multiple design objectives, we propose RoHNAS, a novel NAS framework that jointly optimizes for adversarial-robustness and hardware-effi…
▽ More
Neural Architecture Search (NAS) algorithms aim at finding efficient Deep Neural Network (DNN) architectures for a given application under given system constraints. DNNs are computationally-complex as well as vulnerable to adversarial attacks. In order to address multiple design objectives, we propose RoHNAS, a novel NAS framework that jointly optimizes for adversarial-robustness and hardware-efficiency of DNNs executed on specialized hardware accelerators. Besides the traditional convolutional DNNs, RoHNAS additionally accounts for complex types of DNNs such as Capsule Networks. For reducing the exploration time, RoHNAS analyzes and selects appropriate values of adversarial perturbation for each dataset to employ in the NAS flow. Extensive evaluations on multi - Graphics Processing Unit (GPU) - High Performance Computing (HPC) nodes provide a set of Pareto-optimal solutions, leveraging the tradeoff between the above-discussed design objectives. For example, a Pareto-optimal DNN for the CIFAR-10 dataset exhibits 86.07% accuracy, while having an energy of 38.63 mJ, a memory footprint of 11.85 MiB, and a latency of 4.47 ms.
△ Less
Submitted 11 October, 2022;
originally announced October 2022.
-
Designing Approximate Arithmetic Circuits with Combined Error Constraints
Authors:
Milan Češka,
Jiří Matyáš,
Vojtech Mrazek,
Tomáš Vojnar
Abstract:
Approximate circuits trading the power consumption for the quality of results play a key role in the development of energy-aware systems. Designing complex approximate circuits is, however, a very difficult and computationally demanding process. When deploying approximate circuits, various error metrics (e.g., mean average error, worst-case error, error rate), as well as other constraints (e.g., c…
▽ More
Approximate circuits trading the power consumption for the quality of results play a key role in the development of energy-aware systems. Designing complex approximate circuits is, however, a very difficult and computationally demanding process. When deploying approximate circuits, various error metrics (e.g., mean average error, worst-case error, error rate), as well as other constraints (e.g., correct multiplication by 0), have to be considered. The state-of-the-art approximation methods typically focus on a single metric which significantly limits the applicability of the resulting circuits. In this paper, we experimentally investigate how various error metrics and their combinations affect the reduction of the power consumption that can be achieved. To this end, we extend evolutionary-driven techniques that allow us to effectively explore the design space of the approximate circuits. We identify principal limitations when complex error constraints are required as well as important correlations among the error metrics enabling the construction of circuits providing the best-known trade-offs between the power reduction and combined error constraints.
△ Less
Submitted 27 June, 2022;
originally announced June 2022.
-
Optimization of BDD-based Approximation Error Metrics Calculations
Authors:
Vojtech Mrazek
Abstract:
Software methods introduced for automated design of approximate implementations of arithmetic circuits rely on fast and accurate evaluation of approximate candidate implementations. To accelerate the evaluation of circuit error, we propose four novel algorithms for the exact worst-case and mean absolute error analysis based on Binary Decision Diagrams. As these algorithms do not compute any absolu…
▽ More
Software methods introduced for automated design of approximate implementations of arithmetic circuits rely on fast and accurate evaluation of approximate candidate implementations. To accelerate the evaluation of circuit error, we propose four novel algorithms for the exact worst-case and mean absolute error analysis based on Binary Decision Diagrams. As these algorithms do not compute any absolute values in the characteristic function, which basically compares a candidate approximate circuit with a golden circuit, the error evaluation is significantly faster than the standard BDD-based error analysis. On average, the proposed algorithms are three times faster (in some cases, 30 times faster) than the baseline for 8- to 32-bit approximate adders. These results were obtained from more than 49 thousand runs with different configurations of the method. The proposed error evaluation algorithms are available as an open-source software https://github.com/ehw-fit/bdd-evaluation.
△ Less
Submitted 6 May, 2022;
originally announced May 2022.
-
ArithsGen: Arithmetic Circuit Generator for Hardware Accelerators
Authors:
Jan Klhufek,
Vojtech Mrazek
Abstract:
Generators of arithmetic circuits can automatically deliver various implementations of arithmetic circuits that show different tradeoffs between the key circuit parameters (delay, area, power consumption). However, existing (freely-)available generators are limited if more complex circuits with a hierarchical structure and additional architecture optimization are requested. Furthermore, they suppo…
▽ More
Generators of arithmetic circuits can automatically deliver various implementations of arithmetic circuits that show different tradeoffs between the key circuit parameters (delay, area, power consumption). However, existing (freely-)available generators are limited if more complex circuits with a hierarchical structure and additional architecture optimization are requested. Furthermore, they support only a few output formats. In order to overcome the above-mentioned limitations, we developed a new generator of arithmetic circuits called ArithsGen. ArithsGen can generate specific architectures of signed and unsigned adders and multipliers using basic building elements such as wires and gates. Compared to existing generators, the user can, for example, specify the type of adders used in multipliers. The tool supports various outputs formats (Verilog, BLIF, C/C++, or integer netlists). ArithsGen was evaluated in the synthesis and optimization of generic customizable accurate and approximate adders and multipliers. Furthermore, we used the circuits generated by ArithsGen as seeds for a tool developed to automatically create approximate implementations of arithmetic circuits. We show that different initial circuits (generated by ArithsGen) significantly impact the properties of these approximate implementations. The tool is available online at https://github.com/ehw-fit/ariths-gen.
△ Less
Submitted 9 March, 2022;
originally announced March 2022.
-
Evolutionary Neural Architecture Search Supporting Approximate Multipliers
Authors:
Michal Pinos,
Vojtech Mrazek,
Lukas Sekanina
Abstract:
There is a growing interest in automated neural architecture search (NAS) methods. They are employed to routinely deliver high-quality neural network architectures for various challenging data sets and reduce the designer's effort. The NAS methods utilizing multi-objective evolutionary algorithms are especially useful when the objective is not only to minimize the network error but also to minimiz…
▽ More
There is a growing interest in automated neural architecture search (NAS) methods. They are employed to routinely deliver high-quality neural network architectures for various challenging data sets and reduce the designer's effort. The NAS methods utilizing multi-objective evolutionary algorithms are especially useful when the objective is not only to minimize the network error but also to minimize the number of parameters (weights) or power consumption of the inference phase. We propose a multi-objective NAS method based on Cartesian genetic programming for evolving convolutional neural networks (CNN). The method allows approximate operations to be used in CNNs to reduce the power consumption of a target hardware implementation. During the NAS process, a suitable CNN architecture is evolved together with approximate multipliers to deliver the best trade-offs between the accuracy, network size, and power consumption. The most suitable approximate multipliers are automatically selected from a library of approximate multipliers. Evolved CNNs are compared with common human-created CNNs of a similar complexity on the CIFAR-10 benchmark problem.
△ Less
Submitted 28 January, 2021;
originally announced January 2021.
-
DESCNet: Develo** Efficient Scratchpad Memories for Capsule Network Hardware
Authors:
Alberto Marchisio,
Vojtech Mrazek,
Muhammad Abdullah Hanif,
Muhammad Shafique
Abstract:
Deep Neural Networks (DNNs) have been established as the state-of-the-art algorithm for advanced machine learning applications. Recently proposed by the Google Brain's team, the Capsule Networks (CapsNets) have improved the generalization ability, as compared to DNNs, due to their multi-dimensional capsules and preserving the spatial relationship between different objects. However, they pose signi…
▽ More
Deep Neural Networks (DNNs) have been established as the state-of-the-art algorithm for advanced machine learning applications. Recently proposed by the Google Brain's team, the Capsule Networks (CapsNets) have improved the generalization ability, as compared to DNNs, due to their multi-dimensional capsules and preserving the spatial relationship between different objects. However, they pose significantly high computational and memory requirements, making their energy-efficient inference a challenging task. This paper provides, for the first time, an in-depth analysis to highlight the design and management related challenges for the (on-chip) memories deployed in hardware accelerators executing fast CapsNets inference. To enable an efficient design, we propose an application-specific memory hierarchy, which minimizes the off-chip memory accesses, while efficiently feeding the data to the hardware accelerator. We analyze the corresponding on-chip memory requirements and leverage it to propose a novel methodology to explore different scratchpad memory designs and their energy/area trade-offs.
Afterwards, an application-specific power-gating technique is proposed to further reduce the energy consumption, depending upon the utilization across different operations of the CapsNets. Our results for a selected Pareto-optimal solution demonstrate no performance loss and an energy reduction of 79% for the complete accelerator, including computational units and memories, when compared to a state-of-the-art design executing Google's CapsNet model for the MNIST dataset.
△ Less
Submitted 12 October, 2020;
originally announced October 2020.
-
NASCaps: A Framework for Neural Architecture Search to Optimize the Accuracy and Hardware Efficiency of Convolutional Capsule Networks
Authors:
Alberto Marchisio,
Andrea Massa,
Vojtech Mrazek,
Beatrice Bussolino,
Maurizio Martina,
Muhammad Shafique
Abstract:
Deep Neural Networks (DNNs) have made significant improvements to reach the desired accuracy to be employed in a wide variety of Machine Learning (ML) applications. Recently the Google Brain's team demonstrated the ability of Capsule Networks (CapsNets) to encode and learn spatial correlations between different input features, thereby obtaining superior learning capabilities compared to traditiona…
▽ More
Deep Neural Networks (DNNs) have made significant improvements to reach the desired accuracy to be employed in a wide variety of Machine Learning (ML) applications. Recently the Google Brain's team demonstrated the ability of Capsule Networks (CapsNets) to encode and learn spatial correlations between different input features, thereby obtaining superior learning capabilities compared to traditional (i.e., non-capsule based) DNNs. However, designing CapsNets using conventional methods is a tedious job and incurs significant training effort. Recent studies have shown that powerful methods to automatically select the best/optimal DNN model configuration for a given set of applications and a training dataset are based on the Neural Architecture Search (NAS) algorithms. Moreover, due to their extreme computational and memory requirements, DNNs are employed using the specialized hardware accelerators in IoT-Edge/CPS devices. In this paper, we propose NASCaps, an automated framework for the hardware-aware NAS of different types of DNNs, covering both traditional convolutional DNNs and CapsNets. We study the efficacy of deploying a multi-objective Genetic Algorithm (e.g., based on the NSGA-II algorithm). The proposed framework can jointly optimize the network accuracy and the corresponding hardware efficiency, expressed in terms of energy, memory, and latency of a given hardware accelerator executing the DNN inference. Besides supporting the traditional DNN layers, our framework is the first to model and supports the specialized capsule layers and dynamic routing in the NAS-flow. We evaluate our framework on different datasets, generating different network configurations, and demonstrate the tradeoffs between the different output metrics. We will open-source the complete framework and configurations of the Pareto-optimal architectures at https://github.com/ehw-fit/nascaps.
△ Less
Submitted 19 August, 2020;
originally announced August 2020.
-
Semantically-Oriented Mutation Operator in Cartesian Genetic Programming for Evolutionary Circuit Design
Authors:
David Hodan,
Vojtech Mrazek,
Zdenek Vasicek
Abstract:
Despite many successful applications, Cartesian Genetic Programming (CGP) suffers from limited scalability, especially when used for evolutionary circuit design. Considering the multiplier design problem, for example, the 5x5-bit multiplier represents the most complex circuit evolved from a randomly generated initial population. The efficiency of CGP highly depends on the performance of the point…
▽ More
Despite many successful applications, Cartesian Genetic Programming (CGP) suffers from limited scalability, especially when used for evolutionary circuit design. Considering the multiplier design problem, for example, the 5x5-bit multiplier represents the most complex circuit evolved from a randomly generated initial population. The efficiency of CGP highly depends on the performance of the point mutation operator, however, this operator is purely stochastic. This contrasts with the recent developments in Genetic Programming (GP), where advanced informed approaches such as semantic-aware operators are incorporated to improve the search space exploration capability of GP. In this paper, we propose a semantically-oriented mutation operator (SOMO) suitable for the evolutionary design of combinational circuits. SOMO uses semantics to determine the best value for each mutated gene. Compared to the common CGP and its variants as well as the recent versions of Semantic GP, the proposed method converges on common Boolean benchmarks substantially faster while kee** the phenotype size relatively small. The successfully evolved instances presented in this paper include 10-bit parity, 10+10-bit adder and 5x5-bit multiplier. The most complex circuits were evolved in less than one hour with a single-thread implementation running on a common CPU.
△ Less
Submitted 23 April, 2020;
originally announced April 2020.
-
ApproxFPGAs: Embracing ASIC-Based Approximate Arithmetic Components for FPGA-Based Systems
Authors:
Bharath Srinivas Prabakaran,
Vojtech Mrazek,
Zdenek Vasicek,
Lukas Sekanina,
Muhammad Shafique
Abstract:
There has been abundant research on the development of Approximate Circuits (ACs) for ASICs. However, previous studies have illustrated that ASIC-based ACs offer asymmetrical gains in FPGA-based accelerators. Therefore, an AC that might be pareto-optimal for ASICs might not be pareto-optimal for FPGAs. In this work, we present the ApproxFPGAs methodology that uses machine learning models to reduce…
▽ More
There has been abundant research on the development of Approximate Circuits (ACs) for ASICs. However, previous studies have illustrated that ASIC-based ACs offer asymmetrical gains in FPGA-based accelerators. Therefore, an AC that might be pareto-optimal for ASICs might not be pareto-optimal for FPGAs. In this work, we present the ApproxFPGAs methodology that uses machine learning models to reduce the exploration time for analyzing the state-of-the-art ASIC-based ACs to determine the set of pareto-optimal FPGA-based ACs. We also perform a case-study to illustrate the benefits obtained by deploying these pareto-optimal FPGA-based ACs in a state-of-the-art automation framework to systematically generate pareto-optimal approximate accelerators that can be deployed in FPGA-based systems to achieve high performance or low-power consumption.
△ Less
Submitted 22 April, 2020;
originally announced April 2020.
-
Using Libraries of Approximate Circuits in Design of Hardware Accelerators of Deep Neural Networks
Authors:
Vojtech Mrazek,
Lukas Sekanina,
Zdenek Vasicek
Abstract:
Approximate circuits have been developed to provide good tradeoffs between power consumption and quality of service in error resilient applications such as hardware accelerators of deep neural networks (DNN). In order to accelerate the approximate circuit design process and to support a fair benchmarking of circuit approximation methods, libraries of approximate circuits have been introduced. For…
▽ More
Approximate circuits have been developed to provide good tradeoffs between power consumption and quality of service in error resilient applications such as hardware accelerators of deep neural networks (DNN). In order to accelerate the approximate circuit design process and to support a fair benchmarking of circuit approximation methods, libraries of approximate circuits have been introduced. For example, EvoApprox8b contains hundreds of 8-bit approximate adders and multipliers. By means of genetic programming we generated an extended version of the library in which thousands of 8- to 128-bit approximate arithmetic circuits are included. These circuits form Pareto fronts with respect to several error metrics, power consumption and other circuit parameters. In our case study we show how a large set of approximate multipliers can be used to perform a resilience analysis of a hardware accelerator of ResNet DNN and to select the most suitable approximate multiplier for a given application. Results are reported for various instances of the ResNet DNN trained on CIFAR-10 benchmark problem.
△ Less
Submitted 22 April, 2020;
originally announced April 2020.
-
Adaptive Verifiability-Driven Strategy for Evolutionary Approximation of Arithmetic Circuits
Authors:
Milan Ceska,
Jiri Matyas,
Vojtech Mrazek,
Lukas Sekanina,
Zdenek Vasicek,
Tomas Vojnar
Abstract:
We present a novel approach for designing complex approximate arithmetic circuits that trade correctness for power consumption and play important role in many energy-aware applications. Our approach integrates in a unique way formal methods providing formal guarantees on the approximation error into an evolutionary circuit optimisation algorithm. The key idea is to employ a novel adaptive search s…
▽ More
We present a novel approach for designing complex approximate arithmetic circuits that trade correctness for power consumption and play important role in many energy-aware applications. Our approach integrates in a unique way formal methods providing formal guarantees on the approximation error into an evolutionary circuit optimisation algorithm. The key idea is to employ a novel adaptive search strategy that drives the evolution towards promptly verifiable approximate circuits. As demonstrated in an extensive experimental evaluation including several structurally different arithmetic circuits and target precisions, the search strategy provides superior scalability and versatility with respect to various approximation scenarios. Our approach significantly improves capabilities of the existing methods and paves a way towards an automated design process of provably-correct circuit approximations.
△ Less
Submitted 5 March, 2020;
originally announced March 2020.
-
TFApprox: Towards a Fast Emulation of DNN Approximate Hardware Accelerators on GPU
Authors:
Filip Vaverka,
Vojtech Mrazek,
Zdenek Vasicek,
Lukas Sekanina
Abstract:
Energy efficiency of hardware accelerators of deep neural networks (DNN) can be improved by introducing approximate arithmetic circuits. In order to quantify the error introduced by using these circuits and avoid the expensive hardware prototy**, a software emulator of the DNN accelerator is usually executed on CPU or GPU. However, this emulation is typically two or three orders of magnitude slo…
▽ More
Energy efficiency of hardware accelerators of deep neural networks (DNN) can be improved by introducing approximate arithmetic circuits. In order to quantify the error introduced by using these circuits and avoid the expensive hardware prototy**, a software emulator of the DNN accelerator is usually executed on CPU or GPU. However, this emulation is typically two or three orders of magnitude slower than a software DNN implementation running on CPU or GPU and operating with standard floating point arithmetic instructions and common DNN libraries. The reason is that there is no hardware support for approximate arithmetic operations on common CPUs and GPUs and these operations have to be expensively emulated. In order to address this issue, we propose an efficient emulation method for approximate circuits utilized in a given DNN accelerator which is emulated on GPU. All relevant approximate circuits are implemented as look-up tables and accessed through a texture memory mechanism of CUDA capable GPUs. We exploit the fact that the texture memory is optimized for irregular read-only access and in some GPU architectures is even implemented as a dedicated cache. This technique allowed us to reduce the inference time of the emulated DNN accelerator approximately 200 times with respect to an optimized CPU version on complex DNNs such as ResNet. The proposed approach extends the TensorFlow library and is available online at https://github.com/ehw-fit/tf-approximate.
△ Less
Submitted 21 February, 2020;
originally announced February 2020.
-
ReD-CaNe: A Systematic Methodology for Resilience Analysis and Design of Capsule Networks under Approximations
Authors:
Alberto Marchisio,
Vojtech Mrazek,
Muhammad Abudllah Hanif,
Muhammad Shafique
Abstract:
Recent advances in Capsule Networks (CapsNets) have shown their superior learning capability, compared to the traditional Convolutional Neural Networks (CNNs). However, the extremely high complexity of CapsNets limits their fast deployment in real-world applications. Moreover, while the resilience of CNNs have been extensively investigated to enable their energy-efficient implementations, the anal…
▽ More
Recent advances in Capsule Networks (CapsNets) have shown their superior learning capability, compared to the traditional Convolutional Neural Networks (CNNs). However, the extremely high complexity of CapsNets limits their fast deployment in real-world applications. Moreover, while the resilience of CNNs have been extensively investigated to enable their energy-efficient implementations, the analysis of CapsNets' resilience is a largely unexplored area, that can provide a strong foundation to investigate techniques to overcome the CapsNets' complexity challenge.
Following the trend of Approximate Computing to enable energy-efficient designs, we perform an extensive resilience analysis of the CapsNets inference subjected to the approximation errors. Our methodology models the errors arising from the approximate components (like multipliers), and analyze their impact on the classification accuracy of CapsNets. This enables the selection of approximate components based on the resilience of each operation of the CapsNet inference. We modify the TensorFlow framework to simulate the injection of approximation noise (based on the models of the approximate components) at different computational operations of the CapsNet inference. Our results show that the CapsNets are more resilient to the errors injected in the computations that occur during the dynamic routing (the softmax and the update of the coefficients), rather than other stages like convolutions and activation functions. Our analysis is extremely useful towards designing efficient CapsNet hardware accelerators with approximate components. To the best of our knowledge, this is the first proof-of-concept for employing approximations on the specialized CapsNet hardware.
△ Less
Submitted 2 December, 2019;
originally announced December 2019.
-
ALWANN: Automatic Layer-Wise Approximation of Deep Neural Network Accelerators without Retraining
Authors:
Vojtech Mrazek,
Zdenek Vasicek,
Lukas Sekanina,
Muhammad Abdullah Hanif,
Muhammad Shafique
Abstract:
The state-of-the-art approaches employ approximate computing to reduce the energy consumption of DNN hardware. Approximate DNNs then require extensive retraining afterwards to recover from the accuracy loss caused by the use of approximate operations. However, retraining of complex DNNs does not scale well. In this paper, we demonstrate that efficient approximations can be introduced into the comp…
▽ More
The state-of-the-art approaches employ approximate computing to reduce the energy consumption of DNN hardware. Approximate DNNs then require extensive retraining afterwards to recover from the accuracy loss caused by the use of approximate operations. However, retraining of complex DNNs does not scale well. In this paper, we demonstrate that efficient approximations can be introduced into the computational path of DNN accelerators while retraining can completely be avoided. ALWANN provides highly optimized implementations of DNNs for custom low-power accelerators in which the number of computing units is lower than the number of DNN layers. First, a fully trained DNN is converted to operate with 8-bit weights and 8-bit multipliers in convolutional layers. A suitable approximate multiplier is then selected for each computing element from a library of approximate multipliers in such a way that (i) one approximate multiplier serves several layers, and (ii) the overall classification error and energy consumption are minimized. The optimizations including the multiplier selection problem are solved by means of a multiobjective optimization NSGA-II algorithm. In order to completely avoid the computationally expensive retraining of DNN, which is usually employed to improve the classification accuracy, we propose a simple weight updating scheme that compensates the inaccuracy introduced by employing approximate multipliers. The proposed approach is evaluated for two architectures of DNN accelerators with approximate multipliers from the open-source "EvoApprox" library. We report that the proposed approach saves 30% of energy needed for multiplication in convolutional layers of ResNet-50 while the accuracy is degraded by only 0.6%. The proposed technique and approximate layers are available as an open-source extension of TensorFlow at https://github.com/ehw-fit/tf-approximate.
△ Less
Submitted 25 July, 2019; v1 submitted 11 June, 2019;
originally announced July 2019.
-
Automated Circuit Approximation Method Driven by Data Distribution
Authors:
Zdenek Vasicek,
Vojtech Mrazek,
Lukas Sekanina
Abstract:
We propose an application-tailored data-driven fully automated method for functional approximation of combinational circuits. We demonstrate how an application-level error metric such as the classification accuracy can be translated to a component-level error metric needed for an efficient and fast search in the space of approximate low-level components that are used in the application. This is po…
▽ More
We propose an application-tailored data-driven fully automated method for functional approximation of combinational circuits. We demonstrate how an application-level error metric such as the classification accuracy can be translated to a component-level error metric needed for an efficient and fast search in the space of approximate low-level components that are used in the application. This is possible by employing a weighted mean error distance (WMED) metric for steering the circuit approximation process which is conducted by means of genetic programming. WMED introduces a set of weights (calculated from the data distribution measured on a selected signal in a given application) determining the importance of each input vector for the approximation process. The method is evaluated using synthetic benchmarks and application-specific approximate MAC (multiply-and-accumulate) units that are designed to provide the best trade-offs between the classification accuracy and power consumption of two image classifiers based on neural networks.
△ Less
Submitted 11 March, 2019;
originally announced March 2019.
-
autoAx: An Automatic Design Space Exploration and Circuit Building Methodology utilizing Libraries of Approximate Components
Authors:
Vojtech Mrazek,
Muhammad Abdullah Hanif,
Zdenek Vasicek,
Lukas Sekanina,
Muhammad Shafique
Abstract:
Approximate computing is an emerging paradigm for develo** highly energy-efficient computing systems such as various accelerators. In the literature, many libraries of elementary approximate circuits have already been proposed to simplify the design process of approximate accelerators. Because these libraries contain from tens to thousands of approximate implementations for a single arithmetic o…
▽ More
Approximate computing is an emerging paradigm for develo** highly energy-efficient computing systems such as various accelerators. In the literature, many libraries of elementary approximate circuits have already been proposed to simplify the design process of approximate accelerators. Because these libraries contain from tens to thousands of approximate implementations for a single arithmetic operation it is intractable to find an optimal combination of approximate circuits in the library even for an application consisting of a few operations. An open problem is "how to effectively combine circuits from these libraries to construct complex approximate accelerators". This paper proposes a novel methodology for searching, selecting and combining the most suitable approximate circuits from a set of available libraries to generate an approximate accelerator for a given application. To enable fast design space generation and exploration, the methodology utilizes machine learning techniques to create computational models estimating the overall quality of processing and hardware cost without performing full synthesis at the accelerator level. Using the methodology, we construct hundreds of approximate accelerators (for a Sobel edge detector) showing different but relevant tradeoffs between the quality of processing and hardware cost and identify a corresponding Pareto-frontier. Furthermore, when searching for approximate implementations of a generic Gaussian filter consisting of 17 arithmetic operations, the proposed approach allows us to identify approximately $10^3$ highly important implementations from $10^{23}$ possible solutions in a few hours, while the exhaustive search would take four months on a high-end processor.
△ Less
Submitted 1 April, 2019; v1 submitted 22 February, 2019;
originally announced February 2019.