-
Algorithm and Hardware Co-design for Reconfigurable CNN Accelerator
Authors:
Hongxiang Fan,
Martin Ferianc,
Zhiqiang Que,
He Li,
Shuanglong Liu,
Xinyu Niu,
Wayne Luk
Abstract:
Recent advances in algorithm-hardware co-design for deep neural networks (DNNs) have demonstrated their potential in automatically designing neural architectures and hardware designs. Nevertheless, it is still a challenging optimization problem due to the expensive training cost and the time-consuming hardware implementation, which make exploration of the vast design space of neural architecture and hardware design intractable. In this paper, we demonstrate that our proposed approach is capable of locating designs on the Pareto frontier. This capability is enabled by a novel three-phase co-design framework, with the following new features: (a) decoupling DNN training from the design space exploration of hardware architecture and neural architecture, (b) providing a hardware-friendly neural architecture space by considering hardware characteristics in constructing the search cells, (c) adopting a Gaussian process to predict accuracy, latency and power consumption to avoid time-consuming synthesis and place-and-route processes. In comparison with the manually-designed ResNet101, InceptionV2 and MobileNetV2, we can achieve up to 5% higher accuracy with up to 3x speed-up on the ImageNet dataset. Compared with other state-of-the-art co-design frameworks, the network and hardware configuration we found can achieve 2% ~ 6% higher accuracy, 2x ~ 26x smaller latency and 8.5x higher energy efficiency.
Submitted 24 November, 2021;
originally announced November 2021.
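The abstract above refers to locating designs on the Pareto frontier. As a minimal sketch of what that means, and not the paper's framework, the following filters a list of candidate designs down to the non-dominated ones. The two objectives (accuracy to maximise, latency to minimise), the `Design` tuple layout and the sample values are assumptions chosen purely for illustration.

```python
from typing import List, Tuple

# Hypothetical design point: (accuracy to maximise, latency in ms to minimise).
Design = Tuple[float, float]


def dominates(a: Design, b: Design) -> bool:
    """Return True if `a` is no worse than `b` in both objectives
    and strictly better in at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Illustrative values only; these are not results from the paper.
candidates = [(0.76, 12.0), (0.74, 8.0), (0.70, 9.0), (0.78, 20.0), (0.76, 15.0)]
front = pareto_front(candidates)
print(front)
```

Here `(0.70, 9.0)` is dropped because `(0.74, 8.0)` is both more accurate and faster, and `(0.76, 15.0)` is dropped because `(0.76, 12.0)` matches its accuracy at lower latency. The quadratic scan is adequate for small candidate sets; a real design space exploration would likely need more objectives and a more scalable method.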
-
High-Performance FPGA-based Accelerator for Bayesian Neural Networks
Authors:
Hongxiang Fan,
Martin Ferianc,
Miguel Rodrigues,
Hongyu Zhou,
Xinyu Niu,
Wayne Luk
Abstract:
Neural networks (NNs) have demonstrated their potential in a wide range of applications such as image recognition, decision making or recommendation systems. However, standard NNs are unable to capture their model uncertainty which is crucial for many safety-critical applications including healthcare and autonomous vehicles. In comparison, Bayesian neural networks (BNNs) are able to express uncertainty in their prediction via a mathematical grounding. Nevertheless, BNNs have not been as widely used in industrial practice, mainly because of their expensive computational cost and limited hardware performance. This work proposes a novel FPGA-based hardware architecture to accelerate BNNs inferred through Monte Carlo Dropout. Compared with other state-of-the-art BNN accelerators, the proposed accelerator can achieve up to 4 times higher energy efficiency and 9 times better compute efficiency. Considering partial Bayesian inference, an automatic framework is proposed, which explores the trade-off between hardware and algorithmic performance. Extensive experiments are conducted to demonstrate that our proposed framework can effectively find the optimal points in the design space.
Submitted 30 November, 2021; v1 submitted 12 May, 2021;
originally announced May 2021.
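The abstract above concerns BNNs inferred through Monte Carlo Dropout. As a rough software-only sketch of that idea, and not the proposed hardware architecture, the following keeps dropout active at inference time, runs several stochastic forward passes, and reports the mean and spread of the outputs as a prediction with an uncertainty estimate. The single linear layer, the function names, the dropout rate and the sample count are all assumptions for illustration.

```python
import random
import statistics
from typing import List, Tuple


def forward_with_dropout(x: List[float], weights: List[float],
                         p_drop: float, rng: random.Random) -> float:
    """One stochastic forward pass of a toy linear layer.
    Each input is kept with probability 1 - p_drop and rescaled
    (inverted dropout) so the expected output is unchanged."""
    keep = 1.0 - p_drop
    total = 0.0
    for xi, wi in zip(x, weights):
        if rng.random() < keep:
            total += xi * wi / keep
    return total


def mc_dropout_predict(x: List[float], weights: List[float],
                       p_drop: float = 0.5, n_samples: int = 100,
                       seed: int = 0) -> Tuple[float, float]:
    """Return (mean, standard deviation) over n_samples dropout passes."""
    rng = random.Random(seed)
    samples = [forward_with_dropout(x, weights, p_drop, rng)
               for _ in range(n_samples)]
    return statistics.mean(samples), statistics.pstdev(samples)


# Illustrative input and weights; not taken from the paper.
mean, spread = mc_dropout_predict([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])
print(f"prediction = {mean:.3f}, uncertainty = {spread:.3f}")
```

With `p_drop=0.0` every pass is identical, so the spread collapses to zero and the model behaves like a standard deterministic network. The repeated passes are also what makes this style of inference computationally expensive.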
-
Improving Performance Estimation for FPGA-based Accelerators for Convolutional Neural Networks
Authors:
Martin Ferianc,
Hongxiang Fan,
Ringo S. W. Chu,
Jakub Stano,
Wayne Luk
Abstract:
Field-programmable gate array (FPGA) based accelerators are being widely used for acceleration of convolutional neural networks (CNNs) due to their potential in improving the performance and reconfigurability for specific application instances. To determine the optimal configuration of an FPGA-based accelerator, it is necessary to explore the design space and an accurate performance prediction plays an important role during the exploration. This work introduces a novel method for fast and accurate estimation of latency based on a Gaussian process parametrised by an analytic approximation and coupled with runtime data. The experiments conducted on three different CNNs on an FPGA-based accelerator on Intel Arria 10 GX 1150 demonstrated a 30.7% improvement in accuracy with respect to the mean absolute error in comparison to a standard analytic method in leave-one-out cross-validation.
Submitted 1 February, 2020;
originally announced February 2020.
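The abstract above describes latency estimation with a Gaussian process parametrised by an analytic approximation and coupled with runtime data. One plausible reading, assumed here rather than confirmed by the abstract, is to use the analytic estimate as the prior mean and let the Gaussian process model the residual between measured and analytic latency. The sketch below follows that reading with a one-dimensional input, an RBF kernel and a made-up analytic model `analytic_latency`; all names and numbers are hypothetical.

```python
import math
from typing import List


def rbf(a: float, b: float, length: float = 1.0) -> float:
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def analytic_latency(ops: float) -> float:
    """Hypothetical analytic model: latency proportional to operation count."""
    return 2.0 * ops


def gp_latency(train_x: List[float], train_y: List[float], query: float,
               noise: float = 1e-6, length: float = 1.0) -> float:
    """GP posterior mean with the analytic model as prior mean:
    m(x*) + k*^T (K + noise * I)^-1 (y - m(X))."""
    residuals = [y - analytic_latency(x) for x, y in zip(train_x, train_y)]
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(train_x)]
         for i, a in enumerate(train_x)]
    alpha = solve(K, residuals)
    k_star = [rbf(query, x, length) for x in train_x]
    return analytic_latency(query) + sum(k * a for k, a in zip(k_star, alpha))


# Illustrative "measured" latencies; not data from the paper.
xs, ys = [1.0, 2.0, 3.0], [2.5, 4.4, 6.7]
print(gp_latency(xs, ys, 2.5))
```

Near the measured points the prediction follows the runtime data, while far from them the kernel contribution vanishes and the prediction falls back to the analytic estimate.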
-
Learning Absolute Sound Source Localisation With Limited Supervisions
Authors:
Yang Chu,
Wayne Luk,
Dan Goodman
Abstract:
An accurate auditory space map can be learned from auditory experience, for example during development or in response to altered auditory cues such as a modified pinna. We studied neural network models that learn to localise a single sound source in the horizontal plane using binaural cues based on limited supervision. This supervision can be unreliable or sparse in real life. First, a simple model with an unreliable estimate of the sound source location is built to simulate the unreliable auditory orienting response of newborns. It is used as a Teacher that acts as a source of unreliable supervision. Then we show that it is possible to learn a continuous auditory space map based only on noisy left or right feedback from the Teacher. Furthermore, reinforcement rewards from the environment are used as a source of sparse supervision. By combining the unreliable innate response and the sparse reinforcement rewards, an accurate auditory space map, which is hard to achieve with either kind of supervision alone, can eventually be learned. Our results show that the auditory space mapping can be calibrated even without explicit supervision. Moreover, this study implies a possibly more general neural mechanism in which multiple sub-modules can be coordinated to facilitate each other's learning process under limited supervision.
Submitted 28 January, 2020;
originally announced January 2020.
-
DirectPET: Full Size Neural Network PET Reconstruction from Sinogram Data
Authors:
William Whiteley,
Wing K. Luk,
Jens Gregor
Abstract:
Purpose: Neural network image reconstruction directly from measurement data is a relatively new field of research that until now has been limited to producing small single-slice images (e.g., 1x128x128). This paper proposes a novel and more efficient network design for Positron Emission Tomography called DirectPET, which is capable of reconstructing multi-slice image volumes (i.e., 16x400x400) from sinograms.
Approach: Large-scale direct neural network reconstruction is accomplished by addressing the associated memory space challenge through the introduction of a specially designed Radon inversion layer. Using patient data, we compare the proposed method to the benchmark Ordered Subsets Expectation Maximization (OSEM) algorithm using signal-to-noise ratio, bias, mean absolute error and structural similarity measures. In addition, line profiles and full-width half-maximum measurements are provided for a sample of lesions.
Results: DirectPET is shown capable of producing images that are quantitatively and qualitatively similar to the OSEM target images in a fraction of the time. We also report on an experiment where DirectPET is trained to map low count raw data to normal count target images demonstrating the method's ability to maintain image quality under a low dose scenario.
Conclusion: The ability of DirectPET to quickly reconstruct high-quality, multi-slice image volumes suggests potential clinical viability of the method. However, design parameters and performance boundaries need to be fully established before adoption can be considered.
Submitted 11 February, 2020; v1 submitted 19 August, 2019;
originally announced August 2019.
-
Convolution Based Spectral Partitioning Architecture for Hyperspectral Image Classification
Authors:
Ringo S. W. Chu,
Ho-Cheung Ng,
Xiwei Wang,
Wayne Luk
Abstract:
Hyperspectral images (HSIs) can distinguish materials using a high number of spectral bands; they are widely adopted in remote sensing applications and enable high-accuracy land cover classification. However, HSI processing is hampered by high dimensionality and a limited amount of labelled data. To address these challenges, this paper proposes a deep learning architecture using three-dimensional convolutional neural networks with spectral partitioning to perform effective feature extraction. We conduct experiments using Indian Pines and Salinas scenes acquired by the NASA Airborne Visible/Infra-Red Imaging Spectrometer. In comparison to prior results, our architecture shows competitive classification performance against current methods.
Submitted 27 June, 2019;
originally announced June 2019.
-
Optimizing CNN-based Hyperspectral Image Classification on FPGAs
Authors:
Shuanglong Liu,
Ringo S. W. Chu,
Xiwei Wang,
Wayne Luk
Abstract:
Hyperspectral image (HSI) classification has been widely adopted in applications involving remote sensing imagery analysis, which require high classification accuracy and real-time processing speed. Methods based on convolutional neural networks (CNNs) have been proven to achieve state-of-the-art accuracy in classifying HSIs. However, compared to traditional methods such as Support Vector Machines (SVMs), CNN models are often too computationally intensive to achieve real-time response due to the high-dimensional nature of HSI. Besides, previous CNN models used in HSI are not specially designed for efficient implementation on embedded devices such as FPGAs. This paper proposes a novel CNN-based algorithm for HSI classification which takes into account hardware efficiency. A customized architecture which enables the proposed algorithm to be mapped effectively onto FPGA resources is then proposed to support real-time on-board classification with low power consumption. Implementation results show that our proposed accelerator on a Xilinx Zynq 706 FPGA board is more than 70x faster than an Intel 8-core Xeon CPU and 3x faster than an NVIDIA GeForce 1080 GPU. Compared to previous SVM-based FPGA accelerators, we achieve comparable processing speed but provide a much higher classification accuracy.
Submitted 27 June, 2019;
originally announced June 2019.