-
A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models
Authors:
Reda Bensaid,
Vincent Gripon,
François Leduc-Primeau,
Lukas Mauch,
Ghouthi Boukli Hacene,
Fabien Cardinaux
Abstract:
In recent years, the rapid evolution of computer vision has seen the emergence of various foundation models, each tailored to specific data types and tasks. In this study, we explore the adaptation of these models for few-shot semantic segmentation. Specifically, we conduct a comprehensive comparative analysis of four prominent foundation models (DINO V2, Segment Anything, CLIP, and Masked AutoEncoders) and of a straightforward ResNet50 pre-trained on the COCO dataset. We also include five adaptation methods, ranging from linear probing to fine-tuning. Our findings show that DINO V2 outperforms the other models by a large margin across various datasets and adaptation methods. In contrast, the choice of adaptation method makes little difference to the results, suggesting that simple linear probing can compete with more advanced, computationally intensive alternatives.
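Linear probing, the simplest adaptation method in this comparison, trains only a linear head on top of frozen backbone features. A minimal sketch for a binary foreground/background case follows, assuming per-patch feature vectors have already been extracted from a frozen backbone; the toy features are hypothetical placeholders (not DINO V2 outputs), and the benchmark's actual heads and losses may differ.

```python
import math

def train_linear_probe(features, labels, lr=0.5, epochs=200):
    # features: frozen per-patch embeddings; labels: 1 = foreground, 0 = background.
    # Only the linear head (w, b) is trained; the backbone is never updated.
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of binary cross-entropy with respect to z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Hypothetical 2-D "patch features" for illustration only
feats = [[1.0, 0.9], [0.8, 1.0], [-1.0, -0.8], [-0.9, -1.1]]
labs = [1, 1, 0, 0]
w, b = train_linear_probe(feats, labs)
```

Because the backbone is frozen, the cost of this adaptation is a single feature-extraction pass plus a tiny optimization, which is what makes the comparison with full fine-tuning interesting.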
Submitted 2 April, 2024; v1 submitted 20 January, 2024;
originally announced January 2024.
-
SAGE-HB: Swift Adaptation and Generalization in Massive MIMO Hybrid Beamforming
Authors:
Ali Hasanzadeh Karkan,
Hamed Hojatian,
Jean-François Frigon,
François Leduc-Primeau
Abstract:
Deep learning (DL)-based solutions have emerged as promising candidates for beamforming in massive Multiple-Input Multiple-Output (mMIMO) systems. Nevertheless, it remains challenging to seamlessly adapt these solutions to practical deployment scenarios, typically necessitating extensive data for fine-tuning while grappling with domain adaptation and generalization issues. In response, we propose a novel approach combining Meta-Learning Domain Generalization (MLDG) with novel data augmentation techniques during fine-tuning. This approach not only accelerates adaptation to new channel environments but also significantly reduces the data requirements for fine-tuning, thereby enhancing the practicality and efficiency of DL-based mMIMO systems. The proposed approach is validated by simulating the performance of a backbone model when deployed in a new channel environment, and with different antenna configurations, path loss, and base station height parameters. Our proposed approach demonstrates superior zero-shot performance compared to existing methods and also achieves near-optimal performance with significantly fewer fine-tuning data samples.
Submitted 19 January, 2024;
originally announced January 2024.
-
Learning Energy-Efficient Hardware Configurations for Massive MIMO Beamforming
Authors:
Hamed Hojatian,
Zoubeir Mlika,
Jérémy Nadal,
Jean-François Frigon,
François Leduc-Primeau
Abstract:
Hybrid beamforming (HBF) and antenna selection are promising techniques for improving the energy efficiency (EE) of massive multiple-input multiple-output (mMIMO) systems. However, the transmitter architecture may contain several parameters that need to be optimized, such as the power allocated to the antennas and the connections between the antennas and the radio frequency chains. Therefore, finding the optimal transmitter architecture requires solving a non-convex mixed-integer problem in a large search space. In this paper, we consider the problem of maximizing the EE of fully digital precoder (FDP) and HBF transmitters. First, we propose an energy model for different beamforming structures. Then, based on the proposed energy model, we develop an unsupervised deep learning method to maximize the EE by designing the transmitter configuration for FDP and HBF. The proposed deep neural networks can provide different trade-offs between spectral efficiency and energy consumption while adapting to different numbers of active users. Finally, to ensure that the proposed method can be implemented in practice, we investigate the ability of the model to be trained exclusively using imperfect channel state information (CSI), both for the input to the deep learning model and for the calculation of the loss function. Simulation results show that the proposed solutions can outperform conventional methods in terms of EE while being trained with imperfect CSI. Furthermore, we show that the proposed solutions are less complex and more robust to noise than conventional methods.
Submitted 11 August, 2023;
originally announced August 2023.
-
Sparq: A Custom RISC-V Vector Processor for Efficient Sub-Byte Quantized Inference
Authors:
Théo Dupuis,
Yoan Fournier,
MohammadHossein AskariHemmat,
Nizar El Zarif,
François Leduc-Primeau,
Jean Pierre David,
Yvon Savaria
Abstract:
Convolutional Neural Networks (CNNs) are used in a wide range of applications, with full-precision CNNs achieving high accuracy at the expense of portability. Recent progress in quantization techniques has demonstrated that sub-byte Quantized Neural Networks (QNNs) achieve comparable or superior accuracy while significantly reducing the computational cost and memory footprint. However, sub-byte computation on commodity hardware is sub-optimal due to the lack of support for such precision. In this paper, we introduce Sparq, a Sub-byte vector Processor designed for the AcceleRation of QNN inference. This processor is based on a modified version of Ara, an open-source 64-bit RISC-V processor compliant with the "V" vector extension. Sparq is implemented in GlobalFoundries 22FDX FD-SOI technology and extends the Instruction Set Architecture (ISA) with a new multiply-shift-accumulate instruction to improve sub-byte computation efficiency. The floating-point unit is also removed to minimize area and power usage. To demonstrate Sparq's performance, we implement an ultra-low-precision (1-bit to 4-bit) vectorized conv2d operation that takes advantage of the dedicated hardware. We show that Sparq can significantly accelerate sub-byte computations, achieving speedups of 3.2 times and 1.7 times over an optimized 16-bit 2D convolution for 2-bit and 4-bit quantization, respectively.
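To see why sub-byte data is awkward on commodity hardware, consider what a software kernel has to do: pack several low-precision values per byte, then unpack them with shifts and masks before every multiply-accumulate. The sketch below does this for unsigned 2-bit values; the packing order and function names are hypothetical, and it is not Sparq's multiply-shift-accumulate instruction, whose exact semantics are defined in the paper.

```python
def pack_2bit(values):
    # Pack unsigned 2-bit values (0..3) four per byte, lowest bits first.
    packed = bytearray((len(values) + 3) // 4)
    for i, v in enumerate(values):
        if not 0 <= v <= 3:
            raise ValueError("2-bit values must lie in 0..3")
        packed[i // 4] |= v << (2 * (i % 4))
    return bytes(packed)

def unpack_2bit(packed, n):
    # Recover n values with a shift and a 2-bit mask each.
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]

def dot_2bit(packed_a, packed_w, n):
    # Software dot product: every element costs an unpack plus a full-width multiply.
    acc = 0
    for a, w in zip(unpack_2bit(packed_a, n), unpack_2bit(packed_w, n)):
        acc += a * w
    return acc

acts = [1, 2, 3, 0, 1]
weights = [3, 3, 1, 2, 0]
result = dot_2bit(pack_2bit(acts), pack_2bit(weights), len(acts))
```

The shift-and-mask overhead per element is exactly what dedicated sub-byte instructions aim to remove.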
Submitted 16 June, 2023;
originally announced June 2023.
-
Quark: An Integer RISC-V Vector Processor for Sub-Byte Quantized DNN Inference
Authors:
MohammadHossein AskariHemmat,
Theo Dupuis,
Yoan Fournier,
Nizar El Zarif,
Matheus Cavalcante,
Matteo Perotti,
Frank Gurkaynak,
Luca Benini,
Francois Leduc-Primeau,
Yvon Savaria,
Jean-Pierre David
Abstract:
In this paper, we present Quark, an integer RISC-V vector processor specifically tailored for sub-byte DNN inference. Quark is implemented in GlobalFoundries' 22FDX FD-SOI technology. It is designed on top of Ara, an open-source 64-bit RISC-V vector processor. To accommodate sub-byte DNN inference, Quark extends Ara by adding specialized vector instructions to perform sub-byte quantized operations. We also remove the floating-point unit from Quark's lanes and use the CVA6 RISC-V scalar core for the re-scaling operations that are required in quantized neural network inference. This makes each Quark lane 2 times smaller and 1.9 times more power-efficient than an Ara lane. We show that Quark can run quantized models at sub-byte precision. Notably, for 1-bit and 2-bit quantized models, Quark can accelerate the computation of Conv2d over various ranges of input and kernel sizes.
Submitted 12 February, 2023;
originally announced February 2023.
-
SAMSON: Sharpness-Aware Minimization Scaled by Outlier Normalization for Improving DNN Generalization and Robustness
Authors:
Gonçalo Mordido,
Sébastien Henwood,
Sarath Chandar,
François Leduc-Primeau
Abstract:
Energy-efficient deep neural network (DNN) accelerators are prone to non-idealities that degrade DNN performance at inference time. To mitigate such degradation, existing methods typically add perturbations to the DNN weights during training to simulate inference on noisy hardware. However, this often requires knowledge about the target hardware and leads to a trade-off between DNN performance and robustness, decreasing the former to increase the latter. In this work, we show that applying sharpness-aware training, by optimizing for both the loss value and loss sharpness, significantly improves robustness to noisy hardware at inference time without relying on any assumptions about the target hardware. In particular, we propose a new adaptive sharpness-aware method that conditions the worst-case perturbation of a given weight not only on its magnitude but also on the range of the weight distribution. This is achieved by performing sharpness-aware minimization scaled by outlier normalization (SAMSON). Our approach outperforms existing sharpness-aware training methods both in terms of model generalization performance in noiseless regimes and robustness in noisy settings, as measured on several architectures and datasets.
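The abstract describes a worst-case perturbation that depends on each weight's magnitude. As a hedged illustration of that ingredient only, the sketch below implements a magnitude-aware (ASAM-style, q = 2) sharpness-aware step on a toy quadratic loss; SAMSON's additional normalization by the range of the weight distribution is defined in the paper and is not reproduced here. The loss, curvatures, and hyperparameters are all assumptions.

```python
import math

CURV = [10.0, 1.0]  # hypothetical curvatures of a toy quadratic loss

def loss(w):
    # L(w) = 0.5 * sum_i c_i * w_i^2
    return 0.5 * sum(c * wi * wi for c, wi in zip(CURV, w))

def grad(w):
    return [c * wi for c, wi in zip(CURV, w)]

def adaptive_perturbation(w, g, rho=0.05, eta=0.01):
    # Magnitude-aware worst-case perturbation (ASAM-style, q = 2):
    # eps = rho * T^2 g / ||T g||, with T = diag(|w| + eta)
    t = [abs(wi) + eta for wi in w]
    norm = math.sqrt(sum((ti * gi) ** 2 for ti, gi in zip(t, g))) or 1.0
    return [rho * ti * ti * gi / norm for ti, gi in zip(t, g)]

def sharpness_aware_step(w, lr=0.05, rho=0.05):
    eps = adaptive_perturbation(w, grad(w), rho)
    w_adv = [wi + ei for wi, ei in zip(w, eps)]
    g_adv = grad(w_adv)  # the gradient at the perturbed weights drives the update
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

w = [1.0, 1.0]
for _ in range(50):
    w = sharpness_aware_step(w)
```

Large-magnitude weights receive proportionally larger perturbations, so training is pushed toward minima that stay flat under weight noise that scales with the weights themselves.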
Submitted 21 March, 2023; v1 submitted 18 November, 2022;
originally announced November 2022.
-
Flexible Unsupervised Learning for Massive MIMO Subarray Hybrid Beamforming
Authors:
Hamed Hojatian,
Jérémy Nadal,
Jean-François Frigon,
François Leduc-Primeau
Abstract:
Hybrid beamforming is a promising technology to improve the energy efficiency of massive MIMO systems. In particular, subarray hybrid beamforming can further decrease power consumption by reducing the number of phase-shifters. However, designing the hybrid beamforming vectors is a complex task due to the discrete nature of the subarray connections and the phase-shift amounts. Finding the optimal connections between RF chains and antennas requires solving a non-convex problem in a large search space. In addition, conventional solutions assume that perfect channel state information (CSI) is available, which is not the case in practical systems. Therefore, we propose a novel unsupervised learning approach to design the hybrid beamforming for any subarray structure while supporting quantized phase-shifters and noisy CSI. One major feature of the proposed architecture is that no beamforming codebook is required, and the neural network is trained to take the phase-shifter quantization into account. Simulation results show that the proposed deep learning solutions can achieve higher sum-rates than existing methods.
Submitted 10 August, 2022;
originally announced August 2022.
-
MemSE: Fast MSE Prediction for Noisy Memristor-Based DNN Accelerators
Authors:
Jonathan Kern,
Sébastien Henwood,
Gonçalo Mordido,
Elsa Dupraz,
Abdeldjalil Aïssa-El-Bey,
Yvon Savaria,
François Leduc-Primeau
Abstract:
Memristors enable the computation of matrix-vector multiplications (MVM) in memory and therefore show great potential for substantially increasing the energy efficiency of deep neural network (DNN) inference accelerators. However, computations in memristors suffer from hardware non-idealities and are subject to different sources of noise that may negatively impact system performance. In this work, we theoretically analyze the mean squared error of DNNs that use memristor crossbars to compute MVM. We take into account both the quantization noise, due to the necessity of reducing the DNN model size, and the programming noise, stemming from the variability during the programming of the memristance value. Simulations on pre-trained DNN models demonstrate the accuracy of the analytical prediction. Furthermore, the proposed method is almost two orders of magnitude faster than Monte-Carlo simulation, making it possible to optimize the implementation parameters to achieve minimal error for a given power constraint.
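The speed advantage of an analytical prediction over Monte-Carlo simulation can be seen on the simplest possible case. The sketch below assumes independent, zero-mean additive Gaussian noise of variance sigma² on every weight of a single matrix-vector product (a hypothetical stand-in for programming noise; quantization and the paper's memristor-specific model are not included). Under that assumption the per-output mean squared error is sigma² times the squared norm of the input, which the Monte-Carlo estimate should reproduce.

```python
import random

def analytic_mse(x, sigma):
    # Each output error sum_j n_ij * x_j has variance sigma^2 * sum_j x_j^2
    # when the weight noises n_ij are independent with zero mean.
    return sigma ** 2 * sum(xj * xj for xj in x)

def monte_carlo_mse(w, x, sigma, trials=10000, seed=0):
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(trials):
        for row in w:
            y_ideal = sum(wij * xj for wij, xj in zip(row, x))
            y_noisy = sum((wij + rng.gauss(0.0, sigma)) * xj for wij, xj in zip(row, x))
            total += (y_noisy - y_ideal) ** 2
            count += 1
    return total / count

W = [[0.2, -0.4, 0.1, 0.3], [-0.5, 0.25, 0.0, 0.6]]  # hypothetical weights
x = [0.5, -1.0, 2.0, 0.25]                            # hypothetical input
predicted = analytic_mse(x, 0.1)
simulated = monte_carlo_mse(W, x, 0.1)
```

The closed form costs one pass over the input, while the simulation needs thousands of noisy forward passes to reach a comparable estimate.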
Submitted 3 May, 2022;
originally announced May 2022.
-
Optimizing the Energy Efficiency of Unreliable Memories for Quantized Kalman Filtering
Authors:
Jonathan Kern,
Elsa Dupraz,
Abdeldjalil Aïssa-El-Bey,
Lav R. Varshney,
François Leduc-Primeau
Abstract:
This paper presents a quantized Kalman filter implemented using unreliable memories. We consider that both the quantization and the unreliable memories introduce errors in the computations, and develop an error propagation model that takes into account these two sources of errors. In addition to providing updated Kalman filter equations, the proposed error model accurately predicts the covariance of the estimation error and gives a relation between the performance of the filter and its energy consumption, depending on the noise level in the memories. Then, since memories are responsible for a large part of the energy consumption of embedded systems, optimization methods are introduced to minimize the memory energy consumption under a desired estimation performance of the filter. The first method computes the optimal energy levels allocated to each memory bank individually, and the second one optimizes the energy allocation per group of memory banks. Simulations show a close match between the theoretical analysis and experimental results. Furthermore, they demonstrate a reduction in energy consumption of more than 50%.
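The abstract's error model combines quantization and memory errors. As a much simpler illustration of one ingredient, the sketch below treats uniform quantization of the measurement as extra additive noise of variance Δ²/12 in a scalar Kalman filter and iterates the covariance (Riccati) recursion to its steady state. The scalar model, the quantizer range, and all parameter values are assumptions, and unreliable-memory errors are not modeled.

```python
def steady_state_covariance(a, q, r, bits, full_scale=1.0, iters=500):
    # Step of a uniform quantizer with 2**bits levels over [-full_scale, full_scale]
    delta = 2 * full_scale / (2 ** bits)
    # Additive-noise model of quantization: variance delta^2 / 12 added to r
    r_eff = r + delta ** 2 / 12
    p = 1.0
    for _ in range(iters):
        p_pred = a * a * p + q          # predict step: x_k = a x_{k-1} + process noise (var q)
        k = p_pred / (p_pred + r_eff)   # Kalman gain
        p = (1 - k) * p_pred            # update step: posterior error variance
    return p

p_coarse = steady_state_covariance(0.9, 0.01, 0.01, bits=2)
p_fine = steady_state_covariance(0.9, 0.01, 0.01, bits=8)
```

Coarser quantization raises the effective measurement noise and therefore the steady-state error covariance; this is the kind of performance-versus-resources relation that an energy optimization can then trade against.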
Submitted 3 September, 2021;
originally announced September 2021.
-
Energy Optimization of Faulty Quantized Min-Sum LDPC Decoders
Authors:
Mohamed Yaoumi,
Jeremy Nadal,
Elsa Dupraz,
Frederic Guilloud,
Francois Leduc-Primeau
Abstract:
The objective of this paper is to minimize the energy consumption of a quantized Min-Sum LDPC decoder by considering aggressive voltage downscaling of the decoder circuit. Since a low supply voltage may introduce faults in the memories used by the decoder architecture, this paper proposes to optimize the energy consumption of the faulty Min-Sum decoder while satisfying a given performance criterion. The proposed optimization method relies on a coordinate descent algorithm that optimizes code and decoder parameters which have a strong influence on the decoder energy consumption: codeword length, number of quantization bits, and failure probability of the memories. Optimal parameter values are provided for several codes defined by their protographs, and significant energy gains are observed compared to non-optimized setups.
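For readers unfamiliar with the decoder being optimized, the core of Min-Sum decoding is the check-node update: each outgoing message takes the product of the signs and the minimum of the magnitudes of all other incoming messages. The sketch below shows this update on integer LLRs quantized to a signed saturating range; the quantizer is a simple assumption, and memory faults are not injected here.

```python
def quantize(llr, bits):
    # Round to an integer and saturate to the symmetric signed range of `bits` bits
    max_mag = 2 ** (bits - 1) - 1
    return max(-max_mag, min(max_mag, round(llr)))

def check_node_min_sum(incoming):
    # For each edge: sign = product of the other inputs' signs (zero counts as positive),
    # magnitude = minimum absolute value among the other inputs.
    out = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1
        for m in others:
            if m < 0:
                sign = -sign
        out.append(sign * min(abs(m) for m in others))
    return out

messages = [quantize(v, 4) for v in [2.9, -2.2, 5.1, -0.8]]
outgoing = check_node_min_sum(messages)
```

The number of quantization bits sets both the message precision and the memory width, which is why it appears among the parameters traded against energy.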
Submitted 27 August, 2021; v1 submitted 26 August, 2021;
originally announced August 2021.
-
Decentralized Beamforming for Cell-Free Massive MIMO with Unsupervised Learning
Authors:
Hamed Hojatian,
Jeremy Nadal,
Jean-Francois Frigon,
Francois Leduc-Primeau
Abstract:
Cell-free massive MIMO (CF-mMIMO) systems represent a promising approach to increase the spectral efficiency of wireless communication systems. However, near-optimal beamforming solutions require a large amount of signaling exchange between access points (APs) and the network controller (NC). In this letter, we propose two unsupervised deep neural network (DNN) architectures, fully and partially distributed, that can perform decentralized coordinated beamforming with zero or limited communication overhead between APs and NC, for both fully digital and hybrid precoding. The proposed DNNs achieve near-optimal sum-rate while also reducing computational complexity by a factor of 10 to 24 compared to conventional near-optimal solutions.
Submitted 26 November, 2021; v1 submitted 30 June, 2021;
originally announced June 2021.
-
Unsupervised Deep Learning for Massive MIMO Hybrid Beamforming
Authors:
Hamed Hojatian,
Jeremy Nadal,
Jean-Francois Frigon,
Francois Leduc-Primeau
Abstract:
Hybrid beamforming is a promising technique to reduce the complexity and cost of massive multiple-input multiple-output (MIMO) systems while providing high data rates. However, the hybrid precoder design is a challenging task requiring channel state information (CSI) feedback and solving a complex optimization problem. This paper proposes a novel unsupervised deep learning method based on the received signal strength indicator (RSSI) to design the hybrid beamforming in massive MIMO systems. Furthermore, we propose i) a method to design the synchronization signal (SS) in initial access (IA); and ii) a method to design the codebook for the analog precoder. We also evaluate the system performance using a realistic channel model in various scenarios. We show that the proposed method not only greatly increases the spectral efficiency, especially in frequency-division duplex (FDD) communication, by using partial CSI feedback, but also achieves near-optimal sum-rate and outperforms other state-of-the-art full-CSI solutions.
Submitted 2 July, 2020; v1 submitted 30 June, 2020;
originally announced July 2020.
-
Noisy Density Evolution With Asymmetric Deviation Models
Authors:
Elsa Dupraz,
François Leduc-Primeau
Abstract:
This paper considers low-density parity-check (LDPC) decoders affected by deviations introduced by the electronic device on which the decoder is implemented. Noisy density evolution (DE), which makes it possible to study the performance of these LDPC decoders theoretically, can only consider symmetric deviation models because it relies on the all-zero codeword assumption. A novel DE method is proposed that admits the use of asymmetric deviation models, thus widening the range of faulty implementations that can be analyzed. DE equations are provided for three noisy decoders: belief propagation, Gallager B, and quantized min-sum (MS). Simulation results confirm that the proposed DE accurately predicts the performance of LDPC decoders with asymmetric deviations. Furthermore, asymmetric versions of the Gallager B and MS decoders are proposed to compensate for the effect of asymmetric deviations. The parameters of these decoders are then optimized using the proposed DE, leading to better ensemble thresholds and improved finite-length performance in the presence of asymmetric deviations.
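As a baseline for what density evolution computes, the sketch below runs the standard noiseless DE recursion for Gallager A decoding of a regular (dv, dc) LDPC ensemble on a binary symmetric channel with crossover probability p0, under the all-zero codeword assumption. For dv = 3, Gallager B with its optimal threshold coincides with Gallager A. The noisy and asymmetric extensions proposed in the paper are not reproduced here.

```python
def gallager_a_de(p0, dv=3, dc=6, iterations=200):
    # p: probability that a variable-to-check message is in error
    p = p0
    for _ in range(iterations):
        # Check-to-variable message is wrong if an odd number of the other dc-1 inputs are wrong
        q = (1 - (1 - 2 * p) ** (dc - 1)) / 2
        # Variable node sends the channel value unless all dv-1 other checks disagree with it
        p = p0 * (1 - (1 - q) ** (dv - 1)) + (1 - p0) * q ** (dv - 1)
    return p

below_threshold = gallager_a_de(0.03)   # error probability should vanish
above_threshold = gallager_a_de(0.05)   # error probability stays bounded away from zero
```

The largest p0 for which the recursion converges to zero is the ensemble threshold (about 0.039 for the (3, 6) ensemble under Gallager A); deviation models change the update equations and therefore shift this threshold.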
Submitted 17 November, 2020; v1 submitted 12 May, 2020;
originally announced May 2020.
-
RSSI-Based Hybrid Beamforming Design with Deep Learning
Authors:
Hamed Hojatian,
Vu Nguyen Ha,
Jérémy Nadal,
Jean-François Frigon,
François Leduc-Primeau
Abstract:
Hybrid beamforming is a promising technology for 5G millimetre-wave communications. However, its implementation is challenging in practical multiple-input multiple-output (MIMO) systems because non-convex optimization problems have to be solved, introducing additional latency and energy consumption. In addition, the channel-state information (CSI) must be either estimated from pilot signals or fed back through dedicated channels, introducing a large signaling overhead. In this paper, a hybrid precoder is designed based only on received signal strength indicator (RSSI) feedback from each user. A deep learning method is proposed to perform the associated optimization with reasonable complexity. Results demonstrate that the obtained sum-rates are very close to those obtained with optimal but complex full-CSI solutions. Finally, since minimal CSI feedback is required, the proposed solution greatly increases the spectral efficiency of the system compared to existing techniques.
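The sum-rate used to compare precoders in this line of work is the sum over users of log2(1 + SINR). The sketch below computes it for a given set of channel vectors and per-user precoding vectors, assuming the received amplitude for user k from stream j is h_k^H w_j and ignoring any hybrid (analog times digital) factorization of the precoder; the function and variable names are hypothetical.

```python
import math

def sum_rate(h, w, noise_power):
    # h[k]: channel vector of user k; w[j]: precoding vector for user j (lists of complex)
    def inner(a, b):
        # h^H w: conjugate the channel entries
        return sum(ai.conjugate() * bi for ai, bi in zip(a, b))
    total = 0.0
    for k, hk in enumerate(h):
        signal = abs(inner(hk, w[k])) ** 2
        interference = sum(abs(inner(hk, w[j])) ** 2 for j in range(len(w)) if j != k)
        total += math.log2(1 + signal / (interference + noise_power))
    return total

# Two users on orthogonal channels: no interference, SINR = 1 each
orthogonal = sum_rate([[1 + 0j, 0j], [0j, 1 + 0j]], [[1 + 0j, 0j], [0j, 1 + 0j]], 1.0)
# Two users on the same channel with identical precoders: strong mutual interference
colliding = sum_rate([[1 + 0j, 0j], [1 + 0j, 0j]], [[1 + 0j, 0j], [1 + 0j, 0j]], 1.0)
```

A learned precoder is judged by how close this quantity comes to that of a full-CSI optimized design.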
Submitted 12 March, 2020;
originally announced March 2020.
-
Layerwise Noise Maximisation to Train Low-Energy Deep Neural Networks
Authors:
Sébastien Henwood,
François Leduc-Primeau,
Yvon Savaria
Abstract:
Deep neural networks (DNNs) depend on the storage of a large number of parameters, which consumes a significant portion of the energy used during inference. This paper considers the case where the energy usage of memory elements can be reduced at the cost of reduced reliability. A training algorithm is proposed to optimize the reliability of the storage separately for each layer of the network, while incurring a negligible complexity overhead compared to conventional stochastic gradient descent training. For an exponential energy-reliability model, the proposed training approach can decrease the memory energy consumption of a DNN with binary parameters by 3.3$\times$ at isoaccuracy, compared to a reliable implementation.
Submitted 23 December, 2019;
originally announced December 2019.
-
Training Modern Deep Neural Networks for Memory-Fault Robustness
Authors:
Ghouthi Boukli Hacene,
François Leduc-Primeau,
Amal Ben Soussia,
Vincent Gripon,
François Gagnon
Abstract:
Because deep neural networks (DNNs) rely on a large number of parameters and computations, their implementation in energy-constrained systems is challenging. In this paper, we investigate reducing the supply voltage of the memories used in the system, which results in bit-cell faults. We explore the robustness of state-of-the-art DNN architectures to such defects and propose a regularizer meant to mitigate their effects on accuracy. Our experiments clearly demonstrate the benefit of operating the system in a faulty regime to save energy without reducing accuracy.
Submitted 22 November, 2019;
originally announced November 2019.
-
A Study of Deep Learning Robustness Against Computation Failures
Authors:
Jean-Charles Vialatte,
François Leduc-Primeau
Abstract:
For many types of integrated circuits, accepting larger failure rates in computations can be used to improve energy efficiency. We study the performance of faulty implementations of certain deep neural networks based on pessimistic and optimistic models of the effect of hardware faults. After identifying the impact of hyperparameters such as the number of layers on robustness, we study the ability of the network to compensate for computational failures through an increase of the network size. We show that some networks can achieve equivalent performance under faulty implementations, and quantify the required increase in computational complexity.
Submitted 18 April, 2017;
originally announced April 2017.
-
Stall Pattern Avoidance in Polynomial Product Codes
Authors:
Carlo Condo,
Francois Leduc-Primeau,
Gabi Sarkis,
Pascal Giard,
Warren Gross
Abstract:
Product codes are a concatenated error-correction scheme that has often been considered for applications requiring very low bit-error rates, which demand that the error floor be decreased as much as possible. In this work, we consider product codes constructed from polynomial algebraic codes, and propose a novel low-complexity post-processing technique that is able to improve the error-correction performance by orders of magnitude. We provide lower bounds for the error rate achievable under post-processing, and present simulation results indicating that these bounds are tight.
Submitted 15 November, 2016;
originally announced November 2016.
-
A 9.52 dB NCG FEC scheme and 164 bits/cycle low-complexity product decoder architecture
Authors:
Carlo Condo,
Pascal Giard,
François Leduc-Primeau,
Gabi Sarkis,
Warren J. Gross
Abstract:
Powerful Forward Error Correction (FEC) schemes are used in optical communications to achieve bit-error rates below $10^{-15}$. These FECs follow one of two approaches: concatenation of simpler hard-decision codes or usage of inherently powerful soft-decision codes. The first approach yields lower Net Coding Gains (NCGs), but usually works at higher code rates and has lower-complexity decoders. In this work, we propose a novel FEC scheme based on a product code and a post-processing technique. It can achieve an NCG of 9.52 dB at a BER of $10^{-15}$ and 9.96 dB at a BER of $10^{-18}$, an error-correction performance that sits between that of current hard-decision and soft-decision FECs. A decoder architecture is designed, tested on an FPGA and synthesized in 65 nm CMOS technology: its 164 bits/cycle worst-case information throughput can reach 100 Gb/s at the achieved frequency of 609 MHz. Its complexity is shown to be lower than that of hard-decision decoders in the literature, and an order of magnitude lower than the estimated complexity of soft-decision decoders.
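Net coding gain figures such as those above are commonly computed for binary modulation as NCG = 20 log10(erfc⁻¹(2·BER_out)) − 20 log10(erfc⁻¹(2·BER_in)) + 10 log10(R), where BER_in is the pre-FEC error rate that yields the target post-FEC rate BER_out and R is the code rate. The sketch below evaluates this definition, assuming it is the one used; since the Python standard library has no inverse erfc, it inverts `math.erfc` by bisection. No values from the paper are reproduced.

```python
import math

def erfcinv(y, lo=0.0, hi=10.0, iters=200):
    # erfc is strictly decreasing on [0, 10]; bisect for erfc(x) = y, valid for 0 < y <= 1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if math.erfc(mid) > y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def net_coding_gain_db(ber_in, ber_out, rate):
    # Gain in required SNR to reach ber_out, minus the rate penalty 10*log10(1/rate)
    return (20 * math.log10(erfcinv(2 * ber_out))
            - 20 * math.log10(erfcinv(2 * ber_in))
            + 10 * math.log10(rate))

example_ncg = net_coding_gain_db(2e-3, 1e-15, 0.9)  # hypothetical input BER and rate
```

The rate term explains why a high-rate hard-decision scheme can remain competitive despite a weaker decoder: its overhead penalty is smaller.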
Submitted 5 April, 2017; v1 submitted 18 October, 2016;
originally announced October 2016.
-
VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing
Authors:
Arash Ardakani,
François Leduc-Primeau,
Naoya Onizawa,
Takahiro Hanyu,
Warren J. Gross
Abstract:
The hardware implementation of deep neural networks (DNNs) has recently received tremendous attention: many applications require high-speed operation that suits a hardware implementation. However, such implementations usually require numerous elements and complex interconnections, leading to large area and high power consumption. Stochastic computing has shown promising results for low-power, area-efficient hardware implementations, even though existing stochastic algorithms require long streams that cause long latencies. In this paper, we propose an integer form of stochastic computation and introduce some elementary circuits. We then propose an efficient implementation of a DNN based on integral stochastic computing. The proposed architecture has been implemented on a Virtex-7 FPGA, resulting in average reductions of 45% in area and 62% in latency compared to the best reported architecture in the literature. We also synthesize the circuits in a 65 nm CMOS technology and show that the proposed integral stochastic architecture reduces energy consumption by up to 21% compared to the binary radix implementation at the same misclassification rate. Owing to the fault-tolerant nature of stochastic architectures, we also consider a quasi-synchronous implementation, which yields a 33% reduction in energy consumption with respect to the binary radix implementation without any compromise on performance.
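To make "an integer form of stochastic computation" concrete, the sketch below models the general idea in software, assuming a unipolar encoding for simplicity: a real value in [0, m] is carried by a stream of integers in {0, ..., m}, each element being the sum of m independent Bernoulli(value/m) bits, and the value is recovered as the stream mean. Addition then becomes a plain integer adder (a conventional binary stochastic adder would need a scaled multiplexer), and multiplication of independent streams is an element-wise product. This is not the paper's circuit design; the helper names, the seed, and the stream length are hypothetical.

```python
import random
from typing import List


def integral_stream(value: float, m: int, length: int, rng: random.Random) -> List[int]:
    """Integer stream for a value in [0, m]: each element sums m Bernoulli(value/m) bits."""
    if not 0.0 <= value <= m:
        raise ValueError("value must lie in [0, m]")
    p = value / m
    return [sum(rng.random() < p for _ in range(m)) for _ in range(length)]


def stream_value(stream: List[int]) -> float:
    """Decode a stream by averaging its integer elements."""
    return sum(stream) / len(stream)


rng = random.Random(2015)
N = 100_000
a = integral_stream(1.5, 2, N, rng)
b = integral_stream(0.8, 2, N, rng)
s = [x + y for x, y in zip(a, b)]    # integer adder: no 1/2 scaling needed
p = [x * y for x, y in zip(a, b)]    # element-wise product of independent streams
print(round(stream_value(s), 2), round(stream_value(p), 2))  # approximately 2.3 and 1.2
```

The estimates are only approximate because they come from finite random streams; longer streams reduce the error at the cost of latency, which is the trade-off the abstract refers to.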
Submitted 24 August, 2016; v1 submitted 29 September, 2015;
originally announced September 2015.
-
Modeling and Energy Optimization of LDPC Decoder Circuits with Timing Violations
Authors:
François Leduc-Primeau,
Frank R. Kschischang,
Warren J. Gross
Abstract:
This paper proposes a "quasi-synchronous" design approach for signal processing circuits, in which timing violations are permitted without the need for a hardware compensation mechanism. The case of a low-density parity-check (LDPC) decoder is studied, and a method for accurately modeling the effect of timing violations at a high level of abstraction is presented. The error-correction performance of code ensembles is then evaluated using density evolution, taking into account the effect of timing faults. Following this, several quasi-synchronous LDPC decoder circuits based on the offset min-sum algorithm are optimized, providing a 23% to 40% reduction in energy consumption or energy-delay product while achieving the same performance and occupying the same area as conventional synchronous circuits.
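For readers unfamiliar with the offset min-sum algorithm that these decoders are based on, the sketch below shows the standard floating-point check-node update: each outgoing message takes the sign product and the minimum magnitude of the other incoming log-likelihood ratios (LLRs), reduced by a fixed offset and clipped at zero. It does not model timing faults, quantization, or the paper's circuits; the offset value 0.25 and the function name are illustrative assumptions.

```python
from typing import List


def offset_min_sum_check(llrs_in: List[float], offset: float) -> List[float]:
    """Offset min-sum check-node update for one check node of degree >= 2.

    The message sent on edge i has the sign of the product of the *other*
    incoming LLRs and a magnitude equal to their minimum magnitude minus
    `offset`, clipped at zero. Tracking only the two smallest magnitudes and
    the overall sign avoids recomputing the minimum for every edge.
    """
    mags = [abs(x) for x in llrs_in]
    min1_idx = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[min1_idx]
    min2 = min(m for i, m in enumerate(mags) if i != min1_idx)

    total_sign = 1
    for x in llrs_in:
        if x < 0:
            total_sign = -total_sign

    out = []
    for i, x in enumerate(llrs_in):
        own_sign = -1 if x < 0 else 1
        sign = total_sign * own_sign            # removes edge i's own sign
        mag = min2 if i == min1_idx else min1   # minimum over the other edges
        out.append(sign * max(mag - offset, 0.0))
    return out


print(offset_min_sum_check([2.0, -0.5, 1.5, -3.0], offset=0.25))
# [0.25, -1.25, 0.25, -0.25]
```

The two-minimum formulation is the one commonly mapped to hardware, since it needs only a comparator tree and an XOR of sign bits per check node.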
Submitted 17 November, 2017; v1 submitted 12 March, 2015;
originally announced March 2015.
-
Relaxed Half-Stochastic Belief Propagation
Authors:
François Leduc-Primeau,
Saied Hemati,
Shie Mannor,
Warren J. Gross
Abstract:
Low-density parity-check (LDPC) codes are attractive for high-throughput applications not only because of their low decoding complexity per bit, but also because all codeword bits can be decoded in parallel. However, achieving this in a circuit implementation is complicated by the number of wires required to exchange messages between processing nodes. Decoding algorithms that exchange binary messages are attractive for fully parallel implementations because they can reduce the number and length of the wires and increase logic density. This paper introduces the Relaxed Half-Stochastic (RHS) decoding algorithm, a binary-message belief propagation (BP) algorithm that achieves a coding gain comparable to that of the best known BP algorithms using real-valued messages. We derive the RHS algorithm starting from the well-known Sum-Product algorithm, and then obtain a low-complexity version suitable for circuit implementation. We present extensive simulation results, including results at low bit error rates, on two standardized codes with different rates and constructions. These simulations show that RHS can be an advantageous replacement for existing state-of-the-art decoding algorithms when targeting fully parallel implementations.
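The RHS algorithm itself is not reproduced here. As background for why binary messages can stand in for the real-valued messages of the Sum-Product algorithm, the sketch below checks a well-known identity: the probability that the XOR of independent bits equals 1 is (1 - ∏(1 - 2p_i)) / 2, which is exactly the Sum-Product check-node rule written in the probability domain (the "tanh rule" in the LLR domain, since tanh(L/2) = 1 - 2P(1) for L = ln(P(0)/P(1))). A check node that simply XORs random bits therefore computes that rule on average. The function names, the probabilities, and the seed are hypothetical.

```python
import math
import random
from typing import List


def xor_prob_closed_form(p_ones: List[float]) -> float:
    """P(XOR of independent bits = 1), given each bit's probability of being 1."""
    return (1.0 - math.prod(1.0 - 2.0 * p for p in p_ones)) / 2.0


def tanh_rule(llrs: List[float]) -> float:
    """Sum-Product check-node output in the LLR domain, with L = ln(P(0)/P(1))."""
    return 2.0 * math.atanh(math.prod(math.tanh(llr / 2.0) for llr in llrs))


def xor_prob_monte_carlo(p_ones: List[float], trials: int, rng: random.Random) -> float:
    """Estimate the same probability by XOR-ing random bits, as a binary-message check node would."""
    ones = 0
    for _ in range(trials):
        bit = 0
        for p in p_ones:
            bit ^= int(rng.random() < p)
        ones += bit
    return ones / trials


p_ones = [0.1, 0.3, 0.2]
llrs = [math.log((1 - p) / p) for p in p_ones]
p_from_llr = 1.0 / (1.0 + math.exp(tanh_rule(llrs)))   # convert the output LLR back to P(1)

print(round(xor_prob_closed_form(p_ones), 3))            # 0.404
print(round(p_from_llr, 3))                              # 0.404
print(xor_prob_monte_carlo(p_ones, 100_000, random.Random(2012)))  # close to 0.404
```

The Monte Carlo estimate converges only slowly with the number of bits exchanged, which illustrates the long-stream latency issue that motivates decoders such as RHS.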
Submitted 11 May, 2012;
originally announced May 2012.