-
Efficient Bit-Channel Reliability Computation for Multi-Mode Polar Code Encoders and Decoders
Authors:
Carlo Condo,
Seyyed Ali Hashemi,
Warren J. Gross
Abstract:
Polar codes are a family of capacity-achieving error-correcting codes, and they have been selected as part of the next-generation wireless communication standard. Each polar code bit-channel is assigned a reliability value, which is used to determine which bits carry information and which carry parity. Relative reliabilities need to be known by both encoders and decoders: in multi-mode systems, where multiple code lengths and code rates are supported, storing the relative reliabilities can lead to high implementation complexity. In this work, we observe patterns among code reliabilities. We propose an approximate computation technique to easily represent the reliabilities of multiple codes through a limited set of variables and update rules. The proposed method allows the trade-off between reliability accuracy and implementation complexity to be tuned. An approximate computation architecture for encoders and decoders is designed and implemented, showing 50.7% less area occupation than storage-based solutions, with less than 0.05 dB of error-correction performance degradation.
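The abstract does not give the proposed update rules. For orientation only, here is a minimal Python sketch of one well-known closed-form alternative to storing reliability tables, the polarization-weight (PW) heuristic with beta = 2^(1/4). This is a swapped-in technique, not the authors' approximation. It also shows the nested property that makes multi-mode support attractive: the order for a shorter code can be read off the longer one.

```python
def pw_reliability_order(n, beta=2 ** 0.25):
    """Bit-channel indices of a length-2**n polar code, least to most reliable.

    Polarization-weight (PW) heuristic: channel i gets weight sum_j b_j * beta**j,
    where b_j is bit j of i. Illustrative stand-in, not the paper's method.
    """
    N = 1 << n
    weights = [sum(((i >> j) & 1) * beta ** j for j in range(n)) for i in range(N)]
    return sorted(range(N), key=lambda i: weights[i])


def information_set(n, K):
    """The K most reliable bit-channels carry information; the rest are frozen."""
    return sorted(pw_reliability_order(n)[-K:])


if __name__ == "__main__":
    print(pw_reliability_order(3))  # [0, 1, 2, 4, 3, 5, 6, 7]
    print(information_set(3, 4))    # [3, 5, 6, 7]
    # Nested property: the N = 8 order is the N = 16 order restricted to i < 8.
    print([i for i in pw_reliability_order(4) if i < 8])
```

Because a single ordered sequence serves every shorter length, only one sequence (or one rule such as the weight formula) has to be kept, which is the storage-versus-computation trade-off the abstract targets.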
Submitted 16 May, 2017;
originally announced May 2017.
-
Partitioned List Decoding of Polar Codes: Analysis and Improvement of Finite Length Performance
Authors:
Seyyed Ali Hashemi,
Marco Mondelli,
S. Hamed Hassani,
Rudiger Urbanke,
Warren J. Gross
Abstract:
Polar codes represent one of the major recent breakthroughs in coding theory and, because of their attractive features, they have been selected for the upcoming 5G standard. As such, a lot of attention has been devoted to the development of decoding algorithms with good error performance and efficient hardware implementation. One of the leading candidates in this regard is successive-cancellation list (SCL) decoding. However, its hardware implementation requires a large amount of memory. Recently, a partitioned SCL (PSCL) decoder has been proposed to significantly reduce the memory consumption. In this paper, we examine the paradigm of PSCL decoding from both theoretical and practical standpoints: (i) by changing the construction of the code, we are able to improve the performance at no additional computational, latency or memory cost, (ii) we present an optimal scheme to allocate cyclic redundancy checks (CRCs), and (iii) we provide an upper bound on the list size that allows maximum a posteriori (MAP) performance to be achieved.
Submitted 29 August, 2017; v1 submitted 15 May, 2017;
originally announced May 2017.
-
Blind Detection with Polar Codes
Authors:
Carlo Condo,
Seyyed Ali Hashemi,
Warren J. Gross
Abstract:
In blind detection, a set of candidates has to be decoded within a strict time constraint to identify which transmissions are directed at the user equipment. Blind detection is an operation required by the 3GPP LTE/LTE-Advanced standard, and it will also be required in the 5th generation wireless communication standard (5G). We propose a blind detection scheme based on polar codes, where the radio network temporary identifier (RNTI) is transmitted in place of some of the frozen bits. A low-complexity decoding stage decodes all candidates and selects a subset that is then decoded by a high-performance algorithm. Simulation results show good missed-detection and false-alarm rates that meet the system specifications. We also propose an early stopping criterion for the second decoding stage that can reduce the number of operations performed, improving both average latency and energy consumption. The detection speed is analyzed, and different system parameter combinations are shown to meet the stringent timing requirements, leading to various implementation trade-offs.
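A minimal sketch of the encoder side of the idea, assuming natural-order polar encoding x = u·F^{⊗n} with F = [[1, 0], [1, 1]] and a hypothetical choice of which frozen positions carry the RNTI (the abstract does not say how they are selected). Because F^{⊗n} is its own inverse over GF(2), a noiseless receiver can recover u by applying the same transform and compare the RNTI positions with its own identifier; a real receiver would run the two-stage decoder instead.

```python
def polar_transform(u):
    """Compute x = u * F^{(kron) n} over GF(2), with F = [[1, 0], [1, 1]]."""
    x = list(u)
    step = 1
    while step < len(x):
        for start in range(0, len(x), 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]
        step *= 2
    return x


def encode_with_rnti(info_bits, info_set, rnti_bits, rnti_set, N):
    """Data goes on info_set, the RNTI on rnti_set (hypothetical frozen
    positions); all remaining frozen bits stay 0."""
    u = [0] * N
    for pos, b in zip(info_set, info_bits):
        u[pos] = b
    for pos, b in zip(rnti_set, rnti_bits):
        u[pos] = b
    return polar_transform(u)


def rnti_matches(codeword, rnti_bits, rnti_set):
    """Noiseless check only: invert the transform (an involution) and compare."""
    u_hat = polar_transform(codeword)
    return all(u_hat[pos] == b for pos, b in zip(rnti_set, rnti_bits))


if __name__ == "__main__":
    # (8, 4) code with information set {3, 5, 6, 7}; RNTI bits on frozen 2 and 4.
    x = encode_with_rnti([1, 0, 1, 1], [3, 5, 6, 7], [1, 0], [2, 4], 8)
    print(rnti_matches(x, [1, 0], [2, 4]))  # True: addressed to this user
    print(rnti_matches(x, [0, 1], [2, 4]))  # False: different RNTI
```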
Submitted 15 May, 2017; v1 submitted 4 May, 2017;
originally announced May 2017.
-
Fast and Flexible Successive-Cancellation List Decoders for Polar Codes
Authors:
Seyyed Ali Hashemi,
Carlo Condo,
Warren J. Gross
Abstract:
Polar codes have gained a significant amount of attention during the past few years and have been selected as a coding scheme for the next-generation mobile broadband standard. Among decoding schemes, successive-cancellation list (SCL) decoding provides a reasonable trade-off between error-correction performance and hardware implementation complexity when used to decode polar codes, at the cost of limited throughput. The simplified SCL (SSCL) algorithm and its extension SSCL-SPC increase decoding speed by removing redundant calculations when encountering particular information and frozen bit patterns (rate-one and single-parity-check codes), while keeping the error-correction performance unaltered. In this paper, we improve SSCL and SSCL-SPC by proving that the list size imposes a specific number of bit estimations required to decode rate-one and single-parity-check codes. Thus, the number of estimations can be limited while guaranteeing exactly the same error-correction performance as if all bits of the code were estimated. We call the new decoding algorithms Fast-SSCL and Fast-SSCL-SPC. Moreover, we show that in a practical application the number of bit estimations can be tuned to achieve a desired speed while keeping the error-correction performance almost unchanged. Hardware architectures implementing both algorithms are then described and implemented: it is shown that our design can achieve 1.86 Gb/s throughput, higher than the best state-of-the-art decoders.
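The special constituent codes that SSCL-type decoders exploit are identified purely from the frozen-bit pattern of a subtree. A minimal Python sketch of that classification, assuming natural bit order within the node; only the Rate-0, Rate-1, repetition and SPC cases common in this literature are shown, and the function name is illustrative.

```python
def classify_node(frozen):
    """Classify a constituent code from its frozen-bit pattern.

    frozen: list of bools, True where the bit is frozen, in natural order.
    """
    if all(frozen):
        return "Rate-0"        # nothing to decode: all bits are known
    if not any(frozen):
        return "Rate-1"        # every bit carries information
    if all(frozen[:-1]) and not frozen[-1]:
        return "Repetition"    # only the last (most reliable) bit is information
    if frozen[0] and not any(frozen[1:]):
        return "SPC"           # only the first bit is frozen: single parity check
    return "other"             # fall back to regular recursive decoding


if __name__ == "__main__":
    print(classify_node([True, False, False, False]))  # SPC
```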
Submitted 29 August, 2017; v1 submitted 23 March, 2017;
originally announced March 2017.
-
A Multi-Gbps Unrolled Hardware List Decoder for a Systematic Polar Code
Authors:
Pascal Giard,
Alexios Balatsoukas-Stimming,
Thomas Christoph Müller,
Andreas Burg,
Claude Thibeault,
Warren J. Gross
Abstract:
Polar codes are a new class of block codes with an explicit construction that provably achieve the capacity of various communications channels, even with the low-complexity successive-cancellation (SC) decoding algorithm. Yet, the more complex successive-cancellation list (SCL) decoding algorithm is gathering more attention lately as it significantly improves the error-correction performance of short- to moderate-length polar codes, especially when they are concatenated with a cyclic redundancy check code. However, as SCL decoding explores several decoding paths, existing hardware implementations tend to be significantly slower than SC-based decoders. In this paper, we show how the unrolling technique, which has already been used in the context of SC decoding, can be adapted to SCL decoding yielding a multi-Gbps SCL-based polar decoder with an error-correction performance that is competitive when compared to an LDPC code of similar length and rate. Post-place-and-route ASIC results for 28 nm CMOS are provided showing that this decoder can sustain a throughput greater than 10 Gbps at 468 MHz with an energy efficiency of 7.25 pJ/bit.
Submitted 3 February, 2017;
originally announced February 2017.
-
Fast Simplified Successive-Cancellation List Decoding of Polar Codes
Authors:
Seyyed Ali Hashemi,
Carlo Condo,
Warren J. Gross
Abstract:
Polar codes are capacity-achieving error-correcting codes that can be decoded through the successive-cancellation (SC) algorithm. To improve its error-correction performance, a list-based version called successive-cancellation list (SCL) decoding has been proposed; however, it substantially increases the number of time-steps in the decoding process. The simplified SCL (SSCL) decoding algorithm exploits constituent codes within the polar code structure to greatly reduce the required number of time-steps without introducing any error-correction performance loss. In this paper, we propose a faster approach to decoding one of these constituent codes, the Rate-1 node. We use this Rate-1 node decoder to develop Fast-SSCL. We demonstrate that only a number of bits bounded by the list size needs to be estimated in Rate-1 nodes, and that Fast-SSCL exactly matches the error-correction performance of SCL and SSCL. This technique can greatly reduce the total number of time-steps needed for polar code decoding: analysis of a set of case studies shows that Fast-SSCL requires up to 66.6% fewer time-steps than SSCL and up to 88.1% fewer than SCL.
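A minimal Python sketch of Rate-1 node processing in a list decoder. It assumes the list-size bound used in this line of work, namely that only the min(L - 1, N_v) least reliable bits of a length-N_v Rate-1 node need to be forked, and the common hardware-friendly path metric that adds |alpha_i| whenever a bit disagrees with its hard decision. Function and variable names are illustrative.

```python
def rate1_node_scl(paths, L):
    """List decoding of a Rate-1 node.

    paths: list of (path_metric, llrs) entering the node, one per surviving path.
    Returns at most L (path_metric, bits) pairs, smallest metric first.
    """
    cands = []
    for pm, llr in paths:
        bits = tuple(1 if a < 0 else 0 for a in llr)                # hard decisions
        order = sorted(range(len(llr)), key=lambda i: abs(llr[i]))  # least reliable first
        cands.append((pm, bits, order, llr))

    n_est = min(L - 1, len(paths[0][1]))  # only this many bits are ever forked
    for t in range(n_est):
        forked = []
        for pm, bits, order, llr in cands:
            i = order[t]
            flipped = list(bits)
            flipped[i] ^= 1
            forked.append((pm, bits, order, llr))                          # keep decision
            forked.append((pm + abs(llr[i]), tuple(flipped), order, llr))  # flip, pay |llr|
        forked.sort(key=lambda c: c[0])
        cands = forked[:L]  # prune back to the L best paths

    cands = sorted(cands, key=lambda c: c[0])[:L]
    return [(pm, list(bits)) for pm, bits, _, _ in cands]


if __name__ == "__main__":
    print(rate1_node_scl([(0.0, [2.0, -0.5, 3.0, -4.0])], 2))
    # [(0.0, [0, 1, 0, 1]), (0.5, [0, 0, 0, 1])]
```

Forking any bit beyond the (L - 1)-th least reliable one cannot produce a candidate that beats the L candidates formed by the empty flip and the L - 1 single flips, which is why the loop can stop early without changing the surviving list.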
Submitted 27 January, 2017;
originally announced January 2017.
-
Neural Offset Min-Sum Decoding
Authors:
Loren Lugosch,
Warren J. Gross
Abstract:
Recently, it was shown that if multiplicative weights are assigned to the edges of a Tanner graph used in belief propagation decoding, it is possible to use deep learning techniques to find values for the weights which improve the error-correction performance of the decoder. Unfortunately, this approach requires many multiplications, which are generally expensive operations. In this paper, we suggest a more hardware-friendly approach in which offset min-sum decoding is augmented with learnable offset parameters. Our method uses no multiplications and has a parameter count less than half that of the multiplicative algorithm. This both speeds up training and provides a feasible path to hardware architectures. After describing our method, we compare the performance of the two neural decoding algorithms and show that our method achieves error-correction performance within 0.1 dB of the multiplicative approach and as much as 1 dB better than traditional belief propagation for the codes under consideration.
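To illustrate why offset min-sum needs no multiplications, here is a minimal Python sketch of one check-node update with a separate offset per edge. In the neural variant the offsets would be learned; here they are simply given, and the one-offset-per-edge layout is an assumption.

```python
def offset_min_sum_check_node(msgs, offsets):
    """Check-to-variable messages of one check node under offset min-sum.

    msgs: incoming variable-to-check LLRs, one per edge.
    offsets: one non-negative offset per edge (learned in the neural variant).
    Only comparisons, subtractions and sign flips are used.
    """
    out = []
    for j in range(len(msgs)):
        others = msgs[:j] + msgs[j + 1:]                      # exclude the target edge
        negative = sum(1 for m in others if m < 0) % 2 == 1   # parity of the signs
        magnitude = max(min(abs(m) for m in others) - offsets[j], 0.0)
        out.append(-magnitude if negative else magnitude)
    return out


if __name__ == "__main__":
    print(offset_min_sum_check_node([1.5, -0.8, 2.0], [0.1, 0.1, 0.1]))
```

The clamp at zero is what distinguishes offset min-sum from a plain subtraction: an offset larger than the minimum magnitude erases the message rather than flipping its sign.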
Submitted 27 July, 2017; v1 submitted 20 January, 2017;
originally announced January 2017.
-
Stall Pattern Avoidance in Polynomial Product Codes
Authors:
Carlo Condo,
Francois Leduc-Primeau,
Gabi Sarkis,
Pascal Giard,
Warren Gross
Abstract:
Product codes are a concatenated error-correction scheme that has often been considered for applications requiring very low bit-error rates, which demand that the error floor be decreased as much as possible. In this work, we consider product codes constructed from polynomial algebraic codes, and propose a novel low-complexity post-processing technique that is able to improve the error-correction performance by orders of magnitude. We provide lower bounds for the error rate achievable under post-processing, and present simulation results indicating that these bounds are tight.
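To make the notion of a stall pattern concrete, here is a Python sketch of an idealized iterative row/column decoder. It assumes each component decoder corrects any word with at most t errors and never miscorrects (real algebraic component decoders can miscorrect). Any error pattern that survives is a stall pattern; the post-processing proposed in the paper is not modeled.

```python
def idealized_iterative_decode(err, t, max_iter=10):
    """Return the residual error pattern after idealized row/column decoding.

    err: 2-D list of 0/1 error indicators over the product-code array.
    t: number of errors each component decoder is assumed to correct.
    """
    e = [row[:] for row in err]              # work on a copy
    n_rows, n_cols = len(e), len(e[0])
    for _ in range(max_iter):
        changed = False
        for r in range(n_rows):              # row decoding
            if 0 < sum(e[r]) <= t:
                e[r] = [0] * n_cols
                changed = True
        for c in range(n_cols):              # column decoding
            if 0 < sum(e[r][c] for r in range(n_rows)) <= t:
                for r in range(n_rows):
                    e[r][c] = 0
                changed = True
        if not changed:                      # no progress: residual is a stall
            break
    return e


if __name__ == "__main__":
    # With t = 1, a 2 x 2 square of errors defeats every row and column decoder.
    square = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(idealized_iterative_decode(square, t=1))
```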
Submitted 15 November, 2016;
originally announced November 2016.
-
Neural Networks Designing Neural Networks: Multi-Objective Hyper-Parameter Optimization
Authors:
Sean C. Smithson,
Guang Yang,
Warren J. Gross,
Brett H. Meyer
Abstract:
Artificial neural networks have recently risen in popularity, achieving state-of-the-art results in various fields, including image classification, speech recognition, and automated control. Both the performance and the computational complexity of such models are heavily dependent on the design of characteristic hyper-parameters (e.g., number of hidden layers, nodes per layer, or choice of activation functions), which have traditionally been optimized manually. With machine learning penetrating low-power mobile and embedded areas, the need to optimize not only for performance (accuracy), but also for implementation complexity, becomes paramount. In this work, we present a multi-objective design space exploration method that reduces the number of solution networks trained and evaluated through response surface modelling. Given spaces which can easily exceed $10^{20}$ solutions, manually designing a near-optimal architecture is unlikely, as opportunities to reduce network complexity while maintaining performance may be overlooked. This problem is exacerbated by the fact that hyper-parameters which perform well on specific datasets may yield sub-par results on others, and must therefore be designed on a per-application basis. In our work, machine learning is leveraged by training an artificial neural network to predict the performance of future candidate networks. The method is evaluated on the MNIST and CIFAR-10 image datasets, optimizing for both recognition accuracy and computational complexity. Experimental results demonstrate that the proposed method can closely approximate the Pareto-optimal front, while only exploring a small fraction of the design space.
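As a small illustration of the Pareto-optimal front itself (not of the response-surface search), here is a Python sketch that keeps the non-dominated candidates among evaluated networks, with both error and complexity to be minimized. The numbers in the example are made up.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Non-dominated (error, complexity) pairs, both minimized, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (error rate, multiply-accumulate count) of four trained networks.
    evaluated = [(0.02, 5e6), (0.03, 1e6), (0.025, 6e6), (0.05, 2e6)]
    print(pareto_front(evaluated))  # [(0.02, 5000000.0), (0.03, 1000000.0)]
```

This exhaustive check needs every candidate to be trained; the point of the method described above is to predict the front while training only a small fraction of them.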
Submitted 7 November, 2016;
originally announced November 2016.
-
Sparsely-Connected Neural Networks: Towards Efficient VLSI Implementation of Deep Neural Networks
Authors:
Arash Ardakani,
Carlo Condo,
Warren J. Gross
Abstract:
Recently, deep neural networks have received considerable attention due to their ability to extract and represent high-level abstractions in data sets. Deep neural networks such as fully-connected and convolutional neural networks have shown excellent performance on a wide range of recognition and classification tasks. However, their hardware implementations currently suffer from large silicon area and high power consumption due to their high degree of complexity. The power/energy consumption of neural networks is dominated by memory accesses, the majority of which occur in fully-connected networks; in fact, these contain most of the deep neural network parameters. In this paper, we propose sparsely-connected networks, showing that the number of connections in fully-connected networks can be reduced by up to 90% while improving the accuracy on three popular datasets (MNIST, CIFAR10 and SVHN). We then propose an efficient hardware architecture based on linear-feedback shift registers to reduce the memory requirements of the proposed sparsely-connected networks. The proposed architecture can save up to 90% of memory compared to conventional implementations of fully-connected neural networks. Moreover, implementation results show up to 84% reduction in the energy consumption of a single neuron of the proposed sparsely-connected networks compared to a single neuron of fully-connected neural networks.
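The memory saving from linear-feedback shift registers comes from regenerating a connection pattern from a seed instead of storing it. A minimal Python sketch of that idea, assuming a 16-bit Fibonacci LFSR with the maximal-length polynomial x^16 + x^14 + x^13 + x^11 + 1 and a hypothetical keep rule (a connection is kept when an 8-bit draw falls below a threshold, about 10% density with the default). The paper's actual mask-generation scheme is not described in the abstract.

```python
def lfsr16_step(state):
    """One step of a 16-bit Fibonacci LFSR, taps 16, 14, 13, 11 (period 65535).
    The state must be nonzero."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def sparse_mask(n_in, n_out, keep_per_256=26, seed=0xACE1):
    """Regenerate an n_out x n_in 0/1 connection mask from a seed (hypothetical rule).

    The register advances 8 steps per decision, so consecutive decisions read
    non-overlapping bytes of the LFSR output sequence.
    """
    state = seed
    mask = []
    for _ in range(n_out):
        row = []
        for _ in range(n_in):
            for _ in range(8):
                state = lfsr16_step(state)
            row.append(1 if (state & 0xFF) < keep_per_256 else 0)
        mask.append(row)
    return mask


if __name__ == "__main__":
    m = sparse_mask(32, 32)
    print(sum(map(sum, m)), "of", 32 * 32, "connections kept")
    # Only the seed and threshold need storing: the same mask is rebuilt on demand.
    print(m == sparse_mask(32, 32))
```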
Submitted 30 March, 2017; v1 submitted 4 November, 2016;
originally announced November 2016.
-
A 9.52 dB NCG FEC scheme and 164 bits/cycle low-complexity product decoder architecture
Authors:
Carlo Condo,
Pascal Giard,
François Leduc-Primeau,
Gabi Sarkis,
Warren J. Gross
Abstract:
Powerful Forward Error Correction (FEC) schemes are used in optical communications to achieve bit-error rates below $10^{-15}$. These FECs follow one of two approaches: concatenation of simpler hard-decision codes or usage of inherently powerful soft-decision codes. The first approach yields lower Net Coding Gains (NCGs), but can usually work at higher code rates and has lower-complexity decoders. In this work, we propose a novel FEC scheme based on a product code and a post-processing technique. It can achieve an NCG of 9.52 dB at a BER of $10^{-15}$ and 9.96 dB at a BER of $10^{-18}$, an error-correction performance that sits between that of current hard-decision and soft-decision FECs. A decoder architecture is designed, tested on FPGA and synthesized in 65 nm CMOS technology: its 164 bits/cycle worst-case information throughput can reach 100 Gb/s at the achieved frequency of 609 MHz. Its complexity is shown to be lower than that of hard-decision decoders in the literature, and an order of magnitude lower than the estimated complexity of soft-decision decoders.
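NCG figures like these are usually computed with the conventional definition for a binary FEC over an AWGN channel: NCG = 20 log10 Q^{-1}(BER_out) - 20 log10 Q^{-1}(BER_in) + 10 log10 R, where BER_in is the pre-FEC BER at which the decoder reaches the target output BER and R is the code rate. A small Python sketch of that formula follows, assuming this convention; the abstract does not state the BER_in or R behind 9.52 dB, so none are plugged in.

```python
from math import log10
from statistics import NormalDist


def q_inv(p):
    """Inverse Gaussian tail Q^{-1}(p), computed from the lower-tail quantile
    so that very small p (e.g. 1e-15) stays accurate."""
    return -NormalDist().inv_cdf(p)


def net_coding_gain_db(ber_out, ber_in, rate):
    """Net coding gain in dB under the conventional binary AWGN definition."""
    return 20 * log10(q_inv(ber_out)) - 20 * log10(q_inv(ber_in)) + 10 * log10(rate)


if __name__ == "__main__":
    # Hypothetical operating point: tolerating a higher pre-FEC BER raises the NCG.
    print(net_coding_gain_db(1e-15, 1e-3, 0.9))
    print(net_coding_gain_db(1e-15, 1e-4, 0.9))
```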
Submitted 5 April, 2017; v1 submitted 18 October, 2016;
originally announced October 2016.
-
Hardware Decoders for Polar Codes: An Overview
Authors:
Pascal Giard,
Gabi Sarkis,
Alexios Balatsoukas-Stimming,
YouZhe Fan,
Chi-ying Tsui,
Andreas Burg,
Claude Thibeault,
Warren J. Gross
Abstract:
Polar codes are an exciting new class of error-correcting codes that achieve the symmetric capacity of memoryless channels. Many decoding algorithms have been developed and implemented, addressing various application requirements: from error-correction performance rivaling that of LDPC codes to very high throughput or low-complexity decoders. In this work, we review the state of the art in polar decoders implementing the successive-cancellation, belief propagation, and list decoding algorithms, illustrating their advantages.
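As a reference point for the successive-cancellation family reviewed here, a minimal recursive SC decoder in Python using the min-sum f function and the g function. It assumes natural-order encoding x = u·F^{⊗n} (no bit-reversal permutation) and LLRs where positive favours 0; it is a plain software sketch, not one of the surveyed hardware architectures.

```python
def f_minsum(a, b):
    """Min-sum approximation of the LLR of the XOR of two bits."""
    sign = 1.0 if (a >= 0) == (b >= 0) else -1.0
    return sign * min(abs(a), abs(b))


def polar_transform(u):
    """x = u * F^{(kron) n} over GF(2), with F = [[1, 0], [1, 1]]."""
    x = list(u)
    step = 1
    while step < len(x):
        for start in range(0, len(x), 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]
        step *= 2
    return x


def sc_decode(llr, frozen):
    """Return (u_hat, x_hat); x_hat is the re-encoded partial-sum vector."""
    n = len(llr)
    if n == 1:
        bit = 0 if frozen[0] or llr[0] >= 0 else 1
        return [bit], [bit]
    h = n // 2
    left, right = llr[:h], llr[h:]
    # x_left = T(u_first) xor T(u_second), x_right = T(u_second): decode u_first via f.
    u1, v1 = sc_decode([f_minsum(a, b) for a, b in zip(left, right)], frozen[:h])
    # With v1 = T(u_first) known, both halves observe T(u_second): combine via g.
    g = [b + (1 - 2 * v) * a for a, b, v in zip(left, right, v1)]
    u2, v2 = sc_decode(g, frozen[h:])
    return u1 + u2, [p ^ q for p, q in zip(v1, v2)] + v2


if __name__ == "__main__":
    frozen = [i not in (3, 5, 6, 7) for i in range(8)]   # (8, 4) code
    u = [0, 0, 0, 1, 0, 1, 0, 1]
    llr = [4.0 if b == 0 else -4.0 for b in polar_transform(u)]
    print(sc_decode(llr, frozen)[0] == u)  # True on a noiseless channel
```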
Submitted 2 June, 2016;
originally announced June 2016.
-
Fast Low-Complexity Decoders for Low-Rate Polar Codes
Authors:
Pascal Giard,
Alexios Balatsoukas-Stimming,
Gabi Sarkis,
Claude Thibeault,
Warren J. Gross
Abstract:
Polar codes are capacity-achieving error-correcting codes with an explicit construction that can be decoded with low-complexity algorithms. In this work, we show how the state-of-the-art low-complexity decoding algorithm can be improved to better accommodate low-rate codes. More constituent codes are recognized in the updated algorithm, and dedicated hardware is added to efficiently decode these new constituent codes. We also alter the polar code construction to further decrease the latency and increase the throughput with little to no noticeable effect on error-correction performance. Rate-flexible decoders for polar codes of length 1024 and 2048 are implemented on FPGA. Compared to the previous work, they are shown to have 22% to 28% lower latency and 26% to 34% greater throughput when decoding low-rate codes. In 65 nm CMOS ASIC technology, the proposed decoder for a (1024, 512) polar code is shown to compare favorably against state-of-the-art ASIC decoders. With a clock frequency of 400 MHz and a supply voltage of 0.8 V, it has a latency of 0.41 $μ$s and an area efficiency of 1.8 Gbps/mm$^2$ for an energy efficiency of 77 pJ/info. bit. At 600 MHz with a supply of 1 V, the latency is reduced to 0.27 $μ$s and the area efficiency increased to 2.7 Gbps/mm$^2$ at 115 pJ/info. bit.
Submitted 17 March, 2016; v1 submitted 16 March, 2016;
originally announced March 2016.
-
Partitioned Successive-Cancellation List Decoding of Polar Codes
Authors:
Seyyed Ali Hashemi,
Alexios Balatsoukas-Stimming,
Pascal Giard,
Claude Thibeault,
Warren J. Gross
Abstract:
Successive-cancellation list (SCL) decoding is an algorithm that provides very good error-correction performance for polar codes. However, its hardware implementation requires a large amount of memory, mainly to store intermediate results. In this paper, a partitioned SCL algorithm is proposed to reduce the large memory requirements of the conventional SCL algorithm. The decoder tree is broken into partitions that are decoded separately. We show that with careful selection of list sizes and number of partitions, the proposed algorithm can outperform conventional SCL while requiring less memory.
Submitted 22 January, 2016; v1 submitted 9 December, 2015;
originally announced December 2015.
-
VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing
Authors:
Arash Ardakani,
François Leduc-Primeau,
Naoya Onizawa,
Takahiro Hanyu,
Warren J. Gross
Abstract:
The hardware implementation of deep neural networks (DNNs) has recently received tremendous attention: many applications in fact require high-speed operations that suit a hardware implementation. However, numerous elements and complex interconnections are usually required, leading to large area occupation and high power consumption. Stochastic computing has shown promising results for low-power, area-efficient hardware implementations, even though existing stochastic algorithms require long streams that cause long latencies. In this paper, we propose an integer form of stochastic computation and introduce some elementary circuits. We then propose an efficient implementation of a DNN based on integral stochastic computing. The proposed architecture has been implemented on a Virtex7 FPGA, resulting in 45% and 62% average reductions in area and latency compared to the best reported architecture in the literature. We also synthesize the circuits in a 65 nm CMOS technology and show that the proposed integral stochastic architecture results in up to 21% reduction in energy consumption compared to the binary radix implementation at the same misclassification rate. Due to the fault-tolerant nature of stochastic architectures, we also consider a quasi-synchronous implementation, which yields a 33% reduction in energy consumption with respect to the binary radix implementation without any compromise on performance.
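A minimal Python sketch of the integral stochastic idea, assuming a value x in [0, m] is represented by the element-wise sum of m unipolar bitstreams, each with probability x/m, so stream elements are small integers and the represented value is their mean. Under that assumption, addition is a plain integer adder with no scaling, and multiplication with an independent stream is an element-wise product. Function names are illustrative and the paper's circuit-level details are not reproduced.

```python
import random


def unipolar_stream(p, length, rng):
    """Binary stochastic stream whose fraction of ones approximates p in [0, 1]."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def integral_stream(x, m, length, rng):
    """Integer-valued stream for x in [0, m]: sum of m unipolar streams of prob x / m."""
    streams = [unipolar_stream(x / m, length, rng) for _ in range(m)]
    return [sum(column) for column in zip(*streams)]


def stream_value(stream):
    """Represented value: the mean of the stream elements."""
    return sum(stream) / len(stream)


if __name__ == "__main__":
    rng = random.Random(0)
    a = integral_stream(1.5, 2, 100_000, rng)   # elements in {0, 1, 2}
    b = integral_stream(0.6, 1, 100_000, rng)   # m = 1 is an ordinary unipolar stream
    print(stream_value([p + q for p, q in zip(a, b)]))  # approaches 1.5 + 0.6
    print(stream_value([p * q for p, q in zip(a, b)]))  # approaches 1.5 * 0.6
```

Values above 1 are representable without the scaling that MUX-based addition imposes on conventional unipolar streams, which is one reason shorter streams can suffice.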
Submitted 24 August, 2016; v1 submitted 29 September, 2015;
originally announced September 2015.
-
Flexible and Low-Complexity Encoding and Decoding of Systematic Polar Codes
Authors:
Gabi Sarkis,
Ido Tal,
Pascal Giard,
Alexander Vardy,
Claude Thibeault,
Warren J. Gross
Abstract:
In this work, we present hardware and software implementations of flexible polar systematic encoders and decoders. The proposed implementations operate on polar codes of any length up to a maximum and of any rate. We describe the low-complexity, highly parallel, and flexible systematic-encoding algorithm that we use and prove its correctness. Our hardware implementation results show that the overhead of adding code rate and length flexibility is small, and the impact on operation latency is minor, compared to code-specific versions. Finally, the flexible software encoder and decoder implementations are also shown to be able to maintain high throughput and low latency.
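A minimal sketch of a two-pass systematic polar encoder: write the data into the information positions, encode, reset the frozen positions to zero, and encode again. It assumes natural-order encoding x = u·F^{⊗n} and an information set closed upward under binary domination (reliability-ordered sets such as the (8, 4) set {3, 5, 6, 7} are); the precise condition covered by the paper's correctness proof is not restated in the abstract.

```python
def polar_transform(u):
    """x = u * F^{(kron) n} over GF(2), with F = [[1, 0], [1, 1]]."""
    x = list(u)
    step = 1
    while step < len(x):
        for start in range(0, len(x), 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]
        step *= 2
    return x


def systematic_encode(data, info_set, N):
    """Encode, zero the frozen positions, encode again; data appears at info_set."""
    v = [0] * N
    for pos, b in zip(info_set, data):
        v[pos] = b
    y = polar_transform(v)                                   # first pass
    info = set(info_set)
    y = [b if i in info else 0 for i, b in enumerate(y)]     # clear frozen positions
    return polar_transform(y)                                # second pass


if __name__ == "__main__":
    x = systematic_encode([1, 0, 1, 1], [3, 5, 6, 7], 8)
    print([x[i] for i in (3, 5, 6, 7)])  # [1, 0, 1, 1]: the data is visible in x
```

Since both passes reuse the same non-systematic transform, an encoder that already supports several lengths gains systematic operation with only a masking step in between, consistent with the low flexibility overhead reported above.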
Submitted 23 February, 2016; v1 submitted 13 July, 2015;
originally announced July 2015.
-
Fast List Decoders for Polar Codes
Authors:
Gabi Sarkis,
Pascal Giard,
Alexander Vardy,
Claude Thibeault,
Warren J. Gross
Abstract:
Polar codes asymptotically achieve the symmetric capacity of memoryless channels, yet their error-correcting performance under successive-cancellation (SC) decoding for short and moderate length codes is worse than that of other modern codes such as low-density parity-check (LDPC) codes. Of the many methods to improve the error-correction performance of polar codes, list decoding yields the best results, especially when the polar code is concatenated with a cyclic redundancy check (CRC). List decoding involves exploring several decoding paths with SC decoding, and therefore tends to be slower than SC decoding itself, by an order of magnitude in practical implementations. In this paper, we present a new algorithm based on unrolling the decoding tree of the code that improves the speed of list decoding by an order of magnitude when implemented in software. Furthermore, we show that for software-defined radio applications, our proposed algorithm is faster than the fastest software implementations of LDPC decoders in the literature while offering comparable error-correction performance at similar or shorter code lengths.
Submitted 10 November, 2015; v1 submitted 6 May, 2015;
originally announced May 2015.
-
Multi-mode Unrolled Architectures for Polar Decoders
Authors:
Pascal Giard,
Gabi Sarkis,
Claude Thibeault,
Warren J. Gross
Abstract:
In this work, we present a family of architectures for polar decoders using a reduced-complexity successive-cancellation decoding algorithm that employs unrolling to achieve extremely high throughput values while retaining moderate implementation complexity. The resulting fully-unrolled, deeply-pipelined architecture is capable of achieving a coded throughput in excess of 1 Tbps on a 65 nm ASIC at 500 MHz, three orders of magnitude greater than current state-of-the-art polar decoders. However, unrolled decoders are built for a specific, fixed code. Therefore, we also present a new method to enable the use of multiple code lengths and rates in a fully-unrolled polar decoder architecture. This method leads to a length- and rate-flexible decoder while retaining the very high speed typical of unrolled decoders. The resulting decoders can decode a master polar code of a given rate and length, and several shorter codes of different rates and lengths. We present results for two versions of a multi-mode decoder supporting eight and ten different polar codes, respectively. Both are capable of a peak throughput of 25.6 Gbps. For each decoder, the energy efficiency for the longest supported polar code is shown to be 14.8 pJ/bit at 250 MHz and 8.8 pJ/bit at 500 MHz.
Submitted 11 July, 2016; v1 submitted 6 May, 2015;
originally announced May 2015.
-
Low-Latency Software Polar Decoders
Authors:
Pascal Giard,
Gabi Sarkis,
Camille Leroux,
Claude Thibeault,
Warren J. Gross
Abstract:
Polar codes are a new class of capacity-achieving error-correcting codes with low encoding and decoding complexity. Their low-complexity decoding algorithms render them attractive for use in software-defined radio applications where computational resources are limited. In this work, we present low-latency software polar decoders that exploit modern processor capabilities. We show how adapting the algorithm at various levels can lead to significant improvements in latency and throughput, yielding polar decoders that are suitable for high-performance software-defined radio applications on modern desktop processors and embedded-platform processors. The proposed decoders have an order of magnitude lower latency and memory footprint compared to state-of-the-art decoders, while maintaining comparable throughput. In addition, we present strategies and results for implementing polar decoders on graphics processing units. Finally, we show that the energy efficiency of the proposed decoders is comparable to that of state-of-the-art software polar decoders.
Submitted 11 July, 2016; v1 submitted 1 April, 2015;
originally announced April 2015.
-
Modeling and Energy Optimization of LDPC Decoder Circuits with Timing Violations
Authors:
François Leduc-Primeau,
Frank R. Kschischang,
Warren J. Gross
Abstract:
This paper proposes a "quasi-synchronous" design approach for signal processing circuits, in which timing violations are permitted, but without the need for a hardware compensation mechanism. The case of a low-density parity-check (LDPC) decoder is studied, and a method for accurately modeling the effect of timing violations at a high level of abstraction is presented. The error-correction performance of code ensembles is then evaluated using density evolution while taking into account the effect of timing faults. Following this, several quasi-synchronous LDPC decoder circuits based on the offset min-sum algorithm are optimized, providing a 23%-40% reduction in energy consumption or energy-delay product, while achieving the same performance and occupying the same area as conventional synchronous circuits.
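For reference, the offset min-sum check-node rule mentioned above can be sketched as follows. This is the textbook update only, not the paper's timing-violation model or its optimized circuits; the function name and the offset value `beta` are illustrative assumptions. Each outgoing message takes the sign product and the minimum magnitude of all other incoming messages, reduced by the offset and clamped at zero.

```python
from typing import List, Sequence

def offset_min_sum_check(incoming: Sequence[float], beta: float) -> List[float]:
    """Offset min-sum check-node update (illustrative sketch).

    For each edge i, combine every incoming message except the one on edge i:
    sign = product of their signs, magnitude = max(min |m| - beta, 0).
    """
    out: List[float] = []
    for i in range(len(incoming)):
        others = [m for j, m in enumerate(incoming) if j != i]
        sign = 1.0
        for m in others:
            if m < 0:
                sign = -sign
        magnitude = max(min(abs(m) for m in others) - beta, 0.0)
        out.append(sign * magnitude)
    return out

print(offset_min_sum_check([2.0, -3.0, 0.5], beta=0.25))
# [-0.25, 0.25, -1.75]
```

Hardware implementations usually avoid the per-edge loop by tracking only the two smallest magnitudes and the overall sign; the direct form above is kept for readability.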
Submitted 17 November, 2017; v1 submitted 12 March, 2015;
originally announced March 2015.
-
A 237 Gbps Unrolled Hardware Polar Decoder
Authors:
Pascal Giard,
Gabi Sarkis,
Claude Thibeault,
Warren J. Gross
Abstract:
In this letter, we present a new architecture for a polar decoder using a reduced-complexity successive-cancellation decoding algorithm. This novel fully-unrolled, deeply-pipelined architecture is capable of achieving a coded throughput of over 237 Gbps for a (1024,512) polar code implemented using an FPGA. This decoder is two orders of magnitude faster than state-of-the-art polar decoders.
Submitted 18 December, 2014;
originally announced December 2014.
-
In-Network Linear Regression with Arbitrarily Split Data Matrices
Authors:
François D. Côté,
Ioannis N. Psaromiligkos,
Warren J. Gross
Abstract:
In this paper, we address the problem of how a network of agents can collaboratively fit a linear model when each agent only ever has an arbitrary summand of the regression data. This problem generalizes previously studied data-matrix-splitting scenarios, allowing some agents to have more measurements of some features than of others, and even to have measurements that other agents also have. We present a variable-centric framework for distributed optimization in a network, and use this framework to develop a proximal algorithm, based on the Douglas-Rachford method, that solves the problem.
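To illustrate only the Douglas-Rachford building block, the sketch below fits a single scalar coefficient when two agents each hold a disjoint subset of the samples (a row split). This is a much simpler case than the arbitrary-summand setting and the variable-centric framework of the paper, and the names (`prox_least_squares`, `douglas_rachford`) and the step size `t` are assumptions for illustration. Each agent's local least-squares loss has a closed-form proximal operator, and the iteration alternates between the two operators.

```python
from typing import Sequence, Tuple

# (features x_k, targets y_k) held by one agent; hypothetical data layout
Data = Tuple[Sequence[float], Sequence[float]]

def prox_least_squares(v: float, t: float, data: Data) -> float:
    """argmin_w 0.5*sum_k (y_k - w*x_k)^2 + (w - v)^2 / (2t), in closed form."""
    xs, ys = data
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (v + t * sxy) / (1.0 + t * sxx)

def douglas_rachford(agent1: Data, agent2: Data, t: float = 1.0, iters: int = 200) -> float:
    """Minimise f1(w) + f2(w) with Douglas-Rachford splitting (scalar w)."""
    z = 0.0
    for _ in range(iters):
        w1 = prox_least_squares(z, t, agent1)           # agent 1: local proximal step
        w2 = prox_least_squares(2 * w1 - z, t, agent2)  # agent 2: step at the reflected point
        z += w2 - w1                                    # update the auxiliary variable
    return prox_least_squares(z, t, agent1)

agent1 = ([1.0, 2.0], [2.0, 4.0])
agent2 = ([3.0], [6.5])
w = douglas_rachford(agent1, agent2)
# w agrees with the pooled least-squares solution sum(x*y) / sum(x*x) = 29.5 / 14
```

Neither agent ever sees the other's samples; only the scalar iterates are exchanged, which is the property that makes proximal splitting attractive for in-network fitting.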
Submitted 5 August, 2014;
originally announced August 2014.
-
Increasing the Speed of Polar List Decoders
Authors:
Gabi Sarkis,
Pascal Giard,
Alexander Vardy,
Claude Thibeault,
Warren J. Gross
Abstract:
In this work, we present a simplified successive-cancellation list decoder that uses a Chase-like decoding process to achieve a sixfold improvement in speed compared to successive-cancellation list decoding, while maintaining the same error-correction performance advantage over standard successive-cancellation polar decoders. We discuss the algorithm and detail the data structures and methods used to obtain this speed-up. We also propose an adaptive decoding algorithm that significantly improves the throughput while retaining the error-correction performance. Simulation results over the additive white Gaussian noise channel are provided and show that the proposed system is up to 16 times faster than an LDPC decoder of the same frame size, code rate, and similar error-correction performance, making it more suitable for use as a software decoding solution.
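One common shape for an adaptive polar decoder, assumed here rather than taken from the paper, is to run a fast low-complexity decoder first and invoke the slower list decoder only when a CRC check fails. The control flow is sketched below; `fast_decoder`, `list_decoder`, and `crc_ok` are hypothetical callables supplied by the caller, and the stubs in the usage lines are toy stand-ins, not real decoders.

```python
from typing import Callable, List, Sequence

Decoder = Callable[[Sequence[float]], List[int]]

def adaptive_decode(llr: Sequence[float], fast_decoder: Decoder, list_decoder: Decoder,
                    crc_ok: Callable[[List[int]], bool]) -> List[int]:
    """Try the cheap decoder first; fall back to list decoding only on CRC failure."""
    candidate = fast_decoder(llr)
    if crc_ok(candidate):
        return candidate        # common case when the channel is good
    return list_decoder(llr)    # rarer, more expensive fallback

# Toy stand-ins: hard decisions as the "fast" decoder, even parity as the "CRC".
calls: List[str] = []

def hard_decision(llr: Sequence[float]) -> List[int]:
    calls.append("fast")
    return [0 if l >= 0 else 1 for l in llr]

def stub_list_decoder(llr: Sequence[float]) -> List[int]:
    calls.append("list")
    return [0] * len(llr)

def even_parity(bits: List[int]) -> bool:
    return sum(bits) % 2 == 0

adaptive_decode([1.0, -1.0, 2.0, -0.5], hard_decision, stub_list_decoder, even_parity)
```

Because the fallback is rarely triggered at moderate and high SNR, the average throughput approaches that of the fast decoder, while the error-correction performance is set by the list decoder.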
Submitted 10 July, 2014;
originally announced July 2014.
-
Associative Memories Based on Multiple-Valued Sparse Clustered Networks
Authors:
Hooman Jarollahi,
Naoya Onizawa,
Takahiro Hanyu,
Warren J. Gross
Abstract:
Associative memories are structures that store data patterns and retrieve them given partial inputs. Sparse Clustered Networks (SCNs) are recently introduced binary-weighted associative memories that significantly improve the storage and retrieval capabilities over the prior state of the art. However, deleting or updating the data patterns results in a significant increase in the data retrieval error probability. In this paper, we propose an algorithm to address this problem by incorporating multiple-valued weights for the interconnections used in the network. The proposed algorithm lowers the error rate by an order of magnitude for our sample network with 60% deleted contents. We then investigate the advantages of the proposed algorithm for hardware implementations.
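The problem with binary weights is that an edge shared by two stored patterns is cleared when either pattern is deleted. A minimal sketch of the remedy, assuming the multiple-valued weight is a per-edge counter (incremented on store, decremented on delete), is shown below. The class name, the one-shot winner-take-all retrieval, and the message format (one active neuron index per cluster) are illustrative assumptions, not the paper's exact algorithm or hardware.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Neuron = Tuple[int, int]  # (cluster index, neuron index within the cluster)

class CountingSCN:
    """Sparse clustered network whose edges carry integer counts instead of bits."""

    def __init__(self, clusters: int, neurons_per_cluster: int) -> None:
        self.clusters = clusters
        self.l = neurons_per_cluster
        self.weights: Dict[Tuple[Neuron, Neuron], int] = defaultdict(int)

    def _clique(self, msg: Sequence[int]) -> Iterator[Tuple[Neuron, Neuron]]:
        # One active neuron per cluster; every pair of active neurons is linked.
        assert len(msg) == self.clusters
        nodes = list(enumerate(msg))
        for a in range(len(nodes)):
            for b in range(a + 1, len(nodes)):
                yield nodes[a], nodes[b]

    def store(self, msg: Sequence[int]) -> None:
        for edge in self._clique(msg):
            self.weights[edge] += 1

    def delete(self, msg: Sequence[int]) -> None:
        # Decrement instead of clearing, so edges shared with other patterns survive.
        for edge in self._clique(msg):
            if self.weights.get(edge, 0) > 0:
                self.weights[edge] -= 1

    def _linked(self, p: Neuron, q: Neuron) -> bool:
        key = (p, q) if p < q else (q, p)
        return self.weights.get(key, 0) > 0

    def retrieve(self, partial: Sequence[Optional[int]]) -> List[int]:
        known = [(i, v) for i, v in enumerate(partial) if v is not None]
        result: List[int] = []
        for i, v in enumerate(partial):
            if v is not None:
                result.append(v)
                continue
            # Winner-take-all: pick the neuron with the most links to known neurons.
            scores = [sum(self._linked(k, (i, j)) for k in known) for j in range(self.l)]
            result.append(max(range(self.l), key=lambda j: scores[j]))
        return result

scn = CountingSCN(clusters=3, neurons_per_cluster=4)
scn.store([0, 1, 2])
scn.store([0, 1, 3])
scn.delete([0, 1, 3])
print(scn.retrieve([0, 1, None]))  # [0, 1, 2]
```

Here the edge between neuron 0 of cluster 0 and neuron 1 of cluster 1 is shared by both patterns; its count drops from 2 to 1 on deletion, so the surviving pattern keeps its full clique.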
Submitted 3 February, 2014;
originally announced February 2014.
-
Selective Decoding in Associative Memories Based on Sparse-Clustered Networks
Authors:
Hooman Jarollahi,
Naoya Onizawa,
Warren J. Gross
Abstract:
Associative memories are structures that can retrieve previously stored information given a partial input pattern instead of an explicit address as in indexed memories. A few hardware approaches have recently been introduced for a new family of associative memories based on Sparse-Clustered Networks (SCN) that show attractive features. These architectures are suitable for implementations with low retrieval latency, but are limited to small networks that store a few hundred data entries. In this paper, a new hardware architecture of SCNs is proposed that features a new data-storage technique as well as a method we refer to as Selective Decoding (SD-SCN). The SD-SCN has been implemented on an FPGA similar to the one used in previous efforts and achieves two orders of magnitude higher capacity, with no error-performance penalty but at the cost of a few extra clock cycles per data access.
Submitted 27 August, 2013;
originally announced August 2013.
-
Fast Polar Decoders: Algorithm and Implementation
Authors:
Gabi Sarkis,
Pascal Giard,
Alexander Vardy,
Claude Thibeault,
Warren J. Gross
Abstract:
Polar codes provably achieve the symmetric capacity of a memoryless channel while having an explicit construction. This work aims to increase the throughput of polar decoder hardware by an order of magnitude relative to the state-of-the-art successive-cancellation decoder. We present an algorithm, architecture, and FPGA implementation of a gigabit-per-second polar decoder.
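One family of techniques commonly used to speed up successive-cancellation polar decoders is to decode certain constituent codes directly instead of descending to individual bits: rate-0 (all frozen), rate-1 (no frozen bits), repetition, and single-parity-check (SPC) nodes. The abstract does not spell out which node types this work uses, so the sketch below is an illustrative assumption; each function takes the node's LLRs (positive favours 0) and returns the node's codeword estimate.

```python
from typing import List, Sequence

def hard(llr: float) -> int:
    return 0 if llr >= 0 else 1

def decode_rate0(llrs: Sequence[float]) -> List[int]:
    return [0] * len(llrs)                   # every bit is frozen to 0

def decode_rate1(llrs: Sequence[float]) -> List[int]:
    return [hard(l) for l in llrs]           # no constraints: bit-wise hard decisions

def decode_repetition(llrs: Sequence[float]) -> List[int]:
    bit = hard(sum(llrs))                    # all bits equal: decide on the LLR sum
    return [bit] * len(llrs)

def decode_spc(llrs: Sequence[float]) -> List[int]:
    bits = [hard(l) for l in llrs]
    if sum(bits) % 2 == 1:                   # parity violated: flip the least reliable bit
        weakest = min(range(len(llrs)), key=lambda i: abs(llrs[i]))
        bits[weakest] ^= 1
    return bits

print(decode_spc([2.0, -1.5, 0.3, 1.0]))  # [0, 1, 1, 0]
```

Each of these replaces a whole subtree of bit-by-bit decisions with a few parallel operations, which is where most of the latency saving in such decoders comes from.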
Submitted 9 December, 2013; v1 submitted 26 July, 2013;
originally announced July 2013.
-
Fast Software Polar Decoders
Authors:
Pascal Giard,
Gabi Sarkis,
Claude Thibeault,
Warren J. Gross
Abstract:
Among error-correcting codes, polar codes are the first to provably achieve channel capacity with an explicit construction. In this work, we present software implementations of a polar decoder that leverage the capabilities of modern general-purpose processors to achieve an information throughput in excess of 200 Mbps, a throughput well suited for software-defined-radio applications. We also show that, for a similar error-correction performance, the throughput of polar decoders both surpasses that of LDPC decoders targeting general-purpose processors and is competitive with that of state-of-the-art software LDPC decoders running on graphics processing units.
Submitted 29 January, 2014; v1 submitted 26 June, 2013;
originally announced June 2013.
-
Scalable Successive-Cancellation Hardware Decoder for Polar Codes
Authors:
Alexandre J. Raymond,
Warren J. Gross
Abstract:
Polar codes, discovered by Arıkan, are the first error-correcting codes with an explicit construction to provably achieve channel capacity, asymptotically. However, their error-correction performance at finite lengths tends to be lower than that of existing capacity-approaching schemes. Using the successive-cancellation algorithm, polar decoders can be designed for very long codes, with low hardware complexity, leveraging the regular structure of such codes. We present an architecture and an implementation of a scalable hardware decoder based on this algorithm. This design is shown to scale to code lengths of up to N = 2^20 on an Altera Stratix IV FPGA, limited almost exclusively by the amount of available SRAM.
Submitted 14 June, 2013;
originally announced June 2013.
-
Hardware Architecture for List SC Decoding of Polar Codes
Authors:
A. Balatsoukas-Stimming,
A. J. Raymond,
W. J. Gross,
A. Burg
Abstract:
We present a hardware architecture and algorithmic improvements for list SC decoding of polar codes. More specifically, we show how to completely avoid copying of the likelihoods, which is algorithmically the most cumbersome part of list SC decoding. The hardware architecture was synthesized for a blocklength of N = 1024 bits and list sizes L = 2, 4 using a UMC 90nm VLSI technology. The resulting decoder can achieve a coded throughput of 181 Mbps at a frequency of 459 MHz.
Submitted 27 February, 2014; v1 submitted 28 March, 2013;
originally announced March 2013.
-
A Low-Power Content-Addressable-Memory Based on Clustered-Sparse-Networks
Authors:
Hooman Jarollahi,
Vincent Gripon,
Naoya Onizawa,
Warren J. Gross
Abstract:
A low-power content-addressable memory (CAM) is introduced that employs a new mechanism for associativity between the input tags and the corresponding address of the output data. The proposed architecture is based on a recently developed clustered-sparse network using binary-weighted connections that, on average, eliminates most of the parallel comparisons performed during a search. Therefore, the dynamic energy consumption of the proposed design is significantly lower than that of a conventional low-power CAM design. Given an input tag, the proposed architecture computes a few possible locations for the matched tag and performs the comparisons on them to locate a single valid match. A 0.13 µm CMOS technology was used for simulation purposes. The energy consumption and the search delay of the proposed design are 9.5% and 30.4%, respectively, of those of the conventional NAND architecture, with a 3.4% higher transistor count.
Submitted 18 February, 2013;
originally announced February 2013.
-
Relaxed Half-Stochastic Belief Propagation
Authors:
François Leduc-Primeau,
Saied Hemati,
Shie Mannor,
Warren J. Gross
Abstract:
Low-density parity-check codes are attractive for high-throughput applications because of their low decoding complexity per bit, but also because all the codeword bits can be decoded in parallel. However, achieving this in a circuit implementation is complicated by the number of wires required to exchange messages between processing nodes. Decoding algorithms that exchange binary messages are interesting for fully-parallel implementations because they can reduce the number and the length of the wires, and increase logic density. This paper introduces the Relaxed Half-Stochastic (RHS) decoding algorithm, a binary-message belief propagation (BP) algorithm that achieves a coding gain comparable to the best known BP algorithms that use real-valued messages. We derive the RHS algorithm starting from the well-known Sum-Product algorithm, and then obtain a low-complexity version suitable for circuit implementation. We present extensive simulation results on two standardized codes having different rates and constructions, including low bit error rate results. These simulations show that RHS can be an advantageous replacement for the existing state-of-the-art decoding algorithms when targeting fully-parallel implementations.
Submitted 11 May, 2012;
originally announced May 2012.
-
A Chernoff-type Lower Bound for the Gaussian Q-function
Authors:
François D. Côté,
Ioannis N. Psaromiligkos,
Warren J. Gross
Abstract:
A lower bound for the Gaussian Q-function is presented in the form of a single exponential function with parametric order and weight. We prove the lower bound by introducing two functions, one related to the Q-function and the other similarly related to the exponential function, and by obtaining inequalities that indicate the sign of the difference of the two functions.
Submitted 22 March, 2012; v1 submitted 29 February, 2012;
originally announced February 2012.
-
Hardware Implementation of Successive Cancellation Decoders for Polar Codes
Authors:
Camille Leroux,
Alexandre J. Raymond,
Gabi Sarkis,
Ido Tal,
Alexander Vardy,
Warren J. Gross
Abstract:
The recently discovered polar codes are seen as a major breakthrough in coding theory; they provably achieve the theoretical capacity of discrete memoryless channels using the low-complexity successive-cancellation (SC) decoding algorithm. Motivated by recent developments in polar coding theory, we propose a family of efficient hardware implementations for SC polar decoders. We show that such decoders can be implemented with O(n) processing elements and O(n) memory elements, and can provide a constant throughput for a given target clock frequency. Furthermore, we show that SC decoding can be implemented in the logarithmic domain, thereby eliminating costly multiplication and division operations and greatly reducing the complexity of each processing element. We also present a detailed architecture for an SC decoder and provide logic synthesis results confirming the linear growth in complexity of the decoder as the code length increases.
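To make the logarithmic-domain SC algorithm concrete, here is a minimal recursive software sketch (not the hardware architecture of the paper). It works on log-likelihood ratios, so each node needs only additions, sign manipulation, and a minimum: the upper-branch update `f` uses the common min-sum approximation, and the lower-branch update `g` adds or subtracts depending on the already-decided partial sum. Natural bit order (no bit reversal), the function names, and the example frozen set are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

def f_minsum(a: float, b: float) -> float:
    """Upper-branch update: min-sum form of 2*atanh(tanh(a/2)*tanh(b/2))."""
    sign = 1.0 if (a >= 0) == (b >= 0) else -1.0
    return sign * min(abs(a), abs(b))

def g_update(a: float, b: float, partial_sum: int) -> float:
    """Lower-branch update given the re-encoded upper-branch bit."""
    return b + (1 - 2 * partial_sum) * a

def polar_encode(u: Sequence[int]) -> List[int]:
    """x = u times the Kronecker power of F = [[1, 0], [1, 1]], natural order."""
    if len(u) == 1:
        return [u[0]]
    half = len(u) // 2
    upper, lower = polar_encode(u[:half]), polar_encode(u[half:])
    return [p ^ q for p, q in zip(upper, lower)] + lower

def sc_decode(llr: Sequence[float], frozen: Sequence[bool]) -> Tuple[List[int], List[int]]:
    """Recursive SC decoding; returns (u_hat, x_hat). A positive LLR favours bit 0."""
    if len(llr) == 1:
        bit = 0 if frozen[0] or llr[0] >= 0 else 1  # frozen bits are always 0
        return [bit], [bit]
    half = len(llr) // 2
    la, lb = llr[:half], llr[half:]
    u_up, x_up = sc_decode([f_minsum(a, b) for a, b in zip(la, lb)], frozen[:half])
    u_lo, x_lo = sc_decode([g_update(a, b, s) for a, b, s in zip(la, lb, x_up)], frozen[half:])
    return u_up + u_lo, [p ^ q for p, q in zip(x_up, x_lo)] + x_lo

# N = 8, K = 4 example with an assumed frozen set {0, 1, 2, 4}.
frozen = [True, True, True, False, True, False, False, False]
u = [0, 0, 0, 1, 0, 1, 1, 0]
x = polar_encode(u)
llr = [4.0 if b == 0 else -4.0 for b in x]  # noiseless BPSK, 0 -> +1
u_hat, x_hat = sc_decode(llr, frozen)
```

The recursion mirrors the decoder tree: a hardware implementation schedules the same `f` and `g` operations on a fixed set of processing elements instead of recursing.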
Submitted 18 November, 2011;
originally announced November 2011.
-
Hardware architectures for Successive Cancellation Decoding of Polar Codes
Authors:
Camille Leroux,
Ido Tal,
Alexander Vardy,
Warren J. Gross
Abstract:
The recently discovered polar codes are widely seen as a major breakthrough in coding theory. These codes achieve the capacity of many important channels under successive cancellation decoding. Motivated by the rapid progress in the theory of polar codes, we propose a family of architectures for efficient hardware implementation of successive cancellation decoders. We show that such decoders can be implemented with O(n) processing elements and O(n) memory elements, while providing constant throughput. We also propose a technique for overlapping the decoding of several consecutive codewords, thereby achieving a significant speed-up factor. We furthermore show that successive cancellation decoding can be implemented in the logarithmic domain, thereby eliminating the multiplication and division operations and greatly reducing the complexity of each processing element.
Submitted 12 November, 2010;
originally announced November 2010.
-
Spontaneous coordinated activity in cultured networks: Analysis of multiple ignition sites, primary circuits, burst phase delay distributions and functional structures
Authors:
Michael I. Ham,
Vadas Gintautas,
Guenter W. Gross
Abstract:
All higher-order central nervous systems exhibit spontaneous neural activity, though the purpose and mechanistic origin of such activity remain poorly understood. We explore the ignition and spread of collective spontaneous electrophysiological burst activity in networks of cultured cortical neurons growing on microelectrode arrays, using information theory and first-spike-in-burst analysis methods. We show the presence of burst leader neurons, which form a mono-synaptically connected primary circuit and initiate a majority of network bursts. Leader/follower firing delay times form temporally stable, positively skewed distributions. Blocking inhibitory synapses usually results in shorter delay times with reduced variance. These distributions are generalized characterizations of internal network dynamics and provide estimates of pair-wise synaptic distances. We show that mutual information between neural nodes is a function of distance, which is maintained under disinhibition. The resulting analysis produces specific quantitative constraints and insights into the activation patterns of collective neuronal activity in self-organized cortical networks, which may prove useful for models emulating spontaneously active systems.
Submitted 12 April, 2010;
originally announced April 2010.
-
The functional structure of cortical neuronal networks grown in vitro
Authors:
Luis M. A. Bettencourt,
Greg J. Stephens,
Michael I. Ham,
Guenter W. Gross
Abstract:
We apply an information-theoretic treatment of action potential time series measured with microelectrode arrays to estimate the connectivity of mammalian neuronal cell assemblies grown in vitro. We infer connectivity between two neurons via the measurement of the mutual information between their spike trains. In addition, we measure higher-point multi-informations between any two spike trains conditional on the activity of a third cell, as a means to identify and distinguish classes of functional connectivity among three neurons. The use of a conditional three-cell measure removes some interpretational shortcomings of the pairwise mutual information and sheds light on the functional connectivity arrangements of any three cells. We analyze the resultant connectivity graphs in light of other complex networks and demonstrate that, despite their ex vivo development, the connectivity maps derived from cultured neural assemblies are similar to other biological networks and display nontrivial structure in clustering coefficient, network diameter and assortative mixing. Specifically, we show that these networks are weakly disassortative small-world graphs, which differ significantly in their structure from randomized graphs with the same degree. We expect our analysis to be useful in identifying the computational motifs of a wide variety of complex networks derived from time series data.
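As an illustration of the pairwise measure, the sketch below bins two spike trains into binary sequences and computes their mutual information with a simple plug-in (frequency-count) estimator. The bin width, the helper names, and the plug-in estimator itself are assumptions; the paper's estimator, binning, and any bias corrections may differ, and the conditional three-cell measure is not shown.

```python
import math
from collections import Counter
from typing import List, Sequence

def bin_spikes(times: Sequence[float], duration: float, width: float) -> List[int]:
    """1 if at least one spike falls in a bin, else 0."""
    bins = [0] * int(math.ceil(duration / width))
    for t in times:
        idx = int(t // width)
        if 0 <= idx < len(bins):
            bins[idx] = 1
    return bins

def mutual_information_bits(a: Sequence[int], b: Sequence[int]) -> float:
    """Plug-in estimate of I(A; B) in bits from paired samples."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((pa[x] / n) * (pb[y] / n)))
    return mi

train = bin_spikes([0.1, 0.6], duration=1.0, width=0.25)  # [1, 0, 1, 0]
print(mutual_information_bits(train, train))  # 1.0
```

Identical balanced trains share one bit per bin, while statistically independent trains give zero; real recordings fall in between, and short recordings bias the plug-in estimate upward.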
Submitted 7 March, 2007;
originally announced March 2007.