-
Least-Squares Design of Chromatic Dispersion Compensation FIR Filters Realized with Overlap-Save Processing
Authors:
Oscar Gustafsson,
Cheolyong Bae,
Hakan Johansson
Abstract:
A design method for chromatic dispersion compensation filters realized using overlap-save processing in the frequency domain is proposed. By using the values that are normally zero-padded, the method obtains better results than optimal time-domain design without any change to the overlap-save processing complexity.
Submitted 20 June, 2023;
originally announced August 2023.
-
Stochastic Analysis of LMS Algorithm with Delayed Block Coefficient Adaptation
Authors:
Mohd. Tasleem Khan,
Oscar Gustafsson
Abstract:
In high sample-rate applications of the least-mean-square (LMS) adaptive filtering algorithm, pipelining and/or block processing is required. As opposed to earlier work, pipelining and block processing are jointly considered to obtain what we refer to as the delayed block LMS (DBLMS) algorithm. Different stochastic analyses for the steady and transient states are presented to estimate the step-size bound, adaptation accuracy, and adaptation speed based on the recursive relation of the delayed block excess mean square error (MSE). The effect of different amounts of pipelining delay and different block sizes on the adaptation accuracy and speed of the adaptive filter is studied for different filter lengths and speed-ups. It is concluded that, for a constant speed-up, a large delay and small block size lead to a slower convergence rate than a small delay and large block size, with almost the same steady-state MSE. Monte Carlo simulations indicate good agreement with the proposed estimates for Gaussian inputs.
Submitted 21 June, 2023; v1 submitted 31 May, 2023;
originally announced June 2023.
-
On Frequency-Domain Implementation of Digital FIR Filters Using Overlap-Add and Overlap-Save Techniques
Authors:
Hakan Johansson,
Oscar Gustafsson
Abstract:
In this paper, new insights into frequency-domain implementations of digital finite-length impulse response filtering (linear convolution) using overlap-add and overlap-save techniques are provided. It is shown that, in practical finite-wordlength implementations, the overall system corresponds to a time-varying system that can be represented in essentially two different ways. One way is to represent the system with a distortion function and aliasing functions, which in this paper are derived from multirate filter bank representations. The other way is to use a periodically time-varying impulse-response representation or, equivalently, a set of time-invariant impulse responses and the corresponding frequency responses. The paper provides systematic derivations and analyses of these representations along with filter impulse response properties and design examples. The representations are particularly useful when analyzing the effect of coefficient quantization as well as the use of shorter DFT lengths than theoretically required. A comprehensive computational-complexity analysis is also provided, and accurate formulas for estimating the optimal DFT lengths for given filter lengths are derived. Using optimal DFT lengths, it is shown that the frequency-domain implementations have lower computational complexities (multiplication rates) than the corresponding time-domain implementations for filter lengths that are shorter than those reported earlier in the literature. In particular, for general (unsymmetric) filters, the frequency-domain implementations are shown to be more efficient for all filter lengths. This opens up new considerations when comparing the complexities of different filter implementations.
Submitted 14 March, 2023; v1 submitted 17 February, 2023;
originally announced February 2023.
-
Massive Machine Type Communication Pilot-Hopping Sequence Detection Architectures Based on Non-Negative Least Squares for Grant-Free Random Access
Authors:
Narges Mohammadi Sarband,
Ema Becirovic,
Mattias Krysander,
Erik G. Larsson,
Oscar Gustafsson
Abstract:
User activity detection in grant-free random access massive machine type communication (mMTC) using pilot-hopping sequences can be formulated as solving a non-negative least squares (NNLS) problem. In this work, two architectures using different algorithms to solve the NNLS problem are proposed. The algorithms are implemented using a fully parallel approach and fixed-point arithmetic, leading to high detection rates and low power consumption. The first algorithm, fast projected gradients, converges faster to the optimal value. The second algorithm, multiplicative updates, is partially implemented in the logarithmic domain and provides a smaller chip area and lower power consumption. For a detection rate of about one million detections per second, the chip area for the fast projected gradient algorithm is about 0.7 mm$^2$ compared to about 0.5 mm$^2$ for the multiplicative algorithm when implemented in a 28 nm FD-SOI standard cell process at 1 V power supply voltage. The energy consumption is about 300 nJ/detection for the fast projected gradient algorithm using 256 iterations, leading to a convergence close to the theoretical. With 128 iterations, about 250 nJ/detection is required, with a detection performance on par with 192 iterations of the multiplicative algorithm, for which about 100 nJ/detection is required.
Submitted 7 December, 2020; v1 submitted 4 September, 2020;
originally announced September 2020.
-
A Distributed Processing Architecture for Modular and Scalable Massive MIMO Base Stations
Authors:
Erik Bertilsson,
Oscar Gustafsson,
Erik G. Larsson
Abstract:
In this work, a scalable and modular architecture for massive MIMO base stations with distributed processing is proposed. New antennas can readily be added by adding a new node, as each node handles all the additional processing involved. The architecture supports conjugate beamforming, zero-forcing, and MMSE, where the latter two require a central matrix inversion. The impact of the time required for this matrix inversion is carefully analyzed along with a generic frame format. As part of the contribution, careful computational, memory, and communication analyses are presented. It is shown that all computations can be mapped to a single computational structure and that a processing node consisting of a single such processing element can handle a broad range of bandwidths and numbers of terminals.
Submitted 24 January, 2018;
originally announced January 2018.