-
Coding for the unsourced B-channel with erasures: enhancing the linked loop code
Authors:
William W. Zheng,
Jamison R. Ebert,
Stefano Rini,
Jean-Francois Chamberland
Abstract:
In [1], the linked loop code (LLC) is presented as a promising code for the unsourced A-channel with erasures (UACE). The UACE is an unsourced multiple access channel in which active users' transmitted symbols are erased with a given probability and the channel output is obtained as the union of the non-erased symbols. In this paper, we extend the UACE channel model to the unsourced B-channel with erasures (UBCE). The UBCE differs from the UACE in that the channel output is the multiset union, or bag union, of the non-erased input symbols. In other words, the UBCE preserves the symbol multiplicity of the channel output while the UACE does not. Both the UACE and UBCE find applications in modeling aspects of unsourced random access. The LLC from [1] is enhanced and shown to outperform the tree code over the UBCE. Findings are supported by numerical simulations.
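The difference between the two channel outputs described above can be illustrated with a minimal sketch of a single channel use. The function names, the symbol values, and the per-symbol independent erasure model are assumptions made for illustration; they are not taken from the paper.

```python
import random
from collections import Counter


def transmit(symbols, erasure_prob, rng):
    """Return the symbols that survive independent erasures (assumed model)."""
    return [s for s in symbols if rng.random() >= erasure_prob]


def uace_output(survivors):
    """A-channel: the receiver sees the set of non-erased symbols."""
    return set(survivors)


def ubce_output(survivors):
    """B-channel: the receiver sees the multiset (bag) of non-erased symbols."""
    return Counter(survivors)


rng = random.Random(0)
sent = [3, 7, 3, 1]  # four active users; two of them send symbol 3
survivors = transmit(sent, erasure_prob=0.0, rng=rng)  # no erasures here
print(uace_output(survivors))  # multiplicity of symbol 3 is lost
print(ubce_output(survivors))  # multiplicity of symbol 3 is kept
```

With a nonzero `erasure_prob`, the same two functions apply to whatever symbols survive, so the only difference between the two outputs remains whether multiplicities are retained.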
Submitted 20 May, 2024;
originally announced June 2024.
-
ARMA Processes with Discrete-Continuous Excitation: Compressibility Beyond Sparsity
Authors:
Mohammad-Amin Charusaie,
Stefano Rini,
Arash Amini
Abstract:
Rényi Information Dimension (RID) plays a central role in quantifying the compressibility of random variables with singularities in their distribution, encompassing and extending beyond the class of sparse sources. The RID, from a high-level perspective, represents the average number of bits needed for coding the i.i.d. samples of a random variable with high precision. There are two main extensions of the RID for stochastic processes: information dimension rate (IDR) and block information dimension (BID). In addition, a more recent approach towards the compressibility of stochastic processes revolves around the concept of $ε$-achievable compression rates, which treat a random process as the limiting point of finite-dimensional random vectors and apply compressed sensing tools to these random variables. While there is limited knowledge about the interplay of the BID, the IDR, and $ε$-achievable compression rates, the values of the IDR and BID themselves are known only for very specific types of processes, namely i.i.d. sequences (i.e., discrete-domain white noise) and moving-average (MA) processes. This paper investigates the IDR and BID of discrete-time Auto-Regressive Moving-Average (ARMA) processes in general, and their relations with $ε$-achievable compression rates when the excitation noise has a discrete-continuous measure. To elaborate, this paper shows that the RID and $ε$-achievable compression rates of this type of process are equal to those of their excitation noise. In other words, the samples of such ARMA processes can be compressed as much as their sparse excitation noise, although the samples themselves are by no means sparse. The results of this paper can be used to evaluate the compressibility of various types of locally correlated data with finite or infinite memory, as they are often modelled via ARMA processes.
Submitted 10 June, 2024;
originally announced June 2024.
-
HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids
Authors:
Dyah A. M. G. Wisnu,
Stefano Rini,
Ryandhimas E. Zezario,
Hsin-Min Wang,
Yu Tsao
Abstract:
This paper introduces HAAQI-Net, a non-intrusive deep learning model for music audio quality assessment tailored for hearing aid users. Unlike traditional methods like the Hearing Aid Audio Quality Index (HAAQI), which rely on intrusive comparisons to a reference signal, HAAQI-Net offers a more accessible and efficient alternative. Using a bidirectional Long Short-Term Memory (BLSTM) architecture with attention mechanisms and features from the pre-trained BEATs model, HAAQI-Net predicts HAAQI scores directly from music audio clips and hearing loss patterns. Results show HAAQI-Net's effectiveness, with predicted scores achieving a Linear Correlation Coefficient (LCC) of 0.9368, a Spearman's Rank Correlation Coefficient (SRCC) of 0.9486, and a Mean Squared Error (MSE) of 0.0064, reducing inference time from 62.52 seconds to 2.54 seconds. Although effective, feature extraction via the large BEATs model incurs computational overhead. To address this, a knowledge distillation strategy creates a student distillBEATs model, distilling information from the teacher BEATs model during HAAQI-Net training, reducing required parameters. The distilled HAAQI-Net maintains strong performance with an LCC of 0.9071, an SRCC of 0.9307, and an MSE of 0.0091, while reducing parameters by 75.85% and inference time by 96.46%. This reduction enhances HAAQI-Net's efficiency and scalability, making it viable for real-world music audio quality assessment in hearing aid settings. This work also opens avenues for further research into optimizing deep learning models for specific applications, contributing to audio signal processing and quality assessment by providing insights into developing efficient and accurate models for practical applications in hearing aid technology.
Submitted 5 June, 2024; v1 submitted 2 January, 2024;
originally announced January 2024.
-
Coding for the unsourced A-channel with erasures: the linked loop code
Authors:
William W. Zheng,
Jamison R. Ebert,
Stefano Rini,
Jean-Francois Chamberland
Abstract:
The A-channel is a noiseless multiple access channel in which users simultaneously transmit Q-ary symbols and the receiver observes the set of transmitted symbols, but not their multiplicities. An A-channel is said to be unsourced if, additionally, users' transmissions are encoded across time using a common codebook and decoding of the transmitted messages is done without regard to the identities of the active users. An interesting variant of the unsourced A-channel is the unsourced A-channel with erasures (UACE), in which transmitted symbols are erased with a given independent and identically distributed probability. In this paper, we focus on designing a code that enables a list of transmitted codewords to be recovered despite the erasures of some of the transmitted symbols. To this end, we propose the linked-loop code (LLC), which uses parity bits to link each symbol to the previous M symbols in a tail-biting manner, i.e., the first symbols of the transmission are linked to the last ones. The decoding process occurs in two phases: the first phase decodes the codewords that do not suffer from any erasures, and the second phase attempts to recover the erased symbols using the available parities. We compare the performance of the LLC over the UACE with other codes in the literature and argue for the effectiveness of the construction. Our motivation for studying the UACE comes from its relevance in machine-type communication and coded compressed sensing.
Submitted 19 September, 2023;
originally announced December 2023.
-
Harmonic Retrieval Using Weighted Lifted-Structure Low-Rank Matrix Completion
Authors:
Mohammad Bokaei,
Saeed Razavikia,
Stefano Rini,
Arash Amini,
Hamid Behrouzi
Abstract:
In this paper, we investigate the problem of recovering the frequency components of a mixture of $K$ complex sinusoids from a random subset of $N$ equally-spaced time-domain samples. Because of the random subset, the samples are effectively non-uniform. Besides, the frequency values of each of the $K$ complex sinusoids are assumed to vary continuously within a given range.
For this problem, we propose a two-step strategy: (i) we first lift the incomplete set of uniform samples (unavailable samples are treated as missing data) into a structured matrix with missing entries, which is potentially low-rank; then (ii) we complete the matrix by solving a weighted nuclear norm minimization problem. We call the method \emph{weighted lifted-structured (WLi) low-rank matrix recovery}. Our approach can be applied to a range of matrix structures such as Hankel and double-Hankel, among others, and provides improvement over existing unweighted schemes such as EMaC and DEMaC. We provide theoretical guarantees for the proposed method, as well as numerical simulations in both noiseless and noisy settings. Both the theoretical and the numerical results confirm the superiority of the proposed approach.
Submitted 8 November, 2023;
originally announced November 2023.
-
M22: A Communication-Efficient Algorithm for Federated Learning Inspired by Rate-Distortion
Authors:
Yangyi Liu,
Stefano Rini,
Sadaf Salehkalaibar,
Jun Chen
Abstract:
In federated learning (FL), the communication constraint between the remote learners and the Parameter Server (PS) is a crucial bottleneck. For this reason, model updates must be compressed so as to minimize the loss in accuracy resulting from the communication constraint. This paper proposes the ``\emph{${\bf M}$-magnitude weighted $L_{\bf 2}$ distortion + $\bf 2$ degrees of freedom''} (M22) algorithm, a rate-distortion inspired approach to gradient compression for federated training of deep neural networks (DNNs). In particular, we propose a family of distortion measures between the original gradient and the reconstruction, which we refer to as ``$M$-magnitude weighted $L_2$'' distortion, and we assume that gradient updates follow an i.i.d. distribution -- generalized normal or Weibull, which have two degrees of freedom. The distortion measure and the gradient distribution each have one free parameter that can be fitted as a function of the iteration number. Given a choice of gradient distribution and distortion measure, we design the quantizer minimizing the expected distortion in gradient reconstruction. To measure the gradient compression performance under a communication constraint, we define the \emph{per-bit accuracy} as the optimal improvement in accuracy that one bit of communication brings to the centralized model over the training period. Using this performance measure, we systematically benchmark the choice of gradient distribution and distortion measure. We provide substantial insights on the role of these choices and argue that significant performance improvements can be attained using such a rate-distortion inspired compressor.
Submitted 22 January, 2023;
originally announced January 2023.
-
Empirical Risk Minimization with Relative Entropy Regularization
Authors:
Samir M. Perlaza,
Gaetan Bisson,
Iñaki Esnaola,
Alain Jean-Marie,
Stefano Rini
Abstract:
The empirical risk minimization (ERM) problem with relative entropy regularization (ERM-RER) is investigated under the assumption that the reference measure is a $σ$-finite measure, and not necessarily a probability measure. Under this assumption, which leads to a generalization of the ERM-RER problem allowing a larger degree of flexibility for incorporating prior knowledge, numerous relevant properties are stated. Among these properties, the solution to this problem, if it exists, is shown to be a unique probability measure, mutually absolutely continuous with the reference measure. Such a solution exhibits a probably-approximately-correct guarantee for the ERM problem independently of whether the latter possesses a solution. For a fixed dataset and under a specific condition, the empirical risk is shown to be a sub-Gaussian random variable when the models are sampled from the solution to the ERM-RER problem. The generalization capabilities of the solution to the ERM-RER problem (the Gibbs algorithm) are studied via the sensitivity of the expected empirical risk to deviations from such a solution towards alternative probability measures. Finally, an interesting connection between sensitivity, generalization error, and lautum information is established.
Submitted 8 April, 2024; v1 submitted 12 November, 2022;
originally announced November 2022.
-
Sharp asymptotics on the compression of two-layer neural networks
Authors:
Mohammad Hossein Amani,
Simone Bombari,
Marco Mondelli,
Rattana Pukdee,
Stefano Rini
Abstract:
In this paper, we study the compression of a target two-layer neural network with N nodes into a compressed network with M<N nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimize the population L_2 loss between the outputs of the target and of the compressed network, under the assumption of Gaussian inputs. By using tools from high-dimensional probability, we show that this non-convex problem can be simplified when the target network is sufficiently over-parameterized, and provide the error rate of this approximation as a function of the input dimension and N. In this mean-field limit, the simplified objective, as well as the optimal weights of the compressed network, does not depend on the realization of the target network, but only on expected scaling factors. Furthermore, for networks with ReLU activation, we conjecture that the optimum of the simplified optimization problem is achieved by taking weights on the Equiangular Tight Frame (ETF), while the scaling of the weights and the orientation of the ETF depend on the parameters of the target network. Numerical evidence is provided to support this conjecture.
Submitted 16 August, 2022; v1 submitted 17 May, 2022;
originally announced May 2022.
-
How to Attain Communication-Efficient DNN Training? Convert, Compress, Correct
Authors:
Zhong-Jing Chen,
Eduin E. Hernandez,
Yu-Chih Huang,
Stefano Rini
Abstract:
This paper introduces CO3 -- an algorithm for communication-efficient federated Deep Neural Network (DNN) training. CO3 takes its name from the three processing steps that reduce the communication load when transmitting the local DNN gradients from the remote users to the Parameter Server, namely: (i) gradient quantization through floating-point conversion, (ii) lossless compression of the quantized gradient, and (iii) quantization error correction. We carefully design each of the steps above to ensure good training performance under a constraint on the communication rate. In particular, in steps (i) and (ii), we adopt the assumption that DNN gradients are distributed according to a generalized normal distribution, which is validated numerically in the paper. For step (iii), we utilize an error feedback with memory decay mechanism to correct the quantization error introduced in step (i). We argue that the memory decay coefficient, similarly to the learning rate, can be optimally tuned to improve convergence. A rigorous convergence analysis of the proposed CO3 with SGD is provided. Moreover, with extensive simulations, we show that CO3 offers improved performance when compared with existing gradient compression schemes in the literature which employ sketching and non-uniform quantization of the local gradients.
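The idea of error feedback with memory decay mentioned in step (iii) can be sketched as follows. This is a generic illustration, not the CO3 update rule: the uniform rounding quantizer is a stand-in for the floating-point conversion of step (i), and the names `decay` and `step`, as well as the exact form of the update, are assumptions.

```python
def quantize(values, step=0.5):
    """Stand-in quantizer (assumption): round each entry to a multiple of `step`."""
    return [step * round(v / step) for v in values]


def error_feedback_step(gradient, memory, decay=0.9, step=0.5):
    """One round of error feedback with memory decay (assumed form).

    The decayed memory is added to the fresh gradient before quantization,
    and the resulting quantization error becomes the next memory.
    """
    corrected = [g + decay * m for g, m in zip(gradient, memory)]
    quantized = quantize(corrected, step)
    new_memory = [c - q for c, q in zip(corrected, quantized)]
    return quantized, new_memory


memory = [0.0, 0.0, 0.0]
for gradient in ([0.2, -0.7, 1.1], [0.2, -0.7, 1.1]):
    quantized, memory = error_feedback_step(gradient, memory)
    print(quantized, memory)
```

In this sketch a `decay` of 1 corresponds to plain error feedback and a `decay` of 0 discards past errors entirely, which is one way to read the claim that the coefficient can be tuned like a learning rate.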
Submitted 1 June, 2023; v1 submitted 18 April, 2022;
originally announced April 2022.
-
A Perspective on Neural Capacity Estimation: Viability and Reliability
Authors:
Farhad Mirkarimi,
Stefano Rini,
Nariman Farsad
Abstract:
Recently, several methods have been proposed for estimating the mutual information from sample data using deep neural networks. These estimators are referred to as neural mutual information estimators (NMIEs). NMIEs differ from other approaches as they are data-driven estimators. As such, they have the potential to perform well on a large class of capacity problems. In order to test the performance across various NMIEs, it is desirable to establish a benchmark encompassing the different challenges of capacity estimation. This is the objective of this paper. In particular, we consider three scenarios for benchmarking: (i) the classic AWGN channel, (ii) channels with continuous inputs, i.e., the optical intensity and peak-power constrained AWGN channels, and (iii) channels with a discrete output, i.e., the Poisson channel. We also consider the extension to the multi-terminal case with (iv) the AWGN and optical MAC models. We argue that benchmarking a certain NMIE across these four scenarios provides a substantive test of performance. In this paper, we study the performance of the mutual information neural estimator (MINE), the smoothed mutual information lower-bound estimator (SMILE), and the directed information neural estimator (DINE), and provide insights on the performance of other methods as well. To summarize our benchmarking results, MINE provides the most reliable performance.
Submitted 5 October, 2022; v1 submitted 22 March, 2022;
originally announced March 2022.
-
Convert, compress, correct: Three steps toward communication-efficient DNN training
Authors:
Zhong-Jing Chen,
Eduin E. Hernandez,
Yu-Chih Huang,
Stefano Rini
Abstract:
In this paper, we introduce a novel algorithm, $\mathsf{CO}_3$, for communication-efficient distributed Deep Neural Network (DNN) training. $\mathsf{CO}_3$ is a joint training/communication protocol, which encompasses three processing steps for the network gradients: (i) quantization through floating-point conversion, (ii) lossless compression, and (iii) error correction. These three components are crucial in the implementation of distributed DNN training over rate-constrained links. The interplay of these three steps in processing the DNN gradients is carefully balanced to yield a robust and high-performance scheme. The performance of the proposed scheme is investigated through numerical evaluations over CIFAR-10.
Submitted 16 March, 2022;
originally announced March 2022.
-
Coded Demixing for Unsourced Random Access
Authors:
Jamison R. Ebert,
Vamsi K. Amalladinne,
Stefano Rini,
Jean-Francois Chamberland,
Krishna R. Narayanan
Abstract:
Unsourced random access (URA) is a recently proposed multiple access paradigm tailored to the uplink channel of machine-type communication networks. By exploiting a strong connection between URA and compressed sensing, the massive multiple access problem may be cast as a compressed sensing (CS) problem, albeit one in exceedingly large dimensions. To efficiently handle the dimensionality of the problem, coded compressed sensing (CCS) has emerged as a pragmatic signal processing tool that, when applied to URA, offers good performance at low complexity. While CCS is effective at recovering a signal that is sparse with respect to a single basis, it is unable to jointly recover signals that are sparse with respect to separate bases. In this article, the CCS framework is extended to the demixing setting, yielding a novel technique called coded demixing. A generalized framework for coded demixing is presented and a low-complexity recovery algorithm based on approximate message passing (AMP) is developed. Coded demixing is applied to heterogeneous multi-class URA networks and traditional single-class networks. Its performance is analyzed and numerical simulations are presented to highlight the benefits of coded demixing.
Submitted 27 June, 2022; v1 submitted 1 March, 2022;
originally announced March 2022.
-
Two-snapshot DOA Estimation via Hankel-structured Matrix Completion
Authors:
Mohammad Bokaei,
Saeed Razavikia,
Arash Amini,
Stefano Rini
Abstract:
In this paper, we study the problem of estimating the direction of arrival (DOA) using a sparsely sampled uniform linear array (ULA). Based on an initial incomplete ULA measurement, our strategy is to choose a sparse subset of array elements for measuring the next snapshot. Then, we use a Hankel-structured matrix completion to interpolate for the missing ULA measurements. Finally, the source DOAs are estimated using a subspace method such as Prony on the fully recovered ULA. We theoretically provide a sufficient bound for the number of required samples (array elements) for perfect recovery. The numerical comparisons of the proposed method with existing techniques such as atomic-norm minimization and off-the-grid approaches confirm the superiority of the proposed method.
Submitted 21 February, 2022;
originally announced February 2022.
-
Empirical Risk Minimization with Relative Entropy Regularization: Optimality and Sensitivity Analysis
Authors:
Samir M. Perlaza,
Gaetan Bisson,
Iñaki Esnaola,
Alain Jean-Marie,
Stefano Rini
Abstract:
The optimality and sensitivity of the empirical risk minimization problem with relative entropy regularization (ERM-RER) are investigated for the case in which the reference is a sigma-finite measure instead of a probability measure. This generalization allows for a larger degree of flexibility in the incorporation of prior knowledge over the set of models. In this setting, the interplay of the regularization parameter, the reference measure, the risk function, and the empirical risk induced by the solution of the ERM-RER problem is characterized. This characterization yields necessary and sufficient conditions for the existence of a regularization parameter that achieves an arbitrarily small empirical risk with arbitrarily high probability. The sensitivity of the expected empirical risk to deviations from the solution of the ERM-RER problem is studied. The sensitivity is then used to provide upper and lower bounds on the expected empirical risk. Moreover, it is shown that the expectation of the sensitivity is upper bounded, up to a constant factor, by the square root of the lautum information between the models and the datasets.
Submitted 12 November, 2022; v1 submitted 9 February, 2022;
originally announced February 2022.
-
Lossy Gradient Compression: How Much Accuracy Can One Bit Buy?
Authors:
Sadaf Salehkalaibar,
Stefano Rini
Abstract:
In federated learning (FL), a global model is trained at a Parameter Server (PS) by aggregating model updates obtained from multiple remote learners. Generally, the communication between the remote users and the PS is rate-limited, while the transmission from the PS to the remote users is unconstrained. The FL setting gives rise to the distributed learning scenario in which the updates from the remote learners have to be compressed so as to meet communication rate constraints in the uplink transmission toward the PS. For this problem, one wishes to compress the model updates so as to minimize the loss in accuracy resulting from the compression error. In this paper, we take a rate-distortion approach to address the compressor design problem for the distributed training of deep neural networks (DNNs). In particular, we define a measure of the compression performance under communication-rate constraints -- the \emph{per-bit accuracy} -- which addresses the ultimate improvement of accuracy that a bit of communication brings to the centralized model. In order to maximize the per-bit accuracy, we consider modeling the DNN gradient updates at remote learners as a generalized normal distribution. Under this assumption on the DNN gradient distribution, we propose a class of distortion measures to aid the design of quantizers for the compression of the model updates. We argue that this family of distortion measures, which we refer to as "$M$-magnitude weighted $L_2$" norm, captures the practitioner's intuition in the choice of gradient compressor. Numerical simulations are provided to validate the proposed approach for the CIFAR-10 dataset.
Submitted 2 June, 2022; v1 submitted 6 February, 2022;
originally announced February 2022.
-
DNN gradient lossless compression: Can GenNorm be the answer?
Authors:
Zhong-Jing Chen,
Eduin E. Hernandez,
Yu-Chih Huang,
Stefano Rini
Abstract:
In this paper, the problem of optimal gradient lossless compression in Deep Neural Network (DNN) training is considered. Gradient compression is relevant in many distributed DNN training scenarios, including the recently popular federated learning (FL) scenario in which the remote users are connected to the parameter server (PS) through a noiseless but rate-limited channel. In distributed DNN training, if the underlying gradient distribution is available, classical lossless compression approaches can be used to reduce the number of bits required for communicating the gradient entries. Mean field analysis has suggested that gradient updates can be considered as independent random variables, while Laplace approximation can be used to argue that the gradient has a distribution approximating the normal (Norm) distribution in some regimes. In this paper we argue that, for some networks of practical interest, the gradient entries can be well modelled as having a generalized normal (GenNorm) distribution. We provide numerical evaluations to validate the hypothesis that GenNorm modelling provides a more accurate prediction of the DNN gradient tail distribution. Additionally, this modeling choice provides concrete improvement in terms of lossless compression of the gradients when applying classical fixed-to-variable lossless coding algorithms, such as Huffman coding, to the quantized gradient updates. This latter result indeed provides an effective compression strategy with low memory and computational complexity that has great practical relevance in distributed DNN training scenarios.
Submitted 15 November, 2021;
originally announced November 2021.
-
Neural Capacity Estimators: How Reliable Are They?
Authors:
Farhad Mirkarimi,
Stefano Rini,
Nariman Farsad
Abstract:
Recently, several methods have been proposed for estimating the mutual information from sample data using deep neural networks and without knowing the closed-form distribution of the data. This class of estimators is referred to as neural mutual information estimators. Although very promising, such techniques have yet to be rigorously benchmarked so as to establish their efficacy, ease of implementation, and stability for capacity estimation, which is a joint maximization framework. In this paper, we compare the different techniques proposed in the literature for estimating capacity and provide a practitioner perspective on their effectiveness. In particular, we study the performance of the mutual information neural estimator (MINE), the smoothed mutual information lower-bound estimator (SMILE), and the directed information neural estimator (DINE), and provide insights on InfoNCE. We evaluate these algorithms in terms of their ability to learn the input distributions that are capacity approaching for the AWGN channel, the optical intensity channel, and the peak power-constrained AWGN channel. For these scenarios, we provide insightful comments on various aspects of the training process, such as stability and sensitivity to initialization.
Submitted 18 March, 2022; v1 submitted 14 November, 2021;
originally announced November 2021.
-
Speeding-Up Back-Propagation in DNN: Approximate Outer Product with Memory
Authors:
Eduin E. Hernandez,
Stefano Rini,
Tolga M. Duman
Abstract:
In this paper, an algorithm for approximate evaluation of back-propagation in DNN training is considered, which we term Approximate Outer Product Gradient Descent with Memory (Mem-AOP-GD). The Mem-AOP-GD algorithm implements an approximation of the stochastic gradient descent by considering only a subset of the outer products involved in the matrix multiplications that encompass backpropagation. In order to correct for the inherent bias in this approximation, the algorithm retains in memory an accumulation of the outer products that are not used in the approximation. We investigate the performance of the proposed algorithm in terms of DNN training loss under two design parameters: (i) the number of outer products used for the approximation, and (ii) the policy used to select such outer products. We experimentally show that significant improvements in computational complexity as well as accuracy can indeed be obtained through Mem-AOP-GD.
Submitted 18 October, 2021;
originally announced October 2021.
-
Wireless Federated Learning with Limited Communication and Differential Privacy
Authors:
Amir Sonee,
Stefano Rini,
Yu-Chih Huang
Abstract:
This paper investigates the role of dimensionality reduction in efficient communication and differential privacy (DP) of the local datasets at the remote users for an over-the-air computation (AirComp)-based federated learning (FL) model. More precisely, we consider the FL setting in which clients are prompted to train a machine learning model by simultaneous channel-aware and limited communications with a parameter server (PS) over a Gaussian multiple-access channel (GMAC), so that transmissions sum coherently at the PS, which is globally aware of the channel coefficients. For this setting, an algorithm is proposed based on applying federated stochastic gradient descent (FedSGD) for minimizing a given loss function based on the local gradients, Johnson-Lindenstrauss (JL) random projection for reducing the dimension of the local updates, and artificial noise to further aid users' privacy. For this scheme, our results show that the local DP performance is mainly improved due to injecting noise of greater variance on each dimension while keeping the sensitivity of the projected vectors unchanged. This is while the convergence rate is slowed down compared to the case without dimensionality reduction. As the performance gain outweighs the slower convergence, the trade-off between privacy and convergence is higher, but it is shown to lessen in the high-dimensional regime, yielding almost the same trade-off with much less communication cost.
Submitted 1 June, 2021;
originally announced June 2021.
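The two client-side ingredients named in the abstract above, JL random projection and artificial noise, can be illustrated with a small standard-library Python sketch. It assumes a Gaussian projection matrix with entries of variance 1/k generated from a seed shared with the PS, plus independent Gaussian noise drawn locally; the function names, dimensions, and noise level are hypothetical, and the channel-aware AirComp transmission and the DP accounting of the paper are not modelled.

```python
import math
import random


def jl_matrix(out_dim, in_dim, seed):
    """Gaussian JL matrix with N(0, 1/out_dim) entries (seed assumed shared with the PS)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)  # makes E[||A x||^2] = ||x||^2
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(matrix, vec):
    """Matrix-vector product: reduces `vec` to len(matrix) dimensions."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]


def add_artificial_noise(vec, noise_std, rng):
    """Add i.i.d. Gaussian noise to every coordinate of the projected update."""
    return [v + rng.gauss(0.0, noise_std) for v in vec]


# Hypothetical sizes: a 50-dimensional local gradient projected to 400 dimensions
# would not save communication; the large k is only used here so that the
# norm-preservation check is tight. In practice k would be smaller than d.
d, k = 50, 400
grad_rng = random.Random(1)
local_grad = [grad_rng.gauss(0.0, 1.0) for _ in range(d)]

A = jl_matrix(k, d, seed=42)
projected = project(A, local_grad)
noisy_update = add_artificial_noise(projected, noise_std=0.1, rng=random.Random(7))

norm_ratio = sum(p * p for p in projected) / sum(g * g for g in local_grad)
print(f"squared-norm ratio after projection: {norm_ratio:.3f}")
```

The squared-norm ratio concentrates around one, which reflects the JL property the abstract relies on: the projection roughly preserves the length of the update, so noise can be calibrated per projected coordinate.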
-
Comparison-limited Vector Quantization
Authors:
Joseph Chataignon,
Stefano Rini
Abstract:
In this paper, a variation of the classic vector quantization problem is considered. In the standard formulation, a quantizer is designed to minimize the distortion between input and output when the number of reconstruction points is fixed. We consider, instead, the scenario in which the number of comparators used in quantization is fixed. More precisely, we study the case in which a vector quantizer of dimension d is comprised of k comparators, each receiving a linear combination of the inputs and producing the output value one/zero if this linear combination is above/below a certain threshold. In reconstruction, the comparators' output is mapped to a reconstruction point, chosen so as to minimize a chosen distortion measure between the quantizer input and its reconstruction. The Comparison-Limited Vector Quantization (CLVQ) problem is then defined as the problem of optimally designing the configuration of the comparators and the choice of reconstruction points so as to minimize the given distortion. In this paper, we design a numerical optimization algorithm for the CLVQ problem. This algorithm leverages combinatorial geometrical notions to describe the hyperplane arrangement induced by the configuration of the comparators. It also relies on a genetic metaheuristic to improve the selection of the quantizer initialization and avoid local minima encountered during optimization. We numerically evaluate the performance of our algorithm in the case of input distributions following uniform and Gaussian i.i.d. sources to be compressed under quadratic distortion and compare it to the classic Linde-Buzo-Gray (LBG) algorithm.
Submitted 30 May, 2021;
originally announced May 2021.
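The quantizer structure described in the abstract above (k comparators acting on linear combinations of a d-dimensional input, followed by a lookup of a reconstruction point) can be sketched in Python as follows. The comparator weights and thresholds are a hypothetical hand-picked configuration, and reconstruction points are set to cell centroids, which is the minimizer of quadratic distortion once the comparators are fixed; the optimization over comparator configurations that the paper proposes is not reproduced here.

```python
import random
from collections import defaultdict


def comparator_bits(x, weights, thresholds):
    """One bit per comparator: 1 if its linear combination of x exceeds its threshold."""
    return tuple(
        int(sum(w_i * x_i for w_i, x_i in zip(w, x)) > t)
        for w, t in zip(weights, thresholds)
    )


def fit_reconstruction_points(samples, weights, thresholds):
    """Map each observed comparator output pattern to the centroid of its cell."""
    cells = defaultdict(list)
    for x in samples:
        cells[comparator_bits(x, weights, thresholds)].append(x)
    return {
        bits: tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for bits, pts in cells.items()
    }


# Hypothetical configuration: d = 2 inputs, k = 2 comparators along the axes.
weights = [(1.0, 0.0), (0.0, 1.0)]
thresholds = [0.0, 0.0]

rng = random.Random(0)
samples = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(2000)]
codebook = fit_reconstruction_points(samples, weights, thresholds)

x = (0.7, -1.2)
bits = comparator_bits(x, weights, thresholds)
x_hat = codebook[bits]
print(bits, x_hat)
```

With k comparators the quantizer can address at most 2^k cells, but the hyperplane arrangement may induce fewer non-empty cells, which is why the codebook is keyed only by the bit patterns that actually occur.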
-
Stochastic Binning and Coded Demixing for Unsourced Random Access
Authors:
Jamison R. Ebert,
Vamsi K. Amalladinne,
Stefano Rini,
Jean-Francois Chamberland,
Krishna R. Narayanan
Abstract:
Unsourced random access is a novel communication paradigm designed for handling a large number of uncoordinated users that sporadically transmit very short messages. Under this model, coded compressed sensing (CCS) has emerged as a low-complexity scheme that exhibits good error performance. Yet, one of the challenges faced by CCS pertains to disentangling a large number of codewords present on a single factor graph. To mitigate this issue, this article introduces a modified CCS scheme whereby active devices stochastically partition themselves into groups that utilize separate sampling matrices with low cross-coherence for message transmission. At the receiver, ideas from the field of compressed demixing are employed for support recovery, and separate factor graphs are created for message disambiguation in each cluster. This reduces the number of active users on a factor graph, which improves performance significantly in typical scenarios. Indeed, coded demixing reduces the probability of error as the number of groups increases, up to a point. Findings are supported with numerical simulations.
Submitted 21 July, 2021; v1 submitted 12 April, 2021;
originally announced April 2021.
-
Hierarchical Causal Bandit
Authors:
Ruiyang Song,
Stefano Rini,
Kuang Xu
Abstract:
Causal bandit is a nascent learning model where an agent sequentially experiments in a causal network of variables, in order to identify the reward-maximizing intervention. Despite the model's wide applicability, existing analytical results are largely restricted to a parallel bandit version where all variables are mutually independent. We introduce in this work the hierarchical causal bandit model as a viable path towards understanding general causal bandits with dependent variables. The core idea is to incorporate a contextual variable that captures the interaction among all variables with direct effects. Using this hierarchical framework, we derive sharp insights on algorithmic design in causal bandits with dependent arms and obtain nearly matching regret bounds in the case of a binary context.
Submitted 6 March, 2021;
originally announced March 2021.
-
Straggler Mitigation through Unequal Error Protection for Distributed Approximate Matrix Multiplication
Authors:
Busra Tegin,
Eduin E. Hernandez,
Stefano Rini,
Tolga M. Duman
Abstract:
Large-scale machine learning and data mining methods routinely distribute computations across multiple agents to parallelize processing. The time required for the computations at the agents is affected by the availability of local resources and/or poor channel conditions, giving rise to the "straggler problem". As a remedy to this problem, we employ Unequal Error Protection (UEP) codes to obtain an approximation of the matrix product in the distributed computation setting, providing higher protection for the blocks with higher effect on the final result. We characterize the performance of the proposed approach from a theoretical perspective by bounding the expected reconstruction error for matrices with uncorrelated entries. We also apply the proposed coding strategy to the computation of the back-propagation step in the training of a Deep Neural Network (DNN) for an image classification task in the evaluation of the gradients. Our numerical experiments show that it is indeed possible to obtain significant improvements in the overall time required to achieve the DNN training convergence by producing approximations of matrix products using UEP codes in the presence of stragglers.
Submitted 27 July, 2021; v1 submitted 4 March, 2021;
originally announced March 2021.
-
Multi-Class Unsourced Random Access via Coded Demixing
Authors:
Vamsi K. Amalladinne,
Allen Hao,
Stefano Rini,
Jean-Francois Chamberland
Abstract:
Unsourced random access (URA) is a recently proposed communication paradigm attuned to machine-driven data transfers. In the original URA formulation, all the active devices share the same number of bits per packet. The scenario where several classes of devices transmit concurrently has so far received little attention. An initial solution to this problem takes the form of group successive interference cancellation, where codewords from a class of devices with more resources are recovered first, followed by the decoding of the remaining messages. This article introduces a joint iterative decoding approach rooted in approximate message passing. This framework has a concatenated coding structure borrowed from the single-class coded compressed sensing and admits a solution that offers performance improvement at little added computational complexity. Our findings point to new connections between multi-class URA and compressive demixing. The performance of the envisioned algorithm is validated through numerical simulations.
Submitted 15 February, 2021;
originally announced February 2021.
-
An Exploration of the Heterogeneous Unsourced MAC
Authors:
Allen Hao,
Stefano Rini,
Vamsi Amalladinne,
Asit Kumar Pradhan,
Jean-Francois Chamberland
Abstract:
The unsourced MAC model was originally introduced to study the communication scenario in which a number of low-complexity, low-energy devices wish to upload their respective messages to a base station. In the original problem formulation, all devices communicate using the same information rate. This may be very inefficient in certain wireless situations with varied channel conditions, power budgets, and payload requirements at the devices. This paper extends the original problem setting so as to allow for such variability. More specifically, we consider the scenario in which devices are clustered into two classes, possibly with different SNR levels or distinct payload requirements. In the cluster with higher power, devices transmit using a two-layer superposition modulation. In the cluster with lower energy, users transmit with the same base constellation as in the high power cluster. Within each layer, devices employ the same codebook. At the receiver, signal groupings are recovered using Approximate Message Passing (AMP), proceeding from the high to the low power levels using successive interference cancellation (SIC). This layered architecture is implemented using Coded Compressed Sensing (CCS) within every grouping. An outer tree code is employed to stitch fragments together across times and layers, as needed. This pragmatic approach to heterogeneous CCS is validated numerically and design guidelines are identified.
Submitted 21 November, 2020;
originally announced November 2020.
-
An efficient label-free analyte detection algorithm for time-resolved spectroscopy
Authors:
Stefano Rini,
Hirotsugu Hiramatsu
Abstract:
Time-resolved spectral techniques are an important analysis tool in many contexts, from physical chemistry to biomedicine. Customarily, the label-free detection of analytes is manually performed by experts through the aid of classic dimensionality-reduction methods, such as Principal Component Analysis (PCA) and Non-negative Matrix Factorization (NMF). This fundamental reliance on expert analysis for unknown analyte detection severely hinders the applicability and the throughput of these techniques. For this reason, in this paper, we formulate this detection problem as an unsupervised learning problem and propose a novel machine learning algorithm for label-free analyte detection. To show the effectiveness of the proposed solution, we consider the problem of detecting the amino acids in Liquid Chromatography coupled with Raman spectroscopy (LC-Raman).
Submitted 15 November, 2020;
originally announced November 2020.
-
Fairness-Oriented User Association in HetNets Using Bargaining Game Theory
Authors:
Ehsan Sadeghi,
Hamid Behroozi,
Stefano Rini
Abstract:
In this paper, the user association and resource allocation problem is investigated for a two-tier HetNet consisting of one macro Base Station (BS) and a number of pico BSs. The effectiveness of user association to BSs is evaluated in terms of fairness and load distribution. In particular, the problem of determining a fair user association is formulated as a bargaining game so that the Nash Bargaining Solution (NBS), abiding by the fairness axioms, provides an optimal and fair user association. The NBS also yields a Pareto optimal solution and leads to a proportional fair solution in the proposed HetNet model. Additionally, we introduce a novel algorithmic solution in which a new Coalition Generation Algorithm (CGA), called SINR-based CGA, is considered in order to simplify the coalition generation phase. Our simulation results show the efficiency of the proposed user association scheme in terms of fairness and load distribution among BSs and users. In particular, we compare the performance of the proposed solution with that of the throughput-oriented scheme in terms of the max-sum-rate scheme and show that the proposed solution yields comparable average data rates and overall sum rate.
Submitted 9 November, 2020;
originally announced November 2020.
-
Straggler Mitigation through Unequal Error Protection for Distributed Matrix Multiplication
Authors:
Busra Tegin,
Eduin E. Hernandez,
Stefano Rini,
Tolga M. Duman
Abstract:
Large-scale machine learning and data mining methods routinely distribute computations across multiple agents to parallelize processing. The time required for computation at the agents is affected by the availability of local resources, giving rise to the "straggler problem" in which the computation results are held back by unresponsive agents. For this problem, linear coding of the matrix sub-blocks can be used to introduce resilience toward straggling. The Parameter Server (PS) utilizes a channel code and distributes the matrices to the workers for multiplication. It then produces an approximation to the desired matrix multiplication using the results of the computations received at a given deadline. In this paper, we propose to employ Unequal Error Protection (UEP) codes to alleviate the straggler problem. The resiliency level of each sub-block is chosen according to its norm, as blocks with larger norms have higher effects on the result of the matrix multiplication. We validate the effectiveness of our scheme both theoretically and through numerical evaluations. We derive a theoretical characterization of the performance of UEP using random linear codes, and compare it to the case of equal error protection. We also apply the proposed coding strategy to the computation of the back-propagation step in the training of a Deep Neural Network (DNN), for which we investigate the fundamental trade-off between precision and the time required for the computations.
Submitted 19 March, 2021; v1 submitted 5 November, 2020;
originally announced November 2020.
-
On Sum Secure Degrees of Freedom for K-User MISO Broadcast Channel With Alternating CSIT
Authors:
Leyla Sadighi,
Sadaf Salehkalaibar,
Stefano Rini
Abstract:
In this paper, the sum secure degrees of freedom (SDoF) of the $K$-user Multiple Input/Single Output (MISO) Broadcast Channel with Confidential Messages (BCCM) and alternating Channel State Information at the Transmitter (CSIT) is investigated. In the MISO BCCM, a $K$-antenna transmitter (TX) communicates toward $K$ single-antenna receivers (RXs), so that the message for RX $k$ is kept secret from RX $j$ with $j<k$. For this model, we consider the scenario in which the CSI of the RXs from $2$ to $K$ is instantaneously known at the transmitter while the CSI of RX $1$ is known at the transmitter (i) instantaneously for half of the time and (ii) with a unit delay for the remainder of the time. We refer to this CSIT availability as \emph{alternating} CSIT. Alternating CSIT has been shown to provide synergistic gains in terms of SDoF and is thus a viable strategy to ensure secure communication by simply relying on the CSI feedback strategy. Our main contribution is the characterization of the sum SDoF for this model as $SDoF_{\rm sum}= (2K-1)/2$. Interestingly, this $SDoF_{\rm sum}$ is attained by a rather simple achievability scheme in which the TX uses artificial noise to prevent the decoding of the message of the unintended receivers at RX $1$. For simplicity, the proof for the case $K=3$ is first discussed in detail; after that, we present the results for any number of RXs.
Submitted 22 November, 2020; v1 submitted 2 November, 2020;
originally announced November 2020.
-
Decentralized optimization over noisy, rate-constrained networks: Achieving consensus by communicating differences
Authors:
Rajarshi Saha,
Stefano Rini,
Milind Rao,
Andrea Goldsmith
Abstract:
In decentralized optimization, multiple nodes in a network collaborate to minimize the sum of their local loss functions. The information exchange between nodes required for this task is often limited by network connectivity. We consider a setting in which communication between nodes is hindered by both (i) a finite rate-constraint on the signal transmitted by any node, and (ii) additive noise corrupting the signal received by any node. We propose a novel algorithm for this scenario: Decentralized Lazy Mirror Descent with Differential Exchanges (DLMD-DiffEx), which guarantees convergence of the local estimates to the optimal solution under the given communication constraints. A salient feature of DLMD-DiffEx is the introduction of additional proxy variables that are maintained by the nodes to account for the disagreement in their estimates due to channel noise and rate-constraints. Convergence to the optimal solution is attained by having nodes iteratively exchange these disagreement terms until consensus is achieved. In order to prevent noise accumulation during this exchange, DLMD-DiffEx relies on two sequences: one controlling the power of the transmitted signal, and the other determining the consensus rate. We provide clear insights on the design of these two sequences, which highlight the interplay between consensus rate and noise amplification. We investigate the performance of DLMD-DiffEx both from a theoretical perspective and through numerical evaluations.
Submitted 6 October, 2021; v1 submitted 21 October, 2020;
originally announced October 2020.
-
Efficient Federated Learning over Multiple Access Channel with Differential Privacy Constraints
Authors:
Amir Sonee,
Stefano Rini
Abstract:
In this paper, the problem of federated learning (FL) through digital communication between clients and a parameter server (PS) over a multiple access channel (MAC), also subject to differential privacy (DP) constraints, is studied. More precisely, we consider the setting in which clients in a centralized network are prompted to train a machine learning model using their local datasets. The information exchange between the clients and the PS takes place over a MAC and must also preserve the DP of the local datasets. Accordingly, the objective of the clients is to minimize the training loss subject to (i) rate constraints for reliable communication over the MAC and (ii) a DP constraint over the local datasets. For this optimization scenario, we propose a novel consensus scheme in which digital distributed stochastic gradient descent (D-DSGD) is performed by each client. To preserve DP, digital artificial noise is also added by the users to the locally quantized gradients. The performance of the scheme is evaluated in terms of the convergence rate and DP level for a given MAC capacity. The performance is optimized over the choice of the quantization levels and the artificial noise parameters. Numerical evaluations are presented to validate the performance of the proposed scheme.
Submitted 1 November, 2020; v1 submitted 15 May, 2020;
originally announced May 2020.
-
The Information & Mutual Information Ratio for Counting Image Features and Their Matches
Authors:
Ali Khajegili Mirabadi,
Stefano Rini
Abstract:
Feature extraction and description is an important topic of computer vision, as it is the starting point of a number of tasks such as image reconstruction, stitching, registration, and recognition, among many others. In this paper, two new image features are proposed: the Information Ratio (IR) and the Mutual Information Ratio (MIR). The IR is a feature of a single image, while the MIR describes features common across two or more images. We begin by introducing the IR and the MIR and motivate these features in an information theoretical context as the ratio of the self-information of an intensity level over the information contained over the pixels of the same intensity. Notably, the relationship of the IR and MIR with the image entropy and mutual information, classic information measures, is discussed. Finally, the effectiveness of these features is tested through feature extraction over the INRIA Copydays datasets and feature matching over the Oxford Affine Covariant Regions. These numerical evaluations validate the relevance of the IR and MIR in practical computer vision tasks.
Submitted 14 May, 2020;
originally announced May 2020.
-
Decentralized SGD with Over-the-Air Computation
Authors:
Emre Ozfatura,
Stefano Rini,
Deniz Gunduz
Abstract:
We study the performance of decentralized stochastic gradient descent (DSGD) in a wireless network, where the nodes collaboratively optimize an objective function using their local datasets. Unlike the conventional setting, where the nodes communicate over error-free orthogonal communication links, we assume that transmissions are prone to additive noise and interference. We first consider a point-to-point (P2P) transmission strategy, termed the OAC-P2P scheme, in which the node pairs are scheduled in an orthogonal fashion to minimize interference. Since in the DSGD framework each node requires a linear combination of the neighboring models at the consensus step, we then propose the OAC-MAC scheme, which utilizes the signal superposition property of the wireless medium to achieve over-the-air computation (OAC). For both schemes, we cast the scheduling problem as a graph coloring problem. We numerically evaluate the performance of these two schemes for the MNIST image classification task under various network conditions. We show that the OAC-MAC scheme attains better convergence performance with fewer communication rounds.
Submitted 6 March, 2020;
originally announced March 2020.
-
On the Capacity of the Oversampled Wiener Phase Noise Channel
Authors:
Luca Barletta,
Stefano Rini
Abstract:
In this paper, the capacity of the oversampled Wiener phase noise (OWPN) channel is investigated. The OWPN channel is a discrete-time point-to-point channel with a multi-sample receiver in which the channel output is affected by both additive and multiplicative noise. The additive noise is a white standard Gaussian process while the multiplicative noise is a Wiener phase noise process. This channel generalizes a number of channel models previously studied in the literature which investigate the effects of phase noise on the channel capacity, such as the Wiener phase noise channel and the non-coherent channel. We derive upper and inner bounds to the capacity of the OWPN channel: (i) an upper bound is derived through the I-MMSE relationship by bounding the Fisher information when estimating a phase noise sample given the past channel outputs and phase noise realizations, then (ii) two inner bounds are shown: one relying on coherent combining of the oversampled channel outputs and one relying on non-coherent combining of the samples. After capacity, we study the generalized degrees of freedom (GDoF) of the OWPN channel for the case in which the oversampling factor grows with the average transmit power $P$ as $P^β$ and the frequency noise variance as $P^α$. Using our new capacity bounds, we derive the GDoF region in three regimes: (i) one in which the GDoF region equals that of the classic additive white Gaussian noise channel (for $β\leq 1$), (ii) one in which the GDoF region reduces to that of the non-coherent channel (for $β\geq \min \{α,1\}$) and, finally, (iii) one in which partially-coherent combining of the over-samples is asymptotically optimal (for $2 α-1\leq β\leq 1$). Overall, our results are the first to identify the regimes in which different oversampling strategies are asymptotically optimal.
Submitted 21 January, 2020;
originally announced January 2020.
-
Compressibility Measures for Affinely Singular Random Vectors
Authors:
Mohammad-Amin Charusaie,
Arash Amini,
Stefano Rini
Abstract:
There are several ways to measure the compressibility of a random measure; they include general approaches such as using the rate-distortion curve, as well as more specific notions, such as the Rényi information dimension (RID). The RID parameter indicates the concentration of the measure around lower-dimensional subsets of the space. While the evaluation of such compressibility parameters is well-studied for continuous and discrete measures, the case of discrete-continuous measures is quite subtle. In this paper, we focus on a class of multi-dimensional random measures that have singularities on affine lower-dimensional subsets. This class of distributions naturally arises when considering linear transformation of component-wise independent discrete-continuous random variables. To measure the compressibility of such distributions, we introduce the new notion of dimensional-rate bias (DRB) which is closely related to the entropy and differential entropy in discrete and continuous cases, respectively. Similar to entropy and differential entropy, DRB is useful in evaluating the mutual information between distributions of the aforementioned type. Besides the DRB, we also evaluate the RID of these distributions. We further provide an upper-bound for the RID of multi-dimensional random measures that are obtained by Lipschitz functions of component-wise independent discrete-continuous random variables ($\mathbf{X}$). The upper-bound is shown to be achievable when the Lipschitz function is $A \mathbf{X}$, where $A$ satisfies $\mathrm{spark}(A_{m\times n}) = m+1$ (e.g., Vandermonde matrices). When considering discrete-domain moving-average processes with non-Gaussian excitation noise, the above results allow us to evaluate the block-average RID and DRB, as well as to determine a relationship between these parameters and other existing compressibility measures.
Submitted 8 March, 2022; v1 submitted 12 January, 2020;
originally announced January 2020.
-
Comparison-limited Vector Quantization
Authors:
Joseph Chataignon,
Stefano Rini
Abstract:
A variation of the classic vector quantization problem is considered, in which the analog-to-digital (A2D) conversion is not constrained by the cardinality of the output but rather by the number of comparators available for quantization. More specifically, we consider the scenario in which a vector quantizer of dimension d is composed of k comparators, each receiving a linear combination of the inputs and producing zero or one when this signal is above or below a threshold, respectively. Given a distribution of the inputs and a distortion criterion, the linear combinations and thresholds are to be configured so as to minimize the distortion between the quantizer input and its reconstruction. This vector quantizer architecture naturally arises in many A2D conversion scenarios in which the quantizer's cost and energy consumption are severely restricted. For this novel vector quantizer architecture, we propose an algorithm to determine the optimal configuration and provide the first performance evaluation for the case of uniform and Gaussian sources.
Submitted 27 June, 2019; v1 submitted 14 May, 2019;
originally announced May 2019.
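Illustration (not from the paper): the comparator-based quantizer described in the abstract above can be sketched in a few lines of Python. The function name `comparator_outputs`, the example weights and the thresholds are hypothetical; the bit convention follows the abstract's wording (zero when the linear combination is above the threshold, one otherwise). This only evaluates a given configuration and does not implement the authors' optimization algorithm.

```python
from typing import List, Sequence


def comparator_outputs(x: Sequence[float],
                       weights: Sequence[Sequence[float]],
                       thresholds: Sequence[float]) -> List[int]:
    """Map a d-dimensional input to k bits, one bit per comparator.

    Comparator i forms the linear combination sum_j weights[i][j] * x[j]
    and outputs 0 if it is above thresholds[i], 1 otherwise
    (convention taken from the abstract's wording).
    """
    if len(weights) != len(thresholds):
        raise ValueError("need exactly one threshold per comparator")
    bits = []
    for w, t in zip(weights, thresholds):
        if len(w) != len(x):
            raise ValueError("each weight vector must have the input dimension")
        s = sum(wi * xi for wi, xi in zip(w, x))
        bits.append(0 if s > t else 1)
    return bits


# Hypothetical configuration: d = 2 inputs, k = 2 comparators that
# compare each coordinate against zero.
example_bits = comparator_outputs([0.5, -1.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

Each bit pattern identifies one cell of the partition that the k hyperplanes induce on the input space, so the number of distinct reconstruction points is limited by the number of such cells rather than by 2^k alone.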
-
LDPC Coded Multiuser Shaping for the Gaussian Multiple Access Channel
Authors:
Alexios Balatsoukas-Stimming,
Stefano Rini,
Joerg Kliewer
Abstract:
The joint design of input constellation and low-density parity-check (LDPC) codes to approach the symmetric capacity of the two-user Gaussian multiple access channel is studied. More specifically, multilevel coding is employed at each user to construct a high-order input constellation and the constellations of the users are jointly designed so as to maximize the multiuser shaping gain. At the receiver, each layer of the multilevel coding is jointly decoded among users, while successive cancellation is employed across layers. The LDPC code employed by each user in each layer is designed using EXIT charts to support joint decoding among users for the prescribed per-layer rate and SNR. Numerical simulations are provided to validate the proposed constellation and LDPC code designs.
Submitted 1 May, 2019;
originally announced May 2019.
-
The Statistical Dictionary-based String Matching Problem
Authors:
M. Suri,
S. Rini
Abstract:
In the Dictionary-based String Matching (DSM) problem, a retrieval system has access to a source sequence and stores the position of a certain number of strings in a posting table. When a user inquires about the position of a string, the retrieval system, instead of searching in the source sequence directly, relies on the posting table to answer the query more efficiently. In this paper, the Statistical DSM problem is proposed as a statistical and information-theoretic formulation of the classic DSM problem in which both the source and the query have a statistical description while the strings stored in the posting sequence are described as a code. Through this formulation, we are able to define the efficiency of the retrieval system as the average cost in answering a user's query in the limit of a sufficiently long source sequence. This formulation is used to study the retrieval performance for two cases: (i) all the strings of a given length, referred to as k-grams, and (ii) prefix-free codes.
Submitted 22 November, 2018;
originally announced November 2018.
-
Distributed Convex Optimization With Limited Communications
Authors:
Milind Rao,
Stefano Rini,
Andrea Goldsmith
Abstract:
In this paper, a distributed convex optimization algorithm, termed the distributed coordinate dual averaging (DCDA) algorithm, is proposed. The DCDA algorithm addresses the scenario of a large distributed optimization problem with limited communication among nodes in the network. Currently known distributed subgradient methods, such as the distributed dual averaging or the distributed alternating direction method of multipliers algorithms, assume that nodes can exchange messages of large cardinality. Such network communication capabilities are not available in many scenarios of practical relevance. In the DCDA algorithm, on the other hand, communication of each coordinate of the optimization variable is restricted over time. For the proposed algorithm, we bound the rate of convergence under different communication protocols and network architectures. We also consider the extensions to the case of imperfect gradient knowledge and the case in which transmitted messages are corrupted by additive noise or are quantized. Relevant numerical simulations are also provided.
Submitted 29 October, 2018;
originally announced October 2018.
-
On MIMO Channel Capacity with Output Quantization Constraints
Authors:
Abbas Khalili,
Stefano Rini,
Luca Barletta,
Elza Erkip,
Yonina C. Eldar
Abstract:
The capacity of a Multiple-Input Multiple-Output (MIMO) channel in which the antenna outputs are processed by an analog linear combining network and quantized by a set of threshold quantizers is studied. The linear combining weights and quantization thresholds are selected from a set of possible configurations as a function of the channel matrix. The possible configurations of the combining network model specific analog receiver architectures, such as single antenna selection, sign quantization of the antenna outputs or linear processing of the outputs. An interesting connection between the capacity of this channel and a constrained sphere packing problem in which unit spheres are packed in a hyperplane arrangement is shown. From a high-level perspective, this follows from the fact that each threshold quantizer can be viewed as a hyperplane partitioning the transmitter signal space. Accordingly, the output of the set of quantizers corresponds to the possible regions induced by the hyperplane arrangement corresponding to the channel realization and receiver configuration. This connection provides a number of important insights into the design of quantization architectures for MIMO receivers; for instance, it shows that for a given number of quantizers, choosing configurations which induce a larger number of partitions can lead to higher rates.
Submitted 5 June, 2018;
originally announced June 2018.
-
On the Capacity of the Carbon Copy onto Dirty Paper Channel
Authors:
Stefano Rini,
Shlomo Shamai
Abstract:
The "Carbon Copy onto Dirty Paper" (CCDP) channel is the compound "writing on dirty paper" channel in which the channel output is obtained as the sum of the channel input, white Gaussian noise and a Gaussian state sequence randomly selected among a set of possible realizations. The transmitter has non-causal knowledge of the set of possible state sequences but does not know which sequence is selected to produce the channel output. We study the capacity of the CCDP channel for two scenarios: (i) the state sequences are independent and identically distributed, and (ii) the state sequences are scaled versions of the same sequence. In the first scenario, we show that a combination of superposition coding, time-sharing and Gel'fand-Pinsker binning is sufficient to approach the capacity to within three bits per channel use for any number of possible state realizations. In the second scenario, we derive the capacity to within four bits per channel use for the case of two possible state sequences. This result is extended to the CCDP channel with any number of possible state sequences under certain conditions on the scaling parameters, which we denote as the "strong fading" regime. We conclude by providing some remarks on the capacity of the CCDP channel in which the state sequences have any jointly Gaussian distribution.
Submitted 8 July, 2017;
originally announced July 2017.
-
Compress-and-Estimate Source Coding for a Vector Gaussian Source
Authors:
Ruiyang Song,
Stefano Rini,
Alon Kipnis,
Andrea Goldsmith
Abstract:
We consider the remote vector source coding problem in which a vector Gaussian source is to be estimated from noisy linear measurements. For this problem, we derive the performance of the compress-and-estimate (CE) coding scheme and compare it to the optimal performance. In the CE coding scheme, the remote encoder compresses the noisy source observations so as to minimize the local distortion measure, independent from the joint distribution between the source and the observations. In reconstruction, the decoder estimates the original source realization from the lossy-compressed noisy observations. For the CE coding in the Gaussian vector case, we show that, if the code rate is less than a threshold, then the CE coding scheme attains the same performance as the optimal coding scheme. We also introduce lower and upper bounds for the performance gap above this threshold. In addition, an example with two observations and two sources is studied to illustrate the behavior of the performance gap.
Submitted 3 July, 2017;
originally announced July 2017.
-
Capacity Outer Bound and Degrees of Freedom of Wiener Phase Noise Channels with Oversampling
Authors:
Luca Barletta,
Stefano Rini
Abstract:
The discrete-time Wiener phase noise channel with an integrate-and-dump multi-sample receiver is studied.
A novel outer bound on the capacity with an average input power constraint is derived as a function of the oversampling factor.
This outer bound yields the degrees of freedom for the scenario in which the oversampling factor grows with the transmit power $P$ as $P^α$.
The result shows, perhaps surprisingly, that the largest pre-log that can be attained with phase modulation at high signal-to-noise ratio is at most $1/4$.
Submitted 23 May, 2017;
originally announced May 2017.
-
A General Framework for Low-Resolution Receivers for MIMO Channels
Authors:
Stefano Rini,
Luca Barletta,
Yonina C. Eldar,
Elza Erkip
Abstract:
The capacity of a discrete-time multi-input multi-output (MIMO) Gaussian channel with output quantization is investigated for different receiver architectures. A general formulation of this problem is proposed in which the antenna outputs are processed by analog combiners while sign quantizers are used for analog-to-digital conversion. To exemplify this approach, four analog receiver architectures of varying generality and complexity are considered: (a) multiple antenna selection and sign quantization of the antenna outputs, (b) single antenna selection and multilevel quantization, (c) multiple antenna selection and multilevel quantization, and (d) linear combining of the antenna outputs and multilevel quantization. Achievable rates are studied as a function of the number of available sign quantizers and compared among different architectures. In particular, it is shown that architecture (a) is sufficient to attain the optimal high signal-to-noise ratio performance for a MIMO receiver in which the number of antennas is larger than the number of sign quantizers. Numerical evaluations of the average performance are presented for the case in which the channel gains are i.i.d. Gaussian.
Submitted 7 July, 2017; v1 submitted 26 February, 2017;
originally announced February 2017.
-
Capacity of Discrete-Time Wiener Phase Noise Channels to Within a Constant Gap
Authors:
Luca Barletta,
Stefano Rini
Abstract:
The capacity of the discrete-time channel affected by both additive Gaussian noise and Wiener phase noise is studied. Novel inner and outer bounds are presented, which differ by at most $6.65$ bits per channel use for all channel parameters. The capacity of this model can be subdivided into three regimes: (i) for large values of the frequency noise variance, the channel behaves similarly to a channel with circularly uniform i.i.d. phase noise; (ii) when the frequency noise variance is small, the effects of the additive noise dominate over those of the phase noise, while (iii) for intermediate values of the frequency noise variance, the transmission rate over the phase modulation channel has to be reduced due to the presence of phase noise.
Submitted 18 January, 2017;
originally announced January 2017.
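Illustration (not from the paper): a minimal simulation sketch of a discrete-time channel with additive Gaussian noise and Wiener phase noise, as described in the abstract above. It assumes the commonly used model in which output k equals the input times exp(j*theta_k) plus complex Gaussian noise, with theta_k a random walk whose Gaussian increments have the frequency noise variance. The function name, the uniform initial phase, the unit default noise variance and the seed handling are all assumptions made for this example.

```python
import cmath
import math
import random
from typing import List, Sequence


def wiener_phase_noise_channel(x: Sequence[complex],
                               freq_noise_var: float,
                               awgn_var: float = 1.0,
                               seed: int = 0) -> List[complex]:
    """Pass the input symbols through an assumed Wiener phase noise channel."""
    rng = random.Random(seed)
    # Assumption: the initial phase is uniform on [0, 2*pi).
    theta = rng.uniform(0.0, 2.0 * math.pi)
    step_std = math.sqrt(freq_noise_var)
    # Circularly symmetric noise: half the variance in each real dimension.
    noise_std = math.sqrt(awgn_var / 2.0)
    y = []
    for xk in x:
        w = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        y.append(xk * cmath.exp(1j * theta) + w)
        # Wiener process: the phase accumulates independent Gaussian increments.
        theta += rng.gauss(0.0, step_std)
    return y


# Hypothetical usage: three unit-energy symbols, small frequency noise.
outputs = wiener_phase_noise_channel([1 + 0j, 1j, -1 + 0j], freq_noise_var=0.01)
```

Setting `freq_noise_var` to zero leaves a single unknown constant phase, while a large value makes consecutive phases nearly independent, which loosely mirrors the small and large frequency-noise regimes named in the abstract.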
-
On Capacity of the Writing onto Fast Fading Dirt Channel
Authors:
Stefano Rini,
Shlomo Shamai
Abstract:
The "Writing onto Fast Fading Dirt" (WFFD) channel is investigated to study the effects of partial channel knowledge on the capacity of the "writing on dirty paper" channel. The WFFD channel is the Gel'fand-Pinsker channel in which the output is obtained as the sum of the input, white Gaussian noise and a fading-times-state term. The fading-times-state term is equal to the element-wise product of the channel state sequence, known only at the transmitter, and a fast fading process, known only at the receiver. We consider the case of Gaussian distributed channel states and derive an approximate characterization of capacity for different classes of fading distributions, both continuous and discrete. In particular, we prove that if the fading distribution concentrates in a sufficiently small interval, then capacity is approximately equal to the AWGN capacity times the probability of this interval. We also show that there exists a class of fading distributions for which having the transmitter treat the fading-times-state term as additional noise closely approaches capacity. Although a closed-form expression of the capacity of the general WFFD channel remains unknown, our results show that the presence of fading can severely reduce the usefulness of channel state knowledge at the transmitter.
Submitted 7 July, 2017; v1 submitted 20 June, 2016;
originally announced June 2016.
-
Optimal Rate Allocation in Mismatched Multiterminal Source Coding
Authors:
Ruiyang Song,
Stefano Rini,
Alon Kipnis,
Andrea J. Goldsmith
Abstract:
We consider a multiterminal source coding problem in which a source is estimated at a central processing unit from lossy-compressed remote observations. Each lossy-encoded observation is produced by a remote sensor which obtains a noisy version of the source and compresses this observation minimizing a local distortion measure which depends only on the marginal distribution of its observation. The central node, on the other hand, has knowledge of the joint distribution of the source and all the observations and produces the source estimate which minimizes a different distortion measure between the source and its reconstruction. In this correspondence, we investigate the problem of optimally choosing the rate of each lossy-compressed remote estimate so as to minimize the distortion at the central processing unit subject to a bound on the overall communication rate between the remote sensors and the central unit. We focus, in particular, on two models of practical relevance: the case of a Gaussian source observed in additive Gaussian noise and reconstructed under quadratic distortion, and the case of a binary source observed in bit-flipping noise and reconstructed under Hamming distortion. In both scenarios we show that there exist regimes under which having more remote encoders does not reduce the source distortion: in other words, having fewer, high-quality remote estimates provides a smaller distortion than having more, lower-quality estimates.
Submitted 12 May, 2016;
originally announced May 2016.
-
The Carbon Copy onto Dirty Paper Channel with Statistically Equivalent States
Authors:
Stefano Rini,
Shlomo Shamai Shitz
Abstract:
Costa's "writing on dirty paper" capacity result establishes that full state pre-cancellation can be attained in the Gelfand-Pinsker channel with additive state and additive Gaussian noise. The "carbon copy onto dirty paper" channel is the extension of Costa's model to the compound setting: M receivers each observe the sum of the channel input, Gaussian noise and one of M Gaussian state sequences and attempt to decode the same common message. The state sequences are all non-causally known at the transmitter, which attempts to simultaneously pre-code its transmission against the channel state affecting each output. In this correspondence we derive the capacity to within 2.25 bits per channel use of the "carbon copy onto dirty paper" channel in which the state sequences are statistically equivalent, having the same variance and the same pairwise correlation. For this channel, capacity is approached by letting the channel input be the superposition of two codewords: a base codeword, simultaneously decoded at each user, and a top codeword which is pre-coded against the state realization at each user for a portion 1/M of the time. The outer bound relies on a recursive bounding in which incremental side information is provided at each receiver. This result represents a significant first step toward determining the capacity of the most general "carbon copy onto dirty paper" channel in which state sequences appearing in the different channel outputs have any jointly Gaussian distribution.
Submitted 5 February, 2016;
originally announced February 2016.
-
On the Capacity of the Dirty Paper Channel with Fast Fading and Discrete Channel States
Authors:
Stefano Rini,
Shlomo Shamai Shitz
Abstract:
The "writing on dirty paper" capacity result crucially depends on perfect channel knowledge at the transmitter, as the presence of even a small uncertainty in the channel realization gravely hampers the ability of the transmitter to pre-code its transmission against the channel state. This is particularly disappointing as it implies that interference pre-coding in practical systems is effective only when the channel estimates at the users have very high precision, a condition which is generally unattainable in wireless environments. In this paper we show that substantial improvements are possible when the state sequence is drawn from a discrete distribution, such as a constrained input constellation, for which state decoding can be approximately optimal. We consider the "writing on dirty paper" channel in which the state sequence is multiplied by a fast fading process and derive conditions on the fading and state distributions for which state decoding closely approaches capacity. These conditions intuitively relate to the ability of the receiver to correctly identify both the input and the state realization despite the uncertainty introduced by fading.
Submitted 5 February, 2016;
originally announced February 2016.
-
The Rate-Distortion Risk in Estimation from Compressed Data
Authors:
Alon Kipnis,
Stefano Rini,
Andrea J. Goldsmith
Abstract:
Consider the problem of estimating a latent signal from a lossy compressed version of the data when the compressor is agnostic to the relation between the signal and the data. This situation arises in a host of modern applications when data is transmitted or stored prior to determining the downstream inference task. Given a bitrate constraint and a distortion measure between the data and its compressed version, let us consider the joint distribution achieving Shannon's rate-distortion (RD) function. Given an estimator and a loss function associated with the downstream inference task, define the rate-distortion risk as the expected loss under the RD-achieving distribution. We provide general conditions under which the operational risk in estimating from the compressed data is asymptotically equivalent to the RD risk. The main theoretical tools to prove this equivalence are transportation-cost inequalities in conjunction with properties of compression codes achieving Shannon's RD function. Whenever such equivalence holds, a recipe for designing estimators from datasets undergoing lossy compression without specifying the actual compression technique emerges: design the estimator to minimize the RD risk. Our conditions simplify in the special cases of discrete memoryless or multivariate normal data. For these scenarios, we derive explicit expressions for the RD risk of several estimators and compare them to the optimal source coding performance associated with full knowledge of the relation between the latent signal and the data.
Submitted 10 January, 2021; v1 submitted 5 February, 2016;
originally announced February 2016.