Search | arXiv e-print repository

Coding for the unsourced B-channel with erasures: enhancing the linked loop code

Authors: William W. Zheng, Jamison R. Ebert, Stefano Rini, Jean-Francois Chamberland

Abstract: In [1], the linked loop code (LLC) is presented as a promising code for the unsourced A-channel with erasures (UACE). The UACE is an unsourced multiple access channel in which active users' transmitted symbols are erased with a given probability and the channel output is obtained as the union of the non-erased symbols. In this paper, we extend the UACE channel model to the unsourced B-channel with erasures (UBCE). The UBCE differs from the UACE in that the channel output is the multiset union, or bag union, of the non-erased input symbols. In other words, the UBCE preserves the symbol multiplicity of the channel output while the UACE does not. Both the UACE and UBCE find applications in modeling aspects of unsourced random access. The LLC from [1] is enhanced and shown to outperform the tree code over the UBCE. Findings are supported by numerical simulations.
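The difference between the two channel outputs described in this abstract can be illustrated with a minimal sketch. The function names and the explicit erasure mask are assumptions made for illustration; the paper itself draws erasures at random with a given probability.

```python
from collections import Counter

def uace_output(symbols, erased):
    """UACE: set union of the non-erased symbols (multiplicities are lost)."""
    return {s for s, e in zip(symbols, erased) if not e}

def ubce_output(symbols, erased):
    """UBCE: multiset (bag) union of the non-erased symbols (multiplicities are kept)."""
    return Counter(s for s, e in zip(symbols, erased) if not e)

# Four active users transmit one symbol each in a given slot; the last one is erased.
symbols = [3, 5, 3, 7]
erased = [False, False, False, True]

print(uace_output(symbols, erased))  # symbols 3 and 5, without counts
print(ubce_output(symbols, erased))  # symbol 3 seen twice, symbol 5 once
```

In this toy slot, the UBCE output reveals that two users sent symbol 3, while the UACE output only reveals that symbol 3 was sent at least once.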

Submitted 20 May, 2024; originally announced June 2024.

Comments: 5 pages, 2 figures, accepted by ICASSP 2024

arXiv:2406.06349 [pdf, other]

ARMA Processes with Discrete-Continuous Excitation: Compressibility Beyond Sparsity

Authors: Mohammad-Amin Charusaie, Stefano Rini, Arash Amini

Abstract: Rényi Information Dimension (RID) plays a central role in quantifying the compressibility of random variables with singularities in their distribution, encompassing and extending beyond the class of sparse sources. The RID, from a high-level perspective, represents the average number of bits needed for coding the i.i.d. samples of a random variable with high precision. There are two main extensions of the RID for stochastic processes: information dimension rate (IDR) and block information dimension (BID). In addition, a more recent approach towards the compressibility of stochastic processes revolves around the concept of $ε$-achievable compression rates, which treat a random process as the limiting point of finite-dimensional random vectors and apply the compressed sensing tools on these random variables. While there is limited knowledge about the interplay of the BID, the IDR, and $ε$-achievable compression rates, the values of the IDR and BID themselves are known only for very specific types of processes, namely i.i.d. sequences (i.e., discrete-domain white noise) and moving-average (MA) processes. This paper investigates the IDR and BID of discrete-time Auto-Regressive Moving-Average (ARMA) processes in general, and their relations with $ε$-achievable compression rates when the excitation noise has a discrete-continuous measure. To elaborate, this paper shows that the RID and $ε$-achievable compression rates of this type of process are equal to those of their excitation noise. In other words, the samples of such ARMA processes can be compressed as much as their sparse excitation noise, although the samples themselves are by no means sparse. The results of this paper can be used to evaluate the compressibility of various types of locally correlated data with finite or infinite memory, as they are often modelled via ARMA processes.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2401.01145 [pdf, other]

HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids

Authors: Dyah A. M. G. Wisnu, Stefano Rini, Ryandhimas E. Zezario, Hsin-Min Wang, Yu Tsao

Abstract: This paper introduces HAAQI-Net, a non-intrusive deep learning model for music audio quality assessment tailored for hearing aid users. Unlike traditional methods like the Hearing Aid Audio Quality Index (HAAQI), which rely on intrusive comparisons to a reference signal, HAAQI-Net offers a more accessible and efficient alternative. Using a bidirectional Long Short-Term Memory (BLSTM) architecture with attention mechanisms and features from the pre-trained BEATs model, HAAQI-Net predicts HAAQI scores directly from music audio clips and hearing loss patterns. Results show HAAQI-Net's effectiveness, with predicted scores achieving a Linear Correlation Coefficient (LCC) of 0.9368, a Spearman's Rank Correlation Coefficient (SRCC) of 0.9486, and a Mean Squared Error (MSE) of 0.0064, reducing inference time from 62.52 seconds to 2.54 seconds. Although effective, feature extraction via the large BEATs model incurs computational overhead. To address this, a knowledge distillation strategy creates a student distillBEATs model, distilling information from the teacher BEATs model during HAAQI-Net training, reducing required parameters. The distilled HAAQI-Net maintains strong performance with an LCC of 0.9071, an SRCC of 0.9307, and an MSE of 0.0091, while reducing parameters by 75.85% and inference time by 96.46%. This reduction enhances HAAQI-Net's efficiency and scalability, making it viable for real-world music audio quality assessment in hearing aid settings. This work also opens avenues for further research into optimizing deep learning models for specific applications, contributing to audio signal processing and quality assessment by providing insights into developing efficient and accurate models for practical applications in hearing aid technology.

Submitted 5 June, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

arXiv:2312.02160 [pdf, ps, other]

Coding for the unsourced A-channel with erasures: the linked loop code

Authors: William W. Zheng, Jamison R. Ebert, Stefano Rini, Jean-Francois Chamberland

Abstract: The A-channel is a noiseless multiple access channel in which users simultaneously transmit Q-ary symbols and the receiver observes the set of transmitted symbols, but not their multiplicities. An A-channel is said to be unsourced if, additionally, users' transmissions are encoded across time using a common codebook and decoding of the transmitted messages is done without regard to the identities of the active users. An interesting variant of the unsourced A-channel is the unsourced A-channel with erasures (UACE), in which transmitted symbols are erased with a given independent and identically distributed probability. In this paper, we focus on designing a code that enables a list of transmitted codewords to be recovered despite the erasures of some of the transmitted symbols. To this end, we propose the linked-loop code (LLC), which uses parity bits to link each symbol to the previous M symbols in a tail-biting manner, i.e., the first symbols of the transmission are linked to the last ones. The decoding process occurs in two phases: the first phase decodes the codewords that do not suffer from any erasures, and the second phase attempts to recover the erased symbols using the available parities. We compare the performance of the LLC over the UACE with other codes in the literature and argue for the effectiveness of the construction. Our motivation for studying the UACE comes from its relevance in machine-type communication and coded compressed sensing.
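The tail-biting linkage mentioned in this abstract can be pictured with a toy sketch. The abstract does not specify how the parity bits are computed, so the XOR parity below is purely a hypothetical stand-in, and the function names are illustrative only; the sketch shows only how each section refers to the previous M sections with wrap-around.

```python
def linked_sections(i, num_sections, memory):
    """Indices of the `memory` sections preceding section i, wrapping around (tail-biting)."""
    return [(i - k) % num_sections for k in range(1, memory + 1)]

def toy_parity(info, i, memory):
    """Hypothetical parity for section i: XOR of the info symbols of the linked sections."""
    parity = 0
    for j in linked_sections(i, len(info), memory):
        parity ^= info[j]
    return parity

# Four sections of 4-bit information symbols, each linked to the previous M = 2 sections.
info = [0b1010, 0b0110, 0b0001, 0b1111]
print(linked_sections(0, len(info), 2))  # the first section links back to the last two
print(toy_parity(info, 0, 2))
```

Because the indices wrap around, the first sections are protected by the last ones, which is the loop structure the abstract refers to.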

Submitted 19 September, 2023; originally announced December 2023.

Comments: 5 pages, 4 figures, to be published in the 31st European Signal Processing Conference, EUSIPCO 2023

arXiv:2311.05003 [pdf, ps, other]

Harmonic Retrieval Using Weighted Lifted-Structure Low-Rank Matrix Completion

Authors: Mohammad Bokaei, Saeed Razavikia, Stefano Rini, Arash Amini, Hamid Behrouzi

Abstract: In this paper, we investigate the problem of recovering the frequency components of a mixture of $K$ complex sinusoids from a random subset of $N$ equally-spaced time-domain samples. Because of the random subset, the samples are effectively non-uniform. Besides, the frequency values of each of the $K$ complex sinusoids are assumed to vary continuously within a given range. For this problem, we propose a two-step strategy: (i) we first lift the incomplete set of uniform samples (unavailable samples are treated as missing data) into a structured matrix with missing entries, which is potentially low-rank; then (ii) we complete the matrix using a weighted nuclear-norm minimization problem. We call the method weighted lifted-structured (WLi) low-rank matrix recovery. Our approach can be applied to a range of matrix structures such as Hankel and double-Hankel, among others, and provides improvement over the unweighted existing schemes such as EMaC and DEMaC. We provide theoretical guarantees for the proposed method, as well as numerical simulations in both noiseless and noisy settings. Both the theoretical and the numerical results confirm the superiority of the proposed approach.
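Step (i) of the strategy in this abstract, lifting an incomplete sample vector into a structured matrix with missing entries, can be sketched for the Hankel case. This is an illustrative sketch only: the function name, the use of `None` for missing samples, and the choice of row count are assumptions, and the weighted completion step (ii) is not shown.

```python
def hankel_lift(samples, num_rows):
    """Lift a length-n sequence into a Hankel matrix H with H[i][j] = samples[i + j].

    Missing samples are passed as None and simply propagate to every
    anti-diagonal position they occupy.
    """
    n = len(samples)
    if not 1 <= num_rows <= n:
        raise ValueError("num_rows must be between 1 and len(samples)")
    num_cols = n - num_rows + 1
    return [[samples[i + j] for j in range(num_cols)] for i in range(num_rows)]

# Six uniform sample positions, two of which were not observed.
samples = [1.0, None, 3.0, 4.0, None, 6.0]
H = hankel_lift(samples, 3)
for row in H:
    print(row)
```

Each missing sample appears along a whole anti-diagonal of the lifted matrix, so a completion method has to fill in those anti-diagonals consistently.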

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2301.09269 [pdf, other]

M22: A Communication-Efficient Algorithm for Federated Learning Inspired by Rate-Distortion

Authors: Yangyi Liu, Stefano Rini, Sadaf Salehkalaibar, Jun Chen

Abstract: In federated learning (FL), the communication constraint between the remote learners and the Parameter Server (PS) is a crucial bottleneck. For this reason, model updates must be compressed so as to minimize the loss in accuracy resulting from the communication constraint. This paper proposes the "$M$-magnitude weighted $L_2$ distortion + 2 degrees of freedom" (M22) algorithm, a rate-distortion inspired approach to gradient compression for federated training of deep neural networks (DNNs). In particular, we propose a family of distortion measures between the original gradient and the reconstruction, which we refer to as "$M$-magnitude weighted $L_2$" distortion, and we assume that gradient updates follow an i.i.d. distribution -- generalized normal or Weibull, which have two degrees of freedom. In both the distortion measure and the gradient distribution, there is one free parameter that can be fitted as a function of the iteration number. Given a choice of gradient distribution and distortion measure, we design the quantizer minimizing the expected distortion in gradient reconstruction. To measure the gradient compression performance under a communication constraint, we define the per-bit accuracy as the optimal improvement in accuracy that one bit of communication brings to the centralized model over the training period. Using this performance measure, we systematically benchmark the choice of gradient distribution and distortion measure. We provide substantial insights on the role of these choices and argue that significant performance improvements can be attained using such a rate-distortion inspired compressor.

Submitted 22 January, 2023; originally announced January 2023.

Comments: arXiv admin note: text overlap with arXiv:2202.02812

arXiv:2211.06617 [pdf, other]

doi 10.1109/TIT.2024.3365728

Empirical Risk Minimization with Relative Entropy Regularization

Authors: Samir M. Perlaza, Gaetan Bisson, Iñaki Esnaola, Alain Jean-Marie, Stefano Rini

Abstract: The empirical risk minimization (ERM) problem with relative entropy regularization (ERM-RER) is investigated under the assumption that the reference measure is a $σ$-finite measure, and not necessarily a probability measure. Under this assumption, which leads to a generalization of the ERM-RER problem allowing a larger degree of flexibility for incorporating prior knowledge, numerous relevant properties are stated. Among these properties, the solution to this problem, if it exists, is shown to be a unique probability measure, mutually absolutely continuous with the reference measure. Such a solution exhibits a probably-approximately-correct guarantee for the ERM problem independently of whether the latter possesses a solution. For a fixed dataset and under a specific condition, the empirical risk is shown to be a sub-Gaussian random variable when the models are sampled from the solution to the ERM-RER problem. The generalization capabilities of the solution to the ERM-RER problem (the Gibbs algorithm) are studied via the sensitivity of the expected empirical risk to deviations from such a solution towards alternative probability measures. Finally, an interesting connection between sensitivity, generalization error, and lautum information is established.

Submitted 8 April, 2024; v1 submitted 12 November, 2022; originally announced November 2022.

Comments: Appears in IEEE Transactions on Information Theory: Submitted June 2023. Revised in October 2023. Accepted January 2024. CameraReady February 2024. Also available as: Research Report, INRIA, No. RR-9454, Centre Inria d'Université Côte d'Azur, Sophia Antipolis, France, Feb., 2022. Last version: Version 7

Report number: RR-9454

arXiv:2205.08199 [pdf, ps, other]

Sharp asymptotics on the compression of two-layer neural networks

Authors: Mohammad Hossein Amani, Simone Bombari, Marco Mondelli, Rattana Pukdee, Stefano Rini

Abstract: In this paper, we study the compression of a target two-layer neural network with N nodes into a compressed network with M<N nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimize the population L_2 loss between the outputs of the target and of the compressed network, under the assumption of Gaussian inputs. By using tools from high-dimensional probability, we show that this non-convex problem can be simplified when the target network is sufficiently over-parameterized, and provide the error rate of this approximation as a function of the input dimension and N. In this mean-field limit, the simplified objective, as well as the optimal weights of the compressed network, does not depend on the realization of the target network, but only on expected scaling factors. Furthermore, for networks with ReLU activation, we conjecture that the optimum of the simplified optimization problem is achieved by taking weights on the Equiangular Tight Frame (ETF), while the scaling of the weights and the orientation of the ETF depend on the parameters of the target network. Numerical evidence is provided to support this conjecture.

Submitted 16 August, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

arXiv:2204.08211 [pdf, ps, other]

How to Attain Communication-Efficient DNN Training? Convert, Compress, Correct

Authors: Zhong-Jing Chen, Eduin E. Hernandez, Yu-Chih Huang, Stefano Rini

Abstract: This paper introduces CO3 -- an algorithm for communication-efficient federated Deep Neural Network (DNN) training. CO3 takes its name from the three processing steps that reduce the communication load when transmitting the local DNN gradients from the remote users to the Parameter Server. Namely: (i) gradient quantization through floating-point conversion, (ii) lossless compression of the quantized gradient, and (iii) quantization error correction. We carefully design each of the steps above to ensure good training performance under a constraint on the communication rate. In particular, in steps (i) and (ii), we adopt the assumption that DNN gradients are distributed according to a generalized normal distribution, which is validated numerically in the paper. For step (iii), we utilize an error feedback with memory decay mechanism to correct the quantization error introduced in step (i). We argue that the memory decay coefficient, similarly to the learning rate, can be optimally tuned to improve convergence. A rigorous convergence analysis of the proposed CO3 with SGD is provided. Moreover, with extensive simulations, we show that CO3 offers improved performance when compared with existing gradient compression schemes in the literature which employ sketching and non-uniform quantization of the local gradients.
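The error feedback with memory decay named in step (iii) can be sketched generically as follows. This is not the paper's implementation: the uniform rounding quantizer stands in for the floating-point conversion of step (i), and the update rule, the names `quantize` and `error_feedback_step`, and the values of `decay` and `step` are assumptions chosen for illustration.

```python
def quantize(values, step=0.5):
    """Stand-in quantizer: round each entry to the nearest multiple of `step`."""
    return [step * round(v / step) for v in values]

def error_feedback_step(grad, memory, decay=0.9, step=0.5):
    """One round of error feedback with a decayed memory of past quantization errors."""
    # Add back the (decayed) error left over from previous rounds.
    corrected = [g + decay * m for g, m in zip(grad, memory)]
    sent = quantize(corrected, step)
    # Store what the quantizer dropped so it can be compensated later.
    new_memory = [c - s for c, s in zip(corrected, sent)]
    return sent, new_memory

grad = [0.2, -0.7]
memory = [0.0, 0.0]
sent1, memory = error_feedback_step(grad, memory)
sent2, memory = error_feedback_step(grad, memory)
print(sent1, sent2)
```

In this toy run the first entry is quantized to zero in the first round, but the stored error pushes it over the quantization threshold in the second round, so the small gradient component is eventually transmitted rather than lost.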

Submitted 1 June, 2023; v1 submitted 18 April, 2022; originally announced April 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2203.09044

arXiv:2203.11793 [pdf, other]

A Perspective on Neural Capacity Estimation: Viability and Reliability

Authors: Farhad Mirkarimi, Stefano Rini, Nariman Farsad

Abstract: Recently, several methods have been proposed for estimating the mutual information from sample data using deep neural networks. These estimators are referred to as neural mutual information estimators (NMIEs). NMIEs differ from other approaches as they are data-driven estimators. As such, they have the potential to perform well on a large class of capacity problems. In order to test the performance across various NMIEs, it is desirable to establish a benchmark encompassing the different challenges of capacity estimation. This is the objective of this paper. In particular, we consider three scenarios for benchmarking: (i) the classic AWGN channel, (ii) channels with continuous inputs, i.e., the optical intensity and peak-power constrained AWGN channels, and (iii) channels with a discrete output, i.e., the Poisson channel. We also consider the extension to the multi-terminal case with (iv) the AWGN and optical MAC models. We argue that benchmarking a certain NMIE across these four scenarios provides a substantive test of performance. In this paper we study the performance of the mutual information neural estimator (MINE), the smoothed mutual information lower-bound estimator (SMILE), and the directed information neural estimator (DINE), and provide insights on the performance of other methods as well. To summarize our benchmarking results, MINE provides the most reliable performance.

Submitted 5 October, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: 33 pages, 9 figures, under revision for possible journal publication

arXiv:2203.09044 [pdf, ps, other]

Convert, compress, correct: Three steps toward communication-efficient DNN training

Authors: Zhong-Jing Chen, Eduin E. Hernandez, Yu-Chih Huang, Stefano Rini

Abstract: In this paper, we introduce a novel algorithm, $\mathsf{CO}_3$, for communication-efficient distributed Deep Neural Network (DNN) training. $\mathsf{CO}_3$ is a joint training/communication protocol, which encompasses three processing steps for the network gradients: (i) quantization through floating-point conversion, (ii) lossless compression, and (iii) error correction. These three components are crucial in the implementation of distributed DNN training over rate-constrained links. The interplay of these three steps in processing the DNN gradients is carefully balanced to yield a robust and high-performance scheme. The performance of the proposed scheme is investigated through numerical evaluations over CIFAR-10.

Submitted 16 March, 2022; originally announced March 2022.

arXiv:2203.00239 [pdf, ps, other]

doi 10.1109/TSP.2022.3182224

Coded Demixing for Unsourced Random Access

Authors: Jamison R. Ebert, Vamsi K. Amalladinne, Stefano Rini, Jean-Francois Chamberland, Krishna R. Narayanan

Abstract: Unsourced random access (URA) is a recently proposed multiple access paradigm tailored to the uplink channel of machine-type communication networks. By exploiting a strong connection between URA and compressed sensing, the massive multiple access problem may be cast as a compressed sensing (CS) problem, albeit one in exceedingly large dimensions. To efficiently handle the dimensionality of the problem, coded compressed sensing (CCS) has emerged as a pragmatic signal processing tool that, when applied to URA, offers good performance at low complexity. While CCS is effective at recovering a signal that is sparse with respect to a single basis, it is unable to jointly recover signals that are sparse with respect to separate bases. In this article, the CCS framework is extended to the demixing setting, yielding a novel technique called coded demixing. A generalized framework for coded demixing is presented and a low-complexity recovery algorithm based on approximate message passing (AMP) is developed. Coded demixing is applied to heterogeneous multi-class URA networks and traditional single-class networks. Its performance is analyzed and numerical simulations are presented to highlight the benefits of coded demixing.

Submitted 27 June, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: 1053-587X Copyright 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information

Journal ref: IEEE Transactions on Signal Processing, vol. 70, pp. 2972-2984, 2022

arXiv:2202.10148 [pdf, other]

Two-snapshot DOA Estimation via Hankel-structured Matrix Completion

Authors: Mohammad Bokaei, Saeed Razavikia, Arash Amini, Stefano Rini

Abstract: In this paper, we study the problem of estimating the direction of arrival (DOA) using a sparsely sampled uniform linear array (ULA). Based on an initial incomplete ULA measurement, our strategy is to choose a sparse subset of array elements for measuring the next snapshot. Then, we use a Hankel-structured matrix completion to interpolate for the missing ULA measurements. Finally, the source DOAs are estimated using a subspace method such as Prony on the fully recovered ULA. We theoretically provide a sufficient bound for the number of required samples (array elements) for perfect recovery. The numerical comparisons of the proposed method with existing techniques such as atomic-norm minimization and off-the-grid approaches confirm the superiority of the proposed method.

Submitted 21 February, 2022; originally announced February 2022.

arXiv:2202.04385 [pdf, ps, other]

Empirical Risk Minimization with Relative Entropy Regularization: Optimality and Sensitivity Analysis

Authors: Samir M. Perlaza, Gaetan Bisson, Iñaki Esnaola, Alain Jean-Marie, Stefano Rini

Abstract: The optimality and sensitivity of the empirical risk minimization problem with relative entropy regularization (ERM-RER) are investigated for the case in which the reference is a sigma-finite measure instead of a probability measure. This generalization allows for a larger degree of flexibility in the incorporation of prior knowledge over the set of models. In this setting, the interplay of the regularization parameter, the reference measure, the risk function, and the empirical risk induced by the solution of the ERM-RER problem is characterized. This characterization yields necessary and sufficient conditions for the existence of a regularization parameter that achieves an arbitrarily small empirical risk with arbitrarily high probability. The sensitivity of the expected empirical risk to deviations from the solution of the ERM-RER problem is studied. The sensitivity is then used to provide upper and lower bounds on the expected empirical risk. Moreover, it is shown that the expectation of the sensitivity is upper bounded, up to a constant factor, by the square root of the lautum information between the models and the datasets.

Submitted 12 November, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

Comments: In Proc. IEEE International Symposium on Information Theory (ISIT), Aalto, Finland, Jul., 2022

arXiv:2202.02812 [pdf, ps, other]

Lossy Gradient Compression: How Much Accuracy Can One Bit Buy?

Authors: Sadaf Salehkalaibar, Stefano Rini

Abstract: In federated learning (FL), a global model is trained at a Parameter Server (PS) by aggregating model updates obtained from multiple remote learners. Generally, the communication between the remote users and the PS is rate-limited, while the transmission from the PS to the remote users is unconstrained. The FL setting gives rise to the distributed learning scenario in which the updates from the remote learners have to be compressed so as to meet communication rate constraints in the uplink transmission toward the PS. For this problem, one wishes to compress the model updates so as to minimize the loss in accuracy resulting from the compression error. In this paper, we take a rate-distortion approach to address the compressor design problem for the distributed training of deep neural networks (DNNs). In particular, we define a measure of the compression performance under communication-rate constraints -- the per-bit accuracy -- which addresses the ultimate improvement of accuracy that a bit of communication brings to the centralized model. In order to maximize the per-bit accuracy, we consider modeling the DNN gradient updates at remote learners as a generalized normal distribution. Under this assumption on the DNN gradient distribution, we propose a class of distortion measures to aid the design of quantizers for the compression of the model updates. We argue that this family of distortion measures, which we refer to as "$M$-magnitude weighted $L_2$" norm, captures the practitioner's intuition in the choice of gradient compressor. Numerical simulations are provided to validate the proposed approach for the CIFAR-10 dataset.

Submitted 2 June, 2022; v1 submitted 6 February, 2022; originally announced February 2022.

arXiv:2111.07599 [pdf, ps, other]

DNN gradient lossless compression: Can GenNorm be the answer?

Authors: Zhong-Jing Chen, Eduin E. Hernandez, Yu-Chih Huang, Stefano Rini

Abstract: In this paper, the problem of optimal gradient lossless compression in Deep Neural Network (DNN) training is considered. Gradient compression is relevant in many distributed DNN training scenarios, including the recently popular federated learning (FL) scenario in which each remote user is connected to the parameter server (PS) through a noiseless but rate-limited channel. In distributed DNN training, if the underlying gradient distribution is available, classical lossless compression approaches can be used to reduce the number of bits required for communicating the gradient entries. Mean field analysis has suggested that gradient updates can be considered as independent random variables, while Laplace approximation can be used to argue that the gradient has a distribution approximating the normal (Norm) distribution in some regimes. In this paper we argue that, for some networks of practical interest, the gradient entries can be well modelled as having a generalized normal (GenNorm) distribution. We provide numerical evaluations to validate the hypothesis that GenNorm modelling provides a more accurate prediction of the DNN gradient tail distribution. Additionally, this modeling choice provides concrete improvement in terms of lossless compression of the gradients when applying classical fix-to-variable lossless coding algorithms, such as Huffman coding, to the quantized gradient updates. This latter result indeed provides an effective compression strategy with low memory and computational complexity that has great practical relevance in distributed DNN training scenarios.

Submitted 15 November, 2021; originally announced November 2021.

arXiv:2111.07401 [pdf, other]

Neural Capacity Estimators: How Reliable Are They?

Authors: Farhad Mirkarimi, Stefano Rini, Nariman Farsad

Abstract: Recently, several methods have been proposed for estimating the mutual information from sample data using deep neural networks and without knowing the closed-form distribution of the data. This class of estimators is referred to as neural mutual information estimators. Although very promising, such techniques have yet to be rigorously benchmarked so as to establish their efficacy, ease of implementation, and stability for capacity estimation, which is a joint maximization framework. In this paper, we compare the different techniques proposed in the literature for estimating capacity and provide a practitioner perspective on their effectiveness. In particular, we study the performance of mutual information neural estimator (MINE), smoothed mutual information lower-bound estimator (SMILE), and directed information neural estimator (DINE) and provide insights on InfoNCE. We evaluated these algorithms in terms of their ability to learn the input distributions that are capacity approaching for the AWGN channel, the optical intensity channel, and peak power-constrained AWGN channel. For these scenarios, we provide insightful comments on various aspects of the training process, such as stability and sensitivity to initialization.

Submitted 18 March, 2022; v1 submitted 14 November, 2021; originally announced November 2021.

Comments: 7 pages, accepted for publication at the 2022 IEEE International Conference on Communications (ICC)

arXiv:2110.09164 [pdf, ps, other]

Speeding-Up Back-Propagation in DNN: Approximate Outer Product with Memory

Authors: Eduin E. Hernandez, Stefano Rini, Tolga M. Duman

Abstract: In this paper, an algorithm for approximate evaluation of back-propagation in DNN training is considered, which we term Approximate Outer Product Gradient Descent with Memory (Mem-AOP-GD). The Mem-AOP-GD algorithm implements an approximation of the stochastic gradient descent by considering only a subset of the outer products involved in the matrix multiplications that encompass backpropagation. In order to correct for the inherent bias in this approximation, the algorithm retains in memory an accumulation of the outer products that are not used in the approximation. We investigate the performance of the proposed algorithm in terms of DNN training loss under two design parameters: (i) the number of outer products used for the approximation, and (ii) the policy used to select such outer products. We experimentally show that significant improvements in computational complexity as well as accuracy can indeed be obtained through Mem-AOP-GD.

Submitted 18 October, 2021; originally announced October 2021.

Comments: 5 pages, 3 figures

arXiv:2106.00564 [pdf, other]

Wireless Federated Learning with Limited Communication and Differential Privacy

Authors: Amir Sonee, Stefano Rini, Yu-Chih Huang

Abstract: This paper investigates the role of dimensionality reduction in efficient communication and differential privacy (DP) of the local datasets at the remote users for an over-the-air computation (AirComp)-based federated learning (FL) model. More precisely, we consider the FL setting in which clients are prompted to train a machine learning model by simultaneous channel-aware and limited communications with a parameter server (PS) over a Gaussian multiple-access channel (GMAC), so that transmissions sum coherently at the PS, which is globally aware of the channel coefficients. For this setting, an algorithm is proposed based on applying federated stochastic gradient descent (FedSGD) for minimizing a given loss function based on the local gradients, Johnson-Lindenstrauss (JL) random projection for reducing the dimension of the local updates, and artificial noise to further aid users' privacy. For this scheme, our results show that the local DP performance is mainly improved due to injecting noise of greater variance on each dimension while keeping the sensitivity of the projected vectors unchanged. This is while the convergence rate is slowed down compared to the case without dimensionality reduction. As the performance gain outweighs the slower convergence, the trade-off between privacy and convergence is higher but is shown to lessen in the high-dimensional regime, yielding almost the same trade-off with much less communication cost.

Submitted 1 June, 2021; originally announced June 2021.

arXiv:2105.14464 [pdf, other]

Comparison-limited Vector Quantization

Authors: Joseph Chataignon, Stefano Rini

Abstract: In this paper, a variation of the classic vector quantization problem is considered. In the standard formulation, a quantizer is designed to minimize the distortion between input and output when the number of reconstruction points is fixed. We consider, instead, the scenario in which the number of comparators used in quantization is fixed. More precisely, we study the case in which a vector quantizer of dimension d is comprised of k comparators, each receiving a linear combination of the inputs and producing the output value one/zero if this linear combination is above/below a certain threshold. In reconstruction, the comparators' output is mapped to a reconstruction point, chosen so as to minimize a chosen distortion measure between the quantizer input and its reconstruction. The Comparison-Limited Vector Quantization (CLVQ) problem is then defined as the problem of optimally designing the configuration of the comparators and the choice of reconstruction points so as to minimize the given distortion. In this paper, we design a numerical optimization algorithm for the CLVQ problem. This algorithm leverages combinatorial geometrical notions to describe the hyperplane arrangement induced by the configuration of the comparators. It also relies on a genetic metaheuristic to improve the selection of the quantizer initialization and avoid local minima encountered during optimization. We numerically evaluate the performance of our algorithm in the case of input distributions following uniform and Gaussian i.i.d. sources to be compressed under quadratic distortion and compare it to the classic Linde-Buzo-Gray (LBG) algorithm.

Submitted 30 May, 2021; originally announced May 2021.

Comments: 27 pages, 10 figures

arXiv:2104.05686 [pdf, ps, other]

Stochastic Binning and Coded Demixing for Unsourced Random Access

Authors: Jamison R. Ebert, Vamsi K. Amalladinne, Stefano Rini, Jean-Francois Chamberland, Krishna R. Narayanan

Abstract: Unsourced random access is a novel communication paradigm designed for handling a large number of uncoordinated users that sporadically transmit very short messages. Under this model, coded compressed sensing (CCS) has emerged as a low-complexity scheme that exhibits good error performance. Yet, one of the challenges faced by CCS pertains to disentangling a large number of codewords present on a single factor graph. To mitigate this issue, this article introduces a modified CCS scheme whereby active devices stochastically partition themselves into groups that utilize separate sampling matrices with low cross-coherence for message transmission. At the receiver, ideas from the field of compressed demixing are employed for support recovery, and separate factor graphs are created for message disambiguation in each cluster. This reduces the number of active users on a factor graph, which improves performance significantly in typical scenarios. Indeed, coded demixing reduces the probability of error as the number of groups increases, up to a point. Findings are supported with numerical simulations.

Submitted 21 July, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: Submitted to IEEE-SPAWC 2021

arXiv:2103.04215 [pdf, ps, other]

Hierarchical Causal Bandit

Authors: Ruiyang Song, Stefano Rini, Kuang Xu

Abstract: Causal bandit is a nascent learning model where an agent sequentially experiments in a causal network of variables, in order to identify the reward-maximizing intervention. Despite the model's wide applicability, existing analytical results are largely restricted to a parallel bandit version where all variables are mutually independent. We introduce in this work the hierarchical causal bandit model as a viable path towards understanding general causal bandits with dependent variables. The core idea is to incorporate a contextual variable that captures the interaction among all variables with direct effects. Using this hierarchical framework, we derive sharp insights on algorithmic design in causal bandits with dependent arms and obtain nearly matching regret bounds in the case of a binary context.

Submitted 6 March, 2021; originally announced March 2021.

arXiv:2103.02928 [pdf, other]

Straggler Mitigation through Unequal Error Protection for Distributed Approximate Matrix Multiplication

Authors: Busra Tegin, Eduin E. Hernandez, Stefano Rini, Tolga M. Duman

Abstract: Large-scale machine learning and data mining methods routinely distribute computations across multiple agents to parallelize processing. The time required for the computations at the agents is affected by the availability of local resources and/or poor channel conditions, giving rise to the "straggler problem". As a remedy to this problem, we employ Unequal Error Protection (UEP) codes to obtain an approximation of the matrix product in the distributed computation setting to provide higher protection for the blocks with higher effect on the final result. We characterize the performance of the proposed approach from a theoretical perspective by bounding the expected reconstruction error for matrices with uncorrelated entries. We also apply the proposed coding strategy to the computation of the back-propagation step in the training of a Deep Neural Network (DNN) for an image classification task in the evaluation of the gradients. Our numerical experiments show that it is indeed possible to obtain significant improvements in the overall time required to achieve the DNN training convergence by producing approximations of matrix products using UEP codes in the presence of stragglers.

Submitted 27 July, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

Comments: 16 pages. arXiv admin note: text overlap with arXiv:2011.02749

arXiv:2102.07704 [pdf, ps, other]

Multi-Class Unsourced Random Access via Coded Demixing

Authors: Vamsi K. Amalladinne, Allen Hao, Stefano Rini, Jean-Francois Chamberland

Abstract: Unsourced random access (URA) is a recently proposed communication paradigm attuned to machine-driven data transfers. In the original URA formulation, all the active devices share the same number of bits per packet. The scenario where several classes of devices transmit concurrently has so far received little attention. An initial solution to this problem takes the form of group successive interference cancellation, where codewords from a class of devices with more resources are recovered first, followed by the decoding of the remaining messages. This article introduces a joint iterative decoding approach rooted in approximate message passing. This framework has a concatenated coding structure borrowed from the single-class coded compressed sensing and admits a solution that offers performance improvement at little added computational complexity. Our findings point to new connections between multi-class URA and compressive demixing. The performance of the envisioned algorithm is validated through numerical simulations.

Submitted 15 February, 2021; originally announced February 2021.

Comments: Submitted to IEEE ISIT-2021

arXiv:2011.10900 [pdf, ps, other]

An Exploration of the Heterogeneous Unsourced MAC

Authors: Allen Hao, Stefano Rini, Vamsi Amalladinne, Asit Kumar Pradhan, Jean-Francois Chamberland

Abstract: The unsourced MAC model was originally introduced to study the communication scenario in which a number of devices with low-complexity and low-energy wish to upload their respective messages to a base station. In the original problem formulation, all devices communicate using the same information rate. This may be very inefficient in certain wireless situations with varied channel conditions, power budgets, and payload requirements at the devices. This paper extends the original problem setting so as to allow for such variability. More specifically, we consider the scenario in which devices are clustered into two classes, possibly with different SNR levels or distinct payload requirements. In the cluster with higher power, devices transmit using a two-layer superposition modulation. In the cluster with lower energy, users transmit with the same base constellation as in the high power cluster. Within each layer, devices employ the same codebook. At the receiver, signal groupings are recovered using Approximate Message Passing (AMP), proceeding from the high to the low power levels using successive interference cancellation (SIC). This layered architecture is implemented using Coded Compressed Sensing (CCS) within every grouping. An outer tree code is employed to stitch fragments together across times and layers, as needed. This pragmatic approach to heterogeneous CCS is validated numerically and design guidelines are identified.

Submitted 21 November, 2020; originally announced November 2020.

arXiv:2011.07470 [pdf, other]

An efficient label-free analyte detection algorithm for time-resolved spectroscopy

Authors: Stefano Rini, Hirotsugu Hiramatsu

Abstract: Time-resolved spectral techniques are an important analysis tool in many contexts, from physical chemistry to biomedicine. Customarily, the label-free detection of analytes is manually performed by experts through the aid of classic dimensionality-reduction methods, such as Principal Component Analysis (PCA) and Non-negative Matrix Factorization (NMF). This fundamental reliance on expert analysis for unknown analyte detection severely hinders the applicability and the throughput of these techniques. For this reason, in this paper, we formulate this detection problem as an unsupervised learning problem and propose a novel machine learning algorithm for label-free analyte detection. To show the effectiveness of the proposed solution, we consider the problem of detecting amino acids in Liquid Chromatography coupled with Raman spectroscopy (LC-Raman).

Submitted 15 November, 2020; originally announced November 2020.

arXiv:2011.04801 [pdf, ps, other]

Fairness-Oriented User Association in HetNets Using Bargaining Game Theory

Authors: Ehsan Sadeghi, Hamid Behroozi, Stefano Rini

Abstract: In this paper, the user association and resource allocation problem is investigated for a two-tier HetNet consisting of one macro Base Station (BS) and a number of pico BSs. The effectiveness of user association to BSs is evaluated in terms of fairness and load distribution. In particular, the problem of determining a fair user association is formulated as a bargaining game so that the Nash Bargaining Solution (NBS), which abides by the fairness axioms, provides an optimal and fair user association. The NBS also yields a Pareto-optimal solution and leads to a proportional fair solution in the proposed HetNet model. Additionally, we introduce a novel algorithmic solution in which a new Coalition Generation Algorithm (CGA), called SINR-based CGA, is considered in order to simplify the coalition generation phase. Our simulation results show the efficiency of the proposed user association scheme in terms of fairness and load distribution among BSs and users. In particular, we compare the performance of the proposed solution with that of the throughput-oriented scheme in terms of the max-sum-rate scheme and show that the proposed solution yields comparable average data rates and overall sum rate.

Submitted 9 November, 2020; originally announced November 2020.

arXiv:2011.02749 [pdf, ps, other]

Straggler Mitigation through Unequal Error Protection for Distributed Matrix Multiplication

Authors: Busra Tegin, Eduin E. Hernandez, Stefano Rini, Tolga M. Duman

Abstract: Large-scale machine learning and data mining methods routinely distribute computations across multiple agents to parallelize processing. The time required for computation at the agents is affected by the availability of local resources, giving rise to the "straggler problem" in which the computation results are held back by unresponsive agents. For this problem, linear coding of the matrix sub-blocks can be used to introduce resilience toward straggling. The Parameter Server (PS) utilizes a channel code and distributes the matrices to the workers for multiplication. It then produces an approximation to the desired matrix multiplication using the results of the computations received at a given deadline. In this paper, we propose to employ Unequal Error Protection (UEP) codes to alleviate the straggler problem. The resiliency level of each sub-block is chosen according to its norm, as blocks with larger norms have higher effects on the result of the matrix multiplication. We validate the effectiveness of our scheme both theoretically and through numerical evaluations. We derive a theoretical characterization of the performance of UEP using random linear codes, and compare it to the case of equal error protection. We also apply the proposed coding strategy to the computation of the back-propagation step in the training of a Deep Neural Network (DNN), for which we investigate the fundamental trade-off between precision and the time required for the computations.

Submitted 19 March, 2021; v1 submitted 5 November, 2020; originally announced November 2020.

Comments: 6 pages, 6 figures

arXiv:2011.00889 [pdf, ps, other]

On Sum Secure Degrees of Freedom for K-User MISO Broadcast Channel With Alternating CSIT

Authors: Leyla Sadighi, Sadaf Salehkalaibar, Stefano Rini

Abstract: In this paper, the sum secure degrees of freedom (SDoF) of the $K$-user Multiple Input/Single Output (MISO) Broadcast Channel with Confidential Messages (BCCM) and alternating Channel State Information at the Transmitter (CSIT) is investigated. In the MISO BCCM, a $K$-antenna transmitter (TX) communicates toward $K$ single-antenna receivers (RXs), so that the message for RX $k$ is kept secret from RX $j$ with $j<k$. For this model, we consider the scenario in which the CSI of the RXs from $2$ to $K$ is instantaneously known at the transmitter while the CSI of RX $1$ is known at the transmitter (i) instantaneously for half of the time and (ii) with a unit delay for the remainder of the time. We refer to this CSIT availability as \emph{alternating} CSIT. Alternating CSIT has been shown to provide synergistic gains in terms of SDoF and is thus a viable strategy to ensure secure communication by simply relying on the CSI feedback strategy. Our main contribution is the characterization of sum SDoF for this model as $SDoF_{\rm sum}= (2K-1)/2$. Interestingly, this $SDoF_{\rm sum}$ is attained by a rather simple achievability scheme in which the TX uses artificial noise to prevent the decoding of the message of the unintended receivers at RX $1$. For simplicity, the proof for the case $K=3$ is first discussed in detail, after which we present the results for any number of RXs.

Submitted 22 November, 2020; v1 submitted 2 November, 2020; originally announced November 2020.

Comments: Journal

arXiv:2010.11292 [pdf, other]

Decentralized optimization over noisy, rate-constrained networks: Achieving consensus by communicating differences

Authors: Rajarshi Saha, Stefano Rini, Milind Rao, Andrea Goldsmith

Abstract: In decentralized optimization, multiple nodes in a network collaborate to minimize the sum of their local loss functions. The information exchange between nodes required for this task is often limited by network connectivity. We consider a setting in which communication between nodes is hindered by both (i) a finite rate-constraint on the signal transmitted by any node, and (ii) additive noise corrupting the signal received by any node. We propose a novel algorithm for this scenario: Decentralized Lazy Mirror Descent with Differential Exchanges (DLMD-DiffEx), which guarantees convergence of the local estimates to the optimal solution under the given communication constraints. A salient feature of DLMD-DiffEx is the introduction of additional proxy variables that are maintained by the nodes to account for the disagreement in their estimates due to channel noise and rate-constraints. Convergence to the optimal solution is attained by having nodes iteratively exchange these disagreement terms until consensus is achieved. In order to prevent noise accumulation during this exchange, DLMD-DiffEx relies on two sequences: one controlling the power of the transmitted signal, and the other determining the consensus rate. We provide clear insights on the design of these two sequences, which highlight the interplay between consensus rate and noise amplification. We investigate the performance of DLMD-DiffEx both from a theoretical perspective as well as through numerical evaluations.

Submitted 6 October, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

Comments: 15 pages, 6 figures (To be published in the "IEEE Journal on Selected Areas in Communications (JSAC) Special Issue on Distributed Learning over Wireless Edge Networks")

arXiv:2005.07776 [pdf, other]

Efficient Federated Learning over Multiple Access Channel with Differential Privacy Constraints

Authors: Amir Sonee, Stefano Rini

Abstract: In this paper, the problem of federated learning (FL) through digital communication between clients and a parameter server (PS) over a multiple access channel (MAC), also subject to differential privacy (DP) constraints, is studied. More precisely, we consider the setting in which clients in a centralized network are prompted to train a machine learning model using their local datasets. The information exchange between the clients and the PS takes place over a MAC and must also preserve the DP of the local datasets. Accordingly, the objective of the clients is to minimize the training loss subject to (i) rate constraints for reliable communication over the MAC and (ii) a DP constraint over the local datasets. For this optimization scenario, we propose a novel consensus scheme in which digital distributed stochastic gradient descent (D-DSGD) is performed by each client. To preserve DP, a digital artificial noise is also added by the users to the locally quantized gradients. The performance of the scheme is evaluated in terms of the convergence rate and DP level for a given MAC capacity. The performance is optimized over the choice of the quantization levels and the artificial noise parameters. Numerical evaluations are presented to validate the performance of the proposed scheme.

Submitted 1 November, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

arXiv:2005.06739 [pdf, other]

The Information & Mutual Information Ratio for Counting Image Features and Their Matches

Authors: Ali Khajegili Mirabadi, Stefano Rini

Abstract: Feature extraction and description is an important topic of computer vision, as it is the starting point of a number of tasks such as image reconstruction, stitching, registration, and recognition among many others. In this paper, two new image features are proposed: the Information Ratio (IR) and the Mutual Information Ratio (MIR). The IR is a feature of a single image, while the MIR describes features common across two or more images. We begin by introducing the IR and the MIR and motivate these features in an information theoretical context as the ratio of the self-information of an intensity level over the information contained over the pixels of the same intensity. Notably, the relationship of the IR and MIR with the image entropy and mutual information, classic information measures, is discussed. Finally, the effectiveness of these features is tested through feature extraction over INRIA Copydays datasets and feature matching over the Oxford Affine Covariant Regions. These numerical evaluations validate the relevance of the IR and MIR in practical computer vision tasks.

Submitted 14 May, 2020; originally announced May 2020.

Comments: 8-th Iran Workshop on Communication and Information Theory, 2020

arXiv:2003.04216 [pdf, other]

Decentralized SGD with Over-the-Air Computation

Authors: Emre Ozfatura, Stefano Rini, Deniz Gunduz

Abstract: We study the performance of decentralized stochastic gradient descent (DSGD) in a wireless network, where the nodes collaboratively optimize an objective function using their local datasets. Unlike the conventional setting, where the nodes communicate over error-free orthogonal communication links, we assume that transmissions are prone to additive noise and interference. We first consider a point-to-point (P2P) transmission strategy, termed the OAC-P2P scheme, in which the node pairs are scheduled in an orthogonal fashion to minimize interference. Since in the DSGD framework, each node requires a linear combination of the neighboring models at the consensus step, we then propose the OAC-MAC scheme, which utilizes the signal superposition property of the wireless medium to achieve over-the-air computation (OAC). For both schemes, we cast the scheduling problem as a graph coloring problem. We numerically evaluate the performance of these two schemes for the MNIST image classification task under various network conditions. We show that the OAC-MAC scheme attains better convergence performance with fewer communication rounds.

Submitted 6 March, 2020; originally announced March 2020.

arXiv:2001.07485 [pdf, ps, other]

On the Capacity of the Oversampled Wiener Phase Noise Channel

Authors: Luca Barletta, Stefano Rini

Abstract: In this paper, the capacity of the oversampled Wiener phase noise (OWPN) channel is investigated. The OWPN channel is a discrete-time point-to-point channel with a multi-sample receiver in which the channel output is affected by both additive and multiplicative noise. The additive noise is a white standard Gaussian process while the multiplicative noise is a Wiener phase noise process. This channel generalizes a number of channel models previously studied in the literature which investigate the effects of phase noise on the channel capacity, such as the Wiener phase noise channel and the non-coherent channel. We derive upper and inner bounds to the capacity of the OWPN channel: (i) an upper bound is derived through the I-MMSE relationship by bounding the Fisher information when estimating a phase noise sample given the past channel outputs and phase noise realizations, and (ii) two inner bounds are shown: one relying on coherent combining of the oversampled channel outputs and one relying on non-coherent combining of the samples. Beyond capacity, we study the generalized degrees of freedom (GDoF) of the OWPN channel for the case in which the oversampling factor grows with the average transmit power $P$ as $P^β$ and the frequency noise variance as $P^α$. Using our new capacity bounds, we derive the GDoF region in three regimes: regime (i), in which the GDoF region equals that of the classic additive white Gaussian noise channel (for $β\leq 1$); regime (ii), in which the GDoF region reduces to that of the non-coherent channel (for $β\geq \min \{α,1\}$); and, finally, regime (iii), in which partially-coherent combining of the over-samples is asymptotically optimal (for $2 α-1\leq β\leq 1$). Overall, our results are the first to identify the regimes in which different oversampling strategies are asymptotically optimal.

Submitted 21 January, 2020; originally announced January 2020.

arXiv:2001.03884 [pdf, other]

doi 10.1109/ISIT44484.2020.9174417

Compressibility Measures for Affinely Singular Random Vectors

Authors: Mohammad-Amin Charusaie, Arash Amini, Stefano Rini

Abstract: There are several ways to measure the compressibility of a random measure; they include general approaches such as using the rate-distortion curve, as well as more specific notions, such as the Rényi information dimension (RID). The RID parameter indicates the concentration of the measure around lower-dimensional subsets of the space. While the evaluation of such compressibility parameters is well-studied for continuous and discrete measures, the case of discrete-continuous measures is quite subtle. In this paper, we focus on a class of multi-dimensional random measures that have singularities on affine lower-dimensional subsets. This class of distributions naturally arises when considering linear transformations of component-wise independent discrete-continuous random variables. To measure the compressibility of such distributions, we introduce the new notion of dimensional-rate bias (DRB), which is closely related to the entropy and differential entropy in the discrete and continuous cases, respectively. Similar to entropy and differential entropy, the DRB is useful in evaluating the mutual information between distributions of the aforementioned type. Besides the DRB, we also evaluate the RID of these distributions. We further provide an upper bound for the RID of multi-dimensional random measures that are obtained by Lipschitz functions of component-wise independent discrete-continuous random variables ($\mathbf{X}$). The upper bound is shown to be achievable when the Lipschitz function is $A \mathbf{X}$, where $A$ satisfies $\mathrm{spark}(A_{m\times n}) = m+1$ (e.g., Vandermonde matrices). When considering discrete-domain moving-average processes with non-Gaussian excitation noise, the above results allow us to evaluate the block-average RID and DRB, as well as to determine a relationship between these parameters and other existing compressibility measures.

Submitted 8 March, 2022; v1 submitted 12 January, 2020; originally announced January 2020.

arXiv:1905.05401 [pdf, other]

Comparison-limited Vector Quantization

Authors: Joseph Chataignon, Stefano Rini

Abstract: A variation of the classic vector quantization problem is considered, in which the analog-to-digital (A2D) conversion is not constrained by the cardinality of the output but rather by the number of comparators available for quantization. More specifically, we consider the scenario in which a vector quantizer of dimension d is comprised of k comparators, each receiving a linear combination of the inputs and producing zero/one when this signal is above/below a threshold. Given a distribution of the inputs and a distortion criterion, the value of the linear combinations and thresholds are to be configured so as to minimize the distortion between the quantizer input and its reconstruction. This vector quantizer architecture naturally arises in many A2D conversion scenarios in which the quantizer's cost and energy consumption are severely restricted. For this novel vector quantizer architecture, we propose an algorithm to determine the optimal configuration and provide the first performance evaluation for the case of uniform and Gaussian sources.
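The comparator front end described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function name `comparator_outputs`, the example weights, and the thresholds are all hypothetical, and the sketch only shows how k comparators map a d-dimensional input to k bits, following the abstract's convention of zero above the threshold and one below it.

```python
from typing import List, Sequence


def comparator_outputs(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    thresholds: Sequence[float],
) -> List[int]:
    """Map a d-dimensional input to k bits, one bit per comparator.

    Each comparator forms a linear combination of the inputs and compares it
    with its own threshold. Following the abstract's wording, the output is
    0 when the combination is above the threshold and 1 otherwise.
    (Hypothetical helper, not taken from the paper.)
    """
    bits = []
    for w, t in zip(weights, thresholds):
        combination = sum(wi * xi for wi, xi in zip(w, x))
        bits.append(0 if combination > t else 1)
    return bits


# Illustrative configuration: d = 2 inputs, k = 3 comparators.
example_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
example_thresholds = [0.0, 0.0, 0.0]
example_bits = comparator_outputs([1.0, -2.0], example_weights, example_thresholds)
```

Choosing the weights and thresholds to minimize distortion is the design problem the paper addresses; the sketch leaves them as free inputs.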

Submitted 27 June, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

Comments: 5 pages, 5 figures

arXiv:1905.00405 [pdf, ps, other]

LDPC Coded Multiuser Shaping for the Gaussian Multiple Access Channel

Authors: Alexios Balatsoukas-Stimming, Stefano Rini, Joerg Kliewer

Abstract: The joint design of input constellation and low-density parity-check (LDPC) codes to approach the symmetric capacity of the two-user Gaussian multiple access channel is studied. More specifically, multilevel coding is employed at each user to construct a high-order input constellation and the constellations of the users are jointly designed so as to maximize the multiuser shaping gain. At the receiver, each layer of the multilevel coding is jointly decoded among users, while successive cancellation is employed across layers. The LDPC code employed by each user in each layer is designed using EXIT charts to support joint decoding among users for the prescribed per-layer rate and SNR. Numerical simulations are provided to validate the proposed constellation and LDPC code designs.

Submitted 1 May, 2019; originally announced May 2019.

Comments: To be presented at ISIT 2019

arXiv:1811.09216 [pdf, other]

The Statistical Dictionary-based String Matching Problem

Authors: M. Suri, S. Rini

Abstract: In the Dictionary-based String Matching (DSM) problem, a retrieval system has access to a source sequence and stores the position of a certain number of strings in a posting table. When a user inquires the position of a string, the retrieval system, instead of searching in the source sequence directly, relies on the posting table to answer the query more efficiently. In this paper, the Statistical DSM problem is proposed as a statistical and information-theoretic formulation of the classic DSM problem in which both the source and the query have a statistical description while the strings stored in the posting sequence are described as a code. Through this formulation, we are able to define the efficiency of the retrieval system as the average cost in answering a user's query in the limit of sufficiently long source sequence. This formulation is used to study the retrieval performance for the cases in which the stored strings are (i) all the strings of a given length, referred to as k-grams, and (ii) prefix-free codes.

Submitted 22 November, 2018; originally announced November 2018.

Comments: 8 pages, 2 figures, submitted to IEEE ICASSP 2019

arXiv:1810.12457 [pdf, ps, other]

Distributed Convex Optimization With Limited Communications

Authors: Milind Rao, Stefano Rini, Andrea Goldsmith

Abstract: In this paper, a distributed convex optimization algorithm, termed the distributed coordinate dual averaging (DCDA) algorithm, is proposed. The DCDA algorithm addresses the scenario of a large distributed optimization problem with limited communication among nodes in the network. Currently known distributed subgradient methods, such as the distributed dual averaging or the distributed alternating direction method of multipliers algorithms, assume that nodes can exchange messages of large cardinality. Such network communication capabilities are not available in many scenarios of practical relevance. In the DCDA algorithm, on the other hand, communication of each coordinate of the optimization variable is restricted over time. For the proposed algorithm, we bound the rate of convergence under different communication protocols and network architectures. We also consider the extensions to the case of imperfect gradient knowledge and the case in which transmitted messages are corrupted by additive noise or are quantized. Relevant numerical simulations are also provided.

Submitted 29 October, 2018; originally announced October 2018.

Comments: Extended version of submission to IEEE ICASSP 2019

arXiv:1806.01803 [pdf, other]

On MIMO Channel Capacity with Output Quantization Constraints

Authors: Abbas Khalili, Stefano Rini, Luca Barletta, Elza Erkip, Yonina C. Eldar

Abstract: The capacity of a Multiple-Input Multiple-Output (MIMO) channel in which the antenna outputs are processed by an analog linear combining network and quantized by a set of threshold quantizers is studied. The linear combining weights and quantization thresholds are selected from a set of possible configurations as a function of the channel matrix. The possible configurations of the combining network model specific analog receiver architectures, such as single antenna selection, sign quantization of the antenna outputs or linear processing of the outputs. An interesting connection between the capacity of this channel and a constrained sphere packing problem in which unit spheres are packed in a hyperplane arrangement is shown. From a high-level perspective, this follows from the fact that each threshold quantizer can be viewed as a hyperplane partitioning the transmitter signal space. Accordingly, the output of the set of quantizers corresponds to the possible regions induced by the hyperplane arrangement corresponding to the channel realization and receiver configuration. This connection provides a number of important insights into the design of quantization architectures for MIMO receivers; for instance, it shows that for a given number of quantizers, choosing configurations which induce a larger number of partitions can lead to higher rates.

Submitted 5 June, 2018; originally announced June 2018.

Comments: 5 pages, 3 figures, ISIT 2018

arXiv:1707.02398 [pdf, other]

On the Capacity of the Carbon Copy onto Dirty Paper Channel

Authors: Stefano Rini, Shlomo Shamai

Abstract: The "Carbon Copy onto Dirty Paper" (CCDP) channel is the compound "writing on dirty paper" channel in which the channel output is obtained as the sum of the channel input, white Gaussian noise and a Gaussian state sequence randomly selected among a set of possible realizations. The transmitter has non-causal knowledge of the set of possible state sequences but does not know which sequence is selected to produce the channel output. We study the capacity of the CCDP channel for two scenarios: (i) the state sequences are independent and identically distributed, and (ii) the state sequences are scaled versions of the same sequence. In the first scenario, we show that a combination of superposition coding, time-sharing and Gel'fand-Pinsker binning is sufficient to approach the capacity to within three bits per channel use for any number of possible state realizations. In the second scenario, we derive capacity to within four bits per channel use for the case of two possible state sequences. This result is extended to the CCDP channel with any number of possible state sequences under certain conditions on the scaling parameters, which we denote as the "strong fading" regime. We conclude by providing some remarks on the capacity of the CCDP channel in which the state sequences have any jointly Gaussian distribution.

Submitted 8 July, 2017; originally announced July 2017.

arXiv:1707.00420 [pdf, other]

Compress-and-Estimate Source Coding for a Vector Gaussian Source

Authors: Ruiyang Song, Stefano Rini, Alon Kipnis, Andrea Goldsmith

Abstract: We consider the remote vector source coding problem in which a vector Gaussian source is to be estimated from noisy linear measurements. For this problem, we derive the performance of the compress-and-estimate (CE) coding scheme and compare it to the optimal performance. In the CE coding scheme, the remote encoder compresses the noisy source observations so as to minimize the local distortion measure, independent from the joint distribution between the source and the observations. In reconstruction, the decoder estimates the original source realization from the lossy-compressed noisy observations. For the CE coding in the Gaussian vector case, we show that, if the code rate is less than a threshold, then the CE coding scheme attains the same performance as the optimal coding scheme. We also introduce lower and upper bounds for the performance gap above this threshold. In addition, an example with two observations and two sources is studied to illustrate the behavior of the performance gap.

Submitted 3 July, 2017; originally announced July 2017.

arXiv:1705.08148 [pdf, ps, other]

Capacity Outer Bound and Degrees of Freedom of Wiener Phase Noise Channels with Oversampling

Authors: Luca Barletta, Stefano Rini

Abstract: The discrete-time Wiener phase noise channel with an integrate-and-dump multi-sample receiver is studied. A novel outer bound on the capacity with an average input power constraint is derived as a function of the oversampling factor. This outer bound yields the degrees of freedom for the scenario in which the oversampling factor grows with the transmit power $P$ as $P^α$. The result shows, perhaps surprisingly, that the largest pre-log that can be attained with phase modulation at high signal-to-noise ratio is at most $1/4$.

Submitted 23 May, 2017; originally announced May 2017.

Comments: 5 pages, 1 figure, Submitted to Intern. Workshop Inf. Theory (ITW) 2017

arXiv:1702.08133 [pdf, ps, other]

A General Framework for Low-Resolution Receivers for MIMO Channels

Authors: Stefano Rini, Luca Barletta, Yonina C. Eldar, Elza Erkip

Abstract: The capacity of a discrete-time multi-input multi-output (MIMO) Gaussian channel with output quantization is investigated for different receiver architectures. A general formulation of this problem is proposed in which the antenna outputs are processed by analog combiners while sign quantizers are used for analog-to-digital conversion. To exemplify this approach, four analog receiver architectures of varying generality and complexity are considered: (a) multiple antenna selection and sign quantization of the antenna outputs, (b) single antenna selection and multilevel quantization, (c) multiple antenna selection and multilevel quantization, and (d) linear combining of the antenna outputs and multilevel quantization. Achievable rates are studied as a function of the number of available sign quantizers and compared among different architectures. In particular, it is shown that architecture (a) is sufficient to attain the optimal high signal-to-noise ratio performance for a MIMO receiver in which the number of antennas is larger than the number of sign quantizers. Numerical evaluations of the average performance are presented for the case in which the channel gains are i.i.d. Gaussian.

Submitted 7 July, 2017; v1 submitted 26 February, 2017; originally announced February 2017.

arXiv:1701.04982 [pdf, ps, other]

Capacity of Discrete-Time Wiener Phase Noise Channels to Within a Constant Gap

Authors: Luca Barletta, Stefano Rini

Abstract: The capacity of the discrete-time channel affected by both additive Gaussian noise and Wiener phase noise is studied. Novel inner and outer bounds are presented, which differ by at most $6.65$ bits per channel use for all channel parameters. The capacity of this model can be subdivided into three regimes: (i) for large values of the frequency noise variance, the channel behaves similarly to a channel with circularly uniform i.i.d. phase noise; (ii) when the frequency noise variance is small, the effects of the additive noise dominate over those of the phase noise; and (iii) for intermediate values of the frequency noise variance, the transmission rate over the phase modulation channel has to be reduced due to the presence of phase noise.
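To make the channel in this abstract concrete, here is a small Python simulation sketch. It assumes the usual textbook form of the model, Y_k = X_k e^{jΘ_k} + W_k with Θ_k = Θ_{k-1} + N_k and N_k Gaussian with the "frequency noise" variance; the abstract does not spell out this equation, so the exact form, the function name `wiener_phase_noise_channel`, and the fixed initial phase are assumptions made for illustration only.

```python
import cmath
import math
import random
from typing import List, Optional, Sequence


def wiener_phase_noise_channel(
    x: Sequence[complex],
    freq_noise_std: float,
    additive_std: float = 1.0,
    initial_phase: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[complex]:
    """Pass symbols through an assumed Wiener phase noise channel.

    The phase is a Gaussian random walk whose increments have standard
    deviation freq_noise_std; circularly symmetric complex Gaussian noise
    with total standard deviation additive_std is added to each output.
    (Hypothetical helper; the initial phase is fixed here for simplicity.)
    """
    rng = rng or random.Random()
    theta = initial_phase
    component_std = additive_std / math.sqrt(2.0)  # per real/imaginary part
    outputs = []
    for symbol in x:
        theta += rng.gauss(0.0, freq_noise_std)  # Wiener phase update
        noise = complex(rng.gauss(0.0, component_std), rng.gauss(0.0, component_std))
        outputs.append(symbol * cmath.exp(1j * theta) + noise)
    return outputs


# Illustrative use: three unit-energy symbols, moderate phase noise.
example_outputs = wiener_phase_noise_channel(
    [1 + 0j, 1j, -1 + 0j], freq_noise_std=0.1, rng=random.Random(0)
)
```

Sweeping `freq_noise_std` from small to large values gives a rough feel for the three regimes the abstract describes, although the sketch says nothing about capacity itself.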

Submitted 18 January, 2017; originally announced January 2017.

Comments: 7 pages, 3 figures. Extended version of a paper submitted to ISIT 2017

arXiv:1606.06039 [pdf, ps, other]

On Capacity of the Writing onto Fast Fading Dirt Channel

Authors: Stefano Rini, Shlomo Shamai

Abstract: The "Writing onto Fast Fading Dirt" (WFFD) channel is investigated to study the effects of partial channel knowledge on the capacity of the "writing on dirty paper" channel. The WFFD channel is the Gel'fand-Pinsker channel in which the output is obtained as the sum of the input, white Gaussian noise and a fading-times-state term. The fading-times-state term is equal to the element-wise product of the channel state sequence, known only at the transmitter, and a fast fading process, known only at the receiver. We consider the case of Gaussian distributed channel states and derive an approximate characterization of capacity for different classes of fading distributions, both continuous and discrete. In particular, we prove that if the fading distribution concentrates in a sufficiently small interval, then capacity is approximately equal to the AWGN capacity times the probability of this interval. We also show that there exists a class of fading distributions for which having the transmitter treat the fading-times-state term as additional noise closely approaches capacity. Although a closed-form expression of the capacity of the general WFFD channel remains unknown, our results show that the presence of fading can severely reduce the usefulness of channel state knowledge at the transmitter.

Submitted 7 July, 2017; v1 submitted 20 June, 2016; originally announced June 2016.

arXiv:1605.03755 [pdf, ps, other]

Optimal Rate Allocation in Mismatched Multiterminal Source Coding

Authors: Ruiyang Song, Stefano Rini, Alon Kipnis, Andrea J. Goldsmith

Abstract: We consider a multiterminal source coding problem in which a source is estimated at a central processing unit from lossy-compressed remote observations. Each lossy-encoded observation is produced by a remote sensor which obtains a noisy version of the source and compresses this observation minimizing a local distortion measure which depends only on the marginal distribution of its observation. The central node, on the other hand, has knowledge of the joint distribution of the source and all the observations and produces the source estimate which minimizes a different distortion measure between the source and its reconstruction. In this correspondence, we investigate the problem of optimally choosing the rate of each lossy-compressed remote estimate so as to minimize the distortion at the central processing unit subject to a bound on the overall communication rate between the remote sensors and the central unit. We focus, in particular, on two models of practical relevance: the case of a Gaussian source observed in additive Gaussian noise and reconstructed under quadratic distortion, and the case of a binary source observed in bit-flipping noise and reconstructed under Hamming distortion. In both scenarios we show that there exist regimes under which having more remote encoders does not reduce the source distortion: in other words, having fewer, high-quality remote estimates provides a smaller distortion than having more, lower-quality estimates.

Submitted 12 May, 2016; originally announced May 2016.

arXiv:1602.02206 [pdf, ps, other]

The Carbon Copy onto Dirty Paper Channel with Statistically Equivalent States

Authors: Stefano Rini, Shlomo Shamai Shitz

Abstract: Costa's "writing on dirty paper" capacity result establishes that full state pre-cancellation can be attained in the Gelfand-Pinsker channel with additive state and additive Gaussian noise. The "carbon copy onto dirty paper" channel is the extension of Costa's model to the compound setting: M receivers each observe the sum of the channel input, Gaussian noise and one of M Gaussian state sequences and attempt to decode the same common message. The state sequences are all non-causally known at the transmitter, which attempts to simultaneously pre-code its transmission against the channel state affecting each output. In this correspondence we derive the capacity to within 2.25 bits per channel use of the carbon copy onto dirty paper channel in which the state sequences are statistically equivalent, having the same variance and the same pairwise correlation. For this channel, capacity is approached by letting the channel input be the superposition of two codewords: a base codeword, simultaneously decoded at each user, and a top codeword which is pre-coded against the state realization at each user for a portion 1/M of the time. The outer bound relies on a recursive bounding in which incremental side information is provided at each receiver. This result represents a significant first step toward determining the capacity of the most general "carbon copy onto dirty paper" channel in which state sequences appearing in the different channel outputs have any jointly Gaussian distribution.

Submitted 5 February, 2016; originally announced February 2016.

arXiv:1602.02205 [pdf, ps, other]

On the Capacity of the Dirty Paper Channel with Fast Fading and Discrete Channel States

Authors: Stefano Rini, Shlomo Shamai Shitz

Abstract: The "writing on dirty paper" capacity result crucially depends on perfect channel knowledge at the transmitter, as the presence of even a small uncertainty in the channel realization gravely hampers the ability of the transmitter to pre-code its transmission against the channel state. This is particularly disappointing as it implies that interference pre-coding in practical systems is effective only when the channel estimates at the users have very high precision, a condition which is generally unattainable in wireless environments. In this paper we show that substantial improvements are possible when the state sequence is drawn from a discrete distribution, such as a constrained input constellation, for which state decoding can be approximately optimal. We consider the "writing on dirty paper" channel in which the state sequence is multiplied by a fast fading process and derive conditions on the fading and state distributions for which state decoding closely approaches capacity. These conditions intuitively relate to the ability of the receiver to correctly identify both the input and the state realization despite the uncertainty introduced by fading.

Submitted 5 February, 2016; originally announced February 2016.

arXiv:1602.02201 [pdf, ps, other]

The Rate-Distortion Risk in Estimation from Compressed Data

Authors: Alon Kipnis, Stefano Rini, Andrea J. Goldsmith

Abstract: Consider the problem of estimating a latent signal from a lossy compressed version of the data when the compressor is agnostic to the relation between the signal and the data. This situation arises in a host of modern applications when data is transmitted or stored prior to determining the downstream inference task. Given a bitrate constraint and a distortion measure between the data and its compressed version, let us consider the joint distribution achieving Shannon's rate-distortion (RD) function. Given an estimator and a loss function associated with the downstream inference task, define the rate-distortion risk as the expected loss under the RD-achieving distribution. We provide general conditions under which the operational risk in estimating from the compressed data is asymptotically equivalent to the RD risk. The main theoretical tools to prove this equivalence are transportation-cost inequalities in conjunction with properties of compression codes achieving Shannon's RD function. Whenever such equivalence holds, a recipe for designing estimators from datasets undergoing lossy compression without specifying the actual compression technique emerges: design the estimator to minimize the RD risk. Our conditions simplify in the special cases of discrete memoryless or multivariate normal data. For these scenarios, we derive explicit expressions for the RD risk of several estimators and compare them to the optimal source coding performance associated with full knowledge of the relation between the latent signal and the data.

Submitted 10 January, 2021; v1 submitted 5 February, 2016; originally announced February 2016.

Comments: Second revision. IEEE Transactions on Information Theory

Showing 1–50 of 74 results for author: Rini, S