-
On Semi-Supervised Estimation of Distributions
Authors:
H. S. Melihcan Erol,
Erixhen Sula,
Lizhong Zheng
Abstract:
We study the problem of estimating the joint probability mass function (pmf) over two random variables. In particular, the estimation is based on the observation of $m$ samples containing both variables and $n$ samples missing one fixed variable. We adopt the minimax framework with $\ell_p^p$ loss functions, and we show that the composition of univariate minimax estimators achieves minimax risk with the optimal first-order constant for $p \ge 2$, in the regime $m = o(n)$.
Submitted 15 May, 2023; v1 submitted 13 May, 2023;
originally announced May 2023.
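As a rough illustration of what "composing univariate estimators" can look like, the sketch below estimates the marginal of the always-observed variable from all $m + n$ samples, estimates the conditional of the other variable from the $m$ complete samples only, and multiplies the two. The function name, the argument names, and the add-constant smoothing parameter `alpha` are assumptions made for this example; the abstract does not specify the estimators, and this plug-in version is not claimed to be minimax.

```python
from collections import Counter


def compose_joint_pmf(paired, unpaired_x, x_alphabet, y_alphabet, alpha=0.0):
    """Plug-in estimate of P(x, y) as P(x) * P(y | x).

    paired     : list of (x, y) tuples (the m complete samples)
    unpaired_x : list of x values whose y is missing (the n incomplete samples)
    alpha      : add-constant smoothing; a hypothetical knob for this sketch,
                 not the minimax choice from the paper
    """
    kx, ky = len(x_alphabet), len(y_alphabet)
    total_x = len(paired) + len(unpaired_x)
    if total_x + alpha * kx == 0:
        raise ValueError("need at least one sample or alpha > 0")

    # The marginal of X uses every observed x: paired plus unpaired samples.
    x_counts = Counter(x for x, _ in paired)
    x_counts.update(unpaired_x)

    # The conditional of Y given X can only use the paired samples.
    xy_counts = Counter(paired)
    x_paired = Counter(x for x, _ in paired)

    joint = {}
    for x in x_alphabet:
        p_x = (x_counts[x] + alpha) / (total_x + alpha * kx)
        denom = x_paired[x] + alpha * ky
        for y in y_alphabet:
            if denom > 0:
                p_y_given_x = (xy_counts[(x, y)] + alpha) / denom
            else:
                # x never appears in the paired data: fall back to uniform.
                p_y_given_x = 1.0 / ky
            joint[(x, y)] = p_x * p_y_given_x
    return joint


paired = [(0, 0), (0, 1), (1, 1), (1, 1)]
unpaired_x = [0, 0, 0, 1]
joint = compose_joint_pmf(paired, unpaired_x, [0, 1], [0, 1])
```

In this toy call the unpaired samples shift the estimated marginal of `x` toward 0 without touching the conditional estimates, which is the only way the incomplete data can help in this construction.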
-
On the Semi-supervised Expectation Maximization
Authors:
Erixhen Sula,
Lizhong Zheng
Abstract:
The Expectation Maximization (EM) algorithm is widely used as an iterative modification to maximum likelihood estimation when the data is incomplete. We focus on a semi-supervised case to learn the model from labeled and unlabeled samples. Existing work in the semi-supervised case has focused mainly on performance rather than convergence guarantees; we instead focus on the contribution of the labeled samples to the convergence rate. The analysis clearly demonstrates how the labeled samples improve the convergence rate for the exponential family mixture model. In this case, we assume that the population EM (EM with unlimited data) is initialized within the neighborhood of global convergence for the population EM that uses only unlabeled samples. The analysis for the labeled samples provides a comprehensive description of the convergence rate for the Gaussian mixture model. In addition, we extend the findings for labeled samples and offer an alternative proof for the population EM's convergence rate with unlabeled samples for the symmetric mixture of two Gaussians.
Submitted 25 January, 2023; v1 submitted 1 November, 2022;
originally announced November 2022.
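To make the setting concrete, here is a minimal finite-sample sketch of semi-supervised EM for the symmetric mixture $\tfrac12 N(\theta, 1) + \tfrac12 N(-\theta, 1)$, assuming unit variance and equal weights. Labeled samples enter the update with their known sign, while unlabeled samples enter with the soft sign $\tanh(\theta x)$. The function names, sample sizes, and seed are illustrative assumptions; the paper analyzes the population version and more general models.

```python
import math
import random


def semi_supervised_em(labeled, unlabeled, theta0, iters=50):
    """EM for the mixture 0.5*N(theta, 1) + 0.5*N(-theta, 1).

    labeled   : list of (x, z) with z in {+1, -1} marking the component
    unlabeled : list of x values with the component unknown
    """
    total = len(labeled) + len(unlabeled)
    # Labeled samples contribute z * x exactly; this term never changes.
    labeled_sum = sum(z * x for x, z in labeled)
    theta = theta0
    for _ in range(iters):
        # E-step: with unit variance, E[z | x] = tanh(theta * x).
        # M-step: average the signed samples (hard signs for labeled data,
        # soft signs for unlabeled data).
        soft_sum = sum(x * math.tanh(theta * x) for x in unlabeled)
        theta = (labeled_sum + soft_sum) / total
    return theta


def sample(theta, count, rng):
    """Draw (x, z) pairs from the symmetric two-component mixture."""
    out = []
    for _ in range(count):
        z = rng.choice((1, -1))
        out.append((rng.gauss(z * theta, 1.0), z))
    return out


rng = random.Random(0)
true_theta = 2.0
labeled = sample(true_theta, 50, rng)
unlabeled = [x for x, _ in sample(true_theta, 2000, rng)]
estimate = semi_supervised_em(labeled, unlabeled, theta0=0.5)
```

The labeled term acts as a fixed offset in every update, which also breaks the sign symmetry of the unlabeled-only iteration; how much this speeds up convergence is what the paper quantifies.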
-
Shannon Bounds on Lossy Gray-Wyner Networks
Authors:
Erixhen Sula,
Michael Gastpar
Abstract:
The Gray-Wyner network subject to a fidelity criterion is studied. Upper and lower bounds for the trade-offs between the private sum-rate and the common rate are obtained for arbitrary sources subject to mean-squared error distortion. The bounds meet exactly, leading to the computation of the rate region, when the source is jointly Gaussian. They meet partially when the sources are modeled via an additive Gaussian "channel". The bounds are inspired by the Shannon bounds on the rate-distortion problem.
Submitted 2 February, 2022;
originally announced February 2022.
-
Lower bound on Wyner's Common Information
Authors:
Erixhen Sula,
Michael Gastpar
Abstract:
An important notion of common information between two random variables is due to Wyner. In this paper, we derive a lower bound on Wyner's common information for continuous random variables. The new bound improves on the only other general lower bound on Wyner's common information, which is the mutual information. We also show that the new lower bound is tight for the so-called "Gaussian channels" case, namely, when the joint distribution of the random variables can be written as the sum of a single underlying random variable and Gaussian noises. This work is motivated by recent variations of Wyner's common information and by applications to network data compression problems such as the Gray-Wyner network.
Submitted 16 February, 2021;
originally announced February 2021.
-
Supply-Power-Constrained Cable Capacity Maximization Using Multi-Layer Neural Networks
Authors:
Junho Cho,
Sethumadhavan Chandrasekhar,
Erixhen Sula,
Samuel Olsson,
Ellsworth Burrows,
Greg Raybon,
Roland Ryf,
Nicolas Fontaine,
Jean-Christophe Antona,
Steve Grubb,
Peter Winzer,
Andrew Chraplyvy
Abstract:
We experimentally solve the problem of maximizing capacity under a total supply power constraint in a massively parallel submarine cable context, i.e., for a spatially uncoupled system in which fiber Kerr nonlinearity is not a dominant limitation. By using multi-layer neural networks trained with extensive measurement data acquired from a 12-span 744-km optical fiber link as an accurate digital twin of the true optical system, we experimentally maximize fiber capacity with respect to the transmit signal's spectral power distribution based on a gradient-descent algorithm. By observing convergence to approximately the same maximum capacity and power distribution for almost arbitrary initial conditions, we conjecture that the capacity surface is a concave function of the transmit signal power distribution. We then demonstrate that eliminating gain flattening filters (GFFs) from the optical amplifiers results in substantial capacity gains per Watt of electrical supply power compared to a conventional system that contains GFFs.
Submitted 20 February, 2020;
originally announced February 2020.
-
The Gaussian lossy Gray-Wyner network
Authors:
Erixhen Sula,
Michael Gastpar
Abstract:
We consider the problem of source coding subject to a fidelity criterion for the Gray-Wyner network that connects a single source with two receivers via a common channel and two private channels. The Pareto-optimal trade-offs between the sum-rate of the private channels and the rate of the common channel are completely characterized for jointly Gaussian sources subject to the mean-squared error criterion, leveraging convex duality and an argument involving the factorization of convex envelopes. Specifically, they are attained by selecting the auxiliary random variable to be jointly Gaussian with the sources.
Submitted 30 September, 2020; v1 submitted 2 February, 2020;
originally announced February 2020.
-
Common Information Components Analysis
Authors:
Michael Gastpar,
Erixhen Sula
Abstract:
We give an information-theoretic interpretation of Canonical Correlation Analysis (CCA) via (relaxed) Wyner's common information. CCA extracts, from two high-dimensional data sets, low-dimensional descriptions (features) that capture the commonalities between the data sets, using a framework of correlations and linear transforms. Our interpretation first extracts the common information up to a pre-selected resolution level, and then projects this back onto each of the data sets. In the case of Gaussian statistics, this procedure precisely reduces to CCA, where the resolution level specifies the number of CCA components that are extracted. This also suggests a novel algorithm, Common Information Components Analysis (CICA), with several desirable features, including a natural extension beyond two data sets.
Submitted 28 February, 2020; v1 submitted 3 February, 2020;
originally announced February 2020.
-
On Wyner's Common Information in the Gaussian Case
Authors:
Erixhen Sula,
Michael Gastpar
Abstract:
Wyner's Common Information and a natural relaxation are studied in the special case of Gaussian random variables. The relaxation replaces conditional independence by a bound on the conditional mutual information. The main contribution is the proof that Gaussian auxiliaries are optimal, leading to a closed-form formula. As a corollary, the proof technique also establishes the optimality of Gaussian auxiliaries for the Gaussian Gray-Wyner network, a long-standing open problem.
Submitted 28 September, 2020; v1 submitted 15 December, 2019;
originally announced December 2019.
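For orientation, the simplest instance of such a closed form is the standard result for two jointly Gaussian scalars with correlation coefficient $\rho$, where Wyner's common information is $\tfrac12 \log \frac{1+|\rho|}{1-|\rho|}$ and the mutual information is $-\tfrac12 \log(1-\rho^2)$. The sketch below evaluates both in nats. It covers only this unrelaxed scalar case, and the function names are chosen for this example; the relaxed and vector formulas from the paper are not reproduced here.

```python
import math


def _check_rho(rho):
    # Both quantities diverge as |rho| approaches 1.
    if not -1.0 < rho < 1.0:
        raise ValueError("correlation coefficient must satisfy |rho| < 1")


def gaussian_mutual_information(rho):
    """I(X;Y) in nats for jointly Gaussian scalars with correlation rho."""
    _check_rho(rho)
    return -0.5 * math.log(1.0 - rho * rho)


def gaussian_wyner_common_information(rho):
    """Wyner's common information in nats for the same scalar Gaussian pair."""
    _check_rho(rho)
    r = abs(rho)
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


rho = 0.5
gap = gaussian_wyner_common_information(rho) - gaussian_mutual_information(rho)
```

Expanding the two logarithms shows that the gap equals $\log(1+|\rho|)$, so in this case Wyner's common information always sits above the mutual information and the two coincide only at $\rho = 0$.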
-
Supply-Power-Constrained Cable Capacity Maximization Using Deep Neural Networks
Authors:
Junho Cho,
Sethumadhavan Chandrasekhar,
Erixhen Sula,
Samuel Olsson,
Ellsworth Burrows,
Greg Raybon,
Roland Ryf,
Nicolas Fontaine,
Jean-Christophe Antona,
Steve Grubb,
Peter Winzer,
Andrew Chraplyvy
Abstract:
We experimentally achieve a 19% capacity gain per Watt of electrical supply power in a 12-span link by eliminating gain flattening filters and optimizing launch powers using machine learning by deep neural networks in a massively parallel fiber context.
Submitted 2 October, 2019;
originally announced October 2019.
-
Sum-Rate Capacity for Symmetric Gaussian Multiple Access Channels with Feedback
Authors:
Erixhen Sula,
Michael Gastpar,
Gerhard Kramer
Abstract:
The feedback sum-rate capacity is established for the symmetric $J$-user Gaussian multiple-access channel (GMAC). The main contribution is a converse bound that combines the dependence-balance argument of Hekstra and Willems (1989) with a variant of the factorization of a convex envelope of Geng and Nair (2014). The converse bound matches the achievable sum-rate of the Fourier-Modulated Estimate Correction strategy of Kramer (2002).
Submitted 23 November, 2018;
originally announced November 2018.
-
Compute-Forward Multiple Access (CFMA): Practical Code Design
Authors:
Erixhen Sula,
Jingge Zhu,
Adriano Pastore,
Sung Hoon Lim,
Michael Gastpar
Abstract:
We present a practical strategy that aims to attain rate points on the dominant face of the multiple access channel capacity using a standard low-complexity decoder. This technique is built upon recent theoretical developments of Zhu and Gastpar on compute-forward multiple access (CFMA), which achieves the capacity of the multiple access channel using a sequential decoder. We illustrate this strategy with off-the-shelf LDPC codes. In the first stage of decoding, the receiver recovers a linear combination of the transmitted codewords using the sum-product algorithm (SPA). In the second stage, by using the recovered sum-of-codewords as side information, the receiver recovers one of the two codewords using a modified SPA, ultimately recovering both codewords. The main benefit of recovering the sum-of-codewords instead of the codeword itself is that it allows the receiver to attain points on the dominant face of the multiple access channel capacity without the need for rate-splitting or time sharing, while maintaining a low complexity on the order of a standard point-to-point decoder. This property is also shown to be crucial for some applications, e.g., interference channels. For all the simulations with single-layer binary codes, our proposed practical strategy is shown to be within 1.7 dB of the theoretical limits, without explicit optimization of the off-the-shelf LDPC codes.
Submitted 29 December, 2017;
originally announced December 2017.