-
Scaling laws for learning with real and surrogate data
Authors:
Ayush Jain,
Andrea Montanari,
Eren Sasoglu
Abstract:
Collecting large quantities of high-quality data can be prohibitively expensive or impractical, and a bottleneck in machine learning. One may instead augment a small set of $n$ data points from the target distribution with data from more accessible sources, e.g. data collected under different circumstances or synthesized by generative models. We refer to such data as `surrogate data.' We introduce a weighted empirical risk minimization (ERM) approach for integrating surrogate data into training. We analyze this method mathematically under several classical statistical models, and validate our findings empirically on datasets from different domains. Our main findings are: $(i)$ Integrating surrogate data can significantly reduce the test error on the original distribution. Surprisingly, this can happen even when the surrogate data is unrelated to the original data. We trace back this behavior to the classical Stein's paradox. $(ii)$ In order to reap the benefit of surrogate data, it is crucial to use optimally weighted ERM. $(iii)$ The test error of models trained on mixtures of real and surrogate data is approximately described by a scaling law. This scaling law can be used to predict the optimal weighting scheme, and to choose the amount of surrogate data to add.
Submitted 28 June, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
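The weighted ERM idea in this abstract can be illustrated with a minimal sketch for squared loss. The parametrization below (weight `1 - alpha` on the real-data risk and `alpha` on the surrogate risk), the function name, and the small `ridge` term are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def weighted_erm_least_squares(X_real, y_real, X_surr, y_surr, alpha, ridge=1e-8):
    """Minimize (1 - alpha) * mean real squared loss + alpha * mean surrogate squared loss.

    Hypothetical helper: the weighting convention and the tiny ridge term
    (added only for numerical stability) are assumptions of this sketch.
    """
    n, m = len(y_real), len(y_surr)
    d = X_real.shape[1]
    # Normal equations of the weighted objective.
    A = (1 - alpha) / n * X_real.T @ X_real + alpha / m * X_surr.T @ X_surr
    b = (1 - alpha) / n * X_real.T @ y_real + alpha / m * X_surr.T @ y_surr
    return np.linalg.solve(A + ridge * np.eye(d), b)
```

Setting `alpha = 0` recovers ordinary least squares on the real data alone, and `alpha = 1` fits the surrogate data alone; intermediate values interpolate between the two.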
-
Sliding-Window Superposition Coding: Two-User Interference Channels
Authors:
Lele Wang,
Young-Han Kim,
Chiao-Yi Chen,
Hosung Park,
Eren Sasoglu
Abstract:
A low-complexity coding scheme is developed to achieve the rate region of maximum likelihood decoding for interference channels. As in the classical rate-splitting multiple access scheme by Grant, Rimoldi, Urbanke, and Whiting, the proposed coding scheme uses superposition of multiple codewords with successive cancellation decoding, which can be implemented using standard point-to-point encoders and decoders. Unlike rate-splitting multiple access, which is not rate-optimal for multiple receivers, the proposed coding scheme transmits codewords over multiple blocks in a staggered manner and recovers them successively over sliding decoding windows, achieving the single-stream optimal rate region as well as the more general Han--Kobayashi inner bound for the two-user interference channel. The feasibility of this scheme in practice is verified by implementing it using commercial channel codes over the two-user Gaussian interference channel.
Submitted 9 January, 2017;
originally announced January 2017.
-
Polar Coding for Processes with Memory
Authors:
Eren Sasoglu,
Ido Tal
Abstract:
We study polar coding for stochastic processes with memory. For example, a process may be defined by the joint distribution of the input and output of a channel. The memory may be present in the channel, the input, or both. We show that $ψ$-mixing processes polarize under the standard Arıkan transform, under a mild condition. We further show that the rate of polarization of the \emph{low-entropy} synthetic channels is roughly $O(2^{-\sqrt{N}})$, where $N$ is the blocklength. That is, essentially the same rate as in the memoryless case.
Submitted 15 August, 2018; v1 submitted 4 February, 2016;
originally announced February 2016.
-
Reed-Muller Codes Achieve Capacity on Erasure Channels
Authors:
Shrinivas Kudekar,
Santhosh Kumar,
Marco Mondelli,
Henry D. Pfister,
Eren Şaşoğlu,
Rüdiger Urbanke
Abstract:
We introduce a new approach to proving that a sequence of deterministic linear codes achieves capacity on an erasure channel under maximum a posteriori decoding. Rather than relying on the precise structure of the codes, our method exploits code symmetry. In particular, the technique applies to any sequence of linear codes where the blocklengths are strictly increasing, the code rates converge, and the permutation group of each code is doubly transitive. In other words, we show that symmetry alone implies near-optimal performance.
An important consequence of this result is that a sequence of Reed-Muller codes with increasing blocklength and converging rate achieves capacity. This possibility has been suggested previously in the literature but it has only been proven for cases where the limiting code rate is 0 or 1. Moreover, these results extend naturally to all affine-invariant codes and, thus, to extended primitive narrow-sense BCH codes. This also resolves, in the affirmative, the existence question for capacity-achieving sequences of binary cyclic codes. The primary tools used in the proof are the sharp threshold property for symmetric monotone boolean functions and the area theorem for extrinsic information transfer functions.
Submitted 18 January, 2016;
originally announced January 2016.
-
Reed-Muller Codes Achieve Capacity on the Binary Erasure Channel under MAP Decoding
Authors:
Shrinivas Kudekar,
Marco Mondelli,
Eren Şaşoğlu,
Rüdiger Urbanke
Abstract:
We show that Reed-Muller codes achieve capacity under maximum a posteriori bit decoding for transmission over the binary erasure channel for all rates $0 < R < 1$. The proof is generic and applies to other codes with a sufficient amount of symmetry as well. The main idea is to combine the following observations: (i) monotone functions experience a sharp threshold behavior, (ii) the extrinsic information transfer (EXIT) functions are monotone, (iii) Reed--Muller codes are 2-transitive and thus the EXIT functions associated with their codeword bits are all equal, and (iv) therefore the Area Theorem for the average EXIT functions implies that RM codes' threshold is at channel capacity.
Submitted 21 May, 2015;
originally announced May 2015.
-
Optimal Haplotype Assembly from High-Throughput Mate-Pair Reads
Authors:
Govinda M. Kamath,
Eren Şaşoğlu,
David Tse
Abstract:
Humans have $23$ pairs of homologous chromosomes. The homologous pairs are almost identical pairs of chromosomes. For the most part, differences between homologous chromosomes occur at certain documented positions called single nucleotide polymorphisms (SNPs). A haplotype of an individual is the pair of sequences of SNPs on the two homologous chromosomes. In this paper, we study the problem of inferring haplotypes of individuals from mate-pair reads of their genome. We give a simple formula for the coverage needed for haplotype assembly, under a generative model. The analysis here leverages connections of this problem with decoding convolutional codes.
Submitted 6 February, 2015;
originally announced February 2015.
-
Polar coding for interference networks
Authors:
Lele Wang,
Eren Sasoglu
Abstract:
A polar coding scheme for interference networks is introduced. The scheme combines Arikan's monotone chain rules for multiple-access channels and a method by Hassani and Urbanke to 'align' two incompatible polarization processes. It achieves the Han--Kobayashi inner bound for two-user interference channels and generalizes to interference networks.
Submitted 28 January, 2014;
originally announced January 2014.
-
Universal Polarization
Authors:
Eren Sasoglu,
Lele Wang
Abstract:
A method to polarize channels universally is introduced. The method is based on combining two distinct channels in each polarization step, as opposed to Arikan's original method of combining identical channels. This creates an equal number of only two types of channels, one of which becomes progressively better as the other becomes worse. The locations of the good polarized channels are independent of the underlying channel, guaranteeing universality. Polarizing the good channels further with Arikan's method results in universal polar codes of rate 1/2. The method is generalized to construct codes of arbitrary rates.
It is also shown that the less noisy ordering of channels is preserved under polarization, and thus a good polar code for a given channel will perform well over a less noisy one.
Submitted 27 December, 2013; v1 submitted 29 July, 2013;
originally announced July 2013.
-
On the Capacity Region for Index Coding
Authors:
Fatemeh Arbabjolfaei,
Bernd Bandemer,
Young-Han Kim,
Eren Sasoglu,
Lele Wang
Abstract:
A new inner bound on the capacity region of a general index coding problem is established. Unlike most existing bounds that are based on graph theoretic or algebraic tools, the bound is built on a random coding scheme and optimal decoding, and has a simple polymatroidal single-letter expression. The utility of the inner bound is demonstrated by examples that include the capacity region for all index coding problems with up to five messages (there are 9846 nonisomorphic ones).
Submitted 15 June, 2013; v1 submitted 6 February, 2013;
originally announced February 2013.
-
A Comparison of Superposition Coding Schemes
Authors:
Lele Wang,
Eren Sasoglu,
Bernd Bandemer,
Young-Han Kim
Abstract:
There are two variants of superposition coding schemes. Cover's original superposition coding scheme has code clouds of identical shape, while Bergmans's superposition coding scheme has code clouds of independently generated shapes. These two schemes yield identical achievable rate regions in several scenarios, such as the capacity region for degraded broadcast channels. This paper shows that under the optimal maximum likelihood decoding, these two superposition coding schemes can result in different rate regions. In particular, it is shown that for the two-receiver broadcast channel, Cover's superposition coding scheme can achieve rates strictly larger than Bergmans's scheme.
Submitted 5 February, 2013;
originally announced February 2013.
-
Polar codes for the two-user multiple-access channel
Authors:
Eren Sasoglu,
Emre Telatar,
Edmund Yeh
Abstract:
Arikan's polar coding method is extended to two-user multiple-access channels. It is shown that if the two users of the channel use the Arikan construction, the resulting channels will polarize to one of five possible extremals, on each of which uncoded transmission is optimal. The sum rate achieved by this coding technique is the one that corresponds to uniform input distributions. The encoding and decoding complexities and the error performance of these codes are as in the single-user case: $O(n\log n)$ for encoding and decoding, and $o(\exp(-n^{1/2-ε}))$ for block error probability, where $n$ is the block length.
Submitted 22 June, 2010;
originally announced June 2010.
-
An entropy inequality for q-ary random variables and its application to channel polarization
Authors:
Eren Sasoglu
Abstract:
It is shown that given two copies of a q-ary input channel $W$, where q is prime, it is possible to create two channels $W^-$ and $W^+$ whose symmetric capacities satisfy $I(W^-)\le I(W)\le I(W^+)$, where the inequalities are strict except in trivial cases. This leads to a simple proof of channel polarization in the q-ary case.
Submitted 10 June, 2010;
originally announced June 2010.
-
Polarization for arbitrary discrete memoryless channels
Authors:
Eren Sasoglu,
Emre Telatar,
Erdal Arikan
Abstract:
Channel polarization, originally proposed for binary-input channels, is generalized to arbitrary discrete memoryless channels. Specifically, it is shown that when the input alphabet size is a prime number, a similar construction to that for the binary case leads to polarization. This method can be extended to channels of composite input alphabet sizes by decomposing such channels into a set of channels with prime input alphabet sizes. It is also shown that all discrete memoryless channels can be polarized by randomized constructions. The introduction of randomness does not change the order of complexity of polar code construction, encoding, and decoding. A previous result on the error probability behavior of polar codes is also extended to the case of arbitrary discrete memoryless channels. The generalization of polarization to channels with arbitrary finite input alphabet sizes leads to polar-coding methods for approaching the true (as opposed to symmetric) channel capacity of arbitrary channels with discrete or continuous input alphabets.
Submitted 3 August, 2009;
originally announced August 2009.
-
Polar Codes: Characterization of Exponent, Bounds, and Constructions
Authors:
Satish Babu Korada,
Eren Sasoglu,
Rudiger Urbanke
Abstract:
Polar codes were recently introduced by Arıkan. They achieve the capacity of arbitrary symmetric binary-input discrete memoryless channels under a low complexity successive cancellation decoding strategy. The original polar code construction is closely related to the recursive construction of Reed-Muller codes and is based on the $2 \times 2$ matrix $\bigl[\begin{smallmatrix} 1 & 0 \\ 1 & 1 \end{smallmatrix}\bigr]$. It was shown by Arıkan and Telatar that this construction achieves an error exponent of $\frac12$, i.e., that for sufficiently large blocklengths the error probability decays exponentially in the square root of the length. It was already mentioned by Arıkan that in principle larger matrices can be used to construct polar codes. A fundamental question then is to see whether there exist matrices with exponent exceeding $\frac12$. We first show that any $\ell \times \ell$ matrix none of whose column permutations is upper triangular polarizes symmetric channels. We then characterize the exponent of a given square matrix and derive upper and lower bounds on achievable exponents. Using these bounds we show that there are no matrices of size less than 15 with exponents exceeding $\frac12$. Further, we give a general construction based on BCH codes which for large $n$ achieves exponents arbitrarily close to 1 and which exceeds $\frac12$ for size 16.
Submitted 26 January, 2009; v1 submitted 5 January, 2009;
originally announced January 2009.
-
A Class of Transformations that Polarize Symmetric Binary-Input Memoryless Channels
Authors:
Satish Babu Korada,
Eren Sasoglu
Abstract:
A generalization of Arıkan's polar code construction using transformations of the form $G^{\otimes n}$ where $G$ is an $\ell \times \ell$ matrix is considered. Necessary and sufficient conditions are given for these transformations to ensure channel polarization. It is shown that a large class of such transformations polarize symmetric binary-input memoryless channels.
Submitted 11 November, 2008;
originally announced November 2008.
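The transformation $G^{\otimes n}$ mentioned in this abstract can be sketched as a repeated Kronecker product over GF(2). The helper names below are hypothetical, the example kernel is Arıkan's $2 \times 2$ matrix, and the bit-reversal permutation sometimes included in polar encoders is omitted; this is an illustration under those assumptions rather than the authors' construction.

```python
import numpy as np

def kron_power(G, n):
    """Return the n-fold Kronecker power of the binary matrix G, reduced mod 2."""
    out = np.array([[1]], dtype=np.int64)
    for _ in range(n):
        out = np.kron(out, G) % 2
    return out

def encode(u, G, n):
    """Map the input row vector u (length ell**n) to u @ G^{(x) n} over GF(2)."""
    return (np.asarray(u, dtype=np.int64) @ kron_power(G, n)) % 2
```

For example, with `G = np.array([[1, 0], [1, 1]])` and `n = 2`, `encode([0, 0, 0, 1], G, 2)` selects the last row of the $4 \times 4$ transform. Any $\ell \times \ell$ binary matrix can be passed as `G`; whether it polarizes is the question the paper addresses.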