Search | arXiv e-print repository

Scaling laws for learning with real and surrogate data

Authors: Ayush Jain, Andrea Montanari, Eren Sasoglu

Abstract: Collecting large quantities of high-quality data can be prohibitively expensive or impractical, and is a bottleneck in machine learning. One may instead augment a small set of $n$ data points from the target distribution with data from more accessible sources, e.g. data collected under different circumstances or synthesized by generative models. We refer to such data as 'surrogate data.' We introduce a weighted empirical risk minimization (ERM) approach for integrating surrogate data into training. We analyze this method mathematically under several classical statistical models, and validate our findings empirically on datasets from different domains. Our main findings are: $(i)$ Integrating surrogate data can significantly reduce the test error on the original distribution. Surprisingly, this can happen even when the surrogate data is unrelated to the original data. We trace this behavior back to the classical Stein's paradox. $(ii)$ In order to reap the benefit of surrogate data, it is crucial to use optimally weighted ERM. $(iii)$ The test error of models trained on mixtures of real and surrogate data is approximately described by a scaling law. This scaling law can be used to predict the optimal weighting scheme, and to choose the amount of surrogate data to add.

Submitted 28 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: Added new experiments
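
As a rough illustration of the weighted ERM idea in the abstract above, the sketch below works through the simplest possible case: estimating a scalar mean under squared loss. The form of the objective (weight 1 - alpha on the averaged real-data loss and alpha on the averaged surrogate-data loss), the function name `weighted_erm_mean`, and the toy numbers are assumptions made for illustration; they are not taken from the paper.

```python
def weighted_erm_mean(real, surrogate, alpha):
    """Minimizer t of the (assumed) weighted squared-loss objective

        (1 - alpha) * mean((x - t)^2 for x in real)
          + alpha   * mean((z - t)^2 for z in surrogate).

    Setting the derivative to zero gives a convex combination of the
    two sample means, so no numerical optimizer is needed here.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    real_mean = sum(real) / len(real)
    surrogate_mean = sum(surrogate) / len(surrogate)
    return (1.0 - alpha) * real_mean + alpha * surrogate_mean


# Toy data (hypothetical): two real points and one surrogate point.
estimate = weighted_erm_mean([1.0, 3.0], [10.0], alpha=0.25)
```

With alpha = 0 the estimate ignores the surrogate data entirely, and with alpha = 1 it ignores the real data; intermediate values shrink the real-data mean toward the surrogate mean.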

arXiv:1701.02345 [pdf, ps, other]

Sliding-Window Superposition Coding: Two-User Interference Channels

Authors: Lele Wang, Young-Han Kim, Chiao-Yi Chen, Hosung Park, Eren Sasoglu

Abstract: A low-complexity coding scheme is developed to achieve the rate region of maximum likelihood decoding for interference channels. As in the classical rate-splitting multiple access scheme by Grant, Rimoldi, Urbanke, and Whiting, the proposed coding scheme uses superposition of multiple codewords with successive cancellation decoding, which can be implemented using standard point-to-point encoders and decoders. Unlike rate-splitting multiple access, which is not rate-optimal for multiple receivers, the proposed coding scheme transmits codewords over multiple blocks in a staggered manner and recovers them successively over sliding decoding windows, achieving the single-stream optimal rate region as well as the more general Han--Kobayashi inner bound for the two-user interference channel. The feasibility of this scheme in practice is verified by implementing it using commercial channel codes over the two-user Gaussian interference channel.

Submitted 9 January, 2017; originally announced January 2017.

Comments: Submitted to the IEEE Transactions on Information Theory

arXiv:1602.01870 [pdf, other]

Polar Coding for Processes with Memory

Authors: Eren Sasoglu, Ido Tal

Abstract: We study polar coding for stochastic processes with memory. For example, a process may be defined by the joint distribution of the input and output of a channel. The memory may be present in the channel, the input, or both. We show that $\psi$-mixing processes polarize under the standard Arıkan transform, under a mild condition. We further show that the rate of polarization of the \emph{low-entropy} synthetic channels is roughly $O(2^{-\sqrt{N}})$, where $N$ is the blocklength; that is, essentially the same rate as in the memoryless case.

Submitted 15 August, 2018; v1 submitted 4 February, 2016; originally announced February 2016.

Comments: Submitted to IEEE Transactions on Information Theory

arXiv:1601.04689 [pdf, ps, other]

Reed-Muller Codes Achieve Capacity on Erasure Channels

Authors: Shrinivas Kudekar, Santhosh Kumar, Marco Mondelli, Henry D. Pfister, Eren Şaşoğlu, Rüdiger Urbanke

Abstract: We introduce a new approach to proving that a sequence of deterministic linear codes achieves capacity on an erasure channel under maximum a posteriori decoding. Rather than relying on the precise structure of the codes, our method exploits code symmetry. In particular, the technique applies to any sequence of linear codes where the blocklengths are strictly increasing, the code rates converge, and the permutation group of each code is doubly transitive. In other words, we show that symmetry alone implies near-optimal performance. An important consequence of this result is that a sequence of Reed-Muller codes with increasing blocklength and converging rate achieves capacity. This possibility has been suggested previously in the literature but it has only been proven for cases where the limiting code rate is 0 or 1. Moreover, these results extend naturally to all affine-invariant codes and, thus, to extended primitive narrow-sense BCH codes. This also resolves, in the affirmative, the existence question for capacity-achieving sequences of binary cyclic codes. The primary tools used in the proof are the sharp threshold property for symmetric monotone boolean functions and the area theorem for extrinsic information transfer functions.

Submitted 18 January, 2016; originally announced January 2016.

Comments: This article combines our previous articles arXiv:1505.05123 and arXiv:1505.05831

arXiv:1505.05831 [pdf, ps, other]

Reed-Muller Codes Achieve Capacity on the Binary Erasure Channel under MAP Decoding

Authors: Shrinivas Kudekar, Marco Mondelli, Eren Şaşoğlu, Rüdiger Urbanke

Abstract: We show that Reed-Muller codes achieve capacity under maximum a posteriori bit decoding for transmission over the binary erasure channel for all rates $0 < R < 1$. The proof is generic and applies to other codes with a sufficient amount of symmetry as well. The main idea is to combine the following observations: (i) monotone functions experience a sharp threshold behavior, (ii) the extrinsic information transfer (EXIT) functions are monotone, (iii) Reed-Muller codes are 2-transitive and thus the EXIT functions associated with their codeword bits are all equal, and (iv) therefore the Area Theorem for the average EXIT functions implies that RM codes' threshold is at channel capacity.

Submitted 21 May, 2015; originally announced May 2015.

Comments: 4 pages

arXiv:1502.01975 [pdf, other]

Optimal Haplotype Assembly from High-Throughput Mate-Pair Reads

Authors: Govinda M. Kamath, Eren Şaşoğlu, David Tse

Abstract: Humans have $23$ pairs of homologous chromosomes. The homologous pairs are almost identical pairs of chromosomes. For the most part, differences between homologous chromosomes occur at certain documented positions called single nucleotide polymorphisms (SNPs). A haplotype of an individual is the pair of sequences of SNPs on the two homologous chromosomes. In this paper, we study the problem of inferring haplotypes of individuals from mate-pair reads of their genome. We give a simple formula for the coverage needed for haplotype assembly, under a generative model. The analysis here leverages connections of this problem with decoding convolutional codes.

Submitted 6 February, 2015; originally announced February 2015.

Comments: 10 pages, 4 figures, Submitted to ISIT 2015

arXiv:1401.7293 [pdf, ps, other]

Polar coding for interference networks

Authors: Lele Wang, Eren Sasoglu

Abstract: A polar coding scheme for interference networks is introduced. The scheme combines Arikan's monotone chain rules for multiple-access channels and a method by Hassani and Urbanke to 'align' two incompatible polarization processes. It achieves the Han--Kobayashi inner bound for two-user interference channels and generalizes to interference networks.

Submitted 28 January, 2014; originally announced January 2014.

Comments: Shorter version submitted to ISIT 2014

arXiv:1307.7495 [pdf, ps, other]

Universal Polarization

Authors: Eren Sasoglu, Lele Wang

Abstract: A method to polarize channels universally is introduced. The method is based on combining two distinct channels in each polarization step, as opposed to Arikan's original method of combining identical channels. This creates an equal number of only two types of channels, one of which becomes progressively better as the other becomes worse. The locations of the good polarized channels are independent of the underlying channel, guaranteeing universality. Polarizing the good channels further with Arikan's method results in universal polar codes of rate 1/2. The method is generalized to construct codes of arbitrary rates. It is also shown that the less noisy ordering of channels is preserved under polarization, and thus a good polar code for a given channel will perform well over a less noisy one.

Submitted 27 December, 2013; v1 submitted 29 July, 2013; originally announced July 2013.

Comments: Submitted to the IEEE Transactions on Information Theory

arXiv:1302.1601 [pdf, ps, other]

doi 10.1109/ISIT.2013.6620369

On the Capacity Region for Index Coding

Authors: Fatemeh Arbabjolfaei, Bernd Bandemer, Young-Han Kim, Eren Sasoglu, Lele Wang

Abstract: A new inner bound on the capacity region of a general index coding problem is established. Unlike most existing bounds that are based on graph theoretic or algebraic tools, the bound is built on a random coding scheme and optimal decoding, and has a simple polymatroidal single-letter expression. The utility of the inner bound is demonstrated by examples that include the capacity region for all index coding problems with up to five messages (there are 9846 nonisomorphic ones).

Submitted 15 June, 2013; v1 submitted 6 February, 2013; originally announced February 2013.

Comments: 5 pages, 6 figures, accepted to the 2013 IEEE International Symposium on Information Theory (ISIT), Istanbul, Turkey, July 2013

arXiv:1302.1258 [pdf, ps, other]

doi 10.1109/ISIT.2013.6620770

A Comparison of Superposition Coding Schemes

Authors: Lele Wang, Eren Sasoglu, Bernd Bandemer, Young-Han Kim

Abstract: There are two variants of superposition coding schemes. Cover's original superposition coding scheme has code clouds of identical shape, while Bergmans's superposition coding scheme has code clouds of independently generated shapes. These two schemes yield identical achievable rate regions in several scenarios, such as the capacity region for degraded broadcast channels. This paper shows that under the optimal maximum likelihood decoding, these two superposition coding schemes can result in different rate regions. In particular, it is shown that for the two-receiver broadcast channel, Cover's superposition coding scheme can achieve rates strictly larger than Bergmans's scheme.

Submitted 5 February, 2013; originally announced February 2013.

Comments: 5 pages, 3 figures, 1 table, submitted to IEEE International Symposium on Information Theory (ISIT 2013)

arXiv:1006.4255 [pdf, other]

doi 10.1109/TIT.2013.2268946

Polar codes for the two-user multiple-access channel

Authors: Eren Sasoglu, Emre Telatar, Edmund Yeh

Abstract: Arikan's polar coding method is extended to two-user multiple-access channels. It is shown that if the two users of the channel use the Arikan construction, the resulting channels will polarize to one of five possible extremals, on each of which uncoded transmission is optimal. The sum rate achieved by this coding technique is the one that corresponds to uniform input distributions. The encoding and decoding complexities and the error performance of these codes are as in the single-user case: $O(n\log n)$ for encoding and decoding, and $o(\exp(-n^{1/2-\epsilon}))$ for block error probability, where $n$ is the block length.

Submitted 22 June, 2010; originally announced June 2010.

Comments: 12 pages. Submitted to the IEEE Transactions on Information Theory

arXiv:1006.2006 [pdf, ps, other]

doi 10.1109/ISIT.2010.5513502

An entropy inequality for q-ary random variables and its application to channel polarization

Authors: Eren Sasoglu

Abstract: It is shown that given two copies of a q-ary input channel $W$, where q is prime, it is possible to create two channels $W^-$ and $W^+$ whose symmetric capacities satisfy $I(W^-)\le I(W)\le I(W^+)$, where the inequalities are strict except in trivial cases. This leads to a simple proof of channel polarization in the q-ary case.

Submitted 10 June, 2010; originally announced June 2010.

Comments: To be presented at the IEEE 2010 International Symposium on Information Theory
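
To make the inequality $I(W^-)\le I(W)\le I(W^+)$ in the abstract above concrete, here is a small sketch for the binary erasure channel (BEC), a standard special case with q = 2. It relies on the well-known fact that a BEC with erasure probability eps has symmetric capacity 1 - eps, and that one polarization step produces BECs with erasure probabilities 2*eps - eps^2 (for $W^-$) and eps^2 (for $W^+$). The function name `polarize_bec` is hypothetical, and the general q-ary argument of the paper is not reproduced here.

```python
def polarize_bec(eps, steps):
    """Erasure probabilities of the 2**steps synthetic channels obtained by
    repeatedly polarizing a BEC with erasure probability eps."""
    probs = [eps]
    for _ in range(steps):
        next_probs = []
        for e in probs:
            next_probs.append(2 * e - e * e)  # W^-: the degraded channel
            next_probs.append(e * e)          # W^+: the upgraded channel
        probs = next_probs
    return probs


# One step on a BEC(0.5): capacities 1 - e satisfy I(W^-) <= I(W) <= I(W^+).
minus, plus = polarize_bec(0.5, 1)
capacities = (1 - minus, 1 - 0.5, 1 - plus)
```

Each step leaves the average erasure probability unchanged, since (2*eps - eps^2 + eps^2) / 2 = eps, while the individual values drift toward 0 or 1 unless eps is already 0 or 1.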

arXiv:0908.0302 [pdf, ps, other]

Polarization for arbitrary discrete memoryless channels

Authors: Eren Sasoglu, Emre Telatar, Erdal Arikan

Abstract: Channel polarization, originally proposed for binary-input channels, is generalized to arbitrary discrete memoryless channels. Specifically, it is shown that when the input alphabet size is a prime number, a similar construction to that for the binary case leads to polarization. This method can be extended to channels of composite input alphabet sizes by decomposing such channels into a set of channels with prime input alphabet sizes. It is also shown that all discrete memoryless channels can be polarized by randomized constructions. The introduction of randomness does not change the order of complexity of polar code construction, encoding, and decoding. A previous result on the error probability behavior of polar codes is also extended to the case of arbitrary discrete memoryless channels. The generalization of polarization to channels with arbitrary finite input alphabet sizes leads to polar-coding methods for approaching the true (as opposed to symmetric) channel capacity of arbitrary channels with discrete or continuous input alphabets.

Submitted 3 August, 2009; originally announced August 2009.

Comments: 12 pages

arXiv:0901.0536 [pdf, ps, other]

Polar Codes: Characterization of Exponent, Bounds, and Constructions

Authors: Satish Babu Korada, Eren Sasoglu, Rudiger Urbanke

Abstract: Polar codes were recently introduced by Arıkan. They achieve the capacity of arbitrary symmetric binary-input discrete memoryless channels under a low complexity successive cancellation decoding strategy. The original polar code construction is closely related to the recursive construction of Reed-Muller codes and is based on the $2 \times 2$ matrix $\bigl[ \begin{smallmatrix} 1 & 0 \\ 1 & 1 \end{smallmatrix} \bigr]$. It was shown by Arıkan and Telatar that this construction achieves an error exponent of $\frac12$, i.e., that for sufficiently large blocklengths the error probability decays exponentially in the square root of the length. It was already mentioned by Arıkan that in principle larger matrices can be used to construct polar codes. A fundamental question then is to see whether there exist matrices with exponent exceeding $\frac12$. We first show that any $\ell \times \ell$ matrix none of whose column permutations is upper triangular polarizes symmetric channels. We then characterize the exponent of a given square matrix and derive upper and lower bounds on achievable exponents. Using these bounds we show that there are no matrices of size less than 15 with exponents exceeding $\frac12$. Further, we give a general construction based on BCH codes which for large $n$ achieves exponents arbitrarily close to 1 and which exceeds $\frac12$ for size 16.

Submitted 26 January, 2009; v1 submitted 5 January, 2009; originally announced January 2009.

Comments: Submitted to IEEE Transactions on Information Theory, minor updates

arXiv:0811.1770 [pdf, ps, other]

A Class of Transformations that Polarize Symmetric Binary-Input Memoryless Channels

Authors: Satish Babu Korada, Eren Sasoglu

Abstract: A generalization of Arıkan's polar code construction using transformations of the form $G^{\otimes n}$ where $G$ is an $\ell \times \ell$ matrix is considered. Necessary and sufficient conditions are given for these transformations to ensure channel polarization. It is shown that a large class of such transformations polarize symmetric binary-input memoryless channels.

Submitted 11 November, 2008; originally announced November 2008.

Comments: 7 pages, 1 figure
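
The transformation $G^{\otimes n}$ mentioned in the abstract above is an n-fold Kronecker power of the kernel $G$. The sketch below builds it over GF(2) in plain Python, using Arıkan's $2 \times 2$ kernel with rows (1, 0) and (1, 1) as the example. The helper names `kron_gf2` and `kron_power_gf2` are hypothetical, and bit-reversal permutations and the choice of frozen positions used in actual polar codes are deliberately left out.

```python
def kron_gf2(a, b):
    """Kronecker product of two 0/1 matrices (lists of rows) over GF(2)."""
    return [
        [x & y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def kron_power_gf2(g, n):
    """n-fold Kronecker power of g; the 0-fold power is the 1x1 identity."""
    result = [[1]]
    for _ in range(n):
        result = kron_gf2(result, g)
    return result


# Arıkan's 2x2 kernel and its second Kronecker power (a 4x4 matrix).
ARIKAN_KERNEL = [[1, 0], [1, 1]]
G2 = kron_power_gf2(ARIKAN_KERNEL, 2)
```

For an $\ell \times \ell$ kernel the result has $\ell^n$ rows and columns, so this explicit construction is only practical for small n; it is meant to show the structure, not to serve as an encoder.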

Showing 1–15 of 15 results for author: Sasoglu, E