Search | arXiv e-print repository

arXiv:1903.06135

Deep Switch Networks for Generating Discrete Data and Language

Abstract: Multilayer switch networks are proposed as artificial generators of high-dimensional discrete data (e.g., binary vectors, categorical data, natural language, network log files, and discrete-valued time series). Unlike deconvolution networks, which generate continuous-valued data and consist of upsampling filters and reverse pooling layers, multilayer switch networks are composed of adaptive switches that model conditional distributions of discrete random variables. An interpretable statistical framework is introduced for training these nonlinear networks based on a maximum-likelihood objective function. To learn network parameters, stochastic gradient descent is applied to the objective. This direct optimization is stable until convergence and involves neither back-propagation over separate encoder and decoder networks nor adversarial training of dueling networks. While training remains tractable for moderately sized networks, Markov chain Monte Carlo (MCMC) approximations of gradients are derived for deep networks that contain latent variables. The statistical framework is evaluated on synthetic data, high-dimensional binary data of handwritten digits, and web-crawled natural language data. Aspects of the model's framework, such as interpretability, computational complexity, and generalization ability, are discussed.

Submitted 14 March, 2019; originally announced March 2019.

Comments: To be presented at the AISTATS-2019 conference, 12 pages, double-column
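The abstract describes switches that model conditional distributions of discrete variables and are trained by stochastic gradient steps on a maximum-likelihood objective, but it does not give their parameterization. As a minimal sketch under that gap, the code below assumes (hypothetically) that a single switch is a table of softmax logits indexed by a discrete input state; the names `Switch`, `sgd_step`, and the synthetic table `TRUE` are illustrative only and are not taken from the paper. Every input is observed here, so the gradient is exact and no latent variables or MCMC approximations appear.

```python
import math
import random


def softmax(logits):
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Switch:
    """Hypothetical adaptive switch: a discrete input state selects one
    softmax-parameterized categorical distribution over output symbols."""

    def __init__(self, n_inputs, n_outputs):
        self.logits = [[0.0] * n_outputs for _ in range(n_inputs)]

    def prob(self, x):
        return softmax(self.logits[x])

    def sgd_step(self, x, y, lr):
        # d log p(y | x) / d logits[x][j] = 1[j == y] - p(j | x); step uphill on the log-likelihood.
        p = self.prob(x)
        row = self.logits[x]
        for j in range(len(row)):
            row[j] += lr * ((1.0 if j == y else 0.0) - p[j])

    def log_likelihood(self, data):
        return sum(math.log(self.prob(x)[y]) for x, y in data)


# Synthetic (x, y) pairs drawn from a made-up conditional table p(y | x).
TRUE = {0: [0.7, 0.2, 0.1], 1: [0.1, 0.3, 0.6]}
rng = random.Random(0)
data = []
for _ in range(20000):
    x = rng.randrange(2)
    y = rng.choices(range(3), weights=TRUE[x])[0]
    data.append((x, y))

switch = Switch(n_inputs=2, n_outputs=3)
ll_before = switch.log_likelihood(data)
for x, y in data:                          # one pass of plain SGD
    switch.sgd_step(x, y, lr=0.005)
ll_after = switch.log_likelihood(data)

for x in TRUE:
    print(x, [round(p, 2) for p in switch.prob(x)])
```

Because the input state is fully observed, each step updates only one row of logits; a deep network whose intermediate switch states are hidden would instead need gradient approximations such as the MCMC ones the abstract mentions.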

arXiv:1809.05073

doi 10.1109/TIT.2020.3016605

Channel Polarization through the Lens of Blackwell Measures

Authors: Naveen Goela, Maxim Raginsky

Abstract: Each memoryless binary-input channel (BIC) can be uniquely described by its Blackwell measure, which is a probability distribution on the unit interval $[0,1]$ with mean $1/2$. Conversely, any such probability distribution defines a BIC. The evolution of the Blackwell measure under Arikan's polar transform is derived for general BICs, and is analogous to density evolution as cited in the literature. The present analysis emphasizes functional equations. Consequently, the evolution of a variety of channel functionals is characterized, including the symmetric capacity, Bhattacharyya parameter, moments of information density, Hellinger affinity, Gallager's reliability function, the Hirschfeld-Gebelein-Renyi maximal correlation, and the Bayesian information gain. The evolution of measure is specialized for symmetric BICs according to their decomposition into binary symmetric (sub)-channels (BSCs), which simplifies iterative computations and the construction of polar codes. It is verified that, as a consequence of the Blackwell--Sherman--Stein theorem, all channel functionals $\mathrm{I}_f$ that can be expressed as an expectation of a convex function $f$ with respect to the Blackwell measure of a channel polarize in each iteration due to the polar transformation on the class of symmetric BICs. Moreover, for $f$ either convex or non-convex, a necessary and sufficient condition is established to determine whether the random process associated with each $\mathrm{I}_f$ is a martingale, submartingale, or supermartingale.
Represented via functional inequalities in terms of $f$, this condition is numerically verifiable for all $\mathrm{I}_f$, and can generate analytical proofs. To exhibit one such proof, it is shown that the random process associated with the squared maximal correlation parameter is a supermartingale, and converges almost surely on the unit interval $[0,1]$.

Submitted 7 September, 2020; v1 submitted 13 September, 2018; originally announced September 2018.

Comments: 20 pages, double-column, 5 figures

Journal ref: IEEE Transactions on Information Theory, vol. 66, no. 10, pp. 6222-6241, Oct. 2020
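A short numerical sketch of the measure evolution, restricted to finitely supported measures. It assumes the Blackwell measure is the law of the posterior $P(X=0 \mid Y)$ under a uniform input, and that Arikan's kernel maps $(U_1, U_2)$ to $(X_1, X_2) = (U_1 \oplus U_2, U_2)$. The update rules in `polar_minus` and `polar_plus` are derived here from Bayes' rule under those assumptions rather than copied from the paper, and all function names are hypothetical. The symmetric capacity and the Bhattacharyya parameter are computed as expectations $\mathrm{I}_f$ with respect to the measure.

```python
import math
from collections import defaultdict


def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def functional(mu, f):
    """I_f = E_mu[f(X)] for a finitely supported measure mu = {point: mass}."""
    return sum(mass * f(x) for x, mass in mu.items())


def capacity(mu):                      # symmetric capacity: 1 - E[h2(X)]
    return 1.0 - functional(mu, h2)


def bhattacharyya(mu):                 # Z = E[2 sqrt(X (1 - X))]
    return functional(mu, lambda x: 2.0 * math.sqrt(max(0.0, x * (1.0 - x))))


def _add(out, x, mass):
    if mass > 0.0:
        out[round(x, 12)] += mass      # merge numerically identical support points


def polar_minus(mu):
    # U1 decoded from (Y1, Y2): posterior a*b + (1-a)*(1-b), with a, b i.i.d. ~ mu.
    out = defaultdict(float)
    for a, pa in mu.items():
        for b, pb in mu.items():
            _add(out, a * b + (1 - a) * (1 - b), pa * pb)
    return dict(out)


def polar_plus(mu):
    # U2 decoded from (Y1, Y2, U1); given (a, b), U1 = 0 w.p. w0 and U1 = 1 w.p. w1.
    out = defaultdict(float)
    for a, pa in mu.items():
        for b, pb in mu.items():
            w0 = a * b + (1 - a) * (1 - b)
            w1 = a * (1 - b) + (1 - a) * b
            if w0 > 0.0:
                _add(out, a * b / w0, pa * pb * w0)
            if w1 > 0.0:
                _add(out, (1 - a) * b / w1, pa * pb * w1)
    return dict(out)


def bsc(p):
    # Posterior P(X = 0 | Y) equals 1 - p or p, each with probability 1/2.
    mu = defaultdict(float)
    mu[1.0 - p] += 0.5
    mu[p] += 0.5
    return dict(mu)


def bec(eps):
    # Unerased outputs reveal X exactly (posterior 0 or 1); erasures give 1/2.
    return {0.0: (1.0 - eps) / 2, 1.0: (1.0 - eps) / 2, 0.5: eps}


w = bsc(0.11)
w_minus, w_plus = polar_minus(w), polar_plus(w)
for name, mu in [("W", w), ("W-", w_minus), ("W+", w_plus)]:
    print(name, round(capacity(mu), 4), round(bhattacharyya(mu), 4))
```

Two standard identities act as sanity checks: the capacities of the two synthetic channels average to the original capacity (the martingale property for this functional), and the Bhattacharyya parameter of the better channel is the square of the original one. The support can grow quadratically per step; the abstract notes that decomposing symmetric BICs into BSCs simplifies such iterative computations.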

arXiv:1301.6150

doi 10.1109/TIT.2014.2378172

Polar Codes For Broadcast Channels

Authors: Naveen Goela, Emmanuel Abbe, Michael Gastpar

Abstract: Polar codes are introduced for discrete memoryless broadcast channels. For $m$-user deterministic broadcast channels, polarization is applied to map uniformly random message bits from $m$ independent messages to one codeword while satisfying broadcast constraints. The polarization-based codes achieve rates on the boundary of the private-message capacity region. For two-user noisy broadcast channels, polar implementations are presented for two information-theoretic schemes: i) Cover's superposition codes; ii) Marton's codes. Due to the structure of polarization, constraints on the auxiliary and channel-input distributions are identified to ensure proper alignment of polarization indices in the multi-user setting. The codes achieve rates on the capacity boundary of a few classes of broadcast channels (e.g., binary-input stochastically degraded). The complexity of encoding and decoding is $O(n \log n)$, where $n$ is the block length. In addition, polar code sequences obtain a stretched-exponential decay of $O(2^{-n^{\beta}})$ in the average block error probability, where $0 < \beta < 0.5$.

Submitted 25 January, 2013; originally announced January 2013.

Comments: 25 pages, double-column, 7 figures

Journal ref: IEEE Transactions on Information Theory, vol. 61, no. 2, pp. 758-782, Feb. 2015
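The $O(n \log n)$ encoding cost comes from the recursive structure of the polar transform. The sketch below is plain single-user polar encoding, not the broadcast construction: for a block length $n$ that is a power of two it computes $x = u F^{\otimes \log_2 n}$ over GF(2) with $F = \begin{pmatrix}1&0\\1&1\end{pmatrix}$ and, as a simplifying assumption, omits the bit-reversal permutation that appears in Arikan's original generator matrix. Frozen-bit selection and the alignment of polarization indices across users are not shown; the dense encoder is a hypothetical reference used only to check the butterfly.

```python
import random


def polar_encode(u):
    """x = u F^{(x)log2 N} over GF(2) with F = [[1, 0], [1, 1]] and no bit-reversal.

    log2(N) butterfly stages of N/2 XORs each, so O(N log N) operations.
    """
    x = list(u)
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("block length must be a power of two")
    half = 1
    while half < N:
        for start in range(0, N, 2 * half):
            for k in range(start, start + half):
                x[k] ^= x[k + half]    # (a, b) -> (a XOR b, b) on each butterfly
        half *= 2
    return x


def kron_generator(n):
    """Dense F^{(x)n} as 0/1 rows; O(N^2) entries, used only as a reference."""
    F = [[1, 0], [1, 1]]
    G = [[1]]
    for _ in range(n):
        G = [[G[i][j] * F[a][b] for j in range(len(G)) for b in range(2)]
             for i in range(len(G)) for a in range(2)]
    return G


def dense_encode(u, G):
    """Reference O(N^2) product x_j = sum_i u_i G[i][j] mod 2."""
    N = len(u)
    return [sum(u[i] * G[i][j] for i in range(N)) % 2 for j in range(N)]


rng = random.Random(1)
for n in range(1, 6):
    u = [rng.randrange(2) for _ in range(2 ** n)]
    assert polar_encode(u) == dense_encode(u, kron_generator(n))
```

Each stage XORs disjoint pairs, and the stages commute because each acts on a different bit of the index, which is why the in-place loop reproduces the Kronecker-product generator.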

arXiv:1209.3358

Computation in Multicast Networks: Function Alignment and Converse Theorems

Authors: Changho Suh, Naveen Goela, Michael Gastpar

Abstract: The classical problem in network coding theory considers communication over multicast networks. Multiple transmitters send independent messages to multiple receivers which decode the same set of messages. In this work, computation over multicast networks is considered: each receiver decodes an identical function of the original messages. For a countably infinite class of two-transmitter two-receiver single-hop linear deterministic networks, the computing capacity is characterized for a linear function (modulo-2 sum) of Bernoulli sources. Inspired by the geometric concept of interference alignment in networks, a new achievable coding scheme called function alignment is introduced. A new converse theorem is established that is tighter than cut-set based and genie-aided bounds. Computation (vs. communication) over multicast networks requires additional analysis to account for multiple receivers sharing a network's computational resources. We also develop a network decomposition theorem which identifies elementary parallel subnetworks that can constitute an original network without loss of optimality. The decomposition theorem provides a conceptually simpler algebraic proof of achievability that generalizes to $L$-transmitter $L$-receiver networks.

Submitted 16 February, 2016; v1 submitted 15 September, 2012; originally announced September 2012.

Comments: to appear in the IEEE Transactions on Information Theory
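The abstract does not spell out function alignment, so the sketch below only illustrates the linear deterministic channel model these networks are built on, and why alignment matters for computing a modulo-2 sum. It assumes the common shift convention: each input is a vector of $q$ bit levels with level 0 most significant, and a link of gain $n$ delivers the top $n$ levels of its input shifted down by $q - n$. The helper names and the bit values are hypothetical.

```python
def shift_down(x, s):
    """S^s x: shift the bit vector x (most significant level first) down by s levels."""
    q = len(x)
    s = min(s, q)
    return [0] * s + x[:q - s]


def adt_receive(inputs, gains, q):
    """Received signal y = XOR_k S^(q - n_k) x_k at one receiver of a linear deterministic network."""
    y = [0] * q
    for x, n in zip(inputs, gains):
        for level, bit in enumerate(shift_down(x, q - n)):
            y[level] ^= bit
    return y


q = 3
a = [1, 0, 1]   # transmitter 1's q-level input (hypothetical bits)
b = [1, 1, 0]   # transmitter 2's q-level input

# Equal gains (2, 2): the top two levels of a and b arrive aligned,
# so this receiver observes their modulo-2 sums directly.
y_aligned = adt_receive([a, b], gains=[2, 2], q=q)

# Gains (2, 1): the levels are misaligned, so a[0] arrives alone
# and a[1] is mixed with b[0] instead of with b[1].
y_misaligned = adt_receive([a, b], gains=[2, 1], q=q)
print(y_aligned, y_misaligned)
```

With unequal gains, simply sending message bits on the top levels no longer hands each receiver the desired sums, which is the kind of mismatch a coding scheme for multiple receivers has to resolve.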

arXiv:1205.4168

Approximate Feedback Capacity of the Gaussian Multicast Channel

Authors: Changho Suh, Naveen Goela, Michael Gastpar

Abstract: We characterize the capacity region to within $\log\{2(M-1)\}$ bits/s/Hz for the $M$-transmitter $K$-receiver Gaussian multicast channel with feedback, where each receiver wishes to decode every message from the $M$ transmitters. Extending Cover-Leung's achievable scheme intended for $(M,K)=(2,1)$, we show that this generalized scheme achieves the cutset-based outer bound within $\log\{2(M-1)\}$ bits per transmitter for all channel parameters. In contrast to the capacity in the non-feedback case, the feedback capacity improves upon the naive intersection of the feedback capacities of $K$ individual multiple access channels. We find that feedback provides unbounded multiplicative gain at high signal-to-noise ratios, as was shown in the Gaussian interference channel. To complement the results, we establish the exact feedback capacity of the Avestimehr-Diggavi-Tse (ADT) deterministic model, from which we make the observation that feedback can also be beneficial for function computation.

Submitted 18 May, 2012; originally announced May 2012.

Comments: Extended version of a conference paper that appears in ISIT 2012
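For a sense of scale, the gap $\log\{2(M-1)\}$ stated in the abstract can be tabulated. This sketch assumes a base-2 logarithm, since the gap is measured in bits; `feedback_gap_bits` is a hypothetical helper name. The gap depends only on the number of transmitters $M$, not on $K$ or on the channel gains.

```python
import math


def feedback_gap_bits(M):
    """Gap log2(2(M - 1)) for M >= 2 transmitters (base-2 logarithm assumed)."""
    if M < 2:
        raise ValueError("the bound is stated for M >= 2 transmitters")
    return math.log2(2 * (M - 1))


for M in range(2, 6):
    print(M, round(feedback_gap_bits(M), 3))
```

For $M = 2, 3, 4, 5$ this gives 1, 2, about 2.585, and 3 bits, so the gap grows only logarithmically with the number of transmitters.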

arXiv:1202.6299

doi 10.1109/TSP.2012.2188716

Reduced-Dimension Linear Transform Coding of Correlated Signals in Networks

Authors: Naveen Goela, Michael Gastpar

Abstract: A model, called the linear transform network (LTN), is proposed to analyze the compression and estimation of correlated signals transmitted over directed acyclic graphs (DAGs). An LTN is a DAG network with multiple source and receiver nodes. Source nodes transmit subspace projections of random correlated signals by applying reduced-dimension linear transforms. The subspace projections are linearly processed by multiple relays and routed to intended receivers. Each receiver applies a linear estimator to approximate a subset of the sources with minimum mean squared error (MSE) distortion. The model is extended to include noisy networks with power constraints on transmitters. A key task is to compute all local compression matrices and linear estimators in the network to minimize end-to-end distortion. The non-convex problem is solved iteratively within an optimization framework using constrained quadratic programs (QPs). The proposed algorithm recovers as special cases the regular and distributed Karhunen-Loeve transforms (KLTs). Cut-set lower bounds on the distortion region of multi-source, multi-receiver networks are given for linear coding based on convex relaxations. Cut-set lower bounds are also given for any coding strategy based on information theory. The distortion region and compression-estimation tradeoffs are illustrated for different communication demands (e.g., multiple unicast) and graph structures.

Submitted 28 February, 2012; originally announced February 2012.

Comments: 33 pages, 7 figures, To appear in IEEE Transactions on Signal Processing

Journal ref: IEEE Transactions on Signal Processing, vol. 60, no. 6, pp. 3174-3187, June 2012
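In the simplest special case (one source, one receiver, no relays, no noise), compressing a zero-mean vector with a reduced-dimension linear transform and reconstructing it with a linear MMSE estimator is optimized by the KLT, which the abstract says the proposed algorithm recovers. The 2-D sketch below uses hypothetical covariance values and helper names. It checks that projecting onto the top eigenvector attains a distortion equal to the smallest eigenvalue, and that no other one-dimensional projection on a fine angle grid does better. It does not implement the paper's iterative QP algorithm for general DAGs.

```python
import math


def eig_sym2(a, b, c):
    """Eigenvalues (descending) and a top unit eigenvector of [[a, b], [b, c]]."""
    mid = (a + c) / 2.0
    rad = math.hypot((a - c) / 2.0, b)
    lam_max, lam_min = mid + rad, mid - rad
    if b != 0.0:
        v = (b, lam_max - a)           # solves (Sigma - lam_max I) v = 0
    else:
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(*v)
    return lam_max, lam_min, (v[0] / norm, v[1] / norm)


def lmmse_distortion(a, b, c, u):
    """MSE of the linear MMSE estimate of zero-mean x from y = u^T x, Cov(x) = [[a, b], [b, c]]:
    tr(Sigma) - (u^T Sigma^2 u) / (u^T Sigma u)."""
    s_u = (a * u[0] + b * u[1], b * u[0] + c * u[1])   # Sigma u
    quad = u[0] * s_u[0] + u[1] * s_u[1]                # u^T Sigma u
    quad2 = s_u[0] ** 2 + s_u[1] ** 2                   # u^T Sigma^2 u
    return (a + c) - quad2 / quad


A, B, C = 2.0, 0.8, 1.0                # hypothetical covariance entries
lam_max, lam_min, v = eig_sym2(A, B, C)
angles = [math.pi * k / 1000 for k in range(1000)]
best = min(lmmse_distortion(A, B, C, (math.cos(t), math.sin(t))) for t in angles)
print(lam_min, lmmse_distortion(A, B, C, v), best)
```

Keeping the top eigen-direction discards exactly the smallest eigenvalue's worth of energy, which is the classical KLT result that the LTN framework generalizes to multiple nodes.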

arXiv:1105.2621

A Compressed Sensing Wire-Tap Channel

Authors: Galen Reeves, Naveen Goela, Nebojsa Milosavljevic, Michael Gastpar

Abstract: A multiplicative Gaussian wire-tap channel inspired by compressed sensing is studied. Lower and upper bounds on the secrecy capacity are derived, and shown to be relatively tight in the large system limit for a large class of compressed sensing matrices. Surprisingly, it is shown that the secrecy capacity of this channel is nearly equal to the capacity without any secrecy constraint provided that the channel of the eavesdropper is strictly worse than the channel of the intended receiver. In other words, the eavesdropper can see almost everything and yet learn almost nothing. This behavior, which contrasts sharply with that of many commonly studied wiretap channels, is made possible by the fact that a small number of linear projections can make a crucial difference in the ability to estimate sparse vectors.

Submitted 13 May, 2011; originally announced May 2011.

Showing 1–7 of 7 results for author: Goela, N