-
Deep Switch Networks for Generating Discrete Data and Language
Authors:
Payam Delgosha,
Naveen Goela
Abstract:
Multilayer switch networks are proposed as artificial generators of high-dimensional discrete data (e.g., binary vectors, categorical data, natural language, network log files, and discrete-valued time series). Unlike deconvolution networks which generate continuous-valued data and which consist of upsampling filters and reverse pooling layers, multilayer switch networks are composed of adaptive switches which model conditional distributions of discrete random variables. An interpretable, statistical framework is introduced for training these nonlinear networks based on a maximum-likelihood objective function. To learn network parameters, stochastic gradient descent is applied to the objective. This direct optimization is stable until convergence, and does not involve back-propagation over separate encoder and decoder networks, or adversarial training of dueling networks. While training remains tractable for moderately sized networks, Markov-chain Monte Carlo (MCMC) approximations of gradients are derived for deep networks which contain latent variables. The statistical framework is evaluated on synthetic data, high-dimensional binary data of handwritten digits, and web-crawled natural language data. Aspects of the model's framework such as interpretability, computational complexity, and generalization ability are discussed.
Submitted 14 March, 2019;
originally announced March 2019.
-
Channel Polarization through the Lens of Blackwell Measures
Authors:
Naveen Goela,
Maxim Raginsky
Abstract:
Each memoryless binary-input channel (BIC) can be uniquely described by its Blackwell measure, which is a probability distribution on the unit interval $[0,1]$ with mean $1/2$. Conversely, any such probability distribution defines a BIC. The evolution of the Blackwell measure under Arikan's polar transform is derived for general BICs, and is analogous to density evolution as cited in the literature. The present analysis emphasizes functional equations. Consequently, the evolution of a variety of channel functionals is characterized, including the symmetric capacity, Bhattacharyya parameter, moments of information density, Hellinger affinity, Gallager's reliability function, the Hirschfeld-Gebelein-Renyi maximal correlation, and the Bayesian information gain. The evolution of measure is specialized for symmetric BICs according to their decomposition into binary symmetric (sub)-channels (BSCs), which simplifies iterative computations and the construction of polar codes. It is verified that, as a consequence of the Blackwell--Sherman--Stein theorem, all channel functionals $\mathrm{I}_f$ that can be expressed as an expectation of a convex function $f$ with respect to the Blackwell measure of a channel polarize in each iteration due to the polar transformation on the class of symmetric BICs. Moreover, for $f$ either convex or non-convex, a necessary and sufficient condition is established to determine whether the random process associated with each $\mathrm{I}_f$ is a martingale, submartingale, or supermartingale. Represented via functional inequalities in terms of $f$, this condition is numerically verifiable for all $\mathrm{I}_f$, and can generate analytical proofs. To exhibit one such proof, it is shown that the random process associated with the squared maximal correlation parameter is a supermartingale, and converges almost surely on the unit interval $[0,1]$.
Submitted 7 September, 2020; v1 submitted 13 September, 2018;
originally announced September 2018.
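As an illustration of the abstract above, the following minimal sketch computes a Blackwell measure for a finite-output BIC. It assumes the measure is taken to be the distribution of the posterior probability P(X=0 | Y) under a uniform input, which is consistent with the stated mean of 1/2. The function names (`blackwell_measure`, `symmetric_capacity`, `bec_polar_step`) are hypothetical and not taken from the paper. The last helper uses the well-known closed form for the binary erasure channel only; it is not the general evolution derived in the paper.

```python
from math import log2


def blackwell_measure(w0, w1):
    """Return {posterior P(X=0 | Y=y): probability} under a uniform input.

    w0[y] and w1[y] are the transition probabilities W(y|0) and W(y|1).
    Assumption: this posterior distribution is what the abstract calls
    the Blackwell measure.
    """
    measure = {}
    for p0, p1 in zip(w0, w1):
        p_y = 0.5 * (p0 + p1)            # output probability under uniform input
        if p_y == 0.0:
            continue                      # this output symbol never occurs
        s = round(0.5 * p0 / p_y, 12)     # posterior of X=0; rounding merges equal atoms
        measure[s] = measure.get(s, 0.0) + p_y
    return measure


def binary_entropy(s):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    return 0.0 if s in (0.0, 1.0) else -s * log2(s) - (1 - s) * log2(1 - s)


def symmetric_capacity(measure):
    """Symmetric capacity as 1 minus the expected binary entropy of the posterior."""
    return 1.0 - sum(p * binary_entropy(s) for s, p in measure.items())


def bec_polar_step(eps):
    """One polar step for a BEC with erasure probability eps (known special case)."""
    return 2 * eps - eps * eps, eps * eps   # (degraded channel, upgraded channel)


# Binary symmetric channel with crossover 0.1; outputs ordered (0, 1).
bsc = blackwell_measure([0.9, 0.1], [0.1, 0.9])
# Binary erasure channel with erasure probability 0.3; outputs ordered (0, erasure, 1).
bec = blackwell_measure([0.7, 0.3, 0.0], [0.0, 0.3, 0.7])
```

In this sketch both measures have mean 1/2, and for the erasure channel the two synthesized channels from `bec_polar_step` have symmetric capacities that average to the original capacity, which is the martingale behaviour the abstract refers to for the symmetric capacity.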
-
Polar Codes For Broadcast Channels
Authors:
Naveen Goela,
Emmanuel Abbe,
Michael Gastpar
Abstract:
Polar codes are introduced for discrete memoryless broadcast channels. For $m$-user deterministic broadcast channels, polarization is applied to map uniformly random message bits from $m$ independent messages to one codeword while satisfying broadcast constraints. The polarization-based codes achieve rates on the boundary of the private-message capacity region. For two-user noisy broadcast channels, polar implementations are presented for two information-theoretic schemes: i) Cover's superposition codes; ii) Marton's codes. Due to the structure of polarization, constraints on the auxiliary and channel-input distributions are identified to ensure proper alignment of polarization indices in the multi-user setting. The codes achieve rates on the capacity boundary of a few classes of broadcast channels (e.g., binary-input stochastically degraded). The complexity of encoding and decoding is $O(n \log n)$ where $n$ is the block length. In addition, polar code sequences obtain a stretched-exponential decay of $O(2^{-n^{\beta}})$ of the average block error probability where $0 < \beta < 0.5$.
Submitted 25 January, 2013;
originally announced January 2013.
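The $O(n \log n)$ encoding complexity quoted in the abstract above is characteristic of the butterfly structure of Arikan's polar transform. The sketch below shows only that basic single-user transform over GF(2), without bit-reversal and without any of the broadcast-specific index alignment the paper describes. The function name `polar_transform` is hypothetical.

```python
def polar_transform(u):
    """Apply the n-fold Kronecker power of F = [[1, 0], [1, 1]] to a bit list over GF(2).

    The input length must be a power of two. The loop performs log2(n) stages
    of n/2 XOR operations each, i.e. O(n log n) work in total.
    """
    n = len(u)
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    x = list(u)
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]      # butterfly: upper branch absorbs lower branch
        step *= 2
    return x


codeword = polar_transform([0, 0, 0, 1])
```

Because F squared is the identity over GF(2), applying `polar_transform` twice returns the original bits, which gives a quick sanity check for the sketch.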
-
Computation in Multicast Networks: Function Alignment and Converse Theorems
Authors:
Changho Suh,
Naveen Goela,
Michael Gastpar
Abstract:
The classical problem in network coding theory considers communication over multicast networks. Multiple transmitters send independent messages to multiple receivers which decode the same set of messages. In this work, computation over multicast networks is considered: each receiver decodes an identical function of the original messages. For a countably infinite class of two-transmitter two-receiver single-hop linear deterministic networks, the computing capacity is characterized for a linear function (modulo-2 sum) of Bernoulli sources. Inspired by the geometric concept of interference alignment in networks, a new achievable coding scheme called function alignment is introduced. A new converse theorem is established that is tighter than cut-set based and genie-aided bounds. Computation (vs. communication) over multicast networks requires additional analysis to account for multiple receivers sharing a network's computational resources. We also develop a network decomposition theorem which identifies elementary parallel subnetworks that can constitute an original network without loss of optimality. The decomposition theorem provides a conceptually-simpler algebraic proof of achievability that generalizes to $L$-transmitter $L$-receiver networks.
Submitted 16 February, 2016; v1 submitted 15 September, 2012;
originally announced September 2012.
-
Approximate Feedback Capacity of the Gaussian Multicast Channel
Authors:
Changho Suh,
Naveen Goela,
Michael Gastpar
Abstract:
We characterize the capacity region to within log{2(M-1)} bits/s/Hz for the M-transmitter K-receiver Gaussian multicast channel with feedback where each receiver wishes to decode every message from the M transmitters. Extending Cover-Leung's achievable scheme intended for (M,K)=(2,1), we show that this generalized scheme achieves the cutset-based outer bound within log{2(M-1)} bits per transmitter for all channel parameters. In contrast to the capacity in the non-feedback case, the feedback capacity improves upon the naive intersection of the feedback capacities of K individual multiple access channels. We find that feedback provides unbounded multiplicative gain at high signal-to-noise ratios as was shown in the Gaussian interference channel. To complement the results, we establish the exact feedback capacity of the Avestimehr-Diggavi-Tse (ADT) deterministic model, from which we make the observation that feedback can also be beneficial for function computation.
Submitted 18 May, 2012;
originally announced May 2012.
-
Reduced-Dimension Linear Transform Coding of Correlated Signals in Networks
Authors:
Naveen Goela,
Michael Gastpar
Abstract:
A model, called the linear transform network (LTN), is proposed to analyze the compression and estimation of correlated signals transmitted over directed acyclic graphs (DAGs). An LTN is a DAG network with multiple source and receiver nodes. Source nodes transmit subspace projections of random correlated signals by applying reduced-dimension linear transforms. The subspace projections are linearly processed by multiple relays and routed to intended receivers. Each receiver applies a linear estimator to approximate a subset of the sources with minimum mean squared error (MSE) distortion. The model is extended to include noisy networks with power constraints on transmitters. A key task is to compute all local compression matrices and linear estimators in the network to minimize end-to-end distortion. The non-convex problem is solved iteratively within an optimization framework using constrained quadratic programs (QPs). The proposed algorithm recovers as special cases the regular and distributed Karhunen-Loeve transforms (KLTs). Cut-set lower bounds on the distortion region of multi-source, multi-receiver networks are given for linear coding based on convex relaxations. Cut-set lower bounds are also given for any coding strategy based on information theory. The distortion region and compression-estimation tradeoffs are illustrated for different communication demands (e.g. multiple unicast), and graph structures.
Submitted 28 February, 2012;
originally announced February 2012.
-
A Compressed Sensing Wire-Tap Channel
Authors:
Galen Reeves,
Naveen Goela,
Nebojsa Milosavljevic,
Michael Gastpar
Abstract:
A multiplicative Gaussian wire-tap channel inspired by compressed sensing is studied. Lower and upper bounds on the secrecy capacity are derived, and shown to be relatively tight in the large system limit for a large class of compressed sensing matrices. Surprisingly, it is shown that the secrecy capacity of this channel is nearly equal to the capacity without any secrecy constraint provided that the channel of the eavesdropper is strictly worse than the channel of the intended receiver. In other words, the eavesdropper can see almost everything and yet learn almost nothing. This behavior, which contrasts sharply with that of many commonly studied wiretap channels, is made possible by the fact that a small number of linear projections can make a crucial difference in the ability to estimate sparse vectors.
Submitted 13 May, 2011;
originally announced May 2011.