-
Message-Relevant Dimension Reduction of Neural Populations
Authors:
Amanda Merkley,
Alice Y. Nam,
Y. Kate Hong,
Pulkit Grover
Abstract:
Quantifying relevant interactions between neural populations is a prominent question in the analysis of high-dimensional neural recordings. However, existing dimension reduction methods often discuss communication in the absence of a formal framework, while frameworks proposed to address this gap are impractical in data analysis. This work bridges the formal framework of M-Information Flow with practical analysis of real neural data. To this end, we propose Iterative Regression, a message-dependent linear dimension reduction technique that iteratively finds an orthonormal basis such that each basis vector maximizes correlation between the projected data and the message. We then define 'M-forwarding' to formally capture the notion of a message being forwarded from one neural population to another. We apply our methodology to recordings we collected from two neural populations in a simplified model of whisker-based sensory detection in mice, and show that the low-dimensional M-forwarding structure we infer supports biological evidence of a similar structure between the two original, high-dimensional populations.
Submitted 2 July, 2024;
originally announced July 2024.
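One way to read the Iterative Regression description above is sketched below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `iterative_regression` and the parameters `X`, `m`, and `n_components` are hypothetical, the message is taken to be a scalar per trial, and ordinary least squares is used at each step because, for centered data, the least-squares direction maximizes the correlation between the projected data and the message. After each step the found direction is projected out of the data so that the next direction is orthogonal to the earlier ones.

```python
import numpy as np

def iterative_regression(X, m, n_components):
    """Sketch: find orthonormal directions, each maximizing correlation with m.

    X: (n_samples, n_features) data matrix (hypothetical name).
    m: (n_samples,) scalar message per sample (hypothetical name).
    Returns an (n_features, n_found) matrix with orthonormal columns.
    """
    X = X - X.mean(axis=0)          # center the data
    m = m - m.mean()                # center the message
    residual = X.copy()
    basis = []
    for _ in range(n_components):
        # Least-squares direction; the explicit rcond discards the directions
        # that were already projected out of `residual`.
        w, *_ = np.linalg.lstsq(residual, m, rcond=1e-10)
        # Re-orthogonalize against earlier directions for numerical safety.
        for b in basis:
            w = w - (w @ b) * b
        norm = np.linalg.norm(w)
        if norm < 1e-12:            # nothing message-relevant is left
            break
        w = w / norm
        basis.append(w)
        # Deflate: remove the component of every sample along w.
        residual = residual - np.outer(residual @ w, w)
    return np.column_stack(basis)
```

Projecting the centered data onto the returned columns (`X @ B`) gives the low-dimensional representation; whether the paper deflates in exactly this way is an assumption of the sketch.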
-
OpportunityFinder: A Framework for Automated Causal Inference
Authors:
Huy Nguyen,
Prince Grover,
Devashish Khatwani
Abstract:
We introduce OpportunityFinder, a code-less framework for performing a variety of causal inference studies with panel data for non-expert users. In its current state, OpportunityFinder only requires users to provide raw observational data and a configuration file. A pipeline is then triggered that inspects and processes the data and chooses the suitable algorithm(s) to execute the causal study. It returns the causal impact of the treatment on the configured outcome, together with sensitivity and robustness results. Causal inference is widely studied and used to estimate the downstream impact of individuals' interactions with products and features. These causal studies are commonly performed periodically by scientists and/or economists, so business stakeholders are often bottlenecked on scientist or economist bandwidth. We offer OpportunityFinder as a solution for commonly performed causal studies with four key features: (1) ease of use for both Business Analysts and Scientists, (2) abstraction of multiple algorithms under a single I/O interface, (3) support for causal impact analysis under binary treatment with panel data, and (4) dynamic selection of algorithm based on scale of data.
Submitted 22 September, 2023;
originally announced September 2023.
-
Computing Unique Information for Poisson and Multinomial Systems
Authors:
Chaitanya Goswami,
Amanda Merkley,
Pulkit Grover
Abstract:
Bivariate Partial Information Decomposition (PID) describes how the mutual information between a random variable M and two random variables Y and Z is decomposed into unique, redundant, and synergistic terms. Recently, PID has shown promise as an emerging tool to understand biological systems and biases in machine learning. However, computing PID is a challenging problem as it typically involves optimizing over distributions. In this work, we study the problem of computing PID in two systems: the Poisson system inspired by the 'ideal Poisson channel' and the multinomial system inspired by multinomial thinning, for a scalar M. We provide sufficient conditions for both systems under which closed-form expressions for many operationally-motivated PID can be obtained, thereby allowing us to easily compute PID for these systems. Our proof consists of showing that one of the unique information terms is zero, which allows the remaining unique, redundant, and synergistic terms to be easily computed using only the marginal and the joint mutual information.
Submitted 11 May, 2023;
originally announced May 2023.
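The last step of the abstract above can be made concrete with the standard bivariate PID consistency equations. The notation ($UI$, $RI$, $SI$) and the choice of which unique term vanishes are assumptions made here for illustration; the abstract only states that one unique information term is zero.

```latex
% Standard bivariate PID consistency equations:
\begin{align}
I(M;Y,Z) &= UI(M;Y\setminus Z) + UI(M;Z\setminus Y) + RI(M;Y,Z) + SI(M;Y,Z),\\
I(M;Y)   &= UI(M;Y\setminus Z) + RI(M;Y,Z),\\
I(M;Z)   &= UI(M;Z\setminus Y) + RI(M;Y,Z).
\end{align}
% Assuming UI(M;Z\setminus Y) = 0, the remaining terms follow from
% the marginal and joint mutual informations alone:
\begin{align}
RI(M;Y,Z)         &= I(M;Z),\\
UI(M;Y\setminus Z) &= I(M;Y) - I(M;Z),\\
SI(M;Y,Z)         &= I(M;Y,Z) - I(M;Y).
\end{align}
```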
-
Extracting Unique Information Through Markov Relations
Authors:
Keerthana Gurushankar,
Praveen Venkatesh,
Pulkit Grover
Abstract:
We propose two new measures for extracting the unique information in $X$ and not $Y$ about a message $M$, when $X, Y$ and $M$ are jointly distributed random variables with a given joint distribution. We take a Markov-based approach, motivated by questions in fair machine learning, and inspired by similar Markov-based optimization problems that have been used in the Information Bottleneck and Common Information frameworks. We obtain a complete characterization of our definitions in the Gaussian case (namely, when $X, Y$ and $M$ are jointly Gaussian), under the assumption of Gaussian optimality. We also examine the consistency of our definitions with the partial information decomposition (PID) framework, and show that these Markov-based definitions achieve non-negativity, but not symmetry, within the PID framework.
Submitted 26 October, 2022;
originally announced October 2022.
-
Fraud Dataset Benchmark and Applications
Authors:
Prince Grover,
Julia Xu,
Justin Tittelfitz,
Anqi Cheng,
Zheng Li,
Jakub Zablocki,
Jianbo Liu,
Hao Zhou
Abstract:
Standardized datasets and benchmarks have spurred innovations in computer vision, natural language processing, multi-modal and tabular settings. We note that, as compared to other well-researched fields, fraud detection has unique challenges: high class imbalance, diverse feature types, frequently changing fraud patterns, and the adversarial nature of the problem. Because of these, modeling approaches evaluated on datasets from other research fields may not work well for fraud detection. In this paper, we introduce Fraud Dataset Benchmark (FDB), a compilation of publicly available datasets catered to fraud detection. FDB comprises a variety of fraud-related tasks, ranging from identifying fraudulent card-not-present transactions, detecting bot attacks, classifying malicious URLs, and estimating risk of loan default to content moderation. The Python-based library for FDB provides a consistent API for data loading with standardized training and testing splits. We demonstrate several applications of FDB that are of broad interest for fraud detection, including feature engineering, comparison of supervised learning algorithms, label noise removal, class-imbalance treatment and semi-supervised learning. We hope that FDB provides a common playground for researchers and practitioners in the fraud detection domain to develop robust and customized machine learning techniques targeting various fraud use cases.
Submitted 22 September, 2023; v1 submitted 30 August, 2022;
originally announced August 2022.
-
Quantifying Feature Contributions to Overall Disparity Using Information Theory
Authors:
Sanghamitra Dutta,
Praveen Venkatesh,
Pulkit Grover
Abstract:
When a machine-learning algorithm makes biased decisions, it can be helpful to understand the sources of disparity to explain why the bias exists. Towards this, we examine the problem of quantifying the contribution of each individual feature to the observed disparity. If we have access to the decision-making model, one potential approach (inspired by intervention-based approaches in the explainability literature) is to vary each individual feature (while keeping the others fixed) and use the resulting change in disparity to quantify its contribution. However, we may not have access to the model or be able to test/audit its outputs for individually varying features. Furthermore, the decision may not always be a deterministic function of the input features (e.g., with human-in-the-loop). For these situations, we might need to explain contributions using purely distributional (i.e., observational) techniques, rather than interventional ones. We ask the question: what is the "potential" contribution of each individual feature to the observed disparity in the decisions when the exact decision-making mechanism is not accessible? We first provide canonical examples (thought experiments) that help illustrate the difference between distributional and interventional approaches to explaining contributions, and when either is better suited. When unable to intervene on the inputs, we quantify the "redundant" statistical dependency about the protected attribute that is present in both the final decision and an individual feature, by leveraging a body of work in information theory called Partial Information Decomposition. We also perform a simple case study to show how this technique could be applied to quantify contributions.
Submitted 16 June, 2022;
originally announced June 2022.
-
Can Information Flows Suggest Targets for Interventions in Neural Circuits?
Authors:
Praveen Venkatesh,
Sanghamitra Dutta,
Neil Mehta,
Pulkit Grover
Abstract:
Motivated by neuroscientific and clinical applications, we empirically examine whether observational measures of information flow can suggest interventions. We do so by performing experiments on artificial neural networks in the context of fairness in machine learning, where the goal is to induce fairness in the system through interventions. Using our recently developed $M$-information flow framework, we measure the flow of information about the true label (responsible for accuracy, and hence desirable), and separately, the flow of information about a protected attribute (responsible for bias, and hence undesirable) on the edges of a trained neural network. We then compare the flow magnitudes against the effect of intervening on those edges by pruning. We show that pruning edges that carry larger information flows about the protected attribute reduces bias at the output to a greater extent. This demonstrates that $M$-information flow can meaningfully suggest targets for interventions, answering the title's question in the affirmative. We also evaluate bias-accuracy tradeoffs for different intervention strategies, to analyze how one might use estimates of desirable and undesirable information flows (here, accuracy and bias flows) to inform interventions that preserve the former while reducing the latter.
Submitted 9 November, 2021;
originally announced November 2021.
-
A Minimal Intervention Definition of Reverse Engineering a Neural Circuit
Authors:
Keerthana Gurushankar,
Pulkit Grover
Abstract:
In neuroscience, researchers have developed informal notions of what it means to reverse engineer a system, e.g., being able to model or simulate a system in some sense. A recent influential paper by Jonas and Kording, which examines a microprocessor using techniques from neuroscience, suggests that common techniques to understand neural systems are inadequate. Part of the difficulty, as a previous work of Lazebnik noted, lies in the lack of a formal language. We provide a theoretical framework for defining reverse engineering of computational systems, motivated by the neuroscience context. Of specific interest are recent works where, increasingly, interventions are being made to alter the function of the neural circuitry to both understand the system and treat disorders. Starting from Lazebnik's viewpoint that understanding a system means you can "fix it", and motivated by use-cases in neuroscience, we propose the following requirement on reverse engineering: once an agent claims to have reverse-engineered a neural circuit, they subsequently need to be able to: (a) provide a minimal set of interventions to change the input/output (I/O) behavior of the circuit to a desired behavior; (b) arrive at this minimal set of interventions while operating under bounded rationality constraints (e.g., limited memory) to rule out brute-force approaches. Under certain assumptions, we show that this reverse engineering goal falls within the class of undecidable problems. Next, we examine some canonical computational systems and reverse engineering goals (as specified by desired I/O behaviors) where reverse engineering can indeed be performed. Finally, using an exemplar network, the "reward network" in the brain, we summarize the state of current neuroscientific understanding, and discuss how computer-science and information-theoretic concepts can inform goals of future neuroscience studies.
Submitted 2 October, 2021;
originally announced October 2021.
-
Fairness Under Feature Exemptions: Counterfactual and Observational Measures
Authors:
Sanghamitra Dutta,
Praveen Venkatesh,
Piotr Mardziel,
Anupam Datta,
Pulkit Grover
Abstract:
With the growing use of ML in highly consequential domains, quantifying disparity with respect to protected attributes, e.g., gender, race, etc., is important. While quantifying disparity is essential, sometimes the needs of an occupation may require the use of certain features that are critical in a way that any disparity that can be explained by them might need to be exempted. For example, in hiring a software engineer for a safety-critical application, coding skills may be weighed strongly, whereas name, zip code, or reference letters may be used only to the extent that they do not add disparity. In this work, we propose an information-theoretic decomposition of the total disparity (a quantification inspired by counterfactual fairness) into two components: a non-exempt component which quantifies the part that cannot be accounted for by the critical features, and an exempt component that quantifies the remaining disparity. This decomposition allows one to check if the disparity arose purely due to the critical features (inspired by the business necessity defense of disparate impact law) and also enables selective removal of the non-exempt component if desired. We arrive at this decomposition through canonical examples that lead to a set of desirable properties (axioms) that a measure of non-exempt disparity should satisfy. Our proposed measure satisfies all of them. Our quantification bridges ideas of causality, Simpson's paradox, and a body of work from information theory called Partial Information Decomposition. We also obtain an impossibility result showing that no observational measure can satisfy all the desirable properties, leading us to relax our goals and examine observational measures that satisfy only some of them. We perform case studies to show how one can audit/train models while reducing non-exempt disparity.
Submitted 6 August, 2021; v1 submitted 14 June, 2020;
originally announced June 2020.
-
CodeNet: Training Large Scale Neural Networks in Presence of Soft-Errors
Authors:
Sanghamitra Dutta,
Ziqian Bai,
Tze Meng Low,
Pulkit Grover
Abstract:
This work proposes the first strategy to make distributed training of neural networks resilient to computing errors, a problem that has remained unsolved despite being first posed in 1956 by von Neumann. He also speculated that the efficiency and reliability of the human brain are obtained by allowing for low-power but error-prone components with redundancy for error-resilience. It is surprising that this problem remains open, even as massive artificial neural networks are being trained on increasingly low-cost and unreliable processing units. Our coding-theory-inspired strategy, "CodeNet," solves this problem by addressing three challenges in the science of reliable computing: (i) providing the first strategy for error-resilient neural network training by encoding each layer separately; (ii) keeping the overheads of coding (encoding/error-detection/decoding) low by obviating the need to re-encode the updated parameter matrices after each iteration from scratch; and (iii) providing a completely decentralized implementation with no central node (which is a single point of failure), allowing all primary computational steps to be error-prone. We theoretically demonstrate that CodeNet has higher error tolerance than replication, which we leverage to speed up computation time. Simultaneously, CodeNet requires lower redundancy than replication, and equal computational and communication costs in a scaling sense. We first demonstrate the benefits of CodeNet in reducing expected computation time over replication when accounting for checkpointing. Our experiments show that CodeNet achieves the best accuracy-runtime tradeoff compared to both replication and uncoded strategies. CodeNet is a significant step towards biologically plausible neural network training, which could hold the key to orders-of-magnitude efficiency improvements.
Submitted 3 March, 2019;
originally announced March 2019.
-
Information Flow in Computational Systems
Authors:
Praveen Venkatesh,
Sanghamitra Dutta,
Pulkit Grover
Abstract:
We develop a theoretical framework for defining and identifying flows of information in computational systems. Here, a computational system is assumed to be a directed graph, with "clocked" nodes that send transmissions to each other along the edges of the graph at discrete points in time. We are interested in a definition that captures the dynamic flow of information about a specific message, and which guarantees an unbroken "information path" between appropriately defined inputs and outputs in the directed graph. Prior measures, including those based on Granger Causality and Directed Information, fail to provide clear assumptions and guarantees about when they correctly reflect information flow about a message. We take a systematic approach---iterating through candidate definitions and counterexamples---to arrive at a definition for information flow that is based on conditional mutual information, and which satisfies desirable properties, including the existence of information paths. Finally, we describe how information flow might be detected in a noiseless setting, and provide an algorithm to identify information paths on the time-unrolled graph of a computational system.
Submitted 2 March, 2020; v1 submitted 6 February, 2019;
originally announced February 2019.
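To illustrate why a definition based on conditional mutual information can detect flow that an unconditional measure misses, the sketch below computes $I(M; X \mid Z)$ from a discrete joint distribution. The XOR setup is a standard textbook illustration chosen here as an assumption; it is not taken from the paper. The function names and the array layout `p[m, x, z]` are hypothetical.

```python
import numpy as np

def conditional_mutual_information(p):
    """I(M; X | Z) in bits for a joint pmf stored as p[m, x, z]."""
    p = np.asarray(p, dtype=float)
    p_z = p.sum(axis=(0, 1), keepdims=True)    # p(z)
    p_mz = p.sum(axis=1, keepdims=True)        # p(m, z)
    p_xz = p.sum(axis=0, keepdims=True)        # p(x, z)
    mask = p > 0                               # skip zero-probability cells
    ratio = (p * p_z)[mask] / (p_mz * p_xz)[mask]
    return float(np.sum(p[mask] * np.log2(ratio)))

def mutual_information(p_mx):
    """I(M; X) in bits, computed as CMI with a trivial conditioning variable."""
    return conditional_mutual_information(np.asarray(p_mx)[:, :, None])

# Assumed example: M and Z are independent fair bits, and X = M XOR Z.
p_xor = np.zeros((2, 2, 2))
for m_bit in (0, 1):
    for z_bit in (0, 1):
        p_xor[m_bit, m_bit ^ z_bit, z_bit] = 0.25

print(mutual_information(p_xor.sum(axis=2)))      # X alone says nothing about M
print(conditional_mutual_information(p_xor))      # given Z, X reveals M fully
```

In this example the unconditional mutual information is zero while the conditional one is a full bit, so a transmission can carry information about the message that only becomes visible once another transmission is conditioned on.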
-
Coded Elastic Computing
Authors:
Yaoqing Yang,
Matteo Interlandi,
Pulkit Grover,
Soummya Kar,
Saeed Amizadeh,
Markus Weimer
Abstract:
Cloud providers have recently introduced new offerings whereby spare computing resources are accessible at discounts compared to on-demand computing. Exploiting such an opportunity is challenging because such resources are accessed with low priority and can therefore elastically leave (through preemption) and join the computation at any time. In this paper, we design a new technique called coded elastic computing, enabling distributed computations over elastic resources. The proposed technique allows machines to leave the computation without sacrificing the algorithm-level performance, and, at the same time, adaptively reduces the workload at existing machines when new ones join the computation. Leveraging coded redundancy, our approach can achieve a computational cost similar to that of the original (noiseless) method when all machines are present; the cost gracefully increases when machines are preempted and decreases when machines join. The performance of the proposed technique is evaluated on matrix-vector multiplication and linear regression tasks. In experimental validations, it can achieve exactly the same numerical result as the noiseless computation, while reducing the computation time by 46% when compared to non-adaptive coding schemes.
Submitted 26 May, 2019; v1 submitted 16 December, 2018;
originally announced December 2018.
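The coded-redundancy idea that the abstract above builds on can be illustrated with a generic MDS-coded matrix-vector product. This sketch is not the elastic scheme from the paper: it only shows that, when the row blocks of a matrix are encoded with a real-valued Vandermonde generator, the exact product can be recovered from any `k` of `n` workers, so preempted machines do not change the numerical result. All names (`encode_rows`, `decode`, `G`) are hypothetical.

```python
import numpy as np

def encode_rows(A, n, k):
    """Split A into k row blocks and form n coded blocks (n >= k).

    Returns the coded blocks and the (n, k) Vandermonde generator G.
    Any k rows of G are invertible because the evaluation nodes are distinct.
    """
    blocks = np.split(A, k, axis=0)            # requires rows divisible by k
    nodes = np.linspace(-1.0, 1.0, n)
    G = np.vander(nodes, N=k, increasing=True)
    coded = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(n)]
    return coded, G

def decode(partial, G, k):
    """Recover A @ x from the results of any k surviving workers.

    partial: dict mapping worker index -> (coded block @ x).
    """
    idx = sorted(partial)[:k]
    Y = np.stack([partial[i] for i in idx])    # (k, rows_per_block)
    blocks = np.linalg.solve(G[idx], Y)        # uncoded block products
    return blocks.reshape(-1)                  # concatenate in block order

# Usage: 5 workers, any 3 suffice; workers 1 and 3 are preempted.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
x = rng.normal(size=4)
coded, G = encode_rows(A, n=5, k=3)
partial = {i: coded[i] @ x for i in (0, 2, 4)}
y = decode(partial, G, k=3)
```

A Vandermonde generator is used here only because it is easy to verify; it becomes ill-conditioned for large `k`, so a practical system would likely choose a better-conditioned code.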
-
An Application of Storage-Optimal MatDot Codes for Coded Matrix Multiplication: Fast k-Nearest Neighbors Estimation
Authors:
Utsav Sheth,
Sanghamitra Dutta,
Malhar Chaudhari,
Haewon Jeong,
Yaoqing Yang,
Jukka Kohonen,
Teemu Roos,
Pulkit Grover
Abstract:
We propose a novel application of coded computing to the problem of nearest neighbor estimation using MatDot Codes [Fahim et al. 2017], which are known to be optimal for matrix multiplication in terms of recovery threshold under storage constraints. In approximate nearest neighbor algorithms, it is common to construct efficient in-memory indexes to improve query response time. One such strategy is Multiple Random Projection Trees (MRPT), which reduces the set of candidate points over which Euclidean distance calculations are performed. However, this may result in a high memory footprint and possibly paging penalties for large or high-dimensional data. Here we propose two techniques to parallelize MRPT that exploit data and model parallelism, respectively, by dividing both the data storage and the computation efforts among different nodes in a distributed computing cluster. This is especially critical when a single compute node cannot hold the complete dataset in memory. We also propose a novel coded computation strategy based on MatDot codes for the model-parallel architecture that, in a straggler-prone environment, achieves the storage-optimal recovery threshold, i.e., the number of nodes that are required to serve a query. We experimentally demonstrate that, in the absence of straggling, our distributed approaches require less query time than execution on a single processing node, providing near-linear speedups with respect to the number of worker nodes. Through our experiments on real systems with simulated straggling, we also show that our strategy achieves faster query execution than the uncoded strategy in a straggler-prone environment.
Submitted 28 November, 2018;
originally announced November 2018.
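For readers unfamiliar with MatDot codes, the following is a minimal sketch of the basic construction for a product `A @ B`, written from the commonly described form of the code rather than from this paper. `A` is split into `m` column blocks and `B` into `m` row blocks, each worker multiplies one evaluation of two matrix polynomials, and the product is the coefficient of $x^{m-1}$ in a degree-$(2m-2)$ polynomial, so any $2m-1$ workers suffice. Function names and the choice of evaluation points are hypothetical, and the application to MRPT is not shown.

```python
import numpy as np

def matdot_worker_tasks(A, B, m, points):
    """Return what each worker computes: pA(x) @ pB(x) at its own point x."""
    A_blocks = np.split(A, m, axis=1)          # column blocks A_0..A_{m-1}
    B_blocks = np.split(B, m, axis=0)          # row blocks    B_0..B_{m-1}
    results = []
    for x in points:
        pA = sum(A_blocks[i] * x**i for i in range(m))
        pB = sum(B_blocks[j] * x**(m - 1 - j) for j in range(m))
        results.append(pA @ pB)                # one worker's output
    return results

def matdot_decode(results, points, m):
    """Interpolate from 2m-1 worker outputs and read off the x^(m-1) term."""
    V = np.vander(np.asarray(points), N=2 * m - 1, increasing=True)
    stacked = np.stack(results).reshape(len(points), -1)
    coeffs = np.linalg.solve(V, stacked)       # polynomial coefficients
    return coeffs[m - 1].reshape(results[0].shape)

# Usage: m = 3 blocks, 7 workers, so any 2m - 1 = 5 responses are enough.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 6))
B = rng.normal(size=(6, 5))
points = np.linspace(-1.0, 1.0, 7)
outputs = matdot_worker_tasks(A, B, m=3, points=points)
fast = [0, 2, 3, 5, 6]                         # workers 1 and 4 straggle
C = matdot_decode([outputs[i] for i in fast], points[fast], m=3)
```

Each worker stores only a `1/m` fraction of `A` and of `B`, which is the storage constraint under which the abstract describes the recovery threshold as optimal.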
-
A Unified Coded Deep Neural Network Training Strategy Based on Generalized PolyDot Codes for Matrix Multiplication
Authors:
Sanghamitra Dutta,
Ziqian Bai,
Haewon Jeong,
Tze Meng Low,
Pulkit Grover
Abstract:
This paper has two contributions. First, we propose a novel coded matrix multiplication technique called Generalized PolyDot codes that advances on existing methods for coded matrix multiplication under storage and communication constraints. This technique uses "garbage alignment," i.e., aligning computations in coded computing that are not a part of the desired output. Generalized PolyDot codes bridge between Polynomial codes and MatDot codes, trading off between recovery threshold and communication costs. Second, we demonstrate that Generalized PolyDot can be used for training large Deep Neural Networks (DNNs) on unreliable nodes prone to soft-errors. This requires us to address three additional challenges: (i) the prohibitively large overhead of coding the weight matrices in each layer of the DNN at each iteration; (ii) nonlinear operations during training, which are incompatible with linear coding; and (iii) the absence of an error-free master node, requiring us to architect a fully decentralized implementation without any "single point of failure." We allow all primary DNN training steps, namely, matrix multiplication, nonlinear activation, Hadamard product, and update steps, as well as the encoding/decoding, to be error-prone. We consider the case of mini-batch size $B=1$, as well as $B>1$, leveraging coded matrix-vector products and matrix-matrix products, respectively. The problem of DNN training under soft-errors also motivates an interesting, probabilistic error model under which a real number $(P,Q)$ MDS code is shown to correct $P-Q-1$ errors with probability $1$ as compared to $\lfloor \frac{P-Q}{2} \rfloor$ for the more conventional, adversarial error model. We also demonstrate that our proposed strategy can provide unbounded gains in error tolerance over a competing replication strategy and a preliminary MDS-code-based strategy for both of these error models.
Submitted 26 November, 2018;
originally announced November 2018.
-
Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control
Authors:
Yangchen Pan,
Amir-massoud Farahmand,
Martha White,
Saleh Nabi,
Piyush Grover,
Daniel Nikovski
Abstract:
Recent work has shown that reinforcement learning (RL) is a promising approach to control dynamical systems described by partial differential equations (PDE). This paper shows how to use RL to tackle more general PDE control problems that have continuous high-dimensional action spaces with spatial relationship among action dimensions. In particular, we propose the concept of action descriptors, which encode regularities among spatially-extended action dimensions and enable the agent to control high-dimensional action PDEs. We provide theoretical evidence suggesting that this approach can be more sample efficient compared to a conventional approach that treats each action dimension separately and does not explicitly exploit the spatial regularity of the action space. The action descriptor approach is then used within the deep deterministic policy gradient algorithm. Experiments on two PDE control problems, with up to 256-dimensional continuous actions, show the advantage of the proposed approach over the conventional one.
Submitted 12 June, 2018;
originally announced June 2018.
-
Straggler-Resilient and Communication-Efficient Distributed Iterative Linear Solver
Authors:
Farzin Haddadpour,
Yaoqing Yang,
Malhar Chaudhari,
Viveck R Cadambe,
Pulkit Grover
Abstract:
We propose a novel distributed iterative linear inverse solver method. Our method, PolyLin, has significantly lower communication cost, both in terms of number of rounds as well as number of bits, in comparison with the state of the art at the cost of higher computational complexity and storage. Our algorithm also has a built-in resilience to straggling and faulty computation nodes. We develop a natural variant of our main algorithm that trades off communication cost for computational complexity. Our method is inspired by ideas in error correcting codes.
Submitted 15 June, 2018;
originally announced June 2018.
-
Coded FFT and Its Communication Overhead
Authors:
Haewon Jeong,
Tze Meng Low,
Pulkit Grover
Abstract:
We propose a coded computing strategy and examine communication costs of coded computing algorithms to make distributed Fast Fourier Transform (FFT) resilient to errors during the computation. We apply maximum distance separable (MDS) codes to a widely used "Transpose" algorithm for parallel FFT. In the uncoded distributed FFT algorithm, the most expensive step is a single "all-to-all" communication. We compare this with communication overhead of coding in our coded FFT algorithm. We show that by using a systematic MDS code, the communication overhead of coding is negligible in comparison with the communication costs inherent in an uncoded FFT implementation if the number of parity nodes is at most $o(\log K)$, where $K$ is the number of systematic nodes.
Submitted 24 May, 2018;
originally announced May 2018.
-
Coded Iterative Computing using Substitute Decoding
Authors:
Yaoqing Yang,
Malhar Chaudhari,
Pulkit Grover,
Soummya Kar
Abstract:
In this paper, we propose a new coded computing technique called "substitute decoding" for general iterative distributed computation tasks. In the first part of the paper, we use PageRank as a simple example to show that substitute decoding can make the computation of power iterations solving PageRank on sparse matrices robust to erasures in distributed systems. For these sparse matrices, codes with dense generator matrices can significantly increase storage costs and codes with low-density generator matrices (LDGM) are preferred. Surprisingly, we show through both theoretical analysis and simulations that when substitute decoding is used, coded iterative computing with extremely low-density codes (2 or 3 non-zeros in each row of the generator matrix) can achieve almost the same convergence rate as noiseless techniques, despite the poor error-correction ability of LDGM codes. In the second part of the paper, we discuss applications of substitute decoding beyond solving linear systems and PageRank. These applications include (1) computing eigenvectors, (2) computing the truncated singular value decomposition (SVD), and (3) gradient descent. These examples show that the substitute decoding algorithm is useful in a wide range of applications.
Submitted 15 May, 2018;
originally announced May 2018.
-
On the Optimal Recovery Threshold of Coded Matrix Multiplication
Authors:
Sanghamitra Dutta,
Mohammad Fahim,
Farzin Haddadpour,
Haewon Jeong,
Viveck Cadambe,
Pulkit Grover
Abstract:
We provide novel coded computation strategies for distributed matrix-matrix products that outperform the recent "Polynomial code" constructions in recovery threshold, i.e., the required number of successful workers. When a $1/m$ fraction of each matrix can be stored in each worker node, Polynomial codes require $m^2$ successful workers, while our MatDot codes only require $2m-1$ successful workers, albeit at a higher communication cost from each worker to the fusion node. We also provide a systematic construction of MatDot codes. Further, we propose "PolyDot" coding that interpolates between Polynomial codes and MatDot codes to trade off communication cost and recovery threshold. Finally, we demonstrate a coding technique for multiplying $n$ matrices ($n \geq 3$) by applying MatDot and PolyDot coding ideas.
Submitted 16 May, 2018; v1 submitted 30 January, 2018;
originally announced January 2018.
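The $2m-1$ recovery threshold quoted in the abstract above can be illustrated with a small numerical sketch. It is not the authors' implementation: it assumes the commonly described MatDot layout (the first matrix split column-wise, the second row-wise, each encoded as a polynomial), and the function names, evaluation points, and matrix sizes are hypothetical choices for illustration.

```python
import numpy as np

def matdot_encode(A, B, m, points):
    """Return one (pA(x), pB(x)) pair per evaluation point x (one pair per worker)."""
    A_blocks = np.split(A, m, axis=1)  # A = [A_0 | A_1 | ... | A_{m-1}]
    B_blocks = np.split(B, m, axis=0)  # B = [B_0; B_1; ...; B_{m-1}]
    shares = []
    for x in points:
        pA = sum(A_blocks[i] * x**i for i in range(m))
        pB = sum(B_blocks[j] * x**(m - 1 - j) for j in range(m))
        shares.append((pA, pB))
    return shares

def matdot_decode(points, products, m):
    """Interpolate the degree-(2m-2) product polynomial from 2m-1 worker outputs.

    The coefficient of x^(m-1) equals sum_i A_i B_i, which is A @ B.
    """
    k = 2 * m - 1
    V = np.vander(np.asarray(points[:k], dtype=float), k, increasing=True)
    stacked = np.stack([p.ravel() for p in products[:k]])
    coeffs = np.linalg.solve(V, stacked)  # row d holds the coefficient of x^d
    return coeffs[m - 1].reshape(products[0].shape)

# Illustrative run: m = 3, seven workers, two of which straggle.
rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
points = np.linspace(-1.0, 1.0, 7)            # assumed evaluation points
shares = matdot_encode(A, B, m, points)
survivors = [0, 2, 3, 5, 6]                   # any 2m - 1 = 5 workers suffice
products = [shares[w][0] @ shares[w][1] for w in survivors]
recovered = matdot_decode([points[w] for w in survivors], products, m)
```

Each worker here returns a full $n \times n$ product, which reflects the higher per-worker communication cost the abstract mentions.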
-
Coding Method for Parallel Iterative Linear Solver
Authors:
Yaoqing Yang,
Pulkit Grover,
Soummya Kar
Abstract:
Computationally intensive distributed and parallel computing is often bottlenecked by a small set of slow workers known as stragglers. In this paper, we utilize the emerging idea of "coded computation" to design a novel error-correcting-code inspired technique for solving linear inverse problems under specific iterative methods in a parallelized implementation affected by stragglers. Example applications include inverse problems in machine learning on graphs, such as personalized PageRank and sampling on graphs. We provably show that our coded-computation technique can reduce the mean-squared error under a computational deadline constraint. In fact, the ratio of mean-squared error of replication-based and coded techniques diverges to infinity as the deadline increases. Our experiments for personalized PageRank performed on real systems and real social networks show that this ratio can be as large as $10^4$. Further, unlike coded-computation techniques proposed thus far, our strategy combines outputs of all workers, including the stragglers, to produce more accurate estimates at the computational deadline. This also ensures that the accuracy degrades "gracefully" in the event that the number of stragglers is large.
Submitted 5 June, 2017; v1 submitted 1 June, 2017;
originally announced June 2017.
-
Coded convolution for parallel and distributed computing within a deadline
Authors:
Sanghamitra Dutta,
Viveck Cadambe,
Pulkit Grover
Abstract:
We consider the problem of computing the convolution of two long vectors using parallel processing units in the presence of "stragglers". Stragglers refer to the small fraction of faulty or slow processors that delay the entire computation in time-critical distributed systems. We first show that splitting the vectors into smaller pieces and using a linear code to encode these pieces provides better resilience against stragglers than replication-based schemes under a simple, worst-case straggler analysis. We then demonstrate that under commonly used models of computation time, coding can dramatically improve the probability of finishing the computation within a target "deadline" time. As opposed to the more commonly used technique of expected computation time analysis, we quantify the exponents of the probability of failure in the limit of large deadlines. Our exponent metric captures the probability of failing to finish before a specified deadline time, i.e., the behavior of the "tail". Moreover, our technique also allows for simple closed form expressions for more general models of computation time, e.g., shifted Weibull models instead of only shifted exponentials. Thus, through this problem of coded convolution, we establish the utility of a novel asymptotic failure exponent analysis for distributed systems.
Submitted 10 May, 2017;
originally announced May 2017.
-
"Short-Dot": Computing Large Linear Transforms Distributedly Using Coded Short Dot Products
Authors:
Sanghamitra Dutta,
Viveck Cadambe,
Pulkit Grover
Abstract:
Faced with saturation of Moore's law and increasing dimension of data, system designers have increasingly resorted to parallel and distributed computing. However, distributed computing is often bottlenecked by a small fraction of slow processors called "stragglers" that reduce the speed of computation because the fusion node has to wait for all processors to finish. To combat the effect of stragglers, recent literature introduces redundancy in computations across processors, e.g., using repetition-based strategies or erasure codes. The fusion node can exploit this redundancy by completing the computation using outputs from only a subset of the processors, ignoring the stragglers. In this paper, we propose a novel technique -- that we call "Short-Dot" -- to introduce redundant computations in a coding theory inspired fashion, for computing linear transforms of long vectors. Instead of computing long dot products as required in the original linear transform, we construct a larger number of redundant and short dot products that can be computed faster and more efficiently at individual processors. In reference to comparable schemes that introduce redundancy to tackle stragglers, Short-Dot reduces the cost of computation, storage and communication since shorter portions are stored and computed at each processor, and also shorter portions of the input are communicated to each processor. We demonstrate through probabilistic analysis as well as experiments that Short-Dot offers significant speed-up compared to existing techniques. We also derive trade-offs between the length of the dot-products and the resilience to stragglers (number of processors to wait for), for any such strategy and compare it to that achieved by our strategy.
Submitted 17 April, 2017;
originally announced April 2017.
-
Adaptivity provably helps: information-theoretic limits on $l_0$ cost of non-adaptive sensing
Authors:
Sanghamitra Dutta,
Pulkit Grover
Abstract:
The advantages of adaptivity and feedback are of immense interest in signal processing and communication with many positive and negative results. Although it is established that adaptivity does not offer substantial reductions in minimax mean square error for a fixed number of measurements, existing results have shown several advantages of adaptivity in complexity of reconstruction, accuracy of support detection, and gain in signal-to-noise ratio, under constraints on sensing energy. Sensing energy has often been measured in terms of the Frobenius norm of the sensing matrix. This paper uses a different metric, which we call the $l_0$ cost of a sensing matrix, to quantify the complexity of sensing. Thus sparse sensing matrices have a lower cost. We derive information-theoretic lower bounds on the $l_0$ cost that hold for any non-adaptive sensing strategy. We establish that any non-adaptive sensing strategy must incur an $l_0$ cost of $\Theta\left(N \log_2(N)\right)$ to reconstruct an $N$-dimensional, one-sparse signal when the number of measurements is limited to $\Theta\left(\log_2 (N)\right)$. In comparison, bisection-type adaptive strategies only require an $l_0$ cost of at most $\mathcal{O}(N)$ for an equal number of measurements. The problem has an interesting interpretation as a sphere packing problem in a multidimensional space, such that all the sphere centres have minimum non-zero co-ordinates. We also discuss the variation in $l_0$ cost as the number of measurements increases from $\Theta\left(\log_2 (N)\right)$ to $\Theta\left(N\right)$.
Submitted 22 June, 2016; v1 submitted 12 February, 2016;
originally announced February 2016.
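The $\mathcal{O}(N)$ figure for bisection-type adaptive sensing in the abstract above can be checked with a toy sketch. It assumes a noiseless one-sparse signal and 0/1 indicator measurement vectors, and it counts the $l_0$ cost of a measurement as the number of non-zero entries in its measurement vector. The function name and the example signal are hypothetical; this is one simple bisection strategy, not necessarily the one analysed in the paper.

```python
def bisection_sense(signal):
    """Locate the single non-zero entry of `signal` by adaptive bisection.

    Returns (index, number_of_measurements, total_l0_cost).
    """
    lo, hi = 0, len(signal)
    measurements = 0
    l0_cost = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # The measurement vector is the indicator of [lo, mid): mid - lo non-zeros.
        y = sum(signal[lo:mid])
        measurements += 1
        l0_cost += mid - lo
        if y != 0:
            hi = mid   # the non-zero entry lies in the measured half
        else:
            lo = mid   # otherwise it lies in the other half
    return lo, measurements, l0_cost

# Illustrative run with N = 16 and the non-zero entry at index 11.
N = 16
signal = [0.0] * N
signal[11] = 3.5
result = bisection_sense(signal)
```

For $N = 16$ this uses $\log_2 N = 4$ measurements with supports of size 8, 4, 2, and 1, so the total $l_0$ cost is $N - 1 = 15$.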
-
Rate Distortion for Lossy In-network Function Computation: Information Dissipation and Sequential Reverse Water-Filling
Authors:
Yaoqing Yang,
Pulkit Grover,
Soummya Kar
Abstract:
We consider the problem of distributed lossy linear function computation in a tree network. We examine two cases: (i) data aggregation (only one sink node computes) and (ii) consensus (all nodes compute the same function). By quantifying the accumulation of information loss in distributed computing, we obtain fundamental limits on network computation rate as a function of incremental distortions (and hence incremental loss of information) along the edges of the network. The above characterization, based on quantifying distortion accumulation, offers an improvement over classical cut-set type techniques which are based on overall distortions instead of incremental distortions. This quantification of information loss qualitatively resembles information dissipation in cascaded channels [1]. Surprisingly, this accumulation effect of distortion happens even at infinite blocklength. Combining this observation with an inequality on the dominance of mean-square quantities over relative-entropy quantities, we obtain outer bounds on the rate distortion function that are tighter than classical cut-set bounds by a difference which can be arbitrarily large in both data aggregation and consensus. We also obtain inner bounds on the optimal rate using random Gaussian coding, which differ from the outer bounds by $\mathcal{O}(\sqrt{D})$, where $D$ is the overall distortion. The obtained inner and outer bounds can provide insights on rate (bit) allocations for both the data aggregation problem and the consensus problem. We show that for tree networks, the rate allocation results have a mathematical structure similar to classical reverse water-filling for parallel Gaussian sources.
Submitted 12 January, 2017; v1 submitted 22 January, 2016;
originally announced January 2016.
-
Energy Efficient Distributed Coding for Data Collection in a Noisy Sparse Network
Authors:
Yaoqing Yang,
Soummya Kar,
Pulkit Grover
Abstract:
We consider the problem of data collection in a two-layer network consisting of (1) links between $N$ distributed agents and a remote sink node; (2) a sparse network formed by these distributed agents. We study the effect of inter-agent communications on the overall energy consumption. Despite the sparse connections between agents, we provide an in-network coding scheme that reduces the overall energy consumption by a factor of $\Theta(\log N)$ compared to a naive scheme which neglects inter-agent communications. By providing lower bounds on both the energy consumption and the sparseness (number of links) of the network, we show that the proposed scheme is energy-optimal except for a factor of $\Theta(\log\log N)$. The proposed scheme extends a previous work of Gallager on noisy broadcasting from a complete graph to a sparse graph, while bringing in new techniques from error control coding and noisy circuits.
Submitted 22 January, 2016;
originally announced January 2016.
-
Model-free control framework for multi-limb soft robots
Authors:
Vishesh Vikas,
Piyush Grover,
Barry Trimmer
Abstract:
The deformable and continuum nature of soft robots promises versatility and adaptability. However, control of modular, multi-limbed soft robots for terrestrial locomotion is challenging due to the complex robot structure, actuator mechanics and robot-environment interaction. Traditionally, soft robot control is performed by modeling kinematics using exact geometric equations and finite element analysis. This research presents an alternative, model-free, data-driven, reinforcement-learning-inspired approach for controlling multi-limbed soft material robots. This control approach can be summarized as a four-step process of discretization, visualization, learning and optimization. The first step involves identification and subsequent discretization of key factors that dominate robot-environment interaction and, in turn, the robot control. Graph theory is used to visualize relationships and transitions between the discretized states. The graph representation facilitates mathematical definition of periodic control patterns (simple cycles) and locomotion gaits. Rewards corresponding to individual arcs of the graph are weighted displacement and orientation change for robot state-to-state transitions. These rewards are specific to the surface of locomotion and are learned. Finally, the control patterns result from optimization of a reward-dependent locomotion task (e.g., translation) cost function. The optimization problem is an Integer Linear Programming problem which can be quickly solved using standard solvers. The framework is generic and independent of the type of actuator, soft material properties or the type of friction mechanism, as the control exists in the robot's task space. Furthermore, the data-driven nature of the framework imparts adaptability to the framework toward different locomotion surfaces by re-learning rewards.
Submitted 19 September, 2015;
originally announced September 2015.
-
Energy Harvesting Transmitters that Heat Up: Throughput Maximization under Temperature Constraints
Authors:
Omur Ozel,
Sennur Ulukus,
Pulkit Grover
Abstract:
Motivated by damage due to heating in sensor operation, we consider the throughput optimal offline data scheduling problem in an energy harvesting transmitter such that the resulting temperature increase remains below a critical level. We model the temperature dynamics of the transmitter as a linear system and determine the optimal transmit power policy under such temperature constraints as well as energy harvesting constraints over an AWGN channel. We first derive the structural properties of the solution for the general case with multiple energy arrivals. We show that the optimal power policy is piecewise monotone decreasing with possible jumps at the energy harvesting instants. We derive analytical expressions for the optimal solution in the single energy arrival case. We show that, in the single energy arrival case, the optimal power is monotone decreasing, the resulting temperature is monotone increasing, and both remain constant after the temperature hits the critical level. We then generalize the solution for the multiple energy arrival case.
Submitted 2 September, 2015;
originally announced September 2015.
-
Graph Codes for Distributed Instant Message Collection in an Arbitrary Noisy Broadcast Network
Authors:
Yaoqing Yang,
Soummya Kar,
Pulkit Grover
Abstract:
We consider the problem of minimizing the number of broadcasts for collecting all sensor measurements at a sink node in a noisy broadcast sensor network. Focusing first on arbitrary network topologies, we provide (i) fundamental limits on the required number of broadcasts of data gathering, and (ii) a general in-network computing strategy to achieve an upper bound within factor $\log N$ of the fundamental limits, where $N$ is the number of agents in the network. Next, focusing on two example networks, namely, arbitrary geometric networks and random Erdős-Rényi networks, we provide improved in-network computing schemes that are optimal in that they attain the fundamental limits, i.e., the lower and upper bounds are tight in the order sense. Our main techniques are three distributed encoding techniques, called graph codes, which are designed respectively for the above-mentioned three scenarios. Our work thus extends and unifies previous works such as those of Gallager [1] and Karamchandani et al. [2] on number of broadcasts for distributed function computation in special network topologies, while bringing in novel techniques, e.g., from error-control coding and noisy circuits, for both upper and lower bounds.
Submitted 30 January, 2017; v1 submitted 6 August, 2015;
originally announced August 2015.
-
Computing Linear Transformations with Unreliable Components
Authors:
Yaoqing Yang,
Pulkit Grover,
Soummya Kar
Abstract:
We consider the problem of computing a binary linear transformation using unreliable components when all circuit components are unreliable. Two noise models of unreliable components are considered: probabilistic errors and permanent errors. We introduce the "ENCODED" technique that ensures that the error probability of the computation of the linear transformation is kept bounded below a small constant independent of the size of the linear transformation even when all logic gates in the computation are noisy. Further, we show that the scheme requires fewer operations (in order sense) than its "uncoded" counterpart. By deriving a lower bound, we show that in some cases, the scheme is order-optimal. Using these results, we examine the gain in energy-efficiency from use of "voltage-scaling" scheme where gate-energy is reduced by lowering the supply voltage. We use a gate energy-reliability model to show that tuning gate-energy appropriately at different stages of the computation ("dynamic" voltage scaling), in conjunction with ENCODED, can lead to order-sense energy-savings over the classical "uncoded" approach. Finally, we also examine the problem of computing a linear transformation when noiseless decoders can be used, providing upper and lower bounds to the problem.
Submitted 13 May, 2017; v1 submitted 24 June, 2015;
originally announced June 2015.
-
On the Total-Power Capacity of Regular-LDPC Codes with Iterative Message-Passing Decoders
Authors:
Karthik Ganesan,
Pulkit Grover,
Jan Rabaey,
Andrea Goldsmith
Abstract:
Motivated by recently derived fundamental limits on total (transmit + decoding) power for coded communication with VLSI decoders, this paper investigates the scaling behavior of the minimum total power needed to communicate over AWGN channels as the target bit-error-probability tends to zero. We focus on regular-LDPC codes and iterative message-passing decoders. We analyze scaling behavior under two VLSI complexity models of decoding. One model abstracts power consumed in processing elements ("node model"), and another abstracts power consumed in wires which connect the processing elements ("wire model"). We prove that a coding strategy using regular-LDPC codes with Gallager-B decoding achieves order-optimal scaling of total power under the node model. However, we also prove that regular-LDPC codes and iterative message-passing decoders cannot meet existing fundamental limits on total power under the wire model. Further, if the transmit energy-per-bit is bounded, total power grows at a rate that is worse than uncoded transmission. Complementing our theoretical results, we develop detailed physical models of decoding implementations using post-layout circuit simulations. Our theoretical and numerical results show that approaching fundamental limits on total power requires increasing the complexity of both the code design and the corresponding decoding algorithm as communication distance is increased or error-probability is lowered.
Submitted 18 November, 2015; v1 submitted 4 April, 2015;
originally announced April 2015.
-
Energy Harvesting Wireless Communications: A Review of Recent Advances
Authors:
Sennur Ulukus,
Aylin Yener,
Elza Erkip,
Osvaldo Simeone,
Michele Zorzi,
Pulkit Grover,
Kaibin Huang
Abstract:
This article summarizes recent contributions in the broad area of energy harvesting wireless communications. In particular, we provide the current state of the art for wireless networks composed of energy harvesting nodes, starting from the information-theoretic performance limits to transmission scheduling policies and resource allocation, medium access and networking issues. The emerging related area of energy transfer for self-sustaining energy harvesting wireless networks is considered in detail covering both energy cooperation aspects and simultaneous energy and information transfer. Various potential models with energy harvesting nodes at different network scales are reviewed as well as models for energy consumption at the nodes.
Submitted 24 January, 2015;
originally announced January 2015.
-
Energy-efficient Decoders for Compressive Sensing: Fundamental Limits and Implementations
Authors:
Tongxin Li,
Mayank Bakshi,
Pulkit Grover
Abstract:
The fundamental problem considered in this paper is "What is the energy consumed for the implementation of a compressive sensing decoding algorithm on a circuit?". Using the "information-friction" framework, we examine the smallest amount of bit-meters as a measure for the energy consumed by a circuit. We derive a fundamental lower bound for the implementation of compressive sensing decoding algorithms on a circuit. In the setting where the number of measurements scales linearly with the sparsity and the sparsity is sub-linear with the length of the signal, we show that the bit-meters consumption for these algorithms is order-tight, i.e., it matches the lower bound asymptotically up to a constant factor. Our implementations yield interesting insights into design of energy-efficient circuits that are not captured by the notion of computational efficiency alone.
Submitted 16 February, 2015; v1 submitted 16 November, 2014;
originally announced November 2014.
-
"Information-Friction" and its implications on minimum energy required for communication
Authors:
Pulkit Grover
Abstract:
Just as there are frictional losses associated with moving masses on a surface, what if there were frictional losses associated with moving information on a substrate? Indeed, many modes of communication suffer from such frictional losses. We propose to model these losses as proportional to "bit-meters," i.e., the product of mass of information (i.e., the number of bits) and the distance of information transport. We use this "information-friction" model to understand fundamental energy requirements on encoding and decoding in communication circuitry. First, for communication across a binary input AWGN channel, we arrive at fundamental limits on bit-meters (and thus energy consumption) for decoding implementations that have a predetermined input-independent length of messages. For encoding, we relax the fixed-length assumption and derive bounds for flexible-message-length implementations. Using these lower bounds we show that the total (transmit + encoding + decoding) energy-per-bit must diverge to infinity as the target error probability is lowered to zero. Further, the closer the communication rate is maintained to the channel capacity (as the target error-probability is lowered to zero), the faster the required decoding energy diverges to infinity.
Submitted 1 September, 2014; v1 submitted 6 January, 2014;
originally announced January 2014.
-
Information embedding and the triple role of control
Authors:
Pulkit Grover,
Aaron B. Wagner,
Anant Sahai
Abstract:
We consider the problem of information embedding where the encoder modifies a white Gaussian host signal in a power-constrained manner to encode a message, and the decoder recovers both the embedded message and the modified host signal. This partially extends the recent work of Sumszyk and Steinberg to the continuous-alphabet Gaussian setting. Through a control-theoretic lens, we observe that the problem is a minimalist example of what is called the "triple role" of control actions. We show that a dirty-paper-coding strategy achieves the optimal rate for perfect recovery of the modified host and the message for any message rate. For imperfect recovery of the modified host, by deriving bounds on the minimum mean-square error (MMSE) in recovering the modified host signal, we show that DPC-based strategies are guaranteed to attain within a uniform constant factor of 16 of the optimal weighted sum of power required in host signal modification and the MMSE in the modified host signal reconstruction for all weights and all message rates. When specialized to the zero-rate case, our results provide the tightest known lower bounds on the asymptotic costs for the vector version of a famous open problem in decentralized control: the Witsenhausen counterexample. Numerically, this tighter bound helps us characterize the asymptotically optimal costs for the vector Witsenhausen problem to within a factor of 1.3 for all problem parameters, improving on the earlier best known bound of 2.
Submitted 20 June, 2013;
originally announced June 2013.
-
Towards a communication-theoretic understanding of system-level power consumption
Authors:
Pulkit Grover,
Kristen Ann Woyach,
Anant Sahai
Abstract:
Traditional communication theory focuses on minimizing transmit power. However, communication links are increasingly operating at shorter ranges where transmit power can be significantly smaller than the power consumed in decoding. This paper models the required decoding power and investigates the minimization of total system power from two complementary perspectives.
First, an isolated point-to-point link is considered. Using new lower bounds on the complexity of message-passing decoding, lower bounds are derived on decoding power. These bounds show that 1) there is a fundamental tradeoff between transmit and decoding power; 2) unlike the implications of the traditional "waterfall" curve which focuses on transmit power, the total power must diverge to infinity as error probability goes to zero; 3) Regular LDPCs, and not their known capacity-achieving irregular counterparts, can be shown to be power order optimal in some cases; and 4) the optimizing transmit power is bounded away from the Shannon limit.
Second, we consider a collection of links. When systems both generate and face interference, coding allows a system to support a higher density of transmitter-receiver pairs (assuming interference is treated as noise). However, at low densities, uncoded transmission may be more power-efficient in some cases.
Submitted 16 February, 2011; v1 submitted 23 October, 2010;
originally announced October 2010.
-
Implicit and explicit communication in decentralized control
Authors:
Pulkit Grover,
Anant Sahai
Abstract:
There has been substantial progress recently in understanding toy problems of purely implicit signaling. These are problems where the source and the channel are implicit -- the message is generated endogenously by the system, and the plant itself is used as a channel. In this paper, we explore how implicit and explicit communication can be used synergistically to reduce control costs.
The setting is an extension of Witsenhausen's counterexample where a rate-limited external channel connects the two controllers. Using a semi-deterministic version of the problem, we arrive at a binning-based strategy that can outperform the best known strategies by an arbitrarily large factor.
We also show that our binning-based strategy attains within a constant factor of the optimal cost for an asymptotically infinite-length version of the problem uniformly over all problem parameters and all rates on the external channel. For the scalar case, although our results yield approximate optimality for each fixed rate, we are unable to prove approximate optimality uniformly over all rates.
Submitted 23 October, 2010;
originally announced October 2010.
-
Is Witsenhausen's counterexample a relevant toy?
Authors:
Pulkit Grover,
Anant Sahai
Abstract:
This paper answers a question raised by Doyle on the relevance of the Witsenhausen counterexample as a toy decentralized control problem. The question has two sides, the first of which focuses on the lack of an external channel in the counterexample. Using existing results, we argue that the core difficulty in the counterexample is retained even in the presence of such a channel. The second side questions the LQG formulation of the counterexample. We consider alternative formulations and show that the understanding developed for the LQG case guides the investigation for these other cases as well. Specifically, we consider 1) a variation on the original counterexample with general, but bounded, noise distributions, and 2) an adversarial extension with bounded disturbance and quadratic costs. For each of these formulations, we show that quantization-based nonlinear strategies outperform linear strategies by an arbitrarily large factor. Further, these nonlinear strategies also perform within a constant factor of the optimal, uniformly over all possible parameter choices (for fixed noise distributions in the Bayesian case).
Fortuitously, the assumption of bounded noise results in a significant simplification of proofs as compared to those for the LQG formulation. Therefore, the results in this paper are also of pedagogical interest.
Submitted 13 September, 2010;
originally announced September 2010.
-
Information embedding meets distributed control
Authors:
Pulkit Grover,
Aaron B. Wagner,
Anant Sahai
Abstract:
We consider the problem of information embedding where the encoder modifies a white Gaussian host signal in a power-constrained manner to encode the message, and the decoder recovers both the embedded message and the modified host signal. This extends the recent work of Sumszyk and Steinberg to the continuous-alphabet Gaussian setting. We show that a dirty-paper-coding based strategy achieves the optimal rate for perfect recovery of the modified host and the message. We also provide bounds for the extension wherein the modified host signal is recovered only to within a specified distortion. When specialized to the zero-rate case, our results provide the tightest known lower bounds on the asymptotic costs for the vector version of a famous open problem in distributed control -- the Witsenhausen counterexample. Using this bound, we characterize the asymptotically optimal costs for the vector Witsenhausen problem numerically to within a factor of 1.3 for all problem parameters, improving on the earlier best known bound of 2.
Submitted 2 March, 2010;
originally announced March 2010.
-
The finite-dimensional Witsenhausen counterexample
Authors:
Pulkit Grover,
Se Yong Park,
Anant Sahai
Abstract:
Recently, a vector version of Witsenhausen's counterexample was considered and it was shown that in the limit of infinite vector length, certain quantization-based control strategies are provably within a constant factor of the optimal cost for all possible problem parameters. In this paper, finite vector lengths are considered with the dimension being viewed as an additional problem parameter. By applying a large-deviation "sphere-packing" philosophy, a lower bound to the optimal cost for the finite dimensional case is derived that uses appropriate shadows of the infinite-length bound. Using the new lower bound, we show that good lattice-based control strategies achieve within a constant factor of the optimal cost uniformly over all possible problem parameters, including the vector length. For Witsenhausen's original problem -- the scalar case -- the gap between regular lattice-based strategies and the lower bound is numerically never more than a factor of 8.
Submitted 2 March, 2010;
originally announced March 2010.
-
Green Codes: Energy-Efficient Short-Range Communication
Authors:
Pulkit Grover,
Anant Sahai
Abstract:
A green code attempts to minimize the total energy per bit required to communicate across a noisy channel. The classical information-theoretic approach neglects the energy expended in processing the data at the encoder and the decoder and only minimizes the energy required for transmissions. Since there is no cost associated with using more degrees of freedom, the traditionally optimal strategy is to communicate at rate zero.
In this work, we use our recently proposed model for the power consumed by iterative message passing. Using generalized sphere-packing bounds on the decoding power, we find lower bounds on the total energy consumed in the transmissions and the decoding, allowing for freedom in the choice of the rate. We show that contrary to the classical intuition, the rate for green codes is bounded away from zero for any given error probability. In fact, as the desired bit-error probability goes to zero, the optimizing rate for our bounds converges to 1.
Submitted 15 May, 2008;
originally announced May 2008.
-
The price of certainty: "waterslide curves" and the gap to capacity
Authors:
Anant Sahai,
Pulkit Grover
Abstract:
The classical problem of reliable point-to-point digital communication is to achieve a low probability of error while keeping the rate high and the total power consumption small. Traditional information-theoretic analysis uses `waterfall' curves to convey the revolutionary idea that unboundedly low probabilities of bit-error are attainable using only finite transmit power. However, practitioners have long observed that the decoder complexity, and hence the total power consumption, goes up when attempting to use sophisticated codes that operate close to the waterfall curve.
This paper gives an explicit model for power consumption at an idealized decoder that allows for extreme parallelism in implementation. The decoder architecture is in the spirit of message passing and iterative decoding for sparse-graph codes. Generalized sphere-packing arguments are used to derive lower bounds on the decoding power needed for any possible code given only the gap from the Shannon limit and the desired probability of error. As the gap goes to zero, the energy per bit spent in decoding is shown to go to infinity. This suggests that to optimize total power, the transmitter should operate at a power that is strictly above the minimum demanded by the Shannon capacity.
The lower bound is plotted to show an unavoidable tradeoff between the average bit-error probability and the total power used in transmission and decoding. In the spirit of conventional waterfall curves, we call these `waterslide' curves.
Submitted 2 January, 2008;
originally announced January 2008.
-
Tradeoff between decoding complexity and rate for codes on graphs
Authors:
Pulkit Grover
Abstract:
We consider transmission over a general memoryless channel, with bounded decoding complexity per bit under message passing decoding. We show that the achievable rate is bounded below capacity if there is a finite success in the decoding in a specified number of operations per bit at the decoder for some codes on graphs. These codes include LDPC and LDGM codes. Good performance with low decoding complexity suggests strong local structures in the graphs of these codes, which are detrimental to the code rate asymptotically. The proof method leads to an interesting necessary condition on the code structures which could achieve capacity with bounded decoding complexity. We also show that if a code sequence achieves a rate epsilon close to the channel capacity, the decoding complexity scales at least as O(log(1/epsilon)).
Submitted 12 February, 2007;
originally announced February 2007.
-
What is needed to exploit knowledge of primary transmissions?
Authors:
Pulkit Grover,
Anant Sahai
Abstract:
Recently, Tarokh and others have raised the possibility that a cognitive radio might know the interference signal being transmitted by a strong primary user in a non-causal way, and use this knowledge to increase its data rates. However, there is a subtle difference between knowing the signal transmitted by the primary and the actual interference at our receiver since there is a wireless channel between these two points. We show that even an unknown phase results in a substantial decrease in the data rates that can be achieved, and thus there is a need to feed back interference channel estimates to the cognitive transmitter. We then consider the case of fading channels. We derive an upper bound on the rate for given outage error probability for faded dirt. We give a scheme that uses appropriate "training" to obtain such estimates and quantify this scheme's required overhead as a function of the relevant coherence time and interference power.
Submitted 12 February, 2007;
originally announced February 2007.