-
Adaptive Curves for Optimally Efficient Market Making
Authors:
Viraj Nadkarni,
Sanjeev Kulkarni,
Pramod Viswanath
Abstract:
Automated Market Makers (AMMs) are essential in Decentralized Finance (DeFi) as they match liquidity supply with demand. They function through liquidity providers (LPs) who deposit assets into liquidity pools. However, the asset trading prices in these pools often trail behind those in more dynamic, centralized exchanges, leading to potential arbitrage losses for LPs. This issue is tackled by adap…
▽ More
Automated Market Makers (AMMs) are essential in Decentralized Finance (DeFi) as they match liquidity supply with demand. They function through liquidity providers (LPs) who deposit assets into liquidity pools. However, the asset trading prices in these pools often trail behind those in more dynamic, centralized exchanges, leading to potential arbitrage losses for LPs. This issue is tackled by adapting market maker bonding curves to trader behavior, based on the classical market microstructure model of Glosten and Milgrom. Our approach ensures a zero-profit condition for the market maker's prices. We derive the differential equation that an optimal adaptive curve should follow to minimize arbitrage losses while remaining competitive. Solutions to this optimality equation are obtained for standard Gaussian and Lognormal price models using Kalman filtering. A key feature of our method is its ability to estimate the external market price without relying on price or loss oracles. We also provide an equivalent differential equation for the implied dynamics of canonical static bonding curves and establish conditions for their optimality. Our algorithms demonstrate robustness to changing market conditions and adversarial perturbations, and we offer an on-chain implementation using Uniswap v4 alongside off-chain AI co-processors.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Unconditionally Safe Light Client
Authors:
Niusha Moshrefi,
Peiyao Sheng,
Soubhik Deb,
Sreeram Kannan,
Pramod Viswanath
Abstract:
Blockchain applications often rely on lightweight clients to access and verify on-chain data efficiently without the need to run a resource-intensive full node. These light clients must maintain robust security to protect the blockchain's integrity for users of applications built upon it, achieving this with minimal resources and without significant latency. Moreover, different applications have v…
▽ More
Blockchain applications often rely on lightweight clients to access and verify on-chain data efficiently without the need to run a resource-intensive full node. These light clients must maintain robust security to protect the blockchain's integrity for users of applications built upon it, achieving this with minimal resources and without significant latency. Moreover, different applications have varying security needs. This work focuses on addressing these two key requirements in the context of Proof-of-Stake (PoS) blockchains and identifying the fundamental cost-latency trade-offs to achieve tailored, optimal security for each light client.
The key security guarantee of PoS blockchains is economic (implied by the "stake"). In this paper we formalize this cryptoeconomic security to light clients, ensuring that the cost of corrupting the data provided to light clients must outweigh the potential profit, thereby economically deterring malicious actors. We further introduce "insured" cryptoeconomic security to light clients, providing unconditional protection via the attribution of adversarial actions and the consequent slashing of stakes. The divisible and fungible nature of stake facilitates programmable security, allowing for customization of the security level and insurance amount according to the specific needs of different applications.
We implemented the protocols in less than 1000 lines of Solidity and TypeScript code and evaluated their gas cost, latency, and the computational overhead. For example, for a transaction with value of \$32k, the light client can choose between zero cost with a latency of 5 hours or instant confirmation with an insurance cost of \$7.45. Thus, the client can select the optimal point on the latency-cost trade-off spectrum that best aligns with its needs. Light clients require negligible storage and face minimal computational costs,...
△ Less
Submitted 2 May, 2024;
originally announced May 2024.
-
BFT-PoLoc: A Byzantine Fortified Trigonometric Proof of Location Protocol using Internet Delays
Authors:
Peiyao Sheng,
Vishal Sevani,
Ranvir Rana,
Himanshu Tyagi,
Pramod Viswanath
Abstract:
Internet platforms depend on accurately determining the geographical locations of online users to deliver targeted services (e.g., advertising). The advent of decentralized platforms (blockchains) emphasizes the importance of geographically distributed nodes, making the validation of locations more crucial. In these decentralized settings, mutually non-trusting participants need to {\em prove} the…
▽ More
Internet platforms depend on accurately determining the geographical locations of online users to deliver targeted services (e.g., advertising). The advent of decentralized platforms (blockchains) emphasizes the importance of geographically distributed nodes, making the validation of locations more crucial. In these decentralized settings, mutually non-trusting participants need to {\em prove} their locations to each other. The incentives for claiming desired location include decentralization properties (validators of a blockchain), explicit rewards for improving coverage (physical infrastructure blockchains) and regulatory compliance -- and entice participants towards prevaricating their true location malicious via VPNs, tampering with internet delays, or compromising other parties (challengers) to misrepresent their location. Traditional delay-based geolocation methods focus on reducing the noise in measurements and are very vulnerable to wilful divergences from prescribed protocol.
In this paper we use Internet delay measurements to securely prove the location of IP addresses while being immune to a large fraction of Byzantine actions. Our core methods are to endow Internet telemetry tools (e.g., **) with cryptographic primitives (signatures and hash functions) together with Byzantine resistant data inferences subject to Euclidean geometric constraints. We introduce two new networking protocols, robust against Byzantine actions: Proof of Internet Geometry (PoIG) converts delay measurements into precise distance estimates across the Internet; Proof of Location (PoLoc) enables accurate and efficient multilateration of a specific IP address. The key algorithmic innovations are in conducting ``Byzantine fortified trigonometry" (BFT) inferences of data, endowing low rank matrix completion methods with Byzantine resistance.
△ Less
Submitted 28 March, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
DeepPolar: Inventing Nonlinear Large-Kernel Polar Codes via Deep Learning
Authors:
S Ashwin Hebbar,
Sravan Kumar Ankireddy,
Hyeji Kim,
Sewoong Oh,
Pramod Viswanath
Abstract:
Progress in designing channel codes has been driven by human ingenuity and, fittingly, has been sporadic. Polar codes, developed on the foundation of Arikan's polarization kernel, represent the latest breakthrough in coding theory and have emerged as the state-of-the-art error-correction code for short-to-medium block length regimes. In an effort to automate the invention of good channel codes, es…
▽ More
Progress in designing channel codes has been driven by human ingenuity and, fittingly, has been sporadic. Polar codes, developed on the foundation of Arikan's polarization kernel, represent the latest breakthrough in coding theory and have emerged as the state-of-the-art error-correction code for short-to-medium block length regimes. In an effort to automate the invention of good channel codes, especially in this regime, we explore a novel, non-linear generalization of Polar codes, which we call DeepPolar codes. DeepPolar codes extend the conventional Polar coding framework by utilizing a larger kernel size and parameterizing these kernels and matched decoders through neural networks. Our results demonstrate that these data-driven codes effectively leverage the benefits of a larger kernel size, resulting in enhanced reliability when compared to both existing neural codes and conventional Polar codes.
△ Less
Submitted 4 June, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
Proof of Diligence: Cryptoeconomic Security for Rollups
Authors:
Peiyao Sheng,
Ranvir Rana,
Himanshu Tyagi,
Pramod Viswanath
Abstract:
Layer 1 (L1) blockchains such as Ethereum are secured under an "honest supermajority of stake" assumption for a large pool of validators who verify each and every transaction on it. This high security comes at a scalability cost which not only effects the throughput of the blockchain but also results in high gas fees for executing transactions on chain. The most successful solution for this proble…
▽ More
Layer 1 (L1) blockchains such as Ethereum are secured under an "honest supermajority of stake" assumption for a large pool of validators who verify each and every transaction on it. This high security comes at a scalability cost which not only effects the throughput of the blockchain but also results in high gas fees for executing transactions on chain. The most successful solution for this problem is provided by optimistic rollups, Layer 2 (L2) blockchains that execute transactions outside L1 but post the transaction data on L1. The security for such L2 chains is argued, informally, under the assumption that a set of nodes will check the transaction data posted on L1 and raise an alarm (a fraud proof) if faulty transactions are detected. However, all current deployments lack a proper incentive mechanism for ensuring that these nodes will do their job ``diligently'', and simply rely on a cursory incentive alignment argument for security. We solve this problem by introducing an incentivized watchtower network designed to serve as the first line of defense for rollups. Our main contribution is a ``Proof of Diligence'' protocol that requires watchtowers to continuously provide a proof that they have verified L2 assertions and get rewarded for the same. Proof of Diligence protocol includes a carefully-designed incentive mechanism that is provably secure when watchtowers are rational actors, under a mild rational independence assumption.
Our proposed system is now live on Ethereum testnet. We deployed a watchtower network and implemented Proof of Diligence for multiple optimistic rollups. We extract execution as well as inclusion proofs for transactions as a part of the bounty. Each watchtower has minimal additional computational overhead beyond access to standard L1 and L2 RPC nodes.
△ Less
Submitted 11 February, 2024;
originally announced February 2024.
-
ZeroSwap: Data-driven Optimal Market Making in DeFi
Authors:
Viraj Nadkarni,
Jiachen Hu,
Ranvir Rana,
Chi **,
Sanjeev Kulkarni,
Pramod Viswanath
Abstract:
Automated Market Makers (AMMs) are major centers of matching liquidity supply and demand in Decentralized Finance. Their functioning relies primarily on the presence of liquidity providers (LPs) incentivized to invest their assets into a liquidity pool. However, the prices at which a pooled asset is traded is often more stale than the prices on centralized and more liquid exchanges. This leads to…
▽ More
Automated Market Makers (AMMs) are major centers of matching liquidity supply and demand in Decentralized Finance. Their functioning relies primarily on the presence of liquidity providers (LPs) incentivized to invest their assets into a liquidity pool. However, the prices at which a pooled asset is traded is often more stale than the prices on centralized and more liquid exchanges. This leads to the LPs suffering losses to arbitrage. This problem is addressed by adapting market prices to trader behavior, captured via the classical market microstructure model of Glosten and Milgrom. In this paper, we propose the first optimal Bayesian and the first model-free data-driven algorithm to optimally track the external price of the asset. The notion of optimality that we use enforces a zero-profit condition on the prices of the market maker, hence the name ZeroSwap. This ensures that the market maker balances losses to informed traders with profits from noise traders. The key property of our approach is the ability to estimate the external market price without the need for price oracles or loss oracles. Our theoretical guarantees on the performance of both these algorithms, ensuring the stability and convergence of their price recommendations, are of independent interest in the theory of reinforcement learning. We empirically demonstrate the robustness of our algorithms to changing market conditions.
△ Less
Submitted 29 April, 2024; v1 submitted 13 October, 2023;
originally announced October 2023.
-
SAKSHI: Decentralized AI Platforms
Authors:
Suma Bhat,
Canhui Chen,
Zerui Cheng,
Zhixuan Fang,
Ashwin Hebbar,
Sreeram Kannan,
Ranvir Rana,
Peiyao Sheng,
Himanshu Tyagi,
Pramod Viswanath,
Xuechao Wang
Abstract:
Large AI models (e.g., Dall-E, GPT4) have electrified the scientific, technological and societal landscape through their superhuman capabilities. These services are offered largely in a traditional web2.0 format (e.g., OpenAI's GPT4 service). As more large AI models proliferate (personalizing and specializing to a variety of domains), there is a tremendous need to have a neutral trust-free platfor…
▽ More
Large AI models (e.g., Dall-E, GPT4) have electrified the scientific, technological and societal landscape through their superhuman capabilities. These services are offered largely in a traditional web2.0 format (e.g., OpenAI's GPT4 service). As more large AI models proliferate (personalizing and specializing to a variety of domains), there is a tremendous need to have a neutral trust-free platform that allows the hosting of AI models, clients receiving AI services efficiently, yet in a trust-free, incentive compatible, Byzantine behavior resistant manner. In this paper we propose SAKSHI, a trust-free decentralized platform specifically suited for AI services. The key design principles of SAKSHI are the separation of the data path (where AI query and service is managed) and the control path (where routers and compute and storage hosts are managed) from the transaction path (where the metering and billing of services are managed over a blockchain). This separation is enabled by a "proof of inference" layer which provides cryptographic resistance against a variety of misbehaviors, including poor AI service, nonpayment for service, copying of AI models. This is joint work between multiple universities (Princeton University, University of Illinois at Urbana-Champaign, Tsinghua University, HKUST) and two startup companies (Witness Chain and Eigen Layer).
△ Less
Submitted 31 July, 2023;
originally announced July 2023.
-
CFT-Forensics: High-Performance Byzantine Accountability for Crash Fault Tolerant Protocols
Authors:
Weizhao Tang,
Peiyao Sheng,
Ronghao Ni,
Pronoy Roy,
Xuechao Wang,
Giulia Fanti,
Pramod Viswanath
Abstract:
Crash fault tolerant (CFT) consensus algorithms are commonly used in scenarios where system components are trusted -- e.g., enterprise settings and government infrastructure. However, CFT consensus can be broken by even a single corrupt node. A desirable property in the face of such potential Byzantine faults is \emph{accountability}: if a corrupt node breaks protocol and affects consensus safety,…
▽ More
Crash fault tolerant (CFT) consensus algorithms are commonly used in scenarios where system components are trusted -- e.g., enterprise settings and government infrastructure. However, CFT consensus can be broken by even a single corrupt node. A desirable property in the face of such potential Byzantine faults is \emph{accountability}: if a corrupt node breaks protocol and affects consensus safety, it should be possible to identify the culpable components with cryptographic integrity from the node states. Today, the best-known protocol for providing accountability to CFT protocols is called PeerReview; it essentially records a signed transcript of all messages sent during the CFT protocol. Because PeerReview is agnostic to the underlying CFT protocol, it incurs high communication and storage overhead. We propose CFT-Forensics, an accountability framework for CFT protocols. We show that for a special family of \emph{forensics-compliant} CFT protocols (which includes widely-used CFT protocols like Raft and multi-Paxos), CFT-Forensics gives provable accountability guarantees. Under realistic deployment settings, we show theoretically that CFT-Forensics operates at a fraction of the cost of PeerReview. We subsequently instantiate CFT-Forensics for Raft, and implement Raft-Forensics as an extension to the popular nuRaft library. In extensive experiments, we demonstrate that Raft-Forensics adds low overhead to vanilla Raft. With 256 byte messages, Raft-Forensics achieves a peak throughput 87.8\% of vanilla Raft at 46\% higher latency ($+44$ ms). We finally integrate Raft-Forensics into the open-source central bank digital currency OpenCBDC, and show that in wide-area network experiments, Raft-Forensics achieves 97.8\% of the throughput of Raft, with 14.5\% higher latency ($+326$ ms).
△ Less
Submitted 3 June, 2024; v1 submitted 15 May, 2023;
originally announced May 2023.
-
Compressed Error HARQ: Feedback Communication on Noise-Asymmetric Channels
Authors:
Sravan Kumar Ankireddy,
S. Ashwin Hebbar,
Yihan Jiang,
Hyeji Kim,
Pramod Viswanath
Abstract:
In modern communication systems with feedback, there are increasingly more scenarios where the transmitter has much less power than the receiver (e.g., medical implant devices), which we refer to as noise-asymmetric channels. For such channels, the feedback link is of higher quality than the forward link. However, feedback schemes for cellular communications, such as hybrid ARQ, do not fully utili…
▽ More
In modern communication systems with feedback, there are increasingly more scenarios where the transmitter has much less power than the receiver (e.g., medical implant devices), which we refer to as noise-asymmetric channels. For such channels, the feedback link is of higher quality than the forward link. However, feedback schemes for cellular communications, such as hybrid ARQ, do not fully utilize the high-quality feedback link. To this end, we introduce Compressed Error Hybrid ARQ, a generalization of hybrid ARQ tailored for noise-asymmetric channels; the receiver sends its estimated message to the transmitter, and the transmitter harmoniously switches between hybrid ARQ and compressed error retransmission. We show that our proposed method significantly improves reliability, latency, and spectral efficiency compared to the conventional hybrid ARQ in various practical scenarios where the transmitter is resource-constrained.
△ Less
Submitted 20 February, 2023;
originally announced February 2023.
-
Machine Learning-Aided Efficient Decoding of Reed-Muller Subcodes
Authors:
Mohammad Vahid Jamali,
Xiyang Liu,
Ashok Vardhan Makkuva,
Hessam Mahdavifar,
Sewoong Oh,
Pramod Viswanath
Abstract:
Reed-Muller (RM) codes achieve the capacity of general binary-input memoryless symmetric channels and are conjectured to have a comparable performance to that of random codes in terms of scaling laws. However, such results are established assuming maximum-likelihood decoders for general code parameters. Also, RM codes only admit limited sets of rates. Efficient decoders such as successive cancella…
▽ More
Reed-Muller (RM) codes achieve the capacity of general binary-input memoryless symmetric channels and are conjectured to have a comparable performance to that of random codes in terms of scaling laws. However, such results are established assuming maximum-likelihood decoders for general code parameters. Also, RM codes only admit limited sets of rates. Efficient decoders such as successive cancellation list (SCL) decoder and recently-introduced recursive projection-aggregation (RPA) decoders are available for RM codes at finite lengths. In this paper, we focus on subcodes of RM codes with flexible rates. We first extend the RPA decoding algorithm to RM subcodes. To lower the complexity of our decoding algorithm, referred to as subRPA, we investigate different approaches to prune the projections. Next, we derive the soft-decision based version of our algorithm, called soft-subRPA, that not only improves upon the performance of subRPA but also enables a differentiable decoding algorithm. Building upon the soft-subRPA algorithm, we then provide a framework for training a machine learning (ML) model to search for \textit{good} sets of projections that minimize the decoding error rate. Training our ML model enables achieving very close to the performance of full-projection decoding with a significantly smaller number of projections. We also show that the choice of the projections in decoding RM subcodes matters significantly, and our ML-aided projection pruning scheme is able to find a \textit{good} selection, i.e., with negligible performance degradation compared to the full-projection case, given a reasonable number of projections.
△ Less
Submitted 31 July, 2023; v1 submitted 15 January, 2023;
originally announced January 2023.
-
TrustBoost: Boosting Trust among Interoperable Blockchains
Authors:
Peiyao Sheng,
Xuechao Wang,
Sreeram Kannan,
Kartik Nayak,
Pramod Viswanath
Abstract:
Currently there exist many blockchains with weak trust guarantees, limiting applications and participation. Existing solutions to boost the trust using a stronger blockchain, e.g., via checkpointing, requires the weaker blockchain to give up sovereignty. In this paper, we propose a family of protocols in which multiple blockchains interact to create a combined ledger with boosted trust. We show th…
▽ More
Currently there exist many blockchains with weak trust guarantees, limiting applications and participation. Existing solutions to boost the trust using a stronger blockchain, e.g., via checkpointing, requires the weaker blockchain to give up sovereignty. In this paper, we propose a family of protocols in which multiple blockchains interact to create a combined ledger with boosted trust. We show that even if several of the interacting blockchains cease to provide security guarantees, the combined ledger continues to be secure - our TrustBoost protocols achieve the optimal threshold of tolerating the insecure blockchains. This optimality, along with the necessity of blockchain interactions, is formally shown within the classic shared memory model, tackling the long standing open challenge of solving consensus in the presence of both Byzantine objects and processes. Furthermore, our proposed construction of TrustBoost simply operates via smart contracts and require no change to the underlying consensus protocols of the participating blockchains, a form of ``consensus on top of consensus''. The protocols are lightweight and can be used on specific (e.g., high value) transactions; we demonstrate the practicality by implementing and deploying TrustBoost as cross-chain smart contracts in the Cosmos ecosystem using approximately 3,000 lines of Rust code, made available as open source. Our evaluation shows that using 10 Cosmos chains in a local testnet, TrustBoost has a gas cost of roughly $2 with a latency of 2 minutes per request, which is in line with the cost on a high security chain such as Bitcoin or Ethereum
△ Less
Submitted 20 September, 2023; v1 submitted 20 October, 2022;
originally announced October 2022.
-
Proof of Backhaul: Trustfree Measurement of Broadband Bandwidth
Authors:
Peiyao Sheng,
Nikita Yadav,
Vishal Sevani,
Arun Babu,
SVR Anand,
Himanshu Tyagi,
Pramod Viswanath
Abstract:
Recent years have seen the emergence of decentralized wireless networks consisting of nodes hosted by many individuals and small enterprises, reawakening the decades-old dream of open networking. These networks have been deployed in an organic, distributed manner and are driven by new economic models resting on tokenized incentives. A critical requirement for the incentives to scale is the ability…
▽ More
Recent years have seen the emergence of decentralized wireless networks consisting of nodes hosted by many individuals and small enterprises, reawakening the decades-old dream of open networking. These networks have been deployed in an organic, distributed manner and are driven by new economic models resting on tokenized incentives. A critical requirement for the incentives to scale is the ability to prove network performance in a decentralized trustfree manner, i.e., a Byzantine fault tolerant network telemetry system. In this paper, we present a Proof of Backhaul (PoB) protocol which measures the bandwidth of the (broadband) backhaul link of a wireless access point, termed prover, in a decentralized and trustfree manner. In particular, our proposed protocol is the first one to satisfy the following two properties: (1) Trustfree. Bandwidth measurement is secure against Byzantine attacks by collaborations of challenge servers and the prover. (2) Open. The barrier-to-entry for being a challenge server is low; there is no requirement of having a low latency and high throughput path to the measured link. At a high-level, our protocol aggregates the challenge traffic from multiple challenge servers and uses cryptographic primitives to ensure that a subset of challengers or, even challengers and provers, cannot maliciously modify results in their favor. A formal security model allows us to establish guarantees of accurate bandwidth measurement as a function of the fraction of malicious actors. Our evaluation shows that our PoB protocol can verify backhaul bandwidth of up to 1000 Mbps with less than 8% error using measurements lasting only 100 ms. The measurement accuracy is not affected in the presence of corrupted challengers. Importantly, the basic verification protocol lends itself to a minor modification that can measure available bandwidth even in the presence of cross-traffic.
△ Less
Submitted 20 October, 2022;
originally announced October 2022.
-
CRISP: Curriculum based Sequential Neural Decoders for Polar Code Family
Authors:
S Ashwin Hebbar,
Viraj Nadkarni,
Ashok Vardhan Makkuva,
Suma Bhat,
Sewoong Oh,
Pramod Viswanath
Abstract:
Polar codes are widely used state-of-the-art codes for reliable communication that have recently been included in the 5th generation wireless standards (5G). However, there remains room for the design of polar decoders that are both efficient and reliable in the short blocklength regime. Motivated by recent successes of data-driven channel decoders, we introduce a novel $\textbf{C}$ur…
▽ More
Polar codes are widely used state-of-the-art codes for reliable communication that have recently been included in the 5th generation wireless standards (5G). However, there remains room for the design of polar decoders that are both efficient and reliable in the short blocklength regime. Motivated by recent successes of data-driven channel decoders, we introduce a novel $\textbf{C}$ur$\textbf{RI}$culum based $\textbf{S}$equential neural decoder for $\textbf{P}$olar codes (CRISP). We design a principled curriculum, guided by information-theoretic insights, to train CRISP and show that it outperforms the successive-cancellation (SC) decoder and attains near-optimal reliability performance on the Polar(32,16) and Polar(64,22) codes. The choice of the proposed curriculum is critical in achieving the accuracy gains of CRISP, as we show by comparing against other curricula. More notably, CRISP can be readily extended to Polarization-Adjusted-Convolutional (PAC) codes, where existing SC decoders are significantly less reliable. To the best of our knowledge, CRISP constructs the first data-driven decoder for PAC codes and attains near-optimal performance on the PAC(32,16) code.
△ Less
Submitted 29 May, 2023; v1 submitted 1 October, 2022;
originally announced October 2022.
-
TinyTurbo: Efficient Turbo Decoders on Edge
Authors:
S Ashwin Hebbar,
Rajesh K Mishra,
Sravan Kumar Ankireddy,
Ashok V Makkuva,
Hyeji Kim,
Pramod Viswanath
Abstract:
In this paper, we introduce a neural-augmented decoder for Turbo codes called TINYTURBO . TINYTURBO has complexity comparable to the classical max-log-MAP algorithm but has much better reliability than the max-log-MAP baseline and performs close to the MAP algorithm. We show that TINYTURBO exhibits strong robustness on a variety of practical channels of interest, such as EPA and EVA channels, whic…
▽ More
In this paper, we introduce a neural-augmented decoder for Turbo codes called TINYTURBO . TINYTURBO has complexity comparable to the classical max-log-MAP algorithm but has much better reliability than the max-log-MAP baseline and performs close to the MAP algorithm. We show that TINYTURBO exhibits strong robustness on a variety of practical channels of interest, such as EPA and EVA channels, which are included in the LTE standards. We also show that TINYTURBO strongly generalizes across different rate, blocklengths, and trellises. We verify the reliability and efficiency of TINYTURBO via over-the-air experiments.
△ Less
Submitted 30 September, 2022;
originally announced September 2022.
-
Optimal Bootstrap** of PoW Blockchains
Authors:
Ranvir Rana,
Dimitris Karakostas,
Sreeram Kannan,
Aggelos Kiayias,
Pramod Viswanath
Abstract:
Proof of Work (PoW) blockchains are susceptible to adversarial majority mining attacks in the early stages due to incipient participation and corresponding low net hash power. Bootstrap** ensures safety and liveness during the transient stage by protecting against a majority mining attack, allowing a PoW chain to grow the participation base and corresponding mining hash power. Liveness is especi…
▽ More
Proof of Work (PoW) blockchains are susceptible to adversarial majority mining attacks in the early stages due to incipient participation and corresponding low net hash power. Bootstrap** ensures safety and liveness during the transient stage by protecting against a majority mining attack, allowing a PoW chain to grow the participation base and corresponding mining hash power. Liveness is especially important since a loss of liveness will lead to loss of honest mining rewards, decreasing honest participation, hence creating an undesired spiral; indeed existing bootstrap** mechanisms offer especially weak liveness guarantees.
In this paper, we propose Advocate, a new bootstrap** methodology, which achieves two main results: (a) optimal liveness and low latency under a super-majority adversary for the Nakamoto longest chain protocol and (b) immediate black-box generalization to a variety of parallel-chain based scaling architectures, including OHIE and Prism. We demonstrate via a full-stack implementation the robustness of Advocate under a 90% adversarial majority.
△ Less
Submitted 22 August, 2022;
originally announced August 2022.
-
Minotaur: Multi-Resource Blockchain Consensus
Authors:
Matthias Fitzi,
Xuechao Wang,
Sreeram Kannan,
Aggelos Kiayias,
Nikos Leonardos,
Pramod Viswanath,
Gerui Wang
Abstract:
Resource-based consensus is the backbone of permissionless distributed ledger systems. The security of such protocols relies fundamentally on the level of resources actively engaged in the system. The variety of different resources (and related proof protocols, some times referred to as PoX in the literature) raises the fundamental question whether it is possible to utilize many of them in tandem…
▽ More
Resource-based consensus is the backbone of permissionless distributed ledger systems. The security of such protocols relies fundamentally on the level of resources actively engaged in the system. The variety of different resources (and related proof protocols, some times referred to as PoX in the literature) raises the fundamental question whether it is possible to utilize many of them in tandem and build multi-resource consensus protocols. The challenge in combining different resources is to achieve fungibility between them, in the sense that security would hold as long as the cumulative adversarial power across all resources is bounded.
In this work, we put forth Minotaur, a multi-resource blockchain consensus protocol that combines proof-of-work (PoW) and proof-of-stake (PoS), and we prove it optimally fungible. At the core of our design, Minotaur operates in epochs while continuously sampling the active computational power to provide a fair exchange between the two resources, work and stake. Further, we demonstrate the ability of Minotaur to handle a higher degree of work fluctuation as compared to the Bitcoin blockchain; we also generalize Minotaur to any number of resources.
We demonstrate the simplicity of Minotaur via implementing a full stack client in Rust (available open source). We use the client to test the robustness of Minotaur to variable mining power and combined work/stake attacks and demonstrate concrete empirical evidence towards the suitability of Minotaur to serve as the consensus layer of a real-world blockchain.
△ Less
Submitted 7 September, 2022; v1 submitted 27 January, 2022;
originally announced January 2022.
-
KO codes: Inventing Nonlinear Encoding and Decoding for Reliable Wireless Communication via Deep-learning
Authors:
Ashok Vardhan Makkuva,
Xiyang Liu,
Mohammad Vahid Jamali,
Hessam Mahdavifar,
Sewoong Oh,
Pramod Viswanath
Abstract:
Landmark codes underpin reliable physical layer communication, e.g., Reed-Muller, BCH, Convolution, Turbo, LDPC and Polar codes: each is a linear code and represents a mathematical breakthrough. The impact on humanity is huge: each of these codes has been used in global wireless communication standards (satellite, WiFi, cellular). Reliability of communication over the classical additive white Gaus…
▽ More
Landmark codes underpin reliable physical layer communication, e.g., Reed-Muller, BCH, Convolution, Turbo, LDPC and Polar codes: each is a linear code and represents a mathematical breakthrough. The impact on humanity is huge: each of these codes has been used in global wireless communication standards (satellite, WiFi, cellular). Reliability of communication over the classical additive white Gaussian noise (AWGN) channel enables benchmarking and ranking of the different codes. In this paper, we construct KO codes, a computationaly efficient family of deep-learning driven (encoder, decoder) pairs that outperform the state-of-the-art reliability performance on the standardized AWGN channel. KO codes beat state-of-the-art Reed-Muller and Polar codes, under the low-complexity successive cancellation decoding, in the challenging short-to-medium block length regime on the AWGN channel. We show that the gains of KO codes are primarily due to the nonlinear map** of information bits directly to transmit real symbols (bypassing modulation) and yet possess an efficient, high performance decoder. The key technical innovation that renders this possible is design of a novel family of neural architectures inspired by the computation tree of the {\bf K}ronecker {\bf O}peration (KO) central to Reed-Muller and Polar codes. These architectures pave way for the discovery of a much richer class of hitherto unexplored nonlinear algebraic structures. The code is available at \href{https://github.com/deepcomm/KOcodes}{https://github.com/deepcomm/KOcodes}
△ Less
Submitted 29 August, 2021;
originally announced August 2021.
-
Securing Parallel-chain Protocols under Variable Mining Power
Authors:
Xuechao Wang,
Viswa Virinchi Muppirala,
Lei Yang,
Sreeram Kannan,
Pramod Viswanath
Abstract:
Several emerging PoW blockchain protocols rely on a "parallel-chain" architecture for scaling, where instead of a single chain, multiple chains are run in parallel and aggregated. A key requirement of practical PoW blockchains is to adapt to mining power variations over time. In this paper, we consider the design of provably secure parallel-chain protocols which can adapt to such mining power vari…
▽ More
Several emerging PoW blockchain protocols rely on a "parallel-chain" architecture for scaling, where instead of a single chain, multiple chains are run in parallel and aggregated. A key requirement of practical PoW blockchains is to adapt to mining power variations over time. In this paper, we consider the design of provably secure parallel-chain protocols which can adapt to such mining power variations.
The Bitcoin difficulty adjustment rule adjusts the difficulty target of block mining periodically to get a constant mean inter-block time. While superficially simple, the rule has proved itself to be sophisticated and successfully secure, both in practice and in theory. We show that natural adaptations of the Bitcoin adjustment rule to the parallel-chain case open the door to subtle, but catastrophic safety and liveness breaches. We uncover a meta-design principle that allow us to design variable mining difficulty protocols for three popular PoW blockchain proposals (Prism, OHIE, and Fruitchains) inside a common rubric.
The principle has three components:(M1) a pivot chain, based on which blocks in all chains choose difficulty, (M2) a monotonicity condition for referencing pivot chain blocks and (M3) translating additional protocol aspects from using levels (depth) to using "difficulty levels". We show that protocols employing a subset of these principles may have catastrophic failures. The security of the designs is also proved using a common rubric - the key technical challenge involves analyzing the interaction between the pivot chain and the other chains, as well as bounding the sudden changes in difficulty target experienced in non-pivot chains. We empirically investigate the responsivity of the new mining difficulty rule via simulations based on historical Bitcoin data, and find that the protocol very effectively controls the forking rate across all the chains.
△ Less
Submitted 9 May, 2021; v1 submitted 6 May, 2021;
originally announced May 2021.
-
Reed-Muller Subcodes: Machine Learning-Aided Design of Efficient Soft Recursive Decoding
Authors:
Mohammad Vahid Jamali,
Xiyang Liu,
Ashok Vardhan Makkuva,
Hessam Mahdavifar,
Sewoong Oh,
Pramod Viswanath
Abstract:
Reed-Muller (RM) codes are conjectured to achieve the capacity of any binary-input memoryless symmetric (BMS) channel, and are observed to have a comparable performance to that of random codes in terms of scaling laws. On the negative side, RM codes lack efficient decoders with performance close to that of a maximum likelihood decoder for general parameters. Also, they only admit certain discrete…
▽ More
Reed-Muller (RM) codes are conjectured to achieve the capacity of any binary-input memoryless symmetric (BMS) channel, and are observed to have a comparable performance to that of random codes in terms of scaling laws. On the negative side, RM codes lack efficient decoders with performance close to that of a maximum likelihood decoder for general parameters. Also, they only admit certain discrete sets of rates. In this paper, we focus on subcodes of RM codes with flexible rates that can take any code dimension from 1 to n, where n is the blocklength. We first extend the recursive projection-aggregation (RPA) algorithm proposed recently by Ye and Abbe for decoding RM codes. To lower the complexity of our decoding algorithm, referred to as subRPA in this paper, we investigate different ways for pruning the projections. We then derive the soft-decision based version of our algorithm, called soft-subRPA, that is shown to improve upon the performance of subRPA. Furthermore, it enables training a machine learning (ML) model to search for \textit{good} sets of projections in the sense of minimizing the decoding error rate. Training our ML model enables achieving very close to the performance of full-projection decoding with a significantly reduced number of projections. For instance, our simulation results on a (64,14) RM subcode show almost identical performance for full-projection decoding and pruned-projection decoding with 15 projections picked via training our ML model. This is equivalent to lowering the complexity by a factor of more than 4 without sacrificing the decoding performance.
△ Less
Submitted 2 February, 2021;
originally announced February 2021.
-
ACeD: Scalable Data Availability Oracle
Authors:
Peiyao Sheng,
Bowen Xue,
Sreeram Kannan,
Pramod Viswanath
Abstract:
A popular method in practice offloads computation and storage in blockchains by relying on committing only hashes of off-chain data into the blockchain. This mechanism is acknowledged to be vulnerable to a stalling attack: the blocks corresponding to the committed hashes may be unavailable at any honest node. The straightforward solution of broadcasting all blocks to the entire network sidesteps t…
▽ More
A popular method in practice offloads computation and storage in blockchains by relying on committing only hashes of off-chain data into the blockchain. This mechanism is acknowledged to be vulnerable to a stalling attack: the blocks corresponding to the committed hashes may be unavailable at any honest node. The straightforward solution of broadcasting all blocks to the entire network sidesteps this data availability attack, but it is not scalable. In this paper, we propose ACeD, a scalable solution to this data availability problem with $O(1)$ communication efficiency, the first to the best of our knowledge.
The key innovation is a new protocol that requires each of the $N$ nodes to receive only $O(1/N)$ of the block, such that the data is guaranteed to be available in a distributed manner in the network. Our solution creatively integrates coding-theoretic designs inside of Merkle tree commitments to guarantee efficient and tamper-proof reconstruction; this solution is distinct from Asynchronous Verifiable Information Dispersal (in guaranteeing efficient proofs of malformed coding) and Coded Merkle Tree (which only provides guarantees for random corruption as opposed to our guarantees for worst-case corruption). We implement ACeD with full functionality in 6000 lines of Rust code, integrate the functionality as a smart contract into Ethereum via a high-performance implementation demonstrating up to 10,000 transactions per second in throughput and 6000x reduction in gas cost on the Ethereum testnet Kovan.
△ Less
Submitted 3 March, 2021; v1 submitted 30 October, 2020;
originally announced November 2020.
-
Blockchain CAP Theorem Allows User-Dependent Adaptivity and Finality
Authors:
Suryanarayana Sankagiri,
Xuechao Wang,
Sreeram Kannan,
Pramod Viswanath
Abstract:
Longest-chain blockchain protocols, such as Bitcoin, guarantee liveness even when the number of actively participating users is variable, i.e., they are adaptive. However, they are not safe under network partitions, i.e., they do not guarantee finality. On the other hand, classical blockchain protocols, like PBFT, achieve finality but not adaptivity. Indeed, the CAP theorem in the context of block…
▽ More
Longest-chain blockchain protocols, such as Bitcoin, guarantee liveness even when the number of actively participating users is variable, i.e., they are adaptive. However, they are not safe under network partitions, i.e., they do not guarantee finality. On the other hand, classical blockchain protocols, like PBFT, achieve finality but not adaptivity. Indeed, the CAP theorem in the context of blockchains asserts that no protocol can simultaneously offer both adaptivity and finality.
We propose a new blockchain protocol, called the checkpointed longest chain, that offers individual users the choice between finality and adaptivity instead of imposing it at a system-wide level. This protocol's salient feature is that it supports two distinct confirmation rules: one that guarantees adaptivity and the other finality. The more optimistic adaptive rule always confirms blocks that are marked as finalized by the more conservative rule, and may possibly confirm more blocks during variable participation levels. Clients (users) make a local choice between the confirmation rules as per their personal preference, while miners follow a fixed block proposal rule that is consistent with both confirmation rules. The proposed protocol has the additional benefit of intrinsic validity: the finalized blocks always lie on a single blockchain, and therefore miners can attest to the validity of transactions while proposing blocks. Our protocol builds on the notion of a finality gadget, a popular technique for adding finality to longest-chain protocols.
△ Less
Submitted 2 April, 2021; v1 submitted 26 October, 2020;
originally announced October 2020.
-
BFT Protocol Forensics
Authors:
Peiyao Sheng,
Gerui Wang,
Kartik Nayak,
Sreeram Kannan,
Pramod Viswanath
Abstract:
Byzantine fault-tolerant (BFT) protocols allow a group of replicas to come to a consensus even when some of the replicas are Byzantine faulty. There exist multiple BFT protocols to securely tolerate an optimal number of faults $t$ under different network settings. However, if the number of faults $f$ exceeds $t$ then security could be violated. In this paper we mathematically formalize the study o…
▽ More
Byzantine fault-tolerant (BFT) protocols allow a group of replicas to come to a consensus even when some of the replicas are Byzantine faulty. There exist multiple BFT protocols to securely tolerate an optimal number of faults $t$ under different network settings. However, if the number of faults $f$ exceeds $t$ then security could be violated. In this paper we mathematically formalize the study of forensic support of BFT protocols: we aim to identify (with cryptographic integrity) as many of the malicious replicas as possible and in as a distributed manner as possible. Our main result is that forensic support of BFT protocols depends heavily on minor implementation details that do not affect the protocol's security or complexity. Focusing on popular BFT protocols (PBFT, HotStuff, Algorand) we exactly characterize their forensic support, showing that there exist minor variants of each protocol for which the forensic supports vary widely. We show strong forensic support capability of LibraBFT, the consensus protocol of Diem cryptocurrency; our lightweight forensic module implemented on a Diem client is open-sourced and is under active consideration for deployment in Diem. Finally, we show that all secure BFT protocols designed for $2t+1$ replicas communicating over a synchronous network forensic support are inherently nonexistent; this impossibility result holds for all BFT protocols and even if one has access to the states of all replicas (including Byzantine ones).
△ Less
Submitted 8 November, 2021; v1 submitted 13 October, 2020;
originally announced October 2020.
-
Enriching Word Embeddings with Temporal and Spatial Information
Authors:
Hongyu Gong,
Suma Bhat,
Pramod Viswanath
Abstract:
The meaning of a word is closely linked to sociocultural factors that can change over time and location, resulting in corresponding meaning changes. Taking a global view of words and their meanings in a widely used language, such as English, may require us to capture more refined semantics for use in time-specific or location-aware situations, such as the study of cultural trends or language use.…
▽ More
The meaning of a word is closely linked to sociocultural factors that can change over time and location, resulting in corresponding meaning changes. Taking a global view of words and their meanings in a widely used language, such as English, may require us to capture more refined semantics for use in time-specific or location-aware situations, such as the study of cultural trends or language use. However, popular vector representations for words do not adequately include temporal or spatial information. In this work, we present a model for learning word representation conditioned on time and location. In addition to capturing meaning changes over time and location, we require that the resulting word embeddings retain salient semantic and geometric properties. We train our model on time- and location-stamped corpora, and show using both quantitative and qualitative evaluations that it can capture semantics across time and locations. We note that our model compares favorably with the state-of-the-art for time-specific embedding, and serves as a new benchmark for location-specific embeddings.
△ Less
Submitted 1 October, 2020;
originally announced October 2020.
-
Deepcode and Modulo-SK are Designed for Different Settings
Authors:
Hyeji Kim,
Yihan Jiang,
Sreeram Kannan,
Sewoong Oh,
Pramod Viswanath
Abstract:
We respond to [1] which claimed that "Modulo-SK scheme outperforms Deepcode [2]". We demonstrate that this statement is not true: the two schemes are designed and evaluated for entirely different settings. DeepCode is designed and evaluated for the AWGN channel with (potentially delayed) uncoded output feedback. Modulo-SK is evaluated on the AWGN channel with coded feedback and unit delay. [1] als…
▽ More
We respond to [1] which claimed that "Modulo-SK scheme outperforms Deepcode [2]". We demonstrate that this statement is not true: the two schemes are designed and evaluated for entirely different settings. DeepCode is designed and evaluated for the AWGN channel with (potentially delayed) uncoded output feedback. Modulo-SK is evaluated on the AWGN channel with coded feedback and unit delay. [1] also claimed an implementation of Schalkwijk and Kailath (SK) [3] which was numerically stable for any number of information bits and iterations. However, we observe that while their implementation does marginally improve over ours, it also suffers from a fundamental issue with precision. Finally, we show that Deepcode dominates the optimized performance of SK, over a natural choice of parameterizations when the feedback is noisy.
△ Less
Submitted 18 August, 2020;
originally announced August 2020.
-
Everything is a Race and Nakamoto Always Wins
Authors:
Amir Dembo,
Sreeram Kannan,
Ertem Nusret Tas,
David Tse,
Pramod Viswanath,
Xuechao Wang,
Ofer Zeitouni
Abstract:
Nakamoto invented the longest chain protocol, and claimed its security by analyzing the private double-spend attack, a race between the adversary and the honest nodes to grow a longer chain. But is it the worst attack? We answer the question in the affirmative for three classes of longest chain protocols, designed for different consensus models: 1) Nakamoto's original Proof-of-Work protocol; 2) Ou…
▽ More
Nakamoto invented the longest chain protocol, and claimed its security by analyzing the private double-spend attack, a race between the adversary and the honest nodes to grow a longer chain. But is it the worst attack? We answer the question in the affirmative for three classes of longest chain protocols, designed for different consensus models: 1) Nakamoto's original Proof-of-Work protocol; 2) Ouroboros and SnowWhite Proof-of-Stake protocols; 3) Chia Proof-of-Space protocol. As a consequence, exact characterization of the maximum tolerable adversary power is obtained for each protocol as a function of the average block time normalized by the network delay. The security analysis of these protocols is performed in a unified manner by a novel method of reducing all attacks to a race between the adversary and the honest nodes.
△ Less
Submitted 30 August, 2020; v1 submitted 21 May, 2020;
originally announced May 2020.
-
Free2Shard: Adaptive-adversary-resistant sharding via Dynamic Self Allocation
Authors:
Ranvir Rana,
Sreeram Kannan,
David Tse,
Pramod Viswanath
Abstract:
Propelled by the growth of large-scale blockchain deployments, much recent progress has been made in designing sharding protocols that achieve throughput scaling linearly in the number of nodes. However, existing protocols are not robust to an adversary adaptively corrupting a fixed fraction of nodes. In this paper, we propose Free2Shard -- a new architecture that achieves near-linear scaling whil…
▽ More
Propelled by the growth of large-scale blockchain deployments, much recent progress has been made in designing sharding protocols that achieve throughput scaling linearly in the number of nodes. However, existing protocols are not robust to an adversary adaptively corrupting a fixed fraction of nodes. In this paper, we propose Free2Shard -- a new architecture that achieves near-linear scaling while being secure against a fully adaptive adversary.
The focal point of this architecture is a dynamic self-allocation algorithm that lets users allocate themselves to shards in response to adversarial action, without requiring a central or cryptographic proof. This architecture has several attractive features unusual for sharding protocols, including: (a) the ability to handle the regime of large number of shards (relative to the number of nodes); (b) heterogeneous shard demands; (c) requiring only a small minority to follow the self-allocation; (d) asynchronous shard rotation; (e) operation in a purely identity-free proof-of-work setting. The key technical contribution is a deep mathematical connection to the classical work of Blackwell in dynamic game theory.
△ Less
Submitted 19 May, 2020;
originally announced May 2020.
-
Prism Removes Consensus Bottleneck for Smart Contracts
Authors:
Gerui Wang,
Shuo Wang,
Vivek Bagaria,
David Tse,
Pramod Viswanath
Abstract:
The performance of existing permissionless smart contract platforms such as Ethereum is limited by the consensus layer. Prism is a new proof-of-work consensus protocol that provably achieves throughput and latency up to physical limits while retaining the strong guarantees of the longest chain protocol. This paper reports experimental results from implementations of two smart contract virtual mach…
▽ More
The performance of existing permissionless smart contract platforms such as Ethereum is limited by the consensus layer. Prism is a new proof-of-work consensus protocol that provably achieves throughput and latency up to physical limits while retaining the strong guarantees of the longest chain protocol. This paper reports experimental results from implementations of two smart contract virtual machines, EVM and MoveVM, on top of Prism and demonstrates that the consensus bottleneck has been removed. Code can be found at https://github.com/wgr523/prism-smart-contracts.
△ Less
Submitted 12 June, 2020; v1 submitted 19 April, 2020;
originally announced April 2020.
-
Turbo Autoencoder: Deep learning based channel codes for point-to-point communication channels
Authors:
Yihan Jiang,
Hyeji Kim,
Himanshu Asnani,
Sreeram Kannan,
Sewoong Oh,
Pramod Viswanath
Abstract:
Designing codes that combat the noise in a communication medium has remained a significant area of research in information theory as well as wireless communications. Asymptotically optimal channel codes have been developed by mathematicians for communicating under canonical models after over 60 years of research. On the other hand, in many non-canonical channel settings, optimal codes do not exist…
▽ More
Designing codes that combat the noise in a communication medium has remained a significant area of research in information theory as well as wireless communications. Asymptotically optimal channel codes have been developed by mathematicians for communicating under canonical models after over 60 years of research. On the other hand, in many non-canonical channel settings, optimal codes do not exist and the codes designed for canonical models are adapted via heuristics to these channels and are thus not guaranteed to be optimal. In this work, we make significant progress on this problem by designing a fully end-to-end jointly trained neural encoder and decoder, namely, Turbo Autoencoder (TurboAE), with the following contributions: ($a$) under moderate block lengths, TurboAE approaches state-of-the-art performance under canonical channels; ($b$) moreover, TurboAE outperforms the state-of-the-art codes under non-canonical settings in terms of reliability. TurboAE shows that the development of channel coding design can be automated via deep learning, with near-optimal performance.
△ Less
Submitted 7 November, 2019;
originally announced November 2019.
-
Proof-of-Stake Longest Chain Protocols: Security vs Predictability
Authors:
Vivek Bagaria,
Amir Dembo,
Sreeram Kannan,
Sewoong Oh,
David Tse,
Pramod Viswanath,
Xuechao Wang,
Ofer Zeitouni
Abstract:
The Nakamoto longest chain protocol is remarkably simple and has been proven to provide security against any adversary with less than 50% of the total hashing power. Proof-of-stake (PoS) protocols are an energy efficient alternative; however existing protocols adopting Nakamoto's longest chain design achieve provable security only by allowing long-term predictability (which have serious security i…
▽ More
The Nakamoto longest chain protocol is remarkably simple and has been proven to provide security against any adversary with less than 50% of the total hashing power. Proof-of-stake (PoS) protocols are an energy efficient alternative; however existing protocols adopting Nakamoto's longest chain design achieve provable security only by allowing long-term predictability (which have serious security implications). In this paper, we prove that a natural longest chain PoS protocol with similar predictability as Nakamoto's PoW protocol can achieve security against any adversary with less than 1/(1+e) fraction of the total stake. Moreover we propose a new family of longest chain PoS protocols that achieve security against a 50% adversary, while only requiring short-term predictability. Our proofs present a new approach to analyzing the formal security of blockchains, based on a notion of adversary-proof convergence.
△ Less
Submitted 22 February, 2020; v1 submitted 5 October, 2019;
originally announced October 2019.
-
Coded Merkle Tree: Solving Data Availability Attacks in Blockchains
Authors:
Mingchao Yu,
Saeid Sahraei,
Songze Li,
Salman Avestimehr,
Sreeram Kannan,
Pramod Viswanath
Abstract:
In this paper, we propose coded Merkle tree (CMT), a novel hash accumulator that offers a constant-cost protection against data availability attacks in blockchains, even if the majority of the network nodes are malicious. A CMT is constructed using a family of sparse erasure codes on each layer, and is recovered by iteratively applying a peeling-decoding technique that enables a compact proof for…
▽ More
In this paper, we propose coded Merkle tree (CMT), a novel hash accumulator that offers a constant-cost protection against data availability attacks in blockchains, even if the majority of the network nodes are malicious. A CMT is constructed using a family of sparse erasure codes on each layer, and is recovered by iteratively applying a peeling-decoding technique that enables a compact proof for data availability attack on any layer. Our algorithm enables any node to verify the full availability of any data block generated by the system by just downloading a $Θ(1)$ byte block hash commitment and randomly sampling $Θ(\log b)$ bytes, where $b$ is the size of the data block. With the help of only one connected honest node in the system, our method also allows any node to verify any tampering of the coded Merkle tree by just downloading $Θ(\log b)$ bytes. We provide a modular library for CMT in Rust and Python and demonstrate its efficacy inside the Parity Bitcoin client.
△ Less
Submitted 20 December, 2019; v1 submitted 2 October, 2019;
originally announced October 2019.
-
Practical Low Latency Proof of Work Consensus
Authors:
Lei Yang,
Xuechao Wang,
Vivek Bagaria,
Gerui Wang,
Mohammad Alizadeh,
David Tse,
Giulia Fanti,
Pramod Viswanath
Abstract:
Bitcoin is the first fully-decentralized permissionless blockchain protocol to achieve a high level of security, but at the expense of poor throughput and latency. Scaling the performance of Bitcoin has a been a major recent direction of research. One successful direction of work has involved replacing proof of work (PoW) by proof of stake (PoS). Proposals to scale the performance in the PoW setti…
▽ More
Bitcoin is the first fully-decentralized permissionless blockchain protocol to achieve a high level of security, but at the expense of poor throughput and latency. Scaling the performance of Bitcoin has a been a major recent direction of research. One successful direction of work has involved replacing proof of work (PoW) by proof of stake (PoS). Proposals to scale the performance in the PoW setting itself have focused mostly on parallelizing the mining process, scaling throughput; the few proposals to improve latency have either sacrificed throughput or the latency guarantees involve large constants rendering it practically useless. Our first contribution is to design a new PoW blockchain Prism++ that has provably low latency and high throughput; the design retains the parallel-chain approach espoused in Prism but invents a new confirmation rule to infer the permanency of a block by combining information across the parallel chains. We show security at the level of Bitcoin with very small confirmation latency (a small constant factor of block interarrival time). A key aspect to scaling the performance is to use a large number of parallel chains, which puts significant strain on the system. Our second contribution is the design and evaluation of a practical system to efficiently manage the memory, computation, and I/O imperatives of a large number of parallel chains. Our implementation of Prism++ achieves a throughput of over 80,000 transactions per second and confirmation latency of tens of seconds on networks of up to 900 EC2 Virtual Machines.
△ Less
Submitted 17 February, 2023; v1 submitted 24 September, 2019;
originally announced September 2019.
-
Barracuda: The Power of $\ell$-polling in Proof-of-Stake Blockchains
Authors:
Giulia Fanti,
Jiantao Jiao,
Ashok Makkuva,
Sewoong Oh,
Ranvir Rana,
Pramod Viswanath
Abstract:
A blockchain is a database of sequential events that is maintained by a distributed group of nodes. A key consensus problem in blockchains is that of determining the next block (data element) in the sequence. Many blockchains address this by electing a new node to propose each new block. The new block is (typically) appended to the tip of the proposer's local blockchain, and subsequently broadcast…
▽ More
A blockchain is a database of sequential events that is maintained by a distributed group of nodes. A key consensus problem in blockchains is that of determining the next block (data element) in the sequence. Many blockchains address this by electing a new node to propose each new block. The new block is (typically) appended to the tip of the proposer's local blockchain, and subsequently broadcast to the rest of the network. Without network delay (or adversarial behavior), this procedure would give a perfect chain, since each proposer would have the same view of the blockchain. A major challenge in practice is forking. Due to network delays, a proposer may not yet have the most recent block, and may, therefore, create a side chain that branches from the middle of the main chain. Forking reduces throughput, since only one a single main chain can survive, and all other blocks are discarded. We propose a new P2P protocol for blockchains called Barracuda, in which each proposer, prior to proposing a block, polls $\ell$ other nodes for their local blocktree information. Under a stochastic network model, we prove that this lightweight primitive improves throughput as if the entire network were a factor of $\ell$ faster. We provide guidelines on how to implement Barracuda in practice, guaranteeing robustness against several real-world factors.
△ Less
Submitted 18 September, 2019;
originally announced September 2019.
-
Coded State Machine -- Scaling State Machine Execution under Byzantine Faults
Authors:
Songze Li,
Saeid Sahraei,
Mingchao Yu,
Salman Avestimehr,
Sreeram Kannan,
Pramod Viswanath
Abstract:
We introduce an information-theoretic framework, named Coded State Machine (CSM), to securely and efficiently execute multiple state machines on untrusted network nodes, some of which are Byzantine. The standard method of solving this problem is using State Machine Replication, which achieves high security at the cost of low efficiency. We propose CSM, which achieves the optimal linear scaling in…
▽ More
We introduce an information-theoretic framework, named Coded State Machine (CSM), to securely and efficiently execute multiple state machines on untrusted network nodes, some of which are Byzantine. The standard method of solving this problem is using State Machine Replication, which achieves high security at the cost of low efficiency. We propose CSM, which achieves the optimal linear scaling in storage efficiency, throughput, and security simultaneously with the size of the network. The storage efficiency is scaled via the design of Lagrange coded states and coded input commands that require the same storage size as their origins. The computational efficiency is scaled using a novel delegation algorithm, called INTERMIX, which is an information-theoretically verifiable matrix-vector multiplication algorithm of independent interest. Using INTERMIX, the network nodes securely delegate their coding operations to a single worker node, and a small group of randomly selected auditor nodes verify its correctness, so that computational efficiency can scale almost linearly with the network size, without compromising on security.
△ Less
Submitted 25 June, 2019;
originally announced June 2019.
-
Learning in Gated Neural Networks
Authors:
Ashok Vardhan Makkuva,
Sewoong Oh,
Sreeram Kannan,
Pramod Viswanath
Abstract:
Gating is a key feature in modern neural networks including LSTMs, GRUs and sparsely-gated deep neural networks. The backbone of such gated networks is a mixture-of-experts layer, where several experts make regression decisions and gating controls how to weigh the decisions in an input-dependent manner. Despite having such a prominent role in both modern and classical machine learning, very little…
▽ More
Gating is a key feature in modern neural networks including LSTMs, GRUs and sparsely-gated deep neural networks. The backbone of such gated networks is a mixture-of-experts layer, where several experts make regression decisions and gating controls how to weigh the decisions in an input-dependent manner. Despite having such a prominent role in both modern and classical machine learning, very little is understood about parameter recovery of mixture-of-experts since gradient descent and EM algorithms are known to be stuck in local optima in such models.
In this paper, we perform a careful analysis of the optimization landscape and show that with appropriately designed loss functions, gradient descent can indeed learn the parameters accurately. A key idea underpinning our results is the design of two {\em distinct} loss functions, one for recovering the expert parameters and another for recovering the gating parameters. We demonstrate the first sample complexity results for parameter recovery in this model for any algorithm and demonstrate significant performance gains over standard loss functions in numerical experiments.
△ Less
Submitted 17 June, 2020; v1 submitted 6 June, 2019;
originally announced June 2019.
-
DeepTurbo: Deep Turbo Decoder
Authors:
Yihan Jiang,
Hyeji Kim,
Himanshu Asnani,
Sreeram Kannan,
Sewoong Oh,
Pramod Viswanath
Abstract:
Present-day communication systems routinely use codes that approach the channel capacity when coupled with a computationally efficient decoder. However, the decoder is typically designed for the Gaussian noise channel and is known to be sub-optimal for non-Gaussian noise distribution. Deep learning methods offer a new approach for designing decoders that can be trained and tailored for arbitrary c…
▽ More
Present-day communication systems routinely use codes that approach the channel capacity when coupled with a computationally efficient decoder. However, the decoder is typically designed for the Gaussian noise channel and is known to be sub-optimal for non-Gaussian noise distribution. Deep learning methods offer a new approach for designing decoders that can be trained and tailored for arbitrary channel statistics. We focus on Turbo codes and propose DeepTurbo, a novel deep learning based architecture for Turbo decoding.
The standard Turbo decoder (Turbo) iteratively applies the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm with an interleaver in the middle. A neural architecture for Turbo decoding termed (NeuralBCJR), was proposed recently. There, the key idea is to create a module that imitates the BCJR algorithm using supervised learning, and to use the interleaver architecture along with this module, which is then fine-tuned using end-to-end training. However, knowledge of the BCJR algorithm is required to design such an architecture, which also constrains the resulting learned decoder. Here we remedy this requirement and propose a fully end-to-end trained neural decoder - Deep Turbo Decoder (DeepTurbo). With novel learnable decoder structure and training methodology, DeepTurbo reveals superior performance under both AWGN and non-AWGN settings as compared to the other two decoders - Turbo and NeuralBCJR. Furthermore, among all the three, DeepTurbo exhibits the lowest error floor.
△ Less
Submitted 24 April, 2019; v1 submitted 6 March, 2019;
originally announced March 2019.
-
Context-Sensitive Malicious Spelling Error Correction
Authors:
Hongyu Gong,
Yuchen Li,
Suma Bhat,
Pramod Viswanath
Abstract:
Misspelled words of the malicious kind work by changing specific keywords and are intended to thwart existing automated applications for cyber-environment control such as harassing content detection on the Internet and email spam detection. In this paper, we focus on malicious spelling correction, which requires an approach that relies on the context and the surface forms of targeted keywords. In…
▽ More
Misspelled words of the malicious kind work by changing specific keywords and are intended to thwart existing automated applications for cyber-environment control such as harassing content detection on the Internet and email spam detection. In this paper, we focus on malicious spelling correction, which requires an approach that relies on the context and the surface forms of targeted keywords. In the context of two applications--profanity detection and email spam detection--we show that malicious misspellings seriously degrade their performance. We then propose a context-sensitive approach for malicious spelling correction using word embeddings and demonstrate its superior performance compared to state-of-the-art spell checkers.
△ Less
Submitted 22 January, 2019;
originally announced January 2019.
-
LEARN Codes: Inventing Low-latency Codes via Recurrent Neural Networks
Authors:
Yihan Jiang,
Hyeji Kim,
Himanshu Asnani,
Sreeram Kannan,
Sewoong Oh,
Pramod Viswanath
Abstract:
Designing channel codes under low-latency constraints is one of the most demanding requirements in 5G standards. However, a sharp characterization of the performance of traditional codes is available only in the large block-length limit. Guided by such asymptotic analysis, code designs require large block lengths as well as latency to achieve the desired error rate. Tail-biting convolutional codes…
▽ More
Designing channel codes under low-latency constraints is one of the most demanding requirements in 5G standards. However, a sharp characterization of the performance of traditional codes is available only in the large block-length limit. Guided by such asymptotic analysis, code designs require large block lengths as well as latency to achieve the desired error rate. Tail-biting convolutional codes and other recent state-of-the-art short block codes, while promising reduced latency, are neither robust to channel-mismatch nor adaptive to varying channel conditions. When the codes designed for one channel (e.g.,~Additive White Gaussian Noise (AWGN) channel) are used for another (e.g.,~non-AWGN channels), heuristics are necessary to achieve non-trivial performance.
In this paper, we first propose an end-to-end learned neural code, obtained by jointly designing a Recurrent Neural Network (RNN) based encoder and decoder. This code outperforms canonical convolutional code under block settings. We then leverage this experience to propose a new class of codes under low-latency constraints, which we call Low-latency Efficient Adaptive Robust Neural (LEARN) codes. These codes outperform state-of-the-art low-latency codes and exhibit robustness and adaptivity properties. LEARN codes show the potential to design new versatile and universal codes for future communications via tools of modern deep learning coupled with communication engineering insights.
△ Less
Submitted 24 July, 2020; v1 submitted 30 November, 2018;
originally announced November 2018.
-
Estimators for Multivariate Information Measures in General Probability Spaces
Authors:
Arman Rahimzamani,
Himanshu Asnani,
Pramod Viswanath,
Sreeram Kannan
Abstract:
Information theoretic quantities play an important role in various settings in machine learning, including causality testing, structure inference in graphical models, time-series problems, feature selection as well as in providing privacy guarantees. A key quantity of interest is the mutual information and generalizations thereof, including conditional mutual information, multivariate mutual infor…
▽ More
Information theoretic quantities play an important role in various settings in machine learning, including causality testing, structure inference in graphical models, time-series problems, feature selection as well as in providing privacy guarantees. A key quantity of interest is the mutual information and generalizations thereof, including conditional mutual information, multivariate mutual information, total correlation and directed information. While the aforementioned information quantities are well defined in arbitrary probability spaces, existing estimators add or subtract entropies (we term them $ΣH$ methods). These methods work only in purely discrete space or purely continuous case since entropy (or differential entropy) is well defined only in that regime.
In this paper, we define a general graph divergence measure ($\mathbb{GDM}$), as a measure of incompatibility between the observed distribution and a given graphical model structure. This generalizes the aforementioned information measures and we construct a novel estimator via a coupling trick that directly estimates these multivariate information measures using the Radon-Nikodym derivative. These estimators are proven to be consistent in a general setting which includes several cases where the existing estimators fail, thus providing the only known estimators for the following settings: (1) the data has some discrete and some continuous-valued components (2) some (or all) of the components themselves are discrete-continuous mixtures (3) the data is real-valued but does not have a joint density on the entire space, rather is supported on a low-dimensional manifold. We show that our proposed estimators significantly outperform known estimators on synthetic and real datasets.
△ Less
Submitted 26 October, 2018;
originally announced October 2018.
-
Deconstructing the Blockchain to Approach Physical Limits
Authors:
Vivek Bagaria,
Sreeram Kannan,
David Tse,
Giulia Fanti,
Pramod Viswanath
Abstract:
Transaction throughput, confirmation latency and confirmation reliability are fundamental performance measures of any blockchain system in addition to its security. In a decentralized setting, these measures are limited by two underlying physical network attributes: communication capacity and speed-of-light propagation delay. Existing systems operate far away from these physical limits. In this wo…
▽ More
Transaction throughput, confirmation latency and confirmation reliability are fundamental performance measures of any blockchain system in addition to its security. In a decentralized setting, these measures are limited by two underlying physical network attributes: communication capacity and speed-of-light propagation delay. Existing systems operate far away from these physical limits. In this work we introduce Prism, a new proof-of-work blockchain protocol, which can achieve 1) security against up to 50% adversarial hashing power; 2) optimal throughput up to the capacity C of the network; 3) confirmation latency for honest transactions proportional to the propagation delay D, with confirmation error probability exponentially small in CD ; 4) eventual total ordering of all transactions. Our approach to the design of this protocol is based on deconstructing the blockchain into its basic functionalities and systematically scaling up these functionalities to approach their physical limits.
△ Less
Submitted 1 October, 2019; v1 submitted 18 October, 2018;
originally announced October 2018.
-
Learning One-hidden-layer Neural Networks under General Input Distributions
Authors:
Weihao Gao,
Ashok Vardhan Makkuva,
Sewoong Oh,
Pramod Viswanath
Abstract:
Significant advances have been made recently on training neural networks, where the main challenge is in solving an optimization problem with abundant critical points. However, existing approaches to address this issue crucially rely on a restrictive assumption: the training data is drawn from a Gaussian distribution. In this paper, we provide a novel unified framework to design loss functions wit…
▽ More
Significant advances have been made recently on training neural networks, where the main challenge is in solving an optimization problem with abundant critical points. However, existing approaches to address this issue crucially rely on a restrictive assumption: the training data is drawn from a Gaussian distribution. In this paper, we provide a novel unified framework to design loss functions with desirable landscape properties for a wide range of general input distributions. On these loss functions, remarkably, stochastic gradient descent theoretically recovers the true parameters with global initializations and empirically outperforms the existing approaches. Our loss function design bridges the notion of score functions with the topic of neural network optimization. Central to our approach is the task of estimating the score function from samples, which is of basic and independent interest to theoretical statistics. Traditional estimation methods (example: kernel based) fail right at the outset; we bring statistical methods of local likelihood to design a novel estimator of score functions, that provably adapts to the local geometry of the unknown density.
△ Less
Submitted 26 February, 2019; v1 submitted 9 October, 2018;
originally announced October 2018.
-
PolyShard: Coded Sharding Achieves Linearly Scaling Efficiency and Security Simultaneously
Authors:
Songze Li,
Mingchao Yu,
Chien-Sheng Yang,
A. Salman Avestimehr,
Sreeram Kannan,
Pramod Viswanath
Abstract:
Today's blockchain designs suffer from a trilemma claiming that no blockchain system can simultaneously achieve decentralization, security, and performance scalability. For current blockchain systems, as more nodes join the network, the efficiency of the system (computation, communication, and storage) stays constant at best. A leading idea for enabling blockchains to scale efficiency is the notio…
▽ More
Today's blockchain designs suffer from a trilemma claiming that no blockchain system can simultaneously achieve decentralization, security, and performance scalability. For current blockchain systems, as more nodes join the network, the efficiency of the system (computation, communication, and storage) stays constant at best. A leading idea for enabling blockchains to scale efficiency is the notion of sharding: different subsets of nodes handle different portions of the blockchain, thereby reducing the load for each individual node. However, existing sharding proposals achieve efficiency scaling by compromising on trust - corrupting the nodes in a given shard will lead to the permanent loss of the corresponding portion of data. In this paper, we settle the trilemma by demonstrating a new protocol for coded storage and computation in blockchains. In particular, we propose PolyShard: ``polynomially coded sharding'' scheme that achieves information-theoretic upper bounds on the efficiency of the storage, system throughput, as well as on trust, thus enabling a truly scalable system. We provide simulation results that numerically demonstrate the performance improvement over state of the arts, and the scalability of the PolyShard system. Finally, we discuss potential enhancements, and highlight practical considerations in building such a system.
△ Less
Submitted 24 January, 2020; v1 submitted 27 September, 2018;
originally announced September 2018.
-
Compounding of Wealth in Proof-of-Stake Cryptocurrencies
Authors:
Giulia Fanti,
Leonid Kogan,
Sewoong Oh,
Kathleen Ruan,
Pramod Viswanath,
Gerui Wang
Abstract:
Proof-of-stake (PoS) is a promising approach for designing efficient blockchains, where block proposers are randomly chosen with probability proportional to their stake. A primary concern with PoS systems is the "rich getting richer" phenomenon, whereby wealthier nodes are more likely to get elected, and hence reap the block reward, making them even wealthier. In this paper, we introduce the notio…
▽ More
Proof-of-stake (PoS) is a promising approach for designing efficient blockchains, where block proposers are randomly chosen with probability proportional to their stake. A primary concern with PoS systems is the "rich getting richer" phenomenon, whereby wealthier nodes are more likely to get elected, and hence reap the block reward, making them even wealthier. In this paper, we introduce the notion of equitability, which quantifies how much a proposer can amplify her stake compared to her initial investment. Even with everyone following protocol (i.e., honest behavior), we show that existing methods of allocating block rewards lead to poor equitability, as does initializing systems with small stake pools and/or large rewards relative to the stake pool. We identify a \emph{geometric} reward function, which we prove is maximally equitable over all choices of reward functions under honest behavior and bound the deviation for strategic actions; the proofs involve the study of optimization problems and stochastic dominances of Polya urn processes, and are of independent mathematical interest. These results allow us to provide a systematic framework to choose the parameters of a practical incentive system for PoS cryptocurrencies.
△ Less
Submitted 15 October, 2018; v1 submitted 20 September, 2018;
originally announced September 2018.
-
Deepcode: Feedback Codes via Deep Learning
Authors:
Hyeji Kim,
Yihan Jiang,
Sreeram Kannan,
Sewoong Oh,
Pramod Viswanath
Abstract:
The design of codes for communicating reliably over a statistically well defined channel is an important endeavor involving deep mathematical research and wide-ranging practical applications. In this work, we present the first family of codes obtained via deep learning, which significantly beats state-of-the-art codes designed over several decades of research. The communication channel under consi…
▽ More
The design of codes for communicating reliably over a statistically well defined channel is an important endeavor involving deep mathematical research and wide-ranging practical applications. In this work, we present the first family of codes obtained via deep learning, which significantly beats state-of-the-art codes designed over several decades of research. The communication channel under consideration is the Gaussian noise channel with feedback, whose study was initiated by Shannon; feedback is known theoretically to improve reliability of communication, but no practical codes that do so have ever been successfully constructed.
We break this logjam by integrating information theoretic insights harmoniously with recurrent-neural-network based encoders and decoders to create novel codes that outperform known codes by 3 orders of magnitude in reliability. We also demonstrate several desirable properties of the codes: (a) generalization to larger block lengths, (b) composability with known codes, (c) adaptation to practical constraints. This result also has broader ramifications for coding theory: even when the channel has a clear mathematical model, deep learning methodologies, when combined with channel-specific information-theoretic insights, can potentially beat state-of-the-art codes constructed over decades of mathematical research.
△ Less
Submitted 2 July, 2018;
originally announced July 2018.
-
Dandelion++: Lightweight Cryptocurrency Networking with Formal Anonymity Guarantees
Authors:
Giulia Fanti,
Shaileshh Bojja Venkatakrishnan,
Surya Bakshi,
Bradley Denby,
Shruti Bhargava,
Andrew Miller,
Pramod Viswanath
Abstract:
Recent work has demonstrated significant anonymity vulnerabilities in Bitcoin's networking stack. In particular, the current mechanism for broadcasting Bitcoin transactions allows third-party observers to link transactions to the IP addresses that originated them. This lays the groundwork for low-cost, large-scale deanonymization attacks. In this work, we present Dandelion++, a first-principles de…
▽ More
Recent work has demonstrated significant anonymity vulnerabilities in Bitcoin's networking stack. In particular, the current mechanism for broadcasting Bitcoin transactions allows third-party observers to link transactions to the IP addresses that originated them. This lays the groundwork for low-cost, large-scale deanonymization attacks. In this work, we present Dandelion++, a first-principles defense against large-scale deanonymization attacks with near-optimal information-theoretic guarantees. Dandelion++ builds upon a recent proposal called Dandelion that exhibited similar goals. However, in this paper, we highlight simplifying assumptions made in Dandelion, and show how they can lead to serious deanonymization attacks when violated. In contrast, Dandelion++ defends against stronger adversaries that are allowed to disobey protocol. Dandelion++ is lightweight, scalable, and completely interoperable with the existing Bitcoin network. We evaluate it through experiments on Bitcoin's mainnet (i.e., the live Bitcoin network) to demonstrate its interoperability and low broadcast latency overhead.
△ Less
Submitted 28 May, 2018;
originally announced May 2018.
-
Embedding Syntax and Semantics of Prepositions via Tensor Decomposition
Authors:
Hongyu Gong,
Suma Bhat,
Pramod Viswanath
Abstract:
Prepositions are among the most frequent words in English and play complex roles in the syntax and semantics of sentences. Not surprisingly, they pose well-known difficulties in automatic processing of sentences (prepositional attachment ambiguities and idiosyncratic uses in phrases). Existing methods on preposition representation treat prepositions no different from content words (e.g., word2vec…
▽ More
Prepositions are among the most frequent words in English and play complex roles in the syntax and semantics of sentences. Not surprisingly, they pose well-known difficulties in automatic processing of sentences (prepositional attachment ambiguities and idiosyncratic uses in phrases). Existing methods on preposition representation treat prepositions no different from content words (e.g., word2vec and GloVe). In addition, recent studies aiming at solving prepositional attachment and preposition selection problems depend heavily on external linguistic resources and use dataset-specific word representations. In this paper we use word-triple counts (one of the triples being a preposition) to capture a preposition's interaction with its attachment and complement. We then derive preposition embeddings via tensor decomposition on a large unlabeled corpus. We reveal a new geometry involving Hadamard products and empirically demonstrate its utility in paraphrasing phrasal verbs. Furthermore, our preposition embeddings are used as simple features in two challenging downstream tasks: preposition selection and prepositional attachment disambiguation. We achieve results comparable to or better than the state-of-the-art on multiple standardized datasets.
△ Less
Submitted 23 May, 2018;
originally announced May 2018.
-
Communication Algorithms via Deep Learning
Authors:
Hyeji Kim,
Yihan Jiang,
Ranvir Rana,
Sreeram Kannan,
Sewoong Oh,
Pramod Viswanath
Abstract:
Coding theory is a central discipline underpinning wireline and wireless modems that are the workhorses of the information age. Progress in coding theory is largely driven by individual human ingenuity with sporadic breakthroughs over the past century. In this paper we study whether it is possible to automate the discovery of decoding algorithms via deep learning. We study a family of sequential c…
▽ More
Coding theory is a central discipline underpinning wireline and wireless modems that are the workhorses of the information age. Progress in coding theory is largely driven by individual human ingenuity with sporadic breakthroughs over the past century. In this paper we study whether it is possible to automate the discovery of decoding algorithms via deep learning. We study a family of sequential codes parameterized by recurrent neural network (RNN) architectures. We show that creatively designed and trained RNN architectures can decode well known sequential codes such as the convolutional and turbo codes with close to optimal performance on the additive white Gaussian noise (AWGN) channel, which itself is achieved by breakthrough algorithms of our times (Viterbi and BCJR decoders, representing dynamic programing and forward-backward algorithms). We show strong generalizations, i.e., we train at a specific signal to noise ratio and block length but test at a wide range of these quantities, as well as robustness and adaptivity to deviations from the AWGN setting.
△ Less
Submitted 23 May, 2018;
originally announced May 2018.
-
Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms
Authors:
Ashok Vardhan Makkuva,
Sewoong Oh,
Sreeram Kannan,
Pramod Viswanath
Abstract:
Mixture-of-Experts (MoE) is a widely popular model for ensemble learning and is a basic building block of highly successful modern neural networks as well as a component in Gated Recurrent Units (GRU) and Attention networks. However, present algorithms for learning MoE including the EM algorithm, and gradient descent are known to get stuck in local optima. From a theoretical viewpoint, finding an…
▽ More
Mixture-of-Experts (MoE) is a widely popular model for ensemble learning and is a basic building block of highly successful modern neural networks as well as a component in Gated Recurrent Units (GRU) and Attention networks. However, present algorithms for learning MoE including the EM algorithm, and gradient descent are known to get stuck in local optima. From a theoretical viewpoint, finding an efficient and provably consistent algorithm to learn the parameters remains a long standing open problem for more than two decades. In this paper, we introduce the first algorithm that learns the true parameters of a MoE model for a wide class of non-linearities with global consistency guarantees. While existing algorithms jointly or iteratively estimate the expert parameters and the gating paramters in the MoE, we propose a novel algorithm that breaks the deadlock and can directly estimate the expert parameters by sensing its echo in a carefully designed cross-moment tensor between the inputs and the output. Once the experts are known, the recovery of gating parameters still requires an EM algorithm; however, we show that the EM algorithm for this simplified problem, unlike the joint EM algorithm, converges to the true parameters. We empirically validate our algorithm on both the synthetic and real data sets in a variety of settings, and show superior performance to standard baselines.
△ Less
Submitted 6 June, 2019; v1 submitted 20 February, 2018;
originally announced February 2018.
-
Graph2Seq: Scalable Learning Dynamics for Graphs
Authors:
Shaileshh Bojja Venkatakrishnan,
Mohammad Alizadeh,
Pramod Viswanath
Abstract:
Neural networks have been shown to be an effective tool for learning algorithms over graph-structured data. However, graph representation techniques---that convert graphs to real-valued vectors for use with neural networks---are still in their infancy. Recent works have proposed several approaches (e.g., graph convolutional networks), but these methods have difficulty scaling and generalizing to g…
▽ More
Neural networks have been shown to be an effective tool for learning algorithms over graph-structured data. However, graph representation techniques---that convert graphs to real-valued vectors for use with neural networks---are still in their infancy. Recent works have proposed several approaches (e.g., graph convolutional networks), but these methods have difficulty scaling and generalizing to graphs with different sizes and shapes. We present Graph2Seq, a new technique that represents vertices of graphs as infinite time-series. By not limiting the representation to a fixed dimension, Graph2Seq scales naturally to graphs of arbitrary sizes and shapes. Graph2Seq is also reversible, allowing full recovery of the graph structure from the sequences. By analyzing a formal computational model for graph representation, we show that an unbounded sequence is necessary for scalability. Our experimental results with Graph2Seq show strong generalization and new state-of-the-art performance on a variety of graph combinatorial optimization problems.
△ Less
Submitted 9 October, 2018; v1 submitted 13 February, 2018;
originally announced February 2018.
-
Estimating Mutual Information for Discrete-Continuous Mixtures
Authors:
Weihao Gao,
Sreeram Kannan,
Sewoong Oh,
Pramod Viswanath
Abstract:
Estimating mutual information from observed samples is a basic primitive, useful in several machine learning tasks including correlation mining, information bottleneck clustering, learning a Chow-Liu tree, and conditional independence testing in (causal) graphical models. While mutual information is a well-defined quantity in general probability spaces, existing estimators can only handle two spec…
▽ More
Estimating mutual information from observed samples is a basic primitive, useful in several machine learning tasks including correlation mining, information bottleneck clustering, learning a Chow-Liu tree, and conditional independence testing in (causal) graphical models. While mutual information is a well-defined quantity in general probability spaces, existing estimators can only handle two special cases of purely discrete or purely continuous pairs of random variables. The main challenge is that these methods first estimate the (differential) entropies of X, Y and the pair (X;Y) and add them up with appropriate signs to get an estimate of the mutual information. These 3H-estimators cannot be applied in general mixture spaces, where entropy is not well-defined. In this paper, we design a novel estimator for mutual information of discrete-continuous mixtures. We prove that the proposed estimator is consistent. We provide numerical experiments suggesting superiority of the proposed estimator compared to other heuristics of adding small continuous noise to all the samples and applying standard estimators tailored for purely continuous variables, and quantizing the samples and applying standard estimators tailored for purely discrete variables. This significantly widens the applicability of mutual information estimation in real-world applications, where some variables are discrete, some continuous, and others are a mixture between continuous and discrete components.
△ Less
Submitted 9 October, 2018; v1 submitted 18 September, 2017;
originally announced September 2017.
-
Discovering Potential Correlations via Hypercontractivity
Authors:
Hyeji Kim,
Weihao Gao,
Sreeram Kannan,
Sewoong Oh,
Pramod Viswanath
Abstract:
Discovering a correlation from one variable to another variable is of fundamental scientific and practical interest. While existing correlation measures are suitable for discovering average correlation, they fail to discover hidden or potential correlations. To bridge this gap, (i) we postulate a set of natural axioms that we expect a measure of potential correlation to satisfy; (ii) we show that…
▽ More
Discovering a correlation from one variable to another variable is of fundamental scientific and practical interest. While existing correlation measures are suitable for discovering average correlation, they fail to discover hidden or potential correlations. To bridge this gap, (i) we postulate a set of natural axioms that we expect a measure of potential correlation to satisfy; (ii) we show that the rate of information bottleneck, i.e., the hypercontractivity coefficient, satisfies all the proposed axioms; (iii) we provide a novel estimator to estimate the hypercontractivity coefficient from samples; and (iv) we provide numerical experiments demonstrating that this proposed estimator discovers potential correlations among various indicators of WHO datasets, is robust in discovering gene interactions from gene expression time series data, and is statistically more powerful than the estimators for other correlation measures in binary hypothesis testing of canonical examples of potential correlations.
△ Less
Submitted 13 November, 2017; v1 submitted 12 September, 2017;
originally announced September 2017.