Search | arXiv e-print repository

arXiv:2405.20488 [pdf, other]

Shoal++: High Throughput DAG BFT Can Be Fast!

Authors: Balaji Arun, Zekun Li, Florian Suri-Payer, Sourav Das, Alexander Spiegelman

Abstract: Today's practical partially synchronous Byzantine Fault Tolerant (BFT) consensus protocols trade off low latency and high throughput. On the one end, traditional BFT protocols such as PBFT and its derivatives optimize for latency. They require, in fault-free executions, only 3 message exchanges to commit, the optimum for BFT consensus. However, this class of protocols typically relies on a single… ▽ More Today's practical partially synchronous Byzantine Fault Tolerant (BFT) consensus protocols trade off low latency and high throughput. On the one end, traditional BFT protocols such as PBFT and its derivatives optimize for latency. They require, in fault-free executions, only 3 message exchanges to commit, the optimum for BFT consensus. However, this class of protocols typically relies on a single leader, hampering throughput scalability. On the other end, a new class of so-called DAG-BFT protocols demonstrates how to achieve highly scalable throughput by separating data dissemination from consensus, and using every replica as proposer. Unfortunately, existing DAG-BFT protocols pay a steep latency premium, requiring on average 10.5 message exchanges to commit a transactions. This work aims to soften this tension and proposes Shoal++, a novel DAG-based BFT consensus system that offers the throughput of DAGs while reducing commit latency to an average of 4.5 message exchanges. Our empirical findings are encouraging, showing that Shoal++ achieves throughput comparable to state-of-the-art DAG BFT solutions while reducing latency by up to 60%. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.06117 [pdf, other]

Deferred Objects to Enhance Smart Contract Programming with Optimistic Parallel Execution

Authors: George Mitenkov, Igor Kabiljo, Zekun Li, Alexander Spiegelman, Satyanarayana Vusirikala, Zhuolun Xiang, Aleksandar Zlateski, Nuno P. Lopes, Rati Gelashvili

Abstract: One of the main bottlenecks of blockchains is smart contract execution. To increase throughput, modern blockchains try to execute transactions in parallel. Unfortunately, however, common blockchain use cases introduce read-write conflicts between transactions, forcing sequentiality. We propose RapidLane, an extension for parallel execution engines that allows the engine to capture computations i… ▽ More One of the main bottlenecks of blockchains is smart contract execution. To increase throughput, modern blockchains try to execute transactions in parallel. Unfortunately, however, common blockchain use cases introduce read-write conflicts between transactions, forcing sequentiality. We propose RapidLane, an extension for parallel execution engines that allows the engine to capture computations in conflicting parts of transactions and defer their execution until a later time, sometimes optimistically predicting execution results. This technique, coupled with support for a new construct for smart contract languages, allows one to turn certain sequential workloads into parallelizable ones. We integrated RapidLane into Block-STM, a state-of-the-art parallel execution engine used by several blockchains in production, and deployed it on the Aptos blockchain. Our evaluation shows that on commonly contended workloads, such as peer-to-peer transfers with a single fee payer and NFT minting, RapidLane yields up to $12\times$ more throughput. △ Less

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2306.03058 [pdf, other]

Shoal: Improving DAG-BFT Latency And Robustness

Authors: Alexander Spiegelman, Balaji Arun, Rati Gelashvili, Zekun Li

Abstract: The Narwhal system is a state-of-the-art Byzantine fault-tolerant scalable architecture that involves constructing a directed acyclic graph (DAG) of messages among a set of validators in a Blockchain network. Bullshark is a zero-overhead consensus protocol on top of the Narwhal's DAG that can order over 100k transactions per second. Unfortunately, the high throughput of Bullshark comes with a late… ▽ More The Narwhal system is a state-of-the-art Byzantine fault-tolerant scalable architecture that involves constructing a directed acyclic graph (DAG) of messages among a set of validators in a Blockchain network. Bullshark is a zero-overhead consensus protocol on top of the Narwhal's DAG that can order over 100k transactions per second. Unfortunately, the high throughput of Bullshark comes with a latency price due to the DAG construction, increasing the latency compared to the state-of-the-art leader-based BFT consensus protocols. We introduce Shoal, a protocol-agnostic framework for enhancing Narwhal-based consensus. By incorporating leader reputation and pipelining support for the first time, Shoal significantly reduces latency. Moreover, the combination of properties of the DAG construction and the leader reputation mechanism enables the elimination of timeouts in all but extremely uncommon scenarios in practice, a property we name Prevalent Responsiveness" (it strictly subsumes the established and often desired Optimistic Responsiveness property for BFT protocols). We integrated Shoal instantiated with Bullshark, the fastest existing Narwhal-based consensus protocol, in an open-source Blockchain project and provide experimental evaluations demonstrating up to 40% latency reduction in the failure-free executions, and up-to 80% reduction in executions with failures against the vanilla Bullshark implementation. △ Less

Submitted 7 July, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

arXiv:2209.05633 [pdf, other]

Bullshark: The Partially Synchronous Version

Authors: Alexander Spiegelman, Neil Giridharan, Alberto Sonnino, Lefteris Kokoris-Kogias

Abstract: The purpose of this manuscript is to describe the deterministic partially synchronous version of Bullshark in a simple and clean way. This result is published in CCS 2022, however, the description there is less clear because it uses the terminology of the full asynchronous Bullshark. The CCS version ties the description of the asynchronous and partially synchronous versions of Bullshark since it t… ▽ More The purpose of this manuscript is to describe the deterministic partially synchronous version of Bullshark in a simple and clean way. This result is published in CCS 2022, however, the description there is less clear because it uses the terminology of the full asynchronous Bullshark. The CCS version ties the description of the asynchronous and partially synchronous versions of Bullshark since it targets an academic audience. Due to the recent interest in DAG-based BFT protocols, we provide a separate and simple description of the partially synchronous version that targets a more general audience. We focus here on the DAG ordering logic. For more details about the asynchronous version, garbage collection, fairness, proofs, related work, evaluation, and efficient DAG implementation please refer to the fullpaper. An intuitive extended summary can be found in the "DAG meets BFT" blogpost. △ Less

Submitted 12 September, 2022; originally announced September 2022.

arXiv:2203.06871 [pdf, other]

Block-STM: Scaling Blockchain Execution by Turning Ordering Curse to a Performance Blessing

Authors: Rati Gelashvili, Alexander Spiegelman, Zhuolun Xiang, George Danezis, Zekun Li, Dahlia Malkhi, Yu Xia, Runtian Zhou

Abstract: Block-STM is a parallel execution engine for smart contracts, built around the principles of Software Transactional Memory. Transactions are grouped in blocks, and every execution of the block must yield the same deterministic outcome. Block-STM further enforces that the outcome is consistent with executing transactions according to a preset order, leveraging this order to dynamically detect depen… ▽ More Block-STM is a parallel execution engine for smart contracts, built around the principles of Software Transactional Memory. Transactions are grouped in blocks, and every execution of the block must yield the same deterministic outcome. Block-STM further enforces that the outcome is consistent with executing transactions according to a preset order, leveraging this order to dynamically detect dependencies and avoid conflicts during speculative transaction execution. At the core of Block-STM is a novel, low-overhead collaborative scheduler of execution and validation tasks. Block-STM is implemented on the main branch of the Diem Blockchain code-base and runs in production at Aptos. Our evaluation demonstrates that Block-STM is adaptive to workloads with different conflict rates and utilizes the inherent parallelism therein. Block-STM achieves up to $110k$ tps in the Diem benchmarks and up to $170k$ tps in the Aptos Benchmarks, which is a $20$x and $17$x improvement over the sequential baseline with $32$ threads, respectively. The throughput on a contended workload is up to $50k$ tps and $80k$ tps in Diem and Aptos benchmarks, respectively. △ Less

Submitted 25 August, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

arXiv:2202.09123 [pdf, other]

Make Every Word Count: Adaptive BA with Fewer Words

Authors: Shir Cohen, Idit Keidar, Alexander Spiegelman

Abstract: Byzantine Agreement is a key component in many distributed systems. While Dolev and Reischuk have proven a long time ago that quadratic communication complexity is necessary for worst-case runs, the question of what can be done in practically common runs with fewer failures remained open. In this paper we present the first Byzantine Broadcast algorithm with $O(n(f+1))$ communication complexity, wh… ▽ More Byzantine Agreement is a key component in many distributed systems. While Dolev and Reischuk have proven a long time ago that quadratic communication complexity is necessary for worst-case runs, the question of what can be done in practically common runs with fewer failures remained open. In this paper we present the first Byzantine Broadcast algorithm with $O(n(f+1))$ communication complexity, where $0\leq f\leq t$ is the actual number of process failures in a run. And for BA with strong unanimity, we present the first optimal-resilience algorithm that has linear communication complexity in the failure-free case and a quadratic cost otherwise. △ Less

Submitted 10 January, 2024; v1 submitted 18 February, 2022; originally announced February 2022.

arXiv:2201.05677 [pdf, other]

Bullshark: DAG BFT Protocols Made Practical

Authors: Alexander Spiegelman, Neil Giridharan, Alberto Sonnino, Lefteris Kokoris-Kogias

Abstract: We present Bullshark, the first directed acyclic graph (DAG) based asynchronous Byzantine Atomic Broadcast protocol that is optimized for the common synchronous case. Like previous DAG-based BFT protocols, Bullshark requires no extra communication to achieve consensus on top of building the DAG. That is, parties can totally order the vertices of the DAG by interpreting their local view of the DAG… ▽ More We present Bullshark, the first directed acyclic graph (DAG) based asynchronous Byzantine Atomic Broadcast protocol that is optimized for the common synchronous case. Like previous DAG-based BFT protocols, Bullshark requires no extra communication to achieve consensus on top of building the DAG. That is, parties can totally order the vertices of the DAG by interpreting their local view of the DAG edges. Unlike other asynchronous DAG-based protocols, Bullshark provides a practical low latency fast-path that exploits synchronous periods and deprecates the need for notoriously complex view-change mechanisms. Bullshark achieves this while maintaining all the desired properties of its predecessor DAG-Rider. Namely, it has optimal amortized communication complexity, it provides fairness and asynchronous liveness, and safety is guaranteed even under a quantum adversary. In order to show the practicality and simplicity of our approach, we also introduce a standalone partially synchronous version of Bullshark which we evaluate against the state of the art. The implemented protocol is embarrassingly simple (200 LOC on top of an existing DAG-based mempool implementation (Narwhal & Tusk). It is highly efficient, achieving for example, 125,000 transaction per second with a 2 seconds latency for a deployment of 50 parties. In the same setting the state of the art pays a steep 50% latency increase as it optimizes for asynchrony. △ Less

Submitted 7 September, 2022; v1 submitted 14 January, 2022; originally announced January 2022.

arXiv:2110.00960 [pdf, other]

Be Aware of Your Leaders

Authors: Shir Cohen, Rati Gelashvili, Lefteris Kokoris Kogias, Zekun Li, Dahlia Malkhi, Alberto Sonnino, Alexander Spiegelman

Abstract: Advances in blockchains have influenced the State-Machine-Replication (SMR) world and many state-of-the-art blockchain-SMR solutions are based on two pillars: Chaining and Leader-rotation. A predetermined round-robin mechanism used for Leader-rotation, however, has an undesirable behavior: crashed parties become designated leaders infinitely often, slowing down overall system performance. In this… ▽ More Advances in blockchains have influenced the State-Machine-Replication (SMR) world and many state-of-the-art blockchain-SMR solutions are based on two pillars: Chaining and Leader-rotation. A predetermined round-robin mechanism used for Leader-rotation, however, has an undesirable behavior: crashed parties become designated leaders infinitely often, slowing down overall system performance. In this paper, we provide a new Leader-Aware SMR framework that, among other desirable properties, formalizes a Leader-utilization requirement that bounds the number of rounds whose leaders are faulty in crash-only executions. We introduce Carousel, a novel, reputation-based Leader-rotation solution to achieve Leader-Aware SMR. The challenge in adaptive Leader-rotation is that it cannot rely on consensus to determine a leader, since consensus itself needs a leader. Carousel uses the available on-chain information to determine a leader locally and achieves Liveness despite this difficulty. A HotStuff implementation fitted with Carousel demonstrates drastic performance improvements: it increases throughput over 2x in faultless settings and provided a 20x throughput increase and 5x latency reduction in the presence of faults. △ Less

Submitted 3 October, 2021; originally announced October 2021.

arXiv:2106.10362 [pdf, other]

Jolteon and Ditto: Network-Adaptive Efficient Consensus with Asynchronous Fallback

Authors: Rati Gelashvili, Lefteris Kokoris-Kogias, Alberto Sonnino, Alexander Spiegelman, Zhuolun Xiang

Abstract: Existing committee-based Byzantine state machine replication (SMR) protocols, typically deployed in production blockchains, face a clear trade-off: (1) they either achieve linear communication cost in the happy path, but sacrifice liveness during periods of asynchrony, or (2) they are robust (progress with probability one) but pay quadratic communication cost. We believe this trade-off is unwarran… ▽ More Existing committee-based Byzantine state machine replication (SMR) protocols, typically deployed in production blockchains, face a clear trade-off: (1) they either achieve linear communication cost in the happy path, but sacrifice liveness during periods of asynchrony, or (2) they are robust (progress with probability one) but pay quadratic communication cost. We believe this trade-off is unwarranted since existing linear protocols still have asymptotic quadratic cost in the worst case. We design Ditto, a Byzantine SMR protocol that enjoys the best of both worlds: optimal communication on and off the happy path (linear and quadratic, respectively) and progress guarantee under asynchrony and DDoS attacks. We achieve this by replacing the view-synchronization of partially synchronous protocols with an asynchronous fallback mechanism at no extra asymptotic cost. Specifically, we start from HotStuff, a state-of-the-art linear protocol, and gradually build Ditto. As a separate contribution and an intermediate step, we design a 2-chain version of HotStuff, Jolteon, which leverages a quadratic view-change mechanism to reduce the latency of the standard 3-chain HotStuff. We implement and experimentally evaluate all our systems. Notably, Jolteon's commit latency outperforms HotStuff by 200-300ms with varying system size. Additionally, Ditto adapts to the network and provides better performance than Jolteon under faulty conditions and better performance than VABA (a state-of-the-art asynchronous protocol) under faultless conditions. This proves our case that breaking the robustness-efficiency trade-off is in the realm of practicality. △ Less

Submitted 9 July, 2024; v1 submitted 18 June, 2021; originally announced June 2021.

Comments: arXiv admin note: text overlap with arXiv:2103.03181

arXiv:2105.11827 [pdf, other]

Narwhal and Tusk: A DAG-based Mempool and Efficient BFT Consensus

Authors: George Danezis, Eleftherios Kokoris Kogias, Alberto Sonnino, Alexander Spiegelman

Abstract: We propose separating the task of reliable transaction dissemination from transaction ordering, to enable high-performance Byzantine fault-tolerant quorum-based consensus. We design and evaluate a mempool protocol, Narwhal, specializing in high-throughput reliable dissemination and storage of causal histories of transactions. Narwhal tolerates an asynchronous network and maintains high performance… ▽ More We propose separating the task of reliable transaction dissemination from transaction ordering, to enable high-performance Byzantine fault-tolerant quorum-based consensus. We design and evaluate a mempool protocol, Narwhal, specializing in high-throughput reliable dissemination and storage of causal histories of transactions. Narwhal tolerates an asynchronous network and maintains high performance despite failures. Narwhal is designed to easily scale-out using multiple workers at each validator, and we demonstrate that there is no foreseeable limit to the throughput we can achieve. Composing Narwhal with a partially synchronous consensus protocol (Narwhal-HotStuff) yields significantly better throughput even in the presence of faults or intermittent loss of liveness due to asynchrony. However, loss of liveness can result in higher latency. To achieve overall good performance when faults occur we design Tusk, a zero-message overhead asynchronous consensus protocol, to work with Narwhal. We demonstrate its high performance under a variety of configurations and faults. As a summary of results, on a WAN, Narwhal-Hotstuff achieves over 130,000 tx/sec at less than 2-sec latency compared with 1,800 tx/sec at 1-sec latency for Hotstuff. Additional workers increase throughput linearly to 600,000 tx/sec without any latency increase. Tusk achieves 160,000 tx/sec with about 3 seconds latency. Under faults, both protocols maintain high throughput, but Narwhal-HotStuff suffers from increased latency. △ Less

Submitted 16 March, 2022; v1 submitted 25 May, 2021; originally announced May 2021.

arXiv:2103.03181 [pdf, other]

Be Prepared When Network Goes Bad: An Asynchronous View-Change Protocol

Authors: Rati Gelashvili, Lefteris Kokoris-Kogias, Alexander Spiegelman, Zhuolun Xiang

Abstract: The popularity of permissioned blockchain systems demands BFT SMR protocols that are efficient under good network conditions (synchrony) and robust under bad network conditions (asynchrony). The state-of-the-art partially synchronous BFT SMR protocols provide optimal linear communication cost per decision under synchrony and good leaders, but lose liveness under asynchrony. On the other hand, the… ▽ More The popularity of permissioned blockchain systems demands BFT SMR protocols that are efficient under good network conditions (synchrony) and robust under bad network conditions (asynchrony). The state-of-the-art partially synchronous BFT SMR protocols provide optimal linear communication cost per decision under synchrony and good leaders, but lose liveness under asynchrony. On the other hand, the state-of-the-art asynchronous BFT SMR protocols are live even under asynchrony, but always pay quadratic cost even under synchrony. In this paper, we propose a BFT SMR protocol that achieves the best of both worlds -- optimal linear cost per decision under good networks and leaders, optimal quadratic cost per decision under bad networks, and remains always live. △ Less

Submitted 4 March, 2021; originally announced March 2021.

arXiv:2102.08325 [pdf, other]

All You Need is DAG

Authors: Idit Keidar, Eleftherios Kokoris-Kogias, Oded Naor, Alexander Spiegelman

Abstract: We present DAG-Rider, the first asynchronous Byzantine Atomic Broadcast protocol that achieves optimal resilience, optimal amortized communication complexity, and optimal time complexity. DAG-Rider is post-quantum safe and ensures that all messages proposed by correct processes eventually get decided. We construct DAG-Rider in two layers: In the first layer, processes reliably broadcast their prop… ▽ More We present DAG-Rider, the first asynchronous Byzantine Atomic Broadcast protocol that achieves optimal resilience, optimal amortized communication complexity, and optimal time complexity. DAG-Rider is post-quantum safe and ensures that all messages proposed by correct processes eventually get decided. We construct DAG-Rider in two layers: In the first layer, processes reliably broadcast their proposals and build a structured Directed Acyclic Graph (DAG) of the communication among them. In the second layer, processes locally observe their DAGs and totally order all proposals with no extra communication. △ Less

Submitted 4 June, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

arXiv:2011.04719 [pdf, other]

Probabilistic Indistinguishability and the Quality of Validity in Byzantine Agreement

Authors: Guy Goren, Yoram Moses, Alexander Spiegelman

Abstract: Lower bounds and impossibility results in distributed computing are both intellectually challenging and practically important. Hundreds if not thousands of proofs appear in the literature, but surprisingly, the vast majority of them apply to deterministic algorithms only. Probabilistic protocols have been around for at least four decades and are receiving a lot of attention with the emergence of b… ▽ More Lower bounds and impossibility results in distributed computing are both intellectually challenging and practically important. Hundreds if not thousands of proofs appear in the literature, but surprisingly, the vast majority of them apply to deterministic algorithms only. Probabilistic protocols have been around for at least four decades and are receiving a lot of attention with the emergence of blockchain systems. Nonetheless, we are aware of only a handful of randomized lower bounds. In this paper we provide a formal framework for reasoning about randomized distributed algorithms. We generalize the notion of indistinguishability, the most useful tool in deterministic lower bounds, to apply to a probabilistic setting. We apply this framework to prove a result of independent interest. Namely, we completely characterize the quality of decisions that protocols for a randomized multi-valued Consensus problem can guarantee in an asynchronous environment with Byzantine faults. We use the new notion to prove a lower bound on the probability at which it can be guaranteed that honest parties will not decide on a possibly bogus value. Finally, we show that the bound is tight by providing a protocol that matches it. △ Less

Submitted 11 October, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

arXiv:2002.06993 [pdf, other]

In Search for an Optimal Authenticated Byzantine Agreement

Authors: Alexander Spiegelman

Abstract: In this paper, we challenge the conventional approach of state machine replication systems to design deterministic agreement protocols in the eventually synchronous communication model. We first prove that no such protocol can guarantee bounded communication cost before the global stabilization time and propose a different approach that hopes for the best (synchrony) but prepares for the worst (as… ▽ More In this paper, we challenge the conventional approach of state machine replication systems to design deterministic agreement protocols in the eventually synchronous communication model. We first prove that no such protocol can guarantee bounded communication cost before the global stabilization time and propose a different approach that hopes for the best (synchrony) but prepares for the worst (asynchrony). Accordingly, we design an optimistic byzantine agreement protocol that first tries an efficient deterministic algorithm that relies on synchrony for termination only, and then, only if an agreement was not reached due to asynchrony, the protocol uses a randomized asynchronous protocol for fallback that guarantees termination with probability 1. We formally prove that our protocol achieves optimal communication complexity under all network conditions and failure scenarios. We first prove a lower bound of $Ω(ft+ t)$ for synchronous deterministic byzantine agreement protocols, where $t$ is the failure threshold, and $f$ is the actual number of failures. Then, we present a tight upper bound and use it for the synchronous part of the optimistic protocol. Finally, for the asynchronous fallback, we use a variant of the (optimal) VABA protocol, which we reconstruct to safely combine it with the synchronous part. We believe that our adaptive to failures synchronous byzantine agreement protocol has an independent interest since it is the first protocol we are aware of which communication complexity optimally depends on the actual number of failures. △ Less

Submitted 4 August, 2021; v1 submitted 17 February, 2020; originally announced February 2020.

arXiv:2002.06545 [pdf, other]

Not a COINcidence: Sub-Quadratic Asynchronous Byzantine Agreement WHP

Authors: Shir Cohen, Idit Keidar, Alexander Spiegelman

Abstract: King and Saia were the first to break the quadratic word complexity bound for Byzantine Agreement in synchronous systems against an adaptive adversary, and Algorand broke this bound with near-optimal resilience (first in the synchronous model and then with eventual-synchrony). Yet the question of asynchronous sub-quadratic Byzantine Agreement remained open. To the best of our knowledge, we are the… ▽ More King and Saia were the first to break the quadratic word complexity bound for Byzantine Agreement in synchronous systems against an adaptive adversary, and Algorand broke this bound with near-optimal resilience (first in the synchronous model and then with eventual-synchrony). Yet the question of asynchronous sub-quadratic Byzantine Agreement remained open. To the best of our knowledge, we are the first to answer this question in the affirmative. A key component of our solution is a shared coin algorithm based on a VRF. A second essential ingredient is VRF-based committee sampling, which we formalize and utilize in the asynchronous model for the first time. Our algorithms work against a delayed-adaptive adversary, which cannot perform after-the-fact removals but has full control of Byzantine processes and full information about communication in earlier rounds. Using committee sampling and our shared coin, we solve Byzantine Agreement with high probability, with a word complexity of $\widetilde{O}(n)$ and $O(1)$ expected time, breaking the $O(n^2)$ bit barrier for asynchronous Byzantine Agreement. △ Less

Submitted 3 August, 2020; v1 submitted 16 February, 2020; originally announced February 2020.

arXiv:2001.00363 [pdf, other]

Using Nesting to Push the Limits of Transactional Data Structure Libraries

Authors: Gal Assa, Hagar Meir, Guy Golan-Gueta, Idit Keidar, Alexander Spiegelman

Abstract: Transactional data structure libraries (TDSL) combine the ease-of-programming of transactions with the high performance and scalability of custom-tailored concurrent data structures. They can be very efficient thanks to their ability to exploit data structure semantics in order to reduce overhead, aborts, and wasted work compared to general-purpose software transactional memory. However, TDSLs wer… ▽ More Transactional data structure libraries (TDSL) combine the ease-of-programming of transactions with the high performance and scalability of custom-tailored concurrent data structures. They can be very efficient thanks to their ability to exploit data structure semantics in order to reduce overhead, aborts, and wasted work compared to general-purpose software transactional memory. However, TDSLs were not previously used for complex use-cases involving long transactions and a variety of data structures. In this paper, we boost the performance and usability of a TDSL, towards allowing it to support complex applications. A key idea is nesting. Nested transactions create checkpoints within a longer transaction, so as to limit the scope of abort, without changing the semantics of the original transaction. We build a Java TDSL with built-in support for nested transactions over a number of data structures. We conduct a case study of a complex network intrusion detection system that invests a significant amount of work to process each packet. Our study shows that our library outperforms publicly available STMs twofold without nesting, and by up to 16x when nesting is used. △ Less

Submitted 16 February, 2021; v1 submitted 2 January, 2020; originally announced January 2020.

arXiv:1911.10486 [pdf, other]

ACE: Abstract Consensus Encapsulation for Liveness Boosting of State Machine Replication

Authors: Alexander Spiegelman, Arik Rinberg

Abstract: With the emergence of cross-organization attack-prone byzantine fault-tolerant (BFT) systems, so-called Blockchains, providing asynchronous state machine replication (SMR) solutions is no longer a theoretical concern. This paper introduces ACE: a general framework for the software design of fault-tolerant SMR systems. We first propose a new leader-based-view (LBV) abstraction that encapsulates the… ▽ More With the emergence of cross-organization attack-prone byzantine fault-tolerant (BFT) systems, so-called Blockchains, providing asynchronous state machine replication (SMR) solutions is no longer a theoretical concern. This paper introduces ACE: a general framework for the software design of fault-tolerant SMR systems. We first propose a new leader-based-view (LBV) abstraction that encapsulates the core properties provided by each view in a partially synchronous consensus algorithm, designed according to the leader-based view-by-view paradigm (e.g., PBFT and Paxos). Then, we compose several LBV instances in a non-trivial way in order to boost asynchronous liveness of existing SMR solutions. ACE is model agnostic - it abstracts away any model assumptions that consensus protocols may have, e.g., the ratio and types of faulty parties. For example, when the LBV abstraction is instantiated with a partially synchronous consensus algorithm designed to tolerate crash failures, e.g., Paxos or Raft, ACE yields an asynchronous SMR for $n = 2f+1$ parties. However, if the LBV abstraction is instantiated with a byzantine protocol like PBFT or HotStuff, then ACE yields an asynchronous byzantine SMR for $n = 3f+1$ parties. To demonstrate the power of ACE, we implement it in C++, instantiate the LBV abstraction with a view implementation of HotStuff -- a state of the art partially synchronous byzantine agreement protocol -- and compare it with the base HotStuff implementation under different adversarial scenarios. Our evaluation shows that while ACE is outperformed by HotStuff in the optimistic, synchronous, failure-free case, ACE has absolute superiority during network asynchrony and attacks. △ Less

Submitted 24 November, 2019; originally announced November 2019.

arXiv:1911.04124 [pdf, ps, other]

Tuning PoW with Hybrid Expenditure

Authors: Itay Tsabary, Alexander Spiegelman, Ittay Eyal

Abstract: Proof of Work (PoW) is a Sybil-deterrence security mechanism. It introduces an external cost to a system by requiring computational effort to perform actions. However, since its inception, a central challenge was to tune this cost. Initial designs for deterring spam email and DoS attacks applied overhead equally to honest participants and attackers. Requiring too little effort did not deter attack… ▽ More Proof of Work (PoW) is a Sybil-deterrence security mechanism. It introduces an external cost to a system by requiring computational effort to perform actions. However, since its inception, a central challenge was to tune this cost. Initial designs for deterring spam email and DoS attacks applied overhead equally to honest participants and attackers. Requiring too little effort did not deter attacks, whereas too much encumbered honest participation. This might be the reason it was never widely adopted. Nakamoto overcame this trade-off in Bitcoin by distinguishing desired from malicious behavior and introducing internal rewards for the former. This solution gained popularity in securing cryptocurrencies and using the virtual internally-minted tokens for rewards. However, in existing blockchain protocols the internal rewards fund (almost) the same value of external expenses. Thus, as the token value soars, so does the PoW expenditure. Bitcoin PoW, for example, already expends as much electricity as Colombia or Switzerland. This amount of resource-guzzling is unsustainable and hinders even wider adoption of these systems. In this work we present Hybrid Expenditure Blockchain (HEB), a novel PoW mechanism. HEB is a generalization of Nakamoto's protocol that enables tuning the external expenditure by introducing a complementary internal-expenditure mechanism. Thus, for the first time, HEB decouples external expenditure from the reward value. We show a practical parameter choice by which HEB requires significantly less external consumption compare to Nakamoto's protocol, its resilience against rational attackers is similar, and it retains the decentralized and permissionless nature of the system. Taking the Bitcoin ecosystem as an example, HEB cuts the electricity consumption by half. △ Less

Submitted 4 August, 2021; v1 submitted 11 November, 2019; originally announced November 2019.

arXiv:1909.05204 [pdf, ps, other]

Cogsworth: Byzantine View Synchronization

Authors: Oded Naor, Mathieu Baudet, Dahlia Malkhi, Alexander Spiegelman

Abstract: Most methods for Byzantine fault tolerance (BFT) in the partial synchrony setting divide the local state of the nodes into views, and the transition from one view to the next dictates a leader change. In order to provide liveness, all honest nodes need to stay in the same view for a sufficiently long time. This requires \emph{view synchronization}, a requisite of BFT that we extract and formally d… ▽ More Most methods for Byzantine fault tolerance (BFT) in the partial synchrony setting divide the local state of the nodes into views, and the transition from one view to the next dictates a leader change. In order to provide liveness, all honest nodes need to stay in the same view for a sufficiently long time. This requires \emph{view synchronization}, a requisite of BFT that we extract and formally define here. Existing approaches for Byzantine view synchronization incur quadratic communication (in $n$, the number of parties). A cascade of $O(n)$ view changes may thus result in $O(n^3)$ communication complexity. This paper presents a new Byzantine view synchronization algorithm named Cogsworth, that has optimistically linear communication complexity and constant latency. Faced with benign failures, Cogsworth has expected linear communication and constant latency. The result here serves as an important step towards reaching solutions that have overall quadratic communication, the known lower bound on Byzantine fault tolerant consensus. Cogsworth is particularly useful for a family of BFT protocols that already exhibit linear communication under various circumstances, but suffer quadratic overhead due to view synchronization. △ Less

Submitted 6 February, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

arXiv:1906.03819 [pdf, other]

FairLedger: A Fair Blockchain Protocol for Financial Institutions

Authors: Kfir Lev-Ari, Alexander Spiegelman, Idit Keidar, Dahlia Malkhi

Abstract: Financial institutions are currently looking into technologies for permissioned blockchains. A major effort in this direction is Hyperledger, an open source project hosted by the Linux Foundation and backed by a consortium of over a hundred companies. A key component in permissioned blockchain protocols is a byzantine fault tolerant (BFT) consensus engine that orders transactions. However, current… ▽ More Financial institutions are currently looking into technologies for permissioned blockchains. A major effort in this direction is Hyperledger, an open source project hosted by the Linux Foundation and backed by a consortium of over a hundred companies. A key component in permissioned blockchain protocols is a byzantine fault tolerant (BFT) consensus engine that orders transactions. However, currently available BFT solutions in Hyperledger (as well as in the literature at large) are inadequate for financial settings; they are not designed to ensure fairness or to tolerate selfish behavior that arises when financial institutions strive to maximize their own profit. We present FairLedger, a permissioned blockchain BFT protocol, which is fair, designed to deal with rational behavior, and, no less important, easy to understand and implement. The secret sauce of our protocol is a new communication abstraction, called detectable all-to-all (DA2A), which allows us to detect participants (byzantine or rational) that deviate from the protocol, and punish them. We implement FairLedger in the Hyperledger open source project, using Iroha framework, one of the biggest projects therein. To evaluate FairLegder's performance, we also implement it in the PBFT framework and compare the two protocols. Our results show that in failure-free scenarios FairLedger achieves better throughput than both Iroha's implementation and PBFT in wide-area settings. △ Less

Submitted 10 June, 2019; originally announced June 2019.

arXiv:1902.10995 [pdf, ps, other]

Fast Concurrent Data Sketches

Authors: Arik Rinberg, Alexander Spiegelman, Edward Bortnikov, Eshcar Hillel, Idit Keidar, Lee Rhodes, Hadar Serviansky

Abstract: Data sketches are approximate succinct summaries of long streams. They are widely used for processing massive amounts of data and answering statistical queries about it in real-time. Existing libraries producing sketches are very fast, but do not allow parallelism for creating sketches using multiple threads or querying them while they are being built. We present a generic approach to parallelisin… ▽ More Data sketches are approximate succinct summaries of long streams. They are widely used for processing massive amounts of data and answering statistical queries about it in real-time. Existing libraries producing sketches are very fast, but do not allow parallelism for creating sketches using multiple threads or querying them while they are being built. We present a generic approach to parallelising data sketches efficiently, while bounding the error that such parallelism introduces. Utilising relaxed semantics and the notion of strong linearisability we prove our algorithm's correctness and analyse the error it induces in two specific sketches. Our implementation achieves high scalability while kee** the error small. △ Less

Submitted 5 December, 2019; v1 submitted 28 February, 2019; originally announced February 2019.

arXiv:1902.03899 [pdf, other]

Mind the Mining

Authors: Guy Goren, Alexander Spiegelman

Abstract: In this paper we revisit the mining strategies in proof of work based cryptocurrencies and propose two strategies, we call smart and smarter mining, that in many cases strictly dominate honest mining. In contrast to other known attacks, like selfish mining, which induce zero-sum games among the miners, the strategies proposed in this paper increase miners' profit by reducing their variable costs (… ▽ More In this paper we revisit the mining strategies in proof of work based cryptocurrencies and propose two strategies, we call smart and smarter mining, that in many cases strictly dominate honest mining. In contrast to other known attacks, like selfish mining, which induce zero-sum games among the miners, the strategies proposed in this paper increase miners' profit by reducing their variable costs (i.e., electricity). Moreover, the proposed strategies are viable for much smaller miners than previously known attacks, and surprisingly, an attack performed by one miner is profitable for all other miners as well. While saving electricity power is very encouraging for the environment, it is less so for the coin's security. The smart/smarter strategies expose the coin to under 50\% attacks and this vulnerability might only grow when new miners join the coin as a response to the increase in profit margins induced by these strategies. △ Less

Submitted 12 February, 2019; v1 submitted 11 February, 2019; originally announced February 2019.

arXiv:1811.01332 [pdf, ps, other]

Validated Asynchronous Byzantine Agreement with Optimal Resilience and Asymptotically Optimal Time and Word Communication

Authors: Ittai Abraham, Dahlia Malkhi, Alexander Spiegelman

Abstract: We provide a new protocol for Validated Asynchronous Byzantine Agreement. Validated (multi-valued) Asynchronous Byzantine Agreement is a key building block in constructing Atomic Broadcast and fault-tolerant state machine replication in the asynchronous setting. Our protocol can withstand the optimal number $f<n/3$ of Byzantine failures and reaches agreement in the asymptotically optimal expected… ▽ More We provide a new protocol for Validated Asynchronous Byzantine Agreement. Validated (multi-valued) Asynchronous Byzantine Agreement is a key building block in constructing Atomic Broadcast and fault-tolerant state machine replication in the asynchronous setting. Our protocol can withstand the optimal number $f<n/3$ of Byzantine failures and reaches agreement in the asymptotically optimal expected $O(1)$ running time. Honest parties in our protocol send only an expected $O(n^2)$ messages where each message contains a value and a constant number of signatures. Hence our total expected communication is $O(n^2)$ words. The best previous result of Cachin et al. from 2001 solves Validated Byzantine Agreement with optimal resilience and $O(1)$ expected time but with $O(n^3)$ expected word communication. Our work addresses an open question of Cachin et al. from 2001 and improves the expected word communication from $O(n^3)$ to the asymptotically optimal $O(n^2)$. △ Less

Submitted 4 November, 2018; originally announced November 2018.

arXiv:1805.08979 [pdf, other]

Game of Coins

Authors: Alexander Spiegelman, Idit Keidar, Moshe Tennenholtz

Abstract: We formalize the current practice of strategic mining in multi-cryptocurrency markets as a game, and prove that any better-response learning in such games converges to equilibrium. We then offer a reward design scheme that moves the system configuration from any initial equilibrium to a desired one for any better-response learning of the miners. Our work introduces the first multi-coin strategic a… ▽ More We formalize the current practice of strategic mining in multi-cryptocurrency markets as a game, and prove that any better-response learning in such games converges to equilibrium. We then offer a reward design scheme that moves the system configuration from any initial equilibrium to a desired one for any better-response learning of the miners. Our work introduces the first multi-coin strategic attack for adaptive and learning miners, as well as the study of reward design in a multi-agent system of learning agents. △ Less

Submitted 23 May, 2018; originally announced May 2018.

arXiv:1805.06265 [pdf, ps, other]

Integrated Bounds for Disintegrated Storage

Authors: Alon Berger, Idit Keidar, Alexander Spiegelman

Abstract: We point out a somewhat surprising similarity between non-authenticated Byzantine storage, coded storage, and certain emulations of shared registers from smaller ones. A common characteristic in all of these is the inability of reads to safely return a value obtained in a single atomic access to shared storage. We collectively refer to such systems as disintegrated storage, and show integrated spa… ▽ More We point out a somewhat surprising similarity between non-authenticated Byzantine storage, coded storage, and certain emulations of shared registers from smaller ones. A common characteristic in all of these is the inability of reads to safely return a value obtained in a single atomic access to shared storage. We collectively refer to such systems as disintegrated storage, and show integrated space lower bounds for asynchronous regular wait-free emulations in all of them. In a nutshell, if readers are invisible, then the storage cost of such systems is inherently exponential in the size of written values; otherwise, it is at least linear in the number of readers. Our bounds are asymptotically tight to known algorithms, and thus justify their high costs. △ Less

Submitted 6 August, 2018; v1 submitted 16 May, 2018; originally announced May 2018.

arXiv:1705.07212 [pdf, other]

Space Complexity of Fault Tolerant Register Emulations

Authors: Gregory Chockler, Alexander Spiegelman

Abstract: Driven by the rising popularity of cloud storage, the costs associated with implementing reliable storage services from a collection of fault-prone servers have recently become an actively studied question. The well-known ABD result shows that an f-tolerant register can be emulated using a collection of 2f + 1 fault-prone servers each storing a single read-modify-write object type, which is known… ▽ More Driven by the rising popularity of cloud storage, the costs associated with implementing reliable storage services from a collection of fault-prone servers have recently become an actively studied question. The well-known ABD result shows that an f-tolerant register can be emulated using a collection of 2f + 1 fault-prone servers each storing a single read-modify-write object type, which is known to be optimal. In this paper we generalize this bound: we investigate the inherent space complexity of emulating reliable multi-writer registers as a fucntion of the type of the base objects exposed by the underlying servers, the number of writers to the emulated register, the number of available servers, and the failure threshold. We establish a sharp separation between registers, and both max-registers (the base object types assumed by ABD) and CAS in terms of the resources (i.e., the number of base objects of the respective types) required to support the emulation; we show that no such separation exists between max-registers and CAS. Our main technical contribution is lower and upper bounds on the resources required in case the underlying base objects are fault-prone read/write registers. We show that the number of required registers is directly proportional to the number of writers and inversely proportional to the number of servers. △ Less

Submitted 19 May, 2017; originally announced May 2017.

Comments: Conference version appears in Proceedings of PODC '17

ACM Class: C.2.4; D.4.7

arXiv:1705.02808 [pdf, other]

Towards Reduced Instruction Sets for Synchronization

Authors: Rati Gelashvili, Idit Keidar, Alexander Spiegelman, Roger Wattenhofer

Abstract: Contrary to common belief, a recent work by Ellen, Gelashvili, Shavit, and Zhu has shown that computability does not require multicore architectures to support "strong" synchronization instructions like compare-and-swap, as opposed to combinations of "weaker" instructions like decrement and multiply. However, this is the status quo, and in turn, most efficient concurrent data-structures heavily re… ▽ More Contrary to common belief, a recent work by Ellen, Gelashvili, Shavit, and Zhu has shown that computability does not require multicore architectures to support "strong" synchronization instructions like compare-and-swap, as opposed to combinations of "weaker" instructions like decrement and multiply. However, this is the status quo, and in turn, most efficient concurrent data-structures heavily rely on compare-and-swap (e.g. for swinging pointers and in general, conflict resolution). We show that this need not be the case, by designing and implementing a concurrent linearizable Log data-structure (also known as a History object), supporting two operations: append(item), which appends the item to the log, and get-log(), which returns the appended items so far, in order. Readers are wait-free and writers are lock-free, and this data-structure can be used in a lock-free universal construction to implement any concurrent object with a given sequential specification. Our implementation uses atomic read, xor, decrement, and fetch-and-increment instructions supported on X86 architectures, and provides similar performance to a compare-and-swap-based solution on today's hardware. This raises a fundamental question about minimal set of synchronization instructions that the architectures have to support. △ Less

Submitted 8 May, 2017; originally announced May 2017.

arXiv:1612.02916 [pdf, other]

Solida: A Blockchain Protocol Based on Reconfigurable Byzantine Consensus

Authors: Ittai Abraham, Dahlia Malkhi, Kartik Nayak, Ling Ren, Alexander Spiegelman

Abstract: The decentralized cryptocurrency Bitcoin has experienced great success but also encountered many challenges. One of the challenges has been the long confirmation time. Another challenge is the lack of incentives at certain steps of the protocol, raising concerns for transaction withholding, selfish mining, etc. To address these challenges, we propose Solida, a decentralized blockchain protocol bas… ▽ More The decentralized cryptocurrency Bitcoin has experienced great success but also encountered many challenges. One of the challenges has been the long confirmation time. Another challenge is the lack of incentives at certain steps of the protocol, raising concerns for transaction withholding, selfish mining, etc. To address these challenges, we propose Solida, a decentralized blockchain protocol based on reconfigurable Byzantine consensus augmented by proof-of-work. Solida improves on Bitcoin in confirmation time, and provides safety and liveness assuming the adversary control less than (roughly) one-third of the total mining power. △ Less

Submitted 18 November, 2017; v1 submitted 8 December, 2016; originally announced December 2016.

arXiv:1608.06696 [pdf, other]

Flexible Paxos: Quorum intersection revisited

Authors: Heidi Howard, Dahlia Malkhi, Alexander Spiegelman

Abstract: Distributed consensus is integral to modern distributed systems. The widely adopted Paxos algorithm uses two phases, each requiring majority agreement, to reliably reach consensus. In this paper, we demonstrate that Paxos, which lies at the foundation of many production systems, is conservative. Specifically, we observe that each of the phases of Paxos may use non-intersecting quorums. Majority qu… ▽ More Distributed consensus is integral to modern distributed systems. The widely adopted Paxos algorithm uses two phases, each requiring majority agreement, to reliably reach consensus. In this paper, we demonstrate that Paxos, which lies at the foundation of many production systems, is conservative. Specifically, we observe that each of the phases of Paxos may use non-intersecting quorums. Majority quorums are not necessary as intersection is required only across phases. Using this weakening of the requirements made in the original formulation, we propose Flexible Paxos, which generalizes over the Paxos algorithm to provide flexible quorums. We show that Flexible Paxos is safe, efficient and easy to utilize in existing distributed systems. We conclude by discussing the wide reaching implications of this result. Examples include improved availability from reducing the size of second phase quorums by one when the number of acceptors is even and utilizing small disjoint phase-2 quorums to speed up the steady-state. △ Less

Submitted 23 August, 2016; originally announced August 2016.

ACM Class: C.2.4

arXiv:1508.03762 [pdf, ps, other]

Space Bounds for Reliable Multi-Writer Data Store: Inherent Cost of Read/Write Primitives

Authors: Gregory Chockler, Dan Dobre, Alexander Shraer, Alexander Spiegelman

Abstract: Reliable storage emulations from fault-prone components have established themselves as an algorithmic foundation of modern storage services and applications. Most existing reliable storage emulations are built from storage services supporting arbitrary read-modify-write primitives. Since such primitives are not typically exposed by pre-existing or off-the-shelf components (such as cloud storage se… ▽ More Reliable storage emulations from fault-prone components have established themselves as an algorithmic foundation of modern storage services and applications. Most existing reliable storage emulations are built from storage services supporting arbitrary read-modify-write primitives. Since such primitives are not typically exposed by pre-existing or off-the-shelf components (such as cloud storage services or network-attached disks) it is natural to ask if they are indeed essential for efficient storage emulations. In this paper, we answer this question in the affirmative. We show that relaxing the underlying storage to only support read/write operations leads to a linear blow-up in the emulation space requirements. We also show that the space complexity is not adaptive to concurrency, which implies that the storage cannot be reliably reclaimed even in sequential runs. On a positive side, we show that Compare-and-Swap primitives, which are commonly available with many off-the-shelf storage services, can be used to emulate a reliable multi-writer atomic register with constant storage and adaptive time complexity. △ Less

Submitted 15 August, 2015; originally announced August 2015.

ACM Class: C.2.4; C.4; D.4.7

arXiv:1507.07086 [pdf, other]

On Liveness of Dynamic Storage

Authors: Alexander Spiegelman, Idit Keidar

Abstract: Dynamic distributed storage algorithms such as DynaStore, Reconfigurable Paxos, RAMBO, and RDS, do not ensure liveness (wait-freedom) in asynchronous runs with infinitely many reconfigurations. We prove that this is inherent for asynchronous dynamic storage algorithms, including ones that use $Ω$ or $\diamond S$ oracles. Our result holds even if only one process may fail, provided that machines th… ▽ More Dynamic distributed storage algorithms such as DynaStore, Reconfigurable Paxos, RAMBO, and RDS, do not ensure liveness (wait-freedom) in asynchronous runs with infinitely many reconfigurations. We prove that this is inherent for asynchronous dynamic storage algorithms, including ones that use $Ω$ or $\diamond S$ oracles. Our result holds even if only one process may fail, provided that machines that were successfully removed from the system's configuration may be switched off by an administrator. Intuitively, the impossibility relies on the fact that a correct process can be suspected to have failed at any time, i.e., its failure is indistinguishable to other processes from slow delivery of its messages, and so the system should be able to reconfigure without waiting for this process to complete its pending operations. To circumvent this result, we define a dynamic eventually perfect failure detector, and present an algorithm that uses it to emulate wait-free dynamic atomic storage (with no restrictions on reconfiguration rate). Together, our results thus draw a sharp line between oracles like $Ω$ and $\diamond S$, which allow some correct process to continue to be suspected forever, and a dynamic eventually perfect one, which does not. △ Less

Submitted 25 July, 2015; originally announced July 2015.

arXiv:1507.05169 [pdf, other]

Space Bounds for Reliable Storage: Fundamental Limits of Coding

Authors: Alexander Spiegelman, Yuval Cassuto, Gregory Chockler, Idit Keidar

Abstract: We study the inherent space requirements of shared storage algorithms in asynchronous fault-prone systems. Previous works use codes to achieve a better storage cost than the well-known replication approach. However, a closer look reveals that they incur extra costs somewhere else: Some use unbounded storage in communication links, while others assume bounded concurrency or synchronous periods. We… ▽ More We study the inherent space requirements of shared storage algorithms in asynchronous fault-prone systems. Previous works use codes to achieve a better storage cost than the well-known replication approach. However, a closer look reveals that they incur extra costs somewhere else: Some use unbounded storage in communication links, while others assume bounded concurrency or synchronous periods. We prove here that this is inherent, and indeed, if there is no bound on the concurrency level, then the storage cost of any reliable storage algorithm is at least f+1 times the data size, where f is the number of tolerated failures. We further present a technique for combining erasure-codes with full replication so as to obtain the best of both. We present a storage algorithm whose storage cost is close to the lower bound in the worst case, and adapts to the concurrency level. △ Less

Submitted 18 July, 2015; originally announced July 2015.

Showing 1–32 of 32 results for author: Spiegelman, A