-
C5: Cloned Concurrency Control that Always Keeps Up
Authors:
Jeffrey Helt,
Abhinav Sharma,
Daniel J. Abadi,
Wyatt Lloyd,
Jose M. Faleiro
Abstract:
Asynchronously replicated primary-backup databases are commonly deployed to improve availability and offload read-only transactions. To both apply replicated writes from the primary and serve read-only transactions, the backups implement a cloned concurrency control protocol. The protocol ensures read-only transactions always return a snapshot of state that previously existed on the primary. This…
▽ More
Asynchronously replicated primary-backup databases are commonly deployed to improve availability and offload read-only transactions. To both apply replicated writes from the primary and serve read-only transactions, the backups implement a cloned concurrency control protocol. The protocol ensures read-only transactions always return a snapshot of state that previously existed on the primary. This compels the backup to exactly copy the commit order resulting from the primary's concurrency control. Existing cloned concurrency control protocols guarantee this by limiting the backup's parallelism. As a result, the primary's concurrency control executes some workloads with more parallelism than these protocols. In this paper, we prove that this parallelism gap leads to unbounded replication lag, where writes can take arbitrarily long to replicate to the backup and which has led to catastrophic failures in production systems. We then design C5, the first cloned concurrency protocol to provide bounded replication lag. We implement two versions of C5: Our evaluation in MyRocks, a widely deployed database, demonstrates C5 provides bounded replication lag. Our evaluation in Cicada, a recent in-memory database, demonstrates C5 keeps up with even the fastest of primaries.
△ Less
Submitted 6 July, 2022;
originally announced July 2022.
-
A Fault-Tolerance Shim for Serverless Computing
Authors:
Vikram Sreekanti,
Chenggang Wu,
Saurav Chhatrapati,
Joseph E. Gonzalez,
Joseph M. Hellerstein,
Jose M. Faleiro
Abstract:
Serverless computing has grown in popularity in recent years, with an increasing number of applications being built on Functions-as-a-Service (FaaS) platforms. By default, FaaS platforms support retry-based fault tolerance, but this is insufficient for programs that modify shared state, as they can unwittingly persist partial sets of updates in case of failures. To address this challenge, we would…
▽ More
Serverless computing has grown in popularity in recent years, with an increasing number of applications being built on Functions-as-a-Service (FaaS) platforms. By default, FaaS platforms support retry-based fault tolerance, but this is insufficient for programs that modify shared state, as they can unwittingly persist partial sets of updates in case of failures. To address this challenge, we would like atomic visibility of the updates made by a FaaS application.
In this paper, we present AFT, an atomic fault tolerance shim for serverless applications. AFT interposes between a commodity FaaS platform and storage engine and ensures atomic visibility of updates by enforcing the read atomic isolation guarantee. AFT supports new protocols to guarantee read atomic isolation in the serverless setting. We demonstrate that aft introduces minimal overhead relative to existing storage engines and scales smoothly to thousands of requests per second, while preventing a significant number of consistency anomalies.
△ Less
Submitted 12 March, 2020;
originally announced March 2020.
-
Cloudburst: Stateful Functions-as-a-Service
Authors:
Vikram Sreekanti,
Chenggang Wu,
Xiayue Charles Lin,
Johann Schleier-Smith,
Jose M. Faleiro,
Joseph E. Gonzalez,
Joseph M. Hellerstein,
Alexey Tumanov
Abstract:
Function-as-a-Service (FaaS) platforms and "serverless" cloud computing are becoming increasingly popular. Current FaaS offerings are targeted at stateless functions that do minimal I/O and communication. We argue that the benefits of serverless computing can be extended to a broader range of applications and algorithms. We present the design and implementation of Cloudburst, a stateful FaaS platf…
▽ More
Function-as-a-Service (FaaS) platforms and "serverless" cloud computing are becoming increasingly popular. Current FaaS offerings are targeted at stateless functions that do minimal I/O and communication. We argue that the benefits of serverless computing can be extended to a broader range of applications and algorithms. We present the design and implementation of Cloudburst, a stateful FaaS platform that provides familiar Python programming with low-latency mutable state and communication, while maintaining the autoscaling benefits of serverless computing. Cloudburst accomplishes this by leveraging Anna, an autoscaling key-value store, for state sharing and overlay routing combined with mutable caches co-located with function executors for data locality. Performant cache consistency emerges as a key challenge in this architecture. To this end, Cloudburst provides a combination of lattice-encapsulated state and new definitions and protocols for distributed session consistency. Empirical results on benchmarks and diverse applications show that Cloudburst makes stateful functions practical, reducing the state-management overheads of current FaaS platforms by orders of magnitude while also improving the state of the art in serverless consistency.
△ Less
Submitted 24 July, 2020; v1 submitted 13 January, 2020;
originally announced January 2020.
-
Serverless Computing: One Step Forward, Two Steps Back
Authors:
Joseph M. Hellerstein,
Jose Faleiro,
Joseph E. Gonzalez,
Johann Schleier-Smith,
Vikram Sreekanti,
Alexey Tumanov,
Chenggang Wu
Abstract:
Serverless computing offers the potential to program the cloud in an autoscaling, pay-as-you go manner. In this paper we address critical gaps in first-generation serverless computing, which place its autoscaling potential at odds with dominant trends in modern computing: notably data-centric and distributed computing, but also open source and custom hardware. Put together, these gaps make current…
▽ More
Serverless computing offers the potential to program the cloud in an autoscaling, pay-as-you go manner. In this paper we address critical gaps in first-generation serverless computing, which place its autoscaling potential at odds with dominant trends in modern computing: notably data-centric and distributed computing, but also open source and custom hardware. Put together, these gaps make current serverless offerings a bad fit for cloud innovation and particularly bad for data systems innovation. In addition to pinpointing some of the main shortfalls of current serverless architectures, we raise a set of challenges we believe must be met to unlock the radical potential that the cloud---with its exabytes of storage and millions of cores---should offer to innovative developers.
△ Less
Submitted 10 December, 2018;
originally announced December 2018.
-
Automating Truth: The Case for Crowd-Powered Scientific Investigation in Economics
Authors:
Jorge Faleiro
Abstract:
Scientific investigation procedures have been evolving to follow an ever-changing cultural landscape, the sophistication of the technology available and an ever-growing knowledge base. This continuous evolution brought investigation practices through distinct historical phases, mostly marked by different types of participants and organization, from individual natural philosophers to science driven…
▽ More
Scientific investigation procedures have been evolving to follow an ever-changing cultural landscape, the sophistication of the technology available and an ever-growing knowledge base. This continuous evolution brought investigation practices through distinct historical phases, mostly marked by different types of participants and organization, from individual natural philosophers to science driven by large institutions. There is clear evidence that we are now getting to an age of drastic disruptive change. Increased complexity and mandatory multidisciplinary thinking have moved research from an initial phase of disjoint polymaths into a current phase of widespread uncontrolled use of computational tools and data generation, the "informatics crisis". The use of advanced computational technology for communication and generation of data in large scale without proper controls is compromising our ability to conduct an adequate reproducible investigation, causing a dangerous drift from the scientific method. To counteract this deviation, we advocate the use of a next-generation investigative approach leveraging forces of human diversity, micro-specialized crowds and proper computer-assisted control methods associated with a "pipeline of proof". This paper outlines the impact of advanced computational technology, not only as an accelerator of the rate in which humanity acquires objective knowledge but also as a dangerous side effect as a generator of massive amounts of uncontrolled, unverified and untraceable data and results that cannot be reproduced. We propose an alternative for methods of investigation based on collaboration in large-scale through standard procedures of proof and crowds in building a "collective brain in which neurons are human collaborators".
△ Less
Submitted 7 September, 2018;
originally announced September 2018.
-
Design Principles for Scaling Multi-core OLTP Under High Contention
Authors:
Kun Ren,
Jose M. Faleiro,
Daniel J. Abadi
Abstract:
Although significant recent progress has been made in improving the multi-core scalability of high throughput transactional database systems, modern systems still fail to achieve scalable throughput for workloads involving frequent access to highly contended data. Most of this inability to achieve high throughput is explained by the fundamental constraints involved in guaranteeing ACID --- the add…
▽ More
Although significant recent progress has been made in improving the multi-core scalability of high throughput transactional database systems, modern systems still fail to achieve scalable throughput for workloads involving frequent access to highly contended data. Most of this inability to achieve high throughput is explained by the fundamental constraints involved in guaranteeing ACID --- the addition of cores results in more concurrent transactions accessing the same contended data for which access must be serialized in order to guarantee isolation. Thus, linear scalability for contended workloads is impossible. However, there exist flaws in many modern architectures that exacerbate their poor scalability, and result in throughput that is much worse than fundamentally required by the workload.
In this paper we identify two prevalent design principles that limit the multi-core scalability of many (but not all) transactional database systems on contended workloads: the multi-purpose nature of execution threads in these systems, and the lack of advanced planning of data access. We demonstrate the deleterious results of these design principles by implementing a prototype system, ORTHRUS, that is motivated by the principles of separation of database component functionality and advanced planning of transactions. We find that these two principles alone result in significantly improved scalability on high-contention workloads, and an order of magnitude increase in throughput for a non-trivial subset of these contended workloads.
△ Less
Submitted 5 January, 2016; v1 submitted 18 December, 2015;
originally announced December 2015.
-
Rethinking serializable multiversion concurrency control
Authors:
Jose M. Faleiro,
Daniel J. Abadi
Abstract:
Multi-versioned database systems have the potential to significantly increase the amount of concurrency in transaction processing because they can avoid read-write conflicts. Unfortunately, the increase in concurrency usually comes at the cost of transaction serializability. If a database user requests full serializability, modern multi-versioned systems significantly constrain read-write concurre…
▽ More
Multi-versioned database systems have the potential to significantly increase the amount of concurrency in transaction processing because they can avoid read-write conflicts. Unfortunately, the increase in concurrency usually comes at the cost of transaction serializability. If a database user requests full serializability, modern multi-versioned systems significantly constrain read-write concurrency among conflicting transactions and employ expensive synchronization patterns in their design. In main-memory multi-core settings, these additional constraints are so burdensome that multi-versioned systems are often significantly outperformed by single-version systems.
We propose Bohm, a new concurrency control protocol for main-memory multi-versioned database systems. Bohm guarantees serializable execution while ensuring that reads never block writes. In addition, Bohm does not require reads to perform any book-kee** whatsoever, thereby avoiding the overhead of tracking reads via contended writes to shared memory. This leads to excellent scalability and performance in multi-core settings. Bohm has all the above characteristics without performing validation based concurrency control. Instead, it is pessimistic, and is therefore not prone to excessive aborts in the presence of contention. An experimental evaluation shows that Bohm performs well in both high contention and low contention settings, and is able to dramatically outperform state-of-the-art multi-versioned systems despite maintaining the full set of serializability guarantees.
△ Less
Submitted 2 December, 2015; v1 submitted 7 December, 2014;
originally announced December 2014.
-
Limiting Lamport Exposure to Distant Failures in Globally-Managed Distributed Systems
Authors:
Cristina Băsescu,
Georgia Fragkouli,
Enis Ceyhun Alp,
Michael F. Nowlan,
Jose M. Faleiro,
Gaylor Bosson,
Kelong Cong,
Pierluca Borsò-Tan,
Vero Estrada-Galiñanes,
Bryan Ford
Abstract:
Globalized computing infrastructures offer the convenience and elasticity of globally managed objects and services, but lack the resilience to distant failures that localized infrastructures such as private clouds provide. Providing both global management and resilience to distant failures, however, poses a fundamental problem for configuration services: How to discover a possibly migratory, stron…
▽ More
Globalized computing infrastructures offer the convenience and elasticity of globally managed objects and services, but lack the resilience to distant failures that localized infrastructures such as private clouds provide. Providing both global management and resilience to distant failures, however, poses a fundamental problem for configuration services: How to discover a possibly migratory, strongly-consistent service/object in a globalized infrastructure without dependencies on globalized state? Limix is the first metadata configuration service that addresses this problem. With Limix, global strongly-consistent data-plane services and objects are insulated from remote gray failures by ensuring that the definitive, strongly-consistent metadata for any object is always confined to the same region as the object itself. Limix guarantees availability bounds: any user can continue accessing any strongly consistent object that matters to the user located at distance $Δ$ away, insulated from failures outside a small multiple of $Δ$. We built a Limix metadata service based on CockroachDB. Our experiments on Internet-like networks and on AWS, using realistic trace-driven workloads, show that Limix enables global management and significantly improves availability over the state-of-the-art.
△ Less
Submitted 15 July, 2022; v1 submitted 3 May, 2014;
originally announced May 2014.