Search | arXiv e-print repository

Suki: Choreographed Distributed Dataflow in Rust

Authors: Shadaj Laddad, Alvin Cheung, Joseph M. Hellerstein

Abstract: Programming models for distributed dataflow have long focused on analytical workloads that allow the runtime to dynamically place and schedule compute logic. Meanwhile, models that enable fine-grained control over placement, such as actors, make global optimization difficult. In this extended abstract, we present Suki, an embedded Rust DSL that lets developers implement streaming dataflow with exp… ▽ More Programming models for distributed dataflow have long focused on analytical workloads that allow the runtime to dynamically place and schedule compute logic. Meanwhile, models that enable fine-grained control over placement, such as actors, make global optimization difficult. In this extended abstract, we present Suki, an embedded Rust DSL that lets developers implement streaming dataflow with explicit placement of computation. Key to this choreographic programming approach is our use of staged programming, which lets us expose a high-level Rust API while compiling local compute units into individual binaries with zero-overhead. We also explore how this approach, combined with Rust's trait system, enables a type-safe API for map** dataflow programs to cloud computing resources. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2404.01593 [pdf, other]

doi 10.1145/3639257

Optimizing Distributed Protocols with Query Rewrites [Technical Report]

Authors: David Chu, Rithvik Panchapakesan, Shadaj Laddad, Lucky Katahanas, Chris Liu, Kaushik Shivakumar, Natacha Crooks, Joseph M. Hellerstein, Heidi Howard

Abstract: Distributed protocols such as 2PC and Paxos lie at the core of many systems in the cloud, but standard implementations do not scale. New scalable distributed protocols are developed through careful analysis and rewrites, but this process is ad hoc and error-prone. This paper presents an approach for scaling any distributed protocol by applying rule-driven rewrites, borrowing from query optimizatio… ▽ More Distributed protocols such as 2PC and Paxos lie at the core of many systems in the cloud, but standard implementations do not scale. New scalable distributed protocols are developed through careful analysis and rewrites, but this process is ad hoc and error-prone. This paper presents an approach for scaling any distributed protocol by applying rule-driven rewrites, borrowing from query optimization. Distributed protocol rewrites entail a new burden: reasoning about spatiotemporal correctness. We leverage order-insensitivity and data dependency analysis to systematically identify correct coordination-free scaling opportunities. We apply this analysis to create preconditions and mechanisms for coordination-free decoupling and partitioning, two fundamental vertical and horizontal scaling techniques. Manual rule-driven applications of decoupling and partitioning improve the throughput of 2PC by $5\times$ and Paxos by $3\times$, and match state-of-the-art throughput in recent work. These results point the way toward automated optimizers for distributed protocols based on correct-by-construction rewrite rules. △ Less

Submitted 2 April, 2024; v1 submitted 3 January, 2024; originally announced April 2024.

Comments: Technical report of paper accepted at SIGMOD 2024

arXiv:2403.16795 [pdf, other]

doi 10.1145/3653697

"We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning

Authors: Shreya Shankar, Rolando Garcia, Joseph M Hellerstein, Aditya G Parameswaran

Abstract: Organizations rely on machine learning engineers (MLEs) to deploy models and maintain ML pipelines in production. Due to models' extensive reliance on fresh data, the operationalization of machine learning, or MLOps, requires MLEs to have proficiency in data science and engineering. When considered holistically, the job seems staggering -- how do MLEs do MLOps, and what are their unaddressed chall… ▽ More Organizations rely on machine learning engineers (MLEs) to deploy models and maintain ML pipelines in production. Due to models' extensive reliance on fresh data, the operationalization of machine learning, or MLOps, requires MLEs to have proficiency in data science and engineering. When considered holistically, the job seems staggering -- how do MLEs do MLOps, and what are their unaddressed challenges? To address these questions, we conducted semi-structured ethnographic interviews with 18 MLEs working on various applications, including chatbots, autonomous vehicles, and finance. We find that MLEs engage in a workflow of (i) data preparation, (ii) experimentation, (iii) evaluation throughout a multi-staged deployment, and (iv) continual monitoring and response. Throughout this workflow, MLEs collaborate extensively with data scientists, product stakeholders, and one another, supplementing routine verbal exchanges with communication tools ranging from Slack to organization-wide ticketing and reporting systems. We introduce the 3Vs of MLOps: velocity, visibility, and versioning -- three virtues of successful ML deployments that MLEs learn to balance and grow as they mature. Finally, we discuss design implications and opportunities for future work. △ Less

Submitted 25 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2209.09125

Journal ref: Proc. ACM Hum.-Comput. Interact. 8, CSCW1, Article 206 (April 2024)

arXiv:2310.07898 [pdf, other]

FlorDB: Multiversion Hindsight Logging for Continuous Training

Authors: Rolando Garcia, Anusha Dandamudi, Gabriel Matute, Lehan Wan, Joseph Gonzalez, Joseph M. Hellerstein, Koushik Sen

Abstract: Production Machine Learning involves continuous training: hosting multiple versions of models over time, often with many model versions running at once. When model performance does not meet expectations, Machine Learning Engineers (MLEs) debug issues by exploring and analyzing numerous prior versions of code and training data to identify root causes and mitigate problems. Traditional debugging and… ▽ More Production Machine Learning involves continuous training: hosting multiple versions of models over time, often with many model versions running at once. When model performance does not meet expectations, Machine Learning Engineers (MLEs) debug issues by exploring and analyzing numerous prior versions of code and training data to identify root causes and mitigate problems. Traditional debugging and logging tools often fall short in managing this experimental, multi-version context. FlorDB introduces Multiversion Hindsight Logging, which allows engineers to use the most recent version's logging statements to query past versions, even when older versions logged different data. Log statement propagation enables consistent injection of logging statements into past code versions, regardless of changes to the codebase. Once log statements are propagated across code versions, the remaining challenge in Multiversion Hindsight Logging is to efficiently replay the new log statements based on checkpoints from previous runs. Finally, a coherent user experience is required to help MLEs debug across all versions of code and data. To this end, FlorDB presents a unified relational model for efficient handling of historical queries, offering a comprehensive view of the log history to simplify the exploration of past code iterations. We present a performance evaluation on diverse benchmarks confirming its scalability and the ability to deliver real-time query responses, leveraging query-based filtering and checkpoint-based parallelism for efficient replay. △ Less

Submitted 2 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2308.06815 [pdf, other]

Optimizing the cloud? Don't train models. Build oracles!

Authors: Tiemo Bang, Conor Power, Siavash Ameli, Natacha Crooks, Joseph M. Hellerstein

Abstract: We propose cloud oracles, an alternative to machine learning for online optimization of cloud configurations. Our cloud oracle approach guarantees complete accuracy and explainability of decisions for problems that can be formulated as parametric convex optimizations. We give experimental evidence of this technique's efficacy and share a vision of research directions for expanding its applicabilit… ▽ More We propose cloud oracles, an alternative to machine learning for online optimization of cloud configurations. Our cloud oracle approach guarantees complete accuracy and explainability of decisions for problems that can be formulated as parametric convex optimizations. We give experimental evidence of this technique's efficacy and share a vision of research directions for expanding its applicability. △ Less

Submitted 22 December, 2023; v1 submitted 13 August, 2023; originally announced August 2023.

Comments: Camera-ready publication for CIDR'24: https://www.cidrdb.org/cidr2024/papers/p47-bang.pdf

arXiv:2306.10585 [pdf, other]

Optimizing Stateful Dataflow with Local Rewrites

Authors: Shadaj Laddad, Conor Power, Tyler Hou, Alvin Cheung, Joseph M. Hellerstein

Abstract: Optimizing a stateful dataflow language is a challenging task. There are strict correctness constraints for preserving properties expected by downstream consumers, a large space of possible optimizations, and complex analyses that must reason about the behavior of the program over time. Classic compiler techniques with specialized optimization passes yield unpredictable performance and have comple… ▽ More Optimizing a stateful dataflow language is a challenging task. There are strict correctness constraints for preserving properties expected by downstream consumers, a large space of possible optimizations, and complex analyses that must reason about the behavior of the program over time. Classic compiler techniques with specialized optimization passes yield unpredictable performance and have complex correctness proofs. But with e-graphs, we can dramatically simplify the process of building a correct optimizer while yielding more consistent results! In this short paper, we discuss our early work using e-graphs to develop an optimizer for a the Hydroflow dataflow language. Our prototype demonstrates that composing simple, easy-to-prove rewrite rules is sufficient to match techniques in hand-optimized systems. △ Less

Submitted 18 June, 2023; originally announced June 2023.

Comments: EGRAPHS 2023

arXiv:2305.14614 [pdf, other]

doi 10.1145/3584684.3597272

Invited Paper: Initial Steps Toward a Compiler for Distributed Programs

Authors: Joseph M. Hellerstein, Shadaj Laddad, Mae Milano, Conor Power, Mingwei Samuel

Abstract: In the Hydro project we are designing a compiler toolkit that can optimize for the concerns of distributed systems, including scale-up and scale-down, availability, and consistency of outcomes across replicas. This invited paper overviews the project, and provides an early walk-through of the kind of optimization that is possible. We illustrate how type transformations as well as local program tra… ▽ More In the Hydro project we are designing a compiler toolkit that can optimize for the concerns of distributed systems, including scale-up and scale-down, availability, and consistency of outcomes across replicas. This invited paper overviews the project, and provides an early walk-through of the kind of optimization that is possible. We illustrate how type transformations as well as local program transformations can combine, step by step, to convert a single-node program into a variety of distributed design points that offer the same semantics with different performance and deployment characteristics. △ Less

Submitted 23 May, 2023; originally announced May 2023.

Journal ref: The 5th workshop on Advanced tools, program- ming languages, and PLatforms for Implementing and Evaluating algorithms for Distributed systems (ApPLIED 2023), June 19, 2023, Orlando, FL, USA

arXiv:2210.12605 [pdf, other]

Keep CALM and CRDT On

Authors: Shadaj Laddad, Conor Power, Mae Milano, Alvin Cheung, Natacha Crooks, Joseph M. Hellerstein

Abstract: Despite decades of research and practical experience, developers have few tools for programming reliable distributed applications without resorting to expensive coordination techniques. Conflict-free replicated datatypes (CRDTs) are a promising line of work that enable coordination-free replication and offer certain eventual consistency guarantees in a relatively simple object-oriented API. Yet CR… ▽ More Despite decades of research and practical experience, developers have few tools for programming reliable distributed applications without resorting to expensive coordination techniques. Conflict-free replicated datatypes (CRDTs) are a promising line of work that enable coordination-free replication and offer certain eventual consistency guarantees in a relatively simple object-oriented API. Yet CRDT guarantees extend only to data updates; observations of CRDT state are unconstrained and unsafe. We propose an agenda that embraces the simplicity of CRDTs, but provides richer, more uniform guarantees. We extend CRDTs with a query model that reasons about which queries are safe without coordination by applying monotonicity results from the CALM Theorem, and lay out a larger agenda for develo** CRDT data stores that let developers safely and efficiently interact with replicated application state. △ Less

Submitted 22 October, 2022; originally announced October 2022.

arXiv:2209.09125 [pdf, other]

Operationalizing Machine Learning: An Interview Study

Authors: Shreya Shankar, Rolando Garcia, Joseph M. Hellerstein, Aditya G. Parameswaran

Abstract: Organizations rely on machine learning engineers (MLEs) to operationalize ML, i.e., deploy and maintain ML pipelines in production. The process of operationalizing ML, or MLOps, consists of a continual loop of (i) data collection and labeling, (ii) experimentation to improve ML performance, (iii) evaluation throughout a multi-staged deployment process, and (iv) monitoring of performance drops in p… ▽ More Organizations rely on machine learning engineers (MLEs) to operationalize ML, i.e., deploy and maintain ML pipelines in production. The process of operationalizing ML, or MLOps, consists of a continual loop of (i) data collection and labeling, (ii) experimentation to improve ML performance, (iii) evaluation throughout a multi-staged deployment process, and (iv) monitoring of performance drops in production. When considered together, these responsibilities seem staggering -- how does anyone do MLOps, what are the unaddressed challenges, and what are the implications for tool builders? We conducted semi-structured ethnographic interviews with 18 MLEs working across many applications, including chatbots, autonomous vehicles, and finance. Our interviews expose three variables that govern success for a production ML deployment: Velocity, Validation, and Versioning. We summarize common practices for successful ML experimentation, deployment, and sustaining production performance. Finally, we discuss interviewees' pain points and anti-patterns, with implications for tool design. △ Less

Submitted 16 September, 2022; originally announced September 2022.

Comments: 20 pages, 4 figures

arXiv:2205.12425 [pdf, other]

Katara: Synthesizing CRDTs with Verified Lifting

Authors: Shadaj Laddad, Conor Power, Mae Milano, Alvin Cheung, Joseph M. Hellerstein

Abstract: Conflict-free replicated data types (CRDTs) are a promising tool for designing scalable, coordination-free distributed systems. However, constructing correct CRDTs is difficult, posing a challenge for even seasoned developers. As a result, CRDT development is still largely the domain of academics, with new designs often awaiting peer review and a manual proof of correctness. In this paper, we pres… ▽ More Conflict-free replicated data types (CRDTs) are a promising tool for designing scalable, coordination-free distributed systems. However, constructing correct CRDTs is difficult, posing a challenge for even seasoned developers. As a result, CRDT development is still largely the domain of academics, with new designs often awaiting peer review and a manual proof of correctness. In this paper, we present Katara, a program synthesis-based system that takes sequential data type implementations and automatically synthesizes verified CRDT designs from them. Key to this process is a new formal definition of CRDT correctness that combines a reference sequential type with a lightweight ordering constraint that resolves conflicts between non-commutative operations. Our process follows the tradition of work in verified lifting, including an encoding of correctness into SMT logic using synthesized inductive invariants and hand-crafted grammars for the CRDT state and runtime. Katara is able to automatically synthesize CRDTs for a wide variety of scenarios, from reproducing classic CRDTs to synthesizing novel designs based on specifications in existing literature. Crucially, our synthesized CRDTs are fully, automatically verified, eliminating entire classes of common errors and reducing the process of producing a new CRDT from a painstaking paper proof of correctness to a lightweight specification. △ Less

Submitted 21 September, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

ACM Class: D.1.2

arXiv:2205.07147 [pdf]

The Sky Above The Clouds

Authors: Sarah Chasins, Alvin Cheung, Natacha Crooks, Ali Ghodsi, Ken Goldberg, Joseph E. Gonzalez, Joseph M. Hellerstein, Michael I. Jordan, Anthony D. Joseph, Michael W. Mahoney, Aditya Parameswaran, David Patterson, Raluca Ada Popa, Koushik Sen, Scott Shenker, Dawn Song, Ion Stoica

Abstract: Technology ecosystems often undergo significant transformations as they mature. For example, telephony, the Internet, and PCs all started with a single provider, but in the United States each is now served by a competitive market that uses comprehensive and universal technology standards to provide compatibility. This white paper presents our view on how the cloud ecosystem, barely over fifteen ye… ▽ More Technology ecosystems often undergo significant transformations as they mature. For example, telephony, the Internet, and PCs all started with a single provider, but in the United States each is now served by a competitive market that uses comprehensive and universal technology standards to provide compatibility. This white paper presents our view on how the cloud ecosystem, barely over fifteen years old, could evolve as it matures. △ Less

Submitted 14 May, 2022; originally announced May 2022.

Comments: 35 pages

arXiv:2104.04102 [pdf, other]

Read-Write Quorum Systems Made Practical

Authors: Michael Whittaker, Aleksey Charapko, Joseph M. Hellerstein, Heidi Howard, Ion Stoica

Abstract: Quorum systems are a powerful mechanism for ensuring the consistency of replicated data. Production systems usually opt for majority quorums due to their simplicity and fault tolerance, but majority quorum systems provide poor throughput and scalability. Alternatively, researchers have invented a number of theoretically "optimal" quorum systems, but the underlying theory ignores many practical com… ▽ More Quorum systems are a powerful mechanism for ensuring the consistency of replicated data. Production systems usually opt for majority quorums due to their simplicity and fault tolerance, but majority quorum systems provide poor throughput and scalability. Alternatively, researchers have invented a number of theoretically "optimal" quorum systems, but the underlying theory ignores many practical complexities such as machine heterogeneity and workload skew. In this paper, we conduct a pragmatic re-examination of quorum systems. We enrich the current theory on quorum systems with a number of practical refinements to find quorum systems that provide higher throughput, lower latency, and lower network load. We also develop a library Quoracle that precisely quantifies the available trade-offs between quorum systems to empower engineers to find optimal quorum systems, given a set of objectives for specific deployments and workloads. Our tool is available online at: https://github.com/mwhittaker/quoracle. △ Less

Submitted 8 April, 2021; originally announced April 2021.

Comments: To be published in PaPoC 2021 (https://papoc-workshop.github.io/2021/)

arXiv:2103.02145 [pdf, other]

Enhancing the Interactivity of Dataframe Queries by Leveraging Think Time

Authors: Doris Xin, Devin Petersohn, Dixin Tang, Yifan Wu, Joseph E. Gonzalez, Joseph M. Hellerstein, Anthony D. Joseph, Aditya G. Parameswaran

Abstract: We propose opportunistic evaluation, a framework for accelerating interactions with dataframes. Interactive latency is critical for iterative, human-in-the-loop dataframe workloads for supporting exploratory data analysis. Opportunistic evaluation significantly reduces interactive latency by 1) prioritizing computation directly relevant to the interactions and 2) leveraging think time for asynchro… ▽ More We propose opportunistic evaluation, a framework for accelerating interactions with dataframes. Interactive latency is critical for iterative, human-in-the-loop dataframe workloads for supporting exploratory data analysis. Opportunistic evaluation significantly reduces interactive latency by 1) prioritizing computation directly relevant to the interactions and 2) leveraging think time for asynchronous background computation for non-critical operators that might be relevant to future interactions. We show, through empirical analysis, that current user behavior presents ample opportunities for optimization, and the solutions we propose effectively harness such opportunities. △ Less

Submitted 2 March, 2021; originally announced March 2021.

arXiv:2101.01159 [pdf, other]

New Directions in Cloud Programming

Authors: Alvin Cheung, Natacha Crooks, Joseph M. Hellerstein, Mae Milano

Abstract: Nearly twenty years after the launch of AWS, it remains difficult for most developers to harness the enormous potential of the cloud. In this paper we lay out an agenda for a new generation of cloud programming research aimed at bringing research ideas to programmers in an evolutionary fashion. Key to our approach is a separation of distributed programs into a PACT of four facets: Program semant… ▽ More Nearly twenty years after the launch of AWS, it remains difficult for most developers to harness the enormous potential of the cloud. In this paper we lay out an agenda for a new generation of cloud programming research aimed at bringing research ideas to programmers in an evolutionary fashion. Key to our approach is a separation of distributed programs into a PACT of four facets: Program semantics, Availablity, Consistency and Targets of optimization. We propose to migrate developers gradually to PACT programming by lifting familiar code into our more declarative level of abstraction. We then propose a multi-stage compiler that emits human-readable code at each stage that can be hand-tuned by developers seeking more control. Our agenda raises numerous research challenges across multiple areas including language design, query optimization, transactions, distributed consistency, compilers and program synthesis. △ Less

Submitted 4 January, 2021; originally announced January 2021.

Journal ref: CIDR 2021

arXiv:2012.15762 [pdf, other]

Scaling Replicated State Machines with Compartmentalization [Technical Report]

Authors: Michael Whittaker, Ailidani Ailijiang, Aleksey Charapko, Murat Demirbas, Neil Giridharan, Joseph M. Hellerstein, Heidi Howard, Ion Stoica, Adriana Szekeres

Abstract: State machine replication protocols, like MultiPaxos and Raft, are a critical component of many distributed systems and databases. However, these protocols offer relatively low throughput due to several bottlenecked components. Numerous existing protocols fix different bottlenecks in isolation but fall short of a complete solution. When you fix one bottleneck, another arises. In this paper, we int… ▽ More State machine replication protocols, like MultiPaxos and Raft, are a critical component of many distributed systems and databases. However, these protocols offer relatively low throughput due to several bottlenecked components. Numerous existing protocols fix different bottlenecks in isolation but fall short of a complete solution. When you fix one bottleneck, another arises. In this paper, we introduce compartmentalization, the first comprehensive technique to eliminate state machine replication bottlenecks. Compartmentalization involves decoupling individual bottlenecks into distinct components and scaling these components independently. Compartmentalization has two key strengths. First, compartmentalization leads to strong performance. In this paper, we demonstrate how to compartmentalize MultiPaxos to increase its throughput by 6x on a write-only workload and 16x on a mixed read-write workload. Unlike other approaches, we achieve this performance without the need for specialized hardware. Second, compartmentalization is a technique, not a protocol. Industry practitioners can apply compartmentalization to their protocols incrementally without having to adopt a completely new protocol. △ Less

Submitted 16 May, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

Comments: Technical Report

arXiv:2010.13752 [pdf, other]

Senate: A Maliciously-Secure MPC Platform for Collaborative Analytics

Authors: Rishabh Poddar, Sukrit Kalra, Avishay Yanai, Ryan Deng, Raluca Ada Popa, Joseph M. Hellerstein

Abstract: Many organizations stand to benefit from pooling their data together in order to draw mutually beneficial insights -- e.g., for fraud detection across banks, better medical studies across hospitals, etc. However, such organizations are often prevented from sharing their data with each other by privacy concerns, regulatory hurdles, or business competition. We present Senate, a system that allows mu… ▽ More Many organizations stand to benefit from pooling their data together in order to draw mutually beneficial insights -- e.g., for fraud detection across banks, better medical studies across hospitals, etc. However, such organizations are often prevented from sharing their data with each other by privacy concerns, regulatory hurdles, or business competition. We present Senate, a system that allows multiple parties to collaboratively run analytical SQL queries without revealing their individual data to each other. Unlike prior works on secure multi-party computation (MPC) that assume that all parties are semi-honest, Senate protects the data even in the presence of malicious adversaries. At the heart of Senate lies a new MPC decomposition protocol that decomposes the cryptographic MPC computation into smaller units, some of which can be executed by subsets of parties and in parallel, while preserving its security guarantees. Senate then provides a new query planning algorithm that decomposes and plans the cryptographic computation effectively, achieving a performance of up to 145$\times$ faster than the state-of-the-art. △ Less

Submitted 26 October, 2020; originally announced October 2020.

Comments: USENIX Security 2021

arXiv:2009.09845 [pdf, other]

A FaaS File System for Serverless Computing

Authors: Johann Schleier-Smith, Leonhard Holz, Nathan Pemberton, Joseph M. Hellerstein

Abstract: Serverless computing with cloud functions is quickly gaining adoption, but constrains programmers with its limited support for state management. We introduce a shared file system for cloud functions. It offers familiar POSIX semantics while taking advantage of distinctive aspects of cloud functions to achieve scalability and performance beyond what traditional shared file systems can offer. We tak… ▽ More Serverless computing with cloud functions is quickly gaining adoption, but constrains programmers with its limited support for state management. We introduce a shared file system for cloud functions. It offers familiar POSIX semantics while taking advantage of distinctive aspects of cloud functions to achieve scalability and performance beyond what traditional shared file systems can offer. We take advantage of the function-grained fault tolerance model of cloud functions to proceed optimistically using local state, safe in the knowledge that we can restart if cache reads or lock activity cannot be reconciled upon commit. The boundaries of cloud functions provide implicit commit and rollback points, giving us the flexibility to use transaction processing techniques without changing the programming model or API. This allows a variety of stateful sever-based applications to benefit from the simplicity and scalability of serverless computing, often with little or no modification. △ Less

Submitted 16 September, 2020; originally announced September 2020.

arXiv:2007.09468 [pdf, other]

Matchmaker Paxos: A Reconfigurable Consensus Protocol [Technical Report]

Authors: Michael Whittaker, Neil Giridharan, Adriana Szekeres, Joseph M. Hellerstein, Heidi Howard, Faisal Nawab, Ion Stoica

Abstract: State machine replication protocols, like MultiPaxos and Raft, are at the heart of nearly every strongly consistent distributed database. To tolerate machine failures, these protocols must replace failed machines with live machines, a process known as reconfiguration. Reconfiguration has become increasingly important over time as the need for frequent reconfiguration has grown. Despite this, recon… ▽ More State machine replication protocols, like MultiPaxos and Raft, are at the heart of nearly every strongly consistent distributed database. To tolerate machine failures, these protocols must replace failed machines with live machines, a process known as reconfiguration. Reconfiguration has become increasingly important over time as the need for frequent reconfiguration has grown. Despite this, reconfiguration has largely been neglected in the literature. In this paper, we present Matchmaker Paxos and Matchmaker MultiPaxos, a reconfigurable consensus and state machine replication protocol respectively. Our protocols can perform a reconfiguration with little to no impact on the latency or throughput of command processing; they can perform a reconfiguration in one round trip (theoretically) and a few milliseconds (empirically); they provide a number of theoretical insights; and they present a framework that can be generalized to other replication protocols in a way that previous reconfiguration techniques can not. We provide proofs of correctness for the protocols and optimizations, and present empirical results from an open source implementation. △ Less

Submitted 20 July, 2020; v1 submitted 18 July, 2020; originally announced July 2020.

arXiv:2007.05832 [pdf, other]

Optimizing Prediction Serving on Low-Latency Serverless Dataflow

Authors: Vikram Sreekanti, Harikaran Subbaraj, Chenggang Wu, Joseph E. Gonzalez, Joseph M. Hellerstein

Abstract: Prediction serving systems are designed to provide large volumes of low-latency inferences machine learning models. These systems mix data processing and computationally intensive model inference and benefit from multiple heterogeneous processors and distributed computing resources. In this paper, we argue that a familiar dataflow API is well-suited to this latency-sensitive task, and amenable to… ▽ More Prediction serving systems are designed to provide large volumes of low-latency inferences machine learning models. These systems mix data processing and computationally intensive model inference and benefit from multiple heterogeneous processors and distributed computing resources. In this paper, we argue that a familiar dataflow API is well-suited to this latency-sensitive task, and amenable to optimization even with unmodified black-box ML models. We present the design of Cloudflow, a system that provides this API and realizes it on an autoscaling serverless backend. Cloudflow transparently implements performance-critical optimizations including operator fusion and competitive execution. Our evaluation shows that Cloudflow's optimizations yield significant performance improvements on synthetic workloads and that Cloudflow outperforms state-of-the-art prediction serving systems by as much as 2x on real-world prediction pipelines, meeting latency goals of demanding applications like real-time video analysis. △ Less

Submitted 11 July, 2020; originally announced July 2020.

arXiv:2006.07357 [pdf, other]

doi 10.14778/3436905.3436925

Hindsight Logging for Model Training

Authors: Rolando Garcia, Eric Liu, Vikram Sreekanti, Bobby Yan, Anusha Dandamudi, Joseph E. Gonzalez, Joseph M. Hellerstein, Koushik Sen

Abstract: In modern Machine Learning, model training is an iterative, experimental process that can consume enormous computation resources and developer time. To aid in that process, experienced model developers log and visualize program variables during training runs. Exhaustive logging of all variables is infeasible. Optimistic logging can be accompanied by program checkpoints; this allows developers to a… ▽ More In modern Machine Learning, model training is an iterative, experimental process that can consume enormous computation resources and developer time. To aid in that process, experienced model developers log and visualize program variables during training runs. Exhaustive logging of all variables is infeasible. Optimistic logging can be accompanied by program checkpoints; this allows developers to add log statements post-hoc, and "replay" desired log statements from checkpoint -- a process we refer to as hindsight logging. Unfortunately, hindsight logging raises tricky problems in data management and software engineering. Done poorly, hindsight logging can waste resources and generate technical debt embodied in multiple variants of training code. In this paper, we present methodologies for efficient and effective logging practices for model training, with a focus on techniques for hindsight logging. Our goal is for experienced model developers to learn and adopt these practices. To make this easier, we provide an open-source suite of tools for Fast Low-Overhead Recovery (flor) that embodies our design across three tasks: (i) efficient background logging in Python, (ii) adaptable periodic checkpointing, and (iii) an instrumentation library that codifies hindsight logging for efficient and automatic record-replay of model-training. Model developers can use each flor tool separately as they see fit, or they can use flor in hands-free mode, entrusting it to instrument their code end-to-end for efficient record-replay. Our solutions leverage techniques from physiological transaction logs and recovery in database systems. Evaluations on modern ML benchmarks demonstrate that flor can produce fast checkpointing with small user-specifiable overheads (e.g. 7%), and still provide hindsight log replay times orders of magnitude faster than restarting training from scratch. △ Less

Submitted 2 December, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

arXiv:2003.06007 [pdf, other]

A Fault-Tolerance Shim for Serverless Computing

Authors: Vikram Sreekanti, Chenggang Wu, Saurav Chhatrapati, Joseph E. Gonzalez, Joseph M. Hellerstein, Jose M. Faleiro

Abstract: Serverless computing has grown in popularity in recent years, with an increasing number of applications being built on Functions-as-a-Service (FaaS) platforms. By default, FaaS platforms support retry-based fault tolerance, but this is insufficient for programs that modify shared state, as they can unwittingly persist partial sets of updates in case of failures. To address this challenge, we would… ▽ More Serverless computing has grown in popularity in recent years, with an increasing number of applications being built on Functions-as-a-Service (FaaS) platforms. By default, FaaS platforms support retry-based fault tolerance, but this is insufficient for programs that modify shared state, as they can unwittingly persist partial sets of updates in case of failures. To address this challenge, we would like atomic visibility of the updates made by a FaaS application. In this paper, we present AFT, an atomic fault tolerance shim for serverless applications. AFT interposes between a commodity FaaS platform and storage engine and ensures atomic visibility of updates by enforcing the read atomic isolation guarantee. AFT supports new protocols to guarantee read atomic isolation in the serverless setting. We demonstrate that aft introduces minimal overhead relative to existing storage engines and scales smoothly to thousands of requests per second, while preventing a significant number of consistency anomalies. △ Less

Submitted 12 March, 2020; originally announced March 2020.

arXiv:2003.00331 [pdf, other]

Bipartisan Paxos: A Modular State Machine Replication Protocol

Authors: Michael Whittaker, Neil Giridharan, Adriana Szekeres, Joseph M. Hellerstein, Ion Stoica

Abstract: There is no shortage of state machine replication protocols. From Generalized Paxos to EPaxos, a huge number of replication protocols have been proposed that achieve high throughput and low latency. However, these protocols all have two problems. First, they do not scale. Many protocols actually slow down when you scale them, instead of speeding up. For example, increasing the number of MultiPaxos… ▽ More There is no shortage of state machine replication protocols. From Generalized Paxos to EPaxos, a huge number of replication protocols have been proposed that achieve high throughput and low latency. However, these protocols all have two problems. First, they do not scale. Many protocols actually slow down when you scale them, instead of speeding up. For example, increasing the number of MultiPaxos acceptors increases quorum sizes and slows down the protocol. Second, they are too complicated. This is not a secret; state machine replication is notoriously difficult to understand. In this paper, we tackle both problems with a single solution: modularity. We present Bipartisan Paxos (BPaxos), a modular state machine replication protocol. Modularity yields high throughput via scaling. We note that while many replication protocol components do not scale, some do. By modularizing BPaxos, we are able to disentangle the two and scale the bottleneck components to increase the protocol's throughput. Modularity also yields simplicity. BPaxos is divided into a number of independent modules that can be understood and proven correct in isolation. △ Less

Submitted 29 February, 2020; originally announced March 2020.

arXiv:2001.04592 [pdf, other]

doi 10.14778/3407790.3407836

Cloudburst: Stateful Functions-as-a-Service

Authors: Vikram Sreekanti, Chenggang Wu, Xiayue Charles Lin, Johann Schleier-Smith, Jose M. Faleiro, Joseph E. Gonzalez, Joseph M. Hellerstein, Alexey Tumanov

Abstract: Function-as-a-Service (FaaS) platforms and "serverless" cloud computing are becoming increasingly popular. Current FaaS offerings are targeted at stateless functions that do minimal I/O and communication. We argue that the benefits of serverless computing can be extended to a broader range of applications and algorithms. We present the design and implementation of Cloudburst, a stateful FaaS platf… ▽ More Function-as-a-Service (FaaS) platforms and "serverless" cloud computing are becoming increasingly popular. Current FaaS offerings are targeted at stateless functions that do minimal I/O and communication. We argue that the benefits of serverless computing can be extended to a broader range of applications and algorithms. We present the design and implementation of Cloudburst, a stateful FaaS platform that provides familiar Python programming with low-latency mutable state and communication, while maintaining the autoscaling benefits of serverless computing. Cloudburst accomplishes this by leveraging Anna, an autoscaling key-value store, for state sharing and overlay routing combined with mutable caches co-located with function executors for data locality. Performant cache consistency emerges as a key challenge in this architecture. To this end, Cloudburst provides a combination of lattice-encapsulated state and new definitions and protocols for distributed session consistency. Empirical results on benchmarks and diverse applications show that Cloudburst makes stateful functions practical, reducing the state-management overheads of current FaaS platforms by orders of magnitude while also improving the state of the art in serverless consistency. △ Less

Submitted 24 July, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

Journal ref: PVLDB, 13(11):2438-2452, 2020

arXiv:2001.00888 [pdf, other]

Towards Scalable Dataframe Systems

Authors: Devin Petersohn, Stephen Macke, Doris Xin, William Ma, Doris Lee, Xiangxi Mo, Joseph E. Gonzalez, Joseph M. Hellerstein, Anthony D. Joseph, Aditya Parameswaran

Abstract: Dataframes are a popular abstraction to represent, prepare, and analyze data. Despite the remarkable success of dataframe libraries in Rand Python, dataframes face performance issues even on moderately large datasets. Moreover, there is significant ambiguity regarding dataframe semantics. In this paper we lay out a vision and roadmap for scalable dataframe systems. To demonstrate the potential in… ▽ More Dataframes are a popular abstraction to represent, prepare, and analyze data. Despite the remarkable success of dataframe libraries in Rand Python, dataframes face performance issues even on moderately large datasets. Moreover, there is significant ambiguity regarding dataframe semantics. In this paper we lay out a vision and roadmap for scalable dataframe systems. To demonstrate the potential in this area, we report on our experience building MODIN, a scaled-up implementation of the most widely-used and complex dataframe API today, Python's pandas. With pandas as a reference, we propose a simple data model and algebra for dataframes to ground discussion in the field. Given this foundation, we lay out an agenda of open research opportunities where the distinct features of dataframes will require extending the state of the art in many dimensions of data management. We discuss the implications of signature data-frame features including flexible schemas, ordering, row/column equivalence, and data/metadata fluidity, as well as the piecemeal, trial-and-error-based approach to interacting with dataframes. △ Less

Submitted 2 June, 2020; v1 submitted 3 January, 2020; originally announced January 2020.

arXiv:1905.04278 [pdf, other]

doi 10.14778/3368289.3368294

Deep Unsupervised Cardinality Estimation

Authors: Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M. Hellerstein, Sanjay Krishnan, Ion Stoica

Abstract: Cardinality estimation has long been grounded in statistical tools for density estimation. To capture the rich multivariate distributions of relational tables, we propose the use of a new type of high-capacity statistical model: deep autoregressive models. However, direct application of these models leads to a limited estimator that is prohibitively expensive to evaluate for range or wildcard pred… ▽ More Cardinality estimation has long been grounded in statistical tools for density estimation. To capture the rich multivariate distributions of relational tables, we propose the use of a new type of high-capacity statistical model: deep autoregressive models. However, direct application of these models leads to a limited estimator that is prohibitively expensive to evaluate for range or wildcard predicates. To produce a truly usable estimator, we develop a Monte Carlo integration scheme on top of autoregressive models that can efficiently handle range queries with dozens of dimensions or more. Like classical synopses, our estimator summarizes the data without supervision. Unlike previous solutions, we approximate the joint data distribution without any independence assumptions. Evaluated on real-world datasets and compared against real systems and dominant families of techniques, our estimator achieves single-digit multiplicative error at tail, an up to 90$\times$ accuracy improvement over the second best method, and is space- and runtime-efficient. △ Less

Submitted 21 November, 2019; v1 submitted 10 May, 2019; originally announced May 2019.

Comments: VLDB 2020. Updates since version 1: new title and new/revised content

Journal ref: Proceedings of the VLDB Endowment (PLVDB), Vol. 13, No. 3, pp. 279-292 (2019)

arXiv:1901.01973 [pdf, ps, other]

Looking Back at Postgres

Authors: Joseph M. Hellerstein

Abstract: This is a recollection of the UC Berkeley Postgres project, which was led by Mike Stonebraker from the mid-1980's to the mid-1990's. The article was solicited for Stonebraker's Turing Award book, as one of many personal/historical recollections. As a result it focuses on Stonebraker's design ideas and leadership. But Stonebraker was never a coder, and he stayed out of the way of his development te… ▽ More This is a recollection of the UC Berkeley Postgres project, which was led by Mike Stonebraker from the mid-1980's to the mid-1990's. The article was solicited for Stonebraker's Turing Award book, as one of many personal/historical recollections. As a result it focuses on Stonebraker's design ideas and leadership. But Stonebraker was never a coder, and he stayed out of the way of his development team. The Postgres codebase was the work of a team of brilliant students and the occasional university "staff programmers" who had little more experience (and only slightly more compensation) than the students. I was lucky to join that team as a student during the latter years of the project. I got helpful input on this writeup from some of the more senior students on the project, but any errors or omissions are mine. If you spot any such, please contact me and I will try to fix them. △ Less

Submitted 7 January, 2019; originally announced January 2019.

arXiv:1901.01930 [pdf, other]

Kee** CALM: When Distributed Consistency is Easy

Authors: Joseph M. Hellerstein, Peter Alvaro

Abstract: A key concern in modern distributed systems is to avoid the cost of coordination while maintaining consistent semantics. Until recently, there was no answer to the question of when coordination is actually required. In this paper we present an informal introduction to the CALM Theorem, which answers this question precisely by moving up from traditional storage consistency to consider properties of… ▽ More A key concern in modern distributed systems is to avoid the cost of coordination while maintaining consistent semantics. Until recently, there was no answer to the question of when coordination is actually required. In this paper we present an informal introduction to the CALM Theorem, which answers this question precisely by moving up from traditional storage consistency to consider properties of programs. CALM is an acronym for "consistency as logical monotonicity". The CALM Theorem shows that the programs that have consistent, coordination-free distributed implementations are exactly the programs that can be expressed in monotonic logic. This theoretical result has practical implications for developers of distributed applications. We show how CALM provides a constructive application-level counterpart to conventional "systems" wisdom, such as the apparently negative results of the CAP Theorem. We also discuss ways that monotonic thinking can influence distributed systems design, and how new programming language designs and tools can help developers write consistent, coordination-free code. △ Less

Submitted 25 January, 2019; v1 submitted 7 January, 2019; originally announced January 2019.

arXiv:1812.03651 [pdf, other]

Serverless Computing: One Step Forward, Two Steps Back

Authors: Joseph M. Hellerstein, Jose Faleiro, Joseph E. Gonzalez, Johann Schleier-Smith, Vikram Sreekanti, Alexey Tumanov, Chenggang Wu

Abstract: Serverless computing offers the potential to program the cloud in an autoscaling, pay-as-you go manner. In this paper we address critical gaps in first-generation serverless computing, which place its autoscaling potential at odds with dominant trends in modern computing: notably data-centric and distributed computing, but also open source and custom hardware. Put together, these gaps make current… ▽ More Serverless computing offers the potential to program the cloud in an autoscaling, pay-as-you go manner. In this paper we address critical gaps in first-generation serverless computing, which place its autoscaling potential at odds with dominant trends in modern computing: notably data-centric and distributed computing, but also open source and custom hardware. Put together, these gaps make current serverless offerings a bad fit for cloud innovation and particularly bad for data systems innovation. In addition to pinpointing some of the main shortfalls of current serverless architectures, we raise a set of challenges we believe must be met to unlock the radical potential that the cloud---with its exabytes of storage and millions of cores---should offer to innovative developers. △ Less

Submitted 10 December, 2018; originally announced December 2018.

Comments: 8 pages, draft for CIDR 2019

arXiv:1809.07750 [pdf, other]

doi 10.1109/EuroSP48549.2020.00041

Chorus: a Programming Framework for Building Scalable Differential Privacy Mechanisms

Authors: Noah Johnson, Joseph P. Near, Joseph M. Hellerstein, Dawn Song

Abstract: Differential privacy is fast becoming the gold standard in enabling statistical analysis of data while protecting the privacy of individuals. However, practical use of differential privacy still lags behind research progress because research prototypes cannot satisfy the scalability requirements of production deployments. To address this challenge, we present Chorus, a framework for building scala… ▽ More Differential privacy is fast becoming the gold standard in enabling statistical analysis of data while protecting the privacy of individuals. However, practical use of differential privacy still lags behind research progress because research prototypes cannot satisfy the scalability requirements of production deployments. To address this challenge, we present Chorus, a framework for building scalable differential privacy mechanisms which is based on cooperation between the mechanism itself and a high-performance production database management system (DBMS). We demonstrate the use of Chorus to build the first highly scalable implementations of complex mechanisms like Weighted PINQ, MWEM, and the matrix mechanism. We report on our experience deploying Chorus at Uber, and evaluate its scalability on real-world queries. △ Less

Submitted 4 May, 2021; v1 submitted 20 September, 2018; originally announced September 2018.

arXiv:1809.00089 [pdf, other]

Eliminating Boundaries in Cloud Storage with Anna

Authors: Chenggang Wu, Vikram Sreekanti, Joseph M. Hellerstein

Abstract: In this paper, we describe how we extended a distributed key-value store called Anna into an elastic, multi-tier service for the cloud. In its extended form, Anna is designed to overcome the narrow cost-performance limitations typical of current cloud storage systems. We describe three key aspects of Anna's new design: multi-master selective replication of hot keys, a vertical tiering of storage l… ▽ More In this paper, we describe how we extended a distributed key-value store called Anna into an elastic, multi-tier service for the cloud. In its extended form, Anna is designed to overcome the narrow cost-performance limitations typical of current cloud storage systems. We describe three key aspects of Anna's new design: multi-master selective replication of hot keys, a vertical tiering of storage layers with different cost-performance tradeoffs, and horizontal elasticity of each tier to add and remove nodes in response to load dynamics. Anna's policy engine uses these mechanisms to balance service-level objectives around cost, latency and fault tolerance. Experimental results explore the behavior of Anna's mechanisms and policy, exhibiting orders of magnitude efficiency improvements over both commodity cloud KVS services and research systems. △ Less

Submitted 31 August, 2018; originally announced September 2018.

arXiv:1806.01499 [pdf, other]

Facilitating Exploration with Interaction Snapshots under High Latency

Authors: Yifan Wu, Remco Chang, Joseph M. Hellerstein, Eugene Wu

Abstract: Latency is, unfortunately, a reality when working with large datasets. Guaranteeing imperceptible latency for interactivity is often prohibitively expensive: the application developer may be forced to migrate data processing engines or deal with complex error bounds on samples, and to limit the application to users with high network bandwidth. Instead of relying on the backend, we propose a simple… ▽ More Latency is, unfortunately, a reality when working with large datasets. Guaranteeing imperceptible latency for interactivity is often prohibitively expensive: the application developer may be forced to migrate data processing engines or deal with complex error bounds on samples, and to limit the application to users with high network bandwidth. Instead of relying on the backend, we propose a simple UX design---interaction snapshots. Responses of requests from the interactions are asynchronously loaded in "snapshots". With interaction snapshots, users can interact concurrently while the snapshots load. Our user study participants found it useful not to have to wait for each result and easily navigate to prior snapshots. For latency up to 5 seconds, participants were able to complete extrema, threshold, and trend identification tasks with little negative impact. △ Less

Submitted 5 September, 2020; v1 submitted 5 June, 2018; originally announced June 2018.

arXiv:1803.00701 [pdf, other]

CLX: Towards verifiable PBE data transformation

Authors: Zhongjun **, Michael Cafarella, H. V. Jagadish, Sean Kandel, Michael Minar, Joseph M. Hellerstein

Abstract: Effective data analytics on data collected from the real world usually begins with a notoriously expensive pre-processing step of data transformation and wrangling. Programming By Example (PBE) systems have been proposed to automatically infer transformations using simple examples that users provide as hints. However, an important usability issue - verification - limits the effective use of such P… ▽ More Effective data analytics on data collected from the real world usually begins with a notoriously expensive pre-processing step of data transformation and wrangling. Programming By Example (PBE) systems have been proposed to automatically infer transformations using simple examples that users provide as hints. However, an important usability issue - verification - limits the effective use of such PBE data transformation systems, since the verification process is often effort-consuming and unreliable. We propose a data transformation paradigm design CLX (pronounced "clicks") with a focus on facilitating verification for end users in a PBE-like data transformation. CLX performs pattern clustering in both input and output data, which allows the user to verify at the pattern level, rather than the data instance level, without having to write any regular expressions, thereby significantly reducing user verification effort. Thereafter, CLX automatically generates transformation programs as regular-expression replace operations that are easy for average users to verify. We experimentally compared the CLX prototype with both FlashFill, a state-of-the-art PBE data transformation tool, and Trifacta, an influential system supporting interactive data transformation. The results show improvements over the state of the art tools in saving user verification effort, without loss of efficiency or expressive power. In a user effort study on data sets of various sizes, when the data size grew by a factor of 30, the user verification time required by the CLX prototype grew by 1.3x whereas that required by FlashFill grew by 11.4x. In another test assessing the users' understanding of the transformation logic - a key ingredient in effective verification - CLX users achieved a success rate about twice that of FlashFill users. △ Less

Submitted 12 August, 2019; v1 submitted 1 March, 2018; originally announced March 2018.

Comments: 16 pages

arXiv:1712.05855 [pdf, other]

A Berkeley View of Systems Challenges for AI

Authors: Ion Stoica, Dawn Song, Raluca Ada Popa, David Patterson, Michael W. Mahoney, Randy Katz, Anthony D. Joseph, Michael Jordan, Joseph M. Hellerstein, Joseph E. Gonzalez, Ken Goldberg, Ali Ghodsi, David Culler, Pieter Abbeel

Abstract: With the increasing commoditization of computer vision, speech recognition and machine translation systems and the widespread deployment of learning-based back-end technologies such as digital advertising and intelligent infrastructures, AI (Artificial Intelligence) has moved from research labs to production. These changes have been made possible by unprecedented levels of data and computation, by… ▽ More With the increasing commoditization of computer vision, speech recognition and machine translation systems and the widespread deployment of learning-based back-end technologies such as digital advertising and intelligent infrastructures, AI (Artificial Intelligence) has moved from research labs to production. These changes have been made possible by unprecedented levels of data and computation, by methodological advances in machine learning, by innovations in systems software and architectures, and by the broad accessibility of these technologies. The next generation of AI systems promises to accelerate these developments and increasingly impact our lives via frequent interactions and making (often mission-critical) decisions on our behalf, often in highly personalized contexts. Realizing this promise, however, raises daunting challenges. In particular, we need AI systems that make timely and safe decisions in unpredictable environments, that are robust against sophisticated adversaries, and that can process ever increasing amounts of data across organizations and individuals without compromising confidentiality. These challenges will be exacerbated by the end of the Moore's Law, which will constrain the amount of data these technologies can store and process. In this paper, we propose several open research directions in systems, architectures, and security that can address these challenges and help unlock AI's potential to improve lives and society. △ Less

Submitted 15 December, 2017; originally announced December 2017.

Comments: Berkeley Technical Report

Report number: EECS-2017-159

arXiv:1605.05566 [pdf, ps, other]

Naughton's Wisconsin Bibliography: A Brief Guide

Authors: Joseph M. Hellerstein

Abstract: Over nearly three decades at the University of Wisconsin, Jeff Naughton has left an indelible mark on computer science. He has been a global leader of the database research field, deepening its core and pushing its boundaries. Many of Naughton's ideas were translated directly into practice in commercial and open-source systems. But software comes and goes. In the end, it is the ideas themselves th… ▽ More Over nearly three decades at the University of Wisconsin, Jeff Naughton has left an indelible mark on computer science. He has been a global leader of the database research field, deepening its core and pushing its boundaries. Many of Naughton's ideas were translated directly into practice in commercial and open-source systems. But software comes and goes. In the end, it is the ideas themselves that have had impact, ideas written down in papers. Naughton has been a prolific scholar over the last thirty years, with over 175 publications in his bibliography, covering a wide range of topics. This document does not attempt to enumerate or even summarize the wealth of ideas that Naughton has published over the course of his academic career--the task is too daunting. Instead, the best this short note aims to do is to serve as a rough map of the territory: something to help other researchers navigate the wide spaces of Naughton's work. △ Less

Submitted 17 May, 2016; originally announced May 2016.

Comments: Presented at the Wisconsin Database Group 40 Year Event, on the occasion of Jeff Naughton's retirement from the University of Wisconsin

arXiv:1510.07092 [pdf, other]

Asynchronous Complex Analytics in a Distributed Dataflow Architecture

Authors: Joseph E. Gonzalez, Peter Bailis, Michael I. Jordan, Michael J. Franklin, Joseph M. Hellerstein, Ali Ghodsi, Ion Stoica

Abstract: Scalable distributed dataflow systems have recently experienced widespread adoption, with commodity dataflow engines such as Hadoop and Spark, and even commodity SQL engines routinely supporting increasingly sophisticated analytics tasks (e.g., support vector machines, logistic regression, collaborative filtering). However, these systems' synchronous (often Bulk Synchronous Parallel) dataflow exec… ▽ More Scalable distributed dataflow systems have recently experienced widespread adoption, with commodity dataflow engines such as Hadoop and Spark, and even commodity SQL engines routinely supporting increasingly sophisticated analytics tasks (e.g., support vector machines, logistic regression, collaborative filtering). However, these systems' synchronous (often Bulk Synchronous Parallel) dataflow execution model is at odds with an increasingly important trend in the machine learning community: the use of asynchrony via shared, mutable state (i.e., data races) in convex programming tasks, which has---in a single-node context---delivered noteworthy empirical performance gains and inspired new research into asynchronous algorithms. In this work, we attempt to bridge this gap by evaluating the use of lightweight, asynchronous state transfer within a commodity dataflow engine. Specifically, we investigate the use of asynchronous sideways information passing (ASIP) that presents single-stage parallel iterators with a Volcano-like intra-operator iterator that can be used for asynchronous information passing. We port two synchronous convex programming algorithms, stochastic gradient descent and the alternating direction method of multipliers (ADMM), to use ASIPs. We evaluate an implementation of ASIPs within on Apache Spark that exhibits considerable speedups as well as a rich set of performance trade-offs in the use of these asynchronous algorithms. △ Less

Submitted 23 October, 2015; originally announced October 2015.

arXiv:1507.05539 [pdf, ps, other]

doi 10.1017/S1471068415000381

Putting Logic-Based Distributed Systems on Stable Grounds

Authors: Tom J. Ameloot, Jan Van den Bussche, William R. Marczak, Peter Alvaro, Joseph M. Hellerstein

Abstract: In the Declarative Networking paradigm, Datalog-like languages are used to express distributed computations. Whereas recently formal operational semantics for these languages have been developed, a corresponding declarative semantics has been lacking so far. The challenge is to capture precisely the amount of nondeterminism that is inherent to distributed computations due to concurrency, networkin… ▽ More In the Declarative Networking paradigm, Datalog-like languages are used to express distributed computations. Whereas recently formal operational semantics for these languages have been developed, a corresponding declarative semantics has been lacking so far. The challenge is to capture precisely the amount of nondeterminism that is inherent to distributed computations due to concurrency, networking delays, and asynchronous communication. This paper shows how a declarative, model-based semantics can be obtained by simply using the well-known stable model semantics for Datalog with negation. We show that the model-based semantics matches previously proposed formal operational semantics. △ Less

Submitted 25 July, 2015; v1 submitted 20 July, 2015; originally announced July 2015.

Comments: To appear in Theory and Practice of Logic Programming (TPLP)

Journal ref: Theory and Practice of Logic Programming 16 (2016) 378-417

arXiv:1402.2237 [pdf, other]

Coordination Avoidance in Database Systems (Extended Version)

Authors: Peter Bailis, Alan Fekete, Michael J. Franklin, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica

Abstract: Minimizing coordination, or blocking communication between concurrently executing operations, is key to maximizing scalability, availability, and high performance in database systems. However, uninhibited coordination-free execution can compromise application correctness, or consistency. When is coordination necessary for correctness? The classic use of serializable transactions is sufficient to m… ▽ More Minimizing coordination, or blocking communication between concurrently executing operations, is key to maximizing scalability, availability, and high performance in database systems. However, uninhibited coordination-free execution can compromise application correctness, or consistency. When is coordination necessary for correctness? The classic use of serializable transactions is sufficient to maintain correctness but is not necessary for all applications, sacrificing potential scalability. In this paper, we develop a formal framework, invariant confluence, that determines whether an application requires coordination for correct execution. By operating on application-level invariants over database states (e.g., integrity constraints), invariant confluence analysis provides a necessary and sufficient condition for safe, coordination-free execution. When programmers specify their application invariants, this analysis allows databases to coordinate only when anomalies that might violate invariants are possible. We analyze the invariant confluence of common invariants and operations from real-world database systems (i.e., integrity constraints) and applications and show that many are invariant confluent and therefore achievable without coordination. We apply these results to a proof-of-concept coordination-avoiding database prototype and demonstrate sizable performance gains compared to serializable execution, notably a 25-fold improvement over prior TPC-C New-Order performance on a 200 server cluster. △ Less

Submitted 30 October, 2014; v1 submitted 10 February, 2014; originally announced February 2014.

Comments: Extended version of paper appearing in PVLDB Vol. 8, No. 3

arXiv:1309.3324 [pdf, other]

Blazes: Coordination Analysis for Distributed Programs

Authors: Peter Alvaro, Neil Conway, Joseph M. Hellerstein, David Maier

Abstract: Distributed consistency is perhaps the most discussed topic in distributed systems today. Coordination protocols can ensure consistency, but in practice they cause undesirable performance unless used judiciously. Scalable distributed architectures avoid coordination whenever possible, but under-coordinated systems can exhibit behavioral anomalies under fault, which are often extremely difficult to… ▽ More Distributed consistency is perhaps the most discussed topic in distributed systems today. Coordination protocols can ensure consistency, but in practice they cause undesirable performance unless used judiciously. Scalable distributed architectures avoid coordination whenever possible, but under-coordinated systems can exhibit behavioral anomalies under fault, which are often extremely difficult to debug. This raises significant challenges for distributed system architects and developers. In this paper we present Blazes, a cross-platform program analysis framework that (a) identifies program locations that require coordination to ensure consistent executions, and (b) automatically synthesizes application-specific coordination code that can significantly outperform general-purpose techniques. We present two case studies, one using annotated programs in the Twitter Storm system, and another using the Bloom declarative language. △ Less

Submitted 28 November, 2013; v1 submitted 12 September, 2013; originally announced September 2013.

Comments: Updated to include additional materials from the original technical report: derivation rules, output stream labels

arXiv:1304.4303 [pdf, other]

Learning and Verifying Quantified Boolean Queries by Example

Authors: Azza Abouzied, Dana Angluin, Christos Papadimitriou, Joseph M. Hellerstein, Avi Silberschatz

Abstract: To help a user specify and verify quantified queries --- a class of database queries known to be very challenging for all but the most expert users --- one can question the user on whether certain data objects are answers or non-answers to her intended query. In this paper, we analyze the number of questions needed to learn or verify qhorn queries, a special class of Boolean quantified queries who… ▽ More To help a user specify and verify quantified queries --- a class of database queries known to be very challenging for all but the most expert users --- one can question the user on whether certain data objects are answers or non-answers to her intended query. In this paper, we analyze the number of questions needed to learn or verify qhorn queries, a special class of Boolean quantified queries whose underlying form is conjunctions of quantified Horn expressions. We provide optimal polynomial-question and polynomial-time learning and verification algorithms for two subclasses of the class qhorn with upper constant limits on a query's causal density. △ Less

Submitted 15 April, 2013; originally announced April 2013.

Comments: Extended Version of PODS 2013 paper

arXiv:1302.0309 [pdf, other]

Highly Available Transactions: Virtues and Limitations (Extended Version)

Authors: Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica

Abstract: To minimize network latency and remain online during server failures and network partitions, many modern distributed data storage systems eschew transactional functionality, which provides strong semantic guarantees for groups of multiple operations over multiple data items. In this work, we consider the problem of providing Highly Available Transactions (HATs): transactional guarantees that do no… ▽ More To minimize network latency and remain online during server failures and network partitions, many modern distributed data storage systems eschew transactional functionality, which provides strong semantic guarantees for groups of multiple operations over multiple data items. In this work, we consider the problem of providing Highly Available Transactions (HATs): transactional guarantees that do not suffer unavailability during system partitions or incur high network latency. We introduce a taxonomy of highly available systems and analyze existing ACID isolation and distributed data consistency guarantees to identify which can and cannot be achieved in HAT systems. This unifies the literature on weak transactional isolation, replica consistency, and highly available systems. We analytically and experimentally quantify the availability and performance benefits of HATs--often two to three orders of magnitude over wide-area networks--and discuss their necessary semantic compromises. △ Less

Submitted 6 October, 2013; v1 submitted 1 February, 2013; originally announced February 2013.

Comments: Extended version of "Highly Available Transactions: Virtues and Limitations" to appear in VLDB 2014

arXiv:1204.6082 [pdf, other]

Probabilistically Bounded Staleness for Practical Partial Quorums

Authors: Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, Ion Stoica

Abstract: Data store replication results in a fundamental trade-off between operation latency and data consistency. In this paper, we examine this trade-off in the context of quorum-replicated data stores. Under partial, or non-strict quorum replication, a data store waits for responses from a subset of replicas before answering a query, without guaranteeing that read and write replica sets intersect. As de… ▽ More Data store replication results in a fundamental trade-off between operation latency and data consistency. In this paper, we examine this trade-off in the context of quorum-replicated data stores. Under partial, or non-strict quorum replication, a data store waits for responses from a subset of replicas before answering a query, without guaranteeing that read and write replica sets intersect. As deployed in practice, these configurations provide only basic eventual consistency guarantees, with no limit to the recency of data returned. However, anecdotally, partial quorums are often "good enough" for practitioners given their latency benefits. In this work, we explain why partial quorums are regularly acceptable in practice, analyzing both the staleness of data they return and the latency benefits they offer. We introduce Probabilistically Bounded Staleness (PBS) consistency, which provides expected bounds on staleness with respect to both versions and wall clock time. We derive a closed-form solution for versioned staleness as well as model real-time staleness for representative Dynamo-style systems under internet-scale production workloads. Using PBS, we measure the latency-consistency trade-off for partial quorum systems. We quantitatively demonstrate how eventually consistent systems frequently return consistent data within tens of milliseconds while offering significant latency benefits. △ Less

Submitted 26 April, 2012; originally announced April 2012.

Comments: VLDB2012

Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 8, pp. 776-787 (2012)

arXiv:1204.6078 [pdf, other]

Distributed GraphLab: A Framework for Machine Learning in the Cloud

Authors: Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, Joseph M. Hellerstein

Abstract: While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms and can lead to inefficient learning systems. To help fill this critical void, we introduced the GraphLab abstraction which naturally expresses asynchronous,… ▽ More While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms and can lead to inefficient learning systems. To help fill this critical void, we introduced the GraphLab abstraction which naturally expresses asynchronous, dynamic, graph-parallel computation while ensuring data consistency and achieving a high degree of parallel performance in the shared-memory setting. In this paper, we extend the GraphLab framework to the substantially more challenging distributed setting while preserving strong data consistency guarantees. We develop graph based extensions to pipelined locking and data versioning to reduce network congestion and mitigate the effect of network latency. We also introduce fault tolerance to the GraphLab abstraction using the classic Chandy-Lamport snapshot algorithm and demonstrate how it can be easily implemented by exploiting the GraphLab abstraction itself. Finally, we evaluate our distributed implementation of the GraphLab abstraction on a large Amazon EC2 deployment and show 1-2 orders of magnitude performance gains over Hadoop-based implementations. △ Less

Submitted 26 April, 2012; originally announced April 2012.

Comments: VLDB2012

Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 8, pp. 716-727 (2012)

arXiv:1007.3781 [pdf, other]

Multiresolution Cube Estimators for Sensor Network Aggregate Queries

Authors: Alexandra Meliou, Carlos Guestrin, Joseph M. Hellerstein

Abstract: In this work we present in-network techniques to improve the efficiency of spatial aggregate queries. Such queries are very common in a sensornet setting, demanding more targeted techniques for their handling. Our approach constructs and maintains multi-resolution cube hierarchies inside the network, which can be constructed in a distributed fashion. In case of failures, recovery can also be perfo… ▽ More In this work we present in-network techniques to improve the efficiency of spatial aggregate queries. Such queries are very common in a sensornet setting, demanding more targeted techniques for their handling. Our approach constructs and maintains multi-resolution cube hierarchies inside the network, which can be constructed in a distributed fashion. In case of failures, recovery can also be performed with in-network decisions. In this paper we demonstrate how in-network cube hierarchies can be used to summarize sensor data, and how they can be exploited to improve the efficiency of spatial aggregate queries. We show that query plans over our cube summaries can be computed in polynomial time, and we present a PTIME algorithm that selects the minimum number of data requests that can compute the answer to a spatial query. We further extend our algorithm to handle optimization over multiple queries, which can also be done in polynomial time. We discuss enriching cube hierarchies with extra summary information, and present an algorithm for distributed cube construction. Finally we investigate node and area failures, and algorithms to recover query results. △ Less

Submitted 21 July, 2010; originally announced July 2010.

Comments: 14 pages, 8 figures, IV Alberto Mendelzon Workshop on Foundations of Data Management

Journal ref: IV Alberto Mendelzon Workshop on Foundations of Data Management, 2010

arXiv:1006.4990 [pdf, other]

GraphLab: A New Framework for Parallel Machine Learning

Authors: Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, Joseph M. Hellerstein

Abstract: Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions… ▽ More Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso and Compressed Sensing. We show that using GraphLab we can achieve excellent parallel performance on large scale real-world problems. △ Less

Submitted 25 June, 2010; originally announced June 2010.

Journal ref: The 26th Conference on Uncertainty in Artificial Intelligence (UAI 2010), Catalina Island, California, July 8-11, 2010

Showing 1–44 of 44 results for author: Hellerstein, J M