-
Analytically-Driven Resource Management for Cloud-Native Microservices
Authors:
Yanqi Zhang,
Zhuangzhuang Zhou,
Sameh Elnikety,
Christina Delimitrou
Abstract:
Resource management for cloud-native microservices has attracted a lot of recent attention. Previous work has shown that machine learning (ML)-driven approaches outperform traditional techniques, such as autoscaling, in terms of both SLA maintenance and resource efficiency. However, ML-driven approaches also face challenges including lengthy data collection processes and limited scalability. We present Ursa, a lightweight resource management system for cloud-native microservices that addresses these challenges. Ursa uses an analytical model that decomposes the end-to-end SLA into per-service SLA, and maps per-service SLA to individual resource allocations per microservice tier. To speed up the exploration process and avoid prolonged SLA violations, Ursa explores each microservice individually, and swiftly stops exploration if latency exceeds its SLA.
We evaluate Ursa on a set of representative and end-to-end microservice topologies, including a social network, media service and video processing pipeline, each consisting of multiple classes and priorities of requests with different SLAs, and compare it against two representative ML-driven systems, Sinan and Firm. Compared to these ML-driven approaches, Ursa provides significant advantages: it shortens the data collection process by more than 128x, and its control plane is 43x faster than ML-driven approaches. At the same time, Ursa does not sacrifice resource efficiency or SLAs. During online deployment, Ursa reduces the SLA violation rate by 9.0% to 49.9%, and reduces CPU allocation by up to 86.2% compared to ML-driven approaches.
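The per-service exploration with early stopping described in this abstract can be illustrated with a minimal sketch. This is not Ursa's implementation: the function `explore_service`, the candidate CPU list, and the toy latency model are all hypothetical, chosen only to show the idea of probing one service at a time and stopping as soon as its latency exceeds its per-service SLA.

```python
from typing import Callable, List, Tuple


def explore_service(
    measure_latency_ms: Callable[[float], float],
    sla_ms: float,
    cpu_candidates: List[float],
) -> Tuple[float, List[Tuple[float, float]]]:
    """Probe one service from the largest to the smallest CPU allocation.

    Exploration stops at the first allocation whose latency exceeds the
    per-service SLA, so the service is not held in violation for long.
    Returns the smallest allocation that met the SLA and all samples taken.
    (Hypothetical helper, not an API from the paper.)
    """
    samples: List[Tuple[float, float]] = []
    best = None
    for cpu in sorted(cpu_candidates, reverse=True):
        latency = measure_latency_ms(cpu)
        samples.append((cpu, latency))
        if latency > sla_ms:
            break  # early stop: do not probe smaller allocations
        best = cpu
    if best is None:
        raise ValueError("no candidate allocation meets the SLA")
    return best, samples


def toy_latency_ms(cpu: float) -> float:
    """Assumed toy model: latency is inversely proportional to CPU."""
    return 40.0 / cpu


best_cpu, probes = explore_service(toy_latency_ms, 50.0, [0.5, 1.0, 2.0, 4.0])
print(best_cpu, probes)
```

In a real system the latency callback would run a load test against the live service; the toy model only keeps the sketch self-contained.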
Submitted 5 January, 2024;
originally announced January 2024.
-
Towards Fast, Adaptive, and Hardware-Assisted User-Space Scheduling
Authors:
Lisa,
Li,
Nikita Lazarev,
David Koufaty,
Yijun Yin,
Andy Anderson,
Zhiru Zhang,
Edward Suh,
Kostis Kaffes,
Christina Delimitrou
Abstract:
Modern datacenter applications are prone to high tail latencies since their requests typically follow highly-dispersive distributions. Delivering fast interrupts is essential to reducing tail latency. Prior work has proposed both OS- and system-level solutions to reduce tail latencies for microsecond-scale workloads through better scheduling. Unfortunately, existing approaches, like customized dataplane OSes, require significant OS changes, experience scalability limitations, or do not reach the full performance capabilities hardware offers.
The emergence of new hardware features like UINTR has exposed new opportunities to rethink the design paradigms and abstractions of traditional scheduling systems. We propose LibPreemptible, a preemptive user-level threading library that is flexible, lightweight, and adaptive. LibPreemptible is built with a set of optimizations: LibUtimer for scalability, a deadline-oriented API for flexible policies, and a time-quantum controller for adaptiveness. Compared to the prior state-of-the-art scheduling system Shinjuku, our system achieves significant tail latency and throughput improvements for various workloads without modifying the kernel. We also demonstrate the flexibility of LibPreemptible across scheduling policies for real applications experiencing varying load levels and characteristics.
Submitted 11 November, 2023; v1 submitted 5 August, 2023;
originally announced August 2023.
-
Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks
Authors:
Mingyu Liang,
Wenyin Fu,
Louis Feng,
Zhongyi Lin,
Pavani Panakanti,
Shengbao Zheng,
Srinivas Sridharan,
Christina Delimitrou
Abstract:
Building large AI fleets to support the rapidly growing DL workloads is an active research topic for modern cloud providers. Generating accurate benchmarks plays an essential role in designing the fast-paced software and hardware solutions in this space. Two fundamental challenges to make this scalable are (i) workload representativeness and (ii) the ability to quickly incorporate changes to the fleet into the benchmarks.
To overcome these issues, we propose Mystique, an accurate and scalable framework for production AI benchmark generation. It leverages the PyTorch execution trace (ET), a new feature that captures the runtime information of AI models at the granularity of operators, in a graph format, together with their metadata. By sourcing fleet ETs, we can build AI benchmarks that are portable and representative. Mystique is scalable, due to its lightweight data collection, in terms of runtime overhead and instrumentation effort. It is also adaptive because ET composability allows flexible control on benchmark creation.
We evaluate our methodology on several production AI models, and show that benchmarks generated with Mystique closely resemble original AI models, both in execution time and system-level metrics. We also showcase the portability of the generated benchmarks across platforms, and demonstrate several use cases enabled by the fine-grained composability of the execution trace.
Submitted 11 April, 2023; v1 submitted 16 December, 2022;
originally announced January 2023.
-
QoS-Aware Resource Management for Multi-phase Serverless Workflows with Aquatope
Authors:
Zhuangzhuang Zhou,
Yanqi Zhang,
Christina Delimitrou
Abstract:
Multi-stage serverless applications, i.e., workflows with many computation and I/O stages, are becoming increasingly representative of FaaS platforms. Despite their advantages in terms of fine-grained scalability and modular development, these applications are subject to suboptimal performance, resource inefficiency, and high costs to a larger degree than previous simple serverless functions.
We present Aquatope, a QoS-and-uncertainty-aware resource scheduler for end-to-end serverless workflows that takes into account the inherent uncertainty present in FaaS platforms, and improves performance predictability and resource efficiency. Aquatope uses a set of scalable and validated Bayesian models to create pre-warmed containers ahead of function invocations, and to allocate appropriate resources at function granularity to meet a complex workflow's end-to-end QoS, while minimizing resource cost. Across a diverse set of analytics and interactive multi-stage serverless workloads, Aquatope significantly outperforms prior systems, reducing QoS violations by 5x, and cost by 34% on average and up to 52% compared to other QoS-meeting methods.
Submitted 28 December, 2022;
originally announced December 2022.
-
End-to-End Application Cloning for Distributed Cloud Microservices with Ditto
Authors:
Mingyu Liang,
Yu Gan,
Yueying Li,
Carlos Torres,
Abhishek Danotia,
Mahesh Ketkar,
Christina Delimitrou
Abstract:
We present Ditto, an automated framework for cloning end-to-end cloud applications, both monolithic and microservices, which captures I/O and network activity, as well as kernel operations, in addition to application logic. Ditto takes a hierarchical approach to application cloning, starting with capturing the dependency graph across distributed services, to recreating each tier's control/data flow, and finally generating system calls and assembly that mimic the individual applications. Ditto does not reveal the logic of the original application, facilitating publicly sharing clones of production services with hardware vendors, cloud providers, and the research community.
We show that across a diverse set of single- and multi-tier applications, Ditto accurately captures their CPU and memory characteristics as well as their high-level performance metrics, is portable across platforms, and facilitates a wide range of system studies.
Submitted 28 December, 2022;
originally announced December 2022.
-
A Hardware-Software Stack for Serverless Edge Swarms
Authors:
Liam Patterson,
David Pigorovsky,
Brian Dempsey,
Nikita Lazarev,
Aditya Shah,
Clara Steinhoff,
Ariana Bruno,
Justin Hu,
Christina Delimitrou
Abstract:
Swarms of autonomous devices are increasing in ubiquity and size, making the need for rethinking their hardware-software system stack critical.
We present HiveMind, the first swarm coordination platform that enables programmable execution of complex task workflows between cloud and edge resources in a performant and scalable manner. HiveMind is a software-hardware platform that includes a domain-specific language to simplify programmability of cloud-edge applications, a program synthesis tool to automatically explore task placement strategies, a centralized controller that leverages serverless computing to elastically scale cloud resources, and a reconfigurable hardware acceleration fabric for network and remote memory accesses.
We design and build the full end-to-end HiveMind system on two real edge swarms comprised of drones and robotic cars. We quantify the opportunities and challenges serverless introduces to edge applications, as well as the trade-offs between centralized and distributed coordination. We show that HiveMind achieves significantly better performance predictability and battery efficiency compared to existing centralized and decentralized platforms, while also incurring lower network traffic. Using both real systems and a validated simulator we show that HiveMind can scale to thousands of edge devices without sacrificing performance or efficiency, demonstrating that centralized platforms can be both scalable and performant.
Submitted 29 December, 2021;
originally announced December 2021.
-
Sage: Leveraging ML to Diagnose Unpredictable Performance in Cloud Microservices
Authors:
Yu Gan,
Mingyu Liang,
Sundar Dev,
David Lo,
Christina Delimitrou
Abstract:
Cloud applications are increasingly shifting from large monolithic services, to complex graphs of loosely-coupled microservices. Despite their advantages, microservices also introduce cascading QoS violations in cloud applications, which are difficult to diagnose and correct.
We present Sage, an ML-driven root cause analysis system for interactive cloud microservices. Sage leverages unsupervised learning models to circumvent the overhead of trace labeling, determines the root cause of unpredictable performance online, and applies corrective actions to restore performance. In experiments on both dedicated local clusters and large GCE clusters, we show that Sage achieves high root cause detection accuracy and predictable performance.
Submitted 12 December, 2021;
originally announced December 2021.
-
Sinan: Data Driven Resource Management for Cloud Microservices
Authors:
Yanqi Zhang,
Weizhe Hua,
Zhuangzhuang Zhou,
Ed Suh,
Christina Delimitrou
Abstract:
Cloud applications are increasingly shifting to interactive and loosely-coupled microservices. Despite their advantages, microservices complicate resource management, due to inter-tier dependencies.
We present Sinan, a cluster manager for interactive microservices that leverages easily-obtainable tracing data instead of empirical decisions, to infer the impact of a resource allocation on end-to-end performance, and allocate appropriate resources to each tier. In a preliminary evaluation of Sinan with an end-to-end social network built with microservices, we show that Sinan's data-driven approach allows the service to always meet its QoS without sacrificing resource efficiency.
Submitted 12 December, 2021;
originally announced December 2021.
-
Dagger: Accelerating RPCs in Cloud Microservices Through Tightly-Coupled Reconfigurable NICs
Authors:
Nikita Lazarev,
Shaojie Xiang,
Neil Adit,
Zhiru Zhang,
Christina Delimitrou
Abstract:
The ongoing shift of cloud services from monolithic designs to microservices creates high demand for efficient and high performance datacenter networking stacks, optimized for fine-grained workloads. Commodity networking systems based on software stacks and peripheral NICs introduce high overheads when it comes to delivering small messages.
We present Dagger, a hardware acceleration fabric for cloud RPCs based on FPGAs, where the accelerator is closely-coupled with the host processor over a configurable memory interconnect. The three key design principles of Dagger are: (1) offloading the entire RPC stack to an FPGA-based NIC, (2) leveraging memory interconnects instead of PCIe buses as the interface with the host CPU, and (3) making the acceleration fabric reconfigurable, so it can accommodate the diverse needs of microservices. We show that the combination of these principles significantly improves the efficiency and performance of cloud RPC systems while preserving their generality. Dagger achieves 1.3-3.8x higher per-core RPC throughput compared to both highly-optimized software stacks and systems using specialized RDMA adapters. It also scales up to 84 Mrps with 8 threads on 4 CPU cores, while maintaining state-of-the-art µs-scale tail latency. We also demonstrate that large third-party applications, like memcached and MICA KVS, can be easily ported to Dagger with minimal changes to their codebase, bringing their median and tail KVS access latency down to 2.8-3.5 µs and 5.4-7.8 µs, respectively. Finally, we show that Dagger is beneficial for multi-tier end-to-end microservices with different threading models by evaluating it using an 8-tier application implementing a flight check-in service.
Submitted 2 June, 2021;
originally announced June 2021.
-
Sinan: Data-Driven, QoS-Aware Cluster Management for Microservices
Authors:
Yanqi Zhang,
Weizhe Hua,
Zhuangzhuang Zhou,
Edward Suh,
Christina Delimitrou
Abstract:
Cloud applications are increasingly shifting from large monolithic services, to large numbers of loosely-coupled, specialized microservices. Despite their advantages in terms of facilitating development, deployment, modularity, and isolation, microservices complicate resource management, as dependencies between them introduce backpressure effects and cascading QoS violations.
We present Sinan, a data-driven cluster manager for interactive cloud microservices that is online and QoS-aware. Sinan leverages a set of scalable and validated machine learning models to determine the performance impact of dependencies between microservices, and allocate appropriate resources per tier in a way that preserves the end-to-end tail latency target. We evaluate Sinan both on dedicated local clusters and large-scale deployments on Google Compute Engine (GCE) across representative end-to-end applications built with microservices, such as social networks and hotel reservation sites. We show that Sinan always meets QoS, while also keeping cluster utilization high, in contrast to prior work which leads to unpredictable performance or sacrifices resource efficiency. Furthermore, the techniques in Sinan are explainable, meaning that cloud operators can gain insights from the ML models on how to better deploy and design their applications to reduce unpredictable performance.
Submitted 27 May, 2021;
originally announced May 2021.
-
Sage: Using Unsupervised Learning for Scalable Performance Debugging in Microservices
Authors:
Yu Gan,
Mingyu Liang,
Sundar Dev,
David Lo,
Christina Delimitrou
Abstract:
Cloud applications are increasingly shifting from large monolithic services to complex graphs of loosely-coupled microservices. Despite the advantages of modularity and elasticity microservices offer, they also complicate cluster management and performance debugging, as dependencies between tiers introduce backpressure and cascading QoS violations.
We present Sage, a machine learning-driven root cause analysis system for interactive cloud microservices. Sage leverages unsupervised ML models to circumvent the overhead of trace labeling, captures the impact of dependencies between microservices to determine the root cause of unpredictable performance online, and applies corrective actions to recover a cloud service's QoS. In experiments on both dedicated local clusters and large clusters on Google Compute Engine we show that Sage consistently achieves over 93% accuracy in correctly identifying the root cause of QoS violations, and improves performance predictability.
Submitted 1 January, 2021;
originally announced January 2021.
-
CuttleSys: Data-Driven Resource Management for Interactive Applications on Reconfigurable Multicores
Authors:
Neeraj Kulkarni,
Gonzalo Gonzalez-Pumariega,
Amulya Khurana,
Christine Shoemaker,
Christina Delimitrou,
David Albonesi
Abstract:
Multi-tenancy for latency-critical applications leads to resource interference and unpredictable performance. Core reconfiguration opens up more opportunities for colocation, as it allows the hardware to adjust to the dynamic performance and power needs of a specific mix of co-scheduled applications. However, reconfigurability also introduces challenges, as even for a small number of reconfigurable cores, exploring the design space becomes more time- and resource-demanding.
We present CuttleSys, a runtime for reconfigurable multi-cores that leverages scalable and lightweight data mining to quickly identify suitable core and cache configurations for a set of co-scheduled applications. The runtime combines collaborative filtering to infer the behavior of each job on every core and cache configuration, with Dynamically Dimensioned Search to efficiently explore the configuration space. We evaluate CuttleSys on multicores with tens of reconfigurable cores and show up to 2.46x and 1.55x performance improvements compared to core-level gating and oracle-like asymmetric multicores respectively, under stringent power constraints.
Submitted 1 August, 2020;
originally announced August 2020.
-
Dagger: Towards Efficient RPCs in Cloud Microservices with Near-Memory Reconfigurable NICs
Authors:
Nikita Lazarev,
Neil Adit,
Shaojie Xiang,
Zhiru Zhang,
Christina Delimitrou
Abstract:
Cloud applications are increasingly relying on hundreds of loosely-coupled microservices to complete user requests that meet an application's end-to-end QoS requirements. Communication time between services accounts for a large fraction of the end-to-end latency and can introduce performance unpredictability and QoS violations. We present our early work on Dagger, a hardware acceleration platform for networking, designed specifically with the unique qualities of microservices in mind. The Dagger architecture relies on an FPGA-based NIC, closely coupled with the processor over a configurable memory interconnect, designed to offload and accelerate RPC stacks. Unlike traditional cloud systems that use PCIe links as the NIC I/O interface, we leverage memory-interconnected FPGAs as networking devices to provide the efficiency, transparency, and programmability needed for fine-grained microservices. We show that this considerably improves CPU utilization and performance for cloud RPCs.
Submitted 11 September, 2020; v1 submitted 16 July, 2020;
originally announced July 2020.
-
HiveMind: A Scalable and Serverless Coordination Control Platform for UAV Swarms
Authors:
Justin Hu,
Ariana Bruno,
Brian Ritchken,
Brendon Jackson,
Mateo Espinosa,
Aditya Shah,
Christina Delimitrou
Abstract:
Swarms of autonomous devices are increasing in ubiquity and size. There are two main schools of thought for controlling devices in such swarms: centralized and distributed control. Centralized platforms achieve higher output quality but result in high network traffic and limited scalability, while decentralized systems are more scalable, but less sophisticated.
In this work we present HiveMind, a centralized coordination control platform for IoT swarms that is both scalable and performant. HiveMind leverages a centralized cluster for all resource-intensive computation, deferring lightweight and time-critical operations, such as obstacle avoidance, to the edge devices to reduce network traffic. HiveMind employs an event-driven serverless framework to run tasks on the cluster, guarantees fault tolerance both in the edge devices and serverless functions, and handles straggler tasks and underperforming devices. We evaluate HiveMind on a swarm of 16 programmable drones on two scenarios: searching for given items, and counting unique people in an area. We show that HiveMind achieves better performance and battery efficiency compared to fully centralized and fully decentralized platforms, while also handling load imbalances and failures gracefully, and allowing edge devices to leverage the cluster to collectively improve their output quality.
Submitted 4 February, 2020;
originally announced February 2020.
-
Ripple: A Practical Declarative Programming Framework for Serverless Compute
Authors:
Shannon Joyner,
Michael MacCoss,
Christina Delimitrou,
Hakim Weatherspoon
Abstract:
Serverless computing has emerged as a promising alternative to infrastructure- (IaaS) and platform-as-a-service (PaaS) cloud platforms for applications with ample parallelism and intermittent activity. Serverless promises greater resource elasticity, significant cost savings, and simplified application deployment. All major cloud providers, including Amazon, Google, and Microsoft, have introduced serverless to their public cloud offerings. For serverless to reach its potential, there is a pressing need for programming frameworks that abstract the deployment complexity away from the user. This includes simplifying the process of writing applications for serverless environments, automating task and data partitioning, and handling scheduling and fault tolerance.
We present Ripple, a programming framework designed to specifically take applications written for single-machine execution and allow them to take advantage of the task parallelism of serverless. Ripple exposes a simple interface that users can leverage to express the high-level dataflow of a wide spectrum of applications, including machine learning (ML) analytics, genomics, and proteomics. Ripple also automates resource provisioning, meeting user-defined QoS targets, and handles fault tolerance by eagerly detecting straggler tasks. We port Ripple over AWS Lambda and show that, across a set of diverse applications, it provides an expressive and generalizable programming framework that simplifies running data-parallel applications on serverless, and can improve performance by up to 80x compared to IaaS/PaaS clouds for similar costs.
Submitted 1 January, 2020;
originally announced January 2020.
-
uqSim: Scalable and Validated Simulation of Cloud Microservices
Authors:
Yanqi Zhang,
Yu Gan,
Christina Delimitrou
Abstract:
Current cloud services are moving away from monolithic designs and towards graphs of many loosely-coupled, single-concerned microservices. Microservices have several advantages, including speeding up development and deployment, allowing specialization of the software infrastructure, and helping with debugging and error isolation. At the same time they introduce several hardware and software challenges. Given that most of the performance and efficiency implications of microservices happen at scales larger than what is available outside production deployments, studying such effects requires designing the right simulation infrastructures.
We present uqSim, a scalable and validated queueing network simulator designed specifically for interactive microservices. uqSim provides detailed intra- and inter-microservice models that allow it to faithfully reproduce the behavior of complex, many-tier applications. uqSim is also modular, allowing reuse of individual models across microservices and end-to-end applications. We have validated uqSim both against simple and more complex microservices graphs, and have shown that it accurately captures performance in terms of throughput and tail latency. Finally, we use uqSim to model the tail at scale effects of request fanout, and the performance impact of power management in latency-sensitive microservices.
Submitted 5 November, 2019;
originally announced November 2019.
-
An Open-Source Benchmark Suite for Cloud and IoT Microservices
Authors:
Yu Gan,
Yanqi Zhang,
Dailun Cheng,
Ankitha Shetty,
Priyal Rathi,
Nayan Katarki,
Ariana Bruno,
Justin Hu,
Brian Ritchken,
Brendon Jackson,
Kelvin Hu,
Meghna Pancholi,
Yuan He,
Brett Clancy,
Chris Colen,
Fukang Wen,
Catherine Leung,
Siyuan Wang,
Leon Zaruvinsky,
Mateo Espinosa,
Rick Lin,
Zhongling Liu,
Jake Padilla,
Christina Delimitrou
Abstract:
Cloud services have recently started undergoing a major shift from monolithic applications, to graphs of hundreds of loosely-coupled microservices. Microservices fundamentally change a lot of assumptions current cloud systems are designed with, and present both opportunities and challenges when optimizing for quality of service (QoS) and utilization. In this paper we explore the implications microservices have across the cloud system stack. We first present DeathStarBench, a novel, open-source benchmark suite built with microservices that is representative of large end-to-end services, modular and extensible. DeathStarBench includes a social network, a media service, an e-commerce site, a banking system, and IoT applications for coordination control of UAV swarms. We then use DeathStarBench to study the architectural characteristics of microservices, their implications in networking and operating systems, their challenges with respect to cluster management, and their trade-offs in terms of application design and programming frameworks. Finally, we explore the tail at scale effects of microservices in real deployments with hundreds of users, and highlight the increased pressure they put on performance predictability.
Submitted 27 May, 2019;
originally announced May 2019.
-
Leveraging Deep Learning to Improve the Performance Predictability of Cloud Microservices
Authors:
Yu Gan,
Yanqi Zhang,
Kelvin Hu,
Dailun Cheng,
Yuan He,
Meghna Pancholi,
Christina Delimitrou
Abstract:
Performance unpredictability is a major roadblock towards cloud adoption, and has performance, cost, and revenue ramifications. Predictable performance is even more critical as cloud services transition from monolithic designs to microservices. Detecting QoS violations after they occur in systems with microservices results in long recovery times, as hotspots propagate and amplify across dependent services. We present Seer, an online cloud performance debugging system that leverages deep learning and the massive amount of tracing data cloud systems collect to learn spatial and temporal patterns that translate to QoS violations. Seer combines lightweight distributed RPC-level tracing, with detailed low-level hardware monitoring to signal an upcoming QoS violation, and diagnose the source of unpredictable performance. Once an imminent QoS violation is detected, Seer notifies the cluster manager to take action to avoid performance degradation altogether. We evaluate Seer both in local clusters, and in large-scale deployments of end-to-end applications built with microservices with hundreds of users. We show that Seer correctly anticipates QoS violations 91% of the time, and avoids the QoS violation to begin with in 84% of cases. Finally, we show that Seer can identify application-level design bugs, and provide insights on how to better architect microservices to achieve predictable performance.
Submitted 2 May, 2019;
originally announced May 2019.
-
The Architectural Implications of Microservices in the Cloud
Authors:
Yu Gan,
Christina Delimitrou
Abstract:
Cloud services have recently undergone a shift from monolithic applications to microservices, with hundreds or thousands of loosely-coupled microservices comprising the end-to-end application. Microservices present both opportunities and challenges when optimizing for quality of service (QoS) and cloud utilization. In this paper we explore the implications cloud microservices have on system bottlenecks, and datacenter server design. We first present and characterize an end-to-end application built using tens of popular open-source microservices that implements a movie renting and streaming service, and is modular and extensible. We then use the end-to-end service to study the scalability and performance bottlenecks of microservices, and highlight implications they have on the design of datacenter hardware. Specifically, we revisit the long-standing debate of brawny versus wimpy cores in the context of microservices, we quantify the I-cache pressure they introduce, and measure the time spent in computation versus communication between microservices over RPCs. As more cloud applications switch to this new programming model, it is increasingly important to revisit the assumptions we have previously used to build and manage cloud systems.
Submitted 25 May, 2018;
originally announced May 2018.
-
To Centralize or Not to Centralize: A Tale of Swarm Coordination
Authors:
Justin Hu,
Ariana Bruno,
Drew Zagieboylo,
Mark Zhao,
Brian Ritchken,
Brendon Jackson,
Joo Yeon Chae,
Francois Mertil,
Mateo Espinosa,
Christina Delimitrou
Abstract:
Large swarms of autonomous devices are increasing in size and importance. When it comes to controlling the devices of large-scale swarms, there are two main lines of thought: centralized control, where all decisions - and often compute - happen in a centralized back-end cloud system, and distributed control, where edge devices are responsible for selecting and executing tasks with minimal or zero help from a centralized entity. In this work we aim to quantify the trade-offs between the two approaches with respect to task assignment quality, latency, and reliability. We do so first on a local swarm of 12 programmable drones with a 10-server cluster as the backend cloud, and then using a validated simulator to study the tail at scale effects of swarm coordination control. We conclude that although centralized control almost always outperforms distributed in the quality of its decisions, it faces significant scalability limitations, and we provide a list of system challenges that need to be addressed for centralized control to scale.
Submitted 4 May, 2018;
originally announced May 2018.
-
Seer: Leveraging Big Data to Navigate the Increasing Complexity of Cloud Debugging
Authors:
Yu Gan,
Meghna Pancholi,
Dailun Cheng,
Siyuan Hu,
Yuan He,
Christina Delimitrou
Abstract:
Performance unpredictability in cloud services leads to poor user experience, degraded availability, and has revenue ramifications. Detecting performance degradation a posteriori helps the system take corrective action, but does not avoid the QoS violations. Detecting QoS violations after the fact is even more detrimental when a service consists of hundreds of thousands of loosely-coupled microservices, since performance hiccups can quickly propagate across the dependency graph of microservices. In this work we focus on anticipating QoS violations in cloud settings to mitigate performance unpredictability to begin with. We propose Seer, a cloud runtime that leverages the massive amount of tracing data cloud systems collect over time and a set of practical learning techniques to signal upcoming QoS violations, as well as identify the microservice(s) causing them. Once an imminent QoS violation is detected Seer uses machine-level hardware events to determine the cause of the QoS violation, and adjusts the resource allocations to prevent it. In local clusters with 10 40-core servers and 200-instance clusters on GCE running diverse cloud microservices, we show that Seer correctly anticipates QoS violations 91% of the time, and attributes the violation to the correct microservice in 89% of cases. Finally, Seer detects QoS violations early enough for a corrective action to almost always be applied successfully.
Submitted 24 April, 2018;
originally announced April 2018.
-
Mage: Online Interference-Aware Scheduling in Multi-Scale Heterogeneous Systems
Authors:
Francisco Romero,
Christina Delimitrou
Abstract:
Heterogeneity has grown in popularity both at the core and server level as a way to improve both performance and energy efficiency. However, despite these benefits, scheduling applications in heterogeneous machines remains challenging. Additionally, when these heterogeneous resources accommodate multiple applications to increase utilization, resources are prone to contention, destructive interference, and unpredictable performance. Existing solutions examine heterogeneity either across or within a server, leading to missed performance and efficiency opportunities. We present Mage, a practical interference-aware runtime that optimizes performance and efficiency in systems with intra- and inter-server heterogeneity. Mage leverages fast and online data mining to quickly explore the space of application placements, and determine the one that minimizes destructive interference between co-resident applications. Mage continuously monitors the performance of active applications, and, upon detecting QoS violations, it determines whether alternative placements would prove more beneficial, taking into account any overheads from migration. Across 350 application mixes on a heterogeneous CMP, Mage improves performance by 38% and up to 2x compared to a greedy scheduler. Across 160 mixes on a heterogeneous cluster, Mage improves performance by 30% on average and up to 52% over the greedy scheduler, and by 11% over the combination of Paragon [15] for inter- and intra-server heterogeneity.
Submitted 17 April, 2018;
originally announced April 2018.
-
Pliant: Leveraging Approximation to Improve Datacenter Resource Efficiency
Authors:
Neeraj Kulkarni,
Feng Qi,
Christina Delimitrou
Abstract:
Cloud multi-tenancy is typically constrained to a single interactive service colocated with one or more batch, low-priority services, whose performance can be sacrificed when deemed necessary. Approximate computing applications offer the opportunity to enable tighter colocation among multiple applications whose performance is important. We present Pliant, a lightweight cloud runtime that leverages the ability of approximate computing applications to tolerate some loss in their output quality to boost the utilization of shared servers. During periods of high resource contention, Pliant employs incremental and interference-aware approximation to reduce contention in shared resources, and prevent QoS violations for co-scheduled interactive, latency-critical services. We evaluate Pliant across different interactive and approximate computing applications, and show that it preserves QoS for all co-scheduled workloads, while incurring a 2.1% loss in output quality, on average.
Submitted 12 April, 2018;
originally announced April 2018.