Showing 1–23 of 23 results for author: Delimitrou, C

Searching in archive cs.

  1. arXiv:2401.02920  [pdf, other]

    cs.DC cs.AI

    Analytically-Driven Resource Management for Cloud-Native Microservices

    Authors: Yanqi Zhang, Zhuangzhuang Zhou, Sameh Elnikety, Christina Delimitrou

    Abstract: Resource management for cloud-native microservices has attracted a lot of recent attention. Previous work has shown that machine learning (ML)-driven approaches outperform traditional techniques, such as autoscaling, in terms of both SLA maintenance and resource efficiency. However, ML-driven approaches also face challenges including lengthy data collection processes and limited scalability. We pr…

    Submitted 5 January, 2024; originally announced January 2024.

  2. arXiv:2308.02896  [pdf, other]

    cs.DC cs.AR cs.NI cs.OS

    Towards Fast, Adaptive, and Hardware-Assisted User-Space Scheduling

    Authors: Lisa, Li, Nikita Lazarev, David Koufaty, Yijun Yin, Andy Anderson, Zhiru Zhang, Edward Suh, Kostis Kaffes, Christina Delimitrou

    Abstract: Modern datacenter applications are prone to high tail latencies since their requests typically follow highly-dispersive distributions. Delivering fast interrupts is essential to reducing tail latency. Prior work has proposed both OS- and system-level solutions to reduce tail latencies for microsecond-scale workloads through better scheduling. Unfortunately, existing approaches like customized data…

    Submitted 11 November, 2023; v1 submitted 5 August, 2023; originally announced August 2023.

    Comments: Accepted by HPCA 2024

  3. arXiv:2301.04122  [pdf, other]

    cs.DC cs.AI

    Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks

    Authors: Mingyu Liang, Wenyin Fu, Louis Feng, Zhongyi Lin, Pavani Panakanti, Shengbao Zheng, Srinivas Sridharan, Christina Delimitrou

    Abstract: Building large AI fleets to support the rapidly growing DL workloads is an active research topic for modern cloud providers. Generating accurate benchmarks plays an essential role in designing the fast-paced software and hardware solutions in this space. Two fundamental challenges to make this scalable are (i) workload representativeness and (ii) the ability to quickly incorporate changes to the f…

    Submitted 11 April, 2023; v1 submitted 16 December, 2022; originally announced January 2023.

    Comments: Accepted to ISCA 2023

  4. arXiv:2212.13882  [pdf, other]

    cs.DC cs.NI

    QoS-Aware Resource Management for Multi-phase Serverless Workflows with Aquatope

    Authors: Zhuangzhuang Zhou, Yanqi Zhang, Christina Delimitrou

    Abstract: Multi-stage serverless applications, i.e., workflows with many computation and I/O stages, are becoming increasingly representative of FaaS platforms. Despite their advantages in terms of fine-grained scalability and modular development, these applications are subject to suboptimal performance, resource inefficiency, and high costs to a larger degree than previous simple serverless functions. We…

    Submitted 28 December, 2022; originally announced December 2022.

  5. arXiv:2212.13867  [pdf, other]

    cs.DC cs.AR

    End-to-End Application Cloning for Distributed Cloud Microservices with Ditto

    Authors: Mingyu Liang, Yu Gan, Yueying Li, Carlos Torres, Abhishek Danotia, Mahesh Ketkar, Christina Delimitrou

    Abstract: We present Ditto, an automated framework for cloning end-to-end cloud applications, both monolithic and microservices, which captures I/O and network activity, as well as kernel operations, in addition to application logic. Ditto takes a hierarchical approach to application cloning, starting with capturing the dependency graph across distributed services, to recreating each tier's control/data flo…

    Submitted 28 December, 2022; originally announced December 2022.

  6. arXiv:2112.14831  [pdf, other]

    cs.DC cs.AR

    A Hardware-Software Stack for Serverless Edge Swarms

    Authors: Liam Patterson, David Pigorovsky, Brian Dempsey, Nikita Lazarev, Aditya Shah, Clara Steinhoff, Ariana Bruno, Justin Hu, Christina Delimitrou

    Abstract: Swarms of autonomous devices are increasing in ubiquity and size, making the need for rethinking their hardware-software system stack critical. We present HiveMind, the first swarm coordination platform that enables programmable execution of complex task workflows between cloud and edge resources in a performant and scalable manner. HiveMind is a software-hardware platform that includes a domain…

    Submitted 29 December, 2021; originally announced December 2021.

  7. arXiv:2112.06263  [pdf, other]

    cs.DC

    Sage: Leveraging ML to Diagnose Unpredictable Performance in Cloud Microservices

    Authors: Yu Gan, Mingyu Liang, Sundar Dev, David Lo, Christina Delimitrou

    Abstract: Cloud applications are increasingly shifting from large monolithic services to complex graphs of loosely-coupled microservices. Despite their advantages, microservices also introduce cascading QoS violations in cloud applications, which are difficult to diagnose and correct. We present Sage, an ML-driven root cause analysis system for interactive cloud microservices. Sage leverages unsupervised…

    Submitted 12 December, 2021; originally announced December 2021.

  8. arXiv:2112.06254  [pdf, other]

    cs.DC

    Sinan: Data Driven Resource Management for Cloud Microservices

    Authors: Yanqi Zhang, Weizhe Hua, Zhuangzhuang Zhou, Ed Suh, Christina Delimitrou

    Abstract: Cloud applications are increasingly shifting to interactive and loosely-coupled microservices. Despite their advantages, microservices complicate resource management, due to inter-tier dependencies. We present Sinan, a cluster manager for interactive microservices that leverages easily-obtainable tracing data instead of empirical decisions, to infer the impact of a resource allocation on end-…

    Submitted 12 December, 2021; originally announced December 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2105.13424

  9. arXiv:2106.01482  [pdf, other]

    cs.AR cs.NI

    Dagger: Accelerating RPCs in Cloud Microservices Through Tightly-Coupled Reconfigurable NICs

    Authors: Nikita Lazarev, Shaojie Xiang, Neil Adit, Zhiru Zhang, Christina Delimitrou

    Abstract: The ongoing shift of cloud services from monolithic designs to microservices creates high demand for efficient and high performance datacenter networking stacks, optimized for fine-grained workloads. Commodity networking systems based on software stacks and peripheral NICs introduce high overheads when it comes to delivering small messages. We present Dagger, a hardware acceleration fabric for c…

    Submitted 2 June, 2021; originally announced June 2021.

  10. arXiv:2105.13424  [pdf, other]

    cs.DC cs.LG cs.NI

    Sinan: Data-Driven, QoS-Aware Cluster Management for Microservices

    Authors: Yanqi Zhang, Weizhe Hua, Zhuangzhuang Zhou, Edward Suh, Christina Delimitrou

    Abstract: Cloud applications are increasingly shifting from large monolithic services, to large numbers of loosely-coupled, specialized microservices. Despite their advantages in terms of facilitating development, deployment, modularity, and isolation, microservices complicate resource management, as dependencies between them introduce backpressure effects and cascading QoS violations. We present Sinan, a…

    Submitted 27 May, 2021; originally announced May 2021.

  11. arXiv:2101.00267  [pdf, other]

    cs.DC cs.PF

    Sage: Using Unsupervised Learning for Scalable Performance Debugging in Microservices

    Authors: Yu Gan, Mingyu Liang, Sundar Dev, David Lo, Christina Delimitrou

    Abstract: Cloud applications are increasingly shifting from large monolithic services to complex graphs of loosely-coupled microservices. Despite the advantages of modularity and elasticity microservices offer, they also complicate cluster management and performance debugging, as dependencies between tiers introduce backpressure and cascading QoS violations. We present Sage, a machine learning-driven root…

    Submitted 1 January, 2021; originally announced January 2021.

  12. arXiv:2008.00329  [pdf, other]

    cs.AR

    CuttleSys: Data-Driven Resource Management for Interactive Applications on Reconfigurable Multicores

    Authors: Neeraj Kulkarni, Gonzalo Gonzalez-Pumariega, Amulya Khurana, Christine Shoemaker, Christina Delimitrou, David Albonesi

    Abstract: Multi-tenancy for latency-critical applications leads to resource interference and unpredictable performance. Core reconfiguration opens up more opportunities for colocation, as it allows the hardware to adjust to the dynamic performance and power needs of a specific mix of co-scheduled applications. However, reconfigurability also introduces challenges, as even for a small number of reconfigurabl…

    Submitted 1 August, 2020; originally announced August 2020.

  13. arXiv:2007.08622  [pdf]

    cs.AR cs.NI

    Dagger: Towards Efficient RPCs in Cloud Microservices with Near-Memory Reconfigurable NICs

    Authors: Nikita Lazarev, Neil Adit, Shaojie Xiang, Zhiru Zhang, Christina Delimitrou

    Abstract: Cloud applications are increasingly relying on hundreds of loosely-coupled microservices to complete user requests that meet an application's end-to-end QoS requirements. Communication time between services accounts for a large fraction of the end-to-end latency and can introduce performance unpredictability and QoS violations. This work presents our early work on Dagger, a hardware acceleration pl…

    Submitted 11 September, 2020; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: 4 pages, 7 figures

  14. arXiv:2002.01419  [pdf, other]

    cs.DC cs.NI

    HiveMind: A Scalable and Serverless Coordination Control Platform for UAV Swarms

    Authors: Justin Hu, Ariana Bruno, Brian Ritchken, Brendon Jackson, Mateo Espinosa, Aditya Shah, Christina Delimitrou

    Abstract: Swarms of autonomous devices are increasing in ubiquity and size. There are two main trains of thought for controlling devices in such swarms: centralized and distributed control. Centralized platforms achieve higher output quality but result in high network traffic and limited scalability, while decentralized systems are more scalable, but less sophisticated. In this work we present HiveMind, a…

    Submitted 4 February, 2020; originally announced February 2020.

  15. arXiv:2001.00222  [pdf, other]

    cs.DC

    Ripple: A Practical Declarative Programming Framework for Serverless Compute

    Authors: Shannon Joyner, Michael MacCoss, Christina Delimitrou, Hakim Weatherspoon

    Abstract: Serverless computing has emerged as a promising alternative to infrastructure- (IaaS) and platform-as-a-service (PaaS) cloud platforms for applications with ample parallelism and intermittent activity. Serverless promises greater resource elasticity, significant cost savings, and simplified application deployment. All major cloud providers, including Amazon, Google, and Microsoft, have introduced s…

    Submitted 1 January, 2020; originally announced January 2020.

  16. arXiv:1911.02122  [pdf, other]

    cs.DC

    uqSim: Scalable and Validated Simulation of Cloud Microservices

    Authors: Yanqi Zhang, Yu Gan, Christina Delimitrou

    Abstract: Current cloud services are moving away from monolithic designs and towards graphs of many loosely-coupled, single-concerned microservices. Microservices have several advantages, including speeding up development and deployment, allowing specialization of the software infrastructure, and helping with debugging and error isolation. At the same time they introduce several hardware and software challe…

    Submitted 5 November, 2019; originally announced November 2019.

  17. arXiv:1905.11055  [pdf, other]

    cs.DC

    An Open-Source Benchmark Suite for Cloud and IoT Microservices

    Authors: Yu Gan, Yanqi Zhang, Dailun Cheng, Ankitha Shetty, Priyal Rathi, Nayan Katarki, Ariana Bruno, Justin Hu, Brian Ritchken, Brendon Jackson, Kelvin Hu, Meghna Pancholi, Yuan He, Brett Clancy, Chris Colen, Fukang Wen, Catherine Leung, Siyuan Wang, Leon Zaruvinsky, Mateo Espinosa, Rick Lin, Zhongling Liu, Jake Padilla, Christina Delimitrou

    Abstract: Cloud services have recently started undergoing a major shift from monolithic applications, to graphs of hundreds of loosely-coupled microservices. Microservices fundamentally change a lot of assumptions current cloud systems are designed with, and present both opportunities and challenges when optimizing for quality of service (QoS) and utilization. In this paper we explore the implications micro…

    Submitted 27 May, 2019; originally announced May 2019.

  18. arXiv:1905.00968  [pdf, other]

    cs.DC cs.LG

    Leveraging Deep Learning to Improve the Performance Predictability of Cloud Microservices

    Authors: Yu Gan, Yanqi Zhang, Kelvin Hu, Dailun Cheng, Yuan He, Meghna Pancholi, Christina Delimitrou

    Abstract: Performance unpredictability is a major roadblock towards cloud adoption, and has performance, cost, and revenue ramifications. Predictable performance is even more critical as cloud services transition from monolithic designs to microservices. Detecting QoS violations after they occur in systems with microservices results in long recovery times, as hotspots propagate and amplify across dependent…

    Submitted 2 May, 2019; originally announced May 2019.

  19. arXiv:1805.10351  [pdf, other]

    cs.DC

    The Architectural Implications of Microservices in the Cloud

    Authors: Yu Gan, Christina Delimitrou

    Abstract: Cloud services have recently undergone a shift from monolithic applications to microservices, with hundreds or thousands of loosely-coupled microservices comprising the end-to-end application. Microservices present both opportunities and challenges when optimizing for quality of service (QoS) and cloud utilization. In this paper we explore the implications cloud microservices have on system bottle…

    Submitted 25 May, 2018; originally announced May 2018.

  20. arXiv:1805.01786  [pdf, other]

    cs.DC

    To Centralize or Not to Centralize: A Tale of Swarm Coordination

    Authors: Justin Hu, Ariana Bruno, Drew Zagieboylo, Mark Zhao, Brian Ritchken, Brendon Jackson, Joo Yeon Chae, Francois Mertil, Mateo Espinosa, Christina Delimitrou

    Abstract: Large swarms of autonomous devices are increasing in size and importance. When it comes to controlling the devices of large-scale swarms there are two main lines of thought. Centralized control, where all decisions - and often compute - happen in a centralized back-end cloud system, and distributed control, where edge devices are responsible for selecting and executing tasks with minimal or zero h…

    Submitted 4 May, 2018; originally announced May 2018.

  21. arXiv:1804.09136  [pdf, other]

    cs.DC

    Seer: Leveraging Big Data to Navigate the Increasing Complexity of Cloud Debugging

    Authors: Yu Gan, Meghna Pancholi, Dailun Cheng, Siyuan Hu, Yuan He, Christina Delimitrou

    Abstract: Performance unpredictability in cloud services leads to poor user experience, degraded availability, and has revenue ramifications. Detecting performance degradation a posteriori helps the system take corrective action, but does not avoid the QoS violations. Detecting QoS violations after the fact is even more detrimental when a service consists of hundreds of thousands of loosely-coupled microser…

    Submitted 24 April, 2018; originally announced April 2018.

  22. arXiv:1804.06462  [pdf, other]

    cs.DC

    Mage: Online Interference-Aware Scheduling in Multi-Scale Heterogeneous Systems

    Authors: Francisco Romero, Christina Delimitrou

    Abstract: Heterogeneity has grown in popularity both at the core and server level as a way to improve both performance and energy efficiency. However, despite these benefits, scheduling applications in heterogeneous machines remains challenging. Additionally, when these heterogeneous resources accommodate multiple applications to increase utilization, resources are prone to contention, destructive interfere…

    Submitted 17 April, 2018; originally announced April 2018.

  23. arXiv:1804.05671  [pdf, other]

    cs.PF

    Pliant: Leveraging Approximation to Improve Datacenter Resource Efficiency

    Authors: Neeraj Kulkarni, Feng Qi, Christina Delimitrou

    Abstract: Cloud multi-tenancy is typically constrained to a single interactive service colocated with one or more batch, low-priority services, whose performance can be sacrificed when deemed necessary. Approximate computing applications offer the opportunity to enable tighter colocation among multiple applications whose performance is important. We present Pliant, a lightweight cloud runtime that leverages…

    Submitted 12 April, 2018; originally announced April 2018.

    Comments: 15 pages, 10 figures