-
CASPER: Carbon-Aware Scheduling and Provisioning for Distributed Web Services
Authors:
Abel Souza,
Shruti Jasoria,
Basundhara Chakrabarty,
Alexander Bridgwater,
Axel Lundberg,
Filip Skogh,
Ahmed Ali-Eldin,
David Irwin,
Prashant Shenoy
Abstract:
There has been a significant societal push towards sustainable practices, including in computing. Modern interactive workloads such as geo-distributed web services exhibit spatiotemporal and performance flexibility, making it possible to adapt the location, time, and intensity of processing to align with the availability of renewable and low-carbon energy. An example is a web application hosted across multiple cloud regions, each with a carbon intensity that varies with its local electricity mix. Distributed load balancing exploits low-carbon energy by migrating load across regions, reducing the web application's carbon footprint. In this paper, we present CASPER, a carbon-aware scheduling and provisioning system that primarily minimizes the carbon footprint of distributed web services while also respecting their Service Level Objectives (SLOs). We formulate CASPER as a multi-objective optimization problem that considers both the variable carbon intensity and the latency constraints of the network. Our evaluation reveals the significant potential of CASPER in achieving substantial reductions in carbon emissions. Compared to baseline methods, CASPER demonstrates improvements of up to 70% with no degradation in latency performance.
Submitted 21 March, 2024;
originally announced March 2024.
-
BLAFS: A Bloat Aware File System
Authors:
Huaifeng Zhang,
Mohannad Alhanahnah,
Ahmed Ali-Eldin
Abstract:
While there have been exponential improvements in hardware performance over the years, software performance has lagged behind. The performance gap is caused by software inefficiencies, many of which stem from software bloat. Software bloat occurs due to the ever-increasing, mostly unused, features and dependencies in software. Bloat exists in all layers of software, from the operating system to the application, resulting in wasted computing resources. The problem is exacerbated in both cloud and edge settings as the number of running applications increases. To remove software bloat, multiple debloating tools have been proposed in the literature. However, these tools do not provide safety guarantees on the debloated software, and some files needed at run-time are removed. In this paper, we introduce BLAFS, a BLoat-Aware File System for containers. BLAFS guarantees debloating safety for both cloud and edge systems. BLAFS is implemented on top of the Overlay file system, allowing for file-system layer sharing across containers. We compare BLAFS to two state-of-the-art debloating tools (Cimplifier and Dockerslim), and two state-of-the-art lazy-loading container snapshotters for edge systems (Starlight and eStargz). Our evaluation of real-world containers shows that BLAFS reduces container sizes by up to 97% of the original size, while maintaining the safety of the containers when other debloating tools fail. We also evaluate BLAFS's performance in edge settings. It can reduce container provisioning time by up to 90%, providing bandwidth reductions comparable to lazy-loading snapshotters, while removing 97% of the vulnerabilities and using up to 97% less space on the edge.
Submitted 8 May, 2023;
originally announced May 2023.
-
Machine Learning Systems are Bloated and Vulnerable
Authors:
Huaifeng Zhang,
Fahmi Abdulqadir Ahmed,
Dyako Fatih,
Akayou Kitessa,
Mohannad Alhanahnah,
Philipp Leitner,
Ahmed Ali-Eldin
Abstract:
Today's software is bloated with both code and features that are not used by most users. This bloat is prevalent across the entire software stack, from operating systems and applications to containers. Containers are lightweight virtualization technologies used to package code and dependencies, providing portable, reproducible and isolated environments. For their ease of use, data scientists often utilize machine learning containers to simplify their workflow. However, this convenience comes at a cost: containers are often bloated with unnecessary code and dependencies, resulting in very large sizes. In this paper, we analyze and quantify bloat in machine learning containers. We develop MMLB, a framework for analyzing bloat in software systems, focusing on machine learning containers. MMLB measures the amount of bloat at both the container and package levels, quantifying the sources of bloat. In addition, MMLB integrates with vulnerability analysis tools and performs package dependency analysis to evaluate the impact of bloat on container vulnerabilities. Through experimentation with 15 machine learning containers from TensorFlow, PyTorch, and Nvidia, we show that bloat accounts for up to 80% of machine learning container sizes, increasing container provisioning times by up to 370% and exacerbating vulnerabilities by up to 99%.
Submitted 25 January, 2024; v1 submitted 16 December, 2022;
originally announced December 2022.
-
Model-driven Cluster Resource Management for AI Workloads in Edge Clouds
Authors:
Qianlin Liang,
Walid A. Hanafy,
Ahmed Ali-Eldin,
Prashant Shenoy
Abstract:
Since emerging edge applications such as Internet of Things (IoT) analytics and augmented reality have tight latency constraints, hardware AI accelerators have been recently proposed to speed up deep neural network (DNN) inference run by these applications. Resource-constrained edge servers and accelerators tend to be multiplexed across multiple IoT applications, introducing the potential for performance interference between latency-sensitive workloads. In this paper, we design analytic models to capture the performance of DNN inference workloads on shared edge accelerators, such as GPU and edgeTPU, under different multiplexing and concurrency behaviors. After validating our models using extensive experiments, we use them to design various cluster resource management algorithms to intelligently manage multiple applications on edge accelerators while respecting their latency constraints. We implement a prototype of our system in Kubernetes and show that our system can host 2.3X more DNN applications in heterogeneous multi-tenant edge clusters with no latency violations when compared to traditional knapsack hosting algorithms.
Submitted 18 January, 2022;
originally announced January 2022.
-
LaSS: Running Latency Sensitive Serverless Computations at the Edge
Authors:
Bin Wang,
Ahmed Ali-Eldin,
Prashant Shenoy
Abstract:
Serverless computing has emerged as a new paradigm for running short-lived computations in the cloud. Due to its ability to handle IoT workloads, there has been considerable interest in running serverless functions at the edge. However, the constrained nature of the edge and the latency-sensitive nature of workloads result in many challenges for serverless platforms. In this paper, we present LaSS, a platform that uses model-driven approaches for running latency-sensitive serverless computations on edge resources. LaSS uses principled queuing-based methods to determine an appropriate allocation for each hosted function and auto-scales the allocated resources in response to workload dynamics. LaSS uses a fair-share allocation approach to guarantee a minimum of allocated resources to each function in the presence of overload. In addition, it utilizes resource reclamation methods based on container deflation and termination to reassign resources from over-provisioned functions to under-provisioned ones. We implement a prototype of our approach on an OpenWhisk serverless edge cluster and conduct a detailed experimental evaluation. Our results show that LaSS can accurately predict the resources needed for serverless functions in the presence of highly dynamic workloads, and reprovision container capacity within hundreds of milliseconds while maintaining fair-share allocation guarantees.
Submitted 28 April, 2021;
originally announced April 2021.
-
The Hidden cost of the Edge: A Performance Comparison of Edge and Cloud Latencies
Authors:
Ahmed Ali-Eldin,
Bin Wang,
Prashant Shenoy
Abstract:
Edge computing has emerged as a popular paradigm for running latency-sensitive applications due to its ability to offer lower network latencies to end-users. In this paper, we argue that despite its lower network latency, the resource-constrained nature of the edge can result in higher end-to-end latency, especially at higher utilizations, when compared to cloud data centers. We study this edge performance inversion problem through an analytic comparison of edge and cloud latencies and analyze conditions under which the edge can yield worse performance than the cloud. To verify our analytic results, we conduct a detailed experimental comparison of the edge and the cloud latencies using a realistic application and real cloud workloads. Both our analytical and experimental results show that even at moderate utilizations, the edge queuing delays can offset the benefits of lower network latencies, and even result in performance inversion where running in the cloud would provide superior latencies. We finally discuss practical implications of our results and provide insights into how application designers and service providers should design edge applications and systems to avoid these pitfalls.
Submitted 28 April, 2021;
originally announced April 2021.
-
Cloud-scale VM Deflation for Running Interactive Applications On Transient Servers
Authors:
Alexander Fuerst,
Ahmed Ali-Eldin,
Prashant Shenoy,
Prateek Sharma
Abstract:
Transient computing has become popular in public cloud environments for running delay-insensitive batch and data processing applications at low cost. Since transient cloud servers can be revoked at any time by the cloud provider, they are considered unsuitable for running interactive applications such as web services. In this paper, we present VM deflation as an alternative mechanism to server preemption for reclaiming resources from transient cloud servers under resource pressure. Using real traces from top-tier cloud providers, we show the feasibility of using VM deflation as a resource reclamation mechanism for interactive applications in public clouds. We show how current hypervisor mechanisms can be used to implement VM deflation and present cluster deflation policies for resource management of transient and on-demand cloud VMs. Experimental evaluation of our deflation system on a Linux cluster shows that microservice-based applications can be deflated by up to 50% with negligible performance overhead. Our cluster-level deflation policies allow overcommitment levels as high as 50%, with less than a 1% decrease in application throughput, and can enable cloud platforms to increase revenue by 30%.
Submitted 31 May, 2020;
originally announced June 2020.
-
Power-Performance Tradeoffs in Data Center Servers: DVFS, CPU pinning, Horizontal, and Vertical Scaling
Authors:
Jakub Krzywda,
Ahmed Ali-Eldin,
Trevor E. Carlson,
Per-Olov Östberg,
Erik Elmroth
Abstract:
Dynamic Voltage and Frequency Scaling (DVFS), CPU pinning, horizontal scaling, and vertical scaling are four techniques that have been proposed as actuators to control the performance and energy consumption of data center servers. This work investigates the utility of these four actuators and quantifies the power-performance tradeoffs associated with them. Using replicas of the German Wikipedia running on our local testbed, we perform a set of experiments to quantify the influence of DVFS, vertical and horizontal scaling, and CPU pinning on end-to-end response time (average and tail), throughput, and power consumption with different workloads. Results of the experiments show that DVFS rarely reduces the power consumption of underloaded servers by more than 5%, but it can be used to limit the maximal power consumption of a saturated server by up to 20% (at a cost of performance degradation). CPU pinning reduces the power consumption of underloaded servers (by up to 7%) at the cost of performance degradation, which can be limited by choosing an appropriate CPU pinning scheme. Horizontal and vertical scaling improve both the average and tail response time, but the improvement is not proportional to the amount of resources added. The load balancing strategy has a big impact on the tail response time of horizontally scaled applications.
Submitted 13 March, 2019;
originally announced March 2019.
-
A Survey on Modeling Energy Consumption of Cloud Applications: Deconstruction, State of the Art, and Trade-off Debates
Authors:
Zheng Li,
Selome Tesfatsion,
Saeed Bastani,
Ahmed Ali-Eldin,
Erik Elmroth,
Maria Kihl,
Rajiv Ranjan
Abstract:
Given the complexity and heterogeneity in Cloud computing scenarios, the modeling approach has widely been employed to investigate and analyze the energy consumption of Cloud applications, by abstracting real-world objects and processes that are difficult to observe or understand directly. Abstraction clearly sacrifices, and usually does not need, a complete reflection of the reality being modeled. Consequently, current energy consumption models vary in terms of purposes, assumptions, application characteristics and environmental conditions, with possible overlaps between different research works. Therefore, it would be necessary and valuable to reveal the state of the art of the existing modeling efforts, so as to weave different models together to facilitate comprehending and further investigating application energy consumption in the Cloud domain. By systematically selecting, assessing and synthesizing 76 relevant studies, we rationalized and organized over 30 energy consumption models with unified notations. To help investigate the existing models and facilitate future modeling work, we deconstructed the runtime execution and deployment environment of Cloud applications, and identified 18 environmental factors and 12 workload factors that influence energy consumption. In particular, there are complicated trade-offs and even debates when dealing with the combined impact of multiple factors.
Submitted 2 August, 2017;
originally announced August 2017.
-
A Distributed Data Collection Algorithm for Wireless Sensor Networks with Persistent Storage Nodes
Authors:
Salah A. Aly,
Ahmed Ali-Eldin,
H. Vincent Poor
Abstract:
A distributed data collection algorithm to accurately store and forward information obtained by wireless sensor networks is proposed. The proposed algorithm does not depend on the sensor network topology, routing tables, or geographic locations of sensor nodes, but rather makes use of uniformly distributed storage nodes. Analytical and simulation results for this algorithm show that, with high probability, the data disseminated by the sensor nodes can be precisely collected by querying any small set of storage nodes.
Submitted 11 November, 2010;
originally announced November 2010.