-
LACS: Learning-Augmented Algorithms for Carbon-Aware Resource Scaling with Uncertain Demand
Authors:
Roozbeh Bostandoost,
Adam Lechowicz,
Walid A. Hanafy,
Noman Bashir,
Prashant Shenoy,
Mohammad Hajiesmaili
Abstract:
Motivated by an imperative to reduce the carbon emissions of cloud data centers, this paper studies the online carbon-aware resource scaling problem with unknown job lengths (OCSU) and applies it to carbon-aware resource scaling for executing computing workloads. The task is to dynamically scale resources (e.g., the number of servers) assigned to a job of unknown length such that it is completed b…
▽ More
Motivated by an imperative to reduce the carbon emissions of cloud data centers, this paper studies the online carbon-aware resource scaling problem with unknown job lengths (OCSU) and applies it to carbon-aware resource scaling for executing computing workloads. The task is to dynamically scale resources (e.g., the number of servers) assigned to a job of unknown length such that it is completed before a deadline, with the objective of reducing the carbon emissions of executing the workload. The total carbon emissions of executing a job originate from the emissions of running the job and excess carbon emitted while switching between different scales (e.g., due to checkpoint and resume). Prior work on carbon-aware resource scaling has assumed accurate job length information, while other approaches have ignored switching losses and require carbon intensity forecasts. These assumptions prohibit the practical deployment of prior work for online carbon-aware execution of scalable computing workload. We propose LACS, a theoretically robust learning-augmented algorithm that solves OCSU. To achieve improved practical average-case performance, LACS integrates machine-learned predictions of job length. To achieve solid theoretical performance, LACS extends the recent theoretical advances on online conversion with switching costs to handle a scenario where the job length is unknown. Our experimental evaluations demonstrate that, on average, the carbon footprint of LACS lies within 1.2% of the online baseline that assumes perfect job length information and within 16% of the offline baseline that, in addition to the job length, also requires accurate carbon intensity forecasts. Furthermore, LACS achieves a 32% reduction in carbon footprint compared to the deadline-aware carbon-agnostic execution of the job.
△ Less
Submitted 4 June, 2024; v1 submitted 29 March, 2024;
originally announced April 2024.
-
The War of the Efficiencies: Understanding the Tension between Carbon and Energy Optimization
Authors:
Walid A. Hanafy,
Roozbeh Bostandoost,
Noman Bashir,
David Irwin,
Mohammad Hajiesmaili,
Prashant Shenoy
Abstract:
Major innovations in computing have been driven by scaling up computing infrastructure, while aggressively optimizing operating costs. The result is a network of worldwide datacenters that consume a large amount of energy, mostly in an energy-efficient manner. Since the electric grid powering these datacenters provided a simple and opaque abstraction of an unlimited and reliable power supply, the…
▽ More
Major innovations in computing have been driven by scaling up computing infrastructure, while aggressively optimizing operating costs. The result is a network of worldwide datacenters that consume a large amount of energy, mostly in an energy-efficient manner. Since the electric grid powering these datacenters provided a simple and opaque abstraction of an unlimited and reliable power supply, the computing industry remained largely oblivious to the carbon intensity of the electricity it uses. Much like the rest of the society, it generally treated the carbon intensity of the electricity as constant, which was mostly true for a fossil fuel-driven grid. As a result, the cost-driven objective of increasing energy-efficiency -- by doing more work per unit of energy -- has generally been viewed as the most carbon-efficient approach. However, as the electric grid is increasingly powered by clean energy and is exposing its time-varying carbon intensity, the most energy-efficient operation is no longer necessarily the most carbon-efficient operation. There has been a recent focus on exploiting the flexibility of computing's workloads -- along temporal, spatial, and resource dimensions -- to reduce carbon emissions, which comes at the cost of either performance or energy efficiency. In this paper, we discuss the trade-offs between energy efficiency and carbon efficiency in exploiting computing's flexibility and show that blindly optimizing for energy efficiency is not always the right approach.
△ Less
Submitted 29 June, 2023;
originally announced June 2023.
-
Understanding the Benefits of Hardware-Accelerated Communication in Model-Serving Applications
Authors:
Walid A. Hanafy,
Limin Wang,
Hyunseok Chang,
Sarit Mukherjee,
T. V. Lakshman,
Prashant Shenoy
Abstract:
It is commonly assumed that the end-to-end networking performance of edge offloading is purely dictated by that of the network connectivity between end devices and edge computing facilities, where ongoing innovation in 5G/6G networking can help. However, with the growing complexity of edge-offloaded computation and dynamic load balancing requirements, an offloaded task often goes through a multi-s…
▽ More
It is commonly assumed that the end-to-end networking performance of edge offloading is purely dictated by that of the network connectivity between end devices and edge computing facilities, where ongoing innovation in 5G/6G networking can help. However, with the growing complexity of edge-offloaded computation and dynamic load balancing requirements, an offloaded task often goes through a multi-stage pipeline that spans across multiple compute nodes and proxies interconnected via a dedicated network fabric within a given edge computing facility. As the latest hardware-accelerated transport technologies such as RDMA and GPUDirect RDMA are adopted to build such network fabric, there is a need for good understanding of the full potential of these technologies in the context of computation offload and the effect of different factors such as GPU scheduling and characteristics of computation on the net performance gain achievable by these technologies. This paper unveils detailed insights into the latency overhead in typical machine learning (ML)-based computation pipelines and analyzes the potential benefits of adopting hardware-accelerated communication. To this end, we build a model-serving framework that supports various communication mechanisms. Using the framework, we identify performance bottlenecks in state-of-the-art model-serving pipelines and show how hardware-accelerated communication can alleviate them. For example, we show that GPUDirect RDMA can save 15--50\% of model-serving latency, which amounts to 70--160 ms.
△ Less
Submitted 10 July, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
CarbonScaler: Leveraging Cloud Workload Elasticity for Optimizing Carbon-Efficiency
Authors:
Walid A. Hanafy,
Qianlin Liang,
Noman Bashir,
David Irwin,
Prashant Shenoy
Abstract:
Cloud platforms are increasing their emphasis on sustainability and reducing their operational carbon footprint. A common approach for reducing carbon emissions is to exploit the temporal flexibility inherent to many cloud workloads by executing them in periods with the greenest energy and suspending them at other times. Since such suspend-resume approaches can incur long delays in job completion…
▽ More
Cloud platforms are increasing their emphasis on sustainability and reducing their operational carbon footprint. A common approach for reducing carbon emissions is to exploit the temporal flexibility inherent to many cloud workloads by executing them in periods with the greenest energy and suspending them at other times. Since such suspend-resume approaches can incur long delays in job completion times, we present a new approach that exploits the elasticity of batch workloads in the cloud to optimize their carbon emissions. Our approach is based on the notion of "carbon scaling," similar to cloud autoscaling, where a job dynamically varies its server allocation based on fluctuations in the carbon cost of the grid's energy. We develop a greedy algorithm for minimizing a job's carbon emissions via carbon scaling that is based on the well-known problem of marginal resource allocation. We implement a CarbonScaler prototype in Kubernetes using its autoscaling capabilities and an analytic tool to guide the carbon-efficient deployment of batch applications in the cloud. We then evaluate CarbonScaler using real-world machine learning training and MPI jobs on a commercial cloud platform and show that it can yield i) 51% carbon savings over carbon-agnostic execution; ii) 37% over a state-of-the-art suspend-resume policy; and iii) 8% over the best static scaling policy.
△ Less
Submitted 19 October, 2023; v1 submitted 16 February, 2023;
originally announced February 2023.
-
Ecovisor: A Virtual Energy System for Carbon-Efficient Applications
Authors:
Abel Souza,
Noman Bashir,
Jorge Murillo,
Walid Hanafy,
Qianlin Liang,
David Irwin,
Prashant Shenoy
Abstract:
Cloud platforms' rapid growth is raising significant concerns about their carbon emissions. To reduce emissions, future cloud platforms will need to increase their reliance on renewable energy sources, such as solar and wind, which have zero emissions but are highly unreliable. Unfortunately, today's energy systems effectively mask this unreliability in hardware, which prevents applications from o…
▽ More
Cloud platforms' rapid growth is raising significant concerns about their carbon emissions. To reduce emissions, future cloud platforms will need to increase their reliance on renewable energy sources, such as solar and wind, which have zero emissions but are highly unreliable. Unfortunately, today's energy systems effectively mask this unreliability in hardware, which prevents applications from optimizing their carbon-efficiency, or work done per kilogram of carbon emitted. To address this problem, we design an "ecovisor", which virtualizes the energy system and exposes software-defined control of it to applications. An ecovisor enables each application to handle clean energy's unreliability in software based on its own specific requirements. We implement a small-scale ecovisor prototype that virtualizes a physical energy system to enable software-based application-level i) visibility into variable grid carbon-intensity and renewable generation and ii) control of server power usage and battery charging/discharging. We evaluate the ecovisor approach by showing how multiple applications can concurrently exercise their virtual energy system in different ways to better optimize carbon-efficiency based on their specific requirements compared to a general system-wide policy.
△ Less
Submitted 10 October, 2022;
originally announced October 2022.
-
Model-driven Cluster Resource Management for AI Workloads in Edge Clouds
Authors:
Qianlin Liang,
Walid A. Hanafy,
Ahmed Ali-Eldin,
Prashant Shenoy
Abstract:
Since emerging edge applications such as Internet of Things (IoT) analytics and augmented reality have tight latency constraints, hardware AI accelerators have been recently proposed to speed up deep neural network (DNN) inference run by these applications. Resource-constrained edge servers and accelerators tend to be multiplexed across multiple IoT applications, introducing the potential for perf…
▽ More
Since emerging edge applications such as Internet of Things (IoT) analytics and augmented reality have tight latency constraints, hardware AI accelerators have been recently proposed to speed up deep neural network (DNN) inference run by these applications. Resource-constrained edge servers and accelerators tend to be multiplexed across multiple IoT applications, introducing the potential for performance interference between latency-sensitive workloads. In this paper, we design analytic models to capture the performance of DNN inference workloads on shared edge accelerators, such as GPU and edgeTPU, under different multiplexing and concurrency behaviors. After validating our models using extensive experiments, we use them to design various cluster resource management algorithms to intelligently manage multiple applications on edge accelerators while respecting their latency constraints. We implement a prototype of our system in Kubernetes and show that our system can host 2.3X more DNN applications in heterogeneous multi-tenant edge clusters with no latency violations when compared to traditional knapsack hosting algorithms.
△ Less
Submitted 18 January, 2022;
originally announced January 2022.