-
Loki: A System for Serving ML Inference Pipelines with Hardware and Accuracy Scaling
Authors:
Sohaib Ahmad,
Hui Guan,
Ramesh K. Sitaraman
Abstract:
The rapid adoption of machine learning (ML) has underscored the importance of serving ML models with high throughput and resource efficiency. Traditional approaches to managing increasing query demands have predominantly focused on hardware scaling, which involves increasing server count or computing power. However, this strategy can often be impractical due to limitations in the available budget or compute resources. As an alternative, accuracy scaling offers a promising solution by adjusting the accuracy of ML models to accommodate fluctuating query demands. Yet, existing accuracy scaling techniques target independent ML models and tend to underperform when managing inference pipelines. Furthermore, they lack integration with hardware scaling, leading to potential resource inefficiencies during low-demand periods. To address these limitations, this paper introduces Loki, a system designed for serving inference pipelines effectively with both hardware and accuracy scaling. Loki incorporates an innovative theoretical framework for optimal resource allocation and an effective query routing algorithm, aimed at improving system accuracy and minimizing latency deadline violations. Our empirical evaluation demonstrates that through accuracy scaling, the effective capacity of a fixed-size cluster can be enhanced by more than $2.7\times$ compared to relying solely on hardware scaling. When compared with state-of-the-art inference-serving systems, Loki achieves up to a $10\times$ reduction in Service Level Objective (SLO) violations, with minimal compromises on accuracy and while fulfilling throughput demands.
Submitted 3 July, 2024;
originally announced July 2024.
-
The Green Mirage: Impact of Location- and Market-based Carbon Intensity Estimation on Carbon Optimization Efficacy
Authors:
Diptyaroop Maji,
Noman Bashir,
David Irwin,
Prashant Shenoy,
Ramesh K. Sitaraman
Abstract:
In recent years, there has been an increased emphasis on reducing the carbon emissions from electricity consumption. Many organizations have set ambitious targets to reduce the carbon footprint of their operations as a part of their sustainability goals. The carbon footprint of any consumer of electricity is computed as the product of the total energy consumption and the carbon intensity of electricity. Third-party carbon information services provide information on carbon intensity across regions that consumers can leverage to modulate their energy consumption patterns to reduce their overall carbon footprint. In addition, to accelerate their decarbonization process, large electricity consumers increasingly acquire power purchase agreements (PPAs) from renewable power plants to obtain renewable energy credits that offset their "brown" energy consumption. There are primarily two methods for attributing carbon-free energy, or renewable energy credits, to electricity consumers: location-based and market-based. These two methods yield significantly different carbon intensity values for various consumers. As there is no consensus on which method to use for carbon-free energy attribution, both approaches are often applied concurrently in practice. In this paper, we show that such concurrent applications can cause discrepancies in the carbon savings reported by carbon optimization techniques. Our analysis across three state-of-the-art carbon optimization techniques shows possible overestimation of up to 55.1% in the carbon reductions reported by the consumers and even increased emissions for consumers in some cases. We also find that carbon optimization techniques make different decisions under the market-based method and location-based method, and the market-based method can yield up to 28.2% less carbon savings than those claimed by the location-based method for consumers without PPAs.
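The footprint formula above (footprint = energy consumption × carbon intensity) can be made concrete with a toy calculation. This is a deliberately simplified sketch, not the paper's accounting model: it assumes location-based accounting charges all consumption at the grid-average intensity, while a hypothetical market-based rule treats PPA-covered energy as zero-carbon and charges only the remainder at a single residual-mix intensity. Function names and numbers are illustrative.

```python
def location_based_kg(energy_kwh: float, grid_intensity: float) -> float:
    """Footprint = energy x grid-average carbon intensity (kgCO2 per kWh)."""
    return energy_kwh * grid_intensity


def market_based_kg(energy_kwh: float, ppa_kwh: float, residual_intensity: float) -> float:
    """Hypothetical simplified market-based rule: PPA-covered energy counts as
    zero-carbon; only the uncovered remainder is charged at a residual intensity."""
    uncovered = max(energy_kwh - ppa_kwh, 0.0)
    return uncovered * residual_intensity


# Illustrative consumer: 1000 kWh on a grid averaging 0.4 kgCO2/kWh,
# PPAs covering 300 kWh, residual-mix intensity 0.5 kgCO2/kWh.
loc = location_based_kg(1000, 0.4)
mkt = market_based_kg(1000, 300, 0.5)
print(f"location-based: {loc:.1f} kg, market-based: {mkt:.1f} kg")
```

The same consumption yields 400.0 kg under one method and 350.0 kg under the other, so the method a consumer picks changes the footprint (and any savings) it reports.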
Submitted 5 February, 2024;
originally announced February 2024.
-
BONES: Near-Optimal Neural-Enhanced Video Streaming
Authors:
Lingdong Wang,
Simran Singh,
Jacob Chakareski,
Mohammad Hajiesmaili,
Ramesh K. Sitaraman
Abstract:
Accessing high-quality video content can be challenging due to insufficient and unstable network bandwidth. Recent advances in neural enhancement have shown promising results in improving the quality of degraded videos through deep learning. Neural-Enhanced Streaming (NES) incorporates this new approach into video streaming, allowing users to download low-quality video segments and then enhance them to obtain high-quality content without disrupting playback of the video stream. We introduce BONES, an NES control algorithm that jointly manages the network and computational resources to maximize the quality of experience (QoE) of the user. BONES formulates NES as a Lyapunov optimization problem and solves it in an online manner with near-optimal performance, making it the first NES algorithm to provide a theoretical performance guarantee. Comprehensive experimental results indicate that BONES increases QoE by 5% to 20% over state-of-the-art algorithms with minimal overhead. Our code is available at https://github.com/UMass-LIDS/bones.
Submitted 10 April, 2024; v1 submitted 15 October, 2023;
originally announced October 2023.
-
BOLA360: Near-optimal View and Bitrate Adaptation for 360-degree Video Streaming
Authors:
Ali Zeynali,
Mohammad Hajiesmaili,
Ramesh K. Sitaraman
Abstract:
Recent advances in omnidirectional cameras and AR/VR headsets have spurred the adoption of 360-degree videos that are widely believed to be the future of online video streaming. 360-degree videos allow users to wear a head-mounted display (HMD) and experience the video as if they are physically present in the scene. Streaming high-quality 360-degree videos at scale is an unsolved problem that is more challenging than traditional (2D) video delivery. The data rate required to stream 360-degree videos is an order of magnitude higher than that of traditional videos. Further, the penalty for rebuffering events where the video freezes or displays a blank screen is more severe as it may cause cybersickness. We propose an online adaptive bitrate (ABR) algorithm for 360-degree videos called BOLA360 that runs inside the client's video player and orchestrates the download of video segments from the server so as to maximize the quality-of-experience (QoE) of the user. BOLA360 conserves bandwidth by downloading only those video segments that are likely to fall within the field-of-view (FOV) of the user. In addition, BOLA360 continually adapts the bitrate of the downloaded video segments so as to enable a smooth playback without rebuffering. We prove that BOLA360 is near-optimal with respect to an optimal offline algorithm that maximizes QoE. Further, we evaluate BOLA360 on a wide range of network and user head movement profiles and show that it provides $13.6\%$ to $372.5\%$ more QoE than state-of-the-art algorithms. While ABR algorithms for traditional (2D) videos have been well-studied over the last decade, BOLA360 is the first ABR algorithm for 360-degree videos with both theoretical and empirical guarantees on its performance.
Submitted 7 September, 2023;
originally announced September 2023.
-
Untangling Carbon-free Energy Attribution and Carbon Intensity Estimation for Carbon-aware Computing
Authors:
Diptyaroop Maji,
Noman Bashir,
David Irwin,
Prashant Shenoy,
Ramesh K. Sitaraman
Abstract:
Many organizations, including governments, utilities, and businesses, have set ambitious targets to reduce carbon emissions for their Environmental, Social, and Governance (ESG) goals. To achieve these targets, these organizations increasingly use power purchase agreements (PPAs) to obtain renewable energy credits, which they use to compensate for the "brown" energy consumed from the grid. However, the details of these PPAs are often private and not shared with important stakeholders, such as grid operators and carbon information services, who monitor and report the grid's carbon emissions. This often results in incorrect carbon accounting, where the same renewable energy production could be factored into grid carbon emission reports and separately claimed by organizations that own PPAs. Such "double counting" of renewable energy production could lead organizations with PPAs to understate their carbon emissions and overstate their progress toward sustainability goals, and also pose significant challenges for consumers using common carbon reduction measures to decrease their carbon footprint. Unfortunately, there is no consensus on accurately computing the grid's carbon intensity by properly accounting for PPAs. The goal of our work is to shed quantitative and qualitative light on the renewable energy attribution and the incorrect carbon intensity estimation problems.
Submitted 5 February, 2024; v1 submitted 13 August, 2023;
originally announced August 2023.
-
360TripleView: 360-Degree Video View Management System Driven by Convergence Value of Viewing Preferences
Authors:
Qian Zhou,
Michael Zink,
Ramesh Sitaraman,
Klara Nahrstedt
Abstract:
360-degree video has become increasingly popular in content consumption. However, finding the viewing direction for important content within each frame poses a significant challenge. Existing approaches rely on either viewer input or algorithmic determination to select the viewing direction, but neither mode consistently outperforms the other in terms of content-importance. In this paper, we propose 360TripleView, the first view management system for 360-degree video that automatically infers and utilizes the better view mode for each frame, ultimately providing viewers with higher content-importance views. Through extensive experiments and a user study, we demonstrate that 360TripleView achieves over 90% accuracy in inferring the better mode and significantly enhances content-importance compared to existing methods.
Submitted 3 December, 2023; v1 submitted 13 June, 2023;
originally announced June 2023.
-
CU-Net: Real-Time High-Fidelity Color Upsampling for Point Clouds
Authors:
Lingdong Wang,
Mohammad Hajiesmaili,
Jacob Chakareski,
Ramesh K. Sitaraman
Abstract:
Point cloud upsampling is essential for high-quality augmented reality, virtual reality, and telepresence applications, due to the capture, processing, and communication limitations of existing technologies. Although geometry upsampling to densify a point cloud's coordinates has been well studied, the upsampling of the color attributes has been largely overlooked. In this paper, we propose CU-Net, the first deep-learning point cloud color upsampling model that enables low latency and high visual fidelity operation. CU-Net achieves linear time and space complexity by leveraging a feature extractor based on sparse convolution and a color prediction module based on a neural implicit function. Therefore, CU-Net is theoretically guaranteed to be more efficient than most existing methods with quadratic complexity. Experimental results demonstrate that CU-Net can colorize a photo-realistic point cloud with nearly a million points in real time, while having notably better visual performance than baselines. In addition, CU-Net can adapt to arbitrary upsampling ratios and unseen objects without retraining. Our source code is available at https://github.com/UMass-LIDS/cunet.
Submitted 16 November, 2022; v1 submitted 12 September, 2022;
originally announced September 2022.
-
Enabling Sustainable Clouds: The Case for Virtualizing the Energy System
Authors:
Noman Bashir,
Tian Guo,
Mohammad Hajiesmaili,
David Irwin,
Prashant Shenoy,
Ramesh Sitaraman,
Abel Souza,
Adam Wierman
Abstract:
Cloud platforms' growing energy demand and carbon emissions are raising concern about their environmental sustainability. The current approach to enabling sustainable clouds focuses on improving energy-efficiency and purchasing carbon offsets. These approaches have limits: many cloud data centers already operate near peak efficiency, and carbon offsets cannot scale to near zero carbon where there is little carbon left to offset. Instead, enabling sustainable clouds will require applications to adapt to when and where unreliable low-carbon energy is available. Applications cannot do this today because their energy use and carbon emissions are not visible to them, as the energy system provides the rigid abstraction of a continuous, reliable energy supply. This vision paper instead advocates for a "carbon first" approach to cloud design that elevates carbon-efficiency to a first-class metric. To do so, we argue that cloud platforms should virtualize the energy system by exposing visibility into, and software-defined control of, it to applications, enabling them to define their own abstractions for managing energy and carbon emissions based on their own requirements.
Submitted 16 June, 2021;
originally announced June 2021.
-
Organizing Virtual Conferences through Mirrors: The ACM e-Energy 2020 Experience
Authors:
Dan Wang,
Arun Vishwanath,
Ramesh Sitaraman,
Iven Mareels
Abstract:
The emergence of the world-wide COVID-19 pandemic has forced academic conferences to be held entirely in a virtual manner. While prior studies have advocated the merits of virtual conferences in terms of energy and cost savings, organizers are increasingly facing the prospect of planning and executing them systematically, in order to deliver a rich conference-attending-experience for all participants. Starting from March 2020, tens of conferences have been held virtually. Past conferences have revealed numerous challenges, from budget planning, to selecting the supporting virtual platforms. Among these, two special challenges were identified: 1) how to deliver talks to geo-distributed attendees and 2) how to stimulate social interactions among attendees. These are the two important goals of an academic conference. In this paper, we advocate a mirror program approach for academic conferences. More specifically, the conference program is executed in multiple parallel (mirrored) programs, so that each mirror program can fit a different time zone. This can effectively address the first challenge.
Submitted 19 August, 2020;
originally announced August 2020.
-
Online Inventory Management with Application to Energy Procurement in Data Centers
Authors:
Lin Yang,
Mohammad H. Hajiesmaili,
Ramesh Sitaraman,
Enrique Mallada,
Wing S. Wong,
Adam Wierman
Abstract:
Motivated by the application of energy storage management in electricity markets, this paper considers the problem of online linear programming with inventory management constraints. Specifically, a decision maker must satisfy her demand for some units of an asset, either from a market with a time-varying price or from her own inventory. The decision maker is presented a price in a slot-by-slot manner, and must immediately decide the amount to purchase at the current price, either to cover the current demand or to store in inventory for covering future demand. The inventory has a limited capacity, and its critical role is to buy and store assets at low prices and use the stored assets to cover the demand at high prices. The ultimate goal of the decision maker is to cover the demands while minimizing the cost of buying assets from the market. We propose BatMan, an online algorithm for simple inventory models, and BatManRate, an extended version for the case with rate constraints. Both BatMan and BatManRate achieve optimal competitive ratios, meaning that no other online algorithm can achieve a better theoretical guarantee. To illustrate the results, we use the proposed algorithms to design and evaluate energy procurement and storage management strategies for data centers with a portfolio of energy sources including the electric grid, local renewable generation, and energy storage systems.
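To illustrate the problem setting only (this is not BatMan, whose decisions are designed to achieve the optimal competitive ratio), the sketch below computes the cost of a hypothetical fixed-threshold policy: when the price is at or below a threshold it buys the current demand and tops the inventory up to capacity, and otherwise it draws on the inventory first and buys only the shortfall. The function name, threshold rule, and numbers are assumptions for illustration.

```python
from typing import Sequence


def threshold_policy_cost(prices: Sequence[float], demands: Sequence[float],
                          capacity: float, threshold: float,
                          inventory: float = 0.0) -> float:
    """Total purchase cost of a hypothetical fixed-threshold policy (NOT BatMan)."""
    cost = 0.0
    for price, demand in zip(prices, demands):
        if price <= threshold:
            # Cheap slot: cover this slot's demand and fill the inventory.
            buy = demand + (capacity - inventory)
            inventory = capacity
        else:
            # Expensive slot: use stored units first, buy only what is missing.
            used = min(inventory, demand)
            inventory -= used
            buy = demand - used
        cost += price * buy
    return cost


prices = [1, 5, 5, 1, 5]
demands = [1, 1, 1, 1, 1]
print(threshold_policy_cost(prices, demands, capacity=2, threshold=2))  # 6.0
print(threshold_policy_cost(prices, demands, capacity=0, threshold=2))  # 17.0 (no storage)
```

With storage the policy pays 6.0 instead of 17.0, but it also ends with one unused unit in inventory; a fixed threshold has no worst-case guarantee when future prices are unknown, which is what a competitive online algorithm must provide.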
Submitted 14 January, 2019;
originally announced January 2019.
-
OCCAM: An Optimization-Based Approach to Network Inference
Authors:
Anirudh Sabnis,
Ramesh K. Sitaraman,
Donald Towsley
Abstract:
We study the problem of inferring the structure of a communication network based only on network measurements made from a set of hosts situated at the network periphery. Our novel approach called "OCCAM" is based on the principle of Occam's razor and finds the "simplest" network that explains the observed network measurements. OCCAM infers the internal topology of a communication network, including the internal nodes and links of the network that are not amenable to direct measurement. In addition to network topology, OCCAM infers the routing paths that packets take between the hosts. OCCAM uses path metrics measurable from the hosts and expresses the observed measurements as constraints of a mixed-integer bilinear optimization problem that can then be feasibly solved to yield the network topology and the routing paths. We empirically validate OCCAM on a wide variety of real-world ISP networks and show that its inferences agree closely with the ground truth. Specifically, OCCAM infers the topology with an average network similarity score of 93% and infers routing paths with a path edit distance of 0.20. Further, OCCAM is robust to error in its measured path metric inputs, producing high-quality inferences even when 20-30% of its inputs are erroneous. Our work is a significant advance in network tomography as it proposes and empirically evaluates the first method that infers the complete network topology, rather than just logical routing trees from sources.
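The abstract reports routing-path accuracy as a path edit distance. One plausible way to compute such a metric is sketched below, assuming it is a Levenshtein distance over the node sequences of the inferred and true paths, normalized by the longer path's length; the paper's exact definition may differ, and the function and router names are hypothetical.

```python
def levenshtein(a: list[str], b: list[str]) -> int:
    """Minimum number of node insertions, deletions, or substitutions turning path a into path b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def normalized_path_edit_distance(inferred: list[str], truth: list[str]) -> float:
    """Assumed normalization: divide by the longer path length, giving a value in [0, 1]."""
    longest = max(len(inferred), len(truth))
    return levenshtein(inferred, truth) / longest if longest else 0.0


truth = ["h1", "r1", "r2", "r3", "h2"]
inferred = ["h1", "r1", "r4", "r3", "h2"]  # one internal router differs
print(normalized_path_edit_distance(inferred, truth))  # 0.2
```

Under this assumed normalization, one wrong router on a five-node path gives 0.2.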
Submitted 1 June, 2019; v1 submitted 9 June, 2018;
originally announced June 2018.
-
Adaptive TTL-Based Caching for Content Delivery
Authors:
Soumya Basu,
Aditya Sundarrajan,
Javad Ghaderi,
Sanjay Shakkottai,
Ramesh Sitaraman
Abstract:
Content Delivery Networks (CDNs) deliver a majority of the user-requested content on the Internet, including web pages, videos, and software downloads. A CDN server caches and serves the content requested by users. Designing caching algorithms that automatically adapt to the heterogeneity, burstiness, and non-stationary nature of real-world content requests is a major challenge and is the focus of our work. While there is much work on caching algorithms for stationary request traffic, the work on non-stationary request traffic is very limited. Consequently, most prior models are inaccurate for production CDN traffic that is non-stationary.
We propose two TTL-based caching algorithms and provide provable guarantees for content request traffic that is bursty and non-stationary. The first algorithm, called d-TTL, dynamically adapts a TTL parameter using a stochastic approximation approach. Given a feasible target hit rate, we show that the hit rate of d-TTL converges to its target value for a general class of bursty traffic that allows Markov dependence over time and non-stationary arrivals. The second algorithm, called f-TTL, uses two caches, each with its own TTL. The first-level cache adaptively filters out non-stationary traffic, while the second-level cache stores frequently-accessed stationary traffic. Given feasible targets for both the hit rate and the expected cache size, f-TTL asymptotically achieves both targets. We implement d-TTL and f-TTL and evaluate both algorithms using an extensive nine-day trace consisting of 500 million requests from a production CDN server. We show that both d-TTL and f-TTL converge to their hit rate targets with an error of about 1.3%. However, f-TTL requires a significantly smaller cache size than d-TTL to achieve the same hit rate, since it effectively filters out the non-stationary traffic for rarely-accessed objects.
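The stochastic approximation idea behind d-TTL can be sketched as follows. This is a generic Robbins-Monro-style update, not the paper's exact rule or step-size schedule: each hit shrinks the TTL by `step * (1 - target)`, each miss grows it by `step * target`, so the expected change is zero exactly when the hit probability equals the target. The class name, the clipping bound, and the choice to re-cache every requested object for the current TTL are assumptions.

```python
import random


class AdaptiveTTLCache:
    """Hypothetical single-level TTL cache whose TTL is tuned toward a target hit rate."""

    def __init__(self, target_hit_rate: float, step: float,
                 ttl: float = 0.0, max_ttl: float = 1e6):
        self.target = target_hit_rate
        self.step = step
        self.ttl = ttl
        self.max_ttl = max_ttl
        self.expiry: dict[str, float] = {}

    def request(self, key: str, now: float) -> bool:
        hit = self.expiry.get(key, float("-inf")) >= now
        # Stochastic-approximation step: expected change is zero when P(hit) == target.
        self.ttl += self.step * (self.target - (1.0 if hit else 0.0))
        self.ttl = min(max(self.ttl, 0.0), self.max_ttl)
        # Assumed policy: every request (re)caches the object for the current TTL.
        self.expiry[key] = now + self.ttl
        return hit


# Illustrative run on a synthetic Zipf-like request stream (not a CDN trace).
random.seed(0)
cache = AdaptiveTTLCache(target_hit_rate=0.5, step=0.05)
keys = [f"obj{i}" for i in range(100)]
weights = [1 / (i + 1) for i in range(100)]
hits = 0
for t in range(1, 20001):
    hits += cache.request(random.choices(keys, weights)[0], float(t))
print(f"hit rate {hits / 20000:.3f}, final TTL {cache.ttl:.2f}")
```

The measured hit rate should settle near the target on stationary synthetic traffic; the paper's contribution is proving convergence for bursty, non-stationary traffic, which this sketch does not attempt.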
Submitted 8 December, 2017; v1 submitted 14 April, 2017;
originally announced April 2017.
-
MON: Mission-optimized Overlay Networks
Authors:
Bruce Spang,
Anirudh Sabnis,
Ramesh Sitaraman,
Don Towsley,
Brian DeCleene
Abstract:
Large organizations often have users in multiple sites which are connected over the Internet. Since resources are limited, communication between these sites needs to be carefully orchestrated for the most benefit to the organization. We present a Mission-optimized Overlay Network (MON), a hybrid overlay network architecture for maximizing utility to the organization. We combine an offline and an online system to solve non-concave utility maximization problems. The offline tier, the Predictive Flow Optimizer (PFO), creates plans for routing traffic using a model of network conditions. The online tier, MONtra, is aware of the precise local network conditions and is able to react quickly to problems within the network. Either tier alone is insufficient. The PFO may take too long to react to network changes. MONtra only has local information and cannot optimize non-concave mission utilities. However, by combining the two systems, MON is robust and achieves near-optimal utility under a wide range of network conditions. While best-effort overlay networks are well studied, our work is the first to design overlays that are optimized for mission utility.
Submitted 27 January, 2017;
originally announced January 2017.
-
BOLA: Near-Optimal Bitrate Adaptation for Online Videos
Authors:
Kevin Spiteri,
Rahul Urgaonkar,
Ramesh K. Sitaraman
Abstract:
Modern video players employ complex algorithms to adapt the bitrate of the video that is shown to the user. Bitrate adaptation requires a tradeoff between reducing the probability that the video freezes (rebuffers) and enhancing the quality of the video. A bitrate that is too high leads to frequent rebuffering, while a bitrate that is too low leads to poor video quality. Video providers segment videos into short segments and encode each segment at multiple bitrates. The video player adaptively chooses the bitrate of each segment to download, possibly choosing different bitrates for successive segments. We formulate bitrate adaptation as a utility-maximization problem and devise an online control algorithm called BOLA that uses Lyapunov optimization to minimize rebuffering and maximize video quality. We prove that BOLA achieves a time-average utility that is within an additive term O(1/V) of the optimal value, for a control parameter V related to the video buffer size. Further, unlike prior work, BOLA does not require prediction of available network bandwidth. We empirically validate BOLA in a simulated network environment using a collection of network traces. We show that BOLA achieves near-optimal utility and in many cases significantly higher utility than current state-of-the-art algorithms. Our work has immediate impact on real-world video players and on the evolving DASH standard for video transmission. We also implemented an updated version of BOLA that is now part of the standard reference player dash.js and is used in production by several video providers such as Akamai, BBC, CBS, and Orange.
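To make the Lyapunov-based decision concrete, here is a minimal sketch of a BOLA-style per-segment bitrate choice, in the form commonly presented for the basic variant: pick the bitrate index $m$ maximizing $(V(v_m + \gamma p) - Q)/S_m$, with logarithmic utility $v_m = \ln(S_m/S_1)$, segment sizes $S_m$, and buffer level $Q$ counted in segments, and wait if no score is positive. Folding $\gamma p$ into one `gamma_p` argument, the function name, and the parameter values are illustrative assumptions; production players such as dash.js add refinements not shown here.

```python
import math
from typing import Optional, Sequence


def bola_choose(sizes: Sequence[float], buffer_segments: float,
                V: float, gamma_p: float) -> Optional[int]:
    """Return the bitrate index to download next, or None to wait.

    sizes: segment sizes S_m, in increasing order of bitrate.
    """
    best_m: Optional[int] = None
    best_score = 0.0
    for m, size in enumerate(sizes):
        utility = math.log(size / sizes[0])  # v_m = ln(S_m / S_1)
        # Utility gained minus buffer "pressure", per unit of data downloaded.
        score = (V * (utility + gamma_p) - buffer_segments) / size
        if score > best_score:
            best_m, best_score = m, score
    return best_m


print([bola_choose([1.0, 2.0, 4.0], q, V=5.0, gamma_p=1.0) for q in (0, 4, 6, 12)])
# [0, 1, 2, None]
```

As the buffer fills, the rule steps up from the lowest to the highest bitrate, and with a full enough buffer it pauses downloads; no bandwidth prediction is involved, matching the abstract's point.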
Submitted 17 June, 2020; v1 submitted 25 January, 2016;
originally announced January 2016.
-
On the Complexity of Optimal Routing and Content Caching in Heterogeneous Networks
Authors:
Mostafa Dehghan,
Anand Seetharam,
Bo Jiang,
Ting He,
Theodoros Salonidis,
Jim Kurose,
Don Towsley,
Ramesh Sitaraman
Abstract:
We investigate the problem of optimal request routing and content caching in a heterogeneous network supporting in-network content caching with the goal of minimizing average content access delay. Here, content can either be accessed directly from a back-end server (where content resides permanently) or be obtained from one of multiple in-network caches. To access a piece of content, a user must decide whether to route its request to a cache or to the back-end server. Additionally, caches must decide which content to cache. We investigate the problem complexity of two problem formulations, where the direct path to the back-end server is modeled as i) a congestion-sensitive or ii) a congestion-insensitive path, reflecting whether or not the delay of the uncached path to the back-end server depends on the user request load, respectively. We show that the problem is NP-complete in both cases. We prove that under the congestion-insensitive model the problem can be solved optimally in polynomial time if each piece of content is requested by only one user, or when there are at most two caches in the network. We also identify a structural property of the user-cache graph that potentially makes the problem NP-complete. For the congestion-sensitive model, we prove that the problem remains NP-complete even if there is only one cache in the network and each piece of content is requested by only one user. We show that approximate solutions can be found for both models within a (1-1/e) factor of the optimal solution, and demonstrate a greedy algorithm that is found to be within 1% of optimal for small problem sizes. Through trace-driven simulations we evaluate the performance of our greedy algorithms, which show up to a 50% reduction in average delay over solutions based on LRU content caching.
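A greedy placement for the congestion-insensitive model can be sketched as follows. This is a generic greedy heuristic, not necessarily the paper's algorithm, and it carries no approximation guarantee as written: each request is served by the fastest reachable cache holding the item (otherwise by the back-end server at a fixed delay), and the algorithm repeatedly adds the (cache, content) pair that most reduces total rate-weighted delay until no capacity remains or no addition helps. All names and the toy instance are hypothetical.

```python
from itertools import product


def total_delay(placement, requests, cache_delay, server_delay):
    """Rate-weighted delay; each request uses the fastest reachable cache holding the item, else the server."""
    total = 0.0
    for (user, content), rate in requests.items():
        best = server_delay[user]  # congestion-insensitive back-end path
        for cache, items in placement.items():
            if content in items and (user, cache) in cache_delay:
                best = min(best, cache_delay[(user, cache)])
        total += rate * best
    return total


def greedy_placement(caches, capacity, contents, requests, cache_delay, server_delay):
    """Repeatedly add the (cache, content) pair with the largest delay reduction."""
    placement = {k: set() for k in caches}
    current = total_delay(placement, requests, cache_delay, server_delay)
    while True:
        best_gain, best_pair = 0.0, None
        for cache, content in product(caches, contents):
            if content in placement[cache] or len(placement[cache]) >= capacity[cache]:
                continue
            placement[cache].add(content)
            gain = current - total_delay(placement, requests, cache_delay, server_delay)
            placement[cache].discard(content)
            if gain > best_gain:
                best_gain, best_pair = gain, (cache, content)
        if best_pair is None:
            return placement, current
        cache, content = best_pair
        placement[cache].add(content)
        current = total_delay(placement, requests, cache_delay, server_delay)


caches = ["A", "B"]
capacity = {"A": 1, "B": 1}
contents = ["x", "y"]
requests = {("u1", "x"): 3, ("u1", "y"): 1, ("u2", "y"): 2, ("u2", "x"): 1}
cache_delay = {("u1", "A"): 1, ("u2", "B"): 1}  # u1 reaches only A, u2 only B
server_delay = {"u1": 10, "u2": 10}

placement, delay = greedy_placement(caches, capacity, contents, requests, cache_delay, server_delay)
print({k: sorted(v) for k, v in placement.items()}, delay)  # {'A': ['x'], 'B': ['y']} 25.0
```

On this toy instance each user's most-requested item ends up in its reachable cache, lowering total delay from 70.0 to 25.0.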
Submitted 31 December, 2014;
originally announced January 2015.
-
Go-With-The-Winner: Client-Side Server Selection for Content Delivery
Authors:
Chang Liu,
Ramesh K. Sitaraman,
Don Towsley
Abstract:
Content delivery networks deliver much of the web and video content in the world by deploying a large distributed network of servers. We model and analyze a simple paradigm for client-side server selection that is commonly used in practice where each user independently measures the performance of a set of candidate servers and selects the one that performs the best. For web (resp., video) delivery, we propose and analyze a simple algorithm where each user randomly chooses two or more candidate servers and selects the server that provided the best hit rate (resp., bit rate). We prove that the algorithm converges quickly to an optimal state where all users receive the best hit rate (resp., bit rate), with high probability. We also show that if each user chose just one random server instead of two, some users receive a hit rate (resp., bit rate) that tends to zero. We simulate our algorithm and evaluate its performance with varying choices of parameters, system load, and content popularity.
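The selection step described above (sample two or more random candidate servers, measure each, keep the best) can be sketched as below. The sketch assumes each server has a static measured hit rate; in the paper's model a server's performance also depends on which other users choose it, which is what makes convergence non-trivial and is ignored here. Function and server names are hypothetical.

```python
import random
from typing import Callable, Optional, Sequence


def go_with_the_winner(candidates: Sequence[str], measure: Callable[[str], float],
                       d: int = 2, rng: Optional[random.Random] = None) -> str:
    """Sample d distinct candidate servers uniformly at random and keep the best-measured one."""
    if rng is None:
        rng = random.Random()
    sampled = rng.sample(list(candidates), d)
    return max(sampled, key=measure)


# Hypothetical static per-server hit rates, for illustration only.
rng = random.Random(42)
hit_rate = {f"server{i}": rng.random() for i in range(20)}
for d in (1, 2):
    picks = [go_with_the_winner(list(hit_rate), hit_rate.__getitem__, d, rng) for _ in range(10_000)]
    print(f"d={d}: mean hit rate of chosen servers {sum(hit_rate[p] for p in picks) / len(picks):.3f}")
```

With `d=1` a user keeps whatever server it drew, while with `d=2` it can never end up on the single worst server; this is the intuition behind the abstract's contrast between choosing one and two random servers.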
Submitted 3 November, 2016; v1 submitted 31 December, 2013;
originally announced January 2014.
-
Dynamic Provisioning in Next-Generation Data Centers with On-site Power Production
Authors:
Jinlong Tu,
Lian Lu,
Minghua Chen,
Ramesh K. Sitaraman
Abstract:
The critical need for clean and economical sources of energy is transforming data centers from being primarily energy consumers into energy producers as well. We focus on minimizing the operating costs of next-generation data centers that can jointly optimize the energy supply from on-site generators and the power grid, and the energy demand from servers as well as power conditioning and cooling systems. We formulate the cost minimization problem and present an offline optimal algorithm. For "on-grid" data centers that use only the grid, we devise a deterministic online algorithm that achieves the best possible competitive ratio of $2-\alpha_s$, where $\alpha_s$ is a normalized look-ahead window size. For "hybrid" data centers that have on-site power generation in addition to the grid, we develop an online algorithm that achieves a competitive ratio of at most $\frac{P_{\max}(2-\alpha_s)}{c_o + c_m/L}\left[1 + 2\frac{P_{\max}-c_o}{P_{\max}(1+\alpha_g)}\right]$, where $\alpha_s$ and $\alpha_g$ are normalized look-ahead window sizes, $P_{\max}$ is the maximum grid power price, and $L$, $c_o$, and $c_m$ are parameters of an on-site generator.
Using extensive workload traces from Akamai with the corresponding grid power prices, we simulate our offline and online algorithms in a realistic setting. Our offline (resp., online) algorithm achieves a cost reduction of 25.8% (resp., 20.7%) for a hybrid data center and 12.3% (resp., 7.3%) for an on-grid data center. The cost reductions are quite significant and make a strong case for a joint optimization of energy supply and energy demand in a data center. A hybrid data center provides about 13% additional cost reduction over an on-grid data center representing the additional cost benefits that on-site power generation provides over using the grid alone.
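The two competitive-ratio expressions above can be evaluated directly. The sketch below transcribes them term by term; the function names are hypothetical and the parameter values in the usage note are arbitrary numbers chosen to make the arithmetic easy to follow, not values from the paper.

```python
def on_grid_ratio(alpha_s: float) -> float:
    """Competitive ratio 2 - alpha_s of the on-grid online algorithm."""
    return 2.0 - alpha_s


def hybrid_ratio_bound(p_max: float, alpha_s: float, alpha_g: float,
                       c_o: float, c_m: float, L: float) -> float:
    """Upper bound on the hybrid algorithm's competitive ratio:
    P_max (2 - alpha_s) / (c_o + c_m / L) * [1 + 2 (P_max - c_o) / (P_max (1 + alpha_g))]
    """
    lead = p_max * (2.0 - alpha_s) / (c_o + c_m / L)
    bracket = 1.0 + 2.0 * (p_max - c_o) / (p_max * (1.0 + alpha_g))
    return lead * bracket
```

Assuming the normalized windows range over [0, 1], the on-grid ratio falls from 2 with no look-ahead to 1 with a full window. With the illustrative values P_max = 1, c_o = 0.5, c_m = 0.5 and L = 1, the hybrid bound is 4.0 at α_s = α_g = 0 and shrinks to 1.5 at α_s = α_g = 1.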
Submitted 9 April, 2013; v1 submitted 27 March, 2013;
originally announced March 2013.
-
Distributing Content Simplifies ISP Traffic Engineering
Authors:
Abhigyan Sharma,
Arun Venkataramani,
Ramesh Sitaraman
Abstract:
Several major Internet service providers (e.g., Level-3, AT&T, Verizon) today also offer content distribution services. The emergence of such "Network-CDNs" (NCDNs) is driven by market forces that place more value on content services than on just carrying the bits. NCDNs are also necessitated by the need to reduce the cost of carrying ever-increasing volumes of traffic across their backbones. An NCDN has the flexibility to determine both where content is placed and how traffic is routed within the network. However, NCDNs today continue to treat traffic engineering independently from content placement and request redirection decisions. In this paper, we investigate the interplay between content distribution strategies and traffic engineering and ask how an NCDN should engineer traffic in a content-aware manner. Our experimental analysis, based on traces from a large content distribution network and real ISP topologies, shows that effective content placement can significantly simplify traffic engineering and in most cases obviate the need to engineer NCDN traffic altogether! Further, we show that simple demand-oblivious schemes for routing and placement such as InverseCap and LRU suffice, as they achieve network costs that are close to the best possible.
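InverseCap is demand-oblivious in a precise sense: each link's routing weight is set inversely proportional to its capacity, with no reference to the traffic matrix, and traffic then follows shortest paths under those weights. The sketch below assumes bidirectional links, uses the largest link capacity as the reference bandwidth (an arbitrary choice), and routes with Dijkstra's algorithm; the function names are hypothetical and real routers may split traffic over equal-cost paths, which this ignores.

```python
import heapq


def inverse_cap_weights(capacities, reference=None):
    """capacities: dict (u, v) -> link capacity. Weight = reference / capacity."""
    if reference is None:
        reference = max(capacities.values())
    return {link: reference / cap for link, cap in capacities.items()}


def shortest_path(weights, src, dst):
    """Dijkstra over bidirectional links; returns the node list or None."""
    graph = {}
    for (u, v), w in weights.items():
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    dist, prev = {src: 0.0}, {}
    heap, visited = [(0.0, src)], set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == dst:
            break
        for nbr, w in graph.get(node, []):
            if d + w < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = d + w, node
                heapq.heappush(heap, (d + w, nbr))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:  # walk predecessors back to the source
        path.append(prev[path[-1]])
    return path[::-1]
```

With links A–B and B–C of capacity 10 and a direct A–C link of capacity 1, the weights become 1, 1 and 10, so traffic from A to C takes the two-hop, high-capacity path A, B, C.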
Submitted 25 September, 2012;
originally announced September 2012.
-
Optimizing MapReduce for Highly Distributed Environments
Authors:
Benjamin Heintz,
Abhishek Chandra,
Ramesh K. Sitaraman
Abstract:
MapReduce, the popular programming paradigm for large-scale data processing, has traditionally been deployed over tightly-coupled clusters where the data is already locally available. The assumption that the data and compute resources are available in a single central location, however, no longer holds for many emerging applications in commercial, scientific and social networking domains, where the data is generated in a geographically distributed manner. Further, the computational resources needed for carrying out the data analysis may be distributed across multiple data centers or community resources such as Grids. In this paper, we develop a modeling framework to capture MapReduce execution in a highly distributed environment comprising distributed data sources and distributed computational resources. This framework is flexible enough to capture several design choices and performance optimizations for MapReduce execution. We propose a model-driven optimization that has two key features: (i) it is end-to-end, as opposed to myopic optimizations that may make only locally optimal but globally suboptimal decisions, and (ii) it can control multiple MapReduce phases to achieve low runtime, as opposed to single-phase optimizations that may control only individual phases. Our model results show that our optimization can provide reductions in execution time of nearly 82% and 64% over myopic and single-phase optimizations, respectively. We have modified Hadoop to implement our model outputs, and using three different MapReduce applications over an 8-node emulated PlanetLab testbed, we show that our optimized Hadoop execution plan achieves a 31-41% reduction in runtime over a vanilla Hadoop execution. Our model-driven optimization also provides several insights into the choice of techniques and execution parameters based on application and platform characteristics.
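The difference between a myopic and an end-to-end decision can be seen in a deliberately tiny toy model: one data source must ship its whole input to a single mapper site, and runtime is push time plus map time. A myopic choice minimizes only the push phase; an end-to-end choice minimizes the sum. The `MapperSite` type, the bandwidth and compute numbers, and the two-phase cost model are all hypothetical simplifications; the paper's framework also covers shuffle and reduce phases and splitting data across sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapperSite:
    name: str
    bandwidth: float     # hypothetical source-to-site bandwidth (MB/s)
    compute_rate: float  # hypothetical map throughput at the site (MB/s)


def push_time(site: MapperSite, data_mb: float) -> float:
    return data_mb / site.bandwidth


def map_time(site: MapperSite, data_mb: float) -> float:
    return data_mb / site.compute_rate


def myopic_choice(sites, data_mb):
    """Pick the site that is fastest for the push phase alone."""
    return min(sites, key=lambda s: push_time(s, data_mb))


def end_to_end_choice(sites, data_mb):
    """Pick the site that minimizes push time plus map time."""
    return min(sites, key=lambda s: push_time(s, data_mb) + map_time(s, data_mb))
```

With 1000 MB of input, a nearby but slow site (100 MB/s link, 10 MB/s compute) finishes in 10 + 100 = 110 s, while a farther but faster site (50 MB/s link, 100 MB/s compute) finishes in 20 + 10 = 30 s; the myopic rule picks the first, the end-to-end rule the second.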
Submitted 30 July, 2012;
originally announced July 2012.
-
Energy-Aware Load Balancing in Content Delivery Networks
Authors:
Vimal Mathew,
Ramesh K. Sitaraman,
Prashant Shenoy
Abstract:
Internet-scale distributed systems such as content delivery networks (CDNs) operate hundreds of thousands of servers deployed in thousands of data center locations around the globe. Since the energy costs of operating such a large IT infrastructure are a significant fraction of the total operating costs, we argue for redesigning CDNs to incorporate energy optimizations as a first-order principle. We propose techniques to turn off CDN servers during periods of low load while seeking to balance three key design goals: maximize energy reduction, minimize the impact on client-perceived service availability (SLAs), and limit the frequency of on-off server transitions to reduce wear-and-tear and its impact on hardware reliability. We propose an optimal offline algorithm and an online algorithm to extract energy savings both at the level of local load balancing within a data center and global load balancing across data centers. We evaluate our algorithms using real production workload traces from a large commercial CDN. Our results show that it is possible to reduce the energy consumption of a CDN by more than 55% while ensuring a high level of availability that meets customer SLA requirements and incurring an average of one on-off transition per server per day. Further, we show that keeping even 10% of the servers as hot spares helps absorb load spikes due to global flash crowds with little impact on availability SLAs. Finally, we show that redistributing load across proximal data centers can enhance service availability significantly, but has only a modest impact on energy savings.
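The three goals above (save energy, protect availability, limit on-off transitions) can be traded off even by a very simple online rule. The sketch below is a hypothetical policy, not the paper's online algorithm: it keeps enough servers on to carry the current load plus a fixed pool of hot spares, scales up immediately when more capacity is needed, and scales down only after the target has stayed below the active count for `hold` consecutive intervals, which damps transitions. Starting with every server on, the per-server `capacity`, and the parameter names are all assumptions.

```python
import math


def plan_active_servers(loads, capacity, total, spare_fraction=0.1, hold=3):
    """Return the number of active servers chosen for each interval.

    loads:          offered load per interval (same units as capacity)
    capacity:       load one server can carry
    total:          servers available in the data center
    spare_fraction: fraction of all servers kept on as hot spares
    hold:           intervals of sustained low load required before scaling down
    """
    spares = math.ceil(spare_fraction * total)
    active, below, plan = total, 0, []  # assume every server starts powered on
    for load in loads:
        target = min(total, math.ceil(load / capacity) + spares)
        if target >= active:
            active, below = target, 0   # scale up at once to protect availability
        else:
            below += 1
            if below >= hold:           # scale down only after sustained low load
                active, below = target, 0
        plan.append(active)
    return plan
```

For loads 50, 10, 10, 10, 10, 80 with ten servers of capacity 10 and one hot spare, the plan is 10, 10, 2, 2, 2, 9: the rule waits out three low-load intervals before turning servers off, then brings capacity back in a single step when load spikes.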
Submitted 26 September, 2011;
originally announced September 2011.
-
Algorithms for Constructing Overlay Networks For Live Streaming
Authors:
Konstantin Andreev,
Bruce M. Maggs,
Adam Meyerson,
Jevan Saks,
Ramesh K. Sitaraman
Abstract:
We present a polynomial time approximation algorithm for constructing an overlay multicast network for streaming live media events over the Internet. The class of overlay networks constructed by our algorithm includes networks used by Akamai Technologies to deliver live media events to a global audience with high fidelity. We construct networks consisting of three stages of nodes. The nodes in the first stage are the entry points that act as sources for the live streams. Each source forwards each of its streams to one or more nodes in the second stage that are called reflectors. A reflector can split an incoming stream into multiple identical outgoing streams, which are then sent on to nodes in the third and final stage that act as sinks and are located in edge networks near end-users. As the packets in a stream travel from one stage to the next, some of them may be lost. A sink combines the packets from multiple instances of the same stream (by reordering packets and discarding duplicates) to form a single instance of the stream with minimal loss. Our primary contribution is an algorithm that constructs an overlay network that provably satisfies capacity and reliability constraints to within a constant factor of optimal, and minimizes cost to within a logarithmic factor of optimal. Further, in the common case where only the transmission costs are minimized, we show that our algorithm produces a solution that has cost within a factor of 2 of optimal. We also implement our algorithm and evaluate it on realistic traces derived from Akamai's live streaming network. Our empirical results show that our algorithm can be used to efficiently construct large-scale overlay networks in practice with near-optimal cost.
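The sink's combining step (reorder, discard duplicates) is what makes multiple reflector paths pay off. The sketch below merges several received instances keyed by a hypothetical packet sequence number. The companion function adds an explicit assumption not stated in the abstract: if losses on the instances were independent with per-instance loss probabilities p_i, a packet would be missing at the sink only when every instance lost it, so the residual loss is the product of the p_i.

```python
from math import prod


def combine_instances(*instances):
    """Merge received copies of one stream into a single ordered instance.

    Each instance is an iterable of (seq, payload) pairs, possibly out of order
    and with gaps where packets were lost.
    """
    merged = {}
    for instance in instances:
        for seq, payload in instance:
            merged.setdefault(seq, payload)  # keep the first copy, drop duplicates
    return [merged[seq] for seq in sorted(merged)]  # reorder by sequence number


def residual_loss(loss_probs):
    """Loss rate after combining, ASSUMING independent losses per instance."""
    return prod(loss_probs)
```

For instance, combining `[(2, "c"), (0, "a")]` with `[(1, "b"), (2, "c")]` yields `["a", "b", "c"]`, and two instances that each lose 10% and 20% of packets would, under the independence assumption, leave a residual loss of 2%.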
Submitted 19 September, 2011;
originally announced September 2011.