-
Towards Building Autonomous Data Services on Azure
Authors:
Yiwen Zhu,
Yuanyuan Tian,
Joyce Cahoon,
Subru Krishnan,
Ankita Agarwal,
Rana Alotaibi,
Jesús Camacho-Rodríguez,
Bibin Chundatt,
Andrew Chung,
Niharika Dutta,
Andrew Fogarty,
Anja Gruenheid,
Brandon Haynes,
Matteo Interlandi,
Minu Iyer,
Nick Jurgens,
Sumeet Khushalani,
Brian Kroth,
Manoj Kumar,
Jyoti Leeka,
Sergiy Matusevych,
Minni Mittal,
Andreas Mueller,
Kartheek Muthyala,
Harsha Nagulapalli
, et al. (13 additional authors not shown)
Abstract:
The modern cloud has turned data services into easily accessible commodities. With just a few clicks, users are now able to access a catalog of data processing systems for a wide range of tasks. However, the cloud brings in both complexity and opportunity. While cloud users can quickly start an application by using various data services, it can be difficult to configure and optimize these services to gain the most value from them. For cloud providers, managing every aspect of an ever-increasing set of data services, while meeting customer SLAs and minimizing operational cost, is becoming more challenging. Cloud technology enables the collection of significant amounts of workload traces and system telemetry. With the progress in data science (DS) and machine learning (ML), it is feasible and desirable to utilize a data-driven, ML-based approach to automate various aspects of data services, resulting in the creation of autonomous data services. This paper presents our perspectives and insights on creating autonomous data services on Azure. It also covers the future endeavors we plan to undertake and unresolved issues that still need attention.
Submitted 2 May, 2024;
originally announced May 2024.
-
To Share, or not to Share Online Event Trend Aggregation Over Bursty Event Streams
Authors:
Olga Poppe,
Chuan Lei,
Lei Ma,
Allison Rozet,
Elke A. Rundensteiner
Abstract:
Complex event processing (CEP) systems continuously evaluate large workloads of pattern queries under tight time constraints. Event trend aggregation queries with Kleene patterns are commonly used to retrieve summarized insights about the recent trends in event streams. State-of-the-art methods are limited either due to repetitive computations or unnecessary trend construction. Existing shared approaches are guided by statically selected and hence rigid sharing plans that are often sub-optimal under stream fluctuations. In this work, we propose a novel framework, Hamlet, that is the first to overcome these limitations. Hamlet introduces two key innovations. First, Hamlet adaptively decides whether to share or not to share computations depending on the current stream properties at run time to harvest the maximum sharing benefit. Second, Hamlet is equipped with a highly efficient shared trend aggregation strategy that avoids trend construction. Our experimental study on both real and synthetic data sets demonstrates that Hamlet consistently reduces query latency by up to five orders of magnitude compared to the state-of-the-art approaches.
Submitted 3 March, 2021; v1 submitted 1 January, 2021;
originally announced January 2021.
-
Sharon: Shared Online Event Sequence Aggregation
Authors:
Olga Poppe,
Allison Rozet,
Chuan Lei,
Elke A. Rundensteiner,
David Maier
Abstract:
Streaming systems evaluate massive workloads of event sequence aggregation queries. State-of-the-art approaches suffer from long delays caused by not sharing intermediate results of similar queries and by constructing event sequences prior to their aggregation. To overcome these limitations, our Shared Online Event Sequence Aggregation (Sharon) approach shares intermediate aggregates among multiple queries while avoiding the expensive construction of event sequences. Our Sharon optimizer faces two challenges. One, a sharing decision is not always beneficial. Two, a sharing decision may exclude other sharing opportunities. To guide our Sharon optimizer, we compactly encode sharing candidates, their benefits, and conflicts among candidates into the Sharon graph. Based on the graph, we map our problem of finding an optimal sharing plan to the Maximum Weight Independent Set (MWIS) problem. We then use the guaranteed weight of a greedy algorithm for the MWIS problem to prune the search of our sharing plan finder without sacrificing its optimality. The Sharon optimizer is shown to produce sharing plans that achieve up to an 18-fold speed-up compared to state-of-the-art approaches.
Submitted 6 October, 2020;
originally announced October 2020.
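The abstract above maps sharing-plan selection to the Maximum Weight Independent Set (MWIS) problem and relies on a greedy algorithm's guaranteed weight. As a rough illustration only, and not the Sharon optimizer itself, the sketch below runs one common greedy MWIS heuristic on a small conflict graph: it repeatedly picks the candidate with the highest weight divided by (remaining degree + 1) and discards its conflicting neighbors. The function name, the candidate labels, and the weights are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def greedy_mwis(weights: Dict[str, float],
                conflicts: Iterable[Tuple[str, str]]) -> Set[str]:
    """Greedy heuristic for Maximum Weight Independent Set.

    weights:   benefit of each sharing candidate (hypothetical values)
    conflicts: pairs of candidates that cannot both be selected
    """
    neighbors: Dict[str, Set[str]] = {v: set() for v in weights}
    for u, v in conflicts:
        neighbors[u].add(v)
        neighbors[v].add(u)

    remaining = set(weights)
    chosen: Set[str] = set()
    while remaining:
        # Prefer heavy candidates that conflict with few remaining ones.
        # Sorting first makes tie-breaking deterministic.
        best = max(
            sorted(remaining),
            key=lambda v: weights[v] / (len(neighbors[v] & remaining) + 1),
        )
        chosen.add(best)
        # Selecting a candidate rules out everything it conflicts with.
        remaining -= neighbors[best] | {best}
    return chosen


if __name__ == "__main__":
    plan = greedy_mwis({"a": 5, "b": 4, "c": 3}, [("a", "b"), ("b", "c")])
    print(sorted(plan))
```

The result is an independent set (no two selected candidates conflict) but not necessarily an optimal one; per the abstract, Sharon uses the greedy guarantee only as a bound to prune its search for an optimal plan.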
-
GRETA: Graph-based Real-time Event Trend Aggregation
Authors:
Olga Poppe,
Chuan Lei,
Elke A. Rundensteiner,
David Maier
Abstract:
Streaming applications from algorithmic trading to traffic management deploy Kleene patterns to detect and aggregate arbitrarily long event sequences, called event trends. State-of-the-art systems process such queries in two steps. Namely, they first construct all trends and then aggregate them. Due to the exponential costs of trend construction, this two-step approach suffers from both long delays and high memory costs. To overcome these limitations, we propose the Graph-based Real-time Event Trend Aggregation (Greta) approach that dynamically computes event trend aggregation without first constructing these trends. We define the Greta graph to compactly encode all trends. Our Greta runtime incrementally maintains the graph, while dynamically propagating aggregates along its edges. Based on the graph, the final aggregate is incrementally updated and instantaneously returned at the end of each query window. Our Greta runtime represents a win-win solution, reducing both the time complexity from exponential to quadratic and the space complexity from exponential to linear in the number of events. Our experiments demonstrate that Greta achieves up to four orders of magnitude speed-up and up to 50-fold memory reduction compared to the state-of-the-art two-step approaches.
Submitted 6 October, 2020;
originally announced October 2020.
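The idea of aggregating trends without constructing them can be illustrated with a much simplified sketch. It assumes a single event type, a Kleene pattern of the form A+, a COUNT aggregate, and a caller-supplied predicate on adjacent events; none of these names or restrictions come from the paper. Each event stores only the number of trends ending at it, obtained by summing the counts of its compatible predecessors, which gives quadratic time and linear space in the number of events.

```python
from typing import Callable, List, Sequence


def count_trends(events: Sequence[float],
                 adjacent_ok: Callable[[float, float], bool]) -> int:
    """Count all event trends without enumerating them.

    A trend here is any non-empty subsequence of `events` (in arrival
    order) in which every pair of adjacent events satisfies `adjacent_ok`.
    """
    counts: List[int] = []   # counts[i] = number of trends ending at event i
    total = 0
    for i, event in enumerate(events):
        ending_here = 1      # the trend consisting of this event alone
        for j in range(i):
            # Every trend ending at a compatible predecessor extends to `event`.
            if adjacent_ok(events[j], event):
                ending_here += counts[j]
        counts.append(ending_here)
        total += ending_here
    return total


if __name__ == "__main__":
    # Trends of strictly increasing values, e.g. rising prices.
    print(count_trends([1, 3, 2], lambda prev, cur: prev < cur))
```

For three strictly increasing events the function returns 7, one per non-empty subsequence, even though no subsequence is ever materialized.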
-
Event Trend Aggregation Under Rich Event Matching Semantics
Authors:
Olga Poppe,
Chuan Lei,
Elke A. Rundensteiner,
David Maier
Abstract:
Streaming applications from health care analytics to algorithmic trading deploy Kleene queries to detect and aggregate event trends. Rich event matching semantics determine how to compose events into trends. The expressive power of state-of-the-art systems remains limited in that they do not support the rich variety of these semantics. Worse yet, they suffer from long delays and high memory costs because they opt to maintain aggregates at a fine granularity. To overcome these limitations, our Coarse-Grained Event Trend Aggregation (Cogra) approach supports this rich diversity of event matching semantics within one system. Better yet, Cogra incrementally maintains aggregates at the coarsest granularity possible for each of these semantics. In this way, Cogra minimizes the number of aggregates -- reducing both time and space complexity. Our experiments demonstrate that Cogra achieves up to four orders of magnitude speed-up and up to eight orders of magnitude memory reduction compared to state-of-the-art approaches.
Submitted 6 October, 2020;
originally announced October 2020.
-
Seagull: An Infrastructure for Load Prediction and Optimized Resource Allocation
Authors:
Olga Poppe,
Tayo Amuneke,
Dalitso Banda,
Aritra De,
Ari Green,
Manon Knoertzer,
Ehi Nosakhare,
Karthik Rajendran,
Deepak Shankargouda,
Meina Wang,
Alan Au,
Carlo Curino,
Qun Guo,
Alekh Jindal,
Ajay Kalhan,
Morgan Oslake,
Sonia Parchani,
Vijay Ramani,
Raj Sellappan,
Saikat Sen,
Sheetal Shrotri,
Soundararajan Srinivasan,
Ping Xia,
Shize Xu,
Alicia Yang
, et al. (1 additional authors not shown)
Abstract:
Microsoft Azure is dedicated to guaranteeing high quality of service to its customers, in particular during periods of high customer activity, while controlling cost. We employ a Data Science (DS) driven solution to predict user load and leverage these predictions to optimize resource allocation. To this end, we built the Seagull infrastructure that processes per-server telemetry, validates the data, and trains and deploys ML models. The models are used to predict customer load per server (24h into the future) and to optimize service operations. Seagull continually re-evaluates the accuracy of predictions, falls back to previously known good models, and triggers alerts as appropriate. We deployed this infrastructure in production for PostgreSQL and MySQL servers across all Azure regions, and applied it to the problem of scheduling server backups during low-load time. This minimizes interference with user-induced load and improves customer experience.
Submitted 16 October, 2020; v1 submitted 27 September, 2020;
originally announced September 2020.
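To make the backup-scheduling use case concrete, here is a minimal sketch of how a predicted load curve could be turned into a backup start time. It is not Seagull's scheduler: the hourly granularity, the fixed backup duration, and the function name are assumptions made for illustration. Given a per-server forecast, it slides a fixed-length window over the horizon and returns the start index with the lowest total predicted load.

```python
from typing import Sequence


def lowest_load_window(predicted_load: Sequence[float], window: int) -> int:
    """Return the start index of the contiguous window with minimal total load.

    predicted_load: forecast per time slot (assumed hourly, e.g. 24 values)
    window:         backup duration in slots (assumed fixed and known)
    """
    if window <= 0 or window > len(predicted_load):
        raise ValueError("window must be between 1 and len(predicted_load)")

    current = sum(predicted_load[:window])
    best_start, best_sum = 0, current
    for start in range(1, len(predicted_load) - window + 1):
        # Slide by one slot: add the slot entering, drop the slot leaving.
        current += predicted_load[start + window - 1] - predicted_load[start - 1]
        if current < best_sum:
            best_start, best_sum = start, current
    return best_start


if __name__ == "__main__":
    forecast = [9, 8, 7, 2, 1, 3, 9, 9]   # hypothetical predicted load
    print(lowest_load_window(forecast, window=2))
```

Ties are resolved in favor of the earliest window; a production scheduler would also need to account for prediction uncertainty, which the abstract addresses through continual accuracy re-evaluation and model fallback.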
-
MLOS: An Infrastructure for Automated Software Performance Engineering
Authors:
Carlo Curino,
Neha Godwal,
Brian Kroth,
Sergiy Kuryata,
Greg Lapinski,
Siqi Liu,
Slava Oks,
Olga Poppe,
Adam Smiechowski,
Ed Thayer,
Markus Weimer,
Yiwen Zhu
Abstract:
Developing modern systems software is a complex task that combines business logic programming and Software Performance Engineering (SPE). The latter is an experimental and labor-intensive activity focused on optimizing the system for a given hardware, software, and workload (hw/sw/wl) context.
Today's SPE is performed during build/release phases by specialized teams, and cursed by: 1) lack of standardized and automated tools, 2) significant repeated work as hw/sw/wl context changes, 3) fragility induced by a "one-size-fits-all" tuning (where improvements on one workload or component may impact others). The net result: despite costly investments, system software is often outside its optimal operating point - anecdotally leaving 30% to 40% of performance on the table.
The recent developments in Data Science (DS) hint at an opportunity: combining DS tooling and methodologies with a new developer experience to transform the practice of SPE. In this paper we present MLOS, an ML-powered infrastructure and methodology to democratize and automate Software Performance Engineering. MLOS enables continuous, instance-level, robust, and trackable systems optimization. MLOS is being developed and employed within Microsoft to optimize SQL Server performance. Early results indicated that component-level optimizations can lead to 20%-90% improvements when custom-tuning for a specific hw/sw/wl, hinting at a significant opportunity. However, several research challenges remain that will require community involvement. To this end, we are in the process of open-sourcing the MLOS core infrastructure, and we are engaging with academic institutions to create an educational program around Software 2.0 and MLOS ideas.
Submitted 4 June, 2020; v1 submitted 1 June, 2020;
originally announced June 2020.
-
Cloudy with high chance of DBMS: A 10-year prediction for Enterprise-Grade ML
Authors:
Ashvin Agrawal,
Rony Chatterjee,
Carlo Curino,
Avrilia Floratou,
Neha Gowdal,
Matteo Interlandi,
Alekh Jindal,
Kostantinos Karanasos,
Subru Krishnan,
Brian Kroth,
Jyoti Leeka,
Kwanghyun Park,
Hiren Patel,
Olga Poppe,
Fotis Psallidas,
Raghu Ramakrishnan,
Abhishek Roy,
Karla Saur,
Rathijit Sen,
Markus Weimer,
Travis Wright,
Yiwen Zhu
Abstract:
Machine learning (ML) has proven itself in high-value web applications such as search ranking and is emerging as a powerful tool in a much broader range of enterprise scenarios including voice recognition and conversational understanding for customer support, autotuning for videoconferencing, intelligent feedback loops in large-scale sysops, manufacturing and autonomous vehicle management, and complex financial predictions, just to name a few. Meanwhile, as the value of data is increasingly recognized and monetized, concerns about securing valuable data and risks to individual privacy have been growing. Consequently, rigorous data management has emerged as a key requirement in enterprise settings. How will these trends (ML's growing popularity and stricter data governance) intersect? What are the unmet requirements for applying ML in enterprise settings? What are the technical challenges for the DB community to solve? In this paper, we present our vision of how ML and database systems are likely to come together, and the early steps we take towards making this vision a reality.
Submitted 27 December, 2019; v1 submitted 30 August, 2019;
originally announced September 2019.