Showing 1–31 of 31 results for author: Scheinert, D

  1. arXiv:2403.05692  [pdf, other]

    cs.DC

    Privacy-Preserving Sharing of Data Analytics Runtime Metrics for Performance Modeling

    Authors: Jonathan Will, Dominik Scheinert, Jan Bode, Cedric Kring, Seraphin Zunzer, Lauritz Thamsen

    Abstract: Performance modeling for large-scale data analytics workloads can improve the efficiency of cluster resource allocations and job scheduling. However, the performance of these workloads is influenced by numerous factors, such as job inputs and the assigned cluster resources. As a result, performance models require significant amounts of training data. This data can be obtained by exchanging runtime…

    Submitted 13 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: 4 pages, 4 figures, presented at the WOSP-C workshop at ICPE 2024

  2. arXiv:2403.02129  [pdf, other]

    cs.DC

    Demeter: Resource-Efficient Distributed Stream Processing under Dynamic Loads with Multi-Configuration Optimization

    Authors: Morgan Geldenhuys, Dominik Scheinert, Odej Kao, Lauritz Thamsen

    Abstract: Distributed Stream Processing (DSP) focuses on the near real-time processing of large streams of unbounded data. To increase processing capacities, DSP systems are able to dynamically scale across a cluster of commodity nodes, ensuring a good Quality of Service despite variable workloads. However, selecting scaleout configurations which maximize resource utilization remains a challenge. This is es…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 12 pages, 14 figures, published at ICPE 2024

  3. arXiv:2403.02093  [pdf, other]

    cs.DC

    Daedalus: Self-Adaptive Horizontal Autoscaling for Resource Efficiency of Distributed Stream Processing Systems

    Authors: Benjamin J. J. Pfister, Dominik Scheinert, Morgan K. Geldenhuys, Odej Kao

    Abstract: Distributed Stream Processing (DSP) systems are capable of processing large streams of unbounded data, offering high throughput and low latencies. To maintain a stable Quality of Service (QoS), these systems require a sufficient allocation of resources. At the same time, over-provisioning can result in wasted energy and high operating costs. Therefore, to maximize resource utilization, autoscaling…

    Submitted 5 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: 12 pages, 11 figures, 1 table

  4. Towards a Peer-to-Peer Data Distribution Layer for Efficient and Collaborative Resource Optimization of Distributed Dataflow Applications

    Authors: Dominik Scheinert, Soeren Becker, Jonathan Will, Luis Englaender, Lauritz Thamsen

    Abstract: Performance modeling can help to improve the resource efficiency of clusters and distributed dataflow applications, yet the available modeling data is often limited. Collaborative approaches to performance modeling, characterized by the sharing of performance data or models, have been shown to improve resource efficiency, but there has been little focus on actual data sharing strategies and implem…

    Submitted 23 January, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: 7 pages, 4 figures, 2 tables

    Journal ref: IEEE BigData (2023) 2339-2345

  5. Karasu: A Collaborative Approach to Efficient Cluster Configuration for Big Data Analytics

    Authors: Dominik Scheinert, Philipp Wiesner, Thorsten Wittkopp, Lauritz Thamsen, Jonathan Will, Odej Kao

    Abstract: Selecting the right resources for big data analytics jobs is hard because of the wide variety of configuration options like machine type and cluster size. As poor choices can have a significant impact on resource efficiency, cost, and energy usage, automated approaches are gaining popularity. Most existing methods rely on profiling recurring workloads to find near-optimal solutions over time. Due…

    Submitted 23 November, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: 10 pages, 9 figures

    Journal ref: IEEE IPCCC (2023) 403-412

  6. Evaluation of Data Enrichment Methods for Distributed Stream Processing Systems

    Authors: Dominik Scheinert, Fabian Casares, Morgan K. Geldenhuys, Kevin Styp-Rekowski, Odej Kao

    Abstract: Stream processing has become a critical component in the architecture of modern applications. With the exponential growth of data generation from sources such as the Internet of Things, business intelligence, and telecommunications, real-time processing of unbounded data streams has become a necessity. DSP systems provide a solution to this challenge, offering high horizontal scalability, fault-to…

    Submitted 23 November, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: 10 pages, 13 figures, 2 tables

    Journal ref: IEEE IC2E (2023) 202-211

  7. Selecting Efficient Cluster Resources for Data Analytics: When and How to Allocate for In-Memory Processing?

    Authors: Jonathan Will, Lauritz Thamsen, Dominik Scheinert, Odej Kao

    Abstract: Distributed dataflow systems such as Apache Spark or Apache Flink enable parallel, in-memory data processing on large clusters of commodity hardware. Consequently, the appropriate amount of memory to allocate to the cluster is a crucial consideration. In this paper, we analyze the challenge of efficient resource allocation for distributed data processing, focusing on memory. We emphasize that in…

    Submitted 7 June, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: 4 pages, 3 Figures; ACM SSDBM 2023

    ACM Class: C.2.4; C.4; I.2.8; H.2.8; H.2.4

  8. arXiv:2301.10681  [pdf, other]

    cs.LG

    PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning

    Authors: Thorsten Wittkopp, Dominik Scheinert, Philipp Wiesner, Alexander Acker, Odej Kao

    Abstract: Due to the complexity of modern IT services, failures can be manifold, occur at any stage, and are hard to detect. For this reason, anomaly detection applied to monitoring data such as logs allows gaining relevant insights to improve IT services steadily and eradicate failures. However, existing anomaly detection methods that provide high accuracy often rely on labeled training data, which are tim…

    Submitted 25 January, 2023; originally announced January 2023.

    Comments: published in the proceedings of the 56th Hawaii International Conference on System Sciences (HICSS 2023)

  9. Probabilistic Time Series Forecasting for Adaptive Monitoring in Edge Computing Environments

    Authors: Dominik Scheinert, Babak Sistani Zadeh Aghdam, Soeren Becker, Odej Kao, Lauritz Thamsen

    Abstract: With increasingly more computation being shifted to the edge of the network, monitoring of critical infrastructures, such as intermediate processing nodes in autonomous driving, is further complicated due to the typically resource-constrained environments. In order to reduce the resource overhead on the network link imposed by monitoring, various methods have been discussed that either follow a fi…

    Submitted 30 January, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

    Comments: 6 pages, 5 figures, 2 tables

    Journal ref: IEEE BigData (2022) 4583-4588

  10. Perona: Robust Infrastructure Fingerprinting for Resource-Efficient Big Data Analytics

    Authors: Dominik Scheinert, Soeren Becker, Jonathan Bader, Lauritz Thamsen, Jonathan Will, Odej Kao

    Abstract: Choosing a good resource configuration for big data analytics applications can be challenging, especially in cloud environments. Automated approaches are desirable as poor decisions can reduce performance and raise costs. The majority of existing automated approaches either build performance models from previous workload executions or conduct iterative resource configuration profiling until a near…

    Submitted 30 January, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: 8 pages, 5 figures, 3 tables

    Journal ref: IEEE BigData (2022) 209-216

  11. Ruya: Memory-Aware Iterative Optimization of Cluster Configurations for Big Data Processing

    Authors: Jonathan Will, Lauritz Thamsen, Jonathan Bader, Dominik Scheinert, Odej Kao

    Abstract: Selecting appropriate computational resources for data processing jobs on large clusters is difficult, even for expert users like data engineers. Inadequate choices can result in vastly increased costs, without significantly improving performance. One crucial aspect of selecting an efficient resource configuration is avoiding memory bottlenecks. By knowing the required memory of a job in advance,…

    Submitted 3 February, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

    Comments: 9 pages, 5 Figures, 3 Tables; IEEE BigData 2022. arXiv admin note: substantial text overlap with arXiv:2206.13852

    ACM Class: C.2.4; I.2.8; I.2.6

    Journal ref: 2022 IEEE International Conference on Big Data (Big Data) pp. 161-169

  12. Reshi: Recommending Resources for Scientific Workflow Tasks on Heterogeneous Infrastructures

    Authors: Jonathan Bader, Fabian Lehmann, Alexander Groth, Lauritz Thamsen, Dominik Scheinert, Jonathan Will, Ulf Leser, Odej Kao

    Abstract: Scientific workflows typically comprise a multitude of different processing steps which often are executed in parallel on different partitions of the input data. These executions, in turn, must be scheduled on the compute nodes of the computational infrastructure at hand. This assignment is complicated by the facts that (a) tasks typically have highly heterogeneous resource requirements and (b) in…

    Submitted 17 October, 2022; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: Paper accepted in 41st IEEE International Performance Computing and Communications Conference (IPCCC 2022)

  13. arXiv:2207.09298  [pdf, other]

    cs.DC cs.AI

    Magpie: Automatically Tuning Static Parameters for Distributed File Systems using Deep Reinforcement Learning

    Authors: Houkun Zhu, Dominik Scheinert, Lauritz Thamsen, Kordian Gontarska, Odej Kao

    Abstract: Distributed file systems are widely used nowadays, yet using their default configurations is often not optimal. At the same time, tuning configuration parameters is typically challenging and time-consuming. It demands expertise and tuning operations can also be expensive. This is especially the case for static parameters, where changes take effect only after a restart of the system or workloads. W…

    Submitted 22 July, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

    Comments: Accepted at The IEEE International Conference on Cloud Engineering (IC2E) conference 2022

  14. Get Your Memory Right: The Crispy Resource Allocation Assistant for Large-Scale Data Processing

    Authors: Jonathan Will, Lauritz Thamsen, Jonathan Bader, Dominik Scheinert, Odej Kao

    Abstract: Distributed dataflow systems like Apache Spark and Apache Hadoop enable data-parallel processing of large datasets on clusters. Yet, selecting appropriate computational resources for dataflow jobs -- that neither lead to bottlenecks nor to low resource utilization -- is often challenging, even for expert users such as data engineers. Further, existing automated approaches to resource selection rel…

    Submitted 10 January, 2023; v1 submitted 28 June, 2022; originally announced June 2022.

    Comments: 9 pages, 3 figures, 2 tables, IEEE IC2E 2022

    ACM Class: C.2.4; I.2.8; I.2.6

    Journal ref: 2022 IEEE International Conference on Cloud Engineering (IC2E), pp. 58-66

  15. arXiv:2206.09679  [pdf, other]

    cs.DC

    Phoebe: QoS-Aware Distributed Stream Processing through Anticipating Dynamic Workloads

    Authors: Morgan K. Geldenhuys, Dominik Scheinert, Odej Kao, Lauritz Thamsen

    Abstract: Distributed Stream Processing systems have become an essential part of big data processing platforms. They are characterized by the high-throughput processing of near to real-time event streams with the goal of delivering low-latency results and thus enabling time-sensitive decision making. At the same time, results are expected to be consistent even in the presence of partial failures where exact…

    Submitted 20 June, 2022; originally announced June 2022.

    Comments: 10 pages, ICWS2022

  16. Collaborative Cluster Configuration for Distributed Data-Parallel Processing: A Research Overview

    Authors: Lauritz Thamsen, Dominik Scheinert, Jonathan Will, Jonathan Bader, Odej Kao

    Abstract: Many organizations routinely analyze large datasets using systems for distributed data-parallel processing and clusters of commodity resources. Yet, users need to configure adequate resources for their data processing jobs. This requires significant insights into expected job runtimes and scaling behavior, resource characteristics, input data distributions, and other factors. Unable to estimate pe…

    Submitted 1 June, 2022; originally announced June 2022.

  17. Cucumber: Renewable-Aware Admission Control for Delay-Tolerant Cloud and Edge Workloads

    Authors: Philipp Wiesner, Dominik Scheinert, Thorsten Wittkopp, Lauritz Thamsen, Odej Kao

    Abstract: The growing electricity demand of cloud and edge computing increases operational costs and will soon have a considerable impact on the environment. A possible countermeasure is equipping IT infrastructure directly with on-site renewable energy sources. Yet, particularly smaller data centers may not be able to use all generated power directly at all times, while feeding it into the public grid or e…

    Submitted 27 August, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

    Comments: Accepted at Euro-Par 2022. GitHub repository: https://github.com/dos-group/cucumber

  18. arXiv:2203.05362  [pdf, other]

    cs.DC

    Efficient Runtime Profiling for Black-box Machine Learning Services on Sensor Streams

    Authors: Soeren Becker, Dominik Scheinert, Florian Schmidt, Odej Kao

    Abstract: In highly distributed environments such as cloud, edge and fog computing, the application of machine learning for automating and optimizing processes is on the rise. Machine learning jobs are frequently applied in streaming conditions, where models are used to analyze data streams originating from e.g. video streams or sensory data. Often the results for particular data samples need to be provided…

    Submitted 10 March, 2022; originally announced March 2022.

    Comments: Accepted as a short paper at the 6th IEEE International Conference on Fog and Edge Computing 2022

  19. arXiv:2111.13462  [pdf, other]

    cs.DB cs.GL cs.LG

    A Taxonomy of Anomalies in Log Data

    Authors: Thorsten Wittkopp, Philipp Wiesner, Dominik Scheinert, Odej Kao

    Abstract: Log data anomaly detection is a core component in the area of artificial intelligence for IT operations. However, the large amount of existing methods makes it hard to choose the right approach for a specific system. A better understanding of different kinds of anomalies, and which algorithms are suitable for detecting them, would support researchers and IT operators. Although a common taxonomy fo…

    Submitted 26 November, 2021; originally announced November 2021.

    Comments: Paper accepted and presented at AIOPS workshop 2021 co-located with ICSOC 2021

  20. On the Potential of Execution Traces for Batch Processing Workload Optimization in Public Clouds

    Authors: Dominik Scheinert, Alireza Alamgiralem, Jonathan Bader, Jonathan Will, Thorsten Wittkopp, Lauritz Thamsen

    Abstract: With the growing amount of data, data processing workloads and the management of their resource usage becomes increasingly important. Since managing a dedicated infrastructure is in many situations infeasible or uneconomical, users progressively execute their respective workloads in the cloud. As the configuration of workloads and resources is often challenging, various methods have been proposed…

    Submitted 16 January, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

    Comments: 6 pages, 5 figures, 1 table

    Journal ref: IEEE BigData (2021) 3113-3118

  21. Training Data Reduction for Performance Models of Data Analytics Jobs in the Cloud

    Authors: Jonathan Will, Onur Arslan, Jonathan Bader, Dominik Scheinert, Lauritz Thamsen

    Abstract: Distributed dataflow systems like Apache Flink and Apache Spark simplify processing large amounts of data on clusters in a data-parallel manner. However, choosing suitable cluster resources for distributed dataflow jobs in both type and number is difficult, especially for users who do not have access to previous performance metrics. One approach to overcoming this issue is to have users share runt…

    Submitted 11 March, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: 6 pages, 5 figures, Accepted for the BPOD Workshop at IEEE Big Data 2021

    ACM Class: C.2.4; I.2.8; I.2.6

    Journal ref: IEEE Big Data (2021) 3141-3146

  22. LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision

    Authors: Thorsten Wittkopp, Philipp Wiesner, Dominik Scheinert, Alexander Acker

    Abstract: With increasing scale and complexity of cloud operations, automated detection of anomalies in monitoring data such as logs will be an essential part of managing future IT infrastructures. However, many methods based on artificial intelligence, such as supervised deep learning models, require large amounts of labeled training data to perform well. In practice, this data is rarely available because…

    Submitted 25 November, 2021; v1 submitted 2 November, 2021; originally announced November 2021.

    Comments: Paper accepted at ICSOC 2021 and published by Springer

    Journal ref: 19th International Conference on Service-Oriented Computing, 2021, 700-707

  23. Let's Wait Awhile: How Temporal Workload Shifting Can Reduce Carbon Emissions in the Cloud

    Authors: Philipp Wiesner, Ilja Behnke, Dominik Scheinert, Kordian Gontarska, Lauritz Thamsen

    Abstract: Depending on energy sources and demand, the carbon intensity of the public power grid fluctuates over time. Exploiting this variability is an important factor in reducing the emissions caused by data centers. However, regional differences in the availability of low-carbon energy sources make it hard to provide general best practices for when to consume electricity. Moreover, existing research in t…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: To be published in the proceedings of the 22nd International Middleware Conference (Middleware '21), December 6-10, 2021, Virtual Event, Canada

  24. arXiv:2109.09537  [pdf, other]

    cs.LG

    A2Log: Attentive Augmented Log Anomaly Detection

    Authors: Thorsten Wittkopp, Alexander Acker, Sasho Nedelkoski, Jasmin Bogatinovski, Dominik Scheinert, Wu Fan, Odej Kao

    Abstract: Anomaly detection becomes increasingly important for the dependability and serviceability of IT services. As log lines record events during the execution of IT services, they are a primary source for diagnostics. Thereby, unsupervised methods provide a significant benefit since not all anomalies can be known at training time. Existing unsupervised methods need anomaly examples to obtain a suitable…

    Submitted 20 September, 2021; originally announced September 2021.

    Comments: This paper has been accepted for HICSS 2022 and will appear in the conference proceedings

  25. arXiv:2109.02340  [pdf, other]

    cs.DC

    Khaos: Dynamically Optimizing Checkpointing for Dependable Distributed Stream Processing

    Authors: Morgan K. Geldenhuys, Benjamin J. J. Pfister, Dominik Scheinert, Lauritz Thamsen, Odej Kao

    Abstract: Distributed Stream Processing systems are becoming an increasingly essential part of Big Data processing platforms as users grow ever more reliant on their ability to provide fast access to new results. As such, making timely decisions based on these results is dependent on a system's ability to tolerate failure. Typically, these systems achieve fault tolerance and the ability to recover automatic…

    Submitted 26 January, 2023; v1 submitted 6 September, 2021; originally announced September 2021.

  26. Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using Graph Propagation

    Authors: Dominik Scheinert, Houkun Zhu, Lauritz Thamsen, Morgan K. Geldenhuys, Jonathan Will, Alexander Acker, Odej Kao

    Abstract: Distributed dataflow systems like Spark and Flink enable the use of clusters for scalable data analytics. While runtime prediction models can be used to initially select appropriate cluster resources given target runtimes, the actual runtime performance of dataflow jobs depends on several factors and varies over time. Yet, in many situations, dynamic scaling can be used to meet formulated runtime…

    Submitted 26 January, 2022; v1 submitted 27 August, 2021; originally announced August 2021.

    Comments: 8 pages, 5 figures, 3 tables

    Journal ref: IEEE IPCCC (2021) 1-8

  27. arXiv:2108.04749  [pdf, other]

    cs.DC cs.AI

    Evaluation of Load Prediction Techniques for Distributed Stream Processing

    Authors: Kordian Gontarska, Morgan Geldenhuys, Dominik Scheinert, Philipp Wiesner, Andreas Polze, Lauritz Thamsen

    Abstract: Distributed Stream Processing (DSP) systems enable processing large streams of continuous data to produce results in near to real time. They are an essential part of many data-intensive applications and analytics platforms. The rate at which events arrive at DSP systems can vary considerably over time, which may be due to trends, cyclic, and seasonal patterns within the data streams. A priori know…

    Submitted 10 August, 2021; originally announced August 2021.

  28. Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across Contexts

    Authors: Dominik Scheinert, Lauritz Thamsen, Houkun Zhu, Jonathan Will, Alexander Acker, Thorsten Wittkopp, Odej Kao

    Abstract: Distributed dataflow systems enable the use of clusters for scalable data analytics. However, selecting appropriate cluster resources for a processing job is often not straightforward. Performance models trained on historical executions of a concrete job are helpful in such situations, yet they are usually bound to a specific job execution context (e.g. node type, software versions, job parameters…

    Submitted 17 October, 2021; v1 submitted 29 July, 2021; originally announced July 2021.

    Comments: 10 pages, 8 figures, 2 tables

    Journal ref: IEEE CLUSTER (2021) 261-270

  29. C3O: Collaborative Cluster Configuration Optimization for Distributed Data Processing in Public Clouds

    Authors: Jonathan Will, Lauritz Thamsen, Dominik Scheinert, Jonathan Bader, Odej Kao

    Abstract: Distributed dataflow systems enable data-parallel processing of large datasets on clusters. Public cloud providers offer a large variety and quantity of resources that can be used for such clusters. Yet, selecting appropriate cloud resources for dataflow jobs - that neither lead to bottlenecks nor to low resource utilization - is often challenging, even for expert users such as data engineers. W…

    Submitted 1 December, 2021; v1 submitted 28 July, 2021; originally announced July 2021.

    Comments: 10 pages, 5 figures, IEEE IC2E 2021. arXiv admin note: text overlap with arXiv:2011.07965

    ACM Class: C.2.4; I.2.8; I.2.6

    Journal ref: IEEE IC2E (2021) 43-52

  30. Learning Dependencies in Distributed Cloud Applications to Identify and Localize Anomalies

    Authors: Dominik Scheinert, Alexander Acker, Lauritz Thamsen, Morgan K. Geldenhuys, Odej Kao

    Abstract: Operation and maintenance of large distributed cloud applications can quickly become unmanageably complex, putting human operators under immense stress when problems occur. Utilizing machine learning for identification and localization of anomalies in such systems supports human experts and enables fast mitigation. However, due to the various inter-dependencies of system components, anomalies do n…

    Submitted 9 September, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

    Comments: 6 pages, 5 figures, 3 tables

    Journal ref: IEEE/ACM CloudIntelligence (2021) 7-12

  31. TELESTO: A Graph Neural Network Model for Anomaly Classification in Cloud Services

    Authors: Dominik Scheinert, Alexander Acker

    Abstract: Deployment, operation and maintenance of large IT systems becomes increasingly complex and puts human experts under extreme stress when problems occur. Therefore, utilization of machine learning (ML) and artificial intelligence (AI) is applied on IT system operation and maintenance - summarized in the term AIOps. One specific direction aims at the recognition of re-occurring anomaly types to enabl…

    Submitted 29 July, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

    Comments: 12 pages, 2 figures, 4 tables

    Journal ref: Springer ICSOC LNCS 12632 (2020) 214-227