
Showing 1–50 of 54 results for author: Thamsen, L

  1. arXiv:2403.05692  [pdf, other]

    cs.DC

    Privacy-Preserving Sharing of Data Analytics Runtime Metrics for Performance Modeling

    Authors: Jonathan Will, Dominik Scheinert, Jan Bode, Cedric Kring, Seraphin Zunzer, Lauritz Thamsen

    Abstract: Performance modeling for large-scale data analytics workloads can improve the efficiency of cluster resource allocations and job scheduling. However, the performance of these workloads is influenced by numerous factors, such as job inputs and the assigned cluster resources. As a result, performance models require significant amounts of training data. This data can be obtained by exchanging runtime…

    Submitted 13 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: 4 pages, 4 figures, presented at the WOSP-C workshop at ICPE 2024

  2. arXiv:2403.02129  [pdf, other]

    cs.DC

    Demeter: Resource-Efficient Distributed Stream Processing under Dynamic Loads with Multi-Configuration Optimization

    Authors: Morgan Geldenhuys, Dominik Scheinert, Odej Kao, Lauritz Thamsen

    Abstract: Distributed Stream Processing (DSP) focuses on the near real-time processing of large streams of unbounded data. To increase processing capacities, DSP systems are able to dynamically scale across a cluster of commodity nodes, ensuring a good Quality of Service despite variable workloads. However, selecting scaleout configurations which maximize resource utilization remains a challenge. This is es…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 12 pages, 14 figures, published at ICPE 2024

  3. The Common Workflow Scheduler Interface: Status Quo and Future Plans

    Authors: Fabian Lehmann, Jonathan Bader, Lauritz Thamsen, Ulf Leser

    Abstract: Nowadays, many scientific workflows from different domains, such as Remote Sensing, Astronomy, and Bioinformatics, are executed on large computing infrastructures managed by resource managers. Scientific workflow management systems (SWMS) support the workflow execution and communicate with the infrastructures' resource managers. However, the communication between SWMS and resource managers is comp…

    Submitted 27 November, 2023; originally announced November 2023.

    Journal ref: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W 2023)

  4. Towards a Peer-to-Peer Data Distribution Layer for Efficient and Collaborative Resource Optimization of Distributed Dataflow Applications

    Authors: Dominik Scheinert, Soeren Becker, Jonathan Will, Luis Englaender, Lauritz Thamsen

    Abstract: Performance modeling can help to improve the resource efficiency of clusters and distributed dataflow applications, yet the available modeling data is often limited. Collaborative approaches to performance modeling, characterized by the sharing of performance data or models, have been shown to improve resource efficiency, but there has been little focus on actual data sharing strategies and implem…

    Submitted 23 January, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: 7 pages, 4 figures, 2 tables

    Journal ref: IEEE BigData (2023) 2339-2345

  5. arXiv:2311.08185  [pdf, other]

    cs.DC

    Predicting Dynamic Memory Requirements for Scientific Workflow Tasks

    Authors: Jonathan Bader, Nils Diedrich, Lauritz Thamsen, Odej Kao

    Abstract: With the increasing amount of data available to scientists in disciplines as diverse as bioinformatics, physics, and remote sensing, scientific workflow systems are becoming increasingly important for composing and executing scalable data analysis pipelines. When writing such workflows, users need to specify the resources to be reserved for tasks so that sufficient resources are allocated on the t…

    Submitted 19 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Paper accepted in 2023 IEEE International Conference on Big Data

  6. Lotaru: Locally Predicting Workflow Task Runtimes for Resource Management on Heterogeneous Infrastructures

    Authors: Jonathan Bader, Fabian Lehmann, Lauritz Thamsen, Ulf Leser, Odej Kao

    Abstract: Many resource management techniques for task scheduling, energy and carbon efficiency, and cost optimization in workflows rely on a-priori task runtime knowledge. Building runtime prediction models on historical data is often not feasible in practice as workflows, their input data, and the cluster infrastructure change. Online methods, on the other hand, which estimate task runtimes on specific ma…

    Submitted 13 September, 2023; originally announced September 2023.

    Journal ref: Future Generation Computer Systems, Volume 150, January 2024, Pages 171-185

  7. Karasu: A Collaborative Approach to Efficient Cluster Configuration for Big Data Analytics

    Authors: Dominik Scheinert, Philipp Wiesner, Thorsten Wittkopp, Lauritz Thamsen, Jonathan Will, Odej Kao

    Abstract: Selecting the right resources for big data analytics jobs is hard because of the wide variety of configuration options like machine type and cluster size. As poor choices can have a significant impact on resource efficiency, cost, and energy usage, automated approaches are gaining popularity. Most existing methods rely on profiling recurring workloads to find near-optimal solutions over time. Due…

    Submitted 23 November, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: 10 pages, 9 figures

    Journal ref: IEEE IPCCC (2023) 403-412

  8. Selecting Efficient Cluster Resources for Data Analytics: When and How to Allocate for In-Memory Processing?

    Authors: Jonathan Will, Lauritz Thamsen, Dominik Scheinert, Odej Kao

    Abstract: Distributed dataflow systems such as Apache Spark or Apache Flink enable parallel, in-memory data processing on large clusters of commodity hardware. Consequently, the appropriate amount of memory to allocate to the cluster is a crucial consideration. In this paper, we analyze the challenge of efficient resource allocation for distributed data processing, focusing on memory. We emphasize that in…

    Submitted 7 June, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: 4 pages, 3 Figures; ACM SSDBM 2023

    ACM Class: C.2.4; C.4; I.2.8; H.2.8; H.2.4

  9. FedZero: Leveraging Renewable Excess Energy in Federated Learning

    Authors: Philipp Wiesner, Ramin Khalili, Dennis Grinwald, Pratik Agrawal, Lauritz Thamsen, Odej Kao

    Abstract: Federated Learning (FL) is an emerging machine learning technique that enables distributed model training across data silos or edge devices without data sharing. Yet, FL inevitably introduces inefficiencies compared to centralized model training, which will further increase the already high energy usage and associated carbon emissions of machine learning in the future. One idea to reduce FL's carb…

    Submitted 10 January, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted for publication at ACM e-Energy '24

  10. Towards a Real-Time IoT: Approaches for Incoming Packet Processing in Cyber-Physical Systems

    Authors: Ilja Behnke, Christoph Blumschein, Robert Danicki, Philipp Wiesner, Lauritz Thamsen, Odej Kao

    Abstract: Embedded real-time devices for monitoring, controlling, and collaboration purposes in cyber-physical systems are now commonly equipped with IP networking capabilities. However, the reception and processing of IP packets generates workloads in unpredictable frequencies as networks are outside of a developer's control and difficult to anticipate, especially when networks are connected to the interne…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: text overlap with arXiv:2204.08846

    Journal ref: Journal of Systems Architecture. 140 (2023)

  11. Towards Energy Consumption and Carbon Footprint Testing for AI-driven IoT Services

    Authors: Demetris Trihinas, Lauritz Thamsen, Jossekin Beilharz, Moysis Symeonides

    Abstract: Energy consumption and carbon emissions are expected to be crucial factors for Internet of Things (IoT) applications. Both the scale and the geo-distribution keep increasing, while Artificial Intelligence (AI) further penetrates the "edge" in order to satisfy the need for highly-responsive and intelligent services. To date, several edge/fog emulators are catering for IoT testing by supporting the…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: Presented at the 2nd International Workshop on Testing Distributed Internet of Things Systems (TDIS 2022)

    Journal ref: 2022 IEEE International Conference on Cloud Engineering (IC2E 2022)

  12. How Workflow Engines Should Talk to Resource Managers: A Proposal for a Common Workflow Scheduling Interface

    Authors: Fabian Lehmann, Jonathan Bader, Friedrich Tschirpke, Lauritz Thamsen, Ulf Leser

    Abstract: Scientific workflow management systems (SWMSs) and resource managers together ensure that tasks are scheduled on provisioned resources so that all dependencies are obeyed, and some optimization goal, such as makespan minimization, is achieved. In practice, however, there is no clear separation of scheduling responsibilities between an SWMS and a resource manager because there exists no agreed-upon…

    Submitted 13 July, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

    Journal ref: 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid)

  13. Probabilistic Time Series Forecasting for Adaptive Monitoring in Edge Computing Environments

    Authors: Dominik Scheinert, Babak Sistani Zadeh Aghdam, Soeren Becker, Odej Kao, Lauritz Thamsen

    Abstract: With increasingly more computation being shifted to the edge of the network, monitoring of critical infrastructures, such as intermediate processing nodes in autonomous driving, is further complicated due to the typically resource-constrained environments. In order to reduce the resource overhead on the network link imposed by monitoring, various methods have been discussed that either follow a fi…

    Submitted 30 January, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

    Comments: 6 pages, 5 figures, 2 tables

    Journal ref: IEEE BigData (2022) 4583-4588

  14. Perona: Robust Infrastructure Fingerprinting for Resource-Efficient Big Data Analytics

    Authors: Dominik Scheinert, Soeren Becker, Jonathan Bader, Lauritz Thamsen, Jonathan Will, Odej Kao

    Abstract: Choosing a good resource configuration for big data analytics applications can be challenging, especially in cloud environments. Automated approaches are desirable as poor decisions can reduce performance and raise costs. The majority of existing automated approaches either build performance models from previous workload executions or conduct iterative resource configuration profiling until a near…

    Submitted 30 January, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: 8 pages, 5 figures, 3 tables

    Journal ref: IEEE BigData (2022) 209-216

  15. Ruya: Memory-Aware Iterative Optimization of Cluster Configurations for Big Data Processing

    Authors: Jonathan Will, Lauritz Thamsen, Jonathan Bader, Dominik Scheinert, Odej Kao

    Abstract: Selecting appropriate computational resources for data processing jobs on large clusters is difficult, even for expert users like data engineers. Inadequate choices can result in vastly increased costs, without significantly improving performance. One crucial aspect of selecting an efficient resource configuration is avoiding memory bottlenecks. By knowing the required memory of a job in advance,…

    Submitted 3 February, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

    Comments: 9 pages, 5 Figures, 3 Tables; IEEE BigData 2022. arXiv admin note: substantial text overlap with arXiv:2206.13852

    ACM Class: C.2.4; I.2.8; I.2.6

    Journal ref: 2022 IEEE International Conference on Big Data (Big Data) pp. 161-169

  16. Reshi: Recommending Resources for Scientific Workflow Tasks on Heterogeneous Infrastructures

    Authors: Jonathan Bader, Fabian Lehmann, Alexander Groth, Lauritz Thamsen, Dominik Scheinert, Jonathan Will, Ulf Leser, Odej Kao

    Abstract: Scientific workflows typically comprise a multitude of different processing steps which often are executed in parallel on different partitions of the input data. These executions, in turn, must be scheduled on the compute nodes of the computational infrastructure at hand. This assignment is complicated by the facts that (a) tasks typically have highly heterogeneous resource requirements and (b) in…

    Submitted 17 October, 2022; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: Paper accepted in 41st IEEE International Performance Computing and Communications Conference (IPCCC 2022)

  17. arXiv:2207.09298  [pdf, other]

    cs.DC cs.AI

    Magpie: Automatically Tuning Static Parameters for Distributed File Systems using Deep Reinforcement Learning

    Authors: Houkun Zhu, Dominik Scheinert, Lauritz Thamsen, Kordian Gontarska, Odej Kao

    Abstract: Distributed file systems are widely used nowadays, yet using their default configurations is often not optimal. At the same time, tuning configuration parameters is typically challenging and time-consuming. It demands expertise and tuning operations can also be expensive. This is especially the case for static parameters, where changes take effect only after a restart of the system or workloads. W…

    Submitted 22 July, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

    Comments: Accepted at The IEEE International Conference on Cloud Engineering (IC2E) conference 2022

  18. Get Your Memory Right: The Crispy Resource Allocation Assistant for Large-Scale Data Processing

    Authors: Jonathan Will, Lauritz Thamsen, Jonathan Bader, Dominik Scheinert, Odej Kao

    Abstract: Distributed dataflow systems like Apache Spark and Apache Hadoop enable data-parallel processing of large datasets on clusters. Yet, selecting appropriate computational resources for dataflow jobs -- that neither lead to bottlenecks nor to low resource utilization -- is often challenging, even for expert users such as data engineers. Further, existing automated approaches to resource selection rel…

    Submitted 10 January, 2023; v1 submitted 28 June, 2022; originally announced June 2022.

    Comments: 9 pages, 3 figures, 2 tables, IEEE IC2E 2022

    ACM Class: C.2.4; I.2.8; I.2.6

    Journal ref: 2022 IEEE International Conference on Cloud Engineering (IC2E), pp. 58-66

  19. arXiv:2206.09679  [pdf, other]

    cs.DC

    Phoebe: QoS-Aware Distributed Stream Processing through Anticipating Dynamic Workloads

    Authors: Morgan K. Geldenhuys, Dominik Scheinert, Odej Kao, Lauritz Thamsen

    Abstract: Distributed Stream Processing systems have become an essential part of big data processing platforms. They are characterized by the high-throughput processing of near to real-time event streams with the goal of delivering low-latency results and thus enabling time-sensitive decision making. At the same time, results are expected to be consistent even in the presence of partial failures where exact…

    Submitted 20 June, 2022; originally announced June 2022.

    Comments: 10 pages, ICWS2022

  20. Collaborative Cluster Configuration for Distributed Data-Parallel Processing: A Research Overview

    Authors: Lauritz Thamsen, Dominik Scheinert, Jonathan Will, Jonathan Bader, Odej Kao

    Abstract: Many organizations routinely analyze large datasets using systems for distributed data-parallel processing and clusters of commodity resources. Yet, users need to configure adequate resources for their data processing jobs. This requires significant insights into expected job runtimes and scaling behavior, resource characteristics, input data distributions, and other factors. Unable to estimate pe…

    Submitted 1 June, 2022; originally announced June 2022.

  21. Lotaru: Locally Estimating Runtimes of Scientific Workflow Tasks in Heterogeneous Clusters

    Authors: Jonathan Bader, Fabian Lehmann, Lauritz Thamsen, Jonathan Will, Ulf Leser, Odej Kao

    Abstract: Many scientific workflow scheduling algorithms need to be informed about task runtimes a-priori to conduct efficient scheduling. In heterogeneous cluster infrastructures, this problem becomes aggravated because these runtimes are required for each task-node pair. Using historical data is often not feasible as logs are typically not retained indefinitely and workloads as well as infrastructure chan…

    Submitted 23 May, 2022; originally announced May 2022.

    Comments: paper accepted in 34th International Conference on Scientific and Statistical Database Management (SSDBM 2022)

  22. Cucumber: Renewable-Aware Admission Control for Delay-Tolerant Cloud and Edge Workloads

    Authors: Philipp Wiesner, Dominik Scheinert, Thorsten Wittkopp, Lauritz Thamsen, Odej Kao

    Abstract: The growing electricity demand of cloud and edge computing increases operational costs and will soon have a considerable impact on the environment. A possible countermeasure is equipping IT infrastructure directly with on-site renewable energy sources. Yet, particularly smaller data centers may not be able to use all generated power directly at all times, while feeding it into the public grid or e…

    Submitted 27 August, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

    Comments: Accepted at Euro-Par 2022. GitHub repository: https://github.com/dos-group/cucumber

  23. arXiv:2204.08846  [pdf, other]

    cs.NI cs.OS

    Differentiating Network Flows for Priority-Aware Scheduling of Incoming Packets in Real-Time IoT Systems

    Authors: Christoph Blumschein, Ilja Behnke, Lauritz Thamsen, Odej Kao

    Abstract: When IP-packet processing is unconditionally carried out on behalf of an operating system kernel thread, processing systems can experience overload in high incoming traffic scenarios. This is especially worrying for embedded real-time devices controlling their physical environment in industrial IoT scenarios and automotive systems. We propose an embedded real-time aware IP stack adaption with an e…

    Submitted 19 April, 2022; originally announced April 2022.

    Comments: 25th International Symposium on Real-Time Distributed Computing

  24. SyncMesh: Improving Data Locality for Function-as-a-Service in Meshed Edge Networks

    Authors: Daniel Habenicht, Kevin Kreutz, Soeren Becker, Jonathan Bader, Lauritz Thamsen, Odej Kao

    Abstract: The increasing use of Internet of Things devices coincides with more communication and data movement in networks, which can exceed existing network capabilities. These devices often process sensor or user information, where data privacy and latency are a major concern. Therefore, traditional approaches like cloud computing do not fit well, yet new architectures such as edge computing address this…

    Submitted 28 March, 2022; originally announced March 2022.

  25. arXiv:2201.00594  [pdf, other]

    cs.NI cs.AR cs.DC cs.OS

    A Priority-Aware Multiqueue NIC Design

    Authors: Ilja Behnke, Philipp Wiesner, Robert Danicki, Lauritz Thamsen

    Abstract: Low-level embedded systems are used to control cyber-physical systems in industrial and autonomous applications. They need to meet hard real-time requirements as unanticipated controller delays on moving machines can have devastating effects. Modern developments such as the industrial Internet of Things and autonomous machines require these devices to connect to large IP networks. Since Network In…

    Submitted 3 January, 2022; originally announced January 2022.

    Comments: The 37th ACM/SIGAPP Symposium on Applied Computing (SAC '22)

    ACM Class: C.2.4; B.4.1; D.4.4

  26. arXiv:2112.09580  [pdf, ps, other]

    cs.DC cs.SE

    Continuously Testing Distributed IoT Systems: An Overview of the State of the Art

    Authors: Jossekin Beilharz, Philipp Wiesner, Arne Boockmeyer, Lukas Pirl, Dirk Friedenberger, Florian Brokhausen, Ilja Behnke, Andreas Polze, Lauritz Thamsen

    Abstract: The continuous testing of small changes to systems has proven to be useful and is widely adopted in the development of software systems. For this, software is tested in environments that are as close as possible to the production environments. When testing IoT systems, this approach is met with unique challenges that stem from the typically large scale of the deployments, heterogeneity of nodes,…

    Submitted 17 December, 2021; originally announced December 2021.

  27. On the Potential of Execution Traces for Batch Processing Workload Optimization in Public Clouds

    Authors: Dominik Scheinert, Alireza Alamgiralem, Jonathan Bader, Jonathan Will, Thorsten Wittkopp, Lauritz Thamsen

    Abstract: With the growing amount of data, data processing workloads and the management of their resource usage becomes increasingly important. Since managing a dedicated infrastructure is in many situations infeasible or uneconomical, users progressively execute their respective workloads in the cloud. As the configuration of workloads and resources is often challenging, various methods have been proposed…

    Submitted 16 January, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

    Comments: 6 pages, 5 figures, 1 table

    Journal ref: IEEE BigData (2021) 3113-3118

  28. Training Data Reduction for Performance Models of Data Analytics Jobs in the Cloud

    Authors: Jonathan Will, Onur Arslan, Jonathan Bader, Dominik Scheinert, Lauritz Thamsen

    Abstract: Distributed dataflow systems like Apache Flink and Apache Spark simplify processing large amounts of data on clusters in a data-parallel manner. However, choosing suitable cluster resources for distributed dataflow jobs in both type and number is difficult, especially for users who do not have access to previous performance metrics. One approach to overcoming this issue is to have users share runt…

    Submitted 11 March, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: 6 pages, 5 figures, Accepted for the BPOD Workshop at IEEE Big Data 2021

    ACM Class: C.2.4; I.2.8; I.2.6

    Journal ref: IEEE Big Data (2021) 3141-3146

  29. Tarema: Adaptive Resource Allocation for Scalable Scientific Workflows in Heterogeneous Clusters

    Authors: Jonathan Bader, Lauritz Thamsen, Svetlana Kulagina, Jonathan Will, Henning Meyerhenke, Odej Kao

    Abstract: Scientific workflow management systems like Nextflow support large-scale data analysis by abstracting away the details of scientific workflows. In these systems, workflows consist of several abstract tasks, of which instances are run in parallel and transform input partitions into output partitions. Resource managers like Kubernetes execute such workflow tasks on cluster infrastructures. However,…

    Submitted 19 January, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

    Journal ref: IEEE Big Data (2021), 65-75

  30. Let's Wait Awhile: How Temporal Workload Shifting Can Reduce Carbon Emissions in the Cloud

    Authors: Philipp Wiesner, Ilja Behnke, Dominik Scheinert, Kordian Gontarska, Lauritz Thamsen

    Abstract: Depending on energy sources and demand, the carbon intensity of the public power grid fluctuates over time. Exploiting this variability is an important factor in reducing the emissions caused by data centers. However, regional differences in the availability of low-carbon energy sources make it hard to provide general best practices for when to consume electricity. Moreover, existing research in t…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: To be published in the proceedings of the 22nd International Middleware Conference (Middleware '21), December 6-10, 2021, Virtual Event, Canada

  31. arXiv:2109.13009  [pdf, other]

    cs.DC

    LOS: Local-Optimistic Scheduling of Periodic Model Training For Anomaly Detection on Sensor Data Streams in Meshed Edge Networks

    Authors: Soeren Becker, Florian Schmidt, Lauritz Thamsen, Ana Juan Ferrer, Odej Kao

    Abstract: Anomaly detection is increasingly important to handle the amount of sensor data in Edge and Fog environments, Smart Cities, as well as in Industry 4.0. To ensure good results, the utilized ML models need to be updated periodically to adapt to seasonal changes and concept drifts in the sensor data. Although the increasing resource availability at the edge can allow for in-situ execution of model tr…

    Submitted 27 September, 2021; originally announced September 2021.

    Comments: 2nd IEEE International Conference on Autonomic Computing and Self-Organizing Systems - ACSOS 2021

  32. arXiv:2109.02340  [pdf, other]

    cs.DC

    Khaos: Dynamically Optimizing Checkpointing for Dependable Distributed Stream Processing

    Authors: Morgan K. Geldenhuys, Benjamin J. J. Pfister, Dominik Scheinert, Lauritz Thamsen, Odej Kao

    Abstract: Distributed Stream Processing systems are becoming an increasingly essential part of Big Data processing platforms as users grow ever more reliant on their ability to provide fast access to new results. As such, making timely decisions based on these results is dependent on a system's ability to tolerate failure. Typically, these systems achieve fault tolerance and the ability to recover automatic…

    Submitted 26 January, 2023; v1 submitted 6 September, 2021; originally announced September 2021.

  33. arXiv:2109.00294  [pdf, other]

    cs.MA

    GRAL: Localization of Floating Wireless Sensors in Pipe Networks

    Authors: Martin Haug, Felix Lorenz, Lauritz Thamsen

    Abstract: Mobile wireless sensors are increasingly recognized as a valuable tool for monitoring critical infrastructures. An important use case is the discovery of leaks and inflows in pipe networks using a swarm of floating sensor nodes. While passively drifting along, the devices must track their individual positions so critical points can later be located. Since pipelines are often situated in inaccessib…

    Submitted 1 September, 2021; originally announced September 2021.

    Comments: to be presented at the 1st International Workshop on Testing Distributed Internet of Things Systems; associated implementation code can be found at https://github.com/reknih/gral/

  34. AuctionWhisk: Using an Auction-Inspired Approach for Function Placement in Serverless Fog Platforms

    Authors: David Bermbach, Jonathan Bader, Jonathan Hasenburg, Tobias Pfandzelter, Lauritz Thamsen

    Abstract: The Function-as-a-Service (FaaS) paradigm has a lot of potential as a computing model for fog environments comprising both cloud and edge nodes, as compute requests can be scheduled across the entire fog continuum in a fine-grained manner. When the request rate exceeds capacity limits at the resource-constrained edge, some functions need to be offloaded towards the cloud. In this paper, we prese…

    Submitted 23 November, 2021; v1 submitted 30 August, 2021; originally announced August 2021.

    Comments: Wiley - Software: Practice and Experience

  35. Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using Graph Propagation

    Authors: Dominik Scheinert, Houkun Zhu, Lauritz Thamsen, Morgan K. Geldenhuys, Jonathan Will, Alexander Acker, Odej Kao

    Abstract: Distributed dataflow systems like Spark and Flink enable the use of clusters for scalable data analytics. While runtime prediction models can be used to initially select appropriate cluster resources given target runtimes, the actual runtime performance of dataflow jobs depends on several factors and varies over time. Yet, in many situations, dynamic scaling can be used to meet formulated runtime…

    Submitted 26 January, 2022; v1 submitted 27 August, 2021; originally announced August 2021.

    Comments: 8 pages, 5 figures, 3 tables

    Journal ref: IEEE IPCCC (2021) 1-8

  36. arXiv:2108.10721  [pdf, other]

    cs.DC

    Dependable IoT Data Stream Processing for Monitoring and Control of Urban Infrastructures

    Authors: Morgan K. Geldenhuys, Jonathan Will, Benjamin J. J. Pfister, Martin Haug, Alexander Scharmann, Lauritz Thamsen

    Abstract: The Internet of Things describes a network of physical devices interacting and producing vast streams of sensor data. At present there are a number of general challenges which exist while developing solutions for use cases involving the monitoring and control of urban infrastructures. These include the need for a dependable method for extracting value from these high volume streams of time sensiti…

    Submitted 24 August, 2021; originally announced August 2021.

  37. arXiv:2108.08685  [pdf, other]

    cs.DC

    On the Future of Cloud Engineering

    Authors: David Bermbach, Abhishek Chandra, Chandra Krintz, Aniruddha Gokhale, Aleksander Slominski, Lauritz Thamsen, Everton Cavalcante, Tian Guo, Ivona Brandic, Rich Wolski

    Abstract: Ever since the commercial offerings of the Cloud started appearing in 2006, the landscape of cloud computing has been undergoing remarkable changes with the emergence of many different types of service offerings, developer productivity enhancement tools, and new application classes as well as the manifestation of cloud functionality closer to the user at the edge. The notion of utility computing,…

    Submitted 19 August, 2021; originally announced August 2021.

    Comments: author copy/preprint of a paper published in the IEEE International Conference on Cloud Engineering (IC2E 2021)

  38. arXiv:2108.04749  [pdf, other]

    cs.DC cs.AI

    Evaluation of Load Prediction Techniques for Distributed Stream Processing

    Authors: Kordian Gontarska, Morgan Geldenhuys, Dominik Scheinert, Philipp Wiesner, Andreas Polze, Lauritz Thamsen

    Abstract: Distributed Stream Processing (DSP) systems enable processing large streams of continuous data to produce results in near to real time. They are an essential part of many data-intensive applications and analytics platforms. The rate at which events arrive at DSP systems can vary considerably over time, which may be due to trends, cyclic, and seasonal patterns within the data streams. A priori know…

    Submitted 10 August, 2021; originally announced August 2021.

  39. Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across Contexts

    Authors: Dominik Scheinert, Lauritz Thamsen, Houkun Zhu, Jonathan Will, Alexander Acker, Thorsten Wittkopp, Odej Kao

    Abstract: Distributed dataflow systems enable the use of clusters for scalable data analytics. However, selecting appropriate cluster resources for a processing job is often not straightforward. Performance models trained on historical executions of a concrete job are helpful in such situations, yet they are usually bound to a specific job execution context (e.g. node type, software versions, job parameters…

    Submitted 17 October, 2021; v1 submitted 29 July, 2021; originally announced July 2021.

    Comments: 10 pages, 8 figures, 2 tables

    Journal ref: IEEE CLUSTER (2021) 261-270

  40. C3O: Collaborative Cluster Configuration Optimization for Distributed Data Processing in Public Clouds

    Authors: Jonathan Will, Lauritz Thamsen, Dominik Scheinert, Jonathan Bader, Odej Kao

    Abstract: Distributed dataflow systems enable data-parallel processing of large datasets on clusters. Public cloud providers offer a large variety and quantity of resources that can be used for such clusters. Yet, selecting appropriate cloud resources for dataflow jobs - that neither lead to bottlenecks nor to low resource utilization - is often challenging, even for expert users such as data engineers. W…

    Submitted 1 December, 2021; v1 submitted 28 July, 2021; originally announced July 2021.

    Comments: 10 pages, 5 figures, IEEE IC2E 2021. arXiv admin note: text overlap with arXiv:2011.07965

    ACM Class: C.2.4; I.2.8; I.2.6

    Journal ref: IEEE IC2E (2021) 43-52

  41. arXiv:2104.10085  [pdf, other]

    cs.AI cs.LG

    Predicting Medical Interventions from Vital Parameters: Towards a Decision Support System for Remote Patient Monitoring

    Authors: Kordian Gontarska, Weronika Wrazen, Jossekin Beilharz, Robert Schmid, Lauritz Thamsen, Andreas Polze

    Abstract: Cardiovascular diseases and heart failures in particular are the main cause of non-communicable disease mortality in the world. Constant patient monitoring enables better medical treatment as it allows practitioners to react on time and provide the appropriate treatment. Telemedicine can provide constant remote monitoring so patients can stay in their homes, only requiring medical sensing equipmen…

    Submitted 20 April, 2021; originally announced April 2021.

  42. Detecting and Mitigating Network Packet Overloads on Real-Time Devices in IoT Systems

    Authors: Robert Danicki, Martin Haug, Ilja Behnke, Laurenz Mädje, Lauritz Thamsen

    Abstract: Manufacturing, automotive, and aerospace environments use embedded systems for control and automation and need to fulfill strict real-time guarantees. To facilitate more efficient business processes and remote control, such devices are being connected to IP networks. Due to the difficulty in predicting network packets and the interrelated workloads of interrupt handlers and drivers, devices contro…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: EdgeSys '21

  43. arXiv:2103.06026  [pdf, other]

    cs.NI

    Towards a Cognitive Compute Continuum: An Architecture for Ad-Hoc Self-Managed Swarms

    Authors: Ana Juan Ferrer, Soeren Becker, Florian Schmidt, Lauritz Thamsen, Odej Kao

    Abstract: In this paper we introduce our vision of a Cognitive Computing Continuum to address the changing IT service provisioning towards a distributed, opportunistic, self-managed collaboration between heterogeneous devices outside the traditional data center boundaries. The focal point of this continuum are cognitive devices, which have to make decisions autonomously using their on-board computation and…

    Submitted 10 March, 2021; originally announced March 2021.

    Comments: 8 pages, CCGrid 2021 Cloud2Things Workshop

  44. Learning Dependencies in Distributed Cloud Applications to Identify and Localize Anomalies

    Authors: Dominik Scheinert, Alexander Acker, Lauritz Thamsen, Morgan K. Geldenhuys, Odej Kao

    Abstract: Operation and maintenance of large distributed cloud applications can quickly become unmanageably complex, putting human operators under immense stress when problems occur. Utilizing machine learning for identification and localization of anomalies in such systems supports human experts and enables fast mitigation. However, due to the various inter-dependencies of system components, anomalies do n…

    Submitted 9 September, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

    Comments: 6 pages, 5 figures, 3 tables

    Journal ref: IEEE/ACM CloudIntelligence (2021) 7-12

  45. arXiv:2103.01170  [pdf, other]

    cs.DC

    LEAF: Simulating Large Energy-Aware Fog Computing Environments

    Authors: Philipp Wiesner, Lauritz Thamsen

    Abstract: Despite constant improvements in efficiency, today's data centers and networks consume enormous amounts of energy and this demand is expected to rise even further. An important research question is whether and how fog computing can curb this trend. As real-life deployments of fog infrastructure are still rare, a significant part of research relies on simulations. However, existing power models usu…

    Submitted 1 March, 2021; originally announced March 2021.

    Comments: To appear in the Proceedings of the 5th IEEE International Conference on Fog and Edge Computing 2021

  46. PIERES: A Playground for Network Interrupt Experiments on Real-Time Embedded Systems in the IoT

    Authors: Franz Bender, Jan Jonas Brune, Nick Lauritz Keutel, Ilja Behnke, Lauritz Thamsen

    Abstract: IoT devices have become an integral part of our lives and the industry. Many of these devices run real-time systems or are used as part of them. As these devices receive network packets over IP networks, the network interface informs the CPU about their arrival using interrupts that might preempt critical processes. Therefore, the question arises whether network interrupts pose a threat to the rea…

    Submitted 23 February, 2021; originally announced February 2021.

    Comments: The Ninth International Workshop on Load Testing and Benchmarking of Software Systems (LTB 2021)

    ACM Class: I.6.7; B.8.1

    Journal ref: 2021 Companion of the ACM/SPEC International Conference on Performance Engineering (ICPE '21), 81-84

  47. Hugo: A Cluster Scheduler that Efficiently Learns to Select Complementary Data-Parallel Jobs

    Authors: Lauritz Thamsen, Ilya Verbitskiy, Sasho Nedelkoski, Vinh Thuy Tran, Vinicius Meyer, Miguel G. Xavier, Odej Kao, Cesar A. F. De Rose

    Abstract: Distributed data processing systems like MapReduce, Spark, and Flink are popular tools for analysis of large datasets with cluster resources. Yet, users often overprovision resources for their data processing jobs, while the resource usage of these jobs also typically fluctuates considerably. Therefore, multiple jobs usually get scheduled onto the same shared resources to increase the resource uti…

    Submitted 14 February, 2021; originally announced February 2021.

  48. arXiv:2102.06170  [pdf, other]

    cs.DC

    Chiron: Optimizing Fault Tolerance in QoS-aware Distributed Stream Processing Jobs

    Authors: Morgan Geldenhuys, Lauritz Thamsen, Odej Kao

    Abstract: Fault tolerance is a property which needs deeper consideration when dealing with streaming jobs requiring high levels of availability and low-latency processing even in case of failures where Quality-of-Service constraints must be adhered to. Typically, systems achieve fault tolerance and the ability to recover automatically from partial failures by implementing Checkpoint and Rollback Recovery. H…

    Submitted 11 February, 2021; originally announced February 2021.

  49. arXiv:2102.06094  [pdf, other]

    cs.DC

    Effectively Testing System Configurations of Critical IoT Analytics Pipelines

    Authors: Morgan Geldenhuys, Lauritz Thamsen, Kain Kordian Gontarska, Felix Lorenz, Odej Kao

    Abstract: The emergence of the Internet of Things has seen the introduction of numerous connected devices used for the monitoring and control of even Critical Infrastructures. Distributed stream processing has become key to analyzing data generated by these connected devices and improving our ability to make decisions. However, optimizing these systems towards specific Quality of Service targets is a diffic…

    Submitted 25 February, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

  50. Interrupting Real-Time IoT Tasks: How Bad Can It Be to Connect Your Critical Embedded System to the Internet?

    Authors: Ilja Behnke, Lukas Pirl, Lauritz Thamsen, Robert Danicki, Andreas Polze, Odej Kao

    Abstract: Embedded systems have been used to control physical environments for decades. Usually, such use cases require low latencies between commands and actions as well as a high predictability of the expected worst-case delay. To achieve this on small, low-powered microcontrollers, Real-Time Operating Systems (RTOSs) are used to manage the different tasks on these machines as deterministically as possibl…

    Submitted 13 April, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

    Comments: IPCCC 2020: 39th International Performance Computing and Communications Conference

    Journal ref: 39th International Performance Computing and Communications Conference (IPCCC), IEEE, 2020, pp. 1-6