Showing 1–16 of 16 results for author: Henning, S

  1. arXiv:2405.07917  [pdf, other]

    cs.DC cs.PF cs.SE

    High-level Stream Processing: A Complementary Analysis of Fault Recovery

    Authors: Adriano Vogel, Sören Henning, Esteban Perez-Wohlfeil, Otmar Ertl, Rick Rabiser

    Abstract: Parallel computing is very important to accelerate the performance of software systems. Additionally, considering that a recurring challenge is to process high data volumes continuously, stream processing emerged as a paradigm and software architectural style. Several software systems rely on stream processing to deliver scalable performance, whereas open-source frameworks provide coding abstracti…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Extended paper version. arXiv admin note: substantial text overlap with arXiv:2404.06203

  2. A Comprehensive Benchmarking Analysis of Fault Recovery in Stream Processing Frameworks

    Authors: Adriano Vogel, Sören Henning, Esteban Perez-Wohlfeil, Otmar Ertl, Rick Rabiser

    Abstract: Nowadays, several software systems rely on stream processing architectures to deliver scalable performance and handle large volumes of data in near real-time. Stream processing frameworks facilitate scalable computing by distributing the application's execution across multiple machines. Despite performance being extensively studied, the measurement of fault tolerance-a key feature offered by strea…

    Submitted 29 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: Accepted for publication in the 18th ACM International Conference on Distributed and Event-Based Systems (DEBS'24), June 24-28, 2024, Villeurbanne, France, 12 pages

  3. ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks

    Authors: Sören Henning, Adriano Vogel, Michael Leichtfried, Otmar Ertl, Rick Rabiser

    Abstract: Distributed stream processing frameworks help building scalable and reliable applications that perform transformations and aggregations on continuous data streams. This paper introduces ShuffleBench, a novel benchmark to evaluate the performance of modern stream processing frameworks. In contrast to other benchmarks, it focuses on use cases where stream processing frameworks are mainly employed fo…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted for publication in Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering (ICPE '24), May 7-11, 2024, London, United Kingdom, 12 pages

  4. arXiv:2310.12702  [pdf, other]

    cs.SE cs.DC cs.PF

    Benchmarking Function Hook Latency in Cloud-Native Environments

    Authors: Mario Kahlhofer, Patrick Kern, Sören Henning, Stefan Rass

    Abstract: Researchers and engineers are increasingly adopting cloud-native technologies for application development and performance evaluation. While this has improved the reproducibility of benchmarks in the cloud, the complexity of cloud-native environments makes it difficult to run benchmarks reliably. Cloud-native applications are often instrumented or altered at runtime, by dynamically patching or hook…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: to be published in the 14th Symposium on Software Performance (SSP 2023), source code available at https://github.com/dynatrace-research/function-hook-latency-benchmarking

  5. arXiv:2307.02340  [pdf, other]

    cs.CL

    MuLMS-AZ: An Argumentative Zoning Dataset for the Materials Science Domain

    Authors: Timo Pierre Schrader, Teresa Bürkle, Sophie Henning, Sherry Tan, Matteo Finco, Stefan Grünewald, Maira Indrikova, Felix Hildebrand, Annemarie Friedrich

    Abstract: Scientific publications follow conventionalized rhetorical structures. Classifying the Argumentative Zone (AZ), e.g., identifying whether a sentence states a Motivation, a Result or Background information, has been proposed to improve processing of scholarly documents. In this work, we adapt and extend this idea to the domain of materials science research. We present and release a new dataset of 5…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: 15 pages, 2 figures, 14 tables, to be published in "Proceedings of the 4th Workshop on Computational Approaches to Discourse"

  6. Benchmarking scalability of stream processing frameworks deployed as microservices in the cloud

    Authors: Sören Henning, Wilhelm Hasselbring

    Abstract: Context: The combination of distributed stream processing with microservice architectures is an emerging pattern for building data-intensive software systems. In such systems, stream processing frameworks such as Apache Flink, Apache Kafka Streams, Apache Samza, Hazelcast Jet, or the Apache Beam SDK are used inside microservices to continuously process massive amounts of data in a distributed fash…

    Submitted 17 October, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: 19 pages

    Journal ref: Journal of Systems and Software, Volume 208, February 2024, 111879

  7. arXiv:2212.07156  [pdf, other]

    cs.CL cs.AI

    MIST: a Large-Scale Annotated Resource and Neural Models for Functions of Modal Verbs in English Scientific Text

    Authors: Sophie Henning, Nicole Macher, Stefan Grünewald, Annemarie Friedrich

    Abstract: Modal verbs (e.g., "can", "should", or "must") occur highly frequently in scientific articles. Decoding their function is not straightforward: they are often used for hedging, but they may also denote abilities and restrictions. Understanding their meaning is important for various NLP tasks such as writing assistance or accurate information extraction from scientific text. To foster research on…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: 20 pages, 7 figures. Accepted to EMNLP Findings 2022; typesetting of this version slightly differs from conference version

  8. arXiv:2210.04675  [pdf, other]

    cs.CL cs.AI

    A Survey of Methods for Addressing Class Imbalance in Deep-Learning Based Natural Language Processing

    Authors: Sophie Henning, William Beluch, Alexander Fraser, Annemarie Friedrich

    Abstract: Many natural language processing (NLP) tasks are naturally imbalanced, as some target categories occur much more frequently than others in the real world. In such scenarios, current NLP models still tend to perform poorly on less frequent classes. Addressing class imbalance in NLP is an active research topic, yet, finding a good approach for a particular task and imbalance scenario is difficult.…

    Submitted 22 February, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: Camera-ready version for EACL 2023

  9. arXiv:2204.11509  [pdf, other]

    cs.DC

    Streaming vs. Functions: A Cost Perspective on Cloud Event Processing

    Authors: Tobias Pfandzelter, Sören Henning, Trever Schirmer, Wilhelm Hasselbring, David Bermbach

    Abstract: In cloud event processing, data generated at the edge is processed in real-time by cloud resources. Both distributed stream processing (DSP) and Function-as-a-Service (FaaS) have been proposed to implement such event processing applications. FaaS emphasizes fast development and easy operation, while DSP emphasizes efficient handling of large data volumes. Despite their architectural differences, b…

    Submitted 12 August, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

    Comments: Accepted for Publication at the 10th IEEE International Conference on Cloud Engineering (IC2E 2022)

  10. Goals and Measures for Analyzing Power Consumption Data in Manufacturing Enterprises

    Authors: Sören Henning, Wilhelm Hasselbring, Heinz Burmester, Armin Möbius, Maik Wojcieszak

    Abstract: The Internet of Things adoption in the manufacturing industry allows enterprises to monitor their electrical power consumption in real time and at machine level. In this paper, we follow up on such emerging opportunities for data acquisition and show that analyzing power consumption in manufacturing enterprises can serve a variety of purposes. Apart from the prevalent goal of reducing overall powe…

    Submitted 22 September, 2020; originally announced September 2020.

    Comments: 24 pages

    Journal ref: Journal of Data, Information and Management (2021)

  11. Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines in Microservice Architectures

    Authors: Sören Henning, Wilhelm Hasselbring

    Abstract: Distributed stream processing engines are designed with a focus on scalability to process big data volumes in a continuous manner. We present the Theodolite method for benchmarking the scalability of distributed stream processing engines. Core of this method is the definition of use cases that microservices implementing stream processing have to fulfill. For each use case, our method identifies re…

    Submitted 11 February, 2021; v1 submitted 1 September, 2020; originally announced September 2020.

    Comments: 28 pages

    Journal ref: Big Data Research 25 (2021)

  12. Scalable and Reliable Multi-Dimensional Aggregation of Sensor Data Streams

    Authors: Sören Henning, Wilhelm Hasselbring

    Abstract: Ever-increasing amounts of data and requirements to process them in real time lead to more and more analytics platforms and software systems being designed according to the concept of stream processing. A common area of application is the processing of continuous data streams from sensors, for example, IoT devices or performance monitoring tools. In addition to analyzing pure sensor data, analyses…

    Submitted 15 November, 2019; originally announced November 2019.

    Comments: 6 pages

    Journal ref: 2019 IEEE International Conference on Big Data (Big Data)

  13. Industrial DevOps

    Authors: Wilhelm Hasselbring, Sören Henning, Björn Latte, Armin Möbius, Thomas Richter, Stefan Schalk, Maik Wojcieszak

    Abstract: The visions and ideas of Industry 4.0 require a profound interconnection of machines, plants, and IT systems in industrial production environments. This significantly increases the importance of software, which is coincidentally one of the main obstacles to the introduction of Industry 4.0. Lack of experience and knowledge, high investment and maintenance costs, as well as uncertainty about future…

    Submitted 3 July, 2019; originally announced July 2019.

    Comments: 10 pages

    Journal ref: 2019 IEEE International Conference on Software Architecture Companion (ICSA-C)

  14. A Scalable Architecture for Power Consumption Monitoring in Industrial Production Environments

    Authors: Sören Henning, Wilhelm Hasselbring, Armin Möbius

    Abstract: Detailed knowledge about the electrical power consumption in industrial production environments is a prerequisite to reduce and optimize their power consumption. Today's industrial production sites are equipped with a variety of sensors that, inter alia, monitor electrical power consumption in detail. However, these environments often lack an automated data collation and analysis. We present a s…

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: 10 pages

    Journal ref: 2019 IEEE International Conference on Fog Computing (ICFC)

  15. arXiv:1806.10654  [pdf, other]

    cs.CL

    Generalized chart constraints for efficient PCFG and TAG parsing

    Authors: Stefan Grünewald, Sophie Henning, Alexander Koller

    Abstract: Chart constraints, which specify at which string positions a constituent may begin or end, have been shown to speed up chart parsers for PCFGs. We generalize chart constraints to more expressive grammar formalisms and describe a neural tagger which predicts chart constraints at very high precision. Our constraints accelerate both PCFG and TAG parsing, and combine effectively with other pruning tec…

    Submitted 27 June, 2018; originally announced June 2018.

    Journal ref: Proceedings of ACL 2018 (Short Papers)

  16. arXiv:1705.04587  [pdf, ps, other]

    cs.CC

    Complexity and Inapproximability Results for Parallel Task Scheduling and Strip Packing

    Authors: Sören Henning, Klaus Jansen, Malin Rau, Lars Schmarje

    Abstract: We study the Parallel Task Scheduling problem $Pm|size_j|C_{\max}$ with a constant number of machines. This problem is known to be strongly NP-complete for each $m \geq 5$, while it is solvable in pseudo-polynomial time for each $m \leq 3$. We give a positive answer to the long-standing open question whether this problem is strongly $NP$-complete for $m=4$. As a second result, we improve the lower…

    Submitted 12 May, 2017; originally announced May 2017.