-
Toward Research Software Categories
Authors:
Wilhelm Hasselbring,
Stephan Druskat,
Jan Bernoth,
Philine Betker,
Michael Felderer,
Stephan Ferenz,
Anna-Lena Lamprecht,
Jan Linxweiler,
Bernhard Rumpe
Abstract:
Research software has been categorized in different contexts to serve different goals. We start with a look at what research software is, before we discuss the purpose of research software categories. We propose a multi-dimensional categorization of research software. We present a template for characterizing such categories. As selected dimensions, we present our proposed role-based, developer-based, and maturity-based categories. Since our work has been inspired by various previous efforts to categorize research software, we discuss them as related work. We characterize all these categories via the previously introduced template to enable a systematic comparison.
Submitted 22 April, 2024;
originally announced April 2024.
-
From Digital Twins to Digital Twin Prototypes: Concepts, Formalization, and Applications
Authors:
Alexander Barbie,
Wilhelm Hasselbring
Abstract:
The transformation to Industry 4.0 also transforms the processes of how we develop intelligent manufacturing production systems. To advance the software development of these new (embedded) software systems, digital twins may be employed. However, there is no consensual definition of what a digital twin is. In this paper, we give an overview of the current state of the digital twin concept and formalize the digital twin concept using the Object-Z notation. This formalization includes the concepts of physical twins, digital models, digital templates, digital threads, digital shadows, digital twins, and digital twin prototypes. The relationships between all these concepts are visualized as UML class diagrams.
Our digital twin prototype (DTP) approach supports engineers during the development and automated testing of complex embedded software systems. This approach enables engineers to test embedded software systems in a virtual context, without the need for a connection to a physical object. In continuous integration / continuous deployment pipelines, such digital twin prototypes can be used for automated integration testing and thus allow for an agile verification and validation process.
In this paper, we demonstrate and report on how to apply and implement a digital twin using the example of two real-world field studies (ocean observation systems and smart farming). For independent replication and extension of our approach by other researchers, we provide a lab study published as open source on GitHub.
Submitted 15 January, 2024;
originally announced January 2024.
-
Enabling Automated Integration Testing of Smart Farming Applications via Digital Twin Prototypes
Authors:
Alexander Barbie,
Wilhelm Hasselbring,
Malte Hansen
Abstract:
Industry 4.0 represents a major technological shift that has the potential to transform the manufacturing industry, making it more efficient, productive, and sustainable. Smart farming is a concept that involves the use of advanced technologies to improve the efficiency and sustainability of agricultural practices. Industry 4.0 and smart farming are closely related, as many of the technologies used in smart farming are also used in Industry 4.0. Digital twins have the potential for cost-effective software development of such applications. With our Digital Twin Prototype approach, all sensor interfaces are integrated into the development process, and the inputs and outputs of the emulated hardware match those of the real hardware. The emulators respond to the same commands and return identically formatted data packages as their real counterparts, making the Digital Twin Prototype a valid source of a digital shadow, i.e., the Digital Twin Prototype is a prototype of the physical twin and can replace it for automated testing of the digital twin software. In this paper, we present a case study applying our Digital Twin Prototype approach to automated testing of software for improving silage making with a smart farming application. Besides automated testing with continuous integration, we also discuss continuous deployment of modular Docker containers in this context.
Submitted 9 November, 2023;
originally announced November 2023.
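A minimal sketch of the emulator idea described above, assuming a hypothetical temperature sensor that answers the command `READ_TEMP` with an ASCII packet `T=<value>\n`; the class names, the command, and the packet format are illustrative and not taken from the paper. Because the emulator returns identically formatted packets, the parsing code under test cannot tell it apart from the real device, which is what allows it to run in a CI pipeline without hardware.

```python
from typing import Protocol


class SensorInterface(Protocol):
    """Interface shared by the real sensor driver and its emulator."""

    def send_command(self, command: str) -> bytes: ...


class EmulatedTemperatureSensor:
    """Hypothetical emulator that answers exactly like the real device."""

    def __init__(self, temperature_c: float = 21.5) -> None:
        self.temperature_c = temperature_c

    def send_command(self, command: str) -> bytes:
        if command == "READ_TEMP":
            # Same (assumed) packet layout as the real sensor: b"T=<value>\n"
            return f"T={self.temperature_c:.1f}\n".encode("ascii")
        return b"ERR\n"


def read_temperature(sensor: SensorInterface) -> float:
    """Software under test: parses the sensor reply, whatever its source."""
    reply = sensor.send_command("READ_TEMP").decode("ascii").strip()
    if not reply.startswith("T="):
        raise ValueError(f"unexpected reply: {reply!r}")
    return float(reply[2:])


# In CI, the emulator stands in for the physical twin.
print(read_temperature(EmulatedTemperatureSensor(18.0)))  # 18.0
```

On the target system, a driver class with the same `send_command` signature would talk to the physical sensor, and `read_temperature` would stay unchanged.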
-
Embedded Software Development with Digital Twins: Specific Requirements for Small and Medium-Sized Enterprises
Authors:
Alexander Barbie,
Wilhelm Hasselbring
Abstract:
The transformation to Industry 4.0 changes the way embedded software systems are developed. Digital twins have the potential for cost-effective software development and maintenance strategies. With reduced costs and faster development cycles, small and medium-sized enterprises (SMEs) have the chance to grow with new smart products. We interviewed SMEs about their current development processes. In this paper, we present the first results of these interviews. They show that real-time requirements prevent, to date, a Software-in-the-Loop development approach, due to a lack of proper tooling. Security/safety concerns and the accessibility of hardware are the main impediments. Where hardware is only temporarily accessible, developers turn to Software-in-the-Loop development approaches based on simulations/emulators. Yet, this is not possible in all use cases. All interviewees see the potential of Software-in-the-Loop approaches and digital twins with regard to quality and customization. One reason it will take some effort to convince engineers is the conservative nature of the embedded community, particularly in SMEs.
Submitted 17 September, 2023;
originally announced September 2023.
-
Collaborative, Code-Proximal Dynamic Software Visualization within Code Editors
Authors:
Alexander Krause-Glau,
Wilhelm Hasselbring
Abstract:
Software visualizations are usually realized as standalone and isolated tools that use embedded code viewers within the visualization. In the context of program comprehension, only a few approaches integrate visualizations into code editors, such as integrated development environments. This is surprising, since professional developers consider reading source code one of the most important ways to understand software and therefore spend a lot of time in code editors. In this paper, we introduce the design and proof-of-concept implementation of a software visualization approach that can be embedded into code editors. Our contribution differs from related work in that we use dynamic analysis of a software system's runtime behavior. Additionally, we incorporate distributed tracing. This enables developers to understand how, for example, the source code they are currently working on behaves as part of a fully deployed, distributed software system. Our visualization approach enhances common remote pair programming tools and is collaboratively usable by employing shared code cities. As a result, user interactions are synchronized between code editor and visualization, as well as broadcast to collaborators. To the best of our knowledge, this is the first approach that combines code editors with collaboratively usable code cities. We therefore conducted a user study to collect initial feedback regarding the perceived usefulness and perceived usability of our approach. We additionally collected logging information to provide more data regarding time spent in code cities that are embedded in code editors. Seven teams with two students each participated in that study. The results show that the majority of participants find our approach useful and would employ it for their own use. We provide each participant's video recording, raw results, and all steps to reproduce our experiment as a supplementary package.
Submitted 30 August, 2023;
originally announced August 2023.
-
Towards Solving the Challenge of Minimal Overhead Monitoring
Authors:
David Georg Reichelt,
Stefan Kühne,
Wilhelm Hasselbring
Abstract:
The examination of performance changes or the performance behavior of software requires measuring performance. This is done via probes, i.e., pieces of code which obtain and process measurement data, and which are inserted into the examined application. The execution of those probes in a single method creates overhead, which distorts the performance measurements of calling methods and slows down the measurement process. Therefore, an important challenge for performance measurement is the reduction of the measurement overhead.
To address this challenge, the overhead should be minimized. Based on an analysis of the sources of performance overhead, we derive the following four optimization options: (1) source instrumentation instead of AspectJ instrumentation, (2) reduction of measurement data, (3) change of the queue, and (4) aggregation of measurement data. We evaluate the effect of these optimization options using the MooBench benchmark. Thereby, we show that these optimization options reduce the monitoring overhead of the monitoring framework Kieker. For MooBench, the execution duration could be reduced from 4.77 ms to 0.39 ms per method invocation on average.
Submitted 12 April, 2023;
originally announced April 2023.
-
Automated Identification of Performance Changes at Code Level
Authors:
David Georg Reichelt,
Stefan Kühne,
Wilhelm Hasselbring
Abstract:
To develop software with optimal performance, even small performance changes need to be identified. Identifying performance changes is challenging since the performance of software is influenced by non-deterministic factors. Therefore, not every performance change is measurable with reasonable effort. In this work, we discuss which performance changes are measurable at code level with reasonable measurement effort and how to identify them. We present (1) an analysis of the boundaries of measuring performance changes, (2) an approach for determining a configuration for reproducible performance change identification, and (3) an evaluation of how well our approach identifies performance changes in the application server Jetty compared with Jetty's own performance regression benchmarks. Thereby, we find (1) that small performance differences are only measurable by fine-grained measurement workloads, (2) that performance changes caused by the change of one operation can be identified using a unit-test-sized workload definition and a suitable configuration, and (3) that our approach identifies small performance regressions more efficiently than Jetty's performance regression benchmarks.
Submitted 24 March, 2023;
originally announced March 2023.
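To make the statistical side of change identification concrete, here is a sketch that compares repeated measurements of one operation before and after a change using Welch's t statistic. A two-sample t-test is one common choice for this kind of decision; the fixed threshold of 3.0, the function names, and the sample values are assumptions, not the configuration determined by the approach in the paper.

```python
import math
import statistics


def welch_t(old, new):
    """Welch's t statistic for two samples with possibly unequal variances."""
    mean_old, mean_new = statistics.mean(old), statistics.mean(new)
    se = math.sqrt(statistics.variance(old) / len(old)
                   + statistics.variance(new) / len(new))
    if se == 0.0:
        # Both samples are constant: either no change or an unambiguous one.
        return 0.0 if mean_new == mean_old else math.copysign(math.inf, mean_new - mean_old)
    return (mean_new - mean_old) / se


def is_performance_change(old, new, threshold=3.0):
    """Hypothetical decision rule: report a change when |t| exceeds a fixed threshold."""
    return abs(welch_t(old, new)) > threshold


# Hypothetical response times (ms) of one operation before and after a commit.
old = [10.1, 10.3, 9.9, 10.0, 10.2]
new = [11.0, 11.2, 10.9, 11.1, 11.3]
print(is_performance_change(old, new))  # True
print(is_performance_change(old, old))  # False
```

With noisier measurements, the same mean shift yields a smaller |t|, which is why the number of repetitions and the workload size matter for detecting small changes.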
-
Benchmarking scalability of stream processing frameworks deployed as microservices in the cloud
Authors:
Sören Henning,
Wilhelm Hasselbring
Abstract:
Context: The combination of distributed stream processing with microservice architectures is an emerging pattern for building data-intensive software systems. In such systems, stream processing frameworks such as Apache Flink, Apache Kafka Streams, Apache Samza, Hazelcast Jet, or the Apache Beam SDK are used inside microservices to continuously process massive amounts of data in a distributed fashion. While all of these frameworks promote scalability as a core feature, there is little empirical research evaluating and comparing their scalability. Objective: The goal of this study is to obtain evidence about the scalability of state-of-the-art stream processing frameworks in different execution environments and regarding different scalability dimensions. Method: We benchmark five modern stream processing frameworks regarding their scalability using a systematic method. We conduct over 740 hours of experiments on Kubernetes clusters in the Google Cloud and in a private cloud, where we deploy up to 110 simultaneously running microservice instances, which process up to one million messages per second. Results: All benchmarked frameworks exhibit approximately linear scalability as long as sufficient cloud resources are provisioned. However, the frameworks show considerable differences in the rate at which resources have to be added to cope with increasing load. There is no clearly superior framework; instead, the ranking of the frameworks depends on the use case. Using Apache Beam as an abstraction layer still comes at the cost of significantly higher resource requirements regardless of the use case. We observe these results regardless of whether we scale the load on a microservice or the computational work performed inside the microservice, and regardless of the selected cloud environment. Moreover, vertical scaling can be a complementary measure to achieve scalability of stream processing frameworks.
Submitted 17 October, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
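One way to express "the rate at which resources have to be added" is a demand-style metric: for each load level, the smallest number of instances that still meets a service-level objective (SLO). The sketch below assumes a hypothetical `meets_slo(load, instances)` check; in a real benchmark each call would correspond to an experiment, and the toy SLO (10,000 messages per second per instance) is invented for illustration.

```python
from typing import Callable, Dict, Iterable, Optional


def demand(loads: Iterable[int], instance_options: Iterable[int],
           meets_slo: Callable[[int, int], bool]) -> Dict[int, Optional[int]]:
    """For each load, return the fewest instances meeting the SLO (None if none do)."""
    options = sorted(instance_options)
    return {load: next((n for n in options if meets_slo(load, n)), None)
            for load in loads}


def toy_slo(load: int, instances: int) -> bool:
    """Hypothetical SLO: each instance sustains at most 10,000 messages/s."""
    return load <= instances * 10_000


print(demand([50_000, 100_000, 1_000_000], range(1, 111), toy_slo))
# {50000: 5, 100000: 10, 1000000: 100}
```

Under the toy SLO the required instances grow linearly with the load; comparing the slopes of such curves is one way to compare how fast different frameworks need additional resources.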
-
SPRAT: A Spatially-Explicit Marine Ecosystem Model Based on Population Balance Equations
Authors:
Arne N. Johanson,
Andreas Oschlies,
Wilhelm Hasselbring,
Boris Worm
Abstract:
To successfully manage marine fisheries using an ecosystem-based approach, long-term predictions of fish stock development considering changing environmental conditions are necessary. Such predictions can be provided by end-to-end ecosystem models, which couple existing physical and biogeochemical ocean models with newly developed spatially-explicit fish stock models. Typically, Individual-Based Models (IBMs) and models based on Advection-Diffusion-Reaction (ADR) equations are employed for the fish stock models. In this paper, we present a novel fish stock model called SPRAT for end-to-end ecosystem modeling based on Population Balance Equations (PBEs) that combines the advantages of IBMs and ADR models while avoiding their main drawbacks. SPRAT accomplishes this by describing the modeled ecosystem processes from the perspective of individuals while still being based on partial differential equations. We apply the SPRAT model to explore a well-documented regime shift observed on the eastern Scotian Shelf in the 1990s from a cod-dominated to a herring-dominated ecosystem. Model simulations are able to reconcile the observed multitrophic dynamics with documented changes in both fishing pressure and water temperature, followed by a predator-prey reversal that may have impeded recovery of depleted cod stocks. We conclude that our model can be used to generate new hypotheses and test ideas about spatially interacting fish populations, and their joint responses to both environmental and fisheries forcing.
Submitted 30 September, 2022;
originally announced October 2022.
-
Modeling Polyp Activity of Paragorgia arborea Using Supervised Learning
Authors:
Arne Johanson,
Sascha Flögel,
Wolf-Christian Dullo,
Peter Linke,
Wilhelm Hasselbring
Abstract:
While the distribution patterns of cold-water corals, such as Paragorgia arborea, have received increasing attention in recent studies, little is known about their in situ activity patterns. In this paper, we examine polyp activity in P. arborea using machine learning techniques to analyze high-resolution time series data and photographs obtained from an autonomous lander cluster deployed in the Stjernsund, Norway. An interactive illustration of the models derived in this paper is provided online as supplementary material. We find that the best predictor of the degree of extension of the coral polyps is current direction with a lag of three hours. Other variables that are not directly associated with water currents, such as temperature and salinity, offer much less information concerning polyp activity. Interestingly, the degree of polyp extension can be predicted more reliably by sampling the laminar flows in the water column above the measurement site than by sampling the more turbulent flows in the direct vicinity of the corals. Our results show that the activity patterns of the P. arborea polyps are governed by the strong tidal current regime of the Stjernsund. It appears that P. arborea does not react to short-term changes in the ambient current regime but instead adjusts its behavior in accordance with the large-scale pattern of the tidal cycle itself in order to optimize nutrient uptake.
Submitted 26 September, 2022;
originally announced September 2022.
-
Streaming vs. Functions: A Cost Perspective on Cloud Event Processing
Authors:
Tobias Pfandzelter,
Sören Henning,
Trever Schirmer,
Wilhelm Hasselbring,
David Bermbach
Abstract:
In cloud event processing, data generated at the edge is processed in real time by cloud resources. Both distributed stream processing (DSP) and Function-as-a-Service (FaaS) have been proposed to implement such event processing applications. FaaS emphasizes fast development and easy operation, while DSP emphasizes efficient handling of large data volumes. Despite their architectural differences, both can be used to model and implement loosely coupled job graphs.
In this paper, we consider the selection of FaaS and DSP from a cost perspective. We implement stateless and stateful workflows from the Theodolite benchmarking suite using cloud FaaS and DSP. In an extensive evaluation, we show how application type, cloud service provider, and runtime environment can influence the cost of application deployments and derive decision guidelines for cloud engineers.
Submitted 12 August, 2022; v1 submitted 25 April, 2022;
originally announced April 2022.
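A back-of-the-envelope cost model shows why the event rate matters for this decision: FaaS is typically billed per request and per GB-second of execution, while DSP instances are billed for as long as they run. All prices below are placeholders rather than real provider quotes, and the fixed two-instance DSP deployment is an assumption that ignores scaling.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def faas_monthly_cost(events_per_s, ms_per_event, memory_gb,
                      price_per_million_requests, price_per_gb_second):
    """Pay-per-use: request fee plus compute time (GB-seconds) per invocation."""
    invocations = events_per_s * SECONDS_PER_MONTH
    gb_seconds = invocations * (ms_per_event / 1000) * memory_gb
    return invocations / 1e6 * price_per_million_requests + gb_seconds * price_per_gb_second


def dsp_monthly_cost(instances, price_per_instance_hour):
    """Always-on stream processing instances billed per hour."""
    return instances * price_per_instance_hour * SECONDS_PER_MONTH / 3600


# Placeholder prices, not real provider quotes.
for rate in (10, 100, 1000):
    faas = faas_monthly_cost(rate, ms_per_event=50, memory_gb=0.128,
                             price_per_million_requests=0.20,
                             price_per_gb_second=0.0000167)
    dsp = dsp_monthly_cost(instances=2, price_per_instance_hour=0.10)
    print(f"{rate:>5} events/s: FaaS {faas:8.2f}  DSP {dsp:8.2f}")
```

With these placeholder numbers, FaaS is cheaper at low event rates and more expensive at high ones; where the crossover lies depends entirely on the real prices, the per-event execution time, and how many DSP instances the load actually requires.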
-
Thematic Domain Analysis for Ocean Modeling
Authors:
Reiner Jung,
Sven Gundlach,
Wilhelm Hasselbring
Abstract:
Ocean science is a discipline that employs ocean models as an essential research asset. Such scientific modeling provides mathematical abstractions of real-world systems, e.g., the oceans. These models are then coded as implementations of the mathematical abstractions. The developed software systems are called models of the real-world system.
To advance the state of the art in engineering such ocean models, we intend to better understand how ocean models are developed and maintained in ocean science. In this paper, we present the results of semi-structured interviews and the Thematic Analysis (TA) of the interview results to analyze the domain of ocean modeling. Thereby, we identified developer requirements and impediments to model development and evolution, as well as related themes. This analysis can help to understand where methods from software engineering should be introduced and which challenges need to be addressed.
We suggest that other researchers extend and repeat our TA with model developers and research software engineers working in related domains to further advance our knowledge and skills in scientific modeling.
Submitted 1 February, 2022;
originally announced February 2022.
-
JavaBERT: Training a transformer-based model for the Java programming language
Authors:
Nelson Tavares de Sousa,
Wilhelm Hasselbring
Abstract:
Code quality is and will be a crucial factor when developing new software code, requiring appropriate tools to ensure functional and reliable code. Machine learning techniques are still rarely used for software engineering tools, missing out on the potential benefits of their application. Natural language processing has shown the potential to process text data for a variety of tasks. We argue that such models can also show similar benefits for software code processing. In this paper, we investigate how models used for natural language processing can be trained on software code. We introduce a data retrieval pipeline for software code and train a model on Java software code. The resulting model, JavaBERT, shows high accuracy on the masked language modeling task, demonstrating its potential for software engineering tools.
Submitted 20 October, 2021;
originally announced October 2021.
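The masked language modeling task mentioned above can be sketched as follows: a fraction of the input tokens is replaced by a mask symbol, and the model is trained to predict the original tokens at those positions. This simplified sketch masks about 15% of whitespace-separated tokens; real BERT-style pretraining uses a subword tokenizer and additionally leaves some selected tokens unchanged or replaces them with random tokens. The helper names are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (inputs, labels): masked positions keep their original token as label."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)    # the model must predict this token
        else:
            inputs.append(tok)
            labels.append(None)   # position ignored by the training loss
    return inputs, labels


# Whitespace tokenization of a Java snippet (a stand-in for a real subword tokenizer).
tokens = "public int add ( int a , int b ) { return a + b ; }".split()
inputs, labels = mask_tokens(tokens, seed=42)
print(" ".join(inputs))
```

Accuracy on this task is then measured as the share of masked positions for which the model's prediction equals the stored label.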
-
Live Visualization of Dynamic Software Cities with Heat Map Overlays
Authors:
Alexander Krause,
Malte Hansen,
Wilhelm Hasselbring
Abstract:
The 3D city metaphor in software visualization is a well-explored rendering method. Numerous tools use their custom variation to visualize offline-analyzed data. Heat map overlays are one of these variants. They introduce a separate information layer in addition to the software city's own semantics. Results show that their usage facilitates program comprehension.
In this paper, we present our heat map approach for the city metaphor visualization based on live trace analysis. In comparison to previous approaches, our implementation uses live dynamic analysis of a software system's runtime behavior. At any time, users can toggle the heat map feature and choose which runtime-dependent metric the heat map should visualize. Our approach continuously and automatically renders both software cities and heat maps. It does not require a manual or semi-automatic generation of heat maps and seamlessly blends into the overall software visualization. We implemented this approach in our web-based tool ExplorViz, such that the heat map overlay is also available in our augmented reality environment. ExplorViz is developed as open source software and is continuously published via Docker images. A live demo of ExplorViz is publicly available.
Submitted 29 September, 2021;
originally announced September 2021.
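As a sketch of the overlay idea, the function below maps one runtime metric per class (for example, a call count from the most recent trace window) linearly onto a blue-to-red color scale. The metric, the class names, and the color ramp are assumptions for illustration; ExplorViz's actual rendering is not shown here.

```python
def heat_colors(metric_by_class):
    """Map one metric value per class to an (R, G, B) color from blue (low) to red (high)."""
    values = list(metric_by_class.values())
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    colors = {}
    for name, value in metric_by_class.items():
        t = (value - lo) / span  # normalized to [0, 1]
        colors[name] = (round(255 * t), 0, round(255 * (1 - t)))
    return colors


# Hypothetical call counts per class from the latest trace window.
print(heat_colors({"OrderService": 120, "Cart": 30, "Util": 0}))
# {'OrderService': (255, 0, 0), 'Cart': (64, 0, 191), 'Util': (0, 0, 255)}
```

In a live setting, the function would simply be re-run whenever a new batch of trace data arrives, so the overlay follows the running system without manual regeneration.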
-
Software Development Processes in Ocean System Modeling
Authors:
Reiner Jung,
Sven Gundlach,
Wilhelm Hasselbring
Abstract:
Scientific modeling provides mathematical abstractions of real-world systems and builds software as implementations of these mathematical abstractions. Ocean science is a multidisciplinary field developing scientific models and simulations as ocean system models that are an essential research asset.
In software engineering and information systems research, modeling is also an essential activity. In particular, business process modeling for business process management and systems engineering is the activity of representing processes of an enterprise, so that the current process may be analyzed, improved, and automated.
In this paper, we employ process modeling for analyzing scientific software development in ocean science, to advance the state of the art in engineering ocean system models and to better understand how ocean system models are developed and maintained in ocean science. We interviewed domain experts in semi-structured interviews, analyzed the results via thematic analysis, and modeled the results with the Business Process Model and Notation (BPMN).
The resulting process models describe an aspired state of software development in the domain, which is often not (yet) implemented. These process models can help to improve existing processes in simulation-based systems engineering.
Submitted 19 August, 2021;
originally announced August 2021.
-
Control Flow Versus Data Flow in Distributed Systems Integration: Revival of Flow-Based Programming for the Industrial Internet of Things
Authors:
Wilhelm Hasselbring,
Maik Wojcieszak,
Schahram Dustdar
Abstract:
When we consider the application layer of networked infrastructures, data and control flow are important concerns in distributed systems integration. Modularity is a fundamental principle in software design, in particular for distributed system architectures. Modularity emphasizes high cohesion of individual modules and low coupling between modules. Microservices are a recent modularization approach with the specific requirements of independent deployability and, in particular, decentralized data management. Cohesiveness of microservices goes hand-in-hand with loose coupling, making the development, deployment, and evolution of microservice architectures flexible and scalable. However, in our experience with microservice architectures, interactions and flows among microservices are usually more complex than in traditional, monolithic enterprise systems, since services tend to be smaller and have only one responsibility, which creates a need for collaboration. We suggest that, for loose coupling among microservices, explicit control-flow modeling and execution with central workflow engines should be avoided at the application integration level. At the level of integrating microservices, data-flow modeling should be dominant. Control flow should be secondary and preferably delegated to the microservices. We discuss coupling in distributed systems integration and reflect on the history of business process modeling with respect to data and control flow. To illustrate our recommendations, we present some results for flow-based programming in our Industrial DevOps project Titan, where we employ flow-based programming for the Industrial Internet of Things.
Submitted 18 August, 2021;
originally announced August 2021.
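The data-flow style recommended above can be illustrated with components that only consume and produce data and are wired together explicitly, with no central engine deciding the control flow. This single-process Python sketch uses generators as components; it is a simplification, since flow-based programming runtimes typically run components concurrently and connect them through bounded buffers. The component names and readings are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def source(readings: Iterable[float]) -> Iterator[float]:
    """Component: emits raw sensor readings."""
    yield from readings


def drop_outliers(stream: Iterable[float], limit: float) -> Iterator[float]:
    """Component: forwards only readings up to `limit`."""
    for value in stream:
        if value <= limit:
            yield value


def moving_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Component: emits the mean of the last `window` readings."""
    buf = deque(maxlen=window)
    for value in stream:
        buf.append(value)
        yield sum(buf) / len(buf)


# The "graph" is only wiring; each component decides locally what to emit.
pipeline = moving_average(drop_outliers(source([1.0, 2.0, 99.0, 3.0]), limit=10.0), window=2)
print(list(pipeline))  # [1.0, 1.5, 2.5]
```

Because the filtering decision lives inside `drop_outliers`, the wiring describes only where data goes, which mirrors the recommendation to delegate control flow to the components themselves.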
-
Benchmarking as Empirical Standard in Software Engineering Research
Authors:
Wilhelm Hasselbring
Abstract:
In empirical software engineering, benchmarks can be used for comparing different methods, techniques and tools. However, the recent ACM SIGSOFT Empirical Standards for Software Engineering Research do not include an explicit checklist for benchmarking. In this paper, we discuss benchmarks for software performance and scalability evaluation as example research areas in software engineering, relate benchmarks to some other empirical research methods, and discuss the requirements on benchmarks that may constitute the basis for a checklist of a benchmarking standard for empirical software engineering research.
Submitted 1 May, 2021;
originally announced May 2021.
-
Continuous API Evolution in Heterogenous Enterprise Software Systems
Authors:
Holger Knoche,
Wilhelm Hasselbring
Abstract:
The ability to independently deploy parts of a software system is one of the cornerstones of modern software development, and allows for these parts to evolve independently and at different speeds.
A major challenge of such independent deployment, however, is to ensure that, despite their individual evolution, the interfaces between interacting parts remain compatible. This is especially important for enterprise software systems, which are often highly integrated and based on heterogeneous IT infrastructures.
Although several approaches for interface evolution have been proposed, many of them rely on developers adhering to certain rules but provide little guidance for doing so. In this paper, we present an approach for interface evolution that is easy for developers to use and that also addresses typical challenges of heterogeneous enterprise software, especially legacy system integration.
Submitted 21 March, 2021;
originally announced March 2021.
-
Towards Automated Metamorphic Test Identification for Ocean System Models
Authors:
Dilip Jagadeeshwarswamy Hiremath,
Martin Claus,
Wilhelm Hasselbring,
Willi Rath
Abstract:
Metamorphic testing seeks to verify software in the absence of test oracles. Our application domain is ocean system modeling, where test oracles rarely exist, but where symmetries of the simulated physical systems are known. The input data set is large owing to the requirements of the application domain. This paper presents work in progress on the automated generation of metamorphic test scenarios using machine learning. We extended our previously proposed method [1] to identify metamorphic relations with reduced computational complexity. Initially, we represent metamorphic relations as identity maps. We construct a cost function that is minimized by metamorphic relations orthogonal to previously found metamorphic relations and that penalizes the identity map. A machine learning algorithm is used to identify all possible metamorphic relations minimizing the defined cost function. We propose applying dimensionality reduction techniques to identify input attributes with high variance among the identified metamorphic relations. We apply mutation to these selected attributes to identify distinct metamorphic relations with reduced computational complexity. For experimental evaluation, we apply the proposed method to two implementations of an ocean-modeling application and show how metamorphic relations can be used to test both implementations.
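The abstract does not give the cost function itself, so the following is only a minimal sketch of one cost with the three described properties, assuming that candidate input transformations g are linear maps represented as matrices and that h is the identity map. The function `mr_cost`, its weights, and the specific penalty terms (a Frobenius-distance-based identity penalty and a squared-cosine orthogonality penalty) are hypothetical choices for illustration, not the authors' formulation.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]
Vector = Sequence[float]


def apply(g: Matrix, x: Vector) -> List[float]:
    """Apply a candidate input transformation g, given as a matrix, to input x."""
    return [sum(gij * xj for gij, xj in zip(row, x)) for row in g]


def inner(a: Matrix, b: Matrix) -> float:
    """Frobenius inner product <a, b> = sum_ij a_ij * b_ij."""
    return sum(aij * bij for ra, rb in zip(a, b) for aij, bij in zip(ra, rb))


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mr_cost(f: Callable[[Vector], float], g: Matrix, samples: Sequence[Vector],
            previous: Sequence[Matrix], w_identity: float = 1.0,
            w_ortho: float = 1.0) -> float:
    """Hypothetical cost for h = identity; assumes g and all previous are non-zero."""
    # Residual: mean squared violation of f(g(x)) == f(x) over the sample inputs.
    residual = sum((f(apply(g, x)) - f(x)) ** 2 for x in samples) / len(samples)

    # Identity penalty: 1.0 for g == I, shrinking with squared Frobenius distance to I.
    eye = identity(len(g))
    diff = [[gij - eij for gij, eij in zip(rg, ri)] for rg, ri in zip(g, eye)]
    identity_penalty = 1.0 / (1.0 + inner(diff, diff))

    # Orthogonality penalty: squared cosine similarity to each previously found relation.
    g_norm = math.sqrt(inner(g, g))
    ortho_penalty = sum(
        (inner(g, p) / (g_norm * math.sqrt(inner(p, p)))) ** 2 for p in previous
    )
    return residual + w_identity * identity_penalty + w_ortho * ortho_penalty
```

For a permutation-invariant toy f such as `sum`, swapping the two inputs (`[[0.0, 1.0], [1.0, 0.0]]`) is a valid relation that costs less than the identity map. Once the swap has been found, passing it in `previous` raises its cost, steering a minimizer toward a different relation.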
Submitted 17 March, 2021;
originally announced March 2021.
-
Prototyping Autonomous Robotic Networks on Different Layers of RAMI 4.0 with Digital Twins
Authors:
Alexander Barbie,
Wilhelm Hasselbring,
Niklas Pech,
Stefan Sommer,
Sascha Flögel,
Frank Wenzhöfer
Abstract:
In this decade, the number of (industrial) Internet of Things devices will increase tremendously. Today, there are no common standards for the interconnection, observation, or monitoring of these devices. In the context of the German "Industrie 4.0" strategy, the Reference Architectural Model Industry 4.0 (RAMI 4.0) was introduced to connect different aspects of this rapid development. The idea is to let the different stakeholders of these products speak and understand the same terminology. In this paper, we present an approach that uses Digital Twins to prototype different layers along the axis of RAMI 4.0, using the example of an autonomous ocean observation system developed in the project ARCHES.
Submitted 16 March, 2021;
originally announced March 2021.
-
Developing an Underwater Network of Ocean Observation Systems with Digital Twin Prototypes -- A Field Report from the Baltic Sea
Authors:
Alexander Barbie,
Niklas Pech,
Wilhelm Hasselbring,
Sascha Flögel,
Frank Wenzhöfer,
Michael Walter,
Elena Shchekinova,
Marc Busse,
Matthias Türk,
Michael Hofbauer,
Stefan Sommer
Abstract:
During the research cruise AL547 with RV ALKOR (October 20-31, 2020), a collaborative underwater network of ocean observation systems was deployed in Boknis Eck (SW Baltic Sea, German exclusive economic zone (EEZ)) in the context of the project ARCHES (Autonomous Robotic Networks to Help Modern Societies). This network was realized via a Digital Twin Prototype approach. During that period, different scenarios were executed to demonstrate the feasibility of Digital Twins in an extreme environment such as the underwater environment. One of the scenarios showed the collaboration of stage IV Digital Twins with their physical counterparts on the seafloor. This way, we address the research question of whether Digital Twins represent a feasible approach to operating mobile ad hoc networks for ocean and coastal observation.
Submitted 15 March, 2021;
originally announced March 2021.
-
Goals and Measures for Analyzing Power Consumption Data in Manufacturing Enterprises
Authors:
Sören Henning,
Wilhelm Hasselbring,
Heinz Burmester,
Armin Möbius,
Maik Wojcieszak
Abstract:
The adoption of the Internet of Things in the manufacturing industry allows enterprises to monitor their electrical power consumption in real time and at machine level. In this paper, we follow up on such emerging opportunities for data acquisition and show that analyzing power consumption in manufacturing enterprises can serve a variety of purposes. Apart from the prevalent goal of reducing overall power consumption for economic and ecological reasons, such data can, for example, be used to improve production processes.
Based on a literature review and expert interviews, we discuss how analyzing power consumption data can serve the goals of reporting, optimization, fault detection, and predictive maintenance. To tackle these goals, we propose implementing the following measures in software: real-time data processing, multi-level monitoring, temporal aggregation, correlation, anomaly detection, forecasting, visualization, and alerting.
We transfer our findings to two manufacturing enterprises and show how the presented goals are reflected in these enterprises. In a pilot implementation of a power consumption analytics platform, we show how our proposed measures can be implemented with a microservice-based architecture, stream processing techniques, and the fog computing paradigm. We provide the implementations as open source, as well as a public demo that allows others to reproduce and extend our research.
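Two of the listed measures, temporal aggregation and anomaly detection, can be illustrated in a few lines. This is a minimal batch sketch, not the paper's stream-processing platform: the function names, the fixed (tumbling) window, and the z-score rule for flagging anomalies are all assumptions chosen for illustration.

```python
import statistics
from typing import Dict, List, Tuple

Reading = Tuple[int, float]  # (timestamp in seconds, power in watts)


def tumbling_average(readings: List[Reading], window_s: int) -> Dict[int, float]:
    """Temporal aggregation: mean power per fixed, non-overlapping window.
    Keys are window start timestamps, in ascending order."""
    buckets: Dict[int, List[float]] = {}
    for ts, watts in readings:
        buckets.setdefault(ts - ts % window_s, []).append(watts)
    return {start: statistics.fmean(vals) for start, vals in sorted(buckets.items())}


def anomalies(series: Dict[int, float], threshold: float = 3.0) -> List[int]:
    """Anomaly detection: window starts whose mean deviates from the overall mean
    by more than `threshold` population standard deviations (a z-score rule)."""
    values = list(series.values())
    mean, stdev = statistics.fmean(values), statistics.pstdev(values)
    if stdev == 0:
        return []  # a perfectly flat series has no outliers under this rule
    return [start for start, v in series.items() if abs(v - mean) / stdev > threshold]
```

For example, minute-level readings of 1000 W over ten hours, with 5000 W during the hour starting at second 18000, yield ten hourly means; `anomalies(..., threshold=2.0)` then flags only that hour. A streaming implementation would compute the windows incrementally instead of over a stored list.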
Submitted 22 September, 2020;
originally announced September 2020.
-
Automated identification of metamorphic test scenarios for an ocean-modeling application
Authors:
Dilip J. Hiremath,
Martin Claus,
Wilhelm Hasselbring,
Willi Rath
Abstract:
Metamorphic testing seeks to validate software in the absence of test oracles. Our application domain is ocean modeling, where test oracles often do not exist, but where symmetries of the simulated physical systems are known. In this short paper, we present work in progress on the automated generation of metamorphic test scenarios using machine learning. Metamorphic testing may be expressed as f(g(X)) = h(f(X)), with f being the application under test, X its input data, and (g, h) the metamorphic relation. Automatically generated metamorphic relations can be used for constructing regression tests and for comparing different versions of the same software application. Here, we restrict ourselves to the case where h is the identity map. Then, the task of constructing tests means finding different g, which we tackle using machine learning algorithms. These algorithms typically minimize a cost function. As one possible g is already known to be the identity map, for finding a second possible g, we construct the cost function to be minimized when g is a metamorphic relation and to penalize g being the identity map. After the first metamorphic relation has been identified, the procedure is repeated with a cost function rewarding g that are orthogonal to previously found metamorphic relations. For experimental evaluation, two implementations of an ocean-modeling application will be subjected to the proposed method, with the objective of showing how metamorphic relations can be used to test these implementations.
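Once f, g, and h are available as functions, the relation f(g(X)) = h(f(X)) can be checked mechanically. The sketch below is illustrative only: `diffuse` is a hypothetical toy stand-in for an ocean model (one explicit diffusion step on a periodic 1-D grid), and `shift` and `holds` are likewise hypothetical names. The checker accepts a general (g, h), while the abstract restricts itself to h being the identity map; a tolerance is assumed because floating-point results may differ in the last bits.

```python
from typing import Callable, List, Sequence

Field = List[float]


def diffuse(u: Sequence[float], kappa: float = 0.1) -> Field:
    """Toy application under test f: one explicit diffusion step on a periodic
    1-D grid (a hypothetical stand-in for an ocean model)."""
    n = len(u)
    return [u[i] + kappa * (u[(i - 1) % n] - 2 * u[i] + u[(i + 1) % n]) for i in range(n)]


def shift(u: Sequence[float], k: int = 1) -> Field:
    """Cyclic shift by k cells: a translation symmetry of the periodic domain."""
    n = len(u)
    return [u[(i - k) % n] for i in range(n)]


def holds(f: Callable, g: Callable, h: Callable, x: Sequence[float],
          tol: float = 1e-12) -> bool:
    """Check the metamorphic relation (g, h) on input x: f(g(x)) == h(f(x)) within tol."""
    lhs, rhs = f(g(x)), h(f(x))
    return len(lhs) == len(rhs) and all(abs(a - b) <= tol for a, b in zip(lhs, rhs))


x = [1.0, 2.0, 4.0, 0.5, 3.0]
print(holds(diffuse, shift, shift, x))        # True: translation symmetry
print(holds(diffuse, shift, lambda y: y, x))  # False: h must shift as well
```

In the h = identity setting, the output must itself be invariant under g; for instance, taking f to be the total mass after one step, `lambda u: [sum(diffuse(u))]`, the relation holds for g = `shift` with `lambda y: y` as h.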
Submitted 3 September, 2020;
originally announced September 2020.
-
Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines in Microservice Architectures
Authors:
Sören Henning,
Wilhelm Hasselbring
Abstract:
Distributed stream processing engines are designed with a focus on scalability to process big data volumes in a continuous manner. We present the Theodolite method for benchmarking the scalability of distributed stream processing engines. At the core of this method is the definition of use cases that microservices implementing stream processing have to fulfill. For each use case, our method identifies relevant workload dimensions that might affect the scalability of a use case. We propose to design one benchmark per use case and relevant workload dimension. We present a general benchmarking framework, which can be applied to execute the individual benchmarks for a given use case and workload dimension. Our framework executes an implementation of the use case's dataflow architecture for different workloads of the given dimension and various numbers of processing instances. This way, it identifies how resource demand evolves with increasing workloads. Within the scope of this paper, we present 4 identified use cases, derived from processing Industrial Internet of Things data, and 7 corresponding workload dimensions. We provide implementations of 4 benchmarks with Kafka Streams and Apache Flink, as well as an implementation of our benchmarking framework to execute scalability benchmarks in cloud environments. We use both for evaluating the Theodolite method and for benchmarking Kafka Streams' and Flink's scalability for different deployment options.
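The idea of identifying how resource demand evolves with increasing workloads can be sketched as a search: for each workload, find the fewest processing instances that still satisfy a service-level objective. This is a minimal sketch, not Theodolite's framework. The callback `slo_met(workload, instances)` is hypothetical and stands in for executing one benchmark run and evaluating it, and the search assumes that demand never decreases as the workload grows.

```python
from typing import Callable, Dict, Optional, Sequence


def resource_demand(
    workloads: Sequence[int],
    instance_options: Sequence[int],
    slo_met: Callable[[int, int], bool],
) -> Dict[int, Optional[int]]:
    """For each workload, the lowest number of instances that meets the SLO
    (None if no tested option does).

    Assumes demand is monotonic in the workload, so the search for the next,
    larger workload starts at the previous result instead of at the bottom."""
    options = sorted(instance_options)
    demand: Dict[int, Optional[int]] = {}
    start = 0
    for w in sorted(workloads):
        demand[w] = None
        for idx in range(start, len(options)):
            if slo_met(w, options[idx]):  # one (hypothetical) benchmark run
                demand[w] = options[idx]
                start = idx
                break
    return demand


# Hypothetical stand-in for real runs: each instance handles 10,000 msg/s.
fake = lambda workload, instances: instances * 10_000 >= workload
print(resource_demand([10_000, 25_000, 50_000, 200_000], [1, 2, 3, 4, 5, 6], fake))
# {10000: 1, 25000: 3, 50000: 5, 200000: None}
```

Starting each search at the previous result saves benchmark runs, which matters because every run in a real setting means deploying and loading a cluster; the price is that a non-monotonic engine could be misjudged.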
Submitted 11 February, 2021; v1 submitted 1 September, 2020;
originally announced September 2020.
-
Microservice Decomposition via Static and Dynamic Analysis of the Monolith
Authors:
Alexander Krause,
Christian Zirkelbach,
Wilhelm Hasselbring,
Stephan Lenga,
Dan Kröger
Abstract:
Migrating monolithic software systems into microservices requires the application of decomposition techniques to find and select appropriate service boundaries. These techniques are often based on domain knowledge, static code analysis, and non-functional requirements such as maintainability. In this paper, we present our experience with an approach that extends static analysis with dynamic analysis of a legacy software system's runtime behavior, including live trace visualization, to support the decomposition into microservices. Overall, our approach combines established analysis techniques for microservice decomposition, such as the bounded context pattern of domain-driven design, and enriches the collected information via dynamic software visualization to identify appropriate microservice boundaries. In collaboration with the German IT service provider adesso SE, we applied our approach to their real-world legacy lottery application in|FOCUS to identify good microservice decompositions for this layered monolithic Enterprise Java system.
Submitted 5 March, 2020;
originally announced March 2020.
-
Scalable and Reliable Multi-Dimensional Aggregation of Sensor Data Streams
Authors:
Sören Henning,
Wilhelm Hasselbring
Abstract:
Ever-increasing amounts of data and requirements to process them in real time lead to more and more analytics platforms and software systems being designed according to the concept of stream processing. A common area of application is the processing of continuous data streams from sensors, for example, IoT devices or performance monitoring tools. In addition to analyzing pure sensor data, analyses of data for groups of sensors often need to be performed as well. Therefore, the data streams of the individual sensors have to be continuously aggregated into a data stream for each group. Motivated by a real-world application scenario, we propose that such a stream aggregation approach has to allow for aggregating sensors in hierarchical groups, support multiple such hierarchies in parallel, provide reconfiguration at runtime, and preserve the scalability and reliability qualities induced by applying stream processing techniques. We propose a stream processing architecture fulfilling these requirements, which can be integrated into existing big data architectures. We present a pilot implementation of such an extended architecture and show how it is used in industry. Furthermore, in experimental evaluations, we show that our solution scales linearly with the number of sensors and provides adequate reliability in the case of faults.
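The requirements for hierarchical groups, multiple hierarchies in parallel, and runtime reconfiguration can be illustrated with a deliberately simplified, single-process sketch. It is not the paper's distributed stream processing architecture. The class and method names are hypothetical, summation is assumed as the aggregation function, and hierarchies are assumed to be acyclic.

```python
from typing import Dict, List

# A hierarchy maps a group name to its children (sensor IDs or sub-group names).
Hierarchy = Dict[str, List[str]]


class HierarchicalAggregator:
    """Keeps the latest reading per sensor and derives group sums on demand.

    Several hierarchies can be registered in parallel, and registering a
    hierarchy under an existing name replaces it (runtime reconfiguration)."""

    def __init__(self) -> None:
        self.latest: Dict[str, float] = {}
        self.hierarchies: Dict[str, Hierarchy] = {}

    def configure(self, name: str, hierarchy: Hierarchy) -> None:
        self.hierarchies[name] = hierarchy

    def record(self, sensor: str, value: float) -> None:
        # In a streaming system this would be one incoming sensor event.
        self.latest[sensor] = value

    def aggregate(self, name: str, group: str) -> float:
        hierarchy = self.hierarchies[name]
        if group not in hierarchy:
            # Leaf: a sensor; sensors without readings contribute 0.
            return self.latest.get(group, 0.0)
        # Inner node: recursively sum over all children (assumes no cycles).
        return sum(self.aggregate(name, child) for child in hierarchy[group])
```

For example, with one hierarchy grouping machines `m1`, `m2`, `m3` by production hall and a second, parallel one grouping them by machine type, the same readings yield both views. Reconfiguring a group only replaces its entry in `hierarchies`. A scalable version would instead update group results incrementally as each reading arrives rather than recomputing them on request.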
Submitted 15 November, 2019;
originally announced November 2019.
-
Comparing Static and Dynamic Weighted Software Coupling Metrics
Authors:
Henning Schnoor,
Wilhelm Hasselbring
Abstract:
Coupling metrics are an established way to measure software architecture quality with respect to modularity. Static coupling metrics are obtained from the source or compiled code of a program, while dynamic metrics use runtime data gathered, e.g., by monitoring a system in production. We study weighted dynamic coupling, which takes into account how often a connection is executed during a system's run. We investigate the correlation between dynamic weighted metrics and their static counterparts. We use data collected from four different experiments, each monitoring production use of a commercial software system over a period of four weeks. We observe an unexpected level of correlation between the static and the weighted dynamic case, and reveal differences between class- and package-level analyses.
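The difference between static and weighted dynamic coupling can be made concrete with a small sketch. It uses plain efferent (outgoing) coupling and the Pearson coefficient as illustrative choices; the abstract does not specify which coupling metrics or which correlation coefficient the study uses, so both are assumptions. The static variant counts distinct dependencies found in code, and the weighted variant counts every executed call in a runtime trace. The `level` function switches between class-level and package-level analysis. All names are hypothetical.

```python
import math
from typing import Callable, Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee) as fully qualified class names


def class_level(name: str) -> str:
    return name


def package_level(name: str) -> str:
    return name.rsplit(".", 1)[0]  # "shop.cart.Cart" -> "shop.cart"


def efferent_coupling(edges: Iterable[Edge], weighted: bool,
                      level: Callable[[str], str] = class_level) -> Dict[str, int]:
    """Outgoing coupling per module at the chosen granularity.

    weighted=False (static): number of distinct other modules a module depends on.
    weighted=True (dynamic): number of executed calls into other modules, so a
    dependency counts as often as it occurs in the runtime trace."""
    counts: Dict[str, int] = {}
    seen: Set[Edge] = set()
    for caller, callee in edges:
        src, dst = level(caller), level(callee)
        if src == dst:
            continue  # calls inside one module are not coupling
        if not weighted:
            if (src, dst) in seen:
                continue
            seen.add((src, dst))
        counts[src] = counts.get(src, 0) + 1
    return counts


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equally long, non-constant samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

To compare the two views, compute both dictionaries over the same modules, for example `mods = sorted(static_counts)`, and correlate `[static_counts[m] for m in mods]` with `[dynamic_counts.get(m, 0) for m in mods]`. Passing `level=package_level` repeats the analysis at package granularity, where calls between classes of the same package no longer count as coupling.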
Submitted 27 September, 2019;
originally announced September 2019.
-
FAIR and Open Computer Science Research Software
Authors:
Wilhelm Hasselbring,
Leslie Carr,
Simon Hettrick,
Heather Packer,
Thanassis Tiropanis
Abstract:
In computational science and in computer science, research software is a central asset for research. Computational science is the application of computer science and software engineering principles to solving scientific problems, whereas computer science is the study of computer hardware and software design.
The Open Science agenda holds that science advances faster when we can build on existing results. Therefore, research software has to be reusable for advancing science. Thus, we need proper research software engineering for obtaining reusable and sustainable research software. This way, software engineering methods may improve research in other disciplines. However, research in software engineering and computer science itself will also benefit from reuse when research software is involved.
For good scientific practice, the resulting research software should be open and adhere to the FAIR principles (findable, accessible, interoperable, and reusable) to allow repeatability, reproducibility, and reuse. Compared to research data, research software should be both archived for reproducibility and actively maintained for reusability. The FAIR data principles do not require openness, but research software should be open source software. Established open source software licenses provide sufficient licensing options, such that it should be the rare exception to keep research software closed.
We review and analyze the current state in this area in order to give recommendations for making computer science research software FAIR and open. We observe that research software publishing practices in computer science and in computational science show significant differences.
Submitted 16 August, 2019;
originally announced August 2019.
-
Modularization of Research Software for Collaborative Open Source Development
Authors:
Christian Zirkelbach,
Alexander Krause,
Wilhelm Hasselbring
Abstract:
Software systems evolve over their lifetime. Changing conditions, such as changing requirements or customer requests, make it inevitable for developers to adjust the underlying code base. Especially in the context of open source software, where everybody can contribute, requirements can change over time and new user groups may be addressed. In particular, research software is often not structured with a maintainable and extensible architecture. Combined with obsolescent technologies, this makes contributing challenging for new developers, especially when students are involved.
In this paper, we report on the modularization process and architecture of our open source research project ExplorViz towards a microservice architecture. The new architecture facilitates a collaborative development process for both researchers and students. We describe the modularization measures and present how we solved the issues that arose and enhanced our development process. Afterwards, we illustrate our modularization approach with our modernized, extensible software system architecture and highlight the improved collaborative development process. Finally, we present a proof-of-concept implementation featuring several developed extensions in terms of architecture and extensibility.
Submitted 12 July, 2019;
originally announced July 2019.
-
Industrial DevOps
Authors:
Wilhelm Hasselbring,
Sören Henning,
Björn Latte,
Armin Möbius,
Thomas Richter,
Stefan Schalk,
Maik Wojcieszak
Abstract:
The visions and ideas of Industry 4.0 require a profound interconnection of machines, plants, and IT systems in industrial production environments. This significantly increases the importance of software, which is, at the same time, one of the main obstacles to the introduction of Industry 4.0. Lack of experience and knowledge, high investment and maintenance costs, and uncertainty about future developments cause many small and medium-sized enterprises to hesitate to adopt Industry 4.0 solutions. We propose Industrial DevOps as an approach to introduce the methods and culture of DevOps into industrial production environments. The fundamental concept of this approach is a continuous process of operation, observation, and development of the entire production environment. This way, all stakeholders, systems, and data can be integrated via incremental steps, and adjustments can be made quickly. Furthermore, we present the Titan software platform, accompanied by a role model for integrating production environments with Industrial DevOps. In two initial industrial application scenarios, we address the challenges of energy management and predictive maintenance with the methods, organizational structures, and tools of Industrial DevOps.
Submitted 3 July, 2019;
originally announced July 2019.
-
A Scalable Architecture for Power Consumption Monitoring in Industrial Production Environments
Authors:
Sören Henning,
Wilhelm Hasselbring,
Armin Möbius
Abstract:
Detailed knowledge about the electrical power consumption in industrial production environments is a prerequisite to reduce and optimize their power consumption. Today's industrial production sites are equipped with a variety of sensors that, inter alia, monitor electrical power consumption in detail. However, these environments often lack automated data collation and analysis.
We present a system architecture that integrates different sensors and analyzes and visualizes the power consumption of devices, machines, and production plants. It is designed with a focus on scalability to support production environments of various sizes and to handle varying loads. We argue that a scalable architecture in this context must meet requirements for fault tolerance, extensibility, real-time data processing, and resource efficiency. As a solution, we propose a microservice-based architecture augmented by big data and stream processing techniques. Applying the fog computing paradigm, parts of it are deployed in an elastic, central cloud, while other parts run decentralized, directly in the production environment.
A prototype implementation of this architecture shows how different kinds of sensors can be integrated and how their measurements can be continuously aggregated. To make the analyzed data comprehensible, it features a single-page web application that provides different forms of data visualization. We deploy this pilot implementation in the data center of a medium-sized enterprise, where we successfully monitor the power consumption of 16 servers. Furthermore, we show the scalability of our architecture with 20,000 simulated sensors.
Submitted 1 July, 2019;
originally announced July 2019.
-
Performance-oriented DevOps: A Research Agenda
Authors:
Andreas Brunnert,
Andre van Hoorn,
Felix Willnecker,
Alexandru Danciu,
Wilhelm Hasselbring,
Christoph Heger,
Nikolas Herbst,
Pooyan Jamshidi,
Reiner Jung,
Joakim von Kistowski,
Anne Koziolek,
Johannes Kroß,
Simon Spinner,
Christian Vögele,
Jürgen Walter,
Alexander Wert
Abstract:
DevOps is a trend towards a tighter integration between development (Dev) and operations (Ops) teams. The need for such an integration is driven by the requirement to continuously adapt enterprise applications (EAs) to changes in the business environment. As of today, DevOps concepts have been primarily introduced to ensure a constant flow of features and bug fixes into new releases from a functional perspective. In order to integrate a non-functional perspective into these DevOps concepts, this report focuses on tools, activities, and processes to ensure one of the most important quality attributes of a software system, namely performance.
Performance describes system properties concerning timeliness and use of resources. Common metrics are response time, throughput, and resource utilization. Performance goals for EAs are typically defined by setting upper and/or lower bounds for these metrics and specific business transactions. In order to ensure that such performance goals can be met, several activities are required during development and operation of these systems as well as during the transition from Dev to Ops. Activities during development are typically summarized by the term Software Performance Engineering (SPE), whereas activities during operations are called Application Performance Management (APM). SPE and APM were historically tackled independently from each other, but the newly emerging DevOps concepts require and enable a tighter integration between both activity streams. This report presents existing solutions to support this integration as well as open research challenges in this area.
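Defining performance goals as upper and lower bounds per business transaction can be illustrated with a short check. This is a minimal sketch with hypothetical names: `PerformanceGoal`, the choice of the 95th-percentile response time (nearest-rank method) as the bounded statistic, and the transaction name `"checkout"` are all assumptions. Resource utilization, the third common metric, is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PerformanceGoal:
    """Bounds for one business transaction (hypothetical structure)."""
    transaction: str
    max_p95_response_ms: Optional[float] = None   # upper bound on response time
    min_throughput_per_s: Optional[float] = None  # lower bound on throughput


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile: the smallest sample such that at least
    95 % of all samples are less than or equal to it."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def meets(goal: PerformanceGoal, response_ms: List[float], window_s: float) -> bool:
    """Check response times measured over window_s seconds against the goal's bounds."""
    if goal.max_p95_response_ms is not None and p95(response_ms) > goal.max_p95_response_ms:
        return False
    throughput = len(response_ms) / window_s  # completed requests per second
    if goal.min_throughput_per_s is not None and throughput < goal.min_throughput_per_s:
        return False
    return True


# 100 hypothetical requests of "checkout", taking 1..100 ms, measured over 10 s.
samples = [float(v) for v in range(1, 101)]
print(meets(PerformanceGoal("checkout", 100.0, 5.0), samples, 10.0))  # True
```

Such a check could run on load-test results during development (the SPE side) as well as on monitoring data in operation (the APM side), which is one concrete point where the two activity streams can share artifacts.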
Submitted 18 August, 2015;
originally announced August 2015.
-
Runtime Reconfiguration of J2EE Applications
Authors:
Jasminka Matevska-Meyer,
Sascha Olliges,
Wilhelm Hasselbring
Abstract:
Runtime reconfiguration, considered as "applying required changes to a running system", plays an important role in providing high availability not only for safety- and mission-critical systems, but also for commercial web applications offering professional services. Here, the main concerns are maintaining the consistency of the running system during reconfiguration and minimizing the down-time caused by the reconfiguration. This paper focuses on the platform-independent subsystem that realises deployment and redeployment of J2EE modules based on the new J2EE Deployment API, as part of the implementation of our proposed system architecture enabling runtime reconfiguration of component-based systems. Our "controlled runtime redeployment" comprises an extension of hot deployment and dynamic reloading, complemented by allowing for structural change.
Submitted 17 November, 2004;
originally announced November 2004.