-
A Query Language for Software Architecture Information (Extended version)
Authors:
Joshua Ammermann,
Sven Jordan,
Lukas Linsbauer,
Ina Schaefer
Abstract:
Software maintenance is an important part of a software system's life cycle. Maintenance tasks of existing software systems suffer from architecture information that is diverging over time (architectural drift). The Digital Architecture Twin (DArT) can support software maintenance by providing up-to-date architecture information. For this, the DArT gathers such information and co-evolves with a so…
▽ More
Software maintenance is an important part of a software system's life cycle. Maintenance tasks of existing software systems suffer from architecture information that is diverging over time (architectural drift). The Digital Architecture Twin (DArT) can support software maintenance by providing up-to-date architecture information. For this, the DArT gathers such information and co-evolves with a software system, enabling continuous reverse engineering. But the crucial link for stakeholders to retrieve this information is missing. To fill this gap, we contribute the Architecture Information Query Language (AIQL), which enables stakeholders to access up-to-date and tailored architecture information. We derived four application scenarios in the context of continuous reverse engineering. We showed that the AIQL provides the required functionality to formulate queries for the application scenarios and that the language scales for use with real-world software systems. In a user study, stakeholders agreed that the language is easy to understand and assessed its value to the specific stakeholder for the application scenarios.
△ Less
Submitted 4 July, 2023; v1 submitted 29 June, 2023;
originally announced June 2023.
-
Custom-Tailored Clone Detection for IEC 61131-3 Programming Languages
Authors:
Kamil Rosiak,
Alexander Schlie,
Lukas Linsbauer,
Birgit Vogel-Heuser,
Ina Schaefer
Abstract:
Automated production systems (aPS) are highly customized systems that consist of hardware and software. Such aPS are controlled by a programmable logic controller (PLC), often in accordance with the IEC 61131-3 standard that divides system implementation into so-called program organization units (POUs) as the smallest software unit and is comprised of multiple textual and graphical programming lan…
▽ More
Automated production systems (aPS) are highly customized systems that consist of hardware and software. Such aPS are controlled by a programmable logic controller (PLC), often in accordance with the IEC 61131-3 standard that divides system implementation into so-called program organization units (POUs) as the smallest software unit and is comprised of multiple textual and graphical programming languages that can be arbitrarily nested. A common practice during the development of such systems is reusing implementation artifacts by copying, pasting, and then modifying code. This approach is referred to as code cloning. It is used on a fine-granular level where a POU is cloned within a system variant. It is also applied on the coarse-granular system level, where the entire system is cloned and adapted to create a system variant, for example for another customer. This ad hoc practice for the development of variants is commonly referred to as clone-and-own. It allows the fast development of variants to meet varying customer requirements or altered regulatory guidelines. However, clone-and-own is a non-sustainable approach and does not scale with an increasing number of variants. It has a detrimental effect on the overall quality of a software system, such as the propagation of bugs to other variants, which harms maintenance. In order to support the effective development and maintenance of such systems, a detailed code clone analysis is required. On the one hand, an analysis of code clones within a variant (i.e., clone detection in the classical sense) supports experts in refactoring respective code into library components. On the other hand, an analysis of commonalities and differences between cloned variants (i.e., variability analysis) supports the maintenance and further reuse and facilitates the migration of variants into a software product line (SPL).
△ Less
Submitted 22 August, 2021;
originally announced August 2021.
-
Semantic Clone Detection via Probabilistic Software Modeling
Authors:
Hannes Thaller,
Lukas Linsbauer,
Alexander Egyed
Abstract:
Semantic clone detection is the process of finding program elements with similar or equal runtime behavior. For example, detecting the semantic equality between the recursive and iterative implementation of the factorial computation. Semantic clone detection is the de facto technical boundary of clone detectors. In recent years, this boundary has been tested using interesting new approaches. This…
▽ More
Semantic clone detection is the process of finding program elements with similar or equal runtime behavior. For example, detecting the semantic equality between the recursive and iterative implementation of the factorial computation. Semantic clone detection is the de facto technical boundary of clone detectors. In recent years, this boundary has been tested using interesting new approaches. This article contributes a semantic clone detection approach that detects clones that have 0% syntactic similarity. We present Semantic Clone Detection via Probabilistic Software Modeling (SCD-PSM) as a stable and precise solution to semantic clone detection. PSM builds a probabilistic model of a program that is capable of evaluating and generating runtime data. SCD-PSM leverages this model and its model elements for finding behaviorally equal model elements. This behavioral equality is then generalized to semantic equality of the original program elements. It uses the likelihood between model elements as a distance metric. Then, it employs the likelihood ratio significance test to decide whether this distance is significant, given a pre-specified and controllable false-positive rate. The output of SCD-PSM are pairs of program elements (i.e., methods), their distance, and a decision on whether they are clones or not. SCD-PSM yields excellent results with a Matthews Correlation Coefficient greater than 0.9. These results are obtained on classical semantic clone detection problems such as detecting recursive and iterative versions of an algorithm, but also on complex problems used in coding competitions.
△ Less
Submitted 21 May, 2022; v1 submitted 11 August, 2020;
originally announced August 2020.
-
Towards Fault Localization via Probabilistic Software Modeling
Authors:
Hannes Thaller,
Lukas Linsbauer,
Alexander Egyed,
Stefan Fischer
Abstract:
Software testing helps developers to identify bugs. However, awareness of bugs is only the first step. Finding and correcting the faulty program components is equally hard and essential for high-quality software. Fault localization automatically pinpoints the location of an existing bug in a program. It is a hard problem, and existing methods are not yet precise enough for widespread industrial ad…
▽ More
Software testing helps developers to identify bugs. However, awareness of bugs is only the first step. Finding and correcting the faulty program components is equally hard and essential for high-quality software. Fault localization automatically pinpoints the location of an existing bug in a program. It is a hard problem, and existing methods are not yet precise enough for widespread industrial adoption. We propose fault localization via Probabilistic Software Modeling (PSM). PSM analyzes the structure and behavior of a program and synthesizes a network of Probabilistic Models (PMs). Each PM models a method with its inputs and outputs and is capable of evaluating the likelihood of runtime data. We use this likelihood evaluation to find fault locations and their impact on dependent code elements. Results indicate that PSM is a robust framework for accurate fault localization.
△ Less
Submitted 4 March, 2020; v1 submitted 21 January, 2020;
originally announced January 2020.
-
Towards Semantic Clone Detection via Probabilistic Software Modeling
Authors:
Hannes Thaller,
Lukas Linsbauer,
Alexander Egyed
Abstract:
Semantic clones are program components with similar behavior, but different textual representation. Semantic similarity is hard to detect, and semantic clone detection is still an open issue. We present semantic clone detection via Probabilistic Software Modeling (PSM) as a robust method for detecting semantically equivalent methods. PSM inspects the structure and runtime behavior of a program and…
▽ More
Semantic clones are program components with similar behavior, but different textual representation. Semantic similarity is hard to detect, and semantic clone detection is still an open issue. We present semantic clone detection via Probabilistic Software Modeling (PSM) as a robust method for detecting semantically equivalent methods. PSM inspects the structure and runtime behavior of a program and synthesizes a network of Probabilistic Models (PMs). Each PM in the network represents a method in the program and is capable of generating and evaluating runtime events. We leverage these capabilities to accurately find semantic clones. Results show that the approach can detect semantic clones in the complete absence of syntactic similarity with high precision and low error rates.
△ Less
Submitted 21 January, 2020;
originally announced January 2020.
-
Probabilistic Software Modeling: A Data-driven Paradigm for Software Analysis
Authors:
Hannes Thaller,
Lukas Linsbauer,
Rudolf Ramler,
Alexander Egyed
Abstract:
Software systems are complex, and behavioral comprehension with the increasing amount of AI components challenges traditional testing and maintenance strategies.The lack of tools and methodologies for behavioral software comprehension leaves developers to testing and debugging that work in the boundaries of known scenarios. We present Probabilistic Software Modeling (PSM), a data-driven modeling p…
▽ More
Software systems are complex, and behavioral comprehension with the increasing amount of AI components challenges traditional testing and maintenance strategies.The lack of tools and methodologies for behavioral software comprehension leaves developers to testing and debugging that work in the boundaries of known scenarios. We present Probabilistic Software Modeling (PSM), a data-driven modeling paradigm for predictive and generative methods in software engineering. PSM analyzes a program and synthesizes a network of probabilistic models that can simulate and quantify the original program's behavior. The approach extracts the type, executable, and property structure of a program and copies its topology. Each model is then optimized towards the observed runtime leading to a network that reflects the system's structure and behavior. The resulting network allows for the full spectrum of statistical inferential analysis with which rich predictive and generative applications can be built. Applications range from the visualization of states, inferential queries, test case generation, and anomaly detection up to the stochastic execution of the modeled system. In this work, we present the modeling methodologies, an empirical study of the runtime behavior of software systems, and a comprehensive study on PSM modeled systems. Results indicate that PSM is a solid foundation for structural and behavioral software comprehension applications.
△ Less
Submitted 18 December, 2019; v1 submitted 17 December, 2019;
originally announced December 2019.
-
Feature Maps: A Comprehensible Software Representation for Design Pattern Detection
Authors:
Hannes Thaller,
Lukas Linsbauer,
Alexander Egyed
Abstract:
Design patterns are elegant and well-tested solutions to recurrent software development problems. They are the result of software developers dealing with problems that frequently occur, solving them in the same or a slightly adapted way. A pattern's semantics provide the intent, motivation, and applicability, describing what it does, why it is needed, and where it is useful. Consequently, design p…
▽ More
Design patterns are elegant and well-tested solutions to recurrent software development problems. They are the result of software developers dealing with problems that frequently occur, solving them in the same or a slightly adapted way. A pattern's semantics provide the intent, motivation, and applicability, describing what it does, why it is needed, and where it is useful. Consequently, design patterns encode a well of information. Developers weave this information into their systems whenever they use design patterns to solve problems. This work presents Feature Maps, a flexible human- and machine-comprehensible software representation based on micro-structures. Our algorithm, the Feature-Role Normalization, presses the high-dimensional, inhomogeneous vector space of micro-structures into a feature map. We apply these concepts to the problem of detecting instances of design patterns in source code. We evaluate our methodology on four design patterns, a wide range of balanced and imbalanced labeled training data, and compare classical machine learning (Random Forests) with modern deep learning approaches (Convolutional Neural Networks). Feature maps yield robust classifiers even under challenging settings of strongly imbalanced data distributions without sacrificing human comprehensibility. Results suggest that feature maps are an excellent addition in the software analysis toolbox that can reveal useful information hidden in the source code.
△ Less
Submitted 16 January, 2019; v1 submitted 24 December, 2018;
originally announced December 2018.
-
A Hitchhiker's Guide to Search-Based Software Engineering for Software Product Lines
Authors:
Roberto E. Lopez-Herrejon,
Javier Ferrer,
Francisco Chicano,
Lukas Linsbauer,
Alexander Egyed,
Enrique Alba
Abstract:
Search Based Software Engineering (SBSE) is an emerging discipline that focuses on the application of search-based optimization techniques to software engineering problems. The capacity of SBSE techniques to tackle problems involving large search spaces make their application attractive for Software Product Lines (SPLs). In recent years, several publications have appeared that apply SBSE technique…
▽ More
Search Based Software Engineering (SBSE) is an emerging discipline that focuses on the application of search-based optimization techniques to software engineering problems. The capacity of SBSE techniques to tackle problems involving large search spaces make their application attractive for Software Product Lines (SPLs). In recent years, several publications have appeared that apply SBSE techniques to SPL problems. In this paper, we present the results of a systematic map** study of such publications. We identified the stages of the SPL life cycle where SBSE techniques have been used, what case studies have been employed and how they have been analysed. This map** study revealed potential venues for further research as well as common misunderstanding and pitfalls when applying SBSE techniques that we address by providing a guideline for researchers and practitioners interested in exploiting these techniques.
△ Less
Submitted 11 June, 2014;
originally announced June 2014.