Search | arXiv e-print repository

A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models

Authors: Fernando Vallecillos Ruiz, Anastasiia Grishina, Max Hort, Leon Moonen

Abstract: Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back using neural machine translation with language models. We investigate whether this correction capability of Large Language Models (LLMs) extends to Automatic Program Repair (APR). Current generative models for APR are pre-trained on source code and fine-tuned for repair. This pape… ▽ More Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back using neural machine translation with language models. We investigate whether this correction capability of Large Language Models (LLMs) extends to Automatic Program Repair (APR). Current generative models for APR are pre-trained on source code and fine-tuned for repair. This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back. We hypothesize that RTT with LLMs restores the most commonly seen patterns in code during pre-training, i.e., performs a regression toward the mean, which removes bugs as they are a form of noise w.r.t. the more frequent, natural, bug-free code in the training data. To test this hypothesis, we employ eight recent LLMs pre-trained on code, including the latest GPT versions, and four common program repair benchmarks in Java. We find that RTT with English as an intermediate language repaired 101 of 164 bugs with GPT-4 on the HumanEval-Java dataset. Moreover, 46 of these are unique bugs that are not repaired by other LLMs fine-tuned for APR. Our findings highlight the viability of round-trip translation with LLMs as a technique for automated program repair and its potential for research in software engineering. Keywords: automated program repair, large language model, machine translation △ Less

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2307.02443 [pdf, other]

An Exploratory Literature Study on Sharing and Energy Use of Language Models for Source Code

Authors: Max Hort, Anastasiia Grishina, Leon Moonen

Abstract: Large language models trained on source code can support a variety of software development tasks, such as code recommendation and program repair. Large amounts of data for training such models benefit the models' performance. However, the size of the data and models results in long training times and high energy consumption. While publishing source code allows for replicability, users need to repe… ▽ More Large language models trained on source code can support a variety of software development tasks, such as code recommendation and program repair. Large amounts of data for training such models benefit the models' performance. However, the size of the data and models results in long training times and high energy consumption. While publishing source code allows for replicability, users need to repeat the expensive training process if models are not shared. The main goal of the study is to investigate if publications that trained language models for software engineering (SE) tasks share source code and trained artifacts. The second goal is to analyze the transparency on training energy usage. We perform a snowballing-based literature search to find publications on language models for source code, and analyze their reusability from a sustainability standpoint. From 494 unique publications, we identified 293 relevant publications that use language models to address code-related tasks. Among them, 27% (79 out of 293) make artifacts available for reuse. This can be in the form of tools or IDE plugins designed for specific tasks or task-agnostic models that can be fine-tuned for a variety of downstream tasks. Moreover, we collect insights on the hardware used for model training, as well as training time, which together determine the energy consumption of the development process. We find that there are deficiencies in the sharing of information and artifacts for current studies on source code models for software engineering tasks, with 40% of the surveyed papers not sharing source code or trained artifacts. We recommend the sharing of source code as well as trained artifacts, to enable sustainable reproducibility. Moreover, comprehensive information on training times and hardware configurations should be shared for transparency on a model's carbon footprint. △ Less

Submitted 5 July, 2023; originally announced July 2023.

Comments: Accepted for publication in the 17th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2023)

arXiv:2305.04940 [pdf, other]

doi 10.1145/3611643.3616304

The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification

Authors: Anastasiia Grishina, Max Hort, Leon Moonen

Abstract: The use of modern Natural Language Processing (NLP) techniques has shown to be beneficial for software engineering tasks, such as vulnerability detection and type inference. However, training deep NLP models requires significant computational resources. This paper explores techniques that aim at achieving the best usage of resources and available information in these models. We propose a generic… ▽ More The use of modern Natural Language Processing (NLP) techniques has shown to be beneficial for software engineering tasks, such as vulnerability detection and type inference. However, training deep NLP models requires significant computational resources. This paper explores techniques that aim at achieving the best usage of resources and available information in these models. We propose a generic approach, EarlyBIRD, to build composite representations of code from the early layers of a pre-trained transformer model. We empirically investigate the viability of this approach on the CodeBERT model by comparing the performance of 12 strategies for creating composite representations with the standard practice of only using the last encoder layer. Our evaluation on four datasets shows that several early layer combinations yield better performance on defect detection, and some combinations improve multi-class classification. More specifically, we obtain a +2 average improvement of detection accuracy on Devign with only 3 out of 12 layers of CodeBERT and a 3.3x speed-up of fine-tuning. These findings show that early layers can be used to obtain better results using the same resources, as well as to reduce resource usage during fine-tuning and inference. △ Less

Submitted 11 September, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

Comments: The content in this pre-print is the same as in the CRC accepted for publication in the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023)

arXiv:2305.00382 [pdf, other]

Constructing a Knowledge Graph from Textual Descriptions of Software Vulnerabilities in the National Vulnerability Database

Authors: Anders Mølmen Høst, Pierre Lison, Leon Moonen

Abstract: Knowledge graphs have shown promise for several cybersecurity tasks, such as vulnerability assessment and threat analysis. In this work, we present a new method for constructing a vulnerability knowledge graph from information in the National Vulnerability Database (NVD). Our approach combines named entity recognition (NER), relation extraction (RE), and entity prediction using a combination of ne… ▽ More Knowledge graphs have shown promise for several cybersecurity tasks, such as vulnerability assessment and threat analysis. In this work, we present a new method for constructing a vulnerability knowledge graph from information in the National Vulnerability Database (NVD). Our approach combines named entity recognition (NER), relation extraction (RE), and entity prediction using a combination of neural models, heuristic rules, and knowledge graph embeddings. We demonstrate how our method helps to fix missing entities in knowledge graphs used for cybersecurity and evaluate the performance. △ Less

Submitted 15 May, 2023; v1 submitted 30 April, 2023; originally announced May 2023.

Comments: Accepted for publication in the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), Tórshavn, Faroe Islands, May 22nd-24th, 2023. [v2]: added funding acknowledgments

arXiv:2304.10423 [pdf, other]

doi 10.1145/3583131.3590481

Fully Autonomous Programming with Large Language Models

Authors: Vadim Liventsev, Anastasiia Grishina, Aki Härmä, Leon Moonen

Abstract: Current approaches to program synthesis with Large Language Models (LLMs) exhibit a "near miss syndrome": they tend to generate programs that semantically resemble the correct answer (as measured by text similarity metrics or human evaluation), but achieve a low or even zero accuracy as measured by unit tests due to small imperfections, such as the wrong input or output format. This calls for an a… ▽ More Current approaches to program synthesis with Large Language Models (LLMs) exhibit a "near miss syndrome": they tend to generate programs that semantically resemble the correct answer (as measured by text similarity metrics or human evaluation), but achieve a low or even zero accuracy as measured by unit tests due to small imperfections, such as the wrong input or output format. This calls for an approach known as Synthesize, Execute, Debug (SED), whereby a draft of the solution is generated first, followed by a program repair phase addressing the failed tests. To effectively apply this approach to instruction-driven LLMs, one needs to determine which prompts perform best as instructions for LLMs, as well as strike a balance between repairing unsuccessful programs and replacing them with newly generated ones. We explore these trade-offs empirically, comparing replace-focused, repair-focused, and hybrid debug strategies, as well as different template-based and model-based prompt-generation techniques. We use OpenAI Codex as the LLM and Program Synthesis Benchmark 2 as a database of problem descriptions and tests for evaluation. The resulting framework outperforms both conventional usage of Codex without the repair phase and traditional genetic programming approaches. △ Less

Submitted 20 April, 2023; originally announced April 2023.

Comments: Accepted for publication in the Genetic and Evolutionary Computation Conference (GECCO 2023)

arXiv:2303.07283 [pdf, other]

CHESS: A Framework for Evaluation of Self-adaptive Systems based on Chaos Engineering

Authors: Sehrish Malik, Moeen Ali Naqvi, Leon Moonen

Abstract: There is an increasing need to assess the correct behavior of self-adaptive and self-healing systems due to their adoption in critical and highly dynamic environments. However, there is a lack of systematic evaluation methods for self-adaptive and self-healing systems. We proposed CHESS, a novel approach to address this gap by evaluating self-adaptive and self-healing systems through fault injecti… ▽ More There is an increasing need to assess the correct behavior of self-adaptive and self-healing systems due to their adoption in critical and highly dynamic environments. However, there is a lack of systematic evaluation methods for self-adaptive and self-healing systems. We proposed CHESS, a novel approach to address this gap by evaluating self-adaptive and self-healing systems through fault injection based on chaos engineering (CE) [ arXiv:2208.13227 ]. The artifact presented in this paper provides an extensive overview of the use of CHESS through two microservice-based case studies: a smart office case study and an existing demo application called Yelb. It comes with a managing system service, a self-monitoring service, as well as five fault injection scenarios covering infrastructure faults and functional faults. Each of these components can be easily extended or replaced to adopt the CHESS approach to a new case study, help explore its promises and limitations, and identify directions for future research. Keywords: self-healing, resilience, chaos engineering, evaluation, artifact △ Less

Submitted 13 March, 2023; originally announced March 2023.

Comments: Accepted for publication in the 18nd Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS 2023)

arXiv:2211.03911 [pdf, other]

Towards Extending the Range of Bugs That Automated Program Repair Can Handle

Authors: Omar I. Al-Bataineh, Leon Moonen

Abstract: Modern automated program repair (APR) is well-tuned to finding and repairing bugs that introduce observable erroneous behavior to a program. However, a significant class of bugs does not lead to such observable behavior (e.g., liveness/termination bugs, non-functional bugs, and information flow bugs). Such bugs can generally not be handled with current APR approaches, so, as a community, we need t… ▽ More Modern automated program repair (APR) is well-tuned to finding and repairing bugs that introduce observable erroneous behavior to a program. However, a significant class of bugs does not lead to such observable behavior (e.g., liveness/termination bugs, non-functional bugs, and information flow bugs). Such bugs can generally not be handled with current APR approaches, so, as a community, we need to develop complementary techniques. To stimulate the systematic study of alternative APR approaches and hybrid APR combinations, we devise a novel bug classification system that enables methodical analysis of their bug detection power and bug repair capabilities. To demonstrate the benefits, we analyze the repair of termination bugs in sequential and concurrent programs. The study shows that integrating dynamic APR with formal analysis techniques, such as termination provers and software model checkers, reduces complexity and improves the overall reliability of these repairs. △ Less

Submitted 7 November, 2022; originally announced November 2022.

Comments: Accepted for publication in the 22nd IEEE International Conference on Software Quality, Reliability and Security (QRS 2022)

arXiv:2208.13244 [pdf, other]

Assessing the Impact of Execution Environment on Observation-Based Slicing

Authors: David Binkley, Leon Moonen

Abstract: Program slicing reduces a program to a smaller version that retains a chosen computation, referred to as a slicing criterion. One recent multi-lingual slicing approach, observation-based slicing (ORBS), speculatively deletes parts of the program and then executes the code. If the behavior of the slicing criteria is unchanged, the speculative deletion is made permanent. While this makes ORBS lang… ▽ More Program slicing reduces a program to a smaller version that retains a chosen computation, referred to as a slicing criterion. One recent multi-lingual slicing approach, observation-based slicing (ORBS), speculatively deletes parts of the program and then executes the code. If the behavior of the slicing criteria is unchanged, the speculative deletion is made permanent. While this makes ORBS language agnostic, it can lead to the production of some non-intuitive slices. One particular challenge is when the execution environment plays a role. For example, ORBS will delete the line "a = 0" if the memory location assigned to a contains zero before executing the statement, since deletion will not affect the value of a and thus the slicing criterion. Consequently, slices can differ between execution environments due to factors such as initialization and call stack reuse. The technique considered, nVORBS, attempts to ameliorate this problem by validating a candidate slice in n different execution environments. We conduct an empirical study to collect initial insights into how often the execution environment leads to slice differences. Specifically, we compare and contrast the slices produced by seven different instantiations of nVORBS. Looking forward, the technique can be seen as a variation on metamorphic testing, and thus suggests how ideas from metamorphic testing might be used to improve dynamic program analysis. △ Less

Submitted 28 August, 2022; originally announced August 2022.

arXiv:2208.13227 [pdf, other]

doi 10.1109/ACSOS55765.2022.00018

On Evaluating Self-Adaptive and Self-Healing Systems using Chaos Engineering

Authors: Moeen Ali Naqvi, Sehrish Malik, Merve Astekin, Leon Moonen

Abstract: With the growing adoption of self-adaptive systems in various domains, there is an increasing need for strategies to assess their correct behavior. In particular self-healing systems, which aim to provide resilience and fault-tolerance, often deal with unanticipated failures in critical and highly dynamic environments. Their reactive and complex behavior makes it challenging to assess if these sys… ▽ More With the growing adoption of self-adaptive systems in various domains, there is an increasing need for strategies to assess their correct behavior. In particular self-healing systems, which aim to provide resilience and fault-tolerance, often deal with unanticipated failures in critical and highly dynamic environments. Their reactive and complex behavior makes it challenging to assess if these systems execute according to the desired goals. Recently, several studies have expressed concern about the lack of systematic evaluation methods for self-healing behavior. In this paper, we propose CHESS, an approach for the systematic evaluation of self-adaptive and self-healing systems that builds on chaos engineering. Chaos engineering is a methodology for subjecting a system to unexpected conditions and scenarios. It has shown great promise in hel** developers build resilient microservice architectures and cyber-physical systems. CHESS turns this idea around by using chaos engineering to evaluate how well a self-healing system can withstand such perturbations. We investigate the viability of this approach through an exploratory study on a self-healing smart office environment. The study helps us explore the promises and limitations of the approach, as well as identify directions where additional work is needed. We conclude with a summary of lessons learned. △ Less

Submitted 28 August, 2022; originally announced August 2022.

Comments: 10 pages

arXiv:2202.02679 [pdf, other]

doi 10.1016/j.infsof.2022.106844

Featherweight Assisted Vulnerability Discovery

Authors: David Binkley, Leon Moonen, Sibren Isaacman

Abstract: Predicting vulnerable source code helps to focus attention on those parts of the code that need to be examined with more scrutiny. Recent work proposed the use of function names as semantic cues that can be learned by a deep neural network (DNN) to aid in the hunt for vulnerability of functions. Combining identifier splitting, which splits each function name into its constituent words, with a no… ▽ More Predicting vulnerable source code helps to focus attention on those parts of the code that need to be examined with more scrutiny. Recent work proposed the use of function names as semantic cues that can be learned by a deep neural network (DNN) to aid in the hunt for vulnerability of functions. Combining identifier splitting, which splits each function name into its constituent words, with a novel frequency-based algorithm, we explore the extent to which the words that make up a function's name can predict potentially vulnerable functions. In contrast to *lightweight* predictions by a DNN that considers only function names, avoiding the use of a DNN provides *featherweight* predictions. The underlying idea is that function names that contain certain "dangerous" words are more likely to accompany vulnerable functions. Of course, this assumes that the frequency-based algorithm can be properly tuned to focus on truly dangerous words. Because it is more transparent than a DNN, the frequency-based algorithm enables us to investigate the inner workings of the DNN. If successful, this investigation into what the DNN does and does not learn will help us train more effective future models. We empirically evaluate our approach on a heterogeneous dataset containing over 73000 functions labeled vulnerable, and over 950000 functions labeled benign. Our analysis shows that words alone account for a significant portion of the DNN's classification ability. We also find that words are of greatest value in the datasets with a more homogeneous vocabulary. Thus, when working within the scope of a given project, where the vocabulary is unavoidably homogeneous, our approach provides a cheaper, potentially complementary, technique to aid in the hunt for source-code vulnerabilities. Finally, this approach has the advantage that it is viable with orders of magnitude less training data. △ Less

Submitted 5 February, 2022; originally announced February 2022.

Comments: 17 pages, 6 figures, 6 tables

Journal ref: Information and Software Technology, 2022

arXiv:2111.05713 [pdf, ps, other]

Towards More Reliable Automated Program Repair by Integrating Static Analysis Techniques

Authors: Omar I. Al-Bataineh, Anastasiia Grishina, Leon Moonen

Abstract: A long-standing open challenge for automated program repair is the overfitting problem, which is caused by having insufficient or incomplete specifications to validate whether a generated patch is correct or not. Most available repair systems rely on weak specifications (i.e., specifications that are synthesized from test cases) which limits the quality of generated repairs. To strengthen specific… ▽ More A long-standing open challenge for automated program repair is the overfitting problem, which is caused by having insufficient or incomplete specifications to validate whether a generated patch is correct or not. Most available repair systems rely on weak specifications (i.e., specifications that are synthesized from test cases) which limits the quality of generated repairs. To strengthen specifications and improve the quality of repairs, we propose to closer integrate static bug detection techniques with automated program repair. The integration combines automated program repair with static analysis techniques in such a way that bug detection patterns can be synthesized into specifications that the repair system can use. We explore the feasibility of such integration using two types of bugs: arithmetic bugs, such as integer overflow, and logical bugs, such as termination bugs. As part of our analysis, we make several observations that help to improve patch generation for these classes of bugs. Moreover, these observations assist with narrowing down the candidate patch search space, and inferring an effective search order. △ Less

Submitted 10 November, 2021; originally announced November 2021.

Comments: Accepted at the 21st IEEE International Conference on Software Quality, Reliability, and Security (QRS 2021)

arXiv:2107.08760 [pdf, other]

doi 10.1145/3475960.3475985

CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software

Authors: Guru Prasad Bhandari, Amara Naseer, Leon Moonen

Abstract: Data-driven research on the automated discovery and repair of security vulnerabilities in source code requires comprehensive datasets of real-life vulnerable code and their fixes. To assist in such research, we propose a method to automatically collect and curate a comprehensive vulnerability dataset from Common Vulnerabilities and Exposures (CVE) records in the public National Vulnerability Datab… ▽ More Data-driven research on the automated discovery and repair of security vulnerabilities in source code requires comprehensive datasets of real-life vulnerable code and their fixes. To assist in such research, we propose a method to automatically collect and curate a comprehensive vulnerability dataset from Common Vulnerabilities and Exposures (CVE) records in the public National Vulnerability Database (NVD). We implement our approach in a fully automated dataset collection tool and share an initial release of the resulting vulnerability dataset named CVEfixes. The CVEfixes collection tool automatically fetches all available CVE records from the NVD, gathers the vulnerable code and corresponding fixes from associated open-source repositories, and organizes the collected information in a relational database. Moreover, the dataset is enriched with meta-data such as programming language, and detailed code and security metrics at five levels of abstraction. The collection can easily be repeated to keep up-to-date with newly discovered or patched vulnerabilities. The initial release of CVEfixes spans all published CVEs up to 9 June 2021, covering 5365 CVE records for 1754 open-source projects that were addressed in a total of 5495 vulnerability fixing commits. CVEfixes supports various types of data-driven software security research, such as vulnerability prediction, vulnerability classification, vulnerability severity prediction, analysis of vulnerability-related code changes, and automated vulnerability repair. △ Less

Submitted 19 July, 2021; originally announced July 2021.

Comments: Accepted for publication in Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE '21), August 19-20, 2021, Athens, Greece

arXiv:2101.02534 [pdf, other]

Adaptive Immunity for Software: Towards Autonomous Self-healing Systems

Authors: Moeen Ali Naqvi, Merve Astekin, Sehrish Malik, Leon Moonen

Abstract: Testing and code reviews are known techniques to improve the quality and robustness of software. Unfortunately, the complexity of modern software systems makes it impossible to anticipate all possible problems that can occur at runtime, which limits what issues can be found using testing and reviews. Thus, it is of interest to consider autonomous self-healing software systems, which can automatica… ▽ More Testing and code reviews are known techniques to improve the quality and robustness of software. Unfortunately, the complexity of modern software systems makes it impossible to anticipate all possible problems that can occur at runtime, which limits what issues can be found using testing and reviews. Thus, it is of interest to consider autonomous self-healing software systems, which can automatically detect, diagnose, and contain unanticipated problems at runtime. Most research in this area has adopted a model-driven approach, where actual behavior is checked against a model specifying the intended behavior, and a controller takes action when the system behaves outside of the specification. However, it is not easy to develop these specifications, nor to keep them up-to-date as the system evolves. We pose that, with the recent advances in machine learning, such models may be learned by observing the system. Moreover, we argue that artificial immune systems (AISs) are particularly well-suited for building self-healing systems, because of their anomaly detection and diagnosis capabilities. We present the state-of-the-art in self-healing systems and in AISs, surveying some of the research directions that have been considered up to now. To help advance the state-of-the-art, we develop a research agenda for building self-healing software systems using AISs, identifying required foundations, and promising research directions. △ Less

Submitted 7 January, 2021; originally announced January 2021.

Comments: 5 pages, 2 figures

arXiv:2009.03257 [pdf, other]

doi 10.1145/3239235.3239248

Improving Problem Identification via Automated Log Clustering using Dimensionality Reduction

Authors: Carl Martin Rosenberg, Leon Moonen

Abstract: Goal: We consider the problem of automatically grou** logs of runs that failed for the same underlying reasons, so that they can be treated more effectively, and investigate the following questions: (1) Does an approach developed to identify problems in system logs generalize to identifying problems in continuous deployment logs? (2) How does dimensionality reduction affect the quality of automa… ▽ More Goal: We consider the problem of automatically grou** logs of runs that failed for the same underlying reasons, so that they can be treated more effectively, and investigate the following questions: (1) Does an approach developed to identify problems in system logs generalize to identifying problems in continuous deployment logs? (2) How does dimensionality reduction affect the quality of automated log clustering? (3) How does the criterion used for merging clusters in the clustering algorithm affect clustering quality? Method: We replicate and extend earlier work on clustering system log files to assess its generalization to continuous deployment logs. We consider the optional inclusion of one of these dimensionality reduction techniques: Principal Component Analysis (PCA), Latent Semantic Indexing (LSI), and Non-negative Matrix Factorization (NMF). Moreover, we consider three alternative cluster merge criteria (Single Linkage, Average Linkage, and Weighted Linkage), in addition to the Complete Linkage criterion used in earlier work. We empirically evaluate the 16 resulting configurations on continuous deployment logs provided by our industrial collaborator. Results: Our study shows that (1) identifying problems in continuous deployment logs via clustering is feasible, (2) including NMF significantly improves overall accuracy and robustness, and (3) Complete Linkage performs best of all merge criteria analyzed. Conclusions: We conclude that problem identification via automated log clustering is improved by including dimensionality reduction, as it decreases the pipeline's sensitivity to parameter choice, thereby increasing its robustness for handling different inputs. △ Less

Submitted 7 September, 2020; originally announced September 2020.

Journal ref: Published in ESEM'18, Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, October 2018, Article: 16, pp. 1-10,

arXiv:2008.06948 [pdf, other]

doi 10.1145/3382494.3410684

Spectrum-Based Log Diagnosis

Authors: Carl Martin Rosenberg, Leon Moonen

Abstract: We present and evaluate Spectrum-Based Log Diagnosis (SBLD), a method to help developers quickly diagnose problems found in complex integration and deployment runs. Inspired by Spectrum-Based Fault Localization, SBLD leverages the differences in event occurrences between logs for failing and passing runs, to highlight events that are stronger associated with failing runs. Using data provided by… ▽ More We present and evaluate Spectrum-Based Log Diagnosis (SBLD), a method to help developers quickly diagnose problems found in complex integration and deployment runs. Inspired by Spectrum-Based Fault Localization, SBLD leverages the differences in event occurrences between logs for failing and passing runs, to highlight events that are stronger associated with failing runs. Using data provided by our industrial partner, we empirically investigate the following questions: (i) How well does SBLD reduce the effort needed to identify all failure-relevant events in the log for a failing run? (ii) How is the performance of SBLD affected by available data? (iii) How does SBLD compare to searching for simple textual patterns that often occur in failure-relevant events? We answer (i) and (ii) using summary statistics and heatmap visualizations, and for (iii) we compare three configurations of SBLD (with resp. minimum, median and maximum data) against a textual search using Wilcoxon signed-rank tests and the Vargha-Delaney measure of stochastic superiority. Our evaluation shows that (i) SBLD achieves a significant effort reduction for the dataset used, (ii) SBLD benefits from additional logs for passing runs in general, and it benefits from additional logs for failing runs when there is a proportional amount of logs for passing runs in the data. Finally, (iii) SBLD and textual search are roughly equally effective at effort-reduction, while textual search has a slightly better recall. We investigate the cause, and discuss how it is due to the characteristics of a specific part of our data. We conclude that SBLD shows promise as a method for diagnosing failing runs, that its performance is positively affected by additional data, but that it does not outperform textual search on the dataset considered. Future work includes investigating SBLD's generalizability on additional datasets. △ Less

Submitted 16 August, 2020; originally announced August 2020.

Comments: Published in ESEM'20: ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), October 8-9, 2020, Bari, Italy. ACM, 12 pages

arXiv:0707.2291 [pdf]

An Integrated Crosscutting Concern Migration Strategy and its Application to JHotDraw

Authors: Marius Marin, Leon Moonen, Arie van Deursen

Abstract: In this paper we propose a systematic strategy for migrating crosscutting concerns in existing object-oriented systems to aspect-based solutions. The proposed strategy consists of four steps: mining, exploration, documentation and refactoring of crosscutting concerns. We discuss in detail a new approach to aspect refactoring that is fully integrated with our strategy, and apply the whole strateg… ▽ More In this paper we propose a systematic strategy for migrating crosscutting concerns in existing object-oriented systems to aspect-based solutions. The proposed strategy consists of four steps: mining, exploration, documentation and refactoring of crosscutting concerns. We discuss in detail a new approach to aspect refactoring that is fully integrated with our strategy, and apply the whole strategy to an object-oriented system, namely the JHotDraw framework. The result of this migration is made available as an open-source project, which is the largest aspect refactoring available to date. We report on our experiences with conducting this case study and reflect on the success and challenges of the migration process, as well as on the feasibility of automatic aspect refactoring. △ Less

Submitted 22 July, 2007; v1 submitted 16 July, 2007; originally announced July 2007.

Comments: 10+ 4 pages

Report number: TUD-SERG-2007-019 ACM Class: D.2

arXiv:cs/0609147 [pdf]

Identifying Crosscutting Concerns Using Fan-in Analysis

Authors: Marius Marin, Arie van Deursen, Leon Moonen

Abstract: Aspect mining is a reverse engineering process that aims at finding crosscutting concerns in existing systems. This paper proposes an aspect mining approach based on determining methods that are called from many different places, and hence have a high fan-in, which can be seen as a symptom of crosscutting functionality. The approach is semi-automatic, and consists of three steps: metric calculat… ▽ More Aspect mining is a reverse engineering process that aims at finding crosscutting concerns in existing systems. This paper proposes an aspect mining approach based on determining methods that are called from many different places, and hence have a high fan-in, which can be seen as a symptom of crosscutting functionality. The approach is semi-automatic, and consists of three steps: metric calculation, method filtering, and call site analysis. Carrying out these steps is an interactive process supported by an Eclipse plug-in called FINT. Fan-in analysis has been applied to three open source Java systems, totaling around 200,000 lines of code. The most interesting concerns identified are discussed in detail, which includes several concerns not previously discussed in the aspect-oriented literature. The results show that a significant number of crosscutting concerns can be recognized using fan-in analysis, and each of the three steps can be supported by tools. △ Less

Submitted 19 February, 2007; v1 submitted 26 September, 2006; originally announced September 2006.

Comments: 34+4 pages; Extended version [Marin et al. 2004a]

Report number: TUD-SERG-2006-013 ACM Class: D.2.3; D.2.7; D.2.8

Journal ref: ACM Transactions on Software Engineering and Methodology, 2007

arXiv:cs/0607063 [pdf]

Prioritizing Software Inspection Results using Static Profiling

Authors: Cathal Boogerd, Leon Moonen

Abstract: Static software checking tools are useful as an additional automated software inspection step that can easily be integrated in the development cycle and assist in creating secure, reliable and high quality code. However, an often quoted disadvantage of these tools is that they generate an overly large number of warnings, including many false positives due to the approximate analysis techniques.… ▽ More Static software checking tools are useful as an additional automated software inspection step that can easily be integrated in the development cycle and assist in creating secure, reliable and high quality code. However, an often quoted disadvantage of these tools is that they generate an overly large number of warnings, including many false positives due to the approximate analysis techniques. This information overload effectively limits their usefulness. In this paper we present ELAN, a technique that helps the user prioritize the information generated by a software inspection tool, based on a demand-driven computation of the likelihood that execution reaches the locations for which warnings are reported. This analysis is orthogonal to other prioritization techniques known from literature, such as severity levels and statistical analysis to reduce false positives. We evaluate feasibility of our technique using a number of case studies and assess the quality of our predictions by comparing them to actual values obtained by dynamic profiling. △ Less

Submitted 12 July, 2006; originally announced July 2006.

Comments: 14 pages

Report number: TUD-SERG-2006-001

arXiv:cs/0607006 [pdf]

Applying and Combining Three Different Aspect Mining Techniques

Authors: Mariano Ceccato, Marius Marin, Kim Mens, Leon Moonen, Paolo Tonella, Tom Tourwe

Abstract: Understanding a software system at source-code level requires understanding the different concerns that it addresses, which in turn requires a way to identify these concerns in the source code. Whereas some concerns are explicitly represented by program entities (like classes, methods and variables) and thus are easy to identify, crosscutting concerns are not captured by a single program entity… ▽ More Understanding a software system at source-code level requires understanding the different concerns that it addresses, which in turn requires a way to identify these concerns in the source code. Whereas some concerns are explicitly represented by program entities (like classes, methods and variables) and thus are easy to identify, crosscutting concerns are not captured by a single program entity but are scattered over many program entities and are tangled with the other concerns. Because of their crosscutting nature, such crosscutting concerns are difficult to identify, and reduce the understandability of the system as a whole. In this paper, we report on a combined experiment in which we try to identify crosscutting concerns in the JHotDraw framework automatically. We first apply three independently developed aspect mining techniques to JHotDraw and evaluate and compare their results. Based on this analysis, we present three interesting combinations of these three techniques, and show how these combinations provide a more complete coverage of the detected concerns as compared to the original techniques individually. Our results are a first step towards improving the understandability of a system that contains crosscutting concerns, and can be used as a basis for refactoring the identified crosscutting concerns into aspects. △ Less

Submitted 2 July, 2006; originally announced July 2006.

Comments: 28 pages

Report number: TUD-SERG-2006-002

arXiv:cs/0606113 [pdf]

doi 10.1109/WCRE.2006.6

A common framework for aspect mining based on crosscutting concern sorts

Authors: Marius Marin, Leon Moonen, Arie van Deursen

Abstract: The increasing number of aspect mining techniques proposed in literature calls for a methodological way of comparing and combining them in order to assess, and improve on, their quality. This paper addresses this situation by proposing a common framework based on crosscutting concern sorts which allows for consistent assessment, comparison and combination of aspect mining techniques. The framewo… ▽ More The increasing number of aspect mining techniques proposed in literature calls for a methodological way of comparing and combining them in order to assess, and improve on, their quality. This paper addresses this situation by proposing a common framework based on crosscutting concern sorts which allows for consistent assessment, comparison and combination of aspect mining techniques. The framework identifies a set of requirements that ensure homogeneity in formulating the mining goals, presenting the results and assessing their quality. We demonstrate feasibility of the approach by retrofitting an existing aspect mining technique to the framework, and by using it to design and implement two new mining techniques. We apply the three techniques to a known aspect mining benchmark and show how they can be consistently assessed and combined to increase the quality of the results. The techniques and combinations are implemented in FINT, our publicly available free aspect mining tool. △ Less

Submitted 27 June, 2006; originally announced June 2006.

Comments: 14 pages

Report number: TUD-SERG-2006-009

Journal ref: Proceedings Working Conference on Reverse Engineering (WCRE), IEEE Computer Society, 2006, pages 29-38

arXiv:cs/0503015 [pdf, ps, other]

A Systematic Aspect-Oriented Refactoring and Testing Strategy, and its Application to JHotDraw

Authors: Arie van Deursen, Marius Marin, Leon Moonen

Abstract: Aspect oriented programming aims at achieving better modularization for a system's crosscutting concerns in order to improve its key quality attributes, such as evolvability and reusability. Consequently, the adoption of aspect-oriented techniques in existing (legacy) software systems is of interest to remediate software aging. The refactoring of existing systems to employ aspect-orientation wil… ▽ More Aspect oriented programming aims at achieving better modularization for a system's crosscutting concerns in order to improve its key quality attributes, such as evolvability and reusability. Consequently, the adoption of aspect-oriented techniques in existing (legacy) software systems is of interest to remediate software aging. The refactoring of existing systems to employ aspect-orientation will be considerably eased by a systematic approach that will ensure a safe and consistent migration. In this paper, we propose a refactoring and testing strategy that supports such an approach and consider issues of behavior conservation and (incremental) integration of the aspect-oriented solution with the original system. The strategy is applied to the JHotDraw open source project and illustrated on a group of selected concerns. Finally, we abstract from the case study and present a number of generic refactorings which contribute to an incremental aspect-oriented refactoring process and associate particular types of crosscutting concerns to the model and features of the employed aspect language. The contributions of this paper are both in the area of supporting migration towards aspect-oriented solutions and supporting the development of aspect languages that are better suited for such migrations. △ Less

Submitted 5 March, 2005; originally announced March 2005.

Comments: 25 pages

ACM Class: D.2.7; D.2.5; D.1.5

Showing 1–21 of 21 results for author: Moonen, L