Search | arXiv e-print repository

doi 10.1145/3664646.3665664

AI-Assisted Assessment of Coding Practices in Modern Code Review

Authors: Manushree Vijayvergiya, Małgorzata Salawa, Ivan Budiselić, Dan Zheng, Pascal Lamblin, Marko Ivanković, Juanjo Carin, Mateusz Lewko, Jovan Andonov, Goran Petrović, Daniel Tarlow, Petros Maniatis, René Just

Abstract: Modern code review is a process in which an incremental code contribution made by a code author is reviewed by one or more peers before it is committed to the version control system. An important element of modern code review is verifying that code contributions adhere to best practices. While some of these best practices can be automatically verified, verifying others is commonly left to human re… ▽ More Modern code review is a process in which an incremental code contribution made by a code author is reviewed by one or more peers before it is committed to the version control system. An important element of modern code review is verifying that code contributions adhere to best practices. While some of these best practices can be automatically verified, verifying others is commonly left to human reviewers. This paper reports on the development, deployment, and evaluation of AutoCommenter, a system backed by a large language model that automatically learns and enforces coding best practices. We implemented AutoCommenter for four programming languages (C++, Java, Python, and Go) and evaluated its performance and adoption in a large industrial setting. Our evaluation shows that an end-to-end system for learning and enforcing coding best practices is feasible and has a positive impact on the developer workflow. Additionally, this paper reports on the challenges associated with deploying such a system to tens of thousands of developers and the corresponding lessons learned. △ Less

Submitted 22 May, 2024; originally announced May 2024.

Comments: To appear at the ACM International Conference on AI-Powered Software (AIware '24)

arXiv:2405.06783 [pdf, other]

BLIP: Facilitating the Exploration of Undesirable Consequences of Digital Technologies

Authors: Rock Yuren Pang, Sebastin Santy, René Just, Katharina Reinecke

Abstract: Digital technologies have positively transformed society, but they have also led to undesirable consequences not anticipated at the time of design or development. We posit that insights into past undesirable consequences can help researchers and practitioners gain awareness and anticipate potential adverse effects. To test this assumption, we introduce BLIP, a system that extracts real-world undes… ▽ More Digital technologies have positively transformed society, but they have also led to undesirable consequences not anticipated at the time of design or development. We posit that insights into past undesirable consequences can help researchers and practitioners gain awareness and anticipate potential adverse effects. To test this assumption, we introduce BLIP, a system that extracts real-world undesirable consequences of technology from online articles, summarizes and categorizes them, and presents them in an interactive, web-based interface. In two user studies with 15 researchers in various computer science disciplines, we found that BLIP substantially increased the number and diversity of undesirable consequences they could list in comparison to relying on prior knowledge or searching online. Moreover, BLIP helped them identify undesirable consequences relevant to their ongoing projects, made them aware of undesirable consequences they "had never considered," and inspired them to reflect on their own experiences with technology. △ Less

Submitted 10 May, 2024; originally announced May 2024.

Comments: To appear in the Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USA

arXiv:2401.16526 [pdf, other]

doi 10.1145/3620665.3640387

FPGA Technology Map** Using Sketch-Guided Program Synthesis

Authors: Gus Henry Smith, Ben Kushigian, Vishal Canumalla, Andrew Cheung, Steven Lyubomirsky, Sorawee Porncharoenwase, René Just, Gilbert Louis Bernstein, Zachary Tatlock

Abstract: FPGA technology map** is the process of implementing a hardware design expressed in high-level HDL (hardware design language) code using the low-level, architecture-specific primitives of the target FPGA. As FPGAs become increasingly heterogeneous, achieving high performance requires hardware synthesis tools that better support map** to complex, highly configurable primitives like digital sign… ▽ More FPGA technology map** is the process of implementing a hardware design expressed in high-level HDL (hardware design language) code using the low-level, architecture-specific primitives of the target FPGA. As FPGAs become increasingly heterogeneous, achieving high performance requires hardware synthesis tools that better support map** to complex, highly configurable primitives like digital signal processors (DSPs). Current tools support DSP map** via handwritten special-case map** rules, which are laborious to write, error-prone, and often overlook map** opportunities. We introduce Lakeroad, a principled approach to technology map** via sketch-guided program synthesis. Lakeroad leverages two techniques -- architecture-independent sketch templates and semantics extraction from HDL -- to provide extensible technology map** with stronger correctness guarantees and higher coverage of map** opportunities than state-of-the-art tools. Across representative microbenchmarks, Lakeroad produces 2--3.5$\times$ the number of optimal map**s compared to proprietary state-of-the-art tools and 6--44$\times$ the number of optimal map**s compared to popular open-source tools, while also providing correctness guarantees not given by any other tool. △ Less

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2310.16262 [pdf, other]

rTisane: Externalizing conceptual models for data analysis increases engagement with domain knowledge and improves statistical model quality

Authors: Eunice Jun, Edward Misback, Jeffrey Heer, René Just

Abstract: Statistical models should accurately reflect analysts' domain knowledge about variables and their relationships. While recent tools let analysts express these assumptions and use them to produce a resulting statistical model, it remains unclear what analysts want to express and how externalization impacts statistical model quality. This paper addresses these gaps. We first conduct an exploratory s… ▽ More Statistical models should accurately reflect analysts' domain knowledge about variables and their relationships. While recent tools let analysts express these assumptions and use them to produce a resulting statistical model, it remains unclear what analysts want to express and how externalization impacts statistical model quality. This paper addresses these gaps. We first conduct an exploratory study of analysts using a domain-specific language (DSL) to express conceptual models. We observe a preference for detailing how variables relate and a desire to allow, and then later resolve, ambiguity in their conceptual models. We leverage these findings to develop rTisane, a DSL for expressing conceptual models augmented with an interactive disambiguation process. In a controlled evaluation, we find that rTisane's DSL helps analysts engage more deeply with and accurately externalize their assumptions. rTisane also leads to statistical models that match analysts' assumptions, maintain analysis intent, and better fit the data. △ Less

Submitted 24 October, 2023; originally announced October 2023.

ACM Class: H.5.2; D.2.2; H.1.2; D.3.2

arXiv:2306.09130 [pdf, other]

MuRS: Mutant Ranking and Suppression using Identifier Templates

Authors: Zimin Chen, Malgorzata Salawa, Manushree Vijayvergiya, Goran Petrovic, Marko Ivankovic, Rene Just

Abstract: Diff-based mutation testing is a mutation testing approach that only mutates lines affected by a code change under review. Google's mutation testing service integrates diff-based mutation testing into the code review process and continuously gathers developer feedback on mutants surfaced during code review. To enhance the developer experience, the mutation testing service implements a number of su… ▽ More Diff-based mutation testing is a mutation testing approach that only mutates lines affected by a code change under review. Google's mutation testing service integrates diff-based mutation testing into the code review process and continuously gathers developer feedback on mutants surfaced during code review. To enhance the developer experience, the mutation testing service implements a number of suppression rules, which target not-useful mutants-that is, mutants that have consistently received negative developer feedback. However, while effective, manually implementing suppression rules require significant engineering time. An automatic system to rank and suppress mutants would facilitate the maintenance of the mutation testing service. This paper proposes and evaluates MuRS, an automated approach that groups mutants by patterns in the source code under test and uses these patterns to rank and suppress future mutants based on historical developer feedback on mutants in the same group. To evaluate MuRS, we conducted an A/B testing study, comparing MuRS to the existing mutation testing service. Despite the strong baseline, which uses manually developed suppression rules, the results show a statistically significantly lower negative feedback ratio of 11.45% for MuRS versus 12.41% for the baseline. The results also show that MuRS is able to recover existing suppression rules implemented in the baseline. Finally, the results show that statement-deletion mutant groups received both the most positive and negative developer feedback, suggesting a need for additional context that can distinguish between useful and not-useful mutants in these groups. Overall, MuRS has the potential to substantially reduce the development and maintenance cost for an effective mutation testing service by automatically learning suppression rules. △ Less

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2305.09580 [pdf, other]

Generate Compilers from Hardware Models!

Authors: Gus Henry Smith, Ben Kushigian, Vishal Canumalla, Andrew Cheung, René Just, Zachary Tatlock

Abstract: Compiler backends should be automatically generated from hardware design language (HDL) models of the hardware they target. Generating compiler components directly from HDL can provide stronger correctness guarantees, ease development effort, and encourage hardware exploration. Past work has already championed this idea; here we argue that advances in program synthesis make the approach more feasi… ▽ More Compiler backends should be automatically generated from hardware design language (HDL) models of the hardware they target. Generating compiler components directly from HDL can provide stronger correctness guarantees, ease development effort, and encourage hardware exploration. Past work has already championed this idea; here we argue that advances in program synthesis make the approach more feasible. We present a concrete example by demonstrating how FPGA technology mappers can be automatically generated from SystemVerilog models of an FPGA's primitives using program synthesis. △ Less

Submitted 16 May, 2023; originally announced May 2023.

Comments: 3 pages, 2 figures, to be presented at the 2023 PLARCH Workshop at FCRC

arXiv:2302.03927 [pdf, other]

doi 10.1109/ICSE48619.2023.00199

On the Applicability of Language Models to Block-Based Programs

Authors: Elisabeth Griebl, Benedikt Fein, Florian Obermüller, Gordon Fraser, René Just

Abstract: Block-based programming languages like Scratch are increasingly popular for programming education and end-user programming. Recent program analyses build on the insight that source code can be modelled using techniques from natural language processing. Many of the regularities of source code that support this approach are due to the syntactic overhead imposed by textual programming languages. This… ▽ More Block-based programming languages like Scratch are increasingly popular for programming education and end-user programming. Recent program analyses build on the insight that source code can be modelled using techniques from natural language processing. Many of the regularities of source code that support this approach are due to the syntactic overhead imposed by textual programming languages. This syntactic overhead, however, is precisely what block-based languages remove in order to simplify programming. Consequently, it is unclear how well this modelling approach performs on block-based programming languages. In this paper, we investigate the applicability of language models for the popular block-based programming language Scratch. We model Scratch programs using n-gram models, the most essential type of language model, and transformers, a popular deep learning model. Evaluation on the example tasks of code completion and bug finding confirm that blocks inhibit predictability, but the use of language models is nevertheless feasible. Our findings serve as foundation for improving tooling and analyses for block-based languages. △ Less

Submitted 8 February, 2023; originally announced February 2023.

Comments: To appear at the 45th IEEE/ACM International Conference on Software Engineering (ICSE'2023)

MSC Class: 68-04 ACM Class: D.2.5; D.2.3

arXiv:2203.10677 [pdf, other]

doi 10.1145/3510003.3512764

Repairing Brain-Computer Interfaces with Fault-Based Data Acquisition

Authors: Cailin Winston, Caleb Winston, Chloe N Winston, Claris Winston, Cleah Winston, Rajesh PN Rao, René Just

Abstract: Brain-computer interfaces (BCIs) decode recorded neural signals from the brain and/or stimulate the brain with encoded neural signals. BCIs span both hardware and software and have a wide range of applications in restorative medicine, from restoring movement through prostheses and robotic limbs to restoring sensation and communication through spellers. BCIs also have applications in diagnostic med… ▽ More Brain-computer interfaces (BCIs) decode recorded neural signals from the brain and/or stimulate the brain with encoded neural signals. BCIs span both hardware and software and have a wide range of applications in restorative medicine, from restoring movement through prostheses and robotic limbs to restoring sensation and communication through spellers. BCIs also have applications in diagnostic medicine, e.g., providing clinicians with data for detecting seizures, sleep patterns, or emotions. Despite their promise, BCIs have not yet been adopted for long-term, day-to-day use because of challenges related to reliability and robustness, which are needed for safe operation in all scenarios. Ensuring safe operation currently requires hours of manual data collection and recalibration, involving both patients and clinicians. However, data collection is not targeted at eliminating specific faults in a BCI. This paper presents a new methodology for characterizing, detecting, and localizing faults in BCIs. Specifically, it proposes partial test oracles as a method for detecting faults and slice functions as a method for localizing faults to characteristic patterns in the input data or relevant tasks performed by the user. Through targeted data acquisition and retraining, the proposed methodology improves the correctness of BCIs. We evaluated the proposed methodology on five BCI applications. The results show that the proposed methodology (1) precisely localizes faults and (2) can significantly reduce the frequency of faults through retraining based on targeted, fault-based data acquisition. These results suggest that the proposed methodology is a promising step towards repairing faulty BCIs. △ Less

Submitted 20 March, 2022; originally announced March 2022.

Comments: Accepted at International Conference on Software Engineering (ICSE-2022)

arXiv:2201.02705 [pdf, other]

doi 10.1145/3491102.3501888

Tisane: Authoring Statistical Models via Formal Reasoning from Conceptual and Data Relationships

Authors: Eunice Jun, Audrey Seo, Jeffrey Heer, René Just

Abstract: Proper statistical modeling incorporates domain theory about how concepts relate and details of how data were measured. However, data analysts currently lack tool support for recording and reasoning about domain assumptions, data collection, and modeling choices in an integrated manner, leading to mistakes that can compromise scientific validity. For instance, generalized linear mixed-effects mode… ▽ More Proper statistical modeling incorporates domain theory about how concepts relate and details of how data were measured. However, data analysts currently lack tool support for recording and reasoning about domain assumptions, data collection, and modeling choices in an integrated manner, leading to mistakes that can compromise scientific validity. For instance, generalized linear mixed-effects models (GLMMs) help answer complex research questions, but omitting random effects impairs the generalizability of results. To address this need, we present Tisane, a mixed-initiative system for authoring generalized linear models with and without mixed-effects. Tisane introduces a study design specification language for expressing and asking questions about relationships between variables. Tisane contributes an interactive compilation process that represents relationships in a graph, infers candidate statistical models, and asks follow-up questions to disambiguate user queries to construct a valid model. In case studies with three researchers, we find that Tisane helps them focus on their goals and assumptions while avoiding past mistakes. △ Less

Submitted 7 January, 2022; originally announced January 2022.

arXiv:2106.07520 [pdf, other]

JUGE: An Infrastructure for Benchmarking Java Unit Test Generators

Authors: Xavier Devroey, Alessio Gambi, Juan Pablo Galeotti, René Just, Fitsum Kifetew, Annibale Panichella, Sebastiano Panichella

Abstract: Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and for various platforms (e.g., desktop, web, or mobile applications). Such generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g.… ▽ More Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and for various platforms (e.g., desktop, web, or mobile applications). Such generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit-testing of libraries vs. system-testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the most suited one for their requirements, while researchers seek to identify future research directions. This can be achieved through the systematic execution of large-scale evaluations of different generators. However, the execution of such empirical evaluations is not trivial and requires a substantial effort to collect benchmarks, setup the evaluation infrastructure, and collect and analyse the results. In this paper, we present our JUnit Generation benchmarking infrastructure (JUGE) supporting generators (e.g., search-based, random-based, symbolic execution, etc.) seeking to automate the production of unit tests for various purposes (e.g., validation, regression testing, fault localization, etc.). The primary goal is to reduce the overall effort, ease the comparison of several generators, and enhance the knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, eight editions of a unit testing tool competition, co-located with the Search-Based Software Testing Workshop, have taken place and used and updated JUGE. As a result, an increasing amount of tools (over ten) from both academia and industry have been evaluated on JUGE, matured over the years, and allowed the identification of future research directions. △ Less

Submitted 28 October, 2022; v1 submitted 14 June, 2021; originally announced June 2021.

arXiv:2104.02712 [pdf, other]

Hypothesis Formalization: Empirical Findings, Software Limitations, and Design Implications

Authors: Eunice Jun, Melissa Birchfield, Nicole de Moura, Jeffrey Heer, Rene Just

Abstract: Data analysis requires translating higher level questions and hypotheses into computable statistical models. We present a mixed-methods study aimed at identifying the steps, considerations, and challenges involved in operationalizing hypotheses into statistical models, a process we refer to as hypothesis formalization. In a formative content analysis of research papers, we find that researchers hi… ▽ More Data analysis requires translating higher level questions and hypotheses into computable statistical models. We present a mixed-methods study aimed at identifying the steps, considerations, and challenges involved in operationalizing hypotheses into statistical models, a process we refer to as hypothesis formalization. In a formative content analysis of research papers, we find that researchers highlight decomposing a hypothesis into sub-hypotheses, selecting proxy variables, and formulating statistical models based on data collection design as key steps. In a lab study, we find that analysts fixated on implementation and shaped their analysis to fit familiar approaches, even if sub-optimal. In an analysis of software tools, we find that tools provide inconsistent, low-level abstractions that may limit the statistical models analysts use to formalize hypotheses. Based on these observations, we characterize hypothesis formalization as a dual-search process balancing conceptual and statistical considerations constrained by data and computation, and discuss implications for future tools. △ Less

Submitted 6 April, 2021; originally announced April 2021.

arXiv:2103.07189 [pdf, other]

Does mutation testing improve testing practices?

Authors: Goran Petrović, Marko Ivanković, Gordon Fraser, René Just

Abstract: Various proxy metrics for test quality have been defined in order to guide developers when writing tests. Code coverage is particularly well established in practice, even though the question of how coverage relates to test quality is a matter of ongoing debate. Mutation testing offers a promising alternative: Artificial defects can identify holes in a test suite, and thus provide concrete suggesti… ▽ More Various proxy metrics for test quality have been defined in order to guide developers when writing tests. Code coverage is particularly well established in practice, even though the question of how coverage relates to test quality is a matter of ongoing debate. Mutation testing offers a promising alternative: Artificial defects can identify holes in a test suite, and thus provide concrete suggestions for additional tests. Despite the obvious advantages of mutation testing, it is not yet well established in practice. Until recently, mutation testing tools and techniques simply did not scale to complex systems. Although they now do scale, a remaining obstacle is lack of evidence that writing tests for mutants actually improves test quality. In this paper we aim to fill this gap: By analyzing a large dataset of almost 15 million mutants, we investigate how these mutants influenced developers over time, and how these mutants relate to real faults. Our analyses suggest that developers using mutation testing write more tests, and actively improve their test suites with high quality tests such that fewer mutants remain. By analyzing a dataset of past fixes of real high-priority faults, our analyses further provide evidence that mutants are indeed coupled with real faults. In other words, had mutation testing been used for the changes introducing the faults, it would have reported a live mutant that could have prevented the bug. △ Less

Submitted 12 March, 2021; originally announced March 2021.

Comments: To be published in the Proceedings of the International Conference on Software Engineering (ICSE'21)

MSC Class: 68-04 ACM Class: D.2.5

arXiv:2102.11378 [pdf, other]

Practical Mutation Testing at Scale

Authors: Goran Petrović, Marko Ivanković, Gordon Fraser, René Just

Abstract: Mutation analysis assesses a test suite's adequacy by measuring its ability to detect small artificial faults, systematically seeded into the tested program. Mutation analysis is considered one of the strongest test-adequacy criteria. Mutation testing builds on top of mutation analysis and is a testing technique that uses mutants as test goals to create or improve a test suite. Mutation testing ha… ▽ More Mutation analysis assesses a test suite's adequacy by measuring its ability to detect small artificial faults, systematically seeded into the tested program. Mutation analysis is considered one of the strongest test-adequacy criteria. Mutation testing builds on top of mutation analysis and is a testing technique that uses mutants as test goals to create or improve a test suite. Mutation testing has long been considered intractable because the sheer number of mutants that can be created represents an insurmountable problem -- both in terms of human and computational effort. This has hindered the adoption of mutation testing as an industry standard. For example, Google has a codebase of two billion lines of code and more than 500,000,000 tests are executed on a daily basis. The traditional approach to mutation testing does not scale to such an environment. To address these challenges, this paper presents a scalable approach to mutation testing based on the following main ideas: (1) Mutation testing is done incrementally, mutating only changed code during code review, rather than the entire code base; (2) Mutants are filtered, removing mutants that are likely to be irrelevant to developers, and limiting the number of mutants per line and per code review process; (3) Mutants are selected based on the historical performance of mutation operators, further eliminating irrelevant mutants and improving mutant quality. Evaluation in a code-review-based setting with more than 24,000 developers on more than 1,000 projects shows that the proposed approach produces orders of magnitude fewer mutants and that context-based mutant filtering and selection improve mutant quality and actionability. Overall, the proposed approach represents a mutation testing framework that seamlessly integrates into the software development workflow and is applicable up to large-scale industrial settings. △ Less

Submitted 26 February, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

Comments: This work has been submitted to the IEEE for possible publication

MSC Class: 68-04 ACM Class: D.2.5

arXiv:2102.03054 [pdf, other]

Removing biased data to improve fairness and accuracy

Authors: Sahil Verma, Michael Ernst, Rene Just

Abstract: Machine learning systems are often trained using data collected from historical decisions. If past decisions were biased, then automated systems that learn from historical data will also be biased. We propose a black-box approach to identify and remove biased training data. Machine learning models trained on such debiased data (a subset of the original training data) have low individual discrimina… ▽ More Machine learning systems are often trained using data collected from historical decisions. If past decisions were biased, then automated systems that learn from historical data will also be biased. We propose a black-box approach to identify and remove biased training data. Machine learning models trained on such debiased data (a subset of the original training data) have low individual discrimination, often 0%. These models also have greater accuracy and lower statistical disparity than models trained on the full historical data. We evaluated our methodology in experiments using 6 real-world datasets. Our approach outperformed seven previous approaches in terms of individual discrimination and accuracy. △ Less

Submitted 5 February, 2021; originally announced February 2021.

Comments: 16 pages, 5 Figures, 8 Tables

arXiv:1904.05387 [pdf, other]

doi 10.1145/3332165.3347940

Tea: A High-level Language and Runtime System for Automating Statistical Analysis

Authors: Eunice Jun, Maureen Daum, Jared Roesch, Sarah E. Chasins, Emery D. Berger, Rene Just, Katharina Reinecke

Abstract: Though statistical analyses are centered on research questions and hypotheses, current statistical analysis tools are not. Users must first translate their hypotheses into specific statistical tests and then perform API calls with functions and parameters. To do so accurately requires that users have statistical expertise. To lower this barrier to valid, replicable statistical analysis, we introdu… ▽ More Though statistical analyses are centered on research questions and hypotheses, current statistical analysis tools are not. Users must first translate their hypotheses into specific statistical tests and then perform API calls with functions and parameters. To do so accurately requires that users have statistical expertise. To lower this barrier to valid, replicable statistical analysis, we introduce Tea, a high-level declarative language and runtime system. In Tea, users express their study design, any parametric assumptions, and their hypotheses. Tea compiles these high-level specifications into a constraint satisfaction problem that determines the set of valid statistical tests, and then executes them to test the hypothesis. We evaluate Tea using a suite of statistical analyses drawn from popular tutorials. We show that Tea generally matches the choices of experts while automatically switching to non-parametric tests when parametric assumptions are not met. We simulate the effect of mistakes made by non-expert users and show that Tea automatically avoids both false negatives and false positives that could be produced by the application of incorrect statistical tests. △ Less

Submitted 10 April, 2019; originally announced April 2019.

Comments: 11 pages

arXiv:1611.02516 [pdf, other]

Tailored Mutants Fit Bugs Better

Authors: Miltiadis Allamanis, Earl T. Barr, René Just, Charles Sutton

Abstract: Mutation analysis measures test suite adequacy, the degree to which a test suite detects seeded faults: one test suite is better than another if it detects more mutants. Mutation analysis effectiveness rests on the assumption that mutants are coupled with real faults i.e. mutant detection is strongly correlated with real fault detection. The work that validated this also showed that a large portio… ▽ More Mutation analysis measures test suite adequacy, the degree to which a test suite detects seeded faults: one test suite is better than another if it detects more mutants. Mutation analysis effectiveness rests on the assumption that mutants are coupled with real faults i.e. mutant detection is strongly correlated with real fault detection. The work that validated this also showed that a large portion of defects remain out of reach. We introduce tailored mutation operators to reach and capture these defects. Tailored mutation operators are built from and apply to an existing codebase and its history. They can, for instance, identify and replay errors specific to the project for which they are tailored. As our point of departure, we define tailored mutation operators for identifiers, which mutation analysis has largely ignored, because there are too many ways to mutate them. Evaluated on the Defects4J dataset, our new mutation operators creates mutants coupled to 14% more faults, compared to traditional mutation operators. These new mutation operators, however, quadruple the number of mutants. To combat this problem, we propose a new approach to mutant selection focusing on the location at which to apply mutation operators and the unnaturalness of the mutated code. The results demonstrate that the location selection heuristics produce mutants more closely coupled to real faults for a given budget of mutation operator applications. In summary, this paper defines and explores tailored mutation operators, advancing the state of the art in mutation testing in two ways: 1) it suggests mutation operators that mutate identifiers and literals, extending mutation analysis to a new class of faults and 2) it demonstrates that selecting the location where a mutation operator is applied decreases the number of generated mutants without affecting the coupling of mutants and real faults. △ Less

Submitted 8 November, 2016; originally announced November 2016.

arXiv:1303.2784 [pdf]

Using State Infection Conditions to Detect Equivalent Mutants and Speed up Mutation Analysis

Authors: René Just, Michael D. Ernst, Gordon Fraser

Abstract: Mutation analysis evaluates test suites and testing techniques by measuring how well they detect seeded defects (mutants). Even though well established in research, mutation analysis is rarely used in practice due to scalability problems --- there are multiple mutations per code statement leading to a large number of mutants, and hence executions of the test suite. In addition, the use of mutation… ▽ More Mutation analysis evaluates test suites and testing techniques by measuring how well they detect seeded defects (mutants). Even though well established in research, mutation analysis is rarely used in practice due to scalability problems --- there are multiple mutations per code statement leading to a large number of mutants, and hence executions of the test suite. In addition, the use of mutation to improve test suites is futile for mutants that are equivalent, which means that there exists no test case that distinguishes them from the original program. This paper introduces two optimizations based on state infection conditions, i.e., conditions that determine for a test execution whether the same execution on a mutant would lead to a different state. First, redundant test execution can be avoided by monitoring state infection conditions, leading to an overall performance improvement. Second, state infection conditions can aid in identifying equivalent mutants, thus guiding efforts to improve test suites. △ Less

Submitted 12 March, 2013; originally announced March 2013.

Report number: DPA-13021

Showing 1–17 of 17 results for author: Just, R