Computer Science > Software Engineering
[Submitted on 13 Apr 2021]
Title:Feature-Oriented Defect Prediction: Scenarios, Metrics, and Classifiers
View PDFAbstract:Several software defect prediction techniques have been developed over the past decades. These techniques predict defects at the granularity of typical software assets, such as components and files. In this paper, we investigate feature-oriented defect prediction: predicting defects at the granularity of features -- domain-entities that represent software functionality and often cross-cut software assets. Feature-oriented defect prediction can be beneficial since: (i) some features might be more error-prone than others, (ii) characteristics of defective features might be useful to predict other error-prone features, and (iii) feature-specific code might be prone to faults arising from feature interactions. We explore the feasibility and solution space for feature-oriented defect prediction. Our study relies on 12 software projects from which we analyzed 13,685 bug-introducing and corrective commits, and systematically generated 62,868 training and test datasets to evaluate classifiers, metrics, and scenarios. The datasets were generated based on the 13,685 commits, 81 releases, and 24, 532 permutations of our 12 projects depending on the scenario addressed. We covered scenarios such as just-in-time (JIT) and cross-project defect prediction. Our results confirm the feasibility of feature-oriented defect prediction. We found the best performance (i.e., precision and robustness) when using the Random Forest classifier, with process and structure metrics. Surprisingly, single-project JIT and release-level predictions had median AUC-ROC values greater than 95% and 90% respectively, contrary to studies that assert poor performance due to insufficient training data. We also found that a model trained on release-level data from one of the twelve projects could predict defect-proneness of features in the other eleven projects with median AUC-ROC of 82%, without retraining.
Submission history
From: Mukelabai Mukelabai [view email][v1] Tue, 13 Apr 2021 13:06:51 UTC (3,380 KB)
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
Connected Papers (What is Connected Papers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.