-
The Untold Impact of Learning Approaches on Software Fault-Proneness Predictions
Authors:
Mohammad Jamil Ahmad,
Katerina Goseva-Popstojanova,
Robyn R. Lutz
Abstract:
Software fault-proneness prediction is an active research area, with many factors affecting prediction performance extensively studied. However, the impact of the learning approach (i.e., the specifics of the data used for training and the target variable being predicted) on the prediction performance has not been studied, except for one initial work. This paper explores the effects of two learnin…
▽ More
Software fault-proneness prediction is an active research area, with many factors affecting prediction performance extensively studied. However, the impact of the learning approach (i.e., the specifics of the data used for training and the target variable being predicted) on the prediction performance has not been studied, except for one initial work. This paper explores the effects of two learning approaches, useAllPredictAll and usePrePredictPost, on the performance of software fault-proneness prediction, both within-release and across-releases. The empirical results are based on data extracted from 64 releases of twelve open-source projects. Results show that the learning approach has a substantial, and typically unacknowledged, impact on the classification performance. Specifically, using useAllPredictAll leads to significantly better performance than using usePrePredictPost learning approach, both within-release and across-releases. Furthermore, this paper uncovers that, for within-release predictions, this difference in classification performance is due to different levels of class imbalance in the two learning approaches. When class imbalance is addressed, the performance difference between the learning approaches is eliminated. Our findings imply that the learning approach should always be explicitly identified and its impact on software fault-proneness prediction considered. The paper concludes with a discussion of potential consequences of our results for both research and practice.
△ Less
Submitted 12 July, 2022;
originally announced July 2022.
-
Towards Malware Detection via CPU Power Consumption: Data Collection Design and Analytics (Extended Version)
Authors:
Robert Bridges,
Jarilyn Hernandez Jimenez,
Jeffrey Nichols,
Katerina Goseva-Popstojanova,
Stacy Prowell
Abstract:
This paper presents an experimental design and data analytics approach aimed at power-based malware detection on general-purpose computers. Leveraging the fact that malware executions must consume power, we explore the postulate that malware can be accurately detected via power data analytics. Our experimental design and implementation allow for programmatic collection of CPU power profiles for fi…
▽ More
This paper presents an experimental design and data analytics approach aimed at power-based malware detection on general-purpose computers. Leveraging the fact that malware executions must consume power, we explore the postulate that malware can be accurately detected via power data analytics. Our experimental design and implementation allow for programmatic collection of CPU power profiles for fixed tasks during uninfected and infected states using five different rootkits. To characterize the power consumption profiles, we use both simple statistical and novel, sophisticated features. We test a one-class anomaly detection ensemble (that baselines non-infected power profiles) and several kernel-based SVM classifiers (that train on both uninfected and infected profiles) in detecting previously unseen malware and clean profiles. The anomaly detection system exhibits perfect detection when using all features and tasks, with smaller false detection rate than the supervised classifiers. The primary contribution is the proof of concept that baselining power of fixed tasks can provide accurate detection of rootkits. Moreover, our treatment presents engineering hurdles needed for experimentation and allows analysis of each statistical feature individually. This work appears to be the first step towards a viable power-based detection capability for general-purpose computers, and presents next steps toward this goal.
△ Less
Submitted 16 May, 2018;
originally announced May 2018.
-
Malware Detection on General-Purpose Computers Using Power Consumption Monitoring: A Proof of Concept and Case Study
Authors:
Jarilyn M. Hernández Jiménez,
Jeffrey A. Nichols,
Katerina Goseva-Popstojanova,
Stacy Prowell,
Robert A. Bridges
Abstract:
Malware detection is challenging when faced with automatically generated and polymorphic malware, as well as with rootkits, which are exceptionally hard to detect. In an attempt to contribute towards addressing these challenges, we conducted a proof of concept study that explored the use of power consumption for detection of malware presence in a general-purpose computer. The results of our experi…
▽ More
Malware detection is challenging when faced with automatically generated and polymorphic malware, as well as with rootkits, which are exceptionally hard to detect. In an attempt to contribute towards addressing these challenges, we conducted a proof of concept study that explored the use of power consumption for detection of malware presence in a general-purpose computer. The results of our experiments indicate that malware indeed leaves a signal on the power consumption of a general-purpose computer. Specifically, for the case study based on two different rootkits, the data collected at the +12V rails on the motherboard showed the most noticeable increment of the power consumption after the computer was infected. Our future work includes experimenting with more malware examples and workloads, and develo** data analytics approach for automatic malware detection based on power consumption.
△ Less
Submitted 4 May, 2017;
originally announced May 2017.