-
Toward Improved Deep Learning-based Vulnerability Detection
Authors:
Adriana Sejfia,
Satyaki Das,
Saad Shafiq,
Nenad Medvidović
Abstract:
Deep learning (DL) has been a common thread across several recent techniques for vulnerability detection. The rise of large, publicly available datasets of vulnerabilities has fueled the learning process underpinning these techniques. While these datasets help the DL-based vulnerability detectors, they also constrain these detectors' predictive abilities. Vulnerabilities in these datasets have to…
▽ More
Deep learning (DL) has been a common thread across several recent techniques for vulnerability detection. The rise of large, publicly available datasets of vulnerabilities has fueled the learning process underpinning these techniques. While these datasets help the DL-based vulnerability detectors, they also constrain these detectors' predictive abilities. Vulnerabilities in these datasets have to be represented in a certain way, e.g., code lines, functions, or program slices within which the vulnerabilities exist. We refer to this representation as a base unit. The detectors learn how base units can be vulnerable and then predict whether other base units are vulnerable. We have hypothesized that this focus on individual base units harms the ability of the detectors to properly detect those vulnerabilities that span multiple base units (or MBU vulnerabilities). For vulnerabilities such as these, a correct detection occurs when all comprising base units are detected as vulnerable. Verifying how existing techniques perform in detecting all parts of a vulnerability is important to establish their effectiveness for other downstream tasks. To evaluate our hypothesis, we conducted a study focusing on three prominent DL-based detectors: ReVeal, DeepWukong, and LineVul. Our study shows that all three detectors contain MBU vulnerabilities in their respective datasets. Further, we observed significant accuracy drops when detecting these types of vulnerabilities. We present our study and a framework that can be used to help DL-based detectors toward the proper inclusion of MBU vulnerabilities.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
Practical Automated Detection of Malicious npm Packages
Authors:
Adriana Sejfia,
Max Schäfer
Abstract:
The npm registry is one of the pillars of the JavaScript and TypeScript ecosystems, hosting over 1.7 million packages ranging from simple utility libraries to complex frameworks and entire applications. Due to the overwhelming popularity of npm, it has become a prime target for malicious actors, who publish new packages or compromise existing packages to introduce malware that tampers with or exfi…
▽ More
The npm registry is one of the pillars of the JavaScript and TypeScript ecosystems, hosting over 1.7 million packages ranging from simple utility libraries to complex frameworks and entire applications. Due to the overwhelming popularity of npm, it has become a prime target for malicious actors, who publish new packages or compromise existing packages to introduce malware that tampers with or exfiltrates sensitive data from users who install either these packages or any package that (transitively) depends on them. Defending against such attacks is essential to maintaining the integrity of the software supply chain, but the sheer volume of package updates makes comprehensive manual review infeasible.
We present Amalfi, a machine-learning based approach for automatically detecting potentially malicious packages comprised of three complementary techniques. We start with classifiers trained on known examples of malicious and benign packages. If a package is flagged as malicious by a classifier, we then check whether it includes metadata about its source repository, and if so whether the package can be reproduced from its source code. Packages that are reproducible from source are not usually malicious, so this step allows us to weed out false positives. Finally, we also employ a simple textual clone-detection technique to identify copies of malicious packages that may have been missed by the classifiers, reducing the number of false negatives.
Amalfi improves on the state of the art in that it is lightweight, requiring only a few seconds per package to extract features and run the classifiers, and gives good results in practice: running it on 96287 package versions published over the course of one week, we were able to identify 95 previously unknown malware samples, with a manageable number of false positives.
△ Less
Submitted 28 February, 2022;
originally announced February 2022.
-
Assessing the Feasibility of Web-Request Prediction Models on Mobile Platforms
Authors:
Yixue Zhao,
Siwei Yin,
Adriana Sejfia,
Marcelo Schmitt Laser,
Haoyu Wang,
Nenad Medvidovic
Abstract:
Prefetching web pages is a well-studied solution to reduce network latency by predicting users' future actions based on their past behaviors. However, such techniques are largely unexplored on mobile platforms. Today's privacy regulations make it infeasible to explore prefetching with the usual strategy of amassing large amounts of data over long periods and constructing conventional, "large" pred…
▽ More
Prefetching web pages is a well-studied solution to reduce network latency by predicting users' future actions based on their past behaviors. However, such techniques are largely unexplored on mobile platforms. Today's privacy regulations make it infeasible to explore prefetching with the usual strategy of amassing large amounts of data over long periods and constructing conventional, "large" prediction models. Our work is based on the observation that this may not be necessary: Given previously reported mobile-device usage trends (e.g., repetitive behaviors in brief bursts), we hypothesized that prefetching should work effectively with "small" models trained on mobile-user requests collected during much shorter time periods. To test this hypothesis, we constructed a framework for automatically assessing prediction models, and used it to conduct an extensive empirical study based on over 15 million HTTP requests collected from nearly 11,500 mobile users during a 24-hour period, resulting in over 7 million models. Our results demonstrate the feasibility of prefetching with small models on mobile platforms, directly motivating future work in this area. We further introduce several strategies for improving prediction models while reducing the model size. Finally, our framework provides the foundation for future explorations of effective prediction models across a range of usage scenarios.
△ Less
Submitted 23 March, 2021; v1 submitted 10 November, 2020;
originally announced November 2020.
-
FrUITeR: A Framework for Evaluating UI Test Reuse
Authors:
Yixue Zhao,
Justin Chen,
Adriana Sejfia,
Marcelo Schmitt Laser,
Jie Zhang,
Federica Sarro,
Mark Harman,
Nenad Medvidovic
Abstract:
UI testing is tedious and time-consuming due to the manual effort required. Recent research has explored opportunities for reusing existing UI tests from an app to automatically generate new tests for other apps. However, the evaluation of such techniques currently remains manual, unscalable, and unreproducible, which can waste effort and impede progress in this emerging area. We introduce FrUITeR…
▽ More
UI testing is tedious and time-consuming due to the manual effort required. Recent research has explored opportunities for reusing existing UI tests from an app to automatically generate new tests for other apps. However, the evaluation of such techniques currently remains manual, unscalable, and unreproducible, which can waste effort and impede progress in this emerging area. We introduce FrUITeR, a framework that automatically evaluates UI test reuse in a reproducible way. We apply FrUITeR to existing test-reuse techniques on a uniform benchmark we established, resulting in 11,917 test reuse cases from 20 apps. We report several key findings aimed at improving UI test reuse that are missed by existing work.
△ Less
Submitted 3 November, 2020; v1 submitted 7 August, 2020;
originally announced August 2020.