Search | arXiv e-print repository

Toward Improved Deep Learning-based Vulnerability Detection

Authors: Adriana Sejfia, Satyaki Das, Saad Shafiq, Nenad Medvidović

Abstract: Deep learning (DL) has been a common thread across several recent techniques for vulnerability detection. The rise of large, publicly available datasets of vulnerabilities has fueled the learning process underpinning these techniques. While these datasets help the DL-based vulnerability detectors, they also constrain these detectors' predictive abilities. Vulnerabilities in these datasets have to… ▽ More Deep learning (DL) has been a common thread across several recent techniques for vulnerability detection. The rise of large, publicly available datasets of vulnerabilities has fueled the learning process underpinning these techniques. While these datasets help the DL-based vulnerability detectors, they also constrain these detectors' predictive abilities. Vulnerabilities in these datasets have to be represented in a certain way, e.g., code lines, functions, or program slices within which the vulnerabilities exist. We refer to this representation as a base unit. The detectors learn how base units can be vulnerable and then predict whether other base units are vulnerable. We have hypothesized that this focus on individual base units harms the ability of the detectors to properly detect those vulnerabilities that span multiple base units (or MBU vulnerabilities). For vulnerabilities such as these, a correct detection occurs when all comprising base units are detected as vulnerable. Verifying how existing techniques perform in detecting all parts of a vulnerability is important to establish their effectiveness for other downstream tasks. To evaluate our hypothesis, we conducted a study focusing on three prominent DL-based detectors: ReVeal, DeepWukong, and LineVul. Our study shows that all three detectors contain MBU vulnerabilities in their respective datasets. Further, we observed significant accuracy drops when detecting these types of vulnerabilities. We present our study and a framework that can be used to help DL-based detectors toward the proper inclusion of MBU vulnerabilities. △ Less

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2202.13953 [pdf, other]

doi 10.1145/3510003.3510104

Practical Automated Detection of Malicious npm Packages

Authors: Adriana Sejfia, Max Schäfer

Abstract: The npm registry is one of the pillars of the JavaScript and TypeScript ecosystems, hosting over 1.7 million packages ranging from simple utility libraries to complex frameworks and entire applications. Due to the overwhelming popularity of npm, it has become a prime target for malicious actors, who publish new packages or compromise existing packages to introduce malware that tampers with or exfi… ▽ More The npm registry is one of the pillars of the JavaScript and TypeScript ecosystems, hosting over 1.7 million packages ranging from simple utility libraries to complex frameworks and entire applications. Due to the overwhelming popularity of npm, it has become a prime target for malicious actors, who publish new packages or compromise existing packages to introduce malware that tampers with or exfiltrates sensitive data from users who install either these packages or any package that (transitively) depends on them. Defending against such attacks is essential to maintaining the integrity of the software supply chain, but the sheer volume of package updates makes comprehensive manual review infeasible. We present Amalfi, a machine-learning based approach for automatically detecting potentially malicious packages comprised of three complementary techniques. We start with classifiers trained on known examples of malicious and benign packages. If a package is flagged as malicious by a classifier, we then check whether it includes metadata about its source repository, and if so whether the package can be reproduced from its source code. Packages that are reproducible from source are not usually malicious, so this step allows us to weed out false positives. Finally, we also employ a simple textual clone-detection technique to identify copies of malicious packages that may have been missed by the classifiers, reducing the number of false negatives. Amalfi improves on the state of the art in that it is lightweight, requiring only a few seconds per package to extract features and run the classifiers, and gives good results in practice: running it on 96287 package versions published over the course of one week, we were able to identify 95 previously unknown malware samples, with a manageable number of false positives. △ Less

Submitted 28 February, 2022; originally announced February 2022.

Comments: 12 pages, accepted for publication at ICSE 2022

arXiv:2011.04654 [pdf, other]

Assessing the Feasibility of Web-Request Prediction Models on Mobile Platforms

Authors: Yixue Zhao, Siwei Yin, Adriana Sejfia, Marcelo Schmitt Laser, Haoyu Wang, Nenad Medvidovic

Abstract: Prefetching web pages is a well-studied solution to reduce network latency by predicting users' future actions based on their past behaviors. However, such techniques are largely unexplored on mobile platforms. Today's privacy regulations make it infeasible to explore prefetching with the usual strategy of amassing large amounts of data over long periods and constructing conventional, "large" pred… ▽ More Prefetching web pages is a well-studied solution to reduce network latency by predicting users' future actions based on their past behaviors. However, such techniques are largely unexplored on mobile platforms. Today's privacy regulations make it infeasible to explore prefetching with the usual strategy of amassing large amounts of data over long periods and constructing conventional, "large" prediction models. Our work is based on the observation that this may not be necessary: Given previously reported mobile-device usage trends (e.g., repetitive behaviors in brief bursts), we hypothesized that prefetching should work effectively with "small" models trained on mobile-user requests collected during much shorter time periods. To test this hypothesis, we constructed a framework for automatically assessing prediction models, and used it to conduct an extensive empirical study based on over 15 million HTTP requests collected from nearly 11,500 mobile users during a 24-hour period, resulting in over 7 million models. Our results demonstrate the feasibility of prefetching with small models on mobile platforms, directly motivating future work in this area. We further introduce several strategies for improving prediction models while reducing the model size. Finally, our framework provides the foundation for future explorations of effective prediction models across a range of usage scenarios. △ Less

Submitted 23 March, 2021; v1 submitted 10 November, 2020; originally announced November 2020.

Comments: MOBILESoft 2021

arXiv:2008.03427 [pdf, other]

doi 10.1145/3368089.3409708

FrUITeR: A Framework for Evaluating UI Test Reuse

Authors: Yixue Zhao, Justin Chen, Adriana Sejfia, Marcelo Schmitt Laser, Jie Zhang, Federica Sarro, Mark Harman, Nenad Medvidovic

Abstract: UI testing is tedious and time-consuming due to the manual effort required. Recent research has explored opportunities for reusing existing UI tests from an app to automatically generate new tests for other apps. However, the evaluation of such techniques currently remains manual, unscalable, and unreproducible, which can waste effort and impede progress in this emerging area. We introduce FrUITeR… ▽ More UI testing is tedious and time-consuming due to the manual effort required. Recent research has explored opportunities for reusing existing UI tests from an app to automatically generate new tests for other apps. However, the evaluation of such techniques currently remains manual, unscalable, and unreproducible, which can waste effort and impede progress in this emerging area. We introduce FrUITeR, a framework that automatically evaluates UI test reuse in a reproducible way. We apply FrUITeR to existing test-reuse techniques on a uniform benchmark we established, resulting in 11,917 test reuse cases from 20 apps. We report several key findings aimed at improving UI test reuse that are missed by existing work. △ Less

Submitted 3 November, 2020; v1 submitted 7 August, 2020; originally announced August 2020.

Comments: ESEC/FSE 2020

Showing 1–4 of 4 results for author: Sejfia, A