Search | arXiv e-print repository

Toward Improved Deep Learning-based Vulnerability Detection

Authors: Adriana Sejfia, Satyaki Das, Saad Shafiq, Nenad Medvidović

Abstract: Deep learning (DL) has been a common thread across several recent techniques for vulnerability detection. The rise of large, publicly available datasets of vulnerabilities has fueled the learning process underpinning these techniques. While these datasets help the DL-based vulnerability detectors, they also constrain these detectors' predictive abilities. Vulnerabilities in these datasets have to… ▽ More Deep learning (DL) has been a common thread across several recent techniques for vulnerability detection. The rise of large, publicly available datasets of vulnerabilities has fueled the learning process underpinning these techniques. While these datasets help the DL-based vulnerability detectors, they also constrain these detectors' predictive abilities. Vulnerabilities in these datasets have to be represented in a certain way, e.g., code lines, functions, or program slices within which the vulnerabilities exist. We refer to this representation as a base unit. The detectors learn how base units can be vulnerable and then predict whether other base units are vulnerable. We have hypothesized that this focus on individual base units harms the ability of the detectors to properly detect those vulnerabilities that span multiple base units (or MBU vulnerabilities). For vulnerabilities such as these, a correct detection occurs when all comprising base units are detected as vulnerable. Verifying how existing techniques perform in detecting all parts of a vulnerability is important to establish their effectiveness for other downstream tasks. To evaluate our hypothesis, we conducted a study focusing on three prominent DL-based detectors: ReVeal, DeepWukong, and LineVul. Our study shows that all three detectors contain MBU vulnerabilities in their respective datasets. Further, we observed significant accuracy drops when detecting these types of vulnerabilities. We present our study and a framework that can be used to help DL-based detectors toward the proper inclusion of MBU vulnerabilities. △ Less

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2209.02577 [pdf, other]

doi 10.1145/3540250.3549134

Avgust: Automating Usage-Based Test Generation from Videos of App Executions

Authors: Yixue Zhao, Saghar Talebipour, Kesina Baral, Hyojae Park, Leon Yee, Safwat Ali Khan, Yuriy Brun, Nenad Medvidovic, Kevin Moran

Abstract: Writing and maintaining UI tests for mobile apps is a time-consuming and tedious task. While decades of research have produced automated approaches for UI test generation, these approaches typically focus on testing for crashes or maximizing code coverage. By contrast, recent research has shown that developers prefer usage-based tests, which center around specific uses of app features, to help sup… ▽ More Writing and maintaining UI tests for mobile apps is a time-consuming and tedious task. While decades of research have produced automated approaches for UI test generation, these approaches typically focus on testing for crashes or maximizing code coverage. By contrast, recent research has shown that developers prefer usage-based tests, which center around specific uses of app features, to help support activities such as regression testing. Very few existing techniques support the generation of such tests, as doing so requires automating the difficult task of understanding the semantics of UI screens and user inputs. In this paper, we introduce Avgust, which automates key steps of generating usage-based tests. Avgust uses neural models for image understanding to process video recordings of app uses to synthesize an app-agnostic state-machine encoding of those uses. Then, Avgust uses this encoding to synthesize test cases for a new target app. We evaluate Avgust on 374 videos of common uses of 18 popular apps and show that 69% of the tests Avgust generates successfully execute the desired usage, and that Avgust's classifiers outperform the state of the art. △ Less

Submitted 1 November, 2022; v1 submitted 6 September, 2022; originally announced September 2022.

Journal ref: ESEC/FSE 2022

arXiv:2104.08432 [pdf, other]

Architectural Archipelagos: Technical Debt in Long-Lived Software Research Platforms

Authors: Marcelo Schmitt Laser, Duc Minh Le, Joshua Garcia, Nenad Medvidović

Abstract: This paper identifies a model of software evolution that is prevalent in large, long-lived academic research tool suites (3L-ARTS). This model results in an "archipelago" of related but haphazardly organized architectural "islands", and inherently induces technical debt. We illustrate the archipelago model with examples from two 3L-ARTS archipelagos identified in literature. This paper identifies a model of software evolution that is prevalent in large, long-lived academic research tool suites (3L-ARTS). This model results in an "archipelago" of related but haphazardly organized architectural "islands", and inherently induces technical debt. We illustrate the archipelago model with examples from two 3L-ARTS archipelagos identified in literature. △ Less

Submitted 16 April, 2021; originally announced April 2021.

arXiv:2102.09835 [pdf, other]

Architectural Decay as Predictor of Issue- and Change-Proneness

Authors: Duc Minh Le, Suhrid Karthik, Marcelo Schmitt Laser, Nenad Medvidovic

Abstract: Architectural decay imposes real costs in terms of developer effort, system correctness, and performance. Over time, those problems are likely to be revealed as explicit implementation issues (defects, feature changes, etc.). Recent empirical studies have demonstrated that there is a significant correlation between architectural "smells" -- manifestations of architectural decay -- and implementati… ▽ More Architectural decay imposes real costs in terms of developer effort, system correctness, and performance. Over time, those problems are likely to be revealed as explicit implementation issues (defects, feature changes, etc.). Recent empirical studies have demonstrated that there is a significant correlation between architectural "smells" -- manifestations of architectural decay -- and implementation issues. In this paper, we take a step further in exploring this phenomenon. We analyze the available development data from 10 open-source software systems and show that information regarding current architectural decay in these systems can be used to build models that accurately predict future issue-proneness and change-proneness of the systems' implementations. As a less intuitive result, we also show that, in cases where historical data for a system is unavailable, such data from other, unrelated systems can provide reasonably accurate issue- and change-proneness prediction capabilities. △ Less

Submitted 19 February, 2021; originally announced February 2021.

arXiv:2011.04654 [pdf, other]

Assessing the Feasibility of Web-Request Prediction Models on Mobile Platforms

Authors: Yixue Zhao, Siwei Yin, Adriana Sejfia, Marcelo Schmitt Laser, Haoyu Wang, Nenad Medvidovic

Abstract: Prefetching web pages is a well-studied solution to reduce network latency by predicting users' future actions based on their past behaviors. However, such techniques are largely unexplored on mobile platforms. Today's privacy regulations make it infeasible to explore prefetching with the usual strategy of amassing large amounts of data over long periods and constructing conventional, "large" pred… ▽ More Prefetching web pages is a well-studied solution to reduce network latency by predicting users' future actions based on their past behaviors. However, such techniques are largely unexplored on mobile platforms. Today's privacy regulations make it infeasible to explore prefetching with the usual strategy of amassing large amounts of data over long periods and constructing conventional, "large" prediction models. Our work is based on the observation that this may not be necessary: Given previously reported mobile-device usage trends (e.g., repetitive behaviors in brief bursts), we hypothesized that prefetching should work effectively with "small" models trained on mobile-user requests collected during much shorter time periods. To test this hypothesis, we constructed a framework for automatically assessing prediction models, and used it to conduct an extensive empirical study based on over 15 million HTTP requests collected from nearly 11,500 mobile users during a 24-hour period, resulting in over 7 million models. Our results demonstrate the feasibility of prefetching with small models on mobile platforms, directly motivating future work in this area. We further introduce several strategies for improving prediction models while reducing the model size. Finally, our framework provides the foundation for future explorations of effective prediction models across a range of usage scenarios. △ Less

Submitted 23 March, 2021; v1 submitted 10 November, 2020; originally announced November 2020.

Comments: MOBILESoft 2021

arXiv:2008.03427 [pdf, other]

doi 10.1145/3368089.3409708

FrUITeR: A Framework for Evaluating UI Test Reuse

Authors: Yixue Zhao, Justin Chen, Adriana Sejfia, Marcelo Schmitt Laser, Jie Zhang, Federica Sarro, Mark Harman, Nenad Medvidovic

Abstract: UI testing is tedious and time-consuming due to the manual effort required. Recent research has explored opportunities for reusing existing UI tests from an app to automatically generate new tests for other apps. However, the evaluation of such techniques currently remains manual, unscalable, and unreproducible, which can waste effort and impede progress in this emerging area. We introduce FrUITeR… ▽ More UI testing is tedious and time-consuming due to the manual effort required. Recent research has explored opportunities for reusing existing UI tests from an app to automatically generate new tests for other apps. However, the evaluation of such techniques currently remains manual, unscalable, and unreproducible, which can waste effort and impede progress in this emerging area. We introduce FrUITeR, a framework that automatically evaluates UI test reuse in a reproducible way. We apply FrUITeR to existing test-reuse techniques on a uniform benchmark we established, resulting in 11,917 test reuse cases from 20 apps. We report several key findings aimed at improving UI test reuse that are missed by existing work. △ Less

Submitted 3 November, 2020; v1 submitted 7 August, 2020; originally announced August 2020.

Comments: ESEC/FSE 2020

arXiv:1902.08879 [pdf, other]

A Microservice Architecture for Online Mobile App Optimization

Authors: Yixue Zhao, Nenad Medvidovic

Abstract: A large number of techniques for analyzing and optimizing mobile apps have emerged in the past decade. However, those techniques' components are notoriously difficult to extract and reuse outside their original tools. This paper introduces MAOMAO, a microservice-based reference architecture for reusing and integrating such components. MAOMAO's twin goals are (1) adoption of available app optimizat… ▽ More A large number of techniques for analyzing and optimizing mobile apps have emerged in the past decade. However, those techniques' components are notoriously difficult to extract and reuse outside their original tools. This paper introduces MAOMAO, a microservice-based reference architecture for reusing and integrating such components. MAOMAO's twin goals are (1) adoption of available app optimization techniques in practice and (2) improved construction and evaluation of new techniques. The paper uses several existing app optimization techniques to illustrate both the motivation behind MAOMAO and its potential to fundamentally alter the landscape in this area. △ Less

Submitted 23 February, 2019; originally announced February 2019.

arXiv:1810.08862 [pdf, other]

doi 10.1145/3180155.3180249

Leveraging Program Analysis to Reduce User-Perceived Latency in Mobile Applications

Authors: Yixue Zhao, Marcelo Schmitt Laser, Yingjun Lyu, Nenad Medvidovic

Abstract: Reducing network latency in mobile applications is an effective way of improving the mobile user experience and has tangible economic benefits. This paper presents PALOMA, a novel client-centric technique for reducing the network latency by prefetching HTTP requests in Android apps. Our work leverages string analysis and callback control-flow analysis to automatically instrument apps using PALOMA'… ▽ More Reducing network latency in mobile applications is an effective way of improving the mobile user experience and has tangible economic benefits. This paper presents PALOMA, a novel client-centric technique for reducing the network latency by prefetching HTTP requests in Android apps. Our work leverages string analysis and callback control-flow analysis to automatically instrument apps using PALOMA's rigorous formulation of scenarios that address "what" and "when" to prefetch. PALOMA has been shown to incur significant runtime savings (several hundred milliseconds per prefetchable HTTP request), both when applied on a reusable evaluation benchmark we have developed and on real applications △ Less

Submitted 20 October, 2018; originally announced October 2018.

Comments: ICSE 2018

arXiv:1810.08861 [pdf, other]

doi 10.1145/3238147.3238215

Empirically Assessing Opportunities for Prefetching and Caching in Mobile Apps

Authors: Yixue Zhao, Paul Wat, Marcelo Schmitt Laser, Nenad Medvidovic

Abstract: Network latency in mobile software has a large impact on user experience, with potentially severe economic consequences. Prefetching and caching have been shown effective in reducing the latencies in browser-based systems. However, those techniques cannot be directly applied to the emerging domain of mobile apps because of the differences in network interactions. Moreover, there is a lack of resea… ▽ More Network latency in mobile software has a large impact on user experience, with potentially severe economic consequences. Prefetching and caching have been shown effective in reducing the latencies in browser-based systems. However, those techniques cannot be directly applied to the emerging domain of mobile apps because of the differences in network interactions. Moreover, there is a lack of research on prefetching and caching techniques that may be suitable for the mobile app domain, and it is not clear whether such techniques can be effective or whether they are even feasible. This paper takes the first step toward answering these questions by conducting a comprehensive study to understand the characteristics of HTTP requests in over 1000 popular Android apps. Our work focuses on the prefetchability of requests using static program analysis techniques and cacheability of resulting responses. We find that there is a substantial opportunity to leverage prefetching and caching in mobile apps, but that suitable techniques must take into account the nature of apps' network interactions and idiosyncrasies such as untrustworthy HTTP header information. Our observations provide guidelines for developers to utilize prefetching and caching schemes in app development, and motivate future research in this area. △ Less

Submitted 20 October, 2018; originally announced October 2018.

Comments: ASE 2018

arXiv:1704.04798 [pdf, other]

Uncovering Architectural Design Decisions

Authors: Arman Shahbazian, Youn Kyu Lee, Duc Le, Nenad Medvidovic

Abstract: Over the past three decades, considerable effort has been devoted to the study of software architecture. A major portion of this effort has focused on the originally proposed view of four "C"s---components, connectors, configurations, and constraints---that are the building blocks of a system's architecture. Despite being simple and appealing, this view has proven to be incomplete and has required… ▽ More Over the past three decades, considerable effort has been devoted to the study of software architecture. A major portion of this effort has focused on the originally proposed view of four "C"s---components, connectors, configurations, and constraints---that are the building blocks of a system's architecture. Despite being simple and appealing, this view has proven to be incomplete and has required further elaboration. To that end, researchers have more recently tried to approach architectures from another important perspective---that of design decisions that yield a system's architecture. These more recent efforts have lacked a precise understanding of several key questions, however: (1) What is an architectural design decision (definition)? (2) How can architectural design decisions be found in existing systems (identification)? (3) What system decisions are and are not architectural (classification)? (4) How are architectural design decisions manifested in the code (reification)? (5) How can important architectural decisions be preserved and/or changed as desired (evolution)? This paper presents a technique targeted at answering these questions by analyzing information that is readily available about software systems. We applied our technique on over 100 different versions of two widely adopted open- source systems, and found that it can accurately uncover the architectural design decisions embodied in the systems. △ Less

Submitted 16 April, 2017; originally announced April 2017.

Showing 1–10 of 10 results for author: Medvidovic, N