Search | arXiv e-print repository

arXiv:2312.17726 [pdf, ps, other]

Comparing Effectiveness and Efficiency of Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP) Tools in a Large Java-based System

Authors: Aishwarya Seth, Saikath Bhattacharya, Sarah Elder, Nusrat Zahan, Laurie Williams

Abstract: Security resources are scarce, and practitioners need guidance in the effective and efficient usage of techniques and tools available in the cybersecurity industry. Two emerging tool types, Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP), have not been thoroughly evaluated against well-established counterparts such as Dynamic Application Security Test… ▽ More Security resources are scarce, and practitioners need guidance in the effective and efficient usage of techniques and tools available in the cybersecurity industry. Two emerging tool types, Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP), have not been thoroughly evaluated against well-established counterparts such as Dynamic Application Security Testing (DAST) and Static Application Security Testing (SAST). The goal of this research is to aid practitioners in making informed choices about the use of Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP) tools through an analysis of their effectiveness and efficiency in comparison with different vulnerability detection and prevention techniques and tools. We apply IAST and RASP on OpenMRS, an open-source Java-based online application. We compare the efficiency and effectiveness of IAST and RASP with techniques applied on OpenMRS in prior work. We measure efficiency and effectiveness in terms of the number and type of vulnerabilities detected and prevented per hour. Our study shows IAST performed relatively well compared to other techniques, performing second-best in both efficiency and effectiveness. IAST detected eight Top-10 OWASP security risks compared to nine by SMPT and seven for EMPT, DAST, and SAST. IAST found more vulnerabilities than SMPT. The efficiency of IAST (2.14 VpH) is second to only EMPT (2.22 VpH). These findings imply that our study benefited from using IAST when conducting black-box security testing. In the context of a large, enterprise-scale web application such as OpenMRS, RASP does not replace vulnerability detection, while IAST is a powerful tool that complements other techniques. △ Less

Submitted 29 December, 2023; originally announced December 2023.

arXiv:2208.01595 [pdf, other]

Do I really need all this work to find vulnerabilities? An empirical case study comparing vulnerability detection techniques on a Java application

Authors: Sarah Elder, Nusrat Zahan, Rui Shu, Monica Metro, Valeri Kozarev, Tim Menzies, Laurie Williams

Abstract: CONTEXT: Applying vulnerability detection techniques is one of many tasks using the limited resources of a software project. OBJECTIVE: The goal of this research is to assist managers and other decision-makers in making informed choices about the use of software vulnerability detection techniques through an empirical study of the efficiency and effectiveness of four techniques on a Java-based we… ▽ More CONTEXT: Applying vulnerability detection techniques is one of many tasks using the limited resources of a software project. OBJECTIVE: The goal of this research is to assist managers and other decision-makers in making informed choices about the use of software vulnerability detection techniques through an empirical study of the efficiency and effectiveness of four techniques on a Java-based web application. METHOD: We apply four different categories of vulnerability detection techniques \textendash~ systematic manual penetration testing (SMPT), exploratory manual penetration testing (EMPT), dynamic application security testing (DAST), and static application security testing (SAST) \textendash\ to an open-source medical records system. RESULTS: We found the most vulnerabilities using SAST. However, EMPT found more severe vulnerabilities. With each technique, we found unique vulnerabilities not found using the other techniques. The efficiency of manual techniques (EMPT, SMPT) was comparable to or better than the efficiency of automated techniques (DAST, SAST) in terms of Vulnerabilities per Hour (VpH). CONCLUSIONS: The vulnerability detection technique practitioners should select may vary based on the goals and available resources of the project. If the goal of an organization is to find "all" vulnerabilities in a project, they need to use as many techniques as their resources allow. △ Less

Submitted 2 August, 2022; originally announced August 2022.

ACM Class: D.2.5

arXiv:2103.05160 [pdf, other]

Vulnerability Detection is Just the Beginning

Authors: Sarah Elder

Abstract: Vulnerability detection plays a key role in secure software development. There are many different vulnerability detection tools and techniques to choose from, and insufficient information on which vulnerability detection techniques to use and when. The goal of this research is to assist managers and other decision-makers on software projects in making informed choices about the use of different so… ▽ More Vulnerability detection plays a key role in secure software development. There are many different vulnerability detection tools and techniques to choose from, and insufficient information on which vulnerability detection techniques to use and when. The goal of this research is to assist managers and other decision-makers on software projects in making informed choices about the use of different software vulnerability detection techniques through empirical analysis of the efficiency and effectiveness of each technique. We will examine the relationships between the vulnerability detection technique used to find a vulnerability, the type of vulnerability found, the exploitability of the vulnerability, and the effort needed to fix a vulnerability on two projects where we ensure all vulnerabilities found have been fixed. We will then examine how these relationships are seen in Open Source Software more broadly where practitioners may use different vulnerability detection techniques, or may not fix all vulnerabilities found due to resource constraints. △ Less

Submitted 8 March, 2021; originally announced March 2021.

Comments: 5 pages, 1 figure, submitted to International Conference on Software Engineering: Doctoral Symposium (ICSE-DS)

ACM Class: D.2.4; D.4.6

arXiv:2103.05088 [pdf, other]

Structuring a Comprehensive Software Security Course Around the OWASP Application Security Verification Standard

Authors: Sarah Elder, Nusrat Zahan, Val Kozarev, Rui Shu, Tim Menzies, Laurie Williams

Abstract: Lack of security expertise among software practitioners is a problem with many implications. First, there is a deficit of security professionals to meet current needs. Additionally, even practitioners who do not plan to work in security may benefit from increased understanding of security. The goal of this paper is to aid software engineering educators in designing a comprehensive software securit… ▽ More Lack of security expertise among software practitioners is a problem with many implications. First, there is a deficit of security professionals to meet current needs. Additionally, even practitioners who do not plan to work in security may benefit from increased understanding of security. The goal of this paper is to aid software engineering educators in designing a comprehensive software security course by sharing an experience running a software security course for the eleventh time. Through all the eleven years of running the software security course, the course objectives have been comprehensive - ranging from security testing, to secure design and coding, to security requirements to security risk management. For the first time in this eleventh year, a theme of the course assignments was to map vulnerability discovery to the security controls of the Open Web Application Security Project (OWASP) Application Security Verification Standard (ASVS). Based upon student performance on a final exploratory penetration testing project, this map** may have increased students' depth of understanding of a wider range of security topics. The students efficiently detected 191 unique and verified vulnerabilities of 28 different Common Weakness Enumeration (CWE) types during a three-hour period in the OpenMRS project, an electronic health record application in active use. △ Less

Submitted 8 March, 2021; originally announced March 2021.

Comments: 10 pages, 5 figures, 1 table, submitted to International Conference on Software Engineering: Joint Track on Software Engineering Education and Training (ICSE-JSEET)

ACM Class: K.3.0; D.2.0; K.6.5

arXiv:1809.07937 [pdf, other]

Bugs in Infrastructure as Code

Authors: Akond Rahman, Sarah Elder, Faysal Hossain Shezan, Vanessa Frost, Jonathan Stallings, Laurie Williams

Abstract: Infrastructure as code (IaC) scripts are used to automate the maintenance and configuration of software development and deployment infrastructure. IaC scripts can be complex in nature, containing hundreds of lines of code, leading to defects that can be difficult to debug, and lead to wide-scale system discrepancies such as service outages at scale. Use of IaC scripts is getting increasingly popul… ▽ More Infrastructure as code (IaC) scripts are used to automate the maintenance and configuration of software development and deployment infrastructure. IaC scripts can be complex in nature, containing hundreds of lines of code, leading to defects that can be difficult to debug, and lead to wide-scale system discrepancies such as service outages at scale. Use of IaC scripts is getting increasingly popular, yet the nature of defects that occur in these scripts have not been systematically categorized. A systematic categorization of defects can inform practitioners about process improvement opportunities to mitigate defects in IaC scripts. The goal of this paper is to help software practitioners improve their development process of infrastructure as code (IaC) scripts by categorizing the defect categories in IaC scripts based upon a qualitative analysis of commit messages and issue report descriptions. We mine open source version control systems collected from four organizations namely, Mirantis, Mozilla, Openstack, and Wikimedia Commons to conduct our research study. We use 1021, 3074, 7808, and 972 commits that map to 165, 580, 1383, and 296 IaC scripts, respectively, collected from Mirantis, Mozilla, Openstack, and Wikimedia Commons. With 89 raters we apply the defect type attribute of the orthogonal defect classification (ODC) methodology to categorize the defects. We also review prior literature that have used ODC to categorize defects, and compare the defect category distribution of IaC scripts with 26 non-IaC software systems. Respectively, for Mirantis, Mozilla, Openstack, and Wikimedia Commons, we observe (i) 49.3%, 36.5%, 57.6%, and 62.7% of the IaC defects to contain syntax and configuration-related defects; (ii) syntax and configuration-related defects are more prevalent amongst IaC scripts compared to that of previously-studied non-IaC software. △ Less

Submitted 17 July, 2019; v1 submitted 21 September, 2018; originally announced September 2018.

Comments: Not peer-reviewed

arXiv:1809.00984 [pdf]

doi 10.1088/1674-1056/27/11/118101

The materials data ecosystem: materials data science and its role in data-driven materials discovery

Authors: Hai-Qing Yin, Xue Jiang, Guo-Quan Liu, Sharon Elder, Bin Xu1, Qing-Jun Zheng, Xuan-Hui Qu

Abstract: Since its launch in 2011, Materials Genome Initiative (MGI) has drawn the attention of researchers from across academia, government, and industry worldwide.As one of the three tools of MGI, the materials data, for the first time, emerged as an extremely significant approach in materials discovery. Data science has been applied in different disciplines as an interdisciplinary field to extract knowl… ▽ More Since its launch in 2011, Materials Genome Initiative (MGI) has drawn the attention of researchers from across academia, government, and industry worldwide.As one of the three tools of MGI, the materials data, for the first time, emerged as an extremely significant approach in materials discovery. Data science has been applied in different disciplines as an interdisciplinary field to extract knowledge from the data. The concept of materials data science was utilized to demonstrate the data application in materials science. To explore its potential as an active research branch in the big data age, a three-tier system was put forward to define the infrastructure of data classification, curation and knowledge extraction of materials data. △ Less

Submitted 29 August, 2018; originally announced September 2018.

arXiv:1611.00065 [pdf, other]

Bayesian Adaptive Data Analysis Guarantees from Subgaussianity

Authors: Sam Elder

Abstract: The new field of adaptive data analysis seeks to provide algorithms and provable guarantees for models of machine learning that allow researchers to reuse their data, which normally falls outside of the usual statistical paradigm of static data analysis. In 2014, Dwork, Feldman, Hardt, Pitassi, Reingold and Roth introduced one potential model and proposed several solutions based on differential pr… ▽ More The new field of adaptive data analysis seeks to provide algorithms and provable guarantees for models of machine learning that allow researchers to reuse their data, which normally falls outside of the usual statistical paradigm of static data analysis. In 2014, Dwork, Feldman, Hardt, Pitassi, Reingold and Roth introduced one potential model and proposed several solutions based on differential privacy. In previous work in 2016, we described a problem with this model and instead proposed a Bayesian variant, but also found that the analogous Bayesian methods cannot achieve the same statistical guarantees as in the static case. In this paper, we prove the first positive results for the Bayesian model, showing that with a Dirichlet prior, the posterior mean algorithm indeed matches the statistical guarantees of the static case. The main ingredient is a new theorem showing that the $\mathrm{Beta}(α,β)$ distribution is subgaussian with variance proxy $O(1/(α+β+1))$, a concentration result also of independent interest. We provide two proofs of this result: a probabilistic proof utilizing a simple condition for the raw moments of a positive random variable and a learning-theoretic proof based on considering the beta distribution as a posterior, both of which have implications to other related problems. △ Less

Submitted 20 March, 2017; v1 submitted 31 October, 2016; originally announced November 2016.

arXiv:1604.02492 [pdf, other]

Challenges in Bayesian Adaptive Data Analysis

Authors: Sam Elder

Abstract: Traditional statistical analysis requires that the analysis process and data are independent. By contrast, the new field of adaptive data analysis hopes to understand and provide algorithms and accuracy guarantees for research as it is commonly performed in practice, as an iterative process of interacting repeatedly with the same data set, such as repeated tests against a holdout set. Previous wor… ▽ More Traditional statistical analysis requires that the analysis process and data are independent. By contrast, the new field of adaptive data analysis hopes to understand and provide algorithms and accuracy guarantees for research as it is commonly performed in practice, as an iterative process of interacting repeatedly with the same data set, such as repeated tests against a holdout set. Previous work has defined a model with a rather strong lower bound on sample complexity in terms of the number of queries, $n\sim\sqrt q$, arguing that adaptive data analysis is much harder than static data analysis, where $n\sim\log q$ is possible. Instead, we argue that those strong lower bounds point to a limitation of the previous model in that it must consider wildly asymmetric scenarios which do not hold in typical applications. To better understand other difficulties of adaptivity, we propose a new Bayesian version of the problem that mandates symmetry. Since the other lower bound techniques are ruled out, we can more effectively see difficulties that might otherwise be overshadowed. As a first contribution to this model, we produce a new problem using error-correcting codes on which a large family of methods, including all previously proposed algorithms, require roughly $n\sim\sqrt[4]q$. These early results illustrate new difficulties in adaptive data analysis regarding slightly correlated queries on problems with concentrated uncertainty. △ Less

Submitted 20 March, 2017; v1 submitted 8 April, 2016; originally announced April 2016.

arXiv:1410.6801 [pdf, ps, other]

Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

Authors: Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, Madalina Persu

Abstract: We show how to approximate a data matrix $\mathbf{A}$ with a much smaller sketch $\mathbf{\tilde A}$ that can be used to solve a general class of constrained k-rank approximation problems to within $(1+ε)$ error. Importantly, this class of problems includes $k$-means clustering and unconstrained low rank approximation (i.e. principal component analysis). By reducing data points to just $O(k)$ dime… ▽ More We show how to approximate a data matrix $\mathbf{A}$ with a much smaller sketch $\mathbf{\tilde A}$ that can be used to solve a general class of constrained k-rank approximation problems to within $(1+ε)$ error. Importantly, this class of problems includes $k$-means clustering and unconstrained low rank approximation (i.e. principal component analysis). By reducing data points to just $O(k)$ dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For $k$-means dimensionality reduction, we provide $(1+ε)$ relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only `cover' a good subspace for $\bv{A}$, but can be used directly to compute this subspace. Finally, for $k$-means clustering, we show how to achieve a $(9+ε)$ approximation by Johnson-Lindenstrauss projecting data points to just $O(\log k/ε^2)$ dimensions. This gives the first result that leverages the specific structure of $k$-means to achieve dimension independent of input size and sublinear in $k$. △ Less

Submitted 2 April, 2015; v1 submitted 24 October, 2014; originally announced October 2014.

Showing 1–9 of 9 results for author: Elder, S