-
Using AI Assistants in Software Development: A Qualitative Study on Security Practices and Concerns
Authors:
Jan H. Klemmer,
Stefan Albert Horstmann,
Nikhil Patnaik,
Cordelia Ludden,
Cordell Burton Jr,
Carson Powers,
Fabio Massacci,
Akond Rahman,
Daniel Votipka,
Heather Richter Lipford,
Awais Rashid,
Alena Naiakshina,
Sascha Fahl
Abstract:
Following the recent release of AI assistants, such as OpenAI's ChatGPT and GitHub Copilot, the software industry quickly utilized these tools for software development tasks, e.g., generating code or consulting AI for advice. While recent research has demonstrated that AI-generated code can contain security issues, how software professionals balance AI assistant usage and security remains unclear.…
▽ More
Following the recent release of AI assistants, such as OpenAI's ChatGPT and GitHub Copilot, the software industry quickly utilized these tools for software development tasks, e.g., generating code or consulting AI for advice. While recent research has demonstrated that AI-generated code can contain security issues, how software professionals balance AI assistant usage and security remains unclear. This paper investigates how software professionals use AI assistants in secure software development, what security implications and considerations arise, and what impact they foresee on secure software development. We conducted 27 semi-structured interviews with software professionals, including software engineers, team leads, and security testers. We also reviewed 190 relevant Reddit posts and comments to gain insights into the current discourse surrounding AI assistants for software development. Our analysis of the interviews and Reddit posts finds that despite many security and quality concerns, participants widely use AI assistants for security-critical tasks, e.g., code generation, threat modeling, and vulnerability detection. Their overall mistrust leads to checking AI suggestions in similar ways to human code, although they expect improvements and, therefore, a heavier use for security tasks in the future. We conclude with recommendations for software professionals to critically check AI suggestions, AI creators to improve suggestion security and capabilities for ethical security tasks, and academic researchers to consider general-purpose AI in software development.
△ Less
Submitted 10 May, 2024;
originally announced May 2024.
-
Analyzing and Mitigating (with LLMs) the Security Misconfigurations of Helm Charts from Artifact Hub
Authors:
Francesco Minna,
Fabio Massacci,
Katja Tuma
Abstract:
Background: Helm is a package manager that allows defining, installing, and upgrading applications with Kubernetes (K8s), a popular container orchestration platform. A Helm chart is a collection of files describing all dependencies, resources, and parameters required for deploying an application within a K8s cluster. Objective: The goal of this study is to mine and empirically evaluate the securit…
▽ More
Background: Helm is a package manager that allows defining, installing, and upgrading applications with Kubernetes (K8s), a popular container orchestration platform. A Helm chart is a collection of files describing all dependencies, resources, and parameters required for deploying an application within a K8s cluster. Objective: The goal of this study is to mine and empirically evaluate the security of Helm charts, comparing the performance of existing tools in terms of misconfigurations reported by policies available by default, and measure to what extent LLMs could be used for removing misconfiguration. We also want to investigate whether there are false positives in both the LLM refactorings and the tool outputs. Method: We propose a pipeline to mine Helm charts from Artifact Hub, a popular centralized repository, and analyze them using state-of-the-art open-source tools, such as Checkov and KICS. First, such a pipeline will run several chart analyzers and identify the common and unique misconfigurations reported by each tool. Secondly, it will use LLMs to suggest mitigation for each misconfiguration. Finally, the chart refactoring previously generated will be analyzed again by the same tools to see whether it satisfies the tool's policies. At the same time, we will also perform a manual analysis on a subset of charts to evaluate whether there are false positive misconfigurations from the tool's reporting and in the LLM refactoring.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
Cross-ecosystem categorization: A manual-curation protocol for the categorization of Java Maven libraries along Python PyPI Topics
Authors:
Ranindya Paramitha,
Yuan Feng,
Fabio Massacci,
Carlos E. Budde
Abstract:
Context: Software of different functional categories, such as text processing vs. networking, has different profiles in terms of metrics like security and updates. Using popularity to compare e.g. Java vs. Python libraries might give a skewed perspective, as the categories of the most popular software vary from one ecosystem to the next. How can one compare libraries datasets across software ecosy…
▽ More
Context: Software of different functional categories, such as text processing vs. networking, has different profiles in terms of metrics like security and updates. Using popularity to compare e.g. Java vs. Python libraries might give a skewed perspective, as the categories of the most popular software vary from one ecosystem to the next. How can one compare libraries datasets across software ecosystems, when not even the category names are uniform among them? Objective: We study how to generate a language-agnostic categorisation of software by functional purpose, that enables cross-ecosystem studies of libraries datasets. This provides the functional fingerprint information needed for software metrics comparisons. Method: We designed and implemented a human-guided protocol to categorise libraries from software ecosystems. Category names mirror PyPI Topic classifiers, but the protocol is generic and can be applied to any ecosystem. We demonstrate it by categorising 256 Java/Maven libraries with severe security vulnerabilities. Results: The protocol allows three or more people to categorise any number of libraries. The categorisation produced is functional-oriented and language-agnostic. The Java/Maven dataset demonstration resulted in a majority of Internet-oriented libraries, coherent with its selection by severe vulnerabilities. To allow replication and updates, we make the dataset and the protocol individual steps available as open data. Conclusions: Libraries categorisation by functional purpose is feasible with our protocol, which produced the fingerprint of a 256-libraries Java dataset. While this was labour intensive, humans excel in the required inference tasks, so full automation of the process is not envisioned. However, results can provide the ground truth needed for machine learning in large-scale cross-ecosystem empirical studies.
△ Less
Submitted 10 March, 2024;
originally announced March 2024.
-
A Graph-based Stratified Sampling Methodology for the Analysis of (Underground) Forums
Authors:
Giorgio Di Tizio,
Gilberto Atondo Siu,
Alice Hutchings,
Fabio Massacci
Abstract:
[Context] Researchers analyze underground forums to study abuse and cybercrime activities. Due to the size of the forums and the domain expertise required to identify criminal discussions, most approaches employ supervised machine learning techniques to automatically classify the posts of interest. [Goal] Human annotation is costly. How to select samples to annotate that account for the structure…
▽ More
[Context] Researchers analyze underground forums to study abuse and cybercrime activities. Due to the size of the forums and the domain expertise required to identify criminal discussions, most approaches employ supervised machine learning techniques to automatically classify the posts of interest. [Goal] Human annotation is costly. How to select samples to annotate that account for the structure of the forum? [Method] We present a methodology to generate stratified samples based on information about the centrality properties of the population and evaluate classifier performance. [Result] We observe that by employing a sample obtained from a uniform distribution of the post degree centrality metric, we maintain the same level of precision but significantly increase the recall (+30%) compared to a sample whose distribution is respecting the population stratification. We find that classifiers trained with similar samples disagree on the classification of criminal activities up to 33% of the time when deployed on the entire forum.
△ Less
Submitted 18 August, 2023;
originally announced August 2023.
-
Are Software Updates Useless Against Advanced Persistent Threats?
Authors:
Fabio Massacci,
Giorgio Di Tizio
Abstract:
A dilemma worth Shakespeare's Hamlet is increasingly haunting companies and security researchers: ``to update or not to update, this is the question``. From the perspective of recommended common practices by software vendors the answer is unambiguous: you should keep your software up-to-date. But is common sense always good sense? We argue it is not.
A dilemma worth Shakespeare's Hamlet is increasingly haunting companies and security researchers: ``to update or not to update, this is the question``. From the perspective of recommended common practices by software vendors the answer is unambiguous: you should keep your software up-to-date. But is common sense always good sense? We argue it is not.
△ Less
Submitted 12 June, 2023;
originally announced June 2023.
-
Building cross-language corpora for human understanding of privacy policies
Authors:
Francesco Ciclosi,
Silvia Vidor,
Fabio Massacci
Abstract:
Making sure that users understand privacy policies that impact them is a key challenge for a real GDPR deployment. Research studies are mostly carried in English, but in Europe and elsewhere, users speak a language that is not English. Replicating studies in different languages requires the availability of comparable cross-language privacy policies corpora. This work provides a methodology for bui…
▽ More
Making sure that users understand privacy policies that impact them is a key challenge for a real GDPR deployment. Research studies are mostly carried in English, but in Europe and elsewhere, users speak a language that is not English. Replicating studies in different languages requires the availability of comparable cross-language privacy policies corpora. This work provides a methodology for building comparable cross-language in a national language and a reference study language. We provide an application example of our methodology comparing English and Italian extending the corpus of one of the first studies about users understanding of technical terms in privacy policies. We also investigate other open issues that can make replication harder.
△ Less
Submitted 10 February, 2023;
originally announced February 2023.
-
The Data Protection Officer, an ubiquitous role nobody really knows
Authors:
Francesco Ciclosi,
Fabio Massacci
Abstract:
Among all cybersecurity and privacy workers, the Data Protection Officer (DPO) stands between those auditing a company's compliance and those acting as management advisors. A person that must be somehow versed in legal, management, and cybersecurity technical skills. We describe how this role tackles socio-technical risks in everyday scenarios.
Among all cybersecurity and privacy workers, the Data Protection Officer (DPO) stands between those auditing a company's compliance and those acting as management advisors. A person that must be somehow versed in legal, management, and cybersecurity technical skills. We describe how this role tackles socio-technical risks in everyday scenarios.
△ Less
Submitted 15 December, 2022;
originally announced December 2022.
-
On the acceptance by code reviewers of candidate security patches suggested by Automated Program Repair tools
Authors:
Aurora Papotti,
Ranindya Paramitha,
Fabio Massacci
Abstract:
Background: Testing and validation of the semantic correctness of patches provided by tools for Automated Program Repairs (APR) has received a lot of attention. Yet, the eventual acceptance or rejection of suggested patches for real world projects by humans patch reviewers has received a limited attention. Objective: To address this issue, we plan to investigate whether (possibly incorrect) securi…
▽ More
Background: Testing and validation of the semantic correctness of patches provided by tools for Automated Program Repairs (APR) has received a lot of attention. Yet, the eventual acceptance or rejection of suggested patches for real world projects by humans patch reviewers has received a limited attention. Objective: To address this issue, we plan to investigate whether (possibly incorrect) security patches suggested by APR tools are recognized by human reviewers. We also want to investigate whether knowing that a patch was produced by an allegedly specialized tool does change the decision of human reviewers. Method: In the first phase, using a balanced design, we propose to human reviewers a combination of patches proposed by APR tools for different vulnerabilities and ask reviewers to adopt or reject the proposed patches. In the second phase, we tell participants that some of the proposed patches were generated by security specialized tools (even if the tool was actually a `normal' APR tool) and measure whether the human reviewers would change their decision to adopt or reject a patch. Limitations: The experiment will be conducted in an academic setting, and to maintain power, it will focus on a limited sample of popular APR tools and popular vulnerability types.
△ Less
Submitted 16 September, 2022; v1 submitted 15 September, 2022;
originally announced September 2022.
-
Cryptographic and Financial Fairness
Authors:
Daniele Friolo,
Fabio Massacci,
Chan Nam Ngo,
Daniele Venturi
Abstract:
A recent trend in multi-party computation is to achieve cryptographic fairness via monetary penalties, i.e. each honest player either obtains the output or receives a compensation in the form of a cryptocurrency. We pioneer another type of fairness, financial fairness, that is closer to the real-world valuation of financial transactions. Intuitively, a penalty protocol is financially fair if the n…
▽ More
A recent trend in multi-party computation is to achieve cryptographic fairness via monetary penalties, i.e. each honest player either obtains the output or receives a compensation in the form of a cryptocurrency. We pioneer another type of fairness, financial fairness, that is closer to the real-world valuation of financial transactions. Intuitively, a penalty protocol is financially fair if the net present cost of participation (the total value of cash inflows less cash outflows, weighted by the relative discount rate) is the same for all honest participants, even when some parties cheat.
We formally define the notion, show several impossibility results based on game theory, and analyze the practical effects of (lack of) financial fairness if one was to run the protocols for real on Bitcoin using Bloomberg's dark pool trading.
For example, we show that the ladder protocol (CRYPTO'14), and its variants (CCS'15 and CCS'16), fail to achieve financial fairness both in theory and in practice, while the penalty protocols of Kumaresan and Bentov (CCS'14) and Baum, David and Dowsley (FC'20) are financially fair.
This version contains formal definitions, detailed security proofs, demos and experimental data in the appendix.
△ Less
Submitted 11 August, 2022; v1 submitted 21 July, 2022;
originally announced July 2022.
-
Towards a Security Stress-Test for Cloud Configurations
Authors:
Francesco Minna,
Fabio Massacci,
Katja Tuma
Abstract:
Securing cloud configurations is an elusive task, which is left up to system administrators who have to base their decisions on ``trial and error'' experimentations or by observing good practices (e.g., CIS Benchmarks). We propose a knowledge, AND/OR, graphs approach to model cloud deployment security objects and vulnerabilities. In this way, we can capture relationships between configurations, pe…
▽ More
Securing cloud configurations is an elusive task, which is left up to system administrators who have to base their decisions on ``trial and error'' experimentations or by observing good practices (e.g., CIS Benchmarks). We propose a knowledge, AND/OR, graphs approach to model cloud deployment security objects and vulnerabilities. In this way, we can capture relationships between configurations, permissions (e.g., CAP\_SYS\_ADMIN), and security profiles (e.g., AppArmor and SecComp), as first-class citizens. Such an approach allows us to suggest alternative and safer configurations, support administrators in the study of what-if scenarios, and scale the analysis to large scale deployments. We present an initial validation and illustrate the approach with three real vulnerabilities from known sources.
△ Less
Submitted 7 June, 2022; v1 submitted 28 May, 2022;
originally announced May 2022.
-
Software Updates Strategies: a Quantitative Evaluation against Advanced Persistent Threats
Authors:
Giorgio Di Tizio,
Michele Armellini,
Fabio Massacci
Abstract:
Software updates reduce the opportunity for exploitation. However, since updates can also introduce breaking changes, enterprises face the problem of balancing the need to secure software with updates with the need to support operations. We propose a methodology to quantitatively investigate the effectiveness of software updates strategies against attacks of Advanced Persistent Threats (APTs). We…
▽ More
Software updates reduce the opportunity for exploitation. However, since updates can also introduce breaking changes, enterprises face the problem of balancing the need to secure software with updates with the need to support operations. We propose a methodology to quantitatively investigate the effectiveness of software updates strategies against attacks of Advanced Persistent Threats (APTs). We consider strategies where the vendor updates are the only limiting factors to cases in which enterprises delay updates from 1 to 7 months based on SANS data. Our manually curated dataset of APT attacks covers 86 APTs and 350 campaigns from 2008 to 2020. It includes information about attack vectors, exploited vulnerabilities (e.g. 0-days vs public vulnerabilities), and affected software and versions. Contrary to common belief, most APT campaigns employed publicly known vulnerabilities. If an enterprise could theoretically update as soon as an update is released, it would face lower odds of being compromised than those waiting one (4.9x) or three (9.1x) months. However, if attacked, it could still be compromised from 14% to 33% of the times. As in practice enterprises must do regression testing before applying an update, our major finding is that one could perform 12% of all possible updates restricting oneself only to versions fixing publicly known vulnerabilities without significant changes to the odds of being compromised compared to a company that updates for all versions.
△ Less
Submitted 25 May, 2022; v1 submitted 16 May, 2022;
originally announced May 2022.
-
Secure Software Development in the Era of Fluid Multi-party Open Software and Services
Authors:
Ivan Pashchenko,
Riccardo Scandariato,
Antonino Sabetta,
Fabio Massacci
Abstract:
Pushed by market forces, software development has become fast-paced. As a consequence, modern development projects are assembled from 3rd-party components. Security & privacy assurance techniques once designed for large, controlled updates over months or years, must now cope with small, continuous changes taking place within a week, and happening in sub-components that are controlled by third-part…
▽ More
Pushed by market forces, software development has become fast-paced. As a consequence, modern development projects are assembled from 3rd-party components. Security & privacy assurance techniques once designed for large, controlled updates over months or years, must now cope with small, continuous changes taking place within a week, and happening in sub-components that are controlled by third-party developers one might not even know they existed. In this paper, we aim to provide an overview of the current software security approaches and evaluate their appropriateness in the face of the changed nature in software development. Software security assurance could benefit by switching from a process-based to an artefact-based approach. Further, security evaluation might need to be more incremental, automated and decentralized. We believe this can be achieved by supporting mechanisms for lightweight and scalable screenings that are applicable to the entire population of software components albeit there might be a price to pay.
△ Less
Submitted 4 March, 2021;
originally announced March 2021.
-
Technical Leverage in a Software Ecosystem: Development Opportunities and Security Risks
Authors:
Fabio Massacci,
Ivan Pashchenko
Abstract:
In finance, leverage is the ratio between assets borrowed from others and one's own assets. A matching situation is present in software: by using free open-source software (FOSS) libraries a developer leverages on other people's code to multiply the offered functionalities with a much smaller own codebase. In finance as in software, leverage magnifies profits when returns from borrowing exceed cos…
▽ More
In finance, leverage is the ratio between assets borrowed from others and one's own assets. A matching situation is present in software: by using free open-source software (FOSS) libraries a developer leverages on other people's code to multiply the offered functionalities with a much smaller own codebase. In finance as in software, leverage magnifies profits when returns from borrowing exceed costs of integration, but it may also magnify losses, in particular in the presence of security vulnerabilities. We aim to understand the level of technical leverage in the FOSS ecosystem and whether it can be a potential source of security vulnerabilities. Also, we introduce two metrics change distance and change direction to capture the amount and the evolution of the dependency on third-party libraries.
The application of the proposed metrics on 8494 distinct library versions from the FOSS Maven-based Java libraries shows that small and medium libraries (less than 100KLoC) have disproportionately more leverage on FOSS dependencies in comparison to large libraries. We show that leverage pays off as leveraged libraries only add a 4% delay in the time interval between library releases while providing four times more code than their own. However, libraries with such leverage (i.e., 75% of libraries in our sample) also have 1.6 higher odds of being vulnerable in comparison to the libraries with lower leverage.
We provide an online demo for computing the proposed metrics for real-world software libraries available under the following URL: https://techleverage.eu/.
△ Less
Submitted 4 March, 2021;
originally announced March 2021.
-
A Convolutional Transformation Network for Malware Classification
Authors:
Duc-Ly Vu,
Trong-Kha Nguyen,
Tam V. Nguyen,
Tu N. Nguyen,
Fabio Massacci,
Phu H. Phung
Abstract:
Modern malware evolves various detection avoidance techniques to bypass the state-of-the-art detection methods. An emerging trend to deal with this issue is the combination of image transformation and machine learning techniques to classify and detect malware. However, existing works in this field only perform simple image transformation methods that limit the accuracy of the detection. In this pa…
▽ More
Modern malware evolves various detection avoidance techniques to bypass the state-of-the-art detection methods. An emerging trend to deal with this issue is the combination of image transformation and machine learning techniques to classify and detect malware. However, existing works in this field only perform simple image transformation methods that limit the accuracy of the detection. In this paper, we introduce a novel approach to classify malware by using a deep network on images transformed from binary samples. In particular, we first develop a novel hybrid image transformation method to convert binaries into color images that convey the binary semantics. The images are trained by a deep convolutional neural network that later classifies the test inputs into benign or malicious categories. Through the extensive experiments, our proposed method surpasses all baselines and achieves 99.14% in terms of accuracy on the testing set.
△ Less
Submitted 16 September, 2019;
originally announced September 2019.
-
Vulnerable Open Source Dependencies: Counting Those That Matter
Authors:
Ivan Pashchenko,
Henrik Plate,
Serena Elisa Ponta,
Antonino Sabetta,
Fabio Massacci
Abstract:
BACKGROUND: Vulnerable dependencies are a known problem in today's open-source software ecosystems because OSS libraries are highly interconnected and developers do not always update their dependencies. AIMS: In this paper we aim to present a precise methodology, that combines the code-based analysis of patches with information on build, test, update dates, and group extracted from the very code r…
▽ More
BACKGROUND: Vulnerable dependencies are a known problem in today's open-source software ecosystems because OSS libraries are highly interconnected and developers do not always update their dependencies. AIMS: In this paper we aim to present a precise methodology, that combines the code-based analysis of patches with information on build, test, update dates, and group extracted from the very code repository, and therefore, caters to the needs of industrial practice for correct allocation of development and audit resources. METHOD: To understand the industrial impact of the proposed methodology, we considered the 200 most popular OSS Java libraries used by SAP in its own software. Our analysis included 10905 distinct GAVs (group, artifact, version) when considering all the library versions. RESULTS: We found that about 20% of the dependencies affected by a known vulnerability are not deployed, and therefore, they do not represent a danger to the analyzed library because they cannot be exploited in practice. Developers of the analyzed libraries are able to fix (and actually responsible for) 82% of the deployed vulnerable dependencies. The vast majority (81%) of vulnerable dependencies may be fixed by simply updating to a new version, while 1% of the vulnerable dependencies in our sample are halted, and therefore, potentially require a costly mitigation strategy. CONCLUSIONS: Our case study shows that the correct counting allows software development companies to receive actionable information about their library dependencies, and therefore, correctly allocate costly development and audit resources, which is spent inefficiently in case of distorted measurements.
△ Less
Submitted 29 August, 2018;
originally announced August 2018.
-
The Effect of Security Education and Expertise on Security Assessments: the Case of Software Vulnerabilities
Authors:
Luca Allodi,
Marco Cremonini,
Fabio Massacci,
Woohyun Shim
Abstract:
In spite of the growing importance of software security and the industry demand for more cyber security expertise in the workforce, the effect of security education and experience on the ability to assess complex software security problems has only been recently investigated. As proxy for the full range of software security skills, we considered the problem of assessing the severity of software vu…
▽ More
In spite of the growing importance of software security and the industry demand for more cyber security expertise in the workforce, the effect of security education and experience on the ability to assess complex software security problems has only been recently investigated. As proxy for the full range of software security skills, we considered the problem of assessing the severity of software vulnerabilities by means of a structured analysis methodology widely used in industry (i.e. the Common Vulnerability Scoring System (\CVSS) v3), and designed a study to compare how accurately individuals with background in information technology but different professional experience and education in cyber security are able to assess the severity of software vulnerabilities. Our results provide some structural insights into the complex relationship between education or experience of assessors and the quality of their assessments. In particular we find that individual characteristics matter more than professional experience or formal education; apparently it is the \emph{combination} of skills that one owns (including the actual knowledge of the system under study), rather than the specialization or the years of experience, to influence more the assessment quality. Similarly, we find that the overall advantage given by professional expertise significantly depends on the composition of the individual security skills as well as on the available information.
△ Less
Submitted 20 August, 2018;
originally announced August 2018.
-
Attack Potential in Impact and Complexity
Authors:
Luca Allodi,
Fabio Massacci
Abstract:
Vulnerability exploitation is reportedly one of the main attack vectors against computer systems. Yet, most vulnerabilities remain unexploited by attackers. It is therefore of central importance to identify vulnerabilities that carry a high `potential for attack'. In this paper we rely on Symantec data on real attacks detected in the wild to identify a trade-off in the Impact and Complexity of a v…
▽ More
Vulnerability exploitation is reportedly one of the main attack vectors against computer systems. Yet, most vulnerabilities remain unexploited by attackers. It is therefore of central importance to identify vulnerabilities that carry a high `potential for attack'. In this paper we rely on Symantec data on real attacks detected in the wild to identify a trade-off in the Impact and Complexity of a vulnerability, in terms of attacks that it generates; exploiting this effect, we devise a readily computable estimator of the vulnerability's Attack Potential that reliably estimates the expected volume of attacks against the vulnerability. We evaluate our estimator performance against standard patching policies by measuring foiled attacks and demanded workload expressed as the number of vulnerabilities entailed to patch. We show that our estimator significantly improves over standard patching policies by ruling out low-risk vulnerabilities, while maintaining invariant levels of coverage against attacks in the wild. Our estimator can be used as a first aid for vulnerability prioritisation to focus assessment efforts on high-potential vulnerabilities.
△ Less
Submitted 15 January, 2018;
originally announced January 2018.
-
TestREx: a Framework for Repeatable Exploits
Authors:
Stanislav Dashevskyi,
Daniel Ricardo dos Santos,
Fabio Massacci,
Antonino Sabetta
Abstract:
Web applications are the target of many well known exploits and also a fertile ground for the discovery of security vulnerabilities. Yet, the success of an exploit depends both on the vulnerability in the application source code and the environment in which the application is deployed and run. As execution environments are complex (application servers, databases and other supporting applications),…
▽ More
Web applications are the target of many well known exploits and also a fertile ground for the discovery of security vulnerabilities. Yet, the success of an exploit depends both on the vulnerability in the application source code and the environment in which the application is deployed and run. As execution environments are complex (application servers, databases and other supporting applications), we need to have a reliable framework to test whether known exploits can be reproduced in different settings, better understand their effects, and facilitate the discovery of new vulnerabilities. In this paper, we present TestREx - a framework that allows for highly automated, easily repeatable exploit testing in a variety of contexts, so that a security tester may quickly and efficiently perform large-scale experiments with vulnerability exploits. It supports packing and running applications with their environments, injecting exploits, monitoring their success, and generating security reports. We also provide a corpus of example applications, taken from related works or implemented by us.
△ Less
Submitted 10 September, 2017;
originally announced September 2017.
-
A Systematically Empirical Evaluation of Vulnerability Discovery Models: a Study on Browsers' Vulnerabilities
Authors:
Viet Hung Nguyen,
Fabio Massacci
Abstract:
A precise vulnerability discovery model (VDM) will provide a useful insight to assess software security, and could be a good prediction instrument for both software vendors and users to understand security trends and plan ahead patching schedule accordingly. Thus far, several models have been proposed and validated. Yet, no systematically independent validation by somebody other than the author ex…
▽ More
A precise vulnerability discovery model (VDM) will provide a useful insight to assess software security, and could be a good prediction instrument for both software vendors and users to understand security trends and plan ahead patching schedule accordingly. Thus far, several models have been proposed and validated. Yet, no systematically independent validation by somebody other than the author exists. Furthermore, there are a number of issues that might bias previous studies in the field. In this work, we fill in the gap by introducing an empirical methodology that systematically evaluates the performance of a VDM in two aspects: quality and predictability. We further apply this methodology to assess existing VDMs. The results show that some models should be rejected outright, while some others might be adequate to capture the discovery process of vulnerabilities. We also consider different usage scenarios of VDMs and find that the simplest linear model is the most appropriate choice in terms of both quality and predictability when browsers are young. Otherwise, logistics-based models are better choices.
△ Less
Submitted 11 June, 2013;
originally announced June 2013.
-
MAP-REDUCE Runtime Enforcement of Information Flow Policies
Authors:
Minh Ngo,
Fabio Massacci,
Olga Gadyatskaya
Abstract:
We propose a flexible framework that can be easily customized to enforce a large variety of information flow properties. Our framework combines the ideas of secure multi-execution and map-reduce computations. The information flow property of choice can be obtained by simply changes to a map (or reduce) program that control parallel executions.
We present the architecture of the enforcement mecha…
▽ More
We propose a flexible framework that can be easily customized to enforce a large variety of information flow properties. Our framework combines the ideas of secure multi-execution and map-reduce computations. The information flow property of choice can be obtained by simply changes to a map (or reduce) program that control parallel executions.
We present the architecture of the enforcement mechanism and its customizations for non-interference (NI) (from Devriese and Piessens) and some properties proposed by Mantel, such as removal of inputs (RI) and deletion of inputs (DI), and demonstrate formally soundness and precision of enforcement for these properties.
△ Less
Submitted 9 May, 2013;
originally announced May 2013.
-
The (Un)Reliability of NVD Vulnerable Versions Data: an Empirical Experiment on Google Chrome Vulnerabilities
Authors:
Viet Hung Nguyen,
Fabio Massacci
Abstract:
NVD is one of the most popular databases used by researchers to conduct empirical research on data sets of vulnerabilities. Our recent analysis on Chrome vulnerability data reported by NVD has revealed an abnormally phenomenon in the data where almost vulnerabilities were originated from the first versions. This inspires our experiment to validate the reliability of the NVD vulnerable version data…
▽ More
NVD is one of the most popular databases used by researchers to conduct empirical research on data sets of vulnerabilities. Our recent analysis on Chrome vulnerability data reported by NVD has revealed an abnormally phenomenon in the data where almost vulnerabilities were originated from the first versions. This inspires our experiment to validate the reliability of the NVD vulnerable version data. In this experiment, we verify for each version of Chrome that NVD claims vulnerable is actually vulnerable. The experiment revealed several errors in the vulnerability data of Chrome. Furthermore, we have also analyzed how these errors might impact the conclusions of an empirical study on foundational vulnerability. Our results show that different conclusions could be obtained due to the data errors.
△ Less
Submitted 17 February, 2013;
originally announced February 2013.
-
My Software has a Vulnerability, should I worry?
Authors:
Luca Allodi,
Fabio Massacci
Abstract:
(U.S) Rule-based policies to mitigate software risk suggest to use the CVSS score to measure the individual vulnerability risk and act accordingly: an HIGH CVSS score according to the NVD (National (U.S.) Vulnerability Database) is therefore translated into a "Yes". A key issue is whether such rule is economically sensible, in particular if reported vulnerabilities have been actually exploited in…
▽ More
(U.S) Rule-based policies to mitigate software risk suggest to use the CVSS score to measure the individual vulnerability risk and act accordingly: an HIGH CVSS score according to the NVD (National (U.S.) Vulnerability Database) is therefore translated into a "Yes". A key issue is whether such rule is economically sensible, in particular if reported vulnerabilities have been actually exploited in the wild, and whether the risk score do actually match the risk of actual exploitation.
We compare the NVD dataset with two additional datasets, the EDB for the white market of vulnerabilities (such as those present in Metasploit), and the EKITS for the exploits traded in the black market. We benchmark them against Symantec's threat explorer dataset (SYM) of actual exploit in the wild. We analyze the whole spectrum of CVSS submetrics and use these characteristics to perform a case-controlled analysis of CVSS scores (similar to those used to link lung cancer and smoking) to test its reliability as a risk factor for actual exploitation.
We conclude that (a) fixing just because a high CVSS score in NVD only yields negligible risk reduction, (b) the additional existence of proof of concepts exploits (e.g. in EDB) may yield some additional but not large risk reduction, (c) fixing in response to presence in black markets yields the equivalent risk reduction of wearing safety belt in cars (you might also die but still..). On the negative side, our study shows that as industry we miss a metric with high specificity (ruling out vulns for which we shouldn't worry).
In order to address the feedback from BlackHat 2013's audience, the final revision (V3) provides additional data in Appendix A detailing how the control variables in the study affect the results.
△ Less
Submitted 24 September, 2013; v1 submitted 7 January, 2013;
originally announced January 2013.
-
An Independent Validation of Vulnerability Discovery Models
Authors:
Viet Hung Nguyen,
Fabio Massacci
Abstract:
Having a precise vulnerability discovery model (VDM) would provide a useful quantitative insight to assess software security. Thus far, several models have been proposed with some evidence supporting their goodness-of-fit.
In this work we describe an independent validation of the applicability of six existing VDMs in seventeen releases of the three popular browsers Firefox, Google Chrome and Int…
▽ More
Having a precise vulnerability discovery model (VDM) would provide a useful quantitative insight to assess software security. Thus far, several models have been proposed with some evidence supporting their goodness-of-fit.
In this work we describe an independent validation of the applicability of six existing VDMs in seventeen releases of the three popular browsers Firefox, Google Chrome and Internet Explorer. We have collected five different kinds of data sets based on different definitions of a vulnerability. We introduce two quantitative metrics, goodness-of-fit entropy and goodness-of-fit quality, to analyze the impact of vulnerability data sets to the stability as well as quality of VDMs in the software life cycles.
The experiment result shows that the "confirmed-by-vendors' advisories" data sets apparently yields more stable and better results for VDMs. And the performance of the s-shape logistic model (AML) seems to be superior performance in overall. Meanwhile, Anderson thermodynamic model (AT) is indeed not suitable for modeling the vulnerability discovery process. This means that the discovery process of vulnerabilities and normal bugs are different because the interests of people in finding security vulnerabilities are more than finding normal programming bugs.
△ Less
Submitted 26 March, 2012;
originally announced March 2012.
-
DES: a Challenge Problem for Nonmonotonic Reasoning Systems
Authors:
Maarit Hietalahti,
Fabio Massacci,
Ilkka Niemela
Abstract:
The US Data Encryption Standard, DES for short, is put forward as an interesting benchmark problem for nonmonotonic reasoning systems because (i) it provides a set of test cases of industrial relevance which shares features of randomly generated problems and real-world problems, (ii) the representation of DES using normal logic programs with the stable model semantics is simple and easy to under…
▽ More
The US Data Encryption Standard, DES for short, is put forward as an interesting benchmark problem for nonmonotonic reasoning systems because (i) it provides a set of test cases of industrial relevance which shares features of randomly generated problems and real-world problems, (ii) the representation of DES using normal logic programs with the stable model semantics is simple and easy to understand, and (iii) this subclass of logic programs can be seen as an interesting special case for many other formalizations of nonmonotonic reasoning. In this paper we present two encodings of DES as logic programs: a direct one out of the standard specifications and an optimized one extending the work of Massacci and Marraro. The computational properties of the encodings are studied by using them for DES key search with the Smodels system as the implementation of the stable model semantics. Results indicate that the encodings and Smodels are quite competitive: they outperform state-of-the-art SAT-checkers working with an optimized encoding of DES into SAT and are comparable with a SAT-checker that is customized and tuned for the optimized SAT encoding.
△ Less
Submitted 8 March, 2000;
originally announced March 2000.