Search | arXiv e-print repository

Journey to the Center of Software Supply Chain Attacks

Authors: Piergiorgio Ladisa, Serena Elisa Ponta, Antonino Sabetta, Matias Martinez, Olivier Barais

Abstract: This work discusses open-source software supply chain attacks and proposes a general taxonomy describing how attackers conduct them. We then provide a list of safeguards to mitigate such attacks. We present our tool "Risk Explorer for Software Supply Chains" to explore such information and we discuss its industrial use-cases. This work discusses open-source software supply chain attacks and proposes a general taxonomy describing how attackers conduct them. We then provide a list of safeguards to mitigate such attacks. We present our tool "Risk Explorer for Software Supply Chains" to explore such information and we discuss its industrial use-cases. △ Less

Submitted 11 April, 2023; originally announced April 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2204.04008

arXiv:2108.05115 [pdf, ps, other]

The Used, the Bloated, and the Vulnerable: Reducing the Attack Surface of an Industrial Application

Authors: Serena Elisa Ponta, Wolfram Fischer, Henrik Plate, Antonino Sabetta

Abstract: Software reuse may result in software bloat when significant portions of application dependencies are effectively unused. Several tools exist to remove unused (byte)code from an application or its dependencies, thus producing smaller artifacts and, potentially, reducing the overall attack surface. In this paper we evaluate the ability of three debloating tools to distinguish which dependency class… ▽ More Software reuse may result in software bloat when significant portions of application dependencies are effectively unused. Several tools exist to remove unused (byte)code from an application or its dependencies, thus producing smaller artifacts and, potentially, reducing the overall attack surface. In this paper we evaluate the ability of three debloating tools to distinguish which dependency classes are necessary for an application to function correctly from those that could be safely removed. To do so, we conduct a case study on a real-world commercial Java application. Our study shows that the tools we used were able to correctly identify a considerable amount of redundant code, which could be removed without altering the results of the existing application tests. One of the redundant classes turned out to be (formerly) vulnerable, confirming that this technique has the potential to be applied for hardening purposes. However, by manually reviewing the results of our experiments, we observed that none of the tools can handle a widely used default mechanism for dynamic class loading. △ Less

Submitted 11 August, 2021; originally announced August 2021.

arXiv:2105.03346 [pdf, other]

Detecting Security Fixes in Open-Source Repositories using Static Code Analyzers

Authors: Therese Fehrer, Rocío Cabrera Lozoya, Antonino Sabetta, Dario Di Nucci, Damian A. Tamburri

Abstract: The sources of reliable, code-level information about vulnerabilities that affect open-source software (OSS) are scarce, which hinders a broad adoption of advanced tools that provide code-level detection and assessment of vulnerable OSS dependencies. In this paper, we study the extent to which the output of off-the-shelf static code analyzers can be used as a source of features to represent comm… ▽ More The sources of reliable, code-level information about vulnerabilities that affect open-source software (OSS) are scarce, which hinders a broad adoption of advanced tools that provide code-level detection and assessment of vulnerable OSS dependencies. In this paper, we study the extent to which the output of off-the-shelf static code analyzers can be used as a source of features to represent commits in Machine Learning (ML) applications. In particular, we investigate how such features can be used to construct embeddings and train ML models to automatically identify source code commits that contain vulnerability fixes. We analyze such embeddings for security-relevant and non-security-relevant commits, and we show that, although in isolation they are not different in a statistically significant manner, it is possible to use them to construct a ML pipeline that achieves results comparable with the state of the art. We also found that the combination of our method with commit2vec represents a tangible improvement over the state of the art in the automatic identification of commits that fix vulnerabilities: the ML models we construct and commit2vec are complementary, the former being more generally applicable, albeit not as accurate. △ Less

Submitted 7 May, 2021; originally announced May 2021.

Comments: Submitted to ESEC/FSE 2021, Industry Track

arXiv:2103.13375 [pdf, other]

Automated Map** of Vulnerability Advisories onto their Fix Commits in Open Source Repositories

Authors: Daan Hommersom, Antonino Sabetta, Bonaventura Coppola, Dario Di Nucci, Damian A. Tamburri

Abstract: The lack of comprehensive sources of accurate vulnerability data represents a critical obstacle to studying and understanding software vulnerabilities (and their corrections). In this paper, we present an approach that combines heuristics stemming from practical experience and machine-learning (ML) - specifically, natural language processing (NLP) - to address this problem. Our method consists of… ▽ More The lack of comprehensive sources of accurate vulnerability data represents a critical obstacle to studying and understanding software vulnerabilities (and their corrections). In this paper, we present an approach that combines heuristics stemming from practical experience and machine-learning (ML) - specifically, natural language processing (NLP) - to address this problem. Our method consists of three phases. First, an advisory record containing key information about a vulnerability is extracted from an advisory (expressed in natural language). Second, using heuristics, a subset of candidate fix commits is obtained from the source code repository of the affected project by filtering out commits that are known to be irrelevant for the task at hand. Finally, for each such candidate commit, our method builds a numerical feature vector reflecting the characteristics of the commit that are relevant to predicting its match with the advisory at hand. The feature vectors are then exploited for building a final ranked list of candidate fixing commits. The score attributed by the ML model to each feature is kept visible to the users, allowing them to interpret the predictions. We evaluated our approach using a prototype implementation named FixFinder on a manually curated data set that comprises 2,391 known fix commits corresponding to 1,248 public vulnerability advisories. When considering the top-10 commits in the ranked results, our implementation could successfully identify at least one fix commit for up to 84.03% of the vulnerabilities (with a fix commit on the first position for 65.06% of the vulnerabilities). In conclusion, our method reduces considerably the effort needed to search OSS repositories for the commits that fix known vulnerabilities. △ Less

Submitted 10 May, 2023; v1 submitted 24 March, 2021; originally announced March 2021.

arXiv:2103.03331 [pdf, other]

Secure Software Development in the Era of Fluid Multi-party Open Software and Services

Authors: Ivan Pashchenko, Riccardo Scandariato, Antonino Sabetta, Fabio Massacci

Abstract: Pushed by market forces, software development has become fast-paced. As a consequence, modern development projects are assembled from 3rd-party components. Security & privacy assurance techniques once designed for large, controlled updates over months or years, must now cope with small, continuous changes taking place within a week, and happening in sub-components that are controlled by third-part… ▽ More Pushed by market forces, software development has become fast-paced. As a consequence, modern development projects are assembled from 3rd-party components. Security & privacy assurance techniques once designed for large, controlled updates over months or years, must now cope with small, continuous changes taking place within a week, and happening in sub-components that are controlled by third-party developers one might not even know they existed. In this paper, we aim to provide an overview of the current software security approaches and evaluate their appropriateness in the face of the changed nature in software development. Software security assurance could benefit by switching from a process-based to an artefact-based approach. Further, security evaluation might need to be more incremental, automated and decentralized. We believe this can be achieved by supporting mechanisms for lightweight and scalable screenings that are applicable to the entire population of software components albeit there might be a price to pay. △ Less

Submitted 4 March, 2021; originally announced March 2021.

Comments: 7 pages, 1 figure, to be published in Proceedings of International Conference on Software Engineering - New Ideas and Emerging Results

ACM Class: D.2.0; D.2.13

arXiv:2008.04568 [pdf, other]

Code-based Vulnerability Detection in Node.js Applications: How far are we?

Authors: Bodin Chinthanet, Serena Elisa Ponta, Henrik Plate, Antonino Sabetta, Raula Gaikovina Kula, Takashi Ishio, Kenichi Matsumoto

Abstract: With one of the largest available collection of reusable packages, the JavaScript runtime environment Node.js is one of the most popular programming application. With recent work showing evidence that known vulnerabilities are prevalent in both open source and industrial software, we propose and implement a viable code-based vulnerability detection tool for Node.js applications. Our case study lis… ▽ More With one of the largest available collection of reusable packages, the JavaScript runtime environment Node.js is one of the most popular programming application. With recent work showing evidence that known vulnerabilities are prevalent in both open source and industrial software, we propose and implement a viable code-based vulnerability detection tool for Node.js applications. Our case study lists the challenges encountered while implementing our Node.js vulnerable code detector. △ Less

Submitted 11 August, 2020; originally announced August 2020.

arXiv:1911.07620 [pdf, other]

Exploiting Token and Path-based Representations of Code for Identifying Security-Relevant Commits

Authors: Achyudh Ram, Ji Xin, Meiyappan Nagappan, Yaoliang Yu, Rocío Cabrera Lozoya, Antonino Sabetta, Jimmy Lin

Abstract: Public vulnerability databases such as CVE and NVD account for only 60% of security vulnerabilities present in open-source projects, and are known to suffer from inconsistent quality. Over the last two years, there has been considerable growth in the number of known vulnerabilities across projects available in various repositories such as NPM and Maven Central. Such an increasing risk calls for a… ▽ More Public vulnerability databases such as CVE and NVD account for only 60% of security vulnerabilities present in open-source projects, and are known to suffer from inconsistent quality. Over the last two years, there has been considerable growth in the number of known vulnerabilities across projects available in various repositories such as NPM and Maven Central. Such an increasing risk calls for a mechanism to infer the presence of security threats in a timely manner. We propose novel hierarchical deep learning models for the identification of security-relevant commits from either the commit diff or the source code for the Java classes. By comparing the performance of our model against code2vec, a state-of-the-art model that learns from path-based representations of code, and a logistic regression baseline, we show that deep learning models show promising results in identifying security-related commits. We also conduct a comparative analysis of how various deep learning models learn across different input representations and the effect of regularization on the generalization of our models. △ Less

Submitted 14 November, 2019; originally announced November 2019.

arXiv:1911.07605 [pdf, other]

doi 10.1007/s42979-021-00566-z

Commit2Vec: Learning Distributed Representations of Code Changes

Authors: Rocìo Cabrera Lozoya, Arnaud Baumann, Antonino Sabetta, Michele Bezzi

Abstract: Deep learning methods, which have found successful applications in fields like image classification and natural language processing, have recently been applied to source code analysis too, due to the enormous amount of freely available source code (e.g., from open-source software repositories). In this work, we elaborate upon a state-of-the-art approach to the representation of source code that… ▽ More Deep learning methods, which have found successful applications in fields like image classification and natural language processing, have recently been applied to source code analysis too, due to the enormous amount of freely available source code (e.g., from open-source software repositories). In this work, we elaborate upon a state-of-the-art approach to the representation of source code that uses information about its syntactic structure, and we adapt it to represent source changes (i.e., commits). We use this representation to classify security-relevant commits. Because our method uses transfer learning (that is, we train a network on a "pretext task" for which abundant labeled data is available, and then we use such network for the target task of commit classification, for which fewer labeled instances are available), we studied the impact of pre-training the network using two different pretext tasks versus a randomly initialized model. Our results indicate that representations that leverage the structural information obtained through code syntax outperform token-based representations. Furthermore, the performance metrics obtained when pre-training on a loosely related pretext task with a very large dataset ($>10^6$ samples) were surpassed when pretraining on a smaller dataset ($>10^4$ samples) but for a pretext task that is more closely related to the target task. △ Less

Submitted 17 November, 2021; v1 submitted 18 November, 2019; originally announced November 2019.

Comments: A previous version of this paper had the following title: "patch2vec: Distributed Representation of Code Changes"; we updated the title to distinguish it from another pre-existing approach with the same name. An improved version of this work appeared in Springer Nature Computer Science, 2021 (https://doi.org/10.1007/s42979-021-00566-z)

Journal ref: SN Computer Science volume 2, Article number: 150 (2021)

arXiv:1902.02595 [pdf, other]

A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software

Authors: Serena E. Ponta, Henrik Plate, Antonino Sabetta, Michele Bezzi, Cédric Dangremont

Abstract: Advancing our understanding of software vulnerabilities, automating their identification, the analysis of their impact, and ultimately their mitigation is necessary to enable the development of software that is more secure. While operating a vulnerability assessment tool that we developed and that is currently used by hundreds of development units at SAP, we manually collected and curated a datase… ▽ More Advancing our understanding of software vulnerabilities, automating their identification, the analysis of their impact, and ultimately their mitigation is necessary to enable the development of software that is more secure. While operating a vulnerability assessment tool that we developed and that is currently used by hundreds of development units at SAP, we manually collected and curated a dataset of vulnerabilities of open-source software and the commits fixing them. The data was obtained both from the National Vulnerability Database (NVD) and from project-specific Web resources that we monitor on a continuous basis. From that data, we extracted a dataset that maps 624 publicly disclosed vulnerabilities affecting 205 distinct open-source Java projects, used in SAP products or internal tools, onto the 1282 commits that fix them. Out of 624 vulnerabilities, 29 do not have a CVE identifier at all and 46, which do have a CVE identifier assigned by a numbering authority, are not available in the NVD yet. The dataset is released under an open-source license, together with supporting scripts that allow researchers to automatically retrieve the actual content of the commits from the corresponding repositories and to augment the attributes available for each instance. Also, these scripts allow to complement the dataset with additional instances that are not security fixes (which is useful, for example, in machine learning applications). Our dataset has been successfully used to train classifiers that could automatically identify security-relevant commits in code repositories. The release of this dataset and the supporting code as open-source will allow future research to be based on data of industrial relevance; also, it represents a concrete step towards making the maintenance of this dataset a shared effort involving open-source communities, academia, and the industry. △ Less

Submitted 19 March, 2019; v1 submitted 7 February, 2019; originally announced February 2019.

Comments: This is a pre-print version of the paper that appears in the proceedings of The 16th International Conference on Mining Software Repositories (MSR), Data Showcase track

Journal ref: Proceedings of The 16th International Conference on Mining Software Repositories (Data Showcase track), 2019

arXiv:1808.09753 [pdf, other]

Vulnerable Open Source Dependencies: Counting Those That Matter

Authors: Ivan Pashchenko, Henrik Plate, Serena Elisa Ponta, Antonino Sabetta, Fabio Massacci

Abstract: BACKGROUND: Vulnerable dependencies are a known problem in today's open-source software ecosystems because OSS libraries are highly interconnected and developers do not always update their dependencies. AIMS: In this paper we aim to present a precise methodology, that combines the code-based analysis of patches with information on build, test, update dates, and group extracted from the very code r… ▽ More BACKGROUND: Vulnerable dependencies are a known problem in today's open-source software ecosystems because OSS libraries are highly interconnected and developers do not always update their dependencies. AIMS: In this paper we aim to present a precise methodology, that combines the code-based analysis of patches with information on build, test, update dates, and group extracted from the very code repository, and therefore, caters to the needs of industrial practice for correct allocation of development and audit resources. METHOD: To understand the industrial impact of the proposed methodology, we considered the 200 most popular OSS Java libraries used by SAP in its own software. Our analysis included 10905 distinct GAVs (group, artifact, version) when considering all the library versions. RESULTS: We found that about 20% of the dependencies affected by a known vulnerability are not deployed, and therefore, they do not represent a danger to the analyzed library because they cannot be exploited in practice. Developers of the analyzed libraries are able to fix (and actually responsible for) 82% of the deployed vulnerable dependencies. The vast majority (81%) of vulnerable dependencies may be fixed by simply updating to a new version, while 1% of the vulnerable dependencies in our sample are halted, and therefore, potentially require a costly mitigation strategy. CONCLUSIONS: Our case study shows that the correct counting allows software development companies to receive actionable information about their library dependencies, and therefore, correctly allocate costly development and audit resources, which is spent inefficiently in case of distorted measurements. △ Less

Submitted 29 August, 2018; originally announced August 2018.

Comments: This is a pre-print of the paper that appears, with the same title, in the proceedings of the 12th International Symposium on Empirical Software Engineering and Measurement, 2018

arXiv:1807.02458 [pdf, other]

A Practical Approach to the Automatic Classification of Security-Relevant Commits

Authors: Antonino Sabetta, Michele Bezzi

Abstract: The lack of reliable sources of detailed information on the vulnerabilities of open-source software (OSS) components is a major obstacle to maintaining a secure software supply chain and an effective vulnerability management process. Standard sources of advisories and vulnerability data, such as the National Vulnerability Database (NVD), are known to suffer from poor coverage and inconsistent qual… ▽ More The lack of reliable sources of detailed information on the vulnerabilities of open-source software (OSS) components is a major obstacle to maintaining a secure software supply chain and an effective vulnerability management process. Standard sources of advisories and vulnerability data, such as the National Vulnerability Database (NVD), are known to suffer from poor coverage and inconsistent quality. To reduce our dependency on these sources, we propose an approach that uses machine-learning to analyze source code repositories and to automatically identify commits that are security-relevant (i.e., that are likely to fix a vulnerability). We treat the source code changes introduced by commits as documents written in natural language, classifying them using standard document classification methods. Combining independent classifiers that use information from different facets of commits, our method can yield high precision (80%) while ensuring acceptable recall (43%). In particular, the use of information extracted from the source code changes yields a substantial improvement over the best known approach in state of the art, while requiring a significantly smaller amount of training data and employing a simpler architecture. △ Less

Submitted 6 July, 2018; originally announced July 2018.

Comments: Published in the Proc. of the 34th IEEE International Conference on Software Maintenance and Evolution (ICSME) 2018

arXiv:1806.05893 [pdf, other]

Beyond Metadata: Code-centric and Usage-based Analysis of Known Vulnerabilities in Open-source Software

Authors: Serena E. Ponta, Henrik Plate, Antonino Sabetta

Abstract: The use of open-source software (OSS) is ever-increasing, and so is the number of open-source vulnerabilities being discovered and publicly disclosed. The gains obtained from the reuse of community-developed libraries may be offset by the cost of detecting, assessing, and mitigating their vulnerabilities in a timely fashion. In this paper we present a novel method to detect, assess and mitigate… ▽ More The use of open-source software (OSS) is ever-increasing, and so is the number of open-source vulnerabilities being discovered and publicly disclosed. The gains obtained from the reuse of community-developed libraries may be offset by the cost of detecting, assessing, and mitigating their vulnerabilities in a timely fashion. In this paper we present a novel method to detect, assess and mitigate OSS vulnerabilities that improves on state-of-the-art approaches, which commonly depend on metadata to identify vulnerable OSS dependencies. Our solution instead is code-centric and combines static and dynamic analysis to determine the reachability of the vulnerable portion of libraries used (directly or transitively) by an application. Taking this usage into account, our approach then supports developers in choosing among the existing non-vulnerable library versions. VULAS, the tool implementing our code-centric and usage-based approach, is officially recommended by SAP to scan its Java software, and has been successfully used to perform more than 250000 scans of about 500 applications since December 2016. We report on our experience and on the lessons we learned when maturing the tool from a research prototype to an industrial-grade solution. △ Less

Submitted 12 July, 2018; v1 submitted 15 June, 2018; originally announced June 2018.

Comments: To appear in the Proc. of the 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME) Added: - acknowledgements - citation to Dashevskyi et al. (TSE 2018), DOI: 10.1109/TSE.2018.2816033

arXiv:1709.03084 [pdf, other]

doi 10.1007/s10009-017-0474-1

TestREx: a Framework for Repeatable Exploits

Authors: Stanislav Dashevskyi, Daniel Ricardo dos Santos, Fabio Massacci, Antonino Sabetta

Abstract: Web applications are the target of many well known exploits and also a fertile ground for the discovery of security vulnerabilities. Yet, the success of an exploit depends both on the vulnerability in the application source code and the environment in which the application is deployed and run. As execution environments are complex (application servers, databases and other supporting applications),… ▽ More Web applications are the target of many well known exploits and also a fertile ground for the discovery of security vulnerabilities. Yet, the success of an exploit depends both on the vulnerability in the application source code and the environment in which the application is deployed and run. As execution environments are complex (application servers, databases and other supporting applications), we need to have a reliable framework to test whether known exploits can be reproduced in different settings, better understand their effects, and facilitate the discovery of new vulnerabilities. In this paper, we present TestREx - a framework that allows for highly automated, easily repeatable exploit testing in a variety of contexts, so that a security tester may quickly and efficiently perform large-scale experiments with vulnerability exploits. It supports packing and running applications with their environments, injecting exploits, monitoring their success, and generating security reports. We also provide a corpus of example applications, taken from related works or implemented by us. △ Less

Submitted 10 September, 2017; originally announced September 2017.

Journal ref: Int. J. Software Tools for Technology Transfer, 2017

arXiv:1504.04971 [pdf, other]

Impact assessment for vulnerabilities in open-source software libraries

Authors: Henrik Plate, Serena Elisa Ponta, Antonino Sabetta

Abstract: Software applications integrate more and more open-source software (OSS) to benefit from code reuse. As a drawback, each vulnerability discovered in bundled OSS potentially affects the application. Upon the disclosure of every new vulnerability, the application vendor has to decide whether it is exploitable in his particular usage context, hence, whether users require an urgent application patch c… ▽ More Software applications integrate more and more open-source software (OSS) to benefit from code reuse. As a drawback, each vulnerability discovered in bundled OSS potentially affects the application. Upon the disclosure of every new vulnerability, the application vendor has to decide whether it is exploitable in his particular usage context, hence, whether users require an urgent application patch containing a non-vulnerable version of the OSS. Current decision making is mostly based on high-level vulnerability descriptions and expert knowledge, thus, effort intense and error prone. This paper proposes a pragmatic approach to facilitate the impact assessment, describes a proof-of-concept for Java, and examines one example vulnerability as case study. The approach is independent from specific kinds of vulnerabilities or programming languages and can deliver immediate results. △ Less

Submitted 21 April, 2015; v1 submitted 20 April, 2015; originally announced April 2015.

arXiv:1307.6980 [pdf, other]

doi 10.1007/978-3-642-41030-7_31

Machine-Readable Privacy Certificates for Services

Authors: Marco Anisetti, Claudio A. Ardagna, Michele Bezzi, Ernesto Damiani, Antonino Sabetta

Abstract: Privacy-aware processing of personal data on the web of services requires managing a number of issues arising both from the technical and the legal domain. Several approaches have been proposed to matching privacy requirements (on the clients side) and privacy guarantees (on the service provider side). Still, the assurance of effective data protection (when possible) relies on substantial human ef… ▽ More Privacy-aware processing of personal data on the web of services requires managing a number of issues arising both from the technical and the legal domain. Several approaches have been proposed to matching privacy requirements (on the clients side) and privacy guarantees (on the service provider side). Still, the assurance of effective data protection (when possible) relies on substantial human effort and exposes organizations to significant (non-)compliance risks. In this paper we put forward the idea that a privacy certification scheme producing and managing machine-readable artifacts in the form of privacy certificates can play an important role towards the solution of this problem. Digital privacy certificates represent the reasons why a privacy property holds for a service and describe the privacy measures supporting it. Also, privacy certificates can be used to automatically select services whose certificates match the client policies (privacy requirements). Our proposal relies on an evolution of the conceptual model developed in the Assert4Soa project and on a certificate format specifically tailored to represent privacy properties. To validate our approach, we present a worked-out instance showing how privacy property Retention-based unlinkability can be certified for a banking financial service. △ Less

Submitted 26 July, 2013; originally announced July 2013.

Comments: 20 pages, 6 figures

Showing 1–15 of 15 results for author: Sabetta, A