Search | arXiv e-print repository

doi 10.1145/3627106.3627138

On the Feasibility of Cross-Language Detection of Malicious Packages in npm and PyPI

Authors: Piergiorgio Ladisa, Serena Elisa Ponta, Nicola Ronzoni, Matias Martinez, Olivier Barais

Abstract: Current software supply chains heavily rely on open-source packages hosted in public repositories. Given the popularity of ecosystems like npm and PyPI, malicious users started to spread malware by publishing open-source packages containing malicious code. Recent works apply machine learning techniques to detect malicious packages in the npm ecosystem. However, the scarcity of samples poses a chal… ▽ More Current software supply chains heavily rely on open-source packages hosted in public repositories. Given the popularity of ecosystems like npm and PyPI, malicious users started to spread malware by publishing open-source packages containing malicious code. Recent works apply machine learning techniques to detect malicious packages in the npm ecosystem. However, the scarcity of samples poses a challenge to the application of machine learning techniques in other ecosystems. Despite the differences between JavaScript and Python, the open-source software supply chain attacks targeting such languages show noticeable similarities (e.g., use of installation scripts, obfuscated strings, URLs). In this paper, we present a novel approach that involves a set of language-independent features and the training of models capable of detecting malicious packages in npm and PyPI by capturing their commonalities. This methodology allows us to train models on a diverse dataset encompassing multiple languages, thereby overcoming the challenge of limited sample availability. We evaluate the models both in a controlled experiment (where labels of data are known) and in the wild by scanning newly uploaded packages for both npm and PyPI for 10 days. We find that our approach successfully detects malicious packages for both npm and PyPI. Over an analysis of 31,292 packages, we reported 58 previously unknown malicious packages (38 for npm and 20 for PyPI), which were consequently removed from the respective repositories. △ Less

Submitted 14 October, 2023; originally announced October 2023.

Comments: Proceedings of Annual Computer Security Applications Conference (ACSAC '23), December 4--8, 2023, Austin, TX, USA

arXiv:2307.09087 [pdf, other]

doi 10.1145/3605770.3625212

The Hitchhiker's Guide to Malicious Third-Party Dependencies

Authors: Piergiorgio Ladisa, Merve Sahin, Serena Elisa Ponta, Marco Rosa, Matias Martinez, Olivier Barais

Abstract: The increasing popularity of certain programming languages has spurred the creation of ecosystem-specific package repositories and package managers. Such repositories (e.g., npm, PyPI) serve as public databases that users can query to retrieve packages for various functionalities, whereas package managers automatically handle dependency resolution and package installation on the client side. These… ▽ More The increasing popularity of certain programming languages has spurred the creation of ecosystem-specific package repositories and package managers. Such repositories (e.g., npm, PyPI) serve as public databases that users can query to retrieve packages for various functionalities, whereas package managers automatically handle dependency resolution and package installation on the client side. These mechanisms enhance software modularization and accelerate implementation. However, they have become a target for malicious actors seeking to propagate malware on a large scale. In this work, we show how attackers can leverage capabilities of popular package managers and languages to achieve arbitrary code execution on victim machines, thereby realizing open-source software supply chain attacks. Based on the analysis of 7 ecosystems, we identify 3 install-time and 4 runtime techniques, and we provide recommendations describing how to reduce the risk when consuming third-party dependencies. We will provide proof-of-concepts that demonstrate the identified techniques. Furthermore, we describe evasion strategies employed by attackers to circumvent detection mechanisms. △ Less

Submitted 6 October, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

Comments: Proceedings of the 2023 Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses (SCORED '23), November 30, 2023, Copenhagen, Denmark

arXiv:2304.05200 [pdf, other]

Journey to the Center of Software Supply Chain Attacks

Authors: Piergiorgio Ladisa, Serena Elisa Ponta, Antonino Sabetta, Matias Martinez, Olivier Barais

Abstract: This work discusses open-source software supply chain attacks and proposes a general taxonomy describing how attackers conduct them. We then provide a list of safeguards to mitigate such attacks. We present our tool "Risk Explorer for Software Supply Chains" to explore such information and we discuss its industrial use-cases. This work discusses open-source software supply chain attacks and proposes a general taxonomy describing how attackers conduct them. We then provide a list of safeguards to mitigate such attacks. We present our tool "Risk Explorer for Software Supply Chains" to explore such information and we discuss its industrial use-cases. △ Less

Submitted 11 April, 2023; originally announced April 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2204.04008

arXiv:2210.03998 [pdf, other]

Towards the Detection of Malicious Java Packages

Authors: Piergiorgio Ladisa, Henrik Plate, Matias Martinez, Olivier Barais, Serena Elisa Ponta

Abstract: Open-source software supply chain attacks aim at infecting downstream users by poisoning open-source packages. The common way of consuming such artifacts is through package repositories and the development of vetting strategies to detect such attacks is ongoing research. Despite its popularity, the Java ecosystem is the less explored one in the context of supply chain attacks. In this paper we p… ▽ More Open-source software supply chain attacks aim at infecting downstream users by poisoning open-source packages. The common way of consuming such artifacts is through package repositories and the development of vetting strategies to detect such attacks is ongoing research. Despite its popularity, the Java ecosystem is the less explored one in the context of supply chain attacks. In this paper we present indicators of malicious behavior that can be observed statically through the analysis of Java bytecode. Then we evaluate how such indicators and their combinations perform when detecting malicious code injections. We do so by injecting three malicious payloads taken from real-world examples into the Top-10 most popular Java libraries from libraries.io. We found that the analysis of strings in the constant pool and of sensitive APIs in the bytecode instructions aid in the task of detecting malicious Java packages by significantly reducing the information, thus, making also manual triage possible. △ Less

Submitted 8 October, 2022; originally announced October 2022.

arXiv:2108.05115 [pdf, ps, other]

The Used, the Bloated, and the Vulnerable: Reducing the Attack Surface of an Industrial Application

Authors: Serena Elisa Ponta, Wolfram Fischer, Henrik Plate, Antonino Sabetta

Abstract: Software reuse may result in software bloat when significant portions of application dependencies are effectively unused. Several tools exist to remove unused (byte)code from an application or its dependencies, thus producing smaller artifacts and, potentially, reducing the overall attack surface. In this paper we evaluate the ability of three debloating tools to distinguish which dependency class… ▽ More Software reuse may result in software bloat when significant portions of application dependencies are effectively unused. Several tools exist to remove unused (byte)code from an application or its dependencies, thus producing smaller artifacts and, potentially, reducing the overall attack surface. In this paper we evaluate the ability of three debloating tools to distinguish which dependency classes are necessary for an application to function correctly from those that could be safely removed. To do so, we conduct a case study on a real-world commercial Java application. Our study shows that the tools we used were able to correctly identify a considerable amount of redundant code, which could be removed without altering the results of the existing application tests. One of the redundant classes turned out to be (formerly) vulnerable, confirming that this technique has the potential to be applied for hardening purposes. However, by manually reviewing the results of our experiments, we observed that none of the tools can handle a widely used default mechanism for dynamic class loading. △ Less

Submitted 11 August, 2021; originally announced August 2021.

arXiv:2008.04568 [pdf, other]

Code-based Vulnerability Detection in Node.js Applications: How far are we?

Authors: Bodin Chinthanet, Serena Elisa Ponta, Henrik Plate, Antonino Sabetta, Raula Gaikovina Kula, Takashi Ishio, Kenichi Matsumoto

Abstract: With one of the largest available collection of reusable packages, the JavaScript runtime environment Node.js is one of the most popular programming application. With recent work showing evidence that known vulnerabilities are prevalent in both open source and industrial software, we propose and implement a viable code-based vulnerability detection tool for Node.js applications. Our case study lis… ▽ More With one of the largest available collection of reusable packages, the JavaScript runtime environment Node.js is one of the most popular programming application. With recent work showing evidence that known vulnerabilities are prevalent in both open source and industrial software, we propose and implement a viable code-based vulnerability detection tool for Node.js applications. Our case study lists the challenges encountered while implementing our Node.js vulnerable code detector. △ Less

Submitted 11 August, 2020; originally announced August 2020.

arXiv:1902.02595 [pdf, other]

A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software

Authors: Serena E. Ponta, Henrik Plate, Antonino Sabetta, Michele Bezzi, Cédric Dangremont

Abstract: Advancing our understanding of software vulnerabilities, automating their identification, the analysis of their impact, and ultimately their mitigation is necessary to enable the development of software that is more secure. While operating a vulnerability assessment tool that we developed and that is currently used by hundreds of development units at SAP, we manually collected and curated a datase… ▽ More Advancing our understanding of software vulnerabilities, automating their identification, the analysis of their impact, and ultimately their mitigation is necessary to enable the development of software that is more secure. While operating a vulnerability assessment tool that we developed and that is currently used by hundreds of development units at SAP, we manually collected and curated a dataset of vulnerabilities of open-source software and the commits fixing them. The data was obtained both from the National Vulnerability Database (NVD) and from project-specific Web resources that we monitor on a continuous basis. From that data, we extracted a dataset that maps 624 publicly disclosed vulnerabilities affecting 205 distinct open-source Java projects, used in SAP products or internal tools, onto the 1282 commits that fix them. Out of 624 vulnerabilities, 29 do not have a CVE identifier at all and 46, which do have a CVE identifier assigned by a numbering authority, are not available in the NVD yet. The dataset is released under an open-source license, together with supporting scripts that allow researchers to automatically retrieve the actual content of the commits from the corresponding repositories and to augment the attributes available for each instance. Also, these scripts allow to complement the dataset with additional instances that are not security fixes (which is useful, for example, in machine learning applications). Our dataset has been successfully used to train classifiers that could automatically identify security-relevant commits in code repositories. The release of this dataset and the supporting code as open-source will allow future research to be based on data of industrial relevance; also, it represents a concrete step towards making the maintenance of this dataset a shared effort involving open-source communities, academia, and the industry. △ Less

Submitted 19 March, 2019; v1 submitted 7 February, 2019; originally announced February 2019.

Comments: This is a pre-print version of the paper that appears in the proceedings of The 16th International Conference on Mining Software Repositories (MSR), Data Showcase track

Journal ref: Proceedings of The 16th International Conference on Mining Software Repositories (Data Showcase track), 2019

arXiv:1808.09753 [pdf, other]

Vulnerable Open Source Dependencies: Counting Those That Matter

Authors: Ivan Pashchenko, Henrik Plate, Serena Elisa Ponta, Antonino Sabetta, Fabio Massacci

Abstract: BACKGROUND: Vulnerable dependencies are a known problem in today's open-source software ecosystems because OSS libraries are highly interconnected and developers do not always update their dependencies. AIMS: In this paper we aim to present a precise methodology, that combines the code-based analysis of patches with information on build, test, update dates, and group extracted from the very code r… ▽ More BACKGROUND: Vulnerable dependencies are a known problem in today's open-source software ecosystems because OSS libraries are highly interconnected and developers do not always update their dependencies. AIMS: In this paper we aim to present a precise methodology, that combines the code-based analysis of patches with information on build, test, update dates, and group extracted from the very code repository, and therefore, caters to the needs of industrial practice for correct allocation of development and audit resources. METHOD: To understand the industrial impact of the proposed methodology, we considered the 200 most popular OSS Java libraries used by SAP in its own software. Our analysis included 10905 distinct GAVs (group, artifact, version) when considering all the library versions. RESULTS: We found that about 20% of the dependencies affected by a known vulnerability are not deployed, and therefore, they do not represent a danger to the analyzed library because they cannot be exploited in practice. Developers of the analyzed libraries are able to fix (and actually responsible for) 82% of the deployed vulnerable dependencies. The vast majority (81%) of vulnerable dependencies may be fixed by simply updating to a new version, while 1% of the vulnerable dependencies in our sample are halted, and therefore, potentially require a costly mitigation strategy. CONCLUSIONS: Our case study shows that the correct counting allows software development companies to receive actionable information about their library dependencies, and therefore, correctly allocate costly development and audit resources, which is spent inefficiently in case of distorted measurements. △ Less

Submitted 29 August, 2018; originally announced August 2018.

Comments: This is a pre-print of the paper that appears, with the same title, in the proceedings of the 12th International Symposium on Empirical Software Engineering and Measurement, 2018

arXiv:1806.05893 [pdf, other]

Beyond Metadata: Code-centric and Usage-based Analysis of Known Vulnerabilities in Open-source Software

Authors: Serena E. Ponta, Henrik Plate, Antonino Sabetta

Abstract: The use of open-source software (OSS) is ever-increasing, and so is the number of open-source vulnerabilities being discovered and publicly disclosed. The gains obtained from the reuse of community-developed libraries may be offset by the cost of detecting, assessing, and mitigating their vulnerabilities in a timely fashion. In this paper we present a novel method to detect, assess and mitigate… ▽ More The use of open-source software (OSS) is ever-increasing, and so is the number of open-source vulnerabilities being discovered and publicly disclosed. The gains obtained from the reuse of community-developed libraries may be offset by the cost of detecting, assessing, and mitigating their vulnerabilities in a timely fashion. In this paper we present a novel method to detect, assess and mitigate OSS vulnerabilities that improves on state-of-the-art approaches, which commonly depend on metadata to identify vulnerable OSS dependencies. Our solution instead is code-centric and combines static and dynamic analysis to determine the reachability of the vulnerable portion of libraries used (directly or transitively) by an application. Taking this usage into account, our approach then supports developers in choosing among the existing non-vulnerable library versions. VULAS, the tool implementing our code-centric and usage-based approach, is officially recommended by SAP to scan its Java software, and has been successfully used to perform more than 250000 scans of about 500 applications since December 2016. We report on our experience and on the lessons we learned when maturing the tool from a research prototype to an industrial-grade solution. △ Less

Submitted 12 July, 2018; v1 submitted 15 June, 2018; originally announced June 2018.

Comments: To appear in the Proc. of the 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME) Added: - acknowledgements - citation to Dashevskyi et al. (TSE 2018), DOI: 10.1109/TSE.2018.2816033

arXiv:1507.07479 [pdf, other]

Modularity for Security-Sensitive Workflows

Authors: Daniel Ricardo dos Santos, Silvio Ranise, Serena Elisa Ponta

Abstract: An established trend in software engineering insists on using components (sometimes also called services or packages) to encapsulate a set of related functionalities or data. By defining interfaces specifying what functionalities they provide or use, components can be combined with others to form more complex components. In this way, IT systems can be designed by mostly re-using existing component… ▽ More An established trend in software engineering insists on using components (sometimes also called services or packages) to encapsulate a set of related functionalities or data. By defining interfaces specifying what functionalities they provide or use, components can be combined with others to form more complex components. In this way, IT systems can be designed by mostly re-using existing components and develo** new ones to provide new functionalities. In this paper, we introduce a notion of component and a combination mechanism for an important class of software artifacts, called security-sensitive workflows. These are business processes in which execution constraints on the tasks are complemented with authorization constraints (e.g., Separation of Duty) and authorization policies (constraining which users can execute which tasks). We show how well-known workflow execution patterns can be simulated by our combination mechanism and how authorization constraints can also be imposed across components. Then, we demonstrate the usefulness of our notion of component by showing (i) the scalability of a technique for the synthesis of run-time monitors for security-sensitive workflows and (ii) the design of a plug-in for the re-use of workflows and related run-time monitors inside an editor for security-sensitive workflows. △ Less

Submitted 27 July, 2015; originally announced July 2015.

arXiv:1504.04971 [pdf, other]

Impact assessment for vulnerabilities in open-source software libraries

Authors: Henrik Plate, Serena Elisa Ponta, Antonino Sabetta

Abstract: Software applications integrate more and more open-source software (OSS) to benefit from code reuse. As a drawback, each vulnerability discovered in bundled OSS potentially affects the application. Upon the disclosure of every new vulnerability, the application vendor has to decide whether it is exploitable in his particular usage context, hence, whether users require an urgent application patch c… ▽ More Software applications integrate more and more open-source software (OSS) to benefit from code reuse. As a drawback, each vulnerability discovered in bundled OSS potentially affects the application. Upon the disclosure of every new vulnerability, the application vendor has to decide whether it is exploitable in his particular usage context, hence, whether users require an urgent application patch containing a non-vulnerable version of the OSS. Current decision making is mostly based on high-level vulnerability descriptions and expert knowledge, thus, effort intense and error prone. This paper proposes a pragmatic approach to facilitate the impact assessment, describes a proof-of-concept for Java, and examines one example vulnerability as case study. The approach is independent from specific kinds of vulnerabilities or programming languages and can deliver immediate results. △ Less

Submitted 21 April, 2015; v1 submitted 20 April, 2015; originally announced April 2015.

arXiv:1206.6757 [pdf, other]

Detection of Configuration Vulnerabilities in Distributed (Web) Environments

Authors: Matteo Maria Casalino, Michele Mangili, Henrik Plate, Serena Elisa Ponta

Abstract: Many tools and libraries are readily available to build and operate distributed Web applications. While the setup of operational environments is comparatively easy, practice shows that their continuous secure operation is more difficult to achieve, many times resulting in vulnerable systems exposed to the Internet. Authenticated vulnerability scanners and validation tools represent a means to dete… ▽ More Many tools and libraries are readily available to build and operate distributed Web applications. While the setup of operational environments is comparatively easy, practice shows that their continuous secure operation is more difficult to achieve, many times resulting in vulnerable systems exposed to the Internet. Authenticated vulnerability scanners and validation tools represent a means to detect security vulnerabilities caused by missing patches or misconfiguration, but current approaches center much around the concepts of hosts and operating systems. This paper presents a language and an approach for the declarative specification and execution of machine-readable security checks for sets of more fine-granular system components depending on each other in a distributed environment. Such a language, building on existing standards, fosters the creation and sharing of security content among security stakeholders. Our approach is exemplified by vulnerabilities of and corresponding checks for Open Source Software commonly used in today's Internet applications. △ Less

Submitted 12 July, 2012; v1 submitted 28 June, 2012; originally announced June 2012.

Comments: 18 pages. To appear in Proc. of Security and Privacy in Communication Networks - 8th Iternational ICST Conference, SecureComm, 2012

Showing 1–12 of 12 results for author: Ponta, S E