Search | arXiv e-print repository

Have You Poisoned My Data? Defending Neural Networks against Data Poisoning

Authors: Fabio De Gaspari, Dorjan Hitaj, Luigi V. Mancini

Abstract: The unprecedented availability of training data fueled the rapid development of powerful neural networks in recent years. However, the need for such large amounts of data leads to potential threats such as poisoning attacks: adversarial manipulations of the training data aimed at compromising the learned model to achieve a given adversarial goal. This paper investigates defenses against clean-la… ▽ More The unprecedented availability of training data fueled the rapid development of powerful neural networks in recent years. However, the need for such large amounts of data leads to potential threats such as poisoning attacks: adversarial manipulations of the training data aimed at compromising the learned model to achieve a given adversarial goal. This paper investigates defenses against clean-label poisoning attacks and proposes a novel approach to detect and filter poisoned datapoints in the transfer learning setting. We define a new characteristic vector representation of datapoints and show that it effectively captures the intrinsic properties of the data distribution. Through experimental analysis, we demonstrate that effective poisons can be successfully differentiated from clean points in the characteristic vector space. We thoroughly evaluate our proposed approach and compare it to existing state-of-the-art defenses using multiple architectures, datasets, and poison budgets. Our evaluation shows that our proposal outperforms existing approaches in defense rate and final trained model performance across all experimental settings. △ Less

Submitted 20 March, 2024; originally announced March 2024.

Comments: Paper accepted for publication at European Symposium on Research in Computer Security (ESORICS) 2024

arXiv:2403.03593 [pdf, other]

Do You Trust Your Model? Emerging Malware Threats in the Deep Learning Ecosystem

Authors: Dorjan Hitaj, Giulio Pagnotta, Fabio De Gaspari, Sediola Ruko, Briland Hitaj, Luigi V. Mancini, Fernando Perez-Cruz

Abstract: Training high-quality deep learning models is a challenging task due to computational and technical requirements. A growing number of individuals, institutions, and companies increasingly rely on pre-trained, third-party models made available in public repositories. These models are often used directly or integrated in product pipelines with no particular precautions, since they are effectively ju… ▽ More Training high-quality deep learning models is a challenging task due to computational and technical requirements. A growing number of individuals, institutions, and companies increasingly rely on pre-trained, third-party models made available in public repositories. These models are often used directly or integrated in product pipelines with no particular precautions, since they are effectively just data in tensor form and considered safe. In this paper, we raise awareness of a new machine learning supply chain threat targeting neural networks. We introduce MaleficNet 2.0, a novel technique to embed self-extracting, self-executing malware in neural networks. MaleficNet 2.0 uses spread-spectrum channel coding combined with error correction techniques to inject malicious payloads in the parameters of deep neural networks. MaleficNet 2.0 injection technique is stealthy, does not degrade the performance of the model, and is robust against removal techniques. We design our approach to work both in traditional and distributed learning settings such as Federated Learning, and demonstrate that it is effective even when a reduced number of bits is used for the model parameters. Finally, we implement a proof-of-concept self-extracting neural network malware using MaleficNet 2.0, demonstrating the practicality of the attack against a widely adopted machine learning framework. Our aim with this work is to raise awareness against these new, dangerous attacks both in the research community and industry, and we hope to encourage further research in mitigation techniques against such threats. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: 16 pages, 9 figures

arXiv:2303.00431 [pdf, other]

OliVaR: Improving Olive Variety Recognition using Deep Neural Networks

Authors: Hristofor Miho, Giulio Pagnotta, Dorjan Hitaj, Fabio De Gaspari, Luigi V. Mancini, Georgios Koubouris, Gianluca Godino, Mehmet Hakan, Concepcion Muñoz Diez

Abstract: The easy and accurate identification of varieties is fundamental in agriculture, especially in the olive sector, where more than 1200 olive varieties are currently known worldwide. Varietal misidentification leads to many potential problems for all the actors in the sector: farmers and nursery workers may establish the wrong variety, leading to its maladaptation in the field; olive oil and table o… ▽ More The easy and accurate identification of varieties is fundamental in agriculture, especially in the olive sector, where more than 1200 olive varieties are currently known worldwide. Varietal misidentification leads to many potential problems for all the actors in the sector: farmers and nursery workers may establish the wrong variety, leading to its maladaptation in the field; olive oil and table olive producers may label and sell a non-authentic product; consumers may be misled; and breeders may commit errors during targeted crossings between different varieties. To date, the standard for varietal identification and certification consists of two methods: morphological classification and genetic analysis. The morphological classification consists of the visual pairwise comparison of different organs of the olive tree, where the most important organ is considered to be the endocarp. In contrast, different methods for genetic classification exist (RAPDs, SSR, and SNP). Both classification methods present advantages and disadvantages. Visual morphological classification requires highly specialized personnel and is prone to human error. Genetic identification methods are more accurate but incur a high cost and are difficult to implement. This paper introduces OliVaR, a novel approach to olive varietal identification. OliVaR uses a teacher-student deep learning architecture to learn the defining characteristics of the endocarp of each specific olive variety and perform classification. We construct what is, to the best of our knowledge, the largest olive variety dataset to date, comprising image data for 131 varieties from the Mediterranean basin. We thoroughly test OliVaR on this dataset and show that it correctly predicts olive varieties with over 86% accuracy. △ Less

Submitted 1 March, 2023; originally announced March 2023.

Comments: 10 pages, 9 figures

arXiv:2303.00387 [pdf, other]

doi 10.1109/TIFS.2023.3318964

DOLOS: A Novel Architecture for Moving Target Defense

Authors: Giulio Pagnotta, Fabio De Gaspari, Dorjan Hitaj, Mauro Andreolini, Michele Colajanni, Luigi V. Mancini

Abstract: Moving Target Defense and Cyber Deception emerged in recent years as two key proactive cyber defense approaches, contrasting with the static nature of the traditional reactive cyber defense. The key insight behind these approaches is to impose an asymmetric disadvantage for the attacker by using deception and randomization techniques to create a dynamic attack surface. Moving Target Defense typica… ▽ More Moving Target Defense and Cyber Deception emerged in recent years as two key proactive cyber defense approaches, contrasting with the static nature of the traditional reactive cyber defense. The key insight behind these approaches is to impose an asymmetric disadvantage for the attacker by using deception and randomization techniques to create a dynamic attack surface. Moving Target Defense typically relies on system randomization and diversification, while Cyber Deception is based on decoy nodes and fake systems to deceive attackers. However, current Moving Target Defense techniques are complex to manage and can introduce high overheads, while Cyber Deception nodes are easily recognized and avoided by adversaries. This paper presents DOLOS, a novel architecture that unifies Cyber Deception and Moving Target Defense approaches. DOLOS is motivated by the insight that deceptive techniques are much more powerful when integrated into production systems rather than deployed alongside them. DOLOS combines typical Moving Target Defense techniques, such as randomization, diversity, and redundancy, with cyber deception and seamlessly integrates them into production systems through multiple layers of isolation. We extensively evaluate DOLOS against a wide range of attackers, ranging from automated malware to professional penetration testers, and show that DOLOS is highly effective in slowing down attacks and protecting the integrity of production systems. We also provide valuable insights and considerations for the future development of MTD techniques based on our findings. △ Less

Submitted 27 September, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: 16 pages

Journal ref: IEEE Transactions on Information Forensics and Security, 2023

arXiv:2301.11050 [pdf, other]

Minerva: A File-Based Ransomware Detector

Authors: Dorjan Hitaj, Giulio Pagnotta, Fabio De Gaspari, Lorenzo De Carli, Luigi V. Mancini

Abstract: Ransomware attacks have caused billions of dollars in damages in recent years, and are expected to cause billions more in the future. Consequently, significant effort has been devoted to ransomware detection and mitigation. Behavioral-based ransomware detection approaches have garnered considerable attention recently. These behavioral detectors typically rely on process-based behavioral profiles t… ▽ More Ransomware attacks have caused billions of dollars in damages in recent years, and are expected to cause billions more in the future. Consequently, significant effort has been devoted to ransomware detection and mitigation. Behavioral-based ransomware detection approaches have garnered considerable attention recently. These behavioral detectors typically rely on process-based behavioral profiles to identify malicious behaviors. However, with an increasing body of literature highlighting the vulnerability of such approaches to evasion attacks, a comprehensive solution to the ransomware problem remains elusive. This paper presents Minerva, a novel robust approach to ransomware detection. Minerva is engineered to be robust by design against evasion attacks, with architectural and feature selection choices informed by their resilience to adversarial manipulation. We conduct a comprehensive analysis of Minerva across a diverse spectrum of ransomware types, encompassing unseen ransomware as well as variants designed specifically to evade Minerva. Our evaluation showcases the ability of Minerva to accurately identify ransomware, generalize to unseen threats, and withstand evasion attacks. Furthermore, Minerva achieves remarkably low detection times, enabling the adoption of data loss prevention techniques with near-zero overhead. △ Less

Submitted 16 April, 2024; v1 submitted 26 January, 2023; originally announced January 2023.

Comments: 14 pages

arXiv:2106.00541 [pdf, other]

doi 10.1145/3433210.3453101

MalPhase: Fine-Grained Malware Detection Using Network Flow Data

Authors: Michal Piskozub, Fabio De Gaspari, Frederick Barr-Smith, Luigi V. Mancini, Ivan Martinovic

Abstract: Economic incentives encourage malware authors to constantly develop new, increasingly complex malware to steal sensitive data or blackmail individuals and companies into paying large ransoms. In 2017, the worldwide economic impact of cyberattacks is estimated to be between 445 and 600 billion USD, or 0.8% of global GDP. Traditionally, one of the approaches used to defend against malware is network… ▽ More Economic incentives encourage malware authors to constantly develop new, increasingly complex malware to steal sensitive data or blackmail individuals and companies into paying large ransoms. In 2017, the worldwide economic impact of cyberattacks is estimated to be between 445 and 600 billion USD, or 0.8% of global GDP. Traditionally, one of the approaches used to defend against malware is network traffic analysis, which relies on network data to detect the presence of potentially malicious software. However, to keep up with increasing network speeds and amount of traffic, network analysis is generally limited to work on aggregated network data, which is traditionally challenging and yields mixed results. In this paper we present MalPhase, a system that was designed to cope with the limitations of aggregated flows. MalPhase features a multi-phase pipeline for malware detection, type and family classification. The use of an extended set of network flow features and a simultaneous multi-tier architecture facilitates a performance improvement for deep learning models, making them able to detect malicious flows (>98% F1) and categorize them to a respective malware type (>93% F1) and family (>91% F1). Furthermore, the use of robust features and denoising autoencoders allows MalPhase to perform well on samples with varying amounts of benign traffic mixed in. Finally, MalPhase detects unseen malware samples with performance comparable to that of known samples, even when interlaced with benign flows to reflect realistic network environments. △ Less

Submitted 1 June, 2021; originally announced June 2021.

Comments: Paper accepted for publication at ACM AsiaCCS 2021

arXiv:2105.06165 [pdf, other]

PassFlow: Guessing Passwords with Generative Flows

Authors: Giulio Pagnotta, Dorjan Hitaj, Fabio De Gaspari, Luigi V. Mancini

Abstract: Recent advances in generative machine learning models rekindled research interest in the area of password guessing. Data-driven password guessing approaches based on GANs, language models and deep latent variable models have shown impressive generalization performance and offer compelling properties for the task of password guessing. In this paper, we propose PassFlow, a flow-based generative mode… ▽ More Recent advances in generative machine learning models rekindled research interest in the area of password guessing. Data-driven password guessing approaches based on GANs, language models and deep latent variable models have shown impressive generalization performance and offer compelling properties for the task of password guessing. In this paper, we propose PassFlow, a flow-based generative model approach to password guessing. Flow-based models allow for precise log-likelihood computation and optimization, which enables exact latent variable inference. Additionally, flow-based models provide meaningful latent space representation, which enables operations such as exploration of specific subspaces of the latent space and interpolation. We demonstrate the applicability of generative flows to the context of password guessing, departing from previous applications of flow-networks which are mainly limited to the continuous space of image generation. We show that PassFlow is able to outperform prior state-of-the-art GAN-based approaches in the password guessing task while using a training set that is orders of magnitudes smaller than that of previous art. Furthermore, a qualitative analysis of the generated samples shows that PassFlow can accurately model the distribution of the original passwords, with even non-matched samples closely resembling human-like passwords. △ Less

Submitted 14 December, 2021; v1 submitted 13 May, 2021; originally announced May 2021.

Comments: 12 pages, 6 figures, 6 tables

arXiv:2103.17059 [pdf, other]

Reliable Detection of Compressed and Encrypted Data

Authors: Fabio De Gaspari, Dorjan Hitaj, Giulio Pagnotta, Lorenzo De Carli, Luigi V. Mancini

Abstract: Several cybersecurity domains, such as ransomware detection, forensics and data analysis, require methods to reliably identify encrypted data fragments. Typically, current approaches employ statistics derived from byte-level distribution, such as entropy estimation, to identify encrypted fragments. However, modern content types use compression techniques which alter data distribution pushing it cl… ▽ More Several cybersecurity domains, such as ransomware detection, forensics and data analysis, require methods to reliably identify encrypted data fragments. Typically, current approaches employ statistics derived from byte-level distribution, such as entropy estimation, to identify encrypted fragments. However, modern content types use compression techniques which alter data distribution pushing it closer to the uniform distribution. The result is that current approaches exhibit unreliable encryption detection performance when compressed data appears in the dataset. Furthermore, proposed approaches are typically evaluated over few data types and fragment sizes, making it hard to assess their practical applicability. This paper compares existing statistical tests on a large, standardized dataset and shows that current approaches consistently fail to distinguish encrypted and compressed data on both small and large fragment sizes. We address these shortcomings and design EnCoD, a learning-based classifier which can reliably distinguish compressed and encrypted data. We evaluate EnCoD on a dataset of 16 different file types and fragment sizes ranging from 512B to 8KB. Our results highlight that EnCoD outperforms current approaches by a wide margin, with accuracy ranging from ~82 for 512B fragments up to ~92 for 8KB data fragments. Moreover, EnCoD can pinpoint the exact format of a given data fragment, rather than performing only binary classification like previous approaches. △ Less

Submitted 31 March, 2021; originally announced March 2021.

Comments: 12 pages, 8 figures. arXiv admin note: substantial text overlap with arXiv:2010.07754

arXiv:2010.07754 [pdf, other]

EnCoD: Distinguishing Compressed and Encrypted File Fragments

Authors: Fabio De Gaspari, Dorjan Hitaj, Giulio Pagnotta, Lorenzo De Carli, Luigi V. Mancini

Abstract: Reliable identification of encrypted file fragments is a requirement for several security applications, including ransomware detection, digital forensics, and traffic analysis. A popular approach consists of estimating high entropy as a proxy for randomness. However, many modern content types (e.g. office documents, media files, etc.) are highly compressed for storage and transmission efficiency.… ▽ More Reliable identification of encrypted file fragments is a requirement for several security applications, including ransomware detection, digital forensics, and traffic analysis. A popular approach consists of estimating high entropy as a proxy for randomness. However, many modern content types (e.g. office documents, media files, etc.) are highly compressed for storage and transmission efficiency. Compression algorithms also output high-entropy data, thus reducing the accuracy of entropy-based encryption detectors. Over the years, a variety of approaches have been proposed to distinguish encrypted file fragments from high-entropy compressed fragments. However, these approaches are typically only evaluated over a few, select data types and fragment sizes, which makes a fair assessment of their practical applicability impossible. This paper aims to close this gap by comparing existing statistical tests on a large, standardized dataset. Our results show that current approaches cannot reliably tell apart encryption and compression, even for large fragment sizes. To address this issue, we design EnCoD, a learning-based classifier which can reliably distinguish compressed and encrypted data, starting with fragments as small as 512 bytes. We evaluate EnCoD against current approaches over a large dataset of different data types, showing that it outperforms current state-of-the-art for most considered fragment sizes and data types. △ Less

Submitted 15 October, 2020; originally announced October 2020.

Comments: 19 pages, 6 images, 2 tables. Accepted for publication at the 14th International Conference on Network and System Security (NSS2020)

arXiv:2005.00283 [pdf, other]

Facilitating Access to Multilingual COVID-19 Information via Neural Machine Translation

Authors: Andy Way, Rejwanul Haque, Guodong Xie, Federico Gaspari, Maja Popovic, Alberto Poncelas

Abstract: Every day, more people are becoming infected and dying from exposure to COVID-19. Some countries in Europe like Spain, France, the UK and Italy have suffered particularly badly from the virus. Others such as Germany appear to have coped extremely well. Both health professionals and the general public are keen to receive up-to-date information on the effects of the virus, as well as treatments that… ▽ More Every day, more people are becoming infected and dying from exposure to COVID-19. Some countries in Europe like Spain, France, the UK and Italy have suffered particularly badly from the virus. Others such as Germany appear to have coped extremely well. Both health professionals and the general public are keen to receive up-to-date information on the effects of the virus, as well as treatments that have proven to be effective. In cases where language is a barrier to access of pertinent information, machine translation (MT) may help people assimilate information published in different languages. Our MT systems trained on COVID-19 data are freely available for anyone to use to help translate information published in German, French, Italian, Spanish into English, as well as the reverse direction. △ Less

Submitted 1 May, 2020; originally announced May 2020.

arXiv:1911.02423 [pdf, other]

The Naked Sun: Malicious Cooperation Between Benign-Looking Processes

Authors: Fabio De Gaspari, Dorjan Hitaj, Giulio Pagnotta, Lorenzo De Carli, Luigi V. Mancini

Abstract: Recent progress in machine learning has generated promising results in behavioral malware detection. Behavioral modeling identifies malicious processes via features derived by their runtime behavior. Behavioral features hold great promise as they are intrinsically related to the functioning of each malware, and are therefore considered difficult to evade. Indeed, while a significant amount of resu… ▽ More Recent progress in machine learning has generated promising results in behavioral malware detection. Behavioral modeling identifies malicious processes via features derived by their runtime behavior. Behavioral features hold great promise as they are intrinsically related to the functioning of each malware, and are therefore considered difficult to evade. Indeed, while a significant amount of results exists on evasion of static malware features, evasion of dynamic features has seen limited work. This paper thoroughly examines the robustness of behavioral malware detectors to evasion, focusing particularly on anti-ransomware evasion. We choose ransomware as its behavior tends to differ significantly from that of benign processes, making it a low-hanging fruit for behavioral detection (and a difficult candidate for evasion). Our analysis identifies a set of novel attacks that distribute the overall malware workload across a small set of cooperating processes to avoid the generation of significant behavioral features. Our most effective attack decreases the accuracy of a state-of-the-art classifier from 98.6% to 0% using only 18 cooperating processes. Furthermore, we show our attacks to be effective against commercial ransomware detectors even in a black-box setting. △ Less

Submitted 6 November, 2019; originally announced November 2019.

Comments: 15 pages, 6 figures, 4 tables

arXiv:1803.10664 [pdf]

Autonomous Intelligent Cyber-defense Agent (AICA) Reference Architecture. Release 2.0

Authors: Alexander Kott, Paul Théron, Martin Drašar, Edlira Dushku, Benoît LeBlanc, Paul Losiewicz, Alessandro Guarino, Luigi Mancini, Agostino Panico, Mauno Pihelgas, Krzysztof Rzadca, Fabio De Gaspari

Abstract: This report - a major revision of its previous release - describes a reference architecture for intelligent software agents performing active, largely autonomous cyber-defense actions on military networks of computing and communicating devices. The report is produced by the North Atlantic Treaty Organization (NATO) Research Task Group (RTG) IST-152 "Intelligent Autonomous Agents for Cyber Defense… ▽ More This report - a major revision of its previous release - describes a reference architecture for intelligent software agents performing active, largely autonomous cyber-defense actions on military networks of computing and communicating devices. The report is produced by the North Atlantic Treaty Organization (NATO) Research Task Group (RTG) IST-152 "Intelligent Autonomous Agents for Cyber Defense and Resilience". In a conflict with a technically sophisticated adversary, NATO military tactical networks will operate in a heavily contested battlefield. Enemy software cyber agents - malware - will infiltrate friendly networks and attack friendly command, control, communications, computers, intelligence, surveillance, and reconnaissance and computerized weapon systems. To fight them, NATO needs artificial cyber hunters - intelligent, autonomous, mobile agents specialized in active cyber defense. With this in mind, in 2016, NATO initiated RTG IST-152. Its objective has been to help accelerate the development and transition to practice of such software agents by producing a reference architecture and technical roadmap. This report presents the concept and architecture of an Autonomous Intelligent Cyber-defense Agent (AICA). We describe the rationale of the AICA concept, explain the methodology and purpose that drive the definition of the AICA Reference Architecture, and review some of the main features and challenges of AICAs. △ Less

Submitted 22 March, 2023; v1 submitted 28 March, 2018; originally announced March 2018.

Comments: This is a major revision and extension of the earlier release of AICA Reference Architecture

Report number: ARL-SR-0421

arXiv:1608.04766 [pdf, other]

Know Your Enemy: Stealth Configuration-Information Gathering in SDN

Authors: Mauro Conti, Fabio De Gaspari, Luigi V. Mancini

Abstract: Software Defined Networking (SDN) is a network architecture that aims at providing high flexibility through the separation of the network logic from the forwarding functions. The industry has already widely adopted SDN and researchers thoroughly analyzed its vulnerabilities, proposing solutions to improve its security. However, we believe important security aspects of SDN are still left uninvestig… ▽ More Software Defined Networking (SDN) is a network architecture that aims at providing high flexibility through the separation of the network logic from the forwarding functions. The industry has already widely adopted SDN and researchers thoroughly analyzed its vulnerabilities, proposing solutions to improve its security. However, we believe important security aspects of SDN are still left uninvestigated. In this paper, we raise the concern of the possibility for an attacker to obtain knowledge about an SDN network. In particular, we introduce a novel attack, named Know Your Enemy (KYE), by means of which an attacker can gather vital information about the configuration of the network. This information ranges from the configuration of security tools, such as attack detection thresholds for network scanning, to general network policies like QoS and network virtualization. Additionally, we show that an attacker can perform a KYE attack in a stealthy fashion, i.e., without the risk of being detected. We underline that the vulnerability exploited by the KYE attack is proper of SDN and is not present in legacy networks. To address the KYE attack, we also propose an active defense countermeasure based on network flows obfuscation, which considerably increases the complexity for a successful attack. Our solution offers provable security guarantees that can be tailored to the needs of the specific network under consideration △ Less

Submitted 16 August, 2016; originally announced August 2016.

arXiv:1502.02234 [pdf, other]

LineSwitch: Efficiently Managing Switch Flow in Software-Defined Networking while Effectively Tackling DoS Attacks

Authors: Moreno Ambrosin, Mauro Conti, Fabio De Gaspari, Radha Poovendran

Abstract: Software Defined Networking (SDN) is a new networking architecture which aims to provide better decoupling between network control (control plane) and data forwarding functionalities (data plane). This separation introduces several benefits, such as a directly programmable and (virtually) centralized network control. However, researchers showed that the required communication channel between the c… ▽ More Software Defined Networking (SDN) is a new networking architecture which aims to provide better decoupling between network control (control plane) and data forwarding functionalities (data plane). This separation introduces several benefits, such as a directly programmable and (virtually) centralized network control. However, researchers showed that the required communication channel between the control and data plane of SDN creates a potential bottleneck in the system, introducing new vulnerabilities. Indeed, this behavior could be exploited to mount powerful attacks, such as the control plane saturation attack, that can severely hinder the performance of the whole network. In this paper we present LineSwitch, an efficient and effective solution against control plane saturation attack. LineSwitch combines SYN proxy techniques and probabilistic blacklisting of network traffic. We implemented LineSwitch as an extension of OpenFlow, the current reference implementation of SDN, and evaluate our solution considering different traffic scenarios (with and without attack). The results of our preliminary experiments confirm that, compared to the state-of-the-art, LineSwitch reduces the time overhead up to 30%, while ensuring the same level of protection. △ Less

Submitted 8 February, 2015; originally announced February 2015.

Comments: In Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security (ASIACCS 2015). To appear

Showing 1–14 of 14 results for author: Gaspari, F