-
Longitudinal Study of the Prevalence of Malware Evasive Techniques
Authors:
Lorenzo Maffia,
Dario Nisi,
Platon Kotzias,
Giovanni Lagorio,
Simone Aonzo,
Davide Balzarotti
Abstract:
By their very nature, malware samples employ a variety of techniques to conceal their malicious behavior and hide it from analysis tools. To mitigate the problem, a large number of different evasion techniques have been documented over the years, and PoC implementations have been collected in public frameworks, like the popular Al-Khaser. As malware authors tend to reuse existing approaches, it is…
▽ More
By their very nature, malware samples employ a variety of techniques to conceal their malicious behavior and hide it from analysis tools. To mitigate the problem, a large number of different evasion techniques have been documented over the years, and PoC implementations have been collected in public frameworks, like the popular Al-Khaser. As malware authors tend to reuse existing approaches, it is common to observe the same evasive techniques in malware samples of different families. However, no measurement study has been conducted to date to assess the adoption and prevalence of evasion techniques.
In this paper, we present a large-scale study, conducted by dynamically analyzing more than 180K Windows malware samples, on the evolution of evasive techniques over the years. To perform the experiments, we developed a custom Pin-based Evasive Program Profiler (Pepper), a tool capable of both detecting and circumventing 53 anti-dynamic-analysis techniques of different categories, ranging from anti-debug to virtual machine detection.
To observe the phenomenon of evasion from different points of view, we employed four different datasets, including benign files, advanced persistent threat (APTs), malware samples collected over a period of five years, and a recent collection of different families submitted to VirusTotal over a one-month period.
△ Less
Submitted 21 December, 2021;
originally announced December 2021.
-
Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection
Authors:
Luca Demetrio,
Scott E. Coull,
Battista Biggio,
Giovanni Lagorio,
Alessandro Armando,
Fabio Roli
Abstract:
Recent work has shown that adversarial Windows malware samples - referred to as adversarial EXEmples in this paper - can bypass machine learning-based detection relying on static code analysis by perturbing relatively few input bytes. To preserve malicious functionality, previous attacks either add bytes to existing non-functional areas of the file, potentially limiting their effectiveness, or req…
▽ More
Recent work has shown that adversarial Windows malware samples - referred to as adversarial EXEmples in this paper - can bypass machine learning-based detection relying on static code analysis by perturbing relatively few input bytes. To preserve malicious functionality, previous attacks either add bytes to existing non-functional areas of the file, potentially limiting their effectiveness, or require running computationally-demanding validation steps to discard malware variants that do not correctly execute in sandbox environments. In this work, we overcome these limitations by develo** a unifying framework that does not only encompass and generalize previous attacks against machine-learning models, but also includes three novel attacks based on practical, functionality-preserving manipulations to the Windows Portable Executable (PE) file format. These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section. Our experimental results show that these attacks outperform existing ones in both white-box and black-box scenarios, achieving a better trade-off in terms of evasion rate and size of the injected payload, while also enabling evasion of models that have been shown to be robust to previous attacks. To facilitate reproducibility of our findings, we open source our framework and all the corresponding attack implementations as part of the secml-malware Python library. We conclude this work by discussing the limitations of current machine learning-based malware detectors, along with potential mitigation strategies based on embedding domain knowledge coming from subject-matter experts directly into the learning process.
△ Less
Submitted 22 March, 2021; v1 submitted 17 August, 2020;
originally announced August 2020.
-
Functionality-preserving Black-box Optimization of Adversarial Windows Malware
Authors:
Luca Demetrio,
Battista Biggio,
Giovanni Lagorio,
Fabio Roli,
Alessandro Armando
Abstract:
Windows malware detectors based on machine learning are vulnerable to adversarial examples, even if the attacker is only given black-box query access to the model. The main drawback of these attacks is that: (i) they are query-inefficient, as they rely on iteratively applying random transformations to the input malware; and (ii) they may also require executing the adversarial malware in a sandbox…
▽ More
Windows malware detectors based on machine learning are vulnerable to adversarial examples, even if the attacker is only given black-box query access to the model. The main drawback of these attacks is that: (i) they are query-inefficient, as they rely on iteratively applying random transformations to the input malware; and (ii) they may also require executing the adversarial malware in a sandbox at each iteration of the optimization process, to ensure that its intrusive functionality is preserved. In this paper, we overcome these issues by presenting a novel family of black-box attacks that are both query-efficient and functionality-preserving, as they rely on the injection of benign content - which will never be executed - either at the end of the malicious file, or within some newly-created sections. Our attacks are formalized as a constrained minimization problem which also enables optimizing the trade-off between the probability of evading detection and the size of the injected payload. We empirically investigate this trade-off on two popular static Windows malware detectors, and show that our black-box attacks can bypass them with only few queries and small payloads, even when they only return the predicted labels. We also evaluate whether our attacks transfer to other commercial antivirus solutions, and surprisingly find that they can evade, on average, more than 12 commercial antivirus engines. We conclude by discussing the limitations of our approach, and its possible future extensions to target malware classifiers based on dynamic analysis.
△ Less
Submitted 18 February, 2021; v1 submitted 30 March, 2020;
originally announced March 2020.
-
WAF-A-MoLE: Evading Web Application Firewalls through Adversarial Machine Learning
Authors:
Luca Demetrio,
Andrea Valenza,
Gabriele Costa,
Giovanni Lagorio
Abstract:
Web Application Firewalls are widely used in production environments to mitigate security threats like SQL injections. Many industrial products rely on signature-based techniques, but machine learning approaches are becoming more and more popular. The main goal of an adversary is to craft semantically malicious payloads to bypass the syntactic analysis performed by a WAF. In this paper, we present…
▽ More
Web Application Firewalls are widely used in production environments to mitigate security threats like SQL injections. Many industrial products rely on signature-based techniques, but machine learning approaches are becoming more and more popular. The main goal of an adversary is to craft semantically malicious payloads to bypass the syntactic analysis performed by a WAF. In this paper, we present WAF-A-MoLE, a tool that models the presence of an adversary. This tool leverages on a set of mutation operators that alter the syntax of a payload without affecting the original semantics. We evaluate the performance of the tool against existing WAFs, that we trained using our publicly available SQL query dataset. We show that WAF-A-MoLE bypasses all the considered machine learning based WAFs.
△ Less
Submitted 7 January, 2020;
originally announced January 2020.
-
Explaining Vulnerabilities of Deep Learning to Adversarial Malware Binaries
Authors:
Luca Demetrio,
Battista Biggio,
Giovanni Lagorio,
Fabio Roli,
Alessandro Armando
Abstract:
Recent work has shown that deep-learning algorithms for malware detection are also susceptible to adversarial examples, i.e., carefully-crafted perturbations to input malware that enable misleading classification. Although this has questioned their suitability for this task, it is not yet clear why such algorithms are easily fooled also in this particular application domain. In this work, we take…
▽ More
Recent work has shown that deep-learning algorithms for malware detection are also susceptible to adversarial examples, i.e., carefully-crafted perturbations to input malware that enable misleading classification. Although this has questioned their suitability for this task, it is not yet clear why such algorithms are easily fooled also in this particular application domain. In this work, we take a first step to tackle this issue by leveraging explainable machine-learning algorithms developed to interpret the black-box decisions of deep neural networks. In particular, we use an explainable technique known as feature attribution to identify the most influential input features contributing to each decision, and adapt it to provide meaningful explanations to the classification of malware binaries. In this case, we find that a recently-proposed convolutional neural network does not learn any meaningful characteristic for malware detection from the data and text sections of executable files, but rather tends to learn to discriminate between benign and malware samples based on the characteristics found in the file header. Based on this finding, we propose a novel attack algorithm that generates adversarial malware binaries by only changing few tens of bytes in the file header. With respect to the other state-of-the-art attack algorithms, our attack does not require injecting any padding bytes at the end of the file, and it is much more efficient, as it requires manipulating much fewer bytes.
△ Less
Submitted 24 January, 2019; v1 submitted 11 January, 2019;
originally announced January 2019.
-
Coinductive subty** for abstract compilation of object-oriented languages into Horn formulas
Authors:
Davide Ancona,
Giovanni Lagorio
Abstract:
In recent work we have shown how it is possible to define very precise type systems for object-oriented languages by abstractly compiling a program into a Horn formula f. Then type inference amounts to resolving a certain goal w.r.t. the coinductive (that is, the greatest) Herbrand model of f.
Type systems defined in this way are idealized, since in the most interesting instantiations both the…
▽ More
In recent work we have shown how it is possible to define very precise type systems for object-oriented languages by abstractly compiling a program into a Horn formula f. Then type inference amounts to resolving a certain goal w.r.t. the coinductive (that is, the greatest) Herbrand model of f.
Type systems defined in this way are idealized, since in the most interesting instantiations both the terms of the coinductive Herbrand universe and goal derivations cannot be finitely represented. However, sound and quite expressive approximations can be implemented by considering only regular terms and derivations. In doing so, it is essential to introduce a proper subty** relation formalizing the notion of approximation between types.
In this paper we study a subty** relation on coinductive terms built on union and object type constructors. We define an interpretation of types as set of values induced by a quite intuitive relation of membership of values to types, and prove that the definition of subty** is sound w.r.t. subset inclusion between type interpretations. The proof of soundness has allowed us to simplify the notion of contractive derivation and to discover that the previously given definition of subty** did not cover all possible representations of the empty type.
△ Less
Submitted 7 June, 2010;
originally announced June 2010.