Search | arXiv e-print repository

arXiv:2406.02011 [pdf, other]

A Risk Estimation Study of Native Code Vulnerabilities in Android Applications

Authors: Silvia Lucia Sanna, Diego Soi, Davide Maiorca, Giorgio Fumera, Giorgio Giacinto

Abstract: Android is the most used Operating System worldwide for mobile devices, with hundreds of thousands of apps downloaded daily. Although these apps are primarily written in Java and Kotlin, advanced functionalities such as graphics or cryptography are provided through native C/C++ libraries. These libraries can be affected by common vulnerabilities in C/C++ code (e.g., memory errors such as buffer ov… ▽ More Android is the most used Operating System worldwide for mobile devices, with hundreds of thousands of apps downloaded daily. Although these apps are primarily written in Java and Kotlin, advanced functionalities such as graphics or cryptography are provided through native C/C++ libraries. These libraries can be affected by common vulnerabilities in C/C++ code (e.g., memory errors such as buffer overflow), through which attackers can read/modify data or execute arbitrary code. The detection and assessment of vulnerabilities in Android native code have only been recently explored by previous research work. In this paper, we propose a fast risk-based approach that provides a risk score related to the native part of an Android application. In this way, before an app is released, the developer can check if the app may contain vulnerabilities in the Native Code and, if present, patch them to publish a more secure application. To this end, we first use fast regular expressions to detect library versions and possible vulnerable functions. Then, we apply scores extracted from a vulnerability database to the analyzed application, thus obtaining a risk score representative of the whole app. We demonstrate the validity of our approach by performing a large-scale analysis on more than $100,000$ applications (but only $40\%$ contained native code) and $15$ popular libraries carrying known vulnerabilities. The attained results show that many applications contain well-known vulnerabilities that miscreants can potentially exploit, posing serious concerns about the security of the whole Android applications landscape. △ Less

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.01118 [pdf, other]

A Survey of the Overlooked Dangers of Template Engines

Authors: Lorenzo Pisu, Davide Maiorca, Giorgio Giacinto

Abstract: Template engines play a pivotal role in modern web application development, facilitating the dynamic rendering of content, products, and user interfaces. Nowadays, template engines are essential in any website that deals with dynamic data, from e-commerce platforms to social media. However, their widespread use also makes them attractive targets for attackers seeking to exploit vulnerabilities and… ▽ More Template engines play a pivotal role in modern web application development, facilitating the dynamic rendering of content, products, and user interfaces. Nowadays, template engines are essential in any website that deals with dynamic data, from e-commerce platforms to social media. However, their widespread use also makes them attractive targets for attackers seeking to exploit vulnerabilities and gain unauthorized access to web servers. This paper presents a comprehensive survey of template engines, focusing on their susceptibility to Remote Code Execution (RCE) attacks, a critical security concern in web application development. △ Less

Submitted 2 May, 2024; originally announced May 2024.

Comments: 29 pages, 2 figures

arXiv:2312.17356 [pdf, other]

Can you See me? On the Visibility of NOPs against Android Malware Detectors

Authors: Diego Soi, Davide Maiorca, Giorgio Giacinto, Harel Berger

Abstract: Android malware still represents the most significant threat to mobile systems. While Machine Learning systems are increasingly used to identify these threats, past studies have revealed that attackers can bypass these detection mechanisms by making subtle changes to Android applications, such as adding specific API calls. These modifications are often referred to as No OPerations (NOP), which ide… ▽ More Android malware still represents the most significant threat to mobile systems. While Machine Learning systems are increasingly used to identify these threats, past studies have revealed that attackers can bypass these detection mechanisms by making subtle changes to Android applications, such as adding specific API calls. These modifications are often referred to as No OPerations (NOP), which ideally should not alter the semantics of the program. However, many NOPs can be spotted and eliminated by refining the app analysis process. This paper proposes a visibility metric that assesses the difficulty in spotting NOPs and similar non-operational codes. We tested our metric on a state-of-the-art, opcode-based deep learning system for Android malware detection. We implemented attacks on the feature and problem spaces and calculated their visibility according to our metric. The attained results show an intriguing trade-off between evasion efficacy and detectability: our metric can be valuable to ensure the real effectiveness of an adversarial attack, also serving as a useful aid to develop better defenses. △ Less

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2205.05573 [pdf, other]

A Longitudinal Study of Cryptographic API: a Decade of Android Malware

Authors: Adam Janovsky, Davide Maiorca, Dominik Macko, Vashek Matyas, Giorgio Giacinto

Abstract: Cryptography has been extensively used in Android applications to guarantee secure communications, conceal critical data from reverse engineering, or ensure mobile users' privacy. Various system-based and third-party libraries for Android provide cryptographic functionalities, and previous works mainly explored the misuse of cryptographic API in benign applications. However, the role of cryptograp… ▽ More Cryptography has been extensively used in Android applications to guarantee secure communications, conceal critical data from reverse engineering, or ensure mobile users' privacy. Various system-based and third-party libraries for Android provide cryptographic functionalities, and previous works mainly explored the misuse of cryptographic API in benign applications. However, the role of cryptographic API has not yet been explored in Android malware. This paper performs a comprehensive, longitudinal analysis of cryptographic API in Android malware. In particular, we analyzed $603\,937$ Android applications (half of them malicious, half benign) released between $2012$ and $2020$, gathering more than 1 million cryptographic API expressions. Our results reveal intriguing trends and insights on how and why cryptography is employed in Android malware. For instance, we point out the widespread use of weak hash functions and the late transition from insecure DES to AES. Additionally, we show that cryptography-related characteristics can help to improve the performance of learning-based systems in detecting malicious applications. △ Less

Submitted 6 July, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

Comments: Fix processing time data

arXiv:2005.01452 [pdf, other]

Do Gradient-based Explanations Tell Anything About Adversarial Robustness to Android Malware?

Authors: Marco Melis, Michele Scalas, Ambra Demontis, Davide Maiorca, Battista Biggio, Giorgio Giacinto, Fabio Roli

Abstract: While machine-learning algorithms have demonstrated a strong ability in detecting Android malware, they can be evaded by sparse evasion attacks crafted by injecting a small set of fake components, e.g., permissions and system calls, without compromising intrusive functionality. Previous work has shown that, to improve robustness against such attacks, learning algorithms should avoid overemphasizin… ▽ More While machine-learning algorithms have demonstrated a strong ability in detecting Android malware, they can be evaded by sparse evasion attacks crafted by injecting a small set of fake components, e.g., permissions and system calls, without compromising intrusive functionality. Previous work has shown that, to improve robustness against such attacks, learning algorithms should avoid overemphasizing few discriminant features, providing instead decisions that rely upon a large subset of components. In this work, we investigate whether gradient-based attribution methods, used to explain classifiers' decisions by identifying the most relevant features, can be used to help identify and select more robust algorithms. To this end, we propose to exploit two different metrics that represent the evenness of explanations, and a new compact security measure called Adversarial Robustness Metric. Our experiments conducted on two different datasets and five classification algorithms for Android malware detection show that a strong connection exists between the uniformity of explanations and adversarial robustness. In particular, we found that popular techniques like Gradient*Input and Integrated Gradients are strongly correlated to security when applied to both linear and nonlinear detectors, while more elementary explanation techniques like the simple Gradient do not provide reliable information about the robustness of such classifiers. △ Less

Submitted 27 May, 2021; v1 submitted 4 May, 2020; originally announced May 2020.

arXiv:1910.01037 [pdf, other]

Automotive Cybersecurity: Foundations for Next-Generation Vehicles

Authors: Michele Scalas, Giorgio Giacinto

Abstract: The automotive industry is experiencing a serious transformation due to a digitalisation process and the transition to the new paradigm of Mobility-as-a-Service. The next-generation vehicles are going to be very complex cyber-physical systems, whose design must be reinvented to fulfil the increasing demand of smart services, both for safety and entertainment purposes, causing the manufacturers' mo… ▽ More The automotive industry is experiencing a serious transformation due to a digitalisation process and the transition to the new paradigm of Mobility-as-a-Service. The next-generation vehicles are going to be very complex cyber-physical systems, whose design must be reinvented to fulfil the increasing demand of smart services, both for safety and entertainment purposes, causing the manufacturers' model to converge towards that of IT companies. Connected cars and autonomous driving are the preeminent factors that drive along this route, and they cause the necessity of a new design to address the emerging cybersecurity issues: the "old" automotive architecture relied on a single closed network, with no external communications; modern vehicles are going to be always connected indeed, which means the attack surface will be much more extended. The result is the need for a paradigm shift towards a secure-by-design approach. △ Less

Submitted 2 October, 2019; originally announced October 2019.

Comments: Accepted to ICTCS 2019 conference

arXiv:1904.10270 [pdf, other]

PowerDrive: Accurate De-Obfuscation and Analysis of PowerShell Malware

Authors: Denis Ugarte, Davide Maiorca, Fabrizio Cara, Giorgio Giacinto

Abstract: PowerShell is nowadays a widely-used technology to administrate and manage Windows-based operating systems. However, it is also extensively used by malware vectors to execute payloads or drop additional malicious contents. Similarly to other scripting languages used by malware, PowerShell attacks are challenging to analyze due to the extensive use of multiple obfuscation layers, which make the rea… ▽ More PowerShell is nowadays a widely-used technology to administrate and manage Windows-based operating systems. However, it is also extensively used by malware vectors to execute payloads or drop additional malicious contents. Similarly to other scripting languages used by malware, PowerShell attacks are challenging to analyze due to the extensive use of multiple obfuscation layers, which make the real malicious code hard to be unveiled. To the best of our knowledge, a comprehensive solution for properly de-obfuscating such attacks is currently missing. In this paper, we present PowerDrive, an open-source, static and dynamic multi-stage de-obfuscator for PowerShell attacks. PowerDrive instruments the PowerShell code to progressively de-obfuscate it by showing the analyst the employed obfuscation steps. We used PowerDrive to successfully analyze thousands of PowerShell attacks extracted from various malware vectors and executables. The attained results show interesting patterns used by attackers to devise their malicious scripts. Moreover, we provide a taxonomy of behavioral models adopted by the analyzed codes and a comprehensive list of the malicious domains contacted during the analysis. △ Less

Submitted 24 April, 2019; v1 submitted 23 April, 2019; originally announced April 2019.

arXiv:1811.09985 [pdf, other]

Poisoning Behavioral Malware Clustering

Authors: Battista Biggio, Konrad Rieck, Davide Ariu, Christian Wressnegger, Igino Corona, Giorgio Giacinto, Fabio Roli

Abstract: Clustering algorithms have become a popular tool in computer security to analyze the behavior of malware variants, identify novel malware families, and generate signatures for antivirus systems. However, the suitability of clustering algorithms for security-sensitive settings has been recently questioned by showing that they can be significantly compromised if an attacker can exercise some control… ▽ More Clustering algorithms have become a popular tool in computer security to analyze the behavior of malware variants, identify novel malware families, and generate signatures for antivirus systems. However, the suitability of clustering algorithms for security-sensitive settings has been recently questioned by showing that they can be significantly compromised if an attacker can exercise some control over the input data. In this paper, we revisit this problem by focusing on behavioral malware clustering approaches, and investigate whether and to what extent an attacker may be able to subvert these approaches through a careful injection of samples with poisoning behavior. To this end, we present a case study on Malheur, an open-source tool for behavioral malware clustering. Our experiments not only demonstrate that this tool is vulnerable to poisoning attacks, but also that it can be significantly compromised even if the attacker can only inject a very small percentage of attacks into the input data. As a remedy, we discuss possible countermeasures and highlight the need for more secure clustering algorithms. △ Less

Submitted 25 November, 2018; originally announced November 2018.

Journal ref: 2014 ACM CCS Workshop on Artificial Intelligent and Security, AISec '14, pages 27-36, New York, NY, USA, 2014. ACM

arXiv:1811.00830 [pdf, other]

doi 10.1145/3332184

Towards Adversarial Malware Detection: Lessons Learned from PDF-based Attacks

Authors: Davide Maiorca, Battista Biggio, Giorgio Giacinto

Abstract: Malware still constitutes a major threat in the cybersecurity landscape, also due to the widespread use of infection vectors such as documents. These infection vectors hide embedded malicious code to the victim users, facilitating the use of social engineering techniques to infect their machines. Research showed that machine-learning algorithms provide effective detection mechanisms against such t… ▽ More Malware still constitutes a major threat in the cybersecurity landscape, also due to the widespread use of infection vectors such as documents. These infection vectors hide embedded malicious code to the victim users, facilitating the use of social engineering techniques to infect their machines. Research showed that machine-learning algorithms provide effective detection mechanisms against such threats, but the existence of an arms race in adversarial settings has recently challenged such systems. In this work, we focus on malware embedded in PDF files as a representative case of such an arms race. We start by providing a comprehensive taxonomy of the different approaches used to generate PDF malware, and of the corresponding learning-based detection systems. We then categorize threats specifically targeted against learning-based PDF malware detectors, using a well-established framework in the field of adversarial machine learning. This framework allows us to categorize known vulnerabilities of learning-based PDF malware detectors and to identify novel attacks that may threaten such systems, along with the potential defense mechanisms that can mitigate the impact of such threats. We conclude the paper by discussing how such findings highlight promising research directions towards tackling the more general challenge of designing robust malware detectors in adversarial settings. △ Less

Submitted 14 April, 2020; v1 submitted 2 November, 2018; originally announced November 2018.

Journal ref: ACM Computing Surveys, Vol. 52, No. 4, Article 78, 2019

arXiv:1805.09563 [pdf, other]

doi 10.1016/j.cose.2019.06.004

On the Effectiveness of System API-Related Information for Android Ransomware Detection

Authors: Michele Scalas, Davide Maiorca, Francesco Mercaldo, Corrado Aaron Visaggio, Fabio Martinelli, Giorgio Giacinto

Abstract: Ransomware constitutes a significant threat to the Android operating system. It can either lock or encrypt the target devices, and victims are forced to pay ransoms to restore their data. Hence, the prompt detection of such attacks has a priority in comparison to other malicious threats. Previous works on Android malware detection mainly focused on Machine Learning-oriented approaches that were ta… ▽ More Ransomware constitutes a significant threat to the Android operating system. It can either lock or encrypt the target devices, and victims are forced to pay ransoms to restore their data. Hence, the prompt detection of such attacks has a priority in comparison to other malicious threats. Previous works on Android malware detection mainly focused on Machine Learning-oriented approaches that were tailored to identifying malware families, without a clear focus on ransomware. More specifically, such approaches resorted to complex information types such as permissions, user-implemented API calls, and native calls. However, this led to significant drawbacks concerning complexity, resilience against obfuscation, and explainability. To overcome these issues, in this paper, we propose and discuss learning-based detection strategies that rely on System API information. These techniques leverage the fact that ransomware attacks heavily resort to System API to perform their actions, and allow distinguishing between generic malware, ransomware and goodware. We tested three different ways of employing System API information, i.e., through packages, classes, and methods, and we compared their performances to other, more complex state-of-the-art approaches. The attained results showed that systems based on System API could detect ransomware and generic malware with very good accuracy, comparable to systems that employed more complex information. Moreover, the proposed systems could accurately detect novel samples in the wild and showed resilience against static obfuscation attempts. Finally, to guarantee early on-device detection, we developed and released on the Android platform a complete ransomware and malware detector (R-PackDroid) that employed one of the methodologies proposed in this paper. △ Less

Submitted 26 June, 2019; v1 submitted 24 May, 2018; originally announced May 2018.

Journal ref: Computers & Security 86C (2019) pp. 168-182

arXiv:1803.04173 [pdf, other]

Adversarial Malware Binaries: Evading Deep Learning for Malware Detection in Executables

Authors: Bojan Kolosnjaji, Ambra Demontis, Battista Biggio, Davide Maiorca, Giorgio Giacinto, Claudia Eckert, Fabio Roli

Abstract: Machine-learning methods have already been exploited as useful tools for detecting malicious executable files. They leverage data retrieved from malware samples, such as header fields, instruction sequences, or even raw bytes, to learn models that discriminate between benign and malicious software. However, it has also been shown that machine learning and deep neural networks can be fooled by evas… ▽ More Machine-learning methods have already been exploited as useful tools for detecting malicious executable files. They leverage data retrieved from malware samples, such as header fields, instruction sequences, or even raw bytes, to learn models that discriminate between benign and malicious software. However, it has also been shown that machine learning and deep neural networks can be fooled by evasion attacks (also referred to as adversarial examples), i.e., small changes to the input data that cause misclassification at test time. In this work, we investigate the vulnerability of malware detection methods that use deep networks to learn from raw bytes. We propose a gradient-based attack that is capable of evading a recently-proposed deep network suited to this purpose by only changing few specific bytes at the end of each malware sample, while preserving its intrusive functionality. Promising results show that our adversarial malware binaries evade the targeted network with high probability, even though less than 1% of their bytes are modified. △ Less

Submitted 12 March, 2018; originally announced March 2018.

arXiv:1803.03544 [pdf, other]

Explaining Black-box Android Malware Detection

Authors: Marco Melis, Davide Maiorca, Battista Biggio, Giorgio Giacinto, Fabio Roli

Abstract: Machine-learning models have been recently used for detecting malicious Android applications, reporting impressive performances on benchmark datasets, even when trained only on features statically extracted from the application, such as system calls and permissions. However, recent findings have highlighted the fragility of such in-vitro evaluations with benchmark datasets, showing that very few c… ▽ More Machine-learning models have been recently used for detecting malicious Android applications, reporting impressive performances on benchmark datasets, even when trained only on features statically extracted from the application, such as system calls and permissions. However, recent findings have highlighted the fragility of such in-vitro evaluations with benchmark datasets, showing that very few changes to the content of Android malware may suffice to evade detection. How can we thus trust that a malware detector performing well on benchmark data will continue to do so when deployed in an operating environment? To mitigate this issue, the most popular Android malware detectors use linear, explainable machine-learning models to easily identify the most influential features contributing to each decision. In this work, we generalize this approach to any black-box machine- learning model, by leveraging a gradient-based approach to identify the most influential local features. This enables using nonlinear models to potentially increase accuracy without sacrificing interpretability of decisions. Our approach also highlights the global characteristics learned by the model to discriminate between benign and malware applications. Finally, as shown by our empirical analysis on a popular Android malware detection task, it also helps identifying potential vulnerabilities of linear and nonlinear models against adversarial manipulations. △ Less

Submitted 29 October, 2018; v1 submitted 9 March, 2018; originally announced March 2018.

Comments: Published on the Proceedings of 26th European Signal Processing Conference (EUSIPCO '18)

arXiv:1802.01185 [pdf, other]

IntelliAV: Building an Effective On-Device Android Malware Detector

Authors: Mansour Ahmadi, Angelo Sotgiu, Giorgio Giacinto

Abstract: The importance of employing machine learning for malware detection has become explicit to the security community. Several anti-malware vendors have claimed and advertised the application of machine learning in their products in which the inference phase is performed on servers and high-performance machines, but the feasibility of such approaches on mobile devices with limited computational resourc… ▽ More The importance of employing machine learning for malware detection has become explicit to the security community. Several anti-malware vendors have claimed and advertised the application of machine learning in their products in which the inference phase is performed on servers and high-performance machines, but the feasibility of such approaches on mobile devices with limited computational resources has not yet been assessed by the research community, vendors still being skeptical. In this paper, we aim to show the practicality of devising a learning-based anti-malware on Android mobile devices, first. Furthermore, we aim to demonstrate the significance of such a tool to cease new and evasive malware that can not easily be caught by signature-based or offline learning-based security tools. To this end, we first propose the extraction of a set of lightweight yet powerful features from Android applications. Then, we embed these features in a vector space to build an effective as well as efficient model. Hence, the model can perform the inference on the device for detecting potentially harmful applications. We show that without resorting to any signatures and relying only on a training phase involving a reasonable set of samples, the proposed system, named IntelliAV, provides more satisfying performances than the popular major anti-malware products. Moreover, we evaluate the robustness of IntelliAV against common obfuscation techniques where most of the anti-malware solutions get affected. △ Less

Submitted 4 February, 2018; originally announced February 2018.

arXiv:1710.10225 [pdf, other]

doi 10.1016/j.cose.2020.101901

Adversarial Detection of Flash Malware: Limitations and Open Issues

Authors: Davide Maiorca, Ambra Demontis, Battista Biggio, Fabio Roli, Giorgio Giacinto

Abstract: During the past four years, Flash malware has become one of the most insidious threats to detect, with almost 600 critical vulnerabilities targeting Adobe Flash disclosed in the wild. Research has shown that machine learning can be successfully used to detect Flash malware by leveraging static analysis to extract information from the structure of the file or its bytecode. However, the robustness o… ▽ More During the past four years, Flash malware has become one of the most insidious threats to detect, with almost 600 critical vulnerabilities targeting Adobe Flash disclosed in the wild. Research has shown that machine learning can be successfully used to detect Flash malware by leveraging static analysis to extract information from the structure of the file or its bytecode. However, the robustness of Flash malware detectors against well-crafted evasion attempts - also known as adversarial examples - has never been investigated. In this paper, we propose a security evaluation of a novel, representative Flash detector that embeds a combination of the prominent, static features employed by state-of-the-art tools. In particular, we discuss how to craft adversarial Flash malware examples, showing that it suffices to manipulate the corresponding source malware samples slightly to evade detection. We then empirically demonstrate that popular defense techniques proposed to mitigate evasion attempts, including re-training on adversarial examples, may not always be sufficient to ensure robustness. We argue that this occurs when the feature vectors extracted from adversarial examples become indistinguishable from those of benign data, meaning that the given feature representation is intrinsically vulnerable. In this respect, we are the first to formally define and quantitatively characterize this vulnerability, highlighting when an attack can be countered by solely improving the security of the learning algorithm, or when it requires also considering additional features. We conclude the paper by suggesting alternative research directions to improve the security of learning-based Flash malware detectors. △ Less

Submitted 14 July, 2020; v1 submitted 27 October, 2017; originally announced October 2017.

Journal ref: Computers and Security, Volume 96, September 2020, 101901

arXiv:1708.06131 [pdf, other]

doi 10.1007/978-3-642-40994-3_25

Evasion Attacks against Machine Learning at Test Time

Authors: Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Srndic, Pavel Laskov, Giorgio Giacinto, Fabio Roli

Abstract: In security-sensitive applications, the success of machine learning depends on a thorough vetting of their resistance to adversarial data. In one pertinent, well-motivated attack scenario, an adversary may attempt to evade a deployed system at test time by carefully manipulating attack samples. In this work, we present a simple but effective gradient-based approach that can be exploited to systema… ▽ More In security-sensitive applications, the success of machine learning depends on a thorough vetting of their resistance to adversarial data. In one pertinent, well-motivated attack scenario, an adversary may attempt to evade a deployed system at test time by carefully manipulating attack samples. In this work, we present a simple but effective gradient-based approach that can be exploited to systematically assess the security of several, widely-used classification algorithms against evasion attacks. Following a recently proposed framework for security evaluation, we simulate attack scenarios that exhibit different risk levels for the classifier by increasing the attacker's knowledge of the system and her ability to manipulate attack samples. This gives the classifier designer a better picture of the classifier performance under evasion attacks, and allows him to perform a more informed model selection (or parameter setting). We evaluate our approach on the relevant security task of malware detection in PDF files, and show that such systems can be easily evaded. We also sketch some countermeasures suggested by our analysis. △ Less

Submitted 21 August, 2017; originally announced August 2017.

Comments: In this paper, in 2013, we were the first to introduce the notion of evasion attacks (adversarial examples) created with high confidence (instead of minimum-distance misclassifications), and the notion of surrogate learners (substitute models). These two concepts are now widely re-used in develo** attacks against deep networks (even if not always referring to the ideas reported in this work). arXiv admin note: text overlap with arXiv:1401.7727

Journal ref: ECML PKDD, Part III, vol. 8190, LNCS, pp. 387--402. Springer, 2013

arXiv:1704.08996 [pdf, other]

Yes, Machine Learning Can Be More Secure! A Case Study on Android Malware Detection

Authors: Ambra Demontis, Marco Melis, Battista Biggio, Davide Maiorca, Daniel Arp, Konrad Rieck, Igino Corona, Giorgio Giacinto, Fabio Roli

Abstract: To cope with the increasing variability and sophistication of modern attacks, machine learning has been widely adopted as a statistically-sound tool for malware detection. However, its security against well-crafted attacks has not only been recently questioned, but it has been shown that machine learning exhibits inherent vulnerabilities that can be exploited to evade detection at test time. In ot… ▽ More To cope with the increasing variability and sophistication of modern attacks, machine learning has been widely adopted as a statistically-sound tool for malware detection. However, its security against well-crafted attacks has not only been recently questioned, but it has been shown that machine learning exhibits inherent vulnerabilities that can be exploited to evade detection at test time. In other words, machine learning itself can be the weakest link in a security system. In this paper, we rely upon a previously-proposed attack framework to categorize potential attack scenarios against learning-based malware detection tools, by modeling attackers with different skills and capabilities. We then define and implement a set of corresponding evasion attacks to thoroughly assess the security of Drebin, an Android malware detector. The main contribution of this work is the proposal of a simple and scalable secure-learning paradigm that mitigates the impact of evasion attacks, while only slightly worsening the detection rate in the absence of attack. We finally argue that our secure-learning approach can also be readily applied to other malware detection tasks. △ Less

Submitted 28 April, 2017; originally announced April 2017.

Comments: Accepted for publication on IEEE Transactions on Dependable and Secure Computing

arXiv:1511.04317 [pdf, other]

Novel Feature Extraction, Selection and Fusion for Effective Malware Family Classification

Authors: Mansour Ahmadi, Dmitry Ulyanov, Stanislav Semenov, Mikhail Trofimov, Giorgio Giacinto

Abstract: Modern malware is designed with mutation characteristics, namely polymorphism and metamorphism, which causes an enormous growth in the number of variants of malware samples. Categorization of malware samples on the basis of their behaviors is essential for the computer security community, because they receive huge number of malware everyday, and the signature extraction process is usually based on… ▽ More Modern malware is designed with mutation characteristics, namely polymorphism and metamorphism, which causes an enormous growth in the number of variants of malware samples. Categorization of malware samples on the basis of their behaviors is essential for the computer security community, because they receive huge number of malware everyday, and the signature extraction process is usually based on malicious parts characterizing malware families. Microsoft released a malware classification challenge in 2015 with a huge dataset of near 0.5 terabytes of data, containing more than 20K malware samples. The analysis of this dataset inspired the development of a novel paradigm that is effective in categorizing malware variants into their actual family groups. This paradigm is presented and discussed in the present paper, where emphasis has been given to the phases related to the extraction, and selection of a set of novel features for the effective representation of malware samples. Features can be grouped according to different characteristics of malware behavior, and their fusion is performed according to a per-class weighting paradigm. The proposed method achieved a very high accuracy ($\approx$ 0.998) on the Microsoft Malware Challenge dataset. △ Less

Submitted 10 March, 2016; v1 submitted 13 November, 2015; originally announced November 2015.

arXiv:1401.7727 [pdf, other]

Security Evaluation of Support Vector Machines in Adversarial Environments

Authors: Battista Biggio, Igino Corona, Blaine Nelson, Benjamin I. P. Rubinstein, Davide Maiorca, Giorgio Fumera, Giorgio Giacinto, and Fabio Roli

Abstract: Support Vector Machines (SVMs) are among the most popular classification techniques adopted in security applications like malware detection, intrusion detection, and spam filtering. However, if SVMs are to be incorporated in real-world security systems, they must be able to cope with attack patterns that can either mislead the learning algorithm (poisoning), evade detection (evasion), or gain info… ▽ More Support Vector Machines (SVMs) are among the most popular classification techniques adopted in security applications like malware detection, intrusion detection, and spam filtering. However, if SVMs are to be incorporated in real-world security systems, they must be able to cope with attack patterns that can either mislead the learning algorithm (poisoning), evade detection (evasion), or gain information about their internal parameters (privacy breaches). The main contributions of this chapter are twofold. First, we introduce a formal general framework for the empirical evaluation of the security of machine-learning systems. Second, according to our framework, we demonstrate the feasibility of evasion, poisoning and privacy attacks against SVMs in real-world security problems. For each attack technique, we evaluate its impact and discuss whether (and how) it can be countered through an adversary-aware design of SVMs. Our experiments are easily reproducible thanks to open-source code that we have made available, together with all the employed datasets, on a public repository. △ Less

Submitted 29 January, 2014; originally announced January 2014.

Comments: 47 pages, 9 figures; chapter accepted into book 'Support Vector Machine Applications'

Showing 1–18 of 18 results for author: Giacinto, G