-
A Systematization of Cybersecurity Regulations, Standards and Guidelines for the Healthcare Sector
Authors:
Maria Patrizia Carello,
Alberto Marchetti Spaccamela,
Leonardo Querzoni,
Marco Angelini
Abstract:
The growing adoption of IT solutions in the healthcare sector is leading to a steady increase in the number of cybersecurity incidents. As a result, organizations worldwide have introduced regulations, standards, and best practices to address cybersecurity and data protection issues in this sector. However, the application of this large corpus of documents presents operational difficulties, and op…
▽ More
The growing adoption of IT solutions in the healthcare sector is leading to a steady increase in the number of cybersecurity incidents. As a result, organizations worldwide have introduced regulations, standards, and best practices to address cybersecurity and data protection issues in this sector. However, the application of this large corpus of documents presents operational difficulties, and operators continue to lag behind in resilience to cyber attacks. This paper contributes a systematization of the significant cybersecurity documents relevant to the healthcare sector. We collected the 49 most significant documents and used the NIST cybersecurity framework to categorize key information and support the implementation of cybersecurity measures.
△ Less
Submitted 28 April, 2023;
originally announced April 2023.
-
Adversarial Attacks against Binary Similarity Systems
Authors:
Gianluca Capozzi,
Daniele Cono D'Elia,
Giuseppe Antonio Di Luna,
Leonardo Querzoni
Abstract:
In recent years, binary analysis gained traction as a fundamental approach to inspect software and guarantee its security. Due to the exponential increase of devices running software, much research is now moving towards new autonomous solutions based on deep learning models, as they have been showing state-of-the-art performances in solving binary analysis problems. One of the hot topics in this c…
▽ More
In recent years, binary analysis gained traction as a fundamental approach to inspect software and guarantee its security. Due to the exponential increase of devices running software, much research is now moving towards new autonomous solutions based on deep learning models, as they have been showing state-of-the-art performances in solving binary analysis problems. One of the hot topics in this context is binary similarity, which consists in determining if two functions in assembly code are compiled from the same source code. However, it is unclear how deep learning models for binary similarity behave in an adversarial context. In this paper, we study the resilience of binary similarity models against adversarial examples, showing that they are susceptible to both targeted and untargeted attacks (w.r.t. similarity goals) performed by black-box and white-box attackers. In more detail, we extensively test three current state-of-the-art solutions for binary similarity against two black-box greedy attacks, including a new technique that we call Spatial Greedy, and one white-box attack in which we repurpose a gradient-guided strategy used in attacks to image classifiers.
△ Less
Submitted 3 November, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
-
Where Did My Variable Go? Poking Holes in Incomplete Debug Information
Authors:
Cristian Assaiante,
Daniele Cono D'Elia,
Giuseppe Antonio Di Luna,
Leonardo Querzoni
Abstract:
The availability of debug information for optimized executables can largely ease crucial tasks such as crash analysis. Source-level debuggers use this information to display program state in terms of source code, allowing users to reason on it even when optimizations alter program structure extensively. A few recent endeavors have proposed effective methodologies for identifying incorrect instance…
▽ More
The availability of debug information for optimized executables can largely ease crucial tasks such as crash analysis. Source-level debuggers use this information to display program state in terms of source code, allowing users to reason on it even when optimizations alter program structure extensively. A few recent endeavors have proposed effective methodologies for identifying incorrect instances of debug information, which can mislead users by presenting them with an inconsistent program state.
In this work, we identify and study a related important problem: the completeness of debug information. Unlike correctness issues for which an unoptimized executable can serve as reference, we find there is no analogous oracle to deem when the cause behind an unreported part of program state is an unavoidable effect of optimization or a compiler implementation defect. In this scenario, we argue that empirically derived conjectures on the expected availability of debug information can serve as an effective means to expose classes of these defects.
We propose three conjectures involving variable values and study how often synthetic programs compiled with different configurations of the popular gcc and LLVM compilers deviate from them. We then discuss techniques to pinpoint the optimizations behind such violations and minimize bug reports accordingly. Our experiments revealed, among others, 24 bugs already confirmed by the developers of the gcc-gdb and clang-lldb ecosystems.
△ Less
Submitted 17 November, 2022;
originally announced November 2022.
-
BinBert: Binary Code Understanding with a Fine-tunable and Execution-aware Transformer
Authors:
Fiorella Artuso,
Marco Mormando,
Giuseppe A. Di Luna,
Leonardo Querzoni
Abstract:
A recent trend in binary code analysis promotes the use of neural solutions based on instruction embedding models. An instruction embedding model is a neural network that transforms sequences of assembly instructions into embedding vectors. If the embedding network is trained such that the translation from code to vectors partially preserves the semantic, the network effectively represents an asse…
▽ More
A recent trend in binary code analysis promotes the use of neural solutions based on instruction embedding models. An instruction embedding model is a neural network that transforms sequences of assembly instructions into embedding vectors. If the embedding network is trained such that the translation from code to vectors partially preserves the semantic, the network effectively represents an assembly code model.
In this paper we present BinBert, a novel assembly code model. BinBert is built on a transformer pre-trained on a huge dataset of both assembly instruction sequences and symbolic execution information. BinBert can be applied to assembly instructions sequences and it is fine-tunable, i.e. it can be re-trained as part of a neural architecture on task-specific data. Through fine-tuning, BinBert learns how to apply the general knowledge acquired with pre-training to the specific task.
We evaluated BinBert on a multi-task benchmark that we specifically designed to test the understanding of assembly code. The benchmark is composed of several tasks, some taken from the literature, and a few novel tasks that we designed, with a mix of intrinsic and downstream tasks.
Our results show that BinBert outperforms state-of-the-art models for binary instruction embedding, raising the bar for binary code understanding.
△ Less
Submitted 13 August, 2022;
originally announced August 2022.
-
Constantine: Automatic Side-Channel Resistance Using Efficient Control and Data Flow Linearization
Authors:
Pietro Borrello,
Daniele Cono D'Elia,
Leonardo Querzoni,
Cristiano Giuffrida
Abstract:
In the era of microarchitectural side channels, vendors scramble to deploy mitigations for transient execution attacks, but leave traditional side-channel attacks against sensitive software (e.g., crypto programs) to be fixed by developers by means of constant-time programming (i.e., absence of secret-dependent code/data patterns). Unfortunately, writing constant-time code by hand is hard, as evid…
▽ More
In the era of microarchitectural side channels, vendors scramble to deploy mitigations for transient execution attacks, but leave traditional side-channel attacks against sensitive software (e.g., crypto programs) to be fixed by developers by means of constant-time programming (i.e., absence of secret-dependent code/data patterns). Unfortunately, writing constant-time code by hand is hard, as evidenced by the many flaws discovered in production side channel-resistant code. Prior efforts to automatically transform programs into constant-time equivalents offer limited security or compatibility guarantees, hindering their applicability to real-world software.
In this paper, we present Constantine, a compiler-based system to automatically harden programs against microarchitectural side channels. Constantine pursues a radical design point where secret-dependent control and data flows are completely linearized (i.e., all involved code/data accesses are always executed). This strategy provides strong security and compatibility guarantees by construction, but its natural implementation leads to state explosion in real-world programs. To address this challenge, Constantine relies on carefully designed optimizations such as just-in-time loop linearization and aggressive function cloning for fully context-sensitive points-to analysis, which not only address state explosion, but also lead to an efficient and compatible solution. Constantine yields overheads as low as 16% on standard benchmarks and can handle a fully-fledged component from the production wolfSSL library.
△ Less
Submitted 14 September, 2021; v1 submitted 21 April, 2021;
originally announced April 2021.
-
Who is Debugging the Debuggers? Exposing Debug Information Bugs in Optimized Binaries
Authors:
Giuseppe Antonio Di Luna,
Davide Italiano,
Luca Massarelli,
Sebastian Osterlund,
Cristiano Giuffrida,
Leonardo Querzoni
Abstract:
Despite the advancements in software testing, bugs still plague deployed software and result in crashes in production. When debugging issues -- sometimes caused by "heisenbugs" -- there is the need to interpret core dumps and reproduce the issue offline on the same binary deployed. This requires the entire toolchain (compiler, linker, debugger) to correctly generate and use debug information. Litt…
▽ More
Despite the advancements in software testing, bugs still plague deployed software and result in crashes in production. When debugging issues -- sometimes caused by "heisenbugs" -- there is the need to interpret core dumps and reproduce the issue offline on the same binary deployed. This requires the entire toolchain (compiler, linker, debugger) to correctly generate and use debug information. Little attention has been devoted to checking that such information is correctly preserved by modern toolchains' optimization stages. This is particularly important as managing debug information in optimized production binaries is non-trivial, often leading to toolchain bugs that may hinder post-deployment debugging efforts. In this paper, we present Debug$^{2}$, a framework to find debug information bugs in modern toolchains. Our framework feeds random source programs to the target toolchain and surgically compares the debugging behavior of their optimized/unoptimized binary variants. Such differential analysis allows Debug$^{2}$ to check invariants at each debugging step and detect bugs from invariant violations. Our invariants are based on the (in)consistency of common debug entities, such as source lines, stack frames, and function arguments. We show that, while simple, this strategy yields powerful cross-toolchain and cross-language invariants, which can pinpoint several bugs in modern toolchains. We have used Debug$^{2}$ to find 23 bugs in the LLVM toolchain (clang/lldb), 8 bugs in the GNU toolchain (GCC/gdb), and 3 in the Rust toolchain (rustc/lldb) -- with 14 bugs already fixed by the developers.
△ Less
Submitted 4 December, 2020; v1 submitted 27 November, 2020;
originally announced November 2020.
-
Synchronous Byzantine Lattice Agreement in ${\cal O}(\log (f))$ Rounds
Authors:
Giuseppe Antonio Di Luna,
Emmanuelle Anceaume,
Silvia Bonomi,
Leonardo Querzoni
Abstract:
In the Lattice Agreement (LA) problem, originally proposed by Attiya et al. \cite{Attiya:1995}, a set of processes has to decide on a chain of a lattice. More precisely, each correct process proposes an element $e$ of a certain join-semi lattice $L$ and it has to decide on a value that contains $e$. Moreover, any pair $p_i,p_j$ of correct processes has to decide two values $dec_i$ and $dec_j$ that…
▽ More
In the Lattice Agreement (LA) problem, originally proposed by Attiya et al. \cite{Attiya:1995}, a set of processes has to decide on a chain of a lattice. More precisely, each correct process proposes an element $e$ of a certain join-semi lattice $L$ and it has to decide on a value that contains $e$. Moreover, any pair $p_i,p_j$ of correct processes has to decide two values $dec_i$ and $dec_j$ that are comparable (e.g., $dec_i \leq dec_j$ or $dec_j < dec_i$). LA has been studied for its practical applications, as example it can be used to implement a snapshot objects \cite{Attiya:1995} or a replicated state machine with commutative operations \cite{Faleiro:2012}. Interestingly, the study of the Byzantine Lattice Agreement started only recently, and it has been mainly devoted to asynchronous systems. The synchronous case has been object of a recent pre-print \cite{Zheng:aa} where Zheng et al. propose an algorithm terminating in ${\cal O}(\sqrt f)$ rounds and tolerating $f < \lceil n/3 \rceil$ Byzantine processes.
In this paper we present new contributions for the synchronous case. We investigate the problem in the usual message passing model for a system of $n$ processes with distinct unique IDs. We first prove that, when only authenticated channels are available, the problem cannot be solved if $f=n/3$ or more processes are Byzantine. We then propose a novel algorithm that works in a synchronous system model with signatures (i.e., the {\em authenticated message} model), tolerates up to $f$ byzantine failures (where $f<n/3$) and that terminates in ${\cal O}(\log f)$ rounds. We discuss how to remove authenticated messages at the price of algorithm resiliency ($f < n/4$). Finally, we present a transformer that converts any synchronous LA algorithm to an algorithm for synchronous Generalised Lattice Agreement.
△ Less
Submitted 13 January, 2020; v1 submitted 8 January, 2020;
originally announced January 2020.
-
In Nomine Function: Naming Functions in Stripped Binaries with Neural Networks
Authors:
Fiorella Artuso,
Giuseppe Antonio Di Luna,
Luca Massarelli,
Leonardo Querzoni
Abstract:
In this paper we investigate the problem of automatically naming pieces of assembly code. Where by naming we mean assigning to an assembly function a string of words that would likely be assigned by a human reverse engineer. We formally and precisely define the framework in which our investigation takes place. That is we define the problem, we provide reasonable justifications for the choices that…
▽ More
In this paper we investigate the problem of automatically naming pieces of assembly code. Where by naming we mean assigning to an assembly function a string of words that would likely be assigned by a human reverse engineer. We formally and precisely define the framework in which our investigation takes place. That is we define the problem, we provide reasonable justifications for the choices that we made for the design of training and the tests. We performed an analysis on a large real-world corpora constituted by nearly 9 millions of functions taken from more than 22k softwares. In such framework we test baselines coming from the field of Natural Language Processing (e.g., Seq2Seq networks and Transformer). Interestingly, our evaluation shows promising results beating the state-of-the-art and reaching good performance. We investigate the applicability of tine-tuning (i.e., taking a model already trained on a large generic corpora and retraining it for a specific task). Such technique is popular and well-known in the NLP field. Our results confirm that fine-tuning is effective even when neural networks are applied to binaries. We show that a model, pre-trained on the aforementioned corpora, when fine-tuned has higher performances on specific domains (such as predicting names in system utilites, malware, etc).
△ Less
Submitted 4 February, 2021; v1 submitted 17 December, 2019;
originally announced December 2019.
-
Byzantine Generalized Lattice Agreement
Authors:
Giuseppe Antonio Di Luna,
Emmanuelle Anceaume,
Leonardo Querzoni
Abstract:
The paper investigates the Lattice Agreement (LA) problem in asynchronous systems. In LA each process proposes an element $e$ from a predetermined lattice, and has to decide on an element $e'$ of the lattice such that $e \leq e'$. Moreover, decisions of different processes have to be comparable (no two processes can decide two elements $e$ and $e'$ such that…
▽ More
The paper investigates the Lattice Agreement (LA) problem in asynchronous systems. In LA each process proposes an element $e$ from a predetermined lattice, and has to decide on an element $e'$ of the lattice such that $e \leq e'$. Moreover, decisions of different processes have to be comparable (no two processes can decide two elements $e$ and $e'$ such that $(e \not\leq e') \land (e' \not\leq e)$). It has been shown that Generalized LA (i.e., a version of LA proposing and deciding on sequences of values) can be used to build a Replicated State Machine (RSM) with commutative update operations. The key advantage of LA and Generalized LA is that they can be solved in asynchronous systems prone to crash-failures (this is not the case with standard Consensus). In this paper we assume Byzantine failures. We propose the Wait Till Safe (WTS) algorithm for LA, and we show that its resilience to $f \leq (n-1)/3$ Byzantines is optimal. We then generalize WTS obtaining a Generalized LA algorithm, namely GWTS. We use GWTS to build a RSM with commutative updates. Our RSM works in asynchronous systems and tolerates $f \leq (n-1)/3$ malicious entities. All our algorithms use the minimal assumption of authenticated channels. When the more powerful public signatures are available, we discuss how to improve the message complexity of our results (from quadratic to linear, when $f={\cal O}(1)$). At the best of our knowledge this is the first paper proposing a solution for Byzantine LA that works on any possible lattice, and it is the first work proposing a Byzantine tolerant RSM built on it.
△ Less
Submitted 13 February, 2020; v1 submitted 13 October, 2019;
originally announced October 2019.
-
Peel the onion: Recognition of Android apps behind the Tor Network
Authors:
Emanuele Petagna,
Giuseppe Laurenza,
Claudio Ciccotelli,
Leonardo Querzoni
Abstract:
In this work we show that Tor is vulnerable to app deanonymization attacks on Android devices through network traffic analysis. For this purpose, we describe a general methodology for performing an attack that allows to deanonymize the apps running on a target smartphone using Tor, which is the victim of the attack. Then, we discuss a Proof-of-Concept, implementing the methodology, that shows how…
▽ More
In this work we show that Tor is vulnerable to app deanonymization attacks on Android devices through network traffic analysis. For this purpose, we describe a general methodology for performing an attack that allows to deanonymize the apps running on a target smartphone using Tor, which is the victim of the attack. Then, we discuss a Proof-of-Concept, implementing the methodology, that shows how the attack can be performed in practice and allows to assess the deanonymization accuracy that it is possible to achieve. While attacks against Tor anonymity have been already gained considerable attention in the context of website fingerprinting in desktop environments, to the best of our knowledge this is the first work that highlights Tor vulnerability to apps deanonymization attacks on Android devices. In our experiments we achieved an accuracy of 97%.
△ Less
Submitted 14 January, 2019;
originally announced January 2019.
-
SAFE: Self-Attentive Function Embeddings for Binary Similarity
Authors:
Luca Massarelli,
Giuseppe Antonio Di Luna,
Fabio Petroni,
Leonardo Querzoni,
Roberto Baldoni
Abstract:
The binary similarity problem consists in determining if two functions are similar by only considering their compiled form. Advanced techniques for binary similarity recently gained momentum as they can be applied in several fields, such as copyright disputes, malware analysis, vulnerability detection, etc., and thus have an immediate practical impact. Current solutions compare functions by first…
▽ More
The binary similarity problem consists in determining if two functions are similar by only considering their compiled form. Advanced techniques for binary similarity recently gained momentum as they can be applied in several fields, such as copyright disputes, malware analysis, vulnerability detection, etc., and thus have an immediate practical impact. Current solutions compare functions by first transforming their binary code in multi-dimensional vector representations (embeddings), and then comparing vectors through simple and efficient geometric operations. However, embeddings are usually derived from binary code using manual feature extraction, that may fail in considering important function characteristics, or may consider features that are not important for the binary similarity problem. In this paper we propose SAFE, a novel architecture for the embedding of functions based on a self-attentive neural network. SAFE works directly on disassembled binary functions, does not require manual feature extraction, is computationally more efficient than existing solutions (i.e., it does not incur in the computational overhead of building or manipulating control flow graphs), and is more general as it works on stripped binaries and on multiple architectures. We report the results from a quantitative and qualitative analysis that show how SAFE provides a noticeable performance improvement with respect to previous solutions. Furthermore, we show how clusters of our embedding vectors are closely related to the semantic of the implemented algorithms, paving the way for further interesting applications (e.g. semantic-based binary function search).
△ Less
Submitted 19 December, 2019; v1 submitted 13 November, 2018;
originally announced November 2018.
-
Unsupervised Features Extraction for Binary Similarity Using Graph Embedding Neural Networks
Authors:
Roberto Baldoni,
Giuseppe Antonio Di Luna,
Luca Massarelli,
Fabio Petroni,
Leonardo Querzoni
Abstract:
In this paper we consider the binary similarity problem that consists in determining if two binary functions are similar only considering their compiled form. This problem is know to be crucial in several application scenarios, such as copyright disputes, malware analysis, vulnerability detection, etc. The current state-of-the-art solutions in this field work by creating an embedding model that ma…
▽ More
In this paper we consider the binary similarity problem that consists in determining if two binary functions are similar only considering their compiled form. This problem is know to be crucial in several application scenarios, such as copyright disputes, malware analysis, vulnerability detection, etc. The current state-of-the-art solutions in this field work by creating an embedding model that maps binary functions into vectors in $\mathbb{R}^{n}$. Such embedding model captures syntactic and semantic similarity between binaries, i.e., similar binary functions are mapped to points that are close in the vector space. This strategy has many advantages, one of them is the possibility to precompute embeddings of several binary functions, and then compare them with simple geometric operations (e.g., dot product). In [32] functions are first transformed in Annotated Control Flow Graphs (ACFGs) constituted by manually engineered features and then graphs are embedded into vectors using a deep neural network architecture. In this paper we propose and test several ways to compute annotated control flow graphs that use unsupervised approaches for feature learning, without incurring a human bias. Our methods are inspired after techniques used in the natural language processing community (e.g., we use word2vec to encode assembly instructions). We show that our approach is indeed successful, and it leads to better performance than previous state-of-the-art solutions. Furthermore, we report on a qualitative analysis of functions embeddings. We found interesting cases in which embeddings are clustered according to the semantic of the original binary function.
△ Less
Submitted 13 November, 2018; v1 submitted 23 October, 2018;
originally announced October 2018.
-
Android Malware Family Classification Based on Resource Consumption over Time
Authors:
Luca Massarelli,
Leonardo Aniello,
Claudio Ciccotelli,
Leonardo Querzoni,
Daniele Ucci,
Roberto Baldoni
Abstract:
The vast majority of today's mobile malware targets Android devices. This has pushed the research effort in Android malware analysis in the last years. An important task of malware analysis is the classification of malware samples into known families. Static malware analysis is known to fall short against techniques that change static characteristics of the malware (e.g. code obfuscation), while d…
▽ More
The vast majority of today's mobile malware targets Android devices. This has pushed the research effort in Android malware analysis in the last years. An important task of malware analysis is the classification of malware samples into known families. Static malware analysis is known to fall short against techniques that change static characteristics of the malware (e.g. code obfuscation), while dynamic analysis has proven effective against such techniques. To the best of our knowledge, the most notable work on Android malware family classification purely based on dynamic analysis is DroidScribe. With respect to DroidScribe, our approach is easier to reproduce. Our methodology only employs publicly available tools, does not require any modification to the emulated environment or Android OS, and can collect data from physical devices. The latter is a key factor, since modern mobile malware can detect the emulated environment and hide their malicious behavior. Our approach relies on resource consumption metrics available from the proc file system. Features are extracted through detrended fluctuation analysis and correlation. Finally, a SVM is employed to classify malware into families. We provide an experimental evaluation on malware samples from the Drebin dataset, where we obtain a classification accuracy of 82%, proving that our methodology achieves an accuracy comparable to that of DroidScribe. Furthermore, we make the software we developed publicly available, to ease the reproducibility of our results.
△ Less
Submitted 4 September, 2017;
originally announced September 2017.
-
Big Data in Critical Infrastructures Security Monitoring: Challenges and Opportunities
Authors:
L. Aniello,
A. Bondavalli,
A. Ceccarelli,
C. Ciccotelli,
M. Cinque,
F. Frattini,
A. Guzzo,
A. Pecchia,
A. Pugliese,
L. Querzoni,
S. Russo
Abstract:
Critical Infrastructures (CIs), such as smart power grids, transport systems, and financial infrastructures, are more and more vulnerable to cyber threats, due to the adoption of commodity computing facilities. Despite the use of several monitoring tools, recent attacks have proven that current defensive mechanisms for CIs are not effective enough against most advanced threats. In this paper we ex…
▽ More
Critical Infrastructures (CIs), such as smart power grids, transport systems, and financial infrastructures, are more and more vulnerable to cyber threats, due to the adoption of commodity computing facilities. Despite the use of several monitoring tools, recent attacks have proven that current defensive mechanisms for CIs are not effective enough against most advanced threats. In this paper we explore the idea of a framework leveraging multiple data sources to improve protection capabilities of CIs. Challenges and opportunities are discussed along three main research directions: i) use of distinct and heterogeneous data sources, ii) monitoring with adaptive granularity, and iii) attack modeling and runtime combination of multiple data analysis techniques.
△ Less
Submitted 7 May, 2014; v1 submitted 1 May, 2014;
originally announced May 2014.