-
Adversarial Attacks against Binary Similarity Systems
Authors:
Gianluca Capozzi,
Daniele Cono D'Elia,
Giuseppe Antonio Di Luna,
Leonardo Querzoni
Abstract:
In recent years, binary analysis gained traction as a fundamental approach to inspect software and guarantee its security. Due to the exponential increase of devices running software, much research is now moving towards new autonomous solutions based on deep learning models, as they have been showing state-of-the-art performances in solving binary analysis problems. One of the hot topics in this c…
▽ More
In recent years, binary analysis gained traction as a fundamental approach to inspect software and guarantee its security. Due to the exponential increase of devices running software, much research is now moving towards new autonomous solutions based on deep learning models, as they have been showing state-of-the-art performances in solving binary analysis problems. One of the hot topics in this context is binary similarity, which consists in determining if two functions in assembly code are compiled from the same source code. However, it is unclear how deep learning models for binary similarity behave in an adversarial context. In this paper, we study the resilience of binary similarity models against adversarial examples, showing that they are susceptible to both targeted and untargeted attacks (w.r.t. similarity goals) performed by black-box and white-box attackers. In more detail, we extensively test three current state-of-the-art solutions for binary similarity against two black-box greedy attacks, including a new technique that we call Spatial Greedy, and one white-box attack in which we repurpose a gradient-guided strategy used in attacks to image classifiers.
△ Less
Submitted 3 November, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
-
Where Did My Variable Go? Poking Holes in Incomplete Debug Information
Authors:
Cristian Assaiante,
Daniele Cono D'Elia,
Giuseppe Antonio Di Luna,
Leonardo Querzoni
Abstract:
The availability of debug information for optimized executables can largely ease crucial tasks such as crash analysis. Source-level debuggers use this information to display program state in terms of source code, allowing users to reason on it even when optimizations alter program structure extensively. A few recent endeavors have proposed effective methodologies for identifying incorrect instance…
▽ More
The availability of debug information for optimized executables can largely ease crucial tasks such as crash analysis. Source-level debuggers use this information to display program state in terms of source code, allowing users to reason on it even when optimizations alter program structure extensively. A few recent endeavors have proposed effective methodologies for identifying incorrect instances of debug information, which can mislead users by presenting them with an inconsistent program state.
In this work, we identify and study a related important problem: the completeness of debug information. Unlike correctness issues for which an unoptimized executable can serve as reference, we find there is no analogous oracle to deem when the cause behind an unreported part of program state is an unavoidable effect of optimization or a compiler implementation defect. In this scenario, we argue that empirically derived conjectures on the expected availability of debug information can serve as an effective means to expose classes of these defects.
We propose three conjectures involving variable values and study how often synthetic programs compiled with different configurations of the popular gcc and LLVM compilers deviate from them. We then discuss techniques to pinpoint the optimizations behind such violations and minimize bug reports accordingly. Our experiments revealed, among others, 24 bugs already confirmed by the developers of the gcc-gdb and clang-lldb ecosystems.
△ Less
Submitted 17 November, 2022;
originally announced November 2022.
-
Constantine: Automatic Side-Channel Resistance Using Efficient Control and Data Flow Linearization
Authors:
Pietro Borrello,
Daniele Cono D'Elia,
Leonardo Querzoni,
Cristiano Giuffrida
Abstract:
In the era of microarchitectural side channels, vendors scramble to deploy mitigations for transient execution attacks, but leave traditional side-channel attacks against sensitive software (e.g., crypto programs) to be fixed by developers by means of constant-time programming (i.e., absence of secret-dependent code/data patterns). Unfortunately, writing constant-time code by hand is hard, as evid…
▽ More
In the era of microarchitectural side channels, vendors scramble to deploy mitigations for transient execution attacks, but leave traditional side-channel attacks against sensitive software (e.g., crypto programs) to be fixed by developers by means of constant-time programming (i.e., absence of secret-dependent code/data patterns). Unfortunately, writing constant-time code by hand is hard, as evidenced by the many flaws discovered in production side channel-resistant code. Prior efforts to automatically transform programs into constant-time equivalents offer limited security or compatibility guarantees, hindering their applicability to real-world software.
In this paper, we present Constantine, a compiler-based system to automatically harden programs against microarchitectural side channels. Constantine pursues a radical design point where secret-dependent control and data flows are completely linearized (i.e., all involved code/data accesses are always executed). This strategy provides strong security and compatibility guarantees by construction, but its natural implementation leads to state explosion in real-world programs. To address this challenge, Constantine relies on carefully designed optimizations such as just-in-time loop linearization and aggressive function cloning for fully context-sensitive points-to analysis, which not only address state explosion, but also lead to an efficient and compatible solution. Constantine yields overheads as low as 16% on standard benchmarks and can handle a fully-fledged component from the production wolfSSL library.
△ Less
Submitted 14 September, 2021; v1 submitted 21 April, 2021;
originally announced April 2021.
-
Hiding in the Particles: When Return-Oriented Programming Meets Program Obfuscation
Authors:
Pietro Borrello,
Emilio Coppa,
Daniele Cono D'Elia
Abstract:
Largely known for attack scenarios, code reuse techniques at a closer look reveal properties that are appealing also for program obfuscation. We explore the popular return-oriented programming paradigm under this light, transforming program functions into ROP chains that coexist seamlessly with the surrounding software stack. We show how to build chains that can withstand popular static and dynami…
▽ More
Largely known for attack scenarios, code reuse techniques at a closer look reveal properties that are appealing also for program obfuscation. We explore the popular return-oriented programming paradigm under this light, transforming program functions into ROP chains that coexist seamlessly with the surrounding software stack. We show how to build chains that can withstand popular static and dynamic deobfuscation approaches, evaluating the robustness and overheads of the design over common programs. The results suggest a significant amount of computational resources would be required to carry a deobfuscation attack for secret finding and code coverage goals.
△ Less
Submitted 6 April, 2021; v1 submitted 11 December, 2020;
originally announced December 2020.
-
Designing Robust API Monitoring Solutions
Authors:
Daniele Cono D'Elia,
Simone Nicchi,
Matteo Mariani,
Matteo Marini,
Federico Palmaro
Abstract:
Tracing the sequence of library and system calls that a program makes is very helpful in the characterization of its interactions with the surrounding environment and ultimately of its semantics. Due to entanglements of real-world software stacks, accomplishing this task can be surprisingly challenging as we take accuracy, reliability, and transparency into the equation. To manage these dimensions…
▽ More
Tracing the sequence of library and system calls that a program makes is very helpful in the characterization of its interactions with the surrounding environment and ultimately of its semantics. Due to entanglements of real-world software stacks, accomplishing this task can be surprisingly challenging as we take accuracy, reliability, and transparency into the equation. To manage these dimensions effectively, we identify six challenges that API monitoring solutions should overcome and outline actionable design points for them, reporting insights from our experience in building API tracers for software security research. We detail two implementation variants, based on hardware-assisted virtualization (realizing the first general-purpose user-space tracer of this kind) and on dynamic binary translation, that achieve API monitoring robustly. We share our SNIPER system as open source.
△ Less
Submitted 11 March, 2021; v1 submitted 1 May, 2020;
originally announced May 2020.
-
WEIZZ: Automatic Grey-box Fuzzing for Structured Binary Formats
Authors:
Andrea Fioraldi,
Daniele Cono D'Elia,
Emilio Coppa
Abstract:
Fuzzing technologies have evolved at a fast pace in recent years, revealing bugs in programs with ever increasing depth and speed. Applications working with complex formats are however more difficult to take on, as inputs need to meet certain format-specific characteristics to get through the initial parsing stage and reach deeper behaviors of the program. Unlike prior proposals based on manually…
▽ More
Fuzzing technologies have evolved at a fast pace in recent years, revealing bugs in programs with ever increasing depth and speed. Applications working with complex formats are however more difficult to take on, as inputs need to meet certain format-specific characteristics to get through the initial parsing stage and reach deeper behaviors of the program. Unlike prior proposals based on manually written format specifications, in this paper we present a technique to automatically generate and mutate inputs for unknown chunk-based binary formats. We propose a technique to identify dependencies between input bytes and comparison instructions, and later use them to assign tags that characterize the processing logic of the program. Tags become the building block for structure-aware mutations involving chunks and fields of the input. We show that our techniques performs comparably to structure-aware fuzzing proposals that require human assistance. Our prototype implementation WEIZZ revealed 16 unknown bugs in widely used programs.
△ Less
Submitted 12 August, 2020; v1 submitted 1 November, 2019;
originally announced November 2019.
-
On-Stack Replacement à la Carte
Authors:
Daniele Cono D'Elia,
Camil Demetrescu
Abstract:
On-stack replacement (OSR) dynamically transfers execution between different code versions. This mechanism is used in mainstream runtime systems to support adaptive and speculative optimizations by running code tailored to provide the best expected performance for the actual workload. Current approaches either restrict the program points where OSR can be fired or require complex optimization-speci…
▽ More
On-stack replacement (OSR) dynamically transfers execution between different code versions. This mechanism is used in mainstream runtime systems to support adaptive and speculative optimizations by running code tailored to provide the best expected performance for the actual workload. Current approaches either restrict the program points where OSR can be fired or require complex optimization-specific operations to realign the program's state during a transition. The engineering effort to implement OSR and the lack of abstractions make it rarely accessible to the research community, leaving fundamental question regarding its flexibility largely unexplored.
In this article we make a first step towards a provably sound abstract framework for OSR. We show that compiler optimizations can be made OSR-aware in isolation, and then safely composed. We identify a class of transformations, which we call live-variable equivalent (LVE), that captures a natural property of fundamental compiler optimizations, and devise an algorithm to automatically generate the OSR machinery required for an LVE transition at arbitrary program locations.
We present an implementation of our ideas in LLVM and evaluate it against prominent benchmarks, showing that bidirectional OSR transitions are possible almost everywhere in the code in the presence of common, unhindered global optimizations. We then discuss the end-to-end utility of our techniques in source-level debugging of optimized code, showing how our algorithms can provide novel building blocks for debuggers for both executables and managed runtimes.
△ Less
Submitted 8 August, 2017;
originally announced August 2017.
-
A Survey of Symbolic Execution Techniques
Authors:
Roberto Baldoni,
Emilio Coppa,
Daniele Cono D'Elia,
Camil Demetrescu,
Irene Finocchi
Abstract:
Many security and software testing applications require checking whether certain properties of a program hold for any possible usage scenario. For instance, a tool for identifying software vulnerabilities may need to rule out the existence of any backdoor to bypass a program's authentication. One approach would be to test the program using different, possibly random inputs. As the backdoor may onl…
▽ More
Many security and software testing applications require checking whether certain properties of a program hold for any possible usage scenario. For instance, a tool for identifying software vulnerabilities may need to rule out the existence of any backdoor to bypass a program's authentication. One approach would be to test the program using different, possibly random inputs. As the backdoor may only be hit for very specific program workloads, automated exploration of the space of possible inputs is of the essence. Symbolic execution provides an elegant solution to the problem, by systematically exploring many possible execution paths at the same time without necessarily requiring concrete inputs. Rather than taking on fully specified input values, the technique abstractly represents them as symbols, resorting to constraint solvers to construct actual instances that would cause property violations. Symbolic execution has been incubated in dozens of tools developed over the last four decades, leading to major practical breakthroughs in a number of prominent software reliability applications. The goal of this survey is to provide an overview of the main ideas, challenges, and solutions developed in the area, distilling them for a broad audience.
The present survey has been accepted for publication at ACM Computing Surveys. If you are considering citing this survey, we would appreciate if you could use the following BibTeX entry: http://goo.gl/Hf5Fvc
△ Less
Submitted 2 May, 2018; v1 submitted 3 October, 2016;
originally announced October 2016.
-
Ball-Larus Path Profiling Across Multiple Loop iterations
Authors:
Daniele Cono D'Elia,
Camil Demetrescu,
Irene Finocchi
Abstract:
Identifying the hottest paths in the control flow graph of a routine can direct optimizations to portions of the code where most resources are consumed. This powerful methodology, called path profiling, was introduced by Ball and Larus in the mid 90s and has received considerable attention in the last 15 years for its practical relevance. A shortcoming of Ball-Larus path profiling was the inabilit…
▽ More
Identifying the hottest paths in the control flow graph of a routine can direct optimizations to portions of the code where most resources are consumed. This powerful methodology, called path profiling, was introduced by Ball and Larus in the mid 90s and has received considerable attention in the last 15 years for its practical relevance. A shortcoming of Ball-Larus path profiling was the inability to profile cyclic paths, making it difficult to mine interesting execution patterns that span multiple loop iterations. Previous results, based on rather complex algorithms, have attempted to circumvent this limitation at the price of significant performance losses already for a small number of iterations. In this paper, we present a new approach to multiple iterations path profiling, based on data structures built on top of the original Ball-Larus numbering technique. Our approach allows it to profile all executed paths obtained as a concatenation of up to k Ball-Larus acyclic paths, where k is a user-defined parameter. An extensive experimental investigation on a large variety of Java benchmarks on the Jikes RVM shows that, surprisingly, our approach can be even faster than Ball-Larus due to fewer operations on smaller hash tables, producing compact representations of cyclic paths even for large values of k.
△ Less
Submitted 18 April, 2013;
originally announced April 2013.