Search | arXiv e-print repository

doi 10.1109/ICSE43902.2021.00071

Fuzzing Symbolic Expressions

Authors: Luca Borzacchiello, Emilio Coppa, Camil Demetrescu

Abstract: Recent years have witnessed a wide array of results in software testing, exploring different approaches and methodologies ranging from fuzzers to symbolic engines, with a full spectrum of instances in between such as concolic execution and hybrid fuzzing. A key ingredient of many of these tools is Satisfiability Modulo Theories (SMT) solvers, which are used to reason over symbolic expressions coll… ▽ More Recent years have witnessed a wide array of results in software testing, exploring different approaches and methodologies ranging from fuzzers to symbolic engines, with a full spectrum of instances in between such as concolic execution and hybrid fuzzing. A key ingredient of many of these tools is Satisfiability Modulo Theories (SMT) solvers, which are used to reason over symbolic expressions collected during the analysis. In this paper, we investigate whether techniques borrowed from the fuzzing domain can be applied to check whether symbolic formulas are satisfiable in the context of concolic and hybrid fuzzing engines, providing a viable alternative to classic SMT solving techniques. We devise a new approximate solver, FUZZY-SAT, and show that it is both competitive with and complementary to state-of-the-art solvers such as Z3 with respect to handling queries generated by hybrid fuzzers. △ Less

Submitted 12 February, 2021; originally announced February 2021.

Journal ref: Proceedings of the 43rd International Conference on Software Engineering (ICSE 2021)

arXiv:2012.06658 [pdf, other]

doi 10.1109/DSN48987.2021.00064

Hiding in the Particles: When Return-Oriented Programming Meets Program Obfuscation

Authors: Pietro Borrello, Emilio Coppa, Daniele Cono D'Elia

Abstract: Largely known for attack scenarios, code reuse techniques at a closer look reveal properties that are appealing also for program obfuscation. We explore the popular return-oriented programming paradigm under this light, transforming program functions into ROP chains that coexist seamlessly with the surrounding software stack. We show how to build chains that can withstand popular static and dynami… ▽ More Largely known for attack scenarios, code reuse techniques at a closer look reveal properties that are appealing also for program obfuscation. We explore the popular return-oriented programming paradigm under this light, transforming program functions into ROP chains that coexist seamlessly with the surrounding software stack. We show how to build chains that can withstand popular static and dynamic deobfuscation approaches, evaluating the robustness and overheads of the design over common programs. The results suggest a significant amount of computational resources would be required to carry a deobfuscation attack for secret finding and code coverage goals. △ Less

Submitted 6 April, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

Comments: Published in the proceedings of DSN'21 (51st IEEE/IFIP Int. Conf. on Dependable Systems and Networks). Code and BibTeX entry available at https://github.com/pietroborrello/raindrop

ACM Class: D.4.6

Journal ref: 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

arXiv:1911.00621 [pdf, other]

doi 10.1145/3395363.3397372

WEIZZ: Automatic Grey-box Fuzzing for Structured Binary Formats

Authors: Andrea Fioraldi, Daniele Cono D'Elia, Emilio Coppa

Abstract: Fuzzing technologies have evolved at a fast pace in recent years, revealing bugs in programs with ever increasing depth and speed. Applications working with complex formats are however more difficult to take on, as inputs need to meet certain format-specific characteristics to get through the initial parsing stage and reach deeper behaviors of the program. Unlike prior proposals based on manually… ▽ More Fuzzing technologies have evolved at a fast pace in recent years, revealing bugs in programs with ever increasing depth and speed. Applications working with complex formats are however more difficult to take on, as inputs need to meet certain format-specific characteristics to get through the initial parsing stage and reach deeper behaviors of the program. Unlike prior proposals based on manually written format specifications, in this paper we present a technique to automatically generate and mutate inputs for unknown chunk-based binary formats. We propose a technique to identify dependencies between input bytes and comparison instructions, and later use them to assign tags that characterize the processing logic of the program. Tags become the building block for structure-aware mutations involving chunks and fields of the input. We show that our techniques performs comparably to structure-aware fuzzing proposals that require human assistance. Our prototype implementation WEIZZ revealed 16 unknown bugs in widely used programs. △ Less

Submitted 12 August, 2020; v1 submitted 1 November, 2019; originally announced November 2019.

Journal ref: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2020

arXiv:1610.00502 [pdf, other]

A Survey of Symbolic Execution Techniques

Authors: Roberto Baldoni, Emilio Coppa, Daniele Cono D'Elia, Camil Demetrescu, Irene Finocchi

Abstract: Many security and software testing applications require checking whether certain properties of a program hold for any possible usage scenario. For instance, a tool for identifying software vulnerabilities may need to rule out the existence of any backdoor to bypass a program's authentication. One approach would be to test the program using different, possibly random inputs. As the backdoor may onl… ▽ More Many security and software testing applications require checking whether certain properties of a program hold for any possible usage scenario. For instance, a tool for identifying software vulnerabilities may need to rule out the existence of any backdoor to bypass a program's authentication. One approach would be to test the program using different, possibly random inputs. As the backdoor may only be hit for very specific program workloads, automated exploration of the space of possible inputs is of the essence. Symbolic execution provides an elegant solution to the problem, by systematically exploring many possible execution paths at the same time without necessarily requiring concrete inputs. Rather than taking on fully specified input values, the technique abstractly represents them as symbols, resorting to constraint solvers to construct actual instances that would cause property violations. Symbolic execution has been incubated in dozens of tools developed over the last four decades, leading to major practical breakthroughs in a number of prominent software reliability applications. The goal of this survey is to provide an overview of the main ideas, challenges, and solutions developed in the area, distilling them for a broad audience. The present survey has been accepted for publication at ACM Computing Surveys. If you are considering citing this survey, we would appreciate if you could use the following BibTeX entry: http://goo.gl/Hf5Fvc △ Less

Submitted 2 May, 2018; v1 submitted 3 October, 2016; originally announced October 2016.

Comments: This is the authors pre-print copy. If you are considering citing this survey, we would appreciate if you could use the following BibTeX entry: http://goo.gl/Hf5Fvc

Journal ref: ACM Computing Surveys 51(3), 2018. BibTeX entry: http://goo.gl/Hf5Fvc

arXiv:1503.09062 [pdf, other]

On data skewness, stragglers, and MapReduce progress indicators

Authors: Emilio Coppa, Irene Finocchi

Abstract: We tackle the problem of predicting the performance of MapReduce applications, designing accurate progress indicators that keep programmers informed on the percentage of completed computation time during the execution of a job. Through extensive experiments, we show that state-of-the-art progress indicators (including the one provided by Hadoop) can be seriously harmed by data skewness, load unbal… ▽ More We tackle the problem of predicting the performance of MapReduce applications, designing accurate progress indicators that keep programmers informed on the percentage of completed computation time during the execution of a job. Through extensive experiments, we show that state-of-the-art progress indicators (including the one provided by Hadoop) can be seriously harmed by data skewness, load unbalancing, and straggling tasks. This is mainly due to their implicit assumption that the running time depends linearly on the input size. We thus design a novel profile-guided progress indicator, called NearestFit, that operates without the linear hypothesis assumption and exploits a careful combination of nearest neighbor regression and statistical curve fitting techniques. Our theoretical progress model requires fine-grained profile data, that can be very difficult to manage in practice. To overcome this issue, we resort to computing accurate approximations for some of the quantities used in our model through space- and time-efficient data streaming algorithms. We implemented NearestFit on top of Hadoop 2.6.0. An extensive empirical assessment over the Amazon EC2 platform on a variety of real-world benchmarks shows that NearestFit is practical w.r.t. space and time overheads and that its accuracy is generally very good, even in scenarios where competitors incur non-negligible errors and wide prediction fluctuations. Overall, NearestFit significantly improves the current state-of-art on progress analysis for MapReduce. △ Less

Submitted 2 April, 2015; v1 submitted 31 March, 2015; originally announced March 2015.

arXiv:1304.3804 [pdf, ps, other]

Multithreaded Input-Sensitive Profiling

Authors: Emilio Coppa, Camil Demetrescu, Irene Finocchi, Romolo Marotta

Abstract: Input-sensitive profiling is a recent performance analysis technique that makes it possible to estimate the empirical cost function of individual routines of a program, hel** developers understand how performance scales to larger inputs and pinpoint asymptotic bottlenecks in the code. A current limitation of input-sensitive profilers is that they specifically target sequential computations, igno… ▽ More Input-sensitive profiling is a recent performance analysis technique that makes it possible to estimate the empirical cost function of individual routines of a program, hel** developers understand how performance scales to larger inputs and pinpoint asymptotic bottlenecks in the code. A current limitation of input-sensitive profilers is that they specifically target sequential computations, ignoring any communication between threads. In this paper we show how to overcome this limitation, extending the range of applicability of the original approach to multithreaded applications and to applications that operate on I/O streams. We develop new metrics for automatically estimating the size of the input given to each routine activation, addressing input produced by non-deterministic memory stores performed by other threads as well as by the OS kernel (e.g., in response to I/O or network operations). We provide real case studies, showing that our extension allows it to characterize the behavior of complex applications more precisely than previous approaches. An extensive experimental investigation on a variety of benchmark suites (including the SPEC OMP2012 and the PARSEC benchmarks) shows that our Valgrind-based input-sensitive profiler incurs an overhead comparable to other prominent heavyweight analysis tools, while collecting significantly more performance points from each profiling session and correctly characterizing both thread-induced and external input. △ Less

Submitted 13 April, 2013; originally announced April 2013.

MSC Class: 68N30 ACM Class: C.4; D.2.8

Showing 1–6 of 6 results for author: Coppa, E