-
Automatically Testing Functional Properties of Code Translation Models
Authors:
Hasan Ferit Eniser,
Valentin Wüstholz,
Maria Christakis
Abstract:
Large language models are becoming increasingly practical for translating code across programming languages, a process known as $transpiling$. Even though automated transpilation significantly boosts developer productivity, a key concern is whether the generated code is correct. Existing work initially used manually crafted test suites to test the translations of a small corpus of programs; these…
▽ More
Large language models are becoming increasingly practical for translating code across programming languages, a process known as $transpiling$. Even though automated transpilation significantly boosts developer productivity, a key concern is whether the generated code is correct. Existing work initially used manually crafted test suites to test the translations of a small corpus of programs; these test suites were later automated. In contrast, we devise the first approach for automated, functional, property-based testing of code translation models. Our general, user-provided specifications about the transpiled code capture a range of properties, from purely syntactic to purely semantic ones. As shown by our experiments, this approach is very effective in detecting property violations in popular code translation models, and therefore, in evaluating model quality with respect to given properties. We also go a step further and explore the usage scenario where a user simply aims to obtain a correct translation of some code with respect to certain properties without necessarily being concerned about the overall quality of the model. To this purpose, we develop the first property-guided search procedure for code translation models, where a model is repeatedly queried with slightly different parameters to produce alternative and potentially more correct translations. Our results show that this search procedure helps to obtain significantly better code translations.
△ Less
Submitted 29 January, 2024; v1 submitted 7 September, 2023;
originally announced September 2023.
-
Specifying and Testing $k$-Safety Properties for Machine-Learning Models
Authors:
Maria Christakis,
Hasan Ferit Eniser,
Jörg Hoffmann,
Adish Singla,
Valentin Wüstholz
Abstract:
Machine-learning models are becoming increasingly prevalent in our lives, for instance assisting in image-classification or decision-making tasks. Consequently, the reliability of these models is of critical importance and has resulted in the development of numerous approaches for validating and verifying their robustness and fairness. However, beyond such specific properties, it is challenging to…
▽ More
Machine-learning models are becoming increasingly prevalent in our lives, for instance assisting in image-classification or decision-making tasks. Consequently, the reliability of these models is of critical importance and has resulted in the development of numerous approaches for validating and verifying their robustness and fairness. However, beyond such specific properties, it is challenging to specify, let alone check, general functional-correctness expectations from models. In this paper, we take inspiration from specifications used in formal methods, expressing functional-correctness properties by reasoning about $k$ different executions, so-called $k$-safety properties. Considering a credit-screening model of a bank, the expected property that "if a person is denied a loan and their income decreases, they should still be denied the loan" is a 2-safety property. Here, we show the wide applicability of $k$-safety properties for machine-learning models and present the first specification language for expressing them. We also operationalize the language in a framework for automatically validating such properties using metamorphic testing. Our experiments show that our framework is effective in identifying property violations, and that detected bugs could be used to train better models.
△ Less
Submitted 13 June, 2022;
originally announced June 2022.
-
Compositional Verification of Smart Contracts Through Communication Abstraction (Extended)
Authors:
Scott Wesley,
Maria Christakis,
Jorge A. Navas,
Richard Trefler,
Valentin Wüstholz,
Arie Gurfinkel
Abstract:
Solidity smart contracts are programs that manage up to 2^160 users on a blockchain. Verifying a smart contract relative to all users is intractable due to state explosion. Existing solutions either restrict the number of users to under-approximate behaviour, or rely on manual proofs. In this paper, we present local bundles that reduce contracts with arbitrarily many users to sequential programs w…
▽ More
Solidity smart contracts are programs that manage up to 2^160 users on a blockchain. Verifying a smart contract relative to all users is intractable due to state explosion. Existing solutions either restrict the number of users to under-approximate behaviour, or rely on manual proofs. In this paper, we present local bundles that reduce contracts with arbitrarily many users to sequential programs with a few representative users. Each representative user abstracts concrete users that are locally symmetric to each other relative to the contract and the property. Our abstraction is semi-automated. The representatives depend on communication patterns, and are computed via static analysis. A summary for the behaviour of each representative is provided manually, but a default summary is often sufficient. Once obtained, a local bundle is amenable to sequential static analysis. We show that local bundles are relatively complete for parameterized safety verification, under moderate assumptions. We implement local bundle abstraction in SmartACE, and show order-of-magnitude speedups compared to a state-of-the-art verifier.
△ Less
Submitted 31 August, 2021; v1 submitted 18 July, 2021;
originally announced July 2021.
-
Automatically Tailoring Static Analysis to Custom Usage Scenarios
Authors:
Muhammad Numair Mansur,
Benjamin Mariano,
Maria Christakis,
Jorge A. Navas,
Valentin Wüstholz
Abstract:
In recent years, there has been significant progress in the development and industrial adoption of static analyzers. Such analyzers typically provide a large, if not huge, number of configurable options controlling the precision and performance of the analysis. A major hurdle in integrating static analyzers in the software-development life cycle is tuning their options to custom usage scenarios, s…
▽ More
In recent years, there has been significant progress in the development and industrial adoption of static analyzers. Such analyzers typically provide a large, if not huge, number of configurable options controlling the precision and performance of the analysis. A major hurdle in integrating static analyzers in the software-development life cycle is tuning their options to custom usage scenarios, such as a particular code base or certain resource constraints. In this paper, we propose a technique that automatically tailors a static analyzer, specifically an abstract interpreter, to the code under analysis and any given resource constraints. We implement this technique in a framework called TAILOR, which we use to perform an extensive evaluation on real-world benchmarks. Our experiments show that the configurations generated by TAILOR are vastly better than the default analysis options, vary significantly depending on the code under analysis, and most remain tailored to several subsequent code versions.
△ Less
Submitted 30 September, 2020; v1 submitted 29 September, 2020;
originally announced September 2020.
-
Detecting Critical Bugs in SMT Solvers Using Blackbox Mutational Fuzzing
Authors:
Muhammad Numair Mansur,
Maria Christakis,
Valentin Wüstholz,
Fuyuan Zhang
Abstract:
Formal methods use SMT solvers extensively for deciding formula satisfiability, for instance, in software verification, systematic test generation, and program synthesis. However, due to their complex implementations, solvers may contain critical bugs that lead to unsound results. Given the wide applicability of solvers in software reliability, relying on such unsound results may have detrimental…
▽ More
Formal methods use SMT solvers extensively for deciding formula satisfiability, for instance, in software verification, systematic test generation, and program synthesis. However, due to their complex implementations, solvers may contain critical bugs that lead to unsound results. Given the wide applicability of solvers in software reliability, relying on such unsound results may have detrimental consequences. In this paper, we present STORM, a novel blackbox mutational fuzzing technique for detecting critical bugs in SMT solvers. We run our fuzzer on seven mature solvers and find 29 previously unknown critical bugs. STORM is already being used in testing new features of popular solvers before deployment.
△ Less
Submitted 13 April, 2020;
originally announced April 2020.
-
RAID: Randomized Adversarial-Input Detection for Neural Networks
Authors:
Hasan Ferit Eniser,
Maria Christakis,
Valentin Wüstholz
Abstract:
In recent years, neural networks have become the default choice for image classification and many other learning tasks, even though they are vulnerable to so-called adversarial attacks. To increase their robustness against these attacks, there have emerged numerous detection mechanisms that aim to automatically determine if an input is adversarial. However, state-of-the-art detection mechanisms ei…
▽ More
In recent years, neural networks have become the default choice for image classification and many other learning tasks, even though they are vulnerable to so-called adversarial attacks. To increase their robustness against these attacks, there have emerged numerous detection mechanisms that aim to automatically determine if an input is adversarial. However, state-of-the-art detection mechanisms either rely on being tuned for each type of attack, or they do not generalize across different attack types. To alleviate these issues, we propose a novel technique for adversarial-image detection, RAID, that trains a secondary classifier to identify differences in neuron activation values between benign and adversarial inputs. Our technique is both more reliable and more effective than the state of the art when evaluated against six popular attacks. Moreover, a straightforward extension of RAID increases its robustness against detection-aware adversaries without affecting its effectiveness.
△ Less
Submitted 7 February, 2020;
originally announced February 2020.
-
Perfectly Parallel Fairness Certification of Neural Networks
Authors:
Caterina Urban,
Maria Christakis,
Valentin Wüstholz,
Fuyuan Zhang
Abstract:
Recently, there is growing concern that machine-learning models, which currently assist or even automate decision making, reproduce, and in the worst case reinforce, bias of the training data. The development of tools and techniques for certifying fairness of these models or describing their biased behavior is, therefore, critical. In this paper, we propose a perfectly parallel static analysis for…
▽ More
Recently, there is growing concern that machine-learning models, which currently assist or even automate decision making, reproduce, and in the worst case reinforce, bias of the training data. The development of tools and techniques for certifying fairness of these models or describing their biased behavior is, therefore, critical. In this paper, we propose a perfectly parallel static analysis for certifying causal fairness of feed-forward neural networks used for classification of tabular data. When certification succeeds, our approach provides definite guarantees, otherwise, it describes and quantifies the biased behavior. We design the analysis to be sound, in practice also exact, and configurable in terms of scalability and precision, thereby enabling pay-as-you-go certification. We implement our approach in an open-source tool and demonstrate its effectiveness on models trained with popular datasets.
△ Less
Submitted 21 April, 2020; v1 submitted 5 December, 2019;
originally announced December 2019.
-
Targeted Greybox Fuzzing with Static Lookahead Analysis
Authors:
Valentin Wüstholz,
Maria Christakis
Abstract:
Automatic test generation typically aims to generate inputs that explore new paths in the program under test in order to find bugs. Existing work has, therefore, focused on guiding the exploration toward program parts that are more likely to contain bugs by using an offline static analysis.
In this paper, we introduce a novel technique for targeted greybox fuzzing using an online static analysis…
▽ More
Automatic test generation typically aims to generate inputs that explore new paths in the program under test in order to find bugs. Existing work has, therefore, focused on guiding the exploration toward program parts that are more likely to contain bugs by using an offline static analysis.
In this paper, we introduce a novel technique for targeted greybox fuzzing using an online static analysis that guides the fuzzer toward a set of target locations, for instance, located in recently modified parts of the program. This is achieved by first semantically analyzing each program path that is explored by an input in the fuzzer's test suite. The results of this analysis are then used to control the fuzzer's specialized power schedule, which determines how often to fuzz inputs from the test suite. We implemented our technique by extending a state-of-the-art, industrial fuzzer for Ethereum smart contracts and evaluate its effectiveness on 27 real-world benchmarks. Using an online analysis is particularly suitable for the domain of smart contracts since it does not require any code instrumentation---instrumentation to contracts changes their semantics. Our experiments show that targeted fuzzing significantly outperforms standard greybox fuzzing for reaching 83% of the challenging target locations (up to 14x of median speed-up).
△ Less
Submitted 17 May, 2019;
originally announced May 2019.
-
Harvey: A Greybox Fuzzer for Smart Contracts
Authors:
Valentin Wüstholz,
Maria Christakis
Abstract:
We present Harvey, an industrial greybox fuzzer for smart contracts, which are programs managing accounts on a blockchain. Greybox fuzzing is a lightweight test-generation approach that effectively detects bugs and security vulnerabilities. However, greybox fuzzers randomly mutate program inputs to exercise new paths; this makes it challenging to cover code that is guarded by narrow checks, which…
▽ More
We present Harvey, an industrial greybox fuzzer for smart contracts, which are programs managing accounts on a blockchain. Greybox fuzzing is a lightweight test-generation approach that effectively detects bugs and security vulnerabilities. However, greybox fuzzers randomly mutate program inputs to exercise new paths; this makes it challenging to cover code that is guarded by narrow checks, which are satisfied by no more than a few input values. Moreover, most real-world smart contracts transition through many different states during their lifetime, e.g., for every bid in an auction. To explore these states and thereby detect deep vulnerabilities, a greybox fuzzer would need to generate sequences of contract transactions, e.g., by creating bids from multiple users, while at the same time kee** the search space and test suite tractable. In this experience paper, we explain how Harvey alleviates both challenges with two key fuzzing techniques and distill the main lessons learned. First, Harvey extends standard greybox fuzzing with a method for predicting new inputs that are more likely to cover new paths or reveal vulnerabilities in smart contracts. Second, it fuzzes transaction sequences in a targeted and demand-driven way. We have evaluated our approach on 27 real-world contracts. Our experiments show that the underlying techniques significantly increase Harvey's effectiveness in achieving high coverage and detecting vulnerabilities, in most cases orders-of-magnitude faster; they also reveal new insights about contract code.
△ Less
Submitted 15 May, 2019;
originally announced May 2019.
-
Differentially Testing Soundness and Precision of Program Analyzers
Authors:
Christian Klinger,
Maria Christakis,
Valentin Wüstholz
Abstract:
In the last decades, numerous program analyzers have been developed both by academia and industry. Despite their abundance however, there is currently no systematic way of comparing the effectiveness of different analyzers on arbitrary code. In this paper, we present the first automated technique for differentially testing soundness and precision of program analyzers. We used our technique to comp…
▽ More
In the last decades, numerous program analyzers have been developed both by academia and industry. Despite their abundance however, there is currently no systematic way of comparing the effectiveness of different analyzers on arbitrary code. In this paper, we present the first automated technique for differentially testing soundness and precision of program analyzers. We used our technique to compare six mature, state-of-the art analyzers on tens of thousands of automatically generated benchmarks. Our technique detected soundness and precision issues in most analyzers, and we evaluated the implications of these issues to both designers and users of program analyzers.
△ Less
Submitted 16 December, 2018; v1 submitted 12 December, 2018;
originally announced December 2018.
-
Learning Inputs in Greybox Fuzzing
Authors:
Valentin Wüstholz,
Maria Christakis
Abstract:
Greybox fuzzing is a lightweight testing approach that effectively detects bugs and security vulnerabilities. However, greybox fuzzers randomly mutate program inputs to exercise new paths; this makes it challenging to cover code that is guarded by complex checks.
In this paper, we present a technique that extends greybox fuzzing with a method for learning new inputs based on already explored pro…
▽ More
Greybox fuzzing is a lightweight testing approach that effectively detects bugs and security vulnerabilities. However, greybox fuzzers randomly mutate program inputs to exercise new paths; this makes it challenging to cover code that is guarded by complex checks.
In this paper, we present a technique that extends greybox fuzzing with a method for learning new inputs based on already explored program executions. These inputs can be learned such that they guide exploration toward specific executions, for instance, ones that increase path coverage or reveal vulnerabilities. We have evaluated our technique and compared it to traditional greybox fuzzing on 26 real-world benchmarks. In comparison, our technique significantly increases path coverage (by up to 3X) and detects more bugs (up to 38% more), often orders-of-magnitude faster.
△ Less
Submitted 20 July, 2018;
originally announced July 2018.
-
Specification Mining for Smart Contracts with Automatic Abstraction Tuning
Authors:
Florentin Guth,
Valentin Wüstholz,
Maria Christakis,
Peter Müller
Abstract:
Smart contracts are programs that manage digital assets according to a certain protocol, expressing for instance the rules of an auction. Understanding the possible behaviors of a smart contract is difficult, which complicates development, auditing, and the post-mortem analysis of attacks.
This paper presents the first specification mining technique for smart contracts. Our technique extracts th…
▽ More
Smart contracts are programs that manage digital assets according to a certain protocol, expressing for instance the rules of an auction. Understanding the possible behaviors of a smart contract is difficult, which complicates development, auditing, and the post-mortem analysis of attacks.
This paper presents the first specification mining technique for smart contracts. Our technique extracts the possible behaviors of smart contracts from contract executions recorded on a blockchain and expresses them as finite automata. A novel dependency analysis allows us to separate independent interactions with a contract. Our technique tunes the abstractions for the automata construction automatically based on configurable metrics, for instance, to maximize readability or precision. We implemented our technique for the Ethereum blockchain and evaluated its usability on several real-world contracts.
△ Less
Submitted 20 July, 2018;
originally announced July 2018.
-
Failure-Directed Program Trimming (Extended Version)
Authors:
Kostas Ferles,
Valentin Wüstholz,
Maria Christakis,
Isil Dillig
Abstract:
This paper describes a new program simplification technique called program trimming that aims to improve the scalability and precision of safety checking tools. Given a program ${\mathcal P}$, program trimming generates a new program ${\mathcal P}'$ such that ${\mathcal P}$ and ${\mathcal P}'$ are equi-safe (i.e., ${\mathcal P}'$ has a bug if and only if ${\mathcal P}$ has a bug), but…
▽ More
This paper describes a new program simplification technique called program trimming that aims to improve the scalability and precision of safety checking tools. Given a program ${\mathcal P}$, program trimming generates a new program ${\mathcal P}'$ such that ${\mathcal P}$ and ${\mathcal P}'$ are equi-safe (i.e., ${\mathcal P}'$ has a bug if and only if ${\mathcal P}$ has a bug), but ${\mathcal P}'$ has fewer execution paths than ${\mathcal P}$. Since many program analyzers are sensitive to the number of execution paths, program trimming has the potential to improve the effectiveness of safety checking tools.
In addition to introducing the concept of program trimming, this paper also presents a lightweight static analysis that can be used as a pre-processing step to remove program paths while retaining equi-safety. We have implemented the proposed technique in a tool called Trimmer and evaluate it in the context of two program analysis techniques, namely abstract interpretation and dynamic symbolic execution. Our experiments show that program trimming significantly improves the effectiveness of both techniques.
△ Less
Submitted 14 June, 2017;
originally announced June 2017.
-
Static Detection of DoS Vulnerabilities in Programs that use Regular Expressions (Extended Version)
Authors:
Valentin Wüstholz,
Oswaldo Olivo,
Marijn J. H. Heule,
Isil Dillig
Abstract:
In an algorithmic complexity attack, a malicious party takes advantage of the worst-case behavior of an algorithm to cause denial-of-service. A prominent algorithmic complexity attack is regular expression denial-of-service (ReDoS), in which the attacker exploits a vulnerable regular expression by providing a carefully-crafted input string that triggers worst-case behavior of the matching algorith…
▽ More
In an algorithmic complexity attack, a malicious party takes advantage of the worst-case behavior of an algorithm to cause denial-of-service. A prominent algorithmic complexity attack is regular expression denial-of-service (ReDoS), in which the attacker exploits a vulnerable regular expression by providing a carefully-crafted input string that triggers worst-case behavior of the matching algorithm. This paper proposes a technique for automatically finding ReDoS vulnerabilities in programs. Specifically, our approach automatically identifies vulnerable regular expressions in the program and determines whether an "evil" input string can be matched against a vulnerable regular expression. We have implemented our proposed approach in a tool called REXPLOITER and found 41 exploitable security vulnerabilities in Java web applications.
△ Less
Submitted 15 January, 2017;
originally announced January 2017.
-
The Dafny Integrated Development Environment
Authors:
K. Rustan M. Leino,
Valentin Wüstholz
Abstract:
In recent years, program verifiers and interactive theorem provers have become more powerful and more suitable for verifying large programs or proofs. This has demonstrated the need for improving the user experience of these tools to increase productivity and to make them more accessible to non-experts. This paper presents an integrated development environment for Dafny-a programming language, ver…
▽ More
In recent years, program verifiers and interactive theorem provers have become more powerful and more suitable for verifying large programs or proofs. This has demonstrated the need for improving the user experience of these tools to increase productivity and to make them more accessible to non-experts. This paper presents an integrated development environment for Dafny-a programming language, verifier, and proof assistant-that addresses issues present in most state-of-the-art verifiers: low responsiveness and lack of support for understanding non-obvious verification failures. The paper demonstrates several new features that move the state-of-the-art closer towards a verification environment that can provide verification feedback as the user types and can present more helpful information about the program or failed verifications in a demand-driven and unobtrusive way.
△ Less
Submitted 26 April, 2014;
originally announced April 2014.