Search | arXiv e-print repository

Leveraging Static Analysis for Bug Repair

Authors: Ruba Mutasim, Gabriel Synnaeve, David Pichardie, Baptiste Rozière

Abstract: We propose a method combining machine learning with a static analysis tool (namely, Infer) to automatically repair source code. Machine learning methods perform well at producing idiomatic source code. However, their output is sometimes difficult to trust, as language models can output incorrect code with high confidence. Static analysis tools are trustworthy, but they are also less flexible and produce non-idiomatic code. In this paper, we propose to fix resource leak bugs in IR space and to use a sequence-to-sequence model to propose fixes in source code space. We also study several decoding strategies and use Infer to filter the output of the model. On a dataset of CodeNet submissions with potential resource leak bugs, our method is able to find a function with the same semantics that does not raise a warning with around 97% precision and 66% recall.

Submitted 21 April, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

Comments: 13 pages. DL4C 2023

arXiv:2212.03129

doi 10.1145/3571202

Formally Verified Native Code Generation in an Effectful JIT -- or: Turning the CompCert Backend into a Formally Verified JIT Compiler

Authors: Aurèle Barrière, Sandrine Blazy, David Pichardie

Abstract: Modern Just-in-Time compilers (or JITs) typically interleave several mechanisms to execute a program. For faster startup times and to observe the initial behavior of an execution, interpretation can be used at first. But after a while, JITs dynamically produce native code for the parts of the program they execute often. Although some time is spent compiling dynamically, this mechanism makes the remainder of the program execution much faster. Such compilers are complex pieces of software with various components, and they rely heavily on a precise interplay between the different languages being executed, including on-stack replacement. Traditional static compilers like CompCert have been mechanized in proof assistants, but JITs have scarcely been formalized so far, partly due to their impure nature and their numerous components. This work presents a model JIT with dynamic generation of native code, implemented and formally verified in Coq. Although some parts of a JIT cannot be written in Coq, we propose a proof methodology to delimit, specify and reason about the impure effects of a JIT. We argue that the daunting task of formally verifying a complete JIT should draw on existing proofs of native code generation. To this end, our work successfully reuses CompCert and its correctness proofs during dynamic compilation. Finally, our prototype can be extracted and executed.

Submitted 6 December, 2022; originally announced December 2022.

Comments: Proceedings of the ACM on Programming Languages, 2023

arXiv:2107.09472

doi 10.1007/978-3-030-88806-0_6

Verified Functional Programming of an Abstract Interpreter

Authors: Lucas Franceschino, David Pichardie, Jean-Pierre Talpin

Abstract: Abstract interpreters are complex pieces of software: even if the abstract interpretation theory and companion algorithms are well understood, their implementations are subject to bugs that might call into question the soundness of their computations. While some formally verified abstract interpreters have been written in the past, writing and understanding them requires expertise in the use of proof assistants and a non-trivial amount of interactive proof. This paper presents a formally verified abstract interpreter fully programmed and proved correct in the F* verified programming environment. Thanks to F* refinement types and SMT prover capabilities, we demonstrate a substantial saving in proof effort compared to previous works based on interactive proof assistants. Almost all the code of our implementation, proofs included, is written in a functional style and presented directly in the paper.

Submitted 17 October, 2021; v1 submitted 20 July, 2021; originally announced July 2021.

Comments: Published in SAS21

arXiv:1509.06503

A Verified Information-Flow Architecture

Authors: Arthur Azevedo de Amorim, Nathan Collins, André DeHon, Delphine Demange, Catalin Hritcu, David Pichardie, Benjamin C. Pierce, Randy Pollack, Andrew Tolmach

Abstract: SAFE is a clean-slate design for a highly secure computer system, with pervasive mechanisms for tracking and limiting information flows. At the lowest level, the SAFE hardware supports fine-grained programmable tags, with efficient and flexible propagation and combination of tags as instructions are executed. The operating system virtualizes these generic facilities to present an information-flow abstract machine that allows user programs to label sensitive data with rich confidentiality policies. We present a formal, machine-checked model of the key hardware and software mechanisms used to dynamically control information flow in SAFE and an end-to-end proof of noninterference for this model. We use a refinement proof methodology to propagate the noninterference property of the abstract machine down to the concrete machine level. We use an intermediate layer in the refinement chain that factors out the details of the information-flow control policy and devise a code generator for compiling such information-flow policies into low-level monitor code. Finally, we verify the correctness of this generator using a dedicated Hoare logic that abstracts from low-level machine instructions into a reusable set of verified structured code generators.

Submitted 6 March, 2016; v1 submitted 22 September, 2015; originally announced September 2015.

arXiv:1304.3596

Formal Verification of a C Value Analysis Based on Abstract Interpretation

Authors: Sandrine Blazy, Vincent Laporte, André Maroneze, David Pichardie

Abstract: Static analyzers based on abstract interpretation are complex pieces of software implementing delicate algorithms. Even if static analysis techniques are well understood, their implementation on real languages is still error-prone. This paper presents a formal verification using the Coq proof assistant: a formalization of a value analysis (based on abstract interpretation) and a soundness proof of the value analysis. The formalization relies on generic interfaces. The mechanized proof is facilitated by a translation validation of a Bourdoncle fixpoint iterator. The work has been integrated into the CompCert verified C compiler. Our verified analysis directly operates over an intermediate language of the compiler having the same expressiveness as C. The automatic extraction of our value analysis into OCaml yields a program with competitive results, obtained from experiments on a number of benchmarks and comparisons with the Frama-C tool.

Submitted 12 April, 2013; originally announced April 2013.

Journal ref: SAS - 20th Static Analysis Symposium Lecture Notes in Computer Science (2013) 324-344

arXiv:1204.4322

doi 10.2168/LMCS-8(2:5)2012

Secure the Clones

Authors: Thomas Jensen, Florent Kirchner, David Pichardie

Abstract: Exchanging mutable data objects with untrusted code is a delicate matter because of the risk of creating a data space that is accessible by an attacker. Consequently, secure programming guidelines for Java stress the importance of using defensive copying before accepting or handing out references to an internal mutable object. However, implementation of a copy method (like clone()) is entirely left to the programmer. It may not provide a sufficiently deep copy of an object and is subject to overriding by a malicious subclass. Currently, no language-based mechanism supports secure object cloning. This paper proposes a type-based annotation system for defining modular copy policies for class-based object-oriented programs. A copy policy specifies the maximally allowed sharing between an object and its clone. We present a static enforcement mechanism that will guarantee that all classes fulfil their copy policy, even in the presence of overriding of copy methods, and establish the semantic correctness of the overall approach in Coq. The mechanism has been implemented and experimentally evaluated on clone methods from several Java libraries.

Submitted 4 June, 2012; v1 submitted 19 April, 2012; originally announced April 2012.

ACM Class: I.1.2, F.3.1, F.3.3, D.3.3

Journal ref: Logical Methods in Computer Science, Volume 8, Issue 2 (May 31, 2012) lmcs:801

arXiv:1007.3353

doi 10.1007/978-3-642-18070-5_7

Sawja: Static Analysis Workshop for Java

Authors: Laurent Hubert, Nicolas Barré, Frédéric Besson, Delphine Demange, Thomas Jensen, Vincent Monfort, David Pichardie, Tiphaine Turpin

Abstract: Static analysis is a powerful technique for automatic verification of programs but raises major engineering challenges when developing a full-fledged analyzer for a realistic language such as Java. This paper describes the Sawja library: a static analysis framework fully compliant with Java 6, which provides OCaml modules for efficiently manipulating Java bytecode programs. We present the main features of the library, including (i) efficient functional data structures for representing programs with implicit sharing and lazy parsing, (ii) an intermediate stackless representation, and (iii) fast computation and manipulation of complete programs.

Submitted 20 July, 2010; originally announced July 2010.

Journal ref: The International Conference on Formal Verification of Object-Oriented Software 2010.13 (2010) 253-267

arXiv:1007.3249

doi 10.1016/j.entcs.2009.11.012

Soundly Handling Static Fields: Issues, Semantics and Analysis

Authors: Laurent Hubert, David Pichardie

Abstract: Although class initialization works as expected in most cases, some static fields may be read before being initialized, despite being initialized in their corresponding class initializer. We propose an analysis that computes, for each program point, the set of static fields that must have been initialized, and we discuss its soundness. We show that such an analysis can be directly applied to identify the static fields that may be read before being initialized, and to improve the precision of a null-pointer analysis while preserving its soundness.

Submitted 20 July, 2010; v1 submitted 19 July, 2010; originally announced July 2010.

Comments: Proceedings of the Fourth Workshop on Bytecode Semantics, Verification, Analysis and Transformation (BYTECODE 2009)

Journal ref: Electronic Notes in Theoretical Computer Science 253, 5 (2009) 15-30

arXiv:1007.3133

Enforcing Secure Object Initialization in Java

Authors: Laurent Hubert, Thomas Jensen, Vincent Monfort, David Pichardie

Abstract: For secure Java development, Sun and CERT recommend not allowing partially initialized objects to be accessed. CERT rates the severity of the risk incurred by not following this recommendation as high. The solution currently used to enforce object initialization is to implement a coding pattern proposed by Sun, which is not formally checked. We propose a modular type system to formally specify the initialization policy of libraries or programs, and a type checker to statically check at load time that all loaded classes respect the policy. This makes it possible to prove the absence of bugs that have allowed some famous privilege escalations in Java. Our experimental results show that our safe default policy proves 91% of the classes of java.lang, java.security and javax.security safe without any annotation, and that by adding 57 simple annotations we proved all classes but four safe. The type system and its soundness theorem have been formalized and machine checked using Coq.

Submitted 19 July, 2010; originally announced July 2010.

Journal ref: 15th European Symposium on Research in Computer Security (ESORICS) 6345 (2010) 101-115

Showing 1–9 of 9 results for author: Pichardie, D