Search | arXiv e-print repository

Sound Borrow-Checking for Rust via Symbolic Semantics (Long Version)

Authors: Son Ho, Aymeric Fromherz, Jonathan Protzenko

Abstract: The Rust programming language continues to rise in popularity, and as such, warrants the close attention of the programming languages community. In this work, we present a new foundational contribution towards the theoretical understanding of Rust's semantics. We prove that LLBC, a high-level, borrow-centric model previously proposed for Rust's semantics and execution, is sound with regards to a l… ▽ More The Rust programming language continues to rise in popularity, and as such, warrants the close attention of the programming languages community. In this work, we present a new foundational contribution towards the theoretical understanding of Rust's semantics. We prove that LLBC, a high-level, borrow-centric model previously proposed for Rust's semantics and execution, is sound with regards to a low-level pointer-based language à la CompCert. Specifically, we prove the following: that LLBC is a correct view over a traditional model of execution; that LLBC's symbolic semantics are a correct abstraction of LLBC programs; and that LLBC's symbolic semantics act as a borrow-checker for LLBC, i.e. that symbolically-checked LLBC programs do not get stuck when executed on a heap-and-addresses model of execution. To prove these results, we introduce a new proof style that considerably simplifies our proofs of simulation, which relies on a notion of hybrid states. Equipped with this reasoning framework, we show that a new addition to LLBC's symbolic semantics, namely a join operation, preserves the abstraction and borrow-checking properties. This in turn allows us to add support for loops to the Aeneas framework; we show, using a series of examples and case studies, that this unlocks new expressive power for Aeneas. △ Less

Submitted 1 July, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

arXiv:2403.09435 [pdf, other]

StarMalloc: A Formally Verified, Concurrent, Performant, and Security-Oriented Memory Allocator

Authors: Antonin Reitz, Aymeric Fromherz, Jonathan Protzenko

Abstract: In this work, we present StarMalloc, a verified, security-oriented, concurrent memory allocator that can be used as a drop-in replacement in real-world projects. Using the Steel separation logic framework, we show how to specify and verify StarMalloc, relying on dependent types and modular abstractions to enable efficient verification. As part of StarMalloc, we also develop several generic datastr… ▽ More In this work, we present StarMalloc, a verified, security-oriented, concurrent memory allocator that can be used as a drop-in replacement in real-world projects. Using the Steel separation logic framework, we show how to specify and verify StarMalloc, relying on dependent types and modular abstractions to enable efficient verification. As part of StarMalloc, we also develop several generic datastructures and proof libraries directly reusable in future systems verification projects. We finally show that StarMalloc can be used with real-world projects, including the Firefox browser, and evaluate it against 10 state-of-the-art memory allocators, demonstrating its competitiveness. △ Less

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2206.07185 [pdf, ps, other]

doi 10.1145/3547647

Aeneas: Rust Verification by Functional Translation

Authors: Son Ho, Jonathan Protzenko

Abstract: We present Aeneas, a new verification toolchain for Rust programs based on a lightweight functional translation. We leverage Rust's rich region-based type system to eliminate memory reasoning for many Rust programs, as long as they do not rely on interior mutability or unsafe code. Doing so, we relieve the proof engineer of the burden of memory-based reasoning, allowing them to instead focus on fu… ▽ More We present Aeneas, a new verification toolchain for Rust programs based on a lightweight functional translation. We leverage Rust's rich region-based type system to eliminate memory reasoning for many Rust programs, as long as they do not rely on interior mutability or unsafe code. Doing so, we relieve the proof engineer of the burden of memory-based reasoning, allowing them to instead focus on functional properties of their code. Our first contribution is a new approach to borrows and controlled aliasing. We propose a pure, functional semantics for LLBC, a Low-Level Borrow Calculus that captures a large subset of Rust programs. Our semantics is value-based, meaning there is no notion of memory, addresses or pointer arithmetic. Our semantics is also ownership-centric, meaning that we enforce soundness of borrows via a semantic criterion based on loans rather than through a syntactic type-based lifetime discipline. We claim that our semantics captures the essence of the borrow mechanism rather than its current implementation in the Rust compiler. Our second contribution is a translation from LLBC to a pure lambda-calculus. This allows the user to reason about the original Rust program through the theorem prover of their choice. To deal with the well-known technical difficulty of terminating a borrow, we rely on a novel approach, in which we approximate the borrow graph in the presence of function calls. This in turn allows us to perform the translation using a new technical device called backward functions. We implement our toolchain in a mixture of Rust and OCaml. Our evaluation shows significant gains of verification productivity for the programmer. Rust goes to great lengths to enforce static control of aliasing; the proof engineer should not waste any time on memory reasoning when so much already comes "for free"! △ Less

Submitted 28 September, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

Journal ref: Proceedings of the ACM on Programming Languages, Volume 6, Issue ICFP. August 2022. Article No.: 116, pp 711-741

arXiv:2103.03198 [pdf, other]

doi 10.1145/3473582

Catala: A Programming Language for the Law

Authors: Denis Merigoux, Nicolas Chataing, Jonathan Protzenko

Abstract: Law at large underpins modern society, codifying and governing many aspects of citizens' daily lives. Oftentimes, law is subject to interpretation, debate and challenges throughout various courts and jurisdictions. But in some other areas, law leaves little room for interpretation, and essentially aims to rigorously describe a computation, a decision procedure or, simply said, an algorithm. Unfort… ▽ More Law at large underpins modern society, codifying and governing many aspects of citizens' daily lives. Oftentimes, law is subject to interpretation, debate and challenges throughout various courts and jurisdictions. But in some other areas, law leaves little room for interpretation, and essentially aims to rigorously describe a computation, a decision procedure or, simply said, an algorithm. Unfortunately, prose remains a woefully inadequate tool for the job. The lack of formalism leaves room for ambiguities; the structure of legal statutes, with many paragraphs and sub-sections spread across multiple pages, makes it hard to compute the intended outcome of the algorithm underlying a given text; and, as with any other piece of poorly-specified critical software, the use of informal language leaves corner cases unaddressed. We introduce Catala, a new programming language that we specifically designed to allow a straightforward and systematic translation of statutory law into an executable implementation. Catala aims to bring together lawyers and programmers through a shared medium, which together they can understand, edit and evolve, bridging a gap that often results in dramatically incorrect implementations of the law. We have implemented a compiler for Catala, and have proven the correctness of its core compilation steps using the F* proof assistant. We evaluate Catala on several legal texts that are algorithms in disguise, notably section 121 of the US federal income tax and the byzantine French family benefits; in doing so, we uncover a bug in the official implementation. We observe as a consequence of the formalization process that using Catala enables rich interactions between lawyers and programmers, leading to a greater understanding of the original legislative intent, while producing a correct-by-construction executable specification reusable by the greater software ecosystem. △ Less

Submitted 2 July, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

arXiv:2102.01644 [pdf, ps, other]

Modularity, Code Specialization, and Zero-Cost Abstractions for Program Verification

Authors: Son Ho, Aymeric Fromherz, Jonathan Protzenko

Abstract: For all the successes in verifying low-level, efficient, security-critical code, little has been said or studied about the structure, architecture and engineering of such large-scale proof developments. We present the design, implementation and evaluation of a set of language-based techniques that allow the programmer to modularly write and verify code at a high level of abstraction, while retai… ▽ More For all the successes in verifying low-level, efficient, security-critical code, little has been said or studied about the structure, architecture and engineering of such large-scale proof developments. We present the design, implementation and evaluation of a set of language-based techniques that allow the programmer to modularly write and verify code at a high level of abstraction, while retaining control over the compilation process and producing high-quality, zero-overhead, low-level code suitable for integration into mainstream software. We implement our techniques within the F* proof assistant, and specifically its shallowly-embedded Low* toolchain that compiles to C. Through our evaluation, we establish that our techniques were critical in scaling the popular HACL* library past 100,000 lines of verified source code, and brought about significant gains in proof engineer productivity. The exposition of our methodology converges on one final, novel case study: the streaming API, a finicky API that has historically caused many bugs in high-profile software. Using our approach, we manage to capture the streaming semantics in a generic way, and apply it ``for free'' to over a dozen use-cases. Six of those have made it into the reference implementation of the Python programming language, replacing the previous CVE-ridden code. △ Less

Submitted 6 July, 2023; v1 submitted 2 February, 2021; originally announced February 2021.

arXiv:2011.07966 [pdf, other]

A Modern Compiler for the French Tax Code

Authors: Denis Merigoux, Raphaël Monat, Jonathan Protzenko

Abstract: In France, income tax is computed from taxpayers' individual returns, using an algorithm that is authored, designed and maintained by the French Public Finances Directorate (DGFiP). This algorithm relies on a legacy custom language and compiler originally designed in 1990, which unlike French wine, did not age well with time. Owing to the shortcomings of the input language and the technical limita… ▽ More In France, income tax is computed from taxpayers' individual returns, using an algorithm that is authored, designed and maintained by the French Public Finances Directorate (DGFiP). This algorithm relies on a legacy custom language and compiler originally designed in 1990, which unlike French wine, did not age well with time. Owing to the shortcomings of the input language and the technical limitations of the compiler, the algorithm is proving harder and harder to maintain, relying on ad-hoc behaviors and workarounds to implement the most recent changes in tax law. Competence loss and aging code also mean that the system does not benefit from any modern compiler techniques that would increase confidence in the implementation. We overhaul this infrastructure and present Mlang, an open-source compiler toolchain whose goal is to replace the existing infrastructure. Mlang is based on a reverse-engineered formalization of the DGFiP's system, and has been thoroughly validated against the private DGFiP test suite. As such, Mlang has a formal semantics; eliminates previous handwritten workarounds in C; compiles to modern languages (Python); and enables a variety of instrumentations, providing deep insights about the essence of French income tax computation. The DGFiP is now officially transitioning to Mlang for their production system. △ Less

Submitted 25 January, 2021; v1 submitted 16 November, 2020; originally announced November 2020.

arXiv:1803.06547 [pdf, other]

Meta-F*: Proof Automation with SMT, Tactics, and Metaprograms

Authors: Guido Martínez, Danel Ahman, Victor Dumitrescu, Nick Giannarakis, Chris Hawblitzel, Catalin Hritcu, Monal Narasimhamurthy, Zoe Paraskevopoulou, Clément Pit-Claudel, Jonathan Protzenko, Tahina Ramananandro, Aseem Rastogi, Nikhil Swamy

Abstract: We introduce Meta-F*, a tactics and metaprogramming framework for the F* program verifier. The main novelty of Meta-F* is allowing the use of tactics and metaprogramming to discharge assertions not solvable by SMT, or to just simplify them into well-behaved SMT fragments. Plus, Meta-F* can be used to generate verified code automatically. Meta-F* is implemented as an F* effect, which, given the p… ▽ More We introduce Meta-F*, a tactics and metaprogramming framework for the F* program verifier. The main novelty of Meta-F* is allowing the use of tactics and metaprogramming to discharge assertions not solvable by SMT, or to just simplify them into well-behaved SMT fragments. Plus, Meta-F* can be used to generate verified code automatically. Meta-F* is implemented as an F* effect, which, given the powerful effect system of F*, heavily increases code reuse and even enables the lightweight verification of metaprograms. Metaprograms can be either interpreted, or compiled to efficient native code that can be dynamically loaded into the F* type-checker and can interoperate with interpreted code. Evaluation on realistic case studies shows that Meta-F* provides substantial gains in proof development, efficiency, and robustness. △ Less

Submitted 7 March, 2019; v1 submitted 17 March, 2018; originally announced March 2018.

Comments: Full version of ESOP'19 paper

arXiv:1703.00055 [pdf, other]

doi 10.1145/3167090

A Monadic Framework for Relational Verification: Applied to Information Security, Program Equivalence, and Optimizations

Authors: Niklas Grimm, Kenji Maillard, Cédric Fournet, Catalin Hritcu, Matteo Maffei, Jonathan Protzenko, Tahina Ramananandro, Aseem Rastogi, Nikhil Swamy, Santiago Zanella-Béguelin

Abstract: Relational properties describe multiple runs of one or more programs. They characterize many useful notions of security, program refinement, and equivalence for programs with diverse computational effects, and they have received much attention in the recent literature. Rather than develo** separate tools for special classes of effects and relational properties, we advocate using a general purpos… ▽ More Relational properties describe multiple runs of one or more programs. They characterize many useful notions of security, program refinement, and equivalence for programs with diverse computational effects, and they have received much attention in the recent literature. Rather than develo** separate tools for special classes of effects and relational properties, we advocate using a general purpose proof assistant as a unifying framework for the relational verification of effectful programs. The essence of our approach is to model effectful computations using monads and to prove relational properties on their monadic representations, making the most of existing support for reasoning about pure programs. We apply this method in F* and evaluate it by encoding a variety of relational program analyses, including information flow control, program equivalence and refinement at higher order, correctness of program optimizations and game-based cryptographic security. By relying on SMT-based automation, unary weakest preconditions, user-defined effects, and monadic reification, we show that, compared to unary properties, verifying relational properties requires little additional effort from the F* programmer. △ Less

Submitted 12 October, 2019; v1 submitted 28 February, 2017; originally announced March 2017.

Comments: CPP'18 extended version with the missing ERC acknowledgement

arXiv:1703.00053 [pdf, other]

doi 10.1145/3110261

Verified Low-Level Programming Embedded in F*

Authors: Jonathan Protzenko, Jean-Karim Zinzindohoué, Aseem Rastogi, Tahina Ramananandro, Peng Wang, Santiago Zanella-Béguelin, Antoine Delignat-Lavaud, Catalin Hritcu, Karthikeyan Bhargavan, Cédric Fournet, Nikhil Swamy

Abstract: We present Low*, a language for low-level programming and verification, and its application to high-assurance optimized cryptographic libraries. Low* is a shallow embedding of a small, sequential, well-behaved subset of C in F*, a dependently-typed variant of ML aimed at program verification. Departing from ML, Low* does not involve any garbage collection or implicit heap allocation; instead, it h… ▽ More We present Low*, a language for low-level programming and verification, and its application to high-assurance optimized cryptographic libraries. Low* is a shallow embedding of a small, sequential, well-behaved subset of C in F*, a dependently-typed variant of ML aimed at program verification. Departing from ML, Low* does not involve any garbage collection or implicit heap allocation; instead, it has a structured memory model à la CompCert, and it provides the control required for writing efficient low-level security-critical code. By virtue of ty**, any Low* program is memory safe. In addition, the programmer can make full use of the verification power of F* to write high-level specifications and verify the functional correctness of Low* code using a combination of SMT automation and sophisticated manual proofs. At extraction time, specifications and proofs are erased, and the remaining code enjoys a predictable translation to C. We prove that this translation preserves semantics and side-channel resistance. We provide a new compiler back-end from Low* to C and, to evaluate our approach, we implement and verify various cryptographic algorithms, constructions, and tools for a total of about 28,000 lines of code, specification and proof. We show that our Low* code delivers performance competitive with existing (unverified) C cryptographic libraries, suggesting our approach may be applicable to larger-scale low-level software. △ Less

Submitted 11 December, 2018; v1 submitted 28 February, 2017; originally announced March 2017.

Comments: extended version of ICFP final camera ready version; only Acknowledgements differ from 30 Aug 2017 version

arXiv:1608.06499 [pdf, other]

Dijkstra Monads for Free

Authors: Danel Ahman, Catalin Hritcu, Kenji Maillard, Guido Martinez, Gordon Plotkin, Jonathan Protzenko, Aseem Rastogi, Nikhil Swamy

Abstract: Dijkstra monads enable a dependent type theory to be enhanced with support for specifying and verifying effectful code via weakest preconditions. Together with their closely related counterparts, Hoare monads, they provide the basis on which verification tools like F*, Hoare Type Theory (HTT), and Ynot are built. We show that Dijkstra monads can be derived "for free" by applying a continuation-p… ▽ More Dijkstra monads enable a dependent type theory to be enhanced with support for specifying and verifying effectful code via weakest preconditions. Together with their closely related counterparts, Hoare monads, they provide the basis on which verification tools like F*, Hoare Type Theory (HTT), and Ynot are built. We show that Dijkstra monads can be derived "for free" by applying a continuation-passing style (CPS) translation to the standard monadic definitions of the underlying computational effects. Automatically deriving Dijkstra monads in this way provides a correct-by-construction and efficient way of reasoning about user-defined effects in dependent type theories. We demonstrate these ideas in EMF*, a new dependently typed calculus, validating it via both formal proof and a prototype implementation within F*. Besides equip** F* with a more uniform and extensible effect system, EMF* enables a novel mixture of intrinsic and extrinsic proofs within F*. △ Less

Submitted 12 October, 2019; v1 submitted 23 August, 2016; originally announced August 2016.

Comments: extended pre-print for POPL 2017 final version with the missing ERC acknowledgement

arXiv:1311.7256 [pdf, ps, other]

The ins and outs of iteration in Mezzo

Authors: Armaël Guéneau, François Pottier, Jonathan Protzenko

Abstract: This is a talk proposal for HOPE 2013. Using iteration over a collection as a case study, we wish to illustrate the strengths and weaknesses of the prototype programming language Mezzo. This is a talk proposal for HOPE 2013. Using iteration over a collection as a case study, we wish to illustrate the strengths and weaknesses of the prototype programming language Mezzo. △ Less

Submitted 28 November, 2013; originally announced November 2013.

ACM Class: D.3.2

arXiv:1311.7242 [pdf, ps, other]

doi 10.1145/2500365.2500598

Programming with Permissions in Mezzo

Authors: Jonathan Protzenko, François Pottier

Abstract: We present Mezzo, a typed programming language of ML lineage. Mezzo is equipped with a novel static discipline of duplicable and affine permissions, which controls aliasing and ownership. This rules out certain mistakes, including representation exposure and data races, and enables new idioms, such as gradual initialization, memory re-use, and (type)state changes. Although the core static discipli… ▽ More We present Mezzo, a typed programming language of ML lineage. Mezzo is equipped with a novel static discipline of duplicable and affine permissions, which controls aliasing and ownership. This rules out certain mistakes, including representation exposure and data races, and enables new idioms, such as gradual initialization, memory re-use, and (type)state changes. Although the core static discipline disallows sharing a mutable data structure, Mezzo offers several ways of working around this restriction, including a novel dynamic ownership control mechanism which we dub "adoption and abandon". △ Less

Submitted 28 November, 2013; originally announced November 2013.

ACM Class: D.3.2

Journal ref: ICFP 2013, Proceedings of the 18th ACM SIGPLAN international conference on Functional programming

arXiv:1311.6929 [pdf, other]

doi 10.4230/OASIcs.FSFMA.2013.68

Illustrating the Mezzo programming language

Authors: Jonathan Protzenko

Abstract: When programmers want to prove strong program invariants, they are usually faced with a choice between using theorem provers and using traditional programming languages. The former requires them to provide program proofs, which, for many applications, is considered a heavy burden. The latter provides less guarantees and the programmer usually has to write run-time assertions to compensate for the… ▽ More When programmers want to prove strong program invariants, they are usually faced with a choice between using theorem provers and using traditional programming languages. The former requires them to provide program proofs, which, for many applications, is considered a heavy burden. The latter provides less guarantees and the programmer usually has to write run-time assertions to compensate for the lack of suitable invariants expressible in the type system. We introduce Mezzo, a programming language in the tradition of ML, in which the usual concept of a type is replaced by a more precise notion of a permission. Programs written in Mezzo usually enjoy stronger guarantees than programs written in pure ML. However, because Mezzo is based on a type system, the reasoning requires no user input. In this paper, we illustrate the key concepts of Mezzo, highlighting the static guarantees our language provides. △ Less

Submitted 27 November, 2013; originally announced November 2013.

ACM Class: D.3.2

Journal ref: 1st French Singaporean Workshop on Formal Methods and Applications (FSFMA 2013)

Showing 1–13 of 13 results for author: Protzenko, J