-
A Coq Mechanization of JavaScript Regular Expression Semantics
Authors:
Noé De Santo,
Aurèle Barrière,
Clément Pit-Claudel
Abstract:
We present an executable, proven-safe, faithful, and future-proof Coq mechanization of JavaScript regular expression (regex) matching, as specified by the last published edition of ECMA-262 section 22.2. This is, to our knowledge, the first time that an industrial-strength regex language has been faithfully mechanized in an interactive theorem prover. We highlight interesting challenges that arose…
▽ More
We present an executable, proven-safe, faithful, and future-proof Coq mechanization of JavaScript regular expression (regex) matching, as specified by the last published edition of ECMA-262 section 22.2. This is, to our knowledge, the first time that an industrial-strength regex language has been faithfully mechanized in an interactive theorem prover. We highlight interesting challenges that arose in the process (including issues of encoding, corner cases, and executability), and we document the steps that we took to ensure that the result is straightforwardly auditable and that our understanding of the spec aligns with existing implementations.
We demonstrate the usability and versatility of the mechanization through a broad collection of analyses, case studies, and experiments: we prove that JavaScript regex matching always terminates and is safe (no assertion failures); we identifying subtle corner cases that led to mistakes in previous publications; we verify an optimization extracted from a state-of-the-art regex engine; we show that some classic properties described in automata textbooks and used in derivatives-based matchers do not hold in JavaScript regexes; and we demonstrate that the cost of updating the mechanization to account for changes in the original specification is reasonably low.
Our mechanization can be extracted to OCaml and linked with Unicode libraries to produce an executable engine that passes the relevant parts of the official Test262 conformance test suite.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
Linear Matching of JavaScript Regular Expressions
Authors:
Aurèle Barrière,
Clément Pit-Claudel
Abstract:
Modern regex languages have strayed far from well-understood traditional regular expressions: they include features that fundamentally transform the matching problem. In exchange for these features, modern regex engines at times suffer from exponential complexity blowups, a frequent source of denial-of-service vulnerabilities in JavaScript applications. Worse, regex semantics differ across languag…
▽ More
Modern regex languages have strayed far from well-understood traditional regular expressions: they include features that fundamentally transform the matching problem. In exchange for these features, modern regex engines at times suffer from exponential complexity blowups, a frequent source of denial-of-service vulnerabilities in JavaScript applications. Worse, regex semantics differ across languages, and the impact of these divergences on algorithmic design and worst-case matching complexity has seldom been investigated.
This paper provides a novel perspective on JavaScript's regex semantics by identifying a larger-than-previously-understood subset of the language that can be matched with linear time guarantees. In the process, we discover several cases where state-of-the-art algorithms were either wrong (semantically incorrect), inefficient (suffering from superlinear complexity) or excessively restrictive (assuming certain features could not be matched linearly). We introduce novel algorithms to restore correctness and linear complexity. We further advance the state-of-the-art in linear regex matching by presenting the first nonbacktracking algorithms for matching lookarounds in linear time: one supporting captureless lookbehinds in any regex language, and another leveraging a JavaScript property to support unrestricted lookaheads and lookbehinds. Finally, we describe new time and space complexity tradeoffs for regex engines. All of our algorithms are practical: we validated them in a prototype implementation, and some have also been merged in the V8 JavaScript implementation used in Chrome and Node.js.
△ Less
Submitted 29 November, 2023;
originally announced November 2023.
-
Formally Verified Native Code Generation in an Effectful JIT -- or: Turning the CompCert Backend into a Formally Verified JIT Compiler
Authors:
Aurèle Barrière,
Sandrine Blazy,
David Pichardie
Abstract:
Modern Just-in-Time compilers (or JITs) typically interleave several mechanisms to execute a program. For faster startup times and to observe the initial behavior of an execution, interpretation can be initially used. But after a while, JITs dynamically produce native code for parts of the program they execute often. Although some time is spent compiling dynamically, this mechanism makes for much…
▽ More
Modern Just-in-Time compilers (or JITs) typically interleave several mechanisms to execute a program. For faster startup times and to observe the initial behavior of an execution, interpretation can be initially used. But after a while, JITs dynamically produce native code for parts of the program they execute often. Although some time is spent compiling dynamically, this mechanism makes for much faster times for the remaining of the program execution. Such compilers are complex pieces of software with various components, and greatly rely on a precise interplay between the different languages being executed, including on-stack-replacement. Traditional static compilers like CompCert have been mechanized in proof assistants, but JITs have been scarcely formalized so far, partly due to their impure nature and their numerous components. This work presents a model JIT with dynamic generation of native code, implemented and formally verified in Coq. Although some parts of a JIT cannot be written in Coq, we propose a proof methodology to delimit, specify and reason on the impure effects of a JIT. We argue that the daunting task of formally verifying a complete JIT should draw on existing proofs of native code generation. To this end, our work successfully reuses CompCert and its correctness proofs during dynamic compilation. Finally, our prototype can be extracted and executed.
△ Less
Submitted 6 December, 2022;
originally announced December 2022.
-
Changing Observations in Epistemic Temporal Logic
Authors:
Aurèle Barrière,
Bastien Maubert,
Aniello Murano,
Sasha Rubin
Abstract:
We study dynamic changes of agents' observational power in logics of knowledge and time. We consider CTL*K, the extension of CTL* with knowledge operators, and enrich it with a new operator that models a change in an agent's way of observing the system. We extend the classic semantics of knowledge for perfect-recall agents to account for changes of observation, and we show that this new operator s…
▽ More
We study dynamic changes of agents' observational power in logics of knowledge and time. We consider CTL*K, the extension of CTL* with knowledge operators, and enrich it with a new operator that models a change in an agent's way of observing the system. We extend the classic semantics of knowledge for perfect-recall agents to account for changes of observation, and we show that this new operator strictly increases the expressivity of CTL*K. We reduce the model-checking problem for our logic to that for CTL*K, which is known to be decidable. This provides a solution to the model-checking problem for our logic, but its complexity is not optimal. Indeed we provide a direct decision procedure with better complexity.
△ Less
Submitted 3 September, 2018; v1 submitted 17 May, 2018;
originally announced May 2018.
-
High local genetic diversity and low outcrossing rate in Caenorhabditis elegans natural populations
Authors:
Antoine Barrière,
Marie-Anne Félix
Abstract:
Background: Caenorhabditis elegans is a major model system in biology, yet very little is known about its biology outside the laboratory. Especially, its unusual mode of reproduction with self-fertile hermaphrodites and facultative males raises the question of its frequency of outcrossing in natural populations. Results: We describe the first analysis of C. elegans individuals sampled directly f…
▽ More
Background: Caenorhabditis elegans is a major model system in biology, yet very little is known about its biology outside the laboratory. Especially, its unusual mode of reproduction with self-fertile hermaphrodites and facultative males raises the question of its frequency of outcrossing in natural populations. Results: We describe the first analysis of C. elegans individuals sampled directly from natural populations. C. elegans is found predominantly in the dauer stage and with a very low frequency of males compared with hermaphrodites. While C. elegans was previously shown to display a low worldwide genetic diversity, we find by comparison a surprisingly high local genetic diversity of C. elegans populations; this local diversity is contributed in great part by immigration of new alleles rather than by mutation. Our results on heterozygote frequency, male frequency and linkage disequilibrium furthermore show that selfing is the predominant mode of reproduction in C. elegans natural populations, yet that infrequent outcrossing events occur, at a rate of approximately 1%. Conclusions: Our results give a first insight in the biology of C. elegans in the natural populations. They demonstrate that local populations of C. elegans are genetically diverse and that a low frequency of outcrossing allows for the recombination of these locally diverse genotypes.
△ Less
Submitted 1 August, 2005;
originally announced August 2005.