-
Symbolic Automata: $ω$-Regularity Modulo Theories
Authors:
Margus Veanes,
Thomas Ball,
Gabriel Ebner,
Olli Saarikivi
Abstract:
Symbolic automata are finite state automata that support potentially infinite alphabets, such as the set of rational numbers, generally applied to regular expressions/languages over finite words. In symbolic automata (or automata modulo theories), an alphabet is represented by an effective Boolean algebra, supported by a decision procedure for satisfiability. Regular languages over infinite words…
▽ More
Symbolic automata are finite state automata that support potentially infinite alphabets, such as the set of rational numbers, generally applied to regular expressions/languages over finite words. In symbolic automata (or automata modulo theories), an alphabet is represented by an effective Boolean algebra, supported by a decision procedure for satisfiability. Regular languages over infinite words (so called $ω$-regular languages) have a rich history paralleling that of regular languages over finite words, with well known applications to model checking via Büchi automata and temporal logics.
We generalize symbolic automata to support $ω$-regular languages via symbolic transition terms and symbolic derivatives, bringing together a variety of classic automata and logics in a unified framework that provides all the necessary ingredients to support symbolic model checking modulo $A$, $NBW_A$. In particular, we define: (1) alternating Büchi automata modulo $A$, $ABW_A$ as well (non-alternating) non-deterministic Büchi automata modulo $A$, $NBW_A$; (2) an alternation elimination algorithm that incrementally constructs an $NBW_A$ from an $ABW_A$, and can also be used for constructing the product of two $NBW_A$'s; (3) a definition of linear temporal logic (LTL) modulo $A$ that generalizes Vardi's construction of alternating Büchi automata from LTL, using (2) to go from LTL modulo $A$ to $NBW_A$ via $ABW_A$.
Finally, we present a combination of LTL modulo $A$ with extended regular expressions modulo $A$ that generalizes the Property Specification Language (PSL). Our combination allows regex complement, that is not supported in PSL but can be supported naturally by using symbolic transition terms.
△ Less
Submitted 3 October, 2023;
originally announced October 2023.
-
Derivative Based Extended Regular Expression Matching Supporting Intersection, Complement and Lookarounds
Authors:
Ian Erik Varatalu,
Margus Veanes,
Juhan-Peep Ernits
Abstract:
Regular expressions are widely used in software. Various regular expression engines support different combinations of extensions to classical regular constructs such as Kleene star, concatenation, nondeterministic choice (union in terms of match semantics). The extensions include e.g. anchors, lookarounds, counters, backreferences. The properties of combinations of such extensions have been subjec…
▽ More
Regular expressions are widely used in software. Various regular expression engines support different combinations of extensions to classical regular constructs such as Kleene star, concatenation, nondeterministic choice (union in terms of match semantics). The extensions include e.g. anchors, lookarounds, counters, backreferences. The properties of combinations of such extensions have been subject of active recent research.
In the current paper we present a symbolic derivatives based approach to finding matches to regular expressions that, in addition to the classical regular constructs, also support complement, intersection and lookarounds (both negative and positive lookaheads and lookbacks). The theory of computing symbolic derivatives and determining nullability given an input string is presented that shows that such a combination of extensions yields a match semantics that corresponds to an effective Boolean algebra, which in turn opens up possibilities of applying various Boolean logic rewrite rules to optimize the search for matches.
In addition to the theoretical framework we present an implementation of the combination of extensions to demonstrate the efficacy of the approach accompanied with practical examples.
△ Less
Submitted 25 September, 2023;
originally announced September 2023.
-
Incremental Dead State Detection in Logarithmic Time
Authors:
Caleb Stanford,
Margus Veanes
Abstract:
Identifying live and dead states in an abstract transition system is a recurring problem in formal verification; for example, it arises in our recent work on efficiently deciding regex constraints in SMT. However, state-of-the-art graph algorithms for maintaining reachability information incrementally (that is, as states are visited and before the entire state space is explored) assume that new ed…
▽ More
Identifying live and dead states in an abstract transition system is a recurring problem in formal verification; for example, it arises in our recent work on efficiently deciding regex constraints in SMT. However, state-of-the-art graph algorithms for maintaining reachability information incrementally (that is, as states are visited and before the entire state space is explored) assume that new edges can be added from any state at any time, whereas in many applications, outgoing edges are added from each state as it is explored. To formalize the latter situation, we propose guided incremental digraphs (GIDs), incremental graphs which support labeling closed states (states which will not receive further outgoing edges). Our main result is that dead state detection in GIDs is solvable in $O(\log m)$ amortized time per edge for $m$ edges, improving upon $O(\sqrt{m})$ per edge due to Bender, Fineman, Gilbert, and Tarjan (BFGT) for general incremental directed graphs.
We introduce two algorithms for GIDs: one establishing the logarithmic time bound, and a second algorithm to explore a lazy heuristics-based approach. To enable an apples-to-apples experimental comparison, we implemented both algorithms, two simpler baselines, and the state-of-the-art BFGT baseline using a common directed graph interface in Rust. Our evaluation shows $110$-$530$x speedups over BFGT for the largest input graphs over a range of graph classes, random graphs, and graphs arising from regex benchmarks.
△ Less
Submitted 29 May, 2023; v1 submitted 12 January, 2023;
originally announced January 2023.
-
Succinct Determinisation of Counting Automata via Sphere Construction (Technical Report)
Authors:
Lukáš Holík,
Ondřej Lengál,
Olli Saarikivi,
Lenka Turoňová,
Margus Veanes,
Tomáš Vojnar
Abstract:
We propose an efficient algorithm for determinising counting automata (CAs), i.e., finite automata extended with bounded counters. The algorithm avoids unfolding counters into control states, unlike the naïve approach, and thus produces much smaller deterministic automata. We also develop a simplified and faster version of the general algorithm for the sub-class of so-called monadic CAs (MCAs), i.…
▽ More
We propose an efficient algorithm for determinising counting automata (CAs), i.e., finite automata extended with bounded counters. The algorithm avoids unfolding counters into control states, unlike the naïve approach, and thus produces much smaller deterministic automata. We also develop a simplified and faster version of the general algorithm for the sub-class of so-called monadic CAs (MCAs), i.e., CAs with counting loops on character classes, which are common in practice. Our main motivation is (besides applications in verification and decision procedures of logics) the application of deterministic (M)CAs in pattern matching regular expressions with counting, which are very common in e.g. network traffic processing and log analysis. We have evaluated our algorithm against practical benchmarks from these application domains and concluded that compared to the naïve approach, our algorithm is much less prone to explode, produces automata that can be several orders of magnitude smaller, and is overall faster.
△ Less
Submitted 4 October, 2019;
originally announced October 2019.
-
Simulation Algorithms for Symbolic Automata (Technical Report)
Authors:
Lukáš Holík,
Ondřej Lengál,
Juraj Síč,
Margus Veanes,
Tomáš Vojnar
Abstract:
We investigate means of efficient computation of the simulation relation over symbolic finite automata (SFAs), i.e., finite automata with transitions labeled by predicates over alphabet symbols. In one approach, we build on the algorithm by Ilie, Navaro, and Yu proposed originally for classical finite automata, modifying it using the so-called mintermisation of the transition predicates. This solu…
▽ More
We investigate means of efficient computation of the simulation relation over symbolic finite automata (SFAs), i.e., finite automata with transitions labeled by predicates over alphabet symbols. In one approach, we build on the algorithm by Ilie, Navaro, and Yu proposed originally for classical finite automata, modifying it using the so-called mintermisation of the transition predicates. This solution, however, generates all Boolean combinations of the predicates, which easily causes an exponential blowup in the number of transitions. Therefore, we propose two more advanced solutions. The first one still applies mintermisation but in a local way, mitigating the size of the exponential blowup. The other one focuses on a novel symbolic way of dealing with transitions, for which we need to sacrifice the counting technique of the original algorithm (counting is used to decrease the dependency of the running time on the number of transitions from quadratic to linear). We perform a thorough experimental evaluation of all the algorithms, together with several further alternatives, showing that all of them have their merits in practice, but with the clear indication that in most of the cases, efficient treatment of symbolic transitions is more beneficial than counting.
△ Less
Submitted 27 July, 2018; v1 submitted 23 July, 2018;
originally announced July 2018.