Search | arXiv e-print repository

Relational Perspective on Graph Query Languages

Authors: Diego Figueira, Anthony W. Lin, Liat Peterfreund

Abstract: We study a relational perspective of graph database querying. Such a perspective underlies various graph database systems but very few theoretical investigations have been conducted on it. This perspective offers a powerful and unified framework to study graph database querying, by which algorithms and complexity follow from classical results. We provide two concrete applications. The first is q… ▽ More We study a relational perspective of graph database querying. Such a perspective underlies various graph database systems but very few theoretical investigations have been conducted on it. This perspective offers a powerful and unified framework to study graph database querying, by which algorithms and complexity follow from classical results. We provide two concrete applications. The first is querying property graphs. The property graph data model supersedes previously proposed graph models and underlies the new standard GQL for graph query languages. We show that this standard can be, by and large, expressed by extensions of relational calculus with transitive closure operators (FO[TC]) and existential second-order quantifiers (ESO). With this, we obtain optimal data complexity bounds, along with extensions including schema validation. The second application is incorporating data from concrete domains (e.g., numbers) in graph database querying. We use embedded finite model theory and, by exploiting a generic Restricted Quantifier Collapse (RQC) result for FO[TC] and ESO, we obtain optimal data complexity bounds for GQL with arithmetics and comparisons. Moreover, we show that Regular Data Path Querying with operations on data (i.e. using register automata formalisms) can be captured in FO[TC] over embedded finite graphs while preserving nondeterministic logspace data complexity. △ Less

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2406.17871 [pdf, other]

Revisiting the Expressiveness Landscape of Data Graph Queries

Authors: Michael Benedikt, Anthony Widjaja Lin, Di-De Yen

Abstract: The study of graph queries in database theory has spanned more than three decades, resulting in a multitude of proposals for graph query languages. These languages differ in the mechanisms. We can identify three main families of languages, with the canonical representatives being: (1) regular path queries, (2) walk logic, and (3) first-order logic with transitive closure operators. This paper prov… ▽ More The study of graph queries in database theory has spanned more than three decades, resulting in a multitude of proposals for graph query languages. These languages differ in the mechanisms. We can identify three main families of languages, with the canonical representatives being: (1) regular path queries, (2) walk logic, and (3) first-order logic with transitive closure operators. This paper provides a complete picture of the expressive power of these languages in the context of data graphs. Specifically, we consider a graph data model that supports querying over both data and topology. For example, "Does there exist a path between two different persons in a social network with the same last name?". We also show that an extension of (1), augmented with transitive closure operators, can unify the expressivity of (1)--(3) without increasing the query evaluation complexity. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2405.16166 [pdf, other]

The Power of Hard Attention Transformers on Data Sequences: A Formal Language Theoretic Perspective

Authors: Pascal Bergsträßer, Chris Köcher, Anthony Widjaja Lin, Georg Zetzsche

Abstract: Formal language theory has recently been successfully employed to unravel the power of transformer encoders. This setting is primarily applicable in Natural Languange Processing (NLP), as a token embedding function (where a bounded number of tokens is admitted) is first applied before feeding the input to the transformer. On certain kinds of data (e.g. time series), we want our transformers to be… ▽ More Formal language theory has recently been successfully employed to unravel the power of transformer encoders. This setting is primarily applicable in Natural Languange Processing (NLP), as a token embedding function (where a bounded number of tokens is admitted) is first applied before feeding the input to the transformer. On certain kinds of data (e.g. time series), we want our transformers to be able to handle \emph{arbitrary} input sequences of numbers (or tuples thereof) without \emph{a priori} limiting the values of these numbers. In this paper, we initiate the study of the expressive power of transformer encoders on sequences of data (i.e. tuples of numbers). Our results indicate an increase in expressive power of hard attention transformers over data sequences, in stark contrast to the case of strings. In particular, we prove that Unique Hard Attention Transformers (UHAT) over inputs as data sequences no longer lie within the circuit complexity class $AC^0$ (even without positional encodings), unlike the case of string inputs, but are still within the complexity class $TC^0$ (even with positional encodings). Over strings, UHAT without positional encodings capture only regular languages. In contrast, we show that over data sequences UHAT can capture non-regular properties. Finally, we show that UHAT capture languages definable in an extension of linear temporal logic with unary numeric predicates and arithmetics. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2401.02618 [pdf, ps, other]

doi 10.1145/3632864

Regular Abstractions for Array Systems

Authors: Chih-Duo Hong, Anthony W. Lin

Abstract: Verifying safety and liveness over array systems is a highly challenging problem. Array systems naturally capture parameterized systems such as distributed protocols with an unbounded number of processes. Such distributed protocols often exploit process IDs during their computation, resulting in array systems whose element values range over an infinite domain. In this paper, we develop a novel fra… ▽ More Verifying safety and liveness over array systems is a highly challenging problem. Array systems naturally capture parameterized systems such as distributed protocols with an unbounded number of processes. Such distributed protocols often exploit process IDs during their computation, resulting in array systems whose element values range over an infinite domain. In this paper, we develop a novel framework for proving safety and liveness over array systems. The crux of the framework is to overapproximate an array system as a string rewriting system (i.e. over a finite alphabet) by means of a new predicate abstraction that exploits the so-called indexed predicates. This allows us to tap into powerful verification methods for string rewriting systems that have been heavily developed in the last few decades (e.g. regular model checking). We demonstrate how our method yields simple, automatically verifiable proofs of safety and liveness properties for challenging examples, including Dijkstra's self-stabilizing protocol and the Chang-Roberts leader election protocol. △ Less

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2311.17037 [pdf, other]

Concurrent Stochastic Lossy Channel Games

Authors: Daniel Stan, Muhammad Najib, Anthony Widjaja Lin, Parosh Aziz Abdulla

Abstract: Concurrent stochastic games are an important formalism for the rational verification of probabilistic multi-agent systems, which involves verifying whether a temporal logic property is satisfied in some or all game-theoretic equilibria of such systems. In this work, we study the rational verification of probabilistic multi-agent systems where agents can cooperate by communicating over unbounded lo… ▽ More Concurrent stochastic games are an important formalism for the rational verification of probabilistic multi-agent systems, which involves verifying whether a temporal logic property is satisfied in some or all game-theoretic equilibria of such systems. In this work, we study the rational verification of probabilistic multi-agent systems where agents can cooperate by communicating over unbounded lossy channels. To model such systems, we present concurrent stochastic lossy channel games (CSLCG) and employ an equilibrium concept from cooperative game theory known as the core, which is the most fundamental and widely studied cooperative equilibrium concept. Our main contribution is twofold. First, we show that the rational verification problem is undecidable for systems whose agents have almost-sure LTL objectives. Second, we provide a decidable fragment of such a class of objectives that subsumes almost-sure reachability and safety. Our techniques involve reductions to solving infinite-state zero-sum games with conjunctions of qualitative objectives. To the best of our knowledge, our result represents the first decidability result on the rational verification of stochastic multi-agent systems on infinite arenas. △ Less

Submitted 28 November, 2023; originally announced November 2023.

Comments: To appear at CSL 2024. Extended version

arXiv:2311.15883 [pdf, other]

Characterising and Verifying the Core in Concurrent Multi-Player Mean-Payoff Games (Full Version)

Authors: Julian Gutierrez, Anthony W. Lin, Muhammad Najib, Thomas Steeples, Michael Wooldridge

Abstract: Concurrent multi-player mean-payoff games are important models for systems of agents with individual, non-dichotomous preferences. Whilst these games have been extensively studied in terms of their equilibria in non-cooperative settings, this paper explores an alternative solution concept: the core from cooperative game theory. This concept is particularly relevant for cooperative AI systems, as i… ▽ More Concurrent multi-player mean-payoff games are important models for systems of agents with individual, non-dichotomous preferences. Whilst these games have been extensively studied in terms of their equilibria in non-cooperative settings, this paper explores an alternative solution concept: the core from cooperative game theory. This concept is particularly relevant for cooperative AI systems, as it enables the modelling of cooperation among agents, even when their goals are not fully aligned. Our contribution is twofold. First, we provide a characterisation of the core using discrete geometry techniques and establish a necessary and sufficient condition for its non-emptiness. We then use the characterisation to prove the existence of polynomial witnesses in the core. Second, we use the existence of such witnesses to solve key decision problems in rational verification and provide tight complexity bounds for the problem of checking whether some/every equilibrium in a game satisfies a given LTL or GR(1) specification. Our approach is general and can be adapted to handle other specifications expressed in various fragments of LTL without incurring additional computational costs. △ Less

Submitted 27 November, 2023; originally announced November 2023.

Comments: This is the full version of the paper with the same title that appears in the CSL'24 proceedings

arXiv:2311.04031 [pdf, other]

Ramsey Quantifiers in Linear Arithmetics

Authors: Pascal Bergsträßer, Moses Ganardi, Anthony W. Lin, Georg Zetzsche

Abstract: We study Satisfiability Modulo Theories (SMT) enriched with the so-called Ramsey quantifiers, which assert the existence of cliques (complete graphs) in the graph induced by some formulas. The extended framework is known to have applications in proving program termination (in particular, whether a transitive binary predicate is well-founded), and monadic decomposability of SMT formulas. Our main r… ▽ More We study Satisfiability Modulo Theories (SMT) enriched with the so-called Ramsey quantifiers, which assert the existence of cliques (complete graphs) in the graph induced by some formulas. The extended framework is known to have applications in proving program termination (in particular, whether a transitive binary predicate is well-founded), and monadic decomposability of SMT formulas. Our main result is a new algorithm for eliminating Ramsey quantifiers from three common SMT theories: Linear Integer Arithmetic (LIA), Linear Real Arithmetic (LRA), and Linear Integer Real Arithmetic (LIRA). In particular, if we work only with existentially quantified formulas, then our algorithm runs in polynomial time and produces a formula of linear size. One immediate consequence is that checking well-foundedness of a given formula in the aforementioned theory defining a transitive predicate can be straightforwardly handled by highly optimized SMT-solvers. We show also how this provides a uniform semi-algorithm for verifying termination and liveness with completeness guarantee (in fact, with an optimal computational complexity) for several well-known classes of infinite-state systems, which include succinct timed systems, one-counter systems, and monotonic counter systems. Another immediate consequence is a solution to an open problem on checking monadic decomposability of a given relation in quantifier-free fragments of LRA and LIRA, which is an important problem in automated reasoning and constraint databases. Our result immediately implies decidability of this problem with an optimal complexity (coNP-complete) and enables exploitation of SMT-solvers. It also provides a termination guarantee for the generic monadic decomposition algorithm of Veanes et al. for LIA, LRA, and LIRA. We report encouraging experimental results on a prototype implementation of our algorithms on micro-benchmarks. △ Less

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2311.03901 [pdf, ps, other]

Parikh's Theorem Made Symbolic

Authors: Matthew Hague, Artur Jeż, Anthony W. Lin

Abstract: Parikh's Theorem is a fundamental result in automata theory with numerous applications in computer science: software verification (e.g. infinite-state verification, string constraints, and theory of arrays), verification of cryptographic protocols (e.g. using Horn clauses modulo equational theories) and database querying (e.g. evaluating path-queries in graph databases). Parikh's Theorem states th… ▽ More Parikh's Theorem is a fundamental result in automata theory with numerous applications in computer science: software verification (e.g. infinite-state verification, string constraints, and theory of arrays), verification of cryptographic protocols (e.g. using Horn clauses modulo equational theories) and database querying (e.g. evaluating path-queries in graph databases). Parikh's Theorem states that the letter-counting abstraction of a language recognized by finite automata or context-free grammars is definable in Presburger Arithmetic. Unfortunately, real-world applications typically require large alphabets - which are well-known to be not amenable to explicit treatment of the alphabets. Symbolic automata have proven in the last decade to be an effective algorithmic framework for handling large finite or even infinite alphabets. A symbolic automaton employs an effective boolean algebra, which offers a symbolic representation of character sets and often lends itself to an exponentially more succinct representation of a language. Instead of letter-counting, Parikh's Theorem for symbolic automata amounts to counting the number of times different predicates are satisfied by an input sequence. Unfortunately, naively applying Parikh's Theorem from classical automata theory to symbolic automata yields existential Presburger formulas of exponential size. We provide a new construction for Parikh's Theorem for symbolic automata and grammars, which avoids this exponential blowup: our algorithm computes an existential formula in polynomial-time over (quantifier-free) Presburger and the base theory. In fact, our algorithm extends to the model of parametric symbolic grammars, which are one of the most expressive models of languages over infinite alphabets. We have implemented our algorithm and show it can be used to solve string constraints that are difficult to solve by existing solvers. △ Less

Submitted 7 November, 2023; originally announced November 2023.

Comments: Accepted tp POPL '24

arXiv:2310.03817 [pdf, ps, other]

Logical Languages Accepted by Transformer Encoders with Hard Attention

Authors: Pablo Barcelo, Alexander Kozachinskiy, Anthony Widjaja Lin, Vladimir Podolskii

Abstract: We contribute to the study of formal languages that can be recognized by transformer encoders. We focus on two self-attention mechanisms: (1) UHAT (Unique Hard Attention Transformers) and (2) AHAT (Average Hard Attention Transformers). UHAT encoders are known to recognize only languages inside the circuit complexity class ${\sf AC}^0$, i.e., accepted by a family of poly-sized and depth-bounded boo… ▽ More We contribute to the study of formal languages that can be recognized by transformer encoders. We focus on two self-attention mechanisms: (1) UHAT (Unique Hard Attention Transformers) and (2) AHAT (Average Hard Attention Transformers). UHAT encoders are known to recognize only languages inside the circuit complexity class ${\sf AC}^0$, i.e., accepted by a family of poly-sized and depth-bounded boolean circuits with unbounded fan-ins. On the other hand, AHAT encoders can recognize languages outside ${\sf AC}^0$), but their expressive power still lies within the bigger circuit complexity class ${\sf TC}^0$, i.e., ${\sf AC}^0$-circuits extended by majority gates. We first show a negative result that there is an ${\sf AC}^0$-language that cannot be recognized by an UHAT encoder. On the positive side, we show that UHAT encoders can recognize a rich fragment of ${\sf AC}^0$-languages, namely, all languages definable in first-order logic with arbitrary unary numerical predicates. This logic, includes, for example, all regular languages from ${\sf AC}^0$. We then show that AHAT encoders can recognize all languages of our logic even when we enrich it with counting terms. We apply these results to derive new results on the expressive power of UHAT and AHAT up to permutation of letters (a.k.a. Parikh images). △ Less

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2308.00175 [pdf, ps, other]

Decision Procedures for Sequence Theories (Technical Report)

Authors: Artur Jeż, Anthony W. Lin, Oliver Markgraf, Philipp Rümmer

Abstract: Sequence theories are an extension of theories of strings with an infinite alphabet of letters, together with a corresponding alphabet theory (e.g. linear integer arithmetic). Sequences are natural abstractions of extendable arrays, which permit a wealth of operations including append, map, split, and concatenation. In spite of the growing amount of tool support for theories of sequences by leadin… ▽ More Sequence theories are an extension of theories of strings with an infinite alphabet of letters, together with a corresponding alphabet theory (e.g. linear integer arithmetic). Sequences are natural abstractions of extendable arrays, which permit a wealth of operations including append, map, split, and concatenation. In spite of the growing amount of tool support for theories of sequences by leading SMT-solvers, little is known about the decidability of sequence theories, which is in stark contrast to the state of the theories of strings. We show that the decidable theory of strings with concatenation and regular constraints can be extended to the world of sequences over an alphabet theory that forms a Boolean algebra, while preserving decidability. In particular, decidability holds when regular constraints are interpreted as parametric automata (which extend both symbolic automata and variable automata), but fails when interpreted as register automata (even over the alphabet theory of equality). When length constraints are added, the problem is Turing-equivalent to word equations with length (and regular) constraints. Similar investigations are conducted in the presence of symbolic transducers, which naturally model sequence functions like map, split, filter, etc. We have developed a new sequence solver, SeCo, based on parametric automata, and show its efficacy on two classes of benchmarks: (i) invariant checking on array-manipulating programs and parameterized systems, and (ii) benchmarks on symbolic register automata. △ Less

Submitted 31 July, 2023; originally announced August 2023.

arXiv:2205.09015 [pdf, ps, other]

Ramsey Quantifiers over Automatic Structures: Complexity and Applications to Verification

Authors: Pascal Bergsträßer, Moses Ganardi, Anthony W. Lin, Georg Zetzsche

Abstract: Automatic structures are infinite structures that are finitely represented by synchronized finite-state automata. This paper concerns specifically automatic structures over finite words and trees (ranked/unranked). We investigate the "directed version" of Ramsey quantifiers, which express the existence of an infinite directed clique. This subsumes the standard "undirected version" of Ramsey quanti… ▽ More Automatic structures are infinite structures that are finitely represented by synchronized finite-state automata. This paper concerns specifically automatic structures over finite words and trees (ranked/unranked). We investigate the "directed version" of Ramsey quantifiers, which express the existence of an infinite directed clique. This subsumes the standard "undirected version" of Ramsey quantifiers. Interesting connections between Ramsey quantifiers and two problems in verification are firstly observed: (1) reachability with Büchi and generalized Büchi conditions in regular model checking can be seen as Ramsey quantification over transitive automatic graphs (i.e., whose edge relations are transitive), (2) checking monadic decomposability (a.k.a. recognizability) of automatic relations can be viewed as Ramsey quantification over co-transitive automatic graphs (i.e., the complements of whose edge relations are transitive). We provide a comprehensive complexity landscape of Ramsey quantifiers in these three cases (general, transitive, co-transitive), all between NL and EXP. In turn, this yields a wealth of new results with precise complexity, e.g., verification of subtree/flat prefix rewriting, as well as monadic decomposability over tree-automatic relations. We also obtain substantially simpler proofs, e.g., for NL complexity for monadic decomposability over word-automatic relations (given by DFAs). △ Less

Submitted 13 February, 2023; v1 submitted 18 May, 2022; originally announced May 2022.

arXiv:2112.06039 [pdf, other]

doi 10.1145/3497775.3503691

CertiStr: A Certified String Solver (technical report)

Authors: Shuanglong Kan, Anthony W. Lin, Philipp Rümmer, Micha Schrader

Abstract: Theories over strings are among the most heavily researched logical theories in the SMT community in the past decade, owing to the error-prone nature of string manipulations, which often leads to security vulnerabilities (e.g. cross-site scripting and code injection). The majority of the existing decision procedures and solvers for these theories are themselves intricate; they are complicated algo… ▽ More Theories over strings are among the most heavily researched logical theories in the SMT community in the past decade, owing to the error-prone nature of string manipulations, which often leads to security vulnerabilities (e.g. cross-site scripting and code injection). The majority of the existing decision procedures and solvers for these theories are themselves intricate; they are complicated algorithmically, and also have to deal with a very rich vocabulary of operations. This has led to a plethora of bugs in implementation, which have for instance been discovered through fuzzing. In this paper, we present CertiStr, a certified implementation of a string constraint solver for the theory of strings with concatenation and regular constraints. CertiStr aims to solve string constraints using a forward-propagation algorithm based on symbolic representations of regular constraints as symbolic automata, which returns three results: sat, unsat, and unknown, and is guaranteed to terminate for the string constraints whose concatenation dependencies are acyclic. The implementation has been developed and proven correct in Isabelle/HOL, through which an effective solver in OCaml was generated. We demonstrate the effectiveness and efficiency of CertiStr against the standard Kaluza benchmark, in which 80.4% tests are in the string constraint fragment of CertiStr. Of these 80.4% tests, CertiStr can solve 83.5% (i.e. CertiStr returns sat or unsat) within 60s. △ Less

Submitted 11 December, 2021; originally announced December 2021.

arXiv:2111.04298 [pdf, other]

Solving String Constraints With Regex-Dependent Functions Through Transducers With Priorities And Variables

Authors: Taolue Chen, Alejandro Flores Lamas, Matthew Hague, Zhilei Han, Denghang Hu, Shuanglong Kan, Anthony Widjaja Lin, Philipp Ruemmer, Zhilin Wu

Abstract: Regular expressions are a classical concept in formal language theory. Regular expressions in programming languages (RegEx) such as JavaScript, feature non-standard semantics of operators (e.g. greedy/lazy Kleene star), as well as additional features such as capturing groups and references. While symbolic execution of programs containing RegExes appeals to string solvers natively supporting import… ▽ More Regular expressions are a classical concept in formal language theory. Regular expressions in programming languages (RegEx) such as JavaScript, feature non-standard semantics of operators (e.g. greedy/lazy Kleene star), as well as additional features such as capturing groups and references. While symbolic execution of programs containing RegExes appeals to string solvers natively supporting important features of RegEx, such a string solver is hitherto missing. In this paper, we propose the first string theory and string solver that natively provide such a support. The key idea of our string solver is to introduce a new automata model, called prioritized streaming string transducers (PSST), to formalize the semantics of RegEx-dependent string functions. PSSTs combine priorities, which have previously been introduced in prioritized finite-state automata to capture greedy/lazy semantics, with string variables as in streaming string transducers to model capturing groups. We validate the consistency of the formal semantics with the actual JavaScript semantics by extensive experiments. Furthermore, to solve the string constraints, we show that PSSTs enjoy nice closure and algorithmic properties, in particular, the regularity-preserving property (i.e., pre-images of regular constraints under PSSTs are regular), and introduce a sound sequent calculus that exploits these properties and performs propagation of regular constraints by means of taking post-images or pre-images. Although the satisfiability of the string constraint language is undecidable, we show that our approach is complete for the so-called straight-line fragment. We evaluate the performance of our string solver on over 195000 string constraints generated from an open-source RegEx library. The experimental results show the efficacy of our approach, drastically improving the existing methods in both precision and efficiency. △ Less

Submitted 21 November, 2021; v1 submitted 8 November, 2021; originally announced November 2021.

Comments: 44 pages, 18 figures

ACM Class: F.4.1; F.1.1

arXiv:2107.09119 [pdf, ps, other]

Rational Verification for Probabilistic Systems

Authors: Julian Gutierrez, Lewis Hammond, Anthony W. Lin, Muhammad Najib, Michael Wooldridge

Abstract: Rational verification is the problem of determining which temporal logic properties will hold in a multi-agent system, under the assumption that agents in the system act rationally, by choosing strategies that collectively form a game-theoretic equilibrium. Previous work in this area has largely focussed on deterministic systems. In this paper, we develop the theory and algorithms for rational ver… ▽ More Rational verification is the problem of determining which temporal logic properties will hold in a multi-agent system, under the assumption that agents in the system act rationally, by choosing strategies that collectively form a game-theoretic equilibrium. Previous work in this area has largely focussed on deterministic systems. In this paper, we develop the theory and algorithms for rational verification in probabilistic systems. We focus on concurrent stochastic games (CSGs), which can be used to model uncertainty and randomness in complex multi-agent environments. We study the rational verification problem for both non-cooperative games and cooperative games in the qualitative probabilistic setting. In the former case, we consider LTL properties satisfied by the Nash equilibria of the game and in the latter case LTL properties satisfied by the core. In both cases, we show that the problem is 2EXPTIME-complete, thus not harder than the much simpler verification problem of model checking LTL properties of systems modelled as Markov decision processes (MDPs). △ Less

Submitted 26 July, 2021; v1 submitted 19 July, 2021; originally announced July 2021.

Comments: 18th International Conference on Principles of Knowledge Representation and Reasoning (KR 2021)

arXiv:2105.13071 [pdf, other]

Learning Union of Integer Hypercubes with Queries (Technical Report)

Authors: Oliver Markgraf, Daniel Stan, Anthony W. Lin

Abstract: We study the problem of learning a finite union of integer (axis-aligned) hypercubes over the d-dimensional integer lattice, i.e., whose edges are parallel to the coordinate axes. This is a natural generalization of the classic problem in the computational learning theory of learning rectangles. We provide a learning algorithm with access to a minimally adequate teacher (i.e. membership and equiva… ▽ More We study the problem of learning a finite union of integer (axis-aligned) hypercubes over the d-dimensional integer lattice, i.e., whose edges are parallel to the coordinate axes. This is a natural generalization of the classic problem in the computational learning theory of learning rectangles. We provide a learning algorithm with access to a minimally adequate teacher (i.e. membership and equivalence oracles) that solves this problem in polynomial-time, for any fixed dimension d. Over a non-fixed dimension, the problem subsumes the problem of learning DNF boolean formulas, a central open problem in the field. We have also provided extensions to handle infinite hypercubes in the union, as well as showing how subset queries could improve the performance of the learning algorithm in practice. Our problem has a natural application to the problem of monadic decomposition of quantifier-free integer linear arithmetic formulas, which has been actively studied in recent years. In particular, a finite union of integer hypercubes correspond to a finite disjunction of monadic predicates over integer linear arithmetic (without modulo constraints). Our experiments suggest that our learning algorithms substantially outperform the existing algorithms. △ Less

Submitted 27 May, 2021; originally announced May 2021.

Comments: Extended version (technical report) of a paper published in CAV 2021

arXiv:2102.04361 [pdf, other]

Regular Model Checking Approach to Knowledge Reasoning over Parameterized Systems (technical report)

Authors: Daniel Stan, Anthony Widjaja Lin

Abstract: We present a general framework for modelling and verifying epistemic properties over parameterized multi-agent systems that communicate by truthful public announcements. In our framework, the number of agents or the amount of certain resources are parameterized (i.e. not known a priori), and the corresponding verification problem asks whether a given epistemic property is true regardless of the in… ▽ More We present a general framework for modelling and verifying epistemic properties over parameterized multi-agent systems that communicate by truthful public announcements. In our framework, the number of agents or the amount of certain resources are parameterized (i.e. not known a priori), and the corresponding verification problem asks whether a given epistemic property is true regardless of the instantiation of the parameters. For example, in a muddy children puzzle, one could ask whether each child will eventually find out whether (s)he is muddy, regardless of the number of children. Our framework is regular model checking (RMC)-based, wherein synchronous finite-state automata (equivalently, monadic second-order logic over words) are used to specify the systems. We propose an extension of public announcement logic as specification language. Of special interests is the addition of the so-called iterated public announcement operators, which are crucial for reasoning about knowledge in parameterized systems. Although the operators make the model checking problem undecidable, we show that this becomes decidable when an appropriate "disappearance relation" is given. Further, we show how Angluin's L*-algorithm for learning finite automata can be applied to find a disappearance relation, which is guaranteed to terminate if it is regular. We have implemented the algorithm and apply this to such examples as the Muddy Children Puzzle, the Russian Card Problem, and Large Number Challenge. △ Less

Submitted 8 March, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

Comments: Extended version, version of record accepted at the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-21)

arXiv:2011.02413 [pdf, ps, other]

Probabilistic Bisimulation for Parameterized Systems (Technical Report)

Authors: Chih-Duo Hong, Anthony W. Lin, Rupak Majumdar, Philipp Rümmer

Abstract: Probabilistic bisimulation is a fundamental notion of process equivalence for probabilistic systems. Among others, it has important applications including formalizing the anonymity property of several communication protocols. There is a lot of work on verifying probabilistic bisimulation for finite systems. This is however not the case for parameterized systems, where the problem is in general und… ▽ More Probabilistic bisimulation is a fundamental notion of process equivalence for probabilistic systems. Among others, it has important applications including formalizing the anonymity property of several communication protocols. There is a lot of work on verifying probabilistic bisimulation for finite systems. This is however not the case for parameterized systems, where the problem is in general undecidable. In this paper we provide a generic framework for reasoning about probabilistic bisimulation for parameterized systems. Our approach is in the spirit of software verification, wherein we encode proof rules for probabilistic bisimulation and use a decidable first-order theory to specify systems and candidate bisimulation relations, which can then be checked automatically against the proof rules. As a case study, we show that our framework is sufficiently expressive for proving the anonymity property of the parameterized dining cryptographers protocol and the parameterized grades protocol, when supplied with a candidate regular bisimulation relation. Both of these protocols hitherto could not be verified by existing automatic methods. Moreover, with the help of standard automata learning algorithms, we show that the candidate relations can be synthesized fully automatically, making the verification fully automated. △ Less

Submitted 4 November, 2020; originally announced November 2020.

arXiv:2010.15975 [pdf, ps, other]

String Constraints with Concatenation and Transducers Solved Efficiently (Technical Report)

Authors: Lukas Holik, Petr Janku, Anthony W. Lin, Philipp Rümmer, Tomas Vojnar

Abstract: String analysis is the problem of reasoning about how strings are manipulated by a program. It has numerous applications including automatic detection of cross-site scripting (XSS). A popular string analysis technique includes symbolic executions, which at their core use string (constraint) solvers. Such solvers typically reason about constraints expressed in theories over strings with the concate… ▽ More String analysis is the problem of reasoning about how strings are manipulated by a program. It has numerous applications including automatic detection of cross-site scripting (XSS). A popular string analysis technique includes symbolic executions, which at their core use string (constraint) solvers. Such solvers typically reason about constraints expressed in theories over strings with the concatenation operator as an atomic constraint. In recent years, researchers started to recognise the importance of incorporating the replace-all operator and finite transductions in the theories of strings with concatenation. Such string operations are typically crucial for reasoning about XSS vulnerabilities in web applications, especially for modelling sanitisation functions and implicit browser transductions (e.g. innerHTML). In this paper, we provide the first string solver that can reason about constraints involving both concatenation and finite transductions. Moreover, it has a completeness and termination guarantee for several important fragments (e.g. straight-line fragment). The main challenge addressed in the paper is the prohibitive worst-case complexity of the theory. To this end, we propose a method that exploits succinct alternating finite automata as concise symbolic representations of string constraints. Alternation offers not only exponential savings in space when representing Boolean combinations of transducers, but also a possibility of succinct representation of otherwise costly combinations of transducers and concatenation. Reasoning about the emptiness of the AFA language requires a state-space exploration in an exponential-sized graph, for which we use model checking algorithms (e.g. IC3). We have implemented our algorithm and demonstrated its efficacy on benchmarks that are derived from XSS and other examples in the literature. △ Less

Submitted 29 October, 2020; originally announced October 2020.

Comments: Full version of POPL'18 published paper with all proofs

arXiv:2010.02340 [pdf, ps, other]

Complexity Analysis of Tree Share Structure

Authors: Xuan-Bach Le, Aquinas Hobor, Anthony W. Lin

Abstract: The tree share structure proposed by Dockins et al. is an elegant model for tracking disjoint ownership in concurrent separation logic, but decision procedures for tree shares are hard to implement due to a lack of a systematic theoretical study. We show that the first-order theory of the full Boolean algebra of tree shares (that is, with all tree-share constants) is decidable and has the same com… ▽ More The tree share structure proposed by Dockins et al. is an elegant model for tracking disjoint ownership in concurrent separation logic, but decision procedures for tree shares are hard to implement due to a lack of a systematic theoretical study. We show that the first-order theory of the full Boolean algebra of tree shares (that is, with all tree-share constants) is decidable and has the same complexity as of the first-order theory of Countable Atomless Boolean Algebras. We prove that combining this additive structure with a constant-restricted unary multiplicative "relativization" operator has a non-elementary lower bound. We examine the consequences of this lower bound and prove that it comes from the combination of both theories by proving an upper bound on a generalization of the restricted multiplicative theory in isolation. △ Less

Submitted 5 October, 2020; originally announced October 2020.

Comments: 20 pages including appendix. Published at the 16th Asian Symposium on Programming Languages and Systems (APLAS 2018) in December 2018

arXiv:2009.13459 [pdf, other]

Parameterized Synthesis with Safety Properties

Authors: Oliver Markgraf, Chih-Duo Hong, Anthony W. Lin, Muhammad Najib, Daniel Neider

Abstract: Parameterized synthesis offers a solution to the problem of constructing correct and verified controllers for parameterized systems. Such systems occur naturally in practice (e.g., in the form of distributed protocols where the amount of processes is often unknown at design time and the protocol must work regardless of the number of processes). In this paper, we present a novel learning based appr… ▽ More Parameterized synthesis offers a solution to the problem of constructing correct and verified controllers for parameterized systems. Such systems occur naturally in practice (e.g., in the form of distributed protocols where the amount of processes is often unknown at design time and the protocol must work regardless of the number of processes). In this paper, we present a novel learning based approach to the synthesis of reactive controllers for parameterized systems from safety specifications. We use the framework of regular model checking to model the synthesis problem as an infinite-duration two-player game and show how one can utilize Angluin's well-known L* algorithm to learn correct-by-design controllers. This approach results in a synthesis procedure that is conceptually simpler than existing synthesis methods with a completeness guarantee, whenever a winning strategy can be expressed by a regular set. We have implemented our algorithm in a tool called L*-PSynth and have demonstrated its performance on a range of benchmarks, including robotic motion planning and distributed protocols. Despite the simplicity of L*-PSynth it competes well against (and in many cases even outperforms) the state-of-the-art tools for synthesizing parameterized systems. △ Less

Submitted 28 September, 2020; originally announced September 2020.

Comments: 18 pages

arXiv:2007.15478 [pdf, other]

doi 10.46298/lmcs-17(4:4)2021

Quadratic Word Equations with Length Constraints, Counter Systems, and Presburger Arithmetic with Divisibility

Authors: Anthony W. Lin, Rupak Majumdar

Abstract: Word equations are a crucial element in the theoretical foundation of constraint solving over strings. A word equation relates two words over string variables and constants. Its solution amounts to a function map** variables to constant strings that equate the left and right hand sides of the equation. While the problem of solving word equations is decidable, the decidability of the problem of s… ▽ More Word equations are a crucial element in the theoretical foundation of constraint solving over strings. A word equation relates two words over string variables and constants. Its solution amounts to a function map** variables to constant strings that equate the left and right hand sides of the equation. While the problem of solving word equations is decidable, the decidability of the problem of solving a word equation with a length constraint (i.e., a constraint relating the lengths of words in the word equation) has remained a long-standing open problem. We focus on the subclass of quadratic word equations, i.e., in which each variable occurs at most twice. We first show that the length abstractions of solutions to quadratic word equations are in general not Presburger-definable. We then describe a class of counter systems with Presburger transition relations which capture the length abstraction of a quadratic word equation with regular constraints. We provide an encoding of the effect of a simple loop of the counter systems in the existential theory of Presburger Arithmetic with divisibility (PAD). Since PAD is decidable (NP-hard and is in NEXP), we obtain a decision procedure for quadratic words equations with length constraints for which the associated counter system is flat (i.e., all nodes belong to at most one cycle). In particular, we show a decidability result (in fact, also an NP algorithm with a PAD oracle) for a recently proposed NP-complete fragment of word equations called regular-oriented word equations, when augmented with length constraints. We extend this decidability result (in fact, with a complexity upper bound of PSPACE with a PAD oracle) in the presence of regular constraints. △ Less

Submitted 28 October, 2021; v1 submitted 30 July, 2020; originally announced July 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1805.06701

Journal ref: Logical Methods in Computer Science, Volume 17, Issue 4 (October 29, 2021) lmcs:6693

arXiv:2007.06913 [pdf, ps, other]

A Decision Procedure for Path Feasibility of String Manipulating Programs with Integer Data Type

Authors: Taolue Chen, Matthew Hague, **long He, Denghang Hu, Anthony Widjaja Lin, Philipp Rummer, Zhilin Wu

Abstract: Strings are widely used in programs, especially in web applications. Integer data type occurs naturally in string-manipulating programs, and is frequently used to refer to lengths of, or positions in, strings. Analysis and testing of string-manipulating programs can be formulated as the path feasibility problem: given a symbolic execution path, does there exist an assignment to the inputs that yie… ▽ More Strings are widely used in programs, especially in web applications. Integer data type occurs naturally in string-manipulating programs, and is frequently used to refer to lengths of, or positions in, strings. Analysis and testing of string-manipulating programs can be formulated as the path feasibility problem: given a symbolic execution path, does there exist an assignment to the inputs that yields a concrete execution that realizes this path? Such a problem can naturally be reformulated as a string constraint solving problem. Although state-of-the-art string constraint solvers usually provide support for both string and integer data types, they mainly resort to heuristics without completeness guarantees. In this paper, we propose a decision procedure for a class of string-manipulating programs which includes not only a wide range of string operations such as concatenation, replaceAll, reverse, and finite transducers, but also those involving the integer data-type such as length, indexof, and substring. To the best of our knowledge, this represents one of the most expressive string constraint languages that is currently known to be decidable. Our decision procedure is based on a variant of cost register automata. We implement the decision procedure, giving rise to a new solver OSTRICH+. We evaluate the performance of OSTRICH+ on a wide range of existing and new benchmarks. The experimental results show that OSTRICH+ is the first string decision procedure capable of tackling finite transducers and integer constraints, whilst its overall performance is comparable with the state-of-the-art string constraint solvers. △ Less

Submitted 14 July, 2020; originally announced July 2020.

arXiv:2005.00990 [pdf, other]

Regular Model Checking Revisited (Technical Report)

Authors: Anthony W. Lin, Philipp Rümmer

Abstract: In this contribution we revisit regular model checking, a powerful framework that has been successfully applied for the verification of infinite-state systems, especially parameterized systems (concurrent systems with an arbitrary number of processes). We provide a reformulation of regular model checking with length-preserving transducers in terms of existential second-order theory over automatic… ▽ More In this contribution we revisit regular model checking, a powerful framework that has been successfully applied for the verification of infinite-state systems, especially parameterized systems (concurrent systems with an arbitrary number of processes). We provide a reformulation of regular model checking with length-preserving transducers in terms of existential second-order theory over automatic structures. We argue that this is a natural formulation that enables us tap into powerful synthesis techniques that have been extensively studied in the software verification community. More precisely, in this formulation the first-order part represents the verification conditions for the desired correctness property (for which we have complete solvers), whereas the existentially quantified second-order variables represent the relations to be synthesized. We show that many interesting correctness properties can be formulated in this way, examples being safety, liveness, bisimilarity, and games. More importantly, we show that this new formulation allows new interesting benchmarks (and old regular model checking benchmarks that were previously believed to be difficult), especially in the domain of parameterized system verification, to be solved. △ Less

Submitted 21 November, 2021; v1 submitted 3 May, 2020; originally announced May 2020.

ACM Class: F.3.1

arXiv:2004.12371 [pdf, ps, other]

Monadic Decomposition in Integer Linear Arithmetic (Technical Report)

Authors: Matthew Hague, Anthony Widjaja Lin, Philipp Rümmer, Zhilin Wu

Abstract: Monadic decomposability is a notion of variable independence, which asks whether a given formula in a first-order theory is expressible as a Boolean combination of monadic predicates in the theory. Recently, Veanes et al. showed the usefulness of monadic decomposability in the context of SMT (i.e. the input formula is quantifier-free), and found various interesting applications including string an… ▽ More Monadic decomposability is a notion of variable independence, which asks whether a given formula in a first-order theory is expressible as a Boolean combination of monadic predicates in the theory. Recently, Veanes et al. showed the usefulness of monadic decomposability in the context of SMT (i.e. the input formula is quantifier-free), and found various interesting applications including string analysis. However, checking monadic decomposability is undecidable in general. Decidability for certain theories is known (e.g. Presburger Arithmetic, Tarski's Real-Closed Field), but there are very few results regarding their computational complexity. In this paper, we study monadic decomposability of integer linear arithmetic in the setting of SMT. We show that this decision problem is coNP-complete and, when monadically decomposable, a formula admits a decomposition of exponential size in the worst case. We provide a new application of our results to string constraint solving with length constraints. We then extend our results to variadic decomposability, where predicates could admit multiple free variables (in contrast to monadic decomposability). Finally, we give an application to quantifier elimination in integer linear arithmetic where the variables in a block of quantifiers, if independent, could be eliminated with an exponential (instead of the standard doubly exponential) blow-up. △ Less

Submitted 26 April, 2020; originally announced April 2020.

ACM Class: F.4.1

arXiv:1903.00728 [pdf, other]

Monadic Decomposability of Regular Relations

Authors: Pablo Barcelo, Chih-Duo Hong, Xuan-Bach Le, Anthony W. Lin, Reino Niskanen

Abstract: Monadic decomposibility --- the ability to determine whether a formula in a given logical theory can be decomposed into a boolean combination of monadic formulas --- is a powerful tool for devising a decision procedure for a given logical theory. In this paper, we revisit a classical decision problem in automata theory: given a regular (a.k.a. synchronized rational) relation, determine whether it… ▽ More Monadic decomposibility --- the ability to determine whether a formula in a given logical theory can be decomposed into a boolean combination of monadic formulas --- is a powerful tool for devising a decision procedure for a given logical theory. In this paper, we revisit a classical decision problem in automata theory: given a regular (a.k.a. synchronized rational) relation, determine whether it is recognizable, i.e., it has a monadic decomposition (that is, a representation as a boolean combination of cartesian products of regular languages). Regular relations are expressive formalisms which, using an appropriate string encoding, can capture relations definable in Presburger Arithmetic. In fact, their expressive power coincide with relations definable in a universal automatic structure; equivalently, those definable by finite set interpretations in WS1S (Weak Second Order Theory of One Successor). Determining whether a regular relation admits a recognizable relation was known to be decidable (and in exponential time for binary relations), but its precise complexity still hitherto remains open. Our main contribution is to fully settle the complexity of this decision problem by develo** new techniques employing infinite Ramsey theory. The complexity for DFA (resp. NFA) representations of regular relations is shown to be NLOGSPACE-complete (resp. \PSPACE-complete). △ Less

Submitted 8 May, 2019; v1 submitted 2 March, 2019; originally announced March 2019.

Comments: Full version of the paper accepted to ICALP 2019

arXiv:1812.02989 [pdf, ps, other]

CSS Minification via Constraint Solving (Technical Report)

Authors: Matthew Hague, Anthony W. Lin, Chih-Duo Hong

Abstract: Minification is a widely-accepted technique which aims at reducing the size of the code transmitted over the web. We study the problem of minifying Cascading Style Sheets (CSS) --- the de facto language for styling web documents. Traditionally, CSS minifiers focus on simple syntactic transformations (e.g. shortening colour names). In this paper, we propose a new minification method based on mergin… ▽ More Minification is a widely-accepted technique which aims at reducing the size of the code transmitted over the web. We study the problem of minifying Cascading Style Sheets (CSS) --- the de facto language for styling web documents. Traditionally, CSS minifiers focus on simple syntactic transformations (e.g. shortening colour names). In this paper, we propose a new minification method based on merging similar rules in a CSS file. We consider safe transformations of CSS files, which preserve the semantics of the CSS file. The semantics of CSS files are sensitive to the ordering of rules in the file. To automatically identify a rule merging opportunity that best minimises file size, we reduce the rule-merging problem to a problem on CSS-graphs, i.e., node-weighted bipartite graphs with a dependency ordering on the edges, where weights capture the number of characters (e.g. in a selector or in a property declaration). Roughly speaking, the corresponding CSS-graph problem concerns minimising the total weight of a sequence of bicliques (complete bipartite subgraphs) that covers the CSS-graph and respects the edge order. We provide the first full formalisation of CSS3 selectors and reduce dependency detection to satisfiability of quantifier-free integer linear arithmetic, for which highly-optimised SMT-solvers are available. To solve the above NP-hard graph optimisation problem, we show how Max-SAT solvers can be effectively employed. We have implemented our algorithms using Max-SAT and SMT-solvers as backends, and tested against approximately 70 real-world examples (including the top 20 most popular websites). In our benchmarks, our tool yields larger savings than six well-known minifiers (which do not perform rule-merging, but support many other optimisations). Our experiments also suggest that better savings can be achieved in combination with one of these six minifiers. △ Less

Submitted 7 December, 2018; originally announced December 2018.

ACM Class: F.3.1

arXiv:1811.03167 [pdf, other]

Decision Procedures for Path Feasibility of String-Manipulating Programs with Complex Operations

Authors: Taolue Chen, Matthew Hague, Anthony W. Lin, Philipp Rümmer, Zhilin Wu

Abstract: The design and implementation of decision procedures for checking path feasibility in string-manipulating programs is an important problem, whose applications include symbolic execution and automated detection of cross-site scripting (XSS) vulnerabilities. A (symbolic) path is a finite sequence of assignments and assertions (i.e. without loops), and checking its feasibility amounts to determining… ▽ More The design and implementation of decision procedures for checking path feasibility in string-manipulating programs is an important problem, whose applications include symbolic execution and automated detection of cross-site scripting (XSS) vulnerabilities. A (symbolic) path is a finite sequence of assignments and assertions (i.e. without loops), and checking its feasibility amounts to determining the existence of inputs that yield a successful execution. We give two general semantic conditions which together ensure the decidability of path feasibility: (1) each assertion admits regular monadic decomposition, and (2) each assignment uses a (possibly nondeterministic) function whose inverse relation preserves regularity. We show these conditions are expressive since they are satisfied by a multitude of string operations. They also strictly subsume existing decidable string theories, and most existing benchmarks (e.g. most of Kaluza's, and all of SLOG's, Stranger's, and SLOTH's). We give a simple decision procedure and an extensible architecture of a string solver in that a user may easily incorporate his/her own string functions. We show the general fragment has a tight, but high complexity. To address this, we propose to allow only partial string functions (i.e., prohibit nondeterminism) in condition (2). When nondeterministic functions are needed, we also provide a syntactic fragment that provides a support of nondeterministic functions but can be reduced to an existing solver SLOTH. We provide an efficient implementation of our decision procedure for deterministic partial string functions in a new string solver OSTRICH. It provides built-in support for concatenation, reverse, functional transducers, and replaceall and provides a framework for extensibility to support further string functions. We demonstrate the efficacy of our new solver against other competitive solvers. △ Less

Submitted 7 November, 2018; originally announced November 2018.

arXiv:1805.06701 [pdf, ps, other]

Quadratic Word Equations with Length Constraints, Counter Systems, and Presburger Arithmetic with Divisibility

Authors: Anthony W. Lin, Rupak Majumdar

Abstract: Word equations are a crucial element in the theoretical foundation of constraint solving over strings, which have received a lot of attention in recent years. A word equation relates two words over string variables and constants. Its solution amounts to a function map** variables to constant strings that equate the left and right hand sides of the equation. While the problem of solving word equa… ▽ More Word equations are a crucial element in the theoretical foundation of constraint solving over strings, which have received a lot of attention in recent years. A word equation relates two words over string variables and constants. Its solution amounts to a function map** variables to constant strings that equate the left and right hand sides of the equation. While the problem of solving word equations is decidable, the decidability of the problem of solving a word equation with a length constraint (i.e., a constraint relating the lengths of words in the word equation) has remained a long-standing open problem. In this paper, we focus on the subclass of quadratic word equations, i.e., in which each variable occurs at most twice. We first show that the length abstractions of solutions to quadratic word equations are in general not Presburger-definable. We then describe a class of counter systems with Presburger transition relations which capture the length abstraction of a quadratic word equation with regular constraints. We provide an encoding of the effect of a simple loop of the counter systems in the theory of existential Presburger Arithmetic with divisibility (PAD). Since PAD is decidable, we get a decision procedure for quadratic words equations with length constraints for which the associated counter system is \emph{flat} (i.e., all nodes belong to at most one cycle). We show a decidability result (in fact, also an NP algorithm with a PAD oracle) for a recently proposed NP-complete fragment of word equations called regular-oriented word equations, together with length constraints. Decidability holds when the constraints are additionally extended with regular constraints with a 1-weak control structure. △ Less

Submitted 17 May, 2018; originally announced May 2018.

Comments: 18 pages

arXiv:1711.03363 [pdf, other]

What Is Decidable about String Constraints with the ReplaceAll Function

Authors: Taolue Chen, Yan Chen, Matthew Hague, Anthony W. Lin, Zhilin Wu

Abstract: Recently, it was shown that any theory of strings containing the string-replace function (even the most restricted version where pattern/replacement strings are both constant strings) becomes undecidable if we do not impose some kind of straight-line (aka acyclicity) restriction on the formulas. Despite this, the straight-line restriction is still practically sensible since this condition is typic… ▽ More Recently, it was shown that any theory of strings containing the string-replace function (even the most restricted version where pattern/replacement strings are both constant strings) becomes undecidable if we do not impose some kind of straight-line (aka acyclicity) restriction on the formulas. Despite this, the straight-line restriction is still practically sensible since this condition is typically met by string constraints that are generated by symbolic execution. In this paper, we provide the first systematic study of straight-line string constraints with the string-replace function and the regular constraints as the basic operations. We show that a large class of such constraints (i.e. when only a constant string or a regular expression is permitted in the pattern) is decidable. We note that the string-replace function, even under this restriction, is sufficiently powerful for expressing the concatenation operator and much more (e.g. extensions of regular expressions with string variables). This gives us the most expressive decidable logic containing concatenation, replace, and regular constraints under the same umbrella. Our decision procedure for the straight-line fragment follows an automata-theoretic approach, and is modular in the sense that the string-replace terms are removed one by one to generate more and more regular constraints, which can then be discharged by the state-of-the-art string constraint solvers. We also show that this fragment is, in a way, a maximal decidable subclass of the straight-line fragment with string-replace and regular constraints. To this end, we show undecidability results for the following two extensions: (1) variables are permitted in the pattern parameter of the replace function, (2) length constraints are permitted. △ Less

Submitted 9 November, 2017; originally announced November 2017.

ACM Class: F.4.1; F.1.1

arXiv:1710.10756 [pdf, other]

Fair Termination for Parameterized Probabilistic Concurrent Systems (Technical Report)

Authors: Ondrej Lengal, Anthony W. Lin, Rupak Majumdar, Philipp Ruemmer

Abstract: We consider the problem of automatically verifying that a parameterized family of probabilistic concurrent systems terminates with probability one for all instances against adversarial schedulers. A parameterized family defines an infinite-state system: for each number n, the family consists of an instance with n finite-state processes. In contrast to safety, the parameterized verification of live… ▽ More We consider the problem of automatically verifying that a parameterized family of probabilistic concurrent systems terminates with probability one for all instances against adversarial schedulers. A parameterized family defines an infinite-state system: for each number n, the family consists of an instance with n finite-state processes. In contrast to safety, the parameterized verification of liveness is currently still considered extremely challenging especially in the presence of probabilities in the model. One major challenge is to provide a sufficiently powerful symbolic framework. One well-known symbolic framework for the parameterized verification of non-probabilistic concurrent systems is regular model checking. Although the framework was recently extended to probabilistic systems, incorporating fairness in the framework - often crucial for verifying termination - has been especially difficult due to the presence of an infinite number of fairness constraints (one for each process). Our main contribution is a systematic, regularity-preserving, encoding of finitary fairness (a realistic notion of fairness proposed by Alur & Henzinger) in the framework of regular model checking for probabilistic parameterized systems. Our encoding reduces termination with finitary fairness to verifying parameterized termination without fairness over probabilistic systems in regular model checking (for which a verification framework already exists). We show that our algorithm could verify termination for many interesting examples from distributed algorithms (Herman's protocol) and evolutionary biology (Moran process, cell cycle switch), which do not hold under the standard notion of fairness. To the best of our knowledge, our algorithm is the first fully-automatic method that can prove termination for these examples. △ Less

Submitted 29 October, 2017; originally announced October 2017.

Comments: A technical report of a TACAS'17 paper

arXiv:1709.07139 [pdf, ps, other]

Learning to Prove Safety over Parameterised Concurrent Systems (Full Version)

Authors: Yu-Fang Chen, Chih-Duo Hong, Anthony W. Lin, Philipp Ruemmer

Abstract: We revisit the classic problem of proving safety over parameterised concurrent systems, i.e., an infinite family of finite-state concurrent systems that are represented by some finite (symbolic) means. An example of such an infinite family is a dining philosopher protocol with any number n of processes (n being the parameter that defines the infinite family). Regular model checking is a well-known… ▽ More We revisit the classic problem of proving safety over parameterised concurrent systems, i.e., an infinite family of finite-state concurrent systems that are represented by some finite (symbolic) means. An example of such an infinite family is a dining philosopher protocol with any number n of processes (n being the parameter that defines the infinite family). Regular model checking is a well-known generic framework for modelling parameterised concurrent systems, where an infinite set of configurations (resp. transitions) is represented by a regular set (resp. regular transducer). Although verifying safety properties in the regular model checking framework is undecidable in general, many sophisticated semi-algorithms have been developed in the past fifteen years that can successfully prove safety in many practical instances. In this paper, we propose a simple solution to synthesise regular inductive invariants that makes use of Angluin's classic L* algorithm (and its variants). We provide a termination guarantee when the set of configurations reachable from a given set of initial configurations is regular. We have tested L* algorithm on standard (as well as new) examples in regular model checking including the dining philosopher protocol, the dining cryptographer protocol, and several mutual exclusion protocols (e.g. Bakery, Burns, Szymanski, and German). Our experiments show that, despite the simplicity of our solution, it can perform at least as well as existing semi-algorithms. △ Less

Submitted 2 October, 2017; v1 submitted 20 September, 2017; originally announced September 2017.

Comments: Full version of FMCAD'17 paper

arXiv:1606.01451 [pdf, other]

Liveness of Randomised Parameterised Systems under Arbitrary Schedulers (Technical Report)

Authors: Anthony W. Lin, Philipp Ruemmer

Abstract: We consider the problem of verifying liveness for systems with a finite, but unbounded, number of processes, commonly known as parameterised systems. Typical examples of such systems include distributed protocols (e.g. for the dining philosopher problem). Unlike the case of verifying safety, proving liveness is still considered extremely challenging, especially in the presence of randomness in the… ▽ More We consider the problem of verifying liveness for systems with a finite, but unbounded, number of processes, commonly known as parameterised systems. Typical examples of such systems include distributed protocols (e.g. for the dining philosopher problem). Unlike the case of verifying safety, proving liveness is still considered extremely challenging, especially in the presence of randomness in the system. In this paper we consider liveness under arbitrary (including unfair) schedulers, which is often considered a desirable property in the literature of self-stabilising systems. We introduce an automatic method of proving liveness for randomised parameterised systems under arbitrary schedulers. Viewing liveness as a two-player reachability game (between Scheduler and Process), our method is a CEGAR approach that synthesises a progress relation for Process that can be symbolically represented as a finite-state automaton. The method is incremental and exploits both Angluin-style L*-learning and SAT-solvers. Our experiments show that our algorithm is able to prove liveness automatically for well-known randomised distributed protocols, including Lehmann-Rabin Randomised Dining Philosopher Protocol and randomised self-stabilising protocols (such as the Israeli-Jalfon Protocol). To the best of our knowledge, this is the first fully-automatic method that can prove liveness for randomised protocols. △ Less

Submitted 4 June, 2016; originally announced June 2016.

Comments: Full version of CAV'16 paper

arXiv:1605.06868 [pdf, ps, other]

Decidable models of integer-manipulating programs with recursive parallelism (technical report)

Authors: Matthew Hague, Anthony Widjaja Lin

Abstract: We study safety verification for multithreaded programs with recursive parallelism (i.e. unbounded thread creation and recursion) as well as unbounded integer variables. Since the threads in each program configuration are structured in a hierarchical fashion, our model is state-extended ground-tree rewrite systems equipped with shared unbounded integer counters that can be incremented, decremented… ▽ More We study safety verification for multithreaded programs with recursive parallelism (i.e. unbounded thread creation and recursion) as well as unbounded integer variables. Since the threads in each program configuration are structured in a hierarchical fashion, our model is state-extended ground-tree rewrite systems equipped with shared unbounded integer counters that can be incremented, decremented, and compared against an integer constant. Since the model is Turing-complete, we propose a decidable underapproximation. First, using a restriction similar to context-bounding, we underapproximate the global control by a weak global control (i.e. DAGs possibly with self-loops), thereby limiting the number of synchronisations between different threads. Second, we bound the number of reversals between non-decrementing and non-incrementing modes of the counters. Under this restriction, we show that reachability becomes NP-complete. In fact, it is poly-time reducible to satisfaction over existential Presburger formulas, which allows one to tap into highly optimised SMT solvers. Our decidable approximation strictly generalises known decidable models including (i) weakly-synchronised ground-tree rewrite systems, and (ii) synchronisation/reversal-bounded concurrent pushdown systems systems with counters. Finally, we show that, when equipped with reversal-bounded counters, relaxing the weak control restriction by the notion of senescence results in undecidability. △ Less

Submitted 22 May, 2016; originally announced May 2016.

Comments: Full version of conference submission, 18 pages inc. appendix

ACM Class: F.1.1; F.4.3

arXiv:1511.01633 [pdf, ps, other]

String Solving with Word Equations and Transducers: Towards a Logic for Analysing Mutation XSS (Full Version)

Authors: Anthony W. Lin, Pablo Barcelo

Abstract: We study the fundamental issue of decidability of satisfiability over string logics with concatenations and finite-state transducers as atomic operations. Although restricting to one type of operations yields decidability, little is known about the decidability of their combined theory, which is especially relevant when analysing security vulnerabilities of dynamic web pages in a more realistic br… ▽ More We study the fundamental issue of decidability of satisfiability over string logics with concatenations and finite-state transducers as atomic operations. Although restricting to one type of operations yields decidability, little is known about the decidability of their combined theory, which is especially relevant when analysing security vulnerabilities of dynamic web pages in a more realistic browser model. On the one hand, word equations (string logic with concatenations) cannot precisely capture sanitisation functions (e.g. htmlescape) and implicit browser transductions (e.g. innerHTML mutations). On the other hand, transducers suffer from the reverse problem of being able to model sanitisation functions and browser transductions, but not string concatenations. Naively combining word equations and transducers easily leads to an undecidable logic. Our main contribution is to show that the "straight-line fragment" of the logic is decidable (complexity ranges from PSPACE to EXPSPACE). The fragment can express the program logics of straight-line string-manipulating programs with concatenations and transductions as atomic operations, which arise when performing bounded model checking or dynamic symbolic executions. We demonstrate that the logic can naturally express constraints required for analysing mutation XSS in web applications. Finally, the logic remains decidable in the presence of length, letter-counting, regular, indexOf, and disequality constraints. △ Less

Submitted 5 November, 2015; originally announced November 2015.

Comments: Full version of POPL'16 paper

ACM Class: F.3.1

arXiv:1510.08506 [pdf, other]

Regular Symmetry Patterns (Technical Report)

Authors: Anthony W. Lin, Truong Khanh Nguyen, Philipp Rümmer, Jun Sun

Abstract: Symmetry reduction is a well-known approach for alleviating the state explosion problem in model checking. Automatically identifying symmetries in concurrent systems, however, is computationally expensive. We propose a symbolic framework for capturing symmetry patterns in parameterised systems (i.e. an infinite family of finite-state systems): two regular word transducers to represent, respectivel… ▽ More Symmetry reduction is a well-known approach for alleviating the state explosion problem in model checking. Automatically identifying symmetries in concurrent systems, however, is computationally expensive. We propose a symbolic framework for capturing symmetry patterns in parameterised systems (i.e. an infinite family of finite-state systems): two regular word transducers to represent, respectively, parameterised systems and symmetry patterns. The framework subsumes various types of symmetry relations ranging from weaker notions (e.g. simulation preorders) to the strongest notion (i.e. isomorphisms). Our framework enjoys two algorithmic properties: (1) symmetry verification: given a transducer, we can automatically check whether it is a symmetry pattern of a given system, and (2) symmetry synthesis: we can automatically generate a symmetry pattern for a given system in the form of a transducer. Furthermore, our symbolic language allows additional constraints that the symmetry patterns need to satisfy to be easily incorporated in the verification/synthesis. We show how these properties can help identify symmetry patterns in examples like dining philosopher protocols, self-stabilising protocols, and prioritised resource-allocator protocol. In some cases (e.g. Gries's coffee can problem), our technique automatically synthesises a safety-preserving finite approximant, which can then be verified for safety solely using a finite-state model checker. △ Less

Submitted 28 October, 2015; originally announced October 2015.

Comments: Technical report of VMCAI'16 paper

arXiv:1507.07911 [pdf, ps, other]

doi 10.2168/LMCS-11(4:1)2015

Expressive Path Queries on Graph with Data

Authors: Pablo Barcelo, Gaelle Fontaine, Anthony Widjaja Lin

Abstract: Graph data models have recently become popular owing to their applications, e.g., in social networks and the semantic web. Typical navigational query languages over graph databases - such as Conjunctive Regular Path Queries (CRPQs) - cannot express relevant properties of the interaction between the underlying data and the topology. Two languages have been recently proposed to overcome this problem… ▽ More Graph data models have recently become popular owing to their applications, e.g., in social networks and the semantic web. Typical navigational query languages over graph databases - such as Conjunctive Regular Path Queries (CRPQs) - cannot express relevant properties of the interaction between the underlying data and the topology. Two languages have been recently proposed to overcome this problem: walk logic (WL) and regular expressions with memory (REM). In this paper, we begin by investigating fundamental properties of WL and REM, i.e., complexity of evaluation problems and expressive power. We first show that the data complexity of WL is nonelementary, which rules out its practicality. On the other hand, while REM has low data complexity, we point out that many natural data/topology properties of graphs expressible in WL cannot be expressed in REM. To this end, we propose register logic, an extension of REM, which we show to be able to express many natural graph properties expressible in WL, while at the same time preserving the elementariness of data complexity of REMs. It is also incomparable to WL in terms of expressive power. △ Less

Submitted 5 October, 2015; v1 submitted 28 July, 2015; originally announced July 2015.

Comments: 39 pages

Journal ref: Logical Methods in Computer Science, Volume 11, Issue 4 (October 5, 2015) lmcs:1602

arXiv:1412.5143 [pdf, ps, other]

Detecting Redundant CSS Rules in HTML5 Applications: A Tree-Rewriting Approach

Authors: Matthew Hague, Anthony Widjaja Lin, Luke Ong

Abstract: HTML5 applications normally have a large set of CSS (Cascading Style Sheets) rules for data display. Each CSS rule consists of a node selector (given in an XPath-like query language) and a declaration block (assigning values to selected nodes' display attributes). As web applications evolve, maintaining CSS files can easily become problematic. Some CSS rules will be replaced by new ones, but these… ▽ More HTML5 applications normally have a large set of CSS (Cascading Style Sheets) rules for data display. Each CSS rule consists of a node selector (given in an XPath-like query language) and a declaration block (assigning values to selected nodes' display attributes). As web applications evolve, maintaining CSS files can easily become problematic. Some CSS rules will be replaced by new ones, but these obsolete (hence redundant) CSS rules often remain in the applications. Not only does this "bloat" the applications, but it also significantly increases web browsers' processing time. Most works on detecting redundant CSS rules in HTML5 applications do not consider the dynamic behaviors of HTML5 (specified in JavaScript); in fact, the only proposed method that takes these into account is dynamic analysis (a.k.a. testing), which cannot soundly prove redundancy of CSS rules. In this paper, we introduce an abstraction of HTML5 applications based on monotonic tree-rewriting and study its "redundancy problem". We establish the precise complexity of the problem and various subproblems of practical importance (ranging from P to EXP). In particular, our algorithm relies on an efficient reduction to an analysis of symbolic pushdown systems (for which highly optimised solvers are available), which yields a fast method for checking redundancy in practice. We implemented our algorithm and demonstrated its efficacy in detecting redundant CSS rules in HTML5 applications. △ Less

Submitted 18 August, 2015; v1 submitted 15 December, 2014; originally announced December 2014.

Comments: 50 pages

arXiv:1411.3164 [pdf, ps, other]

A linear time algorithm for the orbit problem over cyclic groups

Authors: Anthony Widjaja Lin, Sanming Zhou

Abstract: The orbit problem is at the heart of symmetry reduction methods for model checking concurrent systems. It asks whether two given configurations in a concurrent system (represented as finite strings over some finite alphabet) are in the same orbit with respect to a given finite permutation group (represented by their generators) acting on this set of configurations by permuting indices. It is known… ▽ More The orbit problem is at the heart of symmetry reduction methods for model checking concurrent systems. It asks whether two given configurations in a concurrent system (represented as finite strings over some finite alphabet) are in the same orbit with respect to a given finite permutation group (represented by their generators) acting on this set of configurations by permuting indices. It is known that the problem is in general as hard as the graph isomorphism problem, whose precise complexity (whether it is solvable in polynomial-time) is a long-standing open problem. In this paper, we consider the restriction of the orbit problem when the permutation group is cyclic (i.e. generated by a single permutation), an important restriction of the problem. It is known that this subproblem is solvable in polynomial-time. Our main result is a linear-time algorithm for this subproblem. △ Less

Submitted 13 November, 2015; v1 submitted 12 November, 2014; originally announced November 2014.

Comments: Accepted in Acta Informatica in Nov 2015

MSC Class: 68N30; 11A05; 11Y16; 68Q25; 68Q60; 68Q85

arXiv:1401.4130 [pdf, ps, other]

Analysis of Probabilistic Basic Parallel Processes

Authors: Rémi Bonnet, Stefan Kiefer, Anthony W. Lin

Abstract: Basic Parallel Processes (BPPs) are a well-known subclass of Petri Nets. They are the simplest common model of concurrent programs that allows unbounded spawning of processes. In the probabilistic version of BPPs, every process generates other processes according to a probability distribution. We study the decidability and complexity of fundamental qualitative problems over probabilistic BPPs -- i… ▽ More Basic Parallel Processes (BPPs) are a well-known subclass of Petri Nets. They are the simplest common model of concurrent programs that allows unbounded spawning of processes. In the probabilistic version of BPPs, every process generates other processes according to a probability distribution. We study the decidability and complexity of fundamental qualitative problems over probabilistic BPPs -- in particular reachability with probability 1 of different classes of target sets (e.g. upward-closed sets). Our results concern both the Markov-chain model, where processes are scheduled randomly, and the MDP model, where processes are picked by a scheduler. △ Less

Submitted 16 January, 2014; originally announced January 2014.

Comments: This is the technical report for a FoSSaCS'14 paper

Showing 1–39 of 39 results for author: Lin, A W