Search | arXiv e-print repository

arXiv:2403.02019 [pdf, other]

Active Learning of Mealy Machines with Timers

Authors: Véronique Bruyère, Bharat Garhewal, Guillermo A. Pérez, Gaëtan Staquet, Frits W. Vaandrager

Abstract: We present the first algorithm for query learning of a general class of Mealy machines with timers (MMTs) in a black-box context. Our algorithm is an extension of the L# algorithm of Vaandrager et al. to a timed setting. Like the algorithm for learning timed automata proposed by Waga, our algorithm is inspired by ideas of Maler & Pnueli. Based on the elementary languages of, both Waga's and our al… ▽ More We present the first algorithm for query learning of a general class of Mealy machines with timers (MMTs) in a black-box context. Our algorithm is an extension of the L# algorithm of Vaandrager et al. to a timed setting. Like the algorithm for learning timed automata proposed by Waga, our algorithm is inspired by ideas of Maler & Pnueli. Based on the elementary languages of, both Waga's and our algorithm use symbolic queries, which are then implemented using finitely many concrete queries. However, whereas Waga needs exponentially many concrete queries to implement a single symbolic query, we only need a polynomial number. This is because in order to learn a timed automaton, a learner needs to determine the exact guard and reset for each transition (out of exponentially many possibilities), whereas for learning an MMT a learner only needs to figure out which of the preceding transitions caused a timeout. As shown in our previous work, this can be done efficiently for a subclass of MMTs that are race-avoiding: if a timeout is caused by a preceding input then a slight change in the timing of this input will induce a corresponding change in the timing of the timeout ("wiggling"). Experiments with a prototype implementation, written in Rust, show that our algorithm is able to efficiently learn realistic benchmarks. △ Less

Submitted 4 March, 2024; originally announced March 2024.

Comments: 77 pages, 19 figures

MSC Class: 68Q45 ACM Class: F.4.3

arXiv:2402.13785 [pdf, other]

Synthesis of Hierarchical Controllers Based on Deep Reinforcement Learning Policies

Authors: Florent Delgrange, Guy Avni, Anna Lukina, Christian Schilling, Ann Nowé, Guillermo A. Pérez

Abstract: We propose a novel approach to the problem of controller design for environments modeled as Markov decision processes (MDPs). Specifically, we consider a hierarchical MDP a graph with each vertex populated by an MDP called a "room". We first apply deep reinforcement learning (DRL) to obtain low-level policies for each room, scaling to large rooms of unknown structure. We then apply reactive synthe… ▽ More We propose a novel approach to the problem of controller design for environments modeled as Markov decision processes (MDPs). Specifically, we consider a hierarchical MDP a graph with each vertex populated by an MDP called a "room". We first apply deep reinforcement learning (DRL) to obtain low-level policies for each room, scaling to large rooms of unknown structure. We then apply reactive synthesis to obtain a high-level planner that chooses which low-level policy to execute in each room. The central challenge in synthesizing the planner is the need for modeling rooms. We address this challenge by develo** a DRL procedure to train concise "latent" policies together with PAC guarantees on their performance. Unlike previous approaches, ours circumvents a model distillation step. Our approach combats sparse rewards in DRL and enables reusability of low-level policies. We demonstrate feasibility in a case study involving agent navigation amid moving obstacles. △ Less

Submitted 21 February, 2024; originally announced February 2024.

Comments: 19 pages main text, 17 pages Appendix (excluding references)

arXiv:2402.13237 [pdf, other]

Continuous Pushdown VASS in One Dimension are Easy

Authors: Guillermo A. Perez, Shrisha Rao

Abstract: A pushdown vector addition system with states (PVASS) extends the model of vector addition systems with a pushdown stack. The algorithmic analysis of PVASS has applications such as static analysis of recursive programs manipulating integer variables. Unfortunately, reachability analysis, even for one-dimensional PVASS is not known to be decidable. We relax the model of one-dimensional PVASS to mak… ▽ More A pushdown vector addition system with states (PVASS) extends the model of vector addition systems with a pushdown stack. The algorithmic analysis of PVASS has applications such as static analysis of recursive programs manipulating integer variables. Unfortunately, reachability analysis, even for one-dimensional PVASS is not known to be decidable. We relax the model of one-dimensional PVASS to make the counter updates continuous and show that in this case reachability, coverability, and boundedness are decidable in polynomial time. In addition, for the extension of the model with lower-bound guards on the states, we show that coverability and reachability are in NP, and boundedness is in coNP. △ Less

Submitted 20 February, 2024; originally announced February 2024.

Comments: 2 tables, 6 figures, 12 pages

arXiv:2402.09121 [pdf, ps, other]

Inform: From Compartmental Models to Stochastic Bounded Counter Machines

Authors: Tim Leys, Guillermo A. Perez

Abstract: Compartmental models are used in epidemiology to capture the evolution of infectious diseases such as COVID-19 in a population by assigning members of it to compartments with labels such as susceptible, infected, and recovered. In a stochastic compartmental model the flow of individuals between compartments is determined probabilistically. We establish that certain stochastic compartment models ca… ▽ More Compartmental models are used in epidemiology to capture the evolution of infectious diseases such as COVID-19 in a population by assigning members of it to compartments with labels such as susceptible, infected, and recovered. In a stochastic compartmental model the flow of individuals between compartments is determined probabilistically. We establish that certain stochastic compartment models can be encoded as probabilistic counter machines where the configurations are bounded. Based on the latter, we obtain simple descriptions of the models in the PRISM language. This enables the analysis of such compartmental models via probabilistic model checkers. Finally, we report on experimental results where we analyze results from a Belgian COVID-19 model using a probabilistic model checkers. △ Less

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2310.17410 [pdf, ps, other]

Synthesizing Efficiently Monitorable Formulas in Metric Temporal Logic

Authors: Ritam Raha, Rajarshi Roy, Nathanael Fijalkow, Daniel Neider, Guillermo A. Perez

Abstract: In runtime verification, manually formalizing a specification for monitoring system executions is a tedious and error-prone process. To address this issue, we consider the problem of automatically synthesizing formal specifications from system executions. To demonstrate our approach, we consider the popular specification language Metric Temporal Logic (MTL), which is particularly tailored towards… ▽ More In runtime verification, manually formalizing a specification for monitoring system executions is a tedious and error-prone process. To address this issue, we consider the problem of automatically synthesizing formal specifications from system executions. To demonstrate our approach, we consider the popular specification language Metric Temporal Logic (MTL), which is particularly tailored towards specifying temporal properties for cyber-physical systems (CPS). Most of the classical approaches for synthesizing temporal logic formulas aim at minimizing the size of the formula. However, for efficiency in monitoring, along with the size, the amount of "lookahead" required for the specification becomes relevant, especially for safety-critical applications. We formalize this notion and devise a learning algorithm that synthesizes concise formulas having bounded lookahead. To do so, our algorithm reduces the synthesis task to a series of satisfiability problems in Linear Real Arithmetic (LRA) and generates MTL formulas from their satisfying assignments. The reduction uses a novel encoding of a popular MTL monitoring procedure using LRA. Finally, we implement our algorithm in a tool called TEAL and demonstrate its ability to synthesize efficiently monitorable MTL formulas in a CPS application. △ Less

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2308.13609 [pdf, ps, other]

Integer Programming with GCD Constraints

Authors: Rémy Defossez, Christoph Haase, Alessio Mansutti, Guillermo A. Perez

Abstract: We study the non-linear extension of integer programming with greatest common divisor constraints of the form $\gcd(f,g) \sim d$, where $f$ and $g$ are linear polynomials, $d$ is a positive integer, and $\sim$ is a relation among $\leq, =, \neq$ and $\geq$. We show that the feasibility problem for these systems is in NP, and that an optimal solution minimizing a linear objective function, if it ex… ▽ More We study the non-linear extension of integer programming with greatest common divisor constraints of the form $\gcd(f,g) \sim d$, where $f$ and $g$ are linear polynomials, $d$ is a positive integer, and $\sim$ is a relation among $\leq, =, \neq$ and $\geq$. We show that the feasibility problem for these systems is in NP, and that an optimal solution minimizing a linear objective function, if it exists, has polynomial bit length. To show these results, we identify an expressive fragment of the existential theory of the integers with addition and divisibility that admits solutions of polynomial bit length. It was shown by Lipshitz [Trans. Am. Math. Soc., 235, pp. 271-283, 1978] that this theory adheres to a local-to-global principle in the following sense: a formula $Φ$ is equi-satisfiable with a formula $Ψ$ in this theory such that $Ψ$ has a solution if and only if $Ψ$ has a solution modulo every prime $p$. We show that in our fragment, only a polynomial number of primes of polynomial bit length need to be considered, and that the solutions modulo prime numbers can be combined to yield a solution to $Φ$ of polynomial bit length. As a technical by-product, we establish a Chinese-remainder-type theorem for systems of congruences and non-congruences showing that solution sizes do not depend on the magnitude of the moduli of non-congruences. △ Less

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2308.07738 [pdf, other]

Formally-Sharp DAgger for MCTS: Lower-Latency Monte Carlo Tree Search using Data Aggregation with Formal Methods

Authors: Debraj Chakraborty, Damien Busatto-Gaston, Jean-François Raskin, Guillermo A. Pérez

Abstract: We study how to efficiently combine formal methods, Monte Carlo Tree Search (MCTS), and deep learning in order to produce high-quality receding horizon policies in large Markov Decision processes (MDPs). In particular, we use model-checking techniques to guide the MCTS algorithm in order to generate offline samples of high-quality decisions on a representative set of states of the MDP. Those sampl… ▽ More We study how to efficiently combine formal methods, Monte Carlo Tree Search (MCTS), and deep learning in order to produce high-quality receding horizon policies in large Markov Decision processes (MDPs). In particular, we use model-checking techniques to guide the MCTS algorithm in order to generate offline samples of high-quality decisions on a representative set of states of the MDP. Those samples can then be used to train a neural network that imitates the policy used to generate them. This neural network can either be used as a guide on a lower-latency MCTS online search, or alternatively be used as a full-fledged policy when minimal latency is required. We use statistical model checking to detect when additional samples are needed and to focus those additional samples on configurations where the learnt neural network policy differs from the (computationally-expensive) offline policy. We illustrate the use of our method on MDPs that model the Frozen Lake and Pac-Man environments -- two popular benchmarks to evaluate reinforcement-learning algorithms. △ Less

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2305.09634 [pdf, other]

Bi-Objective Lexicographic Optimization in Markov Decision Processes with Related Objectives

Authors: Damien Busatto-Gaston, Debraj Chakraborty, Anirban Majumdar, Sayan Mukherjee, Guillermo A. Pérez, Jean-François Raskin

Abstract: We consider lexicographic bi-objective problems on Markov Decision Processes (MDPs), where we optimize one objective while guaranteeing optimality of another. We propose a two-stage technique for solving such problems when the objectives are related (in a way that we formalize). We instantiate our technique for two natural pairs of objectives: minimizing the (conditional) expected number of steps… ▽ More We consider lexicographic bi-objective problems on Markov Decision Processes (MDPs), where we optimize one objective while guaranteeing optimality of another. We propose a two-stage technique for solving such problems when the objectives are related (in a way that we formalize). We instantiate our technique for two natural pairs of objectives: minimizing the (conditional) expected number of steps to a target while guaranteeing the optimal probability of reaching it; and maximizing the (conditional) expected average reward while guaranteeing an optimal probability of staying safe (w.r.t. some safe set of states). For the first combination of objectives, which covers the classical frozen lake environment from reinforcement learning, we also report on experiments performed using a prototype implementation of our algorithm and compare it with what can be obtained from state-of-the-art probabilistic model checkers solving optimal reachability. △ Less

Submitted 15 August, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

arXiv:2305.07451 [pdf, ps, other]

doi 10.1007/978-3-031-42626-1_3

Automata with Timers

Authors: Véronique Bruyère, Guillermo A. Pérez, Gaëtan Staquet, Frits W. Vaandrager

Abstract: In this work, we study properties of deterministic finite-state automata with timers, a subclass of timed automata proposed by Vaandrager et al. as a candidate for an efficiently learnable timed model. We first study the complexity of the configuration reachability problem for such automata and establish that it is PSPACE-complete. Then, as simultaneous timeouts (we call these, races) can occur in… ▽ More In this work, we study properties of deterministic finite-state automata with timers, a subclass of timed automata proposed by Vaandrager et al. as a candidate for an efficiently learnable timed model. We first study the complexity of the configuration reachability problem for such automata and establish that it is PSPACE-complete. Then, as simultaneous timeouts (we call these, races) can occur in timed runs of such automata, we study the problem of determining whether it is possible to modify the delays between the actions in a run, in a way to avoid such races. The absence of races is important for modelling purposes and to streamline learning of automata with timers. We provide an effective characterization of when an automaton is race-avoiding and establish that the related decision problem is in 3EXP and PSPACE-hard. △ Less

Submitted 12 May, 2023; originally announced May 2023.

Comments: 35 pages, 9 figures

ACM Class: F.4.3

Journal ref: Formal Modeling and Analysis of Timed Systems (FORMATS) 2023 pp. 33-49

arXiv:2305.05739 [pdf, ps, other]

Graph-Based Reductions for Parametric and Weighted MDPs

Authors: Kasper Engelen, Guillermo A. Pérez, Shrisha Rao

Abstract: We study the complexity of reductions for weighted reachability in parametric Markov decision processes. That is, we say a state p is never worse than q if for all valuations of the polynomial indeterminates it is the case that the maximal expected weight that can be reached from p is greater than the same value from q. In terms of computational complexity, we establish that determining whether p… ▽ More We study the complexity of reductions for weighted reachability in parametric Markov decision processes. That is, we say a state p is never worse than q if for all valuations of the polynomial indeterminates it is the case that the maximal expected weight that can be reached from p is greater than the same value from q. In terms of computational complexity, we establish that determining whether p is never worse than q is coETR-complete. On the positive side, we give a polynomial-time algorithm to compute the equivalence classes of the order we study for Markov chains. Additionally, we describe and implement two inference rules to under-approximate the never-worse relation and empirically show that it can be used as an efficient preprocessing step for the analysis of large Markov decision processes. △ Less

Submitted 9 May, 2023; originally announced May 2023.

arXiv:2303.12558 [pdf, other]

Wasserstein Auto-encoded MDPs: Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees

Authors: Florent Delgrange, Ann Nowé, Guillermo A. Pérez

Abstract: Although deep reinforcement learning (DRL) has many success stories, the large-scale deployment of policies learned through these advanced techniques in safety-critical scenarios is hindered by their lack of formal guarantees. Variational Markov Decision Processes (VAE-MDPs) are discrete latent space models that provide a reliable framework for distilling formally verifiable controllers from any R… ▽ More Although deep reinforcement learning (DRL) has many success stories, the large-scale deployment of policies learned through these advanced techniques in safety-critical scenarios is hindered by their lack of formal guarantees. Variational Markov Decision Processes (VAE-MDPs) are discrete latent space models that provide a reliable framework for distilling formally verifiable controllers from any RL policy. While the related guarantees address relevant practical aspects such as the satisfaction of performance and safety properties, the VAE approach suffers from several learning flaws (posterior collapse, slow learning speed, poor dynamics estimates), primarily due to the absence of abstraction and representation guarantees to support latent optimization. We introduce the Wasserstein auto-encoded MDP (WAE-MDP), a latent space model that fixes those issues by minimizing a penalized form of the optimal transport between the behaviors of the agent executing the original policy and the distilled policy, for which the formal guarantees apply. Our approach yields bisimulation guarantees while learning the distilled policy, allowing concrete optimization of the abstraction and representation model quality. Our experiments show that, besides distilling policies up to 10 times faster, the latent model quality is indeed better in general. Moreover, we present experiments from a simple time-to-failure verification algorithm on the latent space. The fact that our approach enables such simple verification techniques highlights its applicability. △ Less

Submitted 21 April, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

Comments: ICLR 2023, 10 pages main text, 14 pages appendix (excluding references)

arXiv:2303.03839 [pdf, ps, other]

The Temporal Logic Synthesis Format TLSF v1.2

Authors: Swen Jacobs, Guillermo A. Perez, Philipp Schlehuber-Caissier

Abstract: We present an extension of the Temporal Logic Synthesis Format (TLSF). TLSF builds on standard LTL, but additionally supports high-level constructs, such as sets and functions, as well as parameters that allow a specification to define a whole a family of problems. Our extension introduces operators and a new semantics option for LTLf , i.e., LTL on finite executions. We present an extension of the Temporal Logic Synthesis Format (TLSF). TLSF builds on standard LTL, but additionally supports high-level constructs, such as sets and functions, as well as parameters that allow a specification to define a whole a family of problems. Our extension introduces operators and a new semantics option for LTLf , i.e., LTL on finite executions. △ Less

Submitted 7 March, 2023; originally announced March 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:1604.02284, arXiv:1601.05228

arXiv:2303.03284 [pdf, other]

The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models

Authors: Raphael Avalos, Florent Delgrange, Ann Nowé, Guillermo A. Pérez, Diederik M. Roijers

Abstract: Partially Observable Markov Decision Processes (POMDPs) are used to model environments where the full state cannot be perceived by an agent. As such the agent needs to reason taking into account the past observations and actions. However, simply remembering the full history is generally intractable due to the exponential growth in the history space. Maintaining a probability distribution that mode… ▽ More Partially Observable Markov Decision Processes (POMDPs) are used to model environments where the full state cannot be perceived by an agent. As such the agent needs to reason taking into account the past observations and actions. However, simply remembering the full history is generally intractable due to the exponential growth in the history space. Maintaining a probability distribution that models the belief over what the true state is can be used as a sufficient statistic of the history, but its computation requires access to the model of the environment and is often intractable. While SOTA algorithms use Recurrent Neural Networks to compress the observation-action history aiming to learn a sufficient statistic, they lack guarantees of success and can lead to sub-optimal policies. To overcome this, we propose the Wasserstein Belief Updater, an RL algorithm that learns a latent model of the POMDP and an approximation of the belief update. Our approach comes with theoretical guarantees on the quality of our approximation ensuring that our outputted beliefs allow for learning the optimal value function. △ Less

Submitted 26 October, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

arXiv:2212.05337 [pdf, ps, other]

Targeted Adversarial Attacks on Deep Reinforcement Learning Policies via Model Checking

Authors: Dennis Gross, Thiago D. Simao, Nils Jansen, Guillermo A. Perez

Abstract: Deep Reinforcement Learning (RL) agents are susceptible to adversarial noise in their observations that can mislead their policies and decrease their performance. However, an adversary may be interested not only in decreasing the reward, but also in modifying specific temporal logic properties of the policy. This paper presents a metric that measures the exact impact of adversarial attacks against… ▽ More Deep Reinforcement Learning (RL) agents are susceptible to adversarial noise in their observations that can mislead their policies and decrease their performance. However, an adversary may be interested not only in decreasing the reward, but also in modifying specific temporal logic properties of the policy. This paper presents a metric that measures the exact impact of adversarial attacks against such properties. We use this metric to craft optimal adversarial attacks. Furthermore, we introduce a model checking method that allows us to verify the robustness of RL policies against adversarial attacks. Our empirical analysis confirms (1) the quality of our metric to craft adversarial attacks against temporal logic properties, and (2) that we are able to concisely assess a system's robustness against attacks. △ Less

Submitted 10 December, 2022; originally announced December 2022.

Comments: ICAART 2023 Paper (Technical Report)

arXiv:2211.08891 [pdf, other]

doi 10.1007/978-3-031-30823-9_14

Validating Streaming JSON Documents with Learned VPAs

Authors: Véronique Bruyère, Guillermo A. Perez, Gaëtan Staquet

Abstract: We present a new streaming algorithm to validate JSON documents against a set of constraints given as a JSON schema. Among the possible values a JSON document can hold, objects are unordered collections of key-value pairs while arrays are ordered collections of values. We prove that there always exists a visibly pushdown automaton (VPA) that accepts the same set of JSON documents as a JSON schema.… ▽ More We present a new streaming algorithm to validate JSON documents against a set of constraints given as a JSON schema. Among the possible values a JSON document can hold, objects are unordered collections of key-value pairs while arrays are ordered collections of values. We prove that there always exists a visibly pushdown automaton (VPA) that accepts the same set of JSON documents as a JSON schema. Leveraging this result, our approach relies on learning a VPA for the provided schema. As the learned VPA assumes a fixed order on the key-value pairs of the objects, we abstract its transitions in a special kind of graph, and propose an efficient streaming algorithm using the VPA and its graph to decide whether a JSON document is valid for the schema. We evaluate the implementation of our algorithm on a number of random JSON documents, and compare it to the classical validation algorithm. △ Less

Submitted 8 March, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: 46 pages, 10 figures, published at TACAS 2023

ACM Class: F.4.3

Journal ref: Tools and Algorithms for the Construction and Analysis of Systems (TACAS) 2023, pp. 271-289

arXiv:2210.00785 [pdf, ps, other]

The Geometry of Reachability in Continuous Vector Addition Systems with States

Authors: Shaull Almagor, Arka Ghosh, Tim Leys, Guillermo A. Perez

Abstract: We study the geometry of reachability sets of continuous vector addition systems with states (VASS). In particular we establish that they are almost Minkowski sums of convex cones and zonotopes generated by the vectors labelling the transitions of the VASS. We use the latter to prove that short so-called linear path schemes suffice as witnesses of reachability in continuous VASS of fixed dimension… ▽ More We study the geometry of reachability sets of continuous vector addition systems with states (VASS). In particular we establish that they are almost Minkowski sums of convex cones and zonotopes generated by the vectors labelling the transitions of the VASS. We use the latter to prove that short so-called linear path schemes suffice as witnesses of reachability in continuous VASS of fixed dimension. Then, we give new polynomial-time algorithms for the reachability problem for linear path schemes. Finally, we also establish that enriching the model with zero tests makes the reachability problem intractable already for linear path schemes of dimension two. △ Less

Submitted 14 November, 2022; v1 submitted 3 October, 2022; originally announced October 2022.

arXiv:2209.07133 [pdf, other]

COOL-MC: A Comprehensive Tool for Reinforcement Learning and Model Checking

Authors: Dennis Gross, Nils Jansen, Sebastian Junges, Guillermo A. Perez

Abstract: This paper presents COOL-MC, a tool that integrates state-of-the-art reinforcement learning (RL) and model checking. Specifically, the tool builds upon the OpenAI gym and the probabilistic model checker Storm. COOL-MC provides the following features: (1) a simulator to train RL policies in the OpenAI gym for Markov decision processes (MDPs) that are defined as input for Storm, (2) a new model buil… ▽ More This paper presents COOL-MC, a tool that integrates state-of-the-art reinforcement learning (RL) and model checking. Specifically, the tool builds upon the OpenAI gym and the probabilistic model checker Storm. COOL-MC provides the following features: (1) a simulator to train RL policies in the OpenAI gym for Markov decision processes (MDPs) that are defined as input for Storm, (2) a new model builder for Storm, which uses callback functions to verify (neural network) RL policies, (3) formal abstractions that relate models and policies specified in OpenAI gym or Storm, and (4) algorithms to obtain bounds on the performance of so-called permissive policies. We describe the components and architecture of COOL-MC and demonstrate its features on multiple benchmark environments. △ Less

Submitted 15 September, 2022; originally announced September 2022.

arXiv:2206.00251 [pdf, other]

The Reactive Synthesis Competition (SYNTCOMP): 2018-2021

Authors: Swen Jacobs, Guillermo A. Perez, Remco Abraham, Veronique Bruyere, Michael Cadilhac, Maximilien Colange, Charly Delfosse, Tom van Dijk, Alexandre Duret-Lutz, Peter Faymonville, Bernd Finkbeiner, Ayrat Khalimov, Felix Klein, Michael Luttenberger, Klara Meyer, Thibaud Michaud, Adrien Pommellet, Florian Renkin, Philipp Schlehuber-Caissier, Mouhammad Sakr, Salomon Sickert, Gaetan Staquet, Clement Tamines, Leander Tentrup, Adam Walker

Abstract: We report on the last four editions of the reactive synthesis competition (SYNTCOMP 2018-2021). We briefly describe the evaluation scheme and the experimental setup of SYNTCOMP. Then, we introduce new benchmark classes that have been added to the SYNTCOMP library and give an overview of the participants of SYNTCOMP. Finally, we present and analyze the results of our experimental evaluations, inclu… ▽ More We report on the last four editions of the reactive synthesis competition (SYNTCOMP 2018-2021). We briefly describe the evaluation scheme and the experimental setup of SYNTCOMP. Then, we introduce new benchmark classes that have been added to the SYNTCOMP library and give an overview of the participants of SYNTCOMP. Finally, we present and analyze the results of our experimental evaluations, including a ranking of tools with respect to quantity and quality - that is, the total size in terms of logic and memory elements - of solutions. △ Less

Submitted 6 May, 2024; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: accepted for publication in STTT

arXiv:2204.06079 [pdf, other]

Acacia-Bonsai: A Modern Implementation of Downset-Based LTL Realizability

Authors: Michaël Cadilhac, Guillermo A. Pérez

Abstract: We describe our implementation of downset-manipulating algorithms used to solve the realizability problem for linear temporal logic (LTL). These algorithms were introduced by Filiot et al.~in the 2010s and implemented in the tools Acacia and Acacia+ in C and Python. We identify degrees of freedom in the original algorithms and provide a complete rewriting of Acacia in C++20 articulated around gene… ▽ More We describe our implementation of downset-manipulating algorithms used to solve the realizability problem for linear temporal logic (LTL). These algorithms were introduced by Filiot et al.~in the 2010s and implemented in the tools Acacia and Acacia+ in C and Python. We identify degrees of freedom in the original algorithms and provide a complete rewriting of Acacia in C++20 articulated around genericity and leveraging modern techniques for better performances. These techniques include compile-time specialization of the algorithms, the use of SIMD registers to store vectors, and several preprocessing steps, some relying on efficient Binary Decision Diagram (BDD) libraries. We also explore different data structures to store downsets. The resulting tool is competitive against comparable modern tools. △ Less

Submitted 12 April, 2022; originally announced April 2022.

Comments: 10 pages, 1 figure

arXiv:2112.09655 [pdf, other]

Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes (Technical Report)

Authors: Florent Delgrange, Ann Nowé, Guillermo A. Pérez

Abstract: We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniqu… ▽ More We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework introduced by Gelada et al. to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model. △ Less

Submitted 14 June, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

Comments: AAAI 2022, technical report including supplementary material (10 pages main text, 14 pages appendix)

arXiv:2112.02976 [pdf, ps, other]

Lecture Notes on Partially Known MDPs

Authors: Guillermo A. Perez

Abstract: In these notes we will tackle the problem of finding optimal policies for Markov decision processes (MDPs) which are not fully known to us. Our intention is to slowly transition from an offline setting to an online (learning) setting. Namely, we are moving towards reinforcement learning. In these notes we will tackle the problem of finding optimal policies for Markov decision processes (MDPs) which are not fully known to us. Our intention is to slowly transition from an offline setting to an online (learning) setting. Namely, we are moving towards reinforcement learning. △ Less

Submitted 20 June, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

arXiv:2110.09434 [pdf, other]

doi 10.1007/978-3-031-30823-9_14

Learning Realtime One-Counter Automata

Authors: Véronique Bruyère, Guillermo A. Pérez, Gaëtan Staquet

Abstract: We present a new learning algorithm for realtime one-counter automata. Our algorithm uses membership and equivalence queries as in Angluin's L* algorithm, as well as counter value queries and partial equivalence queries. In a partial equivalence query, we ask the teacher whether the language of a given finite-state automaton coincides with a counter-bounded subset of the target language. We evalua… ▽ More We present a new learning algorithm for realtime one-counter automata. Our algorithm uses membership and equivalence queries as in Angluin's L* algorithm, as well as counter value queries and partial equivalence queries. In a partial equivalence query, we ask the teacher whether the language of a given finite-state automaton coincides with a counter-bounded subset of the target language. We evaluate an implementation of our algorithm on a number of random benchmarks and on a use case regarding efficient JSON-stream validation. △ Less

Submitted 18 October, 2021; originally announced October 2021.

Comments: 55 pages, 9 figures, submitted to TACAS 2022

ACM Class: F.4.3

Journal ref: Tools and Algorithms for the Construction and Analysis of Systems (TACAS) 2022 pp. 271-289

arXiv:2104.11758 [pdf, ps, other]

Active Learning of Sequential Transducers with Side Information about the Domain

Authors: Raphaël Berthon, Adrien Boiret, Guillermo A. Perez, Jean-François Raskin

Abstract: Active learning is a setting in which a student queries a teacher, through membership and equivalence queries, in order to learn a language. Performance on these algorithms is often measured in the number of queries required to learn a target, with an emphasis on costly equivalence queries. In graybox learning, the learning process is accelerated by foreknowledge of some information on the target.… ▽ More Active learning is a setting in which a student queries a teacher, through membership and equivalence queries, in order to learn a language. Performance on these algorithms is often measured in the number of queries required to learn a target, with an emphasis on costly equivalence queries. In graybox learning, the learning process is accelerated by foreknowledge of some information on the target. Here, we consider graybox active learning of subsequential string transducers, where a regular overapproximation of the domain is known by the student. We show that there exists an algorithm using string equation solvers that uses this knowledge to learn subsequential string transducers with a better guarantee on the required number of equivalence queries than classical active learning. △ Less

Submitted 23 April, 2021; originally announced April 2021.

arXiv:2101.11996 [pdf, ps, other]

Continuous One-Counter Automata

Authors: Michael Blondin, Tim Leys, Filip Mazowiecki, Philip Offtermatt, Guillermo A. Pérez

Abstract: We study the reachability problem for continuous one-counter automata, COCA for short. In such automata, transitions are guarded by upper and lower bound tests against the counter value. Additionally, the counter updates associated with taking transitions can be (non-deterministically) scaled down by a nonzero factor between zero and one. Our three main results are as follows: (1) We prove that th… ▽ More We study the reachability problem for continuous one-counter automata, COCA for short. In such automata, transitions are guarded by upper and lower bound tests against the counter value. Additionally, the counter updates associated with taking transitions can be (non-deterministically) scaled down by a nonzero factor between zero and one. Our three main results are as follows: (1) We prove that the reachability problem for COCA with global upper and lower bound tests is in NC2; (2) that, in general, the problem is decidable in polynomial time; and (3) that it is decidable in the polynomial hierarchy for COCA with parametric counter updates and bound tests. △ Less

Submitted 3 February, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

arXiv:2009.13128 [pdf, ps, other]

The Complexity of Reachability in Parametric Markov Decision Processes

Authors: Sebastian Junges, Joost-Pieter Katoen, Guillermo A. Pérez, Tobias Winkler

Abstract: This article presents the complexity of reachability decision problems for parametric Markov decision processes (pMDPs), an extension to Markov decision processes (MDPs) where transitions probabilities are described by polynomials over a finite set of parameters. In particular, we study the complexity of finding values for these parameters such that the induced MDP satisfies some maximal or minima… ▽ More This article presents the complexity of reachability decision problems for parametric Markov decision processes (pMDPs), an extension to Markov decision processes (MDPs) where transitions probabilities are described by polynomials over a finite set of parameters. In particular, we study the complexity of finding values for these parameters such that the induced MDP satisfies some maximal or minimal reachability probability constraints. We discuss different variants depending on the comparison operator in the constraints and the domain of the parameter values. We improve all known lower bounds for this problem, and notably provide ETR-completeness results for distinct variants of this problem. △ Less

Submitted 28 September, 2020; originally announced September 2020.

Comments: This is a preprint of an article under submission which follows our earlier CONCUR paper. It contains small corrections and new results regarding qualitative reachability queries

arXiv:2005.09253 [pdf, other]

Safe Learning for Near Optimal Scheduling

Authors: Damien Busatto-Gaston, Debraj Chakraborty, Shibashis Guha, Guillermo A. Pérez, Jean-François Raskin

Abstract: In this paper, we investigate the combination of synthesis, model-based learning, and online sampling techniques to obtain safe and near-optimal schedulers for a preemptible task scheduling problem. Our algorithms can handle Markov decision processes (MDPs) that have 1020 states and beyond which cannot be handled with state-of-the art probabilistic model-checkers. We provide probably approximately… ▽ More In this paper, we investigate the combination of synthesis, model-based learning, and online sampling techniques to obtain safe and near-optimal schedulers for a preemptible task scheduling problem. Our algorithms can handle Markov decision processes (MDPs) that have 1020 states and beyond which cannot be handled with state-of-the art probabilistic model-checkers. We provide probably approximately correct (PAC) guarantees for learning the model. Additionally, we extend Monte-Carlo tree search with advice, computed using safety games or obtained using the earliest-deadline-first scheduler, to safely explore the learned model online. Finally, we implemented and compared our algorithms empirically against shielded deep Q-learning on large task systems. △ Less

Submitted 13 July, 2021; v1 submitted 19 May, 2020; originally announced May 2020.

arXiv:2005.05587 [pdf, ps, other]

Robustness Verification for Classifier Ensembles

Authors: Dennis Gross, Nils Jansen, Guillermo A. Pérez, Stephan Raaijmakers

Abstract: We give a formal verification procedure that decides whether a classifier ensemble is robust against arbitrary randomized attacks. Such attacks consist of a set of deterministic attacks and a distribution over this set. The robustness-checking problem consists of assessing, given a set of classifiers and a labelled data set, whether there exists a randomized attack that induces a certain expected… ▽ More We give a formal verification procedure that decides whether a classifier ensemble is robust against arbitrary randomized attacks. Such attacks consist of a set of deterministic attacks and a distribution over this set. The robustness-checking problem consists of assessing, given a set of classifiers and a labelled data set, whether there exists a randomized attack that induces a certain expected loss against all classifiers. We show the NP-hardness of the problem and provide an upper bound on the number of attacks that is sufficient to form an optimal randomized attack. These results provide an effective way to reason about the robustness of a classifier ensemble. We provide SMT and MILP encodings to compute optimal randomized attacks or prove that there is no attack inducing a certain expected loss. In the latter case, the classifier ensemble is provably robust. Our prototype implementation verifies multiple neural-network ensembles trained for image-classification tasks. The experimental results using the MILP encoding are promising both in terms of scalability and the general applicability of our verification procedure. △ Less

Submitted 9 July, 2020; v1 submitted 12 May, 2020; originally announced May 2020.

arXiv:2005.01071 [pdf, other]

Revisiting Parameter Synthesis for One-Counter Automata

Authors: Guillermo A. Pérez, Ritam Raha

Abstract: We study the (parameter) synthesis problem for one-counter automata with parameters. One-counter automata are obtained by extending classical finite-state automata with a counter whose value can range over non-negative integers and be tested for zero. The updates and tests applicable to the counter can further be made parametric by introducing a set of integer-valued variables called parameters. T… ▽ More We study the (parameter) synthesis problem for one-counter automata with parameters. One-counter automata are obtained by extending classical finite-state automata with a counter whose value can range over non-negative integers and be tested for zero. The updates and tests applicable to the counter can further be made parametric by introducing a set of integer-valued variables called parameters. The synthesis problem for such automata asks whether there exists a valuation of the parameters such that all infinite runs of the automaton satisfy some omega-regular property. Lechner showed that (the complement of) the problem can be encoded in a restricted one-alternation fragment of Presburger arithmetic with divisibility. In this work (i) we argue that said fragment, called AERPADPLUS, is unfortunately undecidable. Nevertheless, by a careful re-encoding of the problem into a decidable restriction of AERPADPLUS, (ii) we prove that the synthesis problem is decidable in general and in N2EXP for several fixed omega-regular properties. Finally, (iii) we give a polynomial-space algorithm for the special case of the problem where parameters can only be used in tests, and not updates, of the counter. △ Less

Submitted 19 October, 2021; v1 submitted 3 May, 2020; originally announced May 2020.

arXiv:2004.02593 [pdf, ps, other]

Let's Agree to Degree: Comparing Graph Convolutional Networks in the Message-Passing Framework

Authors: Floris Geerts, Filip Mazowiecki, Guillermo A. Pérez

Abstract: In this paper we cast neural networks defined on graphs as message-passing neural networks (MPNNs) in order to study the distinguishing power of different classes of such models. We are interested in whether certain architectures are able to tell vertices apart based on the feature labels given as input with the graph. We consider two variants of MPNNS: anonymous MPNNs whose message functions depe… ▽ More In this paper we cast neural networks defined on graphs as message-passing neural networks (MPNNs) in order to study the distinguishing power of different classes of such models. We are interested in whether certain architectures are able to tell vertices apart based on the feature labels given as input with the graph. We consider two variants of MPNNS: anonymous MPNNs whose message functions depend only on the labels of vertices involved; and degree-aware MPNNs in which message functions can additionally use information regarding the degree of vertices. The former class covers a popular formalisms for computing functions on graphs: graph neural networks (GNN). The latter covers the so-called graph convolutional networks (GCNs), a recently introduced variant of GNNs by Kipf and Welling. We obtain lower and upper bounds on the distinguishing power of MPNNs in terms of the distinguishing power of the Weisfeiler-Lehman (WL) algorithm. Our results imply that (i) the distinguishing power of GCNs is bounded by the WL algorithm, but that they are one step ahead; (ii) the WL algorithm cannot be simulated by "plain vanilla" GCNs but the addition of a trade-off parameter between features of the vertex and those of its neighbours (as proposed by Kipf and Welling themselves) resolves this problem. △ Less

Submitted 6 April, 2020; originally announced April 2020.

Comments: 22 pages

arXiv:1912.05793 [pdf, ps, other]

The Extended HOA Format for Synthesis

Authors: Guillermo A. Perez

Abstract: We propose a small extension to the Hanoi Omega-Automata format to define reactive-synthesis problems. Namely, we add a "controllable-AP" header item specifying the subset of atomic propositions which is controllable. We describe the semantics of the new format and propose an output format for synthesized strategies. Finally, we also comment on tool support meant to encourage fast adoption of the… ▽ More We propose a small extension to the Hanoi Omega-Automata format to define reactive-synthesis problems. Namely, we add a "controllable-AP" header item specifying the subset of atomic propositions which is controllable. We describe the semantics of the new format and propose an output format for synthesized strategies. Finally, we also comment on tool support meant to encourage fast adoption of the extended Hanoi Omega-Automata format for synthesis. △ Less

Submitted 13 May, 2020; v1 submitted 12 December, 2019; originally announced December 2019.

Comments: Updated the link to a cited tool and made it explicit that parity automata can be assumed to be complete

arXiv:1907.06913 [pdf, ps, other]

Partial Solvers for Generalized Parity Games

Authors: Véronique Bruyère, Guillermo A. Pérez, Jean-François Raskin, Clément Tamines

Abstract: Parity games have been broadly studied in recent years for their applications to controller synthesis and verification. In practice, partial solvers for parity games that execute in polynomial time, while incomplete, can solve most games in publicly available benchmark suites. In this paper, we combine those partial solvers with the classical recursive algorithm for parity games due to Zielonka. W… ▽ More Parity games have been broadly studied in recent years for their applications to controller synthesis and verification. In practice, partial solvers for parity games that execute in polynomial time, while incomplete, can solve most games in publicly available benchmark suites. In this paper, we combine those partial solvers with the classical recursive algorithm for parity games due to Zielonka. We also extend partial solvers to generalized parity games that are games with conjunction of parity objectives. We have implemented those algorithms and evaluated them on a large set of benchmarks proposed in the last LTL synthesis competition. △ Less

Submitted 20 July, 2019; v1 submitted 16 July, 2019; originally announced July 2019.

arXiv:1904.01503 [pdf, other]

On the Complexity of Reachability in Parametric Markov Decision Processes

Authors: Tobias Winkler, Sebastian Junges, Guillermo A. Pérez, Joost-Pieter Katoen

Abstract: This paper studies parametric Markov decision processes (pMDPs), an extension to Markov decision processes (MDPs) where transitions probabilities are described by polynomials over a finite set of parameters. Fixing values for all parameters yields MDPs. In particular, this paper studies the complexity of finding values for these parameters such that the induced MDP satisfies some reachability cons… ▽ More This paper studies parametric Markov decision processes (pMDPs), an extension to Markov decision processes (MDPs) where transitions probabilities are described by polynomials over a finite set of parameters. Fixing values for all parameters yields MDPs. In particular, this paper studies the complexity of finding values for these parameters such that the induced MDP satisfies some reachability constraints. We discuss different variants depending on the comparison operator in the constraints and the domain of the parameter values. We improve all known lower bounds for this problem, and notably provide ETR-completeness results for distinct variants of this problem. Furthermore, we provide insights in the functions describing the induced reachability probabilities, and how pMDPs generalise concurrent stochastic reachability games. △ Less

Submitted 2 April, 2019; originally announced April 2019.

Comments: Full version with proofs, 42 pages

arXiv:1902.06576 [pdf, other]

Coverability in 1-VASS with Disequality Tests

Authors: Shaull Almagor, Nathann Cohen, Guillermo A. Pérez, Mahsa Shirmohammadi, James Worrell

Abstract: We study a class of reachability problems in weighted graphs with constraints on the accumulated weight of paths. The problems we study can equivalently be formulated in the model of vector addition systems with states (VASS). We consider a version of the vertex-to-vertex reachability problem in which the accumulated weight of a path is required always to be non-negative. This is equivalent to the… ▽ More We study a class of reachability problems in weighted graphs with constraints on the accumulated weight of paths. The problems we study can equivalently be formulated in the model of vector addition systems with states (VASS). We consider a version of the vertex-to-vertex reachability problem in which the accumulated weight of a path is required always to be non-negative. This is equivalent to the so-called control-state reachability problem (also called the coverability problem) for 1-dimensional VASS. We show that this problem lies in NC: the class of problems solvable in polylogarithmic parallel time. In our main result we generalise the problem to allow disequality constraints on edges (i.e., we allow edges to be disabled if the accumulated weight is equal to a specific value). We show that in this case the vertex-to-vertex reachability problem is solvable in polynomial time even though a shortest path may have exponential length. In the language of VASS this means that control-state reachability is in polynomial time for 1-dimensional VASS with disequality tests. △ Less

Submitted 7 September, 2020; v1 submitted 18 February, 2019; originally announced February 2019.

arXiv:1811.07146 [pdf, ps, other]

The Impatient May Use Limited Optimism to Minimize Regret

Authors: Michaël Cadilhac, Guillermo A. Pérez, Marie van den Bogaard

Abstract: Discounted-sum games provide a formal model for the study of reinforcement learning, where the agent is enticed to get rewards early since later rewards are discounted. When the agent interacts with the environment, she may regret her actions, realizing that a previous choice was suboptimal given the behavior of the environment. The main contribution of this paper is a PSPACE algorithm for computi… ▽ More Discounted-sum games provide a formal model for the study of reinforcement learning, where the agent is enticed to get rewards early since later rewards are discounted. When the agent interacts with the environment, she may regret her actions, realizing that a previous choice was suboptimal given the behavior of the environment. The main contribution of this paper is a PSPACE algorithm for computing the minimum possible regret of a given game. To this end, several results of independent interest are shown. (1) We identify a class of regret-minimizing and admissible strategies that first assume that the environment is collaborating, then assume it is adversarial---the precise timing of the switch is key here. (2) Disregarding the computational cost of numerical analysis, we provide an NP algorithm that checks that the regret entailed by a given time-switching strategy exceeds a given value. (3) We show that determining whether a strategy minimizes regret is decidable in PSPACE. △ Less

Submitted 17 November, 2018; originally announced November 2018.

arXiv:1807.04920 [pdf, other]

On the Complexity of Value Iteration

Authors: Nikhil Balaji, Stefan Kiefer, Petr Novotný, Guillermo A. Pérez, Mahsa Shirmohammadi

Abstract: Value iteration is a fundamental algorithm for solving Markov Decision Processes (MDPs). It computes the maximal $n$-step payoff by iterating $n$ times a recurrence equation which is naturally associated to the MDP. At the same time, value iteration provides a policy for the MDP that is optimal on a given finite horizon $n$. In this paper, we settle the computational complexity of value iteration.… ▽ More Value iteration is a fundamental algorithm for solving Markov Decision Processes (MDPs). It computes the maximal $n$-step payoff by iterating $n$ times a recurrence equation which is naturally associated to the MDP. At the same time, value iteration provides a policy for the MDP that is optimal on a given finite horizon $n$. In this paper, we settle the computational complexity of value iteration. We show that, given a horizon $n$ in binary and an MDP, computing an optimal policy is EXP-complete, thus resolving an open problem that goes back to the seminal 1987 paper on the complexity of MDPs by Papadimitriou and Tsitsiklis. As a step** stone, we show that it is EXP-complete to compute the $n$-fold iteration (with $n$ in binary) of a function given by a straight-line program over the integers with $\max$ and $+$ as operators. △ Less

Submitted 27 April, 2019; v1 submitted 13 July, 2018; originally announced July 2018.

Comments: Full version of an ICALP'19 paper

arXiv:1804.09077 [pdf, ps, other]

When is Containment Decidable for Probabilistic Automata?

Authors: Laure Daviaud, Marcin Jurdziński, Ranko Lazić, Filip Mazowiecki, Guillermo A. Pérez, James Worrell

Abstract: The emptiness and containment problems for probabilistic automata are natural quantitative generalisations of the classical language emptiness and inclusion problems for Boolean automata. It is well known that both problems are undecidable. In this paper we provide a more refined view of these problems in terms of the degree of ambiguity of probabilistic automata. We show that a gap version of the… ▽ More The emptiness and containment problems for probabilistic automata are natural quantitative generalisations of the classical language emptiness and inclusion problems for Boolean automata. It is well known that both problems are undecidable. In this paper we provide a more refined view of these problems in terms of the degree of ambiguity of probabilistic automata. We show that a gap version of the emptiness problem (that is known be undecidable in general) becomes decidable for automata of polynomial ambiguity. We complement this positive result by showing that the emptiness problem remains undecidable even when restricted to automata of linear ambiguity. We then turn to finitely ambiguous automata. Here we show decidability of containment in case one of the automata is assumed to be unambiguous while the other one is allowed to be finitely ambiguous. Our proof of this last result relies on the decidability of the theory of real exponentiation, which has been shown, subject to Schanuel's Conjecture, by Macintyre and Wilkie. △ Less

Submitted 29 March, 2020; v1 submitted 24 April, 2018; originally announced April 2018.

ACM Class: F.3.1

arXiv:1804.08924 [pdf, other]

Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints

Authors: Jan Křetínský, Guillermo A. Pérez, Jean-François Raskin

Abstract: We formalize the problem of maximizing the mean-payoff value with high probability while satisfying a parity objective in a Markov decision process (MDP) with unknown probabilistic transition function and unknown reward function. Assuming the support of the unknown transition function and a lower bound on the minimal transition probability are known in advance, we show that in MDPs consisting of a… ▽ More We formalize the problem of maximizing the mean-payoff value with high probability while satisfying a parity objective in a Markov decision process (MDP) with unknown probabilistic transition function and unknown reward function. Assuming the support of the unknown transition function and a lower bound on the minimal transition probability are known in advance, we show that in MDPs consisting of a single end component, two combinations of guarantees on the parity and mean-payoff objectives can be achieved depending on how much memory one is willing to use. (i) For all $ε$ and $γ$ we can construct an online-learning finite-memory strategy that almost-surely satisfies the parity objective and which achieves an $ε$-optimal mean payoff with probability at least $1 - γ$. (ii) Alternatively, for all $ε$ and $γ$ there exists an online-learning infinite-memory strategy that satisfies the parity objective surely and which achieves an $ε$-optimal mean payoff with probability at least $1 - γ$. We extend the above results to MDPs consisting of more than one end component in a natural way. Finally, we show that the aforementioned guarantees are tight, i.e. there are MDPs for which stronger combinations of the guarantees cannot be ensured. △ Less

Submitted 23 August, 2018; v1 submitted 24 April, 2018; originally announced April 2018.

arXiv:1804.06336 [pdf, other]

Weak Cost Register Automata are Still Powerful

Authors: Shaull Almagor, Michaël Cadilhac, Filip Mazowiecki, Guillermo A. Pérez

Abstract: We consider one of the weakest variants of cost register automata over a tropical semiring, namely copyless cost register automata over $\mathbb{N}$ with updates using $\min$ and increments. We show that this model can simulate, in some sense, the runs of counter machines with zero-tests. We deduce that a number of problems pertaining to that model are undecidable, in particular equivalence, dispr… ▽ More We consider one of the weakest variants of cost register automata over a tropical semiring, namely copyless cost register automata over $\mathbb{N}$ with updates using $\min$ and increments. We show that this model can simulate, in some sense, the runs of counter machines with zero-tests. We deduce that a number of problems pertaining to that model are undecidable, in particular equivalence, disproving a conjecture of Alur et al. from 2012. To emphasize how weak these machines are, we also show that they can be expressed as a restricted form of linearly-ambiguous weighted automata. △ Less

Submitted 17 April, 2018; originally announced April 2018.

Comments: 16 pages

arXiv:1711.11439 [pdf, other]

doi 10.4204/EPTCS.260.10

The 4th Reactive Synthesis Competition (SYNTCOMP 2017): Benchmarks, Participants & Results

Authors: Swen Jacobs, Nicolas Basset, Roderick Bloem, Romain Brenguier, Maximilien Colange, Peter Faymonville, Bernd Finkbeiner, Ayrat Khalimov, Felix Klein, Thibaud Michaud, Guillermo A. Pérez, Jean-François Raskin, Ocan Sankur, Leander Tentrup

Abstract: We report on the fourth reactive synthesis competition (SYNTCOMP 2017). We introduce two new benchmark classes that have been added to the SYNTCOMP library, and briefly describe the benchmark selection, evaluation scheme and the experimental setup of SYNTCOMP 2017. We present the participants of SYNTCOMP 2017, with a focus on changes with respect to the previous years and on the two completely new… ▽ More We report on the fourth reactive synthesis competition (SYNTCOMP 2017). We introduce two new benchmark classes that have been added to the SYNTCOMP library, and briefly describe the benchmark selection, evaluation scheme and the experimental setup of SYNTCOMP 2017. We present the participants of SYNTCOMP 2017, with a focus on changes with respect to the previous years and on the two completely new tools that have entered the competition. Finally, we present and analyze the results of our experimental evaluation, including a ranking of tools with respect to quantity and quality of solutions. △ Less

Submitted 28 November, 2017; originally announced November 2017.

Comments: In Proceedings SYNT 2017, arXiv:1711.10224. arXiv admin note: text overlap with arXiv:1609.00507

Journal ref: EPTCS 260, 2017, pp. 116-143

arXiv:1710.07903 [pdf, ps, other]

The Complexity of Graph-Based Reductions for Reachability in Markov Decision Processes

Authors: Stephane Le Roux, Guillermo A. Perez

Abstract: We study the never-worse relation (NWR) for Markov decision processes with an infinite-horizon reachability objective. A state q is never worse than a state p if the maximal probability of reaching the target set of states from p is at most the same value from q, regard- less of the probabilities labelling the transitions. Extremal-probability states, end components, and essential states are all s… ▽ More We study the never-worse relation (NWR) for Markov decision processes with an infinite-horizon reachability objective. A state q is never worse than a state p if the maximal probability of reaching the target set of states from p is at most the same value from q, regard- less of the probabilities labelling the transitions. Extremal-probability states, end components, and essential states are all special cases of the equivalence relation induced by the NWR. Using the NWR, states in the same equivalence class can be collapsed. Then, actions leading to sub- optimal states can be removed. We show the natural decision problem associated to computing the NWR is coNP-complete. Finally, we ex- tend a previously known incomplete polynomial-time iterative algorithm to under-approximate the NWR. △ Less

Submitted 24 February, 2018; v1 submitted 22 October, 2017; originally announced October 2017.

arXiv:1701.02903 [pdf, ps, other]

On Delay and Regret Determinization of Max-Plus Automata

Authors: Emmanuel Filiot, Ismaël Jecker, Nathan Lhote, Guillermo A. Pérez, Jean-François Raskin

Abstract: Decidability of the determinization problem for weighted automata over the semiring $(\mathbb{Z} \cup {-\infty}, \max, +)$, WA for short, is a long-standing open question. We propose two ways of approaching it by constraining the search space of deterministic WA: k-delay and r-regret. A WA N is k-delay determinizable if there exists a deterministic automaton D that defines the same function as N a… ▽ More Decidability of the determinization problem for weighted automata over the semiring $(\mathbb{Z} \cup {-\infty}, \max, +)$, WA for short, is a long-standing open question. We propose two ways of approaching it by constraining the search space of deterministic WA: k-delay and r-regret. A WA N is k-delay determinizable if there exists a deterministic automaton D that defines the same function as N and for all words α in the language of N, the accepting run of D on α is always at most k-away from a maximal accepting run of N on α. That is, along all prefixes of the same length, the absolute difference between the running sums of weights of the two runs is at most k. A WA N is r-regret determinizable if for all words α in its language, its non-determinism can be resolved on the fly to construct a run of N such that the absolute difference between its value and the value assigned to α by N is at most r. We show that a WA is determinizable if and only if it is k-delay determinizable for some k. Hence deciding the existence of some k is as difficult as the general determinization problem. When k and r are given as input, the k-delay and r-regret determinization problems are shown to be EXPtime-complete. We also show that determining whether a WA is r-regret determinizable for some r is in EXPtime. △ Less

Submitted 3 March, 2017; v1 submitted 11 January, 2017; originally announced January 2017.

arXiv:1611.08696 [pdf, other]

Optimizing Expectation with Guarantees in POMDPs (Technical Report)

Authors: Krishnendu Chatterjee, Petr Novotný, Guillermo A. Pérez, Jean-François Raskin, Đorđe Žikelić

Abstract: A standard objective in partially-observable Markov decision processes (POMDPs) is to find a policy that maximizes the expected discounted-sum payoff. However, such policies may still permit unlikely but highly undesirable outcomes, which is problematic especially in safety-critical applications. Recently, there has been a surge of interest in POMDPs where the goal is to maximize the probability t… ▽ More A standard objective in partially-observable Markov decision processes (POMDPs) is to find a policy that maximizes the expected discounted-sum payoff. However, such policies may still permit unlikely but highly undesirable outcomes, which is problematic especially in safety-critical applications. Recently, there has been a surge of interest in POMDPs where the goal is to maximize the probability to ensure that the payoff is at least a given threshold, but these approaches do not consider any optimization beyond satisfying this threshold constraint. In this work we go beyond both the "expectation" and "threshold" approaches and consider a "guaranteed payoff optimization (GPO)" problem for POMDPs, where we are given a threshold $t$ and the objective is to find a policy $σ$ such that a) each possible outcome of $σ$ yields a discounted-sum payoff of at least $t$, and b) the expected discounted-sum payoff of $σ$ is optimal (or near-optimal) among all policies satisfying a). We present a practical approach to tackle the GPO problem and evaluate it on standard POMDP benchmarks. △ Less

Submitted 29 January, 2017; v1 submitted 26 November, 2016; originally announced November 2016.

arXiv:1611.08677 [pdf, other]

Admissibility in Quantitative Graph Games

Authors: Romain Brenguier, Guillermo A. Pérez, Jean-François Raskin, Ocan Sankur

Abstract: Admissibility has been studied for games of infinite duration with Boolean objectives. We extend here this study to games of infinite duration with quantitative objectives. First, we show that, un- der the assumption that optimal worst-case and cooperative strategies exist, admissible strategies are guaranteed to exist. Second, we give a characterization of admissible strategies using the no- tion… ▽ More Admissibility has been studied for games of infinite duration with Boolean objectives. We extend here this study to games of infinite duration with quantitative objectives. First, we show that, un- der the assumption that optimal worst-case and cooperative strategies exist, admissible strategies are guaranteed to exist. Second, we give a characterization of admissible strategies using the no- tion of adversarial and cooperative values of a history, and we characterize the set of outcomes that are compatible with admissible strategies. Finally, we show how these characterizations can be used to design algorithms to decide relevant verification and synthesis problems. △ Less

Submitted 26 November, 2016; originally announced November 2016.

Comments: This is the full version of an article that will be published in FSTTCS'16

ACM Class: F.1.1; D.2.4

arXiv:1609.00507 [pdf, other]

doi 10.4204/EPTCS.229.12

The 3rd Reactive Synthesis Competition (SYNTCOMP 2016): Benchmarks, Participants & Results

Authors: Swen Jacobs, Roderick Bloem, Romain Brenguier, Ayrat Khalimov, Felix Klein, Robert Könighofer, Jens Kreber, Alexander Legg, Nina Narodytska, Guillermo A. Pérez, Jean-François Raskin, Leonid Ryzhyk, Ocan Sankur, Martina Seidl, Leander Tentrup, Adam Walker

Abstract: We report on the benchmarks, participants and results of the third reactive synthesis competition(SYNTCOMP 2016). The benchmark library of SYNTCOMP 2016 has been extended to benchmarks in the new LTL-based temporal logic synthesis format (TLSF), and 2 new sets of benchmarks for the existing AIGER-based format for safety specifications. The participants of SYNTCOMP 2016 can be separated according t… ▽ More We report on the benchmarks, participants and results of the third reactive synthesis competition(SYNTCOMP 2016). The benchmark library of SYNTCOMP 2016 has been extended to benchmarks in the new LTL-based temporal logic synthesis format (TLSF), and 2 new sets of benchmarks for the existing AIGER-based format for safety specifications. The participants of SYNTCOMP 2016 can be separated according to these two classes of specifications, and we give an overview of the 6 tools that entered the competition in the AIGER-based track, and the 3 participants that entered the TLSF-based track. We briefly describe the benchmark selection, evaluation scheme and the experimental setup of SYNTCOMP 2016. Finally, we present and analyze the results of our experimental evaluation, including a comparison to participants of previous competitions and a legacy tool. △ Less

Submitted 23 November, 2016; v1 submitted 2 September, 2016; originally announced September 2016.

Comments: In Proceedings SYNT 2016, arXiv:1611.07178

Journal ref: EPTCS 229, 2016, pp. 149-177

arXiv:1602.01174 [pdf, other]

doi 10.4204/EPTCS.202.7

Compositional Algorithms for Succinct Safety Games

Authors: Romain Brenguier, Guillermo A. Pérez, Jean-François Raskin, Ocan Sankur

Abstract: We study the synthesis of circuits for succinct safety specifications given in the AIG format. We show how AIG safety specifications can be decomposed automatically into sub specifications. Then we propose symbolic compositional algorithms to solve the synthesis problem compositionally starting for the sub-specifications. We have evaluated the compositional algorithms on a set of benchmarks includ… ▽ More We study the synthesis of circuits for succinct safety specifications given in the AIG format. We show how AIG safety specifications can be decomposed automatically into sub specifications. Then we propose symbolic compositional algorithms to solve the synthesis problem compositionally starting for the sub-specifications. We have evaluated the compositional algorithms on a set of benchmarks including those proposed for the first synthesis competition organised in 2014 by the Synthesis Workshop affiliated to the CAV conference. We show that a large number of benchmarks can be decomposed automatically and solved more efficiently with the compositional algorithms that we propose in this paper. △ Less

Submitted 2 February, 2016; originally announced February 2016.

Comments: In Proceedings SYNT 2015, arXiv:1602.00786

ACM Class: F.3.1,B.1.2,B.6.3

Journal ref: EPTCS 202, 2016, pp. 98-111

arXiv:1602.01171 [pdf, other]

doi 10.4204/EPTCS.202.4

The Second Reactive Synthesis Competition (SYNTCOMP 2015)

Authors: Swen Jacobs, Roderick Bloem, Romain Brenguier, Robert Könighofer, Guillermo A. Pérez, Jean-François Raskin, Leonid Ryzhyk, Ocan Sankur, Martina Seidl, Leander Tentrup, Adam Walker

Abstract: We report on the design and results of the second reactive synthesis competition (SYNTCOMP 2015). We describe our extended benchmark library, with 6 completely new sets of benchmarks, and additional challenging instances for 4 of the benchmark sets that were already used in SYNTCOMP 2014. To enhance the analysis of experimental results, we introduce an extension of our benchmark format with meta-i… ▽ More We report on the design and results of the second reactive synthesis competition (SYNTCOMP 2015). We describe our extended benchmark library, with 6 completely new sets of benchmarks, and additional challenging instances for 4 of the benchmark sets that were already used in SYNTCOMP 2014. To enhance the analysis of experimental results, we introduce an extension of our benchmark format with meta-information, including a difficulty rating and a reference size for solutions. Tools are evaluated on a set of 250 benchmarks, selected to provide a good coverage of benchmarks from all classes and difficulties. We report on changes of the evaluation scheme and the experimental setup. Finally, we describe the entrants into SYNTCOMP 2015, as well as the results of our experimental evaluation. In our analysis, we emphasize progress over the tools that participated last year. △ Less

Submitted 2 February, 2016; originally announced February 2016.

Comments: In Proceedings SYNT 2015, arXiv:1602.00786

Journal ref: EPTCS 202, 2016, pp. 27-57

arXiv:1512.05568 [pdf, ps, other]

Non-Zero Sum Games for Reactive Synthesis

Authors: Romain Brenguier, Lorenzo Clemente, Paul Hunter, Guillermo A. Pérez, Mickael Randour, Jean-François Raskin, Ocan Sankur, Mathieu Sassolas

Abstract: In this invited contribution, we summarize new solution concepts useful for the synthesis of reactive systems that we have introduced in several recent publications. These solution concepts are developed in the context of non-zero sum games played on graphs. They are part of the contributions obtained in the inVEST project funded by the European Research Council. In this invited contribution, we summarize new solution concepts useful for the synthesis of reactive systems that we have introduced in several recent publications. These solution concepts are developed in the context of non-zero sum games played on graphs. They are part of the contributions obtained in the inVEST project funded by the European Research Council. △ Less

Submitted 17 December, 2015; originally announced December 2015.

Comments: LATA'16 invited paper

arXiv:1512.04255 [pdf, ps, other]

The Fixed Initial Credit Problem for Partial-Observation Energy Games is Ack-complete

Authors: Guillermo A. Pérez

Abstract: In this paper we study two-player games with asymmetric partial observation and an energy objective. Such games are played on a weighted automaton by Eve, choosing actions, and Adam, choosing a transition labelled with the given action. Eve attempts to maintain the sum of the weights (of the transitions taken) non-negative while Adam tries to do the opposite. Eve does not know the exact state of t… ▽ More In this paper we study two-player games with asymmetric partial observation and an energy objective. Such games are played on a weighted automaton by Eve, choosing actions, and Adam, choosing a transition labelled with the given action. Eve attempts to maintain the sum of the weights (of the transitions taken) non-negative while Adam tries to do the opposite. Eve does not know the exact state of the game, she is only given an equivalence class of states which contains it. In contrast, Adam has full observation. We show the fixed initial credit problem for these games is Ack-complete. △ Less

Submitted 16 November, 2016; v1 submitted 14 December, 2015; originally announced December 2015.

arXiv:1511.00523 [pdf, ps, other]

Minimizing Regret in Discounted-Sum Games

Authors: Paul Hunter, Guillermo A. Pérez, Jean-François Raskin

Abstract: In this paper, we study the problem of minimizing regret in discounted-sum games played on weighted game graphs. We give algorithms for the general problem of computing the minimal regret of the controller (Eve) as well as several variants depending on which strategies the environment (Adam) is permitted to use. We also consider the problem of synthesizing regret-free strategies for Eve in each of… ▽ More In this paper, we study the problem of minimizing regret in discounted-sum games played on weighted game graphs. We give algorithms for the general problem of computing the minimal regret of the controller (Eve) as well as several variants depending on which strategies the environment (Adam) is permitted to use. We also consider the problem of synthesizing regret-free strategies for Eve in each of these scenarios. △ Less

Submitted 4 May, 2018; v1 submitted 2 November, 2015; originally announced November 2015.

Comments: arXiv admin note: text overlap with arXiv:1504.01708; some typos have been removed in the proof of simple strategies being sufficient to minimize regret against any adversary

ACM Class: F.1.1; D.2.4

arXiv:1506.08726 [pdf, other]

doi 10.1007/s10009-016-0416-3

The First Reactive Synthesis Competition (SYNTCOMP 2014)

Authors: Swen Jacobs, Roderick Bloem, Romain Brenguier, Rüdiger Ehlers, Timotheus Hell, Robert Könighofer, Guillermo A. Pérez, Jean-François Raskin, Leonid Ryzhyk, Ocan Sankur, Martina Seidl, Leander Tentrup, Adam Walker

Abstract: We introduce the reactive synthesis competition (SYNTCOMP), a long-term effort intended to stimulate and guide advances in the design and application of synthesis procedures for reactive systems. The first iteration of SYNTCOMP is based on the controller synthesis problem for finite-state systems and safety specifications. We provide an overview of this problem and existing approaches to solve it,… ▽ More We introduce the reactive synthesis competition (SYNTCOMP), a long-term effort intended to stimulate and guide advances in the design and application of synthesis procedures for reactive systems. The first iteration of SYNTCOMP is based on the controller synthesis problem for finite-state systems and safety specifications. We provide an overview of this problem and existing approaches to solve it, and report on the design and results of the first SYNTCOMP. This includes the definition of the benchmark format, the collection of benchmarks, the rules of the competition, and the five synthesis tools that participated. We present and analyze the results of the competition and draw conclusions on the state of the art. Finally, we give an outlook on future directions of SYNTCOMP. △ Less

Submitted 13 April, 2016; v1 submitted 29 June, 2015; originally announced June 2015.

Comments: 24 pages, published in STTT

Journal ref: International Journal on Software Tools for Technology Transfer, Online First, 2016, pp 1-24

Showing 1–50 of 55 results for author: Perez, G A