Search | arXiv e-print repository

Learning Algorithms for Verification of Markov Decision Processes

Authors: Tomáš Brázdil, Krishnendu Chatterjee, Martin Chmelik, Vojtěch Forejt, Jan Křetínský, Marta Kwiatkowska, Tobias Meggendorfer, David Parker, Mateusz Ujma

Abstract: We present a general framework for applying learning algorithms and heuristical guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive exploration of the state space, instead focussing on particularly relevant areas of the system, guided by heuristics. Our work builds on the previous results of Br{á}z… ▽ More We present a general framework for applying learning algorithms and heuristical guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive exploration of the state space, instead focussing on particularly relevant areas of the system, guided by heuristics. Our work builds on the previous results of Br{á}zdil et al., significantly extending it as well as refining several details and fixing errors. The presented framework focuses on probabilistic reachability, which is a core problem in verification, and is instantiated in two distinct scenarios. The first assumes that full knowledge of the MDP is available, in particular precise transition probabilities. It performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP without knowing the exact transition dynamics. Here, we obtain probabilistic guarantees, again in terms of both the lower and upper bounds, which provides efficient stop** criteria for the approximation. In particular, the latter is an extension of statistical model-checking (SMC) for unbounded properties in MDPs. In contrast to other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor assume any particular structural properties of the MDP. △ Less

Submitted 20 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

arXiv:1711.06120 [pdf, ps, other]

doi 10.23638/LMCS-14(4:13)2018

Game Characterization of Probabilistic Bisimilarity, and Applications to Pushdown Automata

Authors: Vojtěch Forejt, Petr Jančar, Stefan Kiefer, James Worrell

Abstract: We study the bisimilarity problem for probabilistic pushdown automata (pPDA) and subclasses thereof. Our definition of pPDA allows both probabilistic and non-deterministic branching, generalising the classical notion of pushdown automata (without epsilon-transitions). We first show a general characterization of probabilistic bisimilarity in terms of two-player games, which naturally reduces checki… ▽ More We study the bisimilarity problem for probabilistic pushdown automata (pPDA) and subclasses thereof. Our definition of pPDA allows both probabilistic and non-deterministic branching, generalising the classical notion of pushdown automata (without epsilon-transitions). We first show a general characterization of probabilistic bisimilarity in terms of two-player games, which naturally reduces checking bisimilarity of probabilistic labelled transition systems to checking bisimilarity of standard (non-deterministic) labelled transition systems. This reduction can be easily implemented in the framework of pPDA, allowing to use known results for standard (non-probabilistic) PDA and their subclasses. A direct use of the reduction incurs an exponential increase of complexity, which does not matter in deriving decidability of bisimilarity for pPDA due to the non-elementary complexity of the problem. In the cases of probabilistic one-counter automata (pOCA), of probabilistic visibly pushdown automata (pvPDA), and of probabilistic basic process algebras (i.e., single-state pPDA) we show that an implicit use of the reduction can avoid the complexity increase; we thus get PSPACE, EXPTIME, and 2-EXPTIME upper bounds, respectively, like for the respective non-probabilistic versions. The bisimilarity problems for OCA and vPDA are known to have matching lower bounds (thus being PSPACE-complete and EXPTIME-complete, respectively); we show that these lower bounds also hold for fully probabilistic versions that do not use non-determinism. △ Less

Submitted 14 November, 2018; v1 submitted 16 November, 2017; originally announced November 2017.

Journal ref: Logical Methods in Computer Science, Volume 14, Issue 4 (November 15, 2018) lmcs:4077

arXiv:1605.03811 [pdf, ps, other]

Decidability Results for Multi-objective Stochastic Games

Authors: Romain Brenguier, Vojtěch Forejt

Abstract: We study stochastic two-player turn-based games in which the objective of one player is to ensure several infinite-horizon total reward objectives, while the other player attempts to spoil at least one of the objectives. The games have previously been shown not to be determined, and an approximation algorithm for computing a Pareto curve has been given. The major drawback of the existing algorithm… ▽ More We study stochastic two-player turn-based games in which the objective of one player is to ensure several infinite-horizon total reward objectives, while the other player attempts to spoil at least one of the objectives. The games have previously been shown not to be determined, and an approximation algorithm for computing a Pareto curve has been given. The major drawback of the existing algorithm is that it needs to compute Pareto curves for finite horizon objectives (for increasing length of the horizon), and the size of these Pareto curves can grow unboundedly, even when the infinite-horizon Pareto curve is small. By adapting existing results, we first give an algorithm that computes the Pareto curve for determined games. Then, as the main result of the paper, we show that for the natural class of stop** games and when there are two reward objectives, the problem of deciding whether a player can ensure satisfaction of the objectives with given thresholds is decidable. The result relies on intricate and novel proof which shows that the Pareto curves contain only finitely many points. As a consequence, we get that the two-objective discounted-reward problem for unrestricted class of stochastic games is decidable. △ Less

Submitted 12 May, 2016; originally announced May 2016.

Comments: 35 pages

arXiv:1604.06386 [pdf, other]

Stability in Graphs and Games

Authors: Tomáš Brázdil, Vojtěch Forejt, Antonín Kučera, Petr Novotný

Abstract: We study graphs and two-player games in which rewards are assigned to states, and the goal of the players is to satisfy or dissatisfy certain property of the generated outcome, given as a mean payoff property. Since the notion of mean-payoff does not reflect possible fluctuations from the mean-payoff along a run, we propose definitions and algorithms for capturing the stability of the system, and… ▽ More We study graphs and two-player games in which rewards are assigned to states, and the goal of the players is to satisfy or dissatisfy certain property of the generated outcome, given as a mean payoff property. Since the notion of mean-payoff does not reflect possible fluctuations from the mean-payoff along a run, we propose definitions and algorithms for capturing the stability of the system, and give algorithms for deciding if a given mean payoff and stability objective can be ensured in the system △ Less

Submitted 21 April, 2016; originally announced April 2016.

arXiv:1604.04435 [pdf, ps, other]

Expected Reachability-Time Games

Authors: Vojtěch Forejt, Marta Kwiatkowska, Gethin Norman, Ashutosh Trivedi

Abstract: Probabilistic timed automata are a suitable formalism to model systems with real-time, nondeterministic and probabilistic behaviour. We study two-player zero-sum games on such automata where the objective of the game is specified as the expected time to reach a target. The two players---called player Min and player Max---compete by proposing timed moves simultaneously and the move with a shorter d… ▽ More Probabilistic timed automata are a suitable formalism to model systems with real-time, nondeterministic and probabilistic behaviour. We study two-player zero-sum games on such automata where the objective of the game is specified as the expected time to reach a target. The two players---called player Min and player Max---compete by proposing timed moves simultaneously and the move with a shorter delay is performed. The first player attempts to minimise the given objective while the second tries to maximise the objective. We observe that these games are not determined, and study decision problems related to computing the upper and lower values, showing that the problems are decidable and lie in the complexity class NEXPTIME $\cap$ co-NEXPTIME. △ Less

Submitted 15 April, 2016; originally announced April 2016.

Comments: Manuscript accepted to Theoretical Computer Science

arXiv:1509.04116 [pdf, ps, other]

Controller synthesis for MDPs and Frequency LTL$\setminus$GU

Authors: Vojtěch Forejt, Jan Krčál, Jan Křetínský

Abstract: Quantitative extensions of temporal logics have recently attracted significant attention. In this work, we study frequency LTL (fLTL), an extension of LTL which allows to speak about frequencies of events along an execution. Such an extension is particularly useful for probabilistic systems that often cannot fulfil strict qualitative guarantees on the behaviour. It has been recently shown that con… ▽ More Quantitative extensions of temporal logics have recently attracted significant attention. In this work, we study frequency LTL (fLTL), an extension of LTL which allows to speak about frequencies of events along an execution. Such an extension is particularly useful for probabilistic systems that often cannot fulfil strict qualitative guarantees on the behaviour. It has been recently shown that controller synthesis for Markov decision processes and fLTL is decidable when all the bounds on frequencies are 1. As a step towards a complete quantitative solution, we show that the problem is decidable for the fragment fLTL$\setminus$GU, where U does not occur in the scope of G (but still F can). Our solution is based on a novel translation of such quantitative formulae into equivalent deterministic automata. △ Less

Submitted 14 September, 2015; originally announced September 2015.

Comments: Extended version of a paper presented at LPAR 2015

arXiv:1504.04662 [pdf, other]

doi 10.2168/LMCS-11(2:16)2015

Permissive Controller Synthesis for Probabilistic Systems

Authors: Klaus Drager, Vojtech Forejt, Marta Kwiatkowska, David Parker, Mateusz Ujma

Abstract: We propose novel controller synthesis techniques for probabilistic systems modelled using stochastic two-player games: one player acts as a controller, the second represents its environment, and probability is used to capture uncertainty arising due to, for example, unreliable sensors or faulty system components. Our aim is to generate robust controllers that are resilient to unexpected system ch… ▽ More We propose novel controller synthesis techniques for probabilistic systems modelled using stochastic two-player games: one player acts as a controller, the second represents its environment, and probability is used to capture uncertainty arising due to, for example, unreliable sensors or faulty system components. Our aim is to generate robust controllers that are resilient to unexpected system changes at runtime, and flexible enough to be adapted if additional constraints need to be imposed. We develop a permissive controller synthesis framework, which generates multi-strategies for the controller, offering a choice of control actions to take at each time step. We formalise the notion of permissivity using penalties, which are incurred each time a possible control action is disallowed by a multi-strategy. Permissive controller synthesis aims to generate a multi-strategy that minimises these penalties, whilst guaranteeing the satisfaction of a specified system property. We establish several key results about the optimality of multi-strategies and the complexity of synthesising them. Then, we develop methods to perform permissive controller synthesis using mixed integer linear programming and illustrate their effectiveness on a selection of case studies. △ Less

Submitted 29 June, 2015; v1 submitted 17 April, 2015; originally announced April 2015.

Journal ref: Logical Methods in Computer Science, Volume 11, Issue 2 (June 30, 2015) lmcs:1576

arXiv:1501.05561 [pdf, other]

On Frequency LTL in Probabilistic Systems

Authors: Vojtěch Forejt, Jan Krčál

Abstract: We study frequency linear-time temporal logic (fLTL) which extends the linear-time temporal logic (LTL) with a path operator $G^p$ expressing that on a path, certain formula holds with at least a given frequency p, thus relaxing the semantics of the usual G operator of LTL. Such logic is particularly useful in probabilistic systems, where some undesirable events such as random failures may occur a… ▽ More We study frequency linear-time temporal logic (fLTL) which extends the linear-time temporal logic (LTL) with a path operator $G^p$ expressing that on a path, certain formula holds with at least a given frequency p, thus relaxing the semantics of the usual G operator of LTL. Such logic is particularly useful in probabilistic systems, where some undesirable events such as random failures may occur and are acceptable if they are rare enough. Frequency-related extensions of LTL have been previously studied by several authors, where mostly the logic is equipped with an extended "until" and "globally" operator, leading to undecidability of most interesting problems. For the variant we study, we are able to establish fundamental decidability results. We show that for Markov chains, the problem of computing the probability with which a given fLTL formula holds has the same complexity as the analogous problem for LTL. We also show that for Markov decision processes the problem becomes more delicate, but when restricting the frequency bound $p$ to be 1 and negations not to be outside any $G^p$ operator, we can compute the maximum probability of satisfying the fLTL formula. This can be again performed with the same time complexity as for the ordinary LTL formulas. △ Less

Submitted 26 June, 2015; v1 submitted 22 January, 2015; originally announced January 2015.

Comments: A paper presented at CONCUR 2015, with appendix

arXiv:1501.03093 [pdf, other]

MultiGain: A controller synthesis tool for MDPs with multiple mean-payoff objectives

Authors: Tomáš Brázdil, Krishnendu Chatterjee, Vojtěch Forejt, Antonín Kučera

Abstract: We present MultiGain, a tool to synthesize strategies for Markov decision processes (MDPs) with multiple mean-payoff objectives. Our models are described in PRISM, and our tool uses the existing interface and simulator of PRISM. Our tool extends PRISM by adding novel algorithms for multiple mean-payoff objectives, and also provides features such as (i)~generating strategies and exploring them for… ▽ More We present MultiGain, a tool to synthesize strategies for Markov decision processes (MDPs) with multiple mean-payoff objectives. Our models are described in PRISM, and our tool uses the existing interface and simulator of PRISM. Our tool extends PRISM by adding novel algorithms for multiple mean-payoff objectives, and also provides features such as (i)~generating strategies and exploring them for simulation, and checking them with respect to other properties; and (ii)~generating an approximate Pareto curve for two mean-payoff objectives. In addition, we present a new practical algorithm for the analysis of MDPs with multiple mean-payoff objectives under memoryless strategies. △ Less

Submitted 13 January, 2015; originally announced January 2015.

Comments: Extended version for a TACAS 2015 tool demo paper

arXiv:1402.2967 [pdf, ps, other]

Verification of Markov Decision Processes using Learning Algorithms

Authors: Tomáš Brázdil, Krishnendu Chatterjee, Martin Chmelík, Vojtěch Forejt, Jan Křetínský, Marta Kwiatkowska, David Parker, Mateusz Ujma

Abstract: We present a general framework for applying machine-learning algorithms to the verification of Markov decision processes (MDPs). The primary goal of these techniques is to improve performance by avoiding an exhaustive exploration of the state space. Our framework focuses on probabilistic reachability, which is a core property for verification, and is illustrated through two distinct instantiations… ▽ More We present a general framework for applying machine-learning algorithms to the verification of Markov decision processes (MDPs). The primary goal of these techniques is to improve performance by avoiding an exhaustive exploration of the state space. Our framework focuses on probabilistic reachability, which is a core property for verification, and is illustrated through two distinct instantiations. The first assumes that full knowledge of the MDP is available, and performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP, and yields probabilistic guarantees, again in terms of both the lower and upper bounds, which provides efficient stop** criteria for the approximation. The latter is the first extension of statistical model-checking for unbounded properties in MDPs. In contrast with other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor assume any particular properties of the MDP. We also show how our techniques extend to LTL objectives. We present experimental results showing the performance of our framework on several examples. △ Less

Submitted 30 March, 2015; v1 submitted 10 February, 2014; originally announced February 2014.

arXiv:1310.3119 [pdf, other]

Solvency Markov Decision Processes with Interest

Authors: Tomáš Brázdil, Taolue Chen, Vojtěch Forejt, Petr Novotný, Aistis Simaitis

Abstract: Solvency games, introduced by Berger et al., provide an abstract framework for modelling decisions of a risk-averse investor, whose goal is to avoid ever going broke. We study a new variant of this model, where, in addition to stochastic environment and fixed increments and decrements to the investor's wealth, we introduce interest, which is earned or paid on the current level of savings or debt,… ▽ More Solvency games, introduced by Berger et al., provide an abstract framework for modelling decisions of a risk-averse investor, whose goal is to avoid ever going broke. We study a new variant of this model, where, in addition to stochastic environment and fixed increments and decrements to the investor's wealth, we introduce interest, which is earned or paid on the current level of savings or debt, respectively. We study problems related to the minimum initial wealth sufficient to avoid bankruptcy (i.e. steady decrease of the wealth) with probability at least p. We present an exponential time algorithm which approximates this minimum initial wealth, and show that a polynomial time approximation is not possible unless P = NP. For the qualitative case, i.e. p=1, we show that the problem whether a given number is larger than or equal to the minimum initial wealth belongs to both NP and coNP, and show that a polynomial time algorithm would yield a polynomial time algorithm for mean-payoff games, existence of which is a longstanding open problem. We also identify some classes of solvency MDPs for which this problem is in P. In all above cases the algorithms also give corresponding bankruptcy avoiding strategies. △ Less

Submitted 11 October, 2013; originally announced October 2013.

Comments: 25 pages. This is a full version of a paper accepted at FST&TCS 2013

arXiv:1302.0745 [pdf, ps, other]

Safe Schedulability of Bounded-Rate Multi-Mode Systems

Authors: Rajeev Alur, Vojtech Forejt, Salar Moarref, Ashutosh Trivedi

Abstract: Bounded-rate multi-mode systems (BMMS) are hybrid systems that can switch freely among a finite set of modes, and whose dynamics is specified by a finite number of real-valued variables with mode-dependent rates that can vary within given bounded sets. The schedulability problem for BMMS is defined as an infinite-round game between two players---the scheduler and the environment---where in each ro… ▽ More Bounded-rate multi-mode systems (BMMS) are hybrid systems that can switch freely among a finite set of modes, and whose dynamics is specified by a finite number of real-valued variables with mode-dependent rates that can vary within given bounded sets. The schedulability problem for BMMS is defined as an infinite-round game between two players---the scheduler and the environment---where in each round the scheduler proposes a time and a mode while the environment chooses an allowable rate for that mode, and the state of the system changes linearly in the direction of the rate vector. The goal of the scheduler is to keep the state of the system within a pre-specified safe set using a non-Zeno schedule, while the goal of the environment is the opposite. Green scheduling under uncertainty is a paradigmatic example of BMMS where a winning strategy of the scheduler corresponds to a robust energy-optimal policy. We present an algorithm to decide whether the scheduler has a winning strategy from an arbitrary starting state, and give an algorithm to compute such a winning strategy, if it exists. We show that the schedulability problem for BMMS is co-NP complete in general, but for two variables it is in PTIME. We also study the discrete schedulability problem where the environment has only finitely many choices of rate vectors in each mode and the scheduler can make decisions only at multiples of a given clock period, and show it to be EXPTIME-complete. △ Less

Submitted 4 February, 2013; originally announced February 2013.

Comments: Technical report for a paper presented at HSCC 2013

arXiv:1210.2273 [pdf, other]

Bisimilarity of Probabilistic Pushdown Automata

Authors: Vojtech Forejt, Petr Jancar, Stefan Kiefer, James Worrell

Abstract: We study the bisimilarity problem for probabilistic pushdown automata (pPDA) and subclasses thereof. Our definition of pPDA allows both probabilistic and non-deterministic branching, generalising the classical notion of pushdown automata (without epsilon-transitions). Our first contribution is a general construction that reduces checking bisimilarity of probabilistic transition systems to checking… ▽ More We study the bisimilarity problem for probabilistic pushdown automata (pPDA) and subclasses thereof. Our definition of pPDA allows both probabilistic and non-deterministic branching, generalising the classical notion of pushdown automata (without epsilon-transitions). Our first contribution is a general construction that reduces checking bisimilarity of probabilistic transition systems to checking bisimilarity of non-deterministic transition systems. This construction directly yields decidability of bisimilarity for pPDA, as well as an elementary upper bound for the bisimilarity problem on the subclass of probabilistic basic process algebras, i.e., single-state pPDA. We further show that, with careful analysis, the general reduction can be used to prove an EXPTIME upper bound for bisimilarity of probabilistic visibly pushdown automata. Here we also provide a matching lower bound, establishing EXPTIME-completeness. Finally we prove that deciding bisimilarity of probabilistic one-counter automata, another subclass of pPDA, is PSPACE-complete. Here we use a more specialised argument to obtain optimal complexity bounds. △ Less

Submitted 8 October, 2012; originally announced October 2012.

Comments: technical report accompanying an FSTTCS'12 paper

arXiv:1206.6295 [pdf, other]

Pareto Curves for Probabilistic Model Checking

Authors: Vojtech Forejt, Marta Kwiatkowska, David Parker

Abstract: Multi-objective probabilistic model checking provides a way to verify several, possibly conflicting, quantitative properties of a stochastic system. It has useful applications in controller synthesis and compositional probabilistic verification. However, existing methods are based on linear programming, which limits the scale of systems that can be analysed and makes verification of time-bounded p… ▽ More Multi-objective probabilistic model checking provides a way to verify several, possibly conflicting, quantitative properties of a stochastic system. It has useful applications in controller synthesis and compositional probabilistic verification. However, existing methods are based on linear programming, which limits the scale of systems that can be analysed and makes verification of time-bounded properties very difficult. We present a novel approach that addresses both of these shortcomings, based on the generation of successive approximations of the Pareto curve for a multi-objective model checking problem. We illustrate dramatic improvements in efficiency on a large set of benchmarks and show how the ability to visualise Pareto curves significantly enhances the quality of results obtained from current probabilistic verification tools. △ Less

Submitted 24 June, 2012; originally announced June 2012.

arXiv:1104.3489 [pdf, ps, other]

doi 10.2168/LMCS-10(1:13)2014

Markov Decision Processes with Multiple Long-run Average Objectives

Authors: Tomáš Brázdil, Václav Brožek, Krishnendu Chatterjee, Vojtěch Forejt, Antonín Kučera

Abstract: We study Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) functions. We consider two different objectives, namely, expectation and satisfaction objectives. Given an MDP with k limit-average functions, in the expectation objective the goal is to maximize the expected limit-average value, and in the satisfaction objective the goal is to maximize the probability of runs s… ▽ More We study Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) functions. We consider two different objectives, namely, expectation and satisfaction objectives. Given an MDP with k limit-average functions, in the expectation objective the goal is to maximize the expected limit-average value, and in the satisfaction objective the goal is to maximize the probability of runs such that the limit-average value stays above a given vector. We show that under the expectation objective, in contrast to the case of one limit-average function, both randomization and memory are necessary for strategies even for epsilon-approximation, and that finite-memory randomized strategies are sufficient for achieving Pareto optimal values. Under the satisfaction objective, in contrast to the case of one limit-average function, infinite memory is necessary for strategies achieving a specific value (i.e. randomized finite-memory strategies are not sufficient), whereas memoryless randomized strategies are sufficient for epsilon-approximation, for all epsilon>0. We further prove that the decision problems for both expectation and satisfaction objectives can be solved in polynomial time and the trade-off curve (Pareto curve) can be epsilon-approximated in time polynomial in the size of the MDP and 1/epsilon, and exponential in the number of limit-average functions, for all epsilon>0. Our analysis also reveals flaws in previous work for MDPs with multiple mean-payoff functions under the expectation objective, corrects the flaws, and allows us to obtain improved results. △ Less

Submitted 12 February, 2014; v1 submitted 18 April, 2011; originally announced April 2011.

Journal ref: Logical Methods in Computer Science, Volume 10, Issue 1 (February 14, 2014) lmcs:1156

Showing 1–15 of 15 results for author: Forejt, V