Search | arXiv e-print repository

Learning Algorithms for Verification of Markov Decision Processes

Authors: Tomáš Brázdil, Krishnendu Chatterjee, Martin Chmelik, Vojtěch Forejt, Jan Křetínský, Marta Kwiatkowska, Tobias Meggendorfer, David Parker, Mateusz Ujma

Abstract: We present a general framework for applying learning algorithms and heuristical guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive exploration of the state space, instead focussing on particularly relevant areas of the system, guided by heuristics. Our work builds on the previous results of Br{á}z… ▽ More We present a general framework for applying learning algorithms and heuristical guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive exploration of the state space, instead focussing on particularly relevant areas of the system, guided by heuristics. Our work builds on the previous results of Br{á}zdil et al., significantly extending it as well as refining several details and fixing errors. The presented framework focuses on probabilistic reachability, which is a core problem in verification, and is instantiated in two distinct scenarios. The first assumes that full knowledge of the MDP is available, in particular precise transition probabilities. It performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP without knowing the exact transition dynamics. Here, we obtain probabilistic guarantees, again in terms of both the lower and upper bounds, which provides efficient stop** criteria for the approximation. In particular, the latter is an extension of statistical model-checking (SMC) for unbounded properties in MDPs. In contrast to other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor assume any particular structural properties of the MDP. △ Less

Submitted 20 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

arXiv:1710.00675 [pdf, ps, other]

Sensor Synthesis for POMDPs with Reachability Objectives

Authors: Krishnendu Chatterjee, Martin Chmelik, Ufuk Topcu

Abstract: Partially observable Markov decision processes (POMDPs) are widely used in probabilistic planning problems in which an agent interacts with an environment using noisy and imprecise sensors. We study a setting in which the sensors are only partially defined and the goal is to synthesize "weakest" additional sensors, such that in the resulting POMDP, there is a small-memory policy for the agent that… ▽ More Partially observable Markov decision processes (POMDPs) are widely used in probabilistic planning problems in which an agent interacts with an environment using noisy and imprecise sensors. We study a setting in which the sensors are only partially defined and the goal is to synthesize "weakest" additional sensors, such that in the resulting POMDP, there is a small-memory policy for the agent that almost-surely (with probability~1) satisfies a reachability objective. We show that the problem is NP-complete, and present a symbolic algorithm by encoding the problem into SAT instances. We illustrate trade-offs between the amount of memory of the policy and the number of additional sensors on a simple example. We have implemented our approach and consider three classical POMDP examples from the literature, and show that in all the examples the number of sensors can be significantly decreased (as compared to the existing solutions in the literature) without increasing the complexity of the policies. △ Less

Submitted 29 September, 2017; originally announced October 2017.

Comments: arXiv admin note: text overlap with arXiv:1511.08456

arXiv:1602.07565 [pdf, other]

Stochastic Shortest Path with Energy Constraints in POMDPs

Authors: Tomáš Brázdil, Krishnendu Chatterjee, Martin Chmelík, Anchit Gupta, Petr Novotný

Abstract: We consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend the traditional framework of POMDPs to model energy consumption, which represents a hard cons… ▽ More We consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend the traditional framework of POMDPs to model energy consumption, which represents a hard constraint. The energy levels may increase and decrease with transitions, and the hard constraint requires that the energy level must remain positive in all steps till the target is reached. First, we present a novel algorithm for solving POMDPs with energy levels, develo** on existing POMDP solvers and using RTDP as its main method. Our second contribution is related to policy representation. For larger POMDP instances the policies computed by existing solvers are too large to be understandable. We present an automated procedure based on machine learning techniques that automatically extracts important decisions of the policy allowing us to compute succinct human readable policies. Finally, we show experimentally that our algorithm performs well and computes succinct policies on a number of POMDP instances from the literature that were naturally enhanced with energy levels. △ Less

Submitted 11 May, 2016; v1 submitted 24 February, 2016; originally announced February 2016.

Comments: Technical report accompanying a paper published in proceedings of AAMAS 2016

arXiv:1511.08456 [pdf, ps, other]

A Symbolic SAT-based Algorithm for Almost-sure Reachability with Small Strategies in POMDPs

Authors: Krishnendu Chatterjee, Martin Chmelik, Jessica Davies

Abstract: POMDPs are standard models for probabilistic planning problems, where an agent interacts with an uncertain environment. We study the problem of almost-sure reachability, where given a set of target states, the question is to decide whether there is a policy to ensure that the target set is reached with probability 1 (almost-surely). While in general the problem is EXPTIME-complete, in many practic… ▽ More POMDPs are standard models for probabilistic planning problems, where an agent interacts with an uncertain environment. We study the problem of almost-sure reachability, where given a set of target states, the question is to decide whether there is a policy to ensure that the target set is reached with probability 1 (almost-surely). While in general the problem is EXPTIME-complete, in many practical cases policies with a small amount of memory suffice. Moreover, the existing solution to the problem is explicit, which first requires to construct explicitly an exponential reduction to a belief-support MDP. In this work, we first study the existence of observation-stationary strategies, which is NP-complete, and then small-memory strategies. We present a symbolic algorithm by an efficient encoding to SAT and using a SAT solver for the problem. We report experimental results demonstrating the scalability of our symbolic (SAT-based) approach. △ Less

Submitted 26 November, 2015; originally announced November 2015.

Comments: Full version of "A Symbolic SAT-based Algorithm for Almost-sure Reachability with Small Strategies in POMDPs" AAAI 2016

arXiv:1502.02834 [pdf, other]

Counterexample Explanation by Learning Small Strategies in Markov Decision Processes

Authors: Tomáš Brázdil, Krishnendu Chatterjee, Martin Chmelík, Andreas Fellner, Jan Křetínský

Abstract: While for deterministic systems, a counterexample to a property can simply be an error trace, counterexamples in probabilistic systems are necessarily more complex. For instance, a set of erroneous traces with a sufficient cumulative probability mass can be used. Since these are too large objects to understand and manipulate, compact representations such as subchains have been considered. In the c… ▽ More While for deterministic systems, a counterexample to a property can simply be an error trace, counterexamples in probabilistic systems are necessarily more complex. For instance, a set of erroneous traces with a sufficient cumulative probability mass can be used. Since these are too large objects to understand and manipulate, compact representations such as subchains have been considered. In the case of probabilistic systems with non-determinism, the situation is even more complex. While a subchain for a given strategy (or scheduler, resolving non-determinism) is a straightforward choice, we take a different approach. Instead, we focus on the strategy - which can be a counterexample to violation of or a witness of satisfaction of a property - itself, and extract the most important decisions it makes, and present its succinct representation. The key tools we employ to achieve this are (1) introducing a concept of importance of a state w.r.t. the strategy, and (2) learning using decision trees. There are three main consequent advantages of our approach. Firstly, it exploits the quantitative information on states, stressing the more important decisions. Secondly, it leads to a greater variability and degree of freedom in representing the strategies. Thirdly, the representation uses a self-explanatory data structure. In summary, our approach produces more succinct and more explainable strategies, as opposed to e.g. binary decision diagrams. Finally, our experimental results show that we can extract several rules describing the strategy even for very large systems that do not fit in memory, and based on the rules explain the erroneous behaviour. △ Less

Submitted 10 February, 2015; originally announced February 2015.

arXiv:1411.3880 [pdf, other]

Optimal Cost Almost-sure Reachability in POMDPs

Authors: Krishnendu Chatterjee, Martin Chmelík, Raghav Gupta, Ayush Kanodia

Abstract: We consider partially observable Markov decision processes (POMDPs) with a set of target states and every transition is associated with an integer cost. The optimization objective we study asks to minimize the expected total cost till the target set is reached, while ensuring that the target set is reached almost-surely (with probability 1). We show that for integer costs approximating the optimal… ▽ More We consider partially observable Markov decision processes (POMDPs) with a set of target states and every transition is associated with an integer cost. The optimization objective we study asks to minimize the expected total cost till the target set is reached, while ensuring that the target set is reached almost-surely (with probability 1). We show that for integer costs approximating the optimal cost is undecidable. For positive costs, our results are as follows: (i) we establish matching lower and upper bounds for the optimal cost and the bound is double exponential; (ii) we show that the problem of approximating the optimal cost is decidable and present approximation algorithms develo** on the existing algorithms for POMDPs with finite-horizon objectives. While the worst-case running time of our algorithm is double exponential, we also present efficient stop** criteria for the algorithm and show experimentally that it performs well in many examples of interest. △ Less

Submitted 14 November, 2014; originally announced November 2014.

Comments: Full Version of Optimal Cost Almost-sure Reachability in POMDPs, AAAI 2015. arXiv admin note: text overlap with arXiv:1207.4166 by other authors

arXiv:1409.3360 [pdf, other]

Qualitative Analysis of POMDPs with Temporal Logic Specifications for Robotics Applications

Authors: Krishnendu Chatterjee, Martin Chmelík, Raghav Gupta, Ayush Kanodia

Abstract: We consider partially observable Markov decision processes (POMDPs), that are a standard framework for robotics applications to model uncertainties present in the real world, with temporal logic specifications. All temporal logic specifications in linear-time temporal logic (LTL) can be expressed as parity objectives. We study the qualitative analysis problem for POMDPs with parity objectives that… ▽ More We consider partially observable Markov decision processes (POMDPs), that are a standard framework for robotics applications to model uncertainties present in the real world, with temporal logic specifications. All temporal logic specifications in linear-time temporal logic (LTL) can be expressed as parity objectives. We study the qualitative analysis problem for POMDPs with parity objectives that asks whether there is a controller (policy) to ensure that the objective holds with probability 1 (almost-surely). While the qualitative analysis of POMDPs with parity objectives is undecidable, recent results show that when restricted to finite-memory policies the problem is EXPTIME-complete. While the problem is intractable in theory, we present a practical approach to solve the qualitative analysis problem. We designed several heuristics to deal with the exponential complexity, and have used our implementation on a number of well-known POMDP examples for robotics applications. Our results provide the first practical approach to solve the qualitative analysis of robot motion planning with LTL properties in the presence of uncertainty. △ Less

Submitted 18 February, 2015; v1 submitted 11 September, 2014; originally announced September 2014.

arXiv:1408.2058 [pdf]

POMDPs under Probabilistic Semantics

Authors: Krishnendu Chatterjee, Martin Chmelik

Abstract: We consider partially observable Markov decision processes (POMDPs) with limit-average payoff, where a reward value in the interval [0,1] is associated to every transition, and the payoff of an infinite path is the long-run average of the rewards. We consider two types of path constraints: (i) quantitative constraint defines the set of paths where the payoff is at least a given threshold lambda_1… ▽ More We consider partially observable Markov decision processes (POMDPs) with limit-average payoff, where a reward value in the interval [0,1] is associated to every transition, and the payoff of an infinite path is the long-run average of the rewards. We consider two types of path constraints: (i) quantitative constraint defines the set of paths where the payoff is at least a given threshold lambda_1 in (0,1]; and (ii) qualitative constraint which is a special case of quantitative constraint with lambda_1=1. We consider the computation of the almost-sure winning set, where the controller needs to ensure that the path constraint is satisfied with probability 1. Our main results for qualitative path constraint are as follows: (i) the problem of deciding the existence of a finite-memory controller is EXPTIME-complete; and (ii) the problem of deciding the existence of an infinite-memory controller is undecidable. For quantitative path constraint we show that the problem of deciding the existence of a finite-memory controller is undecidable. △ Less

Submitted 9 August, 2014; originally announced August 2014.

Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

Report number: UAI-P-2013-PG-142-151

arXiv:1405.0835 [pdf, ps, other]

CEGAR for Qualitative Analysis of Probabilistic Systems

Authors: Krishnendu Chatterjee, Martin Chmelik, Przemyslaw Daca

Abstract: We consider Markov decision processes (MDPs) which are a standard model for probabilistic systems. We focus on qualitative properties for MDPs that can express that desired behaviors of the system arise almost-surely (with probability 1) or with positive probability. We introduce a new simulation relation to capture the refinement relation of MDPs with respect to qualitative properties, and presen… ▽ More We consider Markov decision processes (MDPs) which are a standard model for probabilistic systems. We focus on qualitative properties for MDPs that can express that desired behaviors of the system arise almost-surely (with probability 1) or with positive probability. We introduce a new simulation relation to capture the refinement relation of MDPs with respect to qualitative properties, and present discrete graph theoretic algorithms with quadratic complexity to compute the simulation relation. We present an automated technique for assume-guarantee style reasoning for compositional analysis of MDPs with qualitative properties by giving a counter-example guided abstraction-refinement approach to compute our new simulation relation. We have implemented our algorithms and show that the compositional analysis leads to significant improvements. △ Less

Submitted 5 May, 2014; originally announced May 2014.

arXiv:1402.2967 [pdf, ps, other]

Verification of Markov Decision Processes using Learning Algorithms

Authors: Tomáš Brázdil, Krishnendu Chatterjee, Martin Chmelík, Vojtěch Forejt, Jan Křetínský, Marta Kwiatkowska, David Parker, Mateusz Ujma

Abstract: We present a general framework for applying machine-learning algorithms to the verification of Markov decision processes (MDPs). The primary goal of these techniques is to improve performance by avoiding an exhaustive exploration of the state space. Our framework focuses on probabilistic reachability, which is a core property for verification, and is illustrated through two distinct instantiations… ▽ More We present a general framework for applying machine-learning algorithms to the verification of Markov decision processes (MDPs). The primary goal of these techniques is to improve performance by avoiding an exhaustive exploration of the state space. Our framework focuses on probabilistic reachability, which is a core property for verification, and is illustrated through two distinct instantiations. The first assumes that full knowledge of the MDP is available, and performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP, and yields probabilistic guarantees, again in terms of both the lower and upper bounds, which provides efficient stop** criteria for the approximation. The latter is the first extension of statistical model-checking for unbounded properties in MDPs. In contrast with other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor assume any particular properties of the MDP. We also show how our techniques extend to LTL objectives. We present experimental results showing the performance of our framework on several examples. △ Less

Submitted 30 March, 2015; v1 submitted 10 February, 2014; originally announced February 2014.

arXiv:1309.2802 [pdf, ps, other]

What is Decidable about Partially Observable Markov Decision Processes with ω-Regular Objectives

Authors: Krishnendu Chatterjee, Martin Chmelik, Mathieu Tracol

Abstract: We consider partially observable Markov decision processes (POMDPs) with ω-regular conditions specified as parity objectives. The class of ω-regular languages extends regular languages to infinite strings and provides a robust specification language to express all properties used in verification, and parity objectives are canonical forms to express ω-regular conditions. The qualitative analysis pr… ▽ More We consider partially observable Markov decision processes (POMDPs) with ω-regular conditions specified as parity objectives. The class of ω-regular languages extends regular languages to infinite strings and provides a robust specification language to express all properties used in verification, and parity objectives are canonical forms to express ω-regular conditions. The qualitative analysis problem given a POMDP and a parity objective asks whether there is a strategy to ensure that the objective is satis- fied with probability 1 (resp. positive probability). While the qualitative analysis problems are known to be undecidable even for very special cases of parity objectives, we establish decidability (with optimal complexity) of the qualitative analysis problems for POMDPs with all parity objectives under finite- memory strategies. We establish optimal (exponential) memory bounds and EXPTIME-completeness of the qualitative analysis problems under finite-memory strategies for POMDPs with parity objectives. △ Less

Submitted 11 September, 2013; originally announced September 2013.

Comments: Full version of the CSL 2013 paper. arXiv admin note: text overlap with arXiv:1308.4846

arXiv:1308.4846 [pdf, ps, other]

POMDPs under Probabilistic Semantics

Authors: Krishnendu Chatterjee, Martin Chmelík

Abstract: We consider partially observable Markov decision processes (POMDPs) with limit-average payoff, where a reward value in the interval [0,1] is associated to every transition, and the payoff of an infinite path is the long-run average of the rewards. We consider two types of path constraints: (i) quantitative constraint defines the set of paths where the payoff is at least a given threshold λ in (0,… ▽ More We consider partially observable Markov decision processes (POMDPs) with limit-average payoff, where a reward value in the interval [0,1] is associated to every transition, and the payoff of an infinite path is the long-run average of the rewards. We consider two types of path constraints: (i) quantitative constraint defines the set of paths where the payoff is at least a given threshold λ in (0, 1]; and (ii) qualitative constraint which is a special case of quantitative constraint with λ = 1. We consider the computation of the almost-sure winning set, where the controller needs to ensure that the path constraint is satisfied with probability 1. Our main results for qualitative path constraint are as follows: (i) the problem of deciding the existence of a finite-memory controller is EXPTIME-complete; and (ii) the problem of deciding the existence of an infinite-memory controller is undecidable. For quantitative path constraint we show that the problem of deciding the existence of a finite-memory controller is undecidable. △ Less

Submitted 22 August, 2013; originally announced August 2013.

Comments: Full version of: POMDPs under Probabilistic Semantics, UAI 2013

arXiv:1210.2450 [pdf, other]

doi 10.4204/EPTCS.96.3

Interface Simulation Distances

Authors: Pavol Černý, Martin Chmelík, Thomas A. Henzinger, Arjun Radhakrishna

Abstract: The classical (boolean) notion of refinement for behavioral interfaces of system components is the alternating refinement preorder. In this paper, we define a distance for interfaces, called interface simulation distance. It makes the alternating refinement preorder quantitative by, intuitively, tolerating errors (while counting them) in the alternating simulation game. We show that the interface… ▽ More The classical (boolean) notion of refinement for behavioral interfaces of system components is the alternating refinement preorder. In this paper, we define a distance for interfaces, called interface simulation distance. It makes the alternating refinement preorder quantitative by, intuitively, tolerating errors (while counting them) in the alternating simulation game. We show that the interface simulation distance satisfies the triangle inequality, that the distance between two interfaces does not increase under parallel composition with a third interface, and that the distance between two interfaces can be bounded from above and below by distances between abstractions of the two interfaces. We illustrate the framework, and the properties of the distances under composition of interfaces, with two case studies. △ Less

Submitted 8 October, 2012; originally announced October 2012.

Comments: In Proceedings GandALF 2012, arXiv:1210.2028

Journal ref: EPTCS 96, 2012, pp. 29-42

arXiv:1209.4499 [pdf, other]

Controllable-choice Message Sequence Graphs

Authors: Martin Chmelík, Vojtěch Řehák

Abstract: We focus on the realizability problem of Message Sequence Graphs (MSG), i.e. the problem whether a given MSG specification is correctly distributable among parallel components communicating via messages. This fundamental problem of MSG is known to be undecidable. We introduce a well motivated restricted class of MSG, so called controllable-choice MSG, and show that all its models are realizable an… ▽ More We focus on the realizability problem of Message Sequence Graphs (MSG), i.e. the problem whether a given MSG specification is correctly distributable among parallel components communicating via messages. This fundamental problem of MSG is known to be undecidable. We introduce a well motivated restricted class of MSG, so called controllable-choice MSG, and show that all its models are realizable and moreover it is decidable whether a given MSG model is a member of this class. In more detail, this class of MSG specifications admits a deadlock-free realization by overloading existing messages with additional bounded control data. We also show that the presented class is the largest known subclass of MSG that allows for deadlock-free realization. △ Less

Submitted 21 September, 2012; v1 submitted 20 September, 2012; originally announced September 2012.

Comments: The full version of paper accepted to LNCS proceedings of MEMICS 2012

arXiv:1202.4140 [pdf, ps, other]

Equivalence of Games with Probabilistic Uncertainty and Partial-observation Games

Authors: Krishnendu Chatterjee, Martin Chmelik, Rupak Majumdar

Abstract: We introduce games with probabilistic uncertainty, a natural model for controller synthesis in which the controller observes the state of the system through imprecise sensors that provide correct information about the current state with a fixed probability. That is, in each step, the sensors return an observed state, and given the observed state, there is a probability distribution (due to the est… ▽ More We introduce games with probabilistic uncertainty, a natural model for controller synthesis in which the controller observes the state of the system through imprecise sensors that provide correct information about the current state with a fixed probability. That is, in each step, the sensors return an observed state, and given the observed state, there is a probability distribution (due to the estimation error) over the actual current state. The controller must base its decision on the observed state (rather than the actual current state, which it does not know). On the other hand, we assume that the environment can perfectly observe the current state. We show that our model can be reduced in polynomial time to standard partial-observation stochastic games, and vice-versa. As a consequence we establish the precise decidability frontier for the new class of games, and for most of the decidable problems establish optimal complexity results. △ Less

Submitted 1 July, 2012; v1 submitted 19 February, 2012; originally announced February 2012.

Showing 1–15 of 15 results for author: Chmelik, M