-
Learning Algorithms for Verification of Markov Decision Processes
Authors:
Tomáš Brázdil,
Krishnendu Chatterjee,
Martin Chmelik,
Vojtěch Forejt,
Jan Křetínský,
Marta Kwiatkowska,
Tobias Meggendorfer,
David Parker,
Mateusz Ujma
Abstract:
We present a general framework for applying learning algorithms and heuristical guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive exploration of the state space, instead focussing on particularly relevant areas of the system, guided by heuristics. Our work builds on the previous results of Br{á}z…
▽ More
We present a general framework for applying learning algorithms and heuristical guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive exploration of the state space, instead focussing on particularly relevant areas of the system, guided by heuristics. Our work builds on the previous results of Br{á}zdil et al., significantly extending it as well as refining several details and fixing errors.
The presented framework focuses on probabilistic reachability, which is a core problem in verification, and is instantiated in two distinct scenarios. The first assumes that full knowledge of the MDP is available, in particular precise transition probabilities. It performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP without knowing the exact transition dynamics. Here, we obtain probabilistic guarantees, again in terms of both the lower and upper bounds, which provides efficient stop** criteria for the approximation. In particular, the latter is an extension of statistical model-checking (SMC) for unbounded properties in MDPs. In contrast to other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor assume any particular structural properties of the MDP.
△ Less
Submitted 20 March, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
On-the-fly Adaptation of Patrolling Strategies in Changing Environments
Authors:
Tomáš Brázdil,
David Klaška,
Antonín Kučera,
Vít Musil,
Petr Novotný,
Vojtěch Řehák
Abstract:
We consider the problem of efficient patrolling strategy adaptation in a changing environment where the topology of Defender's moves and the importance of guarded targets change unpredictably. The Defender must instantly switch to a new strategy optimized for the new environment, not disrupting the ongoing patrolling task, and the new strategy must be computed promptly under all circumstances. Sin…
▽ More
We consider the problem of efficient patrolling strategy adaptation in a changing environment where the topology of Defender's moves and the importance of guarded targets change unpredictably. The Defender must instantly switch to a new strategy optimized for the new environment, not disrupting the ongoing patrolling task, and the new strategy must be computed promptly under all circumstances. Since strategy switching may cause unintended security risks compromising the achieved protection, our solution includes mechanisms for detecting and mitigating this problem. The efficiency of our framework is evaluated experimentally.
△ Less
Submitted 16 June, 2022;
originally announced June 2022.
-
Qualitative Controller Synthesis for Consumption Markov Decision Processes
Authors:
František Blahoudek,
Tomáš Brázdil,
Petr Novotný,
Melkior Ornik,
Pranay Thangeda,
Ufuk Topcu
Abstract:
Consumption Markov Decision Processes (CMDPs) are probabilistic decision-making models of resource-constrained systems. In a CMDP, the controller possesses a certain amount of a critical resource, such as electric power. Each action of the controller can consume some amount of the resource. Resource replenishment is only possible in special reload states, in which the resource level can be reloade…
▽ More
Consumption Markov Decision Processes (CMDPs) are probabilistic decision-making models of resource-constrained systems. In a CMDP, the controller possesses a certain amount of a critical resource, such as electric power. Each action of the controller can consume some amount of the resource. Resource replenishment is only possible in special reload states, in which the resource level can be reloaded up to the full capacity of the system. The task of the controller is to prevent resource exhaustion, i.e. ensure that the available amount of the resource stays non-negative, while ensuring an additional linear-time property. We study the complexity of strategy synthesis in consumption MDPs with almost-sure Büchi objectives. We show that the problem can be solved in polynomial time. We implement our algorithm and show that it can efficiently solve CMDPs modelling real-world scenarios.
△ Less
Submitted 14 May, 2020;
originally announced May 2020.
-
Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
Authors:
Tomas Brazdil,
Krishnendu Chatterjee,
Petr Novotny,
Jiri Vahala
Abstract:
Markov decision processes (MDPs) are the defacto frame-work for sequential decision making in the presence ofstochastic uncertainty. A classical optimization criterion forMDPs is to maximize the expected discounted-sum pay-off, which ignores low probability catastrophic events withhighly negative impact on the system. On the other hand,risk-averse policies require the probability of undesirableeve…
▽ More
Markov decision processes (MDPs) are the defacto frame-work for sequential decision making in the presence ofstochastic uncertainty. A classical optimization criterion forMDPs is to maximize the expected discounted-sum pay-off, which ignores low probability catastrophic events withhighly negative impact on the system. On the other hand,risk-averse policies require the probability of undesirableevents to be below a given threshold, but they do not accountfor optimization of the expected payoff. We consider MDPswith discounted-sum payoff with failure states which repre-sent catastrophic outcomes. The objective of risk-constrainedplanning is to maximize the expected discounted-sum payoffamong risk-averse policies that ensure the probability to en-counter a failure state is below a desired threshold. Our maincontribution is an efficient risk-constrained planning algo-rithm that combines UCT-like search with a predictor learnedthrough interaction with the MDP (in the style of AlphaZero)and with a risk-constrained action selection via linear pro-gramming. We demonstrate the effectiveness of our approachwith experiments on classical MDPs from the literature, in-cluding benchmarks with an order of 10^6 states.
△ Less
Submitted 27 February, 2020;
originally announced February 2020.
-
Deciding Fast Termination for Probabilistic VASS with Nondeterminism
Authors:
Tomáš Brázdil,
Krishnendu Chatterjee,
Antonín Kučera,
Petr Novotný,
Dominik Velan
Abstract:
A probabilistic vector addition system with states (pVASS) is a finite state Markov process augmented with non-negative integer counters that can be incremented or decremented during each state transition, blocking any behaviour that would cause a counter to decrease below zero. The pVASS can be used as abstractions of probabilistic programs with many decidable properties. The use of pVASS as abst…
▽ More
A probabilistic vector addition system with states (pVASS) is a finite state Markov process augmented with non-negative integer counters that can be incremented or decremented during each state transition, blocking any behaviour that would cause a counter to decrease below zero. The pVASS can be used as abstractions of probabilistic programs with many decidable properties. The use of pVASS as abstractions requires the presence of nondeterminism in the model. In this paper, we develop techniques for checking fast termination of pVASS with nondeterminism.
That is, for every initial configuration of size n, we consider the worst expected number of transitions needed to reach a configuration with some counter negative (the expected termination time). We show that the problem whether the asymptotic expected termination time is linear is decidable in polynomial time for a certain natural class of pVASS with nondeterminism. Furthermore, we show the following dichotomy: if the asymptotic expected termination time is not linear, then it is at least quadratic, i.e., in $Ω(n^2)$.
△ Less
Submitted 25 July, 2019;
originally announced July 2019.
-
Strategy Representation by Decision Trees with Linear Classifiers
Authors:
Pranav Ashok,
Tomáš Brázdil,
Krishnendu Chatterjee,
Jan Křetínský,
Christoph H. Lampert,
Viktor Toman
Abstract:
Graph games and Markov decision processes (MDPs) are standard models in reactive synthesis and verification of probabilistic systems with nondeterminism. The class of $ω$-regular winning conditions; e.g., safety, reachability, liveness, parity conditions; provides a robust and expressive specification formalism for properties that arise in analysis of reactive systems. The resolutions of nondeterm…
▽ More
Graph games and Markov decision processes (MDPs) are standard models in reactive synthesis and verification of probabilistic systems with nondeterminism. The class of $ω$-regular winning conditions; e.g., safety, reachability, liveness, parity conditions; provides a robust and expressive specification formalism for properties that arise in analysis of reactive systems. The resolutions of nondeterminism in games and MDPs are represented as strategies, and we consider succinct representation of such strategies. The decision-tree data structure from machine learning retains the flavor of decisions of strategies and allows entropy-based minimization to obtain succinct trees. However, in contrast to traditional machine-learning problems where small errors are allowed, for winning strategies in graph games and MDPs no error is allowed, and the decision tree must represent the entire strategy. In this work we propose decision trees with linear classifiers for representation of strategies in graph games and MDPs. We have implemented strategy representation using this data structure and we present experimental results for problems on graph games and MDPs, which show that this new data structure presents a much more efficient strategy representation as compared to standard decision trees.
△ Less
Submitted 27 June, 2019; v1 submitted 19 June, 2019;
originally announced June 2019.
-
Monte Carlo Tree Search for Verifying Reachability in Markov Decision Processes
Authors:
Pranav Ashok,
Tomáš Brázdil,
Jan Křetínský,
Ondřej Slámečka
Abstract:
The maximum reachability probabilities in a Markov decision process can be computed using value iteration (VI). Recently, simulation-based heuristic extensions of VI have been introduced, such as bounded real-time dynamic programming (BRTDP), which often manage to avoid explicit analysis of the whole state space while preserving guarantees on the computed result. In this paper, we introduce a new…
▽ More
The maximum reachability probabilities in a Markov decision process can be computed using value iteration (VI). Recently, simulation-based heuristic extensions of VI have been introduced, such as bounded real-time dynamic programming (BRTDP), which often manage to avoid explicit analysis of the whole state space while preserving guarantees on the computed result. In this paper, we introduce a new class of such heuristics, based on Monte Carlo tree search (MCTS), a technique celebrated in various machine-learning settings. We provide a spectrum of algorithms ranging from MCTS to BRTDP. We evaluate these techniques and show that for larger examples, where VI is no more applicable, our techniques are more broadly applicable than BRTDP with only a minor additional overhead.
△ Less
Submitted 10 September, 2018;
originally announced September 2018.
-
Synthesizing Efficient Solutions for Patrolling Problems in the Internet Environment
Authors:
Tomáš Brázdil,
Antonín Kučera,
Vojtěch Řehák
Abstract:
We propose an algorithm for constructing efficient patrolling strategies in the Internet environment, where the protected targets are nodes connected to the network and the patrollers are software agents capable of detecting/preventing undesirable activities on the nodes. The algorithm is based on a novel compositional principle designed for a special class of strategies, and it can quickly constr…
▽ More
We propose an algorithm for constructing efficient patrolling strategies in the Internet environment, where the protected targets are nodes connected to the network and the patrollers are software agents capable of detecting/preventing undesirable activities on the nodes. The algorithm is based on a novel compositional principle designed for a special class of strategies, and it can quickly construct (sub)optimal solutions even if the number of targets reaches hundreds of millions.
△ Less
Submitted 10 May, 2018; v1 submitted 8 May, 2018;
originally announced May 2018.
-
Efficient Algorithms for Asymptotic Bounds on Termination Time in VASS
Authors:
Tomáš Brázdil,
Krishnendu Chatterjee,
Antonín Kučera,
Petr Novotný,
Dominik Velan,
Florian Zuleger
Abstract:
Vector Addition Systems with States (VASS) provide a well-known and fundamental model for the analysis of concurrent processes, parameterized systems, and are also used as abstract models of programs in resource bound analysis. In this paper we study the problem of obtaining asymptotic bounds on the termination time of a given VASS. In particular, we focus on the practically important case of obta…
▽ More
Vector Addition Systems with States (VASS) provide a well-known and fundamental model for the analysis of concurrent processes, parameterized systems, and are also used as abstract models of programs in resource bound analysis. In this paper we study the problem of obtaining asymptotic bounds on the termination time of a given VASS. In particular, we focus on the practically important case of obtaining polynomial bounds on termination time. Our main contributions are as follows: First, we present a polynomial-time algorithm for deciding whether a given VASS has a linear asymptotic complexity. We also show that if the complexity of a VASS is not linear, it is at least quadratic. Second, we classify VASS according to quantitative properties of their cycles. We show that certain singularities in these properties are the key reason for non-polynomial asymptotic complexity of VASS. In absence of singularities, we show that the asymptotic complexity is always polynomial and of the form $Θ(n^k)$, for some integer $k\leq d$, where $d$ is the dimension of the VASS. We present a polynomial-time algorithm computing the optimal $k$. For general VASS, the same algorithm, which is based on a complete technique for the construction of ranking functions in VASS, produces a valid lower bound, i.e., a $k$ such that the termination complexity is $Ω(n^k)$. Our results are based on new insights into the geometry of VASS dynamics, which hold the potential for further applicability to VASS analysis.
△ Less
Submitted 29 April, 2018;
originally announced April 2018.
-
Strategy Representation by Decision Trees in Reactive Synthesis
Authors:
Tomáš Brázdil,
Krishnendu Chatterjee,
Jan Křetínský,
Viktor Toman
Abstract:
Graph games played by two players over finite-state graphs are central in many problems in computer science. In particular, graph games with $ω$-regular winning conditions, specified as parity objectives, which can express properties such as safety, liveness, fairness, are the basic framework for verification and synthesis of reactive systems. The decisions for a player at various states of the gr…
▽ More
Graph games played by two players over finite-state graphs are central in many problems in computer science. In particular, graph games with $ω$-regular winning conditions, specified as parity objectives, which can express properties such as safety, liveness, fairness, are the basic framework for verification and synthesis of reactive systems. The decisions for a player at various states of the graph game are represented as strategies. While the algorithmic problem for solving graph games with parity objectives has been widely studied, the most prominent data-structure for strategy representation in graph games has been binary decision diagrams (BDDs). However, due to the bit-level representation, BDDs do not retain the inherent flavor of the decisions of strategies, and are notoriously hard to minimize to obtain succinct representation. In this work we propose decision trees for strategy representation in graph games. Decision trees retain the flavor of decisions of strategies and allow entropy-based minimization to obtain succinct trees. However, decision trees work in settings (e.g., probabilistic models) where errors are allowed, and overfitting of data is typically avoided. In contrast, for strategies in graph games no error is allowed, and the decision tree must represent the entire strategy. We develop new techniques to extend decision trees to overcome the above obstacles, while retaining the entropy-based techniques to obtain succinct trees. We have implemented our techniques to extend the existing decision tree solvers. We present experimental results for problems in reactive synthesis to show that decision trees provide a much more efficient data-structure for strategy representation as compared to BDDs.
△ Less
Submitted 19 March, 2018; v1 submitted 2 February, 2018;
originally announced February 2018.
-
Efficient Algorithms for Checking Fast Termination in VASS
Authors:
Tomáš Brázdil,
Krishnendu Chatterjee,
Antonín Kučera,
Petr Novotný,
Dominik Velan
Abstract:
Vector Addition Systems with States (VASS) consists of a finite state space equipped with d counters, where in each transition every counter is incremented, decremented, or left unchanged. VASS provide a fundamental model for analysis of concurrent processes, parametrized systems, and they are also used as abstract models for programs for bounds analysis. While termination is the basic liveness pr…
▽ More
Vector Addition Systems with States (VASS) consists of a finite state space equipped with d counters, where in each transition every counter is incremented, decremented, or left unchanged. VASS provide a fundamental model for analysis of concurrent processes, parametrized systems, and they are also used as abstract models for programs for bounds analysis. While termination is the basic liveness property that asks the qualitative question of whether a given model always terminates or not, the more general quantitative question asks for bounds on the number of steps to termination. In the realm of quantitative bounds a fundamental problem is to obtain asymptotic bounds on termination time. Large asymptotic bounds such as exponential or higher already suggest that either there is some error in modeling, or the model is not useful in practice. Hence we focus on polynomial asymptotic bounds for VASS. While some well-known approaches (e.g., lexicographic ranking functions) are neither sound nor complete with respect to polynomial bounds, other approaches only present sound methods for upper bounds. In this work our main contributions are as follows: First, for linear asymptotic bounds we present a sound and complete method for VASS, and moreover, our algorithm runs in polynomial time. Second, we classify VASS according the normals of the vectors of the cycles. We show that singularities in the normal are the key reason for asymptotic bounds such as exponential and non-elementary for VASS. In absence of singularities, we show that the asymptotic complexity bound is always polynomial and of the form $Θ(n^k)$, for some k $\leq$ d. We present an algorithm, with time complexity polynomial in the size of the VASS and exponential in dimension d, to compute the optimal k.
△ Less
Submitted 29 August, 2017;
originally announced August 2017.
-
Optimizing the Expected Mean Payoff in Energy Markov Decision Processes
Authors:
Tomáš Brázdil,
Antonín Kučera,
Petr Novotný
Abstract:
Energy Markov Decision Processes (EMDPs) are finite-state Markov decision processes where each transition is assigned an integer counter update and a rational payoff. An EMDP configuration is a pair s(n), where s is a control state and n is the current counter value. The configurations are changed by performing transitions in the standard way. We consider the problem of computing a safe strategy (…
▽ More
Energy Markov Decision Processes (EMDPs) are finite-state Markov decision processes where each transition is assigned an integer counter update and a rational payoff. An EMDP configuration is a pair s(n), where s is a control state and n is the current counter value. The configurations are changed by performing transitions in the standard way. We consider the problem of computing a safe strategy (i.e., a strategy that keeps the counter non-negative) which maximizes the expected mean payoff.
△ Less
Submitted 3 July, 2016;
originally announced July 2016.
-
Stability in Graphs and Games
Authors:
Tomáš Brázdil,
Vojtěch Forejt,
Antonín Kučera,
Petr Novotný
Abstract:
We study graphs and two-player games in which rewards are assigned to states, and the goal of the players is to satisfy or dissatisfy certain property of the generated outcome, given as a mean payoff property. Since the notion of mean-payoff does not reflect possible fluctuations from the mean-payoff along a run, we propose definitions and algorithms for capturing the stability of the system, and…
▽ More
We study graphs and two-player games in which rewards are assigned to states, and the goal of the players is to satisfy or dissatisfy certain property of the generated outcome, given as a mean payoff property. Since the notion of mean-payoff does not reflect possible fluctuations from the mean-payoff along a run, we propose definitions and algorithms for capturing the stability of the system, and give algorithms for deciding if a given mean payoff and stability objective can be ensured in the system
△ Less
Submitted 21 April, 2016;
originally announced April 2016.
-
Stochastic Shortest Path with Energy Constraints in POMDPs
Authors:
Tomáš Brázdil,
Krishnendu Chatterjee,
Martin Chmelík,
Anchit Gupta,
Petr Novotný
Abstract:
We consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend the traditional framework of POMDPs to model energy consumption, which represents a hard cons…
▽ More
We consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend the traditional framework of POMDPs to model energy consumption, which represents a hard constraint. The energy levels may increase and decrease with transitions, and the hard constraint requires that the energy level must remain positive in all steps till the target is reached. First, we present a novel algorithm for solving POMDPs with energy levels, develo** on existing POMDP solvers and using RTDP as its main method. Our second contribution is related to policy representation. For larger POMDP instances the policies computed by existing solvers are too large to be understandable. We present an automated procedure based on machine learning techniques that automatically extracts important decisions of the policy allowing us to compute succinct human readable policies. Finally, we show experimentally that our algorithm performs well and computes succinct policies on a number of POMDP instances from the literature that were naturally enhanced with energy levels.
△ Less
Submitted 11 May, 2016; v1 submitted 24 February, 2016;
originally announced February 2016.
-
Strategy Synthesis in Adversarial Patrolling Games
Authors:
Tomáš Brázdil,
Petr Hliněný,
Antonín Kučera,
Vojtěch Řehák,
Matúš Abaffy
Abstract:
Patrolling is one of the central problems in operational security. Formally, a patrolling problem is specified by a set $U$ of nodes (admissible defender's positions), a set $T \subseteq U$ of vulnerable targets, an admissible defender's moves over $U$, and a function which to every target assigns the time needed to complete an intrusion at it. The goal is to design an optimal strategy for a defen…
▽ More
Patrolling is one of the central problems in operational security. Formally, a patrolling problem is specified by a set $U$ of nodes (admissible defender's positions), a set $T \subseteq U$ of vulnerable targets, an admissible defender's moves over $U$, and a function which to every target assigns the time needed to complete an intrusion at it. The goal is to design an optimal strategy for a defender who is moving from node to node and aims at detecting possible intrusions at the targets. The goal of the attacker is to maximize the probability of a successful attack. We assume that the attacker is adversarial, i.e., he knows the strategy of the defender and can observe her moves.
We prove that the defender has an optimal strategy for every patrolling problem. Further, we show that for every $\varepsilon$ > 0, there exists a finite-memory $\varepsilon$-optimal strategy for the defender constructible in exponential time, and we observe that such a strategy cannot be computed in polynomial time unless P=NP.
Then we focus ourselves to unrestricted defender's moves. Here, a patrolling problem is fully determined by its signature, the number of targets of each attack length. We bound the maximal probability of successfully defended attacks. Then, we introduce a decomposition method which allows to split a given patrolling problem $G$ into smaller subproblems and construct a defender's strategy for $G$ by "composing" the strategies constructed for these subproblems.
Finally, for patrolling problems with $T = U$ and a well-formed signature, we give an exact classification of all sufficiently connected environments where the defender can achieve the same value as in the fully connected uniform environment. This result is useful for designing "good" environments where the defender can act optimally.
△ Less
Submitted 13 July, 2015;
originally announced July 2015.
-
Long-Run Average Behaviour of Probabilistic Vector Addition Systems
Authors:
Tomas Brazdil,
Stefan Kiefer,
Antonin Kucera,
Petr Novotny
Abstract:
We study the pattern frequency vector for runs in probabilistic Vector Addition Systems with States (pVASS). Intuitively, each configuration of a given pVASS is assigned one of finitely many patterns, and every run can thus be seen as an infinite sequence of these patterns. The pattern frequency vector assigns to each run the limit of pattern frequencies computed for longer and longer prefixes of…
▽ More
We study the pattern frequency vector for runs in probabilistic Vector Addition Systems with States (pVASS). Intuitively, each configuration of a given pVASS is assigned one of finitely many patterns, and every run can thus be seen as an infinite sequence of these patterns. The pattern frequency vector assigns to each run the limit of pattern frequencies computed for longer and longer prefixes of the run. If the limit does not exist, then the vector is undefined. We show that for one-counter pVASS, the pattern frequency vector is defined and takes only finitely many values for almost all runs. Further, these values and their associated probabilities can be approximated up to an arbitrarily small relative error in polynomial time. For stable two-counter pVASS, we show the same result, but we do not provide any upper complexity bound. As a byproduct of our study, we discover counterexamples falsifying some classical results about stochastic Petri nets published in the~80s.
△ Less
Submitted 19 June, 2015; v1 submitted 11 May, 2015;
originally announced May 2015.
-
Counterexample Explanation by Learning Small Strategies in Markov Decision Processes
Authors:
Tomáš Brázdil,
Krishnendu Chatterjee,
Martin Chmelík,
Andreas Fellner,
Jan Křetínský
Abstract:
While for deterministic systems, a counterexample to a property can simply be an error trace, counterexamples in probabilistic systems are necessarily more complex. For instance, a set of erroneous traces with a sufficient cumulative probability mass can be used. Since these are too large objects to understand and manipulate, compact representations such as subchains have been considered. In the c…
▽ More
While for deterministic systems, a counterexample to a property can simply be an error trace, counterexamples in probabilistic systems are necessarily more complex. For instance, a set of erroneous traces with a sufficient cumulative probability mass can be used. Since these are too large objects to understand and manipulate, compact representations such as subchains have been considered. In the case of probabilistic systems with non-determinism, the situation is even more complex. While a subchain for a given strategy (or scheduler, resolving non-determinism) is a straightforward choice, we take a different approach. Instead, we focus on the strategy - which can be a counterexample to violation of or a witness of satisfaction of a property - itself, and extract the most important decisions it makes, and present its succinct representation. The key tools we employ to achieve this are (1) introducing a concept of importance of a state w.r.t. the strategy, and (2) learning using decision trees. There are three main consequent advantages of our approach. Firstly, it exploits the quantitative information on states, stressing the more important decisions. Secondly, it leads to a greater variability and degree of freedom in representing the strategies. Thirdly, the representation uses a self-explanatory data structure. In summary, our approach produces more succinct and more explainable strategies, as opposed to e.g. binary decision diagrams. Finally, our experimental results show that we can extract several rules describing the strategy even for very large systems that do not fit in memory, and based on the rules explain the erroneous behaviour.
△ Less
Submitted 10 February, 2015;
originally announced February 2015.
-
MultiGain: A controller synthesis tool for MDPs with multiple mean-payoff objectives
Authors:
Tomáš Brázdil,
Krishnendu Chatterjee,
Vojtěch Forejt,
Antonín Kučera
Abstract:
We present MultiGain, a tool to synthesize strategies for Markov decision processes (MDPs) with multiple mean-payoff objectives. Our models are described in PRISM, and our tool uses the existing interface and simulator of PRISM. Our tool extends PRISM by adding novel algorithms for multiple mean-payoff objectives, and also provides features such as (i)~generating strategies and exploring them for…
▽ More
We present MultiGain, a tool to synthesize strategies for Markov decision processes (MDPs) with multiple mean-payoff objectives. Our models are described in PRISM, and our tool uses the existing interface and simulator of PRISM. Our tool extends PRISM by adding novel algorithms for multiple mean-payoff objectives, and also provides features such as (i)~generating strategies and exploring them for simulation, and checking them with respect to other properties; and (ii)~generating an approximate Pareto curve for two mean-payoff objectives. In addition, we present a new practical algorithm for the analysis of MDPs with multiple mean-payoff objectives under memoryless strategies.
△ Less
Submitted 13 January, 2015;
originally announced January 2015.
-
Optimizing Performance of Continuous-Time Stochastic Systems using Timeout Synthesis
Authors:
Tomáš Brázdil,
Ľuboš Korenčiak,
Jan Krčál,
Petr Novotný,
Vojtěch Řehák
Abstract:
We consider parametric version of fixed-delay continuous-time Markov chains (or equivalently deterministic and stochastic Petri nets, DSPN) where fixed-delay transitions are specified by parameters, rather than concrete values. Our goal is to synthesize values of these parameters that, for a given cost function, minimise expected total cost incurred before reaching a given set of target states. We…
▽ More
We consider parametric version of fixed-delay continuous-time Markov chains (or equivalently deterministic and stochastic Petri nets, DSPN) where fixed-delay transitions are specified by parameters, rather than concrete values. Our goal is to synthesize values of these parameters that, for a given cost function, minimise expected total cost incurred before reaching a given set of target states. We show that under mild assumptions, optimal values of parameters can be effectively approximated using translation to a Markov decision process (MDP) whose actions correspond to discretized values of these parameters.
△ Less
Submitted 15 April, 2016; v1 submitted 17 July, 2014;
originally announced July 2014.
-
Verification of Markov Decision Processes using Learning Algorithms
Authors:
Tomáš Brázdil,
Krishnendu Chatterjee,
Martin Chmelík,
Vojtěch Forejt,
Jan Křetínský,
Marta Kwiatkowska,
David Parker,
Mateusz Ujma
Abstract:
We present a general framework for applying machine-learning algorithms to the verification of Markov decision processes (MDPs). The primary goal of these techniques is to improve performance by avoiding an exhaustive exploration of the state space. Our framework focuses on probabilistic reachability, which is a core property for verification, and is illustrated through two distinct instantiations…
▽ More
We present a general framework for applying machine-learning algorithms to the verification of Markov decision processes (MDPs). The primary goal of these techniques is to improve performance by avoiding an exhaustive exploration of the state space. Our framework focuses on probabilistic reachability, which is a core property for verification, and is illustrated through two distinct instantiations. The first assumes that full knowledge of the MDP is available, and performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP, and yields probabilistic guarantees, again in terms of both the lower and upper bounds, which provides efficient stop** criteria for the approximation. The latter is the first extension of statistical model-checking for unbounded properties in MDPs. In contrast with other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor assume any particular properties of the MDP. We also show how our techniques extend to LTL objectives. We present experimental results showing the performance of our framework on several examples.
△ Less
Submitted 30 March, 2015; v1 submitted 10 February, 2014;
originally announced February 2014.
-
Zero-Reachability in Probabilistic Multi-Counter Automata
Authors:
Tomáš Brázdil,
Stefan Kiefer,
Antonín Kučera,
Petr Novotný,
Joost-Pieter Katoen
Abstract:
We study the qualitative and quantitative zero-reachability problem in probabilistic multi-counter systems. We identify the undecidable variants of the problems, and then we concentrate on the remaining two cases. In the first case, when we are interested in the probability of all runs that visit zero in some counter, we show that the qualitative zero-reachability is decidable in time which is pol…
▽ More
We study the qualitative and quantitative zero-reachability problem in probabilistic multi-counter systems. We identify the undecidable variants of the problems, and then we concentrate on the remaining two cases. In the first case, when we are interested in the probability of all runs that visit zero in some counter, we show that the qualitative zero-reachability is decidable in time which is polynomial in the size of a given pMC and doubly exponential in the number of counters. Further, we show that the probability of all zero-reaching runs can be effectively approximated up to an arbitrarily small given error epsilon > 0 in time which is polynomial in log(epsilon), exponential in the size of a given pMC, and doubly exponential in the number of counters. In the second case, we are interested in the probability of all runs that visit zero in some counter different from the last counter. Here we show that the qualitative zero-reachability is decidable and SquareRootSum-hard, and the probability of all zero-reaching runs can be effectively approximated up to an arbitrarily small given error epsilon > 0 (these result applies to pMC satisfying a suitable technical condition that can be verified in polynomial time). The proof techniques invented in the second case allow to construct counterexamples for some classical results about ergodicity in stochastic Petri nets.
△ Less
Submitted 27 January, 2014;
originally announced January 2014.
-
Solvency Markov Decision Processes with Interest
Authors:
Tomáš Brázdil,
Taolue Chen,
Vojtěch Forejt,
Petr Novotný,
Aistis Simaitis
Abstract:
Solvency games, introduced by Berger et al., provide an abstract framework for modelling decisions of a risk-averse investor, whose goal is to avoid ever going broke. We study a new variant of this model, where, in addition to stochastic environment and fixed increments and decrements to the investor's wealth, we introduce interest, which is earned or paid on the current level of savings or debt,…
▽ More
Solvency games, introduced by Berger et al., provide an abstract framework for modelling decisions of a risk-averse investor, whose goal is to avoid ever going broke. We study a new variant of this model, where, in addition to stochastic environment and fixed increments and decrements to the investor's wealth, we introduce interest, which is earned or paid on the current level of savings or debt, respectively.
We study problems related to the minimum initial wealth sufficient to avoid bankruptcy (i.e. steady decrease of the wealth) with probability at least p. We present an exponential time algorithm which approximates this minimum initial wealth, and show that a polynomial time approximation is not possible unless P = NP. For the qualitative case, i.e. p=1, we show that the problem whether a given number is larger than or equal to the minimum initial wealth belongs to both NP and coNP, and show that a polynomial time algorithm would yield a polynomial time algorithm for mean-payoff games, existence of which is a longstanding open problem. We also identify some classes of solvency MDPs for which this problem is in P. In all above cases the algorithms also give corresponding bankruptcy avoiding strategies.
△ Less
Submitted 11 October, 2013;
originally announced October 2013.
-
Determinacy in Stochastic Games with Unbounded Payoff Functions
Authors:
Tomáš Brázdil,
Antonín Kučera,
Petr Novotný
Abstract:
We consider infinite-state turn-based stochastic games of two players, Box and Diamond, who aim at maximizing and minimizing the expected total reward accumulated along a run, respectively. Since the total accumulated reward is unbounded, the determinacy of such games cannot be deduced directly from Martin's determinacy result for Blackwell games. Nevertheless, we show that these games are determi…
▽ More
We consider infinite-state turn-based stochastic games of two players, Box and Diamond, who aim at maximizing and minimizing the expected total reward accumulated along a run, respectively. Since the total accumulated reward is unbounded, the determinacy of such games cannot be deduced directly from Martin's determinacy result for Blackwell games. Nevertheless, we show that these games are determined both for unrestricted (i.e., history-dependent and randomized) strategies and deterministic strategies, and the equilibrium value is the same. Further, we show that these games are generally not determined for memoryless strategies. Then, we consider a subclass of Diamond-finitely-branching games and show that they are determined for all of the considered strategy types, where the equilibrium value is always the same. We also examine the existence and type of (epsilon-)optimal strategies for both players.
△ Less
Submitted 8 August, 2012;
originally announced August 2012.
-
Minimizing Expected Termination Time in One-Counter Markov Decision Processes
Authors:
Tomáš Brázdil,
Antonín Kučera,
Petr Novotný,
Dominik Wojtczak
Abstract:
We consider the problem of computing the value and an optimal strategy for minimizing the expected termination time in one-counter Markov decision processes. Since the value may be irrational and an optimal strategy may be rather complicated, we concentrate on the problems of approximating the value up to a given error epsilon > 0 and computing a finite representation of an epsilon-optimal strateg…
▽ More
We consider the problem of computing the value and an optimal strategy for minimizing the expected termination time in one-counter Markov decision processes. Since the value may be irrational and an optimal strategy may be rather complicated, we concentrate on the problems of approximating the value up to a given error epsilon > 0 and computing a finite representation of an epsilon-optimal strategy. We show that these problems are solvable in exponential time for a given configuration, and we also show that they are computationally hard in the sense that a polynomial-time approximation algorithm cannot exist unless P=NP.
△ Less
Submitted 4 May, 2012;
originally announced May 2012.
-
Efficient Controller Synthesis for Consumption Games with Multiple Resource Types
Authors:
Tomáš Brázdil,
Krishnendu Chatterjee,
Antonín Kučera,
Petr Novotný
Abstract:
We introduce consumption games, a model for discrete interactive system with multiple resources that are consumed or reloaded independently. More precisely, a consumption game is a finite-state graph where each transition is labeled by a vector of resource updates, where every update is a non-positive number or omega. The omega updates model the reloading of a given resource. Each vertex belongs e…
▽ More
We introduce consumption games, a model for discrete interactive system with multiple resources that are consumed or reloaded independently. More precisely, a consumption game is a finite-state graph where each transition is labeled by a vector of resource updates, where every update is a non-positive number or omega. The omega updates model the reloading of a given resource. Each vertex belongs either to player \Box or player \Diamond, where the aim of player \Box is to play so that the resources are never exhausted. We consider several natural algorithmic problems about consumption games, and show that although these problems are computationally hard in general, they are solvable in polynomial time for every fixed number of resource types (i.e., the dimension of the update vectors).
△ Less
Submitted 31 May, 2012; v1 submitted 3 February, 2012;
originally announced February 2012.
-
Stabilization of Branching Queueing Networks
Authors:
Tomáš Brázdil,
Stefan Kiefer
Abstract:
Queueing networks are gaining attraction for the performance analysis of parallel computer systems. A Jackson network is a set of interconnected servers, where the completion of a job at server i may result in the creation of a new job for server j. We propose to extend Jackson networks by "branching" and by "control" features. Both extensions are new and substantially expand the modelling power o…
▽ More
Queueing networks are gaining attraction for the performance analysis of parallel computer systems. A Jackson network is a set of interconnected servers, where the completion of a job at server i may result in the creation of a new job for server j. We propose to extend Jackson networks by "branching" and by "control" features. Both extensions are new and substantially expand the modelling power of Jackson networks. On the other hand, the extensions raise computational questions, particularly concerning the stability of the networks, i.e, the ergodicity of the underlying Markov chain. We show for our extended model that it is decidable in polynomial time if there exists a controller that achieves stability. Moreover, if such a controller exists, one can efficiently compute a static randomized controller which stabilizes the network in a very strong sense; in particular, all moments of the queue sizes are finite.
△ Less
Submitted 5 December, 2011;
originally announced December 2011.
-
Fixed-delay Events in Generalized Semi-Markov Processes Revisited
Authors:
Tomáš Brázdil,
Jan Krčál,
Jan Křetínský,
Vojtěch Řehák
Abstract:
We study long run average behavior of generalized semi-Markov processes with both fixed-delay events as well as variable-delay events. We show that allowing two fixed-delay events and one variable-delay event may cause an unstable behavior of a GSMP. In particular, we show that a frequency of a given state may not be defined for almost all runs (or more generally, an invariant measure may not exis…
▽ More
We study long run average behavior of generalized semi-Markov processes with both fixed-delay events as well as variable-delay events. We show that allowing two fixed-delay events and one variable-delay event may cause an unstable behavior of a GSMP. In particular, we show that a frequency of a given state may not be defined for almost all runs (or more generally, an invariant measure may not exist). We use this observation to disprove several results from literature. Next we study GSMP with at most one fixed-delay event combined with an arbitrary number of variable-delay events. We prove that such a GSMP always possesses an invariant measure which means that the frequencies of states are always well defined and we provide algorithms for approximation of these frequencies. Additionally, we show that the positive results remain valid even if we allow an arbitrary number of reasonably restricted fixed-delay events.
△ Less
Submitted 12 September, 2011; v1 submitted 7 June, 2011;
originally announced June 2011.
-
Approximating the Termination Value of One-Counter MDPs and Stochastic Games
Authors:
Tomáš Brázdil,
Václav Brožek,
Kousha Etessami,
Antonín Kučera
Abstract:
One-counter MDPs (OC-MDPs) and one-counter simple stochastic games (OC-SSGs) are 1-player, and 2-player turn-based zero-sum, stochastic games played on the transition graph of classic one-counter automata (equivalently, pushdown automata with a 1-letter stack alphabet). A key objective for the analysis and verification of these games is the termination objective, where the players aim to maximize…
▽ More
One-counter MDPs (OC-MDPs) and one-counter simple stochastic games (OC-SSGs) are 1-player, and 2-player turn-based zero-sum, stochastic games played on the transition graph of classic one-counter automata (equivalently, pushdown automata with a 1-letter stack alphabet). A key objective for the analysis and verification of these games is the termination objective, where the players aim to maximize (minimize, respectively) the probability of hitting counter value 0, starting at a given control state and given counter value. Recently, we studied qualitative decision problems ("is the optimal termination value = 1?") for OC-MDPs (and OC-SSGs) and showed them to be decidable in P-time (in NP and coNP, respectively). However, quantitative decision and approximation problems ("is the optimal termination value ? p", or "approximate the termination value within epsilon") are far more challenging. This is so in part because optimal strategies may not exist, and because even when they do exist they can have a highly non-trivial structure. It thus remained open even whether any of these quantitative termination problems are computable. In this paper we show that all quantitative approximation problems for the termination value for OC-MDPs and OC-SSGs are computable. Specifically, given a OC-SSG, and given epsilon > 0, we can compute a value v that approximates the value of the OC-SSG termination game within additive error epsilon, and furthermore we can compute epsilon-optimal strategies for both players in the game. A key ingredient in our proofs is a subtle martingale, derived from solving certain LPs that we can associate with a maximizing OC-MDP. An application of Azuma's inequality on these martingales yields a computable bound for the "wealth" at which a "rich person's strategy" becomes epsilon-optimal for OC-MDPs.
△ Less
Submitted 20 July, 2011; v1 submitted 26 April, 2011;
originally announced April 2011.
-
Markov Decision Processes with Multiple Long-run Average Objectives
Authors:
Tomáš Brázdil,
Václav Brožek,
Krishnendu Chatterjee,
Vojtěch Forejt,
Antonín Kučera
Abstract:
We study Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) functions. We consider two different objectives, namely, expectation and satisfaction objectives. Given an MDP with k limit-average functions, in the expectation objective the goal is to maximize the expected limit-average value, and in the satisfaction objective the goal is to maximize the probability of runs s…
▽ More
We study Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) functions. We consider two different objectives, namely, expectation and satisfaction objectives. Given an MDP with k limit-average functions, in the expectation objective the goal is to maximize the expected limit-average value, and in the satisfaction objective the goal is to maximize the probability of runs such that the limit-average value stays above a given vector. We show that under the expectation objective, in contrast to the case of one limit-average function, both randomization and memory are necessary for strategies even for epsilon-approximation, and that finite-memory randomized strategies are sufficient for achieving Pareto optimal values. Under the satisfaction objective, in contrast to the case of one limit-average function, infinite memory is necessary for strategies achieving a specific value (i.e. randomized finite-memory strategies are not sufficient), whereas memoryless randomized strategies are sufficient for epsilon-approximation, for all epsilon>0. We further prove that the decision problems for both expectation and satisfaction objectives can be solved in polynomial time and the trade-off curve (Pareto curve) can be epsilon-approximated in time polynomial in the size of the MDP and 1/epsilon, and exponential in the number of limit-average functions, for all epsilon>0. Our analysis also reveals flaws in previous work for MDPs with multiple mean-payoff functions under the expectation objective, corrects the flaws, and allows us to obtain improved results.
△ Less
Submitted 12 February, 2014; v1 submitted 18 April, 2011;
originally announced April 2011.
-
Efficient Analysis of Probabilistic Programs with an Unbounded Counter
Authors:
Tomas Brazdil,
Stefan Kiefer,
Antonin Kucera
Abstract:
We show that a subclass of infinite-state probabilistic programs that can be modeled by probabilistic one-counter automata (pOC) admits an efficient quantitative analysis. In particular, we show that the expected termination time can be approximated up to an arbitrarily small relative error with polynomially many arithmetic operations, and the same holds for the probability of all runs that satisf…
▽ More
We show that a subclass of infinite-state probabilistic programs that can be modeled by probabilistic one-counter automata (pOC) admits an efficient quantitative analysis. In particular, we show that the expected termination time can be approximated up to an arbitrarily small relative error with polynomially many arithmetic operations, and the same holds for the probability of all runs that satisfy a given omega-regular property. Further, our results establish a powerful link between pOC and martingale theory, which leads to fundamental observations about quantitative properties of runs in pOC. In particular, we provide a "divergence gap theorem", which bounds a positive non-termination probability in pOC away from zero.
△ Less
Submitted 12 February, 2011;
originally announced February 2011.
-
Measuring Performance of Continuous-Time Stochastic Processes using Timed Automata
Authors:
Tomáš Brázdil,
Jan Krčál,
Jan Křetínský,
Antonín Kučera,
Vojtěch Řehák
Abstract:
We propose deterministic timed automata (DTA) as a model-independent language for specifying performance and dependability measures over continuous-time stochastic processes. Technically, these measures are defined as limit frequencies of locations (control states) of a DTA that observes computations of a given stochastic process. Then, we study the properties of DTA measures over semi-Markov proc…
▽ More
We propose deterministic timed automata (DTA) as a model-independent language for specifying performance and dependability measures over continuous-time stochastic processes. Technically, these measures are defined as limit frequencies of locations (control states) of a DTA that observes computations of a given stochastic process. Then, we study the properties of DTA measures over semi-Markov processes in greater detail. We show that DTA measures over semi-Markov processes are well-defined with probability one, and there are only finitely many values that can be assumed by these measures with positive probability. We also give an algorithm which approximates these values and the associated probabilities up to an arbitrarily small given precision. Thus, we obtain a general and effective framework for analysing DTA measures over semi-Markov processes.
△ Less
Submitted 21 January, 2011;
originally announced January 2011.
-
One-Counter Stochastic Games
Authors:
Tomáš Brázdil,
Václav Brožek,
Kousha Etessami
Abstract:
We study the computational complexity of basic decision problems for one-counter simple stochastic games (OC-SSGs), under various objectives. OC-SSGs are 2-player turn-based stochastic games played on the transition graph of classic one-counter automata. We study primarily the termination objective, where the goal of one player is to maximize the probability of reaching counter value 0, while the…
▽ More
We study the computational complexity of basic decision problems for one-counter simple stochastic games (OC-SSGs), under various objectives. OC-SSGs are 2-player turn-based stochastic games played on the transition graph of classic one-counter automata. We study primarily the termination objective, where the goal of one player is to maximize the probability of reaching counter value 0, while the other player wishes to avoid this. Partly motivated by the goal of understanding termination objectives, we also study certain "limit" and "long run average" reward objectives that are closely related to some well-studied objectives for stochastic games with rewards. Examples of problems we address include: does player 1 have a strategy to ensure that the counter eventually hits 0, i.e., terminates, almost surely, regardless of what player 2 does? Or that the liminf (or limsup) counter value equals infinity with a desired probability? Or that the long run average reward is >0 with desired probability? We show that the qualitative termination problem for OC-SSGs is in NP intersection coNP, and is in P-time for 1-player OC-SSGs, or equivalently for one-counter Markov Decision Processes (OC-MDPs). Moreover, we show that quantitative limit problems for OC-SSGs are in NP intersection coNP, and are in P-time for 1-player OC-MDPs. Both qualitative limit problems and qualitative termination problems for OC-SSGs are already at least as hard as Condon's quantitative decision problem for finite-state SSGs.
△ Less
Submitted 28 September, 2010;
originally announced September 2010.
-
Runtime Analysis of Probabilistic Programs with Unbounded Recursion
Authors:
Tomas Brazdil,
Stefan Kiefer,
Antonin Kucera,
Ivana Hutarova Varekova
Abstract:
We study termination time and recurrence time in programs with unbounded recursion, which are either randomized or operate on some statistically quantified inputs. As the underlying formal model for such programs we use probabilistic pushdown automata (pPDA) which are equivalent to probabilistic recursive state machines. We obtain tail bounds for the distribution of termination time for pPDA. We a…
▽ More
We study termination time and recurrence time in programs with unbounded recursion, which are either randomized or operate on some statistically quantified inputs. As the underlying formal model for such programs we use probabilistic pushdown automata (pPDA) which are equivalent to probabilistic recursive state machines. We obtain tail bounds for the distribution of termination time for pPDA. We also study the recurrence time for probabilistic recursive programs that are not supposed to terminate (such as system daemons, network servers, etc.). Typically, such programs react to certain requests generated by their environment, and hence operate in finite request-service cycles. We obtain bounds for the frequency of long request-service cycles.
△ Less
Submitted 31 May, 2012; v1 submitted 10 July, 2010;
originally announced July 2010.
-
Space-efficient scheduling of stochastically generated tasks
Authors:
Tomáš Brázdil,
Javier Esparza,
Stefan Kiefer,
Michael Luttenberger
Abstract:
We study the problem of scheduling tasks for execution by a processor when the tasks can stochastically generate new tasks. Tasks can be of different types, and each type has a fixed, known probability of generating other tasks. We present results on the random variable S^sigma modeling the maximal space needed by the processor to store the currently active tasks when acting under the scheduler si…
▽ More
We study the problem of scheduling tasks for execution by a processor when the tasks can stochastically generate new tasks. Tasks can be of different types, and each type has a fixed, known probability of generating other tasks. We present results on the random variable S^sigma modeling the maximal space needed by the processor to store the currently active tasks when acting under the scheduler sigma. We obtain tail bounds for the distribution of S^sigma for both offline and online schedulers, and investigate the expected value of S^sigma.
△ Less
Submitted 27 April, 2010; v1 submitted 24 April, 2010;
originally announced April 2010.
-
Qualitative Reachability in Stochastic BPA Games
Authors:
Tomáš Brázdil,
Václav Brožek,
Antonín Kučera,
Jan Obdržálek
Abstract:
We consider a class of infinite-state stochastic games generated by stateless pushdown automata (or, equivalently, 1-exit recursive state machines), where the winning objective is specified by a regular set of target configurations and a qualitative probability constraint `>0' or `=1'. The goal of one player is to maximize the probability of reaching the target set so that the constraint is sati…
▽ More
We consider a class of infinite-state stochastic games generated by stateless pushdown automata (or, equivalently, 1-exit recursive state machines), where the winning objective is specified by a regular set of target configurations and a qualitative probability constraint `>0' or `=1'. The goal of one player is to maximize the probability of reaching the target set so that the constraint is satisfied, while the other player aims at the opposite. We show that the winner in such games can be determined in PTIME for the `>0' constraint, and both in NP and coNP for the `=1' constraint. Further, we prove that the winning regions for both players are regular, and we design algorithms which compute the associated finite-state automata. Finally, we show that winning strategies can be synthesized effectively.
△ Less
Submitted 27 February, 2010;
originally announced March 2010.
-
Reachability Games on Extended Vector Addition Systems with States
Authors:
Tomas Brazdil,
Petr Jancar,
Antonin Kucera
Abstract:
We consider two-player turn-based games with zero-reachability and zero-safety objectives generated by extended vector addition systems with states. Although the problem of deciding the winner in such games is undecidable in general, we identify several decidable and even tractable subcases of this problem obtained by restricting the number of counters and/or the sets of target configurations.
We consider two-player turn-based games with zero-reachability and zero-safety objectives generated by extended vector addition systems with states. Although the problem of deciding the winner in such games is undecidable in general, we identify several decidable and even tractable subcases of this problem obtained by restricting the number of counters and/or the sets of target configurations.
△ Less
Submitted 12 February, 2010;
originally announced February 2010.
-
One-Counter Markov Decision Processes
Authors:
Tomáš Brázdil,
Václav Brožek,
Kousha Etessami,
Antonín Kučera,
Dominik Wojtczak
Abstract:
We study the computational complexity of central analysis problems for One-Counter Markov Decision Processes (OC-MDPs), a class of finitely-presented, countable-state MDPs. OC-MDPs are equivalent to a controlled extension of (discrete-time) Quasi-Birth-Death processes (QBDs), a stochastic model studied heavily in queueing theory and applied probability. They can thus be viewed as a natural ``adv…
▽ More
We study the computational complexity of central analysis problems for One-Counter Markov Decision Processes (OC-MDPs), a class of finitely-presented, countable-state MDPs. OC-MDPs are equivalent to a controlled extension of (discrete-time) Quasi-Birth-Death processes (QBDs), a stochastic model studied heavily in queueing theory and applied probability. They can thus be viewed as a natural ``adversarial'' version of a classic stochastic model. Alternatively, they can also be viewed as a natural probabilistic/controlled extension of classic one-counter automata. OC-MDPs also subsume (as a very restricted special case) a recently studied MDP model called ``solvency games'' that model a risk-averse gambling scenario. Basic computational questions about these models include ``termination'' questions and ``limit'' questions, such as the following: does the controller have a ``strategy'' (or ``policy'') to ensure that the counter (which may for example count the number of jobs in the queue) will hit value 0 (the empty queue) almost surely (a.s.)? Or that it will have infinite limsup value, a.s.? Or, that it will hit value 0 in selected terminal states, a.s.? Or, in case these are not satisfied a.s., compute the maximum (supremum) such probability over all strategies. We provide new upper and lower bounds on the complexity of such problems. For some of them we present a polynomial-time algorithm, whereas for others we show PSPACE- or BH-hardness and give an EXPTIME upper bound. Our upper bounds combine techniques from the theory of MDP reward models, the theory of random walks, and a variety of automata-theoretic methods.
△ Less
Submitted 11 September, 2009; v1 submitted 16 April, 2009;
originally announced April 2009.