-
Symblicit algorithms for optimal strategy synthesis in monotonic Markov decision processes
Authors:
Aaron Bohy,
Véronique Bruyère,
Jean-François Raskin
Abstract:
When treating Markov decision processes (MDPs) with large state spaces, using explicit representations quickly becomes unfeasible. Lately, Wimmer et al. have proposed a so-called symblicit algorithm for the synthesis of optimal strategies in MDPs, in the quantitative setting of expected mean-payoff. This algorithm, based on the strategy iteration algorithm of Howard and Veinott, efficiently combin…
▽ More
When treating Markov decision processes (MDPs) with large state spaces, using explicit representations quickly becomes unfeasible. Lately, Wimmer et al. have proposed a so-called symblicit algorithm for the synthesis of optimal strategies in MDPs, in the quantitative setting of expected mean-payoff. This algorithm, based on the strategy iteration algorithm of Howard and Veinott, efficiently combines symbolic and explicit data structures, and uses binary decision diagrams as symbolic representation. The aim of this paper is to show that the new data structure of pseudo-antichains (an extension of antichains) provides another interesting alternative, especially for the class of monotonic MDPs. We design efficient pseudo-antichain based symblicit algorithms (with open source implementations) for two quantitative settings: the expected mean-payoff and the stochastic shortest path. For two practical applications coming from automated planning and LTL synthesis, we report promising experimental results w.r.t. both the run time and the memory consumption.
△ Less
Submitted 21 July, 2014;
originally announced July 2014.
-
Symblicit algorithms for optimal strategy synthesis in monotonic Markov decision processes (extended version)
Authors:
Aaron Bohy,
Véronique Bruyère,
Jean-François Raskin
Abstract:
When treating Markov decision processes (MDPs) with large state spaces, using explicit representations quickly becomes unfeasible. Lately, Wimmer et al. have proposed a so-called symblicit algorithm for the synthesis of optimal strategies in MDPs, in the quantitative setting of expected mean-payoff. This algorithm, based on the strategy iteration algorithm of Howard and Veinott, efficiently combin…
▽ More
When treating Markov decision processes (MDPs) with large state spaces, using explicit representations quickly becomes unfeasible. Lately, Wimmer et al. have proposed a so-called symblicit algorithm for the synthesis of optimal strategies in MDPs, in the quantitative setting of expected mean-payoff. This algorithm, based on the strategy iteration algorithm of Howard and Veinott, efficiently combines symbolic and explicit data structures, and uses binary decision diagrams as symbolic representation. The aim of this paper is to show that the new data structure of pseudo-antichains (an extension of antichains) provides another interesting alternative, especially for the class of monotonic MDPs. We design efficient pseudo-antichain based symblicit algorithms (with open source implementations) for two quantitative settings: the expected mean-payoff and the stochastic shortest path. For two practical applications coming from automated planning and LTL synthesis, we report promising experimental results w.r.t. both the run time and the memory consumption.
△ Less
Submitted 20 June, 2014; v1 submitted 5 February, 2014;
originally announced February 2014.
-
Synthesis from LTL Specifications with Mean-Payoff Objectives
Authors:
Aaron Bohy,
Véronique Bruyère,
Emmanuel Filiot,
Jean-François Raskin
Abstract:
The classical LTL synthesis problem is purely qualitative: the given LTL specification is realized or not by a reactive system. LTL is not expressive enough to formalize the correctness of reactive systems with respect to some quantitative aspects. This paper extends the qualitative LTL synthesis setting to a quantitative setting. The alphabet of actions is extended with a weight function ranging…
▽ More
The classical LTL synthesis problem is purely qualitative: the given LTL specification is realized or not by a reactive system. LTL is not expressive enough to formalize the correctness of reactive systems with respect to some quantitative aspects. This paper extends the qualitative LTL synthesis setting to a quantitative setting. The alphabet of actions is extended with a weight function ranging over the rational numbers. The value of an infinite word is the mean-payoff of the weights of its letters. The synthesis problem then amounts to automatically construct (if possible) a reactive system whose executions all satisfy a given LTL formula and have mean-payoff values greater than or equal to some given threshold. The latter problem is called LTLMP synthesis and the LTLMP realizability problem asks to check whether such a system exists. We first show that LTLMP realizability is not more difficult than LTL realizability: it is 2ExpTime-Complete. This is done by reduction to two-player mean-payoff parity games. While infinite memory strategies are required to realize LTLMP specifications in general, we show that epsilon-optimality can be obtained with finite memory strategies, for any epsilon > 0. To obtain an efficient algorithm in practice, we define a Safraless procedure to decide whether there exists a finite-memory strategy that realizes a given specification for some given threshold. This procedure is based on a reduction to two-player energy safety games which are in turn reduced to safety games. Finally, we show that those safety games can be solved efficiently by exploiting the structure of their state spaces and by using antichains as a symbolic data-structure. All our results extend to multi-dimensional weights. We have implemented an antichain-based procedure and we report on some promising experimental results.
△ Less
Submitted 9 January, 2013; v1 submitted 11 October, 2012;
originally announced October 2012.