Search | arXiv e-print repository

Mungojerrie: Reinforcement Learning of Linear-Time Objectives

Authors: Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak

Abstract: Reinforcement learning synthesizes controllers without prior knowledge of the system. At each timestep, a reward is given. The controllers optimize the discounted sum of these rewards. Applying this class of algorithms requires designing a reward scheme, which is typically done manually. The designer must ensure that their intent is accurately captured. This may not be trivial, and is prone to err… ▽ More Reinforcement learning synthesizes controllers without prior knowledge of the system. At each timestep, a reward is given. The controllers optimize the discounted sum of these rewards. Applying this class of algorithms requires designing a reward scheme, which is typically done manually. The designer must ensure that their intent is accurately captured. This may not be trivial, and is prone to error. An alternative to this manual programming, akin to programming directly in assembly, is to specify the objective in a formal language and have it "compiled" to a reward scheme. Mungojerrie (https://plv.colorado.edu/mungojerrie/) is a tool for testing reward schemes for $ω$-regular objectives on finite models. The tool contains reinforcement learning algorithms and a probabilistic model checker. Mungojerrie supports models specified in PRISM and $ω$-automata specified in HOA. △ Less

Submitted 17 June, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: Mungojerrie is available at https://plv.colorado.edu/mungojerrie/

arXiv:2106.06777 [pdf, other]

Model-free Reinforcement Learning for Branching Markov Decision Processes

Authors: Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak

Abstract: We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMCs is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of a each entity of the same type follows the s… ▽ More We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMCs is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of a each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach. △ Less

Submitted 12 June, 2021; originally announced June 2021.

Comments: to appear in CAV 2021

arXiv:1706.09886 [pdf, ps, other]

Optimal Control for Multi-Mode Systems with Discrete Costs

Authors: Mahmoud A. A. Mousa, Sven Schewe, Dominik Wojtczak

Abstract: This paper studies optimal time-bounded control in multi-mode systems with discrete costs. Multi-mode systems are an important subclass of linear hybrid systems, in which there are no guards on transitions and all invariants are global. Each state has a continuous cost attached to it, which is linear in the sojourn time, while a discrete cost is attached to each transition taken. We show that an o… ▽ More This paper studies optimal time-bounded control in multi-mode systems with discrete costs. Multi-mode systems are an important subclass of linear hybrid systems, in which there are no guards on transitions and all invariants are global. Each state has a continuous cost attached to it, which is linear in the sojourn time, while a discrete cost is attached to each transition taken. We show that an optimal control for this model can be computed in NEXPTIME and approximated in PSPACE. We also show that the one-dimensional case is simpler: although the problem is NP-complete (and in LOGSPACE for an infinite time horizon), we develop an FPTAS for finding an approximate solution. △ Less

Submitted 29 June, 2017; originally announced June 2017.

Comments: extended version of a FORMATS 2017 paper

arXiv:1302.4406 [pdf, other]

Optimal Scheduling for Linear-Rate Multi-Mode Systems

Authors: Dominik Wojtczak

Abstract: Linear-Rate Multi-Mode Systems is a model that can be seen both as a subclass of switched linear systems with imposed global safety constraints and as hybrid automata with no guards on transitions. We study the existence and design of a controller for this model that keeps the state of the system within a given safe set for the whole time. A sufficient and necessary condition is given for such a c… ▽ More Linear-Rate Multi-Mode Systems is a model that can be seen both as a subclass of switched linear systems with imposed global safety constraints and as hybrid automata with no guards on transitions. We study the existence and design of a controller for this model that keeps the state of the system within a given safe set for the whole time. A sufficient and necessary condition is given for such a controller to exist as well as an algorithm that finds one in polynomial time. We further generalise the model by adding costs on modes and present an algorithm that constructs a safe controller which minimises the peak cost, the average-cost or any cost expressed as a weighted sum of these two. Finally, we present numerical simulation results based on our implementation of these algorithms. △ Less

Submitted 18 February, 2013; originally announced February 2013.

Showing 1–4 of 4 results for author: Wojtczak, D