Showing 1–1 of 1 results for author: Wajid, M S

Search v0.5.6 released 2020-02-24

arXiv:2211.15602

cs.DM cs.CC math.CO

Upper Bounds for All and Max-gain Policy Iteration Algorithms on Deterministic MDPs

Authors: Ritesh Goenka, Eashan Gupta, Sushil Khyalia, Pratyush Agarwal, Mulinti Shaik Wajid, Shivaram Kalyanakrishnan

Abstract: Policy Iteration (PI) is a widely used family of algorithms to compute optimal policies for Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on Deterministic MDPs (DMDPs): the class of MDPs in which every state-action pair has a unique next state. Our results include a non-trivial upper bound that applies to the entire family of PI algorithms; another to all "max-gain" switching variants; and affirmation that a conjecture regarding Howard's PI on MDPs is true for DMDPs. Our analysis is based on certain graph-theoretic results, which may be of independent interest.

Submitted 8 October, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

Comments: Added new bounds for two state MDPs

MSC Class: 90C40 (Primary) 68Q25; 05C35; 05C38 (Secondary)
