Showing 1–4 of 4 results for author: Dewanto, V

  1. arXiv:2204.04324  [pdf, other]

    cs.LG

    Approximate discounting-free policy evaluation from transient and recurrent states

    Authors: Vektor Dewanto, Marcus Gallagher

    Abstract: In order to distinguish policies that prescribe good from bad actions in transient states, we need to evaluate the so-called bias of a policy from transient states. However, we observe that most (if not all) works in approximate discounting-free policy evaluation thus far are developed for estimating the bias solely from recurrent states. We therefore propose a system of approximators for the bias…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: 28 pages

  2. arXiv:2107.01348  [pdf, other]

    cs.LG cs.AI cs.RO eess.SY

    Examining average and discounted reward optimality criteria in reinforcement learning

    Authors: Vektor Dewanto, Marcus Gallagher

    Abstract: In reinforcement learning (RL), the goal is to obtain an optimal policy, for which the optimality criterion is fundamentally important. Two major optimality criteria are average and discounted rewards. While the latter is more popular, it is problematic to apply in environments without an inherent notion of discounting. This motivates us to revisit a) the progression of optimality criteria in dyna…

    Submitted 1 September, 2022; v1 submitted 3 July, 2021; originally announced July 2021.

    Comments: 23 pages, restructuring, adding more details

  3. arXiv:2105.13609  [pdf, other]

    cs.LG cs.AI eess.SY

    A nearly Blackwell-optimal policy gradient method

    Authors: Vektor Dewanto, Marcus Gallagher

    Abstract: For continuing environments, reinforcement learning (RL) methods commonly maximize the discounted reward criterion with discount factor close to 1 in order to approximate the average reward (the gain). However, such a criterion only considers the long-run steady-state performance, ignoring the transient behaviour in transient states. In this work, we develop a policy gradient method that optimizes…

    Submitted 3 July, 2022; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: 30 pages, major re-structuring

  4. arXiv:2010.08920  [pdf, ps, other]

    cs.LG cs.AI

    Average-reward model-free reinforcement learning: a systematic review and literature mapping

    Authors: Vektor Dewanto, George Dunn, Ali Eshragh, Marcus Gallagher, Fred Roosta

    Abstract: Reinforcement learning is an important part of artificial intelligence. In this paper, we review model-free reinforcement learning that utilizes the average reward optimality criterion in the infinite horizon setting. Motivated by the solo survey by Mahadevan (1996a), we provide an updated review of work in this area and extend it to cover policy-iteration and function approximation methods (in addit…

    Submitted 3 August, 2021; v1 submitted 18 October, 2020; originally announced October 2020.

    Comments: 36 pages, refined prelim and politer sections