Showing 1–6 of 6 results for author: Nota, C

Searching in archive cs.
  1. arXiv:2212.14066

    cs.LG cs.AI

    On the Convergence of Discounted Policy Gradient Methods

    Authors: Chris Nota

    Abstract: Many popular policy gradient methods for reinforcement learning follow a biased approximation of the policy gradient known as the discounted approximation. While it has been shown that the discounted approximation of the policy gradient is not the gradient of any objective function, little else is known about its convergence behavior or properties. In this paper, we show that if the discounted app…

    Submitted 9 January, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

    Comments: 10 pages

  2. arXiv:2001.01577

    cs.AI

    Learning Reusable Options for Multi-Task Reinforcement Learning

    Authors: Francisco M. Garcia, Chris Nota, Philip S. Thomas

    Abstract: Reinforcement learning (RL) has become an increasingly active area of research in recent years. Although there are many algorithms that allow an agent to solve tasks efficiently, they often ignore the possibility that prior experience related to the task at hand might be available. For many practical applications, it might be unfeasible for an agent to learn how to solve a task from scratch, given…

    Submitted 6 January, 2020; originally announced January 2020.

    Comments: 15 pages, 7 figures, pre-print

  3. arXiv:1906.07073

    cs.LG stat.ML

    Is the Policy Gradient a Gradient?

    Authors: Chris Nota, Philip S. Thomas

    Abstract: The policy gradient theorem describes the gradient of the expected discounted return with respect to an agent's policy parameters. However, most policy gradient methods drop the discount factor from the state distribution and therefore do not optimize the discounted objective. What do they optimize instead? This has been an open question for several years, and this lack of theoretical clarity has…

    Submitted 27 February, 2020; v1 submitted 17 June, 2019; originally announced June 2019.

    Comments: 8 pages, 3 figures

  4. arXiv:1906.03063

    cs.LG stat.ML

    Classical Policy Gradient: Preserving Bellman's Principle of Optimality

    Authors: Philip S. Thomas, Scott M. Jordan, Yash Chandak, Chris Nota, James Kostas

    Abstract: We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.

    Submitted 6 June, 2019; originally announced June 2019.

    Comments: 1 page, 0 figures

  5. arXiv:1906.01770

    cs.LG stat.ML

    Lifelong Learning with a Changing Action Set

    Authors: Yash Chandak, Georgios Theocharous, Chris Nota, Philip S. Thomas

    Abstract: In many real-world sequential decision-making problems, the number of available actions (decisions) can vary over time. While problems like catastrophic forgetting, changing transition dynamics, changing reward functions, etc. have been well-studied in the lifelong learning literature, the setting where the action set changes remains unaddressed. In this paper, we present an algorithm that autono…

    Submitted 10 May, 2020; v1 submitted 4 June, 2019; originally announced June 2019.

    Comments: Thirty-fourth Conference on Artificial Intelligence (AAAI 2020) [Outstanding Student Paper Honorable Mention]

  6. arXiv:1902.05650

    cs.LG stat.ML

    Asynchronous Coagent Networks

    Authors: James E. Kostas, Chris Nota, Philip S. Thomas

    Abstract: Coagent policy gradient algorithms (CPGAs) are reinforcement learning algorithms for training a class of stochastic neural networks called coagent networks. In this work, we prove that CPGAs converge to locally optimal policies. Additionally, we extend prior theory to encompass asynchronous and recurrent coagent networks. These extensions facilitate the straightforward design and analysis of hiera…

    Submitted 10 August, 2020; v1 submitted 14 February, 2019; originally announced February 2019.

    Comments: Updated version