Showing 1–13 of 13 results for author: Sutton, R S

Searching in archive stat.
  1. arXiv:2102.07686

    cs.LG cs.AI stat.ML

    Does the Adam Optimizer Exacerbate Catastrophic Forgetting?

    Authors: Dylan R. Ashley, Sina Ghiassian, Richard S. Sutton

    Abstract: Catastrophic forgetting remains a severe hindrance to the broad application of artificial neural networks (ANNs); however, it continues to be a poorly understood phenomenon. Despite the extensive amount of work on catastrophic forgetting, we argue that it is still unclear how exactly the phenomenon should be quantified, and, moreover, to what degree all of the choices we make when designing learni…

    Submitted 9 June, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

    Comments: 9 pages in main text + 3 pages of references + 16 pages of appendices, 6 figures in main text + 21 figures in appendices, 6 tables in appendices; source code available at https://github.com/dylanashley/catastrophic-forgetting/tree/arxiv

    ACM Class: I.2.6

  2. arXiv:1912.04002

    cs.LG stat.ML

    Learning Sparse Representations Incrementally in Deep Reinforcement Learning

    Authors: J. Fernando Hernandez-Garcia, Richard S. Sutton

    Abstract: Sparse representations have been shown to be useful in deep reinforcement learning for mitigating catastrophic interference and improving the performance of agents in terms of cumulative reward. Previous results were based on a two-step process where the representation was learned offline and the action-value function was learned online afterwards. In this paper, we investigate if it is possible to…

    Submitted 9 December, 2019; originally announced December 2019.

  3. arXiv:1904.01191

    cs.LG cs.AI stat.ML

    Planning with Expectation Models

    Authors: Yi Wan, Zaheer Abbas, Adam White, Martha White, Richard S. Sutton

    Abstract: Distribution and sample models are two popular model choices in model-based reinforcement learning (MBRL). However, learning these models can be intractable, particularly when the state and action spaces are large. Expectation models, on the other hand, are relatively easier to learn due to their compactness and have also been widely used for deterministic environments. For stochastic environments…

    Submitted 29 July, 2020; v1 submitted 1 April, 2019; originally announced April 2019.

  4. arXiv:1903.03252

    cs.LG cs.AI stat.ML

    Learning Feature Relevance Through Step Size Adaptation in Temporal-Difference Learning

    Authors: Alex Kearney, Vivek Veeriah, Jaden Travnik, Patrick M. Pilarski, Richard S. Sutton

    Abstract: There is a long history of using meta-learning as representation learning, specifically for determining the relevance of inputs. In this paper, we examine an instance of meta-learning in which feature relevance is learned by adapting step size parameters of stochastic gradient descent, building on a variety of prior work in stochastic approximation, machine learning, and artificial neural network…

    Submitted 7 March, 2019; originally announced March 2019.

  5. arXiv:1901.07510

    cs.LG stat.ML

    Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target

    Authors: J. Fernando Hernandez-Garcia, Richard S. Sutton

    Abstract: Multi-step methods such as Retrace($λ$) and $n$-step $Q$-learning have become a crucial component of modern deep reinforcement learning agents. These methods are often evaluated as a part of bigger architectures and their evaluations rarely include enough samples to draw statistically significant conclusions about their performance. This type of methodology makes it difficult to understand how par…

    Submitted 7 February, 2019; v1 submitted 22 January, 2019; originally announced January 2019.

  6. arXiv:1811.02597

    cs.LG cs.AI stat.ML

    Online Off-policy Prediction

    Authors: Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White

    Abstract: This paper investigates the problem of online prediction learning, where learning proceeds continuously as the agent interacts with an environment. The predictions made by the agent are contingent on a particular way of behaving, represented as a value function. However, the behavior used to select actions and generate the behavior data might be different from the one used to define the prediction…

    Submitted 6 November, 2018; originally announced November 2018.

    Comments: 68 pages

  7. arXiv:1807.01830

    cs.LG cs.AI stat.ML

    Per-decision Multi-step Temporal Difference Learning with Control Variates

    Authors: Kristopher De Asis, Richard S. Sutton

    Abstract: Multi-step temporal difference (TD) learning is an important approach in reinforcement learning, as it unifies one-step TD learning with Monte Carlo methods in a way where intermediate algorithms can outperform either extreme. These methods address a bias-variance trade-off between reliance on current estimates, which could be poor, and incorporating longer sampled reward sequences into the updates. Especi…

    Submitted 4 July, 2018; originally announced July 2018.

    Journal ref: (2018). In Conference on Uncertainty in Artificial Intelligence. http://auai.org/uai2018/proceedings/papers/282.pdf

  8. arXiv:1806.00540

    cs.LG cs.AI stat.ML

    Integrating Episodic Memory into a Reinforcement Learning Agent using Reservoir Sampling

    Authors: Kenny J. Young, Richard S. Sutton, Shuo Yang

    Abstract: Episodic memory is a psychology term which refers to the ability to recall specific events from the past. We suggest one advantage of this particular type of memory is the ability to easily assign credit to a specific state when remembered information is found to be useful. Inspired by this idea, and the increasing popularity of external memory mechanisms to handle long-term dependencies in deep l…

    Submitted 1 June, 2018; originally announced June 2018.

  9. arXiv:1805.07476

    cs.LG cs.AI stat.ML

    Two geometric input transformation methods for fast online reinforcement learning with neural nets

    Authors: Sina Ghiassian, Huizhen Yu, Banafsheh Rafiee, Richard S. Sutton

    Abstract: We apply neural nets with ReLU gates in online reinforcement learning. Our goal is to train these networks in an incremental manner, without the computationally expensive experience replay. By studying how individual neural nodes behave in online training, we recognize that the global nature of ReLU gates can cause undesirable learning interference in each node's learning behavior. We propose redu…

    Submitted 6 September, 2018; v1 submitted 18 May, 2018; originally announced May 2018.

    Comments: 16 pages

  10. arXiv:1804.03334

    cs.LG stat.ML

    TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent

    Authors: Alex Kearney, Vivek Veeriah, Jaden B. Travnik, Richard S. Sutton, Patrick M. Pilarski

    Abstract: In this paper, we introduce a method for adapting the step-sizes of temporal difference (TD) learning. The performance of TD methods often depends on well-chosen step-sizes, yet few algorithms have been developed for setting the step-size automatically for TD learning. An important limitation of current methods is that they adapt a single step-size shared by all the weights of the learning system.…

    Submitted 10 April, 2018; originally announced April 2018.

    Comments: Version as submitted to the 31st Conference on Neural Information Processing Systems (NIPS 2017) on May 19, 2017. 9 pages, 5 figures. Extended version in preparation for journal submission

  11. arXiv:1612.02879

    cs.LG cs.AI stat.ML

    Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks

    Authors: Vivek Veeriah, Shangtong Zhang, Richard S. Sutton

    Abstract: Representations are fundamental to artificial intelligence. The performance of a learning system depends on the type of representation used for representing the data. Typically, these representations are hand-engineered using domain knowledge. More recently, the trend is to learn these representations through stochastic gradient descent in multi-layer neural networks, which is called backprop. Lea…

    Submitted 27 April, 2017; v1 submitted 8 December, 2016; originally announced December 2016.

  12. arXiv:1607.05047

    stat.ML cs.LG

    A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward

    Authors: S. A. Murphy, Y. Deng, E. B. Laber, H. R. Maei, R. S. Sutton, K. Witkiewitz

    Abstract: We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. This algorithm is developed with a view towards its use in mobile health.

    Submitted 18 July, 2016; originally announced July 2016.

  13. arXiv:1507.00353

    cs.AI cs.LG stat.ML

    An Empirical Evaluation of True Online TD(λ)

    Authors: Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Richard S. Sutton

    Abstract: The true online TD(λ) algorithm has recently been proposed (van Seijen and Sutton, 2014) as a universal replacement for the popular TD(λ) algorithm, in temporal-difference learning and reinforcement learning. True online TD(λ) has better theoretical properties than conventional TD(λ), and the expectation is that it also results in faster learning. In this paper, we put this hypothesis to the test.…

    Submitted 1 July, 2015; originally announced July 2015.

    Comments: European Workshop on Reinforcement Learning (EWRL) 2015