Showing 1–50 of 88 results for author: Sutton, R

  1. arXiv:2406.14951  [pdf, other]

    cs.LG cs.AI

    An Idiosyncrasy of Time-discretization in Reinforcement Learning

    Authors: Kris De Asis, Richard S. Sutton

    Abstract: Many reinforcement learning algorithms are built on an assumption that an agent interacts with an environment over fixed-duration, discrete time steps. However, physical systems are continuous in time, requiring a choice of time-discretization granularity when digitally controlling them. Furthermore, such systems do not wait for decisions to be made before advancing the environment state, necessit…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: RLC 2024

    ACM Class: I.2.6; I.2.9

  2. arXiv:2405.09999  [pdf, other]

    cs.LG cs.AI

    Reward Centering

    Authors: Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton

    Abstract: We show that discounted methods for solving continuing reinforcement learning problems can perform significantly better if they center their rewards by subtracting out the rewards' empirical average. The improvement is substantial at commonly used discount factors and increases further as the discount factor approaches one. In addition, we show that if a problem's rewards are shifted by a constant…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: In Proceedings of RLC 2024

  3. arXiv:2402.02342  [pdf, other]

    cs.LG cs.AI math.OC

    MetaOptimize: A Framework for Optimizing Step Sizes and Other Meta-parameters

    Authors: Arsalan Sharifnassab, Saber Salehkaleybar, Richard Sutton

    Abstract: This paper addresses the challenge of optimizing meta-parameters (i.e., hyperparameters) in machine learning algorithms, a critical factor influencing training efficiency and model performance. Moving away from the computationally expensive traditional meta-parameter search methods, we introduce the MetaOptimize framework, which dynamically adjusts meta-parameters, particularly step sizes (also known as…

    Submitted 27 May, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

  4. arXiv:2401.17401  [pdf, other]

    cs.LG cs.AI

    Step-size Optimization for Continual Learning

    Authors: Thomas Degris, Khurram Javed, Arsalan Sharifnassab, Yuxin Liu, Richard Sutton

    Abstract: In continual learning, a learner has to keep learning from the data over its whole life time. A key issue is to decide what knowledge to keep and what knowledge to let go. In a neural network, this can be implemented by using a step-size vector to scale how much gradient samples change network weights. Common algorithms, like RMSProp and Adam, use heuristics, specifically normalization, to adapt t…

    Submitted 30 January, 2024; originally announced January 2024.

  5. arXiv:2312.15091  [pdf, ps, other]

    cs.LG math.OC

    A Note on Stability in Asynchronous Stochastic Approximation without Communication Delays

    Authors: Huizhen Yu, Yi Wan, Richard S. Sutton

    Abstract: In this paper, we study asynchronous stochastic approximation algorithms without communication delays. Our main contribution is a stability proof for these algorithms that extends a method of Borkar and Meyn by accommodating more general noise conditions. We also derive convergence results from this stability result and discuss their application in important average-reward reinforcement learning p…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: 21 pages

    MSC Class: 62L20 (Primary) 93E35; 90C40 (Secondary)

  6. arXiv:2310.01569  [pdf, other]

    cs.AI cs.LG

    Iterative Option Discovery for Planning, by Planning

    Authors: Kenny Young, Richard S. Sutton

    Abstract: Discovering useful temporal abstractions, in the form of options, is widely thought to be key to applying reinforcement learning and planning to increasingly complex domains. Building on the empirical success of the Expert Iteration approach to policy learning used in AlphaZero, we propose Option Iteration, an analogous approach to option discovery. Rather than learning a single strong policy that…

    Submitted 22 December, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Fixed incorrect arrows on some figures in the appendix

  7. arXiv:2306.15625  [pdf, other]

    cs.LG cs.AI

    Value-aware Importance Weighting for Off-policy Reinforcement Learning

    Authors: Kristopher De Asis, Eric Graves, Richard S. Sutton

    Abstract: Importance sampling is a central idea underlying off-policy prediction in reinforcement learning. It provides a strategy for re-weighting samples from a distribution to obtain unbiased estimates under another distribution. However, importance sampling weights tend to exhibit extreme variance, often leading to stability issues in practice. In this work, we consider a broader class of importance wei…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: CoLLAs 2023

    ACM Class: I.2

  8. arXiv:2306.13812  [pdf, other]

    cs.LG

    Maintaining Plasticity in Deep Continual Learning

    Authors: Shibhansh Dohare, J. Fernando Hernandez-Garcia, Parash Rahman, A. Rupam Mahmood, Richard S. Sutton

    Abstract: Modern deep-learning systems are specialized to problem settings in which training occurs once and then never again, as opposed to continual-learning settings in which training occurs continually. If deep-learning systems are applied in a continual learning setting, then it is well known that they may fail to remember earlier examples. More fundamental, but less well known, is that they may also l…

    Submitted 9 April, 2024; v1 submitted 23 June, 2023; originally announced June 2023.

  9. arXiv:2302.05326  [pdf, other]

    cs.LG cs.AI

    Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks

    Authors: Khurram Javed, Haseeb Shah, Rich Sutton, Martha White

    Abstract: Constructing states from sequences of observations is an important component of reinforcement learning agents. One solution for state construction is to use recurrent neural networks. Back-propagation through time (BPTT) and real-time recurrent learning (RTRL) are two popular gradient-based methods for recurrent learning. BPTT requires complete trajectories of observations before it can compute t…

    Submitted 21 November, 2023; v1 submitted 20 January, 2023; originally announced February 2023.

    Comments: Scalable recurrent learning, online learning, real-time recurrent learning, cascade correlation networks, agent-state construction, columnar networks, constructive networks

  10. arXiv:2301.13757  [pdf, other]

    cs.LG cs.AI

    Toward Efficient Gradient-Based Value Estimation

    Authors: Arsalan Sharifnassab, Richard Sutton

    Abstract: Gradient-based methods for value estimation in reinforcement learning have favorable stability properties, but they are typically much slower than Temporal Difference (TD) learning methods. We study the root causes of this slowness and show that Mean Square Bellman Error (MSBE) is an ill-conditioned loss function in the sense that its Hessian has a large condition number. To resolve the adverse effe…

    Submitted 23 July, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

  11. arXiv:2210.14361  [pdf, other]

    cs.LG cs.AI

    Auxiliary task discovery through generate-and-test

    Authors: Banafsheh Rafiee, Sina Ghiassian, Jun Jin, Richard Sutton, Jun Luo, Adam White

    Abstract: In this paper, we explore an approach to auxiliary task discovery in reinforcement learning based on ideas from representation learning. Auxiliary tasks tend to improve data efficiency by forcing the agent to learn auxiliary prediction and control objectives in addition to the main task of maximizing reward, and thus producing better representations. Typically these tasks are designed by people. M…

    Submitted 25 October, 2022; originally announced October 2022.

  12. arXiv:2209.15141  [pdf, other]

    cs.LG

    On Convergence of Average-Reward Off-Policy Control Algorithms in Weakly Communicating MDPs

    Authors: Yi Wan, Richard S. Sutton

    Abstract: We show that two average-reward off-policy control algorithms, Differential Q-learning (Wan, Naik, & Sutton 2021a) and RVI Q-learning (Abounadi, Bertsekas, & Borkar 2001), converge in weakly communicating MDPs. Weakly communicating MDPs are the most general MDPs that can be solved by a learning algorithm with a single stream of experience. The original convergence proofs of the two algorithms require tha…

    Submitted 5 November, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

  13. arXiv:2208.11173  [pdf, other]

    cs.AI cs.LG

    The Alberta Plan for AI Research

    Authors: Richard S. Sutton, Michael Bowling, Patrick M. Pilarski

    Abstract: Herein we describe our approach to artificial intelligence research, which we call the Alberta Plan. The Alberta Plan is pursued within our research groups in Alberta and by others who are like minded throughout the world. We welcome all who would join us in this pursuit.

    Submitted 21 March, 2023; v1 submitted 23 August, 2022; originally announced August 2022.

  14. arXiv:2207.13735  [pdf, other]

    hep-ph

    NNLO interpolation grids for jet production at the LHC

    Authors: D. Britzger, A. Gehrmann-De Ridder, T. Gehrmann, E. W. N. Glover, C. Gwenlan, A. Huss, J. Pires, K. Rabbertz, D. Savoiu, M. R. Sutton, J. Stark

    Abstract: Fast interpolation-grid frameworks facilitate an efficient and flexible evaluation of higher-order predictions for any choice of parton distribution functions or value of the strong coupling $α_s$. They constitute an essential tool for the extraction of parton distribution functions and Standard Model parameters, as well as studies of the dependence of cross sections on the renormalisation and fac…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: 19 pages, 10 figures, 6 tables

    Report number: CERN-TH-2022-125, IPPP/22/53, MPP-2022-80, ZU-TH 34/22

  15. arXiv:2207.01613  [pdf, other]

    cs.LG

    Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions

    Authors: Tian Tian, Kenny Young, Richard S. Sutton

    Abstract: Value iteration (VI) is a foundational dynamic programming method, important for learning and planning in optimal control and reinforcement learning. VI proceeds in batches, where the update to the value of each state must be completed before the next batch of updates can begin. Completing a single batch is prohibitively expensive if the state space is large, rendering VI impractical for many appl…

    Submitted 27 November, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

  16. arXiv:2205.12515  [pdf, other]

    cs.LG cs.AI

    Toward Discovering Options that Achieve Faster Planning

    Authors: Yi Wan, Richard S. Sutton

    Abstract: We propose a new objective for option discovery that emphasizes the computational advantage of using options in planning. In a sequential machine, the speed of planning is proportional to the number of elementary operations used to achieve a good policy. For episodic tasks, the number of elementary operations depends on the number of options composed by the policy in an episode and the number of o…

    Submitted 29 September, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

  17. arXiv:2202.13252  [pdf, other]

    cs.AI

    The Quest for a Common Model of the Intelligent Decision Maker

    Authors: Richard S. Sutton

    Abstract: The premise of the Multi-disciplinary Conference on Reinforcement Learning and Decision Making is that multiple disciplines share an interest in goal-directed decision making over time. The idea of this paper is to sharpen and deepen this premise by proposing a perspective on the decision maker that is substantive and widely held across psychology, artificial intelligence, economics, control theor…

    Submitted 5 June, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

    Comments: Will appear as an extended abstract at the fifth Multi-disciplinary Conference on Reinforcement Learning and Decision Making, held in Providence, Rhode Island, June 8-11, 2022

  18. arXiv:2202.09701  [pdf, ps, other]

    cs.LG

    A History of Meta-gradient: Gradient Methods for Meta-learning

    Authors: Richard S. Sutton

    Abstract: The history of meta-learning methods based on gradient descent is reviewed, focusing primarily on methods that adapt step-size (learning rate) meta-parameters.

    Submitted 19 February, 2022; originally announced February 2022.

    Comments: 3 pages of text, 54 references

  19. Reward-Respecting Subtasks for Model-Based Reinforcement Learning

    Authors: Richard S. Sutton, Marlos C. Machado, G. Zacharias Holland, David Szepesvari, Finbarr Timbers, Brian Tanner, Adam White

    Abstract: To achieve the ambitious goals of artificial intelligence, reinforcement learning must include planning with a model of the world that is abstract in state and time. Deep learning has made progress with state abstraction, but temporal abstraction has rarely been used, despite extensively developed theory based on the options framework. One reason for this is that the space of possible options is i…

    Submitted 16 September, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

    Journal ref: Artificial Intelligence, first published online September 6, 2023

  20. arXiv:2112.15236  [pdf, other]

    cs.LG cs.AI

    Learning Agent State Online with Recurrent Generate-and-Test

    Authors: Amir Samani, Richard S. Sutton

    Abstract: Learning continually and online from a continuous stream of data is challenging, especially for a reinforcement learning agent with sequential data. When the environment only provides observations giving partial information about the state of the environment, the agent must learn the agent state based on the data stream of experience. We refer to the state learned directly from the data stream of…

    Submitted 30 December, 2021; originally announced December 2021.

  21. arXiv:2112.01120  [pdf, other]

    hep-ex hep-ph

    Impact of jet-production data on the next-to-next-to-leading-order determination of HERAPDF2.0 parton distributions

    Authors: H1 and ZEUS Collaborations: I. Abt, R. Aggarwal, V. Andreev, M. Arratia, V. Aushev, A. Baghdasaryan, A. Baty, K. Begzsuren, O. Behnke, A. Belousov, A. Bertolin, I. Bloch, V. Boudry, G. Brandt, I. Brock, N. H. Brook, R. Brugnera, A. Bruni, A. Buniatyan, P. J. Bussey, L. Bystritskaya, A. Caldwell, et al. (212 additional authors not shown)

    Abstract: The HERAPDF2.0 ensemble of parton distribution functions (PDFs) was introduced in 2015. The final stage is presented, a next-to-next-to-leading-order (NNLO) analysis of the HERA data on inclusive deep inelastic $ep$ scattering together with jet data as published by the H1 and ZEUS collaborations. A perturbative QCD fit, simultaneously of $α_s(M_Z^2)$ and the PDFs, was performed with the result…

    Submitted 2 December, 2021; originally announced December 2021.

    Comments: 43 pages, 24 figures, to be submitted to Eur. Phys. J. C

    Report number: DESY-21-206

  22. arXiv:2110.13855  [pdf, other]

    cs.LG

    Average-Reward Learning and Planning with Options

    Authors: Yi Wan, Abhishek Naik, Richard S. Sutton

    Abstract: We extend the options framework for temporal abstraction in reinforcement learning from discounted Markov decision processes (MDPs) to average-reward MDPs. Our contributions include general convergent off-policy inter-option learning algorithms, intra-option algorithms for learning values and models, as well as sample-based planning variants of our learning algorithms. Our algorithms and convergen…

    Submitted 26 October, 2021; originally announced October 2021.

  23. arXiv:2109.05110  [pdf, other]

    cs.LG cs.AI

    An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment

    Authors: Sina Ghiassian, Richard S. Sutton

    Abstract: Many off-policy prediction learning algorithms have been proposed in the past decade, but it remains unclear which algorithms learn faster than others. We empirically compare 11 off-policy prediction learning algorithms with linear function approximation on two small tasks: the Rooms task, and the High Variance Rooms task. The tasks are designed such that learning fast in them is challenging. In t…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: 13 pages

  24. arXiv:2108.06325  [pdf, other]

    cs.LG

    Continual Backprop: Stochastic Gradient Descent with Persistent Randomness

    Authors: Shibhansh Dohare, Richard S. Sutton, A. Rupam Mahmood

    Abstract: The Backprop algorithm for learning in neural networks utilizes two mechanisms: first, stochastic gradient descent and second, initialization with small random weights, where the latter is essential to the effectiveness of the former. We show that in continual learning setups, Backprop performs well initially, but over time its performance degrades. Stochastic gradient descent alone is insufficien…

    Submitted 5 May, 2022; v1 submitted 13 August, 2021; originally announced August 2021.

  25. arXiv:2107.11404  [pdf]

    astro-ph.EP astro-ph.IM physics.geo-ph

    Trace Elemental Behavior in the Solar Nebula: Synchrotron X-ray Fluorescence Analyses of CM and CR Chondritic Iron Sulfides and Associated Metal

    Authors: Sheryl A. Singerling, Stephen R. Sutton, Antonio Lanzirotti, Matthew Newville, Adrian J. Brearley

    Abstract: We have performed a coordinated focused ion beam (FIB)-scanning and transmission electron microscopy (S/TEM), electron probe microanalysis (EMPA)-synchrotron X-ray fluorescence (SXRF) microprobe study to determine phase-specific microstructural characteristics and high-resolution in situ trace element concentrations of primary pyrrhotite, pentlandite, and associated metal grains from chondrules in…

    Submitted 23 July, 2021; originally announced July 2021.

    Comments: 47 pages including appendices, 12 figures in main paper, 5 tables in main paper

  26. arXiv:2106.00922  [pdf, other]

    cs.LG cs.AI

    An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task

    Authors: Sina Ghiassian, Richard S. Sutton

    Abstract: Off-policy prediction -- learning the value function for one policy from data generated while following another policy -- is one of the most challenging subproblems in reinforcement learning. This paper presents empirical results with eleven prominent off-policy learning algorithms that use linear function approximation: five Gradient-TD methods, two Emphatic-TD methods, Off-policy TD($λ$), Vtrace…

    Submitted 11 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

  27. arXiv:2104.08543  [pdf, other]

    cs.AI

    Planning with Expectation Models for Control

    Authors: Katya Kudashkina, Yi Wan, Abhishek Naik, Richard S. Sutton

    Abstract: In model-based reinforcement learning (MBRL), Wan et al. (2019) showed conditions under which the environment model could produce the expectation of the next feature vector rather than the full distribution, or a sample thereof, with no loss in planning performance. Such expectation models are of interest when the environment is stochastic and non-stationary, and the model is approximate, such as…

    Submitted 17 April, 2021; originally announced April 2021.

  28. arXiv:2103.05787  [pdf, other]

    cs.LG

    Scalable Online Recurrent Learning Using Columnar Neural Networks

    Authors: Khurram Javed, Martha White, Rich Sutton

    Abstract: Structural credit assignment for recurrent learning is challenging. An algorithm called RTRL can compute gradients for recurrent networks online but is computationally intractable for large networks. Alternatives, such as BPTT, are not online. In this work, we propose a credit-assignment algorithm -- \algoname{} -- that approximates the gradients for recurrent learning in real-time using $O(n)$ op…

    Submitted 9 March, 2021; originally announced March 2021.

    Comments: Structural credit-assignment, scalable recurrent learning, scalable meta-learning, backward view credit-assignment

  29. arXiv:2102.07686  [pdf, other]

    cs.LG cs.AI stat.ML

    Does the Adam Optimizer Exacerbate Catastrophic Forgetting?

    Authors: Dylan R. Ashley, Sina Ghiassian, Richard S. Sutton

    Abstract: Catastrophic forgetting remains a severe hindrance to the broad application of artificial neural networks (ANNs); however, it continues to be a poorly understood phenomenon. Despite the extensive amount of work on catastrophic forgetting, we argue that it is still unclear how exactly the phenomenon should be quantified, and, moreover, to what degree all of the choices we make when designing learni…

    Submitted 9 June, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

    Comments: 9 pages in main text + 3 pages of references + 16 pages of appendices, 6 figures in main text + 21 figures in appendices, 6 tables in appendices; source code available at https://github.com/dylanashley/catastrophic-forgetting/tree/arxiv

    ACM Class: I.2.6

  30. arXiv:2101.02808  [pdf, other]

    cs.LG cs.AI

    Average-Reward Off-Policy Policy Evaluation with Function Approximation

    Authors: Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson

    Abstract: We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function. For this problem, bootstrapping is necessary and, along with off-policy learning and FA, results in the deadly triad (Sutton & Barto, 2018). To address the deadly triad, we propose two novel algorithms, reproducing…

    Submitted 18 October, 2022; v1 submitted 7 January, 2021; originally announced January 2021.

    Comments: ICML 2021

  31. arXiv:2011.04590  [pdf, other]

    cs.AI

    From Eye-blinks to State Construction: Diagnostic Benchmarks for Online Representation Learning

    Authors: Banafsheh Rafiee, Zaheer Abbas, Sina Ghiassian, Raksha Kumaraswamy, Richard Sutton, Elliot Ludvig, Adam White

    Abstract: We present three new diagnostic prediction problems inspired by classical-conditioning experiments to facilitate research in online prediction learning. Experiments in classical conditioning show that animals such as rabbits, pigeons, and dogs can make long temporal associations that enable multi-step prediction. To replicate this remarkable ability, an agent must construct an internal state repre…

    Submitted 10 October, 2022; v1 submitted 9 November, 2020; originally announced November 2020.

  32. arXiv:2010.15268  [pdf, other]

    cs.LG cs.AI

    Understanding the Pathologies of Approximate Policy Evaluation when Combined with Greedification in Reinforcement Learning

    Authors: Kenny Young, Richard S. Sutton

    Abstract: Despite empirical success, the theory of reinforcement learning (RL) with value function approximation remains fundamentally incomplete. Prior work has identified a variety of pathological behaviours that arise in RL algorithms that combine approximate on-policy evaluation and greedification. One prominent example is policy oscillation, wherein an algorithm may cycle indefinitely between policies,…

    Submitted 28 October, 2020; originally announced October 2020.

  33. arXiv:2008.12095  [pdf, other]

    cs.AI cs.HC cs.LG

    Document-editing Assistants and Model-based Reinforcement Learning as a Path to Conversational AI

    Authors: Katya Kudashkina, Patrick M. Pilarski, Richard S. Sutton

    Abstract: Intelligent assistants that follow commands or answer simple questions, such as Siri and Google search, are among the most economically important applications of AI. Future conversational AI assistants promise even greater capabilities and a better user experience through a deeper understanding of the domain, the user, or the user's purposes. But what domain and what methods are best suited to res…

    Submitted 27 August, 2020; originally announced August 2020.

    Comments: Currently under review

  34. arXiv:2008.11329  [pdf, other]

    cs.LG cs.AI

    Inverse Policy Evaluation for Value-based Sequential Decision-making

    Authors: Alan Chan, Kris de Asis, Richard S. Sutton

    Abstract: Value-based methods for reinforcement learning lack generally applicable ways to derive behavior from a value function. Many approaches involve approximate value iteration (e.g., $Q$-learning), and acting greedily with respect to the estimates with an arbitrary degree of entropy to ensure that the state-space is sufficiently explored. Behavior based on explicit greedification assumes that the valu…

    Submitted 25 August, 2020; originally announced August 2020.

    Comments: Submitted to NeurIPS 2020

  35. arXiv:2006.16318  [pdf, other]

    cs.LG cs.AI

    Learning and Planning in Average-Reward Markov Decision Processes

    Authors: Yi Wan, Abhishek Naik, Richard S. Sutton

    Abstract: We introduce learning and planning algorithms for average-reward MDPs, including 1) the first general proven-convergent off-policy model-free control algorithm without reference states, 2) the first proven-convergent off-policy model-free prediction algorithm, and 3) the first off-policy learning algorithm that converges to the actual value function rather than to the value function plus an offset…

    Submitted 28 June, 2021; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: In Proceedings of ICML 2021

  36. arXiv:2003.01700  [pdf, other]

    hep-ph hep-ex

    Les Houches 2019: Physics at TeV Colliders: Standard Model Working Group Report

    Authors: S. Amoroso, P. Azzurri, J. Bendavid, E. Bothmann, D. Britzger, H. Brooks, A. Buckley, M. Calvetti, X. Chen, M. Chiesa, L. Cieri, V. Ciulli, J. Cruz-Martinez, A. Cueto, A. Denner, S. Dittmaier, M. Donegà, M. Dührssen-Debling, I. Fabre, S. Ferrario-Ravasio, D. de Florian, S. Forte, P. Francavilla, T. Gehrmann, A. Gehrmann-De Ridder, et al. (58 additional authors not shown)

    Abstract: This Report summarizes the proceedings of the 2019 Les Houches workshop on Physics at TeV Colliders. Session 1 dealt with (I) new developments for high precision Standard Model calculations, (II) the sensitivity of parton distribution functions to the experimental inputs, (III) new developments in jet substructure techniques and a detailed examination of gluon fragmentation at the LHC, (IV) issues…

    Submitted 3 March, 2020; originally announced March 2020.

    Comments: Proceedings of the Standard Model Working Group of the 2019 Les Houches Workshop, Physics at TeV Colliders, Les Houches 10-28 June 2019. 226 pages

  37. arXiv:1912.04002  [pdf, other]

    cs.LG stat.ML

    Learning Sparse Representations Incrementally in Deep Reinforcement Learning

    Authors: J. Fernando Hernandez-Garcia, Richard S. Sutton

    Abstract: Sparse representations have been shown to be useful in deep reinforcement learning for mitigating catastrophic interference and improving the performance of agents in terms of cumulative reward. Previous results were based on a two-step process where the representation was learned offline and the action-value function was learned online afterwards. In this paper, we investigate if it is possible to…

    Submitted 9 December, 2019; originally announced December 2019.

  38. arXiv:1910.02140  [pdf, ps, other]

    cs.AI

    Discounted Reinforcement Learning Is Not an Optimization Problem

    Authors: Abhishek Naik, Roshan Shariff, Niko Yasui, Hengshuai Yao, Richard S. Sutton

    Abstract: Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks. It is not an optimization problem in its usual formulation, so when using function approximation there is no optimal policy. We substantiate these claims, then go on to address some misconceptions about discounting and its connection to the average reward formulation. We enc…

    Submitted 27 November, 2019; v1 submitted 4 October, 2019; originally announced October 2019.

    Comments: Accepted for presentation at the Optimization Foundations of Reinforcement Learning Workshop at NeurIPS 2019

  39. arXiv:1909.03906  [pdf, other]

    cs.LG cs.AI

    Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

    Authors: Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves

    Abstract: We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a $\textit{fixed}$ number of future time steps. To learn the value function for horizon $h$, these algorithms bootstrap from the value function for horizon $h-1$, or some shorter horizon. Because no value function bootstraps from itself…

    Submitted 10 February, 2020; v1 submitted 9 September, 2019; originally announced September 2019.

    Comments: AAAI 2020

    ACM Class: I.2

  40. arXiv:1908.03568  [pdf, other]

    cs.LG cs.AI stat.ML

    Behaviour Suite for Reinforcement Learning

    Authors: Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt

    Abstract: This paper introduces the Behaviour Suite for Reinforcement Learning, or bsuite for short. bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives. First, to collect clear, informative and scalable problems that capture key issues in the design of general and efficient learning algorithms. Second, to stud…

    Submitted 14 February, 2020; v1 submitted 9 August, 2019; originally announced August 2019.

  41. Calculations for deep inelastic scattering using fast interpolation grid techniques at NNLO in QCD and the extraction of $α_s$ from HERA data

    Authors: D. Britzger, J. Currie, A. Gehrmann-De Ridder, T. Gehrmann, E. W. N. Glover, C. Gwenlan, A. Huss, T. Morgan, J. Niehues, J. Pires, K. Rabbertz, M. R. Sutton

    Abstract: The extension of interpolation-grid frameworks for perturbative QCD calculations at next-to-next-to-leading order (NNLO) is presented for deep inelastic scattering (DIS) processes. A fast and flexible evaluation of higher-order predictions for any a posteriori choice of parton distribution functions (PDFs) or value of the strong coupling constant is essential in iterative fitting procedures to ext…

    Submitted 27 August, 2021; v1 submitted 12 June, 2019; originally announced June 2019.

    Comments: 13 pages, 6 figures, 2 tables. v2: corrected scale bands in Fig. 4; version to appear in EPJC. v3: changes as discussed in an erratum submitted to EPJ C

    Report number: CERN-TH-2019-079, CFTP/19-020, IPPP/19/44, MPP-2019-114, ZU-TH 29/19

  42. arXiv:1904.01191  [pdf, other]

    cs.LG cs.AI stat.ML

    Planning with Expectation Models

    Authors: Yi Wan, Zaheer Abbas, Adam White, Martha White, Richard S. Sutton

    Abstract: Distribution and sample models are two popular model choices in model-based reinforcement learning (MBRL). However, learning these models can be intractable, particularly when the state and action spaces are large. Expectation models, on the other hand, are easier to learn due to their compactness and have also been widely used for deterministic environments. For stochastic environments…

    Submitted 29 July, 2020; v1 submitted 1 April, 2019; originally announced April 2019.

  43. arXiv:1903.03252  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Feature Relevance Through Step Size Adaptation in Temporal-Difference Learning

    Authors: Alex Kearney, Vivek Veeriah, Jaden Travnik, Patrick M. Pilarski, Richard S. Sutton

    Abstract: There is a long history of using meta learning as representation learning, specifically for determining the relevance of inputs. In this paper, we examine an instance of meta-learning in which feature relevance is learned by adapting step size parameters of stochastic gradient descent---building on a variety of prior work in stochastic approximation, machine learning, and artificial neural network…

    Submitted 7 March, 2019; originally announced March 2019.

  44. arXiv:1903.00194  [pdf, other]

    cs.AI cs.LG

    Should All Temporal Difference Learning Use Emphasis?

    Authors: Xiang Gu, Sina Ghiassian, Richard S. Sutton

    Abstract: Emphatic Temporal Difference (ETD) learning has recently been proposed as a convergent off-policy learning method. ETD was proposed mainly to address convergence issues of conventional Temporal Difference (TD) learning under off-policy training but it is different from conventional TD learning even under on-policy training. A simple counterexample provided back in 2017 pointed to a potential class…

    Submitted 1 March, 2019; originally announced March 2019.

  45. arXiv:1901.07510  [pdf, other]

    cs.LG stat.ML

    Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target

    Authors: J. Fernando Hernandez-Garcia, Richard S. Sutton

    Abstract: Multi-step methods such as Retrace($λ$) and $n$-step $Q$-learning have become a crucial component of modern deep reinforcement learning agents. These methods are often evaluated as a part of bigger architectures and their evaluations rarely include enough samples to draw statistically significant conclusions about their performance. This type of methodology makes it difficult to understand how par…

    Submitted 7 February, 2019; v1 submitted 22 January, 2019; originally announced January 2019.

  46. arXiv:1811.02597  [pdf, other]

    cs.LG cs.AI stat.ML

    Online Off-policy Prediction

    Authors: Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White

    Abstract: This paper investigates the problem of online prediction learning, where learning proceeds continuously as the agent interacts with an environment. The predictions made by the agent are contingent on a particular way of behaving, represented as a value function. However, the behavior used to select actions and generate the behavior data might be different from the one used to define the prediction…

    Submitted 6 November, 2018; originally announced November 2018.

    Comments: 68 pages

  47. arXiv:1809.07435  [pdf, other]

    cs.LG cs.AI eess.SP

    Predicting Periodicity with Temporal Difference Learning

    Authors: Kristopher De Asis, Brendan Bennett, Richard S. Sutton

    Abstract: Temporal difference (TD) learning is an important approach in reinforcement learning, as it combines ideas from dynamic programming and Monte Carlo methods in a way that allows for online and incremental model-free learning. A key idea of TD learning is that it is learning predictive knowledge about the environment in the form of value functions, from which it can derive its behavior to address lo…

    Submitted 19 September, 2018; originally announced September 2018.

  48. arXiv:1807.01830  [pdf, other]

    cs.LG cs.AI stat.ML

    Per-decision Multi-step Temporal Difference Learning with Control Variates

    Authors: Kristopher De Asis, Richard S. Sutton

    Abstract: Multi-step temporal difference (TD) learning is an important approach in reinforcement learning, as it unifies one-step TD learning with Monte Carlo methods in a way where intermediate algorithms can outperform either extreme. They address a bias-variance trade off between reliance on current estimates, which could be poor, and incorporating longer sampled reward sequences into the updates. Especi…

    Submitted 4 July, 2018; originally announced July 2018.

    Journal ref: In Conference on Uncertainty in Artificial Intelligence (2018). http://auai.org/uai2018/proceedings/papers/282.pdf

  49. arXiv:1806.00540  [pdf, other]

    cs.LG cs.AI stat.ML

    Integrating Episodic Memory into a Reinforcement Learning Agent using Reservoir Sampling

    Authors: Kenny J. Young, Richard S. Sutton, Shuo Yang

    Abstract: Episodic memory is a psychology term which refers to the ability to recall specific events from the past. We suggest one advantage of this particular type of memory is the ability to easily assign credit to a specific state when remembered information is found to be useful. Inspired by this idea, and the increasing popularity of external memory mechanisms to handle long-term dependencies in deep l…

    Submitted 1 June, 2018; originally announced June 2018.

  50. arXiv:1805.07476  [pdf, other]

    cs.LG cs.AI stat.ML

    Two geometric input transformation methods for fast online reinforcement learning with neural nets

    Authors: Sina Ghiassian, Huizhen Yu, Banafsheh Rafiee, Richard S. Sutton

    Abstract: We apply neural nets with ReLU gates in online reinforcement learning. Our goal is to train these networks in an incremental manner, without the computationally expensive experience replay. By studying how individual neural nodes behave in online training, we recognize that the global nature of ReLU gates can cause undesirable learning interference in each node's learning behavior. We propose redu…

    Submitted 6 September, 2018; v1 submitted 18 May, 2018; originally announced May 2018.

    Comments: 16 pages