Showing 1–26 of 26 results for author: Abel, D

  1. arXiv:2405.14769

    cs.LG cs.CL

    Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input

    Authors: Andi Peng, Yuying Sun, Tianmin Shu, David Abel

    Abstract: Humans use social context to specify preferences over behaviors, i.e. their reward functions. Yet, algorithms for inferring reward models from preference data do not take this social learning view into account. Inspired by pragmatic human communication, we study how to extract fine-grained data regarding why an example is preferred that is useful for learning more accurate reward models. We propos…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  2. arXiv:2307.11046

    cs.LG cs.AI

    A Definition of Continual Reinforcement Learning

    Authors: David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh

    Abstract: In a standard view of the reinforcement learning problem, an agent's goal is to efficiently identify a policy that maximizes long-term reward. However, this perspective is based on a restricted view of learning as finding a solution, rather than treating learning as endless adaptation. In contrast, continual reinforcement learning refers to the setting in which the best agents never stop learning.…

    Submitted 1 December, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023

  3. arXiv:2307.11044

    cs.LG cs.AI

    On the Convergence of Bounded Agents

    Authors: David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup, Satinder Singh

    Abstract: When has an agent converged? Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing. However, as we shift the focus of our learning problem from the environment's state to the agent's state, the concept of an agent's convergence becomes significantly…

    Submitted 20 July, 2023; originally announced July 2023.

  4. arXiv:2212.10420

    cs.AI cs.LG math.ST

    Settling the Reward Hypothesis

    Authors: Michael Bowling, John D. Martin, David Abel, Will Dabney

    Abstract: The reward hypothesis posits that, "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)." We aim to fully settle this hypothesis. This will not conclude with a simple affirmation or refutation, but rather specify completely the implicit requirements on goals and purposes under which the hy…

    Submitted 16 September, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  5. arXiv:2211.05400

    cs.RO

    onlineFGO: Online Continuous-Time Factor Graph Optimization with Time-Centric Multi-Sensor Fusion for Robust Localization in Large-Scale Environments

    Authors: Haoming Zhang, Felix Widmayer, Lars Lünnemann, Dirk Abel

    Abstract: Accurate and consistent vehicle localization in urban areas is challenging due to the large-scale and complicated environments. In this paper, we propose onlineFGO, a novel time-centric graph-optimization-based localization method that fuses multiple sensor measurements with the continuous-time trajectory representation for vehicle localization tasks. We generalize the graph construction independe…

    Submitted 1 September, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

    Comments: A major revision of this paper is available, which will be submitted to arXiv later

  6. arXiv:2210.08803

    cs.DC cs.AI cs.IR cs.LG

    Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference

    Authors: Joey Wang, Yingcan Wei, Minseok Lee, Matthias Langer, Fan Yu, Jie Liu, Alex Liu, Daniel Abel, Gems Guo, Jianbing Dong, Jerry Shi, Kunlun Li

    Abstract: In this talk, we introduce Merlin HugeCTR. Merlin HugeCTR is an open source, GPU-accelerated integration framework for click-through rate estimation. It optimizes both training and inference, whilst enabling model training at scale with model-parallel embeddings and data-parallel neural networks. In particular, Merlin HugeCTR combines a high-performance GPU embedding cache with an hierarchical sto…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: 4 pages

    Journal ref: Proceedings of the 16th ACM Conference on Recommender Systems, 2022

  7. arXiv:2209.06159

    cs.LG

    Meta-Gradients in Non-Stationary Environments

    Authors: Jelena Luketina, Sebastian Flennerhag, Yannick Schroecker, David Abel, Tom Zahavy, Satinder Singh

    Abstract: Meta-gradient methods (Xu et al., 2018; Zahavy et al., 2020) offer a promising solution to the problem of hyperparameter selection and adaptation in non-stationary reinforcement learning problems. However, the properties of meta-gradients in such environments have not been systematically studied. In this work, we bring new clarity to meta-gradients in non-stationary environments. Concretely, we as…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: 16 pages, 9 figures, CoLLAs 2022

  8. arXiv:2203.00397

    cs.LG cs.AI

    A Theory of Abstraction in Reinforcement Learning

    Authors: David Abel

    Abstract: Reinforcement learning defines the problem facing agents that learn to make good decisions through action and observation alone. To be effective problem solvers, such agents must efficiently explore vast worlds, assign credit from delayed feedback, and generalize to new experiences, all while making use of limited data, computational resources, and perceptual bandwidth. Abstraction is essential to…

    Submitted 1 March, 2022; originally announced March 2022.

    Journal ref: Doctoral Dissertation, Department of Computer Science, Brown University, 2020

  9. arXiv:2111.00876

    cs.LG cs.AI

    On the Expressivity of Markov Reward

    Authors: David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh

    Abstract: Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajector…

    Submitted 18 January, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

    Comments: Accepted to NeurIPS 2021

  10. arXiv:2110.03424

    cs.LG cs.AI

    Bad-Policy Density: A Measure of Reinforcement Learning Hardness

    Authors: David Abel, Cameron Allen, Dilip Arumugam, D. Ellis Hershkowitz, Michael L. Littman, Lawson L. S. Wong

    Abstract: Reinforcement learning is hard in general. Yet, in many specific environments, learning is easy. What makes learning easy in one environment, but difficult in another? We address this question by proposing a simple measure of reinforcement-learning hardness called the bad-policy density. This quantity measures the fraction of the deterministic stationary policy space that is below a desired thresh…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: Presented at the 2021 ICML Workshop on Reinforcement Learning Theory

  11. People construct simplified mental representations to plan

    Authors: Mark K. Ho, David Abel, Carlos G. Correa, Michael L. Littman, Jonathan D. Cohen, Thomas L. Griffiths

    Abstract: One of the most striking features of human cognition is the capacity to plan. Two aspects of human planning stand out: its efficiency and flexibility. Efficiency is especially impressive because plans must often be made in complex environments, and yet people successfully plan solutions to myriad everyday problems despite having limited cognitive resources. Standard accounts in psychology, economi…

    Submitted 26 November, 2022; v1 submitted 14 May, 2021; originally announced May 2021.

    Comments: 56 pages, 5 main figures, 10 extended data figures, supplementary information is included in ancillary files

    Journal ref: Nature, 606(7912), 129-136 (2022)

  12. arXiv:2103.00107

    cs.LG cs.AI stat.ML

    Revisiting Peng's Q($λ$) for Modern Reinforcement Learning

    Authors: Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel

    Abstract: Off-policy multi-step reinforcement learning algorithms consist of conservative and non-conservative algorithms: the former actively cut traces, whereas the latter do not. Recently, Munos et al. (2016) proved the convergence of conservative algorithms to an optimal Q-function. In contrast, non-conservative algorithms are thought to be unsafe and have a limited or no theoretical guarantee. Nonethel…

    Submitted 26 February, 2021; originally announced March 2021.

    Comments: 26 pages, 7 figures, 2 tables

  13. arXiv:2012.10394

    physics.ao-ph cs.LG

    Deep Learning for Climate Model Output Statistics

    Authors: Michael Steininger, Daniel Abel, Katrin Ziegler, Anna Krause, Heiko Paeth, Andreas Hotho

    Abstract: Climate models are an important tool for the assessment of prospective climate change effects but they suffer from systematic and representation errors, especially for precipitation. Model output statistics (MOS) reduce these errors by fitting the model output to observational data with machine learning. In this work, we explore the feasibility and potential of deep learning with convolutional neu…

    Submitted 9 December, 2020; originally announced December 2020.

    Comments: Accepted for the Tackling Climate Change with Machine Learning Workshop at NeurIPS 2020

  14. arXiv:2006.15085

    cs.LG cs.AI stat.ML

    What can I do here? A Theory of Affordances in Reinforcement Learning

    Authors: Khimya Khetarpal, Zafarali Ahmed, Gheorghe Comanici, David Abel, Doina Precup

    Abstract: Reinforcement learning algorithms usually assume that all actions are always available to an agent. However, both people and animals understand the general link between the features of their environment and the actions that are feasible. Gibson (1977) coined the term "affordances" to describe the fact that certain states enable an agent to do certain actions, in the context of embodied agents. In…

    Submitted 26 June, 2020; originally announced June 2020.

    Comments: Thirty-seventh International Conference on Machine Learning (ICML 2020)

  15. arXiv:2002.05769

    cs.AI

    The Efficiency of Human Cognition Reflects Planned Information Processing

    Authors: Mark K. Ho, David Abel, Jonathan D. Cohen, Michael L. Littman, Thomas L. Griffiths

    Abstract: Planning is useful. It lets people take actions that have desirable long-term consequences. But, planning is hard. It requires thinking about consequences, which consumes limited computational and cognitive resources. Thus, people should plan their actions, but they should also be smart about how they deploy resources used for planning their actions. Put another way, people should also "plan their…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: 13 pg (incl. supplemental materials); included in Proceedings of the 34th AAAI Conference on Artificial Intelligence

  16. arXiv:2002.05518

    cs.LG cs.AI stat.ML

    Learning State Abstractions for Transfer in Continuous Control

    Authors: Kavosh Asadi, David Abel, Michael L. Littman

    Abstract: Can simple algorithms with a good representation solve challenging reinforcement learning problems? In this work, we answer this question in the affirmative, where we take "simple learning algorithm" to be tabular Q-Learning, the "good representations" to be a learned state abstraction, and "challenging problems" to be continuous control tasks. Our main contribution is a learning algorithm that ab…

    Submitted 8 February, 2020; originally announced February 2020.

  17. arXiv:2001.05411

    cs.LG cs.AI stat.ML

    Lipschitz Lifelong Reinforcement Learning

    Authors: Erwan Lecarpentier, David Abel, Kavosh Asadi, Yuu Jinnai, Emmanuel Rachelson, Michael L. Littman

    Abstract: We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes (MDPs) and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the tasks space. These theoretical results lead us to a value-transfe…

    Submitted 22 March, 2021; v1 submitted 15 January, 2020; originally announced January 2020.

    Comments: In proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI 2021), 21 pages, 11 figures

  18. arXiv:1910.11116

    cs.RO cs.CV

    Depth Camera Based Particle Filter for Robotic Osteotomy Navigation

    Authors: Tim Übelhör, Jonas Gesenhues, Nassim Ayoub, Ali Modabber, Dirk Abel

    Abstract: Active surgical robots lack acceptance in clinical practice, because they do not offer the flexibility and usability required for a versatile usage: the systems require a large installation space or a complicated registration step, where the preoperative plan is aligned to the patient and transformed to the base frame of the robot. In this paper, a navigation system for robotic osteotomies is desi…

    Submitted 24 October, 2019; originally announced October 2019.

    Comments: 6 pages, submitted to ICRA 2020

  19. arXiv:1903.00606

    cs.AI

    Discovering Options for Exploration by Minimizing Cover Time

    Authors: Yuu Jinnai, Jee Won Park, David Abel, George Konidaris

    Abstract: One of the main challenges in reinforcement learning is solving tasks with sparse reward. We show that the difficulty of discovering a distant rewarding state in an MDP is bounded by the expected cover time of a random walk over the graph induced by the MDP's transition dynamics. We therefore propose to accelerate exploration by constructing options that minimize cover time. The proposed algorithm…

    Submitted 16 March, 2019; v1 submitted 1 March, 2019; originally announced March 2019.

  20. arXiv:1812.01129

    cs.LG cs.AI

    Mitigating Planner Overfitting in Model-Based Reinforcement Learning

    Authors: Dilip Arumugam, David Abel, Kavosh Asadi, Nakul Gopalan, Christopher Grimm, Jun Ki Lee, Lucas Lehnert, Michael L. Littman

    Abstract: An agent with an inaccurate model of its environment faces a difficult choice: it can ignore the errors in its model and act in the real world in whatever way it determines is optimal with respect to its model. Alternatively, it can take a more conservative stance and eschew its model in favor of optimizing its behavior solely via real-world interaction. This latter approach can be exceedingly slo…

    Submitted 19 March, 2020; v1 submitted 3 December, 2018; originally announced December 2018.

  21. arXiv:1810.07311

    cs.AI

    Finding Options that Minimize Planning Time

    Authors: Yuu Jinnai, David Abel, D Ellis Hershkowitz, Michael Littman, George Konidaris

    Abstract: We formalize the problem of selecting the optimal set of options for planning as that of computing the smallest set of options so that planning converges in less than a given maximum of value-iteration passes. We first show that the problem is NP-hard, even if the task is constrained to be deterministic---the first such complexity result for option discovery. We then present the first polynomial-t…

    Submitted 16 March, 2019; v1 submitted 16 October, 2018; originally announced October 2018.

  22. arXiv:1706.00536

    cs.AI

    Modeling Latent Attention Within Neural Networks

    Authors: Christopher Grimm, Dilip Arumugam, Siddharth Karamcheti, David Abel, Lawson L. S. Wong, Michael L. Littman

    Abstract: Deep neural networks are able to solve tasks across a variety of domains and modalities of data. Despite many empirical successes, we lack the ability to clearly understand and interpret the learned internal mechanisms that contribute to such effective behaviors or, more critically, failure modes. In this work, we present a general method for visualizing an arbitrary neural network's inner mechani…

    Submitted 30 December, 2017; v1 submitted 1 June, 2017; originally announced June 2017.

  23. arXiv:1701.04113

    cs.LG cs.AI

    Near Optimal Behavior via Approximate State Abstraction

    Authors: David Abel, D. Ellis Hershkowitz, Michael L. Littman

    Abstract: The combinatorial explosion that plagues planning and reinforcement learning (RL) algorithms can be moderated using state abstraction. Prohibitively large task representations can be condensed such that essential information is preserved, and consequently, solutions are tractably computable. However, exact abstractions, which treat only fully-identical situations as equivalent, fail to present opp…

    Submitted 15 January, 2017; originally announced January 2017.

    Comments: Earlier version published at ICML 2016

  24. arXiv:1701.04079

    cs.LG cs.AI

    Agent-Agnostic Human-in-the-Loop Reinforcement Learning

    Authors: David Abel, John Salvatier, Andreas Stuhlmüller, Owain Evans

    Abstract: Providing Reinforcement Learning agents with expert advice can dramatically improve various aspects of learning. Prior work has developed teaching protocols that enable agents to learn efficiently in complex environments; many of these methods tailor the teacher's guidance to agents with a particular representation or underlying learning scheme, offering effective but specialized teaching procedur…

    Submitted 15 January, 2017; originally announced January 2017.

    Comments: Presented at the NIPS Workshop on the Future of Interactive Learning Machines, 2016

  25. arXiv:1603.04119

    cs.AI cs.LG stat.ML

    Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains

    Authors: David Abel, Alekh Agarwal, Fernando Diaz, Akshay Krishnamurthy, Robert E. Schapire

    Abstract: High-dimensional observations and complex real-world dynamics present major challenges in reinforcement learning for both function approximation and exploration. We address both of these challenges with two complementary techniques: First, we develop a gradient-boosting style, non-parametric function approximator for learning on $Q$-function residuals. And second, we propose an exploration strateg…

    Submitted 13 March, 2016; originally announced March 2016.

  26. arXiv:1503.00810

    cs.CY

    Development of an Android Application for an Electronic Medical Record System in an Outpatient Environment for Healthcare in Fiji

    Authors: Daryl Abel, Bulou Gavidi, Nicholas Rollings, Rohitash Chandra

    Abstract: The outpatients department in a developing country is typically understaffed and inadequately equipped to handle the large numbers of patients filing through on an average day. The use of electronic medical record (EMR) systems can resolve some of the longstanding medical inefficiencies common in developing countries. This paper presents the design and implementation of a proposed outpatient managem…

    Submitted 2 March, 2015; originally announced March 2015.

    Comments: Technical Report, AICRG, Software Foundation, Fiji, March 2015

    Report number: TR-03-2015