Showing 1–18 of 18 results for author: van Seijen, H

  1. arXiv:2310.00229  [pdf, other]

    cs.AI cs.LG

    Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning

    Authors: Mingde Zhao, Safa Alver, Harm van Seijen, Romain Laroche, Doina Precup, Yoshua Bengio

    Abstract: Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning framework utilizing spatio-temporal abstractions to generalize better in novel situations. It automatically decomposes the given task into smaller, more manageable subtasks, and thus enables sparse decision-making and focused computation on the relevant parts of the environment. The decomposition relies…

    Submitted 16 March, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: ICLR 2024 Camera-Ready

  2. arXiv:2303.08690  [pdf, other]

    cs.LG cs.AI

    Replay Buffer with Local Forgetting for Adapting to Local Environment Changes in Deep Model-Based Reinforcement Learning

    Authors: Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Harm van Seijen, Sarath Chandar

    Abstract: One of the key behavioral characteristics used in neuroscience to determine whether the subject of study -- be it a rodent or a human -- exhibits model-based learning is effective adaptation to local changes in the environment, a particular form of adaptivity that is the focus of this work. In reinforcement learning, however, recent work has shown that modern deep model-based reinforcement-learnin…

    Submitted 27 September, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

  3. arXiv:2211.00164  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information

    Authors: Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, John Langford

    Abstract: Learning to control an agent from data collected offline in a rich pixel-based visual observation space is vital for real-world applications of reinforcement learning (RL). A major challenge in this setting is the presence of input information that is hard to model and irrelevant to controlling the agent. This problem has been approached by the theoretical RL community through the lens of exogenou…

    Submitted 13 August, 2023; v1 submitted 31 October, 2022; originally announced November 2022.

    Comments: ICML 2023

  4. arXiv:2207.00429  [pdf, other]

    cs.LG cs.AI

    Modular Lifelong Reinforcement Learning via Neural Composition

    Authors: Jorge A. Mendez, Harm van Seijen, Eric Eaton

    Abstract: Humans commonly solve complex problems by decomposing them into easier subproblems and then combining the subproblem solutions. This type of compositional reasoning permits reuse of the subproblem solutions when tackling future tasks that share part of the underlying compositional structure. In a continual or lifelong reinforcement learning (RL) setting, this ability to decompose knowledge into re…

    Submitted 1 July, 2022; originally announced July 2022.

    Comments: Published at ICLR 2022. Code: https://github.com/Lifelong-ML/Mendez2022ModularLifelongRL

  5. arXiv:2204.11464  [pdf, other]

    cs.LG cs.AI

    Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods

    Authors: Yi Wan, Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Sarath Chandar, Harm van Seijen

    Abstract: In recent years, a growing number of deep model-based reinforcement learning (RL) methods have been introduced. The interest in deep model-based RL is not surprising, given its many potential benefits, such as higher sample efficiency and the potential for fast adaption to changes in the environment. However, we demonstrate, using an improved version of the recently introduced Local Change Adaptat…

    Submitted 25 June, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

  6. arXiv:2203.04806  [pdf, other]

    cs.CL

    One-Shot Learning from a Demonstration with Hierarchical Latent Language

    Authors: Nathaniel Weir, Xingdi Yuan, Marc-Alexandre Côté, Matthew Hausknecht, Romain Laroche, Ida Momennejad, Harm Van Seijen, Benjamin Van Durme

    Abstract: Humans have the capability, aided by the expressive compositionality of their language, to learn quickly by demonstration. They are able to describe unseen task-performing procedures and generalize their execution to other contexts. In this work, we introduce DescribeWorld, an environment designed to test this sort of generalization skill in grounded agents, where tasks are linguistically and proc…

    Submitted 9 March, 2022; originally announced March 2022.

  7. arXiv:2107.06405  [pdf, other]

    cs.LG cs.AI cs.RO

    Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks

    Authors: Sungryull Sohn, Sungtae Lee, Jongwook Choi, Harm van Seijen, Mehdi Fatemi, Honglak Lee

    Abstract: We propose the k-Shortest-Path (k-SP) constraint: a novel constraint on the agent's trajectory that improves the sample efficiency in sparse-reward MDPs. We show that any optimal policy necessarily satisfies the k-SP constraint. Notably, the k-SP constraint prevents the policy from exploring state-action pairs along the non-k-SP trajectories (e.g., going back and forth). However, in practice, excl…

    Submitted 13 July, 2021; originally announced July 2021.

    Comments: In proceedings of ICML 2021

  8. arXiv:2010.01069  [pdf, other]

    cs.LG cs.AI

    A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms

    Authors: Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes

    Abstract: We investigate the discounting mismatch in actor-critic algorithm implementations from a representation learning perspective. Theoretically, actor-critic algorithms usually have discounting for both actor and critic, i.e., there is a $γ^t$ term in the actor update for the transition observed at time $t$ in a trajectory and the critic is a discounted value function. Practitioners, however, usually…

    Submitted 26 January, 2022; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: AAMAS 2022

  9. arXiv:2007.03158  [pdf, other]

    cs.LG cs.AI stat.ML

    The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning

    Authors: Harm van Seijen, Hadi Nekoei, Evan Racah, Sarath Chandar

    Abstract: Deep model-based Reinforcement Learning (RL) has the potential to substantially improve the sample-efficiency of deep RL. While various challenges have long held it back, a number of papers have recently come out reporting success with deep model-based methods. This is a great development, but the lack of a consistent metric to evaluate such methods makes it difficult to compare various approaches…

    Submitted 3 December, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

    Comments: NeurIPS 2020, code: https://github.com/chandar-lab/LoCA

  10. arXiv:1906.00572  [pdf, other]

    cs.LG stat.ML

    Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

    Authors: Harm van Seijen, Mehdi Fatemi, Arash Tavakoli

    Abstract: In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation. Our analysis reveals that the common perception that poor performance of low discount factors is caused by (too) small action-gaps requires revision. We propose an alternative hypothesis tha…

    Submitted 23 December, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

    Comments: NeurIPS 2019, code: https://github.com/microsoft/logrl

  11. arXiv:1809.02591  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Invariances for Policy Generalization

    Authors: Remi Tachet, Philip Bachman, Harm van Seijen

    Abstract: While recent progress has spawned very powerful machine learning systems, those agents remain extremely specialized and fail to transfer the knowledge they gain to similar yet unseen tasks. In this paper, we study a simple reinforcement learning problem and focus on learning policies that encode the proper invariances for generalization to different settings. We evaluate three potential methods fo…

    Submitted 12 December, 2020; v1 submitted 7 September, 2018; originally announced September 2018.

    Comments: 7 pages, 1 figure

  12. arXiv:1706.04208  [pdf, other]

    cs.LG

    Hybrid Reward Architecture for Reinforcement Learning

    Authors: Harm van Seijen, Mehdi Fatemi, Joshua Romoff, Romain Laroche, Tavian Barnes, Jeffrey Tsang

    Abstract: One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network. While this approach works well in many domains, in domains where the optimal value function cannot easily be reduced to a low-dimensional representation, learning can be very…

    Submitted 27 November, 2017; v1 submitted 13 June, 2017; originally announced June 2017.

  13. arXiv:1704.00756  [pdf, other]

    cs.LG cs.AI stat.ML

    Multi-Advisor Reinforcement Learning

    Authors: Romain Laroche, Mehdi Fatemi, Joshua Romoff, Harm van Seijen

    Abstract: We consider tackling a single-agent RL problem by distributing it to $n$ learners. These learners, called advisors, endeavour to solve the problem from a different focus. Their advice, taking the form of action values, is then communicated to an aggregator, which is in control of the system. We show that the local planning method for the advisors is critical and that none of the ones found in the…

    Submitted 14 November, 2017; v1 submitted 3 April, 2017; originally announced April 2017.

    Comments: Submitted at ICLR2018

  14. arXiv:1612.05159  [pdf, other]

    cs.LG cs.AI

    Separation of Concerns in Reinforcement Learning

    Authors: Harm van Seijen, Mehdi Fatemi, Joshua Romoff, Romain Laroche

    Abstract: In this paper, we propose a framework for solving a single-agent task by using multiple agents, each focusing on different aspects of the task. This approach has two main advantages: 1) it allows for training specialized agents on different parts of the task, and 2) it provides a new way to transfer knowledge, by transferring trained agents. Our framework generalizes the traditional hierarchical d…

    Submitted 28 March, 2017; v1 submitted 15 December, 2016; originally announced December 2016.

  15. arXiv:1608.05151  [pdf, other]

    cs.AI

    Effective Multi-step Temporal-Difference Learning for Non-Linear Function Approximation

    Authors: Harm van Seijen

    Abstract: Multi-step temporal-difference (TD) learning, where the update targets contain information from multiple time steps ahead, is one of the most popular forms of TD learning for linear function approximation. The reason is that multi-step methods often yield substantially better performance than their single-step counterparts, due to a lower bias of the update targets. For non-linear function approx…

    Submitted 17 August, 2016; originally announced August 2016.

  16. arXiv:1512.04087  [pdf, other]

    cs.AI cs.LG

    True Online Temporal-Difference Learning

    Authors: Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton

    Abstract: The temporal-difference methods TD($λ$) and Sarsa($λ$) form a core part of modern reinforcement learning. Their appeal comes from their good performance, low computational cost, and their simple interpretation, given by their forward view. Recently, new versions of these methods were introduced, called true online TD($λ$) and true online Sarsa($λ$), respectively (van Seijen & Sutton, 2014). These…

    Submitted 8 September, 2016; v1 submitted 13 December, 2015; originally announced December 2015.

    Comments: This is the published JMLR version. It is a much improved version. The main changes are: 1) re-structuring of the article; 2) additional analysis on the forward view; 3) empirical comparison of traditional and new forward view; 4) added discussion of other true online papers; 5) updated discussion for non-linear function approximation

    Journal ref: Journal of Machine Learning Research (JMLR), 17(145):1-40, 2016

  17. arXiv:1507.00353  [pdf, other]

    cs.AI cs.LG stat.ML

    An Empirical Evaluation of True Online TD(λ)

    Authors: Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Richard S. Sutton

    Abstract: The true online TD(λ) algorithm has recently been proposed (van Seijen and Sutton, 2014) as a universal replacement for the popular TD(λ) algorithm, in temporal-difference learning and reinforcement learning. True online TD(λ) has better theoretical properties than conventional TD(λ), and the expectation is that it also results in faster learning. In this paper, we put this hypothesis to the test.…

    Submitted 1 July, 2015; originally announced July 2015.

    Comments: European Workshop on Reinforcement Learning (EWRL) 2015

  18. arXiv:1301.2343  [pdf, ps, other]

    cs.AI cs.LG

    Planning by Prioritized Sweeping with Small Backups

    Authors: Harm van Seijen, Richard S. Sutton

    Abstract: Efficient planning plays a crucial role in model-based reinforcement learning. Traditionally, the main planning operation is a full backup based on the current estimates of the successor states. Consequently, its computation time is proportional to the number of successor states. In this paper, we introduce a new planning backup that uses only the current value of a single successor state and has…

    Submitted 10 January, 2013; originally announced January 2013.