Showing 1–16 of 16 results for author: Vuorio, R

  1. arXiv:2407.00495

    cs.LG

    A Bayesian Solution To The Imitation Gap

    Authors: Risto Vuorio, Mattie Fellows, Cong Lu, Clémence Grislain, Shimon Whiteson

    Abstract: In many real-world settings, an agent must learn to act in environments where no reward signal can be specified, but a set of expert demonstrations is available. Imitation learning (IL) is a popular framework for learning policies from such demonstrations. However, in some cases, differences in observability between the expert and the agent can give rise to an imitation gap such that the expert's…

    Submitted 29 June, 2024; originally announced July 2024.

  2. arXiv:2403.03020

    cs.LG cs.AI

    SplAgger: Split Aggregation for Meta-Reinforcement Learning

    Authors: Jacob Beck, Matthew Jackson, Risto Vuorio, Zheng Xiong, Shimon Whiteson

    Abstract: A core ambition of reinforcement learning (RL) is the creation of agents capable of rapid learning in novel tasks. Meta-RL aims to achieve this by directly learning such agents. Black box methods do so by training off-the-shelf sequence models end-to-end. By contrast, task inference methods explicitly infer a posterior distribution over the unknown task, typically using distinct objectives and seq…

    Submitted 1 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Published at Reinforcement Learning Conference (RLC) 2024. Code is provided at https://github.com/jacooba/hyper

  3. arXiv:2402.06570

    cs.LG cs.RO

    Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control

    Authors: Zheng Xiong, Risto Vuorio, Jacob Beck, Matthieu Zimmer, Kun Shao, Shimon Whiteson

    Abstract: Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies. However, learning a highly performant universal policy requires sophisticated architectures like transformers (TF) that have larger memory and computational cost than simpler multi-layer perceptrons (MLP). To achieve both good per…

    Submitted 3 June, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  4. arXiv:2310.02782

    cs.LG cs.AI

    Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design

    Authors: Matthew Thomas Jackson, Minqi Jiang, Jack Parker-Holder, Risto Vuorio, Chris Lu, Gregory Farquhar, Shimon Whiteson, Jakob Nicolaus Foerster

    Abstract: The past decade has seen vast progress in deep reinforcement learning (RL) on the back of algorithms manually designed by human researchers. Recently, it has been shown that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks. Despite impressive initial results from algorithms such as Learned Policy Gradient (LPG), th…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: Published at NeurIPS 2023

  5. arXiv:2309.14970

    cs.LG cs.AI cs.RO

    Recurrent Hypernetworks are Surprisingly Strong in Meta-RL

    Authors: Jacob Beck, Risto Vuorio, Zheng Xiong, Shimon Whiteson

    Abstract: Deep reinforcement learning (RL) is notoriously impractical to deploy due to sample inefficiency. Meta-RL directly addresses this sample inefficiency by learning to perform few-shot learning when a distribution of related tasks is available for meta-training. While many specialized meta-RL methods have been proposed, recent work suggests that end-to-end learning in conjunction with an off-the-shel…

    Submitted 26 December, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: Published at NeurIPS 2023. We provide code at https://github.com/jacooba/hyper

  6. arXiv:2301.08028

    cs.LG

    A Survey of Meta-Reinforcement Learning

    Authors: Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson

    Abstract: While deep reinforcement learning (RL) has fueled multiple high-profile successes in machine learning, it is held back from more widespread adoption by its often poor data efficiency and the limited generality of the policies it produces. A promising approach for alleviating these limitations is to cast the development of better RL algorithms as a machine learning problem itself in a process calle…

    Submitted 19 January, 2023; originally announced January 2023.

  7. arXiv:2211.02667

    cs.LG stat.ML

    Deconfounded Imitation Learning

    Authors: Risto Vuorio, Johann Brehmer, Hanno Ackermann, Daniel Dijkman, Taco Cohen, Pim de Haan

    Abstract: Standard imitation learning can fail when the expert demonstrators have different sensory inputs than the imitating agent. This is because partial observability gives rise to hidden confounders in the causal graph. We break down the space of confounded imitation learning problems and identify three settings with different data requirements in which the correct imitation policy can be identified. W…

    Submitted 4 November, 2022; originally announced November 2022.

  8. arXiv:2210.11348

    cs.LG cs.AI cs.RO

    Hypernetworks in Meta-Reinforcement Learning

    Authors: Jacob Beck, Matthew Thomas Jackson, Risto Vuorio, Shimon Whiteson

    Abstract: Training a reinforcement learning (RL) agent on a real-world robotics task remains generally impractical due to sample inefficiency. Multi-task RL and meta-RL aim to improve sample efficiency by generalizing over a distribution of related tasks. However, doing so is difficult in practice: In multi-task RL, state of the art methods often fail to outperform a degenerate solution that simply learns e…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: Published at CoRL 2022

  9. arXiv:2209.11303

    cs.LG

    An Investigation of the Bias-Variance Tradeoff in Meta-Gradients

    Authors: Risto Vuorio, Jacob Beck, Shimon Whiteson, Jakob Foerster, Gregory Farquhar

    Abstract: Meta-gradients provide a general approach for optimizing the meta-parameters of reinforcement learning (RL) algorithms. Estimation of meta-gradients is central to the performance of these meta-algorithms, and has been studied in the setting of MAML-style short-horizon meta-RL problems. In this context, prior work has investigated the estimation of the Hessian of the RL objective, as well as tackli…

    Submitted 22 September, 2022; originally announced September 2022.

  10. arXiv:2112.00478

    cs.LG cs.AI stat.ML

    On the Practical Consistency of Meta-Reinforcement Learning Algorithms

    Authors: Zheng Xiong, Luisa Zintgraf, Jacob Beck, Risto Vuorio, Shimon Whiteson

    Abstract: Consistency is the theoretical property of a meta learning algorithm that ensures that, under certain assumptions, it can adapt to any task at test time. An open question is whether and how theoretical consistency translates into practice, in comparison to inconsistent algorithms. In this paper, we empirically investigate this question on a set of representative meta-RL algorithms. We find that th…

    Submitted 1 December, 2021; originally announced December 2021.

  11. arXiv:2102.04999

    cs.LG cs.AI

    Adaptive Pairwise Weights for Temporal Credit Assignment

    Authors: Zeyu Zheng, Risto Vuorio, Richard Lewis, Satinder Singh

    Abstract: How much credit (or blame) should an action taken in a state get for a future reward? This is the fundamental temporal credit assignment problem in Reinforcement Learning (RL). One of the earliest and still most widely used heuristics is to assign this credit based on a scalar coefficient, $λ$ (treated as a hyperparameter), raised to the power of the time interval between the state-action and the…

    Submitted 6 June, 2022; v1 submitted 9 February, 2021; originally announced February 2021.

    Comments: AAAI 2022. The first two authors contributed equally

  12. arXiv:2102.04897

    cs.LG cs.AI

    Learning State Representations from Random Deep Action-conditional Predictions

    Authors: Zeyu Zheng, Vivek Veeriah, Risto Vuorio, Richard Lewis, Satinder Singh

    Abstract: Our main contribution in this work is an empirical finding that random General Value Functions (GVFs), i.e., deep action-conditional predictions -- random both in what feature of observations they predict as well as in the sequence of actions the predictions are conditioned upon -- form good auxiliary tasks for reinforcement learning (RL) problems. In particular, we show that random deep action-co…

    Submitted 5 November, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021

  13. arXiv:1911.11260

    cs.LG cs.AI stat.ML

    Deep Reinforcement Learning for Multi-Driver Vehicle Dispatching and Repositioning Problem

    Authors: John Holler, Risto Vuorio, Zhiwei Qin, Xiaocheng Tang, Yan Jiao, Tiancheng Jin, Satinder Singh, Chenxi Wang, Jieping Ye

    Abstract: Order dispatching and driver repositioning (also known as fleet management) in the face of spatially and temporally varying supply and demand are central to a ride-sharing platform marketplace. Hand-crafting heuristic solutions that account for the dynamics in these resource allocation problems is difficult, and may be better handled by an end-to-end machine learning method. Previous works have ex…

    Submitted 25 November, 2019; originally announced November 2019.

    Comments: ICDM 2019 Short Paper

  14. arXiv:1910.13616

    cs.LG cs.AI stat.ML

    Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation

    Authors: Risto Vuorio, Shao-Hua Sun, Hexiang Hu, Joseph J. Lim

    Abstract: Model-agnostic meta-learners aim to acquire meta-learned parameters from similar tasks to adapt to novel tasks from the same distribution with few gradient updates. With the flexibility in the choice of models, those frameworks demonstrate appealing performance on a variety of domains such as few-shot image classification and reinforcement learning. However, one important limitation of such framew…

    Submitted 29 October, 2019; originally announced October 2019.

  15. arXiv:1812.07172

    cs.LG cs.AI stat.ML

    Toward Multimodal Model-Agnostic Meta-Learning

    Authors: Risto Vuorio, Shao-Hua Sun, Hexiang Hu, Joseph J. Lim

    Abstract: Gradient-based meta-learners such as MAML are able to learn a meta-prior from similar tasks to adapt to novel tasks from the same distribution with few gradient updates. One important limitation of such frameworks is that they seek a common initialization shared across the entire task distribution, substantially limiting the diversity of the task distributions that they are able to learn from. In…

    Submitted 18 December, 2018; originally announced December 2018.

  16. arXiv:1806.06928

    cs.LG cs.AI cs.NE stat.ML

    Meta Continual Learning

    Authors: Risto Vuorio, Dong-Yeon Cho, Daejoong Kim, Jiwon Kim

    Abstract: Using neural networks in practical settings would benefit from the ability of the networks to learn new tasks throughout their lifetimes without forgetting the previous tasks. This ability is limited in the current deep neural networks by a problem called catastrophic forgetting, where training on new tasks tends to severely degrade performance on previous tasks. One way to lessen the impact of th…

    Submitted 11 June, 2018; originally announced June 2018.