Showing 1–35 of 35 results for author: Laroche, R

  1. arXiv:2310.17139  [pdf, other]

    cs.LG

    Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning

    Authors: Hongyu Zang, Xin Li, Leiji Zhang, Yang Liu, Baigui Sun, Riashat Islam, Remi Tachet des Combes, Romain Laroche

    Abstract: While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, their performance has even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings, but falter in offline tasks. Our analysis…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  2. arXiv:2310.04413  [pdf, other]

    cs.LG cs.AI

    Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

    Authors: Zhang-Wei Hong, Aviral Kumar, Sathwik Karnik, Abhishek Bhandwaldar, Akash Srivastava, Joni Pajarinen, Romain Laroche, Abhishek Gupta, Pulkit Agrawal

    Abstract: Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirica…

    Submitted 11 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: Accepted NeurIPS 2023

    Journal ref: NeurIPS 2023

  3. arXiv:2310.00229  [pdf, other]

    cs.AI cs.LG

    Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning

    Authors: Mingde Zhao, Safa Alver, Harm van Seijen, Romain Laroche, Doina Precup, Yoshua Bengio

    Abstract: Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning framework utilizing spatio-temporal abstractions to generalize better in novel situations. It automatically decomposes the given task into smaller, more manageable subtasks, and thus enables sparse decision-making and focused computation on the relevant parts of the environment. The decomposition relies…

    Submitted 16 March, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: ICLR 2024 Camera-Ready

  4. arXiv:2306.13085  [pdf, other]

    cs.LG cs.AI

    Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting

    Authors: Zhang-Wei Hong, Pulkit Agrawal, Rémi Tachet des Combes, Romain Laroche

    Abstract: Most offline reinforcement learning (RL) algorithms return a target policy maximizing a trade-off between (1) the expected performance gain over the behavior policy that collected the dataset, and (2) the risk stemming from the out-of-distribution-ness of the induced state-action occupancy. It follows that the performance of the target policy is strongly related to the performance of the behavior…

    Submitted 22 June, 2023; originally announced June 2023.

    Journal ref: Conference paper at ICLR 2023

  5. arXiv:2305.16338  [pdf, other]

    cs.LG cs.AI cs.CL

    Think Before You Act: Decision Transformers with Working Memory

    Authors: Jikun Kang, Romain Laroche, Xingdi Yuan, Adam Trischler, Xue Liu, Jie Fu

    Abstract: Decision Transformer-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performance relies on massive data and computation. We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training. As a result, training on a new task may deteriorate the model's performance o…

    Submitted 28 May, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted at ICML 2024

  6. arXiv:2211.00863  [pdf, other]

    cs.LG cs.AI

    Behavior Prior Representation learning for Offline Reinforcement Learning

    Authors: Hongyu Zang, Xin Li, Jie Yu, Chen Liu, Riashat Islam, Remi Tachet Des Combes, Romain Laroche

    Abstract: Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent only has access to a fixed dataset without environment interactions. Past works have proposed common workarounds based on the pre-training of state representations, followed by policy training. In this work, we introduce a simple, yet effective approach for learning state representations. Our…

    Submitted 27 February, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: ICLR 2023

  7. arXiv:2211.00247  [pdf, other]

    cs.LG cs.AI

    Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning

    Authors: Riashat Islam, Hongyu Zang, Anirudh Goyal, Alex Lamb, Kenji Kawaguchi, Xin Li, Romain Laroche, Yoshua Bengio, Remi Tachet Des Combes

    Abstract: Goal-conditioned reinforcement learning (RL) is a promising direction for training agents that are capable of solving multiple tasks and reach a diverse set of objectives. How to \textit{specify} and \textit{ground} these goals in such a way that we can both reliably reach goals during training as well as generalize to new goals during evaluation remains an open area of research. Defining goals in…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: NeurIPS 2022

  8. arXiv:2210.06468  [pdf, other]

    cs.AI cs.CL cs.LG

    Contrastive Multimodal Learning for Emergence of Graphical Sensory-Motor Communication

    Authors: Tristan Karch, Yoann Lemesle, Romain Laroche, Clément Moulin-Frier, Pierre-Yves Oudeyer

    Abstract: In this paper, we investigate whether artificial agents can develop a shared language in an ecological setting where communication relies on a sensory-motor channel. To this end, we introduce the Graphical Referential Game (GREG) where a speaker must produce a graphical utterance to name a visual referent object while a listener has to select the corresponding object among distractor referents, gi…

    Submitted 14 February, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

  9. arXiv:2206.01251  [pdf, other]

    cs.LG cs.AI cs.CV

    Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods

    Authors: Yuchen Lu, Zhen Liu, Aristide Baratin, Romain Laroche, Aaron Courville, Alessandro Sordoni

    Abstract: We address the problem of evaluating the quality of self-supervised learning (SSL) models without access to supervised labels, while being agnostic to the architecture, learning algorithm or data manipulation used during training. We argue that representations can be evaluated through the lens of expressiveness and learnability. We propose to use the Intrinsic Dimension (ID) to assess expressivene…

    Submitted 14 November, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

    Journal ref: TMLR 2023 -- Transactions of Machine Learning Research, 11/2023

  10. arXiv:2206.01085  [pdf, other]

    cs.LG

    Incorporating Explicit Uncertainty Estimates into Deep Offline Reinforcement Learning

    Authors: David Brandfonbrener, Remi Tachet des Combes, Romain Laroche

    Abstract: Most theoretically motivated work in the offline reinforcement learning setting requires precise uncertainty estimates. This requirement restricts the algorithms derived in that work to the tabular and linear settings where such estimates exist. In this work, we develop a novel method for incorporating scalable uncertainty estimates into an offline reinforcement learning algorithm called deep-SPIB…

    Submitted 2 June, 2022; originally announced June 2022.

  11. arXiv:2206.01079  [pdf, other]

    cs.LG

    When does return-conditioned supervised learning work for offline reinforcement learning?

    Authors: David Brandfonbrener, Alberto Bietti, Jacob Buckman, Romain Laroche, Joan Bruna

    Abstract: Several recent works have proposed a class of algorithms for the offline reinforcement learning (RL) problem that we will refer to as return-conditioned supervised learning (RCSL). RCSL algorithms learn the distribution of actions conditioned on both the state and the return of the trajectory. Then they define a policy by conditioning on achieving high return. In this paper, we provide a rigorous…

    Submitted 11 January, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

  12. arXiv:2205.13950  [pdf, other]

    cs.LG eess.SY

    Non-Markovian policies occupancy measures

    Authors: Romain Laroche, Remi Tachet des Combes, Jacob Buckman

    Abstract: A central object of study in Reinforcement Learning (RL) is the Markovian policy, in which an agent's actions are chosen from a memoryless probability distribution, conditioned only on its current state. The family of Markovian policies is broad enough to be interesting, yet simple enough to be amenable to analysis. However, RL often involves more complex policies: ensembles of policies, policies…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: 9p+sup. mat

  13. arXiv:2203.04806  [pdf, other]

    cs.CL

    One-Shot Learning from a Demonstration with Hierarchical Latent Language

    Authors: Nathaniel Weir, Xingdi Yuan, Marc-Alexandre Côté, Matthew Hausknecht, Romain Laroche, Ida Momennejad, Harm Van Seijen, Benjamin Van Durme

    Abstract: Humans have the capability, aided by the expressive compositionality of their language, to learn quickly by demonstration. They are able to describe unseen task-performing procedures and generalize their execution to other contexts. In this work, we introduce DescribeWorld, an environment designed to test this sort of generalization skill in grounded agents, where tasks are linguistically and proc…

    Submitted 9 March, 2022; originally announced March 2022.

  14. arXiv:2202.07496  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms

    Authors: Romain Laroche, Remi Tachet

    Abstract: In Reinforcement Learning, the optimal action at a given state is dependent on policy decisions at subsequent states. As a consequence, the learning targets evolve with time and the policy optimization process must be efficient at unlearning what it previously learnt. In this paper, we discover that the policy gradient theorem prescribes policy updates that are slow to unlearn because of their str…

    Submitted 15 February, 2022; originally announced February 2022.

    Comments: 9p+appendix, accepted to AISTATS 2022

  15. arXiv:2202.06828  [pdf, other]

    cs.LG

    On the Convergence of SARSA with Linear Function Approximation

    Authors: Shangtong Zhang, Remi Tachet, Romain Laroche

    Abstract: SARSA, a classical on-policy control algorithm for reinforcement learning, is known to chatter when combined with linear function approximation: SARSA does not diverge but oscillates in a bounded region. However, little is known about how fast SARSA converges to that region and how large the region is. In this paper, we make progress towards this open problem by showing the convergence rate of pro…

    Submitted 12 May, 2023; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: ICML 2023

  16. arXiv:2111.02997  [pdf, other]

    cs.LG

    Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

    Authors: Shangtong Zhang, Remi Tachet, Romain Laroche

    Abstract: In this paper, we establish the global optimality and convergence rate of an off-policy actor critic algorithm in the tabular setting without using density ratio to correct the discrepancy between the state distribution of the behavior policy and that of the target policy. Our work goes beyond existing works on the optimality of policy gradient methods in that existing works use the exact policy g…

    Submitted 24 October, 2022; v1 submitted 4 November, 2021; originally announced November 2021.

    Comments: Journal of Machine Learning Research 2022

  17. arXiv:2109.14733  [pdf, other]

    cs.LG cs.AI

    Batched Bandits with Crowd Externalities

    Authors: Romain Laroche, Othmane Safsafi, Raphael Feraud, Nicolas Broutin

    Abstract: In Batched Multi-Armed Bandits (BMAB), the policy is not allowed to be updated at each time step. Usually, the setting asserts a maximum number of allowed policy updates and the algorithm schedules them so that to minimize the expected regret. In this paper, we describe a novel setting for BMAB, with the following twist: the timing of the policy update is not controlled by the BMAB algorithm, but…

    Submitted 29 September, 2021; originally announced September 2021.

    Comments: 31 pages

  18. arXiv:2109.14727  [pdf, other]

    cs.LG cs.AI

    Dr Jekyll and Mr Hyde: the Strange Case of Off-Policy Policy Updates

    Authors: Romain Laroche, Remi Tachet

    Abstract: The policy gradient theorem states that the policy should only be updated in states that are visited by the current policy, which leads to insufficient planning in the off-policy states, and thus to convergence to suboptimal policies. We tackle this planning issue by extending the policy gradient theory to policy updates with respect to any state density. Under these generalized policy updates, we…

    Submitted 29 September, 2021; originally announced September 2021.

    Comments: accepted to NeurIPS as a poster

  19. arXiv:2109.06232  [pdf, other]

    cs.CL cs.IT cs.NE

    The Emergence of the Shape Bias Results from Communicative Efficiency

    Authors: Eva Portelance, Michael C. Frank, Dan Jurafsky, Alessandro Sordoni, Romain Laroche

    Abstract: By the age of two, children tend to assume that new word categories are based on objects' shape, rather than their color or texture; this assumption is called the shape bias. They are thought to learn this bias by observing that their caregiver's language is biased towards shape based categories. This presents a chicken and egg problem: if the shape bias must be present in the language in order fo…

    Submitted 14 September, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: Accepted at CoNLL 2021

  20. arXiv:2106.00099  [pdf, other]

    cs.LG

    Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs

    Authors: Harsh Satija, Philip S. Thomas, Joelle Pineau, Romain Laroche

    Abstract: We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning (RL) setting. We consider the scenario where: (i) we have a dataset collected under a known baseline policy, (ii) multiple reward signals are received from the environment inducing as many objectives to optimize. We present an SPI formulation for this RL setting that takes into account the…

    Submitted 29 October, 2021; v1 submitted 31 May, 2021; originally announced June 2021.

  21. arXiv:2010.01069  [pdf, other]

    cs.LG cs.AI

    A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms

    Authors: Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes

    Abstract: We investigate the discounting mismatch in actor-critic algorithm implementations from a representation learning perspective. Theoretically, actor-critic algorithms usually have discounting for both actor and critic, i.e., there is a $γ^t$ term in the actor update for the transition observed at time $t$ in a trajectory and the critic is a discounted value function. Practitioners, however, usually…

    Submitted 26 January, 2022; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: AAMAS 2022

  22. arXiv:2002.10948  [pdf, other]

    q-bio.NC cs.AI cs.LG eess.SY

    Reinforcement Learning Framework for Deep Brain Stimulation Study

    Authors: Dmitrii Krylov, Remi Tachet, Romain Laroche, Michael Rosenblum, Dmitry V. Dylov

    Abstract: Malfunctioning neurons in the brain sometimes operate synchronously, reportedly causing many neurological diseases, e.g. Parkinson's. Suppression and control of this collective synchronous activity are therefore of great importance for neuroscience, and can only rely on limited engineering trials due to the need to experiment with live human brains. We present the first Reinforcement Learning gym…

    Submitted 22 February, 2020; originally announced February 2020.

    Comments: 7 pages + 1 references, 7 figures. arXiv admin note: text overlap with arXiv:1909.12154

    Journal ref: IJCAI 2020, pp. 2847-2854

  23. arXiv:2002.09127  [pdf, other]

    cs.CL cs.LG

    Learning Dynamic Belief Graphs to Generalize on Text-Based Games

    Authors: Ashutosh Adhikari, Xingdi Yuan, Marc-Alexandre Côté, Mikuláš Zelinka, Marc-Antoine Rondeau, Romain Laroche, Pascal Poupart, Jian Tang, Adam Trischler, William L. Hamilton

    Abstract: Playing text-based games requires skills in processing natural language and sequential decision making. Achieving human-level performance on text-based games remains an open challenge, and prior research has largely relied on hand-crafted structured representations and heuristics. In this work, we investigate how an agent can plan and generalize in text-based games using graph-structured represent…

    Submitted 11 May, 2021; v1 submitted 20 February, 2020; originally announced February 2020.

    Comments: Bug fixed in Table 1

  24. arXiv:1910.09532  [pdf, other]

    cs.CL cs.LG

    Building Dynamic Knowledge Graphs from Text-based Games

    Authors: Mikuláš Zelinka, Xingdi Yuan, Marc-Alexandre Côté, Romain Laroche, Adam Trischler

    Abstract: We are interested in learning how to update Knowledge Graphs (KG) from text. In this preliminary work, we propose a novel Sequence-to-Sequence (Seq2Seq) architecture to generate elementary KG operations. Furthermore, we introduce a new dataset for KG extraction built upon text-based game transitions (over 300k data points). We conduct experiments and discuss the results.

    Submitted 23 January, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

    Comments: NeurIPS 2019, Graph Representation Learning (GRL) Workshop

  25. arXiv:1909.05236  [pdf, other]

    cs.LG cs.AI stat.ML

    Safe Policy Improvement with an Estimated Baseline Policy

    Authors: Thiago D. Simão, Romain Laroche, Rémi Tachet des Combes

    Abstract: Previous work has shown the unreliability of existing algorithms in the batch Reinforcement Learning setting, and proposed the theoretically-grounded Safe Policy Improvement with Baseline Bootstrapping (SPIBB) fix: reproduce the baseline policy in the uncertain state-action pairs, in order to control the variance on the trained policy performance. However, in many real-world applications such as d…

    Submitted 28 December, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

    Comments: Published at AAMAS 2020

  26. arXiv:1907.05079  [pdf, other]

    cs.LG cs.AI stat.ML

    Safe Policy Improvement with Soft Baseline Bootstrapping

    Authors: Kimia Nadjahi, Romain Laroche, Rémi Tachet des Combes

    Abstract: Batch Reinforcement Learning (Batch RL) consists in training a policy using trajectories collected with another policy, called the behavioural policy. Safe policy improvement (SPI) provides guarantees with high probability that the trained policy performs better than the behavioural policy, also called baseline in this setting. Previous work shows that the SPI objective improves mean performance a…

    Submitted 11 July, 2019; originally announced July 2019.

    Comments: Accepted paper at ECML-PKDD2019

  27. arXiv:1903.01004  [pdf, other]

    cs.LG cs.AI stat.ML

    Budgeted Reinforcement Learning in Continuous State Space

    Authors: Nicolas Carrara, Edouard Leurent, Romain Laroche, Tanguy Urvoy, Odalric-Ambrym Maillard, Olivier Pietquin

    Abstract: A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below an - adjustable - threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state-of-the-art to…

    Submitted 27 May, 2019; v1 submitted 3 March, 2019; originally announced March 2019.

    Comments: N. Carrara and E. Leurent have equally contributed

  28. arXiv:1811.07763  [pdf, other]

    cs.LG stat.ML

    Decentralized Exploration in Multi-Armed Bandits -- Extended version

    Authors: Raphaël Féraud, Réda Alami, Romain Laroche

    Abstract: We consider the decentralized exploration problem: a set of players collaborate to identify the best arm by asynchronously interacting with the same stochastic environment. The objective is to insure privacy in the best arm identification problem between asynchronous, collaborative, and thrifty players. In the context of a digital service, we advocate that this decentralized approach allows a good…

    Submitted 16 January, 2023; v1 submitted 19 November, 2018; originally announced November 2018.

  29. arXiv:1806.11525  [pdf, other]

    cs.CL cs.LG

    Counting to Explore and Generalize in Text-based Games

    Authors: Xingdi Yuan, Marc-Alexandre Côté, Alessandro Sordoni, Romain Laroche, Remi Tachet des Combes, Matthew Hausknecht, Adam Trischler

    Abstract: We propose a recurrent RL agent with an episodic exploration mechanism that helps discovering good policies in text-based game environments. We show promising results on a set of generated text-based games of varying difficulty where the goal is to collect a coin located at the end of a chain of rooms. In contrast to previous text-based RL approaches, we observe that our agent learns policies that…

    Submitted 6 March, 2019; v1 submitted 29 June, 2018; originally announced June 2018.

  30. arXiv:1712.06924  [pdf, other]

    cs.LG cs.AI stat.ML

    Safe Policy Improvement with Baseline Bootstrapping

    Authors: Romain Laroche, Paul Trichelair, Rémi Tachet des Combes

    Abstract: This paper considers Safe Policy Improvement (SPI) in Batch Reinforcement Learning (Batch RL): from a fixed dataset and without direct access to the true environment, train a policy that is guaranteed to perform at least as well as the baseline policy used to collect the data. Our approach, called SPI with Baseline Bootstrapping (SPIBB), is inspired by the knows-what-it-knows paradigm: it bootstra…

    Submitted 7 June, 2019; v1 submitted 19 December, 2017; originally announced December 2017.

    Comments: accepted as a long oral at ICML2019

  31. arXiv:1707.01450  [pdf, ps, other]

    cs.AI cs.CL

    The Complex Negotiation Dialogue Game

    Authors: Romain Laroche

    Abstract: This position paper formalises an abstract model for complex negotiation dialogue. This model is to be used for the benchmark of optimisation algorithms ranging from Reinforcement Learning to Stochastic Games, through Transfer Learning, One-Shot Learning or others.

    Submitted 5 July, 2017; originally announced July 2017.

    Comments: Position paper for Sigdial/Semdial 2017 special session on negotiation dialogue

  32. arXiv:1706.04208  [pdf, other]

    cs.LG

    Hybrid Reward Architecture for Reinforcement Learning

    Authors: Harm van Seijen, Mehdi Fatemi, Joshua Romoff, Romain Laroche, Tavian Barnes, Jeffrey Tsang

    Abstract: One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network. While this approach works well in many domains, in domains where the optimal value function cannot easily be reduced to a low-dimensional representation, learning can be very…

    Submitted 27 November, 2017; v1 submitted 13 June, 2017; originally announced June 2017.

  33. arXiv:1704.00756  [pdf, other]

    cs.LG cs.AI stat.ML

    Multi-Advisor Reinforcement Learning

    Authors: Romain Laroche, Mehdi Fatemi, Joshua Romoff, Harm van Seijen

    Abstract: We consider tackling a single-agent RL problem by distributing it to $n$ learners. These learners, called advisors, endeavour to solve the problem from a different focus. Their advice, taking the form of action values, is then communicated to an aggregator, which is in control of the system. We show that the local planning method for the advisors is critical and that none of the ones found in the…

    Submitted 14 November, 2017; v1 submitted 3 April, 2017; originally announced April 2017.

    Comments: Submitted at ICLR2018

  34. arXiv:1701.08810  [pdf, other]

    stat.ML cs.AI cs.LG math.OC

    Reinforcement Learning Algorithm Selection

    Authors: Romain Laroche, Raphael Feraud

    Abstract: This paper formalises the problem of online algorithm selection in the context of Reinforcement Learning. The setup is as follows: given an episodic task and a finite number of off-policy RL algorithms, a meta-algorithm has to decide which RL algorithm is in control during the next episode so as to maximize the expected return. The article presents a novel meta-algorithm, called Epochal Stochastic…

    Submitted 14 November, 2017; v1 submitted 30 January, 2017; originally announced January 2017.

  35. arXiv:1612.05159  [pdf, other]

    cs.LG cs.AI

    Separation of Concerns in Reinforcement Learning

    Authors: Harm van Seijen, Mehdi Fatemi, Joshua Romoff, Romain Laroche

    Abstract: In this paper, we propose a framework for solving a single-agent task by using multiple agents, each focusing on different aspects of the task. This approach has two main advantages: 1) it allows for training specialized agents on different parts of the task, and 2) it provides a new way to transfer knowledge, by transferring trained agents. Our framework generalizes the traditional hierarchical d…

    Submitted 28 March, 2017; v1 submitted 15 December, 2016; originally announced December 2016.