Search | arXiv e-print repository

Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks

Authors: Maxime Chevalier-Boisvert, Bolun Dai, Mark Towers, Rodrigo de Lazcano, Lucas Willems, Salem Lahlou, Suman Pal, Pablo Samuel Castro, Jordan Terry

Abstract: We present the Minigrid and Miniworld libraries which provide a suite of goal-oriented 2D and 3D environments. The libraries were explicitly created with a minimalistic design paradigm to allow users to rapidly develop new environments for a wide range of research-specific needs. As a result, both have received widescale adoption by the RL community, facilitating research in a wide range of areas.… ▽ More We present the Minigrid and Miniworld libraries which provide a suite of goal-oriented 2D and 3D environments. The libraries were explicitly created with a minimalistic design paradigm to allow users to rapidly develop new environments for a wide range of research-specific needs. As a result, both have received widescale adoption by the RL community, facilitating research in a wide range of areas. In this paper, we outline the design philosophy, environment details, and their world generation API. We also showcase the additional capabilities brought by the unified API between Minigrid and Miniworld through case studies on transfer learning (for both RL agents and humans) between the different observation spaces. The source code of Minigrid and Miniworld can be found at https://github.com/Farama-Foundation/{Minigrid, Miniworld} along with their documentation at https://{minigrid, miniworld}.farama.org/. △ Less

Submitted 23 June, 2023; originally announced June 2023.

arXiv:2008.04391 [pdf, other]

DeepDrummer : Generating Drum Loops using Deep Learning and a Human in the Loop

Authors: Guillaume Alain, Maxime Chevalier-Boisvert, Frederic Osterrath, Remi Piche-Taillefer

Abstract: DeepDrummer is a drum loop generation tool that uses active learning to learn the preferences (or current artistic intentions) of a human user from a small number of interactions. The principal goal of this tool is to enable an efficient exploration of new musical ideas. We train a deep neural network classifier on audio data and show how it can be used as the core component of a system that gener… ▽ More DeepDrummer is a drum loop generation tool that uses active learning to learn the preferences (or current artistic intentions) of a human user from a small number of interactions. The principal goal of this tool is to enable an efficient exploration of new musical ideas. We train a deep neural network classifier on audio data and show how it can be used as the core component of a system that generates drum loops based on few prior beliefs as to how these loops should be structured. We aim to build a system that can converge to meaningful results even with a limited number of interactions with the user. This property enables our method to be used from a cold start situation (no pre-existing dataset), or starting from a collection of audio samples provided by the user. In a proof of concept study with 25 participants, we empirically demonstrate that DeepDrummer is able to converge towards the preference of our subjects after a small number of interactions. △ Less

Submitted 26 August, 2020; v1 submitted 10 August, 2020; originally announced August 2020.

arXiv:2007.12770 [pdf, other]

BabyAI 1.1

Authors: David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio

Abstract: The BabyAI platform is designed to measure the sample efficiency of training an agent to follow grounded-language instructions. BabyAI 1.0 presents baseline results of an agent trained by deep imitation or reinforcement learning. BabyAI 1.1 improves the agent's architecture in three minor ways. This increases reinforcement learning sample efficiency by up to 3 times and improves imitation learning… ▽ More The BabyAI platform is designed to measure the sample efficiency of training an agent to follow grounded-language instructions. BabyAI 1.0 presents baseline results of an agent trained by deep imitation or reinforcement learning. BabyAI 1.1 improves the agent's architecture in three minor ways. This increases reinforcement learning sample efficiency by up to 3 times and improves imitation learning performance on the hardest level from 77 % to 90.4 %. We hope that these improvements increase the computational efficiency of BabyAI experiments and help users design better agents. △ Less

Submitted 24 July, 2020; originally announced July 2020.

Comments: 9 pages, 1 figure, technical report

arXiv:2002.00412 [pdf, other]

Combating False Negatives in Adversarial Imitation Learning

Authors: Konrad Zolna, Chitwan Saharia, Leonard Boussioux, David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio

Abstract: In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior. However, as the trained policy learns to be more successful, the negative examples (the ones produced by the agent) become increasingly similar to expert ones. Despite the fact that the task is successfully accomplished in some of the agent's t… ▽ More In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior. However, as the trained policy learns to be more successful, the negative examples (the ones produced by the agent) become increasingly similar to expert ones. Despite the fact that the task is successfully accomplished in some of the agent's trajectories, the discriminator is trained to output low values for them. We hypothesize that this inconsistent training signal for the discriminator can impede its learning, and consequently leads to worse overall performance of the agent. We show experimental evidence for this hypothesis and that the 'False Negatives' (i.e. successful agent episodes) significantly hinder adversarial imitation learning, which is the first contribution of this paper. Then, we propose a method to alleviate the impact of false negatives and test it on the BabyAI environment. This method consistently improves sample efficiency over the baselines by at least an order of magnitude. △ Less

Submitted 2 February, 2020; originally announced February 2020.

Comments: This is an extended version of the student abstract published at 34th AAAI Conference on Artificial Intelligence

arXiv:2001.00271 [pdf, other]

Options of Interest: Temporal Abstraction with Interest Functions

Authors: Khimya Khetarpal, Martin Klissarov, Maxime Chevalier-Boisvert, Pierre-Luc Bacon, Doina Precup

Abstract: Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. The options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set, be… ▽ More Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. The options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set, because of difficulty in learning it from data. We provide a generalization of initiation sets suitable for general function approximation, by defining an interest function associated with an option. We derive a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture. We investigate how interest functions can be leveraged to learn interpretable and reusable temporal abstractions. We demonstrate the efficacy of the proposed approach through quantitative and qualitative results, in both discrete and continuous environments. △ Less

Submitted 1 January, 2020; originally announced January 2020.

Comments: To appear in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)

arXiv:1912.00444 [pdf, other]

Automated curriculum generation for Policy Gradients from Demonstrations

Authors: Anirudh Srinivasan, Dzmitry Bahdanau, Maxime Chevalier-Boisvert, Yoshua Bengio

Abstract: In this paper, we present a technique that improves the process of training an agent (using RL) for instruction following. We develop a training curriculum that uses a nominal number of expert demonstrations and trains the agent in a manner that draws parallels from one of the ways in which humans learn to perform complex tasks, i.e by starting from the goal and working backwards. We test our meth… ▽ More In this paper, we present a technique that improves the process of training an agent (using RL) for instruction following. We develop a training curriculum that uses a nominal number of expert demonstrations and trains the agent in a manner that draws parallels from one of the ways in which humans learn to perform complex tasks, i.e by starting from the goal and working backwards. We test our method on the BabyAI platform and show an improvement in sample efficiency for some of its tasks compared to a PPO (proximal policy optimization) baseline. △ Less

Submitted 1 December, 2019; originally announced December 2019.

Comments: Accepted to Deep RL Workshop at NeurIPS 2019

arXiv:1911.12825 [pdf, other]

Option-Critic in Cooperative Multi-agent Systems

Authors: Jhelum Chakravorty, Nadeem Ward, Julien Roy, Maxime Chevalier-Boisvert, Sumana Basu, Andrei Lupu, Doina Precup

Abstract: In this paper, we investigate learning temporal abstractions in cooperative multi-agent systems, using the options framework (Sutton et al, 1999). First, we address the planning problem for the decentralized POMDP represented by the multi-agent system, by introducing a \emph{common information approach}. We use the notion of \emph{common beliefs} and broadcasting to solve an equivalent centralized… ▽ More In this paper, we investigate learning temporal abstractions in cooperative multi-agent systems, using the options framework (Sutton et al, 1999). First, we address the planning problem for the decentralized POMDP represented by the multi-agent system, by introducing a \emph{common information approach}. We use the notion of \emph{common beliefs} and broadcasting to solve an equivalent centralized POMDP problem. Then, we propose the Distributed Option Critic (DOC) algorithm, which uses centralized option evaluation and decentralized intra-option improvement. We theoretically analyze the asymptotic convergence of DOC and build a new multi-agent environment to demonstrate its validity. Our experiments empirically show that DOC performs competitively against baselines and scales with the number of agents. △ Less

Submitted 19 March, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

arXiv:1911.03594 [pdf, other]

Robo-PlaNet: Learning to Poke in a Day

Authors: Maxime Chevalier-Boisvert, Guillaume Alain, Florian Golemo, Derek Nowrouzezahrai

Abstract: Recently, the Deep Planning Network (PlaNet) approach was introduced as a model-based reinforcement learning method that learns environment dynamics directly from pixel observations. This architecture is useful for learning tasks in which either the agent does not have access to meaningful states (like position/velocity of robotic joints) or where the observed states significantly deviate from the… ▽ More Recently, the Deep Planning Network (PlaNet) approach was introduced as a model-based reinforcement learning method that learns environment dynamics directly from pixel observations. This architecture is useful for learning tasks in which either the agent does not have access to meaningful states (like position/velocity of robotic joints) or where the observed states significantly deviate from the physical state of the agent (which is commonly the case in low-cost robots in the form of backlash or noisy joint readings). PlaNet, by design, interleaves phases of training the dynamics model with phases of collecting more data on the target environment, leading to long training times. In this work, we introduce Robo-PlaNet, an asynchronous version of PlaNet. This algorithm consistently reaches higher performance in the same amount of time, which we demonstrate in both a simulated and a real robotic experiment. △ Less

Submitted 19 November, 2019; v1 submitted 8 November, 2019; originally announced November 2019.

Comments: 4 pages, 3 figures. Version 2: added reference and acknowledgement

arXiv:1810.08272 [pdf, other]

BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning

Authors: Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, Yoshua Bengio

Abstract: Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of the current learning methods, this goal may require substantial research efforts. Here, we introduce the BabyAI research platform to support investigations towards including humans in the loop for grounded languag… ▽ More Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of the current learning methods, this goal may require substantial research efforts. Here, we introduce the BabyAI research platform to support investigations towards including humans in the loop for grounded language learning. The BabyAI platform comprises an extensible suite of 19 levels of increasing difficulty. The levels gradually lead the agent towards acquiring a combinatorially rich synthetic language which is a proper subset of English. The platform also provides a heuristic expert agent for the purpose of simulating a human teacher. We report baseline results and estimate the amount of human involvement that would be required to train a neural network-based agent on some of the BabyAI levels. We put forward strong evidence that current deep learning methods are not yet sufficiently sample efficient when it comes to learning a language with compositional properties. △ Less

Submitted 19 December, 2019; v1 submitted 18 October, 2018; originally announced October 2018.

Comments: Accepted at ICLR 2019

arXiv:1511.02956 [pdf, other]

Interprocedural Type Specialization of JavaScript Programs Without Type Analysis

Authors: Maxime Chevalier-Boisvert, Marc Feeley

Abstract: Dynamically typed programming languages such as Python and JavaScript defer type checking to run time. VM implementations can improve performance by eliminating redundant dynamic type checks. However, type inference analyses are often costly and involve tradeoffs between compilation time and resulting precision. This has lead to the creation of increasingly complex multi-tiered VM architectures.… ▽ More Dynamically typed programming languages such as Python and JavaScript defer type checking to run time. VM implementations can improve performance by eliminating redundant dynamic type checks. However, type inference analyses are often costly and involve tradeoffs between compilation time and resulting precision. This has lead to the creation of increasingly complex multi-tiered VM architectures. Lazy basic block versioning is a simple JIT compilation technique which effectively removes redundant type checks from critical code paths. This novel approach lazily generates type-specialized versions of basic blocks on-the-fly while propagating context-dependent type information. This approach does not require the use of costly program analyses, is not restricted by the precision limitations of traditional type analyses. This paper extends lazy basic block versioning to propagate type information interprocedurally, across function call boundaries. Our implementation in a JavaScript JIT compiler shows that across 26 benchmarks, interprocedural basic block versioning eliminates more type tag tests on average than what is achievable with static type analysis without resorting to code transformations. On average, 94.3% of type tag tests are eliminated, yielding speedups of up to 56%. We also show that our implementation is able to outperform Truffle/JS on several benchmarks, both in terms of execution time and compilation time. △ Less

Submitted 9 November, 2015; originally announced November 2015.

Comments: 10 pages, 10 figures, submitted to CGO 2016

ACM Class: D.3.4

arXiv:1507.02437 [pdf, other]

Extending Basic Block Versioning with Typed Object Shapes

Authors: Maxime Chevalier-Boisvert, Marc Feeley

Abstract: Typical JavaScript (JS) programs feature a large number of object property accesses. Hence, fast property reads and writes are crucial for good performance. Unfortunately, many (often redundant) dynamic checks are implied in each property access and the semantic complexity of JS makes it difficult to optimize away these tests through program analysis. We introduce two techniques to effectively eli… ▽ More Typical JavaScript (JS) programs feature a large number of object property accesses. Hence, fast property reads and writes are crucial for good performance. Unfortunately, many (often redundant) dynamic checks are implied in each property access and the semantic complexity of JS makes it difficult to optimize away these tests through program analysis. We introduce two techniques to effectively eliminate a large proportion of dynamic checks related to object property accesses. Typed shapes enable code specialization based on object property types without potentially complex and expensive analyses. Shape propagation allows the elimination of redundant shape checks in inline caches. These two techniques combine particularly well with Basic Block Versioning (BBV), but should be easily adaptable to tracing Just-In-Time (JIT) compilers and method JITs with type feedback. To assess the effectiveness of the techniques presented, we have implemented them in Higgs, a type-specializing JIT compiler for JS. The techniques are compared to a baseline using polymorphic Inline Caches (PICs), as well as commercial JS implementations. Empirical results show that across the 26 benchmarks tested, these techniques eliminate on average 48% of type tests, reduce code size by 17% and reduce execution time by 25%. On several benchmarks, Higgs performs better than current production JS virtual machines △ Less

Submitted 9 July, 2015; originally announced July 2015.

ACM Class: D.3.4

arXiv:1411.0352 [pdf, other]

Simple and Effective Type Check Removal through Lazy Basic Block Versioning

Authors: Maxime Chevalier-Boisvert, Marc Feeley

Abstract: Dynamically typed programming languages such as JavaScript and Python defer type checking to run time. In order to maximize performance, dynamic language VM implementations must attempt to eliminate redundant dynamic type checks. However, type inference analyses are often costly and involve tradeoffs between compilation time and resulting precision. This has lead to the creation of increasingly co… ▽ More Dynamically typed programming languages such as JavaScript and Python defer type checking to run time. In order to maximize performance, dynamic language VM implementations must attempt to eliminate redundant dynamic type checks. However, type inference analyses are often costly and involve tradeoffs between compilation time and resulting precision. This has lead to the creation of increasingly complex multi-tiered VM architectures. This paper introduces lazy basic block versioning, a simple JIT compilation technique which effectively removes redundant type checks from critical code paths. This novel approach lazily generates type-specialized versions of basic blocks on-the-fly while propagating context-dependent type information. This does not require the use of costly program analyses, is not restricted by the precision limitations of traditional type analyses and avoids the implementation complexity of speculative optimization techniques. We have implemented intraprocedural lazy basic block versioning in a JavaScript JIT compiler. This approach is compared with a classical flow-based type analysis. Lazy basic block versioning performs as well or better on all benchmarks. On average, 71% of type tests are eliminated, yielding speedups of up to 50%. We also show that our implementation generates more efficient machine code than TraceMonkey, a tracing JIT compiler for JavaScript, on several benchmarks. The combination of implementation simplicity, low algorithmic complexity and good run time performance makes basic block versioning attractive for baseline JIT compilers. △ Less

Submitted 29 May, 2015; v1 submitted 2 November, 2014; originally announced November 2014.

ACM Class: D.3.4

arXiv:1401.3041 [pdf, ps, other]

Removing Dynamic Type Tests with Context-Driven Basic Block Versioning

Authors: Maxime Chevalier-Boisvert, Marc Feeley

Abstract: Dynamic ty** is an important feature of dynamic programming languages. Primitive operators such as those for performing arithmetic and comparisons typically operate on a wide variety of in put value types, and as such, must internally implement some form of dynamic type dispatch and type checking. Removing such type tests is important for an efficient implementation. In this paper, we examine… ▽ More Dynamic ty** is an important feature of dynamic programming languages. Primitive operators such as those for performing arithmetic and comparisons typically operate on a wide variety of in put value types, and as such, must internally implement some form of dynamic type dispatch and type checking. Removing such type tests is important for an efficient implementation. In this paper, we examine the effectiveness of a novel approach to reducing the number of dynamically executed type tests called context-driven basic block versioning. This simple technique clones and specializes basic blocks in such a way as to allow the compiler to accumulate type information while machine code is generated, without a separate type analysis pass. The accumulated information allows the removal of some redundant type tests, particularly in performance-critical paths. We have implemented intraprocedural context-driven basic block versioning in a JavaScript JIT compiler. For comparison, we have also implemented a classical flow-based type analysis operating on the same concrete types. Our results show that basic block versioning performs better on most benchmarks and removes a large fraction of type tests at the expense of a moderate code size increase. We believe that this technique offers a good tradeoff between implementation complexity and performance, and is suitable for integration in production JIT compilers. △ Less

Submitted 13 January, 2014; originally announced January 2014.

Comments: 22 pages, 10 figures

Showing 1–13 of 13 results for author: Chevalier-Boisvert, M