
Showing 1–8 of 8 results for author: Machado, M C

Searching in archive stat.
  1. arXiv:2108.05828

    cs.LG cs.AI stat.ML

    A general class of surrogate functions for stable and efficient reinforcement learning

    Authors: Sharan Vaswani, Olivier Bachem, Simone Totaro, Robert Mueller, Shivam Garg, Matthieu Geist, Marlos C. Machado, Pablo Samuel Castro, Nicolas Le Roux

    Abstract: Common policy gradient methods rely on the maximization of a sequence of surrogate functions. In recent years, many such surrogate functions have been proposed, most without strong theoretical guarantees, leading to algorithms such as TRPO, PPO or MPO. Rather than design yet another surrogate function, we instead propose a general framework (FMA-PG) based on functional mirror ascent that gives ris…

    Submitted 30 October, 2023; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: Fixed minor typos

  2. arXiv:2101.05265

    cs.LG cs.AI stat.ML

    Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

    Authors: Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare

    Abstract: Reinforcement learning methods trained on few environments rarely learn policies that generalize to unseen environments. To improve generalization, we incorporate the inherent sequential structure in reinforcement learning into the representation learning process. This approach is orthogonal to recent approaches, which rarely exploit this structure explicitly. Specifically, we introduce a theoreti…

    Submitted 18 March, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

    Comments: ICLR 2021 (Spotlight). Website: https://agarwl.github.io/pse

  3. arXiv:2008.13773

    cs.LG stat.ML

    Beyond variance reduction: Understanding the true impact of baselines on policy optimization

    Authors: Wesley Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux

    Abstract: Bandit and reinforcement learning (RL) problems can often be framed as optimization problems where the goal is to maximize average performance while having access only to stochastic estimates of the true gradient. Traditionally, stochastic optimization theory predicts that learning dynamics are governed by the curvature of the loss function and the noise of the gradient estimates. In this paper we…

    Submitted 19 February, 2021; v1 submitted 31 August, 2020; originally announced August 2020.

  4. arXiv:2006.11266

    cs.LG cs.AI stat.ML

    An operator view of policy gradient methods

    Authors: Dibya Ghosh, Marlos C. Machado, Nicolas Le Roux

    Abstract: We cast policy gradient methods as the repeated application of two operators: a policy improvement operator $\mathcal{I}$, which maps any policy $\pi$ to a better one $\mathcal{I}\pi$, and a projection operator $\mathcal{P}$, which finds the best approximation of $\mathcal{I}\pi$ in the set of realizable policies. We use this framework to introduce operator-based versions of traditional policy gradient…

    Submitted 22 October, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020

  5. arXiv:1908.02388

    cs.LG stat.ML

    Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment

    Authors: Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare

    Abstract: This paper provides an empirical evaluation of recently developed exploration algorithms within the Arcade Learning Environment (ALE). We study the use of different reward bonuses that incentivize exploration in reinforcement learning. We do so by fixing the learning algorithm used and focusing only on the impact of the different exploration bonuses on the agent's performance. We use Rainbow, the s…

    Submitted 24 September, 2021; v1 submitted 6 August, 2019; originally announced August 2019.

    Comments: Accepted at the second Exploration in Reinforcement Learning Workshop at the 36th International Conference on Machine Learning, Long Beach, California. The full version (arXiv:2109.11052) was published as a conference paper at ICLR 2020

  6. arXiv:1810.00123

    cs.LG cs.AI stat.ML

    Generalization and Regularization in DQN

    Authors: Jesse Farebrother, Marlos C. Machado, Michael Bowling

    Abstract: Deep reinforcement learning algorithms have shown an impressive ability to learn complex control policies in high-dimensional tasks. However, despite the ever-increasing performance on popular benchmarks, policies learned by deep reinforcement learning algorithms can struggle to generalize when evaluated in remarkably similar environments. In this paper we propose a protocol to evaluate generaliza…

    Submitted 17 January, 2020; v1 submitted 28 September, 2018; originally announced October 2018.

    Comments: Earlier versions of this work were presented both at the NeurIPS'18 Deep Reinforcement Learning Workshop and the 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM'19)

  7. arXiv:1807.11622

    cs.LG cs.AI stat.ML

    Count-Based Exploration with the Successor Representation

    Authors: Marlos C. Machado, Marc G. Bellemare, Michael Bowling

    Abstract: In this paper we introduce a simple approach for exploration in reinforcement learning (RL) that allows us to develop theoretically justified algorithms in the tabular case but that is also extendable to settings where function approximation is required. Our approach is based on the successor representation (SR), which was originally introduced as a representation defining state generalization by…

    Submitted 26 November, 2019; v1 submitted 30 July, 2018; originally announced July 2018.

    Comments: This paper appears in the Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI 2020)

  8. arXiv:1803.09001

    cs.LG cs.AI stat.ML

    Accelerating Learning in Constructive Predictive Frameworks with the Successor Representation

    Authors: Craig Sherstan, Marlos C. Machado, Patrick M. Pilarski

    Abstract: Here we propose using the successor representation (SR) to accelerate learning in a constructive knowledge system based on general value functions (GVFs). In real-world settings like robotics for unstructured and dynamic environments, it is infeasible to model all meaningful aspects of a system and its environment by hand due to both complexity and size. Instead, robots must be capable of learning…

    Submitted 23 March, 2018; originally announced March 2018.