Showing 1–8 of 8 results for author: Romoff, J

  1. arXiv:2007.02786  [pdf, other]

    cs.LG stat.ML

    TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?

    Authors: Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau

    Abstract: We investigate whether Jacobi preconditioning, accounting for the bootstrap term in temporal difference (TD) learning, can help boost performance of adaptive optimizers. Our method, TDprop, computes a per parameter learning rate based on the diagonal preconditioning of the TD update rule. We show how this can be used in both $n$-step returns and TD($λ$). Our theoretical findings demonstrate that i…

    Submitted 6 July, 2020; originally announced July 2020.

    Comments: Presented at the Theoretical Foundations of Reinforcement Learning workshop at ICML 2020

  2. arXiv:1906.04585  [pdf, other]

    cs.LG cs.AI cs.MA math.OC stat.ML

    Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning

    Authors: Mahmoud Assran, Joshua Romoff, Nicolas Ballas, Joelle Pineau, Michael Rabbat

    Abstract: Multi-simulator training has contributed to the recent success of Deep Reinforcement Learning by stabilizing learning and allowing for higher training throughputs. We propose Gossip-based Actor-Learner Architectures (GALA) where several actor-learners (such as A2C agents) are organized in a peer-to-peer communication topology, and exchange information through asynchronous gossip in order to take a…

    Submitted 21 April, 2020; v1 submitted 9 June, 2019; originally announced June 2019.

    Journal ref: Advances in Neural Information Processing Systems (2019) 13299-13309

  3. arXiv:1902.01883  [pdf, other]

    cs.LG cs.AI stat.ML

    Separating value functions across time-scales

    Authors: Joshua Romoff, Peter Henderson, Ahmed Touati, Emma Brunskill, Joelle Pineau, Yann Ollivier

    Abstract: In many finite horizon episodic reinforcement learning (RL) settings, it is desirable to optimize for the undiscounted return - in settings like Atari, for instance, the goal is to collect the most points while staying alive in the long run. Yet, it may be difficult (or even intractable) mathematically to learn with this target. As such, temporal discounting is often applied to optimize over a sho…

    Submitted 24 May, 2019; v1 submitted 5 February, 2019; originally announced February 2019.

    Comments: Full version accepted to ICML 2019. Extended abstract also to be presented at RLDM 2019

  4. arXiv:1810.11187  [pdf, other]

    cs.LG cs.AI cs.MA stat.ML

    TarMAC: Targeted Multi-Agent Communication

    Authors: Abhishek Das, Théophile Gervet, Joshua Romoff, Dhruv Batra, Devi Parikh, Michael Rabbat, Joelle Pineau

    Abstract: We propose a targeted communication architecture for multi-agent reinforcement learning, where agents learn both what messages to send and whom to address them to while performing cooperative tasks in partially-observable environments. This targeting behavior is learnt solely from downstream task-specific reward without any communication supervision. We additionally augment this with a multi-round…

    Submitted 21 February, 2020; v1 submitted 26 October, 2018; originally announced October 2018.

    Comments: ICML 2019

  5. arXiv:1810.02525  [pdf, other]

    cs.LG cs.AI stat.ML

    Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods

    Authors: Peter Henderson, Joshua Romoff, Joelle Pineau

    Abstract: Recent analyses of certain gradient descent optimization methods have shown that performance can degrade in some settings - such as with stochasticity or implicit momentum. In deep reinforcement learning (Deep RL), such optimization methods are often used for training neural networks via the temporal difference error or policy gradient. As an agent improves over time, the optimization target chang…

    Submitted 5 October, 2018; originally announced October 2018.

    Comments: Accepted at the European Workshop on Reinforcement Learning 2018 (EWRL14)

  6. arXiv:1806.02315  [pdf, other]

    cs.LG stat.ML

    Randomized Value Functions via Multiplicative Normalizing Flows

    Authors: Ahmed Touati, Harsh Satija, Joshua Romoff, Joelle Pineau, Pascal Vincent

    Abstract: Randomized value functions offer a promising approach towards the challenge of efficient exploration in complex environments with high dimensional state and action spaces. Unlike traditional point estimate methods, randomized value functions maintain a posterior distribution over action-space values. This prevents the agent's behavior policy from prematurely exploiting early estimates and falling…

    Submitted 28 June, 2019; v1 submitted 6 June, 2018; originally announced June 2018.

    Journal ref: UAI 2019: Conference on Uncertainty in Artificial Intelligence 2019

  7. arXiv:1805.03359  [pdf, other]

    cs.LG cs.AI stat.ML

    Reward Estimation for Variance Reduction in Deep Reinforcement Learning

    Authors: Joshua Romoff, Peter Henderson, Alexandre Piché, Vincent Francois-Lavet, Joelle Pineau

    Abstract: Reinforcement Learning (RL) agents require the specification of a reward signal for learning behaviours. However, introduction of corrupt or stochastic rewards can yield high variance in learning. Such corruption may be a direct result of goal misspecification, randomness in the reward signal, or correlation of the reward with external factors that are not known to the agent. Corruption or stochas…

    Submitted 7 November, 2018; v1 submitted 8 May, 2018; originally announced May 2018.

    Comments: Version 1 as appears in the International Conference on Learning Representations (ICLR) 2018 Workshop Track; Version 2 as appears in the Proceedings of The 2nd Conference on Robot Learning

  8. arXiv:1704.00756  [pdf, other]

    cs.LG cs.AI stat.ML

    Multi-Advisor Reinforcement Learning

    Authors: Romain Laroche, Mehdi Fatemi, Joshua Romoff, Harm van Seijen

    Abstract: We consider tackling a single-agent RL problem by distributing it to $n$ learners. These learners, called advisors, endeavour to solve the problem from a different focus. Their advice, taking the form of action values, is then communicated to an aggregator, which is in control of the system. We show that the local planning method for the advisors is critical and that none of the ones found in the…

    Submitted 14 November, 2017; v1 submitted 3 April, 2017; originally announced April 2017.

    Comments: Submitted to ICLR 2018