Showing 1–2 of 2 results for author: GX-Chen, A

Search v0.5.6 released 2020-02-24

arXiv:2208.12345 [pdf, other]

cs.LG cs.AI

Light-weight probing of unsupervised representations for Reinforcement Learning

Authors: Wancong Zhang, Anthony GX-Chen, Vlad Sobal, Yann LeCun, Nicolas Carion

Abstract: Unsupervised visual representation learning offers the opportunity to leverage large corpora of unlabeled trajectories to form useful visual representations, which can benefit the training of reinforcement learning (RL) algorithms. However, evaluating the fitness of such representations requires training RL algorithms which is computationally intensive and has high variance outcomes. Inspired by the vision community, we study whether linear probing can be a proxy evaluation task for the quality of unsupervised RL representation. Specifically, we probe for the observed reward in a given state and the action of an expert in a given state, both of which are generally applicable to many RL domains. Through rigorous experimentation, we show that the probing tasks are strongly rank correlated with the downstream RL performance on the Atari100k Benchmark, while having lower variance and up to 600x lower computational cost. This provides a more efficient method for exploring the space of pretraining algorithms and identifying promising pretraining recipes without the need to run RL evaluations for every setting. Leveraging this framework, we further improve existing self-supervised learning (SSL) recipes for RL, highlighting the importance of the forward model, the size of the visual backbone, and the precise formulation of the unsupervised objective.

Submitted 31 May, 2024; v1 submitted 25 August, 2022; originally announced August 2022.

Comments: To appear in the proceedings of the Reinforcement Learning Conference 2024

arXiv:2201.01836 [pdf, other]

cs.LG cs.AI

A Generalized Bootstrap Target for Value-Learning, Efficiently Combining Value and Feature Predictions

Authors: Anthony GX-Chen, Veronica Chelu, Blake A. Richards, Joelle Pineau

Abstract: Estimating value functions is a core component of reinforcement learning algorithms. Temporal difference (TD) learning algorithms use bootstrapping, i.e. they update the value function toward a learning target using value estimates at subsequent time-steps. Alternatively, the value function can be updated toward a learning target constructed by separately predicting successor features (SF)--a policy-dependent model--and linearly combining them with instantaneous rewards. We focus on bootstrapping targets used when estimating value functions, and propose a new backup target, the $η$-return mixture, which implicitly combines value-predictive knowledge (used by TD methods) with (successor) feature-predictive knowledge--with a parameter $η$ capturing how much to rely on each. We illustrate that incorporating predictive knowledge through an $ηγ$-discounted SF model makes more efficient use of sampled experience, compared to either extreme, i.e. bootstrapping entirely on the value function estimate, or bootstrapping on the product of separately estimated successor features and instantaneous reward models. We empirically show this approach leads to faster policy evaluation and better control performance, for tabular and nonlinear function approximations, indicating scalability and generality.

Submitted 5 January, 2022; originally announced January 2022.

Comments: 18 pages, 6 figures, 2 tables. Preprint. Accepted by AAAI-22
