Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation

Tang, Yunhao; Kozuno, Tadashi; Rowland, Mark; Munos, Rémi; Valko, Michal

arXiv:2106.13125v1 (cs)

[Submitted on 24 Jun 2021 (this version), latest version 3 Nov 2021 (v2)]

Abstract: Model-agnostic meta-reinforcement learning requires estimating the Hessian matrix of value functions. This is challenging from an implementation perspective, as repeatedly differentiating policy gradient estimates may lead to biased Hessian estimates. In this work, we provide a unifying framework for estimating higher-order derivatives of value functions, based on off-policy evaluation. Our framework interprets a number of prior approaches as special cases and elucidates the bias and variance trade-off of Hessian estimates. This framework also opens the door to a new family of estimates, which can be easily implemented with auto-differentiation libraries, and lead to performance gains in practice.
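
The bias mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's estimator: the two-armed bandit, the rewards R1 and R0, and all function names are hypothetical, and central finite differences stand in for an auto-differentiation library. Expectations are computed exactly, so the comparison isolates bias and says nothing about variance. It contrasts the usual policy-gradient surrogate (log-probability times reward) with an importance-ratio surrogate, which is one simple form of off-policy evaluation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical two-armed bandit: action 1 pays R1, action 0 pays R0.
R1, R0 = 1.0, 0.3

def value(theta):
    """Exact value V(theta) = sum_a pi_theta(a) * r(a), with pi_theta(1) = sigmoid(theta)."""
    p = sigmoid(theta)
    return p * R1 + (1.0 - p) * R0

def log_surrogate(theta, theta0):
    """E_{a ~ pi_theta0}[log pi_theta(a) * r(a)]: the usual policy-gradient surrogate."""
    mu, p = sigmoid(theta0), sigmoid(theta)
    return mu * math.log(p) * R1 + (1.0 - mu) * math.log(1.0 - p) * R0

def ratio_surrogate(theta, theta0):
    """E_{a ~ pi_theta0}[(pi_theta(a) / pi_theta0(a)) * r(a)]: importance-sampling estimate."""
    mu, p = sigmoid(theta0), sigmoid(theta)
    return mu * (p / mu) * R1 + (1.0 - mu) * ((1.0 - p) / (1.0 - mu)) * R0

def d1(f, x, h=1e-4):
    """Central finite-difference first derivative (stand-in for autodiff)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def d2(f, x, h=1e-4):
    """Central finite-difference second derivative (stand-in for autodiff)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def compare(theta0):
    """Return (first, second) derivatives at theta0 for the value and both surrogates."""
    log_f = lambda t: log_surrogate(t, theta0)
    ratio_f = lambda t: ratio_surrogate(t, theta0)
    return {
        "true": (d1(value, theta0), d2(value, theta0)),
        "log": (d1(log_f, theta0), d2(log_f, theta0)),
        "ratio": (d1(ratio_f, theta0), d2(ratio_f, theta0)),
    }

if __name__ == "__main__":
    for name, (g, hess) in compare(0.5).items():
        print(f"{name:>5}: first derivative {g:+.5f}, second derivative {hess:+.5f}")
```

In this toy setting both surrogates reproduce the true first derivative at theta0, but differentiating the log-probability surrogate a second time drops the term that comes from differentiating the sampling distribution, so its second derivative disagrees with that of V. The importance-ratio surrogate equals V(theta) in expectation as a function of theta, so its derivatives of every order agree with the true ones.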

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2106.13125 [cs.LG]
	(or arXiv:2106.13125v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2106.13125

Submission history

From: Yunhao Tang
[v1] Thu, 24 Jun 2021 15:58:01 UTC (2,047 KB)
[v2] Wed, 3 Nov 2021 10:04:06 UTC (5,332 KB)
