On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

Zhang, Junyu; Ni, Chengzhuo; Yu, Zheng; Szepesvari, Csaba; Wang, Mengdi


arXiv:2102.08607 (cs)

[Submitted on 17 Feb 2021 (v1), last revised 27 May 2021 (this version, v2)]



Abstract: Policy gradient (PG) gives rise to a rich class of reinforcement learning (RL) methods. Recently, there has been an emerging trend to accelerate existing PG methods such as REINFORCE with variance reduction techniques. However, all existing variance-reduced PG methods rely heavily on an uncheckable importance weight assumption made for every single iteration of the algorithms. In this paper, a simple gradient truncation mechanism is proposed to address this issue. Moreover, we design a Truncated Stochastic Incremental Variance-Reduced Policy Gradient (TSIVR-PG) method, which is able to maximize not only a cumulative sum of rewards but also a general utility function over a policy's long-term visiting distribution. We show an $\tilde{\mathcal{O}}(\epsilon^{-3})$ sample complexity for TSIVR-PG to find an $\epsilon$-stationary policy. By assuming overparameterization of the policy and exploiting the hidden convexity of the problem, we further show that TSIVR-PG converges to a global $\epsilon$-optimal policy with $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples.
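
The abstract does not spell out the truncation rule. As a minimal sketch, assuming that "gradient truncation" means capping the length of each parameter update (presumably so that consecutive policies stay close enough for importance weights to remain controlled), one step might look like the following. The function name truncated_step and the parameters step_size and radius are hypothetical and chosen only for illustration; they are not notation from the abstract.

```python
import math
from typing import List


def truncated_step(theta: List[float], grad: List[float],
                   step_size: float, radius: float) -> List[float]:
    """Gradient-ascent step whose length is capped at `radius`.

    Hypothetical helper: the names and the exact capping rule are
    assumptions made for illustration, not taken from the abstract.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if step_size * norm <= radius:
        scale = step_size        # the ordinary step is already short enough
    else:
        scale = radius / norm    # rescale so the step has length `radius`
    return [t + scale * g for t, g in zip(theta, grad)]


theta = [0.0, 0.0]
# Small gradient estimate: the plain step is kept unchanged.
small = truncated_step(theta, [0.3, 0.4], step_size=0.1, radius=0.5)
# Large gradient estimate: the step is shortened to length `radius`.
large = truncated_step(theta, [30.0, 40.0], step_size=0.1, radius=0.5)
```

Under this assumed rule, the distance between consecutive parameter vectors never exceeds radius, regardless of how large a single gradient estimate happens to be.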

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2102.08607 [cs.LG]
	(or arXiv:2102.08607v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2102.08607

Submission history

From: Junyu Zhang
[v1] Wed, 17 Feb 2021 07:06:19 UTC (172 KB)
[v2] Thu, 27 May 2021 22:08:32 UTC (450 KB)
