Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning

Roderick, Melrose; Manek, Gaurav; Berkenkamp, Felix; Kolter, J. Zico

arXiv:2311.14885 (cs)

[Submitted on 25 Nov 2023]

Abstract: A key problem in off-policy Reinforcement Learning (RL) is the mismatch, or distribution shift, between the dataset and the distribution over states and actions visited by the learned policy. This problem is exacerbated in the fully offline setting. The main approach to correct this shift has been through importance sampling, which leads to high-variance gradients. Other approaches, such as conservatism or behavior regularization, regularize the policy at the cost of performance. In this paper, we propose a new approach for stable off-policy Q-Learning. Our method, Projected Off-Policy Q-Learning (POP-QL), is a novel actor-critic algorithm that simultaneously reweights off-policy samples and constrains the policy to prevent divergence and reduce value-approximation error. In our experiments, POP-QL not only shows competitive performance on standard benchmarks, but also outperforms competing methods in tasks where the data-collection policy is significantly sub-optimal.

Comments:	10 pages
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2311.14885 [cs.LG]
	(or arXiv:2311.14885v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2311.14885

Submission history

From: Melrose Roderick
[v1] Sat, 25 Nov 2023 00:30:58 UTC (1,093 KB)
