Policy-Guided Diffusion

Jackson, Matthew Thomas; Matthews, Michael Tryfan; Lu, Cong; Ellis, Benjamin; Whiteson, Shimon; Foerster, Jakob

Abstract:In many real-world settings, agents must learn from an offline dataset gathered by some prior behavior policy. Such a setting naturally leads to distribution shift between the behavior policy and the target policy being trained - requiring policy conservatism to avoid instability and overestimation bias. Autoregressive world models offer a different solution to this by generating synthetic, on-policy experience. However, in practice, model rollouts must be severely truncated to avoid compounding error. As an alternative, we propose policy-guided diffusion. Our method uses diffusion models to generate entire trajectories under the behavior distribution, applying guidance from the target policy to move synthetic experience further on-policy. We show that policy-guided diffusion models a regularized form of the target distribution that balances action likelihood under both the target and behavior policies, leading to plausible trajectories with high target policy probability, while retaining a lower dynamics error than an offline world model baseline. Using synthetic experience from policy-guided diffusion as a drop-in substitute for real data, we demonstrate significant improvements in performance across a range of standard offline reinforcement learning algorithms and environments. Our approach provides an effective alternative to autoregressive offline world models, opening the door to the controllable generation of synthetic training data.

Comments:	Previously at the NeurIPS 2023 Workshop on Robot Learning
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Cite as:	arXiv:2404.06356 [cs.LG]
	(or arXiv:2404.06356v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2404.06356

Computer Science > Machine Learning

Title:Policy-Guided Diffusion

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators