Constrained Reinforcement Learning via Dissipative Saddle Flow Dynamics

Zheng, Tianqi; You, Pengcheng; Mallada, Enrique

arXiv:2212.01505 (cs)

[Submitted on 3 Dec 2022]

Abstract: In constrained reinforcement learning (C-RL), an agent seeks to learn from the environment a policy that maximizes the expected cumulative reward while satisfying minimum requirements on secondary cumulative rewards. Several algorithms rooted in sample-based primal-dual methods have recently been proposed to solve this problem in policy space. However, such methods are based on stochastic gradient descent-ascent algorithms whose trajectories are connected to the optimal policy only after a mixing output stage that depends on the algorithm's history. As a result, there is a mismatch between the behavioral policy and the optimal one. In this work, we propose a novel algorithm for C-RL that does not suffer from these limitations. Leveraging recent results on regularized saddle-flow dynamics, we develop a novel stochastic gradient descent-ascent algorithm whose trajectories converge to the optimal policy almost surely.
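The contrast the abstract draws, between descent-ascent iterates that do not settle by themselves and regularized dynamics whose trajectories converge, can be illustrated on a toy saddle problem. The sketch below is not the paper's algorithm: it is deterministic, uses the bilinear Lagrangian L(x, y) = x * y as a stand-in for a C-RL Lagrangian, and assumes one particular form of regularization (auxiliary variables z and w attached to x and y through quadratic penalties with weight rho). The function names, step size, and rho are all hypothetical choices made for illustration.

```python
import math


def plain_gda(x, y, step=0.05, iters=2000):
    """Euler-discretized gradient descent-ascent on L(x, y) = x * y."""
    for _ in range(iters):
        # dL/dx = y (descend in x), dL/dy = x (ascend in y), updated simultaneously.
        x, y = x - step * y, y + step * x
    return x, y


def regularized_gda(x, y, rho=1.0, step=0.05, iters=2000):
    """Descent-ascent on the same L with auxiliary variables z, w.

    Assumed (illustrative) regularization: x is pulled toward z and
    y toward w, while z and w slowly track x and y.
    """
    z, w = x, y  # auxiliary variables start at the primal/dual iterates
    for _ in range(iters):
        dx = -y - rho * (x - z)  # descent in x plus pull toward z
        dz = rho * (x - z)       # z tracks x
        dy = x - rho * (y - w)   # ascent in y plus pull toward w
        dw = rho * (y - w)       # w tracks y
        x, z = x + step * dx, z + step * dz
        y, w = y + step * dy, w + step * dw
    return x, y


if __name__ == "__main__":
    x0, y0 = 1.0, 1.0
    print("plain GDA distance to saddle:      ", math.hypot(*plain_gda(x0, y0)))
    print("regularized GDA distance to saddle:", math.hypot(*regularized_gda(x0, y0)))
```

On this toy problem the plain iterates spiral away from the saddle point (0, 0), while the regularized iterates contract toward it. Under these assumptions the last iterate itself is usable, with no averaging over the algorithm's history.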

Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Cite as: arXiv:2212.01505 [cs.LG] (or arXiv:2212.01505v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2212.01505

Submission history

From: Tianqi Zheng
[v1] Sat, 3 Dec 2022 01:54:55 UTC (536 KB)
