Approximate Shielding of Atari Agents for Safe Exploration

Goodall, Alexander W.; Belardinelli, Francesco

arXiv:2304.11104 (cs)

[Submitted on 21 Apr 2023]

Abstract: Balancing exploration and conservatism in the constrained setting is an important problem if we are to use reinforcement learning for meaningful tasks in the real world. In this paper, we propose a principled algorithm for safe exploration based on the concept of shielding. Previous approaches to shielding assume access to a safety-relevant abstraction of the environment or a high-fidelity simulator. Instead, our work is based on latent shielding, another approach that leverages world models to verify policy roll-outs in the latent space of a learned dynamics model. Our novel algorithm builds on this previous work, using safety critics and additional features to improve the stability and farsightedness of the algorithm. We demonstrate the effectiveness of our approach by running experiments on a small set of Atari games with state-dependent safety labels. We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations, and in some cases improves the speed of convergence and quality of the final agent.
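The general idea of checking policy roll-outs in a model before acting can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a hand-written toy dynamics function in place of a learned world model, omits safety critics entirely, and every name in it (`estimate_violation_prob`, `shielded_action`, `toy_step`, the threshold and horizon values) is hypothetical. It only shows the basic pattern of estimating how often simulated roll-outs reach an unsafe state and falling back to a backup policy when that estimate is too high.

```python
import random
from typing import Callable, Optional

State = int
Action = int


def estimate_violation_prob(
    state: State,
    first_action: Action,
    policy: Callable[[State], Action],
    step: Callable[[State, Action, random.Random], State],
    is_unsafe: Callable[[State], bool],
    horizon: int = 3,
    n_rollouts: int = 50,
    rng: Optional[random.Random] = None,
) -> float:
    """Monte Carlo estimate of the chance that taking `first_action` and then
    following `policy` reaches an unsafe state within `horizon` model steps."""
    rng = rng or random.Random(0)
    violations = 0
    for _ in range(n_rollouts):
        s = step(state, first_action, rng)
        violated = is_unsafe(s)
        for _ in range(horizon - 1):
            if violated:
                break
            s = step(s, policy(s), rng)
            violated = is_unsafe(s)
        violations += int(violated)
    return violations / n_rollouts


def shielded_action(
    state: State,
    policy: Callable[[State], Action],
    backup_policy: Callable[[State], Action],
    step: Callable[[State, Action, random.Random], State],
    is_unsafe: Callable[[State], bool],
    threshold: float = 0.1,
    horizon: int = 3,
    n_rollouts: int = 50,
) -> Action:
    """Return the task policy's action unless its estimated violation
    probability exceeds `threshold`; otherwise use the backup policy."""
    proposed = policy(state)
    p = estimate_violation_prob(
        state, proposed, policy, step, is_unsafe, horizon, n_rollouts
    )
    return backup_policy(state) if p > threshold else proposed


# Hypothetical toy problem: positions on a line, unsafe at position >= 5.
def toy_step(s: State, a: Action, rng: random.Random) -> State:
    return s + a  # deterministic stand-in for a learned dynamics model


def toy_unsafe(s: State) -> bool:
    return s >= 5


def go_right(s: State) -> Action:
    return 1  # task policy: always move towards the unsafe region


def go_left(s: State) -> Action:
    return -1  # backup policy: always move away from it


if __name__ == "__main__":
    for start in (0, 3):
        action = shielded_action(start, go_right, go_left, toy_step, toy_unsafe)
        print(f"state={start} -> action={action}")
```

In this toy setting the shield leaves the task policy alone while the unsafe region is further away than the look-ahead horizon and overrides it once a violation falls inside the horizon. A fixed horizon limits how far ahead such a check can see, which is the kind of short-sightedness the abstract says safety critics are meant to address.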

Comments:	Accepted for presentation at the ALA workshop as part of AAMAS 2023
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2304.11104 [cs.AI]
	(or arXiv:2304.11104v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2304.11104

Submission history

From: Alexander W. Goodall
[v1] Fri, 21 Apr 2023 16:19:54 UTC (3,418 KB)
