Policy-Value Alignment and Robustness in Search-based Multi-Agent Learning

Grupen, Niko A.; Hanlon, Michael; Hao, Alexis; Lee, Daniel D.; Selman, Bart

arXiv:2301.11857 (cs)

[Submitted on 27 Jan 2023 (v1), last revised 6 Feb 2023 (this version, v2)]

Abstract: Large-scale AI systems that combine search and learning have reached super-human levels of performance in game-playing, but have also been shown to fail in surprising ways. The brittleness of such models limits their efficacy and trustworthiness in real-world deployments. In this work, we systematically study one such algorithm, AlphaZero, and identify two phenomena related to the nature of exploration. First, we find evidence of policy-value misalignment -- for many states, AlphaZero's policy and value predictions contradict each other, revealing a tension between accurate move-selection and value estimation in AlphaZero's objective. Further, we find inconsistency within AlphaZero's value function, which causes it to generalize poorly, despite its policy playing an optimal strategy. From these insights we derive VISA-VIS: a novel method that improves policy-value alignment and value robustness in AlphaZero. Experimentally, we show that our method reduces policy-value misalignment by up to 76%, reduces value generalization error by up to 50%, and reduces average value error by up to 55%.
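
The abstract does not spell out how policy-value misalignment is measured, so the following is only an illustrative sketch under stated assumptions, not the authors' metric. It assumes a two-player zero-sum game in which the value function scores a state from the perspective of the player to move, and it counts a state as misaligned when the policy's most probable move differs from the move whose successor state the value function rates best for the mover. The function name `misalignment_rate` and all of its parameters are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Sequence

State = Hashable
Action = Hashable


def misalignment_rate(
    states: Iterable[State],
    legal_actions: Callable[[State], Sequence[Action]],
    policy_prob: Callable[[State, Action], float],
    next_state: Callable[[State, Action], State],
    value: Callable[[State], float],
) -> float:
    """Fraction of states where the policy's top move differs from the
    move preferred by a one-step lookahead on the value function.

    Assumption (not from the paper): value(s) is scored for the player
    to move in s, so the mover prefers the successor that is worst for
    the opponent, i.e. the one with the lowest value.
    """
    total = 0
    mismatched = 0
    for s in states:
        actions = legal_actions(s)
        if not actions:
            continue  # terminal state: nothing to compare
        policy_move = max(actions, key=lambda a: policy_prob(s, a))
        value_move = min(actions, key=lambda a: value(next_state(s, a)))
        total += 1
        mismatched += policy_move != value_move
    return mismatched / total if total else 0.0


# Toy data (entirely made up): two states with two legal moves each.
toy_policy = {("s0", "a"): 0.9, ("s0", "b"): 0.1,
              ("s1", "a"): 0.2, ("s1", "b"): 0.8}
toy_value = {"s0/a": -0.5, "s0/b": 0.5, "s1/a": -0.7, "s1/b": 0.3}

rate = misalignment_rate(
    states=["s0", "s1"],
    legal_actions=lambda s: ["a", "b"],
    policy_prob=lambda s, a: toy_policy[(s, a)],
    next_state=lambda s, a: f"{s}/{a}",
    value=lambda s: toy_value[s],
)
print(rate)  # 0.5: the two heads agree in s0 and disagree in s1
```

In the toy data, both heads pick move `a` in `s0`, while in `s1` the policy prefers `b` but the value lookahead prefers `a`, so half of the states are counted as misaligned.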

Comments:	9 pages, 5 figures
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Cite as:	arXiv:2301.11857 [cs.AI]
	(or arXiv:2301.11857v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2301.11857

Submission history

From: Niko Grupen
[v1] Fri, 27 Jan 2023 17:05:29 UTC (1,168 KB)
[v2] Mon, 6 Feb 2023 15:59:53 UTC (1,171 KB)
