Off-Belief Learning

Hu, Hengyuan; Lerer, Adam; Cui, Brandon; Wu, David; Pineda, Luis; Brown, Noam; Foerster, Jakob

arXiv:2103.04000v3 (cs)

[Submitted on 6 Mar 2021 (v1), revised 29 Jun 2021 (this version, v3), latest version 18 Aug 2021 (v5)]

Abstract: The standard problem setting in Dec-POMDPs is self-play, where the goal is to find a set of policies that play optimally together. Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning based on fragile assumptions about other agents' actions, and thus fail when paired with humans or independently trained agents at test time. To address this, we present off-belief learning (OBL). At each timestep, OBL agents follow a policy $\pi_1$ that is optimized assuming that past actions were taken by a given, fixed policy ($\pi_0$) but that future actions will be taken by $\pi_1$. When $\pi_0$ is uniform random, OBL converges to an optimal policy that does not rely on inferences based on other agents' behavior (an optimal grounded policy). OBL can be iterated in a hierarchy, where the optimal policy from one level becomes the input to the next, thereby introducing multi-level cognitive reasoning in a controlled manner. Unlike existing approaches, which may converge to any equilibrium policy, OBL converges to a unique policy, making it suitable for zero-shot coordination (ZSC). OBL can be scaled to high-dimensional settings with a fictitious transition mechanism and shows strong performance in both a toy setting and Hanabi, the benchmark problem for human-AI coordination and ZSC.
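The claim that a uniform random $\pi_0$ removes inferences based on other agents' behavior can be illustrated with a small Bayes-rule calculation. The sketch below is not the authors' implementation and does not perform any learning; the hidden states (`red`, `blue`), the actions (`hint_a`, `hint_b`), the prior, and the helper `belief_given_action` are all hypothetical names chosen for illustration. It only shows that when past actions are assumed to come from a uniform random policy, observing an action leaves the belief over the hidden state equal to the prior, whereas a convention-based policy makes the action informative.

```python
from fractions import Fraction


def belief_given_action(prior, policy, action):
    """Posterior over hidden states after observing `action`,
    assuming the acting agent followed `policy` (Bayes' rule).

    prior:  dict mapping state -> probability
    policy: dict mapping state -> dict mapping action -> probability
    """
    joint = {s: prior[s] * policy[s][action] for s in prior}
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}


# Hypothetical prior over a hidden state (e.g. the color of a hidden card).
prior = {"red": Fraction(1, 4), "blue": Fraction(3, 4)}
actions = ["hint_a", "hint_b"]

# pi_0 as uniform random: the action is independent of the hidden state.
uniform_pi0 = {s: {a: Fraction(1, len(actions)) for a in actions} for s in prior}

# A hypothetical arbitrary convention: the action encodes the hidden state.
convention = {
    "red": {"hint_a": Fraction(1), "hint_b": Fraction(0)},
    "blue": {"hint_a": Fraction(0), "hint_b": Fraction(1)},
}

grounded_belief = belief_given_action(prior, uniform_pi0, "hint_a")
convention_belief = belief_given_action(prior, convention, "hint_a")

print("belief under uniform pi_0:", grounded_belief)
print("belief under convention:  ", convention_belief)
```

Under the uniform policy the posterior matches the prior exactly, so a best response to that belief can only use grounded information; under the convention the same observed action fully determines the hidden state, which is the kind of inference that fails when a partner does not share the convention.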

Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2103.04000 [cs.AI]
	(or arXiv:2103.04000v3 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2103.04000

Submission history

From: Hengyuan Hu
[v1] Sat, 6 Mar 2021 01:09:55 UTC (1,774 KB)
[v2] Wed, 16 Jun 2021 03:26:08 UTC (2,135 KB)
[v3] Tue, 29 Jun 2021 18:05:19 UTC (2,136 KB)
[v4] Wed, 4 Aug 2021 21:12:37 UTC (2,136 KB)
[v5] Wed, 18 Aug 2021 01:59:19 UTC (2,136 KB)
