Understanding Game-Playing Agents with Natural Language Annotations

Tomlin, Nicholas; He, Andre; Klein, Dan

arXiv:2204.07531 (cs)

[Submitted on 15 Apr 2022]

Abstract: We present a new dataset containing 10K human-annotated games of Go and show how these natural language annotations can be used as a tool for model interpretability. Given a board state and its associated comment, our approach uses linear probing to predict mentions of domain-specific terms (e.g., ko, atari) from the intermediate state representations of game-playing agents like AlphaGo Zero. We find these game concepts are nontrivially encoded in two distinct policy networks, one trained via imitation learning and another trained via reinforcement learning. Furthermore, mentions of domain-specific terms are most easily predicted from the later layers of both models, suggesting that these policy networks encode high-level abstractions similar to those used in the natural language annotations.
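
As a rough illustration of the linear probing idea described in the abstract, the sketch below fits a logistic-regression probe that predicts a binary "keyword mentioned" label from a feature vector. Everything in it is an assumption made for illustration: the function names, the dimensions, and the synthetic vectors standing in for a policy network's intermediate activations are hypothetical, and none of it is the authors' code, data, or model. The shuffled-label comparison at the end is a generic sanity check, not a procedure taken from the paper.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_activations(n, dim, seed=0):
    """Hypothetical stand-in for layer activations paired with keyword labels.

    Label 1 means "the comment mentions the keyword". Here it is simply the
    sign of a hidden linear direction, so the concept is linearly decodable.
    """
    rng = random.Random(seed)
    direction = [rng.gauss(0, 1) for _ in range(dim)]
    feats, labels = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(dim)]
        score = sum(d * xi for d, xi in zip(direction, x))
        feats.append(x)
        labels.append(1 if score > 0 else 0)
    return feats, labels


def train_linear_probe(feats, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim = len(feats[0])
    n = len(feats)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(feats, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = sigmoid(z) - y  # gradient of the log loss w.r.t. z
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def accuracy(weights, bias, feats, labels):
    """Fraction of examples whose thresholded probe output matches the label."""
    correct = 0
    for x, y in zip(feats, labels):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        correct += int((z > 0) == (y == 1))
    return correct / len(labels)


if __name__ == "__main__":
    feats, labels = make_synthetic_activations(400, 8)
    w, b = train_linear_probe(feats[:300], labels[:300])
    print("probe accuracy:", accuracy(w, b, feats[300:], labels[300:]))

    # Assumed sanity check: a probe trained on shuffled labels should score
    # near chance on the real held-out labels.
    shuffled = labels[:300]
    random.Random(1).shuffle(shuffled)
    w0, b0 = train_linear_probe(feats[:300], shuffled)
    print("shuffled-label accuracy:", accuracy(w0, b0, feats[300:], labels[300:]))
```

In a real setting, the feature vectors would be activations extracted from a chosen layer of the policy network for each annotated board state, and one probe would be trained per layer and per term so that held-out accuracy can be compared across layers.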

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2204.07531 [cs.CL]
	(or arXiv:2204.07531v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2204.07531

Submission history

From: Nicholas Tomlin
[v1] Fri, 15 Apr 2022 16:11:08 UTC (3,938 KB)
