Avalon: A Benchmark for RL Generalization Using Procedurally Generated Worlds

Albrecht, Joshua; Fetterman, Abraham J.; Fogelman, Bryden; Kitanidis, Ellie; Wróblewski, Bartosz; Seo, Nicole; Rosenthal, Michael; Knutins, Maksis; Polizzi, Zachary; Simon, James B.; Qiu, Kanjun

arXiv:2210.13417 (cs)

[Submitted on 24 Oct 2022]

Abstract: Despite impressive successes, deep reinforcement learning (RL) systems still fall short of human performance on generalization to new tasks and environments that differ from their training. As a benchmark tailored for studying RL generalization, we introduce Avalon, a set of tasks in which embodied agents in highly diverse procedural 3D worlds must survive by navigating terrain, hunting or gathering food, and avoiding hazards. Avalon is unique among existing RL benchmarks in that the reward function, world dynamics, and action space are the same for every task, with tasks differentiated solely by altering the environment; its 20 tasks, ranging in complexity from eat and throw to hunt and navigate, each create worlds in which the agent must perform specific skills in order to survive. This setup enables investigations of generalization within tasks, between tasks, and to compositional tasks that require combining skills learned from previous tasks. Avalon includes a highly efficient simulator, a library of baselines, and a benchmark with scoring metrics evaluated against hundreds of hours of human performance, all of which are open-source and publicly available. We find that standard RL baselines make progress on most tasks but are still far from human performance, suggesting Avalon is challenging enough to advance the quest for generalizable RL.

Comments:	Accepted to NeurIPS Datasets and Benchmarks 2022. Video and links to all code, data, etc. can be found at this https URL
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2210.13417 [cs.AI]
	(or arXiv:2210.13417v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2210.13417

Submission history

From: Ellie Kitanidis
[v1] Mon, 24 Oct 2022 17:34:50 UTC (9,860 KB)
