Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games

Authors: Lukas Schäfer, Logan Jones, Anssi Kanervisto, Yuhan Cao, Tabish Rashid, Raluca Georgescu, Dave Bignell, Siddhartha Sen, Andrea Treviño Gavito, Sam Devlin

Abstract: Video games have served as useful benchmarks for the decision making community, but going beyond Atari games towards training agents in modern games has been prohibitively expensive for the vast majority of the research community. Recent progress in the research, development and open release of large vision models has the potential to amortize some of these costs across the community. However, it is currently unclear which of these models have learnt representations that retain information critical for sequential decision making. Towards enabling wider participation in the research of gameplaying agents in modern games, we present a systematic study of imitation learning with publicly available visual encoders compared to the typical, task-specific, end-to-end training approach in Minecraft, Minecraft Dungeons and Counter-Strike: Global Offensive.

Submitted 4 December, 2023; originally announced December 2023.

Comments: Preprint

arXiv:2303.02160 [pdf, other]

doi 10.1145/3544548.3581348

Navigates Like Me: Understanding How People Evaluate Human-Like AI in Video Games

Authors: Stephanie Milani, Arthur Juliani, Ida Momennejad, Raluca Georgescu, Jaroslaw Rzpecki, Alison Shaw, Gavin Costello, Fei Fang, Sam Devlin, Katja Hofmann

Abstract: We aim to understand how people assess human likeness in navigation produced by people and artificially intelligent (AI) agents in a video game. To this end, we propose a novel AI agent with the goal of generating more human-like behavior. We collect hundreds of crowd-sourced assessments comparing the human-likeness of navigation behavior generated by our agent and baseline AI agents with human-generated behavior. Our proposed agent passes a Turing Test, while the baseline agents do not. By passing a Turing Test, we mean that human judges could not quantitatively distinguish between videos of a person and an AI agent navigating. To understand what people believe constitutes human-like navigation, we extensively analyze the justifications of these assessments. This work provides insights into the characteristics that people consider human-like in the context of goal-directed video game navigation, which is a key step for further improving human interactions with AI agents.

Submitted 2 March, 2023; originally announced March 2023.

Comments: 18 pages; accepted at CHI 2023

arXiv:2301.10677 [pdf, other]

Imitating Human Behaviour with Diffusion Models

Authors: Tim Pearce, Tabish Rashid, Anssi Kanervisto, Dave Bignell, Mingfei Sun, Raluca Georgescu, Sergio Valcarcel Macua, Shan Zheng Tan, Ida Momennejad, Katja Hofmann, Sam Devlin

Abstract: Diffusion models have emerged as powerful generative models in the text-to-image domain. This paper studies their application as observation-to-action models for imitating human behaviour in sequential environments. Human behaviour is stochastic and multimodal, with structured correlations between action dimensions. Meanwhile, standard modelling choices in behaviour cloning are limited in their expressiveness and may introduce bias into the cloned policy. We begin by pointing out the limitations of these choices. We then propose that diffusion models are an excellent fit for imitating human behaviour, since they learn an expressive distribution over the joint action space. We introduce several innovations to make diffusion models suitable for sequential environments; designing suitable architectures, investigating the role of guidance, and developing reliable sampling strategies. Experimentally, diffusion models closely match human demonstrations in a simulated robotic control task and a modern 3D gaming environment.

Submitted 3 March, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

Comments: Published in ICLR 2023

Journal ref: ICLR 2023

arXiv:2211.10869 [pdf, other]

UniMASK: Unified Inference in Sequential Decision Problems

Authors: Micah Carroll, Orr Paradise, Jessy Lin, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin

Abstract: Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision-making, where many well-studied tasks like behavior cloning, offline reinforcement learning, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns. We introduce the UniMASK framework, which provides a unified way to specify models which can be trained on many different sequential decision-making tasks. We show that a single UniMASK model is often capable of carrying out many tasks with performance similar to or better than single-task models. Additionally, after fine-tuning, our UniMASK models consistently outperform comparable single-task models. Our code is publicly available at https://github.com/micahcarroll/uniMASK.

Submitted 19 November, 2022; originally announced November 2022.

Comments: NeurIPS 2022 (Oral). A prior version was published at an ICML Workshop, available at arXiv:2204.13326

arXiv:2209.00570 [pdf, other]

Go-Explore Complex 3D Game Environments for Automated Reachability Testing

Authors: Cong Lu, Raluca Georgescu, Johan Verwey

Abstract: Modern AAA video games feature huge game levels and maps which are increasingly hard for level testers to cover exhaustively. As a result, games often ship with catastrophic bugs such as the player falling through the floor or being stuck in walls. We propose an approach specifically targeted at reachability bugs in simulated 3D environments based on the powerful exploration algorithm, Go-Explore, which saves unique checkpoints across the map and then identifies promising ones to explore from. We show that when coupled with simple heuristics derived from the game's navigation mesh, Go-Explore finds challenging bugs and comprehensively explores complex environments without the need for human demonstration or knowledge of the game dynamics. Go-Explore vastly outperforms more complicated baselines including reinforcement learning with intrinsic curiosity in both covering the navigation mesh and number of unique positions across the map discovered. Finally, due to our use of parallel agents, our algorithm can fully cover a vast 1.5km x 1.5km game world within 10 hours on a single machine making it extremely promising for continuous testing suites.

Submitted 1 September, 2022; originally announced September 2022.

arXiv:2204.13326 [pdf, other]

Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers

Authors: Micah Carroll, Jessy Lin, Orr Paradise, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin

Abstract: Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns. We introduce the FlexiBiT framework, which provides a unified way to specify models which can be trained on many different sequential decision making tasks. We show that a single FlexiBiT model is simultaneously capable of carrying out many tasks with performance similar to or better than specialized models. Additionally, we show that performance can be further improved by fine-tuning our general model on specific tasks of interest.

Submitted 9 December, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

Comments: Superseded by arXiv:2211.10869

arXiv:2105.09637 [pdf, other]

Navigation Turing Test (NTT): Learning to Evaluate Human-Like Navigation

Authors: Sam Devlin, Raluca Georgescu, Ida Momennejad, Jaroslaw Rzepecki, Evelyn Zuniga, Gavin Costello, Guy Leroy, Ali Shaw, Katja Hofmann

Abstract: A key challenge on the path to developing agents that learn complex human-like behavior is the need to quickly and accurately quantify human-likeness. While human assessments of such behavior can be highly accurate, speed and scalability are limited. We address these limitations through a novel automated Navigation Turing Test (ANTT) that learns to predict human judgments of human-likeness. We demonstrate the effectiveness of our automated NTT on a navigation task in a complex 3D environment. We investigate six classification models to shed light on the types of architectures best suited to this task, and validate them against data collected through a human NTT. Our best models achieve high accuracy when distinguishing true human and agent behavior. At the same time, we show that predicting finer-grained human assessment of agents' progress towards human-like behavior remains unsolved. Our work takes an important step towards agents that more effectively learn complex human-like behavior.

Submitted 28 July, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

Comments: All data collected throughout this study, plus the code to reproduce our analysis and ANTT are available at https://github.com/microsoft/NTT

Journal ref: Proceedings of the 38th International Conference on Machine Learning (ICML), 139:2644-2653, 2021

arXiv:1208.1707 [pdf, ps, other]

Numerical investigation of the Bautin bifurcation in a delay differential equation modeling leukemia

Authors: Anca Veronica Ion, Raluca Mihaela Georgescu

Abstract: In a previous work we investigated the existence of Hopf degenerate bifurcation points for a differential delay equation modeling leukemia and we actually found Hopf points of codimension two for the considered problem. If around the parameters corresponding to such a point we vary two parameters (the considered problem has five parameters), then a Bautin bifurcation should occur. In this work we chose a Hopf point of codimension two for the considered problem and perform numerical integration for parameters chosen in a neighborhood of the bifurcation point parameters. The results show that, indeed, we have a Bautin bifurcation in the chosen point.

Submitted 8 August, 2012; originally announced August 2012.

Comments: To be presented at CAIM 2012 (Conference on Applied and Industrial Mathematics), 23-25 August 2012, Chisinau, Republic of Moldova

MSC Class: 37C75; 65L03; 37G05; 37G15

arXiv:1205.3917 [pdf, ps, other]

Hopf points of codimension two in a delay differential equation modeling leukemia

Authors: Anca Veronica Ion, Raluca Mihaela Georgescu

Abstract: This paper continues the work contained in two previous papers, devoted to the study of the dynamical system generated by a delay differential equation that models leukemia. Here our aim is to identify degenerate Hopf bifurcation points. By using an approximation of the center manifold, we compute the first Lyapunov coefficient for Hopf bifurcation points. We find by direct computation, in some zones of the parameter space (of biological significance), points where the first Lyapunov coefficient equals zero. For these we compute the second Lyapunov coefficient, that determines the type of the degenerate Hopf bifurcation.

Submitted 17 May, 2012; originally announced May 2012.

MSC Class: 65L03; 37C75; 37G05; 37G15

arXiv:1001.5354 [pdf, ps, other]

Stability of equilibrium and periodic solutions of a delay equation modeling leukemia

Authors: Anca-Veronica Ion, Raluca-Mihaela Georgescu

Abstract: We consider a delay differential equation that occurs in the study of chronic myelogenous leukemia. After shortly reminding some previous results concerning the stability of equilibrium solutions, we concentrate on the study of stability of periodic solutions emerged by Hopf bifurcation from a certain equilibrium point. We give the algorithm for approximating a center manifold at a typical point (in the parameter space) of Hopf bifurcation (and an unstable manifold in the vicinity of such a point, where such a manifold exists). Then we find the normal form of the equation restricted to the center manifold, by computing the first Lyapunov coefficient. The normal form allows us to establish the stability properties of the periodic solutions occurred by Hopf bifurcation.

Submitted 22 March, 2010; v1 submitted 29 January, 2010; originally announced January 2010.

MSC Class: 65L03; 37C75; 37G05; 37G15

Journal ref: "Journal of Middle Volga Mathematical Society", 11, 2(2009), 146-157

Showing 1–10 of 10 results for author: Georgescu, R