
TextWorldExpress: Simulating Text Games at One Million Steps Per Second

Authors: Peter A. Jansen, Marc-Alexandre Côté

Abstract: Text-based games offer a challenging test bed to evaluate virtual agents at language understanding, multi-step problem-solving, and common-sense reasoning. However, speed is a major limitation of current text-based games, capping at 300 steps per second, mainly due to the use of legacy tooling. In this work we present TextWorldExpress, a high-performance simulator that includes implementations of three common text game benchmarks and increases simulation throughput by approximately three orders of magnitude, reaching over one million steps per second on common desktop hardware. This significantly reduces experiment runtime, enabling billion-step-scale experiments in about one day.

Submitted 2 March, 2023; v1 submitted 1 August, 2022; originally announced August 2022.

Comments: Accepted to EACL 2023

arXiv:2107.04132

A Systematic Survey of Text Worlds as Embodied Natural Language Environments

Authors: Peter A Jansen

Abstract: Text Worlds are virtual environments for embodied agents that, unlike 2D or 3D environments, are rendered exclusively using textual descriptions. These environments offer an alternative to higher-fidelity 3D environments due to their low barrier to entry, providing the ability to study semantics, compositional inference, and other high-level tasks with rich high-level action spaces while controlling for perceptual input. This systematic survey outlines recent developments in tooling, environments, and agent modeling for Text Worlds, while examining recent trends in knowledge graphs, common sense reasoning, transfer learning of Text World performance to higher-fidelity environments, as well as near-term development targets that, once achieved, would make Text Worlds an attractive general research paradigm for natural language processing.

Submitted 8 July, 2021; originally announced July 2021.

Comments: 18 pages

arXiv:2009.14259

Visually-Grounded Planning without Vision: Language Models Infer Detailed Plans from High-level Instructions

Authors: Peter A. Jansen

Abstract: The recently proposed ALFRED challenge task aims for a virtual robotic agent to complete complex multi-step everyday tasks in a virtual home environment from high-level natural language directives, such as "put a hot piece of bread on a plate". Currently, the best-performing models are able to complete fewer than 5% of these tasks successfully. In this work we focus on modeling the translation problem of converting natural language directives into detailed multi-step sequences of actions that accomplish those goals in the virtual environment. We empirically demonstrate that it is possible to generate gold multi-step plans from language directives alone without any visual input in 26% of unseen cases. When a small amount of visual information is incorporated, namely the starting location in the virtual environment, our best-performing GPT-2 model successfully generates gold command sequences in 58% of cases. Our results suggest that contextualized language models may provide strong visual semantic planning modules for grounded virtual agents.

Submitted 26 October, 2020; v1 submitted 29 September, 2020; originally announced September 2020.

Comments: Accepted to Findings of EMNLP. V2: corrected a typo in Table 1 and the margins of Table 3

arXiv:1802.03052

WorldTree: A Corpus of Explanation Graphs for Elementary Science Questions supporting Multi-Hop Inference

Authors: Peter A. Jansen, Elizabeth Wainwright, Steven Marmorstein, Clayton T. Morrison

Abstract: Developing methods of automated inference that are able to provide users with compelling human-readable justifications for why the answer to a question is correct is critical for domains such as science and medicine, where user trust and detecting costly errors are limiting factors to adoption. One of the central barriers to training question answering models on explainable inference tasks is the lack of gold explanations to serve as training data. In this paper we present a corpus of explanations for standardized science exams, a recent challenge task for question answering. We manually construct a corpus of detailed explanations for nearly all publicly available standardized elementary science questions (approximately 1,680 3rd through 5th grade questions) and represent these as "explanation graphs": sets of lexically overlapping sentences that describe how to arrive at the correct answer to a question through a combination of domain and world knowledge. We also provide an explanation-centered tablestore, a collection of semi-structured tables that contain the knowledge to construct these elementary science explanations. Together, these two knowledge resources map out a substantial portion of the knowledge required for answering and explaining elementary science exams, and provide both structured and free-text training data for the explainable inference task.

Submitted 8 February, 2018; originally announced February 2018.

Comments: Accepted at the Language Resource and Evaluation Conference (LREC) 2018

arXiv:1009.5718

Monitoring wild animal communities with arrays of motion sensitive camera traps

Authors: Roland Kays, Sameer Tilak, Bart Kranstauber, Patrick A. Jansen, Chris Carbone, Marcus J. Rowcliffe, Tony Fountain, Jay Eggert, Zhihai He

Abstract: Studying animal movement and distribution is of critical importance to addressing environmental challenges including invasive species, infectious diseases, climate and land-use change. Motion-sensitive camera traps offer a visual sensor to record the presence of a broad range of species, providing location-specific information on movement and behavior. Modern digital camera traps that record video present new analytical opportunities, but also new data management challenges. This paper describes our experience with a terrestrial animal monitoring system at Barro Colorado Island, Panama. Our camera network captured the spatio-temporal dynamics of terrestrial bird and mammal activity at the site, data relevant to immediate science questions and long-term conservation issues. We believe that the experience gained and lessons learned during our year-long deployment and testing of the camera traps, as well as the developed solutions, are applicable to broader sensor network applications and are valuable for the advancement of sensor network research. We suggest that the continued development of these hardware, software, and analytical tools, in concert, offers an exciting sensor-network solution to monitoring animal populations which could realistically scale over larger areas and time spans.

Submitted 28 September, 2010; originally announced September 2010.

Showing 1–5 of 5 results for author: Jansen, P A