-
On the Evaluation of Procedural Level Generation Systems
Authors:
Oliver Withington,
Michael Cook,
Laurissa Tokarchuk
Abstract:
The evaluation of procedural content generation (PCG) systems for generating video game levels is a complex and contested topic. Ideally, the field would have access to robust, generalisable and widely accepted evaluation approaches that can be used to compare novel PCG systems to prior work, but consensus on how to evaluate novel systems is currently limited. We argue that the field can benefit from a structured analysis of how procedural level generation systems can be evaluated, and how these techniques are currently used by researchers. This analysis can then be used to both inform on the current state of affairs, and to provide data to justify changes to this practice. This work aims to provide this by first developing a novel taxonomy of PCG evaluation approaches, and then presenting the results of a survey of recent work in the field through the lens of this taxonomy. The results of this survey highlight several important weaknesses in current practice which we argue could be substantially mitigated by 1) promoting use of evaluation-free system descriptions where appropriate, 2) promoting the development of diverse research frameworks, 3) promoting reuse of code and methodology wherever possible.
Submitted 29 April, 2024;
originally announced April 2024.
-
Not All the Same: Understanding and Informing Similarity Estimation in Tile-Based Video Games
Authors:
Sebastian Berns,
Vanessa Volz,
Laurissa Tokarchuk,
Sam Snodgrass,
Christian Guckelsberger
Abstract:
Similarity estimation is essential for many game AI applications, from the procedural generation of distinct assets to automated exploration with game-playing agents. While similarity metrics often substitute human evaluation, their alignment with our judgement is unclear. Consequently, the result of their application can fail human expectations, leading to e.g. unappreciated content or unbelievable agent behaviour. We alleviate this gap through a multi-factorial study of two tile-based games in two representations, where participants (N=456) judged the similarity of level triplets. Based on this data, we construct domain-specific perceptual spaces, encoding similarity-relevant attributes. We compare 12 metrics to these spaces and evaluate their approximation quality through several quantitative lenses. Moreover, we conduct a qualitative labelling study to identify the features underlying the human similarity judgement in this popular genre. Our findings inform the selection of existing metrics and highlight requirements for the design of new similarity metrics benefiting game development and research.
Submitted 28 February, 2024;
originally announced February 2024.
-
Exploring Minecraft Settlement Generators with Generative Shift Analysis
Authors:
Jean-Baptiste Hervé,
Oliver Withington,
Marion Hervé,
Laurissa Tokarchuk,
Christoph Salge
Abstract:
With growing interest in Procedural Content Generation (PCG) it becomes increasingly important to develop methods and tools for evaluating and comparing alternative systems. There is a particular lack regarding the evaluation of generative pipelines, where a set of generative systems work in series to make iterative changes to an artifact. We introduce a novel method called Generative Shift for evaluating the impact of individual stages in a PCG pipeline by quantifying the impact that a generative process has when it is applied to a pre-existing artifact. We explore this technique by applying it to a very rich dataset of Minecraft game maps produced by a set of alternative settlement generators developed as part of the Generative Design in Minecraft Competition (GDMC), all of which are designed to produce appropriate settlements for a pre-existing map. While this is an early exploration of this technique we find it to be a promising lens to apply to PCG evaluation, and we are optimistic about the potential of Generative Shift to be a domain-agnostic method for evaluating generative pipelines.
Submitted 11 September, 2023;
originally announced September 2023.
-
The Right Variety: Improving Expressive Range Analysis with Metric Selection Methods
Authors:
Oliver Withington,
Laurissa Tokarchuk
Abstract:
Expressive Range Analysis (ERA), an approach for visualising the output of Procedural Content Generation (PCG) systems, is widely used within PCG research to evaluate and compare generators, often to make comparative statements about their relative performance in terms of output diversity and search space exploration. Producing a standard ERA visualisation requires the selection of two metrics which can be calculated for all generated artefacts to be visualised. However, to our knowledge there are no methodologies or heuristics for justifying the selection of a specific metric pair over alternatives. Prior work has typically either made a selection based on established but unjustified norms, designer intuition, or has produced multiple visualisations across all possible pairs. This work aims to contribute to this area by identifying valuable characteristics of metric pairings, and by demonstrating that pairings that have these characteristics have an increased probability of producing an informative ERA projection of the underlying generator. We introduce and investigate three quantifiable selection criteria for assessing metric pairs, and demonstrate how these criteria can be operationalized to rank those available. Though this is an early exploration of the concept of quantifying the utility of ERA metric pairs, we argue that the approach explored in this paper can make ERA more useful and usable for both researchers and game designers.
Submitted 5 April, 2023;
originally announced April 2023.
-
Visualising Generative Spaces Using Convolutional Neural Network Embeddings
Authors:
Oliver Withington,
Laurissa Tokarchuk
Abstract:
As academic interest in procedural content generation (PCG) for games has increased, so has the need for methodologies for comparing and contrasting the output spaces of alternative PCG systems. In this paper we introduce and evaluate a novel approach for visualising the generative spaces of level generation systems, using embeddings extracted from a trained convolutional neural network. We evaluate the approach in terms of its ability to produce 2D visualisations of encoded game levels that correlate with their behavioural characteristics. The results across two alternative game domains, Super Mario and Boxoban, indicate that this approach is powerful in certain settings and that it has the potential to supersede alternative methods for visually comparing generative spaces. However its performance was also inconsistent across the domains investigated in this work, as well as it being susceptible to intermittent failure. We conclude that this method is worthy of further evaluation, but that future implementations of it would benefit from significant refinement.
Submitted 31 October, 2022;
originally announced October 2022.
-
Compressing and Comparing the Generative Spaces of Procedural Content Generators
Authors:
Oliver Withington,
Laurissa Tokarchuk
Abstract:
The past decade has seen a rapid increase in the level of research interest in procedural content generation (PCG) for digital games, and there are now numerous research avenues focused on new approaches for driving and applying PCG systems. An area in which progress has been comparatively slow is the development of generalisable approaches for comparing alternative PCG systems, especially in terms of their generative spaces. It is to this area that this paper aims to make a contribution, by exploring the utility of data compression algorithms in compressing the generative spaces of PCG systems. We hope that this approach could be the basis for developing useful qualitative tools for comparing PCG systems to help designers better understand and optimize their generators. In this work we assess the efficacy of a selection of algorithms across sets of levels for 2D tile-based games by investigating how much their respective generative space compressions correlate with level behavioral characteristics. We conclude that the approach looks to be a promising one despite some inconsistency in efficacy in alternative domains, and that of the algorithms tested Multiple Correspondence Analysis appears to perform the most effectively.
Submitted 30 May, 2022;
originally announced May 2022.
-
Extended Reality (XR) Remote Research: a Survey of Drawbacks and Opportunities
Authors:
Jack Ratcliffe,
Francesco Soave,
Nick Bryan-Kinns,
Laurissa Tokarchuk,
Ildar Farkhatdinov
Abstract:
Extended Reality (XR) technology - such as virtual and augmented reality - is now widely used in Human Computer Interaction (HCI), social science and psychology experimentation. However, these experiments are predominantly deployed in-lab with a co-present researcher. Remote experiments, without co-present researchers, have not flourished, despite the success of remote approaches for non-XR investigations. This paper summarises findings from a 30-item survey of 46 XR researchers to understand perceived limitations and benefits of remote XR experimentation. Our thematic analysis identifies concerns common with non-XR remote research, such as participant recruitment, as well as XR-specific issues, including safety and hardware variability. We identify potential positive affordances of XR technology, including leveraging data collection functionalities built into HMDs (e.g. hand, gaze tracking) and the portability and reproducibility of an experimental setting. We suggest that XR technology could be conceptualised as an interactive technology and a capable data-collection device suited for remote experimentation.
Submitted 20 January, 2021;
originally announced January 2021.
-
Finding Dory in the Crowd: Detecting Social Interactions using Multi-Modal Mobile Sensing
Authors:
Kleomenis Katevas,
Katrin Hänsel,
Richard Clegg,
Ilias Leontiadis,
Hamed Haddadi,
Laurissa Tokarchuk
Abstract:
Remembering our day-to-day social interactions is challenging even if you aren't a blue memory challenged fish. The ability to automatically detect and remember these types of interactions is not only beneficial for individuals interested in their behavior in crowded situations, but also of interest to those who analyze crowd behavior. Currently, detecting social interactions is often performed using a variety of methods including ethnographic studies, computer vision techniques and manual annotation-based data analysis. However, mobile phones offer easier means for data collection that is easy to analyze and can preserve the user's privacy. In this work, we present a system for detecting stationary social interactions inside crowds, leveraging multi-modal mobile sensing data such as Bluetooth Smart (BLE), accelerometer and gyroscope. To inform the development of such a system, we conducted a study with 24 participants, where we asked them to socialize with each other for 45 minutes. We built a machine learning system based on gradient-boosted trees that predicts both 1:1 and group interactions with 77.8% precision and 86.5% recall, a 30.2% performance increase compared to a proximity-based approach. By utilizing a community detection-based method, we further detected the various group formations that exist within the crowd. Using mobile phone sensors already carried by the majority of people in a crowd makes our approach particularly well suited to real-life analysis of crowd behavior and influence strategies.
Submitted 16 November, 2018; v1 submitted 30 August, 2018;
originally announced September 2018.
-
SensingKit: Evaluating the Sensor Power Consumption in iOS devices
Authors:
Kleomenis Katevas,
Hamed Haddadi,
Laurissa Tokarchuk
Abstract:
Today's smartphones come equipped with a range of advanced sensors capable of sensing motion, orientation, audio as well as environmental data with high accuracy. With the existence of application distribution channels such as the Apple App Store and the Google Play Store, researchers can distribute applications and collect large scale data in ways that previously were not possible. Motivated by the lack of a universal, multi-platform sensing library, in this work we present the design and implementation of SensingKit, an open-source continuous sensing system that supports both iOS and Android mobile devices. One of the unique features of SensingKit is the support of the latest beacon technologies based on Bluetooth Smart (BLE), such as iBeacon and Eddystone. We evaluate and compare the power consumption of each supported sensor individually, using an iPhone 5S device running on iOS 9. We believe that this platform will be beneficial to all researchers and developers who plan to use mobile sensing technology in large-scale experiments.
Submitted 17 June, 2016;
originally announced June 2016.