-
LLMs achieve adult human performance on higher-order theory of mind tasks
Authors:
Winnie Street,
John Oliver Siy,
Geoff Keeling,
Adrien Baranes,
Benjamin Barnett,
Michael McKibben,
Tatenda Kanyere,
Alison Lentz,
Blaise Aguera y Arcas,
Robin I. M. Dunbar
Abstract:
This paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM): the human ability to reason about multiple mental and emotional states in a recursive manner (e.g. I think that you believe that she knows). This paper builds on prior work by introducing a handwritten test suite -- Multi-Order Theory of Mind Q&A -- and using it to compare the performance of five LLMs to a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on ToM tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences. Our results suggest that there is an interplay between model size and finetuning for the realisation of ToM abilities, and that the best-performing LLMs have developed a generalised capacity for ToM. Given the role that higher-order ToM plays in a wide range of cooperative and competitive human behaviours, these findings have significant implications for user-facing LLM applications.
Submitted 31 May, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
Optimal Designs for Two-Stage Inference
Authors:
Jonathan W. Stallrich,
Michael McKibben
Abstract:
The analysis of screening experiments is often done in two stages, starting with factor selection via an analysis under a main effects model. The success of this first stage is influenced by three components: (1) main effect estimators' variances and (2) bias, and (3) the estimate of the noise variance. Component (3) has only recently been given attention with design techniques that ensure an unbiased estimate of the noise variance. In this paper, we propose a design criterion based on expected confidence intervals of the first stage analysis that balances all three components. To address model misspecification, we propose a computationally-efficient all-subsets analysis and a corresponding constrained design criterion based on lack-of-fit. Scenarios found in existing design literature are revisited with our criteria and new designs are provided that improve upon existing methods.
Submitted 15 March, 2024; v1 submitted 10 August, 2023;
originally announced August 2023.
-
On weighted pseudo almost automorphic mild solutions for some mean field stochastic evolution equations
Authors:
Moustapha Dieye,
Amadou Diop,
Mamadou Moustapha Mbaye,
Mark A. McKibben
Abstract:
When the evolution family is hyperbolic and satisfies the Acquistapace-Terreni conditions, the existence and uniqueness of an almost automorphic mild solution and a weighted pseudo almost automorphic mild solution in distribution of mean-field nonautonomous stochastic evolution equations driven by fractional Brownian motion are proved. Examples illustrating the main results are included.
Submitted 11 August, 2022;
originally announced August 2022.