Search | arXiv e-print repository

LLMs achieve adult human performance on higher-order theory of mind tasks

Authors: Winnie Street, John Oliver Siy, Geoff Keeling, Adrien Baranes, Benjamin Barnett, Michael McKibben, Tatenda Kanyere, Alison Lentz, Blaise Aguera y Arcas, Robin I. M. Dunbar

Abstract: This paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM): the human ability to reason about multiple mental and emotional states in a recursive manner (e.g. I think that you believe that she knows). This paper builds on prior work by introducing a handwritten test suite -- Multi-Order Theory of Mind Q&A -- and using it to compare the performance of five LLMs to a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on ToM tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences. Our results suggest that there is an interplay between model size and finetuning for the realisation of ToM abilities, and that the best-performing LLMs have developed a generalised capacity for ToM. Given the role that higher-order ToM plays in a wide range of cooperative and competitive human behaviours, these findings have significant implications for user-facing LLM applications.

Submitted 31 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

ACM Class: I.2.7; H.1.2

arXiv:2308.05577 [pdf, other]

Optimal Designs for Two-Stage Inference

Authors: Jonathan W. Stallrich, Michael McKibben

Abstract: The analysis of screening experiments is often done in two stages, starting with factor selection via an analysis under a main effects model. The success of this first stage is influenced by three components: (1) main effect estimators' variances and (2) bias, and (3) the estimate of the noise variance. Component (3) has only recently been given attention with design techniques that ensure an unbiased estimate of the noise variance. In this paper, we propose a design criterion based on expected confidence intervals of the first stage analysis that balances all three components. To address model misspecification, we propose a computationally-efficient all-subsets analysis and a corresponding constrained design criterion based on lack-of-fit. Scenarios found in existing design literature are revisited with our criteria and new designs are provided that improve upon existing methods.

Submitted 15 March, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

arXiv:2208.06076 [pdf, ps, other]

On weighted pseudo almost automorphic mild solutions for some mean field stochastic evolution equations

Authors: Moustapha Dieye, Amadou Diop, Mamadou Moustapha Mbaye, Mark A. McKibben

Abstract: When the evolution family is hyperbolic and satisfies the Acquistapace-Terreni conditions, the existence and uniqueness of an almost automorphic mild solution and a weighted pseudo almost automorphic mild solution in distribution of mean-field nonautonomous stochastic evolution equations driven by fractional Brownian motion are proved. Examples illustrating the main results are included.

Submitted 11 August, 2022; originally announced August 2022.

Showing 1–3 of 3 results for author: McKibben, M