-
LLMs achieve adult human performance on higher-order theory of mind tasks
Authors:
Winnie Street,
John Oliver Siy,
Geoff Keeling,
Adrien Baranes,
Benjamin Barnett,
Michael McKibben,
Tatenda Kanyere,
Alison Lentz,
Blaise Aguera y Arcas,
Robin I. M. Dunbar
Abstract:
This paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM): the human ability to reason about multiple mental and emotional states in a recursive manner (e.g. I think that you believe that she knows). This paper builds on prior work by introducing a handwritten test suite -- Multi-Order Theory of Mind Q&A -- and using it to compare the performance of five LLMs to a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on ToM tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences. Our results suggest that there is an interplay between model size and finetuning for the realisation of ToM abilities, and that the best-performing LLMs have developed a generalised capacity for ToM. Given the role that higher-order ToM plays in a wide range of cooperative and competitive human behaviours, these findings have significant implications for user-facing LLM applications.
Submitted 31 May, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
The Ethics of Advanced AI Assistants
Authors:
Iason Gabriel,
Arianna Manzini,
Geoff Keeling,
Lisa Anne Hendricks,
Verena Rieser,
Hasan Iqbal,
Nenad Tomašev,
Ira Ktena,
Zachary Kenton,
Mikel Rodriguez,
Seliem El-Sayed,
Sasha Brown,
Canfer Akbulut,
Andrew Trask,
Edward Hughes,
A. Stevie Bergman,
Renee Shelby,
Nahema Marchal,
Conor Griffin,
Juan Mateos-Garcia,
Laura Weidinger,
Winnie Street,
Benjamin Lange,
Alex Ingerman,
Alison Lentz
, et al. (32 additional authors not shown)
Abstract:
This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, providing an overview of AI assistants, their technical foundations and potential range of applications. It then explores questions around AI value alignment, well-being, safety and malicious uses. Extending the circle of inquiry further, we next consider the relationship between advanced AI assistants and individual users in more detail, exploring topics such as manipulation and persuasion, anthropomorphism, appropriate relationships, trust and privacy. With this analysis in place, we consider the deployment of advanced assistants at a societal scale, focusing on cooperation, equity and access, misinformation, economic impact, the environment and how best to evaluate advanced AI assistants. Finally, we conclude by providing a range of recommendations for researchers, developers, policymakers and public stakeholders.
Submitted 28 April, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI
Authors:
Seliem El-Sayed,
Canfer Akbulut,
Amanda McCroskery,
Geoff Keeling,
Zachary Kenton,
Zaria Jalan,
Nahema Marchal,
Arianna Manzini,
Toby Shevlane,
Shannon Vallor,
Daniel Susser,
Matija Franklin,
Sophie Bridgers,
Harry Law,
Matthew Rahtz,
Murray Shanahan,
Michael Henry Tessler,
Arthur Douillard,
Tom Everitt,
Sasha Brown
Abstract:
Recent generative AI systems have demonstrated more advanced persuasive capabilities and are increasingly permeating areas of life where they can influence decision-making. Generative AI presents a new risk profile of persuasion due to the opportunity for reciprocal exchange and prolonged interactions. This has led to growing concerns about harms from AI persuasion and how they can be mitigated, highlighting the need for a systematic study of AI persuasion. The current definitions of AI persuasion are unclear and related harms are insufficiently studied. Existing harm mitigation approaches prioritise harms from the outcome of persuasion over harms from the process of persuasion. In this paper, we lay the groundwork for the systematic study of AI persuasion. We first put forward definitions of persuasive generative AI. We distinguish between rationally persuasive generative AI, which relies on providing relevant facts, sound reasoning, or other forms of trustworthy evidence, and manipulative generative AI, which relies on taking advantage of cognitive biases and heuristics or misrepresenting information. We also put forward a map of harms from AI persuasion, including definitions and examples of economic, physical, environmental, psychological, sociocultural, political, privacy, and autonomy harm. We then introduce a map of mechanisms that contribute to harmful persuasion. Lastly, we provide an overview of approaches that can be used to mitigate against process harms of persuasion, including prompt engineering for manipulation classification and red teaming. Future work will operationalise these mitigations and study the interaction between different types of mechanisms of persuasion.
Submitted 23 April, 2024;
originally announced April 2024.
-
Should agentic conversational AI change how we think about ethics? Characterising an interactional ethics centred on respect
Authors:
Lize Alberts,
Geoff Keeling,
Amanda McCroskery
Abstract:
With the growing popularity of conversational agents based on large language models (LLMs), we need to ensure their behaviour is ethical and appropriate. Work in this area largely centres around the 'HHH' criteria: making outputs more helpful and honest, and avoiding harmful (biased, toxic, or inaccurate) statements. Whilst this semantic focus is useful when viewing LLM agents as mere mediums or output-generating systems, it fails to account for pragmatic factors that can make the same speech act seem more or less tactless or inconsiderate in different social situations. With the push towards agentic AI, wherein systems become increasingly proactive in chasing goals and performing actions in the world, considering the pragmatics of interaction becomes essential. We propose an interactional approach to ethics that is centred on relational and situational factors. We explore what it means for a system, as a social actor, to treat an individual respectfully in a (series of) interaction(s). Our work anticipates a set of largely unexplored risks at the level of situated social interaction, and offers practical suggestions to help agentic LLM technologies treat people well.
Submitted 16 May, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
Engaging Engineering Teams Through Moral Imagination: A Bottom-Up Approach for Responsible Innovation and Ethical Culture Change in Technology Companies
Authors:
Benjamin Lange,
Geoff Keeling,
Amanda McCroskery,
Ben Zevenbergen,
Sandra Blascovich,
Kyle Pedersen,
Alison Lentz,
Blaise Aguera y Arcas
Abstract:
We propose a "Moral Imagination" methodology to facilitate a culture of responsible innovation for engineering and product teams in technology companies. Our approach has been operationalized over the past two years at Google, where we have conducted over 50 workshops with teams across the organization. We argue that our approach is a crucial complement to existing formal and informal initiatives for fostering a culture of ethical awareness, deliberation, and decision-making in technology design such as company principles, ethics and privacy review procedures, and compliance controls. We characterize some of the distinctive benefits of our methodology for the technology sector in particular.
Submitted 28 October, 2023; v1 submitted 12 June, 2023;
originally announced June 2023.
-
Algorithmic Bias, Generalist Models, and Clinical Medicine
Authors:
Geoff Keeling
Abstract:
The technical landscape of clinical machine learning is shifting in ways that destabilize pervasive assumptions about the nature and causes of algorithmic bias. On one hand, the dominant paradigm in clinical machine learning is narrow in the sense that models are trained on biomedical datasets for particular clinical tasks such as diagnosis and treatment recommendation. On the other hand, the emerging paradigm is generalist in the sense that general-purpose language models such as Google's BERT and PaLM are increasingly being adapted for clinical use cases via prompting or fine-tuning on biomedical datasets. Many of these next-generation models provide substantial performance gains over prior clinical models, but at the same time introduce novel kinds of algorithmic bias and complicate the explanatory relationship between algorithmic biases and biases in training data. This paper articulates how and in what respects biases in generalist models differ from biases in prior clinical models, and draws out practical recommendations for algorithmic bias mitigation.
Submitted 6 May, 2023;
originally announced May 2023.
-
On the Opportunities and Risks of Foundation Models
Authors:
Rishi Bommasani,
Drew A. Hudson,
Ehsan Adeli,
Russ Altman,
Simran Arora,
Sydney von Arx,
Michael S. Bernstein,
Jeannette Bohg,
Antoine Bosselut,
Emma Brunskill,
Erik Brynjolfsson,
Shyamal Buch,
Dallas Card,
Rodrigo Castellon,
Niladri Chatterji,
Annie Chen,
Kathleen Creel,
Jared Quincy Davis,
Dora Demszky,
Chris Donahue,
Moussa Doumbouya,
Esin Durmus,
Stefano Ermon,
John Etchemendy,
Kawin Ethayarajh
, et al. (89 additional authors not shown)
Abstract:
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
Submitted 12 July, 2022; v1 submitted 16 August, 2021;
originally announced August 2021.