A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI
Authors:
Seliem El-Sayed,
Canfer Akbulut,
Amanda McCroskery,
Geoff Keeling,
Zachary Kenton,
Zaria Jalan,
Nahema Marchal,
Arianna Manzini,
Toby Shevlane,
Shannon Vallor,
Daniel Susser,
Matija Franklin,
Sophie Bridgers,
Harry Law,
Matthew Rahtz,
Murray Shanahan,
Michael Henry Tessler,
Arthur Douillard,
Tom Everitt,
Sasha Brown
Abstract:
Recent generative AI systems have demonstrated more advanced persuasive capabilities and are increasingly permeating areas of life where they can influence decision-making. Generative AI presents a new risk profile of persuasion due to the opportunity for reciprocal exchange and prolonged interactions. This has led to growing concerns about harms from AI persuasion and how they can be mitigated, highlighting the need for a systematic study of AI persuasion. The current definitions of AI persuasion are unclear and related harms are insufficiently studied. Existing harm mitigation approaches prioritise harms from the outcome of persuasion over harms from the process of persuasion. In this paper, we lay the groundwork for the systematic study of AI persuasion. We first put forward definitions of persuasive generative AI. We distinguish between rationally persuasive generative AI, which relies on providing relevant facts, sound reasoning, or other forms of trustworthy evidence, and manipulative generative AI, which relies on taking advantage of cognitive biases and heuristics or misrepresenting information. We also put forward a map of harms from AI persuasion, including definitions and examples of economic, physical, environmental, psychological, sociocultural, political, privacy, and autonomy harm. We then introduce a map of mechanisms that contribute to harmful persuasion. Lastly, we provide an overview of approaches that can be used to mitigate process harms of persuasion, including prompt engineering for manipulation classification and red teaming. Future work will operationalise these mitigations and study the interaction between different types of mechanisms of persuasion.
Submitted 23 April, 2024;
originally announced April 2024.
Re-Ranking News Comments by Constructiveness and Curiosity Significantly Increases Perceived Respect, Trustworthiness, and Interest
Authors:
Emily Saltz,
Zaria Jalan,
Tin Acosta
Abstract:
Online commenting platforms have commonly developed systems to address online harms by removing and down-ranking content. An alternative, under-explored approach is to focus on up-ranking content to proactively prioritize prosocial commentary and set better conversational norms. We present a study with 460 English-speaking US-based news readers to understand the effects of re-ranking comments by constructiveness, curiosity, and personal stories on a variety of outcomes related to willingness to participate and engage, as well as perceived credibility and polarization in a comment section. In our rich-media survey experiment, participants across these four ranking conditions and a control group reviewed prototypes of comment sections of a Politics op-ed and a Dining article. We found that outcomes varied significantly by article type. Up-ranking curiosity and constructiveness improved a number of measures for the Politics article, including perceived Respect, Trustworthiness, and Interestingness of the comment section. Constructiveness also increased perceptions that the comments were favorable to Republicans, with no condition worsening perceptions of partisans. Additionally, in the Dining article, personal stories and constructiveness rankings significantly improved the perceived informativeness of the comments. Overall, these findings indicate that incorporating prosocial qualities of speech into ranking could be a promising approach to promote healthier, less polarized dialogue in online comment sections.
Submitted 15 April, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.