Search | arXiv e-print repository

ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles

Authors: Savvas Petridis, Ben Wedin, James Wexler, Aaron Donsbach, Mahima Pushkarna, Nitesh Goyal, Carrie J. Cai, Michael Terry

Abstract: Large language model (LLM) prompting is a promising new approach for users to create and customize their own chatbots. However, current methods for steering a chatbot's outputs, such as prompt engineering and fine-tuning, do not support users in converting their natural feedback on the model's outputs to changes in the prompt or model. In this work, we explore how to enable users to interactively… ▽ More Large language model (LLM) prompting is a promising new approach for users to create and customize their own chatbots. However, current methods for steering a chatbot's outputs, such as prompt engineering and fine-tuning, do not support users in converting their natural feedback on the model's outputs to changes in the prompt or model. In this work, we explore how to enable users to interactively refine model outputs through their feedback, by hel** them convert their feedback into a set of principles (i.e. a constitution) that dictate the model's behavior. From a formative study, we (1) found that users needed support converting their feedback into principles for the chatbot and (2) classified the different principle types desired by users. Inspired by these findings, we developed ConstitutionMaker, an interactive tool for converting user feedback into principles, to steer LLM-based chatbots. With ConstitutionMaker, users can provide either positive or negative feedback in natural language, select auto-generated feedback, or rewrite the chatbot's response; each mode of feedback automatically generates a principle that is inserted into the chatbot's prompt. In a user study with 14 participants, we compare ConstitutionMaker to an ablated version, where users write their own principles. With ConstitutionMaker, participants felt that their principles could better guide the chatbot, that they could more easily convert their feedback into principles, and that they could write principles more efficiently, with less mental demand. ConstitutionMaker helped users identify ways to improve the chatbot, formulate their intuitive responses to the model into feedback, and convert this feedback into specific and clear principles. Together, these findings inform future tools that support the interactive critiquing of LLM outputs. △ Less

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2207.02308 [pdf]

doi 10.1145/3517428.3544819

LaMPost: Design and Evaluation of an AI-assisted Email Writing Prototype for Adults with Dyslexia

Authors: Steven M. Goodman, Erin Buehler, Patrick Clary, Andy Coenen, Aaron Donsbach, Tiffanie N. Horne, Michal Lahav, Robert Macdonald, Rain Breaw Michaels, Ajit Narayanan, Mahima Pushkarna, Joel Riley, Alex Santana, Lei Shi, Rachel Sweeney, Phil Weaver, Ann Yuan, Meredith Ringel Morris

Abstract: Prior work has explored the writing challenges experienced by people with dyslexia, and the potential for new spelling, grammar, and word retrieval technologies to address these challenges. However, the capabilities for natural language generation demonstrated by the latest class of large language models (LLMs) highlight an opportunity to explore new forms of human-AI writing support tools. In thi… ▽ More Prior work has explored the writing challenges experienced by people with dyslexia, and the potential for new spelling, grammar, and word retrieval technologies to address these challenges. However, the capabilities for natural language generation demonstrated by the latest class of large language models (LLMs) highlight an opportunity to explore new forms of human-AI writing support tools. In this paper, we introduce LaMPost, a prototype email-writing interface that explores the potential for LLMs to power writing support tools that address the varied needs of people with dyslexia. LaMPost draws from our understanding of these needs and introduces novel AI-powered features for email-writing, including: outlining main ideas, generating a subject line, suggesting changes, rewriting a selection. We evaluated LaMPost with 19 adults with dyslexia, identifying many promising routes for further exploration (including the popularity of the "rewrite" and "subject line" features), but also finding that the current generation of LLMs may not surpass the accuracy and quality thresholds required to meet the needs of writers with dyslexia. Surprisingly, we found that participants' awareness of the AI had no effect on their perception of the system, nor on their feelings of autonomy, expression, and self-efficacy when writing emails. Our findings yield further insight into the benefits and drawbacks of using LLMs as writing support for adults with dyslexia and provide a foundation to build upon in future research. △ Less

Submitted 5 July, 2022; originally announced July 2022.

Comments: To appear at The 24th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '22), October 23-26, 2022, Athens, Greece. 26 pages

arXiv:2203.06566 [pdf, other]

PromptChainer: Chaining Large Language Model Prompts through Visual Programming

Authors: Tongshuang Wu, Ellen Jiang, Aaron Donsbach, Jeff Gray, Alejandra Molina, Michael Terry, Carrie J Cai

Abstract: While LLMs can effectively help prototype single ML functionalities, many real-world applications involve complex tasks that cannot be easily handled via a single run of an LLM. Recent work has found that chaining multiple LLM runs together (with the output of one step being the input to the next) can help users accomplish these more complex tasks, and in a way that is perceived to be more transpa… ▽ More While LLMs can effectively help prototype single ML functionalities, many real-world applications involve complex tasks that cannot be easily handled via a single run of an LLM. Recent work has found that chaining multiple LLM runs together (with the output of one step being the input to the next) can help users accomplish these more complex tasks, and in a way that is perceived to be more transparent and controllable. However, it remains unknown what users need when authoring their own LLM chains -- a key step for lowering the barriers for non-AI-experts to prototype AI-infused applications. In this work, we explore the LLM chain authoring process. We conclude from pilot studies find that chaining requires careful scaffolding for transforming intermediate node outputs, as well as debugging the chain at multiple granularities; to help with these needs, we designed PromptChainer, an interactive interface for visually programming chains. Through case studies with four people, we show that PromptChainer supports building prototypes for a range of applications, and conclude with open questions on scaling chains to complex tasks, and supporting low-fi chain prototy**. △ Less

Submitted 12 March, 2022; originally announced March 2022.

Comments: CHI LBW 2022

arXiv:2002.12387 [pdf]

doi 10.1145/3313831.3376260

Unmet Needs and Opportunities for Mobile Translation AI

Authors: Daniel J. Liebling, Michal Lahav, Abigail Evans, Aaron Donsbach, Jess Holbrook, Boris Smus, Lindsey Boran

Abstract: Translation apps and devices are often presented in the context of providing assistance while traveling abroad. However, the spectrum of needs for cross-language communication is much wider. To investigate these needs, we conducted three studies with populations spanning socioeconomic status and geographic regions: (1) United States-based travelers, (2) migrant workers in India, and (3) immigrant… ▽ More Translation apps and devices are often presented in the context of providing assistance while traveling abroad. However, the spectrum of needs for cross-language communication is much wider. To investigate these needs, we conducted three studies with populations spanning socioeconomic status and geographic regions: (1) United States-based travelers, (2) migrant workers in India, and (3) immigrant populations in the United States. We compare frequent travelers' perception and actual translation needs with those of the two migrant communities. The latter two, with low language proficiency, have the greatest translation needs to navigate their daily lives. However, current mobile translation apps do not meet these needs. Our findings provide new insights on the usage practices and limitations of mobile translation tools. Finally, we propose design implications to help apps better serve these unmet needs. △ Less

Submitted 27 February, 2020; originally announced February 2020.

Comments: 13 pages, 3 figures, to be published in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI '20)

Showing 1–4 of 4 results for author: Donsbach, A