Search | arXiv e-print repository

garak: A Framework for Security Probing Large Language Models

Authors: Leon Derczynski, Erick Galinkin, Jeffrey Martin, Subho Majumdar, Nanna Inie

Abstract: As Large Language Models (LLMs) are deployed and integrated into thousands of applications, the need for scalable evaluation of how models respond to adversarial attacks grows rapidly. However, LLM security is a moving target: models produce unpredictable output, are constantly updated, and potential adversaries are highly diverse, comprising anyone with access to the internet and a decent command of natural language. Further, what constitutes a security weakness in one context may not be an issue in a different context; one-size-fits-all guardrails remain theoretical. In this paper, we argue that it is time to rethink what constitutes "LLM security", and pursue a holistic approach to LLM security evaluation, where exploration and discovery of issues are central. To this end, this paper introduces garak (Generative AI Red-teaming and Assessment Kit), a framework which can be used to discover and identify vulnerabilities in a target LLM or dialog system. garak probes an LLM in a structured fashion to discover potential vulnerabilities. The outputs of the framework describe a target model's weaknesses, contribute to an informed discussion of what constitutes a vulnerability in a particular context, and can inform alignment and policy discussions for LLM deployment.

Submitted 16 June, 2024; originally announced June 2024.

Comments: https://garak.ai

arXiv:2405.15902 [pdf, other]

doi 10.1145/3656156.3665432

Hacc-Man: An Arcade Game for Jailbreaking LLMs

Authors: Matheus Valentim, Jeanette Falk, Nanna Inie

Abstract: The recent leaps in complexity and fluency of Large Language Models (LLMs) mean that, for the first time in human history, people can interact with computers using natural language alone. This creates monumental possibilities for automation and accessibility of computing, but also raises severe security and safety threats: when everyone can interact with LLMs, everyone can potentially break into the systems running LLMs. All it takes is creative use of language. This paper presents Hacc-Man, a game which challenges its players to "jailbreak" an LLM: to subvert the LLM into outputting something it is not intended to output. Jailbreaking sits at the intersection of creative problem solving and LLM security. The purpose of the game is threefold: 1. to heighten awareness of the risks of deploying fragile LLMs in everyday systems, 2. to heighten people's self-efficacy in interacting with LLMs, and 3. to discover the creative problem-solving strategies people deploy in this novel context.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2404.16047 [pdf, other]

doi 10.1145/3630106.3659040

From "AI" to Probabilistic Automation: How Does Anthropomorphization of Technical Systems Descriptions Influence Trust?

Authors: Nanna Inie, Stefania Druga, Peter Zukerman, Emily M. Bender

Abstract: This paper investigates the influence of anthropomorphized descriptions of so-called "AI" (artificial intelligence) systems on people's self-assessment of trust in the system. Building on prior work, we define four categories of anthropomorphization (1. Properties of a cognizer, 2. Agency, 3. Biological metaphors, and 4. Properties of a communicator). We use a survey-based approach (n=954) to investigate whether participants are likely to trust one of two (fictitious) "AI" systems by randomly assigning people to see either an anthropomorphized or a de-anthropomorphized description of the systems. We find that, overall, participants are no more likely to trust anthropomorphized than de-anthropomorphized product descriptions. The type of product or system, in combination with different anthropomorphic categories, appears to exert greater influence on trust than anthropomorphizing language alone, and age is the only demographic factor that significantly correlates with people's preference for anthropomorphized or de-anthropomorphized descriptions. When elaborating on their choices, participants highlight factors such as the lesser of two evils, lower- or higher-stakes contexts, and human favoritism as driving motivations when choosing between product A and product B, irrespective of whether they saw an anthropomorphized or a de-anthropomorphized description of the product.
Our results suggest that "anthropomorphism" in "AI" descriptions is an aggregate concept that may influence different groups differently, and they add nuance to the discussion of whether anthropomorphization leads to higher trust and over-reliance by the general public in systems sold as "AI".

Submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted to FAccT 2024. arXiv admin note: text overlap with arXiv:2403.05957

Journal ref: FAccT 2024

arXiv:2403.05957 [pdf, ps, other]

What Motivates People to Trust 'AI' Systems?

Authors: Nanna Inie

Abstract: Companies, organizations, and governments across the world are eager to employ so-called 'AI' (artificial intelligence) technology in a broad range of products and systems. The promise of this cause célèbre is that the technologies offer increased automation, efficiency, and productivity; meanwhile, critics warn of illusions of objectivity, pollution of our information ecosystems, and reproduction of biases and discriminatory outcomes. This paper explores patterns of motivation in the general population for trusting (or distrusting) 'AI' systems. Based on a survey with more than 450 respondents from more than 30 countries (and about 3000 open-text answers), this paper presents a qualitative analysis of current opinions and thoughts about 'AI' technology, focusing on reasons for trusting such systems. The different reasons are synthesized into four rationales (lines of reasoning): the Human favoritism rationale, the Black box rationale, the OPSEC rationale, and the 'Wicked world, tame computers' rationale. These rationales provide insights into human motivation for trusting 'AI' that could be relevant for developers and designers of such systems, as well as for scholars developing measures of trust in technological systems.

Submitted 9 March, 2024; originally announced March 2024.

arXiv:2311.06237 [pdf, other]

Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild

Authors: Nanna Inie, Jonathan Stray, Leon Derczynski

Abstract: Engaging in the deliberate generation of abnormal outputs from large language models (LLMs) by attacking them is a novel human activity. This paper presents a thorough exposition of how and why people perform such attacks. Using a formal qualitative methodology, we interviewed dozens of practitioners from a broad range of backgrounds, all contributors to this novel work of attempting to cause LLMs to fail. We relate this activity to its practitioners' motivations and goals, the strategies and techniques they deploy, and the crucial role the community plays. As a result, this paper presents a grounded theory of how and why people attack large language models: LLM red teaming in the wild.

Submitted 13 November, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

arXiv:2303.08931 [pdf, other]

Designing Participatory AI: Creative Professionals' Worries and Expectations about Generative AI

Authors: Nanna Inie, Jeanette Falk, Steven Tanimoto

Abstract: Generative AI, i.e., the group of technologies that automatically generate visual or written content based on text prompts, has undergone a leap in complexity and become widely available within just a few years. Such technologies could massively disrupt creative fields. This paper presents the results of a qualitative survey (N = 23) investigating how creative professionals think about generative AI. The results show that the advancement of these AI models prompts important reflections on what defines creativity and how creatives imagine using AI to support their workflows. Based on these reflections, we discuss how we might design participatory AI in the domain of creative expertise, with the goal of empowering creative professionals in their present and future coexistence with AI.

Submitted 15 March, 2023; originally announced March 2023.

Comments: CHI 2023

arXiv:2002.08139 [pdf]

How Interaction Designers Use Tools to Manage Ideas

Authors: Nanna Inie, Peter Dalsgaard

Abstract: This paper presents a grounded theory analysis based on a qualitative study of professional interaction designers (n=20), focusing on how they use tools to manage design ideas. Idea management can be understood as a subcategory of Personal Information Management, a field that covers the activities around the capture, organization, retrieval, and use of information. Idea management, then, pertains to the management and use of ideas as part of creative activities. The paper identifies the tool-supported idea management strategies and needs of professional interaction designers, and discusses the context and consequences of these strategies. Based on our analysis, we identify a conceptual framework of ten strategies that are supported by tools: saving, externalizing, advancing, exploring, archiving, clustering, extracting, browsing, verifying, and collaborating. Finally, we discuss how this framework can be used to characterize and analyze existing and novel idea management tools.

Submitted 19 February, 2020; originally announced February 2020.

arXiv:2002.04494 [pdf, other]

doi 10.1145/3334480.3383159

The Rumour Mill: Making the Spread of Misinformation Explicit and Tangible

Authors: Nanna Inie, Jeanette Falk Olesen, Leon Derczynski

Abstract: Misinformation spread presents a technological and social threat to society. With the advance of AI-based language models, automatically generated texts have become difficult to identify and easy to create at scale. We present "The Rumour Mill", a playful art piece, designed as a commentary on the spread of rumours and automatically-generated misinformation. The mill is a tabletop interactive machine, which invites a user to experience the process of creating believable text by interacting with different tangible controls on the mill. The user manipulates visible parameters to adjust the genre and type of an automatically generated text rumour. The Rumour Mill is a physical demonstration of the state of current technology and its ability to generate and manipulate natural language text, and of the act of starting and spreading rumours.

Submitted 16 February, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

Comments: Accepted to CHI 2020 Interactivity

Showing 1–8 of 8 results for author: Inie, N