-
Soft Begging: Modular and Efficient Shielding of LLMs against Prompt Injection and Jailbreaking based on Prompt Tuning
Authors:
Simon Ostermann,
Kevin Baum,
Christoph Endres,
Julia Masloh,
Patrick Schramowski
Abstract:
Prompt injection (both direct and indirect) and jailbreaking are now recognized as significant issues for large language models (LLMs), particularly due to their potential for harm in application-integrated contexts. This extended abstract explores a novel approach to protecting LLMs from such attacks, termed "soft begging." This method involves training soft prompts to counteract the effects of c…
▽ More
Prompt injection (both direct and indirect) and jailbreaking are now recognized as significant issues for large language models (LLMs), particularly due to their potential for harm in application-integrated contexts. This extended abstract explores a novel approach to protecting LLMs from such attacks, termed "soft begging." This method involves training soft prompts to counteract the effects of corrupted prompts on the LLM's output. We provide an overview of prompt injections and jailbreaking, introduce the theoretical basis of the "soft begging" technique, and discuss an evaluation of its effectiveness.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
On the Quest for Effectiveness in Human Oversight: Interdisciplinary Perspectives
Authors:
Sarah Sterz,
Kevin Baum,
Sebastian Biewer,
Holger Hermanns,
Anne Lauber-Rönsberg,
Philip Meinel,
Markus Langer
Abstract:
Human oversight is currently discussed as a potential safeguard to counter some of the negative aspects of high-risk AI applications. This prompts a critical examination of the role and conditions necessary for what is prominently termed effective or meaningful human oversight of these systems. This paper investigates effective human oversight by synthesizing insights from psychological, legal, ph…
▽ More
Human oversight is currently discussed as a potential safeguard to counter some of the negative aspects of high-risk AI applications. This prompts a critical examination of the role and conditions necessary for what is prominently termed effective or meaningful human oversight of these systems. This paper investigates effective human oversight by synthesizing insights from psychological, legal, philosophical, and technical domains. Based on the claim that the main objective of human oversight is risk mitigation, we propose a viable understanding of effectiveness in human oversight: for human oversight to be effective, the oversight person has to have (a) sufficient causal power with regard to the system and its effects, (b) suitable epistemic access to relevant aspects of the situation, (c) self-control, and (d) fitting intentions for their role. Furthermore, we argue that this is equivalent to saying that an oversight person is effective if and only if they are morally responsible and have fitting intentions. Against this backdrop, we suggest facilitators and inhibitors of effectiveness in human oversight when striving for practical applicability. We discuss factors in three domains, namely, the technical design of the system, individual factors of oversight persons, and the environmental circumstances in which they operate. Finally, this paper scrutinizes the upcoming AI Act of the European Union -- in particular Article 14 on Human Oversight -- as an exemplary regulatory framework in which we study the practicality of our understanding of effective human oversight. By analyzing the provisions and implications of the European AI Act proposal, we pinpoint how far that proposal aligns with our analyses regarding effective human oversight as well as how it might get enriched by our conceptual understanding of effectiveness in human oversight.
△ Less
Submitted 7 May, 2024; v1 submitted 5 April, 2024;
originally announced April 2024.
-
Identifying Drivers of Predictive Aleatoric Uncertainty
Authors:
Pascal Iversen,
Simon Witzke,
Katharina Baum,
Bernhard Y. Renard
Abstract:
Explainability and uncertainty quantification are two pillars of trustable artificial intelligence. However, the reasoning behind uncertainty estimates is generally left unexplained. Identifying the drivers of uncertainty complements explanations of point predictions in recognizing model limitations and enhances trust in decisions and their communication. So far, explanations of uncertainties have…
▽ More
Explainability and uncertainty quantification are two pillars of trustable artificial intelligence. However, the reasoning behind uncertainty estimates is generally left unexplained. Identifying the drivers of uncertainty complements explanations of point predictions in recognizing model limitations and enhances trust in decisions and their communication. So far, explanations of uncertainties have been rarely studied. The few exceptions rely on Bayesian neural networks or technically intricate approaches, such as auxiliary generative models, thereby hindering their broad adoption. We present a simple approach to explain predictive aleatoric uncertainties. We estimate uncertainty as predictive variance by adapting a neural network with a Gaussian output distribution. Subsequently, we apply out-of-the-box explainers to the model's variance output. This approach can explain uncertainty influences more reliably than literature baselines, which we evaluate in a synthetic setting with a known data-generating process. We further adapt multiple metrics from conventional XAI research to uncertainty explanations. We quantify our findings with a nuanced benchmark analysis that includes real-world datasets. Finally, we apply our approach to an age regression model and discover reasonable sources of uncertainty. Overall, we explain uncertainty estimates with little modifications to the model architecture and demonstrate that our approach competes effectively with more intricate methods.
△ Less
Submitted 30 May, 2024; v1 submitted 12 December, 2023;
originally announced December 2023.
-
Software Do** Analysis for Human Oversight
Authors:
Sebastian Biewer,
Kevin Baum,
Sarah Sterz,
Holger Hermanns,
Sven Hetmank,
Markus Langer,
Anne Lauber-Rönsberg,
Franz Lehr
Abstract:
This article introduces a framework that is meant to assist in mitigating societal risks that software can pose. Concretely, this encompasses facets of software do** as well as unfairness and discrimination in high-risk decision-making systems. The term software do** refers to software that contains surreptitiously added functionality that is against the interest of the user. A prominent examp…
▽ More
This article introduces a framework that is meant to assist in mitigating societal risks that software can pose. Concretely, this encompasses facets of software do** as well as unfairness and discrimination in high-risk decision-making systems. The term software do** refers to software that contains surreptitiously added functionality that is against the interest of the user. A prominent example of software do** are the tampered emission cleaning systems that were found in millions of cars around the world when the diesel emissions scandal surfaced. The first part of this article combines the formal foundations of software do** analysis with established probabilistic falsification techniques to arrive at a black-box analysis technique for identifying undesired effects of software. We apply this technique to emission cleaning systems in diesel cars but also to high-risk systems that evaluate humans in a possibly unfair or discriminating way. We demonstrate how our approach can assist humans-in-the-loop to make better informed and more responsible decisions. This is to promote effective human oversight, which will be a central requirement enforced by the European Union's upcoming AI Act. We complement our technical contribution with a juridically, philosophically, and psychologically informed perspective on the potential problems caused by such systems.
△ Less
Submitted 11 August, 2023;
originally announced August 2023.
-
SimbaML: Connecting Mechanistic Models and Machine Learning with Augmented Data
Authors:
Maximilian Kleissl,
Lukas Drews,
Benedict B. Heyder,
Julian Zabbarov,
Pascal Iversen,
Simon Witzke,
Bernhard Y. Renard,
Katharina Baum
Abstract:
Training sophisticated machine learning (ML) models requires large datasets that are difficult or expensive to collect for many applications. If prior knowledge about system dynamics is available, mechanistic representations can be used to supplement real-world data. We present SimbaML (Simulation-Based ML), an open-source tool that unifies realistic synthetic dataset generation from ordinary diff…
▽ More
Training sophisticated machine learning (ML) models requires large datasets that are difficult or expensive to collect for many applications. If prior knowledge about system dynamics is available, mechanistic representations can be used to supplement real-world data. We present SimbaML (Simulation-Based ML), an open-source tool that unifies realistic synthetic dataset generation from ordinary differential equation-based models and the direct analysis and inclusion in ML pipelines. SimbaML conveniently enables investigating transfer learning from synthetic to real-world data, data augmentation, identifying needs for data collection, and benchmarking physics-informed ML approaches. SimbaML is available from https://pypi.org/project/simba-ml/.
△ Less
Submitted 9 July, 2023; v1 submitted 8 April, 2023;
originally announced April 2023.
-
LazyFox: Fast and parallelized overlap** community detection in large graphs
Authors:
Tim Garrels,
Athar Khodabakhsh,
Bernhard Y. Renard,
Katharina Baum
Abstract:
The detection of communities in graph datasets provides insight about a graph's underlying structure and is an important tool for various domains such as social sciences, marketing, traffic forecast, and drug discovery. While most existing algorithms provide fast approaches for community detection, their results usually contain strictly separated communities. However, most datasets would semantica…
▽ More
The detection of communities in graph datasets provides insight about a graph's underlying structure and is an important tool for various domains such as social sciences, marketing, traffic forecast, and drug discovery. While most existing algorithms provide fast approaches for community detection, their results usually contain strictly separated communities. However, most datasets would semantically allow for or even require overlap** communities that can only be determined at much higher computational cost. We build on an efficient algorithm, Fox, that detects such overlap** communities. Fox measures the closeness of a node to a community by approximating the count of triangles which that node forms with that community. We propose LazyFox, a multi-threaded version of the Fox algorithm, which provides even faster detection without an impact on community quality. This allows for the analyses of significantly larger and more complex datasets. LazyFox enables overlap** community detection on complex graph datasets with millions of nodes and billions of edges in days instead of weeks. As part of this work, LazyFox's implementation was published and is available as a tool under an MIT licence at https://github.com/TimGarrels/LazyFox.
△ Less
Submitted 6 October, 2022;
originally announced October 2022.
-
Explainability Auditing for Intelligent Systems: A Rationale for Multi-Disciplinary Perspectives
Authors:
Markus Langer,
Kevin Baum,
Kathrin Hartmann,
Stefan Hessel,
Timo Speith,
Jonas Wahl
Abstract:
National and international guidelines for trustworthy artificial intelligence (AI) consider explainability to be a central facet of trustworthy systems. This paper outlines a multi-disciplinary rationale for explainability auditing. Specifically, we propose that explainability auditing can ensure the quality of explainability of systems in applied contexts and can be the basis for certification as…
▽ More
National and international guidelines for trustworthy artificial intelligence (AI) consider explainability to be a central facet of trustworthy systems. This paper outlines a multi-disciplinary rationale for explainability auditing. Specifically, we propose that explainability auditing can ensure the quality of explainability of systems in applied contexts and can be the basis for certification as a means to communicate whether systems meet certain explainability standards and requirements. Moreover, we emphasize that explainability auditing needs to take a multi-disciplinary perspective, and we provide an overview of four perspectives (technical, psychological, ethical, legal) and their respective benefits with respect to explainability auditing.
△ Less
Submitted 5 August, 2021;
originally announced August 2021.
-
What Do We Want From Explainable Artificial Intelligence (XAI)? -- A Stakeholder Perspective on XAI and a Conceptual Model Guiding Interdisciplinary XAI Research
Authors:
Markus Langer,
Daniel Oster,
Timo Speith,
Holger Hermanns,
Lena Kästner,
Eva Schmidt,
Andreas Sesing,
Kevin Baum
Abstract:
Previous research in Explainable Artificial Intelligence (XAI) suggests that a main aim of explainability approaches is to satisfy specific interests, goals, expectations, needs, and demands regarding artificial systems (we call these stakeholders' desiderata) in a variety of contexts. However, the literature on XAI is vast, spreads out across multiple largely disconnected disciplines, and it ofte…
▽ More
Previous research in Explainable Artificial Intelligence (XAI) suggests that a main aim of explainability approaches is to satisfy specific interests, goals, expectations, needs, and demands regarding artificial systems (we call these stakeholders' desiderata) in a variety of contexts. However, the literature on XAI is vast, spreads out across multiple largely disconnected disciplines, and it often remains unclear how explainability approaches are supposed to achieve the goal of satisfying stakeholders' desiderata. This paper discusses the main classes of stakeholders calling for explainability of artificial systems and reviews their desiderata. We provide a model that explicitly spells out the main concepts and relations necessary to consider and investigate when evaluating, adjusting, choosing, and develo** explainability approaches that aim to satisfy stakeholders' desiderata. This model can serve researchers from the variety of different disciplines involved in XAI as a common ground. It emphasizes where there is interdisciplinary potential in the evaluation and the development of explainability approaches.
△ Less
Submitted 15 February, 2021;
originally announced February 2021.
-
Towards a Framework Combining Machine Ethics and Machine Explainability
Authors:
Kevin Baum,
Holger Hermanns,
Timo Speith
Abstract:
We find ourselves surrounded by a rapidly increasing number of autonomous and semi-autonomous systems. Two grand challenges arise from this development: Machine Ethics and Machine Explainability. Machine Ethics, on the one hand, is concerned with behavioral constraints for systems, so that morally acceptable, restricted behavior results; Machine Explainability, on the other hand, enables systems t…
▽ More
We find ourselves surrounded by a rapidly increasing number of autonomous and semi-autonomous systems. Two grand challenges arise from this development: Machine Ethics and Machine Explainability. Machine Ethics, on the one hand, is concerned with behavioral constraints for systems, so that morally acceptable, restricted behavior results; Machine Explainability, on the other hand, enables systems to explain their actions and argue for their decisions, so that human users can understand and justifiably trust them.
In this paper, we try to motivate and work towards a framework combining Machine Ethics and Machine Explainability. Starting from a toy example, we detect various desiderata of such a framework and argue why they should and how they could be incorporated in autonomous systems. Our main idea is to apply a framework of formal argumentation theory both, for decision-making under ethical constraints and for the task of generating useful explanations given only limited knowledge of the world. The result of our deliberations can be described as a first version of an ethically motivated, principle-governed framework combining Machine Ethics and Machine Explainability
△ Less
Submitted 2 January, 2019;
originally announced January 2019.