
Affirmative safety: An approach to risk management for high-risk AI

Authors: Akash R. Wasil, Joshua Clymer, David Krueger, Emily Dardaman, Simeon Campos, Evan R. Murphy

Abstract: Prominent AI experts have suggested that companies developing high-risk AI systems should be required to show that such systems are safe before they can be developed or deployed. The goal of this paper is to expand on this idea and explore its implications for risk management. We argue that entities developing or deploying high-risk AI systems should be required to present evidence of affirmative safety: a proactive case that their activities keep risks below acceptable thresholds. We begin the paper by highlighting global security risks from AI that have been acknowledged by AI experts and world governments. Next, we briefly describe principles of risk management from other high-risk fields (e.g., nuclear safety). Then, we propose a risk management approach for advanced AI in which model developers must provide evidence that their activities keep certain risks below regulator-set thresholds. As a first step toward understanding what affirmative safety cases should include, we illustrate how certain kinds of technical evidence and operational evidence can support an affirmative safety case. In the technical section, we discuss behavioral evidence (evidence about model outputs), cognitive evidence (evidence about model internals), and developmental evidence (evidence about the training process). In the operational section, we offer examples of organizational practices that could contribute to affirmative safety cases: information security practices, safety culture, and emergency response capacity. Finally, we briefly compare our approach to the NIST AI Risk Management Framework. Overall, we hope our work contributes to ongoing discussions about national and global security risks posed by AI and regulatory approaches to address these risks.

Submitted 14 April, 2024; originally announced June 2024.

arXiv:2311.09227 [pdf, other]

Open-Sourcing Highly Capable Foundation Models: An evaluation of risks, benefits, and alternative methods for pursuing open-source objectives

Authors: Elizabeth Seger, Noemi Dreksler, Richard Moulange, Emily Dardaman, Jonas Schuett, K. Wei, Christoph Winter, Mackenzie Arnold, Seán Ó hÉigeartaigh, Anton Korinek, Markus Anderljung, Ben Bucknall, Alan Chan, Eoghan Stafford, Leonie Koessler, Aviv Ovadya, Ben Garfinkel, Emma Bluemke, Michael Aird, Patrick Levermore, Julian Hazell, Abhishek Gupta

Abstract: Recent decisions by leading AI labs to either open-source their models or to restrict access to their models have sparked debate about whether, and how, increasingly capable AI models should be shared. Open-sourcing in AI typically refers to making model architecture and weights freely and publicly accessible for anyone to modify, study, build on, and use. This offers advantages such as enabling external oversight, accelerating progress, and decentralizing control over AI development and use. However, it also presents a growing potential for misuse and unintended consequences. This paper offers an examination of the risks and benefits of open-sourcing highly capable foundation models. While open-sourcing has historically provided substantial net benefits for most software and AI development processes, we argue that for some highly capable foundation models likely to be developed in the near future, open-sourcing may pose sufficiently extreme risks to outweigh the benefits. In such a case, highly capable foundation models should not be open-sourced, at least not initially. Alternative strategies, including non-open-source model sharing options, are explored. The paper concludes with recommendations for developers, standard-setting bodies, and governments for establishing safe and responsible model sharing practices and preserving open-source benefits where safe.

Submitted 29 September, 2023; originally announced November 2023.

Comments: Official release at https://www.governance.ai/research-paper/open-sourcing-highly-capable-foundation-models

arXiv:2303.18010 [pdf]

Augmented Collective Intelligence in Collaborative Ideation: Agenda and Challenges

Authors: Emily Dardaman, Abhishek Gupta

Abstract: AI systems may be better thought of as peers than as tools. This paper explores applications of augmented collective intelligence (ACI) beneficial to collaborative ideation. Design considerations are offered for an experiment that evaluates the performance of hybrid human-AI collectives. The investigation described combines humans and large language models (LLMs) to ideate on increasingly complex topics. A promising real-time collection tool called Polis is examined to facilitate ACI, including case studies from citizen engagement projects in Taiwan and Bowling Green, Kentucky. The authors discuss three challenges to consider when designing an ACI experiment: topic selection, participant selection, and evaluation of results. The paper concludes that researchers should address these challenges to conduct empirical studies of ACI in collaborative ideation.

Submitted 31 March, 2023; originally announced March 2023.

Comments: 5 pages

arXiv:2303.18006 [pdf]

Asking Better Questions -- The Art and Science of Forecasting: A mechanism for truer answers to high-stakes questions

Authors: Emily Dardaman, Abhishek Gupta

Abstract: Without the ability to estimate and benchmark AI capability advancements, organizations are left to respond to each change reactively, impeding their ability to build viable mid- and long-term strategies. This paper explores the recent growth of forecasting, a political science tool that uses explicit assumptions and quantitative estimation to improve prediction accuracy. Done at the collective level, forecasting can identify and verify talent, enable leaders to build better models of AI advancements and improve inputs into design policy. Successful approaches to forecasting and case studies are examined, revealing a subclass of "superforecasters" who outperform 98% of the population and whose insights will be most reliable. Finally, techniques behind successful forecasting are outlined, including Philip Tetlock's "Ten Commandments." To adapt to a quickly changing technology landscape, designers and policymakers should consider forecasting as a first line of defense.

Submitted 31 March, 2023; originally announced March 2023.

Comments: 7 pages

Showing 1–4 of 4 results for author: Dardaman, E