The Ethics of Advanced AI Assistants

Authors: Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks, Verena Rieser, Hasan Iqbal, Nenad Tomašev, Ira Ktena, Zachary Kenton, Mikel Rodriguez, Seliem El-Sayed, Sasha Brown, Canfer Akbulut, Andrew Trask, Edward Hughes, A. Stevie Bergman, Renee Shelby, Nahema Marchal, Conor Griffin, Juan Mateos-Garcia, Laura Weidinger, Winnie Street, Benjamin Lange, Alex Ingerman, Alison Lentz, et al. (32 additional authors not shown)

Abstract: This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, providing an overview of AI assistants, their technical foundations and potential range of applications. It then explores questions around AI value alignment, well-being, safety and malicious uses. Extending the circle of inquiry further, we next consider the relationship between advanced AI assistants and individual users in more detail, exploring topics such as manipulation and persuasion, anthropomorphism, appropriate relationships, trust and privacy. With this analysis in place, we consider the deployment of advanced assistants at a societal scale, focusing on cooperation, equity and access, misinformation, economic impact, the environment and how best to evaluate advanced AI assistants. Finally, we conclude by providing a range of recommendations for researchers, developers, policymakers and public stakeholders.

Submitted 28 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.14068 [pdf, other]

Holistic Safety and Responsibility Evaluations of Advanced AI Models

Authors: Laura Weidinger, Joslyn Barnhart, Jenny Brennan, Christina Butterfield, Susie Young, Will Hawkins, Lisa Anne Hendricks, Ramona Comanescu, Oscar Chang, Mikel Rodriguez, Jennifer Beroshi, Dawn Bloxwich, Lev Proleev, Jilin Chen, Sebastian Farquhar, Lewis Ho, Iason Gabriel, Allan Dafoe, William Isaac

Abstract: Safety and responsibility evaluations of advanced AI models are a critical but developing field of research and practice. In the development of Google DeepMind's advanced AI models, we innovated on and applied a broad set of approaches to safety evaluation. In this report, we summarise and share elements of our evolving approach as well as lessons learned for a broad audience. Key lessons learned include: First, theoretical underpinnings and frameworks are invaluable to organise the breadth of risk domains, modalities, forms, metrics, and goals. Second, theory and practice of safety evaluation development each benefit from collaboration to clarify goals, methods and challenges, and facilitate the transfer of insights between different stakeholders and disciplines. Third, similar key methods, lessons, and institutions apply across the range of concerns in responsibility and safety - including established and emerging harms. For this reason it is important that a wide range of actors working on safety evaluation and safety research communities work together to develop, refine and implement novel evaluation approaches and best practices, rather than operating in silos. The report concludes with outlining the clear need to rapidly advance the science of evaluations, to integrate new evaluations into the development and governance of AI, to establish scientifically-grounded norms and standards, and to promote a robust evaluation ecosystem.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 10 pages excluding bibliography

arXiv:2403.13793 [pdf, other]

Evaluating Frontier Models for Dangerous Capabilities

Authors: Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum, Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin, Sebastian Farquhar, Marcus Hutter, Gregoire Deletang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca Dragan, Rohin Shah, et al. (2 additional authors not shown)

Abstract: To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini 1.0 models. Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning. We do not find evidence of strong dangerous capabilities in the models we evaluated, but we flag early warning signs. Our goal is to help advance a rigorous science of dangerous capability evaluation, in preparation for future models.

Submitted 5 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.02462 [pdf, ps, other]

Levels of AGI for Operationalizing Progress on the Path to AGI

Authors: Meredith Ringel Morris, Jascha Sohl-dickstein, Noah Fiedel, Tris Warkentin, Allan Dafoe, Aleksandra Faust, Clement Farabet, Shane Legg

Abstract: We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors. This framework introduces levels of AGI performance, generality, and autonomy, providing a common language to compare models, assess risks, and measure progress along the path to AGI. To develop our framework, we analyze existing definitions of AGI, and distill six principles that a useful ontology for AGI should satisfy. With these principles in mind, we propose "Levels of AGI" based on depth (performance) and breadth (generality) of capabilities, and reflect on how current systems fit into this ontology. We discuss the challenging requirements for future benchmarks that quantify the behavior and capabilities of AGI models against these levels. Finally, we discuss how these levels of AGI interact with deployment considerations such as autonomy and risk, and emphasize the importance of carefully selecting Human-AI Interaction paradigms for responsible and safe deployment of highly capable AI systems.

Submitted 5 June, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

Comments: version 4 - Position Paper accepted to ICML 2024. Note that due to ICML position paper titling format requirements, the title has changed slightly from that of the original arXiv pre-print. The original pre-print title was "Levels of AGI: Operationalizing Progress on the Path to AGI" but the official published title for ICML 2024 is "Levels of AGI for Operationalizing Progress on the Path to AGI"

Journal ref: Proceedings of ICML 2024

arXiv:2307.04699 [pdf, other]

International Institutions for Advanced AI

Authors: Lewis Ho, Joslyn Barnhart, Robert Trager, Yoshua Bengio, Miles Brundage, Allison Carnegie, Rumman Chowdhury, Allan Dafoe, Gillian Hadfield, Margaret Levi, Duncan Snidal

Abstract: International institutions may have an important role to play in ensuring advanced AI systems benefit humanity. International collaborations can unlock AI's ability to further sustainable development, and coordination of regulatory efforts can reduce obstacles to innovation and the spread of benefits. Conversely, the potential dangerous capabilities of powerful and general-purpose AI systems create global externalities in their development and deployment, and international efforts to further responsible AI practices could help manage the risks they pose. This paper identifies a set of governance functions that could be performed at an international level to address these challenges, ranging from supporting access to frontier AI systems to setting international safety standards. It groups these functions into four institutional models that exhibit internal synergies and have precedents in existing organizations: 1) a Commission on Frontier AI that facilitates expert consensus on opportunities and risks from advanced AI, 2) an Advanced AI Governance Organization that sets international standards to manage global threats from advanced models, supports their implementation, and possibly monitors compliance with a future governance regime, 3) a Frontier AI Collaborative that promotes access to cutting-edge AI, and 4) an AI Safety Project that brings together leading researchers and engineers to further AI safety research. We explore the utility of these models and identify open questions about their viability.

Submitted 11 July, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

Comments: 19 pages, 2 figures, fixed rendering issues

ACM Class: K.4.1

arXiv:2305.15324 [pdf, other]

Model evaluation for extreme risks

Authors: Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong, Jess Whittlestone, Jade Leung, Daniel Kokotajlo, Nahema Marchal, Markus Anderljung, Noam Kolt, Lewis Ho, Divya Siddarth, Shahar Avin, Will Hawkins, Been Kim, Iason Gabriel, Vijay Bolina, Jack Clark, Yoshua Bengio, Paul Christiano, Allan Dafoe

Abstract: Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify dangerous capabilities (through "dangerous capability evaluations") and the propensity of models to apply their capabilities for harm (through "alignment evaluations"). These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.

Submitted 22 September, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Fixed typos; added citation

ACM Class: K.4.1

arXiv:2303.12642 [pdf]

Democratising AI: Multiple Meanings, Goals, and Methods

Authors: Elizabeth Seger, Aviv Ovadya, Ben Garfinkel, Divya Siddarth, Allan Dafoe

Abstract: Numerous parties are calling for the democratisation of AI, but the phrase is used to refer to a variety of goals, the pursuit of which sometimes conflict. This paper identifies four kinds of AI democratisation that are commonly discussed: (1) the democratisation of AI use, (2) the democratisation of AI development, (3) the democratisation of AI profits, and (4) the democratisation of AI governance. Numerous goals and methods of achieving each form of democratisation are discussed. The main takeaway from this paper is that AI democratisation is a multifarious and sometimes conflicting concept that should not be conflated with improving AI accessibility. If we want to move beyond ambiguous commitments to democratising AI, to productive discussions of concrete policies and trade-offs, then we need to recognise the principal role of the democratisation of AI governance in navigating tradeoffs and risks across decisions around use, development, and profits.

Submitted 7 August, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

Comments: V2 Changed second author affiliation; added citation to section 5.2; edit to author contribution statement; V3 camera ready version for conference proceedings. Minor content changes in response to reviewer comments

arXiv:2206.04132 [pdf, other]

Forecasting AI Progress: Evidence from a Survey of Machine Learning Researchers

Authors: Baobao Zhang, Noemi Dreksler, Markus Anderljung, Lauren Kahn, Charlie Giattino, Allan Dafoe, Michael C. Horowitz

Abstract: Advances in artificial intelligence (AI) are shaping modern life, from transportation, health care, science, finance, to national defense. Forecasts of AI development could help improve policy- and decision-making. We report the results from a large survey of AI and machine learning (ML) researchers on their beliefs about progress in AI. The survey, fielded in late 2019, elicited forecasts for near-term AI development milestones and high- or human-level machine intelligence, defined as when machines are able to accomplish every or almost every task humans are able to do currently. As part of this study, we re-contacted respondents from a highly-cited study by Grace et al. (2018), in which AI/ML researchers gave forecasts about high-level machine intelligence and near-term milestones in AI development. Results from our 2019 survey show that, in aggregate, AI/ML researchers surveyed placed a 50% likelihood of human-level machine intelligence being achieved by 2060. The results show researchers newly contacted in 2019 expressed similar beliefs about the progress of advanced AI as respondents in the Grace et al. (2018) survey. For the recontacted participants from the Grace et al. (2018) study, the aggregate forecast for a 50% likelihood of high-level machine intelligence shifted from 2062 to 2076, although this change is not statistically significant, likely due to the small size of our panel sample. Forecasts of several near-term AI milestones have reduced in time, suggesting more optimism about AI progress. Finally, AI/ML researchers also exhibited significant optimism about how human-level machine intelligence will impact society.

Submitted 8 June, 2022; originally announced June 2022.

ACM Class: K.4.1

arXiv:2111.13872 [pdf, other]

Normative Disagreement as a Challenge for Cooperative AI

Authors: Julian Stastny, Maxime Riché, Alexander Lyzhov, Johannes Treutlein, Allan Dafoe, Jesse Clifton

Abstract: Cooperation in settings where agents have both common and conflicting interests (mixed-motive environments) has recently received considerable attention in multi-agent learning. However, the mixed-motive environments typically studied have a single cooperative outcome on which all agents can agree. Many real-world multi-agent environments are instead bargaining problems (BPs): they have several Pareto-optimal payoff profiles over which agents have conflicting preferences. We argue that typical cooperation-inducing learning algorithms fail to cooperate in BPs when there is room for normative disagreement resulting in the existence of multiple competing cooperative equilibria, and illustrate this problem empirically. To remedy the issue, we introduce the notion of norm-adaptive policies. Norm-adaptive policies are capable of behaving according to different norms in different circumstances, creating opportunities for resolving normative disagreement. We develop a class of norm-adaptive policies and show in experiments that these significantly increase cooperation. However, norm-adaptiveness cannot address residual bargaining failure arising from a fundamental tradeoff between exploitability and cooperative robustness.

Submitted 27 November, 2021; originally announced November 2021.

Comments: Accepted at the Cooperative AI workshop and the Strategic ML workshop at NeurIPS 2021

arXiv:2106.11039 [pdf, ps, other]

doi 10.1038/s42256-021-00298-y

Institutionalising Ethics in AI through Broader Impact Requirements

Authors: Carina Prunkl, Carolyn Ashurst, Markus Anderljung, Helena Webb, Jan Leike, Allan Dafoe

Abstract: Turning principles into practice is one of the most pressing challenges of artificial intelligence (AI) governance. In this article, we reflect on a novel governance initiative by one of the world's largest AI conferences. In 2020, the Conference on Neural Information Processing Systems (NeurIPS) introduced a requirement for submitting authors to include a statement on the broader societal impacts of their research. Drawing insights from similar governance initiatives, including institutional review boards (IRBs) and impact requirements for funding applications, we investigate the risks, challenges and potential benefits of such an initiative. Among the challenges, we list a lack of recognised best practice and procedural transparency, researcher opportunity costs, institutional and social pressures, cognitive biases, and the inherently difficult nature of the task. The potential benefits, on the other hand, include improved anticipation and identification of impacts, better communication with policy and governance experts, and a general strengthening of the norms around responsible research. To maximise the chance of success, we recommend measures to increase transparency, improve guidance, create incentives to engage earnestly with the process, and facilitate public deliberation on the requirement's merits and future. Perhaps the most important contribution from this analysis are the insights we can gain regarding effective community-based governance and the role and responsibility of the AI research community more broadly.

Submitted 30 May, 2021; originally announced June 2021.

Journal ref: Nature Machine Intelligence 3.2 (2021): 104-110

arXiv:2106.04338 [pdf]

Engines of Power: Electricity, AI, and General-Purpose Military Transformations

Authors: Jeffrey Ding, Allan Dafoe

Abstract: Major theories of military innovation focus on relatively narrow technological developments, such as nuclear weapons or aircraft carriers. Arguably the most profound military implications of technological change, however, come from more fundamental advances arising from general purpose technologies, such as the steam engine, electricity, and the computer. With few exceptions, political scientists have not theorized about GPTs. Drawing from the economics literature on GPTs, we distill several propositions on how and when GPTs affect military affairs. We call these effects general-purpose military transformations. In particular, we argue that the impacts of GMTs on military effectiveness are broad, delayed, and shaped by indirect productivity spillovers. Additionally, GMTs differentially advantage those militaries that can draw from a robust industrial base in the GPT. To illustrate the explanatory value of our theory, we conduct a case study of the military consequences of electricity, the prototypical GPT. Finally, we apply our findings to artificial intelligence, which will plausibly cause a profound general-purpose military transformation.

Submitted 8 June, 2021; originally announced June 2021.

arXiv:2105.02117 [pdf, other]

Ethics and Governance of Artificial Intelligence: Evidence from a Survey of Machine Learning Researchers

Authors: Baobao Zhang, Markus Anderljung, Lauren Kahn, Noemi Dreksler, Michael C. Horowitz, Allan Dafoe

Abstract: Machine learning (ML) and artificial intelligence (AI) researchers play an important role in the ethics and governance of AI, including taking action against what they perceive to be unethical uses of AI (Belfield, 2020; Van Noorden, 2020). Nevertheless, this influential group's attitudes are not well understood, which undermines our ability to discern consensuses or disagreements between AI/ML researchers. To examine these researchers' views, we conducted a survey of those who published in the top AI/ML conferences (N = 524). We compare these results with those from a 2016 survey of AI/ML researchers (Grace, Salvatier, Dafoe, Zhang, & Evans, 2018) and a 2018 survey of the US public (Zhang & Dafoe, 2020). We find that AI/ML researchers place high levels of trust in international organizations and scientific organizations to shape the development and use of AI in the public interest; moderate trust in most Western tech companies; and low trust in national militaries, Chinese tech companies, and Facebook. While the respondents were overwhelmingly opposed to AI/ML researchers working on lethal autonomous weapons, they are less opposed to researchers working on other military applications of AI, particularly logistics algorithms. A strong majority of respondents think that AI safety research should be prioritized and that ML institutions should conduct pre-publication review to assess potential harms. Being closer to the technology itself, AI/ML researchers are well placed to highlight new risks and develop technical solutions, so this novel attempt to measure their attitudes has broad relevance. The findings should help to improve how researchers, private sector executives, and policymakers think about regulations, governance frameworks, guiding principles, and national and international governance strategies for AI.

Submitted 5 May, 2021; originally announced May 2021.

ACM Class: K.7.4

arXiv:2104.07237 [pdf, other]

doi 10.1145/3461702.3462617

Skilled and Mobile: Survey Evidence of AI Researchers' Immigration Preferences

Authors: Remco Zwetsloot, Baobao Zhang, Noemi Dreksler, Lauren Kahn, Markus Anderljung, Allan Dafoe, Michael C. Horowitz

Abstract: Countries, companies, and universities are increasingly competing over top-tier artificial intelligence (AI) researchers. Where are these researchers likely to immigrate and what affects their immigration decisions? We conducted a survey $(n = 524)$ of the immigration preferences and motivations of researchers that had papers accepted at one of two prestigious AI conferences: the Conference on Neural Information Processing Systems (NeurIPS) and the International Conference on Machine Learning (ICML). We find that the U.S. is the most popular destination for AI researchers, followed by the U.K., Canada, Switzerland, and France. A country's professional opportunities stood out as the most common factor that influences immigration decisions of AI researchers, followed by lifestyle and culture, the political climate, and personal relations. The destination country's immigration policies were important to just under half of the researchers surveyed, while around a quarter noted current immigration difficulties to be a deciding factor. Visa and immigration difficulties were perceived to be a particular impediment to conducting AI research in the U.S., the U.K., and Canada. Implications of the findings for the future of AI talent policies and governance are discussed.

Submitted 5 May, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

Comments: Accepted for poster presentation at the 2021 AAAI/ACM Conference on AI, Ethics, and Society

ACM Class: K.7.4

arXiv:2101.09195 [pdf, other]

Randomization Inference beyond the Sharp Null: Bounded Null Hypotheses and Quantiles of Individual Treatment Effects

Authors: Devin Caughey, Allan Dafoe, Xinran Li, Luke Miratrix

Abstract: Randomization inference (RI) is typically interpreted as testing Fisher's "sharp" null hypothesis that all unit-level effects are exactly zero. This hypothesis is often criticized as restrictive and implausible, making its rejection scientifically uninteresting. We show, however, that many randomization tests are also valid for a "bounded" null hypothesis under which the unit-level effects are all non-positive (or all non-negative) but are otherwise heterogeneous. In addition to being more plausible a priori, bounded nulls are closely related to substantively important concepts such as monotonicity and Pareto efficiency. Reinterpreting RI in this way also dramatically expands the range of inferences possible in this framework. We show that exact confidence intervals for the maximum (or minimum) unit-level effect can be obtained by inverting tests for a sequence of bounded nulls. We also generalize RI to cover inference for quantiles of the individual effect distribution as well as for the proportion of individual effects larger (or smaller) than a given threshold. The proposed confidence intervals for all effect quantiles are simultaneously valid, in the sense that no correction for multiple analyses is required, and are thus a "free lunch" added to conventional RI. In sum, our reinterpretation and generalization provide a broader justification for randomization tests and a basis for exact nonparametric inference for effect quantiles.
We illustrate our methods with simulations and applications, finding that Stephenson rank statistics can provide more informative results than the more common Wilcoxon rank or difference-in-means statistics. We also provide an R package RIQITE implementing the proposed approach.

Submitted 28 August, 2023; v1 submitted 22 January, 2021; originally announced January 2021.

arXiv:2012.08630 [pdf, other]

Open Problems in Cooperative AI

Authors: Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R. McKee, Joel Z. Leibo, Kate Larson, Thore Graepel

Abstract: Problems of cooperation--in which agents seek ways to jointly improve their welfare--are ubiquitous and important. They can be found at scales ranging from our daily routines--such as driving on highways, scheduling meetings, and working collaboratively--to our global challenges--such as peace, commerce, and pandemic preparedness. Arguably, the success of the human species is rooted in our ability to cooperate. Since machines powered by artificial intelligence are playing an ever greater role in our lives, it will be important to equip them with the capabilities necessary to cooperate and to foster cooperation. We see an opportunity for the field of artificial intelligence to explicitly focus effort on this class of problems, which we term Cooperative AI. The objective of this research would be to study the many aspects of the problems of cooperation and to innovate in AI to contribute to solving these problems. Central goals include building machine agents with the capabilities needed for cooperation, building tools to foster cooperation in populations of (machine and/or human) agents, and otherwise conducting AI research for insight relevant to problems of cooperation. This research integrates ongoing work on multi-agent systems, game theory and social choice, human-machine interaction and alignment, natural-language processing, and the construction of social tools and platforms.
However, Cooperative AI is not the union of these existing areas, but rather an independent bet about the productivity of specific kinds of conversations that involve these and other areas. We see opportunity to more explicitly focus on the problem of cooperation, to construct unified theory and vocabulary, and to build bridges with adjacent communities working on cooperation, including in the natural, social, and behavioural sciences.

Submitted 15 December, 2020; originally announced December 2020.

arXiv:2012.08347 [pdf]

Beyond Privacy Trade-offs with Structured Transparency

Authors: Andrew Trask, Emma Bluemke, Teddy Collins, Ben Garfinkel, Eric Drexler, Claudia Ghezzou Cuervas-Mons, Iason Gabriel, Allan Dafoe, William Isaac

Abstract: Successful collaboration involves sharing information. However, parties may disagree on how the information they need to share should be used. We argue that many of these concerns reduce to 'the copy problem': once a bit of information is copied and shared, the sender can no longer control how the recipient uses it. From the perspective of each collaborator, this presents a dilemma that can inhibit collaboration. The copy problem is often amplified by three related problems which we term the bundling, edit, and recursive enforcement problems. We find that while the copy problem is not solvable, aspects of these amplifying problems have been addressed in a variety of disconnected fields. We observe that combining these efforts could improve the governability of information flows and thereby incentivise collaboration. We propose a five-part framework which groups these efforts into specific capabilities and offers a foundation for their integration into an overarching vision we call "structured transparency". We conclude by surveying an array of use-cases that illustrate the structured transparency principles and their related capabilities.

Submitted 12 March, 2024; v1 submitted 15 December, 2020; originally announced December 2020.

arXiv:2004.07213 [pdf, ps, other]

Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims

Authors: Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker, Jade Leung, Andrew Trask, Emma Bluemke, Jonathan Lebensold, Cullen O'Keefe, Mark Koren, Théo Ryffel, JB Rubinovitz, Tamay Besiroglu, Federica Carugati, Jack Clark, Peter Eckersley, et al. (34 additional authors not shown)

Abstract: With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development. In order for AI developers to earn trust from system users, customers, civil society, governments, and other stakeholders that they are building AI responsibly, they will need to make verifiable claims to which they can be held accountable. Those outside of a given organization also need effective means of scrutinizing such claims. This report suggests various steps that different stakeholders can take to improve the verifiability of claims made about AI systems and their associated development processes, with a focus on providing evidence about the safety, security, fairness, and privacy protection of AI systems. We analyze ten mechanisms for this purpose--spanning institutions, software, and hardware--and make recommendations aimed at implementing, exploring, or improving those mechanisms.

Submitted 20 April, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

arXiv:2001.05068 [pdf, other]

doi 10.1145/3375627.3375863

Social and Governance Implications of Improved Data Efficiency

Authors: Aaron D. Tucker, Markus Anderljung, Allan Dafoe

Abstract: Many researchers work on improving the data efficiency of machine learning. What would happen if they succeed? This paper explores the social-economic impact of increased data efficiency. Specifically, we examine the intuition that data efficiency will erode the barriers to entry protecting incumbent data-rich AI firms, exposing them to more competition from data-poor firms. We find that this intuition is only partially correct: data efficiency makes it easier to create ML applications, but large AI firms may have more to gain from higher performing AI systems. Further, we find that the effect on privacy, data markets, robustness, and misuse are complex. For example, while it seems intuitive that misuse risk would increase along with data efficiency -- as more actors gain access to any level of capability -- the net effect crucially depends on how much defensive measures are improved. More investigation into data efficiency, as well as research into the "AI production function", will be key to understanding the development of the AI industry and its societal impacts.

Submitted 14 January, 2020; originally announced January 2020.

Comments: 7 pages, 2 figures, accepted to Artificial Intelligence Ethics and Society 2020

arXiv:2001.03246 [pdf]

doi 10.1080/09636412.2021.1915583

The Logic of Strategic Assets: From Oil to Artificial Intelligence

Authors: Jeffrey Ding, Allan Dafoe

Abstract: What resources and technologies are strategic? This question is often the focus of policy and theoretical debates, where the label "strategic" designates those assets that warrant the attention of the highest levels of the state. But these conversations are plagued by analytical confusion, flawed heuristics, and the rhetorical use of "strategic" to advance particular agendas. We aim to improve these conversations through conceptual clarification, introducing a theory based on important rivalrous externalities for which socially optimal behavior will not be produced alone by markets or individual national security entities. We distill and theorize the most important three forms of these externalities, which involve cumulative-, infrastructure-, and dependency-strategic logics. We then employ these logics to clarify three important cases: the Avon 2 engine in the 1950s, the U.S.-Japan technology rivalry in the late 1980s, and contemporary conversations about artificial intelligence.

Submitted 31 May, 2021; v1 submitted 9 January, 2020; originally announced January 2020.

Comments: Added references and corrected typos

arXiv:2001.00463 [pdf, ps, other]

The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse?

Authors: Toby Shevlane, Allan Dafoe

Abstract: There is growing concern over the potential misuse of artificial intelligence (AI) research. Publishing scientific research can facilitate misuse of the technology, but the research can also contribute to protections against misuse. This paper addresses the balance between these two effects. Our theoretical framework elucidates the factors governing whether the published research will be more useful for attackers or defenders, such as the possibility for adequate defensive measures, or the independent discovery of the knowledge outside of the scientific community. The balance will vary across scientific fields. However, we show that the existing conversation within AI has imported concepts and conclusions from prior debates within computer security over the disclosure of software vulnerabilities. While disclosure of software vulnerabilities often favours defence, this cannot be assumed for AI research. The AI research community should consider concepts and policies from a broad set of adjacent fields, and ultimately needs to craft policy well-suited to its particular challenges.

Submitted 9 January, 2020; v1 submitted 27 December, 2019; originally announced January 2020.

arXiv:1912.12835 [pdf, other]

U.S. Public Opinion on the Governance of Artificial Intelligence

Authors: Baobao Zhang, Allan Dafoe

Abstract: Artificial intelligence (AI) has widespread societal implications, yet social scientists are only beginning to study public attitudes toward the technology. Existing studies find that the public's trust in institutions can play a major role in shaping the regulation of emerging technologies. Using a large-scale survey (N=2000), we examined Americans' perceptions of 13 AI governance challenges as well as their trust in governmental, corporate, and multistakeholder institutions to responsibly develop and manage AI. While Americans perceive all of the AI governance issues to be important for tech companies and governments to manage, they have only low to moderate trust in these institutions to manage AI applications.

Submitted 30 December, 2019; originally announced December 2019.

Comments: 22 pages; 7 figures; 4 tables; accepted for oral presentation at the 2020 AAAI/ACM Conference on AI, Ethics, and Society

arXiv:1912.11595 [pdf]

The Windfall Clause: Distributing the Benefits of AI for the Common Good

Authors: Cullen O'Keefe, Peter Cihon, Ben Garfinkel, Carrick Flynn, Jade Leung, Allan Dafoe

Abstract: As the transformative potential of AI has become increasingly salient as a matter of public and political interest, there has been growing discussion about the need to ensure that AI broadly benefits humanity. This in turn has spurred debate on the social responsibilities of large technology companies to serve the interests of society at large. In response, ethical principles and codes of conduct have been proposed to meet the escalating demand for this responsibility to be taken seriously. As yet, however, few institutional innovations have been suggested to translate this responsibility into legal commitments which apply to companies positioned to reap large financial gains from the development and use of AI. This paper offers one potentially attractive tool for addressing such issues: the Windfall Clause, which is an ex ante commitment by AI firms to donate a significant amount of any eventual extremely large profits. By this we mean an early commitment that profits that a firm could not earn without achieving fundamental, economically transformative breakthroughs in AI capabilities will be donated to benefit humanity broadly, with particular attention towards mitigating any downsides from deployment of windfall-generating AI.

Submitted 24 January, 2020; v1 submitted 25 December, 2019; originally announced December 2019.

Comments: Short version to be published in proceedings of AIES

arXiv:1806.00610 [pdf, other]

Between Progress and Potential Impact of AI: the Neglected Dimensions

Authors: Fernando Martínez-Plumed, Shahar Avin, Miles Brundage, Allan Dafoe, Sean Ó hÉigeartaigh, José Hernández-Orallo

Abstract: We reframe the analysis of progress in AI by incorporating into an overall framework both the task performance of a system, and the time and resource costs incurred in the development and deployment of the system. These costs include: data, expert knowledge, human oversight, software resources, computing cycles, hardware and network facilities, and (what kind of) time. These costs are distributed over the life cycle of the system, and may place differing demands on different developers and users. The multidimensional performance and cost space we present can be collapsed to a single utility metric that measures the value of the system for different stakeholders. Even without a single utility function, AI advances can be generically assessed by whether they expand the Pareto surface. We label these types of costs as neglected dimensions of AI progress, and explore them using four case studies: Alpha* (Go, Chess, and other board games), ALE (Atari games), ImageNet (Image classification) and Virtual Personal Assistants (Siri, Alexa, Cortana, and Google Assistant). This broader model of progress in AI will lead to novel ways of estimating the potential societal use and impact of an AI system, and the establishment of milestones for future progress.

Submitted 2 July, 2022; v1 submitted 2 June, 2018; originally announced June 2018.

arXiv:1802.07228 [pdf]

The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation

Authors: Miles Brundage, Shahar Avin, Jack Clark, Helen Toner, Peter Eckersley, Ben Garfinkel, Allan Dafoe, Paul Scharre, Thomas Zeitzoff, Bobby Filar, Hyrum Anderson, Heather Roff, Gregory C. Allen, Jacob Steinhardt, Carrick Flynn, Seán Ó hÉigeartaigh, Simon Beard, Haydn Belfield, Sebastian Farquhar, Clare Lyle, Rebecca Crootof, Owain Evans, Michael Page, Joanna Bryson, Roman Yampolskiy, et al. (1 additional author not shown)

Abstract: This report surveys the landscape of potential security threats from malicious uses of AI, and proposes ways to better forecast, prevent, and mitigate these threats. After analyzing the ways in which AI may influence the threat landscape in the digital, physical, and political domains, we make four high-level recommendations for AI researchers and other stakeholders. We also suggest several promising areas for further research that could expand the portfolio of defenses, or make attacks less effective or harder to execute. Finally, we discuss, but do not conclusively resolve, the long-term equilibrium of attackers and defenders.

Submitted 20 February, 2018; originally announced February 2018.

arXiv:1709.07339 [pdf, other]

Beyond the Sharp Null: Randomization Inference, Bounded Null Hypotheses, and Confidence Intervals for Maximum Effects

Authors: Devin Caughey, Allan Dafoe, Luke Miratrix

Abstract: Fisherian randomization inference is often dismissed as testing an uninteresting and implausible hypothesis: the sharp null of no effects whatsoever. We show that this view is overly narrow. Many randomization tests are also valid under a more general "bounded" null hypothesis under which all effects are weakly negative (or positive), thus accommodating heterogenous effects. By inverting such tests we can form one-sided confidence intervals for the maximum (or minimum) effect. These properties hold for all effect-increasing test statistics, which include both common statistics such as the mean difference and uncommon ones such as Stephenson rank statistics. The latter's sensitivity to extreme effects permits detection of positive effects even when the average effect is negative. We argue that bounded nulls are often of substantive or theoretical interest, and illustrate with two applications: testing monotonicity in an IV analysis and inferring effect sizes in a small randomized experiment.

Submitted 21 September, 2017; originally announced September 2017.

arXiv:1705.08807 [pdf, other]

When Will AI Exceed Human Performance? Evidence from AI Experts

Authors: Katja Grace, John Salvatier, Allan Dafoe, Baobao Zhang, Owain Evans

Abstract: Advances in artificial intelligence (AI) will transform modern life by reshaping transportation, health, science, finance, and the military. To adapt public policy, we need to better anticipate these advances. Here we report the results from a large survey of machine learning researchers on their beliefs about progress in AI. Researchers predict AI will outperform humans in many activities in the next ten years, such as translating languages (by 2024), writing high-school essays (by 2026), driving a truck (by 2027), working in retail (by 2031), writing a bestselling book (by 2049), and working as a surgeon (by 2053). Researchers believe there is a 50% chance of AI outperforming humans in all tasks in 45 years and of automating all human jobs in 120 years, with Asian respondents expecting these dates much sooner than North Americans. These results will inform discussion amongst researchers and policymakers about anticipating and managing trends in AI.

Submitted 3 May, 2018; v1 submitted 24 May, 2017; originally announced May 2017.

Comments: Accepted by Journal of Artificial Intelligence Research (AI and Society Track). Minor update to refer to related work (page 5)

arXiv:1609.00606 [pdf, other]

doi 10.1515/em-2017-0013

The magnitude and direction of collider bias for binary variables

Authors: Trang Quynh Nguyen, Allan Dafoe, Elizabeth L. Ogburn

Abstract: Suppose we are interested in the effect of variable $X$ on variable $Y$. If $X$ and $Y$ both influence, or are associated with variables that influence, a common outcome, called a collider, then conditioning on the collider (or on a variable influenced by the collider -- its "child") induces a spurious association between $X$ and $Y$, which is known as collider bias. Characterizing the magnitude and direction of collider bias is crucial for understanding the implications of selection bias and for adjudicating decisions about whether to control for variables that are known to be associated with both exposure and outcome but could be either confounders or colliders. Considering a class of situations where all variables are binary, and where $X$ and $Y$ either are, or are respectively influenced by, two marginally independent causes of a collider, we derive collider bias that results from (i) conditioning on specific levels of, or (ii) linear regression adjustment for, the collider (or its child). We also derive simple conditions that determine the sign of such bias.

Submitted 14 January, 2019; v1 submitted 2 September, 2016; originally announced September 2016.

Journal ref: Epidemiologic Methods. 2019. Vol 8, Issue 1

Showing 1–29 of 29 results for author: Dafoe, A