Skip to main content

Showing 1–29 of 29 results for author: Dafoe, A

.
  1. arXiv:2404.16244  [pdf, other

    cs.CY

    The Ethics of Advanced AI Assistants

    Authors: Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks, Verena Rieser, Hasan Iqbal, Nenad Tomašev, Ira Ktena, Zachary Kenton, Mikel Rodriguez, Seliem El-Sayed, Sasha Brown, Canfer Akbulut, Andrew Trask, Edward Hughes, A. Stevie Bergman, Renee Shelby, Nahema Marchal, Conor Griffin, Juan Mateos-Garcia, Laura Weidinger, Winnie Street, Benjamin Lange, Alex Ingerman, Alison Lentz , et al. (32 additional authors not shown)

    Abstract: This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, pro… ▽ More

    Submitted 28 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  2. arXiv:2404.14068  [pdf, other

    cs.AI cs.LG

    Holistic Safety and Responsibility Evaluations of Advanced AI Models

    Authors: Laura Weidinger, Joslyn Barnhart, Jenny Brennan, Christina Butterfield, Susie Young, Will Hawkins, Lisa Anne Hendricks, Ramona Comanescu, Oscar Chang, Mikel Rodriguez, Jennifer Beroshi, Dawn Bloxwich, Lev Proleev, Jilin Chen, Sebastian Farquhar, Lewis Ho, Iason Gabriel, Allan Dafoe, William Isaac

    Abstract: Safety and responsibility evaluations of advanced AI models are a critical but develo** field of research and practice. In the development of Google DeepMind's advanced AI models, we innovated on and applied a broad set of approaches to safety evaluation. In this report, we summarise and share elements of our evolving approach as well as lessons learned for a broad audience. Key lessons learned… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 10 pages excluding bibliography

  3. arXiv:2403.13793  [pdf, other

    cs.LG

    Evaluating Frontier Models for Dangerous Capabilities

    Authors: Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum, Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin, Sebastian Farquhar, Marcus Hutter, Gregoire Deletang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca Dragan, Rohin Shah , et al. (2 additional authors not shown)

    Abstract: To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini 1.0 models. Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning. We do not find evidence of strong dangerous… ▽ More

    Submitted 5 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  4. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  5. arXiv:2312.11805  [pdf, other

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  6. arXiv:2311.02462  [pdf, ps, other

    cs.AI

    Levels of AGI for Operationalizing Progress on the Path to AGI

    Authors: Meredith Ringel Morris, Jascha Sohl-dickstein, Noah Fiedel, Tris Warkentin, Allan Dafoe, Aleksandra Faust, Clement Farabet, Shane Legg

    Abstract: We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors. This framework introduces levels of AGI performance, generality, and autonomy, providing a common language to compare models, assess risks, and measure progress along the path to AGI. To develop our framework, we analyze existing definitions of AGI, and distill… ▽ More

    Submitted 5 June, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

    Comments: version 4 - Position Paper accepted to ICML 2024. Note that due to ICML position paper titling format requirements, the title has changed slightly from that of the original arXiv pre-print. The original pre-print title was "Levels of AGI: Operationalizing Progress on the Path to AGI" but the official published title for ICML 2024 is "Levels of AGI for Operationalizing Progress on the Path to AGI"

    Journal ref: Proceedings of ICML 2024

  7. arXiv:2307.04699  [pdf, other

    cs.CY

    International Institutions for Advanced AI

    Authors: Lewis Ho, Joslyn Barnhart, Robert Trager, Yoshua Bengio, Miles Brundage, Allison Carnegie, Rumman Chowdhury, Allan Dafoe, Gillian Hadfield, Margaret Levi, Duncan Snidal

    Abstract: International institutions may have an important role to play in ensuring advanced AI systems benefit humanity. International collaborations can unlock AI's ability to further sustainable development, and coordination of regulatory efforts can reduce obstacles to innovation and the spread of benefits. Conversely, the potential dangerous capabilities of powerful and general-purpose AI systems creat… ▽ More

    Submitted 11 July, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

    Comments: 19 pages, 2 figures, fixed rendering issues

    ACM Class: K.4.1

  8. arXiv:2305.15324  [pdf, other

    cs.AI

    Model evaluation for extreme risks

    Authors: Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong, Jess Whittlestone, Jade Leung, Daniel Kokotajlo, Nahema Marchal, Markus Anderljung, Noam Kolt, Lewis Ho, Divya Siddarth, Shahar Avin, Will Hawkins, Been Kim, Iason Gabriel, Vijay Bolina, Jack Clark, Yoshua Bengio, Paul Christiano, Allan Dafoe

    Abstract: Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify danger… ▽ More

    Submitted 22 September, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Fixed typos; added citation

    ACM Class: K.4.1

  9. arXiv:2303.12642  [pdf

    cs.AI cs.CY cs.LG

    Democratising AI: Multiple Meanings, Goals, and Methods

    Authors: Elizabeth Seger, Aviv Ovadya, Ben Garfinkel, Divya Siddarth, Allan Dafoe

    Abstract: Numerous parties are calling for the democratisation of AI, but the phrase is used to refer to a variety of goals, the pursuit of which sometimes conflict. This paper identifies four kinds of AI democratisation that are commonly discussed: (1) the democratisation of AI use, (2) the democratisation of AI development, (3) the democratisation of AI profits, and (4) the democratisation of AI governanc… ▽ More

    Submitted 7 August, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

    Comments: V2 Changed second author affiliation; added citation to section 5.2; edit to author contribution statement; V3 camera ready version for conference proceedings. Minor content changes in response to reviewer comments

  10. arXiv:2206.04132  [pdf, other

    cs.CY

    Forecasting AI Progress: Evidence from a Survey of Machine Learning Researchers

    Authors: Baobao Zhang, Noemi Dreksler, Markus Anderljung, Lauren Kahn, Charlie Giattino, Allan Dafoe, Michael C. Horowitz

    Abstract: Advances in artificial intelligence (AI) are sha** modern life, from transportation, health care, science, finance, to national defense. Forecasts of AI development could help improve policy- and decision-making. We report the results from a large survey of AI and machine learning (ML) researchers on their beliefs about progress in AI. The survey, fielded in late 2019, elicited forecasts for nea… ▽ More

    Submitted 8 June, 2022; originally announced June 2022.

    ACM Class: K.4.1

  11. arXiv:2111.13872  [pdf, other

    cs.MA cs.AI cs.GT cs.LG

    Normative Disagreement as a Challenge for Cooperative AI

    Authors: Julian Stastny, Maxime Riché, Alexander Lyzhov, Johannes Treutlein, Allan Dafoe, Jesse Clifton

    Abstract: Cooperation in settings where agents have both common and conflicting interests (mixed-motive environments) has recently received considerable attention in multi-agent learning. However, the mixed-motive environments typically studied have a single cooperative outcome on which all agents can agree. Many real-world multi-agent environments are instead bargaining problems (BPs): they have several Pa… ▽ More

    Submitted 27 November, 2021; originally announced November 2021.

    Comments: Accepted at the Cooperative AI workshop and the Strategic ML workshop at NeurIPS 2021

  12. Institutionalising Ethics in AI through Broader Impact Requirements

    Authors: Carina Prunkl, Carolyn Ashurst, Markus Anderljung, Helena Webb, Jan Leike, Allan Dafoe

    Abstract: Turning principles into practice is one of the most pressing challenges of artificial intelligence (AI) governance. In this article, we reflect on a novel governance initiative by one of the world's largest AI conferences. In 2020, the Conference on Neural Information Processing Systems (NeurIPS) introduced a requirement for submitting authors to include a statement on the broader societal impacts… ▽ More

    Submitted 30 May, 2021; originally announced June 2021.

    Journal ref: Nature Machine Intelligence 3.2 (2021): 104-110

  13. arXiv:2106.04338  [pdf

    econ.GN

    Engines of Power: Electricity, AI, and General-Purpose Military Transformations

    Authors: Jeffrey Ding, Allan Dafoe

    Abstract: Major theories of military innovation focus on relatively narrow technological developments, such as nuclear weapons or aircraft carriers. Arguably the most profound military implications of technological change, however, come from more fundamental advances arising from general purpose technologies, such as the steam engine, electricity, and the computer. With few exceptions, political scientists… ▽ More

    Submitted 8 June, 2021; originally announced June 2021.

  14. arXiv:2105.02117  [pdf, other

    cs.CY

    Ethics and Governance of Artificial Intelligence: Evidence from a Survey of Machine Learning Researchers

    Authors: Baobao Zhang, Markus Anderljung, Lauren Kahn, Noemi Dreksler, Michael C. Horowitz, Allan Dafoe

    Abstract: Machine learning (ML) and artificial intelligence (AI) researchers play an important role in the ethics and governance of AI, including taking action against what they perceive to be unethical uses of AI (Belfield, 2020; Van Noorden, 2020). Nevertheless, this influential group's attitudes are not well understood, which undermines our ability to discern consensuses or disagreements between AI/ML re… ▽ More

    Submitted 5 May, 2021; originally announced May 2021.

    ACM Class: K.7.4

  15. Skilled and Mobile: Survey Evidence of AI Researchers' Immigration Preferences

    Authors: Remco Zwetsloot, Baobao Zhang, Noemi Dreksler, Lauren Kahn, Markus Anderljung, Allan Dafoe, Michael C. Horowitz

    Abstract: Countries, companies, and universities are increasingly competing over top-tier artificial intelligence (AI) researchers. Where are these researchers likely to immigrate and what affects their immigration decisions? We conducted a survey $(n = 524)$ of the immigration preferences and motivations of researchers that had papers accepted at one of two prestigious AI conferences: the Conference on Neu… ▽ More

    Submitted 5 May, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

    Comments: Accepted for poster presentation at the 2021 AAAI/ACM Conference on AI, Ethics, and Society

    ACM Class: K.7.4

  16. arXiv:2101.09195  [pdf, other

    stat.ME math.ST

    Randomization Inference beyond the Sharp Null: Bounded Null Hypotheses and Quantiles of Individual Treatment Effects

    Authors: Devin Caughey, Allan Dafoe, Xinran Li, Luke Miratrix

    Abstract: Randomization inference (RI) is typically interpreted as testing Fisher's "sharp" null hypothesis that all unit-level effects are exactly zero. This hypothesis is often criticized as restrictive and implausible, making its rejection scientifically uninteresting. We show, however, that many randomization tests are also valid for a "bounded" null hypothesis under which the unit-level effects are all… ▽ More

    Submitted 28 August, 2023; v1 submitted 22 January, 2021; originally announced January 2021.

  17. arXiv:2012.08630  [pdf, other

    cs.AI cs.MA

    Open Problems in Cooperative AI

    Authors: Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R. McKee, Joel Z. Leibo, Kate Larson, Thore Graepel

    Abstract: Problems of cooperation--in which agents seek ways to jointly improve their welfare--are ubiquitous and important. They can be found at scales ranging from our daily routines--such as driving on highways, scheduling meetings, and working collaboratively--to our global challenges--such as peace, commerce, and pandemic preparedness. Arguably, the success of the human species is rooted in our ability… ▽ More

    Submitted 15 December, 2020; originally announced December 2020.

  18. arXiv:2012.08347  [pdf

    cs.CR cs.CY

    Beyond Privacy Trade-offs with Structured Transparency

    Authors: Andrew Trask, Emma Bluemke, Teddy Collins, Ben Garfinkel Eric Drexler, Claudia Ghezzou Cuervas-Mons, Iason Gabriel, Allan Dafoe, William Isaac

    Abstract: Successful collaboration involves sharing information. However, parties may disagree on how the information they need to share should be used. We argue that many of these concerns reduce to 'the copy problem': once a bit of information is copied and shared, the sender can no longer control how the recipient uses it. From the perspective of each collaborator, this presents a dilemma that can inhibi… ▽ More

    Submitted 12 March, 2024; v1 submitted 15 December, 2020; originally announced December 2020.

  19. arXiv:2004.07213  [pdf, ps, other

    cs.CY

    Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims

    Authors: Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, **gying Yang, Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker, Jade Leung, Andrew Trask, Emma Bluemke, Jonathan Lebensold, Cullen O'Keefe, Mark Koren, Théo Ryffel, JB Rubinovitz, Tamay Besiroglu, Federica Carugati, Jack Clark, Peter Eckersley , et al. (34 additional authors not shown)

    Abstract: With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development. In order for AI developers to earn trust from system users, customers, civil society, governments, and other stakeholders that they… ▽ More

    Submitted 20 April, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

  20. Social and Governance Implications of Improved Data Efficiency

    Authors: Aaron D. Tucker, Markus Anderljung, Allan Dafoe

    Abstract: Many researchers work on improving the data efficiency of machine learning. What would happen if they succeed? This paper explores the social-economic impact of increased data efficiency. Specifically, we examine the intuition that data efficiency will erode the barriers to entry protecting incumbent data-rich AI firms, exposing them to more competition from data-poor firms. We find that this intu… ▽ More

    Submitted 14 January, 2020; originally announced January 2020.

    Comments: 7 pages, 2 figures, accepted to Artificial Intelligence Ethics and Society 2020

  21. The Logic of Strategic Assets: From Oil to Artificial Intelligence

    Authors: Jeffrey Ding, Allan Dafoe

    Abstract: What resources and technologies are strategic? This question is often the focus of policy and theoretical debates, where the label "strategic" designates those assets that warrant the attention of the highest levels of the state. But these conversations are plagued by analytical confusion, flawed heuristics, and the rhetorical use of "strategic" to advance particular agendas. We aim to improve the… ▽ More

    Submitted 31 May, 2021; v1 submitted 9 January, 2020; originally announced January 2020.

    Comments: Added references and corrected typos

  22. arXiv:2001.00463  [pdf, ps, other

    cs.CY cs.AI

    The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse?

    Authors: Toby Shevlane, Allan Dafoe

    Abstract: There is growing concern over the potential misuse of artificial intelligence (AI) research. Publishing scientific research can facilitate misuse of the technology, but the research can also contribute to protections against misuse. This paper addresses the balance between these two effects. Our theoretical framework elucidates the factors governing whether the published research will be more usef… ▽ More

    Submitted 9 January, 2020; v1 submitted 27 December, 2019; originally announced January 2020.

  23. arXiv:1912.12835  [pdf, other

    cs.CY

    U.S. Public Opinion on the Governance of Artificial Intelligence

    Authors: Baobao Zhang, Allan Dafoe

    Abstract: Artificial intelligence (AI) has widespread societal implications, yet social scientists are only beginning to study public attitudes toward the technology. Existing studies find that the public's trust in institutions can play a major role in sha** the regulation of emerging technologies. Using a large-scale survey (N=2000), we examined Americans' perceptions of 13 AI governance challenges as w… ▽ More

    Submitted 30 December, 2019; originally announced December 2019.

    Comments: 22 pages; 7 figures; 4 tables; accepted for oral presentation at the 2020 AAAI/ACM Conference on AI, Ethics, and Society

  24. arXiv:1912.11595  [pdf

    cs.CY cs.AI

    The Windfall Clause: Distributing the Benefits of AI for the Common Good

    Authors: Cullen O'Keefe, Peter Cihon, Ben Garfinkel, Carrick Flynn, Jade Leung, Allan Dafoe

    Abstract: As the transformative potential of AI has become increasingly salient as a matter of public and political interest, there has been growing discussion about the need to ensure that AI broadly benefits humanity. This in turn has spurred debate on the social responsibilities of large technology companies to serve the interests of society at large. In response, ethical principles and codes of conduct… ▽ More

    Submitted 24 January, 2020; v1 submitted 25 December, 2019; originally announced December 2019.

    Comments: Short version to be published in proceedings of AIES

  25. arXiv:1806.00610  [pdf, other

    cs.AI

    Between Progress and Potential Impact of AI: the Neglected Dimensions

    Authors: Fernando Martínez-Plumed, Shahar Avin, Miles Brundage, Allan Dafoe, Sean Ó hÉigeartaigh, José Hernández-Orallo

    Abstract: We reframe the analysis of progress in AI by incorporating into an overall framework both the task performance of a system, and the time and resource costs incurred in the development and deployment of the system. These costs include: data, expert knowledge, human oversight, software resources, computing cycles, hardware and network facilities, and (what kind of) time. These costs are distributed… ▽ More

    Submitted 2 July, 2022; v1 submitted 2 June, 2018; originally announced June 2018.

  26. arXiv:1802.07228  [pdf

    cs.AI cs.CR cs.CY

    The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation

    Authors: Miles Brundage, Shahar Avin, Jack Clark, Helen Toner, Peter Eckersley, Ben Garfinkel, Allan Dafoe, Paul Scharre, Thomas Zeitzoff, Bobby Filar, Hyrum Anderson, Heather Roff, Gregory C. Allen, Jacob Steinhardt, Carrick Flynn, Seán Ó hÉigeartaigh, Simon Beard, Haydn Belfield, Sebastian Farquhar, Clare Lyle, Rebecca Crootof, Owain Evans, Michael Page, Joanna Bryson, Roman Yampolskiy , et al. (1 additional authors not shown)

    Abstract: This report surveys the landscape of potential security threats from malicious uses of AI, and proposes ways to better forecast, prevent, and mitigate these threats. After analyzing the ways in which AI may influence the threat landscape in the digital, physical, and political domains, we make four high-level recommendations for AI researchers and other stakeholders. We also suggest several promis… ▽ More

    Submitted 20 February, 2018; originally announced February 2018.

  27. arXiv:1709.07339  [pdf, other

    stat.ME

    Beyond the Sharp Null: Randomization Inference, Bounded Null Hypotheses, and Confidence Intervals for Maximum Effects

    Authors: Devin Caughey, Allan Dafoe, Luke Miratrix

    Abstract: Fisherian randomization inference is often dismissed as testing an uninteresting and implausible hypothesis: the sharp null of no effects whatsoever. We show that this view is overly narrow. Many randomization tests are also valid under a more general "bounded" null hypothesis under which all effects are weakly negative (or positive), thus accommodating heterogenous effects. By inverting such test… ▽ More

    Submitted 21 September, 2017; originally announced September 2017.

  28. arXiv:1705.08807  [pdf, other

    cs.AI cs.CY

    When Will AI Exceed Human Performance? Evidence from AI Experts

    Authors: Katja Grace, John Salvatier, Allan Dafoe, Baobao Zhang, Owain Evans

    Abstract: Advances in artificial intelligence (AI) will transform modern life by resha** transportation, health, science, finance, and the military. To adapt public policy, we need to better anticipate these advances. Here we report the results from a large survey of machine learning researchers on their beliefs about progress in AI. Researchers predict AI will outperform humans in many activities in the… ▽ More

    Submitted 3 May, 2018; v1 submitted 24 May, 2017; originally announced May 2017.

    Comments: Accepted by Journal of Artificial Intelligence Research (AI and Society Track). Minor update to refer to related work (page 5)

  29. arXiv:1609.00606  [pdf, other

    stat.ME math.ST

    The magnitude and direction of collider bias for binary variables

    Authors: Trang Quynh Nguyen, Allan Dafoe, Elizabeth L. Ogburn

    Abstract: Suppose we are interested in the effect of variable $X$ on variable $Y$. If $X$ and $Y$ both influence, or are associated with variables that influence, a common outcome, called a collider, then conditioning on the collider (or on a variable influenced by the collider -- its "child") induces a spurious association between $X$ and $Y$, which is known as collider bias. Characterizing the magnitude a… ▽ More

    Submitted 14 January, 2019; v1 submitted 2 September, 2016; originally announced September 2016.

    Journal ref: Epidemiologic Methods. 2019. Vol 8, Issue 1