Search | arXiv e-print repository

Confidence-Building Measures for Artificial Intelligence: Workshop Proceedings

Authors: Sarah Shoker, Andrew Reddie, Sarah Barrington, Ruby Booth, Miles Brundage, Husanjot Chahal, Michael Depp, Bill Drexel, Ritwik Gupta, Marina Favaro, Jake Hecla, Alan Hickey, Margarita Konaev, Kirthi Kumar, Nathan Lambert, Andrew Lohn, Cullen O'Keefe, Nazneen Rajani, Michael Sellitto, Robert Trager, Leah Walker, Alexa Wehsener, Jessica Young

Abstract: Foundation models could eventually introduce several pathways for undermining state security: accidents, inadvertent escalation, unintentional conflict, the proliferation of weapons, and the interference with human diplomacy are just a few on a long list. The Confidence-Building Measures for Artificial Intelligence workshop hosted by the Geopolitics Team at OpenAI and the Berkeley Risk and Security Lab at the University of California brought together a multistakeholder group to think through the tools and strategies to mitigate the potential risks introduced by foundation models to international security. Originating in the Cold War, confidence-building measures (CBMs) are actions that reduce hostility, prevent conflict escalation, and improve trust between parties. The flexibility of CBMs makes them a key instrument for navigating the rapid changes in the foundation model landscape. Participants identified the following CBMs that directly apply to foundation models and which are further explained in this conference proceedings: 1. crisis hotlines; 2. incident sharing; 3. model, transparency, and system cards; 4. content provenance and watermarks; 5. collaborative red teaming and table-top exercises; and 6. dataset and evaluation sharing. Because most foundation model developers are non-government entities, many CBMs will need to involve a wider stakeholder community. These measures can be implemented either by AI labs or by relevant government actors.

Submitted 3 August, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

arXiv:2305.14553 [pdf]

doi 10.51593/2022CA003

Adversarial Machine Learning and Cybersecurity: Risks, Challenges, and Legal Implications

Authors: Micah Musser, Andrew Lohn, James X. Dempsey, Jonathan Spring, Ram Shankar Siva Kumar, Brenda Leong, Christina Liaghati, Cindy Martinez, Crystal D. Grant, Daniel Rohrer, Heather Frase, Jonathan Elliott, John Bansemer, Mikel Rodriguez, Mitt Regan, Rumman Chowdhury, Stefan Hermanek

Abstract: In July 2022, the Center for Security and Emerging Technology (CSET) at Georgetown University and the Program on Geopolitics, Technology, and Governance at the Stanford Cyber Policy Center convened a workshop of experts to examine the relationship between vulnerabilities in artificial intelligence systems and more traditional types of software vulnerabilities. Topics discussed included the extent to which AI vulnerabilities can be handled under standard cybersecurity processes, the barriers currently preventing the accurate sharing of information about AI vulnerabilities, legal issues associated with adversarial attacks on AI systems, and potential areas where government support could improve AI vulnerability management and mitigation. This report is meant to accomplish two things. First, it provides a high-level discussion of AI vulnerabilities, including the ways in which they are disanalogous to other types of vulnerabilities, and the current state of affairs regarding information sharing and legal oversight of AI vulnerabilities. Second, it attempts to articulate broad recommendations as endorsed by the majority of participants at the workshop.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2207.13825 [pdf, other]

Will AI Make Cyber Swords or Shields: A few mathematical models of technological progress

Authors: Andrew J Lohn, Krystal Alex Jackson

Abstract: We aim to demonstrate the value of mathematical models for policy debates about technological progress in cybersecurity by considering phishing, vulnerability discovery, and the dynamics between patching and exploitation. We then adjust the inputs to those mathematical models to match some possible advances in their underlying technology. We find that AI's impact on phishing may be overestimated but could lead to more attacks going undetected. Advances in vulnerability discovery have the potential to help attackers more than defenders. And automation that writes exploits is more useful to attackers than automation that writes patches, although advances that help deploy patches faster have the potential to be more impactful than either.

Submitted 27 July, 2022; originally announced July 2022.

Comments: Technical companion paper to CSET report entitled "Will AI Make Cyber Swords or Shields: Using models to project the impact of technology development"

arXiv:2206.12725 [pdf, other]

Empirical Evaluation of Physical Adversarial Patch Attacks Against Overhead Object Detection Models

Authors: Gavin S. Hartnett, Li Ang Zhang, Caolionn O'Connell, Andrew J. Lohn, Jair Aguirre

Abstract: Adversarial patches are images designed to fool otherwise well-performing neural network-based computer vision models. Although these attacks were initially conceived of and studied digitally, in that the raw pixel values of the image were perturbed, recent work has demonstrated that these attacks can successfully transfer to the physical world. This can be accomplished by printing out the patch and adding it into scenes of newly captured images or video footage. In this work we further test the efficacy of adversarial patch attacks in the physical world under more challenging conditions. We consider object detection models trained on overhead imagery acquired through aerial or satellite cameras, and we test physical adversarial patches inserted into scenes of a desert environment. Our main finding is that it is far more difficult to successfully implement the adversarial patch attacks under these conditions than in the previously considered conditions. This has important implications for AI safety as the real-world threat posed by adversarial examples may be overstated.

Submitted 25 June, 2022; originally announced June 2022.

arXiv:2010.02456 [pdf, other]

Downscaling Attack and Defense: Turning What You See Back Into What You Get

Authors: Andrew J. Lohn

Abstract: The resizing of images, which is typically a required part of preprocessing for computer vision systems, is vulnerable to attack. Images can be created such that the image is completely different at machine-vision scales than at other scales and the default settings for some common computer vision and machine learning systems are vulnerable. We show that defenses exist and are trivial to administer provided that defenders are aware of the threat. These attacks and defenses help to establish the role of input sanitization in machine learning.

Submitted 7 October, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

arXiv:2009.00802 [pdf, other]

Estimating the Brittleness of AI: Safety Integrity Levels and the Need for Testing Out-Of-Distribution Performance

Authors: Andrew J. Lohn

Abstract: Test, Evaluation, Verification, and Validation (TEVV) for Artificial Intelligence (AI) is a challenge that threatens to limit the economic and societal rewards that AI researchers have devoted themselves to producing. A central task of TEVV for AI is estimating brittleness, where brittleness implies that the system functions well within some bounds and poorly outside of those bounds. This paper argues that neither of those criteria are certain of Deep Neural Networks. First, highly touted AI successes (e.g., image classification and speech recognition) are orders of magnitude more failure-prone than are typically certified in critical systems even within design bounds (perfectly in-distribution sampling). Second, performance falls off only gradually as inputs become further Out-Of-Distribution (OOD). Enhanced emphasis is needed on designing systems that are resilient despite failure-prone AI components as well as on evaluating and improving OOD performance in order to get AI to where it can clear the challenging hurdles of TEVV and certification.

Submitted 1 September, 2020; originally announced September 2020.

arXiv:2004.07213 [pdf, ps, other]

Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims

Authors: Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker, Jade Leung, Andrew Trask, Emma Bluemke, Jonathan Lebensold, Cullen O'Keefe, Mark Koren, Théo Ryffel, JB Rubinovitz, Tamay Besiroglu, Federica Carugati, Jack Clark, Peter Eckersley, et al. (34 additional authors not shown)

Abstract: With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development. In order for AI developers to earn trust from system users, customers, civil society, governments, and other stakeholders that they are building AI responsibly, they will need to make verifiable claims to which they can be held accountable. Those outside of a given organization also need effective means of scrutinizing such claims. This report suggests various steps that different stakeholders can take to improve the verifiability of claims made about AI systems and their associated development processes, with a focus on providing evidence about the safety, security, fairness, and privacy protection of AI systems. We analyze ten mechanisms for this purpose--spanning institutions, software, and hardware--and make recommendations aimed at implementing, exploring, or improving those mechanisms.

Submitted 20 April, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

arXiv:2003.02763 [pdf, other]

A Quantitative History of A.I. Research in the United States and China

Authors: Daniel Ish, Andrew Lohn, Christian Curriden

Abstract: Motivated by recent interest in the status and consequences of competition between the U.S. and China in A.I. research, we analyze 60 years of abstract data scraped from Scopus to explore and quantify trends in publications on A.I. topics from institutions affiliated with each country. We find the total volume of publications produced in both countries grows with a remarkable regularity over tens of years. While China initially experienced faster growth in publication volume than the U.S., growth slowed in China when it reached parity with the U.S. and the growth rates of both countries are now similar. We also see both countries undergo a seismic shift in topic choice around 1990, and connect this to an explosion of interest in neural network methods. Finally, we see evidence that between 2000 and 2010, China's topic choice tended to lag that of the U.S. but that in recent decades the topic portfolios have come into closer alignment.

Submitted 11 June, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

arXiv:1910.02095 [pdf, other]

Adversarial Examples for Cost-Sensitive Classifiers

Authors: Gavin S. Hartnett, Andrew J. Lohn, Alexander P. Sedlack

Abstract: Motivated by safety-critical classification problems, we investigate adversarial attacks against cost-sensitive classifiers. We use current state-of-the-art adversarially-resistant neural network classifiers [1] as the underlying models. Cost-sensitive predictions are then achieved via a final processing step in the feed-forward evaluation of the network. We evaluate the effectiveness of cost-sensitive classifiers against a variety of attacks and we introduce a new cost-sensitive attack which performs better than targeted attacks in some cases. We also explored the measures a defender can take in order to limit their vulnerability to these attacks. This attacker/defender scenario is naturally framed as a two-player zero-sum finite game which we analyze using game theory.

Submitted 4 October, 2019; originally announced October 2019.

arXiv:1910.00111 [pdf, other]

Defense in Depth: The Basics of Blockade and Delay

Authors: Andrew J. Lohn

Abstract: Given that individual defenses are rarely sufficient, defense-in-depth is nearly universal and options for individual defensive layers abound. We develop a simple mathematical theory that can help in selecting the type and quantity of defenses for two different defense-in-depth strategies: Blockade and Delay. This theoretical approach accounts for budgetary constraints and the number, skill, and diversity of attackers. We find that defenders have several reasons to be optimistic including that the number of required defenses increases more slowly than the number of attackers, that similar attackers are defended more easily than similar defenses are defeated, and that defenders do not necessarily need to act as quickly as attackers.

Submitted 30 September, 2019; originally announced October 2019.

arXiv:1808.10062 [pdf]

Timelines for In-Code Discovery of Zero-Day Vulnerabilities and Supply-Chain Attacks

Authors: Andrew J. Lohn

Abstract: Zero-day vulnerabilities can be accidentally or maliciously placed in code and can remain in place for years. In this study, we address an aspect of their longevity by considering the likelihood that they will be discovered in the code across versions. We approximate well-disguised vulnerabilities as only being discoverable if the relevant lines of code are explicitly examined, and obvious vulnerabilities as being discoverable if any part of the relevant file is examined. We analyze the version-to-version changes in three types of open source software (Mozilla Firefox, GNU/Linux, and glibc) to understand the rate at which the various pieces of code are amended and find that much of the revision behavior can be captured with a simple intuitive model. We use that model and the data from over a billion unique lines of code in 87 different versions of software to specify the bounds for in-code discoverability of vulnerabilities - from expertly hidden to obviously observable.

Submitted 31 August, 2018; v1 submitted 29 August, 2018; originally announced August 2018.

arXiv:1406.4033 [pdf]

doi 10.1063/1.4895526

Degenerate Resistive Switching and Ultrahigh Density Storage in Resistive Memory

Authors: Andrew J. Lohn, Patrick R. Mickel, Conrad D. James, Matthew J. Marinella

Abstract: We show that, in tantalum oxide resistive memories, activation power provides a multi-level variable for information storage that can be set and read separately from the resistance. These two state variables (resistance and activation power) can be precisely controlled in two steps: (1) the possible activation power states are selected by partially reducing resistance, then (2) a subsequent partial increase in resistance specifies the resistance state and the final activation power state. We show that these states can be precisely written and read electrically, making this approach potentially amenable for ultra-high density memories. We provide a theoretical explanation for information storage and retrieval from activation power and experimentally demonstrate information storage in a third dimension related to the change in activation power with resistance.

Submitted 16 June, 2014; originally announced June 2014.

Showing 1–12 of 12 results for author: Lohn, A