Search | arXiv e-print repository

IDs for AI Systems

Authors: Alan Chan, Noam Kolt, Peter Wills, Usman Anwar, Christian Schroeder de Witt, Nitarshan Rajkumar, Lewis Hammond, David Krueger, Lennart Heim, Markus Anderljung

Abstract: AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. A user may not be able to verify whether a system satisfies certain safety standards. An investigator may not know whom to investigate when a system causes an incident. A platform may find it difficult to penalize repeated negative interactions with the same s… ▽ More AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. A user may not be able to verify whether a system satisfies certain safety standards. An investigator may not know whom to investigate when a system causes an incident. A platform may find it difficult to penalize repeated negative interactions with the same system. Across a number of domains, IDs address analogous problems by identifying \textit{particular} entities (e.g., a particular Boeing 747) and providing information about other entities of the same class (e.g., some or all Boeing 747s). We propose a framework in which IDs are ascribed to \textbf{instances} of AI systems (e.g., a particular chat session with Claude 3), and associated information is accessible to parties seeking to interact with that system. We characterize IDs for AI systems, argue that there could be significant demand for IDs from key actors, analyze how those actors could incentivize ID adoption, explore potential implementations of our framework, and highlight limitations and risks. IDs seem most warranted in high-stakes settings, where certain actors (e.g., those that enable AI systems to make financial transactions) could experiment with incentives for ID use. Deployers of AI systems could experiment with develo** ID implementations. With further study, IDs could help to manage a world where AI systems pervade society. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2406.02619 [pdf, other]

Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits

Authors: Andis Draguns, Andrew Gritsevskiy, Sumeet Ramesh Motwani, Charlie Rogers-Smith, Jeffrey Ladish, Christian Schroeder de Witt

Abstract: The rapid proliferation of open-source language models significantly increases the risks of downstream backdoor attacks. These backdoors can introduce dangerous behaviours during model deployment and can evade detection by conventional cybersecurity monitoring systems. In this paper, we introduce a novel class of backdoors in autoregressive transformer models, that, in contrast to prior art, are u… ▽ More The rapid proliferation of open-source language models significantly increases the risks of downstream backdoor attacks. These backdoors can introduce dangerous behaviours during model deployment and can evade detection by conventional cybersecurity monitoring systems. In this paper, we introduce a novel class of backdoors in autoregressive transformer models, that, in contrast to prior art, are unelicitable in nature. Unelicitability prevents the defender from triggering the backdoor, making it impossible to evaluate or detect ahead of deployment even if given full white-box access and using automated techniques, such as red-teaming or certain formal verification methods. We show that our novel construction is not only unelicitable thanks to using cryptographic techniques, but also has favourable robustness properties. We confirm these properties in empirical investigations, and provide evidence that our backdoors can withstand state-of-the-art mitigation strategies. Additionally, we expand on previous work by showing that our universal backdoors, while not completely undetectable in white-box settings, can be harder to detect than some existing designs. By demonstrating the feasibility of seamlessly integrating backdoors into transformer models, this paper fundamentally questions the efficacy of pre-deployment detection strategies. This offers new insights into the offence-defence balance in AI safety and security. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: 10 pages, 5 figures

arXiv:2405.19540 [pdf, other]

Computing Low-Entropy Couplings for Large-Support Distributions

Authors: Samuel Sokota, Dylan Sam, Christian Schroeder de Witt, Spencer Compton, Jakob Foerster, J. Zico Kolter

Abstract: Minimum-entropy coupling (MEC) -- the process of finding a joint distribution with minimum entropy for given marginals -- has applications in areas such as causality and steganography. However, existing algorithms are either computationally intractable for large-support distributions or limited to specific distribution types and sensitive to hyperparameter choices. This work addresses these limita… ▽ More Minimum-entropy coupling (MEC) -- the process of finding a joint distribution with minimum entropy for given marginals -- has applications in areas such as causality and steganography. However, existing algorithms are either computationally intractable for large-support distributions or limited to specific distribution types and sensitive to hyperparameter choices. This work addresses these limitations by unifying a prior family of iterative MEC (IMEC) approaches into a generalized partition-based formalism. From this framework, we derive a novel IMEC algorithm called ARIMEC, capable of handling arbitrary discrete distributions, and introduce a method to make IMEC robust to suboptimal hyperparameter settings. These innovations facilitate the application of IMEC to high-throughput steganography with language models, among other settings. Our codebase is available at https://github.com/ssokota/mec . △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2404.17047 [pdf, other]

Near to Mid-term Risks and Opportunities of Open-Source Generative AI

Authors: Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Botos Csaba, Fabro Steibel, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Marvin Imperial, Juan A. Nolazco-Flores, Lori Landay, Matthew Jackson, Paul Röttger, Philip H. S. Torr, Trevor Darrell, Yong Suk Lee, Jakob Foerster

Abstract: In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation i… ▽ More In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source Generative AI. We argue for the responsible open sourcing of generative AI models in the near and medium term. To set the stage, we first introduce an AI openness taxonomy system and apply it to 40 current large language models. We then outline differential benefits and risks of open versus closed source AI and present potential risk mitigation, ranging from best practices to calls for technical and scientific contributions. We hope that this report will add a much needed missing voice to the current public discourse on near to mid-term AI safety and other societal impact. △ Less

Submitted 24 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: Accepted to ICML'24 as a position paper

arXiv:2404.07099 [pdf, other]

Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection

Authors: Linas Nasvytis, Kai Sandbrink, Jakob Foerster, Tim Franzmeyer, Christian Schroeder de Witt

Abstract: While reinforcement learning (RL) algorithms have been successfully applied across numerous sequential decision-making problems, their generalization to unforeseen testing environments remains a significant concern. In this paper, we study the problem of out-of-distribution (OOD) detection in RL, which focuses on identifying situations at test time that RL agents have not encountered in their trai… ▽ More While reinforcement learning (RL) algorithms have been successfully applied across numerous sequential decision-making problems, their generalization to unforeseen testing environments remains a significant concern. In this paper, we study the problem of out-of-distribution (OOD) detection in RL, which focuses on identifying situations at test time that RL agents have not encountered in their training environments. We first propose a clarification of terminology for OOD detection in RL, which aligns it with the literature from other machine learning domains. We then present new benchmark scenarios for OOD detection, which introduce anomalies with temporal autocorrelation into different components of the agent-environment loop. We argue that such scenarios have been understudied in the current literature, despite their relevance to real-world situations. Confirming our theoretical predictions, our experimental results suggest that state-of-the-art OOD detectors are not able to identify such anomalies. To address this problem, we propose a novel method for OOD detection, which we call DEXTER (Detection via Extraction of Time Series Representations). By treating environment observations as time series data, DEXTER extracts salient time series features, and then leverages an ensemble of isolation forest algorithms to detect anomalies. We find that DEXTER can reliably identify anomalies across benchmark scenarios, exhibiting superior performance compared to both state-of-the-art OOD detectors and high-dimensional changepoint detectors adopted from statistics. △ Less

Submitted 10 April, 2024; originally announced April 2024.

Comments: Accepted as a full paper to the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2024)

arXiv:2402.07510 [pdf, other]

Secret Collusion Among Generative AI Agents

Authors: Sumeet Ramesh Motwani, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip H. S. Torr, Lewis Hammond, Christian Schroeder de Witt

Abstract: Recent capability increases in large language models (LLMs) open up applications in which teams of communicating generative AI agents solve joint tasks. This poses privacy and security challenges concerning the unauthorised sharing of information, or other unwanted forms of agent coordination. Modern steganographic techniques could render such dynamics hard to detect. In this paper, we comprehensi… ▽ More Recent capability increases in large language models (LLMs) open up applications in which teams of communicating generative AI agents solve joint tasks. This poses privacy and security challenges concerning the unauthorised sharing of information, or other unwanted forms of agent coordination. Modern steganographic techniques could render such dynamics hard to detect. In this paper, we comprehensively formalise the problem of secret collusion in systems of generative AI agents by drawing on relevant concepts from both the AI and security literature. We study incentives for the use of steganography, and propose a variety of mitigation measures. Our investigations result in a model evaluation framework that systematically tests capabilities required for various forms of secret collusion. We provide extensive empirical results across a range of contemporary LLMs. While the steganographic capabilities of current models remain limited, GPT-4 displays a capability jump suggesting the need for continuous monitoring of steganographic frontier model capabilities. We conclude by laying out a comprehensive research program to mitigate future risks of collusion between generative AI models. △ Less

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.01088 [pdf, other]

The Danger Of Arrogance: Welfare Equilibra As A Solution To Stackelberg Self-Play In Non-Coincidental Games

Authors: Jake Levi, Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster

Abstract: The increasing prevalence of multi-agent learning systems in society necessitates understanding how to learn effective and safe policies in general-sum multi-agent environments against a variety of opponents, including self-play. General-sum learning is difficult because of non-stationary opponents and misaligned incentives. Our first main contribution is to show that many recent approaches to gen… ▽ More The increasing prevalence of multi-agent learning systems in society necessitates understanding how to learn effective and safe policies in general-sum multi-agent environments against a variety of opponents, including self-play. General-sum learning is difficult because of non-stationary opponents and misaligned incentives. Our first main contribution is to show that many recent approaches to general-sum learning can be derived as approximations to Stackelberg strategies, which suggests a framework for develo** new multi-agent learning algorithms. We then define non-coincidental games as games in which the Stackelberg strategy profile is not a Nash Equilibrium. This notably includes several canonical matrix games and provides a normative theory for why existing algorithms fail in self-play in such games. We address this problem by introducing Welfare Equilibria (WE) as a generalisation of Stackelberg Strategies, which can recover desirable Nash Equilibria even in non-coincidental games. Finally, we introduce Welfare Function Search (WelFuSe) as a practical approach to finding desirable WE against unknown opponents, which finds more mutually desirable solutions in self-play, while preserving performance against naive learning opponents. △ Less

Submitted 27 March, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: 31 pages, 23 figures

arXiv:2311.10090 [pdf, other]

JaxMARL: Multi-Agent RL Environments in JAX

Authors: Alexander Rutherford, Benjamin Ellis, Matteo Gallici, Jonathan Cook, Andrei Lupu, Gardar Ingvarsson, Timon Willi, Akbir Khan, Christian Schroeder de Witt, Alexandra Souly, Saptarashmi Bandyopadhyay, Mikayel Samvelyan, Minqi Jiang, Robert Tjarko Lange, Shimon Whiteson, Bruno Lacerda, Nick Hawes, Tim Rocktaschel, Chris Lu, Jakob Nicolaus Foerster

Abstract: Benchmarks play an important role in the development of machine learning algorithms. For example, research in reinforcement learning (RL) has been heavily influenced by available environments and benchmarks. However, RL environments are traditionally run on the CPU, limiting their scalability with typical academic compute. Recent advancements in JAX have enabled the wider use of hardware accelerat… ▽ More Benchmarks play an important role in the development of machine learning algorithms. For example, research in reinforcement learning (RL) has been heavily influenced by available environments and benchmarks. However, RL environments are traditionally run on the CPU, limiting their scalability with typical academic compute. Recent advancements in JAX have enabled the wider use of hardware acceleration to overcome these computational hurdles, enabling massively parallel RL training pipelines and environments. This is particularly useful for multi-agent reinforcement learning (MARL) research. First of all, multiple agents must be considered at each environment step, adding computational burden, and secondly, the sample complexity is increased due to non-stationarity, decentralised partial observability, or other MARL challenges. In this paper, we present JaxMARL, the first open-source code base that combines ease-of-use with GPU enabled efficiency, and supports a large number of commonly used MARL environments as well as popular baseline algorithms. When considering wall clock time, our experiments show that per-run our JAX-based training pipeline is up to 12500x faster than existing approaches. This enables efficient and thorough evaluations, with the potential to alleviate the evaluation crisis of the field. We also introduce and benchmark SMAX, a vectorised, simplified version of the popular StarCraft Multi-Agent Challenge, which removes the need to run the StarCraft II game engine. This not only enables GPU acceleration, but also provides a more flexible MARL environment, unlocking the potential for self-play, meta-learning, and other future applications in MARL. We provide code at https://github.com/flairox/jaxmarl. △ Less

Submitted 19 December, 2023; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2308.13049 [pdf, other]

Bayesian Exploration Networks

Authors: Mattie Fellows, Brandon Kaplowitz, Christian Schroeder de Witt, Shimon Whiteson

Abstract: Bayesian reinforcement learning (RL) offers a principled and elegant approach for sequential decision making under uncertainty. Most notably, Bayesian agents do not face an exploration/exploitation dilemma, a major pathology of frequentist methods. However theoretical understanding of model-free approaches is lacking. In this paper, we introduce a novel Bayesian model-free formulation and the firs… ▽ More Bayesian reinforcement learning (RL) offers a principled and elegant approach for sequential decision making under uncertainty. Most notably, Bayesian agents do not face an exploration/exploitation dilemma, a major pathology of frequentist methods. However theoretical understanding of model-free approaches is lacking. In this paper, we introduce a novel Bayesian model-free formulation and the first analysis showing that model-free approaches can yield Bayes-optimal policies. We show all existing model-free approaches make approximations that yield policies that can be arbitrarily Bayes-suboptimal. As a first step towards model-free Bayes optimality, we introduce the Bayesian exploration network (BEN) which uses normalising flows to model both the aleatoric uncertainty (via density estimation) and epistemic uncertainty (via variational inference) in the Bellman operator. In the limit of complete optimisation, BEN learns true Bayes-optimal policies, but like in variational expectation-maximisation, partial optimisation renders our approach tractable. Empirical results demonstrate that BEN can learn true Bayes-optimal policies in tasks where existing model-free approaches fail. △ Less

Submitted 25 June, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: Typos fixed and provided clearer proof of Theorem 3.2

arXiv:2303.10733 [pdf, other]

Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning

Authors: Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson

Abstract: By enabling agents to communicate, recent cooperative multi-agent reinforcement learning (MARL) methods have demonstrated better task performance and more coordinated behavior. Most existing approaches facilitate inter-agent communication by allowing agents to send messages to each other through free communication channels, i.e., cheap talk channels. Current methods require these channels to be co… ▽ More By enabling agents to communicate, recent cooperative multi-agent reinforcement learning (MARL) methods have demonstrated better task performance and more coordinated behavior. Most existing approaches facilitate inter-agent communication by allowing agents to send messages to each other through free communication channels, i.e., cheap talk channels. Current methods require these channels to be constantly accessible and known to the agents a priori. In this work, we lift these requirements such that the agents must discover the cheap talk channels and learn how to use them. Hence, the problem has two main parts: cheap talk discovery (CTD) and cheap talk utilization (CTU). We introduce a novel conceptual framework for both parts and develop a new algorithm based on mutual information maximization that outperforms existing algorithms in CTD/CTU settings. We also release a novel benchmark suite to stimulate future research in CTD/CTU. △ Less

Submitted 19 March, 2023; originally announced March 2023.

Comments: The 11th International Conference on Learning Representations (ICLR)

arXiv:2211.11043 [pdf, other]

Revealing Robust Oil and Gas Company Macro-Strategies using Deep Multi-Agent Reinforcement Learning

Authors: Dylan Radovic, Lucas Kruitwagen, Christian Schroeder de Witt, Ben Caldecott, Shane Tomlinson, Mark Workman

Abstract: The energy transition potentially poses an existential risk for major international oil companies (IOCs) if they fail to adapt to low-carbon business models. Projections of energy futures, however, are met with diverging assumptions on its scale and pace, causing disagreement among IOC decision-makers and their stakeholders over what the business model of an incumbent fossil fuel company should be… ▽ More The energy transition potentially poses an existential risk for major international oil companies (IOCs) if they fail to adapt to low-carbon business models. Projections of energy futures, however, are met with diverging assumptions on its scale and pace, causing disagreement among IOC decision-makers and their stakeholders over what the business model of an incumbent fossil fuel company should be. In this work, we used deep multi-agent reinforcement learning to solve an energy systems wargame wherein players simulate IOC decision-making, including hydrocarbon and low-carbon investments decisions, dividend policies, and capital structure measures, through an uncertain energy transition to explore critical and non-linear governance questions, from leveraged transitions to reserve replacements. Adversarial play facilitated by state-of-the-art algorithms revealed decision-making strategies robust to energy transition uncertainty and against multiple IOCs. In all games, robust strategies emerged in the form of low-carbon business models as a result of early transition-oriented movement. IOCs adopting such strategies outperformed business-as-usual and delayed transition strategies regardless of hydrocarbon demand projections. In addition to maximizing value, these strategies benefit greater society by contributing substantial amounts of capital necessary to accelerate the global low-carbon energy transition. Our findings point towards the need for lenders and investors to effectively mobilize transition-oriented finance and engage with IOCs to ensure responsible reallocation of capital towards low-carbon business models that would enable the emergence of fossil fuel incumbents as future low-carbon leaders. △ Less

Submitted 20 November, 2022; originally announced November 2022.

arXiv:2210.14889 [pdf, other]

Perfectly Secure Steganography Using Minimum Entropy Coupling

Authors: Christian Schroeder de Witt, Samuel Sokota, J. Zico Kolter, Jakob Foerster, Martin Strohmeier

Abstract: Steganography is the practice of encoding secret information into innocuous content in such a manner that an adversarial third party would not realize that there is hidden meaning. While this problem has classically been studied in security literature, recent advances in generative models have led to a shared interest among security and machine learning researchers in develo** scalable steganogr… ▽ More Steganography is the practice of encoding secret information into innocuous content in such a manner that an adversarial third party would not realize that there is hidden meaning. While this problem has classically been studied in security literature, recent advances in generative models have led to a shared interest among security and machine learning researchers in develo** scalable steganography techniques. In this work, we show that a steganography procedure is perfectly secure under Cachin (1998)'s information-theoretic model of steganography if and only if it is induced by a coupling. Furthermore, we show that, among perfectly secure procedures, a procedure maximizes information throughput if and only if it is induced by a minimum entropy coupling. These insights yield what are, to the best of our knowledge, the first steganography algorithms to achieve perfect security guarantees for arbitrary covertext distributions. To provide empirical validation, we compare a minimum entropy coupling-based approach to three modern baselines -- arithmetic coding, Meteor, and adaptive dynamic grou** -- using GPT-2, WaveRNN, and Image Transformer as communication channels. We find that the minimum entropy coupling-based approach achieves superior encoding efficiency, despite its stronger security constraints. In aggregate, these results suggest that it may be natural to view information-theoretic steganography through the lens of minimum entropy coupling. △ Less

Submitted 30 October, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

arXiv:2210.12124 [pdf, other]

Equivariant Networks for Zero-Shot Coordination

Authors: Darius Muglich, Christian Schroeder de Witt, Elise van der Pol, Shimon Whiteson, Jakob Foerster

Abstract: Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner. A common failure mode is symmetry breaking, when agents arbitrarily converge on one out of many equivalent but mutually incompatible policies. Commonly these examples include partial observability, e.g. waving your right hand vs. left hand to convey a covert message.… ▽ More Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner. A common failure mode is symmetry breaking, when agents arbitrarily converge on one out of many equivalent but mutually incompatible policies. Commonly these examples include partial observability, e.g. waving your right hand vs. left hand to convey a covert message. In this paper, we present a novel equivariant network architecture for use in Dec-POMDPs that effectively leverages environmental symmetry for improving zero-shot coordination, doing so more effectively than prior methods. Our method also acts as a ``coordination-improvement operator'' for generic, pre-trained policies, and thus may be applied at test-time in conjunction with any self-play algorithm. We provide theoretical guarantees of our work and test on the AI benchmark task of Hanabi, where we demonstrate our methods outperforming other symmetry-aware baselines in zero-shot coordination, as well as able to improve the coordination ability of a variety of pre-trained policies. In particular, we show our method can be used to improve on the state of the art for zero-shot coordination on the Hanabi benchmark. △ Less

Submitted 10 April, 2024; v1 submitted 21 October, 2022; originally announced October 2022.

arXiv:2210.05639 [pdf, other]

Discovered Policy Optimisation

Authors: Chris Lu, Jakub Grudzien Kuba, Alistair Letcher, Luke Metz, Christian Schroeder de Witt, Jakob Foerster

Abstract: Tremendous progress has been made in reinforcement learning (RL) over the past decade. Most of these advancements came through the continual development of new algorithms, which were designed using a combination of mathematical derivations, intuitions, and experimentation. Such an approach of creating algorithms manually is limited by human understanding and ingenuity. In contrast, meta-learning p… ▽ More Tremendous progress has been made in reinforcement learning (RL) over the past decade. Most of these advancements came through the continual development of new algorithms, which were designed using a combination of mathematical derivations, intuitions, and experimentation. Such an approach of creating algorithms manually is limited by human understanding and ingenuity. In contrast, meta-learning provides a toolkit for automatic machine learning method optimisation, potentially addressing this flaw. However, black-box approaches which attempt to discover RL algorithms with minimal prior structure have thus far not outperformed existing hand-crafted algorithms. Mirror Learning, which includes RL algorithms, such as PPO, offers a potential middle-ground starting point: while every method in this framework comes with theoretical guarantees, components that differentiate them are subject to design. In this paper we explore the Mirror Learning space by meta-learning a "drift" function. We refer to the immediate result as Learnt Policy Optimisation (LPO). By analysing LPO we gain original insights into policy optimisation which we use to formulate a novel, closed-form RL algorithm, Discovered Policy Optimisation (DPO). Our experiments in Brax environments confirm state-of-the-art performance of LPO and DPO, as well as their transfer to unseen settings. △ Less

Submitted 12 October, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

Comments: NeurIPS 2022

arXiv:2207.10170 [pdf, other]

Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks

Authors: Tim Franzmeyer, Stephen McAleer, João F. Henriques, Jakob N. Foerster, Philip H. S. Torr, Adel Bibi, Christian Schroeder de Witt

Abstract: Autonomous agents deployed in the real world need to be robust against adversarial attacks on sensory inputs. Robustifying agent policies requires anticipating the strongest attacks possible. We demonstrate that existing observation-space attacks on reinforcement learning agents have a common weakness: while effective, their lack of information-theoretic detectability constraints makes them detect… ▽ More Autonomous agents deployed in the real world need to be robust against adversarial attacks on sensory inputs. Robustifying agent policies requires anticipating the strongest attacks possible. We demonstrate that existing observation-space attacks on reinforcement learning agents have a common weakness: while effective, their lack of information-theoretic detectability constraints makes them detectable using automated means or human inspection. Detectability is undesirable to adversaries as it may trigger security escalations. We introduce ε-illusory, a novel form of adversarial attack on sequential decision-makers that is both effective and of ε-bounded statistical detectability. We propose a novel dual ascent algorithm to learn such attacks end-to-end. Compared to existing attacks, we empirically find ε-illusory to be significantly harder to detect with automated methods, and a small study with human participants (IRB approval under reference R84123/RE001) suggests they are similarly harder to detect for humans. Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses. The project website can be found at https://tinyurl.com/illusory-attacks. △ Less

Submitted 6 May, 2024; v1 submitted 20 July, 2022; originally announced July 2022.

Comments: ICLR 2024 Spotlight (top 5%)

arXiv:2206.12765 [pdf, other]

Generalized Beliefs for Cooperative AI

Authors: Darius Muglich, Luisa Zintgraf, Christian Schroeder de Witt, Shimon Whiteson, Jakob Foerster

Abstract: Self-play is a common paradigm for constructing solutions in Markov games that can yield optimal policies in collaborative settings. However, these policies often adopt highly-specialized conventions that make playing with a novel partner difficult. To address this, recent approaches rely on encoding symmetry and convention-awareness into policy training, but these require strong environmental ass… ▽ More Self-play is a common paradigm for constructing solutions in Markov games that can yield optimal policies in collaborative settings. However, these policies often adopt highly-specialized conventions that make playing with a novel partner difficult. To address this, recent approaches rely on encoding symmetry and convention-awareness into policy training, but these require strong environmental assumptions and can complicate policy training. We therefore propose moving the learning of conventions to the belief space. Specifically, we propose a belief learning model that can maintain beliefs over rollouts of policies not seen at training time, and can thus decode and adapt to novel conventions at test time. We show how to leverage this model for both search and training of a best response over various pools of policies to greatly improve ad-hoc teamplay. We also show how our setup promotes explainability and interpretability of nuanced agent conventions. △ Less

Submitted 25 June, 2022; originally announced June 2022.

arXiv:2205.15311 [pdf, other]

Biological Evolution and Genetic Algorithms: Exploring the Space of Abstract Tile Self-Assembly

Authors: Christian Schroeder de Witt

Abstract: A physically-motivated genetic algorithm (GA) and full enumeration for a tile-based model of self-assembly (JaTAM) is implemented using a graphics processing unit (GPU). We observe performance gains with respect to state-of-the-art implementations on CPU of factor 7.7 for the GA and 2.9 for JaTAM. The correctness of our GA implementation is demonstrated using a test-bed fitness function, and our J… ▽ More A physically-motivated genetic algorithm (GA) and full enumeration for a tile-based model of self-assembly (JaTAM) is implemented using a graphics processing unit (GPU). We observe performance gains with respect to state-of-the-art implementations on CPU of factor 7.7 for the GA and 2.9 for JaTAM. The correctness of our GA implementation is demonstrated using a test-bed fitness function, and our JaTAM implementation is verified by classifying a well-known search space $S_{2,8}$ based on two tile types. The performance gains achieved allow for the classification of a larger search space $S^{32}_{3,8}$ based on three tile types. The prevalence of structures based on two tile types demonstrates that simple organisms emerge preferrably even in complex ecosystems. The modularity of the largest structures found motivates the assumption that to first order, $S_{2,8}$ forms the building blocks of $S_{3,8}$. We conclude that GPUs may play an important role in future studies of evolutionary dynamics. △ Less

Submitted 28 May, 2022; originally announced May 2022.

Comments: MPhys Thesis, 2012. Awarded University of Oxford Tessella Prize

arXiv:2205.01447 [pdf, other]

Model-Free Opponent Sha**

Authors: Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster

Abstract: In general-sum games, the interaction of self-interested learning agents commonly leads to collectively worst-case outcomes, such as defect-defect in the iterated prisoner's dilemma (IPD). To overcome this, some methods, such as Learning with Opponent-Learning Awareness (LOLA), shape their opponents' learning process. However, these methods are myopic since only a small number of steps can be anti… ▽ More In general-sum games, the interaction of self-interested learning agents commonly leads to collectively worst-case outcomes, such as defect-defect in the iterated prisoner's dilemma (IPD). To overcome this, some methods, such as Learning with Opponent-Learning Awareness (LOLA), shape their opponents' learning process. However, these methods are myopic since only a small number of steps can be anticipated, are asymmetric since they treat other agents as naive learners, and require the use of higher-order derivatives, which are calculated through white-box access to an opponent's differentiable learning algorithm. To address these issues, we propose Model-Free Opponent Sha** (M-FOS). M-FOS learns in a meta-game in which each meta-step is an episode of the underlying inner game. The meta-state consists of the inner policies, and the meta-policy produces a new inner policy to be used in the next episode. M-FOS then uses generic model-free optimisation methods to learn meta-policies that accomplish long-horizon opponent sha**. Empirically, M-FOS near-optimally exploits naive learners and other, more sophisticated algorithms from the literature. For example, to the best of our knowledge, it is the first method to learn the well-known Zero-Determinant (ZD) extortion strategy in the IPD. In the same settings, M-FOS leads to socially optimal outcomes under meta-self-play. Finally, we show that M-FOS can be scaled to high-dimensional settings. △ Less

Submitted 4 November, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

Comments: ICML 2022 camera ready version. Code: https://github.com/luchris429/Model-Free-Opponent-Sha**

arXiv:2205.00666 [pdf, other]

(Private)-Retroactive Carbon Pricing [(P)ReCaP]: A Market-based Approach for Climate Finance and Risk Assessment

Authors: Yoshua Bengio, Prateek Gupta, Dylan Radovic, Maarten Scholl, Andrew Williams, Christian Schroeder de Witt, Tianyu Zhang, Yang Zhang

Abstract: Insufficient Social Cost of Carbon (SCC) estimation methods and short-term decision-making horizons have hindered the ability of carbon emitters to properly correct for the negative externalities of climate change, as well as the capacity of nations to balance economic and climate policy. To overcome these limitations, we introduce Retrospective Social Cost of Carbon Updating (ReSCCU), a novel mec… ▽ More Insufficient Social Cost of Carbon (SCC) estimation methods and short-term decision-making horizons have hindered the ability of carbon emitters to properly correct for the negative externalities of climate change, as well as the capacity of nations to balance economic and climate policy. To overcome these limitations, we introduce Retrospective Social Cost of Carbon Updating (ReSCCU), a novel mechanism that corrects for these limitations as empirically measured evidence is collected. To implement ReSCCU in the context of carbon taxation, we propose Retroactive Carbon Pricing (ReCaP), a market mechanism in which polluters offload the payment of ReSCCU adjustments to insurers. To alleviate systematic risks and minimize government involvement, we introduce the Private ReCaP (PReCaP) prediction market, which could see real-world implementation based on the engagement of a few high net-worth individuals or independent institutions. △ Less

Submitted 2 May, 2022; originally announced May 2022.

MSC Class: 91B18 (Primary) 91B76; 91G40 (Secondary) ACM Class: J.4

arXiv:2201.02373 [pdf, other]

Mirror Learning: A Unifying Framework of Policy Optimisation

Authors: Jakub Grudzien Kuba, Christian Schroeder de Witt, Jakob Foerster

Abstract: Modern deep reinforcement learning (RL) algorithms are motivated by either the generalised policy iteration (GPI) or trust-region learning (TRL) frameworks. However, algorithms that strictly respect these theoretical frameworks have proven unscalable. Surprisingly, the only known scalable algorithms violate the GPI/TRL assumptions, e.g. due to required regularisation or other heuristics. The curre… ▽ More Modern deep reinforcement learning (RL) algorithms are motivated by either the generalised policy iteration (GPI) or trust-region learning (TRL) frameworks. However, algorithms that strictly respect these theoretical frameworks have proven unscalable. Surprisingly, the only known scalable algorithms violate the GPI/TRL assumptions, e.g. due to required regularisation or other heuristics. The current explanation of their empirical success is essentially "by analogy": they are deemed approximate adaptations of theoretically sound methods. Unfortunately, studies have shown that in practice these algorithms differ greatly from their conceptual ancestors. In contrast, in this paper we introduce a novel theoretical framework, named Mirror Learning, which provides theoretical guarantees to a large class of algorithms, including TRPO and PPO. While the latter two exploit the flexibility of our framework, GPI and TRL fit in merely as pathologically restrictive corner cases thereof. This suggests that the empirical performance of state-of-the-art methods is a direct consequence of their theoretical properties, rather than of aforementioned approximate analogies. Mirror learning sets us free to boldly explore novel, theoretically sound RL algorithms, a thus far uncharted wonderland. △ Less

Submitted 14 July, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

arXiv:2111.12197 [pdf, other]

Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS

Authors: Christian Schroeder de Witt, Yongchao Huang, Philip H. S. Torr, Martin Strohmeier

Abstract: Cyber attacks are increasing in volume, frequency, and complexity. In response, the security community is looking toward fully automating cyber defense systems using machine learning. However, so far the resultant effects on the coevolutionary dynamics of attackers and defenders have not been examined. In this whitepaper, we hypothesise that increased automation on both sides will accelerate the c… ▽ More Cyber attacks are increasing in volume, frequency, and complexity. In response, the security community is looking toward fully automating cyber defense systems using machine learning. However, so far the resultant effects on the coevolutionary dynamics of attackers and defenders have not been examined. In this whitepaper, we hypothesise that increased automation on both sides will accelerate the coevolutionary cycle, thus begging the question of whether there are any resultant fixed points, and how they are characterised. Working within the threat model of Locked Shields, Europe's largest cyberdefense exercise, we study blackbox adversarial attacks on network classifiers. Given already existing attack capabilities, we question the utility of optimal evasion attack frameworks based on minimal evasion distances. Instead, we suggest a novel reinforcement learning setting that can be used to efficiently generate arbitrary adversarial perturbations. We then argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions, and introduce a temporally extended multi-agent reinforcement learning framework in which the resultant dynamics can be studied. We hypothesise that one plausible fixed point of AI-NIDS may be a scenario where the defense strategy relies heavily on whitelisted feature flow subspaces. Finally, we demonstrate that a continual learning approach is required to study attacker-defender dynamics in temporally extended general-sum games. △ Less

Submitted 23 November, 2021; originally announced November 2021.

arXiv:2107.08295 [pdf, other]

Communicating via Markov Decision Processes

Authors: Samuel Sokota, Christian Schroeder de Witt, Maximilian Igl, Luisa Zintgraf, Philip Torr, Martin Strohmeier, J. Zico Kolter, Shimon Whiteson, Jakob Foerster

Abstract: We consider the problem of communicating exogenous information by means of Markov decision process trajectories. This setting, which we call a Markov coding game (MCG), generalizes both source coding and a large class of referential games. MCGs also isolate a problem that is important in decentralized control settings in which cheap-talk is not available -- namely, they require balancing communica… ▽ More We consider the problem of communicating exogenous information by means of Markov decision process trajectories. This setting, which we call a Markov coding game (MCG), generalizes both source coding and a large class of referential games. MCGs also isolate a problem that is important in decentralized control settings in which cheap-talk is not available -- namely, they require balancing communication with the associated cost of communicating. We contribute a theoretically grounded approach to MCGs based on maximum entropy reinforcement learning and minimum entropy coupling that we call MEME. Due to recent breakthroughs in approximation algorithms for minimum entropy coupling, MEME is not merely a theoretical algorithm, but can be applied to practical settings. Empirically, we show both that MEME is able to outperform a strong baseline on small MCGs and that MEME is able to achieve strong performance on extremely large MCGs. To the latter point, we demonstrate that MEME is able to losslessly communicate binary images via trajectories of Cartpole and Pong, while simultaneously achieving the maximal or near maximal expected returns, and that it is even capable of performing well in the presence of actuator noise. △ Less

Submitted 12 June, 2022; v1 submitted 17 July, 2021; originally announced July 2021.

Comments: ICML 2022

arXiv:2104.08492 [pdf, other]

A Self-Supervised Auxiliary Loss for Deep RL in Partially Observable Settings

Authors: Eltayeb Ahmed, Luisa Zintgraf, Christian A. Schroeder de Witt, Nicolas Usunier

Abstract: In this work we explore an auxiliary loss useful for reinforcement learning in environments where strong performing agents are required to be able to navigate a spatial environment. The auxiliary loss proposed is to minimize the classification error of a neural network classifier that predicts whether or not a pair of states sampled from the agents current episode trajectory are in order. The clas… ▽ More In this work we explore an auxiliary loss useful for reinforcement learning in environments where strong performing agents are required to be able to navigate a spatial environment. The auxiliary loss proposed is to minimize the classification error of a neural network classifier that predicts whether or not a pair of states sampled from the agents current episode trajectory are in order. The classifier takes as input a pair of states as well as the agent's memory. The motivation for this auxiliary loss is that there is a strong correlation with which of a pair of states is more recent in the agents episode trajectory and which of the two states is spatially closer to the agent. Our hypothesis is that learning features to answer this question encourages the agent to learn and internalize in memory representations of states that facilitate spatial reasoning. We tested this auxiliary loss on a navigation task in a gridworld and achieved 9.6% increase in accumulative episode reward compared to a strong baseline approach. △ Less

Submitted 17 April, 2021; originally announced April 2021.

arXiv:2012.09670 [pdf, other]

RainBench: Towards Global Precipitation Forecasting from Satellite Imagery

Authors: Christian Schroeder de Witt, Catherine Tong, Valentina Zantedeschi, Daniele De Martini, Freddie Kalaitzis, Matthew Chantry, Duncan Watson-Parris, Piotr Bilinski

Abstract: Extreme precipitation events, such as violent rainfall and hail storms, routinely ravage economies and livelihoods around the develo** world. Climate change further aggravates this issue. Data-driven deep learning approaches could widen the access to accurate multi-day forecasts, to mitigate against such events. However, there is currently no benchmark dataset dedicated to the study of global pr… ▽ More Extreme precipitation events, such as violent rainfall and hail storms, routinely ravage economies and livelihoods around the develo** world. Climate change further aggravates this issue. Data-driven deep learning approaches could widen the access to accurate multi-day forecasts, to mitigate against such events. However, there is currently no benchmark dataset dedicated to the study of global precipitation forecasts. In this paper, we introduce \textbf{RainBench}, a new multi-modal benchmark dataset for data-driven precipitation forecasting. It includes simulated satellite data, a selection of relevant meteorological data from the ERA5 reanalysis product, and IMERG precipitation data. We also release \textbf{PyRain}, a library to process large precipitation datasets efficiently. We present an extensive analysis of our novel dataset and establish baseline results for two benchmark medium-range precipitation forecasting tasks. Finally, we discuss existing data-driven weather forecasting methodologies and suggest future research avenues. △ Less

Submitted 17 December, 2020; originally announced December 2020.

Comments: Work completed during the 2020 Frontier Development Lab research accelerator, a private-public partnership with NASA in the US, and ESA in Europe. Accepted as a spotlight/long oral talk at both Climate Change and AI, as well as AI for Earth Sciences Workshops at NeurIPS 2020

arXiv:2011.09533 [pdf, other]

Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?

Authors: Christian Schroeder de Witt, Tarun Gupta, Denys Makoviichuk, Viktor Makoviychuk, Philip H. S. Torr, Mingfei Sun, Shimon Whiteson

Abstract: Most recently developed approaches to cooperative multi-agent reinforcement learning in the \emph{centralized training with decentralized execution} setting involve estimating a centralized, joint value function. In this paper, we demonstrate that, despite its various theoretical shortcomings, Independent PPO (IPPO), a form of independent learning in which each agent simply estimates its local val… ▽ More Most recently developed approaches to cooperative multi-agent reinforcement learning in the \emph{centralized training with decentralized execution} setting involve estimating a centralized, joint value function. In this paper, we demonstrate that, despite its various theoretical shortcomings, Independent PPO (IPPO), a form of independent learning in which each agent simply estimates its local value function, can perform just as well as or better than state-of-the-art joint learning approaches on popular multi-agent benchmark suite SMAC with little hyperparameter tuning. We also compare IPPO to several variants; the results suggest that IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity. △ Less

Submitted 18 November, 2020; originally announced November 2020.

arXiv:2006.04222 [pdf, other]

Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning

Authors: Shariq Iqbal, Christian A. Schroeder de Witt, Bei Peng, Wendelin Böhmer, Shimon Whiteson, Fei Sha

Abstract: Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities; however, common patterns of behavior often emerge among these agents/entities. Our method aims to leverage these commonalities by asking the question: ``What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?… ▽ More Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities; however, common patterns of behavior often emerge among these agents/entities. Our method aims to leverage these commonalities by asking the question: ``What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?'' By posing this counterfactual question, we can recognize state-action trajectories within sub-groups of entities that we may have encountered in another task and use what we learned in that task to inform our prediction in the current one. We then reconstruct a prediction of the full returns as a combination of factors considering these disjoint groups of entities and train this ``randomly factorized" value function as an auxiliary objective for value-based multi-agent reinforcement learning. By doing so, our model can recognize and leverage similarities across tasks to improve learning efficiency in a multi-task setting. Our approach, Randomized Entity-wise Factorization for Imagined Learning (REFIL), outperforms all strong baselines by a significant margin in challenging multi-task StarCraft micromanagement settings. △ Less

Submitted 11 June, 2021; v1 submitted 7 June, 2020; originally announced June 2020.

Comments: ICML 2021 Camera Ready

arXiv:2005.07062 [pdf, other]

Simulation-Based Inference for Global Health Decisions

Authors: Christian Schroeder de Witt, Bradley Gram-Hansen, Nantas Nardelli, Andrew Gambardella, Rob Zinkov, Puneet Dokania, N. Siddharth, Ana Belen Espinosa-Gonzalez, Ara Darzi, Philip Torr, Atılım Güneş Baydin

Abstract: The COVID-19 pandemic has highlighted the importance of in-silico epidemiological modelling in predicting the dynamics of infectious diseases to inform health policy and decision makers about suitable prevention and containment strategies. Work in this setting involves solving challenging inference and control problems in individual-based models of ever increasing complexity. Here we discuss recen… ▽ More The COVID-19 pandemic has highlighted the importance of in-silico epidemiological modelling in predicting the dynamics of infectious diseases to inform health policy and decision makers about suitable prevention and containment strategies. Work in this setting involves solving challenging inference and control problems in individual-based models of ever increasing complexity. Here we discuss recent breakthroughs in machine learning, specifically in simulation-based inference, and explore its potential as a novel venue for model calibration to support the design and evaluation of public health interventions. To further stimulate research, we are develo** software interfaces that turn two cornerstone COVID-19 and malaria epidemiology models COVID-sim, (https://github.com/mrc-ide/covid-sim/) and OpenMalaria (https://github.com/SwissTPH/openmalaria) into probabilistic programs, enabling efficient interpretable Bayesian inference within those simulators. △ Less

Submitted 14 May, 2020; originally announced May 2020.

Journal ref: ICML Workshop on Machine Learning for Global Health, Thirty-Seventh International Conference on Machine Learning (ICML 2020)

arXiv:2003.08839 [pdf, other]

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

Authors: Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

Abstract: In many real-world settings, a team of agents must coordinate its behaviour while acting in a decentralised fashion. At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised l… ▽ More In many real-world settings, a team of agents must coordinate its behaviour while acting in a decentralised fashion. At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a mixing network that estimates joint action-values as a monotonic combination of per-agent values. We structurally enforce that the joint-action value is monotonic in the per-agent values, through the use of non-negative weights in the mixing network, which guarantees consistency between the centralised and decentralised policies. To evaluate the performance of QMIX, we propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning. We evaluate QMIX on a challenging set of SMAC scenarios and show that it significantly outperforms existing multi-agent reinforcement learning methods. △ Less

Submitted 27 August, 2020; v1 submitted 19 March, 2020; originally announced March 2020.

Comments: Extended version of the ICML 2018 conference paper (arXiv:1803.11485)

Journal ref: Journal of Machine Learning Research 21(178):1-51, 2020

arXiv:2003.06709 [pdf, other]

FACMAC: Factored Multi-Agent Centralised Policy Gradients

Authors: Bei Peng, Tabish Rashid, Christian A. Schroeder de Witt, Pierre-Alexandre Kamienny, Philip H. S. Torr, Wendelin Böhmer, Shimon Whiteson

Abstract: We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. Like MADDPG, a popular multi-agent actor-critic method, our approach uses deep deterministic policy gradients to learn policies. However, FACMAC learns a centralised but factored critic, which combines per-agent utilit… ▽ More We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. Like MADDPG, a popular multi-agent actor-critic method, our approach uses deep deterministic policy gradients to learn policies. However, FACMAC learns a centralised but factored critic, which combines per-agent utilities into the joint action-value function via a non-linear monotonic function, as in QMIX, a popular multi-agent Q-learning algorithm. However, unlike QMIX, there are no inherent constraints on factoring the critic. We thus also employ a nonmonotonic factorisation and empirically demonstrate that its increased representational capacity allows it to solve some tasks that cannot be solved with monolithic, or monotonically factored critics. In addition, FACMAC uses a centralised policy gradient estimator that optimises over the entire joint action space, rather than optimising over each agent's action space separately as in MADDPG. This allows for more coordinated policy changes and fully reaps the benefits of a centralised critic. We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks. Empirical results demonstrate FACMAC's superior performance over MADDPG and other baselines on all three domains. △ Less

Submitted 7 May, 2021; v1 submitted 14 March, 2020; originally announced March 2020.

arXiv:1910.09056 [pdf, other]

Amortized Rejection Sampling in Universal Probabilistic Programming

Authors: Saeid Naderiparizi, Adam Ścibior, Andreas Munk, Mehrdad Ghadiri, Atılım Güneş Baydin, Bradley Gram-Hansen, Christian Schroeder de Witt, Robert Zinkov, Philip H. S. Torr, Tom Rainforth, Yee Whye Teh, Frank Wood

Abstract: Naive approaches to amortized inference in probabilistic programs with unbounded loops can produce estimators with infinite variance. This is particularly true of importance sampling inference in programs that explicitly include rejection sampling as part of the user-programmed generative procedure. In this paper we develop a new and efficient amortized importance sampling estimator. We prove fini… ▽ More Naive approaches to amortized inference in probabilistic programs with unbounded loops can produce estimators with infinite variance. This is particularly true of importance sampling inference in programs that explicitly include rejection sampling as part of the user-programmed generative procedure. In this paper we develop a new and efficient amortized importance sampling estimator. We prove finite variance of our estimator and empirically demonstrate our method's correctness and efficiency compared to existing alternatives on generative programs containing rejection sampling loops and discuss how to implement our method in a generic probabilistic programming framework. △ Less

Submitted 28 March, 2022; v1 submitted 20 October, 2019; originally announced October 2019.

Comments: AISTATS 2022 camera ready

arXiv:1905.12432 [pdf, other]

Hijacking Malaria Simulators with Probabilistic Programming

Authors: Bradley Gram-Hansen, Christian Schröder de Witt, Tom Rainforth, Philip H. S. Torr, Yee Whye Teh, Atılım Güneş Baydin

Abstract: Epidemiology simulations have become a fundamental tool in the fight against the epidemics of various infectious diseases like AIDS and malaria. However, the complicated and stochastic nature of these simulators can mean their output is difficult to interpret, which reduces their usefulness to policymakers. In this paper, we introduce an approach that allows one to treat a large class of populatio… ▽ More Epidemiology simulations have become a fundamental tool in the fight against the epidemics of various infectious diseases like AIDS and malaria. However, the complicated and stochastic nature of these simulators can mean their output is difficult to interpret, which reduces their usefulness to policymakers. In this paper, we introduce an approach that allows one to treat a large class of population-based epidemiology simulators as probabilistic generative models. This is achieved by hijacking the internal random number generator calls, through the use of a universal probabilistic programming system (PPS). In contrast to other methods, our approach can be easily retrofitted to simulators written in popular industrial programming frameworks. We demonstrate that our method can be used for interpretable introspection and inference, thus shedding light on black-box simulators. This reinstates much-needed trust between policymakers and evidence-based methods. △ Less

Submitted 29 May, 2019; originally announced May 2019.

Comments: 6 pages, 3 figures, Accepted at the International Conference on Machine Learning AI for Social Good Workshop, Long Beach, United States, 2019

Journal ref: ICML Workshop on AI for Social Good, 2018

arXiv:1905.07366 [pdf, other]

Stratospheric Aerosol Injection as a Deep Reinforcement Learning Problem

Authors: Christian Schroeder de Witt, Thomas Hornigold

Abstract: As global greenhouse gas emissions continue to rise, the use of stratospheric aerosol injection (SAI), a form of solar geoengineering, is increasingly considered in order to artificially mitigate climate change effects. However, initial research in simulation suggests that naive SAI can have catastrophic regional consequences, which may induce serious geostrategic conflicts. Current geo-engineerin… ▽ More As global greenhouse gas emissions continue to rise, the use of stratospheric aerosol injection (SAI), a form of solar geoengineering, is increasingly considered in order to artificially mitigate climate change effects. However, initial research in simulation suggests that naive SAI can have catastrophic regional consequences, which may induce serious geostrategic conflicts. Current geo-engineering research treats SAI control in low-dimensional approximation only. We suggest treating SAI as a high-dimensional control problem, with policies trained according to a context-sensitive reward function within the Deep Reinforcement Learning (DRL) paradigm. In order to facilitate training in simulation, we suggest to emulate HadCM3, a widely used General Circulation Model, using deep learning techniques. We believe this is the first application of DRL to the climate sciences. △ Less

Submitted 17 May, 2019; originally announced May 2019.

Comments: Awarded Poster and Spotlight Oral at Climate Change: How Can AI Help? (Workshop) at International Conference on Machine Learning, Long Beach, California, 2019

arXiv:1902.04043 [pdf, other]

The StarCraft Multi-Agent Challenge

Authors: Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob Foerster, Shimon Whiteson

Abstract: In the last few years, deep multi-agent reinforcement learning (RL) has become a highly active area of research. A particularly challenging class of problems in this area is partially observable, cooperative, multi-agent learning, in which teams of agents must learn to coordinate their behaviour while conditioning only on their private observations. This is an attractive research area since such p… ▽ More In the last few years, deep multi-agent reinforcement learning (RL) has become a highly active area of research. A particularly challenging class of problems in this area is partially observable, cooperative, multi-agent learning, in which teams of agents must learn to coordinate their behaviour while conditioning only on their private observations. This is an attractive research area since such problems are relevant to a large number of real-world systems and are also more amenable to evaluation than general-sum problems. Standardised environments such as the ALE and MuJoCo have allowed single-agent RL to move beyond toy domains, such as grid worlds. However, there is no comparable benchmark for cooperative multi-agent RL. As a result, most papers in this field use one-off toy problems, making it difficult to measure real progress. In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap. SMAC is based on the popular real-time strategy game StarCraft II and focuses on micromanagement challenges where each unit is controlled by an independent agent that must act based on local observations. We offer a diverse set of challenge maps and recommendations for best practices in benchmarking and evaluations. We also open-source a deep multi-agent RL learning framework including state-of-the-art algorithms. We believe that SMAC can provide a standard benchmark environment for years to come. Videos of our best agents for several SMAC scenarios are available at: https://youtu.be/VZ7zmQ_obZ0. △ Less

Submitted 9 December, 2019; v1 submitted 11 February, 2019; originally announced February 2019.

arXiv:1810.11702 [pdf, other]

Multi-Agent Common Knowledge Reinforcement Learning

Authors: Christian A. Schroeder de Witt, Jakob N. Foerster, Gregory Farquhar, Philip H. S. Torr, Wendelin Boehmer, Shimon Whiteson

Abstract: Cooperative multi-agent reinforcement learning often requires decentralised policies, which severely limit the agents' ability to coordinate their behaviour. In this paper, we show that common knowledge between agents allows for complex decentralised coordination. Common knowledge arises naturally in a large number of decentralised cooperative multi-agent tasks, for example, when agents can recons… ▽ More Cooperative multi-agent reinforcement learning often requires decentralised policies, which severely limit the agents' ability to coordinate their behaviour. In this paper, we show that common knowledge between agents allows for complex decentralised coordination. Common knowledge arises naturally in a large number of decentralised cooperative multi-agent tasks, for example, when agents can reconstruct parts of each others' observations. Since agents an independently agree on their common knowledge, they can execute complex coordinated policies that condition on this knowledge in a fully decentralised fashion. We propose multi-agent common knowledge reinforcement learning (MACKRL), a novel stochastic actor-critic algorithm that learns a hierarchical policy tree. Higher levels in the hierarchy coordinate groups of agents by conditioning on their common knowledge, or delegate to lower levels with smaller subgroups but potentially richer common knowledge. The entire policy tree can be executed in a fully decentralised fashion. As the lowest policy tree level consists of independent policies for each agent, MACKRL reduces to independently learnt decentralised policies as a special case. We demonstrate that our method can exploit common knowledge for superior performance on complex decentralised coordination tasks, including a stochastic matrix game and challenging problems in StarCraft II unit micromanagement. △ Less

Submitted 11 January, 2020; v1 submitted 27 October, 2018; originally announced October 2018.

Comments: Advances in Neural Information Processing Systems, 9924-9935

arXiv:1807.03632 [pdf, other]

doi 10.1145/3203217.3205341

The SAGE Project: a Storage Centric Approach for Exascale Computing

Authors: Sai Narasimhamurthy, Nikita Danilov, Sining Wu, Ganesan Umanesan, Steven Wei-der Chien, Sergio Rivas-Gomez, Ivy Bo Peng, Erwin Laure, Shaun de Witt, Dirk Pleiter, Stefano Markidis

Abstract: SAGE (Percipient StorAGe for Exascale Data Centric Computing) is a European Commission funded project towards the era of Exascale computing. Its goal is to design and implement a Big Data/Extreme Computing (BDEC) capable infrastructure with associated software stack. The SAGE system follows a "storage centric" approach as it is capable of storing and processing large data volumes at the Exascale r… ▽ More SAGE (Percipient StorAGe for Exascale Data Centric Computing) is a European Commission funded project towards the era of Exascale computing. Its goal is to design and implement a Big Data/Extreme Computing (BDEC) capable infrastructure with associated software stack. The SAGE system follows a "storage centric" approach as it is capable of storing and processing large data volumes at the Exascale regime. SAGE addresses the convergence of Big Data Analysis and HPC in an era of next-generation data centric computing. This convergence is driven by the proliferation of massive data sources, such as large, dispersed scientific instruments and sensors where data needs to be processed, analyzed and integrated into simulations to derive scientific and innovative insights. A first prototype of the SAGE system has been been implemented and installed at the Julich Supercomputing Center. The SAGE storage system consists of multiple types of storage device technologies in a multi-tier I/O hierarchy, including flash, disk, and non-volatile memory technologies. The main SAGE software component is the Seagate Mero Object Storage that is accessible via the Clovis API and higher level interfaces. The SAGE project also includes scientific applications for the validation of the SAGE concepts. The objective of this paper is to present the SAGE project concepts, the prototype of the SAGE platform and discuss the software architecture of the SAGE system. △ Less

Submitted 6 July, 2018; originally announced July 2018.

Comments: Submitted to Computing Frontiers 2018. arXiv admin note: substantial text overlap with arXiv:1805.00556

arXiv:1805.00556 [pdf, other]

doi 10.1016/j.parco.2018.03.002

SAGE: Percipient Storage for Exascale Data Centric Computing

Authors: Sai Narasimhamurthy, Nikita Danilov, Sining Wu, Ganesan Umanesan, Stefano Markidis, Sergio Rivas-Gomez, Ivy Bo Peng, Erwin Laure, Dirk Pleiter, Shaun de Witt

Abstract: We aim to implement a Big Data/Extreme Computing (BDEC) capable system infrastructure as we head towards the era of Exascale computing - termed SAGE (Percipient StorAGe for Exascale Data Centric Computing). The SAGE system will be capable of storing and processing immense volumes of data at the Exascale regime, and provide the capability for Exascale class applications to use such a storage infras… ▽ More We aim to implement a Big Data/Extreme Computing (BDEC) capable system infrastructure as we head towards the era of Exascale computing - termed SAGE (Percipient StorAGe for Exascale Data Centric Computing). The SAGE system will be capable of storing and processing immense volumes of data at the Exascale regime, and provide the capability for Exascale class applications to use such a storage infrastructure. SAGE addresses the increasing overlaps between Big Data Analysis and HPC in an era of next-generation data centric computing that has developed due to the proliferation of massive data sources, such as large, dispersed scientific instruments and sensors, whose data needs to be processed, analyzed and integrated into simulations to derive scientific and innovative insights. Indeed, Exascale I/O, as a problem that has not been sufficiently dealt with for simulation codes, is appropriately addressed by the SAGE platform. The objective of this paper is to discuss the software architecture of the SAGE system and look at early results we have obtained employing some of its key methodologies, as the system continues to evolve. △ Less

Submitted 1 May, 2018; originally announced May 2018.

Journal ref: Parallel Computing, 23 March 2018

arXiv:1803.11485 [pdf, other]

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

Authors: Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

Abstract: In many real-world settings, a team of agents must coordinate their behaviour while acting in a decentralised way. At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an att… ▽ More In many real-world settings, a team of agents must coordinate their behaviour while acting in a decentralised way. At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations. We structurally enforce that the joint-action value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning, and guarantees consistency between the centralised and decentralised policies. We evaluate QMIX on a challenging set of StarCraft II micromanagement tasks, and show that QMIX significantly outperforms existing value-based multi-agent reinforcement learning methods. △ Less

Submitted 6 June, 2018; v1 submitted 30 March, 2018; originally announced March 2018.

Comments: Camera-ready version, International Conference of Machine Learning 2018

arXiv:1512.00272 [pdf, other]

doi 10.1088/1742-6596/664/4/042052

Enabling Object Storage via shims for Grid Middleware

Authors: Samuel Cadellin Skipsey, Shaun De Witt, Alastair Dewhurst, David Britton, Gareth Roy, David Crooks

Abstract: The Object Store model has quickly become the basis of most commercially successful mass storage infrastructure, backing so-called "Cloud" storage such as Amazon S3, but also underlying the implementation of most parallel distributed storage systems. Many of the assumptions in Object Store design are similar, but not identical, to concepts in the design of Grid Storage Elements, although the requi… ▽ More The Object Store model has quickly become the basis of most commercially successful mass storage infrastructure, backing so-called "Cloud" storage such as Amazon S3, but also underlying the implementation of most parallel distributed storage systems. Many of the assumptions in Object Store design are similar, but not identical, to concepts in the design of Grid Storage Elements, although the requirement for "POSIX-like" filesystem structures on top of SEs makes the disjunction seem larger. As modern Object Stores provide many features that most Grid SEs do not (block level stri**, parallel access, automatic file repair, etc.), it is of interest to see how easily we can provide interfaces to typical Object Stores via plugins and shims for Grid tools, and how well experiments can adapt their data models to them. We present evaluation of, and first-deployment experiences with, (for example) Xrootd-Ceph interfaces for direct object-store access, as part of an initiative within GridPP\cite{GridPP} hosted at RAL. Additionally, we discuss the tradeoffs and experience of develo** plugins for the currently-popular {\it Ceph} parallel distributed filesystem for the GFAL2 access layer, at Glasgow. △ Less

Submitted 30 October, 2015; originally announced December 2015.

Comments: 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015)

arXiv:1404.3633 [pdf, other]

doi 10.4204/EPTCS.172.20

The ZX-calculus is incomplete for quantum mechanics

Authors: Christian Schröder de Witt, Vladimir Zamdzhiev

Abstract: We prove that the ZX-calculus is incomplete for quantum mechanics. We suggest the addition of a new 'color-swap' rule, of which currently no analytical formulation is known and which we suspect may be necessary, but not sufficient to make the ZX-calculus complete. We prove that the ZX-calculus is incomplete for quantum mechanics. We suggest the addition of a new 'color-swap' rule, of which currently no analytical formulation is known and which we suspect may be necessary, but not sufficient to make the ZX-calculus complete. △ Less

Submitted 29 December, 2014; v1 submitted 14 April, 2014; originally announced April 2014.

Comments: In Proceedings QPL 2014, arXiv:1412.8102

Journal ref: EPTCS 172, 2014, pp. 285-292

Showing 1–39 of 39 results for author: de Witt, S