-
IDs for AI Systems
Authors:
Alan Chan,
Noam Kolt,
Peter Wills,
Usman Anwar,
Christian Schroeder de Witt,
Nitarshan Rajkumar,
Lewis Hammond,
David Krueger,
Lennart Heim,
Markus Anderljung
Abstract:
AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. A user may not be able to verify whether a system satisfies certain safety standards. An investigator may not know whom to investigate when a system causes an incident. A platform may find it difficult to penalize repeated negative interactions with the same system. Across a number of domains, IDs address analogous problems by identifying particular entities (e.g., a particular Boeing 747) and providing information about other entities of the same class (e.g., some or all Boeing 747s). We propose a framework in which IDs are ascribed to instances of AI systems (e.g., a particular chat session with Claude 3), and associated information is accessible to parties seeking to interact with that system. We characterize IDs for AI systems, argue that there could be significant demand for IDs from key actors, analyze how those actors could incentivize ID adoption, explore potential implementations of our framework, and highlight limitations and risks. IDs seem most warranted in high-stakes settings, where certain actors (e.g., those that enable AI systems to make financial transactions) could experiment with incentives for ID use. Deployers of AI systems could experiment with developing ID implementations. With further study, IDs could help to manage a world where AI systems pervade society.
Submitted 17 June, 2024;
originally announced June 2024.
-
Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits
Authors:
Andis Draguns,
Andrew Gritsevskiy,
Sumeet Ramesh Motwani,
Charlie Rogers-Smith,
Jeffrey Ladish,
Christian Schroeder de Witt
Abstract:
The rapid proliferation of open-source language models significantly increases the risks of downstream backdoor attacks. These backdoors can introduce dangerous behaviours during model deployment and can evade detection by conventional cybersecurity monitoring systems. In this paper, we introduce a novel class of backdoors in autoregressive transformer models, that, in contrast to prior art, are unelicitable in nature. Unelicitability prevents the defender from triggering the backdoor, making it impossible to evaluate or detect ahead of deployment even if given full white-box access and using automated techniques, such as red-teaming or certain formal verification methods. We show that our novel construction is not only unelicitable thanks to using cryptographic techniques, but also has favourable robustness properties. We confirm these properties in empirical investigations, and provide evidence that our backdoors can withstand state-of-the-art mitigation strategies. Additionally, we expand on previous work by showing that our universal backdoors, while not completely undetectable in white-box settings, can be harder to detect than some existing designs. By demonstrating the feasibility of seamlessly integrating backdoors into transformer models, this paper fundamentally questions the efficacy of pre-deployment detection strategies. This offers new insights into the offence-defence balance in AI safety and security.
Submitted 3 June, 2024;
originally announced June 2024.
-
Computing Low-Entropy Couplings for Large-Support Distributions
Authors:
Samuel Sokota,
Dylan Sam,
Christian Schroeder de Witt,
Spencer Compton,
Jakob Foerster,
J. Zico Kolter
Abstract:
Minimum-entropy coupling (MEC) -- the process of finding a joint distribution with minimum entropy for given marginals -- has applications in areas such as causality and steganography. However, existing algorithms are either computationally intractable for large-support distributions or limited to specific distribution types and sensitive to hyperparameter choices. This work addresses these limitations by unifying a prior family of iterative MEC (IMEC) approaches into a generalized partition-based formalism. From this framework, we derive a novel IMEC algorithm called ARIMEC, capable of handling arbitrary discrete distributions, and introduce a method to make IMEC robust to suboptimal hyperparameter settings. These innovations facilitate the application of IMEC to high-throughput steganography with language models, among other settings. Our codebase is available at https://github.com/ssokota/mec .
Submitted 29 May, 2024;
originally announced May 2024.
-
Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Authors:
Francisco Eiras,
Aleksandar Petrov,
Bertie Vidgen,
Christian Schroeder de Witt,
Fabio Pizzati,
Katherine Elkins,
Supratik Mukhopadhyay,
Adel Bibi,
Botos Csaba,
Fabro Steibel,
Fazl Barez,
Genevieve Smith,
Gianluca Guadagni,
Jon Chun,
Jordi Cabot,
Joseph Marvin Imperial,
Juan A. Nolazco-Flores,
Lori Landay,
Matthew Jackson,
Paul Röttger,
Philip H. S. Torr,
Trevor Darrell,
Yong Suk Lee,
Jakob Foerster
Abstract:
In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source Generative AI. We argue for the responsible open sourcing of generative AI models in the near and medium term. To set the stage, we first introduce an AI openness taxonomy system and apply it to 40 current large language models. We then outline differential benefits and risks of open versus closed source AI and present potential risk mitigation, ranging from best practices to calls for technical and scientific contributions. We hope that this report will add a much needed missing voice to the current public discourse on near to mid-term AI safety and other societal impact.
Submitted 24 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection
Authors:
Linas Nasvytis,
Kai Sandbrink,
Jakob Foerster,
Tim Franzmeyer,
Christian Schroeder de Witt
Abstract:
While reinforcement learning (RL) algorithms have been successfully applied across numerous sequential decision-making problems, their generalization to unforeseen testing environments remains a significant concern. In this paper, we study the problem of out-of-distribution (OOD) detection in RL, which focuses on identifying situations at test time that RL agents have not encountered in their training environments. We first propose a clarification of terminology for OOD detection in RL, which aligns it with the literature from other machine learning domains. We then present new benchmark scenarios for OOD detection, which introduce anomalies with temporal autocorrelation into different components of the agent-environment loop. We argue that such scenarios have been understudied in the current literature, despite their relevance to real-world situations. Confirming our theoretical predictions, our experimental results suggest that state-of-the-art OOD detectors are not able to identify such anomalies. To address this problem, we propose a novel method for OOD detection, which we call DEXTER (Detection via Extraction of Time Series Representations). By treating environment observations as time series data, DEXTER extracts salient time series features, and then leverages an ensemble of isolation forest algorithms to detect anomalies. We find that DEXTER can reliably identify anomalies across benchmark scenarios, exhibiting superior performance compared to both state-of-the-art OOD detectors and high-dimensional changepoint detectors adopted from statistics.
Submitted 10 April, 2024;
originally announced April 2024.
-
Secret Collusion Among Generative AI Agents
Authors:
Sumeet Ramesh Motwani,
Mikhail Baranchuk,
Martin Strohmeier,
Vijay Bolina,
Philip H. S. Torr,
Lewis Hammond,
Christian Schroeder de Witt
Abstract:
Recent capability increases in large language models (LLMs) open up applications in which teams of communicating generative AI agents solve joint tasks. This poses privacy and security challenges concerning the unauthorised sharing of information, or other unwanted forms of agent coordination. Modern steganographic techniques could render such dynamics hard to detect. In this paper, we comprehensively formalise the problem of secret collusion in systems of generative AI agents by drawing on relevant concepts from both the AI and security literature. We study incentives for the use of steganography, and propose a variety of mitigation measures. Our investigations result in a model evaluation framework that systematically tests capabilities required for various forms of secret collusion. We provide extensive empirical results across a range of contemporary LLMs. While the steganographic capabilities of current models remain limited, GPT-4 displays a capability jump suggesting the need for continuous monitoring of steganographic frontier model capabilities. We conclude by laying out a comprehensive research program to mitigate future risks of collusion between generative AI models.
Submitted 12 February, 2024;
originally announced February 2024.
-
The Danger Of Arrogance: Welfare Equilibra As A Solution To Stackelberg Self-Play In Non-Coincidental Games
Authors:
Jake Levi,
Chris Lu,
Timon Willi,
Christian Schroeder de Witt,
Jakob Foerster
Abstract:
The increasing prevalence of multi-agent learning systems in society necessitates understanding how to learn effective and safe policies in general-sum multi-agent environments against a variety of opponents, including self-play. General-sum learning is difficult because of non-stationary opponents and misaligned incentives. Our first main contribution is to show that many recent approaches to general-sum learning can be derived as approximations to Stackelberg strategies, which suggests a framework for developing new multi-agent learning algorithms. We then define non-coincidental games as games in which the Stackelberg strategy profile is not a Nash Equilibrium. This notably includes several canonical matrix games and provides a normative theory for why existing algorithms fail in self-play in such games. We address this problem by introducing Welfare Equilibria (WE) as a generalisation of Stackelberg Strategies, which can recover desirable Nash Equilibria even in non-coincidental games. Finally, we introduce Welfare Function Search (WelFuSe) as a practical approach to finding desirable WE against unknown opponents, which finds more mutually desirable solutions in self-play, while preserving performance against naive learning opponents.
Submitted 27 March, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
FLASH-TB: Integrating Arc-Flags and Trip-Based Public Transit Routing
Authors:
Ernestine Großmann,
Jonas Sauer,
Christian Schulz,
Patrick Steil,
Sascha Witt
Abstract:
We present FLASH-TB, a journey planning algorithm for public transit networks that combines Trip-Based Public Transit Routing (TB) with the Arc-Flags speedup technique. The basic idea is simple: The network is partitioned into a configurable number of cells. For each cell and each possible transfer between two vehicles, the algorithm precomputes a flag that indicates whether the transfer is required to reach the cell. During a query, only flagged transfers are explored. Our algorithm improves upon previous attempts to apply Arc-Flags to public transit networks, which saw limited success due to conflicting rules for pruning the search space. We show that these rules can be reconciled while still producing correct results. Because the number of cells is configurable, FLASH-TB offers a tradeoff between query time and memory consumption. It is significantly more space-efficient than existing techniques with a comparable preprocessing time, which store generalized shortest-path trees: to match their query performance, it requires up to two orders of magnitude less memory. The fastest configuration of FLASH-TB achieves a speedup of more than two orders of magnitude over TB, offering sub-millisecond query times even on large countrywide networks.
Submitted 20 December, 2023;
originally announced December 2023.
-
JaxMARL: Multi-Agent RL Environments in JAX
Authors:
Alexander Rutherford,
Benjamin Ellis,
Matteo Gallici,
Jonathan Cook,
Andrei Lupu,
Gardar Ingvarsson,
Timon Willi,
Akbir Khan,
Christian Schroeder de Witt,
Alexandra Souly,
Saptarashmi Bandyopadhyay,
Mikayel Samvelyan,
Minqi Jiang,
Robert Tjarko Lange,
Shimon Whiteson,
Bruno Lacerda,
Nick Hawes,
Tim Rocktaschel,
Chris Lu,
Jakob Nicolaus Foerster
Abstract:
Benchmarks play an important role in the development of machine learning algorithms. For example, research in reinforcement learning (RL) has been heavily influenced by available environments and benchmarks. However, RL environments are traditionally run on the CPU, limiting their scalability with typical academic compute. Recent advancements in JAX have enabled the wider use of hardware acceleration to overcome these computational hurdles, enabling massively parallel RL training pipelines and environments. This is particularly useful for multi-agent reinforcement learning (MARL) research. First of all, multiple agents must be considered at each environment step, adding computational burden, and secondly, the sample complexity is increased due to non-stationarity, decentralised partial observability, or other MARL challenges. In this paper, we present JaxMARL, the first open-source code base that combines ease-of-use with GPU enabled efficiency, and supports a large number of commonly used MARL environments as well as popular baseline algorithms. When considering wall clock time, our experiments show that per-run our JAX-based training pipeline is up to 12500x faster than existing approaches. This enables efficient and thorough evaluations, with the potential to alleviate the evaluation crisis of the field. We also introduce and benchmark SMAX, a vectorised, simplified version of the popular StarCraft Multi-Agent Challenge, which removes the need to run the StarCraft II game engine. This not only enables GPU acceleration, but also provides a more flexible MARL environment, unlocking the potential for self-play, meta-learning, and other future applications in MARL. We provide code at https://github.com/flairox/jaxmarl.
Submitted 19 December, 2023; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Bayesian Exploration Networks
Authors:
Mattie Fellows,
Brandon Kaplowitz,
Christian Schroeder de Witt,
Shimon Whiteson
Abstract:
Bayesian reinforcement learning (RL) offers a principled and elegant approach for sequential decision making under uncertainty. Most notably, Bayesian agents do not face an exploration/exploitation dilemma, a major pathology of frequentist methods. However, theoretical understanding of model-free approaches is lacking. In this paper, we introduce a novel Bayesian model-free formulation and the first analysis showing that model-free approaches can yield Bayes-optimal policies. We show all existing model-free approaches make approximations that yield policies that can be arbitrarily Bayes-suboptimal. As a first step towards model-free Bayes optimality, we introduce the Bayesian exploration network (BEN) which uses normalising flows to model both the aleatoric uncertainty (via density estimation) and epistemic uncertainty (via variational inference) in the Bellman operator. In the limit of complete optimisation, BEN learns true Bayes-optimal policies, but like in variational expectation-maximisation, partial optimisation renders our approach tractable. Empirical results demonstrate that BEN can learn true Bayes-optimal policies in tasks where existing model-free approaches fail.
Submitted 25 June, 2024; v1 submitted 24 August, 2023;
originally announced August 2023.
-
Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning
Authors:
Yat Long Lo,
Christian Schroeder de Witt,
Samuel Sokota,
Jakob Nicolaus Foerster,
Shimon Whiteson
Abstract:
By enabling agents to communicate, recent cooperative multi-agent reinforcement learning (MARL) methods have demonstrated better task performance and more coordinated behavior. Most existing approaches facilitate inter-agent communication by allowing agents to send messages to each other through free communication channels, i.e., cheap talk channels. Current methods require these channels to be constantly accessible and known to the agents a priori. In this work, we lift these requirements such that the agents must discover the cheap talk channels and learn how to use them. Hence, the problem has two main parts: cheap talk discovery (CTD) and cheap talk utilization (CTU). We introduce a novel conceptual framework for both parts and develop a new algorithm based on mutual information maximization that outperforms existing algorithms in CTD/CTU settings. We also release a novel benchmark suite to stimulate future research in CTD/CTU.
Submitted 19 March, 2023;
originally announced March 2023.
-
Revealing Robust Oil and Gas Company Macro-Strategies using Deep Multi-Agent Reinforcement Learning
Authors:
Dylan Radovic,
Lucas Kruitwagen,
Christian Schroeder de Witt,
Ben Caldecott,
Shane Tomlinson,
Mark Workman
Abstract:
The energy transition potentially poses an existential risk for major international oil companies (IOCs) if they fail to adapt to low-carbon business models. Projections of energy futures, however, are met with diverging assumptions on its scale and pace, causing disagreement among IOC decision-makers and their stakeholders over what the business model of an incumbent fossil fuel company should be. In this work, we used deep multi-agent reinforcement learning to solve an energy systems wargame wherein players simulate IOC decision-making, including hydrocarbon and low-carbon investment decisions, dividend policies, and capital structure measures, through an uncertain energy transition to explore critical and non-linear governance questions, from leveraged transitions to reserve replacements. Adversarial play facilitated by state-of-the-art algorithms revealed decision-making strategies robust to energy transition uncertainty and against multiple IOCs. In all games, robust strategies emerged in the form of low-carbon business models as a result of early transition-oriented movement. IOCs adopting such strategies outperformed business-as-usual and delayed transition strategies regardless of hydrocarbon demand projections. In addition to maximizing value, these strategies benefit greater society by contributing substantial amounts of capital necessary to accelerate the global low-carbon energy transition. Our findings point towards the need for lenders and investors to effectively mobilize transition-oriented finance and engage with IOCs to ensure responsible reallocation of capital towards low-carbon business models that would enable the emergence of fossil fuel incumbents as future low-carbon leaders.
Submitted 20 November, 2022;
originally announced November 2022.
-
Perfectly Secure Steganography Using Minimum Entropy Coupling
Authors:
Christian Schroeder de Witt,
Samuel Sokota,
J. Zico Kolter,
Jakob Foerster,
Martin Strohmeier
Abstract:
Steganography is the practice of encoding secret information into innocuous content in such a manner that an adversarial third party would not realize that there is hidden meaning. While this problem has classically been studied in security literature, recent advances in generative models have led to a shared interest among security and machine learning researchers in developing scalable steganography techniques. In this work, we show that a steganography procedure is perfectly secure under Cachin (1998)'s information-theoretic model of steganography if and only if it is induced by a coupling. Furthermore, we show that, among perfectly secure procedures, a procedure maximizes information throughput if and only if it is induced by a minimum entropy coupling. These insights yield what are, to the best of our knowledge, the first steganography algorithms to achieve perfect security guarantees for arbitrary covertext distributions. To provide empirical validation, we compare a minimum entropy coupling-based approach to three modern baselines -- arithmetic coding, Meteor, and adaptive dynamic grouping -- using GPT-2, WaveRNN, and Image Transformer as communication channels. We find that the minimum entropy coupling-based approach achieves superior encoding efficiency, despite its stronger security constraints. In aggregate, these results suggest that it may be natural to view information-theoretic steganography through the lens of minimum entropy coupling.
Submitted 30 October, 2023; v1 submitted 24 October, 2022;
originally announced October 2022.
-
Equivariant Networks for Zero-Shot Coordination
Authors:
Darius Muglich,
Christian Schroeder de Witt,
Elise van der Pol,
Shimon Whiteson,
Jakob Foerster
Abstract:
Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner. A common failure mode is symmetry breaking, when agents arbitrarily converge on one out of many equivalent but mutually incompatible policies. Commonly these examples include partial observability, e.g. waving your right hand vs. left hand to convey a covert message. In this paper, we present a novel equivariant network architecture for use in Dec-POMDPs that effectively leverages environmental symmetry for improving zero-shot coordination, doing so more effectively than prior methods. Our method also acts as a "coordination-improvement operator" for generic, pre-trained policies, and thus may be applied at test-time in conjunction with any self-play algorithm. We provide theoretical guarantees of our work and test on the AI benchmark task of Hanabi, where we demonstrate our methods outperforming other symmetry-aware baselines in zero-shot coordination, as well as able to improve the coordination ability of a variety of pre-trained policies. In particular, we show our method can be used to improve on the state of the art for zero-shot coordination on the Hanabi benchmark.
Submitted 10 April, 2024; v1 submitted 21 October, 2022;
originally announced October 2022.
-
Discovered Policy Optimisation
Authors:
Chris Lu,
Jakub Grudzien Kuba,
Alistair Letcher,
Luke Metz,
Christian Schroeder de Witt,
Jakob Foerster
Abstract:
Tremendous progress has been made in reinforcement learning (RL) over the past decade. Most of these advancements came through the continual development of new algorithms, which were designed using a combination of mathematical derivations, intuitions, and experimentation. Such an approach of creating algorithms manually is limited by human understanding and ingenuity. In contrast, meta-learning provides a toolkit for automatic machine learning method optimisation, potentially addressing this flaw. However, black-box approaches which attempt to discover RL algorithms with minimal prior structure have thus far not outperformed existing hand-crafted algorithms. Mirror Learning, which includes RL algorithms, such as PPO, offers a potential middle-ground starting point: while every method in this framework comes with theoretical guarantees, components that differentiate them are subject to design. In this paper we explore the Mirror Learning space by meta-learning a "drift" function. We refer to the immediate result as Learnt Policy Optimisation (LPO). By analysing LPO we gain original insights into policy optimisation which we use to formulate a novel, closed-form RL algorithm, Discovered Policy Optimisation (DPO). Our experiments in Brax environments confirm state-of-the-art performance of LPO and DPO, as well as their transfer to unseen settings.
Submitted 12 October, 2022; v1 submitted 11 October, 2022;
originally announced October 2022.
-
Local and Non-local Microwave Impedance of a Three-Terminal Hybrid Device
Authors:
B. Harlech-Jones,
S. J. Waddy,
J. D. S. Witt,
D. Govender,
L. Casparis,
E. Martinez,
R. Kallaher,
S. Gronin,
G. Gardner,
M. J. Manfra,
D. J. Reilly
Abstract:
We report microwave impedance measurements of a superconductor-semiconductor hybrid nanowire device with three terminals (3T). Our technique makes use of transmission line resonators to acquire the nine complex scattering matrix parameters (S-parameters) of the device on fast timescales and across a spectrum of frequencies spanning 0.3 - 7 GHz. Via comparison with dc-transport measurements, we examine the utility of this technique for probing the local and non-local response of 3T devices where capacitive and inductive contributions can play a role. Such measurements require careful interpretation but may be of use in discerning true Majorana zero modes from trivial states arising from disorder.
Submitted 25 July, 2022;
originally announced July 2022.
-
Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks
Authors:
Tim Franzmeyer,
Stephen McAleer,
João F. Henriques,
Jakob N. Foerster,
Philip H. S. Torr,
Adel Bibi,
Christian Schroeder de Witt
Abstract:
Autonomous agents deployed in the real world need to be robust against adversarial attacks on sensory inputs. Robustifying agent policies requires anticipating the strongest attacks possible. We demonstrate that existing observation-space attacks on reinforcement learning agents have a common weakness: while effective, their lack of information-theoretic detectability constraints makes them detectable using automated means or human inspection. Detectability is undesirable to adversaries as it may trigger security escalations. We introduce ε-illusory, a novel form of adversarial attack on sequential decision-makers that is both effective and of ε-bounded statistical detectability. We propose a novel dual ascent algorithm to learn such attacks end-to-end. Compared to existing attacks, we empirically find ε-illusory to be significantly harder to detect with automated methods, and a small study with human participants (IRB approval under reference R84123/RE001) suggests they are similarly harder to detect for humans. Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses. The project website can be found at https://tinyurl.com/illusory-attacks.
Submitted 6 May, 2024; v1 submitted 20 July, 2022;
originally announced July 2022.
-
Generalized Beliefs for Cooperative AI
Authors:
Darius Muglich,
Luisa Zintgraf,
Christian Schroeder de Witt,
Shimon Whiteson,
Jakob Foerster
Abstract:
Self-play is a common paradigm for constructing solutions in Markov games that can yield optimal policies in collaborative settings. However, these policies often adopt highly-specialized conventions that make playing with a novel partner difficult. To address this, recent approaches rely on encoding symmetry and convention-awareness into policy training, but these require strong environmental assumptions and can complicate policy training. We therefore propose moving the learning of conventions to the belief space. Specifically, we propose a belief learning model that can maintain beliefs over rollouts of policies not seen at training time, and can thus decode and adapt to novel conventions at test time. We show how to leverage this model for both search and training of a best response over various pools of policies to greatly improve ad-hoc teamplay. We also show how our setup promotes explainability and interpretability of nuanced agent conventions.
Submitted 25 June, 2022;
originally announced June 2022.
-
Biological Evolution and Genetic Algorithms: Exploring the Space of Abstract Tile Self-Assembly
Authors:
Christian Schroeder de Witt
Abstract:
A physically-motivated genetic algorithm (GA) and full enumeration for a tile-based model of self-assembly (JaTAM) are implemented using a graphics processing unit (GPU). We observe performance gains with respect to state-of-the-art implementations on CPU of factor 7.7 for the GA and 2.9 for JaTAM. The correctness of our GA implementation is demonstrated using a test-bed fitness function, and our JaTAM implementation is verified by classifying a well-known search space $S_{2,8}$ based on two tile types. The performance gains achieved allow for the classification of a larger search space $S^{32}_{3,8}$ based on three tile types. The prevalence of structures based on two tile types demonstrates that simple organisms emerge preferably even in complex ecosystems. The modularity of the largest structures found motivates the assumption that to first order, $S_{2,8}$ forms the building blocks of $S_{3,8}$. We conclude that GPUs may play an important role in future studies of evolutionary dynamics.
Submitted 28 May, 2022;
originally announced May 2022.
-
Model-Free Opponent Shaping
Authors:
Chris Lu,
Timon Willi,
Christian Schroeder de Witt,
Jakob Foerster
Abstract:
In general-sum games, the interaction of self-interested learning agents commonly leads to collectively worst-case outcomes, such as defect-defect in the iterated prisoner's dilemma (IPD). To overcome this, some methods, such as Learning with Opponent-Learning Awareness (LOLA), shape their opponents' learning process. However, these methods are myopic since only a small number of steps can be anticipated, are asymmetric since they treat other agents as naive learners, and require the use of higher-order derivatives, which are calculated through white-box access to an opponent's differentiable learning algorithm. To address these issues, we propose Model-Free Opponent Shaping (M-FOS). M-FOS learns in a meta-game in which each meta-step is an episode of the underlying inner game. The meta-state consists of the inner policies, and the meta-policy produces a new inner policy to be used in the next episode. M-FOS then uses generic model-free optimisation methods to learn meta-policies that accomplish long-horizon opponent shaping. Empirically, M-FOS near-optimally exploits naive learners and other, more sophisticated algorithms from the literature. For example, to the best of our knowledge, it is the first method to learn the well-known Zero-Determinant (ZD) extortion strategy in the IPD. In the same settings, M-FOS leads to socially optimal outcomes under meta-self-play. Finally, we show that M-FOS can be scaled to high-dimensional settings.
Submitted 4 November, 2022; v1 submitted 3 May, 2022;
originally announced May 2022.
-
(Private)-Retroactive Carbon Pricing [(P)ReCaP]: A Market-based Approach for Climate Finance and Risk Assessment
Authors:
Yoshua Bengio,
Prateek Gupta,
Dylan Radovic,
Maarten Scholl,
Andrew Williams,
Christian Schroeder de Witt,
Tianyu Zhang,
Yang Zhang
Abstract:
Insufficient Social Cost of Carbon (SCC) estimation methods and short-term decision-making horizons have hindered the ability of carbon emitters to properly correct for the negative externalities of climate change, as well as the capacity of nations to balance economic and climate policy. To overcome these limitations, we introduce Retrospective Social Cost of Carbon Updating (ReSCCU), a novel mechanism that corrects for these limitations as empirically measured evidence is collected. To implement ReSCCU in the context of carbon taxation, we propose Retroactive Carbon Pricing (ReCaP), a market mechanism in which polluters offload the payment of ReSCCU adjustments to insurers. To alleviate systematic risks and minimize government involvement, we introduce the Private ReCaP (PReCaP) prediction market, which could see real-world implementation based on the engagement of a few high net-worth individuals or independent institutions.
Submitted 2 May, 2022;
originally announced May 2022.
-
Mirror Learning: A Unifying Framework of Policy Optimisation
Authors:
Jakub Grudzien Kuba,
Christian Schroeder de Witt,
Jakob Foerster
Abstract:
Modern deep reinforcement learning (RL) algorithms are motivated by either the generalised policy iteration (GPI) or trust-region learning (TRL) frameworks. However, algorithms that strictly respect these theoretical frameworks have proven unscalable. Surprisingly, the only known scalable algorithms violate the GPI/TRL assumptions, e.g. due to required regularisation or other heuristics. The current explanation of their empirical success is essentially "by analogy": they are deemed approximate adaptations of theoretically sound methods. Unfortunately, studies have shown that in practice these algorithms differ greatly from their conceptual ancestors. In contrast, in this paper we introduce a novel theoretical framework, named Mirror Learning, which provides theoretical guarantees to a large class of algorithms, including TRPO and PPO. While the latter two exploit the flexibility of our framework, GPI and TRL fit in merely as pathologically restrictive corner cases thereof. This suggests that the empirical performance of state-of-the-art methods is a direct consequence of their theoretical properties, rather than of aforementioned approximate analogies. Mirror learning sets us free to boldly explore novel, theoretically sound RL algorithms, a thus far uncharted wonderland.
Submitted 14 July, 2022; v1 submitted 7 January, 2022;
originally announced January 2022.
-
Spin-Relaxation Mechanisms in InAs Quantum Well Heterostructures
Authors:
J. D. S. Witt,
S. J. Pauka,
G. C. Gardner,
S. Gronin,
T. Wang,
C. Thomas,
M. J. Manfra,
D. J. Reilly,
M. C. Cassidy
Abstract:
The spin-orbit interaction and spin-relaxation mechanisms of a shallow InAs quantum well heterostructure are investigated by magnetoconductance measurements as a function of an applied top-gate voltage. The data were fit using the Iordanskii--Lyanda-Geller--Pikus model and two distinct transport regimes were identified which correspond to the first and second sub-bands of the quantum well. The spin-orbit interaction splitting energy is extracted from the fits to the data, which also displays two distinct regimes. The different sub-band regimes exhibit different spin-scattering mechanisms, the identification of which is of relevance for device platforms of reduced dimensionality which utilise the spin-orbit interaction.
Submitted 6 December, 2021; v1 submitted 30 November, 2021;
originally announced November 2021.
-
Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS
Authors:
Christian Schroeder de Witt,
Yongchao Huang,
Philip H. S. Torr,
Martin Strohmeier
Abstract:
Cyber attacks are increasing in volume, frequency, and complexity. In response, the security community is looking toward fully automating cyber defense systems using machine learning. However, so far the resultant effects on the coevolutionary dynamics of attackers and defenders have not been examined. In this whitepaper, we hypothesise that increased automation on both sides will accelerate the coevolutionary cycle, thus begging the question of whether there are any resultant fixed points, and how they are characterised. Working within the threat model of Locked Shields, Europe's largest cyberdefense exercise, we study blackbox adversarial attacks on network classifiers. Given already existing attack capabilities, we question the utility of optimal evasion attack frameworks based on minimal evasion distances. Instead, we suggest a novel reinforcement learning setting that can be used to efficiently generate arbitrary adversarial perturbations. We then argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions, and introduce a temporally extended multi-agent reinforcement learning framework in which the resultant dynamics can be studied. We hypothesise that one plausible fixed point of AI-NIDS may be a scenario where the defense strategy relies heavily on whitelisted feature flow subspaces. Finally, we demonstrate that a continual learning approach is required to study attacker-defender dynamics in temporally extended general-sum games.
Submitted 23 November, 2021;
originally announced November 2021.
-
Extending the Time Horizon: Efficient Public Transit Routing on Arbitrary-Length Timetables
Authors:
Sascha Witt
Abstract:
We study the problem of computing all Pareto-optimal journeys in a public transit network regarding the two criteria of arrival time and number of transfers taken. In recent years, great advances have been made in making public transit network routing more scalable to larger networks. However, most approaches are silent on scalability in another dimension: Time. Experimental evaluations are often done on slices of timetables spanning a couple of days, when in reality, the planning horizon is much longer. We introduce an extension to trip-based public transit routing, proposed in [12], that allows efficient handling of arbitrarily long timetables. Our experimental evaluation shows that the resulting algorithm achieves fast queries on year-spanning timetables, and can incorporate updates such as delays or changed routes quickly even on large networks.
Submitted 28 September, 2021;
originally announced September 2021.
-
A Network Control Theory Approach to Longitudinal Symptom Dynamics in Major Depressive Disorder
Authors:
Tim Hahn,
Hamidreza Jamalabadi,
Daniel Emden,
Janik Goltermann,
Jan Ernsting,
Nils R. Winter,
Lukas Fisch,
Ramona Leenings,
Kelvin Sarink,
Vincent Holstein,
Marius Gruber,
Dominik Grotegerd,
Susanne Meinert,
Katharina Dohm,
Elisabeth J. Leehr,
Maike Richter,
Lisa Sindermann,
Verena Enneking,
Hannah Lemke,
Stephanie Witt,
Marcella Rietschel,
Katharina Brosch,
Julia-Katharina Pfarr,
Tina Meller,
Kai Gustav Ringwald
, et al. (9 additional authors not shown)
Abstract:
Background: The evolution of symptoms over time is at the heart of understanding and treating mental disorders. However, a principled, quantitative framework explaining symptom dynamics remains elusive. Here, we propose a Network Control Theory of Psychopathology allowing us to formally derive a theoretical control energy which we hypothesize quantifies resistance to future symptom improvement in Major Depressive Disorder (MDD). We test this hypothesis and investigate the relation to genetic and environmental risk as well as resilience.
Methods: We modelled longitudinal symptom-network dynamics derived from N=2,059 Beck Depression Inventory measurements acquired over a median of 134 days in a sample of N=109 patients suffering from MDD. We quantified the theoretical energy required for each patient and time-point to reach a symptom-free state given individual symptom-network topology ($E_0$) and 1) tested if $E_0$ predicts future symptom improvement and 2) whether this relationship is moderated by Polygenic Risk Scores (PRS) of mental disorders, childhood maltreatment experience, and self-reported resilience.
Outcomes: We show that $E_0$ indeed predicts symptom reduction at the next measurement and reveal that this coupling between $E_0$ and future symptom change increases with higher genetic risk and childhood maltreatment while it decreases with resilience.
Interpretation: Our study provides a mechanistic framework capable of predicting future symptom improvement based on individual symptom-network topology and clarifies the role of genetic and environmental risk as well as resilience. Our control-theoretic framework makes testable, quantitative predictions for individual therapeutic response and provides a starting-point for the theory-driven design of personalized interventions.
Funding: German Research Foundation and Interdisciplinary Centre for Clinical Research, Münster
Submitted 21 July, 2021;
originally announced July 2021.
-
Genetic, Individual, and Familial Risk Correlates of Brain Network Controllability in Major Depressive Disorder
Authors:
Tim Hahn,
Nils R. Winter,
Jan Ernsting,
Marius Gruber,
Marco J. Mauritz,
Lukas Fisch,
Ramona Leenings,
Kelvin Sarink,
Julian Blanke,
Vincent Holstein,
Daniel Emden,
Marie Beisemann,
Nils Opel,
Dominik Grotegerd,
Susanne Meinert,
Walter Heindel,
Stephanie Witt,
Marcella Rietschel,
Markus M. Nöthen,
Andreas J. Forstner,
Tilo Kircher,
Igor Nenadic,
Andreas Jansen,
Bertram Müller-Myhsok,
Till F. M. Andlauer
, et al. (5 additional authors not shown)
Abstract:
Background: A therapeutic intervention in psychiatry can be viewed as an attempt to influence the brain's large-scale, dynamic network state transitions underlying cognition and behavior. Building on connectome-based graph analysis and control theory, Network Control Theory is emerging as a powerful tool to quantify network controllability - i.e., the influence of one brain region over others regarding dynamic network state transitions. If and how network controllability is related to mental health remains elusive.
Methods: From Diffusion Tensor Imaging data, we inferred structural connectivity and calculated network controllability parameters to investigate their association with genetic and familial risk in patients diagnosed with major depressive disorder (MDD, n=692) and healthy controls (n=820).
Results: First, we establish that controllability measures differ between healthy controls and MDD patients while not varying with current symptom severity or remission status. Second, we show that controllability in MDD patients is associated with polygenic scores for MDD and psychiatric cross-disorder risk. Finally, we provide evidence that controllability varies with familial risk of MDD and bipolar disorder as well as with body mass index.
Conclusions: We show that network controllability is related to genetic, individual, and familial risk in MDD patients. We discuss how these insights into individual variation of network controllability may inform mechanistic models of treatment response prediction and personalized intervention-design in mental health.
Submitted 21 July, 2021;
originally announced July 2021.
-
Communicating via Markov Decision Processes
Authors:
Samuel Sokota,
Christian Schroeder de Witt,
Maximilian Igl,
Luisa Zintgraf,
Philip Torr,
Martin Strohmeier,
J. Zico Kolter,
Shimon Whiteson,
Jakob Foerster
Abstract:
We consider the problem of communicating exogenous information by means of Markov decision process trajectories. This setting, which we call a Markov coding game (MCG), generalizes both source coding and a large class of referential games. MCGs also isolate a problem that is important in decentralized control settings in which cheap-talk is not available -- namely, they require balancing communication with the associated cost of communicating. We contribute a theoretically grounded approach to MCGs based on maximum entropy reinforcement learning and minimum entropy coupling that we call MEME. Due to recent breakthroughs in approximation algorithms for minimum entropy coupling, MEME is not merely a theoretical algorithm, but can be applied to practical settings. Empirically, we show both that MEME is able to outperform a strong baseline on small MCGs and that MEME is able to achieve strong performance on extremely large MCGs. To the latter point, we demonstrate that MEME is able to losslessly communicate binary images via trajectories of Cartpole and Pong, while simultaneously achieving the maximal or near maximal expected returns, and that it is even capable of performing well in the presence of actuator noise.
Submitted 12 June, 2022; v1 submitted 17 July, 2021;
originally announced July 2021.
-
Josephson Junctions Via Anodization of Epitaxial Al on an InAs Heterostructure
Authors:
A. Jouan,
J. D. S. Witt,
G. C. Gardner,
C. Thomas,
T. Lindemann,
S. Gronin,
M. J. Manfra,
D. J. Reilly
Abstract:
We combine electron beam lithography and masked anodization of epitaxial aluminium to define tunnel junctions via selective oxidation, alleviating the need for wet-etch processing or direct deposition of dielectric materials. Applying this technique to define Josephson junctions in proximity induced superconducting Al-InAs heterostructures, we observe multiple Andreev reflections in transport experiments, indicative of a high quality junction. We further compare the mobility and density of Hall-bars defined via wet etching and anodization. These results may find utility in uncovering new fabrication approaches to junction-based qubit platforms.
Submitted 23 May, 2021;
originally announced May 2021.
-
A Self-Supervised Auxiliary Loss for Deep RL in Partially Observable Settings
Authors:
Eltayeb Ahmed,
Luisa Zintgraf,
Christian A. Schroeder de Witt,
Nicolas Usunier
Abstract:
In this work we explore an auxiliary loss useful for reinforcement learning in environments where strongly performing agents are required to be able to navigate a spatial environment. The proposed auxiliary loss minimizes the classification error of a neural network classifier that predicts whether or not a pair of states sampled from the agent's current episode trajectory are in order. The classifier takes as input a pair of states as well as the agent's memory. The motivation for this auxiliary loss is that there is a strong correlation between which of a pair of states is more recent in the agent's episode trajectory and which of the two states is spatially closer to the agent. Our hypothesis is that learning features to answer this question encourages the agent to learn and internalize in memory representations of states that facilitate spatial reasoning. We tested this auxiliary loss on a navigation task in a gridworld and achieved a 9.6% increase in cumulative episode reward compared to a strong baseline approach.
Submitted 17 April, 2021;
originally announced April 2021.
-
Spin-orbit Energies in Etch-Confined Superconductor-Semiconductor Nanowires
Authors:
J. D. S. Witt,
G. C. Gardner,
C. Thomas,
T. Lindemann,
S. Gronin,
M. J. Manfra,
D. J. Reilly
Abstract:
We report magneto-transport measurements of quasi-1-dimensional (1D) Al-InAs nanowires produced via etching of a hybrid superconductor-semiconductor two-dimensional electron gas (2DEG). Tunnel spectroscopy measurements above the superconducting gap provide a means of identifying the 1D sub-bands associated with the confined 1D region. Fitting the data to a model that includes the different components of the spin-orbit interaction (SOI) reveals their strength, of interest for evaluating the suitability of superconductor-semiconductor 2DEG for realizing Majorana qubits.
Submitted 24 November, 2021; v1 submitted 15 March, 2021;
originally announced March 2021.
-
RainBench: Towards Global Precipitation Forecasting from Satellite Imagery
Authors:
Christian Schroeder de Witt,
Catherine Tong,
Valentina Zantedeschi,
Daniele De Martini,
Freddie Kalaitzis,
Matthew Chantry,
Duncan Watson-Parris,
Piotr Bilinski
Abstract:
Extreme precipitation events, such as violent rainfall and hail storms, routinely ravage economies and livelihoods around the developing world. Climate change further aggravates this issue. Data-driven deep learning approaches could widen the access to accurate multi-day forecasts, to mitigate against such events. However, there is currently no benchmark dataset dedicated to the study of global precipitation forecasts. In this paper, we introduce \textbf{RainBench}, a new multi-modal benchmark dataset for data-driven precipitation forecasting. It includes simulated satellite data, a selection of relevant meteorological data from the ERA5 reanalysis product, and IMERG precipitation data. We also release \textbf{PyRain}, a library to process large precipitation datasets efficiently. We present an extensive analysis of our novel dataset and establish baseline results for two benchmark medium-range precipitation forecasting tasks. Finally, we discuss existing data-driven weather forecasting methodologies and suggest future research avenues.
Submitted 17 December, 2020;
originally announced December 2020.
-
Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?
Authors:
Christian Schroeder de Witt,
Tarun Gupta,
Denys Makoviichuk,
Viktor Makoviychuk,
Philip H. S. Torr,
Mingfei Sun,
Shimon Whiteson
Abstract:
Most recently developed approaches to cooperative multi-agent reinforcement learning in the \emph{centralized training with decentralized execution} setting involve estimating a centralized, joint value function. In this paper, we demonstrate that, despite its various theoretical shortcomings, Independent PPO (IPPO), a form of independent learning in which each agent simply estimates its local value function, can perform just as well as or better than state-of-the-art joint learning approaches on the popular multi-agent benchmark suite SMAC with little hyperparameter tuning. We also compare IPPO to several variants; the results suggest that IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity.
Submitted 18 November, 2020;
originally announced November 2020.
-
Engineering In-place (Shared-memory) Sorting Algorithms
Authors:
Michael Axtmann,
Sascha Witt,
Daniel Ferizovic,
Peter Sanders
Abstract:
We present sorting algorithms that represent the fastest known techniques for a wide range of input sizes, input distributions, data types, and machines. A part of the speed advantage is due to the feature to work in-place. Previously, the in-place feature often implied performance penalties. Our main algorithmic contribution is a blockwise approach to in-place data distribution that is provably cache-efficient. We also parallelize this approach taking dynamic load balancing and memory locality into account. Our comparison-based algorithm, In-place Superscalar Samplesort (IPS$^4$o), combines this technique with branchless decision trees. By taking cases with many equal elements into account and by adapting the distribution degree dynamically, we obtain a highly robust algorithm that outperforms the best in-place parallel comparison-based competitor by almost a factor of three. IPS$^4$o also outperforms the best comparison-based competitors in the in-place or not in-place, parallel or sequential settings. IPS$^4$o even outperforms the best integer sorting algorithms in a wide range of situations. In many of the remaining cases (often involving near-uniform input distributions, small keys, or a sequential setting), our new in-place radix sorter turns out to be the best algorithm. Claims to have the, in some sense, "best" sorting algorithm can be found in many papers which cannot all be true. Therefore, we base our conclusions on extensive experiments involving a large part of the cross product of 21 state-of-the-art sorting codes, 6 data types, 10 input distributions, 4 machines, 4 memory allocation strategies, and input sizes varying over 7 orders of magnitude. This confirms the robust performance of our algorithms while revealing major performance problems in many competitors outside the concrete set of measurements reported in the associated publications.
Submitted 3 February, 2021; v1 submitted 28 September, 2020;
originally announced September 2020.
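The branchless decision trees mentioned in the abstract above can be illustrated with a small sketch. It assumes the usual samplesort layout, in which sorted splitters are stored as an implicit binary tree and an element's bucket is found by index arithmetic instead of an if/else per level. The function names `build_splitter_tree` and `classify` are hypothetical and are not taken from the IPS$^4$o code; Python cannot show the hardware-level benefit, so only the indexing scheme is shown.

```python
from bisect import bisect_left


def build_splitter_tree(splitters):
    """Store sorted splitters as an implicit binary search tree (1-indexed)."""
    k = len(splitters) + 1                 # number of buckets
    assert k >= 2 and k & (k - 1) == 0     # sketch assumes k is a power of two
    tree = [None] * k                      # tree[0] stays unused

    def fill(node, lo, hi):                # half-open splitter range [lo, hi)
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        tree[node] = splitters[mid]
        fill(2 * node, lo, mid)
        fill(2 * node + 1, mid + 1, hi)

    fill(1, 0, len(splitters))
    return tree


def classify(key, tree):
    """Return the bucket index of key using comparison results as index bits."""
    k = len(tree)
    node = 1
    for _ in range(k.bit_length() - 1):    # log2(k) levels
        # The comparison result (False/True = 0/1) selects the child directly.
        node = 2 * node + (key > tree[node])
    return node - k


splitters = [10, 20, 30]
tree = build_splitter_tree(splitters)
buckets = [classify(x, tree) for x in (5, 10, 11, 25, 31)]
```

In a compiled language the update `node = 2 * node + (key > tree[node])` can be translated to conditional-move or arithmetic instructions, which is what avoids branch mispredictions; the bucket returned equals the number of splitters strictly smaller than the key.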
-
Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning
Authors:
Shariq Iqbal,
Christian A. Schroeder de Witt,
Bei Peng,
Wendelin Böhmer,
Shimon Whiteson,
Fei Sha
Abstract:
Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities; however, common patterns of behavior often emerge among these agents/entities. Our method aims to leverage these commonalities by asking the question: ``What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?'' By posing this counterfactual question, we can recognize state-action trajectories within sub-groups of entities that we may have encountered in another task and use what we learned in that task to inform our prediction in the current one. We then reconstruct a prediction of the full returns as a combination of factors considering these disjoint groups of entities and train this ``randomly factorized'' value function as an auxiliary objective for value-based multi-agent reinforcement learning. By doing so, our model can recognize and leverage similarities across tasks to improve learning efficiency in a multi-task setting. Our approach, Randomized Entity-wise Factorization for Imagined Learning (REFIL), outperforms all strong baselines by a significant margin in challenging multi-task StarCraft micromanagement settings.
Submitted 11 June, 2021; v1 submitted 7 June, 2020;
originally announced June 2020.
-
Simulation-Based Inference for Global Health Decisions
Authors:
Christian Schroeder de Witt,
Bradley Gram-Hansen,
Nantas Nardelli,
Andrew Gambardella,
Rob Zinkov,
Puneet Dokania,
N. Siddharth,
Ana Belen Espinosa-Gonzalez,
Ara Darzi,
Philip Torr,
Atılım Güneş Baydin
Abstract:
The COVID-19 pandemic has highlighted the importance of in-silico epidemiological modelling in predicting the dynamics of infectious diseases to inform health policy and decision makers about suitable prevention and containment strategies. Work in this setting involves solving challenging inference and control problems in individual-based models of ever increasing complexity. Here we discuss recent breakthroughs in machine learning, specifically in simulation-based inference, and explore its potential as a novel venue for model calibration to support the design and evaluation of public health interventions. To further stimulate research, we are developing software interfaces that turn two cornerstone COVID-19 and malaria epidemiology models, COVID-sim (https://github.com/mrc-ide/covid-sim/) and OpenMalaria (https://github.com/SwissTPH/openmalaria), into probabilistic programs, enabling efficient interpretable Bayesian inference within those simulators.
Submitted 14 May, 2020;
originally announced May 2020.
-
Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
Authors:
Tabish Rashid,
Mikayel Samvelyan,
Christian Schroeder de Witt,
Gregory Farquhar,
Jakob Foerster,
Shimon Whiteson
Abstract:
In many real-world settings, a team of agents must coordinate its behaviour while acting in a decentralised fashion. At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a mixing network that estimates joint action-values as a monotonic combination of per-agent values. We structurally enforce that the joint-action value is monotonic in the per-agent values, through the use of non-negative weights in the mixing network, which guarantees consistency between the centralised and decentralised policies. To evaluate the performance of QMIX, we propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning. We evaluate QMIX on a challenging set of SMAC scenarios and show that it significantly outperforms existing multi-agent reinforcement learning methods.
Submitted 27 August, 2020; v1 submitted 19 March, 2020;
originally announced March 2020.
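The monotonicity constraint described in the abstract above can be sketched in a few lines. This is a deliberately simplified illustration, assuming a single linear mixing layer with fixed weights; the abstract describes a mixing network, and the names `mix`, `greedy_decentralised`, and `greedy_centralised` are hypothetical. Taking the absolute value of each weight keeps the joint value monotonic in every per-agent value, so each agent's own greedy action also maximises the joint value.

```python
from itertools import product


def mix(agent_qs, weights, bias):
    """Q_tot = sum_i |w_i| * Q_i + bias; abs() makes every weight non-negative."""
    return sum(abs(w) * q for w, q in zip(weights, agent_qs)) + bias


def greedy_decentralised(per_agent_q):
    """Each agent picks its own argmax without looking at the others."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


def greedy_centralised(per_agent_q, weights, bias):
    """Brute-force argmax of the mixed value over all joint actions."""
    joint_actions = product(*(range(len(q)) for q in per_agent_q))
    return max(
        joint_actions,
        key=lambda a: mix([q[i] for q, i in zip(per_agent_q, a)], weights, bias),
    )


# Two agents with three and two actions; the raw weights may be negative.
per_agent_q = [[1.0, 3.0, 2.0], [0.5, -1.0]]
weights, bias = [-2.0, 0.7], 0.1

local_choice = greedy_decentralised(per_agent_q)
joint_choice = greedy_centralised(per_agent_q, weights, bias)
```

With non-zero weights and unique per-agent maxima, as in this example, the two choices coincide, which is the consistency between centralised and decentralised policies that the abstract refers to. The brute-force search is included only to check that claim on a toy case.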
-
FACMAC: Factored Multi-Agent Centralised Policy Gradients
Authors:
Bei Peng,
Tabish Rashid,
Christian A. Schroeder de Witt,
Pierre-Alexandre Kamienny,
Philip H. S. Torr,
Wendelin Böhmer,
Shimon Whiteson
Abstract:
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. Like MADDPG, a popular multi-agent actor-critic method, our approach uses deep deterministic policy gradients to learn policies. However, FACMAC learns a centralised but factored critic, which combines per-agent utilities into the joint action-value function via a non-linear monotonic function, as in QMIX, a popular multi-agent Q-learning algorithm. However, unlike QMIX, there are no inherent constraints on factoring the critic. We thus also employ a nonmonotonic factorisation and empirically demonstrate that its increased representational capacity allows it to solve some tasks that cannot be solved with monolithic, or monotonically factored critics. In addition, FACMAC uses a centralised policy gradient estimator that optimises over the entire joint action space, rather than optimising over each agent's action space separately as in MADDPG. This allows for more coordinated policy changes and fully reaps the benefits of a centralised critic. We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks. Empirical results demonstrate FACMAC's superior performance over MADDPG and other baselines on all three domains.
Submitted 7 May, 2021; v1 submitted 14 March, 2020;
originally announced March 2020.
-
Amortized Rejection Sampling in Universal Probabilistic Programming
Authors:
Saeid Naderiparizi,
Adam Ścibior,
Andreas Munk,
Mehrdad Ghadiri,
Atılım Güneş Baydin,
Bradley Gram-Hansen,
Christian Schroeder de Witt,
Robert Zinkov,
Philip H. S. Torr,
Tom Rainforth,
Yee Whye Teh,
Frank Wood
Abstract:
Naive approaches to amortized inference in probabilistic programs with unbounded loops can produce estimators with infinite variance. This is particularly true of importance sampling inference in programs that explicitly include rejection sampling as part of the user-programmed generative procedure. In this paper we develop a new and efficient amortized importance sampling estimator. We prove finite variance of our estimator and empirically demonstrate our method's correctness and efficiency compared to existing alternatives on generative programs containing rejection sampling loops and discuss how to implement our method in a generic probabilistic programming framework.
Submitted 28 March, 2022; v1 submitted 20 October, 2019;
originally announced October 2019.
-
Repairing the Surface of InAs-based Topological Heterostructures
Authors:
S. J. Pauka,
J. D. S. Witt,
C. N. Allen,
B. Harlech-Jones,
A. Jouan,
G. C. Gardner,
S. Gronin,
T. Wang,
C. Thomas,
M. J. Manfra,
D. J. Reilly,
M. C. Cassidy
Abstract:
Candidate systems for topologically-protected qubits include two-dimensional electron gases (2DEGs) based on heterostructures exhibiting a strong spin-orbit interaction (SOI) and superconductivity via the proximity effect. For InAs- or InSb-based materials, the need to form shallow quantum wells to create a hard-gapped $p$-wave superconducting state often subjects them to fabrication-induced damage, limiting their mobility. Here we examine scattering mechanisms in processed InAs 2DEG quantum wells and demonstrate a means of increasing their mobility via repairing the semiconductor-dielectric interface. Passivation of charged impurity states with an argon-hydrogen plasma results in a significant increase in the measured mobility and reduction in its variance relative to untreated samples, up to 45300 cm$^2$/(V s) in a 10 nm deep quantum well.
Submitted 23 August, 2019;
originally announced August 2019.
-
Hijacking Malaria Simulators with Probabilistic Programming
Authors:
Bradley Gram-Hansen,
Christian Schröder de Witt,
Tom Rainforth,
Philip H. S. Torr,
Yee Whye Teh,
Atılım Güneş Baydin
Abstract:
Epidemiology simulations have become a fundamental tool in the fight against the epidemics of various infectious diseases like AIDS and malaria. However, the complicated and stochastic nature of these simulators can mean their output is difficult to interpret, which reduces their usefulness to policymakers. In this paper, we introduce an approach that allows one to treat a large class of population-based epidemiology simulators as probabilistic generative models. This is achieved by hijacking the internal random number generator calls, through the use of a universal probabilistic programming system (PPS). In contrast to other methods, our approach can be easily retrofitted to simulators written in popular industrial programming frameworks. We demonstrate that our method can be used for interpretable introspection and inference, thus shedding light on black-box simulators. This reinstates much-needed trust between policymakers and evidence-based methods.
Submitted 29 May, 2019;
originally announced May 2019.
-
Stratospheric Aerosol Injection as a Deep Reinforcement Learning Problem
Authors:
Christian Schroeder de Witt,
Thomas Hornigold
Abstract:
As global greenhouse gas emissions continue to rise, the use of stratospheric aerosol injection (SAI), a form of solar geoengineering, is increasingly considered in order to artificially mitigate climate change effects. However, initial research in simulation suggests that naive SAI can have catastrophic regional consequences, which may induce serious geostrategic conflicts. Current geo-engineering research treats SAI control in low-dimensional approximation only. We suggest treating SAI as a high-dimensional control problem, with policies trained according to a context-sensitive reward function within the Deep Reinforcement Learning (DRL) paradigm. In order to facilitate training in simulation, we suggest emulating HadCM3, a widely used General Circulation Model, using deep learning techniques. We believe this is the first application of DRL to the climate sciences.
Submitted 17 May, 2019;
originally announced May 2019.
-
The StarCraft Multi-Agent Challenge
Authors:
Mikayel Samvelyan,
Tabish Rashid,
Christian Schroeder de Witt,
Gregory Farquhar,
Nantas Nardelli,
Tim G. J. Rudner,
Chia-Man Hung,
Philip H. S. Torr,
Jakob Foerster,
Shimon Whiteson
Abstract:
In the last few years, deep multi-agent reinforcement learning (RL) has become a highly active area of research. A particularly challenging class of problems in this area is partially observable, cooperative, multi-agent learning, in which teams of agents must learn to coordinate their behaviour while conditioning only on their private observations. This is an attractive research area since such problems are relevant to a large number of real-world systems and are also more amenable to evaluation than general-sum problems. Standardised environments such as the ALE and MuJoCo have allowed single-agent RL to move beyond toy domains, such as grid worlds. However, there is no comparable benchmark for cooperative multi-agent RL. As a result, most papers in this field use one-off toy problems, making it difficult to measure real progress. In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap. SMAC is based on the popular real-time strategy game StarCraft II and focuses on micromanagement challenges where each unit is controlled by an independent agent that must act based on local observations. We offer a diverse set of challenge maps and recommendations for best practices in benchmarking and evaluations. We also open-source a deep multi-agent RL learning framework including state-of-the-art algorithms. We believe that SMAC can provide a standard benchmark environment for years to come. Videos of our best agents for several SMAC scenarios are available at: https://youtu.be/VZ7zmQ_obZ0.
Submitted 9 December, 2019; v1 submitted 11 February, 2019;
originally announced February 2019.
-
Multi-Agent Common Knowledge Reinforcement Learning
Authors:
Christian A. Schroeder de Witt,
Jakob N. Foerster,
Gregory Farquhar,
Philip H. S. Torr,
Wendelin Boehmer,
Shimon Whiteson
Abstract:
Cooperative multi-agent reinforcement learning often requires decentralised policies, which severely limit the agents' ability to coordinate their behaviour. In this paper, we show that common knowledge between agents allows for complex decentralised coordination. Common knowledge arises naturally in a large number of decentralised cooperative multi-agent tasks, for example, when agents can reconstruct parts of each other's observations. Since agents can independently agree on their common knowledge, they can execute complex coordinated policies that condition on this knowledge in a fully decentralised fashion. We propose multi-agent common knowledge reinforcement learning (MACKRL), a novel stochastic actor-critic algorithm that learns a hierarchical policy tree. Higher levels in the hierarchy coordinate groups of agents by conditioning on their common knowledge, or delegate to lower levels with smaller subgroups but potentially richer common knowledge. The entire policy tree can be executed in a fully decentralised fashion. As the lowest policy tree level consists of independent policies for each agent, MACKRL reduces to independently learnt decentralised policies as a special case. We demonstrate that our method can exploit common knowledge for superior performance on complex decentralised coordination tasks, including a stochastic matrix game and challenging problems in StarCraft II unit micromanagement.
Submitted 11 January, 2020; v1 submitted 27 October, 2018;
originally announced October 2018.
-
The SAGE Project: a Storage Centric Approach for Exascale Computing
Authors:
Sai Narasimhamurthy,
Nikita Danilov,
Sining Wu,
Ganesan Umanesan,
Steven Wei-der Chien,
Sergio Rivas-Gomez,
Ivy Bo Peng,
Erwin Laure,
Shaun de Witt,
Dirk Pleiter,
Stefano Markidis
Abstract:
SAGE (Percipient StorAGe for Exascale Data Centric Computing) is a European Commission funded project towards the era of Exascale computing. Its goal is to design and implement a Big Data/Extreme Computing (BDEC) capable infrastructure with associated software stack. The SAGE system follows a "storage centric" approach as it is capable of storing and processing large data volumes at the Exascale regime.
SAGE addresses the convergence of Big Data Analysis and HPC in an era of next-generation data centric computing. This convergence is driven by the proliferation of massive data sources, such as large, dispersed scientific instruments and sensors where data needs to be processed, analyzed and integrated into simulations to derive scientific and innovative insights. A first prototype of the SAGE system has been implemented and installed at the Julich Supercomputing Center. The SAGE storage system consists of multiple types of storage device technologies in a multi-tier I/O hierarchy, including flash, disk, and non-volatile memory technologies. The main SAGE software component is the Seagate Mero Object Storage that is accessible via the Clovis API and higher level interfaces. The SAGE project also includes scientific applications for the validation of the SAGE concepts.
The objective of this paper is to present the SAGE project concepts, the prototype of the SAGE platform and discuss the software architecture of the SAGE system.
Submitted 6 July, 2018;
originally announced July 2018.
-
SAGE: Percipient Storage for Exascale Data Centric Computing
Authors:
Sai Narasimhamurthy,
Nikita Danilov,
Sining Wu,
Ganesan Umanesan,
Stefano Markidis,
Sergio Rivas-Gomez,
Ivy Bo Peng,
Erwin Laure,
Dirk Pleiter,
Shaun de Witt
Abstract:
We aim to implement a Big Data/Extreme Computing (BDEC) capable system infrastructure as we head towards the era of Exascale computing - termed SAGE (Percipient StorAGe for Exascale Data Centric Computing). The SAGE system will be capable of storing and processing immense volumes of data at the Exascale regime, and provide the capability for Exascale class applications to use such a storage infrastructure. SAGE addresses the increasing overlaps between Big Data Analysis and HPC in an era of next-generation data centric computing that has developed due to the proliferation of massive data sources, such as large, dispersed scientific instruments and sensors, whose data needs to be processed, analyzed and integrated into simulations to derive scientific and innovative insights. Indeed, Exascale I/O, as a problem that has not been sufficiently dealt with for simulation codes, is appropriately addressed by the SAGE platform. The objective of this paper is to discuss the software architecture of the SAGE system and look at early results we have obtained employing some of its key methodologies, as the system continues to evolve.
Submitted 1 May, 2018;
originally announced May 2018.
-
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
Authors:
Tabish Rashid,
Mikayel Samvelyan,
Christian Schroeder de Witt,
Gregory Farquhar,
Jakob Foerster,
Shimon Whiteson
Abstract:
In many real-world settings, a team of agents must coordinate their behaviour while acting in a decentralised way. At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations. We structurally enforce that the joint-action value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning, and guarantees consistency between the centralised and decentralised policies. We evaluate QMIX on a challenging set of StarCraft II micromanagement tasks, and show that QMIX significantly outperforms existing value-based multi-agent reinforcement learning methods.
Submitted 6 June, 2018; v1 submitted 30 March, 2018;
originally announced March 2018.
-
In-place Parallel Super Scalar Samplesort (IPS$^4$o)
Authors:
Michael Axtmann,
Sascha Witt,
Daniel Ferizovic,
Peter Sanders
Abstract:
We present a sorting algorithm that works in-place, executes in parallel, is cache-efficient, avoids branch-mispredictions, and performs work O(n log n) for arbitrary inputs with high probability. The main algorithmic contributions are new ways to make distribution-based algorithms in-place: On the practical side, by using coarse-grained block-based permutations, and on the theoretical side, we show how to eliminate the recursion stack. Extensive experiments show that our algorithm IPS$^4$o scales well on a variety of multi-core machines. We outperform our closest in-place competitor by a factor of up to 3. Even as a sequential algorithm, we are up to 1.5 times faster than the closest sequential competitor, BlockQuicksort.
Submitted 29 June, 2017; v1 submitted 5 May, 2017;
originally announced May 2017.
-
Control of superconductivity with a single ferromagnetic layer in niobium/erbium bilayers
Authors:
N. Satchell,
J. D. S. Witt,
M. G. Flokstra,
S. L. Lee,
J. F. K. Cooper,
C. J. Kinane,
S. Langridge,
G. Burnell
Abstract:
Superconducting spintronics in hybrid superconductor/ferromagnet (S-F) heterostructures provides an exciting potential new class of device. The prototypical super-spintronic device is the superconducting spin-valve, where the critical temperature, $T_c$, of the S-layer can be controlled by the relative orientation of two (or more) F-layers. Here, we show that such control is also possible in a simple S/F bilayer. Using field history to set the remanent magnetic state of a thin Er layer, we demonstrate for a Nb/Er bilayer a high level of control of both $T_c$ and the shape of the resistive transition, R(T), to zero resistance. We are able to model the origin of the remanent magnetization, treating it as an increase in the effective exchange field of the ferromagnet and link this, using conventional S-F theory, to the suppression of $T_c$. We observe stepped features in the R(T) which we argue are due to a fundamental interaction of superconductivity with inhomogeneous ferromagnetism, a phenomenon currently lacking theoretical description.
Submitted 9 April, 2017; v1 submitted 27 January, 2017;
originally announced January 2017.
-
Trip-Based Public Transit Routing Using Condensed Search Trees
Authors:
Sascha Witt
Abstract:
We study the problem of planning Pareto-optimal journeys in public transit networks. Most existing algorithms and speed-up techniques work by computing subjourneys to intermediary stops until the destination is reached. In contrast, the trip-based model focuses on trips and transfers between them, constructing journeys as a sequence of trips. In this paper, we develop a speed-up technique for this model inspired by principles behind existing state-of-the-art speed-up techniques, Transfer Pattern and Hub Labelling. The resulting algorithm allows us to compute Pareto-optimal (with respect to arrival time and number of transfers) 24-hour profiles on very large real-world networks in less than half a millisecond. Compared to the current state of the art for bicriteria queries on public transit networks, this is up to two orders of magnitude faster, while increasing preprocessing overhead by at most one order of magnitude.
Submitted 15 September, 2016; v1 submitted 5 July, 2016;
originally announced July 2016.
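The notion of Pareto-optimality with respect to arrival time and number of transfers, used in the abstract above, can be made concrete with a short sketch. It shows only the dominance criterion, not the trip-based algorithm or the condensed search trees of the paper; the function name `pareto_front` and the sample journeys are hypothetical. A journey is kept when no other journey arrives at least as early with at most as many transfers.

```python
def pareto_front(journeys):
    """Return the journeys not dominated in (arrival_time, transfers).

    Each journey is a tuple (arrival_time_in_minutes, transfers);
    lower is better for both criteria.
    """
    front = []
    best_transfers = float("inf")
    # After sorting by arrival time, a journey is non-dominated exactly when
    # it needs strictly fewer transfers than every journey arriving no later.
    for arrival, transfers in sorted(set(journeys)):
        if transfers < best_transfers:
            front.append((arrival, transfers))
            best_transfers = transfers
    return front


candidates = [(600, 2), (610, 1), (605, 2), (630, 0), (640, 1), (600, 3)]
front = pareto_front(candidates)
```

Here the front contains one journey per useful trade-off: the earliest arrival, then progressively later arrivals that each save at least one transfer.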