-
IDs for AI Systems
Authors:
Alan Chan,
Noam Kolt,
Peter Wills,
Usman Anwar,
Christian Schroeder de Witt,
Nitarshan Rajkumar,
Lewis Hammond,
David Krueger,
Lennart Heim,
Markus Anderljung
Abstract:
AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. A user may not be able to verify whether a system satisfies certain safety standards. An investigator may not know whom to investigate when a system causes an incident. A platform may find it difficult to penalize repeated negative interactions with the same s…
▽ More
AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. A user may not be able to verify whether a system satisfies certain safety standards. An investigator may not know whom to investigate when a system causes an incident. A platform may find it difficult to penalize repeated negative interactions with the same system. Across a number of domains, IDs address analogous problems by identifying \textit{particular} entities (e.g., a particular Boeing 747) and providing information about other entities of the same class (e.g., some or all Boeing 747s). We propose a framework in which IDs are ascribed to \textbf{instances} of AI systems (e.g., a particular chat session with Claude 3), and associated information is accessible to parties seeking to interact with that system. We characterize IDs for AI systems, argue that there could be significant demand for IDs from key actors, analyze how those actors could incentivize ID adoption, explore potential implementations of our framework, and highlight limitations and risks. IDs seem most warranted in high-stakes settings, where certain actors (e.g., those that enable AI systems to make financial transactions) could experiment with incentives for ID use. Deployers of AI systems could experiment with develo** ID implementations. With further study, IDs could help to manage a world where AI systems pervade society.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Sliding Window 3-Objective Pareto Optimization for Problems with Chance Constraints
Authors:
Frank Neumann,
Carsten Witt
Abstract:
Constrained single-objective problems have been frequently tackled by evolutionary multi-objective algorithms where the constraint is relaxed into an additional objective. Recently, it has been shown that Pareto optimization approaches using bi-objective models can be significantly sped up using sliding windows (Neumann and Witt, ECAI 2023). In this paper, we extend the sliding window approach to…
▽ More
Constrained single-objective problems have been frequently tackled by evolutionary multi-objective algorithms where the constraint is relaxed into an additional objective. Recently, it has been shown that Pareto optimization approaches using bi-objective models can be significantly sped up using sliding windows (Neumann and Witt, ECAI 2023). In this paper, we extend the sliding window approach to $3$-objective formulations for tackling chance constrained problems. On the theoretical side, we show that our new sliding window approach improves previous runtime bounds obtained in (Neumann and Witt, GECCO 2023) while maintaining the same approximation guarantees. Our experimental investigations for the chance constrained dominating set problem show that our new sliding window approach allows one to solve much larger instances in a much more efficient way than the 3-objective approach presented in (Neumann and Witt, GECCO 2023).
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits
Authors:
Andis Draguns,
Andrew Gritsevskiy,
Sumeet Ramesh Motwani,
Charlie Rogers-Smith,
Jeffrey Ladish,
Christian Schroeder de Witt
Abstract:
The rapid proliferation of open-source language models significantly increases the risks of downstream backdoor attacks. These backdoors can introduce dangerous behaviours during model deployment and can evade detection by conventional cybersecurity monitoring systems. In this paper, we introduce a novel class of backdoors in autoregressive transformer models, that, in contrast to prior art, are u…
▽ More
The rapid proliferation of open-source language models significantly increases the risks of downstream backdoor attacks. These backdoors can introduce dangerous behaviours during model deployment and can evade detection by conventional cybersecurity monitoring systems. In this paper, we introduce a novel class of backdoors in autoregressive transformer models, that, in contrast to prior art, are unelicitable in nature. Unelicitability prevents the defender from triggering the backdoor, making it impossible to evaluate or detect ahead of deployment even if given full white-box access and using automated techniques, such as red-teaming or certain formal verification methods. We show that our novel construction is not only unelicitable thanks to using cryptographic techniques, but also has favourable robustness properties. We confirm these properties in empirical investigations, and provide evidence that our backdoors can withstand state-of-the-art mitigation strategies. Additionally, we expand on previous work by showing that our universal backdoors, while not completely undetectable in white-box settings, can be harder to detect than some existing designs. By demonstrating the feasibility of seamlessly integrating backdoors into transformer models, this paper fundamentally questions the efficacy of pre-deployment detection strategies. This offers new insights into the offence-defence balance in AI safety and security.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Computing Low-Entropy Couplings for Large-Support Distributions
Authors:
Samuel Sokota,
Dylan Sam,
Christian Schroeder de Witt,
Spencer Compton,
Jakob Foerster,
J. Zico Kolter
Abstract:
Minimum-entropy coupling (MEC) -- the process of finding a joint distribution with minimum entropy for given marginals -- has applications in areas such as causality and steganography. However, existing algorithms are either computationally intractable for large-support distributions or limited to specific distribution types and sensitive to hyperparameter choices. This work addresses these limita…
▽ More
Minimum-entropy coupling (MEC) -- the process of finding a joint distribution with minimum entropy for given marginals -- has applications in areas such as causality and steganography. However, existing algorithms are either computationally intractable for large-support distributions or limited to specific distribution types and sensitive to hyperparameter choices. This work addresses these limitations by unifying a prior family of iterative MEC (IMEC) approaches into a generalized partition-based formalism. From this framework, we derive a novel IMEC algorithm called ARIMEC, capable of handling arbitrary discrete distributions, and introduce a method to make IMEC robust to suboptimal hyperparameter settings. These innovations facilitate the application of IMEC to high-throughput steganography with language models, among other settings. Our codebase is available at https://github.com/ssokota/mec .
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Authors:
Francisco Eiras,
Aleksandar Petrov,
Bertie Vidgen,
Christian Schroeder de Witt,
Fabio Pizzati,
Katherine Elkins,
Supratik Mukhopadhyay,
Adel Bibi,
Botos Csaba,
Fabro Steibel,
Fazl Barez,
Genevieve Smith,
Gianluca Guadagni,
Jon Chun,
Jordi Cabot,
Joseph Marvin Imperial,
Juan A. Nolazco-Flores,
Lori Landay,
Matthew Jackson,
Paul Röttger,
Philip H. S. Torr,
Trevor Darrell,
Yong Suk Lee,
Jakob Foerster
Abstract:
In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation i…
▽ More
In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source Generative AI. We argue for the responsible open sourcing of generative AI models in the near and medium term. To set the stage, we first introduce an AI openness taxonomy system and apply it to 40 current large language models. We then outline differential benefits and risks of open versus closed source AI and present potential risk mitigation, ranging from best practices to calls for technical and scientific contributions. We hope that this report will add a much needed missing voice to the current public discourse on near to mid-term AI safety and other societal impact.
△ Less
Submitted 24 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Runtime Analysis of a Multi-Valued Compact Genetic Algorithm on Generalized OneMax
Authors:
Sumit Adak,
Carsten Witt
Abstract:
A class of metaheuristic techniques called estimation-of-distribution algorithms (EDAs) are employed in optimization as more sophisticated substitutes for traditional strategies like evolutionary algorithms. EDAs generally drive the search for the optimum by creating explicit probabilistic models of potential candidate solutions through repeated sampling and selection from the underlying search sp…
▽ More
A class of metaheuristic techniques called estimation-of-distribution algorithms (EDAs) are employed in optimization as more sophisticated substitutes for traditional strategies like evolutionary algorithms. EDAs generally drive the search for the optimum by creating explicit probabilistic models of potential candidate solutions through repeated sampling and selection from the underlying search space.
Most theoretical research on EDAs has focused on pseudo-Boolean optimization. Jedidia et al. (GECCO 2023) proposed the first EDAs for optimizing problems involving multi-valued decision variables. By building a framework, they have analyzed the runtime of a multi-valued UMDA on the r-valued LeadingOnes function. Using their framework, here we focus on the multi-valued compact genetic algorithm (r-cGA) and provide a first runtime analysis of a generalized OneMax function.
To prove our results, we investigate the effect of genetic drift and progress of the probabilistic model towards the optimum. After finding the right algorithm parameters, we prove that the r-cGA solves this r-valued OneMax problem efficiently. We show that with high probability, the runtime bound is O(r2 n log2 r log3 n). At the end of experiments, we state one conjecture related to the expected runtime of another variant of multi-valued OneMax function.
△ Less
Submitted 17 April, 2024;
originally announced April 2024.
-
Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection
Authors:
Linas Nasvytis,
Kai Sandbrink,
Jakob Foerster,
Tim Franzmeyer,
Christian Schroeder de Witt
Abstract:
While reinforcement learning (RL) algorithms have been successfully applied across numerous sequential decision-making problems, their generalization to unforeseen testing environments remains a significant concern. In this paper, we study the problem of out-of-distribution (OOD) detection in RL, which focuses on identifying situations at test time that RL agents have not encountered in their trai…
▽ More
While reinforcement learning (RL) algorithms have been successfully applied across numerous sequential decision-making problems, their generalization to unforeseen testing environments remains a significant concern. In this paper, we study the problem of out-of-distribution (OOD) detection in RL, which focuses on identifying situations at test time that RL agents have not encountered in their training environments. We first propose a clarification of terminology for OOD detection in RL, which aligns it with the literature from other machine learning domains. We then present new benchmark scenarios for OOD detection, which introduce anomalies with temporal autocorrelation into different components of the agent-environment loop. We argue that such scenarios have been understudied in the current literature, despite their relevance to real-world situations. Confirming our theoretical predictions, our experimental results suggest that state-of-the-art OOD detectors are not able to identify such anomalies. To address this problem, we propose a novel method for OOD detection, which we call DEXTER (Detection via Extraction of Time Series Representations). By treating environment observations as time series data, DEXTER extracts salient time series features, and then leverages an ensemble of isolation forest algorithms to detect anomalies. We find that DEXTER can reliably identify anomalies across benchmark scenarios, exhibiting superior performance compared to both state-of-the-art OOD detectors and high-dimensional changepoint detectors adopted from statistics.
△ Less
Submitted 10 April, 2024;
originally announced April 2024.
-
A Flexible Evolutionary Algorithm With Dynamic Mutation Rate Archive
Authors:
Martin S. Krejca,
Carsten Witt
Abstract:
We propose a new, flexible approach for dynamically maintaining successful mutation rates in evolutionary algorithms using $k$-bit flip mutations. The algorithm adds successful mutation rates to an archive of promising rates that are favored in subsequent steps. Rates expire when their number of unsuccessful trials has exceeded a threshold, while rates currently not present in the archive can ente…
▽ More
We propose a new, flexible approach for dynamically maintaining successful mutation rates in evolutionary algorithms using $k$-bit flip mutations. The algorithm adds successful mutation rates to an archive of promising rates that are favored in subsequent steps. Rates expire when their number of unsuccessful trials has exceeded a threshold, while rates currently not present in the archive can enter it in two ways: (i) via user-defined minimum selection probabilities for rates combined with a successful step or (ii) via a stagnation detection mechanism increasing the value for a promising rate after the current bit-flip neighborhood has been explored with high probability. For the minimum selection probabilities, we suggest different options, including heavy-tailed distributions.
We conduct rigorous runtime analysis of the flexible evolutionary algorithm on the OneMax and Jump functions, on general unimodal functions, on minimum spanning trees, and on a class of hurdle-like functions with varying hurdle width that benefit particularly from the archive of promising mutation rates. In all cases, the runtime bounds are close to or even outperform the best known results for both stagnation detection and heavy-tailed mutations.
△ Less
Submitted 5 April, 2024;
originally announced April 2024.
-
Secret Collusion Among Generative AI Agents
Authors:
Sumeet Ramesh Motwani,
Mikhail Baranchuk,
Martin Strohmeier,
Vijay Bolina,
Philip H. S. Torr,
Lewis Hammond,
Christian Schroeder de Witt
Abstract:
Recent capability increases in large language models (LLMs) open up applications in which teams of communicating generative AI agents solve joint tasks. This poses privacy and security challenges concerning the unauthorised sharing of information, or other unwanted forms of agent coordination. Modern steganographic techniques could render such dynamics hard to detect. In this paper, we comprehensi…
▽ More
Recent capability increases in large language models (LLMs) open up applications in which teams of communicating generative AI agents solve joint tasks. This poses privacy and security challenges concerning the unauthorised sharing of information, or other unwanted forms of agent coordination. Modern steganographic techniques could render such dynamics hard to detect. In this paper, we comprehensively formalise the problem of secret collusion in systems of generative AI agents by drawing on relevant concepts from both the AI and security literature. We study incentives for the use of steganography, and propose a variety of mitigation measures. Our investigations result in a model evaluation framework that systematically tests capabilities required for various forms of secret collusion. We provide extensive empirical results across a range of contemporary LLMs. While the steganographic capabilities of current models remain limited, GPT-4 displays a capability jump suggesting the need for continuous monitoring of steganographic frontier model capabilities. We conclude by laying out a comprehensive research program to mitigate future risks of collusion between generative AI models.
△ Less
Submitted 12 February, 2024;
originally announced February 2024.
-
The Danger Of Arrogance: Welfare Equilibra As A Solution To Stackelberg Self-Play In Non-Coincidental Games
Authors:
Jake Levi,
Chris Lu,
Timon Willi,
Christian Schroeder de Witt,
Jakob Foerster
Abstract:
The increasing prevalence of multi-agent learning systems in society necessitates understanding how to learn effective and safe policies in general-sum multi-agent environments against a variety of opponents, including self-play. General-sum learning is difficult because of non-stationary opponents and misaligned incentives. Our first main contribution is to show that many recent approaches to gen…
▽ More
The increasing prevalence of multi-agent learning systems in society necessitates understanding how to learn effective and safe policies in general-sum multi-agent environments against a variety of opponents, including self-play. General-sum learning is difficult because of non-stationary opponents and misaligned incentives. Our first main contribution is to show that many recent approaches to general-sum learning can be derived as approximations to Stackelberg strategies, which suggests a framework for develo** new multi-agent learning algorithms. We then define non-coincidental games as games in which the Stackelberg strategy profile is not a Nash Equilibrium. This notably includes several canonical matrix games and provides a normative theory for why existing algorithms fail in self-play in such games. We address this problem by introducing Welfare Equilibria (WE) as a generalisation of Stackelberg Strategies, which can recover desirable Nash Equilibria even in non-coincidental games. Finally, we introduce Welfare Function Search (WelFuSe) as a practical approach to finding desirable WE against unknown opponents, which finds more mutually desirable solutions in self-play, while preserving performance against naive learning opponents.
△ Less
Submitted 27 March, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
JaxMARL: Multi-Agent RL Environments in JAX
Authors:
Alexander Rutherford,
Benjamin Ellis,
Matteo Gallici,
Jonathan Cook,
Andrei Lupu,
Gardar Ingvarsson,
Timon Willi,
Akbir Khan,
Christian Schroeder de Witt,
Alexandra Souly,
Saptarashmi Bandyopadhyay,
Mikayel Samvelyan,
Minqi Jiang,
Robert Tjarko Lange,
Shimon Whiteson,
Bruno Lacerda,
Nick Hawes,
Tim Rocktaschel,
Chris Lu,
Jakob Nicolaus Foerster
Abstract:
Benchmarks play an important role in the development of machine learning algorithms. For example, research in reinforcement learning (RL) has been heavily influenced by available environments and benchmarks. However, RL environments are traditionally run on the CPU, limiting their scalability with typical academic compute. Recent advancements in JAX have enabled the wider use of hardware accelerat…
▽ More
Benchmarks play an important role in the development of machine learning algorithms. For example, research in reinforcement learning (RL) has been heavily influenced by available environments and benchmarks. However, RL environments are traditionally run on the CPU, limiting their scalability with typical academic compute. Recent advancements in JAX have enabled the wider use of hardware acceleration to overcome these computational hurdles, enabling massively parallel RL training pipelines and environments. This is particularly useful for multi-agent reinforcement learning (MARL) research. First of all, multiple agents must be considered at each environment step, adding computational burden, and secondly, the sample complexity is increased due to non-stationarity, decentralised partial observability, or other MARL challenges. In this paper, we present JaxMARL, the first open-source code base that combines ease-of-use with GPU enabled efficiency, and supports a large number of commonly used MARL environments as well as popular baseline algorithms. When considering wall clock time, our experiments show that per-run our JAX-based training pipeline is up to 12500x faster than existing approaches. This enables efficient and thorough evaluations, with the potential to alleviate the evaluation crisis of the field. We also introduce and benchmark SMAX, a vectorised, simplified version of the popular StarCraft Multi-Agent Challenge, which removes the need to run the StarCraft II game engine. This not only enables GPU acceleration, but also provides a more flexible MARL environment, unlocking the potential for self-play, meta-learning, and other future applications in MARL. We provide code at https://github.com/flairox/jaxmarl.
△ Less
Submitted 19 December, 2023; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Bayesian Exploration Networks
Authors:
Mattie Fellows,
Brandon Kaplowitz,
Christian Schroeder de Witt,
Shimon Whiteson
Abstract:
Bayesian reinforcement learning (RL) offers a principled and elegant approach for sequential decision making under uncertainty. Most notably, Bayesian agents do not face an exploration/exploitation dilemma, a major pathology of frequentist methods. However theoretical understanding of model-free approaches is lacking. In this paper, we introduce a novel Bayesian model-free formulation and the firs…
▽ More
Bayesian reinforcement learning (RL) offers a principled and elegant approach for sequential decision making under uncertainty. Most notably, Bayesian agents do not face an exploration/exploitation dilemma, a major pathology of frequentist methods. However theoretical understanding of model-free approaches is lacking. In this paper, we introduce a novel Bayesian model-free formulation and the first analysis showing that model-free approaches can yield Bayes-optimal policies. We show all existing model-free approaches make approximations that yield policies that can be arbitrarily Bayes-suboptimal. As a first step towards model-free Bayes optimality, we introduce the Bayesian exploration network (BEN) which uses normalising flows to model both the aleatoric uncertainty (via density estimation) and epistemic uncertainty (via variational inference) in the Bellman operator. In the limit of complete optimisation, BEN learns true Bayes-optimal policies, but like in variational expectation-maximisation, partial optimisation renders our approach tractable. Empirical results demonstrate that BEN can learn true Bayes-optimal policies in tasks where existing model-free approaches fail.
△ Less
Submitted 25 June, 2024; v1 submitted 24 August, 2023;
originally announced August 2023.
-
First Steps Towards a Runtime Analysis of Neuroevolution
Authors:
Paul Fischer,
Emil Lundt Larsen,
Carsten Witt
Abstract:
We consider a simple setting in neuroevolution where an evolutionary algorithm optimizes the weights and activation functions of a simple artificial neural network. We then define simple example functions to be learned by the network and conduct rigorous runtime analyses for networks with a single neuron and for a more advanced structure with several neurons and two layers. Our results show that t…
▽ More
We consider a simple setting in neuroevolution where an evolutionary algorithm optimizes the weights and activation functions of a simple artificial neural network. We then define simple example functions to be learned by the network and conduct rigorous runtime analyses for networks with a single neuron and for a more advanced structure with several neurons and two layers. Our results show that the proposed algorithm is generally efficient on two example problems designed for one neuron and efficient with at least constant probability on the example problem for a two-layer network. In particular, the so-called harmonic mutation operator choosing steps of size $j$ with probability proportional to $1/j$ turns out as a good choice for the underlying search space. However, for the case of one neuron, we also identify situations with hard-to-overcome local optima. Experimental investigations of our neuroevolutionary algorithm and a state-of-the-art CMA-ES support the theoretical findings.
△ Less
Submitted 16 October, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Fast Pareto Optimization Using Sliding Window Selection
Authors:
Frank Neumann,
Carsten Witt
Abstract:
Pareto optimization using evolutionary multi-objective algorithms has been widely applied to solve constrained submodular optimization problems. A crucial factor determining the runtime of the used evolutionary algorithms to obtain good approximations is the population size of the algorithms which grows with the number of trade-offs that the algorithms encounter. In this paper, we introduce a slid…
▽ More
Pareto optimization using evolutionary multi-objective algorithms has been widely applied to solve constrained submodular optimization problems. A crucial factor determining the runtime of the used evolutionary algorithms to obtain good approximations is the population size of the algorithms which grows with the number of trade-offs that the algorithms encounter. In this paper, we introduce a sliding window speed up technique for recently introduced algorithms. We prove that our technique eliminates the population size as a crucial factor negatively impacting the runtime and achieves the same theoretical performance guarantees as previous approaches within less computation time. Our experimental investigations for the classical maximum coverage problem confirms that our sliding window technique clearly leads to better results for a wide range of instances and constraint settings.
△ Less
Submitted 11 May, 2023;
originally announced May 2023.
-
How Well Does the Metropolis Algorithm Cope With Local Optima?
Authors:
Benjamin Doerr,
Taha El Ghazi El Houssaini,
Amirhossein Rajabi,
Carsten Witt
Abstract:
The Metropolis algorithm (MA) is a classic stochastic local search heuristic. It avoids getting stuck in local optima by occasionally accepting inferior solutions. To better and in a rigorous manner understand this ability, we conduct a mathematical runtime analysis of the MA on the CLIFF benchmark. Apart from one local optimum, cliff functions are monotonically increasing towards the global optim…
▽ More
The Metropolis algorithm (MA) is a classic stochastic local search heuristic. It avoids getting stuck in local optima by occasionally accepting inferior solutions. To better and in a rigorous manner understand this ability, we conduct a mathematical runtime analysis of the MA on the CLIFF benchmark. Apart from one local optimum, cliff functions are monotonically increasing towards the global optimum. Consequently, to optimize a cliff function, the MA only once needs to accept an inferior solution. Despite seemingly being an ideal benchmark for the MA to profit from its main working principle, our mathematical runtime analysis shows that this hope does not come true. Even with the optimal temperature (the only parameter of the MA), the MA optimizes most cliff functions less efficiently than simple elitist evolutionary algorithms (EAs), which can only leave the local optimum by generating a superior solution possibly far away. This result suggests that our understanding of why the MA is often very successful in practice is not yet complete. Our work also suggests to equip the MA with global mutation operators, an idea supported by our preliminary experiments.
△ Less
Submitted 15 May, 2023; v1 submitted 21 April, 2023;
originally announced April 2023.
-
3-Objective Pareto Optimization for Problems with Chance Constraints
Authors:
Frank Neumann,
Carsten Witt
Abstract:
Evolutionary multi-objective algorithms have successfully been used in the context of Pareto optimization where a given constraint is relaxed into an additional objective. In this paper, we explore the use of 3-objective formulations for problems with chance constraints. Our formulation trades off the expected cost and variance of the stochastic component as well as the given deterministic constra…
▽ More
Evolutionary multi-objective algorithms have successfully been used in the context of Pareto optimization where a given constraint is relaxed into an additional objective. In this paper, we explore the use of 3-objective formulations for problems with chance constraints. Our formulation trades off the expected cost and variance of the stochastic component as well as the given deterministic constraint. We point out benefits that this 3-objective formulation has compared to a bi-objective one recently investigated for chance constraints with Normally distributed stochastic components. Our analysis shows that the 3-objective formulation allows to compute all required trade-offs using 1-bit flips only, when dealing with a deterministic cardinality constraint. Furthermore, we carry out experimental investigations for the chance constrained dominating set problem and show the benefit for this classical NP-hard problem.
△ Less
Submitted 18 April, 2023;
originally announced April 2023.
-
Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning
Authors:
Yat Long Lo,
Christian Schroeder de Witt,
Samuel Sokota,
Jakob Nicolaus Foerster,
Shimon Whiteson
Abstract:
By enabling agents to communicate, recent cooperative multi-agent reinforcement learning (MARL) methods have demonstrated better task performance and more coordinated behavior. Most existing approaches facilitate inter-agent communication by allowing agents to send messages to each other through free communication channels, i.e., cheap talk channels. Current methods require these channels to be co…
▽ More
By enabling agents to communicate, recent cooperative multi-agent reinforcement learning (MARL) methods have demonstrated better task performance and more coordinated behavior. Most existing approaches facilitate inter-agent communication by allowing agents to send messages to each other through free communication channels, i.e., cheap talk channels. Current methods require these channels to be constantly accessible and known to the agents a priori. In this work, we lift these requirements such that the agents must discover the cheap talk channels and learn how to use them. Hence, the problem has two main parts: cheap talk discovery (CTD) and cheap talk utilization (CTU). We introduce a novel conceptual framework for both parts and develop a new algorithm based on mutual information maximization that outperforms existing algorithms in CTD/CTU settings. We also release a novel benchmark suite to stimulate future research in CTD/CTU.
△ Less
Submitted 19 March, 2023;
originally announced March 2023.
-
Revealing Robust Oil and Gas Company Macro-Strategies using Deep Multi-Agent Reinforcement Learning
Authors:
Dylan Radovic,
Lucas Kruitwagen,
Christian Schroeder de Witt,
Ben Caldecott,
Shane Tomlinson,
Mark Workman
Abstract:
The energy transition potentially poses an existential risk for major international oil companies (IOCs) if they fail to adapt to low-carbon business models. Projections of energy futures, however, are met with diverging assumptions on its scale and pace, causing disagreement among IOC decision-makers and their stakeholders over what the business model of an incumbent fossil fuel company should be…
▽ More
The energy transition potentially poses an existential risk for major international oil companies (IOCs) if they fail to adapt to low-carbon business models. Projections of energy futures, however, are met with diverging assumptions on its scale and pace, causing disagreement among IOC decision-makers and their stakeholders over what the business model of an incumbent fossil fuel company should be. In this work, we used deep multi-agent reinforcement learning to solve an energy systems wargame wherein players simulate IOC decision-making, including hydrocarbon and low-carbon investments decisions, dividend policies, and capital structure measures, through an uncertain energy transition to explore critical and non-linear governance questions, from leveraged transitions to reserve replacements. Adversarial play facilitated by state-of-the-art algorithms revealed decision-making strategies robust to energy transition uncertainty and against multiple IOCs. In all games, robust strategies emerged in the form of low-carbon business models as a result of early transition-oriented movement. IOCs adopting such strategies outperformed business-as-usual and delayed transition strategies regardless of hydrocarbon demand projections. In addition to maximizing value, these strategies benefit greater society by contributing substantial amounts of capital necessary to accelerate the global low-carbon energy transition. Our findings point towards the need for lenders and investors to effectively mobilize transition-oriented finance and engage with IOCs to ensure responsible reallocation of capital towards low-carbon business models that would enable the emergence of fossil fuel incumbents as future low-carbon leaders.
△ Less
Submitted 20 November, 2022;
originally announced November 2022.
-
Perfectly Secure Steganography Using Minimum Entropy Coupling
Authors:
Christian Schroeder de Witt,
Samuel Sokota,
J. Zico Kolter,
Jakob Foerster,
Martin Strohmeier
Abstract:
Steganography is the practice of encoding secret information into innocuous content in such a manner that an adversarial third party would not realize that there is hidden meaning. While this problem has classically been studied in security literature, recent advances in generative models have led to a shared interest among security and machine learning researchers in develo** scalable steganogr…
▽ More
Steganography is the practice of encoding secret information into innocuous content in such a manner that an adversarial third party would not realize that there is hidden meaning. While this problem has classically been studied in security literature, recent advances in generative models have led to a shared interest among security and machine learning researchers in develo** scalable steganography techniques. In this work, we show that a steganography procedure is perfectly secure under Cachin (1998)'s information-theoretic model of steganography if and only if it is induced by a coupling. Furthermore, we show that, among perfectly secure procedures, a procedure maximizes information throughput if and only if it is induced by a minimum entropy coupling. These insights yield what are, to the best of our knowledge, the first steganography algorithms to achieve perfect security guarantees for arbitrary covertext distributions. To provide empirical validation, we compare a minimum entropy coupling-based approach to three modern baselines -- arithmetic coding, Meteor, and adaptive dynamic grou** -- using GPT-2, WaveRNN, and Image Transformer as communication channels. We find that the minimum entropy coupling-based approach achieves superior encoding efficiency, despite its stronger security constraints. In aggregate, these results suggest that it may be natural to view information-theoretic steganography through the lens of minimum entropy coupling.
△ Less
Submitted 30 October, 2023; v1 submitted 24 October, 2022;
originally announced October 2022.
-
Equivariant Networks for Zero-Shot Coordination
Authors:
Darius Muglich,
Christian Schroeder de Witt,
Elise van der Pol,
Shimon Whiteson,
Jakob Foerster
Abstract:
Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner. A common failure mode is symmetry breaking, when agents arbitrarily converge on one out of many equivalent but mutually incompatible policies. Commonly these examples include partial observability, e.g. waving your right hand vs. left hand to convey a covert message.…
▽ More
Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner. A common failure mode is symmetry breaking, when agents arbitrarily converge on one out of many equivalent but mutually incompatible policies. Commonly these examples include partial observability, e.g. waving your right hand vs. left hand to convey a covert message. In this paper, we present a novel equivariant network architecture for use in Dec-POMDPs that effectively leverages environmental symmetry for improving zero-shot coordination, doing so more effectively than prior methods. Our method also acts as a ``coordination-improvement operator'' for generic, pre-trained policies, and thus may be applied at test-time in conjunction with any self-play algorithm. We provide theoretical guarantees of our work and test on the AI benchmark task of Hanabi, where we demonstrate our methods outperforming other symmetry-aware baselines in zero-shot coordination, as well as able to improve the coordination ability of a variety of pre-trained policies. In particular, we show our method can be used to improve on the state of the art for zero-shot coordination on the Hanabi benchmark.
△ Less
Submitted 10 April, 2024; v1 submitted 21 October, 2022;
originally announced October 2022.
-
Discovered Policy Optimisation
Authors:
Chris Lu,
Jakub Grudzien Kuba,
Alistair Letcher,
Luke Metz,
Christian Schroeder de Witt,
Jakob Foerster
Abstract:
Tremendous progress has been made in reinforcement learning (RL) over the past decade. Most of these advancements came through the continual development of new algorithms, which were designed using a combination of mathematical derivations, intuitions, and experimentation. Such an approach of creating algorithms manually is limited by human understanding and ingenuity. In contrast, meta-learning p…
▽ More
Tremendous progress has been made in reinforcement learning (RL) over the past decade. Most of these advancements came through the continual development of new algorithms, which were designed using a combination of mathematical derivations, intuitions, and experimentation. Such an approach of creating algorithms manually is limited by human understanding and ingenuity. In contrast, meta-learning provides a toolkit for automatic machine learning method optimisation, potentially addressing this flaw. However, black-box approaches which attempt to discover RL algorithms with minimal prior structure have thus far not outperformed existing hand-crafted algorithms. Mirror Learning, which includes RL algorithms, such as PPO, offers a potential middle-ground starting point: while every method in this framework comes with theoretical guarantees, components that differentiate them are subject to design. In this paper we explore the Mirror Learning space by meta-learning a "drift" function. We refer to the immediate result as Learnt Policy Optimisation (LPO). By analysing LPO we gain original insights into policy optimisation which we use to formulate a novel, closed-form RL algorithm, Discovered Policy Optimisation (DPO). Our experiments in Brax environments confirm state-of-the-art performance of LPO and DPO, as well as their transfer to unseen settings.
△ Less
Submitted 12 October, 2022; v1 submitted 11 October, 2022;
originally announced October 2022.
-
Runtime Analysis of the (1+1) EA on Weighted Sums of Transformed Linear Functions
Authors:
Frank Neumann,
Carsten Witt
Abstract:
Linear functions play a key role in the runtime analysis of evolutionary algorithms and studies have provided a wide range of new insights and techniques for analyzing evolutionary computation methods. Motivated by studies on separable functions and the optimization behaviour of evolutionary algorithms as well as objective functions from the area of chance constrained optimization, we study the cl…
▽ More
Linear functions play a key role in the runtime analysis of evolutionary algorithms and studies have provided a wide range of new insights and techniques for analyzing evolutionary computation methods. Motivated by studies on separable functions and the optimization behaviour of evolutionary algorithms as well as objective functions from the area of chance constrained optimization, we study the class of objective functions that are weighted sums of two transformed linear functions. Our results show that the (1+1) EA, with a mutation rate depending on the number of overlap** bits of the functions, obtains an optimal solution for these functions in expected time O(n log n), thereby generalizing a well-known result for linear functions to a much wider range of problems.
△ Less
Submitted 11 August, 2022;
originally announced August 2022.
-
Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks
Authors:
Tim Franzmeyer,
Stephen McAleer,
João F. Henriques,
Jakob N. Foerster,
Philip H. S. Torr,
Adel Bibi,
Christian Schroeder de Witt
Abstract:
Autonomous agents deployed in the real world need to be robust against adversarial attacks on sensory inputs. Robustifying agent policies requires anticipating the strongest attacks possible. We demonstrate that existing observation-space attacks on reinforcement learning agents have a common weakness: while effective, their lack of information-theoretic detectability constraints makes them detect…
▽ More
Autonomous agents deployed in the real world need to be robust against adversarial attacks on sensory inputs. Robustifying agent policies requires anticipating the strongest attacks possible. We demonstrate that existing observation-space attacks on reinforcement learning agents have a common weakness: while effective, their lack of information-theoretic detectability constraints makes them detectable using automated means or human inspection. Detectability is undesirable to adversaries as it may trigger security escalations. We introduce ε-illusory, a novel form of adversarial attack on sequential decision-makers that is both effective and of ε-bounded statistical detectability. We propose a novel dual ascent algorithm to learn such attacks end-to-end. Compared to existing attacks, we empirically find ε-illusory to be significantly harder to detect with automated methods, and a small study with human participants (IRB approval under reference R84123/RE001) suggests they are similarly harder to detect for humans. Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses. The project website can be found at https://tinyurl.com/illusory-attacks.
△ Less
Submitted 6 May, 2024; v1 submitted 20 July, 2022;
originally announced July 2022.
-
Generalized Beliefs for Cooperative AI
Authors:
Darius Muglich,
Luisa Zintgraf,
Christian Schroeder de Witt,
Shimon Whiteson,
Jakob Foerster
Abstract:
Self-play is a common paradigm for constructing solutions in Markov games that can yield optimal policies in collaborative settings. However, these policies often adopt highly-specialized conventions that make playing with a novel partner difficult. To address this, recent approaches rely on encoding symmetry and convention-awareness into policy training, but these require strong environmental ass…
▽ More
Self-play is a common paradigm for constructing solutions in Markov games that can yield optimal policies in collaborative settings. However, these policies often adopt highly-specialized conventions that make playing with a novel partner difficult. To address this, recent approaches rely on encoding symmetry and convention-awareness into policy training, but these require strong environmental assumptions and can complicate policy training. We therefore propose moving the learning of conventions to the belief space. Specifically, we propose a belief learning model that can maintain beliefs over rollouts of policies not seen at training time, and can thus decode and adapt to novel conventions at test time. We show how to leverage this model for both search and training of a best response over various pools of policies to greatly improve ad-hoc teamplay. We also show how our setup promotes explainability and interpretability of nuanced agent conventions.
△ Less
Submitted 25 June, 2022;
originally announced June 2022.
-
Biological Evolution and Genetic Algorithms: Exploring the Space of Abstract Tile Self-Assembly
Authors:
Christian Schroeder de Witt
Abstract:
A physically-motivated genetic algorithm (GA) and full enumeration for a tile-based model of self-assembly (JaTAM) is implemented using a graphics processing unit (GPU). We observe performance gains with respect to state-of-the-art implementations on CPU of factor 7.7 for the GA and 2.9 for JaTAM. The correctness of our GA implementation is demonstrated using a test-bed fitness function, and our J…
▽ More
A physically-motivated genetic algorithm (GA) and full enumeration for a tile-based model of self-assembly (JaTAM) is implemented using a graphics processing unit (GPU). We observe performance gains with respect to state-of-the-art implementations on CPU of factor 7.7 for the GA and 2.9 for JaTAM. The correctness of our GA implementation is demonstrated using a test-bed fitness function, and our JaTAM implementation is verified by classifying a well-known search space $S_{2,8}$ based on two tile types. The performance gains achieved allow for the classification of a larger search space $S^{32}_{3,8}$ based on three tile types. The prevalence of structures based on two tile types demonstrates that simple organisms emerge preferrably even in complex ecosystems. The modularity of the largest structures found motivates the assumption that to first order, $S_{2,8}$ forms the building blocks of $S_{3,8}$. We conclude that GPUs may play an important role in future studies of evolutionary dynamics.
△ Less
Submitted 28 May, 2022;
originally announced May 2022.
-
Surround-view Fisheye Camera Perception for Automated Driving: Overview, Survey and Challenges
Authors:
Varun Ravi Kumar,
Ciaran Eising,
Christian Witt,
Senthil Yogamani
Abstract:
Surround-view fisheye cameras are commonly used for near-field sensing in automated driving. Four fisheye cameras on four sides of the vehicle are sufficient to cover 360° around the vehicle capturing the entire near-field region. Some primary use cases are automated parking, traffic jam assist, and urban driving. There are limited datasets and very little work on near-field perception tasks as th…
▽ More
Surround-view fisheye cameras are commonly used for near-field sensing in automated driving. Four fisheye cameras on four sides of the vehicle are sufficient to cover 360° around the vehicle capturing the entire near-field region. Some primary use cases are automated parking, traffic jam assist, and urban driving. There are limited datasets and very little work on near-field perception tasks as the focus in automotive perception is on far-field perception. In contrast to far-field, surround-view perception poses additional challenges due to high precision object detection requirements of 10cm and partial visibility of objects. Due to the large radial distortion of fisheye cameras, standard algorithms cannot be extended easily to the surround-view use case. Thus, we are motivated to provide a self-contained reference for automotive fisheye camera perception for researchers and practitioners. Firstly, we provide a unified and taxonomic treatment of commonly used fisheye camera models. Secondly, we discuss various perception tasks and existing literature. Finally, we discuss the challenges and future direction.
△ Less
Submitted 5 January, 2023; v1 submitted 26 May, 2022;
originally announced May 2022.
-
Model-Free Opponent Sha**
Authors:
Chris Lu,
Timon Willi,
Christian Schroeder de Witt,
Jakob Foerster
Abstract:
In general-sum games, the interaction of self-interested learning agents commonly leads to collectively worst-case outcomes, such as defect-defect in the iterated prisoner's dilemma (IPD). To overcome this, some methods, such as Learning with Opponent-Learning Awareness (LOLA), shape their opponents' learning process. However, these methods are myopic since only a small number of steps can be anti…
▽ More
In general-sum games, the interaction of self-interested learning agents commonly leads to collectively worst-case outcomes, such as defect-defect in the iterated prisoner's dilemma (IPD). To overcome this, some methods, such as Learning with Opponent-Learning Awareness (LOLA), shape their opponents' learning process. However, these methods are myopic since only a small number of steps can be anticipated, are asymmetric since they treat other agents as naive learners, and require the use of higher-order derivatives, which are calculated through white-box access to an opponent's differentiable learning algorithm. To address these issues, we propose Model-Free Opponent Sha** (M-FOS). M-FOS learns in a meta-game in which each meta-step is an episode of the underlying inner game. The meta-state consists of the inner policies, and the meta-policy produces a new inner policy to be used in the next episode. M-FOS then uses generic model-free optimisation methods to learn meta-policies that accomplish long-horizon opponent sha**. Empirically, M-FOS near-optimally exploits naive learners and other, more sophisticated algorithms from the literature. For example, to the best of our knowledge, it is the first method to learn the well-known Zero-Determinant (ZD) extortion strategy in the IPD. In the same settings, M-FOS leads to socially optimal outcomes under meta-self-play. Finally, we show that M-FOS can be scaled to high-dimensional settings.
△ Less
Submitted 4 November, 2022; v1 submitted 3 May, 2022;
originally announced May 2022.
-
(Private)-Retroactive Carbon Pricing [(P)ReCaP]: A Market-based Approach for Climate Finance and Risk Assessment
Authors:
Yoshua Bengio,
Prateek Gupta,
Dylan Radovic,
Maarten Scholl,
Andrew Williams,
Christian Schroeder de Witt,
Tianyu Zhang,
Yang Zhang
Abstract:
Insufficient Social Cost of Carbon (SCC) estimation methods and short-term decision-making horizons have hindered the ability of carbon emitters to properly correct for the negative externalities of climate change, as well as the capacity of nations to balance economic and climate policy. To overcome these limitations, we introduce Retrospective Social Cost of Carbon Updating (ReSCCU), a novel mec…
▽ More
Insufficient Social Cost of Carbon (SCC) estimation methods and short-term decision-making horizons have hindered the ability of carbon emitters to properly correct for the negative externalities of climate change, as well as the capacity of nations to balance economic and climate policy. To overcome these limitations, we introduce Retrospective Social Cost of Carbon Updating (ReSCCU), a novel mechanism that corrects for these limitations as empirically measured evidence is collected. To implement ReSCCU in the context of carbon taxation, we propose Retroactive Carbon Pricing (ReCaP), a market mechanism in which polluters offload the payment of ReSCCU adjustments to insurers. To alleviate systematic risks and minimize government involvement, we introduce the Private ReCaP (PReCaP) prediction market, which could see real-world implementation based on the engagement of a few high net-worth individuals or independent institutions.
△ Less
Submitted 2 May, 2022;
originally announced May 2022.
-
The Compact Genetic Algorithm Struggles on Cliff Functions
Authors:
Frank Neumann,
Dirk Sudholt,
Carsten Witt
Abstract:
The compact genetic algorithm (cGA) is an non-elitist estimation of distribution algorithm which has shown to be able to deal with difficult multimodal fitness landscapes that are hard to solve by elitist algorithms. In this paper, we investigate the cGA on the CLIFF function for which it has been shown recently that non-elitist evolutionary algorithms and artificial immune systems optimize it in…
▽ More
The compact genetic algorithm (cGA) is an non-elitist estimation of distribution algorithm which has shown to be able to deal with difficult multimodal fitness landscapes that are hard to solve by elitist algorithms. In this paper, we investigate the cGA on the CLIFF function for which it has been shown recently that non-elitist evolutionary algorithms and artificial immune systems optimize it in expected polynomial time. We point out that the cGA faces major difficulties when solving the CLIFF function and investigate its dynamics both experimentally and theoretically around the cliff. Our experimental results indicate that the cGA requires exponential time for all values of the update strength $K$. We show theoretically that, under sensible assumptions, there is a negative drift when sampling around the location of the cliff. Experiments further suggest that there is a phase transition for $K$ where the expected optimization time drops from $n^{Θ(n)}$ to $2^{Θ(n)}$.
△ Less
Submitted 11 April, 2022;
originally announced April 2022.
-
Simulated Annealing is a Polynomial-Time Approximation Scheme for the Minimum Spanning Tree Problem
Authors:
Benjamin Doerr,
Amirhossein Rajabi,
Carsten Witt
Abstract:
We prove that Simulated Annealing with an appropriate cooling schedule computes arbitrarily tight constant-factor approximations to the minimum spanning tree problem in polynomial time. This result was conjectured by Wegener (2005). More precisely, denoting by $n, m, w_{\max}$, and $w_{\min}$ the number of vertices and edges as well as the maximum and minimum edge weight of the MST instance, we pr…
▽ More
We prove that Simulated Annealing with an appropriate cooling schedule computes arbitrarily tight constant-factor approximations to the minimum spanning tree problem in polynomial time. This result was conjectured by Wegener (2005). More precisely, denoting by $n, m, w_{\max}$, and $w_{\min}$ the number of vertices and edges as well as the maximum and minimum edge weight of the MST instance, we prove that simulated annealing with initial temperature $T_0 \ge w_{\max}$ and multiplicative cooling schedule with factor $1-1/\ell$, where $\ell = ω(mn\ln(m))$, with probability at least $1-1/m$ computes in time $O(\ell (\ln\ln (\ell) + \ln(T_0/w_{\min}) ))$ a spanning tree with weight at most $1+κ$ times the optimum weight, where $1+κ= \frac{(1+o(1))\ln(\ell m)}{\ln(\ell) -\ln (mn\ln (m))}$. Consequently, for any $ε>0$, we can choose $\ell$ in such a way that a $(1+ε)$-approximation is found in time $O((mn\ln(n))^{1+1/ε+o(1)}(\ln\ln n + \ln(T_0/w_{\min})))$ with probability at least $1-1/m$. In the special case of so-called $(1+ε)$-separated weights, this algorithm computes an optimal solution (again in time $O( (mn\ln(n))^{1+1/ε+o(1)}(\ln\ln n + \ln(T_0/w_{\min})))$), which is a significant speed-up over Wegener's runtime guarantee of $O(m^{8 + 8/ε})$.
△ Less
Submitted 22 July, 2023; v1 submitted 5 April, 2022;
originally announced April 2022.
-
Mirror Learning: A Unifying Framework of Policy Optimisation
Authors:
Jakub Grudzien Kuba,
Christian Schroeder de Witt,
Jakob Foerster
Abstract:
Modern deep reinforcement learning (RL) algorithms are motivated by either the generalised policy iteration (GPI) or trust-region learning (TRL) frameworks. However, algorithms that strictly respect these theoretical frameworks have proven unscalable. Surprisingly, the only known scalable algorithms violate the GPI/TRL assumptions, e.g. due to required regularisation or other heuristics. The curre…
▽ More
Modern deep reinforcement learning (RL) algorithms are motivated by either the generalised policy iteration (GPI) or trust-region learning (TRL) frameworks. However, algorithms that strictly respect these theoretical frameworks have proven unscalable. Surprisingly, the only known scalable algorithms violate the GPI/TRL assumptions, e.g. due to required regularisation or other heuristics. The current explanation of their empirical success is essentially "by analogy": they are deemed approximate adaptations of theoretically sound methods. Unfortunately, studies have shown that in practice these algorithms differ greatly from their conceptual ancestors. In contrast, in this paper we introduce a novel theoretical framework, named Mirror Learning, which provides theoretical guarantees to a large class of algorithms, including TRPO and PPO. While the latter two exploit the flexibility of our framework, GPI and TRL fit in merely as pathologically restrictive corner cases thereof. This suggests that the empirical performance of state-of-the-art methods is a direct consequence of their theoretical properties, rather than of aforementioned approximate analogies. Mirror learning sets us free to boldly explore novel, theoretically sound RL algorithms, a thus far uncharted wonderland.
△ Less
Submitted 14 July, 2022; v1 submitted 7 January, 2022;
originally announced January 2022.
-
Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS
Authors:
Christian Schroeder de Witt,
Yongchao Huang,
Philip H. S. Torr,
Martin Strohmeier
Abstract:
Cyber attacks are increasing in volume, frequency, and complexity. In response, the security community is looking toward fully automating cyber defense systems using machine learning. However, so far the resultant effects on the coevolutionary dynamics of attackers and defenders have not been examined. In this whitepaper, we hypothesise that increased automation on both sides will accelerate the c…
▽ More
Cyber attacks are increasing in volume, frequency, and complexity. In response, the security community is looking toward fully automating cyber defense systems using machine learning. However, so far the resultant effects on the coevolutionary dynamics of attackers and defenders have not been examined. In this whitepaper, we hypothesise that increased automation on both sides will accelerate the coevolutionary cycle, thus begging the question of whether there are any resultant fixed points, and how they are characterised. Working within the threat model of Locked Shields, Europe's largest cyberdefense exercise, we study blackbox adversarial attacks on network classifiers. Given already existing attack capabilities, we question the utility of optimal evasion attack frameworks based on minimal evasion distances. Instead, we suggest a novel reinforcement learning setting that can be used to efficiently generate arbitrary adversarial perturbations. We then argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions, and introduce a temporally extended multi-agent reinforcement learning framework in which the resultant dynamics can be studied. We hypothesise that one plausible fixed point of AI-NIDS may be a scenario where the defense strategy relies heavily on whitelisted feature flow subspaces. Finally, we demonstrate that a continual learning approach is required to study attacker-defender dynamics in temporally extended general-sum games.
△ Less
Submitted 23 November, 2021;
originally announced November 2021.
-
Runtime Analysis of Single- and Multi-Objective Evolutionary Algorithms for Chance Constrained Optimization Problems with Normally Distributed Random Variables
Authors:
Frank Neumann,
Carsten Witt
Abstract:
Chance constrained optimization problems allow to model problems where constraints involving stochastic components should only be violated with a small probability. Evolutionary algorithms have been applied to this scenario and shown to achieve high quality results. With this paper, we contribute to the theoretical understanding of evolutionary algorithms for chance constrained optimization. We st…
▽ More
Chance constrained optimization problems allow to model problems where constraints involving stochastic components should only be violated with a small probability. Evolutionary algorithms have been applied to this scenario and shown to achieve high quality results. With this paper, we contribute to the theoretical understanding of evolutionary algorithms for chance constrained optimization. We study the scenario of stochastic components that are independent and Normally distributed. Considering the simple single-objective (1+1)~EA, we show that imposing an additional uniform constraint already leads to local optima for very restricted scenarios and an exponential optimization time. We therefore introduce a multi-objective formulation of the problem which trades off the expected cost and its variance. We show that multi-objective evolutionary algorithms are highly effective when using this formulation and obtain a set of solutions that contains an optimal solution for any possible confidence level imposed on the constraint. Furthermore, we prove that this approach can also be used to compute a set of optimal solutions for the chance constrained minimum spanning tree problem. In order to deal with potentially exponentially many trade-offs in the multi-objective formulation, we propose and analyze improved convex multi-objective approaches. Experimental investigations on instances of the NP-hard stochastic minimum weight dominating set problem confirm the benefit of the multi-objective and the improved convex multi-objective approach in practice.
△ Less
Submitted 9 August, 2022; v1 submitted 13 September, 2021;
originally announced September 2021.
-
Communicating via Markov Decision Processes
Authors:
Samuel Sokota,
Christian Schroeder de Witt,
Maximilian Igl,
Luisa Zintgraf,
Philip Torr,
Martin Strohmeier,
J. Zico Kolter,
Shimon Whiteson,
Jakob Foerster
Abstract:
We consider the problem of communicating exogenous information by means of Markov decision process trajectories. This setting, which we call a Markov coding game (MCG), generalizes both source coding and a large class of referential games. MCGs also isolate a problem that is important in decentralized control settings in which cheap-talk is not available -- namely, they require balancing communica…
▽ More
We consider the problem of communicating exogenous information by means of Markov decision process trajectories. This setting, which we call a Markov coding game (MCG), generalizes both source coding and a large class of referential games. MCGs also isolate a problem that is important in decentralized control settings in which cheap-talk is not available -- namely, they require balancing communication with the associated cost of communicating. We contribute a theoretically grounded approach to MCGs based on maximum entropy reinforcement learning and minimum entropy coupling that we call MEME. Due to recent breakthroughs in approximation algorithms for minimum entropy coupling, MEME is not merely a theoretical algorithm, but can be applied to practical settings. Empirically, we show both that MEME is able to outperform a strong baseline on small MCGs and that MEME is able to achieve strong performance on extremely large MCGs. To the latter point, we demonstrate that MEME is able to losslessly communicate binary images via trajectories of Cartpole and Pong, while simultaneously achieving the maximal or near maximal expected returns, and that it is even capable of performing well in the presence of actuator noise.
△ Less
Submitted 12 June, 2022; v1 submitted 17 July, 2021;
originally announced July 2021.
-
A Self-Supervised Auxiliary Loss for Deep RL in Partially Observable Settings
Authors:
Eltayeb Ahmed,
Luisa Zintgraf,
Christian A. Schroeder de Witt,
Nicolas Usunier
Abstract:
In this work we explore an auxiliary loss useful for reinforcement learning in environments where strong performing agents are required to be able to navigate a spatial environment. The auxiliary loss proposed is to minimize the classification error of a neural network classifier that predicts whether or not a pair of states sampled from the agents current episode trajectory are in order. The clas…
▽ More
In this work we explore an auxiliary loss useful for reinforcement learning in environments where strong performing agents are required to be able to navigate a spatial environment. The auxiliary loss proposed is to minimize the classification error of a neural network classifier that predicts whether or not a pair of states sampled from the agents current episode trajectory are in order. The classifier takes as input a pair of states as well as the agent's memory. The motivation for this auxiliary loss is that there is a strong correlation with which of a pair of states is more recent in the agents episode trajectory and which of the two states is spatially closer to the agent. Our hypothesis is that learning features to answer this question encourages the agent to learn and internalize in memory representations of states that facilitate spatial reasoning. We tested this auxiliary loss on a navigation task in a gridworld and achieved 9.6% increase in accumulative episode reward compared to a strong baseline approach.
△ Less
Submitted 17 April, 2021;
originally announced April 2021.
-
Stagnation Detection in Highly Multimodal Fitness Landscapes
Authors:
Amirhossein Rajabi,
Carsten Witt
Abstract:
Stagnation detection has been proposed as a mechanism for randomized search heuristics to escape from local optima by automatically increasing the size of the neighborhood to find the so-called gap size, i.e., the distance to the next improvement. Its usefulness has mostly been considered in simple multimodal landscapes with few local optima that could be crossed one after another. In multimodal l…
▽ More
Stagnation detection has been proposed as a mechanism for randomized search heuristics to escape from local optima by automatically increasing the size of the neighborhood to find the so-called gap size, i.e., the distance to the next improvement. Its usefulness has mostly been considered in simple multimodal landscapes with few local optima that could be crossed one after another. In multimodal landscapes with a more complex location of optima of similar gap size, stagnation detection suffers from the fact that the neighborhood size is frequently reset to $1$ without using gap sizes that were promising in the past.
In this paper, we investigate a new mechanism called radius memory which can be added to stagnation detection to control the search radius more carefully by giving preference to values that were successful in the past. We implement this idea in an algorithm called SD-RLS$^{\text{m}}$ and show compared to previous variants of stagnation detection that it yields speed-ups for linear functions under uniform constraints and the minimum spanning tree problem. Moreover, its running time does not significantly deteriorate on unimodal functions and a generalization of the Jump benchmark. Finally, we present experimental results carried out to study SD-RLS$^{\text{m}}$ and compare it with other algorithms.
△ Less
Submitted 22 April, 2021; v1 submitted 9 April, 2021;
originally announced April 2021.
-
On Steady-State Evolutionary Algorithms and Selective Pressure: Why Inverse Rank-Based Allocation of Reproductive Trials is Best
Authors:
Dogan Corus,
Andrei Lissovoi,
Pietro S. Oliveto,
Carsten Witt
Abstract:
We analyse the impact of the selective pressure for the global optimisation capabilities of steady-state EAs. For the standard bimodal benchmark function \twomax we rigorously prove that using uniform parent selection leads to exponential runtimes with high probability to locate both optima for the standard ($μ$+1)~EA and ($μ$+1)~RLS with any polynomial population sizes. On the other hand, we prov…
▽ More
We analyse the impact of the selective pressure for the global optimisation capabilities of steady-state EAs. For the standard bimodal benchmark function \twomax we rigorously prove that using uniform parent selection leads to exponential runtimes with high probability to locate both optima for the standard ($μ$+1)~EA and ($μ$+1)~RLS with any polynomial population sizes. On the other hand, we prove that selecting the worst individual as parent leads to efficient global optimisation with overwhelming probability for reasonable population sizes. Since always selecting the worst individual may have detrimental effects for esca** from local optima, we consider the performance of stochastic parent selection operators with low selective pressure for a function class called \textsc{TruncatedTwoMax} where one slope is shorter than the other. An experimental analysis shows that the EAs equipped with inverse tournament selection, where the loser is selected for reproduction and small tournament sizes, globally optimise \textsc{TwoMax} efficiently and effectively escape from local optima of \textsc{TruncatedTwoMax} with high probability. Thus they identify both optima efficiently while uniform (or stronger) selection fails in theory and in practice. We then show the power of inverse selection on function classes from the literature where populations are essential by providing rigorous proofs or experimental evidence that it outperforms uniform selection equipped with or without a restart strategy. We conclude the paper by confirming our theoretical insights with an empirical analysis of the different selective pressures on standard benchmarks of the classical MaxSat and Multidimensional Knapsack Problems.
△ Less
Submitted 18 March, 2021;
originally announced March 2021.
-
OmniDet: Surround View Cameras based Multi-task Visual Perception Network for Autonomous Driving
Authors:
Varun Ravi Kumar,
Senthil Yogamani,
Hazem Rashed,
Ganesh Sistu,
Christian Witt,
Isabelle Leang,
Stefan Milz,
Patrick Mäder
Abstract:
Surround View fisheye cameras are commonly deployed in automated driving for 360° near-field sensing around the vehicle. This work presents a multi-task visual perception network on unrectified fisheye images to enable the vehicle to sense its surrounding environment. It consists of six primary tasks necessary for an autonomous driving system: depth estimation, visual odometry, semantic segmentati…
▽ More
Surround View fisheye cameras are commonly deployed in automated driving for 360° near-field sensing around the vehicle. This work presents a multi-task visual perception network on unrectified fisheye images to enable the vehicle to sense its surrounding environment. It consists of six primary tasks necessary for an autonomous driving system: depth estimation, visual odometry, semantic segmentation, motion segmentation, object detection, and lens soiling detection. We demonstrate that the jointly trained model performs better than the respective single task versions. Our multi-task model has a shared encoder providing a significant computational advantage and has synergized decoders where tasks support each other. We propose a novel camera geometry based adaptation mechanism to encode the fisheye distortion model both at training and inference. This was crucial to enable training on the WoodScape dataset, comprised of data from different parts of the world collected by 12 different cameras mounted on three different cars with different intrinsics and viewpoints. Given that bounding boxes is not a good representation for distorted fisheye images, we also extend object detection to use a polygon with non-uniformly sampled vertices. We additionally evaluate our model on standard automotive datasets, namely KITTI and Cityscapes. We obtain the state-of-the-art results on KITTI for depth estimation and pose estimation tasks and competitive performance on the other tasks. We perform extensive ablation studies on various architecture choices and task weighting methodologies. A short video at https://youtu.be/xbSjZ5OfPes provides qualitative results.
△ Less
Submitted 6 June, 2023; v1 submitted 15 February, 2021;
originally announced February 2021.
-
Stagnation Detection with Randomized Local Search
Authors:
Amirhossein Rajabi,
Carsten Witt
Abstract:
Recently a mechanism called stagnation detection was proposed that automatically adjusts the mutation rate of evolutionary algorithms when they encounter local optima. The so-called $SD-(1+1)EA$ introduced by Rajabi and Witt (GECCO 2020) adds stagnation detection to the classical $(1+1)EA$ with standard bit mutation, which flips each bit independently with some mutation rate, and raises the mutati…
▽ More
Recently a mechanism called stagnation detection was proposed that automatically adjusts the mutation rate of evolutionary algorithms when they encounter local optima. The so-called $SD-(1+1)EA$ introduced by Rajabi and Witt (GECCO 2020) adds stagnation detection to the classical $(1+1)EA$ with standard bit mutation, which flips each bit independently with some mutation rate, and raises the mutation rate when the algorithm is likely to have encountered local optima.
In this paper, we investigate stagnation detection in the context of the $k$-bit flip operator of randomized local search that flips $k$ bits chosen uniformly at random and let stagnation detection adjust the parameter $k$. We obtain improved runtime results compared to the $SD-(1+1)EA$ amounting to a speed-up of up to $e=2.71\dots$ Moreover, we propose additional schemes that prevent infinite optimization times even if the algorithm misses a working choice of $k$ due to unlucky events. Finally, we present an example where standard bit mutation still outperforms the local $k$-bit flip with stagnation detection.
△ Less
Submitted 8 February, 2021; v1 submitted 28 January, 2021;
originally announced January 2021.
-
RainBench: Towards Global Precipitation Forecasting from Satellite Imagery
Authors:
Christian Schroeder de Witt,
Catherine Tong,
Valentina Zantedeschi,
Daniele De Martini,
Freddie Kalaitzis,
Matthew Chantry,
Duncan Watson-Parris,
Piotr Bilinski
Abstract:
Extreme precipitation events, such as violent rainfall and hail storms, routinely ravage economies and livelihoods around the develo** world. Climate change further aggravates this issue. Data-driven deep learning approaches could widen the access to accurate multi-day forecasts, to mitigate against such events. However, there is currently no benchmark dataset dedicated to the study of global pr…
▽ More
Extreme precipitation events, such as violent rainfall and hail storms, routinely ravage economies and livelihoods around the develo** world. Climate change further aggravates this issue. Data-driven deep learning approaches could widen the access to accurate multi-day forecasts, to mitigate against such events. However, there is currently no benchmark dataset dedicated to the study of global precipitation forecasts. In this paper, we introduce \textbf{RainBench}, a new multi-modal benchmark dataset for data-driven precipitation forecasting. It includes simulated satellite data, a selection of relevant meteorological data from the ERA5 reanalysis product, and IMERG precipitation data. We also release \textbf{PyRain}, a library to process large precipitation datasets efficiently. We present an extensive analysis of our novel dataset and establish baseline results for two benchmark medium-range precipitation forecasting tasks. Finally, we discuss existing data-driven weather forecasting methodologies and suggest future research avenues.
△ Less
Submitted 17 December, 2020;
originally announced December 2020.
-
Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?
Authors:
Christian Schroeder de Witt,
Tarun Gupta,
Denys Makoviichuk,
Viktor Makoviychuk,
Philip H. S. Torr,
Mingfei Sun,
Shimon Whiteson
Abstract:
Most recently developed approaches to cooperative multi-agent reinforcement learning in the \emph{centralized training with decentralized execution} setting involve estimating a centralized, joint value function. In this paper, we demonstrate that, despite its various theoretical shortcomings, Independent PPO (IPPO), a form of independent learning in which each agent simply estimates its local val…
▽ More
Most recently developed approaches to cooperative multi-agent reinforcement learning in the \emph{centralized training with decentralized execution} setting involve estimating a centralized, joint value function. In this paper, we demonstrate that, despite its various theoretical shortcomings, Independent PPO (IPPO), a form of independent learning in which each agent simply estimates its local value function, can perform just as well as or better than state-of-the-art joint learning approaches on popular multi-agent benchmark suite SMAC with little hyperparameter tuning. We also compare IPPO to several variants; the results suggest that IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity.
△ Less
Submitted 18 November, 2020;
originally announced November 2020.
-
Improved Runtime Results for Simple Randomised Search Heuristics on Linear Functions with a Uniform Constraint
Authors:
Frank Neumann,
Mojgan Pourhassan,
Carsten Witt
Abstract:
In the last decade remarkable progress has been made in development of suitable proof techniques for analysing randomised search heuristics. The theoretical investigation of these algorithms on classes of functions is essential to the understanding of the underlying stochastic process. Linear functions have been traditionally studied in this area resulting in tight bounds on the expected optimisat…
▽ More
In the last decade remarkable progress has been made in development of suitable proof techniques for analysing randomised search heuristics. The theoretical investigation of these algorithms on classes of functions is essential to the understanding of the underlying stochastic process. Linear functions have been traditionally studied in this area resulting in tight bounds on the expected optimisation time of simple randomised search algorithms for this class of problems. Recently, the constrained version of this problem has gained attention and some theoretical results have also been obtained on this class of problems. In this paper we study the class of linear functions under uniform constraint and investigate the expected optimisation time of Randomised Local Search (RLS) and a simple evolutionary algorithm called (1+1) EA. We prove a tight bound of $Θ(n^2)$ for RLS and improve the previously best known upper bound of (1+1) EA from $O(n^2 \log (Bw_{\max}))$ to $O(n^2\log B)$ in expectation and to $O(n^2 \log n)$ with high probability, where $w_{\max}$ and $B$ are the maximum weight of the linear objective function and the bound of the uniform constraint, respectively. Also, we obtain a tight bound of $O(n^2)$ for the (1+1) EA on a special class of instances. We complement our theoretical studies by experimental investigations that consider different values of $B$ and also higher mutation rates that reflect the fact that $2$-bit flips are crucial for dealing with the uniform constraint.
△ Less
Submitted 21 October, 2020;
originally announced October 2020.
-
UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models
Authors:
Varun Ravi Kumar,
Senthil Yogamani,
Markus Bach,
Christian Witt,
Stefan Milz,
Patrick Mader
Abstract:
In classical computer vision, rectification is an integral part of multi-view depth estimation. It typically includes epipolar rectification and lens distortion correction. This process simplifies the depth estimation significantly, and thus it has been adopted in CNN approaches. However, rectification has several side effects, including a reduced field of view (FOV), resampling distortion, and se…
▽ More
In classical computer vision, rectification is an integral part of multi-view depth estimation. It typically includes epipolar rectification and lens distortion correction. This process simplifies the depth estimation significantly, and thus it has been adopted in CNN approaches. However, rectification has several side effects, including a reduced field of view (FOV), resampling distortion, and sensitivity to calibration errors. The effects are particularly pronounced in case of significant distortion (e.g., wide-angle fisheye cameras). In this paper, we propose a generic scale-aware self-supervised pipeline for estimating depth, euclidean distance, and visual odometry from unrectified monocular videos. We demonstrate a similar level of precision on the unrectified KITTI dataset with barrel distortion comparable to the rectified KITTI dataset. The intuition being that the rectification step can be implicitly absorbed within the CNN model, which learns the distortion model without increasing complexity. Our approach does not suffer from a reduced field of view and avoids computational costs for rectification at inference time. To further illustrate the general applicability of the proposed framework, we apply it to wide-angle fisheye cameras with 190$^\circ$ horizontal field of view. The training framework UnRectDepthNet takes in the camera distortion model as an argument and adapts projection and unprojection functions accordingly. The proposed algorithm is evaluated further on the KITTI rectified dataset, and we achieve state-of-the-art results that improve upon our previous work FisheyeDistanceNet. Qualitative results on a distorted test scene video sequence indicate excellent performance https://youtu.be/K6pbx3bU4Ss.
△ Less
Submitted 6 June, 2023; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Evolutionary Algorithms with Self-adjusting Asymmetric Mutation
Authors:
Amirhossein Rajabi,
Carsten Witt
Abstract:
Evolutionary Algorithms (EAs) and other randomized search heuristics are often considered as unbiased algorithms that are invariant with respect to different transformations of the underlying search space. However, if a certain amount of domain knowledge is available the use of biased search operators in EAs becomes viable. We consider a simple (1+1) EA for binary search spaces and analyze an asym…
▽ More
Evolutionary Algorithms (EAs) and other randomized search heuristics are often considered as unbiased algorithms that are invariant with respect to different transformations of the underlying search space. However, if a certain amount of domain knowledge is available the use of biased search operators in EAs becomes viable. We consider a simple (1+1) EA for binary search spaces and analyze an asymmetric mutation operator that can treat zero- and one-bits differently. This operator extends previous work by Jansen and Sudholt (ECJ 18(1), 2010) by allowing the operator asymmetry to vary according to the success rate of the algorithm. Using a self-adjusting scheme that learns an appropriate degree of asymmetry, we show improved runtime results on the class of functions OneMax$_a$ describing the number of matching bits with a fixed target $a\in\{0,1\}^n$.
△ Less
Submitted 16 June, 2020;
originally announced June 2020.
-
Improved Fixed-Budget Results via Drift Analysis
Authors:
Timo Kötzing,
Carsten Witt
Abstract:
Fixed-budget theory is concerned with computing or bounding the fitness value achievable by randomized search heuristics within a given budget of fitness function evaluations. Despite recent progress in fixed-budget theory, there is a lack of general tools to derive such results. We transfer drift theory, the key tool to derive expected optimization times, to the fixed-budged perspective. A first…
▽ More
Fixed-budget theory is concerned with computing or bounding the fitness value achievable by randomized search heuristics within a given budget of fitness function evaluations. Despite recent progress in fixed-budget theory, there is a lack of general tools to derive such results. We transfer drift theory, the key tool to derive expected optimization times, to the fixed-budged perspective. A first and easy-to-use statement concerned with iterating drift in so-called greed-admitting scenarios immediately translates into bounds on the expected function value. Afterwards, we consider a more general tool based on the well-known variable drift theorem. Applications of this technique to the LeadingOnes benchmark function yield statements that are more precise than the previous state of the art.
△ Less
Submitted 12 June, 2020;
originally announced June 2020.
-
Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning
Authors:
Shariq Iqbal,
Christian A. Schroeder de Witt,
Bei Peng,
Wendelin Böhmer,
Shimon Whiteson,
Fei Sha
Abstract:
Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities; however, common patterns of behavior often emerge among these agents/entities. Our method aims to leverage these commonalities by asking the question: ``What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?…
▽ More
Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities; however, common patterns of behavior often emerge among these agents/entities. Our method aims to leverage these commonalities by asking the question: ``What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?'' By posing this counterfactual question, we can recognize state-action trajectories within sub-groups of entities that we may have encountered in another task and use what we learned in that task to inform our prediction in the current one. We then reconstruct a prediction of the full returns as a combination of factors considering these disjoint groups of entities and train this ``randomly factorized" value function as an auxiliary objective for value-based multi-agent reinforcement learning. By doing so, our model can recognize and leverage similarities across tasks to improve learning efficiency in a multi-task setting. Our approach, Randomized Entity-wise Factorization for Imagined Learning (REFIL), outperforms all strong baselines by a significant margin in challenging multi-task StarCraft micromanagement settings.
△ Less
Submitted 11 June, 2021; v1 submitted 7 June, 2020;
originally announced June 2020.
-
Simulation-Based Inference for Global Health Decisions
Authors:
Christian Schroeder de Witt,
Bradley Gram-Hansen,
Nantas Nardelli,
Andrew Gambardella,
Rob Zinkov,
Puneet Dokania,
N. Siddharth,
Ana Belen Espinosa-Gonzalez,
Ara Darzi,
Philip Torr,
Atılım Güneş Baydin
Abstract:
The COVID-19 pandemic has highlighted the importance of in-silico epidemiological modelling in predicting the dynamics of infectious diseases to inform health policy and decision makers about suitable prevention and containment strategies. Work in this setting involves solving challenging inference and control problems in individual-based models of ever increasing complexity. Here we discuss recen…
▽ More
The COVID-19 pandemic has highlighted the importance of in-silico epidemiological modelling in predicting the dynamics of infectious diseases to inform health policy and decision makers about suitable prevention and containment strategies. Work in this setting involves solving challenging inference and control problems in individual-based models of ever increasing complexity. Here we discuss recent breakthroughs in machine learning, specifically in simulation-based inference, and explore its potential as a novel venue for model calibration to support the design and evaluation of public health interventions. To further stimulate research, we are develo** software interfaces that turn two cornerstone COVID-19 and malaria epidemiology models COVID-sim, (https://github.com/mrc-ide/covid-sim/) and OpenMalaria (https://github.com/SwissTPH/openmalaria) into probabilistic programs, enabling efficient interpretable Bayesian inference within those simulators.
△ Less
Submitted 14 May, 2020;
originally announced May 2020.
-
Self-Adjusting Evolutionary Algorithms for Multimodal Optimization
Authors:
Amirhossein Rajabi,
Carsten Witt
Abstract:
Recent theoretical research has shown that self-adjusting and self-adaptive mechanisms can provably outperform static settings in evolutionary algorithms for binary search spaces. However, the vast majority of these studies focuses on unimodal functions which do not require the algorithm to flip several bits simultaneously to make progress. In fact, existing self-adjusting algorithms are not desig…
▽ More
Recent theoretical research has shown that self-adjusting and self-adaptive mechanisms can provably outperform static settings in evolutionary algorithms for binary search spaces. However, the vast majority of these studies focuses on unimodal functions which do not require the algorithm to flip several bits simultaneously to make progress. In fact, existing self-adjusting algorithms are not designed to detect local optima and do not have any obvious benefit to cross large Hamming gaps.
We suggest a mechanism called stagnation detection that can be added as a module to existing evolutionary algorithms (both with and without prior self-adjusting algorithms). Added to a simple (1+1) EA, we prove an expected runtime on the well-known Jump benchmark that corresponds to an asymptotically optimal parameter setting and outperforms other mechanisms for multimodal optimization like heavy-tailed mutation. We also investigate the module in the context of a self-adjusting (1+$λ$) EA and show that it combines the previous benefits of this algorithm on unimodal problems with more efficient multimodal optimization.
To explore the limitations of the approach, we additionally present an example where both self-adjusting mechanisms, including stagnation detection, do not help to find a beneficial setting of the mutation rate. Finally, we investigate our module for stagnation detection experimentally.
△ Less
Submitted 2 June, 2020; v1 submitted 7 April, 2020;
originally announced April 2020.
-
Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
Authors:
Tabish Rashid,
Mikayel Samvelyan,
Christian Schroeder de Witt,
Gregory Farquhar,
Jakob Foerster,
Shimon Whiteson
Abstract:
In many real-world settings, a team of agents must coordinate its behaviour while acting in a decentralised fashion. At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised l…
▽ More
In many real-world settings, a team of agents must coordinate its behaviour while acting in a decentralised fashion. At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a mixing network that estimates joint action-values as a monotonic combination of per-agent values. We structurally enforce that the joint-action value is monotonic in the per-agent values, through the use of non-negative weights in the mixing network, which guarantees consistency between the centralised and decentralised policies. To evaluate the performance of QMIX, we propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning. We evaluate QMIX on a challenging set of SMAC scenarios and show that it significantly outperforms existing multi-agent reinforcement learning methods.
△ Less
Submitted 27 August, 2020; v1 submitted 19 March, 2020;
originally announced March 2020.
-
FACMAC: Factored Multi-Agent Centralised Policy Gradients
Authors:
Bei Peng,
Tabish Rashid,
Christian A. Schroeder de Witt,
Pierre-Alexandre Kamienny,
Philip H. S. Torr,
Wendelin Böhmer,
Shimon Whiteson
Abstract:
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. Like MADDPG, a popular multi-agent actor-critic method, our approach uses deep deterministic policy gradients to learn policies. However, FACMAC learns a centralised but factored critic, which combines per-agent utilit…
▽ More
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. Like MADDPG, a popular multi-agent actor-critic method, our approach uses deep deterministic policy gradients to learn policies. However, FACMAC learns a centralised but factored critic, which combines per-agent utilities into the joint action-value function via a non-linear monotonic function, as in QMIX, a popular multi-agent Q-learning algorithm. However, unlike QMIX, there are no inherent constraints on factoring the critic. We thus also employ a nonmonotonic factorisation and empirically demonstrate that its increased representational capacity allows it to solve some tasks that cannot be solved with monolithic, or monotonically factored critics. In addition, FACMAC uses a centralised policy gradient estimator that optimises over the entire joint action space, rather than optimising over each agent's action space separately as in MADDPG. This allows for more coordinated policy changes and fully reaps the benefits of a centralised critic. We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks. Empirical results demonstrate FACMAC's superior performance over MADDPG and other baselines on all three domains.
△ Less
Submitted 7 May, 2021; v1 submitted 14 March, 2020;
originally announced March 2020.