-
Pareto Conditioned Networks
Authors:
Mathieu Reymond,
Eugenio Bargiacchi,
Ann Nowé
Abstract:
In multi-objective optimization, learning all the policies that reach Pareto-efficient solutions is an expensive process. The set of optimal policies can grow exponentially with the number of objectives, and recovering all solutions requires an exhaustive exploration of the entire state space. We propose Pareto Conditioned Networks (PCN), a method that uses a single neural network to encompass all…
▽ More
In multi-objective optimization, learning all the policies that reach Pareto-efficient solutions is an expensive process. The set of optimal policies can grow exponentially with the number of objectives, and recovering all solutions requires an exhaustive exploration of the entire state space. We propose Pareto Conditioned Networks (PCN), a method that uses a single neural network to encompass all non-dominated policies. PCN associates every past transition with its episode's return. It trains the network such that, when conditioned on this same return, it should reenact said transition. In doing so we transform the optimization problem into a classification problem. We recover a concrete policy by conditioning the network on the desired Pareto-efficient solution. Our method is stable as it learns in a supervised fashion, thus avoiding moving target issues. Moreover, by using a single network, PCN scales efficiently with the number of objectives. Finally, it makes minimal assumptions on the shape of the Pareto front, which makes it suitable to a wider range of problems than previous state-of-the-art multi-objective reinforcement learning algorithms.
△ Less
Submitted 11 April, 2022;
originally announced April 2022.
-
A Practical Guide to Multi-Objective Reinforcement Learning and Planning
Authors:
Conor F. Hayes,
Roxana Rădulescu,
Eugenio Bargiacchi,
Johan Källström,
Matthew Macfarlane,
Mathieu Reymond,
Timothy Verstraeten,
Luisa M. Zintgraf,
Richard Dazeley,
Fredrik Heintz,
Enda Howley,
Athirai A. Irissappane,
Patrick Mannion,
Ann Nowé,
Gabriel Ramos,
Marcello Restelli,
Peter Vamplew,
Diederik M. Roijers
Abstract:
Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying pr…
▽ More
Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.
△ Less
Submitted 17 March, 2021;
originally announced March 2021.
-
Scalable Optimization for Wind Farm Control using Coordination Graphs
Authors:
Timothy Verstraeten,
Pieter-Jan Daems,
Eugenio Bargiacchi,
Diederik M. Roijers,
Pieter J. K. Libin,
Jan Helsen
Abstract:
Wind farms are a crucial driver toward the generation of ecological and renewable energy. Due to their rapid increase in capacity, contemporary wind farms need to adhere to strict constraints on power output to ensure stability of the electricity grid. Specifically, a wind farm controller is required to match the farm's power production with a power demand imposed by the grid operator. This is a n…
▽ More
Wind farms are a crucial driver toward the generation of ecological and renewable energy. Due to their rapid increase in capacity, contemporary wind farms need to adhere to strict constraints on power output to ensure stability of the electricity grid. Specifically, a wind farm controller is required to match the farm's power production with a power demand imposed by the grid operator. This is a non-trivial optimization problem, as complex dependencies exist between the wind turbines. State-of-the-art wind farm control typically relies on physics-based heuristics that fail to capture the full load spectrum that defines a turbine's health status. When this is not taken into account, the long-term viability of the farm's turbines is put at risk. Given the complex dependencies that determine a turbine's lifetime, learning a flexible and optimal control strategy requires a data-driven approach. However, as wind farms are large-scale multi-agent systems, optimizing control strategies over the full joint action space is intractable. We propose a new learning method for wind farm control that leverages the sparse wind farm structure to factorize the optimization problem. Using a Bayesian approach, based on multi-agent Thompson sampling, we explore the factored joint action space for configurations that match the demand, while considering the lifetime of turbines. We apply our method to a grid-like wind farm layout, and evaluate configurations using a state-of-the-art wind flow simulator. Our results are competitive with a physics-based heuristic approach in terms of demand error, while, contrary to the heuristic, our method prolongs the lifetime of high-risk turbines.
△ Less
Submitted 19 January, 2021;
originally announced January 2021.
-
Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Swee**
Authors:
Eugenio Bargiacchi,
Timothy Verstraeten,
Diederik M. Roijers,
Ann Nowé
Abstract:
We present a new model-based reinforcement learning algorithm, Cooperative Prioritized Swee**, for efficient learning in multi-agent Markov decision processes. The algorithm allows for sample-efficient learning on large problems by exploiting a factorization to approximate the value function. Our approach only requires knowledge about the structure of the problem in the form of a dynamic decisio…
▽ More
We present a new model-based reinforcement learning algorithm, Cooperative Prioritized Swee**, for efficient learning in multi-agent Markov decision processes. The algorithm allows for sample-efficient learning on large problems by exploiting a factorization to approximate the value function. Our approach only requires knowledge about the structure of the problem in the form of a dynamic decision network. Using this information, our method learns a model of the environment and performs temporal difference updates which affect multiple joint states and actions at once. Batch updates are additionally performed which efficiently back-propagate knowledge throughout the factored Q-function. Our method outperforms the state-of-the-art algorithm sparse cooperative Q-learning algorithm, both on the well-known SysAdmin benchmark and randomized environments.
△ Less
Submitted 15 January, 2020;
originally announced January 2020.
-
Multi-Agent Thompson Sampling for Bandit Applications with Sparse Neighbourhood Structures
Authors:
Timothy Verstraeten,
Eugenio Bargiacchi,
Pieter JK Libin,
Jan Helsen,
Diederik M Roijers,
Ann Nowé
Abstract:
Multi-agent coordination is prevalent in many real-world applications. However, such coordination is challenging due to its combinatorial nature. An important observation in this regard is that agents in the real world often only directly affect a limited set of neighbouring agents. Leveraging such loose couplings among agents is key to making coordination in multi-agent systems feasible. In this…
▽ More
Multi-agent coordination is prevalent in many real-world applications. However, such coordination is challenging due to its combinatorial nature. An important observation in this regard is that agents in the real world often only directly affect a limited set of neighbouring agents. Leveraging such loose couplings among agents is key to making coordination in multi-agent systems feasible. In this work, we focus on learning to coordinate. Specifically, we consider the multi-agent multi-armed bandit framework, in which fully cooperative loosely-coupled agents must learn to coordinate their decisions to optimize a common objective. We propose multi-agent Thompson sampling (MATS), a new Bayesian exploration-exploitation algorithm that leverages loose couplings. We provide a regret bound that is sublinear in time and low-order polynomial in the highest number of actions of a single agent for sparse coordination graphs. Additionally, we empirically show that MATS outperforms the state-of-the-art algorithm, MAUCE, on two synthetic benchmarks, and a novel benchmark with Poisson distributions. An example of a loosely-coupled multi-agent system is a wind farm. Coordination within the wind farm is necessary to maximize power production. As upstream wind turbines only affect nearby downstream turbines, we can use MATS to efficiently learn the optimal control mechanism for the farm. To demonstrate the benefits of our method toward applications we apply MATS to a realistic wind farm control task. In this task, wind turbines must coordinate their alignments with respect to the incoming wind vector in order to optimize power production. Our results show that MATS improves significantly upon state-of-the-art coordination methods in terms of performance, demonstrating the value of using MATS in practical applications with sparse neighbourhood structures.
△ Less
Submitted 7 February, 2020; v1 submitted 22 November, 2019;
originally announced November 2019.