-
Minimax Exploiter: A Data Efficient Approach for Competitive Self-Play
Authors:
Daniel Bairamian,
Philippe Marcotte,
Joshua Romoff,
Gabriel Robert,
Derek Nowrouzezahrai
Abstract:
Recent advances in Competitive Self-Play (CSP) have achieved, or even surpassed, human-level performance in complex game environments such as Dota 2 and StarCraft II using Distributed Multi-Agent Reinforcement Learning (MARL). One core component of these methods relies on creating a pool of learning agents -- consisting of the Main Agent, past versions of this agent, and Exploiter Agents -- where Exploiter Agents learn counter-strategies to the Main Agents. A key drawback of these approaches is the large computational cost and physical time required to train the system, making them impractical to deploy in highly iterative real-life settings such as video game productions. In this paper, we propose the Minimax Exploiter, a game-theoretic approach to exploiting Main Agents that leverages knowledge of its opponents, leading to significant increases in data efficiency. We validate our approach in a variety of settings, including simple turn-based games, the Arcade Learning Environment, and For Honor, a modern video game. The Minimax Exploiter consistently outperforms strong baselines, demonstrating improved stability and data efficiency, leading to a robust CSP-MARL method that is both flexible and easy to deploy.
Submitted 28 November, 2023;
originally announced November 2023.
-
Graph augmented Deep Reinforcement Learning in the GameRLand3D environment
Authors:
Edward Beeching,
Maxim Peter,
Philippe Marcotte,
Jilles Debangoye,
Olivier Simonin,
Joshua Romoff,
Christian Wolf
Abstract:
We address planning and navigation in challenging 3D video games featuring maps with disconnected regions reachable by agents using special actions. In this setting, classical symbolic planners are not applicable or difficult to adapt. We introduce a hybrid technique combining a low-level policy trained with reinforcement learning and a graph-based high-level classical planner. In addition to providing human-interpretable paths, the approach improves the generalization performance of an end-to-end approach in unseen maps, where it achieves a 20% absolute increase in success rate over a recurrent end-to-end agent on a point-to-point navigation task in yet unseen large-scale maps of size 1 km x 1 km. In an in-depth experimental study, we quantify the limitations of end-to-end Deep RL approaches in vast environments, and we also introduce "GameRLand3D", a new benchmark and soon-to-be-released environment that can generate complex procedural 3D maps for navigation tasks.
Submitted 22 December, 2021;
originally announced December 2021.
-
Achieving an optimal trade-off between revenue and energy peak within a smart grid environment
Authors:
Sezin Afsar,
Luce Brotcorne,
Patrice Marcotte,
Gilles Savard
Abstract:
We consider an energy provider whose goal is to simultaneously set revenue-maximizing prices and meet a peak load constraint. In our bilevel setting, the provider acts as a leader (upper level) that takes into account a smart grid (lower level) that minimizes the sum of users' disutilities. The latter bases its decisions on the hourly prices set by the leader, as well as the schedule preferences set by the users for each task. Considering both the monopolistic and competitive situations, we illustrate numerically the validity of the approach, which achieves an 'optimal' trade-off between three objectives: revenue, user cost, and peak demand.
Submitted 21 January, 2016;
originally announced January 2016.
-
An Approximation Algorithm for Stackelberg Network Pricing
Authors:
S. Roch,
P. Marcotte,
G. Savard
Abstract:
We consider the problem of maximizing the revenue raised from tolls set on the arcs of a transportation network, under the constraint that users are assigned to toll-compatible shortest paths. We first prove that this problem is strongly NP-hard. We then provide a polynomial-time algorithm with a worst-case precision guarantee of $\frac{1}{2}\log_2 m_T + 1$, where $m_T$ denotes the number of toll arcs. Finally, we show that the approximation is tight with respect to a natural relaxation by constructing a family of instances for which the relaxation gap is reached.
Submitted 26 September, 2004;
originally announced September 2004.