Showing 1–37 of 37 results for author: Pérolat, J

  1. arXiv:2303.03196  [pdf, other]

    cs.GT cs.AI cs.LG cs.MA

    Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning

    Authors: Marc Lanctot, John Schultz, Neil Burch, Max Olan Smith, Daniel Hennes, Thomas Anthony, Julien Perolat

    Abstract: Progress in fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from checkers and the classic UCI data sets to Go and Diplomacy. In sequential decision-making, agent evaluation has largely been restricted to few interactions against experts, with the aim to reach some desired level of performance (e.g. beating a human professional player). We pro…

    Submitted 31 October, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: 25 pages, 8 figures, Accepted at TMLR October 2023

  2. arXiv:2209.10958  [pdf, ps, other]

    cs.MA cs.AI

    Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments

    Authors: Ian Gemp, Thomas Anthony, Yoram Bachrach, Avishkar Bhoopchand, Kalesha Bullard, Jerome Connor, Vibhavari Dasagi, Bart De Vylder, Edgar Duenez-Guzman, Romuald Elie, Richard Everett, Daniel Hennes, Edward Hughes, Mina Khan, Marc Lanctot, Kate Larson, Guy Lever, Siqi Liu, Luke Marris, Kevin R. McKee, Paul Muller, Julien Perolat, Florian Strub, Andrea Tacchetti, Eugene Tarassov, et al. (2 additional authors not shown)

    Abstract: The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in d…

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: Published in AI Communications 2022

  3. arXiv:2208.10138  [pdf, other]

    cs.GT stat.ML

    Learning Correlated Equilibria in Mean-Field Games

    Authors: Paul Muller, Romuald Elie, Mark Rowland, Mathieu Lauriere, Julien Perolat, Sarah Perrin, Matthieu Geist, Georgios Piliouras, Olivier Pietquin, Karl Tuyls

    Abstract: The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-…

    Submitted 22 August, 2022; originally announced August 2022.

  4. arXiv:2206.15378  [pdf, other]

    cs.AI cs.GT cs.MA

    Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

    Authors: Julien Perolat, Bart de Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, et al. (9 additional authors not shown)

    Abstract: We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additiona…

    Submitted 30 June, 2022; originally announced June 2022.

  5. arXiv:2205.12944  [pdf, other]

    cs.LG cs.AI cs.GT math.OC

    Learning in Mean Field Games: A Survey

    Authors: Mathieu Laurière, Sarah Perrin, Julien Pérolat, Sertan Girgin, Paul Muller, Romuald Élie, Matthieu Geist, Olivier Pietquin

    Abstract: Non-cooperative and cooperative games with a very large number of players have many applications but remain generally intractable when the number of players increases. Introduced by Lasry and Lions, and Huang, Caines and Malhamé, Mean Field Games (MFGs) rely on a mean-field approximation to allow the number of players to grow to infinity. Traditional methods for solving these games generally rely…

    Submitted 20 February, 2024; v1 submitted 25 May, 2022; originally announced May 2022.

  6. arXiv:2203.11973  [pdf, other]

    cs.LG math.OC stat.ML

    Scalable Deep Reinforcement Learning Algorithms for Mean Field Games

    Authors: Mathieu Laurière, Sarah Perrin, Sertan Girgin, Paul Muller, Ayush Jain, Theophile Cabannes, Georgios Piliouras, Julien Pérolat, Romuald Élie, Olivier Pietquin, Matthieu Geist

    Abstract: Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents. Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods. One limiting factor to further scale up using RL is that existing algorithms to solve MFGs require the mixing of approximated quant…

    Submitted 17 June, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

  7. arXiv:2111.08350  [pdf, other]

    cs.GT cs.MA

    Learning Equilibria in Mean-Field Games: Introducing Mean-Field PSRO

    Authors: Paul Muller, Mark Rowland, Romuald Elie, Georgios Piliouras, Julien Perolat, Mathieu Lauriere, Raphael Marinier, Olivier Pietquin, Karl Tuyls

    Abstract: Recent advances in multiagent learning have seen the introduction of a family of algorithms that revolve around the population-based training method PSRO, showing convergence to Nash, correlated and coarse correlated equilibria. Notably, when the number of agents increases, learning best-responses becomes exponentially more difficult, and as such hampers PSRO training methods. The paradigm of mean-…

    Submitted 29 August, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

    Comments: AAMAS

  8. arXiv:2110.11943  [pdf, other]

    math.DS cs.MA cs.NI eess.SY math.OC

    Solving N-player dynamic routing games with congestion: a mean field approach

    Authors: Theophile Cabannes, Mathieu Lauriere, Julien Perolat, Raphael Marinier, Sertan Girgin, Sarah Perrin, Olivier Pietquin, Alexandre M. Bayen, Eric Goubault, Romuald Elie

    Abstract: The recent emergence of navigational tools has changed traffic patterns and has now enabled new types of congestion-aware routing control like dynamic road pricing. Using the fundamental diagram of traffic flows - applied in macroscopic and mesoscopic traffic modeling - the article introduces a new N-player dynamic routing game with explicit congestion dynamics. The model is well-posed and can rep…

    Submitted 27 October, 2021; v1 submitted 22 October, 2021; originally announced October 2021.

  9. arXiv:2110.10819  [pdf, other]

    cs.LG cs.AI

    Shaking the foundations: delusions in sequence models for interaction and control

    Authors: Pedro A. Ortega, Markus Kunesch, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Joel Veness, Jonas Buchli, Jonas Degrave, Bilal Piot, Julien Perolat, Tom Everitt, Corentin Tallec, Emilio Parisotto, Tom Erez, Yutian Chen, Scott Reed, Marcus Hutter, Nando de Freitas, Shane Legg

    Abstract: The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains. One important problem class that has remained relatively elusive however is purposeful adaptive behavior. Currently there is a common perception that sequence models "lack the understanding of the cause and effect of…

    Submitted 20 October, 2021; originally announced October 2021.

    Comments: DeepMind Tech Report, 16 pages, 4 figures

  10. arXiv:2109.09717  [pdf, other]

    cs.LG cs.GT cs.MA math.OC

    Generalization in Mean Field Games by Learning Master Policies

    Authors: Sarah Perrin, Mathieu Laurière, Julien Pérolat, Romuald Élie, Matthieu Geist, Olivier Pietquin

    Abstract: Mean Field Games (MFGs) can potentially scale multi-agent systems to extremely large populations of agents. Yet, most of the literature assumes a single initial distribution for the agents, which limits the practical applications of MFGs. Machine Learning has the potential to solve a wider diversity of MFG problems thanks to generalizations capacities. We study how to leverage these generalization…

    Submitted 20 September, 2021; originally announced September 2021.

  11. arXiv:2106.03787  [pdf, other]

    cs.LG cs.MA

    Concave Utility Reinforcement Learning: the Mean-Field Game Viewpoint

    Authors: Matthieu Geist, Julien Pérolat, Mathieu Laurière, Romuald Elie, Sarah Perrin, Olivier Bachem, Rémi Munos, Olivier Pietquin

    Abstract: Concave Utility Reinforcement Learning (CURL) extends RL from linear to concave utilities in the occupancy measure induced by the agent's policy. This encompasses not only RL but also imitation learning and exploration, among others. Yet, this more general paradigm invalidates the classical Bellman equations, and calls for new algorithms. Mean-field Games (MFGs) are a continuous approximation of m…

    Submitted 16 February, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: AAMAS 2022

  12. arXiv:2105.07933  [pdf, other]

    cs.MA cs.AI

    Mean Field Games Flock! The Reinforcement Learning Way

    Authors: Sarah Perrin, Mathieu Laurière, Julien Pérolat, Matthieu Geist, Romuald Élie, Olivier Pietquin

    Abstract: We present a method enabling a large number of agents to learn how to flock, which is a natural behavior observed in large populations of animals. This problem has drawn a lot of interest but requires many structural assumptions and is tractable only in small dimensions. We phrase this problem as a Mean Field Game (MFG), where each individual chooses its acceleration depending on the population be…

    Submitted 17 May, 2021; originally announced May 2021.

  13. arXiv:2103.00623  [pdf, other]

    cs.AI

    Scaling up Mean Field Games with Online Mirror Descent

    Authors: Julien Perolat, Sarah Perrin, Romuald Elie, Mathieu Laurière, Georgios Piliouras, Matthieu Geist, Karl Tuyls, Olivier Pietquin

    Abstract: We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD). We show that continuous-time OMD provably converges to a Nash equilibrium under a natural and well-motivated set of monotonicity assumptions. This theoretical result nicely extends to multi-population games and to settings involving common noise. A thorough experimental investigation on vari…

    Submitted 28 February, 2021; originally announced March 2021.

  14. arXiv:2011.09192  [pdf, other]

    cs.AI cs.GT cs.MA

    Game Plan: What AI can do for Football, and What Football can do for AI

    Authors: Karl Tuyls, Shayegan Omidshafiei, Paul Muller, Zhe Wang, Jerome Connor, Daniel Hennes, Ian Graham, William Spearman, Tim Waskett, Dafydd Steele, Pauline Luc, Adria Recasens, Alexandre Galashov, Gregory Thornton, Romuald Elie, Pablo Sprechmann, Pol Moreno, Kris Cao, Marta Garnelo, Praneet Dutta, Michal Valko, Nicolas Heess, Alex Bridgland, Julien Perolat, Bart De Vylder, et al. (11 additional authors not shown)

    Abstract: The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with t…

    Submitted 18 November, 2020; originally announced November 2020.

  15. arXiv:2008.12234  [pdf, other]

    cs.AI cs.LG

    The Advantage Regret-Matching Actor-Critic

    Authors: Audrūnas Gruslys, Marc Lanctot, Rémi Munos, Finbarr Timbers, Martin Schmid, Julien Perolat, Dustin Morrill, Vinicius Zambaldi, Jean-Baptiste Lespiau, John Schultz, Mohammad Gheshlaghi Azar, Michael Bowling, Karl Tuyls

    Abstract: Regret minimization has played a key role in online learning, equilibrium computation in games, and reinforcement learning (RL). In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsideration of past behavior. We propose a model-free RL algorithm, the Advantage Regret-Matching Actor-Critic (ARMAC): rather than saving past state-action data, ARMAC…

    Submitted 27 August, 2020; originally announced August 2020.

  16. arXiv:2007.03458  [pdf, other]

    math.OC cs.AI

    Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications

    Authors: Sarah Perrin, Julien Perolat, Mathieu Laurière, Matthieu Geist, Romuald Elie, Olivier Pietquin

    Abstract: In this paper, we deepen the analysis of continuous time Fictitious Play learning algorithm to the consideration of various finite state Mean Field Game settings (finite horizon, $γ$-discounted), allowing in particular for the introduction of an additional common noise. We first present a theoretical convergence analysis of the continuous time Fictitious Play process and prove that the induced e…

    Submitted 26 October, 2020; v1 submitted 5 July, 2020; originally announced July 2020.

  17. arXiv:2006.04635  [pdf, other]

    cs.LG cs.AI cs.GT cs.MA stat.ML

    Learning to Play No-Press Diplomacy with Best Response Policy Iteration

    Authors: Thomas Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian Gemp, Thomas C. Hudson, Nicolas Porcel, Marc Lanctot, Julien Pérolat, Richard Everett, Roman Werpachowski, Satinder Singh, Thore Graepel, Yoram Bachrach

    Abstract: Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and Starcraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects.…

    Submitted 4 January, 2022; v1 submitted 8 June, 2020; originally announced June 2020.

  18. Navigating the Landscape of Multiplayer Games

    Authors: Shayegan Omidshafiei, Karl Tuyls, Wojciech M. Czarnecki, Francisco C. Santos, Mark Rowland, Jerome Connor, Daniel Hennes, Paul Muller, Julien Perolat, Bart De Vylder, Audrunas Gruslys, Remi Munos

    Abstract: Multiplayer games have long been used as testbeds in artificial intelligence research, aptly referred to as the Drosophila of artificial intelligence. Traditionally, researchers have focused on using well-known games to build strong agents. This progress, however, can be better informed by characterizing games and their topological landscape. Tackling this latter question can facilitate understand…

    Submitted 17 November, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

  19. arXiv:2002.08456  [pdf, other]

    cs.GT cs.LG stat.ML

    From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization

    Authors: Julien Perolat, Remi Munos, Jean-Baptiste Lespiau, Shayegan Omidshafiei, Mark Rowland, Pedro Ortega, Neil Burch, Thomas Anthony, David Balduzzi, Bart De Vylder, Georgios Piliouras, Marc Lanctot, Karl Tuyls

    Abstract: In this paper we investigate the Follow the Regularized Leader dynamics in sequential imperfect information games (IIG). We generalize existing results of Poincaré recurrence from normal-form games to zero-sum two-player imperfect information games and other sequential game settings. We then investigate how adapting the reward (by adding a regularization term) of the game can give strong convergen…

    Submitted 19 February, 2020; originally announced February 2020.

    Comments: 43 pages

  20. arXiv:1909.12823  [pdf, other]

    cs.MA cs.AI cs.LG

    A Generalized Training Approach for Multiagent Learning

    Authors: Paul Muller, Shayegan Omidshafiei, Mark Rowland, Karl Tuyls, Julien Perolat, Siqi Liu, Daniel Hennes, Luke Marris, Marc Lanctot, Edward Hughes, Zhe Wang, Guy Lever, Nicolas Heess, Thore Graepel, Remi Munos

    Abstract: This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO). PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games. Despite this, prior studies of PSRO have been focused on two-…

    Submitted 14 February, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

  21. arXiv:1909.09849  [pdf, other]

    cs.MA cs.AI cs.LG

    Multiagent Evaluation under Incomplete Information

    Authors: Mark Rowland, Shayegan Omidshafiei, Karl Tuyls, Julien Perolat, Michal Valko, Georgios Piliouras, Remi Munos

    Abstract: This paper investigates the evaluation of learned multiagent strategies in the incomplete information setting, which plays a critical role in ranking and training of agents. Traditionally, researchers have relied on Elo ratings for this purpose, with recent works also using methods based on Nash equilibria. Unfortunately, Elo is unable to handle intransitive agent interactions, and other technique…

    Submitted 10 January, 2020; v1 submitted 21 September, 2019; originally announced September 2019.

  22. arXiv:1908.09453  [pdf, other]

    cs.LG cs.AI cs.GT cs.MA

    OpenSpiel: A Framework for Reinforcement Learning in Games

    Authors: Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, et al. (2 additional authors not shown)

    Abstract: OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi- agent) zero-sum, cooperative and general-sum, one-shot and sequential, strictly turn-taking and simultaneous-move, perfect and imperfect information games, as well as traditional multiagent environments such as (partia…

    Submitted 26 September, 2020; v1 submitted 25 August, 2019; originally announced August 2019.

  23. arXiv:1907.02633  [pdf, other]

    math.OC cs.LG stat.ML

    On the Convergence of Model Free Learning in Mean Field Games

    Authors: Romuald Elie, Julien Pérolat, Mathieu Laurière, Matthieu Geist, Olivier Pietquin

    Abstract: Learning by experience in Multi-Agent Systems (MAS) is a difficult and exciting task, due to the lack of stationarity of the environment, whose dynamics evolves as the population learns. In order to design scalable algorithms for systems with a large population of interacting agents (e.g. swarms), this paper focuses on Mean Field MAS, where the number of agents is asymptotically infinite. Recently…

    Submitted 20 February, 2020; v1 submitted 4 July, 2019; originally announced July 2019.

    Journal ref: AAAI 2020 conference proceedings

  24. arXiv:1906.09831  [pdf, other]

    cs.GT cs.AI cs.LG

    Foolproof Cooperative Learning

    Authors: Alexis Jacq, Julien Perolat, Matthieu Geist, Olivier Pietquin

    Abstract: This paper extends the notion of learning equilibrium in game theory from matrix games to stochastic games. We introduce Foolproof Cooperative Learning (FCL), an algorithm that converges to a Tit-for-Tat behavior. It allows cooperative strategies when played against itself while being not exploitable by selfish players. We prove that in repeated symmetric games, this algorithm is a learning equili…

    Submitted 15 October, 2020; v1 submitted 24 June, 2019; originally announced June 2019.

    Journal ref: Proceedings of The 12th Asian Conference on Machine Learning, PMLR 129:401-416, 2020

  25. arXiv:1906.00190  [pdf, other]

    cs.LG cs.AI stat.ML

    Neural Replicator Dynamics

    Authors: Daniel Hennes, Dustin Morrill, Shayegan Omidshafiei, Remi Munos, Julien Perolat, Marc Lanctot, Audrunas Gruslys, Jean-Baptiste Lespiau, Paavo Parmas, Edgar Duenez-Guzman, Karl Tuyls

    Abstract: Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning. Using these algorithms in multiagent environments poses problems such as nonstationarity and instability. In this paper, we first demonstrate that standard softmax-based policy gradient can be prone to poor performance in the presence of even the most benign nonstati…

    Submitted 26 February, 2020; v1 submitted 1 June, 2019; originally announced June 2019.

  26. arXiv:1903.05614  [pdf, other]

    cs.AI cs.GT cs.LG

    Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent

    Authors: Edward Lockhart, Marc Lanctot, Julien Pérolat, Jean-Baptiste Lespiau, Dustin Morrill, Finbarr Timbers, Karl Tuyls

    Abstract: In this paper, we present exploitability descent, a new algorithm to compute approximate equilibria in two-player zero-sum extensive-form games with imperfect information, by direct policy optimization against worst-case opponents. We prove that when following this optimization, the exploitability of a player's strategy converges asymptotically to zero, and hence when both players employ this opti…

    Submitted 12 June, 2020; v1 submitted 13 March, 2019; originally announced March 2019.

    Comments: IJCAI 2019, 11 pages, 1 figure

  27. arXiv:1903.01373  [pdf, other]

    cs.MA cs.GT

    $α$-Rank: Multi-Agent Evaluation by Evolution

    Authors: Shayegan Omidshafiei, Christos Papadimitriou, Georgios Piliouras, Karl Tuyls, Mark Rowland, Jean-Baptiste Lespiau, Wojciech M. Czarnecki, Marc Lanctot, Julien Perolat, Remi Munos

    Abstract: We introduce $α$-Rank, a principled evolutionary dynamics methodology for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs). The approach leverages continuous- and discrete-time evolutionary dynamical systems applied to empirical games, and scales tractably in the number of…

    Submitted 4 October, 2019; v1 submitted 4 March, 2019; originally announced March 2019.

  28. arXiv:1901.08106  [pdf, other]

    cs.LG cs.GT cs.MA stat.ML

    Open-ended Learning in Symmetric Zero-sum Games

    Authors: David Balduzzi, Marta Garnelo, Yoram Bachrach, Wojciech M. Czarnecki, Julien Perolat, Max Jaderberg, Thore Graepel

    Abstract: Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them `winner' and `loser'. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective -- we want agen…

    Submitted 13 May, 2019; v1 submitted 23 January, 2019; originally announced January 2019.

    Comments: ICML 2019, final version

  29. arXiv:1812.07019  [pdf, other]

    cs.NE cs.MA q-bio.PE

    Malthusian Reinforcement Learning

    Authors: Joel Z. Leibo, Julien Perolat, Edward Hughes, Steven Wheelwright, Adam H. Marblestone, Edgar Duéñez-Guzmán, Peter Sunehag, Iain Dunning, Thore Graepel

    Abstract: Here we explore a new algorithmic framework for multi-agent reinforcement learning, called Malthusian reinforcement learning, which extends self-play to include fitness-linked population size dynamics that drive ongoing innovation. In Malthusian RL, increases in a subpopulation's average return drive subsequent increases in its size, just as Thomas Malthus argued in 1798 was the relationship betwe…

    Submitted 3 March, 2019; v1 submitted 17 December, 2018; originally announced December 2018.

    Comments: 9 pages, 2 tables, 4 figures

  30. arXiv:1810.09026  [pdf, other]

    cs.LG cs.AI cs.GT cs.MA stat.ML

    Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

    Authors: Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, Michael Bowling

    Abstract: Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing discounted return. In this paper, we examine the role of these policy gradient and actor-critic algorithms in partially-observable multiagent environments.…

    Submitted 12 June, 2020; v1 submitted 21 October, 2018; originally announced October 2018.

    Comments: NeurIPS 2018

  31. arXiv:1809.07802  [pdf, other]

    cs.LG cs.CV stat.ML

    Playing the Game of Universal Adversarial Perturbations

    Authors: Julien Perolat, Mateusz Malinowski, Bilal Piot, Olivier Pietquin

    Abstract: We study the problem of learning classifiers robust to universal adversarial perturbations. While prior work approaches this problem via robust optimization, adversarial training, or input transformation, we instead phrase it as a two-player zero-sum game. In this new formulation, both players simultaneously play the same game, where one player chooses a classifier that minimizes a classification…

    Submitted 25 September, 2018; v1 submitted 20 September, 2018; originally announced September 2018.

  32. arXiv:1806.02643  [pdf, other]

    cs.LG cs.GT stat.ML

    Re-evaluating Evaluation

    Authors: David Balduzzi, Karl Tuyls, Julien Perolat, Thore Graepel

    Abstract: Progress in machine learning is measured by careful evaluation on problems of outstanding common interest. However, the proliferation of benchmark suites and environments, adversarial attacks, and other complications has diluted the basic evaluation model by overwhelming researchers with choices. Deliberate or accidental cherry picking is increasingly likely, and designing well-balanced evaluation…

    Submitted 30 October, 2018; v1 submitted 7 June, 2018; originally announced June 2018.

    Comments: NIPS 2018, final version

  33. arXiv:1803.06376  [pdf, other]

    cs.GT cs.MA

    A Generalised Method for Empirical Game Theoretic Analysis

    Authors: Karl Tuyls, Julien Perolat, Marc Lanctot, Joel Z Leibo, Thore Graepel

    Abstract: This paper provides theoretical bounds for empirical game theoretical analysis of complex multi-agent interactions. We provide insights in the empirical meta game showing that a Nash equilibrium of the meta-game is an approximate Nash equilibrium of the true underlying game. We investigate and show how many data samples are required to obtain a close enough approximation of the underlying game. Ad…

    Submitted 16 March, 2018; originally announced March 2018.

    Comments: will appear at AAMAS'18

  34. arXiv:1711.05074  [pdf, other]

    cs.GT cs.MA

    Symmetric Decomposition of Asymmetric Games

    Authors: Karl Tuyls, Julien Perolat, Marc Lanctot, Georg Ostrovski, Rahul Savani, Joel Leibo, Toby Ord, Thore Graepel, Shane Legg

    Abstract: We introduce new theoretical insights into two-population asymmetric games allowing for an elegant symmetric decomposition into two single population symmetric games. Specifically, we show how an asymmetric bimatrix game (A,B) can be decomposed into its symmetric counterparts by envisioning and investigating the payoff tables (A and B) that constitute the asymmetric game, as two independent, singl…

    Submitted 17 January, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

    Comments: Paper is published in Scientific Reports; https://www.nature.com/articles/s41598-018-19194-4, 2018

  35. arXiv:1711.00832  [pdf, other]

    cs.AI cs.GT cs.LG cs.MA

    A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning

    Authors: Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Perolat, David Silver, Thore Graepel

    Abstract: To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL). The simplest form is independent reinforcement learning (InRL), where each agent treats its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit to t…

    Submitted 7 November, 2017; v1 submitted 2 November, 2017; originally announced November 2017.

    Comments: Camera-ready copy of NIPS 2017 paper, including appendix

  36. arXiv:1707.06600  [pdf, other]

    cs.MA cs.NE q-bio.PE

    A multi-agent reinforcement learning model of common-pool resource appropriation

    Authors: Julien Perolat, Joel Z. Leibo, Vinicius Zambaldi, Charles Beattie, Karl Tuyls, Thore Graepel

    Abstract: Humanity faces numerous problems of common-pool resource appropriation. This class of multi-agent social dilemma includes the problems of ensuring sustainable use of fresh water, common fisheries, grazing pastures, and irrigation systems. Abstract models of common-pool resource appropriation based on non-cooperative game theory predict that self-interested agents will generally fail to find social…

    Submitted 6 September, 2017; v1 submitted 20 July, 2017; originally announced July 2017.

    Comments: 15 pages, 11 figures

  37. arXiv:1606.08718  [pdf, ps, other]

    cs.GT

    Learning Nash Equilibrium for General-Sum Markov Games from Batch Data

    Authors: Julien Pérolat, Florian Strub, Bilal Piot, Olivier Pietquin

    Abstract: This paper addresses the problem of learning a Nash equilibrium in $γ$-discounted multiplayer general-sum Markov Games (MG). A key component of this model is the possibility for the players to either collaborate or team apart to increase their rewards. Building an artificial player for general-sum MGs implies to learn more complex strategies which are impossible to obtain by using techniques devel…

    Submitted 6 March, 2017; v1 submitted 28 June, 2016; originally announced June 2016.

    Comments: 20th International Conference on Artificial Intelligence and Statistics (AISTATS) 2017, Fort Lauderdale, Florida, USA. JMLR: W&CP volume 54

    Report number: CRIStAL, UMR 9189