Skip to main content

Showing 1–18 of 18 results for author: Anthony, T

Searching in archive cs. Search in all archives.
.
  1. arXiv:2312.03121  [pdf, other

    cs.AI cs.GT cs.MA

    Evaluating Agents using Social Choice Theory

    Authors: Marc Lanctot, Kate Larson, Yoram Bachrach, Luke Marris, Zun Li, Avishkar Bhoopchand, Thomas Anthony, Brian Tanner, Anna Koop

    Abstract: We argue that many general evaluation problems can be viewed through the lens of voting theory. Each task is interpreted as a separate voter, which requires only ordinal rankings or pairwise comparisons of agents to produce an overall evaluation. By viewing the aggregator as a social welfare function, we are able to leverage centuries of research in social choice theory to derive principled evalua… ▽ More

    Submitted 6 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

  2. arXiv:2305.00768  [pdf, other

    cs.MA stat.ML

    Heterogeneous Social Value Orientation Leads to Meaningful Diversity in Sequential Social Dilemmas

    Authors: Udari Madhushani, Kevin R. McKee, John P. Agapiou, Joel Z. Leibo, Richard Everett, Thomas Anthony, Edward Hughes, Karl Tuyls, Edgar A. Duéñez-Guzmán

    Abstract: In social psychology, Social Value Orientation (SVO) describes an individual's propensity to allocate resources between themself and others. In reinforcement learning, SVO has been instantiated as an intrinsic motivation that remaps an agent's rewards based on particular target distributions of group reward. Prior studies show that groups of agents endowed with heterogeneous SVO learn diverse poli… ▽ More

    Submitted 1 May, 2023; originally announced May 2023.

  3. arXiv:2303.03196  [pdf, other

    cs.GT cs.AI cs.LG cs.MA

    Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning

    Authors: Marc Lanctot, John Schultz, Neil Burch, Max Olan Smith, Daniel Hennes, Thomas Anthony, Julien Perolat

    Abstract: Progress in fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from checkers and the classic UCI data sets to Go and Diplomacy. In sequential decision-making, agent evaluation has largely been restricted to few interactions against experts, with the aim to reach some desired level of performance (e.g. beating a human professional player). We pro… ▽ More

    Submitted 31 October, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: 25 pages, 8 figures, Accepted at TMLR October 2023

  4. arXiv:2210.09257  [pdf, other

    cs.LG cs.AI cs.GT cs.MA

    Turbocharging Solution Concepts: Solving NEs, CEs and CCEs with Neural Equilibrium Solvers

    Authors: Luke Marris, Ian Gemp, Thomas Anthony, Andrea Tacchetti, Siqi Liu, Karl Tuyls

    Abstract: Solution concepts such as Nash Equilibria, Correlated Equilibria, and Coarse Correlated Equilibria are useful components for many multiagent machine learning algorithms. Unfortunately, solving a normal-form game could take prohibitive or non-deterministic time to converge, and could fail. We introduce the Neural Equilibrium Solver which utilizes a special equivariant neural network architecture to… ▽ More

    Submitted 15 April, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  5. arXiv:2209.10958  [pdf, ps, other

    cs.MA cs.AI

    Develo**, Evaluating and Scaling Learning Agents in Multi-Agent Environments

    Authors: Ian Gemp, Thomas Anthony, Yoram Bachrach, Avishkar Bhoopchand, Kalesha Bullard, Jerome Connor, Vibhavari Dasagi, Bart De Vylder, Edgar Duenez-Guzman, Romuald Elie, Richard Everett, Daniel Hennes, Edward Hughes, Mina Khan, Marc Lanctot, Kate Larson, Guy Lever, Siqi Liu, Luke Marris, Kevin R. McKee, Paul Muller, Julien Perolat, Florian Strub, Andrea Tacchetti, Eugene Tarassov , et al. (2 additional authors not shown)

    Abstract: The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in d… ▽ More

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: Published in AI Communications 2022

  6. arXiv:2206.15378  [pdf, other

    cs.AI cs.GT cs.MA

    Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

    Authors: Julien Perolat, Bart de Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot , et al. (9 additional authors not shown)

    Abstract: We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additiona… ▽ More

    Submitted 30 June, 2022; originally announced June 2022.

  7. arXiv:2106.01901  [pdf, other

    cs.MA

    Iterative Empirical Game Solving via Single Policy Best Response

    Authors: Max Olan Smith, Thomas Anthony, Michael P. Wellman

    Abstract: Policy-Space Response Oracles (PSRO) is a general algorithmic framework for learning policies in multiagent systems by interleaving empirical game analysis with deep reinforcement learning (Deep RL). At each iteration, Deep RL is invoked to train a best response to a mixture of opponent policies. The repeated application of Deep RL poses an expensive computational burden as we look to apply this a… ▽ More

    Submitted 3 June, 2021; originally announced June 2021.

    Journal ref: ICLR 2021

  8. arXiv:2106.01285  [pdf, other

    cs.GT

    Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent

    Authors: Ian Gemp, Rahul Savani, Marc Lanctot, Yoram Bachrach, Thomas Anthony, Richard Everett, Andrea Tacchetti, Tom Eccles, János Kramár

    Abstract: Nash equilibrium is a central concept in game theory. Several Nash solvers exist, yet none scale to normal-form games with many actions and many players, especially those with payoff tensors too big to be stored in memory. In this work, we propose an approach that iteratively improves an approximation to a Nash equilibrium through joint play. It accomplishes this by tracing a previously establishe… ▽ More

    Submitted 4 February, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

    Comments: Published in AAMAS 2022 (code available as part of open_spiel on github -- search ADIDAS in repo)

  9. arXiv:2011.04021  [pdf, other

    cs.AI cs.LG

    On the role of planning in model-based deep reinforcement learning

    Authors: Jessica B. Hamrick, Abram L. Friesen, Feryal Behbahani, Arthur Guez, Fabio Viola, Sims Witherspoon, Thomas Anthony, Lars Buesing, Petar Veličković, Théophane Weber

    Abstract: Model-based planning is often thought to be necessary for deep, careful reasoning and generalization in artificial agents. While recent successes of model-based reinforcement learning (MBRL) with deep function approximation have strengthened this hypothesis, the resulting diversity of model-based methods has also made it difficult to track which components drive success and why. In this paper, we… ▽ More

    Submitted 17 March, 2021; v1 submitted 8 November, 2020; originally announced November 2020.

    Comments: Published at ICLR 2021

  10. arXiv:2009.14180  [pdf, other

    cs.MA

    Learning to Play against Any Mixture of Opponents

    Authors: Max Olan Smith, Thomas Anthony, Yongzhao Wang, Michael P. Wellman

    Abstract: Intuitively, experience playing against one mixture of opponents in a given domain should be relevant for a different mixture in the same domain. We propose a transfer learning method, Q-Mixing, that starts by learning Q-values against each pure-strategy opponent. Then a Q-value for any distribution of opponent strategies is approximated by appropriately averaging the separately learned Q-values.… ▽ More

    Submitted 3 June, 2021; v1 submitted 29 September, 2020; originally announced September 2020.

  11. arXiv:2006.04635  [pdf, other

    cs.LG cs.AI cs.GT cs.MA stat.ML

    Learning to Play No-Press Diplomacy with Best Response Policy Iteration

    Authors: Thomas Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian Gemp, Thomas C. Hudson, Nicolas Porcel, Marc Lanctot, Julien Pérolat, Richard Everett, Roman Werpachowski, Satinder Singh, Thore Graepel, Yoram Bachrach

    Abstract: Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and Starcraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects.… ▽ More

    Submitted 4 January, 2022; v1 submitted 8 June, 2020; originally announced June 2020.

  12. arXiv:2003.00799  [pdf, other

    cs.GT cs.LG cs.MA stat.ML

    Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games

    Authors: Edward Hughes, Thomas W. Anthony, Tom Eccles, Joel Z. Leibo, David Balduzzi, Yoram Bachrach

    Abstract: Zero-sum games have long guided artificial intelligence research, since they possess both a rich strategy space of best-responses and a clear evaluation metric. What's more, competition is a vital mechanism in many real-world multi-agent systems capable of generating intelligent innovations: Darwinian evolution, the market economy and the AlphaZero algorithm, to name a few. In two-player zero-sum… ▽ More

    Submitted 27 February, 2020; originally announced March 2020.

    Comments: Accepted for publication at AAMAS 2020

  13. arXiv:2002.08456  [pdf, other

    cs.GT cs.LG stat.ML

    From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization

    Authors: Julien Perolat, Remi Munos, Jean-Baptiste Lespiau, Shayegan Omidshafiei, Mark Rowland, Pedro Ortega, Neil Burch, Thomas Anthony, David Balduzzi, Bart De Vylder, Georgios Piliouras, Marc Lanctot, Karl Tuyls

    Abstract: In this paper we investigate the Follow the Regularized Leader dynamics in sequential imperfect information games (IIG). We generalize existing results of Poincaré recurrence from normal-form games to zero-sum two-player imperfect information games and other sequential game settings. We then investigate how adapting the reward (by adding a regularization term) of the game can give strong convergen… ▽ More

    Submitted 19 February, 2020; originally announced February 2020.

    Comments: 43 pages

  14. arXiv:2001.04678  [pdf, other

    cs.LG cs.AI cs.GT cs.MA stat.ML

    Smooth markets: A basic mechanism for organizing gradient-based learners

    Authors: David Balduzzi, Wojciech M Czarnecki, Thomas W Anthony, Ian M Gemp, Edward Hughes, Joel Z Leibo, Georgios Piliouras, Thore Graepel

    Abstract: With the success of modern machine learning, it is becoming increasingly important to understand and control how learning algorithms interact. Unfortunately, negative results from game theory show there is little hope of understanding or controlling general n-player games. We therefore introduce smooth markets (SM-games), a class of n-player games with pairwise zero sum interactions. SM-games codi… ▽ More

    Submitted 18 January, 2020; v1 submitted 14 January, 2020; originally announced January 2020.

    Comments: 18 pages, 3 figures

    Journal ref: ICLR 2020

  15. Analyzing the HCP Datasets using GPUs: The Anatomy of a Science Engagement

    Authors: John-Paul Robinson, Thomas Anthony, Ravi Tripathi, Sara A. Sims, Kristina M. Visscher, Purushotham V. Bangalore

    Abstract: This paper documents the experience improving the performance of a data processing workflow for analysis of the Human Connectome Project's HCP900 data set. It describes how network and compute bottlenecks were discovered and resolved during the course of a science engagement. A series of computational enhancements to the stock FSL BedpostX workflow are described. These enhancements migrated the wo… ▽ More

    Submitted 7 September, 2019; originally announced September 2019.

    Comments: 6 pages, 3 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22--26, 2018, Pittsburgh, PA, USA

  16. arXiv:1908.09453  [pdf, other

    cs.LG cs.AI cs.GT cs.MA

    OpenSpiel: A Framework for Reinforcement Learning in Games

    Authors: Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes , et al. (2 additional authors not shown)

    Abstract: OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi- agent) zero-sum, cooperative and general-sum, one-shot and sequential, strictly turn-taking and simultaneous-move, perfect and imperfect information games, as well as traditional multiagent environments such as (partia… ▽ More

    Submitted 26 September, 2020; v1 submitted 25 August, 2019; originally announced August 2019.

  17. arXiv:1904.03646  [pdf, other

    cs.LG stat.ML

    Policy Gradient Search: Online Planning and Expert Iteration without Search Trees

    Authors: Thomas Anthony, Robert Nishihara, Philipp Moritz, Tim Salimans, John Schulman

    Abstract: Monte Carlo Tree Search (MCTS) algorithms perform simulation-based search to improve policies online. During search, the simulation policy is adapted to explore the most promising lines of play. MCTS has been used by state-of-the-art programs for many problems, however a disadvantage to MCTS is that it estimates the values of states with Monte Carlo averages, stored in a search tree; this does not… ▽ More

    Submitted 7 April, 2019; originally announced April 2019.

  18. arXiv:1705.08439  [pdf, other

    cs.AI

    Thinking Fast and Slow with Deep Learning and Tree Search

    Authors: Thomas Anthony, Zheng Tian, David Barber

    Abstract: Sequential decision making problems, such as structured prediction, robotic control, and game playing, require a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration (ExIt), a novel reinforcement learning algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search… ▽ More

    Submitted 3 December, 2017; v1 submitted 23 May, 2017; originally announced May 2017.

    Comments: v1 to v2: - Add a value function in MCTS - Some MCTS hyper-parameters changed - Repetition of experiments: improved accuracy and errors shown. (note the reduction in effect size for the tpt/cat experiment) - Results from a longer training run, including changes in expert strength in training - Comparison to MoHex. v3: clarify independence of ExIt and AG0. v4: see appendix E