Showing 1–5 of 5 results for author: Smith, M O

  1. arXiv:2305.14223  [pdf, other]

    cs.MA cs.AI cs.GT cs.LG

    Co-Learning Empirical Games and World Models

    Authors: Max Olan Smith, Michael P. Wellman

    Abstract: Game-based decision-making involves reasoning over both world dynamics and strategic interactions among the agents. Typically, empirical models capturing these respective aspects are learned and used separately. We investigate the potential gain from co-learning these elements: a world model for dynamics and an empirical game for strategic interactions. Empirical games drive world models toward a…

    Submitted 23 May, 2023; originally announced May 2023.

  2. arXiv:2303.03196  [pdf, other]

    cs.GT cs.AI cs.LG cs.MA

    Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning

    Authors: Marc Lanctot, John Schultz, Neil Burch, Max Olan Smith, Daniel Hennes, Thomas Anthony, Julien Perolat

    Abstract: Progress in fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from checkers and the classic UCI data sets to Go and Diplomacy. In sequential decision-making, agent evaluation has largely been restricted to few interactions against experts, with the aim to reach some desired level of performance (e.g. beating a human professional player). We pro…

    Submitted 31 October, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: 25 pages, 8 figures, Accepted at TMLR October 2023

  3. arXiv:2106.01901  [pdf, other]

    cs.MA

    Iterative Empirical Game Solving via Single Policy Best Response

    Authors: Max Olan Smith, Thomas Anthony, Michael P. Wellman

    Abstract: Policy-Space Response Oracles (PSRO) is a general algorithmic framework for learning policies in multiagent systems by interleaving empirical game analysis with deep reinforcement learning (Deep RL). At each iteration, Deep RL is invoked to train a best response to a mixture of opponent policies. The repeated application of Deep RL poses an expensive computational burden as we look to apply this a…

    Submitted 3 June, 2021; originally announced June 2021.

    Journal ref: ICLR 2021

  4. arXiv:2009.14180  [pdf, other]

    cs.MA

    Learning to Play against Any Mixture of Opponents

    Authors: Max Olan Smith, Thomas Anthony, Yongzhao Wang, Michael P. Wellman

    Abstract: Intuitively, experience playing against one mixture of opponents in a given domain should be relevant for a different mixture in the same domain. We propose a transfer learning method, Q-Mixing, that starts by learning Q-values against each pure-strategy opponent. Then a Q-value for any distribution of opponent strategies is approximated by appropriately averaging the separately learned Q-values.…

    Submitted 3 June, 2021; v1 submitted 29 September, 2020; originally announced September 2020.

  5. arXiv:1909.02128  [pdf, other]

    cs.AI cs.LG cs.MA

    No Press Diplomacy: Modeling Multi-Agent Gameplay

    Authors: Philip Paquette, Yuchen Lu, Steven Bocco, Max O. Smith, Satya Ortiz-Gagne, Jonathan K. Kummerfeld, Satinder Singh, Joelle Pineau, Aaron Courville

    Abstract: Diplomacy is a seven-player non-stochastic, non-cooperative game, where agents acquire resources through a mix of teamwork and betrayal. Reliance on trust and coordination makes Diplomacy the first non-cooperative multi-agent benchmark for complex sequential social dilemmas in a rich environment. In this work, we focus on training an agent that learns to play the No Press version of Diplomacy wher…

    Submitted 19 November, 2019; v1 submitted 4 September, 2019; originally announced September 2019.

    Comments: Accepted at NeurIPS 2019