
Showing 1–50 of 60 results for author: Bowling, M

Searching in archive cs.
  1. arXiv:2406.19561

    cs.LG cs.AI

    Meta-Gradient Search Control: A Method for Improving the Efficiency of Dyna-style Planning

    Authors: Bradley Burega, John D. Martin, Luke Kapeluck, Michael Bowling

    Abstract: We study how a Reinforcement Learning (RL) system can remain sample-efficient when learning from an imperfect model of the environment. This is particularly challenging when the learning system is resource-constrained and in continual settings, where the environment dynamics change. To address these challenges, our paper introduces an online, meta-gradient algorithm that tunes a probability with w…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.13909

    cs.LG

    Beyond Optimism: Exploration With Partially Observable Rewards

    Authors: Simone Parisi, Alireza Kazemipour, Michael Bowling

    Abstract: Exploration in reinforcement learning (RL) remains an open challenge. RL algorithms rely on observing rewards to train the agent, and if informative rewards are sparse the agent learns slowly or may not learn at all. To improve exploration and reward discovery, popular algorithms rely on optimism. But what if sometimes rewards are unobservable, e.g., situations of partial monitoring in bandits and…

    Submitted 19 June, 2024; originally announced June 2024.

  3. arXiv:2402.06819

    cs.LG

    Monitored Markov Decision Processes

    Authors: Simone Parisi, Montaser Mohammedalamen, Alireza Kazemipour, Matthew E. Taylor, Michael Bowling

    Abstract: In reinforcement learning (RL), an agent learns to perform a task by interacting with an environment and receiving feedback (a numerical reward) for its actions. However, the assumption that rewards are always observable is often not applicable in real-world problems. For example, the agent may need to ask a human to supervise its actions or activate a monitoring system to receive feedback. There…

    Submitted 13 February, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: AAMAS 2024, Main Track

  4. arXiv:2311.06979

    cs.AI cs.PL cs.SE

    Assessing the Interpretability of Programmatic Policies with Large Language Models

    Authors: Zahra Bashir, Michael Bowling, Levi H. S. Lelis

    Abstract: Although the synthesis of programs encoding policies often carries the promise of interpretability, systematic evaluations were never performed to assess the interpretability of these policies, likely because of the complexity of such an evaluation. In this paper, we introduce a novel metric that uses large language models (LLMs) to assess the interpretability of programmatic policies. For our metr…

    Submitted 20 January, 2024; v1 submitted 12 November, 2023; originally announced November 2023.

    Comments: This paper is under review for IJCAI. The main file is arxiv.tex and I have a supplementary_materials.tex file as well

  5. arXiv:2310.10833

    cs.LG cs.AI

    Proper Laplacian Representation Learning

    Authors: Diego Gomez, Michael Bowling, Marlos C. Machado

    Abstract: The ability to learn good representations of states is essential for solving large reinforcement learning problems, where exploration, generalization, and transfer are particularly challenging. The Laplacian representation is a promising approach to address these problems by inducing informative state encoding and intrinsic rewards for temporally-extended action discovery and reward shaping. To ob…

    Submitted 3 April, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  6. arXiv:2310.10553

    cs.LG cs.MA stat.ML

    TacticAI: an AI assistant for football tactics

    Authors: Zhe Wang, Petar Veličković, Daniel Hennes, Nenad Tomašev, Laurel Prince, Michael Kaisers, Yoram Bachrach, Romuald Elie, Li Kevin Wenliang, Federico Piccinini, William Spearman, Ian Graham, Jerome Connor, Yi Yang, Adrià Recasens, Mina Khan, Nathalie Beauguerlange, Pablo Sprechmann, Pol Moreno, Nicolas Heess, Michael Bowling, Demis Hassabis, Karl Tuyls

    Abstract: Identifying key patterns of tactics implemented by rival teams, and developing effective responses, lies at the heart of modern football. However, doing so algorithmically remains an open research challenge. To address this unmet need, we propose TacticAI, an AI football tactics assistant developed and evaluated in close collaboration with domain experts from Liverpool FC. We focus on analysing co…

    Submitted 17 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: 32 pages, 10 figures

  7. arXiv:2303.01074

    cs.GT cs.LG

    Learning not to Regret

    Authors: David Sychrovský, Michal Šustr, Elnaz Davoodi, Michael Bowling, Marc Lanctot, Martin Schmid

    Abstract: The literature on game-theoretic equilibrium finding predominantly focuses on single games or their repeated play. Nevertheless, numerous real-world scenarios feature playing a game sampled from a distribution of similar, but not identical games, such as playing poker with different public cards or trading correlated assets on the stock market. As these similar games feature similar equilibria, we…

    Submitted 19 February, 2024; v1 submitted 2 March, 2023; originally announced March 2023.

  8. arXiv:2302.12359

    cs.AI cs.LG

    Targeted Search Control in AlphaZero for Effective Policy Improvement

    Authors: Alexandre Trudeau, Michael Bowling

    Abstract: AlphaZero is a self-play reinforcement learning algorithm that achieves superhuman play in chess, shogi, and Go via policy iteration. To be an effective policy improvement operator, AlphaZero's search requires accurate value estimates for the states appearing in its search tree. AlphaZero trains upon self-play matches beginning from the initial state of a game and only samples actions over the fir…

    Submitted 28 February, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: This paper has been accepted to the Proceedings of the 22nd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2023)

  9. arXiv:2212.10420

    cs.AI cs.LG math.ST

    Settling the Reward Hypothesis

    Authors: Michael Bowling, John D. Martin, David Abel, Will Dabney

    Abstract: The reward hypothesis posits that, "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)." We aim to fully settle this hypothesis. This will not conclude with a simple affirmation or refutation, but rather specify completely the implicit requirements on goals and purposes under which the hy…

    Submitted 16 September, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  10. arXiv:2211.01480

    cs.MA cs.CL cs.HC

    Over-communicate no more: Situated RL agents learn concise communication protocols

    Authors: Aleksandra Kalinowska, Elnaz Davoodi, Florian Strub, Kory W Mathewson, Ivana Kajic, Michael Bowling, Todd D Murphey, Patrick M Pilarski

    Abstract: While it is known that communication facilitates cooperation in multi-agent settings, it is unclear how to design artificial agents that can learn to effectively and efficiently communicate with each other. Much research on communication emergence uses reinforcement learning (RL) and explores unsituated communication in one-step referential tasks -- the tasks are not temporally interactive and lac…

    Submitted 2 November, 2022; originally announced November 2022.

  11. arXiv:2208.11173

    cs.AI cs.LG

    The Alberta Plan for AI Research

    Authors: Richard S. Sutton, Michael Bowling, Patrick M. Pilarski

    Abstract: Herein we describe our approach to artificial intelligence research, which we call the Alberta Plan. The Alberta Plan is pursued within our research groups in Alberta and by others who are like-minded throughout the world. We welcome all who would join us in this pursuit.

    Submitted 21 March, 2023; v1 submitted 23 August, 2022; originally announced August 2022.

  12. arXiv:2206.02036

    cs.LG cs.AI stat.ML

    Interpolating Between Softmax Policy Gradient and Neural Replicator Dynamics with Capped Implicit Exploration

    Authors: Dustin Morrill, Esra'a Saleh, Michael Bowling, Amy Greenwald

    Abstract: Neural replicator dynamics (NeuRD) is an alternative to the foundational softmax policy gradient (SPG) algorithm motivated by online learning and evolutionary game theory. The NeuRD expected update is designed to be nearly identical to that of SPG; however, we show that the Monte Carlo updates differ in a substantial way: the importance correction accounting for a sampled action is nullified in th…

    Submitted 4 June, 2022; originally announced June 2022.

    Comments: At Reinforcement Learning and Decision Making 2022, June 2022. 9 pages and 1 figure

  13. arXiv:2205.12031   

    cs.GT cs.AI

    Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games: Corrections

    Authors: Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy R. Greenwald

    Abstract: Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria. To develop hindsight rational learning in sequential decision-making settings, we formalize behavioral deviations as a general class of…

    Submitted 1 June, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Please see version 4 of arXiv:2102.06973 (arXiv:2102.06973v4). This submission was a version of that paper with highlighted corrections. After submitting, I figured out that it would be better to submit this report as another version of arXiv:2102.06973

  14. arXiv:2205.10736

    cs.LG cs.AI stat.ML

    Should Models Be Accurate?

    Authors: Esra'a Saleh, John D. Martin, Anna Koop, Arash Pourzarabi, Michael Bowling

    Abstract: Model-based Reinforcement Learning (MBRL) holds promise for data-efficiency by planning with model-generated experience in addition to learning with experience from the environment. However, in complex or changing environments, models in MBRL will inevitably be imperfect, and their detrimental effects on learning can be difficult to mitigate. In this work, we question whether the objective of thes…

    Submitted 22 May, 2022; originally announced May 2022.

    Comments: The 5th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2022)

  15. arXiv:2112.03178

    cs.AI cs.GT cs.LG

    Student of Games: A unified learning algorithm for both perfect and imperfect information games

    Authors: Martin Schmid, Matej Moravcik, Neil Burch, Rudolf Kadlec, Josh Davidson, Kevin Waugh, Nolan Bard, Finbarr Timbers, Marc Lanctot, G. Zacharias Holland, Elnaz Davoodi, Alden Christianson, Michael Bowling

    Abstract: Games have a long history as benchmarks for progress in artificial intelligence. Approaches using search and learning produced strong performance across many perfect information games, and approaches using game-theoretic reasoning and learning demonstrated strong performance for specific imperfect information poker variants. We introduce Student of Games, a general-purpose algorithm that unifies p…

    Submitted 15 November, 2023; v1 submitted 6 December, 2021; originally announced December 2021.

    Comments: Published in Science Advances

    Journal ref: Science Advances 9, eadg3256 (2023)

  16. arXiv:2111.08102

    cs.AI cs.GT

    The Partially Observable History Process

    Authors: Dustin Morrill, Amy R. Greenwald, Michael Bowling

    Abstract: We introduce the partially observable history process (POHP) formalism for reinforcement learning. POHP centers around the actions and observations of a single agent and abstracts away the presence of other players without reducing them to stochastic processes. Our formalism provides a streamlined interface for designing algorithms that defy categorization as exclusively single or multi-agent, and…

    Submitted 24 February, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: 8 pages, 2 figures

    Journal ref: AAAI-22 Workshop on Reinforcement Learning and Games, February 28, 2022

  17. arXiv:2110.15907

    cs.AI cs.LG

    Learning to Be Cautious

    Authors: Montaser Mohammedalamen, Dustin Morrill, Alexander Sieusahai, Yash Satsangi, Michael Bowling

    Abstract: A key challenge in the field of reinforcement learning is to develop agents that behave cautiously in novel situations. It is generally impossible to anticipate all situations that an autonomous system may face or what behavior would best avoid bad outcomes. An agent that could learn to be cautious would overcome this challenge by discovering for itself when and how to behave cautiously. In contra…

    Submitted 29 October, 2021; originally announced October 2021.

  18. arXiv:2110.05740

    cs.LG cs.AI

    Temporal Abstraction in Reinforcement Learning with the Successor Representation

    Authors: Marlos C. Machado, Andre Barreto, Doina Precup, Michael Bowling

    Abstract: Reasoning at multiple levels of temporal abstraction is one of the key attributes of intelligence. In reinforcement learning, this is often modeled through temporally extended courses of actions called options. Options allow agents to make predictions and to operate at different levels of abstraction within an environment. Nevertheless, approaches based on the options framework often start with th…

    Submitted 11 April, 2023; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: This is the final, published JMLR version

    Journal ref: Journal of Machine Learning Research (JMLR), 24(80):1-69, 2023

  19. arXiv:2102.06973

    cs.GT cs.AI

    Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

    Authors: Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy Greenwald

    Abstract: Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria. To develop hindsight rational learning in sequential decision-making settings, we formalize behavioral deviations as a general class of…

    Submitted 22 June, 2022; v1 submitted 13 February, 2021; originally announced February 2021.

    Comments: Corrected technical report for the paper with the same title in the proceedings of the thirty-eighth International Conference on Machine Learning (ICML 2021), virtual. Compared to v5, this version removes the version indicator from an arXiv reference. 43 pages and 6 figures

  20. arXiv:2101.04237

    cs.AI cs.LG

    Solving Common-Payoff Games with Approximate Policy Iteration

    Authors: Samuel Sokota, Edward Lockhart, Finbarr Timbers, Elnaz Davoodi, Ryan D'Orazio, Neil Burch, Martin Schmid, Michael Bowling, Marc Lanctot

    Abstract: For artificially intelligent learning systems to have widespread applicability in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is difficult -- computing even an epsilon-optimal joint policy is a NEXP-complete problem. Nevertheless, a recently rediscovered insight -- that a team of agents can coordinate via common knowledge -- h…

    Submitted 11 January, 2021; originally announced January 2021.

    Comments: AAAI 2021

  21. arXiv:2012.05874

    cs.GT cs.AI

    Hindsight and Sequential Rationality of Correlated Play

    Authors: Dustin Morrill, Ryan D'Orazio, Reca Sarfati, Marc Lanctot, James R. Wright, Amy Greenwald, Michael Bowling

    Abstract: Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effective at producing competent players in general-sum games or those with more than two players than in two-player, zero-sum games. An appealing alternative is to c…

    Submitted 22 June, 2022; v1 submitted 10 December, 2020; originally announced December 2020.

    Comments: Corrected technical report for the paper with the same title in the proceedings of the thirty-fifth AAAI Conference on Artificial Intelligence (AAAI-21), February 2-9, 2021, Virtual. Compared to v5, this version fixes the realized terminal history indicators in the diagram describing MacQueen's counterexample. 27 pages and 16 figures

  22. arXiv:2011.01297

    cs.LG cs.AI

    Useful Policy Invariant Shaping from Arbitrary Advice

    Authors: Paniz Behboudian, Yash Satsangi, Matthew E. Taylor, Anna Harutyunyan, Michael Bowling

    Abstract: Reinforcement learning is a powerful learning paradigm in which agents can learn to maximize sparse and delayed reward signals. Although RL has had many impressive successes in complex domains, learning can take hours, days, or even years of training data. A major challenge of contemporary RL research is to discover how to learn with less data. Previous work has shown that domain information can b…

    Submitted 2 November, 2020; originally announced November 2020.

    Comments: 9 pages, 6 figures, Adaptive and Learning Agents (ALA) 2020 Workshop

  23. arXiv:2008.12234

    cs.AI cs.LG

    The Advantage Regret-Matching Actor-Critic

    Authors: Audrūnas Gruslys, Marc Lanctot, Rémi Munos, Finbarr Timbers, Martin Schmid, Julien Perolat, Dustin Morrill, Vinicius Zambaldi, Jean-Baptiste Lespiau, John Schultz, Mohammad Gheshlaghi Azar, Michael Bowling, Karl Tuyls

    Abstract: Regret minimization has played a key role in online learning, equilibrium computation in games, and reinforcement learning (RL). In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsideration of past behavior. We propose a model-free RL algorithm, the Advantage Regret-Matching Actor-Critic (ARMAC): rather than saving past state-action data, ARMAC…

    Submitted 27 August, 2020; originally announced August 2020.

  24. arXiv:2006.08740

    cs.GT

    Sound Algorithms in Imperfect Information Games

    Authors: Michal Šustr, Martin Schmid, Matej Moravčík, Neil Burch, Marc Lanctot, Michael Bowling

    Abstract: Search has played a fundamental role in computer game research since the very beginning. And while online search has been commonly used in perfect information games such as Chess and Go, online search methods for imperfect information games have only been introduced relatively recently. This paper addresses the question of what a sound online algorithm is in an imperfect information setting of two…

    Submitted 2 March, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: Accepted to AAMAS 2021 as extended abstract (Ref. numbers not available yet)

  25. arXiv:2006.06054

    cs.AI

    Marginal Utility for Planning in Continuous or Large Discrete Action Spaces

    Authors: Zaheen Farraz Ahmad, Levi H. S. Lelis, Michael Bowling

    Abstract: Sample-based planning is a powerful family of algorithms for generating intelligent behavior from a model of the environment. Generating good candidate actions is critical to the success of sample-based planners, particularly in continuous or large action spaces. Typically, candidate action generation exhausts the action space, uses domain knowledge, or more recently, involves learning a stochasti…

    Submitted 17 June, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

  26. arXiv:2006.04363

    cs.LG cs.AI stat.ML

    Hallucinating Value: A Pitfall of Dyna-style Planning with Imperfect Environment Models

    Authors: Taher Jafferjee, Ehsan Imani, Erin Talvitie, Martha White, Michael Bowling

    Abstract: Dyna-style reinforcement learning (RL) agents improve sample efficiency over model-free RL agents by updating the value function with simulated experience generated by an environment model. However, it is often difficult to learn accurate models of environment dynamics, and even small errors may result in failure of Dyna agents. In this paper, we investigate one type of model error: hallucinated s…

    Submitted 8 June, 2020; originally announced June 2020.

    Comments: 9 pages, 7 figures

  27. arXiv:2004.13657

    cs.LG cs.AI stat.ML

    Sample-Efficient Model-based Actor-Critic for an Interactive Dialogue Task

    Authors: Katya Kudashkina, Valliappa Chockalingam, Graham W. Taylor, Michael Bowling

    Abstract: Human-computer interactive systems that rely on machine learning are becoming paramount to the lives of millions of people who use digital assistants on a daily basis. Yet, further advances are limited by the availability of data and the cost of acquiring new samples. One way to address this problem is by improving the sample efficiency of current approaches. As a solution path, we present a model…

    Submitted 28 April, 2020; originally announced April 2020.

  28. arXiv:2004.09677

    cs.LG stat.ML

    Approximate exploitability: Learning a best response in large games

    Authors: Finbarr Timbers, Nolan Bard, Edward Lockhart, Marc Lanctot, Martin Schmid, Neil Burch, Julian Schrittwieser, Thomas Hubert, Michael Bowling

    Abstract: Researchers have demonstrated that neural networks are vulnerable to adversarial examples and subtle environment changes, both of which one can view as a form of distribution shift. To humans, the resulting errors can look like blunders, eroding trust in these agents. In prior games research, agent evaluation often focused on the in-practice game outcomes. While valuable, such evaluation typically…

    Submitted 3 November, 2022; v1 submitted 20 April, 2020; originally announced April 2020.

  29. arXiv:1912.02967

    cs.AI cs.GT cs.LG

    Alternative Function Approximation Parameterizations for Solving Games: An Analysis of $f$-Regression Counterfactual Regret Minimization

    Authors: Ryan D'Orazio, Dustin Morrill, James R. Wright, Michael Bowling

    Abstract: Function approximation is a powerful approach for structuring large decision problems that has facilitated great achievements in the areas of reinforcement learning and game playing. Regression counterfactual regret minimization (RCFR) is a simple algorithm for approximately solving imperfect information games with normalized rectified linear unit (ReLU) parameterized policies. In contrast, the mo…

    Submitted 1 May, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: 11 pages, includes appendix

    Journal ref: Nineteenth International Conference on Autonomous Agents and Multi-Agent Systems, 9-13 May 2020, Auckland, New Zealand

  30. arXiv:1907.09633

    cs.GT cs.AI

    Low-Variance and Zero-Variance Baselines for Extensive-Form Games

    Authors: Trevor Davis, Martin Schmid, Michael Bowling

    Abstract: Extensive-form games (EFGs) are a common model of multi-agent interactions with imperfect information. State-of-the-art algorithms for solving these games typically perform full walks of the game tree that can prove prohibitively slow in large games. Alternatively, sampling-based methods such as Monte Carlo Counterfactual Regret Minimization walk one or more trajectories through the tree, touching…

    Submitted 22 July, 2019; originally announced July 2019.

    Comments: Under review for NeurIPS 2019

  31. arXiv:1906.11110

    cs.AI cs.GT

    Rethinking Formal Models of Partially Observable Multiagent Decision Making

    Authors: Vojtěch Kovařík, Martin Schmid, Neil Burch, Michael Bowling, Viliam Lisý

    Abstract: Multiagent decision-making in partially observable environments is usually modelled as either an extensive-form game (EFG) in game theory or a partially observable stochastic game (POSG) in multiagent reinforcement learning (MARL). One issue with the current situation is that while most practical problems can be modelled in both formalisms, the relationship of the two models is unclear, which hind…

    Submitted 28 September, 2021; v1 submitted 26 June, 2019; originally announced June 2019.

    Comments: A 2020 update of the original 2019 version of the paper. (Rewrote the main text and clarified the relationship between FOSGs/POSGs and EFGs. Some of the technical results are now presented in the appendix.)

  32. arXiv:1906.02403

    cs.AI cs.CL cs.LG cs.MA

    Ease-of-Teaching and Language Structure from Emergent Communication

    Authors: Fushan Li, Michael Bowling

    Abstract: Artificial agents have been shown to learn to communicate when needed to complete a cooperative task. Some level of language structure (e.g., compositionality) has been found in the learned communication protocols. This observed structure is often the result of specific environmental pressures during training. By introducing new agents periodically to replace old ones, sequentially and within a po…

    Submitted 28 October, 2019; v1 submitted 5 June, 2019; originally announced June 2019.

    Comments: Accepted at Neural Information Processing Systems (NeurIPS) 2019

  33. The Hanabi Challenge: A New Frontier for AI Research

    Authors: Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, Michael Bowling

    Abstract: From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains…

    Submitted 6 December, 2019; v1 submitted 1 February, 2019; originally announced February 2019.

    Comments: 32 pages, 5 figures, In Press (Artificial Intelligence)

  34. arXiv:1811.01458

    cs.MA cs.AI cs.LG

    Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning

    Authors: Jakob N. Foerster, Francis Song, Edward Hughes, Neil Burch, Iain Dunning, Shimon Whiteson, Matthew Botvinick, Michael Bowling

    Abstract: When observing the actions of others, humans make inferences about why they acted as they did, and what this implies about the world; humans also use the fact that their actions will be interpreted in this manner, allowing them to act informatively and thereby communicate efficiently with others. Although learning algorithms have recently achieved superhuman performance in a number of two-player,…

    Submitted 10 September, 2019; v1 submitted 4 November, 2018; originally announced November 2018.

  35. arXiv:1810.09026

    cs.LG cs.AI cs.GT cs.MA stat.ML

    Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

    Authors: Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, Michael Bowling

    Abstract: Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing discounted return. In this paper, we examine the role of these policy gradient and actor-critic algorithms in partially-observable multiagent environments.…

    Submitted 12 June, 2020; v1 submitted 21 October, 2018; originally announced October 2018.

    Comments: NeurIPS 2018

  36. arXiv:1810.00123

    cs.LG cs.AI stat.ML

    Generalization and Regularization in DQN

    Authors: Jesse Farebrother, Marlos C. Machado, Michael Bowling

    Abstract: Deep reinforcement learning algorithms have shown an impressive ability to learn complex control policies in high-dimensional tasks. However, despite the ever-increasing performance on popular benchmarks, policies learned by deep reinforcement learning algorithms can struggle to generalize when evaluated in remarkably similar environments. In this paper we propose a protocol to evaluate generaliza…

    Submitted 17 January, 2020; v1 submitted 28 September, 2018; originally announced October 2018.

    Comments: Earlier versions of this work were presented both at the NeurIPS'18 Deep Reinforcement Learning Workshop and the 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM'19)

  37. arXiv:1809.07893

    cs.GT cs.AI

    Solving Large Extensive-Form Games with Strategy Constraints

    Authors: Trevor Davis, Kevin Waugh, Michael Bowling

    Abstract: Extensive-form games are a common model for multiagent interactions with imperfect information. In two-player zero-sum games, the typical solution concept is a Nash equilibrium over the unconstrained strategy set for each player. In many situations, however, we would like to constrain the set of possible strategies. For example, constraints are a natural way to model limited resources, risk mitiga…

    Submitted 5 February, 2019; v1 submitted 20 September, 2018; originally announced September 2018.

    Comments: Appeared in AAAI 2019

  38. arXiv:1809.03057

    cs.GT cs.AI

    Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines

    Authors: Martin Schmid, Neil Burch, Marc Lanctot, Matej Moravcik, Rudolf Kadlec, Michael Bowling

    Abstract: Learning strategies for imperfect information games from samples of interaction is a challenging problem. A common method for this setting, Monte Carlo Counterfactual Regret Minimization (MCCFR), can have slow long-term convergence rates due to high variance. In this paper, we introduce a variance reduction technique (VR-MCCFR) that applies to any sampling variant of MCCFR. Using this technique, p…

    Submitted 9 September, 2018; originally announced September 2018.

  39. arXiv:1807.11622

    cs.LG cs.AI stat.ML

    Count-Based Exploration with the Successor Representation

    Authors: Marlos C. Machado, Marc G. Bellemare, Michael Bowling

    Abstract: In this paper we introduce a simple approach for exploration in reinforcement learning (RL) that allows us to develop theoretically justified algorithms in the tabular case but that is also extendable to settings where function approximation is required. Our approach is based on the successor representation (SR), which was originally introduced as a representation defining state generalization by…

    Submitted 26 November, 2019; v1 submitted 30 July, 2018; originally announced July 2018.

    Comments: This paper appears in the Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI 2020)

  40. arXiv:1806.01825  [pdf, other]

    cs.AI

    The Effect of Planning Shape on Dyna-style Planning in High-dimensional State Spaces

    Authors: G. Zacharias Holland, Erin J. Talvitie, Michael Bowling

    Abstract: Dyna is a fundamental approach to model-based reinforcement learning (MBRL) that interleaves planning, acting, and learning in an online setting. In the most typical application of Dyna, the dynamics model is used to generate one-step transitions from selected start states from the agent's history, which are used to update the agent's value function or policy as if they were real experiences. In t…

    Submitted 28 March, 2019; v1 submitted 5 June, 2018; originally announced June 2018.

  41. arXiv:1709.06009  [pdf, other]

    cs.LG

    Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents

    Authors: Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew Hausknecht, Michael Bowling

    Abstract: The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games. It supports a variety of different problem settings and it has been receiving increasing attention from the scientific community, leading to some high-profile success stories such as the much publicized Deep Q-Networks (DQN). In t…

    Submitted 30 November, 2017; v1 submitted 18 September, 2017; originally announced September 2017.

  42. arXiv:1703.00956  [pdf, other]

    cs.LG cs.AI

    A Laplacian Framework for Option Discovery in Reinforcement Learning

    Authors: Marlos C. Machado, Marc G. Bellemare, Michael Bowling

    Abstract: Representation learning and option discovery are two of the biggest challenges in reinforcement learning (RL). Proto-value functions (PVFs) are a well-known approach for representation learning in MDPs. In this paper we address the option discovery problem by showing how PVFs implicitly define options. We do it by introducing eigenpurposes, intrinsic reward functions derived from the learned repre…

    Submitted 15 June, 2017; v1 submitted 2 March, 2017; originally announced March 2017.

    Comments: Appearing in the Proceedings of the 34th International Conference on Machine Learning (ICML)

  43. DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker

    Authors: Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, Michael Bowling

    Abstract: Artificial intelligence has seen several breakthroughs in recent years, with games often serving as milestones. A common feature of these games is that players have perfect information. Poker is the quintessential game of imperfect information, and a longstanding challenge problem in artificial intelligence. We introduce DeepStack, an algorithm for imperfect information settings. It combines recur…

    Submitted 3 March, 2017; v1 submitted 6 January, 2017; originally announced January 2017.

  44. arXiv:1612.07547  [pdf, ps, other]

    cs.GT

    Equilibrium Approximation Quality of Current No-Limit Poker Bots

    Authors: Viliam Lisy, Michael Bowling

    Abstract: Approximating a Nash equilibrium is currently the best-performing approach for creating poker-playing programs. While for the simplest variants of the game, it is possible to evaluate the quality of the approximation by computing the value of the best response strategy, this is currently not computationally feasible for larger variants of the game, such as heads-up no-limit Texas hold'em. In this…

    Submitted 8 January, 2017; v1 submitted 22 December, 2016; originally announced December 2016.

    Comments: To appear at AAAI-17 Workshop on Computer Poker and Imperfect Information Games

  45. arXiv:1612.06915  [pdf, other]

    cs.AI

    AIVAT: A New Variance Reduction Technique for Agent Evaluation in Imperfect Information Games

    Authors: Neil Burch, Martin Schmid, Matej Moravčík, Michael Bowling

    Abstract: Evaluating agent performance when outcomes are stochastic and agents use randomized strategies can be challenging when there is limited data available. The variance of sampled outcomes may make the simple approach of Monte Carlo sampling inadequate. This is the case for agents playing heads-up no-limit Texas hold'em poker, where man-machine competitions have involved multiple days of consistent pl…

    Submitted 19 January, 2017; v1 submitted 20 December, 2016; originally announced December 2016.

    Comments: To appear at AAAI-17 Workshop on Computer Poker and Imperfect Information Games

  46. arXiv:1605.07700  [pdf, other]

    cs.LG cs.AI

    Learning Purposeful Behaviour in the Absence of Rewards

    Authors: Marlos C. Machado, Michael Bowling

    Abstract: Artificial intelligence is commonly defined as the ability to achieve goals in the world. In the reinforcement learning framework, goals are encoded as reward functions that guide agent behaviour, and the sum of observed rewards provides a notion of progress. However, some domains have no such reward signal, or have a reward signal so sparse as to appear absent. Without reward feedback, agent behav…

    Submitted 24 May, 2016; originally announced May 2016.

    Comments: Extended version of the paper presented at the workshop entitled Abstraction in Reinforcement Learning, at the 33rd International Conference on Machine Learning, New York, NY, USA, 2016

  47. arXiv:1512.01563  [pdf, other]

    cs.LG

    State of the Art Control of Atari Games Using Shallow Reinforcement Learning

    Authors: Yitao Liang, Marlos C. Machado, Erik Talvitie, Michael Bowling

    Abstract: The recently introduced Deep Q-Networks (DQN) algorithm has gained attention as one of the first successful combinations of deep neural networks and reinforcement learning. Its promise was demonstrated in the Arcade Learning Environment (ALE), a challenging framework composed of dozens of Atari 2600 games used to evaluate general competency in AI. It achieved dramatically better results than earli…

    Submitted 21 April, 2016; v1 submitted 4 December, 2015; originally announced December 2015.

    Comments: A shorter version of this paper appears in the Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016)

  48. arXiv:1411.7974  [pdf, ps, other]

    cs.AI cs.GT cs.MA

    Solving Games with Functional Regret Estimation

    Authors: Kevin Waugh, Dustin Morrill, J. Andrew Bagnell, Michael Bowling

    Abstract: We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function app…

    Submitted 31 December, 2014; v1 submitted 28 November, 2014; originally announced November 2014.

    Comments: AAAI Conference on Artificial Intelligence 2015

  49. arXiv:1410.4604  [pdf, other]

    cs.LG cs.AI

    Domain-Independent Optimistic Initialization for Reinforcement Learning

    Authors: Marlos C. Machado, Sriram Srinivasan, Michael Bowling

    Abstract: In Reinforcement Learning (RL), it is common to use optimistic initialization of value functions to encourage exploration. However, such an approach generally depends on the domain, viz., the scale of the rewards must be known, and the feature representation must have a constant norm. We present a simple approach that performs optimistic initialization with less dependence on the domain.

    Submitted 16 October, 2014; originally announced October 2014.

  50. arXiv:1303.4441  [pdf, other]

    cs.GT

    Solving Imperfect Information Games Using Decomposition

    Authors: Neil Burch, Michael Johanson, Michael Bowling

    Abstract: Decomposition, i.e., independently analyzing possible subgames, has proven to be an essential principle for effective decision-making in perfect information games. However, in imperfect information games, decomposition has proven to be problematic. To date, all proposed techniques for decomposition in imperfect information games have abandoned theoretical guarantees. This work presents the first te…

    Submitted 21 April, 2014; v1 submitted 18 March, 2013; originally announced March 2013.

    Comments: 7 pages by 2 columns, 5 figures; April 21 2014 - expand explanations and theory