Showing 1–16 of 16 results for author: Morrill, D

  1. arXiv:2306.07372  [pdf, other]

    cs.LG cs.AI cs.GT

    Composing Efficient, Robust Tests for Policy Selection

    Authors: Dustin Morrill, Thomas J. Walsh, Daniel Hernandez, Peter R. Wurman, Peter Stone

    Abstract: Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, to choose which policy to actually deploy in the real world, they must be tested under an intractable number of environmental conditions. We introduce RPOSST, an algorithm to select a small set of test cases from a larger pool based on a relatively small number of sample evaluations.…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: 26 pages, 13 figures. To appear in Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI 2023)

    ACM Class: B.8.1; I.2.6

  2. arXiv:2206.02036  [pdf, other]

    cs.LG cs.AI stat.ML

    Interpolating Between Softmax Policy Gradient and Neural Replicator Dynamics with Capped Implicit Exploration

    Authors: Dustin Morrill, Esra'a Saleh, Michael Bowling, Amy Greenwald

    Abstract: Neural replicator dynamics (NeuRD) is an alternative to the foundational softmax policy gradient (SPG) algorithm motivated by online learning and evolutionary game theory. The NeuRD expected update is designed to be nearly identical to that of SPG; however, we show that the Monte Carlo updates differ in a substantial way: the importance correction accounting for a sampled action is nullified in th…

    Submitted 4 June, 2022; originally announced June 2022.

    Comments: At Reinforcement Learning and Decision Making 2022, June 2022. 9 pages and 1 figure

  3. arXiv:2205.12031   

    cs.GT cs.AI

    Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games: Corrections

    Authors: Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy R. Greenwald

    Abstract: Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria. To develop hindsight rational learning in sequential decision-making settings, we formalize behavioral deviations as a general class of…

    Submitted 1 June, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Please see version 4 of arXiv:2102.06973 (arXiv:2102.06973v4). This submission was a version of that paper with highlighted corrections. After submitting, I figured out that it would be better to submit this report as another version of arXiv:2102.06973

  4. arXiv:2111.08102  [pdf, ps, other]

    cs.AI cs.GT

    The Partially Observable History Process

    Authors: Dustin Morrill, Amy R. Greenwald, Michael Bowling

    Abstract: We introduce the partially observable history process (POHP) formalism for reinforcement learning. POHP centers around the actions and observations of a single agent and abstracts away the presence of other players without reducing them to stochastic processes. Our formalism provides a streamlined interface for designing algorithms that defy categorization as exclusively single or multi-agent, and…

    Submitted 24 February, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: 8 pages, 2 figures

    Journal ref: AAAI-22 Workshop on Reinforcement Learning and Games, February 28, 2022

  5. arXiv:2110.15907  [pdf, other]

    cs.AI cs.LG

    Learning to Be Cautious

    Authors: Montaser Mohammedalamen, Dustin Morrill, Alexander Sieusahai, Yash Satsangi, Michael Bowling

    Abstract: A key challenge in the field of reinforcement learning is to develop agents that behave cautiously in novel situations. It is generally impossible to anticipate all situations that an autonomous system may face or what behavior would best avoid bad outcomes. An agent that could learn to be cautious would overcome this challenge by discovering for itself when and how to behave cautiously. In contra…

    Submitted 29 October, 2021; originally announced October 2021.

  6. arXiv:2102.06973  [pdf, other]

    cs.GT cs.AI

    Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

    Authors: Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy Greenwald

    Abstract: Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria. To develop hindsight rational learning in sequential decision-making settings, we formalize behavioral deviations as a general class of…

    Submitted 22 June, 2022; v1 submitted 13 February, 2021; originally announced February 2021.

    Comments: Corrected technical report for the paper with the same title in the proceedings of the thirty-eighth International Conference on Machine Learning (ICML 2021), virtual. Compared to v5, this version removes the version indicator from an arXiv reference. 43 pages and 6 figures

  7. arXiv:2012.05874  [pdf, other]

    cs.GT cs.AI

    Hindsight and Sequential Rationality of Correlated Play

    Authors: Dustin Morrill, Ryan D'Orazio, Reca Sarfati, Marc Lanctot, James R. Wright, Amy Greenwald, Michael Bowling

    Abstract: Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effective at producing competent players in general-sum games or those with more than two players than in two-player, zero-sum games. An appealing alternative is to c…

    Submitted 22 June, 2022; v1 submitted 10 December, 2020; originally announced December 2020.

    Comments: Corrected technical report for the paper with the same title in the proceedings of the thirty-fifth AAAI Conference on Artificial Intelligence (AAAI-21), February 2-9, 2021, Virtual. Compared to v5, this version fixes the realized terminal history indicators in the diagram describing MacQueen's counterexample. 27 pages and 16 figures

  8. arXiv:2009.01093  [pdf]

    physics.app-ph cond-mat.mtrl-sci cond-mat.soft

    X-ray linear dichroic ptychography

    Authors: Yuan Hung Lo, Jihan Zhou, Arjun Rana, Drew Morrill, Christian Gentry, Bjoern Enders, Young-Sang Yu, Chang-Yu Sun, David Shapiro, Roger Falcone, Henry Kapteyn, Margaret Murnane, Pupa U. P. A. Gilbert, Jianwei Miao

    Abstract: Biominerals such as seashells, coral skeletons, bone, and enamel are optically anisotropic crystalline materials with unique nano- and micro-scale organization that translates into exceptional macroscopic mechanical properties, providing inspiration for engineering new and superior biomimetic structures. Here we use particles of Seriatopora aculeata coral skeleton as a model and demonstrate, for…

    Submitted 2 September, 2020; originally announced September 2020.

  9. arXiv:2008.12234  [pdf, other]

    cs.AI cs.LG

    The Advantage Regret-Matching Actor-Critic

    Authors: Audrūnas Gruslys, Marc Lanctot, Rémi Munos, Finbarr Timbers, Martin Schmid, Julien Perolat, Dustin Morrill, Vinicius Zambaldi, Jean-Baptiste Lespiau, John Schultz, Mohammad Gheshlaghi Azar, Michael Bowling, Karl Tuyls

    Abstract: Regret minimization has played a key role in online learning, equilibrium computation in games, and reinforcement learning (RL). In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsideration of past behavior. We propose a model-free RL algorithm, the Advantage Regret-Matching Actor-Critic (ARMAC): rather than saving past state-action data, ARMAC…

    Submitted 27 August, 2020; originally announced August 2020.

  10. arXiv:1912.02967  [pdf, other]

    cs.AI cs.GT cs.LG

    Alternative Function Approximation Parameterizations for Solving Games: An Analysis of $f$-Regression Counterfactual Regret Minimization

    Authors: Ryan D'Orazio, Dustin Morrill, James R. Wright, Michael Bowling

    Abstract: Function approximation is a powerful approach for structuring large decision problems that has facilitated great achievements in the areas of reinforcement learning and game playing. Regression counterfactual regret minimization (RCFR) is a simple algorithm for approximately solving imperfect information games with normalized rectified linear unit (ReLU) parameterized policies. In contrast, the mo…

    Submitted 1 May, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: 11 pages, includes appendix

    Journal ref: Nineteenth International Conference on Autonomous Agents and Multi-Agent Systems, 9-13 May 2020, Auckland, New Zealand

  11. arXiv:1910.01706  [pdf, ps, other]

    cs.LG cs.GT

    Bounds for Approximate Regret-Matching Algorithms

    Authors: Ryan D'Orazio, Dustin Morrill, James R. Wright

    Abstract: A dominant approach to solving large imperfect-information games is Counterfactual Regret Minimization (CFR). In CFR, many regret minimization problems are combined to solve the game. For very large games, abstraction is typically needed to render CFR tractable. Abstractions are often manually tuned, possibly removing important strategic differences in the full game and harming performance. Funct…

    Submitted 27 November, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

    Comments: 4 pages + acknowledgements, references, and appendices (9 pages total)

    Journal ref: Smooth Games Optimization and Machine Learning Workshop: Bridging Game Theory and Deep Learning (SGO&ML), at the Thirty-third Conference on Neural Information Processing Systems (NeurIPS 2019), Dec 14th, 2019, Vancouver, Canada

  12. arXiv:1908.09453  [pdf, other]

    cs.LG cs.AI cs.GT cs.MA

    OpenSpiel: A Framework for Reinforcement Learning in Games

    Authors: Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, et al. (2 additional authors not shown)

    Abstract: OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi- agent) zero-sum, cooperative and general-sum, one-shot and sequential, strictly turn-taking and simultaneous-move, perfect and imperfect information games, as well as traditional multiagent environments such as (partia…

    Submitted 26 September, 2020; v1 submitted 25 August, 2019; originally announced August 2019.

  13. arXiv:1906.00190  [pdf, other]

    cs.LG cs.AI stat.ML

    Neural Replicator Dynamics

    Authors: Daniel Hennes, Dustin Morrill, Shayegan Omidshafiei, Remi Munos, Julien Perolat, Marc Lanctot, Audrunas Gruslys, Jean-Baptiste Lespiau, Paavo Parmas, Edgar Duenez-Guzman, Karl Tuyls

    Abstract: Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning. Using these algorithms in multiagent environments poses problems such as nonstationarity and instability. In this paper, we first demonstrate that standard softmax-based policy gradient can be prone to poor performance in the presence of even the most benign nonstati…

    Submitted 26 February, 2020; v1 submitted 1 June, 2019; originally announced June 2019.

  14. arXiv:1903.05614  [pdf, other]

    cs.AI cs.GT cs.LG

    Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent

    Authors: Edward Lockhart, Marc Lanctot, Julien Pérolat, Jean-Baptiste Lespiau, Dustin Morrill, Finbarr Timbers, Karl Tuyls

    Abstract: In this paper, we present exploitability descent, a new algorithm to compute approximate equilibria in two-player zero-sum extensive-form games with imperfect information, by direct policy optimization against worst-case opponents. We prove that when following this optimization, the exploitability of a player's strategy converges asymptotically to zero, and hence when both players employ this opti…

    Submitted 12 June, 2020; v1 submitted 13 March, 2019; originally announced March 2019.

    Comments: IJCAI 2019, 11 pages, 1 figure

  15. DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker

    Authors: Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, Michael Bowling

    Abstract: Artificial intelligence has seen several breakthroughs in recent years, with games often serving as milestones. A common feature of these games is that players have perfect information. Poker is the quintessential game of imperfect information, and a longstanding challenge problem in artificial intelligence. We introduce DeepStack, an algorithm for imperfect information settings. It combines recur…

    Submitted 3 March, 2017; v1 submitted 6 January, 2017; originally announced January 2017.

  16. arXiv:1411.7974  [pdf, ps, other]

    cs.AI cs.GT cs.MA

    Solving Games with Functional Regret Estimation

    Authors: Kevin Waugh, Dustin Morrill, J. Andrew Bagnell, Michael Bowling

    Abstract: We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function app…

    Submitted 31 December, 2014; v1 submitted 28 November, 2014; originally announced November 2014.

    Comments: AAAI Conference on Artificial Intelligence 2015