Showing 1–50 of 127 results for author: Sandholm, T

Searching in archive cs.
  1. arXiv:2406.15970  [pdf, ps, other]

    cs.GT cs.AI cs.CC

    Imperfect-Recall Games: Equilibrium Concepts and Their Complexity

    Authors: Emanuel Tewolde, Brian Hu Zhang, Caspar Oesterheld, Manolis Zampetakis, Tuomas Sandholm, Paul W. Goldberg, Vincent Conitzer

    Abstract: We investigate optimal decision making under imperfect recall, that is, when an agent forgets information it once held before. An example is the absentminded driver game, as well as team games in which the members have limited communication capabilities. In the framework of extensive-form games with imperfect recall, we analyze the computational complexities of finding equilibria in multiplayer se…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Long version of the paper that got accepted to the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI 2024). 35 pages, 10 figures, 1 table

    MSC Class: 91A05; 91A06; 91A10; 91A11; 91A18; 91A35; 91A68; 68T37; 68Q17; 68Q25

    ACM Class: I.2; J.4; F.2

  2. arXiv:2406.13116  [pdf, ps, other]

    cs.GT

    A Lower Bound on Swap Regret in Extensive-Form Games

    Authors: Constantinos Daskalakis, Gabriele Farina, Noah Golowich, Tuomas Sandholm, Brian Hu Zhang

    Abstract: Recent simultaneous works by Peng and Rubinstein [2024] and Dagan et al. [2024] have demonstrated the existence of a no-swap-regret learning algorithm that can reach $ε$ average swap regret against an adversary in any extensive-form game within $m^{\tilde{\mathcal O}(1/ε)}$ rounds, where $m$ is the number of nodes in the game tree. However, the question of whether a $\mathrm{poly}(m, 1/ε)$-round a…

    Submitted 18 June, 2024; originally announced June 2024.

  3. arXiv:2406.08687  [pdf, other]

    cs.AI

    AlphaZeroES: Direct score maximization outperforms planning loss minimization

    Authors: Carlos Martin, Tuomas Sandholm

    Abstract: Planning at execution time has been shown to dramatically improve performance for agents in both single-agent and multi-agent settings. A well-known family of approaches to planning at execution time are AlphaZero and its variants, which use Monte Carlo Tree Search together with a neural network that guides the search by predicting state values and action probabilities. AlphaZero trains these netw…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2308.08693

  4. arXiv:2406.08683  [pdf, other]

    cs.GT

    Simultaneous incremental support adjustment and metagame solving: An equilibrium-finding framework for continuous-action games

    Authors: Carlos Martin, Tuomas Sandholm

    Abstract: We present a framework for computing approximate mixed-strategy Nash equilibria of continuous-action games. It is a modification of the traditional double oracle algorithm, extended to multiple players and continuous action spaces. Unlike prior methods, it maintains fixed-cardinality pure strategy sets for each player. Thus, unlike prior methods, only a constant amount of memory is necessary. Furt…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2301.08830

  5. arXiv:2405.06797  [pdf, ps, other]

    cs.GT

    Exponential Lower Bounds on the Double Oracle Algorithm in Zero-Sum Games

    Authors: Brian Hu Zhang, Tuomas Sandholm

    Abstract: The double oracle algorithm is a popular method of solving games, because it is able to reduce computing equilibria to computing a series of best responses. However, its theoretical properties are not well understood. In this paper, we provide exponential lower bounds on the performance of the double oracle algorithm in both partially-observable stochastic games (POSGs) and extensive-form games (E…

    Submitted 10 May, 2024; originally announced May 2024.

  6. arXiv:2404.09097  [pdf, other]

    cs.GT

    Faster Game Solving via Hyperparameter Schedules

    Authors: Naifeng Zhang, Stephen McAleer, Tuomas Sandholm

    Abstract: The counterfactual regret minimization (CFR) family of algorithms consists of iterative algorithms for imperfect-information games. In two-player zero-sum games, the time average of the iterates converges to a Nash equilibrium. The state-of-the-art prior variants, Discounted CFR (DCFR) and Predictive CFR$^+$ (PCFR$^+$) are the fastest known algorithms for solving two-player zero-sum games in pract…

    Submitted 13 April, 2024; originally announced April 2024.

  7. arXiv:2402.09670  [pdf, ps, other]

    cs.GT

    Efficient $Φ$-Regret Minimization with Low-Degree Swap Deviations in Extensive-Form Games

    Authors: Brian Hu Zhang, Ioannis Anagnostides, Gabriele Farina, Tuomas Sandholm

    Abstract: Recent breakthrough results by Dagan, Daskalakis, Fishelson and Golowich [2023] and Peng and Rubinstein [2023] established an efficient algorithm attaining at most $ε$ swap regret over extensive-form strategy spaces of dimension $N$ in $N^{\tilde O(1/ε)}$ rounds. On the other extreme, Farina and Pipis [2023] developed an efficient algorithm for minimizing the weaker notion of linear-swap regret in…

    Submitted 17 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  8. arXiv:2402.08129  [pdf, ps, other]

    cs.GT

    Automated Design of Affine Maximizer Mechanisms in Dynamic Settings

    Authors: Michael Curry, Vinzenz Thoma, Darshan Chakrabarti, Stephen McAleer, Christian Kroer, Tuomas Sandholm, Niao He, Sven Seuken

    Abstract: Dynamic mechanism design is a challenging extension to ordinary mechanism design in which the mechanism designer must make a sequence of decisions over time in the face of possibly untruthful reports of participating agents. Optimizing dynamic mechanisms for welfare is relatively well understood. However, there has been less work on optimizing for other goals (e.g. revenue), and without restrictiv…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: To be published in the Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI 2024)

  9. arXiv:2402.06053  [pdf, other]

    cs.HC cs.AI cs.CY

    Randomness Is All You Need: Semantic Traversal of Problem-Solution Spaces with Large Language Models

    Authors: Thomas Sandholm, Sayandev Mukherjee, Bernardo A. Huberman

    Abstract: We present a novel approach to exploring innovation problem and solution domains using LLM fine-tuning with a custom idea database. By semantically traversing the bi-directional problem and solution tree at different temperature levels we achieve high diversity in solution edit distance while still remaining close to the original problem statement semantically. In addition to finding a variety of…

    Submitted 8 February, 2024; originally announced February 2024.

  10. arXiv:2402.05245  [pdf, ps, other]

    cs.GT

    On the Outcome Equivalence of Extensive-Form and Behavioral Correlated Equilibria

    Authors: Brian Hu Zhang, Tuomas Sandholm

    Abstract: We investigate two notions of correlated equilibrium for extensive-form games: extensive-form correlated equilibrium (EFCE) and behavioral correlated equilibrium (BCE). We show that the two are outcome-equivalent, in the sense that every outcome distribution achievable under one notion is achievable under the other. Our result implies, to our knowledge, the first polynomial-time algorithm for comp…

    Submitted 7 February, 2024; originally announced February 2024.

  11. arXiv:2401.17044  [pdf, other]

    cs.AI cs.GT cs.MA

    Scalable Mechanism Design for Multi-Agent Path Finding

    Authors: Paul Friedrich, Yulun Zhang, Michael Curry, Ludwig Dierks, Stephen McAleer, Jiaoyang Li, Tuomas Sandholm, Sven Seuken

    Abstract: Multi-Agent Path Finding (MAPF) involves determining paths for multiple agents to travel simultaneously and collision-free through a shared area toward given goal locations. This problem is computationally complex, especially when dealing with large numbers of agents, as is common in realistic applications like autonomous vehicle coordination. Finding an optimal solution is often computationally i…

    Submitted 8 May, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: 12 pages, 5 figures. IJCAI'24 camera-ready version

  12. arXiv:2401.13773  [pdf, other]

    math.OC cs.DM cs.DS

    New Sequence-Independent Lifting Techniques for Cutting Planes and When They Induce Facets

    Authors: Siddharth Prasad, Ellen Vitercik, Maria-Florina Balcan, Tuomas Sandholm

    Abstract: Sequence-independent lifting is a procedure for strengthening valid inequalities of an integer program. We generalize the sequence-independent lifting method of Gu, Nemhauser, and Savelsbergh (GNS lifting) for cover inequalities and correct an error in their proposed generalization. We obtain a new sequence-independent lifting technique -- piecewise-constant (PC) lifting -- with a number of intere…

    Submitted 24 January, 2024; originally announced January 2024.

  13. arXiv:2312.12067  [pdf, other]

    cs.GT cs.LG

    Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property

    Authors: Ioannis Anagnostides, Ioannis Panageas, Gabriele Farina, Tuomas Sandholm

    Abstract: Policy gradient methods enjoy strong practical performance in numerous tasks in reinforcement learning. Their theoretical understanding in multiagent settings, however, remains limited, especially beyond two-player competitive and potential Markov games. In this paper, we develop a new framework to characterize optimistic policy gradient methods in multi-player Markov games with a single controlle…

    Submitted 21 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: To appear at AAAI 2024

  14. arXiv:2311.14869  [pdf, ps, other]

    cs.GT

    On the Complexity of Computing Sparse Equilibria and Lower Bounds for No-Regret Learning in Games

    Authors: Ioannis Anagnostides, Alkis Kalavasis, Tuomas Sandholm, Manolis Zampetakis

    Abstract: Characterizing the performance of no-regret dynamics in multi-player games is a foundational problem at the interface of online learning and game theory. Recent results have revealed that when all players adopt specific learning algorithms, it is possible to improve exponentially over what is predicted by the overly pessimistic no-regret framework in the traditional adversarial regime, thereby lea…

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: To appear at ITCS 2024

  15. arXiv:2310.16976  [pdf, other]

    cs.GT

    On the Interplay between Social Welfare and Tractability of Equilibria

    Authors: Ioannis Anagnostides, Tuomas Sandholm

    Abstract: Computational tractability and social welfare (a.k.a. efficiency) of equilibria are two fundamental but in general orthogonal considerations in algorithmic game theory. Nevertheless, we show that when (approximate) full efficiency can be guaranteed via a smoothness argument à la Roughgarden, Nash equilibria are approachable under a family of no-regret learning algorithms, thereby enabling fast and d…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: To appear at NeurIPS 2023

  16. arXiv:2310.15935  [pdf, other]

    cs.GT

    Mediator Interpretation and Faster Learning Algorithms for Linear Correlated Equilibria in General Extensive-Form Games

    Authors: Brian Hu Zhang, Gabriele Farina, Tuomas Sandholm

    Abstract: A recent paper by Farina & Pipis (2023) established the existence of uncoupled no-linear-swap regret dynamics with polynomial-time iterations in extensive-form games. The equilibrium points reached by these dynamics, known as linear correlated equilibria, are currently the tightest known relaxation of correlated equilibrium that can be learned in polynomial time in any finite extensive-form game.…

    Submitted 15 March, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

  17. arXiv:2310.04373  [pdf, other]

    cs.LG cs.AI

    Confronting Reward Model Overoptimization with Constrained RLHF

    Authors: Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen McAleer

    Abstract: Large language models are typically aligned with human preferences by optimizing reward models (RMs) fitted to human feedback. However, human preferences are multi-faceted, and it is increasingly common to derive reward from a composition of simpler reward models which each capture a different aspect of language quality. This itself presents a challenge, as it is difficult to appropriat…

    Submitted 10 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

  18. arXiv:2308.16017  [pdf, ps, other]

    cs.GT

    Hidden-Role Games: Equilibrium Concepts and Computation

    Authors: Luca Carminati, Brian Hu Zhang, Gabriele Farina, Nicola Gatti, Tuomas Sandholm

    Abstract: In this paper, we study the class of games known as hidden-role games in which players are assigned privately to teams and are faced with the challenge of recognizing and cooperating with teammates. This model includes both popular recreational games such as the Mafia/Werewolf family and The Resistance (Avalon) and many real-world settings, such as distributed systems where nodes need to work toge…

    Submitted 17 February, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

  19. arXiv:2308.08693  [pdf, other]

    cs.AI cs.LG

    AI planning in the imagination: High-level planning on learned abstract search spaces

    Authors: Carlos Martin, Tuomas Sandholm

    Abstract: Search and planning algorithms have been a cornerstone of artificial intelligence since the field's inception. Giving reinforcement learning agents the ability to plan during execution time has resulted in significant performance improvements in various domains. However, in real-world environments, the model with respect to which the agent plans has been constrained to be grounded in the real envi…

    Submitted 2 December, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

  20. arXiv:2307.12062  [pdf, other]

    cs.LG cs.AI

    Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations

    Authors: Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Xiangyu Liu, Benjamin Eysenbach, Tuomas Sandholm, Furong Huang, Stephen McAleer

    Abstract: Deploying reinforcement learning (RL) systems requires robustness to uncertainty and model misspecification, yet prior robust RL methods typically only study noise introduced independently across time. However, practical sources of uncertainty are usually coupled across time. We formally introduce temporally-coupled perturbations, presenting a novel challenge for existing robust RL methods. To tac…

    Submitted 25 April, 2024; v1 submitted 22 July, 2023; originally announced July 2023.

    Comments: Accepted at The Twelfth International Conference on Learning Representations (ICLR 2024)

  21. arXiv:2306.05221  [pdf, other]

    cs.GT

    Steering No-Regret Learners to a Desired Equilibrium

    Authors: Brian Hu Zhang, Gabriele Farina, Ioannis Anagnostides, Federico Cacciamani, Stephen Marcus McAleer, Andreas Alexander Haupt, Andrea Celli, Nicola Gatti, Vincent Conitzer, Tuomas Sandholm

    Abstract: A mediator observes no-regret learners playing an extensive-form game repeatedly across $T$ rounds. The mediator attempts to steer players toward some desirable predetermined equilibrium by giving (nonnegative) payments to players. We call this the steering problem. The steering problem captures several problems of interest, among them equilibrium selection and information design (persuas…

    Submitted 17 February, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

  22. arXiv:2306.05216  [pdf, ps, other]

    cs.GT

    Computing Optimal Equilibria and Mechanisms via Learning in Zero-Sum Extensive-Form Games

    Authors: Brian Hu Zhang, Gabriele Farina, Ioannis Anagnostides, Federico Cacciamani, Stephen Marcus McAleer, Andreas Alexander Haupt, Andrea Celli, Nicola Gatti, Vincent Conitzer, Tuomas Sandholm

    Abstract: We introduce a new approach for computing optimal equilibria via learning in games. It applies to extensive-form settings with any number of players, including mechanism design, information design, and solution concepts such as correlated, communication, and certification equilibria. We observe that optimal equilibria are minimax equilibrium strategies of a player in an extensive-form zero-sum gam…

    Submitted 23 May, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

  23. arXiv:2306.03049  [pdf, other]

    cs.NI

    WHO-IS: Wireless Hetnet Optimization using Impact Selection

    Authors: Thomas Sandholm, Irene Macaluso, Sayandev Mukherjee

    Abstract: We propose a method to first identify users who have the most negative impact on the overall network performance, and then offload them to an orthogonal channel. The feasibility of such an approach is verified using real-world traces, network simulations, and a lab experiment that employs multi-homed wireless stations. In our experiment, as offload target, we employ LiFi IR transceivers, and as th…

    Submitted 26 June, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

  24. arXiv:2302.14234  [pdf, other]

    cs.GT econ.TH

    Bicriteria Multidimensional Mechanism Design with Side Information

    Authors: Maria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm

    Abstract: We develop a versatile new methodology for multidimensional mechanism design that incorporates side information about agent types with the bicriteria goal of generating high social welfare and high revenue simultaneously. Side information can come from a variety of sources -- examples include advice from a domain expert, predictions from a machine-learning model trained on historical agent data, o…

    Submitted 5 June, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  25. arXiv:2301.11241  [pdf, other]

    cs.LG cs.GT

    On the Convergence of No-Regret Learning Dynamics in Time-Varying Games

    Authors: Ioannis Anagnostides, Ioannis Panageas, Gabriele Farina, Tuomas Sandholm

    Abstract: Most of the literature on learning in games has focused on the restrictive setting where the underlying repeated game does not change over time. Much less is known about the convergence of no-regret learning algorithms in dynamic multiagent settings. In this paper, we characterize the convergence of optimistic gradient descent (OGD) in time-varying games. Our framework yields sharp convergence bou…

    Submitted 18 October, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: To appear at NeurIPS 2023; V3 incorporates reviewers' feedback and minor corrections

  26. arXiv:2301.08830  [pdf, other]

    cs.GT cs.AI cs.LG cs.MA

    ApproxED: Approximate exploitability descent via learned best responses

    Authors: Carlos Martin, Tuomas Sandholm

    Abstract: There has been substantial progress on finding game-theoretic equilibria. Most of that work has focused on games with finite, discrete action spaces. However, many games involving space, time, money, and other fine-grained quantities have continuous action spaces (or are best modeled as having such). We study the problem of finding an approximate Nash equilibrium of games with continuous action se…

    Submitted 12 June, 2024; v1 submitted 20 January, 2023; originally announced January 2023.

  27. arXiv:2211.15936  [pdf, other]

    cs.GT cs.AI cs.LG cs.MA

    Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks

    Authors: Carlos Martin, Tuomas Sandholm

    Abstract: We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients. Such game access is common in reinforcement learning settings, where the environment is typically treated as a black box. To tackle this problem, we apply zeroth-order optimization techniques that combine smoothed gradient estimators with equilibrium-finding dynamics. We model p…

    Submitted 29 November, 2022; originally announced November 2022.

  28. arXiv:2209.14110  [pdf, other]

    cs.GT

    Meta-Learning in Games

    Authors: Keegan Harris, Ioannis Anagnostides, Gabriele Farina, Mikhail Khodak, Zhiwei Steven Wu, Tuomas Sandholm

    Abstract: In the literature on game-theoretic equilibrium finding, focus has mainly been on solving a single game in isolation. In practice, however, strategic interactions -- ranging from routing problems to online advertising auctions -- evolve dynamically, thereby leading to many similar games to be solved. To address this gap, we introduce meta-learning for equilibrium finding and learning to play games…

    Submitted 1 March, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

    Comments: In the Eleventh International Conference on Learning Representations (ICLR 2023)

  29. arXiv:2208.09747  [pdf, ps, other]

    cs.GT cs.LG

    Near-Optimal $Φ$-Regret Learning in Extensive-Form Games

    Authors: Ioannis Anagnostides, Gabriele Farina, Tuomas Sandholm

    Abstract: In this paper, we establish efficient and uncoupled learning dynamics so that, when employed by all players in multiplayer perfect-recall imperfect-information extensive-form games, the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play. This improves exponentially over the prior best known trigger-regret bound of $O(T^{1/4})$, and settles a recent open question by Ba…

    Submitted 19 September, 2023; v1 submitted 20 August, 2022; originally announced August 2022.

    Comments: Appearing at ICML 2023. V3 corrects a statement

  30. arXiv:2207.06541  [pdf, other]

    cs.GT cs.LG cs.MA

    Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

    Authors: Stephen McAleer, JB Lanier, Kevin Wang, Pierre Baldi, Roy Fox, Tuomas Sandholm

    Abstract: In competitive two-agent environments, deep reinforcement learning (RL) methods based on the Double Oracle (DO) algorithm, such as Policy Space Response Oracles (PSRO) and Anytime PSRO (APSRO), iteratively add RL best response policies to a population. Eventually, an optimal mixture of these population policies will approximate a Nash equilibrium. However, these methods might…

    Submitted 13 July, 2022; originally announced July 2022.

  31. arXiv:2206.15395  [pdf, other]

    cs.GT

    Polynomial-Time Optimal Equilibria with a Mediator in Extensive-Form Games

    Authors: Brian Hu Zhang, Tuomas Sandholm

    Abstract: For common notions of correlated equilibrium in extensive-form games, computing an optimal (e.g., welfare-maximizing) equilibrium is NP-hard. Other equilibrium notions -- communication (Forges 1986) and certification (Forges & Koessler 2005) equilibria -- augment the game with a mediator that has the power to both send and receive messages to and from the players -- and, in particular, to remember…

    Submitted 30 November, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  32. arXiv:2206.12762  [pdf, other]

    cs.NI

    SnoW: Serverless n-Party calls over WebRTC

    Authors: Thomas Sandholm

    Abstract: We present a novel WebRTC communication system capable of hosting multi-party audio and video conferencing sessions without a media server. We implement various communication models based on the needs and capabilities of the communicating parties, and show that we can construct the equivalent of Mesh, SFU, and MCU WebRTC networks in our peer-to-peer architecture. In our evaluation we conclude that…

    Submitted 25 June, 2022; originally announced June 2022.

  33. arXiv:2206.08742  [pdf, other]

    cs.GT cs.LG

    Near-Optimal No-Regret Learning Dynamics for General Convex Games

    Authors: Gabriele Farina, Ioannis Anagnostides, Haipeng Luo, Chung-Wei Lee, Christian Kroer, Tuomas Sandholm

    Abstract: A recent line of work has established uncoupled learning dynamics such that, when employed by all players in a game, each player's regret after $T$ repetitions grows polylogarithmically in $T$, an exponential improvement over the traditional guarantees within the no-regret framework. However, so far these results have only been limited to certain classes of games with structured strategy sp…

    Submitted 16 October, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

    Comments: To appear at NeurIPS 2022. V2 incorporates reviewers' feedback

  34. arXiv:2206.04122  [pdf, other]

    cs.GT cs.AI cs.LG stat.ML

    ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret

    Authors: Stephen McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm

    Abstract: Recent techniques for approximating Nash equilibria in very large games leverage neural networks to learn approximately optimal policies (strategies). One promising line of research uses neural networks to approximate counterfactual regret minimization (CFR) or its modern variants. DREAM, the only current CFR-based neural method that is model free and therefore scalable to very large games, trains…

    Submitted 11 October, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

  35. arXiv:2204.11417  [pdf, other]

    cs.GT cs.LG

    Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games

    Authors: Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Tuomas Sandholm

    Abstract: In this paper we establish efficient and uncoupled learning dynamics so that, when employed by all players in a general-sum multiplayer game, the swap regret of each player after $T$ repetitions of the game is bounded by $O(\log T)$, improving over the prior best bounds of $O(\log^4 (T))$. At the same time, we guarantee optimal $O(\sqrt{T})$ swap regret in the adversarial regime as w…

    Submitted 5 October, 2022; v1 submitted 24 April, 2022; originally announced April 2022.

    Comments: To appear at NeurIPS 2022. V2 incorporates reviewers' feedback and minor corrections

  36. arXiv:2204.07312  [pdf, other]

    math.OC cs.DS cs.LG

    Structural Analysis of Branch-and-Cut and the Learnability of Gomory Mixed Integer Cuts

    Authors: Maria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm, Ellen Vitercik

    Abstract: The incorporation of cutting planes within the branch-and-bound algorithm, known as branch-and-cut, forms the backbone of modern integer programming solvers. These solvers are the foremost method for solving discrete optimization problems and thus have a vast array of applications in machine learning, operations research, and many other fields. Choosing cutting planes effectively is a major resear…

    Submitted 14 April, 2022; originally announced April 2022.

  37. arXiv:2203.12074  [pdf, other]

    cs.GT

    Optimistic Mirror Descent Either Converges to Nash or to Strong Coarse Correlated Equilibria in Bimatrix Games

    Authors: Ioannis Anagnostides, Gabriele Farina, Ioannis Panageas, Tuomas Sandholm

    Abstract: We show that, for any sufficiently small fixed $ε > 0$, when both players in a general-sum two-player (bimatrix) game employ optimistic mirror descent (OMD) with smooth regularization, learning rate $η = O(ε^2)$ and $T = Ω(\text{poly}(1/ε))$ repetitions, either the dynamics reach an $ε$-approximate Nash equilibrium (NE), or the average correlated distribution of play is an $Ω(\text{poly}(ε))$-strong…

    Submitted 6 October, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: To appear at NeurIPS 2022. V2 incorporates reviewers' feedback

  38. arXiv:2203.12056  [pdf, other]

    cs.GT

    On Last-Iterate Convergence Beyond Zero-Sum Games

    Authors: Ioannis Anagnostides, Ioannis Panageas, Gabriele Farina, Tuomas Sandholm

    Abstract: Most existing results about last-iterate convergence of learning dynamics are limited to two-player zero-sum games, and only apply under rigid assumptions about what dynamics the players follow. In this paper we provide new results and techniques that apply to broader families of games and learning dynamics. First, we use a regret-based analysis to show that in a class of games that include…

    Submitted 22 March, 2022; originally announced March 2022.

  39. arXiv:2203.07181  [pdf, other]

    cs.GT cs.AI cs.LG

    Optimal Correlated Equilibria in General-Sum Extensive-Form Games: Fixed-Parameter Algorithms, Hardness, and Two-Sided Column-Generation

    Authors: Brian Zhang, Gabriele Farina, Andrea Celli, Tuomas Sandholm

    Abstract: We study the problem of finding optimal correlated equilibria of various sorts: normal-form coarse correlated equilibrium (NFCCE), extensive-form coarse correlated equilibrium (EFCCE), and extensive-form correlated equilibrium (EFCE). This is NP-hard in the general case and has been studied in special cases, most notably triangle-free games, which include all two-player games with public chance mo…

    Submitted 14 March, 2022; originally announced March 2022.

  40. arXiv:2202.05446  [pdf, other]

    cs.GT

    Faster No-Regret Learning Dynamics for Extensive-Form Correlated and Coarse Correlated Equilibria

    Authors: Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Andrea Celli, Tuomas Sandholm

    Abstract: A recent emerging trend in the literature on learning in games has been concerned with providing faster learning dynamics for correlated and coarse correlated equilibria in normal-form games. Much less is known about the significantly more challenging setting of extensive-form games, which can capture both sequential and simultaneous moves, as well as imperfect information. In this paper we establ…

    Submitted 10 February, 2022; originally announced February 2022.

    Comments: Preliminary parts of this paper will appear at the AAAI-22 Workshop on Reinforcement Learning in Games. This version also contains results from an earlier preprint published by a subset of the authors (arXiv:2109.08138)

  41. arXiv:2202.02872  [pdf, other]

    cs.GT cs.LG econ.GN

    Differentiable Economics for Randomized Affine Maximizer Auctions

    Authors: Michael Curry, Tuomas Sandholm, John Dickerson

    Abstract: A recent approach to automated mechanism design, differentiable economics, represents auctions by rich function approximators and optimizes their performance by gradient descent. The ideal auction architecture for differentiable economics would be perfectly strategyproof, support multiple bidders and items, and be rich enough to represent the optimal (i.e. revenue-maximizing) mechanism. So far, su…

    Submitted 6 February, 2022; originally announced February 2022.

  42. arXiv:2202.00789  [pdf, other]

    cs.GT

    Team Belief DAG: Generalizing the Sequence Form to Team Games for Fast Computation of Correlated Team Max-Min Equilibria via Regret Minimization

    Authors: Brian Hu Zhang, Gabriele Farina, Tuomas Sandholm

    Abstract: A classic result in the theory of extensive-form games asserts that the set of strategies available to any perfect-recall player is strategically equivalent to a low-dimensional convex polytope, called the sequence-form polytope. Online convex optimization tools operating on this polytope are the current state-of-the-art for computing several notions of equilibria in games, and have been crucial i…

    Submitted 17 February, 2024; v1 submitted 1 February, 2022; originally announced February 2022.

  43. arXiv:2201.07700  [pdf, other]

    cs.GT cs.LG cs.MA

    Anytime PSRO for Two-Player Zero-Sum Games

    Authors: Stephen McAleer, Kevin Wang, John Lanier, Marc Lanctot, Pierre Baldi, Tuomas Sandholm, Roy Fox

    Abstract: Policy space response oracles (PSRO) is a multi-agent reinforcement learning algorithm that has achieved state-of-the-art performance in very large two-player zero-sum games. PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next. We propose anytime double oracle (ADO)…

    Submitted 28 January, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

    Comments: Published in AAAI Reinforcement Learning in Games Workshop

  44. arXiv:2112.03804  [pdf, other]

    cs.GT

    Fast Payoff Matrix Sparsification Techniques for Structured Extensive-Form Games

    Authors: Gabriele Farina, Tuomas Sandholm

    Abstract: The practical scalability of many optimization algorithms for large extensive-form games is often limited by the games' huge payoff matrices. To ameliorate the issue, Zhang and Sandholm (2020) recently proposed a sparsification technique that factorizes the payoff matrix $\mathbf{A}$ into a sparser object $\mathbf{A} = \hat{\mathbf{A}} + \mathbf{U}\mathbf{V}^\top$, where the total combined number…

    Submitted 7 December, 2021; originally announced December 2021.

    Comments: To appear at AAAI'22

  45. arXiv:2111.11207  [pdf, other]

    cs.LG cs.AI cs.DS math.OC

    Improved Sample Complexity Bounds for Branch-and-Cut

    Authors: Maria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm, Ellen Vitercik

    Abstract: Branch-and-cut is the most widely used algorithm for solving integer programs, employed by commercial solvers like CPLEX and Gurobi. Branch-and-cut has a wide variety of tunable parameters that have a huge impact on the size of the search tree that it builds, but are challenging to tune by hand. An increasingly popular approach is to use machine learning to tune these parameters: using a training…

    Submitted 11 May, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

  46. arXiv:2111.09161  [pdf, other]

    cs.NI

    MASS: Mobile Autonomous Station Simulation

    Authors: Thomas Sandholm, Sayandev Mukherjee

    Abstract: We propose a set of tools to replay wireless network traffic traces, while preserving the privacy of the original traces. Traces are generated by a user- and context-aware trained generative adversarial network (GAN). The replay allows for realistic traces from any number of users and of any trace duration to be produced given contextual parameters like the type of application and the real-time…

    Submitted 17 November, 2021; originally announced November 2021.

  47. Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games

    Authors: Ioannis Anagnostides, Constantinos Daskalakis, Gabriele Farina, Maxwell Fishelson, Noah Golowich, Tuomas Sandholm

    Abstract: Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS'21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game. We extend their result from external regret to internal regret and swap regret, thereby establishing uncoup…

    Submitted 24 January, 2023; v1 submitted 10 November, 2021; originally announced November 2021.

    Comments: Appeared at STOC 2022

  48. arXiv:2109.08138  [pdf, other]

    cs.GT

    Efficient Decentralized Learning Dynamics for Extensive-Form Coarse Correlated Equilibrium: No Expensive Computation of Stationary Distributions Required

    Authors: Gabriele Farina, Andrea Celli, Tuomas Sandholm

    Abstract: While in two-player zero-sum games the Nash equilibrium is a well-established prescriptive notion of optimal play, its applicability as a prescriptive tool beyond that setting is limited. Consequently, the study of decentralized learning dynamics that guarantee convergence to correlated solution concepts in multiplayer, general-sum extensive-form (i.e., tree-form) games has become an important top…

    Submitted 16 September, 2021; originally announced September 2021.

  49. arXiv:2109.05284  [pdf, ps, other]

    cs.GT

    Team Correlated Equilibria in Zero-Sum Extensive-Form Games via Tree Decompositions

    Authors: Brian Hu Zhang, Tuomas Sandholm

    Abstract: Despite the many recent practical and theoretical breakthroughs in computational game theory, equilibrium finding in extensive-form team games remains a significant challenge. While NP-hard in the worst case, there are provably efficient algorithms for certain families of team games. In particular, if the game has common external information, also known as A-loss recall -- informally, actions playe…

    Submitted 16 January, 2022; v1 submitted 11 September, 2021; originally announced September 2021.

  50. arXiv:2108.05475  [pdf, other]

    cs.DC cs.CR

    SAFE: Secure Aggregation with Failover and Encryption

    Authors: Thomas Sandholm, Sayandev Mukherjee, Bernardo A. Huberman

    Abstract: We propose and experimentally evaluate a novel secure aggregation algorithm targeted at cross-organizational federated learning applications with a fixed set of participating learners. Our solution organizes learners in a chain and encrypts all traffic to reduce the controller of the aggregation to a mere message broker. We show that our algorithm scales better and is less resource demanding than…

    Submitted 12 August, 2021; v1 submitted 11 August, 2021; originally announced August 2021.