Search | arXiv e-print repository

Efficient $Φ$-Regret Minimization with Low-Degree Swap Deviations in Extensive-Form Games

Authors: Brian Hu Zhang, Ioannis Anagnostides, Gabriele Farina, Tuomas Sandholm

Abstract: Recent breakthrough results by Dagan, Daskalakis, Fishelson and Golowich [2023] and Peng and Rubinstein [2023] established an efficient algorithm attaining at most $ε$ swap regret over extensive-form strategy spaces of dimension $N$ in $N^{\tilde O(1/ε)}$ rounds. On the other extreme, Farina and Pipis [2023] developed an efficient algorithm for minimizing the weaker notion of linear-swap regret in… ▽ More Recent breakthrough results by Dagan, Daskalakis, Fishelson and Golowich [2023] and Peng and Rubinstein [2023] established an efficient algorithm attaining at most $ε$ swap regret over extensive-form strategy spaces of dimension $N$ in $N^{\tilde O(1/ε)}$ rounds. On the other extreme, Farina and Pipis [2023] developed an efficient algorithm for minimizing the weaker notion of linear-swap regret in $\mathsf{poly}(N)/ε^2$ rounds. In this paper, we take a step toward bridging the gap between those two results. We introduce the set of $k$-mediator deviations, which generalize the untimed communication deviations recently introduced by Zhang, Farina and Sandholm [2024] to the case of having multiple mediators. We develop parameterized algorithms for minimizing the regret with respect to this set of deviations in $N^{O(k)}/ε^2$ rounds. This closes the gap in the sense that $k=1$ recovers linear swap regret, while $k=N$ recovers swap regret. Moreover, by relating $k$-mediator deviations to low-degree polynomials, we show that regret minimization against degree-$k$ polynomial swap deviations is achievable in $N^{O(kd)^3}/ε^2$ rounds, where $d$ is the depth of the game, assuming constant branching factor. For a fixed degree $k$, this is polynomial for Bayesian games and quasipolynomial more broadly when $d = \mathsf{polylog} N$ -- the usual balancedness assumption on the game tree. △ Less

Submitted 17 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2312.12067 [pdf, other]

Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property

Authors: Ioannis Anagnostides, Ioannis Panageas, Gabriele Farina, Tuomas Sandholm

Abstract: Policy gradient methods enjoy strong practical performance in numerous tasks in reinforcement learning. Their theoretical understanding in multiagent settings, however, remains limited, especially beyond two-player competitive and potential Markov games. In this paper, we develop a new framework to characterize optimistic policy gradient methods in multi-player Markov games with a single controlle… ▽ More Policy gradient methods enjoy strong practical performance in numerous tasks in reinforcement learning. Their theoretical understanding in multiagent settings, however, remains limited, especially beyond two-player competitive and potential Markov games. In this paper, we develop a new framework to characterize optimistic policy gradient methods in multi-player Markov games with a single controller. Specifically, under the further assumption that the game exhibits an equilibrium collapse, in that the marginals of coarse correlated equilibria (CCE) induce Nash equilibria (NE), we show convergence to stationary $ε$-NE in $O(1/ε^2)$ iterations, where $O(\cdot)$ suppresses polynomial factors in the natural parameters of the game. Such an equilibrium collapse is well-known to manifest itself in two-player zero-sum Markov games, but also occurs even in a class of multi-player Markov games with separable interactions, as established by recent work. As a result, we bypass known complexity barriers for computing stationary NE when either of our assumptions fails. Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games. △ Less

Submitted 21 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: To appear at AAAI 2024

arXiv:2311.14869 [pdf, ps, other]

On the Complexity of Computing Sparse Equilibria and Lower Bounds for No-Regret Learning in Games

Authors: Ioannis Anagnostides, Alkis Kalavasis, Tuomas Sandholm, Manolis Zampetakis

Abstract: Characterizing the performance of no-regret dynamics in multi-player games is a foundational problem at the interface of online learning and game theory. Recent results have revealed that when all players adopt specific learning algorithms, it is possible to improve exponentially over what is predicted by the overly pessimistic no-regret framework in the traditional adversarial regime, thereby lea… ▽ More Characterizing the performance of no-regret dynamics in multi-player games is a foundational problem at the interface of online learning and game theory. Recent results have revealed that when all players adopt specific learning algorithms, it is possible to improve exponentially over what is predicted by the overly pessimistic no-regret framework in the traditional adversarial regime, thereby leading to faster convergence to the set of coarse correlated equilibria (CCE). Yet, despite considerable recent progress, the fundamental complexity barriers for learning in normal- and extensive-form games are poorly understood. In this paper, we make a step towards closing this gap by first showing that -- barring major complexity breakthroughs -- any polynomial-time learning algorithms in extensive-form games need at least $2^{\log^{1/2 - o(1)} |\mathcal{T}|}$ iterations for the average regret to reach below even an absolute constant, where $|\mathcal{T}|$ is the number of nodes in the game. This establishes a superpolynomial separation between no-regret learning in normal- and extensive-form games, as in the former class a logarithmic number of iterations suffices to achieve constant average regret. Furthermore, our results imply that algorithms such as multiplicative weights update, as well as its \emph{optimistic} counterpart, require at least $2^{(\log \log m)^{1/2 - o(1)}}$ iterations to attain an $O(1)$-CCE in $m$-action normal-form games. These are the first non-trivial -- and dimension-dependent -- lower bounds in that setting for the most well-studied algorithms in the literature. From a technical standpoint, we follow a beautiful connection recently made by Foster, Golowich, and Kakade (ICML '23) between sparse CCE and Nash equilibria in the context of Markov games. Consequently, our lower bounds rule out polynomial-time algorithms well beyond the traditional online learning framework. △ Less

Submitted 24 November, 2023; originally announced November 2023.

Comments: To appear at ITCS 2024

arXiv:2310.16976 [pdf, other]

On the Interplay between Social Welfare and Tractability of Equilibria

Authors: Ioannis Anagnostides, Tuomas Sandholm

Abstract: Computational tractability and social welfare (aka. efficiency) of equilibria are two fundamental but in general orthogonal considerations in algorithmic game theory. Nevertheless, we show that when (approximate) full efficiency can be guaranteed via a smoothness argument à la Roughgarden, Nash equilibria are approachable under a family of no-regret learning algorithms, thereby enabling fast and d… ▽ More Computational tractability and social welfare (aka. efficiency) of equilibria are two fundamental but in general orthogonal considerations in algorithmic game theory. Nevertheless, we show that when (approximate) full efficiency can be guaranteed via a smoothness argument à la Roughgarden, Nash equilibria are approachable under a family of no-regret learning algorithms, thereby enabling fast and decentralized computation. We leverage this connection to obtain new convergence results in large games -- wherein the number of players $n \gg 1$ -- under the well-documented property of full efficiency via smoothness in the limit. Surprisingly, our framework unifies equilibrium computation in disparate classes of problems including games with vanishing strategic sensitivity and two-player zero-sum games, illuminating en route an immediate but overlooked equivalence between smoothness and a well-studied condition in the optimization literature known as the Minty property. Finally, we establish that a family of no-regret dynamics attains a welfare bound that improves over the smoothness framework while at the same time guaranteeing convergence to the set of coarse correlated equilibria. We show this by employing the clairvoyant mirror descent algortihm recently introduced by Piliouras et al. △ Less

Submitted 25 October, 2023; originally announced October 2023.

Comments: To appear at NeurIPS 2023

arXiv:2306.05221 [pdf, other]

Steering No-Regret Learners to a Desired Equilibrium

Authors: Brian Hu Zhang, Gabriele Farina, Ioannis Anagnostides, Federico Cacciamani, Stephen Marcus McAleer, Andreas Alexander Haupt, Andrea Celli, Nicola Gatti, Vincent Conitzer, Tuomas Sandholm

Abstract: A mediator observes no-regret learners playing an extensive-form game repeatedly across $T$ rounds. The mediator attempts to steer players toward some desirable predetermined equilibrium by giving (nonnegative) payments to players. We call this the steering problem. The steering problem captures problems several problems of interest, among them equilibrium selection and information design (persuas… ▽ More A mediator observes no-regret learners playing an extensive-form game repeatedly across $T$ rounds. The mediator attempts to steer players toward some desirable predetermined equilibrium by giving (nonnegative) payments to players. We call this the steering problem. The steering problem captures problems several problems of interest, among them equilibrium selection and information design (persuasion). If the mediator's budget is unbounded, steering is trivial because the mediator can simply pay the players to play desirable actions. We study two bounds on the mediator's payments: a total budget and a per-round budget. If the mediator's total budget does not grow with $T$, we show that steering is impossible. However, we show that it is enough for the total budget to grow sublinearly with $T$, that is, for the average payment to vanish. When players' full strategies are observed at each round, we show that constant per-round budgets permit steering. In the more challenging setting where only trajectories through the game tree are observable, we show that steering is impossible with constant per-round budgets in general extensive-form games, but possible in normal-form games or if the per-round budget may itself depend on $T$. We also show how our results can be generalized to the case when the equilibrium is being computed online while steering is happening. We supplement our theoretical positive results with experiments highlighting the efficacy of steering in large games. △ Less

Submitted 17 February, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

arXiv:2306.05216 [pdf, ps, other]

Computing Optimal Equilibria and Mechanisms via Learning in Zero-Sum Extensive-Form Games

Authors: Brian Hu Zhang, Gabriele Farina, Ioannis Anagnostides, Federico Cacciamani, Stephen Marcus McAleer, Andreas Alexander Haupt, Andrea Celli, Nicola Gatti, Vincent Conitzer, Tuomas Sandholm

Abstract: We introduce a new approach for computing optimal equilibria via learning in games. It applies to extensive-form settings with any number of players, including mechanism design, information design, and solution concepts such as correlated, communication, and certification equilibria. We observe that optimal equilibria are minimax equilibrium strategies of a player in an extensive-form zero-sum gam… ▽ More We introduce a new approach for computing optimal equilibria via learning in games. It applies to extensive-form settings with any number of players, including mechanism design, information design, and solution concepts such as correlated, communication, and certification equilibria. We observe that optimal equilibria are minimax equilibrium strategies of a player in an extensive-form zero-sum game. This reformulation allows to apply techniques for learning in zero-sum games, yielding the first learning dynamics that converge to optimal equilibria, not only in empirical averages, but also in iterates. We demonstrate the practical scalability and flexibility of our approach by attaining state-of-the-art performance in benchmark tabular games, and by computing an optimal mechanism for a sequential auction design problem using deep reinforcement learning. △ Less

Submitted 23 May, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

arXiv:2301.11241 [pdf, other]

On the Convergence of No-Regret Learning Dynamics in Time-Varying Games

Authors: Ioannis Anagnostides, Ioannis Panageas, Gabriele Farina, Tuomas Sandholm

Abstract: Most of the literature on learning in games has focused on the restrictive setting where the underlying repeated game does not change over time. Much less is known about the convergence of no-regret learning algorithms in dynamic multiagent settings. In this paper, we characterize the convergence of optimistic gradient descent (OGD) in time-varying games. Our framework yields sharp convergence bou… ▽ More Most of the literature on learning in games has focused on the restrictive setting where the underlying repeated game does not change over time. Much less is known about the convergence of no-regret learning algorithms in dynamic multiagent settings. In this paper, we characterize the convergence of optimistic gradient descent (OGD) in time-varying games. Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games parameterized on natural variation measures of the sequence of games, subsuming known results for static games. Furthermore, we establish improved second-order variation bounds under strong convexity-concavity, as long as each game is repeated multiple times. Our results also apply to time-varying general-sum multi-player games via a bilinear formulation of correlated equilibria, which has novel implications for meta-learning and for obtaining refined variation-dependent regret bounds, addressing questions left open in prior papers. Finally, we leverage our framework to also provide new insights on dynamic regret guarantees in static games. △ Less

Submitted 18 October, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

Comments: To appear at NeurIPS 2023; V3 incorporates reviewers' feedback and minor corrections

arXiv:2301.02129 [pdf, ps, other]

Algorithms and Complexity for Computing Nash Equilibria in Adversarial Team Games

Authors: Ioannis Anagnostides, Fivos Kalogiannis, Ioannis Panageas, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Stephen McAleer

Abstract: Adversarial team games model multiplayer strategic interactions in which a team of identically-interested players is competing against an adversarial player in a zero-sum game. Such games capture many well-studied settings in game theory, such as congestion games, but go well-beyond to environments wherein the cooperation of one team -- in the absence of explicit communication -- is obstructed by… ▽ More Adversarial team games model multiplayer strategic interactions in which a team of identically-interested players is competing against an adversarial player in a zero-sum game. Such games capture many well-studied settings in game theory, such as congestion games, but go well-beyond to environments wherein the cooperation of one team -- in the absence of explicit communication -- is obstructed by competing entities; the latter setting remains poorly understood despite its numerous applications. Since the seminal work of Von Stengel and Koller (GEB `97), different solution concepts have received attention from an algorithmic standpoint. Yet, the complexity of the standard Nash equilibrium has remained open. In this paper, we settle this question by showing that computing a Nash equilibrium in adversarial team games belongs to the class continuous local search (CLS), thereby establishing CLS-completeness by virtue of the recent CLS-hardness result of Rubinstein and Babichenko (STOC `21) in potential games. To do so, we leverage linear programming duality to prove that any $ε$-approximate stationary strategy for the team can be extended in polynomial time to an $O(ε)$-approximate Nash equilibrium, where the $O(\cdot)$ notation suppresses polynomial factors in the description of the game. As a consequence, we show that the Moreau envelop of a suitable best response function acts as a potential under certain natural gradient-based dynamics. △ Less

Submitted 30 May, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

Comments: To appear at the conference on Economics and Computation (EC) 2023

arXiv:2209.14110 [pdf, other]

Meta-Learning in Games

Authors: Keegan Harris, Ioannis Anagnostides, Gabriele Farina, Mikhail Khodak, Zhiwei Steven Wu, Tuomas Sandholm

Abstract: In the literature on game-theoretic equilibrium finding, focus has mainly been on solving a single game in isolation. In practice, however, strategic interactions -- ranging from routing problems to online advertising auctions -- evolve dynamically, thereby leading to many similar games to be solved. To address this gap, we introduce meta-learning for equilibrium finding and learning to play games… ▽ More In the literature on game-theoretic equilibrium finding, focus has mainly been on solving a single game in isolation. In practice, however, strategic interactions -- ranging from routing problems to online advertising auctions -- evolve dynamically, thereby leading to many similar games to be solved. To address this gap, we introduce meta-learning for equilibrium finding and learning to play games. We establish the first meta-learning guarantees for a variety of fundamental and well-studied classes of games, including two-player zero-sum games, general-sum games, and Stackelberg games. In particular, we obtain rates of convergence to different game-theoretic equilibria that depend on natural notions of similarity between the sequence of games encountered, while at the same time recovering the known single-game guarantees when the sequence of games is arbitrary. Along the way, we prove a number of new results in the single-game regime through a simple and unified framework, which may be of independent interest. Finally, we evaluate our meta-learning algorithms on endgames faced by the poker agent Libratus against top human professionals. The experiments show that games with varying stack sizes can be solved significantly faster using our meta-learning techniques than by solving them separately, often by an order of magnitude. △ Less

Submitted 1 March, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

Comments: In the eleventh Conference on Learning Representations (ICLR 2023)

arXiv:2208.11787 [pdf, other]

Sampling and Optimal Preference Elicitation in Simple Mechanisms

Authors: Ioannis Anagnostides, Dimitris Fotakis, Panagiotis Patsilinakos

Abstract: In this work we are concerned with the design of efficient mechanisms while eliciting limited information from the agents. First, we study the performance of sampling approximations in facility location games. Our key result is to show that for any $ε> 0$, a sample of size $c(ε) = Θ(1/ε^2)$ yields in expectation a $1 + ε$ approximation with respect to the optimal social cost of the generalized med… ▽ More In this work we are concerned with the design of efficient mechanisms while eliciting limited information from the agents. First, we study the performance of sampling approximations in facility location games. Our key result is to show that for any $ε> 0$, a sample of size $c(ε) = Θ(1/ε^2)$ yields in expectation a $1 + ε$ approximation with respect to the optimal social cost of the generalized median mechanism on the metric space $(\mathbb{R}^d, \| \cdot \|_1)$, while the number of agents $n \to \infty$. Moreover, we study a series of exemplar environments from auction theory through a communication complexity framework, measuring the expected number of bits elicited from the agents; we posit that any valuation can be expressed with $k$ bits, and we mainly assume that $k$ is independent of the number of agents $n$. In this context, we show that Vickrey's rule can be implemented with an expected communication of $1 + ε$ bits from an average bidder, for any $ε> 0$, asymptotically matching the trivial lower bound. As a corollary, we provide a compelling method to increment the price in an English auction. We also leverage our single-item format with an efficient encoding scheme to prove that the same communication bound can be recovered in the domain of additive valuations through simultaneous ascending auctions, assuming that the number of items is a constant. Finally, we propose an ascending-type multi-unit auction under unit demand bidders; our mechanism announces at every round two separate prices and is based on a sampling algorithm that performs approximate selection with limited communication, leading again to asymptotically optimal communication. Our results do not require any prior knowledge on the agents' valuations, and mainly follow from natural sampling techniques. △ Less

Submitted 24 August, 2022; originally announced August 2022.

Comments: Preliminary version appeared at SAGT 2020

arXiv:2208.09747 [pdf, ps, other]

Near-Optimal $Φ$-Regret Learning in Extensive-Form Games

Authors: Ioannis Anagnostides, Gabriele Farina, Tuomas Sandholm

Abstract: In this paper, we establish efficient and uncoupled learning dynamics so that, when employed by all players in multiplayer perfect-recall imperfect-information extensive-form games, the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play. This improves exponentially over the prior best known trigger-regret bound of $O(T^{1/4})$, and settles a recent open question by Ba… ▽ More In this paper, we establish efficient and uncoupled learning dynamics so that, when employed by all players in multiplayer perfect-recall imperfect-information extensive-form games, the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play. This improves exponentially over the prior best known trigger-regret bound of $O(T^{1/4})$, and settles a recent open question by Bai et al. (2022). As an immediate consequence, we guarantee convergence to the set of extensive-form correlated equilibria and coarse correlated equilibria at a near-optimal rate of $\frac{\log T}{T}$. Building on prior work, at the heart of our construction lies a more general result regarding fixed points deriving from rational functions with polynomial degree, a property that we establish for the fixed points of (coarse) trigger deviation functions. Moreover, our construction leverages a refined regret circuit for the convex hull, which -- unlike prior guarantees -- preserves the RVU property introduced by Syrgkanis et al. (NIPS, 2015); this observation has an independent interest in establishing near-optimal regret under learning dynamics based on a CFR-type decomposition of the regret. △ Less

Submitted 19 September, 2023; v1 submitted 20 August, 2022; originally announced August 2022.

Comments: Appearing at ICML 2023. V3 corrects a statement

arXiv:2208.02204 [pdf, ps, other]

Efficiently Computing Nash Equilibria in Adversarial Team Markov Games

Authors: Fivos Kalogiannis, Ioannis Anagnostides, Ioannis Panageas, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Vaggos Chatziafratis, Stelios Stavroulakis

Abstract: Computing Nash equilibrium policies is a central problem in multi-agent reinforcement learning that has received extensive attention both in theory and in practice. However, provable guarantees have been thus far either limited to fully competitive or cooperative scenarios or impose strong assumptions that are difficult to meet in most practical applications. In this work, we depart from those pri… ▽ More Computing Nash equilibrium policies is a central problem in multi-agent reinforcement learning that has received extensive attention both in theory and in practice. However, provable guarantees have been thus far either limited to fully competitive or cooperative scenarios or impose strong assumptions that are difficult to meet in most practical applications. In this work, we depart from those prior results by investigating infinite-horizon \emph{adversarial team Markov games}, a natural and well-motivated class of games in which a team of identically-interested players -- in the absence of any explicit coordination or communication -- is competing against an adversarial player. This setting allows for a unifying treatment of zero-sum Markov games and Markov potential games, and serves as a step to model more realistic strategic interactions that feature both competing and cooperative interests. Our main contribution is the first algorithm for computing stationary $ε$-approximate Nash equilibria in adversarial team Markov games with computational complexity that is polynomial in all the natural parameters of the game, as well as $1/ε$. The proposed algorithm is particularly natural and practical, and it is based on performing independent policy gradient steps for each player in the team, in tandem with best responses from the side of the adversary; in turn, the policy for the adversary is then obtained by solving a carefully constructed linear program. Our analysis leverages non-standard techniques to establish the KKT optimality conditions for a nonlinear program with nonconvex constraints, thereby leading to a natural interpretation of the induced Lagrange multipliers. Along the way, we significantly extend an important characterization of optimal policies in adversarial (normal-form) team games due to Von Stengel and Koller (GEB `97). △ Less

Submitted 3 August, 2022; originally announced August 2022.

arXiv:2206.08742 [pdf, other]

Near-Optimal No-Regret Learning Dynamics for General Convex Games

Authors: Gabriele Farina, Ioannis Anagnostides, Haipeng Luo, Chung-Wei Lee, Christian Kroer, Tuomas Sandholm

Abstract: A recent line of work has established uncoupled learning dynamics such that, when employed by all players in a game, each player's \emph{regret} after $T$ repetitions grows polylogarithmically in $T$, an exponential improvement over the traditional guarantees within the no-regret framework. However, so far these results have only been limited to certain classes of games with structured strategy sp… ▽ More A recent line of work has established uncoupled learning dynamics such that, when employed by all players in a game, each player's \emph{regret} after $T$ repetitions grows polylogarithmically in $T$, an exponential improvement over the traditional guarantees within the no-regret framework. However, so far these results have only been limited to certain classes of games with structured strategy spaces -- such as normal-form and extensive-form games. The question as to whether $O(\text{polylog} T)$ regret bounds can be obtained for general convex and compact strategy sets -- which occur in many fundamental models in economics and multiagent systems -- while retaining efficient strategy updates is an important question. In this paper, we answer this in the positive by establishing the first uncoupled learning algorithm with $O(\log T)$ per-player regret in general \emph{convex games}, that is, games with concave utility functions supported on arbitrary convex and compact strategy sets. Our learning dynamics are based on an instantiation of optimistic follow-the-regularized-leader over an appropriately \emph{lifted} space using a \emph{self-concordant regularizer} that is, peculiarly, not a barrier for the feasible region. Further, our learning dynamics are efficiently implementable given access to a proximal oracle for the convex strategy set, leading to $O(\log\log T)$ per-iteration complexity; we also give extensions when access to only a \emph{linear} optimization oracle is assumed. Finally, we adapt our dynamics to guarantee $O(\sqrt{T})$ regret in the adversarial regime. Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret bounds either in terms of the dependence on the number of iterations or on the dimension of the strategy sets. △ Less

Submitted 16 October, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

Comments: To appear at NeurIPS 2022. V2 incorporates reviewers' feedback

arXiv:2204.11417 [pdf, other]

Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games

Authors: Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Tuomas Sandholm

Abstract: In this paper we establish efficient and \emph{uncoupled} learning dynamics so that, when employed by all players in a general-sum multiplayer game, the \emph{swap regret} of each player after $T$ repetitions of the game is bounded by $O(\log T)$, improving over the prior best bounds of $O(\log^4 (T))$. At the same time, we guarantee optimal $O(\sqrt{T})$ swap regret in the adversarial regime as w… ▽ More In this paper we establish efficient and \emph{uncoupled} learning dynamics so that, when employed by all players in a general-sum multiplayer game, the \emph{swap regret} of each player after $T$ repetitions of the game is bounded by $O(\log T)$, improving over the prior best bounds of $O(\log^4 (T))$. At the same time, we guarantee optimal $O(\sqrt{T})$ swap regret in the adversarial regime as well. To obtain these results, our primary contribution is to show that when all players follow our dynamics with a \emph{time-invariant} learning rate, the \emph{second-order path lengths} of the dynamics up to time $T$ are bounded by $O(\log T)$, a fundamental property which could have further implications beyond near-optimally bounding the (swap) regret. Our proposed learning dynamics combine in a novel way \emph{optimistic} regularized learning with the use of \emph{self-concordant barriers}. Further, our analysis is remarkably simple, bypassing the cumbersome framework of higher-order smoothness recently developed by Daskalakis, Fishelson, and Golowich (NeurIPS'21). △ Less

Submitted 5 October, 2022; v1 submitted 24 April, 2022; originally announced April 2022.

Comments: To appear at NeurIPS 2022. V2 incorporates reviewers' feedback and minor corrections

arXiv:2203.12074 [pdf, other]

Optimistic Mirror Descent Either Converges to Nash or to Strong Coarse Correlated Equilibria in Bimatrix Games

Authors: Ioannis Anagnostides, Gabriele Farina, Ioannis Panageas, Tuomas Sandholm

Abstract: We show that, for any sufficiently small fixed $ε> 0$, when both players in a general-sum two-player (bimatrix) game employ optimistic mirror descent (OMD) with smooth regularization, learning rate $η= O(ε^2)$ and $T = Ω(\text{poly}(1/ε))$ repetitions, either the dynamics reach an $ε$-approximate Nash equilibrium (NE), or the average correlated distribution of play is an $Ω(\text{poly}(ε))$-strong… ▽ More We show that, for any sufficiently small fixed $ε> 0$, when both players in a general-sum two-player (bimatrix) game employ optimistic mirror descent (OMD) with smooth regularization, learning rate $η= O(ε^2)$ and $T = Ω(\text{poly}(1/ε))$ repetitions, either the dynamics reach an $ε$-approximate Nash equilibrium (NE), or the average correlated distribution of play is an $Ω(\text{poly}(ε))$-strong coarse correlated equilibrium (CCE): any possible unilateral deviation does not only leave the player worse, but will decrease its utility by $Ω(\text{poly}(ε))$. As an immediate consequence, when the iterates of OMD are bounded away from being Nash equilibria in a bimatrix game, we guarantee convergence to an exact CCE after only $O(1)$ iterations. Our results reveal that uncoupled no-regret learning algorithms can converge to CCE in general-sum games remarkably faster than to NE in, for example, zero-sum games. To establish this, we show that when OMD does not reach arbitrarily close to a NE, the (cumulative) regret of both players is not only negative, but decays linearly with time. Given that regret is the canonical measure of performance in online learning, our results suggest that cycling behavior of no-regret learning algorithms in games can be justified in terms of efficiency. △ Less

Submitted 6 October, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: To appear at NeurIPS 2022. V2 incorporates reviewers' feedback

arXiv:2203.12056 [pdf, other]

On Last-Iterate Convergence Beyond Zero-Sum Games

Authors: Ioannis Anagnostides, Ioannis Panageas, Gabriele Farina, Tuomas Sandholm

Abstract: Most existing results about \emph{last-iterate convergence} of learning dynamics are limited to two-player zero-sum games, and only apply under rigid assumptions about what dynamics the players follow. In this paper we provide new results and techniques that apply to broader families of games and learning dynamics. First, we use a regret-based analysis to show that in a class of games that include… ▽ More Most existing results about \emph{last-iterate convergence} of learning dynamics are limited to two-player zero-sum games, and only apply under rigid assumptions about what dynamics the players follow. In this paper we provide new results and techniques that apply to broader families of games and learning dynamics. First, we use a regret-based analysis to show that in a class of games that includes constant-sum polymatrix and strategically zero-sum games, dynamics such as \emph{optimistic mirror descent (OMD)} have \emph{bounded second-order path lengths}, a property which holds even when players employ different algorithms and prediction mechanisms. This enables us to obtain $O(1/\sqrt{T})$ rates and optimal $O(1)$ regret bounds. Our analysis also reveals a surprising property: OMD either reaches arbitrarily close to a Nash equilibrium, or it outperforms the \emph{robust price of anarchy} in efficiency. Moreover, for potential games we establish convergence to an $ε$-equilibrium after $O(1/ε^2)$ iterations for mirror descent under a broad class of regularizers, as well as optimal $O(1)$ regret bounds for OMD variants. Our framework also extends to near-potential games, and unifies known analyses for distributed learning in Fisher's market model. Finally, we analyze the convergence, efficiency, and robustness of \emph{optimistic gradient descent (OGD)} in general-sum continuous games. △ Less

Submitted 22 March, 2022; originally announced March 2022.

arXiv:2202.05446 [pdf, other]

Faster No-Regret Learning Dynamics for Extensive-Form Correlated and Coarse Correlated Equilibria

Authors: Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Andrea Celli, Tuomas Sandholm

Abstract: A recent emerging trend in the literature on learning in games has been concerned with providing faster learning dynamics for correlated and coarse correlated equilibria in normal-form games. Much less is known about the significantly more challenging setting of extensive-form games, which can capture both sequential and simultaneous moves, as well as imperfect information. In this paper we establ… ▽ More A recent emerging trend in the literature on learning in games has been concerned with providing faster learning dynamics for correlated and coarse correlated equilibria in normal-form games. Much less is known about the significantly more challenging setting of extensive-form games, which can capture both sequential and simultaneous moves, as well as imperfect information. In this paper we establish faster no-regret learning dynamics for \textit{extensive-form correlated equilibria (EFCE)} in multiplayer general-sum imperfect-information extensive-form games. When all players follow our accelerated dynamics, the correlated distribution of play is an $O(T^{-3/4})$-approximate EFCE, where the $O(\cdot)$ notation suppresses parameters polynomial in the description of the game. This significantly improves over the best prior rate of $O(T^{-1/2})$. To achieve this, we develop a framework for performing accelerated \emph{Phi-regret minimization} via predictions. One of our key technical contributions -- that enables us to employ our generic template -- is to characterize the stability of fixed points associated with \emph{trigger deviation functions} through a refined perturbation analysis of a structured Markov chain. Furthermore, for the simpler solution concept of extensive-form \emph{coarse} correlated equilibrium (EFCCE) we give a new succinct closed-form characterization of the associated fixed points, bypassing the expensive computation of stationary distributions required for EFCE. Our results place EFCCE closer to \emph{normal-form coarse correlated equilibria} in terms of the per-iteration complexity, although the former prescribes a much more compelling notion of correlation. Finally, experiments conducted on standard benchmarks corroborate our theoretical findings. △ Less

Submitted 10 February, 2022; originally announced February 2022.

Comments: Preliminary parts of this paper will appear at the AAAI-22 Workshop on Reinforcement Learning in Games. This version also contains results from an earlier preprint published by a subset of the authors (arXiv:2109.08138)

arXiv:2111.06008 [pdf, ps, other]

doi 10.1145/3519935.3520031

Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games

Authors: Ioannis Anagnostides, Constantinos Daskalakis, Gabriele Farina, Maxwell Fishelson, Noah Golowich, Tuomas Sandholm

Abstract: Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS`21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game. We extend their result from external regret to internal regret and swap regret, thereby establishing uncoup… ▽ More Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS`21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game. We extend their result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium at the rate of $\tilde{O}(T^{-1})$. This substantially improves over the prior best rate of convergence for correlated equilibria of $O(T^{-3/4})$ due to Chen and Peng (NeurIPS`20), and it is optimal -- within the no-regret framework -- up to polylogarithmic factors in $T$. To obtain these results, we develop new techniques for establishing higher-order smoothness for learning dynamics involving fixed point operations. Specifically, we establish that the no-internal-regret learning dynamics of Stoltz and Lugosi (Mach Learn`05) are equivalently simulated by no-external-regret dynamics on a combinatorial space. This allows us to trade the computation of the stationary distribution on a polynomial-sized Markov chain for a (much more well-behaved) linear transformation on an exponential-sized set, enabling us to leverage similar techniques as DFG to near-optimally bound the internal regret. Moreover, we establish an $O(\textrm{polylog}(T))$ no-swap-regret bound for the classic algorithm of Blum and Mansour (BM) (JMLR`07). We do so by introducing a technique based on the Cauchy Integral Formula that circumvents the more limited combinatorial arguments of DFG. In addition to shedding clarity on the near-optimal regret guarantees of BM, our arguments provide insights into the various ways in which the techniques by DFG can be extended and leveraged in the analysis of more involved learning algorithms. △ Less

Submitted 24 January, 2023; v1 submitted 10 November, 2021; originally announced November 2021.

Comments: Appeared at STOC 2022

arXiv:2109.05151 [pdf, other]

Almost Universally Optimal Distributed Laplacian Solvers via Low-Congestion Shortcuts

Authors: Ioannis Anagnostides, Christoph Lenzen, Bernhard Haeupler, Goran Zuzic, Themis Gouleakis

Abstract: In this paper, we refine the (almost) \emph{existentially optimal} distributed Laplacian solver recently developed by Forster, Goranci, Liu, Peng, Sun, and Ye (FOCS `21) into an (almost) \emph{universally optimal} distributed Laplacian solver. Specifically, when the topology is known, we show that any Laplacian system on an $n$-node graph with \emph{shortcut quality} $\text{SQ}(G)$ can be solved… ▽ More In this paper, we refine the (almost) \emph{existentially optimal} distributed Laplacian solver recently developed by Forster, Goranci, Liu, Peng, Sun, and Ye (FOCS `21) into an (almost) \emph{universally optimal} distributed Laplacian solver. Specifically, when the topology is known, we show that any Laplacian system on an $n$-node graph with \emph{shortcut quality} $\text{SQ}(G)$ can be solved within $n^{o(1)} \text{SQ}(G) \log(1/\varepsilon)$ rounds, where $\varepsilon$ is the required accuracy. This almost matches our lower bound which guarantees that any correct algorithm on $G$ requires $\widetildeΩ(\text{SQ}(G))$ rounds, even for a crude solution with $\varepsilon \le 1/2$. Even in the unknown-topology case (i.e., standard CONGEST), the same bounds also hold in most networks of interest. Furthermore, conditional on conjectured improvements in state-of-the-art constructions of low-congestion shortcuts, the CONGEST results will match the known-topology ones. Moreover, following a recent line of work in distributed algorithms, we consider a hybrid communication model which enhances CONGEST with limited global power in the form of the node-capacitated clique (NCC) model. In this model, we show the existence of a Laplacian solver with round complexity $n^{o(1)} \log(1/\varepsilon)$. The unifying thread of these results, and our main technical contribution, is the study of novel \emph{congested} generalization of the standard \emph{part-wise aggregation} problem. We develop near-optimal algorithms for this primitive in the Supported-CONGEST model, almost-optimal algorithms in (standard) CONGEST, as well as a very simple algorithm for bounded-treewidth graphs with slightly worse bounds. This primitive can be readily used to accelerate the FOCS`21 Laplacian solver. We believe this primitive will find further independent applications. △ Less

Submitted 14 May, 2022; v1 submitted 10 September, 2021; originally announced September 2021.

arXiv:2109.02184 [pdf, other]

Dimensionality, Coordination, and Robustness in Voting

Authors: Ioannis Anagnostides, Dimitris Fotakis, Panagiotis Patsilinakos

Abstract: We study the performance of voting mechanisms from a utilitarian standpoint, under the recently introduced framework of metric-distortion, offering new insights along three main lines. First, if $d$ represents the doubling dimension of the metric space, we show that the distortion of STV is $O(d \log \log m)$, where $m$ represents the number of candidates. For doubling metrics this implies an expo… ▽ More We study the performance of voting mechanisms from a utilitarian standpoint, under the recently introduced framework of metric-distortion, offering new insights along three main lines. First, if $d$ represents the doubling dimension of the metric space, we show that the distortion of STV is $O(d \log \log m)$, where $m$ represents the number of candidates. For doubling metrics this implies an exponential improvement over the lower bound for general metrics, and as a special case it effectively answers a question left open by Skowron and Elkind (AAAI '17) regarding the distortion of STV under low-dimensional Euclidean spaces. More broadly, this constitutes the first nexus between the performance of any voting rule and the "intrinsic dimensionality" of the underlying metric space. We also establish a nearly-matching lower bound, refining the construction of Skowron and Elkind. Moreover, motivated by the efficiency of STV, we investigate whether natural learning rules can lead to low-distortion outcomes. Specifically, we introduce simple, deterministic and decentralized exploration/exploitation dynamics, and we show that they converge to a candidate with $O(1)$ distortion. Finally, driven by applications in facility location games, we consider several refinements and extensions of the standard metric-setting. Namely, we prove that the deterministic mechanism recently introduced by Gkatzelis, Halpern, and Shah (FOCS '20) attains the optimal distortion bound of $2$ under ultra-metrics, while it also comes close to our lower bound under distances satisfying approximate triangle inequalities. △ Less

Submitted 24 March, 2022; v1 submitted 5 September, 2021; originally announced September 2021.

arXiv:2108.01740 [pdf, other]

Deterministic Distributed Algorithms and Lower Bounds in the Hybrid Model

Authors: Ioannis Anagnostides, Themis Gouleakis

Abstract: The $\hybrid$ model was recently introduced by Augustine et al. \cite{DBLP:conf/soda/AugustineHKSS20} in order to characterize from an algorithmic standpoint the capabilities of networks which combine multiple communication modes. Concretely, it is assumed that the standard $\local$ model of distributed computing is enhanced with the feature of all-to-all communication, but with very limited bandw… ▽ More The $\hybrid$ model was recently introduced by Augustine et al. \cite{DBLP:conf/soda/AugustineHKSS20} in order to characterize from an algorithmic standpoint the capabilities of networks which combine multiple communication modes. Concretely, it is assumed that the standard $\local$ model of distributed computing is enhanced with the feature of all-to-all communication, but with very limited bandwidth, captured by the node-capacitated clique ($\ncc$). In this work we provide several new insights on the power of hybrid networks for fundamental problems in distributed algorithms. First, we present a deterministic algorithm which solves any problem on a sparse $n$-node graph in $\widetilde{\mathcal{O}}(\sqrt{n})$ rounds of $\hybrid$. We combine this primitive with several sparsification techniques to obtain efficient distributed algorithms for general graphs. Most notably, for the all-pairs shortest paths problem we give deterministic $(1 + ε)$- and $\log n/\log \log n$-approximate algorithms for unweighted and weighted graphs respectively with round complexity $\widetilde{\mathcal{O}}(\sqrt{n})$ in $\hybrid$, closely matching the performance of the state of the art randomized algorithm of Kuhn and Schneider \cite{10.1145/3382734.3405719}. Moreover, we make a connection with the Ghaffari-Haeupler framework of low-congestion shortcuts \cite{DBLP:conf/soda/GhaffariH16}, leading -- among others -- to a $(1 + ε)$-approximate algorithm for Min-Cut after $\log^{\mathcal{O}(1)}n$ rounds, with high probability, even if we restrict local edges to transfer $\mathcal{O}(\log n)$-bits per round. Finally, we prove via a reduction from the set disjointness problem that $\widetildeΩ(n^{1/3})$ rounds are required to determine the radius of an unweighted graph, as well as a $(3/2 - ε)$-approximation for weighted graphs. △ Less

Submitted 3 August, 2021; originally announced August 2021.

arXiv:2107.02489 [pdf, other]

Metric-Distortion Bounds under Limited Information

Authors: Ioannis Anagnostides, Dimitris Fotakis, Panagiotis Patsilinakos

Abstract: In this work we study the metric distortion problem in voting theory under a limited amount of ordinal information. Our primary contribution is threefold. First, we consider mechanisms which perform a sequence of pairwise comparisons between candidates. We show that a widely-popular deterministic mechanism employed in most knockout phases yields distortion $\mathcal{O}(\log m)$ while eliciting onl… ▽ More In this work we study the metric distortion problem in voting theory under a limited amount of ordinal information. Our primary contribution is threefold. First, we consider mechanisms which perform a sequence of pairwise comparisons between candidates. We show that a widely-popular deterministic mechanism employed in most knockout phases yields distortion $\mathcal{O}(\log m)$ while eliciting only $m-1$ out of $Θ(m^2)$ possible pairwise comparisons, where $m$ represents the number of candidates. Our analysis for this mechanism leverages a powerful technical lemma recently developed by Kempe \cite{DBLP:conf/aaai/000120a}. We also provide a matching lower bound on its distortion. In contrast, we prove that any mechanism which performs fewer than $m-1$ pairwise comparisons is destined to have unbounded distortion. Moreover, we study the power of deterministic mechanisms under incomplete rankings. Most notably, when every agent provides her $k$-top preferences we show an upper bound of $6 m/k + 1$ on the distortion, for any $k \in \{1, 2, \dots, m\}$. Thus, we substantially improve over the previous bound of $12 m/k$ recently established by Kempe \cite{DBLP:conf/aaai/000120a,DBLP:conf/aaai/000120b}, and we come closer to matching the best-known lower bound. Finally, we are concerned with the sample complexity required to ensure near-optimal distortion with high probability. Our main contribution is to show that a random sample of $Θ(m/ε^2)$ voters suffices to guarantee distortion $3 + ε$ with high probability, for any sufficiently small $ε> 0$. This result is based on analyzing the sensitivity of the deterministic mechanism introduced by Gkatzelis, Halpern, and Shah \cite{DBLP:conf/focs/Gkatzelis0020}. Importantly, all of our sample-complexity bounds are distribution-independent. △ Less

Submitted 6 July, 2021; originally announced July 2021.

arXiv:2010.09106 [pdf, other]

Robust Learning under Strong Noise via SQs

Authors: Ioannis Anagnostides, Themis Gouleakis, Ali Marashian

Abstract: This work provides several new insights on the robustness of Kearns' statistical query framework against challenging label-noise models. First, we build on a recent result by \cite{DBLP:journals/corr/abs-2006-04787} that showed noise tolerance of distribution-independently evolvable concept classes under Massart noise. Specifically, we extend their characterization to more general noise models, in… ▽ More This work provides several new insights on the robustness of Kearns' statistical query framework against challenging label-noise models. First, we build on a recent result by \cite{DBLP:journals/corr/abs-2006-04787} that showed noise tolerance of distribution-independently evolvable concept classes under Massart noise. Specifically, we extend their characterization to more general noise models, including the Tsybakov model which considerably generalizes the Massart condition by allowing the flip** probability to be arbitrarily close to $\frac{1}{2}$ for a subset of the domain. As a corollary, we employ an evolutionary algorithm by \cite{DBLP:conf/colt/KanadeVV10} to obtain the first polynomial time algorithm with arbitrarily small excess error for learning linear threshold functions over any spherically symmetric distribution in the presence of spherically symmetric Tsybakov noise. Moreover, we posit access to a stronger oracle, in which for every labeled example we additionally obtain its flip** probability. In this model, we show that every SQ learnable class admits an efficient learning algorithm with OPT + $ε$ misclassification error for a broad class of noise models. This setting substantially generalizes the widely-studied problem of classification under RCN with known noise rate, and corresponds to a non-convex optimization problem even when the noise function -- i.e. the flip** probabilities of all points -- is known in advance. △ Less

Submitted 18 October, 2020; originally announced October 2020.

arXiv:2010.03211 [pdf, other]

A Robust Framework for Analyzing Gradient-Based Dynamics in Bilinear Games

Authors: Ioannis Anagnostides, Paolo Penna

Abstract: In this work, we establish a frequency-domain framework for analyzing gradient-based algorithms in linear minimax optimization problems; specifically, our approach is based on the Z-transform, a powerful tool applied in Control Theory and Signal Processing in order to characterize linear discrete-time systems. We employ our framework to obtain the first tight analysis of stability of Optimistic Gr… ▽ More In this work, we establish a frequency-domain framework for analyzing gradient-based algorithms in linear minimax optimization problems; specifically, our approach is based on the Z-transform, a powerful tool applied in Control Theory and Signal Processing in order to characterize linear discrete-time systems. We employ our framework to obtain the first tight analysis of stability of Optimistic Gradient Descent/Ascent (OGDA), a natural variant of Gradient Descent/Ascent that was shown to exhibit last-iterate convergence in bilinear games by Daskalakis et al. \cite{DBLP:journals/corr/abs-1711-00141}. Importantly, our analysis is considerably simpler and more concise than the existing ones. Moreover, building on the intuition of OGDA, we consider a general family of gradient-based algorithms that augment the memory of the optimization through multiple historical steps. We reduce the convergence -- to a saddle-point -- of the dynamics in bilinear games to the stability of a polynomial, for which efficient algorithmic schemes are well-established. As an immediate corollary, we obtain a broad class of algorithms -- that contains OGDA as a special case -- with a last-iterate convergence guarantee to the space of Nash equilibria of the game. △ Less

Submitted 7 October, 2020; originally announced October 2020.

arXiv:2010.00109 [pdf, other]

Solving Zero-Sum Games through Alternating Projections

Authors: Ioannis Anagnostides, Paolo Penna

Abstract: In this work, we establish near-linear and strong convergence for a natural first-order iterative algorithm that simulates Von Neumann's Alternating Projections method in zero-sum games. First, we provide a precise analysis of Optimistic Gradient Descent/Ascent (OGDA) -- an optimistic variant of Gradient Descent/Ascent -- for \emph{unconstrained} bilinear games, extending and strengthening prior r… ▽ More In this work, we establish near-linear and strong convergence for a natural first-order iterative algorithm that simulates Von Neumann's Alternating Projections method in zero-sum games. First, we provide a precise analysis of Optimistic Gradient Descent/Ascent (OGDA) -- an optimistic variant of Gradient Descent/Ascent -- for \emph{unconstrained} bilinear games, extending and strengthening prior results along several directions. Our characterization is based on a closed-form solution we derive for the dynamics, while our results also reveal several surprising properties. Indeed, our main algorithmic contribution is founded on a geometric feature of OGDA we discovered; namely, the limit points of the dynamics are the orthogonal projection of the initial state to the space of attractors. Motivated by this property, we show that the equilibria for a natural class of \emph{constrained} bilinear games are the intersection of the unconstrained stationary points with the corresponding probability simplexes. Thus, we employ OGDA to implement an Alternating Projections procedure, converging to an $ε$-approximate Nash equilibrium in $\widetilde{\mathcal{O}}(\log^2(1/ε))$ iterations. Our techniques supplement the recent work in pursuing last-iterate guarantees in min-max optimization. Finally, we illustrate an -- in principle -- trivial reduction from any game to the assumed class of instances, without altering the space of equilibria. △ Less

Submitted 17 August, 2021; v1 submitted 30 September, 2020; originally announced October 2020.

Showing 1–25 of 25 results for author: Anagnostides, I