Search | arXiv e-print repository

Payoff-based learning with matrix multiplicative weights in quantum games

Authors: Kyriakos Lotidis, Panayotis Mertikopoulos, Nicholas Bambos, Jose Blanchet

Abstract: In this paper, we study the problem of learning in quantum games - and other classes of semidefinite games - with scalar, payoff-based feedback. For concreteness, we focus on the widely used matrix multiplicative weights (MMW) algorithm and, instead of requiring players to have full knowledge of the game (and/or each other's chosen states), we introduce a suite of minimal-information matrix multip… ▽ More In this paper, we study the problem of learning in quantum games - and other classes of semidefinite games - with scalar, payoff-based feedback. For concreteness, we focus on the widely used matrix multiplicative weights (MMW) algorithm and, instead of requiring players to have full knowledge of the game (and/or each other's chosen states), we introduce a suite of minimal-information matrix multiplicative weights (3MW) methods tailored to different information frameworks. The main difficulty to attaining convergence in this setting is that, in contrast to classical finite games, quantum games have an infinite continuum of pure states (the quantum equivalent of pure strategies), so standard importance-weighting techniques for estimating payoff vectors cannot be employed. Instead, we borrow ideas from bandit convex optimization and we design a zeroth-order gradient sampler adapted to the semidefinite geometry of the problem at hand. As a first result, we show that the 3MW method with deterministic payoff feedback retains the $\mathcal{O}(1/\sqrt{T})$ convergence rate of the vanilla, full information MMW algorithm in quantum min-max games, even though the players only observe a single scalar. Subsequently, we relax the algorithm's information requirements even further and we provide a 3MW method that only requires players to observe a random realization of their payoff observable, and converges to equilibrium at an $\mathcal{O}(T^{-1/4})$ rate. Finally, going beyond zero-sum games, we show that a regularized variant of the proposed 3MW method guarantees local convergence with high probability to all equilibria that satisfy a certain first-order stability condition. △ Less

Submitted 4 November, 2023; originally announced November 2023.

Comments: 39 pages, 21 figures, 2 tables

MSC Class: Primary 91A10; 91A26; 37N40; secondary 68Q32; 81Q93

arXiv:2302.02333 [pdf, other]

Learning in quantum games

Authors: Kyriakos Lotidis, Panayotis Mertikopoulos, Nicholas Bambos

Abstract: In this paper, we introduce a class of learning dynamics for general quantum games, that we call "follow the quantum regularized leader" (FTQL), in reference to the classical "follow the regularized leader" (FTRL) template for learning in finite games. We show that the induced quantum state dynamics decompose into (i) a classical, commutative component which governs the dynamics of the system's ei… ▽ More In this paper, we introduce a class of learning dynamics for general quantum games, that we call "follow the quantum regularized leader" (FTQL), in reference to the classical "follow the regularized leader" (FTRL) template for learning in finite games. We show that the induced quantum state dynamics decompose into (i) a classical, commutative component which governs the dynamics of the system's eigenvalues in a way analogous to the evolution of mixed strategies under FTRL; and (ii) a non-commutative component for the system's eigenvectors which has no classical counterpart. Despite the complications that this non-classical component entails, we find that the FTQL dynamics incur no more than constant regret in all quantum games. Moreover, adjusting classical notions of stability to account for the nonlinear geometry of the state space of quantum games, we show that only pure quantum equilibria can be stable and attracting under FTQL while, as a partial converse, pure equilibria that satisfy a certain "variational stability" condition are always attracting. Finally, we show that the FTQL dynamics are Poincaré recurrent in quantum min-max games, extending in this way a very recent result for the quantum replicator dynamics. △ Less

Submitted 5 February, 2023; originally announced February 2023.

Comments: 30 pages, 4 figures

MSC Class: Primary 91A81; 37N40; 68Q32; secondary 68T05; 81Q93; 91B80

arXiv:2210.08857 [pdf, ps, other]

On the convergence of policy gradient methods to Nash equilibria in general stochastic games

Authors: Angeliki Giannou, Kyriakos Lotidis, Panayotis Mertikopoulos, Emmanouil-Vasileios Vlatakis-Gkaragkounis

Abstract: Learning in stochastic games is a notoriously difficult problem because, in addition to each other's strategic decisions, the players must also contend with the fact that the game itself evolves over time, possibly in a very complicated manner. Because of this, the convergence properties of popular learning algorithms - like policy gradient and its variants - are poorly understood, except in speci… ▽ More Learning in stochastic games is a notoriously difficult problem because, in addition to each other's strategic decisions, the players must also contend with the fact that the game itself evolves over time, possibly in a very complicated manner. Because of this, the convergence properties of popular learning algorithms - like policy gradient and its variants - are poorly understood, except in specific classes of games (such as potential or two-player, zero-sum games). In view of this, we examine the long-run behavior of policy gradient methods with respect to Nash equilibrium policies that are second-order stationary (SOS) in a sense similar to the type of sufficiency conditions used in optimization. Our first result is that SOS policies are locally attracting with high probability, and we show that policy gradient trajectories with gradient estimates provided by the REINFORCE algorithm achieve an $\mathcal{O}(1/\sqrt{n})$ distance-squared convergence rate if the method's step-size is chosen appropriately. Subsequently, specializing to the class of deterministic Nash policies, we show that this rate can be improved dramatically and, in fact, policy gradient methods converge within a finite number of iterations in that case. △ Less

Submitted 17 October, 2022; originally announced October 2022.

Comments: 43 pages, 2 tables; to appear in the proceedings of NeurIPS 2022

MSC Class: Primary 91A15; 91A26; secondary 68Q32; 68T05; 90C40

arXiv:2209.04926 [pdf, other]

Learning in Games with Quantized Payoff Observations

Authors: Kyriakos Lotidis, Panayotis Mertikopoulos, Nicholas Bambos

Abstract: This paper investigates the impact of feedback quantization on multi-agent learning. In particular, we analyze the equilibrium convergence properties of the well-known "follow the regularized leader" (FTRL) class of algorithms when players can only observe a quantized (and possibly noisy) version of their payoffs. In this information-constrained setting, we show that coarser quantization triggers… ▽ More This paper investigates the impact of feedback quantization on multi-agent learning. In particular, we analyze the equilibrium convergence properties of the well-known "follow the regularized leader" (FTRL) class of algorithms when players can only observe a quantized (and possibly noisy) version of their payoffs. In this information-constrained setting, we show that coarser quantization triggers a qualitative shift in the convergence behavior of FTRL schemes. Specifically, if the quantization error lies below a threshold value (which depends only on the underlying game and not on the level of uncertainty entering the process or the specific FTRL variant under study), then (i) FTRL is attracted to the game's strict Nash equilibria with arbitrarily high probability; and (ii) the algorithm's asymptotic rate of convergence remains the same as in the non-quantized case. Otherwise, for larger quantization levels, these convergence properties are lost altogether: players may fail to learn anything beyond their initial state, even with full information on their payoff vectors. This is in contrast to the impact of quantization in continuous optimization problems, where the quality of the obtained solution degrades smoothly with the quantization level. △ Less

Submitted 11 September, 2022; originally announced September 2022.

arXiv:1809.01803 [pdf, ps, other]

A Bridge between Liquid and Social Welfare in Combinatorial Auctions with Submodular Bidders

Authors: Dimitris Fotakis, Kyriakos Lotidis, Chara Podimata

Abstract: We study incentive compatible mechanisms for Combinatorial Auctions where the bidders have submodular (or XOS) valuations and are budget-constrained. Our objective is to maximize the \emph{liquid welfare}, a notion of efficiency for budget-constrained bidders introduced by Dobzinski and Paes Leme (2014). We show that some of the known truthful mechanisms that best-approximate the social welfare fo… ▽ More We study incentive compatible mechanisms for Combinatorial Auctions where the bidders have submodular (or XOS) valuations and are budget-constrained. Our objective is to maximize the \emph{liquid welfare}, a notion of efficiency for budget-constrained bidders introduced by Dobzinski and Paes Leme (2014). We show that some of the known truthful mechanisms that best-approximate the social welfare for Combinatorial Auctions with submodular bidders through demand query oracles can be adapted, so that they retain truthfulness and achieve asymptotically the same approximation guarantees for the liquid welfare. More specifically, for the problem of optimizing the liquid welfare in Combinatorial Auctions with submodular bidders, we obtain a universally truthful randomized $O(\log m)$-approximate mechanism, where $m$ is the number of items, by adapting the mechanism of Krysta and Vöcking (2012). Additionally, motivated by large market assumptions often used in mechanism design, we introduce a notion of competitive markets and show that in such markets, liquid welfare can be approximated within a constant factor by a randomized universally truthful mechanism. Finally, in the Bayesian setting, we obtain a truthful $O(1)$-approximate mechanism for the case where bidder valuations are generated as independent samples from a known distribution, by adapting the results of Feldman, Gravin and Lucier (2014). △ Less

Submitted 12 December, 2018; v1 submitted 5 September, 2018; originally announced September 2018.

Comments: AAAI-19

Showing 1–5 of 5 results for author: Lotidis, K