Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games

Anagnostides, Ioannis; Daskalakis, Constantinos; Farina, Gabriele; Fishelson, Maxwell; Golowich, Noah; Sandholm, Tuomas

doi:10.1145/3519935.3520031

arXiv:2111.06008 (cs)

[Submitted on 11 Nov 2021 (v1), last revised 24 Jan 2023 (this version, v3)]

Abstract: Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS'21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game. We extend their result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium at the rate of $\tilde{O}(T^{-1})$. This substantially improves on the prior best rate of convergence for correlated equilibria of $O(T^{-3/4})$ due to Chen and Peng (NeurIPS'20), and it is optimal, within the no-regret framework, up to polylogarithmic factors in $T$.
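As a rough illustration of the OMWU dynamics mentioned above, the following is a minimal Python sketch of two-player self-play, not the authors' implementation. It assumes the standard optimistic update, in which each action's log-weight grows by $\eta(2u^t - u^{t-1})$. The function names, the restriction to two players, the step size `eta`, and the example payoff matrices are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Map log-weights to a probability vector (numerically stable)."""
    m = max(logits)
    w = [math.exp(v - m) for v in logits]
    s = sum(w)
    return [wi / s for wi in w]

def omwu_self_play(A, B, T, eta=0.1):
    """Run OMWU for both players of a bimatrix game (hypothetical helper).

    A[i][j] and B[i][j] are the row and column player's utilities.
    Returns (external regret of the row player, last x, last y).
    """
    n, m = len(A), len(A[0])
    lx, ly = [0.0] * n, [0.0] * m            # log-weights, uniform start
    ux_prev, uy_prev = [0.0] * n, [0.0] * m  # previous utility vectors
    cum_u = [0.0] * n                        # cumulative utility per row action
    realized = 0.0                           # cumulative expected utility earned
    x, y = softmax(lx), softmax(ly)
    for _ in range(T):
        x, y = softmax(lx), softmax(ly)
        ux = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        uy = [sum(B[i][j] * x[i] for i in range(n)) for j in range(m)]
        realized += sum(x[i] * ux[i] for i in range(n))
        cum_u = [c + u for c, u in zip(cum_u, ux)]
        # Optimistic step: the latest utility counts twice, the previous one is subtracted.
        lx = [l + eta * (2.0 * u - up) for l, u, up in zip(lx, ux, ux_prev)]
        ly = [l + eta * (2.0 * u - up) for l, u, up in zip(ly, uy, uy_prev)]
        ux_prev, uy_prev = ux, uy
    regret = max(cum_u) - realized
    return regret, x, y
```

Calling `omwu_self_play(A, B, T=200)` on a small game and plotting the returned regret against `T` is one way to observe the slow regret growth empirically. The sketch makes no attempt to reproduce the step-size choices needed for the paper's bounds.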
To obtain these results, we develop new techniques for establishing higher-order smoothness for learning dynamics involving fixed point operations. Specifically, we establish that the no-internal-regret learning dynamics of Stoltz and Lugosi (Mach Learn'05) are equivalently simulated by no-external-regret dynamics on a combinatorial space. This allows us to trade the computation of the stationary distribution on a polynomial-sized Markov chain for a (much more well-behaved) linear transformation on an exponential-sized set, enabling us to leverage techniques similar to those of DFG to near-optimally bound the internal regret.
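For reference, the three regret notions discussed here can be written in standard textbook form as follows. The notation is an assumption of this note rather than the paper's own: $x^t \in \Delta(\mathcal{A})$ is a player's mixed strategy at round $t$, $u^t \in \mathbb{R}^{\mathcal{A}}$ is the utility vector it observes, and $n = |\mathcal{A}|$.

```latex
\begin{align*}
\mathrm{Reg}^T_{\mathrm{ext}}  &= \max_{a^\ast \in \mathcal{A}} \sum_{t=1}^{T} \Big( u^t[a^\ast] - \langle x^t, u^t \rangle \Big), \\
\mathrm{Reg}^T_{\mathrm{int}}  &= \max_{a, b \in \mathcal{A}} \sum_{t=1}^{T} x^t[a] \Big( u^t[b] - u^t[a] \Big), \\
\mathrm{Reg}^T_{\mathrm{swap}} &= \max_{\phi : \mathcal{A} \to \mathcal{A}} \sum_{t=1}^{T} \sum_{a \in \mathcal{A}} x^t[a] \Big( u^t[\phi(a)] - u^t[a] \Big), \\
\mathrm{Reg}^T_{\mathrm{int}}  &\le \mathrm{Reg}^T_{\mathrm{swap}} \le n \cdot \mathrm{Reg}^T_{\mathrm{int}} .
\end{align*}
```

The last line is why a polylogarithmic bound on either internal or swap regret yields the $\tilde{O}(T^{-1})$ rate for correlated equilibria, up to a factor of $n$.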
Moreover, we establish an $O(\textrm{polylog}(T))$ no-swap-regret bound for the classic algorithm of Blum and Mansour (BM) (JMLR'07). We do so by introducing a technique based on the Cauchy Integral Formula that circumvents the more limited combinatorial arguments of DFG. In addition to clarifying the near-optimal regret guarantees of BM, our arguments provide insights into the various ways in which the techniques of DFG can be extended and leveraged in the analysis of more involved learning algorithms.
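The Blum and Mansour reduction can be sketched roughly as follows: one external-regret minimizer is kept per action, their strategies form the rows of a stochastic matrix, the player plays a stationary distribution of that matrix, and each minimizer receives the utility vector scaled by the probability of its action. The Python below is a minimal sketch under stated assumptions and not the algorithm as analyzed in the paper. It uses plain multiplicative weights as the inner algorithm for brevity and approximates the stationary distribution by power iteration; the class name, `eta`, and `iters` are hypothetical.

```python
import math

def softmax(logits):
    """Map log-weights to a probability vector (numerically stable)."""
    m = max(logits)
    w = [math.exp(v - m) for v in logits]
    s = sum(w)
    return [wi / s for wi in w]

def stationary(Q, iters=2000):
    """Approximate p with p = pQ by power iteration.

    Assumes Q is row-stochastic with strictly positive entries, which holds
    for softmax rows; convergence can be slow when rows are near-deterministic.
    """
    n = len(Q)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[a] * Q[a][b] for a in range(n)) for b in range(n)]
    s = sum(p)
    return [v / s for v in p]

class BlumMansourSketch:
    """Swap-regret reduction with one multiplicative-weights instance per action."""

    def __init__(self, n, eta=0.1):
        self.n = n
        self.eta = eta
        self.logits = [[0.0] * n for _ in range(n)]  # row a: log-weights of instance a
        self.Q = None
        self.p = None

    def strategy(self):
        # Row a of Q is the current strategy of the instance attached to action a.
        self.Q = [softmax(row) for row in self.logits]
        self.p = stationary(self.Q)
        return self.p

    def observe(self, u):
        # Instance a is charged the utility vector scaled by p[a].
        for a in range(self.n):
            for b in range(self.n):
                self.logits[a][b] += self.eta * self.p[a] * u[b]
```

A round consists of calling `strategy()` to obtain the mixed strategy and then `observe(u)` with the utility vector for that round. The stationary-distribution step is the fixed point operation whose smoothness the abstract refers to.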

Comments:	Appeared at STOC 2022
Subjects:	Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
Cite as:	arXiv:2111.06008 [cs.LG]
	(or arXiv:2111.06008v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2111.06008
Related DOI:	https://doi.org/10.1145/3519935.3520031

Submission history

From: Maxwell Fishelson
[v1] Thu, 11 Nov 2021 01:19:53 UTC (318 KB)
[v2] Mon, 4 Jul 2022 13:17:05 UTC (318 KB)
[v3] Tue, 24 Jan 2023 22:51:23 UTC (321 KB)
