Search | arXiv e-print repository

What is the long-run distribution of stochastic gradient descent? A large deviations analysis

Authors: Waïss Azizian, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos

Abstract: In this paper, we examine the long-run distribution of stochastic gradient descent (SGD) in general, non-convex problems. Specifically, we seek to understand which regions of the problem's state space are more likely to be visited by SGD, and by how much. Using an approach based on the theory of large deviations and randomly perturbed dynamical systems, we show that the long-run distribution of SG… ▽ More In this paper, we examine the long-run distribution of stochastic gradient descent (SGD) in general, non-convex problems. Specifically, we seek to understand which regions of the problem's state space are more likely to be visited by SGD, and by how much. Using an approach based on the theory of large deviations and randomly perturbed dynamical systems, we show that the long-run distribution of SGD resembles the Boltzmann-Gibbs distribution of equilibrium thermodynamics with temperature equal to the method's step-size and energy levels determined by the problem's objective and the statistics of the noise. In particular, we show that, in the long run, (a) the problem's critical region is visited exponentially more often than any non-critical region; (b) the iterates of SGD are exponentially concentrated around the problem's minimum energy state (which does not always coincide with the global minimum of the objective); (c) all other connected components of critical points are visited with frequency that is exponentially proportional to their energy level; and, finally (d) any component of local maximizers or saddle points is "dominated" by a component of local minimizers which is visited exponentially more often. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 70 pages, 3 figures; to be published in the proceedings of ICML 2024

MSC Class: Primary 90C15; 90C26; 60F10; secondary 90C30; 68Q32

arXiv:2405.17693 [pdf, other]

Tamed Langevin sampling under weaker conditions

Authors: Iosif Lytras, Panayotis Mertikopoulos

Abstract: Motivated by applications to deep learning which often fail standard Lipschitz smoothness requirements, we examine the problem of sampling from distributions that are not log-concave and are only weakly dissipative, with log-gradients allowed to grow superlinearly at infinity. In terms of structure, we only assume that the target distribution satisfies either a log-Sobolev or a Poincaré inequality… ▽ More Motivated by applications to deep learning which often fail standard Lipschitz smoothness requirements, we examine the problem of sampling from distributions that are not log-concave and are only weakly dissipative, with log-gradients allowed to grow superlinearly at infinity. In terms of structure, we only assume that the target distribution satisfies either a log-Sobolev or a Poincaré inequality and a local Lipschitz smoothness assumption with modulus growing possibly polynomially at infinity. This set of assumptions greatly exceeds the operational limits of the "vanilla" unadjusted Langevin algorithm (ULA), making sampling from such distributions a highly involved affair. To account for this, we introduce a taming scheme which is tailored to the growth and decay properties of the target distribution, and we provide explicit non-asymptotic guarantees for the proposed sampler in terms of the Kullback-Leibler (KL) divergence, total variation, and Wasserstein distance to the target distribution. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: 32 pages, 2 figures

MSC Class: Primary 65C05; 60H10; secondary 68Q32

arXiv:2405.07224 [pdf, other]

A geometric decomposition of finite games: Convergence vs. recurrence under exponential weights

Authors: Davide Legacci, Panayotis Mertikopoulos, Bary Pradelski

Abstract: In view of the complexity of the dynamics of learning in games, we seek to decompose a game into simpler components where the dynamics' long-run behavior is well understood. A natural starting point for this is Helmholtz's theorem, which decomposes a vector field into a potential and an incompressible component. However, the geometry of game dynamics - and, in particular, the dynamics of exponenti… ▽ More In view of the complexity of the dynamics of learning in games, we seek to decompose a game into simpler components where the dynamics' long-run behavior is well understood. A natural starting point for this is Helmholtz's theorem, which decomposes a vector field into a potential and an incompressible component. However, the geometry of game dynamics - and, in particular, the dynamics of exponential / multiplicative weights (EW) schemes - is not compatible with the Euclidean underpinnings of Helmholtz's theorem. This leads us to consider a specific Riemannian framework based on the so-called Shahshahani metric, and introduce the class of incompressible games, for which we establish the following results: First, in addition to being volume-preserving, the continuous-time EW dynamics in incompressible games admit a constant of motion and are Poincaré recurrent - i.e., almost every trajectory of play comes arbitrarily close to its starting point infinitely often. Second, we establish a deep connection with a well-known decomposition of games into a potential and harmonic component (where the players' objectives are aligned and anti-aligned respectively): a game is incompressible if and only if it is harmonic, implying in turn that the EW dynamics lead to Poincaré recurrence in harmonic games. △ Less

Submitted 18 May, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

Comments: 49 pages, 16 figures

MSC Class: Primary 91A10; 91A26; secondary 91A68; 68Q32; 68T05

arXiv:2402.09824 [pdf, other]

On the discrete-time origins of the replicator dynamics: From convergence to instability and chaos

Authors: Fryderyk Falniowski, Panayotis Mertikopoulos

Abstract: We consider three distinct discrete-time models of learning and evolution in games: a biological model based on intra-species selective pressure, the dynamics induced by pairwise proportional imitation, and the exponential / multiplicative weights (EW) algorithm for online learning. Even though these models share the same continuous-time limit - the replicator dynamics - we show that second-order… ▽ More We consider three distinct discrete-time models of learning and evolution in games: a biological model based on intra-species selective pressure, the dynamics induced by pairwise proportional imitation, and the exponential / multiplicative weights (EW) algorithm for online learning. Even though these models share the same continuous-time limit - the replicator dynamics - we show that second-order effects play a crucial role and may lead to drastically different behaviors in each model, even in very simple, symmetric $2\times2$ games. Specifically, we study the resulting discrete-time dynamics in a class of parametrized congestion games, and we show that (i) in the biological model of intra-species competition, the dynamics remain convergent for any parameter value; (ii) the dynamics of pairwise proportional imitation exhibit an entire range of behaviors for larger time steps and different equilibrium configurations (stability, instability, and even Li-Yorke chaos); while (iii) in the EW algorithm, increasing the time step (almost) inevitably leads to chaos (again, in the formal, Li-Yorke sense). This divergence of behaviors comes in stark contrast to the globally convergent behavior of the replicator dynamics, and serves to delineate the extent to which the replicator dynamics provide a useful predictor for the long-run behavior of their discrete-time origins. △ Less

Submitted 26 February, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: 22 pages, 8 figures

MSC Class: Primary 91A22; 91A26; 37E05; secondary 37N40; 91A14

arXiv:2312.16609 [pdf, other]

Exploiting hidden structures in non-convex games for convergence to Nash equilibrium

Authors: Iosif Sakos, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Panayotis Mertikopoulos, Georgios Piliouras

Abstract: A wide array of modern machine learning applications - from adversarial models to multi-agent reinforcement learning - can be formulated as non-cooperative games whose Nash equilibria represent the system's desired operational states. Despite having a highly non-convex loss landscape, many cases of interest possess a latent convex structure that could potentially be leveraged to yield convergence… ▽ More A wide array of modern machine learning applications - from adversarial models to multi-agent reinforcement learning - can be formulated as non-cooperative games whose Nash equilibria represent the system's desired operational states. Despite having a highly non-convex loss landscape, many cases of interest possess a latent convex structure that could potentially be leveraged to yield convergence to equilibrium. Driven by this observation, our paper proposes a flexible first-order method that successfully exploits such "hidden structures" and achieves convergence under minimal assumptions for the transformation connecting the players' control variables to the game's latent, convex-structured layer. The proposed method - which we call preconditioned hidden gradient descent (PHGD) - hinges on a judiciously chosen gradient preconditioning scheme related to natural gradient methods. Importantly, we make no separability assumptions for the game's hidden structure, and we provide explicit convergence rate guarantees for both deterministic and stochastic environments. △ Less

Submitted 27 December, 2023; originally announced December 2023.

Comments: 32 pages, 18 figures

MSC Class: Primary 91A10; 91A26; secondary 68Q32

arXiv:2311.10859 [pdf, other]

A Quadratic Speedup in Finding Nash Equilibria of Quantum Zero-Sum Games

Authors: Francisca Vasconcelos, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Panayotis Mertikopoulos, Georgios Piliouras, Michael I. Jordan

Abstract: Recent developments in domains such as non-local games, quantum interactive proofs, and quantum generative adversarial networks have renewed interest in quantum game theory and, specifically, quantum zero-sum games. Central to classical game theory is the efficient algorithmic computation of Nash equilibria, which represent optimal strategies for both players. In 2008, Jain and Watrous proposed th… ▽ More Recent developments in domains such as non-local games, quantum interactive proofs, and quantum generative adversarial networks have renewed interest in quantum game theory and, specifically, quantum zero-sum games. Central to classical game theory is the efficient algorithmic computation of Nash equilibria, which represent optimal strategies for both players. In 2008, Jain and Watrous proposed the first classical algorithm for computing equilibria in quantum zero-sum games using the Matrix Multiplicative Weight Updates (MMWU) method to achieve a convergence rate of $\mathcal{O}(d/ε^2)$ iterations to $ε$-Nash equilibria in the $4^d$-dimensional spectraplex. In this work, we propose a hierarchy of quantum optimization algorithms that generalize MMWU via an extra-gradient mechanism. Notably, within this proposed hierarchy, we introduce the Optimistic Matrix Multiplicative Weights Update (OMMWU) algorithm and establish its average-iterate convergence complexity as $\mathcal{O}(d/ε)$ iterations to $ε$-Nash equilibria. This quadratic speed-up relative to Jain and Watrous' original algorithm sets a new benchmark for computing $ε$-Nash equilibria in quantum zero-sum games. △ Less

Submitted 17 November, 2023; originally announced November 2023.

Comments: 53 pages, 7 figures, QTML 2023 (Accepted (Long Talk))

MSC Class: primary 91A05; 81Q93; secondary 68Q32; 91A26; 37N40;

arXiv:2311.02423 [pdf, other]

Payoff-based learning with matrix multiplicative weights in quantum games

Authors: Kyriakos Lotidis, Panayotis Mertikopoulos, Nicholas Bambos, Jose Blanchet

Abstract: In this paper, we study the problem of learning in quantum games - and other classes of semidefinite games - with scalar, payoff-based feedback. For concreteness, we focus on the widely used matrix multiplicative weights (MMW) algorithm and, instead of requiring players to have full knowledge of the game (and/or each other's chosen states), we introduce a suite of minimal-information matrix multip… ▽ More In this paper, we study the problem of learning in quantum games - and other classes of semidefinite games - with scalar, payoff-based feedback. For concreteness, we focus on the widely used matrix multiplicative weights (MMW) algorithm and, instead of requiring players to have full knowledge of the game (and/or each other's chosen states), we introduce a suite of minimal-information matrix multiplicative weights (3MW) methods tailored to different information frameworks. The main difficulty to attaining convergence in this setting is that, in contrast to classical finite games, quantum games have an infinite continuum of pure states (the quantum equivalent of pure strategies), so standard importance-weighting techniques for estimating payoff vectors cannot be employed. Instead, we borrow ideas from bandit convex optimization and we design a zeroth-order gradient sampler adapted to the semidefinite geometry of the problem at hand. As a first result, we show that the 3MW method with deterministic payoff feedback retains the $\mathcal{O}(1/\sqrt{T})$ convergence rate of the vanilla, full information MMW algorithm in quantum min-max games, even though the players only observe a single scalar. Subsequently, we relax the algorithm's information requirements even further and we provide a 3MW method that only requires players to observe a random realization of their payoff observable, and converges to equilibrium at an $\mathcal{O}(T^{-1/4})$ rate. Finally, going beyond zero-sum games, we show that a regularized variant of the proposed 3MW method guarantees local convergence with high probability to all equilibria that satisfy a certain first-order stability condition. △ Less

Submitted 4 November, 2023; originally announced November 2023.

Comments: 39 pages, 21 figures, 2 tables

MSC Class: Primary 91A10; 91A26; 37N40; secondary 68Q32; 81Q93

arXiv:2311.02407 [pdf, other]

The equivalence of dynamic and strategic stability under regularized learning in games

Authors: Victor Boone, Panayotis Mertikopoulos

Abstract: In this paper, we examine the long-run behavior of regularized, no-regret learning in finite games. A well-known result in the field states that the empirical frequencies of no-regret play converge to the game's set of coarse correlated equilibria; however, our understanding of how the players' actual strategies evolve over time is much more limited - and, in many cases, non-existent. This issue i… ▽ More In this paper, we examine the long-run behavior of regularized, no-regret learning in finite games. A well-known result in the field states that the empirical frequencies of no-regret play converge to the game's set of coarse correlated equilibria; however, our understanding of how the players' actual strategies evolve over time is much more limited - and, in many cases, non-existent. This issue is exacerbated further by a series of recent results showing that only strict Nash equilibria are stable and attracting under regularized learning, thus making the relation between learning and pointwise solution concepts particularly elusive. In lieu of this, we take a more general approach and instead seek to characterize the \emph{setwise} rationality properties of the players' day-to-day play. To that end, we focus on one of the most stringent criteria of setwise strategic stability, namely that any unilateral deviation from the set in question incurs a cost to the deviator - a property known as closedness under better replies (club). In so doing, we obtain a far-reaching equivalence between strategic and dynamic stability: a product of pure strategies is closed under better replies if and only if its span is stable and attracting under regularized learning. In addition, we estimate the rate of convergence to such sets, and we show that methods based on entropic regularization (like the exponential weights algorithm) converge at a geometric rate, while projection-based methods converge within a finite number of iterations, even with bandit, payoff-based feedback. △ Less

Submitted 4 November, 2023; originally announced November 2023.

Comments: 31 pages, 8 figures, 2 tables

MSC Class: Primary 91A10; 91A26; secondary 68Q32; 62L20

arXiv:2311.02374 [pdf, other]

Riemannian stochastic optimization methods avoid strict saddle points

Authors: Ya-** Hsieh, Mohammad Reza Karimi, Andreas Krause, Panayotis Mertikopoulos

Abstract: Many modern machine learning applications - from online principal component analysis to covariance matrix identification and dictionary learning - can be formulated as minimization problems on Riemannian manifolds, and are typically solved with a Riemannian stochastic gradient method (or some variant thereof). However, in many cases of interest, the resulting minimization problem is not geodesical… ▽ More Many modern machine learning applications - from online principal component analysis to covariance matrix identification and dictionary learning - can be formulated as minimization problems on Riemannian manifolds, and are typically solved with a Riemannian stochastic gradient method (or some variant thereof). However, in many cases of interest, the resulting minimization problem is not geodesically convex, so the convergence of the chosen solver to a desirable solution - i.e., a local minimizer - is by no means guaranteed. In this paper, we study precisely this question, that is, whether stochastic Riemannian optimization algorithms are guaranteed to avoid saddle points with probability 1. For generality, we study a family of retraction-based methods which, in addition to having a potentially much lower per-iteration cost relative to Riemannian gradient descent, include other widely used algorithms, such as natural policy gradient methods and mirror descent in ordinary convex spaces. In this general setting, we show that, under mild assumptions for the ambient manifold and the oracle providing gradient information, the policies under study avoid strict saddle points / submanifolds with probability 1, from any initial condition. This result provides an important sanity check for the use of gradient methods on manifolds as it shows that, almost always, the limit state of a stochastic Riemannian algorithm can only be a local minimizer. △ Less

Submitted 4 November, 2023; originally announced November 2023.

Comments: 27 pages, 3 figures

MSC Class: Primary 62L20; 37N40; secondary 90C15; 90C48

arXiv:2302.02333 [pdf, other]

Learning in quantum games

Authors: Kyriakos Lotidis, Panayotis Mertikopoulos, Nicholas Bambos

Abstract: In this paper, we introduce a class of learning dynamics for general quantum games, that we call "follow the quantum regularized leader" (FTQL), in reference to the classical "follow the regularized leader" (FTRL) template for learning in finite games. We show that the induced quantum state dynamics decompose into (i) a classical, commutative component which governs the dynamics of the system's ei… ▽ More In this paper, we introduce a class of learning dynamics for general quantum games, that we call "follow the quantum regularized leader" (FTQL), in reference to the classical "follow the regularized leader" (FTRL) template for learning in finite games. We show that the induced quantum state dynamics decompose into (i) a classical, commutative component which governs the dynamics of the system's eigenvalues in a way analogous to the evolution of mixed strategies under FTRL; and (ii) a non-commutative component for the system's eigenvectors which has no classical counterpart. Despite the complications that this non-classical component entails, we find that the FTQL dynamics incur no more than constant regret in all quantum games. Moreover, adjusting classical notions of stability to account for the nonlinear geometry of the state space of quantum games, we show that only pure quantum equilibria can be stable and attracting under FTQL while, as a partial converse, pure equilibria that satisfy a certain "variational stability" condition are always attracting. Finally, we show that the FTQL dynamics are Poincaré recurrent in quantum min-max games, extending in this way a very recent result for the quantum replicator dynamics. △ Less

Submitted 5 February, 2023; originally announced February 2023.

Comments: 30 pages, 4 figures

MSC Class: Primary 91A81; 37N40; 68Q32; secondary 68T05; 81Q93; 91B80

arXiv:2211.08043 [pdf, ps, other]

The rate of convergence of Bregman proximal methods: Local geometry vs. regularity vs. sharpness

Authors: Waïss Azizian, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos

Abstract: We examine the last-iterate convergence rate of Bregman proximal methods - from mirror descent to mirror-prox and its optimistic variants - as a function of the local geometry induced by the prox-map** defining the method. For generality, we focus on local solutions of constrained, non-monotone variational inequalities, and we show that the convergence rate of a given method depends sharply on i… ▽ More We examine the last-iterate convergence rate of Bregman proximal methods - from mirror descent to mirror-prox and its optimistic variants - as a function of the local geometry induced by the prox-map** defining the method. For generality, we focus on local solutions of constrained, non-monotone variational inequalities, and we show that the convergence rate of a given method depends sharply on its associated Legendre exponent, a notion that measures the growth rate of the underlying Bregman function (Euclidean, entropic, or other) near a solution. In particular, we show that boundary solutions exhibit a stark separation of regimes between methods with a zero and non-zero Legendre exponent: the former converge at a linear rate, while the latter converge, in general, sublinearly. This dichotomy becomes even more pronounced in linearly constrained problems where methods with entropic regularization achieve a linear convergence rate along sharp directions, compared to convergence in a finite number of steps under Euclidean regularization. △ Less

Submitted 2 August, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: 31 pages, 2 tables, 2 figures

MSC Class: Primary 65K15; 90C33; secondary 68Q25; 68Q32

arXiv:2210.12860 [pdf, ps, other]

Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee

Authors: Tianyi Lin, Panayotis Mertikopoulos, Michael I. Jordan

Abstract: We propose and analyze several inexact regularized Newton-type methods for finding a global saddle point of \emph{convex-concave} unconstrained min-max optimization problems. Compared to first-order methods, our understanding of second-order methods for min-max optimization is relatively limited, as obtaining global rates of convergence with second-order information is much more involved. In this… ▽ More We propose and analyze several inexact regularized Newton-type methods for finding a global saddle point of \emph{convex-concave} unconstrained min-max optimization problems. Compared to first-order methods, our understanding of second-order methods for min-max optimization is relatively limited, as obtaining global rates of convergence with second-order information is much more involved. In this paper, we examine how second-order information can be used to speed up extra-gradient methods, even under inexactness. Specifically, we show that the proposed methods generate iterates that remain within a bounded set and that the averaged iterates converge to an $ε$-saddle point within $O(ε^{-2/3})$ iterations in terms of a restricted gap function. This matched the theoretically established lower bound in this context. We also provide a simple routine for solving the subproblem at each iteration, requiring a single Schur decomposition and $O(\log\log(1/ε))$ calls to a linear system solver in a quasi-upper-triangular system. Thus, our method improves the existing line-search-based second-order min-max optimization methods by shaving off an $O(\log\log(1/ε))$ factor in the required number of Schur decompositions. Finally, we present numerical experiments on synthetic and real data that demonstrate the efficiency of the proposed methods. △ Less

Submitted 23 April, 2024; v1 submitted 23 October, 2022; originally announced October 2022.

Comments: Provide a simple subroutine with a detailed complexity analysis; 30 pages, 9 figures

arXiv:2210.08857 [pdf, ps, other]

On the convergence of policy gradient methods to Nash equilibria in general stochastic games

Authors: Angeliki Giannou, Kyriakos Lotidis, Panayotis Mertikopoulos, Emmanouil-Vasileios Vlatakis-Gkaragkounis

Abstract: Learning in stochastic games is a notoriously difficult problem because, in addition to each other's strategic decisions, the players must also contend with the fact that the game itself evolves over time, possibly in a very complicated manner. Because of this, the convergence properties of popular learning algorithms - like policy gradient and its variants - are poorly understood, except in speci… ▽ More Learning in stochastic games is a notoriously difficult problem because, in addition to each other's strategic decisions, the players must also contend with the fact that the game itself evolves over time, possibly in a very complicated manner. Because of this, the convergence properties of popular learning algorithms - like policy gradient and its variants - are poorly understood, except in specific classes of games (such as potential or two-player, zero-sum games). In view of this, we examine the long-run behavior of policy gradient methods with respect to Nash equilibrium policies that are second-order stationary (SOS) in a sense similar to the type of sufficiency conditions used in optimization. Our first result is that SOS policies are locally attracting with high probability, and we show that policy gradient trajectories with gradient estimates provided by the REINFORCE algorithm achieve an $\mathcal{O}(1/\sqrt{n})$ distance-squared convergence rate if the method's step-size is chosen appropriately. Subsequently, specializing to the class of deterministic Nash policies, we show that this rate can be improved dramatically and, in fact, policy gradient methods converge within a finite number of iterations in that case. △ Less

Submitted 17 October, 2022; originally announced October 2022.

Comments: 43 pages, 2 tables; to appear in the proceedings of NeurIPS 2022

MSC Class: Primary 91A15; 91A26; secondary 68Q32; 68T05; 90C40

arXiv:2209.08416 [pdf, other]

doi 10.3934/jdg.2022021

Survival of dominated strategies under imitation dynamics

Authors: Panayotis Mertikopoulos, Yannick Viossat

Abstract: The literature on evolutionary game theory suggests that pure strategies that are strictly dominated by other pure strategies always become extinct under imitative game dynamics, but they can survive under innovative dynamics. As we explain, this is because innovative dynamics favour rare strategies while standard imitative dynamics do not. However, as we also show, there are reasonable imitation… ▽ More The literature on evolutionary game theory suggests that pure strategies that are strictly dominated by other pure strategies always become extinct under imitative game dynamics, but they can survive under innovative dynamics. As we explain, this is because innovative dynamics favour rare strategies while standard imitative dynamics do not. However, as we also show, there are reasonable imitation protocols that favour rare or frequent strategies, thus allowing strictly dominated strategies to survive in large classes of imitation dynamics. Dominated strategies can persist at nontrivial frequencies even when the level of domination is not small. △ Less

Submitted 17 September, 2022; originally announced September 2022.

Comments: 27 pages, 7 figures

MSC Class: Primary: 91A22; 91A26

arXiv:2209.04926 [pdf, other]

Learning in Games with Quantized Payoff Observations

Authors: Kyriakos Lotidis, Panayotis Mertikopoulos, Nicholas Bambos

Abstract: This paper investigates the impact of feedback quantization on multi-agent learning. In particular, we analyze the equilibrium convergence properties of the well-known "follow the regularized leader" (FTRL) class of algorithms when players can only observe a quantized (and possibly noisy) version of their payoffs. In this information-constrained setting, we show that coarser quantization triggers… ▽ More This paper investigates the impact of feedback quantization on multi-agent learning. In particular, we analyze the equilibrium convergence properties of the well-known "follow the regularized leader" (FTRL) class of algorithms when players can only observe a quantized (and possibly noisy) version of their payoffs. In this information-constrained setting, we show that coarser quantization triggers a qualitative shift in the convergence behavior of FTRL schemes. Specifically, if the quantization error lies below a threshold value (which depends only on the underlying game and not on the level of uncertainty entering the process or the specific FTRL variant under study), then (i) FTRL is attracted to the game's strict Nash equilibria with arbitrarily high probability; and (ii) the algorithm's asymptotic rate of convergence remains the same as in the non-quantized case. Otherwise, for larger quantization levels, these convergence properties are lost altogether: players may fail to learn anything beyond their initial state, even with full information on their payoff vectors. This is in contrast to the impact of quantization in continuous optimization problems, where the quality of the obtained solution degrades smoothly with the quantization level. △ Less

Submitted 11 September, 2022; originally announced September 2022.

arXiv:2207.07543 [pdf, other]

Pick your Neighbor: Local Gauss-Southwell Rule for Fast Asynchronous Decentralized Optimization

Authors: Marina Costantini, Nikolaos Liakopoulos, Panayotis Mertikopoulos, Thrasyvoulos Spyropoulos

Abstract: In decentralized optimization environments, each agent $i$ in a network of $n$ nodes has its own private function $f_i$, and nodes communicate with their neighbors to cooperatively minimize the aggregate objective $\sum_{i=1}^n f_i$. In this setting, synchronizing the nodes' updates incurs significant communication overhead and computational costs, so much of the recent literature has focused on t… ▽ More In decentralized optimization environments, each agent $i$ in a network of $n$ nodes has its own private function $f_i$, and nodes communicate with their neighbors to cooperatively minimize the aggregate objective $\sum_{i=1}^n f_i$. In this setting, synchronizing the nodes' updates incurs significant communication overhead and computational costs, so much of the recent literature has focused on the analysis and design of asynchronous optimization algorithms, where agents activate and communicate at arbitrary times without needing a global synchronization enforcer. However, most works assume that when a node activates, it selects the neighbor to contact based on a fixed probability (e.g., uniformly at random), a choice that ignores the optimization landscape at the moment of activation. Instead, in this work we introduce an optimization-aware selection rule that chooses the neighbor providing the highest dual cost improvement (a quantity related to a dualization of the problem based on consensus). This scheme is related to the coordinate descent (CD) method with the Gauss-Southwell (GS) rule for coordinate updates; in our setting however, only a subset of coordinates is accessible at each iteration (because each node can communicate only with its neighbors), so the existing literature on GS methods does not apply. To overcome this difficulty, we develop a new analytical framework for smooth and strongly convex $f_i$ that covers the class of set-wise CD algorithms -- a class that directly applies to decentralized scenarios, but is not limited to them -- and we show that the proposed set-wise GS rule achieves a speedup factor of up to the maximum degree in the network (which is in the order of $Θ(n)$ for highly connected graphs). The speedup predicted by our analysis is validated in numerical experiments with synthetic data. △ Less

Submitted 15 September, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

Comments: Revised writing, added references

MSC Class: 90C25; 68T99 ACM Class: G.1.6; I.2.11; I.2.6; C.2.4

arXiv:2206.09352 [pdf, other]

A universal black-box optimization method with almost dimension-free convergence rate guarantees

Authors: Kimon Antonakopoulos, Dong Quan Vu, Vokan Cevher, Kfir Y. Levy, Panayotis Mertikopoulos

Abstract: Universal methods for optimization are designed to achieve theoretically optimal convergence rates without any prior knowledge of the problem's regularity parameters or the accurarcy of the gradient oracle employed by the optimizer. In this regard, existing state-of-the-art algorithms achieve an $\mathcal{O}(1/T^2)$ value convergence rate in Lipschitz smooth problems with a perfect gradient oracle… ▽ More Universal methods for optimization are designed to achieve theoretically optimal convergence rates without any prior knowledge of the problem's regularity parameters or the accurarcy of the gradient oracle employed by the optimizer. In this regard, existing state-of-the-art algorithms achieve an $\mathcal{O}(1/T^2)$ value convergence rate in Lipschitz smooth problems with a perfect gradient oracle, and an $\mathcal{O}(1/\sqrt{T})$ convergence rate when the underlying problem is non-smooth and/or the gradient oracle is stochastic. On the downside, these methods do not take into account the problem's dimensionality, and this can have a catastrophic impact on the achieved convergence rate, in both theory and practice. Our paper aims to bridge this gap by providing a scalable universal gradient method - dubbed UnderGrad - whose oracle complexity is almost dimension-free in problems with a favorable geometry (like the simplex, linearly constrained semidefinite programs and combinatorial bandits), while retaining the order-optimal dependence on $T$ described above. These "best-of-both-worlds" results are achieved via a primal-dual update scheme inspired by the dual exploration method for variational inequalities. △ Less

Submitted 19 June, 2022; originally announced June 2022.

Comments: 31 pages, 4 figures, 1 table; to appear in ICML 2022

MSC Class: Primary 90C25; 90C15; secondary 68Q32; 68T05

arXiv:2206.09348 [pdf, other]

Nested bandits

Authors: Matthieu Martin, Panayotis Mertikopoulos, Thibaud Rahier, Houssam Zenati

Abstract: In many online decision processes, the optimizing agent is called to choose between large numbers of alternatives with many inherent similarities; in turn, these similarities imply closely correlated losses that may confound standard discrete choice models and bandit algorithms. We study this question in the context of nested bandits, a class of adversarial multi-armed bandit problems where the le… ▽ More In many online decision processes, the optimizing agent is called to choose between large numbers of alternatives with many inherent similarities; in turn, these similarities imply closely correlated losses that may confound standard discrete choice models and bandit algorithms. We study this question in the context of nested bandits, a class of adversarial multi-armed bandit problems where the learner seeks to minimize their regret in the presence of a large number of distinct alternatives with a hierarchy of embedded (non-combinatorial) similarities. In this setting, optimal algorithms based on the exponential weights blueprint (like Hedge, EXP3, and their variants) may incur significant regret because they tend to spend excessive amounts of time exploring irrelevant alternatives with similar, suboptimal costs. To account for this, we propose a nested exponential weights (NEW) algorithm that performs a layered exploration of the learner's set of alternatives based on a nested, step-by-step selection method. In so doing, we obtain a series of tight bounds for the learner's regret showing that online learning problems with a high degree of similarity between alternatives can be resolved efficiently, without a red bus / blue bus paradox occurring. △ Less

Submitted 19 June, 2022; originally announced June 2022.

Comments: 35 pages, 14 figures; to appear in ICML 2022

MSC Class: Primary 68Q32; secondary 91B06

arXiv:2206.06795 [pdf, other]

Riemannian stochastic approximation algorithms

Authors: Mohammad Reza Karimi, Ya-** Hsieh, Panayotis Mertikopoulos, Andreas Krause

Abstract: We examine a wide class of stochastic approximation algorithms for solving (stochastic) nonlinear problems on Riemannian manifolds. Such algorithms arise naturally in the study of Riemannian optimization, game theory and optimal transport, but their behavior is much less understood compared to the Euclidean case because of the lack of a global linear structure on the manifold. We overcome this dif… ▽ More We examine a wide class of stochastic approximation algorithms for solving (stochastic) nonlinear problems on Riemannian manifolds. Such algorithms arise naturally in the study of Riemannian optimization, game theory and optimal transport, but their behavior is much less understood compared to the Euclidean case because of the lack of a global linear structure on the manifold. We overcome this difficulty by introducing a suitable Fermi coordinate frame which allows us to map the asymptotic behavior of the Riemannian Robbins-Monro (RRM) algorithms under study to that of an associated deterministic dynamical system. In so doing, we provide a general template of almost sure convergence results that mirrors and extends the existing theory for Euclidean Robbins-Monro schemes, despite the significant complications that arise due to the curvature and topology of the underlying manifold. We showcase the flexibility of the proposed framework by applying it to a range of retraction-based variants of the popular optimistic / extra-gradient methods for solving minimization problems and games, and we provide a unified treatment for their convergence. △ Less

Submitted 27 December, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: 33 pages, 2 figures; a one-page abstract of this paper was presented in COLT 2022

MSC Class: Primary 62L20; 37N40; secondary 90C15; 90C47; 90C48

arXiv:2206.06015 [pdf, other]

No-Regret Learning in Games with Noisy Feedback: Faster Rates and Adaptivity via Learning Rate Separation

Authors: Yu-Guan Hsieh, Kimon Antonakopoulos, Volkan Cevher, Panayotis Mertikopoulos

Abstract: We examine the problem of regret minimization when the learner is involved in a continuous game with other optimizing agents: in this case, if all players follow a no-regret algorithm, it is possible to achieve significantly lower regret relative to fully adversarial environments. We study this problem in the context of variationally stable games (a class of continuous games which includes all con… ▽ More We examine the problem of regret minimization when the learner is involved in a continuous game with other optimizing agents: in this case, if all players follow a no-regret algorithm, it is possible to achieve significantly lower regret relative to fully adversarial environments. We study this problem in the context of variationally stable games (a class of continuous games which includes all convex-concave and monotone games), and when the players only have access to noisy estimates of their individual payoff gradients. If the noise is additive, the game-theoretic and purely adversarial settings enjoy similar regret guarantees; however, if the noise is multiplicative, we show that the learners can, in fact, achieve constant regret. We achieve this faster rate via an optimistic gradient scheme with learning rate separation -- that is, the method's extrapolation and update steps are tuned to different schedules, depending on the noise profile. Subsequently, to eliminate the need for delicate hyperparameter tuning, we propose a fully adaptive method that attains nearly the same guarantees as its non-adapted counterpart, while operating without knowledge of either the game or of the noise profile. △ Less

Submitted 17 March, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: In Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

arXiv:2206.03922 [pdf, other]

A unified stochastic approximation framework for learning in games

Authors: Panayotis Mertikopoulos, Ya-** Hsieh, Volkan Cevher

Abstract: We develop a flexible stochastic approximation framework for analyzing the long-run behavior of learning in games (both continuous and finite). The proposed analysis template incorporates a wide array of popular learning algorithms, including gradient-based methods, the exponential/multiplicative weights algorithm for learning in finite games, optimistic and bandit variants of the above, etc. In a… ▽ More We develop a flexible stochastic approximation framework for analyzing the long-run behavior of learning in games (both continuous and finite). The proposed analysis template incorporates a wide array of popular learning algorithms, including gradient-based methods, the exponential/multiplicative weights algorithm for learning in finite games, optimistic and bandit variants of the above, etc. In addition to providing an integrated view of these algorithms, our framework further allows us to obtain several new convergence results, both asymptotic and in finite time, in both continuous and finite games. Specifically, we provide a range of criteria for identifying classes of Nash equilibria and sets of action profiles that are attracting with high probability, and we also introduce the notion of coherence, a game-theoretic property that includes strict and sharp equilibria, and which leads to convergence in finite time. Importantly, our analysis applies to both oracle-based and bandit, payoff-based methods - that is, when players only observe their realized payoffs. △ Less

Submitted 3 July, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

Comments: 40 pages, 5 figures, 2 tables

MSC Class: Primary 91A10; 91A26; secondary 68Q32; 68T02

arXiv:2201.02985 [pdf, other]

Routing in an Uncertain World: Adaptivity, Efficiency, and Equilibrium

Authors: Dong Quan Vu, Kimon Antonakopoulos, Panayotis Mertikopoulos

Abstract: We consider the traffic assignment problem in nonatomic routing games where the players' cost functions may be subject to random fluctuations (e.g., weather disturbances, perturbations in the underlying network, etc.). We tackle this problem from the viewpoint of a control interface that makes routing recommendations based solely on observed costs and without any further knowledge of the system's… ▽ More We consider the traffic assignment problem in nonatomic routing games where the players' cost functions may be subject to random fluctuations (e.g., weather disturbances, perturbations in the underlying network, etc.). We tackle this problem from the viewpoint of a control interface that makes routing recommendations based solely on observed costs and without any further knowledge of the system's governing dynamics -- such as the network's cost functions, the distribution of any random events affecting the network, etc. In this online setting, learning methods based on the popular exponential weights algorithm converge to equilibrium at an $\mathcal{O}({1/\sqrt{T}})$ rate: this rate is known to be order-optimal in stochastic networks, but it is otherwise suboptimal in static networks. In the latter case, it is possible to achieve an $\mathcal{O}({1/T^{2}})$ equilibrium convergence rate via the use of finely tuned accelerated algorithms; on the other hand, these accelerated algorithms fail to converge altogether in the presence of persistent randomness, so it is not clear how to achieve the "best of both worlds" in terms of convergence speed. Our paper seeks to fill this gap by proposing an adaptive routing algortihm with the following desirable properties: $(i)$ it seamlessly interpolates between the $\mathcal{O}({1/T^{2}})$ and $\mathcal{O}({1/\sqrt{T}})$ rates for static and stochastic environments respectively; $(ii)$ its convergence speed is polylogarithmic in the number of paths in the network; ${(iii)}$ the method's per-iteration complexity and memory requirements are both linear in the number of nodes and edges in the network; and ${(iv)}$ it does not require any prior knowledge of the problem's parameters. △ Less

Submitted 9 January, 2022; originally announced January 2022.

arXiv:2109.05829 [pdf, other]

Zeroth-order non-convex learning via hierarchical dual averaging

Authors: Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, Thibaud Rahier

Abstract: We propose a hierarchical version of dual averaging for zeroth-order online non-convex optimization - i.e., learning processes where, at each stage, the optimizer is facing an unknown non-convex loss function and only receives the incurred loss as feedback. The proposed class of policies relies on the construction of an online model that aggregates loss information as it arrives, and it consists o… ▽ More We propose a hierarchical version of dual averaging for zeroth-order online non-convex optimization - i.e., learning processes where, at each stage, the optimizer is facing an unknown non-convex loss function and only receives the incurred loss as feedback. The proposed class of policies relies on the construction of an online model that aggregates loss information as it arrives, and it consists of two principal components: (a) a regularizer adapted to the Fisher information metric (as opposed to the metric norm of the ambient space); and (b) a principled exploration of the problem's state space based on an adapted hierarchical schedule. This construction enables sharper control of the model's bias and variance, and allows us to derive tight bounds for both the learner's static and dynamic regret - i.e., the regret incurred against the best dynamic policy in hindsight over the horizon of play. △ Less

Submitted 13 September, 2021; originally announced September 2021.

Comments: 40 pages, 14 figures

MSC Class: Primary 68Q32; 90C56; secondary 90C15; 90C26

arXiv:2108.04506 [pdf, other]

A heuristic for estimating Nash equilibria in first-price auctions with correlated values

Authors: Benjamin Heymann, Panayotis Mertikopoulos

Abstract: Our paper concerns the computation of Nash equilibria of first-price auctions with correlated values. While there exist several equilibrium computation methods for auctions with independent values, the correlation of the bidders' values introduces significant complications that render existing methods unsatisfactory in practice. Our contribution is a step towards filling this gap: inspired by the… ▽ More Our paper concerns the computation of Nash equilibria of first-price auctions with correlated values. While there exist several equilibrium computation methods for auctions with independent values, the correlation of the bidders' values introduces significant complications that render existing methods unsatisfactory in practice. Our contribution is a step towards filling this gap: inspired by the seminal fictitious play process of Brown and Robinson, we present a learning heuristic-that we call fictitious bidding (FB)-for estimating Bayes-Nash equilibria of first-price auctions with correlated values, and we assess the performance of this heuristic on several relevant examples. △ Less

Submitted 10 August, 2021; originally announced August 2021.

arXiv:2107.08011 [pdf, other]

Adaptive first-order methods revisited: Convex optimization without Lipschitz requirements

Authors: Kimon Antonakopoulos, Panayotis Mertikopoulos

Abstract: We propose a new family of adaptive first-order methods for a class of convex minimization problems that may fail to be Lipschitz continuous or smooth in the standard sense. Specifically, motivated by a recent flurry of activity on non-Lipschitz (NoLips) optimization, we consider problems that are continuous or smooth relative to a reference Bregman function - as opposed to a global, ambient norm… ▽ More We propose a new family of adaptive first-order methods for a class of convex minimization problems that may fail to be Lipschitz continuous or smooth in the standard sense. Specifically, motivated by a recent flurry of activity on non-Lipschitz (NoLips) optimization, we consider problems that are continuous or smooth relative to a reference Bregman function - as opposed to a global, ambient norm (Euclidean or otherwise). These conditions encompass a wide range of problems with singular objectives, such as Fisher markets, Poisson tomography, D-design, and the like. In this setting, the application of existing order-optimal adaptive methods - like UnixGrad or AcceleGrad - is not possible, especially in the presence of randomness and uncertainty. The proposed method - which we call adaptive mirror descent (AdaMir) - aims to close this gap by concurrently achieving min-max optimal rates in problems that are relatively continuous or smooth, including stochastic ones. △ Less

Submitted 16 July, 2021; originally announced July 2021.

Comments: 34 pages, 4 figures

MSC Class: Primary 90C25; 90C15; 90C30; secondary 68Q25; 90C60

arXiv:2107.02919 [pdf, other]

Distributed stochastic optimization with large delays

Authors: Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Peter W. Glynn, Yinyu Ye

Abstract: One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent on distributed computing architectures (possibly) asychronously. However, a key obstacle in the efficient implementation of DASGD is the issue of delays: when a… ▽ More One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent on distributed computing architectures (possibly) asychronously. However, a key obstacle in the efficient implementation of DASGD is the issue of delays: when a computing node contributes a gradient update, the global model parameter may have already been updated by other nodes several times over, thereby rendering this gradient information stale. These delays can quickly add up if the computational throughput of a node is saturated, so the convergence of DASGD may be compromised in the presence of large delays. Our first contribution is that, by carefully tuning the algorithm's step-size, convergence to the critical set is still achieved in mean square, even if the delays grow unbounded at a polynomial rate. We also establish finer results in a broad class of structured optimization problems (called variationally coherent), where we show that DASGD converges to a global optimum with probability $1$ under the same delay assumptions. Together, these results contribute to the broad landscape of large-scale non-convex stochastic optimization by offering state-of-the-art theoretical guarantees and providing insights for algorithm design. △ Less

Submitted 6 July, 2021; originally announced July 2021.

Comments: 41 pages, 8 figures; to be published in Mathematics of Operations Research

MSC Class: Primary 90C15; 90C26; secondary 90C25; 90C06

arXiv:2107.01906 [pdf, ps, other]

The Last-Iterate Convergence Rate of Optimistic Mirror Descent in Stochastic Variational Inequalities

Authors: Waïss Azizian, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos

Abstract: In this paper, we analyze the local convergence rate of optimistic mirror descent methods in stochastic variational inequalities, a class of optimization problems with important applications to learning theory and machine learning. Our analysis reveals an intricate relation between the algorithm's rate of convergence and the local geometry induced by the method's underlying Bregman function. We qu… ▽ More In this paper, we analyze the local convergence rate of optimistic mirror descent methods in stochastic variational inequalities, a class of optimization problems with important applications to learning theory and machine learning. Our analysis reveals an intricate relation between the algorithm's rate of convergence and the local geometry induced by the method's underlying Bregman function. We quantify this relation by means of the Legendre exponent, a notion that we introduce to measure the growth rate of the Bregman divergence relative to the ambient norm near a solution. We show that this exponent determines both the optimal step-size policy of the algorithm and the optimal rates attained, explaining in this way the differences observed for some popular Bregman functions (Euclidean projection, negative entropy, fractional power, etc.). △ Less

Submitted 5 July, 2021; originally announced July 2021.

Comments: 31 pages, 3 figures, 1 table; to be presented at the 34th Annual Conference on Learning Theory (COLT 2021)

MSC Class: 65K15; 90C33 (Primary) 68Q25; 68Q32 (Secondary)

arXiv:2107.01595 [pdf, ps, other]

Learning in nonatomic games, Part I: Finite action spaces and population games

Authors: Saeed Hadikhanloo, Rida Laraki, Panayotis Mertikopoulos, Sylvain Sorin

Abstract: We examine the long-run behavior of a wide range of dynamics for learning in nonatomic games, in both discrete and continuous time. The class of dynamics under consideration includes fictitious play and its regularized variants, the best-reply dynamics (again, possibly regularized), as well as the dynamics of dual averaging / "follow the regularized leader" (which themselves include as special cas… ▽ More We examine the long-run behavior of a wide range of dynamics for learning in nonatomic games, in both discrete and continuous time. The class of dynamics under consideration includes fictitious play and its regularized variants, the best-reply dynamics (again, possibly regularized), as well as the dynamics of dual averaging / "follow the regularized leader" (which themselves include as special cases the replicator dynamics and Friedman's projection dynamics). Our analysis concerns both the actual trajectory of play and its time-average, and we cover potential and monotone games, as well as games with an evolutionarily stable state (global or otherwise). We focus exclusively on games with finite action spaces; nonatomic games with continuous action spaces are treated in detail in Part II of this paper. △ Less

Submitted 4 July, 2021; originally announced July 2021.

Comments: 27 pages

MSC Class: Primary 91A22; 91A26; secondary 37N40; 68Q32

arXiv:2106.14636 [pdf, other]

Asymptotic Degradation of Linear Regression Estimates With Strategic Data Sources

Authors: Benjamin Roussillon, Nicolas Gast, Patrick Loiseau, Panayotis Mertikopoulos

Abstract: We consider the problem of linear regression from strategic data sources with a public good component, i.e., when data is provided by strategic agents who seek to minimize an individual provision cost for increasing their data's precision while benefiting from the model's overall precision. In contrast to previous works, our model tackles the case where there is uncertainty on the attributes chara… ▽ More We consider the problem of linear regression from strategic data sources with a public good component, i.e., when data is provided by strategic agents who seek to minimize an individual provision cost for increasing their data's precision while benefiting from the model's overall precision. In contrast to previous works, our model tackles the case where there is uncertainty on the attributes characterizing the agents' data -- a critical aspect of the problem when the number of agents is large. We provide a characterization of the game's equilibrium, which reveals an interesting connection with optimal design. Subsequently, we focus on the asymptotic behavior of the covariance of the linear regression parameters estimated via generalized least squares as the number of data sources becomes large. We provide upper and lower bounds for this covariance matrix and we show that, when the agents' provision costs are superlinear, the model's covariance converges to zero but at a slower rate relative to virtually all learning problems with exogenous data. On the other hand, if the agents' provision costs are linear, this covariance fails to converge. This shows that even the basic property of consistency of generalized least squares estimators is compromised when the data sources are strategic. △ Less

Submitted 11 March, 2022; v1 submitted 28 June, 2021; originally announced June 2021.

Comments: 31 pages, 7 figures

arXiv:2105.13348 [pdf, other]

Optimization in Open Networks via Dual Averaging

Authors: Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos

Abstract: In networks of autonomous agents (e.g., fleets of vehicles, scattered sensors), the problem of minimizing the sum of the agents' local functions has received a lot of interest. We tackle here this distributed optimization problem in the case of open networks when agents can join and leave the network at any time. Leveraging recent online optimization techniques, we propose and analyze the converge… ▽ More In networks of autonomous agents (e.g., fleets of vehicles, scattered sensors), the problem of minimizing the sum of the agents' local functions has received a lot of interest. We tackle here this distributed optimization problem in the case of open networks when agents can join and leave the network at any time. Leveraging recent online optimization techniques, we propose and analyze the convergence of a decentralized asynchronous optimization method for open networks. △ Less

Submitted 16 October, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

Comments: In 60th IEEE Conference on Decision and Control (CDC 2021); 7 pages, 1 figure

arXiv:2104.12761 [pdf, other]

Adaptive Learning in Continuous Games: Optimal Regret Bounds and Convergence to Nash Equilibrium

Authors: Yu-Guan Hsieh, Kimon Antonakopoulos, Panayotis Mertikopoulos

Abstract: In game-theoretic learning, several agents are simultaneously following their individual interests, so the environment is non-stationary from each player's perspective. In this context, the performance of a learning algorithm is often measured by its regret. However, no-regret algorithms are not created equal in terms of game-theoretic guarantees: depending on how they are tuned, some of them may… ▽ More In game-theoretic learning, several agents are simultaneously following their individual interests, so the environment is non-stationary from each player's perspective. In this context, the performance of a learning algorithm is often measured by its regret. However, no-regret algorithms are not created equal in terms of game-theoretic guarantees: depending on how they are tuned, some of them may drive the system to an equilibrium, while others could produce cyclic, chaotic, or otherwise divergent trajectories. To account for this, we propose a range of no-regret policies based on optimistic mirror descent, with the following desirable properties: i) they do not require any prior tuning or knowledge of the game; ii) they all achieve O(\sqrt{T}) regret against arbitrary, adversarial opponents; and iii) they converge to the best response against convergent opponents. Also, if employed by all players, then iv) they guarantee O(1) social regret; while v) the induced sequence of play converges to Nash equilibrium with O(1) individual regret in all variationally stable games (a class of games that includes all monotone and convex-concave zero-sum games). △ Less

Submitted 16 October, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

Comments: In the 34th Annual Conference on Learning Theory (COLT 2021); 35 pages, 2 figures

arXiv:2101.04667 [pdf, ps, other]

Survival of the strictest: Stable and unstable equilibria under regularized learning with partial information

Authors: Angeliki Giannou, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Panayotis Mertikopoulos

Abstract: In this paper, we examine the Nash equilibrium convergence properties of no-regret learning in general N-player games. For concreteness, we focus on the archetypal follow the regularized leader (FTRL) family of algorithms, and we consider the full spectrum of uncertainty that the players may encounter - from noisy, oracle-based feedback, to bandit, payoff-based information. In this general context… ▽ More In this paper, we examine the Nash equilibrium convergence properties of no-regret learning in general N-player games. For concreteness, we focus on the archetypal follow the regularized leader (FTRL) family of algorithms, and we consider the full spectrum of uncertainty that the players may encounter - from noisy, oracle-based feedback, to bandit, payoff-based information. In this general context, we establish a comprehensive equivalence between the stability of a Nash equilibrium and its support: a Nash equilibrium is stable and attracting with arbitrarily high probability if and only if it is strict (i.e., each equilibrium strategy has a unique best response). This equivalence extends existing continuous-time versions of the folk theorem of evolutionary game theory to a bona fide algorithmic learning setting, and it provides a clear refinement criterion for the prediction of the day-to-day behavior of no-regret learning in games △ Less

Submitted 4 February, 2021; v1 submitted 12 January, 2021; originally announced January 2021.

arXiv:2012.11579 [pdf, ps, other]

Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism

Authors: Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos

Abstract: In this paper, we provide a general framework for studying multi-agent online learning problems in the presence of delays and asynchronicities. Specifically, we propose and analyze a class of adaptive dual averaging schemes in which agents only need to accumulate gradient feedback received from the whole system, without requiring any between-agent coordination. In the single-agent case, the adapti… ▽ More In this paper, we provide a general framework for studying multi-agent online learning problems in the presence of delays and asynchronicities. Specifically, we propose and analyze a class of adaptive dual averaging schemes in which agents only need to accumulate gradient feedback received from the whole system, without requiring any between-agent coordination. In the single-agent case, the adaptivity of the proposed method allows us to extend a range of existing results to problems with potentially unbounded delays between playing an action and receiving the corresponding feedback. In the multi-agent case, the situation is significantly more complicated because agents may not have access to a global clock to use as a reference point; to overcome this, we focus on the information that is available for producing each prediction rather than the actual delay associated with each feedback. This allows us to derive adaptive learning strategies with optimal regret bounds, even in a fully decentralized, asynchronous environment. Finally, we also analyze an "optimistic" variant of the proposed algorithm which is capable of exploiting the predictability of problems with a slower variation and leads to improved regret bounds. △ Less

Submitted 16 April, 2022; v1 submitted 21 December, 2020; originally announced December 2020.

Comments: Accepted by Journal of Machine Learning Research (JMLR)

arXiv:2010.12100 [pdf, other]

Adaptive extra-gradient methods for min-max optimization and games

Authors: Kimon Antonakopoulos, E. Veronica Belmega, Panayotis Mertikopoulos

Abstract: We present a new family of min-max optimization algorithms that automatically exploit the geometry of the gradient data observed at earlier iterations to perform more informative extra-gradient steps in later ones. Thanks to this adaptation mechanism, the proposed method automatically detects whether the problem is smooth or not, without requiring any prior tuning by the optimizer. As a result, th… ▽ More We present a new family of min-max optimization algorithms that automatically exploit the geometry of the gradient data observed at earlier iterations to perform more informative extra-gradient steps in later ones. Thanks to this adaptation mechanism, the proposed method automatically detects whether the problem is smooth or not, without requiring any prior tuning by the optimizer. As a result, the algorithm simultaneously achieves order-optimal convergence rates, i.e., it converges to an $\varepsilon$-optimal solution within $\mathcal{O}(1/\varepsilon)$ iterations in smooth problems, and within $\mathcal{O}(1/\varepsilon^2)$ iterations in non-smooth ones. Importantly, these guarantees do not require any of the standard boundedness or Lipschitz continuity conditions that are typically assumed in the literature; in particular, they apply even to problems with singularities (such as resource allocation problems and the like). This adaptation is achieved through the use of a geometric apparatus based on Finsler metrics and a suitably chosen mirror-prox template that allows us to derive sharp convergence rates for the methods at hand. △ Less

Submitted 19 November, 2020; v1 submitted 22 October, 2020; originally announced October 2020.

Comments: 28 pages, 5 figures, 1 table

MSC Class: Primary 90C47; 91A68; secondary 49J40; 90C33

arXiv:2010.09514 [pdf, ps, other]

No-regret learning and mixed Nash equilibria: They do not mix

Authors: Lampros Flokas, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Thanasis Lianeas, Panayotis Mertikopoulos, Georgios Piliouras

Abstract: Understanding the behavior of no-regret dynamics in general $N$-player games is a fundamental question in online learning and game theory. A folk result in the field states that, in finite games, the empirical frequency of play under no-regret learning converges to the game's set of coarse correlated equilibria. By contrast, our understanding of how the day-to-day behavior of the dynamics correlat… ▽ More Understanding the behavior of no-regret dynamics in general $N$-player games is a fundamental question in online learning and game theory. A folk result in the field states that, in finite games, the empirical frequency of play under no-regret learning converges to the game's set of coarse correlated equilibria. By contrast, our understanding of how the day-to-day behavior of the dynamics correlates to the game's Nash equilibria is much more limited, and only partial results are known for certain classes of games (such as zero-sum or congestion games). In this paper, we study the dynamics of "follow-the-regularized-leader" (FTRL), arguably the most well-studied class of no-regret dynamics, and we establish a swee** negative result showing that the notion of mixed Nash equilibrium is antithetical to no-regret learning. Specifically, we show that any Nash equilibrium which is not strict (in that every player has a unique best response) cannot be stable and attracting under the dynamics of FTRL. This result has significant implications for predicting the outcome of a learning process as it shows unequivocally that only strict (and hence, pure) Nash equilibria can emerge as stable limit points thereof. △ Less

Submitted 20 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: 24 pages, 7 figures, 1 table

MSC Class: Primary 91A26; 37N40; secondary 91A68; 68Q32; 68T05

arXiv:2010.08496 [pdf, other]

Online non-convex optimization with imperfect feedback

Authors: Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, Thibaud Rahier

Abstract: We consider the problem of online learning with non-convex losses. In terms of feedback, we assume that the learner observes - or otherwise constructs - an inexact model for the loss function encountered at each stage, and we propose a mixed-strategy learning policy based on dual averaging. In this general context, we derive a series of tight regret minimization guarantees, both for the learner's… ▽ More We consider the problem of online learning with non-convex losses. In terms of feedback, we assume that the learner observes - or otherwise constructs - an inexact model for the loss function encountered at each stage, and we propose a mixed-strategy learning policy based on dual averaging. In this general context, we derive a series of tight regret minimization guarantees, both for the learner's static (external) regret, as well as the regret incurred against the best dynamic policy in hindsight. Subsequently, we apply this general template to the case where the learner only has access to the actual loss incurred at each stage of the process. This is achieved by means of a kernel-based estimator which generates an inexact model for each round's loss function using only the learner's realized losses as input. △ Less

Submitted 16 October, 2020; originally announced October 2020.

Comments: 30 pages, 2 figures, 1 table

MSC Class: Primary 68Q32; secondary 90C26; 91A26

arXiv:2010.06250 [pdf, ps, other]

Regret minimization in stochastic non-convex learning via a proximal-gradient approach

Authors: Nadav Hallak, Panayotis Mertikopoulos, Volkan Cevher

Abstract: Motivated by applications in machine learning and operations research, we study regret minimization with stochastic first-order oracle feedback in online constrained, and possibly non-smooth, non-convex problems. In this setting, the minimization of external regret is beyond reach for first-order methods, so we focus on a local regret measure defined via a proximal-gradient map**. To achieve no… ▽ More Motivated by applications in machine learning and operations research, we study regret minimization with stochastic first-order oracle feedback in online constrained, and possibly non-smooth, non-convex problems. In this setting, the minimization of external regret is beyond reach for first-order methods, so we focus on a local regret measure defined via a proximal-gradient map**. To achieve no (local) regret in this setting, we develop a prox-grad method based on stochastic first-order feedback, and a simpler method for when access to a perfect first-order oracle is possible. Both methods are min-max order-optimal, and we also establish a bound on the number of prox-grad queries these methods require. As an important application of our results, we also obtain a link between online and offline non-convex stochastic optimization manifested as a new prox-grad scheme with complexity guarantees matching those obtained via variance reduction techniques. △ Less

Submitted 13 October, 2020; originally announced October 2020.

arXiv:2006.11144 [pdf, other]

On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems

Authors: Panayotis Mertikopoulos, Nadav Hallak, Ali Kavis, Volkan Cevher

Abstract: This paper analyzes the trajectories of stochastic gradient descent (SGD) to help understand the algorithm's convergence properties in non-convex problems. We first show that the sequence of iterates generated by SGD remains bounded and converges with probability $1$ under a very broad range of step-size schedules. Subsequently, going beyond existing positive probability guarantees, we show that S… ▽ More This paper analyzes the trajectories of stochastic gradient descent (SGD) to help understand the algorithm's convergence properties in non-convex problems. We first show that the sequence of iterates generated by SGD remains bounded and converges with probability $1$ under a very broad range of step-size schedules. Subsequently, going beyond existing positive probability guarantees, we show that SGD avoids strict saddle points/manifolds with probability $1$ for the entire spectrum of step-size policies considered. Finally, we prove that the algorithm's rate of convergence to Hurwicz minimizers is $\mathcal{O}(1/n^{p})$ if the method is employed with a $Θ(1/n^p)$ step-size schedule. This provides an important guideline for tuning the algorithm's step-size as it suggests that a cool-down phase with a vanishing step-size could lead to faster convergence; we demonstrate this heuristic using ResNet architectures on CIFAR. △ Less

Submitted 19 June, 2020; originally announced June 2020.

Comments: 32 pages, 8 figures

MSC Class: Primary 90C26; 62L20; secondary 90C30; 90C15; 37N40

arXiv:2006.10911 [pdf, ps, other]

Gradient-free Online Learning in Games with Delayed Rewards

Authors: Amélie Héliou, Panayotis Mertikopoulos, Zhengyuan Zhou

Abstract: Motivated by applications to online advertising and recommender systems, we consider a game-theoretic model with delayed rewards and asynchronous, payoff-based feedback. In contrast to previous work on delayed multi-armed bandits, we focus on multi-player games with continuous action spaces, and we examine the long-run behavior of strategic agents that follow a no-regret learning policy (but are o… ▽ More Motivated by applications to online advertising and recommender systems, we consider a game-theoretic model with delayed rewards and asynchronous, payoff-based feedback. In contrast to previous work on delayed multi-armed bandits, we focus on multi-player games with continuous action spaces, and we examine the long-run behavior of strategic agents that follow a no-regret learning policy (but are otherwise oblivious to the game being played, the objectives of their opponents, etc.). To account for the lack of a consistent stream of information (for instance, rewards can arrive out of order, with an a priori unbounded delay, etc.), we introduce a gradient-free learning policy where payoff information is placed in a priority queue as it arrives. In this general context, we derive new bounds for the agents' regret; furthermore, under a standard diagonal concavity assumption, we show that the induced sequence of play converges to Nash equilibrium with probability $1$, even if the delay between choosing an action and receiving the corresponding reward is unbounded. △ Less

Submitted 18 June, 2020; originally announced June 2020.

Comments: 26 pages, 4 figures; to appear in ICML 2020

MSC Class: Primary 91A10; 91A68; 68Q32; secondary 91A20; 91A26; 68T05

arXiv:2006.09065 [pdf, other]

The limits of min-max optimization algorithms: convergence to spurious non-critical sets

Authors: Ya-** Hsieh, Panayotis Mertikopoulos, Volkan Cevher

Abstract: Compared to ordinary function minimization problems, min-max optimization algorithms encounter far greater challenges because of the existence of periodic cycles and similar phenomena. Even though some of these behaviors can be overcome in the convex-concave regime, the general case is considerably more difficult. On that account, we take an in-depth look at a comprehensive class of state-of-the a… ▽ More Compared to ordinary function minimization problems, min-max optimization algorithms encounter far greater challenges because of the existence of periodic cycles and similar phenomena. Even though some of these behaviors can be overcome in the convex-concave regime, the general case is considerably more difficult. On that account, we take an in-depth look at a comprehensive class of state-of-the art algorithms and prevalent heuristics in non-convex / non-concave problems, and we establish the following general results: a) generically, the algorithms' limit points are contained in the ICT sets of a common, mean-field system; b) the attractors of this system also attract the algorithms in question with arbitrarily high probability; and c) all algorithms avoid the system's unstable sets with probability 1. On the surface, this provides a highly optimistic outlook for min-max algorithms; however, we show that there exist spurious attractors that do not contain any stationary points of the problem under study. In this regard, our work suggests that existing min-max algorithms may be subject to inescapable convergence failures. We complement our theoretical analysis by illustrating such attractors in simple, two-dimensional, almost bilinear problems. △ Less

Submitted 14 February, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

arXiv:2006.05445 [pdf, other]

doi 10.1109/TSP.2020.3029983

Fast Optimization with Zeroth-Order Feedback in Distributed, Multi-User MIMO Systems

Authors: Olivier Bilenne, Panayotis Mertikopoulos, E. Veronica Belmega

Abstract: In this paper, we develop a gradient-free optimization methodology for efficient resource allocation in Gaussian MIMO multiple access channels. Our approach combines two main ingredients: (i) an entropic semidefinite optimization based on matrix exponential learning (MXL); and (ii) a one-shot gradient estimator which achieves low variance through the reuse of past information. This novel algorithm… ▽ More In this paper, we develop a gradient-free optimization methodology for efficient resource allocation in Gaussian MIMO multiple access channels. Our approach combines two main ingredients: (i) an entropic semidefinite optimization based on matrix exponential learning (MXL); and (ii) a one-shot gradient estimator which achieves low variance through the reuse of past information. This novel algorithm, which we call gradient-free MXL algorithm with callbacks (MXL0$^{+}$), retains the convergence speed of gradient-based methods while requiring minimal feedback per iteration$-$a single scalar. In more detail, in a MIMO multiple access channel with $K$ users and $M$ transmit antennas per user, the MXL0$^{+}$ algorithm achieves $ε$-optimality within $\text{poly}(K,M)/ε^2$ iterations (on average and with high probability), even when implemented in a fully distributed, asynchronous manner. For cross-validation, we also perform a series of numerical experiments in medium- to large-scale MIMO networks under realistic channel conditions. Throughout our experiments, the performance of MXL0$^{+}$ matches$-$and sometimes exceeds$-$that of gradient-based MXL methods, all the while operating with a vastly reduced communication overhead. In view of these findings, the MXL0$^{+}$ algorithm appears to be uniquely suited for distributed massive MIMO systems where gradient calculations can become prohibitively expensive. △ Less

Submitted 18 October, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

Comments: Final version; to appear in IEEE Transactions on Signal Processing; 16 pages, 4 figures

ACM Class: C.2.1; G.1.6

Journal ref: IEEE Trans. Signal Process., vol. 68, pp. 6085-6100, 2020

arXiv:2003.10162 [pdf, other]

Explore Aggressively, Update Conservatively: Stochastic Extragradient Methods with Variable Stepsize Scaling

Authors: Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos

Abstract: Owing to their stability and convergence speed, extragradient methods have become a staple for solving large-scale saddle-point problems in machine learning. The basic premise of these algorithms is the use of an extrapolation step before performing an update; thanks to this exploration step, extra-gradient methods overcome many of the non-convergence issues that plague gradient descent/ascent sch… ▽ More Owing to their stability and convergence speed, extragradient methods have become a staple for solving large-scale saddle-point problems in machine learning. The basic premise of these algorithms is the use of an extrapolation step before performing an update; thanks to this exploration step, extra-gradient methods overcome many of the non-convergence issues that plague gradient descent/ascent schemes. On the other hand, as we show in this paper, running vanilla extragradient with stochastic gradients may jeopardize its convergence, even in simple bilinear models. To overcome this failure, we investigate a double stepsize extragradient algorithm where the exploration step evolves at a more aggressive time-scale compared to the update step. We show that this modification allows the method to converge even with stochastic gradients, and we derive sharp convergence rates under an error bound condition. △ Less

Submitted 5 November, 2020; v1 submitted 23 March, 2020; originally announced March 2020.

Comments: In Advances in Neural Information Processing Systems 33 (NeurIPS 2020); 29 pages, 5 figures

MSC Class: 65K15; 62L20; 90C15; 90C33

arXiv:2003.09729 [pdf, other]

A new regret analysis for Adam-type algorithms

Authors: Ahmet Alacaoglu, Yura Malitsky, Panayotis Mertikopoulos, Volkan Cevher

Abstract: In this paper, we focus on a theory-practice gap for Adam and its variants (AMSgrad, AdamNC, etc.). In practice, these algorithms are used with a constant first-order moment parameter $β_{1}$ (typically between $0.9$ and $0.99$). In theory, regret guarantees for online convex optimization require a rapidly decaying $β_{1}\to0$ schedule. We show that this is an artifact of the standard analysis and… ▽ More In this paper, we focus on a theory-practice gap for Adam and its variants (AMSgrad, AdamNC, etc.). In practice, these algorithms are used with a constant first-order moment parameter $β_{1}$ (typically between $0.9$ and $0.99$). In theory, regret guarantees for online convex optimization require a rapidly decaying $β_{1}\to0$ schedule. We show that this is an artifact of the standard analysis and propose a novel framework that allows us to derive optimal, data-dependent regret bounds with a constant $β_{1}$, without further assumptions. We also demonstrate the flexibility of our analysis on a wide range of different algorithms and settings. △ Less

Submitted 21 March, 2020; originally announced March 2020.

arXiv:2002.09806 [pdf, ps, other]

Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games

Authors: Tianyi Lin, Zhengyuan Zhou, Panayotis Mertikopoulos, Michael I. Jordan

Abstract: In this paper, we consider multi-agent learning via online gradient descent in a class of games called $λ$-cocoercive games, a fairly broad class of games that admits many Nash equilibria and that properly includes unconstrained strongly monotone games. We characterize the finite-time last-iterate convergence rate for joint OGD learning on $λ$-cocoercive games; further, building on this result, we… ▽ More In this paper, we consider multi-agent learning via online gradient descent in a class of games called $λ$-cocoercive games, a fairly broad class of games that admits many Nash equilibria and that properly includes unconstrained strongly monotone games. We characterize the finite-time last-iterate convergence rate for joint OGD learning on $λ$-cocoercive games; further, building on this result, we develop a fully adaptive OGD learning algorithm that does not require any knowledge of problem parameter (e.g. cocoercive constant $λ$) and show, via a novel double-stop** time technique, that this adaptive algorithm achieves same finite-time last-iterate convergence rate as non-adaptive counterpart. Subsequently, we extend OGD learning to the noisy gradient feedback case and establish last-iterate convergence results -- first qualitative almost sure convergence, then quantitative finite-time convergence rates -- all under non-decreasing step-sizes. To our knowledge, we provide the first set of results that fill in several gaps of the existing multi-agent online learning literature, where three aspects -- finite-time convergence rates, non-decreasing step-sizes, and fully adaptive algorithms have been unexplored before. △ Less

Submitted 17 July, 2021; v1 submitted 22 February, 2020; originally announced February 2020.

Comments: Accepted by ICML 2020; The first two authors contributed equally to this work

arXiv:2001.00468 [pdf, ps, other]

Quick or cheap? Breaking points in dynamic markets

Authors: Panayotis Mertikopoulos, Heinrich H. Nax, Bary S. R. Pradelski

Abstract: We examine two-sided markets where players arrive stochastically over time and are drawn from a continuum of types. The cost of matching a client and provider varies, so a social planner is faced with two contending objectives: a) to reduce players' waiting time before getting matched; and b) to form efficient pairs in order to reduce matching costs. We show that such markets are characterized by… ▽ More We examine two-sided markets where players arrive stochastically over time and are drawn from a continuum of types. The cost of matching a client and provider varies, so a social planner is faced with two contending objectives: a) to reduce players' waiting time before getting matched; and b) to form efficient pairs in order to reduce matching costs. We show that such markets are characterized by a quick-or-cheap dilemma: Under a large class of distributional assumptions, there is no 'free lunch', i.e., there exists no clearing schedule that is simultaneously optimal along both objectives. We further identify a unique breaking point signifying a stark reduction in matching cost contrasted by an increase in waiting time. Generalizing this model, we identify two regimes: one, where no free lunch exists; the other, where a window of opportunity opens to achieve a free lunch. Remarkably, greedy scheduling is never optimal in this setting. △ Less

Submitted 3 January, 2020; v1 submitted 2 January, 2020; originally announced January 2020.

Comments: 32 pages, 2 tables

arXiv:1908.08465 [pdf, other]

On the convergence of single-call stochastic extra-gradient methods

Authors: Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos

Abstract: Variational inequalities have recently attracted considerable interest in machine learning as a flexible paradigm for models that go beyond ordinary loss function minimization (such as generative adversarial networks and related deep learning systems). In this setting, the optimal $\mathcal{O}(1/t)$ convergence rate for solving smooth monotone variational inequalities is achieved by the Extra-Grad… ▽ More Variational inequalities have recently attracted considerable interest in machine learning as a flexible paradigm for models that go beyond ordinary loss function minimization (such as generative adversarial networks and related deep learning systems). In this setting, the optimal $\mathcal{O}(1/t)$ convergence rate for solving smooth monotone variational inequalities is achieved by the Extra-Gradient (EG) algorithm and its variants. Aiming to alleviate the cost of an extra gradient step per iteration (which can become quite substantial in deep learning applications), several algorithms have been proposed as surrogates to Extra-Gradient with a \emph{single} oracle call per iteration. In this paper, we develop a synthetic view of such algorithms, and we complement the existing literature by showing that they retain a $\mathcal{O}(1/t)$ ergodic convergence rate in smooth, deterministic problems. Subsequently, beyond the monotone deterministic case, we also show that the last iterate of single-call, \emph{stochastic} extra-gradient methods still enjoys a $\mathcal{O}(1/t)$ local convergence rate to solutions of \emph{non-monotone} variational inequalities that satisfy a second-order sufficient condition. △ Less

Submitted 11 February, 2020; v1 submitted 22 August, 2019; originally announced August 2019.

Comments: In Advances in Neural Information Processing Systems 32 (NeurIPS 2019); 24 pages, 3 figures

MSC Class: 65K15; 62L20; 90C15; 90C33

arXiv:1902.03355 [pdf, other]

Forward-backward-forward methods with variance reduction for stochastic variational inequalities

Authors: Radu Ioan Bot, Panayotis Mertikopoulos, Mathias Staudigl, Phan Tu Vuong

Abstract: We develop a new stochastic algorithm with variance reduction for solving pseudo-monotone stochastic variational inequalities. Our method builds on Tseng's forward-backward-forward (FBF) algorithm, which is known in the deterministic literature to be a valuable alternative to Korpelevich's extragradient method when solving variational inequalities over a convex and closed set governed by pseudo-mo… ▽ More We develop a new stochastic algorithm with variance reduction for solving pseudo-monotone stochastic variational inequalities. Our method builds on Tseng's forward-backward-forward (FBF) algorithm, which is known in the deterministic literature to be a valuable alternative to Korpelevich's extragradient method when solving variational inequalities over a convex and closed set governed by pseudo-monotone, Lipschitz continuous operators. The main computational advantage of Tseng's algorithm is that it relies only on a single projection step and two independent queries of a stochastic oracle. Our algorithm incorporates a variance reduction mechanism and leads to almost sure (a.s.) convergence to an optimal solution. To the best of our knowledge, this is the first stochastic look-ahead algorithm achieving this by using only a single projection at each iteration.. △ Less

Submitted 8 February, 2019; originally announced February 2019.

Comments: 34 pages, 11 figures

MSC Class: Primary 65K15; 62L20; secondary 90C15; 90C33

arXiv:1810.01925 [pdf, ps, other]

Bandit learning in concave $N$-person games

Authors: Mario Bravo, David S. Leslie, Panayotis Mertikopoulos

Abstract: This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents' most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players'… ▽ More This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents' most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players' behavior stabilizes in the long run: no-regret learning may lead to cycles, even with perfect gradient information. However, if a standard monotonicity condition is satisfied, our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability $1$. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization. △ Less

Submitted 3 October, 2018; originally announced October 2018.

Comments: 24 pages, 1 figure

MSC Class: Primary 91A10; 91A26; secondary 68Q32; 68T02

arXiv:1809.09449 [pdf, other]

doi 10.1137/18M1215682

Hessian barrier algorithms for linearly constrained optimization problems

Authors: Immanuel M. Bomze, Panayotis Mertikopoulos, Werner Schachinger, Mathias Staudigl

Abstract: In this paper, we propose an interior-point method for linearly constrained optimization problems (possibly nonconvex). The method - which we call the Hessian barrier algorithm (HBA) - combines a forward Euler discretization of Hessian Riemannian gradient flows with an Armijo backtracking step-size policy. In this way, HBA can be seen as an alternative to mirror descent (MD), and contains as speci… ▽ More In this paper, we propose an interior-point method for linearly constrained optimization problems (possibly nonconvex). The method - which we call the Hessian barrier algorithm (HBA) - combines a forward Euler discretization of Hessian Riemannian gradient flows with an Armijo backtracking step-size policy. In this way, HBA can be seen as an alternative to mirror descent (MD), and contains as special cases the affine scaling algorithm, regularized Newton processes, and several other iterative solution methods. Our main result is that, modulo a non-degeneracy condition, the algorithm converges to the problem's set of critical points; hence, in the convex case, the algorithm converges globally to the problem's minimum set. In the case of linearly constrained quadratic programs (not necessarily convex), we also show that the method's convergence rate is $\mathcal{O}(1/k^ρ)$ for some $ρ\in(0,1]$ that depends only on the choice of kernel function (i.e., not on the problem's primitives). These theoretical results are validated by numerical experiments in standard non-convex test functions and large-scale traffic assignment problems. △ Less

Submitted 8 May, 2019; v1 submitted 25 September, 2018; originally announced September 2018.

Comments: 27 pages, 6 figures

MSC Class: Primary: 90C51; 90C30; secondary: 90C25; 90C26

Journal ref: SIAM Journal on Optimization 29 (2019), 2100-2127

arXiv:1809.03066 [pdf, ps, other]

doi 10.1287/moor.2022.1283

Multi-agent online learning in time-varying games

Authors: Benoit Duvocelle, Panayotis Mertikopoulos, Mathias Staudigl, Dries Vermeulen

Abstract: We examine the long-run behavior of multi-agent online learning in games that evolve over time. Specifically, we focus on a wide class of policies based on mirror descent, and we show that the induced sequence of play (a) converges to Nash equilibrium in time-varying games that stabilize in the long run to a strictly monotone limit; and (b) it stays asymptotically close to the evolving equilibrium… ▽ More We examine the long-run behavior of multi-agent online learning in games that evolve over time. Specifically, we focus on a wide class of policies based on mirror descent, and we show that the induced sequence of play (a) converges to Nash equilibrium in time-varying games that stabilize in the long run to a strictly monotone limit; and (b) it stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone). Our results apply to both gradient-based and payoff-based feedback - i.e., the "bandit feedback" case where players only get to observe the payoffs of their chosen actions. △ Less

Submitted 4 September, 2021; v1 submitted 9 September, 2018; originally announced September 2018.

Comments: 35 pages

MSC Class: Primary 91A10; 91A26; secondary 68Q32; 68T02

Journal ref: Mathematics of Operations Research, 2022

Showing 1–50 of 84 results for author: Mertikopoulos, P