-
Policy Space Response Oracles: A Survey
Authors:
Ariyan Bighashdel,
Yongzhao Wang,
Stephen McAleer,
Rahul Savani,
Frans A. Oliehoek
Abstract:
Game theory provides a mathematical way to study the interaction between multiple decision makers. However, classical game-theoretic analysis is limited in scalability due to the large number of strategies, precluding direct application to more complex scenarios. This survey provides a comprehensive overview of a framework for large games, known as Policy Space Response Oracles (PSRO), which holds promise to improve scalability by focusing attention on sufficient subsets of strategies. We first motivate PSRO and provide historical context. We then focus on the strategy exploration problem for PSRO: the challenge of assembling effective subsets of strategies that still represent the original game well with minimum computational cost. We survey current research directions for enhancing the efficiency of PSRO, and explore the applications of PSRO across various domains. We conclude by discussing open questions and future research.
Submitted 27 May, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Two Choices are Enough for P-LCPs, USOs, and Colorful Tangents
Authors:
Michaela Borzechowski,
John Fearnley,
Spencer Gordon,
Rahul Savani,
Patrick Schnider,
Simon Weber
Abstract:
We provide polynomial-time reductions between three search problems from three distinct areas: the P-matrix linear complementarity problem (P-LCP), finding the sink of a unique sink orientation (USO), and a variant of the $α$-Ham Sandwich problem. For all three settings, we show that "two choices are enough", meaning that the general non-binary version of the problem can be reduced in polynomial time to the binary version. This specifically means that generalized P-LCPs are equivalent to P-LCPs, and grid USOs are equivalent to cube USOs. These results are obtained by showing that both the P-LCP and our $α$-Ham Sandwich variant are equivalent to a new problem we introduce, P-Lin-Bellman. This problem can be seen as a new tool for formulating problems as P-LCPs.
Submitted 21 May, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
The Complexity of Computing KKT Solutions of Quadratic Programs
Authors:
John Fearnley,
Paul W. Goldberg,
Alexandros Hollender,
Rahul Savani
Abstract:
It is well known that solving a (non-convex) quadratic program is NP-hard. We show that the problem remains hard even if we are only looking for a Karush-Kuhn-Tucker (KKT) point, instead of a global optimum. Namely, we prove that computing a KKT point of a quadratic polynomial over the domain $[0,1]^n$ is complete for the class CLS = PPAD$\cap$PLS.
Submitted 22 November, 2023;
originally announced November 2023.
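As an illustration of the object studied in the entry above, the sketch below checks the KKT conditions for a quadratic over the box $[0,1]^n$. It assumes a minimisation convention, a symmetric matrix `Q`, and the form f(x) = 0.5 xᵀQx + cᵀx; the function name `is_kkt_point` and the tolerance are hypothetical and not taken from the paper. It only verifies a candidate point and says nothing about how hard such a point is to find.

```python
def is_kkt_point(Q, c, x, tol=1e-9):
    """Check the KKT conditions for minimising
    f(x) = 0.5 * x^T Q x + c^T x over the box [0, 1]^n (Q assumed symmetric).

    At a KKT point, each partial derivative must be:
      >= 0 where x_i sits on the lower bound 0,
      <= 0 where x_i sits on the upper bound 1,
      == 0 where x_i is strictly inside (0, 1).
    """
    n = len(x)
    for i in range(n):
        # i-th component of the gradient Qx + c
        grad_i = sum(Q[i][j] * x[j] for j in range(n)) + c[i]
        if x[i] <= tol:
            if grad_i < -tol:
                return False
        elif x[i] >= 1 - tol:
            if grad_i > tol:
                return False
        elif abs(grad_i) > tol:
            return False
    return True


# Non-convex example: f(x) = -(x1^2 + x2^2) + x1 + x2
Q = [[-2.0, 0.0], [0.0, -2.0]]
c = [1.0, 1.0]
print(is_kkt_point(Q, c, [0.0, 0.0]))   # corner of the box
print(is_kkt_point(Q, c, [0.25, 0.0]))  # interior coordinate with non-zero gradient
```

Note that a KKT point need not be a minimum: in this example the stationary point (0.5, 0.5) passes the check even though it is a local maximum.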
-
Conditional Generators for Limit Order Book Environments: Explainability, Challenges, and Robustness
Authors:
Andrea Coletta,
Joseph Jerome,
Rahul Savani,
Svitlana Vyetrenko
Abstract:
Limit order books are a fundamental and widespread market mechanism. This paper investigates the use of conditional generative models for order book simulation. For developing a trading agent, this approach has drawn recent attention as an alternative to traditional backtesting due to its ability to react to the presence of the trading agent. Using a state-of-the-art CGAN (from Coletta et al. (2022)), we explore its dependence upon input features, which highlights both strengths and weaknesses. To do this, we use "adversarial attacks" on the model's features and its mechanism. We then show how these insights can be used to improve the CGAN, both in terms of its realism and robustness. We finish by laying out a roadmap for future work.
Submitted 22 June, 2023;
originally announced June 2023.
-
Ordinal Potential-based Player Rating
Authors:
Nelson Vadori,
Rahul Savani
Abstract:
It was recently observed that Elo ratings fail at preserving transitive relations among strategies and therefore cannot correctly extract the transitive component of a game. We provide a characterization of transitive games as a weak variant of ordinal potential games and show that Elo ratings actually do preserve transitivity when computed in the right space, using suitable invertible mappings. Leveraging this insight, we introduce a new game decomposition of an arbitrary game into transitive and cyclic components that is learnt using a neural network-based architecture and that prioritises capturing the sign pattern of the game, namely transitive and cyclic relations among strategies. We link our approach to the known concept of sign-rank, and evaluate our methodology using both toy examples and empirical data from real-world games.
Submitted 6 March, 2024; v1 submitted 8 June, 2023;
originally announced June 2023.
-
First Order Methods for Geometric Optimization of Crystal Structures
Authors:
Antonia Tsili,
Matthew Dyer,
Vladimir Gusev,
Piotr Krysta,
Rahul Savani
Abstract:
The geometric optimization of crystal structures is a procedure widely used in Chemistry that changes the geometrical placement of the particles inside a structure. It is called structural relaxation and constitutes a local minimization problem with a non-convex objective function whose domain complexity increases according to the number of particles involved. In this work we study the performance of the two most popular first order optimization methods in structural relaxation. Although frequently employed, there is a lack of their study in this context from an algorithmic point of view. We run each algorithm in combination with a constant step size, which provides a benchmark for the methods' analysis and direct comparison. We also design dynamic step size rules and study how these improve the two algorithms' performance. Our results show that there is a trade-off between convergence rate and the possibility of an experiment to succeed, hence we construct a function to assign utility to each method based on our respective preference. The function is built according to a recently introduced model of preference indication concerning algorithms with deadline and their run time. Finally, building on all our insights from the experimental results, we provide algorithmic recipes that best correspond to each of the presented preferences and select one recipe as the optimal for equally weighted preferences.
Alongside our results we present our open source Python software veltiCRYS, which was used to perform the geometric optimization experiments. Our implementation can be easily edited to accommodate other energy functions and is especially targeted at testing different methods in structural relaxation.
Submitted 22 May, 2023; v1 submitted 26 January, 2023;
originally announced January 2023.
-
Model-based gym environments for limit order book trading
Authors:
Joseph Jerome,
Leandro Sanchez-Betancourt,
Rahul Savani,
Martin Herdegen
Abstract:
Within the mathematical finance literature there is a rich catalogue of mathematical models for studying algorithmic trading problems -- such as market-making and optimal execution -- in limit order books. This paper introduces \mbtgym, a Python module that provides a suite of gym environments for training reinforcement learning (RL) agents to solve such model-based trading problems. The module is set up in an extensible way to allow the combination of different aspects of different models. It supports highly efficient implementations of vectorized environments to allow faster training of RL agents. In this paper, we motivate the challenge of using RL to solve such model-based limit order book problems in mathematical finance, we explain the design of our gym environment, and then demonstrate its use in solving standard and non-standard problems from the literature. Finally, we lay out a roadmap for further development of our module, which we provide as an open source repository on GitHub so that it can serve as a focal point for RL research in model-based algorithmic trading.
Submitted 16 September, 2022;
originally announced September 2022.
-
Market Making with Scaled Beta Policies
Authors:
Joseph Jerome,
Gregory Palmer,
Rahul Savani
Abstract:
This paper introduces a new representation for the actions of a market maker in an order-driven market. This representation uses scaled beta distributions, and generalises three approaches taken in the artificial intelligence for market making literature: single price-level selection, ladder strategies and "market making at the touch". Ladder strategies place uniform volume across an interval of contiguous prices. Scaled beta distribution based policies generalise these, allowing volume to be skewed across the price interval. We demonstrate that this flexibility is useful for inventory management, one of the key challenges faced by a market maker.
In this paper, we conduct three main experiments: first, we compare our more flexible beta-based actions with the special case of ladder strategies; then, we investigate the performance of simple fixed distributions; and finally, we devise and evaluate a simple and intuitive dynamic control policy that adjusts actions in a continuous manner depending on the signed inventory that the market maker has acquired. All empirical evaluations use a high-fidelity limit order book simulator based on historical data with 50 levels on each side.
Submitted 27 September, 2022; v1 submitted 7 July, 2022;
originally announced July 2022.
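To make the idea in the entry above concrete, here is a minimal sketch of how a beta distribution could spread a market maker's volume across an interval of contiguous price levels. The discretisation (evaluating the density at the midpoint of each level and normalising) and the names `beta_pdf` and `beta_volume_profile` are assumptions for illustration; the abstract does not specify how the paper maps the distribution to discrete levels.

```python
import math


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x in (0, 1)."""
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / norm


def beta_volume_profile(total_volume, n_levels, a, b):
    """Split total_volume over n_levels contiguous price levels.

    Assumption: each level is weighted by the beta density at the midpoint
    of its slice of [0, 1], and the weights are normalised to sum to one.
    """
    mids = [(k + 0.5) / n_levels for k in range(n_levels)]
    weights = [beta_pdf(m, a, b) for m in mids]
    total_weight = sum(weights)
    return [total_volume * w / total_weight for w in weights]


# a = b = 1 gives uniform volume across the interval (a ladder strategy).
print(beta_volume_profile(100, 5, 1, 1))
# a = 1, b = 3 skews volume towards the first levels of the interval.
print(beta_volume_profile(100, 5, 1, 3))
```

Under this sketch the ladder strategy falls out as the special case a = b = 1, which matches the abstract's claim that beta-based policies generalise ladders by allowing skew.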
-
Trading via Selective Classification
Authors:
Nestoras Chalkidis,
Rahul Savani
Abstract:
A binary classifier that tries to predict if the price of an asset will increase or decrease naturally gives rise to a trading strategy that follows the prediction and thus always has a position in the market. Selective classification extends a binary or many-class classifier to allow it to abstain from making a prediction for certain inputs, thereby allowing a trade-off between the accuracy of the resulting selective classifier against coverage of the input feature space. Selective classifiers give rise to trading strategies that do not take a trading position when the classifier abstains. We investigate the application of binary and ternary selective classification to trading strategy design. For ternary classification, in addition to classes for the price going up or down, we include a third class that corresponds to relatively small price moves in either direction, and gives the classifier another way to avoid making a directional prediction. We use a walk-forward train-validate-test approach to evaluate and compare binary and ternary, selective and non-selective classifiers across several different feature sets based on four classification approaches: logistic regression, random forests, feed-forward, and recurrent neural networks. We then turn these classifiers into trading strategies for which we perform backtests on commodity futures markets. Our empirical results demonstrate the potential of selective classification for trading.
Submitted 31 October, 2021; v1 submitted 28 October, 2021;
originally announced October 2021.
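The link between selective classification and trading described in the entry above can be sketched as follows. The abstention rule used here (abstain when the larger class probability falls below a threshold) is one common choice and is an assumption, as are the function names; the abstract does not say which selection mechanism the paper uses.

```python
def selective_positions(probs_up, threshold):
    """Map predicted P(price goes up) to positions.

    Returns +1 (long), -1 (short), or 0 (no position) when the classifier
    abstains because its confidence is below the threshold.
    """
    positions = []
    for p in probs_up:
        confidence = max(p, 1 - p)
        if confidence < threshold:
            positions.append(0)
        else:
            positions.append(1 if p >= 0.5 else -1)
    return positions


def coverage_and_accuracy(positions, moves):
    """Coverage is the fraction of inputs with a prediction; accuracy is
    measured only on those inputs. moves holds +1 / -1 realised directions."""
    taken = [(pos, mv) for pos, mv in zip(positions, moves) if pos != 0]
    coverage = len(taken) / len(positions)
    accuracy = sum(pos == mv for pos, mv in taken) / len(taken) if taken else float("nan")
    return coverage, accuracy


positions = selective_positions([0.9, 0.55, 0.2, 0.48], threshold=0.6)
print(positions)
print(coverage_and_accuracy(positions, [1, -1, -1, 1]))
```

Raising the threshold lowers coverage and, if confidence is informative, raises accuracy on the trades that are taken, which is the trade-off the abstract describes.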
-
Consensus Multiplicative Weights Update: Learning to Learn using Projector-based Game Signatures
Authors:
Nelson Vadori,
Rahul Savani,
Thomas Spooner,
Sumitra Ganesh
Abstract:
Cheung and Piliouras (2020) recently showed that two variants of the Multiplicative Weights Update method - OMWU and MWU - display opposite convergence properties depending on whether the game is zero-sum or cooperative. Inspired by this work and the recent literature on learning to optimize for single functions, we introduce a new framework for learning last-iterate convergence to Nash Equilibria in games, where the update rule's coefficients (learning rates) along a trajectory are learnt by a reinforcement learning policy that is conditioned on the nature of the game: the game signature. We construct the latter using a new decomposition of two-player games into eight components corresponding to commutative projection operators, generalizing and unifying recent game concepts studied in the literature. We compare the performance of various update rules when their coefficients are learnt, and show that the RL policy is able to exploit the game signature across a wide range of game types. In doing so, we introduce CMWU, a new algorithm that extends consensus optimization to the constrained case, has local convergence guarantees for zero-sum bimatrix games, and show that it enjoys competitive performance on both zero-sum games with constant coefficients and across a spectrum of games when its coefficients are learnt.
Submitted 11 June, 2022; v1 submitted 4 June, 2021;
originally announced June 2021.
-
Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent
Authors:
Ian Gemp,
Rahul Savani,
Marc Lanctot,
Yoram Bachrach,
Thomas Anthony,
Richard Everett,
Andrea Tacchetti,
Tom Eccles,
János Kramár
Abstract:
Nash equilibrium is a central concept in game theory. Several Nash solvers exist, yet none scale to normal-form games with many actions and many players, especially those with payoff tensors too big to be stored in memory. In this work, we propose an approach that iteratively improves an approximation to a Nash equilibrium through joint play. It accomplishes this by tracing a previously established homotopy that defines a continuum of equilibria for the game regularized with decaying levels of entropy. This continuum asymptotically approaches the limiting logit equilibrium, proven by McKelvey and Palfrey (1995) to be unique in almost all games, thereby partially circumventing the well-known equilibrium selection problem of many-player games. To encourage iterates to remain near this path, we efficiently minimize average deviation incentive via stochastic gradient descent, intelligently sampling entries in the payoff tensor as needed. Monte Carlo estimates of the stochastic gradient from joint play are biased due to the appearance of a nonlinear max operator in the objective, so we introduce additional innovations to the algorithm to alleviate gradient bias. The descent process can also be viewed as repeatedly constructing and reacting to a polymatrix approximation to the game. In these ways, our proposed approach, average deviation incentive descent with adaptive sampling (ADIDAS), is most similar to three classical approaches, namely homotopy-type, Lyapunov, and iterative polymatrix solvers. The lack of local convergence guarantees for biased gradient descent prevents guaranteed convergence to Nash; however, we demonstrate through extensive experiments the ability of this approach to approximate a unique Nash in normal-form games with as many as seven players and twenty-one actions (several billion outcomes) that are orders of magnitude larger than those possible with prior algorithms.
Submitted 4 February, 2022; v1 submitted 2 June, 2021;
originally announced June 2021.
-
Difference Rewards Policy Gradients
Authors:
Jacopo Castellini,
Sam Devlin,
Frans A. Oliehoek,
Rahul Savani
Abstract:
Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent's contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies when the reward function is known. By differencing the reward function directly, Dr.Reinforce avoids difficulties associated with learning the Q-function as done by Counterfactual Multiagent Policy Gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.Reinforce that learns an additional reward network that is used to estimate the difference rewards.
Submitted 9 November, 2023; v1 submitted 21 December, 2020;
originally announced December 2020.
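The notion of a difference reward mentioned in the entry above can be illustrated with a small sketch. This uses the generic "default action" form, in which one agent's action is swapped for a fixed default and the change in the shared reward is attributed to that agent. The function name, the default-action choice, and the toy reward are all assumptions; the abstract does not state which counterfactual Dr.Reinforce uses, so this should not be read as the paper's exact formulation.

```python
def difference_rewards(reward_fn, state, joint_action, default_action):
    """Per-agent difference rewards for a known shared reward function.

    For each agent i, compare the shared reward of the joint action with the
    reward obtained when only agent i's action is replaced by default_action.
    """
    base = reward_fn(state, joint_action)
    diffs = []
    for i in range(len(joint_action)):
        counterfactual = list(joint_action)
        counterfactual[i] = default_action
        diffs.append(base - reward_fn(state, tuple(counterfactual)))
    return diffs


def coverage_reward(state, actions):
    """Toy shared reward: number of distinct targets covered (None = idle)."""
    return len({a for a in actions if a is not None})


# Agents 0 and 1 both cover target 0, so neither is individually needed;
# agent 2 is the only one covering target 1.
print(difference_rewards(coverage_reward, None, (0, 0, 1), default_action=None))
```

The shared reward here is 2 for every agent, whereas the difference rewards single out agent 2 as the only one whose action changes the outcome, which is the credit-assignment signal the abstract refers to.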
-
The Complexity of Gradient Descent: CLS = PPAD $\cap$ PLS
Authors:
John Fearnley,
Paul W. Goldberg,
Alexandros Hollender,
Rahul Savani
Abstract:
We study search problems that can be solved by performing Gradient Descent on a bounded convex polytopal domain and show that this class is equal to the intersection of two well-known classes: PPAD and PLS. As our main underlying technical contribution, we show that computing a Karush-Kuhn-Tucker (KKT) point of a continuously differentiable function over the domain $[0,1]^2$ is PPAD $\cap$ PLS-complete. This is the first non-artificial problem to be shown complete for this class. Our results also imply that the class CLS (Continuous Local Search) - which was defined by Daskalakis and Papadimitriou as a more "natural" counterpart to PPAD $\cap$ PLS and contains many interesting problems - is itself equal to PPAD $\cap$ PLS.
Submitted 3 March, 2023; v1 submitted 3 November, 2020;
originally announced November 2020.
-
A faster algorithm for finding Tarski fixed points
Authors:
John Fearnley,
Dömötör Pálvölgyi,
Rahul Savani
Abstract:
Dang et al. have given an algorithm that can find a Tarski fixed point in a $k$-dimensional lattice of width $n$ using $O(\log^{k} n)$ queries. Multiple authors have conjectured that this algorithm is optimal [Dang et al., Etessami et al.], and indeed this has been proven for two-dimensional instances [Etessami et al.]. We show that these conjectures are false in dimension three or higher by giving an $O(\log^2 n)$ query algorithm for the three-dimensional Tarski problem. We also give a new decomposition theorem for $k$-dimensional Tarski problems which, in combination with our new algorithm for three dimensions, gives an $O(\log^{2 \lceil k/3 \rceil} n)$ query algorithm for the $k$-dimensional problem.
Submitted 20 March, 2021; v1 submitted 6 October, 2020;
originally announced October 2020.
-
A deep learning approach to identify unhealthy advertisements in street view images
Authors:
Gregory Palmer,
Mark Green,
Emma Boyland,
Yales Stefano Rios Vasconcelos,
Rahul Savani,
Alex Singleton
Abstract:
While outdoor advertisements are common features within towns and cities, they may reinforce social inequalities in health. Vulnerable populations in deprived areas may have greater exposure to fast food, gambling and alcohol advertisements encouraging their consumption. Understanding who is exposed and evaluating potential policy restrictions requires a substantial manual data collection effort. To address this problem, we develop a deep learning workflow to automatically extract and classify unhealthy advertisements from street-level images. We introduce the Liverpool 360 Street View (LIV360SV) dataset for evaluating our workflow. The dataset contains 25,349 360-degree street-level images collected via cycling with a GoPro Fusion camera, recorded Jan 14th - 18th 2020. 10,106 advertisements were identified and classified as food (1335), alcohol (217), gambling (149) and other (8405) (e.g., cars and broadband). We find evidence of social inequalities with a larger proportion of food advertisements located within deprived areas and those frequented by students. Our project presents a novel implementation for the incidental classification of street view images for identifying unhealthy advertisements, providing a means through which to identify areas that can benefit from tougher advertisement restriction policies for tackling social inequalities.
Submitted 7 February, 2021; v1 submitted 9 July, 2020;
originally announced July 2020.
-
A Natural Actor-Critic Algorithm with Downside Risk Constraints
Authors:
Thomas Spooner,
Rahul Savani
Abstract:
Existing work on risk-sensitive reinforcement learning - both for symmetric and downside risk measures - has typically used direct Monte-Carlo estimation of policy gradients. While this approach yields unbiased gradient estimates, it also suffers from high variance and decreased sample efficiency compared to temporal-difference methods. In this paper, we study prediction and control with aversion to downside risk which we gauge by the lower partial moment of the return. We introduce a new Bellman equation that upper bounds the lower partial moment, circumventing its non-linearity. We prove that this proxy for the lower partial moment is a contraction, and provide intuition into the stability of the algorithm by variance decomposition. This allows sample-efficient, on-line estimation of partial moments. For risk-sensitive control, we instantiate Reward Constrained Policy Optimization, a recent actor-critic method for finding constrained policies, with our proxy for the lower partial moment. We extend the method to use natural policy gradients and demonstrate the effectiveness of our approach on three benchmark problems for risk-sensitive reinforcement learning.
Submitted 8 July, 2020;
originally announced July 2020.
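For readers unfamiliar with the risk measure named in the entry above, the sketch below computes a plain empirical lower partial moment of a sample of returns. It is the direct Monte-Carlo style estimate that the abstract contrasts with its approach, not the Bellman-equation proxy the paper introduces; the function name and default arguments are hypothetical.

```python
def lower_partial_moment(returns, target=0.0, order=2):
    """Empirical lower partial moment: the mean of max(target - r, 0) ** order.

    Only returns that fall below the target contribute, so the measure
    penalises downside deviations and ignores upside ones.
    """
    return sum(max(target - r, 0.0) ** order for r in returns) / len(returns)


sample_returns = [-0.1, 0.2, -0.3, 0.05]
print(lower_partial_moment(sample_returns, target=0.0, order=2))
print(lower_partial_moment(sample_returns, target=0.0, order=1))
```

The max operator is the source of the non-linearity that the abstract says the new Bellman equation circumvents by bounding the moment from above.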
-
Robust Market Making via Adversarial Reinforcement Learning
Authors:
Thomas Spooner,
Rahul Savani
Abstract:
We show that adversarial reinforcement learning (ARL) can be used to produce market making agents that are robust to adversarial and adaptively-chosen market conditions. To apply ARL, we turn the well-studied single-agent model of Avellaneda and Stoikov [2008] into a discrete-time zero-sum game between a market maker and adversary. The adversary acts as a proxy for other market participants that would like to profit at the market maker's expense. We empirically compare two conventional single-agent RL agents with ARL, and show that our ARL approach leads to: 1) the emergence of risk-averse behaviour without constraints or domain-specific penalties; 2) significant improvements in performance across a set of standard metrics, evaluated with or without an adversary in the test environment; and 3) improved robustness to model uncertainty. We empirically demonstrate that our ARL method consistently converges, and we prove for several special cases that the profiles that we converge to correspond to Nash equilibria in a simplified single-stage game.
Submitted 8 July, 2020; v1 submitted 3 March, 2020;
originally announced March 2020.
-
Tree Polymatrix Games are PPAD-hard
Authors:
Argyrios Deligkas,
John Fearnley,
Rahul Savani
Abstract:
We prove that it is PPAD-hard to compute a Nash equilibrium in a tree polymatrix game with twenty actions per player. This is the first PPAD hardness result for a game with a constant number of actions per player where the interaction graph is acyclic. Along the way we show PPAD-hardness for finding an $ε$-fixed point of a 2D LinearFIXP instance, when $ε$ is any constant less than $(\sqrt{2} - 1)/2 \approx 0.2071$. This lifts the hardness regime from polynomially small approximations in $k$-dimensions to constant approximations in two-dimensions, and our constant is substantial when compared to the trivial upper bound of $0.5$.
Submitted 27 February, 2020;
originally announced February 2020.
-
The Automated Inspection of Opaque Liquid Vaccines
Authors:
Gregory Palmer,
Benjamin Schnieders,
Rahul Savani,
Karl Tuyls,
Joscha-David Fossel,
Harry Flore
Abstract:
In the pharmaceutical industry the screening of opaque vaccines containing suspensions is currently a manual task carried out by trained human visual inspectors. We show that deep learning can be used to effectively automate this process. A moving contrast is required to distinguish anomalies from other particles, reflections and dust resting on a vial's surface. We train 3D-ConvNets to predict the likelihood of 20-frame video samples containing anomalies. Our unaugmented dataset consists of hand-labelled samples, recorded using vials provided by the HAL Allergy Group, a pharmaceutical company. We trained ten randomly initialized 3D-ConvNets to provide a benchmark, observing mean AUROC scores of 0.94 and 0.93 for positive samples (containing anomalies) and negative (anomaly-free) samples, respectively. Using Frame-Completion Generative Adversarial Networks we: (i) introduce an algorithm for computing saliency maps, which we use to verify that the 3D-ConvNets are indeed identifying anomalies; (ii) propose a novel self-training approach using the saliency maps to determine if multiple networks agree on the location of anomalies. Our self-training approach allows us to augment our data set by labelling 217,888 additional samples. 3D-ConvNets trained with our augmented dataset improve on the results we get when we train only on the unaugmented dataset.
Submitted 21 February, 2020;
originally announced February 2020.
-
One-Clock Priced Timed Games are PSPACE-hard
Authors:
John Fearnley,
Rasmus Ibsen-Jensen,
Rahul Savani
Abstract:
The main result of this paper is that computing the value of a one-clock priced timed game (OCPTG) is PSPACE-hard. Along the way, we provide a family of OCPTGs that have an exponential number of event points. Both results hold even in very restricted classes of games such as DAGs with treewidth three. Finally, we provide a number of positive results, including polynomial-time algorithms for even more restricted classes of OCPTGs such as trees.
Submitted 4 March, 2020; v1 submitted 13 January, 2020;
originally announced January 2020.
-
Evolving Indoor Navigational Strategies Using Gated Recurrent Units In NEAT
Authors:
James Butterworth,
Rahul Savani,
Karl Tuyls
Abstract:
Simultaneous Localisation and Mapping (SLAM) algorithms are expensive to run on smaller robotic platforms such as Micro-Aerial Vehicles. Bug algorithms are an alternative that use relatively little processing power, and avoid high memory consumption by not building an explicit map of the environment. Bug algorithms achieve relatively good performance in simulated and robotic maze solving domains. However, because they are hand-designed, a natural question is whether they are globally optimal control policies. In this work we explore the performance of Neuroevolution - specifically NEAT - at evolving control policies for simulated differential drive robots carrying out generalised maze navigation. We extend NEAT to include Gated Recurrent Units (GRUs) to help deal with long-term dependencies. We show that both NEAT and our NEAT-GRU can repeatably generate controllers that outperform I-Bug (an algorithm particularly well-suited for use in real robots) on a test set of 209 indoor maze-like environments. We show that NEAT-GRU is superior to NEAT in this task but also that out of the 2 systems, only NEAT-GRU can continuously evolve successful controllers for a much harder task in which no bearing information about the target is provided to the agent.
Submitted 12 April, 2019;
originally announced April 2019.
-
Analysing Factorizations of Action-Value Networks for Cooperative Multi-Agent Reinforcement Learning
Authors:
Jacopo Castellini,
Frans A. Oliehoek,
Rahul Savani,
Shimon Whiteson
Abstract:
Recent years have seen the application of deep reinforcement learning techniques to cooperative multi-agent systems, with great empirical success. However, given the lack of theoretical insight, it remains unclear what the employed neural networks are learning, or how we should enhance their learning power to address the problems on which they fail. In this work, we empirically investigate the learning power of various network architectures on a series of one-shot games. Despite their simplicity, these games capture many of the crucial problems that arise in the multi-agent setting, such as an exponential number of joint actions or the lack of an explicit coordination mechanism. Our results extend those in [4] and quantify how well various approaches can represent the requisite value functions, and help us identify the reasons that can impede good performance, like sparsity of the values or too tight coordination requirements.
Submitted 9 November, 2023; v1 submitted 20 February, 2019;
originally announced February 2019.
-
Unique End of Potential Line
Authors:
John Fearnley,
Spencer Gordon,
Ruta Mehta,
Rahul Savani
Abstract:
This paper studies the complexity of problems in PPAD $\cap$ PLS that have unique solutions. Three well-known examples of such problems are the problem of finding a fixpoint of a contraction map, finding the unique sink of a Unique Sink Orientation (USO), and solving the P-matrix Linear Complementarity Problem (P-LCP). Each of these is a promise problem, and when the promise holds, they always possess unique solutions.
We define the complexity class UEOPL to capture problems of this type. We first define a class that we call EOPL, which consists of all problems that can be reduced to End-of-Potential-Line. This problem merges the canonical PPAD-complete problem End-of-Line, with the canonical PLS-complete problem Sink-of-Dag, and so EOPL captures problems that can be solved by a line-following algorithm that also simultaneously decreases a potential function.
Promise-UEOPL is a promise-subclass of EOPL in which the line in the End-of-Potential-Line instance is guaranteed to be unique via a promise. We turn this into a non-promise class UEOPL, by adding an extra solution type to EOPL that captures any pair of points that are provably on two different lines.
We show that UEOPL $\subseteq$ EOPL $\subseteq$ CLS, and that all of our motivating problems are contained in UEOPL: specifically USO, P-LCP, and finding a fixpoint of a Piecewise-Linear Contraction under an $\ell_p$-norm all lie in UEOPL. Our results also imply that parity games, mean-payoff games, discounted games, and simple-stochastic games lie in UEOPL.
All of our containment results are proved via a reduction to a problem that we call One-Permutation Discrete Contraction (OPDC). This problem is motivated by a discretized version of contraction, but it is also closely related to the USO problem. We show that OPDC lies in UEOPL, and we are also able to show that OPDC is UEOPL-complete.
Submitted 9 November, 2018;
originally announced November 2018.
-
Negative Update Intervals in Deep Multi-Agent Reinforcement Learning
Authors:
Gregory Palmer,
Rahul Savani,
Karl Tuyls
Abstract:
In Multi-Agent Reinforcement Learning (MA-RL), independent cooperative learners must overcome a number of pathologies to learn optimal joint policies. Addressing one pathology often leaves approaches vulnerable towards others. For instance, hysteretic Q-learning addresses miscoordination while leaving agents vulnerable towards misleading stochastic rewards. Other methods, such as leniency, have proven more robust when dealing with multiple pathologies simultaneously. However, leniency has predominately been studied within the context of strategic form games (bimatrix games) and fully observable Markov games consisting of a small number of probabilistic state transitions. This raises the question of whether these findings scale to more complex domains. For this purpose we implement a temporally extended version of the Climb Game, within which agents must overcome multiple pathologies simultaneously, including relative overgeneralisation, stochasticity, the alter-exploration and moving target problems, while learning from a large observation space. We find that existing lenient and hysteretic approaches fail to consistently learn near optimal joint-policies in this environment. To address these pathologies we introduce Negative Update Intervals-DDQN (NUI-DDQN), a Deep MA-RL algorithm which discards episodes yielding cumulative rewards outside the range of expanding intervals. NUI-DDQN consistently gravitates towards optimal joint-policies in our environment, overcoming the outlined pathologies.
Submitted 7 May, 2019; v1 submitted 13 September, 2018;
originally announced September 2018.
-
Beyond Local Nash Equilibria for Adversarial Networks
Authors:
Frans A. Oliehoek,
Rahul Savani,
Jose Gallego,
Elise van der Pol,
Roderich Groß
Abstract:
Save for some special cases, current training methods for Generative Adversarial Networks (GANs) are at best guaranteed to converge to a `local Nash equilibrium` (LNE). Such LNEs, however, can be arbitrarily far from an actual Nash equilibrium (NE), which implies that there are no guarantees on the quality of the found generator or classifier. This paper proposes to model GANs explicitly as finite games in mixed strategies, thereby ensuring that every LNE is an NE. With this formulation, we propose a solution method that is proven to monotonically converge to a resource-bounded Nash equilibrium (RB-NE): by increasing computational resources we can find better solutions. We empirically demonstrate that our method is less prone to typical GAN problems such as mode collapse, and produces solutions that are less exploitable than those produced by GANs and MGANs, and closely resemble theoretical predictions about NEs.
Submitted 26 July, 2018; v1 submitted 18 June, 2018;
originally announced June 2018.
-
Market Making via Reinforcement Learning
Authors:
Thomas Spooner,
John Fearnley,
Rahul Savani,
Andreas Koukorinis
Abstract:
Market making is a fundamental trading problem in which an agent provides liquidity by continually offering to buy and sell a security. The problem is challenging due to inventory risk, the risk of accumulating an unfavourable position and ultimately losing money. In this paper, we develop a high-fidelity simulation of limit order book markets, and use it to design a market making agent using temporal-difference reinforcement learning. We use a linear combination of tile codings as a value function approximator, and design a custom reward function that controls inventory risk. We demonstrate the effectiveness of our approach by showing that our agent outperforms both simple benchmark strategies and a recent online learning approach from the literature.
Submitted 11 April, 2018;
originally announced April 2018.
-
End of Potential Line
Authors:
John Fearnley,
Spencer Gordon,
Ruta Mehta,
Rahul Savani
Abstract:
We introduce the problem EndOfPotentialLine and the corresponding complexity class EOPL of all problems that can be reduced to it in polynomial time. This class captures problems that admit a single combinatorial proof of their joint membership in the complexity classes PPAD of fixpoint problems and PLS of local search problems. EOPL is a combinatorially-defined alternative to the class CLS (for Continuous Local Search), which was introduced with the goal of capturing the complexity of some well-known problems in PPAD $\cap$ PLS that have resisted, in some cases for decades, attempts to put them in polynomial time. Two of these are Contraction, the problem of finding a fixpoint of a contraction map, and P-LCP, the problem of solving a P-matrix Linear Complementarity Problem.
We show that EndOfPotentialLine is in CLS via a two-way reduction to EndOfMeteredLine. The latter was defined to show query and cryptographic lower bounds for CLS. Our two main results are to show that both PL-Contraction (Piecewise-Linear Contraction) and P-LCP are in EOPL. Our reductions imply that the promise versions of PL-Contraction and P-LCP are in the promise class UniqueEOPL, which corresponds to the case of a single potential line. This also shows that simple-stochastic, discounted, mean-payoff, and parity games are in EOPL.
Using the insights from our reduction for PL-Contraction, we obtain the first polynomial-time algorithms for finding fixed points of contraction maps in fixed dimension for any $\ell_p$ norm, where previously such algorithms were only known for the $\ell_2$ and $\ell_\infty$ norms. Our reduction from P-LCP to EndOfPotentialLine allows a technique of Aldous to be applied, which in turn gives the fastest-known randomized algorithm for the P-LCP.
Submitted 18 April, 2018; v1 submitted 10 April, 2018;
originally announced April 2018.
-
GANGs: Generative Adversarial Network Games
Authors:
Frans A. Oliehoek,
Rahul Savani,
Jose Gallego-Posada,
Elise van der Pol,
Edwin D. de Jong,
Roderich Gross
Abstract:
Generative Adversarial Networks (GANs) have become one of the most successful frameworks for unsupervised generative modeling. As GANs are difficult to train, much research has focused on this. However, very little of this research has directly exploited game-theoretic techniques. We introduce Generative Adversarial Network Games (GANGs), which explicitly model a finite zero-sum game between a generator ($G$) and classifier ($C$) that use mixed strategies. The size of these games precludes exact solution methods, therefore we define resource-bounded best responses (RBBRs), and a resource-bounded Nash Equilibrium (RB-NE) as a pair of mixed strategies such that neither $G$ nor $C$ can find a better RBBR. The RB-NE solution concept is richer than the notion of 'local Nash equilibria' in that it captures not only failures of escaping local optima of gradient descent, but applies to any approximate best response computations, including methods with random restarts. To validate our approach, we solve GANGs with the Parallel Nash Memory algorithm, which provably monotonically converges to an RB-NE. We compare our results to standard GAN setups, and demonstrate that our method deals well with typical GAN problems such as mode collapse, partial mode coverage and forgetting.
Submitted 17 December, 2017; v1 submitted 2 December, 2017;
originally announced December 2017.
-
Symmetric Decomposition of Asymmetric Games
Authors:
Karl Tuyls,
Julien Perolat,
Marc Lanctot,
Georg Ostrovski,
Rahul Savani,
Joel Leibo,
Toby Ord,
Thore Graepel,
Shane Legg
Abstract:
We introduce new theoretical insights into two-population asymmetric games allowing for an elegant symmetric decomposition into two single population symmetric games. Specifically, we show how an asymmetric bimatrix game (A,B) can be decomposed into its symmetric counterparts by envisioning and investigating the payoff tables (A and B) that constitute the asymmetric game, as two independent, single population, symmetric games. We reveal several surprising formal relationships between an asymmetric two-population game and its symmetric single population counterparts, which facilitate a convenient analysis of the original asymmetric game due to the dimensionality reduction of the decomposition. The main finding reveals that if (x,y) is a Nash equilibrium of an asymmetric game (A,B), this implies that y is a Nash equilibrium of the symmetric counterpart game determined by payoff table A, and x is a Nash equilibrium of the symmetric counterpart game determined by payoff table B. Also the reverse holds and combinations of Nash equilibria of the counterpart games form Nash equilibria of the asymmetric game. We illustrate how these formal relationships aid in identifying and analysing the Nash structure of asymmetric games, by examining the evolutionary dynamics of the simpler counterpart games in several canonical examples.
Submitted 17 January, 2018; v1 submitted 14 November, 2017;
originally announced November 2017.
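The main finding of the abstract above (an equilibrium (x,y) of the asymmetric game (A,B) yields equilibria y and x of the single-population counterpart games determined by A and B) can be checked by hand on a small instance. The following is a minimal sketch, not the authors' code: the 2x2 payoff tables, the helper names, and the choice of a fully mixed equilibrium are all assumptions made for illustration. B is chosen symmetric so that the check does not depend on whether the second counterpart game is read as B or its transpose.

```python
from fractions import Fraction as F

def payoffs_vs(M, p):
    """Expected payoff of each pure row strategy of M against mixed strategy p."""
    return [sum(m * q for m, q in zip(row, p)) for row in M]

def is_best_response(M, own, opp):
    """True if every pure strategy in the support of `own` earns the maximum payoff against `opp`."""
    u = payoffs_vs(M, opp)
    best = max(u)
    return all(u[i] == best for i, w in enumerate(own) if w > 0)

def transpose(M):
    return [list(col) for col in zip(*M)]

# Hypothetical asymmetric coordination game (illustrative values only).
A = [[F(3), F(0)], [F(0), F(2)]]  # row player's payoff table
B = [[F(2), F(0)], [F(0), F(3)]]  # column player's payoff table (symmetric matrix)

x = [F(3, 5), F(2, 5)]  # row player's mixed strategy
y = [F(2, 5), F(3, 5)]  # column player's mixed strategy

# (x, y) is a Nash equilibrium of (A, B): each strategy is a best response to the other.
asymmetric_ne = is_best_response(A, x, y) and is_best_response(transpose(B), y, x)

# Counterpart games: a single population playing against itself.
y_ne_in_A = is_best_response(A, y, y)  # y is a symmetric equilibrium of the game given by A
x_ne_in_B = is_best_response(B, x, x)  # x is a symmetric equilibrium of the game given by B

print(asymmetric_ne, y_ne_in_A, x_ne_in_B)
```

Exact rational arithmetic is used so that the indifference conditions can be compared with `==` rather than a floating-point tolerance.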
-
Reachability Switching Games
Authors:
John Fearnley,
Martin Gairing,
Matthias Mnich,
Rahul Savani
Abstract:
We study the problem of deciding the winner of reachability switching games for zero-, one-, and two-player variants. Switching games provide a deterministic analogue of stochastic games. We show that the zero-player case is NL-hard, the one-player case is NP-complete, and that the two-player case is PSPACE-hard and in EXPTIME. For the zero-player case, we also show P-hardness for a succinctly-represented model that maintains the upper bound of NP $\cap$ coNP. For the one- and two-player cases, our results hold in both the natural, explicit model and succinctly-represented model. Our results show that the switching variant of a game is harder in complexity-theoretic terms than the corresponding stochastic version.
Submitted 21 April, 2021; v1 submitted 26 September, 2017;
originally announced September 2017.
-
Lenient Multi-Agent Deep Reinforcement Learning
Authors:
Gregory Palmer,
Karl Tuyls,
Daan Bloembergen,
Rahul Savani
Abstract:
Much of the success of single agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERM), which allow Deep Q-Networks (DQNs) to be trained efficiently through sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated because agents update their policies in parallel [11]. In this work we apply leniency [23] to MA-DRL. Lenient agents map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates that are sampled from the ERM. This introduces optimism in the value-function update, and has been shown to facilitate cooperation in tabular fully-cooperative multi-agent reinforcement learning problems. We evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN (HDQN) algorithm [22] as well as a modified version we call scheduled-HDQN, that uses average reward learning near terminal states. Evaluations take place in extended variations of the Coordinated Multi-Agent Object Transportation Problem (CMOTP) [8] which include fully-cooperative sub-tasks and stochastic rewards. We find that LDQN agents are more likely to converge to the optimal policy in a stochastic reward CMOTP compared to standard and scheduled-HDQN agents.
Submitted 27 February, 2018; v1 submitted 14 July, 2017;
originally announced July 2017.
-
Computing Constrained Approximate Equilibria in Polymatrix Games
Authors:
Argyrios Deligkas,
John Fearnley,
Rahul Savani
Abstract:
This paper is about computing constrained approximate Nash equilibria in polymatrix games, which are succinctly represented many-player games defined by an interaction graph between the players. In a recent breakthrough, Rubinstein showed that there exists a small constant $ε$, such that it is PPAD-complete to find an (unconstrained) $ε$-Nash equilibrium of a polymatrix game. In the first part of the paper, we show that it is NP-hard to decide if a polymatrix game has a constrained approximate equilibrium for 9 natural constraints and any non-trivial approximation guarantee. These results hold even for planar bipartite polymatrix games with degree 3 and at most 7 strategies per player, and all non-trivial approximation guarantees. These results stand in contrast to similar results for bimatrix games, which obviously need a non-constant number of actions, and which rely on stronger complexity-theoretic conjectures such as the exponential time hypothesis. In the second part, we provide a deterministic QPTAS for interaction graphs with bounded treewidth and with logarithmically many actions per player that can compute constrained approximate equilibria for a wide family of constraints that cover many of the constraints dealt with in the first part.
Submitted 8 May, 2017; v1 submitted 5 May, 2017;
originally announced May 2017.
-
LiftUpp: Support to develop learner performance
Authors:
Frans A. Oliehoek,
Rahul Savani,
Elliot Adderton,
Xia Cui,
David Jackson,
Phil Jimmieson,
John Christopher Jones,
Keith Kennedy,
Ben Mason,
Adam Plumbley,
Luke Dawson
Abstract:
Various motivations exist to move away from the simple assessment of knowledge towards the more complex assessment and development of competence. However, to accommodate such a change, high demands are put on the supporting e-infrastructure in terms of intelligently collecting and analysing data. In this paper, we discuss these challenges and how they are being addressed by LiftUpp, a system that is now used in 70% of UK dental schools, and is finding wider applications in physiotherapy, medicine and veterinary science. We describe how data is collected for workplace-based development in dentistry using a dedicated iPad app, which enables an integrated approach to linking and assessing work flows, skills and learning outcomes. Furthermore, we detail how the various forms of collected data can be fused, visualized and integrated with conventional forms of assessment. This enables curriculum integration, improved real-time student feedback, support for administration, and informed instructional planning. Together these facets contribute to better support for the development of learners' competence in situated learning settings, as well as an improved experience. Finally, we discuss several directions for future research on intelligent teaching systems that are afforded by using the design present within LiftUpp.
Submitted 21 April, 2017;
originally announced April 2017.
-
CLS: New Problems and Completeness
Authors:
John Fearnley,
Spencer Gordon,
Ruta Mehta,
Rahul Savani
Abstract:
The complexity class CLS was introduced by Daskalakis and Papadimitriou with the goal of capturing the complexity of some well-known problems in PPAD$~\cap~$PLS that have resisted, in some cases for decades, attempts to put them in polynomial time. No complete problem was known for CLS, and in previous work, the problems ContractionMap, i.e., the problem of finding an approximate fixpoint of a contraction map, and PLCP, i.e., the problem of solving a P-matrix Linear Complementarity Problem, were identified as prime candidates.
First, we present a new CLS-complete problem MetaMetricContractionMap, which is closely related to the ContractionMap. Second, we introduce EndOfPotentialLine, which captures aspects of PPAD and PLS directly via a monotonic directed path, and show that EndOfPotentialLine is in CLS via a two-way reduction to EndOfMeteredLine. The latter was defined to keep track of how far a vertex is on the PPAD path via a restricted potential function. Third, we reduce PLCP to EndOfPotentialLine, thus making EndOfPotentialLine and EndOfMeteredLine at least as likely to be hard for CLS as PLCP. This last result leverages the monotonic structure of Lemke paths for PLCP problems, making EndOfPotentialLine a likely candidate to capture the exact complexity of PLCP; we note that the structure of Lemke-Howson paths for finding a Nash equilibrium in a two-player game very directly motivated the definition of the complexity class PPAD, which eventually ended up capturing this problem's complexity exactly.
Submitted 7 April, 2017; v1 submitted 20 February, 2017;
originally announced February 2017.
-
Inapproximability Results for Approximate Nash Equilibria
Authors:
Argyrios Deligkas,
John Fearnley,
Rahul Savani
Abstract:
We study the problem of finding approximate Nash equilibria that satisfy certain conditions, such as providing good social welfare. In particular, we study the problem $ε$-NE $δ$-SW: find an $ε$-approximate Nash equilibrium ($ε$-NE) that is within $δ$ of the best social welfare achievable by an $ε$-NE. Our main result is that, if the exponential-time hypothesis (ETH) is true, then solving $\left(\frac{1}{8} - \mathrm{O}(δ)\right)$-NE $\mathrm{O}(δ)$-SW for an $n\times n$ bimatrix game requires $n^{\mathrm{\widetilde Ω}(\log n)}$ time. Building on this result, we show similar conditional running time lower bounds on a number of decision problems for approximate Nash equilibria that do not involve social welfare, including maximizing or minimizing a certain player's payoff, or finding approximate equilibria contained in a given pair of supports. We show quasi-polynomial lower bounds for these problems assuming that ETH holds, where these lower bounds apply to $ε$-Nash equilibria for all $ε< \frac{1}{8}$. The hardness of these other decision problems has so far only been studied in the context of exact equilibria.
Submitted 25 April, 2017; v1 submitted 11 August, 2016;
originally announced August 2016.
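The two quantities in the $ε$-NE $δ$-SW problem above can be made concrete for a given strategy profile. The sketch below uses the standard definitions: a profile is an $ε$-NE when neither player can gain more than $ε$ by deviating unilaterally, and social welfare is the sum of the two expected payoffs. It only evaluates a profile and is not the paper's hardness construction. The function names and the example game (matching pennies with payoffs in [0,1]) are assumptions for illustration.

```python
from fractions import Fraction as F

def expected(M, x, y):
    """Expected payoff x^T M y for mixed strategies x (rows) and y (columns)."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def epsilon_of(A, B, x, y):
    """Smallest eps for which (x, y) is an eps-NE: the larger of the two best-deviation gains."""
    row_best = max(sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x)))
    col_best = max(sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(y)))
    return max(row_best - expected(A, x, y), col_best - expected(B, x, y))

def social_welfare(A, B, x, y):
    """Sum of both players' expected payoffs under (x, y)."""
    return expected(A, x, y) + expected(B, x, y)

# Illustrative game: matching pennies rescaled to payoffs in [0, 1].
A = [[1, 0], [0, 1]]  # row player's payoffs
B = [[0, 1], [1, 0]]  # column player's payoffs

uniform = [F(1, 2), F(1, 2)]
pure = [1, 0]

print(epsilon_of(A, B, uniform, uniform))      # exact equilibrium: no profitable deviation
print(epsilon_of(A, B, pure, pure))            # the column player can gain by switching
print(social_welfare(A, B, uniform, uniform))
```

With these helpers, checking a candidate answer to $ε$-NE $δ$-SW amounts to comparing `epsilon_of` against $ε$ and `social_welfare` against the best welfare achievable by an $ε$-NE; computing that best welfare is the hard part and is not attempted here.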
-
An Empirical Study on Computing Equilibria in Polymatrix Games
Authors:
Argyrios Deligkas,
John Fearnley,
Tobenna Peter Igwe,
Rahul Savani
Abstract:
The Nash equilibrium is an important benchmark for behaviour in systems of strategic autonomous agents. Polymatrix games are a succinct and expressive representation of multiplayer games that model pairwise interactions between players. The empirical performance of algorithms to solve these games has received little attention, despite their wide-ranging applications. In this paper we carry out a comprehensive empirical study of two prominent algorithms for computing a sample equilibrium in these games, Lemke's algorithm that computes an exact equilibrium, and a gradient descent method that computes an approximate equilibrium. Our study covers games arising from a number of interesting applications. We find that Lemke's algorithm can compute exact equilibria in relatively large games in a reasonable amount of time. If we are willing to accept (high-quality) approximate equilibria, then we can deal with much larger games using the descent method. We also report on which games are most challenging for each of the algorithms.
Submitted 16 March, 2016; v1 submitted 22 February, 2016;
originally announced February 2016.
-
Distributed Methods for Computing Approximate Equilibria
Authors:
Artur Czumaj,
Argyrios Deligkas,
Michail Fasoulakis,
John Fearnley,
Marcin Jurdziński,
Rahul Savani
Abstract:
We present a new, distributed method to compute approximate Nash equilibria in bimatrix games. In contrast to previous approaches that analyze the two payoff matrices at the same time (for example, by solving a single LP that combines the two players' payoffs), our algorithm first solves two independent LPs, each of which is derived from one of the two payoff matrices, and then computes approximate Nash equilibria using only limited communication between the players.
Our method has several applications for improved bounds for efficient computations of approximate Nash equilibria in bimatrix games. First, it yields the best known polynomial-time algorithm for computing approximate well-supported Nash equilibria (WSNE), which is guaranteed to find a 0.6528-WSNE. Furthermore, since our algorithm solves the two LPs separately, it can be used to improve upon the best known algorithms in the limited communication setting: the algorithm can be implemented to obtain a randomized expected-polynomial-time algorithm that uses poly-logarithmic communication and finds a 0.6528-WSNE. The algorithm can also be carried out to beat the best known bound in the query complexity setting, requiring $O(n \log n)$ payoff queries to compute a 0.6528-WSNE. Finally, our approach can also be adapted to provide the best known communication-efficient algorithm for computing approximate Nash equilibria: it uses poly-logarithmic communication to find a 0.382-approximate Nash equilibrium.
Submitted 10 December, 2015;
originally announced December 2015.
-
Computing stable outcomes in symmetric additively-separable hedonic games
Authors:
Martin Gairing,
Rahul Savani
Abstract:
We study the computational complexity of finding stable outcomes in hedonic games, which are a class of coalition formation games. We restrict our attention to symmetric additively-separable hedonic games, which are a nontrivial subclass of such games that are guaranteed to possess stable outcomes. These games are specified by an undirected edge-weighted graph: nodes are players, an outcome of the game is a partition of the nodes into coalitions, and the utility of a node is the sum of incident edge weights in the same coalition. We consider several stability requirements defined in the literature. These are based on restricting feasible player deviations, for example, by giving existing coalition members veto power. We extend these restrictions by considering more general forms of preference aggregation for coalition members. In particular, we consider voting schemes to decide if coalition members will allow a player to enter or leave their coalition. For all of the stability requirements we consider, the existence of a stable outcome is guaranteed by a potential function argument, and local improvements will converge to a stable outcome. We provide an almost complete characterization of these games in terms of the tractability of computing such stable outcomes. Our findings comprise positive results in the form of polynomial-time algorithms, and negative (PLS-completeness) results. The negative results extend to more general hedonic games.
Submitted 17 September, 2015;
originally announced September 2015.
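The local-improvement dynamics mentioned in the abstract above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the paper's algorithm: it uses unrestricted unilateral deviations (often called Nash stability), starts from singleton coalitions, and the names `utility`, `local_improvement`, and the `weights` dictionary keyed by `frozenset` pairs are hypothetical choices made here.

```python
def utility(weights, partition, player, coalition_id):
    """Sum of edge weights from `player` to the other members of a coalition."""
    return sum(
        weights.get(frozenset((player, other)), 0)
        for other, cid in partition.items()
        if cid == coalition_id and other != player
    )


def local_improvement(weights, players):
    """Let players move one at a time until nobody can strictly improve.

    Assumption: any player may join any existing coalition or start a new
    one (no veto or voting restrictions). With symmetric weights, each move
    strictly increases the total weight inside coalitions, so the loop stops.
    """
    partition = {p: idx for idx, p in enumerate(players)}  # singletons
    improved = True
    while improved:
        improved = False
        for p in players:
            current = utility(weights, partition, p, partition[p])
            fresh = max(partition.values()) + 1  # id for a new, empty coalition
            candidates = set(partition.values()) | {fresh}
            best = max(candidates, key=lambda c: utility(weights, partition, p, c))
            if utility(weights, partition, p, best) > current:
                partition[p] = best
                improved = True
    return partition


# Hypothetical three-player instance: a and b like each other, both dislike c.
weights = {
    frozenset(("a", "b")): 2,
    frozenset(("b", "c")): -1,
    frozenset(("a", "c")): -1,
}
result = local_improvement(weights, ["a", "b", "c"])
```

On this small instance the dynamics put `a` and `b` in one coalition and leave `c` alone. The sketch says nothing about how many improvement steps are needed in general, which is exactly the question the PLS-completeness results address.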
-
The Complexity of All-switches Strategy Improvement
Authors:
John Fearnley,
Rahul Savani
Abstract:
Strategy improvement is a widely-used and well-studied class of algorithms for solving graph-based infinite games. These algorithms are parameterized by a switching rule, and one of the most natural rules is "all switches" which switches as many edges as possible in each iteration. Continuing a recent line of work, we study all-switches strategy improvement from the perspective of computational complexity. We consider two natural decision problems, both of which have as input a game $G$, a starting strategy $s$, and an edge $e$. The problems are: 1.) The edge switch problem, namely, is the edge $e$ ever switched by all-switches strategy improvement when it is started from $s$ on game $G$? 2.) The optimal strategy problem, namely, is the edge $e$ used in the final strategy that is found by strategy improvement when it is started from $s$ on game $G$? We show $\mathtt{PSPACE}$-completeness of the edge switch problem and optimal strategy problem for the following settings: Parity games with the discrete strategy improvement algorithm of Vöge and Jurdziński; mean-payoff games with the gain-bias algorithm [14,37]; and discounted-payoff games and simple stochastic games with their standard strategy improvement algorithms. We also show $\mathtt{PSPACE}$-completeness of an analogous problem to edge switch for the bottom-antipodal algorithm for finding the sink of an Acyclic Unique Sink Orientation on a cube.
Submitted 29 October, 2018; v1 submitted 16 July, 2015;
originally announced July 2015.
-
An Empirical Study of Finding Approximate Equilibria in Bimatrix Games
Authors:
John Fearnley,
Tobenna Peter Igwe,
Rahul Savani
Abstract:
While there have been a number of studies about the efficacy of methods to find exact Nash equilibria in bimatrix games, there has been little empirical work on finding approximate Nash equilibria. Here we provide such a study that compares a number of approximation methods and exact methods. In particular, we explore the trade-off between the quality of an approximate equilibrium and the running time required to find one. We found that the existing library GAMUT, which has been the de facto standard for testing exact methods, is insufficient as a test bed for approximation methods since many of its games have pure equilibria or other easy-to-find good approximate equilibria. We extend the breadth and depth of our study by including new interesting families of bimatrix games, and studying bimatrix games up to size $2000 \times 2000$. Finally, we provide new close-to-worst-case examples for the best-performing algorithms for finding approximate Nash equilibria.
Submitted 9 April, 2015; v1 submitted 17 February, 2015;
originally announced February 2015.
-
Unit Vector Games
Authors:
Rahul Savani,
Bernhard von Stengel
Abstract:
McLennan and Tourky (2010) showed that "imitation games" provide a new view of the computation of Nash equilibria of bimatrix games with the Lemke-Howson algorithm. In an imitation game, the payoff matrix of one of the players is the identity matrix. We study the more general "unit vector games", which are already known, where the payoff matrix of one player is composed of unit vectors. Our main application is a simplification of the construction by Savani and von Stengel (2006) of bimatrix games where two basic equilibrium-finding algorithms take exponentially many steps: the Lemke-Howson algorithm, and support enumeration.
Submitted 14 February, 2016; v1 submitted 9 January, 2015;
originally announced January 2015.
-
Computing Approximate Nash Equilibria in Polymatrix Games
Authors:
Argyrios Deligkas,
John Fearnley,
Rahul Savani,
Paul Spirakis
Abstract:
In an $ε$-Nash equilibrium, a player can gain at most $ε$ by unilaterally changing his behaviour. For two-player (bimatrix) games with payoffs in $[0,1]$, the best-known $ε$ achievable in polynomial time is 0.3393. In general, for $n$-player games an $ε$-Nash equilibrium can be computed in polynomial time for an $ε$ that is an increasing function of $n$ but does not depend on the number of strategies of the players. For three-player and four-player games the corresponding values of $ε$ are 0.6022 and 0.7153, respectively. Polymatrix games are a restriction of general $n$-player games where a player's payoff is the sum of payoffs from a number of bimatrix games. There exists a very small but constant $ε$ such that computing an $ε$-Nash equilibrium of a polymatrix game is PPAD-hard. Our main result is that a $(0.5+δ)$-Nash equilibrium of an $n$-player polymatrix game can be computed in time polynomial in the input size and $\frac{1}{δ}$. Inspired by the algorithm of Tsaknakis and Spirakis, our algorithm uses gradient descent on the maximum regret of the players. We also show that this algorithm can be applied to efficiently find a $(0.5+δ)$-Nash equilibrium in a two-player Bayesian game.
Submitted 1 October, 2014; v1 submitted 12 September, 2014;
originally announced September 2014.
-
The Complexity of the Simplex Method
Authors:
John Fearnley,
Rahul Savani
Abstract:
The simplex method is a well-studied and widely-used pivoting method for solving linear programs. When Dantzig originally formulated the simplex method, he gave a natural pivot rule that pivots into the basis a variable with the most violated reduced cost. In their seminal work, Klee and Minty showed that this pivot rule takes exponential time in the worst case. We prove two main results on the simplex method. Firstly, we show that it is PSPACE-complete to find the solution that is computed by the simplex method using Dantzig's pivot rule. Secondly, we prove that deciding whether Dantzig's rule ever chooses a specific variable to enter the basis is PSPACE-complete. We use the known connection between Markov decision processes (MDPs) and linear programming, and an equivalence between Dantzig's pivot rule and a natural variant of policy iteration for average-reward MDPs. We construct MDPs and show PSPACE-completeness results for single-switch policy iteration, which in turn imply our main results for the simplex method.
Submitted 17 April, 2014; v1 submitted 2 April, 2014;
originally announced April 2014.
-
Game Theory Explorer - Software for the Applied Game Theorist
Authors:
Rahul Savani,
Bernhard von Stengel
Abstract:
This paper presents the "Game Theory Explorer" software tool to create and analyze games as models of strategic interaction. A game in extensive or strategic form is created and nicely displayed with a graphical user interface in a web browser. State-of-the-art algorithms then compute all Nash equilibria of the game after a mouseclick. In tutorial fashion, we present how the program is used, and the ideas behind its main algorithms. We report on experiences with the architecture of the software and its development as an open-source project.
Submitted 16 March, 2014;
originally announced March 2014.
-
Finding Approximate Nash Equilibria of Bimatrix Games via Payoff Queries
Authors:
John Fearnley,
Rahul Savani
Abstract:
We study the deterministic and randomized query complexity of finding approximate equilibria in bimatrix games. We show that the deterministic query complexity of finding an $ε$-Nash equilibrium when $ε < \frac{1}{2}$ is $Ω(k^2)$, even in zero-one constant-sum games. In combination with previous results [FGGS13], this provides a complete characterization of the deterministic query complexity of approximate Nash equilibria. We also study randomized querying algorithms. We give a randomized algorithm for finding a $(\frac{3 - \sqrt{5}}{2} + ε)$-Nash equilibrium using $O(\frac{k \cdot \log k}{ε^2})$ payoff queries, which shows that the $\frac{1}{2}$ barrier for deterministic algorithms can be broken by randomization. For well-supported Nash equilibria (WSNE), we first give a randomized algorithm for finding an $ε$-WSNE of a zero-sum bimatrix game using $O(\frac{k \cdot \log k}{ε^4})$ payoff queries, and we then use this to obtain a randomized algorithm for finding a $(\frac{2}{3} + ε)$-WSNE in a general bimatrix game using $O(\frac{k \cdot \log k}{ε^4})$ payoff queries. Finally, we initiate the study of lower bounds against randomized algorithms in the context of bimatrix games, by showing that randomized algorithms require $Ω(k^2)$ payoff queries in order to find a $\frac{1}{6k}$-Nash equilibrium, even in zero-one constant-sum games. In particular, this rules out query-efficient randomized algorithms for finding exact Nash equilibria.
Submitted 12 February, 2014; v1 submitted 28 October, 2013;
originally announced October 2013.
-
Polylogarithmic Supports are required for Approximate Well-Supported Nash Equilibria below 2/3
Authors:
Yogesh Anbalagan,
Sergey Norin,
Rahul Savani,
Adrian Vetta
Abstract:
In an epsilon-approximate Nash equilibrium, a player can gain at most epsilon in expectation by unilateral deviation. An epsilon well-supported approximate Nash equilibrium has the stronger requirement that every pure strategy used with positive probability must have payoff within epsilon of the best response payoff. Daskalakis, Mehta and Papadimitriou conjectured that every win-lose bimatrix game has a 2/3-well-supported Nash equilibrium that uses supports of cardinality at most three. Indeed, they showed that such an equilibrium will exist subject to the correctness of a graph-theoretic conjecture. Regardless of the correctness of this conjecture, we show that the barrier of a 2/3 payoff guarantee cannot be broken with constant size supports; we construct win-lose games that require supports of cardinality at least Omega((log n)^(1/3)) in any epsilon-well-supported equilibrium with epsilon < 2/3. The key tool in showing the validity of the construction is a proof of a bipartite digraph variant of the well-known Caccetta-Häggkvist conjecture. A probabilistic argument shows that there exist epsilon-well-supported equilibria with supports of cardinality O(log n/(epsilon^2)), for any epsilon > 0; thus, the polylogarithmic cardinality bound presented cannot be greatly improved. We also show that for any delta > 0, there exist win-lose games for which no pair of strategies with support sizes at most two is a (1-delta)-well-supported Nash equilibrium. In contrast, every bimatrix game with payoffs in [0,1] has a 1/2-approximate Nash equilibrium where the supports of the players have cardinality at most two.
Submitted 21 March, 2014; v1 submitted 27 September, 2013;
originally announced September 2013.
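The two approximation notions defined in the abstract above differ only in what is compared with the best-response payoff: the expected payoff of the mixed strategy, or the payoff of every pure strategy in its support. The following is a minimal sketch of that difference, assuming payoff matrices given as nested lists; the function names (`row_payoffs`, `col_payoffs`, `approx_nash_eps`, `wsne_eps`) are hypothetical and not taken from the paper.

```python
def row_payoffs(A, y):
    """Expected payoff of each pure row against the column player's mix y."""
    return [sum(a * q for a, q in zip(row, y)) for row in A]


def col_payoffs(B, x):
    """Expected payoff of each pure column against the row player's mix x."""
    return [sum(x[i] * B[i][j] for i in range(len(B))) for j in range(len(B[0]))]


def approx_nash_eps(A, B, x, y):
    """Smallest epsilon for which (x, y) is an epsilon-approximate Nash equilibrium."""
    u, v = row_payoffs(A, y), col_payoffs(B, x)
    row_regret = max(u) - sum(p * ui for p, ui in zip(x, u))
    col_regret = max(v) - sum(q * vj for q, vj in zip(y, v))
    return max(row_regret, col_regret)


def wsne_eps(A, B, x, y):
    """Smallest epsilon for which (x, y) is an epsilon-well-supported Nash equilibrium."""
    u, v = row_payoffs(A, y), col_payoffs(B, x)
    row_gap = max(max(u) - ui for p, ui in zip(x, u) if p > 0)
    col_gap = max(max(v) - vj for q, vj in zip(y, v) if q > 0)
    return max(row_gap, col_gap)


# Hypothetical win-lose example (matching pennies) with a lopsided row mix.
A = [[1, 0], [0, 1]]
B = [[0, 1], [1, 0]]
x, y = [0.9, 0.1], [0.5, 0.5]
eps_nash = approx_nash_eps(A, B, x, y)
eps_wsne = wsne_eps(A, B, x, y)
```

For any strategy pair, `wsne_eps` is at least `approx_nash_eps`, because an expected payoff is a weighted average of the support payoffs. This is the sense in which the well-supported requirement is the stronger one.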
-
Learning Equilibria of Games via Payoff Queries
Authors:
John Fearnley,
Martin Gairing,
Paul Goldberg,
Rahul Savani
Abstract:
A recent body of experimental literature has studied empirical game-theoretical analysis, in which we have partial knowledge of a game, consisting of observations of a subset of the pure-strategy profiles and their associated payoffs to players. The aim is to find an exact or approximate Nash equilibrium of the game, based on these observations. It is usually assumed that the strategy profiles may be chosen in an on-line manner by the algorithm. We study a corresponding computational learning model, and the query complexity of learning equilibria for various classes of games. We give basic results for bimatrix and graphical games. Our focus is on symmetric network congestion games. For directed acyclic networks, we can learn the cost functions (and hence compute an equilibrium) while querying just a small fraction of pure-strategy profiles. For the special case of parallel links, we have the stronger result that an equilibrium can be identified while only learning a small fraction of the cost values.
Submitted 12 February, 2014; v1 submitted 13 February, 2013;
originally announced February 2013.
-
Approximate Well-supported Nash Equilibria below Two-thirds
Authors:
John Fearnley,
Paul W. Goldberg,
Rahul Savani,
Troels Bjerre Sørensen
Abstract:
In an epsilon-Nash equilibrium, a player can gain at most epsilon by changing his behaviour. Recent work has addressed the question of how best to compute epsilon-Nash equilibria, and for what values of epsilon a polynomial-time algorithm exists. An epsilon-well-supported Nash equilibrium (epsilon-WSNE) has the additional requirement that any strategy that is used with non-zero probability by a player must have payoff at most epsilon less than the best response. A recent algorithm of Kontogiannis and Spirakis shows how to compute a 2/3-WSNE in polynomial time, for bimatrix games. Here we introduce a new technique that leads to an improvement to the worst-case approximation guarantee.
Submitted 2 December, 2014; v1 submitted 3 April, 2012;
originally announced April 2012.
-
On the Approximation Performance of Fictitious Play in Finite Games
Authors:
Paul W. Goldberg,
Rahul Savani,
Troels Bjerre Sørensen,
Carmine Ventre
Abstract:
We study the performance of Fictitious Play, when used as a heuristic for finding an approximate Nash equilibrium of a 2-player game. We exhibit a class of 2-player games having payoffs in the range [0,1] that show that Fictitious Play fails to find a solution having an additive approximation guarantee significantly better than 1/2. Our construction shows that for n × n games, in the worst case both players may perpetually have mixed strategies whose payoffs fall short of the best response by an additive quantity 1/2 - O(1/n^(1-delta)) for arbitrarily small delta. We also show an essentially matching upper bound of 1/2 - O(1/n).
Submitted 19 March, 2011; v1 submitted 5 March, 2011;
originally announced March 2011.
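For readers unfamiliar with the heuristic studied in the abstract above, here is a minimal sketch of Fictitious Play for a bimatrix game. It rests on assumptions that the abstract does not specify: both players update simultaneously, ties are broken towards the lowest index, and the starting pure strategies are passed in explicitly. The function name `fictitious_play` is hypothetical.

```python
def fictitious_play(A, B, rounds, start=(0, 0)):
    """Run simultaneous Fictitious Play and return the empirical mixed strategies.

    A[i][j] and B[i][j] are the row and column player's payoffs. In every
    round each player best-responds to the opponent's empirical play so far.
    """
    m, n = len(A), len(A[0])
    row_counts = [0] * m
    col_counts = [0] * n
    i, j = start
    for _ in range(rounds):
        row_counts[i] += 1
        col_counts[j] += 1
        # Scores use raw counts; dividing by the round number would not
        # change which pure strategy is the best response.
        row_scores = [sum(A[r][c] * col_counts[c] for c in range(n)) for r in range(m)]
        col_scores = [sum(B[r][c] * row_counts[r] for r in range(m)) for c in range(n)]
        # max() returns the first maximiser, i.e. the lowest index on ties.
        i = max(range(m), key=lambda r: row_scores[r])
        j = max(range(n), key=lambda c: col_scores[c])
    x = [k / rounds for k in row_counts]
    y = [k / rounds for k in col_counts]
    return x, y


# Hypothetical coordination game: both players are paid when their choices match.
A = [[1, 0], [0, 1]]
B = [[1, 0], [0, 1]]
x, y = fictitious_play(A, B, rounds=10)
```

In this coordination game both players stay on their first strategy, so the empirical strategies are pure. The games constructed in the paper are ones where the empirical strategies instead remain far from a best response.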
-
The Complexity of the Homotopy Method, Equilibrium Selection, and Lemke-Howson Solutions
Authors:
Paul W. Goldberg,
Christos H. Papadimitriou,
Rahul Savani
Abstract:
We show that the widely used homotopy method for solving fixpoint problems, as well as the Harsanyi-Selten equilibrium selection process for games, are PSPACE-complete to implement. Extending our result for the Harsanyi-Selten process, we show that several other homotopy-based algorithms for finding equilibria of games are also PSPACE-complete to implement. A further application of our techniques yields the result that it is PSPACE-complete to compute any of the equilibria that could be found via the classical Lemke-Howson algorithm, a complexity-theoretic strengthening of the result in [Savani and von Stengel]. These results show that our techniques can be widely applied and suggest that the PSPACE-completeness of implementing homotopy methods is a general principle.
Submitted 4 August, 2011; v1 submitted 28 June, 2010;
originally announced June 2010.