-
Ordinal Potential-based Player Rating
Authors:
Nelson Vadori,
Rahul Savani
Abstract:
It was recently observed that Elo ratings fail at preserving transitive relations among strategies and therefore cannot correctly extract the transitive component of a game. We provide a characterization of transitive games as a weak variant of ordinal potential games and show that Elo ratings actually do preserve transitivity when computed in the right space, using suitable invertible mappings. Leveraging this insight, we introduce a new game decomposition of an arbitrary game into transitive and cyclic components that is learnt using a neural network-based architecture and that prioritises capturing the sign pattern of the game, namely transitive and cyclic relations among strategies. We link our approach to the known concept of sign-rank, and evaluate our methodology using both toy examples and empirical data from real-world games.
Submitted 6 March, 2024; v1 submitted 8 June, 2023;
originally announced June 2023.
-
Towards Multi-Agent Reinforcement Learning driven Over-The-Counter Market Simulations
Authors:
Nelson Vadori,
Leo Ardon,
Sumitra Ganesh,
Thomas Spooner,
Selim Amrouni,
Jared Vann,
Mengda Xu,
Zeyu Zheng,
Tucker Balch,
Manuela Veloso
Abstract:
We study a game between liquidity provider and liquidity taker agents interacting in an over-the-counter market, for which the typical example is foreign exchange. We show how a suitable design of parameterized families of reward functions coupled with shared policy learning constitutes an efficient solution to this problem. By playing against each other, our deep-reinforcement-learning-driven agents learn emergent behaviors relative to a wide spectrum of objectives encompassing profit-and-loss, optimal execution and market share. In particular, we find that liquidity providers naturally learn to balance hedging and skewing, where skewing refers to setting their buy and sell prices asymmetrically as a function of their inventory. We further introduce a novel RL-based calibration algorithm which we found performed well at imposing constraints on the game equilibrium. On the theoretical side, we are able to show convergence rates for our multi-agent policy gradient algorithm under a transitivity assumption, closely related to generalized ordinal potential games.
Submitted 1 August, 2023; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Calibration of Derivative Pricing Models: a Multi-Agent Reinforcement Learning Perspective
Authors:
Nelson Vadori
Abstract:
One of the most fundamental questions in quantitative finance is the existence of continuous-time diffusion models that fit market prices of a given set of options. Traditionally, one employs a mix of intuition, theoretical and empirical analysis to find models that achieve exact or approximate fits. Our contribution is to show how a suitable game theoretical formulation of this problem can help solve this question by leveraging existing developments in modern deep multi-agent reinforcement learning to search in the space of stochastic processes. Our experiments show that we are able to learn local volatility, as well as path-dependence required in the volatility process to minimize the price of a Bermudan option. Our algorithm can be seen as a particle method \textit{à la} Guyon \textit{et} Henry-Labordere where particles, instead of being designed to ensure $\sigma_{loc}(t,S_t)^2 = \mathbb{E}[\sigma_t^2|S_t]$, are learning RL-driven agents cooperating towards more general calibration targets.
Submitted 6 October, 2023; v1 submitted 14 March, 2022;
originally announced March 2022.
-
Towards a fully RL-based Market Simulator
Authors:
Leo Ardon,
Nelson Vadori,
Thomas Spooner,
Mengda Xu,
Jared Vann,
Sumitra Ganesh
Abstract:
We present a new financial framework where two families of RL-based agents representing the Liquidity Providers and Liquidity Takers learn simultaneously to satisfy their objective. Thanks to a parametrized reward formulation and the use of Deep RL, each group learns a shared policy able to generalize and interpolate over a wide range of behaviors. This is a step towards a fully RL-based market simulator replicating complex market conditions particularly suited to study the dynamics of the financial market under various scenarios.
Submitted 8 November, 2021; v1 submitted 13 October, 2021;
originally announced October 2021.
-
Consensus Multiplicative Weights Update: Learning to Learn using Projector-based Game Signatures
Authors:
Nelson Vadori,
Rahul Savani,
Thomas Spooner,
Sumitra Ganesh
Abstract:
Cheung and Piliouras (2020) recently showed that two variants of the Multiplicative Weights Update method - OMWU and MWU - display opposite convergence properties depending on whether the game is zero-sum or cooperative. Inspired by this work and the recent literature on learning to optimize for single functions, we introduce a new framework for learning last-iterate convergence to Nash Equilibria in games, where the update rule's coefficients (learning rates) along a trajectory are learnt by a reinforcement learning policy that is conditioned on the nature of the game: \textit{the game signature}. We construct the latter using a new decomposition of two-player games into eight components corresponding to commutative projection operators, generalizing and unifying recent game concepts studied in the literature. We compare the performance of various update rules when their coefficients are learnt, and show that the RL policy is able to exploit the game signature across a wide range of game types. In doing so, we introduce CMWU, a new algorithm that extends consensus optimization to the constrained case, has local convergence guarantees for zero-sum bimatrix games, and show that it enjoys competitive performance on both zero-sum games with constant coefficients and across a spectrum of games when its coefficients are learnt.
Submitted 11 June, 2022; v1 submitted 4 June, 2021;
originally announced June 2021.
-
Factored Policy Gradients: Leveraging Structure for Efficient Learning in MOMDPs
Authors:
Thomas Spooner,
Nelson Vadori,
Sumitra Ganesh
Abstract:
Policy gradient methods can solve complex tasks but often fail when the dimensionality of the action-space or objective multiplicity grow very large. This occurs, in part, because the variance on score-based gradient estimators scales quadratically. In this paper, we address this problem through a factor baseline which exploits independence structure encoded in a novel action-target influence network. Factored policy gradients (FPGs), which follow, provide a common framework for analysing key state-of-the-art algorithms, are shown to generalise traditional policy gradients, and yield a principled way of incorporating prior knowledge of a problem domain's generative processes. We provide an analysis of the proposed estimator and identify the conditions under which variance is reduced. The algorithmic aspects of FPGs are discussed, including optimal policy factorisation, as characterised by minimum biclique coverings, and the implications for the bias-variance trade-off of incorrectly specifying the network. Finally, we demonstrate the performance advantages of our algorithm on large-scale bandit and traffic intersection problems, providing a novel contribution to the latter in the form of a spatial approximation.
Submitted 23 November, 2021; v1 submitted 20 February, 2021;
originally announced February 2021.
-
Calibration of Shared Equilibria in General Sum Partially Observable Markov Games
Authors:
Nelson Vadori,
Sumitra Ganesh,
Prashant Reddy,
Manuela Veloso
Abstract:
Training multi-agent systems (MAS) to achieve realistic equilibria gives us a useful tool to understand and model real-world systems. We consider a general sum partially observable Markov game where agents of different types share a single policy network, conditioned on agent-specific information. This paper aims at i) formally understanding equilibria reached by such agents, and ii) matching emergent phenomena of such equilibria to real-world targets. Parameter sharing with decentralized execution has been introduced as an efficient way to train multiple agents using a single policy network. However, the nature of resulting equilibria reached by such agents has not yet been studied: we introduce the novel concept of Shared equilibrium as a symmetric pure Nash equilibrium of a certain Functional Form Game (FFG) and prove convergence to the latter for a certain class of games using self-play. In addition, it is important that such equilibria satisfy certain constraints so that MAS are calibrated to real world data for practical use: we solve this problem by introducing a novel dual-Reinforcement Learning based approach that fits emergent behaviors of agents in a Shared equilibrium to externally-specified targets, and apply our methods to an n-player market example. We do so by calibrating parameters governing distributions of agent types rather than individual agents, which allows both behavior differentiation among agents and coherent scaling of the shared policy network to multiple agents.
Submitted 23 October, 2020; v1 submitted 23 June, 2020;
originally announced June 2020.
-
Risk-Sensitive Reinforcement Learning: a Martingale Approach to Reward Uncertainty
Authors:
Nelson Vadori,
Sumitra Ganesh,
Prashant Reddy,
Manuela Veloso
Abstract:
We introduce a novel framework to account for sensitivity to rewards uncertainty in sequential decision-making problems. While risk-sensitive formulations for Markov decision processes studied so far focus on the distribution of the cumulative reward as a whole, we aim at learning policies sensitive to the uncertain/stochastic nature of the rewards, which has the advantage of being conceptually more meaningful in some cases. To this end, we present a new decomposition of the randomness contained in the cumulative reward based on the Doob decomposition of a stochastic process, and introduce a new conceptual tool - the \textit{chaotic variation} - which can rigorously be interpreted as the risk measure of the martingale component associated to the cumulative reward process. We innovate on the reinforcement learning side by incorporating this new risk-sensitive approach into model-free algorithms, both policy gradient and value function based, and illustrate its relevance on grid world and portfolio optimization problems.
Submitted 15 September, 2020; v1 submitted 22 June, 2020;
originally announced June 2020.
-
Reinforcement Learning for Market Making in a Multi-agent Dealer Market
Authors:
Sumitra Ganesh,
Nelson Vadori,
Mengda Xu,
Hua Zheng,
Prashant Reddy,
Manuela Veloso
Abstract:
Market makers play an important role in providing liquidity to markets by continuously quoting prices at which they are willing to buy and sell, and managing inventory risk. In this paper, we build a multi-agent simulation of a dealer market and demonstrate that it can be used to understand the behavior of a reinforcement learning (RL) based market maker agent. We use the simulator to train an RL-based market maker agent with different competitive scenarios, reward formulations and market price trends (drifts). We show that the reinforcement learning agent is able to learn about its competitor's pricing policy; it also learns to manage inventory by smartly selecting asymmetric prices on the buy and sell sides (skewing), and maintaining a positive (or negative) inventory depending on whether the market price drift is positive (or negative). Finally, we propose and test reward formulations for creating risk averse RL-based market maker agents.
Submitted 13 November, 2019;
originally announced November 2019.
-
A Semi-Markovian Modeling of Limit Order Markets
Authors:
Anatoliy Swishchuk,
Nelson Vadori
Abstract:
R. Cont and A. de Larrard (SIAM J. Finan. Math, 2013) introduced a tractable stochastic model for the dynamics of a limit order book, computing various quantities of interest such as the probability of a price increase or the diffusion limit of the price process. As suggested by empirical observations, we extend their framework to 1) arbitrary distributions for book events inter-arrival times (possibly non-exponential) and 2) both the nature of a new book event and its corresponding inter-arrival time depend on the nature of the previous book event. We do so by resorting to Markov renewal processes to model the dynamics of the bid and ask queues. We keep analytical tractability via explicit expressions for the Laplace transforms of various quantities of interest. We justify and illustrate our approach by calibrating our model to the five stocks Amazon, Apple, Google, Intel and Microsoft on June 21st, 2012. As in R. Cont and A. de Larrard, the bid-ask spread remains constant equal to one tick, only the bid and ask queues are modeled (they are independent from each other and get reinitialized after a price change), and all orders have the same size.
Submitted 7 January, 2016;
originally announced January 2016.
-
Law of Large Numbers for Semi-Markov inhomogeneous Random Evolutions on Banach spaces
Authors:
Nelson Vadori,
Anatoliy Swishchuk
Abstract:
Using backward propagators, we construct inhomogeneous Random Evolutions on Banach spaces driven by (uniformly ergodic) Semi-Markov processes. After studying some of their properties (measurability, continuity, integral representation), we establish a Law of Large Numbers for such inhomogeneous Random Evolutions, and more precisely their weak convergence - in the Skorohod space $D$ - to an inhomogeneous semigroup. A martingale characterization of these inhomogeneous Random Evolutions is also obtained. Finally, we present applications to inhomogeneous Lévy Random Evolutions.
Submitted 27 May, 2013; v1 submitted 15 April, 2013;
originally announced April 2013.