Search | arXiv e-print repository

Ordinal Potential-based Player Rating

Abstract: It was recently observed that Elo ratings fail at preserving transitive relations among strategies and therefore cannot correctly extract the transitive component of a game. We provide a characterization of transitive games as a weak variant of ordinal potential games and show that Elo ratings actually do preserve transitivity when computed in the right space, using suitable invertible mappings. Leveraging this insight, we introduce a new game decomposition of an arbitrary game into transitive and cyclic components that is learnt using a neural network-based architecture and that prioritises capturing the sign pattern of the game, namely transitive and cyclic relations among strategies. We link our approach to the known concept of sign-rank, and evaluate our methodology using both toy examples and empirical data from real-world games.

Submitted 6 March, 2024; v1 submitted 8 June, 2023; originally announced June 2023.
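
For readers unfamiliar with the rating scheme the abstract above refers to, here is a minimal Python sketch of the classical Elo update. It uses the textbook logistic form with base 10 and scale 400. The function names and the K-factor of 32 are illustrative assumptions, and the sketch does not implement the paper's potential-based rating or its invertible mappings.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the classical Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Return updated ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k is the K-factor (assumed value, not taken from the paper).
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


new_a, new_b = elo_update(1500.0, 1500.0, 1.0)
```

Because each player is summarised by a single scalar, the predicted winner of any pairing is simply the higher-rated player, which is why the scheme is discussed in terms of transitive structure.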

arXiv:2210.07184 [pdf, other]

Towards Multi-Agent Reinforcement Learning driven Over-The-Counter Market Simulations

Authors: Nelson Vadori, Leo Ardon, Sumitra Ganesh, Thomas Spooner, Selim Amrouni, Jared Vann, Mengda Xu, Zeyu Zheng, Tucker Balch, Manuela Veloso

Abstract: We study a game between liquidity provider and liquidity taker agents interacting in an over-the-counter market, for which the typical example is foreign exchange. We show how a suitable design of parameterized families of reward functions coupled with shared policy learning constitutes an efficient solution to this problem. By playing against each other, our deep-reinforcement-learning-driven agents learn emergent behaviors relative to a wide spectrum of objectives encompassing profit-and-loss, optimal execution and market share. In particular, we find that liquidity providers naturally learn to balance hedging and skewing, where skewing refers to setting their buy and sell prices asymmetrically as a function of their inventory. We further introduce a novel RL-based calibration algorithm which we found performed well at imposing constraints on the game equilibrium. On the theoretical side, we are able to show convergence rates for our multi-agent policy gradient algorithm under a transitivity assumption, closely related to generalized ordinal potential games.

Submitted 1 August, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

arXiv:2203.06865 [pdf, other]

Calibration of Derivative Pricing Models: a Multi-Agent Reinforcement Learning Perspective

Authors: Nelson Vadori

Abstract: One of the most fundamental questions in quantitative finance is the existence of continuous-time diffusion models that fit market prices of a given set of options. Traditionally, one employs a mix of intuition, theoretical and empirical analysis to find models that achieve exact or approximate fits. Our contribution is to show how a suitable game theoretical formulation of this problem can help solve this question by leveraging existing developments in modern deep multi-agent reinforcement learning to search in the space of stochastic processes. Our experiments show that we are able to learn local volatility, as well as path-dependence required in the volatility process to minimize the price of a Bermudan option. Our algorithm can be seen as a particle method \textit{à la} Guyon \textit{et} Henry-Labordere where particles, instead of being designed to ensure $σ_{loc}(t,S_t)^2 = \mathbb{E}[σ_t^2|S_t]$, are learning RL-driven agents cooperating towards more general calibration targets.

Submitted 6 October, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

arXiv:2110.06829 [pdf, other]

doi 10.1145/3490354.3494372

Towards a fully RL-based Market Simulator

Authors: Leo Ardon, Nelson Vadori, Thomas Spooner, Mengda Xu, Jared Vann, Sumitra Ganesh

Abstract: We present a new financial framework where two families of RL-based agents representing the Liquidity Providers and Liquidity Takers learn simultaneously to satisfy their objective. Thanks to a parametrized reward formulation and the use of Deep RL, each group learns a shared policy able to generalize and interpolate over a wide range of behaviors. This is a step towards a fully RL-based market simulator replicating complex market conditions particularly suited to study the dynamics of the financial market under various scenarios.

Submitted 8 November, 2021; v1 submitted 13 October, 2021; originally announced October 2021.

Journal ref: ACM International Conference on AI in Finance, 2021

arXiv:2106.02615 [pdf, other]

Consensus Multiplicative Weights Update: Learning to Learn using Projector-based Game Signatures

Authors: Nelson Vadori, Rahul Savani, Thomas Spooner, Sumitra Ganesh

Abstract: Cheung and Piliouras (2020) recently showed that two variants of the Multiplicative Weights Update method - OMWU and MWU - display opposite convergence properties depending on whether the game is zero-sum or cooperative. Inspired by this work and the recent literature on learning to optimize for single functions, we introduce a new framework for learning last-iterate convergence to Nash Equilibria in games, where the update rule's coefficients (learning rates) along a trajectory are learnt by a reinforcement learning policy that is conditioned on the nature of the game: \textit{the game signature}. We construct the latter using a new decomposition of two-player games into eight components corresponding to commutative projection operators, generalizing and unifying recent game concepts studied in the literature. We compare the performance of various update rules when their coefficients are learnt, and show that the RL policy is able to exploit the game signature across a wide range of game types. In doing so, we introduce CMWU, a new algorithm that extends consensus optimization to the constrained case, has local convergence guarantees for zero-sum bimatrix games, and show that it enjoys competitive performance on both zero-sum games with constant coefficients and across a spectrum of games when its coefficients are learnt.

Submitted 11 June, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

Comments: ICML 2022, the 39th International Conference on Machine Learning
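
As background for the update rule named in the abstract above, the following is a minimal Python sketch of plain Multiplicative Weights Update (MWU) with simultaneous updates in a two-player zero-sum matrix game. It is not the paper's CMWU or the optimistic variant OMWU. The function names, the payoff matrix and the constant learning rate `eta` are illustrative assumptions; in the paper the coefficients are learnt rather than fixed.

```python
import math


def mwu_step(strategy, payoffs, eta):
    """One MWU step: reweight each action by exp(eta * payoff), then renormalise."""
    weights = [p * math.exp(eta * u) for p, u in zip(strategy, payoffs)]
    total = sum(weights)
    return [w / total for w in weights]


def play_zero_sum(A, x, y, eta, steps):
    """Run simultaneous MWU for both players of the zero-sum game with matrix A.

    The row player (strategy x) receives x^T A y; the column player receives the negative.
    """
    n, m = len(x), len(y)
    for _ in range(steps):
        # Both payoff vectors are computed from the current (old) strategies.
        row_payoffs = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        col_payoffs = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        x, y = mwu_step(x, row_payoffs, eta), mwu_step(y, col_payoffs, eta)
    return x, y


# Rock-paper-scissors payoff matrix for the row player (assumed example).
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
x, y = play_zero_sum(RPS, [0.5, 0.3, 0.2], [1 / 3, 1 / 3, 1 / 3], eta=0.1, steps=10)
```

Each step keeps the strategies on the probability simplex, and the uniform strategy pair is a fixed point of the update in rock-paper-scissors because every action then earns zero payoff.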

arXiv:2102.10362 [pdf, other]

Factored Policy Gradients: Leveraging Structure for Efficient Learning in MOMDPs

Authors: Thomas Spooner, Nelson Vadori, Sumitra Ganesh

Abstract: Policy gradient methods can solve complex tasks but often fail when the dimensionality of the action-space or objective multiplicity grow very large. This occurs, in part, because the variance on score-based gradient estimators scales quadratically. In this paper, we address this problem through a factor baseline which exploits independence structure encoded in a novel action-target influence network. Factored policy gradients (FPGs), which follow, provide a common framework for analysing key state-of-the-art algorithms, are shown to generalise traditional policy gradients, and yield a principled way of incorporating prior knowledge of a problem domain's generative processes. We provide an analysis of the proposed estimator and identify the conditions under which variance is reduced. The algorithmic aspects of FPGs are discussed, including optimal policy factorisation, as characterised by minimum biclique coverings, and the implications for the bias-variance trade-off of incorrectly specifying the network. Finally, we demonstrate the performance advantages of our algorithm on large-scale bandit and traffic intersection problems, providing a novel contribution to the latter in the form of a spatial approximation.

Submitted 23 November, 2021; v1 submitted 20 February, 2021; originally announced February 2021.

Comments: NeurIPS 2021; 19 pages, 19 figures, 1 table

arXiv:2006.13085 [pdf, other]

Calibration of Shared Equilibria in General Sum Partially Observable Markov Games

Authors: Nelson Vadori, Sumitra Ganesh, Prashant Reddy, Manuela Veloso

Abstract: Training multi-agent systems (MAS) to achieve realistic equilibria gives us a useful tool to understand and model real-world systems. We consider a general sum partially observable Markov game where agents of different types share a single policy network, conditioned on agent-specific information. This paper aims at i) formally understanding equilibria reached by such agents, and ii) matching emergent phenomena of such equilibria to real-world targets. Parameter sharing with decentralized execution has been introduced as an efficient way to train multiple agents using a single policy network. However, the nature of resulting equilibria reached by such agents has not been yet studied: we introduce the novel concept of Shared equilibrium as a symmetric pure Nash equilibrium of a certain Functional Form Game (FFG) and prove convergence to the latter for a certain class of games using self-play. In addition, it is important that such equilibria satisfy certain constraints so that MAS are calibrated to real world data for practical use: we solve this problem by introducing a novel dual-Reinforcement Learning based approach that fits emergent behaviors of agents in a Shared equilibrium to externally-specified targets, and apply our methods to a n-player market example. We do so by calibrating parameters governing distributions of agent types rather than individual agents, which allows both behavior differentiation among agents and coherent scaling of the shared policy network to multiple agents.

Submitted 23 October, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

Comments: NeurIPS 2020, Thirty-fourth Conference on Neural Information Processing Systems

arXiv:2006.12686 [pdf, other]

doi 10.1145/3383455.3422519

Risk-Sensitive Reinforcement Learning: a Martingale Approach to Reward Uncertainty

Authors: Nelson Vadori, Sumitra Ganesh, Prashant Reddy, Manuela Veloso

Abstract: We introduce a novel framework to account for sensitivity to rewards uncertainty in sequential decision-making problems. While risk-sensitive formulations for Markov decision processes studied so far focus on the distribution of the cumulative reward as a whole, we aim at learning policies sensitive to the uncertain/stochastic nature of the rewards, which has the advantage of being conceptually more meaningful in some cases. To this end, we present a new decomposition of the randomness contained in the cumulative reward based on the Doob decomposition of a stochastic process, and introduce a new conceptual tool - the \textit{chaotic variation} - which can rigorously be interpreted as the risk measure of the martingale component associated to the cumulative reward process. We innovate on the reinforcement learning side by incorporating this new risk-sensitive approach into model-free algorithms, both policy gradient and value function based, and illustrate its relevance on grid world and portfolio optimization problems.

Submitted 15 September, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

Comments: Published at ICAIF 2020: ACM International Conference on AI in Finance
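
For reference, the Doob decomposition mentioned in the abstract above can be stated in its standard discrete-time form as follows. The notation ($X_n$, $M_n$, $A_n$, $\mathcal{F}_n$) is an assumption chosen for illustration; the way the paper applies the decomposition to the cumulative reward process and defines the chaotic variation is not reproduced here.

```latex
% Doob decomposition of an integrable process (X_n) adapted to a filtration (F_n):
\[
  X_n = X_0 + M_n + A_n, \qquad n \ge 0,
\]
% with predictable part A and martingale part M given by
\[
  A_n = \sum_{k=1}^{n} \mathbb{E}\left[ X_k - X_{k-1} \mid \mathcal{F}_{k-1} \right],
  \qquad
  M_n = \sum_{k=1}^{n} \left( X_k - \mathbb{E}\left[ X_k \mid \mathcal{F}_{k-1} \right] \right).
\]
```

Here $A_n$ is $\mathcal{F}_{n-1}$-measurable (predictable) and $M_n$ is a martingale with $M_0 = 0$, so all of the one-step-ahead unpredictability of $X$ is carried by the martingale component.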

arXiv:1911.05892 [pdf, other]

Reinforcement Learning for Market Making in a Multi-agent Dealer Market

Authors: Sumitra Ganesh, Nelson Vadori, Mengda Xu, Hua Zheng, Prashant Reddy, Manuela Veloso

Abstract: Market makers play an important role in providing liquidity to markets by continuously quoting prices at which they are willing to buy and sell, and managing inventory risk. In this paper, we build a multi-agent simulation of a dealer market and demonstrate that it can be used to understand the behavior of a reinforcement learning (RL) based market maker agent. We use the simulator to train an RL-based market maker agent with different competitive scenarios, reward formulations and market price trends (drifts). We show that the reinforcement learning agent is able to learn about its competitor's pricing policy; it also learns to manage inventory by smartly selecting asymmetric prices on the buy and sell sides (skewing), and maintaining a positive (or negative) inventory depending on whether the market price drift is positive (or negative). Finally, we propose and test reward formulations for creating risk averse RL-based market maker agents.

Submitted 13 November, 2019; originally announced November 2019.

arXiv:1601.01710 [pdf, other]

A Semi-Markovian Modeling of Limit Order Markets

Authors: Anatoliy Swishchuk, Nelson Vadori

Abstract: R. Cont and A. de Larrard (SIAM J. Finan. Math, 2013) introduced a tractable stochastic model for the dynamics of a limit order book, computing various quantities of interest such as the probability of a price increase or the diffusion limit of the price process. As suggested by empirical observations, we extend their framework to 1) arbitrary distributions for book events inter-arrival times (possibly non-exponential) and 2) both the nature of a new book event and its corresponding inter-arrival time depend on the nature of the previous book event. We do so by resorting to Markov renewal processes to model the dynamics of the bid and ask queues. We keep analytical tractability via explicit expressions for the Laplace transforms of various quantities of interest. We justify and illustrate our approach by calibrating our model to the five stocks Amazon, Apple, Google, Intel and Microsoft on June 21^{st} 2012. As in R. Cont and A. de Larrard, the bid-ask spread remains constant equal to one tick, only the bid and ask queues are modeled (they are independent from each other and get reinitialized after a price change), and all orders have the same size.

Submitted 7 January, 2016; originally announced January 2016.

Comments: 27 pages, 1 figure, 13 tables

MSC Class: 60K15; 60K20; 90B22; 91B24; 91B70

arXiv:1304.4169 [pdf, ps, other]

Law of Large Numbers for Semi-Markov inhomogeneous Random Evolutions on Banach spaces

Authors: Nelson Vadori, Anatoliy Swishchuk

Abstract: Using backward propagators, we construct inhomogeneous Random Evolutions on Banach spaces driven by (uniformly ergodic) Semi-Markov processes. After studying some of their properties (measurability, continuity, integral representation), we establish a Law of Large Numbers for such inhomogeneous Random Evolutions, and more precisely their weak convergence - in the Skorohod space $D$ - to an inhomogeneous semigroup. A martingale characterization of these inhomogeneous Random Evolutions is also obtained. Finally, we present applications to inhomogeneous Lévy Random Evolutions.

Submitted 27 May, 2013; v1 submitted 15 April, 2013; originally announced April 2013.

Comments: v2: typos removed. Remark 4.14 corrected. Intro slightly changed. v3: typos removed. Intro slightly changed.

Showing 1–11 of 11 results for author: Vadori, N