-
Six Levels of Privacy: A Framework for Financial Synthetic Data
Authors:
Tucker Balch,
Vamsi K. Potluru,
Deepak Paramanand,
Manuela Veloso
Abstract:
Synthetic Data is increasingly important in financial applications. In addition to the benefits it provides, such as improved financial modeling and better testing procedures, it poses privacy risks as well. Such data may arise from client information, business information, or other proprietary sources that must be protected. Even though the process by which Synthetic Data is generated serves to obscure the original data to some degree, the extent to which privacy is preserved is hard to assess. Accordingly, we introduce a hierarchy of "levels" of privacy that are useful for categorizing Synthetic Data generation methods and the progressively improved protections they offer. While the six levels were devised in the context of financial applications, they may also be appropriate for other industries. Our paper includes a brief overview of Financial Synthetic Data, how it can be used, how its value can be assessed, privacy risks, and privacy attacks. We close with details of the "Six Levels" that include defenses against those attacks.
Submitted 20 March, 2024;
originally announced March 2024.
-
Synthetic Data Applications in Finance
Authors:
Vamsi K. Potluru,
Daniel Borrajo,
Andrea Coletta,
Niccolò Dalmasso,
Yousef El-Laham,
Elizabeth Fons,
Mohsen Ghassemi,
Sriram Gopalakrishnan,
Vikesh Gosai,
Eleonora Kreačić,
Ganapathy Mani,
Saheed Obitayo,
Deepak Paramanand,
Natraj Raman,
Mikhail Solonin,
Srijan Sood,
Svitlana Vyetrenko,
Haibei Zhu,
Manuela Veloso,
Tucker Balch
Abstract:
Synthetic data has made tremendous strides in various commercial settings including finance, healthcare, and virtual reality. We present a broad overview of prototypical applications of synthetic data in the financial sector and in particular provide richer details for a few select ones. These cover a wide variety of data modalities including tabular, time-series, event-series, and unstructured arising from both markets and retail financial applications. Since finance is a highly regulated industry, synthetic data is a potential approach for dealing with issues related to privacy, fairness, and explainability. Various metrics are utilized in evaluating the quality and effectiveness of our approaches in these applications. We conclude with open directions in synthetic data in the context of the financial domain.
Submitted 20 March, 2024; v1 submitted 29 December, 2023;
originally announced January 2024.
-
Prime Match: A Privacy-Preserving Inventory Matching System
Authors:
Antigoni Polychroniadou,
Gilad Asharov,
Benjamin Diamond,
Tucker Balch,
Hans Buehler,
Richard Hua,
Suwen Gu,
Greg Gimler,
Manuela Veloso
Abstract:
Inventory matching is a standard mechanism/auction for trading financial stocks by which buyers and sellers can be paired. In the financial world, banks often undertake the task of finding such matches between their clients. The related stocks can be traded without adversely impacting the market price for either client. If matches between clients are found, the bank can offer the trade at advantageous rates. If no match is found, the parties have to buy or sell the stock in the public market, which introduces additional costs. A problem with the process as it is presently conducted is that the involved parties must share their order to buy or sell a particular stock, along with the intended quantity (number of shares), to the bank. Clients worry that if this information were to leak somehow, then other market participants would become aware of their intentions and thus cause the price to move adversely against them before their transaction finalizes. We provide a solution, Prime Match, that enables clients to match their orders efficiently with reduced market impact while maintaining privacy. In the case where there are no matches, no information is revealed. Our main cryptographic innovation is a two-round secure linear comparison protocol for computing the minimum between two quantities without preprocessing and with malicious security, which can be of independent interest. We report benchmarks of our Prime Match system, which runs in production and is adopted by J.P. Morgan. The system is designed utilizing a star topology network, which provides clients with a centralized node (the bank) as an alternative to the idealized assumption of point-to-point connections, which would be impractical and undesired for the clients to implement in reality. Prime Match is the first secure multiparty computation solution running live in the traditional financial world.
Submitted 14 October, 2023;
originally announced October 2023.
-
Financial Time Series Forecasting using CNN and Transformer
Authors:
Zhen Zeng,
Rachneet Kaur,
Suchetha Siddagangappa,
Saba Rahimi,
Tucker Balch,
Manuela Veloso
Abstract:
Time series forecasting is important across various domains for decision-making. In particular, financial time series such as stock prices can be hard to predict as it is difficult to model short-term and long-term temporal dependencies between data points. Convolutional Neural Networks (CNN) are good at capturing local patterns for modeling short-term dependencies. However, CNNs cannot learn long-term dependencies due to the limited receptive field. Transformers on the other hand are capable of learning global context and long-term dependencies. In this paper, we propose to harness the power of CNNs and Transformers to model both short-term and long-term dependencies within a time series, and forecast if the price would go up, down or remain the same (flat) in the future. In our experiments, we demonstrated the success of the proposed method in comparison to commonly adopted statistical and deep learning methods on forecasting intraday stock price change of S&P 500 constituents.
Submitted 10 April, 2023;
originally announced April 2023.
-
Towards Multi-Agent Reinforcement Learning driven Over-The-Counter Market Simulations
Authors:
Nelson Vadori,
Leo Ardon,
Sumitra Ganesh,
Thomas Spooner,
Selim Amrouni,
Jared Vann,
Mengda Xu,
Zeyu Zheng,
Tucker Balch,
Manuela Veloso
Abstract:
We study a game between liquidity provider and liquidity taker agents interacting in an over-the-counter market, for which the typical example is foreign exchange. We show how a suitable design of parameterized families of reward functions coupled with shared policy learning constitutes an efficient solution to this problem. By playing against each other, our deep-reinforcement-learning-driven agents learn emergent behaviors relative to a wide spectrum of objectives encompassing profit-and-loss, optimal execution and market share. In particular, we find that liquidity providers naturally learn to balance hedging and skewing, where skewing refers to setting their buy and sell prices asymmetrically as a function of their inventory. We further introduce a novel RL-based calibration algorithm which we found performed well at imposing constraints on the game equilibrium. On the theoretical side, we are able to show convergence rates for our multi-agent policy gradient algorithm under a transitivity assumption, closely related to generalized ordinal potential games.
Submitted 1 August, 2023; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Parameterized Explanations for Investor / Company Matching
Authors:
Simerjot Kaur,
Ivan Brugere,
Andrea Stefanucci,
Armineh Nourbakhsh,
Sameena Shah,
Manuela Veloso
Abstract:
Matching companies and investors is usually considered a highly specialized decision-making process. Building an AI agent that can automate such a recommendation process can significantly help reduce costs and eliminate human biases and errors. However, the limited sample size of financial datasets and the need not only for good recommendations but also for explanations of why a particular recommendation is being made make this a challenging problem. In this work we propose a representation-learning-based recommendation engine that works extremely well with small datasets and demonstrate how it can be coupled with a parameterized explanation generation engine to build an explainable recommendation system for investor-company matching. We compare the performance of our system with human-generated recommendations and demonstrate the ability of our algorithm to perform extremely well on this task. We also highlight how explainability helps with real-life adoption of our system.
Submitted 27 October, 2021;
originally announced November 2021.
-
ABIDES-Gym: Gym Environments for Multi-Agent Discrete Event Simulation and Application to Financial Markets
Authors:
Selim Amrouni,
Aymeric Moulin,
Jared Vann,
Svitlana Vyetrenko,
Tucker Balch,
Manuela Veloso
Abstract:
Model-free Reinforcement Learning (RL) requires the ability to sample trajectories by taking actions in the original problem environment or a simulated version of it. Breakthroughs in the field of RL have been largely facilitated by the development of dedicated open-source simulators with easy-to-use frameworks such as OpenAI Gym and its Atari environments. In this paper we propose to use the OpenAI Gym framework on discrete-event-time-based Discrete Event Multi-Agent Simulation (DEMAS). We introduce a general technique to wrap a DEMAS simulator into the Gym framework. We describe the technique in detail and implement it using the simulator ABIDES as a base. We apply this work by specifically using the markets extension of ABIDES, ABIDES-Markets, and develop two benchmark financial markets OpenAI Gym environments for training daily investor and execution agents. As a result, these two environments describe classic financial problems with a complex interactive market behavior response to the experimental agent's action.
Submitted 27 October, 2021;
originally announced October 2021.
-
Towards Robust Representation of Limit Orders Books for Deep Learning Models
Authors:
Yufei Wu,
Mahmoud Mahfouz,
Daniele Magazzeni,
Manuela Veloso
Abstract:
The success of deep learning-based limit order book forecasting models is highly dependent on the quality and the robustness of the input data representation. A significant body of the quantitative finance literature focuses on utilising different deep learning architectures without taking into consideration the key assumptions these models make with respect to the input data representation. In this paper, we highlight the issues associated with the commonly used representations of limit order book data from both theoretical and practical perspectives. We also show the fragility of the representations under adversarial perturbations and propose two simple modifications to the existing representations that match the theoretical assumptions of deep learning models. Finally, we show experimentally how our proposed representations lead to state-of-the-art performance in both accuracy and robustness utilising very simple neural network architectures.
Submitted 7 December, 2022; v1 submitted 10 October, 2021;
originally announced October 2021.
-
How Robust are Limit Order Book Representations under Data Perturbation?
Authors:
Yufei Wu,
Mahmoud Mahfouz,
Daniele Magazzeni,
Manuela Veloso
Abstract:
The success of machine learning models in the financial domain is highly reliant on the quality of the data representation. In this paper, we focus on the representation of limit order book data and discuss the opportunities and challenges for learning representations of such data. We also experimentally analyse the issues associated with existing representations and present a guideline for future research in this area.
Submitted 10 October, 2021;
originally announced October 2021.
-
Learning to Classify and Imitate Trading Agents in Continuous Double Auction Markets
Authors:
Mahmoud Mahfouz,
Tucker Balch,
Manuela Veloso,
Danilo Mandic
Abstract:
Continuous double auctions such as the limit order book employed by exchanges are widely used in practice to match buyers and sellers of a variety of financial instruments. In this work, we develop an agent-based model for trading in a limit order book and show (1) how opponent modelling techniques can be applied to classify trading agent archetypes and (2) how behavioural cloning can be used to imitate these agents in a simulated setting. We experimentally compare a number of techniques for both tasks and evaluate their applicability and use in real-world scenarios.
Submitted 29 October, 2021; v1 submitted 4 October, 2021;
originally announced October 2021.
-
Visual Time Series Forecasting: An Image-driven Approach
Authors:
Naftali Cohen,
Srijan Sood,
Zhen Zeng,
Tucker Balch,
Manuela Veloso
Abstract:
In this work, we address time-series forecasting as a computer vision task. We capture input data as an image and train a model to produce the subsequent image. This approach results in predicting distributions as opposed to pointwise values. To assess the robustness and quality of our approach, we examine various datasets and multiple evaluation metrics. Our experiments show that our forecasting tool is effective for cyclic data but somewhat less for irregular data such as stock prices. Importantly, when using image-based evaluation metrics, we find our method to outperform various baselines, including ARIMA, and a numerical variation of our deep learning approach.
Submitted 15 November, 2021; v1 submitted 2 July, 2021;
originally announced July 2021.
-
Risk-Sensitive Reinforcement Learning: a Martingale Approach to Reward Uncertainty
Authors:
Nelson Vadori,
Sumitra Ganesh,
Prashant Reddy,
Manuela Veloso
Abstract:
We introduce a novel framework to account for sensitivity to reward uncertainty in sequential decision-making problems. While risk-sensitive formulations for Markov decision processes studied so far focus on the distribution of the cumulative reward as a whole, we aim at learning policies sensitive to the uncertain/stochastic nature of the rewards, which has the advantage of being conceptually more meaningful in some cases. To this end, we present a new decomposition of the randomness contained in the cumulative reward based on the Doob decomposition of a stochastic process, and introduce a new conceptual tool, the chaotic variation, which can rigorously be interpreted as the risk measure of the martingale component associated with the cumulative reward process. We innovate on the reinforcement learning side by incorporating this new risk-sensitive approach into model-free algorithms, both policy gradient and value function based, and illustrate its relevance on grid world and portfolio optimization problems.
Submitted 15 September, 2020; v1 submitted 22 June, 2020;
originally announced June 2020.
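For readers unfamiliar with the Doob decomposition mentioned in the abstract above, the textbook discrete-time statement is sketched below. The symbols (X_n, M_n, A_n, and the filtration F_n) are assumptions chosen for illustration; the abstract does not specify the paper's own notation or how the decomposition is applied to the cumulative reward.

```latex
% Doob decomposition of an adapted, integrable process (X_n)_{n \ge 0}
% with respect to a filtration (\mathcal{F}_n)_{n \ge 0} (textbook form):
\begin{align}
  X_n &= X_0 + M_n + A_n, \\
  A_n &= \sum_{k=1}^{n} \mathbb{E}\!\left[ X_k - X_{k-1} \,\middle|\, \mathcal{F}_{k-1} \right]
        && \text{(predictable part)}, \\
  M_n &= \sum_{k=1}^{n} \left( X_k - \mathbb{E}\!\left[ X_k \,\middle|\, \mathcal{F}_{k-1} \right] \right)
        && \text{(martingale part, } M_0 = 0\text{)}.
\end{align}
```

Summing the two parts term by term telescopes to X_n - X_0, which is why the decomposition holds identically.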
-
Bayesian Consensus: Consensus Estimates from Miscalibrated Instruments under Heteroscedastic Noise
Authors:
Chirag Nagpal,
Robert E. Tillman,
Prashant Reddy,
Manuela Veloso
Abstract:
We consider the problem of aggregating predictions or measurements from a set of human forecasters, models, sensors or other instruments which may be subject to bias or miscalibration and random heteroscedastic noise. We propose a Bayesian consensus estimator that adjusts for miscalibration and noise and show that this estimator is unbiased and asymptotically more efficient than naive alternatives. We further propose a Hierarchical Bayesian Model that leverages our proposed estimator and apply it to two real world forecasting challenges that require consensus estimates from error prone individual estimates: forecasting influenza like illness (ILI) weekly percentages and forecasting annual earnings of public companies. We demonstrate that our approach is effective at mitigating bias and error and results in more accurate forecasts than existing consensus models.
Submitted 8 January, 2021; v1 submitted 14 April, 2020;
originally announced April 2020.
-
Get Real: Realism Metrics for Robust Limit Order Book Market Simulations
Authors:
Svitlana Vyetrenko,
David Byrd,
Nick Petosa,
Mahmoud Mahfouz,
Danial Dervovic,
Manuela Veloso,
Tucker Hybinette Balch
Abstract:
Machine learning (especially reinforcement learning) methods for trading are increasingly reliant on simulation for agent training and testing. Furthermore, simulation is important for validation of hand-coded trading strategies and for testing hypotheses about market structure. A challenge, however, concerns the robustness of policies validated in simulation because the simulations lack fidelity. In fact, researchers have shown that many market simulation approaches fail to reproduce statistics and stylized facts seen in real markets. As a step towards addressing this we surveyed the literature to collect a set of reference metrics and applied them to real market data and simulation output. Our paper provides a comprehensive catalog of these metrics including mathematical formulations where appropriate. Our results show that there are still significant discrepancies between simulated markets and real ones. However, this work serves as a benchmark against which we can measure future improvement.
Submitted 10 December, 2019;
originally announced December 2019.
-
On the Importance of Opponent Modeling in Auction Markets
Authors:
Mahmoud Mahfouz,
Angelos Filos,
Cyrine Chtourou,
Joshua Lockhart,
Samuel Assefa,
Manuela Veloso,
Danilo Mandic,
Tucker Balch
Abstract:
The dynamics of financial markets are driven by the interactions between participants, as well as the trading mechanisms and regulatory frameworks that govern these interactions. Decision-makers would rather not ignore the impact of other participants on these dynamics and should employ tools and models that take this into account. To this end, we demonstrate the efficacy of applying opponent-modeling in a number of simulated market settings. While our simulations are simplified representations of actual market dynamics, they provide an idealized "playground" in which our techniques can be demonstrated and tested. We present this work with the aim that our techniques could be refined and, with some effort, scaled up to the full complexity of real-world market scenarios. We hope that the results presented encourage practitioners to adopt opponent-modeling methods and apply them to online systems, in order to enable not only reactive but also proactive decisions to be made.
Submitted 28 November, 2019;
originally announced November 2019.
-
Reinforcement Learning for Market Making in a Multi-agent Dealer Market
Authors:
Sumitra Ganesh,
Nelson Vadori,
Mengda Xu,
Hua Zheng,
Prashant Reddy,
Manuela Veloso
Abstract:
Market makers play an important role in providing liquidity to markets by continuously quoting prices at which they are willing to buy and sell, and managing inventory risk. In this paper, we build a multi-agent simulation of a dealer market and demonstrate that it can be used to understand the behavior of a reinforcement learning (RL) based market maker agent. We use the simulator to train an RL-based market maker agent with different competitive scenarios, reward formulations and market price trends (drifts). We show that the reinforcement learning agent is able to learn about its competitor's pricing policy; it also learns to manage inventory by smartly selecting asymmetric prices on the buy and sell sides (skewing), and maintaining a positive (or negative) inventory depending on whether the market price drift is positive (or negative). Finally, we propose and test reward formulations for creating risk averse RL-based market maker agents.
Submitted 13 November, 2019;
originally announced November 2019.
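The "skewing" behaviour described in the abstract above (asymmetric buy and sell prices as a function of inventory) can be illustrated with a minimal hand-written rule. This is only a sketch under stated assumptions: the linear skew, the function name `skewed_quote`, and the parameters `half_spread` and `skew_per_unit` are hypothetical and are not taken from the paper, whose agents learn their pricing through reinforcement learning rather than a fixed formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """A two-sided quote: price at which the dealer buys (bid) and sells (ask)."""
    bid: float
    ask: float


def skewed_quote(mid: float, half_spread: float,
                 inventory: float, skew_per_unit: float) -> Quote:
    """Shift both sides of a symmetric quote against the current inventory.

    Hypothetical linear rule: a long (positive) inventory lowers both prices,
    which makes the ask more attractive to buyers and the bid less attractive
    to sellers, nudging the inventory back towards zero. A short inventory
    does the opposite.
    """
    shift = -skew_per_unit * inventory
    return Quote(bid=mid - half_spread + shift,
                 ask=mid + half_spread + shift)


if __name__ == "__main__":
    # Compare a flat book with a long position of 10 units.
    print(skewed_quote(mid=100.0, half_spread=0.5, inventory=0.0, skew_per_unit=0.01))
    print(skewed_quote(mid=100.0, half_spread=0.5, inventory=10.0, skew_per_unit=0.01))
```

With zero inventory the quote is symmetric around the mid price; with a long position both prices move down by the same amount, so the quoted spread is unchanged while the quote is no longer centred on the mid.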
-
Trading via Image Classification
Authors:
Naftali Cohen,
Tucker Balch,
Manuela Veloso
Abstract:
The art of systematic financial trading evolved with an array of approaches, ranging from simple strategies to complex algorithms, all relying primarily on aspects of time-series analysis. Recently, after visiting the trading floor of a leading financial institution, we noticed that traders always execute their trade orders while observing images of financial time-series on their screens. In this work, we build upon the success in image recognition and examine the value in transforming the traditional time-series analysis to that of image classification. We create a large sample of financial time-series images encoded as candlestick (Box and Whisker) charts and label the samples following three algebraically-defined binary trade strategies. Using the images, we train over a dozen machine-learning classification models and find that the algorithms are very efficient in recovering the complicated, multiscale label-generating rules when the data is represented visually. We suggest that the transformation of a continuous numeric time-series classification problem to a vision problem is useful for recovering signals typical of technical analysis.
Submitted 26 October, 2020; v1 submitted 23 July, 2019;
originally announced July 2019.
-
The Effect of Visual Design in Image Classification
Authors:
Naftali Cohen,
Tucker Balch,
Manuela Veloso
Abstract:
Financial companies continuously analyze the state of the markets to rethink and adjust their investment strategies. While the analysis is done on the digital form of data, decisions are often made based on graphical representations in white papers or presentation slides. In this study, we examine whether binary decisions are better made based on the numeric or the visual representation of the same data. Using two data sets, a matrix of numerical data with spatial dependencies and financial data describing the state of the S&P index, we compare the results of supervised classification based on the original numerical representation and the visual transformation of the same data. We show that, for these data sets, the visual transformation results in higher predictability skill compared to the original form of the data. We suggest thinking of the visual representation of numeric data, effectively, as a combination of dimensional reduction and feature engineering techniques, in particular if the visual layout encapsulates the full complexity of the data. In this view, thoughtful visual design can guard against overfitting or introduce new features, all of which benefit the learning process and effectively lead to better recognition of meaningful patterns.
Submitted 20 August, 2019; v1 submitted 22 July, 2019;
originally announced July 2019.