-
Error Analysis of Option Pricing via Deep PDE Solvers: Empirical Study
Authors:
Rawin Assabumrungrat,
Kentaro Minami,
Masanori Hirano
Abstract:
Option pricing, a fundamental problem in finance, often requires solving non-linear partial differential equations (PDEs). When dealing with multi-asset options, such as rainbow options, these PDEs become high-dimensional, leading to challenges posed by the curse of dimensionality. While deep learning-based PDE solvers have recently emerged as scalable solutions to this high-dimensional problem, t…
▽ More
Option pricing, a fundamental problem in finance, often requires solving non-linear partial differential equations (PDEs). When dealing with multi-asset options, such as rainbow options, these PDEs become high-dimensional, leading to challenges posed by the curse of dimensionality. While deep learning-based PDE solvers have recently emerged as scalable solutions to this high-dimensional problem, their empirical and quantitative accuracy remains not well-understood, hindering their real-world applicability. In this study, we aimed to offer actionable insights into the utility of Deep PDE solvers for practical option pricing implementation. Through comparative experiments, we assessed the empirical performance of these solvers in high-dimensional contexts. Our investigation identified three primary sources of errors in Deep PDE solvers: (i) errors inherent in the specifications of the target option and underlying assets, (ii) errors originating from the asset model simulation methods, and (iii) errors stemming from the neural network training. Through ablation studies, we evaluated the individual impact of each error source. Our results indicate that the Deep BSDE method (DBSDE) is superior in performance and exhibits robustness against variations in option specifications. In contrast, some other methods are overly sensitive to option specifications, such as time to expiration. We also find that the performance of these methods improves inversely proportional to the square root of batch size and the number of time steps. This observation can aid in estimating computational resources for achieving desired accuracies with Deep PDE solvers.
△ Less
Submitted 13 November, 2023;
originally announced November 2023.
-
Adversarial Deep Hedging: Learning to Hedge without Price Process Modeling
Authors:
Masanori Hirano,
Kentaro Minami,
Kentaro Imajo
Abstract:
Deep hedging is a deep-learning-based framework for derivative hedging in incomplete markets. The advantage of deep hedging lies in its ability to handle various realistic market conditions, such as market frictions, which are challenging to address within the traditional mathematical finance framework. Since deep hedging relies on market simulation, the underlying asset price process model is cru…
▽ More
Deep hedging is a deep-learning-based framework for derivative hedging in incomplete markets. The advantage of deep hedging lies in its ability to handle various realistic market conditions, such as market frictions, which are challenging to address within the traditional mathematical finance framework. Since deep hedging relies on market simulation, the underlying asset price process model is crucial. However, existing literature on deep hedging often relies on traditional mathematical finance models, e.g., Brownian motion and stochastic volatility models, and discovering effective underlying asset models for deep hedging learning has been a challenge. In this study, we propose a new framework called adversarial deep hedging, inspired by adversarial learning. In this framework, a hedger and a generator, which respectively model the underlying asset process and the underlying asset process, are trained in an adversarial manner. The proposed method enables to learn a robust hedger without explicitly modeling the underlying asset process. Through numerical experiments, we demonstrate that our proposed method achieves competitive performance to models that assume explicit underlying asset processes across various real market data.
△ Less
Submitted 24 July, 2023;
originally announced July 2023.
-
Uncertainty Aware Trader-Company Method: Interpretable Stock Price Prediction Capturing Uncertainty
Authors:
Yugo Fujimoto,
Kei Nakagawa,
Kentaro Imajo,
Kentaro Minami
Abstract:
Machine learning is an increasingly popular tool with some success in predicting stock prices. One promising method is the Trader-Company~(TC) method, which takes into account the dynamism of the stock market and has both high predictive power and interpretability. Machine learning-based stock prediction methods including the TC method have been concentrating on point prediction. However, point pr…
▽ More
Machine learning is an increasingly popular tool with some success in predicting stock prices. One promising method is the Trader-Company~(TC) method, which takes into account the dynamism of the stock market and has both high predictive power and interpretability. Machine learning-based stock prediction methods including the TC method have been concentrating on point prediction. However, point prediction in the absence of uncertainty estimates lacks credibility quantification and raises concerns about safety. The challenge in this paper is to make an investment strategy that combines high predictive power and the ability to quantify uncertainty. We propose a novel approach called Uncertainty Aware Trader-Company Method~(UTC) method. The core idea of this approach is to combine the strengths of both frameworks by merging the TC method with the probabilistic modeling, which provides probabilistic predictions and uncertainty estimations. We expect this to retain the predictive power and interpretability of the TC method while capturing the uncertainty. We theoretically prove that the proposed method estimates the posterior variance and does not introduce additional biases from the original TC method. We conduct a comprehensive evaluation of our approach based on the synthetic and real market datasets. We confirm with synthetic data that the UTC method can detect situations where the uncertainty increases and the prediction is difficult. We also confirmed that the UTC method can detect abrupt changes in data generating distributions. We demonstrate with real market data that the UTC method can achieve higher returns and lower risks than baselines.
△ Less
Submitted 2 November, 2022; v1 submitted 30 October, 2022;
originally announced October 2022.
-
One-bit Submission for Locally Private Quasi-MLE: Its Asymptotic Normality and Limitation
Authors:
Hajime Ono,
Kazuhiro Minami,
Hideitsu Hino
Abstract:
Local differential privacy~(LDP) is an information-theoretic privacy definition suitable for statistical surveys that involve an untrusted data curator. An LDP version of quasi-maximum likelihood estimator~(QMLE) has been developed, but the existing method to build LDP QMLE is difficult to implement for a large-scale survey system in the real world due to long waiting time, expensive communication…
▽ More
Local differential privacy~(LDP) is an information-theoretic privacy definition suitable for statistical surveys that involve an untrusted data curator. An LDP version of quasi-maximum likelihood estimator~(QMLE) has been developed, but the existing method to build LDP QMLE is difficult to implement for a large-scale survey system in the real world due to long waiting time, expensive communication cost, and the boundedness assumption of derivative of a log-likelihood function. We provided an alternative LDP protocol without those issues, which is potentially much easily deployable to a large-scale survey. We also provided sufficient conditions for the consistency and asymptotic normality and limitations of our protocol. Our protocol is less burdensome for the users, and the theoretical guarantees cover more realistic cases than those for the existing method.
△ Less
Submitted 15 February, 2022;
originally announced February 2022.
-
Unified Perspective on Probability Divergence via Maximum Likelihood Density Ratio Estimation: Bridging KL-Divergence and Integral Probability Metrics
Authors:
Masahiro Kato,
Masaaki Imaizumi,
Kentaro Minami
Abstract:
This paper provides a unified perspective for the Kullback-Leibler (KL)-divergence and the integral probability metrics (IPMs) from the perspective of maximum likelihood density-ratio estimation (DRE). Both the KL-divergence and the IPMs are widely used in various fields in applications such as generative modeling. However, a unified understanding of these concepts has still been unexplored. In th…
▽ More
This paper provides a unified perspective for the Kullback-Leibler (KL)-divergence and the integral probability metrics (IPMs) from the perspective of maximum likelihood density-ratio estimation (DRE). Both the KL-divergence and the IPMs are widely used in various fields in applications such as generative modeling. However, a unified understanding of these concepts has still been unexplored. In this paper, we show that the KL-divergence and the IPMs can be represented as maximal likelihoods differing only by sampling schemes, and use this result to derive a unified form of the IPMs and a relaxed estimation method. To develop the estimation problem, we construct an unconstrained maximum likelihood estimator to perform DRE with a stratified sampling scheme. We further propose a novel class of probability divergences, called the Density Ratio Metrics (DRMs), that interpolates the KL-divergence and the IPMs. In addition to these findings, we also introduce some applications of the DRMs, such as DRE and generative adversarial networks. In experiments, we validate the effectiveness of our proposed methods.
△ Less
Submitted 31 January, 2022;
originally announced January 2022.
-
Contrastive Representation Learning with Trainable Augmentation Channel
Authors:
Masanori Koyama,
Kentaro Minami,
Takeru Miyato,
Yarin Gal
Abstract:
In contrastive representation learning, data representation is trained so that it can classify the image instances even when the images are altered by augmentations. However, depending on the datasets, some augmentations can damage the information of the images beyond recognition, and such augmentations can result in collapsed representations. We present a partial solution to this problem by forma…
▽ More
In contrastive representation learning, data representation is trained so that it can classify the image instances even when the images are altered by augmentations. However, depending on the datasets, some augmentations can damage the information of the images beyond recognition, and such augmentations can result in collapsed representations. We present a partial solution to this problem by formalizing a stochastic encoding process in which there exist a tug-of-war between the data corruption introduced by the augmentations and the information preserved by the encoder. We show that, with the infoMax objective based on this framework, we can learn a data-dependent distribution of augmentations to avoid the collapse of the representation.
△ Less
Submitted 15 November, 2021;
originally announced November 2021.
-
MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems
Authors:
Steven Farrell,
Murali Emani,
Jacob Balma,
Lukas Drescher,
Aleksandr Drozd,
Andreas Fink,
Geoffrey Fox,
David Kanter,
Thorsten Kurth,
Peter Mattson,
Dawei Mu,
Amit Ruhela,
Kento Sato,
Koichi Shirahata,
Tsuguchika Tabaru,
Aristeidis Tsaris,
Jan Balewski,
Ben Cumming,
Takumi Danjo,
Jens Domke,
Takaaki Fukai,
Naoto Fukumoto,
Tatsuya Fukushi,
Balazs Gerofi,
Takumi Honda
, et al. (18 additional authors not shown)
Abstract:
Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High performance computing systems are pushing the frontiers of performance with a rich diversity of hardware resources and massive scale-out capabilities. There is a critical need to understand fair and effective benchmarking of machine learning appli…
▽ More
Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High performance computing systems are pushing the frontiers of performance with a rich diversity of hardware resources and massive scale-out capabilities. There is a critical need to understand fair and effective benchmarking of machine learning applications that are representative of real-world scientific use cases. MLPerf is a community-driven standard to benchmark machine learning workloads, focusing on end-to-end performance metrics. In this paper, we introduce MLPerf HPC, a benchmark suite of large-scale scientific machine learning training applications driven by the MLCommons Association. We present the results from the first submission round, including a diverse set of some of the world's largest HPC systems. We develop a systematic framework for their joint analysis and compare them in terms of data staging, algorithmic convergence, and compute performance. As a result, we gain a quantitative understanding of optimizations on different subsystems such as staging and on-node loading of data, compute-unit utilization, and communication scheduling, enabling overall $>10 \times$ (end-to-end) performance improvements through system scaling. Notably, our analysis shows a scale-dependent interplay between the dataset size, a system's memory hierarchy, and training convergence that underlines the importance of near-compute storage. To overcome the data-parallel scalability challenge at large batch sizes, we discuss specific learning techniques and hybrid data-and-model parallelism that are effective on large systems. We conclude by characterizing each benchmark with respect to low-level memory, I/O, and network behavior to parameterize extended roofline performance models in future rounds.
△ Less
Submitted 26 October, 2021; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Theoretically Motivated Data Augmentation and Regularization for Portfolio Construction
Authors:
Liu Ziyin,
Kentaro Minami,
Kentaro Imajo
Abstract:
The task we consider is portfolio construction in a speculative market, a fundamental problem in modern finance. While various empirical works now exist to explore deep learning in finance, the theory side is almost non-existent. In this work, we focus on develo** a theoretical framework for understanding the use of data augmentation for deep-learning-based approaches to quantitative finance. Th…
▽ More
The task we consider is portfolio construction in a speculative market, a fundamental problem in modern finance. While various empirical works now exist to explore deep learning in finance, the theory side is almost non-existent. In this work, we focus on develo** a theoretical framework for understanding the use of data augmentation for deep-learning-based approaches to quantitative finance. The proposed theory clarifies the role and necessity of data augmentation for finance; moreover, our theory implies that a simple algorithm of injecting a random noise of strength $\sqrt{|r_{t-1}|}$ to the observed return $r_{t}$ is better than not injecting any noise and a few other financially irrelevant data augmentation techniques.
△ Less
Submitted 22 December, 2022; v1 submitted 8 June, 2021;
originally announced June 2021.
-
The equivalence between Stein variational gradient descent and black-box variational inference
Authors:
Casey Chu,
Kentaro Minami,
Kenji Fukumizu
Abstract:
We formalize an equivalence between two popular methods for Bayesian inference: Stein variational gradient descent (SVGD) and black-box variational inference (BBVI). In particular, we show that BBVI corresponds precisely to SVGD when the kernel is the neural tangent kernel. Furthermore, we interpret SVGD and BBVI as kernel gradient flows; we do this by leveraging the recent perspective that views…
▽ More
We formalize an equivalence between two popular methods for Bayesian inference: Stein variational gradient descent (SVGD) and black-box variational inference (BBVI). In particular, we show that BBVI corresponds precisely to SVGD when the kernel is the neural tangent kernel. Furthermore, we interpret SVGD and BBVI as kernel gradient flows; we do this by leveraging the recent perspective that views SVGD as a gradient flow in the space of probability distributions and showing that BBVI naturally motivates a Riemannian structure on that space. We observe that kernel gradient flow also describes dynamics found in the training of generative adversarial networks (GANs). This work thereby unifies several existing techniques in variational inference and generative modeling and identifies the kernel as a fundamental object governing the behavior of these algorithms, motivating deeper analysis of its properties.
△ Less
Submitted 3 April, 2020;
originally announced April 2020.
-
Smoothness and Stability in GANs
Authors:
Casey Chu,
Kentaro Minami,
Kenji Fukumizu
Abstract:
Generative adversarial networks, or GANs, commonly display unstable behavior during training. In this work, we develop a principled theoretical framework for understanding the stability of various types of GANs. In particular, we derive conditions that guarantee eventual stationarity of the generator when it is trained with gradient descent, conditions that must be satisfied by the divergence that…
▽ More
Generative adversarial networks, or GANs, commonly display unstable behavior during training. In this work, we develop a principled theoretical framework for understanding the stability of various types of GANs. In particular, we derive conditions that guarantee eventual stationarity of the generator when it is trained with gradient descent, conditions that must be satisfied by the divergence that is minimized by the GAN and the generator's architecture. We find that existing GAN variants satisfy some, but not all, of these conditions. Using tools from convex analysis, optimal transport, and reproducing kernels, we construct a GAN that fulfills these conditions simultaneously. In the process, we explain and clarify the need for various existing GAN stabilization techniques, including Lipschitz constraints, gradient penalties, and smooth activation functions.
△ Less
Submitted 10 February, 2020;
originally announced February 2020.
-
Trace Equivalence and Epistemic Logic to Express Security Properties
Authors:
Kiraku Minami
Abstract:
In process algebras, security properties are expressed as equivalences between processes, but which equivalence is suitable is not clear. This means that there is a gap between an intuitive security notion and the formulation. Appropriate formalization is essential for verification, and our purpose is bridging this gap. By chasing scope extrusions, we prove that trace equivalence is congruent. Mor…
▽ More
In process algebras, security properties are expressed as equivalences between processes, but which equivalence is suitable is not clear. This means that there is a gap between an intuitive security notion and the formulation. Appropriate formalization is essential for verification, and our purpose is bridging this gap. By chasing scope extrusions, we prove that trace equivalence is congruent. Moreover, we construct an epistemic logic for the applied pi calculus and show that its logical equivalence agrees with the trace equivalence. We use the epistemic logic to show that trace equivalence is pertinent in the presence of a non-adaptive attacker.
△ Less
Submitted 8 March, 2019;
originally announced March 2019.