-
ProDAG: Projection-induced variational inference for directed acyclic graphs
Authors:
Ryan Thompson,
Edwin V. Bonilla,
Robert Kohn
Abstract:
Directed acyclic graph (DAG) learning is a rapidly expanding field of research. Though the field has witnessed remarkable advances over the past few years, it remains statistically and computationally challenging to learn a single (point estimate) DAG from data, let alone provide uncertainty quantification. Our article addresses the difficult task of quantifying graph uncertainty by develo** a v…
▽ More
Directed acyclic graph (DAG) learning is a rapidly expanding field of research. Though the field has witnessed remarkable advances over the past few years, it remains statistically and computationally challenging to learn a single (point estimate) DAG from data, let alone provide uncertainty quantification. Our article addresses the difficult task of quantifying graph uncertainty by develo** a variational Bayes inference framework based on novel distributions that have support directly on the space of DAGs. The distributions, which we use to form our prior and variational posterior, are induced by a projection operation, whereby an arbitrary continuous distribution is projected onto the space of sparse weighted acyclic adjacency matrices (matrix representations of DAGs) with probability mass on exact zeros. Though the projection constitutes a combinatorial optimization problem, it is solvable at scale via recently developed techniques that reformulate acyclicity as a continuous constraint. We empirically demonstrate that our method, ProDAG, can deliver accurate inference, and often outperforms existing state-of-the-art alternatives.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Contextual Directed Acyclic Graphs
Authors:
Ryan Thompson,
Edwin V. Bonilla,
Robert Kohn
Abstract:
Estimating the structure of directed acyclic graphs (DAGs) from observational data remains a significant challenge in machine learning. Most research in this area concentrates on learning a single DAG for the entire population. This paper considers an alternative setting where the graph structure varies across individuals based on available "contextual" features. We tackle this contextual DAG prob…
▽ More
Estimating the structure of directed acyclic graphs (DAGs) from observational data remains a significant challenge in machine learning. Most research in this area concentrates on learning a single DAG for the entire population. This paper considers an alternative setting where the graph structure varies across individuals based on available "contextual" features. We tackle this contextual DAG problem via a neural network that maps the contextual features to a DAG, represented as a weighted adjacency matrix. The neural network is equipped with a novel projection layer that ensures the output matrices are sparse and satisfy a recently developed characterization of acyclicity. We devise a scalable computational framework for learning contextual DAGs and provide a convergence guarantee and an analytical gradient for backpropagating through the projection layer. Our experiments suggest that the new approach can recover the true context-specific graph where existing approaches fail.
△ Less
Submitted 20 February, 2024; v1 submitted 24 October, 2023;
originally announced October 2023.
-
Data Scaling Effect of Deep Learning in Financial Time Series Forecasting
Authors:
Chen Liu,
Minh-Ngoc Tran,
Chao Wang,
Richard Gerlach,
Robert Kohn
Abstract:
For years, researchers investigated the applications of deep learning in forecasting financial time series. However, they continued to rely on the conventional econometric approach for model training that optimizes the deep learning models on individual assets. This study highlights the importance of global training, where the deep learning model is optimized across a wide spectrum of stocks. Focu…
▽ More
For years, researchers investigated the applications of deep learning in forecasting financial time series. However, they continued to rely on the conventional econometric approach for model training that optimizes the deep learning models on individual assets. This study highlights the importance of global training, where the deep learning model is optimized across a wide spectrum of stocks. Focusing on stock volatility forecasting as an exemplar, we show that global training is not only beneficial but also necessary for deep learning-based financial time series forecasting. We further demonstrate that, given a sufficient amount of training data, a globally trained deep learning model is capable of delivering accurate zero-shot forecasts for any stocks.
△ Less
Submitted 31 May, 2024; v1 submitted 5 September, 2023;
originally announced September 2023.
-
Particle Mean Field Variational Bayes
Authors:
Minh-Ngoc Tran,
Paco Tseng,
Robert Kohn
Abstract:
The Mean Field Variational Bayes (MFVB) method is one of the most computationally efficient techniques for Bayesian inference. However, its use has been restricted to models with conjugate priors or those that require analytical calculations. This paper proposes a novel particle-based MFVB approach that greatly expands the applicability of the MFVB method. We establish the theoretical basis of the…
▽ More
The Mean Field Variational Bayes (MFVB) method is one of the most computationally efficient techniques for Bayesian inference. However, its use has been restricted to models with conjugate priors or those that require analytical calculations. This paper proposes a novel particle-based MFVB approach that greatly expands the applicability of the MFVB method. We establish the theoretical basis of the new method by leveraging the connection between Wasserstein gradient flows and Langevin diffusion dynamics, and demonstrate the effectiveness of this approach using Bayesian logistic regression, stochastic volatility, and deep neural networks.
△ Less
Submitted 17 May, 2023; v1 submitted 24 March, 2023;
originally announced March 2023.
-
Deep Learning Enhanced Realized GARCH
Authors:
Chen Liu,
Chao Wang,
Minh-Ngoc Tran,
Robert Kohn
Abstract:
We propose a new approach to volatility modeling by combining deep learning (LSTM) and realized volatility measures. This LSTM-enhanced realized GARCH framework incorporates and distills modeling advances from financial econometrics, high frequency trading data and deep learning. Bayesian inference via the Sequential Monte Carlo method is employed for statistical inference and forecasting. The new…
▽ More
We propose a new approach to volatility modeling by combining deep learning (LSTM) and realized volatility measures. This LSTM-enhanced realized GARCH framework incorporates and distills modeling advances from financial econometrics, high frequency trading data and deep learning. Bayesian inference via the Sequential Monte Carlo method is employed for statistical inference and forecasting. The new framework can jointly model the returns and realized volatility measures, has an excellent in-sample fit and superior predictive performance compared to several benchmark models, while being able to adapt well to the stylized facts in volatility. The performance of the new framework is tested using a wide range of metrics, from marginal likelihood, volatility forecasting, to tail risk forecasting and option pricing. We report on a comprehensive empirical study using 31 widely traded stock indices over a time period that includes COVID-19 pandemic.
△ Less
Submitted 17 October, 2023; v1 submitted 15 February, 2023;
originally announced February 2023.
-
The Contextual Lasso: Sparse Linear Models via Deep Neural Networks
Authors:
Ryan Thompson,
Amir Dezfouli,
Robert Kohn
Abstract:
Sparse linear models are one of several core tools for interpretable machine learning, a field of emerging importance as predictive models permeate decision-making in many domains. Unfortunately, sparse linear models are far less flexible as functions of their input features than black-box models like deep neural networks. With this capability gap in mind, we study a not-uncommon situation where t…
▽ More
Sparse linear models are one of several core tools for interpretable machine learning, a field of emerging importance as predictive models permeate decision-making in many domains. Unfortunately, sparse linear models are far less flexible as functions of their input features than black-box models like deep neural networks. With this capability gap in mind, we study a not-uncommon situation where the input features dichotomize into two groups: explanatory features, which are candidates for inclusion as variables in an interpretable model, and contextual features, which select from the candidate variables and determine their effects. This dichotomy leads us to the contextual lasso, a new statistical estimator that fits a sparse linear model to the explanatory features such that the sparsity pattern and coefficients vary as a function of the contextual features. The fitting process learns this function nonparametrically via a deep neural network. To attain sparse coefficients, we train the network with a novel lasso regularizer in the form of a projection layer that maps the network's output onto the space of $\ell_1$-constrained linear models. An extensive suite of experiments on real and synthetic data suggests that the learned models, which remain highly transparent, can be sparser than the regular lasso without sacrificing the predictive power of a standard deep neural network.
△ Less
Submitted 2 January, 2024; v1 submitted 2 February, 2023;
originally announced February 2023.
-
A New Approach to Drifting Games, Based on Asymptotically Optimal Potentials
Authors:
Zhilei Wang,
Robert V. Kohn
Abstract:
We develop a new approach to drifting games, a class of two-person games with many applications to boosting and online learning settings. Our approach involves (a) guessing an asymptotically optimal potential by solving an associated partial differential equation (PDE); then (b) justifying the guess, by proving upper and lower bounds on the final-time loss whose difference scales like a negative p…
▽ More
We develop a new approach to drifting games, a class of two-person games with many applications to boosting and online learning settings. Our approach involves (a) guessing an asymptotically optimal potential by solving an associated partial differential equation (PDE); then (b) justifying the guess, by proving upper and lower bounds on the final-time loss whose difference scales like a negative power of the number of time steps. The proofs of our potential-based upper bounds are elementary, using little more than Taylor expansion. The proofs of our potential-based lower bounds are also elementary, combining Taylor expansion with probabilistic or combinatorial arguments. Not only is our approach more elementary, but we give new potentials and derive corresponding upper and lower bounds that match each other in the asymptotic regime.
△ Less
Submitted 11 February, 2023; v1 submitted 22 July, 2022;
originally announced July 2022.
-
A PDE-Based Analysis of the Symmetric Two-Armed Bernoulli Bandit
Authors:
Vladimir A. Kobzar,
Robert V. Kohn
Abstract:
This work addresses a version of the two-armed Bernoulli bandit problem where the sum of the means of the arms is one (the symmetric two-armed Bernoulli bandit). In a regime where the gap between these means goes to zero as the number of prediction periods approaches infinity, i.e., the difficulty of detecting the gap increases as the sample size increases, we obtain the leading order terms of the…
▽ More
This work addresses a version of the two-armed Bernoulli bandit problem where the sum of the means of the arms is one (the symmetric two-armed Bernoulli bandit). In a regime where the gap between these means goes to zero as the number of prediction periods approaches infinity, i.e., the difficulty of detecting the gap increases as the sample size increases, we obtain the leading order terms of the minmax optimal regret and pseudoregret for this problem by associating each of them with a solution of a linear heat equation. Our results improve upon the previously known results; specifically, we explicitly compute these leading order terms in three different scaling regimes for the gap. Additionally, we obtain new non-asymptotic bounds for any given time horizon. Although optimal player strategies are not known for more general bandit problems, there is significant interest in considering how regret accumulates under specific player strategies, even when they are not known to be optimal. We expect that the methods of this paper should be useful in settings of that type.
△ Less
Submitted 14 July, 2023; v1 submitted 11 February, 2022;
originally announced February 2022.
-
A PDE Approach to the Prediction of a Binary Sequence with Advice from Two History-Dependent Experts
Authors:
Nadejda Drenska,
Robert V. Kohn
Abstract:
The prediction of a binary sequence is a classic example of online machine learning. We like to call it the 'stock prediction problem,' viewing the sequence as the price history of a stock that goes up or down one unit at each time step. In this problem, an investor has access to the predictions of two or more 'experts,' and strives to minimize her final-time regret with respect to the best-perfor…
▽ More
The prediction of a binary sequence is a classic example of online machine learning. We like to call it the 'stock prediction problem,' viewing the sequence as the price history of a stock that goes up or down one unit at each time step. In this problem, an investor has access to the predictions of two or more 'experts,' and strives to minimize her final-time regret with respect to the best-performing expert. Probability plays no role; rather, the market is assumed to be adversarial. We consider the case when there are two history-dependent experts, whose predictions are determined by the d most recent stock moves. Focusing on an appropriate continuum limit and using methods from optimal control, graph theory, and partial differential equations, we discuss strategies for the investor and the adversarial market, and we determine associated upper and lower bounds for the investor's final-time regret. When d is less than 4 our upper and lower bounds coalesce, so the proposed strategies are asymptotically optimal. Compared to other recent applications of partial differential equations to prediction, ours has a new element: there are two timescales, since the recent history changes at every step whereas regret accumulates more slowly.
△ Less
Submitted 24 July, 2020;
originally announced July 2020.
-
New Potential-Based Bounds for the Geometric-Stop** Version of Prediction with Expert Advice
Authors:
Vladimir A. Kobzar,
Robert V. Kohn,
Zhilei Wang
Abstract:
This work addresses the classic machine learning problem of online prediction with expert advice. A new potential-based framework for the fixed horizon version of this problem has been recently developed using verification arguments from optimal control theory. This paper extends this framework to the random (geometric) stop** version. To obtain explicit bounds, we construct potentials for the g…
▽ More
This work addresses the classic machine learning problem of online prediction with expert advice. A new potential-based framework for the fixed horizon version of this problem has been recently developed using verification arguments from optimal control theory. This paper extends this framework to the random (geometric) stop** version. To obtain explicit bounds, we construct potentials for the geometric version from potentials used for the fixed horizon version of the problem. This construction leads to new explicit lower and upper bounds associated with specific adversary and player strategies. While there are several known lower bounds in the fixed horizon setting, our lower bounds appear to be the first such results in the geometric stop** setting with an arbitrary number of experts. Our framework also leads in some cases to improved upper bounds. For two and three experts, our bounds are optimal to leading order.
△ Less
Submitted 29 June, 2020; v1 submitted 4 December, 2019;
originally announced December 2019.
-
New Potential-Based Bounds for Prediction with Expert Advice
Authors:
Vladimir A. Kobzar,
Robert V. Kohn,
Zhilei Wang
Abstract:
This work addresses the classic machine learning problem of online prediction with expert advice. We consider the finite-horizon version of this zero-sum, two-person game. Using verification arguments from optimal control theory, we view the task of finding better lower and upper bounds on the value of the game (regret) as the problem of finding better sub- and supersolutions of certain partial di…
▽ More
This work addresses the classic machine learning problem of online prediction with expert advice. We consider the finite-horizon version of this zero-sum, two-person game. Using verification arguments from optimal control theory, we view the task of finding better lower and upper bounds on the value of the game (regret) as the problem of finding better sub- and supersolutions of certain partial differential equations (PDEs). These sub- and supersolutions serve as the potentials for player and adversary strategies, which lead to the corresponding bounds. To get explicit bounds, we use closed-form solutions of specific PDEs. Our bounds hold for any given number of experts and horizon; in certain regimes (which we identify) they improve upon the previous state of the art. For two and three experts, our bounds provide the optimal leading order term.
△ Less
Submitted 29 June, 2020; v1 submitted 5 November, 2019;
originally announced November 2019.
-
Prediction with Expert Advice: a PDE Perspective
Authors:
Nadejda Drenska,
Robert V. Kohn
Abstract:
This work addresses a classic problem of online prediction with expert advice. We assume an adversarial opponent, and we consider both the finite-horizon and random-stop** versions of this zero-sum, two-person game. Focusing on an appropriate continuum limit and using methods from optimal control, we characterize the value of the game as the viscosity solution of a certain nonlinear partial diff…
▽ More
This work addresses a classic problem of online prediction with expert advice. We assume an adversarial opponent, and we consider both the finite-horizon and random-stop** versions of this zero-sum, two-person game. Focusing on an appropriate continuum limit and using methods from optimal control, we characterize the value of the game as the viscosity solution of a certain nonlinear partial differential equation. The analysis also reveals the predictor's and the opponent's minimax optimal strategies. Our work provides, in particular, a continuum perspective on recent work of Gravin, Peres, and Sivan (Proc SODA 2016). Our techniques are similar to those of Kohn and Serfaty (Comm Pure Appl Math 2010), where scaling limits of some two-person games led to elliptic or parabolic PDEs.
△ Less
Submitted 25 April, 2019;
originally announced April 2019.
-
Variance reduction properties of the reparameterization trick
Authors:
Ming Xu,
Matias Quiroz,
Robert Kohn,
Scott A. Sisson
Abstract:
The reparameterization trick is widely used in variational inference as it yields more accurate estimates of the gradient of the variational objective than alternative approaches such as the score function method. Although there is overwhelming empirical evidence in the literature showing its success, there is relatively little research exploring why the reparameterization trick is so effective. W…
▽ More
The reparameterization trick is widely used in variational inference as it yields more accurate estimates of the gradient of the variational objective than alternative approaches such as the score function method. Although there is overwhelming empirical evidence in the literature showing its success, there is relatively little research exploring why the reparameterization trick is so effective. We explore this under the idealized assumptions that the variational approximation is a mean-field Gaussian density and that the log of the joint density of the model parameters and the data is a quadratic function that depends on the variational mean. From this, we show that the marginal variances of the reparameterization gradient estimator are smaller than those of the score function gradient estimator. We apply the result of our idealized analysis to real-world examples.
△ Less
Submitted 27 December, 2018; v1 submitted 26 September, 2018;
originally announced September 2018.