-
Stochastic gradient descent-based inference for dynamic network models with attractors
Authors:
Hancong Pan,
Xiaojing Zhu,
Cantay Caliskan,
Dino P. Christenson,
Konstantinos Spiliopoulos,
Dylan Walker,
Eric D. Kolaczyk
Abstract:
In Coevolving Latent Space Networks with Attractors (CLSNA) models, nodes in a latent space represent social actors, and edges indicate their dynamic interactions. Attractors are added at the latent level to capture the notion of attractive and repulsive forces between nodes, borrowing from dynamical systems theory. However, CLSNA's reliance on MCMC estimation makes scaling difficult, and the requirement for nodes to be present throughout the study period limits practical applications. We address these issues by (i) introducing a stochastic gradient descent (SGD) parameter estimation method, (ii) developing a novel approach for uncertainty quantification using SGD, and (iii) extending the model to allow nodes to join and leave over time. Simulation results show that our extensions result in little loss of accuracy compared to MCMC, but can scale to much larger networks. We apply our approach to the longitudinal social networks of members of US Congress on the social media platform X. Accounting for node dynamics overcomes selection bias in the network and uncovers uniquely and increasingly repulsive forces within the Republican Party.
Submitted 20 March, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences
Authors:
Samuel Chun-Hei Lam,
Justin Sirignano,
Konstantinos Spiliopoulos
Abstract:
Mathematical methods are developed to characterize the asymptotics of recurrent neural networks (RNN) as the number of hidden units, data samples in the sequence, hidden state updates, and training steps simultaneously grow to infinity. In the case of an RNN with a simplified weight matrix, we prove the convergence of the RNN to the solution of an infinite-dimensional ODE coupled with the fixed point of a random algebraic equation. The analysis requires addressing several challenges which are unique to RNNs. In typical mean-field applications (e.g., feedforward neural networks), discrete updates are of magnitude $\mathcal{O}(\frac{1}{N})$ and the number of updates is $\mathcal{O}(N)$. Therefore, the system can be represented as an Euler approximation of an appropriate ODE/PDE, which it will converge to as $N \rightarrow \infty$. However, the RNN hidden layer updates are $\mathcal{O}(1)$. Therefore, RNNs cannot be represented as a discretization of an ODE/PDE and standard mean-field techniques cannot be applied. Instead, we develop a fixed point analysis for the evolution of the RNN memory states, with convergence estimates in terms of the number of update steps and the number of hidden units. The RNN hidden layer is studied as a function in a Sobolev space, whose evolution is governed by the data sequence (a Markov chain), the parameter updates, and its dependence on the RNN hidden layer at the previous time step. Due to the strong correlation between updates, a Poisson equation must be used to bound the fluctuations of the RNN around its limit equation. These mathematical methods give rise to the neural tangent kernel (NTK) limits for RNNs trained on data sequences as the number of data samples and size of the neural network grow to infinity.
Submitted 15 May, 2024; v1 submitted 28 August, 2023;
originally announced August 2023.
-
Transport map unadjusted Langevin algorithms: learning and discretizing perturbed samplers
Authors:
Benjamin J. Zhang,
Youssef M. Marzouk,
Konstantinos Spiliopoulos
Abstract:
Langevin dynamics are widely used in sampling high-dimensional, non-Gaussian distributions whose densities are known up to a normalizing constant. In particular, there is strong interest in unadjusted Langevin algorithms (ULA), which directly discretize Langevin dynamics to estimate expectations over the target distribution. We study the use of transport maps that approximately normalize a target distribution as a way to precondition and accelerate the convergence of Langevin dynamics. We show that in continuous time, when a transport map is applied to Langevin dynamics, the result is a Riemannian manifold Langevin dynamics (RMLD) with metric defined by the transport map. We also show that applying a transport map to an irreversibly-perturbed ULA results in a geometry-informed irreversible perturbation (GiIrr) of the original dynamics. These connections suggest more systematic ways of learning metrics and perturbations, and also yield alternative discretizations of the RMLD described by the map, which we study. Under appropriate conditions, these discretized processes can be endowed with non-asymptotic bounds describing convergence to the target distribution in 2-Wasserstein distance. Illustrative numerical results complement our theoretical claims.
Submitted 28 September, 2023; v1 submitted 14 February, 2023;
originally announced February 2023.
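To make the unadjusted Langevin algorithm (ULA) mentioned in the abstract above concrete, the following is a minimal sketch of plain ULA only, not of the transport-map or Riemannian variants the paper studies. It applies an Euler-Maruyama step to dX = ∇log π(X) dt + √2 dW with no accept-reject correction. The function name `ula_chain`, the one-dimensional standard Gaussian target, and the step size are assumptions chosen for illustration.

```python
import math
import random

def ula_chain(grad_log_density, x0, step, n_steps, rng):
    """Plain ULA: Euler-Maruyama discretization of Langevin dynamics, no accept-reject step."""
    x = float(x0)
    chain = []
    for _ in range(n_steps):
        # Drift along the gradient of the log-density plus Gaussian noise of variance 2 * step.
        x = x + step * grad_log_density(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain

# Hypothetical target: standard Gaussian, so grad log pi(x) = -x.
rng = random.Random(0)
chain = ula_chain(lambda x: -x, x0=0.0, step=0.05, n_steps=20000, rng=rng)
mean_estimate = sum(chain) / len(chain)
second_moment = sum(x * x for x in chain) / len(chain)
```

Because no Metropolis-Hastings correction is applied, the chain targets a slightly perturbed distribution: for this Gaussian example the stationary variance is 1 / (1 - step / 2) rather than exactly 1, which is the kind of discretization bias the non-asymptotic bounds in the abstract are concerned with.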
-
Normalization effects on deep neural networks
Authors:
Jiahui Yu,
Konstantinos Spiliopoulos
Abstract:
We study the effect of normalization on the layers of deep neural networks of feed-forward type. A given layer $i$ with $N_{i}$ hidden units is allowed to be normalized by $1/N_{i}^{\gamma_{i}}$ with $\gamma_{i}\in[1/2,1]$ and we study the effect of the choice of the $\gamma_{i}$ on the statistical behavior of the neural network's output (such as variance) as well as on the test accuracy on the MNIST data set. We find that in terms of variance of the neural network's output and test accuracy the best choice is to choose the $\gamma_{i}$'s to be equal to one, which is the mean-field scaling. We also find that this is particularly true for the outer layer, in that the neural network's behavior is more sensitive to the scaling of the outer layer than to the scaling of the inner layers. The mechanism for the mathematical analysis is an asymptotic expansion for the neural network's output. An important practical consequence of the analysis is that it provides a systematic and mathematically informed way to choose the learning rate hyperparameters. Such a choice guarantees that the neural network behaves in a statistically robust way as the $N_i$ grow to infinity.
Submitted 2 September, 2022;
originally announced September 2022.
-
Importance sampling for stochastic reaction-diffusion equations in the moderate deviation regime
Authors:
Ioannis Gasteratos,
Michael Salins,
Konstantinos Spiliopoulos
Abstract:
We develop a provably efficient importance sampling scheme that estimates exit probabilities of solutions to small-noise stochastic reaction-diffusion equations from scaled neighborhoods of a stable equilibrium. The moderate deviation scaling allows for a local approximation of the nonlinear dynamics by their linearized version. In addition, we identify a finite-dimensional subspace where exits take place with high probability. Using stochastic control and variational methods we show that our scheme performs well both in the zero noise limit and pre-asymptotically. Simulation studies for stochastically perturbed bistable dynamics illustrate the theoretical results.
Submitted 22 October, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Disentangling positive and negative partisanship in social media interactions using a coevolving latent space network with attractors model
Authors:
Xiaojing Zhu,
Cantay Caliskan,
Dino P. Christenson,
Konstantinos Spiliopoulos,
Dylan Walker,
Eric D. Kolaczyk
Abstract:
We develop a broadly applicable class of coevolving latent space network with attractors (CLSNA) models, where nodes represent individual social actors assumed to lie in an unknown latent space, edges represent the presence of a specified interaction between actors, and attractors are added in the latent level to capture the notion of attractive and repulsive forces. We apply the CLSNA models to understand the dynamics of partisan polarization on social media, where we expect Republicans and Democrats to increasingly interact with their own party and disengage with the opposing party. Using longitudinal social networks from the social media platforms Twitter and Reddit, we investigate the relative contributions of positive (attractive) and negative (repulsive) forces among political elites and the public, respectively. Our goals are to disentangle the positive and negative forces within and between parties and explore if and how they change over time. Our analysis confirms the existence of partisan polarization in social media interactions among both political elites and the public. Moreover, while positive partisanship is the driving force of interactions across the full periods of study for both the public and Democratic elites, negative partisanship has come to dominate Republican elites' interactions since the run-up to the 2016 presidential election.
Submitted 13 August, 2022; v1 submitted 27 September, 2021;
originally announced September 2021.
-
Geometry-informed irreversible perturbations for accelerated convergence of Langevin dynamics
Authors:
Benjamin J. Zhang,
Youssef M. Marzouk,
Konstantinos Spiliopoulos
Abstract:
We introduce a novel geometry-informed irreversible perturbation that accelerates convergence of the Langevin algorithm for Bayesian computation. It is well documented that there exist perturbations to the Langevin dynamics that preserve its invariant measure while accelerating its convergence. Irreversible perturbations and reversible perturbations (such as Riemannian manifold Langevin dynamics (RMLD)) have separately been shown to improve the performance of Langevin samplers. We consider these two perturbations simultaneously by presenting a novel form of irreversible perturbation for RMLD that is informed by the underlying geometry. Through numerical examples, we show that this new irreversible perturbation can improve estimation performance over irreversible perturbations that do not take the geometry into account. Moreover we demonstrate that irreversible perturbations generally can be implemented in conjunction with the stochastic gradient version of the Langevin algorithm. Lastly, while continuous-time irreversible perturbations cannot impair the performance of a Langevin estimator, the situation can sometimes be more complicated when discretization is considered. To this end, we describe a discrete-time example in which irreversibility increases both the bias and variance of the resulting estimator.
Submitted 1 September, 2022; v1 submitted 18 August, 2021;
originally announced August 2021.
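As a rough illustration of what an irreversible perturbation of Langevin dynamics looks like, the sketch below uses the standard constant-matrix construction, not the geometry-informed perturbation introduced in the paper. The drift ∇log π is replaced by (I + δJ)∇log π with J antisymmetric, which leaves π invariant in continuous time. The function name `irreversible_ula`, the two-dimensional standard Gaussian target, and the values of `delta` and `step` are assumptions for illustration.

```python
import math
import random

def irreversible_ula(grad_log_density, x0, step, n_steps, delta, rng):
    """Euler-Maruyama steps for dX = (I + delta*J) grad log pi(X) dt + sqrt(2) dW in 2D."""
    x, y = x0
    chain = []
    scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        gx, gy = grad_log_density(x, y)
        # J = [[0, 1], [-1, 0]] is antisymmetric, so J @ (gx, gy) = (gy, -gx).
        drift_x = gx + delta * gy
        drift_y = gy - delta * gx
        x, y = (x + step * drift_x + scale * rng.gauss(0.0, 1.0),
                y + step * drift_y + scale * rng.gauss(0.0, 1.0))
        chain.append((x, y))
    return chain

# Hypothetical target: standard 2D Gaussian, so grad log pi(x, y) = (-x, -y).
rng = random.Random(0)
chain = irreversible_ula(lambda x, y: (-x, -y), x0=(0.0, 0.0),
                         step=0.01, n_steps=50000, delta=1.0, rng=rng)
mean_x = sum(p[0] for p in chain) / len(chain)
second_moment_x = sum(p[0] ** 2 for p in chain) / len(chain)
```

Setting `delta=0.0` recovers the plain unadjusted Langevin step. For this Gaussian example the discretized chain has stationary variance 1 / (1 - step * (1 + delta**2) / 2), so a larger `delta` increases the discretization bias at a fixed step size, consistent with the abstract's caution about discrete time.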
-
Normalization effects on shallow neural networks and related asymptotic expansions
Authors:
Jiahui Yu,
Konstantinos Spiliopoulos
Abstract:
We consider shallow (single hidden layer) neural networks and characterize their performance when trained with stochastic gradient descent as the number of hidden units $N$ and gradient descent steps grow to infinity. In particular, we investigate the effect of different scaling schemes, which lead to different normalizations of the neural network, on the network's statistical output, closing the gap between the $1/\sqrt{N}$ and the mean-field $1/N$ normalization. We develop an asymptotic expansion for the neural network's statistical output pointwise with respect to the scaling parameter as the number of hidden units grows to infinity. Based on this expansion, we demonstrate mathematically that to leading order in $N$, there is no bias-variance trade off, in that both bias and variance (both explicitly characterized) decrease as the number of hidden units increases and time grows. In addition, we show that to leading order in $N$, the variance of the neural network's statistical output decays as the implied normalization by the scaling parameter approaches the mean field normalization. Numerical studies on the MNIST and CIFAR10 datasets show that test and train accuracy monotonically improve as the neural network's normalization gets closer to the mean field normalization.
Submitted 1 June, 2022; v1 submitted 20 November, 2020;
originally announced November 2020.
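The effect of the normalization exponent on output variance can be seen already at initialization, before any training. The sketch below assumes a shallow network of the form g(x) = N^{-γ} Σ c_i tanh(w_i x) with γ between 1/2 and 1, which interpolates between the $1/\sqrt{N}$ and mean-field $1/N$ normalizations named in the abstract. The tanh activation, the standard Gaussian initialization, and the names `shallow_output` and `output_variance` are assumptions for illustration; the paper's results concern the trained network, which this sketch does not cover.

```python
import math
import random

def shallow_output(x, n_hidden, gamma, rng):
    """Output of a randomly initialized single-hidden-layer network scaled by N**(-gamma)."""
    total = 0.0
    for _ in range(n_hidden):
        c = rng.gauss(0.0, 1.0)  # outer-layer weight
        w = rng.gauss(0.0, 1.0)  # hidden-layer weight
        total += c * math.tanh(w * x)
    return total / n_hidden ** gamma

def output_variance(gamma, n_hidden=100, n_trials=300, x=1.0, seed=0):
    """Empirical variance of the output over independent random initializations."""
    rng = random.Random(seed)
    outputs = [shallow_output(x, n_hidden, gamma, rng) for _ in range(n_trials)]
    mean = sum(outputs) / n_trials
    return sum((o - mean) ** 2 for o in outputs) / n_trials

# Same seed for every gamma, so only the normalization differs between runs.
variances = {gamma: output_variance(gamma) for gamma in (0.5, 0.75, 1.0)}
```

At initialization the variance scales like N^{1-2γ}, so it stays of order one for γ = 1/2 and shrinks like 1/N for γ = 1.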
-
Asymptotics of Reinforcement Learning with Neural Networks
Authors:
Justin Sirignano,
Konstantinos Spiliopoulos
Abstract:
We prove that a single-layer neural network trained with the Q-learning algorithm converges in distribution to a random ordinary differential equation as the size of the model and the number of training steps become large. Analysis of the limit differential equation shows that it has a unique stationary solution which is the solution of the Bellman equation, thus giving the optimal control for the problem. In addition, we study the convergence of the limit differential equation to the stationary solution. As a by-product of our analysis, we obtain the limiting behavior of single-layer neural networks when trained on i.i.d. data with stochastic gradient descent under the widely-used Xavier initialization.
Submitted 2 April, 2021; v1 submitted 13 November, 2019;
originally announced November 2019.
-
Scaling Limit of Neural Networks with the Xavier Initialization and Convergence to a Global Minimum
Authors:
Justin Sirignano,
Konstantinos Spiliopoulos
Abstract:
We analyze single-layer neural networks with the Xavier initialization in the asymptotic regime of large numbers of hidden units and large numbers of stochastic gradient descent training steps. The evolution of the neural network during training can be viewed as a stochastic system and, using techniques from stochastic analysis, we prove the neural network converges in distribution to a random ODE with a Gaussian distribution. The limit is completely different than in the typical mean-field results for neural networks due to the $\frac{1}{\sqrt{N}}$ normalization factor in the Xavier initialization (versus the $\frac{1}{N}$ factor in the typical mean-field framework). Although the pre-limit problem of optimizing a neural network is non-convex (and therefore the neural network may converge to a local minimum), the limit equation minimizes a (quadratic) convex objective function and therefore converges to a global minimum. Furthermore, under reasonable assumptions, the matrix in the limiting quadratic objective function is positive definite and thus the neural network (in the limit) will converge to a global minimum with zero loss on the training set.
Submitted 12 April, 2022; v1 submitted 9 July, 2019;
originally announced July 2019.
-
Mean Field Analysis of Deep Neural Networks
Authors:
Justin Sirignano,
Konstantinos Spiliopoulos
Abstract:
We analyze multi-layer neural networks in the asymptotic regime of simultaneously (A) large network sizes and (B) large numbers of stochastic gradient descent training iterations. We rigorously establish the limiting behavior of the multi-layer neural network output. The limit procedure is valid for any number of hidden layers and it naturally also describes the limiting behavior of the training loss. The ideas that we explore are to (a) take the limits of each hidden layer sequentially and (b) characterize the evolution of parameters in terms of their initialization. The limit satisfies a system of deterministic integro-differential equations. The proof uses methods from weak convergence and stochastic analysis. We show that, under suitable assumptions on the activation functions and the behavior for large times, the limit neural network recovers a global minimum (with zero loss for the objective function).
Submitted 2 April, 2021; v1 submitted 11 March, 2019;
originally announced March 2019.
-
Information geometry for approximate Bayesian computation
Authors:
Konstantinos Spiliopoulos
Abstract:
The goal of this paper is to explore the basic Approximate Bayesian Computation (ABC) algorithm via the lens of information theory. ABC is a widely used algorithm in cases where the likelihood of the data is hard to work with or intractable, but one can simulate from it. We use relative entropy ideas to analyze the behavior of the algorithm as a function of the threshold parameter and of the size of the data. Relative entropy here is data driven as it depends on the values of the observed statistics. Relative entropy also allows us to explore the effect of the distance metric and sets up a mathematical framework for sensitivity analysis allowing to find important directions which could lead to lower computational cost of the algorithm for the same level of accuracy. In addition, we also investigate the bias of the estimators for generic observables as a function of both the threshold parameters and the size of the data. Our analysis provides error bounds on performance for positive tolerances and finite sample sizes. Simulation studies complement and illustrate the theoretical results.
Submitted 12 August, 2019; v1 submitted 5 December, 2018;
originally announced December 2018.
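For readers unfamiliar with the basic ABC algorithm analyzed in the abstract above, the following is a minimal rejection-sampling sketch: draw a parameter from the prior, simulate a summary statistic, and keep the draw when the statistic lies within a threshold of the observed one. The Gaussian model, the uniform prior, the sample mean as summary statistic, the absolute-difference distance, and all function names are assumptions for illustration and are not taken from the paper.

```python
import random

def abc_rejection(observed_stat, sample_prior, simulate_stat, threshold, n_draws, rng):
    """Basic ABC: accept prior draws whose simulated statistic is within `threshold` of the data."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        if abs(simulate_stat(theta, rng) - observed_stat) <= threshold:
            accepted.append(theta)
    return accepted

def sample_prior(rng):
    # Hypothetical prior: uniform on [-5, 5].
    return rng.uniform(-5.0, 5.0)

def simulate_stat(theta, rng, n_obs=20):
    # Hypothetical model: n_obs draws from N(theta, 1), summarized by their sample mean.
    return sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs

rng = random.Random(0)
accepted = abc_rejection(1.0, sample_prior, simulate_stat,
                         threshold=0.1, n_draws=20000, rng=rng)
posterior_mean = sum(accepted) / len(accepted)
```

Shrinking `threshold` brings the accepted draws closer to the exact posterior but lowers the acceptance rate, which is the accuracy-versus-cost trade-off the abstract studies as a function of the threshold and the data size.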
-
Mean Field Analysis of Neural Networks: A Central Limit Theorem
Authors:
Justin Sirignano,
Konstantinos Spiliopoulos
Abstract:
We rigorously prove a central limit theorem for neural network models with a single hidden layer. The central limit theorem is proven in the asymptotic regime of simultaneously (A) large numbers of hidden units and (B) large numbers of stochastic gradient descent training iterations. Our result describes the neural network's fluctuations around its mean-field limit. The fluctuations have a Gaussian distribution and satisfy a stochastic partial differential equation. The proof relies upon weak convergence methods from stochastic analysis. In particular, we prove relative compactness for the sequence of processes and uniqueness of the limiting process in a suitable Sobolev space.
Submitted 3 June, 2019; v1 submitted 28 August, 2018;
originally announced August 2018.
-
Importance sampling for slow-fast diffusions based on moderate deviations
Authors:
Matthew R. Morse,
Konstantinos Spiliopoulos
Abstract:
We consider systems of slow-fast diffusions with small noise in the slow component. We construct provably logarithmic asymptotically optimal importance sampling schemes for the estimation of rare events based on the moderate deviations principle. Using the subsolution approach we construct schemes and identify conditions under which the schemes will be asymptotically optimal. Moderate deviations-based importance sampling offers a viable alternative to large deviations importance sampling when the events are not too rare. In particular, in many cases of interest one can indeed construct the required change of measure in closed form, a task which is more complicated using large deviations-based importance sampling, especially when it comes to multiscale dynamically evolving processes. The presence of multiple scales and the fact that we do not make any periodicity assumptions for the coefficients driving the processes complicate the design and the analysis of efficient importance sampling schemes. Simulation studies illustrate the theory.
Submitted 6 January, 2020; v1 submitted 25 May, 2018;
originally announced May 2018.
-
Stochastic Gradient Descent in Continuous Time: A Central Limit Theorem
Authors:
Justin Sirignano,
Konstantinos Spiliopoulos
Abstract:
Stochastic gradient descent in continuous time (SGDCT) provides a computationally efficient method for the statistical learning of continuous-time models, which are widely used in science, engineering, and finance. The SGDCT algorithm follows a (noisy) descent direction along a continuous stream of data. The parameter updates occur in continuous time and satisfy a stochastic differential equation. This paper analyzes the asymptotic convergence rate of the SGDCT algorithm by proving a central limit theorem (CLT) for strongly convex objective functions and, under slightly stronger conditions, for non-convex objective functions as well. An $L^{p}$ convergence rate is also proven for the algorithm in the strongly convex case. The mathematical analysis lies at the intersection of stochastic analysis and statistical learning.
Submitted 17 June, 2019; v1 submitted 11 October, 2017;
originally announced October 2017.
-
Discrete-Time Statistical Inference for Multiscale Diffusions
Authors:
Siragan Gailus,
Konstantinos Spiliopoulos
Abstract:
We study statistical inference for small-noise-perturbed multiscale dynamical systems under the assumption that we observe a single time series from the slow process only. We construct estimators for both averaging and homogenization regimes, based on an appropriate misspecified model motivated by a second-order stochastic Taylor expansion of the slow process with respect to a function of the time-scale separation parameter. In the case of a fixed number of observations, we establish consistency, asymptotic normality, and asymptotic statistical efficiency of a minimum contrast estimator (MCE), the limiting variance having been identified explicitly; we furthermore establish consistency and asymptotic normality of a simplified minimum contrast estimator (SMCE), which is, however, not in general efficient. These results are then extended to the case of high-frequency observations under a condition restricting the rate at which the number of observations may grow vis-à-vis the separation of scales. Numerical simulations illustrate the theoretical results.
Submitted 11 September, 2018; v1 submitted 7 September, 2017;
originally announced September 2017.
-
DGM: A deep learning algorithm for solving partial differential equations
Authors:
Justin Sirignano,
Konstantinos Spiliopoulos
Abstract:
High-dimensional PDEs have been a longstanding computational challenge. We propose to solve high-dimensional PDEs by approximating the solution with a deep neural network which is trained to satisfy the differential operator, initial condition, and boundary conditions. Our algorithm is meshfree, which is key since meshes become infeasible in higher dimensions. Instead of forming a mesh, the neural network is trained on batches of randomly sampled time and space points. The algorithm is tested on a class of high-dimensional free boundary PDEs, which we are able to accurately solve in up to $200$ dimensions. The algorithm is also tested on a high-dimensional Hamilton-Jacobi-Bellman PDE and Burgers' equation. The deep learning algorithm approximates the general solution to the Burgers' equation for a continuum of different boundary conditions and physical conditions (which can be viewed as a high-dimensional space). We call the algorithm a "Deep Galerkin Method (DGM)" since it is similar in spirit to Galerkin methods, with the solution approximated by a neural network instead of a linear combination of basis functions. In addition, we prove a theorem regarding the approximation power of neural networks for a class of quasilinear parabolic PDEs.
Submitted 5 September, 2018; v1 submitted 24 August, 2017;
originally announced August 2017.
-
Importance sampling for metastable and multiscale dynamical systems
Authors:
Konstantinos Spiliopoulos
Abstract:
In this article, we address the issues that come up in the design of importance sampling schemes for rare events associated to stochastic dynamical systems. We focus on the issue of metastability and on the effect of multiple scales. We discuss why seemingly reasonable schemes that follow large deviations optimal paths may perform poorly in practice, even though they are asymptotically optimal. Pre-asymptotic optimality is important when one deals with metastable dynamics and we discuss possible ways as to how to address this issue. Moreover, we discuss how the effect of the multiple scales (either in periodic or random environments) on the efficient design of importance sampling should be addressed. We discuss the mathematical and practical issues that come up, how to overcome some of the issues and discuss future challenges.
Submitted 27 July, 2017;
originally announced July 2017.
-
Optimal Scaling of the MALA algorithm with Irreversible Proposals for Gaussian targets
Authors:
Michela Ottobre,
Natesh S. Pillai,
Konstantinos Spiliopoulos
Abstract:
It is well known in many settings that reversible Langevin diffusions in confining potentials converge to equilibrium exponentially fast. Adding irreversible perturbations to the drift of a Langevin diffusion that maintain the same invariant measure accelerates its convergence to stationarity. Many existing works thus advocate the use of such non-reversible dynamics for sampling. When implementing Markov Chain Monte Carlo algorithms (MCMC) using time discretisations of such Stochastic Differential Equations (SDEs), one can append the discretization with the usual Metropolis-Hastings accept-reject step and this is often done in practice because the accept-reject step eliminates bias. On the other hand, such a step makes the resulting chain reversible. It is not known whether adding the accept-reject step preserves the faster mixing properties of the non-reversible dynamics. In this paper, we address this gap between theory and practice by analyzing the optimal scaling of MCMC algorithms constructed from proposal moves that are time-step Euler discretisations of an irreversible SDE, for high dimensional Gaussian target measures. We call the resulting algorithm the ipMALA, in comparison to the classical MALA algorithm (here "ip" is for irreversible proposal). In order to quantify how the cost of the algorithm scales with the dimension $N$, we prove invariance principles for the appropriately rescaled chain. In contrast to the usual MALA algorithm, we show that there could be two regimes asymptotically: (i) a diffusive regime, as in the MALA algorithm and (ii) a "fluid" regime where the limit is an ordinary differential equation. We provide concrete examples where the limit is a diffusion, as in the standard MALA, but with provably higher limiting acceptance probabilities. Numerical results are also given corroborating the theory.
Submitted 1 July, 2019; v1 submitted 6 February, 2017;
originally announced February 2017.
-
Stochastic Gradient Descent in Continuous Time
Authors:
Justin Sirignano,
Konstantinos Spiliopoulos
Abstract:
Stochastic gradient descent in continuous time (SGDCT) provides a computationally efficient method for the statistical learning of continuous-time models, which are widely used in science, engineering, and finance. The SGDCT algorithm follows a (noisy) descent direction along a continuous stream of data. SGDCT performs an online parameter update in continuous time, with the parameter updates $θ_t$ satisfying a stochastic differential equation. We prove that $\lim_{t \rightarrow \infty} \nabla \bar g(θ_t) = 0$ where $\bar g$ is a natural objective function for the estimation of the continuous-time dynamics. The convergence proof leverages ergodicity by using an appropriate Poisson equation to help describe the evolution of the parameters for large times. SGDCT can also be used to solve continuous-time optimization problems, such as American options. For certain continuous-time problems, SGDCT has some promising advantages compared to a traditional stochastic gradient descent algorithm. As an example application, SGDCT is combined with a deep neural network to price high-dimensional American options (up to 100 dimensions).
Submitted 29 October, 2017; v1 submitted 16 November, 2016;
originally announced November 2016.
-
Rare event simulation via importance sampling for linear SPDE's
Authors:
Michael Salins,
Konstantinos Spiliopoulos
Abstract:
The goal of this paper is to develop provably efficient importance sampling Monte Carlo methods for the estimation of rare events within the class of linear stochastic partial differential equations (SPDEs). We find that if a spectral gap of appropriate size exists, then one can identify a lower dimensional manifold where the rare event takes place. This allows one to build importance sampling changes of measures that perform provably well even pre-asymptotically (i.e. for small but non-zero size of the noise) without degrading in performance due to infinite dimensionality or due to long simulation time horizons. Simulation studies supplement and illustrate the theoretical results.
Submitted 4 May, 2017; v1 submitted 14 September, 2016;
originally announced September 2016.
-
Dimension Reduction in Statistical Estimation of Partially Observed Multiscale Processes
Authors:
Andrew Papanicolaou,
Konstantinos Spiliopoulos
Abstract:
We consider partially observed multiscale diffusion models that are specified up to an unknown vector parameter. We establish for a very general class of test functions that the filter of the original model converges to a filter of reduced dimension. Then, this result is used to justify statistical estimation for the unknown parameters of interest based on the model of reduced dimension but using the original available data. This allows one to learn the unknown parameters of interest while working in lower dimensions, as opposed to working with the original high dimensional system. Simulation studies support and illustrate the theoretical results.
Submitted 26 November, 2017; v1 submitted 20 July, 2016;
originally announced July 2016.
-
Analysis of multiscale integrators for multiple attractors and irreversible Langevin samplers
Authors:
Jianfeng Lu,
Konstantinos Spiliopoulos
Abstract:
We study multiscale integrator numerical schemes for a class of stiff stochastic differential equations (SDEs). We consider multiscale SDEs with potentially multiple attractors that behave as diffusions on graphs as the stiffness parameter goes to its limit. Classical numerical discretization schemes, such as the Euler-Maruyama scheme, become unstable as the stiffness parameter converges to its limit and appropriate multiscale integrators can correct for this. We rigorously establish the convergence of the numerical method to the related diffusion on graph, identifying the appropriate choice of discretization parameters. Theoretical results are supplemented by numerical studies on the problem of the recently developing area of introducing irreversibility in Langevin samplers in order to accelerate convergence to equilibrium.
Submitted 9 October, 2018; v1 submitted 30 June, 2016;
originally announced June 2016.
-
Improving the convergence of reversible samplers
Authors:
Luc Rey-Bellet,
Konstantinos Spiliopoulos
Abstract:
In Monte-Carlo methods the Markov processes used to sample a given target distribution usually satisfy detailed balance, i.e. they are time-reversible. However, relatively recent results have demonstrated that appropriate reversible and irreversible perturbations can accelerate convergence to equilibrium. In this paper we present some general design principles which apply to general Markov processes. Working with the generator of Markov processes, we prove that for some of the most commonly used performance criteria, i.e., spectral gap, asymptotic variance and large deviation functionals, sampling is improved for appropriate reversible and irreversible perturbations of some initially given reversible sampler. Moreover we provide specific constructions for such reversible and irreversible perturbations for various commonly used Markov processes, such as Markov chains and diffusions. In the case of diffusions, we make the discussion more specific using the large deviations rate function as a measure of performance.
Submitted 9 June, 2016; v1 submitted 29 January, 2016;
originally announced January 2016.
-
Sequential Monte Carlo for fractional Stochastic Volatility Models
Authors:
Alexandra Chronopoulou,
Konstantinos Spiliopoulos
Abstract:
In this paper we consider a fractional stochastic volatility model, that is a model in which the volatility may exhibit a long-range dependent or a rough/antipersistent behavior. We propose a dynamic sequential Monte Carlo methodology that is applicable to both long memory and antipersistent processes in order to estimate the volatility as well as the unknown parameters of the model. We establish a central limit theorem for the state and parameter filters and we study asymptotic properties (consistency and asymptotic normality) for the filter. We illustrate our results with a simulation study and we apply our method to estimating the volatility and the parameters of a long-range dependent model for S&P 500 data.
Submitted 25 February, 2017; v1 submitted 11 August, 2015;
originally announced August 2015.
-
Rare event simulation for multiscale diffusions in random environments
Authors:
Konstantinos Spiliopoulos
Abstract:
We consider systems of stochastic differential equations with multiple scales and small noise and assume that the coefficients of the equations are ergodic and stationary random fields. Our goal is to construct provably-efficient importance sampling Monte Carlo methods that allow efficient computation of rare event probabilities or expectations of functionals that can be associated with rare events. Standard Monte Carlo algorithms perform poorly in the small noise limit and hence fast simulation algorithms become relevant. The presence of multiple scales complicates the design and the analysis of efficient importance sampling schemes. An additional complication is the randomness of the environment. We construct explicit changes of measures that are proven to be logarithmic asymptotically efficient with probability one with respect to the random environment (i.e., in the quenched sense). Numerical simulations support the theoretical results.
Submitted 28 September, 2015; v1 submitted 1 October, 2014;
originally announced October 2014.