-
Almost sure convergence rates of stochastic gradient methods under gradient domination
Authors:
Simon Weissmann,
Sara Klein,
Waïss Azizian,
Leif Döring
Abstract:
Stochastic gradient methods are among the most important algorithms in training machine learning problems. While classical assumptions such as strong convexity allow a simple analysis, they are rarely satisfied in applications. In recent years, global and local gradient domination properties have been shown to be a more realistic replacement of strong convexity. They were proved to hold in diverse settings such as (simple) policy gradient methods in reinforcement learning and training of deep neural networks with analytic activation functions. We prove almost sure convergence rates $f(X_n)-f^*\in o\big( n^{-\frac{1}{4\beta-1}+\epsilon}\big)$ of the last iterate for stochastic gradient descent (with and without momentum) under global and local $\beta$-gradient domination assumptions. The almost sure rates get arbitrarily close to recent rates in expectation. Finally, we demonstrate how to apply our results to the training task in both supervised and reinforcement learning.
Submitted 27 May, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
-
On the mean-field limit for Stein variational gradient descent: stability and multilevel approximation
Authors:
Simon Weissmann,
Jakob Zech
Abstract:
In this paper we propose and analyze a novel multilevel version of Stein variational gradient descent (SVGD). SVGD is a recent particle based variational inference method. For Bayesian inverse problems with computationally expensive likelihood evaluations, the method can become prohibitive as it requires to evolve a discrete dynamical system over many time steps, each of which requires likelihood evaluations at all particle locations. To address this, we introduce a multilevel variant that involves running several interacting particle dynamics in parallel corresponding to different approximation levels of the likelihood. By carefully tuning the number of particles at each level, we prove that a significant reduction in computational complexity can be achieved. As an application we provide a numerical experiment for a PDE driven inverse problem, which confirms the speed up suggested by our theoretical results.
Submitted 2 February, 2024;
originally announced February 2024.
-
The Ensemble Kalman Filter for Dynamic Inverse Problems
Authors:
Simon Weissmann,
Neil K. Chada,
Xin T. Tong
Abstract:
In inverse problems, the goal is to estimate unknown model parameters from noisy observational data. Traditionally, inverse problems are solved under the assumption of a fixed forward operator describing the observation model. In this article, we consider the extension of this approach to situations where we have a dynamic forward model, motivated by applications in scientific computation and engineering. We specifically consider this extension for a derivative-free optimizer, the ensemble Kalman inversion (EKI). We introduce and justify a new methodology called dynamic-EKI, which is a particle-based method with a changing forward operator. We analyze our new method, presenting results related to the control of our particle system through its covariance structure. This analysis includes moment bounds and an ensemble collapse, which are essential for demonstrating a convergence result. We establish convergence in expectation and validate our theoretical findings through experiments with dynamic-EKI applied to a 2D Darcy flow partial differential equation.
Submitted 22 January, 2024;
originally announced January 2024.
-
Metropolis-adjusted interacting particle sampling
Authors:
Björn Sprungk,
Simon Weissmann,
Jakob Zech
Abstract:
In recent years, various interacting particle samplers have been developed to sample from complex target distributions, such as those found in Bayesian inverse problems. These samplers are motivated by the mean-field limit perspective and implemented as ensembles of particles that move in the product state space according to coupled stochastic differential equations. The ensemble approximation and numerical time stepping used to simulate these systems can introduce bias and affect the invariance of the particle system with respect to the target distribution. To correct for this, we investigate the use of a Metropolization step, similar to the Metropolis-adjusted Langevin algorithm. We examine Metropolization of either the whole ensemble or smaller subsets of the ensemble, and prove basic convergence of the resulting ensemble Markov chain to the target distribution. Our numerical results demonstrate the benefits of this correction in numerical examples for popular interacting particle samplers such as ALDI, CBS, and stochastic SVGD.
Submitted 21 December, 2023;
originally announced December 2023.
-
On the ensemble Kalman inversion under inequality constraints
Authors:
Matei Hanu,
Simon Weissmann
Abstract:
The ensemble Kalman inversion (EKI), a recently introduced optimisation method for solving inverse problems, is widely employed for the efficient and derivative-free estimation of unknown parameters. Specifically in cases involving ill-posed inverse problems and high-dimensional parameter spaces, the scheme has shown promising success. However, in its general form, the EKI does not take constraints into account, which are essential and often stem from physical limitations or specific requirements. Based on a log-barrier approach, we suggest adapting the continuous-time formulation of EKI to incorporate convex inequality constraints. We underpin this adaptation with a theoretical analysis that provides lower and upper bounds on the ensemble collapse, as well as convergence to the constraint optimum for general nonlinear forward models. Finally, we showcase our results through two examples involving partial differential equations (PDEs).
Submitted 21 December, 2023;
originally announced December 2023.
-
Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods
Authors:
Sara Klein,
Simon Weissmann,
Leif Döring
Abstract:
Markov Decision Processes (MDPs) are a formal framework for modeling and solving sequential decision-making problems. In finite-time horizons such problems are relevant for instance for optimal stopping or specific supply chain problems, but also in the training of large language models. In contrast to infinite horizon MDPs, optimal policies are not stationary; policies must be learned for every single epoch. In practice all parameters are often trained simultaneously, ignoring the inherent structure suggested by dynamic programming. This paper introduces a combination of dynamic programming and policy gradient called dynamic policy gradient, where the parameters are trained backwards in time. For the tabular softmax parametrisation we carry out the convergence analysis for simultaneous and dynamic policy gradient towards global optima, both in the exact and sampled gradient settings without regularisation. It turns out that the use of dynamic policy gradient training much better exploits the structure of finite-time problems, which is reflected in improved convergence bounds.
Submitted 6 May, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Adaptive multilevel subset simulation with selective refinement
Authors:
Daniel Elfverson,
Robert Scheichl,
Simon Weissmann,
F. Alejandro DiazDelaO
Abstract:
In this work we propose an adaptive multilevel version of subset simulation to estimate the probability of rare events for complex physical systems. Given a sequence of nested failure domains of increasing size, the rare event probability is expressed as a product of conditional probabilities. The proposed new estimator uses different model resolutions and varying numbers of samples across the hierarchy of nested failure sets. In order to dramatically reduce the computational cost, we construct the intermediate failure sets such that only a small number of expensive high-resolution model evaluations are needed, whilst the majority of samples can be taken from inexpensive low-resolution simulations. A key idea in our new estimator is the use of a posteriori error estimators combined with a selective mesh refinement strategy to guarantee the critical subset property that may be violated when changing model resolution from one failure set to the next. The efficiency gains and the statistical properties of the estimator are investigated both theoretically via shaking transformations, as well as numerically. On a model problem from subsurface flow, the new multilevel estimator achieves gains of more than a factor 60 over standard subset simulation for a practically relevant relative error of 25%.
Submitted 12 December, 2023; v1 submitted 10 August, 2022;
originally announced August 2022.
-
Multilevel Optimization for Inverse Problems
Authors:
Simon Weissmann,
Ashia Wilson,
Jakob Zech
Abstract:
Inverse problems occur in a variety of parameter identification tasks in engineering. Such problems are challenging in practice, as they require repeated evaluation of computationally expensive forward models. We introduce a unifying framework of multilevel optimization that can be applied to a wide range of optimization-based solvers. Our framework provably reduces the computational cost associated with evaluating the expensive forward maps stemming from various physical models. To demonstrate the versatility of our analysis, we discuss its implications for various methodologies including multilevel (accelerated, stochastic) gradient descent, a multilevel ensemble Kalman inversion and a multilevel Langevin sampler. We also provide numerical experiments to verify our theoretical findings.
Submitted 28 April, 2022;
originally announced April 2022.
-
Gradient flow structure and convergence analysis of the ensemble Kalman inversion for nonlinear forward models
Authors:
Simon Weissmann
Abstract:
The ensemble Kalman inversion (EKI) is a particle based method which has been introduced as the application of the ensemble Kalman filter to inverse problems. In practice it has been widely used as derivative-free optimization method in order to estimate unknown parameters from noisy measurement data. For linear forward models the EKI can be viewed as gradient flow preconditioned by a certain sample covariance matrix. Through the preconditioning the resulting scheme remains in a finite dimensional subspace of the original high-dimensional (or even infinite dimensional) parameter space and can be viewed as optimizer restricted to this subspace. For general nonlinear forward models the resulting EKI flow can only be viewed as gradient flow in approximation. In this paper we discuss the effect of applying a sample covariance as preconditioning matrix and quantify the gradient flow structure of the EKI by controlling the approximation error through the spread in the particle system. The ensemble collapse on the one side leads to an accurate gradient approximation, but on the other side to degeneration in the preconditioning sample covariance matrix. In order to ensure convergence as optimization method we derive lower as well as upper bounds on the ensemble collapse. Furthermore, we introduce covariance inflation without breaking the subspace property intending to reduce the collapse rate of the ensemble such that the convergence rate improves. In a numerical experiment we apply EKI to a nonlinear elliptic boundary-value problem and illustrate the dependence of EKI as derivative-free optimizer on the choice of the initial ensemble.
Submitted 2 September, 2022; v1 submitted 31 March, 2022;
originally announced March 2022.
-
One-shot Learning of Surrogates in PDE-constrained Optimization Under Uncertainty
Authors:
Philipp A. Guth,
Claudia Schillings,
Simon Weissmann
Abstract:
We propose a general framework for machine learning based optimization under uncertainty. Our approach replaces the complex forward model by a surrogate, which is learned simultaneously in a one-shot sense when solving the optimal control problem. Our approach relies on a reformulation of the problem as a penalized empirical risk minimization problem for which we provide a consistency analysis in terms of large data and increasing penalty parameter. To solve the resulting problem, we suggest a stochastic gradient method with adaptive control of the penalty parameter and prove convergence under suitable assumptions on the surrogate model. Numerical experiments illustrate the results for linear and nonlinear surrogate models.
Submitted 22 December, 2023; v1 submitted 21 December, 2021;
originally announced December 2021.
-
Adaptive Tikhonov strategies for stochastic ensemble Kalman inversion
Authors:
Simon Weissmann,
Neil K. Chada,
Claudia Schillings,
Xin T. Tong
Abstract:
Ensemble Kalman inversion (EKI) is a derivative-free optimizer aimed at solving inverse problems, taking motivation from the celebrated ensemble Kalman filter. The purpose of this article is to consider the introduction of adaptive Tikhonov strategies for EKI. This work builds upon Tikhonov EKI (TEKI) which was proposed for a fixed regularization constant. By adaptively learning the regularization parameter, this procedure is known to improve the recovery of the underlying unknown. For the analysis, we consider a continuous-time setting where we extend known results such as well-posedness and convergence of various loss functions, but with the addition of noisy observations. Furthermore, we allow a time-varying noise and regularization covariance in our presented convergence result, which mimics adaptive regularization schemes. In turn we present three adaptive regularization schemes, which are highlighted from both the deterministic and Bayesian approaches for inverse problems, which include bilevel optimization, the MAP formulation and covariance learning. We numerically test these schemes and the theory on linear and nonlinear partial differential equations, where they outperform the non-adaptive TEKI and EKI.
Submitted 18 October, 2021;
originally announced October 2021.
-
Continuous time limit of the stochastic ensemble Kalman inversion: Strong convergence analysis
Authors:
Dirk Blömker,
Claudia Schillings,
Philipp Wacker,
Simon Weissmann
Abstract:
The Ensemble Kalman inversion (EKI) method is a method for the estimation of unknown parameters in the context of (Bayesian) inverse problems. The method approximates the underlying measure by an ensemble of particles and iteratively applies the ensemble Kalman update to evolve (the approximation of the) prior into the posterior measure.
For the convergence analysis of the EKI it is common practice to derive a continuous version, replacing the iteration with a stochastic differential equation. In this paper we validate this approach by showing that the stochastic EKI iteration converges to paths of the continuous-time stochastic differential equation by considering both the nonlinear and linear setting, and we prove convergence in probability for the former, and convergence in moments for the latter. The methods employed can also be applied to the analysis of more general numerical schemes for stochastic differential equations in general.
Submitted 30 July, 2021;
originally announced July 2021.
-
Consistency analysis of bilevel data-driven learning in inverse problems
Authors:
Neil K. Chada,
Claudia Schillings,
Xin T. Tong,
Simon Weissmann
Abstract:
One fundamental problem when solving inverse problems is how to find regularization parameters. This article considers solving this problem using data-driven bilevel optimization, i.e. we consider the adaptive learning of the regularization parameter from data by means of optimization. This approach can be interpreted as solving an empirical risk minimization problem, and we analyze its performance in the large data sample size limit for general nonlinear problems. We demonstrate how to implement our framework on linear inverse problems, where we can further show the inverse accuracy does not depend on the ambient space dimension. To reduce the associated computational cost, online numerical schemes are derived using the stochastic gradient descent method. We prove convergence of these numerical schemes under suitable assumptions on the forward problem. Numerical experiments are presented illustrating the theoretical results and demonstrating the applicability and efficiency of the proposed approaches for various linear and nonlinear inverse problems, including Darcy flow, the eikonal equation, and an image denoising example.
Submitted 7 January, 2021; v1 submitted 6 July, 2020;
originally announced July 2020.
-
Ensemble Kalman filter for neural network based one-shot inversion
Authors:
Philipp A. Guth,
Claudia Schillings,
Simon Weissmann
Abstract:
We study the use of novel techniques arising in machine learning for inverse problems. Our approach replaces the complex forward model by a neural network, which is trained simultaneously in a one-shot sense when estimating the unknown parameters from data, i.e. the neural network is trained only for the unknown parameter. By establishing a link to the Bayesian approach to inverse problems, an algorithmic framework is developed which ensures the feasibility of the parameter estimate with respect to the forward model. We propose an efficient, derivative-free optimization method based on variants of the ensemble Kalman inversion. Numerical experiments show that the ensemble Kalman filter for neural network based one-shot inversion is a promising direction combining optimization and machine learning techniques for inverse problems.
Submitted 14 September, 2020; v1 submitted 5 May, 2020;
originally announced May 2020.
-
Fokker-Planck particle systems for Bayesian inference: Computational approaches
Authors:
Sebastian Reich,
Simon Weissmann
Abstract:
Bayesian inference can be embedded into an appropriately defined dynamics in the space of probability measures. In this paper, we take Brownian motion and its associated Fokker--Planck equation as a starting point for such embeddings and explore several interacting particle approximations. More specifically, we consider both deterministic and stochastic interacting particle systems and combine them with the idea of preconditioning by the empirical covariance matrix. In addition to leading to affine invariant formulations which asymptotically speed up convergence, preconditioning allows for gradient-free implementations in the spirit of the ensemble Kalman filter. While such gradient-free implementations have been demonstrated to work well for posterior measures that are nearly Gaussian, we extend their scope of applicability to multimodal measures by introducing localised gradient-free approximations. Numerical results demonstrate the effectiveness of the considered methodologies.
Submitted 8 February, 2021; v1 submitted 25 November, 2019;
originally announced November 2019.
-
On the Incorporation of Box-Constraints for Ensemble Kalman Inversion
Authors:
Neil K. Chada,
Claudia Schillings,
Simon Weissmann
Abstract:
The Bayesian approach to inverse problems is widely used in practice to infer unknown parameters from noisy observations. In this framework, the ensemble Kalman inversion has been successfully applied for the quantification of uncertainties in various areas of applications. In recent years, a complete analysis of the method has been developed for linear inverse problems adopting an optimization viewpoint. However, many applications require the incorporation of additional constraints on the parameters, e.g. arising due to physical constraints. We propose a new variant of the ensemble Kalman inversion to include box constraints on the unknown parameters motivated by the theory of projected preconditioned gradient flows. Based on the continuous time limit of the constrained ensemble Kalman inversion, we discuss a complete convergence analysis for linear forward problems. We adopt techniques from filtering which are crucial in order to improve the performance and establish a correct descent, such as variance inflation. These benefits are highlighted through a number of numerical examples on various inverse problems based on partial differential equations.
Submitted 14 October, 2019; v1 submitted 2 August, 2019;
originally announced August 2019.
-
Well Posedness and Convergence Analysis of the Ensemble Kalman Inversion
Authors:
Dirk Blömker,
Claudia Schillings,
Philipp Wacker,
Simon Weissmann
Abstract:
The ensemble Kalman inversion is widely used in practice to estimate unknown parameters from noisy measurement data. Its low computational costs, straightforward implementation, and non-intrusive nature make the method appealing in various areas of application. We present a complete analysis of the ensemble Kalman inversion with perturbed observations for a fixed ensemble size when applied to linear inverse problems. The well-posedness and convergence results are based on the continuous time scaling limits of the method. The resulting coupled system of stochastic differential equations allows us to derive estimates on the long-time behaviour and provides insights into the convergence properties of the ensemble Kalman inversion. We view the method as a derivative-free optimization method for the least-squares misfit functional, which opens up the perspective to use the method in various areas of applications such as imaging, groundwater flow problems, biological problems as well as in the context of the training of neural networks.
Submitted 26 February, 2019; v1 submitted 19 October, 2018;
originally announced October 2018.
-
A new doubly discrete analogue of smoke ring flow and the real time simulation of fluid flow
Authors:
Ulrich Pinkall,
Boris Springborn,
Steffen Weissmann
Abstract:
Modelling incompressible ideal fluids as a finite collection of vortex filaments is important in physics (super-fluidity, models for the onset of turbulence) as well as for numerical algorithms used in computer graphics for the real time simulation of smoke. Here we introduce a time-discrete evolution equation for arbitrary closed polygons in 3-space that is a discretisation of the localised induction approximation of filament motion. This discretisation shares with its continuum limit the property that it is a completely integrable system. We apply this polygon evolution to a significant improvement of the numerical algorithms used in Computer Graphics.
Submitted 7 August, 2007;
originally announced August 2007.