Search | arXiv e-print repository

Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows

Authors: Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M. Stuart

Abstract: In this paper, we study efficient approximate sampling for probability distributions known up to normalization constants. We specifically focus on a problem class arising in Bayesian inference for large-scale inverse problems in science and engineering applications. The computational challenges we address with the proposed methodology are: (i) the need for repeated evaluations of expensive forward models; (ii) the potential existence of multiple modes; and (iii) the fact that gradient of, or adjoint solver for, the forward model might not be feasible. While existing Bayesian inference methods meet some of these challenges individually, we propose a framework that tackles all three systematically. Our approach builds upon the Fisher-Rao gradient flow in probability space, yielding a dynamical system for probability densities that converges towards the target distribution at a uniform exponential rate. This rapid convergence is advantageous for the computational burden outlined in (i). We apply Gaussian mixture approximations with operator splitting techniques to simulate the flow numerically; the resulting approximation can capture multiple modes thus addressing (ii). Furthermore, we employ the Kalman methodology to facilitate a derivative-free update of these Gaussian components and their respective weights, addressing the issue in (iii). The proposed methodology results in an efficient derivative-free sampler flexible enough to handle multi-modal distributions: Gaussian Mixture Kalman Inversion (GMKI).
The effectiveness of GMKI is demonstrated both theoretically and numerically in several experiments with multimodal target distributions, including proof-of-concept and two-dimensional examples, as well as a large-scale application: recovering the Navier-Stokes initial condition from solution data at positive times.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 42 pages, 9 figures
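
As a worked illustration of the exponential convergence mentioned in the abstract above, the Fisher-Rao gradient flow of the Kullback-Leibler divergence can be written down and solved in closed form. The symbols $\rho_t$ (density at time $t$), $\rho_0$ (initial density) and $\pi$ (target density) are notation assumed here for the sketch; they do not appear in the listing itself.

```latex
% Fisher-Rao gradient flow of KL(rho || pi); pi may be unnormalized,
% since a constant factor in pi cancels inside the bracket.
\frac{\partial \rho_t}{\partial t}
  = \rho_t \Bigl( \log \pi - \log \rho_t
      - \mathbb{E}_{\rho_t}\bigl[ \log \pi - \log \rho_t \bigr] \Bigr)

% Writing h_t = \log \rho_t - \log \pi gives
% \partial_t h_t = -h_t + c(t), with c(t) constant in space, so
\rho_t \;\propto\; \rho_0^{\,e^{-t}} \, \pi^{\,1 - e^{-t}}
```

The exponent $e^{-t}$ on the initial density decays at a rate that does not depend on the shape of $\pi$, which is one way to read the "uniform exponential rate" phrase in the abstract.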

arXiv:2406.14738 [pdf, ps, other]

Robust parameter estimation for partially observed second-order diffusion processes

Authors: Jan Albrecht, Sebastian Reich

Abstract: Estimating parameters of a diffusion process given continuous-time observations of the process via maximum likelihood approaches or, online, via stochastic gradient descent or Kalman filter formulations constitutes a well-established research area. It has also been established previously that these techniques are, in general, not robust to perturbations in the data in the form of temporal correlations. While the subject is relatively well understood and appropriate modifications have been suggested in the context of multi-scale diffusion processes and their reduced model equations, we consider here an alternative setting where a second-order diffusion process in positions and velocities is only observed via its positions. In this note, we propose a simple modification to standard stochastic gradient descent and Kalman filter formulations, which eliminates the arising systematic estimation biases. The modification can be extended to standard maximum likelihood approaches and avoids computation of previously proposed correction terms.

Submitted 20 June, 2024; originally announced June 2024.

MSC Class: 65C30; 65L09; 60M20; 62F10; 62F15; 62L20

arXiv:2401.04372 [pdf, ps, other]

Stable generative modeling using diffusion maps

Authors: Georg Gottwald, Fengyi Li, Youssef Marzouk, Sebastian Reich

Abstract: We consider the problem of sampling from an unknown distribution for which only a sufficiently large number of training samples are available. Such settings have recently drawn considerable interest in the context of generative modelling. In this paper, we propose a generative model combining diffusion maps and Langevin dynamics. Diffusion maps are used to approximate the drift term from the available training samples, which is then implemented in a discrete-time Langevin sampler to generate new samples. By setting the kernel bandwidth to match the time step size used in the unadjusted Langevin algorithm, our method effectively circumvents any stability issues typically associated with time-stepping stiff stochastic differential equations. More precisely, we introduce a novel split-step scheme, ensuring that the generated samples remain within the convex hull of the training samples. Our framework can be naturally extended to generate conditional samples. We demonstrate the performance of our proposed scheme through experiments on synthetic datasets with increasing dimensions and on a stochastic subgrid-scale parametrization conditional sampling problem.

Submitted 9 January, 2024; originally announced January 2024.

Comments: 23 pages, 25 figures

arXiv:2312.15975 [pdf, other]

Filtered data based estimators for stochastic processes driven by colored noise

Authors: Grigorios A. Pavliotis, Sebastian Reich, Andrea Zanoni

Abstract: We consider the problem of estimating unknown parameters in stochastic differential equations driven by colored noise, which we model as a sequence of Gaussian stationary processes with decreasing correlation time. We aim to infer parameters in the limit equation, driven by white noise, given observations of the colored noise dynamics. We consider both the maximum likelihood and the stochastic gradient descent in continuous time estimators, and we propose to modify them by including filtered data. We provide a convergence analysis for our estimators showing their asymptotic unbiasedness in a general setting and asymptotic normality under a simplified scenario.

Submitted 22 January, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

arXiv:2311.18060 [pdf, ps, other]

Levitin-Polyak well-posedness of split multivalued variational inequalities

Authors: Soumitra Dey, Simeon Reich

Abstract: We introduce and study the split multivalued variational inequality problem (SMVIP) and the parametric SMVIP. We examine, in particular, Levitin-Polyak well-posedness of SMVIPs and parametric SMVIPs in Hilbert spaces. We provide several examples to illustrate our theoretical results. We also discuss several important special cases.

Submitted 29 November, 2023; originally announced November 2023.

Comments: arXiv admin note: text overlap with arXiv:2208.07126

MSC Class: 49K40; 49J40; 90C31; 47H10; 47J20

arXiv:2311.06906 [pdf, ps, other]

Particle-based algorithm for stochastic optimal control

Authors: Sebastian Reich

Abstract: The solution to a stochastic optimal control problem can be determined by computing the value function from a discretization of the associated Hamilton-Jacobi-Bellman equation. Alternatively, the problem can be reformulated in terms of a pair of forward-backward SDEs, which makes Monte-Carlo techniques applicable. More recently, the problem has also been viewed from the perspective of forward and reverse time SDEs and their associated Fokker-Planck equations. This approach is closely related to techniques used in diffusion-based generative models. Forward and reverse time formulations express the value function as the ratio of two probability density functions; one stemming from a forward McKean-Vlasov SDE and another one from a reverse McKean-Vlasov SDE. In this paper, we extend this approach to a more general class of stochastic optimal control problems and combine it with ensemble Kalman filter type and diffusion map approximation techniques in order to obtain efficient and robust particle-based algorithms.

Submitted 27 February, 2024; v1 submitted 12 November, 2023; originally announced November 2023.

MSC Class: 93E20; 49L12; 65C35; 65M75

arXiv:2310.10205 [pdf, other]

New iterative algorithms for solving split variational inclusions

Authors: Soumitra Dey, Chinedu Izuchukwu, Adeolu Taiwo, Simeon Reich

Abstract: In this paper we study a class of split variational inclusion (SVI) and regularized split variational inclusion (RSVI) problems in real Hilbert spaces. We discuss various analytical properties of the net generated by the RSVI and establish the existence and uniqueness of the solution to the RSVI. Using analytical properties of this net and under certain assumptions on the parameters and mappings associated with the SVI, we establish the strong convergence of the sequence generated by our proposed iterative algorithm. We also deduce another iterative algorithm by taking the regularization parameters to be zero in our proposed algorithm. We establish the weak convergence of the sequence generated by our new algorithm under certain assumptions. Moreover, we discuss two special cases of the SVI, namely the split convex minimization and the split variational inequality problems, and give several numerical examples.

Submitted 16 October, 2023; originally announced October 2023.

MSC Class: 65Y05; 65K15; 47H05; 49J53; 47H10

arXiv:2310.03597 [pdf, other]

Sampling via Gradient Flows in the Space of Probability Measures

Authors: Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M. Stuart

Abstract: Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, in theory and by using particle methods.
Our third contribution is to study, and develop efficient algorithms based on Gaussian approximations of the gradient flows; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.

Submitted 9 March, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: Related and text overlap with arXiv:2302.11024

arXiv:2309.04742 [pdf, other]

Affine Invariant Ensemble Transform Methods to Improve Predictive Uncertainty in Neural Networks

Authors: Diksha Bhandari, Jakiw Pidstrigach, Sebastian Reich

Abstract: We consider the problem of performing Bayesian inference for logistic regression using appropriate extensions of the ensemble Kalman filter. Two interacting particle systems are proposed that sample from an approximate posterior and prove quantitative convergence rates of these interacting particle systems to their mean-field limit as the number of particles tends to infinity. Furthermore, we apply these techniques and examine their effectiveness as methods of Bayesian approximation for quantifying predictive uncertainty in neural networks.

Submitted 1 July, 2024; v1 submitted 9 September, 2023; originally announced September 2023.

arXiv:2308.16784 [pdf, other]

Dropout Ensemble Kalman inversion for high dimensional inverse problems

Authors: Shuigen Liu, Sebastian Reich, Xin T. Tong

Abstract: Ensemble Kalman inversion (EKI) is an ensemble-based method to solve inverse problems. Its gradient-free formulation makes it an attractive tool for problems with involved formulation. However, EKI suffers from the "subspace property", i.e., the EKI solutions are confined in the subspace spanned by the initial ensemble. It implies that the ensemble size should be larger than the problem dimension to ensure EKI's convergence to the correct solution. Such scaling of ensemble size is impractical and prevents the use of EKI in high dimensional problems. To address this issue, we propose a novel approach using dropout regularization to mitigate the subspace problem. We prove that dropout-EKI converges in the small ensemble settings, and the computational cost of the algorithm scales linearly with dimension. We also show that dropout-EKI reaches the optimal query complexity, up to a constant factor. Numerical examples demonstrate the effectiveness of our approach.

Submitted 31 August, 2023; originally announced August 2023.
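
The subspace property described in the abstract above can be seen in a few lines. The following is a minimal sketch of a basic deterministic EKI iteration with a scalar linear forward model; it is not the dropout-EKI method of the paper, and the names `forward`, `eki_step`, the vector `a`, the noise variance `gamma` and the test ensemble are all assumptions made for illustration.

```python
def forward(u, a):
    """Hypothetical scalar linear forward model G(u) = a . u."""
    return sum(ai * ui for ai, ui in zip(a, u))


def eki_step(ensemble, a, y, gamma):
    """One deterministic EKI update for a scalar observation y.

    Each particle moves along the ensemble cross-covariance C^{uG},
    which is a combination of the deviations u_j - mean(u).
    """
    n_particles = len(ensemble)
    dim = len(ensemble[0])
    g = [forward(u, a) for u in ensemble]
    u_mean = [sum(u[k] for u in ensemble) / n_particles for k in range(dim)]
    g_mean = sum(g) / n_particles
    # Cross-covariance between parameters and model outputs (a vector here).
    c_ug = [
        sum((u[k] - u_mean[k]) * (gj - g_mean) for u, gj in zip(ensemble, g))
        / n_particles
        for k in range(dim)
    ]
    # Variance of the model outputs (a scalar here).
    c_gg = sum((gj - g_mean) ** 2 for gj in g) / n_particles
    return [
        [u[k] + c_ug[k] * (y - gj) / (c_gg + gamma) for k in range(dim)]
        for u, gj in zip(ensemble, g)
    ]


# Three particles in R^3 whose third coordinate is zero.
ensemble = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, -1.0, 0.0]]
a = [1.0, 2.0, 3.0]
truth = [0.5, 0.5, 1.0]          # has a nonzero third component
y = forward(truth, a)

for _ in range(50):
    ensemble = eki_step(ensemble, a, y, gamma=0.1)

third_coords = [u[2] for u in ensemble]
```

Because every update is a combination of ensemble deviations, `third_coords` stays at zero no matter how many iterations are run, even though the data were generated with a nonzero third component. This is the confinement that the abstract says dropout regularization is meant to mitigate.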

arXiv:2306.12219 [pdf, ps, other]

Comparing the Methods of Alternating and Simultaneous Projections for Two Subspaces

Authors: Simeon Reich, Rafał Zalas

Abstract: We study the well-known methods of alternating and simultaneous projections when applied to two nonorthogonal linear subspaces of a real Euclidean space. Assuming that both of the methods have a common starting point chosen from either one of the subspaces, we show that the method of alternating projections converges significantly faster than the method of simultaneous projections. On the other hand, we provide examples of subspaces and starting points, where the method of simultaneous projections outperforms the method of alternating projections.

Submitted 14 November, 2023; v1 submitted 21 June, 2023; originally announced June 2023.
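
The two methods compared in the abstract above can be sketched for the simplest case: two lines through the origin in the plane, with the common starting point on one of them. This is only an illustrative instance under assumed choices (the angle `theta`, the start `(1.0, 0.0)`, 20 iterations, and all function names); it does not reproduce the paper's analysis or its counterexamples.

```python
import math


def project_onto_line(x, angle):
    """Orthogonal projection of x in R^2 onto the line through 0 at `angle`."""
    ux, uy = math.cos(angle), math.sin(angle)
    t = x[0] * ux + x[1] * uy
    return (t * ux, t * uy)


def alternating(x, theta, steps):
    """x <- P_A P_B x, with A the line at angle 0 and B the line at angle theta."""
    for _ in range(steps):
        x = project_onto_line(project_onto_line(x, theta), 0.0)
    return x


def simultaneous(x, theta, steps):
    """x <- (P_A x + P_B x) / 2, the average of the two projections."""
    for _ in range(steps):
        pa = project_onto_line(x, 0.0)
        pb = project_onto_line(x, theta)
        x = ((pa[0] + pb[0]) / 2.0, (pa[1] + pb[1]) / 2.0)
    return x


theta = math.pi / 6        # angle between the two lines (assumed for the demo)
start = (1.0, 0.0)         # starting point chosen on subspace A

# The subspaces intersect only at the origin, so the norm is the error.
err_alt = math.hypot(*alternating(start, theta, 20))
err_sim = math.hypot(*simultaneous(start, theta, 20))
```

In this instance each alternating sweep scales the iterate by cos²(theta), while the averaged map contracts no faster than (1 + cos(theta)) / 2 per step, so `err_alt` ends up well below `err_sim`.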

arXiv:2304.12727 [pdf, ps, other]

On forward-backward SDE approaches to continuous-time minimum variance estimation

Authors: Jin Won Kim, Sebastian Reich

Abstract: The work of Kalman and Bucy has established a duality between filtering and optimal estimation in the context of time-continuous linear systems. This duality has recently been extended to time-continuous nonlinear systems in terms of an optimization problem constrained by a backward stochastic partial differential equation. Here we revisit this problem from the perspective of appropriate forward-backward stochastic differential equations. This approach sheds new light on the estimation problem and provides a unifying perspective. It is also demonstrated that certain formulations of the estimation problem lead to deterministic formulations similar to the linear Gaussian case as originally investigated by Kalman and Bucy. Finally, optimal control of partially observed diffusion processes is discussed as an application of the proposed estimators.

Submitted 14 August, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

MSC Class: 90E10; 90E11; 60G35; 62M20; 93E11; 93E20

arXiv:2303.16494 [pdf, other]

doi 10.1137/23M1561142

EnKSGD: A Class Of Preconditioned Black Box Optimization And Inversion Algorithms

Authors: Brian Irwin, Sebastian Reich

Abstract: In this paper, we introduce the Ensemble Kalman-Stein Gradient Descent (EnKSGD) class of algorithms. The EnKSGD class of algorithms builds on the ensemble Kalman filter (EnKF) line of work, applying techniques from sequential data assimilation to unconstrained optimization and parameter estimation problems. The essential idea is to exploit the EnKF as a black box (i.e. derivative-free, zeroth order) optimization tool if iterated to convergence. In this paper, we return to the foundations of the EnKF as a sequential data assimilation technique, including its continuous-time and mean-field limits, with the goal of developing faster optimization algorithms suited to noisy black box optimization and inverse problems. The resulting EnKSGD class of algorithms can be designed to both maintain the desirable property of affine-invariance, and employ the well-known backtracking line search. Furthermore, EnKSGD algorithms are designed to not necessitate the subspace restriction property and variance collapse property of previous iterated EnKF approaches to optimization, as both these properties can be undesirable in an optimization context. EnKSGD also generalizes beyond the $L^{2}$ loss, and is thus applicable to a wider class of problems than the standard EnKF. Numerical experiments with both linear and nonlinear least squares problems, as well as maximum likelihood estimation, demonstrate the faster convergence of EnKSGD relative to alternative EnKF approaches to optimization.

Submitted 29 March, 2023; originally announced March 2023.

Comments: 20 pages, 3 figures

MSC Class: 65K10; 90C56; 65C35; 65C05; 62F10

arXiv:2302.11024 [pdf, other]

Gradient Flows for Sampling: Mean-Field Models, Gaussian Approximations and Affine Invariance

Authors: Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M. Stuart

Abstract: Sampling a probability distribution with an unknown normalization constant is a fundamental problem in computational science and engineering. This task may be cast as an optimization problem over all probability measures, and an initial distribution can be evolved to the desired minimizer dynamically via gradient flows. Mean-field models, whose law is governed by the gradient flow in the space of probability measures, may also be identified; particle approximations of these mean-field models form the basis of algorithms. The gradient flow approach is also the basis of algorithms for variational inference, in which the optimization is performed over a parameterized family of probability distributions such as Gaussians, and the underlying gradient flow is restricted to the parameterized family. By choosing different energy functionals and metrics for the gradient flow, different algorithms with different convergence properties arise. In this paper, we concentrate on the Kullback-Leibler divergence after showing that, up to scaling, it has the unique property that the gradient flows resulting from this choice of energy do not depend on the normalization constant. For the metrics, we focus on variants of the Fisher-Rao, Wasserstein, and Stein metrics; we introduce the affine invariance property for gradient flows, and their corresponding mean-field models, determine whether a given metric leads to affine invariance, and modify it to make it affine invariant if it does not.
We study the resulting gradient flows in both probability density space and Gaussian space. The flow in the Gaussian space may be understood as a Gaussian approximation of the flow. We demonstrate that the Gaussian approximation based on the metric and through moment closure coincide, establish connections between them, and study their long-time convergence properties showing the advantages of affine invariance.

Submitted 2 November, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: 82 pages, 8 figures (Welcome any feedback!)

arXiv:2302.10130 [pdf, other]

Infinite-Dimensional Diffusion Models

Authors: Jakiw Pidstrigach, Youssef Marzouk, Sebastian Reich, Sven Wang

Abstract: Diffusion models have had a profound impact on many application areas, including those where data are intrinsically infinite-dimensional, such as images or time series. The standard approach is first to discretize and then to apply diffusion models to the discretized data. While such approaches are practically appealing, the performance of the resulting algorithms typically deteriorates as discretization parameters are refined. In this paper, we instead directly formulate diffusion-based generative models in infinite dimensions and apply them to the generative modeling of functions. We prove that our formulations are well posed in the infinite-dimensional setting and provide dimension-independent distance bounds from the sample to the target measure. Using our theory, we also develop guidelines for the design of infinite-dimensional diffusion models. For image distributions, these guidelines are in line with the canonical choices currently made for diffusion models. For other distributions, however, we can improve upon these canonical choices, which we show both theoretically and empirically, by applying the algorithms to data distributions on manifolds and inspired by Bayesian inverse problems or simulation-based inference.

Submitted 3 October, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

MSC Class: 68T99; 60Hxx

arXiv:2209.11371 [pdf, other]

Ensemble Kalman Methods: A Mean Field Perspective

Authors: Edoardo Calvello, Sebastian Reich, Andrew M. Stuart

Abstract: This paper provides a unifying mean field based framework for the derivation and analysis of ensemble Kalman methods. Both state estimation and parameter estimation problems are considered, and formulations in both discrete and continuous time are employed. For state estimation problems both the control and filtering approaches are studied; analogously, for parameter estimation (inverse) problems the optimization and Bayesian perspectives are both studied. The approach taken unifies a wide-ranging literature in the field, provides a framework for analysis of ensemble Kalman methods, and suggests open problems.

Submitted 22 September, 2022; originally announced September 2022.

arXiv:2209.05279 [pdf, ps, other]

Data assimilation: A dynamic homotopy-based coupling approach

Authors: Sebastian Reich

Abstract: Homotopy approaches to Bayesian inference have found widespread use especially if the Kullback-Leibler divergence between the prior and the posterior distribution is large. Here we extend one of these homotopy approaches to include an underlying stochastic diffusion process. The underlying mathematical problem is closely related to the Schrödinger bridge problem for given marginal distributions. We demonstrate that the proposed homotopy approach provides a computationally tractable approximation to the underlying bridge problem. In particular, our implementation builds upon the widely used ensemble Kalman filter methodology and extends it to Schrödinger bridge problems within the context of sequential data assimilation.

Submitted 3 November, 2022; v1 submitted 12 September, 2022; originally announced September 2022.

MSC Class: 60M20; 60G35; 93E99; 62F15; 65C35

arXiv:2208.07126 [pdf, other]

doi 10.1007/s13398-023-01416-8

Levitin-Polyak Well-posedness for Split Equilibrium Problems

Authors: Soumitra Dey, Aviv Gibali, Simeon Reich

Abstract: The notion of well-posedness has drawn the attention of many researchers in the field of nonlinear analysis, as it allows to explore problems in which exact solutions are not known and/or computationally hard to compute. Roughly speaking, for a given problem, well-posedness guarantees the convergence of approximations to exact solutions via an iterative method. Thus, in this paper we extend the concept of Levitin-Polyak well-posedness to split equilibrium problems in real Banach spaces. In particular, we establish a metric characterization of Levitin-Polyak well-posedness by perturbations and also show an equivalence between Levitin-Polyak well-posedness by perturbations for split equilibrium problems and the existence and uniqueness of their solutions.

Submitted 3 May, 2023; v1 submitted 15 August, 2022; originally announced August 2022.

MSC Class: 49K40; 49J40; 90C31; 47H10; 47J20

Journal ref: Rev. Real Acad. Cienc. Exactas Fis. Nat. Ser. A-Mat., 117, Article number: 88 (2023)

arXiv:2208.06871 [pdf, other]

Strong Convergence of Forward-Reflected-Backward Splitting Methods for Solving Monotone Inclusions with Applications to Image Restoration and Optimal Control

Authors: Chinedu Izuchukwu, Simeon Reich, Yekini Shehu, Adeolu Taiwo

Abstract: In this paper, we propose and study several strongly convergent versions of the forward-reflected-backward splitting method of Malitsky and Tam for finding a zero of the sum of two monotone operators in a real Hilbert space. Our proposed methods only require one forward evaluation of the single-valued operator and one backward evaluation of the set-valued operator at each iteration; a feature that is absent in many other available strongly convergent splitting methods in the literature. We also develop inertial versions of our methods and strong convergence results are obtained for these methods when the set-valued operator is maximal monotone and the single-valued operator is Lipschitz continuous and monotone. Finally, we discuss some examples from image restorations and optimal control regarding the implementations of our methods in comparison with known related methods in the literature.

Submitted 14 August, 2022; originally announced August 2022.

MSC Class: 47H09; 47H10; 49J20; 49J40
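
Note: the base iteration that the entry above builds on can be sketched on a toy problem. The code below runs the forward-reflected-backward step of Malitsky and Tam, x_{k+1} = J_{λA}(x_k − 2λ B x_k + λ B x_{k−1}), and is not one of the strongly convergent variants proposed in the paper. The operators (A = 0, so the resolvent is the identity, and B a skew-symmetric linear map with Lipschitz constant 1), the step size λ = 0.3 < 1/(2L), and the function name `forb` are all assumptions chosen for illustration.

```python
import numpy as np

def forb(B, resolvent, x0, lam, iters):
    """Forward-reflected-backward iteration (illustrative sketch).

    B         -- single-valued monotone, Lipschitz operator (callable)
    resolvent -- resolvent J_{lam*A} of the set-valued operator A (callable)
    lam       -- step size, assumed smaller than 1 / (2 * Lipschitz constant of B)
    """
    x = np.asarray(x0, dtype=float)
    Bx_prev = B(x)  # convention assumed here: x_{-1} = x_0
    for _ in range(iters):
        Bx = B(x)  # the only new forward evaluation in this iteration
        x = resolvent(x - 2.0 * lam * Bx + lam * Bx_prev)  # one backward step
        Bx_prev = Bx
    return x

# Toy problem (hypothetical): B is a rotation by 90 degrees, A = 0.
# The unique zero of A + B is the origin.
M = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = lambda x: M @ x
identity = lambda x: x

x_final = forb(B, identity, np.array([1.0, 2.0]), lam=0.3, iters=500)
print(np.linalg.norm(x_final))
```

A skew-symmetric B is a useful test case because the plain forward-backward step is known to fail on it, while the reflected term keeps this iteration contractive for the step size assumed above.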

arXiv:2207.09317 [pdf, other]

Generalized projections on general Banach spaces

Authors: Akhtar A. Khan, Jinlu Li, Simeon Reich

Abstract: In general Banach spaces, the metric projection map lacks the powerful properties it enjoys in Hilbert spaces. There are a few generalized projections that have been proposed in order to resolve many of the deficiencies of the metric projection. However, such notions are predominantly studied in Banach spaces with rich topological structures, such as uniformly convex Banach spaces. In this paper, we investigate two notions of generalized projection in general Banach spaces. Various examples are provided to demonstrate the proposed notions and the loss of structure in the generalized projections after migrating from specially structured Banach spaces to general Banach spaces. Connections between the generalized projection and the metric projection are thoroughly explored.

Submitted 19 July, 2022; originally announced July 2022.

Comments: 31 pages

arXiv:2206.08240 [pdf, ps, other]

doi 10.1007/s00025-022-01694-5

Convergence of Two Simple Methods for Solving Monotone Inclusion Problems in Reflexive Banach Spaces

Authors: Chinedu Izuchukwu, Simeon Reich, Yekini Shehu

Abstract: We propose two very simple methods, the first one with constant step sizes and the second one with self-adaptive step sizes, for finding a zero of the sum of two monotone operators in real reflexive Banach spaces. Our methods require only one evaluation of the single-valued operator at each iteration. Weak convergence results are obtained when the set-valued operator is maximal monotone and the single-valued operator is Lipschitz continuous, and strong convergence results are obtained when either one of these two operators is required, in addition, to be strongly monotone. We also obtain the rate of convergence of our proposed methods in real reflexive Banach spaces. Finally, we apply our results to solving generalized Nash equilibrium problems for gas markets.

Submitted 10 July, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

Comments: 15 pages

MSC Class: 47H05; 47J20; 47J25; 65K15; 90C25

Journal ref: Results Math 77, 143 (2022)

arXiv:2205.13843 [pdf, ps, other]

Polynomial Estimates for the Method of Cyclic Projections in Hilbert Spaces

Authors: Simeon Reich, Rafał Zalas

Abstract: We study the method of cyclic projections when applied to closed and linear subspaces $M_i$, $i=1,\ldots,m$, of a real Hilbert space $\mathcal H$. We show that the average distance to individual sets enjoys a polynomial behaviour $o(k^{-1/2})$ along the trajectory of the generated iterates. Surprisingly, when the starting points are chosen from the subspace $\sum_{i=1}^{m}M_i^\perp$, our result yields a polynomial rate of convergence $\mathcal O(k^{-1/2})$ for the method of cyclic projections itself. Moreover, if $\sum_{i=1}^{m} M_i^\perp$ is not closed, then both of the aforementioned rates are best possible in the sense that the corresponding polynomial $k^{1/2}$ cannot be replaced by $k^{1/2+\varepsilon}$ for any $\varepsilon >0$.

Submitted 17 April, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

MSC Class: 41A25; 41A28; 41A44; 41A65
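
Note: the method of cyclic projections named in the entry above can be sketched in finite dimensions as follows. This is a minimal illustration, not the paper's setting: the two lines in the plane, the starting point, and the helper names `projector` and `cyclic_projections` are assumptions. In finite dimensions every subspace is closed, so this toy case does not reach the slow polynomial regime the abstract describes for a non-closed $\sum_{i=1}^{m} M_i^\perp$.

```python
import numpy as np

def projector(basis):
    """Orthogonal projector onto the span of the columns of `basis`."""
    Q, _ = np.linalg.qr(basis)  # orthonormal basis of the column space
    return Q @ Q.T

def cyclic_projections(projectors, x0, cycles):
    """Apply P_m ... P_2 P_1 to x0 repeatedly, `cycles` times."""
    x = np.asarray(x0, dtype=float)
    for _ in range(cycles):
        for P in projectors:
            x = P @ x
    return x

# Two lines through the origin in the plane, meeting at a 45 degree angle.
# Their intersection is {0}, so the iterates should approach the origin.
theta = np.pi / 4
P1 = projector(np.array([[1.0], [0.0]]))
P2 = projector(np.array([[np.cos(theta)], [np.sin(theta)]]))

x_final = cyclic_projections([P1, P2], np.array([1.0, 2.0]), cycles=50)
print(np.linalg.norm(x_final))
```

The limit of the iterates is the projection of the starting point onto the intersection of the subspaces, which in this example is the origin.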

arXiv:2204.10279 [pdf, ps, other]

doi 10.1016/j.jmaa.2023.127179

Generic properties of nonexpansive mappings on unbounded domains

Authors: Christian Bargetz, Simeon Reich, Daylen Thimm

Abstract: We investigate typical properties of nonexpansive mappings on unbounded complete hyperbolic metric spaces. For two families of metrics of uniform convergence on bounded sets, we show that the typical nonexpansive mapping is a Rakotch contraction on every bounded subset and that there is a bounded set which is mapped into itself by this mapping. In particular, we obtain that the typical nonexpansive mapping in this setting has a unique fixed point which can be reached by iterating the mapping. Nevertheless, it turns out that the typical mapping is not a Rakotch contraction on the whole space and that it has the maximal possible Lipschitz constant of one on a residual subset of its domain. By typical we mean that the complement of the set of mappings with this property is $σ$-$φ$-porous, that is, small in a metric sense. For a metric of pointwise convergence, we show that the set of Rakotch contractions is meagre.

Submitted 8 February, 2023; v1 submitted 21 April, 2022; originally announced April 2022.

Comments: 38 pages; added more details to some parts of the manuscript; corrected a number of small mistakes

MSC Class: 47H09; 54E52

Journal ref: J. Math. Anal. Appl. 526(1): Article 127179, 2023

arXiv:2204.05707 [pdf, other]

A Neural Network for Solving Inverse Quasi-Variational Inequalities

Authors: Soumitra Dey, Simeon Reich

Abstract: We study the existence and uniqueness of solutions to the inverse quasi-variational inequality problem. Motivated by the neural network approach to solving optimization problems such as variational inequality, monotone inclusion, and inverse variational problems, we consider a neural network associated with the inverse quasi-variational inequality problem, and establish the existence and uniqueness of a solution to the proposed network. We prove that every trajectory of the proposed neural network converges to the unique solution of the inverse quasi-variational inequality problem and that the network is globally asymptotically stable at its equilibrium point. We also prove that if the function which governs the inverse quasi-variational inequality problem is strongly monotone and Lipschitz continuous, then the network is globally exponentially stable at its equilibrium point. We discretize the network and show that the sequence generated by the discretization of the network converges strongly to a solution of the inverse quasi-variational inequality problem under certain assumptions on the parameters involved. Finally, we provide numerical examples to support and illustrate our theoretical results.

Submitted 12 April, 2022; originally announced April 2022.

MSC Class: 92B20; 58E35; 47J20; 68T05; 82C32; 90C33

arXiv:2204.04386 [pdf, other]

Efficient Derivative-free Bayesian Inference for Large-Scale Inverse Problems

Authors: Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M. Stuart

Abstract: We consider Bayesian inference for large scale inverse problems, where computational challenges arise from the need for repeated evaluations of an expensive forward model. This renders most Markov chain Monte Carlo approaches infeasible, since they typically require $O(10^4)$ model runs, or more. Moreover, the forward model is often given as a black box or is impractical to differentiate. Therefore derivative-free algorithms are highly desirable. We propose a framework, which is built on Kalman methodology, to efficiently perform Bayesian inference in such inverse problems. The basic method is based on an approximation of the filtering distribution of a novel mean-field dynamical system into which the inverse problem is embedded as an observation operator. Theoretical properties of the mean-field model are established for linear inverse problems, demonstrating that the desired Bayesian posterior is given by the steady state of the law of the filtering distribution of the mean-field dynamical system, and proving exponential convergence to it. This suggests that, for nonlinear problems which are close to Gaussian, sequentially computing this law provides the basis for efficient iterative methods to approximate the Bayesian posterior. Ensemble methods are applied to obtain interacting particle system approximations of the filtering distribution of the mean-field model; and practical strategies to further reduce the computational and memory cost of the methodology are presented, including low-rank approximation and a bi-fidelity approach. The effectiveness of the framework is demonstrated in several numerical experiments, including proof-of-concept linear/nonlinear examples and two large-scale applications: learning of permeability parameters in subsurface flow; and learning subgrid-scale parameters in a global climate model from time-averaged statistics.

Submitted 11 August, 2022; v1 submitted 9 April, 2022; originally announced April 2022.

Comments: 44 pages, 15 figures

arXiv:2204.00275 [pdf, ps, other]

Unrestricted Douglas-Rachford algorithms for solving convex feasibility problems in Hilbert space

Authors: Kay Barshad, Aviv Gibali, Simeon Reich

Abstract: In this work we focus on the convex feasibility problem (CFP) in Hilbert space. A specific method in this area that has gained a lot of interest in recent years is the Douglas-Rachford (DR) algorithm. This algorithm was originally introduced in 1956 for solving stationary and non-stationary heat equations. Then in 1979, Lions and Mercier adjusted and extended the algorithm with the aim of solving CFPs and even more general problems, such as finding zeros of the sum of two maximally monotone operators. Many developments which implement various concepts concerning this algorithm have occurred during the last decade. We introduce an unrestricted DR algorithm, which provides a general framework for such concepts. Using unrestricted products of a finite number of strongly nonexpansive operators, we apply this framework to provide new iterative methods, where, inter alia, such operators may be interlaced between the operators used in the scheme of our unrestricted DR algorithm.

Submitted 5 November, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

Comments: 13 pages

MSC Class: 65K05; 90C25

arXiv:2201.00611 [pdf, ps, other]

doi 10.1007/978-3-031-18988-3_15

Robust parameter estimation using the ensemble Kalman filter

Authors: Sebastian Reich

Abstract: Standard maximum likelihood or Bayesian approaches to parameter estimation for stochastic differential equations are not robust to perturbations in the continuous-in-time data. In this paper, we give a rather elementary explanation of this observation in the context of continuous-time parameter estimation using an ensemble Kalman filter. We employ the frequentist perspective to shed new light on three robust estimation techniques; namely subsampling the data, rough path corrections, and data filtering. We illustrate our findings through a simple numerical experiment.

Submitted 19 December, 2023; v1 submitted 3 January, 2022; originally announced January 2022.

MSC Class: 62L12; 62M20; 62F15; 60L90

arXiv:2107.06621 [pdf, other]

doi 10.1214/23-AAP1957

Rough McKean-Vlasov dynamics for robust ensemble Kalman filtering

Authors: Michele Coghi, Torstein Nilssen, Nikolas Nüsken, Sebastian Reich

Abstract: Motivated by the challenge of incorporating data into misspecified and multiscale dynamical models, we study a McKean-Vlasov equation that contains the data stream as a common driving rough path. This setting allows us to prove well-posedness as well as continuity with respect to the driver in an appropriate rough-path topology. The latter property is key in our subsequent development of a robust data assimilation methodology: We establish propagation of chaos for the associated interacting particle system, which in turn is suggestive of a numerical scheme that can be viewed as an extension of the ensemble Kalman filter to a rough-path framework. Finally, we discuss a data-driven method based on subsampling to construct suitable rough path lifts and demonstrate the robustness of our scheme in a number of numerical experiments related to parameter estimation problems in multiscale contexts.

Submitted 20 January, 2022; v1 submitted 14 July, 2021; originally announced July 2021.

Comments: 44 pages, 7 figures

MSC Class: 60L20; 60L90; 60H10; 60F99; 65C35; 62M05

arXiv:2104.08061 [pdf, other]

Affine-invariant ensemble transform methods for logistic regression

Authors: Jakiw Pidstrigach, Sebastian Reich

Abstract: We investigate the application of ensemble transform approaches to Bayesian inference of logistic regression problems. Our approach relies on appropriate extensions of the popular ensemble Kalman filter and the feedback particle filter to the cross entropy loss function and is based on a well-established homotopy approach to Bayesian inference. The arising finite particle evolution equations as well as their mean-field limits are affine-invariant. Furthermore, the proposed methods can be implemented in a gradient-free manner in case of nonlinear logistic regression and the data can be randomly subsampled similar to mini-batching of stochastic gradient descent. We also propose a closely related SDE-based sampling method which again is affine-invariant and can easily be made gradient-free. Numerical examples demonstrate the appropriateness of the proposed methodologies.

Submitted 24 September, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

MSC Class: 62J02; 65C05; 62F15 (Primary) 65C35 (Secondary)

arXiv:2102.00471 [pdf, ps, other]

Finitely Convergent Iterative Methods with Overrelaxations Revisited

Authors: Victor I. Kolobov, Simeon Reich, Rafał Zalas

Abstract: We study the finite convergence of iterative methods for solving convex feasibility problems. Our key assumptions are that the interior of the solution set is nonempty and that certain overrelaxation parameters converge to zero, but with a rate slower than any geometric sequence. Unlike other works in this area, which require divergent series of overrelaxations, our approach allows us to consider some summable series. By employing quasi-Fejérian analysis in the latter case, we obtain additional asymptotic convergence guarantees, even when the interior of the solution set is empty.

Submitted 11 July, 2021; v1 submitted 31 January, 2021; originally announced February 2021.

MSC Class: 47J25; 47N10; 90C25

arXiv:2101.03612 [pdf, other]

doi 10.1007/s10596-021-10100-y

Randomized maximum likelihood based posterior sampling

Authors: Yuming Ba, Jana de Wiljes, Dean S. Oliver, Sebastian Reich

Abstract: Minimization of a stochastic cost function is commonly used for approximate sampling in high-dimensional Bayesian inverse problems with Gaussian prior distributions and multimodal posterior distributions. The density of the samples generated by minimization is not the desired target density, unless the observation operator is linear, but the distribution of samples is useful as a proposal density for importance sampling or for Markov chain Monte Carlo methods. In this paper, we focus on applications to sampling from multimodal posterior distributions in high dimensions. We first show that sampling from multimodal distributions is improved by computing all critical points instead of only minimizers of the objective function. For applications to high-dimensional geoscience problems, we demonstrate an efficient approximate weighting that uses a low-rank Gauss-Newton approximation of the determinant of the Jacobian. The method is applied to two toy problems with known posterior distributions and a Darcy flow problem with multiple modes in the posterior.

Submitted 17 August, 2021; v1 submitted 10 January, 2021; originally announced January 2021.

MSC Class: 65C05; 62F15; 65K10

arXiv:2011.08640 [pdf, ps, other]

Class Group Relations in a Function Field Analogue of ${\mathbb Q}(ζ_p, \sqrt[p]{n})$

Authors: Steven Reich

Abstract: For an odd prime $p$ and polynomial $P(T)$, we consider the extension $F$ of $k={\mathbb F}_p(T)$ defined by adjoining a root of $x^p+Tx-P(T)$. Such a field is a function field analogue of the number field ${\mathbb Q}(\sqrt[p]{n})$. We prove two theorems about the Galois closure $L$ of $F$: that its degree-0 divisor class group is $A^{p-1}$ for some group $A$, and that its class number is the $(p-1)$-st power of the class number of $F$, in analogy with results of R. Schoof and T. Honda for number fields.

Submitted 17 November, 2020; originally announced November 2020.

arXiv:2007.12658 [pdf, ps, other]

doi 10.1137/20m1355197

McKean-Vlasov SDEs in nonlinear filtering

Authors: Sahani Pathiraja, Sebastian Reich, Wilhelm Stannat

Abstract: Various particle filters have been proposed over the last couple of decades with the common feature that the update step is governed by a type of control law. This feature makes them an attractive alternative to traditional sequential Monte Carlo, which scales poorly with the state dimension due to weight degeneracy. This article proposes a unifying framework that allows one to systematically derive the McKean-Vlasov representations of these filters for the discrete time and continuous time observation case, taking inspiration from the smooth approximation of the data considered in Crisan & Xiong (2010) and Clark & Crisan (2005). We consider three filters that have been proposed in the literature and use this framework to derive Itô representations of their limiting forms as the approximation parameter $δ\rightarrow 0$. All filters require the solution of a Poisson equation defined on $\mathbb{R}^{d}$, for which existence and uniqueness of solutions can be a non-trivial issue. We additionally establish conditions on the signal-observation system that ensure well-posedness of the weighted Poisson equation arising in one of the filters.

Submitted 17 November, 2021; v1 submitted 24 July, 2020; originally announced July 2020.

Journal ref: SIAM Journal on Control and Optimization, 59(6), pp.4188-4215 (2021)

arXiv:2007.10664 [pdf, ps, other]

Error Bounds for the Method of Simultaneous Projections with Infinitely Many Subspaces

Authors: Simeon Reich, Rafał Zalas

Abstract: We investigate the properties of the simultaneous projection method as applied to countably infinitely many closed and linear subspaces of a real Hilbert space. We establish the optimal error bound for linear convergence of this method, which we express in terms of the cosine of the Friedrichs angle computed in an infinite product space. In addition, we provide estimates and alternative expressions for the above-mentioned number. Furthermore, we relate this number to the dichotomy theorem and to super-polynomially fast convergence. We also discuss polynomial convergence of the simultaneous projection method which takes place for particularly chosen starting points.

Submitted 29 August, 2021; v1 submitted 21 July, 2020; originally announced July 2020.

MSC Class: 41A25; 41A28; 41A44; 41A65

arXiv:2006.02037 [pdf, other]

Spectral convergence of diffusion maps: improved error bounds and an alternative normalisation

Authors: Caroline L. Wormell, Sebastian Reich

Abstract: Diffusion maps is a manifold learning algorithm widely used for dimensionality reduction. Using a sample from a distribution, it approximates the eigenvalues and eigenfunctions of associated Laplace-Beltrami operators. Theoretical bounds on the approximation error are however generally much weaker than the rates that are seen in practice. This paper uses new approaches to improve the error bounds in the model case where the distribution is supported on a hypertorus. For the data sampling (variance) component of the error we make spatially localised compact embedding estimates on certain Hardy spaces; we study the deterministic (bias) component as a perturbation of the Laplace-Beltrami operator's associated PDE, and apply relevant spectral stability results. Using these approaches, we match long-standing pointwise error bounds for both the spectral data and the norm convergence of the operator discretisation. We also introduce an alternative normalisation for diffusion maps based on Sinkhorn weights. This normalisation approximates a Langevin diffusion on the sample and yields a symmetric operator approximation. We prove that it has better convergence compared with the standard normalisation on flat domains, and present a highly efficient algorithm to compute the Sinkhorn weights.

Submitted 7 April, 2021; v1 submitted 3 June, 2020; originally announced June 2020.

Comments: Electronic copy of the final peer-reviewed manuscript accepted for publication

MSC Class: 35P15; 60J60; 62M05; 65D99

arXiv:2006.00702 [pdf, ps, other]

doi 10.3390/e22080802

Interacting particle solutions of Fokker-Planck equations through gradient-log-density estimation

Authors: Dimitra Maoutsa, Sebastian Reich, Manfred Opper

Abstract: Fokker-Planck equations are extensively employed in various scientific fields as they characterise the behaviour of stochastic systems at the level of probability density functions. Although broadly used, they allow for analytical treatment only in limited settings, and it is often inevitable to resort to numerical solutions. Here, we develop a computational approach for simulating the time evolution of Fokker-Planck solutions in terms of a mean field limit of an interacting particle system. The interactions between particles are determined by the gradient of the logarithm of the particle density, approximated here by a novel statistical estimator. The performance of our method shows promising results, with more accurate and less fluctuating statistics compared to direct stochastic simulations of comparable particle number. Taken together, our framework allows for effortless and reliable particle-based simulations of Fokker-Planck equations in low and moderate dimensions. The proposed gradient-log-density estimator is also of independent interest, for example, in the context of optimal control.

Submitted 1 June, 2020; originally announced June 2020.

Comments: 34 pages, 8 figures

MSC Class: 82C80; 37M05; 37H05; 60H35; 65C35; 65N75

arXiv:2004.07540 [pdf, ps, other]

doi 10.1016/j.laa.2020.05.023

On angles, projections and iterations

Authors: Christian Bargetz, Jona Klemenc, Simeon Reich, Natalia Skorokhod

Abstract: We investigate connections between the geometry of linear subspaces and the convergence of the alternating projection method for linear projections. The aim of this article is twofold: in the first part, we show that even in Euclidean spaces the convergence of the alternating method is not determined by the principal angles between the subspaces involved. In the second part, we investigate the properties of the Oppenheim angle between two linear projections. We discuss, in particular, the question of existence and uniqueness of "consistency projections" in this context.

Submitted 25 June, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

Comments: 15 pages; published in "Linear Algebra and Its Applications". This version corrects a number of misprints

Journal ref: Linear Algebra Appl. 603 (2020): 41-56

arXiv:2004.02567 [pdf, ps, other]

doi 10.12775/TMNA.2020.040

On the existence of fixed points for typical nonexpansive mappings on spaces with positive curvature

Authors: Christian Bargetz, Michael Dymond, Emir Medjic, Simeon Reich

Abstract: We show that the typical nonexpansive mapping on a small enough subset of a CAT($κ$)-space is a contraction in the sense of Rakotch. By typical we mean that the set of nonexpansive mappings without this property is a $σ$-porous set and therefore also of the first Baire category. Moreover, we exhibit metric spaces where strict contractions are not dense in the space of nonexpansive mappings. In some of these cases we show that all continuous self-mappings have a fixed point nevertheless.

Submitted 6 April, 2021; v1 submitted 6 April, 2020; originally announced April 2020.

Comments: 14 pages. Accepted version of the manuscript

MSC Class: 47H09; 54E52

Journal ref: Topol. Methods Nonlinear Anal. 57 (2021), 621-634

arXiv:2003.09219 [pdf, ps, other]

Posterior contraction rates for non-parametric state and drift estimation

Authors: Sebastian Reich, Paul Rozdeba

Abstract: We consider a combined state and drift estimation problem for the linear stochastic heat equation. The infinite-dimensional Bayesian inference problem is formulated in terms of the Kalman-Bucy filter over an extended state space, and its long-time asymptotic properties are studied. Asymptotic posterior contraction rates in the unknown drift function are the main contribution of this paper. Such rates have been studied before for stationary non-parametric Bayesian inverse problems, and here we demonstrate the consistency of our time-dependent formulation with these previous results building upon scale separation and a slow manifold approximation.

Submitted 17 August, 2020; v1 submitted 20 March, 2020; originally announced March 2020.

MSC Class: 62G20; 62G05; 60H15; 62F15; 62M20

arXiv:1912.02859 [pdf, ps, other]

Affine invariant interacting Langevin dynamics for Bayesian inference

Authors: Alfredo Garbuno-Inigo, Nikolas Nüsken, Sebastian Reich

Abstract: We propose a computational method (with acronym ALDI) for sampling from a given target distribution based on first-order (overdamped) Langevin dynamics which satisfies the property of affine invariance. The central idea of ALDI is to run an ensemble of particles with their empirical covariance serving as a preconditioner for their underlying Langevin dynamics. ALDI does not require taking the inverse or square root of the empirical covariance matrix, which enables application to high-dimensional sampling problems. The theoretical properties of ALDI are studied in terms of non-degeneracy and ergodicity. Furthermore, we study its connections to diffusion on Riemannian manifolds and Wasserstein gradient flows. Bayesian inference serves as a main application area for ALDI. In case of a forward problem with additive Gaussian measurement errors, ALDI allows for a gradient-free approximation in the spirit of the ensemble Kalman filter. A computational comparison between gradient-free and gradient-based ALDI is provided for a PDE constrained Bayesian inverse problem.

Submitted 9 April, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

MSC Class: 65N21; 62F15; 65N75; 65C30; 90C56

arXiv:1911.10832 [pdf, other]

Fokker-Planck particle systems for Bayesian inference: Computational approaches

Authors: Sebastian Reich, Simon Weissmann

Abstract: Bayesian inference can be embedded into an appropriately defined dynamics in the space of probability measures. In this paper, we take Brownian motion and its associated Fokker--Planck equation as a starting point for such embeddings and explore several interacting particle approximations. More specifically, we consider both deterministic and stochastic interacting particle systems and combine them with the idea of preconditioning by the empirical covariance matrix. In addition to leading to affine invariant formulations which asymptotically speed up convergence, preconditioning allows for gradient-free implementations in the spirit of the ensemble Kalman filter. While such gradient-free implementations have been demonstrated to work well for posterior measures that are nearly Gaussian, we extend their scope of applicability to multimodal measures by introducing localised gradient-free approximations. Numerical results demonstrate the effectiveness of the considered methodologies.

Submitted 8 February, 2021; v1 submitted 25 November, 2019; originally announced November 2019.

arXiv:1908.10890 [pdf, ps, other]

Note on Interacting Langevin Diffusions: Gradient Structure and Ensemble Kalman Sampler by Garbuno-Inigo, Hoffmann, Li and Stuart

Authors: Nikolas Nüsken, Sebastian Reich

Abstract: An interacting system of Langevin dynamics driven particles has been proposed for sampling from a given posterior density by Garbuno-Inigo, Hoffmann, Li and Stuart in Interacting Langevin Diffusions: Gradient Structure and Ensemble Kalman Sampler (arXiv:1903.08866v2). The proposed formulation is primarily studied from a formal mean-field limit perspective, while the theoretical behaviour under a finite particle size is left as an open problem. In this note we demonstrate that the particle-based covariance interaction term requires a non-trivial correction. We also show that the corrected dynamics samples exactly from the desired posterior provided that the empirical covariance matrix of the particle system remains non-singular and the posterior log-density satisfies the standard Bakry-Emery criterion.

Submitted 28 August, 2019; originally announced August 2019.

MSC Class: 60H10; 82C22; 62F15; 35Q84

arXiv:1908.07398 [pdf, other]

Outer Approximation Methods for Solving Variational Inequalities Defined over the Solution Set of a Split Convex Feasibility Problem

Authors: Andrzej Cegielski, Aviv Gibali, Simeon Reich, Rafał Zalas

Abstract: We study variational inequalities which are governed by a strongly monotone and Lipschitz continuous operator $F$ over a closed and convex set $S$. We assume that $S=C\cap A^{-1}(Q)$ is the nonempty solution set of a (multiple-set) split convex feasibility problem, where $C$ and $Q$ are both closed and convex subsets of two real Hilbert spaces $\mathcal H_1$ and $\mathcal H_2$, respectively, and the operator $A$ acting between them is linear. We consider a modification of the gradient projection method the main idea of which is to replace at each step the metric projection onto $S$ by another metric projection onto a half-space which contains $S$. We propose three variants of a method for constructing the above-mentioned half-spaces by employing the multiple-set and the split structure of the set $S$. For the split part we make use of the Landweber transform.

Submitted 20 August, 2019; originally announced August 2019.

MSC Class: 47H09; 47H10; 47J20; 47J25; 65K15

arXiv:1905.05660 [pdf, ps, other]

Finitely Convergent Deterministic and Stochastic Iterative Methods for Solving Convex Feasibility Problems

Authors: Victor I. Kolobov, Simeon Reich, Rafał Zalas

Abstract: We propose finitely convergent methods for solving convex feasibility problems defined over a possibly infinite pool of constraints. Following other works in this area, we assume that the interior of the solution set is nonempty and that certain overrelaxation parameters form a divergent series. We combine our methods with a very general class of deterministic control sequences where, roughly speaking, we require that sooner or later we encounter a violated constraint if one exists. This requirement is satisfied, in particular, by the cyclic, repetitive and remotest set controls. Moreover, it is almost surely satisfied for random controls.

Submitted 20 September, 2020; v1 submitted 14 May, 2019; originally announced May 2019.

arXiv:1903.10924 [pdf, ps, other]

Generic Convergence of Sequences of Successive Approximations in Banach Spaces

Authors: Christian Bargetz, Simeon Reich

Abstract: We study the generic behavior of the method of successive approximations for set-valued mappings in Banach spaces. We consider, in particular, the case of those set-valued mappings which are defined by pairs of nonexpansive mappings and give a positive answer to a question raised by Francesco S. de Blasi.

Submitted 26 March, 2019; originally announced March 2019.

Comments: 18 pages

MSC Class: 47H04; 47H09; 47H10; 54E52

Journal ref: Pure and Applied Functional Analysis 4(3): 477-493, 2019

arXiv:1903.10717 [pdf, ps, other]

doi 10.3390/e21050505

State and Parameter Estimation from Observed Signal Increments

Authors: Nikolas Nüsken, Sebastian Reich, Paul J. Rozdeba

Abstract: The success of the ensemble Kalman filter has triggered a strong interest in expanding its scope beyond classical state estimation problems. In this paper, we focus on continuous-time data assimilation where the model and measurement errors are correlated and both states and parameters need to be identified. Such scenarios arise from noisy and partial observations of Lagrangian particles which move under a stochastic velocity field involving unknown parameters. We take an appropriate class of McKean-Vlasov equations as the starting point to derive ensemble Kalman-Bucy filter algorithms for combined state and parameter estimation. We demonstrate their performance through a series of increasingly complex multi-scale model systems.

Submitted 1 May, 2019; v1 submitted 26 March, 2019; originally announced March 2019.

MSC Class: 62M20; 93E11; 93E20; 65D10; 65C05; 65C35

arXiv:1903.00186 [pdf, ps, other]

Discrete gradients for computational Bayesian inference

Authors: Sahani Pathiraja, Sebastian Reich

Abstract: In this paper, we exploit the gradient flow structure of continuous-time formulations of Bayesian inference in terms of their numerical time-stepping. We focus on two particular examples, namely, the continuous-time ensemble Kalman-Bucy filter and a particle discretisation of the Fokker-Planck equation associated to Brownian dynamics. Both formulations can lead to stiff differential equations which require special numerical methods for their efficient numerical implementation. We compare discrete gradient methods to alternative semi-implicit and other iterative implementations of the underlying Bayesian inference problems.

Submitted 21 June, 2019; v1 submitted 1 March, 2019; originally announced March 2019.

MSC Class: 65M12; 62F15; 65C05

arXiv:1902.02363 [pdf, ps, other]

Stability of the optimal values under small perturbations of the constraint set

Authors: Daniel Reem, Simeon Reich, Alvaro De Pierro

Abstract: This paper discusses a general and useful stability principle which, roughly speaking, says that given a uniformly continuous function defined on an arbitrary metric space, if the function is bounded on the constraint set and we slightly change this set, then its optimal (extreme) values on this set vary slightly, and, moreover, they are actually uniformly continuous as a function of the constraint set. The principle holds in a much more general setting than a metric space, since the distance function may be asymmetric, may attain negative and even infinite values, and so on. This stability principle leads to applications in parametric optimization, mixed linear-nonlinear programming and analysis of Lipschitz continuity, as well as to a general scheme for tackling a wide class of non-convex and non-smooth optimization problems. We also discuss the issue of stability when the objective function is merely continuous. As a byproduct of our analysis we obtain a significant generalization of the concept of a generalized inverse of a linear operator and a very general variant of the so-called "Hoffman's Lemma".

Submitted 29 March, 2020; v1 submitted 6 February, 2019; originally announced February 2019.

Comments: To appear in "Pure and Applied Functional Analysis"; correction of a few minor linguistic inaccuracies; added an example of a space with a negative distance in Subsection 1.2 (the ninth example); added a few references; added thanks

MSC Class: 90C31; 49K40; 90C26; 54E99; 46A19; 90C59; 54C30; 15A06; 15A09 ACM Class: G.1.0; G.1.2; G.1.6; G.1.10; J.2

Journal ref: Pure and Applied Functional Analysis 5 (2020), no. 3, 705--731

arXiv:1901.06300 [pdf, ps, other]

Ensemble transform algorithms for nonlinear smoothing problems

Authors: Jana de Wiljes, Sahani Pathiraja, Sebastian Reich

Abstract: Several numerical tools designed to overcome the challenges of smoothing in a nonlinear and non-Gaussian setting are investigated for a class of particle smoothers. The considered family of smoothers is induced by the class of linear ensemble transform filters which contains classical filters such as the stochastic ensemble Kalman filter, the ensemble square root filter and the recently introduced nonlinear ensemble transform filter. Further the ensemble transform particle smoother is introduced and particularly highlighted as it is consistent in the particle limit and does not require assumptions with respect to the family of the posterior distribution. The linear update pattern of the considered class of linear ensemble transform smoothers allows one to implement important supplementary techniques such as adaptive spread corrections, hybrid formulations, and localization in order to facilitate their application to complex estimation problems. These additional features are derived and numerically investigated for a sequence of increasingly challenging test problems.

Submitted 28 October, 2019; v1 submitted 18 January, 2019; originally announced January 2019.

MSC Class: 65C05; 62M20; 93E11; 62F15; 86A22

arXiv:1812.07450 [pdf, ps, other]

Weak, Strong and Linear Convergence of the CQ-Method Via the Regularity of Landweber Operators

Authors: Andrzej Cegielski, Simeon Reich, Rafał Zalas

Abstract: We consider the split convex feasibility problem in a fixed point setting. Motivated by the well-known CQ-method of Byrne (2002), we define an abstract Landweber transform which applies to more general operators than the metric projection. We call the result of this transform a Landweber operator. It turns out that the Landweber transform preserves many interesting properties. For example, the Landweber transform of a (quasi/firmly) nonexpansive mapping is again (quasi/firmly) nonexpansive. Moreover, the Landweber transform of a (weakly/linearly) regular mapping is again (weakly/linearly) regular. The preservation of regularity is important because it leads to (weak/linear) convergence of many CQ-type methods.

Submitted 18 December, 2018; originally announced December 2018.

MSC Class: 47J25; 47N10; 49N45

Showing 1–50 of 99 results for author: Reich, S