Search | arXiv e-print repository

Robust parameter estimation for partially observed second-order diffusion processes

Abstract: Estimating parameters of a diffusion process given continuous-time observations of the process via maximum likelihood approaches or, online, via stochastic gradient descent or Kalman filter formulations constitutes a well-established research area. It has also been established previously that these techniques are, in general, not robust to perturbations in the data in the form of temporal correlat… ▽ More Estimating parameters of a diffusion process given continuous-time observations of the process via maximum likelihood approaches or, online, via stochastic gradient descent or Kalman filter formulations constitutes a well-established research area. It has also been established previously that these techniques are, in general, not robust to perturbations in the data in the form of temporal correlations. While the subject is relatively well understood and appropriate modifications have been suggested in the context of multi-scale diffusion processes and their reduced model equations, we consider here an alternative setting where a second-order diffusion process in positions and velocities is only observed via its positions. In this note, we propose a simple modification to standard stochastic gradient descent and Kalman filter formulations, which eliminates the arising systematic estimation biases. The modification can be extended to standard maximum likelihood approaches and avoids computation of previously proposed correction terms. △ Less

Submitted 20 June, 2024; originally announced June 2024.

MSC Class: 65C30; 65L09; 60M20; 62F10; 62F15; 62L20

arXiv:2401.04372 [pdf, ps, other]

Stable generative modeling using diffusion maps

Authors: Georg Gottwald, Fengyi Li, Youssef Marzouk, Sebastian Reich

Abstract: We consider the problem of sampling from an unknown distribution for which only a sufficiently large number of training samples are available. Such settings have recently drawn considerable interest in the context of generative modelling. In this paper, we propose a generative model combining diffusion maps and Langevin dynamics. Diffusion maps are used to approximate the drift term from the avail… ▽ More We consider the problem of sampling from an unknown distribution for which only a sufficiently large number of training samples are available. Such settings have recently drawn considerable interest in the context of generative modelling. In this paper, we propose a generative model combining diffusion maps and Langevin dynamics. Diffusion maps are used to approximate the drift term from the available training samples, which is then implemented in a discrete-time Langevin sampler to generate new samples. By setting the kernel bandwidth to match the time step size used in the unadjusted Langevin algorithm, our method effectively circumvents any stability issues typically associated with time-step** stiff stochastic differential equations. More precisely, we introduce a novel split-step scheme, ensuring that the generated samples remain within the convex hull of the training samples. Our framework can be naturally extended to generate conditional samples. We demonstrate the performance of our proposed scheme through experiments on synthetic datasets with increasing dimensions and on a stochastic subgrid-scale parametrization conditional sampling problem. △ Less

Submitted 9 January, 2024; originally announced January 2024.

Comments: 23 pages, 25 figures

arXiv:2310.06721 [pdf, other]

Tweedie Moment Projected Diffusions For Inverse Problems

Authors: Benjamin Boys, Mark Girolami, Jakiw Pidstrigach, Sebastian Reich, Alan Mosca, O. Deniz Akyildiz

Abstract: Diffusion generative models unlock new possibilities for inverse problems as they allow for the incorporation of strong empirical priors into the process of scientific inference. Recently, diffusion models received significant attention for solving inverse problems by posterior sampling, but many challenges remain open due to the intractability of this sampling process. Prior work resorted to Gaus… ▽ More Diffusion generative models unlock new possibilities for inverse problems as they allow for the incorporation of strong empirical priors into the process of scientific inference. Recently, diffusion models received significant attention for solving inverse problems by posterior sampling, but many challenges remain open due to the intractability of this sampling process. Prior work resorted to Gaussian approximations to conditional densities of the reverse process, leveraging Tweedie's formula to parameterise its mean, complemented with various heuristics. In this work, we leverage higher order information using Tweedie's formula and obtain a finer approximation with a principled covariance estimate. This novel approximation removes any time-dependent step-size hyperparameters required by earlier methods, and enables higher quality approximations of the posterior density which results in better samples. Specifically, we tackle noisy linear inverse problems and obtain a novel approximation to the gradient of the likelihood. We then plug this gradient estimate into various diffusion models and show that this method is optimal for a Gaussian data distribution. We illustrate the empirical effectiveness of our approach for general linear inverse problems on toy synthetic examples as well as image restoration using pretrained diffusion models as the prior. We show that our method improves the sample quality by providing statistically principled approximations to diffusion posterior sampling problem. △ Less

Submitted 22 November, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: 9 pages, 4 figures, 1 table when excluding abstract and bibliography; 29 pages, 12 figures, 2 tables when including abstract and bibliography

arXiv:2310.03597 [pdf, other]

Sampling via Gradient Flows in the Space of Probability Measures

Authors: Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M Stuart

Abstract: Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design com… ▽ More Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, in theory and by using particle methods. Our third contribution is to study, and develop efficient algorithms based on Gaussian approximations of the gradient flows; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically. △ Less

Submitted 9 March, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: Related and text overlap with arXiv:2302.11024

arXiv:2309.04742 [pdf, other]

Affine Invariant Ensemble Transform Methods to Improve Predictive Uncertainty in Neural Networks

Authors: Diksha Bhandari, Jakiw Pidstrigach, Sebastian Reich

Abstract: We consider the problem of performing Bayesian inference for logistic regression using appropriate extensions of the ensemble Kalman filter. Two interacting particle systems are proposed that sample from an approximate posterior and prove quantitative convergence rates of these interacting particle systems to their mean-field limit as the number of particles tends to infinity. Furthermore, we appl… ▽ More We consider the problem of performing Bayesian inference for logistic regression using appropriate extensions of the ensemble Kalman filter. Two interacting particle systems are proposed that sample from an approximate posterior and prove quantitative convergence rates of these interacting particle systems to their mean-field limit as the number of particles tends to infinity. Furthermore, we apply these techniques and examine their effectiveness as methods of Bayesian approximation for quantifying predictive uncertainty in neural networks. △ Less

Submitted 1 July, 2024; v1 submitted 9 September, 2023; originally announced September 2023.

arXiv:2303.11941 [pdf, other]

Bayesian Dynamical Modeling of Fixational Eye Movements

Authors: Lisa Schwetlick, Sebastian Reich, Ralf Engbert

Abstract: Humans constantly move their eyes, even during visual fixations, where miniature (or fixational) eye movements are produced involuntarily. Fixational eye movements are composed of slow components (physiological drift and tremor) and fast microsaccades. The complex dynamics of physiological drift can be modeled qualitatively as a statistically self-avoiding random walk (SAW model, see Engbert et al… ▽ More Humans constantly move their eyes, even during visual fixations, where miniature (or fixational) eye movements are produced involuntarily. Fixational eye movements are composed of slow components (physiological drift and tremor) and fast microsaccades. The complex dynamics of physiological drift can be modeled qualitatively as a statistically self-avoiding random walk (SAW model, see Engbert et al., 2011). In this study, we implement a data assimilation approach for the SAW model to explain quantitative differences in experimental data obtained from high-resolution, video-based eye tracking. We present a likelihood function for the SAW model which allows us apply Bayesian parameter estimation at the level of individual human participants. Based on the model fits we find a relationship between the activation predicted by the SAW model and the occurrence of microsaccades. The latent model activation relative to microsaccade onsets and offsets using experimental data reveals evidence for a triggering mechanism for microsaccades. These findings suggest that the SAW model is capable of capturing individual differences and can serve as a tool for exploring the relationship between physiological drift and microsaccades as the two most important components of fixational eye movements. Our results contribute to the understanding of individual variability in microsaccade behaviors and the role of fixational eye movements in visual information processing. △ Less

Submitted 21 March, 2023; originally announced March 2023.

arXiv:2302.11024 [pdf, other]

Gradient Flows for Sampling: Mean-Field Models, Gaussian Approximations and Affine Invariance

Authors: Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M. Stuart

Abstract: Sampling a probability distribution with an unknown normalization constant is a fundamental problem in computational science and engineering. This task may be cast as an optimization problem over all probability measures, and an initial distribution can be evolved to the desired minimizer dynamically via gradient flows. Mean-field models, whose law is governed by the gradient flow in the space of… ▽ More Sampling a probability distribution with an unknown normalization constant is a fundamental problem in computational science and engineering. This task may be cast as an optimization problem over all probability measures, and an initial distribution can be evolved to the desired minimizer dynamically via gradient flows. Mean-field models, whose law is governed by the gradient flow in the space of probability measures, may also be identified; particle approximations of these mean-field models form the basis of algorithms. The gradient flow approach is also the basis of algorithms for variational inference, in which the optimization is performed over a parameterized family of probability distributions such as Gaussians, and the underlying gradient flow is restricted to the parameterized family. By choosing different energy functionals and metrics for the gradient flow, different algorithms with different convergence properties arise. In this paper, we concentrate on the Kullback-Leibler divergence after showing that, up to scaling, it has the unique property that the gradient flows resulting from this choice of energy do not depend on the normalization constant. For the metrics, we focus on variants of the Fisher-Rao, Wasserstein, and Stein metrics; we introduce the affine invariance property for gradient flows, and their corresponding mean-field models, determine whether a given metric leads to affine invariance, and modify it to make it affine invariant if it does not. We study the resulting gradient flows in both probability density space and Gaussian space. The flow in the Gaussian space may be understood as a Gaussian approximation of the flow. We demonstrate that the Gaussian approximation based on the metric and through moment closure coincide, establish connections between them, and study their long-time convergence properties showing the advantages of affine invariance. △ Less

Submitted 2 November, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: 82 pages, 8 figures (Welcome any feedback!)

arXiv:2302.10130 [pdf, other]

Infinite-Dimensional Diffusion Models

Authors: Jakiw Pidstrigach, Youssef Marzouk, Sebastian Reich, Sven Wang

Abstract: Diffusion models have had a profound impact on many application areas, including those where data are intrinsically infinite-dimensional, such as images or time series. The standard approach is first to discretize and then to apply diffusion models to the discretized data. While such approaches are practically appealing, the performance of the resulting algorithms typically deteriorates as discret… ▽ More Diffusion models have had a profound impact on many application areas, including those where data are intrinsically infinite-dimensional, such as images or time series. The standard approach is first to discretize and then to apply diffusion models to the discretized data. While such approaches are practically appealing, the performance of the resulting algorithms typically deteriorates as discretization parameters are refined. In this paper, we instead directly formulate diffusion-based generative models in infinite dimensions and apply them to the generative modeling of functions. We prove that our formulations are well posed in the infinite-dimensional setting and provide dimension-independent distance bounds from the sample to the target measure. Using our theory, we also develop guidelines for the design of infinite-dimensional diffusion models. For image distributions, these guidelines are in line with the canonical choices currently made for diffusion models. For other distributions, however, we can improve upon these canonical choices, which we show both theoretically and empirically, by applying the algorithms to data distributions on manifolds and inspired by Bayesian inverse problems or simulation-based inference. △ Less

Submitted 3 October, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

MSC Class: 68T99; 60Hxx

arXiv:2010.06721 [pdf, other]

Ensemble Distillation for Structured Prediction: Calibrated, Accurate, Fast-Choose Three

Authors: Steven Reich, David Mueller, Nicholas Andrews

Abstract: Modern neural networks do not always produce well-calibrated predictions, even when trained with a proper scoring function such as cross-entropy. In classification settings, simple methods such as isotonic regression or temperature scaling may be used in conjunction with a held-out dataset to calibrate model outputs. However, extending these methods to structured prediction is not always straightf… ▽ More Modern neural networks do not always produce well-calibrated predictions, even when trained with a proper scoring function such as cross-entropy. In classification settings, simple methods such as isotonic regression or temperature scaling may be used in conjunction with a held-out dataset to calibrate model outputs. However, extending these methods to structured prediction is not always straightforward or effective; furthermore, a held-out calibration set may not always be available. In this paper, we study ensemble distillation as a general framework for producing well-calibrated structured prediction models while avoiding the prohibitive inference-time cost of ensembles. We validate this framework on two tasks: named-entity recognition and machine translation. We find that, across both tasks, ensemble distillation produces models which retain much of, and occasionally improve upon, the performance and calibration benefits of ensembles, while only requiring a single model during test-time. △ Less

Submitted 25 March, 2021; v1 submitted 13 October, 2020; originally announced October 2020.

Comments: EMNLP 2020. v2: Changed formatting of title in metadata; no other changes

arXiv:2008.06780 [pdf, other]

Automated Detection of Cortical Lesions in Multiple Sclerosis Patients with 7T MRI

Authors: Francesco La Rosa, Erin S Beck, Ahmed Abdulkadir, Jean-Philippe Thiran, Daniel S Reich, Pascal Sati, Meritxell Bach Cuadra

Abstract: The automated detection of cortical lesions (CLs) in patients with multiple sclerosis (MS) is a challenging task that, despite its clinical relevance, has received very little attention. Accurate detection of the small and scarce lesions requires specialized sequences and high or ultra-high field MRI. For supervised training based on multimodal structural MRI at 7T, two experts generated ground tr… ▽ More The automated detection of cortical lesions (CLs) in patients with multiple sclerosis (MS) is a challenging task that, despite its clinical relevance, has received very little attention. Accurate detection of the small and scarce lesions requires specialized sequences and high or ultra-high field MRI. For supervised training based on multimodal structural MRI at 7T, two experts generated ground truth segmentation masks of 60 patients with 2014 CLs. We implemented a simplified 3D U-Net with three resolution levels (3D U-Net-). By increasing the complexity of the task (adding brain tissue segmentation), while randomly drop** input channels during training, we improved the performance compared to the baseline. Considering a minimum lesion size of 0.75 μL, we achieved a lesion-wise cortical lesion detection rate of 67% and a false positive rate of 42%. However, 393 (24%) of the lesions reported as false positives were post-hoc confirmed as potential or definite lesions by an expert. This indicates the potential of the proposed method to support experts in the tedious process of CL manual segmentation. △ Less

Submitted 15 August, 2020; originally announced August 2020.

Comments: Accepted to MICCAI 2020

arXiv:2007.07383 [pdf, ps, other]

doi 10.1016/j.physd.2021.132911

Supervised learning from noisy observations: Combining machine-learning techniques with data assimilation

Authors: Georg A. Gottwald, Sebastian Reich

Abstract: Data-driven prediction and physics-agnostic machine-learning methods have attracted increased interest in recent years achieving forecast horizons going well beyond those to be expected for chaotic dynamical systems. In a separate strand of research data-assimilation has been successfully used to optimally combine forecast models and their inherent uncertainty with incoming noisy observations. The… ▽ More Data-driven prediction and physics-agnostic machine-learning methods have attracted increased interest in recent years achieving forecast horizons going well beyond those to be expected for chaotic dynamical systems. In a separate strand of research data-assimilation has been successfully used to optimally combine forecast models and their inherent uncertainty with incoming noisy observations. The key idea in our work here is to achieve increased forecast capabilities by judiciously combining machine-learning algorithms and data assimilation. We combine the physics-agnostic data-driven approach of random feature maps as a forecast model within an ensemble Kalman filter data assimilation procedure. The machine-learning model is learned sequentially by incorporating incoming noisy observations. We show that the obtained forecast model has remarkably good forecast skill while being computationally cheap once trained. Going beyond the task of forecasting, we show that our method can be used to generate reliable ensembles for probabilistic forecasting as well as to learn effective model closure in multi-scale systems. △ Less

Submitted 8 March, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

arXiv:2006.02037 [pdf, other]

Spectral convergence of diffusion maps: improved error bounds and an alternative normalisation

Authors: Caroline L. Wormell, Sebastian Reich

Abstract: Diffusion maps is a manifold learning algorithm widely used for dimensionality reduction. Using a sample from a distribution, it approximates the eigenvalues and eigenfunctions of associated Laplace-Beltrami operators. Theoretical bounds on the approximation error are however generally much weaker than the rates that are seen in practice. This paper uses new approaches to improve the error bounds… ▽ More Diffusion maps is a manifold learning algorithm widely used for dimensionality reduction. Using a sample from a distribution, it approximates the eigenvalues and eigenfunctions of associated Laplace-Beltrami operators. Theoretical bounds on the approximation error are however generally much weaker than the rates that are seen in practice. This paper uses new approaches to improve the error bounds in the model case where the distribution is supported on a hypertorus. For the data sampling (variance) component of the error we make spatially localised compact embedding estimates on certain Hardy spaces; we study the deterministic (bias) component as a perturbation of the Laplace-Beltrami operator's associated PDE, and apply relevant spectral stability results. Using these approaches, we match long-standing pointwise error bounds for both the spectral data and the norm convergence of the operator discretisation. We also introduce an alternative normalisation for diffusion maps based on Sinkhorn weights. This normalisation approximates a Langevin diffusion on the sample and yields a symmetric operator approximation. We prove that it has better convergence compared with the standard normalisation on flat domains, and present a highly efficient algorithm to compute the Sinkhorn weights. △ Less

Submitted 7 April, 2021; v1 submitted 3 June, 2020; originally announced June 2020.

Comments: Electronic copy of the final peer-reviewed manuscript accepted for publication

MSC Class: 35P15; 60J60; 62M05; 65D99

arXiv:2006.00702 [pdf, ps, other]

doi 10.3390/e22080802

Interacting particle solutions of Fokker-Planck equations through gradient-log-density estimation

Authors: Dimitra Maoutsa, Sebastian Reich, Manfred Opper

Abstract: Fokker-Planck equations are extensively employed in various scientific fields as they characterise the behaviour of stochastic systems at the level of probability density functions. Although broadly used, they allow for analytical treatment only in limited settings, and often is inevitable to resort to numerical solutions. Here, we develop a computational approach for simulating the time evolution… ▽ More Fokker-Planck equations are extensively employed in various scientific fields as they characterise the behaviour of stochastic systems at the level of probability density functions. Although broadly used, they allow for analytical treatment only in limited settings, and often is inevitable to resort to numerical solutions. Here, we develop a computational approach for simulating the time evolution of Fokker-Planck solutions in terms of a mean field limit of an interacting particle system. The interactions between particles are determined by the gradient of the logarithm of the particle density, approximated here by a novel statistical estimator. The performance of our method shows promising results, with more accurate and less fluctuating statistics compared to direct stochastic simulations of comparable particle number. Taken together, our framework allows for effortless and reliable particle-based simulations of Fokker-Planck equations in low and moderate dimensions. The proposed gradient-log-density estimator is also of independent interest, for example, in the context of optimal control. △ Less

Submitted 1 June, 2020; originally announced June 2020.

Comments: 34 pages, 8 figures

MSC Class: 82C80; 37M05; 37H05; 60H35; 65C35; 65N75

arXiv:2005.12857 [pdf, other]

GP-ETAS: Semiparametric Bayesian inference for the spatio-temporal Epidemic Type Aftershock Sequence model

Authors: Christian Molkenthin, Christian Donner, Sebastian Reich, Gert Zöller, Sebastian Hainzl, Matthias Holschneider, Manfred Opper

Abstract: The spatio-temporal Epidemic Type Aftershock Sequence (ETAS) model is widely used to describe the self-exciting nature of earthquake occurrences. While traditional inference methods provide only point estimates of the model parameters, we aim at a full Bayesian treatment of model inference, allowing naturally to incorporate prior knowledge and uncertainty quantification of the resulting estimates.… ▽ More The spatio-temporal Epidemic Type Aftershock Sequence (ETAS) model is widely used to describe the self-exciting nature of earthquake occurrences. While traditional inference methods provide only point estimates of the model parameters, we aim at a full Bayesian treatment of model inference, allowing naturally to incorporate prior knowledge and uncertainty quantification of the resulting estimates. Therefore, we introduce a highly flexible, non-parametric representation for the spatially varying ETAS background intensity through a Gaussian process (GP) prior. Combined with classical triggering functions this results in a new model formulation, namely the GP-ETAS model. We enable tractable and efficient Gibbs sampling by deriving an augmented form of the GP-ETAS inference problem. This novel sampling approach allows us to assess the posterior model variables conditioned on observed earthquake catalogues, i.e., the spatial background intensity and the parameters of the triggering function. Empirical results on two synthetic data sets indicate that GP-ETAS outperforms standard models and thus demonstrate the predictive power for observed earthquake catalogues including uncertainty quantification for the estimated parameters. Finally, a case study for the l'Aquila region, Italy, with the devastating event on 6 April 2009, is presented. △ Less

Submitted 26 May, 2020; originally announced May 2020.

arXiv:2002.06753 [pdf, other]

Unraveling Meta-Learning: Understanding Feature Representations for Few-Shot Tasks

Authors: Micah Goldblum, Steven Reich, Liam Fowl, Renkun Ni, Valeriia Cherepanova, Tom Goldstein

Abstract: Meta-learning algorithms produce feature extractors which achieve state-of-the-art performance on few-shot classification. While the literature is rich with meta-learning methods, little is known about why the resulting feature extractors perform so well. We develop a better understanding of the underlying mechanics of meta-learning and the difference between models trained using meta-learning and… ▽ More Meta-learning algorithms produce feature extractors which achieve state-of-the-art performance on few-shot classification. While the literature is rich with meta-learning methods, little is known about why the resulting feature extractors perform so well. We develop a better understanding of the underlying mechanics of meta-learning and the difference between models trained using meta-learning and models which are trained classically. In doing so, we introduce and verify several hypotheses for why meta-learned models perform better. Furthermore, we develop a regularizer which boosts the performance of standard training routines for few-shot classification. In many cases, our routine outperforms meta-learning while simultaneously running an order of magnitude faster. △ Less

Submitted 1 July, 2020; v1 submitted 16 February, 2020; originally announced February 2020.

Comments: ICML 2020

arXiv:1911.07989 [pdf, other]

WITCHcraft: Efficient PGD attacks with random step size

Authors: **-Yeh Chiang, Jonas Gei**, Micah Goldblum, Tom Goldstein, Renkun Ni, Steven Reich, Ali Shafahi

Abstract: State-of-the-art adversarial attacks on neural networks use expensive iterative methods and numerous random restarts from different initial points. Iterative FGSM-based methods without restarts trade off performance for computational efficiency because they do not adequately explore the image space and are highly sensitive to the choice of step size. We propose a variant of Projected Gradient Desc… ▽ More State-of-the-art adversarial attacks on neural networks use expensive iterative methods and numerous random restarts from different initial points. Iterative FGSM-based methods without restarts trade off performance for computational efficiency because they do not adequately explore the image space and are highly sensitive to the choice of step size. We propose a variant of Projected Gradient Descent (PGD) that uses a random step size to improve performance without resorting to expensive random restarts. Our method, Wide Iterative Stochastic crafting (WITCHcraft), achieves results superior to the classical PGD attack on the CIFAR-10 and MNIST data sets but without additional computational cost. This simple modification of PGD makes crafting attacks more economical, which is important in situations like adversarial training where attacks need to be crafted in real time. △ Less

Submitted 18 November, 2019; originally announced November 2019.

Comments: Authors contributed equally and are listed in alphabetical order

arXiv:1901.06300 [pdf, ps, other]

Ensemble transform algorithms for nonlinear smoothing problems

Authors: Jana de Wiljes, Sahani Pathiraja, Sebastian Reich

Abstract: Several numerical tools designed to overcome the challenges of smoothing in a nonlinear and non-Gaussian setting are investigated for a class of particle smoothers. The considered family of smoothers is induced by the class of linear ensemble transform filters which contains classical filters such as the stochastic ensemble Kalman filter, the ensemble square root filter and the recently introduced… ▽ More Several numerical tools designed to overcome the challenges of smoothing in a nonlinear and non-Gaussian setting are investigated for a class of particle smoothers. The considered family of smoothers is induced by the class of linear ensemble transform filters which contains classical filters such as the stochastic ensemble Kalman filter, the ensemble square root filter and the recently introduced nonlinear ensemble transform filter. Further the ensemble transform particle smoother is introduced and particularly highlighted as it is consistent in the particle limit and does not require assumptions with respect to the family of the posterior distribution. The linear update pattern of the considered class of linear ensemble transform smoothers allows one to implement important supplementary techniques such as adaptive spread corrections, hybrid formulations, and localization in order to facilitate their application to complex estimation problems. These additional features are derived and numerically investigated for a sequence of increasingly challenging test problems. △ Less

Submitted 28 October, 2019; v1 submitted 18 January, 2019; originally announced January 2019.

MSC Class: 65C05; 62M20; 93E11; 62F15; 86A22

arXiv:1807.10434 [pdf, other]

Particle filters for high-dimensional geoscience applications: a review

Authors: Peter Jan van Leeuwen, Hans R. Künsch, Lars Nerger, Roland Potthast, Sebastian Reich

Abstract: Particle filters contain the promise of fully nonlinear data assimilation. They have been applied in numerous science areas, but their application to the geosciences has been limited due to their inefficiency in high-dimensional systems in standard settings. However, huge progress has been made, and this limitation is disappearing fast due to recent developments in proposal densities, the use of i… ▽ More Particle filters contain the promise of fully nonlinear data assimilation. They have been applied in numerous science areas, but their application to the geosciences has been limited due to their inefficiency in high-dimensional systems in standard settings. However, huge progress has been made, and this limitation is disappearing fast due to recent developments in proposal densities, the use of ideas from (optimal) transportation, the use of localisation and intelligent adaptive resampling strategies. Furthermore, powerful hybrids between particle filters and ensemble Kalman filters and variational methods have been developed. We present a state of the art discussion of present efforts of develo** particle filters for highly nonlinear geoscience state-estimation problems with an emphasis on atmospheric and oceanic applications, including many new ideas, derivations, and unifications, highlighting hidden connections, and generating a valuable tool and guide for the community. Initial experiments show that particle filters can be competitive with present-day methods for numerical weather prediction suggesting that they will become mainstream soon. △ Less

Submitted 13 April, 2019; v1 submitted 27 July, 2018; originally announced July 2018.

Comments: Review paper, 36 pages, 9 figures, Resubmitted to Q.J.Royal Meteorol. Soc

MSC Class: 62M86

arXiv:1803.00641 [pdf, ps, other]

doi 10.1080/02331934.2018.1543295

Re-examination of Bregman functions and new properties of their divergences

Authors: Daniel Reem, Simeon Reich, Alvaro De Pierro

Abstract: The Bregman divergence (Bregman distance, Bregman measure of distance) is a certain useful substitute for a distance, obtained from a well-chosen function (the "Bregman function"). Bregman functions and divergences have been extensively investigated during the last decades and have found applications in optimization, operations research, information theory, nonlinear analysis, machine learning and… ▽ More The Bregman divergence (Bregman distance, Bregman measure of distance) is a certain useful substitute for a distance, obtained from a well-chosen function (the "Bregman function"). Bregman functions and divergences have been extensively investigated during the last decades and have found applications in optimization, operations research, information theory, nonlinear analysis, machine learning and more. This paper re-examines various aspects related to the theory of Bregman functions and divergences. In particular, it presents many sufficient conditions which allow the construction of Bregman functions in a general setting and introduces new Bregman functions (such as a negative iterated log entropy). Moreover, it sheds new light on several known Bregman functions such as quadratic entropies, the negative Havrda-Charvát-Tsallis entropy, and the negative Boltzmann-Gibbs-Shannon entropy, and it shows that the negative Burg entropy, which is not a Bregman function according to the classical theory but nevertheless is known to have "Bregmanian properties", can, by our re-examination of the theory, be considered as a Bregman function. Our analysis yields several by-products of independent interest such as the introduction of the concept of relative uniform convexity (a certain generalization of uniform convexity), new properties of uniformly and strongly convex functions, and results in Banach space theory. △ Less

Submitted 8 April, 2019; v1 submitted 1 March, 2018; originally announced March 2018.

Comments: Correction of a few very minor inaccuracies; added journal details (volume, page numbers, etc.); some references were updated

MSC Class: 52A41; 52B55; 46N10; 90C25; 90C30; 46T99; 47N10; 49M37; 26B25; 58C05 ACM Class: G.1.6; H.1.1; J.2

Journal ref: Optimization 68 (2019), 279--348

arXiv:1606.07309 [pdf, other]

Likelihood-based Parameter Estimation and Comparison of Dynamical Cognitive Models

Authors: Heiko H. Schütt, Lars Rothkegel, Hans A. Trukenbrod, Sebastian Reich, Felix A. Wichmann, Ralf Engbert

Abstract: Dynamical models of cognition play an increasingly important role in driving theoretical and experimental research in psychology. Therefore, parameter estimation, model analysis and comparison of dynamical models are of essential importance. Here we propose a maximum-likelihood approach for model analysis in a fully dynamical framework that includes time-ordered experimental data. Our methods can… ▽ More Dynamical models of cognition play an increasingly important role in driving theoretical and experimental research in psychology. Therefore, parameter estimation, model analysis and comparison of dynamical models are of essential importance. Here we propose a maximum-likelihood approach for model analysis in a fully dynamical framework that includes time-ordered experimental data. Our methods can be applied to dynamical models for the prediction of discrete behavior (e.g., movement onsets), in particular, we use a dynamical model of saccade generation in scene viewing as a case study for our approach. For this model, the likelihood function can be computed directly by numerical simulation, which enables more efficient parameter estimation including Bayesian inference to obtain reliable estimates and corresponding credible intervals. Using hierarchical models inference is even possible for individual observers. Furthermore, our likelihood approach can be used to compare different models. In our example, the dynamical framework is shown to outperform non-dynamical statistical models. Additionally, the likelihood based evaluation differentiates model variants, which produced indistinguishable predictions on hitherto used statistics. Our results indicate that the likelihood approach is a promising framework for dynamical cognitive models. △ Less

Submitted 5 April, 2017; v1 submitted 23 June, 2016; originally announced June 2016.

Comments: 29 pages, 10 figures, to appear in Psychological Review as a theoretical note

arXiv:1509.08359 [pdf, other]

Relating multi-sequence longitudinal intensity profiles and clinical covariates in new multiple sclerosis lesions

Authors: Elizabeth M. Sweeney, Russell T. Shinohara, Blake E. Dewey, Matthew K. Schindler, John Muschelli, Daniel S. Reich, Ciprian M. Crainiceanu, Ani Eloyan

Abstract: Structural magnetic resonance imaging (MRI) can be used to detect lesions in the brains of multiple sclerosis (MS) patients. The formation of these lesions is a complex process involving inflammation, tissue damage, and tissue repair, all of which are visible on MRI. Here we characterize the lesion formation process on longitudinal, multi-sequence structural MRI from 34 MS patients and relate the… ▽ More Structural magnetic resonance imaging (MRI) can be used to detect lesions in the brains of multiple sclerosis (MS) patients. The formation of these lesions is a complex process involving inflammation, tissue damage, and tissue repair, all of which are visible on MRI. Here we characterize the lesion formation process on longitudinal, multi-sequence structural MRI from 34 MS patients and relate the longitudinal changes we observe within lesions to therapeutic interventions. In this article, we first outline a pipeline to extract voxel level, multi-sequence longitudinal profiles from four MRI sequences within lesion tissue. We then propose two models to relate clinical covariates to the longitudinal profiles. The first model is a principal component analysis (PCA) regression model, which collapses the information from all four profiles into a scalar value. We find that the score on the first PC identifies areas of slow, long-term intensity changes within the lesion at a voxel level, as validated by two experienced clinicians, a neuroradiologist and a neurologist. On a quality scale of 1 to 4 (4 being the highest) the neuroradiologist gave the score on the first PC a median rating of 4 (95% CI: [4,4]), and the neurologist gave it a median rating of 3 (95% CI: [3,3]). In the PCA regression model, we find that treatment with disease modifying therapies (p-value < 0.01), steroids (p-value < 0.01), and being closer to the boundary of abnormal signal intensity (p-value < 0.01) are associated with a return of a voxel to intensity values closer to that of normal-appearing tissue. The second model is a function-on-scalar regression, which allows for assessment of the individual time points at which the covariates are associated with the profiles. In the function-on-scalar regression both age and distance to the boundary were found to have a statistically significant association with the profiles. △ Less

Submitted 28 September, 2015; originally announced September 2015.

arXiv:1501.04420 [pdf, ps, other]

doi 10.1214/14-AOAS748

Longitudinal high-dimensional principal components analysis with application to diffusion tensor imaging of multiple sclerosis

Authors: Vadim Zipunnikov, Sonja Greven, Haochang Shou, Brian S. Caffo, Daniel S. Reich, Ciprian M. Crainiceanu

Abstract: We develop a flexible framework for modeling high-dimensional imaging data observed longitudinally. The approach decomposes the observed variability of repeatedly measured high-dimensional observations into three additive components: a subject-specific imaging random intercept that quantifies the cross-sectional variability, a subject-specific imaging slope that quantifies the dynamic irreversible… ▽ More We develop a flexible framework for modeling high-dimensional imaging data observed longitudinally. The approach decomposes the observed variability of repeatedly measured high-dimensional observations into three additive components: a subject-specific imaging random intercept that quantifies the cross-sectional variability, a subject-specific imaging slope that quantifies the dynamic irreversible deformation over multiple realizations, and a subject-visit-specific imaging deviation that quantifies exchangeable effects between visits. The proposed method is very fast, scalable to studies including ultrahigh-dimensional data, and can easily be adapted to and executed on modest computing infrastructures. The method is applied to the longitudinal analysis of diffusion tensor imaging (DTI) data of the corpus callosum of multiple sclerosis (MS) subjects. The study includes $176$ subjects observed at $466$ visits. For each subject and visit the study contains a registered DTI scan of the corpus callosum at roughly 30,000 voxels. △ Less

Submitted 19 January, 2015; originally announced January 2015.

Comments: Published in at http://dx.doi.org/10.1214/14-AOAS748 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS748

Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 4, 2175-2202

arXiv:1306.5524 [pdf, other]

Soft Null Hypotheses: A Case Study of Image Enhancement Detection in Brain Lesions

Authors: Haochang Shou, Russell T. Shinohara, Han Liu, Daniel S. Reich, Ciprian M. Crainiceanu

Abstract: This work is motivated by a study of a population of multiple sclerosis (MS) patients using dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) to identify active brain lesions. At each visit, a contrast agent is administered intravenously to a subject and a series of images is acquired to reveal the location and activity of MS lesions within the brain. Our goal is to identify and quant… ▽ More This work is motivated by a study of a population of multiple sclerosis (MS) patients using dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) to identify active brain lesions. At each visit, a contrast agent is administered intravenously to a subject and a series of images is acquired to reveal the location and activity of MS lesions within the brain. Our goal is to identify and quantify lesion enhancement location at the subject level and lesion enhancement patterns at the population level. With this example, we aim to address the difficult problem of transforming a qualitative scientific null hypothesis, such as "this voxel does not enhance", to a well-defined and numerically testable null hypothesis based on existing data. We call the procedure "soft null hypothesis" testing as opposed to the standard "hard null hypothesis" testing. This problem is fundamentally different from: 1) testing when a quantitative null hypothesis is given; 2) clustering using a mixture distribution; or 3) identifying a reasonable threshold with a parametric null assumption. We analyze a total of 20 subjects scanned at 63 visits (~30Gb), the largest population of such clinical brain images. △ Less

Submitted 24 June, 2013; originally announced June 2013.

Showing 1–23 of 23 results for author: Reich, S