Spatially Penalised Registration of Multivariate Functional Data

Authors: Xiaohan Guo, Sebastian Kurtek, Karthik Bharath

Abstract: Registration of multivariate functional data involves handling of both cross-component and cross-observation phase variations. Allowing for the two phase variations to be modelled as general diffeomorphic time warpings, in this work we focus on the hitherto unconsidered setting where phase variations of the component functions are spatially correlated. We propose an algorithm to optimize a metric-based objective function for registration with a novel penalty term that incorporates the spatial correlation between the component phase variations through a kriging estimate of an appropriate phase random field. The penalty term encourages the overall phase at a particular location to be similar to the spatially weighted average phase in its neighbourhood, and thus engenders a regularization that prevents over-alignment. Utility of the registration method, and its superior performance compared to methods that fail to account for the spatial correlation, is demonstrated through performance on simulated examples and two multivariate functional datasets pertaining to EEG signals and ozone concentrations. The generality of the framework opens up the possibility for extension to settings involving different forms of correlation between the component functions and their phases.

Submitted 22 July, 2022; originally announced July 2022.

arXiv:2203.12005 [pdf, other]

Sequential Bayesian Registration for Functional Data

Authors: Yoonji Kim, Oksana A. Chkrebtii, Sebastian A. Kurtek

Abstract: In many modern applications, discretely-observed data may be naturally understood as a set of functions. Functional data often exhibit two confounded sources of variability: amplitude (y-axis) and phase (x-axis). The extraction of amplitude and phase, a process known as registration, is essential in exploring the underlying structure of functional data in a variety of areas, from environmental monitoring to medical imaging. Critically, such data are often gathered sequentially with new functional observations arriving over time. Despite this, most available registration procedures are only applicable to batch learning, leading to inefficient computation. To address these challenges, we introduce a Bayesian framework for sequential registration of functional data, which updates statistical inference as new sets of functions are assimilated. This Bayesian model-based sequential learning approach utilizes sequential Monte Carlo sampling to recursively update the alignment of observed functions while accounting for associated uncertainty. As a result, distributed computing, which is not generally an option in batch learning, significantly reduces computational cost. Simulation studies and comparisons to existing batch learning methods reveal that the proposed approach performs well even when the target posterior distribution has a challenging structure.
We apply the proposed method to three real datasets: (1) functions of annual drought intensity near Kaweah River in California, (2) annual sea surface salinity functions near Null Island, and (3) PQRST complexes segmented from an electrocardiogram signal.

Submitted 22 March, 2022; originally announced March 2022.

Comments: 36 pages, 10 figures, 1 table

arXiv:2106.15436 [pdf, other]

Topo-Geometric Analysis of Variability in Point Clouds using Persistence Landscapes

Authors: James Matuk, Sebastian Kurtek, Karthik Bharath

Abstract: Topological data analysis provides a set of tools to uncover low-dimensional structure in noisy point clouds. Prominent amongst the tools is persistence homology, which summarizes birth-death times of homological features using data objects known as persistence diagrams. To better aid statistical analysis, a functional representation of the diagrams, known as persistence landscapes, enables use of functional data analysis and machine learning tools. Topological and geometric variabilities inherent in point clouds are confounded in both persistence diagrams and landscapes, and it is important to distinguish topological signal from noise to draw reliable conclusions on the structure of the point clouds when using persistence homology. We develop a framework for decomposing variability in persistence diagrams into topological signal and topological noise through alignment of persistence landscapes using an elastic Riemannian metric. Aligned landscapes (amplitude) isolate the topological signal. Reparameterizations used for landscape alignment (phase) are linked to a resolution parameter used to generate persistence diagrams, and capture topological noise in the form of geometric, global scaling and sampling variabilities. We illustrate the importance of decoupling topological signal and topological noise in persistence diagrams (landscapes) using several simulated examples. We also demonstrate that our approach provides novel insights in two real data studies.

Submitted 1 February, 2024; v1 submitted 29 June, 2021; originally announced June 2021.

arXiv:2106.10941 [pdf, other]

Tumor Radiogenomics with Bayesian Layered Variable Selection

Authors: Shariq Mohammed, Sebastian Kurtek, Karthik Bharath, Arvind Rao, Veerabhadran Baladandayuthapani

Abstract: We propose a statistical framework to integrate radiological magnetic resonance imaging (MRI) and genomic data to identify the underlying radiogenomic associations in lower grade gliomas (LGG). We devise a novel imaging phenotype by dividing the tumor region into concentric spherical layers that mimic the tumor evolution process. MRI data within each layer is represented by voxel-intensity-based probability density functions which capture the complete information about tumor heterogeneity. Under a Riemannian-geometric framework these densities are mapped to a vector of principal component scores which act as imaging phenotypes. Subsequently, we build Bayesian variable selection models for each layer with the imaging phenotypes as the response and the genomic markers as predictors. Our novel hierarchical prior formulation incorporates the interior-to-exterior structure of the layers, and the correlation between the genomic markers. We employ a computationally-efficient Expectation-Maximization-based strategy for estimation. Simulation studies demonstrate the superior performance of our approach compared to other approaches. With a focus on the cancer driver genes in LGG, we discuss some biologically relevant findings. Genes implicated with survival and oncogenesis are identified as being associated with the spherical layers, which could potentially serve as early-stage diagnostic markers for disease monitoring, prior to routine invasive approaches.

Submitted 21 June, 2021; originally announced June 2021.

arXiv:2104.00510 [pdf, other]

RADIOHEAD: Radiogenomic Analysis Incorporating Tumor Heterogeneity in Imaging Through Densities

Authors: Shariq Mohammed, Karthik Bharath, Sebastian Kurtek, Arvind Rao, Veerabhadran Baladandayuthapani

Abstract: Recent technological advancements have enabled detailed investigation of associations between the molecular architecture and tumor heterogeneity, through multi-source integration of radiological imaging and genomic (radiogenomic) data. In this paper, we integrate and harness radiogenomic data in patients with lower grade gliomas (LGG), a type of brain cancer, in order to develop a regression framework called RADIOHEAD (RADIOgenomic analysis incorporating tumor HEterogeneity in imAging through Densities) to identify radiogenomic associations. Imaging data is represented through voxel intensity probability density functions of tumor sub-regions obtained from multimodal magnetic resonance imaging, and genomic data through molecular signatures in the form of pathway enrichment scores corresponding to their gene expression profiles. Employing a Riemannian-geometric framework for principal component analysis on the set of probability density functions, we map each probability density to a vector of principal component scores, which are then included as predictors in a Bayesian regression model with the pathway enrichment scores as the response. Variable selection compatible with the grouping structure amongst the predictors induced through the tumor sub-regions is carried out under a group spike-and-slab prior. A Bayesian false discovery rate mechanism is then used to infer significant associations based on the posterior distribution of the regression coefficients.
Our analyses reveal several pathways relevant to LGG etiology (such as synaptic transmission, nerve impulse and neurotransmitter pathways) to have significant associations with the corresponding imaging-based predictors.

Submitted 7 April, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

arXiv:2103.01097 [pdf, other]

Tangent functional canonical correlation analysis for densities and shapes, with applications to multimodal imaging data

Authors: Min Ho Cho, Sebastian Kurtek, Karthik Bharath

Abstract: It is quite common for functional data arising from imaging data to assume values in infinite-dimensional manifolds. Uncovering associations between two or more such nonlinear functional data extracted from the same object across medical imaging modalities can assist development of personalized treatment strategies. We propose a method for canonical correlation analysis between paired probability densities or shapes of closed planar curves, routinely used in biomedical studies, which combines a convenient linearization and dimension reduction of the data using tangent space coordinates. Leveraging the fact that the corresponding manifolds are submanifolds of unit Hilbert spheres, we describe how finite-dimensional representations of the functional data objects can be easily computed, which then facilitates use of standard multivariate canonical correlation analysis methods. We further construct and visualize canonical variate directions directly on the space of densities or shapes. Utility of the method is demonstrated through numerical simulations and performance on a magnetic resonance imaging dataset of Glioblastoma Multiforme brain tumors.

Submitted 24 September, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

arXiv:2011.12397 [pdf, ps, other]

Elastic $k$-means clustering of functional data for posterior exploration, with an application to inference on acute respiratory infection dynamics

Authors: Xiao Zang, Sebastian Kurtek, Oksana Chkrebtii, J. Derek Tucker

Abstract: We propose a new method for clustering of functional data using a $k$-means framework. We work within the elastic functional data analysis framework, which allows for decomposition of the overall variation in functional data into amplitude and phase components. We use the amplitude component to partition functions into shape clusters using an automated approach. To select an appropriate number of clusters, we additionally propose a novel Bayesian Information Criterion defined using a mixture model on principal components estimated using functional Principal Component Analysis. The proposed method is motivated by the problem of posterior exploration, wherein samples obtained from Markov chain Monte Carlo algorithms are naturally represented as functions. We evaluate our approach using a simulated dataset, and apply it to a study of acute respiratory infection dynamics in San Luis Potosí, Mexico.

Submitted 24 November, 2020; originally announced November 2020.

arXiv:2010.09578 [pdf, other]

Variograms for spatial functional data with phase variation

Authors: Xiaohan Guo, Sebastian Kurtek, Karthik Bharath

Abstract: Spatial, amplitude and phase variations in spatial functional data are confounded. Conclusions from the popular functional trace variogram, which quantifies spatial variation, can be misleading when analysing misaligned functional data with phase variation. To remedy this, we describe a framework that extends amplitude-phase separation methods in functional data to the spatial setting, with a view towards performing clustering and spatial prediction. We propose a decomposition of the trace variogram into amplitude and phase components and quantify how spatial correlations between functional observations manifest in their respective amplitude and phase components. This enables us to generate separate amplitude and phase clustering methods for spatial functional data, and develop a novel spatial functional interpolant at unobserved locations based on combining separate amplitude and phase predictions. Through simulations and real data analyses, we found that the proposed methods result in more accurate predictions and more interpretable clustering results.

Submitted 19 October, 2020; originally announced October 2020.

arXiv:2004.11128 [pdf, other]

The Weighted Euler Curve Transform for Shape and Image Analysis

Authors: Qitong Jiang, Sebastian Kurtek, Tom Needham

Abstract: The Euler Curve Transform (ECT) of Turner et al. is a complete invariant of an embedded simplicial complex, which is amenable to statistical analysis. We generalize the ECT to provide a similarly convenient representation for weighted simplicial complexes, objects which arise naturally, for example, in certain medical imaging applications. We leverage work of Ghrist et al. on Euler integral calculus to prove that this invariant, dubbed the Weighted Euler Curve Transform (WECT), is also complete. We explain how to transform a segmented region of interest in a grayscale image into a weighted simplicial complex and then into a WECT representation. This WECT representation is applied to study Glioblastoma Multiforme brain tumor shape and texture data. We show that the WECT representation is effective at clustering tumors based on qualitative shape and texture features and that this clustering correlates with patient survival time.

Submitted 23 April, 2020; originally announced April 2020.

Comments: To appear in CVPR conference workshop proceedings for DIFF-CVML 2020

arXiv:1912.05125 [pdf, other]

Bayesian Framework for Simultaneous Registration and Estimation of Noisy, Sparse and Fragmented Functional Data

Authors: James Matuk, Karthik Bharath, Oksana Chkrebtii, Sebastian Kurtek

Abstract: In many applications, smooth processes generate data that is recorded under a variety of observation regimes, such as dense, sparse or fragmented observations that are often contaminated with error. The statistical goal of registering and estimating the individual underlying functions from discrete observations has thus far been mainly approached sequentially without formal uncertainty propagation, or in an application-specific manner. We propose a unified Bayesian framework for simultaneous registration and estimation, which is flexible enough to accommodate inference on individual functions under general observation regimes. Our ability to do this relies on the specification of strongly informative prior models over the amplitude component of function variability. We provide two strategies for this critical choice: a data-driven approach that defines an empirical basis for the amplitude subspace based on training data, and a shape-restricted approach when the relative location and number of local extrema is well-understood. The proposed methods build on elastic functional data analysis, which separately models amplitude and phase variability inherent in functional data. We emphasize the importance of uncertainty quantification and visualization of these two components as they provide complementary information about the estimated functions. We validate the framework using simulations and real applications to medical imaging and biometrics.

Submitted 11 December, 2019; originally announced December 2019.

arXiv:1911.02249 [pdf, other]

doi 10.1080/00401706.2021.1883481

Estimation of Spatial Deformation for Nonstationary Processes via Variogram Alignment

Authors: Ghulam A. Qadir, Ying Sun, Sebastian Kurtek

Abstract: In modeling spatial processes, a second-order stationarity assumption is often made. However, for spatial data observed on a vast domain, the covariance function often varies over space, leading to a heterogeneous spatial dependence structure, therefore requiring nonstationary modeling. Spatial deformation is one of the main methods for modeling nonstationary processes, assuming the nonstationary process has a stationary counterpart in the deformed space. The estimation of the deformation function poses severe challenges. Here, we introduce a novel approach for nonstationary geostatistical modeling, using space deformation, when a single realization of the spatial process is observed. Our method is based, at a fundamental level, on aligning regional variograms, where warping variability of the distance from each subregion explains the spatial nonstationarity. We propose to use multi-dimensional scaling to map the warped distances to spatial locations. We assess the performance of our new method using multiple simulation studies. Additionally, we illustrate our methodology on precipitation data to estimate the heterogeneous spatial dependence and to perform spatial predictions.

Submitted 7 November, 2019; v1 submitted 6 November, 2019; originally announced November 2019.

MSC Class: 62H11; 62M30

Journal ref: Technometrics 2021

arXiv:1901.07593 [pdf, other]

Aggregated Pairwise Classification of Statistical Shapes

Authors: Min Ho Cho, Sebastian Kurtek, Steven N. MacEachern

Abstract: The classification of shapes is of great interest in diverse areas ranging from medical imaging to computer vision and beyond. While many statistical frameworks have been developed for the classification problem, most are strongly tied to early formulations of the problem - with an object to be classified described as a vector in a relatively low-dimensional Euclidean space. Statistical shape data have two main properties that suggest a need for a novel approach: (i) shapes are inherently infinite dimensional with strong dependence among the positions of nearby points, and (ii) shape space is not Euclidean, but is fundamentally curved. To accommodate these features of the data, we work with the square-root velocity function of the curves to provide a useful formal description of the shape, pass to tangent spaces of the manifold of shapes at different projection points which effectively separate shapes for pairwise classification in the training data, and use principal components within these tangent spaces to reduce dimensionality. We illustrate the impact of the projection point and choice of subspace on the misclassification rate with a novel method of combining pairwise classifiers.

Submitted 22 January, 2019; originally announced January 2019.

arXiv:1810.03671 [pdf, other]

doi 10.1007/s13171-018-0145-7

Geometric Sensitivity Measures for Bayesian Nonparametric Density Estimation Models

Authors: Abhijoy Saha, Sebastian Kurtek

Abstract: We propose a geometric framework to assess global sensitivity in Bayesian nonparametric models for density estimation. We study sensitivity of nonparametric Bayesian models for density estimation, based on Dirichlet-type priors, to perturbations of either the precision parameter or the base probability measure. To quantify the different effects of the perturbations of the parameters and hyperparameters in these models on the posterior, we define three geometrically-motivated global sensitivity measures based on geodesic paths and distances computed under the nonparametric Fisher-Rao Riemannian metric on the space of densities, applied to posterior samples of densities: (1) the Fisher-Rao distance between density averages of posterior samples, (2) the log-ratio of Karcher variances of posterior samples, and (3) the norm of the difference of scaled cumulative eigenvalues of empirical covariance operators obtained from posterior samples. We validate our approach using multiple simulation studies, and consider the problem of sensitivity analysis for Bayesian density estimation models in the context of three real datasets that have previously been studied.

Submitted 8 October, 2018; originally announced October 2018.

Comments: Accepted for publication in Sankhya A

arXiv:1805.11401 [pdf, other]

doi 10.1080/02664763.2019.1645818

A Geometric Approach for Computing Tolerance Bounds for Elastic Functional Data

Authors: J. Derek Tucker, John R. Lewis, Caleb King, Sebastian Kurtek

Abstract: We develop a method for constructing tolerance bounds for functional data with random warping variability. In particular, we define a generative, probabilistic model for the amplitude and phase components of such observations, which parsimoniously characterizes variability in the baseline data. Based on the proposed model, we define two different types of tolerance bounds that are able to measure both types of variability, and as a result, identify when the data has gone beyond the bounds of amplitude and/or phase. The first functional tolerance bounds are computed via a bootstrap procedure on the geometric space of amplitude and phase functions. The second functional tolerance bounds utilize functional Principal Component Analysis to construct a tolerance factor. This work is motivated by two main applications: process control and disease monitoring. The problem of statistical analysis and modeling of functional data in process control is important in determining when a production has moved beyond a baseline. Similarly, in biomedical applications, doctors use long, approximately periodic signals (such as the electrocardiogram) to diagnose and monitor diseases. In this context, it is desirable to identify abnormalities in these signals. We additionally consider a simulated example to assess our approach and compare it to two existing methods.

Submitted 25 April, 2019; v1 submitted 29 May, 2018; originally announced May 2018.

Comments: 25 pages

MSC Class: 62F25

arXiv:1710.05008 [pdf, other]

Automatic Detection and Uncertainty Quantification of Landmarks on Elastic Curves

Authors: Justin Strait, Oksana Chkrebtii, Sebastian Kurtek

Abstract: A population quantity of interest in statistical shape analysis is the location of landmarks, which are points that aid in reconstructing and representing shapes of objects. We provide an automated, model-based approach to inferring landmarks given a sample of shape data. The model is formulated based on a linear reconstruction of the shape, passing through the specified points, and a Bayesian inferential approach is described for estimating unknown landmark locations. The question of how many landmarks to select is addressed in two different ways: (1) by defining a criterion-based approach, and (2) joint estimation of the number of landmarks along with their locations. Efficient methods for posterior sampling are also discussed. We motivate our approach using several simulated examples, as well as data obtained from applications in computer vision and biology; additionally, we explore placements and associated uncertainty in landmarks for various substructures extracted from magnetic resonance image slices.

Submitted 13 October, 2017; originally announced October 2017.

arXiv:1708.04891 [pdf, other]

Distribution on Warp Maps for Alignment of Open and Closed Curves

Authors: Karthik Bharath, Sebastian Kurtek

Abstract: Alignment of curve data is an integral part of their statistical analysis, and can be achieved using model- or optimization-based approaches. The parameter space is usually the set of monotone, continuous warp maps of a domain. The infinite-dimensional nature of the parameter space encourages sampling-based approaches, which require a distribution on the set of warp maps. Moreover, the distribution should also enable sampling in the presence of important landmark information on the curves which constrains the warp maps. For alignment of closed and open curves in $\mathbb{R}^d, d=1,2,3$, possibly with landmark information, we provide a constructive, point-process-based definition of a distribution on the set of warp maps of $[0,1]$ and the unit circle $\mathbb{S}^1$ that is (1) simple to sample from, and (2) possesses the desiderata for decomposition of the alignment problem with landmark constraints into multiple unconstrained ones. For warp maps on $[0,1]$, the distribution is related to the Dirichlet process. We demonstrate its utility by using it as a prior distribution on warp maps in a Bayesian model for alignment of two univariate curves, and as a proposal distribution in a stochastic algorithm that optimizes a suitable alignment functional for higher-dimensional curves. Several examples from simulated and real datasets are provided.

Submitted 4 February, 2019; v1 submitted 16 August, 2017; originally announced August 2017.

arXiv:1707.09714 [pdf, other]

A Geometric Variational Approach to Bayesian Inference

Authors: Abhijoy Saha, Karthik Bharath, Sebastian Kurtek

Abstract: We propose a novel Riemannian geometric framework for variational inference in Bayesian models based on the nonparametric Fisher-Rao metric on the manifold of probability density functions. Under the square-root density representation, the manifold can be identified with the positive orthant of the unit hypersphere in L2, and the Fisher-Rao metric reduces to the standard L2 metric. Exploiting such a Riemannian structure, we formulate the task of approximating the posterior distribution as a variational problem on the hypersphere based on the alpha-divergence. This provides a tighter lower bound on the marginal distribution when compared to, and a corresponding upper bound unavailable with, approaches based on the Kullback-Leibler divergence. We propose a novel gradient-based algorithm for the variational problem based on Frechet derivative operators motivated by the geometry of the Hilbert sphere, and examine its properties. Through simulations and real-data applications, we demonstrate the utility of the proposed geometric framework and algorithm on several Bayesian models.

Submitted 27 March, 2019; v1 submitted 30 July, 2017; originally announced July 2017.

arXiv:1702.01191 [pdf, other]

Radiologic Image-based Statistical Shape Analysis of Brain Tumors

Authors: Karthik Bharath, Sebastian Kurtek, Arvind Rao, Veerabhadran Baladandayuthapani

Abstract: We propose a curve-based Riemannian-geometric approach for general shape-based statistical analyses of tumors obtained from radiologic images. A key component of the framework is a suitable metric that (1) enables comparisons of tumor shapes, (2) provides tools for computing descriptive statistics and implementing principal component analysis on the space of tumor shapes, and (3) allows for a rich class of continuous deformations of a tumor shape. The utility of the framework is illustrated through specific statistical tasks on a dataset of radiologic images of patients diagnosed with glioblastoma multiforme, a malignant brain tumor with poor prognosis. In particular, our analysis discovers two patient clusters with very different survival, subtype and genomic characteristics. Furthermore, it is demonstrated that adding tumor shape information into survival models containing clinical and genomic variables results in a significant increase in predictive power.

Submitted 3 February, 2017; originally announced February 2017.

arXiv:1702.01183 [pdf, other]

doi 10.1080/01621459.2016.1256813

A Geometric Approach to Visualization of Variability in Functional Data

Authors: Weiyi Xie, Sebastian Kurtek, Karthik Bharath, Ying Sun

Abstract: We propose a new method for the construction and visualization of boxplot-type displays for functional data. We use a recent functional data analysis framework, based on a representation of functions called square-root slope functions, to decompose observed variation in functional data into three main components: amplitude, phase, and vertical translation. We then construct separate displays for each component, using the geometry and metric of each representation space, based on a novel definition of the median, the two quartiles, and extreme observations. The outlyingness of functional data is a very complex concept. Thus, we propose to identify outliers based on any of the three main components after decomposition. We provide a variety of visualization tools for the proposed boxplot-type displays including surface plots. We evaluate the proposed method using extensive simulations and then focus our attention on three real data applications including exploratory data analysis of sea surface temperature functions, electrocardiogram functions and growth curves.

Submitted 3 February, 2017; originally announced February 2017.

Comments: Journal of the American Statistical Association, 2016

arXiv:1505.06954 [pdf, other]

A Geometric Approach to Pairwise Bayesian Alignment of Functional Data Using Importance Sampling

Authors: Sebastian Kurtek

Abstract: We present a Bayesian model for pairwise nonlinear registration of functional data. We use the Riemannian geometry of the space of warping functions to define appropriate prior distributions and sample from the posterior using importance sampling. A simple square-root transformation is used to simplify the geometry of the space of warping functions, which allows for computation of sample statistics, such as the mean and median, and a fast implementation of a $k$-means clustering algorithm. These tools allow for efficient posterior inference, where multiple modes of the posterior distribution corresponding to multiple plausible alignments of the given functions are found. We also show pointwise $95\%$ credible intervals to assess the uncertainty of the alignment in different clusters. We validate this model using simulations and present multiple examples on real data from different application domains including biometrics and medicine.

Submitted 3 February, 2017; v1 submitted 26 May, 2015; originally announced May 2015.

arXiv:1405.0803 [pdf, ps, other]

doi 10.1214/13-AOAS701

Statistical analysis of trajectories on Riemannian manifolds: Bird migration, hurricane tracking and video surveillance

Authors: Jingyong Su, Sebastian Kurtek, Eric Klassen, Anuj Srivastava

Abstract: We consider the statistical analysis of trajectories on Riemannian manifolds that are observed under arbitrary temporal evolutions. Past methods rely on cross-sectional analysis, with the given temporal registration, and consequently may lose the mean structure and artificially inflate observed variances. We introduce a quantity that provides both a cost function for temporal registration and a proper distance for comparison of trajectories. This distance is used to define statistical summaries, such as sample means and covariances, of synchronized trajectories and "Gaussian-type" models to capture their variability at discrete times. It is invariant to identical time-warpings (or temporal reparameterizations) of trajectories. This is based on a novel mathematical representation of trajectories, termed transported square-root vector field (TSRVF), and the $\mathbb{L}^2$ norm on the space of TSRVFs. We illustrate this framework using three representative manifolds---$\mathbb{S}^2$, $\mathrm {SE}(2)$ and shape space of planar contours---involving both simulated and real data. In particular, we demonstrate: (1) improvements in mean structures and significant reductions in cross-sectional variances using real data sets, (2) statistical modeling for capturing variability in aligned trajectories, and (3) evaluating random trajectories under these models. Experimental results concern bird migration, hurricane tracking and video surveillance.

Submitted 5 May, 2014; originally announced May 2014.

Comments: Published at http://dx.doi.org/10.1214/13-AOAS701 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS701

Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 1, 530-552

arXiv:1403.5150 [pdf, other]

Bayes Sensitivity with Fisher-Rao Metric

Authors: Sebastian Kurtek, Karthik Bharath

Abstract: We propose a geometric framework to assess sensitivity of Bayesian procedures to modeling assumptions based on the nonparametric Fisher-Rao metric. While the framework is general in spirit, the focus of this article is restricted to metric-based diagnosis under two settings: assessing local and global robustness in Bayesian procedures to perturbations of the likelihood and prior, and identification of influential observations. The approach is based on the square-root representation of densities which enables one to compute geodesics and geodesic distances in analytical form, facilitating the definition of naturally calibrated local and global discrepancy measures. An important feature of our approach is the definition of a geometric $\epsilon$-contamination class of sampling distributions and priors via intrinsic analysis on the space of probability density functions. We showcase the applicability of our framework on several simulated toy datasets as well as in real data settings for generalized mixed effects models, directional data and shape data.
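
Illustrative note (not part of the arXiv listing): the analytical geodesic distance mentioned in the abstract can be sketched as follows. The sketch assumes the standard closed form under the square-root representation, where the Fisher-Rao distance between two densities is the arc length $\arccos\langle\sqrt{p_1},\sqrt{p_2}\rangle$ on the unit sphere. The function names, the grid, and the Gaussian example are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule for samples y on a (possibly non-uniform) grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def fisher_rao_distance(p1, p2, x):
    """Arc-length distance between two densities sampled on the grid x.

    Assumes p1 and p2 are non-negative and integrate to one on x, so that
    their square roots lie on the unit sphere in L2.
    """
    inner = trapezoid(np.sqrt(p1) * np.sqrt(p2), x)
    # Clip to guard against round-off pushing the inner product outside [-1, 1].
    return float(np.arccos(np.clip(inner, -1.0, 1.0)))

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution (illustrative input)."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

x = np.linspace(-12.0, 12.0, 4001)
d_same = fisher_rao_distance(normal_pdf(x, 0.0, 1.0), normal_pdf(x, 0.0, 1.0), x)
d_shift = fisher_rao_distance(normal_pdf(x, 0.0, 1.0), normal_pdf(x, 1.0, 1.0), x)
```

Because densities have non-negative square roots, the distance computed this way is bounded above by $\pi/2$, which is one reason such measures are described as naturally calibrated.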

Submitted 25 April, 2014; v1 submitted 20 March, 2014; originally announced March 2014.

arXiv:1103.3817 [pdf, ps, other]

Registration of Functional Data Using Fisher-Rao Metric

Authors: Anuj Srivastava, Wei Wu, Sebastian Kurtek, Eric Klassen, J. S. Marron

Abstract: We introduce a novel geometric framework for separating the phase and the amplitude variability in functional data of the type frequently studied in growth curve analysis. This framework uses the Fisher-Rao Riemannian metric to derive a proper distance on the quotient space of functions modulo the time-warping group. A convenient square-root velocity function (SRVF) representation transforms the Fisher-Rao metric into the standard $\mathbb{L}^2$ metric, simplifying the computations. This distance is then used to define a Karcher mean template and warp the individual functions to align them with the Karcher mean template. The strength of this framework is demonstrated by deriving a consistent estimator of a signal observed under random warping, scaling, and vertical translation. These ideas are demonstrated using both simulated and real data from different application domains: the Berkeley growth study, handwritten signature curves, neuroscience spike trains, and gene expression signals. The proposed method is empirically shown to be superior in performance to several recently published methods for functional alignment.
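
Illustrative note (not part of the arXiv listing): a minimal numerical sketch of the SRVF idea. It assumes the usual definition $q(t) = \mathrm{sign}(\dot f(t))\sqrt{|\dot f(t)|}$, which the abstract does not spell out, and checks that the $\mathbb{L}^2$ distance between SRVFs is approximately unchanged when both functions are composed with the same warping. The test functions and the warping `gamma` are hypothetical choices, and derivatives are approximated by finite differences.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule for samples y on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def srvf(f, t):
    """Square-root velocity function of samples f on grid t (assumed definition)."""
    df = np.gradient(f, t)  # finite-difference approximation of f'
    return np.sign(df) * np.sqrt(np.abs(df))

def l2_distance(q1, q2, t):
    """L2 distance between two functions sampled on the grid t."""
    return float(np.sqrt(trapezoid((q1 - q2) ** 2, t)))

t = np.linspace(0.0, 1.0, 2001)
f1 = lambda s: np.sin(2.0 * np.pi * s)
f2 = lambda s: np.cos(2.0 * np.pi * s)

# Hypothetical warping of [0, 1]: gamma(0) = 0, gamma(1) = 1, gamma' > 0.
gamma = t + 0.1 * np.sin(2.0 * np.pi * t) / (2.0 * np.pi)

d_original = l2_distance(srvf(f1(t), t), srvf(f2(t), t), t)
d_warped = l2_distance(srvf(f1(gamma), t), srvf(f2(gamma), t), t)
norm_sq = trapezoid(srvf(f1(t), t) ** 2, t)  # equals the total variation of f1
```

The two distances agree up to discretization error, which is the invariance that makes a distance on the quotient space modulo warpings well defined; the squared norm of the SRVF equals the total variation of the function (here close to 4).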

Submitted 16 May, 2011; v1 submitted 19 March, 2011; originally announced March 2011.

Comments: Revised paper. More focused on a subproblem and more theoretical results

Showing 1–23 of 23 results for author: Kurtek, S