Search | arXiv e-print repository

Statistics for Phylogenetic Trees in the Presence of Stickiness

Authors: Lars Lammers, Tom M. W. Nye, Stephan F. Huckemann

Abstract: Samples of phylogenetic trees arise in a variety of evolutionary and biomedical applications, and the Fréchet mean in Billera-Holmes-Vogtmann tree space is a summary tree shown to have advantages over other mean or consensus trees. However, use of the Fréchet mean raises computational and statistical issues which we explore in this paper. The Fréchet sample mean is known often to contain fewer internal edges than the trees in the sample, and in this circumstance calculating the mean by iterative schemes can be problematic due to slow convergence. We present new methods for identifying edges which must lie in the Fréchet sample mean and apply these to a data set of gene trees relating organisms from the apicomplexa which cause a variety of parasitic infections. When a sample of trees contains a significant level of heterogeneity in the branching patterns, or topologies, displayed by the trees then the Fréchet mean is often a star tree, lacking any internal edges. Not only in this situation, the population Fréchet mean is affected by a non-Euclidean phenomenon called stickiness which impacts upon asymptotics, and we examine two data sets for which the mean tree is a star tree. The first consists of trees representing the physical shape of artery structures in a sample of medical images of human brains in which the branching patterns are very diverse. The second consists of gene trees from a population of baboons in which there is evidence of substantial hybridization. We develop hypothesis tests which work in the presence of stickiness. The first is a test for the presence of a given edge in the Fréchet population mean; the second is a two-sample test for differences in two distributions which share the same sticky population mean.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 37 pages, 16 figures

arXiv:2402.12290 [pdf, ps, other]

A Lower Bound for Estimating Fréchet Means

Authors: Shayan Hundrieser, Benjamin Eltzner, Stephan F. Huckemann

Abstract: Fréchet means, conceptually appealing, generalize the Euclidean expectation to general metric spaces. We explore how well Fréchet means can be estimated from independent and identically distributed samples and uncover a fundamental limitation: In the vicinity of a probability distribution $P$ with nonunique means, independent of sample size, it is not possible to uniformly estimate Fréchet means below a precision determined by the diameter of the set of Fréchet means of $P$. Implications were previously identified for empirical plug-in estimators as part of the phenomenon \emph{finite sample smeariness}. Our findings thus confirm inevitable statistical challenges in the estimation of Fréchet means on metric spaces for which there exist distributions with nonunique means. Illustrating the relevance of our lower bound, examples of extrinsic, intrinsic, Procrustes, diffusion and Wasserstein means showcase either deteriorating constants or slow convergence rates of empirical Fréchet means for samples near the regime of nonunique means.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 24 pages, 1 figure

MSC Class: Primary 62F10; 62H12; secondary 60D05

arXiv:2311.08846 [pdf, other]

Sticky Flavors

Authors: Lars Lammers, Do Tran Van, Stephan F. Huckemann

Abstract: The Fréchet mean, a generalization to a metric space of the expectation of a random variable in a vector space, can exhibit unexpected behavior for a wide class of random variables. For instance, it can stick to a point (more generally to a closed set) under resampling: sample stickiness. It can stick to a point for topologically nearby distributions: topological stickiness, such as total variation or Wasserstein stickiness. It can stick to a point for slight but arbitrary perturbations: perturbation stickiness. Here, we explore these and various other flavors of stickiness and their relationship in varying scenarios, for instance on CAT($κ$) spaces, $κ\in \mathbb{R}$. Interestingly, modulation stickiness (faster asymptotic rate than $\sqrt{n}$) and directional stickiness (a generalization of moment stickiness from the literature) allow for the development of new statistical methods building on an asymptotic fluctuation, where, due to stickiness, the mean itself features no asymptotic fluctuation. Also, we rule out sticky flavors on manifolds in scenarios with curvature bounds.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2307.12414 [pdf, other]

Drift Models on Complex Projective Space for Electron-Nuclear Double Resonance

Authors: Henrik Wiechers, Markus Zobel, Marina Bennati, Igor Tkach, Benjamin Eltzner, Stephan Huckemann, Yvo Pokern

Abstract: ENDOR spectroscopy is an important tool to determine the complicated three-dimensional structure of biomolecules and in particular enables measurements of intramolecular distances. Usually, spectra are determined by averaging the data matrix, which does not take into account the significant thermal drifts that occur in the measurement process. In contrast, we present an asymptotic analysis for the homoscedastic drift model, a pioneering parametric model that achieves striking model fits in practice and allows both hypothesis testing and confidence intervals for spectra. The ENDOR spectrum and an orthogonal component are modeled as an element of complex projective space, and formulated in the framework of generalized Fréchet means. To this end, two general formulations of strong consistency for set-valued Fréchet means are extended and subsequently applied to the homoscedastic drift model to prove strong consistency. Building on this, central limit theorems for the ENDOR spectrum are shown. Furthermore, we extend applicability by taking into account a phase noise contribution leading to the heteroscedastic drift model. Both drift models offer improved signal-to-noise ratio over pre-existing models.

Submitted 23 July, 2023; originally announced July 2023.

Comments: 68 pages, 10 figures

arXiv:2305.10324 [pdf, other]

Exploring Uniform Finite Sample Stickiness

Authors: Susanne Ulmer, Do Tran Van, Stephan F. Huckemann

Abstract: It is well known that Fréchet means on non-Euclidean spaces may exhibit nonstandard asymptotic rates depending on curvature. Even for distributions featuring standard asymptotic rates, there are non-Euclidean effects, altering finite sampling rates up to considerable sample sizes. These effects can be measured by the variance modulation function proposed by Pennec (2019). Among other things, in view of statistical inference, it is important to bound this function on intervals of sample sizes. As a first step in this direction, for the special case of a K-spider we give such an interval, based only on folded moments and total probabilities of spider legs, and illustrate the method by simulations.

Submitted 17 May, 2023; originally announced May 2023.

Comments: 9 pages, 3 figures

MSC Class: 60F05

arXiv:2304.05025 [pdf, other]

Types of Stickiness in BHV Phylogenetic Tree Spaces and Their Degree

Authors: Lars Lammers, Do Tran Van, Tom M. W. Nye, Stephan F. Huckemann

Abstract: It has been observed that the sample mean of certain probability distributions in Billera-Holmes-Vogtmann (BHV) phylogenetic spaces is confined to a lower-dimensional subspace for large enough sample size. This non-standard behavior has been called stickiness and poses difficulties in statistical applications when comparing samples of sticky distributions. We extend previous results on stickiness to show the equivalence of this sampling behavior to topological conditions in the special case of BHV spaces. Furthermore, we propose to alleviate statistical comparison of sticky distributions by including the directional derivatives of the Fréchet function: the degree of stickiness.

Submitted 11 April, 2023; originally announced April 2023.

Comments: 8 Pages, 1 Figure, conference submission to GSI 2023

MSC Class: 62F03

arXiv:2209.05332 [pdf, other]

Foundations of the Wald Space for Phylogenetic Trees

Authors: Jonas Lueg, Maryam K. Garba, Tom M. W. Nye, Stephan F. Huckemann

Abstract: Evolutionary relationships between species are represented by phylogenetic trees, but these relationships are subject to uncertainty due to the random nature of evolution. A geometry for the space of phylogenetic trees is necessary in order to properly quantify this uncertainty during the statistical analysis of collections of possible evolutionary trees inferred from biological data. Recently, the wald space has been introduced: a length space for trees which is a certain subset of the manifold of symmetric positive definite matrices. In this work, the wald space is introduced formally and its topology and structure are studied in detail. In particular, we show that wald space has the topology of a disjoint union of open cubes, it is contractible, and by careful characterization of cube boundaries, we demonstrate that wald space is a Whitney stratified space of type (A). Imposing the metric induced by the affine invariant metric on symmetric positive definite matrices, we prove that wald space is a geodesic Riemann stratified space. A new numerical method is proposed and investigated for construction of geodesics, computation of Fréchet means and calculation of curvature in wald space. This work is intended to serve as a mathematical foundation for further geometric and statistical research on this space.

Submitted 12 September, 2022; originally announced September 2022.

Comments: 42 pages, 15 figures

MSC Class: 30L05; 57N80; 53A35

arXiv:2105.12061 [pdf, other]

Diffusion Means in Geometric Spaces

Authors: Benjamin Eltzner, Pernille Hansen, Stephan F. Huckemann, Stefan Sommer

Abstract: We introduce a location statistic for distributions on non-linear geometric spaces, the diffusion mean, serving as an extension and an alternative to the Fréchet mean. The diffusion mean arises as the generalization of Gaussian maximum likelihood analysis to non-linear spaces by maximizing the likelihood of a Brownian motion. The diffusion mean depends on a time parameter $t$, which admits the interpretation of the allowed variance of the diffusion. The diffusion $t$-mean of a distribution $X$ is the most likely origin of a Brownian motion at time $t$, given the end-point distribution $X$. We give a detailed description of the asymptotic behavior of the diffusion estimator and provide sufficient conditions for the diffusion estimator to be strongly consistent. Particularly, we present a smeary central limit theorem for diffusion means and we show that joint estimation of the mean and diffusion variance rules out smeariness in all directions simultaneously in general situations. Furthermore, we investigate properties of the diffusion mean for distributions on the sphere $\mathbb S^n$. Experimentally, we consider simulated data and data from magnetic pole reversals, all indicating similar or improved convergence rate compared to the Fréchet mean. Here, we additionally estimate $t$ and consider its effects on smeariness and uniqueness of the diffusion mean for distributions on the sphere.

Submitted 4 December, 2022; v1 submitted 25 May, 2021; originally announced May 2021.

arXiv:2103.00512 [pdf, other]

Finite Sample Smeariness on Spheres

Authors: Benjamin Eltzner, Shayan Hundrieser, Stephan F. Huckemann

Abstract: Finite Sample Smeariness (FSS) has been recently discovered. It means that the distribution of sample Fréchet means of underlying rather unsuspicious random variables can behave as if it were smeary for quite large regimes of finite sample sizes. In effect, classical quantile-based statistical testing procedures do not preserve nominal size; they reject too often under the null hypothesis. Suitably designed bootstrap tests, however, correct for FSS. On the circle it has been known that arbitrarily sized FSS is possible, and that all distributions with a nonvanishing density feature FSS. These results are extended to spheres of arbitrary dimension. In particular, all rotationally symmetric distributions, not necessarily supported on the entire sphere, feature FSS of Type I. While on the circle there is also FSS of Type II, it is conjectured that this is not possible on higher-dimensional spheres.

Submitted 28 February, 2021; originally announced March 2021.

Comments: 8 pages, 4 figures, conference paper, GSI 2021

arXiv:2103.00469 [pdf, other]

Smeariness Begets Finite Sample Smeariness

Authors: Do Tran, Benjamin Eltzner, Stephan Huckemann

Abstract: Fréchet means are indispensable for nonparametric statistics on non-Euclidean spaces. For suitable random variables, in some sense, they "sense" topological and geometric structure. In particular, smeariness seems to indicate the presence of positive curvature. While smeariness may be considered more as an academic curiosity, occurring rarely, it has been recently demonstrated that finite sample smeariness (FSS) occurs regularly on circles, tori and spheres and affects a large class of typical probability distributions. FSS can be well described by the modulation measuring the quotient of rescaled expected sample mean variance and population variance. Under FSS it is larger than one (its value on Euclidean spaces), and this makes quantile-based tests using tangent space approximations inapplicable. We show here that near smeary probability distributions there are always FSS probability distributions and, as a first step towards the conjecture that all compact spaces feature smeary distributions, we establish directional smeariness under curvature bounds.

Submitted 28 February, 2021; originally announced March 2021.

Comments: 8 pages, 1 figure, conference submission to GSI 2021

arXiv:2010.08661 [pdf, other]

Generalized Intersection Algorithms with Fixpoints for Image Decomposition Learning

Authors: Robin Richter, Duy H. Thai, Stephan F. Huckemann

Abstract: In image processing, classical methods minimize a suitable functional that balances between computational feasibility (convexity of the functional is ideal) and suitable penalties reflecting the desired image decomposition. The fact that algorithms derived from such minimization problems can be used to construct (deep) learning architectures has spurred the development of algorithms that can be trained for a specifically desired image decomposition, e.g. into cartoon and texture. While many such methods are very successful, theoretical guarantees are only scarcely available. To this end, in this contribution, we formalize a general class of intersection point problems encompassing a wide range of (learned) image decomposition models, and we give an existence result for a large subclass of such problems, i.e. giving the existence of a fixpoint of the corresponding algorithm. This class generalizes classical model-based variational problems, such as the TV-l2-model or the more general TV-Hilbert model. To illustrate the potential for learned algorithms, novel (non learned) choices within our class show comparable results in denoising and texture removal.

Submitted 16 October, 2020; originally announced October 2020.

Comments: 30 pages, 4 figures

MSC Class: 65D18 (Primary) 68U10 (Secondary)

arXiv:2003.13004 [pdf, other]

Information geometry for phylogenetic trees

Authors: Maryam K. Garba, Tom M. W. Nye, Jonas Lueg, Stephan F. Huckemann

Abstract: We propose a new space of phylogenetic trees which we call wald space. The motivation is to develop a space suitable for statistical analysis of phylogenies, but with a geometry based on more biologically principled assumptions than existing spaces: in wald space, trees are close if they induce similar distributions on genetic sequence data. As a point set, wald space contains the previously developed Billera-Holmes-Vogtmann (BHV) tree space; it also contains disconnected forests, like the edge-product (EP) space but without certain singularities of the EP space. We investigate two related geometries on wald space. The first is the geometry of the Fisher information metric of character distributions induced by the two-state symmetric Markov substitution process on each tree. Infinitesimally, the metric is proportional to the Kullback-Leibler divergence, or equivalently, as we show, to any f-divergence. The second geometry is obtained analogously but using a related continuous-valued Gaussian process on each tree, and it can be viewed as the trace metric of the affine-invariant metric for covariance matrices. We derive a gradient descent algorithm to project from the ambient space of covariance matrices to wald space. For both geometries we derive computational methods to compute geodesics in polynomial time and show numerically that the two information geometries (discrete and continuous) are very similar. In particular geodesics are approximated extrinsically. Comparison with the BHV geometry shows that our canonical and biologically motivated space is substantially different.

Submitted 17 September, 2020; v1 submitted 29 March, 2020; originally announced March 2020.

MSC Class: 92D15; 53A35; 94A17

arXiv:1909.06583 [pdf, other]

doi 10.1093/jrsssc/qlad060

Confidence Tubes for Curves on SO(3) and Identification of Subject-Specific Gait Change after Kneeling

Authors: Fabian J. E. Telschow, Michael R. Pierrynowski, Stephan F. Huckemann

Abstract: In order to identify changes of gait patterns, e.g. due to prolonged occupational kneeling, which is believed to be a major risk factor, among others, for the development of knee osteoarthritis, we develop confidence tubes for curves following a Gaussian perturbation model on SO(3). These are based on an application of the Gaussian kinematic formula to a process of Hotelling statistics and we approximate them by a computable version, for which we show convergence. Simulations endorse our method, which in application to gait curves from eight volunteers undergoing kneeling tasks, identifies phases of the gait cycle that have changed due to kneeling tasks. We find that after kneeling, deviation from normal gait is stronger, in particular for older aged male volunteers. Notably our method adjusts for different walking speeds and marker replacement at different visits.

Submitted 14 September, 2019; originally announced September 2019.

Comments: 19 pages, 4 figures

arXiv:1909.00410 [pdf, ps, other]

Stability of the Cut Locus and a Central Limit Theorem for Fréchet Means of Riemannian Manifolds

Authors: Benjamin Eltzner, Fernando Galaz-Garcia, Stephan F. Huckemann, Wilderich Tuschmann

Abstract: We obtain a Central Limit Theorem for closed Riemannian manifolds, clarifying along the way the geometric meaning of some of the hypotheses in Bhattacharya and Lin's Omnibus Central Limit Theorem for Fréchet means. We obtain our CLT assuming a certain stability hypothesis for the cut locus, which always holds when the manifold is compact but may not be satisfied in the non-compact case.

Submitted 4 September, 2019; v1 submitted 1 September, 2019; originally announced September 2019.

Comments: Typos corrected

MSC Class: 53C20; 60F05; 62E20

arXiv:1801.06581 [pdf, other]

A Smeary Central Limit Theorem for Manifolds with Application to High Dimensional Spheres

Authors: Benjamin Eltzner, Stephan F. Huckemann

Abstract: The central limit theorems (CLTs) for generalized Fréchet means (data descriptors assuming values in stratified spaces, such as intrinsic means, geodesics, etc.) on manifolds from the literature are only valid if a certain empirical process of Hessians of the Fréchet function converges suitably, as in the proof of the prototypical BP-CLT (Bhattacharya and Patrangenaru (2005)). This is not valid in many realistic scenarios and we provide a new, very general CLT. In particular this includes scenarios where, in a suitable chart, the sample mean fluctuates asymptotically at a scale $n^α$ with exponents $α < 1/2$ with a non-normal distribution. As the BP-CLT yields only fluctuations that are, rescaled with $n^{1/2}$, asymptotically normal, just as the classical CLT for random vectors, these lower rates, somewhat loosely called smeariness, had to date been observed only on the circle (Hotz and Huckemann (2015)). We make the concept of smeariness on manifolds precise, give an example for two-smeariness on spheres of arbitrary dimension, and show that smeariness, although "almost never" occurring, may have serious statistical implications on a continuum of sample scenarios nearby. In fact, this effect increases with dimension, striking in particular in high dimension low sample size scenarios.

Submitted 19 January, 2018; originally announced January 2018.

Comments: 16 pages, 2 figures

arXiv:1711.07417 [pdf, other]

doi 10.1007/s00285-019-01338-3

An Anisotropic Interaction Model for Simulating Fingerprints

Authors: Bertram Düring, Carsten Gottschlich, Stephan Huckemann, Lisa Maria Kreusser, Carola-Bibiane Schönlieb

Abstract: Evidence suggests that both the interaction of so-called Merkel cells and the epidermal stress distribution play an important role in the formation of fingerprint patterns during pregnancy. To model the formation of fingerprint patterns in a biologically meaningful way these patterns have to become stationary. For the creation of synthetic fingerprints it is also very desirable that rescaling the model parameters leads to rescaled distances between the stationary fingerprint ridges. Based on these observations, as well as the model introduced by Kücken and Champod we propose a new model for the formation of fingerprint patterns during pregnancy. In this anisotropic interaction model the interaction forces not only depend on the distance vector between the cells and the model parameters, but additionally on an underlying tensor field, representing a stress field. This dependence on the tensor field leads to complex, anisotropic patterns. We study the resulting stationary patterns both analytically and numerically. In particular, we show that fingerprint patterns can be modeled as stationary solutions by choosing the underlying tensor field appropriately.

Submitted 20 November, 2017; originally announced November 2017.

MSC Class: 35B36; 70F10; 82C22; 92C15; 92C17

Journal ref: Journal of Mathematical Biology, 78(7), 2171-2206, 2019

arXiv:1609.00814 [pdf, other]

Backward Nested Descriptors Asymptotics with Inference on Stem Cell Differentiation

Authors: Stephan F. Huckemann, Benjamin Eltzner

Abstract: For sequences of random backward nested subspaces as occur, say, in dimension reduction for manifold or stratified space valued data, asymptotic results are derived. In fact, we formulate our results more generally for backward nested families of descriptors (BNFD). Under rather general conditions, asymptotic strong consistency holds. Under additional, still rather general hypotheses, among them existence of a.s. local twice differentiable charts, asymptotic joint normality of a BNFD can be shown. If charts factor suitably, this leads to individual asymptotic normality for the last element, a principal nested mean or a principal nested geodesic, say. It turns out that these results pertain to principal nested spheres (PNS) and principal nested great subsphere (PNGS) analysis by Jung et al. (2010) as well as to the intrinsic mean on a first geodesic principal component (IMo1GPC) for manifolds and Kendall's shape spaces. A nested bootstrap two-sample test is derived and illustrated with simulations. In a study on real data, PNGS is applied to track early human mesenchymal stem cell differentiation over a coarse time grid and, among others, to locate a change point with direct consequences for the design of further studies.

Submitted 3 September, 2016; originally announced September 2016.

arXiv:1410.6879 [pdf, other]

Sticky central limit theorems at isolated hyperbolic planar singularities

Authors: Stephan Huckemann, Jonathan C. Mattingly, Ezra Miller, James Nolen

Abstract: We derive the limiting distribution of the barycenter $b_n$ of an i.i.d. sample of $n$ random points on a planar cone with angular spread larger than $2π$. There are three mutually exclusive possibilities: (i) (fully sticky case) after a finite random time the barycenter is almost surely at the origin; (ii) (partly sticky case) the limiting distribution of $\sqrt{n} b_n$ comprises a point mass at the origin, an open sector of a Gaussian, and the projection of a Gaussian to the sector's bounding rays; or (iii) (nonsticky case) the barycenter stays away from the origin and the renormalized fluctuations have a fully supported limit distribution---usually Gaussian but not always. We conclude with an alternative, topological definition of stickiness that generalizes readily to measures on general metric spaces.

Submitted 13 July, 2015; v1 submitted 25 October, 2014; originally announced October 2014.

Comments: revised version, 39 pages

arXiv:1202.4267 [pdf, ps, other]

doi 10.1214/12-AAP899

Sticky central limit theorems on open books

Authors: Thomas Hotz, Sean Skwerer, Stephan Huckemann, Huiling Le, J. S. Marron, Jonathan C. Mattingly, Ezra Miller, James Nolen, Megan Owen, Vic Patrangenaru

Abstract: Given a probability distribution on an open book (a metric space obtained by gluing a disjoint union of copies of a half-space along their boundary hyperplanes), we define a precise concept of when the Fréchet mean (barycenter) is sticky. This nonclassical phenomenon is quantified by a law of large numbers (LLN) stating that the empirical mean eventually almost surely lies on the (codimension $1$ and hence measure $0$) spine that is the glued hyperplane, and a central limit theorem (CLT) stating that the limiting distribution is Gaussian and supported on the spine. We also state versions of the LLN and CLT for the cases where the mean is nonsticky (i.e., not lying on the spine) and partly sticky (i.e., on the spine but not sticky).

Submitted 3 December, 2013; v1 submitted 20 February, 2012; originally announced February 2012.

Comments: Published at http://dx.doi.org/10.1214/12-AAP899 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP899

Journal ref: Annals of Applied Probability 2013, Vol. 23, No. 6, 2238-2258

arXiv:1108.2141 [pdf, other]

Intrinsic Means on the Circle: Uniqueness, Locus and Asymptotics

Authors: Thomas Hotz, Stephan Huckemann

Abstract: This paper gives a comprehensive treatment of local uniqueness, asymptotics and numerics for intrinsic means on the circle. It turns out that local uniqueness as well as rates of convergence are governed by the distribution near the antipode. In a nutshell, if the distribution there is locally less than uniform, we have local uniqueness and asymptotic normality with a rate of $1/\sqrt{n}$. With increased proximity to the uniform distribution the rate can be arbitrarily slow, and in the limit, local uniqueness is lost. Further, we give general distributional conditions, e.g. unimodality, that ensure global uniqueness. Along the way, we discover that sample means can occur only at the vertices of a regular polygon, which allows computing intrinsic sample means in linear time from sorted data. This algorithm is finally applied in a simulation study demonstrating the dependence of the convergence rates on the behavior of the density at the antipode.

Submitted 10 August, 2011; originally announced August 2011.

MSC Class: 62H11 (Primary) 60F05 (Secondary)
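
Illustration (not part of the listing): the regular-polygon property mentioned in the abstract above can be sketched in a few lines of Python. This is a minimal brute-force sketch under stated assumptions, not the paper's linear-time algorithm: it builds the n candidate points (the Euclidean mean of the angles wrapped to [0, 2π), shifted by multiples of 2π/n) and evaluates the Fréchet function at each, which costs O(n²). The function names are hypothetical and chosen for this example only.

```python
import math

TWO_PI = 2.0 * math.pi


def circular_distance(a, b):
    """Arc-length (intrinsic) distance between two angles on the unit circle."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)


def frechet_function(mu, angles):
    """Mean squared intrinsic distance from mu to the sample."""
    return sum(circular_distance(mu, a) ** 2 for a in angles) / len(angles)


def intrinsic_sample_mean(angles):
    """Return one minimizer of the Frechet function on the circle.

    Candidates are the vertices of a regular n-gon: the Euclidean mean of
    the wrapped angles plus 2*pi*k/n for k = 0, ..., n-1. Each candidate is
    scored by brute force (quadratic time overall); if the minimizer is not
    unique, only one of the minimizers is returned.
    """
    n = len(angles)
    wrapped = [a % TWO_PI for a in angles]
    euclidean_mean = sum(wrapped) / n
    candidates = [(euclidean_mean + TWO_PI * k / n) % TWO_PI for k in range(n)]
    return min(candidates, key=lambda mu: frechet_function(mu, wrapped))
```

For a sample clustered around the cut point, such as `[3.0, -3.0]`, the Euclidean mean of the raw values is 0, while the function above returns a point at angle π, where the two observations actually lie.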

arXiv:1010.4202 [pdf, ps, other]

doi 10.1214/09-AOS783

Möbius deconvolution on the hyperbolic plane with application to impedance density estimation

Authors: Stephan F. Huckemann, Peter T. Kim, Ja-Yong Koo, Axel Munk

Abstract: In this paper we consider a novel statistical inverse problem on the Poincaré, or Lobachevsky, upper (complex) half plane. Here the Riemannian structure is hyperbolic and a transitive group action comes from the space of $2\times2$ real matrices of determinant one via Möbius transformations. Our approach is based on a deconvolution technique which relies on the Helgason--Fourier calculus adapted to this hyperbolic space. This gives a minimax nonparametric density estimator of a hyperbolic density that is corrupted by a random Möbius transform. A motivation for this work comes from the reconstruction of impedances of capacitors where the above scenario on the Poincaré plane exactly describes the physical system that is of statistical interest.

Submitted 20 October, 2010; originally announced October 2010.

Comments: Published at http://dx.doi.org/10.1214/09-AOS783 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS783

Journal ref: Annals of Statistics 2010, Vol. 38, No. 4, 2465-2498

arXiv:1002.0795 [pdf, ps, other]

On the meaning of mean shape

Authors: Stephan Huckemann

Abstract: Various concepts of mean shape previously unrelated in the literature are brought into relation. In particular for non-manifolds such as Kendall's 3D shape space, this paper answers the question of which means admit a two-sample test. The answer is positive if intrinsic or Ziezold means are used. The underlying general result of manifold stability of a mean on a shape space, the quotient due to an isometric action of a compact Lie group on a Riemannian manifold, blends the Slice Theorem from differential geometry with the statistics of shape. For 3D Procrustes means, however, a counterexample is given. To further elucidate subtleties of means, for spheres and Kendall's shape spaces, a first order relationship between intrinsic, residual/Procrustean and extrinsic/Ziezold means is derived stating that for high concentration the latter approximately divides the (generalized) geodesic segment between the former two by the ratio $1:3$. This fact, consequences of coordinate choices for the power of tests, and other details, e.g. that extrinsic Schoenberg means may increase dimension, are discussed and illustrated by simulations and exemplary datasets.

Submitted 12 May, 2011; v1 submitted 3 February, 2010; originally announced February 2010.

Comments: 32 pages, 13 figures

arXiv:1002.0616 [pdf, ps, other]

Dynamic shape analysis and comparison of leaf growth

Authors: Stephan Huckemann

Abstract: In the statistical analysis of shape a goal beyond the analysis of static shapes lies in the quantification of `same' deformation of different shapes. Typically, shape spaces are modelled as Riemannian manifolds on which parallel transport along geodesics naturally qualifies as a measure for the `similarity' of deformation. Since these spaces are usually defined as combinations of Riemannian immersions and submersions, parallel transport along geodesics can be computed explicitly only for a few well featured spaces such as spheres or complex projective spaces (which are Kendall's spaces for 2D shapes). In this contribution a general numerical method to compute parallel transport along geodesics when no explicit formula is available is provided. This method is applied to the shape spaces of closed 2D contours based on angular direction and to Kendall's spaces of shapes of arbitrary dimension. In application to the temporal evolution of leaf shape over a growing period, one leaf's shape-growth dynamics can be applied to another leaf. For a specific poplar tree investigated it is found that leaves of initially and terminally different shape evolve largely in parallel, i.e. with comparable dynamics.

Submitted 3 February, 2010; originally announced February 2010.

Comments: 19 pages, 16 Figures

Showing 1–23 of 23 results for author: Huckemann, S