Drift Models on Complex Projective Space for Electron-Nuclear Double Resonance

Authors: Henrik Wiechers, Markus Zobel, Marina Bennati, Igor Tkach, Benjamin Eltzner, Stephan Huckemann, Yvo Pokern

Abstract: ENDOR spectroscopy is an important tool to determine the complicated three-dimensional structure of biomolecules and in particular enables measurements of intramolecular distances. Usually, spectra are determined by averaging the data matrix, which does not take into account the significant thermal drifts that occur in the measurement process. In contrast, we present an asymptotic analysis for the homoscedastic drift model, a pioneering parametric model that achieves striking model fits in practice and allows both hypothesis testing and confidence intervals for spectra. The ENDOR spectrum and an orthogonal component are modeled as an element of complex projective space, and formulated in the framework of generalized Fréchet means. To this end, two general formulations of strong consistency for set-valued Fréchet means are extended and subsequently applied to the homoscedastic drift model to prove strong consistency. Building on this, central limit theorems for the ENDOR spectrum are shown. Furthermore, we extend applicability by taking into account a phase noise contribution leading to the heteroscedastic drift model. Both drift models offer improved signal-to-noise ratio over pre-existing models.

Submitted 23 July, 2023; originally announced July 2023.

Comments: 68 pages, 10 figures

arXiv:2105.12061

Diffusion Means in Geometric Spaces

Authors: Benjamin Eltzner, Pernille Hansen, Stephan F. Huckemann, Stefan Sommer

Abstract: We introduce a location statistic for distributions on non-linear geometric spaces, the diffusion mean, serving as an extension and an alternative to the Fréchet mean. The diffusion mean arises as the generalization of Gaussian maximum likelihood analysis to non-linear spaces by maximizing the likelihood of a Brownian motion. The diffusion mean depends on a time parameter $t$, which admits the interpretation of the allowed variance of the diffusion. The diffusion $t$-mean of a distribution $X$ is the most likely origin of a Brownian motion at time $t$, given the end-point distribution $X$. We give a detailed description of the asymptotic behavior of the diffusion estimator and provide sufficient conditions for the diffusion estimator to be strongly consistent. Particularly, we present a smeary central limit theorem for diffusion means and we show that joint estimation of the mean and diffusion variance rules out smeariness in all directions simultaneously in general situations. Furthermore, we investigate properties of the diffusion mean for distributions on the sphere $\mathbb S^n$. Experimentally, we consider simulated data and data from magnetic pole reversals, all indicating similar or improved convergence rate compared to the Fréchet mean. Here, we additionally estimate $t$ and consider its effects on smeariness and uniqueness of the diffusion mean for distributions on the sphere.

Submitted 4 December, 2022; v1 submitted 25 May, 2021; originally announced May 2021.

arXiv:2104.00094

Clustering Schemes on the Torus with Application to RNA Clashes

Authors: Henrik Wiechers, Benjamin Eltzner, Stephan F. Huckemann, Kanti V. Mardia

Abstract: Molecular structures of RNA molecules reconstructed from X-ray crystallography frequently contain errors. Motivated by this problem we examine clustering on a torus since RNA shapes can be described by dihedral angles. A previously developed clustering method for torus data involves two tuning parameters and we assess clustering results for different parameter values in relation to the problem of so-called RNA clashes. This clustering problem is part of the dynamically evolving field of statistics on manifolds. Statistical problems on the torus highlight general challenges for statistics on manifolds. Therefore, the torus PCA and clustering methods we propose make an important contribution to directional statistics and statistics on manifolds in general.

Submitted 28 February, 2021; originally announced April 2021.

Comments: 8 pages, 4 figures, conference submission to GSI 2021

arXiv:2103.06071

Analyzing cross-talk between superimposed signals: Vector norm dependent hidden Markov models and applications to ion channels

Authors: Laura Jula Vanegas, Benjamin Eltzner, Daniel Rudolf, Miroslav Dura, Stephan E. Lehnart, Axel Munk

Abstract: We propose and investigate a hidden Markov model (HMM) for the analysis of dependent, aggregated, superimposed two-state signal recordings. A major motivation for this work is that often these signals cannot be observed individually but only their superposition. Among others, such models are in high demand for the understanding of cross-talk between ion channels, where each single channel cannot be measured separately. As an essential building block, we introduce a parameterized vector norm dependent Markov chain model and characterize it in terms of permutation invariance as well as conditional independence. This building block leads to a hidden Markov chain sum process which can be used for analyzing the dependence structure of superimposed two-state signal observations within an HMM. Notably, the model parameters of the vector norm dependent Markov chain are uniquely determined by the parameters of the sum process and are therefore identifiable. We provide algorithms to estimate the parameters, discuss model selection and apply our methodology to real-world ion channel data from the heart muscle, where we show competitive gating.

Submitted 28 June, 2023; v1 submitted 10 March, 2021; originally announced March 2021.

Comments: 49 pages, 11 figures. An R package can be found at: https://github.com/ljvanegas/VND

arXiv:2103.00588

Diffusion Means and Heat Kernel on Manifolds

Authors: Pernille Hansen, Benjamin Eltzner, Stefan Sommer

Abstract: We introduce diffusion means as location statistics on manifold data spaces. A diffusion mean is defined as the starting point of an isotropic diffusion with a given diffusivity. They can therefore be defined on all spaces on which a Brownian motion can be defined and numerical calculation of sample diffusion means is possible on a variety of spaces using the heat kernel expansion. We present several classes of spaces, for which the heat kernel is known and sample diffusion means can therefore be calculated. As an example, we investigate a classic data set from directional statistics, for which the sample Fréchet mean exhibits finite sample smeariness.

Submitted 28 February, 2021; originally announced March 2021.

Comments: 8 pages, 1 figure, conference paper submitted to GSI 2021

arXiv:2103.00512

Finite Sample Smeariness on Spheres

Authors: Benjamin Eltzner, Shayan Hundrieser, Stephan F. Huckemann

Abstract: Finite Sample Smeariness (FSS) has been recently discovered. It means that the distribution of sample Fréchet means of underlying rather unsuspicious random variables can behave as if it were smeary for quite large regimes of finite sample sizes. In effect classical quantile-based statistical testing procedures do not preserve nominal size, they reject too often under the null hypothesis. Suitably designed bootstrap tests, however, amend for FSS. On the circle it has been known that arbitrarily sized FSS is possible, and that all distributions with a nonvanishing density feature FSS. These results are extended to spheres of arbitrary dimension. In particular all rotationally symmetric distributions, not necessarily supported on the entire sphere feature FSS of Type I. While on the circle there is also FSS of Type II it is conjectured that this is not possible on higher-dimensional spheres.

Submitted 28 February, 2021; originally announced March 2021.

Comments: 8 pages, 4 figures, conference paper, GSI 2021

arXiv:2011.14762

Testing for Uniqueness of Estimators

Authors: Benjamin Eltzner

Abstract: Uniqueness of the population value of an estimated descriptor is a standard assumption in asymptotic theory. However, m-estimation problems often allow for local minima of the sample estimating function, which may stem from multiple global minima of the underlying population estimating function. In the present article, we provide tools to systematically determine for a given sample whether the underlying population estimating function may have multiple global minima. To achieve this goal, we develop asymptotic theory for non-unique minimizers and introduce asymptotic tests using the bootstrap. We discuss three applications of our tests to data, each of which presents a typical scenario in which non-uniqueness of descriptors may occur. These model scenarios are the mean on a non-euclidean space, non-linear regression and Gaussian mixture clustering.

Submitted 30 November, 2020; originally announced November 2020.

Comments: 28 pages, 9 figures

arXiv:2005.02321

Finite Sample Smeariness of Fréchet Means and Application to Climate

Authors: Shayan Hundrieser, Benjamin Eltzner, Stephan F. Huckemann

Abstract: Fréchet means on non-Euclidean spaces may exhibit nonstandard asymptotic rates rendering quantile-based asymptotic inference inapplicable. We show here that this affects, among others, all circular distributions whose support exceeds a half circle. We exhaustively describe this phenomenon and introduce a new concept which we call finite sample smeariness (FSS). In the presence of FSS, it turns out that quantile-based tests for equality of Fréchet means systematically feature effective levels higher than their nominal level which perseveres asymptotically in case of Type I FSS. In contrast, suitable bootstrap-based tests correct for FSS and asymptotically attain the correct level. For illustration of the relevance of FSS in real data, we apply our method to directional wind data from two European cities. It turns out that quantile-based tests, not correcting for FSS, find a multitude of significant wind changes. This multitude condenses to a few years featuring significant wind changes, when our bootstrap tests are applied, correcting for FSS.

Submitted 26 July, 2021; v1 submitted 5 May, 2020; originally announced May 2020.

Comments: 38 pages, 9 figures

MSC Class: 62H11; 60F05; 62G10

arXiv:1609.00814

Backward Nested Descriptors Asymptotics with Inference on Stem Cell Differentiation

Authors: Stephan F. Huckemann, Benjamin Eltzner

Abstract: For sequences of random backward nested subspaces as occur, say, in dimension reduction for manifold or stratified space valued data, asymptotic results are derived. In fact, we formulate our results more generally for backward nested families of descriptors (BNFD). Under rather general conditions, asymptotic strong consistency holds. Under additional, still rather general hypotheses, among them existence of a.s. local twice differentiable charts, asymptotic joint normality of a BNFD can be shown. If charts factor suitably, this leads to individual asymptotic normality for the last element, a principal nested mean or a principal nested geodesic, say. It turns out that these results pertain to principal nested spheres (PNS) and principal nested great subsphere (PNGS) analysis by Jung et al. (2010) as well as to the intrinsic mean on a first geodesic principal component (IMo1GPC) for manifolds and Kendall's shape spaces. A nested bootstrap two-sample test is derived and illustrated with simulations. In a study on real data, PNGS is applied to track early human mesenchymal stem cell differentiation over a coarse time grid and, among others, to locate a change point with direct consequences for the design of further studies.

Submitted 3 September, 2016; originally announced September 2016.

arXiv:1604.04318

Principal Sub-manifolds

Authors: Zhigang Yao, Benjamin Eltzner, Tung Pham

Abstract: We propose a novel method of finding principal components in multivariate data sets that lie on an embedded nonlinear Riemannian manifold within a higher-dimensional space. Our aim is to extend the geometric interpretation of PCA, while being able to capture non-geodesic modes of variation in the data. We introduce the concept of a principal sub-manifold, a manifold passing through a reference point, and at any point on the manifold extending in the direction of highest variation in the space spanned by the eigenvectors of the local tangent space PCA. Compared to recent work for the case where the sub-manifold is of dimension one (Panaretos et al., 2014), essentially a curve lying on the manifold attempting to capture one-dimensional variation, the current setting is much more general. The principal sub-manifold is therefore an extension of the principal flow, able to capture higher-dimensional variation in the data. We show the principal sub-manifold yields the ball spanned by the usual principal components in Euclidean space. By means of examples, we illustrate how to find, use and interpret a principal sub-manifold and we present an application in shape analysis.

Submitted 4 June, 2024; v1 submitted 14 April, 2016; originally announced April 2016.

Comments: 45 pages, 21 figures

arXiv:1511.04993

Torus Principal Component Analysis with an Application to RNA Structures

Authors: Benjamin Eltzner, Stephan Huckemann, Kanti V. Mardia

Abstract: There are several cutting edge applications needing PCA methods for data on tori and we propose a novel torus-PCA method with important properties that can be generally applied. There are two existing general methods: tangent space PCA and geodesic PCA. However, unlike tangent space PCA, our torus-PCA honors the cyclic topology of the data space whereas, unlike geodesic PCA, our torus-PCA produces a variety of non-winding, non-dense descriptors. This is achieved by deforming tori into spheres and then using a variant of the recently developed principal nested spheres analysis. This PCA analysis involves a step of small sphere fitting and we provide an improved test to avoid overfitting. However, deforming tori into spheres creates singularities. We introduce a data-adaptive pre-clustering technique to keep the singularities away from the data. For the frequently encountered case that the residual variance around the PCA main component is small, we use a post-mode hunting technique for more fine-grained clustering. Thus in general, there are three successive interrelated key steps of torus-PCA in practice: pre-clustering, deformation, and post-mode hunting. We illustrate our method with two recently studied RNA structure (tori) data sets: one is a small RNA data set which is established as the benchmark for PCA and we validate our method through this data. Another is a large RNA data set (containing the small RNA data set) for which we show that our method provides interpretable principal components as well as giving further insight into its structure.

Submitted 16 November, 2015; originally announced November 2015.

Comments: 35 pages, 19 figures

MSC Class: 62H30

Showing 1–11 of 11 results for author: Eltzner, B