-
Drift Models on Complex Projective Space for Electron-Nuclear Double Resonance
Authors:
Henrik Wiechers,
Markus Zobel,
Marina Bennati,
Igor Tkach,
Benjamin Eltzner,
Stephan Huckemann,
Yvo Pokern
Abstract:
ENDOR spectroscopy is an important tool to determine the complicated three-dimensional structure of biomolecules and in particular enables measurements of intramolecular distances. Usually, spectra are determined by averaging the data matrix, which does not take into account the significant thermal drifts that occur in the measurement process. In contrast, we present an asymptotic analysis for the homoscedastic drift model, a pioneering parametric model that achieves striking model fits in practice and allows both hypothesis testing and confidence intervals for spectra. The ENDOR spectrum and an orthogonal component are modeled as an element of complex projective space, and formulated in the framework of generalized Fréchet means. To this end, two general formulations of strong consistency for set-valued Fréchet means are extended and subsequently applied to the homoscedastic drift model to prove strong consistency. Building on this, central limit theorems for the ENDOR spectrum are shown. Furthermore, we extend applicability by taking into account a phase noise contribution leading to the heteroscedastic drift model. Both drift models offer improved signal-to-noise ratio over pre-existing models.
Submitted 23 July, 2023;
originally announced July 2023.
-
Diffusion Means in Geometric Spaces
Authors:
Benjamin Eltzner,
Pernille Hansen,
Stephan F. Huckemann,
Stefan Sommer
Abstract:
We introduce a location statistic for distributions on non-linear geometric spaces, the diffusion mean, serving as an extension and an alternative to the Fréchet mean. The diffusion mean arises as the generalization of Gaussian maximum likelihood analysis to non-linear spaces by maximizing the likelihood of a Brownian motion. The diffusion mean depends on a time parameter $t$, which admits the interpretation of the allowed variance of the diffusion. The diffusion $t$-mean of a distribution $X$ is the most likely origin of a Brownian motion at time $t$, given the end-point distribution $X$. We give a detailed description of the asymptotic behavior of the diffusion estimator and provide sufficient conditions for the diffusion estimator to be strongly consistent. Particularly, we present a smeary central limit theorem for diffusion means and we show that joint estimation of the mean and diffusion variance rules out smeariness in all directions simultaneously in general situations. Furthermore, we investigate properties of the diffusion mean for distributions on the sphere $\mathbb S^n$. Experimentally, we consider simulated data and data from magnetic pole reversals, all indicating similar or improved convergence rate compared to the Fréchet mean. Here, we additionally estimate $t$ and consider its effects on smeariness and uniqueness of the diffusion mean for distributions on the sphere.
Submitted 4 December, 2022; v1 submitted 25 May, 2021;
originally announced May 2021.
-
Clustering Schemes on the Torus with Application to RNA Clashes
Authors:
Henrik Wiechers,
Benjamin Eltzner,
Stephan F. Huckemann,
Kanti V. Mardia
Abstract:
Molecular structures of RNA molecules reconstructed from X-ray crystallography frequently contain errors. Motivated by this problem we examine clustering on a torus since RNA shapes can be described by dihedral angles. A previously developed clustering method for torus data involves two tuning parameters and we assess clustering results for different parameter values in relation to the problem of so-called RNA clashes. This clustering problem is part of the dynamically evolving field of statistics on manifolds. Statistical problems on the torus highlight general challenges for statistics on manifolds. Therefore, the torus PCA and clustering methods we propose make an important contribution to directional statistics and statistics on manifolds in general.
Submitted 28 February, 2021;
originally announced April 2021.
-
Analyzing cross-talk between superimposed signals: Vector norm dependent hidden Markov models and applications to ion channels
Authors:
Laura Jula Vanegas,
Benjamin Eltzner,
Daniel Rudolf,
Miroslav Dura,
Stephan E. Lehnart,
Axel Munk
Abstract:
We propose and investigate a hidden Markov model (HMM) for the analysis of dependent, aggregated, superimposed two-state signal recordings. A major motivation for this work is that often these signals cannot be observed individually but only their superposition. Among others, such models are in high demand for the understanding of cross-talk between ion channels, where each single channel cannot be measured separately. As an essential building block, we introduce a parameterized vector norm dependent Markov chain model and characterize it in terms of permutation invariance as well as conditional independence. This building block leads to a hidden Markov chain sum process which can be used for analyzing the dependence structure of superimposed two-state signal observations within an HMM. Notably, the model parameters of the vector norm dependent Markov chain are uniquely determined by the parameters of the sum process and are therefore identifiable. We provide algorithms to estimate the parameters, discuss model selection and apply our methodology to real-world ion channel data from the heart muscle, where we show competitive gating.
Submitted 28 June, 2023; v1 submitted 10 March, 2021;
originally announced March 2021.
-
Diffusion Means and Heat Kernel on Manifolds
Authors:
Pernille Hansen,
Benjamin Eltzner,
Stefan Sommer
Abstract:
We introduce diffusion means as location statistics on manifold data spaces. A diffusion mean is defined as the starting point of an isotropic diffusion with a given diffusivity. They can therefore be defined on all spaces on which a Brownian motion can be defined and numerical calculation of sample diffusion means is possible on a variety of spaces using the heat kernel expansion. We present several classes of spaces, for which the heat kernel is known and sample diffusion means can therefore be calculated. As an example, we investigate a classic data set from directional statistics, for which the sample Fréchet mean exhibits finite sample smeariness.
Submitted 28 February, 2021;
originally announced March 2021.
-
Finite Sample Smeariness on Spheres
Authors:
Benjamin Eltzner,
Shayan Hundrieser,
Stephan F. Huckemann
Abstract:
Finite Sample Smeariness (FSS) has been recently discovered. It means that the distribution of sample Fréchet means of underlying rather unsuspicious random variables can behave as if it were smeary for quite large regimes of finite sample sizes. In effect, classical quantile-based statistical testing procedures do not preserve nominal size: they reject too often under the null hypothesis. Suitably designed bootstrap tests, however, correct for FSS. On the circle it has been known that arbitrarily sized FSS is possible, and that all distributions with a nonvanishing density feature FSS. These results are extended to spheres of arbitrary dimension. In particular, all rotationally symmetric distributions, not necessarily supported on the entire sphere, feature FSS of Type I. While on the circle there is also FSS of Type II, it is conjectured that this is not possible on higher-dimensional spheres.
Submitted 28 February, 2021;
originally announced March 2021.
-
Testing for Uniqueness of Estimators
Authors:
Benjamin Eltzner
Abstract:
Uniqueness of the population value of an estimated descriptor is a standard assumption in asymptotic theory. However, M-estimation problems often allow for local minima of the sample estimating function, which may stem from multiple global minima of the underlying population estimating function. In the present article, we provide tools to systematically determine for a given sample whether the underlying population estimating function may have multiple global minima. To achieve this goal, we develop asymptotic theory for non-unique minimizers and introduce asymptotic tests using the bootstrap. We discuss three applications of our tests to data, each of which presents a typical scenario in which non-uniqueness of descriptors may occur. These model scenarios are the mean on a non-Euclidean space, non-linear regression, and Gaussian mixture clustering.
Submitted 30 November, 2020;
originally announced November 2020.
-
Finite Sample Smeariness of Fréchet Means and Application to Climate
Authors:
Shayan Hundrieser,
Benjamin Eltzner,
Stephan F. Huckemann
Abstract:
Fréchet means on non-Euclidean spaces may exhibit nonstandard asymptotic rates, rendering quantile-based asymptotic inference inapplicable. We show here that this affects, among others, all circular distributions whose support exceeds a half circle. We exhaustively describe this phenomenon and introduce a new concept which we call finite sample smeariness (FSS). In the presence of FSS, it turns out that quantile-based tests for equality of Fréchet means systematically feature effective levels higher than their nominal level, which persists asymptotically in the case of Type I FSS. In contrast, suitable bootstrap-based tests correct for FSS and asymptotically attain the correct level. To illustrate the relevance of FSS in real data, we apply our method to directional wind data from two European cities. It turns out that quantile-based tests, not correcting for FSS, find a multitude of significant wind changes. This multitude condenses to a few years featuring significant wind changes when our bootstrap tests, which correct for FSS, are applied.
Submitted 26 July, 2021; v1 submitted 5 May, 2020;
originally announced May 2020.
-
Backward Nested Descriptors Asymptotics with Inference on Stem Cell Differentiation
Authors:
Stephan F. Huckemann,
Benjamin Eltzner
Abstract:
For sequences of random backward nested subspaces as occur, say, in dimension reduction for manifold or stratified space valued data, asymptotic results are derived. In fact, we formulate our results more generally for backward nested families of descriptors (BNFD). Under rather general conditions, asymptotic strong consistency holds. Under additional, still rather general hypotheses, among them existence of a.s. local twice differentiable charts, asymptotic joint normality of a BNFD can be shown. If charts factor suitably, this leads to individual asymptotic normality for the last element, a principal nested mean or a principal nested geodesic, say. It turns out that these results pertain to principal nested spheres (PNS) and principal nested great subsphere (PNGS) analysis by Jung et al. (2010) as well as to the intrinsic mean on a first geodesic principal component (IMo1GPC) for manifolds and Kendall's shape spaces. A nested bootstrap two-sample test is derived and illustrated with simulations. In a study on real data, PNGS is applied to track early human mesenchymal stem cell differentiation over a coarse time grid and, among others, to locate a change point with direct consequences for the design of further studies.
Submitted 3 September, 2016;
originally announced September 2016.
-
Principal Sub-manifolds
Authors:
Zhigang Yao,
Benjamin Eltzner,
Tung Pham
Abstract:
We propose a novel method of finding principal components in multivariate data sets that lie on an embedded nonlinear Riemannian manifold within a higher-dimensional space. Our aim is to extend the geometric interpretation of PCA, while being able to capture non-geodesic modes of variation in the data. We introduce the concept of a principal sub-manifold, a manifold passing through a reference point, and at any point on the manifold extending in the direction of highest variation in the space spanned by the eigenvectors of the local tangent space PCA. Compared to recent work for the case where the sub-manifold is of dimension one (Panaretos et al., 2014), essentially a curve lying on the manifold attempting to capture one-dimensional variation, the current setting is much more general. The principal sub-manifold is therefore an extension of the principal flow, able to capture higher-dimensional variation in the data. We show that the principal sub-manifold yields the ball spanned by the usual principal components in Euclidean space. By means of examples, we illustrate how to find, use and interpret a principal sub-manifold and we present an application in shape analysis.
Submitted 4 June, 2024; v1 submitted 14 April, 2016;
originally announced April 2016.
-
Torus Principal Component Analysis with an Application to RNA Structures
Authors:
Benjamin Eltzner,
Stephan Huckemann,
Kanti V. Mardia
Abstract:
There are several cutting-edge applications needing PCA methods for data on tori and we propose a novel torus-PCA method with important properties that can be generally applied. There are two existing general methods: tangent space PCA and geodesic PCA. However, unlike tangent space PCA, our torus-PCA honors the cyclic topology of the data space whereas, unlike geodesic PCA, our torus-PCA produces a variety of non-winding, non-dense descriptors. This is achieved by deforming tori into spheres and then using a variant of the recently developed principal nested spheres analysis. This analysis involves a step of small sphere fitting and we provide an improved test to avoid overfitting. However, deforming tori into spheres creates singularities. We introduce a data-adaptive pre-clustering technique to keep the singularities away from the data. For the frequently encountered case that the residual variance around the PCA main component is small, we use a post-mode hunting technique for more fine-grained clustering. Thus in general, there are three successive interrelated key steps of torus-PCA in practice: pre-clustering, deformation, and post-mode hunting. We illustrate our method with two recently studied RNA structure (tori) data sets. One is a small RNA data set which is established as the benchmark for PCA, and we validate our method on these data. The other is a large RNA data set (containing the small RNA data set) for which we show that our method provides interpretable principal components and gives further insight into its structure.
Submitted 16 November, 2015;
originally announced November 2015.