-
Simultaneous semiparametric inference for single-index models
Authors:
Jiajun Tang,
Holger Dette
Abstract:
In the common partially linear single-index model we establish a Bahadur representation for a smoothing spline estimator of all model parameters and use this result to prove the joint weak convergence of the estimator of the index link function at a given point, together with the estimators of the parametric regression coefficients. We obtain the surprising result that, despite the nature of single-index models where the link function is evaluated at a linear combination of the index-coefficients, the estimator of the link function and the estimator of the index-coefficients are asymptotically independent. Our approach leverages a delicate analysis based on reproducing kernel Hilbert space and empirical process theory.
We show that the smoothing spline estimator achieves the minimax optimal rate with respect to the $L^2$-risk and consider several statistical applications where joint inference on all model parameters is of interest. In particular, we develop a simultaneous confidence band for the link function and propose inference tools to investigate if the maximum absolute deviation between the (unknown) link function and a given function exceeds a given threshold. We also construct tests for joint hypotheses regarding model parameters which involve both the nonparametric and parametric components and propose novel multiplier bootstrap procedures to avoid the estimation of unknown asymptotic quantities.
Submitted 1 July, 2024;
originally announced July 2024.
-
Nonparametric bootstrap of high-dimensional sample covariance matrices
Authors:
Holger Dette,
Angelika Rohde
Abstract:
We introduce a new "$(m,mp/n)$ out of $(n,p)$" sampling-with-replacement bootstrap for eigenvalue statistics of high-dimensional sample covariance matrices based on $n$ independent $p$-dimensional random vectors. In the high-dimensional scenario $p/n\rightarrow c\in (0,\infty)$, this fully nonparametric and computationally tractable bootstrap is shown to consistently reproduce the empirical spectral measure if $m/n\rightarrow 0$. If $m^2/n\rightarrow 0$, it correctly approximates the distribution of linear spectral statistics. The crucial component is a suitably defined Representative Subpopulation Condition which is shown to be verified in a large variety of situations. Our proofs are conducted under minimal moment requirements and incorporate delicate results on non-centered quadratic forms, combinatorial trace moment estimates as well as a conditional bootstrap martingale CLT which may be of independent interest.
Submitted 24 June, 2024;
originally announced June 2024.
-
Uncertainty quantification by block bootstrap for differentially private stochastic gradient descent
Authors:
Holger Dette,
Carina Graw
Abstract:
Stochastic Gradient Descent (SGD) is a widely used tool in machine learning. In the context of Differential Privacy (DP), SGD has been well studied in recent years, with the focus mainly on convergence rates and privacy guarantees. While uncertainty quantification (UQ) for SGD by bootstrap has been addressed by several authors in the non-private case, these procedures cannot be transferred to differential privacy due to multiple queries to the private data. In this paper, we propose a novel block bootstrap for SGD under local differential privacy that is computationally tractable and does not require an adjustment of the privacy budget. The method can be easily implemented and is applicable to a broad class of estimation problems. We prove the validity of our approach and illustrate its finite sample properties by means of a simulation study. As a by-product, the new method also provides a simple alternative numerical tool for UQ for non-private SGD.
Submitted 21 May, 2024;
originally announced May 2024.
-
Testing for similarity of dose response in multi-regional clinical trials
Authors:
Holger Dette,
Lukas Koletzko,
Frank Bretz
Abstract:
This paper addresses the problem of deciding whether the dose response relationships between subgroups and the full population in a multi-regional trial are similar to each other. Similarity is measured in terms of the maximal deviation between the dose response curves. We consider a parametric framework and develop two powerful bootstrap tests for the similarity between the dose response curves of one subgroup and the full population, and for the similarity between the dose response curves of several subgroups and the full population. We prove the validity of the tests, investigate the finite sample properties by means of a simulation study and finally illustrate the methodology in a case study.
Submitted 26 April, 2024;
originally announced April 2024.
-
New energy distances for statistical inference on infinite dimensional Hilbert spaces without moment conditions
Authors:
Holger Dette,
Jiajun Tang
Abstract:
For statistical inference on an infinite-dimensional Hilbert space $\H $ with no moment conditions we introduce a new class of energy distances on the space of probability measures on $\H$. The proposed distances consist of the integrated squared modulus of the corresponding difference of the characteristic functionals with respect to a reference probability measure on the Hilbert space. Necessary and sufficient conditions are established for the reference probability measure to be {\em characteristic}, the property that guarantees that the distance defines a metric on the space of probability measures on $\H$. We also use these results to define new distance covariances, which can be used to measure the dependence between the marginals of a two dimensional distribution of $\H^2$ without existing moments.
On the basis of the new distances we develop statistical inference for Hilbert space valued data, which does not require any moment assumptions. As a consequence, our methods are robust with respect to heavy tails in finite dimensional data. In particular, we consider the problem of comparing the distributions of two samples and the problem of testing for independence and construct new minimax optimal tests for the corresponding hypotheses. We also develop aggregated (with respect to the reference measure) procedures for power enhancement and investigate the finite-sample properties by means of a simulation study.
Submitted 18 March, 2024;
originally announced March 2024.
-
Balancing the edge effect and dimension of spectral spatial statistics under irregular sampling with applications to isotropy testing
Authors:
Theresa Eckle,
Anne van Delft,
Holger Dette
Abstract:
We investigate distributional properties of a class of spectral spatial statistics under irregular sampling of a random field that is defined on $\mathbb{R}^d$, and use this to obtain a test for isotropy. Within this context, edge effects are well known to create a bias in classical estimators commonly encountered in the analysis of spatial data. This bias increases with dimension $d$ and, for $d>1$, can become non-negligible in the limiting distribution of such statistics to the extent that a nondegenerate distribution does not exist. We provide a general theory for a class of (integrated) spectral statistics that makes it possible to 1) significantly reduce this bias and 2) derive asymptotically Gaussian limits for $d \le 3$ for appropriately tapered versions of such statistics. We use this to address some crucial gaps in the literature, and demonstrate that tapering with a sufficiently smooth function is necessary to achieve such results. Our findings specifically shed new light on a recent result in Subba Rao (2018a). Our theory is then used to propose a novel test for isotropy. In contrast to most of the literature, which validates this assumption on a finite number of spatial locations (or a finite number of Fourier frequencies), we develop a test for isotropy on the full spatial domain by means of its characterization in the frequency domain. More precisely, we derive an explicit expression for the minimum $L^2$-distance between the spectral density of the random field and its best approximation by a spectral density of an isotropic process. We prove asymptotic normality of an estimator of this quantity in the mixed increasing domain framework and use this result to derive an asymptotic level $α$-test.
Submitted 15 January, 2024;
originally announced January 2024.
-
Multiple change point detection in functional data with applications to biomechanical fatigue data
Authors:
Patrick Bastian,
Rupsa Basu,
Holger Dette
Abstract:
Injuries to the lower extremity joints are often debilitating, particularly for professional athletes. Understanding the onset of stressful conditions on these joints is therefore important in order to ensure prevention of injuries as well as individualised training for enhanced athletic performance. We study the biomechanical joint angles from the hip, knee and ankle for runners who are experiencing fatigue. The data is cyclic in nature and densely collected by body worn sensors, which makes it ideal to work with in the functional data analysis (FDA) framework.
We develop a new method for multiple change point detection for functional data, which improves the state of the art with respect to at least two novel aspects. First, the curves are compared with respect to their maximum absolute deviation, which leads to a better interpretation of local changes in the functional data compared to classical $L^2$-approaches. Secondly, as slight aberrations are often to be expected in human movement data, our method will not detect arbitrarily small changes but looks for relevant changes, where the maximum absolute deviation between the curves exceeds a specified threshold, say $Δ>0$. We recover multiple changes in a long functional time series of biomechanical knee angle data, which are larger than the desired threshold $Δ$, allowing us to identify changes purely due to fatigue. In this work, we analyse data from both a controlled indoor and an uncontrolled outdoor (marathon) setting.
Submitted 24 April, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Testing for equivalence of pre-trends in Difference-in-Differences estimation
Authors:
Holger Dette,
Martin Schumann
Abstract:
The plausibility of the ``parallel trends assumption'' in Difference-in-Differences estimation is usually assessed by a test of the null hypothesis that the difference between the average outcomes of both groups is constant over time before the treatment. However, failure to reject the null hypothesis does not imply the absence of differences in time trends between both groups. We provide equivalence tests that allow researchers to find evidence in favor of the parallel trends assumption and thus increase the credibility of their treatment effect estimates. While we motivate our tests in the standard two-way fixed effects model, we discuss simple extensions to settings in which treatment adoption is staggered over time.
Submitted 24 October, 2023;
originally announced October 2023.
-
A Simple Bootstrap for Chatterjee's Rank Correlation
Authors:
Holger Dette,
Marius Kroll
Abstract:
We prove that an $m$ out of $n$ bootstrap procedure for Chatterjee's rank correlation is consistent whenever asymptotic normality of Chatterjee's rank correlation can be established. In particular, we prove that $m$ out of $n$ bootstrap works for continuous as well as for discrete data with independent coordinates; furthermore, simulations indicate that it also performs well for discrete data with dependent coordinates, and that it outperforms alternative estimation methods. Consistency of the bootstrap is proved in the Kolmogorov as well as in the Wasserstein distance.
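A minimal sketch of how an $m$ out of $n$ bootstrap for Chatterjee's rank correlation might look in practice. It is not taken from the paper: the function names, the centering of the bootstrap statistic at the full-sample estimate, and the choice $m=\lfloor\sqrt{n}\rfloor$ in the usage lines are illustrative assumptions. The estimator uses Chatterjee's general formula that allows ties, since resampling with replacement produces ties even for continuous data.

```python
import numpy as np

def chatterjee_xi(x, y, rng):
    """Chatterjee's rank correlation, general formula that allows ties."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    # Sort by x; ties in x are broken at random via a prior shuffle.
    perm = rng.permutation(n)
    order = perm[np.argsort(x[perm], kind="stable")]
    y_sorted = y[order]
    y_ranked = np.sort(y)
    # r_i = #{j : y_j <= y_i} and l_i = #{j : y_j >= y_i}
    r = np.searchsorted(y_ranked, y_sorted, side="right")
    l = n - np.searchsorted(y_ranked, y_sorted, side="left")
    return 1.0 - n * np.sum(np.abs(np.diff(r))) / (2.0 * np.sum(l * (n - l)))

def m_out_of_n_bootstrap(x, y, m, n_boot, rng):
    """Return the full-sample estimate and n_boot rescaled bootstrap deviations.

    Assumption (hypothetical, for illustration): the bootstrap statistic is
    sqrt(m) * (xi*_m - xi_n), meant to mimic sqrt(n) * (xi_n - xi).
    """
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    xi_n = chatterjee_xi(x, y, rng)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=m)  # draw m of the n pairs with replacement
        stats[b] = np.sqrt(m) * (chatterjee_xi(x[idx], y[idx], rng) - xi_n)
    return xi_n, stats

# Usage on simulated data (all settings are illustrative).
rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal(n)
y = x + 0.5 * rng.standard_normal(n)
xi_n, stats = m_out_of_n_bootstrap(x, y, m=int(np.sqrt(n)), n_boot=200, rng=rng)
lo, hi = np.quantile(stats, [0.025, 0.975])
print(xi_n, xi_n - hi / np.sqrt(n), xi_n - lo / np.sqrt(n))
```

The last line prints the estimate together with a basic 95% bootstrap interval. How $m$ should grow with $n$ is left open here and would have to follow the conditions in the paper.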
Submitted 8 March, 2024; v1 submitted 2 August, 2023;
originally announced August 2023.
-
A CLT for the difference of eigenvalue statistics of sample covariance matrices
Authors:
Nina Dörnemann,
Holger Dette
Abstract:
In the case where the dimension of the data grows at the same rate as the sample size we prove a central limit theorem for the difference of a linear spectral statistic of the sample covariance and a linear spectral statistic of the matrix that is obtained from the sample covariance matrix by deleting a column and the corresponding row. Unlike previous works, we require neither that the population covariance matrix is diagonal nor that moments of all orders exist. Our proof methodology incorporates subtle enhancements to existing strategies, which meet the challenges introduced by determining the mean and covariance structure for the difference of two such eigenvalue statistics. Moreover, we also establish the asymptotic independence of the difference-type spectral statistic and the usual linear spectral statistic of sample covariance matrices.
Submitted 16 June, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Testing equivalence of multinomial distributions -- a constrained bootstrap approach
Authors:
Patrick Bastian,
Holger Dette,
Lukas Koletzko
Abstract:
In this paper we develop a novel bootstrap test for the comparison of two multinomial distributions. The two distributions are called {\it equivalent} or {\it similar} if a norm of the difference between the class probabilities is smaller than a given threshold. In contrast to most of the literature our approach does not require differentiability of the norm and is in particular applicable for the maximum- and $L^1$-norm.
Submitted 15 May, 2023;
originally announced May 2023.
-
A reinforced learning approach to optimal design under model uncertainty
Authors:
Mingyao Ai,
Holger Dette,
Zhengfu Liu,
Jun Yu
Abstract:
Optimal designs are usually model-dependent and likely to be sub-optimal if the postulated model is not correctly specified. In practice, it is common that a researcher has a list of candidate models at hand and a design has to be found that is efficient for selecting the true model among the competing candidates and is also efficient (optimal, if possible) for estimating the parameters of the true model. In this article, we use a reinforced learning approach to address this problem. We develop a sequential algorithm that generates a sequence of designs which have, asymptotically as the number of stages increases, the same efficiency for estimating the parameters in the true model as an optimal design if the true model had been correctly specified in advance. A lower bound is established to quantify the relative efficiency between such a design and an optimal design for the true model in finite stages. Moreover, the resulting designs are also efficient for discriminating between the true model and other rival models from the candidate list. Some connections with other state-of-the-art algorithms for model discrimination and parameter estimation are discussed and the methodology is illustrated by a small simulation study.
Submitted 28 March, 2023;
originally announced March 2023.
-
Comparing regression curves -- an $L^1$-point of view
Authors:
Patrick Bastian,
Holger Dette,
Lukas Koletzko,
Kathrin Möllenhoff
Abstract:
In this paper we compare two regression curves by measuring their difference by the area between the two curves, represented by their $L^1$-distance. We develop asymptotic confidence intervals for this measure and statistical tests to investigate the similarity/equivalence of the two curves. Bootstrap methodology specifically designed for equivalence testing is developed to obtain procedures with good finite sample properties and its consistency is rigorously proved. The finite sample properties are investigated by means of a small simulation study.
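As a small illustration of the distance measure itself, the area between two curves can be approximated numerically. The sketch below assumes two known functions on a fixed interval and a plain trapezoidal rule. The helper name `l1_distance` and the example curves are hypothetical, and the confidence intervals and bootstrap tests of the paper are not reproduced.

```python
import numpy as np

def l1_distance(m1, m2, lower, upper, num_points=10_001):
    """Approximate the L^1-distance between m1 and m2 on [lower, upper]."""
    grid = np.linspace(lower, upper, num_points)
    gap = np.abs(m1(grid) - m2(grid))  # pointwise absolute difference
    # Trapezoidal rule on the grid.
    return float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(grid)))

# Example: the area between x and x^2 on [0, 1] is exactly 1/6.
area = l1_distance(lambda x: x, lambda x: x**2, 0.0, 1.0)
print(area)
```

In an application the two functions would be replaced by fitted regression curves, and an equivalence test would compare the resulting distance with a prespecified threshold.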
Submitted 2 February, 2023;
originally announced February 2023.
-
Testing separability for continuous functional data
Authors:
Holger Dette,
Gauthier Dierickx,
Tim Kutta
Abstract:
Analyzing the covariance structure of data is a fundamental task of statistics. While this task is simple for low-dimensional observations, it becomes challenging for more intricate objects, such as multivariate functions. Here, the covariance can be so complex that just saving a non-parametric estimate is impractical and structural assumptions are necessary to tame the model. One popular assumption for space-time data is separability of the covariance into purely spatial and temporal factors. In this paper, we present a new test for separability in the context of dependent functional time series. While most of the related work studies functional data in a Hilbert space of square integrable functions, we model the observations as objects in the space of continuous functions equipped with the supremum norm. We argue that this (mathematically challenging) setup enhances interpretability for users and is more in line with practical preprocessing.
Our test statistic measures the maximal deviation between the estimated covariance kernel and a separable approximation. Critical values are obtained by a non-standard multiplier bootstrap for dependent data. We prove the statistical validity of our approach and demonstrate its practicability in a simulation study and a data example.
Submitted 11 January, 2023;
originally announced January 2023.
-
Fluctuations of the diagonal entries of a large sample precision matrix
Authors:
Nina Dörnemann,
Holger Dette
Abstract:
For a given $p\times n$ data matrix $\mathbf{X}_n$ with i.i.d. centered entries and a population covariance matrix $\mathbf{\Sigma}$, the corresponding sample precision matrix $\hat{\mathbf{\Sigma}}^{-1}$ is defined as the inverse of the sample covariance matrix $\hat{\mathbf{\Sigma}} = (1/n) \mathbf{\Sigma}^{1/2} \mathbf{X}_n\mathbf{X}_n^\top \mathbf{\Sigma}^{1/2}$. We determine the joint distribution of a vector of diagonal entries of the matrix $\hat{\mathbf{\Sigma}}^{-1}$ in the situation where $p_n=p< n$ and $p/n \to y \in [0,1)$ for $n\to\infty$ and $\mathbf{\Sigma}$ is a diagonal matrix. Remarkably, our results cover both the case where the dimension is negligible in comparison to the sample size and the case where it is of the same magnitude. Our approach is based on a QR-decomposition of the data matrix, yielding a connection to random quadratic forms and allowing the application of a central limit theorem for martingale difference schemes. Moreover, we discuss an interesting connection to linear spectral statistics of the sample covariance matrix. More precisely, the logarithmic diagonal entry of the sample precision matrix can be interpreted as a difference of two highly dependent linear spectral statistics of $\hat{\mathbf{\Sigma}}$ and a submatrix of $\hat{\mathbf{\Sigma}}$. This difference of spectral statistics fluctuates on a much smaller scale than each single statistic.
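The connection to linear spectral statistics mentioned at the end of the abstract can be checked numerically. By the cofactor formula for the inverse, the $q$-th diagonal entry of the precision matrix equals the determinant of the covariance matrix with row and column $q$ deleted, divided by the full determinant, and each log-determinant is a sum of logarithms of eigenvalues. The sketch below is a sanity check under simplifying assumptions (identity population covariance, Gaussian entries, hypothetical function name), not the paper's analysis.

```python
import numpy as np

def log_precision_diagonal_two_ways(sample_cov, q):
    """Compute log of the q-th diagonal precision entry directly and via log-determinants."""
    precision = np.linalg.inv(sample_cov)
    direct = np.log(precision[q, q])
    # Delete row q and column q of the sample covariance matrix.
    sub = np.delete(np.delete(sample_cov, q, axis=0), q, axis=1)
    _, logdet_full = np.linalg.slogdet(sample_cov)
    _, logdet_sub = np.linalg.slogdet(sub)
    # log det = sum of log-eigenvalues, so this is a difference of two spectral statistics.
    return direct, logdet_sub - logdet_full

rng = np.random.default_rng(0)
p, n = 50, 200                    # illustrative sizes with p < n
X = rng.standard_normal((p, n))   # assumption: identity population covariance
S = X @ X.T / n                   # sample covariance matrix
direct, via_lss = log_precision_diagonal_two_ways(S, q=0)
print(direct, via_lss)
```

The two printed numbers agree up to floating-point error.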
Submitted 20 December, 2022; v1 submitted 1 November, 2022;
originally announced November 2022.
-
Testing for practically significant dependencies in high dimensions via bootstrapping maxima of U-statistics
Authors:
Patrick Bastian,
Holger Dette,
Johannes Heiny
Abstract:
This paper takes a different look at the problem of testing the mutual independence of the components of a high-dimensional vector. Instead of testing if all pairwise associations (e.g. all pairwise Kendall's $τ$) between the components vanish, we are interested in the (null)-hypothesis that all pairwise associations do not exceed a certain threshold in absolute value. The consideration of these hypotheses is motivated by the observation that in the high-dimensional regime, it is rare, and perhaps impossible, to have a null hypothesis that can be exactly modeled by assuming that all pairwise associations are precisely equal to zero.
The formulation of the null hypothesis as a composite hypothesis makes the problem of constructing tests non-standard and in this paper we provide a solution for a broad class of dependence measures, which can be estimated by $U$-statistics. In particular we develop an asymptotic and a bootstrap level $α$-test for the new hypotheses in the high-dimensional regime. We also prove that the new tests are minimax-optimal and investigate their finite sample properties by means of a small simulation study and a data example.
Submitted 12 February, 2024; v1 submitted 31 October, 2022;
originally announced October 2022.
-
A general framework to quantify deviations from structural assumptions in the analysis of nonstationary function-valued processes
Authors:
Anne van Delft,
Holger Dette
Abstract:
We present a general theory to quantify the uncertainty from imposing structural assumptions on the second-order structure of nonstationary Hilbert space-valued processes, which can be measured via functionals of time-dependent spectral density operators. The second-order dynamics are well known to be elements of the space of trace-class operators; the latter is a Banach space of type 1 and of cotype 2, which makes the development of statistical inference tools more challenging. A part of our contribution is to obtain a weak invariance principle as well as concentration inequalities for (functionals of) the sequential time-varying spectral density operator. In addition, we introduce deviation measures in the nonstationary context, and derive estimators that are asymptotically pivotal. We then apply this framework and propose statistical methodology to investigate the validity of structural assumptions for nonstationary response surface data, such as low-rank assumptions in the context of time-varying dynamic fPCA and principal separable component analysis, deviations from stationarity with respect to the square root distance, and deviations from zero functional canonical coherency.
Submitted 16 September, 2023; v1 submitted 22 August, 2022;
originally announced August 2022.
-
Validating Approximate Slope Homogeneity in Large Panels
Authors:
Tim Kutta,
Holger Dette
Abstract:
Statistical inference for large data panels is omnipresent in modern economic applications. An important benefit of panel analysis is the possibility to reduce noise and thus to guarantee stable inference by intersectional pooling. However, it is well known that pooling can lead to a biased analysis if individual heterogeneity is too strong. In classical linear panel models, this trade-off concerns the homogeneity of slope parameters, and a large body of tests has been developed to validate this assumption. Yet, such tests can detect inconsiderable deviations from slope homogeneity, discouraging pooling, even when practically beneficial. In order to permit a more pragmatic analysis, which allows pooling when individual heterogeneity is sufficiently small, we present in this paper the concept of approximate slope homogeneity. We develop an asymptotic level $α$ test for this hypothesis that is uniformly consistent against classes of local alternatives. In contrast to existing methods, which focus on exact slope homogeneity and are usually sensitive to dependence in the data, the proposed test statistic is (asymptotically) pivotal and applicable under simultaneous intersectional and temporal dependence. Moreover, it can accommodate the realistic case of panels with large intersections. A simulation study and a data example underline the usefulness of our approach.
Submitted 13 December, 2022; v1 submitted 4 May, 2022;
originally announced May 2022.
-
Detecting relevant changes in the spatiotemporal mean function
Authors:
Holger Dette,
Pascal Quanz
Abstract:
For a spatiotemporal process $\{X_j(s,t) | ~s \in S~,~t \in T \}_{j =1, \ldots , n} $, where $S$ denotes the set of spatial locations and $T$ the time domain, we consider the problem of testing for a change in the sequence of mean functions. In contrast to most of the literature we are not interested in arbitrarily small changes, but only in changes with a norm exceeding a given threshold. Asymptotically distribution-free tests are proposed, which do not require the estimation of the long-run spatiotemporal covariance structure. In particular we consider a fully functional approach and a test based on the cumulative sum paradigm, investigate the large sample properties of the corresponding test statistics and study their finite sample properties by means of a simulation study.
Submitted 9 March, 2022;
originally announced March 2022.
-
An RKHS approach for pivotal inference in functional linear regression
Authors:
Holger Dette,
Jiajun Tang
Abstract:
We develop methodology for testing hypotheses regarding the slope function in functional linear regression for time series via a reproducing kernel Hilbert space approach. In contrast to most of the literature, which considers tests for the exact nullity of the slope function, we are interested in the null hypothesis that the slope function vanishes only approximately, where deviations are measured with respect to the $L^2$-norm. An asymptotically pivotal test is proposed, which does not require the estimation of nuisance parameters and long-run covariances. The key technical tools to prove the validity of our approach include a uniform Bahadur representation and a weak invariance principle for a sequential process of estimates of the slope function. Both scalar-on-function and function-on-function linear regression are considered and finite-sample methods for implementing our methodology are provided. We also illustrate the potential of our methods by means of a small simulation study and a data example.
Submitted 16 February, 2022;
originally announced February 2022.
-
Rearranged dependence measures
Authors:
Christopher Strothmann,
Holger Dette,
Karl Friedrich Siburg
Abstract:
Most of the popular dependence measures for two random variables $X$ and $Y$ (such as Pearson's and Spearman's correlation, Kendall's $τ$ and Gini's $γ$) vanish whenever $X$ and $Y$ are independent. However, neither does a vanishing dependence measure necessarily imply independence, nor does a measure equal to 1 imply that one variable is a measurable function of the other. Yet, both properties are natural properties for a convincing dependence measure. In this paper, we present a general approach to transforming a given dependence measure into a new one which exactly characterizes independence as well as functional dependence. Our approach uses the concept of monotone rearrangements as introduced by Hardy and Littlewood and is applicable to a broad class of measures. In particular, we are able to define a rearranged Spearman's $ρ$ and a rearranged Kendall's $τ$ which do attain the value $0$ if and only if both variables are independent, and the value $1$ if and only if one variable is a measurable function of the other. We also present simple estimators for the rearranged dependence measures, prove their consistency and illustrate their finite sample properties by means of a simulation study and a data example.
Submitted 26 February, 2023; v1 submitted 10 January, 2022;
originally announced January 2022.
-
The integrated copula spectrum
Authors:
Yuichi Goto,
Tobias Kley,
Ria Van Hecke,
Stanislav Volgushev,
Holger Dette,
Marc Hallin
Abstract:
Frequency domain methods form a ubiquitous part of the statistical toolbox for time series analysis. In recent years, considerable interest has been given to the development of new spectral methodology and tools capturing dynamics in the entire joint distributions and thus avoiding the limitations of classical, $L^2$-based spectral methods. Most of the spectral concepts proposed in that literature suffer from one major drawback, though: their estimation requires the choice of a smoothing parameter, which has a considerable impact on estimation quality and poses challenges for statistical inference. In this paper, associated with the concept of copula-based spectrum, we introduce the notion of copula spectral distribution function or integrated copula spectrum. This integrated copula spectrum retains the advantages of copula-based spectra but can be estimated without the need for smoothing parameters. We provide such estimators, along with a thorough theoretical analysis, based on a functional central limit theorem, of their asymptotic properties. We leverage these results to test various hypotheses that cannot be addressed by classical spectral methods, such as the lack of time-reversibility or asymmetry in tail dynamics.
Submitted 13 December, 2021;
originally announced December 2021.
-
Multivariate Mean Comparison under Differential Privacy
Authors:
Martin Dunsche,
Tim Kutta,
Holger Dette
Abstract:
The comparison of multivariate population means is a central task of statistical inference. While statistical theory provides a variety of analysis tools, they usually do not protect individuals' privacy. This knowledge can create incentives for participants in a study to conceal their true data (especially for outliers), which might result in a distorted analysis. In this paper we address this problem by developing a hypothesis test for multivariate mean comparisons that guarantees differential privacy to users. The test statistic is based on the popular Hotelling's $t^2$-statistic, which has a natural interpretation in terms of the Mahalanobis distance. In order to control the type-1-error, we present a bootstrap algorithm under differential privacy that provably yields a reliable test decision. In an empirical study we demonstrate the applicability of this approach.
Submitted 15 October, 2021;
originally announced October 2021.
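As a rough illustration of the Mahalanobis interpretation mentioned in the abstract above, the following sketch computes the classical (non-private) two-sample Hotelling statistic as a scaled squared Mahalanobis distance between the sample means. It is not the differentially private test of the paper: no noise is added and no bootstrap is performed. All function names are hypothetical, and the pooled covariance estimate is an assumption of this sketch.

```python
def mean_vector(sample):
    """Componentwise mean of a list of equal-length observation vectors."""
    n = len(sample)
    return [sum(row[j] for row in sample) / n for j in range(len(sample[0]))]


def scatter_matrix(sample, mean):
    """Sum of outer products of the centered observations."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in sample:
        c = [row[j] - mean[j] for j in range(p)]
        for j in range(p):
            for k in range(p):
                s[j][k] += c[j] * c[k]
    return s


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    m = [list(a[i]) + [b[i]] for i in range(p)]  # augmented matrix
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            factor = m[r][col] / m[col][col]
            for k in range(col, p + 1):
                m[r][k] -= factor * m[col][k]
    z = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(m[i][k] * z[k] for k in range(i + 1, p))
        z[i] = (m[i][p] - tail) / m[i][i]
    return z


def hotelling_t2(x, y):
    """Classical two-sample statistic: n1*n2/(n1+n2) * d' S^{-1} d,
    with d the difference of sample means and S the pooled covariance."""
    n1, n2 = len(x), len(y)
    mx, my = mean_vector(x), mean_vector(y)
    sx, sy = scatter_matrix(x, mx), scatter_matrix(y, my)
    p = len(mx)
    pooled = [[(sx[j][k] + sy[j][k]) / (n1 + n2 - 2) for k in range(p)]
              for j in range(p)]
    d = [mx[j] - my[j] for j in range(p)]
    z = solve(pooled, d)  # z = S^{-1} d
    return n1 * n2 / (n1 + n2) * sum(d[j] * z[j] for j in range(p))
```

The quadratic form `d' S^{-1} d` is the squared Mahalanobis distance between the two sample means, which is the interpretation the abstract refers to.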
-
Statistical inference for function-on-function linear regression
Authors:
Holger Dette,
Jiajun Tang
Abstract:
Function-on-function linear regression is important for understanding the relationship between the response and the predictor that are both functions. In this article, we propose a reproducing kernel Hilbert space approach to function-on-function linear regression via penalised least squares, regularized by the thin-plate spline smoothness penalty. The minimax optimal convergence rate of our estimator of the coefficient function is studied. We derive the Bahadur representation, which allows us to propose statistical inference methods using bootstrap and the convergence of Banach-valued random variables in the sup-norm. We illustrate our method and verify our theoretical results via simulated data experiments and a real data example.
Submitted 28 September, 2021;
originally announced September 2021.
-
Confidence surfaces for the mean of locally stationary functional time series
Authors:
Holger Dette,
Weichi Wu
Abstract:
The problem of constructing a simultaneous confidence band for the mean function of a locally stationary functional time series $ \{ X_{i,n} (t) \}_{i = 1, \ldots, n}$ is challenging as these bands cannot be built on classical limit theory. On the one hand, for a fixed argument $t$ of the functions $ X_{i,n}$, the maximum absolute deviation between an estimate and the time dependent regression function exhibits (after appropriate standardization) an extreme value behaviour with a Gumbel distribution in the limit. On the other hand, for stationary functional data, simultaneous confidence bands can be built on classical central limit theorems for Banach space valued random variables and the limit distribution of the maximum absolute deviation is given by the sup-norm of a Gaussian process. As both limit theorems have different rates of convergence, they are not compatible, and a weak convergence result, which could be used for the construction of a confidence surface in the locally stationary case, does not exist.
In this paper we propose new bootstrap methodology to construct a simultaneous confidence band for the mean function of a locally stationary functional time series, which is motivated by a Gaussian approximation for the maximum absolute deviation. We prove the validity of our approach by asymptotic theory, demonstrate good finite sample properties by means of a simulation study and illustrate its applicability analyzing a data example.
Submitted 6 July, 2022; v1 submitted 8 September, 2021;
originally announced September 2021.
-
Statistical Quantification of Differential Privacy: A Local Approach
Authors:
Önder Askin,
Tim Kutta,
Holger Dette
Abstract:
In this work, we introduce a new approach for statistical quantification of differential privacy in a black box setting. We present estimators and confidence intervals for the optimal privacy parameter of a randomized algorithm $A$, as well as other key variables (such as the "data-centric privacy level"). Our estimators are based on a local characterization of privacy and in contrast to the related literature avoid the process of "event selection" - a major obstacle to privacy validation. This makes our methods easy to implement and user-friendly. We show fast convergence rates of the estimators and asymptotic validity of the confidence intervals. An experimental study of various algorithms confirms the efficacy of our approach.
Submitted 2 May, 2022; v1 submitted 21 August, 2021;
originally announced August 2021.
-
Statistical inference for the slope parameter in functional linear regression
Authors:
Tim Kutta,
Gauthier Dierickx,
Holger Dette
Abstract:
In this paper we consider the linear regression model $Y =S X+\varepsilon $ with functional regressors and responses. We develop new inference tools to quantify deviations of the true slope $S$ from a hypothesized operator $S_0$ with respect to the Hilbert--Schmidt norm $\| S- S_0\|^2$, as well as the prediction error $\mathbb{E} \| S X - S_0 X \|^2$. Our analysis is applicable to functional time series and based on asymptotically pivotal statistics. This makes it particularly user friendly, because it avoids the choice of tuning parameters inherent in long-run variance estimation or bootstrap of dependent data. We also discuss two sample problems as well as change point detection. Finite sample properties are investigated by means of a simulation study.
Mathematically our approach is based on a sequential version of the popular spectral cut-off estimator $\hat S_N$ for $S$. It is well-known that the $L^2$-minimax rates in the functional regression model, both in estimation and prediction, are substantially slower than $1/\sqrt{N}$ (where $N$ denotes the sample size) and that standard estimators for $S$ do not converge weakly to non-degenerate limits. However, we demonstrate that simple plug-in estimators - such as $\| \hat S_N - S_0 \|^2$ for $\| S - S_0 \|^2$ - are $\sqrt{N}$-consistent and their sequential versions satisfy weak invariance principles. These results are based on the smoothing effect of $L^2$-norms and established by a new proof technique, the smoothness shift, which has potential applications in other statistical inverse problems.
Submitted 16 August, 2021;
originally announced August 2021.
-
Linear spectral statistics of sequential sample covariance matrices
Authors:
Nina Dörnemann,
Holger Dette
Abstract:
Let $\mathbf{x}_1, \ldots, \mathbf{x}_n$ be independent $p$-dimensional vectors with independent complex or real valued entries such that $\mathbb{E} [\mathbf{x}_i] = \mathbf{0}$, ${\rm Var } (\mathbf{x}_i) = \mathbf{I}_p$, $i=1, \ldots,n$, let $\mathbf{T }_n$ be a $p \times p$ Hermitian nonnegative definite matrix and $f $ be a given function. We prove that an appropriately standardized version of the stochastic process $ \big ( {\operatorname{tr}} ( f(\mathbf{B}_{n,t}) ) \big )_{t \in [t_0, 1]} $ corresponding to a linear spectral statistic of the sequential empirical covariance estimator $$ \big ( \mathbf{B}_{n,t} \big )_{t\in [ t_0 , 1]} = \Big ( \frac{1}{n} \sum_{i=1}^{\lfloor n t \rfloor} \mathbf{T }^{1/2}_n \mathbf{x}_i \mathbf{x}_i ^\star \mathbf{T }^{1/2}_n \Big)_{t\in [ t_0 , 1]} $$ converges weakly to a non-standard Gaussian process for $n,p\to\infty$. As an application we use these results to develop a novel approach for monitoring the sphericity assumption in a high-dimensional framework, even if the dimension of the underlying data is larger than the sample size.
Submitted 23 July, 2021; v1 submitted 21 July, 2021;
originally announced July 2021.
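The sequential estimator $\mathbf{B}_{n,t}$ defined in the abstract above can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not taken from the paper: the data are real valued, $\mathbf{T}_n$ is the identity matrix, and $f$ is restricted to $f(x) = x$ or $f(x) = x^2$ so that no eigenvalue computation is needed. The centering and standardization required for the weak convergence result are omitted, and the function names are hypothetical.

```python
import math


def sequential_covariance(xs, t):
    """B_{n,t} = (1/n) * sum over i <= floor(n*t) of x_i x_i^T (T_n = I, real data)."""
    n = len(xs)
    p = len(xs[0])
    m = math.floor(n * t)  # number of observations available at "time" t
    b = [[0.0] * p for _ in range(p)]
    for x in xs[:m]:
        for j in range(p):
            for k in range(p):
                b[j][k] += x[j] * x[k] / n
    return b


def linear_spectral_statistic(b, power):
    """tr(f(B)) for f(x) = x (power=1) or f(x) = x^2 (power=2), B symmetric."""
    if power == 1:
        return sum(b[j][j] for j in range(len(b)))
    if power == 2:
        # For symmetric B, tr(B^2) equals the sum of squared entries.
        return sum(v * v for row in b for v in row)
    raise ValueError("only power 1 or 2 is supported in this sketch")
```

Evaluating `linear_spectral_statistic(sequential_covariance(xs, t), 2)` on a grid of values `t` in `[t_0, 1]` traces out one (unstandardized) path of the process discussed in the abstract.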
-
Asymptotic equivalence for nonparametric regression with dependent errors: Gauss-Markov processes
Authors:
Holger Dette,
Martin Kroll
Abstract:
For the class of Gauss-Markov processes we study the problem of asymptotic equivalence of the nonparametric regression model with errors given by the increments of the process and the continuous time model, where a whole path of a sum of a deterministic signal and the Gauss-Markov process can be observed. In particular we provide sufficient conditions such that asymptotic equivalence of the two models holds for functions from a given class, and we verify these for the special cases of Sobolev ellipsoids and Hölder classes with smoothness index $> 1/2$ under mild assumptions on the Gauss-Markov process at hand. To derive these results, we develop an explicit characterization of the reproducing kernel Hilbert space associated with the Gauss-Markov process, that hinges on a characterization of such processes by a property of the corresponding covariance kernel introduced by Doob. In order to demonstrate that the given assumptions on the Gauss-Markov process are in some sense sharp we also show that asymptotic equivalence fails to hold for the special case of Brownian bridge. Our findings demonstrate that the well-known asymptotic equivalence of the Gaussian white noise model and the nonparametric regression model with i.i.d. standard normal errors can be extended to a result treating general Gauss-Markov noises in a unified manner.
Submitted 24 October, 2021; v1 submitted 19 April, 2021;
originally announced April 2021.
-
Nonparametric and high-dimensional functional graphical models
Authors:
Eftychia Solea,
Holger Dette
Abstract:
We consider the problem of constructing nonparametric undirected graphical models for high-dimensional functional data. Most existing statistical methods in this context assume either a Gaussian distribution on the vertices or linear conditional means. In this article we provide a more flexible model which relaxes the linearity assumption by replacing it by an arbitrary additive form. The use of functional principal components offers an estimation strategy that uses a group lasso penalty to estimate the relevant edges of the graph. We establish statistical guarantees for the resulting estimators, which can be used to prove consistency if the dimension and the number of functional principal components diverge to infinity with the sample size. We also investigate the empirical performance of our method through simulation studies and a real data application.
Submitted 18 March, 2021;
originally announced March 2021.
-
Reproducing kernel Hilbert spaces, polynomials and the classical moment problems
Authors:
Holger Dette,
Anatoly Zhigljavsky
Abstract:
We show that polynomials do not belong to the reproducing kernel Hilbert space of infinitely differentiable translation-invariant kernels whose spectral measures have moments corresponding to a determinate moment problem. Our proof is based on relating this question to the problem of best linear estimation in continuous time one-parameter regression models with a stationary error process defined by the kernel. In particular, we show that the existence of a sequence of estimators with variances converging to $0$ implies that the regression function cannot be an element of the reproducing kernel Hilbert space. This question is then related to the determinacy of the Hamburger moment problem for the spectral measure corresponding to the kernel.
In the literature it was observed that a non-vanishing constant function does not belong to the reproducing kernel Hilbert space associated with the Gaussian kernel (see Corollary 4.44 in Steinwart and Christmann, 2008). Our results provide a unifying view of this phenomenon and show that the mentioned result can be extended for arbitrary polynomials and a broad class of translation-invariant kernels.
Submitted 12 August, 2021; v1 submitted 28 January, 2021;
originally announced January 2021.
-
Optimal designs for comparing regression curves -- dependence within and between groups
Authors:
Kirsten Schorning,
Holger Dette
Abstract:
We consider the problem of designing experiments for the comparison of two regression curves describing the relation between a predictor and a response in two groups, where the data between and within the group may be dependent. In order to derive efficient designs we use results from stochastic analysis to identify the best linear unbiased estimator (BLUE) in a corresponding continuous time model. It is demonstrated that in general simultaneous estimation using the data from both groups yields more precise results than estimation of the parameters separately in the two groups. Using the BLUE from simultaneous estimation, we then construct an efficient linear estimator for finite sample size by minimizing the mean squared error between the optimal solution in the continuous time model and its discrete approximation with respect to the weights (of the linear estimator). Finally, the optimal design points are determined by minimizing the maximal width of a simultaneous confidence band for the difference of the two regression functions. The advantages of the new approach are illustrated by means of a simulation study, where it is shown that the use of the optimal designs yields substantially narrower confidence bands than the application of uniform designs.
Submitted 14 January, 2021;
originally announced January 2021.
-
A note on optimal designs for estimating the slope of a polynomial regression
Authors:
Holger Dette,
Viatcheslav B. Melas,
Petr Shpilev
Abstract:
In this note we consider the optimal design problem for estimating the slope of a polynomial regression with no intercept at a given point, say $z$. In contrast to previous work, which considers symmetric design spaces, we investigate the model on the interval $[0, a]$ and characterize those values of $z$ where an explicit solution of the optimal design is possible.
Submitted 18 September, 2020;
originally announced September 2020.
-
A Portmanteau-type test for detecting serial correlation in locally stationary functional time series
Authors:
Axel Bücher,
Holger Dette,
Florian Heinrichs
Abstract:
The Portmanteau test provides the vanilla method for detecting serial correlations in classical univariate time series analysis. The method is extended to the case of observations from a locally stationary functional time series. Asymptotic critical values are obtained by a suitable block multiplier bootstrap procedure. The test is shown to asymptotically hold its level and to be consistent against general alternatives.
Submitted 15 September, 2020;
originally announced September 2020.
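For orientation, the classical univariate Portmanteau statistic that the abstract above takes as its starting point can be sketched as follows. The sketch shows the Box-Pierce form $Q = n \sum_{h=1}^{H} \hat\rho_h^2$ for a scalar series. It is not the functional, locally stationary version of the paper, and it uses no block multiplier bootstrap. The function names and the choice of the Box-Pierce variant are assumptions of this sketch.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of a scalar series at the given lag."""
    n = len(x)
    mean = sum(x) / n
    gamma0 = sum((v - mean) ** 2 for v in x) / n
    gamma_h = sum((x[t] - mean) * (x[t + lag] - mean)
                  for t in range(n - lag)) / n
    return gamma_h / gamma0


def box_pierce(x, max_lag):
    """Box-Pierce statistic: n times the sum of squared autocorrelations
    at lags 1, ..., max_lag."""
    n = len(x)
    return n * sum(autocorrelation(x, h) ** 2 for h in range(1, max_lag + 1))
```

In the classical i.i.d. setting the statistic is usually compared with a chi-square quantile with `max_lag` degrees of freedom; the abstract indicates that the functional, locally stationary case instead obtains critical values by a bootstrap.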
-
Detecting relevant differences in the covariance operators of functional time series -- a sup-norm approach
Authors:
Holger Dette,
Kevin Kokot
Abstract:
In this paper we propose statistical inference tools for the covariance operators of functional time series in the two sample and change point problem. In contrast to most of the literature the focus of our approach is not testing the null hypothesis of exact equality of the covariance operators. Instead we propose to formulate the null hypotheses in the form that "the distance between the operators is small", where we measure deviations by the sup-norm. We provide powerful bootstrap tests for this type of hypotheses, investigate their asymptotic properties and study their finite sample properties by means of a simulation study.
Submitted 12 June, 2020;
originally announced June 2020.
-
Sequential change point detection in high dimensional time series
Authors:
Josua Gösmann,
Christina Stoehr,
Johannes Heiny,
Holger Dette
Abstract:
Change point detection in high dimensional data has found considerable interest in recent years. Most of the literature either designs methodology for a retrospective analysis, where the whole sample is already available when the statistical inference begins, or considers online detection schemes controlling the average time until a false alarm. This paper takes a different point of view and develops monitoring schemes for the online scenario, where high dimensional data arrives successively and the goal is to detect changes as fast as possible while controlling the probability of a type I error (a false alarm). We develop a sequential procedure capable of detecting changes in the mean vector of a successively observed high dimensional time series with spatial and temporal dependence. The statistical properties of the method are analyzed in the case where both the sample size and the dimension tend to infinity. In this scenario, it is shown that the new monitoring scheme has asymptotic level alpha under the null hypothesis of no change and is consistent under the alternative of a change in at least one component of the high dimensional mean vector. The approach is based on a new type of monitoring scheme for one-dimensional data which turns out to be often more powerful than the usually used CUSUM and Page-CUSUM methods, and the component-wise statistics are aggregated by the maximum statistic. For the analysis of the asymptotic properties of our monitoring scheme we prove that the range of a Brownian motion on a given interval is in the domain of attraction of the Gumbel distribution, which is a result of independent interest in extreme value theory. The finite sample properties of the new methodology are illustrated by means of a simulation study and in the analysis of a data example.
Submitted 15 December, 2020; v1 submitted 31 May, 2020;
originally announced June 2020.
-
A distribution free test for changes in the trend function of locally stationary processes
Authors:
Holger Dette,
Florian Heinrichs
Abstract:
In the common time series model $X_{i,n} = μ(i/n) + \varepsilon_{i,n}$ with non-stationary errors we consider the problem of detecting a significant deviation of the mean function $μ$ from a benchmark $g (μ)$ (such as the initial value $μ(0)$ or the average trend $\int_{0}^{1} μ(t) dt$). The problem is motivated by a more realistic modelling of change point analysis, where one is interested in identifying relevant deviations in a smoothly varying sequence of means $ (μ(i/n))_{i =1,\ldots ,n }$ and cannot assume that the sequence is piecewise constant. A test for this type of hypotheses is developed using an appropriate estimator for the integrated squared deviation of the mean function and the threshold. By a new concept of self-normalization adapted to non-stationary processes an asymptotically pivotal test for the hypothesis of a relevant deviation is constructed. The results are illustrated by means of a simulation study and a data example.
Submitted 22 May, 2020;
originally announced May 2020.
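The quantity behind the relevant-deviation hypothesis in the abstract above, the integrated squared deviation of the mean function from a benchmark, can be made concrete with a small numerical sketch. It assumes a known mean function and takes the average trend $\int_0^1 μ(t) dt$ as benchmark, approximating both integrals by Riemann sums on the grid $i/n$. In the paper the mean function is unknown and must be estimated from noisy data, and the test relies on self-normalization; none of that is reproduced here. The function name is hypothetical.

```python
def integrated_squared_deviation(mu, n):
    """Riemann-sum approximation of the integral over [0, 1] of
    (mu(t) - average trend)^2, evaluated on the grid i/n, i = 1, ..., n."""
    values = [mu(i / n) for i in range(1, n + 1)]
    benchmark = sum(values) / n  # approximates the average trend
    return sum((v - benchmark) ** 2 for v in values) / n
```

For the linear trend `mu = lambda t: t` the exact value is 1/12, and a relevant deviation would be declared when this quantity exceeds a user-chosen threshold.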
-
Efficient tests for bio-equivalence in functional data
Authors:
Holger Dette,
Kevin Kokot
Abstract:
We study the problem of testing the equivalence of functional parameters (such as the mean or variance function) in the two sample functional data problem. In contrast to previous work, which reduces the functional problem to a multiple testing problem for the equivalence of scalar data by comparing the functions at each point, our approach is based on an estimate of a distance measuring the maximum deviation between the two functional parameters. Equivalence is claimed if the estimate for the maximum deviation does not exceed a given threshold. A bootstrap procedure is proposed to obtain quantiles for the distribution of the test statistic and consistency of the corresponding test is proved in the large sample scenario. As the methods proposed here avoid the use of the intersection-union principle they are less conservative and more powerful than the currently available methodology.
Submitted 26 April, 2020;
originally announced April 2020.
-
Pivotal tests for relevant differences in the second order dynamics of functional time series
Authors:
Anne van Delft,
Holger Dette
Abstract:
Motivated by the need to statistically quantify differences between modern (complex) data sets which commonly result as high-resolution measurements of stochastic processes varying over a continuum, we propose novel testing procedures to detect relevant differences between the second order dynamics of two functional time series. In order to take into account the between-function dynamics that characterize this type of functional data, a frequency domain approach is taken. Test statistics are developed to compare differences in the spectral density operators and in the primary modes of variation as encoded in the associated eigenelements. Under mild moment conditions, we show convergence of the underlying statistics to Brownian motions and construct pivotal test statistics. The latter is essential because the nuisance parameters can be unwieldy and their robust estimation infeasible, especially if the two functional time series are dependent. In addition to these novel features, the properties of the tests are robust to any choice of frequency band, which also makes it possible to compare energy contents at a single frequency. The finite sample performance of the tests is verified through a simulation study and illustrated with an application to fMRI data.
Submitted 13 June, 2022; v1 submitted 9 April, 2020;
originally announced April 2020.
-
Quantifying deviations from separability in space-time functional processes
Authors:
Holger Dette,
Gauthier Dierickx,
Tim Kutta
Abstract:
The estimation of covariance operators of spatio-temporal data is in many applications only computationally feasible under simplifying assumptions, such as separability of the covariance into strictly temporal and spatial factors. Powerful tests for this assumption have been proposed in the literature. However, as real-world systems, such as climate data, are notoriously inseparable, validating this assumption by statistical tests seems inherently questionable. In this paper we present an alternative approach: by virtue of separability measures, we quantify how strongly the data's covariance operator diverges from a separable approximation. Confidence intervals localize these measures with statistical guarantees. This method provides users with a flexible tool to weigh the computational gains of a separable model against the associated increase in bias. As separable approximations we consider the established methods of partial traces and partial products, and develop weak convergence principles for the corresponding estimators. Moreover, we also prove such results for estimators of optimal separable approximations, which are arguably of most interest in applications. In particular, we present for the first time statistical inference for this object, which has been confined to estimation previously. Besides confidence intervals, our results encompass tests for approximate separability. All methods proposed in this paper are free of nuisance parameters and require neither computationally expensive resampling procedures nor the estimation of nuisance parameters. A simulation study underlines the advantages of our approach, and its applicability is demonstrated by the investigation of German annual temperature data.
Submitted 26 March, 2020;
originally announced March 2020.
-
Design admissibility and de la Garza phenomenon in multi-factor experiments
Authors:
Holger Dette,
Xin Liu,
Rong-Xian Yue
Abstract:
The determination of an optimal design for a given regression problem is an intricate optimization problem, especially for models with multivariate predictors. Design admissibility and invariance are the main tools to reduce the complexity of the optimization problem and have been successfully applied for models with univariate predictors. In particular, several authors have developed sufficient conditions for the existence of saturated designs in univariate models, where the number of support points of the optimal design equals the number of parameters. These results generalize the celebrated de la Garza phenomenon (de la Garza, 1954), which states that for a polynomial regression model of degree $p-1$ any optimal design can be based on at most $p$ points. This paper provides - for the first time - extensions of these results for models with a multivariate predictor. In particular, we study a geometric characterization of the support points of an optimal design to provide sufficient conditions for the occurrence of the de la Garza phenomenon in models with multivariate predictors and characterize properties of admissible designs in terms of admissibility of designs in conditional univariate regression models.
Submitted 20 March, 2020;
originally announced March 2020.
-
Statistical Inference for High Dimensional Panel Functional Time Series
Authors:
Zhou Zhou,
Holger Dette
Abstract:
In this paper we develop statistical inference tools for high dimensional functional time series. We introduce a new concept of physical dependent processes in the space of square integrable functions, which adopts the idea of basis decomposition of functional data in these spaces, and derive Gaussian and multiplier bootstrap approximations for sums of high dimensional functional time series. These results have numerous important statistical consequences. As examples, we consider the development of joint simultaneous confidence bands for the mean functions and the construction of tests for the hypotheses that the mean functions in the spatial dimension are parallel. The results are illustrated by means of a small simulation study and in the analysis of Canadian temperature data.
Submitted 12 March, 2020;
originally announced March 2020.
-
Are deviations in a gradually varying mean relevant? A testing approach based on sup-norm estimators
Authors:
Axel Bücher,
Holger Dette,
Florian Heinrichs
Abstract:
Classical change point analysis aims at (1) detecting abrupt changes in the mean of a possibly non-stationary time series and at (2) identifying regions where the mean exhibits a piecewise constant behavior. In many applications, however, it is more reasonable to assume that the mean changes gradually in a smooth way. Those gradual changes may either be non-relevant (i.e., small), or relevant for a specific problem at hand, and the present paper presents statistical methodology to detect the latter. More precisely, we consider the common nonparametric regression model $X_{i} = μ(i/n) + \varepsilon_{i}$ with possibly non-stationary errors and propose a test for the null hypothesis that the maximum absolute deviation of the regression function $μ$ from a functional $g(μ)$ (such as the value $μ(0)$ or the integral $\int_{0}^{1} μ(t) dt$) is smaller than a given threshold on a given interval $[x_{0},x_{1}] \subseteq [0,1]$. A test for this type of hypothesis is developed using an appropriate estimator, say $\hat d_{\infty,n}$, for the maximum deviation $d_{\infty} = \sup_{t \in [x_{0},x_{1}]} |μ(t) - g(μ)|$. We derive the limiting distribution of an appropriately standardized version of $\hat d_{\infty,n}$, where the standardization depends on the Lebesgue measure of the set of extremal points of the function $μ(\cdot)-g(μ)$. A refined procedure based on an estimate of this set is developed and its consistency is proved. The results are illustrated by means of a simulation study and a data example.
Submitted 14 February, 2020;
originally announced February 2020.
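As a rough illustration of the quantity studied in the abstract above, the following Python sketch computes a plug-in estimate of the maximum deviation $d_{\infty}$ with $g(μ) = \int_{0}^{1} μ(t) dt$. The Nadaraya-Watson smoother, the Gaussian kernel, the bandwidth, the grid size and the function names are all assumptions made for this sketch; the abstract does not specify the estimator, and the standardization and the test itself are not reproduced here.

```python
import numpy as np

def nw_smooth(x, t_grid, bandwidth):
    """Nadaraya-Watson estimate of mu on t_grid for X_i = mu(i/n) + eps_i.

    Assumption: Gaussian kernel on the fixed design points i/n.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    design = np.arange(1, n + 1) / n
    u = (t_grid[:, None] - design[None, :]) / bandwidth
    weights = np.exp(-0.5 * u**2)
    return (weights @ x) / weights.sum(axis=1)

def max_deviation(x, x0, x1, bandwidth, grid_size=201):
    """Plug-in estimate of sup_{t in [x0, x1]} |mu(t) - g(mu)|,
    with g(mu) taken to be the integral of mu over [0, 1]."""
    grid = np.linspace(0.0, 1.0, grid_size)
    mu_hat = nw_smooth(x, grid, bandwidth)
    # Trapezoidal rule for the integral of the estimated mean over [0, 1].
    g_hat = np.sum((mu_hat[1:] + mu_hat[:-1]) / 2.0 * np.diff(grid))
    inside = (grid >= x0) & (grid <= x1)
    return np.max(np.abs(mu_hat[inside] - g_hat))

# Hypothetical usage on simulated data with a smooth, gradually varying mean.
rng = np.random.default_rng(0)
n = 500
t = np.arange(1, n + 1) / n
x = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(n)
print(max_deviation(x, 0.0, 1.0, bandwidth=0.05))
```

A grid maximum is used in place of the supremum, so the result depends on the grid size and bandwidth; both would need to be chosen with care in any real analysis.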
-
Prediction in locally stationary time series
Authors:
Holger Dette,
Weichi Wu
Abstract:
We develop an estimator for the high-dimensional covariance matrix of a locally stationary process with a smoothly varying trend and use this statistic to derive consistent predictors in non-stationary time series. In contrast to the currently available methods for this problem, the predictor developed here does not rely on fitting an autoregressive model and does not require a vanishing trend. The finite sample properties of the new methodology are illustrated by means of a simulation study and a financial indices study.
Submitted 3 January, 2020; v1 submitted 2 January, 2020;
originally announced January 2020.
-
Detecting structural breaks in eigensystems of functional time series
Authors:
Holger Dette,
Tim Kutta
Abstract:
Detecting structural changes in functional data is a prominent topic in statistical literature. However, not all trends in the data are important in applications, but only those of large enough influence. In this paper we address the problem of identifying relevant changes in the eigenfunctions and eigenvalues of covariance kernels of $L^2[0,1]$-valued time series. By self-normalization techniques we derive pivotal, asymptotically consistent tests for relevant changes in these characteristics of the second order structure and investigate their finite sample properties in a simulation study. The applicability of our approach is demonstrated by analyzing German annual temperature data.
Submitted 18 November, 2019;
originally announced November 2019.
-
Two-sample tests for relevant differences in the eigenfunctions of covariance operators
Authors:
Alexander Aue,
Holger Dette,
Gregory Rice
Abstract:
This paper deals with two-sample tests for functional time series data, which have become widely available in conjunction with the advent of modern complex observation systems. Here, particular interest is in evaluating whether two sets of functional time series observations share the shape of their primary modes of variation as encoded by the eigenfunctions of the respective covariance operators. To this end, a novel testing approach is introduced that connects with, and extends, existing literature in two main ways. First, tests are set up in the relevant testing framework, where interest is not in testing an exact null hypothesis but rather in detecting deviations deemed sufficiently relevant, with relevance determined by the practitioner and perhaps guided by domain experts. Second, the proposed test statistics rely on a self-normalization principle that helps to avoid the notoriously difficult task of estimating the long-run covariance structure of the underlying functional time series. The main theoretical result of this paper is the derivation of the large-sample behavior of the proposed test statistics. Empirical evidence, indicating that the proposed procedures work well in finite samples and compare favorably with competing methods, is provided through a simulation study, and an application to annual temperature data.
Submitted 13 September, 2019;
originally announced September 2019.
-
Identifying shifts between two regression curves
Authors:
Holger Dette,
Subhra Sankar Dhar,
Weichi Wu
Abstract:
This article studies the problem of whether two convex (concave) regression functions modelling the relation between a response and covariate in two samples differ by a shift in the horizontal and/or vertical axis. We consider a nonparametric situation assuming only smoothness of the regression functions. A graphical tool based on the derivatives of the regression functions and their inverses is proposed to answer this question and studied in several examples. We also formalize this question in a corresponding hypothesis and develop a statistical test. The asymptotic properties of the corresponding test statistic are investigated under the null hypothesis and local alternatives. In contrast to most of the literature on comparing shape invariant models, which requires independent data, the procedure is applicable for dependent and non-stationary data. We also illustrate the finite sample properties of the new test by means of a small simulation study and a real data example.
Submitted 12 August, 2019;
originally announced August 2019.
-
Prediction in regression models with continuous observations
Authors:
Holger Dette,
Andrey Pepelyshev,
Anatoly Zhigljavsky
Abstract:
We consider the problem of predicting values of a random process or field satisfying a linear model $y(x)=θ^\top f(x) + \varepsilon(x)$, where errors $\varepsilon(x)$ are correlated. This is a common problem in kriging, where the case of discrete observations is standard. By focussing on the case of continuous observations, we derive expressions for the best linear unbiased predictors and their mean squared error. Our results are also applicable in the case where the derivatives of the process $y$ are available, and either a response or one of its derivatives needs to be predicted. The theoretical results are illustrated by several examples, in particular for the popular Matérn $3/2$ kernel.
Submitted 12 August, 2019;
originally announced August 2019.
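For orientation, the standard discrete-observation case mentioned in the abstract above can be sketched in Python as follows. This is the textbook best linear unbiased predictor for the model $y(x)=θ^\top f(x) + \varepsilon(x)$ with a Matérn $3/2$ error covariance, not the continuous-observation predictor derived in the paper. The function names, the length scale, the unit variance and the linear basis $f(x) = (1, x)^\top$ are assumptions chosen for illustration.

```python
import numpy as np

def matern32(a, b, length_scale=0.3):
    """Matern 3/2 covariance between 1-d locations a and b (unit variance)."""
    r = np.abs(a[:, None] - b[None, :]) / length_scale
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def linear_basis(x):
    """Regression functions f(x) = (1, x); an illustrative choice."""
    return np.column_stack([np.ones_like(x), x])

def blup(x_obs, y_obs, x_new, basis, length_scale=0.3):
    """Best linear unbiased predictor from finitely many observations."""
    K = matern32(x_obs, x_obs, length_scale)       # covariance of the errors
    k_new = matern32(x_new, x_obs, length_scale)   # cross-covariances
    F, F_new = basis(x_obs), basis(x_new)
    # Generalized least squares estimate of theta.
    K_inv_F = np.linalg.solve(K, F)
    K_inv_y = np.linalg.solve(K, y_obs)
    theta_hat = np.linalg.solve(F.T @ K_inv_F, F.T @ K_inv_y)
    # Trend prediction plus the kriging correction of the residuals.
    resid = y_obs - F @ theta_hat
    return F_new @ theta_hat + k_new @ np.linalg.solve(K, resid)

# Hypothetical usage: six noise-free observations on [0, 1].
x_obs = np.linspace(0.0, 1.0, 6)
y_obs = np.sin(3.0 * x_obs)
print(blup(x_obs, y_obs, np.array([0.1, 0.5]), linear_basis))
```

With noise-free observations this predictor reproduces the observed values at the observation points, which gives a quick sanity check for an implementation.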
-
Optimal designs for estimating individual coefficients in polynomial regression with no intercept
Authors:
Holger Dette,
Viatcheslav B. Melas,
Petr Shpilev
Abstract:
In a seminal paper, Studden (1968) characterized $c$-optimal designs in regression models, where the regression functions form a Chebyshev system. He used these results to determine the optimal design for estimating the individual coefficients in a polynomial regression model on the interval $[-1,1]$ explicitly. In this note we identify the optimal design for estimating the individual coefficients in a polynomial regression model with no intercept (here the regression functions do not form a Chebyshev system).
Submitted 19 June, 2019;
originally announced June 2019.
-
A new approach for open-end sequential change point monitoring
Authors:
Josua Gösmann,
Tobias Kley,
Holger Dette
Abstract:
We propose a new sequential monitoring scheme for changes in the parameters of a multivariate time series. In contrast to procedures proposed in the literature, which compare an estimator from the training sample with an estimator calculated from the remaining data, we suggest dividing the sample at each time point after the training sample. Estimators from the sample before and after all separation points are then continuously compared by calculating a maximum of norms of their differences. For open-end scenarios our approach yields an asymptotic level $α$ procedure, which is consistent under the alternative of a change in the parameter. By means of a simulation study it is demonstrated that the new method outperforms the commonly used procedures with respect to power, and the feasibility of our approach is illustrated by analyzing two data examples.
Submitted 27 July, 2020; v1 submitted 3 June, 2019;
originally announced June 2019.
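The idea described in the abstract above, comparing estimators before and after every separation point following the training sample and taking a maximum of norms, can be sketched in Python for the special case of a change in the mean. The weight $(k-j)/\sqrt{m}$, the use of sample means as estimators, the fixed threshold and the function names are assumptions for this sketch; the abstract does not give the exact statistic or the critical values.

```python
import numpy as np

def monitoring_statistic(x, m, k):
    """Maximum over split points m, ..., m + k - 1 of the weighted norm of
    the difference between the mean before and the mean after the split,
    using the first m + k observations (m = training sample size)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    total = m + k
    csum = np.cumsum(x[:total], axis=0)
    best = 0.0
    for j in range(k):
        split = m + j
        mean_before = csum[split - 1] / split
        mean_after = (csum[total - 1] - csum[split - 1]) / (total - split)
        diff = np.linalg.norm(mean_before - mean_after)
        # Assumed weighting: number of post-split observations over sqrt(m).
        best = max(best, (k - j) * diff / np.sqrt(m))
    return best

def first_alarm(x, m, threshold):
    """First time m + k at which the statistic exceeds the threshold,
    or None if no alarm is raised within the available data."""
    for k in range(1, len(x) - m + 1):
        if monitoring_statistic(x, m, k) > threshold:
            return m + k
    return None

# Hypothetical usage: mean shift after observation 60, training sample of 50.
series = np.concatenate([np.zeros(60), 5.0 * np.ones(40)])
print(first_alarm(series, m=50, threshold=1.0))
```

The loop recomputes the statistic from scratch at every monitoring time, which keeps the sketch short at the cost of quadratic run time; a practical implementation would update the statistic recursively and use a threshold calibrated to the desired level $α$.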