-
Lag selection and estimation of stable parameters for multiple autoregressive processes through convex programming
Authors:
Somnath Chakraborty,
Johannes Lederer,
Rainer von Sachs
Abstract:
Motivated by a variety of applications, high-dimensional time series have become an active topic of research. In particular, several methods and finite-sample theories for individual stable autoregressive processes with known lag have become available very recently. We, instead, consider multiple stable autoregressive processes that share an unknown lag. We use information across the different processes to simultaneously select the lag and estimate the parameters. We prove that the estimated process is stable, and we establish rates for the forecasting error that can outmatch the known rate in our setting. Our insights on the lag selection and the stability are also of interest for the case of individual autoregressive processes.
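The abstract does not spell out the convex program. As a rough illustration of pooling information across processes, the sketch below assumes per-process coefficient estimates are already available and reads off a shared lag group-wise across processes, in the spirit of a group penalty over lags. The function names, the tolerance and the l1 stability bound are illustrative assumptions, not the authors' estimator or their stability result.

```python
from math import sqrt


def shared_lag(coefs, tol=1e-8):
    """Estimate a common lag from per-process AR coefficient estimates.

    coefs[k][j] is the (hypothetical) estimated coefficient of process k at
    lag j + 1. The shared lag is taken as the largest lag whose coefficient
    group, pooled across all processes, has Euclidean norm above tol.
    """
    n_lags = max(len(row) for row in coefs)
    lag = 0
    for j in range(n_lags):
        group = [row[j] for row in coefs if j < len(row)]
        if sqrt(sum(c * c for c in group)) > tol:
            lag = j + 1
    return lag


def satisfies_l1_stability_bound(phi):
    """Sufficient (not necessary) stability check for one AR(p) process.

    If sum_j |phi_j| < 1, then 1 - sum_j phi_j z^j has no root with |z| <= 1,
    so X_t = sum_j phi_j X_{t-j} + e_t is stable.
    """
    return sum(abs(c) for c in phi) < 1


# Two processes, three candidate lags; lag 3 is zero for both.
estimates = [[0.5, 0.2, 0.0], [0.3, -0.1, 0.0]]
print(shared_lag(estimates))  # 2
print([satisfies_l1_stability_bound(row) for row in estimates])  # [True, True]
```

A False from the bound only means the sufficient condition fails; the process may still be stable.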
Submitted 3 March, 2023;
originally announced March 2023.
-
Statistical inference for intrinsic wavelet estimators of SPD matrices in a log-Euclidean manifold
Authors:
Johannes Krebs,
Daniel Rademacher,
Rainer von Sachs
Abstract:
In this paper we treat statistical inference for an intrinsic wavelet estimator of curves of symmetric positive definite (SPD) matrices in a log-Euclidean manifold. This estimator preserves positive-definiteness and enjoys permutation-equivariance, which is particularly relevant for covariance matrices. Our second-generation wavelet estimator is based on average-interpolation and retains the same powerful properties, including fast algorithms, known from nonparametric curve estimation with wavelets in standard Euclidean set-ups.
The core of our work is the construction of confidence sets for our high-level wavelet estimator in a non-Euclidean geometry. We derive asymptotic normality of this estimator, including explicit expressions for its asymptotic variance. This opens the door to constructing asymptotic confidence regions, which we compare with our proposed bootstrap scheme for inference. Detailed numerical simulations confirm the appropriateness of our suggested inference schemes.
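In the log-Euclidean geometry, averaging reduces to Euclidean averaging of matrix logarithms followed by a matrix exponential, and the exponential of any real symmetric matrix is SPD, which is why positive-definiteness is preserved. The sketch below shows only this averaging step, not the wavelet estimator itself; it is restricted to real 2x2 matrices so that a closed-form eigendecomposition keeps it dependency-free, and the function names are hypothetical.

```python
import math


def _spectral_apply(mat, f):
    """Apply a scalar function f to a real symmetric 2x2 matrix via its eigendecomposition."""
    a, b, c = mat[0][0], mat[0][1], mat[1][1]
    mid = (a + c) / 2
    rad = math.hypot((a - c) / 2, b)
    lam1, lam2 = mid + rad, mid - rad          # eigenvalues, lam1 >= lam2
    theta = 0.5 * math.atan2(2 * b, a - c)      # rotation angle of the eigenbasis
    cs, sn = math.cos(theta), math.sin(theta)
    f1, f2 = f(lam1), f(lam2)
    # f(A) = f1 * v1 v1^T + f2 * v2 v2^T with v1 = (cs, sn), v2 = (-sn, cs)
    off = (f1 - f2) * cs * sn
    return [[f1 * cs * cs + f2 * sn * sn, off],
            [off, f1 * sn * sn + f2 * cs * cs]]


def log_euclidean_mean(mats, weights=None):
    """Weighted log-Euclidean mean exp(sum_i w_i log A_i) of 2x2 SPD matrices."""
    n = len(mats)
    weights = weights or [1.0 / n] * n
    logs = [_spectral_apply(m, math.log) for m in mats]
    avg = [[sum(w * lg[i][j] for w, lg in zip(weights, logs)) for j in range(2)]
           for i in range(2)]
    return _spectral_apply(avg, math.exp)


# The mean of I and diag(4, 9) is approximately diag(2, 3).
print(log_euclidean_mean([[[1.0, 0.0], [0.0, 1.0]], [[4.0, 0.0], [0.0, 9.0]]]))
```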
Submitted 14 February, 2022;
originally announced February 2022.
-
VC-PCR: A Prediction Method based on Supervised Variable Selection and Clustering
Authors:
Rebecca Marion,
Johannes Lederer,
Bernadette Govaerts,
Rainer von Sachs
Abstract:
Sparse linear prediction methods suffer from decreased prediction accuracy when the predictor variables have a cluster structure (e.g., highly correlated groups of variables). To improve prediction accuracy, various methods have been proposed to identify variable clusters from the data and integrate cluster information into a sparse modeling process. But none of these methods achieves satisfactory performance for prediction, variable selection and variable clustering simultaneously. This paper presents Variable Cluster Principal Component Regression (VC-PCR), a prediction method that supervises variable selection and variable clustering in order to solve this problem. Experiments with real and simulated data demonstrate that, compared to competitor methods, VC-PCR achieves better prediction, variable selection and clustering performance when cluster structure is present.
Submitted 2 February, 2022;
originally announced February 2022.
-
Nonparametric monitoring of sunspot number observations: a case study
Authors:
Sophie Mathieu,
Laure Lefèvre,
Rainer von Sachs,
Véronique Delouille,
Christian Ritter,
Frédéric Clette
Abstract:
Solar activity is an important driver of long-term climate trends and must be accounted for in climate models. Unfortunately, direct measurements of this quantity over long periods do not exist. The only observations related to solar activity whose records reach back to the seventeenth century are those of sunspots. Surprisingly, determining the number of sunspots consistently over time has remained a challenging statistical problem to this day. It arises from the need to consolidate data from multiple observing stations around the world in a context of low signal-to-noise ratios, non-stationarity, missing data, non-standard distributions and many kinds of errors. As a result, the data from some stations exhibit severe and varied deviations over time. In this paper, we propose the first systematic and thorough statistical approach for monitoring these complex and important series. It consists of three steps essential for successful treatment of the data: smoothing on multiple timescales, monitoring using block bootstrap calibrated CUSUM charts, and classification of out-of-control situations by support vector techniques. This approach allows us to detect a wide range of anomalies (such as sudden jumps or more progressive drifts) unseen in previous analyses. It helps us to identify the causes of major deviations, which are often related to the observer or the equipment. Their detection and identification will help improve future observations. Their elimination or correction in past data will lead to a more precise reconstruction of the world reference index for solar activity: the International Sunspot Number.
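For readers unfamiliar with CUSUM charts, the sketch below shows a standard two-sided tabular CUSUM that signals both sudden jumps and gradual drifts in either direction. It is a generic illustration, not the paper's procedure: there, the input is the smoothed multi-timescale series and the threshold is calibrated by a block bootstrap, whereas here the allowance `k` and threshold `h` are fixed hypothetical values and the series is assumed already standardized.

```python
def cusum_first_alarm(x, k, h):
    """Two-sided tabular CUSUM on a standardized series x.

    k is the allowance (reference value) and h the decision threshold.
    Returns the index of the first alarm, or None if the chart never signals.
    """
    upper = lower = 0.0
    for t, xt in enumerate(x):
        upper = max(0.0, upper + xt - k)   # accumulates upward shifts
        lower = max(0.0, lower - xt - k)   # accumulates downward shifts
        if upper > h or lower > h:
            return t
    return None


# Hypothetical standardized series with an upward jump of size 2 at t = 10.
series = [0.0] * 10 + [2.0] * 10
print(cusum_first_alarm(series, k=0.5, h=4.0))  # 12
```

The delay of two steps after the jump reflects the trade-off set by `h`: a larger threshold means fewer false alarms but slower detection.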
Submitted 25 June, 2021;
originally announced June 2021.
-
Nonparametric robust monitoring of time series panel data
Authors:
Sophie Mathieu,
Rainer von Sachs,
Véronique Delouille,
Laure Lefèvre,
Christian Ritter
Abstract:
In many applications, a control procedure is required to detect potential deviations in a panel of serially correlated processes. It is common that the processes are corrupted by noise and that no prior information about the in-control data is available for that purpose. This paper suggests a general nonparametric monitoring scheme for supervising such a panel with time-varying mean and variance. The method is based on a control chart designed by block bootstrap, which does not require parametric assumptions on the distribution of the data. The procedure is tailored to cope with strong noise, potentially missing values and the absence of in-control series, which is tackled by an intelligent exploitation of the information in the panel. Our methodology is completed by support vector machine procedures to estimate the magnitude and form of the encountered deviations (such as stepwise shifts or functional drifts). This scheme, though generic in nature, is able to treat an important applied data problem: the control of deviations in a subset of sunspot number observations which are part of the International Sunspot Number, a world reference for long-term solar activity.
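A control chart designed by block bootstrap can be sketched as follows: resample an in-control reference series in blocks (which keeps serial dependence within each block), compute the largest CUSUM value on each resample, and take an upper quantile as the alarm threshold. This is a minimal sketch under stated assumptions: the standardized reference series, block length, allowance `k`, false-alarm level `alpha` and the function names are all hypothetical, and how the paper builds its in-control reference from the panel is not reproduced here.

```python
import random


def moving_block_bootstrap(series, block_len, rng):
    """Resample a series by concatenating randomly chosen overlapping blocks.

    Keeping consecutive observations together preserves short-range serial
    dependence inside each block, unlike an i.i.d. bootstrap.
    """
    n = len(series)
    starts = range(n - block_len + 1)
    out = []
    while len(out) < n:
        s = rng.choice(starts)
        out.extend(series[s:s + block_len])
    return out[:n]


def max_cusum(x, k):
    """Largest value reached by either side of a two-sided CUSUM on x."""
    upper = lower = peak = 0.0
    for xt in x:
        upper = max(0.0, upper + xt - k)
        lower = max(0.0, lower - xt - k)
        peak = max(peak, upper, lower)
    return peak


def calibrate_threshold(in_control, block_len, k, alpha, n_boot=500, seed=0):
    """Threshold h such that roughly a fraction alpha of bootstrap paths exceed it."""
    rng = random.Random(seed)
    stats = sorted(max_cusum(moving_block_bootstrap(in_control, block_len, rng), k)
                   for _ in range(n_boot))
    return stats[min(n_boot - 1, int((1 - alpha) * n_boot))]
```

For example, `calibrate_threshold(reference, block_len=10, k=0.5, alpha=0.05)` would return a threshold for a chart with a nominal 5% false-alarm rate over a path of the reference length.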
Submitted 22 October, 2020;
originally announced October 2020.
-
Uncertainty quantification in sunspot counts
Authors:
Sophie Mathieu,
Véronique Delouille,
Laure Lefèvre,
Christian Ritter,
Rainer von Sachs
Abstract:
Observing and counting sunspots constitutes one of the longest-running scientific experiments, with first observations dating back to Galileo and the invention of the telescope around 1610. Today the sunspot number (SN) time series acts as a benchmark of solar activity in a wide range of physical models. An appropriate statistical modelling, adapted to the time series' complex nature, is however still lacking. In this work, we provide the first comprehensive uncertainty quantification analysis of sunspot counts. Our interest lies in the following three components: the number of spots ($N_s$), the number of sunspot groups ($N_g$), and the composite $N_c$, defined as $N_c:=N_s+10N_g$. Those are reported by a network of observatories around the world, and are corrupted by errors of various types. We use a multiplicative framework to provide, for each of the three components, an estimation of their error distribution in various regimes (short-term, long-term, minima of solar activity). We also propose a robust estimator for the underlying solar signal and fit a density distribution that takes into account intrinsic characteristics such as over-dispersion, excess of zeros, and multiple modes. The estimation of the solar signal underlying the composite $N_c$ may be seen as a robust version of the International Sunspot Number (ISN), a quantity widely used as a proxy of solar activity. Therefore our results on $N_c$ may serve to characterize the uncertainty on ISN as well. Our results pave the way for future monitoring of the observatories in quasi-real time, with the aim of alerting the observers when they start deviating from the network and preventing large drifts from occurring in the network.
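The composite count is simple arithmetic: a station reporting $N_s = 12$ spots in $N_g = 3$ groups gives $N_c = 12 + 10 \cdot 3 = 42$. The sketch below computes it and illustrates the idea of a multiplicative error as the ratio of a station's value to a network-level signal. The paper proposes its own robust signal estimator; the plain median across stations used here is only a stand-in assumption, and the station data are hypothetical.

```python
from statistics import median


def composite(n_spots, n_groups):
    """Composite count N_c := N_s + 10 * N_g."""
    return n_spots + 10 * n_groups


def multiplicative_errors(station_values):
    """Ratios of each station's value to the network median for one day.

    Under a multiplicative model Y_i = eps_i * s, with the (assumed) median
    across stations standing in for the solar signal s, each ratio is a crude
    estimate of eps_i. Returns None when the median is zero, since the ratio
    is undefined on days without spots.
    """
    m = median(station_values)
    if m == 0:
        return None
    return [y / m for y in station_values]


# Hypothetical reports from three stations on the same day: (N_s, N_g).
reports = [(10, 3), (12, 3), (14, 3)]
nc = [composite(s, g) for s, g in reports]
print(nc)  # [40, 42, 44]
print(multiplicative_errors(nc))
```

The `None` branch hints at why the excess of zeros near solar minima needs special treatment in any multiplicative model.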
Submitted 21 September, 2020;
originally announced September 2020.
-
Intrinsic wavelet regression for surfaces of Hermitian positive definite matrices
Authors:
Joris Chau,
Rainer von Sachs
Abstract:
This paper develops intrinsic wavelet denoising methods for surfaces of Hermitian positive definite matrices, with a view to the nonparametric estimation of the time-varying spectral matrix of a multivariate locally stationary time series. First, we construct intrinsic average-interpolating wavelet transforms acting directly on surfaces of Hermitian positive definite matrices in a curved Riemannian manifold with respect to an affine-invariant metric. Second, we derive the wavelet coefficient decay and linear wavelet thresholding convergence rates of intrinsically smooth surfaces of Hermitian positive definite matrices, and investigate practical nonlinear thresholding of wavelet coefficients based on their trace in the context of intrinsic signal plus noise models in the Riemannian manifold. The finite-sample performance of nonlinear tree-structured trace thresholding is assessed by means of simulated data, and the proposed intrinsic wavelet methods are used to estimate the time-varying spectral matrix of a nonstationary multivariate electroencephalography (EEG) time series recorded during an epileptic brain seizure.
Submitted 21 May, 2019; v1 submitted 27 August, 2018;
originally announced August 2018.
-
Asymptotics for high-dimensional covariance matrices and quadratic forms with applications to the trace functional and shrinkage
Authors:
Ansgar Steland,
Rainer von Sachs
Abstract:
We establish large-sample approximations for an arbitrary number of bilinear forms of the sample variance-covariance matrix of a high-dimensional vector time series using $\ell_1$-bounded and small $\ell_2$-bounded weighting vectors. Estimation of the asymptotic covariance structure is also discussed. The results hold true without any constraint on the dimension, the number of forms and the sample size or their ratios. Concrete and potential applications are widespread and cover high-dimensional data science problems such as tests for large numbers of covariances, sparse portfolio optimization and projections onto sparse principal components or more general spanning sets, as frequently considered, e.g., in classification and dictionary learning. As two specific applications of our results, we study in greater detail the asymptotics of the trace functional and shrinkage estimation of covariance matrices. In shrinkage estimation, it turns out that the asymptotics differ for weighting vectors bounded away from orthogonality and for nearly orthogonal ones, in the sense that their inner product converges to 0.
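The objects studied here are bilinear forms $v^\top \hat\Sigma w$ of the sample covariance matrix $\hat\Sigma$. The trace functional is itself a sum of $d$ such forms, $\operatorname{tr}(\hat\Sigma) = \sum_i e_i^\top \hat\Sigma e_i$, with $\ell_1$-bounded unit vectors $e_i$. The sketch below computes both on a toy data set; the divisor $n-1$ and the function names are assumptions (the paper's normalization may differ), and the estimator ignores time ordering, which matters only for the asymptotics, not for the computation.

```python
def sample_covariance(rows):
    """Sample covariance matrix (divisor n - 1) of n observations of a d-vector."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def bilinear_form(v, s, w):
    """v^T S w for weighting vectors v and w."""
    d = len(s)
    return sum(v[i] * s[i][j] * w[j] for i in range(d) for j in range(d))


def trace(s):
    """Trace of a square matrix, i.e. the sum of the forms e_i^T S e_i."""
    return sum(s[i][i] for i in range(len(s)))


X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
S = sample_covariance(X)
cross = bilinear_form([1, 0], S, [0, 1])  # 10/3, the covariance of the two coordinates
```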
Submitted 6 November, 2017;
originally announced November 2017.
-
Intrinsic data depth for Hermitian positive definite matrices
Authors:
Joris Chau,
Hernando Ombao,
Rainer von Sachs
Abstract:
Nondegenerate covariance, correlation and spectral density matrices are necessarily symmetric or Hermitian and positive definite. The main contribution of this paper is the development of statistical data depths for collections of Hermitian positive definite matrices by exploiting the geometric structure of the space as a Riemannian manifold. The depth functions allow one to naturally characterize the most central or outlying matrices, but also provide a practical framework for inference in the context of samples of positive definite matrices. First, the desired properties of an intrinsic data depth function acting on the space of Hermitian positive definite matrices are presented. Second, we propose two computationally fast pointwise and integrated data depth functions that satisfy each of these requirements and investigate several robustness and efficiency aspects. As an application, we construct depth-based confidence regions for the intrinsic mean of a sample of positive definite matrices, which is applied to the exploratory analysis of a collection of covariance matrices associated with a multicenter research trial.
Submitted 9 April, 2018; v1 submitted 26 June, 2017;
originally announced June 2017.
-
Large-sample approximations for variance-covariance matrices of high-dimensional time series
Authors:
Ansgar Steland,
Rainer von Sachs
Abstract:
Distributional approximations of (bi)linear functions of sample variance-covariance matrices play a critical role in analyzing vector time series, as they are needed for various purposes, especially to draw inferences on the dependence structure in terms of second moments and to analyze projections onto lower-dimensional spaces such as those generated by principal components. This particularly applies to the high-dimensional case, where the dimension $d$ is allowed to grow with the sample size $n$ and may even be larger than $n$. We establish large-sample approximations for such bilinear forms related to the sample variance-covariance matrix of a high-dimensional vector time series in terms of strong approximations by Brownian motions. The results cover weakly dependent as well as many long-range dependent linear processes and are valid for uniformly $\ell_1$-bounded projection vectors, which arise, either naturally or by construction, in many statistical problems extensively studied for high-dimensional series. Among those problems are sparse financial portfolio selection, sparse principal components, the LASSO, shrinkage estimation and change-point analysis for high-dimensional time series, which matter for the analysis of big data and are discussed in greater detail.
Submitted 20 April, 2017;
originally announced April 2017.
-
Time-frequency analysis of locally stationary Hawkes processes
Authors:
François Roueff,
Rainer Von Sachs
Abstract:
Locally stationary Hawkes processes have been introduced in order to generalise classical Hawkes processes away from stationarity by allowing for a time-varying second-order structure. This class of self-exciting point processes has recently attracted a lot of interest in applications in the life sciences (seismology, genomics, neuroscience, ...), but also in the modelling of high-frequency financial data. In this contribution we provide a fully developed nonparametric estimation theory of both the local mean density and the local Bartlett spectra of a locally stationary Hawkes process. In particular, we apply our kernel estimation of the spectrum, localised both in time and frequency, to two data sets of transaction times, revealing pertinent features in the data that had not been made visible by classical non-localised approaches based on models with constant fertility functions over time.
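To make the terms concrete, the sketch below evaluates a Hawkes conditional intensity with an exponential kernel whose baseline and fertility (branching ratio) are allowed to vary with time, and the mean intensity $\mu/(1-n)$ of the stationary Hawkes process obtained by freezing the parameters at time $u$. The exponential kernel, this particular time-varying parameterization and the function names are illustrative assumptions; the abstract does not specify the paper's model or its estimator of the local mean density.

```python
import math


def conditional_intensity(t, events, baseline, fertility, decay):
    """lambda(t) = baseline(t) + fertility(t) * sum_{t_i < t} decay * exp(-decay * (t - t_i)).

    The kernel decay * exp(-decay * s) integrates to 1, so fertility(t) plays
    the role of a (local) branching ratio. Letting baseline and fertility
    depend on t is a hypothetical, simplified way to mimic local stationarity.
    """
    excitation = sum(math.exp(-decay * (t - s)) for s in events if s < t)
    return baseline(t) + fertility(t) * decay * excitation


def frozen_mean_density(u, baseline, fertility):
    """Mean intensity mu / (1 - n) of the stationary Hawkes process frozen at time u."""
    n = fertility(u)
    if not 0 <= n < 1:
        raise ValueError("frozen process is not stationary: need 0 <= fertility < 1")
    return baseline(u) / (1 - n)


# Hypothetical slowly varying baseline with constant fertility 0.3.
mu = lambda t: 1.0 + 0.5 * math.sin(t)
fert = lambda t: 0.3
print(conditional_intensity(2.0, [0.5, 1.2, 1.9], mu, fert, decay=2.0))
print(frozen_mean_density(2.0, mu, fert))
```

A fertility close to 1 makes the frozen mean density explode, which is one way to read why constant-fertility models can miss bursts that a localised analysis reveals.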
Submitted 30 January, 2018; v1 submitted 5 April, 2017;
originally announced April 2017.
-
Intrinsic wavelet regression for curves of Hermitian positive definite matrices
Authors:
Joris Chau,
Rainer von Sachs
Abstract:
Intrinsic wavelet transforms and wavelet estimation methods are introduced for curves in the non-Euclidean space of Hermitian positive definite matrices, with a view to Fourier spectral estimation of multivariate stationary time series. The main focus is on intrinsic average-interpolation wavelet transforms in the space of positive definite matrices equipped with an affine-invariant Riemannian metric, and convergence rates of linear wavelet thresholding are derived for intrinsically smooth curves of Hermitian positive definite matrices. In the context of multivariate Fourier spectral estimation, intrinsic wavelet thresholding is equivariant under a change of basis of the time series, and nonlinear wavelet thresholding is able to capture localized features in the spectral density matrix across frequency, always guaranteeing positive definite estimates. The finite-sample performance of intrinsic wavelet thresholding is assessed by means of simulated data and compared to several benchmark estimators in the Riemannian manifold. Further illustrations are provided by examining the multivariate spectra of trial-replicated brain signal time series recorded during a learning experiment.
Submitted 10 November, 2019; v1 submitted 12 January, 2017;
originally announced January 2017.
-
Functional mixed effects wavelet estimation for spectra of replicated time series
Authors:
Joris Chau,
Rainer von Sachs
Abstract:
Motivated by spectral analysis of replicated brain signal time series, we propose a functional mixed effects approach to model replicate-specific spectral densities as random curves varying about a deterministic population-mean spectrum. In contrast to existing work, we do not assume the replicate-specific spectral curves to be independent, i.e., there may exist explicit correlation between different replicates in the population. By projecting the replicate-specific curves onto an orthonormal wavelet basis, estimation and prediction are carried out under an equivalent linear mixed effects model in the wavelet coefficient domain. To cope with potentially very localized features of the spectral curves, we develop estimators and predictors based on a combination of generalized least squares estimation and nonlinear wavelet thresholding, including asymptotic confidence sets for the population-mean curve. We derive risk bounds for the nonlinear wavelet estimator of the population-mean curve, a result that reflects the influence of correlation between different curves in the replicate-population, and we establish consistency of the estimators of the inter- and intra-curve correlation structure in an appropriate sparseness class of functions. To illustrate the proposed functional mixed effects model and our estimation and prediction procedures, we present several simulated time series data examples and we analyze a motivating brain signal dataset recorded during an associative learning experiment.
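The two generic ingredients named here, projection onto an orthonormal wavelet basis and nonlinear wavelet thresholding, can be sketched for a single curve as below. This is only an illustration of those building blocks: the Haar basis, soft thresholding, the threshold value and the function names are our choices, and the paper's generalized least squares step in the mixed effects model is not reproduced. The input length is assumed to be a power of two.

```python
import math


def haar_dwt(x):
    """Orthonormal Haar transform of a sequence whose length is a power of two.

    Returns [approximation, details at coarsest level, ..., details at finest level].
    """
    coeffs, approx = [], list(x)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        coeffs.insert(0, [(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return [approx] + coeffs


def haar_idwt(coeffs):
    """Inverse of haar_dwt."""
    approx = list(coeffs[0])
    for detail in coeffs[1:]:
        nxt = []
        for a, d in zip(approx, detail):
            nxt.extend([(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)])
        approx = nxt
    return approx


def soft_threshold(c, lam):
    """Shrink c towards zero by lam, setting it to zero if |c| <= lam."""
    return math.copysign(max(abs(c) - lam, 0.0), c)


def denoise(x, lam):
    """Soft-threshold all detail coefficients, keep the approximation, invert."""
    coeffs = haar_dwt(x)
    shrunk = [coeffs[0]] + [[soft_threshold(c, lam) for c in level]
                            for level in coeffs[1:]]
    return haar_idwt(shrunk)


curve = [4.0, 2.0, 5.0, 7.0, 1.0, 0.0, 3.0, 3.0]
print(denoise(curve, 1e6))  # every entry is (approximately) the sample mean 3.125
```

Because the transform is orthonormal, noise keeps its variance in the coefficient domain, so a single threshold can be applied to all detail levels; large localized coefficients survive while small ones are set to zero.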
Submitted 23 July, 2016; v1 submitted 5 April, 2016;
originally announced April 2016.
-
Nonparametric Transient Classification using Adaptive Wavelets
Authors:
Melvin M. Varughese,
Rainer von Sachs,
Michael Stephanou,
Bruce A. Bassett
Abstract:
Classifying transients based on multiband light curves is a challenging but crucial problem in the era of GAIA and LSST, since the sheer volume of transients will make spectroscopic classification infeasible. Here we present a nonparametric classifier that uses the transient's light curve measurements to predict its class given training data. It implements two novel components: the first is the use of the BAGIDIS wavelet methodology, a characterization of functional data using hierarchical wavelet coefficients. The second novelty is the introduction of a ranked probability classifier on the wavelet coefficients that handles both the heteroscedasticity of the data and the potential non-representativity of the training set. The ranked classifier is simple and quick to implement, while a major advantage of the BAGIDIS wavelets is that they are translation invariant; hence they do not need the light curves to be aligned to extract features. Further, BAGIDIS is nonparametric, so it can be used for blind searches for new objects. We demonstrate the effectiveness of our ranked wavelet classifier on the well-tested Supernova Photometric Classification Challenge dataset, in which the challenge is to correctly classify light curves as Type Ia or non-Ia supernovae. We train our ranked probability classifier on the spectroscopically-confirmed subsample (which is not representative) and show that it gives good results for all supernovae with observed light curve timespans greater than 100 days (roughly 55% of the dataset). For such data, we obtain a Ia efficiency of 80.5% and a purity of 82.4%, yielding a highly competitive score of 0.49 whilst implementing a truly "model-blind" approach to supernova classification. Consequently, this approach may be particularly suitable for the classification of astronomical transients in the era of large synoptic sky surveys.
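The reported numbers can be related through the usual definitions: efficiency is the fraction of all Type Ia supernovae that are classified as Ia, and purity is the fraction of Ia-classified objects that truly are Ia. Assuming the score is the challenge's figure of merit, efficiency times a pseudo-purity that penalizes each false positive with a weight $W = 3$ (our assumption about the convention used), the reported efficiency and purity reproduce the reported score of about 0.49. The function names are hypothetical.

```python
def efficiency(true_ia, n_ia_total):
    """Fraction of all Type Ia supernovae that are classified as Ia."""
    return true_ia / n_ia_total


def purity(true_ia, false_ia):
    """Fraction of objects classified as Ia that really are Ia."""
    return true_ia / (true_ia + false_ia)


def figure_of_merit(true_ia, false_ia, n_ia_total, w=3.0):
    """Efficiency times pseudo-purity, where false positives are penalized by w."""
    return efficiency(true_ia, n_ia_total) * true_ia / (true_ia + w * false_ia)


def figure_of_merit_from_rates(eff, pur, w=3.0):
    """Same score expressed through efficiency and ordinary purity."""
    # false / true = 1 / pur - 1, so pseudo-purity = 1 / (1 + w * (1 / pur - 1)).
    return eff / (1.0 + w * (1.0 / pur - 1.0))


print(round(figure_of_merit_from_rates(0.805, 0.824), 2))  # 0.49
```

The weight makes the score far more sensitive to contamination than to missed Ia events, which is why a purity of 82.4% alone translates into a pseudo-purity of only about 0.61.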
Submitted 23 October, 2015; v1 submitted 31 March, 2015;
originally announced April 2015.
-
Locally stationary long memory estimation
Authors:
François Roueff,
Rainer Von Sachs
Abstract:
There exists a wide literature on modelling strongly dependent time series using a long-memory parameter d, including more recent work on semiparametric wavelet estimation. As a generalization of these latter approaches, in this work we allow the long-memory parameter d to vary over time. We embed our approach into the framework of locally stationary processes. We show weak consistency and a central limit theorem for our log-regression wavelet estimator of the time-dependent d in a Gaussian context. Both simulations and a real data example complement this fairly general approach.
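Wavelet log-regression rests on the scaling $\operatorname{Var}(W_j) \approx C\,2^{2dj}$ of the wavelet coefficients at scale index $j$, so $\log_2 \operatorname{Var}(W_j)$ is linear in $j$ with slope $2d$. The sketch below fits that slope by ordinary least squares. Making it time-varying by feeding in wavelet variances computed in a window around a given time point is our simplifying assumption; the paper's localization, weighting and bias corrections are not reproduced, and the variances in the example are synthetic.

```python
import math


def log_regression_d(scales, wavelet_variances):
    """Estimate a long-memory parameter d from wavelet variances across scales.

    Assumes Var(W_j) is approximately C * 2**(2 * d * j), so log2 Var(W_j) is
    linear in the scale index j with slope 2d. Uses ordinary least squares.
    """
    scales = list(scales)
    y = [math.log2(v) for v in wavelet_variances]
    n = len(scales)
    jbar, ybar = sum(scales) / n, sum(y) / n
    sxy = sum((j - jbar) * (yy - ybar) for j, yy in zip(scales, y))
    sxx = sum((j - jbar) ** 2 for j in scales)
    return sxy / (2 * sxx)


# Synthetic local wavelet variances at scales j = 1..5 around two time points.
early = [5.0 * 2 ** (2 * 0.1 * j) for j in range(1, 6)]
late = [5.0 * 2 ** (2 * 0.4 * j) for j in range(1, 6)]
print(round(log_regression_d(range(1, 6), early), 3))  # 0.1
print(round(log_regression_d(range(1, 6), late), 3))   # 0.4
```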
Submitted 31 May, 2010; v1 submitted 29 July, 2009;
originally announced July 2009.
-
Locally adaptive estimation of evolutionary wavelet spectra
Authors:
Sébastien Van Bellegem,
Rainer von Sachs
Abstract:
We introduce a wavelet-based model of local stationarity. This model enlarges the class of locally stationary wavelet processes and contains processes whose spectral density function may change very suddenly in time. A notion of time-varying wavelet spectrum is uniquely defined as a wavelet-type transform of the autocovariance function with respect to so-called autocorrelation wavelets. This leads to a natural representation of the autocovariance which is localized on scales. We propose a pointwise adaptive estimator of the time-varying spectrum. The behavior of the estimator is studied in homogeneous and inhomogeneous regions of the wavelet spectrum.
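Autocorrelation wavelets are $\Psi_j(\tau) = \sum_k \psi_{j,k}\,\psi_{j,k+\tau}$, and in the locally stationary wavelet framework the local autocovariance at a time point is $c(\tau) = \sum_j S_j \Psi_j(\tau)$. The sketch below computes these for discrete Haar wavelets, a common illustrative choice. The indexing $j = 1$ for the finest scale (the literature often uses negative indices), the spectrum values and the function names are assumptions for illustration; the paper's enlarged model class and adaptive estimator are not reproduced.

```python
def haar_filter(j):
    """Discrete Haar wavelet at scale j >= 1 (j = 1 finest): 2**j taps."""
    half = 2 ** (j - 1)
    amp = 2 ** (-j / 2)
    return [amp] * half + [-amp] * half


def autocorrelation_wavelet(j, tau):
    """Psi_j(tau) = sum_k psi_{j,k} psi_{j,k+tau}; symmetric in tau, zero for |tau| >= 2**j."""
    psi = haar_filter(j)
    tau = abs(tau)
    return sum(psi[k] * psi[k + tau] for k in range(len(psi) - tau))


def local_autocovariance(spectrum, tau):
    """c(tau) = sum_j S_j Psi_j(tau) for spectrum values S_1, S_2, ... at one time point."""
    return sum(s * autocorrelation_wavelet(j, tau)
               for j, s in enumerate(spectrum, start=1))


# Hypothetical wavelet spectrum at one time point: S_1 = 1.0, S_2 = 0.5.
print(round(local_autocovariance([1.0, 0.5], 0), 6))  # 1.5
print(round(local_autocovariance([1.0, 0.5], 1), 6))  # -0.375
```

Since $\Psi_j(0) = 1$ for every scale, the local variance is simply the sum of the spectrum over scales, which is the sense in which the representation is localized on scales.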
Submitted 11 August, 2008;
originally announced August 2008.
-
Structural shrinkage of nonparametric spectral estimators for multivariate time series
Authors:
Hilmar Böhm,
Rainer von Sachs
Abstract:
In this paper we investigate the performance of periodogram-based estimators of the spectral density matrix of possibly high-dimensional time series. We suggest and study shrinkage as a remedy against numerical instabilities due to deteriorating condition numbers of (kernel) smoothed periodogram matrices. Moreover, shrinking the empirical eigenvalues in the frequency domain towards one another also improves the Mean Squared Error (MSE) of these widely used nonparametric spectral estimators. In contrast to some existing time-domain approaches, which are restricted to i.i.d. data, in the frequency domain it is necessary to take the size of the smoothing span into account as the "effective or local sample size". While Böhm and von Sachs (2007) propose a multiple of the identity matrix as the optimal shrinkage target in the absence of knowledge about the multidimensional structure of the data, here we consider "structural" shrinkage. We assume that the spectral structure of the data is induced by underlying factors. However, in contrast to actual factor modelling, which suffers from the need to choose the number of factors, we suggest a model-free approach. Our final estimator is the asymptotically MSE-optimal linear combination of the smoothed periodogram and the parametric estimator based on an underfitting (and hence deliberately misspecified) factor model. We complement our theoretical considerations with extensive simulation studies. In the situation of data generated from a higher-order factor model, we compare all four types of involved estimators (including the one of Böhm and von Sachs (2007)).
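The linear-combination idea can be sketched as $(1-w)\hat F + w T$ for an estimate $\hat F$ and a target $T$. With the scaled-identity target $T = \mu I$, $\mu = \operatorname{tr}(\hat F)/d$, every eigenvalue $\lambda$ is mapped to $(1-w)\lambda + w\mu$, i.e. pulled towards the average eigenvalue, which lowers the condition number. A factor-model estimate would be passed as `target` instead for structural shrinkage. In this sketch the weight is a fixed hypothetical value rather than the asymptotically MSE-optimal one, real symmetric 2x2 matrices stand in for Hermitian spectral matrices at a single frequency, and the function names are assumptions.

```python
import math


def shrink(f, target, weight):
    """(1 - weight) * F + weight * T for matrices given as lists of rows."""
    return [[(1 - weight) * fij + weight * tij for fij, tij in zip(fr, tr)]
            for fr, tr in zip(f, target)]


def scaled_identity(f):
    """mu * I with mu = tr(F) / d, the average eigenvalue of F."""
    d = len(f)
    mu = sum(f[i][i] for i in range(d)) / d
    return [[mu if i == j else 0.0 for j in range(d)] for i in range(d)]


def condition_number_2x2(m):
    """Ratio of largest to smallest eigenvalue of a real symmetric positive definite 2x2 matrix."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    mid, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
    return (mid + rad) / (mid - rad)


# An ill-conditioned hypothetical estimate at one frequency.
F = [[10.0, 0.0], [0.0, 0.1]]
G = shrink(F, scaled_identity(F), weight=0.5)
print(condition_number_2x2(F), condition_number_2x2(G))
```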
Submitted 13 August, 2008; v1 submitted 30 April, 2008;
originally announced April 2008.