-
Learning the regularity of multivariate functional data
Authors: Omar Kassi, Nicolas Klutchnikoff, Valentin Patilea
Abstract:
Combining information both within and between sample realizations, we propose a simple estimator for the local regularity of surfaces in the functional data framework. The independently generated surfaces are measured with errors at possibly random discrete times. Non-asymptotic exponential bounds for the concentration of the regularity estimators are derived. An indicator for anisotropy is proposed and an exponential bound of its risk is derived. Two applications are proposed. We first consider the class of multi-fractional, bi-dimensional, Brownian sheets with domain deformation, and study the nonparametric estimation of the deformation. As a second application, we build minimax optimal, bivariate kernel estimators for the reconstruction of the surfaces.
Submitted 2 October, 2023; v1 submitted 26 July, 2023;
originally announced July 2023.
-
Adaptive functional principal components analysis
Authors: Sunny G. W. Wang, Valentin Patilea, Nicolas Klutchnikoff
Abstract:
Functional data analysis almost always involves smoothing discrete observations into curves, because they are never observed in continuous time and rarely without error. Although smoothing parameters affect the subsequent inference, data-driven methods for selecting these parameters are not well-developed, frustrated by the difficulty of using all the information shared by curves while being computationally efficient. On the one hand, smoothing individual curves in an isolated, albeit sophisticated, way ignores useful signals present in other curves. On the other hand, bandwidth selection by automatic procedures such as cross-validation after pooling all the curves together quickly becomes computationally infeasible due to the large number of data points. In this paper we propose a new data-driven, adaptive kernel smoothing method, specifically tailored for functional principal components analysis through the derivation of sharp, explicit risk bounds for the eigen-elements. The minimization of these quadratic risk bounds provides refined, yet computationally efficient, bandwidth rules for each eigen-element separately. Both common and independent design cases are allowed. Rates of convergence for the estimators are derived. An extensive simulation study, designed in a versatile manner to closely mimic the characteristics of real data sets, supports our methodological contribution. An illustration on a real data application is provided.
Submitted 16 April, 2024; v1 submitted 28 June, 2023;
originally announced June 2023.
-
A 2-step estimation procedure for semiparametric mixture cure models
Authors: Eni Musta, Valentin Patilea, Ingrid Van Keilegom
Abstract:
Cure models have been developed as an alternative modelling approach to conventional survival analysis in order to account for the presence of cured subjects that will never experience the event of interest. Mixture cure models, which model separately the cure probability and the survival of uncured subjects depending on a set of covariates, are particularly useful for distinguishing curative from life-prolonging effects. In practice, it is common to assume a parametric model for the cure probability and a semiparametric model for the survival of the susceptibles. Because of the latent cure status, maximum likelihood estimation is performed by means of the iterative EM algorithm. Here, we focus on the cure probabilities and propose a two-step procedure to improve upon the performance of the maximum likelihood estimator when the sample size is not large. The new method is based on the idea of presmoothing by first constructing a nonparametric estimator and then projecting it into the desired parametric class. We investigate the theoretical properties of the resulting estimator and show through an extensive simulation study for the logistic-Cox model that it outperforms the existing method. Practical use of the method is illustrated through two melanoma datasets.
Submitted 17 July, 2022;
originally announced July 2022.
-
Clustering multivariate functional data using unsupervised binary trees
Authors: Steven Golovkine, Nicolas Klutchnikoff, Valentin Patilea
Abstract:
We propose a model-based clustering algorithm for a general class of functional data for which the components could be curves or images. The random functional data realizations could be measured with error at discrete, and possibly random, points in the definition domain. The idea is to build a set of binary trees by recursive splitting of the observations. The number of groups is determined in a data-driven way. The new algorithm provides easily interpretable results and fast predictions for online data sets. Results on simulated datasets reveal good performance in various complex settings. The methodology is applied to the analysis of vehicle trajectories on a German roundabout.
Submitted 24 September, 2021; v1 submitted 10 December, 2020;
originally announced December 2020.
-
A presmoothing approach for estimation in semiparametric mixture cure models
Authors: Eni Musta, Valentin Patilea, Ingrid Van Keilegom
Abstract:
A challenge when dealing with survival analysis data is accounting for a cure fraction, meaning that some subjects will never experience the event of interest. Mixture cure models have been frequently used to estimate both the probability of being cured and the time to event for the susceptible subjects, usually by assuming a parametric (logistic) form of the incidence. We propose a new estimation procedure for a parametric cure rate that relies on a preliminary smooth estimator and is independent of the model assumed for the latency. We investigate the theoretical properties of the estimators and show through simulations that, in the logistic/Cox model, presmoothing leads to more accurate results compared to the maximum likelihood estimator. To illustrate the practical use, we apply the new estimation procedure to two studies of melanoma survival data.
Submitted 14 June, 2021; v1 submitted 12 August, 2020;
originally announced August 2020.
-
Wilks' theorem for semiparametric regressions with weakly dependent data
Authors: Marie Du Roy de Chaumaray, Matthieu Marbac, Valentin Patilea
Abstract:
The empirical likelihood inference is extended to a class of semiparametric models for stationary, weakly dependent series. A partially linear single-index regression is used for the conditional mean of the series given its past, and the present and past values of a vector of covariates. A parametric model for the conditional variance of the series is added to capture further nonlinear effects. We propose a fixed number of suitable moment equations which characterize the mean and variance model. We derive an empirical log-likelihood ratio which includes nonparametric estimators of several functions, and we show that this ratio has the same limit as in the case where these functions are known.
Submitted 17 May, 2021; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Orthogonal Impulse Response Analysis in Presence of Time-Varying Covariance
Authors: Valentin Patilea, Hamdi Raïssi
Abstract:
In this paper the orthogonal impulse response functions (OIRF) are studied in the non-standard, though quite common, case where the covariance of the error vector is not constant in time. The usual approach for taking into account such behavior of the covariance consists in applying the standard tools to sub-periods of the whole sample. We underline that such a practice may lead to severe upward bias. We propose a new approach intended to give what we argue to be a more accurate summary of the time-varying OIRF. This consists in averaging the Cholesky decomposition of nonparametric covariance estimators. In addition, an index is developed to evaluate the heteroscedasticity effect on the OIRF analysis. The asymptotic behavior of the different estimators considered in the paper is investigated. The theoretical results are illustrated by Monte Carlo experiments. The analysis of the orthogonal response functions of the U.S. inflation to an oil price shock shows the relevance of the tools proposed herein for an appropriate analysis of economic variables.
Submitted 30 September, 2020; v1 submitted 24 March, 2020;
originally announced March 2020.
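The averaging step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the nonparametric covariance estimates are already available as a stack of positive definite matrices; the function name `average_cholesky` and the equal weighting are illustrative assumptions, not the authors' exact estimator.

```python
import numpy as np

def average_cholesky(covariances):
    """Average the lower-triangular Cholesky factors of a stack of covariances.

    covariances: array of shape (T, d, d), each slice symmetric positive definite.
    Returns a (d, d) lower-triangular matrix (hypothetical helper, equal weights).
    """
    covariances = np.asarray(covariances, dtype=float)
    # np.linalg.cholesky operates on stacked matrices and returns lower factors.
    factors = np.linalg.cholesky(covariances)
    return factors.mean(axis=0)

# Two diagonal covariances: the factors are diag(1, 2) and diag(3, 4).
covs = np.array([np.diag([1.0, 4.0]), np.diag([9.0, 16.0])])
avg_factor = average_cholesky(covs)               # diag(2, 3)
pooled = np.linalg.cholesky(covs.mean(axis=0))    # diag(sqrt(5), sqrt(10))
```

The two results differ, which shows in a toy case that averaging the factors is not the same as factoring a single pooled covariance.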
-
A likelihood-based approach for cure regression models
Authors: Kevin Burke, Valentin Patilea
Abstract:
We propose a new likelihood-based approach for estimation, inference and variable selection for parametric cure regression models in time-to-event analysis under random right-censoring. In this context, it often happens that some subjects are "cured", i.e., they will never experience the event of interest. Then, the sample of censored observations is an unlabeled mixture of cured and "susceptible" subjects. Using inverse probability censoring weighting (IPCW), we propose a likelihood-based estimation procedure for the cure regression model without making assumptions about the distribution of survival times for the susceptible subjects. The IPCW approach does require a preliminary estimate of the censoring distribution, for which general parametric, semi- or non-parametric approaches can be used. The incorporation of a penalty term in our estimation procedure is straightforward; in particular, we propose L1-type penalties for variable selection. Our theoretical results are derived under mild assumptions. Simulation experiments and real data analysis illustrate the effectiveness of the new approach.
Submitted 15 July, 2020; v1 submitted 13 December, 2018;
originally announced December 2018.
-
Testing second order dynamics for autoregressive processes in presence of time-varying variance
Authors: Valentin Patilea, Hamdi Raïssi
Abstract:
The volatility modeling for autoregressive univariate time series is considered. A benchmark approach is the stationary ARCH model of Engle (1982). Motivated by real data evidence, processes with non-constant unconditional variance and ARCH effects have been recently introduced. We take into account such possible non-stationarity and propose simple testing procedures for ARCH effects. Adaptive McLeod and Li's portmanteau and ARCH-LM tests for checking for second order dynamics are provided. The standard versions of these tests, commonly used by practitioners, suppose constant unconditional variance. We prove the failure of these standard tests with time-varying unconditional variance. The theoretical results are illustrated by means of simulated and real data.
Submitted 11 December, 2012;
originally announced December 2012.
-
Corrected portmanteau tests for VAR models with time-varying variance
Authors: Valentin Patilea, Hamdi Raïssi
Abstract:
The problem of test of fit for Vector AutoRegressive (VAR) processes with unconditionally heteroscedastic errors is studied. The volatility structure is deterministic but time-varying and allows for changes that are commonly observed in economic or financial multivariate series. Our analysis is based on the residual autocovariances and autocorrelations obtained from Ordinary Least Squares (OLS), Generalized Least Squares (GLS) and Adaptive Least Squares (ALS) estimation of the autoregressive parameters. The ALS approach is the GLS approach adapted to the unknown time-varying volatility that is then estimated by kernel smoothing. The properties of the three types of residual autocovariances and autocorrelations are derived. In particular it is shown that the ALS and GLS residual autocorrelations are asymptotically equivalent. It is also found that the asymptotic distribution of the OLS residual autocorrelations can be quite different from the standard chi-square asymptotic distribution obtained in a correctly specified VAR model with iid innovations. As a consequence the standard portmanteau tests are unreliable in our framework. The correct critical values of the standard portmanteau tests based on the OLS residuals are derived. Moreover, modified portmanteau statistics based on ALS residual autocorrelations are introduced. Portmanteau tests with modified statistics based on OLS and ALS residuals and standard chi-square asymptotic distributions under the null hypothesis are also proposed. An extension of our portmanteau approaches to testing the lag length in a vector error correction type model for co-integrating relations is briefly investigated. The finite sample properties of the goodness-of-fit tests we consider are investigated by Monte Carlo experiments. The theoretical results are also illustrated using two U.S. economic data sets.
Submitted 31 May, 2011; v1 submitted 18 May, 2011;
originally announced May 2011.
-
Adaptive estimation of vector autoregressive models with time-varying variance: application to testing linear causality in mean
Authors: Valentin Patilea, Hamdi Raïssi
Abstract:
Linear Vector AutoRegressive (VAR) models where the innovations could be unconditionally heteroscedastic and serially dependent are considered. The volatility structure is deterministic and quite general, including breaks or trending variances as special cases. In this framework we propose Ordinary Least Squares (OLS), Generalized Least Squares (GLS) and Adaptive Least Squares (ALS) procedures. The GLS estimator requires the knowledge of the time-varying variance structure while in the ALS approach the unknown variance is estimated by kernel smoothing with the outer product of the OLS residual vectors. Different bandwidths for the different cells of the time-varying variance matrix are also allowed. We derive the asymptotic distribution of the proposed estimators for the VAR model coefficients and compare their properties. In particular we show that the ALS estimator is asymptotically equivalent to the infeasible GLS estimator. This asymptotic equivalence is obtained uniformly with respect to the bandwidth(s) in a given range and hence justifies data-driven bandwidth rules. Using these results we build Wald tests for the linear Granger causality in mean which are adapted to VAR processes driven by errors with a non-stationary volatility. It is also shown that the commonly used standard Wald test for the linear Granger causality in mean is potentially unreliable in our framework. Monte Carlo experiments illustrate the use of the different estimation approaches for the analysis of VAR models with stable innovations.
Submitted 8 July, 2010; v1 submitted 7 July, 2010;
originally announced July 2010.
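The kernel smoothing of residual outer products described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: a Gaussian kernel, time rescaled to (0, 1], and a single common bandwidth for all cells (the abstract also allows cell-specific bandwidths). The names `gaussian_kernel` and `kernel_variance` are hypothetical, and the paper's exact weighting scheme may differ.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian density, used here as the smoothing kernel (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kernel_variance(residuals, bandwidth):
    """Kernel-weighted average of residual outer products at each time point.

    residuals: array of shape (n, d), e.g. OLS residual vectors of a VAR fit.
    bandwidth: smoothing parameter on the rescaled time axis t/n.
    Returns an array of shape (n, d, d) of time-varying variance estimates.
    """
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.shape[0]
    grid = np.arange(1, n + 1) / n  # rescaled time points t/n
    # weights[t, s] = K((t/n - s/n) / h), normalised so each row sums to one
    weights = gaussian_kernel((grid[:, None] - grid[None, :]) / bandwidth)
    weights /= weights.sum(axis=1, keepdims=True)
    # outer[s] is the d x d outer product of the residual at time s with itself
    outer = np.einsum("si,sj->sij", residuals, residuals)
    return np.einsum("ts,sij->tij", weights, outer)

# Usage on simulated residuals whose scale grows over time.
rng = np.random.default_rng(0)
scale = np.linspace(1.0, 3.0, 200)[:, None]
sigma_hat = kernel_variance(scale * rng.standard_normal((200, 2)), bandwidth=0.1)
```

Because the weights are positive and sum to one, each estimated matrix is symmetric and positive semi-definite by construction.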