
Scalable Subsampling Inference for Deep Neural Networks

Abstract: Deep neural networks (DNN) have received increasing attention in machine learning applications in the last several years. Recently, a non-asymptotic error bound has been developed to measure the performance of the fully connected DNN estimator with ReLU activation functions for estimating regression models. The paper at hand gives a small improvement on the current error bound based on the latest results on the approximation ability of DNN. More importantly, however, a non-random subsampling technique--scalable subsampling--is applied to construct a `subagged' DNN estimator. Under regularity conditions, it is shown that the subagged DNN estimator is computationally efficient without sacrificing accuracy for either estimation or prediction tasks. Beyond point estimation/prediction, we propose different approaches to build confidence and prediction intervals based on the subagged DNN estimator. In addition to being asymptotically valid, the proposed confidence/prediction intervals appear to work well in finite samples. All in all, the scalable subsampling DNN estimator offers the complete package in terms of statistical inference, i.e., (a) computational efficiency; (b) point estimation/prediction accuracy; and (c) allowing for the construction of practically useful confidence and prediction intervals.

Submitted 13 May, 2024; originally announced May 2024.
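
Illustration (not from the paper): the subagging idea in the abstract above can be sketched in a few lines. The sketch assumes the usual scalable-subsampling layout, in which a sample of size n is cut into q = floor((n - b) / h) + 1 deterministic blocks of length b, each shifted by h from the previous one; the abstract itself does not state these formulas, so treat them as an assumption. The names `scalable_subsample_blocks` and `subagged_estimate` are hypothetical, and the sample mean stands in for a trained DNN purely to keep the example small.

```python
from statistics import fmean


def scalable_subsample_blocks(n, b, h):
    """Return q = (n - b) // h + 1 index blocks of length b, each shifted by h.

    Assumed layout (not stated in the abstract): block j covers the
    0-based indices j*h, ..., j*h + b - 1.
    """
    if not 1 <= b <= n or h < 1:
        raise ValueError("need 1 <= b <= n and h >= 1")
    q = (n - b) // h + 1
    return [range(j * h, j * h + b) for j in range(q)]


def subagged_estimate(data, estimator, b, h):
    """Apply the estimator to every block and average the results."""
    blocks = scalable_subsample_blocks(len(data), b, h)
    per_block = [estimator([data[i] for i in block]) for block in blocks]
    return fmean(per_block)


# Toy data; the sample mean is a stand-in for a fitted DNN estimator.
data = [float(i % 7) for i in range(1000)]
blocks = scalable_subsample_blocks(len(data), b=100, h=100)
estimate = subagged_estimate(data, fmean, b=100, h=100)
```

With h = b the blocks do not overlap, so each observation is used once; because each block is much smaller than the full sample, the per-block fits are cheap and can run in parallel, which is the computational point the abstract makes.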

arXiv:2311.00294 [pdf, ps, other]

Multi-step ahead prediction intervals for non-parametric autoregressions via bootstrap: consistency, debiasing and pertinence

Authors: Dimitris N. Politis, Ke** Wu

Abstract: To address the difficult problem of multi-step ahead prediction of non-parametric autoregressions, we consider a forward bootstrap approach. Employing a local constant estimator, we can analyze a general type of non-parametric time series model, and show that the proposed point predictions are consistent with the true optimal predictor. We construct a quantile prediction interval that is asymptotically valid. Moreover, using a debiasing technique, we can asymptotically approximate the distribution of multi-step ahead non-parametric estimation by bootstrap. As a result, we can build bootstrap prediction intervals that are pertinent, i.e., can capture the model estimation variability, thus improving upon the standard quantile prediction intervals. Simulation studies are given to illustrate the performance of our point predictions and pertinent prediction intervals for finite samples.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2306.04126 [pdf, ps, other]

Bootstrap Prediction Inference of Non-linear Autoregressive Models

Authors: Ke** Wu, Dimitris N. Politis

Abstract: The non-linear autoregressive (NLAR) model plays an important role in modeling and predicting time series. One-step ahead prediction is straightforward using the NLAR model, but the multi-step ahead prediction is cumbersome. For instance, iterating the one-step ahead predictor is a convenient strategy for linear autoregressive (LAR) models, but it is suboptimal under NLAR. In this paper, we first propose a simulation and/or bootstrap algorithm to construct optimal point predictors under an $L_1$ or $L_2$ loss criterion. In addition, we construct bootstrap prediction intervals in the multi-step ahead prediction problem; in particular, we develop an asymptotically valid quantile prediction interval as well as a pertinent prediction interval for future values. In order to correct the undercoverage of prediction intervals with finite samples, we further employ predictive -- as opposed to fitted -- residuals in the bootstrap process. Simulation studies are also given to substantiate the finite sample performance of our methods.

Submitted 6 June, 2023; originally announced June 2023.
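
Illustration (not from the paper): the remark that iterating the one-step predictor is suboptimal under a non-linear autoregression can be checked on a toy model. The sketch below assumes a hypothetical model X_t = g(X_{t-1}) + e_t with g(x) = 0.8|x| and standard normal errors, both chosen only for the example; in practice g and the error distribution would be estimated and the errors resampled from residuals. It simulates two-step-ahead values and takes their mean ($L_2$ loss) and median ($L_1$ loss).

```python
import math
import random
from statistics import fmean, median


def g(x):
    """Hypothetical autoregression function, chosen only for illustration."""
    return 0.8 * abs(x)


def two_step_predictors(x_n, n_sim=20000, seed=0):
    """Simulate X_{n+2} given X_n = x_n; return (L2, L1) point predictors."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sim):
        x_next = g(x_n) + rng.gauss(0.0, 1.0)           # X_{n+1}
        draws.append(g(x_next) + rng.gauss(0.0, 1.0))   # X_{n+2}
    return fmean(draws), median(draws)


x_n = 0.0
iterated = g(g(x_n))                       # naive plug-in predictor
l2_pred, l1_pred = two_step_predictors(x_n)
# For this toy model, E[X_{n+2} | X_n = 0] = 0.8 * E|e| = 0.8 * sqrt(2 / pi).
exact_l2 = 0.8 * math.sqrt(2.0 / math.pi)
```

Here the iterated predictor equals 0 while the conditional mean is 0.8·sqrt(2/π), so the simulated $L_2$ predictor lands near the latter rather than the former.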

arXiv:2212.03079 [pdf, other]

Model-Based and Model-Free point prediction algorithms for locally stationary random fields

Authors: Srinjoy Das, Yiwen Zhang, Dimitris N. Politis

Abstract: The Model-free Prediction Principle has been successfully applied to general regression problems, as well as problems involving stationary and locally stationary time series. In this paper we demonstrate how Model-Free Prediction can be applied to handle random fields that are only locally stationary, i.e., they can be assumed to be stationary only across a limited part of their entire region of definition. We construct one-step-ahead point predictors and compare the performance of Model-free to Model-based prediction using models that incorporate a trend and/or heteroscedasticity. Both aspects of the paper, Model-free and Model-based, are novel in the context of random fields that are locally (but not globally) stationary. We demonstrate the application of our Model-based and Model-free point prediction methods to synthetic data as well as images from the CIFAR-10 dataset and in the latter case show that our best Model-free point prediction results outperform those obtained using Model-based prediction.

Submitted 6 December, 2022; originally announced December 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:1712.02383

arXiv:2212.02584 [pdf, ps, other]

Local Quadratic Spectral and Covariance Matrix Estimation

Authors: Tucker S. McElroy, Dimitris N. Politis

Abstract: The problem of estimating the spectral density matrix $f(w)$ of a multivariate time series is revisited with special focus on the frequencies $w=0$ and $w=\pi$. Recognizing that the entries of the spectral density matrix at these two boundary points are real-valued, we propose a new estimator constructed from a local polynomial regression of the real portion of the multivariate periodogram. The case $w=0$ is of particular importance, since $f(0)$ is associated with the large-sample covariance matrix of the sample mean; hence, estimating $f(0)$ is crucial in order to conduct any sort of statistical inference on the mean. We explore the properties of the local polynomial estimator through theory and simulations, and discuss an application to inflation and unemployment.

Submitted 5 December, 2022; originally announced December 2022.

arXiv:2112.08671 [pdf, other]

Model-free Bootstrap Prediction Regions for Multivariate Time Series

Authors: Yiren Wang, Dimitris N. Politis

Abstract: In Das and Politis (2020), a model-free bootstrap (MFB) paradigm was proposed for generating prediction intervals of univariate, (locally) stationary time series. Theoretical guarantees for this algorithm were established in Wang and Politis (2019) under stationarity and weak dependence conditions. Following this line of work, here we extend MFB for predictive inference under a multivariate time series setup. We describe two algorithms: the first works for a particular class of time series under any fixed dimension d; the second works for a more general class of time series under a low-dimensional setting. We justify our procedure through theoretical validity and simulation performance.

Submitted 16 December, 2021; originally announced December 2021.

Comments: This is an initial version of the paper. A generalization to our setting is under investigation

arXiv:2109.12156 [pdf, other]

Model-free Bootstrap and Conformal Prediction in Regression: Conditionality, Conjecture Testing, and Pertinent Prediction Intervals

Authors: Yiren Wang, Dimitris N. Politis

Abstract: Predictive inference under a general regression setting is gaining more interest in the big-data era. In terms of going beyond point prediction to develop prediction intervals, two main threads of development are conformal prediction and Model-free prediction. Recently, Chernozhukov et al. (2021) proposed a new conformal prediction approach exploiting the same uniformization procedure as in the Model-free Bootstrap of Politis (2015). Hence, it is of interest to compare and further investigate the performance of the two methods. In the paper at hand, we contrast the two approaches via theoretical analysis and numerical experiments with a focus on conditional coverage of prediction intervals. We discuss suitable scenarios for applying each algorithm, underscore the importance of conditional vs. unconditional coverage, and show that, under mild conditions, the Model-free bootstrap yields prediction intervals with guaranteed better conditional coverage compared to quantile estimation. We also extend the concept of `pertinence' of prediction intervals in Politis (2015) to the nonparametric regression setting, and give concrete examples where its importance emerges under finite sample scenarios. Finally, we define the new notion of `conjecture testing' that is the analog of hypothesis testing as applied to the prediction problem; we also devise a modified conformal score to allow conformal prediction to handle one-sided `conjecture tests', and compare to the Model-free bootstrap.

Submitted 23 October, 2021; v1 submitted 24 September, 2021; originally announced September 2021.

arXiv:2009.08071 [pdf, other]

Ridge Regression Revisited: Debiasing, Thresholding and Bootstrap

Authors: Yunyi Zhang, Dimitris N. Politis

Abstract: The success of the Lasso in the era of high-dimensional data can be attributed to its conducting an implicit model selection, i.e., zeroing out regression coefficients that are not significant. By contrast, classical ridge regression cannot reveal a potential sparsity of parameters, and may also introduce a large bias under the high-dimensional setting. Nevertheless, recent work on the Lasso involves debiasing and thresholding, the latter in order to further enhance the model selection. As a consequence, ridge regression may be worth another look since -- after debiasing and thresholding -- it may offer some advantages over the Lasso, e.g., it can be easily computed using a closed-form expression. In this paper, we define a debiased and thresholded ridge regression method, and prove a consistency result and a Gaussian approximation theorem. We further introduce a wild bootstrap algorithm to construct confidence regions and perform hypothesis testing for a linear combination of parameters. In addition to estimation, we consider the problem of prediction, and present a novel, hybrid bootstrap algorithm tailored for prediction intervals. Extensive numerical simulations further show that the debiased and thresholded ridge regression has favorable finite sample performance and may be preferable in some settings.

Submitted 22 April, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

Comments: 2 figures, 37 pages

arXiv:2005.09145 [pdf, other]

Bootstrap prediction intervals with asymptotic conditional validity and unconditional guarantees

Authors: Yunyi Zhang, Dimitris N. Politis

Abstract: It can be argued that optimal prediction should take into account all available data. Therefore, to evaluate a prediction interval's performance one should employ conditional coverage probability, conditioning on all available observations. Focusing on a linear model, we derive the asymptotic distribution of the difference between the conditional coverage probability of a nominal prediction interval and the conditional coverage probability of a prediction interval obtained via a residual-based bootstrap. Applying this result, we show that a prediction interval generated by the residual-based bootstrap has approximately a 50% probability of yielding conditional under-coverage. We then develop a new bootstrap algorithm that generates a prediction interval that asymptotically controls both the conditional coverage probability as well as the possibility of conditional under-coverage. We complement the asymptotic results with several finite-sample simulations.

Submitted 27 February, 2021; v1 submitted 18 May, 2020; originally announced May 2020.

Comments: 27 pages and 2 figures
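
Illustration (not from the paper): the baseline that the abstract above analyzes, a residual-based bootstrap prediction interval in a linear model, can be sketched as follows. This is a textbook-style version for simple linear regression built from bootstrap "predictive roots" (future value minus its prediction, both in the bootstrap world); it is not the new algorithm the paper develops. The helper names `ols_fit` and `residual_bootstrap_interval`, the synthetic data, and the quantile indexing rule are all assumptions made for the example.

```python
import random
from statistics import fmean


def ols_fit(xs, ys):
    """Least-squares intercept and slope for simple linear regression."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def residual_bootstrap_interval(xs, ys, x_f, alpha=0.1, n_boot=500, seed=0):
    """Return (lower, point, upper) for a future response at regressor x_f."""
    rng = random.Random(seed)
    a, b = ols_fit(xs, ys)
    fitted = [a + b * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]  # mean zero when an intercept is fit
    roots = []
    for _ in range(n_boot):
        # Bootstrap sample: fitted values plus resampled residuals, then refit.
        y_star = [f + rng.choice(resid) for f in fitted]
        a_s, b_s = ols_fit(xs, y_star)
        # Bootstrap future value minus bootstrap prediction.
        y_f_star = a + b * x_f + rng.choice(resid)
        roots.append(y_f_star - (a_s + b_s * x_f))
    roots.sort()
    lo = roots[round(alpha / 2 * (n_boot - 1))]
    hi = roots[round((1 - alpha / 2) * (n_boot - 1))]
    point = a + b * x_f
    return point + lo, point, point + hi


# Synthetic data: y = 1 + 2x + N(0, 1) noise (hypothetical example values).
data_rng = random.Random(1)
xs = [i / 10 for i in range(100)]
ys = [1.0 + 2.0 * x + data_rng.gauss(0.0, 1.0) for x in xs]
lower, point, upper = residual_bootstrap_interval(xs, ys, x_f=5.0)
```

The interval's unconditional coverage is what simulations usually report; the abstract's point is that, conditionally on the observed data, an interval of this kind under-covers about half the time.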

arXiv:1712.02383 [pdf, other]

Predictive inference for locally stationary time series with an application to climate data

Authors: Srinjoy Das, Dimitris N. Politis

Abstract: The Model-free Prediction Principle of Politis (2015) has been successfully applied to general regression problems, as well as problems involving stationary time series. However, with long time series, e.g. annual temperature measurements spanning over 100 years or daily financial returns spanning several years, it may be unrealistic to assume stationarity throughout the span of the dataset. In the paper at hand, we show how Model-free Prediction can be applied to handle time series that are only locally stationary, i.e., they can be assumed to be stationary only over short time-windows. Surprisingly, there is little literature on point prediction for general locally stationary time series even in model-based setups, and there is no literature on the construction of prediction intervals for locally stationary time series. We attempt to fill this gap here as well. Both one-step-ahead point predictors and prediction intervals are constructed, and the performance of model-free is compared to model-based prediction using models that incorporate a trend and/or heteroscedasticity. Both aspects of the paper, model-free and model-based, are novel in the context of time series that are locally (but not globally) stationary. We also demonstrate the application of our Model-based and Model-free prediction methods to speleothem climate data, which exhibit local stationarity, and show that our best model-free point prediction results outperform those obtained with the RAMPFIT algorithm previously used for analysis of this data.

Submitted 11 June, 2018; v1 submitted 6 December, 2017; originally announced December 2017.

arXiv:1704.00674 [pdf, other]

Nonparametric estimation of the conditional distribution at regression boundary points

Authors: Srinjoy Das, Dimitris N. Politis

Abstract: Nonparametric regression is a standard statistical tool with increased importance in the Big Data era. Boundary points pose additional difficulties but local polynomial regression can be used to alleviate them. Local linear regression, for example, is easy to implement and performs quite well at both interior and boundary points. Estimating the conditional distribution function and/or the quantile function at a given regressor point is immediate via standard kernel methods but problems ensue if local linear methods are to be used. In particular, the distribution function estimator is not guaranteed to be monotone increasing, and the quantile curves can "cross". In the paper at hand, a simple method of correcting the local linear distribution estimator for monotonicity is proposed, and its good performance is demonstrated via simulations and real data examples.

Submitted 3 April, 2017; originally announced April 2017.

arXiv:1608.04039 [pdf, other]

Bootstrap Seasonal Unit Root Test under Periodic Variation

Authors: Nan Zou, Dimitris N. Politis

Abstract: Both seasonal unit roots and periodic variation can be prevalent in seasonal data. When testing seasonal unit roots under periodic variation, the validity of the existing methods, such as the HEGY test, remains unknown. This paper analyzes the behavior of the augmented HEGY test and the unaugmented HEGY test under periodic variation. It turns out that the asymptotic null distributions of the HEGY statistics testing the single roots at $1$ or $-1$ when there is periodic variation are identical to the asymptotic null distributions when there is no periodic variation. On the other hand, the asymptotic null distributions of the statistics testing any coexistence of roots at $1$, $-1$, $i$, or $-i$ when there is periodic variation are non-standard and are different from the asymptotic null distributions when there is no periodic variation. Therefore, when periodic variation exists, HEGY tests are not directly applicable to the joint tests for any concurrence of seasonal unit roots. As a remedy, bootstrap is proposed; in particular, the augmented HEGY test with seasonal independent and identically distributed (iid) bootstrap and the unaugmented HEGY test with seasonal block bootstrap are implemented. The consistency of these bootstrap procedures is established. The finite-sample behavior of these bootstrap tests is illustrated via simulation and is shown to prevail over that of their competitors. Finally, these bootstrap tests are applied to detect the seasonal unit roots in various economic time series.

Submitted 21 September, 2019; v1 submitted 13 August, 2016; originally announced August 2016.

arXiv:0903.3014 [pdf, other]

CDF and Survival Function Estimation with Infinite-Order Kernels

Authors: Arthur Berg, Dimitris N. Politis

Abstract: A reduced-bias nonparametric estimator of the cumulative distribution function (CDF) and the survival function is proposed using infinite-order kernels. Fourier transform theory on generalized functions is utilized to obtain the improved bias estimates. The new estimators are analyzed in terms of their relative deficiency to the empirical distribution function and Kaplan-Meier estimator, and even improvements in terms of asymptotic relative efficiency (ARE) are present under specified assumptions on the data. The deficiency analysis introduces a deficiency rate which provides a continuum between the classical deficiency analysis and an efficiency analysis. Additionally, an automatic bandwidth selection algorithm, specially tailored to the infinite-order kernels, is incorporated into the estimators. In small sample sizes these estimators can significantly improve the estimation of the CDF and survival function as is illustrated through the deficiency analysis and computer simulations.

Submitted 17 March, 2009; originally announced March 2009.

arXiv:0705.2214 [pdf]

Bagging multiple comparisons from microarray data

Authors: Dimitris N. Politis

Abstract: The problem of large-scale simultaneous hypothesis testing is re-visited. Bagging and subagging procedures are put forth with the purpose of improving the discovery power of the tests. The procedures are implemented in both simulated and real data. It is shown that bagging and subagging significantly improve power at the cost of a small increase in false discovery rate with the proposed `maximum contrast' subagging having an edge over bagging, i.e., yielding similar power but significantly smaller false discovery rates.

Submitted 15 May, 2007; originally announced May 2007.

Showing 1–14 of 14 results for author: Politis, D N