Search | arXiv e-print repository

Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation

Authors: Jin-Hong Du, Pratik Patil, Arun Kumar Kuchibhotla

Abstract: We study subsampling-based ridge ensembles in the proportional asymptotics regime, where the feature size grows proportionally with the sample size such that their ratio converges to a constant. By analyzing the squared prediction risk of ridge ensembles as a function of the explicit penalty $\lambda$ and the limiting subsample aspect ratio $\phi_s$ (the ratio of the feature size to the subsample size), we characterize contours in the $(\lambda, \phi_s)$-plane at any achievable risk. As a consequence, we prove that the risk of the optimal full ridgeless ensemble (fitted on all possible subsamples) matches that of the optimal ridge predictor. In addition, we prove strong uniform consistency of generalized cross-validation (GCV) over the subsample sizes for estimating the prediction risk of ridge ensembles. This allows for GCV-based tuning of full ridgeless ensembles without sample splitting and yields a predictor whose risk matches the optimal ridge risk.

Submitted 16 July, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: 47 pages, 11 figures; this version fixes minor typos. arXiv admin note: text overlap with arXiv:2210.11445
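The ensemble and the GCV criterion described in this abstract can be illustrated with a minimal NumPy sketch. It assumes a ridge objective scaled as $(1/k)\|y_I - X_I b\|^2 + \lambda\|b\|^2$, random subsamples drawn without replacement, and the textbook GCV formula for a linear smoother; the function names `subsample_ridge_operator` and `ridge_ensemble_gcv` are hypothetical. The paper's exact GCV estimator and its full-ensemble ($M \to \infty$) setting may differ from this finite-$M$ approximation.

```python
import numpy as np


def subsample_ridge_operator(X, idx, lam):
    """Return A (p x k) such that the ridge fit on rows idx is beta = A @ y[idx].

    Assumed objective: (1/k) * ||y_I - X_I b||^2 + lam * ||b||^2.
    lam == 0 gives the minimum-norm least squares ("ridgeless") fit via pinv.
    """
    X_I = X[idx]
    k, p = X_I.shape
    if lam == 0:
        return np.linalg.pinv(X_I)
    # Normal equations: (X_I^T X_I / k + lam * I) b = X_I^T y_I / k
    return np.linalg.solve(X_I.T @ X_I / k + lam * np.eye(p), X_I.T / k)


def ridge_ensemble_gcv(X, y, k, lam, M, rng):
    """Average M ridge fits on random size-k subsamples (drawn without
    replacement) and score the ensemble with the standard GCV formula for the
    linear smoother S satisfying y_hat = S @ y on the full data."""
    n, p = X.shape
    S = np.zeros((n, n))
    beta_ens = np.zeros(p)
    for _ in range(M):
        idx = rng.choice(n, size=k, replace=False)
        A = subsample_ridge_operator(X, idx, lam)
        beta_ens += A @ y[idx] / M
        # Only the sampled responses enter this member, so only those columns of S change.
        S[:, idx] += X @ A / M
    resid = y - S @ y
    gcv = np.mean(resid ** 2) / (1.0 - np.trace(S) / n) ** 2
    return beta_ens, gcv, S


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 100
    X = rng.standard_normal((n, p))
    beta = rng.standard_normal(p) / np.sqrt(p)
    y = X @ beta + 0.5 * rng.standard_normal(n)
    # Tune the subsample size k of a (large-M) ridgeless ensemble by minimizing GCV.
    scores = {k: ridge_ensemble_gcv(X, y, k, lam=0.0, M=100, rng=rng)[1] for k in (40, 60, 80, 150)}
    print(scores, "-> chosen k:", min(scores, key=scores.get))
```

With `lam=0.0` and `k < p`, each member interpolates its own subsample, so `trace(S)` equals `k`; in this sketch the subsample size acts as the implicit regularization knob that GCV tunes, in the spirit of the $(\lambda, \phi_s)$ equivalence stated above.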

arXiv:2205.12937

Mitigating multiple descents: A model-agnostic framework for risk monotonization

Authors: Pratik Patil, Arun Kumar Kuchibhotla, Yuting Wei, Alessandro Rinaldo

Abstract: Recent empirical and theoretical analyses of several commonly used prediction procedures reveal a peculiar risk behavior in high dimensions, referred to as double/multiple descent, in which the asymptotic risk is a non-monotonic function of the limiting aspect ratio of the number of features or parameters to the sample size. To mitigate this undesirable behavior, we develop a general framework for risk monotonization based on cross-validation that takes as input a generic prediction procedure and returns a modified procedure whose out-of-sample prediction risk is, asymptotically, monotonic in the limiting aspect ratio. As part of our framework, we propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting, respectively, and show that, under very mild assumptions, they provably achieve monotonic asymptotic risk behavior. Our results are applicable to a broad variety of prediction procedures and loss functions, and do not require a well-specified (parametric) model. We exemplify our framework with concrete analyses of the minimum $\ell_2$- and $\ell_1$-norm least squares prediction procedures. As one of the ingredients in our analysis, we also derive novel additive and multiplicative forms of oracle risk inequalities for split cross-validation that are of independent interest.

Submitted 25 May, 2022; originally announced May 2022.

Comments: 110 pages, 15 figures
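The zero-step idea (akin to bagging) can be loosely sketched as holdout selection over subsample sizes. This is a simplified illustration under stated assumptions, not the paper's procedure: the base predictor is minimum $\ell_2$-norm least squares, each candidate size $k$ is bagged over $M$ random subsamples of a training split, and the size with the smallest holdout squared error is kept. The names `min_norm_ls` and `zero_step_select` and all parameters are hypothetical.

```python
import numpy as np


def min_norm_ls(X, y):
    """Minimum l2-norm least squares fit (interpolates when rows < columns)."""
    return np.linalg.pinv(X) @ y


def zero_step_select(X, y, sizes, M, n_val, rng):
    """Hypothetical zero-step-style selection: for each subsample size k, average
    M min-norm LS fits on size-k subsamples of the training split, score the
    bagged fit on a holdout split, and return the best (k, risk, beta)."""
    n = X.shape[0]
    perm = rng.permutation(n)
    val, tr = perm[:n_val], perm[n_val:]
    X_tr, y_tr = X[tr], y[tr]
    best = None
    for k in sizes:
        if k > len(tr):
            raise ValueError("subsample size exceeds training split size")
        fits = []
        for _ in range(M):
            idx = rng.choice(len(tr), size=k, replace=False)
            fits.append(min_norm_ls(X_tr[idx], y_tr[idx]))
        beta = np.mean(fits, axis=0)  # bagged coefficient vector
        risk = np.mean((y[val] - X[val] @ beta) ** 2)  # holdout squared error
        if best is None or risk < best[1]:
            best = (k, risk, beta)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 300, 120
    X = rng.standard_normal((n, p))
    y = X @ (rng.standard_normal(p) / np.sqrt(p)) + rng.standard_normal(n)
    k, risk, _ = zero_step_select(X, y, sizes=[40, 80, 120, 160, 200, 240], M=20, n_val=60, rng=rng)
    print(f"selected k={k}, holdout risk={risk:.3f}")
```

Because the selection may use fewer samples than are available, a size near the interpolation threshold ($k \approx p$) is simply not chosen when its holdout risk spikes; this is the intuition behind monotonizing the risk in the aspect ratio.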

arXiv:2006.05022

Near-Optimal Confidence Sequences for Bounded Random Variables

Authors: Arun Kumar Kuchibhotla, Qinqing Zheng

Abstract: Many inference problems, such as sequential decision problems like A/B testing and adaptive sampling schemes like bandit selection, are often online in nature. The fundamental problem for online inference is to provide a sequence of confidence intervals that are valid uniformly over sample sizes growing to infinity. To address this question, we provide a near-optimal confidence sequence for bounded random variables by utilizing Bentkus' concentration results. We show that it improves on existing approaches that use the Cramér-Chernoff technique, such as the Hoeffding, Bernstein, and Bennett inequalities. The resulting confidence sequence is confirmed to be favorable in both synthetic coverage problems and an application to adaptive stopping algorithms.

Submitted 3 June, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

Comments: Accepted to ICML 2021
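For contrast with the Bentkus-based construction, here is a minimal sketch of the kind of Cramér-Chernoff baseline the abstract mentions: a two-sided Hoeffding interval made time-uniform by a union bound with per-time level $\alpha_t = \alpha / (t(t+1))$, which sums to $\alpha$ over $t \ge 1$. It is an illustrative assumption, not the paper's near-optimal sequence; the function name `hoeffding_cs` and the bound parameters `lo`, `hi` are hypothetical.

```python
import math
import random


def hoeffding_cs(xs, alpha=0.05, lo=0.0, hi=1.0):
    """Hoeffding confidence sequence for the mean of i.i.d. values in [lo, hi].

    At time t the interval uses level alpha_t = alpha / (t * (t + 1)); since
    sum_{t>=1} 1 / (t * (t + 1)) = 1, a union bound makes the whole sequence
    valid simultaneously over all t with probability at least 1 - alpha.
    """
    intervals = []
    total = 0.0
    for t, x in enumerate(xs, start=1):
        total += x
        alpha_t = alpha / (t * (t + 1))
        # Two-sided Hoeffding: P(|mean - mu| >= eps) <= 2 exp(-2 t eps^2 / (hi - lo)^2)
        width = (hi - lo) * math.sqrt(math.log(2 / alpha_t) / (2 * t))
        mean = total / t
        intervals.append((max(lo, mean - width), min(hi, mean + width)))
    return intervals


if __name__ == "__main__":
    random.seed(0)
    xs = [random.random() for _ in range(1000)]  # i.i.d. Uniform[0, 1], true mean 0.5
    cs = hoeffding_cs(xs, alpha=0.05)
    print(cs[9], cs[99], cs[999])
```

The union bound wastes part of the error budget at every time step, and the Hoeffding bound ignores variance; the abstract reports that the Bentkus-based sequence improves on such Cramér-Chernoff constructions.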

arXiv:1910.10562

doi 10.1016/j.patcog.2021.108496

Nested conformal prediction and quantile out-of-bag ensemble methods

Authors: Chirag Gupta, Arun K. Kuchibhotla, Aaditya K. Ramdas

Abstract: Conformal prediction is a popular tool for providing valid prediction sets for classification and regression problems, without relying on any distributional assumptions on the data. While the traditional description of conformal prediction starts with a nonconformity score, we provide an alternate (but equivalent) view that starts with a sequence of nested sets and calibrates them to find a valid prediction set. The nested framework subsumes all nonconformity scores, including recent proposals based on quantile regression and density estimation. While these ideas were originally derived based on sample splitting, our framework seamlessly extends them to other aggregation schemes like cross-conformal, jackknife+ and out-of-bag methods. We use the framework to derive a new algorithm (QOOB, pronounced cube) that combines four ideas: quantile regression, cross-conformalization, ensemble methods and out-of-bag predictions. We develop a computationally efficient implementation of cross-conformal prediction, which is also used by QOOB. In a detailed numerical investigation, QOOB performs either the best or close to the best on all simulated and real datasets. Code for QOOB is available at https://github.com/aigen/QOOB.

Submitted 9 May, 2022; v1 submitted 23 October, 2019; originally announced October 2019.

Comments: 38 pages, 4 figures, 8 tables. This version fixes a bug in the proof of Proposition 3. Published paper available at https://www.sciencedirect.com/science/article/abs/pii/S0031320321006725

Journal ref: Pattern Recognition 127 (2022): 108496
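The nested-set view can be made concrete in the simplest split-conformal case. The sketch below assumes a pre-fitted point predictor `predict` (hypothetical) and nested intervals $F_t(x) = [\hat f(x) - t, \hat f(x) + t]$, which grow with $t$. The score of a calibration point is the smallest $t$ whose set contains its response, and $t$ is calibrated as the $\lceil (n+1)(1-\alpha) \rceil$-th smallest score. It does not implement QOOB, cross-conformal, jackknife+, or out-of-bag aggregation; `nested_split_conformal` is a hypothetical name.

```python
import math

import numpy as np


def nested_split_conformal(predict, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal with nested sets F_t(x) = [predict(x) - t, predict(x) + t].

    The score of (x, y) is inf{t : y in F_t(x)} = |y - predict(x)|; the chosen
    t is the ceil((n + 1)(1 - alpha))-th smallest calibration score (infinite
    if that rank exceeds n), giving marginal coverage >= 1 - alpha under
    exchangeability.
    """
    scores = np.abs(y_cal - predict(X_cal))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    t_hat = np.inf if rank > n else np.sort(scores)[rank - 1]
    mu = predict(X_test)
    return mu - t_hat, mu + t_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    predict = lambda X: 2.0 * X[:, 0]  # hypothetical pre-fitted model
    X_cal = rng.standard_normal((500, 1))
    y_cal = 2.0 * X_cal[:, 0] + rng.standard_normal(500)
    X_te = rng.standard_normal((2000, 1))
    y_te = 2.0 * X_te[:, 0] + rng.standard_normal(2000)
    lo, hi = nested_split_conformal(predict, X_cal, y_cal, X_te, alpha=0.1)
    print("empirical coverage:", np.mean((lo <= y_te) & (y_te <= hi)))
```

Swapping in a different nested family (for example, intervals built from quantile-regression estimates) changes only how `scores` and the final sets are computed; the calibration step stays the same, which is the point of the nested view.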

arXiv:1909.02088

On Least Squares Estimation under Heteroscedastic and Heavy-Tailed Errors

Authors: Arun K. Kuchibhotla, Rohit K. Patra

Abstract: We consider least squares estimation in a general nonparametric regression model. The rate of convergence of the least squares estimator (LSE) for the unknown regression function is well studied when the errors are sub-Gaussian. We find upper bounds on the rates of convergence of the LSE when the errors have uniformly bounded conditional variance and have only finitely many moments. We show that the interplay between the moment assumptions on the error, the metric entropy of the class of functions involved, and the "local" structure of the function class around the truth drives the rate of convergence of the LSE. We find sufficient conditions on the errors under which the rate of the LSE matches the rate of the LSE under sub-Gaussian errors. Our results are finite sample and allow for heteroscedastic and heavy-tailed errors.

Submitted 8 April, 2021; v1 submitted 4 September, 2019; originally announced September 2019.

Comments: 49 pages, 2 figures, and 3 tables

Showing 1–5 of 5 results for author: Kuchibhotla, A K