Showing 1–34 of 34 results for author: Kuchibhotla, A K

Searching in archive math.
  1. arXiv:2405.06437  [pdf, ps, other]

    math.ST

    Generalized van Trees inequality: Local minimax bounds for non-smooth functionals and irregular statistical models

    Authors: Kenta Takatsu, Arun Kumar Kuchibhotla

    Abstract: In a decision-theoretic framework, minimax lower bound provides the worst-case performance of estimators relative to a given class of statistical models. For parametric and semiparametric models, the Hájek--Le Cam local asymptotic minimax (LAM) theorem provides the optimal and sharp asymptotic lower bound. Despite its relative generality, this result comes with limitations as it only applies to th…

    Submitted 10 May, 2024; originally announced May 2024.

  2. arXiv:2403.06357  [pdf, other]

    math.ST

    Inference for Median and a Generalization of HulC

    Authors: Manit Paul, Arun Kumar Kuchibhotla

    Abstract: Constructing distribution-free confidence intervals for the median, a classic problem in statistics, has seen numerous solutions in the literature. While coverage validity has received ample attention, less has been explored about interval width. Our study breaks new ground by investigating the width of these intervals under non-standard assumptions. Surprisingly, we find that properly scaled, the…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: 75 pages, 10 figures

  3. arXiv:2311.16598  [pdf, ps, other]

    math.ST stat.ME

    Rectangular Hull Confidence Regions for Multivariate Parameters

    Authors: Aniket Jain, Arun K Kuchibhotla

    Abstract: We introduce three notions of multivariate median bias, namely, rectilinear, Tukey, and orthant median bias. Each of these median biases is zero under a suitable notion of multivariate symmetry. We study the coverage probabilities of rectangular hull of $B$ independent multivariate estimators, with special attention to the number of estimators $B$ needed to ensure a miscoverage of at most $α$. It…

    Submitted 5 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Added the proof of Proposition 3.9

  4. arXiv:2310.20058  [pdf, other]

    math.ST stat.ME

    New Asymptotic Limit Theory and Inference for Monotone Regression

    Authors: Soham Mallick, Siddhaarth Sarkar, Arun Kumar Kuchibhotla

    Abstract: Nonparametric regression problems with qualitative constraints such as monotonicity or convexity are ubiquitous in applications. For example, in predicting the yield of a factory in terms of the number of labor hours, the monotonicity of the conditional mean function is a natural constraint. One can estimate a monotone conditional mean function using nonparametric least squares estimation, which i…

    Submitted 17 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Additional simulations added along with link to R code files

  5. arXiv:2307.16798  [pdf, other]

    stat.ME math.ST stat.AP

    Forster-Warmuth Counterfactual Regression: A Unified Learning Approach

    Authors: Yachong Yang, Arun Kumar Kuchibhotla, Eric Tchetgen Tchetgen

    Abstract: Series or orthogonal basis regression is one of the most popular non-parametric regression techniques in practice, obtained by regressing the response on features generated by evaluating the basis functions at observed covariate values. The most routinely used series estimator is based on ordinary least squares fitting, which is known to be minimax rate optimal in various settings, albeit under st…

    Submitted 20 March, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: Add grant acknowledgement

  6. arXiv:2307.05732  [pdf, ps, other]

    stat.ME math.ST

    From isotonic to Lipschitz regression: a new interpolative perspective on shape-restricted estimation

    Authors: Kenta Takatsu, Tianyu Zhang, Arun Kumar Kuchibhotla

    Abstract: This manuscript seeks to bridge two seemingly disjoint paradigms of nonparametric regression estimation based on smoothness assumptions and shape constraints. The proposed approach is motivated by a conceptually simple observation: Every Lipschitz function is a sum of monotonic and linear functions. This principle is further generalized to the higher-order monotonicity and multivariate covariates.…

    Submitted 20 June, 2024; v1 submitted 11 July, 2023; originally announced July 2023.

  7. arXiv:2307.00795  [pdf, other]

    math.ST

    Inference for Projection Parameters in Linear Regression: beyond $d = o(n^{1/2})$

    Authors: Woonyoung Chang, Arun Kumar Kuchibhotla, Alessandro Rinaldo

    Abstract: We consider the problem of inference for projection parameters in linear regression with increasing dimensions. This problem has been studied under a variety of assumptions in the literature. The classical asymptotic normality result for the least squares estimator of the projection parameter only holds when the dimension $d$ of the covariates is of a smaller order than $n^{1/2}$, where $n$ is the…

    Submitted 11 January, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: Updated Jan 11, 2024

  8. arXiv:2306.14382  [pdf, ps, other]

    math.PR math.ST stat.AP

    Central Limit Theorems and Approximation Theory: Part II

    Authors: Arun Kumar Kuchibhotla

    Abstract: In Part I of this article (Banerjee and Kuchibhotla (2023)), we introduced a new method to bound the difference in expectations of an average of independent random vectors and the limiting Gaussian random vector using level sets. In the current article, we further explore this idea using finite sample Edgeworth expansions and also establish integral representation theorems.

    Submitted 25 June, 2023; originally announced June 2023.

  9. arXiv:2306.14299  [pdf, ps, other]

    math.PR math.ST

    Dual Induction CLT for High-dimensional m-dependent Data

    Authors: Heejong Bong, Arun Kumar Kuchibhotla, Alessandro Rinaldo

    Abstract: We derive novel and sharp high-dimensional Berry--Esseen bounds for the sum of $m$-dependent random vectors over the class of hyper-rectangles exhibiting only a poly-logarithmic dependence in the dimension. Our results hold under minimal assumptions, such as non-degenerate covariances and finite third moments, and yield a sample complexity of order $\sqrt{m/n}$, aside from logarithmic terms, match…

    Submitted 16 November, 2023; v1 submitted 25 June, 2023; originally announced June 2023.

    Comments: 25 pages

    MSC Class: 60B12; 60F05

  10. arXiv:2306.05947  [pdf, ps, other]

    math.ST stat.AP

    Central Limit Theorems and Approximation Theory: Part I

    Authors: Arisina Banerjee, Arun K Kuchibhotla

    Abstract: Central limit theorems (CLTs) have a long history in probability and statistics. They play a fundamental role in constructing valid statistical inference procedures. Over the last century, various techniques have been developed in probability and statistics to prove CLTs under a variety of assumptions on random variables. Quantitative versions of CLTs (e.g., Berry--Esseen bounds) have also been pa…

    Submitted 25 June, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: 25 pages

  11. arXiv:2304.13016  [pdf, other]

    math.ST cs.LG stat.ML

    Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation

    Authors: Jin-Hong Du, Pratik Patil, Arun Kumar Kuchibhotla

    Abstract: We study subsampling-based ridge ensembles in the proportional asymptotics regime, where the feature size grows proportionally with the sample size such that their ratio converges to a constant. By analyzing the squared prediction risk of ridge ensembles as a function of the explicit penalty $λ$ and the limiting subsample aspect ratio $φ_s$ (the ratio of the feature size to the subsample size), we…

    Submitted 16 July, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: 47 pages, 11 figures; this version fixes minor typos. arXiv admin note: text overlap with arXiv:2210.11445

  12. arXiv:2302.03850  [pdf, ps, other]

    math.ST math.PR

    Tight Concentration Inequality for Sub-Weibull Random Variables with Generalized Bernstien Orlicz norm

    Authors: Heejong Bong, Arun Kumar Kuchibhotla

    Abstract: Recent development in high-dimensional statistical inference has necessitated concentration inequalities for a broader range of random variables. We focus on sub-Weibull random variables, which extend sub-Gaussian or sub-exponential random variables to allow heavy-tailed distributions. This paper presents concentration inequalities for independent sub-Weibull random variables with finite Generaliz…

    Submitted 25 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

    MSC Class: 60G50; 60E15 (Primary) 60B20; 62E22 (Secondary)

  13. arXiv:2212.05355  [pdf, ps, other]

    math.PR math.ST

    High-dimensional Berry-Esseen Bound for $m$-Dependent Random Samples

    Authors: Heejong Bong, Arun Kumar Kuchibhotla, Alessandro Rinaldo

    Abstract: In this work, we provide a $(n/m)^{-1/2}$-rate finite sample Berry-Esseen bound for $m$-dependent high-dimensional random vectors over the class of hyper-rectangles. This bound imposes minimal assumptions on the random vectors such as nondegenerate covariances and finite third moments. The proof uses inductive relationships between anti-concentration inequalities and Berry--Esseen bounds, which ar…

    Submitted 10 December, 2022; originally announced December 2022.

  14. arXiv:2210.11445  [pdf, other]

    math.ST stat.ML

    Bagging in overparameterized learning: Risk characterization and risk monotonization

    Authors: Pratik Patil, Jin-Hong Du, Arun Kumar Kuchibhotla

    Abstract: Bagging is a commonly used ensemble technique in statistics and machine learning to improve the performance of prediction procedures. In this paper, we study the prediction risk of variants of bagged predictors under the proportional asymptotics regime, in which the ratio of the number of features to the number of observations converges to a constant. Specifically, we propose a general strategy to…

    Submitted 24 October, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: 102 pages, 34 figures; this version adds minor clarifications in a few places

  15. arXiv:2206.02954  [pdf, ps, other]

    math.ST stat.ME

    Median Regularity and Honest Inference

    Authors: Arun Kumar Kuchibhotla, Sivaraman Balakrishnan, Larry Wasserman

    Abstract: We introduce a new notion of regularity of an estimator called median regularity. We prove that uniformly valid (honest) inference for a functional is possible if and only if there exists a median regular estimator of that functional. To our knowledge, such a notion of regularity that is necessary for uniformly valid inference is unavailable in the literature.

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: 10 pages

  16. arXiv:2205.12937  [pdf, other]

    math.ST cs.LG stat.ML

    Mitigating multiple descents: A model-agnostic framework for risk monotonization

    Authors: Pratik Patil, Arun Kumar Kuchibhotla, Yuting Wei, Alessandro Rinaldo

    Abstract: Recent empirical and theoretical analyses of several commonly used prediction procedures reveal a peculiar risk behavior in high dimensions, referred to as double/multiple descent, in which the asymptotic risk is a non-monotonic function of the limiting aspect ratio of the number of features or parameters to the sample size. To mitigate this undesirable behavior, we develop a general framework for…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: 110 pages, 15 figures

  17. arXiv:2203.01761  [pdf, other]

    stat.ME math.ST

    Doubly Robust Calibration of Prediction Sets under Covariate Shift

    Authors: Yachong Yang, Arun Kumar Kuchibhotla, Eric Tchetgen Tchetgen

    Abstract: Conformal prediction has received tremendous attention in recent years and has offered new solutions to problems in missing data and causal inference; yet these advances have not leveraged modern semiparametric efficiency theory for more robust and efficient uncertainty quantification. In this paper, we consider the problem of obtaining distribution-free prediction regions accounting for a shift i…

    Submitted 13 December, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: New contribution about impossibility of finite sample results and asymptotic conditional coverage through CQR score

  18. arXiv:2106.00164  [pdf, ps, other]

    math.ST

    Median bias of M-estimators

    Authors: Arun Kumar Kuchibhotla

    Abstract: In this note, we derive bounds on the median bias of univariate M-estimators under mild regularity conditions. These requirements are not sufficient to imply convergence in distribution of the M-estimators. We also discuss median bias of some multivariate M-estimators.

    Submitted 31 May, 2021; originally announced June 2021.

  19. arXiv:2105.14577  [pdf, other]

    math.ST stat.CO stat.ME

    The HulC: Confidence Regions from Convex Hulls

    Authors: Arun Kumar Kuchibhotla, Sivaraman Balakrishnan, Larry Wasserman

    Abstract: We develop and analyze the HulC, an intuitive and general method for constructing confidence sets using the convex hull of estimates constructed from subsets of the data. Unlike classical methods which are based on estimating the (limiting) distribution of an estimator, the HulC is often simpler to use and effectively bypasses this step. In comparison to the bootstrap, the HulC requires fewer regu…

    Submitted 8 September, 2023; v1 submitted 30 May, 2021; originally announced May 2021.

    Comments: Latest version. Fixed a gap in Proposition and Theorem 1 pointed out by Prof. Hannes Leeb. Now all the simulations include a comparison with subsampling. Also, added several new simulation settings including quantile regression, isotonic regression both under non-standard assumptions

  20. arXiv:2009.13673  [pdf, ps, other]

    math.ST

    High-dimensional CLT for Sums of Non-degenerate Random Vectors: $n^{-1/2}$-rate

    Authors: Arun Kumar Kuchibhotla, Alessandro Rinaldo

    Abstract: In this note, we provide Berry--Esseen bounds for rectangles in high dimensions when the random vectors have non-singular covariance matrices. Under this assumption of non-singularity, we prove an $n^{-1/2}$ scaling for the Berry--Esseen bound for sums of mean independent random vectors with a finite third moment. The proof is essentially the method of compositions proof of multivariate Berry--E…

    Submitted 28 September, 2020; originally announced September 2020.

    Comments: 21 pages

  21. arXiv:2007.09751  [pdf, ps, other]

    math.ST stat.ME

    Berry-Esseen Bounds for Projection Parameters and Partial Correlations with Increasing Dimension

    Authors: Arun Kumar Kuchibhotla, Alessandro Rinaldo, Larry Wasserman

    Abstract: We provide finite sample bounds on the Normal approximation to the law of the least squares estimator of the projection parameters normalized by the sandwich-based standard errors. Our results hold in the increasing dimension setting and under minimal assumptions on the data generating distribution. In particular, we do not assume a linear regression function and only require the existence of fini…

    Submitted 22 October, 2021; v1 submitted 19 July, 2020; originally announced July 2020.

    Comments: 58 pages, 0 figures

  22. arXiv:2006.05022  [pdf, other]

    math.ST cs.AI cs.LG stat.AP stat.ML

    Near-Optimal Confidence Sequences for Bounded Random Variables

    Authors: Arun Kumar Kuchibhotla, Qinqing Zheng

    Abstract: Many inference problems, such as sequential decision problems like A/B testing, adaptive sampling schemes like bandit selection, are often online in nature. The fundamental problem for online inference is to provide a sequence of confidence intervals that are valid uniformly over the growing-into-infinity sample sizes. To address this question, we provide a near-optimal confidence sequence for bou…

    Submitted 3 June, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: Accepted to ICML 2021

  23. arXiv:1910.10562  [pdf, other]

    stat.ME cs.AI math.ST stat.ML

    Nested conformal prediction and quantile out-of-bag ensemble methods

    Authors: Chirag Gupta, Arun K. Kuchibhotla, Aaditya K. Ramdas

    Abstract: Conformal prediction is a popular tool for providing valid prediction sets for classification and regression problems, without relying on any distributional assumptions on the data. While the traditional description of conformal prediction starts with a nonconformity score, we provide an alternate (but equivalent) view that starts with a sequence of nested sets and calibrates them to find a valid…

    Submitted 9 May, 2022; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: 38 pages, 4 figures, 8 tables. This version fixes a bug in the proof of Proposition 3. Published paper available at https://www.sciencedirect.com/science/article/abs/pii/S0031320321006725

    Journal ref: Pattern Recognition 127 (2022): 108496

  24. arXiv:1910.06386  [pdf, other]

    math.ST stat.ME

    All of Linear Regression

    Authors: Arun K. Kuchibhotla, Lawrence D. Brown, Andreas Buja, Junhui Cai

    Abstract: Least squares linear regression is one of the oldest and most widely used data analysis tools. Although the theoretical analysis of the ordinary least squares (OLS) estimator is as old, several fundamental questions are yet to be answered. Suppose regression observations $(X_1,Y_1),\ldots,(X_n,Y_n)\in\mathbb{R}^d\times\mathbb{R}$ (not necessarily independent) are available. Some of the questions we dea…

    Submitted 14 October, 2019; originally announced October 2019.

  25. arXiv:1910.05480  [pdf, ps, other]

    math.ST stat.ML

    First order expansion of convex regularized estimators

    Authors: Pierre C Bellec, Arun K Kuchibhotla

    Abstract: We consider first order expansions of convex penalized estimators in high-dimensional regression problems with random designs. Our setting includes linear regression and logistic regression as special cases. For a given penalty function $h$ and the corresponding penalized estimator $\hatβ$, we construct a quantity $η$, the first order expansion of $\hatβ$, such that the distance between $\hatβ$ an…

    Submitted 8 March, 2020; v1 submitted 11 October, 2019; originally announced October 2019.

    Comments: Accepted at NeurIPS 2019 and published at https://papers.nips.cc/paper/8606-first-order-expansion-of-convex-regularized-estimators . The version here includes the supplementary material

  26. arXiv:1909.02088  [pdf, other]

    math.ST cs.LG stat.ML

    On Least Squares Estimation under Heteroscedastic and Heavy-Tailed Errors

    Authors: Arun K. Kuchibhotla, Rohit K. Patra

    Abstract: We consider least squares estimation in a general nonparametric regression model. The rate of convergence of the least squares estimator (LSE) for the unknown regression function is well studied when the errors are sub-Gaussian. We find upper bounds on the rates of convergence of the LSE when the errors have uniformly bounded conditional variance and have only finitely many moments. We show that t…

    Submitted 8 April, 2021; v1 submitted 4 September, 2019; originally announced September 2019.

    Comments: 49 pages, 2 figures, and 3 tables

  27. arXiv:1809.10538  [pdf, ps, other]

    math.ST

    Model-free Study of Ordinary Least Squares Linear Regression

    Authors: Arun K. Kuchibhotla, Lawrence D. Brown, Andreas Buja

    Abstract: Ordinary least squares (OLS) linear regression is one of the most basic statistical techniques for data analysis. In the mainstream literature and the statistical education, the study of linear regression is typically restricted to the case where the covariates are fixed, errors are mean zero Gaussians with variance independent of the (fixed) covariates. Even though OLS has been studied under mis…

    Submitted 27 September, 2018; originally announced September 2018.

    Comments: 33 pages

  28. arXiv:1809.05172  [pdf, ps, other]

    math.ST stat.ML

    Deterministic Inequalities for Smooth M-estimators

    Authors: Arun Kumar Kuchibhotla

    Abstract: Ever since the proof of asymptotic normality of maximum likelihood estimator by Cramer (1946), it has been understood that a basic technique of the Taylor series expansion suffices for asymptotics of $M$-estimators with smooth/differentiable loss function. Although the Taylor series expansion is a purely deterministic tool, the realization that the asymptotic normality results can also be made det…

    Submitted 13 September, 2018; originally announced September 2018.

    Comments: 49 pages

  29. arXiv:1806.06153  [pdf, ps, other]

    math.ST

    High-dimensional CLT: Improvements, Non-uniform Extensions and Large Deviations

    Authors: Arun Kumar Kuchibhotla, Somabha Mukherjee, Debapratim Banerjee

    Abstract: Central limit theorems (CLTs) for high-dimensional random vectors with dimension possibly growing with the sample size have received a lot of attention in the recent times. Chernozhukov et al. (2017) proved a Berry--Esseen type result for high-dimensional averages for the class of hyperrectangles and they proved that the rate of convergence can be upper bounded by $n^{-1/6}$ up to a polynomial fact…

    Submitted 24 June, 2019; v1 submitted 15 June, 2018; originally announced June 2018.

    Comments: 76 pages

  30. arXiv:1806.04119  [pdf, ps, other]

    stat.ME math.ST

    Valid Post-selection Inference in Assumption-lean Linear Regression

    Authors: Arun Kumar Kuchibhotla, Lawrence D. Brown, Andreas Buja, Edward I. George, Linda Zhao

    Abstract: Construction of valid statistical inference for estimators based on data-driven selection has received a lot of attention in the recent times. Berk et al. (2013) is possibly the first work to provide valid inference for Gaussian homoscedastic linear regression with fixed covariates under arbitrary covariate/variable selection. The setting is unrealistic and is extended by Bachoc et al. (2016) by r…

    Submitted 11 June, 2018; originally announced June 2018.

    Comments: 49 pages

  31. arXiv:1804.02605  [pdf, other]

    math.ST stat.ME stat.ML

    Moving Beyond Sub-Gaussianity in High-Dimensional Statistics: Applications in Covariance Estimation and Linear Regression

    Authors: Arun Kumar Kuchibhotla, Abhishek Chakrabortty

    Abstract: Concentration inequalities form an essential toolkit in the study of high dimensional (HD) statistical methods. Most of the relevant statistics literature in this regard is based on sub-Gaussian or sub-exponential tail assumptions. In this paper, we first bring together various probabilistic inequalities for sums of independent random variables under much more general exponential type (namely sub-…

    Submitted 9 May, 2022; v1 submitted 7 April, 2018; originally announced April 2018.

    Comments: 68 pages; Revised version; To appear in Information and Inference: A Journal of the IMA

    MSC Class: 60G50; 62J05; 60B20; 62J07; 62E17; 60F05; 60E15

    Journal ref: Information and Inference: A Journal of the IMA (2022), Vol. 11, No. 4, 1389-1456

  32. arXiv:1802.05801  [pdf, ps, other]

    math.ST

    Uniform-in-Submodel Bounds for Linear Regression in a Model Free Framework

    Authors: Arun Kumar Kuchibhotla, Lawrence D. Brown, Andreas Buja, Edward I. George, Linda Zhao

    Abstract: For the last two decades, high-dimensional data and methods have proliferated throughout the literature. Yet, the classical technique of linear regression has not lost its usefulness in applications. In fact, many high-dimensional estimation techniques can be seen as variable selection that leads to a smaller set of variables (a ``sub-model'') where classical linear regression applies. We analyze…

    Submitted 17 May, 2021; v1 submitted 15 February, 2018; originally announced February 2018.

    Comments: Forthcoming at Econometric Theory

  33. arXiv:1708.00145  [pdf, other]

    math.ST stat.CO stat.ME

    Semiparametric Efficiency in Convexity Constrained Single Index Model

    Authors: Arun K. Kuchibhotla, Rohit K. Patra, Bodhisattva Sen

    Abstract: We consider estimation and inference in a single index regression model with an unknown convex link function. We introduce a convex and Lipschitz constrained least squares estimator (CLSE) for both the parametric and the nonparametric components given independent and identically distributed observations. We prove the consistency and find the rates of convergence of the CLSE when the errors are ass…

    Submitted 13 January, 2021; v1 submitted 31 July, 2017; originally announced August 2017.

    Comments: Removed the density bounded away from zero assumption in assumption (A5). Weakened assumption (B2)

  34. arXiv:1612.03257  [pdf, other]

    math.ST

    Models as Approximations II: A Model-Free Theory of Parametric Regression

    Authors: Andreas Buja, Lawrence Brown, Arun Kumar Kuchibhotla, Richard Berk, Ed George, Linda Zhao

    Abstract: We develop a model-free theory of general types of parametric regression for iid observations. The theory replaces the parameters of parametric models with statistical functionals, to be called ``regression functionals'', defined on large non-parametric classes of joint $\xy$ distributions, without assuming a correct model. Parametric models are reduced to heuristics to suggest plausible objective…

    Submitted 6 July, 2019; v1 submitted 10 December, 2016; originally announced December 2016.

    Comments: Submitted

    MSC Class: 62A01