Showing 1–23 of 23 results for author: Buja, A

  1. Hole or grain? A Section Pursuit Index for Finding Hidden Structure in Multiple Dimensions

    Authors: Ursula Laa, Dianne Cook, Andreas Buja, German Valencia

    Abstract: Multivariate data is often visualized using linear projections, produced by techniques such as principal component analysis, linear discriminant analysis, and projection pursuit. A problem with projections is that they obscure low and high density regions near the center of the distribution. Sections, or slices, can help to reveal them. This paper develops a section pursuit method, building on the…

    Submitted 10 March, 2022; v1 submitted 28 April, 2020; originally announced April 2020.

    Comments: v3 is accepted for publication in JCGS and contains the appendix

    Journal ref: Journal of Computational and Graphical Statistics, 2022

  2. arXiv:1910.06386  [pdf, other]

    math.ST stat.ME

    All of Linear Regression

    Authors: Arun K. Kuchibhotla, Lawrence D. Brown, Andreas Buja, Junhui Cai

    Abstract: Least squares linear regression is one of the oldest and most widely used data analysis tools. Although the theoretical analysis of the ordinary least squares (OLS) estimator is as old, several fundamental questions are yet to be answered. Suppose regression observations $(X_1,Y_1),\ldots,(X_n,Y_n)\in\mathbb{R}^d\times\mathbb{R}$ (not necessarily independent) are available. Some of the questions we dea…

    Submitted 14 October, 2019; originally announced October 2019.

  3. arXiv:1809.10538  [pdf, ps, other]

    math.ST

    Model-free Study of Ordinary Least Squares Linear Regression

    Authors: Arun K. Kuchibhotla, Lawrence D. Brown, Andreas Buja

    Abstract: Ordinary least squares (OLS) linear regression is one of the most basic statistical techniques for data analysis. In the mainstream literature and in statistical education, the study of linear regression is typically restricted to the case where the covariates are fixed and the errors are mean-zero Gaussians with variance independent of the (fixed) covariates. Even though OLS has been studied under mis…

    Submitted 27 September, 2018; originally announced September 2018.

    Comments: 33 pages

  4. A Security Analysis of IoT Encryption: Side-channel Cube Attack on Simeck32/64

    Authors: Alya Geogiana Buja, Shekh Faisal Abdul-Latip, Rabiah Ahmad

    Abstract: Simeck, a lightweight block cipher, has been proposed as one of the encryption schemes that can be employed in Internet of Things (IoT) applications. Therefore, this paper presents the security of the Simeck32/64 block cipher against side-channel cube attack. We exhibit our attack against Simeck32/64 using the Hamming weight leakage assumption to extract linearly independent equations in key bits. W…

    Submitted 10 August, 2018; originally announced August 2018.

    Comments: 12 pages, 6 figures, 4 tables, International Journal of Computer Networks & Communications

    Journal ref: International Journal of Computer Networks & Communications (IJCNC) Vol.10, No.4, July 2018

  5. arXiv:1807.04164  [pdf, other]

    stat.ME

    Using Recursive Partitioning to Find and Estimate Heterogenous Treatment Effects In Randomized Clinical Trials

    Authors: Richard Berk, Matthew Olson, Andreas Buja, Aurelie Ouss

    Abstract: Heterogeneous treatment effects can be very important in the analysis of randomized clinical trials. Heightened risks or enhanced benefits may exist for particular subsets of study subjects. When the heterogeneous treatment effects are specified as the research is being designed, there are proper and readily available analysis techniques. When the heterogeneous treatment effects are inductively ob…

    Submitted 11 July, 2018; originally announced July 2018.

    Comments: 21 pages, 1 figure, under review

  6. arXiv:1806.09014  [pdf, other]

    stat.ME

    Assumption Lean Regression

    Authors: Richard Berk, Andreas Buja, Lawrence Brown, Edward George, Arun Kumar Kuchibhotla, Weijie J. Su, Linda Zhao

    Abstract: It is well known that models used in conventional regression analysis are commonly misspecified. A standard response is little more than a shrug. Data analysts invoke Box's maxim that all models are wrong and then proceed as if the results are useful nevertheless. In this paper, we provide an alternative. Regression models are treated explicitly as approximations of a true response surface that ca…

    Submitted 26 June, 2018; v1 submitted 23 June, 2018; originally announced June 2018.

    Comments: Submitted for review, 21 pages, 2 figures

  7. arXiv:1806.04119  [pdf, ps, other]

    stat.ME math.ST

    Valid Post-selection Inference in Assumption-lean Linear Regression

    Authors: Arun Kumar Kuchibhotla, Lawrence D. Brown, Andreas Buja, Edward I. George, Linda Zhao

    Abstract: Construction of valid statistical inference for estimators based on data-driven selection has received a lot of attention in recent times. Berk et al. (2013) is possibly the first work to provide valid inference for Gaussian homoscedastic linear regression with fixed covariates under arbitrary covariate/variable selection. The setting is unrealistic and is extended by Bachoc et al. (2016) by r…

    Submitted 11 June, 2018; originally announced June 2018.

    Comments: 49 pages

  8. arXiv:1802.05801  [pdf, ps, other]

    math.ST

    Uniform-in-Submodel Bounds for Linear Regression in a Model Free Framework

    Authors: Arun Kumar Kuchibhotla, Lawrence D. Brown, Andreas Buja, Edward I. George, Linda Zhao

    Abstract: For the last two decades, high-dimensional data and methods have proliferated throughout the literature. Yet, the classical technique of linear regression has not lost its usefulness in applications. In fact, many high-dimensional estimation techniques can be seen as variable selection that leads to a smaller set of variables (a "sub-model") where classical linear regression applies. We analyze…

    Submitted 17 May, 2021; v1 submitted 15 February, 2018; originally announced February 2018.

    Comments: Forthcoming at Econometric Theory

  9. arXiv:1612.03257  [pdf, other]

    math.ST

    Models as Approximations II: A Model-Free Theory of Parametric Regression

    Authors: Andreas Buja, Lawrence Brown, Arun Kumar Kuchibhotla, Richard Berk, Ed George, Linda Zhao

    Abstract: We develop a model-free theory of general types of parametric regression for iid observations. The theory replaces the parameters of parametric models with statistical functionals, to be called "regression functionals", defined on large non-parametric classes of joint $\xy$ distributions, without assuming a correct model. Parametric models are reduced to heuristics to suggest plausible objective…

    Submitted 6 July, 2019; v1 submitted 10 December, 2016; originally announced December 2016.

    Comments: Submitted

    MSC Class: 62A01

  10. arXiv:1612.02528  [pdf, ps, other]

    stat.ML

    Smoothing Effects of Bagging: Von Mises Expansions of Bagged Statistical Functionals

    Authors: Andreas Buja, Werner Stuetzle

    Abstract: Bagging is a device intended for reducing the prediction error of learning algorithms. In its simplest form, bagging draws bootstrap samples from the training sample, applies the learning algorithm to each bootstrap sample, and then averages the resulting prediction rules. We extend the definition of bagging from statistics to statistical functionals and study the von Mises expansion of bagged s…

    Submitted 7 December, 2016; originally announced December 2016.

    MSC Class: 62G09

  11. arXiv:1612.02391  [pdf, other]

    math.ST

    Semi-Supervised linear regression

    Authors: David Azriel, Lawrence D. Brown, Michael Sklar, Richard Berk, Andreas Buja, Linda Zhao

    Abstract: We study a regression problem where for some part of the data we observe both the label variable ($Y$) and the predictors (${\bf X}$), while for the other part of the data only the predictors are given. Such a problem arises, for example, when observations of the label variable are costly and may require a skilled human agent. When the conditional expectation $E[Y | {\bf X}]$ is not exactly linear, on…

    Submitted 13 April, 2021; v1 submitted 7 December, 2016; originally announced December 2016.

  12. arXiv:1511.06821  [pdf, other]

    stat.ME stat.ML

    Kernel Additive Principal Components

    Authors: Xin Lu Tan, Andreas Buja, Zongming Ma

    Abstract: Additive principal components (APCs for short) are a nonlinear generalization of linear principal components. We focus on smallest APCs to describe additive nonlinear constraints that are approximately satisfied by the data. Thus APCs fit data with implicit equations that treat the variables symmetrically, as opposed to regression analyses which fit data with explicit equations that treat the data…

    Submitted 20 November, 2015; originally announced November 2015.

    Comments: 54 pages including appendices

  13. arXiv:1511.00273  [pdf, other]

    stat.ME

    Calibrated Percentile Double Bootstrap For Robust Linear Regression Inference

    Authors: Daniel McCarthy, Kai Zhang, Lawrence Brown, Richard Berk, Andreas Buja, Edward George, Linda Zhao

    Abstract: We consider inference for the parameters of a linear model when the covariates are random and the relationship between response and covariates is possibly non-linear. Conventional inference methods such as z-intervals perform poorly in these cases. We propose a double bootstrap-based calibrated percentile method, perc-cal, as a general-purpose CI method which performs very well relative to alterna…

    Submitted 16 January, 2017; v1 submitted 1 November, 2015; originally announced November 2015.

    MSC Class: 62F40

  14. arXiv:1405.6803  [pdf, ps, other]

    math.ST stat.ME

    Discussion: "A significance test for the lasso"

    Authors: A. Buja, L. Brown

    Abstract: Discussion of "A significance test for the lasso" by Richard Lockhart, Jonathan Taylor, Ryan J. Tibshirani, Robert Tibshirani [arXiv:1301.7161].

    Submitted 27 May, 2014; originally announced May 2014.

    Comments: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/14-AOS1175F

    Report number: IMS-AOS-AOS1175F

    Journal ref: Annals of Statistics 2014, Vol. 42, No. 2, 509-517

  15. arXiv:1405.0338  [pdf, ps, other]

    math.ST

    Rate Optimal Denoising of Simultaneously Sparse and Low Rank Matrices

    Authors: Dan Yang, Zongming Ma, Andreas Buja

    Abstract: We study minimax rates for denoising simultaneously sparse and low rank matrices in high dimensions. We show that an iterative thresholding algorithm achieves (near) optimal rates adaptively under mild conditions for a large class of loss functions. Numerical experiments on synthetic datasets also demonstrate the competitive performance of the proposed method.

    Submitted 1 May, 2014; originally announced May 2014.

  16. arXiv:1404.1578  [pdf, other]

    stat.ME

    Models as Approximations I: Consequences Illustrated with Linear Regression

    Authors: Andreas Buja, Richard Berk, Lawrence Brown, Edward George, Emil Pitkin, Mikhail Traskin, Linda Zhao, Kai Zhang

    Abstract: In the early 1980s Halbert White inaugurated a "model-robust" form of statistical inference based on the "sandwich estimator" of standard error. This estimator is known to be "heteroskedasticity-consistent", but it is less well-known to be "nonlinearity-consistent" as well. Nonlinearity, however, raises fundamental issues because in its presence regressors are not ancillary, hence can't be trea…

    Submitted 6 July, 2019; v1 submitted 6 April, 2014; originally announced April 2014.

    Comments: Submitted

  17. arXiv:1311.0291  [pdf, other]

    stat.ME

    Improved Precision in Estimating Average Treatment Effects

    Authors: Emil Pitkin, Richard Berk, Lawrence Brown, Andreas Buja, Ed George, Kai Zhang, Linda Zhao

    Abstract: The Average Treatment Effect (ATE) is a global measure of the effectiveness of an experimental treatment intervention. Classical methods of its estimation either ignore relevant covariates or do not fully exploit them. Moreover, past work has considered covariates as fixed. We present a method for improving the precision of the ATE estimate: the treatment and control responses are estimated via a…

    Submitted 1 November, 2013; originally announced November 2013.

    Comments: 22 pages, 1 figure

  18. arXiv:1310.5677  [pdf]

    stat.ME

    Penalized Split Criteria for Interpretable Trees

    Authors: Alex Goldstein, Andreas Buja

    Abstract: This paper describes techniques for growing classification and regression trees designed to induce visually interpretable trees. This is achieved by penalizing splits that extend the subset of features used in a particular branch of the tree. After a brief motivation, we summarize existing methods and introduce new ones, providing illustrative examples throughout. Using a number of real classifica…

    Submitted 21 October, 2013; originally announced October 2013.

    Comments: 25 pages

  19. Valid post-selection inference

    Authors: Richard Berk, Lawrence Brown, Andreas Buja, Kai Zhang, Linda Zhao

    Abstract: It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees that classical statistical theory provides for tests and confidence intervals when the model has been chosen a priori. We propose to produce valid "post-selection inference" by reducing the problem to…

    Submitted 5 June, 2013; originally announced June 2013.

    Comments: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/12-AOS1077

    Report number: IMS-AOS-AOS1077

    Journal ref: Annals of Statistics 2013, Vol. 41, No. 2, 802-837

  20. arXiv:1112.2433  [pdf, other]

    stat.ME

    A Sparse SVD Method for High-dimensional Data

    Authors: Dan Yang, Zongming Ma, Andreas Buja

    Abstract: We present a new computational approach to approximating a large, noisy data table by a low-rank matrix with sparse singular vectors. The approximation is obtained from thresholded subspace iterations that produce the singular vectors simultaneously, rather than successively as in competing proposals. We introduce novel ways to estimate thresholding parameters which obviate the need for computatio…

    Submitted 11 December, 2011; originally announced December 2011.

  21. A Conversation with Peter Huber

    Authors: Andreas Buja, Hans R. Künsch

    Abstract: Peter J. Huber was born on March 25, 1934, in Wohlen, a small town in the Swiss countryside. He obtained a diploma in mathematics in 1958 and a Ph.D. in mathematics in 1961, both from ETH Zurich. His thesis was in pure mathematics, but he then decided to go into statistics. He spent 1961–1963 as a postdoc at the statistics department in Berkeley where he wrote his first and most famous paper on…

    Submitted 6 August, 2008; originally announced August 2008.

    Comments: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/07-STS251

    Report number: IMS-STS-STS251

    Journal ref: Statistical Science 2008, Vol. 23, No. 1, 120-135

  22. Functional principal components analysis via penalized rank one approximation

    Authors: Jianhua Z. Huang, Haipeng Shen, Andreas Buja

    Abstract: Two existing approaches to functional principal components analysis (FPCA) are due to Rice and Silverman (1991) and Silverman (1996), both based on maximizing variance but introducing penalization in different ways. In this article we propose an alternative approach to FPCA using penalized rank one approximation to the data matrix. Our contributions are four-fold: (1) by considering invariance u…

    Submitted 30 July, 2008; originally announced July 2008.

    Comments: Published in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/08-EJS218

    Report number: IMS-EJS-EJS_2008_218

    MSC Class: 62G08; 62H25 (Primary) 65F30 (Secondary)

    Journal ref: Electronic Journal of Statistics 2008, Vol. 2, 678-695

  23. Comment: Boosting Algorithms: Regularization, Prediction and Model Fitting

    Authors: Andreas Buja, David Mease, Abraham J. Wyner

    Abstract: The authors are doing the readers of Statistical Science a true service with a well-written and up-to-date overview of boosting that originated with the seminal algorithms of Freund and Schapire. Equally, we are grateful for high-level software that will permit a larger readership to experiment with, or simply apply, boosting-inspired model fitting. The authors show us a world of methodology tha…

    Submitted 17 April, 2008; originally announced April 2008.

    Comments: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/07-STS242B

    Report number: IMS-STS-STS242B

    Journal ref: Statistical Science 2007, Vol. 22, No. 4, 506-512