Search | arXiv e-print repository

arXiv:2004.13327 [pdf, other]

doi 10.1080/10618600.2022.2035230

Hole or grain? A Section Pursuit Index for Finding Hidden Structure in Multiple Dimensions

Authors: Ursula Laa, Dianne Cook, Andreas Buja, German Valencia

Abstract: Multivariate data is often visualized using linear projections, produced by techniques such as principal component analysis, linear discriminant analysis, and projection pursuit. A problem with projections is that they obscure low and high density regions near the center of the distribution. Sections, or slices, can help to reveal them. This paper develops a section pursuit method, building on the extensive work in projection pursuit, to search for interesting slices of the data. Linear projections are used to define sections of the parameter space, and to calculate interestingness by comparing the distribution of observations, inside and outside a section. By optimizing this index, it is possible to reveal features such as holes (low density) or grains (high density). The optimization is incorporated into a guided tour so that the search for structure can be dynamic. The approach can be useful for problems when data distributions depart from uniform or normal, as in visually exploring nonlinear manifolds, and functions in multivariate space. Two applications of section pursuit are shown: exploring decision boundaries from classification models, and exploring subspaces induced by complex inequality conditions from multiple parameter model. The new methods are available in R, in the tourr package.

Submitted 10 March, 2022; v1 submitted 28 April, 2020; originally announced April 2020.

Comments: v3 is accepted for publication in JCGS and contains the appendix

Journal ref: Journal of Computational and Graphical Statistics, 2022

arXiv:1910.06386 [pdf, other]

All of Linear Regression

Authors: Arun K. Kuchibhotla, Lawrence D. Brown, Andreas Buja, Junhui Cai

Abstract: Least squares linear regression is one of the oldest and widely used data analysis tools. Although the theoretical analysis of the ordinary least squares (OLS) estimator is as old, several fundamental questions are yet to be answered. Suppose regression observations $(X_1,Y_1),\ldots,(X_n,Y_n)\in\mathbb{R}^d\times\mathbb{R}$ (not necessarily independent) are available. Some of the questions we deal with are as follows: under what conditions, does the OLS estimator converge and what is the limit? What happens if the dimension is allowed to grow with $n$? What happens if the observations are dependent with dependence possibly strengthening with $n$? How to do statistical inference under these kinds of misspecification? What happens to the OLS estimator under variable selection? How to do inference under misspecification and variable selection? We answer all the questions raised above with one simple deterministic inequality which holds for any set of observations and any sample size. This implies that all our results are a finite sample (non-asymptotic) in nature. In the end, one only needs to bound certain random quantities under specific settings of interest to get concrete rates and we derive these bounds for the case of independent observations. In particular, the problem of inference after variable selection is studied, for the first time, when $d$, the number of covariates increases (almost exponentially) with sample size $n$. We provide comments on the ``right'' statistic to consider for inference under variable selection and efficient computation of quantiles.

Submitted 14 October, 2019; originally announced October 2019.

arXiv:1809.10538 [pdf, ps, other]

Model-free Study of Ordinary Least Squares Linear Regression

Authors: Arun K. Kuchibhotla, Lawrence D. Brown, Andreas Buja

Abstract: Ordinary least squares (OLS) linear regression is one of the most basic statistical techniques for data analysis. In the main stream literature and the statistical education, the study of linear regression is typically restricted to the case where the covariates are fixed, errors are mean zero Gaussians with variance independent of the (fixed) covariates. Even though OLS has been studied under misspecification from as early as the 1960's, the implications have not yet caught up with the main stream literature and applied sciences. The present article is an attempt at a unified viewpoint that makes the various implications of misspecification stand out.

Submitted 27 September, 2018; originally announced September 2018.

Comments: 33 pages

arXiv:1808.03557 [pdf]

doi 10.5121/ijcnc.2018.10406

A Security Analysis of IoT Encryption: Side-channel Cube Attack on Simeck32/64

Authors: Alya Geogiana Buja, Shekh Faisal Abdul-Latip, Rabiah Ahmad

Abstract: Simeck, a lightweight block cipher has been proposed to be one of the encryption that can be employed in the Internet of Things (IoT) applications. Therefore, this paper presents the security of the Simeck32/64 block cipher against side-channel cube attack. We exhibit our attack against Simeck32/64 using the Hamming weight leakage assumption to extract linearly independent equations in key bits. We have been able to find 32 linearly independent equations in 32 key variables by only considering the second bit from the LSB of the Hamming weight leakage of the internal state on the fourth round of the cipher. This enables our attack to improve previous attacks on Simeck32/64 within side-channel attack model with better time and data complexity of 2^35 and 2^11.29 respectively.

Submitted 10 August, 2018; originally announced August 2018.

Comments: 12 pages, 6 figures, 4 tables, International Journal of Computer Networks & Communications

Journal ref: International Journal of Computer Networks & Communications (IJCNC) Vol.10, No.4, July 2018

arXiv:1807.04164 [pdf, other]

Using Recursive Partitioning to Find and Estimate Heterogenous Treatment Effects In Randomized Clinical Trials

Authors: Richard Berk, Matthew Olson, Andreas Buja, Aurelie Ouss

Abstract: Heterogeneous treatment effects can be very important in the analysis of randomized clinical trials. Heightened risks or enhanced benefits may exist for particular subsets of study subjects. When the heterogeneous treatment effects are specified as the research is being designed, there are proper and readily available analysis techniques. When the heterogeneous treatment effects are inductively obtained as an experiment's data are analyzed, significant complications are introduced. There can be a need for special loss functions designed to find local average treatment effects and for techniques that properly address post selection statistical inference. In this paper, we tackle both while undertaking a recursive partitioning analysis of a randomized clinical trial testing whether individuals on probation, who are low risk, can be minimally supervised with no increase in recidivism.

Submitted 11 July, 2018; originally announced July 2018.

Comments: 21 pages, 1 figure, under review

arXiv:1806.09014 [pdf, other]

Assumption Lean Regression

Authors: Richard Berk, Andreas Buja, Lawrence Brown, Edward George, Arun Kumar Kuchibhotla, Weijie J. Su, Linda Zhao

Abstract: It is well known that models used in conventional regression analysis are commonly misspecified. A standard response is little more than a shrug. Data analysts invoke Box's maxim that all models are wrong and then proceed as if the results are useful nevertheless. In this paper, we provide an alternative. Regression models are treated explicitly as approximations of a true response surface that can have a number of desirable statistical properties, including estimates that are asymptotically unbiased. Valid statistical inference follows. We generalize the formulation to include regression functionals, which broadens substantially the range of potential applications. An empirical application is provided to illustrate the paper's key concepts.

Submitted 26 June, 2018; v1 submitted 23 June, 2018; originally announced June 2018.

Comments: Submitted for review, 21 pages, 2 figures

arXiv:1806.04119 [pdf, ps, other]

Valid Post-selection Inference in Assumption-lean Linear Regression

Authors: Arun Kumar Kuchibhotla, Lawrence D. Brown, Andreas Buja, Edward I. George, Linda Zhao

Abstract: Construction of valid statistical inference for estimators based on data-driven selection has received a lot of attention in the recent times. Berk et al. (2013) is possibly the first work to provide valid inference for Gaussian homoscedastic linear regression with fixed covariates under arbitrary covariate/variable selection. The setting is unrealistic and is extended by Bachoc et al. (2016) by relaxing the distributional assumptions. A major drawback of the aforementioned works is that the construction of valid confidence regions is computationally intensive. In this paper, we first prove that post-selection inference is equivalent to simultaneous inference and then construct valid post-selection confidence regions which are computationally simple. Our construction is based on deterministic inequalities and apply to independent as well as dependent random variables without the requirement of correct distributional assumptions. Finally, we compare the volume of our confidence regions with the existing ones and show that under non-stochastic covariates, our regions are much smaller.

Submitted 11 June, 2018; originally announced June 2018.

Comments: 49 pages

arXiv:1802.05801 [pdf, ps, other]

Uniform-in-Submodel Bounds for Linear Regression in a Model Free Framework

Authors: Arun Kumar Kuchibhotla, Lawrence D. Brown, Andreas Buja, Edward I. George, Linda Zhao

Abstract: For the last two decades, high-dimensional data and methods have proliferated throughout the literature. Yet, the classical technique of linear regression has not lost its usefulness in applications. In fact, many high-dimensional estimation techniques can be seen as variable selection that leads to a smaller set of variables (a ``sub-model'') where classical linear regression applies. We analyze linear regression estimators resulting from model-selection by proving estimation error and linear representation bounds uniformly over sets of submodels. Based on deterministic inequalities, our results provide ``good'' rates when applied to both independent and dependent data. These results are useful in meaningfully interpreting the linear regression estimator obtained after exploring and reducing the variables and also in justifying post model-selection inference. All results are derived under no model assumptions and are non-asymptotic in nature.

Submitted 17 May, 2021; v1 submitted 15 February, 2018; originally announced February 2018.

Comments: Forthcoming at Econometric Theory

arXiv:1612.03257 [pdf, other]

Models as Approximations II: A Model-Free Theory of Parametric Regression

Authors: Andreas Buja, Lawrence Brown, Arun Kumar Kuchibhotla, Richard Berk, Ed George, Linda Zhao

Abstract: We develop a model-free theory of general types of parametric regression for iid observations. The theory replaces the parameters of parametric models with statistical functionals, to be called "regression functionals'', defined on large non-parametric classes of joint $\xy$ distributions, without assuming a correct model. Parametric models are reduced to heuristics to suggest plausible objective functions. An example of a regression functional is the vector of slopes of linear equations fitted by OLS to largely arbitrary $\xy$ distributions, without assuming a linear model (see Part~I). More generally, regression functionals can be defined by minimizing objective functions or solving estimating equations at joint $\xy$ distributions. In this framework it is possible to achieve the following: (1)~define a notion of well-specification for regression functionals that replaces the notion of correct specification of models, (2)~propose a well-specification diagnostic for regression functionals based on reweighting distributions and data, (3)~decompose sampling variability of regression functionals into two sources, one due to the conditional response distribution and another due to the regressor distribution interacting with misspecification, both of order $N^{-1/2}$, (4)~exhibit plug-in/sandwich estimators of standard error as limit cases of $\xy$ bootstrap estimators, and (5)~provide theoretical heuristics to indicate that $\xy$ bootstrap standard errors may generally be more stable than sandwich estimators.

Submitted 6 July, 2019; v1 submitted 10 December, 2016; originally announced December 2016.

Comments: Submitted

MSC Class: 62A01

arXiv:1612.02528 [pdf, ps, other]

Smoothing Effects of Bagging: Von Mises Expansions of Bagged Statistical Functionals

Authors: Andreas Buja, Werner Stuetzle

Abstract: Bagging is a device intended for reducing the prediction error of learning algorithms. In its simplest form, bagging draws bootstrap samples from the training sample, applies the learning algorithm to each bootstrap sample, and then averages the resulting prediction rules. We extend the definition of bagging from statistics to statistical functionals and study the von Mises expansion of bagged statistical functionals. We show that the expansion is related to the Efron-Stein ANOVA expansion of the raw (unbagged) functional. The basic observation is that a bagged functional is always smooth in the sense that the von Mises expansion exists and is finite of length 1 + resample size $M$. This holds even if the raw functional is rough or unstable. The resample size $M$ acts as a smoothing parameter, where a smaller $M$ means more smoothing.

Submitted 7 December, 2016; originally announced December 2016.

MSC Class: 62G09
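
Illustration (not part of the arXiv record): the simplest form of bagging described in the abstract above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code. The function name `bagged_estimate`, its parameters, and the toy data are hypothetical; the "learning algorithm" is stood in for by a plain statistic (the sample median), and resampling is with replacement.

```python
import random
import statistics


def bagged_estimate(sample, learner, n_boot=200, resample_size=None, seed=0):
    """Average `learner` over bootstrap resamples of `sample`.

    `resample_size` plays the role of the resample size M from the
    abstract; it defaults to len(sample). All names are hypothetical.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    m = resample_size or len(sample)
    estimates = []
    for _ in range(n_boot):
        # Draw one bootstrap sample of size m, with replacement.
        boot = [rng.choice(sample) for _ in range(m)]
        # Apply the (possibly rough) statistic to the resample.
        estimates.append(learner(boot))
    # Bagging step: average the per-resample results.
    return statistics.fmean(estimates)


data = [1.0, 2.0, 2.5, 3.0, 10.0]          # toy data, assumed for illustration
raw = statistics.median(data)              # unbagged statistic
bagged = bagged_estimate(data, statistics.median)
print(raw, bagged)
```

Passing a smaller `resample_size` lets one experiment with the abstract's remark that M acts as a smoothing parameter; the sketch itself makes no claim about how strong that effect is.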

arXiv:1612.02391 [pdf, other]

Semi-Supervised linear regression

Authors: David Azriel, Lawrence D. Brown, Michael Sklar, Richard Berk, Andreas Buja, Linda Zhao

Abstract: We study a regression problem where for some part of the data we observe both the label variable ($Y$) and the predictors (${\bf X}$), while for other part of the data only the predictors are given. Such a problem arises, for example, when observations of the label variable are costly and may require a skilled human agent. When the conditional expectation $E[Y | {\bf X}]$ is not exactly linear, one can consider the best linear approximation to the conditional expectation, which can be estimated consistently by the least squares estimates (LSE). The latter depends only on the labeled data. We suggest improved alternative estimates to the LSE that use also the unlabeled data. Our estimation method can be easily implemented and has simply described asymptotic properties. The new estimates asymptotically dominate the usual standard procedures under certain non-linearity condition of $E[Y | {\bf X}]$; otherwise, they are asymptotically equivalent. The performance of the new estimator for small sample size is investigated in an extensive simulation study. A real data example of inferring homeless population is used to illustrate the new methodology.

Submitted 13 April, 2021; v1 submitted 7 December, 2016; originally announced December 2016.

arXiv:1511.06821 [pdf, other]

Kernel Additive Principal Components

Authors: Xin Lu Tan, Andreas Buja, Zongming Ma

Abstract: Additive principal components (APCs for short) are a nonlinear generalization of linear principal components. We focus on smallest APCs to describe additive nonlinear constraints that are approximately satisfied by the data. Thus APCs fit data with implicit equations that treat the variables symmetrically, as opposed to regression analyses which fit data with explicit equations that treat the data asymmetrically by singling out a response variable. We propose a regularized data-analytic procedure for APC estimation using kernel methods. In contrast to existing approaches to APCs that are based on regularization through subspace restriction, kernel methods achieve regularization through shrinkage and therefore grant distinctive flexibility in APC estimation by allowing the use of infinite-dimensional functions spaces for searching APC transformation while retaining computational feasibility. To connect population APCs and kernelized finite-sample APCs, we study kernelized population APCs and their associated eigenproblems, which eventually lead to the establishment of consistency of the estimated APCs. Lastly, we discuss an iterative algorithm for computing kernelized finite-sample APCs.

Submitted 20 November, 2015; originally announced November 2015.

Comments: 54 pages including appendices

arXiv:1511.00273 [pdf, other]

Calibrated Percentile Double Bootstrap For Robust Linear Regression Inference

Authors: Daniel McCarthy, Kai Zhang, Lawrence Brown, Richard Berk, Andreas Buja, Edward George, Linda Zhao

Abstract: We consider inference for the parameters of a linear model when the covariates are random and the relationship between response and covariates is possibly non-linear. Conventional inference methods such as z-intervals perform poorly in these cases. We propose a double bootstrap-based calibrated percentile method, perc-cal, as a general-purpose CI method which performs very well relative to alternative methods in challenging situations such as these. The superior performance of perc-cal is demonstrated by a thorough, full-factorial design synthetic data study as well as a real data example involving the length of criminal sentences. We also provide theoretical justification for the perc-cal method under mild conditions. The method is implemented in the R package `perccal', available through CRAN and coded primarily in C++, to make it easier for practitioners to use.

Submitted 16 January, 2017; v1 submitted 1 November, 2015; originally announced November 2015.

MSC Class: 62F40

arXiv:1405.6803 [pdf, ps, other]

doi 10.1214/14-AOS1175F

Discussion: "A significance test for the lasso"

Authors: A. Buja, L. Brown

Abstract: Discussion of "A significance test for the lasso" by Richard Lockhart, Jonathan Taylor, Ryan J. Tibshirani, Robert Tibshirani [arXiv:1301.7161].

Submitted 27 May, 2014; originally announced May 2014.

Comments: Published at http://dx.doi.org/10.1214/14-AOS1175F in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1175F

Journal ref: Annals of Statistics 2014, Vol. 42, No. 2, 509-517

arXiv:1405.0338 [pdf, ps, other]

Rate Optimal Denoising of Simultaneously Sparse and Low Rank Matrices

Authors: Dan Yang, Zongming Ma, Andreas Buja

Abstract: We study minimax rates for denoising simultaneously sparse and low rank matrices in high dimensions. We show that an iterative thresholding algorithm achieves (near) optimal rates adaptively under mild conditions for a large class of loss functions. Numerical experiments on synthetic datasets also demonstrate the competitive performance of the proposed method.

Submitted 1 May, 2014; originally announced May 2014.

arXiv:1404.1578 [pdf, other]

Models as Approximations I: Consequences Illustrated with Linear Regression

Authors: Andreas Buja, Richard Berk, Lawrence Brown, Edward George, Emil Pitkin, Mikhail Traskin, Linda Zhao, Kai Zhang

Abstract: In the early 1980s Halbert White inaugurated a "model-robust'' form of statistical inference based on the "sandwich estimator'' of standard error. This estimator is known to be "heteroskedasticity-consistent", but it is less well-known to be "nonlinearity-consistent'' as well. Nonlinearity, however, raises fundamental issues because in its presence regressors are not ancillary, hence can't be treated as fixed. The consequences are deep: (1)~population slopes need to be re-interpreted as statistical functionals obtained from OLS fits to largely arbitrary joint $\xy$~distributions; (2)~the meaning of slope parameters needs to be rethought; (3)~the regressor distribution affects the slope parameters; (4)~randomness of the regressors becomes a source of sampling variability in slope estimates; (5)~inference needs to be based on model-robust standard errors, including sandwich estimators or the $\xy$~bootstrap. In theory, model-robust and model-trusting standard errors can deviate by arbitrary magnitudes either way. In practice, significant deviations between them can be detected with a diagnostic test.

Submitted 6 July, 2019; v1 submitted 6 April, 2014; originally announced April 2014.

Comments: Submitted

arXiv:1311.0291 [pdf, other]

Improved Precision in Estimating Average Treatment Effects

Authors: Emil Pitkin, Richard Berk, Lawrence Brown, Andreas Buja, Ed George, Kai Zhang, Linda Zhao

Abstract: The Average Treatment Effect (ATE) is a global measure of the effectiveness of an experimental treatment intervention. Classical methods of its estimation either ignore relevant covariates or do not fully exploit them. Moreover, past work has considered covariates as fixed. We present a method for improving the precision of the ATE estimate: the treatment and control responses are estimated via a regression, and information is pooled between the groups to produce an asymptotically unbiased estimate; we subsequently justify the random X paradigm underlying the result. Standard errors are derived, and the estimator's performance is compared to the traditional estimator. Conditions under which the regression-based estimator is preferable are detailed, and a demonstration on real data is presented.

Submitted 1 November, 2013; originally announced November 2013.

Comments: 22 pages, 1 figure

arXiv:1310.5677 [pdf]

Penalized Split Criteria for Interpretable Trees

Authors: Alex Goldstein, Andreas Buja

Abstract: This paper describes techniques for growing classification and regression trees designed to induce visually interpretable trees. This is achieved by penalizing splits that extend the subset of features used in a particular branch of the tree. After a brief motivation, we summarize existing methods and introduce new ones, providing illustrative examples throughout. Using a number of real classification and regression datasets, we find that these procedures can offer more interpretable fits than the CART methodology with very modest increases in out-of-sample loss.

Submitted 21 October, 2013; originally announced October 2013.

Comments: 25 pages

arXiv:1306.1059 [pdf, ps, other]

doi 10.1214/12-AOS1077

Valid post-selection inference

Authors: Richard Berk, Lawrence Brown, Andreas Buja, Kai Zhang, Linda Zhao

Abstract: It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees that classical statistical theory provides for tests and confidence intervals when the model has been chosen a priori. We propose to produce valid ``post-selection inference'' by reducing the problem to one of simultaneous inference and hence suitably widening conventional confidence and retention intervals. Simultaneity is required for all linear functions that arise as coefficient estimates in all submodels. By purchasing ``simultaneity insurance'' for all possible submodels, the resulting post-selection inference is rendered universally valid under all possible model selection procedures. This inference is therefore generally conservative for particular selection procedures, but it is always less conservative than full Scheffe protection. Importantly it does not depend on the truth of the selected submodel, and hence it produces valid inference even in wrong models. We describe the structure of the simultaneous inference problem and give some asymptotic results.

Submitted 5 June, 2013; originally announced June 2013.

Comments: Published at http://dx.doi.org/10.1214/12-AOS1077 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1077

Journal ref: Annals of Statistics 2013, Vol. 41, No. 2, 802-837

arXiv:1112.2433 [pdf, other]

A Sparse SVD Method for High-dimensional Data

Authors: Dan Yang, Zongming Ma, Andreas Buja

Abstract: We present a new computational approach to approximating a large, noisy data table by a low-rank matrix with sparse singular vectors. The approximation is obtained from thresholded subspace iterations that produce the singular vectors simultaneously, rather than successively as in competing proposals. We introduce novel ways to estimate thresholding parameters which obviate the need for computationally expensive cross-validation. We also introduce a way to sparsely initialize the algorithm for computational savings that allow our algorithm to outperform the vanilla SVD on the full data table when the signal is sparse. A comparison with two existing sparse SVD methods suggests that our algorithm is computationally always faster and statistically always at least comparable to the better of the two competing algorithms.

Submitted 11 December, 2011; originally announced December 2011.

arXiv:0808.0777 [pdf, ps, other]

doi 10.1214/07-STS251

A Conversation with Peter Huber

Authors: Andreas Buja, Hans R. Künsch

Abstract: Peter J. Huber was born on March 25, 1934, in Wohlen, a small town in the Swiss countryside. He obtained a diploma in mathematics in 1958 and a Ph.D. in mathematics in 1961, both from ETH Zurich. His thesis was in pure mathematics, but he then decided to go into statistics. He spent 1961–1963 as a postdoc at the statistics department in Berkeley where he wrote his first and most famous paper on robust statistics, "Robust Estimation of a Location Parameter." After a position as a visiting professor at Cornell University, he became a full professor at ETH Zurich. He worked at ETH until 1978, interspersed by visiting positions at Cornell, Yale, Princeton and Harvard. After leaving ETH, he held professor positions at Harvard University 1978–1988, at MIT 1988–1992, and finally at the University of Bayreuth from 1992 until his retirement in 1999. He now lives in Klosters, a village in the Grisons in the Swiss Alps. Peter Huber has published four books and over 70 papers on statistics and data analysis. In addition, he has written more than a dozen papers and two books on Babylonian mathematics, astronomy and history. In 1972, he delivered the Wald lectures. He is a fellow of the IMS, of the American Association for the Advancement of Science, and of the American Academy of Arts and Sciences. In 1988 he received a Humboldt Award and in 1994 an honorary doctorate from the University of Neuchâtel.
In addition to his fundamental results in robust statistics, Peter Huber made important contributions to computational statistics, strategies in data analysis, and applications of statistics in fields such as crystallography, EEGs, and human growth curves.

Submitted 6 August, 2008; originally announced August 2008.

Comments: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/07-STS251

Report number: IMS-STS-STS251

Journal ref: Statistical Science 2008, Vol. 23, No. 1, 120-135

arXiv:0807.4862 [pdf, ps, other]

doi 10.1214/08-EJS218

Functional principal components analysis via penalized rank one approximation

Authors: Jianhua Z. Huang, Haipeng Shen, Andreas Buja

Abstract: Two existing approaches to functional principal components analysis (FPCA) are due to Rice and Silverman (1991) and Silverman (1996), both based on maximizing variance but introducing penalization in different ways. In this article we propose an alternative approach to FPCA using penalized rank one approximation to the data matrix. Our contributions are four-fold: (1) by considering invariance under scale transformation of the measurements, the new formulation sheds light on how regularization should be performed for FPCA and suggests an efficient power algorithm for computation; (2) it naturally incorporates spline smoothing of discretized functional data; (3) the connection with smoothing splines also facilitates construction of cross-validation or generalized cross-validation criteria for smoothing parameter selection that allows efficient computation; (4) different smoothing parameters are permitted for different FPCs. The methodology is illustrated with a real data example and a simulation.

Submitted 30 July, 2008; originally announced July 2008.

Comments: Published in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/08-EJS218

Report number: IMS-EJS-EJS_2008_218

MSC Class: 62G08; 62H25 (Primary) 65F30 (Secondary)

Journal ref: Electronic Journal of Statistics 2008, Vol. 2, 678-695

arXiv:0804.2757 [pdf, ps, other]

doi 10.1214/07-STS242B

Comment: Boosting Algorithms: Regularization, Prediction and Model Fitting

Authors: Andreas Buja, David Mease, Abraham J. Wyner

Abstract: The authors are doing the readers of Statistical Science a true service with a well-written and up-to-date overview of boosting that originated with the seminal algorithms of Freund and Schapire. Equally, we are grateful for high-level software that will permit a larger readership to experiment with, or simply apply, boosting-inspired model fitting. The authors show us a world of methodology that illustrates how a fundamental innovation can penetrate every nook and cranny of statistical thinking and practice. They introduce the reader to one particular interpretation of boosting and then give a display of its potential with extensions from classification (where it all started) to least squares, exponential family models, survival analysis, to base-learners other than trees such as smoothing splines, to degrees of freedom and regularization, and to fascinating recent work in model selection. The uninitiated reader will find that the authors did a nice job of presenting a certain coherent and useful interpretation of boosting. The other reader, though, who has watched the business of boosting for a while, may have quibbles with the authors over details of the historic record and, more importantly, over their optimism about the current state of theoretical knowledge. In fact, as much as "the statistical view" has proven fruitful, it has also resulted in some ideas about why boosting works that may be misconceived, and in some recommendations that may be misguided. [arXiv:0804.2752]

Submitted 17 April, 2008; originally announced April 2008.

Comments: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/07-STS242B

Report number: IMS-STS-STS242B

Journal ref: Statistical Science 2007, Vol. 22, No. 4, 506-512

Showing 1–23 of 23 results for author: Buja, A