-
Hole or grain? A Section Pursuit Index for Finding Hidden Structure in Multiple Dimensions
Authors:
Ursula Laa,
Dianne Cook,
Andreas Buja,
German Valencia
Abstract:
Multivariate data is often visualized using linear projections, produced by techniques such as principal component analysis, linear discriminant analysis, and projection pursuit. A problem with projections is that they obscure low and high density regions near the center of the distribution. Sections, or slices, can help to reveal them. This paper develops a section pursuit method, building on the extensive work in projection pursuit, to search for interesting slices of the data. Linear projections are used to define sections of the parameter space, and to calculate interestingness by comparing the distribution of observations inside and outside a section. By optimizing this index, it is possible to reveal features such as holes (low density) or grains (high density). The optimization is incorporated into a guided tour so that the search for structure can be dynamic. The approach can be useful for problems where data distributions depart from uniform or normal, as in visually exploring nonlinear manifolds and functions in multivariate space. Two applications of section pursuit are shown: exploring decision boundaries from classification models, and exploring subspaces induced by complex inequality conditions from a multiple-parameter model. The new methods are available in R, in the tourr package.
Submitted 10 March, 2022; v1 submitted 28 April, 2020;
originally announced April 2020.
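As a rough illustration of the slicing idea in the abstract above, the sketch below marks which observations fall inside a section around a one-dimensional projection. It is a simplified Python stand-in and not the tourr implementation: the function name `in_slice`, the thickness parameter `h`, and the restriction to a single projection direction are all assumptions made for this example.

```python
import math

def in_slice(points, direction, center, h):
    """Flag points whose orthogonal distance to the projection line is at most h.

    points    -- list of p-dimensional points (lists of floats)
    direction -- vector spanning the 1-D projection (need not be unit length)
    center    -- point the projection line passes through
    h         -- slice half-thickness (hypothetical tuning parameter)
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    flags = []
    for x in points:
        shifted = [xi - ci for xi, ci in zip(x, center)]
        # Coordinate of the point along the projection direction.
        along = sum(s * u for s, u in zip(shifted, unit))
        total_sq = sum(s * s for s in shifted)
        # Pythagoras: squared orthogonal distance = total - along^2 (clamped at 0).
        orth = math.sqrt(max(total_sq - along * along, 0.0))
        flags.append(orth <= h)
    return flags

points = [[0.0, 0.0, 0.0], [2.0, 0.1, 0.0], [0.5, 1.5, 0.0], [-1.0, 0.0, 0.2]]
flags = in_slice(points, direction=[1.0, 0.0, 0.0], center=[0.0, 0.0, 0.0], h=0.25)
```

An index in the spirit of the abstract would then compare the distribution of projected points flagged as inside the slice with that of the points flagged as outside.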
-
All of Linear Regression
Authors:
Arun K. Kuchibhotla,
Lawrence D. Brown,
Andreas Buja,
Junhui Cai
Abstract:
Least squares linear regression is one of the oldest and most widely used data analysis tools. Although the theoretical analysis of the ordinary least squares (OLS) estimator is as old, several fundamental questions are yet to be answered. Suppose regression observations $(X_1,Y_1),\ldots,(X_n,Y_n)\in\mathbb{R}^d\times\mathbb{R}$ (not necessarily independent) are available. Some of the questions we deal with are as follows: under what conditions does the OLS estimator converge and what is the limit? What happens if the dimension is allowed to grow with $n$? What happens if the observations are dependent with dependence possibly strengthening with $n$? How to do statistical inference under these kinds of misspecification? What happens to the OLS estimator under variable selection? How to do inference under misspecification and variable selection?
We answer all the questions raised above with one simple deterministic inequality which holds for any set of observations and any sample size. This implies that all our results are finite-sample (non-asymptotic) in nature. In the end, one only needs to bound certain random quantities under specific settings of interest to get concrete rates, and we derive these bounds for the case of independent observations. In particular, the problem of inference after variable selection is studied, for the first time, when $d$, the number of covariates, increases (almost exponentially) with sample size $n$. We provide comments on the ``right'' statistic to consider for inference under variable selection and efficient computation of quantiles.
Submitted 14 October, 2019;
originally announced October 2019.
-
Model-free Study of Ordinary Least Squares Linear Regression
Authors:
Arun K. Kuchibhotla,
Lawrence D. Brown,
Andreas Buja
Abstract:
Ordinary least squares (OLS) linear regression is one of the most basic statistical techniques for data analysis. In the mainstream literature and in statistical education, the study of linear regression is typically restricted to the case where the covariates are fixed and the errors are mean-zero Gaussians with variance independent of the (fixed) covariates. Even though OLS has been studied under misspecification from as early as the 1960s, the mainstream literature and the applied sciences have not yet caught up with the implications. The present article is an attempt at a unified viewpoint that makes the various implications of misspecification stand out.
Submitted 27 September, 2018;
originally announced September 2018.
-
A Security Analysis of IoT Encryption: Side-channel Cube Attack on Simeck32/64
Authors:
Alya Geogiana Buja,
Shekh Faisal Abdul-Latip,
Rabiah Ahmad
Abstract:
Simeck, a lightweight block cipher, has been proposed as one of the encryption schemes that can be employed in Internet of Things (IoT) applications. This paper therefore examines the security of the Simeck32/64 block cipher against the side-channel cube attack. We exhibit our attack against Simeck32/64 using the Hamming weight leakage assumption to extract linearly independent equations in key bits. We have been able to find 32 linearly independent equations in 32 key variables by only considering the second bit from the LSB of the Hamming weight leakage of the internal state on the fourth round of the cipher. This enables our attack to improve on previous attacks on Simeck32/64 within the side-channel attack model, with better time and data complexities of 2^35 and 2^11.29, respectively.
Submitted 10 August, 2018;
originally announced August 2018.
-
Using Recursive Partitioning to Find and Estimate Heterogenous Treatment Effects In Randomized Clinical Trials
Authors:
Richard Berk,
Matthew Olson,
Andreas Buja,
Aurelie Ouss
Abstract:
Heterogeneous treatment effects can be very important in the analysis of randomized clinical trials. Heightened risks or enhanced benefits may exist for particular subsets of study subjects. When the heterogeneous treatment effects are specified as the research is being designed, there are proper and readily available analysis techniques. When the heterogeneous treatment effects are inductively obtained as an experiment's data are analyzed, significant complications are introduced. There can be a need for special loss functions designed to find local average treatment effects and for techniques that properly address post selection statistical inference. In this paper, we tackle both while undertaking a recursive partitioning analysis of a randomized clinical trial testing whether individuals on probation, who are low risk, can be minimally supervised with no increase in recidivism.
Submitted 11 July, 2018;
originally announced July 2018.
-
Assumption Lean Regression
Authors:
Richard Berk,
Andreas Buja,
Lawrence Brown,
Edward George,
Arun Kumar Kuchibhotla,
Weijie J. Su,
Linda Zhao
Abstract:
It is well known that models used in conventional regression analysis are commonly misspecified. A standard response is little more than a shrug. Data analysts invoke Box's maxim that all models are wrong and then proceed as if the results are useful nevertheless. In this paper, we provide an alternative. Regression models are treated explicitly as approximations of a true response surface that can have a number of desirable statistical properties, including estimates that are asymptotically unbiased. Valid statistical inference follows. We generalize the formulation to include regression functionals, which broadens substantially the range of potential applications. An empirical application is provided to illustrate the paper's key concepts.
Submitted 26 June, 2018; v1 submitted 23 June, 2018;
originally announced June 2018.
-
Valid Post-selection Inference in Assumption-lean Linear Regression
Authors:
Arun Kumar Kuchibhotla,
Lawrence D. Brown,
Andreas Buja,
Edward I. George,
Linda Zhao
Abstract:
Construction of valid statistical inference for estimators based on data-driven selection has received a lot of attention in recent times. Berk et al. (2013) is possibly the first work to provide valid inference for Gaussian homoscedastic linear regression with fixed covariates under arbitrary covariate/variable selection. The setting is unrealistic and is extended by Bachoc et al. (2016) by relaxing the distributional assumptions. A major drawback of the aforementioned works is that the construction of valid confidence regions is computationally intensive. In this paper, we first prove that post-selection inference is equivalent to simultaneous inference and then construct valid post-selection confidence regions which are computationally simple. Our construction is based on deterministic inequalities and applies to independent as well as dependent random variables without the requirement of correct distributional assumptions. Finally, we compare the volume of our confidence regions with the existing ones and show that under non-stochastic covariates, our regions are much smaller.
Submitted 11 June, 2018;
originally announced June 2018.
-
Uniform-in-Submodel Bounds for Linear Regression in a Model Free Framework
Authors:
Arun Kumar Kuchibhotla,
Lawrence D. Brown,
Andreas Buja,
Edward I. George,
Linda Zhao
Abstract:
For the last two decades, high-dimensional data and methods have proliferated throughout the literature. Yet, the classical technique of linear regression has not lost its usefulness in applications. In fact, many high-dimensional estimation techniques can be seen as variable selection that leads to a smaller set of variables (a ``sub-model'') where classical linear regression applies. We analyze linear regression estimators resulting from model-selection by proving estimation error and linear representation bounds uniformly over sets of submodels. Based on deterministic inequalities, our results provide ``good'' rates when applied to both independent and dependent data. These results are useful in meaningfully interpreting the linear regression estimator obtained after exploring and reducing the variables and also in justifying post model-selection inference. All results are derived under no model assumptions and are non-asymptotic in nature.
Submitted 17 May, 2021; v1 submitted 15 February, 2018;
originally announced February 2018.
-
Models as Approximations II: A Model-Free Theory of Parametric Regression
Authors:
Andreas Buja,
Lawrence Brown,
Arun Kumar Kuchibhotla,
Richard Berk,
Ed George,
Linda Zhao
Abstract:
We develop a model-free theory of general types of parametric regression for iid observations. The theory replaces the parameters of parametric models with statistical functionals, to be called ``regression functionals'', defined on large non-parametric classes of joint $\xy$ distributions, without assuming a correct model. Parametric models are reduced to heuristics to suggest plausible objective functions. An example of a regression functional is the vector of slopes of linear equations fitted by OLS to largely arbitrary $\xy$ distributions, without assuming a linear model (see Part~I). More generally, regression functionals can be defined by minimizing objective functions or solving estimating equations at joint $\xy$ distributions. In this framework it is possible to achieve the following: (1)~define a notion of well-specification for regression functionals that replaces the notion of correct specification of models, (2)~propose a well-specification diagnostic for regression functionals based on reweighting distributions and data, (3)~decompose sampling variability of regression functionals into two sources, one due to the conditional response distribution and another due to the regressor distribution interacting with misspecification, both of order $N^{-1/2}$, (4)~exhibit plug-in/sandwich estimators of standard error as limit cases of $\xy$ bootstrap estimators, and (5)~provide theoretical heuristics to indicate that $\xy$ bootstrap standard errors may generally be more stable than sandwich estimators.
Submitted 6 July, 2019; v1 submitted 10 December, 2016;
originally announced December 2016.
-
Smoothing Effects of Bagging: Von Mises Expansions of Bagged Statistical Functionals
Authors:
Andreas Buja,
Werner Stuetzle
Abstract:
Bagging is a device intended for reducing the prediction error of learning algorithms. In its simplest form, bagging draws bootstrap samples from the training sample, applies the learning algorithm to each bootstrap sample, and then averages the resulting prediction rules.
We extend the definition of bagging from statistics to statistical functionals and study the von Mises expansion of bagged statistical functionals. We show that the expansion is related to the Efron-Stein ANOVA expansion of the raw (unbagged) functional. The basic observation is that a bagged functional is always smooth in the sense that the von Mises expansion exists and is finite of length 1 + resample size $M$. This holds even if the raw functional is rough or unstable. The resample size $M$ acts as a smoothing parameter, where a smaller $M$ means more smoothing.
Submitted 7 December, 2016;
originally announced December 2016.
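The bagging recipe described in the abstract above (draw bootstrap resamples, apply the procedure to each, average the results) can be sketched for a plain statistic as follows. This is a minimal Monte Carlo sketch, not the paper's construction: the function name `bagged_statistic`, the use of the median as the raw statistic, sampling with replacement, and the toy data are all assumptions chosen for illustration.

```python
import random
import statistics

def bagged_statistic(sample, statistic, resample_size, n_resamples=2000, seed=0):
    """Approximate a bagged statistic by averaging over bootstrap resamples.

    resample_size plays the role of the resample size M in the abstract;
    n_resamples and seed are hypothetical Monte Carlo settings.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_resamples):
        # Draw M observations with replacement from the original sample.
        resample = [rng.choice(sample) for _ in range(resample_size)]
        values.append(statistic(resample))
    return sum(values) / len(values)

sample = [1.0, 2.0, 3.0, 4.0, 100.0]
raw_median = statistics.median(sample)
bagged_small_m = bagged_statistic(sample, statistics.median, resample_size=2)
bagged_large_m = bagged_statistic(sample, statistics.median, resample_size=25)
```

With resamples of size 2 the median of each resample is the mean of its two points, so the bagged value is pulled toward the sample mean, while larger resamples stay closer to the raw median.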
-
Semi-Supervised linear regression
Authors:
David Azriel,
Lawrence D. Brown,
Michael Sklar,
Richard Berk,
Andreas Buja,
Linda Zhao
Abstract:
We study a regression problem where for some part of the data we observe both the label variable ($Y$) and the predictors (${\bf X}$), while for the other part of the data only the predictors are given. Such a problem arises, for example, when observations of the label variable are costly and may require a skilled human agent. When the conditional expectation $E[Y | {\bf X}]$ is not exactly linear, one can consider the best linear approximation to the conditional expectation, which can be estimated consistently by the least squares estimates (LSE). The latter depends only on the labeled data. We suggest improved alternative estimates to the LSE that also use the unlabeled data. Our estimation method can be easily implemented and has simply described asymptotic properties. The new estimates asymptotically dominate the usual standard procedures under a certain non-linearity condition on $E[Y | {\bf X}]$; otherwise, they are asymptotically equivalent. The performance of the new estimator for small sample sizes is investigated in an extensive simulation study. A real data example of inferring the homeless population is used to illustrate the new methodology.
Submitted 13 April, 2021; v1 submitted 7 December, 2016;
originally announced December 2016.
-
Kernel Additive Principal Components
Authors:
Xin Lu Tan,
Andreas Buja,
Zongming Ma
Abstract:
Additive principal components (APCs for short) are a nonlinear generalization of linear principal components. We focus on smallest APCs to describe additive nonlinear constraints that are approximately satisfied by the data. Thus APCs fit data with implicit equations that treat the variables symmetrically, as opposed to regression analyses which fit data with explicit equations that treat the data asymmetrically by singling out a response variable. We propose a regularized data-analytic procedure for APC estimation using kernel methods. In contrast to existing approaches to APCs that are based on regularization through subspace restriction, kernel methods achieve regularization through shrinkage and therefore grant distinctive flexibility in APC estimation by allowing the use of infinite-dimensional function spaces for searching APC transformations while retaining computational feasibility. To connect population APCs and kernelized finite-sample APCs, we study kernelized population APCs and their associated eigenproblems, which eventually lead to the establishment of consistency of the estimated APCs. Lastly, we discuss an iterative algorithm for computing kernelized finite-sample APCs.
Submitted 20 November, 2015;
originally announced November 2015.
-
Calibrated Percentile Double Bootstrap For Robust Linear Regression Inference
Authors:
Daniel McCarthy,
Kai Zhang,
Lawrence Brown,
Richard Berk,
Andreas Buja,
Edward George,
Linda Zhao
Abstract:
We consider inference for the parameters of a linear model when the covariates are random and the relationship between response and covariates is possibly non-linear. Conventional inference methods such as z-intervals perform poorly in these cases. We propose a double bootstrap-based calibrated percentile method, perc-cal, as a general-purpose CI method which performs very well relative to alternative methods in challenging situations such as these. The superior performance of perc-cal is demonstrated by a thorough, full-factorial design synthetic data study as well as a real data example involving the length of criminal sentences. We also provide theoretical justification for the perc-cal method under mild conditions. The method is implemented in the R package `perccal', available through CRAN and coded primarily in C++, to make it easier for practitioners to use.
Submitted 16 January, 2017; v1 submitted 1 November, 2015;
originally announced November 2015.
-
Discussion: "A significance test for the lasso"
Authors:
A. Buja,
L. Brown
Abstract:
Discussion of "A significance test for the lasso" by Richard Lockhart, Jonathan Taylor, Ryan J. Tibshirani, Robert Tibshirani [arXiv:1301.7161].
Submitted 27 May, 2014;
originally announced May 2014.
-
Rate Optimal Denoising of Simultaneously Sparse and Low Rank Matrices
Authors:
Dan Yang,
Zongming Ma,
Andreas Buja
Abstract:
We study minimax rates for denoising simultaneously sparse and low rank matrices in high dimensions. We show that an iterative thresholding algorithm achieves (near) optimal rates adaptively under mild conditions for a large class of loss functions. Numerical experiments on synthetic datasets also demonstrate the competitive performance of the proposed method.
Submitted 1 May, 2014;
originally announced May 2014.
-
Models as Approximations I: Consequences Illustrated with Linear Regression
Authors:
Andreas Buja,
Richard Berk,
Lawrence Brown,
Edward George,
Emil Pitkin,
Mikhail Traskin,
Linda Zhao,
Kai Zhang
Abstract:
In the early 1980s Halbert White inaugurated a ``model-robust'' form of statistical inference based on the ``sandwich estimator'' of standard error. This estimator is known to be ``heteroskedasticity-consistent'', but it is less well-known to be ``nonlinearity-consistent'' as well. Nonlinearity, however, raises fundamental issues because in its presence regressors are not ancillary, hence can't be treated as fixed.
The consequences are deep: (1)~population slopes need to be re-interpreted as statistical functionals obtained from OLS fits to largely arbitrary joint $\xy$~distributions; (2)~the meaning of slope parameters needs to be rethought; (3)~the regressor distribution affects the slope parameters; (4)~randomness of the regressors becomes a source of sampling variability in slope estimates; (5)~inference needs to be based on model-robust standard errors, including sandwich estimators or the $\xy$~bootstrap. In theory, model-robust and model-trusting standard errors can deviate by arbitrary magnitudes either way. In practice, significant deviations between them can be detected with a diagnostic test.
Submitted 6 July, 2019; v1 submitted 6 April, 2014;
originally announced April 2014.
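To make the contrast between model-trusting and model-robust standard errors in the abstract above concrete, the following sketch computes both for the slope of a simple linear regression with an intercept. It is an illustrative Python example and not code from the paper: the function name is hypothetical, and the sandwich estimate uses the basic form without small-sample correction (often labeled HC0), which is an assumption of this sketch.

```python
import math

def ols_slope_standard_errors(x, y):
    """Return (slope, model-trusting SE, sandwich SE) for y ~ intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Model-trusting: assumes a linear mean and constant error variance.
    sigma2 = sum(r * r for r in resid) / (n - 2)
    se_classical = math.sqrt(sigma2 / sxx)

    # Sandwich (no small-sample correction): each squared residual is
    # weighted by the squared centered regressor value.
    meat = sum((d * r) ** 2 for d, r in zip(dx, resid))
    se_sandwich = math.sqrt(meat) / sxx
    return slope, se_classical, se_sandwich

# Toy data with a deliberately nonlinear (quadratic) response.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 4.0, 9.0, 16.0]
slope, se_classical, se_sandwich = ols_slope_standard_errors(x, y)
```

On data like these, where the linear fit is only an approximation, the two standard errors generally differ, which is the kind of deviation the abstract says can be detected with a diagnostic test.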
-
Improved Precision in Estimating Average Treatment Effects
Authors:
Emil Pitkin,
Richard Berk,
Lawrence Brown,
Andreas Buja,
Ed George,
Kai Zhang,
Linda Zhao
Abstract:
The Average Treatment Effect (ATE) is a global measure of the effectiveness of an experimental treatment intervention. Classical methods of its estimation either ignore relevant covariates or do not fully exploit them. Moreover, past work has considered covariates as fixed. We present a method for improving the precision of the ATE estimate: the treatment and control responses are estimated via a regression, and information is pooled between the groups to produce an asymptotically unbiased estimate; we subsequently justify the random X paradigm underlying the result. Standard errors are derived, and the estimator's performance is compared to the traditional estimator. Conditions under which the regression-based estimator is preferable are detailed, and a demonstration on real data is presented.
Submitted 1 November, 2013;
originally announced November 2013.
-
Penalized Split Criteria for Interpretable Trees
Authors:
Alex Goldstein,
Andreas Buja
Abstract:
This paper describes techniques for growing classification and regression trees designed to induce visually interpretable trees. This is achieved by penalizing splits that extend the subset of features used in a particular branch of the tree. After a brief motivation, we summarize existing methods and introduce new ones, providing illustrative examples throughout. Using a number of real classification and regression datasets, we find that these procedures can offer more interpretable fits than the CART methodology with very modest increases in out-of-sample loss.
Submitted 21 October, 2013;
originally announced October 2013.
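One simple way to picture the idea in the abstract above, penalizing splits that bring a new feature into a branch, is sketched below. The additive penalty, the function name `choose_split`, and the example gains are assumptions made purely for illustration; the abstract does not state the form of the paper's criteria, and this sketch should not be read as reproducing them.

```python
def choose_split(candidate_gains, features_in_branch, penalty):
    """Pick the feature with the best penalized gain.

    candidate_gains    -- dict mapping feature name to the impurity reduction
                          of its best split (hypothetical inputs)
    features_in_branch -- features already used on the path to this node
    penalty            -- amount subtracted when a split would add a new feature
    """
    best_feature, best_score = None, float("-inf")
    for feature, gain in candidate_gains.items():
        # Features already used in this branch are not penalized.
        score = gain if feature in features_in_branch else gain - penalty
        if score > best_score:
            best_feature, best_score = feature, score
    return best_feature, best_score

gains = {"age": 0.30, "income": 0.34, "height": 0.10}
unpenalized = choose_split(gains, features_in_branch={"age"}, penalty=0.0)
penalized = choose_split(gains, features_in_branch={"age"}, penalty=0.1)
```

Without a penalty the split on `income` wins on raw gain; with the penalty the branch keeps splitting on `age`, trading a small loss in gain for a branch that involves fewer features.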
-
Valid post-selection inference
Authors:
Richard Berk,
Lawrence Brown,
Andreas Buja,
Kai Zhang,
Linda Zhao
Abstract:
It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees that classical statistical theory provides for tests and confidence intervals when the model has been chosen a priori. We propose to produce valid ``post-selection inference'' by reducing the problem to one of simultaneous inference and hence suitably widening conventional confidence and retention intervals. Simultaneity is required for all linear functions that arise as coefficient estimates in all submodels. By purchasing ``simultaneity insurance'' for all possible submodels, the resulting post-selection inference is rendered universally valid under all possible model selection procedures. This inference is therefore generally conservative for particular selection procedures, but it is always less conservative than full Scheffe protection. Importantly it does not depend on the truth of the selected submodel, and hence it produces valid inference even in wrong models. We describe the structure of the simultaneous inference problem and give some asymptotic results.
Submitted 5 June, 2013;
originally announced June 2013.
-
A Sparse SVD Method for High-dimensional Data
Authors:
Dan Yang,
Zongming Ma,
Andreas Buja
Abstract:
We present a new computational approach to approximating a large, noisy data table by a low-rank matrix with sparse singular vectors. The approximation is obtained from thresholded subspace iterations that produce the singular vectors simultaneously, rather than successively as in competing proposals. We introduce novel ways to estimate thresholding parameters which obviate the need for computationally expensive cross-validation. We also introduce a way to sparsely initialize the algorithm for computational savings that allow our algorithm to outperform the vanilla SVD on the full data table when the signal is sparse. A comparison with two existing sparse SVD methods suggests that our algorithm is computationally always faster and statistically always at least comparable to the better of the two competing algorithms.
Submitted 11 December, 2011;
originally announced December 2011.
-
A Conversation with Peter Huber
Authors:
Andreas Buja,
Hans R. Künsch
Abstract:
Peter J. Huber was born on March 25, 1934, in Wohlen, a small town in the Swiss countryside. He obtained a diploma in mathematics in 1958 and a Ph.D. in mathematics in 1961, both from ETH Zurich. His thesis was in pure mathematics, but he then decided to go into statistics. He spent 1961--1963 as a postdoc at the statistics department in Berkeley where he wrote his first and most famous paper on robust statistics, ``Robust Estimation of a Location Parameter.'' After a position as a visiting professor at Cornell University, he became a full professor at ETH Zurich. He worked at ETH until 1978, interspersed by visiting positions at Cornell, Yale, Princeton and Harvard. After leaving ETH, he held professor positions at Harvard University 1978--1988, at MIT 1988--1992, and finally at the University of Bayreuth from 1992 until his retirement in 1999. He now lives in Klosters, a village in the Grisons in the Swiss Alps. Peter Huber has published four books and over 70 papers on statistics and data analysis. In addition, he has written more than a dozen papers and two books on Babylonian mathematics, astronomy and history. In 1972, he delivered the Wald lectures. He is a fellow of the IMS, of the American Association for the Advancement of Science, and of the American Academy of Arts and Sciences. In 1988 he received a Humboldt Award and in 1994 an honorary doctorate from the University of Neuchâtel. In addition to his fundamental results in robust statistics, Peter Huber made important contributions to computational statistics, strategies in data analysis, and applications of statistics in fields such as crystallography, EEGs, and human growth curves.
Submitted 6 August, 2008;
originally announced August 2008.
-
Functional principal components analysis via penalized rank one approximation
Authors:
Jianhua Z. Huang,
Haipeng Shen,
Andreas Buja
Abstract:
Two existing approaches to functional principal components analysis (FPCA) are due to Rice and Silverman (1991) and Silverman (1996), both based on maximizing variance but introducing penalization in different ways. In this article we propose an alternative approach to FPCA using penalized rank one approximation to the data matrix. Our contributions are four-fold: (1) by considering invariance under scale transformation of the measurements, the new formulation sheds light on how regularization should be performed for FPCA and suggests an efficient power algorithm for computation; (2) it naturally incorporates spline smoothing of discretized functional data; (3) the connection with smoothing splines also facilitates construction of cross-validation or generalized cross-validation criteria for smoothing parameter selection that allows efficient computation; (4) different smoothing parameters are permitted for different FPCs. The methodology is illustrated with a real data example and a simulation.
Submitted 30 July, 2008;
originally announced July 2008.
-
Comment: Boosting Algorithms: Regularization, Prediction and Model Fitting
Authors:
Andreas Buja,
David Mease,
Abraham J. Wyner
Abstract:
The authors are doing the readers of Statistical Science a true service with a well-written and up-to-date overview of boosting that originated with the seminal algorithms of Freund and Schapire. Equally, we are grateful for high-level software that will permit a larger readership to experiment with, or simply apply, boosting-inspired model fitting. The authors show us a world of methodology that illustrates how a fundamental innovation can penetrate every nook and cranny of statistical thinking and practice. They introduce the reader to one particular interpretation of boosting and then give a display of its potential with extensions from classification (where it all started) to least squares, exponential family models, survival analysis, to base-learners other than trees such as smoothing splines, to degrees of freedom and regularization, and to fascinating recent work in model selection. The uninitiated reader will find that the authors did a nice job of presenting a certain coherent and useful interpretation of boosting. The other reader, though, who has watched the business of boosting for a while, may have quibbles with the authors over details of the historic record and, more importantly, over their optimism about the current state of theoretical knowledge. In fact, as much as ``the statistical view'' has proven fruitful, it has also resulted in some ideas about why boosting works that may be misconceived, and in some recommendations that may be misguided. [arXiv:0804.2752]
Submitted 17 April, 2008;
originally announced April 2008.