Search | arXiv e-print repository

The Geometric Median and Applications to Robust Mean Estimation

Abstract: This paper is devoted to the statistical and numerical properties of the geometric median and its applications to the problem of robust mean estimation via the median-of-means principle. Our main theoretical results include (a) an upper bound for the distance between the mean and the median for general absolutely continuous distributions in R^d, and examples of specific classes of distributions for which these bounds do not depend on the ambient dimension d; and (b) exponential deviation inequalities for the distance between the sample and population versions of the geometric median, which again depend only on trace-type quantities and not on the ambient dimension. As a corollary, we deduce improved bounds for the (geometric) median-of-means estimator that hold for large classes of heavy-tailed distributions. Finally, we address the error of numerical approximation, which is an important practical aspect of any statistical estimation procedure. We demonstrate that the objective function minimized by the geometric median satisfies a "local quadratic growth" condition that allows one to translate suboptimality bounds for the objective function into corresponding bounds for the numerical approximation of the median itself, and we propose a simple stopping rule, applicable to any optimization method, that yields explicit error guarantees. We conclude with numerical experiments, including an application to estimating the mean values of log-returns for S&P 500 data.
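The abstract leaves the optimization method open. As a minimal sketch, the block below computes the geometric median-of-means estimator (the geometric median of block means) using Weiszfeld's classical fixed-point iteration. This solver is a swapped-in choice, not necessarily the authors'. Its stopping test on the step size is a naive heuristic and does not carry the explicit error guarantee of the paper's stopping rule. The function names are hypothetical.

```python
import math
from typing import List, Sequence

def geometric_median(points: Sequence[Sequence[float]],
                     tol: float = 1e-10, max_iter: int = 10_000) -> List[float]:
    """Approximate argmin_y sum_i ||x_i - y|| with Weiszfeld's iteration."""
    d = len(points[0])
    # Initialize at the coordinate-wise mean of the points.
    y = [sum(p[j] for p in points) / len(points) for j in range(d)]
    for _ in range(max_iter):
        num, den = [0.0] * d, 0.0
        for p in points:
            dist = math.dist(p, y)
            if dist < 1e-15:
                # Simplification: stop if the iterate lands on a data point.
                return [float(c) for c in p]
            w = 1.0 / dist  # inverse-distance weight
            den += w
            for j in range(d):
                num[j] += w * p[j]
        y_new = [c / den for c in num]
        # Naive step-size stopping test (NOT the paper's certified rule).
        if math.dist(y_new, y) < tol:
            return y_new
        y = y_new
    return y

def median_of_means(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Geometric median of the means of k disjoint, equal-size blocks."""
    size = len(points) // k  # leftover points, if any, are dropped for simplicity
    d = len(points[0])
    means = []
    for b in range(k):
        block = points[b * size:(b + 1) * size]
        means.append([sum(p[j] for p in block) / size for j in range(d)])
    return geometric_median(means)

data = [[0.0, 0.0]] * 29 + [[1000.0, 1000.0]]
print(median_of_means(data, k=5))  # close to [0, 0]
```

One grossly corrupted point shifts its block mean. However, the geometric median of the five block means stays at the origin, whereas the ordinary sample mean is pulled to roughly (33.3, 33.3).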

Submitted 19 July, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: 28 pages, 2 figures

MSC Class: 62G35; 60E15

arXiv:2304.13239

Numerical Approximation of Andrews Plots with Optimal Spatial-Spectral Smoothing

Authors: Mitchell Rimerman, Nate Strawn

Abstract: Andrews plots provide aesthetically pleasing visualizations of high-dimensional datasets. This work proves that Andrews plots (when defined in terms of the principal component scores of a dataset) are optimally "smooth" on average and solve an infinite-dimensional quadratic minimization program over the set of linear isometries from the Euclidean data space to $L^2([0,1])$. By building technical machinery that characterizes the solutions to general infinite-dimensional quadratic minimization programs over linear isometries, we further show that the solution set is (in the generic case) a manifold. To avoid the ambiguities presented by this manifold of solutions, we add "spectral smoothing" terms to the infinite-dimensional optimization program to induce Andrews plots with optimal spatial-spectral smoothing. We characterize the (generic) set of solutions to this program and prove that the resulting plots admit efficient numerical approximations. These spatial-spectral smooth Andrews plots tend to avoid some of the "visual clutter" that arises from the oscillation of trigonometric polynomials.
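To show what an Andrews plot is, the sketch below uses the classical parametrization $f_x(t) = x_1/\sqrt{2} + x_2\sin t + x_3\cos t + x_4\sin 2t + x_5\cos 2t + \cdots$ on $[-\pi,\pi]$, applied directly to raw coordinates. That parametrization is an assumption: the paper works in $L^2([0,1])$ with principal component scores. The block numerically checks the linear-isometry property. With the normalization $\frac{1}{\pi}\int_{-\pi}^{\pi}(f_x - f_y)^2\,dt$, the curve distance equals the Euclidean distance $\|x-y\|$. The function names are hypothetical.

```python
import math
from typing import Sequence

def andrews_curve(x: Sequence[float], t: float) -> float:
    """Classical Andrews function f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + ..."""
    value = x[0] / math.sqrt(2)
    for i, xi in enumerate(x[1:]):
        k = i // 2 + 1  # frequencies 1, 1, 2, 2, 3, 3, ...
        value += xi * (math.sin(k * t) if i % 2 == 0 else math.cos(k * t))
    return value

def curve_distance(x: Sequence[float], y: Sequence[float], n_nodes: int = 2048) -> float:
    """sqrt((1/pi) * integral over [-pi, pi] of (f_x - f_y)^2), by the midpoint rule.

    On a full period the uniform midpoint rule integrates trigonometric
    polynomials of degree below n_nodes exactly, up to rounding.
    """
    h = 2 * math.pi / n_nodes
    total = 0.0
    for m in range(n_nodes):
        t = -math.pi + (m + 0.5) * h
        diff = andrews_curve(x, t) - andrews_curve(y, t)
        total += diff * diff * h
    return math.sqrt(total / math.pi)

x, y = [1.0, 2.0, 3.0, 4.0, 5.0], [0.0, -1.0, 2.0, 0.5, 1.0]
print(curve_distance(x, y), math.dist(x, y))  # agree to rounding error
```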

Submitted 25 April, 2023; originally announced April 2023.

Comments: 25 pages, 12 figures

MSC Class: 42A99; 46N10; 47N10

arXiv:2107.10869

Filament Plots for Data Visualization

Authors: Nate Strawn

Abstract: The efficiency of modern computer graphics allows us to explore collections of space curves simultaneously with "drag-to-rotate" interfaces. This inspires us to replace "scatterplots of points" with "scatterplots of curves" to visualize relationships across an entire dataset simultaneously. Since spaces of curves are infinite-dimensional, scatterplots of curves avoid the "lossy" nature of scatterplots of points. In particular, if two points are close in a scatterplot of points derived from high-dimensional data, it does not generally follow that the two associated data points are close in the data space. Standard Andrews plots provide scatterplots of curves that perfectly preserve Euclidean distances, but simultaneous visualization of these graphs over an entire dataset produces visual clutter because graphs of functions generally overlap in 2D. We mitigate this visual clutter by constructing computationally inexpensive 3D extensions of Andrews plots. First, we construct optimally smooth 3D Andrews plots by considering linear isometries from Euclidean data spaces to spaces of planar parametric curves. We rigorously parametrize the linear isometries that produce (on average) optimally smooth curves over a given dataset. This parametrization of optimal isometries reveals many degrees of freedom, and (using recent results on generalized Gauss sums) we identify a particular member of this set that admits an asymptotic "tour" property and also avoids certain local degeneracies.
Finally, we construct unit-length 3D curves (filaments) by numerically solving Frenet-Serret systems given data from these 3D Andrews plots. We conclude with examples of filament plots for several standard datasets, illustrating how filament plots avoid visual clutter. Code and examples are available at https://github.com/n8epi/filaments/ and https://n8epi.github.io/filaments/
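The final step solves a Frenet-Serret system. The abstract does not say how curvature and torsion are derived from the 3D Andrews plots, so this minimal sketch takes them as hypothetical functions `kappa(s)` and `tau(s)` of arc length $s \in [0,1]$. It integrates $r' = T$, $T' = \kappa N$, $N' = -\kappa T + \tau B$, $B' = -\tau N$ with classical fourth-order Runge-Kutta. This is not the authors' code (see their repository for that). Because $s$ is arc length on $[0,1]$, the resulting curve has unit length.

```python
import math
from typing import Callable, List, Tuple

def frenet_serret_curve(kappa: Callable[[float], float], tau: Callable[[float], float],
                        steps: int = 1000) -> List[Tuple[float, float, float]]:
    """Integrate the Frenet-Serret system on s in [0, 1] with RK4; return sampled points."""
    h = 1.0 / steps
    # Flattened state: position r, tangent T, normal N, binormal B (3 numbers each).
    y = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]

    def f(s: float, state: List[float]) -> List[float]:
        k, t = kappa(s), tau(s)
        T, N, B = state[3:6], state[6:9], state[9:12]
        dT = [k * n for n in N]
        dN = [-k * a + t * b for a, b in zip(T, B)]
        dB = [-t * n for n in N]
        return T + dT + dN + dB  # r' = T, then T', N', B'

    points = [tuple(y[:3])]
    for i in range(steps):
        s = i * h
        k1 = f(s, y)
        k2 = f(s + h / 2, [a + h / 2 * b for a, b in zip(y, k1)])
        k3 = f(s + h / 2, [a + h / 2 * b for a, b in zip(y, k2)])
        k4 = f(s + h, [a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
        points.append(tuple(y[:3]))
    return points

circle = frenet_serret_curve(lambda s: 2 * math.pi, lambda s: 0.0)
```

With constant $\kappa = 2\pi$ and $\tau = 0$, the curve is a circle of circumference 1, so it closes up. This makes a convenient sanity check. RK4 lets the frame drift slightly from orthonormal, and periodic re-orthonormalization would be a natural refinement.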

Submitted 9 March, 2022; v1 submitted 20 July, 2021; originally announced July 2021.

Comments: 43 pages, 13 figures; the newest version updates the plots and clarifies some terminology and proofs

MSC Class: 42A99; 46N10; 53Z50

arXiv:1704.02658

Distributed Statistical Estimation and Rates of Convergence in Normal Approximation

Authors: Stanislav Minsker, Nate Strawn

Abstract: This paper presents a class of new algorithms for distributed statistical estimation that exploit the divide-and-conquer approach. We show that one of the key benefits of the divide-and-conquer strategy is robustness, an important characteristic for large distributed systems. We establish connections between the performance of these distributed algorithms and the rates of convergence in normal approximation, and we prove non-asymptotic deviation guarantees, as well as limit theorems, for the resulting estimators. Our techniques are illustrated through several examples: in particular, we obtain new results for the median-of-means estimator and provide performance guarantees for distributed maximum likelihood estimation.
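The abstract does not spell out the aggregation rule. This minimal sketch assumes that each of k machines computes an estimator on its own disjoint subsample and that the local estimates are combined by their median, which is the scalar analogue of the median-of-means estimator mentioned above. The function name and the default estimator (the sample mean) are assumptions.

```python
import statistics
from typing import Callable, Sequence

def divide_and_conquer(data: Sequence[float], k: int,
                       estimator: Callable[[Sequence[float]], float] = statistics.fmean) -> float:
    """Median of an estimator computed on k disjoint, equal-size subsamples."""
    size = len(data) // k  # any remainder is dropped for simplicity
    # Each "machine" j sees only its own block of the data.
    local_estimates = [estimator(data[j * size:(j + 1) * size]) for j in range(k)]
    # Robust aggregation: a few corrupted blocks cannot move the median far.
    return statistics.median(local_estimates)

data = [1.0] * 99 + [1e6]
print(divide_and_conquer(data, k=10))  # 1.0, while the plain mean exceeds 10000
```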

Submitted 27 August, 2018; v1 submitted 9 April, 2017; originally announced April 2017.

MSC Class: 68W15; 62G35

arXiv:1411.4158

Bayesian Graphical Models for Multivariate Functional Data

Authors: Hongxiao Zhu, Nate Strawn, David B. Dunson

Abstract: Graphical models express conditional independence relationships among variables. Although methods for vector-valued data are well established, graphical models for functional data remain underdeveloped. We introduce a notion of conditional independence between random functions and construct a framework for Bayesian inference of undirected, decomposable graphs in the multivariate functional data context. This framework is based on extending Markov distributions and hyper Markov laws from random variables to random processes, providing a principled alternative to naive application of multivariate methods to discretized functional data. Markov properties facilitate the composition of likelihoods and priors according to the decomposition of a graph. Our focus is on Gaussian process graphical models using orthogonal basis expansions. We propose a hyper-inverse-Wishart-process prior for the covariance kernels of the infinite coefficient sequences of the basis expansion and establish its existence, uniqueness, strong hyper Markov property, and conjugacy. Stochastic search Markov chain Monte Carlo algorithms are developed for posterior inference, assessed through simulations, and applied to a study of brain activity and alcoholism.
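A stochastic search restricted to decomposable graphs needs some way to recognize them. Decomposable graphs are exactly the chordal graphs. One standard test, shown here as a sketch that is not taken from the paper, is maximum cardinality search (Tarjan and Yannakakis) followed by a perfect-elimination check. The function name and the adjacency format (a symmetric dict of neighbor sets) are assumptions.

```python
from typing import Dict, Hashable, Set

def is_decomposable(adj: Dict[Hashable, Set[Hashable]]) -> bool:
    """Chordality test: maximum cardinality search, then a perfect-elimination check."""
    pos: Dict[Hashable, int] = {}
    weight = {v: 0 for v in adj}  # number of already-visited neighbors
    while len(pos) < len(adj):
        # Visit the unvisited vertex with the most visited neighbors.
        v = max((u for u in adj if u not in pos), key=lambda u: weight[u])
        pos[v] = len(pos)
        for w in adj[v]:
            if w not in pos:
                weight[w] += 1
    for v in adj:
        earlier = [w for w in adj[v] if pos[w] < pos[v]]
        if earlier:
            # u is the most recently visited earlier neighbor of v; all other
            # earlier neighbors must be adjacent to u for the graph to be chordal.
            u = max(earlier, key=lambda w: pos[w])
            if any(w != u and w not in adj[u] for w in earlier):
                return False
    return True

four_cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(is_decomposable(four_cycle))  # False: a chordless 4-cycle is not decomposable
```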

Submitted 5 January, 2016; v1 submitted 15 November, 2014; originally announced November 2014.

arXiv:1410.0719

Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)

Authors: L. Jacques, C. De Vleeschouwer, Y. Boursier, P. Sudhakar, C. De Mol, A. Pizurica, S. Anthoine, P. Vandergheynst, P. Frossard, C. Bilen, S. Kitic, N. Bertin, R. Gribonval, N. Boumal, B. Mishra, P. -A. Absil, R. Sepulchre, S. Bundervoet, C. Schretter, A. Dooms, P. Schelkens, O. Chabiron, F. Malgouyres, J. -Y. Tourneret, N. Dobigeon, et al. (42 additional authors not shown)

Abstract: The implicit objective of the biennial "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For its second edition, the iTWIST workshop took place in the medieval and picturesque town of Namur, Belgium, from Wednesday, August 27th to Friday, August 29th, 2014. The workshop was conveniently located in "The Arsenal" building, within walking distance of both hotels and the town center. iTWIST'14 gathered about 70 international participants and featured 9 invited talks, 10 oral presentations, and 14 posters on the following themes, all related to the theory, application, and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing; Union of low dimensional subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph sensing/processing; Blind inverse problems and dictionary learning; Sparsity and computational neuroscience; Information theory, geometry and randomness; Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?; Sparse machine learning and inference.

Submitted 9 October, 2014; v1 submitted 2 October, 2014; originally announced October 2014.

Comments: 69 pages, 24 extended abstracts, iTWIST'14 website: http://sites.google.com/site/itwist14

arXiv:1406.0214

Topological and Statistical Behavior Classifiers for Tracking Applications

Authors: Paul Bendich, Sang Chin, Jesse Clarke, Jonathan deSena, John Harer, Elizabeth Munch, Andrew Newman, David Porter, David Rouse, Nate Strawn, Adam Watkins

Abstract: We introduce the first unified theory for target tracking using Multiple Hypothesis Tracking, Topological Data Analysis, and machine learning. Our innovations are as follows: (1) robust topological features are used to encode behavioral information, (2) statistical models are fitted to distributions over these topological features, and (3) the target type classification methods of Wigren and Bar Shalom et al. are employed to exploit the resulting likelihoods for topological features inside the tracking procedure. To demonstrate the efficacy of our approach, we test our procedure on synthetic vehicular data generated by the Simulation of Urban Mobility package.
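The abstract does not specify which topological features are used. To illustrate one common kind of robust topological feature, the sketch below computes 0-dimensional sublevel-set persistence pairs of a 1D signal, such as a hypothetical speed profile along a track. It uses union-find and the elder rule. The choice of feature and the function name are assumptions, not the authors' pipeline.

```python
import math
from typing import List, Sequence, Tuple

def sublevel_persistence(f: Sequence[float]) -> List[Tuple[float, float]]:
    """0-dimensional persistence pairs (birth, death) of the sublevel sets of a 1D signal."""
    n = len(f)
    parent = list(range(n))
    birth = list(f)  # birth value of the component rooted at each index
    added = [False] * n
    pairs: List[Tuple[float, float]] = []

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Sweep the threshold upward, adding samples in order of increasing value.
    for i in sorted(range(n), key=lambda k: f[k]):
        added[i] = True
        for j in (i - 1, i + 1):
            if 0 <= j < n and added[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component born later (higher birth value) dies here.
                if birth[ri] < birth[rj]:
                    ri, rj = rj, ri
                if f[i] > birth[ri]:  # skip zero-persistence pairs
                    pairs.append((birth[ri], f[i]))
                parent[ri] = rj
    pairs.append((min(f), math.inf))  # the oldest component never dies
    return pairs

print(sublevel_persistence([0, 3, 1, 4, 2]))  # [(1, 3), (2, 4), (0, inf)]
```

Each pair records a local minimum of the signal and the level at which its basin merges into an older one. Long-lived pairs are the robust features, and small perturbations of the signal only create short-lived pairs.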

Submitted 1 June, 2014; originally announced June 2014.

arXiv:1401.5833

Multiscale Dictionary Learning: Non-Asymptotic Bounds and Robustness

Authors: Mauro Maggioni, Stanislav Minsker, Nate Strawn

Abstract: High-dimensional datasets are well approximated by low-dimensional structures. Over the past decade, this empirical observation has motivated the investigation of detection, measurement, and modeling techniques that exploit these low-dimensional intrinsic structures, yielding numerous implications for high-dimensional statistics, machine learning, and signal processing. Manifold learning (where the low-dimensional structure is a manifold) and dictionary learning (where the low-dimensional structure is the set of sparse linear combinations of vectors from a finite dictionary) are two prominent theoretical and computational frameworks in this area. Despite their ostensible distinction, the recently introduced Geometric Multi-Resolution Analysis (GMRA) provides a robust, computationally efficient, multiscale procedure for simultaneously learning manifolds and dictionaries. In this work, we prove non-asymptotic probabilistic bounds on the approximation error of GMRA for a rich class of data-generating statistical models that includes "noisy" manifolds, thereby establishing the theoretical robustness of the procedure and confirming empirical observations. In particular, if a dataset aggregates near a low-dimensional manifold, our results show that the approximation error of GMRA is completely independent of the ambient dimension. Our work therefore establishes GMRA as a provably fast algorithm for dictionary learning with approximation and sparsity guarantees.
We include several numerical experiments confirming these theoretical results, and our theoretical framework provides new tools for assessing the behavior of manifold learning and dictionary learning procedures on a large class of interesting models.

Submitted 13 December, 2015; v1 submitted 22 January, 2014; originally announced January 2014.

Comments: This new version reorganizes the proofs and includes additional numerical experiments

arXiv:1311.4748

Connectivity and Irreducibility of Algebraic Varieties of Finite Unit Norm Tight Frames

Authors: Jameson Cahill, Dustin G. Mixon, Nate Strawn

Abstract: In this paper, we settle a long-standing problem on the connectivity of spaces of finite unit norm tight frames (FUNTFs), essentially affirming a conjecture first appearing in [Dykema and Strawn, 2003]. Our central technique involves continuous liftings of paths from the polytope of eigensteps to spaces of FUNTFs. After demonstrating this connectivity result, we refine our analysis to show that the set of nonsingular points on these spaces is also connected, and we use this result to show that spaces of FUNTFs are irreducible in the algebro-geometric sense, and also that generic FUNTFs are full spark.
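A finite unit norm tight frame (FUNTF) of N vectors in R^d consists of unit vectors whose frame operator $S = \sum_j x_j x_j^{T}$ equals $(N/d)\,I$. This holds because the trace of S is N. Below is a minimal checker for the real case (an assumption; over C one would use conjugate transposes). The Mercedes-Benz frame of three unit vectors spaced 120° apart in R^2 is a standard example. The function names are hypothetical.

```python
import math
from typing import List, Sequence

def frame_operator(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """S = sum_j x_j x_j^T for real frame vectors x_j, returned as a d x d list."""
    d = len(vectors[0])
    return [[sum(v[i] * v[j] for v in vectors) for j in range(d)] for i in range(d)]

def is_funtf(vectors: Sequence[Sequence[float]], tol: float = 1e-12) -> bool:
    """True if every vector has unit norm and S = (N/d) I, up to tol."""
    n, d = len(vectors), len(vectors[0])
    if any(abs(sum(c * c for c in v) - 1.0) > tol for v in vectors):
        return False
    S = frame_operator(vectors)
    return all(abs(S[i][j] - (n / d if i == j else 0.0)) <= tol
               for i in range(d) for j in range(d))

mercedes_benz = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3))
                 for k in range(3)]
print(is_funtf(mercedes_benz))  # True: S = (3/2) I
```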

Submitted 14 January, 2016; v1 submitted 19 November, 2013; originally announced November 2013.

Comments: 33 pages, 4 figures

arXiv:1207.4854

Finite sample posterior concentration in high-dimensional regression

Authors: Nate Strawn, Artin Armagan, Rayan Saab, Lawrence Carin, David Dunson

Abstract: We study the behavior of the posterior distribution in high-dimensional Bayesian Gaussian linear regression models with $p\gg n$, where $p$ is the number of predictors and $n$ the sample size. Our focus is on obtaining quantitative finite sample bounds ensuring that, with high probability, sufficient posterior probability is assigned to neighborhoods of the true regression coefficient vector $\beta^0$. We assume that $\beta^0$ is approximately $S$-sparse and obtain universal bounds, which provide insight into the role of the prior in controlling concentration of the posterior. Based on these finite sample bounds, we examine the implied asymptotic contraction rates for several examples, showing that sparsely-structured and heavy-tailed shrinkage priors exhibit rapid contraction rates. We also demonstrate that a stronger result holds for the Uniform-Gaussian prior, in which a binary vector of indicators $\gamma$ is drawn from the uniform distribution on the set of binary sequences with exactly $S$ ones, and then $\beta_i\sim\mathcal{N}(0,V^2)$ if $\gamma_i=1$ and $\beta_i=0$ if $\gamma_i=0$. These types of finite sample bounds provide guidelines for designing and evaluating priors for high-dimensional problems.
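The definition of the Uniform-Gaussian prior translates directly into a sampler. This is a sketch. The function name and the use of Python's `random` module are assumptions, and `gauss` takes the standard deviation V rather than the variance V².

```python
import random
from typing import List, Optional

def sample_uniform_gaussian(p: int, S: int, V: float,
                            rng: Optional[random.Random] = None) -> List[float]:
    """Draw beta in R^p from the Uniform-Gaussian prior (hypothetical helper name)."""
    rng = rng if rng is not None else random.Random()
    # gamma: a uniformly random binary vector with exactly S ones,
    # represented by the set of indices i where gamma_i = 1.
    support = set(rng.sample(range(p), S))
    # beta_i ~ N(0, V^2) where gamma_i = 1, and beta_i = 0 otherwise.
    return [rng.gauss(0.0, V) if i in support else 0.0 for i in range(p)]

beta = sample_uniform_gaussian(p=1000, S=10, V=2.0, rng=random.Random(0))
```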

Submitted 3 January, 2014; v1 submitted 20 July, 2012; originally announced July 2012.

arXiv:1107.2173

Constructing all self-adjoint matrices with prescribed spectrum and diagonal

Authors: Matthew Fickus, Dustin G. Mixon, Miriam J. Poteet, Nate Strawn

Abstract: The Schur-Horn Theorem states that there exists a self-adjoint matrix with a given spectrum and diagonal if and only if the spectrum majorizes the diagonal. Though the original proof of this result was nonconstructive, several constructive proofs have subsequently been found. Most of these constructive proofs rely on Givens rotations, and none have been shown to be able to produce every example of such a matrix. We introduce a new construction method that is able to do so. This method is based on recent advances in finite frame theory which show how to construct frames whose frame operator has a given prescribed spectrum and whose vectors have given prescribed lengths. This frame construction requires one to find a sequence of eigensteps, that is, a sequence of interlacing spectra that satisfy certain trace considerations. In this paper, we show how to explicitly construct every such sequence of eigensteps. Here, the key idea is to visualize eigenstep construction as iteratively building a staircase. This visualization leads to an algorithm, dubbed Top Kill, which produces a valid sequence of eigensteps whenever it is possible to do so. We then build on Top Kill to explicitly parametrize the set of all valid eigensteps. This yields an explicit method for constructing all self-adjoint matrices with a given spectrum and diagonal, and moreover all frames whose frame operator has a given spectrum and whose elements have given lengths.
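The existence criterion in the Schur-Horn Theorem is easy to test directly. A spectrum λ majorizes a diagonal d when, after both are sorted in non-increasing order, every partial sum of λ is at least the corresponding partial sum of d and the totals agree. This minimal sketch assumes real inputs of equal length and a small floating-point tolerance, and the function name is hypothetical. It only decides existence; it does not construct the matrix, which is what Top Kill and the eigenstep machinery address.

```python
from typing import Sequence

def majorizes(spectrum: Sequence[float], diagonal: Sequence[float],
              tol: float = 1e-12) -> bool:
    """True if `spectrum` majorizes `diagonal` (the Schur-Horn existence condition)."""
    if len(spectrum) != len(diagonal):
        raise ValueError("spectrum and diagonal must have the same length")
    lam = sorted(spectrum, reverse=True)
    d = sorted(diagonal, reverse=True)
    partial_lam = partial_d = 0.0
    for a, b in zip(lam, d):
        partial_lam += a
        partial_d += b
        if partial_lam < partial_d - tol:  # a partial-sum inequality fails
            return False
    return abs(partial_lam - partial_d) <= tol  # traces must agree

print(majorizes([3, 1], [2, 2]))  # True: some 2x2 self-adjoint matrix has both
```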

Submitted 11 July, 2011; originally announced July 2011.

MSC Class: 42C15

arXiv:1106.0921

Constructing finite frames of a given spectrum and set of lengths

Authors: Jameson Cahill, Matthew Fickus, Dustin G. Mixon, Miriam J. Poteet, Nathaniel K. Strawn

Abstract: When constructing finite frames for a given application, the most important consideration is the spectrum of the frame operator. Indeed, the minimum and maximum eigenvalues of the frame operator are the optimal frame bounds, and the frame is tight precisely when this spectrum is constant. Often, the second most important design consideration is the lengths of the frame vectors: Gabor, wavelet, equiangular, and Grassmannian frames are all special cases of equal norm frames, and unit norm tight frame-based encoding is known to be optimally robust against additive noise and erasures. We consider the problem of constructing frames whose frame operator has a given spectrum and whose vectors have prescribed lengths. For a given spectrum and set of lengths, the existence of such frames is characterized by the Schur-Horn Theorem: they exist if and only if the spectrum majorizes the squared lengths. The classical proof of this theorem is nonconstructive. Certain construction methods, such as harmonic frames and spectral tetris, are known in the special case of unit norm tight frames, but even these provide only a few examples from the manifold of all such frames, whose dimension is known and nontrivial. In this paper, we provide a new method for explicitly constructing any and all frames whose frame operator has a prescribed spectrum and whose vectors have prescribed lengths. The method itself has two parts. In the first part, one chooses eigensteps (a sequence of interlacing spectra) that transform the trivial spectrum into the desired one.
The second part is to explicitly compute the frame vectors in terms of these eigensteps; though nontrivial, this process is nevertheless straightforward enough to be implemented by hand, involving only arithmetic, square roots, and matrix multiplication.
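The eigenstep conditions can be made concrete as a checker. This minimal sketch assumes a specific convention: each spectrum is listed in non-increasing order, the zeroth spectrum is all zeros, the last one is the target spectrum, consecutive spectra interlace as $\lambda_{n;m} \ge \lambda_{n-1;m} \ge \lambda_{n;m+1}$ (as for a rank-one update), and the n-th spectrum sums to the first n squared lengths. It only validates candidate eigensteps and does not construct them or the frame vectors. The function name is hypothetical.

```python
from typing import Sequence

def is_valid_eigensteps(steps: Sequence[Sequence[float]], spectrum: Sequence[float],
                        lengths_sq: Sequence[float], tol: float = 1e-12) -> bool:
    """Check candidate eigensteps; steps[n] is the spectrum after the first n vectors."""
    M = len(spectrum)
    if len(steps) != len(lengths_sq) + 1 or any(len(row) != M for row in steps):
        return False
    # The zeroth spectrum is trivial and the last one is the target spectrum.
    if any(abs(x) > tol for x in steps[0]):
        return False
    target = sorted(spectrum, reverse=True)
    if any(abs(a - b) > tol for a, b in zip(steps[-1], target)):
        return False
    running = 0.0
    for n in range(1, len(steps)):
        prev, cur = steps[n - 1], steps[n]
        running += lengths_sq[n - 1]
        # Trace condition: the n-th spectrum sums to the first n squared lengths.
        if abs(sum(cur) - running) > tol:
            return False
        # Interlacing: cur[m] >= prev[m] >= cur[m + 1] for every m.
        for m in range(M):
            if cur[m] < prev[m] - tol:
                return False
            if m + 1 < M and prev[m] < cur[m + 1] - tol:
                return False
    return True

# Eigensteps for a unit norm tight frame of 3 vectors in R^2 (spectrum 3/2, 3/2).
steps = [[0, 0], [1, 0], [1.5, 0.5], [1.5, 1.5]]
print(is_valid_eigensteps(steps, [1.5, 1.5], [1, 1, 1]))  # True
```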

Submitted 5 June, 2011; originally announced June 2011.

arXiv:1104.4135

doi 10.1093/biomet/ast028

Posterior consistency in linear models under shrinkage priors

Authors: Artin Armagan, David B. Dunson, Jaeyong Lee, Waheed U. Bajwa, Nate Strawn

Abstract: We investigate the asymptotic behavior of posterior distributions of regression coefficients in high-dimensional linear models as the number of dimensions grows with the number of observations. We show that the posterior distribution concentrates in neighborhoods of the true parameter under simple sufficient conditions. These conditions hold under popular shrinkage priors given some sparsity assumptions.

Submitted 19 May, 2013; v1 submitted 20 April, 2011; originally announced April 2011.

Comments: To appear in Biometrika

Journal ref: Biometrika, vol. 100, no. 4, pp. 1011-1018, Dec. 2013

arXiv:math/0307367

Manifold structure of spaces of spherical tight frames

Authors: Ken Dykema, Nate Strawn

Abstract: We consider the space F^E_{k,n} of all spherical tight frames of k vectors in real or complex n-dimensional Hilbert space E^n, i.e. E=R or E=C, and its orbit space G^E_{k,n}=F^E_{k,n}/O^E_n under the obvious action of the group O^E_n of structure-preserving transformations of E^n. We show that the quotient map F^E_{k,n} -> G^E_{k,n} is a locally trivial fiber bundle (also in the more general case of ellipsoidal tight frames) and that there is a homeomorphism G^E_{k,n} -> G^E_{k,k-n}. We show that G^E_{k,n} and F^E_{k,n} are real manifolds whenever k and n are relatively prime, and we describe them as disjoint unions of finitely many manifolds (of various dimensions) when k and n have a common divisor. We also prove that F^R_{k,2} is connected (k >= 4) and that F^R_{n+2,n} is connected (n >= 2). The spaces G^R_{4,2} and G^R_{5,2} are investigated in detail. The former is found to be a graph and the latter is the orientable surface of genus 25.

Submitted 27 September, 2003; v1 submitted 28 July, 2003; originally announced July 2003.

Comments: The new version corrects some typographical errors, including a misleading error in the abstract: we show connectedness of F^R_{k,2}, not of more general F^R_{k,n}

MSC Class: 42C15; 94A12; 14P05

Showing 1–14 of 14 results for author: Strawn, N