
Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

Authors: Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

Abstract: The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration until fitted models become useless. However, those studies largely assumed that new data replace old data over time, whereas an arguably more realistic assumption is that data accumulate over time. In this paper, we ask: what effect does accumulating data have on model collapse? We empirically study this question by pretraining sequences of language models on text corpora. We confirm that replacing the original real data by each generation's synthetic data does indeed tend towards model collapse, then demonstrate that accumulating the successive generations of synthetic data alongside the original real data avoids model collapse; these results hold across a range of model sizes, architectures, and hyperparameters. We obtain similar results for deep generative models on other types of real data: diffusion models for molecule conformation generation and variational autoencoders for image generation. To understand why accumulating data can avoid model collapse, we use an analytically tractable framework introduced by prior work in which a sequence of linear models is fit to the previous models' outputs. Previous work used this framework to show that if data are replaced, the test error increases with the number of model-fitting iterations; we extend this argument to prove that if data instead accumulate, the test error has a finite upper bound independent of the number of iterations, meaning model collapse no longer occurs.
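The replace-versus-accumulate contrast can be illustrated with a toy version of the linear-model framework mentioned above. This is a minimal sketch under arbitrary assumptions, not the paper's experiments: a scalar slope fit by least squares through the origin, standard Gaussian inputs, a fixed noise level, and fresh inputs at every generation whose labels come from the previous fitted model. The names `fit_slope`, `run_chain` and `mean_error` are hypothetical.

```python
import random

def fit_slope(xs, ys):
    """Least-squares slope of a line through the origin: sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def run_chain(mode, generations=20, n=50, w_true=2.0, sigma=1.0, rng=None):
    """Squared error of the last fitted slope after a model-data feedback loop.

    mode="replace":    each model sees only the newest generation of data.
    mode="accumulate": each model sees the real data plus every synthetic generation so far.
    """
    if rng is None:
        rng = random.Random(0)
    pool_x, pool_y = [], []
    w_label = w_true  # generation 0 is labelled by the true model, i.e. it is real data
    for _ in range(generations + 1):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [w_label * x + rng.gauss(0.0, sigma) for x in xs]
        if mode == "replace":
            pool_x, pool_y = xs, ys
        elif mode == "accumulate":
            pool_x, pool_y = pool_x + xs, pool_y + ys
        else:
            raise ValueError(f"unknown mode: {mode}")
        w_label = fit_slope(pool_x, pool_y)  # this fit labels the next generation
    return (w_label - w_true) ** 2

def mean_error(mode, trials=200, seed=0):
    """Average final squared error over independent feedback loops."""
    rng = random.Random(seed)
    return sum(run_chain(mode, rng=rng) for _ in range(trials)) / trials
```

In this toy setting the replace error should grow roughly linearly with `generations` (each refit adds fresh estimation noise on top of the previous model's error), while the accumulate error stays bounded because later generations are averaged together with the real data.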

Submitted 29 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

arXiv:2106.02073 [pdf, other]

Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path

Authors: X. Y. Han, Vardan Papyan, David L. Donoho

Abstract: The recently discovered Neural Collapse (NC) phenomenon occurs pervasively in today's deep net training paradigm of driving cross-entropy (CE) loss towards zero. During NC, last-layer features collapse to their class-means, both classifiers and class-means collapse to the same Simplex Equiangular Tight Frame, and classifier behavior collapses to the nearest-class-mean decision rule. Recent works demonstrated that deep nets trained with mean squared error (MSE) loss perform comparably to those trained with CE. As a preliminary, we empirically establish that NC emerges in such MSE-trained deep nets as well through experiments on three canonical networks and five benchmark datasets. We provide, in a Google Colab notebook, PyTorch code for reproducing MSE-NC and CE-NC at https://colab.research.google.com/github/neuralcollapse/neuralcollapse/blob/main/neuralcollapse.ipynb. The analytically-tractable MSE loss offers more mathematical opportunities than the hard-to-analyze CE loss, inspiring us to leverage MSE loss towards the theoretical investigation of NC. We develop three main contributions: (I) We show a new decomposition of the MSE loss into (A) terms directly interpretable through the lens of NC and which assume the last-layer classifier is exactly the least-squares classifier; and (B) a term capturing the deviation from this least-squares classifier. (II) We exhibit experiments on canonical datasets and networks demonstrating that term-(B) is negligible during training. This motivates us to introduce a new theoretical construct: the central path, where the linear classifier stays MSE-optimal for feature activations throughout the dynamics. (III) By studying renormalized gradient flow along the central path, we derive exact dynamics that predict NC.

Submitted 9 May, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

Comments: ICLR 2022 Outstanding Paper Prize & Oral. Appendix contains [A] empirical experiments, [B-D] proofs of theoretical results, and [E] survey of related works examining Neural Collapse

arXiv:2008.08186 [pdf, other]

doi 10.1073/pnas.2015509117

Prevalence of Neural Collapse during the terminal phase of deep learning training

Authors: Vardan Papyan, X. Y. Han, David L. Donoho

Abstract: Modern practice for training classification deepnets involves a Terminal Phase of Training (TPT), which begins at the epoch where training error first vanishes; during TPT, the training error stays effectively zero while training loss is pushed towards zero. Direct measurements of TPT, for three prototypical deepnet architectures and across seven canonical classification datasets, expose a pervasive inductive bias we call Neural Collapse, involving four deeply interconnected phenomena: (NC1) Cross-example within-class variability of last-layer training activations collapses to zero, as the individual activations themselves collapse to their class-means; (NC2) The class-means collapse to the vertices of a Simplex Equiangular Tight Frame (ETF); (NC3) Up to rescaling, the last-layer classifiers collapse to the class-means, or in other words to the Simplex ETF, i.e. to a self-dual configuration; (NC4) For a given activation, the classifier's decision collapses to simply choosing whichever class has the closest train class-mean, i.e. the Nearest Class Center (NCC) decision rule. The symmetric and very simple geometry induced by the TPT confers important benefits, including better generalization performance, better robustness, and better interpretability.
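The geometry named in NC2 and NC4 can be made concrete with the standard Simplex ETF construction. This is an illustrative sketch, not code from the paper: the columns of sqrt(C/(C-1)) (I - 11^T/C) give C unit vectors in R^C that sum to zero and have pairwise cosine -1/(C-1). The helper names `simplex_etf`, `dot` and `nearest_class_center` are hypothetical; real class-means live in a higher-dimensional feature space and match this frame only up to rotation and rescaling.

```python
import math

def simplex_etf(C):
    """Vertex j is column j of sqrt(C/(C-1)) * (I - 11^T / C)."""
    scale = math.sqrt(C / (C - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / C) for i in range(C)]
            for j in range(C)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def nearest_class_center(h, class_means):
    """NCC decision rule (NC4): index of the class mean closest to activation h."""
    return min(range(len(class_means)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(h, class_means[c])))

etf = simplex_etf(4)
print(round(dot(etf[0], etf[0]), 6), round(dot(etf[0], etf[1]), 6))  # unit norm, cosine -1/3
```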

Submitted 21 August, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

arXiv:1901.08705 [pdf, other]

Ambitious Data Science Can Be Painless

Authors: Hatef Monajemi, Riccardo Murri, Eric Jonas, Percy Liang, Victoria Stodden, David L. Donoho

Abstract: Modern data science research can involve massive computational experimentation; an ambitious PhD in computational fields may do experiments consuming several million CPU hours. Traditional computing practices, in which researchers use laptops or shared campus-resident resources, are inadequate for experiments at the massive scale and varied scope that we now see in data science. On the other hand, modern cloud computing promises seemingly unlimited computational resources that can be custom configured, and seems to offer a powerful new venue for ambitious data-driven science. By exploiting the cloud fully, researchers could expand the amount of work completed in a fixed amount of time by several orders of magnitude. As potentially powerful as cloud-based experimentation may be in the abstract, it has not yet become a standard option for researchers in many academic disciplines. The prospect of actually conducting massive computational experiments in today's cloud systems confronts the potential user with daunting challenges. Leading considerations include: (i) the seeming complexity of today's cloud computing interface, (ii) the difficulty of executing an overwhelmingly large number of jobs, and (iii) the difficulty of monitoring and combining a massive collection of separate results. Starting a massive experiment 'bare-handed' therefore seems highly problematic and prone to rapid 'researcher burnout'. New software stacks are emerging that render massive cloud experiments relatively painless. Such stacks simplify experimentation by systematizing experiment definition, automating distribution and management of tasks, and allowing easy harvesting of results and documentation. In this article, we discuss several painless computing stacks that abstract away the difficulties of massive experimentation, thereby allowing a proliferation of ambitious experiments for scientific discovery.
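The three chores listed above (experiment definition, task distribution, result harvesting) can be sketched locally in a few lines. In this minimal sketch a thread pool stands in for a cloud cluster; the parameter grid, `expand_grid`, `run_job` and `run_experiment` are hypothetical and do not reflect the interface of any stack discussed in the article.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

def expand_grid(grid):
    """Experiment definition: turn {param: [values]} into one job dict per combination."""
    keys = sorted(grid)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(grid[key] for key in keys))]

def run_job(params):
    """Hypothetical stand-in for one experiment; a real job would train or simulate something."""
    return {**params, "result": params["n"] * params["delta"]}

def run_experiment(grid, max_workers=4):
    """Distribute jobs to a local worker pool and harvest all results as one list of records."""
    jobs = expand_grid(grid)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, jobs))

for record in run_experiment({"n": [100, 200], "delta": [0.1, 0.5]}):
    print(record)
```

For CPU-bound jobs on one machine, `ProcessPoolExecutor` is the drop-in alternative; spreading jobs across many machines is the part the stacks in the article automate.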

Submitted 24 January, 2019; originally announced January 2019.

Comments: Submitted to Harvard Data Science Review

arXiv:1702.03062 [pdf, other]

Sparsity/Undersampling Tradeoffs in Anisotropic Undersampling, with Applications in MR Imaging/Spectroscopy

Authors: Hatef Monajemi, David L. Donoho

Abstract: We study anisotropic undersampling schemes like those used in multi-dimensional NMR spectroscopy and MR imaging, which sample exhaustively in certain time dimensions and randomly in others. Our analysis shows that anisotropic undersampling schemes are equivalent to certain block-diagonal measurement systems. We develop novel exact formulas for the sparsity/undersampling tradeoffs in such measurement systems. Our formulas predict finite-N phase transition behavior differing substantially from the well-known asymptotic phase transitions for classical Gaussian undersampling. Extensive empirical work shows that our formulas accurately describe observed finite-N behavior, while the usual formulas based on universality are substantially inaccurate. We also vary the anisotropy, keeping the total number of samples fixed, and for each variation we determine the precise sparsity/undersampling tradeoff (phase transition). We show that, other things being equal, the ability to recover a sparse object decreases with an increasing number of exhaustively-sampled dimensions.

Submitted 16 March, 2018; v1 submitted 9 February, 2017; originally announced February 2017.

arXiv:1302.2331 [pdf, other]

doi 10.1073/pnas.1306110110

The Phase Transition of Matrix Recovery from Gaussian Measurements Matches the Minimax MSE of Matrix Denoising

Authors: David L. Donoho, Matan Gavish, Andrea Montanari

Abstract: Let $X_0$ be an unknown $M$ by $N$ matrix. In matrix recovery, one takes $n < MN$ linear measurements $y_1, \ldots, y_n$ of $X_0$, where $y_i = \operatorname{Tr}(a_i^T X_0)$ and each $a_i$ is an $M$ by $N$ matrix. For measurement matrices with Gaussian i.i.d. entries, it is known that if $X_0$ is of low rank, it is recoverable from just a few measurements. A popular approach for matrix recovery is Nuclear Norm Minimization (NNM). Empirical work reveals a \emph{phase transition} curve, stated in terms of the undersampling fraction $δ(n,M,N) = n/(MN)$, rank fraction $ρ = r/N$ and aspect ratio $β = M/N$. Specifically, a curve $δ^* = δ^*(ρ;β)$ exists such that, if $δ > δ^*(ρ;β)$, NNM typically succeeds, while if $δ < δ^*(ρ;β)$, it typically fails. An apparently quite different problem is matrix denoising in Gaussian noise, where an unknown $M$ by $N$ matrix $X_0$ is to be estimated based on direct noisy measurements $Y = X_0 + Z$, where the matrix $Z$ has i.i.d. Gaussian entries. It has been empirically observed that, if $X_0$ has low rank, it may be recovered quite accurately from the noisy measurement $Y$. A popular matrix denoising scheme solves the unconstrained optimization problem $\min_X \|Y - X\|_F^2/2 + λ\|X\|_*$. When optimally tuned, this scheme achieves the asymptotic minimax MSE $\mathcal{M}(ρ) = \lim_{N \to \infty} \inf_λ \sup_{\operatorname{rank}(X) \leq ρ N} \mathrm{MSE}(X, \hat{X}_λ)$. We report extensive experiments showing that the phase transition $δ^*(ρ)$ in the first problem coincides with the minimax risk curve $\mathcal{M}(ρ)$ in the second problem, for \emph{any} rank fraction $0 < ρ < 1$.
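The penalized denoising problem above has a well-known closed-form minimizer: compute the SVD of Y and soft-threshold its singular values at λ. A minimal NumPy sketch follows; the function name `svt` and the choice λ = σ(√M + √N) in the usage lines are illustrative assumptions, not the minimax tuning studied in the paper.

```python
import numpy as np

def svt(Y, lam):
    """Minimizer of 0.5*||Y - X||_F^2 + lam*||X||_* via singular value soft thresholding."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)   # soft-threshold each singular value
    return (U * s_shrunk) @ Vt            # rescale columns of U, then recombine

rng = np.random.default_rng(0)
M, N, r, sigma = 60, 40, 3, 1.0
X0 = rng.standard_normal((M, r)) @ rng.standard_normal((r, N))   # rank-r signal
Y = X0 + sigma * rng.standard_normal((M, N))                      # noisy observation
X_hat = svt(Y, lam=sigma * (np.sqrt(M) + np.sqrt(N)))
print(np.linalg.norm(X_hat - X0), np.linalg.norm(Y - X0))
```

The threshold σ(√M + √N) sits near the top singular value of a pure-noise M by N Gaussian matrix, so most noise directions are zeroed while the large signal singular values survive, shrunk by λ.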

Submitted 10 February, 2013; originally announced February 2013.

arXiv:1112.0708 [pdf, other]

Information-Theoretically Optimal Compressed Sensing via Spatial Coupling and Approximate Message Passing

Authors: David L. Donoho, Adel Javanmard, Andrea Montanari

Abstract: We study the compressed sensing reconstruction problem for a broad class of random, band-diagonal sensing matrices. This construction is inspired by the idea of spatial coupling in coding theory. As demonstrated heuristically and numerically by Krzakala et al. \cite{KrzakalaEtAl}, message passing algorithms can effectively solve the reconstruction problem for spatially coupled measurements with undersampling rates close to the fraction of non-zero coordinates. We use an approximate message passing (AMP) algorithm and analyze it through the state evolution method. We give a rigorous proof that this approach is successful as soon as the undersampling rate $δ$ exceeds the (upper) Rényi information dimension of the signal, $\overline{d}(p_X)$. More precisely, for a sequence of signals of diverging dimension $n$ whose empirical distribution converges to $p_X$, reconstruction is with high probability successful from $\overline{d}(p_X)\, n+o(n)$ measurements taken according to a band-diagonal matrix. For sparse signals, i.e., sequences of dimension $n$ and $k(n)$ non-zero entries, this implies reconstruction from $k(n)+o(n)$ measurements. For 'discrete' signals, i.e., signals whose coordinates take a fixed finite set of values, this implies reconstruction from $o(n)$ measurements. The result is robust with respect to noise and is not restricted to random signals, but requires knowledge of the empirical distribution of the signal $p_X$.

Submitted 18 January, 2013; v1 submitted 3 December, 2011; originally announced December 2011.

Comments: 60 pages, 7 figures, Sections 3,5 and Appendices A,B are added. The stability constant is quantified (cf Theorem 1.7)

arXiv:1004.3006 [pdf, ps, other]

Microlocal Analysis of the Geometric Separation Problem

Authors: David L. Donoho, Gitta Kutyniok

Abstract: Image data are often composed of two or more geometrically distinct constituents; in galaxy catalogs, for instance, one sees a mixture of pointlike structures (galaxy superclusters) and curvelike structures (filaments). It would be ideal to process a single image and extract two geometrically 'pure' images, each one containing features from only one of the two geometric constituents. This seems to be a seriously underdetermined problem, but recent empirical work achieved highly persuasive separations. We present a theoretical analysis showing that accurate geometric separation of point and curve singularities can be achieved by minimizing the $\ell_1$ norm of the representing coefficients in two geometrically complementary frames: wavelets and curvelets. Driving our analysis is a specific property of the ideal (but unachievable) representation where each content type is expanded in the frame best adapted to it. This ideal representation has the property that important coefficients are clustered geometrically in phase space, and that at fine scales, there is very little coherence between a cluster of elements in one frame expansion and individual elements in the complementary frame. We formally introduce notions of cluster coherence and clustered sparsity and use this machinery to show that the underdetermined systems of linear equations can be stably solved by $\ell_1$ minimization; microlocal phase space helps organize the calculations that cluster coherence requires.

Submitted 18 April, 2010; originally announced April 2010.

Comments: 59 pages, 9 figures

Report number: Technical Report No. 2010-01, Statistics Department, Stanford University

arXiv:1004.1218 [pdf, other]

The Noise-Sensitivity Phase Transition in Compressed Sensing

Authors: David L. Donoho, Arian Maleki, Andrea Montanari

Abstract: Consider the noisy underdetermined system of linear equations: y = Ax0 + z0, with n x N measurement matrix A, n < N, and Gaussian white noise z0 ~ N(0, σ^2 I). Both y and A are known, both x0 and z0 are unknown, and we seek an approximation to x0. When x0 has few nonzeros, useful approximations are obtained by l1-penalized l2 minimization, in which the reconstruction x̂_λ solves min ||y - Ax||^2/2 + λ||x||_1. Evaluate performance by mean-squared error (MSE = E ||x̂_λ - x0||_2^2/N). Consider matrices A with iid Gaussian entries and a large-system limit in which n, N → ∞ with n/N → δ and k/n → ρ. Call the ratio MSE/σ^2 the noise sensitivity. We develop formal expressions for the MSE of x̂_λ, and evaluate its worst-case formal noise sensitivity over all types of k-sparse signals. The phase space 0 < δ, ρ < 1 is partitioned by the curve ρ = ρ_MSE(δ) into two regions. Formal noise sensitivity is bounded throughout the region ρ < ρ_MSE(δ) and is unbounded throughout the region ρ > ρ_MSE(δ). The phase boundary ρ = ρ_MSE(δ) is identical to the previously-known phase transition curve for equivalence of l1 - l0 minimization in the k-sparse noiseless case. Hence a single phase boundary describes the fundamental phase transitions both for the noiseless and noisy cases. Extensive computational experiments validate the predictions of this formalism, including the existence of game theoretical structures underlying it. Underlying our formalism is the AMP algorithm introduced earlier by the authors. Other papers by the authors detail expressions for the formal MSE of AMP and its close connection to l1-penalized reconstruction. Here we derive the minimax formal MSE of AMP and then read out results for l1-penalized reconstruction.
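The phase boundary described above coincides with the l1/l0 equivalence curve, which can be evaluated numerically. Assuming the standard state-evolution characterization for soft thresholding with nonzeros of either sign (restated from the message-passing literature, not from this abstract), the curve is ρ(δ) = max over z ≥ 0 of [1 - (2/δ)M(z)] / [1 + z^2 - 2M(z)], with M(z) = (1 + z^2)Φ(-z) - zφ(z). The grid search below is a sketch; `rho_phase_boundary` is a hypothetical name, and the maximizing z is the optimal threshold in units of the effective noise level.

```python
import math

def gauss_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def gauss_tail(z):
    """Phi(-z) = P(N(0,1) > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def rho_phase_boundary(delta, z_max=6.0, steps=6000):
    """Grid search of max_{z>0} (1 - (2/delta) M(z)) / (1 + z^2 - 2 M(z))."""
    best = -math.inf
    for i in range(1, steps + 1):          # skip z = 0, where the denominator vanishes
        z = z_max * i / steps
        m = (1.0 + z * z) * gauss_tail(z) - z * gauss_pdf(z)
        best = max(best, (1.0 - 2.0 * m / delta) / (1.0 + z * z - 2.0 * m))
    return best

for delta in (0.1, 0.25, 0.5, 0.75):
    print(delta, round(rho_phase_boundary(delta), 4))
```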

Submitted 7 April, 2010; originally announced April 2010.

Comments: 40 pages, 13 pdf figures

arXiv:0911.4222 [pdf, other]

Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation

Authors: David L. Donoho, Arian Maleki, Andrea Montanari

Abstract: In a recent paper, the authors proposed a new class of low-complexity iterative thresholding algorithms for reconstructing sparse signals from a small set of linear measurements \cite{DMM}. The new algorithms are broadly referred to as AMP, for approximate message passing. This is the second of two conference papers describing the derivation of these algorithms, connection with related literature, extensions of the original framework, and new empirical evidence. This paper describes the state evolution formalism for analyzing these algorithms, and some of the conclusions that can be drawn from this formalism. We carried out extensive numerical simulations to confirm these predictions. We present here a few representative results.

Submitted 21 November, 2009; originally announced November 2009.

Comments: 5 pages, 3 pdf figures, IEEE Information Theory Workshop, Cairo 2010

arXiv:0911.4219 [pdf, ps, other]

Message Passing Algorithms for Compressed Sensing: I. Motivation and Construction

Authors: David L. Donoho, Arian Maleki, Andrea Montanari

Abstract: In a recent paper, the authors proposed a new class of low-complexity iterative thresholding algorithms for reconstructing sparse signals from a small set of linear measurements \cite{DMM}. The new algorithms are broadly referred to as AMP, for approximate message passing. This is the first of two conference papers describing the derivation of these algorithms, connection with the related literature, extensions of the original framework, and new empirical evidence. In particular, the present paper outlines the derivation of AMP from standard sum-product belief propagation, and its extension in several directions. We also discuss relations with formal calculations based on statistical mechanics methods.

Submitted 21 November, 2009; originally announced November 2009.

Comments: 5 pages, IEEE Information Theory Workshop, Cairo 2010

arXiv:0909.0777 [pdf, other]

doi 10.1109/JSTSP.2009.2039176

Optimally Tuned Iterative Reconstruction Algorithms for Compressed Sensing

Authors: Arian Maleki, David L. Donoho

Abstract: We conducted an extensive computational experiment, lasting multiple CPU-years, to optimally select parameters for two important classes of algorithms for finding sparse solutions of underdetermined systems of linear equations. We make the optimally tuned implementations available at sparselab.stanford.edu; they run 'out of the box' with no user tuning: it is not necessary to select thresholds or know the likely degree of sparsity. Our class of algorithms includes iterative hard and soft thresholding with or without relaxation, as well as CoSaMP, subspace pursuit and some natural extensions. As a result, our optimally tuned algorithms dominate such proposals. Our notion of optimality is defined in terms of phase transitions, i.e. we maximize the number of nonzeros at which the algorithm can successfully operate. We show that the phase transition is a well-defined quantity with our suite of random underdetermined linear systems. Our tuning gives the highest transition possible within each class of algorithms.
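As a concrete instance of one algorithm class named above, iterative hard thresholding takes a gradient step on ||y - Ax||^2 and then keeps only the k largest entries. A minimal NumPy sketch, assuming the sparsity k is known and using the conservative step size μ = 1/||A||_2^2, under which the residual should never increase; this is not one of the tuned implementations from sparselab.stanford.edu, and `hard_threshold` and `iht` are hypothetical names.

```python
import numpy as np

def hard_threshold(u, k):
    """Keep the k entries of u with largest magnitude and zero the rest."""
    out = np.zeros_like(u)
    idx = np.argsort(np.abs(u))[-k:]
    out[idx] = u[idx]
    return out

def iht(y, A, k, iters=300, mu=None):
    """Iterative hard thresholding; returns the estimate and the residual norm per iteration."""
    if mu is None:
        mu = 1.0 / np.linalg.norm(A, 2) ** 2   # step size: inverse squared spectral norm
    x = np.zeros(A.shape[1])
    residual_norms = []
    for _ in range(iters):
        x = hard_threshold(x + mu * A.T @ (y - A @ x), k)
        residual_norms.append(np.linalg.norm(y - A @ x))
    return x, residual_norms

rng = np.random.default_rng(1)
n, N, k = 200, 400, 10
A = rng.standard_normal((n, N)) / np.sqrt(n)
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
x, res = iht(A @ x0, A, k)
print(res[0], res[-1])
```

The step size μ and the number of iterations are exactly the kind of parameters the paper tunes; the conservative choice here trades speed for a guaranteed non-increasing residual.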

Submitted 3 September, 2009; originally announced September 2009.

Comments: 12 pages, 14 figures

arXiv:0907.3574 [pdf, ps, other]

doi 10.1073/pnas.0909892106

Message Passing Algorithms for Compressed Sensing

Authors: David L. Donoho, Arian Maleki, Andrea Montanari

Abstract: Compressed sensing aims to undersample certain high-dimensional signals, yet accurately reconstruct them by exploiting signal characteristics. Accurate reconstruction is possible when the object to be recovered is sufficiently sparse in a known basis. Currently, the best known sparsity-undersampling tradeoff is achieved when reconstructing by convex optimization, which is expensive in important large-scale applications. Fast iterative thresholding algorithms have been intensively studied as alternatives to convex optimization for large-scale problems. Unfortunately, known fast algorithms offer substantially worse sparsity-undersampling tradeoffs than convex optimization. We introduce a simple costless modification to iterative thresholding making the sparsity-undersampling tradeoff of the new algorithms equivalent to that of the corresponding convex optimization procedures. The new iterative-thresholding algorithms are inspired by belief propagation in graphical models. Our empirical measurements of the sparsity-undersampling tradeoff for the new algorithms agree with theoretical calculations. We show that a state evolution formalism correctly derives the true sparsity-undersampling tradeoff. There is a surprising agreement between earlier calculations based on random convex polytopes and this new, apparently very different theoretical formalism.
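The "simple costless modification" is an extra (Onsager) term in the residual of iterative soft thresholding: z^t = y - Ax^t + (1/δ) z^(t-1) ⟨η'(x^(t-1) + A^T z^(t-1))⟩, with δ = n/N and ⟨·⟩ the average over coordinates. A minimal NumPy sketch of the resulting AMP iteration, assuming a Gaussian matrix with N(0, 1/n) entries, a noiseless sparse signal, and a threshold set to a fixed multiple `alpha` of the estimated noise level ||z||/√n; the names `soft_threshold` and `amp` and the value alpha = 1.5 are illustrative choices, not the paper's tuned parameters.

```python
import numpy as np

def soft_threshold(u, theta):
    return np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)

def amp(y, A, iters=100, alpha=1.5):
    """Approximate message passing with soft thresholding."""
    n, N = A.shape
    delta = n / N
    x = np.zeros(N)
    z = y.copy()
    for _ in range(iters):
        theta = alpha * np.linalg.norm(z) / np.sqrt(n)   # threshold from estimated noise level
        pseudo = x + A.T @ z                             # pseudo-data x^(t-1) + A^T z^(t-1)
        x = soft_threshold(pseudo, theta)
        # Onsager term: previous residual times the average derivative of the threshold map
        onsager = (z / delta) * np.mean(np.abs(pseudo) > theta)
        z = y - A @ x + onsager
    return x

rng = np.random.default_rng(0)
n, N, k = 300, 600, 20
A = rng.standard_normal((n, N)) / np.sqrt(n)
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
x_hat = amp(A @ x0, A)
print(np.linalg.norm(x_hat - x0) / np.linalg.norm(x0))
```

Dropping the `onsager` term turns this loop back into plain iterative soft thresholding with the same thresholds, which makes the two easy to compare side by side.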

Submitted 21 July, 2009; originally announced July 2009.

Comments: 6 pages paper + 9 pages supplementary information, 13 eps figures. Submitted to Proc. Natl. Acad. Sci. USA

arXiv:0906.2530 [pdf, other]

doi 10.1098/rsta.2009.0152

Observed Universality of Phase Transitions in High-Dimensional Geometry, with Implications for Modern Data Analysis and Signal Processing

Authors: David L. Donoho, Jared Tanner

Abstract: We review connections between phase transitions in high-dimensional combinatorial geometry and phase transitions occurring in modern high-dimensional data analysis and signal processing. In data analysis, such transitions arise as abrupt breakdown of linear model selection, robust data fitting or compressed sensing reconstructions, when the complexity of the model or the number of outliers increases beyond a threshold. In combinatorial geometry these transitions appear as abrupt changes in the properties of face counts of convex polytopes when the dimensions are varied. The thresholds in these very different problems appear in the same critical locations after appropriate calibration of variables. These thresholds are important in each subject area: for linear modelling, they place hard limits on the degree to which the now-ubiquitous high-throughput data analysis can be successful; for robustness, they place hard limits on the degree to which standard robust fitting methods can tolerate outliers before breaking down; for compressed sensing, they define the sharp boundary of the undersampling/sparsity tradeoff in undersampling theorems. Existing derivations of phase transitions in combinatorial geometry assume the underlying matrices have independent and identically distributed (iid) Gaussian elements. In applications, however, it often seems that Gaussianity is not required. We conducted an extensive computational experiment and formal inferential analysis to test the hypothesis that these phase transitions are universal across a range of underlying matrix ensembles. The experimental results are consistent with an asymptotic large-$n$ universality across matrix ensembles; finite-sample universality can be rejected.

Submitted 14 June, 2009; originally announced June 2009.

Comments: 47 pages, 24 figures, 10 tables

arXiv:0807.3590 [pdf, ps, other]

Counting the Faces of Randomly-Projected Hypercubes and Orthants, with Applications

Authors: David L. Donoho, Jared Tanner

Abstract: Let $A$ be an $n$ by $N$ real-valued random matrix, and let $\mathcal{H}$ denote the $N$-dimensional hypercube. For numerous random matrix ensembles, the expected number of $k$-dimensional faces of the random $n$-dimensional zonotope $A\mathcal{H}$ obeys the formula $E f_k(A\mathcal{H})/f_k(\mathcal{H}) = 1 - P_{N-n,N-k}$, where $P_{N-n,N-k}$ is a fair-coin-tossing probability. The formula applies, for example, where the columns of $A$ are drawn i.i.d. from an absolutely continuous symmetric distribution. The formula exploits Wendel's Theorem \cite{We62}. Let $\mathbb{R}^N_+$ denote the positive orthant; the expected number of $k$-faces of the random cone $A\mathbb{R}^N_+$ obeys $E f_k(A\mathbb{R}^N_+)/f_k(\mathbb{R}^N_+) = 1 - P_{N-n,N-k}$. The formula applies to numerous matrix ensembles, including those with iid random columns from an absolutely continuous, centrally symmetric distribution. There is an asymptotically sharp threshold in the behavior of face counts of the projected hypercube; the thresholds known for projecting the simplex and the cross-polytope occur at very different locations. We briefly consider face counts of the projected orthant when $A$ does not have mean zero; these do behave similarly to those for the projected simplex. We consider non-random projectors of the orthant; the 'best possible' $A$ is the one associated with the first $n$ rows of the Fourier matrix. These geometric face-counting results have implications for signal processing, information theory, inverse problems, and optimization. Most of these flow in some way from the fact that face counting is related to conditions for uniqueness of solutions of underdetermined systems of linear equations.
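The abstract leaves the fair-coin-tossing probability P_{N-n,N-k} implicit. The sketch below assumes the convention P_{m,M} = 2^-(M-1) Σ_{l<m} C(M-1, l), the probability of fewer than m heads in M-1 fair tosses, chosen because it reproduces hand-checkable cases: a generic projection of the 3-cube to the plane is a hexagon, keeping 6 of 8 vertices and 6 of 12 edges. It then compares the vertex prediction against a brute-force convex hull of random 2 by N Gaussian projections. All function names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import product
from math import comb

def coin_prob(m, M):
    """Assumed convention: P_{m,M} = P(fewer than m heads in M-1 fair tosses)."""
    return Fraction(sum(comb(M - 1, l) for l in range(m)), 2 ** (M - 1))

def expected_face_fraction(n, N, k):
    """E f_k(AH) / f_k(H) = 1 - P_{N-n, N-k} for the N-dimensional hypercube H."""
    return 1 - coin_prob(N - n, N - k)

def hull_vertex_count(points):
    """Number of convex-hull vertices of 2-D points (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return len(pts)

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return len(lower) + len(upper) - 2

def projected_cube_vertices(N, rng):
    """Brute force: vertices of A[0,1]^N for a random 2 x N Gaussian matrix A."""
    cols = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(N)]
    points = []
    for bits in product((0, 1), repeat=N):   # the 2^N vertices of the cube
        px = sum(b * c[0] for b, c in zip(bits, cols))
        py = sum(b * c[1] for b, c in zip(bits, cols))
        points.append((px, py))
    return hull_vertex_count(points)

rng = random.Random(0)
for N in range(3, 7):
    print(N, projected_cube_vertices(N, rng), expected_face_fraction(2, N, 0) * 2 ** N)
```

For n = 2 the prediction simplifies to 2N vertices, the familiar count for a planar zonotope with N generators in general position.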

Submitted 22 July, 2008; originally announced July 2008.

Comments: 21 pages, 3 figures

MSC Class: 52A22; 52B05; 52B11; 52B12; 62E20; 68P30; 68P25; 68W20; 68W40; 94B20; 94B35; 94B65; 94B70

Showing 1–15 of 15 results for author: Donoho, D L