Showing 1–50 of 53 results for author: Donoho, D

  1. arXiv:2404.01413  [pdf, other]

    cs.LG cs.AI cs.CL cs.ET stat.ML

    Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

    Authors: Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

    Abstract: The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration…

    Submitted 29 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  2. arXiv:2310.00865  [pdf, other]

    stat.OT

    Data Science at the Singularity

    Authors: David Donoho

    Abstract: A purported `AI Singularity' has been in the public eye recently. Mass media and US national political attention focused on `AI Doom' narratives hawked by social media influencers. The European Commission is announcing initiatives to forestall `AI Extinction'. In my opinion, `AI Singularity' is the wrong narrative for what's happening now; recent happenings signal something else entirely. Somethin…

    Submitted 1 October, 2023; originally announced October 2023.

    Comments: 1 Figure

  3. arXiv:2308.01839  [pdf, other]

    q-bio.QM cs.CV q-bio.GN stat.AP stat.ML

    Is your data alignable? Principled and interpretable alignability testing and integration of single-cell data

    Authors: Rong Ma, Eric D. Sun, David Donoho, James Zou

    Abstract: Single-cell data integration can provide a comprehensive molecular view of cells, and many algorithms have been developed to remove unwanted technical or biological variations and integrate heterogeneous single-cell datasets. Despite their wide usage, existing methods suffer from several fundamental limitations. In particular, we lack a rigorous statistical test for whether two high-dimensional si…

    Submitted 29 February, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

    Journal ref: Proceedings of the National Academy of Sciences, 2024, 121(10) e2313719121

  4. arXiv:2210.04488  [pdf, other]

    math.ST

    Optimal Eigenvalue Shrinkage in the Semicircle Limit

    Authors: David L. Donoho, Michael J. Feldman

    Abstract: Modern datasets are trending towards ever higher dimension. In response, recent theoretical studies of covariance estimation often assume the proportional-growth asymptotic framework, where the sample size $n$ and dimension $p$ are comparable, with $n, p \rightarrow \infty $ and $γ_n = p/n \rightarrow γ> 0$. Yet, many datasets -- perhaps most -- have very different numbers of rows and columns. We…

    Submitted 30 July, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

  5. arXiv:2106.07053  [pdf, other]

    cs.IT cs.AI eess.SY math.ST stat.OT

    Convex Sparse Blind Deconvolution

    Authors: Qingyun Sun, David Donoho

    Abstract: In the blind deconvolution problem, we observe the convolution of an unknown filter and unknown signal and attempt to reconstruct the filter and signal. The problem seems impossible in general, since there are seemingly many more unknowns than knowns. Nevertheless, this problem arises in many application fields; and empirically, some of these fields have had success using heuristic methods -- eve…

    Submitted 13 June, 2021; originally announced June 2021.

  6. arXiv:2106.02073  [pdf, other]

    cs.LG cs.AI math.DG math.OC stat.ML

    Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path

    Authors: X. Y. Han, Vardan Papyan, David L. Donoho

    Abstract: The recently discovered Neural Collapse (NC) phenomenon occurs pervasively in today's deep net training paradigm of driving cross-entropy (CE) loss towards zero. During NC, last-layer features collapse to their class-means, both classifiers and class-means collapse to the same Simplex Equiangular Tight Frame, and classifier behavior collapses to the nearest-class-mean decision rule. Recent works d…

    Submitted 9 May, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

    Comments: ICLR 2022 Outstanding Paper Prize & Oral. Appendix contains [A] empirical experiments, [B-D] proofs of theoretical results, and [E] survey of related works examining Neural Collapse

  7. arXiv:2103.03218  [pdf, ps, other]

    math.ST

    The Impossibility Region for Detecting Sparse Mixtures using the Higher Criticism

    Authors: David L. Donoho, Alon Kipnis

    Abstract: Consider a multiple hypothesis testing setting involving rare/weak effects: relatively few tests, out of possibly many, deviate from their null hypothesis behavior. Summarizing the significance of each test by a P-value, we construct a global test against the null using the Higher Criticism (HC) statistics of these P-values. We calibrate the rare/weak model using parameters controlling the asympto…

    Submitted 19 October, 2021; v1 submitted 15 February, 2021; originally announced March 2021.

    MSC Class: 2010; Primary: 62H17; 62H15

  8. arXiv:2101.03517  [pdf, other]

    astro-ph.GA

    Three-dimensional simulations of X-ray cavities inflated by radio galaxies

    Authors: Michael D. Smith, Justin Donohoe

    Abstract: Vast cavities in the intergalactic medium are excavated by radio galaxies. The cavities appear as such in X-ray images because the external medium has been swept up, leaving a hot but low density bubble surrounding the radio lobes. We explore here the predicted thermal X-ray emission from a large set of high-resolution three-dimensional simulations of radio galaxies driven by supersonic jets. We a…

    Submitted 10 January, 2021; originally announced January 2021.

    Comments: 14 pages, 15 figures, 2 tables; accepted for publication in Monthly Notices of the Royal Astronomical Society

  9. arXiv:2009.12297  [pdf, other]

    math.ST stat.ME

    ScreeNOT: Exact MSE-Optimal Singular Value Thresholding in Correlated Noise

    Authors: David L. Donoho, Matan Gavish, Elad Romanov

    Abstract: We derive a formula for optimal hard thresholding of the singular value decomposition in the presence of correlated additive noise; although it nominally involves unobservables, we show how to apply it even where the noise covariance structure is not a-priori known or is not independently estimable. The proposed method, which we call ScreeNOT, is a mathematically solid alternative to Cattell's e…

    Submitted 26 March, 2023; v1 submitted 25 September, 2020; originally announced September 2020.

    Journal ref: Annals of Statistics, 2023

  10. arXiv:2008.08186  [pdf, other]

    cs.LG cs.CV stat.ML

    Prevalence of Neural Collapse during the terminal phase of deep learning training

    Authors: Vardan Papyan, X. Y. Han, David L. Donoho

    Abstract: Modern practice for training classification deepnets involves a Terminal Phase of Training (TPT), which begins at the epoch where training error first vanishes; during TPT, the training error stays effectively zero while training loss is pushed towards zero. Direct measurements of TPT, for three prototypical deepnet architectures and across seven canonical classification datasets, expose a pervasi…

    Submitted 21 August, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

  11. arXiv:2007.01958  [pdf, other]

    math.ST stat.CO

    Higher Criticism to Compare Two Large Frequency Tables, with sensitivity to Possible Rare and Weak Differences

    Authors: David L. Donoho, Alon Kipnis

    Abstract: We adapt Higher Criticism (HC) to the comparison of two frequency tables which may -- or may not -- exhibit moderate differences between the tables in some unknown, relatively small subset out of a large number of categories. Our analysis of the power of the proposed HC test quantifies the rarity and size of assumed differences and applies moderate deviations-analysis to determine the asymptotic p…

    Submitted 21 June, 2022; v1 submitted 3 July, 2020; originally announced July 2020.

    MSC Class: 62H17; 62H15; 62G10

    Journal ref: Annals of Statistics 2022, Vol. 50, No. 3, 1447-1472

  12. arXiv:1906.03742  [pdf, other]

    cs.LG stat.ML

    Degrees of Freedom Analysis of Unrolled Neural Networks

    Authors: Morteza Mardani, Qingyun Sun, Vardan Papyan, Shreyas Vasanawala, John Pauly, David Donoho

    Abstract: Unrolled neural networks emerged recently as an effective model for learning inverse maps appearing in image restoration tasks. However, their generalization risk (i.e., test mean-squared-error) and its link to network design and train sample size remain mysterious. Leveraging Stein's Unbiased Risk Estimator (SURE), this paper analyzes the generalization risk with its bias and variance compon…

    Submitted 9 June, 2019; originally announced June 2019.

  13. arXiv:1901.08705  [pdf, other]

    cs.DC

    Ambitious Data Science Can Be Painless

    Authors: Hatef Monajemi, Riccardo Murri, Eric Jonas, Percy Liang, Victoria Stodden, David L. Donoho

    Abstract: Modern data science research can involve massive computational experimentation; an ambitious PhD in computational fields may do experiments consuming several million CPU hours. Traditional computing practices, in which researchers use laptops or shared campus-resident resources, are inadequate for experiments at the massive scale and varied scope that we now see in data science. On the other hand,…

    Submitted 24 January, 2019; originally announced January 2019.

    Comments: Submitted to Harvard Data Science Review

  14. arXiv:1810.07403  [pdf, ps, other]

    math.ST stat.ME

    Optimal Covariance Estimation for Condition Number Loss in the Spiked Model

    Authors: David L. Donoho, Behrooz Ghorbani

    Abstract: We study estimation of the covariance matrix under relative condition number loss $κ(Σ^{-1/2} \hat{Σ} Σ^{-1/2})$, where $κ(Δ)$ is the condition number of matrix $Δ$, and $\hat{Σ}$ and $Σ$ are the estimated and theoretical covariance matrices. Optimality in $κ$-loss provides optimal guarantees in two stylized applications: Multi-User Covariance Estimation and Multi-Task Linear Discriminant Analysis. We…

    Submitted 17 October, 2018; originally announced October 2018.

    Comments: 85 pages, 4 figures

  15. arXiv:1806.03963  [pdf, other]

    cs.CV cs.LG

    Neural Proximal Gradient Descent for Compressive Imaging

    Authors: Morteza Mardani, Qingyun Sun, Shreyas Vasanawala, Vardan Papyan, Hatef Monajemi, John Pauly, David Donoho

    Abstract: Recovering high-resolution images from limited sensory data typically leads to a serious ill-posed inverse problem, demanding inversion algorithms that effectively capture the prior information. Learning a good inverse mapping from training data faces severe challenges, including: (i) scarcity of training data; (ii) need for plausible reconstructions that are physically feasible; (iii) need for fa…

    Submitted 1 June, 2018; originally announced June 2018.

    Comments: arXiv admin note: text overlap with arXiv:1711.10046

  16. arXiv:1711.10046  [pdf, other]

    cs.AI cs.IR cs.LG

    Recurrent Generative Adversarial Networks for Proximal Learning and Automated Compressive Image Recovery

    Authors: Morteza Mardani, Hatef Monajemi, Vardan Papyan, Shreyas Vasanawala, David Donoho, John Pauly

    Abstract: Recovering images from undersampled linear measurements typically leads to an ill-posed linear inverse problem, that asks for proper statistical priors. Building effective priors is however challenged by the low train and test overhead dictated by real-time tasks; and the need for retrieving visually "plausible" and physically "feasible" images with minimal hallucination. To cope with these challe…

    Submitted 27 November, 2017; originally announced November 2017.

    Comments: 11 pages, 11 figures

  17. arXiv:1702.03062  [pdf, other]

    cs.IT

    Sparsity/Undersampling Tradeoffs in Anisotropic Undersampling, with Applications in MR Imaging/Spectroscopy

    Authors: Hatef Monajemi, David L. Donoho

    Abstract: We study anisotropic undersampling schemes like those used in multi-dimensional NMR spectroscopy and MR imaging, which sample exhaustively in certain time dimensions and randomly in others. Our analysis shows that anisotropic undersampling schemes are equivalent to certain block-diagonal measurement systems. We develop novel exact formulas for the sparsity/undersampling tradeoffs in such measure…

    Submitted 16 March, 2018; v1 submitted 9 February, 2017; originally announced February 2017.

  18. arXiv:1702.01830  [pdf, other]

    stat.AP

    Incoherence of Partial-Component Sampling in multidimensional NMR

    Authors: Hatef Monajemi, David L. Donoho, Jeffrey C. Hoch, Adam D. Schuyler

    Abstract: In NMR spectroscopy, undersampling in the indirect dimensions causes reconstruction artifacts whose size can be bounded using the so-called {\it coherence}. In experiments with multiple indirect dimensions, new undersampling approaches were recently proposed: random phase detection (RPD) \cite{Maciejewski11} and its generalization, partial component sampling (PCS) \cite{Schuyler13}. The new approa…

    Submitted 6 February, 2017; originally announced February 2017.

  19. arXiv:1606.00925  [pdf, other]

    cs.LG stat.ML

    Convolutional Imputation of Matrix Networks

    Authors: Qingyun Sun, Mengyuan Yan, David Donoho, Stephen Boyd

    Abstract: A matrix network is a family of matrices, with relatedness modeled by a weighted graph. We consider the task of completing a partially observed matrix network. We assume a novel sampling scheme where a fraction of matrices might be completely unobserved. How can we recover the entire matrix network from incomplete observations? This mathematical problem arises in many applications including medica…

    Submitted 7 June, 2018; v1 submitted 2 June, 2016; originally announced June 2016.

    Comments: Accepted by ICML 2018

  20. arXiv:1503.02106  [pdf, other]

    math.ST

    Variance Breakdown of Huber (M)-estimators: $n/p \rightarrow m \in (1,\infty)$

    Authors: David L. Donoho, Andrea Montanari

    Abstract: A half century ago, Huber evaluated the minimax asymptotic variance in scalar location estimation, $ \min_ψ\max_{F \in {\cal F}_ε} V(ψ, F) = \frac{1}{I(F_ε^*)} $, where $V(ψ,F)$ denotes the asymptotic variance of the $(M)$-estimator for location with score function $ψ$, and $I(F_ε^*)$ is the minimal Fisher information $ \min_{{\cal F}_ε} I(F)$ over the class of $ε$-Contaminated Normal distribution…

    Submitted 6 March, 2015; originally announced March 2015.

    Comments: Based on a lecture delivered at a special colloquium honoring the 50th anniversary of the Seminar für Statistik (SfS) at ETH Zürich, November 25, 2014

    MSC Class: 62C20; 62J05; 62G35

  21. Higher Criticism for Large-Scale Inference, Especially for Rare and Weak Effects

    Authors: David Donoho, Jiashun Jin

    Abstract: In modern high-throughput data analysis, researchers perform a large number of statistical tests, expecting to find perhaps a small fraction of significant effects against a predominantly null background. Higher Criticism (HC) was introduced to determine whether there are any nonzero effects; more recently, it was applied to feature selection, where it provides a method for selecting useful predic…

    Submitted 10 April, 2015; v1 submitted 17 October, 2014; originally announced October 2014.

    Comments: Published at http://dx.doi.org/10.1214/14-STS506 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-STS-STS506

    Journal ref: Statistical Science 2015, Vol. 30, No. 1, 1-25

  22. arXiv:1405.7511  [pdf, ps, other]

    math.ST

    Optimal Shrinkage of Singular Values

    Authors: Matan Gavish, David L. Donoho

    Abstract: We consider recovery of low-rank matrices from noisy data by shrinkage of singular values, in which a single, univariate nonlinearity is applied to each of the empirical singular values. We adopt an asymptotic framework, in which the matrix size is much larger than the rank of the signal matrix to be recovered, and the signal-to-noise ratio of the low-rank piece stays constant. For a variety of lo…

    Submitted 15 May, 2016; v1 submitted 29 May, 2014; originally announced May 2014.

  23. arXiv:1311.0851  [pdf, ps, other]

    math.ST

    Optimal Shrinkage of Eigenvalues in the Spiked Covariance Model

    Authors: David L. Donoho, Matan Gavish, Iain M. Johnstone

    Abstract: We show that in a common high-dimensional covariance model, the choice of loss function has a profound effect on optimal estimation. In an asymptotic framework based on the Spiked Covariance model and use of orthogonally invariant estimators, we show that optimal estimation of the population covariance matrix boils down to design of an optimal shrinker $η$ that acts elementwise on the sample eigen…

    Submitted 4 June, 2017; v1 submitted 4 November, 2013; originally announced November 2013.

  24. arXiv:1310.7320  [pdf, other]

    math.ST cs.IT

    High Dimensional Robust M-Estimation: Asymptotic Variance via Approximate Message Passing

    Authors: David Donoho, Andrea Montanari

    Abstract: In a recent article (Proc. Natl. Acad. Sci., 110(36), 14557-14562), El Karoui et al. study the distribution of robust regression estimators in the regime in which the number of parameters p is of the same order as the number of samples n. Using numerical simulations and `highly plausible' heuristic arguments, they unveil a striking new phenomenon. Namely, the regression coefficients contain an ext…

    Submitted 15 November, 2013; v1 submitted 28 October, 2013; originally announced October 2013.

    Comments: 32 pages, 5 figures (v2 contains numerical simulations)

  25. arXiv:1305.5870  [pdf, other]

    stat.ME

    The Optimal Hard Threshold for Singular Values is 4/sqrt(3)

    Authors: Matan Gavish, David L. Donoho

    Abstract: We consider recovery of low-rank matrices from noisy data by hard thresholding of singular values, where singular values below a prescribed threshold $λ$ are set to 0. We study the asymptotic MSE in a framework where the matrix size is large compared to the rank of the matrix to be recovered, and the signal-to-noise ratio of the low-rank piece stays constant. The AMSE-optimal choice of hard thresh…

    Submitted 4 June, 2014; v1 submitted 24 May, 2013; originally announced May 2013.

  26. Minimax risk of matrix denoising by singular value thresholding

    Authors: David Donoho, Matan Gavish

    Abstract: An unknown $m$ by $n$ matrix $X_0$ is to be estimated from noisy measurements $Y=X_0+Z$, where the noise matrix $Z$ has i.i.d. Gaussian entries. A popular matrix denoising scheme solves the nuclear norm penalization problem $\operatorname {min}_X\|Y-X\|_F^2/2+λ\|X\|_*$, where $\|X\|_*$ denotes the nuclear norm (sum of singular values). This is the analog, for matrices, of $\ell_1$ penalization in…

    Submitted 4 November, 2014; v1 submitted 7 April, 2013; originally announced April 2013.

    Comments: Published at http://dx.doi.org/10.1214/14-AOS1257 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS1257

    Journal ref: Annals of Statistics 2014, Vol. 42, No. 6, 2413-2440

  27. arXiv:1302.2758  [pdf, other]

    astro-ph.IM astro-ph.CO physics.data-an

    Sparsity and the Bayesian Perspective

    Authors: J. -L. Starck, D. L. Donoho, M. J. Fadili, A. Rassat

    Abstract: Sparsity has been recently introduced in cosmology for weak-lensing and CMB data analysis for different applications such as denoising, component separation or inpainting (i.e. filling the missing data or the mask). Although it gives very nice numerical results, CMB sparse inpainting has been severely criticized by top researchers in cosmology, based on arguments derived from a Bayesian perspectiv…

    Submitted 15 February, 2013; v1 submitted 12 February, 2013; originally announced February 2013.

    Comments: Accepted in A&A

  28. The Phase Transition of Matrix Recovery from Gaussian Measurements Matches the Minimax MSE of Matrix Denoising

    Authors: David L. Donoho, Matan Gavish, Andrea Montanari

    Abstract: Let $X_0$ be an unknown $M$ by $N$ matrix. In matrix recovery, one takes $n < MN$ linear measurements $y_1, \dots, y_n$ of $X_0$, where $y_i = \operatorname{Tr}(a_i^T X_0)$ and each $a_i$ is an $M$ by $N$ matrix. For measurement matrices with Gaussian i.i.d. entries, it is known that if $X_0$ is of low rank, it is recoverable from just a few measurements. A popular approach for matrix recovery is Nuclear Norm Minimizat…

    Submitted 10 February, 2013; originally announced February 2013.

  29. arXiv:1112.0708  [pdf, other]

    cs.IT cond-mat.stat-mech math.ST

    Information-Theoretically Optimal Compressed Sensing via Spatial Coupling and Approximate Message Passing

    Authors: David L. Donoho, Adel Javanmard, Andrea Montanari

    Abstract: We study the compressed sensing reconstruction problem for a broad class of random, band-diagonal sensing matrices. This construction is inspired by the idea of spatial coupling in coding theory. As demonstrated heuristically and numerically by Krzakala et al. \cite{KrzakalaEtAl}, message passing algorithms can effectively solve the reconstruction problem for spatially coupled measurements with un…

    Submitted 18 January, 2013; v1 submitted 3 December, 2011; originally announced December 2011.

    Comments: 60 pages, 7 figures, Sections 3,5 and Appendices A,B are added. The stability constant is quantified (cf Theorem 1.7)

  30. arXiv:1111.1041  [pdf, other]

    cs.IT math.ST

    Accurate Prediction of Phase Transitions in Compressed Sensing via a Connection to Minimax Denoising

    Authors: David Donoho, Iain Johnstone, Andrea Montanari

    Abstract: Compressed sensing posits that, within limits, one can undersample a sparse signal and yet reconstruct it accurately. Knowing the precise limits to such undersampling is important both for theory and practice. We present a formula that characterizes the allowed undersampling of generalized sparse objects. The formula applies to Approximate Message Passing (AMP) algorithms for compressed sensing, w…

    Submitted 7 January, 2013; v1 submitted 4 November, 2011; originally announced November 2011.

    Comments: 71 pages, 32 pdf figures

  31. arXiv:1103.1943  [pdf, other]

    cs.IT math.ST

    Compressed Sensing over $\ell_p$-balls: Minimax Mean Square Error

    Authors: David Donoho, Iain Johnstone, Arian Maleki, Andrea Montanari

    Abstract: We consider the compressed sensing problem, where the object $x_0 \in \mathbb{R}^N$ is to be recovered from incomplete measurements $y = Ax_0 + z$; here the sensing matrix $A$ is an $n \times N$ random matrix with iid Gaussian entries and $n < N$. A popular method of sparsity-promoting reconstruction is $\ell^1$-penalized least-squares reconstruction (aka LASSO, Basis Pursuit). It is currently popular…

    Submitted 23 March, 2011; v1 submitted 10 March, 2011; originally announced March 2011.

    Comments: 41 pages, 11 pdf figures

  32. arXiv:1004.3006  [pdf, ps, other]

    math.FA cs.IT math.NA

    Microlocal Analysis of the Geometric Separation Problem

    Authors: David L. Donoho, Gitta Kutyniok

    Abstract: Image data are often composed of two or more geometrically distinct constituents; in galaxy catalogs, for instance, one sees a mixture of pointlike structures (galaxy superclusters) and curvelike structures (filaments). It would be ideal to process a single image and extract two geometrically `pure' images, each one containing features from only one of the two geometric constituents. This seems t…

    Submitted 18 April, 2010; originally announced April 2010.

    Comments: 59 pages, 9 figures

    Report number: Technical Report No. 2010-01, Statistics Department, Stanford University

  33. arXiv:1004.1218  [pdf, other]

    math.ST cs.IT

    The Noise-Sensitivity Phase Transition in Compressed Sensing

    Authors: David L. Donoho, Arian Maleki, Andrea Montanari

    Abstract: Consider the noisy underdetermined system of linear equations: y=Ax0 + z0, with n x N measurement matrix A, n < N, and Gaussian white noise z0 ~ N(0,σ^2 I). Both y and A are known, both x0 and z0 are unknown, and we seek an approximation to x0. When x0 has few nonzeros, useful approximations are obtained by l1-penalized l2 minimization, in which the reconstruction \hxl solves min || y - Ax||^2/2…

    Submitted 7 April, 2010; originally announced April 2010.

    Comments: 40 pages, 13 pdf figures

  34. arXiv:0911.4222  [pdf, other]

    cs.IT

    Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation

    Authors: David L. Donoho, Arian Maleki, Andrea Montanari

    Abstract: In a recent paper, the authors proposed a new class of low-complexity iterative thresholding algorithms for reconstructing sparse signals from a small set of linear measurements \cite{DMM}. The new algorithms are broadly referred to as AMP, for approximate message passing. This is the second of two conference papers describing the derivation of these algorithms, connection with related literatur…

    Submitted 21 November, 2009; originally announced November 2009.

    Comments: 5 pages, 3 pdf figures, IEEE Information Theory Workshop, Cairo 2010

  35. arXiv:0911.4219  [pdf, ps, other]

    cs.IT

    Message Passing Algorithms for Compressed Sensing: I. Motivation and Construction

    Authors: David L. Donoho, Arian Maleki, Andrea Montanari

    Abstract: In a recent paper, the authors proposed a new class of low-complexity iterative thresholding algorithms for reconstructing sparse signals from a small set of linear measurements \cite{DMM}. The new algorithms are broadly referred to as AMP, for approximate message passing. This is the first of two conference papers describing the derivation of these algorithms, connection with the related litera…

    Submitted 21 November, 2009; originally announced November 2009.

    Comments: 5 pages, IEEE Information Theory Workshop, Cairo 2010

  36. arXiv:0909.0777  [pdf, other]

    math.NA cs.IT cs.MS

    Optimally Tuned Iterative Reconstruction Algorithms for Compressed Sensing

    Authors: Arian Maleki, David L. Donoho

    Abstract: We conducted an extensive computational experiment, lasting multiple CPU-years, to optimally select parameters for two important classes of algorithms for finding sparse solutions of underdetermined systems of linear equations. We make the optimally tuned implementations available at {\tt sparselab.stanford.edu}; they run `out of the box' with no user tuning: it is not necessary to select thresh…

    Submitted 3 September, 2009; originally announced September 2009.

    Comments: 12 pages, 14 figures

  37. arXiv:0907.3574  [pdf, ps, other]

    cs.IT cond-mat.dis-nn stat.CO

    Message Passing Algorithms for Compressed Sensing

    Authors: David L. Donoho, Arian Maleki, Andrea Montanari

    Abstract: Compressed sensing aims to undersample certain high-dimensional signals, yet accurately reconstruct them by exploiting signal characteristics. Accurate reconstruction is possible when the object to be recovered is sufficiently sparse in a known basis. Currently, the best known sparsity-undersampling tradeoff is achieved when reconstructing by convex optimization -- which is expensive in importan…

    Submitted 21 July, 2009; originally announced July 2009.

    Comments: 6 pages paper + 9 pages supplementary information, 13 eps figures. Submitted to Proc. Natl. Acad. Sci. USA

  38. arXiv:0906.2530  [pdf, other]

    math.ST cs.IT physics.data-an stat.CO

    Observed Universality of Phase Transitions in High-Dimensional Geometry, with Implications for Modern Data Analysis and Signal Processing

    Authors: David L. Donoho, Jared Tanner

    Abstract: We review connections between phase transitions in high-dimensional combinatorial geometry and phase transitions occurring in modern high-dimensional data analysis and signal processing. In data analysis, such transitions arise as abrupt breakdown of linear model selection, robust data fitting or compressed sensing reconstructions, when the complexity of the model or the number of outliers incre…

    Submitted 14 June, 2009; originally announced June 2009.

    Comments: 47 pages, 24 figures, 10 tables

  39. Feature selection by Higher Criticism thresholding: optimal phase diagram

    Authors: David Donoho, Jiashun Jin

    Abstract: We consider two-class linear classification in a high-dimensional, low-sample size setting. Only a small fraction of the features are useful, the useful features are unknown to us, and each useful feature contributes weakly to the classification decision -- this setting was called the rare/weak model (RW Model). We select features by thresholding feature $z$-scores. The threshold is set by {\it…

    Submitted 11 December, 2008; originally announced December 2008.

    Comments: 4 figures, 24 pages

    MSC Class: 62H30 (Primary); 62H15; 62G20 (Secondary); 62G32

  40. arXiv:0807.3590  [pdf, ps, other]

    math.MG cs.IT math.OC math.PR

    Counting the Faces of Randomly-Projected Hypercubes and Orthants, with Applications

    Authors: David L. Donoho, Jared Tanner

    Abstract: Let $A$ be an $n$ by $N$ real valued random matrix, and $\h$ denote the $N$-dimensional hypercube. For numerous random matrix ensembles, the expected number of $k$-dimensional faces of the random $n$-dimensional zonotope $A\h$ obeys the formula $E f_k(A\h) /f_k(\h) = 1-P_{N-n,N-k}$, where $P_{N-n,N-k}$ is a fair-coin-tossing probability. The formula applies, for example, where the columns of…

    Submitted 22 July, 2008; originally announced July 2008.

    Comments: 21 pages, 3 figures

    MSC Class: 52A22; 52B05; 52B11; 52B12; 62E20; 68P30; 68P25; 68W20; 68W40; 94B20; 94B35; 94B65; 94B70

  41. The Simplest Solution to an Underdetermined System of Linear Equations

    Authors: David Donoho, Hossein Kakavand, James Mammen

    Abstract: Consider a d*n matrix A, with d<n. The problem of solving for x in y=Ax is underdetermined, and has infinitely many solutions (if there are any). Given y, the minimum Kolmogorov complexity solution (MKCS) of the input x is defined to be an input z (out of many) with minimum Kolmogorov-complexity that satisfies y=Az. One expects that if the actual input is simple enough, then MKCS will recover th…

    Submitted 19 February, 2007; originally announced February 2007.

    Comments: Proceedings of the IEEE International Symposium on Information Theory Seattle, Washington, July 9-14, 2006

  42. Does median filtering truly preserve edges better than linear filtering?

    Authors: Ery Arias-Castro, David L. Donoho

    Abstract: Image processing researchers commonly assert that "median filtering is better than linear filtering for removing noise in the presence of edges." Using a straightforward large-$n$ decision-theory framework, this folk-theorem is seen to be false in general. We show that median filtering and linear filtering have similar asymptotic worst-case mean-squared error (MSE) when the signal-to-noise ratio…

    Submitted 20 April, 2009; v1 submitted 14 December, 2006; originally announced December 2006.

    Comments: Published at http://dx.doi.org/10.1214/08-AOS604 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS604 MSC Class: 62G08; 62G20 (Primary) 60G35 (Secondary)

    Journal ref: Annals of Statistics 2009, Vol. 37, No. 3, 1172-1206

  43. Multi-scale morphology of the galaxy distribution

    Authors: Enn Saar, Vicent J. Martinez, Jean-Luc Starck, David L. Donoho

    Abstract: Many statistical methods have been proposed in the last years for analyzing the spatial distribution of galaxies. Very few of them, however, can handle properly the border effects of complex observational sample volumes. In this paper, we first show how to calculate the Minkowski Functionals (MF) taking into account these border effects. Then we present a multiscale extension of the MF which giv…

    Submitted 31 October, 2006; originally announced October 2006.

    Comments: 17 pages, 19 figures, accepted for publication in MNRAS

    Journal ref: Mon.Not.Roy.Astron.Soc.374:1030-1044,2007

  44. arXiv:math/0607364  [pdf, ps, other]

    math.MG math.NA math.PR math.ST

    Counting faces of randomly-projected polytopes when the projection radically lowers dimension

    Authors: David L. Donoho, Jared Tanner

    Abstract: This paper develops asymptotic methods to count faces of random high-dimensional polytopes. Beyond its intrinsic interest, our conclusions have surprising implications - in statistics, probability, information theory, and signal processing - with potential impacts in practical subjects like medical imaging and digital communications. Three such implications concern: convex hulls of Gaussian poin…

    Submitted 26 September, 2006; v1 submitted 15 July, 2006; originally announced July 2006.

    Comments: 56 pages

    MSC Class: 52A22; 52B05; 52B11; 52B12; 62E20; 68P30; 68P25; 68W20; 68W40; 94B20; 94B35; 94B65; 94B70

  45. Adaptive multiscale detection of filamentary structures in a background of uniform random points

    Authors: Ery Arias-Castro, David L. Donoho, Xiaoming Huo

    Abstract: We are given a set of $n$ points that might be uniformly distributed in the unit square $[0,1]^2$. We wish to test whether the set, although mostly consisting of uniformly scattered points, also contains a small fraction of points sampled from some (a priori unknown) curve with $C^α$-norm bounded by $β$. An asymptotic detection threshold exists in this problem; for a constant $T_-(α,β)>0$, if th…

    Submitted 18 May, 2006; originally announced May 2006.

    Comments: Published at http://dx.doi.org/10.1214/009053605000000787 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS0097 MSC Class: 62M30 (Primary) 62G10; 62G20 (Secondary)

    Journal ref: Annals of Statistics 2006, Vol. 34, No. 1, 326-349

  46. arXiv:math/0603673  [pdf, ps, other]

    math.PR

    Correction. Connect The Dots: How Many Random Points Can A Regular Curve Pass Through?

    Authors: E. Arias-Castro, D. L. Donoho, X. Huo, C. A. Tovey

    Abstract: Correction for Adv. in Appl. Probab. 37, no. 3 (2005), 571-603

    Submitted 28 March, 2006; originally announced March 2006.

    Comments: 2 pages, 1 figure

    MSC Class: 60D05; 62M40

  47. Asymptotic minimaxity of False Discovery Rate thresholding for sparse exponential data

    Authors: David Donoho, Jiashun Jin

    Abstract: We apply FDR thresholding to a non-Gaussian vector whose coordinates $X_i$, $i=1,\dots,n$, are independent exponential with individual means $μ_i$. The vector $μ=(μ_i)$ is thought to be sparse, with most coordinates 1 but a small fraction significantly larger than 1; roughly, most coordinates are simply `noise,' but a small fraction contain `signal.' We measure risk by per-coordinate mean-squared err…

    Submitted 1 August, 2007; v1 submitted 14 February, 2006; originally announced February 2006.

    Comments: Published at http://dx.doi.org/10.1214/009053606000000920 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS0150 MSC Class: 62H12; 62C20 (Primary) 62G20; 62C10; 62C12. (Secondary)

    Journal ref: Annals of Statistics 2006, Vol. 34, No. 6, 2980-3018

  48. Morphology of the galaxy distribution from wavelet denoising

    Authors: V. J. Martinez, J.-L. Starck, E. Saar, D. L. Donoho, S. Reynolds, P. de la Cruz, S. Paredes

    Abstract: We have developed a method based on wavelets to obtain the true underlying smooth density from a point distribution. The goal has been to reconstruct the density field in an optimal way ensuring that the morphology of the reconstructed field reflects the true underlying morphology of the point field which, as the galaxy distribution, has a genuinely multiscale structure, with near-singular behav…

    Submitted 15 August, 2005; originally announced August 2005.

    Comments: Accepted for publication in ApJ

    Journal ref: Astrophys.J.634:744-755,2005

  49. arXiv:math/0505374  [pdf, ps, other]

    math.ST

    Adapting to Unknown Sparsity by controlling the False Discovery Rate

    Authors: Felix Abramovich, Yoav Benjamini, David L. Donoho, Iain M. Johnstone

    Abstract: We attempt to recover an $n$-dimensional vector observed in white noise, where $n$ is large and the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the ordered entries; and controlling the $\ell_p$ norm for $p$ small. We obtain a procedur…

    Submitted 18 May, 2005; originally announced May 2005.

    Comments: This is a complete version of a paper to appear in the Annals of Statistics. The version in AoS abbreviates certain proofs that are given here in detail.

    MSC Class: 62F10; 62G12

  50. Cosmological non-Gaussian Signature Detection: Comparing Performance of Different Statistical Tests

    Authors: J. Jin, J.-L. Starck, D. L. Donoho, N. Aghanim, O. Forni

    Abstract: Currently, it appears that the best method for non-Gaussianity detection in the Cosmic Microwave Background (CMB) consists in calculating the kurtosis of the wavelet coefficients. We know that wavelet-kurtosis outperforms other methods such as the bispectrum, the genus, ridgelet-kurtosis and curvelet-kurtosis on an empirical basis, but relatively few studies have compared other transform-based s…

    Submitted 16 March, 2005; originally announced March 2005.

    Comments: Manuscript with all figures can be downloaded at: http://jstarck.free.fr/HC04.pdf