
Showing 1–50 of 66 results for author: Arias-Castro, E

Searching in archive math.
  1. arXiv:2407.03574

    stat.ML cs.LG math.ST

    An Axiomatic Definition of Hierarchical Clustering

    Authors: Ery Arias-Castro, Elizabeth Coda

    Abstract: In this paper, we take an axiomatic approach to defining a population hierarchical clustering for piecewise constant densities, and in a similar manner to Lebesgue integration, extend this definition to more general densities. When the density satisfies some mild conditions, e.g., when it has connected support, is continuous, and vanishes only at infinity, or when the connected components of the d…

    Submitted 3 July, 2024; originally announced July 2024.

  2. arXiv:2312.04924

    stat.ME math.ST

    Sparse Anomaly Detection Across Referentials: A Rank-Based Higher Criticism Approach

    Authors: Ivo V. Stoepker, Rui M. Castro, Ery Arias-Castro

    Abstract: Detecting anomalies in large sets of observations is crucial in various applications, such as epidemiological studies, gene expression studies, and systems monitoring. We consider settings where the units of interest result in multiple independent observations from potentially distinct referentials. Scan statistics and related methods are commonly used in such settings, but rely on stringent model…

    Submitted 8 December, 2023; originally announced December 2023.

    MSC Class: 62G10; 62G20; 62G32; 62J15

  3. arXiv:2310.10900

    math.ST cs.NI math.PR

    Stability of Sequential Lateration and of Stress Minimization in the Presence of Noise

    Authors: Ery Arias-Castro, Siddharth Vishwanath

    Abstract: Sequential lateration is a class of methods for multidimensional scaling where a suitable subset of nodes is first embedded by some method, e.g., a clique embedded by classical scaling, and then the remaining nodes are recursively embedded by lateration. A graph is a lateration graph when it can be embedded by such a procedure. We provide a stability result for a particular variant of sequential l…

    Submitted 26 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2207.07218

  4. arXiv:2310.00211

    math.ST

    Theoretical Foundations of Ordinal Multidimensional Scaling, Including Internal and External Unfolding

    Authors: Ery Arias-Castro, Clément Berenfeld, Daniel Kane

    Abstract: We provide a comprehensive theory of multiple variants of ordinal multidimensional scaling, including external and internal unfolding. We do so in the continuous model of Shepard (1966).

    Submitted 5 October, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: Same version, with funding information added

  5. arXiv:2208.14540

    math.ST cs.LG math.MG

    Embedding Functional Data: Multidimensional Scaling and Manifold Learning

    Authors: Ery Arias-Castro, Wanli Qiao

    Abstract: We adapt concepts, methodology, and theory originally developed in the areas of multidimensional scaling and dimensionality reduction for multivariate data to the functional setting. We focus on classical scaling and Isomap -- prototypical methods that have played important roles in these areas -- and showcase their use in the context of functional data analysis. In the process, we highlight the cr…

    Submitted 30 August, 2022; originally announced August 2022.

  6. arXiv:2207.11121

    math.OC stat.CO

    Fitting a Multi-modal Density by Dynamic Programming

    Authors: Ery Arias-Castro, He Jiang

    Abstract: We consider the problem of fitting a probability density function when it is constrained to have a given number of modal intervals. We propose a dynamic programming approach to solving this problem numerically. When this number is not known, we provide several data-driven ways for selecting it. We perform some numerical experiments to illustrate our methodology.

    Submitted 14 July, 2022; originally announced July 2022.

  7. arXiv:2207.07218

    stat.ME math.MG math.ST stat.ML

    On the Selection of Tuning Parameters for Patch-Stitching Embedding Methods

    Authors: Ery Arias-Castro, Phong Alain Chau

    Abstract: While classical scaling, just like principal component analysis, is parameter-free, other methods for embedding multivariate data require the selection of one or several tuning parameters. This tuning can be difficult due to the unsupervised nature of the situation. We propose a simple, almost obvious, approach to supervise the choice of tuning parameter(s): minimize a notion of stress. We apply t…

    Submitted 17 October, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: Title change. Theory was removed to spin off another paper [arXiv:2310.10900]

  8. arXiv:2202.09023

    math.ST math.CA stat.ML

    Clustering by Hill-Climbing: Consistency Results

    Authors: Ery Arias-Castro, Wanli Qiao

    Abstract: We consider several hill-climbing approaches to clustering as formulated by Fukunaga and Hostetler in the 1970s. We study both continuous-space and discrete-space (i.e., medoid) variants and establish their consistency.

    Submitted 18 February, 2022; originally announced February 2022.

  9. arXiv:2111.10298

    math.ST cs.LG

    An Asymptotic Equivalence between the Mean-Shift Algorithm and the Cluster Tree

    Authors: Ery Arias-Castro, Wanli Qiao

    Abstract: Two important nonparametric approaches to clustering emerged in the 1970s: clustering by level sets or cluster tree as proposed by Hartigan, and clustering by gradient lines or gradient flow as proposed by Fukunaga and Hostetler. In a recent paper, we argued that these two approaches are fundamentally the same by showing that the gradient flow provides a way to move along the cluster tre…

    Submitted 19 November, 2021; originally announced November 2021.

  10. arXiv:2109.08362

    math.ST stat.ML

    Moving Up the Cluster Tree with the Gradient Flow

    Authors: Ery Arias-Castro, Wanli Qiao

    Abstract: The paper establishes a strong correspondence between two important clustering approaches that emerged in the 1970s: clustering by level sets or cluster tree as proposed by Hartigan, and clustering by gradient lines or gradient flow as proposed by Fukunaga and Hostetler. We do so by showing that we can move up the cluster tree by following the gradient ascent flow.

    Submitted 9 December, 2021; v1 submitted 17 September, 2021; originally announced September 2021.

    Comments: This is an expanded version. We changed the title to better reflect the contribution made in the paper

  11. arXiv:2105.03122

    math.ST math.PR

    The Coreness and H-Index of Random Geometric Graphs

    Authors: Eddie Aamari, Ery Arias-Castro, Clément Berenfeld

    Abstract: In network analysis, a measure of node centrality provides a scale indicating how central a node is within a network. The coreness is a popular notion of centrality that accounts for the maximal smallest degree of a subgraph containing a given node. In this paper, we study the coreness of random geometric graphs and show that, with an increasing number of nodes and properly chosen connectivity rad…

    Submitted 13 June, 2024; v1 submitted 7 May, 2021; originally announced May 2021.

  12. arXiv:2104.07870

    math.ST stat.ME

    Estimation of the Global Mode of a Density: Minimaxity, Adaptation, and Computational Complexity

    Authors: Ery Arias-Castro, Wanli Qiao, Lin Zheng

    Abstract: We consider the estimation of the global mode of a density under some decay rate condition around the global mode. We show that the maximum of a histogram, with proper choice of bandwidth, achieves the minimax rate that we establish for the setting that we consider. This is based on knowledge of the decay rate. Addressing the situation where the decay rate is unknown, we propose a multiscale varia…

    Submitted 15 April, 2021; originally announced April 2021.

  13. arXiv:2012.07937

    math.ST eess.SP

    Template Matching with Ranks

    Authors: Ery Arias-Castro, Lin Zheng

    Abstract: We consider the problem of matching a template to a noisy signal. Motivated by some recent proposals in the signal processing literature, we suggest a rank-based method and study its asymptotic properties using some well-established techniques in empirical process theory combined with Hájek's projection method. The resulting estimator of the shift is shown to achieve a parametric rate of convergen…

    Submitted 14 December, 2020; originally announced December 2020.

  14. arXiv:2011.12478

    stat.ML cs.LG math.ST

    Minimax Estimation of Distances on a Surface and Minimax Manifold Learning in the Isometric-to-Convex Setting

    Authors: Ery Arias-Castro, Phong Alain Chau

    Abstract: We start by considering the problem of estimating intrinsic distances on a smooth submanifold. We show that minimax optimality can be obtained via a reconstruction of the surface, and discuss the use of a particular mesh construction -- the tangential Delaunay complex -- for that purpose. We then turn to manifold learning and argue that a variant of Isomap where the distances are instead computed…

    Submitted 3 October, 2023; v1 submitted 24 November, 2020; originally announced November 2020.

  15. arXiv:2010.09906

    math.ST stat.ME

    On the Consistency of Metric and Non-Metric K-medoids

    Authors: Ery Arias-Castro, He Jiang

    Abstract: We establish the consistency of K-medoids in the context of metric spaces. We start by proving that K-medoids is asymptotically equivalent to K-means restricted to the support of the underlying distribution under general conditions, including a wide selection of loss functions. This asymptotic equivalence, in turn, enables us to apply the work of Parna (1986) on the consistency of K-means. This ge…

    Submitted 19 October, 2020; originally announced October 2020.

  16. arXiv:2009.04072

    math.ST eess.SP

    Template Matching and Change Point Detection by M-estimation

    Authors: Ery Arias-Castro, Lin Zheng

    Abstract: We consider the fundamental problem of matching a template to a signal. We do so by M-estimation, which encompasses procedures that are robust to gross errors (i.e., outliers). Using standard results from empirical process theory, we derive the convergence rate and the asymptotic distribution of the M-estimator under relatively mild assumptions. We also discuss the optimality of the estimator, bot…

    Submitted 8 September, 2020; originally announced September 2020.

  17. arXiv:2009.03117

    stat.ME math.ST

    Anomaly Detection for a Large Number of Streams: A Permutation-Based Higher Criticism Approach

    Authors: Ivo V. Stoepker, Rui M. Castro, Ery Arias-Castro, Edwin van den Heuvel

    Abstract: Anomaly detection when observing a large number of data streams is essential in a variety of applications, ranging from epidemiological studies to monitoring of complex systems. High-dimensional scenarios are usually tackled with scan statistics and related methods, requiring stringent modeling assumptions for proper calibration. In this work we take a non-parametric stance, and propose a permutat…

    Submitted 6 October, 2022; v1 submitted 7 September, 2020; originally announced September 2020.

  18. arXiv:1906.08884

    math.ST stat.CO stat.ME

    A Multiscale Scan Statistic for Adaptive Submatrix Localization

    Authors: Yuchao Liu, Ery Arias-Castro

    Abstract: We consider the problem of localizing a submatrix with larger-than-usual entry values inside a data matrix, without prior knowledge of the submatrix size. We establish an optimization framework based on a multiscale scan statistic, and develop algorithms in order to approach the optimizer. We also show that our estimator only requires a signal strength of the same order as the minimax estimato…

    Submitted 20 June, 2019; originally announced June 2019.

    Comments: The original version was accepted by the KDD 2019 Research Track. Details of the proof are available at https://escholarship.org/uc/item/9wt627dg

  19. arXiv:1811.07105

    math.ST

    Detection of Sparse Positive Dependence

    Authors: Ery Arias-Castro, Rong Huang, Nicolas Verzelen

    Abstract: In a bivariate setting, we consider the problem of detecting a sparse contamination or mixture component, where the effect manifests itself as a positive dependence between the variables, which are otherwise independent in the main component. We first look at this problem in the context of a normal mixture model. In essence, the situation reduces to a univariate setting where the effect is a decre…

    Submitted 9 January, 2020; v1 submitted 17 November, 2018; originally announced November 2018.

  20. arXiv:1811.01101

    math.PR

    Some Random Paths with Angle Constraints

    Authors: Clément Berenfeld, Ery Arias-Castro

    Abstract: We propose a simple, geometrically-motivated construction of smooth random paths in the plane. The construction is such that, with probability one, the paths have finite curvature everywhere (and the realizations are visually pleasing when simulated on a computer). Our construction is Markov of order 2. We show that a simpler construction which is Markov of order 1 fails to exhibit the desired fin…

    Submitted 2 November, 2018; originally announced November 2018.

  21. arXiv:1810.09569

    cs.LG math.NA stat.ML

    Perturbation Bounds for Procrustes, Classical Scaling, and Trilateration, with Applications to Manifold Learning

    Authors: Ery Arias-Castro, Adel Javanmard, Bruno Pelletier

    Abstract: One of the common tasks in unsupervised learning is dimensionality reduction, where the goal is to find meaningful low-dimensional structures hidden in high-dimensional data. Sometimes referred to as manifold learning, this problem is closely related to the problem of localization, which aims at embedding a weighted graph into a low-dimensional Euclidean space. Several methods have been proposed f…

    Submitted 24 October, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

    Comments: 33 pages, 6 Figures

  22. arXiv:1808.00631

    math.ST

    A Scan Procedure for Multiple Testing

    Authors: Shiyun Chen, Andrew Ying, Ery Arias-Castro

    Abstract: In a multiple testing framework, we propose a method that identifies the interval with the highest estimated false discovery rate of P-values and rejects the corresponding null hypotheses. Unlike the Benjamini-Hochberg method, which does the same but over intervals with an endpoint at the origin, the new procedure 'scans' all intervals. In parallel with Storey et al. (2004), we show that this…

    Submitted 1 August, 2018; originally announced August 2018.

  23. arXiv:1807.10785

    math.ST

    The Sparse Variance Contamination Model

    Authors: Ery Arias-Castro, Rong Huang

    Abstract: We consider a Gaussian contamination (i.e., mixture) model where the contamination manifests itself as a change in variance. We study this model in various asymptotic regimes, in parallel with the work of Ingster (1997) and Donoho and ** (2004), who considered a similar model where the contamination was in the mean instead.

    Submitted 27 July, 2018; originally announced July 2018.

  24. arXiv:1804.10611

    math.ST math.MG

    On the Estimation of Latent Distances Using Graph Distances

    Authors: Ery Arias-Castro, Antoine Channarond, Bruno Pelletier, Nicolas Verzelen

    Abstract: We are given the adjacency matrix of a geometric graph and the task of recovering the latent positions. We study one of the most popular approaches which consists in using the graph distances and derive error bounds under various assumptions on the link function. In the simplest case where the link function is proportional to an indicator function, the bound matches an information lower bound that…

    Submitted 11 August, 2020; v1 submitted 27 April, 2018; originally announced April 2018.

  25. arXiv:1802.08715

    math.ST

    Detection of Sparse Mixtures: Higher Criticism and Scan Statistic

    Authors: Ery Arias-Castro, Andrew Ying

    Abstract: We consider the problem of detecting a sparse mixture as studied by Ingster (1997) and Donoho and ** (2004). We consider a wide array of base distributions. In particular, we study the situation when the base distribution has polynomial tails, a situation that has not received much attention in the literature. Perhaps surprisingly, we find that in the context of such a power-law distribution, the…

    Submitted 23 February, 2018; originally announced February 2018.

  26. arXiv:1711.11220

    math.ST stat.CO

    RANSAC Algorithms for Subspace Recovery and Subspace Clustering

    Authors: Ery Arias-Castro, Jue Wang

    Abstract: We consider the RANSAC algorithm in the context of subspace recovery and subspace clustering. We derive some theory and perform some numerical experiments. We also draw some correspondences with the methods of Hardt and Moitra (2013) and Chen and Lerman (2009b).

    Submitted 29 November, 2017; originally announced November 2017.

  27. arXiv:1706.09441

    cs.CG math.MG

    Unconstrained and Curvature-Constrained Shortest-Path Distances and their Approximation

    Authors: Ery Arias-Castro, Thibaut Le Gouic

    Abstract: We study shortest paths and their distances on a subset of a Euclidean space, and their approximation by their equivalents in a neighborhood graph defined on a sample from that subset. In particular, we recover and extend the results of Bernstein et al. (2000). We do the same with curvature-constrained shortest paths and their distances, establishing what we believe are the first approximation bou…

    Submitted 24 October, 2018; v1 submitted 28 June, 2017; originally announced June 2017.

  28. arXiv:1705.10190

    math.ST

    Sequential Multiple Testing

    Authors: Shiyun Chen, Ery Arias-Castro

    Abstract: We study an online multiple testing problem where the hypotheses arrive sequentially in a stream. The test statistics are independent and assumed to have the same distribution under their respective null hypotheses. We investigate two procedures, LORD and LOND, proposed by Javanmard and Montanari (2015), which were shown to control the FDR in an online manner. In some (static) model, we show that…

    Submitted 25 May, 2017; originally announced May 2017.

    Comments: arXiv admin note: text overlap with arXiv:1604.07520

  29. arXiv:1607.08156

    math.ST

    Remember the Curse of Dimensionality: The Case of Goodness-of-Fit Testing in Arbitrary Dimension

    Authors: Ery Arias-Castro, Bruno Pelletier, Venkatesh Saligrama

    Abstract: Despite a substantial literature, spanning decades, on nonparametric two-sample goodness-of-fit testing in arbitrary dimensions, there is no mention in it of any curse of dimensionality. Only recently have Ramdas et al. (2015) discussed this issue in the context of kernel methods by showing that their performance degrades with the dimension even when the underlying distributions are isotropic G…

    Submitted 11 September, 2018; v1 submitted 27 July, 2016; originally announced July 2016.

    Comments: This version comes after the publication of the paper in the Journal of Nonparametric Statistics. The main change is to cite the work of Ramdas et al. Some very minor typos were also corrected

  30. arXiv:1607.07549

    math.ST

    Concentration of Measure for Radial Distributions and Consequences for Statistical Modeling

    Authors: Ery Arias-Castro, Xiao Pu

    Abstract: Motivated by problems in high-dimensional statistics such as mixture modeling for classification and clustering, we consider the behavior of radial densities as the dimension increases. We establish a form of concentration of measure, and even a convergence in distribution, under additional assumptions. This extends the well-known behavior of the normal distribution (its concentration around the s…

    Submitted 11 September, 2016; v1 submitted 26 July, 2016; originally announced July 2016.

  31. arXiv:1605.01333

    math.ST stat.ME

    Minimax Estimation of the Volume of a Set with Smooth Boundary

    Authors: Ery Arias-Castro, Beatriz Pateiro-López, Alberto Rodríguez-Casal

    Abstract: We consider the problem of estimating the volume of a compact domain in a Euclidean space based on a uniform sample from the domain. We assume the domain has a boundary with positive reach. We propose a data splitting approach to correct the bias of the plug-in estimator based on the sample alpha-convex hull. We show that this simple estimator achieves a minimax lower bound that we derive. Some nu…

    Submitted 4 May, 2016; originally announced May 2016.

  32. arXiv:1604.07520

    math.ST

    Distribution-free Multiple Testing

    Authors: Ery Arias-Castro, Shiyun Chen

    Abstract: We study a stylized multiple testing problem where the test statistics are independent and assumed to have the same distribution under their respective null hypotheses. We first show that, in the normal means model where the test statistics are normal Z-scores, the well-known method of Benjamini and Hochberg (1995) is optimal in some asymptotic sense. We then show that this is also the case of a…

    Submitted 26 April, 2016; originally announced April 2016.

  33. arXiv:1604.07449

    math.ST stat.ME

    Distribution-free Detection of a Submatrix

    Authors: Ery Arias-Castro, Yuchao Liu

    Abstract: We consider the problem of detecting the presence of a submatrix with larger-than-usual values in a large data matrix. This problem was considered by Butucea and Ingster (2013) under a one-parameter exponential family, and one of the tests they analyzed is the scan test. Taking a nonparametric stance, we show that a calibration by permutation leads to the same (first-order) asymptotic performance.…

    Submitted 25 April, 2016; originally announced April 2016.

  34. arXiv:1603.05947

    math.ST

    Noisy Hypotheses in the Age of Discovery Science

    Authors: Ery Arias-Castro

    Abstract: We draw attention to one specific issue raised by Ioannidis (2005), that of very many hypotheses being tested in a given field of investigation. To better isolate the problem that arises in this (massive) multiple testing scenario, we consider a utopian setting where the hypotheses are tested with no additional bias. We show that, as the number of hypotheses being tested becomes much larger than t…

    Submitted 9 November, 2016; v1 submitted 18 March, 2016; originally announced March 2016.

  35. arXiv:1511.01009

    math.ST

    Detecting a Path of Correlations in a Network

    Authors: Ery Arias-Castro, Gábor Lugosi, Nicolas Verzelen

    Abstract: We consider the problem of detecting an anomaly in the form of a path of correlations hidden in white noise. We provide a minimax lower bound and a test that, under mild assumptions, is able to achieve the lower bound up to a multiplicative constant.

    Submitted 22 December, 2016; v1 submitted 3 November, 2015; originally announced November 2015.

    Comments: arXiv admin note: text overlap with arXiv:1504.06984

  36. arXiv:1509.05790

    math.ST

    On the Consistency of the Crossmatch Test

    Authors: Ery Arias-Castro, Bruno Pelletier

    Abstract: Rosenbaum (2005) proposed the crossmatch test for two-sample goodness-of-fit testing in arbitrary dimensions. We prove that the test is consistent against all fixed alternatives. In the process, we develop a general consistency result, based on Henze and Penrose (1999), that applies more generally.

    Submitted 18 September, 2015; originally announced September 2015.

  37. arXiv:1508.03002

    stat.ME math.ST

    Distribution-Free Detection of Structured Anomalies: Permutation and Rank-Based Scans

    Authors: Ery Arias-Castro, Rui M. Castro, Ervin Tánczos, Meng Wang

    Abstract: The scan statistic is by far the most popular method for anomaly detection, widely used in syndromic surveillance, signal and image processing, and target detection based on sensor networks, among other applications. The use of the scan statistic in such settings yields a hypothesis testing procedure, where the null hypothesis corresponds to the absence of anomalous behavior. If the null distri…

    Submitted 24 November, 2016; v1 submitted 12 August, 2015; originally announced August 2015.

  38. arXiv:1507.00065

    math.ST

    On Estimating the Perimeter Using the Alpha-Shape

    Authors: Ery Arias-Castro, Alberto Rodríguez Casal

    Abstract: We consider the problem of estimating the perimeter of a smooth domain in the plane based on a sample from the uniform distribution over the domain. We study the performance of the estimator defined as the perimeter of the alpha-shape of the sample. Some numerical experiments corroborate our theoretical findings.

    Submitted 30 June, 2015; originally announced July 2015.

  39. arXiv:1505.01247

    math.ST

    The Sparse Poisson Means Model

    Authors: Ery Arias-Castro, Meng Wang

    Abstract: We consider the problem of detecting a sparse Poisson mixture. Our results parallel those for the detection of a sparse normal mixture, pioneered by Ingster (1997) and Donoho and ** (2004), when the Poisson means are larger than logarithmic in the sample size. In particular, a form of higher criticism achieves the detection boundary in the whole sparse regime. When the Poisson means are smaller t…

    Submitted 5 May, 2015; originally announced May 2015.

  40. arXiv:1504.06984

    math.ST

    Detecting Markov Random Fields Hidden in White Noise

    Authors: Ery Arias-Castro, Sébastien Bubeck, Gábor Lugosi, Nicolas Verzelen

    Abstract: Motivated by change point problems in time series and the detection of textured objects in images, we consider the problem of detecting a piece of a Gaussian Markov random field hidden in white Gaussian noise. We derive minimax lower bounds and propose near-optimal tests.

    Submitted 14 October, 2015; v1 submitted 27 April, 2015; originally announced April 2015.

    Comments: In the 2nd version we removed the part on path detection, which will appear on its own in a separate paper

  41. arXiv:1501.02861

    math.ST

    Some theory for ordinal embedding

    Authors: Ery Arias-Castro

    Abstract: Motivated by recent work on ordinal embedding (Kleindessner and von Luxburg, 2014), we derive large sample consistency results and rates of convergence for the problem of embedding points based on triple or quadruple distance comparisons. We also consider a variant of this problem where only local comparisons are provided. Finally, inspired by Jamieson and Nowak (2011), we bound the number of suc…

    Submitted 4 May, 2016; v1 submitted 12 January, 2015; originally announced January 2015.

  42. arXiv:1409.7127

    math.ST

    Exact Asymptotics for the Scan Statistic and Fast Alternatives

    Authors: James Sharpnack, Ery Arias-Castro

    Abstract: We consider the problem of detecting a rectangle of activation in a grid of sensors in d-dimensions with noisy measurements. This has applications to massive surveillance projects and anomaly detection in large datasets in which one detects anomalously high measurements over rectangular regions, or more generally, blobs. Recently, the asymptotic distribution of a multiscale scan statistic was esta…

    Submitted 24 September, 2014; originally announced September 2014.

    MSC Class: 62F03

  43. arXiv:1405.1478

    math.ST stat.ME

    Detection and Feature Selection in Sparse Mixture Models

    Authors: Nicolas Verzelen, Ery Arias-Castro

    Abstract: We consider Gaussian mixture models in high dimensions and concentrate on the twin tasks of detection and feature selection. Under sparsity assumptions on the difference in means, we derive information bounds and establish the performance of various procedures, including the top sparse eigenvalue of the sample covariance matrix and other projection tests based on moments, such as the skewness and…

    Submitted 1 October, 2016; v1 submitted 6 May, 2014; originally announced May 2014.

    Comments: 70 pages

  44. arXiv:1308.2955

    math.ST stat.ML

    Community Detection in Sparse Random Networks

    Authors: Ery Arias-Castro, Nicolas Verzelen

    Abstract: We consider the problem of detecting a tight community in a sparse random network. This is formalized as testing for the existence of a dense random subgraph in a random graph. Under the null hypothesis, the graph is a realization of an Erdős-Rényi graph on $N$ vertices and with connection probability $p_0$; under the alternative, there is an unknown subgraph on $n$ vertices where the connection p…

    Submitted 25 September, 2014; v1 submitted 13 August, 2013; originally announced August 2013.

  45. arXiv:1308.0346

    math.ST

    Distribution-Free Tests for Sparse Heterogeneous Mixtures

    Authors: Ery Arias-Castro, Meng Wang

    Abstract: We consider the problem of detecting sparse heterogeneous mixtures from a nonparametric perspective, and develop distribution-free tests when all effects have the same sign. Specifically, we assume that the null distribution is symmetric about zero, while the true effects have positive median. We evaluate the precise performance of classical tests for the median (t-test, sign test) and classical t…

    Submitted 15 November, 2013; v1 submitted 1 August, 2013; originally announced August 2013.

  46. arXiv:1302.7099

    math.ST stat.ML

    Community Detection in Random Networks

    Authors: Ery Arias-Castro, Nicolas Verzelen

    Abstract: We formalize the problem of detecting a community in a network as testing whether a given (random) graph contains an unusually dense subgraph. We observe an undirected and unweighted graph on N nodes. Under the null hypothesis, the graph is a realization of an Erdős-Rényi graph with probability p0. Under the (composite) alternative, there is a subgraph of n nodes where the probability…

    Submitted 28 February, 2013; originally announced February 2013.

  47. arXiv:1208.6516

    cs.CV math.ST

    A two-stage denoising filter: the preprocessed Yaroslavsky filter

    Authors: Joseph Salmon, Rebecca Willett, Ery Arias-Castro

    Abstract: This paper describes a simple image noise removal method which combines a preprocessing step with the Yaroslavsky filter for strong numerical, visual, and theoretical performance on a broad class of images. The framework developed is a two-stage approach. In the first stage the image is filtered with a classical denoising method (e.g., wavelet or curvelet thresholding). In the second stage a modif…

    Submitted 31 August, 2012; originally announced August 2012.

    ACM Class: I.4.3; I.4.10; I.5.1; G.3

  48. arXiv:1208.2635

    math.ST

    Variable Selection with Exponential Weights and $l_0$-Penalization

    Authors: Ery Arias-Castro, Karim Lounici

    Abstract: In the context of a linear model with a sparse coefficient vector, exponential weights methods have been shown to achieve oracle inequalities for prediction. We show that such methods also succeed at variable selection and estimation under the necessary identifiability condition on the design matrix, instead of much stronger assumptions required by other methods such as the Lasso or the Dantzig…

    Submitted 16 September, 2012; v1 submitted 13 August, 2012; originally announced August 2012.

    Comments: 23 pages; 1 figure

  49. Detecting positive correlations in a multivariate sample

    Authors: Ery Arias-Castro, Sébastien Bubeck, Gábor Lugosi

    Abstract: We consider the problem of testing whether a correlation matrix of a multivariate normal population is the identity matrix. We focus on sparse classes of alternatives where only a few entries are nonzero and, in fact, positive. We derive a general lower bound applicable to various classes and study the performance of some near-optimal tests. We pay special attention to computational feasibility an…

    Submitted 14 April, 2015; v1 submitted 24 February, 2012; originally announced February 2012.

    Comments: Published at http://dx.doi.org/10.3150/13-BEJ565 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)

    Report number: IMS-BEJ-BEJ565

    Journal ref: Bernoulli 2015, Vol. 21, No. 1, 209-241

  50. arXiv:1112.6235

    math.ST cs.IT

    Detecting a Vector Based on Linear Measurements

    Authors: Ery Arias-Castro

    Abstract: We consider a situation where the state of a system is represented by a real-valued vector. Under normal circumstances, the vector is zero, while an event manifests as non-zero entries in this vector, possibly few. Our interest is in the design of algorithms that can reliably detect events (i.e., test whether the vector is zero or not) with the least amount of information. We place ourselves in a…

    Submitted 29 December, 2011; originally announced December 2011.
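Entry 32 (Distribution-free Multiple Testing) and entry 22 (A Scan Procedure for Multiple Testing) both build on the method of Benjamini and Hochberg (1995). For reference, here is a minimal Python sketch of the standard step-up rule. It is not taken from either paper. The function name `benjamini_hochberg` and the example p-values are illustrative.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices (in input order) of the hypotheses rejected by the BH step-up rule."""
    m = len(pvalues)
    # Ranks 1..m are assigned by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up: find the largest rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, alpha=0.05))  # [0, 1]
```

As entry 22 notes, this rule effectively works over intervals of p-values with an endpoint at the origin. The procedure proposed there instead scans all intervals.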
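Entries 25 (Detection of Sparse Mixtures: Higher Criticism and Scan Statistic) and 39 (The Sparse Poisson Means Model) use a form of higher criticism to detect a sparse mixture. As a point of reference, here is a minimal Python sketch of the statistic in one common form. It is computed from the sorted p-values and maximized over the smallest fraction `alpha0` of them. The exact variant analyzed in each paper may differ, for example in the range of the maximum. The value `alpha0 = 0.5` and the function name are illustrative assumptions.

```python
import math


def higher_criticism(pvalues, alpha0=0.5):
    """max over 1 <= i <= alpha0*n of sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))."""
    n = len(pvalues)
    p_sorted = sorted(pvalues)
    best = -math.inf
    for i in range(1, int(alpha0 * n) + 1):
        p = p_sorted[i - 1]
        if 0.0 < p < 1.0:  # skip degenerate p-values to avoid division by zero
            best = max(best, math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p)))
    return best


# Evenly spread (null-like) p-values versus the same list with two very small p-values.
null_like = [(i + 0.5) / 10 for i in range(10)]
contaminated = [1e-4, 1e-3] + null_like[2:]
print(higher_criticism(null_like), higher_criticism(contaminated))
```

Large values indicate that more small p-values are present than uniform p-values would typically produce. Calibrating the rejection threshold is a separate question that the sketch does not address.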
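Entries 8, 9 and 10 study clustering by gradient ascent on a density, in the line of Fukunaga and Hostetler. The mean-shift algorithm is the best-known instance of this approach. Here is a minimal one-dimensional Python sketch with a Gaussian kernel. Each point is repeatedly moved to the kernel-weighted mean of the data until it stops moving, and points that end at the same mode form a cluster. The kernel, bandwidth, tolerances and function names are illustrative assumptions, not choices made in these papers.

```python
import math


def mean_shift_1d(data, bandwidth, tol=1e-8, max_iter=500):
    """Run Gaussian-kernel mean-shift from every data point; return the limit of each."""
    modes = []
    for x in data:
        for _ in range(max_iter):
            weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data]
            x_new = sum(w * xi for w, xi in zip(weights, data)) / sum(weights)
            converged = abs(x_new - x) < tol
            x = x_new
            if converged:
                break
        modes.append(x)
    return modes


def group_by_mode(modes, merge_tol=1e-3):
    """Cluster labels: points whose limits lie within merge_tol share a label."""
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if abs(m - c) < merge_tol:
                labels.append(k)
                break
        else:  # no existing center is close enough: start a new cluster
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


data = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4]
modes = mean_shift_1d(data, bandwidth=0.5)
print(group_by_mode(modes))  # [0, 0, 0, 1, 1, 1]
```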
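Entry 11 (The Coreness and H-Index of Random Geometric Graphs) describes the coreness of a node as the maximal smallest degree of a subgraph containing it. This quantity can be computed by the standard peeling procedure. Repeatedly delete a node of minimum remaining degree, and give each deleted node the largest minimum degree seen up to that point. The Python sketch below assumes a simple undirected graph given as a dictionary of neighbor sets. It runs in quadratic time and is meant for illustration only.

```python
def coreness(adjacency):
    """Core number of each node by peeling.

    adjacency: dict mapping node -> set of neighbors (undirected, no self-loops).
    """
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Delete a node of minimum remaining degree; the running level k never decreases.
        v = min(remaining, key=lambda u: degree[u])
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adjacency[v]:
            if u in remaining:
                degree[u] -= 1
    return core


# A triangle a-b-c with a pendant node d attached to a.
graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(sorted(coreness(graph).items()))  # [('a', 2), ('b', 2), ('c', 2), ('d', 1)]
```

Ties among minimum-degree nodes may be broken arbitrarily. The resulting core numbers do not depend on the deletion order.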
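Entry 12 (Estimation of the Global Mode of a Density) shows that the maximum of a histogram, with a properly chosen bandwidth, achieves the minimax rate when the decay rate is known. The estimator itself is simple to state. Here is a one-dimensional Python sketch that assumes a regular grid anchored at the sample minimum, which is an illustrative choice. Choosing the bandwidth is the crux of the paper, as is the multiscale variant for an unknown decay rate, so the sketch leaves the bandwidth to the caller. The function name is hypothetical.

```python
def histogram_mode(data, bandwidth):
    """Midpoint of the fullest bin of a regular 1-D histogram with bin width `bandwidth`.

    Bins are [origin + j*bandwidth, origin + (j+1)*bandwidth) with origin = min(data);
    anchoring the grid at the sample minimum is an illustrative assumption.
    """
    origin = min(data)
    counts = {}
    for x in data:
        j = int((x - origin) // bandwidth)  # index of the bin containing x
        counts[j] = counts.get(j, 0) + 1
    # Fullest bin; ties are broken in favor of the leftmost bin.
    best = max(counts, key=lambda j: (counts[j], -j))
    return origin + (best + 0.5) * bandwidth


sample = [0.0, 0.3, 1.1, 1.2, 1.3, 1.4, 2.7]
print(histogram_mode(sample, bandwidth=0.5))  # 1.25
```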
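Entry 42 (Exact Asymptotics for the Scan Statistic and Fast Alternatives) concerns detecting a rectangle of activation in a grid of sensors. In one dimension the rectangles become intervals. A common form of the statistic for Gaussian noise with unit variance is the maximum over intervals of the sum of the observations divided by the square root of the interval length. The Python sketch below computes this one-dimensional version by brute force with prefix sums. The normalization, the length limits and the function name are assumptions for illustration. The fast alternatives studied in the paper are not reproduced here.

```python
import math
from itertools import accumulate


def interval_scan(x, min_len=1, max_len=None):
    """Return (score, start, end) maximizing sum(x[start:end]) / sqrt(end - start)."""
    n = len(x)
    max_len = n if max_len is None else max_len
    prefix = [0.0] + list(accumulate(x))  # prefix[j] = x[0] + ... + x[j-1]
    best = (-math.inf, 0, 0)
    for start in range(n):
        for end in range(start + min_len, min(n, start + max_len) + 1):
            score = (prefix[end] - prefix[start]) / math.sqrt(end - start)
            if score > best[0]:
                best = (score, start, end)
    return best


x = [0.1, -0.2, 3.0, 3.0, 3.0, -0.1, 0.2]
print(interval_scan(x))  # best interval is x[2:5]
```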
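Entry 15 (On the Consistency of Metric and Non-Metric K-medoids) establishes the consistency of K-medoids, where the cluster centers must be sample points, under general conditions that include a wide selection of loss functions. Those results concern the K-medoids optimum. The simple alternating heuristic sketched below in Python only finds a local optimum: it assigns each point to its nearest medoid, then re-selects each medoid as the cluster member with the smallest total distance. The naive initialization, the `dist` argument and the function name are illustrative assumptions.

```python
def k_medoids(points, k, dist, max_iter=100):
    """Alternating K-medoids heuristic; returns the sorted indices of the medoids.

    Assumes the points are distinct, so every medoid is its own unique nearest medoid.
    """
    n = len(points)
    medoids = list(range(k))  # naive initialization: the first k points
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist(points[i], points[m]))
            clusters[nearest].append(i)
        # Update step: in each cluster, pick the member with the smallest total distance.
        new_medoids = [
            min(members, key=lambda c: sum(dist(points[c], points[j]) for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
print(k_medoids(points, 2, lambda a, b: abs(a - b)))  # [1, 4]
```

Any dissimilarity can be passed as `dist`. This reflects the point of entry 15 that medoids, unlike means, make sense beyond Euclidean spaces.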