Search | arXiv e-print repository

Subtractive random forests with two choices

Authors: Francisco Calvillo, Luc Devroye, Gábor Lugosi

Abstract: Recommendation systems are pivotal in aiding users amid vast online content. Broutin, Devroye, Lugosi, and Oliveira proposed Subtractive Random Forests (\textsc{surf}), a model that emphasizes temporal user preferences. Expanding on \textsc{surf}, we introduce a model for a multi-choice recommendation system, enabling users to select from two independent suggestions based on past interactions. We evaluate its effectiveness and robustness across diverse scenarios, incorporating heavy-tailed distributions for time delays. By analyzing user topic evolution, we assess the system's consistency. Our study offers insights into the performance and potential enhancements of multi-choice recommendation systems in practical settings.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.10412 [pdf, other]

Property testing in graphical models: testing small separation numbers

Authors: Luc Devroye, Gábor Lugosi, Piotr Zwiernik

Abstract: In many statistical applications, the dimension is too large to handle for standard high-dimensional machine learning procedures. This is particularly true for graphical models, where the interpretation of a large graph is difficult and learning its structure is often computationally impossible either because the underlying graph is not sufficiently sparse or the number of vertices is too large. To address this issue, we develop a procedure to test a property of a graph underlying a graphical model that requires only a subquadratic number of correlation queries (i.e., we require that the algorithm only can access a tiny fraction of the covariance matrix). This provides a conceptually simple test to determine whether the underlying graph is a tree or, more generally, if it has a small separation number, a quantity closely related to the treewidth of the graph. The proposed method is a divide-and-conquer algorithm that can be applied to quite general graphical models.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2404.04462 [pdf, ps, other]

On the size of temporal cliques in subcritical random temporal graphs

Authors: Caelan Atamanchuk, Luc Devroye, Gabor Lugosi

Abstract: A \emph{random temporal graph} is an Erdős-Rényi random graph $G(n,p)$, together with a random ordering of its edges. A path in the graph is called \emph{increasing} if the edges on the path appear in increasing order. A set $S$ of vertices forms a \emph{temporal clique} if for all $u,v \in S$, there is an increasing path from $u$ to $v$. \cite{Becker2023} proved that if $p=c\log n/n$ for $c>1$, then, with high probability, there is a temporal clique of size $n-o(n)$. On the other hand, for $c<1$, with high probability, the largest temporal clique is of size $o(n)$. In this note we improve the latter bound by showing that, for $c<1$, the largest temporal clique is of \emph{constant} size with high probability.

Submitted 28 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.01545 [pdf, other]

Burning Random Trees

Authors: Luc Devroye, Austin Eide, Pawel Pralat

Abstract: Let $\mathcal{T}$ be a Galton-Watson tree with a given offspring distribution $ξ$, where $ξ$ is a $Z_{\geq 0}$-valued random variable with $E[ξ] = 1$ and $0 < σ^{2}:=Var[ξ] < \infty$. For $n \geq 1$, let $T_{n}$ be the tree $\mathcal{T}$ conditioned to have $n$ vertices. In this paper we investigate $b(T_n)$, the burning number of $T_n$. Our main result shows that asymptotically almost surely $b(T_n)$ is of the order of $n^{1/3}$.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 11 pages

arXiv:2403.20185 [pdf, other]

Random friend trees

Authors: Louigi Addario Berry, Simon Briend, Luc Devroye, Serte Donderwinkel, Céline Kerriou, Gábor Lugosi

Abstract: We study a random recursive tree model featuring complete redirection called the random friend tree and introduced by Saramäki and Kaski. Vertices are attached in a sequential manner one by one by selecting an existing target vertex and connecting to one of its neighbours (or friends), chosen uniformly at random. This model has interesting emergent properties, such as a highly skewed degree sequence. In contrast to the preferential attachment model, these emergent phenomena stem from a local rather than a global attachment mechanism. The structure of the resulting tree is also strikingly different from both the preferential attachment tree and the uniform random recursive tree: every edge is incident to a macro-hub of asymptotically linear degree, and with high probability all but at most $n^{9/10}$ vertices in a tree of size $n$ are leaves. We prove various results on the neighbourhood of fixed vertices and edges, and we study macroscopic properties such as the diameter and the degree distribution, providing insights into the overall structure of the tree. We also present a number of open questions on this model and related models.

Submitted 29 March, 2024; originally announced March 2024.

Comments: 36 pages, 4 figures

MSC Class: 60C05 60J80 05C05
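
The attachment rule in the abstract above can be sketched in a few lines of Python. This is only an illustrative simulation, not the authors' code: the seed tree (a single edge between vertices 0 and 1), the uniform choice of the target vertex, and the function name `random_friend_tree` are assumptions made here for the example.

```python
import random

def random_friend_tree(n, rng):
    """Adjacency lists of a random friend tree on n >= 2 vertices.

    Assumed seed: a single edge {0, 1}. Each new vertex picks a uniformly
    random existing target, then attaches to a uniformly random neighbour
    (friend) of that target.
    """
    adj = [[1], [0]]
    for new in range(2, n):
        target = rng.randrange(new)        # uniform existing vertex
        friend = rng.choice(adj[target])   # uniform neighbour of the target
        adj.append([friend])
        adj[friend].append(new)
    return adj

rng = random.Random(1)
adj = random_friend_tree(10_000, rng)
leaves = sum(1 for nbrs in adj if len(nbrs) == 1)
print("fraction of leaves:", leaves / len(adj))
```

Printing the leaf fraction for growing `n` gives a rough empirical view of the claim that almost all vertices are leaves.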

arXiv:2311.13059 [pdf, ps, other]

A note on estimating the dimension from a random geometric graph

Authors: Caelan Atamanchuk, Luc Devroye, Gabor Lugosi

Abstract: Let $G_n$ be a random geometric graph with vertex set $[n]$ based on $n$ i.i.d.\ random vectors $X_1,\ldots,X_n$ drawn from an unknown density $f$ on $\R^d$. An edge $(i,j)$ is present when $\|X_i -X_j\| \le r_n$, for a given threshold $r_n$ possibly depending upon $n$, where $\| \cdot \|$ denotes Euclidean distance. We study the problem of estimating the dimension $d$ of the underlying space when we have access to the adjacency matrix of the graph but do not know $r_n$ or the vectors $X_i$. The main result of the paper is that there exists an estimator of $d$ that converges to $d$ in probability as $n \to \infty$ for all densities with $\int f^5 < \infty$ whenever $n^{3/2} r_n^d \to \infty$ and $r_n = o(1)$. The conditions allow very sparse graphs since when $n^{3/2} r_n^d \to 0$, the graph contains isolated edges only, with high probability. We also show that, without any condition on the density, a consistent estimator of $d$ exists when $n r_n^d \to \infty$ and $r_n = o(1)$.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2310.16715 [pdf, ps, other]

An Algorithm to Recover Shredded Random Matrices

Authors: Caelan Atamanchuk, Luc Devroye, Massimo Vicenzo

Abstract: Given some binary matrix $M$, suppose we are presented with the collection of its rows and columns in independent arbitrary orderings. From this information, are we able to recover the unique original orderings and matrix? We present an algorithm that identifies whether there is a unique ordering associated with a set of rows and columns, and outputs either the unique correct orderings for the rows and columns or the full collection of all valid orderings and valid matrices. We show that there is a constant $c > 0$ such that the algorithm terminates in $O(n^2)$ time with high probability and in expectation for random $n \times n$ binary matrices with i.i.d.\ Bernoulli $(p)$ entries $(m_{ij})_{ij=1}^n$ such that $\frac{c\log^2(n)}{n(\log\log(n))^2} \leq p \leq \frac{1}{2}$.

Submitted 23 April, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

MSC Class: 60C05 (Primary) 68Q25 (Secondary)

arXiv:2304.03741 [pdf, other]

A Proletarian Approach to Generating Eigenvalues of GUE Matrices

Authors: Luc Devroye, Jad Hamdan

Abstract: We propose a simple algorithm to generate random variables described by densities equaling squared Hermite functions. Using results from random matrix theory, we utilize this to generate a randomly chosen eigenvalue of a matrix from the Gaussian Unitary Ensemble (GUE) in sublinear expected time in the RAM model.

Submitted 14 September, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

Comments: 12 pages, 2 figures

arXiv:2210.10544 [pdf, ps, other]

Subtractive random forests

Authors: Nicolas Broutin, Luc Devroye, Gabor Lugosi, Roberto Imbuzeiro Oliveira

Abstract: Motivated by online recommendation systems, we study a family of random forests. The vertices of the forest are labeled by integers. Each non-positive integer $i\le 0$ is the root of a tree. Vertices labeled by positive integers $n \ge 1$ are attached sequentially such that the parent of vertex $n$ is $n-Z_n$, where the $Z_n$ are i.i.d.\ random variables taking values in $\mathbb N$. We study several characteristics of the resulting random forest. In particular, we establish bounds for the expected tree sizes, the number of trees in the forest, the number of leaves, the maximum degree, and the height of the forest. We show that for all distributions of the $Z_n$, the forest contains at most one infinite tree, almost surely. If ${\mathbb E} Z_n < \infty$, then there is a unique infinite tree and the total size of the remaining trees is finite, with finite expected value if ${\mathbb E}Z_n^2 < \infty$. If ${\mathbb E} Z_n = \infty$ then almost surely all trees are finite.

Submitted 25 February, 2024; v1 submitted 19 October, 2022; originally announced October 2022.
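
The construction in the abstract above (the parent of vertex $n$ is $n-Z_n$) lends itself to a short simulation. The following is a minimal sketch, assuming $Z_n \ge 1$ and using a geometric distribution purely as a hypothetical example of a finite-mean choice for $Z_n$; the helper names `surf_parents`, `root_of`, and `geometric` are illustrative and do not come from the paper.

```python
import random

def surf_parents(n, sample_z, rng):
    """Map each vertex v in 1..n to its parent v - Z_v (Z_v >= 1 is assumed)."""
    return {v: v - sample_z(rng) for v in range(1, n + 1)}

def root_of(v, parents):
    """Follow parent pointers until a non-positive integer (a root) is reached."""
    while v > 0:
        v = parents[v]
    return v

def geometric(rng, p=0.5):
    """Hypothetical choice of Z: geometric on {1, 2, ...} with success probability p."""
    z = 1
    while rng.random() > p:
        z += 1
    return z

rng = random.Random(0)
parents = surf_parents(1000, geometric, rng)
roots = {root_of(v, parents) for v in parents}
print(len(roots), "distinct roots are reached from vertices 1..1000")
```

Swapping `geometric` for a heavy-tailed sampler is one way to explore empirically how the number of trees changes when ${\mathbb E} Z_n = \infty$.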

arXiv:2203.08006 [pdf, ps, other]

Estimating monotone densities by cellular binary trees

Authors: Luc Devroye, Jad Hamdan

Abstract: We propose a novel, simple density estimation algorithm for bounded monotone densities with compact support under a cellular restriction. We show that its expected error ($L_1$ distance) converges at a rate of $n^{-1/3}$, that its expected runtime is sublinear and, in doing so, find a connection to the theory of Galton--Watson processes.

Submitted 15 March, 2022; originally announced March 2022.

arXiv:2106.14389 [pdf, ps, other]

On the peel number and the leaf-height of a Galton-Watson tree

Authors: Luc Devroye, Marcel K. Goh, Rosie Y. Zhao

Abstract: We study several parameters of a random Bienaymé-Galton-Watson tree $T_n$ of size $n$ defined in terms of an offspring distribution $ξ$ with mean $1$ and nonzero finite variance $σ^2$. Let $f(s)={\bf E}\{s^ξ\}$ be the generating function of the random variable $ξ$. We show that the independence number is in probability asymptotic to $qn$, where $q$ is the unique solution to $q = f(1-q)$. One of the many algorithms for finding the largest independent set of nodes uses a notion of repeated peeling away of all leaves and their parents. The number of rounds of peeling is shown to be in probability asymptotic to $\log n / \log\bigl(1/f'(1-q)\bigr)$. Finally, we study a related parameter which we call the leaf-height. Also sometimes called the protection number, this is the maximal shortest path length between any node and a leaf in its subtree. If $p_1 = {\bf P}\{ξ=1\}>0$, then we show that the maximum leaf-height over all nodes in $T_n$ is in probability asymptotic to $\log n/\log(1/p_1)$. If $p_1 = 0$ and $κ$ is the first integer $i>1$ with ${\bf P}\{ξ=i\}>0$, then the leaf-height is in probability asymptotic to $\log_κ\log n$.

Submitted 12 April, 2022; v1 submitted 28 June, 2021; originally announced June 2021.

Comments: 20 pages, 5 figures, 1 table, revised according to referee suggestions, added missing "the" in the arXiv metadata title

MSC Class: 60J80

Journal ref: Combinatorics, Probability and Computing 32 (2023), 68-90

arXiv:2105.12046 [pdf, ps, other]

doi 10.46298/dmtcs.7515

Leaf multiplicity in a Bienaymé-Galton-Watson tree

Authors: Anna M. Brandenberger, Luc Devroye, Marcel K. Goh, Rosie Y. Zhao

Abstract: This note defines a notion of multiplicity for nodes in a rooted tree and presents an asymptotic calculation of the maximum multiplicity over all leaves in a Bienaymé-Galton-Watson tree with critical offspring distribution $ξ$, conditioned on the tree being of size $n$. In particular, we show that if $S_n$ is the maximum multiplicity in a conditional Bienaymé-Galton-Watson tree, then $S_n = Ω(\log n)$ asymptotically in probability and under the further assumption that ${\bf E}\{2^ξ\} < \infty$, we have $S_n = O(\log n)$ asymptotically in probability as well. Explicit formulas are given for the constants in both bounds. We conclude by discussing links with an alternate definition of multiplicity that arises in the root-estimation problem.

Submitted 21 March, 2022; v1 submitted 25 May, 2021; originally announced May 2021.

Comments: 17 pages, 6 figures, final journal version

Journal ref: Discrete Mathematics & Theoretical Computer Science, vol. 24, no. 1, Analysis of Algorithms (March 30, 2022) dmtcs:7515

arXiv:2105.01108

Consistent Density Estimation Under Discrete Mixture Models

Authors: Luc Devroye, Alex Dytso

Abstract: This work considers a problem of estimating a mixing probability density $f$ in the setting of discrete mixture models. The paper consists of three parts. The first part focuses on the construction of an $L_1$ consistent estimator of $f$. In particular, under the assumptions that the probability measure $μ$ of the observation is atomic, and the map from $f$ to $μ$ is bijective, it is shown that there exists an estimator $f_n$ such that for every density $f$ $\lim_{n\to \infty} \mathbb{E} \left[ \int |f_n -f | \right]=0$. The second part discusses the implementation details. Specifically, it is shown that the consistency for every $f$ can be attained with a computationally feasible estimator. The third part, as a study case, considers a Poisson mixture model. In particular, it is shown that in the Poisson noise setting, the bijection condition holds and, hence, estimation can be performed consistently for every $f$.

Submitted 10 May, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

Comments: Reason for withdrawal: There is an issue with the proof of Theorem~1

arXiv:2102.12952 [pdf, ps, other]

On the consistency of the Kozachenko-Leonenko entropy estimate

Authors: Luc Devroye, László Györfi

Abstract: We revisit the problem of the estimation of the differential entropy $H(f)$ of a random vector $X$ in $R^d$ with density $f$, assuming that $H(f)$ exists and is finite. In this note, we study the consistency of the popular nearest neighbor estimate $H_n$ of Kozachenko and Leonenko. Without any smoothness condition we show that the estimate is consistent ($E\{|H_n - H(f)|\} \to 0$ as $n \to \infty$) if and only if $\mathbb{E} \{ \log ( \| X \| + 1 )\} < \infty$. Furthermore, if $X$ has compact support, then $H_n \to H(f)$ almost surely.

Submitted 25 February, 2021; originally announced February 2021.

arXiv:2010.11537 [pdf, ps, other]

On Mean Estimation for Heteroscedastic Random Variables

Authors: Luc Devroye, Silvio Lattanzi, Gabor Lugosi, Nikita Zhivotovskiy

Abstract: We study the problem of estimating the common mean $μ$ of $n$ independent symmetric random variables with different and unknown standard deviations $σ_1 \le σ_2 \le \cdots \leσ_n$. We show that, under some mild regularity assumptions on the distribution, there is a fully adaptive estimator $\widehatμ$ such that it is invariant to permutations of the elements of the sample and satisfies that, up to logarithmic factors, with high probability, \[ |\widehatμ - μ| \lesssim \min\left\{σ_{m^*}, \frac{\sqrt{n}}{\sum_{i = \sqrt{n}}^n σ_i^{-1}} \right\}~, \] where the index $m^* \lesssim \sqrt{n}$ satisfies $m^* \approx \sqrt{σ_{m^*}\sum_{i = m^*}^nσ_i^{-1}}$.

Submitted 22 October, 2020; originally announced October 2020.

Comments: 29 pages

arXiv:2010.08613 [pdf, ps, other]

The Horton-Strahler Number of Conditioned Galton-Watson Trees

Authors: Anna M. Brandenberger, Luc Devroye, Tommy Reddad

Abstract: The Horton-Strahler number of a tree is a measure of its branching complexity; it is also known in the literature as the register function. We show that for critical Galton-Watson trees with finite variance conditioned to be of size $n$, the Horton-Strahler number grows as $\frac{1}{2}\log_2 n$ in probability. We further define some generalizations of this number. Among these are the rigid Horton-Strahler number and the $k$-ary register function, for which we prove asymptotic results analogous to the standard case.

Submitted 16 October, 2020; originally announced October 2020.

Comments: 26 pages, 3 figures

MSC Class: 60C05; 60J80 (Primary) 05C80; 05C05 (Secondary)
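
The abstract above does not spell out the definition of the Horton-Strahler number, so the sketch below assumes the commonly used one: a leaf has number 0, and an internal node takes the maximum $m$ of its children's numbers, increased by one when at least two children attain $m$. The representation of the tree as a dictionary of child lists and the function name `horton_strahler` are choices made for this example only.

```python
def horton_strahler(children, root=0):
    """Horton-Strahler number of a rooted tree given as {node: [children]}.

    Assumed definition: leaves get 0; an internal node gets the maximum m of
    its children's values, plus 1 if that maximum is attained at least twice.
    """
    # Preorder traversal; reversing it visits every child before its parent.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))

    hs = {}
    for v in reversed(order):
        vals = [hs[c] for c in children.get(v, [])]
        if not vals:
            hs[v] = 0
        else:
            m = max(vals)
            hs[v] = m + 1 if vals.count(m) >= 2 else m
    return hs[root]

# A path has no branching; a complete binary tree of height 3 branches fully.
path = {0: [1], 1: [2], 2: [3]}
complete = {i: [2 * i + 1, 2 * i + 2] for i in range(7)}
print(horton_strahler(path), horton_strahler(complete))
```

The traversal is iterative so that deep trees, such as long paths, do not hit Python's recursion limit.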

arXiv:2007.05681 [pdf, ps, other]

Root estimation in Galton-Watson trees

Authors: Anna M. Brandenberger, Luc Devroye, Marcel K. Goh

Abstract: Given only the free-tree structure of a tree, the root estimation problem asks if one can guess which of the free tree's nodes is the root of the original tree. We determine the maximum-likelihood estimator for the root of a free tree when the underlying tree is a size-conditioned Galton-Watson tree and calculate its probability of being correct.

Submitted 17 August, 2021; v1 submitted 11 July, 2020; originally announced July 2020.

Comments: 24 pages, TeX; the main change is that Section 3 has been removed and its proofs incorporated into other sections; various other small fixes and changes for readability

Journal ref: Random Structures and Algorithms 61 (2022), 520-542

arXiv:2006.11787 [pdf, other]

Broadcasting on random recursive trees

Authors: Louigi Addario-Berry, Luc Devroye, Gabor Lugosi, Vasiliki Velona

Abstract: We study the broadcasting problem when the underlying tree is a random recursive tree. The root of the tree has a random bit value assigned. Every other vertex has the same bit value as its parent with probability $1-q$ and the opposite value with probability $q$, where $q \in [0,1]$. The broadcasting problem consists in estimating the value of the root bit upon observing the unlabeled tree, together with the bit value associated with every vertex. In a more difficult version of the problem, the unlabeled tree is observed but only the bit values of the leaves are observed. When the underlying tree is a uniform random recursive tree, in both variants of the problem we characterize the values of $q$ for which the optimal reconstruction method has a probability of error bounded away from $1/2$. We also show that the probability of error is bounded by a constant times $q$. Two simple reconstruction rules are analyzed in detail. One of them is the simple majority vote, the other is the bit value of the centroid of the tree. Most results are extended to linear preferential attachment trees as well.

Submitted 24 April, 2021; v1 submitted 21 June, 2020; originally announced June 2020.
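
To make the broadcasting model and the majority-vote rule from the abstract above concrete, here is a small Monte Carlo sketch. It assumes the usual construction of a uniform random recursive tree (vertex $i$ attaches to a uniformly chosen earlier vertex) and breaks ties in the vote towards 0; the function names and the tie-breaking rule are assumptions of this example, not taken from the paper.

```python
import random

def broadcast_on_urrt(n, q, rng):
    """Grow a uniform random recursive tree on n vertices and broadcast a bit.

    Vertex i >= 1 attaches to a uniform vertex among 0..i-1 and copies its
    parent's bit, flipping it with probability q.
    """
    parent = [None] + [rng.randrange(i) for i in range(1, n)]
    bits = [rng.randrange(2)]  # random root bit
    for i in range(1, n):
        flip = int(rng.random() < q)
        bits.append(bits[parent[i]] ^ flip)
    return parent, bits

def majority_vote(bits):
    """Guess the root bit as the majority bit (ties go to 0, an assumption)."""
    return 1 if 2 * sum(bits) > len(bits) else 0

def error_rate(n, q, trials, rng):
    """Fraction of trials in which the majority vote misses the root bit."""
    errors = 0
    for _ in range(trials):
        _, bits = broadcast_on_urrt(n, q, rng)
        errors += majority_vote(bits) != bits[0]
    return errors / trials

rng = random.Random(2)
print("estimated error of majority vote:", error_rate(501, 0.1, 200, rng))
```

Varying `q` in `error_rate` gives a rough empirical picture of how the error of the majority rule depends on the flip probability.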

arXiv:2005.01242 [pdf, other]

Probabilistic Analysis of RRT Trees

Authors: Konrad Anand, Luc Devroye

Abstract: This thesis presents analysis of the properties and run-time of the Rapidly-exploring Random Tree (RRT) algorithm. It is shown that the time for the RRT with stepsize $ε$ to grow close to every point in the $d$-dimensional unit cube is $Θ\left(\frac1{ε^d} \log \left(\frac1ε\right)\right)$. Also, the time it takes for the tree to reach a region of positive probability is $O\left(ε^{-\frac32}\right)$. Finally, a relationship is shown to the Nearest Neighbour Tree (NNT). This relationship shows that the total Euclidean path length after $n$ steps is $O(\sqrt n)$ and the expected height of the tree is bounded above by $(e + o(1)) \log n$.

Submitted 3 May, 2020; originally announced May 2020.

Comments: 29 pages, 10 figures, submitted to The International Journal of Robotics Research

arXiv:1909.07367 [pdf, other]

Hipster random walks

Authors: Louigi Addario-Berry, Hannah Cairns, Luc Devroye, Celine Kerriou, Rivka Mitchell

Abstract: We introduce and study a family of random processes on trees we call hipster random walks, special instances of which we heuristically connect to the min-plus binary trees introduced by Robin Pemantle and studied by Auffinger and Cable (2017; arXiv:1709.07849), and to the critical random hierarchical lattice studied by Hambly and Jordan (2004). We prove distributional convergence for the processes by showing that their evolutions can be understood as discrete analogues of certain convection-diffusion equations, then using a combination of coupling arguments and results from the numerical analysis literature on convergence of numerical approximations of PDEs.

Submitted 16 September, 2019; originally announced September 2019.

Comments: 28 pages

MSC Class: 60F05; 60K35 (Primary); 65M12; 35K65 (Secondary)

arXiv:1812.06063 [pdf, ps, other]

Discrete minimax estimation with trees

Authors: Luc Devroye, Tommy Reddad

Abstract: We propose a simple recursive data-based partitioning scheme which produces piecewise-constant or piecewise-linear density estimates on intervals, and show how this scheme can determine the optimal $L_1$ minimax rate for some discrete nonparametric classes.

Submitted 27 June, 2019; v1 submitted 14 December, 2018; originally announced December 2018.

MSC Class: 60G07

arXiv:1810.08693 [pdf, ps, other]

The total variation distance between high-dimensional Gaussians with the same mean

Authors: Luc Devroye, Abbas Mehrabian, Tommy Reddad

Abstract: Given two high-dimensional Gaussians with the same mean, we prove a lower and an upper bound for their total variation distance, which are within a constant factor of one another.

Submitted 22 October, 2023; v1 submitted 19 October, 2018; originally announced October 2018.

Comments: In an earlier version, tight bounds were claimed for the total-variation distance between two general Gaussians. But the proof of the upper bound was incorrect, and we removed the flawed bound from the paper. Later, Arbas, Ashtiani, and Liaw (Theorem 1.8 in arxiv.longhoe.net/abs/2303.04288v2) proved tight bounds for the total-variation distance between two general Gaussians, solving the original problem

arXiv:1810.00969 [pdf, other]

On the discovery of the seed in uniform attachment trees

Authors: Luc Devroye, Tommy Reddad

Abstract: We investigate the size of vertex confidence sets for including part of (or the entirety of) the seed in seeded uniform attachment trees, given knowledge of some of the seed's properties, and with a prescribed probability of failure. We also study the problem of identifying the leaves of a seed in a seeded uniform attachment tree, given knowledge of the positions of all internal nodes of the seed.

Submitted 22 February, 2019; v1 submitted 1 October, 2018; originally announced October 2018.

MSC Class: 05C80

arXiv:1806.06887 [pdf, ps, other]

The Minimax Learning Rates of Normal and Ising Undirected Graphical Models

Authors: Luc Devroye, Abbas Mehrabian, Tommy Reddad

Abstract: Let $G$ be an undirected graph with $m$ edges and $d$ vertices. We show that $d$-dimensional Ising models on $G$ can be learned from $n$ i.i.d. samples within expected total variation distance some constant factor of $\min\{1, \sqrt{(m + d)/n}\}$, and that this rate is optimal. We show that the same rate holds for the class of $d$-dimensional multivariate normal undirected graphical models with respect to $G$. We also identify the optimal rate of $\min\{1, \sqrt{m/n}\}$ for Ising models with no external magnetic field.

Submitted 3 June, 2020; v1 submitted 18 June, 2018; originally announced June 2018.

Comments: Accepted in the Electronic Journal of Statistics; 24 pages

MSC Class: 62G07; 82B20

arXiv:1805.09425 [pdf, other]

Recursive functions on conditional Galton--Watson trees

Authors: Nicolas Broutin, Luc Devroye, Nicolas Fraiman

Abstract: A recursive function on a tree is a function in which each leaf has a given value, and each internal node has a value equal to a function of the number of children, the values of the children, and possibly an explicitly specified random element $U$. The value of the root is the key quantity of interest in general. In this first study, all node values and function values are in a finite set $S$. In this note, we describe the limit behavior when the leaf values are drawn independently from a fixed distribution on $S$, and the tree $T_n$ is a random Galton--Watson tree of size $n$.

Submitted 23 March, 2020; v1 submitted 23 May, 2018; originally announced May 2018.

Comments: 13 pages, 1 figure

arXiv:1804.03069 [pdf, other]

doi 10.1214/19-EJP318

K-cut on paths and some trees

Authors: Xing Shi Cai, Luc Devroye, Cecilia Holmgren, Fiona Skerman

Abstract: We define the (random) $k$-cut number of a rooted graph to model the difficulty of the destruction of a resilient network. The process is the same as the cut model of Meir and Moon, except that a node must now be cut $k$ times before it is destroyed. The first order terms of the expectation and variance of $\mathcal{X}_{n}$, the $k$-cut number of a path of length $n$, are proved. We also show that $\mathcal{X}_{n}$, after rescaling, converges in distribution to a limit $\mathcal{B}_{k}$, which has a complicated representation. The paper then briefly discusses the $k$-cut number of some trees and general graphs. We conclude with some analytic results which may be of interest.

Submitted 30 January, 2019; v1 submitted 9 April, 2018; originally announced April 2018.

Comments: The paper was originally titled "Cutting resilient networks"

MSC Class: 60C05

arXiv:1712.07775 [pdf, ps, other]

Local optima of the Sherrington-Kirkpatrick Hamiltonian

Authors: Louigi Addario-Berry, Luc Devroye, Gabor Lugosi, Roberto Imbuzeiro Oliveira

Abstract: We study local optima of the Hamiltonian of the Sherrington-Kirkpatrick model. We compute the exponent of the expected number of local optima and determine the "typical" value of the Hamiltonian.

Submitted 20 December, 2017; originally announced December 2017.

Comments: 20 pages

arXiv:1708.08891 [pdf, ps, other]

A lower bound on the size of an absorbing set in an arc-coloured tournament

Authors: Laurent Beaudou, Luc Devroye, Gena Hahn

Abstract: Bousquet, Lochet and Thomassé recently gave an elegant proof that for any integer $n$, there is a least integer $f(n)$ such that any tournament whose arcs are coloured with $n$ colours contains a subset of vertices $S$ of size $f(n)$ with the property that any vertex not in $S$ admits a monochromatic path to some vertex of $S$. In this note we provide a lower bound on the value $f(n)$.

Submitted 30 August, 2017; v1 submitted 29 August, 2017; originally announced August 2017.

arXiv:1707.00083 [pdf, other]

Notes on Growing a Tree in a Graph

Authors: Luc Devroye, Vida Dujmović, Alan Frieze, Abbas Mehrabian, Pat Morin, Bruce Reed

Abstract: We study the height of a spanning tree $T$ of a graph $G$ obtained by starting with a single vertex of $G$ and repeatedly selecting, uniformly at random, an edge of $G$ with exactly one endpoint in $T$ and adding this edge to $T$.

Submitted 4 July, 2017; v1 submitted 30 June, 2017; originally announced July 2017.

Comments: Updated grant acknowledgement
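
The growth process in the abstract above can be sketched in a few lines of Python. This is only an illustrative simulation, not code from the paper: the function name `grow_random_tree`, the adjacency-dictionary input format, and the assumption that the graph is simple and connected are all choices made here for the example.

```python
import random

def grow_random_tree(adj, root, rng):
    """Grow a spanning tree of a connected simple graph (hypothetical helper).

    adj  : dict mapping each vertex to a list of its neighbours
    root : starting vertex
    rng  : a random.Random instance
    Returns (parent, depth) dictionaries describing the tree.
    """
    parent = {root: None}
    depth = {root: 0}
    # Candidate edges (u, v) with u already in the tree.
    frontier = [(root, v) for v in adj[root]]
    while len(parent) < len(adj):
        # Keep only edges with exactly one endpoint in the tree, so the
        # choice below is uniform over those edges.
        frontier = [(u, v) for (u, v) in frontier if v not in parent]
        u, v = rng.choice(frontier)
        parent[v] = u
        depth[v] = depth[u] + 1
        frontier.extend((v, w) for w in adj[v] if w not in parent)
    return parent, depth

# Example: a cycle on 8 vertices; the height is the largest depth.
cycle = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
parent, depth = grow_random_tree(cycle, 0, random.Random(1))
height = max(depth.values())
```

The frontier is rebuilt at every step, which is simple but quadratic in the worst case; that is adequate for small experiments on the height of the resulting tree.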

arXiv:1701.02527 [pdf, other]

The heavy path approach to Galton-Watson trees with an application to Apollonian networks

Authors: Luc Devroye, Cecilia Holmgren, Henning Sulzbach

Abstract: We study the heavy path decomposition of conditional Galton-Watson trees. In a standard Galton-Watson tree conditional on its size $n$, we order all children by their subtree sizes, from large (heavy) to small. A node is marked if it is among the $k$ heaviest nodes among its siblings. Unmarked nodes and their subtrees are removed, leaving only a tree of marked nodes, which we call the $k$-heavy tree. We study various properties of these trees, including their size and the maximal distance from any original node to the $k$-heavy tree. In particular, under some moment condition, the $2$-heavy tree is with high probability larger than $cn$ for some constant $c > 0$, and the maximal distance from the $k$-heavy tree is $O(n^{1/(k+1)})$ in probability. As a consequence, for uniformly random Apollonian networks of size $n$, the expected size of the longest simple path is $Ω(n)$.

Submitted 10 January, 2017; originally announced January 2017.

Comments: 3 figures

arXiv:1602.03850 [pdf, other]

doi 10.30757/ALEA.v14-29

A study of large fringe and non-fringe subtrees in conditional Galton-Watson trees

Authors: Xing Shi Cai, Luc Devroye

Abstract: We study the conditions for families of subtrees to exist with high probability (whp) in a Galton-Watson tree of size $n$. We first give a Poisson approximation of fringe subtree counts, which yields the height of the maximal complete $r$-ary fringe subtree. Then we determine the maximal $K_n$ such that every tree of size at most $K_n$ appears as a fringe subtree whp. Finally, we study non-fringe subtree counts and determine the height of the maximal complete $r$-ary non-fringe subtree.

Submitted 11 February, 2016; originally announced February 2016.

Comments: 40 pages, 7 figures

MSC Class: 60C05

arXiv:1512.04267 [pdf, ps, other]

On the measure of Voronoi cells

Authors: Luc Devroye, László Györfi, Gábor Lugosi, Harro Walk

Abstract: $n$ independent random points drawn from a density $f$ in $R^d$ define a random Voronoi partition. We study the measure of a typical cell of the partition. We prove that the asymptotic distribution of the probability measure of the cell centered at a point $x \in R^d$ is independent of $x$ and the density $f$. We determine all moments of the asymptotic distribution and show that the distribution becomes more concentrated as $d$ becomes large. In particular, we show that the variance converges to zero exponentially fast in $d$. We also obtain a density-free bound for the rate of convergence of the diameter of a typical Voronoi cell.

Submitted 14 December, 2015; originally announced December 2015.

Comments: 19 pages

arXiv:1509.05845 [pdf, ps, other]

Sub-Gaussian mean estimators

Authors: Luc Devroye, Matthieu Lerasle, Gabor Lugosi, Roberto I. Oliveira

Abstract: We discuss the possibilities and limitations of estimating the mean of a real-valued random variable from independent and identically distributed observations from a non-asymptotic point of view. In particular, we define estimators with a sub-Gaussian behavior even for certain heavy-tailed distributions. We also prove various impossibility results for mean estimators.

Submitted 18 September, 2015; originally announced September 2015.

Comments: 34 pages

arXiv:1506.02811 [pdf, other]

Exceptional rotations of random graphs: a VC theory

Authors: Louigi Addario-Berry, Shankar Bhamidi, Sébastien Bubeck, Luc Devroye, Gabor Lugosi, Roberto Imbuzeiro Oliveira

Abstract: In this paper we explore maximal deviations of large random structures from their typical behavior. We introduce a model for a high-dimensional random graph process and ask analogous questions to those of Vapnik and Chervonenkis for deviations of averages: how "rich" does the process have to be so that one sees atypical behavior. In particular, we study a natural process of Erdős-Rényi random graphs indexed by unit vectors in $\mathbb{R}^d$. We investigate the deviations of the process with respect to three fundamental properties: clique number, chromatic number, and connectivity. In all cases we establish upper and lower bounds for the minimal dimension $d$ that guarantees the existence of "exceptional directions" in which the random graph behaves atypically with respect to the property. For each of the three properties, four theorems are established, to describe upper and lower bounds for the threshold dimension in the subcritical and supercritical regimes.

Submitted 9 June, 2015; originally announced June 2015.

arXiv:1504.06238 [pdf, other]

doi 10.1002/rsa.20707

The graph structure of a deterministic automaton chosen at random: full version

Authors: Xing Shi Cai, Luc Devroye

Abstract: A deterministic finite automaton (DFA) of $n$ states over a $k$-letter alphabet can be seen as a digraph with $n$ vertices which all have exactly $k$ labeled out-arcs ($k$-out digraph). In 1973 Grusho first proved that with high probability (whp) in a random $k$-out digraph there is a strongly connected component (SCC) of linear size that is reachable from all vertices, i.e., a giant. He also proved that the size of the giant follows a central limit law. We show that whp the part outside the giant contains at most a few short cycles and mostly consists of overlapping tree-like structures. Thus the directed acyclic graph (DAG) of a random $k$-out digraph is almost the same as the digraph with the giant contracted into one vertex. These findings lead to a new, concise and self-contained proof of Grusho's theorem. This work also contains some other results including the structure outside the giant, the phase transition phenomenon in strong connectivity, the typical distance, and an extension to simple digraphs.

Submitted 9 August, 2016; v1 submitted 23 April, 2015; originally announced April 2015.

Comments: 48 pages, 7 figures

arXiv:1411.4426 [pdf, ps, other]

Explosion and linear transit times in infinite trees

Authors: Omid Amini, Luc Devroye, Simon Griffiths, Neil Olver

Abstract: Let $T$ be an infinite rooted tree with weights $w_e$ assigned to its edges. Denote by $m_n(T)$ the minimum weight of a path from the root to a node of the $n$th generation. We consider the possible behaviour of $m_n(T)$ with focus on the two following cases: we say $T$ is explosive if \[ \lim_{n\to \infty}m_n(T) < \infty, \] and say that $T$ exhibits linear growth if \[ \liminf_{n\to \infty} \frac{m_n(T)}{n} > 0. \] We consider a class of infinite randomly weighted trees related to the Poisson-weighted infinite tree, and determine precisely which trees in this class have linear growth almost surely. We then apply this characterization to obtain new results concerning the event of explosion in infinite randomly weighted spherically-symmetric trees, answering a question of Pemantle and Peres. As a further application, we consider the random real tree generated by attaching sticks of deterministic decreasing lengths, and determine for which sequences of lengths the tree has finite height almost surely.

Submitted 17 November, 2014; originally announced November 2014.

arXiv:1411.3317 [pdf, ps, other]

Finding Adam in random growing trees

Authors: Sébastien Bubeck, Luc Devroye, Gábor Lugosi

Abstract: We investigate algorithms to find the first vertex in large trees generated by either the uniform attachment or preferential attachment model. We require the algorithm to output a set of $K$ vertices, such that, with probability at least $1-ε$, the first vertex is in this set. We show that for any $ε$, there exist such algorithms with $K$ independent of the size of the input tree. Moreover, we provide almost tight bounds for the best value of $K$ as a function of $ε$. In the uniform attachment case we show that the optimal $K$ is subpolynomial in $1/ε$, and that it has to be at least superpolylogarithmic. On the other hand, the preferential attachment case is exponentially harder, as we prove that the best $K$ is polynomial in $1/ε$. We conclude the paper with several open problems.

Submitted 1 December, 2015; v1 submitted 12 November, 2014; originally announced November 2014.

Comments: 14 pages

arXiv:1403.1274 [pdf, other]

Almost optimal sparsification of random geometric graphs

Authors: Nicolas Broutin, Luc Devroye, Gabor Lugosi

Abstract: A random geometric irrigation graph $Γ_n(r_n,ξ)$ has $n$ vertices identified by $n$ independent uniformly distributed points $X_1,\ldots,X_n$ in the unit square $[0,1]^2$. Each point $X_i$ selects $ξ_i$ neighbors at random, without replacement, among those points $X_j$ ($j\neq i$) for which $\|X_i-X_j\| < r_n$, and the selected vertices are connected to $X_i$ by an edge. The number $ξ_i$ of the neighbors is an integer-valued random variable, chosen independently with identical distribution for each $X_i$ such that $ξ_i$ satisfies $1\le ξ_i \le κ$ for a constant $κ>1$. We prove that when $r_n = γ_n \sqrt{\log n/n}$ for $γ_n \to \infty$ with $γ_n =o(n^{1/6}/\log^{5/6}n)$, then the random geometric irrigation graph experiences explosive percolation in the sense that when $\mathbf E ξ_i=1$, then the largest connected component has size $o(n)$ but if $\mathbf E ξ_i >1$, then the size of the largest connected component is with high probability $n-o(n)$. This offers a natural non-centralized sparsification of a random geometric graph that is mostly connected.

Submitted 7 March, 2014; v1 submitted 5 March, 2014; originally announced March 2014.

MSC Class: 05C80; 60C05

arXiv:1402.3696 [pdf, ps, other]

Connectivity of sparse Bluetooth networks

Authors: Nicolas Broutin, Luc Devroye, Gábor Lugosi

Abstract: Consider a random geometric graph defined on $n$ vertices uniformly distributed in the $d$-dimensional unit torus. Two vertices are connected if their distance is less than a "visibility radius" $r_n$. We consider Bluetooth networks that are locally sparsified random geometric graphs. Each vertex selects $c$ of its neighbors in the random geometric graph at random and connects only to the selected points. We show that if the visibility radius is at least of the order of $n^{-(1-δ)/d}$ for some $δ> 0$, then a constant value of $c$ is sufficient for the graph to be connected, with high probability. It suffices to take $c \ge \sqrt{(1+ε)/δ} + K$ for any positive $ε$ where $K$ is a constant depending on $d$ only. On the other hand, with $c\le \sqrt{(1-ε)/δ}$, the graph is disconnected, with high probability.

Submitted 15 February, 2014; originally announced February 2014.

MSC Class: 05C80; 60C05

arXiv:1310.0665 [pdf, ps, other]

Protected nodes and fringe subtrees in some random trees

Authors: Luc Devroye, Svante Janson

Abstract: We study protected nodes in various classes of random rooted trees by putting them in the general context of fringe subtrees introduced by Aldous (1991). Several types of random trees are considered: simply generated trees (or conditioned Galton-Watson trees), which includes several cases treated separately by other authors, binary search trees and random recursive trees. This gives unified and simple proofs of several earlier results, as well as new results.

Submitted 2 October, 2013; originally announced October 2013.

Comments: 11 pages

MSC Class: 60C05; 05C05

arXiv:1301.4679 [pdf, ps, other]

Cellular Tree Classifiers

Authors: Gérard Biau, Luc Devroye

Abstract: The cellular tree classifier model addresses a fundamental problem in the design of classifiers for a parallel or distributed computing world: Given a data set, is it sufficient to apply a majority rule for classification, or shall one split the data into two or more parts and send each part to a potentially different computer (or cell) for further processing? At first sight, it seems impossible to define with this paradigm a consistent classifier as no cell knows the "original data size", $n$. However, we show that this is not so by exhibiting two different consistent classifiers. The consistency is universal but is only shown for distributions with nonatomic marginals.

Submitted 25 June, 2013; v1 submitted 20 January, 2013; originally announced January 2013.

arXiv:1210.7168 [pdf, other]

doi 10.1002/rsa.20391

Depth properties of scaled attachment random recursive trees

Authors: Luc Devroye, Omar Fawzi, Nicolas Fraiman

Abstract: We study depth properties of a general class of random recursive trees where each node $i$ attaches to the random node $iX_i$ and $X_0, \ldots, X_n$ is a sequence of i.i.d. random variables taking values in $[0,1)$. We call such trees scaled attachment random recursive trees (SARRT). We prove that the typical depth $D_n$, the maximum depth (or height) $H_n$ and the minimum depth $M_n$ of a SARRT are asymptotically given by $D_n \sim μ^{-1} \log n$, $H_n \sim α_{\max} \log n$ and $M_n \sim α_{\min} \log n$ where $μ$, $α_{\max}$ and $α_{\min}$ are constants depending only on the distribution of $X_0$ whenever $X_0$ has a density. In particular, this gives a new elementary proof for the height of uniform random recursive trees $H_n \sim e \log n$ that does not use branching random walks.

Submitted 26 October, 2012; originally announced October 2012.

Comments: 31 pages, 4 figures

MSC Class: 60C05; 05C05

Journal ref: Random Structures and Algorithms (2012), Vol. 41, 66-98
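
The attachment rule in the abstract above lends itself to a short simulation. The sketch below is illustrative only: it assumes that "node $iX_i$" means the integer part $\lfloor iX_i \rfloor$, takes node 0 as the root, and uses the hypothetical names `sarrt_depths` and `draw_x`, none of which come from the listing.

```python
import math
import random

def sarrt_depths(n, draw_x, rng):
    """Depths of nodes 0..n in a scaled attachment random recursive tree.

    draw_x : function rng -> float in [0, 1), one draw per node
    Assumption: node i attaches to node floor(i * X_i); node 0 is the root.
    """
    depth = [0] * (n + 1)
    for i in range(1, n + 1):
        # The min() guards against floating-point rounding up to i.
        parent = min(int(i * draw_x(rng)), i - 1)
        depth[i] = depth[parent] + 1
    return depth

# Uniform X_i gives a uniform random recursive tree; compare the observed
# height with log n for a rough sanity check on a single run.
rng = random.Random(0)
n = 10000
depths = sarrt_depths(n, lambda r: r.random(), rng)
height_ratio = max(depths) / math.log(n)
```

Because each node only needs the depth of its parent, the depths can be computed in a single pass without storing the tree itself.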

arXiv:1210.6259 [pdf, ps, other]

Connectivity of inhomogeneous random graphs

Authors: Luc Devroye, Nicolas Fraiman

Abstract: We find conditions for the connectivity of inhomogeneous random graphs with intermediate density. Our results generalize the classical result for $G(n, p)$, when $p = c \log n/n$. We draw $n$ independent points $X_i$ from a general distribution on a separable metric space, and let their indices form the vertex set of a graph. An edge $(i,j)$ is added with probability $\min(1, K(X_i,X_j) \log n/n)$, where $K \ge 0$ is a fixed kernel. We show that, under reasonably weak assumptions, the connectivity threshold of the model can be determined.

Submitted 23 October, 2012; originally announced October 2012.

Comments: 13 pages. To appear in Random Structures and Algorithms

MSC Class: 05C80; 60C05
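
The edge rule in the abstract above can be sampled directly. The following sketch is an illustration under assumptions made here, not the authors' code: the function names `sample_inhomogeneous_graph` and `is_connected` are hypothetical, the points are taken uniform on $[0,1]$, and the constant kernel is chosen only because it reduces to the classical $G(n, p)$ case with $p = c \log n/n$.

```python
import math
import random

def sample_inhomogeneous_graph(points, kernel, rng):
    """Sample edges (i, j), i < j, each present independently with
    probability min(1, kernel(x_i, x_j) * log(n) / n)."""
    n = len(points)
    scale = math.log(n) / n
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, kernel(points[i], points[j]) * scale)
            if rng.random() < p:
                edges.append((i, j))
    return edges

def is_connected(n, edges):
    """Union-find connectivity check on vertices 0..n-1."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    components = n
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    return components == 1

# Example with an assumed constant kernel K = c (the G(n, c log n / n) case).
rng = random.Random(0)
pts = [rng.random() for _ in range(200)]
edges = sample_inhomogeneous_graph(pts, lambda x, y: 2.0, rng)
connected = is_connected(len(pts), edges)
```

Repeating the experiment over many seeds and several kernel values gives an empirical estimate of the probability of connectivity, which can then be compared with the threshold discussed in the paper.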

arXiv:1201.0586 [pdf, ps, other]

An Affine Invariant $k$-Nearest Neighbor Regression Estimate

Authors: Gérard Biau, Luc Devroye, Vida Dujmovic, Adam Krzyzak

Abstract: We design a data-dependent metric in $\mathbb R^d$ and use it to define the $k$-nearest neighbors of a given point. Our metric is invariant under all affine transformations. We show that, with this metric, the standard $k$-nearest neighbor regression estimate is asymptotically consistent under the usual conditions on $k$, and minimal requirements on the input data.

Submitted 18 May, 2012; v1 submitted 3 January, 2012; originally announced January 2012.

arXiv:1106.0461 [pdf, other]

Random hyperplane search trees in high dimensions

Authors: Luc Devroye, James King

Abstract: Given a set $S$ of $n \geq d$ points in general position in $R^d$, a random hyperplane split is obtained by sampling $d$ points uniformly at random without replacement from $S$ and splitting based on their affine hull. A random hyperplane search tree is a binary space partition tree obtained by recursive application of random hyperplane splits. We investigate the structural distributions of such random trees with a particular focus on the growth with $d$. A blessing of dimensionality arises--as $d$ increases, random hyperplane splits more closely resemble perfectly balanced splits; in turn, random hyperplane search trees more closely resemble perfectly balanced binary search trees. We prove that, for any fixed dimension $d$, a random hyperplane search tree storing $n$ points has height at most $(1 + O(1/\sqrt{d})) \log_2 n$ and average element depth at most $(1 + O(1/d)) \log_2 n$ with high probability as $n \rightarrow \infty$. Further, we show that these bounds are asymptotically optimal with respect to $d$.

Submitted 2 June, 2011; originally announced June 2011.

Comments: 19 pages, 4 figures

MSC Class: 68Q87

arXiv:1103.0351 [pdf, other]

Connectivity threshold for Bluetooth graphs

Authors: Nicolas Broutin, Luc Devroye, Nicolas Fraiman, Gábor Lugosi

Abstract: We study the connectivity properties of random Bluetooth graphs that model certain "ad hoc" wireless networks. The graphs are obtained as "irrigation subgraphs" of the well-known random geometric graph model. There are two parameters that control the model: the radius $r$ that determines the "visible neighbors" of each node and the number of edges $c$ that each node is allowed to send to these. The randomness comes from the underlying distribution of data points in space and from the choices of each vertex. We prove that no connectivity can take place with high probability for a range of parameters $r, c$ and completely characterize the connectivity threshold (in $c$) for values of $r$ close to the critical value for connectivity in the underlying random geometric graph.

Submitted 2 March, 2011; originally announced March 2011.

Comments: 21 pages, 5 figures

MSC Class: 05C80; 60C05

arXiv:1102.0950 [pdf, ps, other]

doi 10.1214/12-AOP806

On explosions in heavy-tailed branching random walks

Authors: Omid Amini, Luc Devroye, Simon Griffiths, Neil Olver

Abstract: Consider a branching random walk on $\mathbb{R}$, with offspring distribution Z and nonnegative displacement distribution W. We say that explosion occurs if an infinite number of particles may be found within a finite distance of the origin. In this paper, we investigate this phenomenon when the offspring distribution Z is heavy-tailed. Under an appropriate condition, we are able to characterize the pairs (Z, W) for which explosion occurs, by demonstrating the equivalence of explosion with a seemingly much weaker event: that the sum over generations of the minimum displacement in each generation is finite. Furthermore, we demonstrate that our condition on the tail is best possible for this equivalence to occur. We also investigate, under additional smoothness assumptions, the behavior of $M_n$, the position of the particle in generation n closest to the origin, when explosion does not occur (and hence $\lim_{n\rightarrow\infty}M_n=\infty$).

Submitted 14 June, 2013; v1 submitted 4 February, 2011; originally announced February 2011.

Comments: Published in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/12-AOP806

Report number: IMS-AOP-AOP806

Journal ref: Annals of Probability 2013, Vol. 41, No. 3B, 1864-1899

arXiv:1011.4121 [pdf, ps, other]

doi 10.1214/12-AOP758

Sub-Gaussian tail bounds for the width and height of conditioned Galton--Watson trees

Authors: Louigi Addario-Berry, Luc Devroye, Svante Janson

Abstract: We study the height and width of a Galton--Watson tree with offspring distribution B satisfying E(B)=1, 0 < Var(B) < infinity, conditioned on having exactly n nodes. Under this conditioning, we derive sub-Gaussian tail bounds for both the width (largest number of nodes in any level) and height (greatest level containing a node); the bounds are optimal up to constant factors in the exponent. Under the same conditioning, we also derive essentially optimal upper tail bounds for the number of nodes at level k, for 1 <= k <= n.

Submitted 17 November, 2010; originally announced November 2010.

Comments: 15 pages

MSC Class: 60C05; 60J80

arXiv:1004.3146

Copulas in three dimensions with prescribed correlations

Authors: Luc Devroye, Gerard Letac

Abstract: Given an arbitrary three-dimensional correlation matrix, we prove that there exists a three-dimensional joint distribution for the random variable $(X,Y,Z)$ such that $X$, $Y$ and $Z$ are identically distributed with beta distribution $β_{k,k}(dx)$ on $(0,1)$ if $k\geq 1/2$. This implies that any correlation structure can be attained for three-dimensional copulas.

Submitted 19 April, 2010; originally announced April 2010.

Comments: 15 pages, 2 figures

MSC Class: 62J10

arXiv:0908.3437

doi 10.1214/10-AOS817

On combinatorial testing problems

Authors: Louigi Addario-Berry, Nicolas Broutin, Luc Devroye, Gábor Lugosi

Abstract: We study a class of hypothesis testing problems in which, upon observing the realization of an $n$-dimensional Gaussian vector, one has to decide whether the vector was drawn from a standard normal distribution or, alternatively, whether there is a subset of the components belonging to a certain given class of sets whose elements have been ``contaminated,'' that is, have a mean different from zero. We establish some general conditions under which testing is possible and others under which testing is hopeless with a small risk. The combinatorial and geometric structure of the class of sets is shown to play a crucial role. The bounds are illustrated on various examples.

Submitted 19 November, 2010; v1 submitted 24 August, 2009; originally announced August 2009.

Comments: Published at http://dx.doi.org/10.1214/10-AOS817 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS817

Journal ref: Annals of Statistics 2010, Vol. 38, No. 5, 3063-3092
