-
Subtractive random forests with two choices
Authors:
Francisco Calvillo,
Luc Devroye,
Gábor Lugosi
Abstract:
Recommendation systems are pivotal in aiding users amid vast online content. Broutin, Devroye, Lugosi, and Oliveira proposed Subtractive Random Forests (\textsc{surf}), a model that emphasizes temporal user preferences. Expanding on \textsc{surf}, we introduce a model for a multi-choice recommendation system, enabling users to select from two independent suggestions based on past interactions. We evaluate its effectiveness and robustness across diverse scenarios, incorporating heavy-tailed distributions for time delays. By analyzing user topic evolution, we assess the system's consistency. Our study offers insights into the performance and potential enhancements of multi-choice recommendation systems in practical settings.
Submitted 16 May, 2024;
originally announced May 2024.
-
Property testing in graphical models: testing small separation numbers
Authors:
Luc Devroye,
Gábor Lugosi,
Piotr Zwiernik
Abstract:
In many statistical applications, the dimension is too large to handle for standard high-dimensional machine learning procedures. This is particularly true for graphical models, where the interpretation of a large graph is difficult and learning its structure is often computationally impossible either because the underlying graph is not sufficiently sparse or the number of vertices is too large. To address this issue, we develop a procedure to test a property of a graph underlying a graphical model that requires only a subquadratic number of correlation queries (i.e., the algorithm can access only a tiny fraction of the covariance matrix). This provides a conceptually simple test to determine whether the underlying graph is a tree or, more generally, if it has a small separation number, a quantity closely related to the treewidth of the graph. The proposed method is a divide-and-conquer algorithm that can be applied to quite general graphical models.
Submitted 16 May, 2024;
originally announced May 2024.
-
On the size of temporal cliques in subcritical random temporal graphs
Authors:
Caelan Atamanchuk,
Luc Devroye,
Gabor Lugosi
Abstract:
A \emph{random temporal graph} is an Erdős-Rényi random graph $G(n,p)$, together with a random ordering of its edges. A path in the graph is called \emph{increasing} if the edges on the path appear in increasing order. A set $S$ of vertices forms a \emph{temporal clique} if for all $u,v \in S$, there is an increasing path from $u$ to $v$. \cite{Becker2023} proved that if $p=c\log n/n$ for $c>1$, then, with high probability, there is a temporal clique of size $n-o(n)$. On the other hand, for $c<1$, with high probability, the largest temporal clique is of size $o(n)$. In this note we improve the latter bound by showing that, for $c<1$, the largest temporal clique is of \emph{constant} size with high probability.
Submitted 28 April, 2024; v1 submitted 5 April, 2024;
originally announced April 2024.
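The definitions in the abstract above (increasing path, temporal clique) can be illustrated with a minimal Python sketch. It assumes the edge ordering is given as a list whose position is the edge's time; the function names `increasing_reach` and `is_temporal_clique` are hypothetical and not taken from the paper.

```python
def increasing_reach(source, ordered_edges):
    """Vertices reachable from `source` along increasing paths.

    `ordered_edges` is a list of undirected edges (u, v); the position of an
    edge in the list is its time stamp (an assumption of this sketch).
    """
    reached = {source}
    # Scanning edges in time order: an edge can extend a path only if one
    # endpoint was already reached using strictly earlier edges.
    for u, v in ordered_edges:
        if u in reached or v in reached:
            reached.update((u, v))
    return reached


def is_temporal_clique(vertices, ordered_edges):
    """True if every vertex of `vertices` reaches every other one."""
    vertices = set(vertices)
    return all(vertices <= increasing_reach(u, ordered_edges) for u in vertices)


# Path 0 - 1 - 2 where edge (1, 2) appears before edge (0, 1).
edges = [(1, 2), (0, 1)]
print(increasing_reach(0, edges))
print(is_temporal_clique({0, 1}, edges))
```

Scanning the edges once in time order suffices because each edge has a distinct time, so a vertex reached at some step can only be extended by later edges.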
-
Burning Random Trees
Authors:
Luc Devroye,
Austin Eide,
Pawel Pralat
Abstract:
Let $\mathcal{T}$ be a Galton-Watson tree with a given offspring distribution $ξ$, where $ξ$ is a $Z_{\geq 0}$-valued random variable with $E[ξ] = 1$ and $0 < σ^{2}:=Var[ξ] < \infty$. For $n \geq 1$, let $T_{n}$ be the tree $\mathcal{T}$ conditioned to have $n$ vertices. In this paper we investigate $b(T_n)$, the burning number of $T_n$. Our main result shows that asymptotically almost surely $b(T_n)$ is of the order of $n^{1/3}$.
Submitted 1 April, 2024;
originally announced April 2024.
-
Random friend trees
Authors:
Louigi Addario-Berry,
Simon Briend,
Luc Devroye,
Serte Donderwinkel,
Céline Kerriou,
Gábor Lugosi
Abstract:
We study a random recursive tree model featuring complete redirection called the random friend tree and introduced by Saramäki and Kaski. Vertices are attached in a sequential manner one by one by selecting an existing target vertex and connecting to one of its neighbours (or friends), chosen uniformly at random. This model has interesting emergent properties, such as a highly skewed degree sequence. In contrast to the preferential attachment model, these emergent phenomena stem from a local rather than a global attachment mechanism. The structure of the resulting tree is also strikingly different from both the preferential attachment tree and the uniform random recursive tree: every edge is incident to a macro-hub of asymptotically linear degree, and with high probability all but at most $n^{9/10}$ vertices in a tree of size $n$ are leaves. We prove various results on the neighbourhood of fixed vertices and edges, and we study macroscopic properties such as the diameter and the degree distribution, providing insights into the overall structure of the tree. We also present a number of open questions on this model and related models.
Submitted 29 March, 2024;
originally announced March 2024.
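The attachment rule described in the abstract above can be sketched in a few lines of Python. This is only an illustration: the starting tree (a single edge between vertices 0 and 1) and the function name `random_friend_tree` are assumptions of the sketch, not details stated in the abstract.

```python
import random


def random_friend_tree(n, rng):
    """Grow a tree on vertices 0..n-1 (n >= 2) by complete redirection.

    Assumption: the process starts from the single edge {0, 1}.
    """
    adj = {0: [1], 1: [0]}
    for v in range(2, n):
        target = rng.randrange(v)         # uniform existing vertex
        friend = rng.choice(adj[target])  # uniform neighbour of the target
        adj[v] = [friend]                 # the new vertex attaches to the friend
        adj[friend].append(v)
    return adj


rng = random.Random(0)
tree = random_friend_tree(10_000, rng)
leaves = sum(1 for nbrs in tree.values() if len(nbrs) == 1)
print(leaves, max(len(nbrs) for nbrs in tree.values()))
```

Printing the number of leaves and the maximum degree gives a quick empirical look at the skewed degree sequence the abstract mentions.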
-
A note on estimating the dimension from a random geometric graph
Authors:
Caelan Atamanchuk,
Luc Devroye,
Gabor Lugosi
Abstract:
Let $G_n$ be a random geometric graph with vertex set $[n]$ based on $n$ i.i.d.\ random vectors $X_1,\ldots,X_n$ drawn from an unknown density $f$ on $\R^d$. An edge $(i,j)$ is present when $\|X_i -X_j\| \le r_n$, for a given threshold $r_n$ possibly depending upon $n$, where $\| \cdot \|$ denotes Euclidean distance. We study the problem of estimating the dimension $d$ of the underlying space when we have access to the adjacency matrix of the graph but do not know $r_n$ or the vectors $X_i$. The main result of the paper is that there exists an estimator of $d$ that converges to $d$ in probability as $n \to \infty$ for all densities with $\int f^5 < \infty$ whenever $n^{3/2} r_n^d \to \infty$ and $r_n = o(1)$. The conditions allow very sparse graphs since when $n^{3/2} r_n^d \to 0$, the graph contains isolated edges only, with high probability. We also show that, without any condition on the density, a consistent estimator of $d$ exists when $n r_n^d \to \infty$ and $r_n = o(1)$.
Submitted 21 November, 2023;
originally announced November 2023.
-
An Algorithm to Recover Shredded Random Matrices
Authors:
Caelan Atamanchuk,
Luc Devroye,
Massimo Vicenzo
Abstract:
Given some binary matrix $M$, suppose we are presented with the collection of its rows and columns in independent arbitrary orderings. From this information, are we able to recover the unique original orderings and matrix? We present an algorithm that identifies whether there is a unique ordering associated with a set of rows and columns, and outputs either the unique correct orderings for the rows and columns or the full collection of all valid orderings and valid matrices. We show that there is a constant $c > 0$ such that the algorithm terminates in $O(n^2)$ time with high probability and in expectation for random $n \times n$ binary matrices with i.i.d.\ Bernoulli $(p)$ entries $(m_{ij})_{ij=1}^n$ such that $\frac{c\log^2(n)}{n(\log\log(n))^2} \leq p \leq \frac{1}{2}$.
Submitted 23 April, 2024; v1 submitted 25 October, 2023;
originally announced October 2023.
-
A Proletarian Approach to Generating Eigenvalues of GUE Matrices
Authors:
Luc Devroye,
Jad Hamdan
Abstract:
We propose a simple algorithm to generate random variables described by densities equaling squared Hermite functions. Using results from random matrix theory, we utilize this to generate a randomly chosen eigenvalue of a matrix from the Gaussian Unitary Ensemble (GUE) in sublinear expected time in the RAM model.
Submitted 14 September, 2023; v1 submitted 7 April, 2023;
originally announced April 2023.
-
Subtractive random forests
Authors:
Nicolas Broutin,
Luc Devroye,
Gabor Lugosi,
Roberto Imbuzeiro Oliveira
Abstract:
Motivated by online recommendation systems, we study a family of random forests. The vertices of the forest are labeled by integers. Each non-positive integer $i\le 0$ is the root of a tree. Vertices labeled by positive integers $n \ge 1$ are attached sequentially such that the parent of vertex $n$ is $n-Z_n$, where the $Z_n$ are i.i.d.\ random variables taking values in $\mathbb N$. We study several characteristics of the resulting random forest. In particular, we establish bounds for the expected tree sizes, the number of trees in the forest, the number of leaves, the maximum degree, and the height of the forest. We show that for all distributions of the $Z_n$, the forest contains at most one infinite tree, almost surely. If ${\mathbb E} Z_n < \infty$, then there is a unique infinite tree and the total size of the remaining trees is finite, with finite expected value if ${\mathbb E}Z_n^2 < \infty$. If ${\mathbb E} Z_n = \infty$ then almost surely all trees are finite.
Submitted 25 February, 2024; v1 submitted 19 October, 2022;
originally announced October 2022.
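The construction in the abstract above (the parent of vertex $n$ is $n - Z_n$, and every non-positive integer is a root) can be simulated with a minimal Python sketch. The geometric distribution used for $Z_n$, the assumption that $Z_n \ge 1$, and the helper names are illustrative choices, not taken from the paper.

```python
import random


def geometric_z(rng, p=0.5):
    """Sample Z >= 1 from a geometric law (an illustrative choice of distribution)."""
    z = 1
    while rng.random() > p:
        z += 1
    return z


def surf_parents(n, sample_z, rng):
    """Map each vertex i in 1..n to its parent i - Z_i."""
    return {i: i - sample_z(rng) for i in range(1, n + 1)}


def root_of(v, parents):
    """Follow parent links until a non-positive label (a root) is reached."""
    while v >= 1:
        v = parents[v]
    return v


rng = random.Random(0)
parents = surf_parents(1000, geometric_z, rng)
roots = {root_of(v, parents) for v in parents}
print(len(roots))  # number of distinct trees hit by vertices 1..1000
```

Swapping `geometric_z` for a heavy-tailed sampler is one way to explore how the number of trees changes when the mean of $Z_n$ is infinite.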
-
Estimating monotone densities by cellular binary trees
Authors:
Luc Devroye,
Jad Hamdan
Abstract:
We propose a novel, simple density estimation algorithm for bounded monotone densities with compact support under a cellular restriction. We show that its expected error ($L_1$ distance) converges at a rate of $n^{-1/3}$, that its expected runtime is sublinear and, in doing so, find a connection to the theory of Galton--Watson processes.
Submitted 15 March, 2022;
originally announced March 2022.
-
On the peel number and the leaf-height of a Galton-Watson tree
Authors:
Luc Devroye,
Marcel K. Goh,
Rosie Y. Zhao
Abstract:
We study several parameters of a random Bienaymé-Galton-Watson tree $T_n$ of size $n$ defined in terms of an offspring distribution $ξ$ with mean $1$ and nonzero finite variance $σ^2$. Let $f(s)={\bf E}\{s^ξ\}$ be the generating function of the random variable $ξ$. We show that the independence number is in probability asymptotic to $qn$, where $q$ is the unique solution to $q = f(1-q)$. One of the many algorithms for finding the largest independent set of nodes uses a notion of repeated peeling away of all leaves and their parents. The number of rounds of peeling is shown to be in probability asymptotic to $\log n / \log\bigl(1/f'(1-q)\bigr)$. Finally, we study a related parameter which we call the leaf-height. Also sometimes called the protection number, this is the maximal shortest path length between any node and a leaf in its subtree. If $p_1 = {\bf P}\{ξ=1\}>0$, then we show that the maximum leaf-height over all nodes in $T_n$ is in probability asymptotic to $\log n/\log(1/p_1)$. If $p_1 = 0$ and $κ$ is the first integer $i>1$ with ${\bf P}\{ξ=i\}>0$, then the leaf-height is in probability asymptotic to $\log_κ\log n$.
Submitted 12 April, 2022; v1 submitted 28 June, 2021;
originally announced June 2021.
-
Leaf multiplicity in a Bienaymé-Galton-Watson tree
Authors:
Anna M. Brandenberger,
Luc Devroye,
Marcel K. Goh,
Rosie Y. Zhao
Abstract:
This note defines a notion of multiplicity for nodes in a rooted tree and presents an asymptotic calculation of the maximum multiplicity over all leaves in a Bienaymé-Galton-Watson tree with critical offspring distribution $ξ$, conditioned on the tree being of size $n$. In particular, we show that if $S_n$ is the maximum multiplicity in a conditional Bienaymé-Galton-Watson tree, then $S_n = Ω(\log n)$ asymptotically in probability and under the further assumption that ${\bf E}\{2^ξ\} < \infty$, we have $S_n = O(\log n)$ asymptotically in probability as well. Explicit formulas are given for the constants in both bounds. We conclude by discussing links with an alternate definition of multiplicity that arises in the root-estimation problem.
Submitted 21 March, 2022; v1 submitted 25 May, 2021;
originally announced May 2021.
-
Consistent Density Estimation Under Discrete Mixture Models
Authors:
Luc Devroye,
Alex Dytso
Abstract:
This work considers a problem of estimating a mixing probability density $f$ in the setting of discrete mixture models. The paper consists of three parts.
The first part focuses on the construction of an $L_1$ consistent estimator of $f$. In particular, under the assumptions that the probability measure $μ$ of the observation is atomic, and the map from $f$ to $μ$ is bijective, it is shown that there exists an estimator $f_n$ such that for every density $f$ $\lim_{n\to \infty} \mathbb{E} \left[ \int |f_n -f | \right]=0$.
The second part discusses the implementation details. Specifically, it is shown that the consistency for every $f$ can be attained with a computationally feasible estimator.
The third part, as a study case, considers a Poisson mixture model. In particular, it is shown that in the Poisson noise setting, the bijection condition holds and, hence, estimation can be performed consistently for every $f$.
Submitted 10 May, 2021; v1 submitted 3 May, 2021;
originally announced May 2021.
-
On the consistency of the Kozachenko-Leonenko entropy estimate
Authors:
Luc Devroye,
László Györfi
Abstract:
We revisit the problem of the estimation of the differential entropy $H(f)$ of a random vector $X$ in $R^d$ with density $f$, assuming that $H(f)$ exists and is finite. In this note, we study the consistency of the popular nearest neighbor estimate $H_n$ of Kozachenko and Leonenko. Without any smoothness condition we show that the estimate is consistent ($E\{|H_n - H(f)|\} \to 0$ as $n \to \infty$) if and only if $\mathbb{E} \{ \log ( \| X \| + 1 )\} < \infty$. Furthermore, if $X$ has compact support, then $H_n \to H(f)$ almost surely.
Submitted 25 February, 2021;
originally announced February 2021.
-
On Mean Estimation for Heteroscedastic Random Variables
Authors:
Luc Devroye,
Silvio Lattanzi,
Gabor Lugosi,
Nikita Zhivotovskiy
Abstract:
We study the problem of estimating the common mean $μ$ of $n$ independent symmetric random variables with different and unknown standard deviations $σ_1 \le σ_2 \le \cdots \le σ_n$. We show that, under some mild regularity assumptions on the distribution, there is a fully adaptive estimator $\widehat{μ}$ such that it is invariant to permutations of the elements of the sample and satisfies that, up to logarithmic factors, with high probability, \[ |\widehat{μ} - μ| \lesssim \min\left\{σ_{m^*}, \frac{\sqrt{n}}{\sum_{i = \sqrt{n}}^n σ_i^{-1}} \right\}~, \] where the index $m^* \lesssim \sqrt{n}$ satisfies $m^* \approx \sqrt{σ_{m^*}\sum_{i = m^*}^n σ_i^{-1}}$.
Submitted 22 October, 2020;
originally announced October 2020.
-
The Horton-Strahler Number of Conditioned Galton-Watson Trees
Authors:
Anna M. Brandenberger,
Luc Devroye,
Tommy Reddad
Abstract:
The Horton-Strahler number of a tree is a measure of its branching complexity; it is also known in the literature as the register function. We show that for critical Galton-Watson trees with finite variance conditioned to be of size $n$, the Horton-Strahler number grows as $\frac{1}{2}\log_2 n$ in probability. We further define some generalizations of this number. Among these are the rigid Horton-Strahler number and the $k$-ary register function, for which we prove asymptotic results analogous to the standard case.
Submitted 16 October, 2020;
originally announced October 2020.
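For readers unfamiliar with the quantity studied in the abstract above, the following Python sketch computes a Horton-Strahler number for a rooted tree. It assumes the common convention that leaves have value 0 and that a node takes the maximum child value, plus one when that maximum is attained by at least two children; the paper's exact convention may differ, and the function name is hypothetical.

```python
def horton_strahler(children, root=0):
    """Horton-Strahler number of the tree rooted at `root`.

    `children` maps each node to a list of its children (missing key = leaf).
    Convention assumed here: leaves have value 0.
    """
    # Visit parents before children, then process in reverse so that
    # every child is evaluated before its parent (no recursion needed).
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))

    hs = {}
    for v in reversed(order):
        vals = [hs[c] for c in children.get(v, [])]
        if not vals:
            hs[v] = 0
        else:
            m = max(vals)
            # Increase only when the maximum is attained at least twice.
            hs[v] = m + 1 if vals.count(m) >= 2 else m
    return hs[root]


# Complete binary tree with 15 nodes (depth 3) in heap numbering.
complete = {i: [2 * i + 1, 2 * i + 2] for i in range(7)}
print(horton_strahler(complete))
```

Under this convention a path has value 0 and a complete binary tree of depth $h$ has value $h$, which matches the intuition that the number measures branching complexity.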
-
Root estimation in Galton-Watson trees
Authors:
Anna M. Brandenberger,
Luc Devroye,
Marcel K. Goh
Abstract:
Given only the free-tree structure of a tree, the root estimation problem asks if one can guess which of the free tree's nodes is the root of the original tree. We determine the maximum-likelihood estimator for the root of a free tree when the underlying tree is a size-conditioned Galton-Watson tree and calculate its probability of being correct.
Submitted 17 August, 2021; v1 submitted 11 July, 2020;
originally announced July 2020.
-
Broadcasting on random recursive trees
Authors:
Louigi Addario-Berry,
Luc Devroye,
Gabor Lugosi,
Vasiliki Velona
Abstract:
We study the broadcasting problem when the underlying tree is a random recursive tree. The root of the tree has a random bit value assigned. Every other vertex has the same bit value as its parent with probability $1-q$ and the opposite value with probability $q$, where $q \in [0,1]$. The broadcasting problem consists in estimating the value of the root bit upon observing the unlabeled tree, together with the bit value associated with every vertex. In a more difficult version of the problem, the unlabeled tree is observed but only the bit values of the leaves are observed. When the underlying tree is a uniform random recursive tree, in both variants of the problem we characterize the values of $q$ for which the optimal reconstruction method has a probability of error bounded away from $1/2$. We also show that the probability of error is bounded by a constant times $q$. Two simple reconstruction rules are analyzed in detail. One of them is the simple majority vote, the other is the bit value of the centroid of the tree. Most results are extended to linear preferential attachment trees as well.
Submitted 24 April, 2021; v1 submitted 21 June, 2020;
originally announced June 2020.
-
Probabilistic Analysis of RRT Trees
Authors:
Konrad Anand,
Luc Devroye
Abstract:
This thesis presents analysis of the properties and run-time of the Rapidly-exploring Random Tree (RRT) algorithm. It is shown that the time for the RRT with stepsize $ε$ to grow close to every point in the $d$-dimensional unit cube is $Θ\left(\frac1{ε^d} \log \left(\frac1ε\right)\right)$. Also, the time it takes for the tree to reach a region of positive probability is $O\left(ε^{-\frac32}\right)$. Finally, a relationship is shown to the Nearest Neighbour Tree (NNT). This relationship shows that the total Euclidean path length after $n$ steps is $O(\sqrt n)$ and the expected height of the tree is bounded above by $(e + o(1)) \log n$.
Submitted 3 May, 2020;
originally announced May 2020.
-
Hipster random walks
Authors:
Louigi Addario-Berry,
Hannah Cairns,
Luc Devroye,
Celine Kerriou,
Rivka Mitchell
Abstract:
We introduce and study a family of random processes on trees we call hipster random walks, special instances of which we heuristically connect to the min-plus binary trees introduced by Robin Pemantle and studied by Auffinger and Cable (2017; arXiv:1709.07849), and to the critical random hierarchical lattice studied by Hambly and Jordan (2004). We prove distributional convergence for the processes by showing that their evolutions can be understood as discrete analogues of certain convection-diffusion equations, then using a combination of coupling arguments and results from the numerical analysis literature on convergence of numerical approximations of PDEs.
Submitted 16 September, 2019;
originally announced September 2019.
-
Discrete minimax estimation with trees
Authors:
Luc Devroye,
Tommy Reddad
Abstract:
We propose a simple recursive data-based partitioning scheme which produces piecewise-constant or piecewise-linear density estimates on intervals, and show how this scheme can determine the optimal $L_1$ minimax rate for some discrete nonparametric classes.
Submitted 27 June, 2019; v1 submitted 14 December, 2018;
originally announced December 2018.
-
The total variation distance between high-dimensional Gaussians with the same mean
Authors:
Luc Devroye,
Abbas Mehrabian,
Tommy Reddad
Abstract:
Given two high-dimensional Gaussians with the same mean, we prove a lower and an upper bound for their total variation distance, which are within a constant factor of one another.
Submitted 22 October, 2023; v1 submitted 19 October, 2018;
originally announced October 2018.
-
On the discovery of the seed in uniform attachment trees
Authors:
Luc Devroye,
Tommy Reddad
Abstract:
We investigate the size of vertex confidence sets for including part of (or the entirety of) the seed in seeded uniform attachment trees, given knowledge of some of the seed's properties, and with a prescribed probability of failure. We also study the problem of identifying the leaves of a seed in a seeded uniform attachment tree, given knowledge of the positions of all internal nodes of the seed.
Submitted 22 February, 2019; v1 submitted 1 October, 2018;
originally announced October 2018.
-
The Minimax Learning Rates of Normal and Ising Undirected Graphical Models
Authors:
Luc Devroye,
Abbas Mehrabian,
Tommy Reddad
Abstract:
Let $G$ be an undirected graph with $m$ edges and $d$ vertices. We show that $d$-dimensional Ising models on $G$ can be learned from $n$ i.i.d. samples within expected total variation distance some constant factor of $\min\{1, \sqrt{(m + d)/n}\}$, and that this rate is optimal. We show that the same rate holds for the class of $d$-dimensional multivariate normal undirected graphical models with respect to $G$. We also identify the optimal rate of $\min\{1, \sqrt{m/n}\}$ for Ising models with no external magnetic field.
Submitted 3 June, 2020; v1 submitted 18 June, 2018;
originally announced June 2018.
-
Recursive functions on conditional Galton--Watson trees
Authors:
Nicolas Broutin,
Luc Devroye,
Nicolas Fraiman
Abstract:
A recursive function on a tree is a function in which each leaf has a given value, and each internal node has a value equal to a function of the number of children, the values of the children, and possibly an explicitly specified random element $U$. The value of the root is the key quantity of interest in general. In this first study, all node values and function values are in a finite set $S$. In this note, we describe the limit behavior when the leaf values are drawn independently from a fixed distribution on $S$, and the tree $T_n$ is a random Galton--Watson tree of size $n$.
Submitted 23 March, 2020; v1 submitted 23 May, 2018;
originally announced May 2018.
-
K-cut on paths and some trees
Authors:
Xing Shi Cai,
Luc Devroye,
Cecilia Holmgren,
Fiona Skerman
Abstract:
We define the (random) $k$-cut number of a rooted graph to model the difficulty of the destruction of a resilient network. The process is the same as the cut model of Meir and Moon, except that a node must now be cut $k$ times before it is destroyed. The first order terms of the expectation and variance of $\mathcal{X}_{n}$, the $k$-cut number of a path of length $n$, are proved. We also show that $\mathcal{X}_{n}$, after rescaling, converges in distribution to a limit $\mathcal{B}_{k}$, which has a complicated representation. The paper then briefly discusses the $k$-cut number of some trees and general graphs. We conclude with some analytic results which may be of interest.
△ Less
Submitted 30 January, 2019; v1 submitted 9 April, 2018;
originally announced April 2018.
-
Local optima of the Sherrington-Kirkpatrick Hamiltonian
Authors:
Louigi Addario-Berry,
Luc Devroye,
Gabor Lugosi,
Roberto Imbuzeiro Oliveira
Abstract:
We study local optima of the Hamiltonian of the Sherrington-Kirkpatrick model. We compute the exponent of the expected number of local optima and determine the "typical" value of the Hamiltonian.
Submitted 20 December, 2017;
originally announced December 2017.
-
A lower bound on the size of an absorbing set in an arc-coloured tournament
Authors:
Laurent Beaudou,
Luc Devroye,
Gena Hahn
Abstract:
Bousquet, Lochet and Thomassé recently gave an elegant proof that for any integer $n$, there is a least integer $f(n)$ such that any tournament whose arcs are coloured with $n$ colours contains a subset of vertices $S$ of size $f(n)$ with the property that any vertex not in $S$ admits a monochromatic path to some vertex of $S$. In this note we provide a lower bound on the value $f(n)$.
Submitted 30 August, 2017; v1 submitted 29 August, 2017;
originally announced August 2017.
-
Notes on Growing a Tree in a Graph
Authors:
Luc Devroye,
Vida Dujmović,
Alan Frieze,
Abbas Mehrabian,
Pat Morin,
Bruce Reed
Abstract:
We study the height of a spanning tree $T$ of a graph $G$ obtained by starting with a single vertex of $G$ and repeatedly selecting, uniformly at random, an edge of $G$ with exactly one endpoint in $T$ and adding this edge to $T$.
Submitted 4 July, 2017; v1 submitted 30 June, 2017;
originally announced July 2017.
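
The growth process in the abstract above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `grow_tree`, the adjacency-list input format, and the lazy discarding of stale edges are all assumptions made here. It assumes a connected simple undirected graph.

```python
import random

def grow_tree(adj, start, rng):
    """Grow a spanning tree from `start` by repeatedly adding a uniformly
    random edge with exactly one endpoint in the current tree.

    adj: dict mapping each vertex to a list of its neighbours
         (hypothetical input format; simple undirected graph assumed).
    Returns a dict mapping each vertex to its depth in the grown tree.
    """
    depth = {start: 0}
    # Candidate edges (u, v) with u in the tree. Each valid boundary edge
    # appears exactly once; edges whose endpoint v has since joined the
    # tree are stale and get discarded when drawn (rejection sampling),
    # which keeps the choice uniform over the valid boundary edges.
    boundary = [(start, v) for v in adj[start]]
    while boundary:
        i = rng.randrange(len(boundary))
        boundary[i], boundary[-1] = boundary[-1], boundary[i]
        u, v = boundary.pop()
        if v in depth:
            continue  # stale: both endpoints are already in the tree
        depth[v] = depth[u] + 1
        boundary.extend((v, w) for w in adj[v] if w not in depth)
    return depth

def tree_height(depth):
    """Height of the tree: the largest depth of any vertex."""
    return max(depth.values())
```

For example, `tree_height(grow_tree(adj, 0, random.Random(1)))` gives one sample of the height for the graph `adj`; repeating with different seeds gives an empirical distribution.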
-
The heavy path approach to Galton-Watson trees with an application to Apollonian networks
Authors:
Luc Devroye,
Cecilia Holmgren,
Henning Sulzbach
Abstract:
We study the heavy path decomposition of conditional Galton-Watson trees. In a standard Galton-Watson tree conditional on its size $n$, we order all children by their subtree sizes, from large (heavy) to small. A node is marked if it is among the $k$ heaviest nodes among its siblings. Unmarked nodes and their subtrees are removed, leaving only a tree of marked nodes, which we call the $k$-heavy tree. We study various properties of these trees, including their size and the maximal distance from any original node to the $k$-heavy tree. In particular, under some moment condition, the $2$-heavy tree is with high probability larger than $cn$ for some constant $c > 0$, and the maximal distance from the $k$-heavy tree is $O(n^{1/(k+1)})$ in probability. As a consequence, for uniformly random Apollonian networks of size $n$, the expected size of the longest simple path is $Ω(n)$.
Submitted 10 January, 2017;
originally announced January 2017.
-
A study of large fringe and non-fringe subtrees in conditional Galton-Watson trees
Authors:
Xing Shi Cai,
Luc Devroye
Abstract:
We study the conditions for families of subtrees to exist with high probability (whp) in a Galton-Watson tree of size $n$. We first give a Poisson approximation of fringe subtree counts, which yields the height of the maximal complete $r$-ary fringe subtree. Then we determine the maximal $K_n$ such that every tree of size at most $K_n$ appears as a fringe subtree whp. Finally, we study non-fringe subtree counts and determine the height of the maximal complete $r$-ary non-fringe subtree.
Submitted 11 February, 2016;
originally announced February 2016.
-
On the measure of Voronoi cells
Authors:
Luc Devroye,
László Györfi,
Gábor Lugosi,
Harro Walk
Abstract:
$n$ independent random points drawn from a density $f$ in $R^d$ define a random Voronoi partition. We study the measure of a typical cell of the partition. We prove that the asymptotic distribution of the probability measure of the cell centered at a point $x \in R^d$ is independent of $x$ and the density $f$. We determine all moments of the asymptotic distribution and show that the distribution becomes more concentrated as $d$ becomes large. In particular, we show that the variance converges to zero exponentially fast in $d$.
Submitted 14 December, 2015;
originally announced December 2015.
-
Sub-Gaussian mean estimators
Authors:
Luc Devroye,
Matthieu Lerasle,
Gabor Lugosi,
Roberto I. Oliveira
Abstract:
We discuss the possibilities and limitations of estimating the mean of a real-valued random variable from independent and identically distributed observations from a non-asymptotic point of view. In particular, we define estimators with a sub-Gaussian behavior even for certain heavy-tailed distributions. We also prove various impossibility results for mean estimators.
Submitted 18 September, 2015;
originally announced September 2015.
-
Exceptional rotations of random graphs: a VC theory
Authors:
Louigi Addario-Berry,
Shankar Bhamidi,
Sébastien Bubeck,
Luc Devroye,
Gabor Lugosi,
Roberto Imbuzeiro Oliveira
Abstract:
In this paper we explore maximal deviations of large random structures from their typical behavior. We introduce a model for a high-dimensional random graph process and ask analogous questions to those of Vapnik and Chervonenkis for deviations of averages: how "rich" does the process have to be so that one sees atypical behavior. In particular, we study a natural process of Erdős-Rényi random graphs indexed by unit vectors in $\mathbb{R}^d$. We investigate the deviations of the process with respect to three fundamental properties: clique number, chromatic number, and connectivity. In all cases we establish upper and lower bounds for the minimal dimension $d$ that guarantees the existence of "exceptional directions" in which the random graph behaves atypically with respect to the property. For each of the three properties, four theorems are established, to describe upper and lower bounds for the threshold dimension in the subcritical and supercritical regimes.
Submitted 9 June, 2015;
originally announced June 2015.
-
The graph structure of a deterministic automaton chosen at random: full version
Authors:
Xing Shi Cai,
Luc Devroye
Abstract:
A deterministic finite automaton (DFA) of $n$ states over a $k$-letter alphabet can be seen as a digraph with $n$ vertices which all have exactly $k$ labeled out-arcs ($k$-out digraph). In 1973 Grusho first proved that with high probability (whp) in a random $k$-out digraph there is a strongly connected component (SCC) of linear size that is reachable from all vertices, i.e., a giant. He also proved that the size of the giant follows a central limit law. We show that whp the part outside the giant contains at most a few short cycles and mostly consists of overlapping tree-like structures. Thus the directed acyclic graph (DAG) of a random $k$-out digraph is almost the same as the digraph with the giant contracted into one vertex. These findings lead to a new, concise and self-contained proof of Grusho's theorem. This work also contains some other results including the structure outside the giant, the phase transition phenomenon in strong connectivity, the typical distance, and an extension to simple digraphs.
Submitted 9 August, 2016; v1 submitted 23 April, 2015;
originally announced April 2015.
-
Explosion and linear transit times in infinite trees
Authors:
Omid Amini,
Luc Devroye,
Simon Griffiths,
Neil Olver
Abstract:
Let $T$ be an infinite rooted tree with weights $w_e$ assigned to its edges. Denote by $m_n(T)$ the minimum weight of a path from the root to a node of the $n$th generation. We consider the possible behaviour of $m_n(T)$ with focus on the two following cases: we say $T$ is explosive if \[ \lim_{n\to \infty}m_n(T) < \infty, \] and say that $T$ exhibits linear growth if \[ \liminf_{n\to \infty} \frac{m_n(T)}{n} > 0. \]
We consider a class of infinite randomly weighted trees related to the Poisson-weighted infinite tree, and determine precisely which trees in this class have linear growth almost surely. We then apply this characterization to obtain new results concerning the event of explosion in infinite randomly weighted spherically-symmetric trees, answering a question of Pemantle and Peres. As a further application, we consider the random real tree generated by attaching sticks of deterministic decreasing lengths, and determine for which sequences of lengths the tree has finite height almost surely.
Submitted 17 November, 2014;
originally announced November 2014.
-
Finding Adam in random growing trees
Authors:
Sébastien Bubeck,
Luc Devroye,
Gábor Lugosi
Abstract:
We investigate algorithms to find the first vertex in large trees generated by either the uniform attachment or preferential attachment model. We require the algorithm to output a set of $K$ vertices, such that, with probability at least $1-ε$, the first vertex is in this set. We show that for any $ε$, there exist such algorithms with $K$ independent of the size of the input tree. Moreover, we provide almost tight bounds for the best value of $K$ as a function of $ε$. In the uniform attachment case we show that the optimal $K$ is subpolynomial in $1/ε$, and that it has to be at least superpolylogarithmic. On the other hand, the preferential attachment case is exponentially harder, as we prove that the best $K$ is polynomial in $1/ε$. We conclude the paper with several open problems.
Submitted 1 December, 2015; v1 submitted 12 November, 2014;
originally announced November 2014.
-
Almost optimal sparsification of random geometric graphs
Authors:
Nicolas Broutin,
Luc Devroye,
Gabor Lugosi
Abstract:
A random geometric irrigation graph $Γ_n(r_n,ξ)$ has $n$ vertices identified by $n$ independent uniformly distributed points $X_1,\ldots,X_n$ in the unit square $[0,1]^2$. Each point $X_i$ selects $ξ_i$ neighbors at random, without replacement, among those points $X_j$ ($j\neq i$) for which $\|X_i-X_j\| < r_n$, and the selected vertices are connected to $X_i$ by an edge. The number $ξ_i$ of the neighbors is an integer-valued random variable, chosen independently with identical distribution for each $X_i$ such that $ξ_i$ satisfies $1\le ξ_i \le κ$ for a constant $κ>1$. We prove that when $r_n = γ_n \sqrt{\log n/n}$ for $γ_n \to \infty$ with $γ_n =o(n^{1/6}/\log^{5/6}n)$, then the random geometric irrigation graph experiences explosive percolation in the sense that when $\mathbf E ξ_i=1$, then the largest connected component has size $o(n)$ but if $\mathbf E ξ_i >1$, then the size of the largest connected component is with high probability $n-o(n)$. This offers a natural non-centralized sparsification of a random geometric graph that is mostly connected.
Submitted 7 March, 2014; v1 submitted 5 March, 2014;
originally announced March 2014.
-
Connectivity of sparse Bluetooth networks
Authors:
Nicolas Broutin,
Luc Devroye,
Gábor Lugosi
Abstract:
Consider a random geometric graph defined on $n$ vertices uniformly distributed in the $d$-dimensional unit torus. Two vertices are connected if their distance is less than a "visibility radius" $r_n$. We consider {\sl Bluetooth networks} that are locally sparsified random geometric graphs. Each vertex selects $c$ of its neighbors in the random geometric graph at random and connects only to the selected points. We show that if the visibility radius is at least of the order of $n^{-(1-δ)/d}$ for some $δ> 0$, then a constant value of $c$ is sufficient for the graph to be connected, with high probability. It suffices to take $c \ge \sqrt{(1+ε)/δ} + K$ for any positive $ε$ where $K$ is a constant depending on $d$ only. On the other hand, with $c\le \sqrt{(1-ε)/δ}$, the graph is disconnected, with high probability.
Submitted 15 February, 2014;
originally announced February 2014.
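
The Bluetooth network construction in the abstract above can be simulated directly for small $n$. The sketch below is an illustration under stated assumptions, not the authors' code: the function names are invented here, the distance is taken to be Euclidean distance on the unit torus, and a vertex with fewer than $c$ visible neighbors is assumed to select all of them.

```python
import random

def bluetooth_graph(n, d, r, c, rng):
    """Sample n uniform points on the d-dimensional unit torus and let each
    vertex pick c of its visible neighbours (distance < r) at random,
    without replacement. Returns the set of undirected edges (i, j), i < j.
    Assumption: a vertex with fewer than c visible neighbours picks all.
    """
    pts = [[rng.random() for _ in range(d)] for _ in range(n)]

    def torus_dist(p, q):
        total = 0.0
        for a, b in zip(p, q):
            delta = abs(a - b)
            delta = min(delta, 1.0 - delta)  # wrap around the torus
            total += delta * delta
        return total ** 0.5

    edges = set()
    for i in range(n):
        visible = [j for j in range(n)
                   if j != i and torus_dist(pts[i], pts[j]) < r]
        for j in rng.sample(visible, min(c, len(visible))):
            edges.add((min(i, j), max(i, j)))
    return edges

def is_connected(n, edges):
    """Depth-first search connectivity check on vertices 0..n-1."""
    adj = {i: [] for i in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n
```

Running `is_connected(n, bluetooth_graph(n, d, r, c, rng))` over many seeds gives an empirical estimate of the connectivity probability for a given pair $(r, c)$; the quadratic neighbor search limits this to moderate $n$.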
-
Protected nodes and fringe subtrees in some random trees
Authors:
Luc Devroye,
Svante Janson
Abstract:
We study protected nodes in various classes of random rooted trees by putting them in the general context of fringe subtrees introduced by Aldous (1991). Several types of random trees are considered: simply generated trees (or conditioned Galton-Watson trees), which includes several cases treated separately by other authors, binary search trees and random recursive trees. This gives unified and simple proofs of several earlier results, as well as new results.
Submitted 2 October, 2013;
originally announced October 2013.
-
Cellular Tree Classifiers
Authors:
Gérard Biau,
Luc Devroye
Abstract:
The cellular tree classifier model addresses a fundamental problem in the design of classifiers for a parallel or distributed computing world: Given a data set, is it sufficient to apply a majority rule for classification, or shall one split the data into two or more parts and send each part to a potentially different computer (or cell) for further processing? At first sight, it seems impossible to define with this paradigm a consistent classifier as no cell knows the "original data size", $n$. However, we show that this is not so by exhibiting two different consistent classifiers. The consistency is universal but is only shown for distributions with nonatomic marginals.
Submitted 25 June, 2013; v1 submitted 20 January, 2013;
originally announced January 2013.
-
Depth properties of scaled attachment random recursive trees
Authors:
Luc Devroye,
Omar Fawzi,
Nicolas Fraiman
Abstract:
We study depth properties of a general class of random recursive trees where each node i attaches to the random node iX_i and X_0, ..., X_n is a sequence of i.i.d. random variables taking values in [0,1). We call such trees scaled attachment random recursive trees (SARRT). We prove that the typical depth D_n, the maximum depth (or height) H_n and the minimum depth M_n of a SARRT are asymptotically given by D_n \sim μ^{-1} \log n, H_n \sim α_{\max} \log n and M_n \sim α_{\min} \log n where μ, α_{\max} and α_{\min} are constants depending only on the distribution of X_0 whenever X_0 has a density. In particular, this gives a new elementary proof for the height of uniform random recursive trees H_n \sim e \log n that does not use branching random walks.
Submitted 26 October, 2012;
originally announced October 2012.
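
The attachment rule in the abstract above is easy to simulate. The sketch below is an illustration only: it assumes that "node iX_i" means node $\lfloor i X_i \rfloor$, that node 0 is the root, and that nodes are labelled 0 through n. The function name `sarrt_depths` and the `sample_x` callback are invented here.

```python
import math
import random

def sarrt_depths(n, sample_x, rng):
    """Build a scaled attachment random recursive tree on nodes 0..n and
    return the list of node depths (root 0 has depth 0).

    sample_x: hypothetical callback taking the random generator and
              returning one draw of X in [0, 1).
    Assumption: node i attaches to node floor(i * X_i).
    """
    depth = [0] * (n + 1)
    for i in range(1, n + 1):
        parent = math.floor(i * sample_x(rng))
        parent = min(parent, i - 1)  # guard against floating-point rounding
        depth[i] = depth[parent] + 1
    return depth
```

With `sample_x = lambda rng: rng.random()` the parent of node i is uniform on 0..i-1, which is the uniform random recursive tree; `max(sarrt_depths(n, sample_x, rng))` then samples the height H_n mentioned in the abstract.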
-
Connectivity of inhomogeneous random graphs
Authors:
Luc Devroye,
Nicolas Fraiman
Abstract:
We find conditions for the connectivity of inhomogeneous random graphs with intermediate density. Our results generalize the classical result for G(n, p), when p = c log n/n. We draw n independent points X_i from a general distribution on a separable metric space, and let their indices form the vertex set of a graph. An edge (i,j) is added with probability min(1, \K(X_i,X_j) log n/n), where \K \ge 0 is a fixed kernel. We show that, under reasonably weak assumptions, the connectivity threshold of the model can be determined.
Submitted 23 October, 2012;
originally announced October 2012.
-
An Affine Invariant $k$-Nearest Neighbor Regression Estimate
Authors:
Gérard Biau,
Luc Devroye,
Vida Dujmovic,
Adam Krzyzak
Abstract:
We design a data-dependent metric in $\mathbb R^d$ and use it to define the $k$-nearest neighbors of a given point. Our metric is invariant under all affine transformations. We show that, with this metric, the standard $k$-nearest neighbor regression estimate is asymptotically consistent under the usual conditions on $k$, and minimal requirements on the input data.
Submitted 18 May, 2012; v1 submitted 3 January, 2012;
originally announced January 2012.
-
Random hyperplane search trees in high dimensions
Authors:
Luc Devroye,
James King
Abstract:
Given a set S of n \geq d points in general position in R^d, a random hyperplane split is obtained by sampling d points uniformly at random without replacement from S and splitting based on their affine hull. A random hyperplane search tree is a binary space partition tree obtained by recursive application of random hyperplane splits. We investigate the structural distributions of such random trees with a particular focus on the growth with d. A blessing of dimensionality arises--as d increases, random hyperplane splits more closely resemble perfectly balanced splits; in turn, random hyperplane search trees more closely resemble perfectly balanced binary search trees.
We prove that, for any fixed dimension d, a random hyperplane search tree storing n points has height at most (1 + O(1/sqrt(d))) log_2 n and average element depth at most (1 + O(1/d)) log_2 n with high probability as n \rightarrow \infty. Further, we show that these bounds are asymptotically optimal with respect to d.
Submitted 2 June, 2011;
originally announced June 2011.
-
Connectivity threshold for Bluetooth graphs
Authors:
Nicolas Broutin,
Luc Devroye,
Nicolas Fraiman,
Gábor Lugosi
Abstract:
We study the connectivity properties of random Bluetooth graphs that model certain "ad hoc" wireless networks. The graphs are obtained as "irrigation subgraphs" of the well-known random geometric graph model. There are two parameters that control the model: the radius $r$ that determines the "visible neighbors" of each node and the number of edges $c$ that each node is allowed to send to these. The randomness comes from the underlying distribution of data points in space and from the choices of each vertex. We prove that no connectivity can take place with high probability for a range of parameters $r, c$ and completely characterize the connectivity threshold (in $c$) for values of $r$ close to the critical value for connectivity in the underlying random geometric graph.
Submitted 2 March, 2011;
originally announced March 2011.
-
On explosions in heavy-tailed branching random walks
Authors:
Omid Amini,
Luc Devroye,
Simon Griffiths,
Neil Olver
Abstract:
Consider a branching random walk on $\mathbb{R}$, with offspring distribution Z and nonnegative displacement distribution W. We say that explosion occurs if an infinite number of particles may be found within a finite distance of the origin. In this paper, we investigate this phenomenon when the offspring distribution Z is heavy-tailed. Under an appropriate condition, we are able to characterize the pairs (Z, W) for which explosion occurs, by demonstrating the equivalence of explosion with a seemingly much weaker event: that the sum over generations of the minimum displacement in each generation is finite. Furthermore, we demonstrate that our condition on the tail is best possible for this equivalence to occur. We also investigate, under additional smoothness assumptions, the behavior of $M_n$, the position of the particle in generation n closest to the origin, when explosion does not occur (and hence $\lim_{n\rightarrow\infty}M_n=\infty$).
Submitted 14 June, 2013; v1 submitted 4 February, 2011;
originally announced February 2011.
-
Sub-Gaussian tail bounds for the width and height of conditioned Galton--Watson trees
Authors:
Louigi Addario-Berry,
Luc Devroye,
Svante Janson
Abstract:
We study the height and width of a Galton--Watson tree with offspring distribution B satisfying E(B)=1, 0 < Var(B) < infinity, conditioned on having exactly n nodes. Under this conditioning, we derive sub-Gaussian tail bounds for both the width (largest number of nodes in any level) and height (greatest level containing a node); the bounds are optimal up to constant factors in the exponent. Under the same conditioning, we also derive essentially optimal upper tail bounds for the number of nodes at level k, for 1 <= k <= n.
Submitted 17 November, 2010;
originally announced November 2010.
-
Copulas in three dimensions with prescribed correlations
Authors:
Luc Devroye,
Gerard Letac
Abstract:
Given an arbitrary three-dimensional correlation matrix, we prove that there exists a three-dimensional joint distribution for the random variable $(X,Y,Z)$ such that $X$,$Y$ and $Z$ are identically distributed with beta distribution $β_{k,k}(dx)$ on $(0,1)$ if $k\geq 1/2$. This implies that any correlation structure can be attained for three-dimensional copulas.
Submitted 19 April, 2010;
originally announced April 2010.
-
On combinatorial testing problems
Authors:
Louigi Addario-Berry,
Nicolas Broutin,
Luc Devroye,
Gábor Lugosi
Abstract:
We study a class of hypothesis testing problems in which, upon observing the realization of an $n$-dimensional Gaussian vector, one has to decide whether the vector was drawn from a standard normal distribution or, alternatively, whether there is a subset of the components belonging to a certain given class of sets whose elements have been ``contaminated,'' that is, have a mean different from zero. We establish some general conditions under which testing is possible and others under which testing is hopeless with a small risk. The combinatorial and geometric structure of the class of sets is shown to play a crucial role. The bounds are illustrated on various examples.
Submitted 19 November, 2010; v1 submitted 24 August, 2009;
originally announced August 2009.