-
An asymptotically optimal algorithm for generating bin cardinalities
Authors:
Luc Devroye,
Dimitrios Los
Abstract:
In the balls-into-bins setting, $n$ balls are thrown uniformly at random into $n$ bins. The naïve way to generate the final load vector takes $Θ(n)$ time. However, it is well-known that this load vector has with high probability bin cardinalities of size $Θ(\frac{\log n}{\log \log n})$. Here, we present an algorithm in the RAM model that generates the bin cardinalities of the final load vector in the optimal $Θ(\frac{\log n}{\log \log n})$ time in expectation and with high probability.
Further, the algorithm that we present is still optimal for any $m \in [n, n \log n]$ balls and can also be used as a building block to efficiently simulate more involved load balancing algorithms. In particular, for the Two-Choice algorithm, which samples two bins in each step and allocates to the least-loaded of the two, we obtain roughly a quadratic speed-up over the naïve simulation.
Submitted 10 April, 2024;
originally announced April 2024.
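For orientation, the naïve $Θ(n)$ simulation that the abstract above uses as its baseline can be sketched as follows. This is not the paper's algorithm. It is only the straightforward simulation, with "bin cardinalities" read as the number of bins at each load value (an interpretation of the abstract). The function names and the tie-breaking rule in Two-Choice are assumptions made for illustration.

```python
import random
from collections import Counter

def one_choice_loads(n, m, rng):
    """Throw m balls uniformly at random into n bins; return the load vector."""
    loads = [0] * n
    for _ in range(m):
        loads[rng.randrange(n)] += 1
    return loads

def two_choice_loads(n, m, rng):
    """Two-Choice: sample two bins per ball, allocate to the less loaded one."""
    loads = [0] * n
    for _ in range(m):
        i, j = rng.randrange(n), rng.randrange(n)
        # Assumption: ties are broken in favour of the first sampled bin.
        target = i if loads[i] <= loads[j] else j
        loads[target] += 1
    return loads

def bin_cardinalities(loads):
    """Map each load value k to the number of bins holding exactly k balls."""
    return dict(Counter(loads))

rng = random.Random(0)
n = 10_000
one = one_choice_loads(n, n, rng)
two = two_choice_loads(n, n, rng)
print(sorted(bin_cardinalities(one).items()))
print(sorted(bin_cardinalities(two).items()))
```

The printed summaries have one entry per distinct load value, which is why they are much shorter than the load vectors themselves.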
-
A note on estimating the dimension from a random geometric graph
Authors:
Caelan Atamanchuk,
Luc Devroye,
Gabor Lugosi
Abstract:
Let $G_n$ be a random geometric graph with vertex set $[n]$ based on $n$ i.i.d. random vectors $X_1,\ldots,X_n$ drawn from an unknown density $f$ on $\mathbb{R}^d$. An edge $(i,j)$ is present when $\|X_i -X_j\| \le r_n$, for a given threshold $r_n$ possibly depending upon $n$, where $\| \cdot \|$ denotes Euclidean distance. We study the problem of estimating the dimension $d$ of the underlying space when we have access to the adjacency matrix of the graph but do not know $r_n$ or the vectors $X_i$. The main result of the paper is that there exists an estimator of $d$ that converges to $d$ in probability as $n \to \infty$ for all densities with $\int f^5 < \infty$ whenever $n^{3/2} r_n^d \to \infty$ and $r_n = o(1)$. The conditions allow very sparse graphs since when $n^{3/2} r_n^d \to 0$, the graph contains isolated edges only, with high probability. We also show that, without any condition on the density, a consistent estimator of $d$ exists when $n r_n^d \to \infty$ and $r_n = o(1)$.
Submitted 21 November, 2023;
originally announced November 2023.
-
An Algorithm to Recover Shredded Random Matrices
Authors:
Caelan Atamanchuk,
Luc Devroye,
Massimo Vicenzo
Abstract:
Given some binary matrix $M$, suppose we are presented with the collection of its rows and columns in independent arbitrary orderings. From this information, are we able to recover the unique original orderings and matrix? We present an algorithm that identifies whether there is a unique ordering associated with a set of rows and columns, and outputs either the unique correct orderings for the rows and columns or the full collection of all valid orderings and valid matrices. We show that there is a constant $c > 0$ such that the algorithm terminates in $O(n^2)$ time with high probability and in expectation for random $n \times n$ binary matrices with i.i.d. Bernoulli($p$) entries $(m_{ij})_{i,j=1}^n$ such that $\frac{c\log^2(n)}{n(\log\log(n))^2} \leq p \leq \frac{1}{2}$.
Submitted 23 April, 2024; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Two-way Linear Probing Revisited
Authors:
Ketan Dalal,
Luc Devroye,
Ebrahim Malalla
Abstract:
We introduce linear probing hashing schemes that construct a hash table of size $n$, with constant load factor $α$, on which the worst-case unsuccessful search time is asymptotically almost surely $O(\log \log n)$. The schemes employ two linear probe sequences to find empty cells for the keys. Matching lower bounds on the maximum cluster size produced by any algorithm that uses two linear probe sequences are obtained as well.
Submitted 18 September, 2023; v1 submitted 11 September, 2023;
originally announced September 2023.
-
Broadcasting in random recursive dags
Authors:
Simon Briend,
Luc Devroye,
Gabor Lugosi
Abstract:
A uniform $k$-DAG generalizes the uniform random recursive tree by picking $k$ parents uniformly at random from the existing nodes. It starts with $k$ "roots". Each of the $k$ roots is assigned a bit. These bits are propagated by a noisy channel. The parents' bits are flipped with probability $p$, and a majority vote is taken. When all nodes have received their bits, the $k$-DAG is shown without identifying the roots. The goal is to estimate the majority bit among the roots. We identify the threshold for $p$ as a function of $k$ below which the majority rule among all nodes yields an error $c+o(1)$ with $c<1/2$. Above the threshold the majority rule errs with probability $1/2+o(1)$.
Submitted 24 February, 2024; v1 submitted 2 June, 2023;
originally announced June 2023.
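The broadcasting process in the abstract above can be simulated along the following lines. The abstract leaves several details open, so this sketch makes labelled assumptions: parents are sampled with replacement, each received bit is flipped independently, and $k$ is odd so that the majority vote never ties. The names `simulate_k_dag` and `majority` are hypothetical.

```python
import random

def simulate_k_dag(root_bits, num_nodes, p, rng):
    """Grow a uniform k-DAG with noisy majority broadcasting; return all bits.

    Assumptions (not fixed by the abstract): k = len(root_bits) is odd,
    parents are drawn with replacement, and flips are independent.
    """
    k = len(root_bits)
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd number of roots")
    bits = list(root_bits)
    while len(bits) < num_nodes:
        ones = 0
        for _ in range(k):
            b = bits[rng.randrange(len(bits))]  # uniform parent among existing nodes
            if rng.random() < p:                # noisy channel flips the bit
                b ^= 1
            ones += b
        bits.append(1 if 2 * ones > k else 0)   # majority vote of received bits
    return bits

def majority(bits):
    """Return 1 if strictly more than half of the bits are 1, else 0."""
    return 1 if 2 * sum(bits) > len(bits) else 0

rng = random.Random(1)
roots = [1, 1, 0]
bits = simulate_k_dag(roots, 2000, 0.05, rng)
print(majority(roots), majority(bits))
```

Comparing `majority(roots)` with `majority(bits)` over many runs gives an empirical error rate for the majority rule at a given $p$.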
-
Subtractive random forests
Authors:
Nicolas Broutin,
Luc Devroye,
Gabor Lugosi,
Roberto Imbuzeiro Oliveira
Abstract:
Motivated by online recommendation systems, we study a family of random forests. The vertices of the forest are labeled by integers. Each non-positive integer $i\le 0$ is the root of a tree. Vertices labeled by positive integers $n \ge 1$ are attached sequentially such that the parent of vertex $n$ is $n-Z_n$, where the $Z_n$ are i.i.d. random variables taking values in $\mathbb N$. We study several characteristics of the resulting random forest. In particular, we establish bounds for the expected tree sizes, the number of trees in the forest, the number of leaves, the maximum degree, and the height of the forest. We show that for all distributions of the $Z_n$, the forest contains at most one infinite tree, almost surely. If ${\mathbb E} Z_n < \infty$, then there is a unique infinite tree and the total size of the remaining trees is finite, with finite expected value if ${\mathbb E}Z_n^2 < \infty$. If ${\mathbb E} Z_n = \infty$ then almost surely all trees are finite.
Submitted 25 February, 2024; v1 submitted 19 October, 2022;
originally announced October 2022.
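The attachment rule in the abstract above (the parent of vertex $n$ is $n - Z_n$) is easy to simulate for a finite prefix of the forest. The sketch below assumes that $\mathbb N$ means $\{1, 2, \ldots\}$ and uses a geometric distribution for $Z_n$ purely as an example. The function names are hypothetical.

```python
import random
from collections import Counter

def geometric(rng, q):
    """Sample Z in {1, 2, ...} with P(Z = k) = (1 - q)**(k - 1) * q."""
    z = 1
    while rng.random() >= q:
        z += 1
    return z

def subtractive_forest(num_vertices, sample_z, rng):
    """Attach vertices 1..num_vertices; return parent and root of each vertex."""
    parent, root = {}, {}
    for n in range(1, num_vertices + 1):
        p = n - sample_z(rng)               # parent label, always smaller than n
        parent[n] = p
        root[n] = p if p <= 0 else root[p]  # non-positive labels are roots
    return parent, root

rng = random.Random(0)
parent, root = subtractive_forest(1000, lambda r: geometric(r, 0.3), rng)
# Tree sizes count only the positive-labelled vertices attached so far.
tree_sizes = Counter(root.values())
print(len(tree_sizes), tree_sizes.most_common(3))
```

Swapping the sampler for a heavy-tailed distribution gives a quick empirical view of how the number of non-trivial trees changes with the law of $Z_n$.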
-
Consistent Density Estimation Under Discrete Mixture Models
Authors:
Luc Devroye,
Alex Dytso
Abstract:
This work considers the problem of estimating a mixing probability density $f$ in the setting of discrete mixture models. The paper consists of three parts.
The first part focuses on the construction of an $L_1$ consistent estimator of $f$. In particular, under the assumptions that the probability measure $μ$ of the observation is atomic, and the map from $f$ to $μ$ is bijective, it is shown that there exists an estimator $f_n$ such that, for every density $f$, $\lim_{n\to \infty} \mathbb{E} \left[ \int |f_n -f | \right]=0$.
The second part discusses the implementation details. Specifically, it is shown that the consistency for every $f$ can be attained with a computationally feasible estimator.
The third part, as a case study, considers a Poisson mixture model. In particular, it is shown that in the Poisson noise setting, the bijection condition holds and, hence, estimation can be performed consistently for every $f$.
Submitted 10 May, 2021; v1 submitted 3 May, 2021;
originally announced May 2021.
-
On Mean Estimation for Heteroscedastic Random Variables
Authors:
Luc Devroye,
Silvio Lattanzi,
Gabor Lugosi,
Nikita Zhivotovskiy
Abstract:
We study the problem of estimating the common mean $μ$ of $n$ independent symmetric random variables with different and unknown standard deviations $σ_1 \le σ_2 \le \cdots \le σ_n$. We show that, under some mild regularity assumptions on the distribution, there is a fully adaptive estimator $\widehat{μ}$ that is invariant to permutations of the elements of the sample and satisfies, up to logarithmic factors and with high probability, \[ |\widehat{μ} - μ| \lesssim \min\left\{σ_{m^*}, \frac{\sqrt{n}}{\sum_{i = \sqrt{n}}^n σ_i^{-1}} \right\}~, \] where the index $m^* \lesssim \sqrt{n}$ satisfies $m^* \approx \sqrt{σ_{m^*}\sum_{i = m^*}^n σ_i^{-1}}$.
Submitted 22 October, 2020;
originally announced October 2020.
-
Probabilistic Analysis of RRT Trees
Authors:
Konrad Anand,
Luc Devroye
Abstract:
This thesis presents analysis of the properties and run-time of the Rapidly-exploring Random Tree (RRT) algorithm. It is shown that the time for the RRT with stepsize $ε$ to grow close to every point in the $d$-dimensional unit cube is $Θ\left(\frac1{ε^d} \log \left(\frac1ε\right)\right)$. Also, the time it takes for the tree to reach a region of positive probability is $O\left(ε^{-\frac32}\right)$. Finally, a relationship is shown to the Nearest Neighbour Tree (NNT). This relationship shows that the total Euclidean path length after $n$ steps is $O(\sqrt n)$ and the expected height of the tree is bounded above by $(e + o(1)) \log n$.
Submitted 3 May, 2020;
originally announced May 2020.
-
Discrete minimax estimation with trees
Authors:
Luc Devroye,
Tommy Reddad
Abstract:
We propose a simple recursive data-based partitioning scheme which produces piecewise-constant or piecewise-linear density estimates on intervals, and show how this scheme can determine the optimal $L_1$ minimax rate for some discrete nonparametric classes.
Submitted 27 June, 2019; v1 submitted 14 December, 2018;
originally announced December 2018.
-
On the discovery of the seed in uniform attachment trees
Authors:
Luc Devroye,
Tommy Reddad
Abstract:
We investigate the size of vertex confidence sets for including part of (or the entirety of) the seed in seeded uniform attachment trees, given knowledge of some of the seed's properties, and with a prescribed probability of failure. We also study the problem of identifying the leaves of a seed in a seeded uniform attachment tree, given knowledge of the positions of all internal nodes of the seed.
Submitted 22 February, 2019; v1 submitted 1 October, 2018;
originally announced October 2018.
-
Remote Sampling with Applications to General Entanglement Simulation
Authors:
Gilles Brassard,
Luc Devroye,
Claude Gravel
Abstract:
We show how to sample exactly discrete probability distributions whose defining parameters are distributed among remote parties. For this purpose, von Neumann's rejection algorithm is turned into a distributed sampling communication protocol. We study the expected number of bits communicated among the parties and also exhibit a trade-off between the number of rounds of the rejection algorithm and the number of bits transmitted in the initial phase. Finally, we apply remote sampling to the simulation of quantum entanglement in its most general form possible, when an arbitrary number of parties share systems of arbitrary dimensions on which they apply arbitrary measurements (not restricted to being projective measurements). In case the dimension of the systems and the number of possible outcomes per party is bounded by a constant, it suffices to communicate an expected O(m^2) bits in order to simulate exactly the outcomes that these measurements would have produced on those systems, where m is the number of participants.
Submitted 17 July, 2018;
originally announced July 2018.
-
The Minimax Learning Rates of Normal and Ising Undirected Graphical Models
Authors:
Luc Devroye,
Abbas Mehrabian,
Tommy Reddad
Abstract:
Let $G$ be an undirected graph with $m$ edges and $d$ vertices. We show that $d$-dimensional Ising models on $G$ can be learned from $n$ i.i.d. samples within expected total variation distance some constant factor of $\min\{1, \sqrt{(m + d)/n}\}$, and that this rate is optimal. We show that the same rate holds for the class of $d$-dimensional multivariate normal undirected graphical models with respect to $G$. We also identify the optimal rate of $\min\{1, \sqrt{m/n}\}$ for Ising models with no external magnetic field.
Submitted 3 June, 2020; v1 submitted 18 June, 2018;
originally announced June 2018.
-
Local optima of the Sherrington-Kirkpatrick Hamiltonian
Authors:
Louigi Addario-Berry,
Luc Devroye,
Gabor Lugosi,
Roberto Imbuzeiro Oliveira
Abstract:
We study local optima of the Hamiltonian of the Sherrington-Kirkpatrick model. We compute the exponent of the expected number of local optima and determine the "typical" value of the Hamiltonian.
Submitted 20 December, 2017;
originally announced December 2017.
-
Notes on Growing a Tree in a Graph
Authors:
Luc Devroye,
Vida Dujmović,
Alan Frieze,
Abbas Mehrabian,
Pat Morin,
Bruce Reed
Abstract:
We study the height of a spanning tree $T$ of a graph $G$ obtained by starting with a single vertex of $G$ and repeatedly selecting, uniformly at random, an edge of $G$ with exactly one endpoint in $T$ and adding this edge to $T$.
Submitted 4 July, 2017; v1 submitted 30 June, 2017;
originally announced July 2017.
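The growth process in the abstract above can be sketched directly: keep the set of edges with exactly one endpoint in $T$, pick one uniformly at random, and add it. The version below recomputes that edge set at every step for clarity rather than speed, and assumes a simple connected graph given as adjacency lists. The function name `grow_tree` and the cycle example are illustrative choices.

```python
import random

def grow_tree(adj, start, rng):
    """Grow a spanning tree by repeatedly adding a uniform random boundary edge.

    adj maps each vertex to a list of neighbours (simple undirected graph).
    Returns (parent, depth) dictionaries for the resulting rooted tree.
    """
    parent = {start: None}
    depth = {start: 0}
    while len(depth) < len(adj):
        # All edges (u, v) with u inside the tree and v outside it.
        boundary = [(u, v) for u in depth for v in adj[u] if v not in depth]
        if not boundary:
            break  # the graph is disconnected; the tree spans one component
        u, v = rng.choice(boundary)
        parent[v] = u
        depth[v] = depth[u] + 1
    return parent, depth

# Example graph (an assumption for illustration): a cycle on 20 vertices.
n = 20
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
parent, depth = grow_tree(adj, 0, random.Random(0))
height = max(depth.values())
print(height)
```

On a cycle the tree is a path through the start vertex, so its height is the longer of the two arms, which makes this a convenient sanity check before trying denser graphs.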
-
An analysis of budgeted parallel search on conditional Galton-Watson trees
Authors:
David Avis,
Luc Devroye
Abstract:
Recently Avis and Jordan have demonstrated the efficiency of a simple technique called budgeting for the parallelization of a number of tree search algorithms. The idea is to limit the amount of work that a processor performs before it terminates its search and returns any unexplored nodes to a master process. This limit is set by a critical budget parameter which determines the overhead of the process. In this paper we study the behaviour of the budget parameter on conditional Galton-Watson trees obtaining asymptotically tight bounds on this overhead. We present empirical results to show that this bound is surprisingly accurate in practice.
Submitted 5 September, 2019; v1 submitted 30 March, 2017;
originally announced March 2017.
-
The heavy path approach to Galton-Watson trees with an application to Apollonian networks
Authors:
Luc Devroye,
Cecilia Holmgren,
Henning Sulzbach
Abstract:
We study the heavy path decomposition of conditional Galton-Watson trees. In a standard Galton-Watson tree conditional on its size $n$, we order all children by their subtree sizes, from large (heavy) to small. A node is marked if it is among the $k$ heaviest nodes among its siblings. Unmarked nodes and their subtrees are removed, leaving only a tree of marked nodes, which we call the $k$-heavy tree. We study various properties of these trees, including their size and the maximal distance from any original node to the $k$-heavy tree. In particular, under some moment condition, the $2$-heavy tree is with high probability larger than $cn$ for some constant $c > 0$, and the maximal distance from the $k$-heavy tree is $O(n^{1/(k+1)})$ in probability. As a consequence, for uniformly random Apollonian networks of size $n$, the expected size of the longest simple path is $Ω(n)$.
Submitted 10 January, 2017;
originally announced January 2017.
-
The expected bit complexity of the von Neumann rejection algorithm
Authors:
Luc Devroye,
Claude Gravel
Abstract:
In 1952, von Neumann introduced the rejection method for random variate generation. We revisit this algorithm when we have a source of perfect bits at our disposal. In this random bit model, there are universal lower bounds for generating a random variate with a given density to within an accuracy $ε$ derived by Knuth and Yao, and refined by the authors. In general, von Neumann's method fails in this model. We propose a modification that ensures proper behavior for all Riemann-integrable densities on compact sets, and show that the expected number of random bits needed behaves optimally with respect to universal lower bounds. In particular, we introduce the notion of an oracle that evaluates the supremum and infimum of a function on any rectangle of $\mathbb{R}^{d}$, and develop a quadtree-style extension of the classical rejection method.
Submitted 2 April, 2016; v1 submitted 6 November, 2015;
originally announced November 2015.
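As background for the abstract above, the classical rejection method can be sketched as follows. This is the textbook real-number version, which assumes access to ideal uniform variates. It is not the bit-model modification or the quadtree-style extension that the paper proposes. The target density (Beta(2,2) on $[0,1]$) and the names `rejection_sample` and `beta22` are assumptions chosen for illustration.

```python
import random

def rejection_sample(f, bound, rng):
    """Draw one variate from a density f on [0, 1], assuming f(x) <= bound."""
    while True:
        x = rng.random()          # proposal: uniform on [0, 1)
        u = rng.random() * bound  # uniform height under the constant envelope
        if u <= f(x):             # accept when the point falls under the density
            return x

def beta22(x):
    """Beta(2, 2) density on [0, 1]; its maximum is 1.5 at x = 0.5."""
    return 6.0 * x * (1.0 - x)

rng = random.Random(0)
samples = [rejection_sample(beta22, 1.5, rng) for _ in range(5000)]
print(sum(samples) / len(samples))
```

With a uniform proposal and a constant envelope, the expected number of iterations per variate equals the bound, here 1.5.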
-
The graph structure of a deterministic automaton chosen at random: full version
Authors:
Xing Shi Cai,
Luc Devroye
Abstract:
A deterministic finite automaton (DFA) of $n$ states over a $k$-letter alphabet can be seen as a digraph with $n$ vertices which all have exactly $k$ labeled out-arcs ($k$-out digraph). In 1973 Grusho first proved that with high probability (whp) in a random $k$-out digraph there is a strongly connected component (SCC) of linear size that is reachable from all vertices, i.e., a giant. He also proved that the size of the giant follows a central limit law. We show that whp the part outside the giant contains at most a few short cycles and mostly consists of overlapping tree-like structures. Thus the directed acyclic graph (DAG) of a random $k$-out digraph is almost the same as the digraph with the giant contracted into one vertex. These findings lead to a new, concise and self-contained proof of Grusho's theorem. This work also contains some other results including the structure outside the giant, the phase transition phenomenon in strong connectivity, the typical distance, and an extension to simple digraphs.
Submitted 9 August, 2016; v1 submitted 23 April, 2015;
originally announced April 2015.
-
Random variate generation using only finitely many unbiased, independently and identically distributed random bits
Authors:
Luc Devroye,
Claude Gravel
Abstract:
For any discrete probability distribution with bounded entropy, we can generate exactly a random variate using only a finite expected number of perfect coin flips. A perfect coin flip is the outcome of an unbiased Bernoulli random variable. Coin flips are unbiased, independently and identically distributed in all our work. We survey well-known algorithms for the discrete case such as the one from Knuth and Yao as well as the one from Han and Hoshi. We also briefly discuss a practical implementation of the algorithm proposed by Knuth and Yao. For the continuous case, only approximations can be hoped for. The freedom to choose the accuracy for the approximations matters, and, for that, we propose to measure accuracy in terms of the Wasserstein $L_\infty$-metric. We derive a universal lower bound for the expected number of perfect coin flips required to reach a desired accuracy. We also provide several algorithms for absolutely continuous distributions that come within our universal lower bound.
Submitted 10 November, 2020; v1 submitted 9 February, 2015;
originally announced February 2015.
-
Finding Adam in random growing trees
Authors:
Sébastien Bubeck,
Luc Devroye,
Gábor Lugosi
Abstract:
We investigate algorithms to find the first vertex in large trees generated by either the uniform attachment or preferential attachment model. We require the algorithm to output a set of $K$ vertices, such that, with probability at least $1-ε$, the first vertex is in this set. We show that for any $ε$, there exist such algorithms with $K$ independent of the size of the input tree. Moreover, we provide almost tight bounds for the best value of $K$ as a function of $ε$. In the uniform attachment case we show that the optimal $K$ is subpolynomial in $1/ε$, and that it has to be at least superpolylogarithmic. On the other hand, the preferential attachment case is exponentially harder, as we prove that the best $K$ is polynomial in $1/ε$. We conclude the paper with several open problems.
Submitted 1 December, 2015; v1 submitted 12 November, 2014;
originally announced November 2014.
-
Almost optimal sparsification of random geometric graphs
Authors:
Nicolas Broutin,
Luc Devroye,
Gabor Lugosi
Abstract:
A random geometric irrigation graph $Γ_n(r_n,ξ)$ has $n$ vertices identified by $n$ independent uniformly distributed points $X_1,\ldots,X_n$ in the unit square $[0,1]^2$. Each point $X_i$ selects $ξ_i$ neighbors at random, without replacement, among those points $X_j$ ($j\neq i$) for which $\|X_i-X_j\| < r_n$, and the selected vertices are connected to $X_i$ by an edge. The number $ξ_i$ of the neighbors is an integer-valued random variable, chosen independently with identical distribution for each $X_i$ such that $ξ_i$ satisfies $1\le ξ_i \le κ$ for a constant $κ>1$. We prove that when $r_n = γ_n \sqrt{\log n/n}$ for $γ_n \to \infty$ with $γ_n =o(n^{1/6}/\log^{5/6}n)$, then the random geometric irrigation graph experiences explosive percolation in the sense that when $\mathbf E ξ_i=1$, then the largest connected component has size $o(n)$ but if $\mathbf E ξ_i >1$, then the size of the largest connected component is with high probability $n-o(n)$. This offers a natural non-centralized sparsification of a random geometric graph that is mostly connected.
Submitted 7 March, 2014; v1 submitted 5 March, 2014;
originally announced March 2014.
-
Connectivity of sparse Bluetooth networks
Authors:
Nicolas Broutin,
Luc Devroye,
Gábor Lugosi
Abstract:
Consider a random geometric graph defined on $n$ vertices uniformly distributed in the $d$-dimensional unit torus. Two vertices are connected if their distance is less than a "visibility radius" $r_n$. We consider Bluetooth networks that are locally sparsified random geometric graphs. Each vertex selects $c$ of its neighbors in the random geometric graph at random and connects only to the selected points. We show that if the visibility radius is at least of the order of $n^{-(1-δ)/d}$ for some $δ> 0$, then a constant value of $c$ is sufficient for the graph to be connected, with high probability. It suffices to take $c \ge \sqrt{(1+ε)/δ} + K$ for any positive $ε$ where $K$ is a constant depending on $d$ only. On the other hand, with $c\le \sqrt{(1-ε)/δ}$, the graph is disconnected, with high probability.
Submitted 15 February, 2014;
originally announced February 2014.
-
The Analysis of Kademlia for random IDs
Authors:
Xing Shi Cai,
Luc Devroye
Abstract:
Kademlia is the de facto standard searching algorithm for P2P (peer-to-peer) networks on the Internet. In our earlier work, we introduced two slightly different models for Kademlia and studied how many steps it takes to search for a target node by using Kademlia's searching algorithm. The first model, in which nodes of the network are labelled with deterministic IDs, had been discussed in that paper. The second one, in which nodes are labelled with random IDs, which we call the Random ID Model, was only briefly mentioned. Refined results with detailed proofs for this model are given in this paper. Our analysis shows that with high probability it takes about $c \log n$ steps to locate any node, where $n$ is the total number of nodes in the network and $c$ is a constant that does not depend on $n$.
Submitted 12 May, 2015; v1 submitted 5 February, 2014;
originally announced February 2014.
-
A Probabilistic Analysis of Kademlia Networks
Authors:
Xing Shi Cai,
Luc Devroye
Abstract:
Kademlia is currently the most widely used searching algorithm in P2P (peer-to-peer) networks. This work studies an essential question about Kademlia from a mathematical perspective: how long does it take to locate a node in the network? To answer it, we introduce a random graph K and study how many steps are needed to locate a given vertex in K using Kademlia's algorithm, which we call the routing time. Two slightly different versions of K are studied. In the first one, vertices of K are labelled with fixed IDs. In the second one, vertices are assumed to have randomly selected IDs. In both cases, we show that the routing time is about c*log(n), where n is the number of nodes in the network and c is an explicitly described constant.
Submitted 23 September, 2013;
originally announced September 2013.
-
Exact simulation of the GHZ distribution
Authors:
Gilles Brassard,
Luc Devroye,
Claude Gravel
Abstract:
John Bell has shown that the correlations entailed by quantum mechanics cannot be reproduced by a classical process involving non-communicating parties. But can they be simulated with the help of bounded communication? This problem has been studied for more than two decades and it is now well understood in the case of bipartite entanglement. However, the issue was still widely open for multipartite entanglement, even for the simplest case, which is the tripartite Greenberger-Horne-Zeilinger (GHZ) state. We give an exact simulation of arbitrary independent von Neumann measurements on general n-partite GHZ states. Our protocol requires O(n^2) bits of expected communication between the parties, and O(n log n) expected time is sufficient to carry it out in parallel. Furthermore, we need only an expectation of O(n) independent unbiased random bits, with no need for the generation of continuous real random variables nor prior shared random variables. In the case of equatorial measurements, we improve on the prior art with a protocol that needs only O(n log n) bits of communication and O(log^2 n) parallel time. At the cost of a slight increase in the number of bits communicated, these tasks can be accomplished with a constant expected number of rounds.
Submitted 17 May, 2015; v1 submitted 24 March, 2013;
originally announced March 2013.
-
Prediction by Random-Walk Perturbation
Authors:
Luc Devroye,
Gábor Lugosi,
Gergely Neu
Abstract:
We propose a version of the follow-the-perturbed-leader online prediction algorithm in which the cumulative losses are perturbed by independent symmetric random walks. The forecaster is shown to achieve an expected regret of the optimal order O(sqrt(n log N)) where n is the time horizon and N is the number of experts. More importantly, it is shown that the forecaster changes its prediction at most O(sqrt(n log N)) times, in expectation. We also extend the analysis to online combinatorial optimization and show that even in this more general setting, the forecaster rarely switches between experts while having a regret of near-optimal order.
Submitted 23 February, 2013;
originally announced February 2013.
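The forecaster described in the abstract above can be sketched as follows. This is an illustrative sketch only: the function name `random_walk_ftpl` is hypothetical, losses are assumed to lie in $[0, 1]$, and the step distribution of the walks (independent $\pm 1/2$ steps with equal probability) is an assumption, since the abstract only says the walks are independent and symmetric.

```python
import random


def random_walk_ftpl(losses, rng):
    """Follow-the-perturbed-leader with random-walk perturbations.

    losses[t][i] is the loss of expert i in round t (assumed in [0, 1]).
    Returns (total_loss, number_of_switches).
    """
    num_experts = len(losses[0])
    cumulative = [0.0] * num_experts   # cumulative losses of past rounds
    walk = [0.0] * num_experts         # one symmetric random walk per expert
    total_loss = 0.0
    switches = 0
    previous = None
    for round_losses in losses:
        # Assumption: each walk takes an independent +-1/2 step every round.
        for i in range(num_experts):
            walk[i] += rng.choice((-0.5, 0.5))
        # Follow the leader of the perturbed cumulative losses.
        leader = min(range(num_experts), key=lambda i: cumulative[i] - walk[i])
        if previous is not None and leader != previous:
            switches += 1
        previous = leader
        total_loss += round_losses[leader]
        # Losses are revealed only after the prediction is made.
        for i in range(num_experts):
            cumulative[i] += round_losses[i]
    return total_loss, switches
```

Because each perturbation changes by a single small step per round rather than being redrawn, the perturbed leader tends to persist between rounds, which is the intuition behind the small number of switches stated in the abstract.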
-
Cellular Tree Classifiers
Authors:
Gérard Biau,
Luc Devroye
Abstract:
The cellular tree classifier model addresses a fundamental problem in the design of classifiers for a parallel or distributed computing world: Given a data set, is it sufficient to apply a majority rule for classification, or shall one split the data into two or more parts and send each part to a potentially different computer (or cell) for further processing? At first sight, it seems impossible to define with this paradigm a consistent classifier as no cell knows the "original data size", $n$. However, we show that this is not so by exhibiting two different consistent classifiers. The consistency is universal but is only shown for distributions with nonatomic marginals.
Submitted 25 June, 2013; v1 submitted 20 January, 2013;
originally announced January 2013.
-
A Note on Interference in Random Point Sets
Authors:
Luc Devroye,
Pat Morin
Abstract:
The (maximum receiver-centric) interference of a geometric graph (von Rickenbach et al. (2005)) is studied. It is shown that, with high probability, the following results hold for a set, V, of n points independently and uniformly distributed in the unit d-cube, for constant dimension d: (1) there exists a connected graph with vertex set V that has interference O((log n)^{1/3}); (2) no connected graph with vertex set V has interference o((log n)^{1/4}); and (3) the minimum spanning tree of V has interference Theta((log n)^{1/2}).
Submitted 12 June, 2012; v1 submitted 27 February, 2012;
originally announced February 2012.
-
Random hyperplane search trees in high dimensions
Authors:
Luc Devroye,
James King
Abstract:
Given a set S of n \geq d points in general position in R^d, a random hyperplane split is obtained by sampling d points uniformly at random without replacement from S and splitting based on their affine hull. A random hyperplane search tree is a binary space partition tree obtained by recursive application of random hyperplane splits. We investigate the structural distributions of such random trees with a particular focus on the growth with d. A blessing of dimensionality arises--as d increases, random hyperplane splits more closely resemble perfectly balanced splits; in turn, random hyperplane search trees more closely resemble perfectly balanced binary search trees.
We prove that, for any fixed dimension d, a random hyperplane search tree storing n points has height at most (1 + O(1/sqrt(d))) log_2 n and average element depth at most (1 + O(1/d)) log_2 n with high probability as n \rightarrow \infty. Further, we show that these bounds are asymptotically optimal with respect to d.
Submitted 2 June, 2011;
originally announced June 2011.
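A single random hyperplane split, as defined in the abstract above, can be sketched in pure Python. This is an assumption-laden illustration rather than the authors' implementation: the helper names (`determinant`, `side_of_hyperplane`, `random_hyperplane_split`) are hypothetical, and the side of a point is decided by the sign of a $d \times d$ determinant, which is one standard way to orient a point against the affine hull of $d$ points in general position.

```python
import random


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(row) for row in matrix]
    size = len(m)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for k in range(col, size):
                m[r][k] -= factor * m[col][k]
    return det


def side_of_hyperplane(defining, x):
    """Signed orientation of x relative to the affine hull of d points."""
    base = defining[0]
    rows = [[a - b for a, b in zip(p, base)] for p in defining[1:]]
    rows.append([a - b for a, b in zip(x, base)])
    return determinant(rows)


def random_hyperplane_split(points, rng):
    """Split points in R^d by the affine hull of d uniformly sampled points."""
    d = len(points[0])
    chosen = set(rng.sample(range(len(points)), d))  # without replacement
    defining = [points[i] for i in sorted(chosen)]
    left, right = [], []
    for i, p in enumerate(points):
        if i in chosen:
            continue  # the sampled points lie on the hyperplane itself
        (left if side_of_hyperplane(defining, p) < 0 else right).append(p)
    return defining, left, right
```

Applying `random_hyperplane_split` recursively to `left` and `right` until fewer than $d$ points remain would give a tree of the kind the abstract studies; how the leftover points are stored at the leaves is a design choice this sketch leaves open.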
-
Connectivity threshold for Bluetooth graphs
Authors:
Nicolas Broutin,
Luc Devroye,
Nicolas Fraiman,
Gábor Lugosi
Abstract:
We study the connectivity properties of random Bluetooth graphs that model certain "ad hoc" wireless networks. The graphs are obtained as "irrigation subgraphs" of the well-known random geometric graph model. There are two parameters that control the model: the radius $r$ that determines the "visible neighbors" of each node and the number of edges $c$ that each node is allowed to send to these. The randomness comes from the underlying distribution of data points in space and from the choices of each vertex. We prove that no connectivity can take place with high probability for a range of parameters $r, c$ and completely characterize the connectivity threshold (in $c$) for values of $r$ close to the critical value for connectivity in the underlying random geometric graph.
Submitted 2 March, 2011;
originally announced March 2011.
-
The dilation of the Delaunay triangulation is greater than π/2
Authors:
Prosenjit Bose,
Luc Devroye,
Maarten Löffler,
Jack Snoeyink,
Vishal Verma
Abstract:
Consider the Delaunay triangulation T of a set P of points in the plane as a Euclidean graph, in which the weight of every edge is its length. It has long been conjectured that the dilation in T of any pair p, p' \in P, which is the ratio of the length of the shortest path from p to p' in T over the Euclidean distance ||pp'||, can be at most π/2 \approx 1.5708. In this paper, we show how to construct point sets in convex position with dilation > 1.5810 and in general position with dilation > 1.5846. Furthermore, we show that a sufficiently large set of points drawn independently from any distribution will in the limit approach the worst-case dilation for that distribution.
Submitted 2 June, 2010;
originally announced June 2010.
-
Odds-On Trees
Authors:
Prosenjit Bose,
Luc Devroye,
Karim Douieb,
Vida Dujmovic,
James King,
Pat Morin
Abstract:
Let P: R^d -> A be a query problem over R^d for which there exists a data structure S that can compute P(q) in O(log n) time for any query point q in R^d. Let D be a probability measure over R^d representing a distribution of queries. We describe a data structure called the odds-on tree, of size O(n^ε), that can be used as a filter that quickly computes P(q) for some query values q in R^d and relies on S for the remaining queries. With an odds-on tree, the expected query time for a point drawn according to D is O(H*+1), where H* is a lower bound on the expected cost of any linear decision tree that solves P.
Odds-on trees have a number of applications, including distribution-sensitive data structures for point location in 2-d, point-in-polytope testing in d dimensions, ray shooting in simple polygons, ray shooting in polytopes, nearest-neighbour queries in R^d, point-location in arrangements of hyperplanes in R^d, and many other geometric searching problems that can be solved in the linear-decision tree model. A standard lifting technique extends these results to algebraic decision trees of constant degree. A slightly different version of odds-on trees yields similar results for orthogonal searching problems that can be solved in the comparison tree model.
Submitted 4 February, 2010;
originally announced February 2010.
-
Point Location in Disconnected Planar Subdivisions
Authors:
Prosenjit Bose,
Luc Devroye,
Karim Douieb,
Vida Dujmovic,
James King,
Pat Morin
Abstract:
Let $G$ be a (possibly disconnected) planar subdivision and let $D$ be a probability measure over $\R^2$. The current paper shows how to preprocess $(G,D)$ into an $O(n)$-size data structure that can answer planar point location queries over $G$. The expected query time of this data structure, for a query point drawn according to $D$, is $O(H+1)$, where $H$ is a lower bound on the expected query time of any linear decision tree for point location in $G$. This extends the results of Collette et al. (2008, 2009) from connected planar subdivisions to disconnected planar subdivisions. A version of this structure, when combined with existing results on succinct point location, provides a succinct distribution-sensitive point location structure.
Submitted 15 January, 2010;
originally announced January 2010.
-
Memoryless Routing in Convex Subdivisions: Random Walks are Optimal
Authors:
Dan Chen,
Luc Devroye,
Vida Dujmovic,
Pat Morin
Abstract:
A memoryless routing algorithm is one in which the decision about the next edge on the route to a vertex t for a packet currently located at vertex v is made based only on the coordinates of v, t, and the neighbourhood, N(v), of v. The current paper explores the limitations of such algorithms by showing that, for any (randomized) memoryless routing algorithm A, there exists a convex subdivision on which A takes Ω(n^2) expected time to route a message between some pair of vertices. Since this lower bound is matched by a random walk, this result implies that the geometric information available in convex subdivisions is not helpful for this class of routing algorithms. The current paper also shows the existence of triangulations for which the Random-Compass algorithm proposed by Bose et al. (2002, 2004) requires 2^{Ω(n)} time to route between some pair of vertices.
Submitted 12 November, 2009;
originally announced November 2009.
-
On the Expected Maximum Degree of Gabriel and Yao Graphs
Authors:
Luc Devroye,
Joachim Gudmundsson,
Pat Morin
Abstract:
Motivated by applications of Gabriel graphs and Yao graphs in wireless ad-hoc networks, we show that the maximal degree of a random Gabriel graph or Yao graph defined on $n$ points drawn uniformly at random from a unit square grows as $Θ(\log n / \log \log n)$ in probability.
Submitted 21 May, 2009;
originally announced May 2009.
-
Perfect simulation from the Quicksort limit distribution
Authors:
Luc Devroye,
James Allen Fill,
Ralph Neininger
Abstract:
The weak limit of the normalized number of comparisons needed by the Quicksort algorithm to sort n randomly permuted items is known to be determined implicitly by a distributional fixed-point equation. We give an algorithm for perfect random variate generation from this distribution.
Submitted 23 May, 2000; v1 submitted 23 May, 2000;
originally announced May 2000.