Search | arXiv e-print repository

An asymptotically optimal algorithm for generating bin cardinalities

Abstract: In the balls-into-bins setting, $n$ balls are thrown uniformly at random into $n$ bins. The naïve way to generate the final load vector takes $Θ(n)$ time. However, it is well-known that this load vector has with high probability bin cardinalities of size $Θ(\frac{\log n}{\log \log n})$. Here, we present an algorithm in the RAM model that generates the bin cardinalities of the final load vector in the optimal $Θ(\frac{\log n}{\log \log n})$ time in expectation and with high probability. Further, the algorithm that we present is still optimal for any $m \in [n, n \log n]$ balls and can also be used as a building block to efficiently simulate more involved load balancing algorithms. In particular, for the Two-Choice algorithm, which samples two bins in each step and allocates to the least-loaded of the two, we obtain roughly a quadratic speed-up over the naïve simulation.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 11 pages

MSC Class: 65C10; 60C05; 11K45; 68W20; 68U20; 68W27; 68W40

ACM Class: G.3; G.2.m; F.2.2
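For the balls-into-bins entry above, the naïve baseline that the abstract compares against can be sketched as follows. This is only an illustrative sketch of the straightforward simulation, not the paper's algorithm; the function names `one_choice`, `two_choice`, and `bin_cardinalities` are hypothetical, and reading "bin cardinalities" as the number of bins holding each load value is an assumption.

```python
import random
from collections import Counter


def one_choice(n, m, rng):
    """Throw m balls uniformly at random into n bins; return the load vector."""
    loads = [0] * n
    for _ in range(m):
        loads[rng.randrange(n)] += 1
    return loads


def two_choice(n, m, rng):
    """Each ball samples two bins (with replacement) and joins the less loaded one."""
    loads = [0] * n
    for _ in range(m):
        i, j = rng.randrange(n), rng.randrange(n)
        target = i if loads[i] <= loads[j] else j  # ties go to the first sample
        loads[target] += 1
    return loads


def bin_cardinalities(loads):
    """Assumed reading: map each load value to the number of bins with that load."""
    return dict(Counter(loads))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    print("One-Choice max load:", max(one_choice(n, n, rng)))
    print("Two-Choice max load:", max(two_choice(n, n, rng)))
```

Both loops do work proportional to the number of balls, which is the cost the abstract describes as the naïve simulation.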

arXiv:2311.13059

A note on estimating the dimension from a random geometric graph

Authors: Caelan Atamanchuk, Luc Devroye, Gabor Lugosi

Abstract: Let $G_n$ be a random geometric graph with vertex set $[n]$ based on $n$ i.i.d. random vectors $X_1,\ldots,X_n$ drawn from an unknown density $f$ on $\mathbb{R}^d$. An edge $(i,j)$ is present when $\|X_i -X_j\| \le r_n$, for a given threshold $r_n$ possibly depending upon $n$, where $\| \cdot \|$ denotes Euclidean distance. We study the problem of estimating the dimension $d$ of the underlying space when we have access to the adjacency matrix of the graph but do not know $r_n$ or the vectors $X_i$. The main result of the paper is that there exists an estimator of $d$ that converges to $d$ in probability as $n \to \infty$ for all densities with $\int f^5 < \infty$ whenever $n^{3/2} r_n^d \to \infty$ and $r_n = o(1)$. The conditions allow very sparse graphs since when $n^{3/2} r_n^d \to 0$, the graph contains isolated edges only, with high probability. We also show that, without any condition on the density, a consistent estimator of $d$ exists when $n r_n^d \to \infty$ and $r_n = o(1)$.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2310.16715

An Algorithm to Recover Shredded Random Matrices

Authors: Caelan Atamanchuk, Luc Devroye, Massimo Vicenzo

Abstract: Given some binary matrix $M$, suppose we are presented with the collection of its rows and columns in independent arbitrary orderings. From this information, are we able to recover the unique original orderings and matrix? We present an algorithm that identifies whether there is a unique ordering associated with a set of rows and columns, and outputs either the unique correct orderings for the rows and columns or the full collection of all valid orderings and valid matrices. We show that there is a constant $c > 0$ such that the algorithm terminates in $O(n^2)$ time with high probability and in expectation for random $n \times n$ binary matrices with i.i.d. Bernoulli$(p)$ entries $(m_{ij})_{i,j=1}^n$ such that $\frac{c\log^2(n)}{n(\log\log(n))^2} \leq p \leq \frac{1}{2}$.

Submitted 23 April, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

MSC Class: 60C05 (Primary) 68Q25 (Secondary)

arXiv:2309.05308

Two-way Linear Probing Revisited

Authors: Ketan Dalal, Luc Devroye, Ebrahim Malalla

Abstract: We introduce linear probing hashing schemes that construct a hash table of size $n$, with constant load factor $α$, on which the worst-case unsuccessful search time is asymptotically almost surely $O(\log \log n)$. The schemes employ two linear probe sequences to find empty cells for the keys. Matching lower bounds on the maximum cluster size produced by any algorithm that uses two linear probe sequences are obtained as well.

Submitted 18 September, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: 31 pages, 8 figures, 5 tables, references added

ACM Class: E.2; F.2.2

arXiv:2306.01727

Broadcasting in random recursive dags

Authors: Simon Briend, Luc Devroye, Gabor Lugosi

Abstract: A uniform $k$-dag generalizes the uniform random recursive tree by picking $k$ parents uniformly at random from the existing nodes. It starts with $k$ "roots". Each of the $k$ roots is assigned a bit. These bits are propagated by a noisy channel. The parents' bits are flipped with probability $p$, and a majority vote is taken. When all nodes have received their bits, the $k$-dag is shown without identifying the roots. The goal is to estimate the majority bit among the roots. We identify the threshold for $p$ as a function of $k$ below which the majority rule among all nodes yields an error $c+o(1)$ with $c<1/2$. Above the threshold the majority rule errs with probability $1/2+o(1)$.

Submitted 24 February, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

arXiv:2210.10544

Subtractive random forests

Authors: Nicolas Broutin, Luc Devroye, Gabor Lugosi, Roberto Imbuzeiro Oliveira

Abstract: Motivated by online recommendation systems, we study a family of random forests. The vertices of the forest are labeled by integers. Each non-positive integer $i\le 0$ is the root of a tree. Vertices labeled by positive integers $n \ge 1$ are attached sequentially such that the parent of vertex $n$ is $n-Z_n$, where the $Z_n$ are i.i.d. random variables taking values in $\mathbb N$. We study several characteristics of the resulting random forest. In particular, we establish bounds for the expected tree sizes, the number of trees in the forest, the number of leaves, the maximum degree, and the height of the forest. We show that for all distributions of the $Z_n$, the forest contains at most one infinite tree, almost surely. If ${\mathbb E} Z_n < \infty$, then there is a unique infinite tree and the total size of the remaining trees is finite, with finite expected value if ${\mathbb E}Z_n^2 < \infty$. If ${\mathbb E} Z_n = \infty$ then almost surely all trees are finite.

Submitted 25 February, 2024; v1 submitted 19 October, 2022; originally announced October 2022.

arXiv:2105.01108

Consistent Density Estimation Under Discrete Mixture Models

Authors: Luc Devroye, Alex Dytso

Abstract: This work considers the problem of estimating a mixing probability density $f$ in the setting of discrete mixture models. The paper consists of three parts. The first part focuses on the construction of an $L_1$ consistent estimator of $f$. In particular, under the assumptions that the probability measure $μ$ of the observation is atomic, and the map from $f$ to $μ$ is bijective, it is shown that there exists an estimator $f_n$ such that for every density $f$, $\lim_{n\to \infty} \mathbb{E} \left[ \int |f_n -f | \right]=0$. The second part discusses the implementation details. Specifically, it is shown that the consistency for every $f$ can be attained with a computationally feasible estimator. The third part, as a case study, considers a Poisson mixture model. In particular, it is shown that in the Poisson noise setting, the bijection condition holds and, hence, estimation can be performed consistently for every $f$.

Submitted 10 May, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

Comments: Reason for withdrawal: There is an issue with the proof of Theorem~1

arXiv:2010.11537

On Mean Estimation for Heteroscedastic Random Variables

Authors: Luc Devroye, Silvio Lattanzi, Gabor Lugosi, Nikita Zhivotovskiy

Abstract: We study the problem of estimating the common mean $μ$ of $n$ independent symmetric random variables with different and unknown standard deviations $σ_1 \le σ_2 \le \cdots \le σ_n$. We show that, under some mild regularity assumptions on the distribution, there is a fully adaptive estimator $\widehat{μ}$ such that it is invariant to permutations of the elements of the sample and satisfies that, up to logarithmic factors, with high probability, \[ |\widehat{μ} - μ| \lesssim \min\left\{σ_{m^*}, \frac{\sqrt{n}}{\sum_{i = \sqrt{n}}^n σ_i^{-1}} \right\}, \] where the index $m^* \lesssim \sqrt{n}$ satisfies $m^* \approx \sqrt{σ_{m^*}\sum_{i = m^*}^n σ_i^{-1}}$.

Submitted 22 October, 2020; originally announced October 2020.

Comments: 29 pages

arXiv:2005.01242

Probabilistic Analysis of RRT Trees

Authors: Konrad Anand, Luc Devroye

Abstract: This thesis presents an analysis of the properties and run-time of the Rapidly-exploring Random Tree (RRT) algorithm. It is shown that the time for the RRT with stepsize $ε$ to grow close to every point in the $d$-dimensional unit cube is $Θ\left(\frac1{ε^d} \log \left(\frac1ε\right)\right)$. Also, the time it takes for the tree to reach a region of positive probability is $O\left(ε^{-\frac32}\right)$. Finally, a relationship is shown to the Nearest Neighbour Tree (NNT). This relationship shows that the total Euclidean path length after $n$ steps is $O(\sqrt n)$ and the expected height of the tree is bounded above by $(e + o(1)) \log n$.

Submitted 3 May, 2020; originally announced May 2020.

Comments: 29 pages, 10 figures, submitted to The International Journal of Robotics Research

arXiv:1812.06063

Discrete minimax estimation with trees

Authors: Luc Devroye, Tommy Reddad

Abstract: We propose a simple recursive data-based partitioning scheme which produces piecewise-constant or piecewise-linear density estimates on intervals, and show how this scheme can determine the optimal $L_1$ minimax rate for some discrete nonparametric classes.

Submitted 27 June, 2019; v1 submitted 14 December, 2018; originally announced December 2018.

MSC Class: 60G07

arXiv:1810.00969

On the discovery of the seed in uniform attachment trees

Authors: Luc Devroye, Tommy Reddad

Abstract: We investigate the size of vertex confidence sets for including part of (or the entirety of) the seed in seeded uniform attachment trees, given knowledge of some of the seed's properties, and with a prescribed probability of failure. We also study the problem of identifying the leaves of a seed in a seeded uniform attachment tree, given knowledge of the positions of all internal nodes of the seed.

Submitted 22 February, 2019; v1 submitted 1 October, 2018; originally announced October 2018.

MSC Class: 05C80

arXiv:1807.06649

doi 10.3390/e21010092

Remote Sampling with Applications to General Entanglement Simulation

Authors: Gilles Brassard, Luc Devroye, Claude Gravel

Abstract: We show how to sample exactly discrete probability distributions whose defining parameters are distributed among remote parties. For this purpose, von Neumann's rejection algorithm is turned into a distributed sampling communication protocol. We study the expected number of bits communicated among the parties and also exhibit a trade-off between the number of rounds of the rejection algorithm and the number of bits transmitted in the initial phase. Finally, we apply remote sampling to the simulation of quantum entanglement in its most general form possible, when an arbitrary number of parties share systems of arbitrary dimensions on which they apply arbitrary measurements (not restricted to being projective measurements). In case the dimension of the systems and the number of possible outcomes per party is bounded by a constant, it suffices to communicate an expected $O(m^2)$ bits in order to simulate exactly the outcomes that these measurements would have produced on those systems, where $m$ is the number of participants.

Submitted 17 July, 2018; originally announced July 2018.

Comments: 17 pages, 1 figure, 4 algorithms (protocols); Complete generalization of previous paper arXiv:1303.5942 [cs.IT] -- Exact simulation of the GHZ distribution -- by the same authors

Journal ref: Entropy 21(1):92, 2019

arXiv:1806.06887

The Minimax Learning Rates of Normal and Ising Undirected Graphical Models

Authors: Luc Devroye, Abbas Mehrabian, Tommy Reddad

Abstract: Let $G$ be an undirected graph with $m$ edges and $d$ vertices. We show that $d$-dimensional Ising models on $G$ can be learned from $n$ i.i.d. samples within expected total variation distance some constant factor of $\min\{1, \sqrt{(m + d)/n}\}$, and that this rate is optimal. We show that the same rate holds for the class of $d$-dimensional multivariate normal undirected graphical models with respect to $G$. We also identify the optimal rate of $\min\{1, \sqrt{m/n}\}$ for Ising models with no external magnetic field.

Submitted 3 June, 2020; v1 submitted 18 June, 2018; originally announced June 2018.

Comments: Accepted in the Electronic Journal of Statistics; 24 pages

MSC Class: 62G07; 82B20

arXiv:1712.07775

Local optima of the Sherrington-Kirkpatrick Hamiltonian

Authors: Louigi Addario-Berry, Luc Devroye, Gabor Lugosi, Roberto Imbuzeiro Oliveira

Abstract: We study local optima of the Hamiltonian of the Sherrington-Kirkpatrick model. We compute the exponent of the expected number of local optima and determine the "typical" value of the Hamiltonian.

Submitted 20 December, 2017; originally announced December 2017.

Comments: 20 pages

arXiv:1707.00083

Notes on Growing a Tree in a Graph

Authors: Luc Devroye, Vida Dujmović, Alan Frieze, Abbas Mehrabian, Pat Morin, Bruce Reed

Abstract: We study the height of a spanning tree $T$ of a graph $G$ obtained by starting with a single vertex of $G$ and repeatedly selecting, uniformly at random, an edge of $G$ with exactly one endpoint in $T$ and adding this edge to $T$.

Submitted 4 July, 2017; v1 submitted 30 June, 2017; originally announced July 2017.

Comments: Updated grant acknowledgement
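The growth process described in the entry above can be simulated directly. The following is a minimal sketch, assuming a connected simple undirected graph given as an adjacency dictionary; the function name `grow_random_spanning_tree` and the input format are hypothetical choices for illustration, and the boundary is recomputed from scratch at every step for clarity rather than speed.

```python
import random


def grow_random_spanning_tree(adj, root, rng):
    """Grow a spanning tree by repeatedly adding a uniformly random boundary edge.

    adj: dict mapping each vertex to a list of its neighbours
         (assumed connected, simple, undirected).
    Returns (parent, height), where parent maps each vertex to its tree parent.
    """
    depth = {root: 0}
    parent = {root: None}
    while len(depth) < len(adj):
        # All edges of G with exactly one endpoint in the current tree T.
        boundary = [(u, v) for u in depth for v in adj[u] if v not in depth]
        # rng.choice raises IndexError here if the graph is not connected.
        u, v = rng.choice(boundary)
        parent[v] = u
        depth[v] = depth[u] + 1
    return parent, max(depth.values())


if __name__ == "__main__":
    # A 6-cycle: the resulting tree is the cycle with one edge left out.
    cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    parent, height = grow_random_spanning_tree(cycle, 0, random.Random(0))
    print(parent, height)
```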

arXiv:1703.10731

An analysis of budgeted parallel search on conditional Galton-Watson trees

Authors: David Avis, Luc Devroye

Abstract: Recently Avis and Jordan have demonstrated the efficiency of a simple technique called budgeting for the parallelization of a number of tree search algorithms. The idea is to limit the amount of work that a processor performs before it terminates its search and returns any unexplored nodes to a master process. This limit is set by a critical budget parameter which determines the overhead of the process. In this paper we study the behaviour of the budget parameter on conditional Galton-Watson trees obtaining asymptotically tight bounds on this overhead. We present empirical results to show that this bound is surprisingly accurate in practice.

Submitted 5 September, 2019; v1 submitted 30 March, 2017; originally announced March 2017.

Comments: 15 pages, 3 figures, 2 tables. Minor revisions including an extended description of the Q-process with additional figure

arXiv:1701.02527

The heavy path approach to Galton-Watson trees with an application to Apollonian networks

Authors: Luc Devroye, Cecilia Holmgren, Henning Sulzbach

Abstract: We study the heavy path decomposition of conditional Galton-Watson trees. In a standard Galton-Watson tree conditional on its size $n$, we order all children by their subtree sizes, from large (heavy) to small. A node is marked if it is among the $k$ heaviest nodes among its siblings. Unmarked nodes and their subtrees are removed, leaving only a tree of marked nodes, which we call the $k$-heavy tree. We study various properties of these trees, including their size and the maximal distance from any original node to the $k$-heavy tree. In particular, under some moment condition, the $2$-heavy tree is with high probability larger than $cn$ for some constant $c > 0$, and the maximal distance from the $k$-heavy tree is $O(n^{1/(k+1)})$ in probability. As a consequence, for uniformly random Apollonian networks of size $n$, the expected size of the longest simple path is $Ω(n)$.

Submitted 10 January, 2017; originally announced January 2017.

Comments: 3 figures

arXiv:1511.02273

doi 10.1007/s11222-016-9648-z

The expected bit complexity of the von Neumann rejection algorithm

Authors: Luc Devroye, Claude Gravel

Abstract: In 1952, von Neumann introduced the rejection method for random variate generation. We revisit this algorithm when we have a source of perfect bits at our disposal. In this random bit model, there are universal lower bounds for generating a random variate with a given density to within an accuracy $ε$ derived by Knuth and Yao, and refined by the authors. In general, von Neumann's method fails in this model. We propose a modification that ensures proper behavior for all Riemann-integrable densities on compact sets, and show that the expected number of random bits needed behaves optimally with respect to universal lower bounds. In particular, we introduce the notion of an oracle that evaluates the supremum and infimum of a function on any rectangle of $\mathbb{R}^{d}$, and develop a quadtree-style extension of the classical rejection method.

Submitted 2 April, 2016; v1 submitted 6 November, 2015; originally announced November 2015.

Comments: 25 pages, 4 figures

MSC Class: 65C10; 68Q25; 68Q30; 68Q87; 68W20; 68W40

arXiv:1504.06238

doi 10.1002/rsa.20707

The graph structure of a deterministic automaton chosen at random: full version

Authors: Xing Shi Cai, Luc Devroye

Abstract: A deterministic finite automaton (DFA) of $n$ states over a $k$-letter alphabet can be seen as a digraph with $n$ vertices which all have exactly $k$ labeled out-arcs ($k$-out digraph). In 1973 Grusho first proved that with high probability (whp) in a random $k$-out digraph there is a strongly connected component (SCC) of linear size that is reachable from all vertices, i.e., a giant. He also proved that the size of the giant follows a central limit law. We show that whp the part outside the giant contains at most a few short cycles and mostly consists of overlapping tree-like structures. Thus the directed acyclic graph (DAG) of a random $k$-out digraph is almost the same as the digraph with the giant contracted into one vertex. These findings lead to a new, concise and self-contained proof of Grusho's theorem. This work also contains some other results including the structure outside the giant, the phase transition phenomenon in strong connectivity, the typical distance, and an extension to simple digraphs.

Submitted 9 August, 2016; v1 submitted 23 April, 2015; originally announced April 2015.

Comments: 48 pages, 7 figures

arXiv:1502.02539

Random variate generation using only finitely many unbiased, independently and identically distributed random bits

Authors: Luc Devroye, Claude Gravel

Abstract: For any discrete probability distribution with bounded entropy, we can generate exactly a random variate using only a finite expected number of perfect coin flips. A perfect coin flip is the outcome of an unbiased Bernoulli random variable. Coin flips are unbiased, independently and identically distributed in all our work. We survey well-known algorithms for the discrete case such as the one from Knuth and Yao as well as the one from Han and Hoshi. We also briefly discuss a practical implementation of the algorithm proposed by Knuth and Yao. For the continuous case, only approximations can be hoped for. The freedom to choose the accuracy for the approximations matters, and, for that, we propose to measure accuracy in terms of the Wasserstein $L_\infty$-metric. We derive a universal lower bound for the expected number of perfect coin flips required to reach a desired accuracy. We also provide several algorithms for absolutely continuous distributions that come within our universal lower bound.

Submitted 10 November, 2020; v1 submitted 9 February, 2015; originally announced February 2015.

Comments: 54 pages, 9 figures

MSC Class: 65C10; 68Q25; 68Q30; 68Q87; 68W20; 68W40

arXiv:1411.3317

Finding Adam in random growing trees

Authors: Sébastien Bubeck, Luc Devroye, Gábor Lugosi

Abstract: We investigate algorithms to find the first vertex in large trees generated by either the uniform attachment or preferential attachment model. We require the algorithm to output a set of $K$ vertices, such that, with probability at least $1-ε$, the first vertex is in this set. We show that for any $ε$, there exist such algorithms with $K$ independent of the size of the input tree. Moreover, we provide almost tight bounds for the best value of $K$ as a function of $ε$. In the uniform attachment case we show that the optimal $K$ is subpolynomial in $1/ε$, and that it has to be at least superpolylogarithmic. On the other hand, the preferential attachment case is exponentially harder, as we prove that the best $K$ is polynomial in $1/ε$. We conclude the paper with several open problems.

Submitted 1 December, 2015; v1 submitted 12 November, 2014; originally announced November 2014.

Comments: 14 pages

arXiv:1403.1274

Almost optimal sparsification of random geometric graphs

Authors: Nicolas Broutin, Luc Devroye, Gabor Lugosi

Abstract: A random geometric irrigation graph $Γ_n(r_n,ξ)$ has $n$ vertices identified by $n$ independent uniformly distributed points $X_1,\ldots,X_n$ in the unit square $[0,1]^2$. Each point $X_i$ selects $ξ_i$ neighbors at random, without replacement, among those points $X_j$ ($j\neq i$) for which $\|X_i-X_j\| < r_n$, and the selected vertices are connected to $X_i$ by an edge. The number $ξ_i$ of the neighbors is an integer-valued random variable, chosen independently with identical distribution for each $X_i$ such that $ξ_i$ satisfies $1\le ξ_i \le κ$ for a constant $κ>1$. We prove that when $r_n = γ_n \sqrt{\log n/n}$ for $γ_n \to \infty$ with $γ_n =o(n^{1/6}/\log^{5/6}n)$, then the random geometric irrigation graph experiences explosive percolation in the sense that when $\mathbf E ξ_i=1$, then the largest connected component has size $o(n)$ but if $\mathbf E ξ_i >1$, then the size of the largest connected component is with high probability $n-o(n)$. This offers a natural non-centralized sparsification of a random geometric graph that is mostly connected.

Submitted 7 March, 2014; v1 submitted 5 March, 2014; originally announced March 2014.

MSC Class: 05C80; 60C05

arXiv:1402.3696

Connectivity of sparse Bluetooth networks

Authors: Nicolas Broutin, Luc Devroye, Gábor Lugosi

Abstract: Consider a random geometric graph defined on $n$ vertices uniformly distributed in the $d$-dimensional unit torus. Two vertices are connected if their distance is less than a "visibility radius" $r_n$. We consider Bluetooth networks that are locally sparsified random geometric graphs. Each vertex selects $c$ of its neighbors in the random geometric graph at random and connects only to the selected points. We show that if the visibility radius is at least of the order of $n^{-(1-δ)/d}$ for some $δ> 0$, then a constant value of $c$ is sufficient for the graph to be connected, with high probability. It suffices to take $c \ge \sqrt{(1+ε)/δ} + K$ for any positive $ε$ where $K$ is a constant depending on $d$ only. On the other hand, with $c\le \sqrt{(1-ε)/δ}$, the graph is disconnected, with high probability.

Submitted 15 February, 2014; originally announced February 2014.

MSC Class: 05C80; 60C05

arXiv:1402.1191 [pdf, ps, other]

doi 10.1080/15427951.2015.1051674

The Analysis of Kademlia for random IDs

Authors: Xing Shi Cai, Luc Devroye

Abstract: Kademlia is the de facto standard searching algorithm for P2P (peer-to-peer) networks on the Internet. In our earlier work, we introduced two slightly different models for Kademlia and studied how many steps it takes to search for a target node by using Kademlia's searching algorithm. The first model, in which nodes of the network are labelled with deterministic IDs, had been discussed in that paper. The second one, in which nodes are labelled with random IDs, which we call the Random ID Model, was only briefly mentioned. Refined results with detailed proofs for this model are given in this paper. Our analysis shows that with high probability it takes about $c \log n$ steps to locate any node, where $n$ is the total number of nodes in the network and $c$ is a constant that does not depend on $n$.

Submitted 12 May, 2015; v1 submitted 5 February, 2014; originally announced February 2014.

Comments: 15 pages. 2 figures

arXiv:1309.5866 [pdf, ps, other]

doi 10.1007/978-3-642-45030-3_66

A Probabilistic Analysis of Kademlia Networks

Authors: Xing Shi Cai, Luc Devroye

Abstract: Kademlia is currently the most widely used searching algorithm in P2P (peer-to-peer) networks. This work studies an essential question about Kademlia from a mathematical perspective: how long does it take to locate a node in the network? To answer it, we introduce a random graph K and study how many steps are needed to locate a given vertex in K using Kademlia's algorithm, which we call the routing time. Two slightly different versions of K are studied. In the first one, vertices of K are labelled with fixed IDs. In the second one, vertices are assumed to have randomly selected IDs. In both cases, we show that the routing time is about c*log(n), where n is the number of nodes in the network and c is an explicitly described constant.

Submitted 23 September, 2013; originally announced September 2013.

Comments: ISAAC 2013

arXiv:1303.5942 [pdf, other]

doi 10.1109/TIT.2015.2504525

Exact simulation of the GHZ distribution

Authors: Gilles Brassard, Luc Devroye, Claude Gravel

Abstract: John Bell has shown that the correlations entailed by quantum mechanics cannot be reproduced by a classical process involving non-communicating parties. But can they be simulated with the help of bounded communication? This problem has been studied for more than two decades and it is now well understood in the case of bipartite entanglement. However, the issue was still widely open for multipartite entanglement, even for the simplest case, which is the tripartite Greenberger-Horne-Zeilinger (GHZ) state. We give an exact simulation of arbitrary independent von Neumann measurements on general n-partite GHZ states. Our protocol requires O(n^2) bits of expected communication between the parties, and O(n log n) expected time is sufficient to carry it out in parallel. Furthermore, we need only an expectation of O(n) independent unbiased random bits, with no need for the generation of continuous real random variables nor prior shared random variables. In the case of equatorial measurements, we improve on the prior art with a protocol that needs only O(n log n) bits of communication and O(log^2 n) parallel time. At the cost of a slight increase in the number of bits communicated, these tasks can be accomplished with a constant expected number of rounds.

Submitted 17 May, 2015; v1 submitted 24 March, 2013; originally announced March 2013.

Comments: Improved in a variety of ways, including new results. 27 pages

arXiv:1302.5797 [pdf, ps, other]

Prediction by Random-Walk Perturbation

Authors: Luc Devroye, Gábor Lugosi, Gergely Neu

Abstract: We propose a version of the follow-the-perturbed-leader online prediction algorithm in which the cumulative losses are perturbed by independent symmetric random walks. The forecaster is shown to achieve an expected regret of the optimal order O(sqrt(n log N)) where n is the time horizon and N is the number of experts. More importantly, it is shown that the forecaster changes its prediction at most O(sqrt(n log N)) times, in expectation. We also extend the analysis to online combinatorial optimization and show that even in this more general setting, the forecaster rarely switches between experts while having a regret of near-optimal order.

Submitted 23 February, 2013; originally announced February 2013.

arXiv:1301.4679 [pdf, ps, other]

Cellular Tree Classifiers

Authors: Gérard Biau, Luc Devroye

Abstract: The cellular tree classifier model addresses a fundamental problem in the design of classifiers for a parallel or distributed computing world: Given a data set, is it sufficient to apply a majority rule for classification, or shall one split the data into two or more parts and send each part to a potentially different computer (or cell) for further processing? At first sight, it seems impossible to define with this paradigm a consistent classifier as no cell knows the "original data size", $n$. However, we show that this is not so by exhibiting two different consistent classifiers. The consistency is universal but is only shown for distributions with nonatomic marginals.

Submitted 25 June, 2013; v1 submitted 20 January, 2013; originally announced January 2013.

arXiv:1202.5945 [pdf, other]

A Note on Interference in Random Point Sets

Authors: Luc Devroye, Pat Morin

Abstract: The (maximum receiver-centric) interference of a geometric graph (von Rickenbach et al. (2005)) is studied. It is shown that, with high probability, the following results hold for a set, V, of n points independently and uniformly distributed in the unit d-cube, for constant dimension d: (1) there exists a connected graph with vertex set V that has interference O((log n)^{1/3}); (2) no connected graph with vertex set V has interference o((log n)^{1/4}); and (3) the minimum spanning tree of V has interference Theta((log n)^{1/2}).

Submitted 12 June, 2012; v1 submitted 27 February, 2012; originally announced February 2012.

Comments: Updated for journal submission

arXiv:1106.0461 [pdf, other]

Random hyperplane search trees in high dimensions

Authors: Luc Devroye, James King

Abstract: Given a set S of n \geq d points in general position in R^d, a random hyperplane split is obtained by sampling d points uniformly at random without replacement from S and splitting based on their affine hull. A random hyperplane search tree is a binary space partition tree obtained by recursive application of random hyperplane splits. We investigate the structural distributions of such random trees with a particular focus on the growth with d. A blessing of dimensionality arises--as d increases, random hyperplane splits more closely resemble perfectly balanced splits; in turn, random hyperplane search trees more closely resemble perfectly balanced binary search trees. We prove that, for any fixed dimension d, a random hyperplane search tree storing n points has height at most (1 + O(1/sqrt(d))) log_2 n and average element depth at most (1 + O(1/d)) log_2 n with high probability as n \rightarrow \infty. Further, we show that these bounds are asymptotically optimal with respect to d.

Submitted 2 June, 2011; originally announced June 2011.

Comments: 19 pages, 4 figures

MSC Class: 68Q87

arXiv:1103.0351 [pdf, other]

Connectivity threshold for Bluetooth graphs

Authors: Nicolas Broutin, Luc Devroye, Nicolas Fraiman, Gábor Lugosi

Abstract: We study the connectivity properties of random Bluetooth graphs that model certain "ad hoc" wireless networks. The graphs are obtained as "irrigation subgraphs" of the well-known random geometric graph model. There are two parameters that control the model: the radius $r$ that determines the "visible neighbors" of each node and the number of edges $c$ that each node is allowed to send to these. The randomness comes from the underlying distribution of data points in space and from the choices of each vertex. We prove that no connectivity can take place with high probability for a range of parameters $r, c$ and completely characterize the connectivity threshold (in $c$) for values of $r$ close to the critical value for connectivity in the underlying random geometric graph.

Submitted 2 March, 2011; originally announced March 2011.

Comments: 21 pages, 5 figures

MSC Class: 05C80; 60C05

arXiv:1006.0291 [pdf, other]

The dilation of the Delaunay triangulation is greater than π/2

Authors: Prosenjit Bose, Luc Devroye, Maarten Löffler, Jack Snoeyink, Vishal Verma

Abstract: Consider the Delaunay triangulation T of a set P of points in the plane as a Euclidean graph, in which the weight of every edge is its length. It has long been conjectured that the dilation in T of any pair p, p' \in P, which is the ratio of the length of the shortest path from p to p' in T over the Euclidean distance ||pp'||, can be at most π/2 \approx 1.5708. In this paper, we show how to construct point sets in convex position with dilation > 1.5810 and in general position with dilation > 1.5846. Furthermore, we show that a sufficiently large set of points drawn independently from any distribution will in the limit approach the worst-case dilation for that distribution.

Submitted 2 June, 2010; originally announced June 2010.

Comments: 12 pages, 6 figures, invited to the special edition of Computational Geometry Theory and Applications for papers from CCCG 2009

arXiv:1002.1092 [pdf, other]

Odds-On Trees

Authors: Prosenjit Bose, Luc Devroye, Karim Douieb, Vida Dujmovic, James King, Pat Morin

Abstract: Let P: R^d -> A be a query problem over R^d for which there exists a data structure S that can compute P(q) in O(log n) time for any query point q in R^d. Let D be a probability measure over R^d representing a distribution of queries. We describe a data structure called the odds-on tree, of size O(n^ε) that can be used as a filter that quickly computes P(q) for some query values q in R^d and relies on S for the remaining queries. With an odds-on tree, the expected query time for a point drawn according to D is O(H*+1), where H* is a lower bound on the expected cost of any linear decision tree that solves P. Odds-on trees have a number of applications, including distribution-sensitive data structures for point location in 2-d, point-in-polytope testing in d dimensions, ray shooting in simple polygons, ray shooting in polytopes, nearest-neighbour queries in R^d, point-location in arrangements of hyperplanes in R^d, and many other geometric searching problems that can be solved in the linear-decision tree model. A standard lifting technique extends these results to algebraic decision trees of constant degree. A slightly different version of odds-on trees yields similar results for orthogonal searching problems that can be solved in the comparison tree model.

Submitted 4 February, 2010; originally announced February 2010.

Comments: 19 pages, 0 figures

arXiv:1001.2763 [pdf, other]

Point Location in Disconnected Planar Subdivisions

Authors: Prosenjit Bose, Luc Devroye, Karim Douieb, Vida Dujmovic, James King, Pat Morin

Abstract: Let $G$ be a (possibly disconnected) planar subdivision and let $D$ be a probability measure over $\R^2$. The current paper shows how to preprocess $(G,D)$ into an O(n) size data structure that can answer planar point location queries over $G$. The expected query time of this data structure, for a query point drawn according to $D$, is $O(H+1)$, where $H$ is a lower bound on the expected query time of any linear decision tree for point location in $G$. This extends the results of Collette et al (2008, 2009) from connected planar subdivisions to disconnected planar subdivisions. A version of this structure, when combined with existing results on succinct point location, provides a succinct distribution-sensitive point location structure.

Submitted 15 January, 2010; originally announced January 2010.

arXiv:0911.2484 [pdf, other]

doi 10.1016/j.comgeo.2011.12.005

Memoryless Routing in Convex Subdivisions: Random Walks are Optimal

Authors: Dan Chen, Luc Devroye, Vida Dujmovic, Pat Morin

Abstract: A memoryless routing algorithm is one in which the decision about the next edge on the route to a vertex t for a packet currently located at vertex v is made based only on the coordinates of v, t, and the neighbourhood, N(v), of v. The current paper explores the limitations of such algorithms by showing that, for any (randomized) memoryless routing algorithm A, there exists a convex subdivision on which A takes Omega(n^2) expected time to route a message between some pair of vertices. Since this lower bound is matched by a random walk, this result implies that the geometric information available in convex subdivisions is not helpful for this class of routing algorithms. The current paper also shows the existence of triangulations for which the Random-Compass algorithm proposed by Bose et al. (2002, 2004) requires 2^{Ω(n)} time to route between some pair of vertices.

Submitted 12 November, 2009; originally announced November 2009.

Comments: 11 pages, 6 figures

Journal ref: Computational Geometry: Theory and Applications, Volume 45, Issue 4, May 2012, Pages 178-185

arXiv:0905.3584 [pdf, ps, other]

On the Expected Maximum Degree of Gabriel and Yao Graphs

Authors: Luc Devroye, Joachim Gudmundsson, Pat Morin

Abstract: Motivated by applications of Gabriel graphs and Yao graphs in wireless ad-hoc networks, we show that the maximal degree of a random Gabriel graph or Yao graph defined on $n$ points drawn uniformly at random from a unit square grows as $Θ(\log n / \log \log n)$ in probability.

Submitted 21 May, 2009; originally announced May 2009.

Comments: 20 pages, 10 figures

ACM Class: I.3.5; E.1

arXiv:math/0005237 [pdf, ps, other]

Perfect simulation from the Quicksort limit distribution

Authors: Luc Devroye, James Allen Fill, Ralph Neininger

Abstract: The weak limit of the normalized number of comparisons needed by the Quicksort algorithm to sort n randomly permuted items is known to be determined implicitly by a distributional fixed-point equation. We give an algorithm for perfect random variate generation from this distribution.

Submitted 23 May, 2000; v1 submitted 23 May, 2000; originally announced May 2000.

Comments: 7 pages. See also http://www.mts.jhu.edu/~fill/, http://www-cgrl.cs.mcgill.ca/~luc/, and http://www.stochastik.uni-freiburg.de/homepages/neininger/ . Submitted for publication in May, 2000

Report number: 603, Department of Mathematical Sciences, The Johns Hopkins University MSC Class: 65C10 (primary); 65C05; 68U20; 11K45 (secondary)

Showing 1–37 of 37 results for author: Devroye, L