Enumeration and Succinct Encoding of AVL Trees

Authors: Jeremy Chizewer, Stephen Melczer, J. Ian Munro, Ava Pun

Abstract: We use a novel decomposition to create succinct data structures -- supporting a wide range of operations on static trees in constant time -- for a variety of tree classes, extending results of Munro, Nicholson, Benkner, and Wild. Motivated by the class of AVL trees, we further derive asymptotics for the information-theoretic lower bound on the number of bits needed to store tree classes whose generating functions satisfy certain functional equations. In particular, we prove that AVL trees require approximately $0.938$ bits per node to encode.

Submitted 11 March, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: Updated with more general results
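The bits-per-node figure in the abstract above is an asymptotic statement about the number of AVL tree shapes. As a rough illustration only (this is a brute-force count, not the paper's generating-function analysis), the sketch below counts AVL shapes with a given number of nodes by dynamic programming over heights and reports $\log_2(\text{count})/n$. The function names `count_avl_trees` and `bits_per_node` are hypothetical, and the convention that an empty tree has height 0 is an assumption of this sketch.

```python
from math import log2

def count_avl_trees(max_nodes):
    """Return totals[n] = number of AVL tree shapes with n nodes, for n = 0..max_nodes.

    Assumed convention: the empty tree has height 0 and a single node has height 1.
    """
    empty = [0] * (max_nodes + 1)
    empty[0] = 1
    single = [0] * (max_nodes + 1)
    if max_nodes >= 1:
        single[1] = 1
    by_height = [empty, single]          # by_height[h][n] = shapes of height h with n nodes
    totals = [a + b for a, b in zip(empty, single)]
    while True:
        prev, prev2 = by_height[-1], by_height[-2]
        cur = [0] * (max_nodes + 1)
        for n in range(1, max_nodes + 1):
            total = 0
            for left in range(n):
                right = n - 1 - left
                # Subtree heights may be equal, or differ by exactly one in either direction.
                total += (prev[left] * prev[right]
                          + prev[left] * prev2[right]
                          + prev2[left] * prev[right])
            cur[n] = total
        if not any(cur):                 # no taller AVL tree fits in max_nodes nodes
            break
        by_height.append(cur)
        totals = [t + c for t, c in zip(totals, cur)]
    return totals

def bits_per_node(totals, n):
    """Information-theoretic bits per node needed to distinguish all shapes of size n."""
    return log2(totals[n]) / n

totals = count_avl_trees(7)
```

For small sizes the ratio fluctuates considerably, so this only shows how the quantity is defined; it does not reproduce the asymptotic constant.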

arXiv:2209.14401 [pdf, other]

Shortest Beer Path Queries in Interval Graphs

Authors: Rathish Das, Meng He, Eitan Kondratovsky, J. Ian Munro, Anurag Murty Naredla, Kaiyu Wu

Abstract: Our interest is in paths between pairs of vertices that go through at least one of a subset of the vertices known as beer vertices. Such a path is called a beer path, and the beer distance between two vertices is the length of the shortest beer path. We show that we can represent unweighted interval graphs using $2n \log n + O(n) + O(|B|\log n)$ bits where $|B|$ is the number of beer vertices. This data structure answers beer distance queries in $O(\log^\varepsilon n)$ time for any constant $\varepsilon > 0$ and shortest beer path queries in $O(\log^\varepsilon n + d)$ time, where $d$ is the beer distance between the two nodes. We also show that proper interval graphs may be represented using $3n + o(n)$ bits to support beer distance queries in $O(f(n)\log n)$ time for any $f(n) \in ω(1)$ and shortest beer path queries in $O(d)$ time. All of these results also have time-space trade-offs. Lastly we show that the information theoretic lower bound for beer proper interval graphs is very close to the space of our structure, namely $\log(4+2\sqrt{3})n - o(n)$ (or about $2.9n$) bits.

Submitted 28 September, 2022; originally announced September 2022.

Comments: To appear in ISAAC 2022
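The beer distance defined in the abstract above can be illustrated with a brute-force baseline. The sketch below is not the paper's succinct structure: it runs one breadth-first search from each endpoint of an unweighted, undirected graph and minimizes $d(u,b) + d(b,v)$ over the beer vertices $b$. It assumes a beer path may revisit vertices, and the adjacency-dict representation and the names `bfs_distances` and `beer_distance` are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def beer_distance(adj, beer, u, v):
    """Length of a shortest u-v walk through at least one beer vertex, or None if none exists."""
    du = bfs_distances(adj, u)
    dv = bfs_distances(adj, v)   # graph is undirected, so dv[b] equals d(b, v)
    candidates = [du[b] + dv[b] for b in beer if b in du and b in dv]
    return min(candidates) if candidates else None

# Path graph 0 - 1 - 2 - 3 (illustrative input).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

With the beer vertex at 3, going from 0 to 1 requires a detour to the far end of the path, which is why the beer distance can exceed the ordinary distance.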

arXiv:2104.13457 [pdf, other]

doi 10.4230/LIPIcs.ESA.2021.70

Hypersuccinct Trees -- New universal tree source codes for optimal compressed tree data structures and range minima

Authors: J. Ian Munro, Patrick K. Nicholson, Louisa Seelbach Benkner, Sebastian Wild

Abstract: We present a new universal source code for distributions of unlabeled binary and ordinal trees that achieves optimal compression to within lower order terms for all tree sources covered by existing universal codes. At the same time, it supports answering many navigational queries on the compressed representation in constant time on the word-RAM; this is not known to be possible for any existing tree compression method. The resulting data structures, "hypersuccinct trees", hence combine the compression achieved by the best known universal codes with the operation support of the best succinct tree data structures. We apply hypersuccinct trees to obtain a universal compressed data structure for range-minimum queries. It has constant query time and the optimal worst-case space usage of $2n+o(n)$ bits, but the space drops to $1.736n + o(n)$ bits on average for random permutations of $n$ elements, and $2\lg\binom nr + o(n)$ for arrays with $r$ increasing runs, respectively. Both results are optimal; the former answers an open problem of Davoodi et al. (2014) and Golin et al. (2016). Compared to prior work on succinct data structures, we do not have to tailor our data structure to specific applications; hypersuccinct trees automatically adapt to the trees at hand.
We show that they simultaneously achieve the optimal space usage to within lower order terms for a wide range of distributions over tree shapes, including: binary search trees (BSTs) generated by insertions in random order / Cartesian trees of random arrays, random fringe-balanced BSTs, binary trees with a given number of binary/unary/leaf nodes, random binary tries generated from memoryless sources, full binary trees, unary paths, as well as uniformly chosen weight-balanced BSTs, AVL trees, and left-leaning red-black trees.

Submitted 3 September, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

Comments: part of ESA 2021

arXiv:2103.01084 [pdf, ps, other]

doi 10.1145/3477910

A Simple Algorithm for Optimal Search Trees with Two-Way Comparisons

Authors: Marek Chrobak, Mordecai Golin, J. Ian Munro, Neal E. Young

Abstract: We present a simple $O(n^4)$-time algorithm for computing optimal search trees with two-way comparisons. The only previous solution to this problem, by Anderson et al., has the same running time, but is significantly more complicated and is restricted to the variant where only successful queries are allowed. Our algorithm extends directly to solve the standard full variant of the problem, which also allows unsuccessful queries and for which no polynomial-time algorithm was previously known. The correctness proof of our algorithm relies on a new structural theorem for two-way-comparison search trees.

Submitted 4 October, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

Comments: v3 adds Appendix B, with a stronger alternative to Theorem 1

MSC Class: 68P10; 68P30; 68W25; 94A45

ACM Class: E.4; G.1.6; G.2.2; H.3.1; I.4.2

Journal ref: ACM Transactions on Algorithms 18(1) (2022) 1-11

arXiv:2103.01052 [pdf, other]

doi 10.1016/j.ic.2021.104707

On the Cost of Unsuccessful Searches in Search Trees with Two-way Comparisons

Authors: Marek Chrobak, Mordecai Golin, J. Ian Munro, Neal E. Young

Abstract: Search trees are commonly used to implement access operations to a set of stored keys. If this set is static and the probabilities of membership queries are known in advance, then one can precompute an optimal search tree, namely one that minimizes the expected access cost. For a non-key query, a search tree can determine its approximate location by returning the inter-key interval containing the query. This is in contrast to other dictionary data structures, like hash tables, that only report a failed search. We address the question "what is the additional cost of determining approximate locations for non-key queries"? We prove that for two-way comparison trees this additional cost is at most 1. Our proof is based on a novel probabilistic argument that involves converting a search tree that does not identify non-key queries into a random tree that does.

Submitted 9 March, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

Comments: v2 has updated bibliography

MSC Class: 68P10; 68P30; 68W25; 94A45

ACM Class: E.4; G.1.6; G.2.2; H.3.1; I.4.2

Journal ref: Information and Computation 281 (2021)

arXiv:2005.07644 [pdf, other]

doi 10.4230/LIPIcs.ISAAC.2020.57

Distance Oracles for Interval Graphs via Breadth-First Rank/Select in Succinct Trees

Authors: Meng He, J. Ian Munro, Yakov Nekrich, Sebastian Wild, Kaiyu Wu

Abstract: We present the first succinct distance oracles for (unweighted) interval graphs and related classes of graphs, using a novel succinct data structure for ordinal trees that supports the mapping between preorder (i.e., depth-first) ranks and level-order (breadth-first) ranks of nodes in constant time. Our distance oracles for interval graphs also support navigation queries -- testing adjacency, computing node degrees, neighborhoods, and shortest paths -- all in optimal time. Our technique also yields optimal distance oracles for proper interval graphs (unit-interval graphs) and circular-arc graphs. Our tree data structure supports all operations provided by different approaches in previous work, as well as mapping to and from level-order ranks and retrieving the last (first) internal node before (after) a given node in a level-order traversal, all in constant time.

Submitted 30 September, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

Comments: to appear in ISAAC 2020

arXiv:1908.00563 [pdf, ps, other]

Dynamic Optimality Refuted -- For Tournament Heaps

Authors: J. Ian Munro, Richard Peng, Sebastian Wild, Lingyi Zhang

Abstract: We prove a separation between offline and online algorithms for finger-based tournament heaps undergoing key modifications. These heaps are implemented by binary trees with keys stored on leaves, and intermediate nodes tracking the min of their respective subtrees. They represent a natural starting point for studying self-adjusting heaps due to the need to access the root-to-leaf path upon modifications. We combine previous studies on the competitive ratios of unordered binary search trees by [Fredman WADS2011] and on order-by-next request by [Martínez-Roura TCS2000] and [Munro ESA2000] to show that for any number of fingers, tournament heaps cannot handle a sequence of modify-key operations with competitive ratio in $o(\sqrt{\log{n}})$. Critical to this analysis is the characterization of the modifications that a heap can undergo upon an access. There are $\exp(Θ(n \log{n}))$ valid heaps on $n$ keys, but only $\exp(Θ(n))$ binary search trees. We parameterize the modification power through the well-studied concept of fingers: additional pointers the data structure can manipulate arbitrarily. Here we demonstrate that fingers can be significantly more powerful than servers moving on a static tree by showing that access to $k$ fingers allows an offline algorithm to handle any access sequence with amortized cost $O(\log_{k}(n) + 2^{\lg^{*}n})$.

Submitted 1 August, 2019; originally announced August 2019.
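The tournament-heap layout described in the abstract above (keys on leaves, internal nodes tracking the minimum of their subtree) can be sketched with a fixed-shape array. This is only a static illustration of why a modify-key operation touches the root-to-leaf path; the finger-based, self-adjusting heaps studied in the paper are not modelled. The class name `TournamentHeap` and its methods are hypothetical.

```python
class TournamentHeap:
    """Fixed-shape binary tree: keys on leaves, each internal node stores its subtree minimum."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("need at least one key")
        self.n = len(keys)
        # Internal nodes occupy slots 1..n-1, leaves occupy slots n..2n-1; slot 0 is unused.
        self.tree = [None] * self.n + list(keys)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = min(self.tree[2 * i], self.tree[2 * i + 1])

    def find_min(self):
        """The root holds the minimum over all leaves."""
        return self.tree[1]

    def modify_key(self, index, new_key):
        """Overwrite one leaf, then recompute minima along its leaf-to-root path."""
        i = index + self.n
        self.tree[i] = new_key
        i //= 2
        while i >= 1:
            self.tree[i] = min(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

heap = TournamentHeap([5, 3, 8, 1, 9])
```

Each `modify_key` call visits only the ancestors of the changed leaf, which is the access pattern the abstract refers to.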

arXiv:1907.08579 [pdf, other]

On Approximate Range Mode and Range Selection

Authors: Hicham El-Zein, Meng He, J. Ian Munro, Yakov Nekrich, Bryce Sandlund

Abstract: For any $ε\in (0,1)$, a $(1+ε)$-approximate range mode query asks for the position of an element whose frequency in the query range is at most a factor $(1+ε)$ smaller than the true mode. For this problem, we design an $O(n/ε)$ bit data structure supporting queries in $O(\lg(1/ε))$ time. This is an encoding data structure which does not require access to the input sequence; we prove the space cost is asymptotically optimal for constant $ε$. Our solution improves the previous best result of Greve et al. (Cell Probe Lower Bounds and Approximations for Range Mode, ICALP'10) by reducing the space cost by a factor of $\lg n$ while achieving the same query time. We also design an $O(n)$-word dynamic data structure that answers queries in $O(\lg n /\lg\lg n)$ time and supports insertions and deletions in $O(\lg n)$ time, for any constant $ε\in (0,1)$. This is the first result on dynamic approximate range mode; it can also be used to obtain the first static data structure for approximate 3-sided range mode queries in two dimensions. We also consider approximate range selection. For any $α\in (0,1/2)$, an $α$-approximate range selection query asks for the position of an element whose rank in the query range is in $[k - αs, k + αs]$, where $k$ is a rank given by the query and $s$ is the size of the query range. When $α$ is a constant, we design an $O(n)$-bit encoding data structure that can answer queries in constant time and prove this space cost is asymptotically optimal. The previous best result by Krizanc et al. (Range Mode and Range Median Queries on Lists and Trees, Nordic Journal of Computing, 2005) uses $O(n\lg n)$ bits, or $O(n)$ words, to achieve constant approximation for range median only. Thus we not only improve the space cost, but also provide support for any arbitrary $k$ given at query time.

Submitted 19 July, 2019; originally announced July 2019.

arXiv:1903.06601 [pdf, other]

Dynamic Planar Point Location in External Memory

Authors: J. Ian Munro, Yakov Nekrich

Abstract: In this paper we describe a fully-dynamic data structure for the planar point location problem in the external memory model. Our data structure supports queries in $O(\log_B n(\log\log_B n)^3)$ I/Os and updates in $O(\log_B n(\log\log_B n)^2)$ amortized I/Os, where $n$ is the number of segments in the subdivision and $B$ is the block size. This is the first dynamic data structure with almost-optimal query cost. For comparison all previously known results for this problem require $O(\log_B^2 n)$ I/Os to answer queries. Our result almost matches the best known upper bound in the internal-memory model.

Submitted 15 March, 2019; originally announced March 2019.

Comments: extended version of a SoCG'19 paper

arXiv:1903.02533 [pdf, other]

Entropy Trees and Range-Minimum Queries In Optimal Average-Case Space

Authors: J. Ian Munro, Sebastian Wild

Abstract: The range-minimum query (RMQ) problem is a fundamental data structuring task with numerous applications. Despite the fact that succinct solutions with worst-case optimal $2n+o(n)$ bits of space and constant query time are known, it has been unknown whether such a data structure can be made adaptive to the reduced entropy of random inputs (Davoodi et al. 2014). We construct a succinct data structure with the optimal $1.736n+o(n)$ bits of space on average for random RMQ instances, settling this open problem. Our solution relies on a compressed data structure for binary trees that is of independent interest. It can store a (static) binary search tree generated by random insertions in asymptotically optimal expected space and supports many queries in constant time. Using an instance-optimal encoding of subtrees, we furthermore obtain a "hyper-succinct" data structure for binary trees that improves upon the ultra-succinct representation of Jansson, Sadakane and Sung (2012).

Submitted 6 March, 2019; originally announced March 2019.
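The connection between RMQ and binary trees in the abstract above rests on a standard fact: the answer to every range-minimum query depends only on the shape of the array's Cartesian tree. The sketch below builds that tree with the usual stack-based scan and checks the fact on two arrays with the same shape. It is a plain illustration, not the paper's compressed structure; the function names are hypothetical, and breaking ties toward the leftmost minimum is an assumption of this sketch.

```python
def cartesian_tree_parents(values):
    """Parent index of each position in the min-rooted Cartesian tree (-1 marks the root)."""
    parent = [-1] * len(values)
    stack = []                         # indices on the rightmost path, values non-decreasing
    for i, x in enumerate(values):
        last = -1
        while stack and values[stack[-1]] > x:
            last = stack.pop()
        if last != -1:
            parent[last] = i           # last popped node becomes the left child of i
        if stack:
            parent[i] = stack[-1]      # i becomes the right child of the new stack top
        stack.append(i)
    return parent

def range_argmin(values, i, j):
    """Leftmost position of the minimum of values[i..j], by direct scan."""
    return min(range(i, j + 1), key=lambda k: values[k])

a = [3, 1, 2]
b = [30, 10, 25]                       # different values, same Cartesian tree shape
same_answers = all(
    range_argmin(a, i, j) == range_argmin(b, i, j)
    for i in range(len(a)) for j in range(i, len(a))
)
```

Because only the tree shape matters, an RMQ structure needs to store the shape rather than the values, which is where the bit counts quoted in the abstract come from.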

arXiv:1902.05166 [pdf, other]

Space-Efficient Data Structures for Lattices

Authors: J. Ian Munro, Bryce Sandlund, Corwin Sinnamon

Abstract: A lattice is a partially-ordered set in which every pair of elements has a unique meet (greatest lower bound) and join (least upper bound). We present new data structures for lattices that are simple, efficient, and nearly optimal in terms of space complexity. Our first data structure can answer partial order queries in constant time and find the meet or join of two elements in $O(n^{3/4})$ time, where $n$ is the number of elements in the lattice. It occupies $O(n^{3/2}\log n)$ bits of space, which is only a $Θ(\log n)$ factor from the $Θ(n^{3/2})$-bit lower bound for storing lattices. The preprocessing time is $O(n^2)$. This structure admits a simple space-time tradeoff so that, for any $c \in [\frac{1}{2}, 1]$, the data structure supports meet and join queries in $O(n^{1-c/2})$ time, occupies $O(n^{1+c}\log n)$ bits of space, and can be constructed in $O(n^2 + n^{1+3c/2})$ time. Our second data structure uses $O(n^{3/2}\log n)$ bits of space and supports meet and join in $O(d \frac{\log n}{\log d})$ time, where $d$ is the maximum degree of any element in the transitive reduction graph of the lattice. This structure is much faster for lattices with low-degree elements. This paper also identifies an error in a long-standing solution to the problem of representing lattices. We discuss the issue with this previous work.

Submitted 16 June, 2020; v1 submitted 13 February, 2019; originally announced February 2019.

Comments: Accepted in SWAT 2020

arXiv:1901.03783 [pdf, ps, other]

doi 10.1007/s00236-021-00411-z

On Huang and Wong's Algorithm for Generalized Binary Split Trees

Authors: Marek Chrobak, Mordecai Golin, J. Ian Munro, Neal E. Young

Abstract: Huang and Wong [1984] proposed a polynomial-time dynamic-programming algorithm for computing optimal generalized binary split trees. We show that their algorithm is incorrect. Thus, it remains open whether such trees can be computed in polynomial time. Spuler [1994] proposed modifying Huang and Wong's algorithm to obtain an algorithm for a different problem: computing optimal two-way-comparison search trees. We show that the dynamic program underlying Spuler's algorithm is not valid, in that it does not satisfy the necessary optimal-substructure property and its proposed recurrence relation is incorrect. It remains unknown whether the algorithm is guaranteed to compute a correct overall solution.

Submitted 14 February, 2022; v1 submitted 11 January, 2019; originally announced January 2019.

MSC Class: 68P10; 68P30; 68W25; 94A45

ACM Class: E.4; G.1.6; G.2.2; H.3.1; I.4.2

Journal ref: Acta Informatica (2022)

arXiv:1807.03827 [pdf, other]

Improved Time and Space Bounds for Dynamic Range Mode

Authors: Hicham El-Zein, Meng He, J. Ian Munro, Bryce Sandlund

Abstract: Given an array $A$ of $n$ elements, we wish to support queries for the most frequent and least frequent element in a subrange $[l, r]$ of $A$. We also wish to support updates that change a particular element at index $i$ or insert/delete an element at index $i$. For the range mode problem, our data structure supports all operations in $O(n^{2/3})$ deterministic time using only $O(n)$ space. This improves two results by Chan et al.: a linear space data structure supporting update and query operations in $\tilde{O}(n^{3/4})$ time and an $O(n^{4/3})$ space data structure supporting update and query operations in $\tilde{O}(n^{2/3})$ time. For the range least frequent problem, we address two variations. In the first, we are allowed to answer with an element of $A$ that may not appear in the query range, and in the second, the returned element must be present in the query range. For the first variation, we develop a data structure that supports queries in $\tilde{O}(n^{2/3})$ time, updates in $O(n^{2/3})$ time, and occupies $O(n)$ space. For the second variation, we develop a Monte Carlo data structure that supports queries in $O(n^{2/3})$ time, updates in $\tilde{O}(n^{2/3})$ time, and occupies $\tilde{O}(n)$ space, but requires that updates are made independently of the results of previous queries. The Monte Carlo data structure is also capable of answering $k$-frequency queries; that is, the problem of finding an element of given frequency in the specified query range. Previously, no dynamic data structures were known for least frequent element or $k$-frequency queries.

Submitted 10 July, 2018; originally announced July 2018.

arXiv:1806.10498 [pdf, other]

Dynamic Trees with Almost-Optimal Access Cost

Authors: Mordecai Golin, John Iacono, Stefan Langerman, J. Ian Munro, Yakov Nekrich

Abstract: An optimal binary search tree for an access sequence on elements is a static tree that minimizes the total search cost. Constructing perfectly optimal binary search trees is expensive so the most efficient algorithms construct almost optimal search trees. There exists a long literature of constructing almost optimal search trees dynamically, i.e., when the access pattern is not known in advance. All of these trees, e.g., splay trees and treaps, provide a multiplicative approximation to the optimal search cost. In this paper we show how to maintain an almost optimal weighted binary search tree under access operations and insertions of new elements where the approximation is an additive constant. More technically, we maintain a tree in which the depth of the leaf holding an element $e_i$ does not exceed $\min(\log(W/w_i),\log n)+O(1)$ where $w_i$ is the number of times $e_i$ was accessed and $W$ is the total length of the access sequence. Our techniques can also be used to encode a sequence of $m$ symbols with a dynamic alphabetic code in $O(m)$ time so that the encoding length is bounded by $m(H+O(1))$, where $H$ is the entropy of the sequence. This is the first efficient algorithm for adaptive alphabetic coding that runs in constant time per symbol.

Submitted 27 June, 2018; originally announced June 2018.

Comments: Full version of an ESA'18 paper

arXiv:1805.04154 [pdf, other]

doi 10.4230/lipics.esa.2018.63

Nearly-Optimal Mergesorts: Fast, Practical Sorting Methods That Optimally Adapt to Existing Runs

Authors: J. Ian Munro, Sebastian Wild

Abstract: We present two stable mergesort variants, "peeksort" and "powersort", that exploit existing runs and find nearly-optimal merging orders with practically negligible overhead. Previous methods either require substantial effort for determining the merging order (Takaoka 2009; Barbay & Navarro 2013) or do not have a constant-factor optimal worst-case guarantee (Peters 2001; Auger, Nicaud & Pivoteau 2015; Buss & Knop 2018). We demonstrate that our methods are competitive in terms of running time with state-of-the-art implementations of stable sorting methods.

Submitted 10 May, 2018; originally announced May 2018.
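The "existing runs" that the abstract above refers to can be found with a single left-to-right scan. The sketch below shows only this run-detection step as it is commonly done in natural mergesorts; it does not reproduce the merge-order policies of peeksort or powersort. The function name `find_runs` is hypothetical, and reversing strictly decreasing runs in place (so that equal elements never swap order) is a design choice assumed here.

```python
def find_runs(a):
    """Split list a into maximal runs and return them as (start, end_exclusive) pairs.

    Weakly increasing runs are kept as they are; strictly decreasing runs are
    reversed in place, so every reported run is sorted afterwards.
    """
    runs = []
    n = len(a)
    i = 0
    while i < n:
        j = i + 1
        if j < n and a[j] < a[i]:
            # Strictly decreasing run: extend it, then reverse it in place.
            while j < n and a[j] < a[j - 1]:
                j += 1
            a[i:j] = a[i:j][::-1]
        else:
            # Weakly increasing run (possibly a single element).
            while j < n and a[j] >= a[j - 1]:
                j += 1
        runs.append((i, j))
        i = j
    return runs

data = [1, 2, 3, 9, 8, 7, 4, 5]
runs = find_runs(data)
```

A run-adaptive mergesort would then decide in which order to merge these runs; that decision is where the methods named in the abstract differ from one another.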

arXiv:1802.09505 [pdf, other]

Faster Algorithms for some Optimization Problems on Collinear Points

Authors: Ahmad Biniaz, Prosenjit Bose, Paz Carmi, Anil Maheshwari, J. Ian Munro, Michiel Smid

Abstract: We propose faster algorithms for the following three optimization problems on $n$ collinear points, i.e., points in dimension one. The first two problems are known to be NP-hard in higher dimensions. 1- Maximizing total area of disjoint disks: In this problem the goal is to maximize the total area of nonoverlapping disks centered at the points. Acharyya, De, and Nandy (2017) presented an $O(n^2)$-time algorithm for this problem. We present an optimal $Θ(n)$-time algorithm. 2- Minimizing sum of the radii of client-server coverage: The $n$ points are partitioned into two sets, namely clients and servers. The goal is to minimize the sum of the radii of disks centered at servers such that every client is in some disk, i.e., in the coverage range of some server. Lev-Tov and Peleg (2005) presented an $O(n^3)$-time algorithm for this problem. We present an $O(n^2)$-time algorithm, thereby improving the running time by a factor of $Θ(n)$. 3- Minimizing total area of point-interval coverage: The $n$ input points belong to an interval $I$. The goal is to find a set of $n$ disks of minimum total area, covering $I$, such that every disk contains at least one input point. We present an algorithm that solves this problem in $O(n^2)$ time.

Submitted 26 July, 2018; v1 submitted 26 February, 2018; originally announced February 2018.

Comments: To appear in SoCG 2018. Full version (15 pages)

arXiv:1712.07431 [pdf, ps, other]

Text Indexing and Searching in Sublinear Time

Authors: J. Ian Munro, Gonzalo Navarro, Yakov Nekrich

Abstract: We introduce the first index that can be built in $o(n)$ time for a text of length $n$, and can also be queried in $o(q)$ time for a pattern of length $q$. On an alphabet of size $σ$, our index uses $O(n\sqrt{\log n\logσ})$ bits, is built in $O(n((\log\log n)^2+\sqrt{\logσ})/\sqrt{\log_σn})$ deterministic time, and computes the number $\mathrm{occ}$ of occurrences of the pattern in time $O(q/\log_σn+\log n)$. Each such occurrence can then be found in $O(\sqrt{\log n\logσ})$ time. By slightly increasing the space and construction time, to $O(n(\sqrt{\log n\logσ}+ \logσ\log^\varepsilon n))$ and $O(n\log^{3/2}σ/\log^{1/2-\varepsilon} n)$, respectively, for any constant $0<\varepsilon<1/2$, we can find the $\mathrm{occ}$ pattern occurrences in time $O(q/\log_σn + \sqrt{\log_σn}\log\log n + \mathrm{occ})$. We build on a novel text sampling based on difference covers, which enjoys properties that allow us to efficiently compute longest common prefixes in constant time. We extend our results to the secondary memory model as well, where we give the first construction in $o(\mathit{Sort}(n))$ I/Os of a data structure with suffix array functionality; this data structure supports pattern matching queries with optimal or nearly-optimal cost.

Submitted 15 July, 2019; v1 submitted 20 December, 2017; originally announced December 2017.

arXiv:1707.01743 [pdf, other]

Fast Compressed Self-Indexes with Deterministic Linear-Time Construction

Authors: J. Ian Munro, Gonzalo Navarro, Yakov Nekrich

Abstract: We introduce a compressed suffix array representation that, on a text $T$ of length $n$ over an alphabet of size $σ$, can be built in $O(n)$ deterministic time, within $O(n\logσ)$ bits of working space, and counts the number of occurrences of any pattern $P$ in $T$ in time $O(|P| + \log\log_w σ)$ on a RAM machine of $w=Ω(\log n)$-bit words. This new index outperforms all the other compressed indexes that can be built in linear deterministic time, and some others. The only faster indexes can be built in linear time only in expectation, or require $Θ(n\log n)$ bits. We also show that, by using $O(n\logσ)$ bits, we can build in linear time an index that counts in time $O(|P|/\log_σn + \log n(\log\log n)^2)$, which is RAM-optimal for $w=Θ(\log n)$ and sufficiently long patterns.

Submitted 1 September, 2017; v1 submitted 6 July, 2017; originally announced July 2017.

arXiv:1607.04346 [pdf, other]

Space-Efficient Construction of Compressed Indexes in Deterministic Linear Time

Authors: J. Ian Munro, Gonzalo Navarro, Yakov Nekrich

Abstract: We show that the compressed suffix array and the compressed suffix tree of a string $T$ can be built in $O(n)$ deterministic time using $O(n\logσ)$ bits of space, where $n$ is the string length and $σ$ is the alphabet size. Previously described deterministic algorithms either run in time that depends on the alphabet size or need $ω(n\log σ)$ bits of working space. Our result has immediate applications to other problems, such as yielding the first linear-time LZ77 and LZ78 parsing algorithms that use $O(n \logσ)$ bits.

Submitted 13 November, 2016; v1 submitted 14 July, 2016; originally announced July 2016.

Comments: Extended version of a paper to appear at SODA 2017

arXiv:1606.04495 [pdf, ps, other]

Range Majorities and Minorities in Arrays

Authors: Djamal Belazzougui, Travis Gagie, J. Ian Munro, Gonzalo Navarro, Yakov Nekrich

Abstract: Karpinski and Nekrich (2008) introduced the problem of parameterized range majority, which asks us to preprocess a string of length $n$ such that, given the endpoints of a range, one can quickly find all the distinct elements whose relative frequencies in that range are more than a threshold $τ$. Subsequent authors have reduced their time and space bounds such that, when $τ$ is fixed at preprocessing time, we need either $O(n \log (1 / τ))$ space and optimal $O(1 / τ)$ query time or linear space and $O((1 / τ) \log \log σ)$ query time, where $σ$ is the alphabet size. In this paper we give the first linear-space solution with optimal $O(1 / τ)$ query time, even with variable $τ$ (i.e., specified with the query). For the case when $σ$ is polynomial on the computer word size, our space is optimally compressed according to the symbol frequencies in the string. Otherwise, either the compressed space is increased by an arbitrarily small constant factor or the time rises to any function in $(1/τ)\cdotω(1)$. We obtain the same results on the complementary problem of parameterized range minority introduced by Chan et al. (2015), who had achieved linear space and $O(1 / τ)$ query time with variable $τ$.

Submitted 14 June, 2016; originally announced June 2016.

Comments: arXiv admin note: substantial text overlap with arXiv:1210.1765

arXiv:1507.06866 [pdf, ps, other]

Compressed Data Structures for Dynamic Sequences

Authors: J. Ian Munro, Yakov Nekrich

Abstract: We consider the problem of storing a dynamic string $S$ over an alphabet $Σ=\{\,1,\ldots,σ\,\}$ in compressed form. Our representation supports insertions and deletions of symbols and answers three fundamental queries: $\mathrm{access}(i,S)$ returns the $i$-th symbol in $S$, $\mathrm{rank}_a(i,S)$ counts how many times a symbol $a$ occurs among the first $i$ positions in $S$, and $\mathrm{select}_a(i,S)$ finds the position where a symbol $a$ occurs for the $i$-th time. We present the first fully-dynamic data structure for arbitrarily large alphabets that achieves optimal query times for all three operations and supports updates with worst-case time guarantees. Ours is also the first fully-dynamic data structure that needs only $nH_k+o(n\logσ)$ bits, where $H_k$ is the $k$-th order entropy and $n$ is the string length. Moreover our representation supports extraction of a substring $S[i..i+\ell]$ in optimal $O(\log n/\log\log n + \ell/\log_σn)$ time.
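
The three queries named in this abstract can be pinned down with a naive reference implementation. The sketch below is not the paper's data structure: it is uncompressed and static, takes linear time per query, and its function names and 1-indexed positions are assumptions chosen to mirror the abstract's notation.

```python
def access(i, S):
    """Return the i-th symbol of S (1-indexed, mirroring the abstract)."""
    return S[i - 1]


def rank(a, i, S):
    """Count how many times symbol a occurs among the first i positions of S."""
    return sum(1 for c in S[:i] if c == a)


def select(a, i, S):
    """Return the 1-indexed position of the i-th occurrence of a, or None."""
    seen = 0
    for pos, c in enumerate(S, start=1):
        if c == a:
            seen += 1
            if seen == i:
                return pos
    return None  # a occurs fewer than i times


# Hypothetical example string over the alphabet {1, 2, 3}.
S = [2, 1, 3, 1, 1, 2]
```

For this example string, `rank(1, 4, S)` counts the occurrences of symbol 1 in the prefix `[2, 1, 3, 1]`, and `select(1, 3, S)` locates the third occurrence of symbol 1.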

Submitted 24 July, 2015; originally announced July 2015.

arXiv:1505.00357 [pdf, other]

doi 10.1007/978-3-662-48971-0_7

Optimal Search Trees with 2-Way Comparisons

Authors: Marek Chrobak, Mordecai Golin, J. Ian Munro, Neal E. Young

Abstract: In 1971, Knuth gave an $O(n^2)$-time algorithm for the classic problem of finding an optimal binary search tree. Knuth's algorithm works only for search trees based on 3-way comparisons, while most modern computers support only 2-way comparisons (e.g., $<, \le, =, \ge$, and $>$). Until this paper, the problem of finding an optimal search tree using 2-way comparisons remained open -- poly-time algorithms were known only for restricted variants. We solve the general case, giving (i) an $O(n^4)$-time algorithm and (ii) an $O(n \log n)$-time additive-3 approximation algorithm. Also, for finding optimal binary split trees, we (iii) obtain a linear speedup and (iv) prove some previous work incorrect.

Submitted 9 March, 2021; v1 submitted 2 May, 2015; originally announced May 2015.

Comments: ERRATUM: The proof of Theorem 3 of the ISAAC'15 paper (v4 here) is incorrect. Version v5 here contains: a full erratum, proofs of the other results, and pointers to journal versions expanding those results

MSC Class: 68P10; 68P30; 68W25; 94A45; ACM Class: E.4; G.1.6; G.2.2; H.3.1; I.4.2

Journal ref: Optimal Search Trees with 2-Way Comparisons. In: Elbassioni K., Makino K. (eds) Algorithms and Computation. ISAAC 2015. Lecture Notes in Computer Science, vol 9472 (2015). Springer, Berlin, Heidelberg

arXiv:1503.05977 [pdf, other]

Dynamic Data Structures for Document Collections and Graphs

Authors: J. Ian Munro, Yakov Nekrich, Jeffrey Scott Vitter

Abstract: In the dynamic indexing problem, we must maintain a changing collection of text documents so that we can efficiently support insertions, deletions, and pattern matching queries. We are especially interested in developing efficient data structures that store and query the documents in compressed form. All previous compressed solutions to this problem rely on answering rank and select queries on a dynamic sequence of symbols. Because of the lower bound in [Fredman and Saks, 1989], answering rank queries presents a bottleneck in compressed dynamic indexing. In this paper we show how this lower bound can be circumvented using our new framework. We demonstrate that the gap between static and dynamic variants of the indexing problem can be almost closed. Our method is based on a novel framework for adding dynamism to static compressed data structures. Our framework also applies more generally to dynamizing other problems. We show, for example, how our framework can be applied to develop compressed representations of dynamic graphs and binary relations.

Submitted 19 March, 2015; originally announced March 2015.

arXiv:1306.4287 [pdf, ps, other]

Succinct data structures for representing equivalence classes

Authors: Moshe Lewenstein, J. Ian Munro, Venkatesh Raman

Abstract: Given a partition of an n element set into equivalence classes, we consider time-space tradeoffs for representing it to support the query that asks whether two given elements are in the same equivalence class. This has various applications including for testing whether two vertices are in the same component in an undirected graph or in the same strongly connected component in a directed graph. We consider the problem in several models. -- Concerning labeling schemes where we assign labels to elements and the query is to be answered just by examining the labels of the queried elements (without any extra space): if each vertex is required to have a unique label, then we show that a label space of (\sum_{i=1}^n \lfloor {n \over i} \rfloor) is necessary and sufficient. In other words, \lg n + \lg \lg n + O(1) bits of space are necessary and sufficient for representing each of the labels. This slightly strengthens the known lower bound and is in contrast to the known necessary and sufficient bound of \lceil \lg n \rceil for the label length, if each vertex need not get a unique label. -- Concerning succinct data structures for the problem when the n elements are to be uniquely assigned labels from label set {1, 2, ...n}, we first show that Θ(\sqrt n) bits are necessary and sufficient to represent the equivalence class information. This space includes the space for implicitly encoding the vertex labels. We can support the query in such a structure in O(\lg n) time in the standard word RAM model. We then develop structures resulting in one where the queries can be supported in constant time using O({\sqrt n} \lg n) bits of space. We also develop space efficient structures where union operation along with the equivalence query can be answered fast.

Submitted 18 June, 2013; originally announced June 2013.

arXiv:1204.4835 [pdf, other]

Succinct Indices for Range Queries with applications to Orthogonal Range Maxima

Authors: Arash Farzan, J. Ian Munro, Rajeev Raman

Abstract: We consider the problem of preprocessing $N$ points in 2D, each endowed with a priority, to answer the following queries: given an axis-parallel rectangle, determine the point with the largest priority in the rectangle. Using the ideas of the \emph{effective entropy} of range maxima queries and \emph{succinct indices} for range maxima queries, we obtain a structure that uses O(N) words and answers the above query in $O(\log N \log \log N)$ time. This is a direct improvement of Chazelle's result from FOCS 1985 for this problem -- Chazelle required $O(N/ε)$ words to answer queries in $O((\log N)^{1+ε})$ time for any constant $ε> 0$.

Submitted 21 April, 2012; originally announced April 2012.

Comments: To appear in ICALP 2012

Report number: Leicester CS-TR-12-001

arXiv:1204.1957 [pdf, other]

Succinct Posets

Authors: J. Ian Munro, Patrick K. Nicholson

Abstract: We describe an algorithm for compressing a partially ordered set, or \emph{poset}, so that it occupies space matching the information theory lower bound (to within lower order terms), in the worst case. Using this algorithm, we design a succinct data structure for representing a poset that, given two elements, can report whether one precedes the other in constant time. This is equivalent to succinctly representing the transitive closure graph of the poset, and we note that the same method can also be used to succinctly represent the transitive reduction graph. For an $n$ element poset, the data structure occupies $n^2/4 + o(n^2)$ bits, in the worst case, which is roughly half the space occupied by an upper triangular matrix. Furthermore, a slight extension to this data structure yields a succinct oracle for reachability in arbitrary directed graphs. Thus, using roughly a quarter of the space required to represent an arbitrary directed graph, reachability queries can be supported in constant time.

Submitted 22 April, 2012; v1 submitted 9 April, 2012; originally announced April 2012.

Comments: 12 pages lncs format + short appendix

arXiv:1108.1983 [pdf, ps, other]

Succinct Representations of Permutations and Functions

Authors: J. Ian Munro, Rajeev Raman, Venkatesh Raman, S. Srinivasa Rao

Abstract: We investigate the problem of succinctly representing an arbitrary permutation, π, on {0,...,n-1} so that π^k(i) can be computed quickly for any i and any (positive or negative) integer power k. A representation taking (1+ε) n lg n + O(1) bits suffices to compute arbitrary powers in constant time, for any positive constant ε<= 1. A representation taking the optimal \ceil{\lg n!} + o(n) bits can be used to compute arbitrary powers in O(lg n / lg lg n) time. We then consider the more general problem of succinctly representing an arbitrary function, f: [n] \rightarrow [n] so that f^k(i) can be computed quickly for any i and any integer power k. We give a representation that takes (1+ε) n lg n + O(1) bits, for any positive constant ε<= 1, and computes arbitrary positive powers in constant time. It can also be used to compute f^k(i), for any negative integer k, in optimal O(1+|f^k(i)|) time. We place emphasis on the redundancy, or the space beyond the information-theoretic lower bound that the data structure uses in order to support operations efficiently. A number of lower bounds have recently been shown on the redundancy of data structures. These lower bounds confirm the space-time optimality of some of our solutions. Furthermore, the redundancy of one of our structures "surpasses" a recent lower bound by Golynski [Golynski, SODA 2009], thus demonstrating the limitations of this lower bound.

Submitted 9 August, 2011; originally announced August 2011.

Comments: Preliminary versions of these results have appeared in the Proceedings of ICALP 2003 and 2004. However, all results in this version are improved over the earlier conference version

arXiv:1106.5076 [pdf, other]

Dynamic Range Selection in Linear Space

Authors: Meng He, J. Ian Munro, Patrick K. Nicholson

Abstract: Given a set $S$ of $n$ points in the plane, we consider the problem of answering range selection queries on $S$: that is, given an arbitrary $x$-range $Q$ and an integer $k > 0$, return the $k$-th smallest $y$-coordinate from the set of points that have $x$-coordinates in $Q$. We present a linear space data structure that maintains a dynamic set of $n$ points in the plane with real coordinates, and supports range selection queries in $O((\lg n / \lg \lg n)^2)$ time, as well as insertions and deletions in $O((\lg n / \lg \lg n)^2)$ amortized time. The space usage of this data structure is an $Θ(\lg n / \lg \lg n)$ factor improvement over the previous best result, while maintaining asymptotically matching query and update times. We also present a succinct data structure that supports range selection queries on a dynamic array of $n$ values drawn from a bounded universe.

Submitted 8 May, 2013; v1 submitted 24 June, 2011; originally announced June 2011.

Comments: 11 pages (lncs fullpage). This is a corrected version of the preliminary version of the paper that appeared in ISAAC 2011

arXiv:1104.5517 [pdf, ps, other]

Dynamic Range Majority Data Structures

Authors: Amr Elmasry, Meng He, J. Ian Munro, Patrick K. Nicholson

Abstract: Given a set $P$ of coloured points on the real line, we study the problem of answering range $α$-majority (or "heavy hitter") queries on $P$. More specifically, for a query range $Q$, we want to return each colour that is assigned to more than an $α$-fraction of the points contained in $Q$. We present a new data structure for answering range $α$-majority queries on a dynamic set of points, where $α\in (0,1)$. Our data structure uses O(n) space, supports queries in $O((\lg n) / α)$ time, and updates in $O((\lg n) / α)$ amortized time. If the coordinates of the points are integers, then the query time can be improved to $O(\lg n / (α\lg \lg n) + (\lg(1/α))/α))$. For constant values of $α$, this improved query time matches an existing lower bound, for any data structure with polylogarithmic update time. We also generalize our data structure to handle sets of points in d-dimensions, for $d \ge 2$, as well as dynamic arrays, in which each entry is a colour.
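
The range $α$-majority query defined in this abstract can be stated as a brute-force scan. The sketch below is not the paper's data structure: it spends linear time per query, and the function name, the `(coordinate, colour)` tuple layout, and the closed query interval `[lo, hi]` are assumptions made for illustration.

```python
from collections import Counter


def range_alpha_majority(points, lo, hi, alpha):
    """Return every colour assigned to more than an alpha-fraction of the
    points whose coordinate lies in the closed range [lo, hi].

    points: iterable of (coordinate, colour) pairs (hypothetical layout).
    """
    in_range = [colour for x, colour in points if lo <= x <= hi]
    counts = Counter(in_range)
    threshold = alpha * len(in_range)
    # Strict inequality: "more than" an alpha-fraction, as in the abstract.
    return {colour for colour, k in counts.items() if k > threshold}


# Hypothetical example input.
points = [(1, "r"), (2, "r"), (3, "b"), (4, "r"), (5, "g")]
```

With this input, the range `[1, 4]` contains four points, three of them coloured `"r"`, so `"r"` is the only colour exceeding a 0.5 fraction.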

Submitted 4 December, 2012; v1 submitted 28 April, 2011; originally announced April 2011.

Comments: 16 pages, Preliminary version appeared in ISAAC 2011

arXiv:1005.4652 [pdf, ps, other]

Succinct Representations of Dynamic Strings

Authors: Meng He, J. Ian Munro

Abstract: The rank and select operations over a string of length n from an alphabet of size $σ$ have been used widely in the design of succinct data structures. In many applications, the string itself need be maintained dynamically, allowing characters of the string to be inserted and deleted. Under the word RAM model with word size $w=Ω(\lg n)$, we design a succinct representation of dynamic strings using $nH_0 + o(n)\lgσ+ O(w)$ bits to support rank, select, insert and delete in $O(\frac{\lg n}{\lg\lg n}(\frac{\lg σ}{\lg\lg n}+1))$ time. When the alphabet size is small, i.e. when $σ= O(\polylog (n))$, including the case in which the string is a bit vector, these operations are supported in $O(\frac{\lg n}{\lg\lg n})$ time. Our data structures are more efficient than previous results on the same problem, and we have applied them to improve results on the design and construction of space-efficient text indexes.

Submitted 23 June, 2010; v1 submitted 25 May, 2010; originally announced May 2010.

arXiv:1002.3511 [pdf, ps, other]

Range Reporting for Moving Points on a Grid

Authors: Marek Karpinski, J. Ian Munro, Yakov Nekrich

Abstract: In this paper we describe a new data structure that supports orthogonal range reporting queries on a set of points that move along linear trajectories on a $U\times U$ grid. The assumption that points lie on a $U\times U$ grid enables us to significantly decrease the query time in comparison to the standard kinetic model. Our data structure answers queries in $O(\sqrt{\log U/\log \log U}+k)$ time, where $k$ denotes the number of points in the answer. The above improves over the $Ω(\log n)$ lower bound that is valid in the infinite-precision kinetic model. The methods used in this paper could be also of independent interest.

Submitted 18 February, 2010; originally announced February 2010.

Comments: 14 pages

arXiv:0911.0086 [pdf, other]

Sorting under Partial Information (without the Ellipsoid Algorithm)

Authors: Jean Cardinal, Samuel Fiorini, Gwenaël Joret, Raphaël Jungers, J. Ian Munro

Abstract: We revisit the well-known problem of sorting under partial information: sort a finite set given the outcomes of comparisons between some pairs of elements. The input is a partially ordered set P, and solving the problem amounts to discovering an unknown linear extension of P, using pairwise comparisons. The information-theoretic lower bound on the number of comparisons needed in the worst case is log e(P), the binary logarithm of the number of linear extensions of P. In a breakthrough paper, Jeff Kahn and Jeong Han Kim (J. Comput. System Sci. 51 (3), 390-399, 1995) showed that there exists a polynomial-time algorithm for the problem achieving this bound up to a constant factor. Their algorithm invokes the ellipsoid algorithm at each iteration for determining the next comparison, making it impractical. We develop efficient algorithms for sorting under partial information. Like Kahn and Kim, our approach relies on graph entropy. However, our algorithms differ in essential ways from theirs. Rather than resorting to convex programming for computing the entropy, we approximate the entropy, or make sure it is computed only once, in a restricted class of graphs, permitting the use of a simpler algorithm. Specifically, we present: - an O(n^2) algorithm performing O(log n log e(P)) comparisons; - an O(n^2.5) algorithm performing at most (1+ epsilon) log e(P) + O_epsilon (n) comparisons; - an O(n^2.5) algorithm performing O(log e(P)) comparisons. All our algorithms can be implemented in such a way that their computational bottleneck is confined in a preprocessing phase, while the sorting phase is completed in O(q) + O(n) time, where q denotes the number of comparisons performed.

Submitted 21 January, 2013; v1 submitted 31 October, 2009; originally announced November 2009.

Comments: v3: Minor changes. A preliminary version appeared in the proceedings of the 42nd ACM Symposium on Theory of Computing (STOC 2010)

arXiv:0811.2572 [pdf, other]

doi 10.1137/090759860

An Efficient Algorithm for Partial Order Production

Authors: Jean Cardinal, Samuel Fiorini, Gwenaël Joret, Raphaël M. Jungers, J. Ian Munro

Abstract: We consider the problem of partial order production: arrange the elements of an unknown totally ordered set T into a target partially ordered set S, by comparing a minimum number of pairs in T. Special cases include sorting by comparisons, selection, multiple selection, and heap construction. We give an algorithm performing ITLB + o(ITLB) + O(n) comparisons in the worst case. Here, n denotes the size of the ground sets, and ITLB denotes a natural information-theoretic lower bound on the number of comparisons needed to produce the target partial order. Our approach is to replace the target partial order by a weak order (that is, a partial order with a layered structure) extending it, without increasing the information theoretic lower bound too much. We then solve the problem by applying an efficient multiple selection algorithm. The overall complexity of our algorithm is polynomial. This answers a question of Yao (SIAM J. Comput. 18, 1989). We base our analysis on the entropy of the target partial order, a quantity that can be efficiently computed and provides a good estimate of the information-theoretic lower bound.

Submitted 1 December, 2009; v1 submitted 17 November, 2008; originally announced November 2008.

Comments: Referees' comments incorporated

Journal ref: SIAM J. Comput. Volume 39, Issue 7, pp. 2927-2940 (2010)

arXiv:cs/0601081 [pdf, ps, other]

An O(1) Solution to the Prefix Sum Problem on a Specialized Memory Architecture

Authors: Andrej Brodnik, Johan Karlsson, J. Ian Munro, Andreas Nilsson

Abstract: In this paper we study the Prefix Sum problem introduced by Fredman. We show that it is possible to perform both update and retrieval in O(1) time simultaneously under a memory model in which individual bits may be shared by several words. We also show that two variants (generalizations) of the problem can be solved optimally in $Θ(\lg N)$ time under the comparison based model of computation.

Submitted 18 January, 2006; originally announced January 2006.

Comments: 12 pages

ACM Class: E.1; F.1.1

arXiv:cs/0107031 [pdf, ps, other]

The Complexity of Clickomania

Authors: Therese C. Biedl, Erik D. Demaine, Martin L. Demaine, Rudolf Fleischer, Lars Jacobsen, J. Ian Munro

Abstract: We study a popular puzzle game known variously as Clickomania and Same Game. Basically, a rectangular grid of blocks is initially colored with some number of colors, and the player repeatedly removes a chosen connected monochromatic group of at least two square blocks, and any blocks above it fall down. We show that one-column puzzles can be solved, i.e., the maximum possible number of blocks can be removed, in linear time for two colors, and in polynomial time for an arbitrary number of colors. On the other hand, deciding whether a puzzle is solvable (all blocks can be removed) is NP-complete for two columns and five colors, or five columns and three colors.

Submitted 21 July, 2001; originally announced July 2001.

Comments: 15 pages, 3 figures. To appear in More Games of No Chance, edited by R. J. Nowakowski

ACM Class: F.2.2; F.1.3; F.1.1; G.2.1

Showing 1–35 of 35 results for author: Munro, J I