Fast and Simple Sorting Using Partial Information
Abstract
We consider the problem of sorting a set of items having an unknown total order by doing binary comparisons of the items, given the outcomes of some pre-existing comparisons. We present a simple algorithm with a running time of $O(m + n + \log T)$, where $n$, $m$, and $T$ are the number of items, the number of pre-existing comparisons, and the number of total orders consistent with the outcomes of the pre-existing comparisons, respectively. The algorithm does $O(\log T)$ comparisons.
Our running time and comparison bounds are best possible up to constant factors, thus resolving a problem that has been studied intensely since 1976 (Fredman, Theoretical Computer Science). The best previous algorithm with a bound of $O(\log T)$ on the number of comparisons has a time bound of $O(n^{2.5})$ and is significantly more complicated. Our algorithm combines three classic algorithms: topological sort, heapsort with the right kind of heap, and efficient insertion into a sorted list.
1 Introduction
We consider the problem of sorting a set of items that are elements of a totally ordered set, assuming we are given for free the outcomes of certain comparisons between the items. This problem has been called sorting under partial information [Fre76, KK92], and it has been intensely studied since 1976.
We present a simple algorithm for this problem that runs in $O(m + n + \log T)$ time and does $O(\log T)$ comparisons, where $n$, $m$, and $T$ are the number of items, the number of pre-existing comparisons, and the number of total orders consistent with the pre-existing comparisons, respectively. These bounds are tight. The best previous algorithm with an $O(\log T)$ comparison bound has a time bound of $O(n^{2.5})$ and is much more complicated [Car+10].
Our algorithm for this problem combines three classic algorithms in a natural way: topological sort, heapsort with the right kind of heap, and efficient insertion into a sorted list. Unlike previous algorithms, ours does not use an estimate of $T$ to determine the next comparison.
Our algorithm is closely related to a recent result of [Hae+23] proving that Dijkstra’s algorithm with an appropriate heap is universally optimal for the task of sorting vertices according to their distance from a source vertex. We use the same kind of heap, and we use similar ideas to analyze our algorithm.
Graph-theoretical problem formulation
We assume the input is given in the form of a directed graph $G$ whose vertices are the items, having an arc $uv$ for each given comparison outcome $u < v$. The goal is to compute the unknown total order of the vertices of $G$ by doing additional binary comparisons of the vertices. The parameters $n$ and $m$ denote the number of vertices and the number of arcs of $G$, respectively.
The problem has a solution if and only if $G$ is acyclic; that is, $G$ is a directed acyclic graph (DAG). Each possible solution is a topological order of the DAG, a total order such that if $uv$ is an arc, $u < v$. A lower bound on the number of comparisons needed to solve the problem is $\log T$, where $T$ is the number of possible topological orders of $G$. (Throughout this paper, “log” without a base denotes the base-two logarithm.)
An equivalent view of the problem is that the given comparisons define a partial order, and the problem is to find an unknown linear extension of this partial order by doing additional comparisons. We prefer the DAG view, because the input DAG need be neither transitively closed (if $uv$ and $vw$ are arcs then so is $uw$) nor transitively reduced (if $uv$ is an arc then there is no other path from $u$ to $v$). We want a solution for an arbitrary DAG, and we do not want to take the time to compute either its transitive reduction or its transitive closure. From now on, we call the problem DAG sorting.
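To make the quantity in the lower bound concrete, here is a minimal brute-force sketch that counts the topological orders of a tiny DAG and evaluates the resulting information-theoretic bound. The function name `count_topological_orders`, the vertex numbering `0..n-1`, and the example graph are illustrative assumptions, not part of the paper, and the enumeration takes time proportional to $n!$, so it is only usable for very small inputs.

```python
from itertools import permutations
from math import ceil, log2

def count_topological_orders(n, arcs):
    """Count the orders of vertices 0..n-1 in which u precedes v for every arc (u, v)."""
    count = 0
    for perm in permutations(range(n)):
        position = {v: i for i, v in enumerate(perm)}
        if all(position[u] < position[v] for u, v in arcs):
            count += 1
    return count

# A "diamond" DAG: 0 precedes 1 and 2, and both of them precede 3.
diamond = [(0, 1), (0, 2), (1, 3), (2, 3)]
T = count_topological_orders(4, diamond)
# Any comparison-based algorithm needs at least ceil(log2(T)) comparisons in the worst case.
lower_bound = ceil(log2(T))
```

For the diamond, only the relative order of vertices 1 and 2 is undetermined, so a single additional comparison suffices.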
Roadmap
In Section 2, we briefly discuss related work on DAG sorting, on heaps with the working-set bound (a key ingredient in our algorithm), on related sorting and ordering problems, and on sampling and counting topological orders. In Section 3, we present our basic algorithm, which we call topological heapsort. We prove that it runs in $O(m + n + \log T)$ time and does $O(n + \log T)$ comparisons. We eliminate the additive $n$ term in the number of comparisons in Section 4 by adding a step to our algorithm, producing an algorithm that we call topological heapsort with insertion. This algorithm is best possible to within constant factors in both its running time and its number of comparisons.
2 Related Work
DAG sorting
[Fre76] considered the generalization of the DAG sorting problem in which there is an unknown total order selected from an arbitrary subset of the $n!$ total orders of $n$ elements, and the problem is to find the unknown total order by doing binary comparisons of the elements. He showed that if there are $T$ possible total orders, $\log T + 2n$ binary comparisons suffice. His algorithm is highly inefficient, however.
For the special case of sorting a DAG, [Fre76] and [Lin84] independently conjectured that there always exists a balancing comparison: a comparison “$u < v$?” such that the fraction of the topological orders of $G$ for which the answer is “yes” lies between $\delta$ and $1 - \delta$ for $\delta = 1/3$. This is the 1/3–2/3 conjecture. [KS84] showed that the claim is true for $\delta = 3/11$, and the value of $\delta$ has since been improved several times [KL91, BFT95, Bri99]. It follows immediately that there is an algorithm that always does at most $O(\log T)$ comparisons: Repeatedly find a balancing comparison.
[KK92] gave a polynomial-time algorithm for the DAG sorting problem that does $O(\log T)$ comparisons. Their algorithm uses the ellipsoid method to find a good comparison. [Car+10] gave several algorithms for sorting a DAG that avoid the use of the ellipsoid method. In particular, they devised an algorithm that runs in $O(n^{2.5})$ time and does $O(\log T)$ comparisons. This is the fastest previous algorithm that sorts a DAG in $O(\log T)$ comparisons. Another algorithm in [Car+10] has the same time bound and does $(1 + \epsilon)\log T + O_\epsilon(n)$ comparisons.
Heaps with the working-set bound
Our algorithm substantially deviates from the techniques previously used for DAG sorting. Our basic algorithm is a classical topological sorting algorithm modified to use a heap to choose the next-smallest vertex. We prove that if the heap has the so-called working-set bound, then our DAG sorting algorithm is efficient.
A heap is a data structure storing a set of items, each having a key selected from a totally ordered set. The heap is initially empty. For our purpose, heaps support two operations. The first is $\mathrm{insert}(x, H)$, which inserts item $x$ into heap $H$; $x$ must not be in $H$ already and must have a predefined key. The second is $\mathrm{delete\text{-}min}(H)$, which deletes and returns an item in $H$ with minimum key. Some heap implementations support a third operation, $\mathrm{decrease\text{-}key}(x, k, H)$, which, given the location of an item $x$ in heap $H$ such that $x$ has key greater than $k$, replaces the key of $x$ in heap $H$ by $k$. Fibonacci heaps [Fre+86a] and other equally efficient heap implementations support insert and decrease-key in $O(1)$ amortized time and delete-min on an $n$-item heap in $O(\log n)$ amortized time.
In our application we need a more refined bound for delete-mins that depends on the sequence of operations done on the heap. This is the working-set bound, defined as follows. Consider an item $x$ that is inserted into the heap at some time and later deleted. At any time while $x$ is in the heap, an item $y$ is in the working set of $x$ if it is $x$ itself, or if it was inserted after $x$ and has not yet been deleted. The number of items in the working set of $x$ can therefore change over time as other items are inserted and deleted. The working-set size $w_x$ of $x$ is the maximum size of its working set while $x$ is in the heap. The heap has the working-set bound if each insertion takes $O(1)$ time and each delete-min of an item $x$ takes $O(1 + \log w_x)$ time. These bounds can be amortized. (If decrease-key is a supported operation, it has no required time bound.) We call a heap that has the working-set bound a working-set heap. Although it is not obvious, one can prove that a heap with the working-set bound has amortized time bounds of $O(\log n)$ for insert and $O(1)$ for delete-min [Hae+24].
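The definition of working-set size can be checked on small operation sequences with a direct simulation. The sketch below is an illustration only: the function name `working_set_sizes` and the `("insert", item)` / `("delete", item)` encoding are assumptions made here, deletions are taken as given rather than computed as delete-mins, and the quadratic running time is chosen for clarity, not efficiency.

```python
def working_set_sizes(ops):
    """ops: list of ("insert", item) or ("delete", item) pairs in time order.

    Returns {item: working-set size}, where the working set of x at any moment
    is x together with the items inserted after x that are still in the heap.
    """
    in_heap = []   # items currently in the heap, in insertion order
    size = {}      # largest working set seen so far for each item
    for kind, item in ops:
        if kind == "insert":
            in_heap.append(item)
            size[item] = 1
            # The working set of the item at index `rank` is that item plus
            # everything after it in `in_heap`; it can only grow on insertions.
            for rank, x in enumerate(in_heap):
                size[x] = max(size[x], len(in_heap) - rank)
        else:
            in_heap.remove(item)
    return size

ops = [("insert", "a"), ("insert", "b"), ("insert", "c"), ("delete", "b"),
       ("insert", "d"), ("delete", "a"), ("delete", "c"), ("delete", "d")]
sizes = working_set_sizes(ops)
```

In this run, item `a` sees `b` and `c` arrive while it is in the heap, so its working-set size is 3, whereas `d` is the last item inserted and has working-set size 1.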
The working-set bound is related to a similar bound for binary search trees, one possessed by splay trees [ST85].
[Iac00] proved that pairing heaps [Fre+86], a form of self-adjusting heap, have the working-set bound, provided that the heap ends empty. Splay trees used as heaps in an appropriate way also have the working-set bound, again provided that the heap ends empty [Hae+24]. [Elm06] devised a heap with an even better bound: The time for a delete-min of $x$ is $O(1 + \log w'_x)$, where $w'_x$ is the size of the working set of $x$ when $x$ is deleted. One can obtain the same bound in a more straightforward way using a finger search tree [Hae+24]. [Hae+23] developed a heap with the working-set bound that also supports decrease-key operations in $O(1)$ amortized time.
Related sorting and ordering problems
Our approach is strongly influenced by recent work [Hae+23] on the distance-ordering problem. The input to this problem is a directed graph with non-negative arc weights and a source vertex. The problem is to sort the vertices of the input graph by their distance from the source. This problem can be solved in $O(m + n \log n)$ time and comparisons by running Dijkstra’s algorithm using a Fibonacci heap [Fre+86a]. The bounds are best possible in the worst case. The authors of [Hae+23] improve this worst-case result by giving a so-called universally optimal algorithm: They show that if Dijkstra’s algorithm is implemented using a heap with the working-set bound, then for any fixed unweighted graph $G$ with a given source vertex, the algorithm takes the minimum time needed (to within a constant factor) to solve the problem on $G$ with a worst-case choice of arc weights. Moreover, a variant of Dijkstra’s algorithm minimizes not only the time but also the number of comparisons to within a constant factor.
The DAG sorting problem also asks for a universally optimal algorithm in that on any given DAG it should minimize the running time and number of comparisons. Both problems are generalizations of sorting. Not only are the two problems similar, but we show that they can be solved by similar techniques.
Another algorithm related to ours is the adaptive heapsort algorithm of [LP93]. This algorithm sorts a sequence of $n$ numbers by first building a heap-ordered tree, the Cartesian tree of the sequence, defined recursively as follows: The root is the minimum number in the sequence, say $x$, its left subtree is the Cartesian tree of the subsequence preceding $x$, and its right subtree is the Cartesian tree of the subsequence following $x$. The Cartesian tree can be built in at most $2n - 3$ comparisons. The algorithm finishes the sort using a heap to store the possible minima, initially only the root. After a delete-min, say of $x$, the children of $x$ in the Cartesian tree are inserted in the heap. [LP93] use a standard heap in their algorithm, but if one uses a working-set heap instead, our results imply that adaptive heapsort does $O(n + \log T)$ comparisons, where $T$ is the number of topological orders of the Cartesian tree viewed as a DAG.
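The following is a minimal sketch of adaptive heapsort as described above, under two stated assumptions: the Cartesian tree is built with the standard stack-based method (which may differ in detail from the construction in [LP93]), and Python's `heapq` binary heap stands in for the heap, so the sketch illustrates the control flow only and does not have the working-set bound. The names `cartesian_tree` and `adaptive_heapsort` are chosen here for illustration.

```python
import heapq

def cartesian_tree(seq):
    """Return (root, left, right): the root index and per-index child indices (None = no child)."""
    n = len(seq)
    left = [None] * n
    right = [None] * n
    stack = []                     # indices on the rightmost path, root at the bottom
    for i in range(n):
        last = None
        while stack and seq[stack[-1]] > seq[i]:
            last = stack.pop()
        left[i] = last             # the popped chain becomes the left subtree of i
        if stack:
            right[stack[-1]] = i   # i becomes the new right child of the stack top
        stack.append(i)
    root = stack[0] if stack else None
    return root, left, right

def adaptive_heapsort(seq):
    """Sort seq by repeatedly deleting the minimum among the current candidate tree nodes."""
    root, left, right = cartesian_tree(seq)
    heap = [] if root is None else [(seq[root], root)]
    out = []
    while heap:
        value, i = heapq.heappop(heap)
        out.append(value)
        for child in (left[i], right[i]):
            if child is not None:
                heapq.heappush(heap, (seq[child], child))
    return out
```

Only the children of already-deleted nodes are ever in the heap, which mirrors the way topological heapsort keeps only the current sources as candidates.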
An ordering problem that is dual to DAG sorting is that of producing a given partial order on a totally ordered set, by doing enough comparisons so that the partial order induced by the comparison outcomes is isomorphic to the input partial order. The two problems are dual in the sense that if an $n$-element partial order has $T$ topological orders, the partial-order production problem on the same partial order requires $\log(n!/T)$ comparisons [Sch76, Yao89], both worst-case and expected-case. Cardinal et al. [Car+09] gave an algorithm for partial-order production that does a number of comparisons within a lower-order term of the lower bound plus $O(n)$. One step in their algorithm finds a greedy coloring of the comparability graph of the partially ordered set. We use a similar concept, that of a greedy clique partition of an interval graph (see Section 3.2), but only in our analysis, not in our algorithm, and our partition is of a different graph, one determined by the behavior of our algorithm.
Sampling and counting topological orders
Even though we give an algorithm that does $O(\log T)$ comparisons, calculating $T$ exactly is hard, since the problem of determining $T$ is #P-complete [BW91]. There are algorithms that compute $T$ approximately [DFK91, Ban+10], however. [BW91] have shown that a constant-factor approximation to $T$ can be obtained with high probability using polynomially many calls to an oracle that samples a uniformly random topological order. There are multiple results on sampling a random topological order [Mat91, KK91, BD99, Hub06], with the state-of-the-art approach running in polynomial time and leading to a polynomial-time approximation algorithm for $T$. In Appendix A, we show how to use one oracle call to get a constant-factor approximation to $\log T$, and hence a polynomial approximation to $T$.
3 Topological Heapsort and Its Efficiency
This section introduces our basic algorithm, which we call topological heapsort. The algorithm combines two classic algorithms, topological sort [Knu97] and heapsort [Wil64, Flo64], in a simple way. We describe the algorithm in Section 3.1. In Section 3.2, we prove that our algorithm runs in $O(m + n + \log T)$ time and does $O(n + \log T)$ comparisons if it is implemented with a working-set heap.
3.1 Topological Heapsort
We start by recalling a basic result in graph theory: A directed graph is a DAG if and only if it has a topological order. One can prove this by the following classic topological sorting algorithm [Kah62, Knu97]: Call a vertex a source if it has no entering arcs. Given a directed graph, repeatedly delete a source and its outgoing arcs, until there are either no vertices or no sources. In the former case, the vertex deletion order is a topological order; in the latter case, one can find a cycle by starting at any remaining vertex and building a path by repeatedly traversing some arc in the reverse direction and continuing until a vertex is repeated.
To make this algorithm efficient, one needs to keep track of the current set of sources. Kahn [Kah62] does this by maintaining for each vertex its current in-degree (number of incoming arcs). The sources are the vertices with in-degree zero. When a vertex is deleted, the in-degrees of its immediate successors (those reached by an outgoing arc) are decremented. In each iteration, Kahn finds a source by examining all the remaining vertices, which results in $O(n^2)$ worst-case running time. Knuth [Knu97] adds the idea of maintaining the current set of sources in a separate data structure, for which he uses a queue. This reduces the running time to $O(m + n)$.
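The queue-based variant can be sketched as follows. This is an illustrative implementation under assumptions made here (vertices numbered `0..n-1`, arcs given as pairs, the name `topological_order`); it returns `None` when the graph has a cycle instead of constructing the cycle.

```python
from collections import deque

def topological_order(n, arcs):
    """Return a topological order of vertices 0..n-1, or None if the graph has a cycle."""
    successors = [[] for _ in range(n)]
    in_degree = [0] * n
    for u, v in arcs:
        successors[u].append(v)
        in_degree[v] += 1
    # The queue holds the current sources (vertices of in-degree zero).
    sources = deque(v for v in range(n) if in_degree[v] == 0)
    order = []
    while sources:
        u = sources.popleft()
        order.append(u)
        for v in successors[u]:        # "delete" u by decrementing its successors
            in_degree[v] -= 1
            if in_degree[v] == 0:
                sources.append(v)
    # If some vertices were never output, no source remained: the graph has a cycle.
    return order if len(order) == n else None
```

Each vertex enters the queue once and each arc is examined once, which gives the linear running time.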
Given a DAG, our task is to find a specific topological order, the one corresponding to the unknown total order of the vertices. Our basic algorithm is the topological sorting algorithm with the current set of sources stored in a heap $H$. The key of a vertex is the vertex itself. Each step deletes the minimum vertex, say $v$, from the heap, adds $v$ to the total order, decrements the in-degree of each vertex $w$ such that $vw$ is an arc, inserts into the heap each such $w$ whose in-degree is now zero, and finally deletes $v$ and its outgoing arcs from the DAG. The complete description of the algorithm is given in Algorithm 1.
This algorithm is not only a version of topological sort, it is also a form of heapsort [Wil64, Flo64], which in turn is a form of selection sort: The algorithm adds the vertices to the total order in increasing order, using a heap to do so. It differs from standard heapsort in that the heap contains only the current sources, which are the only candidates for the next minimum, rather than all the undeleted vertices. We call the algorithm topological heapsort.
Topological heapsort is correct because if $v$ is an undeleted vertex that is not a source, it cannot be the smallest undeleted vertex. The running time of the algorithm is $O(m + n)$ plus the time required for $n$ insertions into and $n$ intermixed delete-mins from $H$. All the vertex comparisons are in the heap operations. The input can be a list of the arcs in $G$; if it is, as part of the initialization we build for each vertex a list of its outgoing arcs.
We implement topological heapsort using a working-set heap.
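The prose description of topological heapsort can be sketched in a few lines. This is a hedged illustration, not the paper's Algorithm 1: Python's `heapq` (a binary heap) stands in for the working-set heap, so the comparison bound of the analysis does not apply to it, and the hidden total order is modeled by a hypothetical array `rank`, where `rank[v]` is the position of vertex `v` in the unknown order and every heap comparison consults it.

```python
import heapq

def topological_heapsort(n, arcs, rank):
    """Return the vertices 0..n-1 sorted by the hidden order `rank`.

    `arcs` are the pre-existing comparison outcomes: (u, v) means u < v.
    Only current sources are ever placed in the heap.
    """
    successors = [[] for _ in range(n)]
    in_degree = [0] * n
    for u, v in arcs:
        successors[u].append(v)
        in_degree[v] += 1
    heap = [(rank[v], v) for v in range(n) if in_degree[v] == 0]
    heapq.heapify(heap)
    order = []
    while heap:
        _, u = heapq.heappop(heap)     # the minimum source is the minimum undeleted vertex
        order.append(u)
        for v in successors[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:      # v has just become a source
                heapq.heappush(heap, (rank[v], v))
    return order

# Known outcomes: 0 < 2, 1 < 2, 2 < 3.  Hidden order: 1 < 0 < 4 < 2 < 3.
arcs = [(0, 2), (1, 2), (2, 3)]
rank = [1, 0, 3, 4, 2]
order = topological_heapsort(5, arcs, rank)
```

In this run the heap never holds more than three vertices, because vertices 2 and 3 enter only after their predecessors have been deleted.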
3.2 Efficiency of Topological Heapsort
To prove that topological heapsort is efficient, we must estimate the number of comparisons needed to sort the vertices of a given DAG $G$. This is at least $\log T$, where $T$ is the number of topological orders of $G$, by a standard information-theory lower bound argument.
For completeness we present a version of this argument. Given a sorting algorithm, consider the adversary that begins with the set of all topological orders consistent with the comparison outcomes so far, and that responds to each comparison with the outcome that eliminates at most half of the previously consistent orders. Then the algorithm must do at least $\log T$ comparisons to verify that it has the correct order. If it does fewer, then even if it guesses the order, it will be correct at most half the time if the adversary responds with a uniformly random consistent total order.
We shall show that topological heapsort matches this bound to within a constant factor plus an additive term in $n$. Specifically, we shall prove the following theorem:
Theorem 3.1.
Topological heapsort implemented with a working-set heap runs in $O(m + n + \log T)$ time and does $O(n + \log T)$ comparisons.
Our approach is very similar to that used in [Hae+23]. We prove the theorem by developing a lower bound on $\log T$ that can be related to the time required by the delete-min operations in topological heapsort via the working-set bound.
Given a run of topological heapsort, let $t_v$ and $t'_v$ be the times at which vertex $v$ is inserted into and deleted from the heap $H$, respectively. We associate the time interval $[t_v, t'_v]$ with vertex $v$. We define the interval graph $I$ of the run to be the undirected graph whose vertices are the vertices of the DAG $G$, with an edge connecting two vertices if their intervals overlap. A clique of $I$ is a set of pairwise adjacent vertices. For any clique $C$ of $I$, there is at least one time that is common to all of the corresponding intervals, namely $\max_{v \in C} t_v$, the insertion time of the last-inserted vertex in the clique. We call this time the critical time of $C$.
We partition the vertices of $I$ into cliques in a greedy fashion, by selecting any clique of maximum size, deleting its vertices and incident edges, and repeating until there are no vertices left. Since the vertices of $I$ and $G$ are the same, this also partitions the vertices of $G$. Although the first clique deleted is maximal in $I$ (no additional vertices can be added), this is not true of later cliques, since it may be possible to add to them vertices that were deleted earlier. Let $C_1, C_2, \ldots, C_k$ be the sequence of cliques, in increasing order by critical time. This order is in general different from their order of selection. We denote by $c_i$ the number of vertices in $C_i$.
If $vw$ is an arc of $G$, then $v$ must be deleted from $H$ before $w$ is inserted. Thus if $v \in C_i$ and $w \in C_j$, then $i < j$, since the critical time of $C_i$ is at most $t'_v$ and the critical time of $C_j$ is at least $t_w$. In particular, no arcs of $G$ have both ends in the same $C_i$. This allows us to prove the following lemma:
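The greedy clique partition used in the analysis can be computed directly from the insertion and deletion times of a run. The sketch below is an illustration under assumptions made here: the name `greedy_clique_partition`, the dictionary encoding of intervals, and distinct event times are all hypothetical choices, and the quadratic search for a maximum clique is for clarity only. It relies on the fact that in an interval graph some maximum clique consists of all intervals containing one of the left endpoints.

```python
def greedy_clique_partition(intervals):
    """intervals: {vertex: (insert_time, delete_time)} with insert_time < delete_time.

    Repeatedly removes a maximum clique of the interval graph and returns the
    cliques sorted by critical time (the latest insertion time in the clique).
    """
    remaining = dict(intervals)
    cliques = []
    while remaining:
        best = []
        for start, _ in remaining.values():
            # All remaining intervals containing `start` pairwise overlap, so they form a clique.
            clique = [v for v, (s, e) in remaining.items() if s <= start <= e]
            if len(clique) > len(best):
                best = clique
        cliques.append(best)
        for v in best:
            del remaining[v]
    cliques.sort(key=lambda c: max(intervals[v][0] for v in c))
    return cliques

# Intervals of a small run: vertices 0, 1, 4 start as sources; 2 and 3 enter later.
intervals = {0: (0, 4), 1: (1, 3), 4: (2, 6), 2: (5, 7), 3: (8, 9)}
cliques = greedy_clique_partition(intervals)
```

Here the three initial sources form one clique of size 3, and the two later vertices form singleton cliques, so any order of the first clique followed by the other two vertices respects every arc of the underlying DAG.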
Lemma 3.2.
$\log T \ge \sum_{i=1}^{k} c_i \log c_i - O(n)$.
Proof.
We can obtain a topological order of $G$ by arranging the vertices of $C_1$ in any order, followed by the vertices of $C_2$ in any order, and so on. Thus $T \ge \prod_{i=1}^{k} c_i!$. The lemma follows by taking logarithms and applying Stirling’s approximation. ∎
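The final step of this proof can be spelled out as follows. The symbols are assumptions where the extracted text lost them: $c_i$ denotes the clique sizes, $k$ the number of cliques, and $n = \sum_i c_i$ the number of vertices. The derivation uses only the elementary bound $c! \ge (c/e)^c$, a weak form of Stirling's approximation.

```latex
\begin{align*}
\log T \;\ge\; \log \prod_{i=1}^{k} c_i!
  &= \sum_{i=1}^{k} \log (c_i!) \\
  &\ge \sum_{i=1}^{k} \bigl( c_i \log c_i - c_i \log e \bigr)
     && \text{since } c! \ge (c/e)^c \\
  &= \sum_{i=1}^{k} c_i \log c_i \;-\; n \log e
     && \text{since } \sum_{i=1}^{k} c_i = n .
\end{align*}
```

The subtracted term $n \log e$ is linear in $n$, which is the source of the additive term in the comparison bound of the basic algorithm.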
Lemma 3.2 gives us a lower bound on $\log T$. We need to relate this lower bound to the time taken by the delete-min operations. We need one more definition. If $C$ is any clique of $I$ (not just a $C_i$), the primary vertex of $C$ is the vertex in $C$ inserted into $H$ the earliest. Using this definition, we can reformulate the definition of working-set size given in Section 2 as follows: The working-set size of a vertex $v$ is the maximum number of vertices in a clique of $I$ whose primary vertex is $v$. This definition coincides with the original definition for the original $I$, but the working-set size can decrease as vertices are deleted from $I$.
The next lemma is the heart of our analysis. It is a variant of Lemmas 3.14 and 3.15 in [Hae+23].
Lemma 3.3.
$\sum_{v \in I} \log w_v \le 2 \sum_{i=1}^{k} c_i \log c_i$, where $w_v$ is the working-set size of $v$ in $I$.
Proof.
We prove the lemma by induction on $k$. Suppose the lemma is true for $k - 1$. Let $C$ be the first clique selected when building the clique partition, let $I'$ be $I$ after the vertices in $C$ and their incident edges are deleted, and let $C'_i$ for $1 \le i \le k - 1$ be a renumbering of the cliques $C_i$, not including $C$. (The order does not matter.) By the induction hypothesis, $\sum_{v \in I'} \log w'_v \le 2 \sum_{i=1}^{k-1} |C'_i| \log |C'_i|$, where $w'_v$ is the working-set size of $v$ in $I'$.
To prove the bound for $I$, we must account for the terms for the clique $C$ and its vertices, and also for any increases in the working-set sizes of vertices in $I'$ caused by adding the vertices in $C$. The first is easy, the second is the technical part of the proof.
Since $C$ is a largest clique in $I$, $w_v \le |C|$ for every vertex $v$ in $C$. Hence $\sum_{v \in C} \log w_v \le |C| \log |C|$.
Consider the increase in working-set sizes that results from adding one vertex $y$ in $C$ to $I'$. This can increase the working-set size of $x$ only if $t_x < t_y < t'_x$, and then by at most one. Let $x_1, x_2, \ldots, x_j$ be the vertices such that $t_{x_i} < t_y < t'_{x_i}$, ordered in decreasing order by $t_{x_i}$. Then the increase in $\log w_{x_i}$ caused by adding $y$ is at most $\log(i + 1) - \log i$ by the concavity of the $\log$ function. Summing over $i$, the sum telescopes, and we find that the total increase is at most $\log(j + 1)$. The set containing $y$ and all the vertices $x_i$ is a clique in $I$. By the greedy choice of $C$, $j + 1 \le |C|$. This argument applies sequentially to each addition of a vertex in $C$ to $I'$. Thus the total increase in $\sum_{v \in I'} \log w_v$ caused by adding the vertices of $C$ is at most $|C| \log |C|$.
Adding our two bounds and the bound given by the induction hypothesis gives us the desired bound for $I$. The lemma follows by induction. ∎
Theorem 3.1 follows immediately from Lemmas 3.2 and 3.3, since the running time of topological heapsort is $O(m + n)$ plus the time to do the heap operations, the time to do the heap operations is $O(n + \log T)$ by the two lemmas and the definition of the working-set bound, and all the comparisons are in the heap operations.
4 Topological Heapsort with Insertion
The bound on comparisons in Theorem 3.1 includes an additive term linear in $n$. This term is significant if the number of topological orders of the DAG is small, specifically sub-exponential in $n$. In this section, we augment topological heapsort to eliminate this term in the bound.
The first step is to determine when the additive $n$ term is significant. This is only when the input DAG contains a long path.
Lemma 4.1.
Let $\ell$ be the number of vertices on a longest path in a DAG $G$ with $n$ vertices. Then $G$ has at least $2^{(n - \ell)/2}$ topological orders.
Proof.
Consider the partition of the vertices of $G$ into layers $L_1, L_2, \ldots, L_\ell$, where layer $L_i$ contains all vertices $v$ such that the longest path ending in $v$ contains $i$ vertices. Then if $vw$ is an arc with $v \in L_i$ and $w \in L_j$, then $i < j$. In particular, there are no arcs with both ends in the same layer. It follows as in the proof of Lemma 3.2 that the number of topological orders of $G$ is at least $\prod_{i=1}^{\ell} |L_i|!$: The vertices can be ordered layer by layer, and those within a layer can be ordered arbitrarily. A layer that contains $2j$ or $2j + 1$ vertices contributes at least $2^j$ to this product. This means that each layer $L_i$ contributes to the product at least $2^{(|L_i| - 1)/2}$. The number of topological orders is thus at least
$$\prod_{i=1}^{\ell} 2^{(|L_i| - 1)/2} = 2^{(n - \ell)/2}.$$ ∎
By Lemma 4.1, topological heapsort achieves the desired bound of $O(\log T)$ comparisons when sorting the vertices of any DAG whose maximum-length path contains at most $(1 - \epsilon) n$ vertices, where $\epsilon$ is any positive constant, since in this case $n = O(\log T)$. Thus we only need a way to handle any DAG that has a path containing at least a large constant fraction of the vertices. To handle such a DAG, we find a longest path and run topological heapsort without inserting any of the vertices on this path into the heap. Instead, we insert each vertex returned by a delete-min into the long path using an exponential search followed by a binary search on the relevant part of the path to find the insertion position.
Topological heapsort with insertion
The resulting sorting algorithm, which we call topological heapsort with insertion, is as follows:
![Refer to caption](x1.png)
- Item 1: Find a longest path $P$ in the given DAG $G$. Mark the last vertex of $P$. For each vertex $v$ not on $P$, mark the last vertex $u$ on $P$ such that $uv$ is an arc and the first vertex $w$ on $P$ such that $vw$ is an arc. Delete all arcs between $v$ and $P$ other than $uv$ and $vw$. Form DAG $G'$ from $G$ by adding an arc from each marked vertex $x$ on $P$ other than the last one to the next marked vertex $y$, and deleting all unmarked vertices on $P$ and their incident arcs. Save the vertices that were originally on $P$ in an array $A$.
- Item 2: Run topological heapsort on $G'$. Each time a vertex $v$ is deleted from the heap $H$, do the following: Delete from $A$ all vertices less than or equal to $v$, and add these vertices to the sorted list in their order in $A$. If $v$ is not on $P$, add $v$ to the sorted list as well. Then continue executing topological heapsort.
To do Item 1, find any topological order of the vertices of $G$ and process the vertices in topological order as follows: Compute for each vertex $v$ the number of vertices $\ell(v)$ on the longest path ending in $v$, using the recurrence $\ell(v) = 1 + \max(\{0\} \cup \{\ell(u) : uv \in E\})$, where $E$ is the set of arcs of $G$. Find a longest path by starting at a vertex $v$ of largest $\ell(v)$ and proceeding backward, always to a predecessor $u$ of largest $\ell(u)$. Finding a longest path takes $O(m + n)$ time and no vertex comparisons.
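The longest-path computation can be sketched as below. This is an illustrative implementation under assumptions made here: vertices are numbered `0..n-1` with `n >= 1`, the input is acyclic, and the names `longest_path` and `length` are hypothetical. It interleaves the topological sort with the recurrence instead of running them as two passes, and it performs no vertex comparisons in the sense of the hidden total order.

```python
from collections import deque

def longest_path(n, arcs):
    """Return the vertices of a longest path in a DAG on vertices 0..n-1, in path order."""
    successors = [[] for _ in range(n)]
    predecessors = [[] for _ in range(n)]
    in_degree = [0] * n
    for u, v in arcs:
        successors[u].append(v)
        predecessors[v].append(u)
        in_degree[v] += 1
    length = [1] * n     # length[v] = number of vertices on a longest path ending at v
    queue = deque(v for v in range(n) if in_degree[v] == 0)
    while queue:         # vertices leave the queue in a topological order
        u = queue.popleft()
        for v in successors[u]:
            length[v] = max(length[v], length[u] + 1)
            in_degree[v] -= 1
            if in_degree[v] == 0:
                queue.append(v)
    # Walk backward from an endpoint of maximum length, always to a predecessor
    # whose longest path is exactly one vertex shorter.
    v = max(range(n), key=lambda x: length[x])
    path = [v]
    while length[v] > 1:
        v = next(u for u in predecessors[v] if length[u] == length[v] - 1)
        path.append(v)
    path.reverse()
    return path
```

The values in `length` also give the layers used in the counting argument: layer $i$ consists of the vertices with `length[v] == i`.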
In Item 2, find the set of vertices in $A$ less than or equal to $v$ as follows: If $v$ is in $A$, then this set is the prefix of $A$ ending with $v$. Testing this requires no vertex comparisons. If $v$ is not in $A$, use exponential search followed by binary search: Compare $v$ with the first, second, fourth, eighth, … vertex in $A$ until finding a pair of vertices $x$ and $y$ in $A$ such that $x < v < y$ (possibly $x = -\infty$ if $v$ is less than the first vertex in $A$). Then do a binary search on the set of vertices in $A$ between $x$ and $y$ to find the insertion position of $v$. A search that returns the $k$-th vertex in $A$ takes $O(1 + \log k)$ time and comparisons.
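The exponential search followed by binary search can be sketched as follows. The function name `count_smaller` and the comparison oracle `less(a, b)` are assumptions made for illustration; the sketch assumes the array is sorted, that `v` is not in it, and that all keys are distinct. It returns the number $k$ of array entries smaller than `v`, using $O(1 + \log k)$ calls to `less` when $k \ge 1$ and a single call when $k = 0$.

```python
def count_smaller(array, v, less):
    """Return k such that array[:k] are exactly the entries smaller than v."""
    n = len(array)
    if n == 0 or less(v, array[0]):
        return 0
    lo = 1          # invariant: array[lo - 1] < v, so at least lo entries are smaller
    hi = n          # invariant: at most hi entries are smaller than v
    # Exponential phase: probe the 2nd, 4th, 8th, ... entry until one exceeds v.
    while 2 * lo <= n:
        if less(array[2 * lo - 1], v):
            lo *= 2
        else:
            hi = 2 * lo - 1
            break
    # Binary phase: narrow the answer down within [lo, hi].
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if less(array[mid - 1], v):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Because the probes double from the front of the array, the cost depends on where `v` lands and not on the length of the array, which is what allows the cost of each search to be charged to the vertices it removes.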
The running time of topological heapsort with insertion is $O(m + n + \log T)$. We shall show that the number of comparisons it does is $O(\log T)$. We need the following simple monotonicity lemma:
Lemma 4.2.
Let $G$ be a DAG. Suppose that a DAG $G'$ is formed from $G$ by (i) deleting an arc $uv$ such that there is a path from $u$ to $v$ avoiding this arc, (ii) adding an arc to $G$, (iii) deleting a vertex $v$ with one incoming arc $uv$ and one outgoing arc $vw$ and adding an arc $uw$, or (iv) removing a source vertex. Then $G$ has at least as many topological orders as $G'$.
Proof.
(i) If $G'$ is formed by deleting an arc $uv$ such that there is a path from $u$ to $v$ avoiding $uv$, then the set of topological orders of $G'$ is the same as that of $G$. (ii) If $G'$ is formed from $G$ by adding an arc, then any topological order of $G'$ is a topological order of $G$. (iii) If $G'$ is formed from $G$ by deleting a vertex $v$ with incident arcs $uv$ and $vw$ and adding $uw$, then any topological order of $G'$ can be extended to one of $G$ by inserting $v$ anywhere between $u$ and $w$. (iv) Any topological order of $G'$ may be extended to a topological order of $G$ by inserting the removed source before all the vertices of $G'$. ∎
Theorem 4.3.
Topological heapsort with insertion sorts the vertices of a DAG in $O(m + n + \log T)$ time and $O(\log T)$ comparisons.
Proof.
The time that the algorithm spends outside of Algorithm 1 is $O(m + n)$. Hence the running time is $O(m + n + \log T)$ by Theorem 3.1. It remains to prove that the algorithm does $O(\log T)$ comparisons.
Let $\ell$ be the number of vertices on $P$. By Lemma 4.1, $\log T = \Omega(n - \ell)$. The number of vertices in $G'$ is at most $3(n - \ell) + 1$, so the number of comparisons needed by topological heapsort when run on $G'$ is $O(n - \ell + \log T) = O(\log T)$ by Theorem 3.1 and Lemma 4.2.
Consider the search that Item 2 does for a vertex $v$ not on $P$. We use $A_v$ to denote the set of vertices removed from $A$ when processing $v$ in Item 2. The number of comparisons the search takes is $O(1 + \log(|A_v| + 1))$. For any correct order of the vertices of $G$ and any vertex $v$ not on $P$, consider the topological orders obtained by moving $v$ by $0, 1, \ldots, |A_v|$ positions to the left in the current topological order. Note that these are indeed valid topological orders, since the vertices of $G$ immediately preceding $v$ have to be part of $A_v$ by the definition of $A_v$. We may shift in this fashion independently for every vertex $v$ not on $P$. Therefore, $T \ge \prod_{v \notin P} (|A_v| + 1)$. We can rewrite this inequality as $\sum_{v \notin P} \log(|A_v| + 1) \le \log T$; and, finally, conclude that $\sum_{v \notin P} O(1 + \log(|A_v| + 1)) = O(n - \ell + \log T) = O(\log T)$. Thus our algorithm does $O(\log T)$ comparisons during Item 2.
Combining the bounds for running topological heapsort on $G'$ and inserting vertices into $P$ gives the theorem. ∎
Remark.
If Item 1 finds that the longest path has length at most $(1 - \epsilon) n$ for some fixed positive $\epsilon$, the algorithm can skip the construction of $G'$ and just run topological heapsort on the original graph $G$. Also, if the problem must be solved repeatedly for a fixed DAG with different total orders, Item 1 needs to be run only once.
References
- [Ban+10] Jacqueline Banks, Scott Garrabrant, Mark L Huber and Anne Perizzolo “Using TPA to count linear extensions” In arXiv preprint arXiv:1010.4981, 2010
- [BHM13] Prosenjit Bose, John Howat and Pat Morin “A history of distribution-sensitive data structures” In Space-Efficient Data Structures, Streams, and Algorithms: Papers in Honor of J. Ian Munro on the Occasion of His 66th Birthday Springer, 2013, pp. 133–149
- [Bri99] Graham Brightwell “Balanced pairs in partial orders” In Discrete Mathematics 201.1-3 Elsevier, 1999, pp. 25–52
- [BW91] Graham Brightwell and Peter Winkler “Counting linear extensions is #P-complete” In Proceedings of the twenty-third annual ACM symposium on Theory of computing, 1991, pp. 175–181
- [BFT95] Graham R Brightwell, Stefan Felsner and William T Trotter “Balancing pairs and the cross product conjecture” In Order 12 Springer, 1995, pp. 327–349
- [BD99] Russ Bubley and Martin Dyer “Faster random generation of linear extensions” In Discrete mathematics 201.1-3 Elsevier, 1999, pp. 81–88
- [CF13] Jean Cardinal and Samuel Fiorini “On generalized comparison-based sorting problems” In Space-Efficient Data Structures, Streams, and Algorithms: Papers in Honor of J. Ian Munro on the Occasion of His 66th Birthday Springer, 2013, pp. 164–175
- [Car+10] Jean Cardinal et al. “Sorting under partial information (without the ellipsoid algorithm)” In Proceedings of the forty-second ACM symposium on Theory of computing, 2010, pp. 359–368
- [Car+09] Jean Cardinal et al. “An efficient algorithm for partial order production” In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009 ACM, 2009, pp. 93–100
- [DM18] Eyal Dushkin and Tova Milo “Top-k sorting under partial order information” In Proceedings of the 2018 International Conference on Management of Data, 2018, pp. 1007–1019
- [DFK91] Martin Dyer, Alan Frieze and Ravi Kannan “A random polynomial-time algorithm for approximating the volume of convex bodies” In Journal of the ACM (JACM) 38.1 ACM New York, NY, USA, 1991, pp. 1–17
- [Elm06] Amr Elmasry “A priority queue with the working-set property” In International Journal of Foundations of Computer Science 17.06 World Scientific, 2006, pp. 1455–1465
- [EFI13] Amr Elmasry, Arash Farzan and John Iacono “On the hierarchy of distribution-sensitive properties for data structures” In Acta informatica 50.4 Springer, 2013, pp. 289–295
- [Flo64] Robert W Floyd “Algorithm 245: treesort” In Communications of the ACM 7.12 ACM New York, NY, USA, 1964, pp. 701
- [Fre76] Michael L Fredman “How good is the information theory bound in sorting?” In Theoretical Computer Science 1.4 Elsevier, 1976, pp. 355–361
- [Fre+86] Michael L Fredman, Robert Sedgewick, Daniel D Sleator and Robert E Tarjan “The pairing heap: A new form of self-adjusting heap” In Algorithmica 1.1-4 Springer, 1986, pp. 111–129
- [Fre+86a] Michael L. Fredman, Robert Sedgewick, Daniel Dominic Sleator and Robert Endre Tarjan “The Pairing Heap: A New Form of Self-Adjusting Heap” In Algorithmica 1.1, 1986, pp. 111–129 DOI: 10.1007/BF01840439
- [Hae+24] Bernhard Haeupler et al. “Heaps with the Working-Set Bound” Preprint, 2024
- [Hae+23] Bernhard Haeupler et al. “Universal Optimality of Dijkstra via Beyond-Worst-Case Heaps”, 2023 arXiv:2311.11793 [cs.DS]
- [Hub06] Mark Huber “Fast perfect sampling from linear extensions” In Discrete Mathematics 306.4 Elsevier, 2006, pp. 420–428
- [Iac00] John Iacono “Improved upper bounds for pairing heaps” In Scandinavian Workshop on Algorithm Theory, 2000, pp. 32–45 Springer
- [Kah62] Arthur B Kahn “Topological sorting of large networks” In Communications of the ACM 5.11 ACM New York, NY, USA, 1962, pp. 558–562
- [KK92] Jeff Kahn and Jeong Han Kim “Entropy and sorting” In Proceedings of the twenty-fourth annual ACM symposium on Theory of computing, 1992, pp. 178–187
- [KL91] Jeff Kahn and Nathan Linial “Balancing extensions via Brunn-Minkowski” In Combinatorica 11.4, 1991, pp. 363–368
- [KS84] Jeff Kahn and Michael Saks “Balancing poset extensions” In Order 1 Springer, 1984, pp. 113–126
- [KK91] Alexander Karzanov and Leonid Khachiyan “On the conductance of order Markov chains” In Order 8 Springer, 1991, pp. 7–15
- [Knu97] Donald E Knuth “The Art of Computer Programming: Fundamental Algorithms, volume 1” Addison-Wesley Professional, 1997
- [KS18] László Kozma and Thatchaphol Saranurak “Smooth heaps and a dual view of self-adjusting data structures” In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018, pp. 801–814
- [LP93] Christos Levcopoulos and Ola Petersson “Adaptive heapsort” In Journal of Algorithms 14.3 Elsevier, 1993, pp. 395–413
- [Lin84] Nathan Linial “The information-theoretic bound is good for merging” In SIAM Journal on Computing 13.4 SIAM, 1984, pp. 795–801
- [Mat91] Peter Matthews “Generating a random linear extension of a partial order” In The Annals of Probability 19.3 Institute of Mathematical Statistics, 1991, pp. 1367–1392
- [Mun+19] J Ian Munro, Richard Peng, Sebastian Wild and Lingyi Zhang “Dynamic Optimality Refuted–For Tournament Heaps” In arXiv preprint arXiv:1908.00563, 2019
- [Sch76] A. Schönhage “The Production of Partial Orders” In Astérisque 38-39 Soc. Math. France, Paris, 1976, pp. 229–246
- [Sha48] Claude Elwood Shannon “A mathematical theory of communication” In The Bell system technical journal 27.3 Nokia Bell Labs, 1948, pp. 379–423
- [ST85] Daniel Dominic Sleator and Robert Endre Tarjan “Self-adjusting binary search trees” In Journal of the ACM (JACM) 32.3 ACM New York, NY, USA, 1985, pp. 652–686
- [Wil64] J Williams “Heapsort” In Commun. ACM 7.6, 1964, pp. 347–348
- [Yao89] Andrew Chi-Chih Yao “On the Complexity of Partial Order Productions” In SIAM J. Comput. 18.4, 1989, pp. 679–689
Appendix A Sampling and Counting Topological Orders
Given a DAG $G$ and its corresponding number of topological orders $T$, our algorithm yields a simple way of estimating the value of $\log T$ to within a constant factor. The idea is that the DAG sorting problem with an unknown total order selected uniformly at random takes $\Theta(\log T)$ comparisons with high probability. Thus if the algorithm is run on one sample selected uniformly at random, we obtain a good approximation of $\log T$.
Theorem A.1.
Let be a directed acyclic graph with topological orders. Assume that there is an algorithm that returns a topological order -pointwise close to uniform in time . (We say that a distribution is -pointwise close to if for every element we have .) Then there is an algorithm that runs in time , performs comparisons, and returns a constant-factor approximation of the value with error probability .
Proof.
The algorithm merely samples a topological order, runs topological heapsort with insertion on the sample, and returns the number of comparisons made by the algorithm. The running time of the algorithm is . Since in order to sample the order, we have to read the whole input DAG, this is equal to the desired . The number of comparisons it does is (note that we do not use any comparisons when generating the sample).
The number of comparisons done by topological heapsort with insertion is $O(\log T)$ by Theorem 4.3. Thus the value returned by our algorithm is at most a constant factor larger than $\log T$.
It remains to give a similar lower bound. We prove that with high probability . Suppose is a sample -pointwise close to uniform. Consider the event that . For any topological order , we have
The conditional entropy of is then
The comparisons performed by topological heapsort with insertion uniquely determine the topological order . Thus by Shannon’s source coding theorem for symbol codes [Sha48], we have , and we can write
Solving for , we get
which concludes the proof. ∎