License: CC BY 4.0
arXiv:2403.01568v1 [cs.DS] 03 Mar 2024
1 Computer Science Department, Technion, Haifa, Israel. Email: {idoron-arad,naor,hadas}@cs.technion.ac.il.
2 Computer Science Department, Rutgers University-Camden, Camden, NJ, USA. Email: [email protected].
3 Computer Science Department, New Jersey Institute of Technology, Newark, NJ, USA. Email: [email protected].
Approximations and Hardness of Covering and Packing Partially Ordered Items
Motivated by applications in production planning and storage allocation in hierarchical databases, we initiate the study of covering partially ordered items (cpo). Given a value $k \in \mathbb{N}$ and a directed graph $G=(V,E)$ where each vertex has a size in $\{0,1,\ldots,k\}$, we seek a collection of subsets of vertices $C_1,\ldots,C_m$ that cover all the vertices, such that for any $1 \le j \le m$, the total size of vertices in $C_j$ is bounded by $k$, and there are no edges from $V \setminus C_j$ to $C_j$.
The objective is to minimize the number of subsets $m$. cpo is closely related to the rule caching problem (rcp) that has been widely studied in the networking area. The input for rcp is a directed graph $G=(V,E)$, a profit function $p$ on the vertices, and a value $k$.
The output is a subset $S \subseteq V$ of maximum profit that satisfies the capacity constraint given by $k$, such that there are no edges from $V \setminus S$ to $S$.
Our main result is a $2$-approximation algorithm for cpo on out-trees, complemented by an asymptotic $3/2$-hardness of approximation result. We also give a two-way reduction between rcp and the densest $k$-subhypergraph problem, surprisingly showing that the problems are equivalent w.r.t. polynomial-time approximation within any factor $\rho \ge 1$. This implies that rcp cannot be approximated within factor $|V|^{1-\varepsilon}$ for any fixed $\varepsilon > 0$, under standard complexity assumptions. Prior to this work, rcp was only known to be strongly NP-hard.
We further show that there is no EPTAS for the special case of rcp where the profits are uniform, assuming Gap-ETH. Since this variant admits a PTAS, we essentially resolve the complexity status of this problem.
1 Introduction
Partially ordered entities are ubiquitous in the mathematical modeling of scheduling problems, distributed storage allocation, production planning, and unified language models. Often, the partial order represents either
precedence constraints or dependencies among entities (or items).
Motivated by applications in production planning [39, 2] and distributed storage allocation in hierarchical databases [45],
we introduce the covering partially ordered items (cpo) problem.
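An instance of cpo consists of a directed graph $G=(V,E)$, a value $k \in \mathbb{N}$, and a size function $s: V \to \{0\} \cup [k]$.[Footnote 1: For $n \in \mathbb{N}$, we denote by $[n]$ the set of integers $\{1,\ldots,n\}$.] A configuration is a subset of vertices $C \subseteq V$ such that $s(C) \le k$,[Footnote 2: For a set $A$, a function $f: A \to \mathbb{R}$, and $B \subseteq A$, define $f(B) = \sum_{b \in B} f(b)$.]
and $C$ is closed under precedence constraints; that is, for any $(u,v) \in E$ with $v \in C$ it holds that $u \in C$. A feasible solution is a set of configurations $\{C_1,\ldots,C_m\}$ that covers $V$, namely $\bigcup_{j \in [m]} C_j = V$.
The cardinality of the solution is $m$, the number of configurations. The goal is to find a feasible solution of minimum cardinality.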
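To make the definition concrete, here is a minimal Python sketch of a feasibility check for a cpo solution. It represents $G$ as a vertex list plus a list of directed edges $(u,v)$; this representation and the function names `is_configuration`, `is_feasible_cover` and `cardinality` are our own illustrative choices, not code from the paper.

```python
from collections.abc import Hashable, Iterable
from typing import Dict, List, Set, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]


def is_configuration(config: Set[Vertex], edges: List[Edge],
                     size: Dict[Vertex, int], k: int) -> bool:
    """Total size at most k, and closed under precedence:
    for every edge (u, v), v in config implies u in config."""
    if sum(size[v] for v in config) > k:
        return False
    return all(u in config for (u, v) in edges if v in config)


def is_feasible_cover(configs: List[Set[Vertex]], vertices: Iterable[Vertex],
                      edges: List[Edge], size: Dict[Vertex, int], k: int) -> bool:
    """Every set is a configuration and together the sets cover all of V."""
    covered: Set[Vertex] = set().union(*configs)
    return (all(is_configuration(c, edges, size, k) for c in configs)
            and covered >= set(vertices))


def cardinality(configs: List[Set[Vertex]]) -> int:
    """The cpo objective: the number of configurations used."""
    return len(configs)
```

For example, on the out-tree with edges $(r,a)$ and $(r,b)$, sizes $s(r)=1$, $s(a)=s(b)=2$ and $k=3$, the family $\{\{r,a\},\{r,b\}\}$ passes the check with cardinality 2, while $\{a\}$ alone is rejected because it omits the predecessor $r$.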
cpo can be applied to optimize the distributed storage of large hierarchical data in unified medical language systems (UMLS) [45]. UMLS data is
often distributed over several databases of bounded size. Due to the hierarchical nature of the medical taxonomy, each database needs to be closed under this hierarchy relation. The problem of minimizing the number of distributed databases of the UMLS data translates to a cpo problem instance.
Another application of cpo arises in production planning for steel mills that employ continuous casting [39, 2]. The steel-making process has high energy consumption. One way to save energy is by employing continuous casting and direct charging. In this routine, the molten steel is solidified into slabs and rolled into finished products of various sizes continuously, with no need to reheat the steel in the process. Each finished product requires specific casting, rolling, and thermal treatments in a given order, which can be modeled by a directed acyclic graph (DAG). A main challenge is to assign the finished products to batches whose size is dictated by the size of the ladle furnace while minimizing the amount of repeated operations. This gives rise to an instance of cpo.
A natural greedy approach for solving cpo is to repeatedly find, among all subsets of vertices that can be feasibly assigned to a single configuration, a subset that maximizes the size of yet unassigned vertices.
This single configuration problem is a variant of
the well-known rule caching problem (rcp) that has been studied extensively [10, 43, 37, 19, 24, 38, 23, 15, 6, 34, 25, 14, 42, 33, 32, 44].
An instance of rcp consists of a directed graph $G=(V,E)$, a profit function $p$ on the vertices, and a value $k$. We seek a subset of vertices $S \subseteq V$ which is closed under precedence constraints and satisfies the capacity constraint given by $k$, such that $p(S)$ is maximized. In Appendix 0.A we describe central applications of rcp in networking and blockchain technology.
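The definition can be illustrated with an exhaustive solver for tiny instances. The sketch below models the capacity constraint with a hypothetical per-vertex weight `weight` and budget $k$ (an assumption on our part; unit weights give the cardinality bound $|S| \le k$). It enumerates all $2^{|V|}$ subsets, so it only clarifies the definition and is not a practical algorithm.

```python
from collections.abc import Hashable
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Vertex = Hashable


def rcp_brute_force(vertices: List[Vertex], edges: List[Tuple[Vertex, Vertex]],
                    profit: Dict[Vertex, int], weight: Dict[Vertex, int],
                    k: int) -> Tuple[int, FrozenSet[Vertex]]:
    """Best (profit, set) over all precedence-closed S with weight(S) <= k.

    ASSUMPTION: the capacity constraint is modelled as weight(S) <= k;
    with all weights equal to 1 this is the cardinality bound |S| <= k.
    """
    best_profit, best_set = 0, frozenset()
    for r in range(len(vertices) + 1):
        for subset in combinations(vertices, r):
            s = frozenset(subset)
            if sum(weight[v] for v in s) > k:
                continue
            # Closed under precedence: no edge (u, v) from V \ S into S.
            if any(v in s and u not in s for (u, v) in edges):
                continue
            p = sum(profit[v] for v in s)
            if p > best_profit:
                best_profit, best_set = p, s
    return best_profit, best_set
```

With an edge $(r,x)$, profits $p(r)=0$, $p(x)=10$, $p(y)=3$ and unit weights, a budget of 1 can only take $\{y\}$ (since $\{x\}$ is not closed), while a budget of 2 takes $\{r,x\}$.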
Prior to this work, rcp was only known to be strongly NP-hard [4, 31]. Our initial attempt towards solving cpo was to find a good approximation for rcp.
Surprisingly, we were able to show an equivalence between rcp and the
densest $k$-subhypergraph (dksh) problem w.r.t. approximability. The input for dksh consists of a hypergraph $H=(V,E)$ and a value $k$. The goal is to find a subset of $k$ vertices that maximizes the number (or weight) of induced hyperedges (a more formal definition is given in Appendix LABEL:sec:2).[Footnote 3: dksh has also been widely studied (see, e.g., [7, 17, 8] and the references therein).]
Unfortunately, dksh is known to be hard to approximate within a factor of $|V|^{1-\varepsilon}$, for any $\varepsilon > 0$, assuming the Small Set Expansion Conjecture (by combining the results of [26] and [17]).
This implies the same hardness of approximation for rcp (see Section 1.1).
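For intuition about the dksh objective, the following brute-force Python sketch (a hypothetical helper, exponential in $|V|$, for tiny hypergraphs only) picks $k$ vertices maximizing the number of hyperedges fully contained in the chosen set. It illustrates the objective only; it is not the reduction between rcp and dksh.

```python
from collections.abc import Hashable, Iterable
from itertools import combinations
from typing import FrozenSet, List, Tuple


def dksh_brute_force(vertices: List[Hashable],
                     hyperedges: Iterable[Iterable[Hashable]],
                     k: int) -> Tuple[int, FrozenSet[Hashable]]:
    """Return (number of induced hyperedges, chosen k-set) for an optimal k-set."""
    edges = [frozenset(e) for e in hyperedges]
    if not 0 <= k <= len(vertices):
        raise ValueError("k must be between 0 and |V|")
    best_count, best_set = -1, frozenset()
    for subset in combinations(vertices, k):
        chosen = frozenset(subset)
        # A hyperedge is induced when all of its vertices are chosen.
        induced = sum(1 for e in edges if e <= chosen)
        if induced > best_count:
            best_count, best_set = induced, chosen
    return best_count, best_set
```

On the hypergraph with hyperedges $\{1,2,3\}$, $\{2,3\}$, $\{3,4\}$ and $k=3$, the set $\{1,2,3\}$ induces two hyperedges, which is optimal.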
Given this hardness result, we expect cpo to be hard to approximate on general graphs. Thus, we consider the special case of cpo where $G$ is an out-tree. We call this problem covering partially ordered items on out-trees (ct).
To the best of our knowledge, ct is studied here for the first time.
We note that when $G$ is an in-tree, cpo is trivial, since
the problem has a feasible (and unique) solution iff the total size of the vertices is at most $k$.
1.1 Our results
Our first result is an approximation algorithm for ct.
Recall that, for $\rho \ge 1$, an algorithm $\mathcal{A}$ is a $\rho$-approximation algorithm for a minimization (maximization) problem $\Pi$ if, for any instance of $\Pi$, the output of $\mathcal{A}$ is at most $\rho$ (at least $1/\rho$) times the optimum.
Theorem 1.1
There is a polynomial-time $2$-approximation algorithm for ct.
While out-trees have a simple structure, allowing for a greedy-based bottom-up approach in solving ct, the analysis of our approximation algorithm is nontrivial and requires extra care to make sure that the approximation bound has no additive terms (see below).
ct generalizes the classical bin packing (bp) problem. The input for bp is a set of items $I$ and a value $k$. Each item has a size of at most $k$, and the goal is to assign the items to a minimum number of bins of capacity $k$.[Footnote 4: We use the definition of bp as given in [16]. In an alternative definition found in the literature, bin capacities are normalized to one, and item sizes lie in the unit interval.] An instance of bp is reduced to an instance of ct on a star graph by generating a leaf for each item of the bp instance and adding a root vertex of size zero. This trivial reduction implies that ct is strongly NP-hard [16].
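The star reduction can be written out directly. In this sketch the vertex names (`root`, `item0`, ...) and the helpers `bp_to_ct` and `packing_to_cover` are hypothetical. A packing into bins becomes a ct cover by adding the zero-size root to every bin; conversely, dropping the root from each configuration of a cover (and keeping each leaf in one configuration only) yields a packing with no more bins, so both problems have the same optimum.

```python
from typing import Dict, List, Set, Tuple


def bp_to_ct(item_sizes: List[int], k: int) -> Tuple[str, Dict[str, int], List[Tuple[str, str]]]:
    """Star out-tree: a root of size 0 with one leaf per bp item."""
    if any(not 0 <= x <= k for x in item_sizes):
        raise ValueError("every item must fit into a bin of capacity k")
    root = "root"
    size: Dict[str, int] = {root: 0}
    edges: List[Tuple[str, str]] = []
    for i, x in enumerate(item_sizes):
        leaf = f"item{i}"
        size[leaf] = x
        edges.append((root, leaf))  # edge oriented away from the root
    return root, size, edges


def packing_to_cover(root: str, bins: List[List[int]]) -> List[Set[str]]:
    """Bin j, given as a list of item indices, becomes {root} plus its leaves."""
    return [{root} | {f"item{i}" for i in b} for b in bins]
```

For items of sizes 2, 3, 1 and $k=3$, the packing $\{\{2,1\},\{3\}\}$ becomes the two configurations $\{\text{root}, \text{item0}, \text{item2}\}$ and $\{\text{root}, \text{item1}\}$.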
Interestingly, we show that in contrast to bp, ct does not admit an asymptotic polynomial-time approximation scheme (APTAS), or even an asymptotic approximation ratio strictly better than $3/2$. This separates ct from bp, which also admits an additive logarithmic approximation [18].
Theorem 1.2
For any $\varepsilon > 0$, there is no asymptotic $(3/2-\varepsilon)$-approximation for ct unless P=NP.
Next, we study the hardness of rcp.
Theorem 1.3
For any $\rho \ge 1$, there is a $\rho$-approximation for rcp if and only if there is a $\rho$-approximation for dksh.
Corollary 1
Assuming the Small Set Expansion Hypothesis (SSEH) and NP $\not\subseteq$ BPP, for any $\varepsilon > 0$ there is no $|V|^{1-\varepsilon}$-approximation for rcp.
We also give a tight lower bound for the previously studied special case of
uniform rcp (u-rcp) [4, 3].[Footnote 5: In [4], u-rcp is called the uniform directed all-neighbor knapsack problem.]
In u-rcp the vertices have uniform (unit) profits (i.e., ).
While u-rcp is known to admit a PTAS [4, 3], the question of whether the problem admits an EPTAS or even an FPTAS remained open.[Footnote 6: We give formal definitions relating to approximation schemes in Appendix LABEL:sec:uniform.] Our next result gives a negative answer to both of these questions, posed in [4, 3].
Theorem 1.4
Assuming Gap-ETH, there is no EPTAS for u-rcp.
Finally, we show that rcp remains essentially just as hard when the in-degrees and
out-degrees are bounded.
Theorem 1.5
A -approximation algorithm for rcp instances with in-degrees and out-degrees bounded by , for any , implies a -approximation for rcp.
Due to space constraints, we include in the paper body only the proof of Theorem 1.1 and defer the proofs of the other theorems to the Appendix.
Techniques:
Our algorithm for ct covers the vertices in a given out-tree in a bottom-up fashion, starting from the leaves. The key players in this process are vertices called anchors, which define the candidate subtrees for covering in each iteration. Interestingly, we show that the subtree associated with a specific anchor $a$ (including all of $a$'s ancestors) can be covered efficiently by using the naive NextFit algorithm.
To eliminate additive terms in the approximation guarantee (i.e., to obtain an absolute ratio of $2$), a crucial step in the algorithm is to distinguish,
in each call to NextFit, between the case where NextFit outputs an even vs. odd number of configurations. In the latter case, we discard the last configuration and cover the corresponding leftover vertices in a later iteration of the algorithm.
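The parity step can be illustrated with a simplified, hypothetical version of NextFit. The sketch assumes that the material below an anchor is given as a flat list of item sizes (for example, one item per child subtree) and that every configuration must also contain the ancestor path, of total size `path_size`. It is not the paper's Procedure NextFit (Algorithm 2), whose details are not reproduced here. It shows the two properties the analysis relies on: any two consecutive bins together exceed the available room $k - \texttt{path\_size}$, and discarding the last bin when the count is odd leaves an even number of bins.

```python
from typing import List, Tuple


def next_fit_even(items: List[int], path_size: int, k: int) -> Tuple[List[List[int]], List[int]]:
    """NextFit in which every configuration must also hold a fixed ancestor path.

    Returns (bins, leftovers). If NextFit opens an odd number of bins, the last
    bin is discarded and its items are returned as leftovers, so the number of
    returned bins is always even.
    """
    budget = k - path_size  # room left next to the mandatory ancestor path
    if budget < 0 or any(x > budget for x in items):
        raise ValueError("some item cannot share a configuration with the path")
    bins: List[List[int]] = []
    load = 0
    for x in items:
        if bins and load + x <= budget:
            bins[-1].append(x)  # keep filling the current (only open) bin
            load += x
        else:
            bins.append([x])  # close the current bin for good and open a new one
            load = x
    leftovers: List[int] = bins.pop() if len(bins) % 2 == 1 else []
    return bins, leftovers
```

On items `[3, 2, 2, 3, 1]` with a path of size 1 and $k=5$, NextFit opens three bins, so the third bin `[3, 1]` is returned as leftovers. If all items fit into a single bin, that bin is discarded as well; the sketch does not special-case this situation.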
The crux of the analysis is to charge the number of subsets (i.e., configurations) used by the algorithm separately to each anchor. Consider a subtree of an anchor , of total size , covered at some iteration. Observing that each subset including vertices in this subtree must include also all the ancestors of , we are able to show that the total number of subsets used is at most twice , where is the total size of the ancestors of in (including ). To complete the analysis, we lower bound the number of subsets used in any feasible solution. This is done via an intricate calculation
bounding the number of occurrences of each vertex
in the subtree of an anchor in any feasible cover, which is the
heart of the analysis. Our greedy approach may be useful for other classes of cpo instances in which the input graph has a tree-like structure (e.g., graphs of bounded treewidth).
Our proofs of hardness for ct and rcp use sophisticated constructions, most notably, to show a two-way reduction between rcp and dksh (in Appendices LABEL:sec:2 and LABEL:sec:6) and the hardness of rcp with bounded degrees (see Appendix LABEL:sec:inOut).
Organization:
Section 2 presents our approximation algorithm for ct and the proof of Theorem 1.1, and Section 3 includes some open problems. In Appendix 0.A we describe common applications of rcp, and Appendix 0.B gives the hardness result for ct (proof of Theorem 1.2). Appendices LABEL:sec:2 and LABEL:sec:6 show the equivalence between rcp and dksh (proofs of Theorem 1.3 and Corollary 1). Appendix LABEL:sec:uniform shows that there is no EPTAS for u-rcp (Theorem 1.4), and Appendix LABEL:sec:inOut proves the hardness of rcp on graphs of in-degrees and out-degrees bounded by (Theorem 1.5). Finally, some missing proofs are given in Appendix LABEL:app:proofs.
2 Approximation Algorithm for ct
In this section, we present our approximation algorithm for ct. We start with some definitions and notations.
Let $T=(V,E)$ be an out-tree rooted at a vertex $r$. Recall that in an out-tree all edges are oriented outwards from $r$. Thus, for an edge $(u,v) \in E$, vertex $u$ precedes $v$ on the (unique) path from $r$ to $v$. We say that $u$ is the parent of $v$ and $v$ is a child of $u$.
More generally, if is on the (unique) path from to then is an ancestor of and is a descendant of . A vertex is considered an ancestor of itself but not a descendant of itself. Define to be the total size of the vertices on the path from to , which equals the total size of the ancestors of .
For , let be the subgraph of induced by . If is connected, then we say that is a subtree of . Note that in this case is also an out-tree. From now on, we consider only induced subgraphs that are connected, namely subtrees of .
If , then is a subtree of rooted at .
For an out-tree and a subset of vertices , let be the set of the ancestors in of the vertices in , and let be the set of the descendants in of the vertices in . Note that if is a subtree of rooted at , then . In case is a singleton set, we omit the set notation; that is, for , let be the set of the ancestors of in , and let be the set of the descendants of in .
We note that
if there is a vertex for which then there is no feasible solution. Also, if there is a leaf of for which then any solution must include the set (of size ), and after adding this set to the solution, we can remove and all of its ancestors which are not ancestors of any other leaf. Thus, w.l.o.g. we assume that for any vertex it holds that . Also, we note that if there is a leaf of of size then we can remove , solve for the resulting tree and then add to a subset in the cover that includes the parent of in . Thus, w.l.o.g. we assume that for any leaf of , .
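Two quantities recur throughout this section: the total size of the path from $r$ to $v$ (the ancestors of $v$, including $v$) and the total size of the descendants of $v$. Both can be computed in linear time from a parent map, as in the sketch below (the helper names are hypothetical). The same pass gives the first preprocessing test: if some vertex has an ancestor total above $k$, no configuration can contain it.

```python
from collections.abc import Hashable
from typing import Dict, List, Optional, Tuple

Vertex = Hashable


def path_and_descendant_sizes(
    parent: Dict[Vertex, Optional[Vertex]], size: Dict[Vertex, int]
) -> Tuple[Dict[Vertex, int], Dict[Vertex, int]]:
    """Return (anc, desc): anc[v] is the total size of the ancestors of v
    (the r-to-v path, v included), desc[v] that of its proper descendants."""
    children: Dict[Vertex, List[Vertex]] = {v: [] for v in parent}
    roots = [v for v, p in parent.items() if p is None]
    if len(roots) != 1:
        raise ValueError("an out-tree has exactly one root")
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)
    # Breadth-first order from the root: every parent precedes its children.
    order = [roots[0]]
    i = 0
    while i < len(order):
        order.extend(children[order[i]])
        i += 1
    anc: Dict[Vertex, int] = {}
    for v in order:
        p = parent[v]
        anc[v] = size[v] + (anc[p] if p is not None else 0)
    desc: Dict[Vertex, int] = {v: 0 for v in parent}
    for v in reversed(order):  # children are finished before their parent
        for c in children[v]:
            desc[v] += size[c] + desc[c]
    return anc, desc


def has_infeasible_vertex(anc: Dict[Vertex, int], k: int) -> bool:
    """A vertex whose ancestor path exceeds k fits in no configuration."""
    return any(a > k for a in anc.values())
```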
The algorithm for computing a cover is iterative. In each iteration, we compute a partial cover as described below. We then continue to the next iteration with the subtree rooted at induced by the uncovered vertices and their ancestors. The algorithm terminates when either the set of uncovered vertices is empty or the total size of the vertices of the remaining subtree (rooted at ) is at most , in which case these vertices form the last set in the cover.
In each iteration of the algorithm, we compute a subset of vertices that we call anchors. We then compute a cover of some (potentially all) descendants of the anchors in , and proceed to the next iteration.
Algorithm 1 gives the pseudocode of the iterative algorithm. Initially, $T_1 = T$. Consider the $i$-th iteration, for $i \ge 1$. If the total size of the vertices of $T_i$ is at most $k$, then the algorithm terminates. Otherwise, define the set of anchors of iteration $i$ as the set of all the vertices $v$ such that (i) the total size of the descendants of $v$ in $T_i$ is more than , and (ii) the total size of the descendants of every child of $v$ in $T_i$ is at most .
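The anchor condition can be sketched on a single tree, as in the first iteration. Since the thresholds in conditions (i) and (ii) are not given explicitly above, the sketch assumes a natural reading: a vertex $v$ overflows if its descendants do not fit next to the $r$-to-$v$ path in one configuration, i.e., their total size exceeds $k$ minus the total size of the ancestors of $v$; anchors are then the overflowing vertices none of whose children overflow. Treat this threshold as our assumption, not as the paper's definition.

```python
from collections.abc import Hashable
from typing import Dict, List, Optional, Set

Vertex = Hashable


def anchors(parent: Dict[Vertex, Optional[Vertex]], size: Dict[Vertex, int], k: int) -> Set[Vertex]:
    children: Dict[Vertex, List[Vertex]] = {v: [] for v in parent}
    root = next(v for v, p in parent.items() if p is None)
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)
    order = [root]  # breadth-first: parents before children
    i = 0
    while i < len(order):
        order.extend(children[order[i]])
        i += 1
    anc: Dict[Vertex, int] = {}  # total size of the r-to-v path, v included
    for v in order:
        p = parent[v]
        anc[v] = size[v] + (anc[p] if p is not None else 0)
    desc: Dict[Vertex, int] = {v: 0 for v in parent}  # proper descendants only
    for v in reversed(order):
        for c in children[v]:
            desc[v] += size[c] + desc[c]

    def overflows(v: Vertex) -> bool:
        # ASSUMED threshold: the descendants of v do not fit next to the
        # r-to-v path inside one configuration of capacity k.
        return desc[v] > k - anc[v]

    return {v for v in parent
            if overflows(v) and not any(overflows(c) for c in children[v])}
```

On a tree with root $r$ (size 1), children $a$ (size 2) and $b$ (size 1), and $a$'s children $c$ (size 1) and $d$ (size 2), this reading gives the anchor set $\{a\}$ for $k=5$, $\{r\}$ for $k=6$, and no anchors for $k=7$, where the whole tree fits into one configuration.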
Procedure NextFit, given in Algorithm 2, is called for every anchor $a$ of the current iteration. The input to Procedure NextFit is the rooted subtree that consists of the path from $r$ to $a$ and the descendants of $a$ in the current subtree (see Figure 0(a)).
When called for an anchor $a$, Procedure NextFit (Algorithm 2) computes a cover of some (potentially all) descendants of $a$. The number of sets returned in this procedure call is even, and the
total size of the descendants of that are not covered by the sets returned by Procedure NextFit is at most . Let be the set of all descendants of anchors in that were covered in iteration , together with all their ancestors. If , then the algorithm terminates. Otherwise, we let be the set of ancestors of the vertices in and continue to iteration .
Let be the set of anchors computed in all the iterations. For an anchor , let be the iteration in which was added to the set of anchors. Note that any leaf of appears in exactly one subset in . Thus, the iteration in which is covered is uniquely defined.
Definition 1
Let be an anchor.
•
If is an ancestor of a leaf of that is covered in iteration then we say that is anchored at .
•
If is not anchored at then we say that is a leftover vertex of .
•
Let denote the total size of the vertices that are anchored at ,
and denote the total size of the leftover vertices of .
Clearly, . Our assumption that for every leaf of , , implies that (i) and (ii) if there are leftover vertices then .
The proofs of the next lemmas are in Appendix LABEL:app:proofs.
Lemma 1
Let , and let be the (unique) child of that is also an ancestor of .
If is a leftover vertex of then all the vertices in the subtree of rooted at , as well as the vertices along the path from to , are also leftover vertices of .
If is anchored at then all the vertices in the subtree of rooted at , as well as the vertices along the path from to , are also anchored at .
Lemma 2
For any two anchors , the sets of vertices anchored at and are disjoint.
Define a “parent-child” relation among anchors as follows. For two anchors and , we say that is the anchor-parent of and is the anchor-child of if (i) is an ancestor of in , and (ii) the path from to (in ) does not contain any anchors other than and . Note that if anchor is an anchor-parent of then ; that is, the iteration in which is added to the set of anchors is after iteration . This follows from the definition of .
For anchor , let be the set of anchor-children of . We extend this definition for all , and let be all the anchors such that the path from to (in ) does not contain any anchors other than and (possibly) . For , let
be the set of anchors that are also descendants of .
A top anchor is an anchor that is not an anchor-child of any other anchor. Let denote the set of top anchors. Note that if the root is an anchor then .
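The anchor-parent relation and the set of top anchors depend only on the tree and on which vertices are anchors, so they can be computed by walking up parent pointers from each anchor to the nearest anchor strictly above it. A minimal sketch with hypothetical function names:

```python
from collections.abc import Hashable
from typing import Dict, Optional, Set

Vertex = Hashable


def anchor_parents(parent: Dict[Vertex, Optional[Vertex]],
                   anchor_set: Set[Vertex]) -> Dict[Vertex, Optional[Vertex]]:
    """Map each anchor to its anchor-parent: the nearest proper ancestor that
    is an anchor, or None if there is none (i.e., a top anchor)."""
    result: Dict[Vertex, Optional[Vertex]] = {}
    for a in anchor_set:
        u = parent[a]
        while u is not None and u not in anchor_set:
            u = parent[u]
        result[a] = u
    return result


def anchor_children(parent: Dict[Vertex, Optional[Vertex]],
                    anchor_set: Set[Vertex]) -> Dict[Vertex, Set[Vertex]]:
    """Invert the anchor-parent map."""
    kids: Dict[Vertex, Set[Vertex]] = {a: set() for a in anchor_set}
    for a, p in anchor_parents(parent, anchor_set).items():
        if p is not None:
            kids[p].add(a)
    return kids


def top_anchors(parent: Dict[Vertex, Optional[Vertex]],
                anchor_set: Set[Vertex]) -> Set[Vertex]:
    """Anchors that are not an anchor-child of any other anchor."""
    return {a for a, p in anchor_parents(parent, anchor_set).items() if p is None}
```

On the path $r \to x \to y \to z$ with an extra child $w$ of $r$, the anchors $\{r,y,z,w\}$ give anchor-parents $y \mapsto r$, $z \mapsto y$, $w \mapsto r$ and the single top anchor $r$, matching the remark that the root, when it is an anchor, is the only top anchor.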
Lemma 3
The number of subsets in the solution computed by Algorithm 1 is upper bounded by
where
Proof
Let be the solution computed by the algorithm.
Fix , and let be the subsets in that were returned by Procedure NextFit (Algorithm 2) when it computed a feasible cover of the vertices in . Note that the union of all the subsets in is the set of vertices anchored at (whose total size is ) together with all the ancestors of . Also, consists of at least two subsets, and a vertex anchored at cannot appear in more than one subset in . Consider the subsets in in the order in which they were computed by Algorithm 2. It follows from Procedure NextFit that the total size of the vertices in any pair of consecutive subsets in this ordered list is at least . Since the total size of vertices anchored at is , the number of such disjoint pairs is upper bounded by . By Line 8 of Procedure NextFit, the number of sets in is even, and thus the total number of subsets in is upper bounded by . The total upper bound is given by summing this bound over all anchors . We may have one additional subset if the algorithm terminates when . By our construction and Lemma 1, this may happen iff there exists a leaf that is not anchored at any anchor. If such a leaf exists then one of the following two conditions must be satisfied: (i) has no ancestor that is an anchor, or (ii) is a leftover vertex of the (unique) top anchor that is an ancestor of , in which case . The lemma follows. ∎
We now prove a lower bound on the number of subsets in any feasible solution and in particular in the optimal solution.
Let be a feasible solution. Since every subset in is closed under ancestor relation, some vertices may appear in multiple subsets. We refer to each such appearance of a vertex as an occurrence of , and associate the size to each of its occurrences.
For an anchor , let be the set of all the subsets in that contain vertices anchored at either or a descendant of . For an anchor , let be the set of all vertices such that each vertex is both a descendant of and an ancestor of a vertex anchored at either or a descendant of .
Lemma 4
For every ,
the number of subsets in is at least
If the lower bound is tight then all the leaves that are in the subsets in must be anchored at either or a descendant of .
Proof
To prove the lower bound of on the number of subsets in for every , we prove a slightly stronger lower bound of
on the total size of the occurrences of vertices in in subsets in . Since any subset that contains a descendant of must contain also the ancestors of (including ), whose total size is , the total size of the vertices in that can be in a single subset in is no more than . Thus, a lower bound of on the total size of the occurrences of vertices in in subsets in implies a lower bound of on the number of subsets in (and on the number of occurrences of anchor ).
The lower bound on the total size of the occurrences of vertices in in subsets in also implies that if the lower bound is tight then all the leaves that are in the subsets in must be anchored at either or a descendant of . To see this, note that if any subset in contains a leaf that is not anchored at an anchor in then , also by our assumption. It follows that the total size of the subsets in is strictly more than . Clearly, this implies that the number of subsets in is strictly more than .
The proof is by induction starting from the bottom anchors in , which are the anchors with no anchor-children. For the induction base, consider a bottom anchor . Note that in this case is the set of all vertices anchored at . The subsets in cover all the vertices anchored at ; thus, the total size of the occurrences of these vertices in the subsets in is at least .
Clearly, .
For the inductive step, consider an anchor and assume that the lemma holds for every anchor . Specifically, for every anchor , the total size of the occurrences of vertices in in subsets in is at least . Note that and .
Let the subtree of rooted at given by the union of the paths from to each of its anchor-children, excluding the anchor-children (see Figure 0(b)). Note that the vertices of as well as the vertices in are in .
Claim 5
For every vertex of , the total size of the occurrences of vertices in in subsets in is at least
.
We prove the claim vertex by vertex, scanning the vertices of bottom-up. Consider a leaf of . By the definition of , its children are anchors in . By the induction hypothesis of Lemma4, for every anchor the total size of the occurrences of vertices in in the subsets in is at least .
The total size of such occurrences that are contained in any single subset of is at most , since any such subset must also contain the ancestors of (including ) whose size is . It follows that the number of occurrences of in these subsets in is at least , and the total size of these occurrences is at least . Note that for any pair of anchors , the sets and are disjoint. Summing over all the anchor-children of , we have that the total size of the occurrences of vertices in in the subsets in is at least
.
The last equality holds since for every , .
The lower bound for an internal vertex of is obtained similarly. Note that a child of is either an anchor or a vertex of . If is an anchor, that is , then as shown above, the total size of the occurrences of and its descendants in in the subsets in is
. Suppose that is a vertex of . Since is a child of and the vertices are scanned bottom-up the lower bound holds for , and the total size of the occurrences of vertices in in the subsets in is
. The total size of such occurrences that are contained in any single subset in is at most ; thus, the number of occurrences of in these subsets is at least , and the total size of these occurrences is at least . We get that the total size of the occurrences of and the vertices in in the subsets in is
. For any pair of children of , the sets and are disjoint. Summing over all the children of , we get that the total size of the occurrences of vertices in in the subsets in is at least
.∎
Next, we consider vertices that are anchored at . By the definition of , each such vertex must occur at least once in subsets in ; also, . Note that may be an ancestor of an anchor . This may happen in case is a vertex of , and also in case a leftover vertex of an anchor is anchored at , and is on the path from to . In case is an ancestor of an anchor , our induction hypothesis and Claim 5 already imply a lower bound on the number of its occurrences in subsets in . Specifically, in case , our induction hypothesis implies a lower bound of on the number of its occurrences, and in case and , Claim 5 implies a lower bound of on the number of its occurrences. We prove that must occur at least once more in subsets in , in addition to this implied lower bound. This results in addition of to the total size of the occurrences of vertices anchored at in the subsets in .
Claim 6
For every vertex anchored at , the number of occurrences of in the subsets in is at least
If is anchored at then it must be an ancestor of a leaf of that is anchored at . Certainly, must occur in the subset in that covers . If is not an ancestor of an anchor , we are done.
If is an anchor and thus , and the number of occurrences of in the subsets in is strictly more than , then we are done. Otherwise, the lower bound is tight, and by the induction hypothesis of Lemma4, all the leaves that are in the subsets in must be anchored at or a descendant of . Thus, none of these subsets can cover . It follows that must occur in at least one more subset in that covers .
A similar argument applies also if and . Let be the nearest ancestor of that is an anchor. By Claim 5, the total size of the occurrences of vertices in in subsets in is at least
. It follows that the number of occurrences of in the subsets in is at least . If is not in any of the subsets in , then must occur in at least one more subset in that covers , and we are done. Suppose that this is not the case, and is in a subset in . It is not difficult to verify that the proof of Claim 5 implies the lower bound on the total size of the vertices in a subset of . This subset is the union of three sets:
, , and
. Clearly, is not in any of these three sets. Thus, the total size of the occurrences of vertices in in the subsets in is strictly more than . Hence, the number of occurrences of in the subsets in is strictly more than .∎
By Claims 5 and 6 and our induction hypothesis we get that the total size of the occurrences of vertices in in the subsets in is at least
The first equality holds since . ∎
Corollary 2
The number of subsets in any feasible solution is at least
If then by Lemma4 the number of occurrences of is at least . If this lower bound is tight then all the leaves in subsets in are anchored at some vertex. If , then there is a leaf of that is a leftover vertex of and thus not anchored at any vertex. In this case, at least one additional subset is needed to cover this leaf. If then . In this case, following the proof of Claim 5, we get that the total size of the occurrences of the descendants of in subsets in is at least . This implies that occurs in at least subsets in . If the bound is tight then all the leaves in these subsets are anchored vertices. Thus, if there exists a vertex that is not anchored at any vertex, at least one additional subset is needed. This occurs when either , or s.t. none of the ancestors of is an anchor. ∎
Corollary 2 and Lemma 3 imply that Algorithm 1 is a 2-approximation for ct, which proves Theorem 1.1.
3 Open problems
An intriguing open problem is to bridge the gap between our $2$-approximation and the asymptotic $3/2$-inapproximability result for ct.
Recall that ct is the special case of cpo on out-trees.
While we expect cpo to be hard to approximate on general graphs (as
mentioned above), exploring further the hardness of cpo on various graph classes
remains open.
Another appealing line of research is to investigate the connections between cpo and a natural covering variant of the dksh problem defined as follows.
Given a hypergraph $H=(V,E)$ and an integer $k$, find the minimum number of vertex sets,
each of cardinality at most $k$, such that every hyperedge is fully contained in one of the sets. We are not aware of earlier studies of this problem, even in the special case where $H$ is a graph.
One interesting direction is to derive nontrivial hardness results for this problem and
show possible implications for cpo.
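To make this covering variant concrete, here is an exhaustive Python sketch for tiny instances (the function name is hypothetical). It relies on the observation that each set in a solution can be shrunk to the union of the hyperedges it is responsible for; hence it suffices to assign each hyperedge to one of $m$ groups and check that every group's union has at most $k$ vertices.

```python
from collections.abc import Hashable, Iterable
from itertools import product
from typing import FrozenSet, List, Optional, Set


def min_sets_covering_hyperedges(hyperedges: Iterable[Iterable[Hashable]], k: int) -> Optional[int]:
    """Smallest m such that the hyperedges can be split into m groups whose
    unions have at most k vertices each; None if some hyperedge is too large.
    Exhaustive (m^|E| assignments), so only for tiny instances."""
    edges: List[FrozenSet[Hashable]] = [frozenset(e) for e in hyperedges]
    if any(len(e) > k for e in edges):
        return None
    if not edges:
        return 0
    for m in range(1, len(edges) + 1):
        for assignment in product(range(m), repeat=len(edges)):
            groups: List[Set[Hashable]] = [set() for _ in range(m)]
            for e, g in zip(edges, assignment):
                groups[g] |= e
            if all(len(g) <= k for g in groups):
                return m
    return len(edges)  # not reached: one group per hyperedge always works
```

For the path graph with edges $\{1,2\}$, $\{2,3\}$, $\{3,4\}$, two sets suffice when $k=3$ (e.g., $\{1,2,3\}$ and $\{3,4\}$), and one set suffices when $k=4$.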
References
[1]
Azzolini, D., Riguzzi, F., Lamma, E.: Studying transaction fees in the bitcoin
blockchain with probabilistic logic programming. Information
10(11), 335 (2019)
[2]
Biondi, M., Saliba, S., Harjunkoski, I.: Production optimization and scheduling
in a steel plant: Hot rolling mill. In: 18th World Congress of the
International Federation of Automatic Control. pp. 11750–11754 (2011)
[4]
Borradaile, G., Heeringa, B., Wilfong, G.: The knapsack problem with neighbour
constraints. Journal of Discrete Algorithms 16, 224–235 (2012)
[5]
Chalermsook, P., Cygan, M., Kortsarz, G., Laekhanukit, B., Manurangsi, P.,
Nanongkai, D., Trevisan, L.: From gap-ETH to FPT-inapproximability:
Clique, dominating set, and more. In: 58th Annual Symposium on Foundations of
Computer Science (FOCS). pp. 743–754 (2017)
[6]
Cheng, T., Wang, K., Wang, L.C., Lee, C.W.: An in-switch rule caching and
replacement algorithm in software defined networks. In: 2018 IEEE
International Conference on Communications (ICC). pp. 1–6. IEEE (2018)
[7]
Chlamtáč, E., Dinitz, M., Konrad, C., Kortsarz, G., Rabanca, G.: The
densest k-subhypergraph problem. SIAM J. Discret. Math. 32(2),
1458–1477 (2018)
[8]
Chlamtáč, E., Dinitz, M., Makarychev, Y.: Minimizing the union: Tight
approximations for small set bipartite vertex expansion. In: Proceedings of
the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. pp.
881–899 (2017)
[9]
Cygan, M., Fomin, F.V., Kowalik, Ł., Lokshtanov, D., Marx, D., Pilipczuk,
M., Pilipczuk, M., Saurabh, S.: Parameterized algorithms, vol. 4 (2015)
[10]
Dong, M., Li, H., Ota, K., Xiao, J.: Rule caching in sdn-enabled mobile access
networks. IEEE Network 29(4), 40–45 (2015)
[11]
Doron-Arad, I., Shachnai, H.: Approximating bin packing with conflict graphs
via maximization techniques. arXiv preprint arXiv:2302.10613 (2023)
[13]
Esfandiari, H., Hajiaghayi, M., Könemann, J., Mahini, H., Malec, D., Sanità,
L.: Approximate deadline-scheduling with precedence constraints. In:
Algorithms-ESA 2015: 23rd Annual European Symposium, Patras, Greece,
September 14-16, 2015, Proceedings. pp. 483–495 (2015)
[14]
Gamage, S., Pasqual, A.: High-performance parallel packet classification
architecture with popular rule caching. In: 2012 18th IEEE International
Conference on Networks (ICON). pp. 52–57. IEEE (2012)
[15]
Gao, P., Xu, Y., Chao, H.J.: Ovs-cab: Efficient rule-caching for open vswitch
hardware offloading. Computer Networks 188, 107844 (2021)
[16]
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the
Theory of NP-Completeness. W. H. Freeman (1979)
[17]
Hajiaghayi, M., Jain, K., Konwar, K., Lau, L., Mandoiu, I., Russell, A.,
Shvartsman, A., Vazirani, V.: The minimum k-colored subgraph problem in
haplotyping and DNA primer selection. In: Proceedings of the International
Workshop on Bioinformatics Research and Applications (IWBRA). pp. 1–12
(2006)
[18]
Hoberg, R., Rothvoss, T.: A logarithmic additive integrality gap for bin
packing. In: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on
Discrete Algorithms. pp. 2616–2625 (2017)
[19]
Huang, H., Guo, S., Li, P., Liang, W., Zomaya, A.Y.: Cost minimization for rule
caching in software defined networking. IEEE Transactions on Parallel and
Distributed Systems 27(4), 1007–1016 (2015)
[20]
Ibarra, O.H., Kim, C.E.: Approximation algorithms for certain scheduling
problems. Mathematics of Operations Research 3(3), 197–204 (1978)
[21]
Katta, N., Alipourfard, O., Rexford, J., Walker, D.: Cacheflow:
Dependency-aware rule-caching for software-defined networks. In: Proceedings
of the Symposium on SDN Research. pp. 1–12 (2016)
[22]
Lenstra, J.K., Kan, A.R., Brucker, P.: Complexity of machine scheduling
problems. In: Annals of discrete mathematics, vol. 1, pp. 343–362 (1977)
[23]
Li, H., Guo, S., Wu, C., Li, J.: Fdrc: Flow-driven rule caching optimization in
software defined networking. In: 2015 IEEE International Conference on
Communications (ICC). pp. 5777–5782. IEEE (2015)
[24]
Li, R., Pang, Y., Zhao, J., Wang, X.: A tale of two (flow) tables: Demystifying
rule caching in OpenFlow switches. In: Proceedings of the 48th International
Conference on Parallel Processing. pp. 1–10 (2019)
[25]
Li, R., Zhao, B., Chen, R., Zhao, J.: Taming the wildcards: Towards
dependency-free rule caching with FreeCache. In: 2020 IEEE/ACM 28th
International Symposium on Quality of Service (IWQoS). pp. 1–10. IEEE (2020)
[26]
Manurangsi, P.: Inapproximability of maximum edge biclique, maximum balanced
biclique and minimum k-cut from the small set expansion hypothesis. In: 44th
International Colloquium on Automata, Languages, and Programming (ICALP 2017)
(2017)
[27]
McMenamin, C., Daza, V., Fitzi, M., O’Donoghue, P.: FairTraDEX: A decentralised
exchange preventing value extraction. In: Proceedings of the 2022 ACM CCS
Workshop on Decentralized Finance and Security. pp. 39–46 (2022)
[28]
Moreno, E., Espinoza, D., Goycoolea, M.: Large-scale multi-period precedence
constrained knapsack problem: a mining application. Electronic Notes in
Discrete Mathematics 36, 407–414 (2010)
[29]
Obadia, A., Salles, A., Sankar, L., Chitra, T., Chellani, V., Daian, P.: Unity
is strength: A formalization of cross-domain maximal extractable value. arXiv
preprint arXiv:2112.01472 (2021)
[30]
Papazachos, Z.C., Karatza, H.D.: Gang scheduling with precedence constraints.
In: Proceedings of the 2010 International Symposium on Performance Evaluation
of Computer and Telecommunication Systems (SPECTS’10). pp. 331–337 (2010)
[31]
Pferschy, U., Scatamacchia, R.: Improved dynamic programming and approximation
results for the knapsack problem with setups. International Transactions in
Operational Research 25(2), 667–682 (2018)
[32]
Rastegar, S.H., Abbasfar, A., Shah-Mansouri, V.: Rule caching in SDN-enabled
base stations supporting massive IoT devices with bursty traffic. IEEE
Internet of Things Journal 7(9), 8917–8931 (2020)
[33]
Rottenstreich, O., Kulik, A., Joshi, A., Rexford, J., Rétvári, G.,
Menasché, D.S.: Cooperative rule caching for SDN switches. In: 9th
International Conference on Cloud Networking (CloudNet). pp. 1–7 (2020)
[34]
Rottenstreich, O., Tapolcai, J.: Optimal rule caching and lossy compression for
longest prefix matching. IEEE/ACM Transactions on Networking 25(2),
864–878 (2016)
[35]
Samavati, M., Essam, D., Nehring, M., Sarker, R.: A methodology for the
large-scale multi-period precedence-constrained knapsack problem: an
application in the mining industry. International Journal of Production
Economics 193, 12–20 (2017)
[36]
Sarrar, N., Uhlig, S., Feldmann, A., Sherwood, R., Huang, X.: Leveraging Zipf’s
law for traffic offloading. ACM SIGCOMM Computer Communication Review
42(1), 16–22 (2012)
[37]
Sheu, J.P., Chuo, Y.C.: Wildcard rules caching and cache replacement algorithms
in software-defined networking. IEEE Transactions on Network and Service
Management 13(1), 19–29 (2016)
[38]
Stonebraker, M., Jhingran, A., Goh, J., Potamianos, S.: On rules, procedure,
caching and views in data base systems. ACM SIGMOD Record 19(2),
281–290 (1990)
[39]
Wang, Y.Z., Zheng, Z., Zhu, M.M., Zhang, K.T., Gao, X.Q.: An integrated
production batch planning approach for steelmaking-continuous casting with
cast batching plan as the core. Computers & Industrial Engineering
173, 108636 (2022)
[40]
Weintraub, B., Torres, C.F., Nita-Rotaru, C., State, R.: A flash (bot) in the
pan: measuring maximal extractable value in private pools. In: Proceedings of
the 22nd ACM Internet Measurement Conference. pp. 458–471 (2022)
[41]
Woeginger, G.J.: On the approximability of average completion time scheduling
under precedence constraints. In: Automata, Languages and Programming: 28th
International Colloquium, ICALP 2001 Crete, Greece, July 8–12, 2001
Proceedings. pp. 887–897 (2001)
[43]
Yan, B., Xu, Y., Xing, H., Xi, K., Chao, H.J.: CAB: A reactive wildcard rule
caching system for software-defined networks. In: Proceedings of the Third
Workshop on Hot Topics in Software Defined Networking. pp. 163–168 (2014)
[44]
Yang, J., Li, T., Yan, J., Li, J., Li, C., Wang, B.: PipeCache: High hit rate
rule-caching scheme based on multi-stage cache tables. Electronics
9(6), 999 (2020)
[45]
Ye, Y., Jiang, Z., Diao, X., Yang, D., Du, G.: An ontology-based hierarchical
semantic modeling approach to clinical pathway workflows. Computers in
Biology and Medicine 39, 722–732 (2009)
Appendix 0.A Motivation for rcp
A prime motivation for studying rcp comes from the area of networking [10, 43, 37, 19, 24, 38, 23, 15, 6, 34, 25, 14, 42, 33, 32, 44]. In a Software-Defined Network (SDN), traffic flow is governed by a logically centralized controller that uses packet-processing rules to manage the underlying switches [21]. The number of rules tends to be large, while most traffic relies on a small fraction of them [36]. Thus, caching frequently used rules can reduce the processing time of packets.
However, standard caching policies cannot be used due to dependencies among rules.
One common form of dependency is a partial overlap in the binary strings representing the rules. For example, consider the rules $r_1=$‘10**’ (where the symbol ‘*’ denotes a wildcard) and $r_2=$‘1000’. Then whenever $r_1$ is placed in the cache, $r_2$ must be placed as well. Indeed, if only $r_1$ is in the cache, then a message with header ‘1000’ would be matched with $r_1$ rather than with the more specific rule $r_2$, causing a correctness issue in handling this packet.
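To make this dependency concrete, the following minimal Python sketch derives dependency edges from ternary rule strings such as ‘10**’. It assumes, as a simplification of ours rather than the paper's, that between two overlapping rules the one with fewer wildcards takes precedence (as in longest-prefix matching); the function names `overlaps` and `dependency_edges` are hypothetical.

```python
from itertools import combinations


def overlaps(a: str, b: str) -> bool:
    """True if some concrete header matches both ternary patterns a and b."""
    return len(a) == len(b) and all(x == y or "*" in (x, y) for x, y in zip(a, b))


def dependency_edges(rules: list[str]) -> set[tuple[int, int]]:
    """Return edges (i, j) meaning: caching rules[i] forces caching rules[j].

    Assumption (not from the paper): when two rules overlap, the one with
    fewer wildcards has higher priority, as in longest-prefix matching.
    Overlapping rules with equally many wildcards get no edge here.
    """
    edges = set()
    for i, j in combinations(range(len(rules)), 2):
        if not overlaps(rules[i], rules[j]):
            continue
        wi, wj = rules[i].count("*"), rules[j].count("*")
        if wi > wj:  # rules[j] is more specific, so it must accompany rules[i]
            edges.add((i, j))
        elif wj > wi:
            edges.add((j, i))
    return edges


print(dependency_edges(["10**", "1000", "0***"]))  # prints {(0, 1)}
```

Only the pair ‘10**’/‘1000’ overlaps, so caching rule 0 forces rule 1, while ‘0***’ is independent of both.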
The problem of caching a feasible subset of the rules that handles a maximum total volume of traffic can be modeled as follows.
We represent the rules by a DAG $G=(V,E)$, where the vertex $v_i \in V$ corresponds to the rule $r_i$, and there is a directed edge from $v_i$ to $v_j$ if placing $r_i$ in the cache implies that $r_j$ must also be in the cache.
The profit $p(v_i)$ of each vertex $v_i$ reflects the volume of traffic handled by the rule $r_i$. The goal is to select a subset of vertices of maximum total profit that fits into the cache and is closed under the precedence constraints.
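The resulting optimization problem can be illustrated by an exhaustive-search sketch in Python. It uses the convention that an edge $(u, v)$ means that selecting $u$ forces selecting $v$, and it models the cache as a bound `k` on the number of selected rules. It runs in exponential time, so it only serves to make the feasibility condition (no edge leaves the selected set) explicit; it is not an algorithm from the paper, and all names are hypothetical.

```python
from itertools import combinations


def is_closed(subset: set[int], edges: set[tuple[int, int]]) -> bool:
    """No edge leaves the subset: if u is selected, every successor v is too."""
    return all(v in subset for (u, v) in edges if u in subset)


def rcp_brute_force(n: int, edges: set[tuple[int, int]], profit: list[int],
                    k: int) -> tuple[int, set[int]]:
    """Exact rcp by enumerating all vertex subsets of size at most k.

    Exponential in n, so suitable only for tiny illustrative instances.
    """
    best, best_set = 0, set()
    for size in range(1, k + 1):
        for combo in combinations(range(n), size):
            chosen = set(combo)
            if is_closed(chosen, edges):
                value = sum(profit[v] for v in chosen)
                if value > best:
                    best, best_set = value, chosen
    return best, best_set


# Rule 0 = '10**' (profit 9) forces rule 1 = '1000' (profit 1); rule 2 = '0***' (profit 5).
print(rcp_brute_force(3, {(0, 1)}, [9, 1, 5], 2))  # prints (10, {0, 1})
```

With room for two rules, caching ‘10**’ alone is infeasible, so the best choice pairs it with ‘1000’; with room for one rule, the answer drops to the independent rule of profit 5.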
rcp can also be used to model the maximal extractable value (MEV) problem in blockchain [27, 29, 40, 1].
Each blockchain transaction is associated with a fee earned by the miner who creates the block containing this transaction. The set of transactions is associated with a partial order, and each blockchain prefix has to be closed under precedence constraints.
MEV is the maximum potential profit that a blockchain miner can gain from transactions that have not been validated.
Computing MEV can be cast as an rcp instance where the vertices of the graph are the transactions, the edges represent the precedence constraints, the profits are the associated fees, and the bound is the number of transactions that fit in a single block. Other applications of rcp variants arise, e.g., in the mining industry [28, 35] and in scheduling [30, 41, 13, 20, 22].
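The MEV encoding described above can be sketched in Python as follows. The `Tx` record, the function names, and the choice of an edge $(i, j)$ for "transaction $i$ depends on transaction $j$" are all hypothetical; the sketch only builds the rcp instance (vertices, edges, profits, block capacity) and evaluates a candidate block against it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tx:
    """A pending transaction (hypothetical model): its fee and the indices
    of the pending transactions it depends on."""
    fee: int
    requires: list[int] = field(default_factory=list)


def mev_to_rcp(txs: list[Tx], block_capacity: int):
    """Encode MEV as rcp: vertices are transactions, profits are fees, and an
    edge (i, j) means that including transaction i forces including j."""
    edges = {(i, j) for i, tx in enumerate(txs) for j in tx.requires}
    profits = [tx.fee for tx in txs]
    return len(txs), edges, profits, block_capacity


def block_value(block: set[int], txs: list[Tx], block_capacity: int) -> Optional[int]:
    """Total fee of a candidate block, or None if it is not a feasible rcp solution."""
    _, edges, profits, k = mev_to_rcp(txs, block_capacity)
    closed = all(j in block for (i, j) in edges if i in block)
    if len(block) > k or not closed:
        return None
    return sum(profits[i] for i in block)


txs = [Tx(fee=1), Tx(fee=7, requires=[0]), Tx(fee=3)]
print(block_value({1}, txs, 2))     # prints None: transaction 1 needs transaction 0
print(block_value({0, 1}, txs, 2))  # prints 8
```

Maximizing `block_value` over all blocks is exactly the rcp objective on the encoded instance.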
Appendix 0.B Hardness Result for CT
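Our hardness result for ct is based on a reduction from bin packing with cluster complement conflict graph (bpcc). An undirected graph $H=(V,E)$ is called a cluster complement if there is a partition $V_1, \ldots, V_\ell$ of $V$ such that for all $i \in [\ell]$ it holds that $V_i$ is an independent set in $H$, and for all $i, j \in [\ell]$ where $i \neq j$ and any $u \in V_i$ and $v \in V_j$ it holds that $\{u, v\} \in E$. Equivalently, the complement of $H$ is a disjoint union of cliques. We now formally define the bpcc problem.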
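The partition in the definition of a cluster complement can be recovered directly: the part containing a vertex $v$ must be $v$ together with all its non-neighbors. The minimal Python sketch below, with the hypothetical name `cluster_complement_partition` and vertices numbered $0, \ldots, n-1$, builds the parts this way and then verifies the definition, in $O(n^2)$ time.

```python
from typing import Optional


def cluster_complement_partition(n: int, edges: set[frozenset[int]]) -> Optional[list[list[int]]]:
    """Return the parts V_1, ..., V_l if the graph on {0, ..., n-1} is a cluster
    complement, and None otherwise. Runs in O(n^2) time."""

    def adjacent(u: int, v: int) -> bool:
        return frozenset((u, v)) in edges

    parts: list[list[int]] = []
    assigned = [False] * n
    for v in range(n):
        if assigned[v]:
            continue
        # In a cluster complement, the part of v is v plus all its non-neighbours.
        part = [u for u in range(n) if u == v or not adjacent(u, v)]
        if any(assigned[u] for u in part):
            return None  # non-adjacency is not transitive
        for u in part:
            assigned[u] = True
        parts.append(part)

    # Verify: two distinct vertices share a part iff they are non-adjacent.
    index = {u: i for i, part in enumerate(parts) for u in part}
    for u in range(n):
        for v in range(u + 1, n):
            if adjacent(u, v) == (index[u] == index[v]):
                return None
    return parts


# Complete multipartite graph with parts {0, 1}, {2}, {3, 4}.
E = {frozenset((u, v)) for u in range(5) for v in range(u + 1, 5)
     if {u, v} not in ({0, 1}, {3, 4})}
print(cluster_complement_partition(5, E))  # prints [[0, 1], [2], [3, 4]]
```

Since the parts are forced by the non-neighborhoods, the returned partition is the unique partition into maximal independent sets.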
Definition 2
The bin packing with cluster complement conflict graph (bpcc) is defined as follows.
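Input: A cluster complement $H=(V,E)$, a weight function $w$ on $V$, and a value $\beta$.
Configuration: An independent set $C$ in $H$ such that $w(C)=\sum_{v \in C} w(v) \leq \beta$.
Solution: For some $m \in \mathbb{N}$, we say that $(C_1, \ldots, C_m)$ is a solution with cardinality $m$ if the following holds.
•
For every $i \in [m]$ it holds that $C_i$ is a configuration.
•
For all $v \in V$ there is $i \in [m]$ such that $v \in C_i$.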
Objective: Find a solution of minimum cardinality.
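The conditions of Definition 2 can be checked mechanically. In this minimal Python sketch, the graph is given by vertices $0, \ldots, n-1$ and a set of undirected edges, the weight function is a list `w`, and the value is `beta`; the names `is_configuration` and `is_bpcc_solution` are hypothetical.

```python
def is_configuration(C: set[int], edges: set[frozenset[int]], w: list[int], beta: int) -> bool:
    """A configuration: an independent set whose total weight is at most beta."""
    independent = all(frozenset((u, v)) not in edges for u in C for v in C if u < v)
    return independent and sum(w[v] for v in C) <= beta


def is_bpcc_solution(n: int, edges: set[frozenset[int]], w: list[int], beta: int,
                     configs: list[set[int]]) -> bool:
    """Check that (C_1, ..., C_m), m = len(configs), is a solution of cardinality m."""
    covered = set().union(*configs)
    return (all(is_configuration(C, edges, w, beta) for C in configs)
            and set(range(n)) <= covered)


# Cluster complement with parts {0, 1}, {2}, {3, 4}: all other pairs are edges.
E = {frozenset((u, v)) for u in range(5) for v in range(u + 1, 5)
     if {u, v} not in ({0, 1}, {3, 4})}
w = [4, 5, 7, 6, 6]
print(is_bpcc_solution(5, E, w, 10, [{0, 1}, {2}, {3}, {4}]))  # prints True
print(is_bpcc_solution(5, E, w, 10, [{0, 1}, {2}, {3, 4}]))    # prints False: weight 12 > 10
```

Because any two vertices from different parts are adjacent, every configuration lies inside a single part; the same observation appears in the proof of Claim 7.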
Proof of Theorem 1.2: We show a reduction from bpcc to ct. Consider a bpcc instance whose cluster complement graph is $H=(V,E)$. Let $V_1, \ldots, V_\ell$ be the unique partition of $V$ into maximal independent sets, which exists and can be found in polynomial time since $H$ is a cluster complement. Then, define the reduced ct instance as follows.
•
The vertex set of contains a root and a vertex for every , where ( is a child of for every ). For every and every define a leaf and add an edge . Overall, we get a two-level star graph.
•
Define the size function such that , for all define , and for all and define .
•
Define .
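The shape of the reduced instance, a root, one child per part of the partition, and one leaf per vertex of that part, can be sketched in Python as below. The vertex labels, the function name `two_level_star`, and the parent-to-child edge direction are hypothetical choices of this sketch, and the size function and bound of the ct instance are deliberately left out.

```python
def two_level_star(parts: list[list[int]]):
    """Edges (parent, child) of the two-level star used in the reduction:
    root -> one vertex per part V_i -> one leaf per vertex of V_i.
    Vertex sizes and the bound of the ct instance are not modeled here."""
    root = ("root",)
    edges = []
    for i, part in enumerate(parts):
        middle = ("part", i)
        edges.append((root, middle))
        edges.extend((middle, ("leaf", v)) for v in part)
    return root, edges


root, tree_edges = two_level_star([[0, 1], [2], [3, 4]])
print(len(tree_edges))  # prints 8: 3 root-to-part edges + 5 part-to-leaf edges
```

Every non-root vertex receives exactly one incoming edge, so the construction is indeed an out-tree of depth two.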
For every , let
(1)
Claim 7
For every if is a configuration of then is a configuration of .
Proof
Assume that is a configuration of . Observe that, by (1), is closed under the precedence constraints. Moreover,
The first equality follows from (1). The second equality holds since is a configuration; thus, it is an independent set in , and it can contain vertices from at most one . The inequality holds since is a configuration. We conclude that is a configuration of .
For every let
(2)
Claim 8
For every if is a configuration of then is a configuration of .