¹ Computer Science Department, Technion, Haifa, Israel. Email: {idoron-arad,naor,hadas}@cs.technion.ac.il. ² Computer Science Department, Rutgers University-Camden, Camden, NJ, USA. Email: [email protected]. ³ Computer Science Department, New Jersey Institute of Technology, Newark, NJ, USA. Email: [email protected].

Approximations and Hardness of Covering and Packing Partially Ordered Items

Ilan Doron-Arad¹, Guy Kortsarz², Joseph (Seffi) Naor¹, Baruch Schieber³, Hadas Shachnai¹

Abstract

Motivated by applications in production planning and storage allocation in hierarchical databases, we initiate the study of covering partially ordered items (cpo). Given a value $k\in\mathbb{N}$ , and a directed graph $G=(V,E)$ where each vertex has a size in $\{0,1,\ldots,k\}$ , we seek a collection of subsets of vertices $C_{1},\ldots,C_{t}$ that cover all the vertices, such that for any $1\leq j\leq t$ , the total size of vertices in $C_{j}$ is bounded by $k$ , and there are no edges from $V\setminus C_{j}$ to $C_{j}$ . The objective is to minimize the number of subsets $t$ . cpo is closely related to the rule caching problem (rcp) that has been widely studied in the networking area. The input for rcp is a directed graph $G=(V,E)$ , a profit function $p:V\rightarrow\mathbb{Z}_{0}^{+}$ , and $k\in\mathbb{N}$ . The output is a subset $S\subseteq V$ of maximum profit such that $|S|\leq k$ and there are no edges from $V\setminus S$ to $S$ .

Our main result is a $2$ -approximation algorithm for cpo on out-trees, complemented by an asymptotic $1.5$ -hardness of approximation result. We also give a two-way reduction between rcp and the densest $k$ -subhypergraph problem, surprisingly showing that the problems are equivalent w.r.t. polynomial-time approximation within any factor $\rho\geq 1$ . This implies that rcp cannot be approximated within factor $|V|^{1-{\varepsilon}}$ for any fixed ${\varepsilon}>0$ , under standard complexity assumptions. Prior to this work, rcp was just known to be strongly NP-hard. We further show that there is no EPTAS for the special case of rcp where the profits are uniform, assuming Gap-ETH. Since this variant admits a PTAS, we essentially resolve the complexity status of this problem.

1 Introduction

Partially ordered entities are ubiquitous in the mathematical modeling of scheduling problems, distributed storage allocation, production planning, and unified language models. Often, the partial order represents either precedence constraints or dependencies among entities (or items). Motivated by applications in production planning [39, 2] and distributed storage allocation in hierarchical databases [45], we introduce the covering partially ordered items (cpo) problem. An instance of cpo consists of a directed graph $G=(V,E)$, a value $k\in\mathbb{N}$, and a size function $s:V\rightarrow[0:k]$, where for $i,j\in\mathbb{Z}_{0}^{+}$ we denote by $[i:j]$ the set of integers $\{i,i+1,\ldots,j\}$. For a set $A$, a function $f:A\rightarrow\mathbb{X}$, and $B\subseteq A$, define $f(B)=\sum_{b\in B}f(b)$. A configuration is a subset of vertices $U\subseteq V$ such that $s(U)\leq k$ and $U$ is closed under precedence constraints; that is, for any $u\in U$ and $(z,u)\in E$ it holds that $z\in U$. A feasible solution is a set of configurations $C_{1},\ldots,C_{t}$ that covers $V$, namely $\bigcup_{j\in[1:t]}C_{j}=V$. The cardinality of the solution is $t$, the number of configurations. The goal is to find a feasible solution of minimum cardinality.
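The definition above can be checked mechanically. The following is a minimal Python sketch, assuming the graph is given as a list of directed edges and the sizes as a dictionary; the function names `is_configuration` and `is_feasible_cover` are hypothetical and are used only for illustration.

```python
def is_configuration(subset, edges, size, k):
    """True if `subset` has total size at most k and is closed under precedence."""
    subset = set(subset)
    if sum(size[v] for v in subset) > k:
        return False
    # Closure: every edge (z, u) entering the subset must start inside it.
    return all(z in subset for (z, u) in edges if u in subset)


def is_feasible_cover(cover, vertices, edges, size, k):
    """True if every set in `cover` is a configuration and their union is V."""
    covered = set()
    for c in cover:
        covered |= set(c)
    return covered == set(vertices) and all(
        is_configuration(c, edges, size, k) for c in cover
    )


# Illustrative instance (not from the paper): root r with two children.
V = ["r", "a", "b"]
E = [("r", "a"), ("r", "b")]
s = {"r": 1, "a": 2, "b": 2}
print(is_feasible_cover([{"r", "a"}, {"r", "b"}], V, E, s, 3))
```

Note that the root appears in both configurations: a cover is not a partition, which is what separates cpo from classical packing problems.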

cpo can be applied to optimize the distributed storage of large hierarchical data in unified medical language systems (UMLS) [45]. UMLS data is often distributed over several databases of bounded size. Due to the hierarchical nature of the medical taxonomy, each database needs to be closed under this hierarchy relation. The problem of minimizing the number of distributed databases of the UMLS data translates to a cpo problem instance.

Another application of cpo arises in production planning for steel mills that employ continuous casting [39, 2]. The steel-making process has high energy consumption. One way to save energy is by employing continuous casting and direct charging. In this routine, the molten steel is solidified into slabs and rolled into finished products of various sizes continuously, with no need to reheat the steel in the process. Each finished product requires specific casting, rolling, and thermal treatments in a given order, which can be modeled by a directed acyclic graph (DAG). A main challenge is to assign the finished products to batches whose size is dictated by the size of the ladle furnace while minimizing the amount of repeated operations. This gives rise to an instance of cpo.

A natural greedy approach for solving cpo is to repeatedly find, among all subsets of vertices that can be feasibly assigned to a single configuration, a subset that maximizes the size of yet unassigned vertices. This single configuration problem is a variant of the well known rule caching problem (rcp) that has been studied extensively [10, 43, 37, 19, 24, 38, 23, 15, 6, 34, 25, 14, 42, 33, 32, 44]. An instance of rcp consists of a directed graph $G=(V,E)$ , a profit function $p:V\rightarrow{{\mathbb{Z}_{0}^{+}}}$ , and a value $k\in\mathbb{N}$ . We seek a subset of vertices $U\subseteq V$ which is closed under precedence constraints, such that $|U|\leq k$ , and $p(U)=\sum_{u\in U}p(u)$ is maximized. In Appendix 0.A we describe central applications of rcp in networking and the blockchain technology.
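To make the rcp definition concrete, here is a brute-force Python sketch that enumerates all closed subsets of at most $k$ vertices. It runs in exponential time and is meant only as an executable restatement of the problem on tiny inputs, not as an algorithm from this paper; the name `rcp_brute_force` and the input encoding are assumptions.

```python
from itertools import combinations


def rcp_brute_force(vertices, edges, profit, k):
    """Return (best profit, best set) over closed subsets U with |U| <= k."""
    best, best_set = 0, set()
    for r in range(k + 1):
        for cand in combinations(vertices, r):
            u_set = set(cand)
            # Closed under precedence: no edge enters u_set from outside.
            if all(z in u_set for (z, u) in edges if u in u_set):
                p = sum(profit[v] for v in u_set)
                if p > best:
                    best, best_set = p, u_set
    return best, best_set
```

The constraint here is on the cardinality $|U|$, in contrast to the size constraint $s(U)\leq k$ used for configurations in cpo.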

Prior to this work rcp was just known to be strongly NP-hard [4, 31]. Our initial attempt towards solving cpo was to find a good approximation for rcp. Surprisingly, we were able to show an equivalence between rcp and the densest $k$-subhypergraph (dksh) problem w.r.t. approximability. The input for dksh consists of a hypergraph $G=(V,E)$ and a value $k\in{\mathbb{N}}$. The goal is to find a subset of vertices $S\subseteq V$ of cardinality $k$ that maximizes the number (or weight) of induced hyperedges (a more formal definition is given in Appendix LABEL:sec:2). dksh has also been widely studied (see, e.g., [7, 17, 8] and the references therein).

Unfortunately, dksh is known to be hard to approximate within a factor of $|V|^{1-{\varepsilon}}$ , for ${\varepsilon}\in(0,1)$ , assuming the Small Set Expansion Conjecture (by combining the results of [26] and [17]). This implies the same hardness of approximation for rcp (see Section 1.1). Given this hardness result, we expect cpo to be hard to approximate on general graphs. Thus, we consider the special case of cpo where $G$ is an out-tree. We call this problem covering partially ordered items on out-trees (ct). To the best of our knowledge, ct is studied here for the first time. We note that when $G$ is an in-tree cpo is trivial since the problem has a feasible (and unique) solution iff the total size of the vertices is at most $k$ .

1.1 Our results

Our first result is an approximation algorithm for ct. Recall that, for $\alpha\geq 1$ , ${\mathcal{A}}$ is an $\alpha$ -approximation algorithm for a minimization (maximization) problem $\Pi$ if, for any instance of $\Pi$ , the output of ${\mathcal{A}}$ is at most $\alpha$ (at least $1/\alpha$ ) times the optimum.

Theorem 1.1

There is a polynomial time $2$ -approximation algorithm for ct.

While out-trees have a simple structure, allowing for a greedy-based bottom-up approach in solving ct, the analysis of our approximation algorithm is nontrivial and requires extra care to make sure that the approximation bound has no additive terms (see below).

ct generalizes the classical bin packing (bp) problem. The input for bp is a set of items and a value $k\in\mathbb{N}$. Each item has a size in $[0:k]$, and the goal is to assign the items into a minimum number of bins of capacity $k$. (We use the definition of bp as given in [16]; in an alternative definition found in the literature, bin capacities are normalized to one, and item sizes are in $[0,1]$.) An instance of bp is reduced to an instance of ct on a star graph by generating a leaf for each item of the bp instance and adding a root vertex of size zero. This trivial reduction implies that ct is strongly NP-hard [16].
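The star-graph reduction just described is short enough to write out. The sketch below assumes item sizes are given as a list of integers in $[0:k]$; the function name and the vertex labels (`"root"`, `("item", i)`) are hypothetical choices made for illustration.

```python
def bin_packing_to_ct(item_sizes, k):
    """Build the star out-tree: a size-zero root with one leaf per bp item."""
    root = "root"
    leaves = [("item", i) for i in range(len(item_sizes))]
    vertices = [root] + leaves
    edges = [(root, leaf) for leaf in leaves]   # all edges point away from the root
    size = {root: 0}
    for leaf, s in zip(leaves, item_sizes):
        size[leaf] = s
    return vertices, edges, size, k
```

Since the root has size zero and must appear in every configuration, a configuration of the star corresponds to a set of items of total size at most $k$, that is, to the contents of one bin.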

Interestingly, we show that in contrast to bp, ct does not admit an asymptotic polynomial-time approximation scheme (APTAS), or even an asymptotic approximation strictly better than $\frac{3}{2}$ . This separates ct from bp which admits also an additive logarithmic approximation [18].

Theorem 1.2

For any $\alpha<\frac{3}{2}$ , there is no asymptotic $\alpha$ -approximation for ct unless P=NP.

Next, we study the hardness of rcp.

Theorem 1.3

For any $\rho\geq 1$ , there is a $\rho$ -approximation for rcp if and only if there is a $\rho$ -approximation for dksh.

Corollary 1

Assuming the Small Set Expansion Hypothesis (SSEH) and NP $\neq$ BPP, for any ${\varepsilon}>0$ there is no $|V|^{1-{\varepsilon}}$ -approximation for rcp.

We give a tight lower bound also for the previously studied special case of uniform rcp (u-rcp) [4, 3], which is called the uniform directed all-neighbor knapsack problem in [4]. In u-rcp the vertices have uniform (unit) profits (i.e., $p(v)=1$ for all $v\in V$). While u-rcp is known to admit a PTAS [4, 3], the question of whether the problem admits an EPTAS or even an FPTAS remained open (formal definitions relating to approximation schemes are given in Appendix LABEL:sec:uniform). Our next result gives a negative answer to both of these questions, posed in [4, 3].

Theorem 1.4

Assuming Gap-ETH, there is no EPTAS for u-rcp.

Finally, we show that rcp remains essentially just as hard when the in-degrees and out-degrees are bounded.

Theorem 1.5

A $\rho$ -approximation algorithm for rcp instances with in-degrees and out-degrees bounded by $2$ , for any $\rho\geq 1$ , implies a $\rho$ -approximation for rcp.

Due to space constraints, we include in the paper body only the proof of Theorem 1.1 and defer the proofs of the other theorems to the Appendix.

Techniques: Our algorithm for ct covers the vertices in a given out-tree $T$ in a bottom-up fashion, starting from the leaves. The key players in this process are vertices called anchors which define the candidate subtrees for covering in each iteration. Interestingly, we show that the subtree associated with a specific anchor $a$ (including also all of $a$ ’s ancestors) can be covered efficiently by using the naive NextFit algorithm.

To eliminate additive terms in the approximation guarantee (i.e., obtain an absolute ratio of $2$ ), a crucial step in the algorithm is to distinguish in each call to NextFit between the case where NextFit outputs an even vs. odd number of configurations. In the latter case, we discard the last configuration and cover the corresponding leftover vertices in a later iteration of the algorithm.

The crux of the analysis is to charge the number of subsets (i.e., configurations) used by the algorithm separately to each anchor. Consider a subtree of an anchor $a$, of total size $\textsc{sa}(a)$, covered at some iteration. Observing that each subset including vertices in this subtree must include also all the ancestors of $a$, we are able to show that the total number of subsets used is at most twice $\left\lfloor\frac{\textsc{sa}(a)}{k-h(a)+1}\right\rfloor$, where $h(a)$ is the total size of the ancestors of $a$ in $T$ (including $a$). To complete the analysis, we lower bound the number of subsets used in any feasible solution. This is done via an intricate calculation bounding the number of occurrences of each vertex $v$ in the subtree of an anchor $a$ in any feasible cover, which is the heart of the analysis. Our greedy approach may be useful for other classes of cpo instances in which the input graph $G$ has a tree-like structure (e.g., graphs of bounded treewidth).

Our proofs of hardness for ct and rcp use sophisticated constructions, most notably, to show a two-way reduction between rcp and dksh (in Appendices LABEL:sec:2 and LABEL:sec:6) and the hardness of rcp with bounded degrees (see Appendix LABEL:sec:inOut).

Organization: Section 2 presents our approximation algorithm for ct and the proof of Theorem 1.1, and Section 3 includes some open problems. In Appendix 0.A we describe common applications of rcp, and Appendix 0.B gives the hardness result for ct (proof of Theorem 1.2). Appendices LABEL:sec:2 and LABEL:sec:6 show the equivalence between rcp and dksh (proofs of Theorem 1.3 and Corollary 1). Appendix LABEL:sec:uniform shows that there is no EPTAS for u-rcp (Theorem 1.4), and Appendix LABEL:sec:inOut proves the hardness of rcp on graphs of in-degrees and out-degrees bounded by $2$ (Theorem 1.5). Finally, some missing proofs are given in Appendix LABEL:app:proofs.

2 Approximation Algorithm for ct

In this section, we present our approximation algorithm for ct. We start with some definitions and notations. Let $T=(V,E)$ be an out-tree rooted at a vertex $r\in V$ . Recall that in an out-tree all edges are oriented outwards from $r$ . Thus, for an edge $(u,v)\in E$ , vertex $u$ precedes $v$ on the (unique) path from $r$ to $v$ . We say that $u$ is the parent of $v$ and $v$ is a child of $u$ . More generally, if $u$ is on the (unique) path from $r$ to $v$ then $u$ is an ancestor of $v$ and $v$ is a descendant of $u$ . A vertex $v$ is considered an ancestor of itself but not a descendant of itself. Define $h(v)$ to be the total size of the vertices on the path from $r$ to $v$ , which equals the total size of the ancestors of $v$ .

For $U\subseteq V$ , let $T[U]$ be the subgraph of $T$ induced by $U$ . If $T[U]$ is connected, then we say that $T[U]$ is a subtree of $T$ . Note that in this case $T[U]$ is also an out-tree. From now on, we consider only induced subgraphs that are connected, namely subtrees of $T$ . If $r\in U$ , then $T[U]$ is a subtree of $T$ rooted at $r$ .

For an out-tree $T=(V,E)$ and a subset of vertices $U\subseteq V$ , let $\textsc{Ancs}_{{T}}({U})$ be the set of the ancestors in $T$ of the vertices in $U$ , and let $\textsc{Desc}_{{T}}({U})$ be the set of the descendants in $T$ of the vertices in $U$ . Note that if $T[U]$ is a subtree of $T$ rooted at $r$ , then $\textsc{Ancs}_{{T}}({U})=U$ . In case $U$ is a singleton set, we omit the set notation; that is, for $v\in V$ , let $\textsc{Ancs}_{{T}}({v})$ be the set of the ancestors of $v$ in $T$ , and let $\textsc{Desc}_{{T}}({v})$ be the set of the descendants of $v$ in $T$ .

We note that if there is a vertex $v\in V$ for which $h(v)>k$ then there is no feasible solution. Also, if there is a leaf $\ell$ of $T$ for which $h(\ell)=k$ then any solution must include the set $\textsc{Ancs}_{{T}}({\ell})$ (of size $k$ ), and after adding this set to the solution, we can remove $\ell$ and all of its ancestors which are not ancestors of any other leaf. Thus, w.l.o.g. we assume that for any vertex $v\in V$ it holds that $h(v)<k$ . Also, we note that if there is a leaf $\ell$ of $T$ of size $s(\ell)=0$ then we can remove $\ell$ , solve for the resulting tree and then add $\ell$ to a subset in the cover that includes the parent of $\ell$ in $T$ . Thus, w.l.o.g. we assume that for any leaf $\ell$ of $T$ , $s(\ell)>0$ .

The algorithm for computing a cover is iterative. In each iteration, we compute a partial cover as described below. We then continue to the next iteration with the subtree rooted at $r$ induced by the uncovered vertices and their ancestors. The algorithm terminates when either the set of uncovered vertices is empty or the total size of the vertices of the remaining subtree (rooted at $r$ ) is at most $k$ , in which case these vertices form the last set in the cover.

In each iteration $t$ of the algorithm, we compute a subset of vertices $A_{t}\subset V$ that we call anchors. We then compute a cover of some (potentially all) descendants of the anchors in $A_{t}$ , and proceed to the next iteration.

Algorithm 1 is the pseudo code of the iterative algorithm. Initially, $V_{1}=V$ . Consider the $t$ -th iteration, for $t\geq 1$ . If $s(V_{t})\leq k$ then the algorithm terminates. Otherwise, define $A_{t}$ as the set of all the vertices $v\in V_{t}$ such that (i) the total size of the descendants of $v$ in $T[V_{t}]$ is more than $k-h(v)$ , and (ii) the total size of the descendants of every child $u$ of $v$ in $T[V_{t}]$ is at most $k-h(u)=k-h(v)-s(u)$ .

Procedure NextFit given in Algorithm 2 is called for every $a\in A_{t}$ . The input to Procedure NextFit is the tree $T_{a}$ defined as the rooted subtree that consists of the path from $r$ to $a$ and the descendants of $a$ in the subtree $T[V_{t(a)}]$ (see Figure 0(a)). When called for an anchor $a$ , Procedure NextFit (Algorithm 2) computes a cover of some (potentially all) descendants of $a$ . The number of sets returned in this procedure call is even, and the total size of the descendants of $a$ that are not covered by the sets returned by Procedure NextFit is at most $k-h(a)$ . Let $U_{t}\subseteq V_{t}$ be the set of all descendants of anchors in $A_{t}$ that were covered in iteration $t$ , together with all their ancestors. If $V_{t}=U_{t}$ , then the algorithm terminates. Otherwise, we let $V_{t+1}$ be the set of ancestors of the vertices $V_{t}\setminus U_{t}$ in $T[V_{t}]$ and continue to iteration $t+1$ .

Algorithm 1 Feasible cover computation

1: Input: An out-tree $T=(V,E)$ rooted at $r$ and an integer $k>0$
2: Output: A feasible cover $\mathcal{C}=C_{1},\ldots,C_{c}$
3: $V_{1}\leftarrow V$
4: $\mathcal{C}\leftarrow\emptyset$
5: $t\leftarrow 1$
   while $s(V_{t})>k$ do
6:     $X_{t}\leftarrow\left\{u\in V_{t}\,|\,s(\textsc{Desc}_{{T[V_{t}]}}({u}))\leq k-h(u)\right\}$
7:     $A_{t}\leftarrow\left\{v\in V_{t}\setminus X_{t}\ |\ \mbox{all the children of }v\mbox{ in }T[V_{t}]\mbox{ are in }X_{t}\right\}$
8:     $U_{t}\leftarrow\emptyset$    ▷ $U_{t}$ stores the vertices covered in iteration $t$
       for $a\in A_{t}$ do
9:         $T_{a}\leftarrow T[\textsc{Ancs}_{{T[V_{t}]}}({a})\cup\textsc{Desc}_{{T[V_{t}]}}({a})]$
10:        $Q_{1},\dots,Q_{m}\leftarrow\textsc{NextFit}(a,T_{a},k)$
11:        Add $Q_{1},\dots,Q_{m}$ to $\mathcal{C}$    ▷ Add the partial cover computed by NextFit
12:        $U_{t}\leftarrow U_{t}\cup Q_{1}\cup\cdots\cup Q_{m}$
       end for
       if $V_{t}\setminus U_{t}\neq\emptyset$ then
13:        $V_{t+1}\leftarrow\textsc{Ancs}_{{T[V_{t}]}}({V_{t}\setminus U_{t}})$
       else
14:        $V_{t+1}\leftarrow\emptyset$
       end if
15:    $t\leftarrow t+1$
   end while
   if $V_{t}\neq\emptyset$ then    ▷ The last set in the cover
16:    Add $V_{t}$ to $\mathcal{C}$
   end if
   return $\mathcal{C}$

Algorithm 2 Next-Fit packing

   procedure NextFit($a,T_{a},k$)
1: Input: An anchor $a\in A$, the subtree $T_{a}$ and an integer $k>0$
2: Output: A feasible cover $Q_{1},\ldots,Q_{m}$ of some (potentially all) vertices in $\textsc{Desc}_{{T_{a}}}({a})$
3: Let $u_{1},\ldots,u_{d}$ be the children of $a$ in $T_{a}$
4: $m\leftarrow 1$
5: $Q_{m}\leftarrow\textsc{Ancs}_{{T_{a}}}({a})$
   for $s=1$ to $d$ do
       if $s(Q_{m})+s(\textsc{Desc}_{{T_{a}}}({u_{s}}))\leq k$ then
6:         $Q_{m}\leftarrow Q_{m}\cup\textsc{Desc}_{{T_{a}}}({u_{s}})$
       else
7:         $m\leftarrow m+1$
8:         $Q_{m}\leftarrow\textsc{Ancs}_{{T_{a}}}({a})\cup\textsc{Desc}_{{T_{a}}}({u_{s}})$
       end if
   end for
   if $m$ is odd then    ▷ Remove the subset $Q_{m}$ if $m$ is odd
9:     $m\leftarrow m-1$    ▷ Note that $m>1$
   end if
   return $Q_{1},\ldots,Q_{m}$
   end procedure
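Algorithms 1 and 2 can be sketched in Python as follows. This is an illustrative reading of the pseudocode under stated assumptions, not the authors' implementation: the tree is encoded by a hypothetical `parent` dictionary (the root maps to `None`), and when a child $u_{s}$ of an anchor is packed we add the whole subtree rooted at $u_{s}$ (that is, $u_{s}$ together with its descendants), which is our interpretation needed to keep every set closed under ancestors.

```python
from collections import defaultdict


def next_fit(anc_a, kids, subtree, size, k):
    """Sketch of Algorithm 2 for one anchor; returns an even number of sets."""
    sets = [set(anc_a)]                          # Q_1 starts with the ancestors of a
    for u in kids:
        load = sum(size[v] for v in sets[-1])
        extra = sum(size[v] for v in subtree[u])
        if load + extra <= k:
            sets[-1] |= subtree[u]               # the child subtree fits in Q_m
        else:
            sets.append(set(anc_a) | subtree[u])  # open a new set Q_{m+1}
    if len(sets) % 2 == 1:
        sets.pop()                               # odd m: discard Q_m (leftover vertices)
    return sets


def ct_cover(parent, size, k):
    """Sketch of Algorithm 1; `parent` maps each vertex to its parent (root -> None)."""
    def ancestors(v):
        path = set()
        while v is not None:                     # v counts as an ancestor of itself
            path.add(v)
            v = parent[v]
        return path

    h = {v: sum(size[u] for u in ancestors(v)) for v in parent}
    if any(h[v] > k for v in parent):
        raise ValueError("no feasible cover: some root-to-vertex path exceeds k")

    alive, cover = set(parent), []               # V_t and the collection C
    while sum(size[v] for v in alive) > k:
        children = defaultdict(list)
        for v in alive:
            if parent[v] is not None:
                children[parent[v]].append(v)

        subtree = {}                             # v with its descendants in T[V_t]

        def build(v):
            subtree[v] = {v}
            for c in children[v]:
                subtree[v] |= build(c)
            return subtree[v]

        build(next(v for v in alive if parent[v] is None))
        # Total size of the proper descendants of v in T[V_t].
        dsize = {v: sum(size[u] for u in subtree[v]) - size[v] for v in alive}
        light = {v for v in alive if dsize[v] <= k - h[v]}           # X_t
        anchors = [v for v in alive
                   if v not in light and all(c in light for c in children[v])]  # A_t
        covered = set()                                              # U_t
        for a in anchors:
            for q in next_fit(ancestors(a), children[a], subtree, size, k):
                cover.append(q)
                covered |= q
        remaining = alive - covered
        alive = set()
        for v in remaining:                      # V_{t+1}: ancestors of uncovered vertices
            alive |= ancestors(v)
    if alive:
        cover.append(alive)                      # the last set in the cover
    return cover
```

The sketch recomputes subtrees in every iteration and uses recursion, which keeps it short but is not tuned for large or deep trees. The order in which the children of an anchor are scanned is arbitrary here, as in the pseudocode.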

Let $A=\bigcup_{t}A_{t}=\{a_{1},a_{2},\ldots\}$ be the set of anchors computed in all the iterations. For an anchor $a\in A$ , let $t(a)$ be the iteration in which $a$ was added to the set of anchors. Note that any leaf $\ell$ of $T$ appears in exactly one subset in $\mathcal{C}$ . Thus, the iteration in which $\ell$ is covered is uniquely defined.

[Figure, panel (a): The subtree $T_{a}$]

Definition 1

Let $a\in A$ be an anchor.

• If $v\in\textsc{Desc}_{{T_{a}}}({a})$ is an ancestor of a leaf $\ell$ of $T$ that is covered in iteration $t(a)$ then we say that $v$ is anchored at $a$.
• If $v\in\textsc{Desc}_{{T_{a}}}({a})$ is not anchored at $a$ then we say that $v$ is a leftover vertex of $a$.
• Let $\textsc{sa}(a)$ denote the total size of the vertices that are anchored at $a$, and $\textsc{lo}(a)$ denote the total size of the leftover vertices of $a$.

Clearly, $\textsc{sa}(a)+\textsc{lo}(a)=s(\textsc{Desc}_{{T_{a}}}({a}))$ . Our assumption that for every leaf $\ell$ of $T$ , $s(\ell)>0$ , implies that (i) $\textsc{sa}(a)>0$ and (ii) if there are leftover vertices then $\textsc{lo}(a)>0$ .

The proofs of the next lemmas are given in Appendix LABEL:app:proofs.

Lemma 1

Let $v\in\textsc{Desc}_{{T_{a}}}({a})$ , and let $u_{s}$ be the (unique) child of $a$ that is also an ancestor of $v$ . If $v$ is a leftover vertex of $a$ then all the vertices in the subtree of $T_{a}$ rooted at $v$ , as well as the vertices along the path from $u_{s}$ to $v$ , are also leftover vertices of $a$ . If $v$ is anchored at $a$ then all the vertices in the subtree of $T_{a}$ rooted at $v$ , as well as the vertices along the path from $u_{s}$ to $v$ , are also anchored at $a$ .

Lemma 2

For any two anchors $a,a^{\prime}\in A$ , the sets of vertices anchored at $a$ and $a^{\prime}$ are disjoint.

Define a “parent-child” relation among anchors as follows. For two anchors $a$ and $b$ , we say that $a$ is the anchor-parent of $b$ and $b$ is the anchor-child of $a$ if (i) $a$ is an ancestor of $b$ in $T$ , and (ii) the path from $a$ to $b$ (in $T$ ) does not contain any anchors other than $a$ and $b$ . Note that if anchor $a$ is an anchor-parent of $b$ then $t(a)>t(b)$ ; that is, the iteration $t(a)$ in which $a$ is added to the set of anchors is after iteration $t(b)$ . This follows from the definition of $A_{t}$ . For anchor $a\in A$ , let $\textsc{AC}(a)\subset\textsc{Desc}_{{T}}({a})\cap A$ be the set of anchor-children of $a$ . We extend this definition for all $v\in V$ , and let $\textsc{AC}(v)\subset\textsc{Desc}_{{T}}({v})\cap A$ be all the anchors $b\in\textsc{Desc}_{{T}}({v})\cap A$ such that the path from $v$ to $b$ (in $T$ ) does not contain any anchors other than $b$ and (possibly) $v$ . For $v\in V$ , let $\textsc{AD}(v)=\textsc{Desc}_{{T}}({v})\cap A$ be the set of anchors that are also descendants of $v$ . A top anchor is an anchor that is not an anchor-child of any other anchor. Let $\textsc{top}A\subseteq A$ denote the set of top anchors. Note that if the root $r$ is an anchor then $\textsc{top}A=\left\{r\right\}$ .

Lemma 3

The number of subsets in the solution computed by Algorithm 1 is upper bounded by

$$\alpha+\sum_{a\in A}2\left\lfloor\frac{\textsc{sa}(a)}{k-h(a)+1}\right\rfloor,$$

where

$$\alpha=\begin{cases}1&\exists a\in\textsc{top}A\text{ s.t. }\textsc{lo}(a)>0\\ 1&\exists\text{ leaf }\ell\in V\text{ s.t. }\textsc{Ancs}_{{T}}({\ell})\cap\textsc{top}A=\emptyset\\ 0&\text{otherwise}\end{cases}$$

Proof

Let $\mathcal{Q}=Q_{1},\ldots,Q_{d}$ be the solution computed by the algorithm. Fix $a\in A$, and let $\mathcal{Q}_{a}$ be the subsets in $\mathcal{Q}$ that were returned by Procedure NextFit (Algorithm 2) when it computed a feasible cover of the vertices in $\textsc{Desc}_{{T_{a}}}({a})$. Note that the union of all the subsets in $\mathcal{Q}_{a}$ is the set of vertices anchored at $a$ (whose total size is $\textsc{sa}(a)$) together with all the ancestors of $a$. Also, $\mathcal{Q}_{a}$ consists of at least two subsets, and a vertex anchored at $a$ cannot appear in more than one subset in $\mathcal{Q}_{a}$. Consider the subsets in $\mathcal{Q}_{a}$ in the order in which they were computed by Algorithm 2. Every subset contains the ancestors of $a$, whose total size is $h(a)$, and Procedure NextFit opens a new subset only when the next child subtree does not fit in the current one. It follows that the total size of the vertices anchored at $a$ in any pair of consecutive subsets in this ordered list is at least $k-h(a)+1$. Since the total size of vertices anchored at $a$ is $\textsc{sa}(a)$, the number of such disjoint pairs is upper bounded by $\left\lfloor\frac{\textsc{sa}(a)}{k-h(a)+1}\right\rfloor$. By the final step of Procedure NextFit, which discards the last subset when the number of subsets is odd, the number of subsets in $\mathcal{Q}_{a}$ is even, and thus the total number of subsets in $\mathcal{Q}_{a}$ is upper bounded by $2\left\lfloor\frac{\textsc{sa}(a)}{k-h(a)+1}\right\rfloor$. The total upper bound is given by summing this bound over all anchors $a\in A$. We may have one additional subset if the algorithm terminates with $|V_{t}|>0$. By our construction and Lemma 1, this may happen iff there exists a leaf $\ell$ that is not anchored at any anchor. If such a leaf $\ell$ exists then one of the following two conditions must be satisfied: (i) $\ell$ has no ancestor that is an anchor, or (ii) $\ell$ is a leftover vertex of the (unique) top anchor $a$ that is an ancestor of $\ell$, in which case $\textsc{lo}(a)>0$. The lemma follows. ∎
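As a sanity check of the bound in Lemma 3, consider a small instance that is our own illustration and does not appear in the paper: a star whose root $r$ has size $0$ and whose four leaves each have size $2$, with $k=3$. The root is the only anchor, all four leaves are anchored at it, and there are no leftover vertices.

```latex
% Illustrative instance (an assumption for this example, not taken from the paper):
% star with s(r) = 0, four leaves of size 2, capacity k = 3.
\begin{align*}
h(r) &= 0, \qquad \textsc{sa}(r) = 4\cdot 2 = 8, \qquad \textsc{lo}(r) = 0, \qquad \alpha = 0,\\
\alpha + 2\left\lfloor\frac{\textsc{sa}(r)}{k-h(r)+1}\right\rfloor
  &= 0 + 2\left\lfloor\frac{8}{4}\right\rfloor = 4 .
\end{align*}
```

On this instance NextFit opens one set per leaf, so exactly four subsets are used and the bound is met with equality. With three leaves instead of four, the third set is discarded, so $\textsc{sa}(r)=4$, $\textsc{lo}(r)=2$ and $\alpha=1$, giving a bound of $1+2\left\lfloor 4/4\right\rfloor=3$.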

We now prove a lower bound on the number of subsets in any feasible solution and in particular in the optimal solution. Let $\mathcal{P}=P_{1},\dots,P_{p}$ be a feasible solution. Since every subset in $\mathcal{P}$ is closed under ancestor relation, some vertices may appear in multiple subsets. We refer to each such appearance of a vertex $v$ as an occurrence of $v$ , and associate the size $s(v)$ to each of its occurrences. For an anchor $a\in A$ , let $\mathcal{P}(a)$ be the set of all the subsets in $\mathcal{P}$ that contain vertices anchored at either $a$ or a descendant of $a$ . For an anchor $a\in A$ , let $\mathcal{U}(a)\subseteq\textsc{Desc}_{{T}}({a})$ be the set of all vertices such that each vertex is both a descendant of $a$ and an ancestor of a vertex anchored at either $a$ or a descendant of $a$ .

Lemma 4

For every $a\in A$ , the number of subsets in $\mathcal{P}(a)$ is at least

$$\textsc{LB}(a)=\sum_{b\in\textsc{AD}(a)\cup\left\{a\right\}}\left\lfloor\frac{\textsc{sa}(b)}{k-h(b)}\right\rfloor.$$

If the lower bound is tight then all the leaves that are in the subsets in $\mathcal{P}(a)$ must be anchored at either $a$ or a descendant of $a$ .

Proof

To prove the lower bound of $\textsc{LB}(a)$ on the number of subsets in $\mathcal{P}(a)$ for every $a\in A$ , we prove a slightly stronger lower bound of $\left(k-h(a)\right)\textsc{LB}(a)$ on the total size of the occurrences of vertices in $\mathcal{U}(a)$ in subsets in $\mathcal{P}(a)$ . Since any subset that contains a descendant of $a$ must contain also the ancestors of $a$ (including $a$ ), whose total size is $h(a)$ , the total size of the vertices in $\mathcal{U}(a)$ that can be in a single subset in $\mathcal{P}(a)$ is no more than $k-h(a)$ . Thus, a lower bound of $\left(k-h(a)\right)\textsc{LB}(a)$ on the total size of the occurrences of vertices in $\mathcal{U}(a)\subseteq\textsc{Desc}_{{T}}({a})$ in subsets in $\mathcal{P}(a)$ implies a lower bound of $\textsc{LB}(a)$ on the number of subsets in $\mathcal{P}(a)$ (and on the number of occurrences of anchor $a$ ).

The lower bound on the total size of the occurrences of vertices in $\mathcal{U}(a)$ in subsets in $\mathcal{P}(a)$ also implies that if the lower bound is tight then all the leaves that are in the subsets in $\mathcal{P}(a)$ must be anchored at either $a$ or a descendant of $a$. To see this, note that if any subset in $\mathcal{P}(a)$ contains a leaf $\ell$ that is not anchored at an anchor in $\textsc{AD}(a)\cup\left\{a\right\}$ then $\ell\notin\mathcal{U}(a)$; moreover, $s(\ell)>0$ by our assumption. It follows that the total size of the subsets in $\mathcal{P}(a)$ is strictly more than $\left(k-h(a)\right)\textsc{LB}(a)$. Clearly, this implies that the number of subsets in $\mathcal{P}(a)$ is strictly more than $\textsc{LB}(a)$.

The proof is by induction starting from the bottom anchors in $T$, which are the anchors with no anchor-children. For the induction base, consider a bottom anchor $a$. Note that in this case $\mathcal{U}(a)$ is the set of all vertices anchored at $a$. The subsets in $\mathcal{P}(a)$ cover all the vertices anchored at $a$; thus, the total size of the occurrences of these vertices in the subsets in $\mathcal{P}(a)$ is at least $\textsc{sa}(a)$. Clearly, $\textsc{sa}(a)\geq\left(k-h(a)\right)\left\lfloor\frac{\textsc{sa}(a)}{k-h(a)}\right\rfloor=\left(k-h(a)\right)\textsc{LB}(a)$. For the inductive step, consider an anchor $a$ and assume that the lemma holds for every anchor $b\in\textsc{AC}(a)$. Specifically, for every anchor $b\in\textsc{AC}(a)$, the total size of the occurrences of vertices in $\mathcal{U}(b)$ in subsets in $\mathcal{P}(b)$ is at least $\left(k-h(b)\right)\textsc{LB}(b)$. Note that $\mathcal{P}(b)\subseteq\mathcal{P}(a)$ and $\mathcal{U}(b)\subseteq\mathcal{U}(a)$. Let $S_{a}$ be the subtree of $T$ rooted at $a$ given by the union of the paths from $a$ to each of its anchor-children, excluding the anchor-children (see Figure 0(b)). Note that the vertices of $S_{a}$ as well as the vertices in $\textsc{AC}(a)$ are in $\mathcal{U}(a)$.

Claim 5

For every vertex $v$ of $S_{a}$ , the total size of the occurrences of vertices in $\textsc{Desc}_{{T}}({v})\cap\mathcal{U}(a)$ in subsets in $\bigcup_{b\in\textsc{AC}(v)}\mathcal{P}(b)$ is at least $\left(k-h(v)\right)\sum_{b\in\textsc{AC}(v)}{\textsc{LB}(b)}$ .

Proof (of Claim 5)

We prove the claim vertex by vertex, scanning the vertices of $S_{a}$ bottom-up. Consider a leaf $v$ of $S_{a}$. By the definition of $S_{a}$, its children are anchors in $\textsc{AC}(v)$. By the induction hypothesis of Lemma 4, for every anchor $b\in\textsc{AC}(v)$ the total size of the occurrences of vertices in $\mathcal{U}(b)$ in the subsets in $\mathcal{P}(b)$ is at least $\left(k-h(b)\right)\textsc{LB}(b)$. The total size of such occurrences that are contained in any single subset of $\mathcal{P}(b)$ is at most $k-h(b)$, since any such subset must also contain the ancestors of $b$ (including $b$) whose size is $h(b)$. It follows that the number of occurrences of $b$ in these subsets in $\mathcal{P}(b)$ is at least ${\textsc{LB}(b)}$, and the total size of these occurrences is at least $s(b)\cdot{\textsc{LB}(b)}$. Note that for any pair of anchors $b,b^{\prime}\in\textsc{AC}(v)$, the sets $\mathcal{U}(b)$ and $\mathcal{U}(b^{\prime})$ are disjoint. Summing over all the anchor-children of $v$, we have that the total size of the occurrences of vertices in $\textsc{Desc}_{{T}}({v})\cap\mathcal{U}(a)$ in the subsets in $\bigcup_{b\in\textsc{AC}(v)}\mathcal{P}(b)$ is at least $\sum_{b\in\textsc{AC}(v)}\textsc{LB}(b)\left(\left(k-h(b)\right)+{s(b)}\right)=\left(k-h(v)\right)\sum_{b\in\textsc{AC}(v)}{\textsc{LB}(b)}$. The last equality holds since for every $b\in\textsc{AC}(v)$, $h(v)+s(b)=h(b)$. The lower bound for an internal vertex of $S_{a}$ is obtained similarly. Note that a child $u$ of $v$ is either an anchor or a vertex of $S_{a}$. If $u$ is an anchor, that is $u\in\textsc{AC}(v)$, then as shown above, the total size of the occurrences of $u$ and its descendants in $\mathcal{U}(u)$ in the subsets in $\mathcal{P}(u)$ is at least $\left(k-h(u)+s(u)\right){\textsc{LB}(u)}=\left(k-h(v)\right){\textsc{LB}(u)}$. Suppose that $u$ is a vertex of $S_{a}$. Since $u$ is a child of $v$ and the vertices are scanned bottom-up, the lower bound holds for $u$, and the total size of the occurrences of vertices in $\textsc{Desc}_{{T}}({u})\cap\mathcal{U}(a)$ in the subsets in $\bigcup_{b\in\textsc{AC}(u)}\mathcal{P}(b)$ is at least $\left(k-h(u)\right)\sum_{b\in\textsc{AC}(u)}{\textsc{LB}(b)}$. The total size of such occurrences that is contained in any single subset in $\bigcup_{b\in\textsc{AC}(u)}\mathcal{P}(b)$ is at most $k-h(u)$; thus, the number of occurrences of $u$ in these subsets is at least $\sum_{b\in\textsc{AC}(u)}{\textsc{LB}(b)}$, and the total size of these occurrences is at least $s(u)\cdot\sum_{b\in\textsc{AC}(u)}{\textsc{LB}(b)}$. We get that the total size of the occurrences of $u$ and the vertices in $\textsc{Desc}_{{T}}({u})\cap\mathcal{U}(a)$ in the subsets in $\bigcup_{b\in\textsc{AC}(u)}\mathcal{P}(b)$ is at least $\left(k-h(u)+s(u)\right)\sum_{b\in\textsc{AC}(u)}{\textsc{LB}(b)}=\left(k-h(v)\right)\sum_{b\in\textsc{AC}(u)}{\textsc{LB}(b)}$. For any pair $u,u^{\prime}$ of children of $v$, the sets $\textsc{Desc}_{{T}}({u})$ and $\textsc{Desc}_{{T}}({u^{\prime}})$ are disjoint. Summing over all the children of $v$, we get that the total size of the occurrences of vertices in $\textsc{Desc}_{{T}}({v})\cap\mathcal{U}(a)$ in the subsets in $\bigcup_{b\in\textsc{AC}(v)}\mathcal{P}(b)$ is at least $\left(k-h(v)\right)\sum_{b\in\textsc{AC}(v)}{\textsc{LB}(b)}$. ∎

Next, we consider vertices that are anchored at $a$ . By the definition of $\mathcal{P}(a)$ , each such vertex $v$ must occur at least once in subsets in $\mathcal{P}(a)$ ; also, $v\in\mathcal{U}(a)$ . Note that $v$ may be an ancestor of an anchor $b\in\textsc{AD}(a)$ . This may happen in case $v$ is a vertex of $S_{a}$ , and also in case a leftover vertex of an anchor $b\in\textsc{AD}(a)$ is anchored at $a$ , and $v$ is on the path from $a$ to $b$ . In case $v$ is an ancestor of an anchor $b\in\textsc{AD}(a)$ , our induction hypothesis and Claim 5 already imply a lower bound on the number of its occurrences in subsets in $\mathcal{P}(a)$ . Specifically, in case $v\in\textsc{AD}(a)$ , our induction hypothesis implies a lower bound of ${\textsc{LB}(v)}$ on the number of its occurrences, and in case $v\notin\textsc{AD}(a)$ and $\textsc{AC}(v)\neq\emptyset$ , Claim 5 implies a lower bound of $\sum_{b\in\textsc{AC}(v)}{\textsc{LB}(b)}$ on the number of its occurrences. We prove that $v$ must occur at least once more in subsets in $\mathcal{P}(a)$ , in addition to this implied lower bound. This results in addition of $\textsc{sa}(a)$ to the total size of the occurrences of vertices anchored at $a$ in the subsets in $\mathcal{P}(a)$ .

Claim 6

For every vertex $v$ anchored at $a$ , the number of occurrences of $v$ in the subsets in $\mathcal{P}(a)$ is at least

\begin{cases}1&v\notin\textsc{Ancs}_{{T}}({\textsc{AD}(a)})\\ 1+{\textsc{LB}(v)}&v\in\textsc{AD}(a)\\ 1+\sum_{b\in\textsc{AC}(v)}{\textsc{LB}(b)}&\text{otherwise}\end{cases}

Proof (of Claim 6)

If $v$ is anchored at $a$ then it must be an ancestor of a leaf $\ell$ of $T$ that is anchored at $a$ . Certainly, $v$ must occur in the subset in $\mathcal{P}(a)$ that covers $\ell$ . If $v$ is not an ancestor of an anchor $b\in\textsc{AD}(a)$ , we are done. If $v$ is an anchor and thus $v\in\textsc{AD}(a)$ , and the number of occurrences of $v$ in the subsets in $\mathcal{P}(v)$ is strictly more than $\textsc{LB}(v)$ , then we are done. Otherwise, the lower bound $\textsc{LB}(v)$ is tight, and by the induction hypothesis of Lemma 4, all the leaves that are in the subsets in $\mathcal{P}(v)$ must be anchored at $v$ or a descendant of $v$ . Thus, none of these subsets can cover $\ell$ . It follows that $v$ must occur in at least one more subset in $\mathcal{P}(a)$ that covers $\ell$ . A similar argument applies also if $v\notin\textsc{AD}(a)$ and $\textsc{AC}(v)\neq\emptyset$ . Let $a^{\prime}\in\textsc{AD}(a)\cup\left\{a\right\}$ be the nearest ancestor of $v$ that is an anchor. By Claim 5, the total size of the occurrences of vertices in $\textsc{Desc}_{{T}}({v})\cap\mathcal{U}(a^{\prime})$ in subsets in $\bigcup_{b\in\textsc{AC}(v)}\mathcal{P}(b)$ is at least $\left(k-h(v)\right)\sum_{b\in\textsc{AC}(v)}{\textsc{LB}(b)}$ . It follows that the number of occurrences of $v$ in the subsets in $\bigcup_{b\in\textsc{AC}(v)}\mathcal{P}(b)$ is at least $\sum_{b\in\textsc{AC}(v)}{\textsc{LB}(b)}$ . If $\ell$ is not in any of the subsets in $\bigcup_{b\in\textsc{AC}(v)}\mathcal{P}(b)$ , then $v$ must occur in at least one more subset in $\mathcal{P}(a)$ that covers $\ell$ , and we are done. Suppose that this is not the case, and $\ell$ is in a subset in $\bigcup_{b\in\textsc{AC}(v)}\mathcal{P}(b)$ . It is not difficult to verify that the proof of Claim 5 implies the lower bound on the total size of the vertices in a subset of $\textsc{Desc}_{{T}}({v})\cap\mathcal{U}(a^{\prime})$ . 
This subset is the union of three sets: $\textsc{Desc}_{{T}}({v})\cap\left(\bigcup_{b\in\textsc{AC}(v)}\mathcal{U}(b)\right)$ , $\textsc{AC}(v)$ , and $\textsc{Desc}_{{S_{a^{\prime}}}}({v})$ . Clearly, $\ell$ is not in any of these three sets. Thus, the total size of the occurrences of vertices in $\textsc{Desc}_{{T}}({v})\cap\mathcal{U}(a)$ in the subsets in $\bigcup_{b\in\textsc{AC}(v)}\mathcal{P}(b)\subset\mathcal{P}(a)$ is strictly more than $\left(k-h(v)\right)\sum_{b\in\textsc{AC}(v)}{\textsc{LB}(b)}$ . Hence, the number of occurrences of $v$ in the subsets in $\bigcup_{b\in\textsc{AC}(v)}\mathcal{P}(b)$ is strictly more than $\sum_{b\in\textsc{AC}(v)}{\textsc{LB}(b)}$ .∎

By Claims 5 and 6 and our induction hypothesis we get that the total size of the occurrences of vertices in $\textsc{Desc}_{{T}}({a})\cap\mathcal{U}(a)$ in the subsets in $\mathcal{P}(a)$ is at least

\begin{aligned}\textsc{sa}(a)+\left(k-h(a)\right)\sum_{b\in\textsc{AC}(a)}{\textsc{LB}(b)}&\geq\left(k-h(a)\right)\left(\left\lfloor\frac{\textsc{sa}(a)}{k-h(a)}\right\rfloor+\sum_{b\in\textsc{AC}(a)}{\textsc{LB}(b)}\right)\\ &=\left(k-h(a)\right)\sum_{c\in\textsc{AD}(a)\cup\left\{a\right\}}\left\lfloor\frac{\textsc{sa}(c)}{k-h(c)}\right\rfloor=\left(k-h(a)\right)\textsc{LB}(a)\end{aligned}

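The inequality holds since $\textsc{sa}(a)\geq\left(k-h(a)\right)\left\lfloor\frac{\textsc{sa}(a)}{k-h(a)}\right\rfloor$ . The first equality holds since ${\textsc{LB}(b)}=\sum_{c\in\textsc{AD}(b)\cup\left\{b\right\}}\left\lfloor\frac{\textsc{sa}(c)}{k-h(c)}\right\rfloor$ . ∎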

Corollary 2

The number of subsets in any feasible solution is at least

\alpha+\sum_{a\in A}\left\lfloor\frac{\textsc{sa}(a)}{k-h(a)}\right\rfloor,

where $\alpha$ is defined in Lemma 3.
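
As an illustration only, the bound of Corollary 2 can be evaluated as in the following minimal Python sketch. The function name `cover_lower_bound` and the input format are hypothetical. The values $\textsc{sa}(a)$ and $h(a)$ are assumed to be precomputed for every anchor $a\in A$ , and $\alpha$ is passed in as a number, since its definition (Lemma 3) is not reproduced here. The sketch further assumes $h(a)<k$ for every anchor.

```python
def cover_lower_bound(k, anchors, alpha):
    """Evaluate alpha + sum over anchors a of floor(sa(a) / (k - h(a))).

    anchors: iterable of (sa, h) pairs, one pair per anchor a in A
             (assumed precomputed; this input format is hypothetical).
    alpha:   the additive term from Lemma 3, supplied by the caller.
    """
    total = alpha
    for sa, h in anchors:
        if h >= k:
            # Assumption of this sketch: an anchor leaves positive room k - h(a).
            raise ValueError("expected h(a) < k for every anchor")
        total += sa // (k - h)
    return total


# Two anchors with k = 10: floor(17 / 6) + floor(5 / 3) = 2 + 1, plus alpha = 1.
print(cover_lower_bound(10, [(17, 4), (5, 7)], 1))
```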

Proof

If $\textsc{top}A=\left\{r\right\}$ then by Lemma 4 the number of occurrences of $r$ is at least $\textsc{LB}(r)=\sum_{a\in A}\left\lfloor\frac{\textsc{sa}(a)}{k-h(a)}\right\rfloor$ . If this lower bound is tight then all the leaves in subsets in $\mathcal{P}(r)$ are anchored at some vertex. If $\textsc{lo}(r)>0$ , then there is a leaf of $T$ that is a leftover vertex of $r$ and thus not anchored at any vertex. In this case, at least one additional subset is needed to cover this leaf. If $\textsc{top}A\neq\left\{r\right\}$ then $\textsc{AC}(r)=\textsc{top}A$ . In this case, following the proof of Claim 5, we get that the total size of the occurrences of the descendants of $r$ in subsets in $\mathcal{P}$ is at least $\left(k-s(r)\right)\sum_{a\in\textsc{top}A}\textsc{LB}(a)$ . This implies that $r$ occurs in at least $\sum_{a\in\textsc{top}A}\textsc{LB}(a)=\sum_{a\in A}\left\lfloor\frac{\textsc{sa}(a)}{k-h(a)}\right\rfloor$ subsets in $\mathcal{P}$ . If the bound is tight then all the leaves in these subsets are anchored vertices. Thus, if there exists a vertex that is not anchored at any vertex, at least one additional subset is needed. This occurs when either $\exists a\in\textsc{top}A\text{ s.t. }\textsc{lo}(a)>0$ , or $\exists\text{ leaf }\ell\in V$ s.t. none of the ancestors of $\ell$ is an anchor. ∎

Corollary 2 and Lemma 3 imply a factor 2 approximation.

3 Open problems

An intriguing open problem is to bridge the gap between our $2$ -approximation and $1.5$ -inapproximability result for ct. Recall that ct is the special case of cpo on out-trees. While we expect cpo to be hard to approximate on general graphs (as mentioned above), exploring further the hardness of cpo on various graph classes remains open.

Another appealing line of research is to investigate the connections between cpo and a natural covering variant of the dksh problem defined as follows. Given a hypergraph $G=(V,E)$ and an integer $k$ , find the minimum number of vertex sets, each of cardinality at most $k$ , such that every hyperedge is fully contained in one of the sets. We are not aware of earlier studies of this problem, even in the special case where $G$ is a graph. One interesting direction is to derive nontrivial hardness results for this problem and show possible implications for cpo.

References

[1] Azzolini, D., Riguzzi, F., Lamma, E.: Studying transaction fees in the bitcoin blockchain with probabilistic logic programming. Information 10(11), 335 (2019)
[2] Biondi, M., Saliba, S., Harjunkoski, I.: Production optimization and scheduling in a steel plant: Hot rolling mill. In: 18th World Congress of the International Federation of Automatic Control. pp. 11750–11754 (2011)
[3] Bonsma, P.: Most balanced minimum cuts. Discrete Applied Mathematics 158(4), 261–276 (2010)
[4] Borradaile, G., Heeringa, B., Wilfong, G.: The knapsack problem with neighbour constraints. Journal of Discrete Algorithms 16, 224–235 (2012)
[5] Chalermsook, P., Cygan, M., Kortsarz, G., Laekhanukit, B., Manurangsi, P., Nanongkai, D., Trevisan, L.: From gap-ETH to FPT-inapproximability: Clique, dominating set, and more. In: 58th Annual Symposium on Foundations of Computer Science (FOCS). pp. 743–754 (2017)
[6] Cheng, T., Wang, K., Wang, L.C., Lee, C.W.: An in-switch rule caching and replacement algorithm in software defined networks. In: 2018 IEEE International Conference on Communications (ICC). pp. 1–6. IEEE (2018)
[7] Chlamtáč, E., Dinitz, M., Konrad, C., Kortsarz, G., Rabanca, G.: The densest k-subhypergraph problem. SIAM J. Discret. Math. 32(2), 1458–1477 (2018)
[8] Chlamtáč, E., Dinitz, M., Makarychev, Y.: Minimizing the union: Tight approximations for small set bipartite vertex expansion. In: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 881–899 (2017)
[9] Cygan, M., Fomin, F.V., Kowalik, Ł., Lokshtanov, D., Marx, D., Pilipczuk, M., Pilipczuk, M., Saurabh, S.: Parameterized algorithms, vol. 4 (2015)
[10] Dong, M., Li, H., Ota, K., Xiao, J.: Rule caching in sdn-enabled mobile access networks. IEEE Network 29(4), 40–45 (2015)
[11] Doron-Arad, I., Shachnai, H.: Approximating bin packing with conflict graphs via maximization techniques. arXiv preprint arXiv:2302.10613 (2023)
[12] Downey, R.G., Fellows, M.R.: Fundamentals of parameterized complexity, vol. 4 (2013)
[13] Esfandiari, H., Hajiaghayi, M., Könemann, J., Mahini, H., Malec, D., Sanità, L.: Approximate deadline-scheduling with precedence constraints. In: Algorithms-ESA 2015: 23rd Annual European Symposium, Patras, Greece, September 14-16, 2015, Proceedings. pp. 483–495 (2015)
[14] Gamage, S., Pasqual, A.: High-performance parallel packet classification architecture with popular rule caching. In: 2012 18th IEEE International Conference on Networks (ICON). pp. 52–57. IEEE (2012)
[15] Gao, P., Xu, Y., Chao, H.J.: Ovs-cab: Efficient rule-caching for open vswitch hardware offloading. Computer Networks 188, 107844 (2021)
[16] Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman (1979)
[17] Hajiaghayi, M., Jain, K., Konwar, K., Lau, L., Mandoiu, I., Russell, A., Shvartsman, A., Vazirani, V.: The minimum k-colored subgraph problem in haplotyping and dna primer selection. In: Proceedings of the International Workshop on Bioinformatics Research and Applications (IWBRA). pp. 1–12 (2006)
[18] Hoberg, R., Rothvoss, T.: A logarithmic additive integrality gap for bin packing. In: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 2616–2625 (2017)
[19] Huang, H., Guo, S., Li, P., Liang, W., Zomaya, A.Y.: Cost minimization for rule caching in software defined networking. IEEE Transactions on Parallel and Distributed Systems 27(4), 1007–1016 (2015)
[20] Ibarra, O.H., Kim, C.E.: Approximation algorithms for certain scheduling problems. Mathematics of Operations Research 3(3), 197–204 (1978)
[21] Katta, N., Alipourfard, O., Rexford, J., Walker, D.: Cacheflow: Dependency-aware rule-caching for software-defined networks. In: Proceedings of the Symposium on SDN Research. pp. 1–12 (2016)
[22] Lenstra, J.K., Kan, A.R., Brucker, P.: Complexity of machine scheduling problems. In: Annals of discrete mathematics, vol. 1, pp. 343–362 (1977)
[23] Li, H., Guo, S., Wu, C., Li, J.: Fdrc: Flow-driven rule caching optimization in software defined networking. In: 2015 IEEE International Conference on Communications (ICC). pp. 5777–5782. IEEE (2015)
[24] Li, R., Pang, Y., Zhao, J., Wang, X.: A tale of two (flow) tables: Demystifying rule caching in openflow switches. In: Proceedings of the 48th International Conference on Parallel Processing. pp. 1–10 (2019)
[25] Li, R., Zhao, B., Chen, R., Zhao, J.: Taming the wildcards: Towards dependency-free rule caching with freecache. In: 2020 IEEE/ACM 28th International Symposium on Quality of Service (IWQoS). pp. 1–10. IEEE (2020)
[26] Manurangsi, P.: Inapproximability of maximum edge biclique, maximum balanced biclique and minimum k-cut from the small set expansion hypothesis. In: 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017) (2017)
[27] McMenamin, C., Daza, V., Fitzi, M., O’Donoghue, P.: Fairtradex: A decentralised exchange preventing value extraction. In: Proceedings of the 2022 ACM CCS Workshop on Decentralized Finance and Security. pp. 39–46 (2022)
[28] Moreno, E., Espinoza, D., Goycoolea, M.: Large-scale multi-period precedence constrained knapsack problem: a mining application. Electronic notes in discrete mathematics 36, 407–414 (2010)
[29] Obadia, A., Salles, A., Sankar, L., Chitra, T., Chellani, V., Daian, P.: Unity is strength: A formalization of cross-domain maximal extractable value. arXiv preprint arXiv:2112.01472 (2021)
[30] Papazachos, Z.C., Karatza, H.D.: Gang scheduling with precedence constraints. In: Proceedings of the 2010 International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS’10). pp. 331–337 (2010)
[31] Pferschy, U., Scatamacchia, R.: Improved dynamic programming and approximation results for the knapsack problem with setups. International Transactions in Operational Research 25(2), 667–682 (2018)
[32] Rastegar, S.H., Abbasfar, A., Shah-Mansouri, V.: Rule caching in sdn-enabled base stations supporting massive iot devices with bursty traffic. IEEE Internet of Things Journal 7(9), 8917–8931 (2020)
[33] Rottenstreich, O., Kulik, A., Joshi, A., Rexford, J., Rétvári, G., Menasché, D.S.: Cooperative rule caching for sdn switches. In: 9th International Conference on Cloud Networking (CloudNet). pp. 1–7 (2020)
[34] Rottenstreich, O., Tapolcai, J.: Optimal rule caching and lossy compression for longest prefix matching. IEEE/ACM Transactions on Networking 25(2), 864–878 (2016)
[35] Samavati, M., Essam, D., Nehring, M., Sarker, R.: A methodology for the large-scale multi-period precedence-constrained knapsack problem: an application in the mining industry. International Journal of Production Economics 193, 12–20 (2017)
[36] Sarrar, N., Uhlig, S., Feldmann, A., Sherwood, R., Huang, X.: Leveraging zipf’s law for traffic offloading. ACM SIGCOMM Computer Communication Review 42(1), 16–22 (2012)
[37] Sheu, J.P., Chuo, Y.C.: Wildcard rules caching and cache replacement algorithms in software-defined networking. IEEE Transactions on Network and Service Management 13(1), 19–29 (2016)
[38] Stonebraker, M., Jhingran, A., Goh, J., Potamianos, S.: On rules, procedure, caching and views in data base systems. ACM SIGMOD Record 19(2), 281–290 (1990)
[39] Wang, Y.Z., Zheng, Z., Zhu, M.M., Zhang, K.T., Gao, X.Q.: An integrated production batch planning approach for steelmaking-continuous casting with cast batching plan as the core. Computers & Industrial Engineering 173, 108636 (2022)
[40] Weintraub, B., Torres, C.F., Nita-Rotaru, C., State, R.: A flash (bot) in the pan: measuring maximal extractable value in private pools. In: Proceedings of the 22nd ACM Internet Measurement Conference. pp. 458–471 (2022)
[41] Woeginger, G.J.: On the approximability of average completion time scheduling under precedence constraints. In: Automata, Languages and Programming: 28th International Colloquium, ICALP 2001 Crete, Greece, July 8–12, 2001 Proceedings. pp. 887–897 (2001)
[42] Yan, B., Xu, Y., Chao, H.J.: Adaptive wildcard rule cache management for software-defined networks. IEEE/ACM Transactions on Networking 26(2), 962–975 (2018)
[43] Yan, B., Xu, Y., Xing, H., Xi, K., Chao, H.J.: Cab: A reactive wildcard rule caching system for software-defined networks. In: Proceedings of the third workshop on Hot topics in software defined networking. pp. 163–168 (2014)
[44] Yang, J., Li, T., Yan, J., Li, J., Li, C., Wang, B.: Pipecache: High hit rate rule-caching scheme based on multi-stage cache tables. Electronics 9(6), 999 (2020)
[45] Ye, Y., Jiang, Z., Diao, X., Yang, D., Du, G.: An ontology-based hierarchical semantic modeling approach to clinical pathway workflows. Computers in Biology and Medicine 39, 722–732 (2009)

Appendix 0.A Motivation for rcp

A prime motivation for studying rcp comes from the area of networking [10, 43, 37, 19, 24, 38, 23, 15, 6, 34, 25, 14, 42, 33, 32, 44]. In a Software-Defined Network (SDN), traffic flow is governed by a logically centralized controller that utilizes packet-processing rules to manage the underlying switches [21]. The number of rules tends to be high while most traffic relies on a small fraction of these rules [36]. Thus, caching frequently used rules can accelerate the processing time of the packets. However, standard caching policies cannot be used due to dependencies among rules. One common form of dependency is a partial overlap in the binary strings representing the rules. For example, consider the rules $R_{1}$ =‘10**’ (where the symbol ‘*’ denotes a wildcard) and $R_{2}$ =‘1000’. Then whenever $R_{1}$ is placed in the cache, $R_{2}$ must be placed as well. Indeed, if only $R_{1}$ is in the cache then a message with a header ‘1000’ would be matched with $R_{1}$ , causing a correctness issue in handling this packet. Now, the problem of placing a feasible subset of the rules which handle a maximum total volume of traffic can be modeled as follows. We represent the rules by a DAG $G=(V,E)$ , where $v_{i}\in V$ corresponds to the rule $R_{i}$ , and there is a directed edge from $v_{i}$ to $v_{j}$ if placing $R_{j}$ in the cache implies that $R_{i}$ is also in the cache. The profit of each vertex $v_{i}\in V$ reflects the volume of traffic handled by the rule $R_{i}$ . The goal is to select a subset of vertices of maximum total profit that fits into the cache and is closed under precedence constraints.
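
The feasibility condition in this model can be illustrated by a minimal Python sketch, under the assumption that the dependency edges are already given as pairs $(v_{i},v_{j})$ . The helper names `is_feasible_cache` and `best_cache`, the rule $R_{3}$ , and the profit values are hypothetical and serve only as a toy instance; the exhaustive search takes exponential time and is not an algorithm from this paper.

```python
from itertools import combinations


def is_feasible_cache(cached, edges, k):
    """True if |cached| <= k and no edge enters `cached` from outside it."""
    if len(cached) > k:
        return False
    # Edge (u, v): placing v in the cache requires u to be cached as well.
    return all(u in cached for (u, v) in edges if v in cached)


def best_cache(profit, edges, k):
    """Exhaustively search all subsets of at most k rules (toy instances only)."""
    rules = sorted(profit)
    best, best_profit = frozenset(), 0
    for size in range(1, k + 1):
        for subset in combinations(rules, size):
            candidate = frozenset(subset)
            total = sum(profit[r] for r in candidate)
            if total > best_profit and is_feasible_cache(candidate, edges, k):
                best, best_profit = candidate, total
    return best, best_profit


# R1 = '10**' requires R2 = '1000', so there is an edge from R2 to R1.
edges = {("R2", "R1")}
profit = {"R1": 5, "R2": 1, "R3": 3}  # hypothetical traffic volumes

print(is_feasible_cache({"R1"}, edges, 2))        # R1 alone violates the dependency
print(is_feasible_cache({"R1", "R2"}, edges, 2))  # closed under the dependency
print(best_cache(profit, edges, 2))
```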

rcp can also be used to model the maximal extractable value (MEV) problem in blockchain [27, 29, 40, 1]. Each blockchain transaction is associated with a fee earned by the miner who creates the block containing this transaction. The set of transactions is associated with a partial order, and each blockchain prefix has to be closed under precedence constraints. MEV is the maximum potential profit that a blockchain miner can gain from transactions that have not been validated. Computing MEV can be cast as an rcp instance where the vertices of the graph are the transactions, the edges represent the precedence constraints, the profits are the associated fees, and the bound $k$ is the number of transactions that fit in a single block. Other applications of rcp variants arise, e.g., in the mining industry [28, 35] and in scheduling [30, 41, 13, 20, 22].

Appendix 0.B Hardness Result for CT

Our hardness result for ct is based on a reduction from bin packing with cluster complement conflict graph (bpcc). An undirected graph $G=(V,E)$ is called a cluster complement if there is a partition $V_{1},\ldots,V_{m}$ of $V$ such that for all $i\in[m]$ it holds that $V_{i}$ is an independent set in $G$ and for all $i,j\in[m]$ where $i\neq j$ and any $v\in V_{i}$ and $u\in V_{j}$ it holds that $\{u,v\}\in E$ . We now formally define the bpcc problem.
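
The definition of a cluster complement can be tested directly, as in the following minimal Python sketch. It relies on the observation that $G$ is a cluster complement exactly when non-adjacency is an equivalence relation on $V$ . The function name `cluster_complement_partition` is hypothetical, and the sketch assumes a simple undirected graph without self-loops.

```python
def cluster_complement_partition(vertices, edges):
    """Return the partition {V_1, ..., V_m} as a set of frozensets,
    or None if the graph is not a cluster complement.

    Assumes a simple undirected graph without self-loops.
    """
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    all_vertices = set(vertices)
    # Closed non-neighborhood: v together with every vertex not adjacent to v.
    non_nbrs = {v: frozenset(all_vertices - adj[v]) for v in vertices}
    for v in vertices:
        for u in non_nbrs[v]:
            # Non-adjacent vertices must share the same closed non-neighborhood.
            if non_nbrs[u] != non_nbrs[v]:
                return None
    return set(non_nbrs.values())


# Complete bipartite graph with sides {1, 2} and {3, 4}: a cluster complement.
print(cluster_complement_partition([1, 2, 3, 4], [(1, 3), (1, 4), (2, 3), (2, 4)]))
# A path on four vertices is not a cluster complement.
print(cluster_complement_partition([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)]))
```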

Definition 2

The bin packing with cluster complement conflict graph (bpcc) is defined as follows.
Input: A cluster complement $G=(V,E)$ , a weight function $w:V\rightarrow\mathbb{Z}^{+}_{0}$ , and a value $k\in\mathbb{N}$ .
Configuration: An independent set $C\subseteq V$ in $G$ such that $w(C)\leq k$ .
Solution: For some $q\in\mathbb{N}$ , we say that $\left(C_{1},\ldots,C_{q}\right)$ is a solution with cardinality $q$ if the following holds.

•

For every $i\in[q]$ it holds that $C_{i}$ is a configuration.
•

For all $v\in V$ there is $i\in[q]$ such that $v\in C_{i}$ .

Objective: Find a solution of minimum cardinality.
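
Definition 2 can be read as a feasibility check, sketched below in Python. The function name `is_bpcc_solution` and the toy instance are hypothetical; the sketch only verifies a given candidate solution and does not attempt to minimize its cardinality.

```python
def is_bpcc_solution(vertices, edges, w, k, bins):
    """Check that every set in `bins` is a configuration and that `bins` covers V."""
    edge_set = {frozenset(e) for e in edges}

    def is_configuration(c):
        # A configuration is an independent set of total weight at most k.
        independent = all(
            frozenset((u, v)) not in edge_set for u in c for v in c if u != v
        )
        return independent and sum(w[v] for v in c) <= k

    covered = set().union(*bins)
    return all(is_configuration(c) for c in bins) and covered >= set(vertices)


# Toy cluster complement with V_1 = {1, 2} and V_2 = {3, 4}.
vertices = [1, 2, 3, 4]
edges = [(1, 3), (1, 4), (2, 3), (2, 4)]
w = {1: 2, 2: 3, 3: 4, 4: 1}

print(is_bpcc_solution(vertices, edges, w, 5, [{1, 2}, {3, 4}]))  # feasible
print(is_bpcc_solution(vertices, edges, w, 5, [{1, 3}, {2, 4}]))  # conflicts inside a bin
```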

Proof of Theorem 1.2: We show a reduction from bpcc to ct. Let $I=(G=(V,E),w,k)$ be a bpcc instance. Let $V_{1},\ldots,V_{m}$ be the unique partition of $V$ into maximal independent sets, which exists and can be found in polynomial time since $G$ is a cluster complement. Then, define the reduced ct instance $X_{I}=(H=({\mathcal{V}},{\mathcal{E}}),s,K)$ as follows.

•

The vertex set ${\mathcal{V}}$ of $X_{I}$ contains a root $r$ and a vertex $r_{i}$ for every $i\in[m]$ , where $(r,r_{i})\in{\mathcal{E}}$ ( $r_{i}$ is a child of $r$ for every $i\in[m]$ ). For every $i\in[m]$ and every $v\in V_{i}$ define a leaf $\ell_{v}$ and add an edge $(r_{i},\ell_{v})\in{\mathcal{E}}$ . Overall, we get a two-level star graph.
•

Define the size function $s:{\mathcal{V}}\rightarrow\mathbb{Z}^{+}_{0}$ such that $s(r)=0$ , for all $i\in[m]$ define $s(r_{i})=2\cdot k$ , and for all $i\in[m]$ and $v\in V_{i}$ define $s(\ell_{v})=w(v)$ .
•

Define $K=3\cdot k$ .
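
The construction above can be sketched in Python as follows. The representation is an assumption of this sketch: the partition $V_{1},\ldots,V_{m}$ is given as a list of lists, the tree is returned as a child-list dictionary, and the node labels `"r"`, `("r", i)`, and `("leaf", v)` are hypothetical names for $r$ , $r_{i}$ , and $\ell_{v}$ .

```python
def reduce_bpcc_to_ct(parts, w, k):
    """Build the reduced ct instance (children, size, K) from a bpcc instance.

    parts: list of lists, the partition V_1, ..., V_m of V.
    w:     dict mapping each vertex of V to its weight.
    """
    root = "r"
    children = {root: []}
    size = {root: 0}                      # s(r) = 0
    for i, part in enumerate(parts, start=1):
        r_i = ("r", i)
        children[root].append(r_i)        # edge (r, r_i)
        children[r_i] = []
        size[r_i] = 2 * k                 # s(r_i) = 2k
        for v in part:
            leaf = ("leaf", v)
            children[r_i].append(leaf)    # edge (r_i, l_v)
            children[leaf] = []
            size[leaf] = w[v]             # s(l_v) = w(v)
    return children, size, 3 * k          # K = 3k


children, size, K = reduce_bpcc_to_ct([[1, 2], [3, 4]], {1: 2, 2: 3, 3: 4, 4: 1}, 5)
print(K, size[("r", 1)], size[("leaf", 3)])
```

The size $2k$ of each $r_{i}$ together with $K=3k$ means that a subset can contain at most one vertex $r_{i}$ , which mirrors the requirement that a bpcc configuration lies inside a single $V_{i}$ .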

For every $C\subseteq V$ , let

X(C)=\{r\}\cup\bigcup_{i\in[m]~{}|~{}C\cap V_{i}\neq\emptyset}\{r_{i}\}\cup\bigcup_{v\in C}\{\ell_{v}\}.

(1)

Claim 7

For every $C\subseteq V$ if $C$ is a configuration of $I$ then $X(C)$ is a configuration of $X_{I}$ .

Proof

Assume that $C$ is a configuration of $I$ . Observe that, by (1), $X(C)$ is closed under the precedence constraints. Moreover,

s(X(C))=s(r)+\sum_{i\in[m]~{}|~{}C\cap V_{i}\neq\emptyset}s(r_{i})+\sum_{v\in C}s(\ell_{v})=0+2k+w(C)\leq 3\cdot k=K.

The first equality follows from (1). The second equality holds since $C$ is a configuration; thus, it is an independent set in $G$ , and it can contain vertices from at most one $V_{i},i\in[m]$ . The inequality holds since $C$ is a configuration. We conclude that $X(C)$ is a configuration of $X_{I}$ . $\square$
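
The size computation in the proof of Claim 7 can be checked on a toy instance with the following Python sketch. The helper names `x_of` and `size_of_image` and the node labels `"r"`, `("r", i)`, and `("leaf", v)` are hypothetical stand-ins for $X(C)$ , $s(X(C))$ , $r$ , $r_{i}$ , and $\ell_{v}$ .

```python
def x_of(config, parts):
    """Compute X(C): the root, every r_i whose V_i meets C, and the leaves of C."""
    image = {"r"}
    for i, part in enumerate(parts, start=1):
        if set(part) & set(config):
            image.add(("r", i))
    image |= {("leaf", v) for v in config}
    return image


def size_of_image(config, parts, w, k):
    """Compute s(X(C)) = 0 + 2k * (number of V_i that meet C) + w(C)."""
    hit = sum(1 for part in parts if set(part) & set(config))
    return 2 * k * hit + sum(w[v] for v in config)


parts = [[1, 2], [3, 4]]
w = {1: 2, 2: 3, 3: 4, 4: 1}
k = 5

# C = {1, 2} is a configuration: it lies inside V_1 and w(C) = 5 <= k.
print(x_of({1, 2}, parts), size_of_image({1, 2}, parts, w, k))
# C = {1, 3} meets two parts, so its image already exceeds K = 3k.
print(size_of_image({1, 3}, parts, w, k))
```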

For every $C\subseteq{\mathcal{V}}$ let

I(C)=\bigcup_{\ell_{v}\in C~{}|~{}v\in V}\{v\}.

(2)

Claim 8

For every $C\subseteq{\mathcal{V}}$ if $C$ is a configuration of $X_{I}$ then $I(C)$ is a configuration of $I$ .