License: CC BY 4.0
arXiv:2403.01568v1 [cs.DS] 03 Mar 2024
(1) Computer Science Department, Technion, Haifa, Israel. Email: {idoron-arad,naor,hadas}@cs.technion.ac.il.
(2) Computer Science Department, Rutgers University-Camden, Camden, NJ, USA. Email: [email protected].
(3) Computer Science Department, New Jersey Institute of Technology, Newark, NJ, USA. Email: [email protected].

Approximations and Hardness of Covering and Packing Partially Ordered Items

Ilan Doron-Arad (1), Guy Kortsarz (2), Joseph (Seffi) Naor (1), Baruch Schieber (3), Hadas Shachnai (1)

Abstract

Motivated by applications in production planning and storage allocation in hierarchical databases, we initiate the study of covering partially ordered items (cpo). Given a value $k\in\mathbb{N}$, and a directed graph $G=(V,E)$ where each vertex has a size in $\{0,1,\ldots,k\}$, we seek a collection of subsets of vertices $C_1,\ldots,C_t$ that cover all the vertices, such that for any $1\leq j\leq t$, the total size of vertices in $C_j$ is bounded by $k$, and there are no edges from $V\setminus C_j$ to $C_j$. The objective is to minimize the number of subsets $t$. cpo is closely related to the rule caching problem (rcp) that has been widely studied in the networking area. The input for rcp is a directed graph $G=(V,E)$, a profit function $p:V\rightarrow\mathbb{Z}_0^+$, and $k\in\mathbb{N}$. The output is a subset $S\subseteq V$ of maximum profit such that $|S|\leq k$ and there are no edges from $V\setminus S$ to $S$.

Our main result is a $2$-approximation algorithm for cpo on out-trees, complemented by an asymptotic $1.5$-hardness of approximation result. We also give a two-way reduction between rcp and the densest $k$-subhypergraph problem, surprisingly showing that the problems are equivalent w.r.t. polynomial-time approximation within any factor $\rho\geq 1$. This implies that rcp cannot be approximated within factor $|V|^{1-\varepsilon}$ for any fixed $\varepsilon>0$, under standard complexity assumptions. Prior to this work, rcp was just known to be strongly NP-hard. We further show that there is no EPTAS for the special case of rcp where the profits are uniform, assuming Gap-ETH. Since this variant admits a PTAS, we essentially resolve the complexity status of this problem.

1 Introduction

Partially ordered entities are ubiquitous in the mathematical modeling of scheduling problems, distributed storage allocation, production planning, and unified language models. Often, the partial order represents either precedence constraints or dependencies among entities (or items). Motivated by applications in production planning [39, 2] and distributed storage allocation in hierarchical databases [45], we introduce the covering partially ordered items (cpo) problem. An instance of cpo consists of a directed graph $G=(V,E)$, a value $k\in\mathbb{N}$, and a size function $s:V\rightarrow[0:k]$, where for $i,j\in\mathbb{Z}_0^+$ we denote by $[i:j]$ the set of integers $\{i,i+1,\ldots,j\}$. A configuration is a subset of vertices $U\subseteq V$ such that $s(U)\leq k$ (for a set $A$, a function $f:A\rightarrow\mathbb{X}$, and $B\subseteq A$, we define $f(B)=\sum_{b\in B}f(b)$) and $U$ is closed under precedence constraints; that is, for any $u\in U$ and $(z,u)\in E$ it holds that $z\in U$. A feasible solution is a set of configurations $C_1,\ldots,C_t$ that covers $V$, namely $\bigcup_{j\in[1:t]}C_j=V$. The cardinality of the solution is $t$, the number of configurations. The goal is to find a feasible solution of minimum cardinality.
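The definitions of a configuration and of a feasible solution can be checked mechanically. The following is a minimal sketch, assuming the graph is given as a list of directed edges and the sizes as a dictionary; the function names `is_configuration` and `is_feasible_cover` are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]


def is_configuration(U: Set[Vertex], edges: List[Edge],
                     size: Dict[Vertex, int], k: int) -> bool:
    """True iff s(U) <= k and U is closed under precedence constraints."""
    if sum(size[v] for v in U) > k:
        return False
    # Closure: every edge (z, u) entering U must also start inside U.
    return all(z in U for (z, u) in edges if u in U)


def is_feasible_cover(cover: Iterable[Set[Vertex]], vertices: Iterable[Vertex],
                      edges: Iterable[Edge], size: Dict[Vertex, int],
                      k: int) -> bool:
    """True iff every set in the cover is a configuration and their union is V."""
    cover = [set(C) for C in cover]
    edge_list = list(edges)
    covered = set().union(*cover)
    return covered == set(vertices) and all(
        is_configuration(C, edge_list, size, k) for C in cover)
```

The cardinality of a feasible solution is then simply the number of sets in `cover`; the sketch only verifies feasibility and makes no attempt to minimize that number.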

cpo can be applied to optimize the distributed storage of large hierarchical data in unified medical language systems (UMLS) [45]. UMLS data is often distributed over several databases of bounded size. Due to the hierarchical nature of the medical taxonomy, each database needs to be closed under this hierarchy relation. The problem of minimizing the number of distributed databases of the UMLS data translates to a cpo problem instance.

Another application of cpo arises in production planning for steel mills that employ continuous casting [39, 2]. The steel-making process has high energy consumption. One way to save energy is by employing continuous casting and direct charging. In this routine, the molten steel is solidified into slabs and rolled into finished products of various sizes continuously, with no need to reheat the steel in the process. Each finished product requires specific casting, rolling, and thermal treatments in a given order, which can be modeled by a directed acyclic graph (DAG). A main challenge is to assign the finished products to batches whose size is dictated by the size of the ladle furnace while minimizing the amount of repeated operations. This gives rise to an instance of cpo.

A natural greedy approach for solving cpo is to repeatedly find, among all subsets of vertices that can be feasibly assigned to a single configuration, a subset that maximizes the size of yet unassigned vertices. This single configuration problem is a variant of the well known rule caching problem (rcp) that has been studied extensively [10, 43, 37, 19, 24, 38, 23, 15, 6, 34, 25, 14, 42, 33, 32, 44]. An instance of rcp consists of a directed graph $G=(V,E)$, a profit function $p:V\rightarrow\mathbb{Z}_0^+$, and a value $k\in\mathbb{N}$. We seek a subset of vertices $U\subseteq V$ which is closed under precedence constraints, such that $|U|\leq k$, and $p(U)=\sum_{u\in U}p(u)$ is maximized. In Appendix A we describe central applications of rcp in networking and the blockchain technology.
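To make the rcp objective concrete, here is a small exhaustive-search sketch. It is exponential in $|V|$ and is meant only as an illustration of the definition on tiny instances, not as an algorithm from the paper; the function name `rcp_brute_force` and the edge-list input format are assumptions.

```python
from itertools import combinations


def rcp_brute_force(vertices, edges, profit, k):
    """Return (U, p(U)) for a precedence-closed U with |U| <= k of maximum profit.

    Enumerates all subsets of size at most k, so it is only usable on toy inputs.
    """
    vertices = list(vertices)
    edges = list(edges)
    best_set, best_profit = frozenset(), 0
    for r in range(1, min(k, len(vertices)) + 1):
        for cand in combinations(vertices, r):
            U = frozenset(cand)
            # U is closed if no edge enters U from outside U.
            closed = all(z in U for (z, u) in edges if u in U)
            value = sum(profit[v] for v in U)
            if closed and value > best_profit:
                best_set, best_profit = U, value
    return best_set, best_profit
```

On an instance with edges `a -> b` and `a -> c`, an isolated vertex `d`, and profits `a: 0, b: 5, c: 4, d: 6`, choosing `b` forces `a` into the solution, which is the kind of dependency that makes the cardinality bound $k$ bite.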

Prior to this work rcp was just known to be strongly NP-hard [4, 31]. Our initial attempt towards solving cpo was to find a good approximation for rcp. Surprisingly, we were able to show an equivalence between rcp and the densest $k$-subhypergraph (dksh) problem w.r.t. approximability. The input for dksh consists of a hypergraph $G=(V,E)$ and a value $k\in\mathbb{N}$. The goal is to find a subset of vertices $S\subseteq V$ of cardinality $k$ that maximizes the number (or weight) of induced hyperedges (a more formal definition is given in LABEL:sec:2). dksh has also been widely studied (see, e.g., [7, 17, 8] and the references therein).

Unfortunately, dksh is known to be hard to approximate within a factor of $|V|^{1-\varepsilon}$, for $\varepsilon\in(0,1)$, assuming the Small Set Expansion Conjecture (by combining the results of [26] and [17]). This implies the same hardness of approximation for rcp (see Section 1.1). Given this hardness result, we expect cpo to be hard to approximate on general graphs. Thus, we consider the special case of cpo where $G$ is an out-tree. We call this problem covering partially ordered items on out-trees (ct). To the best of our knowledge, ct is studied here for the first time. We note that when $G$ is an in-tree, cpo is trivial since the problem has a feasible (and unique) solution iff the total size of the vertices is at most $k$.

1.1 Our results

Our first result is an approximation algorithm for ct. Recall that, for $\alpha\geq 1$, $\mathcal{A}$ is an $\alpha$-approximation algorithm for a minimization (maximization) problem $\Pi$ if, for any instance of $\Pi$, the output of $\mathcal{A}$ is at most $\alpha$ (at least $1/\alpha$) times the optimum.

Theorem 1.1

There is a polynomial time $2$-approximation algorithm for ct.

While out-trees have a simple structure, allowing for a greedy-based bottom-up approach in solving ct, the analysis of our approximation algorithm is nontrivial and requires extra care to make sure that the approximation bound has no additive terms (see below).

ct generalizes the classical bin packing (bp) problem. The input for bp is a set of items and a value $k\in\mathbb{N}$. Each item has a size in $[0:k]$, and the goal is to assign the items into a minimum number of bins of capacity $k$. (We use the definition of bp as given in [16]. In an alternative definition found in the literature, bin capacities are normalized to one, and item sizes are in $[0,1]$.) An instance of bp is reduced to an instance of ct on a star graph by generating a leaf for each item of the bp instance and adding a root vertex of size zero. This trivial reduction implies that ct is strongly NP-hard [16].
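The reduction from bp to ct on a star graph can be sketched in a few lines. The representation below (a root label, an edge list, and a size dictionary) and the name `bp_to_ct` are illustrative assumptions rather than notation from the paper.

```python
def bp_to_ct(item_sizes, k):
    """Build a star-shaped ct instance from a bin packing instance.

    Each item becomes a leaf of the same size, attached to a root of size zero.
    Returns (root, edges, size, k).
    """
    root = "root"
    size = {root: 0}
    edges = []
    for i, s in enumerate(item_sizes):
        leaf = ("item", i)          # hypothetical vertex label for item i
        size[leaf] = s
        edges.append((root, leaf))  # edges point away from the root (out-tree)
    return root, edges, size, k
```

Since the root has size zero and must appear in every nonempty configuration, each configuration of the star corresponds to the contents of one bin of capacity $k$, and vice versa.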

Interestingly, we show that in contrast to bp, ct does not admit an asymptotic polynomial-time approximation scheme (APTAS), or even an asymptotic approximation strictly better than $\frac{3}{2}$. This separates ct from bp which admits also an additive logarithmic approximation [18].

Theorem 1.2

For any $\alpha<\frac{3}{2}$, there is no asymptotic $\alpha$-approximation for ct unless P=NP.

Next, we study the hardness of rcp.

Theorem 1.3

For any $\rho\geq 1$, there is a $\rho$-approximation for rcp if and only if there is a $\rho$-approximation for dksh.

Corollary 1

Assuming the Small Set Expansion Hypothesis (SSEH) and NP $\neq$ BPP, for any $\varepsilon>0$ there is no $|V|^{1-\varepsilon}$-approximation for rcp.

We give a tight lower bound also for the previously studied special case of uniform rcp (u-rcp) [4, 3]. (In [4], u-rcp is called the uniform directed all-neighbor knapsack problem.) In u-rcp the vertices have uniform (unit) profits (i.e., $p(v)=1~\forall v\in V$). While u-rcp is known to admit a PTAS [4, 3], the question of whether the problem admits an EPTAS or even an FPTAS remained open. (We give formal definitions relating to approximation schemes in Appendix LABEL:sec:uniform.) Our next result gives a negative answer to both of these questions, posed in [4, 3].

Theorem 1.4

Assuming Gap-ETH, there is no EPTAS for u-rcp.

Finally, we show that rcp remains essentially just as hard when the in-degrees and out-degrees are bounded.

Theorem 1.5

A $\rho$-approximation algorithm for rcp instances with in-degrees and out-degrees bounded by $2$, for any $\rho\geq 1$, implies a $\rho$-approximation for rcp.

Due to space constraints, we include in the paper body only the proof of Theorem 1.1 and defer the proofs of the other theorems to the Appendix.

Techniques:  Our algorithm for ct covers the vertices in a given out-tree $T$ in a bottom-up fashion, starting from the leaves. The key players in this process are vertices called anchors which define the candidate subtrees for covering in each iteration. Interestingly, we show that the subtree associated with a specific anchor $a$ (including also all of $a$'s ancestors) can be covered efficiently by using the naive NextFit algorithm.

To eliminate additive terms in the approximation guarantee (i.e., obtain an absolute ratio of $2$), a crucial step in the algorithm is to distinguish in each call to NextFit between the case where NextFit outputs an even vs. odd number of configurations. In the latter case, we discard the last configuration and cover the corresponding leftover vertices in a later iteration of the algorithm.

The crux of the analysis is to charge the number of subsets (i.e., configurations) used by the algorithm separately to each anchor. Consider a subtree of an anchor $a$, of total size $\textsc{sa}(a)$, covered at some iteration. Observing that each subset including vertices in this subtree must include also all the ancestors of $a$, we are able to show that the total number of subsets used is at most twice $\left\lfloor\frac{\textsc{sa}(a)}{k-h(a)+1}\right\rfloor$, where $h(a)$ is the total size of the ancestors of $a$ in $T$ (including $a$). To complete the analysis, we lower bound the number of subsets used in any feasible solution. This is done via an intricate calculation bounding the number of occurrences of each vertex $v$ in the subtree of an anchor $a$ in any feasible cover, which is the heart of the analysis. Our greedy approach may be useful for other classes of cpo instances in which the input graph $G$ has a tree-like structure (e.g., graphs of bounded treewidth).

Our proofs of hardness for ct and rcp use sophisticated constructions, most notably, to show a two-way reduction between rcp and dksh (in Appendices LABEL:sec:2 and LABEL:sec:6) and the hardness of rcp with bounded degrees (see Appendix LABEL:sec:inOut).

Organization:  Section 2 presents our approximation algorithm for ct and the proof of Theorem 1.1, and Section 3 includes some open problems. In Appendix A we describe common applications of rcp, and Appendix B gives the hardness result for ct (proof of Theorem 1.2). Appendices LABEL:sec:2 and LABEL:sec:6 show the equivalence between rcp and dksh (proofs of Theorem 1.3 and Corollary 1). Appendix LABEL:sec:uniform shows that there is no EPTAS for u-rcp (Theorem 1.4), and Appendix LABEL:sec:inOut proves the hardness of rcp on graphs of in-degrees and out-degrees bounded by $2$ (Theorem 1.5). Finally, some missing proofs are given in Appendix LABEL:app:proofs.

2 Approximation Algorithm for ct

In this section, we present our approximation algorithm for ct. We start with some definitions and notations. Let $T=(V,E)$ be an out-tree rooted at a vertex $r\in V$. Recall that in an out-tree all edges are oriented outwards from $r$. Thus, for an edge $(u,v)\in E$, vertex $u$ precedes $v$ on the (unique) path from $r$ to $v$. We say that $u$ is the parent of $v$ and $v$ is a child of $u$. More generally, if $u$ is on the (unique) path from $r$ to $v$ then $u$ is an ancestor of $v$ and $v$ is a descendant of $u$. A vertex $v$ is considered an ancestor of itself but not a descendant of itself. Define $h(v)$ to be the total size of the vertices on the path from $r$ to $v$, which equals the total size of the ancestors of $v$.
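As an illustration of the quantity $h(v)$, the following sketch computes it for every vertex with a single traversal from the root. The input format (a `children` dictionary mapping each vertex to its list of children) and the name `path_sizes` are assumptions made for this example.

```python
def path_sizes(root, children, size):
    """Return a dict h with h[v] = total size on the root-to-v path, v included."""
    h = {root: size[root]}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in children.get(u, []):
            # The ancestors of v are the ancestors of its parent u, plus v itself.
            h[v] = h[u] + size[v]
            stack.append(v)
    return h
```

Because a vertex counts as its own ancestor, `h[root]` equals `size[root]` rather than zero.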

For $U\subseteq V$, let $T[U]$ be the subgraph of $T$ induced by $U$. If $T[U]$ is connected, then we say that $T[U]$ is a subtree of $T$. Note that in this case $T[U]$ is also an out-tree. From now on, we consider only induced subgraphs that are connected, namely subtrees of $T$. If $r\in U$, then $T[U]$ is a subtree of $T$ rooted at $r$.

For an out-tree $T=(V,E)$ and a subset of vertices $U\subseteq V$, let $\textsc{Ancs}_{T}(U)$ be the set of the ancestors in $T$ of the vertices in $U$, and let $\textsc{Desc}_{T}(U)$ be the set of the descendants in $T$ of the vertices in $U$. Note that if $T[U]$ is a subtree of $T$ rooted at $r$, then $\textsc{Ancs}_{T}(U)=U$. In case $U$ is a singleton set, we omit the set notation; that is, for $v\in V$, let $\textsc{Ancs}_{T}(v)$ be the set of the ancestors of $v$ in $T$, and let $\textsc{Desc}_{T}(v)$ be the set of the descendants of $v$ in $T$.

We note that if there is a vertex $v\in V$ for which $h(v)>k$ then there is no feasible solution. Also, if there is a leaf $\ell$ of $T$ for which $h(\ell)=k$ then any solution must include the set $\textsc{Ancs}_{T}(\ell)$ (of size $k$), and after adding this set to the solution, we can remove $\ell$ and all of its ancestors which are not ancestors of any other leaf. Thus, w.l.o.g. we assume that for any vertex $v\in V$ it holds that $h(v)<k$. Also, we note that if there is a leaf $\ell$ of $T$ of size $s(\ell)=0$ then we can remove $\ell$, solve for the resulting tree and then add $\ell$ to a subset in the cover that includes the parent of $\ell$ in $T$. Thus, w.l.o.g. we assume that for any leaf $\ell$ of $T$, $s(\ell)>0$.

The algorithm for computing a cover is iterative. In each iteration, we compute a partial cover as described below. We then continue to the next iteration with the subtree rooted at r𝑟ritalic_r induced by the uncovered vertices and their ancestors. The algorithm terminates when either the set of uncovered vertices is empty or the total size of the vertices of the remaining subtree (rooted at r𝑟ritalic_r) is at most k𝑘kitalic_k, in which case these vertices form the last set in the cover.

In each iteration $t$ of the algorithm, we compute a subset of vertices $A_t\subset V$ that we call anchors. We then compute a cover of some (potentially all) descendants of the anchors in $A_t$, and proceed to the next iteration.

Algorithm 1 is the pseudocode of the iterative algorithm. Initially, $V_1=V$. Consider the $t$-th iteration, for $t\geq 1$. If $s(V_t)\leq k$ then the algorithm terminates. Otherwise, define $A_t$ as the set of all the vertices $v\in V_t$ such that (i) the total size of the descendants of $v$ in $T[V_t]$ is more than $k-h(v)$, and (ii) the total size of the descendants of every child $u$ of $v$ in $T[V_t]$ is at most $k-h(u)=k-h(v)-s(u)$.
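Conditions (i) and (ii) translate directly into a two-pass computation over the current tree $T[V_t]$. The sketch below assumes that tree is passed in as a `children` dictionary restricted to $V_t$; the function name `find_anchors` and the input format are hypothetical.

```python
def find_anchors(root, children, size, k):
    """Return the anchor set A_t of the tree described by `children`.

    X is the set of vertices u whose descendants have total size at most
    k - h(u); an anchor is a vertex outside X all of whose children are in X.
    """
    # Pass 1: top-down order from the root, accumulating h(v) on the way.
    order, h = [root], {root: size[root]}
    i = 0
    while i < len(order):
        u = order[i]
        i += 1
        for v in children.get(u, []):
            h[v] = h[u] + size[v]
            order.append(v)
    # Pass 2: bottom-up, total size of the proper descendants of each vertex.
    desc = {}
    for u in reversed(order):
        desc[u] = sum(size[c] + desc[c] for c in children.get(u, []))
    X = {u for u in order if desc[u] <= k - h[u]}
    return {v for v in order
            if v not in X and all(c in X for c in children.get(v, []))}
```

For example, with $k=5$, a root of size 0 with children `a` and `b` of size 1, leaves of size 3 and 3 under `a`, and one leaf of size 2 under `b`, only `a` is an anchor: its descendants have total size $6>k-h(a)=4$, while the root is excluded because its child `a` is not in $X_t$.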

Procedure NextFit given in Algorithm 2 is called for every $a\in A_t$. The input to Procedure NextFit is the tree $T_a$ defined as the rooted subtree that consists of the path from $r$ to $a$ and the descendants of $a$ in the subtree $T[V_{t(a)}]$ (see Figure 1(a)). When called for an anchor $a$, Procedure NextFit (Algorithm 2) computes a cover of some (potentially all) descendants of $a$. The number of sets returned in this procedure call is even, and the total size of the descendants of $a$ that are not covered by the sets returned by Procedure NextFit is at most $k-h(a)$. Let $U_t\subseteq V_t$ be the set of all descendants of anchors in $A_t$ that were covered in iteration $t$, together with all their ancestors. If $V_t=U_t$, then the algorithm terminates. Otherwise, we let $V_{t+1}$ be the set of ancestors of the vertices $V_t\setminus U_t$ in $T[V_t]$ and continue to iteration $t+1$.

Algorithm 1 Feasible cover computation
Input: An out-tree $T=(V,E)$ rooted at $r$ and an integer $k>0$
Output: A feasible cover $\mathcal{C}=C_1,\ldots,C_c$
$V_1\leftarrow V$
$\mathcal{C}\leftarrow\emptyset$
$t\leftarrow 1$
while $s(V_t)>k$ do
    $X_t\leftarrow\{u\in V_t \mid s(\textsc{Desc}_{T[V_t]}(u))\leq k-h(u)\}$
    $A_t\leftarrow\{v\in V_t\setminus X_t \mid \text{all the children of } v \text{ in } T[V_t] \text{ are in } X_t\}$
    $U_t\leftarrow\emptyset$    ▷ $U_t$ stores the vertices covered in iteration $t$
    for $a\in A_t$ do
        $T_a\leftarrow T[\textsc{Ancs}_{T[V_t]}(a)\cup\textsc{Desc}_{T[V_t]}(a)]$
        $Q_1,\ldots,Q_m\leftarrow\textsc{NextFit}(a,T_a,k)$
        Add $Q_1,\ldots,Q_m$ to $\mathcal{C}$    ▷ Add the partial cover computed by NextFit
        $U_t\leftarrow U_t\cup Q_1\cup\cdots\cup Q_m$
    end for
    if $V_t\setminus U_t\neq\emptyset$ then
        $V_{t+1}\leftarrow\textsc{Ancs}_{T[V_t]}(V_t\setminus U_t)$
    else
        $V_{t+1}\leftarrow\emptyset$
    end if
    $t\leftarrow t+1$
end while
if $V_t\neq\emptyset$ then    ▷ The last set in the cover
    Add $V_t$ to $\mathcal{C}$
end if
return $\mathcal{C}$

Algorithm 2 Next-Fit packing
NextFita,Ta,k𝑎subscript𝑇𝑎𝑘a,T_{a},kitalic_a , italic_T start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , italic_k
1:Input: An anchor aA𝑎𝐴a\in Aitalic_a ∈ italic_A, the subtree Tasubscript𝑇𝑎T_{a}italic_T start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT and an integer k>0𝑘0k>0italic_k > 0
2:Output: A feasible cover Q1,,Qmsubscript𝑄1subscript𝑄𝑚Q_{1},\ldots,Q_{m}italic_Q start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_Q start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT of some (potentially all) vertices in DescTa(a)subscriptDescsubscript𝑇𝑎𝑎\textsc{Desc}_{{T_{a}}}({a})Desc start_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_a )
3: Let $u_1,\ldots,u_d$ be the children of $a$ in $T_a$
4: $m \leftarrow 1$
5: $Q_m \leftarrow \textsc{Ancs}_{T_a}(a)$
   for $s = 1$ to $d$ do
      if $s(Q_m) + s(\textsc{Desc}_{T_a}(u_s)) \leq k$ then
6:       $Q_m \leftarrow Q_m \cup \textsc{Desc}_{T_a}(u_s)$
      else
7:       $m \leftarrow m+1$
8:       $Q_m \leftarrow \textsc{Ancs}_{T_a}(a) \cup \textsc{Desc}_{T_a}(u_s)$
      end if
   end for
   if $m$ is odd then   $\triangleright$ Remove the subset $Q_m$ if $m$ is odd
9:    $m \leftarrow m-1$   $\triangleright$ Note that $m > 1$
   end if
   return $Q_1,\ldots,Q_m$
end procedure
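
As an illustration only, the steps of Procedure NextFit listed above can be sketched in Python. The function name `next_fit`, its parameter names, and the dictionary `size` are assumptions made for this sketch and are not taken from the paper. The sketch further assumes that $\textsc{Ancs}_{T_a}(a)$ contains $a$ itself, that each $\textsc{Desc}_{T_a}(u_s)$ contains $u_s$, and that every child subtree fits into a subset together with the ancestors.

```python
def next_fit(ancestors, child_subtrees, size, k):
    """Hypothetical sketch of the next-fit packing step.

    ancestors      -- set of vertices standing for Ancs_{T_a}(a) (assumed to include a)
    child_subtrees -- list of vertex sets standing for Desc_{T_a}(u_1), ..., Desc_{T_a}(u_d)
    size           -- dict mapping each vertex to its size s(v)
    k              -- capacity of a subset
    """
    def total(vertices):
        return sum(size[v] for v in vertices)

    bins = [set(ancestors)]                      # Q_1 <- Ancs(a)
    for subtree in child_subtrees:
        if total(bins[-1]) + total(subtree) <= k:
            bins[-1] |= subtree                  # subtree fits in the current subset
        else:
            bins.append(set(ancestors) | subtree)  # open a new subset

    if len(bins) % 2 == 1:
        # The pseudocode notes m > 1 at this point; the sketch checks it.
        assert len(bins) > 1, "expected at least two subsets before removing the last"
        bins.pop()                               # drop Q_m so that m is even
    return bins


# Example instance (made up for illustration).
example_size = {"a": 1, "u1": 2, "u2": 2, "u3": 2, "u4": 1}
example_bins = next_fit({"a"}, [{"u1"}, {"u2"}, {"u3"}, {"u4"}], example_size, 4)
```

In this made-up instance three subsets are opened, so the last one (holding `u3` and `u4`) is removed and its vertices are left uncovered by this call.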

Let $A = \bigcup_t A_t = \{a_1, a_2, \ldots\}$ be the set of anchors computed in all the iterations. For an anchor $a \in A$, let $t(a)$ be the iteration in which $a$ was added to the set of anchors. Note that any leaf $\ell$ of $T$ appears in exactly one subset in $\mathcal{C}$. Thus, the iteration in which $\ell$ is covered is uniquely defined.

Figure 1: The subtrees (a) $T_a$, defined in Algorithm 1, and (b) $S_a$, defined in the proof of Lemma 4.
Definition 1

Let $a \in A$ be an anchor.

  • If $v \in \textsc{Desc}_{T_a}(a)$ is an ancestor of a leaf $\ell$ of $T$ that is covered in iteration $t(a)$, then we say that $v$ is anchored at $a$.

  • If $v \in \textsc{Desc}_{T_a}(a)$ is not anchored at $a$, then we say that $v$ is a leftover vertex of $a$.

  • Let $\textsc{sa}(a)$ denote the total size of the vertices that are anchored at $a$, and $\textsc{lo}(a)$ denote the total size of the leftover vertices of $a$.

Clearly, $\textsc{sa}(a) + \textsc{lo}(a) = s(\textsc{Desc}_{T_a}(a))$. Our assumption that $s(\ell) > 0$ for every leaf $\ell$ of $T$ implies that (i) $\textsc{sa}(a) > 0$ and (ii) if there are leftover vertices then $\textsc{lo}(a) > 0$.

The proofs of the next lemmas are given in the appendix.

Lemma 1

Let $v \in \textsc{Desc}_{T_a}(a)$, and let $u_s$ be the (unique) child of $a$ that is also an ancestor of $v$. If $v$ is a leftover vertex of $a$ then all the vertices in the subtree of $T_a$ rooted at $v$, as well as the vertices along the path from $u_s$ to $v$, are also leftover vertices of $a$. If $v$ is anchored at $a$ then all the vertices in the subtree of $T_a$ rooted at $v$, as well as the vertices along the path from $u_s$ to $v$, are also anchored at $a$.

Lemma 2

For any two anchors $a, a' \in A$, the sets of vertices anchored at $a$ and $a'$ are disjoint.

Define a “parent-child” relation among anchors as follows. For two anchors $a$ and $b$, we say that $a$ is the anchor-parent of $b$ and $b$ is the anchor-child of $a$ if (i) $a$ is an ancestor of $b$ in $T$, and (ii) the path from $a$ to $b$ (in $T$) does not contain any anchors other than $a$ and $b$. Note that if anchor $a$ is an anchor-parent of $b$ then $t(a) > t(b)$; that is, the iteration $t(a)$ in which $a$ is added to the set of anchors is after iteration $t(b)$. This follows from the definition of $A_t$. For anchor $a \in A$, let $\textsc{AC}(a) \subset \textsc{Desc}_T(a) \cap A$ be the set of anchor-children of $a$. We extend this definition for all $v \in V$, and let $\textsc{AC}(v) \subset \textsc{Desc}_T(v) \cap A$ be all the anchors $b \in \textsc{Desc}_T(v) \cap A$ such that the path from $v$ to $b$ (in $T$) does not contain any anchors other than $b$ and (possibly) $v$. For $v \in V$, let $\textsc{AD}(v) = \textsc{Desc}_T(v) \cap A$ be the set of anchors that are also descendants of $v$.
A top anchor is an anchor that is not an anchor-child of any other anchor. Let $\textsc{top}A \subseteq A$ denote the set of top anchors. Note that if the root $r$ is an anchor then $\textsc{top}A = \{r\}$.
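
To make the anchor-parent relation concrete, the following Python sketch derives it from a parent map of $T$ and a set of anchors. The names `anchor_parents`, `top_anchors`, `parent`, and `anchors` are hypothetical and chosen only for this illustration. The sketch relies on the observation that the anchor-parent of $b$, if one exists, is the nearest proper ancestor of $b$ that is an anchor.

```python
def anchor_parents(parent, anchors):
    """Map each anchor to its anchor-parent, or to None if it has none.

    parent  -- dict mapping each non-root vertex of T to its parent
    anchors -- set of anchor vertices
    """
    result = {}
    for b in anchors:
        v = parent.get(b)                  # start at the parent of b
        while v is not None and v not in anchors:
            v = parent.get(v)              # climb until an anchor or the root is passed
        result[b] = v
    return result


def top_anchors(parent, anchors):
    """Anchors that are not an anchor-child of any other anchor."""
    return {b for b, p in anchor_parents(parent, anchors).items() if p is None}


# Made-up tree: r -> x -> a -> y -> b, and r -> c; the anchors are a, b, c.
example_parent = {"x": "r", "a": "x", "y": "a", "b": "y", "c": "r"}
example_anchors = {"a", "b", "c"}
example_top = top_anchors(example_parent, example_anchors)
```

Here `b` is an anchor-child of `a`, while `a` and `c` have no anchor above them and are therefore top anchors.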

Lemma 3

The number of subsets in the solution computed by Algorithm 1 is upper bounded by

\[
\alpha + \sum_{a \in A} 2 \left\lfloor \frac{\textsc{sa}(a)}{k - h(a) + 1} \right\rfloor,
\]

where

\[
\alpha = \begin{cases}
1 & \exists a \in \textsc{top}A \text{ s.t. } \textsc{lo}(a) > 0 \\
1 & \exists \text{ leaf } \ell \in V \text{ s.t. } \textsc{Ancs}_T(\ell) \cap \textsc{top}A = \emptyset \\
0 & \text{otherwise}
\end{cases}
\]
Proof

Let $\mathcal{Q} = Q_1, \ldots, Q_d$ be the solution computed by the algorithm. Fix $a \in A$, and let $\mathcal{Q}_a$ be the subsets in $\mathcal{Q}$ that were returned by Procedure NextFit (Algorithm 2) when it computed a feasible cover of the vertices in $\textsc{Desc}_{T_a}(a)$. Note that the union of all the subsets in $\mathcal{Q}_a$ is the set of vertices anchored at $a$ (whose total size is $\textsc{sa}(a)$) together with all the ancestors of $a$. Also, $\mathcal{Q}_a$ consists of at least two subsets, and a vertex anchored at $a$ cannot appear in more than one subset in $\mathcal{Q}_a$. Consider the subsets in $\mathcal{Q}_a$ in the order in which they were computed by Algorithm 2. It follows from Procedure NextFit that the total size of the vertices in any pair of consecutive subsets in this ordered list is at least $k - h(a) + 1$. Since the total size of vertices anchored at $a$ is $\textsc{sa}(a)$, the number of such disjoint pairs is upper bounded by $\left\lfloor \frac{\textsc{sa}(a)}{k - h(a) + 1} \right\rfloor$.
By Line 8 of Procedure NextFit, the number of sets in $\mathcal{Q}_a$ is even, and thus the total number of subsets in $\mathcal{Q}_a$ is upper bounded by $2 \left\lfloor \frac{\textsc{sa}(a)}{k - h(a) + 1} \right\rfloor$. The total upper bound is given by summing this bound over all anchors $a \in A$. We may have one additional subset if the algorithm is terminated when $|V_t| > 0$. By our construction and Lemma 1, this may happen iff there exists a leaf $\ell$ that is not anchored at any anchor. If such a leaf $\ell$ exists then one of the following two conditions must be satisfied: (i) $\ell$ has no ancestor that is an anchor, or (ii) $\ell$ is a leftover vertex of the (unique) top anchor $a$ that is an ancestor of $\ell$, in which case $\textsc{lo}(a) > 0$. The lemma follows. ∎
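
The bound of Lemma 3 is simple to evaluate once $\textsc{sa}(a)$ and $h(a)$ are known for every anchor. The Python sketch below does so; the function name `upper_bound` and the representation of anchors as `(sa, h)` pairs are assumptions of this illustration, and the value of $\alpha$ is taken as an input rather than derived from the tree.

```python
def upper_bound(anchor_data, k, alpha):
    """Evaluate alpha + sum over anchors of 2 * floor(sa(a) / (k - h(a) + 1)).

    anchor_data -- iterable of (sa, h) pairs, one per anchor (hypothetical encoding)
    k           -- capacity of a subset
    alpha       -- 0 or 1, as defined in Lemma 3
    """
    return alpha + sum(2 * (sa // (k - h + 1)) for sa, h in anchor_data)


# Made-up numbers: two anchors with (sa, h) = (7, 1) and (4, 2), capacity k = 4.
example_bound = upper_bound([(7, 1), (4, 2)], k=4, alpha=1)
```

For these made-up numbers each anchor contributes $2 \cdot 1$ subsets, and $\alpha = 1$ accounts for one additional subset.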

We now prove a lower bound on the number of subsets in any feasible solution and in particular in the optimal solution. Let $\mathcal{P} = P_1, \dots, P_p$ be a feasible solution. Since every subset in $\mathcal{P}$ is closed under ancestor relation, some vertices may appear in multiple subsets. We refer to each such appearance of a vertex $v$ as an occurrence of $v$, and associate the size $s(v)$ to each of its occurrences. For an anchor $a \in A$, let $\mathcal{P}(a)$ be the set of all the subsets in $\mathcal{P}$ that contain vertices anchored at either $a$ or a descendant of $a$. For an anchor $a \in A$, let $\mathcal{U}(a) \subseteq \textsc{Desc}_T(a)$ be the set of all vertices such that each vertex is both a descendant of $a$ and an ancestor of a vertex anchored at either $a$ or a descendant of $a$.

Lemma 4

For every $a \in A$, the number of subsets in $\mathcal{P}(a)$ is at least

\[
\textsc{LB}(a) = \sum_{b \in \textsc{AD}(a) \cup \{a\}} \left\lfloor \frac{\textsc{sa}(b)}{k - h(b)} \right\rfloor.
\]

If the lower bound is tight then all the leaves that are in the subsets in $\mathcal{P}(a)$ must be anchored at either $a$ or a descendant of $a$.

Proof

To prove the lower bound of $\textsc{LB}(a)$ on the number of subsets in $\mathcal{P}(a)$ for every $a \in A$, we prove a slightly stronger lower bound of $(k - h(a))\,\textsc{LB}(a)$ on the total size of the occurrences of vertices in $\mathcal{U}(a)$ in subsets in $\mathcal{P}(a)$. Since any subset that contains a descendant of $a$ must contain also the ancestors of $a$ (including $a$), whose total size is $h(a)$, the total size of the vertices in $\mathcal{U}(a)$ that can be in a single subset in $\mathcal{P}(a)$ is no more than $k - h(a)$. Thus, a lower bound of $(k - h(a))\,\textsc{LB}(a)$ on the total size of the occurrences of vertices in $\mathcal{U}(a) \subseteq \textsc{Desc}_T(a)$ in subsets in $\mathcal{P}(a)$ implies a lower bound of $\textsc{LB}(a)$ on the number of subsets in $\mathcal{P}(a)$ (and on the number of occurrences of anchor $a$).

The lower bound on the total size of the occurrences of vertices in $\mathcal{U}(a)$ in subsets in $\mathcal{P}(a)$ also implies that if the lower bound is tight then all the leaves that are in the subsets in $\mathcal{P}(a)$ must be anchored at either $a$ or a descendant of $a$. To see this, note that if any subset in $\mathcal{P}(a)$ contains a leaf $\ell$ that is not anchored at an anchor in $\textsc{AD}(a) \cup \{a\}$ then $\ell \notin \mathcal{U}(a)$; also, $s(\ell) > 0$ by our assumption. It follows that the total size of the subsets in $\mathcal{P}(a)$ is strictly more than $(k - h(a))\,\textsc{LB}(a)$. Clearly, this implies that the number of subsets in $\mathcal{P}(a)$ is strictly more than $\textsc{LB}(a)$.

The proof is by induction starting from the bottom anchors in $T$, which are the anchors with no anchor-children. For the induction base, consider a bottom anchor $a$. Note that in this case $\mathcal{U}(a)$ is the set of all vertices anchored at $a$. The subsets in $\mathcal{P}(a)$ cover all the vertices anchored at $a$; thus, the total size of the occurrences of these vertices in the subsets in $\mathcal{P}(a)$ is at least $\textsc{sa}(a)$. Clearly, $\textsc{sa}(a) \geq (k - h(a)) \left\lfloor \frac{\textsc{sa}(a)}{k - h(a)} \right\rfloor = (k - h(a))\,\textsc{LB}(a)$. For the inductive step, consider an anchor $a$ and assume that the lemma holds for every anchor $b \in \textsc{AC}(a)$. Specifically, for every anchor $b \in \textsc{AC}(a)$, the total size of the occurrences of vertices in $\mathcal{U}(b)$ in subsets in $\mathcal{P}(b)$ is at least $(k - h(b))\,\textsc{LB}(b)$. Note that $\mathcal{P}(b) \subseteq \mathcal{P}(a)$ and $\mathcal{U}(b) \subseteq \mathcal{U}(a)$. Let $S_a$ be the subtree of $T$ rooted at $a$ given by the union of the paths from $a$ to each of its anchor-children, excluding the anchor-children (see Figure 1(b)).
Note that the vertices of $S_a$ as well as the vertices in $\textsc{AC}(v)$ are in $\mathcal{U}(a)$.

Claim 5

For every vertex $v$ of $S_a$, the total size of the occurrences of vertices in $\textsc{Desc}_T(v) \cap \mathcal{U}(a)$ in subsets in $\bigcup_{b \in \textsc{AC}(v)} \mathcal{P}(b)$ is at least $(k - h(v)) \sum_{b \in \textsc{AC}(v)} \textsc{LB}(b)$.

Proof (of Claim 5)

We prove the claim vertex by vertex, scanning the vertices of $S_a$ bottom-up. Consider a leaf $v$ of $S_a$. By the definition of $S_a$, its children are anchors in $\textsc{AC}(v)$. By the induction hypothesis of Lemma 4, for every anchor $b \in \textsc{AC}(v)$ the total size of the occurrences of vertices in $\mathcal{U}(b)$ in the subsets in $\mathcal{P}(b)$ is at least $(k - h(b))\,\textsc{LB}(b)$. The total size of such occurrences that are contained in any single subset of $\mathcal{P}(b)$ is at most $k - h(b)$, since any such subset must also contain the ancestors of $b$ (including $b$) whose size is $h(b)$. It follows that the number of occurrences of $b$ in these subsets in $\mathcal{P}(b)$ is at least $\textsc{LB}(b)$, and the total size of these occurrences is at least $s(b) \cdot \textsc{LB}(b)$. Note that for any pair of anchors $b, b' \in \textsc{AC}(v)$, the sets $\mathcal{U}(b)$ and $\mathcal{U}(b')$ are disjoint.
Summing over all the anchor-children of $v$, we have that the total size of the occurrences of vertices in $\textsc{Desc}_T(v) \cap \mathcal{U}(a)$ in the subsets in $\bigcup_{b \in \textsc{AC}(v)} \mathcal{P}(b)$ is at least $\sum_{b \in \textsc{AC}(v)} \textsc{LB}(b) \left( (k - h(b)) + s(b) \right) = (k - h(v)) \sum_{b \in \textsc{AC}(v)} \textsc{LB}(b)$. The last equality holds since for every $b \in \textsc{AC}(v)$, $h(v) + s(b) = h(b)$. The lower bound for an internal vertex of $S_a$ is obtained similarly. Note that a child $u$ of $v$ is either an anchor or a vertex of $S_a$.
If $u$ is an anchor, that is $u \in \textsc{AC}(v)$, then as shown above, the total size of the occurrences of $u$ and its descendants in $\mathcal{U}(u)$ in the subsets in $\mathcal{P}(u)$ is $(k - h(u) + s(u))\,\textsc{LB}(u) = (k - h(v))\,\textsc{LB}(u)$. Suppose that $u$ is a vertex of $S_a$. Since $u$ is a child of $v$ and the vertices are scanned bottom-up, the lower bound holds for $u$, and the total size of the occurrences of vertices in $\textsc{Desc}_T(u) \cap \mathcal{U}(a)$ in the subsets in $\bigcup_{b \in \textsc{AC}(u)} \mathcal{P}(b)$ is $(k - h(u)) \sum_{b \in \textsc{AC}(u)} \textsc{LB}(b)$.
The total size of such occurrences that is contained in any single subset in $\bigcup_{b \in \textsc{AC}(u)} \mathcal{P}(b)$ is at most $k - h(u)$; thus, the number of occurrences of $u$ in these subsets is at least $\sum_{b \in \textsc{AC}(u)} \textsc{LB}(b)$, and the total size of these occurrences is at least $s(u) \cdot \sum_{b \in \textsc{AC}(u)} \textsc{LB}(b)$. We get that the total size of the occurrences of $u$ and the vertices in $\textsc{Desc}_T(u) \cap \mathcal{U}(a)$ in the subsets in $\bigcup_{b \in \textsc{AC}(u)} \mathcal{P}(b)$ is $(k - h(u) + s(u)) \sum_{b \in \textsc{AC}(u)} \textsc{LB}(b) = (k - h(v)) \sum_{b \in \textsc{AC}(u)} \textsc{LB}(b)$.
For any pair $u, u'$ of children of $v$, the sets $\textsc{Desc}_T(u)$ and $\textsc{Desc}_T(u')$ are disjoint. Summing over all the children of $v$, we get that the total size of the occurrences of vertices in $\textsc{Desc}_T(v) \cap \mathcal{U}(a)$ in the subsets in $\bigcup_{b \in \textsc{AC}(v)} \mathcal{P}(b)$ is at least $(k - h(v)) \sum_{b \in \textsc{AC}(v)} \textsc{LB}(b)$. ∎

Next, we consider vertices that are anchored at $a$. By the definition of $\mathcal{P}(a)$, each such vertex $v$ must occur at least once in subsets in $\mathcal{P}(a)$; also, $v \in \mathcal{U}(a)$. Note that $v$ may be an ancestor of an anchor $b \in \textsc{AD}(a)$. This may happen in case $v$ is a vertex of $S_a$, and also in case a leftover vertex of an anchor $b \in \textsc{AD}(a)$ is anchored at $a$, and $v$ is on the path from $a$ to $b$. In case $v$ is an ancestor of an anchor $b \in \textsc{AD}(a)$, our induction hypothesis and Claim 5 already imply a lower bound on the number of its occurrences in subsets in $\mathcal{P}(a)$. Specifically, in case $v \in \textsc{AD}(a)$, our induction hypothesis implies a lower bound of $\textsc{LB}(v)$ on the number of its occurrences, and in case $v \notin \textsc{AD}(a)$ and $\textsc{AC}(v) \neq \emptyset$, Claim 5 implies a lower bound of $\sum_{b \in \textsc{AC}(v)} \textsc{LB}(b)$ on the number of its occurrences. We prove that $v$ must occur at least once more in subsets in $\mathcal{P}(a)$, in addition to this implied lower bound. This results in addition of $\textsc{sa}(a)$ to the total size of the occurrences of vertices anchored at $a$ in the subsets in $\mathcal{P}(a)$.

Claim 6

For every vertex $v$ anchored at $a$, the number of occurrences of $v$ in the subsets in $\mathcal{P}(a)$ is at least

\[
\begin{cases}
1 & v \notin \textsc{Ancs}_T(\textsc{AD}(a)) \\
1 + \textsc{LB}(v) & v \in \textsc{AD}(a) \\
1 + \sum_{b \in \textsc{AC}(v)} \textsc{LB}(b) & \text{otherwise}
\end{cases}
\]
Proof (of Claim 6)

If $v$ is anchored at $a$ then it must be an ancestor of a leaf $\ell$ of $T$ that is anchored at $a$. Certainly, $v$ must occur in the subset in $\mathcal{P}(a)$ that covers $\ell$. If $v$ is not an ancestor of an anchor $b \in \textsc{AD}(a)$, we are done. If $v$ is an anchor and thus $v \in \textsc{AD}(a)$, and the number of occurrences of $v$ in the subsets in $\mathcal{P}(v)$ is strictly more than $\textsc{LB}(v)$, then we are done. Otherwise, the lower bound $\textsc{LB}(v)$ is tight, and by the induction hypothesis of Lemma 4, all the leaves that are in the subsets in $\mathcal{P}(v)$ must be anchored at $v$ or a descendant of $v$. Thus, none of these subsets can cover $\ell$. It follows that $v$ must occur in at least one more subset in $\mathcal{P}(a)$ that covers $\ell$. A similar argument applies also if $v \notin \textsc{AD}(a)$ and $\textsc{AC}(v) \neq \emptyset$. Let $a' \in \textsc{AD}(a) \cup \{a\}$ be the nearest ancestor of $v$ that is an anchor.
By Claim 5, the total size of the occurrences of vertices in $\textsc{Desc}_T(v) \cap \mathcal{U}(a')$ in subsets in $\bigcup_{b \in \textsc{AC}(v)} \mathcal{P}(b)$ is at least $(k - h(v)) \sum_{b \in \textsc{AC}(v)} \textsc{LB}(b)$. It follows that the number of occurrences of $v$ in the subsets in $\bigcup_{b \in \textsc{AC}(v)} \mathcal{P}(b)$ is at least $\sum_{b \in \textsc{AC}(v)} \textsc{LB}(b)$. If $\ell$ is not in any of the subsets in $\bigcup_{b \in \textsc{AC}(v)} \mathcal{P}(b)$, then $v$ must occur in at least one more subset in $\mathcal{P}(a)$ that covers $\ell$, and we are done. Suppose that this is not the case, and $\ell$ is in a subset in $\bigcup_{b \in \textsc{AC}(v)} \mathcal{P}(b)$.
It is not difficult to verify that the proof of Claim 5 implies the lower bound on the total size of the vertices in a subset of $\textsc{Desc}_T(v) \cap \mathcal{U}(a')$. This subset is the union of three sets: $\textsc{Desc}_T(v) \cap \left( \bigcup_{b \in \textsc{AC}(v)} \mathcal{U}(b) \right)$, $\textsc{AC}(v)$, and $\textsc{Desc}_{S_{a'}}(v)$. Clearly, $\ell$ is not in any of these three sets. Thus, the total size of the occurrences of vertices in $\textsc{Desc}_T(v) \cap \mathcal{U}(a)$ in the subsets in $\bigcup_{b \in \textsc{AC}(v)} \mathcal{P}(b) \subset \mathcal{P}(a)$ is strictly more than $(k - h(v)) \sum_{b \in \textsc{AC}(v)} \textsc{LB}(b)$.
Hence, the number of occurrences of $v$ in the subsets in $\bigcup_{b \in \textsc{AC}(v)} \mathcal{P}(b)$ is strictly more than $\sum_{b \in \textsc{AC}(v)} \textsc{LB}(b)$. ∎

By Claims 5 and 6 and our induction hypothesis we get that the total size of the occurrences of vertices in $\textsc{Desc}_{T}(a) \cap \mathcal{U}(a)$ in the subsets in $\mathcal{P}(a)$ is at least

\begin{align*}
\textsc{sa}(a) &+ \left(k-h(a)\right) \sum_{b \in \textsc{AC}(a)} \textsc{LB}(b) \geq \left(k-h(a)\right) \left( \left\lfloor \frac{\textsc{sa}(a)}{k-h(a)} \right\rfloor + \sum_{b \in \textsc{AC}(a)} \textsc{LB}(b) \right) \\
&= \left(k-h(a)\right) \sum_{c \in \textsc{AD}(a) \cup \{a\}} \left\lfloor \frac{\textsc{sa}(c)}{k-h(c)} \right\rfloor = \left(k-h(a)\right) \textsc{LB}(a).
\end{align*}

The first equality holds since $\textsc{LB}(b) = \sum_{c \in \textsc{AD}(b) \cup \{b\}} \left\lfloor \frac{\textsc{sa}(c)}{k-h(c)} \right\rfloor$. ∎

Corollary 2

The number of subsets in any feasible solution is at least

$$\alpha + \sum_{a \in A} \left\lfloor \frac{\textsc{sa}(a)}{k-h(a)} \right\rfloor,$$

where $\alpha$ is defined in Lemma 3.

Proof

If $\textsc{top}A = \{r\}$ then by Lemma 4 the number of occurrences of $r$ is at least $\textsc{LB}(r) = \sum_{a \in A} \left\lfloor \frac{\textsc{sa}(a)}{k-h(a)} \right\rfloor$. If this lower bound is tight then all the leaves in subsets in $\mathcal{P}(r)$ are anchored at some vertex. If $\textsc{lo}(r) > 0$, then there is a leaf of $T$ that is a leftover vertex of $r$ and thus not anchored at any vertex. In this case, at least one additional subset is needed to cover this leaf. If $\textsc{top}A \neq \{r\}$ then $\textsc{AC}(r) = \textsc{top}A$. In this case, following the proof of Claim 5, we get that the total size of the occurrences of the descendants of $r$ in subsets in $\mathcal{P}$ is at least $\left(k-s(r)\right) \sum_{a \in \textsc{top}A} \textsc{LB}(a)$. This implies that $r$ occurs in at least $\sum_{a \in \textsc{top}A} \textsc{LB}(a) = \sum_{a \in A} \left\lfloor \frac{\textsc{sa}(a)}{k-h(a)} \right\rfloor$ subsets in $\mathcal{P}$. If the bound is tight then all the leaves in these subsets are anchored vertices.
Thus, if there exists a vertex that is not anchored at any vertex, at least one additional subset is needed. This occurs when either $\exists a \in \textsc{top}A$ s.t. $\textsc{lo}(a) > 0$, or $\exists$ leaf $\ell \in V$ s.t. none of the ancestors of $\ell$ is an anchor. ∎

Corollary 2 and Lemma 3 imply a factor 2 approximation.
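
As a rough illustration of how the bound of Corollary 2 is evaluated, the sketch below assumes that the quantities $\textsc{sa}(a)$, $h(a)$ and $\alpha$ have already been computed as defined earlier in the paper. The function name `cover_lower_bound`, the dictionary layout, and the sample numbers are hypothetical, and the sketch additionally assumes $k - h(a) > 0$ for every anchor so that the quotient is well defined.

```python
def cover_lower_bound(alpha, sa, h, k):
    """Evaluate alpha + sum over anchors a of floor(sa(a) / (k - h(a))).

    sa and h are dictionaries keyed by anchor; alpha is taken as given
    (its definition in Lemma 3 is not reproduced here).
    """
    total = alpha
    for a, anchored_size in sa.items():
        slack = k - h[a]
        if slack <= 0:
            # Assumption of this sketch: every anchor leaves positive room.
            raise ValueError(f"anchor {a!r} has k - h(a) = {slack}")
        total += anchored_size // slack  # integer floor division
    return total


# Hypothetical toy values, not taken from the paper.
sa = {"a1": 7, "a2": 4}
h = {"a1": 2, "a2": 1}
print(cover_lower_bound(alpha=1, sa=sa, h=h, k=5))  # prints 4
```

With these toy values the two anchors contribute $\lfloor 7/3 \rfloor = 2$ and $\lfloor 4/4 \rfloor = 1$, so the bound is $1 + 2 + 1 = 4$.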

3 Open problems

An intriguing open problem is to bridge the gap between our $2$-approximation and $1.5$-inapproximability result for ct. Recall that ct is the special case of cpo on out-trees. While we expect cpo to be hard to approximate on general graphs (as mentioned above), exploring further the hardness of cpo on various graph classes remains open.

Another appealing line of research is to investigate the connections between cpo and a natural covering variant of the dksh problem defined as follows. Given a hypergraph $G=(V,E)$ and an integer $k$, find the minimum number of vertex sets, each of cardinality at most $k$, such that every hyperedge is fully contained in one of the sets. We are not aware of earlier studies of this problem, even in the special case where $G$ is a graph. One interesting direction is to derive nontrivial hardness results for this problem and show possible implications for cpo.
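
To make the feasibility condition of this covering variant concrete, here is a minimal sketch of a checker. It only verifies a candidate family of vertex sets and does not attempt to minimize their number; the function name `is_hyperedge_cover` and the toy hypergraph are assumptions made for illustration.

```python
def is_hyperedge_cover(hyperedges, vertex_sets, k):
    """Check that every set has at most k vertices and that every
    hyperedge is fully contained in at least one of the sets."""
    sets = [frozenset(s) for s in vertex_sets]
    if any(len(s) > k for s in sets):
        return False
    return all(any(frozenset(e) <= s for s in sets) for e in hyperedges)


# Hypothetical toy hypergraph on vertices 1..5.
hyperedges = [{1, 2, 3}, {3, 4}, {4, 5}]
print(is_hyperedge_cover(hyperedges, [{1, 2, 3}, {3, 4, 5}], k=3))
print(is_hyperedge_cover(hyperedges, [{1, 2, 3}, {4, 5}], k=3))
```

The first family is feasible, while the second fails because the hyperedge $\{3,4\}$ is split between the two sets.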

References

  • [1] Azzolini, D., Riguzzi, F., Lamma, E.: Studying transaction fees in the bitcoin blockchain with probabilistic logic programming. Information 10(11), 335 (2019)
  • [2] Biondi, M., Saliba, S., Harjunkoski, I.: Production optimization and scheduling in a steel plant: Hot rolling mill. In: 18th World Congress of the International Federation of Automatic Control. pp. 11750–11754 (2011)
  • [3] Bonsma, P.: Most balanced minimum cuts. Discrete Applied Mathematics 158(4), 261–276 (2010)
  • [4] Borradaile, G., Heeringa, B., Wilfong, G.: The knapsack problem with neighbour constraints. Journal of Discrete Algorithms 16, 224–235 (2012)
  • [5] Chalermsook, P., Cygan, M., Kortsarz, G., Laekhanukit, B., Manurangsi, P., Nanongkai, D., Trevisan, L.: From gap-ETH to FPT-inapproximability: Clique, dominating set, and more. In: 58th Annual Symposium on Foundations of Computer Science (FOCS). pp. 743–754 (2017)
  • [6] Cheng, T., Wang, K., Wang, L.C., Lee, C.W.: An in-switch rule caching and replacement algorithm in software defined networks. In: 2018 IEEE International Conference on Communications (ICC). pp. 1–6. IEEE (2018)
  • [7] Chlamtáč, E., Dinitz, M., Konrad, C., Kortsarz, G., Rabanca, G.: The densest k-subhypergraph problem. SIAM J. Discret. Math. 32(2), 1458–1477 (2018)
  • [8] Chlamtáč, E., Dinitz, M., Makarychev, Y.: Minimizing the union: Tight approximations for small set bipartite vertex expansion. In: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 881–899 (2017)
  • [9] Cygan, M., Fomin, F.V., Kowalik, Ł., Lokshtanov, D., Marx, D., Pilipczuk, M., Pilipczuk, M., Saurabh, S.: Parameterized algorithms, vol. 4 (2015)
  • [10] Dong, M., Li, H., Ota, K., Xiao, J.: Rule caching in sdn-enabled mobile access networks. IEEE Network 29(4), 40–45 (2015)
  • [11] Doron-Arad, I., Shachnai, H.: Approximating bin packing with conflict graphs via maximization techniques. arXiv preprint arXiv:2302.10613 (2023)
  • [12] Downey, R.G., Fellows, M.R.: Fundamentals of parameterized complexity, vol. 4 (2013)
  • [13] Esfandiari, H., Hajiaghayi, M., Könemann, J., Mahini, H., Malec, D., Sanita, L.: Approximate deadline-scheduling with precedence constraints. In: Algorithms-ESA 2015: 23rd Annual European Symposium, Patras, Greece, September 14-16, 2015, Proceedings. pp. 483–495 (2015)
  • [14] Gamage, S., Pasqual, A.: High-performance parallel packet classification architecture with popular rule caching. In: 2012 18th IEEE International Conference on Networks (ICON). pp. 52–57. IEEE (2012)
  • [15] Gao, P., Xu, Y., Chao, H.J.: Ovs-cab: Efficient rule-caching for open vswitch hardware offloading. Computer Networks 188, 107844 (2021)
  • [16] Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman (1979)
  • [17] Hajiaghayi, M., Jain, K., Konwar, K., Lau, L., Mandoiu, I., Russell, A., Shvartsman, A., Vazirani, V.: The minimum k-colored subgraph problem in haplotyping and DNA primer selection. In: Proceedings of the International Workshop on Bioinformatics Research and Applications (IWBRA). pp. 1–12 (2006)
  • [18] Hoberg, R., Rothvoss, T.: A logarithmic additive integrality gap for bin packing. In: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 2616–2625 (2017)
  • [19] Huang, H., Guo, S., Li, P., Liang, W., Zomaya, A.Y.: Cost minimization for rule caching in software defined networking. IEEE Transactions on Parallel and Distributed Systems 27(4), 1007–1016 (2015)
  • [20] Ibarra, O.H., Kim, C.E.: Approximation algorithms for certain scheduling problems. Mathematics of Operations Research 3(3), 197–204 (1978)
  • [21] Katta, N., Alipourfard, O., Rexford, J., Walker, D.: Cacheflow: Dependency-aware rule-caching for software-defined networks. In: Proceedings of the Symposium on SDN Research. pp. 1–12 (2016)
  • [22] Lenstra, J.K., Kan, A.R., Brucker, P.: Complexity of machine scheduling problems. In: Annals of discrete mathematics, vol. 1, pp. 343–362 (1977)
  • [23] Li, H., Guo, S., Wu, C., Li, J.: Fdrc: Flow-driven rule caching optimization in software defined networking. In: 2015 IEEE International Conference on Communications (ICC). pp. 5777–5782. IEEE (2015)
  • [24] Li, R., Pang, Y., Zhao, J., Wang, X.: A tale of two (flow) tables: Demystifying rule caching in openflow switches. In: Proceedings of the 48th International Conference on Parallel Processing. pp. 1–10 (2019)
  • [25] Li, R., Zhao, B., Chen, R., Zhao, J.: Taming the wildcards: Towards dependency-free rule caching with freecache. In: 2020 IEEE/ACM 28th International Symposium on Quality of Service (IWQoS). pp. 1–10. IEEE (2020)
  • [26] Manurangsi, P.: Inapproximability of maximum edge biclique, maximum balanced biclique and minimum k-cut from the small set expansion hypothesis. In: 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017) (2017)
  • [27] McMenamin, C., Daza, V., Fitzi, M., O’Donoghue, P.: Fairtradex: A decentralised exchange preventing value extraction. In: Proceedings of the 2022 ACM CCS Workshop on Decentralized Finance and Security. pp. 39–46 (2022)
  • [28] Moreno, E., Espinoza, D., Goycoolea, M.: Large-scale multi-period precedence constrained knapsack problem: a mining application. Electronic notes in discrete mathematics 36, 407–414 (2010)
  • [29] Obadia, A., Salles, A., Sankar, L., Chitra, T., Chellani, V., Daian, P.: Unity is strength: A formalization of cross-domain maximal extractable value. arXiv preprint arXiv:2112.01472 (2021)
  • [30] Papazachos, Z.C., Karatza, H.D.: Gang scheduling with precedence constraints. In: Proceedings of the 2010 International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS’10). pp. 331–337 (2010)
  • [31] Pferschy, U., Scatamacchia, R.: Improved dynamic programming and approximation results for the knapsack problem with setups. International Transactions in Operational Research 25(2), 667–682 (2018)
  • [32] Rastegar, S.H., Abbasfar, A., Shah-Mansouri, V.: Rule caching in sdn-enabled base stations supporting massive iot devices with bursty traffic. IEEE Internet of Things Journal 7(9), 8917–8931 (2020)
  • [33] Rottenstreich, O., Kulik, A., Joshi, A., Rexford, J., Rétvári, G., Menasché, D.S.: Cooperative rule caching for sdn switches. In: 9th International Conference on Cloud Networking (CloudNet). pp. 1–7 (2020)
  • [34] Rottenstreich, O., Tapolcai, J.: Optimal rule caching and lossy compression for longest prefix matching. IEEE/ACM Transactions on Networking 25(2), 864–878 (2016)
  • [35] Samavati, M., Essam, D., Nehring, M., Sarker, R.: A methodology for the large-scale multi-period precedence-constrained knapsack problem: an application in the mining industry. International Journal of Production Economics 193, 12–20 (2017)
  • [36] Sarrar, N., Uhlig, S., Feldmann, A., Sherwood, R., Huang, X.: Leveraging zipf’s law for traffic offloading. ACM SIGCOMM Computer Communication Review 42(1), 16–22 (2012)
  • [37] Sheu, J.P., Chuo, Y.C.: Wildcard rules caching and cache replacement algorithms in software-defined networking. IEEE Transactions on Network and Service Management 13(1), 19–29 (2016)
  • [38] Stonebraker, M., Jhingran, A., Goh, J., Potamianos, S.: On rules, procedure, caching and views in data base systems. ACM SIGMOD Record 19(2), 281–290 (1990)
  • [39] Wang, Y.Z., Zheng, Z., Zhu, M.M., Zhang, K.T., Gao, X.Q.: An integrated production batch planning approach for steelmaking-continuous casting with cast batching plan as the core. Computers & Industrial Engineering 173, 108636 (2022)
  • [40] Weintraub, B., Torres, C.F., Nita-Rotaru, C., State, R.: A flash (bot) in the pan: measuring maximal extractable value in private pools. In: Proceedings of the 22nd ACM Internet Measurement Conference. pp. 458–471 (2022)
  • [41] Woeginger, G.J.: On the approximability of average completion time scheduling under precedence constraints. In: Automata, Languages and Programming: 28th International Colloquium, ICALP 2001 Crete, Greece, July 8–12, 2001 Proceedings. pp. 887–897 (2001)
  • [42] Yan, B., Xu, Y., Chao, H.J.: Adaptive wildcard rule cache management for software-defined networks. IEEE/ACM Transactions on Networking 26(2), 962–975 (2018)
  • [43] Yan, B., Xu, Y., Xing, H., Xi, K., Chao, H.J.: Cab: A reactive wildcard rule caching system for software-defined networks. In: Proceedings of the third workshop on Hot topics in software defined networking. pp. 163–168 (2014)
  • [44] Yang, J., Li, T., Yan, J., Li, J., Li, C., Wang, B.: Pipecache: High hit rate rule-caching scheme based on multi-stage cache tables. Electronics 9(6), 999 (2020)
  • [45] Ye, Y., Jiang, Z., Diao, X., Yang, D., Du, G.: An ontology-based hierarchical semantic modeling approach to clinical pathway workflows. Computers in Biology and Medicine 39, 722–732 (2009)

Appendix 0.A Motivation for rcp

A prime motivation for studying rcp comes from the area of networking [10, 43, 37, 19, 24, 38, 23, 15, 6, 34, 25, 14, 42, 33, 32, 44]. In a Software-Defined Network (SDN) traffic flow is governed by a logically centralized controller that utilizes packet-processing rules to manage the underlying switches [21]. The number of rules tends to be high while most traffic relies on a small fraction of these rules [36]. Thus, caching frequently used rules can accelerate the processing time of the packets. However, standard caching policies cannot be used due to dependencies among rules. One common form of dependency is a partial overlap in the binary strings representing the rules. For example, consider the rules $R_1$ = ‘10**’ (where the symbol ‘*’ denotes a wildcard) and $R_2$ = ‘1000’. Then whenever $R_1$ is placed in the cache, $R_2$ must be placed as well. Indeed, if only $R_1$ is in the cache then a message with a header ‘1000’ would be matched with $R_1$, causing a correctness issue in handling this packet. Now, the problem of placing a feasible subset of the rules which handle a maximum total volume of traffic can be modeled as follows.
We represent the rules by a DAG $G=(V,E)$, where $v_i \in V$ corresponds to the rule $R_i$, and there is a directed edge from $v_i$ to $v_j$ if placing $R_j$ in the cache implies that $R_i$ is also in the cache. The profit of each vertex $v_i \in V$ reflects the volume of traffic handled by the rule $R_i$. The goal is to select a subset of vertices of maximum total profit which fits into the cache and is closed under precedence constraints.
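
The model above can be sketched on a tiny instance as follows. This is an exhaustive search meant only to illustrate the feasibility condition (at most $k$ vertices, no edge entering the chosen set from outside); it takes exponential time and is not an algorithm from the paper. The names `is_closed` and `best_cache`, the third rule `R3`, and all profit values are hypothetical.

```python
from itertools import combinations


def is_closed(subset, edges):
    """True if no edge (u, v) has u outside the subset and v inside it."""
    return all(u in subset for (u, v) in edges if v in subset)


def best_cache(vertices, edges, profit, k):
    """Brute-force the maximum-profit closed subset with at most k vertices."""
    best, best_profit = frozenset(), 0
    for size in range(1, min(k, len(vertices)) + 1):
        for combo in combinations(vertices, size):
            s = frozenset(combo)
            if is_closed(s, edges):
                p = sum(profit[v] for v in s)
                if p > best_profit:
                    best, best_profit = s, p
    return best, best_profit


# R1 = '10**' may be cached only together with R2 = '1000',
# so there is an edge from R2 to R1. R3 is an unrelated rule.
vertices = ["R1", "R2", "R3"]
edges = [("R2", "R1")]
profit = {"R1": 10, "R2": 1, "R3": 6}  # made-up traffic volumes

print(best_cache(vertices, edges, profit, k=2))
print(best_cache(vertices, edges, profit, k=1))
```

With a cache of size 2 the search picks $\{R_1, R_2\}$ (profit 11) even though $R_2$ alone is nearly worthless, because $R_1$ cannot be cached without it; with size 1 only $R_3$ (profit 6) is worth caching.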

rcp can also be used to model the maximal extractable value (MEV) problem in blockchain [27, 29, 40, 1]. Each blockchain transaction is associated with a fee earned by the miner who creates the block containing this transaction. The set of transactions is associated with a partial order, and each blockchain prefix has to be closed under precedence constraints. MEV is the maximum potential profit that a blockchain miner can gain from transactions that have not been validated. Computing MEV can be cast as an rcp instance where the vertices of the graph are the transactions, the edges represent the precedence constraints, the profits are the associated fees, and the bound $k$ is the number of transactions that fit in a single block. Other applications of rcp variants arise, e.g., in the mining industry [28, 35] and in scheduling [30, 41, 13, 20, 22].

Appendix 0.B Hardness Result for CT

Our hardness result for ct is based on a reduction from bin packing with cluster complement conflict graph (bpcc). An undirected graph $G=(V,E)$ is called a cluster complement if there is a partition $V_1,\ldots,V_m$ of $V$ such that for all $i \in [m]$ it holds that $V_i$ is an independent set in $G$ and for all $i,j \in [m]$ where $i \neq j$ and any $v \in V_i$ and $u \in V_j$ it holds that $\{u,v\} \in E$. We now formally define the bpcc problem.

Definition 2

The bin packing with cluster complement conflict graph (bpcc) is defined as follows.
Input: A cluster complement $G=(V,E)$, a weight function $w: V \rightarrow \mathbb{Z}^{+}_{0}$, and a value $k \in \mathbb{N}$.
Configuration: An independent set $C \subseteq V$ in $G$ such that $w(C) \leq k$.
Solution: For some $q \in \mathbb{N}$, we say that $\left(C_1,\ldots,C_q\right)$ is a solution with cardinality $q$ if the following holds.

  • For every $i \in [q]$ it holds that $C_i$ is a configuration.

  • For all $v \in V$ there is $i \in [q]$ such that $v \in C_i$.

Objective: Find a solution of minimum cardinality.
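
The two conditions of Definition 2 can be checked mechanically. The sketch below is a verifier only, under the assumption that the conflict graph is given as a list of undirected edges and the weights as a dictionary; the names `is_configuration` and `is_solution` and the toy instance are hypothetical.

```python
def is_configuration(C, edges, w, k):
    """C must be an independent set of G with total weight at most k."""
    C = set(C)
    independent = not any(u in C and v in C for (u, v) in edges)
    return independent and sum(w[v] for v in C) <= k


def is_solution(configs, vertices, edges, w, k):
    """Every C_i is a configuration and every vertex lies in some C_i."""
    covered = set().union(*configs)
    return covered >= set(vertices) and all(
        is_configuration(C, edges, w, k) for C in configs
    )


# Toy cluster complement with parts {a, b} and {c}: all cross pairs are edges.
vertices = ["a", "b", "c"]
edges = [("a", "c"), ("b", "c")]
w = {"a": 2, "b": 3, "c": 4}

print(is_solution([{"a", "b"}, {"c"}], vertices, edges, w, k=5))
print(is_solution([{"a", "c"}, {"b"}], vertices, edges, w, k=5))
```

The first candidate is a solution of cardinality 2; the second is rejected because $a$ and $c$ conflict.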

Proof of Theorem 1.2: We show a reduction from bpcc to ct. Let $I=(G=(V,E),w,k)$ be a bpcc instance. Let $V_1,\ldots,V_m$ be the unique partition of $V$ into maximal independent sets, which exists and can be found in polynomial time since $G$ is a cluster complement. Then, define the reduced ct instance $X_I=(H=(\mathcal{V},\mathcal{E}),s,K)$ as follows.

  • The vertex set $\mathcal{V}$ of $X_I$ contains a root $r$ and a vertex $r_i$ for every $i \in [m]$, where $(r,r_i) \in \mathcal{E}$ ($r_i$ is a child of $r$ for every $i \in [m]$). For every $i \in [m]$ and every $v \in V_i$ define a leaf $\ell_v$ and add an edge $(r_i,\ell_v) \in \mathcal{E}$. Overall, we get a two-level star graph.

  • Define the size function $s: \mathcal{V} \rightarrow \mathbb{Z}^{+}_{0}$ such that $s(r)=0$, for all $i \in [m]$ define $s(r_i) = 2 \cdot k$, and for all $i \in [m]$ and $v \in V_i$ define $s(\ell_v) = w(v)$.

  • Define $K = 3 \cdot k$.
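
The construction above, together with the step of recovering the partition into maximal independent sets, can be sketched as follows. The partition routine uses a simple quadratic-time test (two vertices share a part exactly when they are non-adjacent), which is one possible polynomial-time method and not necessarily the one the authors have in mind. The function names, the tuple encoding of tree vertices, and the toy instance are assumptions, and the input graph is assumed to be simple (no self-loops).

```python
def cluster_complement_partition(vertices, edges):
    """Return the parts V_1..V_m if G is a cluster complement, else None."""
    vertices = list(vertices)
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Candidate part of v: v together with all of its non-neighbours.
    part_of = {
        v: frozenset(u for u in vertices if u not in adj[v]) for v in vertices
    }
    # Non-adjacency must be an equivalence relation on V.
    for v in vertices:
        if any(part_of[u] != part_of[v] for u in part_of[v]):
            return None
    return sorted(set(part_of.values()), key=sorted)


def reduce_bpcc_to_ct(parts, w, k):
    """Build the tree edges, the size function s, and the bound K = 3k."""
    root = ("root",)
    tree_edges, s = [], {root: 0}
    for i, part in enumerate(parts, start=1):
        r_i = ("mid", i)
        s[r_i] = 2 * k                    # s(r_i) = 2k
        tree_edges.append((root, r_i))
        for v in part:
            leaf = ("leaf", v)
            s[leaf] = w[v]                # s(l_v) = w(v)
            tree_edges.append((r_i, leaf))
    return tree_edges, s, 3 * k


vertices = ["a", "b", "c"]
edges = [("a", "c"), ("b", "c")]
w = {"a": 2, "b": 3, "c": 4}

parts = cluster_complement_partition(vertices, edges)
tree_edges, s, K = reduce_bpcc_to_ct(parts, w, k=5)
print(parts)
print(K, s[("mid", 1)], len(tree_edges))
```

On the toy instance the parts are $\{a,b\}$ and $\{c\}$, and the resulting tree has one root, two middle vertices of size $2k = 10$, and three leaves, with $K = 15$.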

For every $C \subseteq V$, let

$$X(C) = \{r\} \cup \bigcup_{i \in [m] \,\mid\, C \cap V_i \neq \emptyset} \{r_i\} \cup \bigcup_{v \in C} \{\ell_v\}. \tag{1}$$

Claim 7

For every $C \subseteq V$, if $C$ is a configuration of $I$ then $X(C)$ is a configuration of $X_I$.

Proof

Assume that $C$ is a configuration of $I$. Observe that, by (1), $X(C)$ is closed under the precedence constraints. Moreover,

$$s(X(C)) = s(r) + \sum_{i \in [m] \,\mid\, C \cap V_i \neq \emptyset} s(r_i) + \sum_{v \in C} s(\ell_v) = 0 + 2k + w(C) \leq 3 \cdot k = K.$$

The first equality follows from (1). The second equality holds since $C$ is a configuration; thus, it is an independent set in $G$, and it can contain vertices from at most one $V_i$, $i \in [m]$. The inequality holds since $C$ is a configuration. We conclude that $X(C)$ is a configuration of $X_I$. $\square$
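
The size computation in this proof can be replayed numerically. The sketch below evaluates the map of equation (1) and the size of its image on a toy instance; the tuple encoding of the tree vertices, the function names `X` and `size`, and the numbers are hypothetical. It also shows why the middle vertices are given size $2k$: a set that touches two parts already exceeds $K = 3k$.

```python
def X(C, parts):
    """Image of C under equation (1): the root, every r_i whose part
    meets C, and the leaf of every vertex of C."""
    C = set(C)
    image = {("root",)}
    for i, part in enumerate(parts, start=1):
        if C & set(part):
            image.add(("mid", i))
    image |= {("leaf", v) for v in C}
    return image


def size(S, w, k):
    """Total size under s: 0 for the root, 2k per r_i, w(v) per leaf."""
    total = 0
    for kind, *rest in S:
        if kind == "mid":
            total += 2 * k
        elif kind == "leaf":
            total += w[rest[0]]
    return total


parts = [{"a", "b"}, {"c"}]
w = {"a": 2, "b": 3, "c": 4}
k = 5
K = 3 * k

inside_one_part = X({"a", "b"}, parts)  # {a, b} is a configuration of I
across_parts = X({"a", "c"}, parts)     # {a, c} is not independent in G

print(size(inside_one_part, w, k), size(inside_one_part, w, k) <= K)
print(size(across_parts, w, k), size(across_parts, w, k) <= K)
```

For $C = \{a,b\}$ the image has size $0 + 10 + 5 = 15 = K$, matching the chain of (in)equalities above, whereas $\{a,c\}$ maps to a set of size $26 > K$.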

For every $C \subseteq \mathcal{V}$, let

$$I(C) = \bigcup_{\ell_v \in C \,\mid\, v \in V} \{v\}. \tag{2}$$

Claim 8

For every $C \subseteq \mathcal{V}$, if $C$ is a configuration of $X_I$ then $I(C)$ is a configuration of $I$.